0% found this document useful (0 votes)

29 views237 pages

Girvin Introduction To Quantum 2024-04-21 v45

Uploaded by

sssergiomatias

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

29 views237 pages

Girvin Introduction To Quantum 2024-04-21 v45

Uploaded by

sssergiomatias

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 237

Introduction to Quantum Information,

Computation and Communication

PHYS 345 Spring 2024

Steven M. Girvin

Disclosure:
The author is a shareholder in IBM Corporation,
and consultant for, and equity holder in,
Quantum Circuits, Inc., a quantum computing company.

[Compiled: May 4, 2024]

0.1 Preface
These lecture notes are intended for undergraduates from a variety of dis-
ciplines (e.g. physics, mathematics, chemistry, computer science, electrical
engineering) interested in an initial introduction to the new field of quantum
information science.
The reader may wish to consult texts such as ‘An Introduction to Quan-
tum Computing,’ by Phillip Kaye, Raymond Laflamme and Michele Mosca,
hereafter referred to as [KLM], or ‘Quantum Computing: A Gentle Introduc-
tion,’ by Eleanor Rieffel and Wolfgang Polak, hereafter referred to as [RP].
Computer scientists may be interested in consulting ‘Quantum Computer
Systems,’ by Yongshan Ding and Fredrick Chong. Having mastered the lec-
ture notes for this class, you will be ready to open the bible in the field,
‘Quantum Computation and Quantum Information,’ by Michael Nielsen and
Isaac Chuang, universally referred to as ‘Mike and Ike’ (perhaps after the
candy bar of the same name).
Contents

0.1 Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i

1 Introduction 1
1.1 What is Information? . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Error Correction and Data Compression . . . . . . . . . . . 14
1.3 What is a computation? . . . . . . . . . . . . . . . . . . . . 23
1.4 Universal Gate Sets . . . . . . . . . . . . . . . . . . . . . . . 30

2 Quantum Bits: Qubits 36

2.1 Quantum Bits for Computer Scientists . . . . . . . . . . . . 37
2.2 Quantum Bits for Budding Physicists . . . . . . . . . . . . . 41

3 Introduction to Hilbert Space 57

3.1 Linear Operators on Hilbert Space . . . . . . . . . . . . . . . 60
3.2 Dirac Notation for Operators . . . . . . . . . . . . . . . . . . 67
3.3 Orthonormal bases for qubit states . . . . . . . . . . . . . . 68
3.4 Rotations in Hilbert Space . . . . . . . . . . . . . . . . . . . 74
3.5 Hilbert Space and Operators for Multiple Qubits . . . . . . . 86

4 Two-Qubit Gates and Multi-Qubit Entanglement 95

4.1 Introduction: Multi-Qubit Operations and Measurements . . 95
4.2 Multi-Qubit Measurements . . . . . . . . . . . . . . . . . . . 96
4.3 Two-Qubit Gates . . . . . . . . . . . . . . . . . . . . . . . . 99
4.4 Bell Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.5 Quantum Dense Coding . . . . . . . . . . . . . . . . . . . . 116
4.6 No-Cloning Theorem Revisited . . . . . . . . . . . . . . . . . 122
4.7 Quantum Teleportation . . . . . . . . . . . . . . . . . . . . . 126
4.8 YET TO DO: . . . . . . . . . . . . . . . . . . . . . . . . . . 129

ii
CONTENTS iii

5 Algorithms and Complexity Classes 131

5.1 Phase ‘Kickback’ of a Controlled-Unitary Operation . . . . . 132
5.2 Exponentiation Gadget . . . . . . . . . . . . . . . . . . . . . 135
5.3 Quantum Oracles . . . . . . . . . . . . . . . . . . . . . . . . 138
5.4 Deutsch Algorithm . . . . . . . . . . . . . . . . . . . . . . . 142
5.5 Deutsch-Jozsa Algorithm . . . . . . . . . . . . . . . . . . . . 146
5.6 Bernstein-Vazirani Problem . . . . . . . . . . . . . . . . . . 152
5.7 Simon’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . 159
5.8 Grover Search Algorithm . . . . . . . . . . . . . . . . . . . . 170
5.9 VQE and QAOA Algorithms . . . . . . . . . . . . . . . . . . 181
5.10 Phase Estimation Algorithm . . . . . . . . . . . . . . . . . . 181
5.11 Quantum Fourier Transform . . . . . . . . . . . . . . . . . . 181

6 Quantum Error Correction 182

6.1 An advanced topic for the experts . . . . . . . . . . . . . . . 188

7 Yet To Do 191

A Quick Review of Probability and Statistics 195

A.1 Randomness . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
A.2 Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
A.3 Statistical Estimators . . . . . . . . . . . . . . . . . . . . . . 200
A.4 Joint Probability Distributions . . . . . . . . . . . . . . . . . 207

B Formal Definition of a Vector Space 211

B.1 Basis Vectors and the Dimension of a Vector Space . . . . . 212
B.2 Inner Products and Norms . . . . . . . . . . . . . . . . . . . 213
B.3 Dirac Notation, Outer Products and Operators for Single
and Multiple Qubits . . . . . . . . . . . . . . . . . . . . . . 215
B.4 The Vector Space of Bit Strings . . . . . . . . . . . . . . . . 220

C Handy Mathematical Identities 222

D Suggestions for Projects 226

D.1 Gates and Instruction Sets . . . . . . . . . . . . . . . . . . . 226
D.2 Efficient Representation of Quantum States . . . . . . . . . . 227
D.3 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
D.4 Quantum Error Correction and Fault Tolerance . . . . . . . 229
D.5 Quantum Communication and Security . . . . . . . . . . . . 230
CONTENTS iv

D.6 Quantum Complexity Classes . . . . . . . . . . . . . . . . . 230

D.7 Quantum Hardware Platforms . . . . . . . . . . . . . . . . . 230
Chapter 1

Introduction

The first quantum revolution began with the development of the quan-
tum theory early in the 20th century. Scientists carrying out fundamental,
curiosity-driven research about the nature of reality, discovered that reality
is not what it seems to humans. The laws of classical mechanics developed
by Galileo, Newton and others beautifully and accurately describe the mo-
tion of macroscopic objects like baseballs, soccer balls and planets, but they
fail for microscopic objects like atoms and electrons. This research was not
motivated by practical applications and indeed seemed likely to have none.
And yet it did. The first quantum revolution led to the invention of the
transistor, the laser and the atomic clock. These three inventions in turn
produced the fantastic technological revolution of the second half of the 20th
century that brought us (among many other things) the computer, fiber-optic
communication and the global positioning system.
There is now a second quantum revolution underway in the 21st century.
It is not based on a new theory of quantum mechanics. Indeed the quantum
theory is essentially unchanged from the 1920’s and is the most successful
theory in all of physical science–it makes spectacularly accurate quantitative
predictions. Instead this second revolution is based on a new understanding
of the often counter-intuitive aspects of the quantum theory. We now un-
derstand that the quantum devices invented in the 20th century do not use
the full power that is available to quantum machines. This new understand-
ing has rocked the foundations of of theoretical computer science laid down
nearly a century ago by Church and Turing in their exploration of the con-
cept of computability. This new understanding is leading to the invention of
radically new techniques for processing and communicating information and

1
CHAPTER 1. INTRODUCTION 2

for making measurements of the tiniest signals. These new techniques are
being rapidly developed and if they can be transformed into practical tech-
nologies (and this is a big ‘if’), the second quantum revolution could have an
impact on human society at least as profound as the technological revolution
of the 20th century.
Achieving the full potential of the second quantum revolution will require
interdisciplinary research involving physics, computer science, electrical en-
gineering, mathematics and many other disciplines. My goal is to present
quantum information, computation and communication in a way that is ac-
cessible to undergraduates from any quantitative discipline. This will require
removing a lot of the physics language, historical background and motivation
to the quantum theory and simply presenting a set of rules by which one can
describe the states and operations of quantum machines.
Before entering the quantum realm, let us begin by reviewing some ba-
sic concepts about information and how it is quantified and processed. By
analogy with the fact that the pre-quantum theory of mechanics is known
today as ‘classical’ mechanics, we will refer to the pre-quantum theory of
information as classical information.
Both classical and quantum information involves the concept of random-
ness. Before going further, the reader is therefore urged to review the basic
concepts of probability and statistics in Appendix A

1.1 What is Information?

Information is ‘news’ or ‘surprise.’ If Alice tells Bob1 something he already
knows, she has not transmitted any information to him and Bob has not
learned (received) any information. Suppose for example that Bob asks Alice
a question that has two possible answers (yes/no or true/false, say). Further
suppose that Bob does not know the answer to his own question. Alice can
transmit the answer as a message to Bob by choosing one of two (agreed
upon) symbols, say T or F for true or false, or y or n for yes or no. For
simplicity we will assume that Alice transmits a ‘binary digit,’ or ‘bit’, either
the symbol 0 or 1.
1
In the quantum communication literature, it is traditional to refer to the two commu-
nicating parties as ‘Alice’ and ‘Bob.’ An eavesdropper listening in on the conversation is
traditionally referred to as ‘Eve.’ Who says physicists don’t have a sense of humor?
CHAPTER 1. INTRODUCTION 3

The word ‘bit’ is short for binary digit a number whose value can be
represented by 0 or 1 in the binary numbering system (base 2 numbers).
The word bit is also used to mean the amount of information contained in
the answer to a yes/no or true/false question (assuming you have no prior
knowledge of the answer and that the two answers are equally likely as far
as you know). If Alice gives Bob either a 0 or a 1 drawn randomly with
equal probability, then Bob receives one bit of information. Bits and base
2 numbers are very natural to use because information is physical. It is
transmitted and stored using the states of physical objects, as illustrated
in Fig. 1.1. For example an electrical switch can be open or closed and its
state naturally encodes one 1 bit of information. Similarly a transistor in a
computer chip can be in one of two electrical states, on or off. Discrete voltage
levels in a computer circuit, say 0 volts and 5 volts are also used to encode
the value of a bit. This discretization is convenient because it helps make
the system robust against noise. Any value of voltage near 0 is interpreted
as representing the symbol 0 and any value near 5 volts is interpreted as
representing the symbol 1. Information can also be stored in small domains
of magnetism in disk drives. Each magnetic domain acts like a small bar
magnet whose magnetic moment (a vector quantity pointing in the direction
running from the south pole to the north pole of the bar magnet) can only
point up or down. Ordinarily a bar magnet can point in any direction in
space, but the material properties of the disk drive are intentionally chosen
to be anisotropic so that only two directions are possible. Information is also
communicated via the states of physical objects. A light bulb can be on or
off and the particles of light (photons) that it emits can be present or absent,
allowing a distant observer to receive information.

Box 1.1. Two meanings of ‘bit’ The word bit can refer to the mathe-
matical measure of information content of a message consisting of a string
of symbols with a specified probability distribution. It can also refer to the
physical object storing the information, as in ‘My computer has 100 bytes of
memory’, with one byte being 8 bits. A useful table of prefixes is
kilo 103 peta 1015
mega 106 exa 1018
giga 109 zetta 1021
terra 1012 yotta 1024
CHAPTER 1. INTRODUCTION 4

“0” “1”

“1” “0”

Figure 1.1: The fact that ‘information is physical’ is illustrated by this electrical circuit
encoding one bit of information. Upper panel: Bit value 0 (1) is represented by the switch
being open (closed) and the light bulb is off (on). Lower panel: There is only one other
possible encoding of the classical information, namely the one in which states 0 and 1 and
interchanged. A simple NOT operation transforms one encoding into the other. We will
see that the quantum situation is much richer than this.

A storage register holding N bits can be in one of 2N distinct states. For

example 3 classical bits can be in 8 states:

(000), (001), (010), (011), (100), (101), (110), (111),

corresponding to the 8 binary numbers whose (base 10) values run from 0
to 7. This is a remarkably efficient encoding–with an ‘alphabet’ of only two
symbols (0 and 1) Alice can send Bob one of 232 = 4, 294, 967, 296 possible
messages at a cost of only having to transmit 32 bits (or 4 ‘bytes’). In fact,
if all the messages are equally likely to be chosen, this is provably the most
efficient possible encoding.2
Let us now turn this picture around by considering the same situation in
which Alice sends one message from a collection of M = 2N possible distinct
messages to Bob. If Alice is equally likely to choose all messages, this will
maximize Bob’s ‘surprise’ because no information in advance about which
message is likely to be sent. Thus the information he receives is maximized.
We have seen that Alice must pay the price of sending N = log2 M physical
2
More precisely, there is no encoding more efficient than this. There are many encodings
that are as efficient that differ simply by permutation of the symbols. For example, we
could order the binary digits with the least significant to the left instead of to the right.
CHAPTER 1. INTRODUCTION 5

bits to Bob. This strongly suggests that we should quantify the amount of
information that Bob receives as

H = log2 M = N (1.1)

bits of information (one for each physical bit received). If the messages are
not all equally likely to be sent, the surprise is less and we will see shortly
how to modify this formula to account for that.

Box 1.2. Reminder About Logarithms In Eq. (1.1), log2 means loga-
rithm base 2. Let us remind ourselves about some basic properties of loga-
rithms in different bases:

x = eln x (1.2)
(log10 x)
10log10 x = e(ln 10) = e(ln 10)(log10 x)

x = (1.3)
(log2 x)
2log2 x = e(ln 2) = e(ln 2)(log2 x)

x = (1.4)
ln x = (ln 2)(log2 x) = (ln 10)(log10 x) (1.5)

1
log2 x = ln x ≈ 1.442695 ln x (1.6)
ln 2

ln 10
log2 x = log10 x ≈ 3.32193 log10 x, (1.7)
ln 2
where ln is the natural logarithm (log base e). Thus the choice of different
bases simply multiplies the information measure by a constant factor. The
standard choice of log base 2 gives us (by definition) the information content
in bits.

Does it make sense that the information is logarithmic in the total number
of possible messages? Indeed it does, because if Alice randomly chooses
m messages to send, the total number of possible compound messages is
M ′ = M m and we have

H = log2 M m = mN. (1.8)

Thus the amount of information Bob receives is linear in the number of

messages Alice sends, a natural and sensible property for any measure of
information to have. It also makes sense from the number of physical bits
Alice needs to send Bob. If sending one message costs log2 M physical bits
CHAPTER 1. INTRODUCTION 6

(when using the optimal encoding), then sending m messages should require
m log2 M physical bits. As noted above, if we had chosen a different base
for the logarithm, our measure of information would just change by a fixed
scale factor. Only when we use base 2 is the information content measured
in bits.

Box 1.3. A Natural Measure of Information A natural measure of

the information content of a message randomly selected from a range of
possibilities is the length of the shortest possible encoding of the messages.
The intuition is that to transmit more information requires transmitting
more symbols. Suppose that there is a choice of M = 2N possible messages
(labeled say by a message number j ∈ {0, M − 1}), and both Alice and
Bob have a book that contains the messages written out in plain text. Each
message could have a different length of text ranging from a single word to
an entire book chapter. The only thing that Alice needs to send Bob is the
number labeling the message, not the text of the message itself. The most
efficient encoding is for Alice to send a string of N bits that constitute the
binary representation of the message number j. This measure of information
content in terms of most efficient possible encoding is thus consisten with our
definition in Eq. (1.1). Note that there are plenty of less efficient encodings.
For example, Alice could transmit a string of 2N bits, all of which are 0 except
for the bit at position j in the string. This would uniquely identify which
message was sent but is exponentially less efficient than using the binary
number representation which requires only transmitting a string of length
N . Table 1.1 illustrates these two encodings for a set of M = 8 possible
messages.

So far we have only considered the case where Alice chooses randomly
with equal probability amongst M = 2N messages. It is the randomness
of her choice that insures that Bob is always surprised by the message he
receives. He has no way of guessing in advance what the message will be.
You might wonder why Alice wants to send Bob random messages. If she
were sending Bob text written in English or Swedish say, the characters she
sends would not be random. For example, in English text the letters e,t and
s occur more frequently than other letters. Perhaps Alice is a spy and she
is sending secretly encoded messages to Bob. If Alice made the mistake of
using a simple substitution cipher in which (say) e is always represented as g,
t as w and s as p, etc., then an eavesdropper would notice that the symbols
CHAPTER 1. INTRODUCTION 7

Message Number Binary Encoding Unary Encoding

0 000 00000001
1 001 00000010
2 010 00000100
3 011 00001000
4 100 00010000
5 101 00100000
6 110 01000000
7 111 10000000

Table 1.1: Alice wants to transmit one of 8 messages to Bob. Since she and Bob each
have a book containing the messages, she need only transmit the number labeling the
message she wants to convey to Bob. The binary encoding requires only three bits. This
is the shortest possible encoding (using two symbols) and thus a natural measure of the
information content is 3 bits. There are many less efficient encodings such as the unary
encoding shown in the third column of the table. Notice that this inefficient encoding is not
very random since 0 occurs exponentially more frequently than 1 (relative to the binary
encoding). It is this lack of randomness (‘surprise’) that makes the encoding inefficient.

g, w and p appear more frequently than others and this would give a clue
that could be used to help break the code. To prevent this, Alice should
instead use a code with the property that the encoded messages appear to
be completely random. Perhaps Alice is not a spy, but merely wants to
compress the data she sends so that the message is as short as possible. As
we will see below, the non-randomness of natural language texts means that
they can be compressed into shorter messages, but since the total amount
of information must be preserved, the compressed messages are necessarily
more random.
We already saw an illustration of this concept in the example given in
Table 1.1. Provided that Alice selects her messages to send randomly with
equal probability, then the bits in the binary code (middle column) are ran-
dom and unpredictable, whereas the bits in the inefficient code are much less
random (being mostly zeros). Suppose however that the probability distribu-
tion from which Alice selects her messages is not uniform. Say for example,
message 0 is selected 93% of the time and the seven remaining messages are
selected with probability 1%. Then the binary bits of the binary code are less
random, being all zeros 93% of the time (since message 0 is encoded as (000)
CHAPTER 1. INTRODUCTION 8

in the simple binary code). How do we quantify the information content in

this case? It must be smaller because there is less surprise for Bob. This
means that we should be able to find an even more efficient code to transmit
the messages with fewer but more random bits.
In 1948, Claude Shannon, a scientist at Bell Telephone Laboratories,
invented information theory and explained how to quantify information for
messages drawn from a given probability distribution. He called the quantity
of information the ‘information entropy’ and gave it the symbol H. Entropy
is also a term from the field of thermodynamics3 and is a measure of the
randomness or unpredictability of the microscopic state of the molecules in
a gas or fluid. Thermodynamic entropy can be viewed as the amount of in-
formation Bob gains when Alice tells him the microstate of the gas or fluid
(i.e. the positions and velocities of all the particles). In a crystalline solid
the positions of the atoms are highly ordered since they are stacked in an in-
finitely repeating pattern as shown in Fig. 1.2. Once you know the positions
a few atoms you can predict the positions of the others. This is not true in
a liquid which is much more disordered. Hence a liquid has greater thermo-
dynamic entropy than a crystalline solid. Similarly the Shannon entropy of
the message listing the positions of all the atoms in a liquid is much greater
than that of the message listing all the positions of the atoms in a solid.
Shannon showed that H is a simple function (which we derive in Box 1.5)
of the probability of receiving different messages
M
X
HM = − pj log2 pj , (1.9)
j=1

where pj is the probability of receiving the jth message from the list of M
distinct possible messages. We need the minus sign in the definition because
probabilities always lie in the domain 0 ≤ p ≤ 1 and so their log is always
negative.4 For the case we have been considering up to now, all messages
3
Thermodynamics is the study of heat, work and phase transitions such as melting of
solids into liquids and boiling of liquids into vapors. Chemists use it to predict chemical
reactions and physicist and engineers use it to predict the efficiency of heat engines. In
these fields, the standard symbol for entropy is S and it is usually define with natural
logarithms rather than base 2 logs.
4
Note that the logarithm diverges as pj goes to zero, but only very slowly. Using
l’Hôpital’s rule, one sees that the limit limpj →0 pj log2 (pj ) = 0. We implicitly use this
smooth limit in the case where any of the probabilities actually are zero and the corre-
sponding logarithm is undefined.
CHAPTER 1. INTRODUCTION 9

Figure 1.2: Left panel: positions of atoms in an ideal crystal. If you are told the positions
of a few atoms in the upper left corner, you can predict the positions of the rest because
of the simple repeating nature of the pattern. Right panel: snapshot of the positions of
the atoms in a liquid at some moment. The positions are not completely random (e.g., no
two atoms occupy the same space) but they are much more random than in the crystal.
Thus the thermodynamic entropy is larger for the liquid, as is the Shannon information
entropy of the message listing the positions of all the atoms.

have been equally likely, so pj = 1/M and

M
X 1 1
HM =− log2 = log2 M, (1.10)
j=1
M M

in agreement with our previous result in Eq. (1.1). We begin to see here that
the information content of a set of messages is a function of the probability
distribution over the set of possible messages.
It turns out that Shannon’s Eq. (1.9) is still correct even if different
messages have different probabilities. Indeed we should think of Shannon
entropy as a general property of any probability distribution. The entropy is
a measure of the randomness associated with sampling from the distribution.
To begin to understand why this is so, consider the case M = 2 with the
probability of receiving the symbol 0 being p0 and the probability of receiving
the symbol 1 being p1 = 1 − p0 The Shannon entropy for this special case,
H2 = h(p0 , p1 ) is plotted in Fig. 1.3. The graph makes intuitive sense because
the information content of the message goes to zero when the probability of
either symbol approaches unity and the other approaches zero. In this limit
the messages are completely predictable and there is no surprise left. We
also see that, consistent with our earlier intuition above, when the messages
are equally likely (p0 = p1 = 21 ), the randomness is maximized and the
information content of the message takes on its maximum value of precisely
CHAPTER 1. INTRODUCTION 10

h( p0 , p1 )

Figure 1.3: Shannon information entropy for a message containing a single symbol. The
symbol 0 is chosen randomly with probability p0 and the symbol 1 is chosen with proba-
bility p1 = 1 − p0 . Notice that the information content (Shannon entropy) is maximized
at p0 = p1 = 12 , the point at which the surprise as to which symbol is chosen is maximum.

one bit.
It may seem strange to talk about fractions of a bit of information. What
does that mean? To prevent confusion we have to remember that the word
bit has two meanings (see Box 1.1). First it can mean a physical carrier of
information that can be in two possible states (e.g. a switch that is on or
off). The number of these physical objects is always an integer. The second
meaning is an abstract measure of your level of surprise in reading a message.
The message is carried by single physical objects storing a 0 or 1. If each
object is almost always in state 0 when you read it, then you are not very
surprised when the next one drawn from the same probability distribution
is also 0. Thus the Shannon information (entropy) carried by one physical
instantiation of a bit can be much less than one abstract mathematical bit
of information.
Suppose now that Alice sends Bob two symbols each drawn from the
same probability distribution. If Eq. (1.9) is correct, then it should predict
that the total information transmitted is twice as large as when only a single
symbol is sent. We can think of this two-symbol message as being drawn
from an ‘alphabet’ of M = 4 possible messages chosen from a probability
distribution R shown in Table 1.2.
Here we have used the fact that if two events are statistically independent,
their joint probability is the product of their individual probabilities5 . If Alice
5
This factoring of a joint probability distribution into two separate probability distri-
CHAPTER 1. INTRODUCTION 11

Ordinal number of message Message Probability

0 00 r0 = p 0 p0
1 01 r1 = p 0 p1
2 10 r2 = p 1 p0
3 11 r3 = p 1 p1

Table 1.2: Enumeration of the four possible messages of length two and their corresponding
probabilities.

is using a biased coin toss to select the symbols for the two messages, the
second coin toss is unaffected by the outcome of the first. Plugging this into
Eq. (1.9) and using the probability distribution R yields
3
X
H4 = − rj log2 (rj ) (1.11)
j=0
= − {p0 p0 log2 (p0 p0 ) + 2p0 p1 log2 (p0 p1 ) + p1 p1 log2 (p1 p1 )}
= − {2p0 p0 log2 (p0 ) + 2p0 p1 [log2 (p0 ) + log2 (p1 )] + 2p1 p1 log2 (p1 )}
= −2 {(p0 + p1 )p0 log2 (p0 ) + (p0 + p1 )p1 log2 (p1 )}
= −2 {p0 log2 (p0 ) + p1 log2 (p1 )}
= 2h(p0 , p1 ) = 2H2 . (1.12)

Shannon’s expression in Eq. (1.9) has precisely the desired property that the
information content is linear in the number of messages sent. With a little
more work one can show that this expression is the unique correct extension
of Eq. (1.1) to the case where the messages in the alphabet do not all have
the same probability.
butions is essentially the definition of statistical independence: P (A, B) = P1 (A)P2 (B).
See App. A
CHAPTER 1. INTRODUCTION 12

Box 1.4. Shannon Entropy as the Average Information Because the

Shannon entropy is a function of the entire probability distribution, it makes
more sense to view it as the average information content of messages drawn
from the probability distribution, rather than the information content of
any single message drawn from the distribution. In this picture we can
interpret Eq. (1.9) as saying that the information content of the jth message
is (− log2 pj ) bits. The rarer a particular message is, the more suprised you
are when you receive it. On the other hand you don’t receive it very often
so it doesn’t contribute much to the average information.
CHAPTER 1. INTRODUCTION 13

Box 1.5. Derivation of Shannon’s Formula To derive Shannon’s ex-

pression we can use the following simple construction to turn a problem of
N messages numbered 1, . . . , N , all with equal probability into a problem
with g messages of varying probabilities. Suppose we have a set of g groups
{G1 , G2 , . . . , Gg } (boxes if you like). We take our set of N messages all of
equal probability and we assign Pg n1 of them to group G1 , n2 of them to group
G2 , and so on. Clearly j=1 nj = N. Since all the messages are equally
likely, each one carries H = log2 N bits of information. Equivalently telling
the receiver the ordinal number of the message gives the receiver H bits of
information. Suppose however instead of telling the receiver the ordinal num-
ber of the message, the sender only tells receiver which group the message
comes from. We don’t yet know how to calculate how much information this
transmits, so let’s define the information that the message comes from group
Gj to be Hj . We will shortly see how to compute this.
What is the probability Pj that the message is from group Gj ? If the messages
are selected uniformly at random then clearly Pj is just equal to the fraction
n
of the messages found in group Gj Pj = Nj . Clearly these messages do not
(necessarily) have equal probabilities. Thus we should compute the average
information when relaying to the receiver which group the message came
from
g
X
HG = Pj Hj . (1.13)
j=1

The following clever argument allows us to compute Hj . Suppose that having

told the receiver that the message came from group Gj , the send then tells
the receiver which of the nj messages it was. Since they are all equally likely,
this news represents log2 nj bits of information. Since the receiver now knows
which of the N messages was sent and thus possesses a total of log2 N bits
of information. From this we deduce that

Hj + log2 nj = log2 N, (1.14)

nj
Hj = − log2 , (1.15)
N
Hj = − log2 Pj , (1.16)
g
X
HG = − Pj log2 Pj , (1.17)
j=1

which is Shannon’s formula as stated in Eq. (1.9).

CHAPTER 1. INTRODUCTION 14

Exercise 1.1. Suppose that Alice sends to Bob one message drawn
from a code book of M1 messages and having probability distribution P ,
and one drawn from a different code book of M2 messages and having
probability distribution Q. Show that the average information content of
these two messages is simply the sum of the average information content
of the individual messages.

H = HP + HQ

This is a generalization of the result in Eq. (1.12).

Exercise 1.2. Suppose that Alice sends to Bob N = 1000 messages

drawn from the following probability distribution

P = {0.93, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01}. (1.18)

a) What is the average information content (in bits) of each message?

b) Suppose Alice finds an optimal encoding that allows her to bundle

all 1, 000 messages into a single binary code word. How long is that
code word? [Strictly speaking, a perfectly optimal encoding that
has arbitrarily low failure probability for decoding requires N to
be asymptotically large (i.e., N → ∞). We here assume N = 1000
is large enough that the decoder almost always works.]

1.2 Error Correction and Data Compression

1.2.1 Error correction
Error correction is another example of the numerous insights that can be
gained from Shannon’s work. If the communication channel is noisy (e.g.
some bits are randomly flipped from 0 to 1 or vice versa when the message is
sent), then there exist useful codes which are longer and less random than the
original message. These redundant encodings are useful because the original
message can still be recovered (with high probability) even though the noise
adds unwanted randomness. A necessary (but not sufficient) condition for the
receiver to be able to decode the message is that the encoded message be long
enough to have enough extra information capacity to hold both the original
uncorrupted message and the information on what particular errors occurred
during transmission. If this is not satisfied no code, no matter how clever,
CHAPTER 1. INTRODUCTION 15

can work. On the other hand this condition is certainly not sufficient because
the code has to be very cleverly designed so that the receiver can learn which
bits are erroneous without knowing in advance anything about the message
other than what redundant encoding was used. These same concepts will
be important when we study quantum communication and quantum error
correction and so we will delay our study them until then.
Without understanding how to construct (classical) error correction
codes, we can however derive a bound on how much longer (and less ran-
dom) the encoded message has to be if the receiver is going to have a good
chance to receive it over a noisy channel and correctly decode it. This so-
called Hashing bound is described in Ex. 1.3.
CHAPTER 1. INTRODUCTION 16

Exercise 1.3. Hashing Bound for Error Correction of Noisy

Channels Alice has a message M of random bits of length LM she
wishes to send Bob. She encodes this into a longer string Mencoded
of of length N bits that is less random and sends it to Bob through
a noisy channel. The channel has a small probability ϵ of any given
bit being erroneously flipped during transmission. We can think of
the action of the error channel as increasing the randomness of the
bit string in the following way: Alice transmits the encoded message
Mencoded of N bits but before Bob receives it, the noise demon cre-
ates a random error string of the form C = 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, . . .
of length N in which each bit is chosen randomly to be 1 with prob-
ability ϵ and 0 with probability 1 − ϵ. The noise demon then trans-
forms Alice’s transmitted bit string into a corrupted bit string M ′
via

M ′ = Mencoded ⊕ C (1.19)

where ⊕ indicates bitwise addition modulo 2 (i.e., 0 ⊕ 0 = 0, 0 ⊕ 1 =

1, 1 ⊕ 0 = 1, 1 ⊕ 1 = 0).

1. What is the mean N̄e and standard deviation σe of the number

of errors? Hint: Think of a random walk defined by the bit
string C in which the walker steps 0 distance to the right if the
bit is correct and 1 step to the right if the bit is erroneous. The
final position of the walker is the total number of errors. Note
that in the limit where N is so large that the average number
of errors is also large, you may use the central limit theorem to
argue that the number of errors is (to a good approximation)
Gaussian distributed around N̄e with width σe ≪ N̄e .

2. What is the total Shannon entropy Se in the probability dis-

tribution of the errors (i.e., the entropy in the error string
C)?

3. If Alice cleverly encodes her message within the string of N

bits, Bob should be able to decode it to learn both the string
of errors and the bit string of the original intended message
M of length LM . Since Alice sends N bits and Bob learns Se
bits about the noise. What is the maximum number of bits
LM that can be in Alice’s intended message if Bob is to be
able to decode it? This is called the Hashing Bound (valid for
asymptotically large N ). Efficient error correction codes can
approach but never exceed this bound.
CHAPTER 1. INTRODUCTION 17

1.2.2 Data Compression

We have seen that randomness is a key feature of information transmission.
Without randomness in Alice’s choice of messages to send, Bob will not be
surprised by what he receives and thus will learn no new information. From
Shannon’s deep insight many important results follow. For example, the
concept of data compression is based on the idea that if the string of symbols
in a long message is not completely random, then there exists a code (a new
set of symbols) that is shorter but more random than the original, that can be
used to store or transmit the same information more compactly. Consider for
example, the Morse code which was invented for telegraphy by Samuel F.B.
Morse (Yale College class of 1810). This is a binary code consisting of ‘dots,’
‘dashes’ with an additional symbol (blank space) separating code words for
English letters. The code is transmitted by pulses of current in a wire, with
‘dot’ corresponding to a brief pulse and ‘dash’ corresponding to a longer pulse
(about 3X longer). The most commonly occurring letter in English text, ‘e,’
is given the shortest code word, ‘dot.’ The next most commonly occurring
letter, ‘t,’ is given the next shortest code word, ‘dash.’ A rarely occurring
letter like ‘z,’ is given a longer code word, ‘dash,dash,dot,dot.’ You could
imagine an even more efficient encoding that uses small words to represent
commonly used phrases such as commonly done in modern text messaging
when LOL is used for ‘laugh out loud.’
To better understand classical data compression, let us consider a specific
example. Suppose that Alice sends Bob a string of M ≫ 1 bits each drawn
from the probability distribution p0 = 1 − ϵ, p1 = ϵ, where ϵ ≪ 1. Further
suppose that even though ϵ ≪ 1, M is so large that ϵM ≫ 1. Alice’s message
consists mostly of 0’s with occasional 1’s sprinkled randomly throughout.
However the message is so long that the average total number of 1’s, ϵM ≫ 1.
Typically there are many consecutive 0’s between each bit with value 1, so
the message is not very random and the information content is low. The
Shannon entropy is in fact

HM = −M [(1 − ϵ) log2 (1 − ϵ) + ϵ log2 ϵ.] . (1.20)

Rather than Alice transmitting the entire bit string of length M , she
could save time and instead transmit a compressed encoding of the message
as a smaller bit string of length N . In order for the encoded bit string to
contain the same amount of information as the original bit string, it has to
be more random than the original bit string. Since the information in the
CHAPTER 1. INTRODUCTION 18

original message is HM , the minimum possible bit string length that a data
compression algorithm can theoretically achieve would be N = HM or equiv-
alently a maximum compression factor r = M/HM . Practical compression
codes can approach this limit but never exceed it.
Exercise 1.4. A random source outputs 0 with probability p0 =
0.37 and 1 with probability p1 = 0.63. This source is used to con-
struct a string of M = 220 bits (i.e., 1 megabit = (1024)2 bits) which
is stored on a hard drive. What is the minimum file size (in bits)
that a data compression program could achieve in this case?

Optional Reading on Possible Encodings

One possible (not necessarily optimal) encoding could be such that each bit
string is a binary number indicating the number of consecutive 0’s in the
original message before the next 1 occurs (in the bit string). Let us try to
estimate how much this code compresses the data.
Let P (ℓ) be the probability that the next bit with value 1 (in the original
message) is separated from the previous occurrence of 1 by ℓ zeros. We then
have that

P (ℓ) = (1 − ϵ)ℓ ϵ. (1.21)

The first term in the RHS above represents the probability that all of the
next ℓ bits are zero and the last term represents the probability that the
ℓ + 1th bit is 1. Using the properties of geometric series, we can readily prove
that (taking M to infinity)
∞
X
P (ℓ) = 1. (1.22)
ℓ=0

Let the number of consecutive 0’s after a particular 1 be L. An N -bit binary

number can represent any integer from L = 0 to L = 2N − 1. Hence our code
will succeed provided that L < 2N . The probability of failure is (for each 1
CHAPTER 1. INTRODUCTION 19

in the message)
∞
X
Pfail = ϵ(1 − ϵ)ℓ
ℓ=2N
∞
2N ′
X
= (1 − ϵ) ϵ(1 − ϵ)ℓ
ℓ′ =0
2N
= (1 − ϵ) . (1.23)
This is the failure probability encoding the number of 0’s after a particular
one of the 1’s, in the limit of an extremely long message (since we have taken
the upper limits on the summations to be infinity). Because of the double
exponential, N does not have to be very large in order to make Pfail extremely
small (even if ϵ is small). It is convenient to approximate the expression for
Pfail by taking its (natural) logarithm and stopping its Taylor series expansion
at first order in ϵ
ln Pfail = 2N ln(1 − ϵ) ≈ 2N (−ϵ). (1.24)
Thus for small ϵ we have to a good approximation
N
Pfail ≈ e−ϵ2 . (1.25)
The average number of times we will have to use this encoding in a mes-
sage string of length M is ϵM ≫ 1. Everyone one of these has to succeed
or the message will be garbled. Hence the overall success probability for
encoding the entire message is (for the average case)
Ptotal success = (1 − Pfail )ϵM ≈ 1 − ϵM Pfail , (1.26)
where in the last approximate equality we have assumed that ϵM Pfail is small.
The overall failure probability in this limit is then
N
Pfailure overall = 1 − Ptotal success ≈ ϵM Pfail ≈ ϵM e−ϵ2 . (1.27)
The length of the original unencoded message is M bits. In the encoded
message we are simply sending (on average) ϵM bit strings of length N .
Hence the encoded message has on average a length of D = ϵN M . Let us
compare this to the Shannon entropy of the original message

H = M H2 (1 − ϵ, ϵ) = −M (1 − ϵ) log2 (1 − ϵ) + ϵ log2 ϵ

≈ ϵ 1/ ln(2) − log2 ϵ M ≪ M, (1.28)
CHAPTER 1. INTRODUCTION 20

where in the last step we have used log2 (x) = ln(x)/ ln(2) ≈ 1.4427 ln(x).
Naturally, for small ϵ the information content of the message is much smaller
than the length of the message. In principle we should be able to compress the
message of length M (nearly) down to a length D approaching H, achieving
a compression factor approaching
M −1 1
C= = ≈ . (1.29)
H (1 − ϵ) log2 (1 − ϵ) + ϵ log2 ϵ ϵ 1/ ln(2) − log2 ϵ

Let D = rH, where r measures how close the code comes to the theoret-
ically maximum compression factor. We know that for any reliable code we
need r > 1 because we cannot compress a message to a length that is smaller
than the Shannon information content of the message. We have (for small ϵ)
ϵN M N
r ≈ = . (1.30)
ϵM [1/ ln(2) − log2 ϵ] 1/ ln(2) − log2 ϵ
The smaller is r the less reliable is the encoding. However for r slightly
greater than unity, the failure probability can be very small for an efficient
code.
As a specific example for the code being considered here, suppose that
ϵ = 2−7 ≈ 0.78 × 10−2 . In this case the theoretical maximum compression
factor is (using the exact and approximate formulae in Eq. (1.29))

C = 15.1712 ≈ 15.161. (1.31)

Let us take the overall raw (unencoded) message length to M = 214 = 16192.
Then ϵM = 27 = 128 ≫ 1 as required by our assumptions. If we choose
N = 11 then the overall failure probability from Eq. (1.27) is
N −7
Pfailure overall ≈ 27 e−2 ≈ 1.44 × 10−5 , (1.32)

and we have for the r factor

r ≈ 1.3029, (1.33)

which is not far above the theoretical bound r = 1. If we choose to increase

N from 11 to N = 12 we obtain a slightly less efficient code r ≈ 1.42135, but
the overall failure probability drops dramatically to
N −7
Pfailure overall ≈ 27 e−2 ≈ 1.62 × 10−12 . (1.34)
CHAPTER 1. INTRODUCTION 21

This shows us the remarkable power of the double exponential suppression

of the error rate with increasing N .
We could slighlty improve the efficiency of the code by not using the trun-
cating the leading zeros in our length N bit strings that define the number
of consecutive zeros (see Ex. 1.5). However, if the bit strings are of variable
length, one must be able to transmit an additional character to mark the
beginning and end of each string. This will come at small but non-zero cost
in efficiency. In any case, the overall improvement will be modest since the
simple code described here already achieved an r close to unity (at least for
the parameters considered here).
The code described above has a serious problem if it does fail. If any
one of the bit strings representing the distance to the next occurence of 1
should fail. Then we will no longer be able to find the positions of any of
the subsequent 1’s. A better idea might be to simply encode the message by
listing the positions of each of the 1’s (rather than listing their separations).
Suppose the message length is M = 2N . If we take position of the first
bit in the message to be defined to be position x = 0 the range of possible
positions for the 1’s will be integers in the range [0, 2N − 1], we can represent
them with bit strings of (fixed) length log2 M = N without any chance of
‘overflow’ errors. The average number of 1’s will be ϵM . Hence we can
encode the message in a (variable length) bit string whose average length is
L = ϵM log2 M, (1.35)
yield a compression factor
M 1
r= = . (1.36)
L ϵ log2 M
Thus for example if M = 220 ∼ 106 , we have
1
r= , (1.37)
20ϵ
which exceeds unity for ϵ < 0.05 grows ever larger as ϵ shrinks. If ϵ log2 M >
1, we have r < 1 and this code cannot compress the data. Clearly we do not
want to choose M too large because the compression factor (slowly) decreases
as log2 M increases.
So far we have looked at some specific methods of error correction encod-
ing that may or may not be optimal. It is interesting to ask the question:
What is the theoretical limit
CHAPTER 1. INTRODUCTION 22

Exercise 1.5. NEED HELP FINDING THE ERROR IN THIS

ARGUMENT. ANY VOLUNTEERS? Compute the efficiency factor
r for the code described above in which the length of the strings of
consecutive 0’s is encoded into variable length binary numbers–that is
the leading 0’s are dropped from the encoding described above. Ignore
the cost of having to transmit an additional symbol (or blank space) to
demarcate the beginning and end of these variable length strings. Does
this code have any chance of failing? Does it achieve the theoretical
r = 1?
Hint: Let QN be the probability that the number of consecutive 0’s is
such that N bits are required to represent that number. The mean length
is given by
∞
X
N̄ = 1 + QN N,
N =1

where the first term accounts for the space that needs to be added be-
tween the variable-length binary strings.
Show that

Q1 = P (0) + P (1)
Q2 = P (2) + P (3)
Q3 = P (4) + P (5) + P (6) + P (7)
N −1
2X
QN = P (ℓ), (1.38)
ℓ=2(N −1)

where P (ℓ) is defined in Eq. (1.21), and the last equation above is valid
for any N ≥ 2. Using the properties of geometric series, find an explicit
expression for QN .
Numerical evaluation for the case ϵ = 2−7 yields N̄ = 7.67724. The
achieved compression factor is C ′ = 1/(ϵN̄ ) = 16.6726 exceeds the
theoretical limit C = 15.1712 producing an r factor less than unity:
r = 0.909945. This violation means there is something wrong with the
above argument! What has gone wrong??

We have seen in our survey of classical information theory that surprise

requires randomness and unpredictability. It turns out that randomness plays
central and fundamental role in quantum mechanics as we shall shortly see.
CHAPTER 1. INTRODUCTION 23

In fact the quantum theory is a theory of randomness and probabilities that

is a distinct alternative to the ordinary theory of classical probability and
statistics outlined in Appendix A. We will soon see that the predictions of
quantum theory are strange and wondrous and seem to make no sense until
you get used to them. The only thing they have going for them is that they
agree perfectly with experiment!

1.3 What is a computation?

We are so used to the apparent magic of modern technology that we may
not stop think about what a computation is. For our purposes, a (classical)
computer is simply a black box that has an input register into which one can
load N bits of information and output register of M bits. The black box
fills the output register bits based on some specified function of the input
register bits. You could even think of the black box as simply an editor that
edits the contents of the input register and places the result in the output
register. It is hard to imagine that this seemingly simple functionality is
what gives us video calling over the internet, computer games, the ability
to predict rocket trajectories, numerically solve differential equations, and
myriad other applications that we take for granted.
Computers are programmed in order to execute binary-valued functions

f : {0, 1}N → {0, 1}M . (1.39)

How many different such functions are there? That is, given an input of N
bits and output of M bits, how many distinct programs are there? Let us
begin by enumerating the possibilities when N = M = 1. How many distinct
functions are there that map the domain {0, 1} to the range {0, 1}? Table
1.3 enumerates the four possibilities.
For each of the 2 input values there are 2 possible choices of output value,
so we have a total of four possible functions. Let us now consider the slightly
more general case of N input bits, still only M = 1 output bits. The number
of distinct possible inputs is S = 2N . For each of those inputs we have to
choose a value for the output. Let us assign an ordinal number j ∈ [0, S − 1]
to each input that corresponds to the binary number represented by the
input. Thus the 0th input is {0, 0, 0, . . . , 0}, and the (S − 1)th input is
{1, 1, 1, . . . , 1}. Hence the function evaluated by the program is defined by a
string of S = 2N bits: {b0 , b1 , b2 , . . . , bS−1 }, where bj is the bit value output
CHAPTER 1. INTRODUCTION 24

Function Input Output Global Property

0 0
IDENTITY Balanced
1 1
0 1
NOT Balanced
1 0
0 0
ZERO (ERASE) Constant
1 0
0 1
ONE (ERASE-NOT) Constant
1 1

Table 1.3: Enumeration of the four possible functions of one input bit and one output
bit. The ERASE function sets all the output states to 0. We can also think of this as
the RESET function used to initialize a register to 0. The ERASE-NOT function applies
ERASE and then NOT, thereby setting all the output states to 1. Balanced functions
output 0 and 1 equally often. Constant functions always output the same value. For
the special case of one input bit and one output bit, every function is either constant or
balanced, but in the general of n input bits and one output bit, most functions are neither
balanced nor constant. (See Ex. 5.3).

by the the program for the jth possible input. Each bit in such a list has
two possible values so the total number of possible programs is
N
Z(N, M = 1) = 2S = 22 . (1.40)
To understand all this, the reader may find it useful to study the specific case
shown in Table 1.4 which lists the data associated with all possible functions
mapping two bits to one bit.
We see from the above that the number of distinct functions is a double
exponential which is a very rapidly growing function of the input register
size, N . For example, for N = 10, Z(10, 1) ≈ 1.8 × 10308 . The situation is
even more dramatic for output register size M > 1. In this case our program
is defined by M different output strings, each of length S = 2N . Thus the
total number of binary digits defining the program is M 2N and the total
number of possible programs is
2N
Z(N, M ) = 2M (2 ) = 2M
N
. (1.41)
This is consistent with our starting analysis above that Z(1, 1) = 4. Note
that
Z(10, 10) ≈ 3.5 × 103082 (1.42)
CHAPTER 1. INTRODUCTION 25

⃗x f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15

00 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
01 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
10 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
11 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

Table 1.4: List of all the functions mapping 2 bits to 1. The first column lists the four pos-
sible arguments of the function. The remaining columns give the corresponding values of
the 16 different functions fj (⃗x). Notice that the ordinal number j of function fj determines
the four-bit binary string giving the values of fj for each of its four arguments. We see
that only f0 and f15 are constant. There are six balanced functions: f3 , f5 , f6 , f9 , f10 , f12
having equal numbers of 0 and 1 outputs. The remaining eight functions are neither con-
stant nor balanced.

is a seriously big number!

Exercise 1.6. Alice has available the set of all possible binary functions
that map N bits to M bits. She draws one at random (with equal
probability) and communicates her choice to Bob in a message.
(a) What is the smallest number of physical bits that Alice could use to
transmit this message? [The input string corresponding to each output
string need not be transmitted if we agree in advance that the list of
outputs is ordered corresponding to the inputs being in ascending order.]
(b) What is the Shannon information content of this message?
(c) The channel Alice uses has the capacity to transmit 1 gigabit per
second. How long would it take Alice to send the message if N = 30 and
M = 30?
(d) How long would it take Alice to transmit all the possible messages
once each?

1.3.1 Reversible Gates

Notice that the IDENTITY and NOT functions are invertible (reversible).
This simply means that given the output c of the function we can uniquely
determine what the input b was

c = IDENTITY(b) =⇒ b = IDENTITY(c) = IDENTITY(IDENTITY(b))

c = NOT(b) =⇒ b = NOT(c) = NOT(NOT(b)) (1.43)
CHAPTER 1. INTRODUCTION 26

That is to say, IDENTITY and NOT are invertible because they happen (in
this special case) to be their own inverses
(IDENTITY)2 = IDENTITY (1.44)
(NOT)2 = IDENTITY. (1.45)
Obviously, the ZERO (ERASE or RESET) and ONE (ERASE-NOT) func-
tions are not their own inverses since they are idempotent
(ZERO)2 = ZERO (1.46)
(ONE)2 = ONE. (1.47)
Indeed, these gates do not have inverses at all. Given the output, we cannot
tell what the input was. It turns out that this irreversibility (the fact that the
operation cannot be run ‘backwards in time’) has profound implications on
the fundamental thermodynamics of computation. An irreversible computer
must dissipate energy, just as friction irreversibly brings a hockey puck sliding
across the ice to a stop while converting the kinetic energy of the motion of
the puck into heat (random thermal motion of the water molecules in the
ice). If we were to make a movie of the hockey puck’s motion and then ran
it backwards, the viewer would immediately be able to tell that the film was
running backwards because they would see the puck spontaneously speeding
up (and the ice cooling down!) something that is not physically possible
(or at least fantastically statistically unlikely). This is what we mean by
irreversible in the physics sense. In the mathematical sense, an irreversible
gate is not invertible–it does not have an inverse because there is not enough
information at the output to deduce the input.
The thermodynamics of computation were explored by Rolf Landauer and
Charles Bennett at IBM and Bennett developed a reversible model (gate set)
of classical computation in which the only irreversible step was the erasure
of information needed to initialize the computer at the beginning of the com-
putation. As illustrated in Fig. 1.4, this irreversibility is related to the fact
that RESET of a register with unknown contents (caused say by the person
who ran their program before you in the queue) lowers the entropy (Shannon
or thermodynamic) of the register. The second law of thermodynamics says
that the entropy of the universe tends to increase and lowering the entropy of
a system generally requires you to do work (e.g. with a refrigerator). While
these ideas were profound contributions to theoretical physics, in practice
classical computers are highly energy inefficient and dissipate many orders of
CHAPTER 1. INTRODUCTION 27

magnitude more energy than the theoretical limits discovered by Landauer

and Bennett. The ‘friction’ in their operation is largely associated with elec-
trical resistance in the wires of the silicon transistor chips.

1 bit of
Shannon
Alice randomly entropy
ERASE 0
selects b=0,1

1 bit of
thermodynamic
entropy

Bath

Figure 1.4: Alice selects a random bit value 0 or 1 and inputs it to an ERASE gate. The
output is 0 independent of the input. The input comes from a probability distribution
containing one bit of Shannon entropy, but the output has zero entropy. Because the
universe conserves information, the missing Shannon entropy must have been irreversibly
dumped into bath as thermodynamic entropy.

We will shortly discuss the reversible SWAP gate and Fig. 1.5 shows that
RESET can be achieved via such a gate, provided that one has a resource
consisting of a supply of bath bits in state 0, and each of those bits is used
only a single time. This picture makes clear that the SWAP gate simply
moves the entropy (information) from the data bit to the bath bit. If Alice
has no further access to the bath after that, then the operation is effectively
irreversible6 ‘.
As we shall see later, it turns out that all the operations (other than the
initialization or RESET) in a quantum computer are required by the laws of
6
It begs the question however, of how one irreversibly sets the original bath bits to zero
in the first place. If one measures the initial value of a bath bit and then applies a NOT
operation conditioned on the measurement result being 1, then the initial randomness is
converted into information in the hands of the experimentalist, so again information is
conserved.
CHAPTER 1. INTRODUCTION 28

1 bit of b
Alice randomly
Shannon
selects b=0,1 X 0
entropy

0 1 bit of
Bath X b Shannon
entropy

Figure 1.5: Reset can be achieved via the reversible SWAP gate, provided that one has a
resource, a supply of bath bits in state 0, and each is used only once. Then it is clear that
the entropy has gone from Alice’s data bit into the bath.

quantum mechanics to be reversible. That is, if a quantum computer dissi-

pates any energy, its program has failed! Thus Bennett’s work on reversible
classical computation was an important intellectual precursor to quantum
computation. It is useful therefore to briefly explore some examples of how
to make classical computer operations reversible (in principle). Given that
we have seen that RESET destroys information (decreases the Shannon en-
tropy), it is clear that the basic rule for making a reversible computer is
simply to never destroy any information. Let us consider as a simple exam-
ple the AND gate. This gate has two-inputs x, y and a single output which
is 0 except if x = 1 and (hence the name!) y = 1. Fig. 1.6 illustrates the
circuit.

Toffoli
x x x x x
Reversible
y
AND x∧ y y
AND
y y y
z z⊕x∧ y z z⊕x∧ y

Figure 1.6: Left panel: Standard circuit notation for the AND gate. This is clearly
irreversible because there are fewer output lines than input lines and information has been
lost. Middle panel: A reversible version of the AND gate that preserves the information
by copying the inputs to the outputs and flipping the ancilla bit z if and only if x = y = 1.
Right Panel: This is also known as a Toffoli or controlled-controlled-NOT (CCNOT) gate,
displayed here in the notation we will be using later for quantum gates. This gate applies
the NOT operation to the target bit (bottom wire with the open circle symbol) if and only
if both control bits (top two wires with the solid circle symbol) are in the 1 state.

These ‘circuit’ diagrams are schematic representations of the electrical

circuits in a computer that produce the corresponding gate. They are read
CHAPTER 1. INTRODUCTION 29

from left to right: the input bits are each represented by a horizontal line
(‘wire’) entering the blackbox from the left, and the output bits are repre-
sented by wires exiting on the right. The wedge notation x ∧ y represents
Boolean logical AND or equivalently the product of the bit values xy. The
⊕ notation refers to addition of the bit values modulo 2:

0⊕0 = 0
0⊕1 = 1
1⊕0 = 1
1⊕1 = 0. (1.48)

Note that this is the truth table for the ‘exclusive OR’ (XOR) gate, which
outputs 1 iff one, but not both, of the inputs are 1.
Clearly the traditional AND gate shown in the left panel of Fig. 1.6 is
irreversible because there is not enough data in the bit value on the single
output line determine the bit values on the two input lines. (See Table 1.5.)
The circuit in the middle panel shows a reversible version of the AND gate.
It has the same number of output and input lines. This gate is its own
inverse since applying it twice yields IDENTITY. This follows directly from
algebraic identity

b⊕b=0 ∀b. (1.49)

Thus the gate is reversible.

Notice that we can reproduce the action of the traditional irreversible
AND gate by setting z = 0 at the input and discarding x and y on the
output, leaving only the single output line q = x ∧ y. Also notice that if
the input is z = 1, then the output is is inverted to NOT(x AND y) =
NAND(x,y). The truth table of the NAND gate is given in Table 1.5.
Two other reversible gates are SWAP and CNOT (controlled-NOT) illus-
trated in Fig. 1.7. As shown in the figure (see also Ex. 1.7), the SWAP gate
can be synthesized from three controlled-NOT (CNOT) gates. Just as the
name suggests, the SWAP gate interchanges the values of the two bits on its
input lines. The CNOT is a simpler version of the Toffoli gate: it peforms a
NOT operation on the target bit iff (if and only if) the control bit is 1.
Suppose that we have as a resource a ‘bath’ of available bits all in state 0.
Then the reset operation illustrated in Fig. 1.4 can be performed by simply
swapping the data bit into the bath and forgetting about it as illustrated in
CHAPTER 1. INTRODUCTION 30

x y AND(x, y) NAND(x, y)
0 0 0 1
0 1 0 1
1 0 0 1
1 1 1 0

Table 1.5: Truth tables for the AND and NAND (NOT[AND]) gates. The NAND gate is
illustrated in Fig. 1.8.

Fig. 1.5. This clearly shows that the entropy in the data has been transferred
into the bath and thus information is conserved.
See [KLM] Sec. 1.5 for additional discussion of reversible logic.

1.4 Universal Gate Sets

We have seen that for a classical computer with an input register of N bits
and an output register of M bits, the total number of possible programs that
can be run is truly enormous, Z = 2M (2 ) . A universal gate set is defined
N

to be any set of gates that can be used to build circuits that execute all Z
possible programs (for any size N, M ). A gate set consisting solely of the
Toffoli gate (CC-NOT) is universal for classical computation, provided that
one has access to ancilla bits and the ability to initialize these auxiliary bits in
0 or 1 as needed). Since the Toffoli gate is reversible, one can build circuits
that execute all Z possible programs and do so reversibly. The reversible
circuit will have the same number of input and output lines and their total
number may be larger than M+N due to the presence of the ancilla bits.
We have seen that the Toffoli gate is not just universal for reversible
classical computations, but can also be used to create irreversible gates if
desired (by throwing away the data on the control lines). Another simple
gate set that is universal for irreversible computation is {NAND, FANOUT}
illustrated in Fig. 1.8. NAND(x, y) = NOT(AND(x, y)) and its truth table
is shown in Table 1.5. FANOUT splits a single ‘wire’ into two (or more)
wires, or equivalently copies the value of a particular bit into another bit.
When we come to quantum gates, we will see that FANOUT is not allowed
because of the quantum ‘no-cloning theorem.’ The quantum Toffoli gate is
allowed however because it is reversible and does not violate the no-cloning
CHAPTER 1. INTRODUCTION 31

(a) (b) (c)

x X y x x x x

y
X x y y⊕x y y⊕x

(d)
x y

y x

Figure 1.7: (a) Standard circuit notation for the SWAP gate which interchanges the bits
on the two wires. (b) The controlled-NOT (CNOT) gate applies the NOT operation to
the target bit (denoted by the open circle symbol) if and only if the control bit (denoted
by the solid circle symbol) is in the 1 state. The control bit x is unchanged. The target bit
y is mapped to XOR(x, y) = y ⊕ x. In contrast to the Toffoli gate, the CNOT is a two-bit
gate, not a 3-bit gate. (c) C̄NOT gate which applies the NOT gate to the target iff the
control bit is in the 0 state. x̄ denotes NOT(x). (d) The SWAP gate can by synthesized
from three CNOT gates.

rule. Curiously, one- and two-bit gates are inadequate to achieve reversible
universal classical computation (hence the need for the three-bit Toffoli gate),
however a gate set with only one- and two-qubit gates can be found that is
universal for quantum computation. Peculiar quantum interference effects
permit the Toffoli gate to be synthesized from two-qubit gates, something
that is not possible in (reversible) classical computation. More on this later!
CHAPTER 1. INTRODUCTION 32

Box 1.6. Computational Complexity Once we have a universal gate set

we can ask the question, how many gates are needed to ‘hard wire’ a par-
ticular one of the Z(N, M ) = 2M (2 ) possible binary functions (programs)
N

enumerated in Eq. (1.41). Said another way, what is the smallest possible
circuit depth needed to execute the function correctly for all possible inputs?
This is one measure of the computational complexity of executing the de-
sired functiona . Clearly if the function to be evaluated has a lot ‘structure’
(as opposed to being ‘random’) it should be possible to have a very small
circuit. For example if the function has N inputs and N outputs and is
simply IDENTITY, then we just need an IDENTITY gate on each of the N
lines. These can be executed in parallel so the circuit ‘depth’ is one. On the
other hand, if the output value for each possible input is determined by N
coin flips (whose results are permanently recorded, not rerun each time the
computation is run) then it seems intuitive that the circuit would have to
be very complex and deep to deal with each and every possible input in the
correct way.
In theoretical computer science, statements about the complexity of solv-
ing some class of problems (e.g. the ‘traveling salesman’ route optimization
problem) are typically statements about the asymptotic behavior of the cir-
cuit depth as the input size N goes to infinity. There are many classes of
problems that contain provably hard (i.e. requiring circuit depth that is su-
perpolynomial in N ) instances and yet are ‘easy’ for typical cases. It is often
more difficult to prove sharp statements about average-case difficulty than
worse-case difficulty.
For a related discussion of complexity of strings of bits in the classical case
see:

https://www.wikiwand.com/en/Kolmogorov complexity

We will see that the complexity of quantum circuits can sometimes be dra-
matically less than that of the best known classical circuits for certain prob-
lems. However there is still much computer science research that needs to
be done to fully understand the power of quantum computers and to classify
the hardness of different problems on such computers.
a
Note that in the circuit model, circuit depth and run time are essentially the same
thing. The number of lines in a program written in a high-level language need not have
anything to do with the run time because of ‘DO LOOP’ commands. Indeed, the famous
‘halting problem’ theorem tells us that it is not even possible to write a program that can
determine if another program will ever halt.
CHAPTER 1. INTRODUCTION 33

x x
NAND x∧ y x FANOUT
y

Figure 1.8: Left panel: Standard circuit notation for the NAND (NOT[AND]) gate. Like
the AND gate, this is clearly irreversible. Right panel: FANOUT copies the input to two
(or more) output lines. These two gates together form a simple but universal gates set
for irreversible classical computation. Neither of these gates is allowed in a quantum com-
puter. NAND violates the reversibility requirement and FANOUT violates the quantum
no-cloning theorem.

See [KLM] Sec. 1.3 for additional discussion of universal gate sets.
Exercise 1.7. Prove that the three CNOT gates shown in Fig. 1.7c are
equivalent to the SWAP gate. Do this by considering all four possible
input states and explicitly showing in your solution what the states are
after each of the three CNOT gates.

Exercise 1.8. Use a NAND gate, plus any needed auxiliary bits ini-
tialized to 0 or 1, to construct a NOT gate.

Exercise 1.9. For a computer with input register of size N = 2 and

output register of size M = 2, there are Z(2, 2) = 44 = 256 possible
programs that can be run. One of these is the ‘exclusive or’ function,
XOR, that is true (i.e. has value 1) if and only if exactly 1 of its inputs
is 1 (i.e. either x = 1, y = 0 or x = 0, y = 1). Construct a reversible
circuit for XOR that is its own inverse

a) using one CNOT gate and 2 wires with the inputs being x and y,
and the outputs being x and q = XOR(x, y) = x ⊕ y, where ⊕ is
addition mod 2.

b) with the same inputs and outputs as above, but using one Tof-
foli gate, plus any needed auxiliary bits (internal to the circuit)
initialized to 0 or 1. Hint: You will need a total of 3 wires.

c) using two CNOT gates and three wires with the inputs being x, y, z
and the outputs being x, y and q = z ⊕ x ⊕ y = z ⊕ XOR(x, y).

Another gate we will find useful is the controlled SWAP gate (cSWAP)
also known as the Fredkin gate, illustrated in Fig. 1.9. cSWAP applies the
CHAPTER 1. INTRODUCTION 34

identity if the control bit is 0 and swaps the two target bits if the control bit
is 1. The c̄SWAP does the reverse–performing the SWAP if the control bit
is in 0. As discussed in Exercise 1.10, the cSWAP can be constructed from
Toffoli gates. As illustrated in Fig. 1.10 and discussed in Exercise 1.11, the
cSWAP can be used to construct a router that sends an input bit b down a
binary tree to a destination specifed by the bits in an address register.
Exercise 1.10. Consider a computer with input register (x, y, z) of
size N = 3 and output register of size M = 3. Construct the con-
trolled SWAP (cSWAP or Fredkin) gate shown in Fig. 1.9 whose output
is (x, y, z) if z = 0 and whose output is (y, x, z) is z = 1. You may
utilize one Toffoli gate and two CNOT gates. Hint: With 3 CNOT gates
you can create a SWAP. What can you do with 2 CNOT gates and one
Toffoli gate?

Exercise 1.11. Use three controlled SWAP (cSWAP or Fredkin) gates

to construct the binary tree router shown in Fig. 1.10 that sends an input
bit b to one of four final destinations based on the values of two address
bits, {a1 , a0 }, and outputs 0 in all the other bits of the output register.
Compute how many cSWAP gates would be needed for a general binary
tree router with N address bits.

x X x X

y X y X
cSWAP cSWAP
z z

Figure 1.9: Standard circuit notation for the controlled SWAP (cSWAP) or Fredkin gate
which swaps the two target bits x, y iff the control bit z is 1. The c̄SWAP gate does the
reverse–it swaps the two target bits only iff the control is 0.
CHAPTER 1. INTRODUCTION 35

Input register
b

Address: (00) (01) (10) (11)

Output register

Figure 1.10: A binary tree router that sends input bit b to one of 4 possible elements in
the output register based on the values of two address bits (a1 , a0 ). All other bits in the
output register should be 0.
Chapter 2

Quantum Bits: Qubits

As stated previously, information is physical and is stored in the state of a

physical system. As illustrated in Fig. 1.1, classical information can be stored
in the physical state of an electrical switch and transmitted via the presence
or absence of a light from a bulb. As noted previously since a classical bit can
take on only two possible values, 0 and 1, there are only two possible physical
encodings, 0 and 1 can be represented by the switch off and on respectively,
or the other way around. If Alice sends Bob a message via a beam of light
and does not tell Bob which encoding she is using, it is an easy matter for
Bob to try to decipher the message using both encodings since they differ
only by a simple NOT operation that interchanges 0 and 1.
Let us abstract away the details of the physical instantiation of the clas-
sical bit and, with a view towards the quantum notation about to be in-
troduced, represent its state by an upward or downward pointing arrow as
shown in Fig. 2.1.
As we shall soon see, things are very different for quantum bits. How
things are different can only be deduced from the experimental phenomenol-
ogy. We will present here two versions of the story–the first from a computer
science perspective that removes most of the physics language and the second
aimed at beginning physicists. The reader is encouraged to learn from both
points of view.

36
CHAPTER 2. QUANTUM BITS: QUBITS 37

0 1

1 0
Figure 2.1: Representation of the two possible states of a classical switch (or other physical
instantiation of a classical bit) with an up or down arrow. Classically there are only two
possible encodings: uparrow is state 0 (left panel) or downarrow is state 0 (right panel).
They differ by a NOT operation, or in the language of the arrow, they differ by a rotation
of the arrows through 180 degrees.

2.1 Quantum Bits for Computer Scientists

A quantum bit (‘qubit’) is information stored in a quantum system that has
two possible physical states, distinguished (say) by the distinct and discretely
quantized values of their energy. Quantum bits are like classical bits in that
if you measure which energy state they are in, you get a binary result, either
0 or 1. We are perhaps not used to thinking about making measurements
of the state of classical bits, yet that is exactly what is done to read out
data from a hard drive where the a sensor in the flying head scans over the
disk and measures the orientation of the small domains of magnetism whose
orientation (up or down, say) represents the state of each bit.
But in addition to this traditional ‘digital’ (i.e., binary) feature, quantum
bits also have ‘analog’ characteristics because there exists a continuum of
possible encodings as illustrated in Fig. 2.2. Associated with this continuous
choice of encodings (also known as ‘quantization axes’ or ‘frames’) is the
ability to choose the measurements we make to be in any of the possible
frames. That is, if Bob prepares a qubit in state 0′ as represented in the
right panel of Fig. 2.2, there is a continuum of adjustments he can make
to ‘orient’ his measurement apparatus in order to make its measurements in
that same frame. The apparatus contains some controls that determine the
unit vector n̂ defining the orientation of the frame (in some abstract space,
not necessarily our ordinary three-dimensional space).
Something strange happens if Alice prepares state 0 in her frame (left
panel of Fig. 2.2) and gives it to Bob to measure in his frame choice. It
CHAPTER 2. QUANTUM BITS: QUBITS 38

makes sense that if he chooses θ = 0, φ = 0, his frame matches Alice’s and he

his measurement should yield result 0 since that is what Alice prepared. On
the other hand, if Bob chooses θ = π, φ = 0, then his frame is the same as
Alice’s except that 0 and 1 are interchanged. Hence his measurement result
should always be 1. How do we reconcile the fact that Bob’s frame choice
can be varied continuously and yet his measurement results will always be
discrete? As he changes θ continuously from 0 to π, his measurement result
has to change from 0 to 1. Since the measurement results are discrete (as for
classical bits) it is impossible for them to change continuously. It turns out
that the only possible way to reconcile the digital/analog features of quantum
bits is through the introduction of randomness. The measurement results are
always binary, but 0 and 1 occur randomly with a probability distribution
that varies continuously as the measurement ‘axis’ n̂ varies.
Quantum theory is a radical alternative to traditional probability theory
and its predictions can be quite different as we will see. Randomness ap-
pears in classical probability theory as a representation of our ignorance of
certain ‘hidden’ variables that we would need to know in order to know the
outcome of some experiment (e.g., flipping a coin). In the quantum theory,
randomness is intrinsic and not due to ignorance. If Alice prepares 0 in her
frame and Bob measures in his frame, neither Alice nor Bob have any way of
knowing what the outcome of the measurement will be. The result is truly
and ineluctably random. This can be a resource that is more powerful than
the ‘pseudorandom’ numbers generated by codes run on classical comput-
ers However experiment shows (and quantum theory correctly predicts) that
Bob will measure 0′ and 1′ with probabilities

2 θ 1
p0′ = cos = [1 + cos θ] (2.1)
2 2

2 θ 1
p1′ = sin = [1 − cos θ] . (2.2)
2 2
These probabilities are plotted in Fig. 2.3. Notice that for θ = 0 there
is no randomness in the measurements, and Bob’s results always give 0 in
agreement with with Alice. For the θ = π, there is also no randomness,
but Bob’s results are always exactly opposite those of Alice. Notice that the
randomness is maximized at θ = π/2 where Alice’s and Bob’s quantization
axes are perpindicular. At that point Bob’s measurement results are a ‘50;50
coin-toss’ and yield the maximum possible information of one classical bit.
However this is not information Alice is sending to Bob. She has absolutely
CHAPTER 2. QUANTUM BITS: QUBITS 39

Alice Bob

0
0′
θ
ϕ

1′
1
Figure 2.2: There are an infinite number of possible encodings for quantum information.
These encodings are in one-to-one correspondence with the points on the surface of a unit
sphere. Left panel: Alice chooses the ‘standard’ encoding. Right panel: Bob chooses an
encoding (aka ‘quantization axis,’ or ‘frame’) defined in polar coordinates by the polar
angle θ and azimuthal angle φ of a chosen point on the unit sphere (also known as the
‘Bloch sphere’ in honor of Felix Bloch).

no control over Bob’s measurement results (i.e., if she prepares 1 instead of

0, it has no affect on Bob’s random results if θ = π/2.).
There is a second strange aspect of the dual binary/analog nature of
quantum bits. Suppose that Bob measures Alice’s qubit and obtains the
(random) result 0′ . If he measures the same qubit again, the result will not
be random. It will be identical to the first result. The next time Alice sends
Bob a qubit (again prepared in her state 0), Bob might randomly obtain
the measurement result 1′ . If he measures that same qubit again, the result
will always be 1′ . The randomness has dissappeared! Consistent with the
discussion above (in which Bob’s original measurement is not random if θ =
0), the only way for the randomness to go away is if the act of measurement
has (rather mysteriously) realigned the qubit with Bob’s measurement axis
(parallel or anti-parallel to it, depending on whether the measurement result
is 0′ or 1′ ). This so-called ‘measurement back action’ or ‘state collapse’ is
completely invisible to Bob. Suppose that Bob does not know that his frame
is not aligned with Alice’s. When Alice sends him a single bit in her state
0, Bob obtains one of his measurement results randomly, but with only one
event, he has no way of knowing if this is random. If the result is, say, 1′ , he
can verify it by measuring several times and will always get 1′ . As far as he
can tell, Alice sent him a 1′ , even though in fact, she sent him a 0.
As we will see later when we study quantum encryption and communi-
cation, this invisibility of the measurement back action can be used as a
CHAPTER 2. QUANTUM BITS: QUBITS 40

p0′

p1′

θ /π

Figure 2.3: Probabilities p0′ , p1′ that Bob measures results 0, 1 in his frame when Alice
prepares the qubit in state 0 in her frame. θ is the polar angle between Bob’s quantization
axis and Alice’s. The probabilities are independent of the azimuthal angle φ.

resource for encrypting communications and for actively detecting the pres-
ence of an eavesdropper. Conversely the same effect dramatically complicates
quantum error correction in computation (and communication). When you
look to see if errors have occurred you do damage to the state!!

Box 2.1. Holevo Bound If Alice and Bob agree in advance on the quanti-
zation axis to use, Alice can send Bob information by sending him qubits.
For simplicity let us say they agree to use the standard computational basis
states |0⟩ and |1⟩. If Alice wishes to send Bob the three-bit classical message
011, she can send him three qubits in states |0⟩, |1⟩, |1⟩. Because Bob knows
the correct quantization axis to use, his measurement results will be 0, 1, 1
and he will correctly receive the message Alice sent. If Bob uses the wrong
quantization axis for his measurement, he will gain less information about
Alice’s message because randomness will cause his measurement results to
(have some non-zero probability to) disagree with what Alice intended.

Alice can (in principle) encode an infinite number of bits in the complex
coefficients of a coherent superposition state α|0⟩ + β|1⟩, but Bob’s measure-
ment can only yield one classical bit of information per qubit. Subsequent
measurements on the same qubit yield no new information because of state
collapse. This limit of one classical bit of information per quantum bit is
called the Holevo bound on the transmittable information.

If Alice sends Bob a large collection of qubits, all prepared in the same
CHAPTER 2. QUANTUM BITS: QUBITS 41

state, say 0, then Bob can tell something about the alignment of his frame
relative to Alice’s from the probability distribution of the results. Using his
measurements Bob can estimate θ from

2 θ 2 θ
p0′ − p1′ = cos − sin = cos[θ]. (2.3)
2 2
We will see later that by orienting his frame in a different direction, he
can also acquire information about cos[φ], provided Alice supplies him with
additional fresh qubits all prepared in the same state.
Exercise 2.1. The quantum Zeno effect is a vivid demonstration of
the existence of measurement back action. Suppose that Alice gives Bob
a qubit prepared in state 0 in her frame. Further suppose that Bob’s
frame is aligned with Alice’s frame, so that if he measures the qubit he
will also obtain the result 0 with probability 1. Bob can (with high prob-
ability) rotate the qubit from state 0 to state 1 purely by making a series
of measurements! Suppose Bob makes a series of N ≫ 1 measurements,
gradually rotating his quantization axis between each measurement. Let
the first measurement have θ1 = π/N and the jth measurement have
θj = jθ1 . (Assume all have φ = 0.) If all N measurements happen to
yield state 0′ in Bob’s (gradually rotating) frame, then after N measure-
ments, Alice will see that Bob’s measurements will have turned the qubit
from her 0 to her 1 orientation. A lower bound on the success probability
is given by the probability that every single one of Bob’s measurements
is 0′ (in his gradually rotating frame). Compute this probability as a
function of N and find an analytic approximation for it valid for N ≫ 1.
Hint: expand the logarithm of the probability for large N and then ex-
ponentiate that.

2.2 Quantum Bits for Budding Physicists

A quantum bit (‘qubit’) is information stored in a quantum system that has
two possible physical states. In classical mechanics, the energy of physical
systems can vary continuously. Quantum systems however (can1 ) have dis-
crete, or ‘quantized’ energy levels, as illustrated in Fig. 2.4. These discrete
1
When an electron is in a low-energy state in an atom, it is ‘bound’ to the nucleus
and has only discrete energy levels. For very high excitation energies the atom becomes
ionized and the electron becomes unbound. In that case the allowed energy levels of the
electron form a continuum rather than being discrete. For all of the qubits we study, we
will be discussing only the low-energy states which are always discretely quantized.
CHAPTER 2. QUANTUM BITS: QUBITS 42

Energy
1
0

Figure 2.4: Schematic illustration of the quantized energy levels of a naturally occuring
(or an artificial) atom used as a quantum bit (qubit). The atom being in the lowest energy
state can represent state 0 and the atom being in the first excited state can represent state
1. Electromagnetic radiation can be used to flip the qubit from one state to the other. For
natural atoms the required frequency can be in the optical or the microwave. For artificial
atoms such as superconducting qubits, the required frequency is in the microwave domain.

physical states can be used to store information. For example, state 0 might
denote the atom being in its lowest energy state (the ‘ground’ state), and
state 1 might denote the atom being in its first excited state. The quantiza-
tion of atomic energy levels is related to the fact that the electrons orbiting
the nucleus behave like waves. The allowed energies of the electrons can be
found by solving the Schrödinger wave equation–a complex task outside the
scope of this discussion. Crudely speaking however, one can understand en-
ergy level quantization as arising from the fact that in the quantum theory
particles act like waves and only electron orbits whose circumference is an
integer number of wavelengths are allowed (otherwise the wave destructively
interferes with itself). Most quantum systems have many more than two
states, but with the right experimental situation, it is possible to focus at-
tention only on the two lowest energy states and ignore the others (to a good
approximation).
CHAPTER 2. QUANTUM BITS: QUBITS 43

Box 2.2. Classical Mechanics vs. Quantum Mechanics In classical

mechanics, the configuration space is the set of all possible configurations
of the system. The configuration of the N particles in a system can be
described by listing the position vectors ⃗rj = (xj , yy , zj ) for every particle
j ∈ {1, N }. These N points in 3-dimensional position space, or equivalently
giving a single point in a 3N -dimensional configuration space, defines the
spatial configuration of the system. Because Newton’s equation of motion
contains two time derivatives (for the acceleration caused by the forces on the
particles), the spatial configuration is not enough to predict the future state.
You need to also know the momentum vector p⃗j of each particle. The 6N -
dimensional space that defines the positions and momenta of all the particles
is called phase space.
In quantum mechanics the state (‘configuration’) of a quantum system is
not defined by a point in the usual configuration space. It is defined by a
complex-valued function on that space–this function is known as the wave
function. Because the Schrödinger equation governing the time evolution
of the wave function has only one time derivative, we only need to know
the wave function Ψ(⃗r, t) to predict the future state. [In essence Ψ contains
information about the momentum as well as the position.] Max Born gave
an interpretation of the wave function: The intensity Ψ∗ (⃗r, t)Ψ(⃗r, t) gives the
probability (density) of finding the particle at ⃗r (or finding the positions of
all the particles if ⃗r is a 3N -dimensional vector in the case of an N -particle
system). Because the Schrödinger equation is linear, the sum of two wave
solutions is also a solution, leading to the possibility of quantum interferences
(since the magnitude squared of the sum of the waves is not equal to the sum
of the individual probability densities of each wave).
For simplicity let us consider one spatial dimension and x ∈ [0, L]. We can
if we wish, approximate the complex function ψ(x) by specifying its value at
M equally spaced points over the interval and approximating the function
as piecewise constant. Then the function is defined by a complex vector of
(very large) length M ; that is, the function is (approximately) represented
by a vector in an M dimensional complex vector space. See the discussion
in R. Shankar, Principles of Quantum Mechanics.
The configuration space of a classical bit consists of just two points, labeled
0 and 1. The state (configuration) of a qubit is defined by a complex-valued
function on those two points (which in the Dirac notation we call |0⟩ and|1⟩).
A function whose argument takes on only two values returns two complex
numbers and these two numbers completely specify the function. We can
call them ψ0 and ψ1 and think of the state defined by these two values as
being represented by a two-component complex vector

ψ0
|Ψ⟩ = = ψ0 |0⟩ + ψ1 |1⟩.
ψ1
CHAPTER 2. QUANTUM BITS: QUBITS 44

One of the key features of quantum mechanics is that if one measures

the energy of an atom (or more generally any qubit), one always obtains a
result equal to one of the quantized values allowed for the energy and never
anything else. If we are allowed to focus on only the two lowest levels then
we always get one of two possible measurement results corresponding to the
qubit being in state 0 or state 1. So far this is exactly the same as for classical
bits. For example, one can measure the switch setting in the circuit shown in
Fig. 1.1 by checking to see if the light bulb is on or off. In principle therefore
it is possible to build a classical computer using individual atoms to store
bits. This would be the ultimate in miniaturization!
However, because quantum systems behave strangely–particles can act
like waves and waves sometimes act like particles, quantum bits are com-
pletely different than classical bits in that they have both analog and digital
characteristics. These differences will give rise to the surprising powers of
quantum systems to store, manipulate and transmit information. The wave-
like features of quantum systems means that a qubit can exist in an infi-
nite number of different ‘superposition’ states intermediate between 0 and 1.
Mathematically this goes back to the fact that the Schrödinger wave equation
is a linear differential equation. Therefore if it has two (or more) different
solutions, any arbitrary linear combination of those solutions also solves the
equation. You have seen the superposition principle in action if you have
ever watched two waves on the surface of a lake pass through each other and
keep going as if nothing had happened. You have also heard it in action when
two people are speaking and the vibrations of the air (the sound waves) from
each speaker reach your ears.2
Fig. 2.5 illustrates the interference effects which occur when two waves
are superposed.
For two-state systems (qubits) superposition states are represented math-
ematically in a notation invented by Paul A.M. Dirac as

|ψ⟩ = α|0⟩ + β|1⟩. (2.4)

Here |0⟩ represents state 0 of the qubit and |1⟩ represents state 1 of the
qubit, and α and β are (complex) wave amplitudes also known as ‘probability
amplitudes.’ The ‘state’ of the qubit has something to do with the kind of
2
The physics of the sound waves is such that, to an excellent approximation, each one
propagates through the air unaffected by the other. This does not mean that your brain
won’t have trouble following two conversations at once, but that is a separate matter.
CHAPTER 2. QUANTUM BITS: QUBITS 45

Figure 2.5: Top left panel: Two sine waves, Ψ1 and Ψ2 of unit amplitude (vertically
displaced from each other for clarity) having slightly different wavelengths. Top right
panel: Superposition of the two waves Ψ = αΨ1 + βΨ2 , with α = β = +1. The amplitude
is the sum of the amplitudes of the two waves. In some regions the waves cancel each
other out (destructively interfere) and in other regions they are in phase and constructively
interfere. Bottom left panel: Interference pattern when the wave amplitudes are unequal,
α = +1, β = +0.25. The interference effects are thus smaller. Bottom right panel:
Superposition of the two waves Ψ = αΨ1 + βΨ2 , with α = +1, β = −1. The net amplitude
is the difference of the amplitudes of the two waves. Notice that the regions of high net
amplitude are different than in the upper right panel.
CHAPTER 2. QUANTUM BITS: QUBITS 46

waves illustrated above, but for our purposes we do not ever need the details
of that. |0⟩ and |1⟩ are abstractions of these wave states and all we have
to know about them is that they represent two distinct energy states of the
physical object holding the quantum bit of information. As we shall see later,
the complex wave amplitudes α and β do control interference effects which
are analogous to, but an abstraction from, those shown in Fig. 2.5. The
analogy for qubits to take away from this is the understanding that quantum
superposition states such as
1
|Ψ+ ⟩ = √ [|0⟩ + |1⟩] , (2.5)
2
and
1
|Ψ− ⟩ = √ [|0⟩ − |1⟩] , (2.6)
2
are physically distinct, just as the wave patterns corresponding to the differ-
ent superpositions shown in Fig. 2.5 are distinct.
Readers familiar with the concept of atomic orbitals in chemistry may
find it useful to think about the waves associated with the ground (S orbital)
and excited (Px orbital) states of the hydrogen atom as illustrated schemat-
ically in Fig. 2.6. The two LCAOs (linear combination of atomic orbitals)
corresponding to |Ψ± ⟩ correspond to distinct SP hybridized bonds sticking
out to the right and left as illustrated in Fig. 2.6. It is important to note
that while these two states of the hydrogen atom could in principle be used
to form a qubit, the actual physical qubits in quantum hardware could be
completely different. The states |0⟩ and |1⟩ are simply abstract representa-
tions of whatever the two lowest energy states of our quantum bit are. The
mathematics describing quantum two-level systems is the same for all qubits
and the implementation details are not needed for discussion of the theory.
Readers unfamiliar with these quantum concepts, need not worry about
them. We will provide the rules of the abstract game as a generalization of
the rules for manipulating classical bits learned in computer science.
Qubits are analog devices in the sense that they have a continuum of
possible superposition states defined by two complex numbers (here α and
β) and yet they are digital in the sense that if you measure which energy
state they are in there are only two possible results for the measurement,
|0⟩ and |1⟩. (That is, the measured energy is always found to be either E0
or E1 .) There is a peculiar asymmetry here–it takes an infinite number of
CHAPTER 2. QUANTUM BITS: QUBITS 47

1 1
0  0 − 1   0 − 1 
2 2

Figure 2.6: Left panel: Schematic illustration of the wave describing the ground state
|0⟩ and lowest excited state |1⟩ of the electron in a hydrogen atom. (Sharp-eyed expert
readers will notice that these are actually the wave functions for a one-dimensional har-
monic oscillator.) Right panel: waves corresponding to two different linear combinations
(superpositions) of the two orbitals. One superposition would be appropriate for forming
a chemical bond sticking out to the right and the other to the left.

bits to specify the quantum state (i.e. specify the complex numbers α, β)
but a measurement of the state always yields only one of two results. Thus
the information gained from the measurement is (at most) one (classical) bit.
How can we reconcile these wildly different continuous and discrete properties
of qubits? It seems reasonable that if α = 1 and β = 0 so that we have the
state
|ψ⟩ = |0⟩, (2.7)
then we should always measure the energy to be E0 and never E1 . Conversely,
if α = 0 and β = 1, so that we have the state
|ψ⟩ = |1⟩, (2.8)
then we should always measure the energy to be E1 and never E0 . But what
happens if we continuously decrease α from 1 to 0 and continuously increase
β from 0 to 1? At what point does the measurement result change from
being E0 to suddenly being E1 ? Given that the change in the superposition
state is continuous, it seems like the measurement results ought to also be
continuous. It turns out that the only way we can reconcile the discreteness
of the measurement results with the continuity of the superposition states is
for the measurement results to be intrinsically and ineluctably random. The
measurement results are still discrete, but the probabilities of obtaining E0
and E1 vary continuously with α and β. Without randomness there is simply
no way to reconcile the continuous behavior and the discrete behavior. This
is the source of randomness in the quantum theory.
CHAPTER 2. QUANTUM BITS: QUBITS 48

If one measures whether the qubit has value 0 or 1, the answer is random
(provided α and β are both non-zero). There is simply no way to predict the
outcome of the measurement. As discussed in Box 2.4, this randomness is
intrinsic to the quantum theory and is not the result of ignorance of the values
of any hidden variables that Alice forgot to set or Bob failed to measure. Max
Born3 argued that α and β should be thought of as wave amplitudes and the
measurement probabilities are given by the wave intensities. That is, the
so-called Born rule states that the measurement yields 0 with probability4
p0 = |α|2 and 1 with probability p1 = |β|2 .
We know from ordinary probability theory that if A and B are mutually
exclusive5 events with probabilities pA , pB respectively, then the probablity
of C = (A OR B) is

pC = pA + p B . (2.9)

From this we have the important inequality

pC ≥ max{pA , pB }. (2.10)

Since the events of measuring 0 and 1 exhaust the set of all possible outcomes,
we have the important constraint that the two measurement probabilities
must add up to unity

|α|2 + |β|2 = 1. (2.11)

We say that the state |ψ⟩ must be ‘normalized.’ To repeat: As α and β vary
continuously, the probabilities p0 , p1 vary continously but the measurement
results are always one of two discrete values, E0 , E1 .
Since classical probablities are positive, we know that if there are two
mutually exclusive ways an event can occur, the overall probability will be
increased. This can be seen in Eq. (2.9). If {A, B, C, D} are four possi-
ble mutually exclusive outcomes but {A, B} are in the category ‘good’ and
3
Interesting trivia item: Max Born (who won the Nobel Prize in 1954) was the grand-
father of British-Australian singer, Dame Olivia Newton John, who co-starred with John
Travolta in the 1978 musical movie, Grease.
4
We are using the following standard notation. If we have a complex number z = x+iy,
we define complex conjugation as z ∗ = x − iy and the magnitude (squared) of the number
as |z|2 = z ∗ z = x2 + y 2 .
5
Mutually exclusive simply means that A and B never both occur on any ‘throw of the
dice.’
CHAPTER 2. QUANTUM BITS: QUBITS 49

{C, D} are in the category ‘bad,’ then there are two ways to have a good
outcome and the probability of such an outcome is given by the sum of the
probabilities of all the different ways a good outcome can occur. Consider
however what would happen if there were two contributions to the quantum
probability amplitudes instead of the probabilities
1
|Ψ⟩ = √ [(α + α′ )|0⟩ + (β + β ′ )|1⟩] . (2.12)
Λ
We interpret the probability amplitudes as (complex-valued) wave-like ampli-
tudes6 . Noting that wave amplitudes can interfere constructively or destruc-
tively, we see that the probability of a particular outcome might be increased
or decreased if there are two contributions to the probability amplitude. We
can have

|α + α′ |2 = |α|2 + |α′ |2 + α∗ α′ + α′∗ α (2.13)

greater than (constructive interference), equal to, or less than (destructive

interference) |α|2 + |α′ |2 . Thus quantum systems can violate the inequality
in Eq. (2.10). Unlike probabilities, probability amplitudes can be negative or
even complex. Thus it is possible that when there are two ways for something
to happen (two ‘paths’ to the same final state) they cancel each other and the
event never occurs! This changes the game completely and we will see later
that such interference effects play a key role in the functioning of quantum
algorithms.
It is important to recognize that an overall ‘global phase’ of the form

eiχ |ψ⟩ = eiχ α|0⟩ + eiχ β|1⟩ (2.14)

(where χ is an arbitrary real number) can have no measurable effect be-

cause the probabilities for different measurement results depend only on the
magnitude squared of the state coefficients (probability amplitudes). Hence,
without loss of generality we can choose the phase angle χ to make the co-
efficient a = eiχ α in front of |0⟩ real. The complex number β = b + ic is
6
Here Λ is a real-valued number called the normalization factor that must be inserted
to guarantee that the probabilities add up to unity. Because of this, there is not really
a concept of translations in Hilbert space. The summation of the two amplitudes shown
here, once normalized actually corresponds to a rotation that keeps the length of the vector
fixed. More on this later in the discussion surrounding Eq. (3.83).)
CHAPTER 2. QUANTUM BITS: QUBITS 50

parameterized by two real numbers c and d. However the normalization con-

straint a2 + b2 + c2 = 1, removes one of these three degrees of freedom. In the
end, the irrelevance of the overall phase and the normalization constraint re-
duce the four real parameters (defining the two complex numbers α, β) down
to two real parameters to specify the state of a single qubit.
The standard parametrization of basis states associated with the quanti-
zation axis ŝ in terms of two (real) angles θ and φ is the following
θ θ
| + ŝ⟩ = cos |0⟩ + eiφ sin |1⟩, (2.15)
2 2
θ θ
| − ŝ⟩ = sin |0⟩ − eiφ cos |1⟩. (2.16)
2 2
The two angles θ and ϕ can be visualized as the polar and azimuthal angles
of a unit vector given in Cartesian coordinates by

ŝ = (x, y, z) = (sin θ cos φ, sin θ sin φ, cos θ) (2.17)

and pointing from the origin to the surface of the unit sphere as illustrated in
Fig. 2.7. The unit vector ŝ is variously referred to as the qubit ‘polarization
vector,’ or the ‘spin’ or ‘spin vector’ of the qubit7 . We will later describe the
meaning of the unit vector on the Bloch sphere in relation to the properties
of the corresponding quantum state and will also explain why half angles
(which are common in spherical trigonometry) appear in the parametrization
of the state in Eqs. (2.16-2.15). We simply note here that the two states of
a classical bit are represented by the Bloch sphere vectors corresponding to
the north pole ŝ = (0, 0, 1) and the south pole ŝ = (0, 0, −1). Quantum bits
can be in states corresponding to an arbitrary point on the sphere.
7
Physics students will know that the word ‘spin’ refers to the fact that certain ele-
mentary particles like the electron carry an intrinsic angular momentum which is a vector
quantity that can point in any direction, and yet when we measure the projection of that
spin vector onto a fixed axis, we always obtain only one of two results (at least for the
electron since it has spin s = 1/2 and therefore 2s + 1 = 2 independent states). The spin
degree of freedom of an electron is therefore a qubit! Other kinds of qubits do not literally
have an angular momentum vector but their superposition states can still be represented
mathematically as if they did.
CHAPTER 2. QUANTUM BITS: QUBITS 51

Figure 2.7: Unit vector corresponding to a point on the Bloch sphere with cartesian
coordinates (x, y, z) = (sin θ cos φ, sin θ sin φ, cos θ). The orientation of this 3D unit vector
can be obtained by starting with the unit vector pointing to the ‘north pole’ of the sphere
and then rotating it in the xz plane by an angle θ around the y axis and then rotating by
an angle φ around the z axis. This unit vector corresponds to the parametrization of the
qubit state given in Eqs. (2.16-2.15) [Figure Credit: Nielsen and Chuang].

Box 2.3. A word about notation. Since |0⟩ and |1⟩ might represent the
ground and first excited states of a quantum system, we will sometimes de-
note them |g⟩ and |e⟩ respectively. Similarly, we may use the orientation
on the Bloch sphere to label the same states | ↑⟩ and | ↓⟩. The state corre-
sponding to (x, y, z) = (1, 0, 0) (or equivalently, θ = π/2, φ = 0) on the Bloch
sphere might be written
1 1
| →⟩ = | + X⟩ = |+⟩ = √ [|0⟩ + |1⟩] = √ [| ↑⟩ + | ↓⟩], (2.18)
2 2
It is a weird feature of quantum spins that a coherent superpostion of up and
down points sideways!
The state corresponding to the diametricaly opposite point on the Bloch
sphere (x, y, z) = (−1, 0, 0) (or equivalently, θ = π/2, φ = π) is
1 1
| ←⟩ = | − X⟩ = |−⟩ = √ [|0⟩ − |1⟩] = √ [| ↑⟩ − | ↓⟩]. (2.19)
2 2
It is important not to confuse −| + X⟩ with | − X⟩. In the former case
the minus sign denotes the quantum amplitude associated with the state
pointing in the +X direction on the Bloch sphere. In the latter case, | − X⟩
is a completely different state corresponding to a different point on the Bloch
sphere.
Similarly, the states corresponding to the points (x, y, z) = (0, ±1, 0) (or
equivalently, θ = π/2, φ = ±π/2) on the Bloch sphere are
1 1
| ± Y ⟩ = | ± i⟩ = √ [|0⟩ ± i|1⟩] = √ [| ↑⟩ ± i| ↓⟩]. (2.20)
2 2
CHAPTER 2. QUANTUM BITS: QUBITS 52

Box 2.4. True Randomness It is very important to understand that the

randomness in quantum mechanics does not arise from our lack of knowledge
of all the variables in the experimental set up. It is not true that the qubit
has a certain value determined by some ‘hidden’ variable that you do not
know about. You can prepare two qubits in exactly the same state and still
get different measurement results! In fact, the qubit does not have a ‘value’
before it is measured. We will see later that assuming that the qubit has a
value before it is measured leads to contradiction with experiment. One can
actually prove experimentally that there are no hidden variables. This leads
to the following clever statement (due to Alexander Korotkov):
‘In quantum mechanics you do not see what you get, you get what you see.’

Importantly, if (using an idealized apparatus) you measure a qubit and

obtain the random result that it is in state |1⟩ (say), and you then measure
the same qubit again, you will get the same value for the second (and all
subsequent) measurement results. This means that the first measurement
has ‘collapsed’ the state. If you start with the state α|0⟩ + β|1⟩ and the first
measurement result happens to be 1, then the state collapses to |1⟩ and stays
there for all subsequent measurements. Conversely, if the first measurement
happens to yield the value 0, then the state collapses to |0⟩ and stays there.
Beginning students find this collapse effect confusing, but don’t worry you
are in good company, as experts do too!
As stated previously, the above results illustrate a curious asymmetry in
quantum information. The state of a qubit requires specification of two real
numbers (the Bloch sphere angles, θ and φ). In principle this means it takes
an infinite number of bits to specify the state. And yet, when we make a
measurement, the state collapses to |0⟩ or |1⟩ and we learn only one bit of
information. This asymmetry has far-reaching implications as we will see
later. In particular it will play an important role in the no-cloning theorem
(which tells us that unknown quantum states cannot be copied). This in
turn is the basis of quantum cryptography.
One might wonder why one would want to build a computer out of qubits
whose measured values seem to be random. The answer (somehow) partly
lies in the ability of a collection of qubits to explore a huge number of configu-
rations simultaneously. Thus a collection of 3 qubits can be in a simultaneous
superposition of 23 states of the form

|ψ⟩ = α0 |000⟩ + α1 |001⟩ + . . . + α7 |111⟩. (2.21)

CHAPTER 2. QUANTUM BITS: QUBITS 53

Thus if we build a quantum computer with an input register of N qubits, we

can give the computer a single giant superposition of all the possible (classi-
cal) inputs we could ever give it. Because the Schrödinger equation describing
the time evolution of the quantum state of the computer is linear, the com-
puter will carry out its calculation on every possible input simultaneously
without difficulty. Thus we have what appears to be exponentially powerful
‘quantum parallelism’ in which the computer carries out 2N computations
at once. Unfortunately, the output register contains a huge superposition
of all 2N answers (results of the calculation). When we measure the output
register the state randomly collapses to only one of the 2N answers and it
seems we lose all the quantum advantage of the parallelism.
It turns out however that all is not necessarily lost. There exist certain
classes of problems which have simple answers and one can use the wave-like
properties of the superpositions, illustrated in Eq. (2.12), to yield construc-
tive interference which ‘focuses’ the probability amplitudes in the output
register onto the desired answer (see Fig. 2.8). The classic example of this
is the Shor algorithm for finding the prime factors of a composite number.
The input to the quantum computer is very simple–the single number to be
factored. The output is also very simple–one of the factors8 . The algorithm
almost perfectly focuses the output amplitude onto one of the factors so that
measurement of the output register gives the correct answer with relatively
high probability9 . Since factoring is a ‘one-way problem’ that is hard to solve
but the solution is easy to check (using a simple classical computation), one
simply runs the algorithm a small number of times and checks the answer
each time. Within a relatively small number of tries, you will be able to verify
that you have found a factor. In this case, where the inputs and outputs are
relatively simple, the power of the quantum computer (presumably) comes
from its ability to explore an exponentially large number of intermediate
states during the course of the calculation.
The Shor algorithm is exponentially faster than the best known classical
algorithm for factoring. (Factoring is believed to be exponentially hard clas-
sically, but this has not actually been rigorously proven.) Other algorithms
offer only polynomial speed up. For example, we will show later that the
8
This is not quite true. It requires some classical postprocessing of the output to obtain
one of the factors.
9
Actually, the algorithm focuses the output onto many different values, but there exists
a fast classical algorithm for finding the desired answer from any one of these outputs using
modular arithmetic.
CHAPTER 2. forQUANTUM
Co-Design Center BITS: QUBITS
Quantum Advantage https://bnl.gov/quantumcenter Quantum.Yale.edu 54

A quantum register can hold an exponentially large superposition of all possible 2 N states
000 ⇒ 000 + 001 − 010 − 011 + 100 + 101 + 110 − 111 INPUT

Apply wave-like
Quantum
destructive interference
computer
to eliminate (many of) the
program
‘wrong’ answers.
000 + 001 − 010 − 011 + 100 + 101 + 110 − 111

Measurement then yields the correct answer with high probability.

OUTPUT

‘programmable diffraction grating’

Figure 2.8: Schematic illustration of the action of a quantum algorithm in focusing the
wave-like quantum input onto the desired answer in the output register. We can think of
the algorithm as a programmable diffraction grating that blocks certain combinations of
waves or changes their phase to modify the resulting interference.

Grover search algorithm

√ can find a given entry in an unordered data base
of size N in about N queries to the (quantum) database. Consider for
example a telephone directory which has the entries in random rather than
alphabetically order. To find your friend’s name in this database classcially
would require examining on average N/2 out of the N total entries. For this
problem, the quantum algorithm is not able to focus all the output amplitude
onto the desired answer, but rather can partially
√ focus the output probability
amplitudes so that they are spread over only N output states, rather√than
all N . Thus measurement of the output register has probability p ∼ 1/ N of
collapsing√onto the desired state. One thus expects to find the correct answer
in about N rather than N/2 attempts as in the classical case. This is not
exponential speed up but it is still an interesting (and surprising) quantum
advantage.
As mentioned previously, it is important to understand that we still do not
yet know the full power of quantum hardware relative to classical hardware.
Many quantum algorithms remain to be discovered and the theoretical classi-
fication of the hardness (complexity) of different tasks on quantum hardware
is not yet complete and remains an important open problem in quantum
computer science. Fig. 2.9 gives some illustrative examples.
The source of the power of quantum computers lies partly in the existence
of superpositions but also in another subtle concept called entanglement that
Co-Design Center for Quantum Advantage https://bnl.gov/quantumcenter Quantum.Yale.edu

CHAPTER 2. QUANTUM BITS: QUBITS 55

Interacting quantum
fermions/bosons;
Quantum Chemistry
Multiplication Classically Quantum Quantum
easy easy hard
Real-time
Factorization evolution of
quantum
systems

Figure 2.9: Schematic illustration of quantum and classical problem complexity. Given a
quantum state (loaded into the input register of a quantum computer) and the Hamilto-
nian (energy function) that determines its time evolution via the Schrödinger equation, 1

a quantum computer can efficiently evolve the state forward in time. Thus quantum dy-
namics is quantum easy. However given the quantum Hamiltonian it may be quantum
hard to find the ground state. [Figure courtesy of Shruti Puri.]

gives qubits ‘spooky’ correlations that can be stronger than is allowed for
classical bits. In turns out that to utilize the full power of entanglement
requires an additional ingredient which goes by the unfortunate name of
‘magic.’ Of course it isn’t magic, it is physics, but it does seem like magic
and Einstein thought that this was sure proof that the quantum theory was
wrong. Ironically, today we use this magic as a daily engineering test to
make sure our quantum computers really are quantum. We will look into
these mysteries in a later section.

Box 2.5. Can a quantum computer do everything a classical com-

puter can do? We will see that quantum computers have special powers,
but a question one may ask is whether a quantum computer could be used as
a classical computer. The answer is yes–the powers of a quantum computer
are a superset of the powers of a classical computer. If you load the in-
put register with a standard basis state (not a superposition of such states),
quantum Toffoli gates will map such states to other standard basis states
without ever creating any quantum superpositions. The action of the Toffoli
gates then is equivalent to those of such gates acting on classical bits and we
know that this is universal for classical computation.
CHAPTER 2. QUANTUM BITS: QUBITS 56

Box 2.6. Church-Turing Thesis In the 1930’s Church and Turing made
foundational contributions to logic and computer science. Turing invented
a (theoretical) prototype computer which is universal–it’s capabilities essen-
tially define what is ‘computable.’ Scott Aaronson summarizes the (physical)
Church-Turing Thesis as saying that every physical process can be simulated
by a Turing machine to any desired precision, and the Extended Church-
Turing Thesis as saying that every physical process can be efficiently simu-
lated by a Turing machine. We now understand that this thesis is false for
quantum processes. There are quantum processes which are exponentially
hard to simulate on classical computers. An interesting question is whether
a quantum Turing machine obeys the Extended Church-Turing thesis. It is
natural to presume that quantum hardware can simulate any physical quan-
tum process. But is quantum mechanics the ultimate theory that describes
all possible physical processes (even those occuring at extreme energies (the
Planck scale) where quantum gravity may become important)? Does the
universe actually obey some other theory which cannot be efficiently simu-
lated on hardware obeying the rules of ordinary quantum mechanics. I for
one, don’t know...
Chapter 3

Introduction to Hilbert Space

We have seen that there exist a continuum of possible encodings (choice of

quantization axis) that can be used to represent and measure the quantum
states of a two-level system (qubit). The choice of quantization axis is rep-
resented by a three dimensional unit vector corresponding to a point on the
surface of the Bloch sphere. However since measurement of a qubit always
yields only one of two possible outcomes, the mathematical representation of
the quantum state of a qubit turns out to be an element of a two- (not three-)
dimensional vector in a complex vector space. The state of a qubit whose
polarization vector (‘spin’) points in a specified direction on the Bloch sphere
can be written as a linear combination of a pair of ‘basis states.’ These basis
states could for example be the standard (‘computational’) basis states |0⟩
and |1⟩

|ψ⟩ = α|0⟩ + β|1⟩, (3.1)

where α and β are complex coefficients known as quantum amplitudes. The

mathematical space of allowed states for a qubit is a two-dimensional vector
space over the field of complex numbers and is referred to as the system’s
Hilbert space. The reader is urged to review the formal definition of
a vector space in Appendix B. In addition, a handy summary of
the equations and mathematical identities used in this chapter is
presented in App. C.
We are used to writing ordinary vectors as ‘row vectors’ (x, y) or (x, y, z)
whose components are real numbers. It is traditional however in quantum
mechanics to represent vectors as (possibly complex) ‘column vectors.’ Thus

57
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 58

for example the standard basis states can be represented as

1
|0⟩ = (3.2)
0

0
|1⟩ = (3.3)
1

which gives for Eq. (3.1)

α
|ψ⟩ = . (3.4)
β

The notation |0⟩, |1⟩ refers to the direction of the polarization vector on the
Bloch sphere, while the notation 0, 1 is the computer science notation for the
bit value of the standard basis states. (See Fig. 2.7.)
As described in Appendix B, it will be useful to define an ‘inner product’
between pairs of vectors in Hilbert space that is analogous to the dot product
of two ordinary vectors1 . The inner product of a pair of vectors is a scalar,
that is it is an element of the field over which the vector space is defined.
Ordinary vectors like the position ⃗r of a particle in three-dimensional space
form a vector space over the field of real numbers and the inner (i.e., dot)
product is a real number. Hilbert space is an abstract space of vectors (quan-
tum states) over the field of complex numbers. Hence the inner product of
two state vectors can be complex. Following common parlance, we will use
the terms ‘inner product’ and ‘overlap’ interchangeably. The inner product
between |ψ⟩ and
′
′ ′ ′ α
|ψ ⟩ = α |0⟩ + β |1⟩ = (3.5)
β′

is defined to be (using the Dirac notation2 discussed in Appendix B and to

be explained further below)
∗ ∗
⟨ψ ′ |ψ⟩ = α′ α + β ′ β. (3.6)
1
Recall that the ordinary dot product of two three-dimensional p vectors ⃗vj = (xj , yj , zj )
is ⃗v1 · ⃗v2 = x1 x2 + y1 y2 + z1 z2 = |⃗v1 ||⃗v2 | cos θ12 , where |⃗vj | = ⃗vj · ⃗vj is the length of the
jth vector and θ12 is the angle between the two vectors.
2
Mathematicians often prefer the notation (ψ ′ , ψ) for the inner product.
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 59

⃗ ·B
For ordinary vectors A ⃗ =B⃗ · A,
⃗ but notice that for a complex vector space
we have to be careful about the order since

⟨ψ ′ |ψ⟩ = (⟨ψ|ψ ′ ⟩)∗ . (3.7)

The complex conjugation means that the inner product of any vector with
itself is real

⟨ψ|ψ⟩ = |α|2 + |β|2 = 1, (3.8)

where the second equality follows from the Born interpretation of the magni-
tude squared of probability amplitudes as measurment outcome probabilities.
Dirac referred to his notation for state vectors as the ‘bracket’ notation.
A quantum states is represented by ‘ket’ |ψ⟩. Associated with this vector is
a dual ‘bra’ vector defined to be the following row vector
†
α∗ β ∗

α
⟨ψ| = = , (3.9)
β

where † indicates the adjoint, that is the complex conjugate of the transpose.
Thus
α∗ β ∗
′
′ α
⟨ψ|ψ ⟩ = = α∗ α′ + β ∗ β ′ , (3.10)
β′

where the last equality follows from the ordinary rules of matrix multiplica-
tion (here applied to non-square matrices).
Even though the inner product can be complex, we can still think of it as
telling us something about the angle between two vectors in Hilbert space.
Thus for example

⟨0|1⟩ = ⟨↑ | ↓⟩ = 0. (3.11)

Notice the important fact that these pairs of vectors in Hilbert space are
orthogonal even though their corresponding qubit polarization vectors on
the Bloch sphere are co-linear (anti-parallel) and thus not orthogonal in the
usual geometric sense. Thus opposite points on the Bloch sphere correspond
to orthogonal vectors in Hilbert space. [This is closely tied to the fact that
half angles appear in Eqs. (2.16-2.15).] If two state vectors are orthogonal
then the states are completely physically distinguishable. A system prepared
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 60

in one of the states can be measured to uniquely determine which of the two
states the system is in.
Of course if the system is in a superposition of two orthogonal states,
the measurement result can be random. Conversely, if a system is in one of
two states that are not orthogonal, it is not possible to reliably determine by
measurement which state the system is in. We shall explore this more deeply
when we discuss measurements.

3.1 Linear Operators on Hilbert Space

An operator defines a mapping from the Hilbert space back into the same
Hilbert space. In quantum mechanics we will deal with linear operators,
which for the case of a single qubit are simply 2x2 matrices that when mul-
tiplied into a column vector representing a state, produce another column
vector representing another state. It turns out that physical observables in
quantum mechanics are represented by Hermitian matrices. We cannot use
numbers to represent physical observables because observables do not have
values before we measure them. Instead observables have the potential to
have different values upon measurement. It turns out that matrices can be
used to represent this peculiar situation. That is matrices (since they contain
many numbers) have the potential to express all the possible measurement
outcomes. This takes some getting used to, so we will proceed with some
examples.
We might for example wish to describe the possibility that the z com-
ponent of the polarization vector on the Bloch sphere could turn out, upon
measurement, to be either one of the two allowed results, +1 or −1. This
physical quantity (the z component of the polarization vector on the Bloch
sphere) is represented by the following linear operator (matrix)

z 1 0
σ =Z= . (3.12)
0 −1

where σ z is the standard physics notation (associated with the z component

of an electron spin vector say) and Z is the standard quantum computer
science notation. We shall use these interchangeably. [Please note that the
superscript z is not an exponent but simply a label. In Exercise 3.4 we
define three so-called Pauli matrices labeled x, y, z as the three components
of a vector whose components are 2x2 matrices.]
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 61

Now notice the following interesting properties of this matrix

z 1 0 1
σ |0⟩ =
0 −1 0

1
= = (+1)|0⟩ (3.13)
0

z 1 0 0
σ |1⟩ =
0 −1 1

0
= = (−1)|1⟩. (3.14)
−1

These equations tell us that |0⟩ is an eigenstate of σ z with eigenvalue +1

and that |1⟩ is an eigenstate of σ z with eigenvalue -1. The eigenvectors of
an operator are those vectors that are mapped into themselves under the
operation, up to a constant factor called the eigenvalue.
It turns out that the possible measurement results for some physical ob-
servable are given by the eigenvalues of the operator corresponding to that
observable. Hence, we see that for the case of σ z the possible measurement
results must be ±1 in agreement with experiment. Furthermore, it turns out
that if a given quantum state is an eigenstate of a particular operator with
eigenvalue λ then the measured value of the observable will not be random,
but in fact will always be exactly λ. Let us consider what happens when a
state is not an eigenstate of an observable. Consider for example

|ψ⟩ = α|0⟩ + β|1⟩ (3.15)

σ z |ψ⟩ = +α|0⟩ − β|1⟩. (3.16)

If α and β are both non-zero, this is clearly not an eigenvector. But this is
consistent with the fact that the measurement result will be random with
probability p0 = |α|2 of being +1 and probability p1 = |β|2 of being -1.
The reader is urged to carefully study Box 3.1 since the topic of random
measurement results and state collapse can be confusing to beginners. A
common confusion among students is the idea that measurement of σ z is
represented mathematical by acting with σ z on the state. This is incorrect.
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 62

Box 3.1. Eigenvalues, Eigenvectors and Measurements Measurement

of the value of a physical observable always yields one of the eigenvalues of
the operator that represents that observable. It is important to note that the
act of measurement of an observable is not simply represented by multiplying
the quantum state |ψ⟩ by the operator corresponding to the observable being
measured (e.g. σ z as in Eq. (3.16)). In particular, σ z |ψ⟩ is not the state of
the system after the measurement of σ z . Instead, the state always randomly
collapses to one of the eigenstates of the operator being measured. Further-
more, the measurement result is always one of the eigenvalues of the operator
being measured and the particular state to which the system collapses is the
eigenvector corresponding to the eigenvalue that is the measurement result.
Thus if the result of the measurement of σ z is +1, the state collapses to

α|0⟩ + β|1⟩ −→ |0⟩, (3.17)

and the quantum amplitudes α, β are permanently ‘lost.’ Conversely, if the

result of the measurement is −1, the state collapses to |1⟩. From the Born
rule, the probabilities that the state collapses to |0⟩ and |1⟩ are respectively

p0 = |α|2 = ⟨ψ|0⟩⟨0|ψ⟩ (3.18)

p1 = |β|2 = ⟨ψ|1⟩⟨1|ψ⟩. (3.19)

How do we know that measurement collapses the state onto one of the
eigenvectors of the operator being measured? This follows from the
experimental fact that a Z measurement (say) that gives a certain result,
will give exactly the same result if we make a second Z measurement. The
most general possible state is a superposition of the two eigenstates of σ z of
the form |ψ⟩ = α|0⟩ + β|1⟩. In order for the second Z measurement to not
be random, we must have (from the Born rule) that |α|2 and |β|2 cannot
both be non-zero. We have to be in one eigenstate or the other so that
the probability of getting the same measurement result is 100%. Thus the
first Z measurement has to collapse the state onto one of the eigenstates of Z.

Shortly, we will encounter other Hermitian operators that are not diagonal in
the standard basis. In order to apply the Born rule for them, it is essential to
re-express the state in the basis of eigenstates of the operator being measured.
See Box 3.3 for discussion of this point.

As discussed further in Box 3.1, measurement collapses the state, so if we

CHAPTER 3. INTRODUCTION TO HILBERT SPACE 63

want to know the average measurement result for the state we have to prepare
many copies and measure each copy only once. The average measurement
result will of course be (+1)p0 + (−1)p1 = |α|2 − |β|2 . Let us compare
this quantity to the so-called ‘expectation value’ of the operator σ z in the
state |ψ⟩ which is defined by the expression ⟨ψ|σ z |ψ⟩. What we mean by
this expression is compute the state σ z |ψ⟩ and then takes its inner product
(‘overlap’) with |ψ⟩:

⟨ψ|σ z |ψ⟩ = (α∗ ⟨0| + β ∗ ⟨1|)(+α|0⟩ − β|1⟩

= +|α|2 ⟨0|0⟩ − α∗ β⟨0|1⟩ + β ∗ α⟨1|0⟩ − |β|2 ⟨1|1⟩
= |α|2 − |β|2 . (3.20)

Or equivalently in matrix notation the ‘expectation value of the observable

σ z in the state |ψ⟩’ is given by

ψ0∗ ψ1∗

z 1 0 ψ0
⟨ψ|σ |ψ⟩ = (3.21)
0 −1 ψ1
= |ψ0 |2 − |ψ1 |2 = |α|2 − |β|2 . (3.22)

Thus we have the nice result that the average measurement result for some
observable is simply the expectation value of the observable (operator) in the
quantum state being measured. We see again that the operator associated
with a physical observable contains information about all possible values that
could ever be measured for that observable. This is why the operator has
to be a matrix. We again emphasize however that individual measurement
results are random and the state after the measurement collapses to the
eigenvector corresponding to the measured eigenvalue. The state after the
measurement has nothing to do with σ z |ψ⟩.
Can we now say something about how random the measurement result
will be? Let us begin by reminding ourselves about some basic facts about
classical random variables. The reader is urged to review the discussion of
probability and statistics in Appendix A. Suppose some random variable
ξ takes on values from the set {v1 , v2 , . . . , vM } and value vj occurs with
probability pj . Then the mean value (also known as the expectation value)
is given by
M
X
ξ¯ = pj vj . (3.23)
j=1
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 64

where the overbar indicates statistical average. One measure of the random-
ness of ξ is its variance defined by
¯ 2 = ξ 2 − ξ¯ 2
Var(ξ) ≡ (ξ − ξ) (3.24)
M
" M
#2
X X
= pj vj2 − pj v j . (3.25)
j=1 j=1

We see that the variance is the mean square deviation of the measured quan-
tity from the average and so is a measure of the size of the random fluctua-
tions in the ξ.
Exercise 3.1. Statistical Variance
a) Derive Eq. (3.25).

b) Assuming that all the eigenvalues vj are distinct (i.e. non-

degenerate), prove that the variance vanishes if and only if one
of the pj ’s is unity and the rest vanish (so that ξ is not random).
If one of the eigenvalues is m-fold degenerate, then the correspond-
ing m probabilities must add up to unity and the rest must vanish.
Let us now turn to the quantum case. Let observable Q have eigenvalues
{v1 , v2 , . . . , vM }. (So far we have only considered the case of a single qubit
for which M = 2. In more general cases (e.g. multiple qubits), the Hilbert
space dimension M can be larger.) The analog of Eq. (3.25) for the variance
of the measurement results for Q in state |ψ⟩ is

Var(Q) = ⟨ψ|Q2 |ψ⟩ − ⟨ψ|Q|ψ⟩2 . (3.26)

Now notice that if |ψ⟩ is an eigenvector of Q (say the mth eigenvector)

Q|ψ⟩ = vm |ψ⟩, (3.27)

then
2
Var(Q) = vm ⟨ψ|ψ⟩ − [vm ⟨ψ|ψ⟩]2 = 0. (3.28)

Thus only superpositions of states (in the basis of eigenvectors appropriate

to the measurement!) with different eigenvalues will have randomness in the
measurement results. This is entirely consistent with the experimental facts
described above for a single qubit.
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 65

Exercise 3.2. Since the eigenvalues of σ z are ±1, we expect that (σ z )2

has both of its eigenvalues equal to unity. From the matrix form of σ z
prove that its square is the identity matrix

z 2 ˆ 1 0
(σ ) = I = .
0 1
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 66

Box 3.2. Observables as Hermitian operators Hermitian matrices (that

is, matrices obeying M † = M , where again † indicates transpose followed by
complex conjugation) are guaranteed to have real eigenvalues and guaranteed
that their eigenvectors form a complete basis set that spans the Hilbert space
(more on what this means later; see also Appendix B). Since the measured
values of physical observables are always real, physical observables are always
represented by Hermitian matrices.
Let us now demonstrate that Hermitian matrices have real eigenvalues and
that their eigenvectors are orthogonal. Consider the jth and kth (normalized)
eigenvectors

H|ψj ⟩ = λj |ψj ⟩, (3.29)

H|ψk ⟩ = λk |ψj ⟩. (3.30)

Taking the adjoint of the second equation we obtain

⟨ψk |H † = λ∗k ⟨ψk |. (3.31)

Assuming H † = H, can now use Eq. (3.29) and Eq. (3.31) to obtain

⟨ψk |H|ψj ⟩ = λj ⟨ψk |ψj ⟩, (3.32)

⟨ψk |H|ψj ⟩ = λ∗k ⟨ψk |ψj ⟩. (3.33)

Subtracting these yields

0 = (λj − λ∗k )⟨ψk |ψj ⟩. (3.34)

For the case j = k, we know that (by construction) ⟨ψj |ψj ⟩ = 1, and hence the
imaginary part of λj must vanish. Thus all the eigenvalues of an Hermitian
operator are real. If for j ̸= k, the eigenvalues are non-degenerate (i.e.,
λj ̸= λk ), then Eq. (3.34) requires ⟨ψk |ψj ⟩ = 0 and so the two eigenvectors
must be orthogonal. Thus if the full spectrum is non-degenerate, the set of
eigenvectors is orthonormal: ⟨ψk |ψj ⟩ = δkj (where δkj is the Kronecker delta
symbol which vanishes if k ̸= j and is unity for k = j).
If M eigenvalues are degenerate, then any superposition of the M eigenvec-
tors is also an eigenvector. If they are not orthogonal, they can be made
orthogonal by taking appropriate linear combinations of the set of M eigen-
vectors (via the so-called Gram-Schmidt procedure).
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 67

3.2 Dirac Notation for Operators

We have seen that in Dirac’s bracket notation, the inner product ⟨Φ|Ψ⟩ is a
(possibly complex) number. As discussed in Appendix B, the so-called outer
product G = |Ψ⟩⟨Φ| turns out to be an operator on the space. To see why,
simply operate with G on an arbitrary state |χ⟩

G|χ⟩ = (|Ψ⟩⟨Φ|) |χ⟩ = |Ψ⟩ (⟨Φ|χ⟩) = g|Ψ⟩, (3.35)

where g is the complex number representing the inner product of |Φ⟩ and
|Ψ⟩,

g = ⟨Φ|χ⟩. (3.36)

Hence when applied to any vector in the Hilbert space, G returns another
vector in the Hilbert space. Thus it is an operator and indeed it is a linear
operator since

G(α|χ⟩ + β|ϕ⟩) = αG|χ⟩ + βG|ϕ⟩. (3.37)

As a specific example, consider the operator

G = |0⟩⟨0| − |1⟩⟨1|. (3.38)

Clearly

G|0⟩ = (+1)|0⟩ (3.39)

G|1⟩ = (−1)|1⟩, (3.40)

so it must be that G = σ z . To confirm this let us write G out in matrix form

1 1 0 0 0 1
G = − (3.41)
0 1

1 0 0 0
= − (3.42)
0 0 0 1

1 0
= , (3.43)
0 −1
where the last result follows from the rules of matrix products given in App. B
and agrees with Eq. (3.12).
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 68

The above result is a particular case of the general fact that in Dirac’s
notation, Hermitian operators take on a very simple form
M
X
V = vj |ψj ⟩⟨ψj |, (3.44)
j=1

where |ψj ⟩ is the jth eigenvector of V with eigenvalue vj .

Exercise 3.3. Show that the representation of the operator V in
Eq. (3.44) correctly reproduces the required property of V that

V |ψm ⟩ = vm |ψm ⟩,

for every eigenvector |ψm ⟩ of V .

Exercise 3.4. The three components of the (dimensionless) spin

(qubit) polarization vector ⃗σ = (σ x , σ y , σ z ) are each physical observ-
ables. The 2 × 2 matrices representing these operators are known as the
Pauli matrices. We have already seen the matrix representation for σ z
in the up/down (Z) basis

z +1 0
σ = . (3.45)
0 −1

Using the known form of the states | ± X⟩ from Eq. (2.18) and Eq. (2.19)
and | ± Y ⟩ in Eq. (2.20), show that in the Z basis

x 0 +1
σ = (3.46)
+1 0

y 0 −i
σ = . (3.47)
+i 0

3.3 Orthonormal bases for qubit states

Using the standard state parametrization in Eqs. (2.16-2.15), it is possible to
show that the quantum states | ± n̂⟩ corresponding to any pair of oppositely
directed unit vectors ±n̂ on the Bloch sphere are orthogonal

⟨−n̂| + n̂⟩ = 0. (3.48)

CHAPTER 3. INTRODUCTION TO HILBERT SPACE 69

Since were are dealing with a Hilbert space of only two dimensions and we
have an orthonormal pair of states in the Hilbert space, they constitute a
complete basis for expressing any vector in the Hilbert space.
Similarly, it is straightforward to derive the so-called completeness rela-
tion
ˆ
| + n̂⟩⟨+n̂| + | − n̂⟩⟨−n̂| = I, (3.49)

where Iˆ is the identity operator whose matrix representation is

ˆ 1 0
I= . (3.50)
0 1
This result is the simplest illustration of the theorem that the eigenvectors
of any Hermitian operator on a Hilbert space form a complete basis for that
space. Furthermore, if the spectrum is non-degenerate (no two eigenvalues
are the same), then the eigenvectors are automatically orthogonal and thus
form a complete orthonormal basis. See Box 3.2 and App. B.
Exercise 3.5. Derive Eq. (3.48).

Exercise 3.6. Derive Eq. (3.49).

The completeness relation is a powerful result because it allows us to
express any state |ψ⟩ in any basis by simply writing
ˆ
|ψ⟩ = I|ψ⟩ = | + n̂⟩⟨+n̂|ψ⟩ + | − n̂⟩⟨−n̂|ψ⟩. (3.51)

We see immediately the coefficients in the expansion of the state |ψ⟩ in the
basis are simply computed as the inner products of the basis vectors with
the state |ψ⟩.
The results above are very reminiscent of how we find the representation of
an ordinary vector, say a position vector in 2D. We may have some orthogonal
basis vectors for our coordinate system, for example x̂, ŷ, or a rotated set î, ĵ.
We can represent any vector as
⃗r = (rx , ry ) = x̂(x̂ · ⃗r) + ŷ(ŷ · ⃗r) (3.52)
= î(î · ⃗r) + ĵ(ĵ · ⃗r). (3.53)
This suggests we can think of the identity transformation as

Iˆ = x̂x̂ + ŷ ŷ = îî + ĵ ĵ, (3.54)

CHAPTER 3. INTRODUCTION TO HILBERT SPACE 70

where we interpret x̂x̂ applied to a vector V̂ to mean x̂(x̂ · V⃗ ). Note the

similarity between Eq. (3.54) and Eq. (3.49).
It is useful at this point to discuss the concepts of projections and projec-
tors. Recall that when we project an ordinary vector ⃗r onto the x axis, we
simply remove all components of ⃗r other than the x component. Thus the
projection of ⃗r onto the x axis is the vector x̂Vx = x̂(x̂ · V⃗ ). We can think of
Px = x̂x̂ as a projection operator (or projector) because

Px V⃗ = (x̂x̂)V⃗ = x̂(x̂ · V ). (3.55)

Notice that Px satisfies the defining characteristic of projection operators

(Px )2 = Px . (3.56)

This has a simple interpretation–once the vector is projected onto the x axis,
further projection doesn’t do anything.
We can think of the shadow of an object cast onto the ground as the pro-
jection of the objection onto the horizontal (xy) plane. This is accomplished
by the projection operator

Pxy = x̂x̂ + ŷ ŷ = Iˆ − ẑ ẑ, (3.57)

where Iˆ = x̂x̂ + ŷ ŷ + x̂ẑ is the identity for the 3D case. This shows us that
projection onto the xy plane simply removes the component of the vector
normal to the plane. Despite this more complicated form, it is straightfor-
ward to show that (Pxy )2 = Pxy as required. The key to this is that the basis
vectors x̂, ŷ, ẑ are orthonormal.
The analogous projector onto the vector |n̂⟩ in the Hilbert space describ-
ing the polarization of a qubit is simply

Pn̂ = |n̂⟩⟨n̂|. (3.58)

It is straightforward to show that this is idempotent Pn̂ Pn̂ = Pn̂ as required.

CHAPTER 3. INTRODUCTION TO HILBERT SPACE 71

Box 3.3. Measurements not in the standard basis Suppose we are

given the (standard basis) state

|Ψ⟩ = α|0⟩ + β|1⟩, (3.59)

and asked to measure an Hermitian operator

M = M+ |n̂⟩⟨n̂| + M− | − n̂⟩⟨−n̂| (3.60)

that is not diagonal in the standard basis (i.e., n̂ ̸= ±ẑ). Clearly the two
eigenvalues and eigenvectors of M are given by

M| + n̂⟩ = M+ | + n̂⟩ (3.61)

M| − n̂⟩ = M− | − n̂⟩. (3.62)

In order to apply the Born rule, it is essential that we express the state in
the basis of the eigenstates of M

|Ψ⟩ = α′ | + n̂⟩ + β ′ | − n̂⟩, (3.63)

from which we determine that measurement result M+ will occur with prob-
ability P+ = |α′ |2 and the state will collapse in this case to | + n̂⟩, while
measurement result M− will occur with probability P− = |β ′ |2 and the state
will collapse in this case to | − n̂⟩.
We can use the completeness relation in Eq. (3.49) to change the basis

|Ψ⟩ = | + n̂⟩⟨+n̂|Ψ⟩ + | − n̂⟩⟨−n̂|Ψ⟩ (3.64)

α′ = ⟨+n̂|Ψ⟩ = α⟨+n̂|0⟩ + β⟨+n̂|1⟩ (3.65)
β ′ = ⟨−n̂|Ψ⟩ = α⟨−n̂|0⟩ + β⟨−n̂|1⟩. (3.66)

Thus, Eq. (3.49) along with the Born rule tells us that means that, given an
arbitrary state |ψ⟩, a measurement asking the question ‘Is the state |Ψ⟩ actu-
ally |n̂⟩’ will be answered ‘yes’ with probability given by |α′ |2 = ⟨ψ|Pn̂ |ψ⟩ =
|⟨+n̂|ψ⟩|2 . Correspondingly the question ‘Is the state |Ψ⟩ actually | + n̂⟩’ will
be answered ‘yes’ with probability given by |β ′ |2 = ⟨ψ|P−n̂ |ψ⟩ = |⟨−n̂|ψ⟩|2 .
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 72

Exercise 3.7. Show that | ± n̂⟩ is an eigenvector of

Γn̂ = n̂ · ⃗σ = (n̂ · x̂)σ x + (n̂ · ŷ)σ y + (n̂ · ẑ)σ z (3.67)

with eigenvalue ±1

Γn̂ | + n̂⟩ = (+1)| + n̂⟩ (3.68)

Γn̂ | − n̂⟩ = (−1)| − n̂⟩. (3.69)

Hint: Show that

Γn̂ = | + n̂⟩⟨n̂| − | − n̂⟩⟨−n̂|. (3.70)

This is simply a reflection of the rotation invariance of space. We can

align our measurement apparatus to measure the qubit polarization in
the arbitrary direction n̂ and the state of the qubit will collapse to either
| + n̂⟩ or | − n̂⟩.

3.3.1 Gauge Invariance

We have seen that any two states at opposite points on the Bloch sphere
form an orthonormal basis that can be used to represent any state in the
2D Hilbert space of a single qubit. One confusing aspect of all this that we
have not yet discussed, is the following. The states | ± n̂⟩ in the Hilbert space
corresponding to the Bloch sphere unit vectors ±n̂ are not unique. Each basis
vector can be multiplied by a different arbitrary phase factor |v+ ⟩ = eiξ+ |+ n̂⟩
and |v− ⟩ = eiξ− | − n̂⟩ and we would still have a perfectly good orthonormal
basis obeying

⟨v+ |v+ ⟩ = ⟨v− |v− ⟩ = 1, (3.71)

⟨v− |v+ ⟩ = 0, (3.72)

and the completeness relation

Iˆ = |v+ ⟩⟨v+ | + |v− ⟩⟨v− |. (3.73)

The particular choice of phase factors cancels out in the above expressions.
The specific representation of one other operator (besides the identity) is also
independent of the so-called ‘gauge’ choice that we make by picking particluar
phase factors. For example, the (diagonal) operator that is measured by an
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 73

apparatus that detects the n̂ component of the qubit polarization vector

n̂ · σ = |v+ ⟩⟨v+ | − |v− ⟩⟨v− | (3.74)
= | + n̂⟩⟨+n̂| − | − n̂⟩⟨−n̂|, (3.75)
is invariant and the probability of the two measurement results is unaffected
by the gauge choice. However non-diagonal operators such as the spin flip
operator
Q̂ = |v+ ⟩⟨v− | + |v− ⟩⟨v+ | (3.76)
= e+i(ξ+ −ξ− ) | + n̂⟩⟨−n̂| + e−i(ξ+ −ξ− ) | − n̂⟩⟨+n̂| (3.77)
are gauge dependent when expressed in the | ± n̂⟩ basis. Of course the state
vectors also become gauge dependent
|ψ⟩ = α|v+ ⟩ + β|v− ⟩ (3.78)
= αeiξ+ | + n̂⟩ + βeiξ− | − n̂⟩, (3.79)
and so nothing changes in the physics.
Interestingly, even the apparently fixed definition for | + n̂⟩ in Eqs. (2.16-
2.16) is ambiguous when it comes to writing down the other basis vector |−n̂⟩.
If n̂ = (sin θ cos φ, sin θ sin φ, cos θ), then we can arrive at the oppositely
directed unit vector −n̂ via two different routes: i) θ → θ + π, φ → φ, or ii)
θ → π − θ, φ → φ + π. Consider for example the case θ = π/2, φ = 0 which
gives
1
|n̂⟩ = | + X⟩ = √ [|0⟩ + |1⟩] . (3.80)
2
Both transformation (i) and (ii) take n̂ to −n̂ on the Bloch sphere but the two
resulting states in the Hilbert space differ by a gauge choice (i.e., a phase)
1
| − n̂⟩ = | − X⟩ = − √ [|0⟩ − |1⟩] Method(i) (3.81)
2
1
= + √ [|0⟩ − |1⟩] Method(ii) (3.82)
2
Method (ii) self-consistently keeps θ in the range 0 ≤ θ ≤ π and is the
standard gauge choice. Going back to Eq. (2.19) and Eq. (2.20), we see that
Method (ii) was used in to define | − X⟩ and | − Y ⟩ and yields the standard
form of the Pauli matrices displayed in Ex. 3.4.
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 74

3.4 Rotations in Hilbert Space

In order to build a quantum computer we need to have complete control over
the quantum states of a system of many qubits. For the moment however let
us consider how we might create an arbitrary state of a single qubit starting
from some fiducial state, typically taken to be either |0⟩ or |1⟩ in the standard
basis (also known as the computational basis). Since the most general state
is a superposition state |n̂⟩ corresponding to an arbitrary point n̂ on the
Bloch sphere, we need to be able to perform rotations in Hilbert space that
correspond to the rotation of the Bloch vector some initial position to any
desired position on the Bloch sphere. Being able to prepare arbitrary states
is a requirement for carrying out a quantum computation.
Because every state in the single- or multi-qubit Hilbert space has the
same length ⟨ψ|ψ⟩ = 1, rotations are the only operations we need to move
throughout the entire Hilbert space. There is (despite our earlier discussion
of interference from adding quantum amplitudes in Eq. (2.13)) no concept
of ‘translations’ in Hilbert space. If we add a vector |ϕ⟩ to the vector |ψ⟩
we of course obtain another element of the complex vector space, but gener-
ically |ψ⟩ + |ϕ⟩ is not properly normalized to unity. One can show that the
normalized version of the state
1
|Λ⟩ = √ (|ψ⟩ + |ϕ⟩) , (3.83)
W
with

W = 2 + ⟨ψ|ϕ⟩ + ⟨ϕ|ψ⟩, (3.84)

can be obtained from the initial state |ψ⟩ via a rotation. The idea is illus-
trated schematically for ordinary 2D vectors in Fig. 3.1.
It turns out that rotations in Hilbert space are executed via unitary op-
erations. A unitary matrix U obeys two defining conditions

U † U = I, (U is an isometry) (3.85)
U U † = I. (U is a coisometry) (3.86)

where I is the identity matrix. Thus, both the left and the right inverse of
U is simply the adjoint U −1 = U † . It turns out that unitary transformations
preserve the inner products between vectors in Hilbert space, just as the
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 75

G
v1
G G
v3 v2
G
v2
G G
G v1 v2
v1

Figure 3.1: Two unit vectors ⃗v1 , ⃗v2 whose end points lie on the unit circle in 2D. The
sum of the vectors ⃗v1 + ⃗v2 (generically) has length squared W = 2 + 2(⃗v1 · ⃗v2 ) ̸= 1. The
normalized vector ⃗v3 = √1W (⃗v1 + ⃗v2 ) does lie on the unit circle and can be obtained by a
rotation applied to either ⃗v1 or ⃗v2 .

more familiar orthogonal rotation matrices preserve the angles (dot products)
between ordinary vectors.3
To better understand rotations consider the representation of an arbitrary
state vector in terms of some orthonormal basis {|j⟩; j = 1, 2, 3, . . . , N } span-
ning the N -dimensional Hilbert space
N
X
|ψ⟩ = ψj |j⟩. (3.87)
j=0

From the Born rule, the probability of measuring the system to be in the basis
state |j⟩ is given by pj = |ψj |2 . The requirement that the total probability
be unity gives us the normalization requirement on the state vector
N
X
|ψj |2 = 1. (3.88)
j=1

Now consider a linear operation U that preserves the length of every vector
3
Recall that the defining property of an orthogonal matrix is that R−1 = RT . Thus a
unitary matrix whose elements are all real is an orthogonal matrix. Unitary matrices are
the natural generalization of orthogonal rotation matrices to vector spaces over the field
of complex numbers. This point is discussed further below in the vicinity of Eq. (3.129).
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 76

in the Hilbert space.

N
X
′
|ψ ⟩ = U |ψ⟩ = ψj U |j⟩, (3.89)
j=1

⟨ψ ′ |ψ ′ ⟩ = ⟨ψ|U † U |ψ⟩ (3.90)

XN
= ψk∗ ψj ⟨k|U † U |j⟩ (3.91)
j,k=1
N
X N
X
= |ψj |2 ⟨j|U † U |j⟩ + ψk∗ ψj ⟨k|U † U |j⟩. (3.92)
j=1 j̸=k

Since U preserves the length of every vector we have

⟨ψ|U † U |ψ⟩ = 1 (3.93)

and

⟨j|U † U |j⟩ = 1 (3.94)

for every basis vector |j⟩. Thus we must have

N
X
ψk∗ ψj ⟨k|U † U |j⟩ = 0 (3.95)
j̸=k

for all possible choices of the set of amplitudes ψj ; j = 1, 2, 3, . . . , N . Thus it

must be that

⟨k|U † U |j⟩ = 0 (3.96)

for k ̸= j and hence

⟨k|U † U |j⟩ = δkj , (3.97)

where the Kronecker delta symbol δjk = 1 for j = k and 0 for j ̸= k.

Equivalently, this means

U † U = I. (3.98)

Thus the only linear operations that conserve probability for all states are
unitary operations. It follows that unitary transformations preserve not only
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 77

the inner product of states with themselves but also preserve the inner prod-
ucts between any pair of states
|ϕ′ ⟩ = U |ϕ⟩ (3.99)
|ψ ′ ⟩ = U |ψ⟩ (3.100)
⟨ϕ′ |ψ ′ ⟩ = ⟨ϕ|U † U ψ⟩ = ⟨ϕ|ψ⟩. (3.101)
It turns out that in an ideal, dissipationless closed quantum system, the
evolution of the system from its initial state at time 0 to its final state at time
t is described by a unitary transformation. We can control the time evolution,
and thus create different unitary operations, by applying control signals to
our quantum system. The specifics of the physics of how this is done for
different systems using laser beams, microwave pulses, magnetic fields, etc.
will not concern us here. We will for now simply postulate that the only
operations available to us to control the quantum system are multiplication
of the starting state by a unitary matrix to effect a rotation in Hilbert space.
This seems reasonable because we are required to conserve total probability.
It turns out that in a deep sense, unitary transformations preserve in-
formation. In a so-called open quantum system that is coupled to its envi-
ronment (also called a ‘bath’), the time evolution of the system plus bath
is unitary but the evolution of the system alone is not because information
about the system can leak into the environment and is no longer conserved
(assuming we cannot access it once it is in the bath).
Being able to rotate states in Hilbert space can also be very useful for
the purposes of measurement. It often happens that the energy eigenstates
of the system (i.e., of the Hamiltonian operator) constitute the only basis in
which measurements can be conveniently made. Typically we represent the
energy eigenstates in terms of the standard basis states |0⟩ and |1⟩. Thus the
Hamiltonian (energy operator) is
E0 + E1 ˆ E1 − E0 z
H= I+ σ , (3.102)
2 2
which has eigenvalues E0 and E1 . If we choose the zero of energy to be half
way between the ground and excited state energies then E0 + E1 = 0 and we
can drop the first term. If we are able to measure the energy (say) then the
preferred measurement operator to which we have access is σ z . If the qubit
is in the state
|ψ⟩ = α|0⟩ + β|0⟩, (3.103)
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 78

and we are able to prepare many copies of this state, a histogram of the
measurement results Z = ±1 plus the Born rule allows us to estimate the
values of |α|2 and |β|2 . We cannot however deduce the relative complex phase
of α and β. (Recall that WLOG we can take α to be real.) To fully determine
the state we need (in general) to be able to measure all three components of
the ‘spin’ vector ⃗σ . If we only have access to measurements of σ z , then full
state ‘tomography’ seems to be impossible. However if we prepend certain
selected rotations of the state before making making the Z measurement we
can achieve our goal. For example a rotation by π/2 around the y axis takes
| + X⟩ to | − Z⟩ and | − X⟩ to | + Z⟩. Similarly a rotation by π/2 around
the x axis takes | + Y ⟩ to | + Z⟩ and | − Y ⟩ to | − Z⟩. Thus we can measure
all three components of the qubit polarization (spin) vector. In fact, we can
rotate any state |n̂⟩ into | + Z⟩ and thus measure the operator n̂ · ⃗σ .
Before we learn how to rotate states in Hilbert space, let us review the
familiar concept of rotations in ordinary space. For example if we start with
an ordinary 3D unit vector on the Bloch sphere
n̂ = xx̂ + y ŷ + z ẑ = [sin θ cos φ x̂ + sin θ sin φ ŷ + cos θ ẑ] , (3.104)
we can rotate it through an angle χ around the z axis to yield a new vector
n̂ ′ = [sin θ cos(φ + χ) x̂ + sin θ sin(φ + χ) ŷ + cos θ ẑ] , (3.105)
which simply corresponds to a transformation of the polar coordinates θ →
θ, φ → φ + χ. If we choose to represent n̂ as a column vector
   
x sin θ cos φ
n̂ =  y  =  sin θ sin φ  , (3.106)
z cos θ
then, using the trigonometric identities cos(θ + χ) = cos(θ) cos(χ) −
sin(θ) sin(χ) and sin(θ + χ) = sin(θ) cos(χ) + cos(θ) sin(χ), it is straight-
forward to show that the new vector n̂ ′ is represented by
 ′   
x x
′ ′ 
n̂ =  y = Rz (χ) y  ,
 (3.107)
′
z z
where Rz (χ) is a 3 × 3 ‘rotation matrix’
 
cos χ − sin χ 0
Rz (χ) =  sin χ cos χ 0  . (3.108)
0 0 1
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 79

Of course, if we follow this by a rotation through angle −χ we must return

the vector to its original orientation. Hence Rz (−χ)Rz (χ) = Iˆ or equivalently
Rz (−χ) = Rz−1 (χ). From the fact that sin and cos are respectively odd and
even in their arguments, it is straightforward to show that

Rz−1 (χ) = RzT (χ). (3.109)

Matrices whose transpose is equal to their inverse are called orthogonal, and
it turns out that rotations (for ordinary vectors) are always represented by
orthogonal matrices.
Two natural properties of rotations are that they preserve the length of
vectors and they preserve the angle between vectors. These facts can be
summarized in the single statement that if ⃗r1 ′ = Rz (χ)⃗r1 and ⃗r2 ′ = Rz (χ)⃗r2 ,
then ⃗r1 ′ · ⃗r2 ′ = ⃗r1 · ⃗r2 . That is, the dot product between any two vectors
(including the case of the the dot product of a vector with itself) is invariant
under rotations. The mathematical source of the preservation of lengths can
be traced back to the fact that the determininant of an orthogonal matrix is
unity.
Let us now turn to rotations in Hilbert space. There, must be some con-
nection to ordinary rotations, because rotation of the qubit spin vector on
the Bloch sphere is an ordinary (3D vector) rotation. However the Hilbert
space is only two-dimensional, and unlike the example above where the stan-
dard basis vectors were x̂, ŷ and ŷ, the standard basis vectors in the Hilbert
space correspond to the orthogonal states | + Z⟩ = |0⟩ and | − Z⟩ = |1⟩.
Furthermore, the inner product in Hilbert space involves complex conjuga-
tion unlike the case of the dot product for ordinary vectors. Hence we expect
that rotation operations in Hilbert space will not look like 3D rotations of
the unit vectors on the Bloch sphere.
Let us begin with rotations around the z axis. We know from the example
above, that rotation of the 3D vector n̂ on the Bloch sphere by an angle χ
around the z axis, simply corresponds to the transformation φ → φ + χ.
From the standard quantum state representation in Eqs. (2.16-2.15), we see
that this simply changes the relative phase of the coefficients of |0⟩ and |1⟩.
Let us therefore consider the following operator on the Hilbert space which
does something very similar
χ z
Uz (χ) = e−i 2 σ . (3.110)

What do we mean by the exponential of a matrix? One way to interpret this

CHAPTER 3. INTRODUCTION TO HILBERT SPACE 80

expression is via a power series expansion

∞
X 1 h χ in z n
Uz (χ) = −i [σ ] . (3.111)
n=0
n! 2

The series should converge provided that |χ| 2

|σ z | lies within the radius of
convergence of the series expansion. Here |σ z | means the absolute value of
the largest eigenvalue of the matrix. Since |σ z | = 1 and since the exponential
is an entire function (it is analytic everywhere in the complex plane and
hence the series has infinite radius of convergence), the series does converge4 .
Using the result of Ex. 3.2, we can rewrite Eq. (3.111) as
X 1 h χ in X 1 h χ in
−i χ σz ˆ
Uz (χ) = e 2 = −i I+ −i σz
n= even
n! 2 n= odd
n! 2
χ χ
= cos Iˆ − i sin σz . (3.112)
2 2
4
For infinite-dimensional matrices with unbounded eigenvalues, one has to be more
careful with this analysis.
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 81

Exercise 3.8. a) Prove that

e−iθ = cos θ − i sin θ (3.113)

without using the power series expansion. Hint: define the func-
tions

P (θ) = cos θ − i sin θ, (3.114)

−iθ
Q(θ) = −e , (3.115)

and show that they obey the same first-order differential equation

d
P (θ) = −iP (θ), (3.116)
dθ
d
Q(θ) = −iQ(θ) (3.117)
dθ

and have the same initial condition

P (0) = Q(0) = 1. (3.118)

b) Use the same method to prove that

Thus the 2 × 2 complex matrix Uz (χ) correctly rotates the quantum state by
an angle χ around the z axis of the Bloch sphere. Notice however that the
resulting state differs from the standard state by an (irrelevant) global phase
factor. This is because we made the arbitrary choice to have the standard
state parametrization for |n̂⟩ yield a coefficient of |0⟩ that is purely real so
that the only complex amplitude is found in the coefficient of |1⟩.
Notice that Uz (−χ) = Uz−1 (χ) = Uz† (χ). Hence Uz (χ) is unitary, meaning
that
ˆ
Uz† Uz = I. (3.127)

All rotations in Hilbert space are unitary and, as discussed earlier, it is

straightforward to show that any unitary transformation U preserves the
inner product in the Hilbert space. Thus unitary matrices for complex vector
spaces are the analog of orthogonal matrices for real vector spaces. The
inverse of an orthogonal matrix is its transpose

O−1 = OT . (3.128)

The inverse of a unitary matrix is its conjugate transpose

∗
U −1 = U † = U T , (3.129)

which is the natural generalization of orthogonal rotation matrices for vector

spaces over the reals to complex vector spaces.
Exercise 3.9. Prove that the Pauli matrices σ x , σ y , σ z are both Her-
mitian and unitary. This fact can be confusing for beginning students.
Because the Pauli matrices are Hermitian they correspond to physical
observables that can be measured. Because they are unitary they also
correspond to rotation operations that can be applied to quantum states.
These are two very different things.

Exercise 3.10. a) Prove that every operator that is both Hermi-

tian and unitary squares to the identity. Using this, prove that if
such operators are traceless, their spectrum contains only +1 and
−1 and has equal numbers of each.

b) Conversely, prove the following lemma: Every operator that both

squares to the identity and is Hermitian, is necessarily unitary.
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 83

Exercise 3.11. Every N × N unitary matrix can be written in the

form

U = eiθM̂ , (3.130)

where θ is a real parameter and M̂ is an N × N Hermitian matrix.

a) Using the fact that the eigenvectors of an Hermitian matrix form

a complete basis, find the spectrum (set of eigenvalues) and eigen-
vectors of U in terms of the eigenvalues and eigenvectors in terms
of M̂ .

b) Using this result, prove that U is unitary.

So far we have only considered rotations around the z axis of the Bloch
sphere. By analogy with Eq. (3.110), we can rotate the state around the x
axis through an angle χ via the unitary
χ x
Ux (χ) = e−i 2 σ , (3.131)

around the y axis via

χ y
Uy (χ) = e−i 2 σ , (3.132)

and around an arbitrary ω̂ axis via

χ
Uω̂ (χ) = e−i 2 ω̂·⃗σ , (3.133)

where

ω̂ · ⃗σ = ωx σ x + ωy σ z + ωz σ z (3.134)

is a linear combination of the three Pauli matrices.5

Mathematically, the three components of the qubit spin polarization S ⃗=
1 x y z
2
(σ , σ , σ ) are the generators of (the Lie group of) rotations in spin space
(i.e., rotations on the Bloch sphere). As an example, let us consider a rotation
by angle π/2 around the y axis to see how the cardinal points on the Bloch
5
It is important to note that since the Pauli matrices do not commute with each other,
the exponential of the sum of Pauli matrices cannot be written as a simple product of the
individual exponentials.
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 84

sphere transform into each other as expected:

π h πˆ π yi 1
Uy ( )| + Z⟩ = cos I − i sin σ |0⟩ = √ [|0⟩ + |1⟩]
2 4 4 2
= +| + X⟩, (3.135)
π
Uy ( )| − Z⟩ = −| − X⟩ (3.136)
2
π
Uy ( )| + X⟩ = +| − Z⟩ (3.137)
2
π
Uy ( )| − X⟩ = +| + Z⟩ (3.138)
2
where we have used the fact that

y 0 −i 1 0
−iσ |0⟩ = −i = = |1⟩ (3.139)
+i 0 0 1

y 0 −i 0 1
−iσ |1⟩ = −i =− = −|0⟩ (3.140)
+i 0 1 0

Exercise 3.12. Prove that

π
Ux ( )| + Z⟩ = |−Y⟩ (3.141)
2
π
Ux ( )| − Z⟩ = −i| + Y ⟩. (3.142)
2
SMG NEEDS WORK: This discussion needs to be improved. Most gen-
eral unitary requires three angles like Euler. Finding the unitary that takes
|0⟩ to an arbitrary state is not the most general unitary!
Readers familiar with classical mechanics may be aware that to rotate an
ordinary physical object in 3 dimensions from an initial configuration to an
arbitrary final configuration (generically) requires specification of 3 so-called
Euler angles. For single qubit quantum states, we need only specify two
rotations to rotate the initial state |0⟩ to an arbitrary final state | + n̂⟩. We
could for example rotate by angle θ around the y axis and then by angle ϕ
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 85

around the z axis. This would yield

Thus we obtain the desired final state up to an irrelevant global phase. We

could get rid of this phase by a third and final rotation around the n̂ axis
which would correspond to the 3rd Euler angle in the classical case. This
shows us that the phase in question is not important we are only looking at
state | + n̂⟩ or | − n̂⟩ since the final rotation just changes the global phase.
However if we are looking at a superposition of these two states, the spin
vector is point in some direction not co-linear with n̂ and the final rotation
changes the spin vector since it produces an important relative phase between
the two states in the superposition.
Some authors choose to define the Z rotation unitary to be
φ φ
U1 (φ) = e−i 2 [Z−I] = e+i 2 Uz (φ), (3.149)

so that the entire relative phase is achieved by changing the phase only of
the |1⟩

U1 (φ)|0⟩ = |0⟩, (3.150)

U1 (φ)|1⟩ = eiφ |1⟩ (3.151)

to be consistent with the phase choice made in the definition of | + n̂⟩ and
we have
θ
U1 (φ)e−i 2 Y |0⟩ = | + n̂⟩. (3.152)
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 86

3.4.1 The Hadamard Gate

We see from Eqs. (3.135-3.138) that a rotation by π/2 around the y axis
interchanges the states ±Z and ±X, but not in a simple way. In order to swap
the x and z components of the polarization vector without the inconvenience
of the various minus signs in Eqs. (3.136-3.138), quantum computer scientists
like to invoke the Hadamard gate
H = | + X⟩⟨+Z| + | − X⟩⟨−Z| (3.153)

1 x z 1 1 1
= √ (σ + σ ) = √ (3.154)
2 2 1 −1
which has the following nice properties

H|0⟩ = |+⟩ (3.155)

H|1⟩ = |−⟩ (3.156)
H|+⟩ = |0⟩ (3.157)
H|−⟩ = |1⟩. (3.158)

Exercise 3.13. Using Eq. (3.153), prove that the Hadamard gate obeys
a) H = | + Z⟩⟨+X| + | − Z⟩⟨−X|,

b) H 2 = I,

c) H is unitary.

d) H is a rotation of the general form given in Eq. (3.133). Find the

parameters χ and ω̂.

3.5 Hilbert Space and Operators for Multiple

Qubits
So far we have been focused on the two-dimensional complex vector space
describing the quantum states of a single spin or qubit. We turn now to the
study of how to extend this to two qubits and ultimately to N qubits. If
we have a single qubit, the standard basis has two states, |0⟩ and |1⟩ and
the qubit can be in any linear superposition of those two basis states. If we
have two qubits, we need four basis states: |11⟩, |10⟩, |0, 1⟩, |00⟩. We see that
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 87

the states are labeled by the binary numbers 00, 01, 10, 11 corresponding to
the base-10 numbers 0, 1, 2, 3 just as in a classical computer memory that
contains only two bits. Our quantum bits can however be in a superposition
of all four states of the form

|ψ⟩ = ψ11 |11⟩ + ψ10 |10⟩ + ψ01 |01⟩ + ψ00 |00⟩, (3.159)

or equivalently a column vector of length four

 
ψ00
 ψ01 
|ψ⟩ = 
 ψ10  .
 (3.160)
ψ11

Note that the choice of how to order the entries in the column is completely
arbitrary, but once you make a choice you must stick to it for all your cal-
culations. For the case of N qubits, all of the above generalizes to a vector
space of dimension 2N . We shall always use the standard ordering which
lists the entries in the column according to the binary numbering order used
above.
We have so far seen two kinds of products for vectors, the inner product
⟨ψ|ϕ⟩ which is a scalar (complex number), and the outer product |ϕ⟩⟨ψ|
which is an operator that has a matrix representation. When it comes to
thinking about the quantum states of a composite physical system consisting
of multiple qubits, we have to deal with yet another kind of product, the
tensor product. The tensor product of the Hilbert space H1 of the first qubit
with that of a second qubit H2 yields a new Hilbert space H1 ⊗ H2 whose
dimension is the product of the dimensions of the two individual Hilbert
spaces. The two-qubit basis states in Eq. (3.159) can be thought of as tensor
products of individual vectors,

|01⟩ = |0⟩ ⊗ |1⟩. (3.161)

The tensor product of two column vectors of length n1 and n2 is a column

vector of length n1 n2 . For example, the two-qubit states in the standard
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 88

basis are
   
1 1
1
1 1 
0 =
  0 
|00⟩ = |0⟩ ⊗ |0⟩ = ⊗ = (3.162)
0 0  1   0 
0
0 0
   
0 0
1
1 0 
1 =
  1 
|01⟩ = |0⟩ ⊗ |1⟩ = ⊗ = (3.163)
0 1  0   0 
0
1 0
   
1 0
0
0 1 
0 =
  0 
|10⟩ = |1⟩ ⊗ |0⟩ = ⊗ = (3.164)
1 0  1   1 
1
0 0
   
0 0
0
0 0 
1 =
  0 
|11⟩ = |1⟩ ⊗ |1⟩ = ⊗ = (3.165)
.
1 1  0   0 
1
1 1
Notice that we compute the tensor product of two vectors by inserting the
second vector (i.e., the one on the right) in the product into the first vector as
illustrated in the expressions in the fourth column of the array of equations
above. [Note: It is crucial that you maintain the correct ordering convention.]
For a general pair of two-qubit states, the tensor product is
   
u 0 v 0 u0
 v0 u1

v0 u0  =  v0 u1  .
  
⊗ = (3.166)
v1 u1  u0   v1 u0 
v1
u1 v1 u1

How do we represent operators in this multi-qubit space? Physicists some-

times put labels on the operators to denote which qubit they are acting on.
For example, the x component of the spin vector of the jth qubit is denoted
σjx , with j = 0, 1. We will find it convenient to label the qubits in an N -qubit
register starting with 0 for the right most qubit and ending with N − 1 for
the left most qubit like this

|qN −1 . . . q2 q1 q0 ⟩. (3.167)
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 89

Using this convention, we have for example

σ0x |00⟩ = |01⟩ (3.168)

σ1x |00⟩ = |10⟩ (3.169)
σ1x σ0x |00⟩ = |11⟩ (3.170)
σ1z σ1x |00⟩ = σ1z |10⟩ = −|10⟩. (3.171)

More formally however, we need to recognize that the Hilbert space for
two qubits is the tensor product of their individual Hilbert spaces and there-
fore states are represented as column vectors of length 4. This means that all
linear operators are represented by 4 × 4 matrices. Therefore we should write
two-qubit operators as the Kronecker product (also known confusingly and
sloppily as the outer, direct or tensor product) of the 2×2 matrices represent-
ing the individual qubit operators. [AUTHOR NOTE: NEED TO DEFINE
ALL THESE CONCEPTS EXPLICITLY AND ACCURATELY HERE OR
IN APP. B. DO ANY OF YOU MATH MAJORS HAVE SUGGESTIONS?]
The Kronecker product of a matrix with dimension n1 ×n1 with a matrix with
dimension n2 × n2 is a larger matrix of dimension (n1 n2 ) × (n1 n2 ). This is of
course consistent with the fact that the dimension of the Kronecker product
Hilbert space is n1 n2 . As an example of a Kronecker product consider the
matrix representation of the operator σ1z

z 0 +1 0 +1 0
σ ⊗σ = ⊗ . (3.172)
0 −1 0 +1

This is the formal way of representing σ1z that acts on qubit 1 with the Pauli
Z operator and does nothing to the 0th qubit (applies the identity). Notice
that we have removed the qubit labels because the position of the operator
in the Kronecker product implies which qubit it acts on. The rule for how
you write out the entries to this 4 × 4 matrix depends on exactly how you
order the terms in the column vector in Eq. (3.160). It is clear however that
we want the following to be true (assuming we number the qubits in |01⟩
from right to left inside the Dirac ket)

σ z ⊗ σ 0 |00⟩ = (+1)|00⟩ (3.173)

σ z ⊗ σ 0 |01⟩ = (+1)|01⟩ (3.174)
σ z ⊗ σ 0 |10⟩ = (−1)|10⟩ (3.175)
σ z ⊗ σ 0 |11⟩ = (−1)|11⟩, (3.176)
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 90

and for the operator σ0z we have

σ 0 ⊗ σ z |00⟩ = (+1)|00⟩ (3.177)
σ 0 ⊗ σ z |01⟩ = (−1)|01⟩ (3.178)
σ 0 ⊗ σ z |10⟩ = (+1)|10⟩ (3.179)
σ 0 ⊗ σ z |11⟩ = (−1)|11⟩. (3.180)
Using the ordering of the basis states in the column vector representation
in Eq. (3.160), the outer product in Eq. (3.172) is represented by the matrix
 
+1 0 0 0
 0 +1 0 0 
σz ⊗ σ0 =  . (3.181)
 0 0 −1 0 
0 0 0 −1
As a convenient mnemonic, think of the Kronecker product as producing
a 2 × 2 array whose entries are the second matrix in the product (the 2 × 2
identity matrices in the above example), each multiplied by one of the four
entries in the first matrix in the product (the 2 × 2 σ1z matrix in the example
above)

z 0 +1 0 +1 0
σ ⊗σ = ⊗ (3.182)
0 −1 0 +1
 
+1 0 +1 0
 (+1) 0 +1
(0)
0 +1 

=   (3.183)
+1 0 +1 0 
(0) (−1)
0 +1 0 +1
 
+1 0 0 0
 0 +1 0 0 
=  , (3.184)
 0 0 −1 0 
0 0 0 −1
in agreement with Eq. (3.181). Thus the procedure for computing the Kro-
necker product of two matrices is directly analogous to the procedure for
computing the tensor product of two state vectors.
Reversing the order of the operators in the product gives
 
+1 0 0 0
 0 −1 0 0 
σ0 ⊗ σz = 
 0
. (3.185)
0 +1 0 
0 0 0 −1
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 91

We can think of this as four copies of the σ z matrix each multiplied by the
appropriate entry in the identity matrix.
We can also consider the sum (Kronecker sum) of two operator–not to
be confused with the ‘direct sum’ which is different). For example, the z
component of the total spin vector is given by

S z ≡ σ1z + σ0z = σ z ⊕ σ z , (3.186)

is represented in matrix form by

 
+2 0 0 0
 0 0 0 0 
S z = σz ⊗ σ0 + σ0 ⊗ σz = 
 0
. (3.187)
0 0 0 
0 0 0 −2

This matrix form makes sense because it says that

S z |00⟩ = (+2)|00⟩
S z |01⟩ = (+0)|01⟩
S z |10⟩ = (+0)|10⟩
S z |11⟩ = (−2)|11⟩.
All this notation can be a bit confusing. To summarize: the symbols ⊗
and ⊕ refer to Kronecker product and Kronecker sum (not the ‘direct’ sum,
which is different)
https://en.wikipedia.org/wiki/Matrix_addition,
although confusingly the Kronecker product is sometimes referred to as the
matrix direct product,
https://en.wikipedia.org/wiki/Kronecker_product.
In Wolfram Mathematica ® , the following commands will produce Kro-
necker products of operators
II = IdentityMatrix[2];
MatrixForm[II]
Z = {{1, 0}, {0, -1}};
MatrixForm[Z]
ZI = KroneckerProduct[Z, II];
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 92

MatrixForm[ZI]
IZ = KroneckerProduct[II,Z];
MatrixForm[IZ]
MatrixForm[IZ + ZI]

and the following commands will produce operators from Kronecker products
of their eigenvectors

plusystate = {1, I}/Sqrt[2]; minusystate = {1, -I}/Sqrt[2];

Y = MatrixForm[
KroneckerProduct[plusystate, Conjugate[plusystate]] -
KroneckerProduct[minusystate, Conjugate[minusystate]]]

The above is equivalent to

(1 +i)∗ (1 −i)∗

1 1 1 1
Y = − (3.188)
2 +i 2 −i

Exercise 3.14. The Kronecker products of operators we have been

studying have simple distributive properties. Show that for general qubit
states |ψ⟩ and |ϕ⟩

(σ z ⊗ σ x ) (|ψ⟩ ⊗ |ϕ⟩) = (σ z |ψ⟩) ⊗ (σ x |ϕ⟩) .

Hint: Do this using the matrix and column vector representations of

each of the terms on each side of the equation.

Exercise 3.15. Write out the matrix representation of the following

two-qubit Pauli matrices

a) σ1x σ0x = σ x ⊗ σ x , where the subscripts 0 and 1 refer to which qubit

the operator acts upon.
ˆ + (Iˆ ⊗ σ x ).
b) σ1x + σ0x = σ x ⊕ σ x = (σ x ⊗ I)

1 1 0
(σ10 + σ1z ) + (σ00 + σ0z ) = σ0 + σz ⊕ σ + σz

2 2
h i 1h i
σ + σ ⊗ Iˆ +
0 z
Iˆ ⊗ σ 0 + σ z .

=
2
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 93

Box 3.4. The No-cloning Theorem The no-cloning theorem [1] states
that it is impossible to make a copy of an unknown quantum state.
The essential idea of the no-cloning theorem is that in order to make a copy
of an unknown quantum state, you would have to measure it to see what the
state is and then use that knowledge to make the copy. However measurement
of the state produces in general a random measurement result and random
back action (state collapse) and thus it is not possible to fully determine the
state. This is a reflection of the fact that measurement of a qubit yields one
classical bit of information which is not enough in general to fully specify the
state via its co-latitude θ and longitude φ on the Bloch sphere.
Of course if you have prior knowledge, such as the fact that the state is an
eigenstate of σ x , then a measurement of σ x tells you the ±1 eigenvalue and
hence the | ± X⟩ state. The measurement gives you one additional classical
bit of information which is all you need to have complete knowledge of the
state.
A more formal statement of the no-cloning theorem is the following. Given
an unknown state |ψ⟩ = α|0⟩ + β|1⟩ and an ancilla qubit initially prepared
in a definite state (e.g. |1⟩), there does not exist a unitary operation U that
will take the initial joint state

|Φ⟩ = [α|0⟩ + β|1⟩] ⊗ |1⟩ (3.189)

to the final state

U |Φ⟩ = [α|0⟩ + β|1⟩] ⊗ [α|0⟩ + β|1⟩] ,

= α2 |00⟩ + αβ[|01⟩ + |10⟩] + β 2 |11⟩. (3.190)

unless U depends on α and β.

The proof is straightforward. The RHS of Eq. (3.189) is linear in α and β,
whereas the RHS of Eq. (3.190) is quadratic. This is impossible unless U
depends on α and β. For an unknown state, we do not know α and β and
therefore cannot construct U .

Exercise 3.16. Assuming you have knowledge of α and β, construct

an explicit unitary U (α, β) that carries out the cloning operation in
Eq. (3.190).
CHAPTER 3. INTRODUCTION TO HILBERT SPACE 94

Exercise 3.17. Bob attempts to clone an unknown state |ψ⟩ = |n̂⟩

that Alice has given him. Its orientation n̂ could be anywhere on the
Bloch sphere with equal a priori probability. Bob chooses to measure the
qubit along an arbitrary measurement axis m̂ by measuring M = m̂ · ⃗σ .
Based on the measurement result Bob guesses that the qubit was in
state | + m̂⟩ (if the measurement result was +1) or state | − m̂⟩ (if
the measurement result was −1). Knowing this result Bob can then
make an arbitrary number of approximate clones. What is the average
fidelity of this approximate cloning process? Hint: WLOG you can take
m̂ = (0, 0, 1).
The reader is reminder that a handy summary of the key equations in
this chapter is provided in App. C.
Chapter 4

Two-Qubit Gates and

Multi-Qubit Entanglement

4.1 Introduction: Multi-Qubit Operations

and Measurements
We learned in Chap. 3 that the physical operations we are allowed to carry
out (besides measurement) are represented by unitary matrices acting on
state vectors. These operations are rotations in Hilbert space and transform
one state into another. Indeed, this is all a quantum computer can do. You
load a quantum state into the N qubits of the input register, carry out a
unitary rotation in the 2N -dimensional Hilbert space, and place the resulting
state into the N qubits of the output register. Measure the state of the
output register and you acquire N classical bits of information.1 Unitary
operations can be combined with measurements in interesting ways. After
carrying out a unitary U0 , one could measure a subset of say m qubits. The
measurement will yield m classical bits of information corresponding to one
instance out of 2m possible results. Conditioned on this, one can choose which
1
As noted earlier, N classical bits is far less information than it takes to specify a
generic N -qubit state. Thus generically the measurements give random results as the
measured state collapses. However certain algorithms, e.g. Shor’s factoring algorithm use
quantum interference to ‘focus’ the amplitudes in the output state that it consists of a
superposition of only a small number of states in the standard basis and thus the result
is not very random. Repeating the algorithm a small number of times yields the desired
prime factors with high probability.

95
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT96

of 2m subsequenty unitaries Uj from the set U1 , U2 , . . . , U2m to apply. The

net effect of these three steps is the most general possible quantum operation
and is not equivalent to a unitary transformation. Such quantum operations
can be more powerful for certain purposes (e.g. quantum error correction).

4.2 Multi-Qubit Measurements

We have already studied measurements on a single qubit where the Born rule
tells us that a measurement of M̂ = ⃗σ · n̂ = (| + n̂⟩⟨+n̂|) − (| − n̂⟩⟨−n̂|) (i.e.,
made in the | ± n̂⟩ basis) on state

|ψ⟩ = α| + n̂⟩ + β| − n̂⟩, (4.1)

yields as a measurement result the M = 1 eigenvalue of M̂ with probability

|α|2 and the M = −1 eigenvalue with probability |β|2 . Recall that in order to
apply the Born rule to determine the measurement probabilities, it is essential
to re-express the given state in the basis of eigenvectors of the operator
being measured. The state collapses to the eigenvector corresponding to the
measured eigenvalue.
Multi-qubit measurements follow the same rule. To apply the Born rule
you must express the state in the eigenbasis of the multi-qubit operator being
measured. For simplicity, we will focus on measurements of operators that
are diagonal in the standard basis, but keep in mind that if you are measuring
non-diagonal operators, you need to re-express the multi-qubit state in terms
of the eigenvectors of the operator being measured.
As preparation for the discussion below, the reader is encouraged to re-
view the discussion of joint probability distributions of multiple variables in
Sec. A.4 in Appendix A. For simplicity let us consider a generic two-qubit
state in the standard basis

|ψ⟩ = α00 |00⟩ + α01 |01⟩ + α10 |10⟩ + α11 |11⟩. (4.2)

Measurement of the state of each of the two qubits yields result (x, y) (where
x ∈ {0, 1} and y ∈ {0, 1}) with Born-rule probability P (x, y) = |αxy |2 and
the two-qubit state correspondingly collapses to |xy⟩.
Because we have more than one qubit, we can now ask a new question:
What happens if we only measure one of the qubits but not the other? Sup-
pose we only measure q0 (qubit 0, numbering the qubits from right to left as
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT97

usual). How does the state collapse? To answer this, let us rewrite Eq. (4.2)
in the following form which has the state of qubit 0 factored out on the right:

|ψ⟩ = α00 |0⟩ + α10 |1⟩ |0⟩ + α01 |0⟩ + α11 |1⟩ |1⟩. (4.3)

Let us measure q0 in the standard basis using the operator B̂0 = Iˆ ⊗ |1⟩⟨1|.
This operator has a doubly degenerate eigenvalue of 0 with eigenvectors of
the form

|ϕ0 ⟩ = |µ⟩|0⟩, (4.4)

where |µ⟩, the state of q1 , is arbitrary (|0⟩, |1⟩ or any linear combination).
The operator also has a doubly degenerate eigenvalue of 1 with eigenvectors
of the form

|ϕ0 ⟩ = |ν⟩|1⟩, (4.5)

where |ν⟩ is arbitrary.

Hence if we obtain the measurement result 1, the two-qubit state has to
collapse to a new state consistent with that result, namely
1
|ψ1 ⟩ = p α01 |0⟩ + α11 |1⟩ |1⟩, (4.6)
P0 (1)

where the square root factor produces the correct normalization for the new
state and

P0 (1) = |α01 |2 + |α11 |2 = P (0, 1) + P (1, 1) (4.7)

is the Born rule probability for obtaining measurement result 1 on qubit 0.

The sum reflects the fact that there are two independent ways to obtain
this measurement result and their probabilities add. (We do not add the
probability amplitudes before squaring because the two states are orthogonal
rather than identical.)
Similarly if we obtain the measurement result 0, the two-qubit state col-
lapses to
1
|ψ0 ⟩ = p α00 |0⟩ + α10 |1⟩ |0⟩, (4.8)
P0 (0)
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT98

where

P0 (0) = |α00 |2 + |α10 |2 . (4.9)

In both cases, we see that measurement of q0 produces only a ‘partial

collapse’ of the state that leaves q1 still in a superposition state (dependent
on the measurement outcome). This concept of partial collapse will be very
important when we study Simon’s algorithm and also when we study quan-
tum error correction. The idea behind quantum error correction is that we
can make a measurement that that collapses a multi-qubit state enough to
tell us what error occurred but does not collapse the state enough to destroy
the quantum information being stored in that multi-qubit state, thereby al-
lowing us to correct the error without learning anything about the state in
which the error occurred! Much more on this later.
Having seen the result of measuring q0 and finding that q1 has been left in
a superposition state, we can now investigate what happens when we subse-
quently measure q1 (and thus complete the full collapse of the state). Suppose
that the measurement of q0 yielded the result 1 so that the partially collapsed
state is given by Eq. 4.6. From this state we deduce that a measurement of
q1 conditioned on the measurement of q0 having previously yielded 1, obtains
result 0 with probability
1
P (0|1) = |α01 |2 . (4.10)
P0 (1)
Thus the overall probability P (0, 1) to obtain 1 for q0 and 0 for q1 is P0 (1), the
probability to obtain 1 for the q0 measurement, times the conditional proba-
bility P (0|1) for obtaining 0 for measurement of q1 given that we previously
obtained 1 for q0 :

P (0, 1) = P (0|1)P0 (1) = |α01 |2 . (4.11)

This of course is exactly the Born-rule probability we obtained for the result
(0, 1) from simultaneous measurement of these two commuting observables.
One can summarize all this more formally in the following way. Suppose
that we have a n n-qubit system in state |ψ⟩ and we measure a single qubit
qj to be in state |bj ⟩. Then the state of the system (partially) collapses to

P̂j (bj )|ψ⟩

|ψ ′ ⟩ = , (4.12)
⟨ψ|P̂j (bj )|ψ⟩1/2
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT99

where the projector is onto the subspace of the Hilbert space that is consistent
with the measurement result (as opposed to projecting onto the single state
consistent with the measurement result as occurs with a single qubit):

P̂j (bj ) = Iˆn Iˆn−1 . . . Iˆj+1 |bj ⟩⟨bj | Iˆj−1 . . . Iˆ1 Iˆ0 .

(4.13)

A more subtle case is the one in which we measure a two-qubit operator

such as Z1 Z0 . A measurement result of +1 only tells us that the two qubits
are in the same state, but not which state they are in. Thus the corresponding
projector for (just) those two qubits is

P̂10 (+1) = |00⟩⟨00| + |11⟩⟨11|. (4.14)

Similarly if the measurement result is −1 the two qubits must be in opposite

states and the corresponding projector is

P̂10 (−1) = |01⟩⟨01| + |10⟩⟨10|. (4.15)

This type of multi-qubit joint measurement will play a crucial role in quan-
tum error correction and in entanglement generation via measurement. Note
the important fact that joint measurement of Z1 Z0 is completely different
than the product of separate measurement results Z1 and Z0 . The joint
measurement yields one bit of classical information, while with separate mea-
surements you learn two classical bits of information, namely the individual
values for each qubit. Thus the state collapse is more complete in the latter
case.
How do we actually make a joint measurement of an operator like Z1 Z0
without learning about the individual qubit states? This is subtle and un-
derstanding how to design a circuit for making joint measurements requires
us to first learn how to execute two-qubit gates. We turn to this topic in the
next section.

4.3 Two-Qubit Gates

Let us concentrate for now on two-qubit gates (unitary operations). We saw
in Chap. 3 that Hilbert space consisting of the direct product of the Hilbert
space for two single qubits is four-dimensional and that operators on that
space are represented by 4 × 4 matrices. It does not seem that stepping up
from one qubit to two qubits is going to lead to anything dramatically new.
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT100

However, it does–we will explore the fascinating concept of entanglement.

Entanglement was deeply disturbing to the founders of quantum mechanics,
especially to Einstein who understood and pointed out its spooky implica-
tions. The power of entanglement as a quantum resource has come to be
appreciated in recent decades and underlies certain key protocols for quan-
tum communication and for quantum computation.

4.3.1 The CNOT Gate

The NOT gate in a classical computer simply flips the bit taking 0 to 1 and
1 to 0. The so-called ‘controlled-NOT’ or CNOT gate allows one to do ‘if-
then-else’ logic with a computer. It is a two-bit gate that flips the state of
the target bit conditioned on the state of the control qubit. It has the ‘truth
table’ shown in Table 4.1 from which we see that the target bit is flipped, if
and only if, the control bit is in the 1 state.

CNOT Truth Table

Control In Target In Control Out Target Out
0 0 0 0
0 1 0 1
1 0 1 1
1 1 1 0
Table 4.1: Action of the two-bit CNOT gate in the standard basis.

In the classical context one can imagine measuring the state of the control
bit and then using that information to control the flipping of the target bit.
However in the quantum context, it is crucially important to emphasize that
measuring the control qubit would collapse its state. We must therefore
avoid any measurements and seek a unitary gate which works correctly when
the control qubit is in |0⟩ and |1⟩ and even when it is in a superposition of
both possibilities α|0⟩ + β|1⟩. As we will soon see, it is this latter situation
which will allow us to generate entanglement. When the control qubit is in
a superposition state, the CNOT gate causes the target qubit to be both
flipped and not flipped in a manner that is correlated with the state of the
control qubit. As we will see, these are not ordinary classical statistical
correlations (e.g. clouds are correlated with rain), but rather special (and
powerful) quantum correlations resulting from entanglement.
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT101

Box 4.1. CNOT without Measurement How can we possibly flip the
state of the target qubit conditioned on the state of the control qubit without
measuring the control and hence collapsing its state? We know that in many
systems we can cause a transition between two quantum levels separated in
energy by an amount ℏω by applying an electromagnetic wave of frequency
ω for a certain precise period of time. [In quantum mechanics energy E =
ω
hf = ℏω and frequency f = 2π are related with a proportionality given by
Planck’s constant h = 2πℏ.] The way to make this bit flip of the target be
conditioned on the state of the control is to have an interaction between the
two qubits that causes the transition energy of the target depend on the state
of the control. For example, consider the Hamiltonian (the energy operator)

ℏω0 ℏω1
H=− Z0 + Z1 + ℏgZ1 Z0 . (4.16)
2 2

The energy change to flip qubit 0 from |0⟩ to |1⟩ is ∆E = ℏ(ω0 − gZ1 ). Thus
if we shine light (or microwaves as appropriate) of frequency ℏ(ω0 + g), qubit
0 will flip only when qubit 1 is in |1⟩ because that matches the transition
frequency. However if qubit 1 is in the state |0⟩, the transition frequency
for qubit 0 is shifted to ℏ(ω0 − g) and the light has the wrong frequency to
cause the transition. See Fig. 4.1 for an illustration of the level scheme.

If the transition energies (frequencies) are unique then we can apply a

driving field (electric or magnetic or whatever couples to the qubit) to cause
the desired transition and no others. This allows us to flip the target qubit
if and only if the control is in a particular state we choose (1 for CNOT, 0
for C̄NOT).

It may seem intuitive that a qubit can be excited by bathing it in photons.

It is less intuitive that the qubit can be de-excited the same way. This is
called “stimulated emission.” The excited qubit can de-excite by emitting a
photon. This can happen spontaneously (which is a form of decoherence that
can cause errors in quantum computers), but it can be made to happen much
faster (and coherently so it is not an error) if the emitted photon joins many
other identical photons that are present because of the drive. This effect was
predicted by Einstein (very early before the quantum theory was even fully
developed) and is the mechanism behind the operation of lasers.
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT102

|11〉

CNOT1
(ω0 − g ) CNOT0
(ω1 − g )
|10〉

| 01〉
CNOT0
(ω1 + g )
CNOT1
(ω0 + g )
| 00〉

Figure 4.1: Energy levels for two interacting qubits in the computational basis |q1 q0 ⟩
whose Hamiltonian is given by Eq. (4.16). Blue lines correspond to transitions in
which qubit q0 is flipped. Red lines correspond to transitions in which qubit q1
is flipped. By choosing ω0 , ω1 and g appropriately, all four single qubit transition
energies can be made unique, thereby allowing flip of one qubit conditioned on the
state of the other. Each transition is labeled by its energy and the corresponding
operation CNOTj where j labels which qubit is the control. Dashed lines corre-
spond to C̄NOT gates in which the control must be in state 0 (rather than 1) in
order to flip the target. Altogether there are four different transistions correspond-
ing to four distinct CNOT or C̄NOT gates. If g = 0, then the two qubits are not
interacting and the transition energy for one qubit is not dependent on the state
of the other, so that conditional operations are not possible.

From the classical truth table we can attempt to construct the appropriate
quantum operator by putting the dual of initial state ket in the bra and the
desired final state in the ket. Number the qubits from right to left (beginning
with zero) and letting qubit 0 be the target and qubit 1 be the control, we
have
CNOT1 = |00⟩⟨00| + |01⟩⟨01| + |11⟩⟨10| + |10⟩⟨11|, (4.17)
where the subscript on CNOT tells us which qubit is the control bit. We see
from the orthonormality of the basis states that this produces all the desired
transformations, for example
CNOT1 |11⟩ = |10⟩. (4.18)
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT103

This however is not enough. We have to prove that the desired transforma-
tions are legal. That is, we must show that CNOT is unitary. It is clear
by inspection that CNOT is Hermitian. A straightforward calculation shows
that (CNOT)2 = I. ˆ Hence by the lemma in Ex. 3.10b, CNOT is unitary.
It is also instructive to write the gate in the following manner
1 0 1 0
σ − σz ⊗ σx + σ + σz ⊗ σ0.

CNOT1 = (4.19)
2 2
The first parentheses (including the factor of 1/2) is the projector |1⟩ ⟨1| onto
the |1⟩ state for the control qubit (qubit 1). Similarly, the second parentheses
(including the factor of 1/2) is the projector |0⟩ ⟨0| onto the |0⟩ state for the
control qubit. Thus if the control qubit is in |1⟩, then σ x flips the target
qubit while the remaining term vanishes. Conversely when the control qubit
is in |0⟩, the coefficient of σ x vanishes and only the identity in the second
term acts.
In the standard two-qubit basis defined in Eq. (3.160), the operator in
Eq. (4.17) and Eq. (4.19) has the matrix representation

0 0 0 1 1 0 1 0
CNOT1 = ⊗ + ⊗ (4.20)
0 1 1 0 0 0 0 1
 
1 0 0 0
 0 1 0 0 
=   0 0 0 1 .
 (4.21)
0 0 1 0
The CNOT1 unitary is represented in ‘quantum circuit’ notation by the con-
struction illustrated in left panel of Fig. 4.2.
Exercise 4.1. By analogy with Eq. (4.21), find the matrix representa-
tion of the CNOT0 gate given in the right panel of Fig. 4.2

Exercise 4.2. Consider the reset operator R defined in Box 4.2. Find
a state whose norm is not preserved under R. This is further proof
that R is not a legal unitary operator and requires the assistance of a
measurement operation.
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT104

| q0 〉 | q0 〉

| q1 〉 | q1 〉

Figure 4.2: Quantum circuit representation of the CNOT1 operation (left panel) and the
CNOT0 operation (right panel). The filled circle denotes the control qubit (in the left
panel, q1 ) and the symbol ⊕ denotes the target qubit (in the left panel, q0 ) for the gate.
The right panel shows the gate with control and target interchanged. In quantum circuit
notation the order in which gates are applied (‘time’) runs from left to right. If the circuit
has GATE1 followed by GATE2 , reading from left to right, this corresponds to the (right-
to-left) sequence of matrix operations GATE2 GATE1 |INPUT STATE⟩. For the C̄NOT
gate the control is shown as an open, rather than filled, circle. This denotes the operation
being activated by the control being in 0 rather than 1.

Box 4.2. RESET: Some desired operations are not unitary. One im-
portant task in a quantum computer is to reset all the bits to some standard
state before starting a new computation. Let us take the standard state to
be |0⟩. We want to map every initial state to |0⟩ which can be done with

1 1
R = |0⟩⟨0| + |0⟩⟨1| = . (4.22)
0 0

R cannot be unitary because a unitary operation preserves inner products

between states. The two initial states |0⟩ and |1⟩ are orthogonal and yet they
both map to the state |0⟩ which means that the states under the mapping are
no longer orthogonal. Equivalently, we see that R is not invertible because
we don’t know if R−1 is supposed to take |0⟩ to |0⟩ or to |1⟩. One can also
mechanically check that R is not unitary by writing

R† = |0⟩⟨0| + |1⟩⟨0|, (4.23)

R† R = (|0⟩⟨0| + |1⟩⟨0|) (|0⟩⟨0| + |0⟩⟨1|) (4.24)

1 1
= = I + σ x ̸= I. (4.25)
1 1
Thus if we only have access to unitary rotations, we cannot reset an unknown
state. There is however a simple fix for this problem. If we measure σ z (say),
the unknown state will randomly collapse to either |0⟩ or |1⟩. If it collapses to
|0⟩, apply the identity operation. If it collapses to |1⟩, then apply the unitary
operation σ x to flip the state from |1⟩ to |0⟩. We see that applying one of two
different unitaries (I or σ x ) conditioned on the outcome of a measurement
gives us a new capability beyond that of unitary operations.
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT105

4.3.2 Using CNOT gates to measure two-qubit opera-

tors
Given the ability to readout (i.e., measure) the states of individual qubits,
how do we parlay this ability into the ability to make joint measurements of
multi-qubit operators like Z1Z2 . As we have emphasized, measuring Z1 Z2
produces only a single classical bit of information and causes different state
collapse than measuring the individual values of Z1 and Z2 and then multi-
plying the results of the two measurements. We need a way to learn the value
of Z1 Z2 without learning the value of Z1 or Z2 . Recall that to understand
the probabilities of different measurement outcomes, we need to know the
eigenvalues and eigenvectors of the operator being measured. Since both Z1
and Z2 are diagonal in the standard basis, so is their product. Thus any
standard basis state is an eigenvector

Z1 Z2 |00⟩ = (+1)|00⟩ (4.26)

Z1 Z2 |01⟩ = (−1)|01⟩ (4.27)
Z1 Z2 |10⟩ = (−1)|10⟩ (4.28)
Z1 Z2 |11⟩ = (+1)|11⟩. (4.29)

Because the measurement result is always an eigenvalue of the operator being

measured, we see that measurement of Z1 Z2 tells us whether the two qubits
are the same state or opposite states, but tells us nothing about which state
the individual qubits are in.
Consider the circuit shown in Fig. 4.3. It uses two CNOT gates (here
represented equivalently as controlled X (CX) gates). If q1 and q2 are the
same, q0 is flipped either 0 times (if |b1 b2 ⟩ = |00⟩) or two times (if |b1 b2 ⟩ =
|11⟩). Either way, q0 ends up in b0 = 0. If q1 and q2 are the opposite,
exactly one of the CX gates is activated and q0 ends up in b0 = 1. Thus
measurement of Z0 is equivalent to measurement of the joint operator Z1 Z2 .
This scheme is readily extended to produce measurements of arbitrary strings
of Pauli operators of any length. A related method will be discussed in Sec.
666666666 for executing multi-qubit gates (as opposed to measurements).

4.3.3 Using CNOT gates to produce entanglement

Now that we have established the CNOT unitary and how it acts on the
standard basis, let us consider what happens when the control qubit is in
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT106

q0 | 0〉
X X | b1 ⊕ b2 〉
z0= z1 ⋅ z2
q1 | b1 〉 | b1 〉

q2 | b2 〉 | b2 〉

Figure 4.3: Circuit for mapping the joint operator Z1 Z2 onto the auxiliary qubit operator
Z0 . Measurement of Z0 yields the value of Z1 Z2 without conveying any information about
the individual values of Z1 and Z2 . b0 , b1 , b2 ∈ {0, 1} denote qubit states in the standard
basis and ⊕ denotes bitwise addition mod 2.

a superposition state. We begin with a so-called product state or separable

state which can be written as the product of a state for the first qubit and a
state for the second qubit
|Ψ⟩ = [α1 |0⟩ + β1 |1⟩] ⊗ [α0 |0⟩ + β0 |1⟩]
= α1 α0 |00⟩ + α1 β0 |01⟩ + β1 α0 |10⟩ + β1 β0 |11⟩. (4.30)
For example consider the following separable initial state
1
|ψ⟩ = |−⟩|0⟩ = | ←⟩|0⟩ = √ (|0⟩ − |1⟩) |0⟩ (4.31)
2
1
= √ (|00⟩ − |10⟩) . (4.32)
2
When we apply the CNOT gate to this superposition state, the target
qubit both flips and doesn’t flip (depending on the state of the control)! The
two qubits become ‘entangled’
1
|B2 ⟩ = CNOT1 |ψ⟩ = √ (|00⟩ − |11⟩) , (4.33)
2
where we have added a subscript to CNOT to explicitly indicate that qubit
1 is the control qubit. An entangled state of two qubits2 is any state which
2
The concept of entanglement is more difficult to uniquely define and quantify for
N > 2 qubits.
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT107

cannot be written in separable form as a tensor product of single particle

states.

Box 4.3. LOCC No-Go Theorem In order to convert a product state into
an entangled state, we must use an entangling gate. Such gates are always
controlled gates such as the CNOT or CPHASE. These gates cannot be
written as a single Kronecker product of two single-qubit operators. There
is an important no-go theorem which states that using local operations and
classical communication (LOCC) two parties cannot establish an entangled
state between them starting from product states. Here local operations
means operations of the form Z ⊗ I or I ⊗ H, which one party carries out
independently of the other. Operations like CNOT and CPHASE, being
conditional operations are explicitly not local. Classical communication
refers to the two parties communicating with each other via a classical (i.e.,
without exchanging qubits) about which gates (or measurements) they have
performed or request that the other part perform.

In order to perform a non-local gate, the qubits have to be (at some point
in the protocol) physically nearby so they can interact with each other, or if
remote, then the two parties much communicate via a quantum channel. For
example, Alice can locally use a CNOT gate on a pair of qubits to create an
entangled pair and then send one of the qubits to Bob.

A convenient basis in which to represent entangled states is the so-called

CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT108

Bell basis3
1
|B0 ⟩ = √ [|01⟩ − |10⟩] (4.34)
2
1
|B1 ⟩ = √ [|01⟩ + |10⟩] (4.35)
2
1
|B2 ⟩ = √ [|00⟩ − |11⟩] (4.36)
2
1
|B3 ⟩ = √ [|00⟩ + |11⟩] . (4.37)
2
Each of these four states is a ‘maximally entangled’ state, but they are mu-
tually orthogonal and therefore must span the full four-dimensional Hilbert
space. Thus linear superpositions of them can represent any state, including
even product states. For example,
1
|0⟩|0⟩ = |00⟩ = √ [|B2 ⟩ + |B3 ⟩] . (4.38)
2
Entanglement is very mysterious and entangled states have many pecu-
liar and counter-intuitive properties. In an entangled state the individual
spin components have zero expectation value and yet the spins are strongly
correlated. For example in the Bell state |B0 ⟩,
⟨B0 |⃗σ1 |B0 ⟩ = ⃗0 (4.39)
⟨B0 |⃗σ0 |B0 ⟩ = ⃗0 (4.40)
⟨B0 |σ1x σ0x |B0 ⟩ = −1 (4.41)
⟨B0 |σ1y σ0y |B0 ⟩ = −1 (4.42)
⟨B0 |σ1z σ0z |B0 ⟩ = −1. (4.43)
This means that the spins have quantum correlations which are stronger than
is possible classically. In particular,
⟨B0 |⃗σ1 · ⃗σ0 |B0 ⟩ = −3 (4.44)
3
Named after John Bell, the physicist who in the 1960’s developed deep insights into the
issues surrounding the concept of entanglement that so bothered Einstein. Bell proposed
a rigorous experimental test of the idea that the randomness in quantum experiments is
due to our inability to access hidden classical variables. At the time this was a theorist’s
‘gedanken’ experiment, but today the ‘Bell test’ has rigorously ruled out the possibility of
hidden variables. Indeed, the Bell test is now a routine engineering test to make sure that
your quantum computer really is a quantum computer, not a classical computer.
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT109

despite the fact that in any product state |ψ⟩ = |n̂1 ⟩ ⊗ |n̂0 ⟩

|⟨⃗σ1 ⟩| = |n̂1 | = |⟨⃗σ0 ⟩| = |n̂0 | = 1. (4.45)

CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT110

Box 4.4. Joint Measurements of Multiple Qubits One might ask how
the computer architect designs the hardware to make joint measurements of
operators of the form M = σ1x σ0z , etc. First, a general remark: If she wants
to measure M = AB, the product of two observables A and B, then M must
be an observable (i.e., Hermitian).

M † = (AB)† = B † A† = BA
M † = M ⇐⇒ AB = BA ⇐⇒ [A, B] = 0,

where [A, B] ≡ AB − BA is the commutator of the two operators. Thus the

product of two observables is not itself an observable unless the two opera-
tors commute. If A and B act on different qubits, then they automatically
commute and they share the same eigenvectors (though not necessarily the
same eigenvalues of course). This makes the Born rule easy to apply
X
|ψ⟩ = ψj |j⟩,
j

where |j⟩ is the jth eigenvector of A,B, and M = AB having eigenvalues

A|j⟩ = aj |j⟩
B|j⟩ = bj |j⟩
M |j⟩ = aj bj |j⟩.

The computer architect might be able to devise a physical set up to di-

rectly make the joint measurement (see Fig. 4.3, but it is likely easier for her
to employ her existing ability to make single-qubit measurements and thus
measure A and B sequentially (or separately but in parallel). For simplicity,
assume that the spectrum of each operator is non-degenerate (eigenvalues are
unique). Measurement of A yields the result aj with probability |ψj |2 and
the state collapses to |j⟩. Subsequent measurement of B yields the result
bj with probability 1 since |j⟩ is an eigenstate of B. Hence the measured
value of M formed by multiplying the two measurement results, mj = aj bj
occurs with probability |ψj |2 , which is the same result that would have been
obtained by measuring B first and then A or by directly measuring the joint
operator M = (AB). It is crucially important however to understand (as
we discussed in Sec. 4.2 that joint measurements produce less information
but induce less state collapse than measurements of the individual qubits
involved in the joint measurement. This difference can be very useful.
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT111

We can obtain a useful picture of the Bell states by examining all of the
possible two-spin correlators in the so-called ‘Pauli bar plot’ for the state
|B0 ⟩ shown in Fig. 4.4. We see that all the single-spin operator (e.g. IX
and ZI) expectation values vanish. Because of the entanglement, each spin
is, on average, totally unpolarized. Yet three of the two-spin correlators,
XX, Y Y, ZZ, are all -1 indicating that the two spins are pointing in exactly
opposite directions. This is becasue |B0 ⟩ is the rotationally invariant ‘spin-
singlet’ state.

0.5

II IX IY IZ XI YI ZI XX XY XZ YX YY YZ ZX ZY ZZ

-0.5

-1.0

Figure 4.4: ‘Pauli bar plot’ of one and two spin correlators in the Bell state |B0 ⟩.

In maximally entangled states Bell states, the individual spins do not

have their own ‘state’ in the sense that ⟨B|⃗σ1 |B⟩ = ⃗0, rather than n̂, where
n̂ is some unit vector. We can only say for |B0 ⟩ for example that the z
components of the two spins will always be measured to be exactly opposite
each other and have product −1. Remarkably, the same is true for the
x and y components as well. Classical vectors of length 1 drawn from a
random probability distribution simply cannot have this property. Thus
the correlations of the spins in entangled states are intrinsically quantum
in nature and stronger than is possible in any classical (or ‘hidden variable’
quantum) model.
Exercise 4.3. Derive Eqs. (4.39-4.45)

4.4 Bell Inequalities

We turn now to further consideration of the ‘spooky’ correlations in entangled
states. We have already seen for the Bell state |B0 ⟩ that the components of
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT112

the two spins are perfectly anti-correlated. Suppose now that Alice prepares
two qubits in this Bell state and then sends one of the two qubits to Bob who
is far away (say one light-year). Alice now chooses to make make a measure-
ment of her qubit projection along some arbitrary axis n̂. For simplicity let
us say that she chooses the ẑ axis. Let us further say that her measurement
result is −1. Then she immediately knows that if Bob chooses to measure his
qubit along the same axis, his measurement result will be the opposite, +1.
It seems that Alice’s measurement has collapsed the state of the spins from
|B0 ⟩ to |10⟩. This ‘spooky action at a distance’ in which Alice’s measurement
seems to instantaneously change Bob’s distant qubit was deeply troubling to
Einstein [3].
Upon reflection one can see that this effect cannot be used for superlu-
minal communication (in violation of special relativity). Even if Alice and
Bob had agreed in advance on what axis to use for measurements, Alice
has no control over her measurement result and so cannot use it to signal
Bob. It is true that Bob can immediately find out what Alice’s measurement
result was, but this does not give Alice the ability to send a message. In
fact, suppose that Bob’s clock were off and he accidentally made his mea-
surement slightly before Alice. Would either he or Alice be able to tell? The
answer is no, because each would see a random result just as expected. This
must be so because in special relativity, the very concept of simultaneity is
frame-dependent and not universal.
Things get more interesting when Alice and Bob choose different mea-
surement axes. Einstein felt that quantum mechanics must somehow not be
a complete description of reality and that there might be ‘hidden variables’
which if they could be measured would remove the probabilistic nature of
the quantum description. However in 1964 John S. Bell proved a remarkable
inequality [4] showing that when Alice and Bob use certain different measure-
ment axes, the correlations between their measurement results are stronger
than any possible classical correlation that could be induced by (local) hidden
variables. Experimental violation of the Bell inequality proves that it is not
true that quantum observables have values (determined by hidden classical
variables) before we measure them. Precision experimental measurements
which violate the Bell inequalities are now the strongest proof that quantum
mechanics is correct and that local hidden variable theories are excluded.
Perhaps the simplest way to understand this result is to consider the
CHSH inequality developed by Clauser, Horn, Shimoni and Holt [5] follow-
ing Bell’s ideas. Consider the measurement axes shown in Fig. 4.5. The
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT113

experiment consists of many trials of the following protocol. Alice and Bob
share a pair of qubits in an entangled state. Alice randomly chooses to mea-
sure the first qubit using X or Z while Bob randomly chooses to measure the
second qubit using X ′ or Z ′ which are rotated 45 degrees relative to Alice’s
axes. After many trials (each starting with a fresh copy of the entangled
state), Alice and Bob compare notes (via classical subluminal communica-
tion) on their measurement results and compute the following correlation
function

S = ⟨XX ′ ⟩ + ⟨ZZ ′ ⟩ − ⟨XZ ′ ⟩ + ⟨ZX ′ ⟩. (4.46)

By correlation function of two measurement results we mean (for example)

⟨XZ ′ ⟩ = ⟨B0 |XZ ′ |B0 ⟩. (4.47)

Experimentally this is measured by the following. For the XZ ′ correlator,

Alice and Bob select those instances in which Alice happens to have chosen
to measure X and Bob happens to have chosen to measure Z ′ . (This occurs
say N times, corresponding to 25% of the total number of trials on average.)
Let xj = ±1 and zj′ = ±1 be respectively Alice’s random measurement result
for X in the jth trial and and Bob’s random measurement result for Z ′ in the
jth trial. Then Alice and Bob’s comparision of measurement results can be
used to create the following unbiased (but noisy) estimator of the correlation

N
1 X
⟨B0 |XZ ′ |B0 ⟩ ≈ xj zj . (4.48)
N j=1

In the limit of large N the statistical uncertainty in the estimator goes to

zero and we obtain an arbitrarily accurate estimate of the correlation. The
four possible combinations of the two measurement results are illustrated in
Fig. 4.6. The correlator is given by the probabilities of the four outcomes

⟨B0 |XZ ′ |B0 ⟩ = P++ − P−+ + P−− − P+− . (4.49)

If the measurements are perfectly correlated (xj = zj′ every time) then the
correlator will be +1. If perfectly anticorrelated (xj = −zj′ every time), then
the correlator will be −1. If the measurements are uncorrelated, then all four
measurement outcomes will be equally likely and xj zj′ will be fully random
(±1 with equal probability) and the correlator will vanish (on average).
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT114

Z¢ X¢

45o
X
Figure 4.5: Measurement axes used by Alice (solid lines) and Bob (dashed lines) in estab-
lishing the Clauser-Horn-Shimoni-Holt (CHSH) inequality.

z cj
(1, 1) (1, 1)

(1, 1) (1, 1)

Figure 4.6: Illustration of the four possible measurement outcomes in the jth run of
an experiment in which Alice measures X and Bob measures Z ′ . The net correlator
of the two measurements is given by Eq. (4.49).

Eq. (4.46) can be rewritten

S = ⟨(X + Z)X ′ ⟩ − ⟨(X − Z)Z ′ ⟩. (4.50)

Alice and Bob note that their measurement results are random variables
which are always equal to either +1 or −1. In a particular trial Alice chooses
randomly to measure either X or Z. If you believe in the hidden variable
theory, then surely, the quantities not measured still have a value of either
+1 or −1 (because when we do measure them, they always are either +1 or
−1). If this is true, then either X = Z or X = −Z. Thus either X + Z
vanishes or X − Z vanishes in any given realization of the random variables.
The combination that does not vanish is either +2 or −2. Hence it follows
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT115

immediately that S is bounded by the CHSH inequality

−2 ≤ S ≤ +2. (4.51)

It turns out however that the quantum correlations between the two spins
in the entangled pair violate this classical bound. They are stronger than
can ever be possible in any classical local hidden variable theory. To see this,
note that because ⃗σ is a vector, we can resolve its form in one basis in terms
of the other via
′ 1
σx = √ [σ z + σ x ] (4.52)
2
′ 1 z
σz = √ [σ − σ x ]. (4.53)
2
Thus we can express S in terms of the ‘Pauli bar’ correlations
1
S = √ [⟨XX + XZ + ZX + ZZ⟩ − ⟨XZ − XX − ZZ + ZX⟩] . (4.54)
2
For Bell state |B0 ⟩, these correlations are shown in Fig. 4.4 and yield
√
S = −2 2, (4.55)

in clear violation of the CHSH inequality. Strong violations of the CHSH

inequality are routine in modern experiments.4 This teaches us that in our
quantum world, observables do not have values if you do not measure them.
There are no hidden variables which determine the random values. The
unobserved spin components simply do not have values. Recall that σ x and
σ z are incompatible observables and when we choose to measure one, the
other remains not merely unknown, but unknowable.
It is ironic that Einstein’s arguments that quantum mechanics must be
incomplete because of the spooky properties of entanglement have led to the
strongest experimental tests verifying the quantum theory and falsifying all
4
Although strictly speaking, in many experiments there are loopholes associated with
imperfections in the detectors and the fact that Alice and Bob typically do not have a
space-like separation. However recent experiments have closed all these loopholes. Note
that all measured correlators have statistical uncertainty in them when the number of
measurements N is finite. However modern experiments can achieve results that exceed
the Bell bound by many standard deviations.
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT116

local hidden variable theories. We are forced to give up the idea that physical
observables have values before they are observed.
Exercise 4.4. Other Bell inequalities.
1. Work out the ‘Pauli bar plots’ (analogous to Fig. 4.4) for each of
the Bell states B1 , B2 , B3 .

2. Using the same quantization axes as in Fig. 4.5, find the analog of
Eq. (4.46) for the correlators that should be measured to achieve
violation of the Bell inequality for these other Bell states.

4.5 Quantum Dense Coding

Now that we have learned about Bell states describing the entanglement
of a pair of qubits, we will show that entanglement is a powerful resource
for quantum information tasks. In this section we will see how quantum
entanglement can be used as a resource to help in the task of quantum
communication.
We saw above that quantum correlations are strong enough to violate
certain classical bounds, but the spooky action at a distance seemed unable
to help us send signals. We will learn more about this ‘no signaling’ con-
dition in this section. It turns out that by using a special ‘quantum dense
coding’ protocol, we can use entanglement to help us transmit information in
a way that is impossible classically [2]. As preparation for understanding this
protocol, let us consider the following situation. Alice has qubit q0 and Bob
has qubit q1 and the combined system is in a product state. As an example,
suppose that this state is

|ψ⟩ = |χ⟩ ⊗ |0⟩. (4.56)

If Alice and Bob are far apart, Alice is unable to do any operations on Bob’s
qubit, but she can perform a local unitary U0 = I ⊗ U on her qubit q0 . We
can now ask the following question: Given this constraint of local operations,
how many distinct (i.e., orthogonal) states for the combined system can Alice
reach starting from this initial state? Clearly for U0 = I ⊗X, she can produce

(I ⊗ X)|ψ⟩ = |χ⟩ ⊗ |1⟩, (4.57)

CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT117

which is orthogonal to the original state. This however is the only orthogonal
state Alice can reach. The two-qubit Hilbert space is four-dimensional but
Alice cannot fully explore it. It seems ‘obvious’ that this is because she has
no access to Bob’s qubit, however as we will see, things become much less
obvious when we consider entangled states.
The situation is very different for two-qubit entangled states. We take as
our basis the four orthogonal Bell states in Eqs. (4.34-4.37). Suppose that
Alice prepares the Bell state |B0 ⟩ using qubits q1 , q0 and sends q1 to Bob
who is in a distant location. Using a remarkable protocol called quantum
dense coding [2], Alice can now send Bob two classical bits of information
by performing a local gate on q0 and then sending it to Bob. The protocol
(whose circuit in illustrated in Fig. 4.7) relies on the amazing fact that Alice
can transform the initial Bell state into any of the other Bell states by purely
local operations on her remaining qubit without communicating with Bob.
The four possible unitary operations Alice should perform are I0 , X0 , Y0 , Z0
which yield5
I0 |B0 ⟩ = + |B0 ⟩ (4.58)
Z0 |B0 ⟩ = − |B1 ⟩ (4.59)
X0 |B0 ⟩ = + |B2 ⟩ (4.60)
Y0 |B0 ⟩ = −i|B3 ⟩. (4.61)
It seems somehow miraculous that without touching Bob’s qubit, Alice
can reach all four orthogonal states by merely rotating her own qubit. How-
ever the fact that this is possible follows immediately from Eqs. (4.39) and
(4.40), the Pauli bar plot in Fig. 4.4 and the corresponding plots for the other
Bell states. In every Bell state, the expectation value of every component of
⃗σ0 (and ⃗σ1 ) vanishes. Thus for example
⟨B0 |σ0x |B0 ⟩ = ⟨B0 |I ⊗ X|B0 ⟩ = 0. (4.62)
But this can only occur if the state (σ0x |B0 ⟩) is orthogonal to |B0 ⟩! This in
turn means that there are four possible two-bit messages Alice can send by
associating each with one of the four operations6 I0 , X0 , Y0 , Z0 as shown in
5
As usual we are simplifying the notation. For example Z0 stands for the more math-
ematically formal I ⊗ Z since it applies Z to Alice’s qubit, q0 , and the identity to Bob’s
qubit, q1 . Note that the global phase factors are irrelevant to the workings of the protocol.
6
The particular operators associated with each of the four binary numbers is somewhat
arbitrary and was chosen in this case to correspond to a particular choice of Bob’s decoding
circuit which will be described later.
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT118

Alice’s Bob’s
Message Bob’s Decoder Measurement

q0 U H
After Alice
selects message
| B0 〉
Before Alice
selects message

Figure 4.7: Illustration of the quantum dense coding protocol showing the power of
having a prepositioned Bell pair shared by Alice and Bob. Alice has two qubits,
q1 , q0 and on Monday prepares them in an entangled Bell state |B0 ⟩. She sends q1
to Bob. On Tuesday she decides to send Bob a two-bit classical message by choos-
ing one of four possible unitaries U ∈ {I, X, Y, Z} to apply to her remaining qubit,
q0 . This unitary maps the initial Bell state |B0 ⟩ to one of the four orthogonal
Bell states. She then sends q0 to Bob who decodes the Bell pair (by mapping the
four Bell basis states back to the standard computational basis). If Alice chooses
the classical message z1 , z0 , Bob’s decoder outputs a state ei ϕ(z1 , z0 )|z1 z0 ⟩, where
ϕ(z1 , z0 ) is an irrelevant phase factor. Bob then measures the qubits and obtains
from the measurements two classical numbers z1 , z0 corresponding to Alice’s mes-
sage.

Table 4.2.
The reader might now reasonably ask the following question. Before
Alice sends Bob q0 can he, by making local measurements on his qubit q1 ,
detect the fact that Alice’s operation has changed the joint state of their
two qubits? Clearly from the Holevo bound (see Box 2.1) he can learn at
most 1 bit of information so would not be able to fully learn which of the
four operations Alice did, but perhaps he could learn something. If he could,
then special relativity would be violated because of signal would have passed
instantaneously from Alice to Bob exceeding the bound set by the speed of
light. The answer is a firm no, as discussed in Box 4.5.
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT119

Alice’s Alice’s State Bob Bob’s Bob’s measurement

message operation receives decoded state result (z1 , z0 )
00 Y0 −i|B3 ⟩ −i|00⟩ (0, 0)
01 X0 + |B2 ⟩ + |01⟩ (0, 1)
10 Z0 − |B1 ⟩ − |10⟩ (1, 0)
11 I0 + |B0 ⟩ − |11⟩ (1, 1)

Table 4.2: Illustration of the quantum dense encoding protocol, showing the unitary oper-
ation Alice must carry out on her qubit, q0 , the state produced by Bob’s decoder and the
measurement results Bob obtains from which he reads Alice’s two classical bit message.
The extra phase factors in front of the Bell states have no effect on the measurement
results because Bob never has to deal with a superposition of these Bell states.

Box 4.5. No Signaling Condition We saw in Box 4.3 that it is impossible

to create entanglement using LOCC. With the quantum dense coding
protocol we see that if entanglement already exsits between two parties, it
is possible for (either) one of the parties to change the entangled state using
only LOCC. The ‘no signaling’ condition tells us that if one party changes
the entangled state, the other party is not able to detect this using only
local operations and measurements. Thus it is impossible for one party to
communicate with the other using only local operations. To see why, note
from Eqs. (4.39) and (4.40) for Bell state |B0 ⟩, and the analogous equations
for the other Bell states, that every Pauli operator that one measures has
zero expectation value, meaning that one sees only random shot noise and
no signal. It is true that if both parties measure the same Pauli operator
there will be perfect correlations in the results. However the results are
purely random and in order to see the correlations, the two parties must
‘compare notes’ which requires classical communication at (or below) the
speed of light. Equivalently, since Alice’s measurement results are random
and not under her control, she cannot use these as intentional messages
to Bob. [Note for Pauli measurements which can only yield ±1, the mean
of the distribution of measurement results, m = p+ − p− , determines the
entire distribution, since p+ + p− = 1. Thus if the mean is unchanged, the
entire distribution is unchanged. Hence no messages can be hidden in the
distribution if the mean cannot be changed.]

The upshot of all this is that, while it may appear that there is spooky action
at a distance in the changes that one party can make in an entangled state us-
ing LOCC, these changes are locally invisible to the other party because only
the correlations in measurement results of the two parties actually change.
Computation of these correlations requires (subluminal) classical communi-
cation.
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT120

Exercise 4.5. The argument for the no-signaling condition given in

Box 4.5 relies on the quantum state being one of the 4 Bell states that
Alice can create using the quantum dense coding protocol by applying
a Pauli gate to her qubit. We know however that superpositions of Bell
states can be used to represent ordinary product states. Show that if,
instead of applying a Pauli gate like Y0 , Alice applies a rotation by angle
θ
θ
U = e−i 2 Y0 = cos θ [I ⊗ I] − i sin θ [I ⊗ Y ],
the resulting superposition of Bell states still obeys the no-signaling con-
dition for Bob’s measurements.

After encoding her message by carrying out the appropriate operation on

her qubit, Alice physically transmits her qubit to Bob. Bob then makes a
joint measurement (details provided further below) on the two qubits which
tells him which of the four Bell states he has and thus recovers two classical
bits of information even though Alice sent him only one quantum bit after
deciding what the message was. The pre-positioning of the entangled pair
has given them a resource which doubles the number of classical bits that can
be transmitted with one (subsequent) quantum bit. Of course in total Alice
transmitted two qubits to Bob, so the Holevo bound has not been violated.
The key point is that the first qubit was sent in advance of deciding what
the message was. How weird is that!?
This remarkable protocol sheds considerable light on the concerns that
Einstein raised in the EPR paradox [3]. It shows that the special correlations
in Bell states can be used to communicate information in a novel and efficient
way by ‘prepositioning’ entangled pairs shared by Alice and Bob. However
causality and the laws of special relativity are not violated because Alice
still has to physically transmit her qubit(s) to Bob in order to send the
information.
The above protocol requires Bob to determine which of the four Bell
states he has. How can he do this? A theorist might suggest that Bob
simply measure the operator

MBell = 0|B0 ⟩⟨B0 | + 1|B1 ⟩⟨B1 | + 2|B2 ⟩⟨B2 | + 3|B3 ⟩⟨B3 |. (4.63)

The four eigenstates of this Hermitian operator are the four (orthonormal)
Bell states, and we see explicitly that Bell state j has eigenvalue j. Thus the
measurement result tells us precisely which Bell state the system is in.
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT121

Exercise 4.6. Prove that measurement of the operator MBell can be

used to non-deterministically create entangled states. If you measure
MBell in the product state |00⟩, what entangled states will result and
with what probabilities will they occur? Repeat your calculation for the
product state |01⟩. Note that having randomly created one of the four
Bell states, you can convert it to any other via single qubit Pauli gates
on either one of the two qubits as illustrated in Eqs. (4.58-4.61).

An experimentalist would find it more practical to use the quantum circuit

shown in Fig. 4.8 that uses two standard gates to uniquely map each of the
Bell states onto one of the four computational basis states (eigenstates of Z1
and Z2 ) that are easily measured. The first symbol indicates the CNOT gate
which flips the target qubit (in this case qubit 1) if and only if the control
qubit (qubit 0) is in the excited state. The second gate in the circuit (denoted
H) is the Hadamard gate that was introduced in Sec. 3.4.1 acting on qubit
zero.

Figure 4.8: Bell-basis measurement circuit comprising a Bell state decoder with a CNOT
and Hadamard gate followed by measurement in the computational basis. This circuit
permits measurement of which Bell state a pair of qubits is in by mapping the states to
the standard basis of eigenstates of σ0z and σ1z .

Exercise 4.7. Prove the following identities for the circuit shown in
Fig. 4.8 where qubit 0 is the control and qubit 1 is the target

H0 CNOT0 |B0 ⟩ = −|11⟩ (4.64)

H0 CNOT0 |B1 ⟩ = +|10⟩ (4.65)
H0 CNOT0 |B2 ⟩ = +|01⟩ (4.66)
H0 CNOT0 |B3 ⟩ = +|00⟩ (4.67)

Once Bob has mapped the Bell states onto unique computational basis
states, he measures Z0 and Z1 separately, thereby gaining two bits of classical
information and effectively reading the message Alice has sent as shown in
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT122

the last column of Table 4.2. Note that the overall sign in front of the
basis states produced by the circuit is irrelevant and not observable in the
measurement process. Also note that to create Bell states in the first place,
Alice can simply run the circuit in Fig. 4.8 backwards.
Exercise 4.8. Construct explicit quantum circuits that take the start-
ing state |00⟩ to each of the four Bell states.)

4.6 No-Cloning Theorem Revisited

Now that we have understood the EPR paradox and communication via
quantum dense coding, we can gain some new insight into the no-cloning
theorem (see Box 3.4 and [1]). It turns out that if cloning of an unknown
quantum state were possible, then we could use an entangled pair of qubits
to communicate information at superluminal speeds in violation of special
relativity. Consider the following protocol. Alice and Bob share an entangled
Bell pair in state
1
|B0 ⟩ = √ [|01⟩ − |10⟩]. (4.68)
2
Alice chooses to measure her qubit (say q0 ) in either the Z0 basis or the X0
basis. The choice she makes defines one classical bit of information. The
result of the measurement collapses the entangled pair into a simple product
state. If Alice chooses the Z0 basis for her measurement, then the state
collapses with equal probability to either |01⟩ (if the measurement result is
−1) or |10⟩ (if the measurement result is +1). In the resulting product state,
Bob’s qubit will have the opposite value for Z1 .
Suppose instead that Alice chooses to measure in the X0 basis. To analyze
this situation, note that the B0 state can also be written in terms of the X
eigenstates as
1
|B0 ⟩ = √ [| − +⟩ − | + −⟩]. (4.69)
2
Then in the product state produced by the measurement, Bob’s qubit will
be either |+⟩ or |−⟩.
How does Bob tell whether his state is an X eigenstate or a Z eigenstate?
With only one qubit, he cannot tell for sure since a measurement of X or
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT123

Z always yields an X or Z eigenstate. Suppose for example the state Bob

has happens to be |0⟩ (because Alice measured Z and got −1). Then if Bob
measures X he learns nothing at all (his result is completely random). If Bob
measures Z his result will be +1. He learns from this that his state could
have been |0⟩, or |+⟩, or |−⟩. However it could not have been |1⟩ since that
would have yielded a measurement of −1. Thus Bob gains a small amount
of information. See Box 4.6 for how to quantify this information gain.
If Bob can clone his qubit to make many copies then he can distinguish the
four cases with near unit probability. If he measures a large number of copies
in the Z basis and always gets the same answer, he knows that his qubit is
almost certainly in a Z eigenstate. If even a single measurement result is
different from the first, he knows his qubit cannot be in a Z eigenstate and
so must be in an X eigenstate. (Of course he could also measure a bunch of
copies in the X basis and gain the same information.)
Let us suppose as a specific example, that Alice makes her measurement in
the Z0 basis. Further suppose that Bob clones his qubit to obtain 2N copies.
He measures N of the qubits in the Z1 basis and N in the X1 basis. Every
single one of his measurements in the Z1 basis will give precisely the same
result (the opposite of whatever Alice got in her one measurement) because
(after Alice’s measurement) the resulting cloned state is an eigenstate of
Z1 . Because the state is an eigenstate of Z1 , measurements of X1 will be
completely random with equal probability of ±1 results. The probability
that all N of the X1 measurements would be all be +1 or all be −1 is
P± (N ) = 2−N . Thus the probability that Bob would be unable to decide
which basis gave identical measurements and which gave random would be
Pfail = 2 · 2−N . (4.70)
Since the failure probability falls exponentially with N , the procedure is
efficient. In case of a ‘tie’ between the two measurement strings, Bob could
simply clone a few more copies (from one qubit he has set aside and not
measured) and with high probability would break the tie.
Note that Alice cannot control her measurement result, only the choice of
measurement axis. Thus Bob does not gain any information from the sign of
his measurement result (which is opposite that of Alice if he measures in the
same basis as Alice). He learns one bit of information about the quantization
axis of Alice’s measurement (if he can clone his qubit) from the measurement
results all being the same, but it does not matter what the sign of these mea-
surement results is. Since measurement state collapse is instantaneous, this
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT124

would yield superluminal communication that would work across any spatial
distance and would not take any time to occur (beyond the time it takes
Bob to clone his qubit a few times) and thus would violate special relativity.
Hence cloning is fundamentally incompatible with relativistic causality.
In fact, cloning would make it possible for Alice to transmit an unlimited
number of classical bits using only a single Bell pair. Alice could choose
an arbitrary measurement axis n̂. The specification of n̂ requires two real
numbers (the polar and azimuthal angles). It would take a very large number
of bits to represent these real numbers to some high accuracy. Now if Bob can
make an enormous number of copies of his qubit, he can divide the copies
in three groups and measure the vector spin polarization ⟨⃗σ ⟩ to arbitrary
accuracy. From this he knows the polarizaton axis n̂ = ±⟨⃗σ ⟩ Alice chose (up
to an unknown sign since he does not know the sign of Alice’s measurement
result for n̂ · σ). Hence Bob has learned a large number of classical bits of
information. The accuracy ϵ (and hence the number of bits ∼ log2 (1/ϵ)) is
limited only by the statistical uncertainties resulting from the fact that his
individual measurement results can be random, but these can be reduced to
an arbitrarily low level with a sufficiently large number of copies N ∼ 1/ϵ2
of the state. Note however that the ‘cost’ N is exponential in the number of
bits accurately learned. It would be remarkably useful to transmit multiple
classical bits using a single quantum bit. But, it turns out to be impossible
because of the no-cloning theorem.
Exercise 4.9. Suppose that Bob has the ability to clone his qubit,
but does not have the ability to make perfect measurements. Let there
be a probability 0 < ϵ < 0.5 that his measuring apparatus produces
a result that is the opposite of the true result each time he makes a
measurement (i.e., a measurement of Z in state |0⟩ sometimes yields −1
instead of the correct +1). Suppose he uses N copies of his qubit to
measure Z and N to measure X in order to determine the quantization
axis Alice chose. The failure probability will naturally be higher than
that given in Eq. (4.70). For N ≫ 1, give an estimate of Bob’s failure
probability for determining the quantization axis of the qubit.
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT125

Box 4.6. Quantifying Information Gain from a measurement Sup-

pose that by prior agreement, Alice randomly gives Bob either the state |0⟩
or the state |1⟩ with equal probability. The Shannon information entropy in
this probability distribution is
X
Sprior = − pj log2 pj = 1 bit
j=0,1

for the case p0 = p1 = 21 . Bob knows he should measure in the Z basis. After
Bob makes his measurement, we need to update the probability distribution
based on the new information received. (See the discussion of Bayes Rule
in App. A.) For example, if Bob’s measurement is z = +1 then the new
probability distribution is p0 = 1, p1 = 0 since there is no randomness left.
The Shannon entropy Spost = 0. Hence the information gained by Bob from
his measurement is given by the decrease in randomness of the distribution
upon measurement
I = Sprior − Spost = 1 bit,
as expected.

The situation is more complex if by prior agreement Alice randomly gives

Bob one copy of |0⟩, |1⟩, |+⟩, or |−⟩ with equal probability. Here of course
Sprior = 2 bits. Suppose Bob measures in the Z basis and obtains the result
z = +1. Application of Bayes Rule (see App. A) yields the new posterior
distribution of states consistent with the measurement result
1 1 1
p|0⟩ = , p|1⟩ = 0, p|+⟩ = , p|−⟩ = ,
2 4 4
which yields
1 1 1 1 1 1 3
Spost = − log2 − log2 − log2 = bits.
2 2 4 4 4 4 2
Thus the information gain Bob receives from his measurement is
1
I = Sprior − Spost = bit.
2
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT126

4.7 Quantum Teleportation

Even though it is impossible to clone an unknown quantum state, it is possi-
ble for Alice to ‘teleport’ an unknown state to Bob as long as her copy of the
original is destroyed in the process [6]. Just as for quantum dense coding,
teleportation protocols take advantage of the power of ‘pre-positioned’ en-
tangled pairs. However unlike quantum dense coding where Alice ultimately
sends her qubit to Bob, teleportation only requires Alice to send two classical
bits to Bob. She does not have to send any additional quantum bits to Bob
after pre-positioning the initial Bell pair. A simple protocol is as follows (see
Fig. 4.9): Alice creates a |B3 ⟩ Bell state and sends one of the qubits to Bob.
Alice has in her possession an additional qubit in an unknown state

|ψ⟩ = α|0⟩ + β|1⟩ (4.71)

which she wishes Bob to be able to obtain without her physically sending it
to him. Again, this must done at the expense of destroying the state of her
copy of the qubit because of the no-cloning theorem.
Remarkably, Alice is able to teleport her (unknown) state to Bob using
only LOCC, provided that she and Bob share a pre-existing Bell pair. Alice
applies the Bell state measurement protocol illustrated in Fig. 4.8 to deter-
mine the joint state of the unknown qubit and her half of the Bell pair she
shares with Bob. She then transmits two classical bits using an ordinary clas-
sical communication channel relaying her measurement results to Bob (via
the double line wires shown in Fig. 4.8 ). Note that even if Alice knew what
the state |ψ⟩ was, two classical bits alone are not enough to provide Bob
with the information needed to prepare his own copy of the state |ψ⟩ (since
it takes an infinite number of bits to specify the two angles determining the
position on the Bloch sphere).
To see how Bob is able to reconstruct the initial state using the pre-
existing Bell pair, note that we can rewrite the initial state of the three qubits
in the basis of Bell states for the two qubits that Alice will be measuring as
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT127

unknown
q0 | ψ 〉 H z0
Alice Bell Measurement

q1 z1
Bell
State
| B3 〉 unknown
Bob q2 X Z |ψ 〉 Bob

Figure 4.9: Quantum circuit that Alice can use to teleport and unknown state
to Bob using only two bits of classical information, provided that she and Bob
have shared a Bell pair in advance (in this case Bell state |B3 ⟩). Single wires
indicate quantum channels (qubits), double wires indicate classical information
channels. Based on the measurement results z0 and z1 that Alice sends to Bob,
Bob performs the operation Z z0 X z1 on his half of the Bell pair (q2 ). That is,
if Alice’s measurement result z1 = 1 he performs an X gate and then if Alice’s
measurement result z0 = 1, he performs a Z gate. The quantum state of Alice’s
qubit is destroyed (randomly collapsed) by the measurements and so the no-cloning
theorem is obeyed.

follows
|Φ⟩ = |B3 ⟩ ⊗ [α|0⟩ + β|1⟩] (4.72)
|{z} | {z }
q2 q1 q0
α β
= √ [|000⟩ + |110⟩] + √ [|001⟩ + |111⟩] (4.73)
2 2
1 1
= [β|0⟩ − α|1⟩] ⊗ |B0 ⟩ + [β|0⟩ + α|1⟩] ⊗ |B1 ⟩
2 | {z } |{z} 2| {z } |{z}
q2 q1 q0 q2 q1 q0
1 1
+ [α|0⟩ − β|1⟩] ⊗ |B2 ⟩ + [α|0⟩ + β|1⟩] ⊗ |B3 ⟩ (4.74)
2| {z } |{z} 2| {z } |{z}
q2 q1 q0 q2 q1 q0

From this representation we see that when Alice tells Bob which Bell state
she found (from the decoding step), Bob can find a local unitary operation to
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT128

perform on his qubit to recover the original unknown state. The appropriate
operations are
Alice’s Bell state Alice’s decoded state Bob’s operation
(q1 q0 ) (q1 q0 ) (q2 )
|B0 ⟩ −|11⟩ ZX
|B1 ⟩ +|10⟩ X
|B2 ⟩ +|01⟩ Z
|B3 ⟩ +|00⟩ I
Notice that the final (decoded) state of Alice’s two qubits contains no infor-
mation about the quantum amplitudes α, β in Alice’s original state. This
information has been destroyed during the teleportation process and hence
the no-cloning theorem is not violated when Bob obtains a ‘copy’ of Alice’s
state, since he has the only ‘copy.’
Notice also the similarities between quantum dense coding and state tele-
portation. Both use a pre-positioned Bell pair. In quantum dense coding
Alice uses one of four operations, I, X, Y, Z on her qubit to send two clas-
sical bits to Bob (which become available when she sends her qubit to Bob
and he measures the pair in the Bell basis to obtain one of four results). In
teleportation, Alice makes a joint measurement of her unknown state |ψ⟩ and
her half of the Bell pair to obtain one of four results and sends the resulting
two classical bits to Bob who then uses that information to apply one of four
unitaries to his half of the Bell pair to reconstruct Alice’s state.
How might quantum teleportation be useful in a quantum computer ar-
chitecture? It is much easier to transmit classical bits than quantum bits
from one part of the computer to another (or between nodes in a quantum
computer cluster). Suppose you have a slow and unreliable channel for trans-
mitting quantum bits that can be used to (slowly and unreliably) distribute
Bell pairs. There exist error-correction protocols to distill a few high-fidelity
Bell pairs from a larger collection of faulty Bell pairs. See the discussion to
be added in Chapter XXXXX. Using this we can preposition high-quality
Bell pairs between distant nodes and then teleport quantum states between
them using only ‘local operations and classical communication’ (LOCC).
We will later study an even more powerful protocol in which quantum
gates on distant qubits can be performed locally and then teleported into a
distant qubit. Like state teleportation, this ‘gate teleportation’ has numerous
applications in quantum computer architectures. For example, the CNOT
logic gate requires an operation on the target qubit conditioned on the state
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT129

of the control qubit. This means the two qubits have to physically interact
in some way, something that is most easily accomplished if the qubits are
adjacent to each other in the hardware layout. We can release this constraint
and peform a CNOT between distant qubits using gate teleportation.

4.8 YET TO DO:

1. Explain how Hadamard gates interchange control and target in the
CNOT.

2. Show that there is NO state for which Alice can change the probability
of the outcomes for Bob’s Z measurement by applying any unitary to
her qubit. (Because operators on different qubits commute.) However
Alice can change Bob’s probability by making a measurement of her
qubit. But, she can’t control the outcome of the measurement, so there
is no superluminal communication.

3. The value measured for the observable is always one of the eigenvalues
of that observable and the state always collapses to the corresponding
eigenvector. (If two or more of the eigenvalues are degenerate, then
the situation is slightly more subtle. The state is projected onto the
degenerate subspace and then normalized.) Thus if the measurement
result is the non-degenerate eigenvalue λj then the state |ψ⟩ collapses
to
Pj |ψ⟩
|ψ⟩ → p ,
⟨ψ|Pj |ψ⟩
where the square root in the denominator simply normalizes the col-
lapsed state. Introduce this projector idea for measurements to avoid
the confusion that some students think measurement of X is repre-
sented by multiplying by X. Discuss further the confusion that Pauli
operators are both unitary operations and Hermitian observables.
If one measures ⃗σ · n̂ (i.e. asks ‘Are you in state | + n̂⟩ or state | − n̂⟩?),
then from the Born rule, the probability that the measurement result
is ±1 is |⟨±n̂|ψ⟩|2 . If you make a measurement of a product state,
say |00⟩ that asks, ‘Which entangled Bell state are you in?’ the state
will collapse to one of the Bell states and the collapse (aka ‘measure-
ment back action’) leaves the state entangled. Note this doesn’t work
CHAPTER 4. TWO-QUBIT GATES AND MULTI-QUBIT ENTANGLEMENT130

with the Bell state measurement that relies on the decoding into the
standard basis.
Chapter 5

Algorithms and Complexity

Classes

The very first quantum algorithm was proposed by Deutsch in 1985 and so
it will be the first that we study. This was followed by the Deutsch-Jozsa
algorithm in 1992, the first algorithm that was exponentially faster than any
deterministic classical algorithm. We will learn however that probabilistic
classical algorithms can be much faster than deterministic ones, provided
that one can except a small chance of failure. Motivated by this, Simon
invented a problem for which even probabilistic classical algorithms take
exponentially long time and he found a fast quantum algorithm with bounded
failure probability that runs in polynomial time.
These ‘toy’ problems and quantum algorithms were expressly invented to
be difficult for classical computers and easy for quantum computers in order
to demonstrate the possibilities of quantum hardware. They are not practi-
cally useful algorithms but they paved the way for all subsequent ones which
have been invented, including most famously, the Grover search algorithm for
unstructured databases and Shor’s algorithm for finding the prime factors of
large numbers–a task that is required to break RSA public key encryption.
We will study a key component of Shor’s algorithm, the quantum Fourier
transform, since it has wide application.
Before beginning our study of algorithms we will need to understand two
key concepts: ‘phase kickback’ of a controlled unitary operation and the
concept of a quantum oracle.

131
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 132

5.1 Phase ‘Kickback’ of a Controlled-Unitary

Operation
Recall that the NOT operation (X) looks very classical in the standard basis

NOT|0⟩ = |1⟩ (5.1)

NOT|1⟩ = |0⟩. (5.2)

NOT simply reproduces the classical truth table. The same can be said for
the controlled-NOT (CNOT) operation which applies NOT to a target qubit
iff the control qubit is in |1⟩.
When we put the control bit of a CNOT circuit (see Fig. 5.1) into a su-
perposition state, we obtain a new non-trivial quantum effect, entanglement

1
CNOT0 |0⟩|+⟩ = √ [|00⟩ + |11⟩]. (5.3)
2
(Recall that the subscript on the CNOT tells us which qubit is the control.)

| q0 〉 = 0 H

| q1 〉 = 0

Figure 5.1: Circuit applying a CNOT gate with the control qubit q0 being in a
superposition state so that the initial product state is mapped onto an entangled
Bell state.

Something else interesting happens when the quantum NOT gate is ap-
plied when both control and target are in superposition states as illustrated
in Fig. 5.2. Notice that |±⟩ are eigenstates of the NOT operation

X|+⟩ = (+1)|+⟩ (5.4)

X|−⟩ = (−1)|−⟩. (5.5)
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 133

| q0 〉 = + + | q0 〉 = + −

| q1 〉 = + + | q1 〉 = − −

H H
=
H H

Figure 5.2: Upper panels: Circuit applying a CNOT gate with the control qubit q0
and the target qubit q1 being in a superposition state. Because the target is an
eigenstate of the NOT operation, it is actually the control qubit that gets flipped
(due to the phase kickback)! Lower panels: Using the Hadamard gates to change
from the Z basis to the X basis interchanges the control and target of the CNOT
gate.

Here we see a very non-classical effect: we can flip a qubit and leave it in
exactly the same state (up to a possible global phase factor)! It turns out
that if we do a controlled NOT operation on |±⟩ the eigenvalue of NOT is a
relative phase (not a global phase) that gets ‘kicked back’ onto the control.
The circuits shown in Fig. 5.2 illustrate the peculiar effect that results. It
seems that in the X basis the role of target and control are reversed!

H ⊗ H CNOT0 H ⊗ H = CNOT1 . (5.6)

That this reversal is due to the phase kickback can be seen from the following
1 1
CNOT0 | + +⟩ = [|0⟩ + |1⟩] ⊗ |0⟩ + (X ⊗ I) [|0⟩ + |1⟩] ⊗ |1⟩
2 2
= | + +⟩ (5.7)
1 1
CNOT0 | − +⟩ = [|0⟩ − |1⟩] ⊗ |0⟩ + (X ⊗ I) [|0⟩ − |1⟩] ⊗ |1⟩
2 2
1 1
= [|0⟩ − |1⟩] ⊗ |0⟩ − − [|0⟩ + |1⟩] ⊗ |1⟩
2 2
= | − −⟩ (5.8)
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 134

| q0 〉 = 0 H

| q1 〉

U
| qM 〉

Figure 5.3: Circuit applying a general controlled unitary to the state of M qubits.
If the M qubits are in a eigenstate of U with eigenvalue eiφ , the phase kickback
on the control rotates it around the z axis by angle φ.

The CNOT gate is a particular example of the general concept of a con-

trolled unitary operation, illustrated in Fig. 5.3. If the control qubit (q0 ) is
0, then the controlled unitary applies the identity. If the control qubit is 1,
then the M -qubit unitary U is applied to qubits 1 through M . In this circuit
the control qubit q0 is placed in the |+⟩ state by the Hadamard gate.
We are used to thinking about the (real-valued) eigenvalues and eigenvec-
tors of Hermitian observables. Here it is useful to think about the eigenvalues
and eigenvectors of the unitary operator U . Because U † U = I, it must be
that the eigenvalues of any unitary lie on the unit circle in the complex plane.
Thus in the basis of eigenvectors |ψj ⟩ of U with corresponding eigenvalues
eiφj , the matrix representation of U has the form
 
eiφ1 0 0 ...
 0 eiφ2 0 . . . 
U = 0 . (5.9)
 
iφ3
 0 e . . . 
.. .. .. ...
. . .

We see immediately that if qubits 1 through M are in the eigenstate |ψj ⟩,

the phase kicked back onto the control qubit is φj

CU [|ψj ⟩ ⊗ (α|0⟩ + β|1⟩)] = |ψj ⟩ ⊗ (α|0⟩ + βeiφj |1⟩). (5.10)

|{z} | {z }
target control

This phase kickback causes (and thus can be detected by) a rotation of the
control qubit by an angle φj around the z axis. For the case of the controlled-
NOT gate studied above, unitary is X and its eigenvalues are ±1, so the phase
kickback is either 0 or π.
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 135

Phase kickback will play a central role in a number of quantum algorithms

that we will study including the quantum Fourier transform and the phase
estimation algorithm.

5.2 Exponentiation Gadget

The exponentiation ‘gadget’ is the circuit illustrated in the left panel of
Fig. 5.4 exponentiates a unitary matrix U to carry out the unitary gate
θ θ θ
e−i 2 U = cos I − i sin U, (5.11)
2 2
provided that U is also Hermitian so that U = U † and thus U 2 = I. Multi-
qubit products of Pauli operators are natural examples of unitaries satisfying
this condition. Interestingly, despite being rotated in the middle of the cir-
cuit, the control qubit starts in 0 and ends in 0, merely playing the role of a
‘catalyst.’
Single qubit rotations around an axis on the Bloch sphere defined by an
arbitrary unit vector n̂
θ
e−i 2 n̂·⃗σ (5.12)

satisfy the Euler-Pauli identity in Eq. (5.11) and are generally natively avail-
able on quantum computing hardware. Hence we don’t usually need a special
exponentiation gadget to execute such gates. (Note that the exponentiation
gadget in fact uses one Xθ gate which is a single qubit rotation a round the x
axis.) However if U is a more complex (Hermitian) unitary such as the three-
qubit operation U = Z ⊗ X ⊗ Z, the ability to exponentiate it is unlikely
to be natively available on the hardware. If however we are able to execute
a controlled version of this unitary (conditioned on the state of a separate
control qubit), then we can take advantage of the exponentiation gadget in
Fig. 5.4.
To see in more detail how the exponentiation gadget works, let |ψ⟩ repre-
sent the state of qubits 1 through M . The state of the system after the first
controlled unitary is

1
√ |ψ⟩ ⊗ |0⟩ + U |ψ⟩ ⊗ |1⟩ . (5.13)
2
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 136

Using the Euler-Pauli identity, we see that after the Xθ gate the state of the
system is

1 θ θ θ θ
√ |ψ⟩ ⊗ [cos |0⟩ − i sin |1⟩] + U |ψ⟩ ⊗ [cos |1⟩ − i sin |0⟩] . (5.14)
2 2 2 2 2
Using the fact that U 2 = I, we see that after the second controlled unitary,
the state of the system is

1 θ θ θ θ
√ |ψ⟩⊗[cos |+⟩+ U |ψ⟩ ⊗[−i sin |+⟩] = cos I−i sin U |ψ⟩ ⊗|0⟩,
2 2 2 2 2
(5.15)
which reproduces Eq. (5.11). Notice that for θ = 0, this is the identity gate.
This makes sense because in this case, the Xθ gate in Fig. 5.4 is the identity
and the two controlled unitaries either act zero times or twice, in both cases
yielding identity for the overall circuit.
As a simple but practical example, suppose that U is the three-qubit
unitary U = Z ⊗ Z ⊗ Z. By itself, this operator cannot take a product state
into an entangled state. For example
U (|+⟩ ⊗ |+⟩ ⊗ |+⟩) = |−⟩ ⊗ |−⟩ ⊗ |−⟩ (5.16)
takes a product state to a product state. The fact that we end up with a
product state remains true even if we exponentiate the individual gates since
θ2 θ1 θ0

e−i 2 Z ⊗ e−i 2 Z ⊗ e−i 2 Z [|+⟩ ⊗ |+⟩ ⊗ |+⟩] =
θ2 θ1 θ0
e−i 2 Z |+⟩ ⊗ e−i 2 Z |+⟩ ⊗ e−i 2 Z |+⟩ . (5.17)

However, this product of exponentials is not the same as the exponential of

the product, since by the Pauli-Euler identity
θ θ θ
ZZZ(θ) = e−i 2 (Z⊗Z⊗Z) = cos I − i sin (Z ⊗ Z ⊗ Z) (5.18)
2 2
which (for generic θ) is an entangling gate for qubits q1 , q2 , q3 in the circuit
shown in Fig. 5.4
ZZZ(θ) [|+⟩ ⊗ |+⟩ ⊗ |+⟩] =
θ θ
cos [|+⟩ ⊗ |+⟩ ⊗ |+⟩] − i sin [|−⟩ ⊗ |−⟩ ⊗ |−⟩] . (5.19)
2 2
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 137

q0 0 H Xθ H 0 q0

q1 q1 Z
q2 Z
q3
U U
qM
Figure 5.4: Left panel: Circuit representing the exponentiation gadget that expo-
nentiates a unitary U provided that it is also Hermitian, U = U † . The single-qubit
θ
gate Xθ ≡ e−i 2 X is a rotation by angle θ about the x axis. Right panel: circuit to
synthesize the controlled ZZZ unitary U using three cZ gates.

This superposition of two product states is an entangled state (for generic θ ̸=

0, π). The controlled version of this three-qubit unitary is readily achieved
using three cZ gates as shown in the right panel of Fig. 5.4.
The exponentiation gadget can be used a variety of tasks, but is partic-
ularly useful for synthesizing complicated unitaries out of a series of simpler
gates. Physicists familiar with the concept of time evolution under a Hamil-
tonian will recognize Eq. (5.18) as giving the time evolution operator for a
spin model with three-spin interactions.
It is interesting to note that by inserting identity operations in the form
of pairs of Hadamard gates (using the identities H 2 = I, HXH = Z, and
HZH = X) and using the CNOT (equivalently CX) identity shown in the
bottom panels of Fig. 5.2, we can give a new, and very different looking
but fully equivalent, representation of the exponentiation gadget for the case
U = Z ⊗ Z ⊗ Z shown in Fig. 5.4. This representation, shown in Fig. 5.5
is intimately related to the circuit shown in Fig. 4.3 that allows for multi-
qubit Z measurements. The first portion of this gadget uses three CNOT
(CX) gates to map the value of Z1 Z2 Z3 onto Z0 (just as was done in the
measurement circuit in Fig. 4.3). The Zθ gate then rotates q0 by an angle
θ about the z axis. Then the last portion of the circuit ‘uncomputes’ the
mapping returning q0 to state |0⟩. The Zθ rotation in the middle of the circuit
creates the desired exponential unitary given in Eq. (5.18). In general, any
Hermitian unitary (e.g., an arbitrary Pauli operator string) automatically
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 138

Figure 5.5: Upper panel: The exponentiation gadget shown in the left panel of
Fig. 5.4, for the particular case of U = Z1 Z2 Z3 . Lower panel: A completely
equivalent alternative circuit, reminiscent of the multiqubit measurement circuit
in Fig. 4.3.

has only eigenvalues ±1 and can be mapped onto a single auxiliary qubit
using this trick and thus any such unitary can be exponentiated.

5.3 Quantum Oracles

Oracle, a person (such as a priestess of ancient Greece) through whom a deity
is believed to speak.
From the Latin oraculum, from orare, ‘to speak’

A quantum oracle is a ‘blackbox’ whose inner workings may be inacces-

sible to the user. Oracles are often used as inputs to quantum algorithms.
In particular they are used to bring classical data into a quantum computa-
tion and can also be used inside quantum algorithms to evaluate functions.
We will soon learn about the Grover algorithm to search an unsorted classi-
cal database. The algorithm requires an oracle to make ‘calls’ to query the
data base. Typically an oracle is a unitary operation (no measurements are
involved that would bring in partial state collapse) as illustrated in Fig. 5.6.
As an example, let ⃗b = (bn−1 bn−2 . . . b2 b1 b0 ) be a classical binary bit string
of length n and let f be a classical binary function: f (⃗b) = ⃗c, where ⃗c is also
a binary bit string of the same length n. If we want, we can just think of
this function as a bunch of classical data as illustrated in Table 5.1
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 139

| ψ in 〉  | ψ out 〉

classical
data
Figure 5.6: Circuit representation of a generic oracle O that performs a unitary
transformation based on (i.e., parametrized by) classical data supplied to it..

⃗b ⃗c
000 001
001 010
010 011
011 100
100 101
101 110
110 111
111 000

Table 5.1: Data defining the binary function f (⃗b) = ⃗c, that is equivalent to F (k) = (k + 1)
mod 8 for integer k ∈ {0, 7}.

How do we turn this classical binary function into a quantum oracle?

Let the quantum state |ψin ⟩ = |bn−1 bn−2 . . . b2 b1 b0 ⟩ in the standard basis
represent the input to the oracle (and to the function), and let the quantum
state |ψout ⟩ = |cn−1 cn−2 . . . c2 c1 c0 ⟩ represent the output of the oracle (and the
function). Then the action that we require of the unitary oracle is

O|⃗b⟩ = |⃗c⟩ = |f (⃗b)⟩. (5.20)

This seems straightforward, but actually there is a problem if f is not
invertible (that is, f (⃗x) = f (⃗y ) for some ⃗x ̸= ⃗y ). If this is the case then
O|⃗x ⟩ = O|⃗y ⟩ (5.21)
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 140

implies that

⟨⃗x |O† O|⃗y ⟩ = 1, (5.22)

even though ⟨⃗x |⃗y ⟩ = 0 because ⃗x ̸= ⃗y . Hence O cannot be unitary because

it does not preserve inner products. Equivalently, it is not invertible and, by
definition, unitaries are invertible since U −1 = U † .
Here is a second problem: the length m of the output string for a generic
function need not be the same as the length n of the input string. We can fix
this problem and at the same time guarantee that O is unitary by the circuit
construction illustrated in Fig. 5.7. This circuit executes the transformation

⃗ ⊗ |⃗b⟩) = |d⃗ ⊕ f (⃗b)⟩ ⊗ |⃗b⟩,

O(|d⟩ (5.23)

where ⊕ represents bitwise addition modulo 2 (bitwise XOR) as defined in

Eq. (1.48). For example,

(11010111) ⊕ (01011100) = (10001011). (5.24)

This construction should look familiar because it is exactly analogous to that

used in the middle panel of Fig. 1.6 to construct a reversible AND gate. The
input to the function is available at the output so that the gate is reversible.
Recalling the argument used for the reversible AND gate, notice that
⃗ ⊗ |⃗b⟩) = |d⃗ ⊕ f (⃗b) ⊕ f (⃗b)⟩ ⊗ |⃗b⟩ = |d⟩
O2 (|d⟩ ⃗ ⊗ |⃗b⟩. (5.25)

For the AND gate, the particular function used had two input bits and one
output bit: f (x, y) = x ∧ y, but the reversibility argument applies to any
binary function. This guarantees that O2 = I and thus the inverse exists
O−1 = O. This also shows that the only allowed eigenvalues of O are ±1.
Furthermore, if we represent O as a matrix acting on a column vector
⃗ ⊗ |⃗b⟩ it will yield a new vector representing |d⃗ ⊕ f (⃗b)⟩ ⊗ |⃗b⟩.
representing |d⟩
Similarly,

O(|d⃗ ⊕ f (⃗b)⟩ ⊗ |⃗b⟩) = |d⟩

⃗ ⊗ |⃗b⟩, (5.26)

shows us that the matrix representing O is symmetric. Because the quantum

⃗ ⃗b, f (⃗b) are all taken
amplitudes of the states defined by the binary strings d,
to be real, it is straightforward to show that the matrix representing O is
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 141

real and symmetric, and therefore it is Hermitian. Hence O† = O = O−1

proving that O is unitary. An alternative proof involves showing that O
maps orthogonal states to orthogonal states. Let
|ψ⟩ = |d⟩⃗ ⊗ |⃗b⟩, (5.27)
|ψf ⟩ = |d⃗ ⊕ f (⃗b)⟩ ⊗ |⃗b⟩, (5.28)
|ψ ′ ⟩ = |d⃗′ ⟩ ⊗ |⃗b′ ⟩, (5.29)
|ψ ′ ⟩ = |d⃗′ ⊕ f (⃗b′ )⟩ ⊗ |⃗b′ ⟩.
f (5.30)
Then one can readily show that
⟨ψ ′ |ψ⟩ = 0 =⇒ ⟨ψf′ |ψf ⟩ = 0. (5.31)

For the case ⃗b′ ̸= ⃗b, the orthogonality follows immediately, independent of
any property of the function f . For the case ⃗b′ = ⃗b, d⃗′ ̸= d,
⃗ it follows from
f (⃗b) = f (⃗b ) that |ψf ⟩ and |ψf ⟩ involve different bit strings d⃗ ⊕ f (⃗b) and
′ ′

d⃗′ ⊕ f (⃗b) and are therefore orthogonal. This together with the eigenvalues of
O being ±1 also proves unitarity.
Thus we have found a construction for creating a unitary oracle based on
classical data supplied in the form of classical binary function f , even if that
function is itself not reversible. The reversibility relies on f (⃗b) ⊕ f (⃗b) = ⃗0
which is true for any binary function.

 q0 
| b〉 | b〉
qn

 qn +1    
| d〉 | c 〉= | d ⊕ f (b )〉
qn + m

classical
data

Figure 5.7: Circuit representation of an oracle representing a function whose domain

(input) is bit strings of length n and whose range (output) is bit strings of length m.
⊕ represents bitwise addition mod 2 (bitwise XOR). The oracle gate is guaranteed
to be unitary and reversible even if the classical function f is not.

Again, the simplest example of all this is the AND gate, or equivalently,
the Toffoli gate (or CCNOT gate) shown in Fig. 1.6. This gate can be taken
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 142

directly from the (reversible version of the) classical gate into the quantum
gate construction shown in Fig.5.7 whose action is defined by Eq. (5.23).
Exercise 5.1. Construct a quantum oracle corresponding to the clas-
sical function that adds two one-bit binary numbers (a0) and (b0) to
obtain one two-bit binary number (c1 c0).

5.4 Deutsch Algorithm

We are now ready to study our first algorithm. As mentioned in the in-
troduction to this chapter, in 1985 David Deutsch developed the very first
quantum algorithm. It only solves a ‘toy’ problem, but it opened the door
to a new world. Let us now enter that new world.
Here is the problem setup: Alice creates an oracle for a binary function
f of her choosing that maps one bit to one bit (i.e., the input and output
string lengths are n = m = 1). Bob’s task is to query the oracle with
his quantum computer to acquire information about the (unknown to him)
function. Recall from the list in Table 1.3 and Eq. (1.41) that for n = m = 1,
the number of such functions is 22 = 4. Two of the functions (ONE and
ZERO) have constant output and the other two (IDENTITY and NOT)
have balanced outputs (i.e., 0 and 1 occur equally often).
Bob’s task is not to find the full function encoded in the oracle (a task
which would require learning two classical bits of information that distinguish
the four different functions), but simply to learn one bit of information that
tells him if the function is constant or balanced. Classically Bob must query
the oracle twice to identify the full function f before being able to determine
which category the function is in. For example, suppose Bob asks the oracle,
‘What is f (0)?’ and the oracle replies ‘1.’ As a result Bob knows that either
f = ONE (constant) or f = NOT (balanced), but he does not know which.
Bob is therefore forced to make a second query, ‘What is f (1),’ to resolve
the uncertainty. The answer tells him exactly which function f is and thus
which category it is in. He has learned two bits of information which is more
than he needed, but there was no way to avoid this.
If Bob has a quantum computer he can use the Deutsch algorithm to make
a single query to the oracle but with a ‘superposition of questions’ that will,
through quantum interference, give Bob the correct answer to the question
of whether the function is constant or balanced. The circuit for this is given
in the Fig. 5.8. To see how this works notice that Bob does not require the
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 143

individual values of f (0) and f (1) but only needs to know one bit of global
information about the function, namely the value of g = f (0) ⊕ f (1) since:
g = 0 ⇐⇒ f is CONSTANT (5.32)
g = 1 ⇐⇒ f is BALANCED. (5.33)
Again it is clear that classically one has to query the oracle twice to learn
the value of this global property of the function. While this is only a toy
problem, it is remarkable that one can learn a global property of a function
with only a single (quantum) query!
Let us now see how the Deutsch quantum algorithm works. Since this is
a quantum oracle, Bob is free to give a superposition state as input
1
|ψin ⟩ = |d⟩ ⊗ |+⟩ = |d⟩ ⊗ √ |0⟩ + |1⟩ , (5.34)
2
where d is the bit value at the input to the oracle (on what will be the
eventual output line) as shown in Fig. 5.8. It follows from the linearity of
unitary transformations that the output state is
1
|ψout ⟩ = √ |d ⊕ f (0)⟩ ⊗ |0⟩ + |d ⊕ f (1)⟩ ⊗ |1⟩ . (5.35)
2
We see that there is information about both f (0) and f (1) in the output
state even though we have queried the oracle only once. We have asked a
‘superposition of questions.’
Our task is to now harness this result to achieve Bob’s goal. Unfortu-
nately, if we stop here and measure the state in the computational basis, it
will collapse randomly into either the state

|ψout ⟩ = |d ⊕ f (0)⟩ ⊗ |0⟩, (5.36)

or the state

|ψout ⟩ = |d ⊕ f (1)⟩ ⊗ |1⟩. (5.37)

From the measurement result (and knowing the initial state of the d qubit)
we will randomly learn either the value of f (0) or the value of f (1). Even
though the state before the measurement contained information about both
f (0) and f (1), the measurement has not captured the global information we
seek.
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 144

To remedy this situation we need to put both the b and d input qubits
into superposition
|ψin ⟩ = |−⟩ ⊗ |+⟩
1
= |0⟩ ⊗ |0⟩ + |0⟩ ⊗ |1⟩ − |1⟩ ⊗ |0⟩ − |1⟩ ⊗ |1⟩ , (5.38)
2
for which the oracle yields
1
|ψout ⟩ = |0 ⊕ f (0)⟩ ⊗ |0⟩ + |0 ⊕ f (1)⟩ ⊗ |1⟩
2
−|1 ⊕ f (0)⟩ ⊗ |0⟩ − |1 ⊕ f (1)⟩ ⊗ |1⟩ . (5.39)
To make further progress, let us consider the two cases:

Case I: f is constant.

In this case f (0) = f (1) and f (0) ⊕ f (1) = 0. Thus we can rewrite the
output state as

|0 ⊕ f (0)⟩ − |1 ⊕ f (0)⟩ |0⟩ + |1⟩
|ψout ⟩ = √ ⊗ √ (5.40)
2 2
= (±|−⟩) ⊗ (|+⟩) . (5.41)
Note that because the function f is constant, the state |c⟩ is going to be
±|−⟩. The (unobservable) ± phase factor is determined by whether the con-
stant function is ONE or ZERO. As we will see shortly, the information Bob
seeks lies in the state |b⟩.

Case II: f is balanced.

In the case that f is balanced, we know that f (0)⊕f (1) = 1 which implies
0 ⊕ f (0) = 1 ⊕ f (1) (5.42)
0 ⊕ f (1) = 1 ⊕ f (0). (5.43)
Using these relations, Eq. (5.39) can be rewritten
1
|ψout ⟩ = √ |0 ⊕ f (0)⟩ ⊗ |−⟩ + |0 ⊕ f (1)⟩ ⊗ (−|−⟩) (5.44)
2
= (±|−⟩) ⊗ (|−⟩) . (5.45)
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 145

In this case, the (unobservable) ± global phase factor depends on whether

the balanced function is IDENTITY or NOT. We see that, in contrast to
Eq. (5.41), the state of |b⟩ has now changed to the |−⟩ state. This means
that Bob will therefore be able to determine from a measurement of b in the
x basis whether the function is balanced or constant, after only a single query
of the oracle.
Before moving on, it will be useful to repeat the above analysis in a
new way that will prove useful when we study the Deutsch-Jozsa algorithm.
(The analysis here follows that in Nielsen and Chuang except they use the
opposite ordering for the qubit labels.) The input state to the algorithmic
circuit shown in Fig. 5.8 is |−⟩ ⊗ |+⟩ (in the number convention where the
top wire is qubit q0 and the bottom qubit is q1 ). The state after the oracle
acts is
( )
|0⟩ − |1⟩ 1 X
|ψ1 ⟩ = Of √ ⊗√ |x⟩
2 2 x=0,1
X 1 1
= √ |0 ⊕ f (x)⟩ − |1 ⊕ f (x)⟩ ⊗ √ |x⟩
x=0,1
2 2
1 X
=√ (−1)f (x) |−⟩ ⊗ |x⟩
2 x=0,1
1 X
= |−⟩ ⊗ √ (−1)f (x) |x⟩ (5.46)
2 x=0,1

Note that the dummy variable x here ranges over {0, 1} and does not repre-
sent anything to do with eigenvectors of the Pauli X operator.
To find the state after the final Hadamard gate, it is useful to invoke the
following handy identity which can be readily verified by hand for the two
cases x = 0 and x = 1
1 X
H|x⟩ = √ (−1)xz |z⟩. (5.47)
2 z=0,1

Using this identity, the output state of the circuit is

1 X X
|ψ2 ⟩ = |−⟩ ⊗ (−1)f (x)+xz |z⟩. (5.48)
2 z=0,1 x=0,1
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 146

Let’s look at the quantum amplitude az to find qubit q0 in state |z⟩

1 X
a0 = (−1)f (x) , (5.49)
2 x=0,1
1 X
a1 = (−1)f (x)+xz . (5.50)
2 x=0,1

Notice that if f is constant (so that f (x) = f (0)) we have

a0 = (−1)f (0) = ±1, (5.51)

1 X
a1 = (−1)f (0) (−1)x = 0. (5.52)
2 x=0,1

For this case the measurement result is guaranteed to yield z = 0 since

|a0 |2 = 1. Conversely if f is balanced, then the summation
X
(−1)f (x) = 0, (5.53)
x=0,1

because the summand is +1 and −1 equally often. Thus z = 0 never occurs

for the balanced case and the measurement result must always be z = +1. In
fact, it is straightforward to show that for the balanced case the amplitude
to end up in state |−⟩ ⊗ |1⟩ is
1 X
a1 = (−1)f (x)+x = (−1)f (0) = ±1. (5.54)
2 x=0,1

These results are perfectly consistent with Eqs. (5.41) and (5.45) once you
take into account that the final Hadamard gate that is included in Fig. 5.8
but not present in Fig. 5.7.
Exercise 5.2. The oracle for the Deutsch algorithm encodes one of
four possible functions, ZERO, ONE, IDENTITY, and NOT. Construct
explicit quantum circuits to realize each of these oracle functions.

5.5 Deutsch-Jozsa Algorithm

The Deutsch algorithm showed us that it is possible to learn a global prop-
erty of a function that maps one classical bit into another classical bit by
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 147

b = + H
f
d = − c = ± −

Figure 5.8: Circuit that executes the Deutsch algorithm to determine with a single
query whether the function f is constant or balanced. The classical data defining
the function f is encoded in the oracle Of . Measurement of qubit b yielding result
z = +1 tells us that f is constant, while measurement result z = −1 tells us that
f is balanced.

encoding the function in a quantum oracle and querying the oracle only a
single time. This is a toy model but a highly instructive one. The Deutsch-
Jozsa algorithm also involves a toy problem, but a more sophisticated one,
designed to show off an exponential separation between the power of the
best deterministic classical algorithm for the problem and a simple quantum
algorithm.
From our study of the Deutsch algorithm we know that there are four
binary functions f : {0, 1} → {0, 1} and that two of them are constant and
two of them are balanced. For Deutsch-Jozsa we will study functions that
map n bits to one bit: f : {0, 1}n → {0, 1} where the situation is more
complicated. Let ⃗x, ⃗y be vectors in {0, 1}n (i.e., binary strings of length n).
Alice chooses a function f such that f (⃗x) = 0 (say) for some ⃗x and f (⃗y ) = 1
(say) for some ⃗y . Recall that the number of functions on n bits into m bits is
n n
Z(n, m) = 2m2 . In this case m = 1 so Z = 22 . Out of this enormous space
of possible functions that Alice could choose, only two of them are constant,
ZERO : f (⃗x) = 0 ∀⃗x, (5.55)
ONE : f (⃗x) = 1 ∀⃗x. (5.56)
A balanced function is defined as before. Of the 2n possible input strings, it
outputs 0 for half of its 2n possible arguments and outputs 1 for the other
half of its arguments. If we look at the example of n = 2 which is listed
2
in its entirety in Table 1.4, there are Z(2, 1) = 22 = 16 possible functions.
Of these, two are constant, 6 are balanced and 8 are neither constant nor
balanced. For larger n, the vast majority of the functions are neither constant
nor balanced. (See Ex. 5.3).
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 148

In the Deutsch-Jozsa problem, Alice gives Bob a promise that the function
f she has chosen for her oracle maps n bits to one bit and is either constant
or balanced. (Note that since most functions are neither, this is an important
promise.) Bob’s task is to discover where the f is constant or balanced The
circuit for the Deutsch-Jozsa algorithm is essentially the same as in Fig. 5.8,
except the upper wire is replaced by n input wires in a product state in which
each qubit in the |+⟩ state. At the output, a Hadamard is applied to each of
the upper wires followed by a Z measurement of each of the n upper wires.
We begin our analysis of the circuit by noting that the input product
state of the upper n wires an be written as an equal superposition of all
computational basis states
1 X
|+⟩⊗n = √ |⃗x ⟩, (5.57)
2n ⃗x

where ⃗x is a binary string of length n and the sum is over all 2n such strings.
The analog of Eq. (5.46) for the state of the circuit after application of
the oracle is thus
( )
|0⟩ − |1⟩ 1 X
|ψ1 ⟩ = Of √ ⊗√ |⃗x ⟩
2 2n ⃗x
X 1 1
= √ |0 ⊕ f (⃗x)⟩ − |1 ⊕ f (⃗x)⟩ ⊗ √ |⃗x ⟩
⃗
x
2 2n
1 X
= |−⟩ ⊗ √ (−1)f (⃗x) |⃗x ⟩, (5.58)
n
2 ⃗x

The analog of Eq. (5.47) is

1 X
H ⊗n |⃗x ⟩ = √ (−1)⃗x.⃗z |⃗z⟩, (5.59)
n
2 ⃗z

where the ‘inner product’ is computed bitwise modulo 2

n −1
2X
!
⃗x.⃗z = xj zj mod 2. (5.60)
j=0

See App. B.4 for a discussion of the inner product for the vector space of bit
strings.
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 149

Using this identity, the output state of the circuit before the final mea-
surements is
1 XX
|ψ2 ⟩ = |−⟩ ⊗ n (−1)f (⃗x)+⃗x.⃗z |⃗z⟩. (5.61)
2
⃗
x ⃗
z

By analogy with what we did in the case of the Deutsch algorithm, let us
look at the amplitude of the state |−⟩ ⊗ |⃗z = ⃗0⟩
1 X
a0 = n
(−1)f (⃗x) . (5.62)
2
⃗
x

⃗
If f is constant a0 = (−1)f (0) = ±1 and so all other amplitudes vanish, and
the only possible measurement outcome is the all zeroes bit string ⃗0. If f is
balanced then a0 = 0 and the all zeroes measurement result never occurs.
Thus if any of the measurement results is non-zero, we are guaranteed that
the function is balanced, and if all the measurement results are zero, we are
guaranteed that the function is constant.
Just as for the Deutsch algorithm, a single query of the quantum oracle
tells us the global property of the function. Thus the performance of the two
quantum algorithms is essentially the same. The difference here lies in the
difficulty of the classical algorithm to solve these two problems. The classi-
cal algorithm requires only two oracle queries to solve the Deutsch problem
with certainty. Indeed two queries is enough for Bob to completely specify
the function Alice chose, not just whether it is constant or balanced. The
Deutsch-Jozsa problem is however much harder since it maps n bits to one
bit rather than just 1 bit to 1 bit. To completely learn Alice’s function, Bob
would have to query the function 2n times, once for each possible argument
⃗x of the function to learn all of the f (⃗x) values. How many queries must
Bob make to learn only whether the function is constant or balanced? The
worst-case classical scenario is when the function is such that Bob queries
the oracle 2n /2 times (with different arguments) and gets the same result
(say 0) every time. This means the function could be constant (if all of the
remaining queries return 0) or it could be balanced if all of the remaining
queries return 1). Since Alice has given Bob a promise that the function is
either constant or balanced, he needs to measure only one more value of the
function to be certain of the answer. Thus the so-called ‘query complexity’
of the best classical algorithm is 2n /2 + 1 is exponentially larger than that
of the quantum algorithm.
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 150

Interestingly, one can use a slightly different measure of the degree of dif-
ficulty of the classical algorithm and obtain a very different answer. Suppose
that we seek a classical algorithm which is stochastic (i.e., involves random-
ness in some way) and produces the correct answer with probability 1 − ϵ.
We can ask how the query complexity grows as we make the acceptable error
probability ϵ smaller and smaller. It turns out that Turing machines (uni-
versal classical computers) that contain a source of (true) randomness can
create stochastic algorithms that are more powerful than purely determin-
istic Turing machines. To understand this, suppose that Alice and Bob are
adversaries and Alice is trying to choose functions f that make Bob’s task as
difficult as possible. If she knows that Bob always deterministically orders
a certain way (say first querying f (0000), then f (0001), then f (0010), etc.
in ascending order), then she can choose functions that produce the worst-
case scenario discussed above. Bob however can defeat this strategy if his
queries use truly randomly chosen (but non-repeating) arguments ⃗x. If he
ever sees any two values of the function that are different from each other,
he knows that the function is balanced. If the function is balanced and the
arguments ⃗x are chosen randomly, then any given query result f (⃗x) is equally
likely to be 0 or 1. Suppose Bob is unlucky and after M queries he has seen
f (⃗x) = 1 (say) every time, strongly suggesting to Bob that the function is
constant. The probability that a balanced function would give this result
(thereby fooling Bob)

Pfail (M ) = ϵ = 2 × 2−M , (5.63)

where the extra factor of 2 is from the fact that the first M measurement
results could be all 0’s or all 1’s. This equation is reminiscent of Eq. (4.70)
presented in the discussion of cloning and superluminal communication.
Note that Eq. 5.63 assumes that ⃗x is chosen randomly and allows for the
possibility that the same bit string could be chosen more than once. It would
be (slightly) better if Bob chose ⃗x randomly and then removed it from his
list so that there would be no repeats. Then the failure probability would
be slightly lower because he can’t be fooled by the same bit strings a second
time. Imagine you have a box filled with N = 2n numbers, half of them being
0 and half of them one. The probability that the first one you draw from
the box is a 0 is NN0 = 12 , where N0 = N/2 is the initial number of 0’s. After
discarding that number, the probability that the second number is also a 0
is N0N−1 = 21 − N1 . Continuing this process we see that the failure probability
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 151

in Eq. 5.63 should replaced by

M −1
−M
Y 2j
P̃fail (M ) = 2 × 2 1− , (5.64)
j=0
N
M −1
Y 2j M (M −1)
−M
≈2×2 e− N ≈ 2 × 2−M e− N . (5.65)
j=0

where we have made the reasonable assumption that M ≪ N since we do

not need to consider large M to drive the failure probability extremely low.
The rule of not repeating any queries with the same ⃗x makes the failure
probability (slightly) smaller.
These results show us that there is a huge reduction in query complexity
if you are willing to accept a tiny (exponentially small) chance of failure.
Another way to say it is that the average case is much easier classically than
the worst case.
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 152

Exercise 5.3. Show that the number of balanced functions

f : {0, 1}n → {0, 1}

is given by the binomial coefficient

n
2
.
2n−1

Using the Stirling (asymptotic) approximation for the factorial of a num-

ber,
1 1
ln M ! ∼ (M + ) ln M − M + ln(2π) + . . . ,
2 2
show that the fraction F (n) of all functions that are balanced rapidly
becomes small as n increases, and asymptotically approaches
2
F (n) ∼ √ .
2π2n
An intuitive way to think about this result is the following. Rather
than considering all possible functions f , consider a single example of f
whose value f (⃗x) = ±1 is chosen randomly for each
P value of its argument
⃗x. Then the probability distribution for S = ⃗x f (⃗x) will, to a good
approximation, be a Gaussian with mean N/2 and variance N/4 with
N = 2n s
1 − 1 (S−N/2)2
P (S) ≈ e 2(N/4) .
2π(N/4)
Note that for N ≫ 1 the Gaussian is such aPslowly varyingRfunction that
+∞
we can replace the normalization condition N S=0 P (S) by −∞ dS P (S).
The probability of the function being balanced is
2
F (n) = P (N/2) ≈ √ .
2π2n

5.6 Bernstein-Vazirani Problem

[The discussion here follows Riefel and Polak, Sec. 7.5.2.]
The Bernstein-Vazirani problem bears some relationship to the Deutsch-
Jozsa problem that we studied previously and is defined in the following way.
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 153

Alice creates a function f : {0, 1}⊗n → {0, 1} of the form

f (⃗x) = ⃗u.⃗x, (5.66)

where

⃗u = (un−1 un−2 . . . u2 u1 u0 ) (5.67)

is a fixed binary vector that Alice chooses. Bob’s task is to learn ⃗u with the
smallest possible number of queries of the function f . Before studying the
quantum algorithm, let us think about the query complexity in the classical
case. The vector ⃗u is unknown to Bob since Alice has chosen it. Bob can
query the function with a sequence of input vectors each of which contains
only a single non-zero entry

⃗x0 = (00 . . . 001)

⃗x1 = (00 . . . 010)
⃗x2 = (00 . . . 100)
...
⃗xn−2 = (01 . . . 000) (5.68)
⃗xn−1 = (10 . . . 000), (5.69)

and with each query will learn one bit in the string ⃗u because

f (⃗xj ) = uj . (5.70)

Since Alice could have chosen the bits in the string ⃗u at random, there is no
shortcut that can reduce the classical query complexity below n, the length
of the bit string.
Remarkably, the Bernstein-Vazirani quantum algorithm can find ⃗u (with
certainty) with only a single query to the quantum oracle that encodes the
function f ! The circuit is the same one used for the Deutsch-Jozsa algorithm
and the output is given by Eq. (5.61). However because of the particular
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 154

The last line follows from the fact that if ⃗y = ⃗u ⊕ ⃗z ̸= ⃗0 then the function
g(⃗x) = ⃗y .⃗x is necessarily balanced (see Box 5.1), meaning that
X
(−1)g(⃗x) = 0 (5.72)
⃗
x

for all ⃗y ̸= ⃗0. Because ⃗y = ⃗u ⊕ ⃗z = ⃗0 ⇐⇒ ⃗z = ⃗u, the only term that

survives the sum over ⃗x is ⃗z = ⃗u. Thus the output measurements from the
circuit immediately give us the unknown vector ⃗u after only a single call to
the oracle thereby providing a polynomial (in n) quantum advantage.
David Mermin provided an interesting alternative view of the Bernstein-
Vazirani circuit based on constructing the actual quantum circuit shown
in Fig. 5.9 that realizes the oracle. Using this oracle with the b register
initialized in |+⟩⊗n and d0 initialized in |−⟩ as shown in the left panel of
Fig. 5.10 executes the Bernstein-Vazirani algorithm analyzed above. We can
gain a new understanding of the algorithm by inserting identity operators
in the form of pairs of Hadamard gates on each side of each CNOT gate.
Using the identity from Eq. (5.6) illustrated in the lower panel of Fig. 5.2
yields the equivalent circuit shown in the right panel of Fig. 5.10. Now we see
immediately that because d0 is in state |1⟩, the algorithm works by simply
mapping the b register from its input value ⃗b = ⃗0 to the output value ⃗b = ⃗u.
No quantum superpositions appear anywhere in this version of the algorithm!
Alice has tried to challenge Bob by hiding a series of CNOTs inside the oracle
that are controlled by a random (unknown to Bob) subset of the input qubits
and that all target the output qubit. It seems like it would be impossible for
Bob to meet Alice’s challenge in a single query of the oracle. However Bob
can cleverly turn the tables on Alice by working in the X basis, interchanging
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 155

the control and target qubits, thereby giving himself control of the situation
and flipping only those input bits j for which uj = 1, thereby decoding Alice’s
hidden bit string in a single query of the oracle!

Box 5.1. Balanced Function Lemma

Let ⃗x and ⃗y be bit strings of length n. Define the function g : {0, 1}n → {0, 1}
via
n−1
!
X
g(⃗x) = ⃗y .⃗x = y j xj mod 2. (5.73)
j=0

Lemma: g(⃗x) is a balanced function (outputs 0 and 1 an equally often)

provided ⃗y ̸= ⃗0.
Proof: Since ⃗y ̸= ⃗0, ⃗y has at least one component that is non-zero (i.e.
is unity). Let the index of the first (counting from right to left) non-zero
component of ⃗y be k. Then every vector ⃗x has a unique partner ⃗x ′ which
differs from ⃗x only at position k. It follows immediately that

g(⃗x) + g(⃗x ′ ) = 1
′
(−1)g(⃗x) + (−1)g(⃗x ) = 0
X
(−1)g(⃗x) = 0, (5.74)
⃗
x

and thus g(⃗x) is balanced. Boxes 5.2 and 5.3 present more complex but
interesting and informative alternative proofs.
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 156

b0
b1
b2
b3
b4
b5
 
d0 d0 ⊕ u. b

Figure 5.9: Circuit realization of the Bernstein-Vazirani oracle for the function
f (⃗b) = ⃗u.⃗b with (in this case) ⃗u = (u5 u4 u3 u2 u1 u0 ) = (101101). The CNOT gates
all have d0 as their target, but the controls are limited to those wires j for which
uj = 1. Thus the CNOT will flip d0 only if bj uj = 1 and the net result after all
the CNOTs have been applied is d0 → d0 ⊕ ⃗u.⃗b.

Figure 5.10: Left panel: Circuit realizing the Bernstein-Vazirani oracle and al-
gorithm for the function f (⃗b) = ⃗u.⃗b with (in this case) ⃗u = (u5 u4 u3 u2 u1 u0 ) =
(101101). The CNOT gates all have d0 as their target, but the controls are limited
to those wires j for which uj = 1. Thus the CNOT will flip d0 only if bj uj = 1
and the net result after all the CNOTs have been applied is d0 → d0 ⊕ ⃗u.⃗b. Right
panel: Mermin’s analysis of the same circuit obtained by conjugating each CNOT
with identity operations H 2 = I. The controls are now all d0 = 1 and the targets
are those input lines j for which uj = 1. Thus the output register maps to ⃗b = ⃗u.
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 157

Box 5.2. Balanced Function Lemma Alternative Proof I: Let ⃗x and

⃗y be bit strings of length n. Define the function g : {0, 1}n → {0, 1} via
n−1
!
X
g(⃗x) = ⃗y .⃗x = y j xj mod 2. (5.75)
j=0

Lemma: g(⃗x) is a balanced function provided ⃗y ̸= ⃗0.

Proof: Note that the following summation over all bit strings ⃗x contains
2n terms (and that for this particular calculation we can drop the modular
arithmetic)
 
X X X n−1
Y X
P
Q≡ (−1)g(⃗x) = ... (−1) j xj yj =  (−1)xj yj 
⃗
x x0 =0,1 xn−1 =0,1 j=0 xj =0,1
(5.76)

Let S be the set of indices j for which yj = 1. Since ⃗y ̸= ⃗0, this set is not
empty. Let S̄ be its complement (i.e., the set of indices k for which yk = 0).
Then we can rewrite the product above as two products
!  
Y X Y X
Q= (−1)0  (−1)xj  . (5.77)
k∈S̄ xk =0,1 j∈S xj =0,1

Each term in the second product vanishes. Hence Q = 0 and thus g is

balanced. QED.
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 158

Box 5.3. Balanced Function Lemma Alterantive Proof II: Let ⃗x and
⃗y be bit strings of length n. Define the function g : {0, 1}n → {0, 1} via

g(⃗x) = ⃗y .⃗x. (5.78)

Lemma: g(⃗x) is a balanced function provided ⃗y ̸= ⃗0.

Proof: note that the following summation over all possible bit strings ⃗x
contains 2n terms
n−1
!
X
g(⃗x) = yj xj mod 2 = m mod 2, where (5.79)
j=0
X
m≡ xj , (5.80)
j∈S

and the summation in the last line is only over the set S of j values for which
yj = 1. Since ⃗y ̸= ⃗0, this set is not empty. Let k be the cardinality of the
set S (i.e., the number of non-zero bits in ⃗y ). The complement of this set, S̄
has cardinality n − k (i.e., the number of zero bits in ⃗y is n − k). The value
of m can range from 0 (if for all the j ∈ S, xj = 0) to k (if for all the j ∈ S,
xj = 1). The values of xj for j ∈ S̄ do not contribute since the corresponding
yj ’s vanish. Using these facts we can write

X k
X
(−1)g(⃗x) = Gm (−1)m , (5.81)
⃗
x m=0

where Gm is the number of times that a given value of m occurs when sum-
ming over ⃗x

n−k k
Gm = 2 . (5.82)
m

The power of 2 is the contribution from the sum in the LHS of Eq. (5.81) over
the (n − k) components of ⃗x in S̄, and the binomial factor comes from the
sum over the k components of ⃗x in S subject to the constraint in Eq. (5.80);
i.e., it is the number of distinct ways of distributing m 1’s and (k − m) 0’s
among the k positions within the set S. Finally we obtain
k
X
g(⃗
x) n−k
X
k−m m k
(−1) =2 (+1) (−1) = 2n−k [1 + (−1)]k = 0.
m
⃗
x m=0
(5.83)

Hence g is balanced. QED.

CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 159

5.7 Simon’s Algorithm

Simon’s algorithm solves another toy problem but, as noted in Box 5.5, has
(unlike Deutsch-Jozsa) the distinction of providing exponential oracle sep-
aration from BPP. While it only solves a toy problem, Simon’s algorithm
inspired important algorithms based on the quantum Fourier transform and
Shor’s factoring algorithm.
The problem statement is the following:
1. Given a binary function f⃗ : {0, 1}n → {0, 1}n and

2. Given a guarantee that f⃗ is either one-to-one or two-to-one, and

3. Given a further guarantee that
f⃗(⃗x1 ) = f⃗(⃗x2 ) ⇐⇒ ⃗x2 = ⃗x1 ⊕ ⃗b, for fixed ⃗b,

4. Task: find the ‘hidden’ bit string ⃗b. (Note that f⃗ is one-to-one ⇐⇒
⃗b = ⃗0.)

We write f⃗ as a vector-valued function because it maps the set of binary

vectors of length n into itself. Table 5.2 gives three different examples of
functions for the case n = 2 and ⃗b = (1, 0).

⃗x ⃗x ⊕ ⃗b f⃗1 (⃗x) f⃗2 (⃗x) f⃗3 (⃗x)

00 10 00 10 01
01 11 11 01 11
10 00 00 10 01
11 01 11 01 11
Table 5.2: Three example functions obeying the guarantees required for Simon’s problem
with n = 2 and ⃗b = (1, 0).

Classical Query Complexity for Simon’s Problem

Let us begin our analysis of Simon’s problem by considering how hard it is
to solve classically. That is, what is the query complexity? Suppose that
(somehow) we discover two inputs with the same output
f⃗(⃗x1 ) = f⃗(⃗x2 ). (5.84)
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 160

Then we know that the function is two-to-one and we can easily solve for the
unknown ⃗b using the identity

⃗x1 ⊕ ⃗x2 = ⃗x1 ⊕ (⃗x1 ⊕ ⃗b)

= (⃗x1 ⊕ ⃗x1 ) ⊕ ⃗b = ⃗b. (5.85)

Thus is we can find a matching pair of inputs, we have solved the problem.
Note that if we cannot find a matching pair, then we have shown that the
function is one-to-one and have also solved the problem (since ⃗b = ⃗0 in this
case).
How hard is it to find a pair with matching outputs? In the worst case,
to be absolutely sure that there are no matching pairs we would need to
evaluate N = 2n /2 + 1 inputs. This is because if we evaluate only half of
the possible inputs, the partners of all those inputs might accidentally be in
the other half. Hence the query complexity for a deterministic solution is
exponential in n, just as for the Deutsch-Jozsa problem.
As we did for Deutsch-Jozsa, let us now ask what the query complexity is
for a probabilistic algorithm. Suppose we query f⃗ M times with M distinct
inputs ⃗x. The number of distinct pairs in our sample is

M (M − 1)
Mpairs = , (5.86)
2
where the factor of 2 in the denominator is to prevent counting {⃗xj , ⃗xk } and
{⃗xk , ⃗xj } as different pairs.
Given some ⃗x1 , and assuming f⃗ is two-to-one, the probability that a
randomly selected ⃗x2 ̸= ⃗x1 yields f⃗(⃗x2 ) = f⃗(x1 ) is
1
ϵ= ∼ 2−n . (5.87)
2n − 1
This is because there are 2n − 1 choices for x2 and only one of them gives a
matching output (assuming f⃗ is two-to-one). There is only one way to fail
to find a match in a sample of size M : no pairs match. This occurs with
probability
M (M −1)
Pfail = (1 − ϵ) 2

M (M −1) M (M −1)
= eln(1−ϵ) 2 ≈ e−ϵ 2 , (5.88)
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 161

where the last (approximate) equality follows from Taylor series expanding
the logarithm to first order in ϵ. Thus if ϵ M (M2 −1) > 1, there is a reason-
able chance of success (which rapidly approaches unity as M exceeds this
threshold). For small ϵ, the threshold value for M is
12
2
M0 ∼
ϵ
1
2 2
∼
2−n
n+1
∼2 2 . (5.89)

We see that because though the number of pairs in our samples scales
quadratically with the sample size,1 we can begin to √ be reasonably sure of
finding a matching pair after sampling only about 2n times, even though
the worst case scenario would require sampling 2(n−1) + 1 times. Neverthe-
less, we still require a number of samples that is exponential in n to insure a
reasonable probability of success. Thus, unlike the Deutsch-Jozsa problem,
Simon’s problem lies outside complexity class BPP. we can use a probabilis-
tic algorithm to obtain a bounded error, but not in polynomial time. Thus
Simon’s problem is exponentially (in n) harder than Deutsch-Jozsa for a
classical computer, even if we allow probabilistic computation.

Quantum Query Complexity for Simon’s Problem

Let us now consider Simon’s quantum algorithm which shows that the prob-
lem is in BQP (see Box 5.5) thereby giving exponential quantum advantage.
We will use the quantum circuit shown in Fig. 5.11 where Uf is a quantum
oracle that encodes the function f and is of the form shown in Fig. 5.7. For
explanatory purposes, we will initially take the case in which the function f
is two-to-one and not one-to-one.
The Hadamard gates put the upper register in a uniform superposition
of all its possible states so that the state of the system immediately after the
1
This quadratic scaling of the number of distinct pairs is the origin of the so-called
‘birthday paradox.’ In a group of M people the probability of two people having the same
birthday is larger than you might naively guess.
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 162

q0 0 H H
0 H H
0 H H
qn −1 0 H H
qn 0
U f
0

q2 n −1 0

Figure 5.11: Circuit for Simon’s algorithm.

application of the oracle is

1 X ⃗
|ψ1 ⟩ = √ f (⃗x) ⊗ ⃗x . (5.90)
2n ⃗x

It is convenient at this point to partition the set S of all the vectors ⃗x into
two disjoint parts parts S = S1 ∪ S2 defined by ⃗x ∈ S1 ⇐⇒ (⃗x ⊕ ⃗b) ∈ S2 .
That is each ⃗x and its partner ⃗x ⊕ ⃗b are in opposite subsets. It does not
matter which subset we choose to put ⃗x in, only that its partner is not in
the same subset. With this partition, Eq. (5.90) can be written
1 X 1 h i
|ψ1 ⟩ = √ f⃗(⃗x) ⊗ √ ⃗x + ⃗x ⊕ ⃗b , (5.91)
2(n−1) ⃗x∈S1 2

where we have taken advantage of the fact that f⃗(⃗x ⊕ ⃗b) = f⃗(⃗x). It is
important to understand that the bit string f⃗(⃗x) is unique in the sense that
the value of f⃗(⃗x) is unique for every vector ⃗x ∈ S1 . The only time the same
bit string occurs is for the partner ⃗x ⊕ ⃗b which lies in S1 .
As shown in Fig. 5.11, the next step is to measure the lower register. It
turns out that this step is not actually necessary, but it greatly simplifies the
analysis. Measurement of the lower register tells us the unique vector f⃗(⃗y )
for some random (and as yet unknown) vector ⃗y ∈ S1 . As a result the state
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 163

collapses to
1 n o
ψ ′ = √ f⃗(⃗y ) ⊗ ⃗y + ⃗y ⊕ ⃗b . (5.92)
2
Crucially, because the function (we are assuming) is two-to-one, the upper
register only partially collapses, ending up in a superposition of the two states
that are consistent with the measurement result because they yield the same
value of the function (same state within the register being measured). (For a
reminder about partial state collapse under measurement of a subset of the
qubits in a system, see Sec. 4.2.) At this point we have learned the value of
f⃗ but not the value of ⃗y or ⃗y ⊕ ⃗b.
Stop for a moment and consider just how powerful this result is. The
huge superposition state has collapsed to a very small superposition that
automatically picked out (at random) a state |⃗y ⟩ and its partner |⃗y ⊕ ⃗b⟩!
This is a huge advantage–but we are not yet home free. If we measure the
upper register, the state collapses to either |⃗y ⟩ or |⃗y ⊕ ⃗b⟩ and we have lost
all the information we needed to find ⃗b using ⃗b = ⃗y ⊕ (⃗y ⊕ ⃗b).
The solution to this difficulty is shown in Fig. 5.11. Before measuring the
upper register we apply H ⊗n to it to obtain a kind of quantum interference
between |⃗y ⟩ and |⃗y ⊕ ⃗b⟩ that will allow us to determine ⃗b
1 n o
ψ ′′ = √ f⃗(⃗y ) ⊗ H ⊗n ⃗y + ⃗y ⊕ ⃗b . (5.93)
2
Recalling Eqs. (5.47,5.59), and Eq. (5.60) we obtain

′′ 1 ⃗ 1 X ⃗
y .⃗
z
n
⃗b.⃗
z
o
ψ = √ f (⃗y ) ⊗ √ (−1) 1 + (−1) ⃗z . (5.94)
2 2n ⃗z

Noting that the bit-string vector dot product is computed modulo 2, we see
immediately that if ⃗b.⃗z = 1, the term in curly brackets vanishes, while if
⃗b.⃗z = 0, then the term in curly brackets is +2. We conclude therefore that
the measurement collapses the state with equal probabilities onto all possible
states |⃗z⟩ obeying
⃗b.⃗z = 0. (5.95)

Now if ⃗b ̸= ⃗0 is a vector of length n then from the Balanced Function Lemma

in Box 5.1, we know that there are 2n−1 different solutions to this equation,
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 164

comprising one half of the set of all possible vectors. The measurement results
yield random vectors ⃗z of length n that are ‘perpendicular’ to ⃗b. If we run
the algorithm n − 1 + m times with m being a small integer constant of order
unity, it is highly likely that measurement results {⃗zj ; j ∈ [1, n − 1 + m]} will
yield a set of n − 1 linearly independent non-zero vectors ⃗vj ; j ∈ 1, n − 1, so
that the set of linear equations
⃗b.⃗v1 = 0
⃗b.⃗v2 = 0
⃗b.⃗v3 = 0
..
.
⃗b.⃗vn−1 = 0 (5.96)

can be uniquely solved for ⃗b using Gaussian elimination. As a simple example,

consider the case n = 2 and ⃗b = (01). There is exactly n − 1 = 1 non-
zero solution of Eq. (5.95), namely ⃗z = (10). Note that if we find that the
algorithm produces n rather than n−1 linearly independent non-zero vectors,
then we know that ⃗b = ⃗0 and the function f is one-to-one. (The dimension of
the vector space is n so the largest set of linearly independent vectors is n.)
Interestingly, if ⃗b contains an even number of non-zero entries (e.g., ⃗b = (11))
then ⃗b.⃗b = 0 and ⃗b itself is one of the solutions (in the n = 2 example, the
only non-zero solution).
As we will discuss shortly, it turns out that the failure probability falls
exponentially with m, the number of queries beyond n − 1. Thus m does
not need to increase as n increases. Thus the quantum algorithm is proba-
bilistic (it may fail to find enough linearly independent vectors for any fixed
number of queries, n − 1 + m), but has a query complexity polynomial (in
this case linear) in n, whereas the best classical algorithm is exponential in
n. The quantum complexity class (see Box 5.5) is thus BQP (Bounded Error
Quantum Polynomial Time).

Failure Probability of Simon’s Algorithm for fixed query number

For simplicity we will continue to assume that ⃗b ̸= ⃗0 so that there are only
n − 1 linearly independent vectors satisfying Eq. (5.95). If ⃗b = ⃗0, there will
be n such vectors and the discussion below is easily modified for this case by
replacing n − 1 with n in the various formulae we will discuss.
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 165

As mentioned above Simon’s algorithm outputs a single bit string ⃗z of

length n obeying Eq. (5.95). There are 2n−1 such bit strings, all equally
likely. The only way to fail is if ⃗z1 = ⃗0. Hence the probability of obtaining a
non-zero result on the first try is very close to unity for large n

P = 1 − 2−(n−1) . (5.97)

Next suppose that we have obtained a non-zero ⃗v1 = ⃗z1 . There are now two
ways to fail to advance when we call the oracle a second time: either ⃗z2 = ⃗0
or ⃗z2 = ⃗z1 . Hence the success probability to obtain a second vector ⃗v2 that
is non-zero and is linearly independent of ⃗v1 is

P = 1 − 2−(n−2) . (5.98)

What happens when we continue to iterate this process? Suppose that

we are at the point where we have found ℓ linearly independent solutions
⃗v1 , . . . , ⃗vℓ . These span a subspace of dimension 2ℓ that includes ⃗0. The
next step of the algorithm will fail to produce a new (non-zero) linearly
independent vector ⃗vℓ+1 if the algorithm outputs a vector in this subspace.
Hence the failure probability is

2ℓ
Pℓ→ℓ = = 2ℓ−(n−1) , (5.99)
2(n−1)
and the success probability to transition from having ℓ to ℓ + 1 linearly
independent non-zero vectors is

2n−1 − 2ℓ
Pℓ→ℓ+1 = = 1 − 2ℓ−(n−1) . (5.100)
2n−1
Notice that this expression for P0→1 agrees with Eq. (5.97) and this expression
for P1→2 agrees with Eq. (5.98). The final transition to the goal has the
lowest success probability: Pn−2→n−1 = 12 . Also notice that P(n−1)→n = 0 (so
that P(n−1)→(n−1) = 1) as required since the number of linearly independent
vectors cannot exceed n − 1 (for the case ⃗b ̸= ⃗0) and the transitions must
terminate.
These transition probabilities define a Markov chain illustrated in
Fig. 5.12. The probability of passing through the chain from ℓ = 0 to
ℓ = (n − 1) in n − 1 steps (i.e., successfully advancing each time with zero
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 166

failures) is
n−2
Y n−2
Y
P0 (n) = Pℓ→ℓ+1 = [1 − 2ℓ−(n−1) ]. (5.101)
ℓ=0 ℓ=0

In the limit of large n the success probability in the initial stages is extremely
close to unity. As a result the probability of reaching the goal with zero
failures approaches a constant given by the infinite product

1 3 7 15 31
P0 (∞) = . . . ≈ 0.288788095. (5.102)
2 4 8 16 32

Every ‘trajectory’ must pass through the chain from beginning to end and
so the factor P0 (n) appears in the probability of every trajectory.
Let us now consider all the trajectories that involve precisely one failure
(an hence require n − 1 + 1 = n calls to the oracle). The failure to advance
can occur at any site from ℓ = 0 to ℓ = n−2. Summing over all these distinct
trajectories gives
n−2
X
P1 (n) = P0 (n) Pℓ→ℓ = P0 (n)S1 (n), (5.103)
ℓ=0

where
0
X
S1 (n) ≡ 2ℓ−(n−1) (5.104)
ℓ=n−2
1 1 1 1
= + + + . . . n−1 , (5.105)
2 4 8 2
where for convenience we have reversed the order of the summation. In the
limit of large n this approaches

1 1
S1 (∞) = = 1, (5.106)
2 1 − 12

which yields a total probability of success in n steps or fewer which is greater

than 50%

P0 (∞) + P1 (∞) = 2P0 (∞) ≈ 0.57757619. (5.107)

CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 167

A little thought shows that the corresponding sum over all possible com-
binations of two failure probabilities is
"∞ #2 ∞
1 X 1X 2 1 1 2
S2 (∞) = Pℓ→ℓ + [Pℓ→ℓ ] = 1+ = . (5.108)
2 ℓ=0 2 ℓ=0 2 3 3

From this we obtain the total probability to arrive at the goal in at most
n + 1 calls to the oracle is

2
P0 (∞) + P1 (∞) + P2 (∞) = 2 + P0 (∞) ≈ 0.770102. (5.109)
3

We thus see that the probability is rapidly converging towards unity as the
number of calls n − 1 + m to the oracle increases beyond the minimum n − 1.
With further effort it is possible show that this convergence is exponential
in m. Roughly speaking, the last step from ℓ = n − 2 to ℓ = n − 1 is
the bottleneck in the dynamics since it has the lowest success probability
Pn−2→n−1 = 21 . From this it follows that the probability to fail to reach the
goal for large m is proportional to 2−m .

Box 5.4. Exact Markov Chain Analysis [STILL UNDER CON-

STRUCTION] We can generalize the above to obtain a formally exact
expression for the success probability for trajectories involving m failures to
advance
n−2
µ
X Y
j
Pm (n) = P0 (n) G(⃗µ, m) Pj→j+1 , (5.110)
µ
⃗ j=0

where µ⃗ is a vector of length n whose entries are non-negative integers rep-

resenting the number of failures on each site of the Markov chain, and the
constraint that there be precisely m failures is enforced by
n−1
X
G(⃗µ, m) = 1 ⇐⇒ µj = m
j=0

= 0 otherwise. (5.111)

[STILL UNDER CONSTRUCTION.]

CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 168

One relatively simple quantity to compute is the mean number of failures

to advance at each node of the Markov chain. For node ℓ, the probability of
k failures before advancing is (Pℓℓ )k . Hence the mean number of failures is
∞
X
f¯ℓ = k(Pℓℓ )k . (5.112)
k=0

This sum can be evaluated by defining the generating function

∞
X 1
fℓ (λ) = e−λk Pℓℓk = , (5.113)
k=0
1 − e−λ Pℓℓ

and noting that

d Pℓℓ
f¯ℓ = − fℓ (λ) = . (5.114)
dλ λ=0 (1 − Pℓℓ )2

The mean number of failures before reaching the end of the Markov chain is
thus
n−2
X Pℓℓ
m̄(n) =
ℓ=0
(1 − Pℓℓ )2
n−1
X 2−k
= ≈ 2.744, (5.115)
k=1
(1 − 2−k )2

where the last equality is the result for asymptotically large n but is accurate
(to three digits after the decimal point) for any n ≥ 16. The small value of
the mean number of failures is consistent with the success probability rapidly
approaching unity for m ∼ 3 and larger. [Note this argument does not rule
out the possibility of a long tail in the distribution of the number of failures.
However a similar calculation of the mean square number of failures does
rule this out. Appendix G in Mermin, Quantum Computer Science, provides
a lower bound on the success probability after n − 1 + m queries.]
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 169

P0→0 P1→1 Pn − 2→n − 2 Pn −1→n −1

P0→1 Pn − 2→n −1
=0  =1 = n − 2 = n − 1

Figure 5.12: Markov chain showing the state transition probabilities of Simon’s
algorithm as one calls the oracle in an attempt to increase the number of linearly
independent output vectors from ℓ to ℓ + 1. Pℓ→ℓ is the probability of failing to
advance and Pℓ→ℓ+1 is the success probability.

Box 5.5. Complexity Classes Theoretical computer science devotes much

effort to defining different complexity classes that classify the degree of dif-
ficulty of solving different types of problems within various different compu-
tational models. Scott Aaronson and collaborators maintain a ‘Complexity
Class Zoo’ at https://complexityzoo.net/Complexity Zoo.
The Deutsch-Jozsa problem is an oracle problem in the complexity class
known as EQP, meaning that it can be solved exactly in polynomial time
(oracle query complexity is polynomial in the number of bits in the input
to the function f ). In fact the time is a constant independent of n. This is
exponentially better than the best classical algorithm that exactly solves the
problem which we saw requires a number of oracle queries 2n /2 + 1 which is
exponential in n.
However we also saw that there exists a fast probabilistic classical algorithm
that solves the problem using only a polynomial number of queries N pro-
vided that the user is willing to accept an exponentially small failure prob-
ability ϵ = 22−N (that is independent of n). This complexity class is called
BPP, meaning that problems in this class can be solved on a classical com-
puter with bounded error in polynomial time.
In short, the Deutsch-Jozsa quantum algorithm is not better than the best
classical algorithm if one is willing to allow probabilistic classical algorithms
and thus accept an exponentially small failure probability. In this case ‘not
better’ means that the quantum and probabilistic classical algorithms only
need a fixed number of oracle queries, independent of the problem size. (The
number of queries required for the probabilistic classical algorithm depends
on the logarithm of 1/ϵ.)
As we will see, the Berstein-Vazirani algorithm studied in Sec. 5.6 provides an
‘oracle separation’ between complexity classes (see Box 5.5) BPP and BQP
because the quantum algorithm solves the problem with n0 = 1 queries while
the best probabilistic classical algorithm requires n queries. In contrast, Si-
mon’s Problem studied in Sec. 5.7 provides an exponential oracle separation
from the best probabilistic (or deterministic) classical algorithm.
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 170

5.8 Grover Search Algorithm

Suppose that Bob’s adversary, Alice, creates an unstructured database and
Bob’s task is to locate some particular data that he needs. For example,
suppose the database comprises a large list of names and associated telephone
numbers, but Alice has put the names in random rather than alphabetical
order. To find the phone number of a particular person, there is no faster
classical algorithm for Bob to use than to check each entry one at a time.
If the total number of entries in the list is N , the average query complexity
(number of times Bob has to look at the list) will be N/2 since the name he
is looking for is equally likely to be at any position in the list. The worst-
case scenario is that Bob will need N queries. To subvert malicious intent
on the part Alice, Bob might choose which entry to look at randomly using
a probabilistic process. If Alice has no way of knowing the order in which
Bob is going to search the list, she cannot force the worst-case scenario on
Bob with certainty (though Bob may still be unlucky in his random choices
and still end up needing N queries). If Bob plans to use the database many
times, he may find it useful to reorder the entries in the database using a
sorting algorithm such as QUICKSORT which has an average-case cost of
order ∼ N log N and a worst-case cost of order N 2 queries.
Remarkably, Lov Grover (then at Bell Laboratories) invented a probabilis-
tic quantum√ search algorithm that can find the answer with high probability
in about N queries, something that is impossible classically!
To begin our analysis, consider the database illustrated in Table 5.3. In
keeping with the previous algorithms we have studied, let us set this up as
an oracle problem. Suppose that Bob is looking for the phone number of his
friend Susan and let the binary encoding of her name be Q. ⃗ Let us define a
function
(
1, if Y⃗ (⃗x) = Q,
⃗ else
f (⃗x) = (5.116)
0.

The function evaluates to 1 if the address ⃗x in the database corresponds

to the entry for his friend’s name. Otherwise it yields zero. Of course the
function f depends on the particular name Q ⃗ that Bob is searching for, but
we will leave this dependence implicit and keep only ⃗x as the argument of f .
Bob can query the function with different arguments x until it returns
1 and then he only needs to look at position x in the database to find the
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 171

⃗x Y⃗ ⃗
Z

000 Y⃗ (000) Z(000)

⃗

001 Y⃗ (001) Z(001)

⃗

010 Y⃗ (010) Z(010)

⃗

011 Y⃗ (011) Z(011)

⃗

100 Y⃗ (100) Z(100)

⃗

101 Y⃗ (101) Z(101)

⃗

110 Y⃗ (110) Z(110)

⃗

111 Y⃗ (111) Z(111)

⃗

Table 5.3: Unsorted database (of size 8) created by Alice. Here ⃗x is the binary number
indicating the ordinal position in the database and Y⃗ (⃗x) and Z(⃗
⃗ x) are a pair of binary
numbers of fixed length m representing the data. For example, Y ⃗ (⃗x) might be the ASCII
encoding of a person’s name, and Z(⃗⃗ x) might be the binary encoding of their telephone
number. The list of names is not sorted alphabetically but rather is in random order.

phone number he seeks. (Of course he could set up the function with extra
bits in the output to automatically return the phone number if there is a
match, but to simplify the notation we are leaving this aside.) As noted
in Box 5.6, the important fact that Bob can use the database to create the
oracle himself emphasizes the point that the oracle is not telling Bob the
location of the data he is looking for–it is simply telling him whether or not
the data is actually at the particular location that Bob queries the oracle
about. That is, Bob is guessing the location in the database and the oracle
is telling him whether or not his guess is correct.
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 172

Box 5.6. Who can create the Grover Search Oracle? It is interesting
to recall that in the algorithms we have studied previously, it is the adversary
Alice who creates the oracle function and Bob has access to it only as a ‘black
box.’ Bob’s task is to learn something about the hidden structure of the
oracle function (i.e., a hidden bit string used to define the function). Here
the situtation is different. Alice may have created the random ordering of the
data in the database, but Bob can easily create the oracle function himself
without knowing where the name he is looking for is in the database. He
simply creates a function that takes in a value of ⃗x (and implicitly takes in
the name Bob is searching for), goes to position with index ⃗x in the database
and checks if the name there matches the name Bob is searching for. If it
does match, the function returns 1, and if not, it returns 0.

To create a quantum algorithm we can, as usual, encode this function in

a reversible unitary quantum oracle Of that takes in the state
|ψin ⟩ = |d⟩ ⊗ |⃗x ⟩, (5.117)
and outputs the state
|ψout ⟩ = Of |ψin ⟩ = |d ⊕ f (⃗x)⟩ ⊗ |⃗x ⟩, (5.118)
where ⃗x is the binary representation of the ordinal index in the database.
For simplicity we will assume that N = 2n , so that the range of indices can
be encoded in bit strings ⃗x of length n. If necessary we can pad the database
with additional entries (all of which return f = 0) to achieve this condition.
As in the Deutsch and Deutsch-Jozsa algorithm circuits (see Fig. 5.8) we
will put the lower input wire d into the superposition state |−⟩ so that
|ψout ⟩ = |−⟩ ⊗ (−1)f (⃗x) |⃗x ⟩. (5.119)
We see that the lower wire d is a kind of quantum ‘catalyst’ that causes the
phase kickback (−1)f (⃗x) onto the input state, but is itself left unchanged in the
process. Since the state of the lower wire is unchanged and is never entangled
with the other wires, we will ignore it from now on and just consider the state
of the upper n wires. Let us suppose that the name Q ⃗ that we are looking
for is located at index ⃗y in the data base. That is, suppose that Y⃗ (⃗y ) = Q. ⃗
It is straight forward to see from Eq. (5.119) that the (n + 1) × (n + 1) oracle
unitary Of is equivalent (ignoring the catalyst wire) to the n × n unitary
Uf = I − 2|⃗y ⟩⟨⃗y | (5.120)
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 173

where I is the n-qubit identity operator. It is clear from this that

Uf |⃗y ⟩ = −|⃗y ⟩, and (5.121)

Uf |⃗x ⟩ = +|⃗x ⟩, if ⃗x =
̸ ⃗y . (5.122)

It is important here to again emphasize that even though the form of the
oracle unitary as written in Eq. (5.120) depends explicitly on the ‘answer’
⃗y that Bob seeks, Bob is still able to use the database to create the oracle
himself without having to explicitly know the answer ⃗y in advance. Note
that to do this, he has to use the extra qubit d to create the reversible
(n+1)×(n+1) unitary oracle matrix Of based on the query success function
f . It is only when we consider the special case that d is initialized in the
state |−⟩ that is left unchanged at the end, that we can restrict our attention
to the first n qubits and write the effective n × n unitary matrix Uf . (See
Box 5.6.)
Again, just as in the Deutsch-Jozsa algorithm, we will initialize the n
upper (input) wires in state

|Φ0 ⟩ = H ⊗n |0⟩⊗n = (H|0⟩)⊗n = |+⟩⊗n , (5.123)

so that when we apply the oracle once we make one query, but it is in a
superposition of all possible single queries. The result is

Uf |Φ0 ⟩ = I − 2|⃗y ⟩⟨⃗y | |Φ0 ⟩ (5.124)
1 X
=√ (−1)f (⃗x) |⃗x ⟩ (5.125)
n
2 ⃗x
1 X 1
=√ |⃗x ⟩ − √ |⃗y ⟩. (5.126)
2n ⃗x̸=⃗y 2n

We see that the giant superposition state |Φ0 ⟩ is unchanged except one entry
(the one we want) has been ‘marked’ by the oracle because its sign flipped.
Now consider a function g that is like f except that it is for the specific
case where the desired entry in the database has index ⃗0. That is, g(⃗x) = 1
for ⃗x = ⃗0 but g(⃗x) = 0 for all ⃗x ̸= ⃗0. Now encode this function in a unitary
oracle (that we can create since know the function g) Ug . By the same
argument as above

Ug = I − 2|⃗0⟩⟨⃗0|. (5.127)
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 174

Now use this to define the operator G via

G = −H ⊗n Ug H ⊗n = 2|Φ0 ⟩⟨Φ0 | − I, (5.128)
where the minus sign has been included purely for convenience.
The Grover algorithm consists of repeatedly applying the pair of operators
R ≡ GUf (5.129)
to the initial state |Φ0 ⟩. Importantly, it turns out that since R only involves
the identity operator and the projectors |⃗y ⟩⟨⃗y | and |Φ⃗ 0 ⟩⟨Φ0 |, R is actually
a rotation matrix in the space spanned by the starting state |Φ0 ⟩ and the
target state |⃗y ⟩. To see this, let us first find an orthonormal basis for this
subspace. Since our target state is |⃗y ⟩, we will choose it as one basis state.
|Φ0 ⟩ is very nearly orthogonal to |⃗y ⟩
1
g ≡ ⟨⃗y |Φ0 ⟩ = √ . (5.130)
2n
Thus the second basis vector is very close to |Φ0 ⟩ (up to an unimportant
arbitrary global phase that we are free to choose). Let us define the second
basis vector to be
1
|χ⟩ = √ [|Φ0 ⟩ − g|⃗y ⟩] , (5.131)
W
where W = 1 − g 2 is the normalization constant. It is straightforward to see
that this choice guarantees that this second basis vector is indeed orthogonal
to the first
⟨⃗y |χ⟩ = 0. (5.132)
Using the orthonormality of the basis, an arbitrary superposition of states in
the span of these two basis vectors can be written
θ θ
|ψ⟩ = cos |χ⟩ + sin eiφ |⃗y ⟩, (5.133)
2 2
thereby defining a point with coordinates θ, ϕ on an effective ‘Bloch sphere’
shown in Fig. 5.13 that has |χ⟩ at the north pole and |⃗y ⟩ at the south pole.
As an example, the input state for the algorithm is from Eq. (5.131)
√
|Φ0 ⟩ = W |χ⟩ + g|⃗y ⟩, (5.134)
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 175

and is described by angles θ0 , φ0 obeying

θ0
sin =g (5.135)
2
θ0 √
cos = W (5.136)
2
φ0 = 0. (5.137)

Since (for large n) g ≪ 1, the initial state is nearly orthogonal to the target
state |⃗y ⟩, and thus is extremely close to the north pole. Further more it lies
in the XZ plane of the Bloch sphere as shown in Fig. 5.13. The initial state
|Φ0 ⟩ contains the target state, but with only a very small amplitude so that
a measurement is exponentially unlikely to yield the desired answer. The
goal of the Grover algorithm is to ‘amplify’ the quantum amplitude of the
target state to near unity. If this can be achieved, then measurement of the
output register will yield the database address ⃗y corresponding to the name
of Bob’s friend. With this information Bob then knows where to look to find
his friends phone number.

χ Φ0
R Φ0
θ0
0
2θ
θ=

− +


y

Figure 5.13: Effective Bloch sphere for the Grover search algorithm for the two-
dimensional subspace spanned by the starting state |Φ0 ⟩ and the target state |⃗y ⟩.
The target state lies at the south pole and the starting state lies very close to the
north pole, at exponentially small polar angle θ0 . Application of the rotation R
increase the polar angle by 2θ0 .
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 176

Exercise 5.4. Show that the basis state orthogonal to |⃗y ⟩ in Eq. (5.131)
is given by
1 X
|χ⟩ = √ |⃗x ⟩.
2n − 1 ⃗x̸=⃗y

Consider the action of Uf on the basis states

Uf |χ⟩ = +|χ⟩, (5.138)

Uf |⃗y ⟩ = −|⃗y ⟩. (5.139)

What is the effect of G in this subspace? We see that

√
G|χ⟩ = −|χ⟩ + 2|Φ0 ⟩⟨Φ0 |χ⟩ = (2W − 1)|χ⟩ + 2g W |⃗y ⟩, (5.140)
√
G|⃗y ⟩ = 2|Φ0 ⟩⟨Φ0 |⃗y ⟩ − |⃗y ⟩ = 2g W |χ⟩ − (2W − 1)|⃗y ⟩. (5.141)

Combining all these results we have that the effect of R = GUf is

√
R|χ⟩ = (2W − 1)|χ⟩ + 2g W |⃗y ⟩, (5.142)
√
R|⃗y ⟩ = −2g W |χ⟩ + (2W − 1)|⃗y ⟩. (5.143)

Within this two-dimensional subspace we can use spinor representations for

the basis states

1
= |χ⟩ (5.144)
0

0
= |⃗y ⟩, (5.145)
1

and thus have the matrix representation of R as a rotation around the Y axis
of the effective Bloch sphere by a small angle +θ
cos 2θ − sin 2θ

θ
R= = e−i 2 Y , (5.146)
+ sin 2θ cos 2θ

with
θ √ θ0 θ0
sin = 2g W = 2 sin cos = sin θ0 , (5.147)
2 2 2
where θ0 is the polar angle of the state |Φ0 ⟩. Thus the rotation associated
with R is θ = 2θ0 = 4 arcsin g ≈ 4g, where for the last equality we have used
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 177

the small angle approximation that is valid for large n. We see from Fig. 5.13
that since the rotation is about the Y axis by a positive angle θ0 , repeatedly
applying R rotates the Bloch vector closer and closer to the target state at
the south pole. After N applications of R (N oracle queries) the polar angle
of the Bloch vector is

θN = (2N + 1)θ0 . (5.148)

The probability of the measured output state being ⃗y is maximized by choos-

ing θN close to π (so that the state of the system is close to |⃗y ⟩). This can
be achieved by choosing N to be an integer Nq close to

π 1 π√ n
Nq ≈ − ≈ 2 , (5.149)
4 arcsin g 2 4

where again, the last approximate equality is valid for large n (small g).
Thus we have the remarkable result that even though Bob does not know
the target state, having access to the search oracle that can check if a guess
is correct, allows Bob to rotate the initial state |Φ0 ⟩ (almost perfectly) into
the target state |⃗y ⟩! Again, the oracle does not tell Bob directly what the
value of ⃗y is, only whether a guessed value ⃗x is correct or not.
The average case classical query complexity is Nc ∼ 2n−1 ∼ Nq2 is quadrat-
ically worse than the quantum query complexity. This is often referred to as
a quadratic quantum speed-up. Because the speed up is not exponential, the
Grover algorithm must still call the oracle an exponential number of times
n
Nq ∼ 2 2 . (5.150)

In practice, with real-world quantum computers that suffer errors, it is dif-

ficult to avoid fatal errors for circuits that have exponential depth. Hence
it is unlikely (at least in the near future without nearly perfect quantum
error correction) that the Grover algorithm will have practical applications.
Nevertheless, it is an interesting quantum programming primitive to keep in
mind.
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 178

5.8.1 Geometric Interpretation of the Grover Opera-

tions
Recall the properties of the Pauli matrix Z in terms of the standard single-
qubit basis states
Z ≡ |0⟩⟨0| − |1⟩⟨1|

= |0⟩⟨0| + |1⟩⟨1| − 2|1⟩⟨1|
= I − 2|1⟩⟨1|, (5.151)

where from the completeness relation, I = |0⟩⟨0| + |1⟩⟨1| is the identity
operator. This tells us that, while Uf can act on the entire Hilbert space,
when it acts on the two-dimensional manifold of states spanned by the Bloch
sphere states we have defined, it acts like an effective Pauli Z
Uf = I − 2|⃗y ⟩⟨⃗y | = Z. (5.152)
Similarly, the unitary G also acts within the two-dimensional subspace
like an effective Pauli matrix
G = 2|Φ0 ⟩⟨Φ0 | − I = +⃗λ · ⃗σ , (5.153)
where ⃗λ = (sin θ0 , 0, cos θ0 ) is the unit vector parallel to the spin vector
associated with the location of |Φ0 ⟩ on the Bloch sphere.
Now notice that up to an irrelevant global phase factor, Z produces a
rotation of the spin around the Z axis by an angle of π. Thus Uf does the
same. Similarly, G is equivalent to a rotation around the ⃗λ axis. This leads
us to a different geometric understanding of the Grover amplification process
illustrated in Fig. 5.14.

5.8.2 Multiple Entry Search

We can readily generalize the Grover algorithm to the case where the
database contains multiple targets for the search. For example, suppose that
Bob is searching for anyone in the database with a certain family name. How
does he modify his quantum search if there are M entries in the database for
which the function f returns the value 1? Rather than a single state |⃗y ⟩, we
can take seek a target state
1 X
|TM ⟩ = √ f (⃗x)|⃗x ⟩. (5.154)
M ⃗x
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 179

 
z λ z 
λ

Figure 5.14: Grover amplification process. Left panel shows that Uf rotates the
Bloch vector ⃗λ of state |Φ0 ⟩ by π around the Z axis. Right panel shows the next
step in which G rotates the state vector by π around the ⃗λ axis. The combined
effect increases the polar angle of the state vector from θ0 to 3θ0 , equivalent to a
rotation around the Y axis by an angle of 2θ0 .

Note that if f (⃗x) = 1 for M distinct values of ⃗x, then this state is properly
normalized. If the algorithm were able to rotate the starting state |Φ0 ⟩ into
|TM ⟩, then with high probability a single measurement will yield a random
value ⃗y from among the M distinct values. If Bob desires to obtain all M of
the solutions ⃗y1 , . . . ⃗yM to the equation f (⃗x) = 1, he can do so probabilisti-
cally by running the search algorithm a number of times of order M until it
succeeds in finding all M entries.
The operator G and the initial state |Φ0 ⟩ remain the same as in the
previous case. The analog of Eq. (5.120) is
M
X
Uf = I − 2 |⃗yj ⟩⟨⃗yj |. (5.155)
j=1

It is important to note that this operator is not the same as

Vf = I − 2|TM ⟩⟨TM |. (5.156)
Nevertheless it is still true that
Uf |TM ⟩ = Vf |TM ⟩ = −|TM ⟩, (5.157)
Uf |Φ0 ⟩ = Vf |Φ0 ⟩. (5.158)
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 180

Hence within the subspace spanned by the initial state |Φ0 ⟩ and the target
state |TM ⟩, the oracle Uf is effectively equal to Vf .
The upshot is that the only change from the analysis for the case of M = 1
is the increase in the overlap between the initial and target states from g to
√
r
M
⟨TM |Φ0 ⟩ = = g M. (5.159)
2n
√
Replacing g in all the formulae by√ g M means that (for large n) the rotation
angle θ increases
q by a factor of M and the rotation must be applied fewer
n
times N ≈ 4 2M to optimize the success probability (of finding one of the
π

correct entries) to be near unity. This again a quadratic quantum advantage

over the classical query complexity 2n /M .
Exercise 5.5. Consider a Grover search problem where the database
contains M entries satisfying the search criterion (i.e. there are M dis-
tinct indices for which the oracle query returns the value 1 instead of 0).
Develop a probabilistic hybrid quantum/classical algorithm to find all
M distinct database entries. Hybrid in this case means that a classical
computer keeps track of how many of the M entries have been found so
far and (may) decide to adjust the oracle or the algorithm accordingly.

Oblivious Amplification
Notice that if we apply the Grover rotation operator R in Eq. (5.129) too
many or too few times, we will over or under rotate the state on the Bloch
sphere and not end up close to the optimal position at the south pole. The
correct number of times to apply R is determined by the size of the database
(Nd = 2n ) and the number of instances M for which f returns the value 1
(i.e., the number of ‘correct answers’). We have to be given Nd to be able to
query the oracle. But what if the value of M is unknown? Then we have a
problem. Fortunately there is a clever modification of the Grover algorithm
that solves this problem by means of oblivious amplitude amplification. That
is, independent of the value of M , it automatically stops the process when it
has brought the state near the south pole. [The method, introduced by Berry
et al. in 2014, is reviewed in ‘Fixed-point oblivious quantum amplitude-
amplification algorithm,’ Bao Yan et al., Scientific Reports volume 12, Article
number: 14339 (2022). See also Appendix D of PRX QUANTUM 2, 040203
(2021)]
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 181

NEED TO PRESENT DETAILS OF METHOD.

5.8.3 Generic Quantum Speedup for Decision Prob-

lems
The Grover algorithm provides a very generic method to speed up decision
problems where the answer is unknown or difficult to obtain, but is easy
to check. As noted by Nielsen and Chuang, the factoring problem is one
such example. Suppose we are given an exponentially large number Λ that
is the product of two (unknown) primes, Λ = m1 m2 . Finding the prime
factors is (believed to be) difficult. Conversely if you guess two factors, it
is easy to multiply them to check if they are indeed factors. Rather than
blindly guessing, one could develop a quantum oracle which takes as input
two large numbers M1 , M2 and returns 1 if their product equals Λ and returns
0 otherwise. This would allow a Grover-accelerated search for the factors.
However the speed up is actually not good relative to the best known classical
algorithms which, while still superpolynomial,2 are much better than random
guessing. In addition, Shor’s quantum algorithm (based on the quantum
Fourier transform) solves the factoring problem vastly faster than the best
known classical algorithms.

5.9 VQE and QAOA Algorithms

To be developed.

5.10 Phase Estimation Algorithm

To be developed.

5.11 Quantum Fourier Transform

To be developed.

(1/3)
2
An example of superpolynomial scaling would be a function of the form e(1/2)n ,
which grows more rapidly for large n than any polynomial in n, but more slowly than
exponentially in n, say as e(1/16)n .
Chapter 6

Quantum Error Correction

IMPORTANT ADDITIONS TO BE MADE: (1) Explain how to create a

circuit to measure stabilizers and (2) why learning both Z1 and Z2 would
destroy the quantum information. Explain how quantum errors are contin-
uous but measurement of error syndromes discretizes them. Explain how to
do that with two CNOTs. Add discussion of repetition code for phase errors
and then concatenate to get the Short code. Perhaps also present the toric
code.
Now that we understand entanglement, we are in a position to tackle
quantum error correction.
To overcome the deleterious effects of electrical noise, cosmic rays and
other hazards, modern digital computers rely heavily on error correcting
codes to store and correctly retrieve vast quantities of data. Classical error
correction works by introducing extra bits which provide redundant encoding
of the information. Error correction proceeds by measuring the bits and
comparing them to the redundant information in the auxiliary bits. Another
benefit of the representation of information as discrete bits (with 0 and 1
corresponding to a voltage for example) is that one can ignore small noise
voltages. That is, V = 0.99 volts can be safely assumed to represent 1 and
not 0.
All classical (and quantum) error correction codes are based on the as-
sumption that the hardware is good enough that errors are rare. The goal is
to make them even rarer. For classical bits there is only one kind of error,
namely the bit flip which maps 0 to 1 or vice versa. We will assume for sim-
plicity that there is probability p ≪ 1 that a bit flip error occurs, and that
error occurrences are uncorrelated among different bits. One of the simplest

182
CHAPTER 6. QUANTUM ERROR CORRECTION 183

classical error correction codes to understand involves repetition and major-

ity rule. Suppose we have a classical bit carrying the information we wish
to protect from error and we have available two ancilla bits (also subject to
errors). The procedure consists copying the state of the first bit into the two
ancilla bits. Thus a ‘logical’ 1 is represented by three ‘physical’ bits in state
111, and a ‘logical’ 0 is represented by three ‘physical’ bits in state 000. Any
other physical state is an error state outside of the logical state space.
Suppose now that one of the three physical bits suffers an error. For
example, suppose that the state is supposed to be 000 but the first bit flips
so the state becomes 001. Assuming that errors are rare and thus no more
than 1 error has occurred, it is easy to identify that it is the first bit that
flipped because it disagrees with the other two. That is, by examining the
state of each bit it is a simple matter to identify the bit which has flipped
and is not in agreement with the ‘majority.’ We then simply flip the minority
bit so that it again agrees with the majority. This procedure succeeds if the
number of errors is zero or one, but it fails if there is more than one error.
For example the error state 001 could have arisen not from a single error
in the code word 000 but rather two errors in the code word 111. There is
no way to tell for sure which case actually occurred. However if errors are
rare, it is far more likely that a single error occurred rather than two errors.
Hence we should our error decoder should assume the more likely scenario.
Similarly the state 000 could represent zero errors or three errors that 111
into 000. Again this is undetectable and causes a failure of the code. Let us
now analyze this failure probability quantitatively.
Of course since we have replaced one imperfect bit with three imperfect
bits, this means that the probability of an error occurring has increased
considerably. However if the most frequent errors (a single bit flip on one of
the three bits) are correctable, we may still come out ahead. For three bits
the probability Pn of n errors is given by
P0 = (1 − p)3 (6.1)
P1 = 3p(1 − p)2 (6.2)
P2 = 3p2 (1 − p) (6.3)
P3 = p3 . (6.4)
Because our error correction code only fails for two or more physical bit errors
the error probability for our logical qubit is
plogical = P2 + P3 = 3p2 − 2p3 , (6.5)
CHAPTER 6. QUANTUM ERROR CORRECTION 184

The logical error rate is plotted against the physical probability in Fig. 6.1.
We see that if the physical error probability per qubit obeys p < 1/2, then
plogical < p and the error correction scheme reduces the error rate (instead
of making it worse). Thus the ‘break-even’ point is p∗ = 1/2. If the error
probability is far below the break-even point, for example p = 10−6 , then
plogical ∼ 3p2 ∼ 3 × 10−12 . Thus the lower the raw error rate, the greater the
improvement because the logical error rate scales quadratically ∼ 3p2 . Note
however that even at this low error rate, a petabyte (8 × 1015 bit) storage
system would have on average 24,000 errors. Furthermore, one would have
to buy three petabytes of storage since 2/3 of the disk would be taken up
with ancilla bits!

pphysical
break-
even
plogical point

pphysical
Figure 6.1: Plot of logical error probability vs. physical error probability per bit for
the three bit repetition code. For small physical error probability, the logical error
probability scales quadratically, plogical ∼ 3p2physical . The failure probability for the
quantum repetition code is the same.

We are now ready to enter the remarkable and magic world of quantum
error correction. Without quantum error correction, quantum computation
would be impossible and there is a sense in which the fact that error cor-
rection is possible is even more amazing and counterintuitive than the fact
of quantum computation itself. Naively, it would seem that quantum error
correction is completely impossible. The no-cloning theorem (see Box 3.4)
does not allow us to copy an unknown state of a qubit onto ancilla qubits.
Furthermore, in order to determine if an error has occurred, we would have
CHAPTER 6. QUANTUM ERROR CORRECTION 185

to make a measurement, and the back action (state collapse) from that mea-
surement would itself produce random unrecoverable errors.
Part of the power of a quantum computer derives from its analog
character–quantum states are described by continuous real (or complex) vari-
ables. This raises the specter that noise and small errors will destroy the
extra power of the computer just as it does for classical analog computers.
Remarkably, this is not the case! This is because the quantum computer
also has characteristics that are digital. Recall that any measurement of the
state of qubit always yields a binary result. Because of this, quantum errors
are continuous, but measured quantum errors are discrete. Amazingly this
makes it possible to perform quantum error correction and keep the calcula-
tion running even on an imperfect and noisy computer. In many ways, this
discovery by Peter Shor in 1995 and Andrew Steane in 1996 is even more
profound and unexpected than the discovery of efficient quantum algorithms
that work on ideal computers.
It would seem obvious that quantum error correction is impossible be-
cause the act of measurement to check if there is an error would collapse the
state, destroying any possible quantum superposition information. Remark-
ably however, one can encode the information in such a way that the presence
of an error can be detected by measurement, and if the code is sufficiently
sophisticated, the error can be corrected, just as in classical computation.
Classically, the only error that exists is the bit flip. Quantum mechani-
cally there are other types of errors (e.g. phase flip, energy decay, erasure
channels, etc.). However codes have been developed (using a minimum of
5 qubits) which will correct all possible quantum errors. By concatenating
these codes to higher levels of redundancy, even small imperfections in the
error correction process itself can be corrected. Thus quantum superposi-
tions can in principle be made to last essentially forever even in an imperfect
noisy system. It is this remarkable insight that makes quantum computation
possible.
As an entré to this rich field, we will consider a simplified example of one
qubit in some state α|0⟩ + β|1⟩ plus two ancillary qubits in state |0⟩ which we
would like to use to protect the quantum information in the first qubit. As
already noted, the simplest classical error correction code simply replicates
the first bit twice and then uses majority voting to correct for (single) bit
flip errors. This procedure fails in the quantum case because the no-cloning
theorem (see Box 3.4) prevents replication of an unknown qubit state. Thus
CHAPTER 6. QUANTUM ERROR CORRECTION 186

there does not exist a unitary transformation which takes

[α|0⟩ + β|1⟩] ⊗ |00⟩ −→ [α|0⟩ + β|1⟩]⊗3 . (6.6)

As was mentioned earlier, this is clear from the fact that the above transfor-
mation is not linear in the amplitudes α and β and quantum mechanics is
linear. One can however perform the repetition code transformation:

[α|0⟩ + β|1⟩] ⊗ |00⟩ −→ [α|000⟩ + β|111⟩], (6.7)

since this is in fact a unitary transformation. Just as in the classical case,

these three physical qubits form a single logical qubit. The two logical basis
states are

|0⟩log = |000⟩
|1⟩log = |111⟩. (6.8)

The analog of the single-qubit Pauli operators for this logical qubit are readily
seen to be

Xlog = X1 X2 X3
Ylog = iXlog Zlog
Zlog = Z1 Z2 Z3 . (6.9)

We see that this encoding complicates things considerably because now to

do even a simple single logical qubit rotation we have to perform some rather
non-trivial three-qubit joint operations. It is not always easy to achieve an
effective Hamiltonian that can produce such joint operations, but this is an
essential price we must pay in order to carry out quantum error correction.
It turns out that this simple code cannot correct all possible quantum
errors, but only a single type. For specificity, let us take the error operating
on our system to be a single bit flip, either X1 , X2 , or X3 . These three
together with the identity operator, I, constitute the set of operators that
produce the four possible error states of the system we will be able to correctly
deal with. Following the formalism developed by Daniel Gottesman, let us
define two stabilizer operators

S1 = Z1 Z2 (6.10)
S2 = Z2 Z3 . (6.11)
CHAPTER 6. QUANTUM ERROR CORRECTION 187

These have the nice property that they commute both with each other (i.e.,
[S1 , S2 ] = S1 S2 − S2 S1 = 0) and with all three of the logical qubit operators
listed in Eq. (6.9). This means that they can both be measured simulta-
neously and that the act of measurement does not destroy the quantum
information stored in any superposition of the two logical qubit states. Fur-
thermore they each commute or anticommute with the four error operators
in such a way that we can uniquely identify what error (if any) has occurred.
Each of the four possible error states (including no error) is an eigenstate of
both stabilizers with the eigenvalues listed in the table below
error S1 S2
I +1 +1
X1 −1 +1
X2 −1 −1
X3 +1 −1

Thus measurement of the two stabilizers yields two bits of classical informa-
tion (called the ‘error syndrome’) which uniquely identify which of the four
possible error states the system is in and allows the experimenter to correct
the situation by applying the appropriate error operator, I, X1 , X2 , X3 to the
system to cancel the original error.
We now have our first taste of fantastic power of quantum error correction.
We have however glossed over some important details by assuming that either
an error has occurred or it hasn’t (that is, we have been assuming we are
in a definite error state). At the next level of sophistication we have to
recognize that we need to be able to handle the possibility of a quantum
superposition of an error and no error. After all, in a system described by
smoothly evolving superposition amplitudes, errors can develop continuously.
Suppose for example that the correct state of the three physical qubits is

|Ψ0 ⟩ = α|000⟩ + β|111⟩, (6.12)

and that there is some perturbation to the Hamiltonian such that after some
time there is a small amplitude ϵ that error X2 has occurred. Then the state
of the system is
p
|Ψ⟩ = [ 1 − |ϵ|2 I + ϵX2 ]|Ψ0 ⟩. (6.13)

(The reader may find it instructive to verify that the normalization is correct.)
CHAPTER 6. QUANTUM ERROR CORRECTION 188

What happens if we apply our error correction scheme to this state? The
measurement of each stabilizer will always yield a binary result, thus illustrat-
ing the dual digital/analog nature of quantum information processing. With
probability P0 = 1 − |ϵ|2 , the measurement result will be S1 = S2 = +1.
In this case the state collapses back to the original ideal one and the error
is removed! Indeed, the experimenter has no idea whether ϵ had ever even
developed a non-zero value. All she knows is that if there was an error, it is
now gone. This is the essence of the Zeno effect in quantum mechanics that
repeated observation can stop dynamical evolution. (It is also, once again, a
clear illustration of the maxim that in quantum mechanics ‘You get what you
see.’) Rarely however (with probability P1 = |ϵ|2 ) the measurement result
will be S1 = S2 = −1 heralding the presence of an X2 error. The correction
protocol then proceeds as originally described above. Thus error correction
still works for superpositions of no error and one error. A simple extension of
this argument shows that it works for an arbitrary superposition of all four
error states.

6.1 An advanced topic for the experts

There remains however one more level of subtlety we have been ignoring. The
above discussion assumed a classical noise source modulating the Hamilto-
nian parameters. However in reality, a typical source of error is that one of
the physical qubits becomes entangled with its environment. We generally
have no access to the bath degrees of freedom and so for all intents and pur-
poses, we can trace out the bath and work with the reduced density matrix
of the logical qubit. Clearly this is generically not a pure state. How can we
possibly go from an impure state (containing the entropy of entanglement
with the bath) to the desired pure (zero entropy) state? Ordinary unitary
operations on the logical qubit preserve the entropy so clearly will not work.
Fortunately our error correction protocol involves applying one of four pos-
sible unitary operations conditioned on the outcome of the measurement of
the stabilizers. The wave function collapse associated with the measurement
gives us just the non-unitarity we need and the error correction protocol
works even in this case. Effectively we have a Maxwell demon which uses
Shannon information entropy (from the measurement results) to remove an
equivalent amount of von Neumann entropy from the logical qubit!
To see that the protocol still works, we generalize Eq. (6.13) to include
CHAPTER 6. QUANTUM ERROR CORRECTION 189

the bath
p
|Ψ⟩ = [ 1 − |ϵ|2 |Ψ0 , Bath0 ⟩ + ϵX2 ]|Ψ0 , Bath2 ⟩. (6.14)

For example, the error could be caused by the second qubit having a coupling
to a bath operator O2 of the form

V2 = g X2 O2 , (6.15)

acting for a short time ϵℏ/g so that

|Bath2 ⟩ ≈ O2 |Bath0 ⟩. (6.16)

Notice that once the stabilizers have been measured, then either the exper-
imenter obtained the result S1 = S2 = +1 and the state of the system plus
bath collapses to

|Ψ⟩ = |Ψ0 , Bath0 ⟩, (6.17)

or the experimenter obtained the result S1 = S2 = −1 and the state collapses

|Ψ⟩ = X2 |Ψ0 , Bath2 ⟩. (6.18)

Both results yield a product state in which the logical qubit is unentangled
with the bath. Hence the algorithm can simply proceed as before and will
work.
Finally, there is one more twist in this plot. We have so far described a
measurement-based protocol for removing the entropy associated with errors.
There exists another route to the same goal in which purely unitary multi-
qubit operations are used to move the entropy from the logical qubit to
some ancillae, and then the ancillae are reset to the ground state to remove
the entropy. The reset operation could consist, for example, of putting the
ancillae in contact with a cold bath and allowing the qubits to spontaneously
and irreversibly decay into the bath. Because the ancillae are in a mixed
state with some probability to be in the excited state and some to be in the
ground state, the bath ends up in a mixed state containing (or not containing)
photons resulting from the decay. Thus the entropy ends up in the bath. It
is important for this process to work that the bath be cold so that the qubits
always relax to the ground state and are never driven to the excited state.
CHAPTER 6. QUANTUM ERROR CORRECTION 190

We could if we wished, measure the state of the bath and determine which
error (if any) occurred, but in this protocol, no actions conditioned on the
outcome of such a measurement are required.
Quantum error correction is extremely challenging to carry out in prac-
tice. In fact the first error correction protocol to actually reach the break
even point (where carrying out the protocol extends rather than shortens
the lifetime of the quantum information) was achieved by the Yale group in
2016. This was done not using two-level systems as qubits but rather by
storing the quantum information in the states of a harmonic oscillator (a
superconducting microwave resonator containing superpositions of 0, 1, 2, . . .
photons).
Chapter 7

Yet To Do

1. Dave Bacon lecture notes on reversible classical gates has a nice dis-
cussion of Charlie Bennett’s ”uncomputation” trick to erasure scratch
registers. Add a discussion of this.
2. Add more discussion of joint measurements with examples for students
to make sure they understand that Z ⊗X is not measured by measuring
Z ⊗ I and I ⊗ X since this give too much information. However the
expectation value of this operator can be measured from averaging
the product of the individual measurement results. Another example
is measuring XY = iZ which is NOT the product of the individual
measurements since they are incompatible.
3. Define mutual information, useful for QEC.

4. Important: For classical information theory present argument from

Raisbeck’s book to derive the Shannon formula when the probabilities
are not all equal. Brings in concept that Shannon is the average in-
formation in a message, not necessarily the information in a particular
message.
5. Inconsistent notation for states | ± Z⟩, | ± X⟩, |0⟩, |1⟩, |+⟩, |−⟩. See for
example section where Hadamard gate is introduced. Need to make no-
tation uniform or at least explain the equivalences among these states.
6. Discuss projective measurements in terms of projectors. Specifically
discuss projective measurements for only a subset of the qubits (so
the projector contains some identity operators). Let the reader know

191
CHAPTER 7. YET TO DO 192

that this is important for Simons’s algorithm and for quantum error
correction.

7. Perhaps introduce QM as a theory of probability by showing that

the probability of some event classically is the sum of the proba-
bilities of independent routes to the event. For example P(‘A or
B’) is P(A)[1-P(B)]+[1-P(A)]P(B)+P(A)P(B)=P(A)+P(B)-P(A)P(B)
whereas in quantum you have to add amplitudes before you square

8. Add to the appendix on statistics discussion of conditional probabili-

ties, factoring of probability distributions, meaning of statistical inde-
pendence, etc.

9. Insert 2019 homework as exercises in text

10. Did I insert proof that length preservation implies general inner product
conservation which implies unitary? Proof could be improved.

11. Appendix on linear algebra, outer products, direct sums, eigenval-

ues, degeneracies, determinants, traces, hermitian matrices producing
bases, etc.

12. 3 CNOTS make a SWAP but apply to a general state or to the X basis

13. quantum teleportation

14. Shor code as next step beyond 3 qubit repetition code

15. matrix mechanics for harmonic oscillator?

16. Make sure this statement: “Peculiar quantum interference effects per-
mit the Toffoli gate to be synthesized from two-qubit gates, something
that is not possible in (reversible) classical computation. More on this
later!” in Chapter 1 gets followed up. Show the quantum synthesis of
Toffoli from CNOTs.

17. In Chap 2 when I introduce entanglement I mention the special type of

entanglement called ’magic’ and say we will look into it further later.
Make sure this happens.
CHAPTER 7. YET TO DO 193

18. SECTION 3.0: Of course if the system is in a superposition of two

orthogonal states, the measurement result can be random. Conversely,
if a system is in one of two states that are not orthogonal, it is not
possible to reliably determine by measurement which state the system
is in. We shall explore this more deeply when we discuss measurements.
MAKE SURE TO DO THIS.

19. Need to make sure that we derive (perhaps in appendix B) that eigen-
vectors of any Hermitian operator form a complete basis. Then give an
exercise where students derive that the variance of measurement results
of an operator is ⟨ψ|Q̂2 |ψ⟩ − ⟨ψ|Q̂|ψ⟩2 .

20. Add derivation of completeness relation and eigenvectors of a Hermitian

operator are complete to App. B. Box 3.2 states that this will be in
App. B.

21. Add discussion of classical Euler angles to Sec. 3.4 and show that in
quantum a final rotation along the qubit polarization axis just intro-
duces a global phase on the state. [See Lecture 08 spring 2023.]

22. Consider adding a chapter on quantum channels in connection with

error connection.

23. in Chapter 3 or maybe appendix B, do a better job of defining the

differences among tensor, Kronecker and outer products.

24. Redo Box and Figure on physical implementation of CNOT to use |0⟩
and |1⟩ instead of | ↑⟩ and | ↓⟩. Also change sign of Hamiltonian.

25. Include section on entanglement distillation (referred to in Chap. 4).

Needs to be after errors are discussed.

26. Tighten up mixture of notation among σ x , X, | + Z⟩, |+⟩, etc.

27. create a Box to explain that Toffoli is universal for classical computa-
tion and Toffoli + Hadamard is universal for quantum (Dorit paper).
Classically the Toffoli cannot be synthesized from CNOTs and NOTs
but quantum mechanically it CAN be synthesized from CNOTs and
single qubit (non-Clifford) rotations.
CHAPTER 7. YET TO DO 194

28. Define the Clifford hierarchy somewhere. Perhaps when discussing sta-
bilizer codes in QEC.

29. IMPORTANT: Show that the space of binary strings is a vector space
over the field {0, 1}. The only allowed scalars are 0 and 1, and vectors
are added bitwise mod 2. Refer to this when we do the Deutsch and
the Deutsch-Jozsa algorithms.

30. Review JGEH comments on Box 2.2 on configuration space vs state

space vs phase space. See his email 2023.02.26 3:01pm.
Appendix A

Quick Review of Probability

and Statistics

A.1 Randomness
Randomness plays an essential role in the theory of information–both classical
and quantum. It is therefore useful for us to review basic concepts from
probability and statistics.
What is randomness anyway? In the classical world randomness is re-
lated to ignorance. We lack knowledge of all the conditions and parameters
needed to make accurate predictions. For example, flipping a coin and see-
ing if it lands face up or face down is considered random. But it isn’t really
random. If Bob watches Alice flip a coin and were able measure exactly how
rapidly she made it spin and measure its initial upward velocity, he could
predict (using Newton’s laws of classical dynamics) how long it will be in
the air and whether it will land face up or face down. In more complicated
dynamical systems with several interacting degrees of freedom, the motion
can be chaotic. Tiny changes in initial conditions (positions and velocities)
can lead to large changes in the subsequent trajectory. Thus even though
classical mechanics is completely deterministic, motion on long time scales
can appear to be random.
Many computer programs rely on so-called random number generators.
They are not actually random but rather chaotic iterative maps–they take
an initial seed number and compute some complicated function of that seed
to produce a new number. That number is then used as the input to the next

195
APPENDIX A. QUICK REVIEW OF PROBABILITY AND STATISTICS196

round of iteration. The results may look random and may even pass many
statistical tests for randomness, but if an observer knows the program that
was used and the starting seed, he or she can predict the entire sequence of
numbers perfectly.
In quantum mechanics randomness is an ineluctable feature. It is not due
to ignorance of initial conditions but rather is an essential part of the theory.
Alice can prepare N truly identical copies of a quantum state and Bob can
make a measurement of some physical property on each copy of that state
and obtain truly random (not pseudo-random) results. The results are not
random because of some ‘hidden variable’ whose value Alice forgot to fix or
Bob failed to measure. They are truly random–it is impossible to predict the
outcome of the measurement before it is performed.

A.2 Probabilities
The probability pj of an event j is a non-negative number obeying 0 ≤ pj ≤ 1.
The probabilities of all possible events (in a universe of M possible events)
obeys the sum rule
M
X
pj = 1. (A.1)
j=1

This simply means that one of the possible events (from the complete set of
all possible mutually exclusive events) definitely happened.
As an example, suppose we have an M -sided die (die is the singular form
of dice). On each face of the die is a number. Let xj denote the number of
the jth face of the die, and pj be the probability that the jth face lands face
up so that the number xj is showing when the die is randomly tossed onto a
table. We can define a so-called ‘random variable’ X to be the number that
comes up when we toss the die. X randomly takes on one of the allowed
values xj ; j = 1, . . . , M . We can now ask simple questions like, what is the
mean (i.e. average) value of X? This is also known as the expectation value
of X and is often denoted by an overbar or by double brackets
M
X
X̄ = ⟨⟨X⟩⟩ = p j xj . (A.2)
j=1
APPENDIX A. QUICK REVIEW OF PROBABILITY AND STATISTICS197

This sum over all possible results weighted by their frequency of occurrence
gives the value one would obtain by averaging the results of a huge number
of trials of the experiment (of rolling the die).
As an example, suppose we have a standard cube-shaped die with the six
faces numbered 1 through 6, that is xj = j. We will take the ‘measurement
result’ to be the number on the top face of the die after it stops rolling. If
the die is fair, the probability of result xj is pj = 16 , and thus the mean value
will be
6 6
X X 1
X̄ = p j xj = j = 3.5. (A.3)
j=1 j=1
6

Exercise A.1. Consider a pair of standard six-sided die with faces

numbered consecutively from 1 to 6. Assuming the dice are fair

a) What are the unique possible values for the sum of the two num-
bers showing on the top faces of the dice?

b) What is the probability that each of these unique possible values

occurs?

For later purposes, it will also be useful to consider what happens when
we have two independent rolls of the die. Let the random variable X1 be
the number that comes up on the first toss and let X2 be the number that
comes up on the second toss. What is the joint probability distribution for
the two results? That is, what is the probability P(xj , xk ) that X1 = xj
and X2 = xk ? Because each result is drawn independently from the same
probability distribution we simply have that the probability for a given result
of two tosses is just the product of the probabilities of the individual results

P(xj , xk ) = pj pk . (A.4)

This is simply the statement that the joint probability distribution factorizes
into two separate distributions for the individual events (assuming the events
APPENDIX A. QUICK REVIEW OF PROBABILITY AND STATISTICS198

are independent of each other). From this it follows that

M X
X M
⟨⟨X1 X2 ⟩⟩ = P(xj , xk )xj xk
j=1 k=1
M X
X M
= p j p k xj xk
j=1 k=1
(M )( M )
X X
= p j xj p k xk
j=1 k=1

= ⟨⟨X1 ⟩⟩⟨⟨X2 ⟩⟩ = X̄ 2 . (A.5)

We conclude that for independent (i.e. uncorrelated) random variables, the

mean of the product is equal to the product of the means.
The random variables X1 , X2 will not be independent if they correspond
to the result from the same throw of the die. Then of course X2 = X1 = X
and we have a different result
M
X
2
⟨⟨(X) ⟩⟩ = pj (xj )2 . (A.6)
j=1

This is simply the mean of the square of the number that comes up when we
toss the die. Does the mean of the square bear any relation to the square
of the mean, X̄ 2 ? To find out, let us consider the so-called variance of the
distribution, the mean of the square of the deviation of the random variable
from its own mean
M
2
2 X 2
σ ≡ ⟨⟨ X − X̄ ⟩⟩ = pj xj − X̄
j=1
M
X
pj (xj )2 − 2X̄xj + X̄ 2

=
j=1

= ⟨⟨X 2 ⟩⟩ − ⟨⟨X⟩⟩2 . (A.7)

In deriving this result we have used the fact that

M
X M
X
pj X̄ 2 = X̄ 2 pj = X̄ 2 , (A.8)
j=1 j=1
APPENDIX A. QUICK REVIEW OF PROBABILITY AND STATISTICS199

and
M
X M
X
pj 2X̄xj = 2X̄ pj xj = 2X̄ 2 . (A.9)
j=1 j=1

Clearly the variance is non-negative σ 2 ≥ 0 because (X − X̄)2 can never

be negative. Hence we conclude

⟨⟨X 2 ⟩⟩ ≥ ⟨⟨X⟩⟩2 . (A.10)

The variance is a measure of the width of the probability distribution.

Another related quantity is the standard deviation or ‘root mean square
deviation’ of the random variable from its mean
2 1
σ ≡ ⟨⟨ X − X̄ ⟩⟩ 2 . (A.11)

To understand better what we mean by the width of the distribution consider

the following examples. If a six-sided die has the number 3 painted on every
face the probability distribution has zero variance. The only number that
ever comes up is 3 so the mean of the distribution is 3 and no result ever
deviates from the mean. Similarly, if the die has the standard consecutive
numbering of the faces from 1 to 6, then a wide variety of outcomes is possible.
If the die is fair then all outcomes are equally likely and the probability
distribution is wide as shown in the left panel of Fig. A.1. Suppose however
that the die is unfair and yields the number 3 with probability p3 = 0.8, and
the other numbers with probability p1 = p2 = p4 = p5 = p6 = 0.04. This
distribution is plotted in the right panel of Fig. A.1. The distribution has
wide support (i.e. is non-zero over the full range from 1 to 6) but is still
sharply peaked at 3. Hence the variance is small but not zero.
Exercise A.2. A standard six-sided die has faces numbered consec-
utively from 1 to 6. Find the variance and standard deviation of the
probability distribution associated with random throws of the die

a) Assuming the die is fair (as in the left panel of Fig. A.1).

b) Assuming the die is biased with the probabilities given in the right
panel of Fig. A.1.
APPENDIX A. QUICK REVIEW OF PROBABILITY AND STATISTICS200

mean = 3.50 mean = 3.12

 = 2.917
2
 2 = 0.746
 = 1.708  = 0.863

Figure A.1: Left panel: Graph of the probability distribution for the outcome of the throw
of a fair die (pj = 1/6; j = 1, . . . , 6). The variance is large. Right panel: Graph of the
probability distribution of an unfair (highly biased) die having p3 = 0.8, and p1 = p2 =
p4 = p5 = p6 = 0.04. The variance is smaller.

A.3 Statistical Estimators

When performing experiments one may wish to attempt to measure the av-
erage value of some statistical quantity. Since we can execute only a finite
number of runs of the experiment, we cannot be guaranteed to obtain the
exact value of the mean, only an estimate.
For N trials we can form a so-called ‘estimator’ X̃ for X̄, the mean value
of a random variable via
N
1 X
X̃(N ) = Xk , (A.12)
N k=1

where Xk is the number that came up in the kth run of the experiment (e.g.
throw of the die). We use the tilde over the X to indicate that this is an
estimator for the mean, not the true mean.
For finite N , this estimator is not likely to be exact, but for large N we
expect it to become better and better. Is there a way to determine how
accurate this estimator is likely to be? Indeed there is. Let us write

X̃(N ) = X̄ + δN , (A.13)

where the error δN is given by

N
1 X
δN = Xk − X̄ (A.14)
N k=1
APPENDIX A. QUICK REVIEW OF PROBABILITY AND STATISTICS201

If we were to ensemble average this estimator over a huge (formally infinite)

number of experiments consisting of N throws we would obtain
N
1 X
⟨⟨δN ⟩⟩ = ⟨⟨Xk ⟩⟩ − X̄ = 0. (A.15)
N k=1

Thus the average error vanishes. That is, our estimator is unbiased as ex-
pected.
We can get a sense of the typical size of the error by considering its
variance:
2
σN ≡ ⟨⟨(δN )2 ⟩⟩
N N
2 1 XX
σN = ⟨⟨ X j − X̄ X k − X̄ ⟩⟩
N 2 j=1 k=1
N N
1 XX
= ⟨⟨Xj Xk ⟩⟩ − X̄ 2
N 2 j=1 k=1
N N
1 X 2 1 X
= 2
⟨⟨Xj Xk ⟩⟩ − X̄ + 2 ⟨⟨Xj Xk ⟩⟩ − X̄ 2
N j̸=k N j=k
N
1 X
= ⟨⟨(Xj )2 ⟩⟩ − X̄ 2
N 2 j=1
1 2
= σ , (A.16)
N 1
where σ1 = σ, the standard error for a single throw of the die defined in
Eq. (A.11). Thus our estimator has a random error whose variance decreases
inversely with N . The standard error thus is
1
σN = √ σ1 . (A.17)
N
This is a simple estimate of the size of the error that our estimator of the
mean is likely to have.
APPENDIX A. QUICK REVIEW OF PROBABILITY AND STATISTICS202

Box A.1. Sample variance vs. true variance Care must be exercised
when estimating the variance σ12 of an unknown probability distribution from
a finite sample size drawn from the distribution. If (somehow) we know the
true mean of the distribution, then we can simply use as our estimator of the
variance
N
1 X
σ̃ 2 = (Xj − X̄)2 . (A.18)
N j=1

It is straightforward to show that this is an unbiased estimator since

⟨⟨σ̃ 2 ⟩⟩ = σ12 . (A.19)

The situation is not so simple when we do not know the mean of the un-
known distribution and are forced to estimate it using Eq. (A.12). While
this estimator of the mean is unbiased it still will have some small error and
that error is positively correlated with the values of Xj in our sample. This
means that if we use as our estimator of the variance
N
2 1 X
σ̃ = (Xj − X̃(N ))2 , (A.20)
N j=1

it will be biased because it is too small. As an extreme example, consider

the case N = 1. Our estimate of the mean is X̃(N ) = X1 . If we substitute
this for the mean in Eq. (A.20) we always obtain σ̃ 2 = (X1 − X1 )2 = 0. A
straightforward calculation shows that for general N
N −1 2
⟨⟨σ̃ 2 ⟩⟩ = σ1 , (A.21)
N
which is consistent with our result that it vanishes for N = 1. Therefore if
we want to have an unbiased estimate of the variance we should use
N
N 1 X
σ̃S2 = σ̃ 2 = (Xj − X̃(N ))2 .. (A.22)
N −1 N − 1 j=1

We will refer to this as the unbiased sample variance.

APPENDIX A. QUICK REVIEW OF PROBABILITY AND STATISTICS203

Let us return now to the question of estimating the mean of a distri-

bution from N samples. We have an unbiased estimator and have already
computed the variance of our estimator in Eq. (A.17). The question arises as
to whether or not we could say something about the probability distribution
of the error. For large N , the error δN in Eq. (A.14) is the sum of a large
number of small random terms. By the central limit theorem, the error will
be, to a good approximation, Gaussian distributed. That is, the probability
distribution for the error is well approximated by a continuous distribution
having probability density
1 − 12 δN
2σ
2
P (δN ) = p 2
e N . (A.23)
2πσN
The interpretation of probability density for a continuous variable x is the
following. P (x)dx is the probability that the value of the random variable
X lies between x and x + dx. The normalization condition on probability
becomes an integral (see the discussion on Gaussian integrals in Box A.2)
Z +∞
dx P (x) = 1. (A.24)
−∞

Summations over random variables such as in Eq. (A.14) can be inter-

preted as random walks. Suppose that ∆ is a random variable with equal
probability of being ±ϵ, where ϵ is a fixed step length. Then a random walk
of N steps ends up a position
N
X
x= ∆j , (A.25)
j=1

where ∆j is the value of the random variable ∆ for the jth step.
In Fig. A.2 we see plots of the probability distribution for x for different
values of N , together with the Gaussian approximation to it. We see that the
Gaussian approximation is quite good even for modest values of N . This is
the essence of the central limit theorem. The sum of a large number of ran-
dom variables (with bounded variance) is well-approximated by a Gaussian
distribution.
Exercise A.3 provides an opportunity to prove the central limit theorem
for the particular case of a random walk. To see how this works, let us
derive the exact probability distribution for a random walk of N steps, each
APPENDIX A. QUICK REVIEW OF PROBABILITY AND STATISTICS204

of size ϵ = 1/2. We take N even so that the final position m of the walker is
always an integer and lies in the interval m ∈ [−N/2, +N/2]. Let p± be the
probability of stepping to the right or left respectively. Let the number of
steps to the right be R and the number to the left be L. We have N = R + L
and the final position is given by m = (R − L)ϵ. The probability of any given
sequence of steps is
P = pR L
+ p− . (A.26)
The number of different walks with R steps to the right and L steps to the
left can be determined from combinatorics. Think of a string of N symbols,
ℓ and r denoting the direction of each step in the random walk. There
are altogether N ! permutations of the order of these symbols. However L!
permutations merely swap ℓ’s with other ℓ’s and so should not be counted
as distinct walks. Similarly there are R! permutations of the r’s that should
not be counted. The number of distinct walks M (R, L) is therefore
N!
M (R, L) = . (A.27)
R!L!
The probability of ending up at m = (R − L)ϵ after N = R + L (even) steps
is therefore

R L N! R (N −R) N
P (N, m) = p+ p− = p+ p− (A.28)
R!L! R
where the last expression is the binomial coefficient

N N!
= . (A.29)
R R!(N − R)!
For this reason, this probability distribution is known as the binomial distri-
bution. Notice that this expression is correctly normalized because
+N N
X X (N −R) N!
P (N, m) = pR
+ p− = (p+ +p− )N = (1)N = 1, (A.30)
m=−N R=0
R!(N − R)!

where we have used the binomial theorem to evaluate the sum.

If the final position is m = (R−L)ϵ, we have (for ϵ = 1/2) R = (N +m)/2
and L = (N − m)/2 so Eq. (A.28) can also be written
(N +m)/2 N −m)/2 N!
P (N, m) = p+ p− (A.31)
[(N + m)/2]! [(N − m)/2]!
APPENDIX A. QUICK REVIEW OF PROBABILITY AND STATISTICS205

Exercise A.3. A random walker moving in one dimension takes steps

of length ϵ = 1/2 to the right (x → x + ϵ) with probability p and to
the left (x → x − ϵ) with probability q = 1 − p. The walker starts at
x = 0 and the position after N steps is simply the algebraic sum of all
the steps. The final position lies in the interval [−N ϵ, +N ϵ]. If ϵ = 1/2
and N is even, only integer positions are possible.

a) Derive exact expressions for the mean and variance of the position
m after N steps by two different methods. Hint: You can either
derive the appropriate properties of the binomial distribution, or
you can use the fact that the individual walk steps have a mean
and variance and are statistically independent. For the former it
is useful to notice that p+ ∂p∂+ (p+ )R = R(p+ )R .

b) Discuss and explain what happens to the variance as p approaches

either 0 or 1.

c) Take the logarithm of P (N, m) in Eq. (A.31) and use Stirling’s

(asymptotic∗ ) expansion
1
ln Z! ∼ Z ln Z − Z + ln(2πZ) (A.32)
2
for the factorials in the binomial coefficients to show that for large
N the binomial distribution with q = p = 1/2 approaches the
Gaussian distribution
1 1 2
PN (x) = √ e− 2σ2 x , (A.33)
2πσ 2
√ √
where the standard deviation is σ = N ϵ = N /2. Hint: In
evaluating the logarithm of the probability, keep terms of order
x2 /N but neglect higher-order terms in 1/N such as (x/N )2 .

This constitutes a proof of the central limit theorem for this special case.

∗ Note that an asymptotic expansion does not mean that the difference
between the LHS and RHS of Eq. (A.32) becomes arbitrarily close to zero
for large N . It means that the ratio of the RHS to the LHS approaches
unity for large N . Since both quantities are diverging, their difference
can still be large even if their ratio approaches unity.
APPENDIX A. QUICK REVIEW OF PROBABILITY AND STATISTICS206

Box A.2. Gaussian Integrals A general Gaussian integral over a single

variable has the form
Z +∞
2
I= dz e−az +bz . (A.34)
−∞

It turns out that integrals of this form (and its multi-dimensional general-
ization) are ubiquitous in physics, so it is handy to know how to carry them
out. We begin by ‘completing the square’ by writing
Z +∞
b 2 b2
I= dz e−a[z− 2a ] + 4a . (A.35)
−∞

b
Shifting the dummy variable of integration to x = z − 2a
we have
Z +∞
b2 2
I=e 4a dx e−ax . (A.36)
−∞

It turns out to be easier to consider the square of the integral which we can
write as
h b2 i2 Z +∞ 2 2
2
I = e 4a dxdy e−a[x +y ] . (A.37)
−∞

This two-dimensional
p integral is readily carried out using polar coordinates
r, θ with r = x2 + y 2
h b2 i 2 Z ∞ 2
2
I = e 4a 2πrdr e−ar . (A.38)
0

Defining u = ar2 and using du = 2ardr we have

π h b2 i2 ∞
Z
2
I = e 4a du e−u (A.39)
a 0

which yields the final result

r
π b2
I= e 4a . (A.40)
a

From this we see that the Gaussian probability distribution in Eq. (A.23) is
properly normalized to unity.
APPENDIX A. QUICK REVIEW OF PROBABILITY AND STATISTICS207

N =2 N =4

N =8 N = 16

Figure A.2: Dots: Probability distribution for the ending position of random walks of N
with step √
length ϵ = 1/2. The smooth curve is the Gaussian distribution with standard
deviation N /2. The Gaussian approximation rapidly becomes very accurate as N in-
creases.

A.4 Joint Probability Distributions

This section provides background for the discussion of multi-qubit measure-
ments in Sec. 4.2.
Suppose that we have a probability distribution P (A, B) for two random
variables, A and B. A and B might for example be the measured values
of the Pauli Z operator for each two qubits in a quantum computer. Thus
for example P (+1, −1) is the probability that the measured value of Z for
q0 is −1 and the measured value of Z for q1 is +1. Altogether there are
four possible measurement results for the combined measurement of the two
qubits and P obeys the usual sum rule
X X
P (A, B) = 1. (A.41)
A=±1 B=±1

The probability distribution for measurement results on q1 is simply

X
P1 (A) = P (A, B). (A.42)
B=±1
APPENDIX A. QUICK REVIEW OF PROBABILITY AND STATISTICS208

That is, it is the sum of the probabilities of all the independent ways that q0
can be found with measured value A independent of what B is. Notice that
it follows immediately from the definition in Eq. (A.42) and the sum rule in
Eq. (A.41) that this distribution is properly normalized
X X X
P1 (A) = P (A, B) = 1. (A.43)
A=±1 A=±1 B=±1

Similarly, the probability distribution for measurement results on q0 is

X
P0 (B) = P (A, B), (A.44)
A=±1

and this too is properly normalized.

A.4.1 Conditional Probabilities and Bayes Rule

Another useful quantity is the conditional probability distribution of A given
a fixed value of B which is denoted P (A|B). You might naively think that
this is simply given by

P (A|B) = P (A, B), (A.45)

with B held fixed. This however is not properly normalized. The correctly
normalized expression is
P (A, B)
P (A|B) = P (A.46)
A=±1 P (A, B)
P (A, B)
= . (A.47)
P0 (B)
From this it follows that

P (A, B) = P (A|B)P0 (B). (A.48)

Similarly, the probability distribution of B given a fixed value of A is

P (A, B)
P (B|A) = P (A.49)
B=±1 P (A, B)
P (A, B)
= . (A.50)
P1 (A)
APPENDIX A. QUICK REVIEW OF PROBABILITY AND STATISTICS209

From this it follows that

P (A, B) = P (B|A)P1 (A). (A.51)

Equating the above two expressions for P (A, B) yields the very important
Bayes rule:
P0 (B)
P (B|A) = P (A|B) . (A.52)
P1 (A)

This is especially powerful in situations (which occur frequently) in which

P (B|A) is hard to compute but P (A|B) is easy to compute.
To understand Bayes rule better, let us consider the example discussed
in Box 4.6 in which we given the problem of quantifying the information
gain from making a measurement. To recap the setup of the problem: Alice
randomly gives Bob one copy of one of the following four states
|ψ0 ⟩ = |0⟩ (A.53)
|ψ1 ⟩ = |1⟩ (A.54)
|ψ2 ⟩ = |+⟩ (A.55)
|ψ3 ⟩ = |−⟩ (A.56)
selected with equal probability. Here of course the Shannon entropy of the
distribution prior to the measurement is Sprior = 2 bits. Suppose Bob mea-
sures Alice’s selected qubit in the Z basis and obtains the result z = +1.
With this new knowledge he must update the probability distribution from
the one prior to the measurement to the new one conditioned on the mea-
surement result. The Born rule tells us immediately what the probability is
for the measurement result being the observed z = +1 conditioned on the
state being |ψj ⟩
P (z = +1|ψ0 ) = 1 (A.57)
P (z = +1|ψ1 ) = 0 (A.58)
1
P (z = +1|ψ2 ) = (A.59)
2
1
P (z = +1|ψ0 ) = . (A.60)
2
What we are after however is the reverse, namely the probability that the
observed result z = +1 was obtained from state ψj , P (ψj |z = +1). This is
APPENDIX A. QUICK REVIEW OF PROBABILITY AND STATISTICS210

precisely the situation where Bayes Rule is useful. We have from Eq. (A.52)
Pψ (ψj )
P (ψj |z = +1) = P (z = +1|ψj ) , (A.61)
Pz (z = +1)
where Pψ (ψj ) is the prior probability that Alice selects state ψj . In this
case, since Alice chooses each state with equal probability, we have Pψ (ψj ) =
1
4
for all four values of j. Pz (z) is the prior probability of obtaining the
measurement result z. In this case, we find using Eqs. (A.57-A.60)

3
X 1 1 1 1
Pz (z = +1) = P (z = +1|ψj )Pψ (ψj ) = 1 + 0 + + = .
j=0
2 2 4 2
(A.62)

We see that Pz (z = +1) on the RHS of Eq. (A.62) is just a constant factor
needed to fix the normalization of the posterior probability distribution on
the LHS.
Combining all these results with Eq. (A.61) yields the results given in
Box 4.6
1
P (ψ0 |z = +1) = (A.63)
2
P (ψ1 |z = +1) = 0 (A.64)
1
P (ψ2 |z = +1) = (A.65)
4
1
P (ψ3 |z = +1) = . (A.66)
4
(A.67)

As noted in Box 4.6, this probability distribution has Shannon entropy of

(3/2) bits. Thus the information gain Bob receives from his measurement is

3 1
I = Sprior − Spost = 2 − bits = bit.
2 2
Appendix B

Formal Definition of a Vector

Space

In classical physics and engineering, we generally work with simple three-

component position, velocity and acceleration vectors consisting of a triple
of real numbers. The concept of a vector space is however much more general
than this. Strictly speaking, we should refer to a ‘vector space over a field.’
Here are the properties:
We need a field S (e.g., a set of objects like the real numbers or the
complex numbers on which two operations, multiplication and addition are
defined). The elements of the field are called scalars.
The vectors are a set of objects {V } with two operations:
addition, so that for ⃗v1 , ⃗v2 ∈ {V }, ⃗v1 + ⃗v2 is also in {V } (closure under
addition).
multiplication by a scalar. If s ∈ S then s⃗v1 ∈ {V } (closure under
scalar multiplication).
The following eight properties must be satsified:
1. (⃗v1 + ⃗v2 ) + ⃗v3 = ⃗v1 + (⃗v2 + ⃗v3 ) (addition is associative)
2. ⃗v1 + ⃗v2 = ⃗v2 + ⃗v1 (addition is commutative)
3. ∃ ⃗0 ∈ {V } such that ⃗0 + ⃗v = ⃗v (additive identity exists)
4. ∀ ⃗v ∈ {V }, ∃ − ⃗v ∈ {V } such that ⃗v + (−⃗v ) = ⃗0. (additive inverse
exists)

211
APPENDIX B. FORMAL DEFINITION OF A VECTOR SPACE 212

5. s(⃗v1 + ⃗v2 ) = s⃗v1 + s⃗v2 (1st distribution law)

6. (s1 + s2 )⃗v = s1⃗v + s2⃗v (2nd distribution law)

7. s1 (s2⃗v ) = (s1 s2 )⃗v

8. 1(⃗v ) = ⃗v (where 1 is the multiplicative identity of the scalar field)

B.1 Basis Vectors and the Dimension of a

Vector Space
The dimension D of a vector space is the maximum number1 of linearly inde-
pendent vectors. Suppose that we have a set of vectors Q = {B1 , B2 , . . . , BN }
and we consider the function
N
X
F (s1 , s2 , . . . , sN ) = sj Bj . (B.1)
j=1

The set Q of vectors is by definition linearly independent if the only solution

to the equation F (s1 , s2 , . . . , sN ) = 0 is that all of its arguments vanish
(s1 , s2 , . . . , sN ) = (0, 0, . . . , 0). If N > D there are too many vectors to all be
pointing in orthogonal directions and some of the vectors are redundant in
the sense that they can be expressed as linear combinations of other vectors.
In this case F = 0 has solutions for non-zero arguments of F .
A basis for a vector space of dimension D is a set of vectors Q =
{B1 , B2 , . . . , BD } which span the space. That is every vector V in the space
can be written as a (unique) linear combination of the basis vectors
D
X
V = bj Bj (B.2)
j=1

by choosing appropriate values for the set of scalars (coefficients)

(b1 , b2 , . . . , bD ). This list of coefficients constitutes a representation of the
abstract vector V in this basis B.
1
We consider here only the case where the Hilbert space dimension is finite (or at least
countable).
APPENDIX B. FORMAL DEFINITION OF A VECTOR SPACE 213

As a familiar example, the unit vectors

î = (1, 0, 0) (B.3)
ĵ = (0, 1, 0) (B.4)
k̂ = (0, 0, 1) (B.5)
can be used to form any position vector in ordinary three-dimensional space

⃗r = (x, y, z) = xî + y ĵ + z k̂. (B.6)

These Cartesian coordinate basis vectors happen to be orthogonal (formally
defined below) because they are perpendicular in the usual geometric sense.
However a basis set is not required to be orthogonal, only complete.

B.2 Inner Products and Norms

For the ordinary vectors like positions and displacements (differences of po-
sition vectors) that we are used to the concept of the length (or ‘norm’) of a
vector defined in terms of Pythagorean theorem. If
⃗ = (Ax , Ay , Az ),
A (B.7)
then the norm is
q
⃗ = A2 + A2 + A2 .
|A| (B.8)
x y z

The general definition of a norm for a vector space is (more or less) any
mapping from the vectors to the non-negative real numbers satisfying the
⃗ + B|
triangle inequality |A ⃗ ≤ |A|
⃗ + |B|.
⃗ It is important to note that a given
vector space need not have a norm defined.
For ordinary vectors we are used to the dot product
⃗·B
A ⃗ = ax Bx + Ay By + Az Bz = |A||
⃗ B|⃗ cos θAB , (B.9)
where θAB is the angle between the two vectors. The dot product is a specific
example of the general concept of an inner product. An inner product of two
abstract vectors V1 and V2 is a mapping onto a scalar s, often denoted
(V1 , V2 ) = s. (B.10)
For the case where the vector space is defined over the field of complex
numbers, an inner product satisfies the following requirements:
APPENDIX B. FORMAL DEFINITION OF A VECTOR SPACE 214

Linearity in the second argument

(V1 , aV2 ) = as, (B.11)

(V1 , V2 + V3 ) = (V1 , V2 ) + (V1 , V3 ), (B.12)

Anti-linearity (conjugate linearity) in the first argument

(aV1 , V2 ) = a∗ (V1 , V2 ), (B.13)

(V1 , V2 ) = (V2 , V1 )∗ , (B.14)

Positive semi-definite

(V1 , V1 ) ≥ 0 (B.15)

and

(V1 , V1 ) = 0 =⇒ V1 = ⃗0. (B.16)

[Note the first zero above is the scalar zero and the arrow is placed on
the second zero to make clear that this is the null vector (additive zero
vector) not the scalar zero.]
Two vectors are defined to be orthogonal if their inner product vanishes.
Exercise B.1. Prove that the usual dot product for real three-
dimensional vectors satisfies the definition of an inner product.
The Pythagorean norm defined above for ordinary real vectors is thus
related to the inner product of the vector with itself
p
⃗ = A
|A| ⃗ · A.
⃗ (B.17)

p
Exercise B.2. Prove that for a general complex vector space, (V, V )
satisfies the definition of a norm.
In describing qubit states we will deal with two component complex val-
ued vectors of the form
Ψ1 = (α1 , β1 ), (B.18)
Ψ2 = (α2 , β2 ). (B.19)
APPENDIX B. FORMAL DEFINITION OF A VECTOR SPACE 215

The standard inner product and norm for such vectors are defined in Ex. B.3.
Exercise B.3. Prove that the following generalization of the dot prod-
uct to complex vectors satisfies the definition of an inner product

(Ψ1 , Ψ2 ) = α1∗ α2 + β1∗ β2 . (B.20)

and prove that

p p
(Ψ1 , Ψ1 ) = α1∗ α1 + β1∗ β1 . (B.21)

satisfies the definition of a norm.

B.3 Dirac Notation, Outer Products and Op-

erators for Single and Multiple Qubits
Physicists generally use a notation for complex state vectors and their inner
products that was developed by Paul Adrian Maurice Dirac. In this notation
the complex vector in Eq. (B.18) is represented as column vector instead of
a row vector using the ‘bracket’ notation

α1
|Ψ1 ⟩ = , (B.22)
β1

where |Ψ1 ⟩ is referred to as a ‘ket’ vector and the inner product is represented
by

(Ψ1 , Ψ2 ) = ⟨Ψ1 |Ψ2 ⟩, (B.23)

where the ‘dual vector’

⟨Ψ1 | = (α1∗ , β1∗ ) , (B.24)

is a row vector representing the conjugate transpose of the column vector.

In this notation, the inner product is computed using the rules of matrix
multiplication

(α1∗ , β1∗ )

α2
⟨Ψ1 |Ψ2 ⟩ = = α1∗ α2 + β1∗ β2 . (B.25)
β2
APPENDIX B. FORMAL DEFINITION OF A VECTOR SPACE 216

The Dirac notation is also very convenient for defining the outer product
of two vectors which, as described in Chapter 3 is a linear operator that maps
the vector space back onto itself (i.e. maps vectors onto other vectors)

O = |Ψ2 ⟩⟨Ψ1 |. (B.26)

Applying this operator to a vector |Ψ3 ⟩ yields

O|Ψ3 ⟩ = (|Ψ2 ⟩⟨Ψ1 |) |Ψ3 ⟩ = |Ψ2 ⟩⟨Ψ1 |Ψ3 ⟩ = |Ψ2 ⟩ (⟨Ψ1 |Ψ3 ⟩) = s13 |Ψ2 ⟩, (B.27)

where the scalar s13 is given by the inner product (also known in quantum
parlance as the ‘overlap’)

s13 = ⟨Ψ1 |Ψ3 ⟩. (B.28)

Subsituting the definitions of of the bra and ket vectors in Eq. (B.26) we see
that the abstract operator can be represented as a matrix

(α1∗ , β1∗ ) α2 α1∗ α2 β1∗

α2
O= = . (B.29)
β2 β2 α1∗ β2 β1∗

By simply switching the order of the row and column vector from that in
Eq. (B.25), the rules of matrix multiplication give us a matrix instead of a
scalar!
Exercise B.4. Use the matrix representation of O in Eq. (B.29) and
apply it to the column vector representation of |Ψ3 ⟩

γ
|Ψ3 ⟩ =
δ

to verify the last equality in Eq. (B.27.)

Recall that the adjoint of the product of two matrices is the product of
the adjoints in reverse order

(AB)† = B † A† . (B.30)

It works the same way in the Dirac notation for the outer product of two
vectors that forms an operator. Thus the adjoint of the operator in Eq. (B.26)
is simply

O† = |Ψ1 ⟩⟨Ψ2 |. (B.31)

APPENDIX B. FORMAL DEFINITION OF A VECTOR SPACE 217

To see that this is true, it is best to work with the representation in Eq. (B.29)
† †
(α1∗ , β1∗ ) α2 α1∗ α2 β1∗

† α2
O = = , (B.32)
β2 β2 α1∗ β2 β1∗
(α2∗ , β2∗ ) α1 α2∗ α1 β2∗

α1
= = , (B.33)
β1 β1 α2∗ β1 β2∗
= |Ψ1 ⟩⟨Ψ2 |. (B.34)

Tensor Products and Multiqubit States

The so-called tensor product of a pair of 2 × 2 matrices produces a 4 × 4
matrix

A11 [B] A12 [B]
A⊗B = (B.35)
A21 [B] A22 [B]
 
B11 B12 B11 B12
 A11 B21 B22 A12
B21 B22  .

=  (B.36)
B11 B12 B11 B12 
A21 A22
B21 B22 B21 B22
Such tensor products appear when dealing with the Hilbert space of two
qubits where the operator A acts on one qubit and operator B acts on the
other. This Hilbert space has dimension four and the states are represented
by column vectors of length four
 
ψ0
 ψ1 
|ψ⟩ = 
 ψ2  .
 (B.37)
ψ3

For the Hilbert space of n qubits, we will use the following Dirac notation
for state vectors and their duals in the computational basis

|ψ⟩ = |bn−1 . . . b2 b1 b0 ⟩ (B.38)

⟨ψ| = ⟨bn−1 . . . b2 b1 b0 | (B.39)

in which the qubits are numbered from 0 to n − 1 and their values bj ∈ {0, 1}
are ordered from right to left. (Note that we maintain this same label ordering
in the dual vector.) The computational basis is simply the tensor product
APPENDIX B. FORMAL DEFINITION OF A VECTOR SPACE 218

of the computational basis of the individual qubits (illustrated here for the
case of two qubits)
 
1
1 1  0 
|00⟩ = |0⟩ ⊗ |0⟩ = ⊗ =  (B.40)
0 0  0 
0
 
0
1 0  1 
|01⟩ = |0⟩ ⊗ |1⟩ = ⊗ =  (B.41)
0 1  0 
0
 
0
0 1  0 
|10⟩ = |1⟩ ⊗ |0⟩ = ⊗ =  (B.42)
1 0  1 
0
 
0
0 0  0 
|11⟩ = |1⟩ ⊗ |1⟩ = ⊗ = 0 .
 (B.43)
1 1
1
Notice that if we label the ordinal positions in the column vector starting with
0 at the top and ending with 3 as in Eq. (B.37), then the binary representation
of the position of the entry containing 1 gives the state of the two qubits
in the computational basis. For example |11⟩ corresponds to the binary
representation of the number 3 which in turn corresponds to the location of
the entry 1 being at the bottom of the column vector in Eq. (B.43)
We can also write the dual vectors associated with the above two-qubit
state vectors. For example the dual of the vector in Eq. (B.41) is

⟨01| = ⟨0| ⊗ ⟨1| = 1 0 ⊗ 0 1 = 0 1 0 0 . (B.44)

Recall from Eq. (B.30) for ordinary products of matrices we need to reverse
the order of the matrices when forming the transpose. However in forming
the dual of the tensor product |0⟩ ⊗ |1⟩, we do not reverse the order of the
two terms in the tensor product. This is because of our convention of keeping
the bit order the same when writing the dual of |01⟩ as ⟨01| rather than ⟨10|.
As examples of operators acting on this Hilbert space consider the joint
APPENDIX B. FORMAL DEFINITION OF A VECTOR SPACE 219

Pauli operators
 
0 +1 0 0
 +1 0 0 0 
Z ⊗X = , (B.45)
 0 0 0 −1 
0 0 −1 0
and
 
0 0 +1 0
 0 0 0 −1 
X ⊗Z =
 +1 0
. (B.46)
0 0 
0 −1 0 0

It is straightforward to verify that (for example)

   
0 +1 0 0 0 0
 +1 0 0 0  0  = − 0 
    
(Z ⊗ X)|10⟩ =  (B.47)
 0 0 0 −1   1   0 
0 0 −1 0 0 1

is equivalent to (i.e., correctly represents)

(Z ⊗ X)|10⟩ = (Z|1⟩) ⊗ (X|0⟩) (B.48)
= (−|1⟩) ⊗ (|1⟩) (B.49)
= −|11⟩. (B.50)
Similarly
    
0 0 +1 0 0 1
 0 0 0 −1  0   0 
(X ⊗ Z)|10⟩ = 
 +1 0
  =  , (B.51)
0 0  1   0 
0 −1 0 0 0 0
is equivalent to
(X ⊗ Z)|10⟩ = (X|1⟩) ⊗ (Z|0⟩) (B.52)
= (|0⟩) ⊗ (+|0⟩) (B.53)
= +|00⟩. (B.54)
The above examples show us that the tensor product of two operators is
represented by a 4 × 4 matrix that can act on the column vector representing
APPENDIX B. FORMAL DEFINITION OF A VECTOR SPACE 220

the tensor product of two single qubit states. But we also see that this
is exactly equivalent to each operator acting separately on their respective
qubits and then taking the tensor product of the resulting state vectors. If
we want an operator that acts on only qubit q0 we simply tensor it with the
identity acting on q1 . For example,
 
0 1 0 0
 1 0 0 0 
X0 = I ⊗ X =   0 0 0 1 ,
 (B.55)
0 0 1 0

whereas if we want the operator to act only on q1 we should use

 
0 0 1 0
 0 0 0 1 
X0 = X ⊗ I =   1 0 0
. (B.56)
0 
0 1 0 0

It is important to note that the tensor product of two Pauli matrices is

not the most general form of two-qubit operator. For example, the CNOT
gate shown in Eq. (4.19) is a sum of four different products of Pauli operators.

B.4 The Vector Space of Bit Strings

Computer and information science deals with bit strings and it turns out
that these form a vector space over a field. To see this, consider the set
of all possible bit strings ⃗x = (xn−1 xn−2 . . . x2 x1 x0 ) of length n. Since every
element in the vector is a bit whose only allowed values are 0 and 1, we define
addition via

⃗x ⊕ ⃗y ≡ ⃗x + ⃗y mod 2, (B.57)

where the mod 2 operation is performed bitwise. Strangely, this means that
every vector is its own additive inverse

⃗x ⊕ ⃗x = ⃗0. (B.58)

Similarly the only allowed scalars are s = 0, 1 because only for these values
are vectors of the form s⃗x in the space of bit strings. The set of scalars {0, 1}
APPENDIX B. FORMAL DEFINITION OF A VECTOR SPACE 221

together with the operations of ordinary multiplication and addition mod 2,

constitute a field. This binary field is traditionally denoted F2 . Given these
facts, it is straightforward to verify that the criteria required for the set of
bit strings of length n to be a vector space over a field are all satisfied.
We can define an inner product on this vector space
n−1
!
X
⃗x.⃗y = xj yj mod 2 = (⃗x · ⃗y ) mod 2, (B.59)
j=0

where ⃗x · ⃗y is the ordinary Euclidean dot product. (The inner product has
to be a scalar and there are only two allowed scalars which is why the mod
2 arithmetic is required in the inner product.) The ‘length’ of a vector
L2 = L = ⃗x.⃗x is thus the parity of the number non-zero bits in the string.
L = 0 if the vector contains an even number of 1’s and L = 1 if the vector
contains an odd number of 1’s.
This notion of ‘length’ does not give a very complete notion of the dis-
tance between two vectors. To remedy this one can define the notion of the
Hamming distance between two vectors
n−1
X
dH (⃗x, ⃗y ) = (xj ⊕ yj ). (B.60)
j=0

Because xj ⊕ yj is zero if the two bits agree and one if they differ, the
Hamming distance is the total number of instances where the bit strings
differ. Equivalently it is the total number of bits in ⃗x that would need to be
flipped to convert ⃗x into ⃗y .
The notion of Hamming distance is very important in classical error cor-
rection, because the Hamming distance between a code word (a bit string in
the code space) and a word that has been corrupted by errors (bit flips) is
equal to the number of bitflip errors. These various notions of length and
distance are not to be confused with the ‘length’ n of a bit string in the
ordinary sense of the number of bits in the bit string vectors, which is the
dimension of the vector space.
Appendix C

Handy Mathematical Identities

Useful Information about the Mathematical Representation of

Qubit States and Operations
Standard basis states:

1
|0⟩ = | ↑⟩ =
0

0
|1⟩ = | ↓⟩ = .
1
The representation of the basis states in the n̂ basis in terms of the stan-
dard basis states is

θ iφ θ
| + n̂⟩ = cos |0⟩ + e sin |1⟩
2 2

θ iφ θ
| − n̂⟩ = sin |0⟩ − e cos |1⟩
2 2
Standard basis states:

1
|0⟩ = | ↑⟩ =
0

0
|1⟩ = | ↓⟩ = .
1

222
APPENDIX C. HANDY MATHEMATICAL IDENTITIES 223

Pauli matrices:

x 0 +1
X=σ =
+1 0

0 −i
Y = σy =
+i 0

+1 0
Z = σz =
0 −1

+1 0
I = σ0 = .
0 +1
Trace of Pauli matrices
Tr X = Tr Y = Tr Z = 0
TrI = 2.
Products of Pauli Matrices
X2 = Y 2 = Z2 = I
XY = −Y X = iZ
YZ = −ZY = iX
ZX = −XZ = iY
XY Z = iI.

Commutators of Pauli matrices:

[X, Y ] = XY − Y X = 2iZ
[Y, Z] = Y Z − ZY = 2iX
[Z, X] = ZX − XZ = 2iY.
APPENDIX C. HANDY MATHEMATICAL IDENTITIES 224

Anticommutators of Pauli matrices

{X, Y } = XY + Y X = 0
{Y, Z} = Y Z + ZY = 0
{Z, X} = ZX + XZ = 0.
Eigenstates of σ x :
1
|±⟩ = √ [|0⟩ ± |1⟩] .
2
Eigenstates of σ y :
1
| ± i⟩ = √ [|0⟩ ± i|1⟩] .
2
2
Euler-Pauli Identity: If A is the identity operator I (for any dimension
Hilbert space) then for real θ, eiθA = [cos θ]I + [i sin θ]A.

Rotation of qubit on Bloch sphere by an angle χ around the ω̂ axis (recall

the right-hand rule)
χ
Rω̂ (χ) = e−i 2 ω̂·⃗σ = cos(χ/2) Iˆ − i sin(χ/2) ω̂ · ⃗σ .
Standard right-to-left ordering of multi-qubit states in the computational
basis |qn−1 , qn−2 , . . . , q2 , q1 , q0 ⟩. Generic two-qubit state is:
 
ψ00
 ψ01 
|Ψ⟩ = ψ00 |00⟩ + ψ01 |01⟩ + ψ10 |10⟩ + ψ11 |11⟩ =   ψ10  .


ψ11
In addition to the standard orthonormal computational basis states for
two qubits, {|00⟩, |01⟩, |10⟩, |11⟩}, another commonly used orthonormal basis
for two qubits are the so-called Bell states:
1
|B0 ⟩ = √ [|01⟩ − |10⟩]
2
1
|B1 ⟩ = √ [|01⟩ + |10⟩]
2
1
|B2 ⟩ = √ [|00⟩ − |11⟩]
2
1
|B3 ⟩ = √ [|00⟩ + |11⟩] .
2
APPENDIX C. HANDY MATHEMATICAL IDENTITIES 225

Handy trig identities:

2 ξ 2 ξ
cos(ξ) = cos − sin
2 2

ξ ξ
sin(ξ) = 2 sin cos
2 2

ξ ξ ξ
sin = − sin cos(ξ) + cos sin(ξ)
2 2 2

ξ ξ ξ
cos = sin sin(ξ) + cos cos(ξ)
2 2 2
√
sin(π/4) = cos(π/4) = 1/ 2.
Appendix D

Suggestions for Projects

The TF’s and the Peer Tutors as well as members of the Yale Undergraduate
Quantum Computing Club can help with selecting and carrying out your
project. The goal is to learn something not covered in class and produce a
∼ 10 page pedagogical write up that will allow other students to learn the
topic as well.

D.1 Gates and Instruction Sets

1. Clifford group; Clifford operations produce entanglement but are easy
to simulate classically by updating the stabilizer set.

2. Clifford gates (X,Y,Z,H,CNOT) plus the T gate is universal; What

is the circuit depth required to synthesize an arbitrary single-qubit
rotation to precision ϵ?

3. Toffoli plus Hadamard gate set is universal:

http:arxiv.org/abs/quant-ph/0301040v1

4. In classical reversible computing a three bit gate is required for uni-

versality (e.g. Toffoli or Fredkin). Single bit operations plus reversible
two-qubit gates (e.g. CNOT) are insufficient for universal computation
because it is impossible to synthesize a Toffoli (CCNOT )or Fredkin
(CSWAP) with classical CNOTs. Quantum mechanically it is possible.
See: Chau and Wilczek, Phys. Rev. Lett. 75, 748 (1995); Smolin and
Divincenzo, Phys. Rev. A 53, 2855 (1996).

226
APPENDIX D. SUGGESTIONS FOR PROJECTS 227

5. Quantum Random Access Memory (QRAM)

Used to query a database with a superposition of addresses. Numerous
applications.
For example, here are some references that describe how classical data
may be encoded in the amplitudes of a quantum state.

(a) The basic idea was first presented in https://arxiv.org/pdf/quant-

ph/0208112.pdf. However, in that work, they don’t use QRAM.
Rather, they assume a restriction that the data is efficiently inte-
grable, which obviates the need for QRAM.
(b) In the general case, we can use QRAM to encode data in the am-
plitudes,
Pmprovided that the QRAM also stores partial sums of the
2
form i=n |xi | , where x is the data vector. This is described
in https:arxiv.org/abs/1812.00954 (in the vicinity of Eq. (8)),
in https:arxiv.org/pdf/1607.05256.pdf (proposition 3.3.5), and in
https:arxiv.org/pdf/1603.08675.pdf (Theorem A1).
(c) In situations where the data cannot be efficiently integrated, and
where the partial sums are not available, then it is still possible
to prepare the amplitude encoding probabilistically using QRAM.
This is described in https:arxiv.org/pdf/1804.00281.pdf (in the
vicinity of Eq. (3)). This approach is not always efficient, as the
success probability can be small.

6. Alternatives to the Circuit Model of Quantum Computing

(a) Cluster state quantum computing uses only an initial entangled

state plus measurement and feed-forward
(b) Adiabatic quantum computation (for students with enough quan-
tum mechanics background to understand the adiabatic theorem
for state evolution under a slowly varying Hamiltonian)

D.2 Efficient Representation of Quantum

States
1. Matrix Product States; PEPS; Tensor Product Methods; their use in
VQE and QAOA
APPENDIX D. SUGGESTIONS FOR PROJECTS 228

D.3 Algorithms
We will cover some of the algorithms described in Kaye, Laflamme and Mosca
(An Introduction to Quantum Computing) but not all, so this is a useful
reference. Rieffel and Polak (Quantum Computing: A Gentle Introduction)
is also useful.
The review by Ashley Montaro (doi:10.1038/npjqi.2015.23) on quantum
algorithms is a good starting point. This and other papers can be found in
“Papers for Projects” folder in the files section of Canvas.
The following websites maintain a large list of quantum algorithms and
protocols

https://quantumalgorithmzoo.org/
https://wiki.veriqloud.fr/index.php?title=Protocol Library

Here are a few suggestions:

1. Quantum Walks (quantum versions of statistical random walks)

2. Accelerated Grover Search? Korepin https://arxiv.org/abs/2102.01783

3. Oblivious Amplification

4. Amplitude Estimation and Quantum Counting

5. Phase estimation (may cover some aspects of this in class but you could
go into more depth or could experiment with executing in on the IBM
machines)

6. Quantum Fourier Transform (We will do this in class but you could do
a more detailed comparison with the classical Fast Fourier Transform
Algorithm, experiment with executing the QFT on the IBM machines.)

7. Shor’s factoring algorithm

8. Discrete Logarithm algorithm

9. Hidden Subgroup Problem

10. Graph Isomorphism Problem

11. HHL Algorithm for Linear Algebra

APPENDIX D. SUGGESTIONS FOR PROJECTS 229

12. Quantum Machine Learning

13. Quantum Singular Value transformation (See the colloquium talk by

Isaac Chuang:

https://www.youtube.com/watch?v=NZ5PmIaJ5IE

14. Variational Quantum Estimation (VQE) (will cover in class but you
could apply this to a specific problem using the IBM Q system)

15. Quantum Approximate Optimization Algorithm (QAOA) (closely re-

lated to VQE; will cover in class but you could apply this to a specific
problem using the IBM Q system)

16. Quantum secret sharing

17. Quantum bit commitment

18. Quantum multi-party function evaluation

D.4 Quantum Error Correction and Fault

Tolerance
1. Steane vs. Knill error correction and leakage errors

2. Concatenation of codes and the code capacity threshold

3. Families of surface codes

4. Entanglement purification and distillation

5. Magic state injection

6. Easton-Knill theorem on transversal gates and fault tolerance

7. Quantum Low-Density Parity Check (LDPC) Codes; CSS codes;

APPENDIX D. SUGGESTIONS FOR PROJECTS 230

D.5 Quantum Communication and Security

1. Bell Inequalities (discussed in course notes but not in class); CHSH
test, Mermin 3 qubit test, Experiments on Loophole-free Bell tests;
Contextuality in quantum mechanics; ‘magic states’

2. Quantum encryption using entanglement (BB84 which does not use

entanglement will be covered in class)

(a) ‘Quantum cryptography based on Bell’s theorem,’ Artur K. Ekert,

Phys. Rev. Lett. 67, 661 (1991).
(b) ‘Quantum Cryptography: Public Key
Distribution and Coin Tossing,’ Charles H. Bennett and Gilles
Brassard, arxiv.org/abs/2003.06557.
(c) ‘Entanglement-based quantum communication over 144km,’ R.
Ursin et al. (Zeilinger group), doi:10.1038/nphys629

3. Quantum channel capacity; hashing bound

D.6 Quantum Complexity Classes

1. Sampling Complexity

2. Quantum Supremacy Claim: Cross entropy benchmark and Heavy Out-

put Sampling; may cover this in class but you could do experiments on
the IBM Q; How good is the maximum entropy assumption for random
unitaries?

D.7 Quantum Hardware Platforms

1. Ion traps

2. Rydberg atom arrays

3. Continuous variable quantum information processing (using harmonic

oscillators)

4. Superconducting qubits
APPENDIX D. SUGGESTIONS FOR PROJECTS 231

(a) circuit QED (physics background needed)

(b) Prof. Jens Koch at Northwestern
(https://sites.northwestern.edu/koch/)
has volunteered to advise a project on using his scQubits Python
platform to solve the Schrödinger equation for the energy level
structure of superconducting qubits
Bibliography

[1] Scarani, Valerio and Iblisdir, Sofyan and Gisin, Nicolas and Acı́n, An-
tonio, ‘Quantum cloning,’ Rev. Mod. Phys. 77, 1225 (2005).

[2] Bennett, Charles H. and Wiesner, Stephen J., ‘Communication via one-
and two-particle operators on Einstein-Podolsky-Rosen states,’ Phys.
Rev. Lett. 69, 2881-2884 (1992).

[3] A. Einstein and B. Podolsky and N. Rosen, ‘Can quantum-mechanical

description of physical reality be considered complete?’, Phys. Rev. 47,
777-780 (1935).

[4] J. S. Bell, ‘On the Einstein Podolsky Rosen paradox,’ Physics 1, 195–200
(1964).

[5] J. F. Clauser, M. A. Horne, A. Shimony, and R. A. Holt, ‘Proposed

experiment to test local hidden-variable theories,’ Phys. Rev. Lett. 23,
880–884, (1969).

[6] C. H. Bennett, G. Brassard, C. Crépeau, R. Jozsa, A. Peres, and W.

K. Wootters, ‘Teleporting an unknown quantum state via dual classical
and Einstein-Podolsky-Rosen channels,’ Phys. Rev. Lett. 70, 1895–1899
(1993).

232

Intrusion Detection Honeypots
From Everand
Intrusion Detection Honeypots
Chris Sanders
3/5 (2)
Online Agriculture Products Store - 1
100% (7)
Online Agriculture Products Store - 1
53 pages
Classical and Quantum Computing With C++ and Java Simulations
100% (1)
Classical and Quantum Computing With C++ and Java Simulations
606 pages
303012357X Mathematics of Quantum Computing An Introduction (Scherer 2019-11-13) (29B45CBD) PDF
100% (6)
303012357X Mathematics of Quantum Computing An Introduction (Scherer 2019-11-13) (29B45CBD) PDF
773 pages
Introduction To Quantum Algorithms
100% (1)
Introduction To Quantum Algorithms
390 pages
2022 RaySharp Product Guide
No ratings yet
2022 RaySharp Product Guide
87 pages
(Graduate Studies in Mathematics 047) A. Yu. Kitaev, A. H. Shen, M. N. Vyalyi - Classical and Quantum Computation-Amer Mathematical Society (2002)
No ratings yet
(Graduate Studies in Mathematics 047) A. Yu. Kitaev, A. H. Shen, M. N. Vyalyi - Classical and Quantum Computation-Amer Mathematical Society (2002)
274 pages
Principles - of .Quantum - Artificial.Intelligence PDF
100% (5)
Principles - of .Quantum - Artificial.Intelligence PDF
277 pages
كوانتم كومبيوتك
No ratings yet
كوانتم كومبيوتك
165 pages
03 Algorithm and Error Correction New Updates
No ratings yet
03 Algorithm and Error Correction New Updates
102 pages
IQC Masterfile
No ratings yet
IQC Masterfile
117 pages
Lectures Notes On Quantum Computing and Quantum Information
No ratings yet
Lectures Notes On Quantum Computing and Quantum Information
187 pages
Lectures Qco
No ratings yet
Lectures Qco
124 pages
Qclec
No ratings yet
Qclec
260 pages
Intro To Quantum Computing - Aaronson
No ratings yet
Intro To Quantum Computing - Aaronson
259 pages
Qbook 1
No ratings yet
Qbook 1
438 pages
Quantum Computation and Quantum Information: Michael A. Nielsen & Isaac L. Chuang
No ratings yet
Quantum Computation and Quantum Information: Michael A. Nielsen & Isaac L. Chuang
8 pages
Quantum Computing - Lecture Notes
100% (2)
Quantum Computing - Lecture Notes
114 pages
Quantum Problems
No ratings yet
Quantum Problems
255 pages
22 Scheme Physics For Cse Module 3 Notes
No ratings yet
22 Scheme Physics For Cse Module 3 Notes
45 pages
Quantum Computers - Theory and Algorithms by Belal Ehsan Baaquie Leong-Chuan Kwek (Springer 2023)
No ratings yet
Quantum Computers - Theory and Algorithms by Belal Ehsan Baaquie Leong-Chuan Kwek (Springer 2023)
297 pages
Quantum Computing: Lecture Notes: Ronald de Wolf
No ratings yet
Quantum Computing: Lecture Notes: Ronald de Wolf
163 pages
The Temple of Quantum Computing
No ratings yet
The Temple of Quantum Computing
251 pages
A Short Introduction To Quantum Computing For Physicits
No ratings yet
A Short Introduction To Quantum Computing For Physicits
66 pages
Quantum Programming in QCL PDF
No ratings yet
Quantum Programming in QCL PDF
109 pages
Quantum Computing Lecture Notes Another Set
No ratings yet
Quantum Computing Lecture Notes Another Set
105 pages
O Zapata S Notes On Quantum Computing 1 2-3-188 Pages 1741209003
No ratings yet
O Zapata S Notes On Quantum Computing 1 2-3-188 Pages 1741209003
188 pages
Physics 160 Notes
No ratings yet
Physics 160 Notes
73 pages
Introduction To Classical and Quantum Computing
No ratings yet
Introduction To Classical and Quantum Computing
392 pages
The Temple of Quantum Computing
No ratings yet
The Temple of Quantum Computing
250 pages
QC Notes
No ratings yet
QC Notes
141 pages
Quantum Cryptography and Quantum Computation: Network Security Course Project Report
No ratings yet
Quantum Cryptography and Quantum Computation: Network Security Course Project Report
16 pages
Qic 710 Primer 2024
No ratings yet
Qic 710 Primer 2024
70 pages
Unconventional Computation - MacLennan - 2018
No ratings yet
Unconventional Computation - MacLennan - 2018
304 pages
Barbados 2016
No ratings yet
Barbados 2016
111 pages
Ph410 Physics of Quantum Computation: Princeton University
No ratings yet
Ph410 Physics of Quantum Computation: Princeton University
252 pages
Quantum Computing A Short Course From Theory To Experiment 1st Edition Joachim Stolze
No ratings yet
Quantum Computing A Short Course From Theory To Experiment 1st Edition Joachim Stolze
51 pages
PIII - Quantum Computation - Chua, Jozsa (2016) 50pg
No ratings yet
PIII - Quantum Computation - Chua, Jozsa (2016) 50pg
50 pages
Luecke W. - Quantum Information Processing (2005)
No ratings yet
Luecke W. - Quantum Information Processing (2005)
201 pages
An Introduction To Quantum Computing Without The Physics Giacomo Nannicini 46 2020
No ratings yet
An Introduction To Quantum Computing Without The Physics Giacomo Nannicini 46 2020
46 pages
Aaronson - 2016 - The Complexity of Quantum States and Transformations From Quantum Money To Black Holes
No ratings yet
Aaronson - 2016 - The Complexity of Quantum States and Transformations From Quantum Money To Black Holes
111 pages
Renes Lecture Notes14 PDF
No ratings yet
Renes Lecture Notes14 PDF
187 pages
Quantum Information Theory (Lecture Notes)
No ratings yet
Quantum Information Theory (Lecture Notes)
101 pages
Basics of Quantum Computing
No ratings yet
Basics of Quantum Computing
23 pages
Script QI
No ratings yet
Script QI
199 pages
Yao Andrew1
No ratings yet
Yao Andrew1
31 pages
Introduction To Classical and Quantum Computing 1e4p
No ratings yet
Introduction To Classical and Quantum Computing 1e4p
400 pages
Classical and Quantum Computation
No ratings yet
Classical and Quantum Computation
24 pages
Lectures On Quantum Computation, Quantum Error Correcting Codes and Information Theory
No ratings yet
Lectures On Quantum Computation, Quantum Error Correcting Codes and Information Theory
132 pages
Amity School of Engineering: Report Heading: Quantum Computing
No ratings yet
Amity School of Engineering: Report Heading: Quantum Computing
38 pages
Course On Quantum Computing
No ratings yet
Course On Quantum Computing
235 pages
Qcnotes PDF
No ratings yet
Qcnotes PDF
218 pages
Gray Hat Hacking the Ethical Hacker's
From Everand
Gray Hat Hacking the Ethical Hacker's
Çağatay Şanlı
5/5 (1)
Advanced college algebra study guide
From Everand
Advanced college algebra study guide
Harrison Cook
No ratings yet
ADVANCED COLLEGE ALGEBRA STUDY GUIDE
From Everand
ADVANCED COLLEGE ALGEBRA STUDY GUIDE
Harrison K Cook
No ratings yet
Securing ChatGPT: Best Practices for Protecting Sensitive Data in AI Language Models
From Everand
Securing ChatGPT: Best Practices for Protecting Sensitive Data in AI Language Models
Matthew C. Smith
No ratings yet
Advanced Multiplayer Game Development with Ureal Engine 5: A Comprehensive Guide to C++ Scripting
From Everand
Advanced Multiplayer Game Development with Ureal Engine 5: A Comprehensive Guide to C++ Scripting
Vladimir Kiselev
No ratings yet
Content Creation Revolution with chatGPT
From Everand
Content Creation Revolution with chatGPT
Maria Cowen
No ratings yet
ChatGPT for Business: Strategies for Success
From Everand
ChatGPT for Business: Strategies for Success
Matthew C. Smith
1/5 (1)
Kellory the Warlock
From Everand
Kellory the Warlock
Lin Carter
No ratings yet
Human Nature Potential in Nurture
From Everand
Human Nature Potential in Nurture
David L. Hawk
No ratings yet
Plain JavaScript: Learning the Front-End
From Everand
Plain JavaScript: Learning the Front-End
Roger Beans-Rivet
No ratings yet
12-PWE3 Introduction-20090724-A
No ratings yet
12-PWE3 Introduction-20090724-A
38 pages
Northern Railway - Tender Document
No ratings yet
Northern Railway - Tender Document
52 pages
102 Calculus (Lecture Note) Limit and Continuity
100% (1)
102 Calculus (Lecture Note) Limit and Continuity
6 pages
BS EN 80000-13-2008 - 1 Quantities and Units - Djvu
No ratings yet
BS EN 80000-13-2008 - 1 Quantities and Units - Djvu
36 pages
A Smart Walking Stick For Visually Impaired Using Raspberry Pi
No ratings yet
A Smart Walking Stick For Visually Impaired Using Raspberry Pi
6 pages
Applicant Information: Change The Way You Watch TV
No ratings yet
Applicant Information: Change The Way You Watch TV
3 pages
Daniel - Matta Analytics
No ratings yet
Daniel - Matta Analytics
1 page
Cambridge IGCSE™ ICT Coursebook (Victoria Wright, Denise Taylor, David Waller) (Z-Library)
100% (1)
Cambridge IGCSE™ ICT Coursebook (Victoria Wright, Denise Taylor, David Waller) (Z-Library)
560 pages
HOMO2
No ratings yet
HOMO2
15 pages
Factoring Polynomials: Be Sure Your Answers Will Not Factor Further!
No ratings yet
Factoring Polynomials: Be Sure Your Answers Will Not Factor Further!
5 pages
Data Loss Prevention PDF
No ratings yet
Data Loss Prevention PDF
44 pages
Lecture 11
No ratings yet
Lecture 11
85 pages
An Advanced Liquid Cooling Design For Data Center Final v3 1 PDF
No ratings yet
An Advanced Liquid Cooling Design For Data Center Final v3 1 PDF
23 pages
User Manual: PFO.9 1K/1.5K/2K/3K Online UPS
No ratings yet
User Manual: PFO.9 1K/1.5K/2K/3K Online UPS
19 pages
Abstract Algebra
No ratings yet
Abstract Algebra
4 pages
Ae 212 Midterm Departmental Exam - Docx-1
No ratings yet
Ae 212 Midterm Departmental Exam - Docx-1
12 pages
Internet and Web Browsers
No ratings yet
Internet and Web Browsers
28 pages
Entity Authentication
No ratings yet
Entity Authentication
30 pages
C++ Operator Overloading 2
No ratings yet
C++ Operator Overloading 2
38 pages
Log
No ratings yet
Log
52 pages
Pitfalls and Challenges Faced During A Microservices Architecture Implementation Codex5066
No ratings yet
Pitfalls and Challenges Faced During A Microservices Architecture Implementation Codex5066
21 pages
Top 50 db2 Interview Questions
No ratings yet
Top 50 db2 Interview Questions
8 pages
2022-Cloud Computing Security and Law-1
No ratings yet
2022-Cloud Computing Security and Law-1
5 pages
Machine Learning Basics: Lecture Slides For Chapter 5 of Deep Learning Ian Goodfellow
No ratings yet
Machine Learning Basics: Lecture Slides For Chapter 5 of Deep Learning Ian Goodfellow
85 pages
UKG Dimensions Users GuideFinal
No ratings yet
UKG Dimensions Users GuideFinal
26 pages
Normal
No ratings yet
Normal
41 pages
Kill Stale Version Store Connection
No ratings yet
Kill Stale Version Store Connection
3 pages
Excel 365 Charts
No ratings yet
Excel 365 Charts
63 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.