Girvin Introduction To Quantum 2024-04-21 v45
Steven M. Girvin
© 2021-2024
Disclosure:
The author is a shareholder in IBM Corporation,
and consultant for, and equity holder in,
Quantum Circuits, Inc., a quantum computing company.
0.1 Preface
These lecture notes are intended for undergraduates from a variety of dis-
ciplines (e.g. physics, mathematics, chemistry, computer science, electrical
engineering) interested in an initial introduction to the new field of quantum
information science.
The reader may wish to consult texts such as ‘An Introduction to Quan-
tum Computing,’ by Phillip Kaye, Raymond Laflamme and Michele Mosca,
hereafter referred to as [KLM], or ‘Quantum Computing: A Gentle Introduc-
tion,’ by Eleanor Rieffel and Wolfgang Polak, hereafter referred to as [RP].
Computer scientists may be interested in consulting ‘Quantum Computer Systems,’ by Yongshan Ding and Frederic Chong. Having mastered the lecture notes for this class, you will be ready to open the bible in the field,
‘Quantum Computation and Quantum Information,’ by Michael Nielsen and
Isaac Chuang, universally referred to as ‘Mike and Ike’ (perhaps after the
candy bar of the same name).
Contents
0.1 Preface
1 Introduction
1.1 What is Information?
1.2 Error Correction and Data Compression
1.3 What is a computation?
1.4 Universal Gate Sets
7 Yet To Do
Introduction
The first quantum revolution began with the development of the quan-
tum theory early in the 20th century. Scientists carrying out fundamental, curiosity-driven research about the nature of reality discovered that reality is not what it seems to humans. The laws of classical mechanics developed
by Galileo, Newton and others beautifully and accurately describe the mo-
tion of macroscopic objects like baseballs, soccer balls and planets, but they
fail for microscopic objects like atoms and electrons. This research was not
motivated by practical applications and indeed seemed likely to have none.
And yet it did. The first quantum revolution led to the invention of the
transistor, the laser and the atomic clock. These three inventions in turn
produced the fantastic technological revolution of the second half of the 20th
century that brought us (among many other things) the computer, fiber-optic
communication and the global positioning system.
There is now a second quantum revolution underway in the 21st century.
It is not based on a new theory of quantum mechanics. Indeed the quantum
theory is essentially unchanged from the 1920’s and is the most successful
theory in all of physical science–it makes spectacularly accurate quantitative
predictions. Instead this second revolution is based on a new understanding
of the often counter-intuitive aspects of the quantum theory. We now un-
derstand that the quantum devices invented in the 20th century do not use
the full power that is available to quantum machines. This new understand-
ing has rocked the foundations of theoretical computer science laid down
nearly a century ago by Church and Turing in their exploration of the con-
cept of computability. This new understanding is leading to the invention of
radically new techniques for processing and communicating information and
for making measurements of the tiniest signals. These new techniques are
being rapidly developed and if they can be transformed into practical tech-
nologies (and this is a big ‘if’), the second quantum revolution could have an
impact on human society at least as profound as the technological revolution
of the 20th century.
Achieving the full potential of the second quantum revolution will require
interdisciplinary research involving physics, computer science, electrical en-
gineering, mathematics and many other disciplines. My goal is to present
quantum information, computation and communication in a way that is ac-
cessible to undergraduates from any quantitative discipline. This will require
removing a lot of the physics language, historical background and motivation for the quantum theory and simply presenting a set of rules by which one can
describe the states and operations of quantum machines.
Before entering the quantum realm, let us begin by reviewing some ba-
sic concepts about information and how it is quantified and processed. By
analogy with the fact that the pre-quantum theory of mechanics is known
today as ‘classical’ mechanics, we will refer to the pre-quantum theory of
information as classical information.
Both classical and quantum information involve the concept of randomness. Before going further, the reader is therefore urged to review the basic concepts of probability and statistics in Appendix A.
The word ‘bit’ is short for ‘binary digit,’ a number whose value can be represented by 0 or 1 in the binary numbering system (base 2 numbers).
The word bit is also used to mean the amount of information contained in
the answer to a yes/no or true/false question (assuming you have no prior
knowledge of the answer and that the two answers are equally likely as far
as you know). If Alice gives Bob either a 0 or a 1 drawn randomly with
equal probability, then Bob receives one bit of information. Bits and base
2 numbers are very natural to use because information is physical. It is
transmitted and stored using the states of physical objects, as illustrated
in Fig. 1.1. For example an electrical switch can be open or closed and its state naturally encodes one bit of information. Similarly a transistor in a computer chip can be in one of two electrical states, on or off. Discrete voltage levels in a computer circuit, say 0 volts and 5 volts, are also used to encode
the value of a bit. This discretization is convenient because it helps make
the system robust against noise. Any value of voltage near 0 is interpreted
as representing the symbol 0 and any value near 5 volts is interpreted as
representing the symbol 1. Information can also be stored in small domains
of magnetism in disk drives. Each magnetic domain acts like a small bar
magnet whose magnetic moment (a vector quantity pointing in the direction
running from the south pole to the north pole of the bar magnet) can only
point up or down. Ordinarily a bar magnet can point in any direction in
space, but the material properties of the disk drive are intentionally chosen
to be anisotropic so that only two directions are possible. Information is also
communicated via the states of physical objects. A light bulb can be on or
off and the particles of light (photons) that it emits can be present or absent,
allowing a distant observer to receive information.
Box 1.1. Two meanings of ‘bit’ The word bit can refer to the mathe-
matical measure of information content of a message consisting of a string
of symbols with a specified probability distribution. It can also refer to the
physical object storing the information, as in ‘My computer has 100 bytes of
memory’, with one byte being 8 bits. A useful table of prefixes is
kilo 10^3    peta 10^15
mega 10^6    exa 10^18
giga 10^9    zetta 10^21
tera 10^12   yotta 10^24
Figure 1.1: The fact that ‘information is physical’ is illustrated by this electrical circuit
encoding one bit of information. Upper panel: Bit value 0 (1) is represented by the switch
being open (closed) and the light bulb is off (on). Lower panel: There is only one other
possible encoding of the classical information, namely the one in which states 0 and 1 are interchanged. A simple NOT operation transforms one encoding into the other. We will
see that the quantum situation is much richer than this.
corresponding to the 8 binary numbers whose (base 10) values run from 0
to 7. This is a remarkably efficient encoding–with an ‘alphabet’ of only two
symbols (0 and 1) Alice can send Bob one of 2^32 = 4,294,967,296 possible messages at a cost of only having to transmit 32 bits (or 4 ‘bytes’). In fact, if all the messages are equally likely to be chosen, this is provably the most efficient possible encoding.²
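To make the counting concrete, here is a short Python sketch (the function name is ours, not from the text) that computes the number of bits needed to label M equally likely messages:

```python
import math

def bits_needed(num_messages: int) -> int:
    """Minimum number of two-symbol digits that can label num_messages
    distinct, equally likely messages: ceil(log2(M))."""
    return math.ceil(math.log2(num_messages))

print(2 ** 32)               # 4294967296 possible 32-bit messages
print(bits_needed(2 ** 32))  # 32 bits suffice
print(bits_needed(8))        # 3 bits label the 8 messages of Table 1.1
```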
Let us now turn this picture around by considering the same situation in
which Alice sends one message from a collection of M = 2^N possible distinct messages to Bob. If Alice chooses among all messages with equal probability, this maximizes Bob’s ‘surprise’ because he has no information in advance about which message will be sent. Thus the information he receives is maximized.
We have seen that Alice must pay the price of sending N = log2 M physical
²More precisely, there is no encoding more efficient than this. There are many encodings that are as efficient that differ simply by permutation of the symbols. For example, we could order the binary digits with the least significant to the left instead of to the right.
bits to Bob. This strongly suggests that we should quantify the amount of
information that Bob receives as
H = log2 M = N (1.1)
bits of information (one for each physical bit received). If the messages are
not all equally likely to be sent, the surprise is less and we will see shortly
how to modify this formula to account for that.
Box 1.2. Reminder About Logarithms In Eq. (1.1), log2 means logarithm base 2. Let us remind ourselves about some basic properties of logarithms in different bases:

x = e^{ln x}    (1.2)
x = 10^{log10 x} = e^{(ln 10)(log10 x)}    (1.3)
x = 2^{log2 x} = e^{(ln 2)(log2 x)}    (1.4)
ln x = (ln 2)(log2 x) = (ln 10)(log10 x)    (1.5)
log2 x = (1/ln 2) ln x ≈ 1.442695 ln x    (1.6)
log2 x = (ln 10/ln 2) log10 x ≈ 3.32193 log10 x,    (1.7)

where ln is the natural logarithm (log base e). Thus the choice of different bases simply multiplies the information measure by a constant factor. The standard choice of log base 2 gives us (by definition) the information content in bits.
Does it make sense that the information is logarithmic in the total number of possible messages? Indeed it does, because if Alice randomly chooses m messages to send, the total number of possible compound messages is M′ = M^m and we have

H′ = log2 M′ = log2 (M^m) = m log2 M

(when using the optimal encoding), so sending m messages should require m log2 M physical bits. As noted above, if we had chosen a different base for the logarithm, our measure of information would just change by a fixed scale factor. Only when we use base 2 is the information content measured in bits.
So far we have only considered the case where Alice chooses randomly
with equal probability amongst M = 2^N messages. It is the randomness of her choice that ensures that Bob is always surprised by the message he receives. He has no way of guessing in advance what the message will be.
You might wonder why Alice wants to send Bob random messages. If she
were sending Bob text written in English or Swedish say, the characters she
sends would not be random. For example, in English text the letters e, t, and s occur more frequently than other letters. Perhaps Alice is a spy and she
is sending secretly encoded messages to Bob. If Alice made the mistake of
using a simple substitution cipher in which (say) e is always represented as g,
t as w and s as p, etc., then an eavesdropper would notice that the symbols
Table 1.1: Alice wants to transmit one of 8 messages to Bob. Since she and Bob each
have a book containing the messages, she need only transmit the number labeling the
message she wants to convey to Bob. The binary encoding requires only three bits. This
is the shortest possible encoding (using two symbols) and thus a natural measure of the
information content is 3 bits. There are many less efficient encodings such as the unary
encoding shown in the third column of the table. Notice that this inefficient encoding is not
very random since 0 occurs exponentially more frequently than 1 (relative to the binary
encoding). It is this lack of randomness (‘surprise’) that makes the encoding inefficient.
g, w and p appear more frequently than others and this would give a clue
that could be used to help break the code. To prevent this, Alice should
instead use a code with the property that the encoded messages appear to
be completely random. Perhaps Alice is not a spy, but merely wants to
compress the data she sends so that the message is as short as possible. As
we will see below, the non-randomness of natural language texts means that
they can be compressed into shorter messages, but since the total amount
of information must be preserved, the compressed messages are necessarily
more random.
We already saw an illustration of this concept in the example given in
Table 1.1. Provided that Alice selects her messages to send randomly with
equal probability, then the bits in the binary code (middle column) are ran-
dom and unpredictable, whereas the bits in the inefficient code are much less
random (being mostly zeros). Suppose however that the probability distribu-
tion from which Alice selects her messages is not uniform. Say for example,
message 0 is selected 93% of the time and the seven remaining messages are
selected with probability 1%. Then the bits of the binary code are less random, being all zeros 93% of the time (since message 0 is encoded as (000)
H = − Σ_{j=0}^{M−1} p_j log2 (p_j)    (1.9)

where p_j is the probability of receiving the jth message from the list of M distinct possible messages. We need the minus sign in the definition because probabilities always lie in the domain 0 ≤ p ≤ 1 and so their log is always negative.⁴ For the case we have been considering up to now, all messages
³Thermodynamics is the study of heat, work and phase transitions such as melting of solids into liquids and boiling of liquids into vapors. Chemists use it to predict chemical reactions and physicists and engineers use it to predict the efficiency of heat engines. In these fields, the standard symbol for entropy is S and it is usually defined with natural logarithms rather than base 2 logs.
⁴Note that the logarithm diverges as p_j goes to zero, but only very slowly. Using l’Hôpital’s rule, one sees that lim_{p_j→0} p_j log2 (p_j) = 0. We implicitly use this smooth limit in the case where any of the probabilities actually are zero and the corresponding logarithm is undefined.
Figure 1.2: Left panel: positions of atoms in an ideal crystal. If you are told the positions
of a few atoms in the upper left corner, you can predict the positions of the rest because
of the simple repeating nature of the pattern. Right panel: snapshot of the positions of
the atoms in a liquid at some moment. The positions are not completely random (e.g., no
two atoms occupy the same space) but they are much more random than in the crystal.
Thus the thermodynamic entropy is larger for the liquid, as is the Shannon information
entropy of the message listing the positions of all the atoms.
are equally likely, so p_j = 1/M and Eq. (1.9) gives H = log2 M, in agreement with our previous result in Eq. (1.1). We begin to see here that the information content of a set of messages is a function of the probability distribution over the set of possible messages.
It turns out that Shannon’s Eq. (1.9) is still correct even if different
messages have different probabilities. Indeed we should think of Shannon
entropy as a general property of any probability distribution. The entropy is
a measure of the randomness associated with sampling from the distribution.
To begin to understand why this is so, consider the case M = 2 with the
probability of receiving the symbol 0 being p0 and the probability of receiving
the symbol 1 being p1 = 1 − p0. The Shannon entropy for this special case, H2 = h(p0, p1), is plotted in Fig. 1.3. The graph makes intuitive sense because the information content of the message goes to zero when the probability of either symbol approaches unity and the other approaches zero. In this limit the messages are completely predictable and there is no surprise left. We also see that, consistent with our earlier intuition above, when the messages are equally likely (p0 = p1 = 1/2), the randomness is maximized and the
information content of the message takes on its maximum value of precisely
Figure 1.3: Shannon information entropy for a message containing a single symbol. The
symbol 0 is chosen randomly with probability p0 and the symbol 1 is chosen with proba-
bility p1 = 1 − p0 . Notice that the information content (Shannon entropy) is maximized
at p0 = p1 = 1/2, the point at which the surprise as to which symbol is chosen is maximum.
one bit.
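The binary-entropy curve of Fig. 1.3 is easy to reproduce numerically. A short Python sketch (the function names are ours), using the smooth limit p log2(p) → 0 discussed in footnote 4:

```python
import math

def h2(p0: float) -> float:
    """Shannon entropy in bits of a binary source with P(0)=p0, P(1)=1-p0."""
    def plog2(p: float) -> float:
        # smooth limit: p * log2(p) -> 0 as p -> 0
        return 0.0 if p == 0.0 else p * math.log2(p)
    return -(plog2(p0) + plog2(1.0 - p0))

print(h2(0.5))   # exactly one bit: maximal surprise
print(h2(0.0))   # zero bits: the message is completely predictable
print(h2(0.93))  # well below one bit, as in the 93%/7% example
```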
It may seem strange to talk about fractions of a bit of information. What
does that mean? To prevent confusion we have to remember that the word
bit has two meanings (see Box 1.1). First it can mean a physical carrier of
information that can be in two possible states (e.g. a switch that is on or
off). The number of these physical objects is always an integer. The second
meaning is an abstract measure of your level of surprise in reading a message.
The message is carried by single physical objects storing a 0 or 1. If each
object is almost always in state 0 when you read it, then you are not very
surprised when the next one drawn from the same probability distribution
is also 0. Thus the Shannon information (entropy) carried by one physical
instantiation of a bit can be much less than one abstract mathematical bit
of information.
Suppose now that Alice sends Bob two symbols each drawn from the
same probability distribution. If Eq. (1.9) is correct, then it should predict
that the total information transmitted is twice as large as when only a single
symbol is sent. We can think of this two-symbol message as being drawn
from an ‘alphabet’ of M = 4 possible messages chosen from a probability
distribution R shown in Table 1.2.
Here we have used the fact that if two events are statistically independent, their joint probability is the product of their individual probabilities⁵. If Alice
⁵This factoring of a joint probability distribution into two separate probability distributions is essentially the definition of statistical independence: P(A, B) = P1(A)P2(B). See App. A.
Table 1.2: Enumeration of the four possible messages of length two and their corresponding
probabilities.
is using a biased coin toss to select the symbols for the two messages, the
second coin toss is unaffected by the outcome of the first. Plugging this into
Eq. (1.9) and using the probability distribution R yields

H4 = − Σ_{j=0}^{3} r_j log2 (r_j)    (1.11)
   = − {p0 p0 log2 (p0 p0) + 2 p0 p1 log2 (p0 p1) + p1 p1 log2 (p1 p1)}
   = − {2 p0 p0 log2 (p0) + 2 p0 p1 [log2 (p0) + log2 (p1)] + 2 p1 p1 log2 (p1)}
   = −2 {(p0 + p1) p0 log2 (p0) + (p0 + p1) p1 log2 (p1)}
   = −2 {p0 log2 (p0) + p1 log2 (p1)}
   = 2 h(p0, p1) = 2 H2.    (1.12)
Shannon’s expression in Eq. (1.9) has precisely the desired property that the
information content is linear in the number of messages sent. With a little
more work one can show that this expression is the unique correct extension
of Eq. (1.1) to the case where the messages in the alphabet do not all have
the same probability.
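The additivity result of Eq. (1.12) can also be checked numerically. A minimal sketch (function and variable names are ours):

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p0, p1 = 0.93, 0.07
H2 = entropy([p0, p1])

# Distribution R over the four two-symbol messages (independent draws):
R = [p0 * p0, p0 * p1, p1 * p0, p1 * p1]
H4 = entropy(R)

print(H2, H4)  # H4 equals 2*H2, confirming Eq. (1.12)
```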
Exercise 1.1. Suppose that Alice sends to Bob one message drawn
from a code book of M1 messages and having probability distribution P ,
and one drawn from a different code book of M2 messages and having
probability distribution Q. Show that the average information content of
these two messages is simply the sum of the average information content
of the individual messages.
H = H_P + H_Q.
can work. On the other hand this condition is certainly not sufficient because
the code has to be very cleverly designed so that the receiver can learn which
bits are erroneous without knowing in advance anything about the message
other than what redundant encoding was used. These same concepts will be important when we study quantum communication and quantum error correction and so we will delay our study of them until then.
Without understanding how to construct (classical) error correction
codes, we can however derive a bound on how much longer (and less ran-
dom) the encoded message has to be if the receiver is going to have a good
chance to receive it over a noisy channel and correctly decode it. This so-
called Hashing bound is described in Ex. 1.3.
M′ = M_encoded ⊕ C    (1.19)
Rather than Alice transmitting the entire bit string of length M , she
could save time and instead transmit a compressed encoding of the message
as a smaller bit string of length N. In order for the encoded bit string to contain the same amount of information as the original bit string, it has to be more random than the original bit string. Since the information in the original message is H_M, the minimum possible bit string length that a data compression algorithm can theoretically achieve would be N = H_M or equivalently a maximum compression factor r = M/H_M. Practical compression codes can approach this limit but never exceed it.
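The Shannon limit N = H_M for a biased binary source can be computed directly. A sketch (assuming independent, identically distributed bits; the function name is ours), which can also be used to check answers to exercises like the one below:

```python
import math

def shannon_limit_bits(M: int, p0: float) -> float:
    """Minimum length N = H_M (in bits) to which an M-bit string of
    i.i.d. bits with P(0)=p0 can be compressed."""
    p1 = 1.0 - p0
    h = -(p0 * math.log2(p0) + p1 * math.log2(p1))  # entropy per bit
    return M * h

M = 2 ** 20
N = shannon_limit_bits(M, 0.37)
print(N)      # minimum file size in bits
print(M / N)  # maximum compression factor r = M / H_M
```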
Exercise 1.4. A random source outputs 0 with probability p0 = 0.37 and 1 with probability p1 = 0.63. This source is used to construct a string of M = 2^20 bits (i.e., 1 megabit = (1024)^2 bits) which is stored on a hard drive. What is the minimum file size (in bits) that a data compression program could achieve in this case?
The first term in the RHS above represents the probability that all of the next ℓ bits are zero and the last term represents the probability that the (ℓ + 1)th bit is 1. Using the properties of geometric series, we can readily prove that (taking M to infinity)

Σ_{ℓ=0}^{∞} P(ℓ) = 1.    (1.22)
in the message)
P_fail = Σ_{ℓ=2^N}^{∞} ϵ(1 − ϵ)^ℓ
       = (1 − ϵ)^{2^N} Σ_{ℓ′=0}^{∞} ϵ(1 − ϵ)^{ℓ′}
       = (1 − ϵ)^{2^N}.    (1.23)
This is the failure probability for encoding the number of 0’s after a particular one of the 1’s, in the limit of an extremely long message (since we have taken
the upper limits on the summations to be infinity). Because of the double
exponential, N does not have to be very large in order to make Pfail extremely
small (even if ϵ is small). It is convenient to approximate the expression for
Pfail by taking its (natural) logarithm and stopping its Taylor series expansion
at first order in ϵ
ln P_fail = 2^N ln(1 − ϵ) ≈ −ϵ 2^N.    (1.24)
Thus for small ϵ we have to a good approximation

P_fail ≈ e^{−ϵ 2^N}.    (1.25)
The average number of times we will have to use this encoding in a message string of length M is ϵM ≫ 1. Every one of these has to succeed or the message will be garbled. Hence the overall success probability for encoding the entire message is (for the average case)

P_total success = (1 − P_fail)^{ϵM} ≈ 1 − ϵM P_fail,    (1.26)
where in the last approximate equality we have assumed that ϵM Pfail is small.
The overall failure probability in this limit is then
P_failure overall = 1 − P_total success ≈ ϵM P_fail ≈ ϵM e^{−ϵ 2^N}.    (1.27)
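The exact expression (1.23) and the approximation (1.25) are easy to compare numerically. A sketch (the variable values are illustrative, not from the text):

```python
import math

eps, N = 0.01, 12  # illustrative values

p_fail_exact = (1.0 - eps) ** (2 ** N)   # Eq. (1.23)
p_fail_approx = math.exp(-eps * 2 ** N)  # Eq. (1.25)

print(p_fail_exact, p_fail_approx)
# Both are astronomically small; the approximation is good because
# ln(1 - eps) differs from -eps only at second order in eps.
```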
The length of the original unencoded message is M bits. In the encoded
message we are simply sending (on average) ϵM bit strings of length N .
Hence the encoded message has on average a length of D = ϵN M . Let us
compare this to the Shannon entropy of the original message
H = M H2(1 − ϵ, ϵ) = −M [(1 − ϵ) log2 (1 − ϵ) + ϵ log2 ϵ]
  ≈ ϵ [1/ ln(2) − log2 ϵ] M ≪ M,    (1.28)
where in the last step we have used log2 (x) = ln(x)/ ln(2) ≈ 1.4427 ln(x).
Naturally, for small ϵ the information content of the message is much smaller
than the length of the message. In principle we should be able to compress the
message of length M (nearly) down to a length D approaching H, achieving
a compression factor approaching
C = M/H = −1 / [(1 − ϵ) log2 (1 − ϵ) + ϵ log2 ϵ] ≈ 1 / (ϵ [1/ ln(2) − log2 ϵ]).    (1.29)
Let D = rH, where r measures how close the code comes to the theoret-
ically maximum compression factor. We know that for any reliable code we
need r > 1 because we cannot compress a message to a length that is smaller
than the Shannon information content of the message. We have (for small ϵ)
r ≈ ϵN M / (ϵM [1/ ln(2) − log2 ϵ]) = N / [1/ ln(2) − log2 ϵ].    (1.30)
The smaller r is, the less reliable the encoding. However for r slightly greater than unity, the failure probability can be very small for an efficient code.
As a specific example for the code being considered here, suppose that ϵ = 2^{−7} ≈ 0.78 × 10^{−2}. In this case the theoretical maximum compression factor is (using the exact and approximate formulae in Eq. (1.29))
Let us take the overall raw (unencoded) message length to be M = 2^{14} = 16384. Then ϵM = 2^7 = 128 ≫ 1 as required by our assumptions. If we choose N = 11 then the overall failure probability from Eq. (1.27) is

P_failure overall ≈ 2^7 e^{−2^{−7} 2^{11}} ≈ 1.44 × 10^{−5},    (1.32)

r ≈ 1.3029,    (1.33)
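The numbers in this worked example can be checked in a few lines of Python (a sketch; the variable names are ours):

```python
import math

eps = 2 ** -7  # probability that a given bit is 1
M = 2 ** 14    # raw (unencoded) message length in bits
N = 11         # width of the run-length counter in bits

# Theoretical maximum compression factor, Eq. (1.29):
H = -M * ((1 - eps) * math.log2(1 - eps) + eps * math.log2(eps))
C = M / H
# Ratio of achieved to ideal length, Eq. (1.30), and overall failure
# probability, Eq. (1.27):
r = N / (1 / math.log(2) - math.log2(eps))
p_fail_overall = eps * M * math.exp(-eps * 2 ** N)

print(C, r, p_fail_overall)  # roughly 15.17, 1.3029, 1.44e-5
```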
where the first term accounts for the space that needs to be added be-
tween the variable-length binary strings.
Show that
Q1 = P (0) + P (1)
Q2 = P (2) + P (3)
Q3 = P (4) + P (5) + P (6) + P (7)
Q_N = Σ_{ℓ=2^{N−1}}^{2^N − 1} P(ℓ),    (1.38)
where P (ℓ) is defined in Eq. (1.21), and the last equation above is valid
for any N ≥ 2. Using the properties of geometric series, find an explicit
expression for QN .
Numerical evaluation for the case ϵ = 2^{−7} yields N̄ = 7.67724. The achieved compression factor C′ = 1/(ϵN̄) = 16.6726 exceeds the theoretical limit C = 15.1712, producing an r factor less than unity: r = 0.909945. This violation means there is something wrong with the above argument! What has gone wrong??
How many different such functions are there? That is, given an input of N
bits and output of M bits, how many distinct programs are there? Let us
begin by enumerating the possibilities when N = M = 1. How many distinct
functions are there that map the domain {0, 1} to the range {0, 1}? Table
1.3 enumerates the four possibilities.
For each of the 2 input values there are 2 possible choices of output value,
so we have a total of four possible functions. Let us now consider the slightly
more general case of N input bits, still only M = 1 output bit. The number of distinct possible inputs is S = 2^N. For each of those inputs we have to
choose a value for the output. Let us assign an ordinal number j ∈ [0, S − 1]
to each input that corresponds to the binary number represented by the
input. Thus the 0th input is {0, 0, 0, . . . , 0}, and the (S − 1)th input is
{1, 1, 1, . . . , 1}. Hence the function evaluated by the program is defined by a string of S = 2^N bits: {b0, b1, b2, . . . , b_{S−1}}, where b_j is the bit value output
Table 1.3: Enumeration of the four possible functions of one input bit and one output
bit. The ERASE function sets all the output states to 0. We can also think of this as
the RESET function used to initialize a register to 0. The ERASE-NOT function applies
ERASE and then NOT, thereby setting all the output states to 1. Balanced functions
output 0 and 1 equally often. Constant functions always output the same value. For
the special case of one input bit and one output bit, every function is either constant or balanced, but in the general case of N input bits and one output bit, most functions are neither balanced nor constant. (See Ex. 5.3).
by the program for the jth possible input. Each bit in such a list has two possible values so the total number of possible programs is

Z(N, M = 1) = 2^S = 2^{2^N}.    (1.40)
To understand all this, the reader may find it useful to study the specific case
shown in Table 1.4 which lists the data associated with all possible functions
mapping two bits to one bit.
We see from the above that the number of distinct functions is a double exponential, which is a very rapidly growing function of the input register size, N. For example, for N = 10, Z(10, 1) ≈ 1.8 × 10^{308}. The situation is even more dramatic for output register size M > 1. In this case our program is defined by M different output strings, each of length S = 2^N. Thus the total number of binary digits defining the program is M 2^N and the total number of possible programs is

Z(N, M) = 2^{M 2^N}.    (1.41)
This is consistent with our starting analysis above that Z(1, 1) = 4. Note that

Z(10, 10) ≈ 3.5 × 10^{3082}    (1.42)
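Python’s arbitrary-precision integers make it easy to confirm these double-exponential counts (the function name is ours):

```python
def num_functions(N: int, M: int) -> int:
    """Number of distinct functions from N input bits to M output bits,
    Eq. (1.41): 2**(M * 2**N)."""
    return 2 ** (M * 2 ** N)

print(num_functions(1, 1))             # 4, matching Table 1.3
print(len(str(num_functions(10, 1))))  # 309 decimal digits, i.e. ~1.8e308
# num_functions(10, 10) = 2**10240 is far too large to print usefully,
# but its digit count confirms the ~10**3082 magnitude quoted above.
print(len(str(num_functions(10, 10))))
```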
Table 1.4: List of all the functions mapping 2 bits to 1. The first column lists the four pos-
sible arguments of the function. The remaining columns give the corresponding values of
the 16 different functions fj (⃗x). Notice that the ordinal number j of function fj determines
the four-bit binary string giving the values of fj for each of its four arguments. We see
that only f0 and f15 are constant. There are six balanced functions: f3 , f5 , f6 , f9 , f10 , f12
having equal numbers of 0 and 1 outputs. The remaining eight functions are neither con-
stant nor balanced.
That is to say, IDENTITY and NOT are invertible because they happen (in
this special case) to be their own inverses
(IDENTITY)^2 = IDENTITY    (1.44)
(NOT)^2 = IDENTITY.    (1.45)
Obviously, the ZERO (ERASE or RESET) and ONE (ERASE-NOT) functions are not their own inverses since they are idempotent
(ZERO)^2 = ZERO    (1.46)
(ONE)^2 = ONE.    (1.47)
Indeed, these gates do not have inverses at all. Given the output, we cannot
tell what the input was. It turns out that this irreversibility (the fact that the operation cannot be run ‘backwards in time’) has profound implications for the fundamental thermodynamics of computation. An irreversible computer
must dissipate energy, just as friction irreversibly brings a hockey puck sliding
across the ice to a stop while converting the kinetic energy of the motion of
the puck into heat (random thermal motion of the water molecules in the
ice). If we were to make a movie of the hockey puck’s motion and then ran
it backwards, the viewer would immediately be able to tell that the film was
running backwards because they would see the puck spontaneously speeding
up (and the ice cooling down!) something that is not physically possible
(or at least fantastically statistically unlikely). This is what we mean by
irreversible in the physics sense. In the mathematical sense, an irreversible
gate is not invertible–it does not have an inverse because there is not enough
information at the output to deduce the input.
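The distinction can be checked mechanically. A sketch (the gate names follow Table 1.3; the code itself is ours) that tests each one-bit function for invertibility:

```python
gates = {
    "IDENTITY": lambda b: b,
    "NOT":      lambda b: 1 - b,
    "ZERO":     lambda b: 0,  # ERASE / RESET
    "ONE":      lambda b: 1,  # ERASE-NOT
}

for name, f in gates.items():
    # A gate is invertible iff its truth table is a permutation of {0, 1}.
    invertible = {f(0), f(1)} == {0, 1}
    # The two invertible one-bit gates happen to be their own inverses.
    self_inverse = all(f(f(b)) == b for b in (0, 1))
    print(f"{name}: invertible={invertible}, self_inverse={self_inverse}")
```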
The thermodynamics of computation were explored by Rolf Landauer and
Charles Bennett at IBM and Bennett developed a reversible model (gate set)
of classical computation in which the only irreversible step was the erasure
of information needed to initialize the computer at the beginning of the com-
putation. As illustrated in Fig. 1.4, this irreversibility is related to the fact
that RESET of a register with unknown contents (caused say by the person
who ran their program before you in the queue) lowers the entropy (Shannon
or thermodynamic) of the register. The second law of thermodynamics says
that the entropy of the universe tends to increase and lowering the entropy of
a system generally requires you to do work (e.g. with a refrigerator). While
these ideas were profound contributions to theoretical physics, in practice
classical computers are highly energy inefficient and dissipate many orders of
Figure 1.4: Alice selects a random bit value 0 or 1 and inputs it to an ERASE gate. The
output is 0 independent of the input. The input comes from a probability distribution
containing one bit of Shannon entropy, but the output has zero entropy. Because the
universe conserves information, the missing Shannon entropy must have been irreversibly
dumped into the bath as thermodynamic entropy.
We will shortly discuss the reversible SWAP gate and Fig. 1.5 shows that
RESET can be achieved via such a gate, provided that one has a resource
consisting of a supply of bath bits in state 0, and each of those bits is used
only a single time. This picture makes clear that the SWAP gate simply
moves the entropy (information) from the data bit to the bath bit. If Alice
has no further access to the bath after that, then the operation is effectively
irreversible6.
As we shall see later, it turns out that all the operations (other than the
initialization or RESET) in a quantum computer are required by the laws of
6
It begs the question, however, of how one irreversibly sets the original bath bits to zero
in the first place. If one measures the initial value of a bath bit and then applies a NOT
operation conditioned on the measurement result being 1, then the initial randomness is
converted into information in the hands of the experimentalist, so again information is
conserved.
Figure 1.5: Reset can be achieved via the reversible SWAP gate, provided that one has a
resource, a supply of bath bits in state 0, and each is used only once. Then it is clear that
the entropy has gone from Alice’s data bit into the bath.
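The same bookkeeping can be seen in code. A sketch of reset-by-SWAP: the data bit's value is not destroyed but merely relocated into the bath bit, so the operation remains reversible as long as one retains access to the bath.

```python
def swap(a, b):
    """Reversible SWAP gate on two classical bits."""
    return b, a

for data in (0, 1):
    bath = 0                                   # fresh bath bit, used only once
    data2, bath2 = swap(data, bath)
    assert data2 == 0                          # Alice's bit is reset to 0 ...
    assert bath2 == data                       # ... but its value lives on in the bath
    assert swap(data2, bath2) == (data, bath)  # still reversible: swapping back undoes it
```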
Figure 1.6: Left panel: Standard circuit notation for the AND gate. This is clearly
irreversible because there are fewer output lines than input lines and information has been
lost. Middle panel: A reversible version of the AND gate that preserves the information
by copying the inputs to the outputs and flipping the ancilla bit z if and only if x = y = 1.
Right Panel: This is also known as a Toffoli or controlled-controlled-NOT (CCNOT) gate,
displayed here in the notation we will be using later for quantum gates. This gate applies
the NOT operation to the target bit (bottom wire with the open circle symbol) if and only
if both control bits (top two wires with the solid circle symbol) are in the 1 state.
from left to right: the input bits are each represented by a horizontal line
(‘wire’) entering the blackbox from the left, and the output bits are repre-
sented by wires exiting on the right. The wedge notation x ∧ y represents
Boolean logical AND or equivalently the product of the bit values xy. The
⊕ notation refers to addition of the bit values modulo 2:
0⊕0 = 0
0⊕1 = 1
1⊕0 = 1
1⊕1 = 0. (1.48)
Note that this is the truth table for the ‘exclusive OR’ (XOR) gate, which
outputs 1 iff exactly one of the two inputs is 1.
Clearly the traditional AND gate shown in the left panel of Fig. 1.6 is
irreversible because there is not enough data in the bit value on the single
output line to determine the bit values on the two input lines. (See Table 1.5.)
The circuit in the middle panel shows a reversible version of the AND gate.
It has the same number of output and input lines. This gate is its own
inverse since applying it twice yields IDENTITY. This follows directly from
the algebraic identity (z ⊕ (x ∧ y)) ⊕ (x ∧ y) = z, since a ⊕ a = 0 for any bit a.
x  y | AND(x, y)  NAND(x, y)
0  0 |     0          1
0  1 |     0          1
1  0 |     0          1
1  1 |     1          0

Table 1.5: Truth tables for the AND and NAND (NOT[AND]) gates. The NAND gate is
illustrated in Fig. 1.8.
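The self-inverse property of the reversible AND (Toffoli) gate is easy to verify exhaustively; a Python sketch:

```python
from itertools import product

def toffoli(x, y, z):
    """CCNOT: flip the target z iff both controls x and y are 1."""
    return x, y, z ^ (x & y)

for x, y, z in product((0, 1), repeat=3):
    # Applying the gate twice returns every input unchanged (self-inverse),
    # because (z XOR xy) XOR xy = z.
    assert toffoli(*toffoli(x, y, z)) == (x, y, z)
    # With the ancilla initialized to z = 0, the target output is AND(x, y).
    assert toffoli(x, y, 0)[2] == (x & y)
```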
As Fig. 1.5 clearly shows, the entropy in the data has been transferred
into the bath and thus information is conserved.
See [KLM] Sec. 1.5 for additional discussion of reversible logic.
1.4 Universal Gate Sets

We define a universal gate set
to be any set of gates that can be used to build circuits that execute all Z
possible programs (for any size N, M ). A gate set consisting solely of the
Toffoli gate (CC-NOT) is universal for classical computation, provided that
one has access to ancilla bits and the ability to initialize these auxiliary bits in
0 or 1 as needed. Since the Toffoli gate is reversible, one can build circuits
that execute all Z possible programs and do so reversibly. The reversible
circuit will have the same number of input and output lines and their total
number may be larger than M + N due to the presence of the ancilla bits.
We have seen that the Toffoli gate is not just universal for reversible
classical computations, but can also be used to create irreversible gates if
desired (by throwing away the data on the control lines). Another simple
gate set that is universal for irreversible computation is {NAND, FANOUT}
illustrated in Fig. 1.8. NAND(x, y) = NOT(AND(x, y)) and its truth table
is shown in Table 1.5. FANOUT splits a single ‘wire’ into two (or more)
wires, or equivalently copies the value of a particular bit into another bit.
When we come to quantum gates, we will see that FANOUT is not allowed
because of the quantum ‘no-cloning theorem.’ The quantum Toffoli gate is
allowed however because it is reversible and does not violate the no-cloning
Figure 1.7: (a) Standard circuit notation for the SWAP gate which interchanges the bits
on the two wires. (b) The controlled-NOT (CNOT) gate applies the NOT operation to
the target bit (denoted by the open circle symbol) if and only if the control bit (denoted
by the solid circle symbol) is in the 1 state. The control bit x is unchanged. The target bit
y is mapped to XOR(x, y) = y ⊕ x. In contrast to the Toffoli gate, the CNOT is a two-bit
gate, not a 3-bit gate. (c) C̄NOT gate which applies the NOT gate to the target iff the
control bit is in the 0 state. x̄ denotes NOT(x). (d) The SWAP gate can be synthesized
from three CNOT gates.
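The decomposition in panel (d), SWAP from three alternating CNOTs, can be checked by brute force over the four input states (Exercise 1.7 asks you to work this out state by state by hand); a sketch:

```python
def cnot(control, target):
    """CNOT: flip the target iff the control is 1; the control passes through."""
    return control, target ^ control

for x, y in ((0, 0), (0, 1), (1, 0), (1, 1)):
    a, b = cnot(x, y)        # CNOT with the top wire as control
    b, a = cnot(b, a)        # CNOT with the bottom wire as control
    a, b = cnot(a, b)        # CNOT with the top wire as control again
    assert (a, b) == (y, x)  # net effect: SWAP
```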
rule. Curiously, one- and two-bit gates are inadequate to achieve reversible
universal classical computation (hence the need for the three-bit Toffoli gate),
however a gate set with only one- and two-qubit gates can be found that is
universal for quantum computation. Peculiar quantum interference effects
permit the Toffoli gate to be synthesized from two-qubit gates, something
that is not possible in (reversible) classical computation. More on this later!
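To see why {NAND, FANOUT} is universal, note that NOT, AND, and OR can all be built from NAND alone (FANOUT supplies the duplicated inputs). A Python sketch of this standard construction:

```python
def nand(x, y):
    return 1 - (x & y)

def fanout(x):
    """Copy a classical bit onto two wires (forbidden for qubits by no-cloning)."""
    return x, x

def not_(x):
    # Feed both NAND inputs the same bit (this uses FANOUT of x).
    return nand(x, x)

def and_(x, y):
    return not_(nand(x, y))

def or_(x, y):
    # De Morgan: x OR y = NAND(NOT x, NOT y)
    return nand(not_(x), not_(y))

for x in (0, 1):
    assert not_(x) == 1 - x
    for y in (0, 1):
        assert and_(x, y) == (x & y)
        assert or_(x, y) == (x | y)
```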
enumerated in Eq. (1.41). Said another way, what is the smallest possible
circuit depth needed to execute the function correctly for all possible inputs?
This is one measure of the computational complexity of executing the de-
sired functiona . Clearly if the function to be evaluated has a lot ‘structure’
(as opposed to being ‘random’) it should be possible to have a very small
circuit. For example if the function has N inputs and N outputs and is
simply IDENTITY, then we just need an IDENTITY gate on each of the N
lines. These can be executed in parallel so the circuit ‘depth’ is one. On the
other hand, if the output value for each possible input is determined by N
coin flips (whose results are permanently recorded, not rerun each time the
computation is run) then it seems intuitive that the circuit would have to
be very complex and deep to deal with each and every possible input in the
correct way.
In theoretical computer science, statements about the complexity of solv-
ing some class of problems (e.g. the ‘traveling salesman’ route optimization
problem) are typically statements about the asymptotic behavior of the cir-
cuit depth as the input size N goes to infinity. There are many classes of
problems that contain provably hard (i.e. requiring circuit depth that is su-
perpolynomial in N ) instances and yet are ‘easy’ for typical cases. It is often
more difficult to prove sharp statements about average-case difficulty than
worst-case difficulty.
For a related discussion of complexity of strings of bits in the classical case
see:
https://www.wikiwand.com/en/Kolmogorov_complexity
We will see that the complexity of quantum circuits can sometimes be dra-
matically less than that of the best known classical circuits for certain prob-
lems. However there is still much computer science research that needs to
be done to fully understand the power of quantum computers and to classify
the hardness of different problems on such computers.
a
Note that in the circuit model, circuit depth and run time are essentially the same
thing. The number of lines in a program written in a high-level language need not have
anything to do with the run time because of ‘DO LOOP’ commands. Indeed, the famous
‘halting problem’ theorem tells us that it is not even possible to write a program that can
determine if another program will ever halt.
Figure 1.8: Left panel: Standard circuit notation for the NAND (NOT[AND]) gate. Like
the AND gate, this is clearly irreversible. Right panel: FANOUT copies the input to two
(or more) output lines. These two gates together form a simple but universal gate set
for irreversible classical computation. Neither of these gates is allowed in a quantum com-
puter. NAND violates the reversibility requirement and FANOUT violates the quantum
no-cloning theorem.
See [KLM] Sec. 1.3 for additional discussion of universal gate sets.
Exercise 1.7. Prove that the three CNOT gates shown in Fig. 1.7d are
equivalent to the SWAP gate. Do this by considering all four possible
input states and explicitly showing in your solution what the states are
after each of the three CNOT gates.
Exercise 1.8. Use a NAND gate, plus any needed auxiliary bits ini-
tialized to 0 or 1, to construct a NOT gate.
Exercise 1.9. Construct a circuit that computes the XOR function:
a) using one CNOT gate and 2 wires with the inputs being x and y,
and the outputs being x and q = XOR(x, y) = x ⊕ y, where ⊕ is
addition mod 2.
b) with the same inputs and outputs as above, but using one Tof-
foli gate, plus any needed auxiliary bits (internal to the circuit)
initialized to 0 or 1. Hint: You will need a total of 3 wires.
c) using two CNOT gates and three wires with the inputs being x, y, z
and the outputs being x, y and q = z ⊕ x ⊕ y = z ⊕ XOR(x, y).
Another gate we will find useful is the controlled SWAP gate (cSWAP)
also known as the Fredkin gate, illustrated in Fig. 1.9. cSWAP applies the
identity if the control bit is 0 and swaps the two target bits if the control bit
is 1. The c̄SWAP does the reverse, performing the SWAP if the control bit
is 0. As discussed in Exercise 1.10, the cSWAP can be constructed from
Toffoli gates. As illustrated in Fig. 1.10 and discussed in Exercise 1.11, the
cSWAP can be used to construct a router that sends an input bit b down a
binary tree to a destination specified by the bits in an address register.
Exercise 1.10. Consider a computer with input register (x, y, z) of
size N = 3 and output register of size M = 3. Construct the con-
trolled SWAP (cSWAP or Fredkin) gate shown in Fig. 1.9 whose output
is (x, y, z) if z = 0 and whose output is (y, x, z) if z = 1. You may
utilize one Toffoli gate and two CNOT gates. Hint: With 3 CNOT gates
you can create a SWAP. What can you do with 2 CNOT gates and one
Toffoli gate?
Figure 1.9: Standard circuit notation for the controlled SWAP (cSWAP) or Fredkin gate
which swaps the two target bits x, y iff the control bit z is 1. The c̄SWAP gate does the
reverse: it swaps the two target bits iff the control bit is 0.
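As a truth function, the cSWAP gate of Fig. 1.9 is simple to tabulate, and like the Toffoli gate it is its own inverse; a Python sketch:

```python
from itertools import product

def cswap(x, y, z):
    """Fredkin gate: swap the targets x, y iff the control z is 1."""
    return (y, x, z) if z == 1 else (x, y, z)

for x, y, z in product((0, 1), repeat=3):
    # Swapping twice (or not swapping twice) restores every input: self-inverse.
    assert cswap(*cswap(x, y, z)) == (x, y, z)
```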
Figure 1.10: A binary tree router that sends input bit b to one of 4 possible elements in
the output register based on the values of two address bits (a1 , a0 ). All other bits in the
output register should be 0.
Chapter 2

Quantum Bits: Qubits
Figure 2.1: Representation of the two possible states of a classical switch (or other physical
instantiation of a classical bit) with an up or down arrow. Classically there are only two
possible encodings: up arrow is state 0 (left panel) or down arrow is state 0 (right panel).
They differ by a NOT operation, or in the language of the arrows, they differ by a rotation
of the arrows through 180 degrees.
Figure 2.2: There are an infinite number of possible encodings for quantum information.
These encodings are in one-to-one correspondence with the points on the surface of a unit
sphere. Left panel: Alice chooses the ‘standard’ encoding. Right panel: Bob chooses an
encoding (aka ‘quantization axis,’ or ‘frame’) defined in polar coordinates by the polar
angle θ and azimuthal angle φ of a chosen point on the unit sphere (also known as the
‘Bloch sphere’ in honor of Felix Bloch).
Figure 2.3: Probabilities p0′ , p1′ that Bob measures results 0, 1 in his frame when Alice
prepares the qubit in state 0 in her frame. θ is the polar angle between Bob’s quantization
axis and Alice’s. The probabilities are independent of the azimuthal angle φ.
resource for encrypting communications and for actively detecting the pres-
ence of an eavesdropper. Conversely the same effect dramatically complicates
quantum error correction in computation (and communication). When you
look to see if errors have occurred you do damage to the state!!
Box 2.1. Holevo Bound If Alice and Bob agree in advance on the quanti-
zation axis to use, Alice can send Bob information by sending him qubits.
For simplicity let us say they agree to use the standard computational basis
states |0⟩ and |1⟩. If Alice wishes to send Bob the three-bit classical message
011, she can send him three qubits in states |0⟩, |1⟩, |1⟩. Because Bob knows
the correct quantization axis to use, his measurement results will be 0, 1, 1
and he will correctly receive the message Alice sent. If Bob uses the wrong
quantization axis for his measurement, he will gain less information about
Alice’s message because randomness will cause his measurement results to
(have some non-zero probability to) disagree with what Alice intended.
Alice can (in principle) encode an infinite number of bits in the complex
coefficients of a coherent superposition state α|0⟩ + β|1⟩, but Bob’s measure-
ment can only yield one classical bit of information per qubit. Subsequent
measurements on the same qubit yield no new information because of state
collapse. This limit of one classical bit of information per quantum bit is
called the Holevo bound on the transmittable information.
If Alice sends Bob a large collection of qubits, all prepared in the same
state, say 0, then Bob can tell something about the alignment of his frame
relative to Alice’s from the probability distribution of the results. Using his
measurements Bob can estimate θ from
p0′ − p1′ = cos²(θ/2) − sin²(θ/2) = cos[θ]. (2.3)
We will see later that by orienting his frame in a different direction, he
can also acquire information about cos[φ], provided Alice supplies him with
additional fresh qubits all prepared in the same state.
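Bob's estimation procedure can be simulated with classical random numbers, assuming only the Born-rule probability p0′ = cos²(θ/2) shown in Fig. 2.3; a sketch (the parameter values are illustrative):

```python
import random
from math import acos, cos

def measure(theta):
    """One measurement in Bob's frame of a qubit Alice prepared in her state 0.
    Born rule: outcome 0' occurs with probability cos^2(theta/2)."""
    return 0 if random.random() < cos(theta / 2) ** 2 else 1

random.seed(1)
theta_true = 1.0                       # Bob's tilt relative to Alice, in radians
N = 200_000                            # number of fresh qubits Alice supplies
n0 = sum(measure(theta_true) == 0 for _ in range(N))
p0, p1 = n0 / N, (N - n0) / N
theta_est = acos(p0 - p1)              # invert Eq. (2.3): p0' - p1' = cos(theta)
print(theta_est)                       # close to theta_true for large N
```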
Exercise 2.1. The quantum Zeno effect is a vivid demonstration of
the existence of measurement back action. Suppose that Alice gives Bob
a qubit prepared in state 0 in her frame. Further suppose that Bob’s
frame is aligned with Alice’s frame, so that if he measures the qubit he
will also obtain the result 0 with probability 1. Bob can (with high prob-
ability) rotate the qubit from state 0 to state 1 purely by making a series
of measurements! Suppose Bob makes a series of N ≫ 1 measurements,
gradually rotating his quantization axis between each measurement. Let
the first measurement have θ1 = π/N and the jth measurement have
θj = jθ1 . (Assume all have φ = 0.) If all N measurements happen to
yield state 0′ in Bob’s (gradually rotating) frame, then after N measure-
ments, Alice will see that Bob’s measurements will have turned the qubit
from her 0 to her 1 orientation. A lower bound on the success probability
is given by the probability that every single one of Bob’s measurements
is 0′ (in his gradually rotating frame). Compute this probability as a
function of N and find an analytic approximation for it valid for N ≫ 1.
Hint: expand the logarithm of the probability for large N and then ex-
ponentiate that.
Figure 2.4: Schematic illustration of the quantized energy levels of a naturally occurring
(or an artificial) atom used as a quantum bit (qubit). The atom being in the lowest energy
state can represent state 0 and the atom being in the first excited state can represent state
1. Electromagnetic radiation can be used to flip the qubit from one state to the other. For
natural atoms the required frequency can be in the optical or microwave domain. For artificial
atoms such as superconducting qubits, the required frequency is in the microwave domain.
physical states can be used to store information. For example, state 0 might
denote the atom being in its lowest energy state (the ‘ground’ state), and
state 1 might denote the atom being in its first excited state. The quantiza-
tion of atomic energy levels is related to the fact that the electrons orbiting
the nucleus behave like waves. The allowed energies of the electrons can be
found by solving the Schrödinger wave equation, a complex task outside the
scope of this discussion. Crudely speaking, however, one can understand
energy level quantization as arising from the fact that in the quantum theory
particles act like waves and only electron orbits whose circumference is an
integer number of wavelengths are allowed (otherwise the wave destructively
interferes with itself). Most quantum systems have many more than two
states, but with the right experimental situation, it is possible to focus at-
tention only on the two lowest energy states and ignore the others (to a good
approximation).
The general state of a qubit is a superposition of the two basis states:
|ψ⟩ = α|0⟩ + β|1⟩. (2.4)
Here |0⟩ represents state 0 of the qubit and |1⟩ represents state 1 of the
qubit, and α and β are (complex) wave amplitudes also known as ‘probability
amplitudes.’ The ‘state’ of the qubit has something to do with the kind of
2
The physics of the sound waves is such that, to an excellent approximation, each one
propagates through the air unaffected by the other. This does not mean that your brain
won’t have trouble following two conversations at once, but that is a separate matter.
Figure 2.5: Top left panel: Two sine waves, Ψ1 and Ψ2 of unit amplitude (vertically
displaced from each other for clarity) having slightly different wavelengths. Top right
panel: Superposition of the two waves Ψ = αΨ1 + βΨ2 , with α = β = +1. The amplitude
is the sum of the amplitudes of the two waves. In some regions the waves cancel each
other out (destructively interfere) and in other regions they are in phase and constructively
interfere. Bottom left panel: Interference pattern when the wave amplitudes are unequal,
α = +1, β = +0.25. The interference effects are thus smaller. Bottom right panel:
Superposition of the two waves Ψ = αΨ1 + βΨ2 , with α = +1, β = −1. The net amplitude
is the difference of the amplitudes of the two waves. Notice that the regions of high net
amplitude are different than in the upper right panel.
waves illustrated above, but for our purposes we do not ever need the details
of that. |0⟩ and |1⟩ are abstractions of these wave states and all we have
to know about them is that they represent two distinct energy states of the
physical object holding the quantum bit of information. As we shall see later,
the complex wave amplitudes α and β do control interference effects which
are analogous to, but an abstraction from, those shown in Fig. 2.5. The
analogy for qubits to take away from this is the understanding that quantum
superposition states such as
|Ψ+⟩ = (1/√2) [|0⟩ + |1⟩] , (2.5)
and
|Ψ−⟩ = (1/√2) [|0⟩ − |1⟩] , (2.6)
are physically distinct, just as the wave patterns corresponding to the differ-
ent superpositions shown in Fig. 2.5 are distinct.
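Anticipating the vector language of Chapter 3, the physical distinctness of |Ψ+⟩ and |Ψ−⟩ can be made quantitative: as two-component amplitude vectors the two superpositions are orthogonal. A sketch in plain Python:

```python
from math import sqrt

# |0> and |1> as two-component amplitude vectors.
ket0 = [1.0, 0.0]
ket1 = [0.0, 1.0]

def superpose(alpha, beta):
    """Form the state alpha|0> + beta|1> as an amplitude vector."""
    return [alpha * a + beta * b for a, b in zip(ket0, ket1)]

def inner(u, v):
    """Inner product <u|v> = sum_i conj(u_i) v_i (complex in general)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

psi_plus = superpose(1 / sqrt(2), 1 / sqrt(2))     # Eq. (2.5)
psi_minus = superpose(1 / sqrt(2), -1 / sqrt(2))   # Eq. (2.6)

assert abs(inner(psi_plus, psi_minus)) < 1e-12     # orthogonal: physically distinct
assert abs(inner(psi_plus, psi_plus) - 1) < 1e-12  # each state is normalized
```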
Readers familiar with the concept of atomic orbitals in chemistry may
find it useful to think about the waves associated with the ground (S orbital)
and excited (Px orbital) states of the hydrogen atom as illustrated schemat-
ically in Fig. 2.6. The two LCAOs (linear combinations of atomic orbitals)
|Ψ± ⟩ correspond to distinct SP hybridized bonds sticking
out to the right and left as illustrated in Fig. 2.6. It is important to note
that while these two states of the hydrogen atom could in principle be used
to form a qubit, the actual physical qubits in quantum hardware could be
completely different. The states |0⟩ and |1⟩ are simply abstract representa-
tions of whatever the two lowest energy states of our quantum bit are. The
mathematics describing quantum two-level systems is the same for all qubits
and the implementation details are not needed for discussion of the theory.
Readers unfamiliar with these quantum concepts need not worry about
them. We will provide the rules of the abstract game as a generalization of
the rules for manipulating classical bits learned in computer science.
Qubits are analog devices in the sense that they have a continuum of
possible superposition states defined by two complex numbers (here α and
β) and yet they are digital in the sense that if you measure which energy
state they are in there are only two possible results for the measurement,
|0⟩ and |1⟩. (That is, the measured energy is always found to be either E0
or E1 .) There is a peculiar asymmetry here: it takes an infinite number of
Figure 2.6: Left panel: Schematic illustration of the wave describing the ground state
|0⟩ and lowest excited state |1⟩ of the electron in a hydrogen atom. (Sharp-eyed expert
readers will notice that these are actually the wave functions for a one-dimensional har-
monic oscillator.) Right panel: waves corresponding to two different linear combinations
(superpositions) of the two orbitals. One superposition would be appropriate for forming
a chemical bond sticking out to the right and the other to the left.
bits to specify the quantum state (i.e. specify the complex numbers α, β)
but a measurement of the state always yields only one of two results. Thus
the information gained from the measurement is (at most) one (classical) bit.
How can we reconcile these wildly different continuous and discrete properties
of qubits? It seems reasonable that if α = 1 and β = 0 so that we have the
state
|ψ⟩ = |0⟩, (2.7)
then we should always measure the energy to be E0 and never E1 . Conversely,
if α = 0 and β = 1, so that we have the state
|ψ⟩ = |1⟩, (2.8)
then we should always measure the energy to be E1 and never E0 . But what
happens if we continuously decrease α from 1 to 0 and continuously increase
β from 0 to 1? At what point does the measurement result change from
being E0 to suddenly being E1 ? Given that the change in the superposition
state is continuous, it seems like the measurement results ought to also be
continuous. It turns out that the only way we can reconcile the discreteness
of the measurement results with the continuity of the superposition states is
for the measurement results to be intrinsically and ineluctably random. The
measurement results are still discrete, but the probabilities of obtaining E0
and E1 vary continuously with α and β. Without randomness there is simply
no way to reconcile the continuous behavior and the discrete behavior. This
is the source of randomness in the quantum theory.
If one measures whether the qubit has value 0 or 1, the answer is random
(provided α and β are both non-zero). There is simply no way to predict the
outcome of the measurement. As discussed in Box 2.4, this randomness is
intrinsic to the quantum theory and is not the result of ignorance of the values
of any hidden variables that Alice forgot to set or Bob failed to measure. Max
Born3 argued that α and β should be thought of as wave amplitudes and the
measurement probabilities are given by the wave intensities. That is, the
so-called Born rule states that the measurement yields 0 with probability4
p0 = |α|2 and 1 with probability p1 = |β|2 .
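The Born rule is straightforward to simulate: draw a uniform random number and compare it with p0 = |α|². A sketch (the amplitudes chosen are arbitrary):

```python
import random
from math import sqrt

def born_measure(alpha, beta):
    """Sample one measurement outcome for the state alpha|0> + beta|1>."""
    p0, p1 = abs(alpha) ** 2, abs(beta) ** 2
    assert abs(p0 + p1 - 1) < 1e-12   # the state must be normalized
    return 0 if random.random() < p0 else 1

random.seed(0)
alpha, beta = sqrt(1 / 3), 1j * sqrt(2 / 3)   # complex beta is fine: only |beta|^2 matters
N = 300_000
freq0 = sum(born_measure(alpha, beta) == 0 for _ in range(N)) / N
print(freq0)                          # close to |alpha|^2 = 1/3
```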
We know from ordinary probability theory that if A and B are mutually
exclusive5 events with probabilities pA , pB respectively, then the probability
of C = (A OR B) is
pC = pA + pB . (2.9)
In particular,
pC ≥ max{pA , pB }. (2.10)
Since the events of measuring 0 and 1 exhaust the set of all possible outcomes,
we have the important constraint that the two measurement probabilities
must add up to unity:
p0 + p1 = |α|² + |β|² = 1. (2.11)
We say that the state |ψ⟩ must be ‘normalized.’ To repeat: As α and β vary
continuously, the probabilities p0 , p1 vary continuously but the measurement
results are always one of two discrete values, E0 , E1 .
Since classical probabilities are positive, we know that if there are two
mutually exclusive ways an event can occur, the overall probability will be
increased. This can be seen in Eq. (2.9). If {A, B, C, D} are four possi-
ble mutually exclusive outcomes but {A, B} are in the category ‘good’ and
3
Interesting trivia item: Max Born (who won the Nobel Prize in 1954) was the grand-
father of British-Australian singer, Dame Olivia Newton John, who co-starred with John
Travolta in the 1978 musical movie, Grease.
4
We are using the following standard notation. If we have a complex number z = x+iy,
we define complex conjugation as z ∗ = x − iy and the magnitude (squared) of the number
as |z|2 = z ∗ z = x2 + y 2 .
5
Mutually exclusive simply means that A and B never both occur on any ‘throw of the
dice.’
{C, D} are in the category ‘bad,’ then there are two ways to have a good
outcome and the probability of such an outcome is given by the sum of the
probabilities of all the different ways a good outcome can occur. Consider
however what would happen if there were two contributions to the quantum
probability amplitudes instead of the probabilities:
|Ψ⟩ = (1/√Λ) [(α + α′ )|0⟩ + (β + β ′ )|1⟩] , (2.12)
where Λ is a normalization constant.
We interpret the probability amplitudes as (complex-valued) wave-like ampli-
tudes6 . Noting that wave amplitudes can interfere constructively or destruc-
tively, we see that the probability of a particular outcome might be increased
or decreased if there are two contributions to the probability amplitude. We
can have
and pointing from the origin to the surface of the unit sphere as illustrated in
Fig. 2.7. The unit vector ŝ is variously referred to as the qubit ‘polarization
vector,’ or the ‘spin’ or ‘spin vector’ of the qubit7 . We will later describe the
meaning of the unit vector on the Bloch sphere in relation to the properties
of the corresponding quantum state and will also explain why half angles
(which are common in spherical trigonometry) appear in the parametrization
of the state in Eqs. (2.15-2.16). We simply note here that the two states of
a classical bit are represented by the Bloch sphere vectors corresponding to
the north pole ŝ = (0, 0, 1) and the south pole ŝ = (0, 0, −1). Quantum bits
can be in states corresponding to an arbitrary point on the sphere.
7
Physics students will know that the word ‘spin’ refers to the fact that certain ele-
mentary particles like the electron carry an intrinsic angular momentum which is a vector
quantity that can point in any direction, and yet when we measure the projection of that
spin vector onto a fixed axis, we always obtain only one of two results (at least for the
electron since it has spin s = 1/2 and therefore 2s + 1 = 2 independent states). The spin
degree of freedom of an electron is therefore a qubit! Other kinds of qubits do not literally
have an angular momentum vector but their superposition states can still be represented
mathematically as if they did.
Figure 2.7: Unit vector corresponding to a point on the Bloch sphere with Cartesian
coordinates (x, y, z) = (sin θ cos φ, sin θ sin φ, cos θ). The orientation of this 3D unit vector
can be obtained by starting with the unit vector pointing to the ‘north pole’ of the sphere
and then rotating it in the xz plane by an angle θ around the y axis and then rotating by
an angle φ around the z axis. This unit vector corresponds to the parametrization of the
qubit state given in Eqs. (2.15-2.16) [Figure Credit: Nielsen and Chuang].
Box 2.3. A word about notation. Since |0⟩ and |1⟩ might represent the
ground and first excited states of a quantum system, we will sometimes de-
note them |g⟩ and |e⟩ respectively. Similarly, we may use the orientation
on the Bloch sphere to label the same states | ↑⟩ and | ↓⟩. The state corre-
sponding to (x, y, z) = (1, 0, 0) (or equivalently, θ = π/2, φ = 0) on the Bloch
sphere might be written
| →⟩ = | + X⟩ = |+⟩ = (1/√2) [|0⟩ + |1⟩] = (1/√2) [| ↑⟩ + | ↓⟩], (2.18)
It is a weird feature of quantum spins that a coherent superposition of up and
down points sideways!
The state corresponding to the diametrically opposite point on the Bloch
sphere (x, y, z) = (−1, 0, 0) (or equivalently, θ = π/2, φ = π) is
| ←⟩ = | − X⟩ = |−⟩ = (1/√2) [|0⟩ − |1⟩] = (1/√2) [| ↑⟩ − | ↓⟩]. (2.19)
It is important not to confuse −| + X⟩ with | − X⟩. In the former case
the minus sign denotes the quantum amplitude associated with the state
pointing in the +X direction on the Bloch sphere. In the latter case, | − X⟩
is a completely different state corresponding to a different point on the Bloch
sphere.
Similarly, the states corresponding to the points (x, y, z) = (0, ±1, 0) (or
equivalently, θ = π/2, φ = ±π/2) on the Bloch sphere are
| ± Y ⟩ = | ± i⟩ = (1/√2) [|0⟩ ± i|1⟩] = (1/√2) [| ↑⟩ ± i| ↓⟩]. (2.20)
[Figure 2.8 graphic: a quantum register can hold an exponentially large superposition of
all 2^N possible states; e.g. the input 000 is mapped to the superposition 000 + 001 − 010 −
011 + 100 + 101 + 110 − 111, and the quantum computer program applies wave-like
destructive interference to eliminate (many of) the ‘wrong’ answers in the output.]
Figure 2.8: Schematic illustration of the action of a quantum algorithm in focusing the
wave-like quantum input onto the desired answer in the output register. We can think of
the algorithm as a programmable diffraction grating that blocks certain combinations of
waves or changes their phase to modify the resulting interference.
[Figure 2.9 graphic: three regimes: classically easy (e.g. multiplication); quantum easy
(e.g. factorization and real-time evolution of quantum systems); quantum hard (e.g.
interacting quantum fermions/bosons and quantum chemistry).]
Figure 2.9: Schematic illustration of quantum and classical problem complexity. Given a
quantum state (loaded into the input register of a quantum computer) and the Hamilto-
nian (energy function) that determines its time evolution via the Schrödinger equation,
a quantum computer can efficiently evolve the state forward in time. Thus quantum dy-
namics is quantum easy. However given the quantum Hamiltonian it may be quantum
hard to find the ground state. [Figure courtesy of Shruti Puri.]
gives qubits ‘spooky’ correlations that can be stronger than is allowed for
classical bits. It turns out that to utilize the full power of entanglement
requires an additional ingredient which goes by the unfortunate name of
‘magic.’ Of course it isn’t magic, it is physics, but it does seem like magic
and Einstein thought that this was sure proof that the quantum theory was
wrong. Ironically, today we use this magic as a daily engineering test to
make sure our quantum computers really are quantum. We will look into
these mysteries in a later section.
Box 2.6. Church-Turing Thesis In the 1930’s Church and Turing made
foundational contributions to logic and computer science. Turing invented
a (theoretical) prototype computer which is universal: its capabilities essen-
tially define what is ‘computable.’ Scott Aaronson summarizes the (physical)
Church-Turing Thesis as saying that every physical process can be simulated
by a Turing machine to any desired precision, and the Extended Church-
Turing Thesis as saying that every physical process can be efficiently simu-
lated by a Turing machine. We now understand that this thesis is false for
quantum processes. There are quantum processes which are exponentially
hard to simulate on classical computers. An interesting question is whether
a quantum Turing machine obeys the Extended Church-Turing thesis. It is
natural to presume that quantum hardware can simulate any physical quan-
tum process. But is quantum mechanics the ultimate theory that describes
all possible physical processes (even those occurring at extreme energies (the
Planck scale) where quantum gravity may become important)? Does the
universe actually obey some other theory which cannot be efficiently simu-
lated on hardware obeying the rules of ordinary quantum mechanics. I for
one, don’t know...
Chapter 3

Introduction to Hilbert Space
The notation |0⟩, |1⟩ refers to the direction of the polarization vector on the
Bloch sphere, while the notation 0, 1 is the computer science notation for the
bit value of the standard basis states. (See Fig. 2.7.)
As described in Appendix B, it will be useful to define an ‘inner product’
between pairs of vectors in Hilbert space that is analogous to the dot product
of two ordinary vectors.¹ The inner product of a pair of vectors is a scalar,
that is, an element of the field over which the vector space is defined.
Ordinary vectors like the position ⃗r of a particle in three-dimensional space
form a vector space over the field of real numbers and the inner (i.e., dot)
product is a real number. Hilbert space is an abstract space of vectors (quan-
tum states) over the field of complex numbers. Hence the inner product of
two state vectors can be complex. Following common parlance, we will use
the terms ‘inner product’ and ‘overlap’ interchangeably. The inner product
between |ψ⟩ and

|ψ′⟩ = α′|0⟩ + β′|1⟩ = (α′, β′)ᵀ.        (3.5)
For ordinary vectors A⃗ · B⃗ = B⃗ · A⃗, but notice that for a complex vector space
we have to be careful about the order since ⟨ψ′|ψ⟩ = ⟨ψ|ψ′⟩∗.
The complex conjugation means that the inner product of any vector with
itself is real: ⟨ψ|ψ⟩ = |α|² + |β|² = 1, where the second equality follows from
the Born interpretation of the magnitude squared of probability amplitudes as
measurement outcome probabilities.
Dirac referred to his notation for state vectors as the ‘bracket’ notation.
A quantum state is represented by a ‘ket’ |ψ⟩. Associated with this vector is
a dual ‘bra’ vector defined to be the following row vector
⟨ψ| = [(α, β)ᵀ]† = (α∗, β∗),        (3.9)
where † indicates the adjoint, that is the complex conjugate of the transpose.
Thus
⟨ψ|ψ′⟩ = (α∗, β∗)(α′, β′)ᵀ = α∗α′ + β∗β′,        (3.10)
where the last equality follows from the ordinary rules of matrix multiplica-
tion (here applied to non-square matrices).
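The bra-ket arithmetic of Eqs. (3.9)-(3.10) is easy to check numerically. Below is a minimal sketch in Python/numpy (the amplitudes are invented for illustration); numpy's `vdot` conjugates its first argument, which is exactly the role of the bra.

```python
import numpy as np

# Hypothetical amplitudes for two single-qubit states (illustrative values).
psi = np.array([3/5, 4j/5])                       # alpha = 3/5, beta = 4i/5
psi_prime = np.array([1/np.sqrt(2), 1/np.sqrt(2)])

# <psi|psi'> = alpha* alpha' + beta* beta' (Eq. 3.10):
# np.vdot conjugates its first argument, matching the bra.
overlap = np.vdot(psi, psi_prime)

# The inner product of a state with itself is real and equals 1
# for a normalized state (|alpha|^2 + |beta|^2 = 1).
norm = np.vdot(psi, psi).real
```

Swapping the order conjugates the result, illustrating ⟨ψ′|ψ⟩ = ⟨ψ|ψ′⟩∗.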
Even though the inner product can be complex, we can still think of it as
telling us something about the angle between two vectors in Hilbert space.
Thus for example
⟨0|1⟩ = ⟨↑ | ↓⟩ = 0. (3.11)
Notice the important fact that these pairs of vectors in Hilbert space are
orthogonal even though their corresponding qubit polarization vectors on
the Bloch sphere are co-linear (anti-parallel) and thus not orthogonal in the
usual geometric sense. Thus opposite points on the Bloch sphere correspond
to orthogonal vectors in Hilbert space. [This is closely tied to the fact that
half angles appear in Eqs. (2.16-2.15).] If two state vectors are orthogonal
then the states are completely physically distinguishable. A system prepared
in one of the states can be measured to uniquely determine which of the two
states the system is in.
Of course if the system is in a superposition of two orthogonal states,
the measurement result can be random. Conversely, if a system is in one of
two states that are not orthogonal, it is not possible to reliably determine by
measurement which state the system is in. We shall explore this more deeply
when we discuss measurements.
If α and β are both non-zero, this is clearly not an eigenvector. But this is
consistent with the fact that the measurement result will be random with
probability p0 = |α|2 of being +1 and probability p1 = |β|2 of being -1.
The reader is urged to carefully study Box 3.1 since the topic of random
measurement results and state collapse can be confusing to beginners. A
common confusion among students is the idea that measurement of σ z is
represented mathematically by acting with σ z on the state. This is incorrect.
How do we know that measurement collapses the state onto one of the
eigenvectors of the operator being measured? This follows from the
experimental fact that a Z measurement (say) that gives a certain result,
will give exactly the same result if we make a second Z measurement. The
most general possible state is a superposition of the two eigenstates of σ z of
the form |ψ⟩ = α|0⟩ + β|1⟩. In order for the second Z measurement to not
be random, we must have (from the Born rule) that |α|2 and |β|2 cannot
both be non-zero. We have to be in one eigenstate or the other so that
the probability of getting the same measurement result is 100%. Thus the
first Z measurement has to collapse the state onto one of the eigenstates of Z.
Shortly, we will encounter other Hermitian operators that are not diagonal in
the standard basis. In order to apply the Born rule for them, it is essential to
re-express the state in the basis of eigenstates of the operator being measured.
See Box 3.3 for discussion of this point.
If we want to know the average measurement result for the state we have to prepare
many copies and measure each copy only once. The average measurement
result will of course be (+1)p0 + (−1)p1 = |α|2 − |β|2 . Let us compare
this quantity to the so-called ‘expectation value’ of the operator σ z in the
state |ψ⟩ which is defined by the expression ⟨ψ|σ z |ψ⟩. What we mean by
this expression is: compute the state σ z |ψ⟩ and then take its inner product
(‘overlap’) with |ψ⟩:
⟨ψ|σ z |ψ⟩ = (ψ0∗ , ψ1∗ ) (+1, 0; 0, −1) (ψ0 , ψ1 )ᵀ        (3.21)
= |ψ0 |² − |ψ1 |² = |α|² − |β|² .        (3.22)
Thus we have the nice result that the average measurement result for some
observable is simply the expectation value of the observable (operator) in the
quantum state being measured. We see again that the operator associated
with a physical observable contains information about all possible values that
could ever be measured for that observable. This is why the operator has
to be a matrix. We again emphasize however that individual measurement
results are random and the state after the measurement collapses to the
eigenvector corresponding to the measured eigenvalue. The state after the
measurement has nothing to do with σ z |ψ⟩.
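A quick numerical check of Eqs. (3.21)-(3.22), sketched in Python/numpy with invented amplitudes: the expectation value ⟨ψ|σ z |ψ⟩ reproduces the Born-rule average (+1)p0 + (−1)p1.

```python
import numpy as np

sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

# Illustrative state |psi> = alpha|0> + beta|1> (values assumed for the demo).
alpha, beta = np.sqrt(0.7), np.sqrt(0.3) * 1j
psi = np.array([alpha, beta])

# Expectation value <psi|sigma_z|psi> (Eqs. 3.21-3.22).
expect = np.vdot(psi, sigma_z @ psi).real

# The same number from the Born rule: (+1) p0 + (-1) p1.
p0, p1 = abs(alpha)**2, abs(beta)**2
born_average = p0 - p1
```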
Can we now say something about how random the measurement result
will be? Let us begin by reminding ourselves about some basic facts about
classical random variables. The reader is urged to review the discussion of
probability and statistics in Appendix A. Suppose some random variable
ξ takes on values from the set {v1 , v2 , . . . , vM } and value vj occurs with
probability pj . Then the mean value (also known as the expectation value)
is given by
ξ̄ = ∑_{j=1}^{M} p_j v_j ,        (3.23)
where the overbar indicates statistical average. One measure of the random-
ness of ξ is its variance defined by
Var(ξ) ≡ \overline{(ξ − ξ̄)²} = \overline{ξ²} − ξ̄²        (3.24)
= ∑_{j=1}^{M} p_j v_j² − [ ∑_{j=1}^{M} p_j v_j ]² .        (3.25)
We see that the variance is the mean square deviation of the measured quantity
from the average and so is a measure of the size of the random fluctuations in ξ.
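Eqs. (3.23) and (3.25) in code, as a quick sketch (the values v_j and probabilities p_j are invented for illustration):

```python
import numpy as np

# A discrete random variable taking values v_j with probabilities p_j
# (the numbers are made up for the demo).
v = np.array([1.0, 2.0, 3.0])
p = np.array([0.2, 0.5, 0.3])

mean = np.sum(p * v)                  # Eq. (3.23): sum_j p_j v_j
var = np.sum(p * v**2) - mean**2      # Eq. (3.25): mean of squares minus squared mean
```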
Exercise 3.1. Statistical Variance
a) Derive Eq. (3.25).
then

Var(Q) = v_m² ⟨ψ|ψ⟩ − [v_m ⟨ψ|ψ⟩]² = 0.        (3.28)
Assuming H † = H, we can now use Eq. (3.29) and Eq. (3.31) to obtain
For the case j = k, we know that (by construction) ⟨ψj |ψj ⟩ = 1, and hence the
imaginary part of λj must vanish. Thus all the eigenvalues of an Hermitian
operator are real. If for j ̸= k, the eigenvalues are non-degenerate (i.e.,
λj ̸= λk ), then Eq. (3.34) requires ⟨ψk |ψj ⟩ = 0 and so the two eigenvectors
must be orthogonal. Thus if the full spectrum is non-degenerate, the set of
eigenvectors is orthonormal: ⟨ψk |ψj ⟩ = δkj (where δkj is the Kronecker delta
symbol which vanishes if k ̸= j and is unity for k = j).
If M eigenvalues are degenerate, then any superposition of the M eigenvec-
tors is also an eigenvector. If they are not orthogonal, they can be made
orthogonal by taking appropriate linear combinations of the set of M eigen-
vectors (via the so-called Gram-Schmidt procedure).
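The two facts just derived, real eigenvalues and orthonormal eigenvectors of a Hermitian operator, can be spot-checked numerically; a sketch in Python/numpy with a randomly generated matrix (not from the text):

```python
import numpy as np

# A Hermitian 2x2 matrix built at random for the demo.
rng = np.random.default_rng(0)
a = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
h = a + a.conj().T                 # H-dagger = H by construction

evals, evecs = np.linalg.eigh(h)   # eigh assumes a Hermitian input

# Eigenvalues come back as a real array, and the eigenvectors
# satisfy the orthonormality condition V-dagger V = I.
gram = evecs.conj().T @ evecs
```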
where g is the complex number representing the inner product of |Φ⟩ and
|χ⟩,

g = ⟨Φ|χ⟩.        (3.36)
Hence when applied to any vector in the Hilbert space, G returns another
vector in the Hilbert space. Thus it is an operator and indeed it is a linear
operator since
Clearly
The above result is a particular case of the general fact that in Dirac’s
notation, Hermitian operators take on a very simple form
V = ∑_{j=1}^{M} v_j |ψ_j ⟩⟨ψ_j |,        (3.44)

V |ψ_m ⟩ = v_m |ψ_m ⟩,
Using the known form of the states | ± X⟩ from Eq. (2.18) and Eq. (2.19)
and | ± Y ⟩ in Eq. (2.20), show that in the Z basis
σ x = (0, +1; +1, 0)        (3.46)

σ y = (0, −i; +i, 0).        (3.47)
Since we are dealing with a Hilbert space of only two dimensions and we
have an orthonormal pair of states in the Hilbert space, they constitute a
complete basis for expressing any vector in the Hilbert space.
Similarly, it is straightforward to derive the so-called completeness rela-
tion
| + n̂⟩⟨+n̂| + | − n̂⟩⟨−n̂| = Î,        (3.49)
We see immediately the coefficients in the expansion of the state |ψ⟩ in the
basis are simply computed as the inner products of the basis vectors with
the state |ψ⟩.
The results above are very reminiscent of how we find the representation of
an ordinary vector, say a position vector in 2D. We may have some orthogonal
basis vectors for our coordinate system, for example x̂, ŷ, or a rotated set î, ĵ.
We can represent any vector as
⃗r = (rx , ry ) = x̂(x̂ · ⃗r) + ŷ(ŷ · ⃗r) (3.52)
= î(î · ⃗r) + ĵ(ĵ · ⃗r). (3.53)
This suggests we can think of the identity transformation as
(Px )2 = Px . (3.56)
This has a simple interpretation: once the vector is projected onto the x axis,
further projection doesn’t do anything.
We can think of the shadow of an object cast onto the ground as the projection
of the object onto the horizontal (xy) plane. This is accomplished
by the projection operator
where Î = x̂x̂ + ŷŷ + ẑẑ is the identity for the 3D case. This shows us that
projection onto the xy plane simply removes the component of the vector
normal to the plane. Despite this more complicated form, it is straightfor-
ward to show that (Pxy )2 = Pxy as required. The key to this is that the basis
vectors x̂, ŷ, ẑ are orthonormal.
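The idempotence property (3.56) and the shadow picture can be verified directly for ordinary 3D vectors; a small Python/numpy sketch (the test vector is arbitrary):

```python
import numpy as np

x, y, z = np.eye(3)        # orthonormal basis vectors of ordinary 3D space

P_x = np.outer(x, x)                       # projector onto the x axis
P_xy = np.outer(x, x) + np.outer(y, y)     # projector onto the xy plane

# The "shadow": projecting onto the xy plane removes the z component.
r = np.array([1.0, 2.0, 3.0])
shadow = P_xy @ r
```

Idempotence follows from the orthonormality of x̂, ŷ, ẑ, exactly as stated above.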
The analogous projector onto the vector |n̂⟩ in the Hilbert space describ-
ing the polarization of a qubit is simply
that is not diagonal in the standard basis (i.e., n̂ ̸= ±ẑ). Clearly the two
eigenvalues and eigenvectors of M are given by
In order to apply the Born rule, it is essential that we express the state in
the basis of the eigenstates of M
from which we determine that measurement result M+ will occur with prob-
ability P+ = |α′ |2 and the state will collapse in this case to | + n̂⟩, while
measurement result M− will occur with probability P− = |β ′ |2 and the state
will collapse in this case to | − n̂⟩.
We can use the completeness relation in Eq. (3.49) to change the basis
Thus, Eq. (3.49) along with the Born rule tells us that, given an arbitrary
state |ψ⟩, a measurement asking the question ‘Is the state |ψ⟩ actually
| + n̂⟩?’ will be answered ‘yes’ with probability |α′ |2 = ⟨ψ|Pn̂ |ψ⟩ =
|⟨+n̂|ψ⟩|2 . Correspondingly, the question ‘Is the state |ψ⟩ actually | − n̂⟩?’
will be answered ‘yes’ with probability |β ′ |2 = ⟨ψ|P−n̂ |ψ⟩ = |⟨−n̂|ψ⟩|2 .
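The Born-rule bookkeeping for a measurement along an arbitrary axis n̂ can be sketched as follows (Python/numpy; the angles and the state are invented, and the phase convention chosen for |−n̂⟩ is one possible gauge choice):

```python
import numpy as np

# Bloch angles of the measurement axis n-hat (illustrative values).
theta, phi = 0.7, 1.2

# Eigenstates |+n> and |-n> using the half-angle parametrization.
plus_n = np.array([np.cos(theta/2), np.exp(1j*phi)*np.sin(theta/2)])
minus_n = np.array([-np.sin(theta/2), np.exp(1j*phi)*np.cos(theta/2)])

# An arbitrary normalized state (assumed for the demo).
psi = np.array([0.6, 0.8j])

# Born rule: P_plus = |<+n|psi>|^2, P_minus = |<-n|psi>|^2.
p_plus = abs(np.vdot(plus_n, psi))**2
p_minus = abs(np.vdot(minus_n, psi))**2
```

By the completeness relation (3.49) the two probabilities sum to one.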
with eigenvalue ±1
The particular choice of phase factors cancels out in the above expressions.
The specific representation of one other operator (besides the identity) is also
independent of the so-called ‘gauge’ choice that we make by picking particular
phase factors. For example, the (diagonal) operator that is measured by an
can be obtained from the initial state |ψ⟩ via a rotation. The idea is illus-
trated schematically for ordinary 2D vectors in Fig. 3.1.
It turns out that rotations in Hilbert space are executed via unitary op-
erations. A unitary matrix U obeys two defining conditions
U † U = I, (U is an isometry) (3.85)
U U † = I. (U is a coisometry) (3.86)
where I is the identity matrix. Thus, both the left and the right inverse of
U is simply the adjoint U −1 = U † . It turns out that unitary transformations
preserve the inner products between vectors in Hilbert space, just as the
Figure 3.1: Two unit vectors ⃗v1 , ⃗v2 whose end points lie on the unit circle in 2D. The
sum of the vectors ⃗v1 + ⃗v2 (generically) has length squared W = 2 + 2(⃗v1 · ⃗v2 ) ̸= 1. The
normalized vector ⃗v3 = (1/√W )(⃗v1 + ⃗v2 ) does lie on the unit circle and can be obtained by a
rotation applied to either ⃗v1 or ⃗v2 .
more familiar orthogonal rotation matrices preserve the angles (dot products)
between ordinary vectors.3
To better understand rotations consider the representation of an arbitrary
state vector in terms of some orthonormal basis {|j⟩; j = 1, 2, 3, . . . , N } span-
ning the N -dimensional Hilbert space
|ψ⟩ = ∑_{j=1}^{N} ψ_j |j⟩.        (3.87)
From the Born rule, the probability of measuring the system to be in the basis
state |j⟩ is given by pj = |ψj |2 . The requirement that the total probability
be unity gives us the normalization requirement on the state vector
∑_{j=1}^{N} |ψ_j |² = 1.        (3.88)
Now consider a linear operation U that preserves the length of every vector
³ Recall that the defining property of an orthogonal matrix is that R−1 = RT . Thus a
unitary matrix whose elements are all real is an orthogonal matrix. Unitary matrices are
the natural generalization of orthogonal rotation matrices to vector spaces over the field
of complex numbers. This point is discussed further below in the vicinity of Eq. (3.129).
and
U † U = I. (3.98)
Thus the only linear operations that conserve probability for all states are
unitary operations. It follows that unitary transformations preserve not only
the inner product of states with themselves but also preserve the inner prod-
ucts between any pair of states
|ϕ′ ⟩ = U |ϕ⟩ (3.99)
|ψ ′ ⟩ = U |ψ⟩ (3.100)
⟨ϕ′ |ψ ′ ⟩ = ⟨ϕ|U † U ψ⟩ = ⟨ϕ|ψ⟩. (3.101)
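Eq. (3.101) is easy to verify numerically for a random unitary; a Python/numpy sketch (building a unitary from a QR decomposition is a standard trick, not something from the text):

```python
import numpy as np

# A random unitary from the QR decomposition of a random complex matrix.
rng = np.random.default_rng(1)
m = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
u, _ = np.linalg.qr(m)

# Two arbitrary state vectors (not normalized; that doesn't matter here).
phi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)

# <phi'|psi'> = <phi|U-dagger U|psi> = <phi|psi>   (Eq. 3.101)
before = np.vdot(phi, psi)
after = np.vdot(u @ phi, u @ psi)
```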
It turns out that in an ideal, dissipationless closed quantum system, the
evolution of the system from its initial state at time 0 to its final state at time
t is described by a unitary transformation. We can control the time evolution,
and thus create different unitary operations, by applying control signals to
our quantum system. The specifics of the physics of how this is done for
different systems using laser beams, microwave pulses, magnetic fields, etc.
will not concern us here. We will for now simply postulate that the only
operations available to us to control the quantum system are multiplication
of the starting state by a unitary matrix to effect a rotation in Hilbert space.
This seems reasonable because we are required to conserve total probability.
It turns out that in a deep sense, unitary transformations preserve in-
formation. In a so-called open quantum system that is coupled to its envi-
ronment (also called a ‘bath’), the time evolution of the system plus bath
is unitary but the evolution of the system alone is not because information
about the system can leak into the environment and is no longer conserved
(assuming we cannot access it once it is in the bath).
Being able to rotate states in Hilbert space can also be very useful for
the purposes of measurement. It often happens that the energy eigenstates
of the system (i.e., of the Hamiltonian operator) constitute the only basis in
which measurements can be conveniently made. Typically we represent the
energy eigenstates in terms of the standard basis states |0⟩ and |1⟩. Thus the
Hamiltonian (energy operator) is
H = ((E0 + E1 )/2) Î + ((E1 − E0 )/2) σ z ,        (3.102)
which has eigenvalues E0 and E1 . If we choose the zero of energy to be half
way between the ground and excited state energies then E0 + E1 = 0 and we
can drop the first term. If we are able to measure the energy (say) then the
preferred measurement operator to which we have access is σ z . If the qubit
is in the state
|ψ⟩ = α|0⟩ + β|1⟩,        (3.103)
and we are able to prepare many copies of this state, a histogram of the
measurement results Z = ±1 plus the Born rule allows us to estimate the
values of |α|2 and |β|2 . We cannot however deduce the relative complex phase
of α and β. (Recall that WLOG we can take α to be real.) To fully determine
the state we need (in general) to be able to measure all three components of
the ‘spin’ vector ⃗σ . If we only have access to measurements of σ z , then full
state ‘tomography’ seems to be impossible. However if we prepend certain
selected rotations of the state before making the Z measurement we
can achieve our goal. For example a rotation by π/2 around the y axis takes
| + X⟩ to | − Z⟩ and | − X⟩ to | + Z⟩. Similarly a rotation by π/2 around
the x axis takes | + Y ⟩ to | + Z⟩ and | − Y ⟩ to | − Z⟩. Thus we can measure
all three components of the qubit polarization (spin) vector. In fact, we can
rotate any state |n̂⟩ into | + Z⟩ and thus measure the operator n̂ · ⃗σ .
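The pre-rotation trick can be sketched numerically: since the π/2 rotation about y takes |+X⟩ to |−Z⟩, the Z average of the rotated state equals minus the X average of the original state (Python/numpy; the state is invented for the demo).

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(theta):
    """Hilbert-space rotation exp(-i theta/2 sigma_y)."""
    return np.cos(theta/2) * np.eye(2) - 1j * np.sin(theta/2) * Y

psi = np.array([0.8, 0.6], dtype=complex)   # an illustrative normalized state

# Direct expectation value of sigma_x ...
x_direct = np.vdot(psi, X @ psi).real

# ... versus rotating by pi/2 about y and then "measuring Z".
# Because |+X> maps to |-Z>, the sign flips.
rotated = ry(np.pi/2) @ psi
x_from_z = -np.vdot(rotated, Z @ rotated).real
```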
Before we learn how to rotate states in Hilbert space, let us review the
familiar concept of rotations in ordinary space. For example if we start with
an ordinary 3D unit vector on the Bloch sphere
n̂ = xx̂ + y ŷ + z ẑ = [sin θ cos φ x̂ + sin θ sin φ ŷ + cos θ ẑ] , (3.104)
we can rotate it through an angle χ around the z axis to yield a new vector
n̂ ′ = [sin θ cos(φ + χ) x̂ + sin θ sin(φ + χ) ŷ + cos θ ẑ] , (3.105)
which simply corresponds to a transformation of the polar coordinates θ →
θ, φ → φ + χ. If we choose to represent n̂ as a column vector
n̂ = (x, y, z)ᵀ = (sin θ cos φ, sin θ sin φ, cos θ)ᵀ ,        (3.106)
then, using the trigonometric identities cos(θ + χ) = cos(θ) cos(χ) −
sin(θ) sin(χ) and sin(θ + χ) = sin(θ) cos(χ) + cos(θ) sin(χ), it is straight-
forward to show that the new vector n̂ ′ is represented by
n̂ ′ = (x′ , y ′ , z ′ )ᵀ = Rz (χ) (x, y, z)ᵀ ,        (3.107)
where Rz (χ) is a 3 × 3 ‘rotation matrix’
Rz (χ) = (cos χ, − sin χ, 0; sin χ, cos χ, 0; 0, 0, 1).        (3.108)
Matrices whose transpose is equal to their inverse are called orthogonal, and
it turns out that rotations (for ordinary vectors) are always represented by
orthogonal matrices.
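A numerical check of Eqs. (3.105)-(3.108), sketched in Python/numpy with arbitrary angles:

```python
import numpy as np

def rz3(chi):
    """Ordinary 3D rotation by angle chi about the z axis (Eq. 3.108)."""
    c, s = np.cos(chi), np.sin(chi)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

theta, phi, chi = 0.9, 0.4, 1.1
n = np.array([np.sin(theta)*np.cos(phi),
              np.sin(theta)*np.sin(phi),
              np.cos(theta)])                     # Eq. (3.106)

n_rot = rz3(chi) @ n

# The rotation just shifts the azimuthal angle: phi -> phi + chi (Eq. 3.105).
n_expected = np.array([np.sin(theta)*np.cos(phi + chi),
                       np.sin(theta)*np.sin(phi + chi),
                       np.cos(theta)])
```

Orthogonality, Rᵀ R = I, is what guarantees that lengths and angles are preserved.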
Two natural properties of rotations are that they preserve the length of
vectors and they preserve the angle between vectors. These facts can be
summarized in the single statement that if ⃗r1 ′ = Rz (χ)⃗r1 and ⃗r2 ′ = Rz (χ)⃗r2 ,
then ⃗r1 ′ · ⃗r2 ′ = ⃗r1 · ⃗r2 . That is, the dot product between any two vectors
(including the dot product of a vector with itself) is invariant
under rotations. The mathematical source of the preservation of lengths and
angles can be traced back to the defining property of orthogonal matrices,
RT R = I.
Let us now turn to rotations in Hilbert space. There must be some connection
to ordinary rotations, because rotation of the qubit spin vector on
the Bloch sphere is an ordinary (3D vector) rotation. However the Hilbert
space is only two-dimensional, and unlike the example above where the standard
basis vectors were x̂, ŷ and ẑ, the standard basis vectors in the Hilbert
space correspond to the orthogonal states | + Z⟩ = |0⟩ and | − Z⟩ = |1⟩.
Furthermore, the inner product in Hilbert space involves complex conjuga-
tion unlike the case of the dot product for ordinary vectors. Hence we expect
that rotation operations in Hilbert space will not look like 3D rotations of
the unit vectors on the Bloch sphere.
Let us begin with rotations around the z axis. We know from the example
above, that rotation of the 3D vector n̂ on the Bloch sphere by an angle χ
around the z axis, simply corresponds to the transformation φ → φ + χ.
From the standard quantum state representation in Eqs. (2.16-2.15), we see
that this simply changes the relative phase of the coefficients of |0⟩ and |1⟩.
Let us therefore consider the following operator on the Hilbert space which
does something very similar
Uz (χ) = exp(−i(χ/2)σ z ).        (3.110)
without using the power series expansion. Hint: define the func-
tions
and show that they obey the same first-order differential equation
(d/dθ) P (θ) = −iP (θ),        (3.116)
(d/dθ) Q(θ) = −iQ(θ).        (3.117)
Thus the 2 × 2 complex matrix Uz (χ) correctly rotates the quantum state by
an angle χ around the z axis of the Bloch sphere. Notice however that the
resulting state differs from the standard state by an (irrelevant) global phase
factor. This is because we made the arbitrary choice to have the standard
state parametrization for |n̂⟩ yield a coefficient of |0⟩ that is purely real so
that the only complex amplitude is found in the coefficient of |1⟩.
Notice that Uz (−χ) = Uz−1 (χ) = Uz† (χ). Hence Uz (χ) is unitary, meaning
that
Uz† Uz = Î.        (3.127)
O−1 = OT . (3.128)
U = eiθM̂ , (3.130)
where
ω̂ · ⃗σ = ωx σ x + ωy σ y + ωz σ z ,        (3.134)
so that the entire relative phase is achieved by changing the phase only of
the |1⟩ component,
to be consistent with the phase choice made in the definition of | + n̂⟩ and
we have
U1 (φ) exp(−i(θ/2)Y )|0⟩ = | + n̂⟩.        (3.152)
Exercise 3.13. Using Eq. (3.153), prove that the Hadamard gate obeys
a) H = | + Z⟩⟨+X| + | − Z⟩⟨−X|,
b) H 2 = I,
c) H is unitary.
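The three properties in the exercise can be confirmed numerically. A Python/numpy sketch using the standard matrix form H = (1/√2)(1, 1; 1, −1), which is assumed here rather than taken from Eq. (3.153):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

ket0 = np.array([1, 0], dtype=complex)       # |+Z>
ket1 = np.array([0, 1], dtype=complex)       # |-Z>
plus_x = (ket0 + ket1) / np.sqrt(2)          # |+X>
minus_x = (ket0 - ket1) / np.sqrt(2)         # |-X>

# a) H as the outer-product sum |+Z><+X| + |-Z><-X|
H_outer = np.outer(ket0, plus_x.conj()) + np.outer(ket1, minus_x.conj())
```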
the states are labeled by the binary numbers 00, 01, 10, 11 corresponding to
the base-10 numbers 0, 1, 2, 3 just as in a classical computer memory that
contains only two bits. Our quantum bits can however be in a superposition
of all four states of the form
|ψ⟩ = ψ11 |11⟩ + ψ10 |10⟩ + ψ01 |01⟩ + ψ00 |00⟩, (3.159)
Note that the choice of how to order the entries in the column is completely
arbitrary, but once you make a choice you must stick to it for all your cal-
culations. For the case of N qubits, all of the above generalizes to a vector
space of dimension 2N . We shall always use the standard ordering which
lists the entries in the column according to the binary numbering order used
above.
We have so far seen two kinds of products for vectors, the inner product
⟨ψ|ϕ⟩ which is a scalar (complex number), and the outer product |ϕ⟩⟨ψ|
which is an operator that has a matrix representation. When it comes to
thinking about the quantum states of a composite physical system consisting
of multiple qubits, we have to deal with yet another kind of product, the
tensor product. The tensor product of the Hilbert space H1 of the first qubit
with that of a second qubit H2 yields a new Hilbert space H1 ⊗ H2 whose
dimension is the product of the dimensions of the two individual Hilbert
spaces. The two-qubit basis states in Eq. (3.159) can be thought of as tensor
products of individual vectors,
basis are
|00⟩ = |0⟩ ⊗ |0⟩ = (1, 0)ᵀ ⊗ (1, 0)ᵀ = (1, 0, 0, 0)ᵀ        (3.162)
|01⟩ = |0⟩ ⊗ |1⟩ = (1, 0)ᵀ ⊗ (0, 1)ᵀ = (0, 1, 0, 0)ᵀ        (3.163)
|10⟩ = |1⟩ ⊗ |0⟩ = (0, 1)ᵀ ⊗ (1, 0)ᵀ = (0, 0, 1, 0)ᵀ        (3.164)
|11⟩ = |1⟩ ⊗ |1⟩ = (0, 1)ᵀ ⊗ (0, 1)ᵀ = (0, 0, 0, 1)ᵀ .        (3.165)
Notice that we compute the tensor product of two vectors by inserting the
second vector (i.e., the one on the right) into the first vector, scaled by each
amplitude of the first vector in turn. [Note: It is crucial that you maintain
the correct ordering convention.]
For a general pair of two-qubit states, the tensor product is
(u0 , u1 )ᵀ ⊗ (v0 , v1 )ᵀ = (u0 v0 , u0 v1 , u1 v0 , u1 v1 )ᵀ .        (3.166)
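numpy's `kron` follows exactly this ordering convention (the second factor is inserted into the first), so the identities above can be checked directly; the amplitudes u and v below are invented:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# Basis kets of Eqs. (3.162)-(3.165), built with np.kron.
ket01 = np.kron(ket0, ket1)     # should be (0, 1, 0, 0)
ket10 = np.kron(ket1, ket0)     # should be (0, 0, 1, 0)

# General product state, Eq. (3.166).
u = np.array([0.6, 0.8j])
v = np.array([1/np.sqrt(2), 1/np.sqrt(2)])
uv = np.kron(u, v)
```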
|q_{N−1} . . . q2 q1 q0 ⟩.        (3.167)
More formally however, we need to recognize that the Hilbert space for
two qubits is the tensor product of their individual Hilbert spaces and there-
fore states are represented as column vectors of length 4. This means that all
linear operators are represented by 4 × 4 matrices. Therefore we should write
two-qubit operators as the Kronecker product (also known confusingly and
sloppily as the outer, direct or tensor product) of the 2×2 matrices represent-
ing the individual qubit operators. [AUTHOR NOTE: NEED TO DEFINE
ALL THESE CONCEPTS EXPLICITLY AND ACCURATELY HERE OR
IN APP. B. DO ANY OF YOU MATH MAJORS HAVE SUGGESTIONS?]
The Kronecker product of a matrix with dimension n1 ×n1 with a matrix with
dimension n2 × n2 is a larger matrix of dimension (n1 n2 ) × (n1 n2 ). This is of
course consistent with the fact that the dimension of the Kronecker product
Hilbert space is n1 n2 . As an example of a Kronecker product consider the
matrix representation of the operator σ1z
σ z ⊗ σ 0 = (+1, 0; 0, −1) ⊗ (+1, 0; 0, +1).        (3.172)
This is the formal way of representing σ1z that acts on qubit 1 with the Pauli
Z operator and does nothing to the 0th qubit (applies the identity). Notice
that we have removed the qubit labels because the position of the operator
in the Kronecker product implies which qubit it acts on. The rule for how
you write out the entries to this 4 × 4 matrix depends on exactly how you
order the terms in the column vector in Eq. (3.160). It is clear however that
we want the following to be true (assuming we number the qubits in |01⟩
from right to left inside the Dirac ket)
We can think of this as four copies of the identity matrix, each multiplied by
the appropriate entry in the σ z matrix.
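The same construction in Python/numpy rather than Mathematica (a sketch; the ket used to probe the operators is |10⟩):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Qubit 1 on the left, qubit 0 on the right, as in the text (Eq. 3.172).
Z1 = np.kron(Z, I2)    # sigma_1^z = Z (x) I
Z0 = np.kron(I2, Z)    # sigma_0^z = I (x) Z

ket10 = np.kron(np.array([0, 1]), np.array([1, 0]))   # |10>
```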
We can also consider the sum (Kronecker sum) of two operators (not to
be confused with the ‘direct sum’, which is different). For example, the z
component of the total spin vector is given by
ZI = KroneckerProduct[Z, II];
MatrixForm[ZI]
IZ = KroneckerProduct[II, Z];
MatrixForm[IZ]
MatrixForm[IZ + ZI]
and the following commands will produce operators from Kronecker products
of their eigenvectors
Y = (1/2) (1, +i)ᵀ (1, −i) − (1/2) (1, −i)ᵀ (1, +i)        (3.188)
c)

(1/2)(σ1⁰ + σ1ᶻ ) + (1/2)(σ0⁰ + σ0ᶻ ) = (1/2)[ (σ⁰ + σᶻ ) ⊗ Î + Î ⊗ (σ⁰ + σᶻ ) ].
Box 3.4. The No-cloning Theorem The no-cloning theorem [1] states
that it is impossible to make a copy of an unknown quantum state.
The essential idea of the no-cloning theorem is that in order to make a copy
of an unknown quantum state, you would have to measure it to see what the
state is and then use that knowledge to make the copy. However measurement
of the state produces in general a random measurement result and random
back action (state collapse) and thus it is not possible to fully determine the
state. This is a reflection of the fact that measurement of a qubit yields one
classical bit of information which is not enough in general to fully specify the
state via its co-latitude θ and longitude φ on the Bloch sphere.
Of course if you have prior knowledge, such as the fact that the state is an
eigenstate of σ x , then a measurement of σ x tells you the ±1 eigenvalue and
hence the | ± X⟩ state. The measurement gives you one additional classical
bit of information which is all you need to have complete knowledge of the
state.
A more formal statement of the no-cloning theorem is the following. Given
an unknown state |ψ⟩ = α|0⟩ + β|1⟩ and an ancilla qubit initially prepared
in a definite state (e.g. |1⟩), there does not exist a unitary operation U that
will take the initial joint state
Chapter 4

Two-Qubit Gates and Multi-Qubit Entanglement
|ψ⟩ = α00 |00⟩ + α01 |01⟩ + α10 |10⟩ + α11 |11⟩. (4.2)
Measurement of the state of each of the two qubits yields result (x, y) (where
x ∈ {0, 1} and y ∈ {0, 1}) with Born-rule probability P (x, y) = |αxy |2 and
the two-qubit state correspondingly collapses to |xy⟩.
Because we have more than one qubit, we can now ask a new question:
What happens if we only measure one of the qubits but not the other? Sup-
pose we only measure q0 (qubit 0, numbering the qubits from right to left as
usual). How does the state collapse? To answer this, let us rewrite Eq. (4.2)
in the following form which has the state of qubit 0 factored out on the right:
|ψ⟩ = (α00 |0⟩ + α10 |1⟩) |0⟩ + (α01 |0⟩ + α11 |1⟩) |1⟩.        (4.3)
Let us measure q0 in the standard basis using the operator B̂0 = Iˆ ⊗ |1⟩⟨1|.
This operator has a doubly degenerate eigenvalue of 0 with eigenvectors of
the form
where |µ⟩, the state of q1 , is arbitrary (|0⟩, |1⟩ or any linear combination).
The operator also has a doubly degenerate eigenvalue of 1 with eigenvectors
of the form
where the square root factor produces the correct normalization for the new
state and
where
This of course is exactly the Born-rule probability we obtained for the result
(0, 1) from simultaneous measurement of these two commuting observables.
One can summarize all this more formally in the following way. Suppose
that we have an n-qubit system in state |ψ⟩ and we measure a single qubit
qj to be in state |bj ⟩. Then the state of the system (partially) collapses to
where the projector is onto the subspace of the Hilbert space that is consistent
with the measurement result (as opposed to projecting onto the single state
consistent with the measurement result as occurs with a single qubit):
P̂j (bj ) = Î_{n−1} ⊗ · · · ⊗ Î_{j+1} ⊗ |bj ⟩⟨bj | ⊗ Î_{j−1} ⊗ · · · ⊗ Î_1 ⊗ Î_0 .        (4.13)
This type of multi-qubit joint measurement will play a crucial role in quan-
tum error correction and in entanglement generation via measurement. Note
the important fact that joint measurement of Z1 Z0 is completely different
than the product of separate measurement results Z1 and Z0 . The joint
measurement yields one bit of classical information, while with separate mea-
surements you learn two classical bits of information, namely the individual
values for each qubit. Thus the state collapse is more complete in the latter
case.
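Partial collapse under Eq. (4.13) can be sketched numerically for two qubits: project onto the outcome subspace, compute the Born probability as the squared norm, then renormalize (Python/numpy; the amplitudes are invented).

```python
import numpy as np

# A two-qubit state in the standard ordering (|00>, |01>, |10>, |11>);
# the amplitudes are illustrative.
psi = np.array([0.5, 0.5, 0.5j, 0.5j])

# Projector for "qubit 0 measured in |1>": I (x) |1><1| (cf. Eq. 4.13).
ket1 = np.array([0, 1], dtype=complex)
P = np.kron(np.eye(2), np.outer(ket1, ket1.conj()))

p1 = np.vdot(psi, P @ psi).real       # Born probability of the outcome
post = P @ psi / np.sqrt(p1)          # collapsed, renormalized state
```

Note qubit 1 is left in a superposition: the collapse is only partial.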
How do we actually make a joint measurement of an operator like Z1 Z0
without learning about the individual qubit states? This is subtle and un-
derstanding how to design a circuit for making joint measurements requires
us to first learn how to execute two-qubit gates. We turn to this topic in the
next section.
In the classical context one can imagine measuring the state of the control
bit and then using that information to control the flipping of the target bit.
However in the quantum context, it is crucially important to emphasize that
measuring the control qubit would collapse its state. We must therefore
avoid any measurements and seek a unitary gate which works correctly when
the control qubit is in |0⟩ and |1⟩ and even when it is in a superposition of
both possibilities α|0⟩ + β|1⟩. As we will soon see, it is this latter situation
which will allow us to generate entanglement. When the control qubit is in
a superposition state, the CNOT gate causes the target qubit to be both
flipped and not flipped in a manner that is correlated with the state of the
control qubit. As we will see, these are not ordinary classical statistical
correlations (e.g. clouds are correlated with rain), but rather special (and
powerful) quantum correlations resulting from entanglement.
Box 4.1. CNOT without Measurement How can we possibly flip the
state of the target qubit conditioned on the state of the control qubit without
measuring the control and hence collapsing its state? We know that in many
systems we can cause a transition between two quantum levels separated in
energy by an amount ℏω by applying an electromagnetic wave of frequency
ω for a certain precise period of time. [In quantum mechanics, energy E =
hf = ℏω and frequency f = ω/2π are related by
Planck’s constant h = 2πℏ.] The way to make this bit flip of the target be
conditioned on the state of the control is to have an interaction between the
two qubits that causes the transition energy of the target depend on the state
of the control. For example, consider the Hamiltonian (the energy operator)
H = −(ℏω0 /2) Z0 − (ℏω1 /2) Z1 + ℏg Z1 Z0 .        (4.16)
The energy change to flip qubit 0 from |0⟩ to |1⟩ is ∆E = ℏ(ω0 − gZ1 ). Thus
if we shine light (or microwaves as appropriate) of frequency ω0 + g, qubit
0 will flip only when qubit 1 is in |1⟩ because that matches the transition
frequency. However if qubit 1 is in the state |0⟩, the transition frequency
for qubit 0 is shifted to ω0 − g and the light has the wrong frequency to
cause the transition. See Fig. 4.1 for an illustration of the level scheme.
Figure 4.1: Energy levels for two interacting qubits in the computational basis |q1 q0 ⟩
whose Hamiltonian is given by Eq. (4.16). Blue lines correspond to transitions in
which qubit q0 is flipped. Red lines correspond to transitions in which qubit q1
is flipped. By choosing ω0 , ω1 and g appropriately, all four single qubit transition
energies can be made unique, thereby allowing flip of one qubit conditioned on the
state of the other. Each transition is labeled by its energy and the corresponding
operation CNOTj where j labels which qubit is the control. Dashed lines corre-
spond to C̄NOT gates in which the control must be in state 0 (rather than 1) in
order to flip the target. Altogether there are four different transitions correspond-
ing to four distinct CNOT or C̄NOT gates. If g = 0, then the two qubits are not
interacting and the transition energy for one qubit is not dependent on the state
of the other, so that conditional operations are not possible.
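The level scheme is easy to check numerically. The sketch below (with ℏ = 1, arbitrary illustrative values for ω0, ω1, g, and the ZZ coupling written with strength g/2 so that the four flip frequencies come out as ω ± g, matching Fig. 4.1) builds the diagonal Hamiltonian and reads off the four conditional transition frequencies:

```python
import numpy as np

# Sketch (hbar = 1): build the diagonal two-qubit Hamiltonian and read off
# the four conditional flip frequencies of Fig. 4.1. The coupling is written
# as (g/2) Z1 Z0 so the transitions land at w0 +/- g and w1 +/- g.
# The numerical values of w0, w1, g are arbitrary illustrative choices.
w0, w1, g = 5.0, 7.0, 0.4

Z = np.diag([1.0, -1.0])
I = np.eye(2)

# Basis ordering |q1 q0> = |00>, |01>, |10>, |11> (q0 least significant).
H = -0.5 * w0 * np.kron(I, Z) - 0.5 * w1 * np.kron(Z, I) + 0.5 * g * np.kron(Z, Z)
E = np.diag(H)  # H is already diagonal in the computational basis

f_q0_when_q1_is_0 = E[0b01] - E[0b00]  # flip q0 given q1 = 0: w0 - g
f_q0_when_q1_is_1 = E[0b11] - E[0b10]  # flip q0 given q1 = 1: w0 + g
f_q1_when_q0_is_0 = E[0b10] - E[0b00]  # flip q1 given q0 = 0: w1 - g
f_q1_when_q0_is_1 = E[0b11] - E[0b01]  # flip q1 given q0 = 1: w1 + g
print(f_q0_when_q1_is_0, f_q0_when_q1_is_1, f_q1_when_q0_is_0, f_q1_when_q0_is_1)
```

All four frequencies are distinct whenever g ≠ 0; setting g = 0 collapses them pairwise, so conditional operations become impossible, exactly as the caption states.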
From the classical truth table we can attempt to construct the appropriate
quantum operator by putting the dual of the initial state ket in the bra and the
desired final state in the ket. Numbering the qubits from right to left (beginning
with zero) and letting qubit 0 be the target and qubit 1 be the control, we
have
CNOT1 = |00⟩⟨00| + |01⟩⟨01| + |11⟩⟨10| + |10⟩⟨11|, (4.17)
where the subscript on CNOT tells us which qubit is the control bit. We see
from the orthonormality of the basis states that this produces all the desired
transformations, for example
CNOT1 |11⟩ = |10⟩. (4.18)
This however is not enough. We have to prove that the desired transforma-
tions are legal. That is, we must show that CNOT is unitary. It is clear
by inspection that CNOT is Hermitian. A straightforward calculation shows
that (CNOT)² = Î. Hence by the lemma in Ex. 3.10b, CNOT is unitary.
It is also instructive to write the gate in the following manner
CNOT1 = [(σ0 − σz)/2] ⊗ σx + [(σ0 + σz)/2] ⊗ σ0 .    (4.19)
The first factor in parentheses (including the factor of 1/2) is the projector |1⟩ ⟨1| onto
the |1⟩ state for the control qubit (qubit 1). Similarly, the second factor in parentheses
(including the factor of 1/2) is the projector |0⟩ ⟨0| onto the |0⟩ state for the
control qubit. Thus if the control qubit is in |1⟩, then σ x flips the target
qubit while the remaining term vanishes. Conversely when the control qubit
is in |0⟩, the coefficient of σ x vanishes and only the identity in the second
term acts.
In the standard two-qubit basis defined in Eq. (3.160), the operator in
Eq. (4.17) and Eq. (4.19) has the matrix representation
        ⎛0 0⎞   ⎛0 1⎞   ⎛1 0⎞   ⎛1 0⎞
CNOT1 = ⎝0 1⎠ ⊗ ⎝1 0⎠ + ⎝0 0⎠ ⊗ ⎝0 1⎠        (4.20)

        ⎛1 0 0 0⎞
        ⎜0 1 0 0⎟
      = ⎜0 0 0 1⎟ .                           (4.21)
        ⎝0 0 1 0⎠
The CNOT1 unitary is represented in ‘quantum circuit’ notation by the con-
struction illustrated in left panel of Fig. 4.2.
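All of this can be verified with a few lines of linear algebra. The sketch below builds CNOT1 both from the truth-table form of Eq. (4.17) and from the projector form of Eq. (4.19), then checks that the two agree, that the result is Hermitian, and that it squares to the identity (hence is unitary):

```python
import numpy as np

# Build CNOT1 from the truth table, Eq. (4.17): a sum of |out><in| terms.
ket = {s: np.eye(4)[int(s, 2)] for s in ("00", "01", "10", "11")}
cnot_truth = sum(np.outer(ket[out], ket[inp])
                 for inp, out in [("00", "00"), ("01", "01"),
                                  ("10", "11"), ("11", "10")])

# Build it again from the projector form, Eq. (4.19): P1 (x) X + P0 (x) I.
X = np.array([[0.0, 1.0], [1.0, 0.0]])
P0 = np.array([[1.0, 0.0], [0.0, 0.0]])  # |0><0| on the control (qubit 1)
P1 = np.array([[0.0, 0.0], [0.0, 1.0]])  # |1><1| on the control
cnot_kron = np.kron(P1, X) + np.kron(P0, np.eye(2))

assert np.allclose(cnot_truth, cnot_kron)             # Eqs. (4.17) and (4.19) agree
assert np.allclose(cnot_kron, cnot_kron.T)            # Hermitian
assert np.allclose(cnot_kron @ cnot_kron, np.eye(4))  # squares to the identity
print(cnot_kron)  # the matrix of Eq. (4.21)
```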
Exercise 4.1. By analogy with Eq. (4.21), find the matrix representa-
tion of the CNOT0 gate given in the right panel of Fig. 4.2
Exercise 4.2. Consider the reset operator R defined in Box 4.2. Find
a state whose norm is not preserved under R. This is further proof
that R is not a legal unitary operator and requires the assistance of a
measurement operation.
Figure 4.2: Quantum circuit representation of the CNOT1 operation (left panel) and the
CNOT0 operation (right panel). The filled circle denotes the control qubit (in the left
panel, q1 ) and the symbol ⊕ denotes the target qubit (in the left panel, q0 ) for the gate.
The right panel shows the gate with control and target interchanged. In quantum circuit
notation the order in which gates are applied (‘time’) runs from left to right. If the circuit
has GATE1 followed by GATE2 , reading from left to right, this corresponds to the (right-
to-left) sequence of matrix operations GATE2 GATE1 |INPUT STATE⟩. For the C̄NOT
gate the control is shown as an open, rather than filled, circle. This denotes the operation
being activated by the control being in 0 rather than 1.
Box 4.2. RESET: Some desired operations are not unitary. One im-
portant task in a quantum computer is to reset all the bits to some standard
state before starting a new computation. Let us take the standard state to
be |0⟩. We want to map every initial state to |0⟩ which can be done with
R = |0⟩⟨0| + |0⟩⟨1| = ⎛1 1⎞
                      ⎝0 0⎠ .    (4.22)
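A quick numerical check confirms that R does what we want but fails the unitarity test (a unitary U must satisfy U†U = I):

```python
import numpy as np

# The reset map of Eq. (4.22) in matrix form.
R = np.array([[1.0, 1.0],
              [0.0, 0.0]])

# R does map both basis states to |0>, as intended:
assert np.allclose(R @ np.array([1.0, 0.0]), [1.0, 0.0])
assert np.allclose(R @ np.array([0.0, 1.0]), [1.0, 0.0])

# ...but a unitary U satisfies U^dagger U = I, and R does not:
print(R.conj().T @ R)  # not the identity matrix
assert not np.allclose(R.conj().T @ R, np.eye(2))
```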
Figure 4.3: Circuit for mapping the joint operator Z1 Z2 onto the auxiliary qubit operator
Z0 . Measurement of Z0 yields the value of Z1 Z2 without conveying any information about
the individual values of Z1 and Z2 . b0 , b1 , b2 ∈ {0, 1} denote qubit states in the standard
basis and ⊕ denotes bitwise addition mod 2.
Box 4.3. LOCC No-Go Theorem In order to convert a product state into
an entangled state, we must use an entangling gate. Such gates are always
controlled gates such as the CNOT or CPHASE. These gates cannot be
written as a single Kronecker product of two single-qubit operators. There
is an important no-go theorem which states that using local operations and
classical communication (LOCC) two parties cannot establish an entangled
state between them starting from product states. Here local operations
means operations of the form Z ⊗ I or I ⊗ H, which one party carries out
independently of the other. Operations like CNOT and CPHASE, being
conditional operations, are explicitly not local. Classical communication
refers to the two parties communicating with each other via a classical channel
(i.e., without exchanging qubits) about which gates (or measurements) they have
performed, or requesting that the other party perform them.
In order to perform a non-local gate, the qubits have to be (at some point
in the protocol) physically nearby so they can interact with each other, or if
remote, then the two parties must communicate via a quantum channel. For
example, Alice can locally use a CNOT gate on a pair of qubits to create an
entangled pair and then send one of the qubits to Bob.
Bell basis³

|B0 ⟩ = (1/√2) [|01⟩ − |10⟩]    (4.34)
|B1 ⟩ = (1/√2) [|01⟩ + |10⟩]    (4.35)
|B2 ⟩ = (1/√2) [|00⟩ − |11⟩]    (4.36)
|B3 ⟩ = (1/√2) [|00⟩ + |11⟩] .  (4.37)
Each of these four states is a ‘maximally entangled’ state, but they are mu-
tually orthogonal and therefore must span the full four-dimensional Hilbert
space. Thus linear superpositions of them can represent any state, including
even product states. For example,
|0⟩|0⟩ = |00⟩ = (1/√2) [|B2 ⟩ + |B3 ⟩] .    (4.38)
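The orthonormality claim and Eq. (4.38) are easy to verify numerically; a minimal sketch:

```python
import numpy as np

# The four Bell states of Eqs. (4.34)-(4.37), in the basis |00>,|01>,|10>,|11>.
s = 1 / np.sqrt(2)
B = np.array([
    [0,  s, -s, 0],   # |B0> = (|01> - |10>)/sqrt(2)
    [0,  s,  s, 0],   # |B1> = (|01> + |10>)/sqrt(2)
    [s,  0,  0, -s],  # |B2> = (|00> - |11>)/sqrt(2)
    [s,  0,  0,  s],  # |B3> = (|00> + |11>)/sqrt(2)
])

assert np.allclose(B @ B.T, np.eye(4))        # mutually orthonormal basis

ket00 = np.array([1.0, 0, 0, 0])
assert np.allclose(ket00, s * (B[2] + B[3]))  # Eq. (4.38): |00> is a Bell superposition
print("Bell basis verified")
```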
Entanglement is very mysterious and entangled states have many pecu-
liar and counter-intuitive properties. In an entangled state the individual
spin components have zero expectation value and yet the spins are strongly
correlated. For example in the Bell state |B0 ⟩,
⟨B0 |⃗σ1 |B0 ⟩ = ⃗0 (4.39)
⟨B0 |⃗σ0 |B0 ⟩ = ⃗0 (4.40)
⟨B0 |σ1x σ0x |B0 ⟩ = −1 (4.41)
⟨B0 |σ1y σ0y |B0 ⟩ = −1 (4.42)
⟨B0 |σ1z σ0z |B0 ⟩ = −1. (4.43)
This means that the spins have quantum correlations which are stronger than
is possible classically. In particular,
⟨B0 |⃗σ1 · ⃗σ0 |B0 ⟩ = −3    (4.44)

³ Named after John Bell, the physicist who in the 1960s developed deep insights into the
issues surrounding the concept of entanglement that so bothered Einstein. Bell proposed
a rigorous experimental test of the idea that the randomness in quantum experiments is
due to our inability to access hidden classical variables. At the time this was a theorist’s
‘gedanken’ experiment, but today the ‘Bell test’ has rigorously ruled out the possibility of
hidden variables. Indeed, the Bell test is now a routine engineering test to make sure that
your quantum computer really is a quantum computer, not a classical computer.
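These expectation values can be checked directly. The sketch below evaluates the one- and two-spin correlators of Eqs. (4.39)–(4.44) in |B0⟩ (using the qubit ordering |q1 q0⟩):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# |B0> = (|01> - |10>)/sqrt(2) in the ordering |q1 q0>.
B0 = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

def ev(op):
    """Expectation value <B0| op |B0> (real for a Hermitian op)."""
    return (B0.conj() @ op @ B0).real

# Every single-spin expectation value vanishes, Eqs. (4.39)-(4.40)...
for P in (X, Y, Z):
    assert np.isclose(ev(np.kron(P, I2)), 0)  # spin 1
    assert np.isclose(ev(np.kron(I2, P)), 0)  # spin 0

# ...yet the two-spin correlators are all -1, Eqs. (4.41)-(4.43):
for P in (X, Y, Z):
    assert np.isclose(ev(np.kron(P, P)), -1)

# Hence <sigma1 . sigma0> = -3, Eq. (4.44).
dot = sum(ev(np.kron(P, P)) for P in (X, Y, Z))
print(dot)  # approximately -3
```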
despite the fact that in any product state |ψ⟩ = |n̂1 ⟩ ⊗ |n̂0 ⟩ the correlator
⟨ψ|⃗σ1 · ⃗σ0 |ψ⟩ = n̂1 · n̂0 can never be smaller than −1.
Box 4.4. Joint Measurements of Multiple Qubits One might ask how
the computer architect designs the hardware to make joint measurements of
operators of the form M = σ1x σ0z , etc. First, a general remark: If she wants
to measure M = AB, the product of two observables A and B, then M must
be an observable (i.e., Hermitian).
M † = (AB)† = B † A† = BA,
so that
M † = M ⟺ AB = BA ⟺ [A, B] = 0.
Thus the product M = AB is an observable only if A and B commute. In
that case A and B share a complete set of eigenvectors |j⟩ with
A|j⟩ = aj |j⟩
B|j⟩ = bj |j⟩,
and hence
M |j⟩ = aj bj |j⟩.
We can obtain a useful picture of the Bell states by examining all of the
possible two-spin correlators in the so-called ‘Pauli bar plot’ for the state
|B0 ⟩ shown in Fig. 4.4. We see that all the single-spin operator (e.g. IX
and ZI) expectation values vanish. Because of the entanglement, each spin
is, on average, totally unpolarized. Yet three of the two-spin correlators,
XX, Y Y, ZZ, are all −1, indicating that the two spins are pointing in exactly
opposite directions. This is because |B0 ⟩ is the rotationally invariant ‘spin-
singlet’ state.
Figure 4.4: ‘Pauli bar plot’ of one and two spin correlators in the Bell state |B0 ⟩.
In the state |B0 ⟩ the two spins are perfectly anti-correlated. Suppose now that Alice prepares
two qubits in this Bell state and then sends one of the two qubits to Bob who
is far away (say one light-year). Alice now chooses to make a measure-
ment of her qubit projection along some arbitrary axis n̂. For simplicity let
us say that she chooses the ẑ axis. Let us further say that her measurement
result is −1. Then she immediately knows that if Bob chooses to measure his
qubit along the same axis, his measurement result will be the opposite, +1.
It seems that Alice’s measurement has collapsed the state of the spins from
|B0 ⟩ to |10⟩. This ‘spooky action at a distance’ in which Alice’s measurement
seems to instantaneously change Bob’s distant qubit was deeply troubling to
Einstein [3].
Upon reflection one can see that this effect cannot be used for superlu-
minal communication (in violation of special relativity). Even if Alice and
Bob had agreed in advance on what axis to use for measurements, Alice
has no control over her measurement result and so cannot use it to signal
Bob. It is true that Bob can immediately find out what Alice’s measurement
result was, but this does not give Alice the ability to send a message. In
fact, suppose that Bob’s clock were off and he accidentally made his mea-
surement slightly before Alice. Would either he or Alice be able to tell? The
answer is no, because each would see a random result just as expected. This
must be so because in special relativity, the very concept of simultaneity is
frame-dependent and not universal.
Things get more interesting when Alice and Bob choose different mea-
surement axes. Einstein felt that quantum mechanics must somehow not be
a complete description of reality and that there might be ‘hidden variables’
which if they could be measured would remove the probabilistic nature of
the quantum description. However in 1964 John S. Bell proved a remarkable
inequality [4] showing that when Alice and Bob use certain different measure-
ment axes, the correlations between their measurement results are stronger
than any possible classical correlation that could be induced by (local) hidden
variables. Experimental violation of the Bell inequality proves that it is not
true that quantum observables have values (determined by hidden classical
variables) before we measure them. Precision experimental measurements
which violate the Bell inequalities are now the strongest proof that quantum
mechanics is correct and that local hidden variable theories are excluded.
Perhaps the simplest way to understand this result is to consider the
CHSH inequality developed by Clauser, Horne, Shimony and Holt [5] follow-
ing Bell’s ideas. Consider the measurement axes shown in Fig. 4.5. The
experiment consists of many trials of the following protocol. Alice and Bob
share a pair of qubits in an entangled state. Alice randomly chooses to mea-
sure the first qubit using X or Z while Bob randomly chooses to measure the
second qubit using X ′ or Z ′ which are rotated 45 degrees relative to Alice’s
axes. After many trials (each starting with a fresh copy of the entangled
state), Alice and Bob compare notes (via classical subluminal communica-
tion) on their measurement results and compute the following correlation
function
⟨B0 |XZ ′ |B0 ⟩ ≈ (1/N) Σ_{j=1}^{N} xj zj′ .    (4.48)
If the measurements are perfectly correlated (xj = zj′ every time) then the
correlator will be +1. If perfectly anticorrelated (xj = −zj′ every time), then
the correlator will be −1. If the measurements are uncorrelated, then all four
measurement outcomes will be equally likely and xj zj′ will be fully random
(±1 with equal probability) and the correlator will vanish (on average).
Figure 4.5: Measurement axes used by Alice (solid lines) and Bob (dashed lines) in estab-
lishing the Clauser-Horn-Shimoni-Holt (CHSH) inequality.
Figure 4.6: Illustration of the four possible measurement outcomes in the jth run of
an experiment in which Alice measures X and Bob measures Z ′ . The net correlator
of the two measurements is given by Eq. (4.49).
Alice and Bob note that their measurement results are random variables
which are always equal to either +1 or −1. In a particular trial Alice chooses
randomly to measure either X or Z. If you believe in the hidden variable
theory, then surely, the quantities not measured still have a value of either
+1 or −1 (because when we do measure them, they always are either +1 or
−1). If this is true, then either X = Z or X = −Z. Thus either X + Z
vanishes or X − Z vanishes in any given realization of the random variables.
The combination that does not vanish is either +2 or −2. Hence it follows
−2 ≤ S ≤ +2. (4.51)
It turns out however that the quantum correlations between the two spins
in the entangled pair violate this classical bound. They are stronger than
can ever be possible in any classical local hidden variable theory. To see this,
note that because ⃗σ is a vector, we can resolve its form in one basis in terms
of the other via
σx′ = (1/√2) [σz + σx]    (4.52)
σz′ = (1/√2) [σz − σx] .  (4.53)
Thus we can express S in terms of the ‘Pauli bar’ correlations
S = (1/√2) [⟨XX + XZ + ZX + ZZ⟩ − ⟨XZ − XX − ZZ + ZX⟩] .    (4.54)
For Bell state |B0 ⟩, these correlations are shown in Fig. 4.4 and yield
S = −2√2 ,    (4.55)
in clear violation of the classical bound of Eq. (4.51) obeyed by
local hidden variable theories. We are forced to give up the idea that physical
observables have values before they are observed.
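This violation can be verified numerically. The sketch below uses Bob's rotated axes from Eqs. (4.52)–(4.53); the particular grouping of terms, S = ⟨XX′⟩ + ⟨ZX′⟩ + ⟨ZZ′⟩ − ⟨XZ′⟩, is our assumption here (the equation defining S appears earlier in the text), chosen so that it reduces to Eq. (4.54):

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
Xp = (Z + X) / np.sqrt(2)  # X', Eq. (4.52): Bob's axes rotated by 45 degrees
Zp = (Z - X) / np.sqrt(2)  # Z', Eq. (4.53)

B0 = np.array([0, 1, -1, 0]) / np.sqrt(2)  # singlet, ordering |q1 q0>

def ev(a, b):
    """<B0| a (x) b |B0>: Alice measures qubit 1, Bob measures qubit 0."""
    return B0 @ np.kron(a, b) @ B0

# Assumed CHSH combination of the four correlators:
S = ev(X, Xp) + ev(Z, Xp) + ev(Z, Zp) - ev(X, Zp)
print(S)  # approximately -2*sqrt(2), beyond the classical bound |S| <= 2
```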
Exercise 4.4. Other Bell inequalities.
1. Work out the ‘Pauli bar plots’ (analogous to Fig. 4.4) for each of
the Bell states B1 , B2 , B3 .
2. Using the same quantization axes as in Fig. 4.5, find the analog of
Eq. (4.46) for the correlators that should be measured to achieve
violation of the Bell inequality for these other Bell states.
If Alice and Bob are far apart, Alice is unable to do any operations on Bob’s
qubit, but she can perform a local unitary U0 = I ⊗ U on her qubit q0 . We
can now ask the following question: Given this constraint of local operations,
how many distinct (i.e., orthogonal) states for the combined system can Alice
reach starting from this initial state? Clearly for U0 = I ⊗X, she can produce
which is orthogonal to the original state. This however is the only orthogonal
state Alice can reach. The two-qubit Hilbert space is four-dimensional but
Alice cannot fully explore it. It seems ‘obvious’ that this is because she has
no access to Bob’s qubit, however as we will see, things become much less
obvious when we consider entangled states.
The situation is very different for two-qubit entangled states. We take as
our basis the four orthogonal Bell states in Eqs. (4.34-4.37). Suppose that
Alice prepares the Bell state |B0 ⟩ using qubits q1 , q0 and sends q1 to Bob
who is in a distant location. Using a remarkable protocol called quantum
dense coding [2], Alice can now send Bob two classical bits of information
by performing a local gate on q0 and then sending it to Bob. The protocol
(whose circuit is illustrated in Fig. 4.7) relies on the amazing fact that Alice
can transform the initial Bell state into any of the other Bell states by purely
local operations on her remaining qubit without communicating with Bob.
The four possible unitary operations Alice should perform are I0 , X0 , Y0 , Z0
which yield⁵
I0 |B0 ⟩ = + |B0 ⟩ (4.58)
Z0 |B0 ⟩ = − |B1 ⟩ (4.59)
X0 |B0 ⟩ = + |B2 ⟩ (4.60)
Y0 |B0 ⟩ = −i|B3 ⟩. (4.61)
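These four identities can be checked directly. In the sketch below, q0 is the least-significant qubit, so Alice's local gate U is represented by I ⊗ U:

```python
import numpy as np

s = 1 / np.sqrt(2)
B = [np.array([0, s, -s, 0], dtype=complex),  # |B0>
     np.array([0, s, s, 0], dtype=complex),   # |B1>
     np.array([s, 0, 0, -s], dtype=complex),  # |B2>
     np.array([s, 0, 0, s], dtype=complex)]   # |B3>

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# q0 is the least-significant qubit, so Alice's local gate U is I (x) U.
assert np.allclose(np.kron(I2, I2) @ B[0], B[0])       # Eq. (4.58)
assert np.allclose(np.kron(I2, Z) @ B[0], -B[1])       # Eq. (4.59)
assert np.allclose(np.kron(I2, X) @ B[0], B[2])        # Eq. (4.60)
assert np.allclose(np.kron(I2, Y) @ B[0], -1j * B[3])  # Eq. (4.61)
print("all four Bell states reached by local gates on q0")
```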
It seems somehow miraculous that without touching Bob’s qubit, Alice
can reach all four orthogonal states by merely rotating her own qubit. How-
ever the fact that this is possible follows immediately from Eqs. (4.39) and
(4.40), the Pauli bar plot in Fig. 4.4 and the corresponding plots for the other
Bell states. In every Bell state, the expectation value of every component of
⃗σ0 (and ⃗σ1 ) vanishes. Thus for example
⟨B0 |σ0x |B0 ⟩ = ⟨B0 |I ⊗ X|B0 ⟩ = 0. (4.62)
But this can only occur if the state (σ0x |B0 ⟩) is orthogonal to |B0 ⟩! This in
turn means that there are four possible two-bit messages Alice can send by
associating each with one of the four operations6 I0 , X0 , Y0 , Z0 as shown in
⁵ As usual we are simplifying the notation. For example Z0 stands for the more math-
ematically formal I ⊗ Z since it applies Z to Alice’s qubit, q0 , and the identity to Bob’s
qubit, q1 . Note that the global phase factors are irrelevant to the workings of the protocol.
⁶ The association of a particular operator with each of the four binary numbers is somewhat
arbitrary and was chosen in this case to correspond to a particular choice of Bob’s decoding
circuit which will be described later.
Figure 4.7: Illustration of the quantum dense coding protocol showing the power of
having a prepositioned Bell pair shared by Alice and Bob. Alice has two qubits,
q1 , q0 and on Monday prepares them in an entangled Bell state |B0 ⟩. She sends q1
to Bob. On Tuesday she decides to send Bob a two-bit classical message by choos-
ing one of four possible unitaries U ∈ {I, X, Y, Z} to apply to her remaining qubit,
q0 . This unitary maps the initial Bell state |B0 ⟩ to one of the four orthogonal
Bell states. She then sends q0 to Bob who decodes the Bell pair (by mapping the
four Bell basis states back to the standard computational basis). If Alice chooses
the classical message z1 , z0 , Bob’s decoder outputs a state e^{iϕ(z1 ,z0 )} |z1 z0 ⟩, where
ϕ(z1 , z0 ) is an irrelevant phase factor. Bob then measures the qubits and obtains
from the measurements two classical numbers z1 , z0 corresponding to Alice’s mes-
sage.
Table 4.2.
The reader might now reasonably ask the following question. Before
Alice sends Bob q0 can he, by making local measurements on his qubit q1 ,
detect the fact that Alice’s operation has changed the joint state of their
two qubits? Clearly from the Holevo bound (see Box 2.1) he can learn at
most 1 bit of information so would not be able to fully learn which of the
four operations Alice did, but perhaps he could learn something. If he could,
then special relativity would be violated because a signal would have passed
instantaneously from Alice to Bob, exceeding the bound set by the speed of
light. The answer is a firm no, as discussed in Box 4.5.
Table 4.2: Illustration of the quantum dense encoding protocol, showing the unitary oper-
ation Alice must carry out on her qubit, q0 , the state produced by Bob’s decoder and the
measurement results Bob obtains from which he reads Alice’s two classical bit message.
The extra phase factors in front of the Bell states have no effect on the measurement
results because Bob never has to deal with a superposition of these Bell states.
The upshot of all this is that, while it may appear that there is spooky action
at a distance in the changes that one party can make in an entangled state us-
ing LOCC, these changes are locally invisible to the other party because only
the correlations in measurement results of the two parties actually change.
Computation of these correlations requires (subluminal) classical communi-
cation.
MBell = 0|B0 ⟩⟨B0 | + 1|B1 ⟩⟨B1 | + 2|B2 ⟩⟨B2 | + 3|B3 ⟩⟨B3 |. (4.63)
The four eigenstates of this Hermitian operator are the four (orthonormal)
Bell states, and we see explicitly that Bell state j has eigenvalue j. Thus the
measurement result tells us precisely which Bell state the system is in.
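As a check, we can build MBell from its spectral decomposition and confirm the eigenvalue assignment:

```python
import numpy as np

s = 1 / np.sqrt(2)
B = [np.array([0, s, -s, 0]), np.array([0, s, s, 0]),
     np.array([s, 0, 0, -s]), np.array([s, 0, 0, s])]

# Eq. (4.63): spectral decomposition with eigenvalue j on Bell state j.
M = sum(j * np.outer(b, b) for j, b in enumerate(B))

assert np.allclose(M, M.T)  # Hermitian, hence a valid observable
for j, b in enumerate(B):
    assert np.allclose(M @ b, j * b)  # Bell state j has eigenvalue j
print(np.round(M, 3))
```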
Figure 4.8: Bell-basis measurement circuit comprising a Bell state decoder with a CNOT
and Hadamard gate followed by measurement in the computational basis. This circuit
permits measurement of which Bell state a pair of qubits is in by mapping the states to
the standard basis of eigenstates of σ0z and σ1z .
Exercise 4.7. Prove the following identities for the circuit shown in
Fig. 4.8 where qubit 0 is the control and qubit 1 is the target
Once Bob has mapped the Bell states onto unique computational basis
states, he measures Z0 and Z1 separately, thereby gaining two bits of classical
information and effectively reading the message Alice has sent as shown in
the last column of Table 4.2. Note that the overall sign in front of the
basis states produced by the circuit is irrelevant and not observable in the
measurement process. Also note that to create Bell states in the first place,
Alice can simply run the circuit in Fig. 4.8 backwards.
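The decoding circuit of Fig. 4.8 can be simulated in a few lines. The sketch below applies a CNOT with q0 as control and q1 as target, followed by a Hadamard on q0, and confirms that each Bell state maps to a distinct computational basis state up to an unobservable overall sign:

```python
import numpy as np

s = 1 / np.sqrt(2)
B = [np.array([0, s, -s, 0]), np.array([0, s, s, 0]),
     np.array([s, 0, 0, -s]), np.array([s, 0, 0, s])]

# CNOT with q0 as control and q1 as target, basis order |q1 q0>:
# control q0 = 1 flips q1, i.e. it swaps |01> <-> |11>.
CNOT0 = np.array([[1.0, 0, 0, 0],
                  [0, 0, 0, 1.0],
                  [0, 0, 1.0, 0],
                  [0, 1.0, 0, 0]])
H = s * np.array([[1.0, 1.0], [1.0, -1.0]])
decoder = np.kron(np.eye(2), H) @ CNOT0  # Hadamard acts on q0

for b in B:
    print(np.round(decoder @ b, 6))
# Up to sign, the outputs are distinct computational basis states:
# B0 -> -|11>, B1 -> +|10>, B2 -> +|01>, B3 -> +|00>
```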
Exercise 4.8. Construct explicit quantum circuits that take the start-
ing state |00⟩ to each of the four Bell states.
would yield superluminal communication that would work across any spatial
distance and would not take any time to occur (beyond the time it takes
Bob to clone his qubit a few times) and thus would violate special relativity.
Hence cloning is fundamentally incompatible with relativistic causality.
In fact, cloning would make it possible for Alice to transmit an unlimited
number of classical bits using only a single Bell pair. Alice could choose
an arbitrary measurement axis n̂. The specification of n̂ requires two real
numbers (the polar and azimuthal angles). It would take a very large number
of bits to represent these real numbers to some high accuracy. Now if Bob can
make an enormous number of copies of his qubit, he can divide the copies
into three groups and measure the vector spin polarization ⟨⃗σ ⟩ to arbitrary
accuracy. From this he knows the polarization axis n̂ = ±⟨⃗σ ⟩ that Alice chose (up
to an unknown sign, since he does not know the sign of Alice’s measurement
result for n̂ · ⃗σ ). Hence Bob has learned a large number of classical bits of
information. The accuracy ϵ (and hence the number of bits ∼ log2 (1/ϵ)) is
limited only by the statistical uncertainties resulting from the fact that his
individual measurement results can be random, but these can be reduced to
an arbitrarily low level with a sufficiently large number of copies N ∼ 1/ϵ2
of the state. Note however that the ‘cost’ N is exponential in the number of
bits accurately learned. It would be remarkably useful to transmit multiple
classical bits using a single quantum bit. But, it turns out to be impossible
because of the no-cloning theorem.
Exercise 4.9. Suppose that Bob has the ability to clone his qubit,
but does not have the ability to make perfect measurements. Let there
be a probability 0 < ϵ < 0.5 that his measuring apparatus produces
a result that is the opposite of the true result each time he makes a
measurement (i.e., a measurement of Z in state |0⟩ sometimes yields −1
instead of the correct +1). Suppose he uses N copies of his qubit to
measure Z and N to measure X in order to determine the quantization
axis Alice chose. The failure probability will naturally be higher than
that given in Eq. (4.70). For N ≫ 1, give an estimate of Bob’s failure
probability for determining the quantization axis of the qubit.
for the case p0 = p1 = 1/2. Bob knows he should measure in the Z basis. After
Bob makes his measurement, we need to update the probability distribution
based on the new information received. (See the discussion of Bayes Rule
in App. A.) For example, if Bob’s measurement is z = +1 then the new
probability distribution is p0 = 1, p1 = 0 since there is no randomness left.
The Shannon entropy Spost = 0. Hence the information gained by Bob from
his measurement is given by the decrease in randomness of the distribution
upon measurement
I = Sprior − Spost = 1 bit,
as expected.
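In code, the information gained is just the drop in Shannon entropy; a minimal sketch:

```python
import numpy as np

def shannon(p):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

S_prior = shannon([0.5, 0.5])  # 1 bit of uncertainty before the measurement
S_post = shannon([1.0, 0.0])   # no uncertainty after z = +1 is observed
print(S_prior - S_post)  # 1.0 bit of information gained
```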
which she wishes Bob to be able to obtain without her physically sending it
to him. Again, this must be done at the expense of destroying the state of her
copy of the qubit because of the no-cloning theorem.
Remarkably, Alice is able to teleport her (unknown) state to Bob using
only LOCC, provided that she and Bob share a pre-existing Bell pair. Alice
applies the Bell state measurement protocol illustrated in Fig. 4.8 to deter-
mine the joint state of the unknown qubit and her half of the Bell pair she
shares with Bob. She then transmits two classical bits using an ordinary clas-
sical communication channel relaying her measurement results to Bob (via
the double line wires shown in Fig. 4.9). Note that even if Alice knew what
the state |ψ⟩ was, two classical bits alone are not enough to provide Bob
with the information needed to prepare his own copy of the state |ψ⟩ (since
it takes an infinite number of bits to specify the two angles determining the
position on the Bloch sphere).
To see how Bob is able to reconstruct the initial state using the pre-
existing Bell pair, note that we can rewrite the initial state of the three qubits
in the basis of Bell states for the two qubits that Alice will be measuring as
Figure 4.9: Quantum circuit that Alice can use to teleport an unknown state
to Bob using only two bits of classical information, provided that she and Bob
have shared a Bell pair in advance (in this case Bell state |B3 ⟩). Single wires
indicate quantum channels (qubits), double wires indicate classical information
channels. Based on the measurement results z0 and z1 that Alice sends to Bob,
Bob performs the operation Z z0 X z1 on his half of the Bell pair (q2 ). That is,
if Alice’s measurement result z1 = 1 he performs an X gate and then if Alice’s
measurement result z0 = 1, he performs a Z gate. The quantum state of Alice’s
qubit is destroyed (randomly collapsed) by the measurements and so the no-cloning
theorem is obeyed.
follows, writing the qubits in the order |q2 q1 q0 ⟩ with the Bell pair |B3 ⟩ on
(q2 q1 ) and the unknown state on q0 :

|Φ⟩ = |B3 ⟩ ⊗ [α|0⟩ + β|1⟩]    (4.72)

    = (α/√2) [|000⟩ + |110⟩] + (β/√2) [|001⟩ + |111⟩]    (4.73)

    = (1/2) [β|0⟩ − α|1⟩] ⊗ |B0 ⟩ + (1/2) [β|0⟩ + α|1⟩] ⊗ |B1 ⟩
    + (1/2) [α|0⟩ − β|1⟩] ⊗ |B2 ⟩ + (1/2) [α|0⟩ + β|1⟩] ⊗ |B3 ⟩ ,    (4.74)

where in Eq. (4.74) the first factor in each term is the state of Bob’s qubit q2
and the Bell states refer to Alice’s pair (q1 q0 ).
From this representation we see that when Alice tells Bob which Bell state
she found (from the decoding step), Bob can find a local unitary operation to
perform on his qubit to recover the original unknown state. The appropriate
operations are
Alice’s Bell state (q1 q0 )    Alice’s decoded state (q1 q0 )    Bob’s operation (q2 )
|B0 ⟩                          −|11⟩                             ZX
|B1 ⟩                          +|10⟩                             X
|B2 ⟩                          +|01⟩                             Z
|B3 ⟩                          +|00⟩                             I
Notice that the final (decoded) state of Alice’s two qubits contains no infor-
mation about the quantum amplitudes α, β in Alice’s original state. This
information has been destroyed during the teleportation process and hence
the no-cloning theorem is not violated when Bob obtains a ‘copy’ of Alice’s
state, since he has the only ‘copy.’
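The algebra of Eqs. (4.72)–(4.74) and the correction table can be verified numerically. The sketch below (qubit ordering |q2 q1 q0⟩, random unknown amplitudes) projects Alice's pair onto each Bell state and checks that the tabulated correction restores |ψ⟩ on Bob's qubit up to a global phase:

```python
import numpy as np

# Random normalized unknown state |psi> = alpha|0> + beta|1> on q0.
rng = np.random.default_rng(7)
amps = rng.normal(size=2) + 1j * rng.normal(size=2)
psi = amps / np.linalg.norm(amps)

s = 1 / np.sqrt(2)
B = [np.array([0, s, -s, 0], dtype=complex),  # |B0>
     np.array([0, s, s, 0], dtype=complex),   # |B1>
     np.array([s, 0, 0, -s], dtype=complex),  # |B2>
     np.array([s, 0, 0, s], dtype=complex)]   # |B3>

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Total state: Bell pair |B3> on (q2 q1), unknown state on q0, Eq. (4.72).
Phi = np.kron(B[3], psi)

# Bob's correction for Alice's Bell outcome B0, B1, B2, B3 (table above).
corrections = [Z @ X, X, Z, I2]

for b, U in zip(B, corrections):
    # Project Alice's pair (q1 q0) onto Bell state b; rows of the reshaped
    # amplitude array are indexed by Bob's qubit q2.
    bob = Phi.reshape(2, 4) @ b.conj()
    bob = U @ (bob / np.linalg.norm(bob))
    # Bob's corrected state matches |psi> up to a global phase.
    assert np.isclose(abs(np.vdot(psi, bob)), 1.0)
print("state recovered for all four Bell outcomes")
```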
Notice also the similarities between quantum dense coding and state tele-
portation. Both use a pre-positioned Bell pair. In quantum dense coding
Alice uses one of four operations, I, X, Y, Z on her qubit to send two clas-
sical bits to Bob (which become available when she sends her qubit to Bob
and he measures the pair in the Bell basis to obtain one of four results). In
teleportation, Alice makes a joint measurement of her unknown state |ψ⟩ and
her half of the Bell pair to obtain one of four results and sends the resulting
two classical bits to Bob who then uses that information to apply one of four
unitaries to his half of the Bell pair to reconstruct Alice’s state.
How might quantum teleportation be useful in a quantum computer ar-
chitecture? It is much easier to transmit classical bits than quantum bits
from one part of the computer to another (or between nodes in a quantum
computer cluster). Suppose you have a slow and unreliable channel for trans-
mitting quantum bits that can be used to (slowly and unreliably) distribute
Bell pairs. There exist error-correction protocols to distill a few high-fidelity
Bell pairs from a larger collection of faulty Bell pairs. See the discussion to
be added in Chapter XXXXX. Using this we can preposition high-quality
Bell pairs between distant nodes and then teleport quantum states between
them using only ‘local operations and classical communication’ (LOCC).
We will later study an even more powerful protocol in which quantum
gates on distant qubits can be performed locally and then teleported into a
distant qubit. Like state teleportation, this ‘gate teleportation’ has numerous
applications in quantum computer architectures. For example, the CNOT
logic gate requires an operation on the target qubit conditioned on the state
of the control qubit. This means the two qubits have to physically interact
in some way, something that is most easily accomplished if the qubits are
adjacent to each other in the hardware layout. We can relax this constraint
and perform a CNOT between distant qubits using gate teleportation.
2. Show that there is NO state for which Alice can change the probability
of the outcomes for Bob’s Z measurement by applying any unitary to
her qubit. (Because operators on different qubits commute.) However
Alice can change Bob’s probability by making a measurement of her
qubit. But, she can’t control the outcome of the measurement, so there
is no superluminal communication.
3. The value measured for the observable is always one of the eigenvalues
of that observable and the state always collapses to the corresponding
eigenvector. (If two or more of the eigenvalues are degenerate, then
the situation is slightly more subtle. The state is projected onto the
degenerate subspace and then normalized.) Thus if the measurement
result is the non-degenerate eigenvalue λj then the state |ψ⟩ collapses
to
|ψ⟩ → Pj |ψ⟩ / √⟨ψ|Pj |ψ⟩ ,
where the square root in the denominator simply normalizes the col-
lapsed state. This projector formalism helps avoid a common confusion:
measurement of X is not represented by multiplying the state by X.
Note also that Pauli operators play two roles: they are both unitary
operations and Hermitian observables.
If one measures ⃗σ · n̂ (i.e. asks ‘Are you in state | + n̂⟩ or state | − n̂⟩?),
then from the Born rule, the probability that the measurement result
is ±1 is |⟨±n̂|ψ⟩|^2. If you make a measurement of a product state,
say |00⟩ that asks, ‘Which entangled Bell state are you in?’ the state
will collapse to one of the Bell states and the collapse (aka ‘measurement
back action’) leaves the state entangled. Note that this does not work
with the Bell-state measurement scheme that relies on decoding into the
standard basis.
Chapter 5

Algorithms and Complexity Classes
The very first quantum algorithm was proposed by Deutsch in 1985 and so
it will be the first that we study. This was followed by the Deutsch-Jozsa
algorithm in 1992, the first algorithm that was exponentially faster than any
deterministic classical algorithm. We will learn, however, that probabilistic
classical algorithms can be much faster than deterministic ones, provided
that one can accept a small chance of failure. Motivated by this, Simon
invented a problem for which even probabilistic classical algorithms take an
exponentially long time, and he found a fast quantum algorithm with bounded
failure probability that runs in polynomial time.
These ‘toy’ problems and quantum algorithms were expressly invented to
be difficult for classical computers and easy for quantum computers in order
to demonstrate the possibilities of quantum hardware. They are not practi-
cally useful algorithms but they paved the way for all subsequent ones which
have been invented, including most famously, the Grover search algorithm for
unstructured databases and Shor’s algorithm for finding the prime factors of
large numbers–a task that is required to break RSA public key encryption.
We will study a key component of Shor’s algorithm, the quantum Fourier
transform, since it has wide application.
Before beginning our study of algorithms we will need to understand two
key concepts: ‘phase kickback’ of a controlled unitary operation and the
concept of a quantum oracle.
NOT simply reproduces the classical truth table. The same can be said for
the controlled-NOT (CNOT) operation which applies NOT to a target qubit
iff the control qubit is in |1⟩.
When we put the control bit of a CNOT circuit (see Fig. 5.1) into a
superposition state, we obtain a new non-trivial quantum effect, entanglement
CNOT0 |0⟩|+⟩ = (1/√2)[ |00⟩ + |11⟩ ].   (5.3)
(Recall that the subscript on the CNOT tells us which qubit is the control.)
Figure 5.1: Circuit applying a CNOT gate with the control qubit q0 being in a
superposition state so that the initial product state is mapped onto an entangled
Bell state.
Something else interesting happens when the controlled-NOT gate is applied
when both control and target are in superposition states as illustrated
in Fig. 5.2. Notice that |±⟩ are eigenstates of the NOT operation
Figure 5.2: Upper panels: Circuit applying a CNOT gate with the control qubit q0
and the target qubit q1 being in a superposition state. Because the target is an
eigenstate of the NOT operation, it is actually the control qubit that gets flipped
(due to the phase kickback)! Lower panels: Using the Hadamard gates to change
from the Z basis to the X basis interchanges the control and target of the CNOT
gate.
Here we see a very non-classical effect: we can flip a qubit and leave it in
exactly the same state (up to a possible global phase factor)! It turns out
that if we do a controlled NOT operation on |±⟩ the eigenvalue of NOT is a
relative phase (not a global phase) that gets ‘kicked back’ onto the control.
The circuits shown in Fig. 5.2 illustrate the peculiar effect that results. It
seems that in the X basis the role of target and control are reversed!
H ⊗ H CNOT0 H ⊗ H = CNOT1 . (5.6)
That this reversal is due to the phase kickback can be seen from the following

CNOT0 | + +⟩ = (1/2)[ |0⟩ + |1⟩ ] ⊗ |0⟩ + (1/2)(X ⊗ I)[ |0⟩ + |1⟩ ] ⊗ |1⟩
             = | + +⟩   (5.7)

CNOT0 | − +⟩ = (1/2)[ |0⟩ − |1⟩ ] ⊗ |0⟩ + (1/2)(X ⊗ I)[ |0⟩ − |1⟩ ] ⊗ |1⟩
             = (1/2)[ |0⟩ − |1⟩ ] ⊗ |0⟩ − (1/2)[ |0⟩ − |1⟩ ] ⊗ |1⟩
             = | − −⟩   (5.8)
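The identity (5.6) and the kickback results (5.7)-(5.8) are easy to verify numerically. The following sketch (not from the text) checks them with explicit 4×4 matrices; the qubit ordering |q1⟩ ⊗ |q0⟩, with q0 the least-significant bit of the basis index, is an assumed convention.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
e0, e1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = H @ e0, H @ e1

# CNOT with control q0 / target q1, and CNOT with control q1 / target q0
CNOT0 = np.array([[1,0,0,0],[0,0,0,1],[0,0,1,0],[0,1,0,0]], dtype=float)
CNOT1 = np.array([[1,0,0,0],[0,1,0,0],[0,0,0,1],[0,0,1,0]], dtype=float)

HH = np.kron(H, H)
assert np.allclose(HH @ CNOT0 @ HH, CNOT1)               # Eq. (5.6)

bell = (np.kron(e0, e0) + np.kron(e1, e1)) / np.sqrt(2)
assert np.allclose(CNOT0 @ np.kron(e0, plus), bell)      # Eq. (5.3)
assert np.allclose(CNOT0 @ np.kron(plus, plus), np.kron(plus, plus))     # Eq. (5.7)
assert np.allclose(CNOT0 @ np.kron(minus, plus), np.kron(minus, minus))  # Eq. (5.8)
```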
Figure 5.3: Circuit applying a general controlled unitary to the state of M qubits.
If the M qubits are in an eigenstate of U with eigenvalue e^{iφ}, the phase kickback
on the control rotates it around the z axis by angle φ.
This phase kickback causes (and thus can be detected by) a rotation of the
control qubit by an angle φj around the z axis. For the case of the controlled-
NOT gate studied above, the unitary is X and its eigenvalues are ±1, so the phase
kickback is either 0 or π.
satisfy the Euler-Pauli identity in Eq. (5.11) and are generally natively avail-
able on quantum computing hardware. Hence we don’t usually need a special
exponentiation gadget to execute such gates. (Note that the exponentiation
gadget in fact uses one Xθ gate, which is a single-qubit rotation around the x
axis.) However if U is a more complex (Hermitian) unitary such as the three-
qubit operation U = Z ⊗ X ⊗ Z, the ability to exponentiate it is unlikely
to be natively available on the hardware. If however we are able to execute
a controlled version of this unitary (conditioned on the state of a separate
control qubit), then we can take advantage of the exponentiation gadget in
Fig. 5.4.
To see in more detail how the exponentiation gadget works, let |ψ⟩ repre-
sent the state of qubits 1 through M . The state of the system after the first
controlled unitary is
(1/√2)[ |ψ⟩ ⊗ |0⟩ + U |ψ⟩ ⊗ |1⟩ ].   (5.13)
Using the Euler-Pauli identity, we see that after the Xθ gate the state of the
system is
(1/√2){ |ψ⟩ ⊗ [cos(θ/2)|0⟩ − i sin(θ/2)|1⟩] + U|ψ⟩ ⊗ [cos(θ/2)|1⟩ − i sin(θ/2)|0⟩] }.   (5.14)
Using the fact that U 2 = I, we see that after the second controlled unitary,
the state of the system is
cos(θ/2) |ψ⟩ ⊗ |+⟩ − i sin(θ/2) U|ψ⟩ ⊗ |+⟩,
which the final Hadamard on the control qubit maps to
[cos(θ/2) I − i sin(θ/2) U] |ψ⟩ ⊗ |0⟩,   (5.15)
which reproduces Eq. (5.11). Notice that for θ = 0, this is the identity gate.
This makes sense because in this case, the Xθ gate in Fig. 5.4 is the identity
and the two controlled unitaries either act zero times or twice, in both cases
yielding identity for the overall circuit.
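The gadget can also be checked as a bare matrix identity. In this sketch (an illustration, not the text's own code) the control/ancilla qubit is taken to be the leftmost tensor factor, an assumed convention, and U = Z ⊗ X ⊗ Z is the three-qubit Hermitian unitary mentioned above.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

U = np.kron(Z, np.kron(X, Z))        # Hermitian unitary on M = 3 qubits
I8 = np.eye(8)
P0 = np.diag([1.0, 0.0])             # |0><0| on the ancilla
P1 = np.diag([0.0, 1.0])             # |1><1| on the ancilla
cU = np.kron(P0, I8) + np.kron(P1, U)

theta = 0.73
Xtheta = np.cos(theta/2)*I2 - 1j*np.sin(theta/2)*X   # Xtheta = exp(-i theta X / 2)

# Gadget of Fig. 5.4: H, controlled-U, Xtheta on the ancilla, controlled-U, H
gadget = np.kron(H, I8) @ cU @ np.kron(Xtheta, I8) @ cU @ np.kron(H, I8)

rng = np.random.default_rng(0)
psi = rng.normal(size=8) + 1j*rng.normal(size=8)
psi /= np.linalg.norm(psi)
anc0 = np.array([1.0, 0.0])

# Since U^2 = I, exp(-i theta U / 2) = cos(theta/2) I - i sin(theta/2) U
expU = np.cos(theta/2)*I8 - 1j*np.sin(theta/2)*U
assert np.allclose(gadget @ np.kron(anc0, psi), np.kron(anc0, expU @ psi))
```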
As a simple but practical example, suppose that U is the three-qubit
unitary U = Z ⊗ Z ⊗ Z. By itself, this operator cannot take a product state
into an entangled state. For example
U (|+⟩ ⊗ |+⟩ ⊗ |+⟩) = |−⟩ ⊗ |−⟩ ⊗ |−⟩ (5.16)
takes a product state to a product state. The fact that we end up with a
product state remains true even if we exponentiate the individual gates since
[e^{−i(θ2/2)Z} ⊗ e^{−i(θ1/2)Z} ⊗ e^{−i(θ0/2)Z}] [ |+⟩ ⊗ |+⟩ ⊗ |+⟩ ] =
[e^{−i(θ2/2)Z} |+⟩] ⊗ [e^{−i(θ1/2)Z} |+⟩] ⊗ [e^{−i(θ0/2)Z} |+⟩].   (5.17)
Figure 5.4: Left panel: Circuit representing the exponentiation gadget that expo-
nentiates a unitary U provided that it is also Hermitian, U = U†. The single-qubit
gate Xθ ≡ e^{−i(θ/2)X} is a rotation by angle θ about the x axis. Right panel: circuit to
synthesize the controlled ZZZ unitary U using three cZ gates.
Figure 5.5: Upper panel: The exponentiation gadget shown in the left panel of
Fig. 5.4, for the particular case of U = Z1 Z2 Z3 . Lower panel: A completely
equivalent alternative circuit, reminiscent of the multiqubit measurement circuit
in Fig. 4.3.
has only eigenvalues ±1 and can be mapped onto a single auxiliary qubit
using this trick and thus any such unitary can be exponentiated.
Figure 5.6: Circuit representation of a generic oracle O that performs a unitary
transformation based on (i.e., parametrized by) classical data supplied to it.
⃗b ⃗c
000 001
001 010
010 011
011 100
100 101
101 110
110 111
111 000
Table 5.1: Data defining the binary function f(⃗b) = ⃗c, that is equivalent to F(k) = (k + 1)
mod 8 for integer k ∈ {0, . . . , 7}.
implies that
For the AND gate, the particular function used had two input bits and one
output bit: f (x, y) = x ∧ y, but the reversibility argument applies to any
binary function. This guarantees that O^2 = I and thus the inverse exists,
O^{−1} = O. This also shows that the only allowed eigenvalues of O are ±1.
Furthermore, if we represent O as a matrix acting on a column vector
representing |d⃗⟩ ⊗ |⃗b⟩, it will yield a new vector representing |d⃗ ⊕ f(⃗b)⟩ ⊗ |⃗b⟩.
Similarly,
For the case ⃗b′ ̸= ⃗b, the orthogonality follows immediately, independent of
any property of the function f. For the case ⃗b′ = ⃗b, d⃗′ ̸= d⃗, it follows from
f(⃗b′) = f(⃗b) that |ψf⟩ and |ψ′f⟩ involve different bit strings d⃗ ⊕ f(⃗b) and
d⃗′ ⊕ f(⃗b) and are therefore orthogonal. This together with the eigenvalues of
O being ±1 also proves unitarity.
Thus we have found a construction for creating a unitary oracle based on
classical data supplied in the form of a classical binary function f, even if that
function is itself not reversible. The reversibility relies on f (⃗b) ⊕ f (⃗b) = ⃗0
which is true for any binary function.
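This construction is easy to exercise numerically. The sketch below (an illustration, not the text's code) builds the oracle O|d⃗⟩|⃗b⟩ = |d⃗ ⊕ f(⃗b)⟩|⃗b⟩ as a permutation matrix for the function f(b) = (b+1) mod 8 of Table 5.1 and verifies the properties just claimed; the basis convention index = d·2^n + b is an assumption.

```python
import numpy as np

n = m = 3                                   # 3-bit input and output registers

def f(b):
    return (b + 1) % 8                      # the function of Table 5.1

dim = 2**(n + m)
O = np.zeros((dim, dim))
for d in range(2**m):
    for b in range(2**n):
        # |d, b> -> |d (+) f(b), b>   (^ is bitwise XOR on integers)
        O[(d ^ f(b)) * 2**n + b, d * 2**n + b] = 1.0

assert np.allclose(O @ O, np.eye(dim))      # O^2 = I, so O^{-1} = O
assert np.allclose(O @ O.T, np.eye(dim))    # O is unitary (a permutation matrix)
```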
Figure 5.7: Circuit for a quantum oracle constructed from classical data defining
the binary function f. The input register |⃗b⟩ (wires q0 through qn) passes through
unchanged, while the output register (wires qn+1 through qn+m) maps
|d⃗⟩ → |⃗c⟩ = |d⃗ ⊕ f(⃗b)⟩.
Again, the simplest example of all this is the AND gate, or equivalently,
the Toffoli gate (or CCNOT gate) shown in Fig. 1.6. This gate can be carried
over directly from the (reversible version of the) classical gate to the quantum
gate construction shown in Fig. 5.7, whose action is defined by Eq. (5.23).
Exercise 5.1. Construct a quantum oracle corresponding to the clas-
sical function that adds two one-bit binary numbers (a0) and (b0) to
obtain one two-bit binary number (c1 c0).
individual values of f (0) and f (1) but only needs to know one bit of global
information about the function, namely the value of g = f (0) ⊕ f (1) since:
g = 0 ⇐⇒ f is CONSTANT (5.32)
g = 1 ⇐⇒ f is BALANCED. (5.33)
Again it is clear that classically one has to query the oracle twice to learn
the value of this global property of the function. While this is only a toy
problem, it is remarkable that one can learn a global property of a function
with only a single (quantum) query!
Let us now see how the Deutsch quantum algorithm works. Since this is
a quantum oracle, Bob is free to give a superposition state as input
|ψin⟩ = |d⟩ ⊗ |+⟩ = |d⟩ ⊗ (1/√2)[ |0⟩ + |1⟩ ],   (5.34)
where d is the bit value at the input to the oracle (on what will be the
eventual output line) as shown in Fig. 5.8. It follows from the linearity of
unitary transformations that the output state is
|ψout⟩ = (1/√2)[ |d ⊕ f(0)⟩ ⊗ |0⟩ + |d ⊕ f(1)⟩ ⊗ |1⟩ ].   (5.35)
We see that there is information about both f (0) and f (1) in the output
state even though we have queried the oracle only once. We have asked a
‘superposition of questions.’
Our task is to now harness this result to achieve Bob’s goal. Unfortu-
nately, if we stop here and measure the state in the computational basis, it
will collapse randomly into either the state
or the state
From the measurement result (and knowing the initial state of the d qubit)
we will randomly learn either the value of f (0) or the value of f (1). Even
though the state before the measurement contained information about both
f (0) and f (1), the measurement has not captured the global information we
seek.
To remedy this situation we need to put both the b and d input qubits
into superposition
|ψin⟩ = |−⟩ ⊗ |+⟩ = (1/2)[ |0⟩ ⊗ |0⟩ + |0⟩ ⊗ |1⟩ − |1⟩ ⊗ |0⟩ − |1⟩ ⊗ |1⟩ ],   (5.38)
for which the oracle yields
|ψout⟩ = (1/2)[ |0 ⊕ f(0)⟩ ⊗ |0⟩ + |0 ⊕ f(1)⟩ ⊗ |1⟩ − |1 ⊕ f(0)⟩ ⊗ |0⟩ − |1 ⊕ f(1)⟩ ⊗ |1⟩ ].   (5.39)
To make further progress, let us consider the two cases:
Case I: f is constant.
In this case f (0) = f (1) and f (0) ⊕ f (1) = 0. Thus we can rewrite the
output state as
|ψout⟩ = [ (|0 ⊕ f(0)⟩ − |1 ⊕ f(0)⟩)/√2 ] ⊗ [ (|0⟩ + |1⟩)/√2 ]   (5.40)
       = (±|−⟩) ⊗ (|+⟩).   (5.41)
Note that because the function f is constant, the state |c⟩ is going to be
±|−⟩. The (unobservable) ± phase factor is determined by whether the con-
stant function is ONE or ZERO. As we will see shortly, the information Bob
seeks lies in the state |b⟩.
Case II: f is balanced.
In this case we know that f(0) ⊕ f(1) = 1, which implies
0 ⊕ f (0) = 1 ⊕ f (1) (5.42)
0 ⊕ f (1) = 1 ⊕ f (0). (5.43)
Using these relations, Eq. (5.39) can be rewritten
|ψout⟩ = (1/√2)[ |0 ⊕ f(0)⟩ ⊗ |−⟩ + |0 ⊕ f(1)⟩ ⊗ (−|−⟩) ]   (5.44)
       = (±|−⟩) ⊗ (|−⟩).   (5.45)
Note that the dummy variable x here ranges over {0, 1} and does not repre-
sent anything to do with eigenvectors of the Pauli X operator.
To find the state after the final Hadamard gate, it is useful to invoke the
following handy identity which can be readily verified by hand for the two
cases x = 0 and x = 1
H|x⟩ = (1/√2) Σ_{z=0,1} (−1)^{xz} |z⟩.   (5.47)
These results are perfectly consistent with Eqs. (5.41) and (5.45) once you
take into account the final Hadamard gate that is included in Fig. 5.8
but not present in Fig. 5.7.
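The whole Deutsch circuit can be simulated in a few lines. This sketch (an illustration, not the text's own code) treats the oracle as a black-box permutation O|d⟩|b⟩ = |d ⊕ f(b)⟩|b⟩ rather than as an explicit gate decomposition; the basis index 2·d + b is an assumed convention.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
plus, minus = H @ np.array([1.0, 0.0]), H @ np.array([0.0, 1.0])

def oracle(f):
    """Permutation matrix for O|d>|b> = |d (+) f(b)>|b>."""
    O = np.zeros((4, 4))
    for d in (0, 1):
        for b in (0, 1):
            O[2*(d ^ f(b)) + b, 2*d + b] = 1.0
    return O

def deutsch(f):
    psi = np.kron(minus, plus)          # |d> = |->, |b> = |+>, Eq. (5.38)
    psi = oracle(f) @ psi
    psi = np.kron(I2, H) @ psi          # final Hadamard on the b qubit
    prob_b1 = abs(psi[1])**2 + abs(psi[3])**2   # P(b measured as 1)
    return 'balanced' if prob_b1 > 0.5 else 'constant'

ZERO, ONE = (lambda x: 0), (lambda x: 1)
IDENTITY, NOT = (lambda x: x), (lambda x: 1 - x)
assert deutsch(ZERO) == deutsch(ONE) == 'constant'
assert deutsch(IDENTITY) == deutsch(NOT) == 'balanced'
```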
Exercise 5.2. The oracle for the Deutsch algorithm encodes one of
four possible functions, ZERO, ONE, IDENTITY, and NOT. Construct
explicit quantum circuits to realize each of these oracle functions.
Figure 5.8: Circuit that executes the Deutsch algorithm to determine with a single
query whether the function f is constant or balanced. The classical data defining
the function f is encoded in the oracle Of . Measurement of qubit b yielding result
z = +1 tells us that f is constant, while measurement result z = −1 tells us that
f is balanced.
encoding the function in a quantum oracle and querying the oracle only a
single time. This is a toy model but a highly instructive one. The Deutsch-
Jozsa algorithm also involves a toy problem, but a more sophisticated one,
designed to show off an exponential separation between the power of the
best deterministic classical algorithm for the problem and a simple quantum
algorithm.
From our study of the Deutsch algorithm we know that there are four
binary functions f : {0, 1} → {0, 1} and that two of them are constant and
two of them are balanced. For Deutsch-Jozsa we will study functions that
map n bits to one bit: f : {0, 1}n → {0, 1} where the situation is more
complicated. Let ⃗x, ⃗y be vectors in {0, 1}n (i.e., binary strings of length n).
Alice chooses a function f such that f (⃗x) = 0 (say) for some ⃗x and f (⃗y ) = 1
(say) for some ⃗y . Recall that the number of functions on n bits into m bits is
Z(n, m) = 2^{m·2^n}. In this case m = 1, so Z = 2^{2^n}. Out of this enormous space
of possible functions that Alice could choose, only two of them are constant,
ZERO : f (⃗x) = 0 ∀⃗x, (5.55)
ONE : f (⃗x) = 1 ∀⃗x. (5.56)
A balanced function is defined as before. Of the 2n possible input strings, it
outputs 0 for half of its 2n possible arguments and outputs 1 for the other
half of its arguments. If we look at the example of n = 2, which is listed
in its entirety in Table 1.4, there are Z(2, 1) = 2^{2^2} = 16 possible functions.
Of these, two are constant, six are balanced, and eight are neither constant nor
balanced. For larger n, the vast majority of the functions are neither constant
nor balanced. (See Ex. 5.3).
In the Deutsch-Jozsa problem, Alice gives Bob a promise that the function
f she has chosen for her oracle maps n bits to one bit and is either constant
or balanced. (Note that since most functions are neither, this is an important
promise.) Bob’s task is to discover where the f is constant or balanced The
circuit for the Deutsch-Jozsa algorithm is essentially the same as in Fig. 5.8,
except the upper wire is replaced by n input wires in a product state in which
each qubit in the |+⟩ state. At the output, a Hadamard is applied to each of
the upper wires followed by a Z measurement of each of the n upper wires.
We begin our analysis of the circuit by noting that the input product
state of the upper n wires can be written as an equal superposition of all
computational basis states
|+⟩^{⊗n} = (1/√(2^n)) Σ_⃗x |⃗x⟩,   (5.57)
where ⃗x is a binary string of length n and the sum is over all 2n such strings.
The analog of Eq. (5.46) for the state of the circuit after application of
the oracle is thus
|ψ1⟩ = Of [ (|0⟩ − |1⟩)/√2 ⊗ (1/√(2^n)) Σ_⃗x |⃗x⟩ ]
     = Σ_⃗x (1/√2)[ |0 ⊕ f(⃗x)⟩ − |1 ⊕ f(⃗x)⟩ ] ⊗ (1/√(2^n)) |⃗x⟩
     = |−⟩ ⊗ (1/√(2^n)) Σ_⃗x (−1)^{f(⃗x)} |⃗x⟩,   (5.58)
See App. B.4 for a discussion of the inner product for the vector space of bit
strings.
Using this identity, the output state of the circuit before the final mea-
surements is
|ψ2⟩ = |−⟩ ⊗ (1/2^n) Σ_⃗x Σ_⃗z (−1)^{f(⃗x)+⃗x.⃗z} |⃗z⟩.   (5.61)
By analogy with what we did in the case of the Deutsch algorithm, let us
look at the amplitude of the state |−⟩ ⊗ |⃗z = ⃗0⟩
a0 = (1/2^n) Σ_⃗x (−1)^{f(⃗x)}.   (5.62)
If f is constant, a0 = (−1)^{f(⃗0)} = ±1, and so all other amplitudes vanish, and
the only possible measurement outcome is the all zeroes bit string ⃗0. If f is
balanced then a0 = 0 and the all zeroes measurement result never occurs.
Thus if any of the measurement results is non-zero, we are guaranteed that
the function is balanced, and if all the measurement results are zero, we are
guaranteed that the function is constant.
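These measurement statistics can be confirmed with a minimal numerical sketch (not the text's own code): by Eq. (5.58) the upper register before the final Hadamards carries amplitudes (−1)^{f(⃗x)}/√(2^n), so we can apply H^{⊗n} and read off the outcome probabilities directly. The function values and the choice n = 3 are illustrative.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def dj_outcome_probs(f_values):
    """f_values[x] = f(x) for x = 0 .. 2^n - 1; returns P(measuring z)."""
    n = int(np.log2(len(f_values)))
    amp = (-1.0)**np.array(f_values) / np.sqrt(2**n)   # Eq. (5.58) amplitudes
    Hn = np.array([[1.0]])
    for _ in range(n):
        Hn = np.kron(Hn, H)
    return (Hn @ amp)**2

n = 3
assert np.isclose(dj_outcome_probs([0]*2**n)[0], 1.0)    # constant: always all-zeros
balanced = [0]*2**(n-1) + [1]*2**(n-1)
assert np.isclose(dj_outcome_probs(balanced)[0], 0.0)    # balanced: never all-zeros
```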
Just as for the Deutsch algorithm, a single query of the quantum oracle
tells us the global property of the function. Thus the performance of the two
quantum algorithms is essentially the same. The difference here lies in the
difficulty of the classical algorithm to solve these two problems. The classi-
cal algorithm requires only two oracle queries to solve the Deutsch problem
with certainty. Indeed two queries is enough for Bob to completely specify
the function Alice chose, not just whether it is constant or balanced. The
Deutsch-Jozsa problem is however much harder since it maps n bits to one
bit rather than just one bit to one bit. To completely learn Alice’s function, Bob
would have to query the function 2n times, once for each possible argument
⃗x of the function to learn all of the f (⃗x) values. How many queries must
Bob make to learn only whether the function is constant or balanced? The
worst-case classical scenario is when the function is such that Bob queries
the oracle 2n /2 times (with different arguments) and gets the same result
(say 0) every time. This means the function could be constant (if all of the
remaining queries return 0) or it could be balanced (if all of the remaining
queries return 1). Since Alice has given Bob a promise that the function is
either constant or balanced, he needs to measure only one more value of the
function to be certain of the answer. Thus the so-called ‘query complexity’
of the best classical algorithm is 2^n/2 + 1, which is exponentially larger than
that of the quantum algorithm.
Interestingly, one can use a slightly different measure of the degree of dif-
ficulty of the classical algorithm and obtain a very different answer. Suppose
that we seek a classical algorithm which is stochastic (i.e., involves random-
ness in some way) and produces the correct answer with probability 1 − ϵ.
We can ask how the query complexity grows as we make the acceptable error
probability ϵ smaller and smaller. It turns out that Turing machines (uni-
versal classical computers) that contain a source of (true) randomness can
create stochastic algorithms that are more powerful than purely determin-
istic Turing machines. To understand this, suppose that Alice and Bob are
adversaries and Alice is trying to choose functions f that make Bob’s task as
difficult as possible. If she knows that Bob always deterministically orders his
queries in a certain way (say first querying f(0000), then f(0001), then f(0010), etc.
in ascending order), then she can choose functions that produce the worst-
case scenario discussed above. Bob however can defeat this strategy if his
queries use truly randomly chosen (but non-repeating) arguments ⃗x. If he
ever sees any two values of the function that are different from each other,
he knows that the function is balanced. If the function is balanced and the
arguments ⃗x are chosen randomly, then any given query result f (⃗x) is equally
likely to be 0 or 1. Suppose Bob is unlucky and after M queries he has seen
f (⃗x) = 1 (say) every time, strongly suggesting to Bob that the function is
constant. The probability that a balanced function would give this result
(thereby fooling Bob)
where the extra factor of 2 is from the fact that the first M measurement
results could be all 0’s or all 1’s. This equation is reminiscent of Eq. (4.70)
presented in the discussion of cloning and superluminal communication.
Note that Eq. (5.63) assumes that ⃗x is chosen randomly and allows for the
possibility that the same bit string could be chosen more than once. It would
be (slightly) better if Bob chose ⃗x randomly and then removed it from his
list so that there would be no repeats. Then the failure probability would
be slightly lower because he can’t be fooled by the same bit strings a second
time. Imagine you have a box filled with N = 2^n numbers, half of them being
0 and half of them 1. The probability that the first one you draw from
the box is a 0 is N0/N = 1/2, where N0 = N/2 is the initial number of 0’s. After
discarding that number, the probability that the second number is also a 0
is (N0 − 1)/(N − 1) = 1/2 − 1/[2(N − 1)]. Continuing this process we see that the failure probability
where
is a fixed binary vector that Alice chooses. Bob’s task is to learn ⃗u with the
smallest possible number of queries of the function f . Before studying the
quantum algorithm, let us think about the query complexity in the classical
case. The vector ⃗u is unknown to Bob since Alice has chosen it. Bob can
query the function with a sequence of input vectors each of which contains
only a single non-zero entry
and with each query will learn one bit in the string ⃗u because
f (⃗xj ) = uj . (5.70)
Since Alice could have chosen the bits in the string ⃗u at random, there is no
shortcut that can reduce the classical query complexity below n, the length
of the bit string.
Remarkably, the Bernstein-Vazirani quantum algorithm can find ⃗u (with
certainty) with only a single query to the quantum oracle that encodes the
function f ! The circuit is the same one used for the Deutsch-Jozsa algorithm
and the output is given by Eq. (5.61). However because of the particular
form of the function given in Eq. (5.66), we can say more about the solution
|ψ2⟩ = |−⟩ ⊗ (1/2^n) Σ_⃗x Σ_⃗z (−1)^{f(⃗x)+⃗x.⃗z} |⃗z⟩
     = |−⟩ ⊗ (1/2^n) Σ_⃗x Σ_⃗z (−1)^{⃗u.⃗x+⃗x.⃗z} |⃗z⟩
     = |−⟩ ⊗ (1/2^n) Σ_⃗x Σ_⃗z (−1)^{(⃗u⊕⃗z).⃗x} |⃗z⟩
     = |−⟩ ⊗ |⃗u⟩.   (5.71)
The last line follows from the fact that if ⃗y = ⃗u ⊕ ⃗z ̸= ⃗0 then the function
g(⃗x) = ⃗y .⃗x is necessarily balanced (see Box 5.1), meaning that
Σ_⃗x (−1)^{g(⃗x)} = 0   (5.72)
the control and target qubits, thereby giving himself control of the situation
and flipping only those input bits j for which uj = 1, thereby decoding Alice’s
hidden bit string in a single query of the oracle!
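The certainty of Eq. (5.71) is easy to verify numerically. In this sketch (an illustration, not the text's code), the state H^{⊗n} Σ_⃗x (−1)^{⃗u.⃗x}|⃗x⟩/√(2^n) is built directly and shown to be exactly |⃗u⟩; the string u = 101101 is the example used in Fig. 5.9.

```python
import numpy as np

n = 6
u = 0b101101                           # Alice's hidden string, as in Fig. 5.9

def dot2(a, c):                        # bit-string dot product mod 2
    return bin(a & c).count('1') % 2

# Upper-register amplitudes after the oracle, per Eq. (5.71)
amp = np.array([(-1.0)**dot2(u, x) for x in range(2**n)]) / np.sqrt(2**n)

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hn = np.array([[1.0]])
for _ in range(n):
    Hn = np.kron(Hn, H)

probs = (Hn @ amp)**2
assert np.isclose(probs[u], 1.0)       # a single query yields u with certainty
```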
g(⃗x) + g(⃗x′) = 1
(−1)^{g(⃗x)} + (−1)^{g(⃗x′)} = 0
Σ_⃗x (−1)^{g(⃗x)} = 0,   (5.74)
and thus g(⃗x) is balanced. Boxes 5.2 and 5.3 present more complex but
interesting and informative alternative proofs.
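The lemma is also easy to confirm by brute force for small n; the following check (an illustration only, with n = 5 chosen arbitrarily) exhausts all nonzero ⃗y.

```python
# Balanced Function Lemma check: for every nonzero y, g(x) = y.x (mod 2)
# takes the values 0 and 1 equally often, so the signed sum over x vanishes.
n = 5
for y in range(1, 2**n):
    signed_sum = sum((-1)**(bin(y & x).count('1') % 2) for x in range(2**n))
    assert signed_sum == 0
```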
Figure 5.9: Circuit realization of the Bernstein-Vazirani oracle for the function
f (⃗b) = ⃗u.⃗b with (in this case) ⃗u = (u5 u4 u3 u2 u1 u0 ) = (101101). The CNOT gates
all have d0 as their target, but the controls are limited to those wires j for which
uj = 1. Thus the CNOT will flip d0 only if bj uj = 1 and the net result after all
the CNOTs have been applied is d0 → d0 ⊕ ⃗u.⃗b.
Figure 5.10: Left panel: Circuit realizing the Bernstein-Vazirani oracle and al-
gorithm for the function f (⃗b) = ⃗u.⃗b with (in this case) ⃗u = (u5 u4 u3 u2 u1 u0 ) =
(101101). The CNOT gates all have d0 as their target, but the controls are limited
to those wires j for which uj = 1. Thus the CNOT will flip d0 only if bj uj = 1
and the net result after all the CNOTs have been applied is d0 → d0 ⊕ ⃗u.⃗b. Right
panel: Mermin’s analysis of the same circuit obtained by conjugating each CNOT
with identity operations H 2 = I. The controls are now all d0 = 1 and the targets
are those input lines j for which uj = 1. Thus the output register maps to ⃗b = ⃗u.
Let S be the set of indices j for which yj = 1. Since ⃗y ̸= ⃗0, this set is not
empty. Let S̄ be its complement (i.e., the set of indices k for which yk = 0).
Then we can rewrite the product above as two products
Q = [ Π_{k∈S̄} Σ_{xk=0,1} (−1)^0 ] [ Π_{j∈S} Σ_{xj=0,1} (−1)^{xj} ].   (5.77)
Box 5.3. Balanced Function Lemma Alternative Proof II: Let ⃗x and
⃗y be bit strings of length n. Define the function g : {0, 1}n → {0, 1} via
and the summation in the last line is only over the set S of j values for which
yj = 1. Since ⃗y ̸= ⃗0, this set is not empty. Let k be the cardinality of the
set S (i.e., the number of non-zero bits in ⃗y ). The complement of this set, S̄
has cardinality n − k (i.e., the number of zero bits in ⃗y is n − k). The value
of m can range from 0 (if for all the j ∈ S, xj = 0) to k (if for all the j ∈ S,
xj = 1). The values of xj for j ∈ S̄ do not contribute since the corresponding
yj ’s vanish. Using these facts we can write
Σ_⃗x (−1)^{g(⃗x)} = Σ_{m=0}^{k} Gm (−1)^m,   (5.81)
where Gm is the number of times that a given value of m occurs when sum-
ming over ⃗x
Gm = 2^{n−k} (k choose m).   (5.82)
The power of 2 is the contribution from the sum in the LHS of Eq. (5.81) over
the (n − k) components of ⃗x in S̄, and the binomial factor comes from the
sum over the k components of ⃗x in S subject to the constraint in Eq. (5.80);
i.e., it is the number of distinct ways of distributing m 1’s and (k − m) 0’s
among the k positions within the set S. Finally we obtain
Σ_⃗x (−1)^{g(⃗x)} = 2^{n−k} Σ_{m=0}^{k} (k choose m) (+1)^{k−m} (−1)^m = 2^{n−k} [1 + (−1)]^k = 0.   (5.83)
4. Task: find the ‘hidden’ bit string ⃗b. (Note that f⃗ is one-to-one ⇐⇒
⃗b = ⃗0.)
Then we know that the function is two-to-one and we can easily solve for the
unknown ⃗b using the identity
Thus if we can find a matching pair of inputs, we have solved the problem.
Note that if we cannot find a matching pair, then we have shown that the
function is one-to-one and have also solved the problem (since ⃗b = ⃗0 in this
case).
How hard is it to find a pair with matching outputs? In the worst case,
to be absolutely sure that there are no matching pairs we would need to
evaluate N = 2^n/2 + 1 inputs. This is because if we evaluate only half of
the possible inputs, the partners of all those inputs might accidentally be in
the other half. Hence the query complexity for a deterministic solution is
exponential in n, just as for the Deutsch-Jozsa problem.
As we did for Deutsch-Jozsa, let us now ask what the query complexity is
for a probabilistic algorithm. Suppose we query f⃗ M times with M distinct
inputs ⃗x. The number of distinct pairs in our sample is
Mpairs = M(M − 1)/2,   (5.86)
where the factor of 2 in the denominator is to prevent counting {⃗xj , ⃗xk } and
{⃗xk , ⃗xj } as different pairs.
Given some ⃗x1, and assuming f⃗ is two-to-one, the probability that a
randomly selected ⃗x2 ̸= ⃗x1 yields f⃗(⃗x2) = f⃗(⃗x1) is
ϵ = 1/(2^n − 1) ∼ 2^{−n}.   (5.87)
This is because there are 2^n − 1 choices for ⃗x2 and only one of them gives a
matching output (assuming f⃗ is two-to-one). There is only one way to fail
to find a match in a sample of size M : no pairs match. This occurs with
probability
Pfail = (1 − ϵ)^{M(M−1)/2} = e^{ln(1−ϵ)·M(M−1)/2} ≈ e^{−ϵM(M−1)/2},   (5.88)
where the last (approximate) equality follows from Taylor series expanding
the logarithm to first order in ϵ. Thus if ϵM(M − 1)/2 > 1, there is a reasonable
chance of success (which rapidly approaches unity as M exceeds this
threshold). For small ϵ, the threshold value for M is
M0 ∼ (2/ϵ)^{1/2} ∼ (2/2^{−n})^{1/2} ∼ 2^{(n+1)/2}.   (5.89)
We see that because the number of pairs in our sample scales
quadratically with the sample size,1 we can begin to be reasonably sure of
finding a matching pair after sampling only about √(2^n) times, even though
the worst-case scenario would require sampling 2^{(n−1)} + 1 times. Nevertheless,
we still require a number of samples that is exponential in n to ensure a
reasonable probability of success. Thus, unlike the Deutsch-Jozsa problem,
Simon’s problem lies outside the complexity class BPP: we can use a probabilistic
algorithm to obtain a bounded error, but not in polynomial time. Thus
Simon’s problem is exponentially (in n) harder than Deutsch-Jozsa for a
classical computer, even if we allow probabilistic computation.
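The crossover at M0 ∼ 2^{(n+1)/2} can be checked directly from Eq. (5.88). In this sketch (an illustration; the values n = 20 and the helper names are assumptions) the failure probability is evaluated well below and well above the threshold.

```python
import math

n = 20
eps = 1.0 / (2**n - 1)                  # matching probability, Eq. (5.87)

def p_fail(M):
    """Probability that M distinct random queries contain no matching pair."""
    return (1 - eps)**(M * (M - 1) / 2)   # Eq. (5.88)

M0 = round(math.sqrt(2 / eps))          # threshold of Eq. (5.89)
assert p_fail(M0 // 10) > 0.95          # well below threshold: almost surely no match
assert p_fail(10 * M0) < 1e-40          # well above threshold: a match is nearly certain
```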
Figure 5.11: Circuit for Simon’s algorithm. The upper register of n qubits (q0
through qn−1) starts in |0⟩ and has a Hadamard applied to each wire both before
and after the oracle Uf ; the lower register of n qubits (qn through q2n−1) starts
in |0⟩ and receives the function value.
It is convenient at this point to partition the set S of all the vectors ⃗x into
two disjoint parts S = S1 ∪ S2 defined by ⃗x ∈ S1 ⇐⇒ (⃗x ⊕ ⃗b) ∈ S2.
That is, each ⃗x and its partner ⃗x ⊕ ⃗b are in opposite subsets. It does not
matter which subset we choose to put ⃗x in, only that its partner is not in
the same subset. With this partition, Eq. (5.90) can be written
|ψ1⟩ = (1/√(2^{n−1})) Σ_{⃗x∈S1} |f⃗(⃗x)⟩ ⊗ (1/√2)[ |⃗x⟩ + |⃗x ⊕ ⃗b⟩ ],   (5.91)
where we have taken advantage of the fact that f⃗(⃗x ⊕ ⃗b) = f⃗(⃗x). It is
important to understand that the bit string f⃗(⃗x) is unique in the sense that
the value of f⃗(⃗x) is different for every vector ⃗x ∈ S1. The only time the same
bit string occurs is for the partner ⃗x ⊕ ⃗b, which lies in S2.
As shown in Fig. 5.11, the next step is to measure the lower register. It
turns out that this step is not actually necessary, but it greatly simplifies the
analysis. Measurement of the lower register tells us the unique vector f⃗(⃗y )
for some random (and as yet unknown) vector ⃗y ∈ S1 . As a result the state
collapses to
|ψ′⟩ = |f⃗(⃗y)⟩ ⊗ (1/√2){ |⃗y⟩ + |⃗y ⊕ ⃗b⟩ }.   (5.92)
Crucially, because the function (we are assuming) is two-to-one, the upper
register only partially collapses, ending up in a superposition of the two states
that are consistent with the measurement result because they yield the same
value of the function (same state within the register being measured). (For a
reminder about partial state collapse under measurement of a subset of the
qubits in a system, see Sec. 4.2.) At this point we have learned the value of
f⃗ but not the value of ⃗y or ⃗y ⊕ ⃗b.
Stop for a moment and consider just how powerful this result is. The
huge superposition state has collapsed to a very small superposition that
automatically picked out (at random) a state |⃗y ⟩ and its partner |⃗y ⊕ ⃗b⟩!
This is a huge advantage–but we are not yet home free. If we measure the
upper register, the state collapses to either |⃗y ⟩ or |⃗y ⊕ ⃗b⟩ and we have lost
all the information we needed to find ⃗b using ⃗b = ⃗y ⊕ (⃗y ⊕ ⃗b).
The solution to this difficulty is shown in Fig. 5.11. Before measuring the
upper register we apply H ⊗n to it to obtain a kind of quantum interference
between |⃗y ⟩ and |⃗y ⊕ ⃗b⟩ that will allow us to determine ⃗b
|ψ′′⟩ = |f⃗(⃗y)⟩ ⊗ (1/√2) H^{⊗n}{ |⃗y⟩ + |⃗y ⊕ ⃗b⟩ }.   (5.93)
Recalling Eqs. (5.47,5.59), and Eq. (5.60) we obtain
|ψ′′⟩ = |f⃗(⃗y)⟩ ⊗ (1/√2)(1/√(2^n)) Σ_⃗z (−1)^{⃗y.⃗z} { 1 + (−1)^{⃗b.⃗z} } |⃗z⟩.   (5.94)
Noting that the bit-string vector dot product is computed modulo 2, we see
immediately that if ⃗b.⃗z = 1, the term in curly brackets vanishes, while if
⃗b.⃗z = 0, then the term in curly brackets is +2. We conclude therefore that
the measurement collapses the state with equal probabilities onto all possible
states |⃗z⟩ obeying
⃗b.⃗z = 0. (5.95)
comprising one half of the set of all possible vectors. The measurement results
yield random vectors ⃗z of length n that are ‘perpendicular’ to ⃗b. If we run
the algorithm n − 1 + m times with m being a small integer constant of order
unity, it is highly likely that measurement results {⃗zj ; j ∈ [1, n − 1 + m]} will
yield a set of n − 1 linearly independent non-zero vectors {⃗vj ; j ∈ [1, n − 1]}, so
that the set of linear equations

⃗b.⃗v1 = 0
⃗b.⃗v2 = 0
⃗b.⃗v3 = 0
...
⃗b.⃗vn−1 = 0    (5.96)

can be solved for the unknown bit string ⃗b.
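The classical post-processing step, solving this homogeneous linear system over GF(2), can be sketched as follows (a minimal Gaussian-elimination sketch assuming the n − 1 rows really are linearly independent; the function name is ours):

```python
import numpy as np

def solve_for_b(vs, n):
    """Recover the non-zero bit string b satisfying v . b = 0 (mod 2) for
    every measured vector v, via Gaussian elimination over GF(2).
    Assumes the n-1 rows of vs are linearly independent."""
    A = np.array(vs, dtype=np.uint8) % 2
    pivots, row = [], 0
    for col in range(n):
        hits = [r for r in range(row, len(A)) if A[r, col]]
        if not hits:
            continue                      # free column
        A[[row, hits[0]]] = A[[hits[0], row]]   # swap pivot row into place
        for r in range(len(A)):
            if r != row and A[r, col]:
                A[r] ^= A[row]            # eliminate modulo 2
        pivots.append(col)
        row += 1
    # With n-1 independent equations there is exactly one free column;
    # set that bit of b to 1 and read off the pivot bits.
    free = [c for c in range(n) if c not in pivots][0]
    b = np.zeros(n, dtype=np.uint8)
    b[free] = 1
    for r, c in enumerate(pivots):
        b[c] = A[r, free]                 # from A[r] . b = 0 (mod 2)
    return b
```

For example, with ⃗b = 1011 the measured vectors 0100, 1010, 1001 all satisfy ⃗b.⃗z = 0, and the solver returns ⃗b.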
Consider the first call to the oracle. The measurement yields one of the 2^(n−1) vectors obeying Eq. (5.95), chosen at random, and the only failure mode is obtaining the zero vector ⃗z1 = ⃗0. Hence the probability of obtaining a useful (non-zero) first vector is

P = 1 − 2^−(n−1).    (5.97)
Next suppose that we have obtained a non-zero ⃗v1 = ⃗z1 . There are now two
ways to fail to advance when we call the oracle a second time: either ⃗z2 = ⃗0
or ⃗z2 = ⃗z1 . Hence the success probability to obtain a second vector ⃗v2 that
is non-zero and is linearly independent of ⃗v1 is
P = 1 − 2^−(n−2).    (5.98)
More generally, suppose we have already obtained ℓ linearly independent non-zero vectors. These span a subspace containing 2^ℓ vectors (including ⃗0), and the next oracle call fails to advance if the measured ⃗z lies in this subspace. The failure probability is therefore

Pℓ→ℓ = 2^ℓ / 2^(n−1) = 2^(ℓ−(n−1)),    (5.99)
and the success probability to transition from having ℓ to ℓ + 1 linearly
independent non-zero vectors is
Pℓ→ℓ+1 = (2^(n−1) − 2^ℓ) / 2^(n−1) = 1 − 2^(ℓ−(n−1)).    (5.100)
Notice that this expression for P0→1 agrees with Eq. (5.97) and this expression
for P1→2 agrees with Eq. (5.98). The final transition to the goal has the
lowest success probability: Pn−2→n−1 = 1/2. Also notice that P(n−1)→n = 0 (so
that P(n−1)→(n−1) = 1) as required, since the number of linearly independent
vectors cannot exceed n − 1 (for the case ⃗b ̸= ⃗0) and the transitions must
terminate.
These transition probabilities define a Markov chain illustrated in
Fig. 5.12. The probability of passing through the chain from ℓ = 0 to
ℓ = (n − 1) in n − 1 steps (i.e., successfully advancing each time with zero
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 166
failures) is
P0(n) = Π_{ℓ=0}^{n−2} Pℓ→ℓ+1 = Π_{ℓ=0}^{n−2} [1 − 2^(ℓ−(n−1))].    (5.101)
In the limit of large n the success probability in the initial stages is extremely
close to unity. As a result the probability of reaching the goal with zero
failures approaches a constant given by the infinite product
P0(∞) = (1/2)(3/4)(7/8)(15/16)(31/32) · · · ≈ 0.288788095.    (5.102)
Every ‘trajectory’ must pass through the chain from beginning to end and
so the factor P0 (n) appears in the probability of every trajectory.
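The product formula (5.101) and its n → ∞ limit (5.102) are easy to check numerically (a sketch; the function name is ours):

```python
def p0(n):
    """Zero-failure probability P0(n): the product of the step success
    probabilities 1 - 2^(l-(n-1)) for l = 0, ..., n-2 (Eq. 5.101)."""
    prod = 1.0
    for l in range(n - 1):
        prod *= 1.0 - 2.0 ** (l - (n - 1))
    return prod
```

Already for modest n the value is indistinguishable from the infinite-product limit 0.288788095.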
Let us now consider all the trajectories that involve precisely one failure
(and hence require n − 1 + 1 = n calls to the oracle). The failure to advance
can occur at any site from ℓ = 0 to ℓ = n−2. Summing over all these distinct
trajectories gives
P1(n) = P0(n) Σ_{ℓ=0}^{n−2} Pℓ→ℓ = P0(n) S1(n),    (5.103)
where
S1(n) ≡ Σ_{ℓ=n−2}^{0} 2^(ℓ−(n−1))    (5.104)
= 1/2 + 1/4 + 1/8 + · · · + 1/2^(n−1),    (5.105)
where for convenience we have reversed the order of the summation. In the
limit of large n this approaches
S1(∞) = (1/2) · 1/(1 − 1/2) = 1,    (5.106)

so that P1(∞) = P0(∞)S1(∞) = P0(∞).    (5.107)
A little thought shows that the corresponding sum over all possible com-
binations of two failure probabilities is
S2(∞) = (1/2) [Σ_{ℓ=0}^{∞} Pℓ→ℓ]² + (1/2) Σ_{ℓ=0}^{∞} [Pℓ→ℓ]² = (1/2)(1 + 1/3) = 2/3.    (5.108)
From this we obtain the total probability to arrive at the goal in at most
n + 1 calls to the oracle:

P0(∞) + P1(∞) + P2(∞) = (2 + 2/3) P0(∞) ≈ 0.770102.    (5.109)
We thus see that the probability is rapidly converging towards unity as the
number of calls n − 1 + m to the oracle increases beyond the minimum n − 1.
With further effort it is possible to show that this convergence is exponential
in m. Roughly speaking, the last step from ℓ = n − 2 to ℓ = n − 1 is
the bottleneck in the dynamics since it has the lowest success probability
Pn−2→n−1 = 1/2. From this it follows that the probability to fail to reach the
goal for large m is proportional to 2^−m.
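These claims can be checked by evolving the Markov chain of Fig. 5.12 exactly (a numerical sketch; the function name is ours):

```python
def success_prob(n, m):
    """Exact probability that n-1+m oracle calls yield the full set of n-1
    linearly independent vectors, by evolving the Markov chain whose state
    l is the number of independent non-zero vectors found so far."""
    probs = [1.0] + [0.0] * (n - 1)          # start in state l = 0
    for _ in range(n - 1 + m):
        nxt = probs[:]
        for l in range(n - 1):               # l = n-1 is absorbing
            step = 1.0 - 2.0 ** (l - (n - 1))   # P_{l -> l+1}
            nxt[l] -= probs[l] * step
            nxt[l + 1] += probs[l] * step
        probs = nxt
    return probs[-1]
```

For m = 0 this reproduces P0(n) ≈ 0.2888, for m = 2 the value ≈ 0.7701 of Eq. (5.109), and the failure probability shrinks rapidly as m grows.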
The full distribution of the number of failures can be obtained from a generating function. The relative weight of a trajectory suffering m failures at site ℓ is [Pℓℓ]^m (writing Pℓℓ ≡ Pℓ→ℓ), so we define

fℓ(λ) ≡ Σ_{m=0}^{∞} [Pℓℓ]^m e^(−λm) = 1/(1 − Pℓℓ e^(−λ)),

from which the mean number of failures at site ℓ follows as

f̄ℓ = −(d/dλ) fℓ(λ)|_{λ=0} = Pℓℓ/(1 − Pℓℓ)².    (5.114)
The mean number of failures before reaching the end of the Markov chain is
thus
m̄(n) = Σ_{ℓ=0}^{n−2} Pℓℓ/(1 − Pℓℓ)²
= Σ_{k=1}^{n−1} 2^(−k)/(1 − 2^(−k))² ≈ 2.744,    (5.115)
where the last equality is the result for asymptotically large n but is accurate
(to three digits after the decimal point) for any n ≥ 16. The small value of
the mean number of failures is consistent with the success probability rapidly
approaching unity for m ∼ 3 and larger. [Note this argument does not rule
out the possibility of a long tail in the distribution of the number of failures.
However a similar calculation of the mean square number of failures does
rule this out. Appendix G in Mermin, Quantum Computer Science, provides
a lower bound on the success probability after n − 1 + m queries.]
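The sum in Eq. (5.115) converges quickly, as a short numerical check confirms (the function name is ours):

```python
def mean_failures(n):
    """Mean number of failed oracle calls, Eq. (5.115):
    sum over k = 1, ..., n-1 of 2^-k / (1 - 2^-k)^2."""
    return sum(2.0 ** -k / (1.0 - 2.0 ** -k) ** 2 for k in range(1, n))
```

Consistent with the text, the value is ≈ 2.744 and is already converged to three decimal places by n = 16.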
Figure 5.12: Markov chain showing the state transition probabilities of Simon’s
algorithm as one calls the oracle in an attempt to increase the number of linearly
independent output vectors from ℓ to ℓ + 1. Pℓ→ℓ is the probability of failing to
advance and Pℓ→ℓ+1 is the success probability.
Table 5.3: Unsorted database (of size 8) created by Alice. Here ⃗x is the binary number
indicating the ordinal position in the database and Y⃗(⃗x) and Z⃗(⃗x) are a pair of binary
numbers of fixed length m representing the data. For example, Y⃗(⃗x) might be the ASCII
encoding of a person's name, and Z⃗(⃗x) might be the binary encoding of their telephone
number. The list of names is not sorted alphabetically but rather is in random order.
phone number he seeks. (Of course he could set up the function with extra
bits in the output to automatically return the phone number if there is a
match, but to simplify the notation we are leaving this aside.) As noted
in Box 5.6, the important fact that Bob can use the database to create the
oracle himself emphasizes the point that the oracle is not telling Bob the
location of the data he is looking for–it is simply telling him whether or not
the data is actually at the particular location that Bob queries the oracle
about. That is, Bob is guessing the location in the database and the oracle
is telling him whether or not his guess is correct.
Box 5.6. Who can create the Grover Search Oracle? It is interesting
to recall that in the algorithms we have studied previously, it is the adversary
Alice who creates the oracle function and Bob has access to it only as a ‘black
box.’ Bob’s task is to learn something about the hidden structure of the
oracle function (i.e., a hidden bit string used to define the function). Here
the situation is different. Alice may have created the random ordering of the
data in the database, but Bob can easily create the oracle function himself
without knowing where the name he is looking for is in the database. He
simply creates a function that takes in a value of ⃗x (and implicitly takes in
the name Bob is searching for), goes to position with index ⃗x in the database
and checks if the name there matches the name Bob is searching for. If it
does match, the function returns 1, and if not, it returns 0.
It is important here to again emphasize that even though the form of the
oracle unitary as written in Eq. (5.120) depends explicitly on the ‘answer’
⃗y that Bob seeks, Bob is still able to use the database to create the oracle
himself without having to explicitly know the answer ⃗y in advance. Note
that to do this, he has to use the extra qubit d to create the reversible
2^(n+1) × 2^(n+1) unitary oracle matrix Of based on the query success function
f. It is only when we consider the special case that d is initialized in the
state |−⟩, which is left unchanged at the end, that we can restrict our attention
to the first n qubits and write the effective 2^n × 2^n unitary matrix Uf. (See
Box 5.6.)
Again, just as in the Deutsch-Jozsa algorithm, we will initialize the n
upper (input) wires in the uniform superposition state

|Φ0⟩ = H⊗n |00 . . . 0⟩ = (1/√2^n) Σ_⃗x |⃗x⟩,

so that when we apply the oracle once we make one query, but it is in a
superposition of all possible single queries. The result is
Uf |Φ0⟩ = (I − 2|⃗y⟩⟨⃗y|) |Φ0⟩    (5.124)
= (1/√2^n) Σ_⃗x (−1)^f(⃗x) |⃗x⟩    (5.125)
= (1/√2^n) Σ_{⃗x̸=⃗y} |⃗x⟩ − (1/√2^n) |⃗y⟩.    (5.126)
We see that the giant superposition state |Φ0 ⟩ is unchanged except one entry
(the one we want) has been ‘marked’ by the oracle because its sign flipped.
Now consider a function g that is like f except that it is for the specific
case where the desired entry in the database has index ⃗0. That is, g(⃗x) = 1
for ⃗x = ⃗0 but g(⃗x) = 0 for all ⃗x ̸= ⃗0. Now encode this function in a unitary
oracle Ug (which we can create since we know the function g). By the same
argument as above,

Ug = I − 2|⃗0⟩⟨⃗0|.    (5.127)
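The phase-marking action of the oracle is easy to see in a small statevector sketch (the marked index is hypothetical, and we build Uf as an explicit matrix only for illustration):

```python
import numpy as np

n = 3
N = 2 ** n
y = 5                                    # hypothetical marked index
phi0 = np.full(N, 1.0 / np.sqrt(N))      # uniform superposition |Phi_0>
Uf = np.eye(N)
Uf[y, y] = -1.0                          # U_f = I - 2|y><y|
marked = Uf @ phi0                       # only the |y> amplitude changes sign
```

All amplitudes are left untouched except the marked one, exactly as in Eq. (5.126).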
Since (for large n) g ≪ 1, the initial state is nearly orthogonal to the target
state |⃗y⟩, and thus is extremely close to the north pole. Furthermore, it lies
in the XZ plane of the Bloch sphere as shown in Fig. 5.13. The initial state
|Φ0 ⟩ contains the target state, but with only a very small amplitude so that
a measurement is exponentially unlikely to yield the desired answer. The
goal of the Grover algorithm is to ‘amplify’ the quantum amplitude of the
target state to near unity. If this can be achieved, then measurement of the
output register will yield the database address ⃗y corresponding to the name
of Bob’s friend. With this information Bob then knows where to look to find
his friend's phone number.
Figure 5.13: Effective Bloch sphere for the Grover search algorithm for the two-
dimensional subspace spanned by the starting state |Φ0 ⟩ and the target state |⃗y ⟩.
The target state lies at the south pole and the starting state lies very close to the
north pole, at exponentially small polar angle θ0. Application of the rotation R
increases the polar angle by 2θ0.
Exercise 5.4. Show that the basis state orthogonal to |⃗y ⟩ in Eq. (5.131)
is given by
|χ⟩ = (1/√(2^n − 1)) Σ_{⃗x̸=⃗y} |⃗x⟩.
and thus we have the matrix representation of R as a rotation around the Y axis
of the effective Bloch sphere by a small angle +θ:

R = ( cos(θ/2)   −sin(θ/2)
      +sin(θ/2)   cos(θ/2) ) = e^(−i(θ/2)Y),    (5.146)

with

sin(θ/2) = 2g√W = 2 sin(θ0/2) cos(θ0/2) = sin θ0,    (5.147)
where θ0 is the polar angle of the state |Φ0 ⟩. Thus the rotation associated
with R is θ = 2θ0 = 4 arcsin g ≈ 4g, where for the last equality we have used
CHAPTER 5. ALGORITHMS AND COMPLEXITY CLASSES 177
the small angle approximation that is valid for large n. We see from Fig. 5.13
that since the rotation is about the Y axis by a positive angle θ = 2θ0,
repeatedly applying R rotates the Bloch vector closer and closer to the target
state at the south pole. After N applications of R (N oracle queries) the polar
angle of the Bloch vector is

θN = (2N + 1)θ0.    (5.148)

This reaches the south pole (θN = π) for a number of queries

Nq ≈ π/(4 arcsin g) − 1/2 ≈ (π/4)√(2^n),    (5.149)
where again, the last approximate equality is valid for large n (small g).
Thus we have the remarkable result that even though Bob does not know
the target state, having access to the search oracle that can check if a guess
is correct, allows Bob to rotate the initial state |Φ0 ⟩ (almost perfectly) into
the target state |⃗y ⟩! Again, the oracle does not tell Bob directly what the
value of ⃗y is, only whether a guessed value ⃗x is correct or not.
The average-case classical query complexity Nc ∼ 2^(n−1) ∼ Nq² is quadratically
worse than the quantum query complexity. This is often referred to as
a quadratic quantum speed-up. Because the speed-up is not exponential, the
Grover algorithm must still call the oracle an exponential number of times,

Nq ∼ 2^(n/2).    (5.150)
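The full iteration can be simulated with a small statevector sketch: the oracle phase flip followed by the 'inversion about the mean' (diffusion) operator 2|Φ0⟩⟨Φ0| − I, repeated ≈ (π/4)√(2^n) times. The function name and the marked index are ours:

```python
import numpy as np

def grover(n, y):
    """Grover search on n qubits for a marked index y; returns the
    success probability after about (pi/4) sqrt(2^n) iterations."""
    N = 2 ** n
    state = np.full(N, 1.0 / np.sqrt(N))               # |Phi_0>
    diffusion = 2.0 / N * np.ones((N, N)) - np.eye(N)  # 2|Phi_0><Phi_0| - I
    for _ in range(int(round(np.pi / 4 * np.sqrt(N)))):
        state[y] *= -1.0                               # oracle U_f phase flip
        state = diffusion @ state                      # amplitude amplification
    return state[y] ** 2
```

With n = 6 (a 64-entry database) only 6 oracle calls bring the success probability above 99%, versus ~32 classical queries on average.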
Figure 5.14: Grover amplification process. Left panel shows that Uf rotates the
Bloch vector ⃗λ of state |Φ0 ⟩ by π around the Z axis. Right panel shows the next
step in which G rotates the state vector by π around the ⃗λ axis. The combined
effect increases the polar angle of the state vector from θ0 to 3θ0 , equivalent to a
rotation around the Y axis by an angle of 2θ0 .
Note that if f (⃗x) = 1 for M distinct values of ⃗x, then this state is properly
normalized. If the algorithm were able to rotate the starting state |Φ0 ⟩ into
|TM ⟩, then with high probability a single measurement will yield a random
value ⃗y from among the M distinct values. If Bob desires to obtain all M of
the solutions ⃗y1 , . . . ⃗yM to the equation f (⃗x) = 1, he can do so probabilisti-
cally by running the search algorithm a number of times of order M until it
succeeds in finding all M entries.
The operator G and the initial state |Φ0 ⟩ remain the same as in the
previous case. The analog of Eq. (5.120) is
Uf = I − 2 Σ_{j=1}^{M} |⃗yj⟩⟨⃗yj|.    (5.155)
Hence within the subspace spanned by the initial state |Φ0 ⟩ and the target
state |TM ⟩, the oracle Uf is effectively equal to Vf .
The upshot is that the only change from the analysis for the case of M = 1
is the increase in the overlap between the initial and target states from g to
⟨TM|Φ0⟩ = √(M/2^n) = g√M.    (5.159)
Replacing g in all the formulae by g√M means that (for large n) the rotation
angle θ increases by a factor of √M, and the rotation must be applied fewer
times,

N ≈ (π/4)√(2^n/M),

to optimize the success probability (of finding one of the M solutions).
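The same statevector sketch extends to M marked items, with the reduced iteration count N ≈ (π/4)√(2^n/M); the marked indices below are hypothetical:

```python
import numpy as np

def grover_multi(n, marked):
    """Grover search with M = len(marked) solutions: about
    (pi/4) sqrt(2^n / M) iterations suffice."""
    N, M = 2 ** n, len(marked)
    state = np.full(N, 1.0 / np.sqrt(N))
    diffusion = 2.0 / N * np.ones((N, N)) - np.eye(N)
    for _ in range(int(round(np.pi / 4 * np.sqrt(N / M)))):
        for y in marked:
            state[y] *= -1.0          # U_f flips the sign of every solution
        state = diffusion @ state
    return sum(state[y] ** 2 for y in marked)   # total success probability
```

A measurement then yields one of the M solutions at random with probability close to unity.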
Oblivious Amplification
Notice that if we apply the Grover rotation operator R in Eq. (5.129) too
many or too few times, we will over or under rotate the state on the Bloch
sphere and not end up close to the optimal position at the south pole. The
correct number of times to apply R is determined by the size of the database
(Nd = 2n ) and the number of instances M for which f returns the value 1
(i.e., the number of ‘correct answers’). We have to be given Nd to be able to
query the oracle. But what if the value of M is unknown? Then we have a
problem. Fortunately there is a clever modification of the Grover algorithm
that solves this problem by means of oblivious amplitude amplification. That
is, independent of the value of M , it automatically stops the process when it
has brought the state near the south pole. [The method, introduced by Berry
et al. in 2014, is reviewed in ‘Fixed-point oblivious quantum amplitude-
amplification algorithm,’ Bao Yan et al., Scientific Reports volume 12, Article
number: 14339 (2022). See also Appendix D of PRX QUANTUM 2, 040203
(2021)]
² An example of superpolynomial scaling would be a function of the form e^((1/2)n^(1/3)),
which grows more rapidly for large n than any polynomial in n, but more slowly than
exponentially in n, say as e^((1/16)n).
Chapter 6

Quantum Error Correction
CHAPTER 6. QUANTUM ERROR CORRECTION 183
The logical error rate is plotted against the physical probability in Fig. 6.1.
We see that if the physical error probability per qubit obeys p < 1/2, then
plogical < p and the error correction scheme reduces the error rate (instead
of making it worse). Thus the ‘break-even' point is p∗ = 1/2. If the error
probability is far below the break-even point, for example p = 10^−6, then
plogical ∼ 3p² ∼ 3 × 10^−12. Thus the lower the raw error rate, the greater the
improvement because the logical error rate scales quadratically ∼ 3p². Note
however that even at this low error rate, a petabyte (8 × 10^15 bit) storage
system would have on average 24,000 errors. Furthermore, one would have
to buy three petabytes of storage since 2/3 of the disk would be taken up
with ancilla bits!
Figure 6.1: Plot of logical error probability vs. physical error probability per bit for
the three-bit repetition code. For small physical error probability, the logical error
probability scales quadratically, plogical ∼ 3p²physical. The failure probability for the
quantum repetition code is the same.
We are now ready to enter the remarkable and magic world of quantum
error correction. Without quantum error correction, quantum computation
would be impossible and there is a sense in which the fact that error cor-
rection is possible is even more amazing and counterintuitive than the fact
of quantum computation itself. Naively, it would seem that quantum error
correction is completely impossible. The no-cloning theorem (see Box 3.4)
does not allow us to copy an unknown state of a qubit onto ancilla qubits.
Furthermore, in order to determine if an error has occurred, we would have
to make a measurement, and the back action (state collapse) from that mea-
surement would itself produce random unrecoverable errors.
Part of the power of a quantum computer derives from its analog
character–quantum states are described by continuous real (or complex) vari-
ables. This raises the specter that noise and small errors will destroy the
extra power of the computer just as it does for classical analog computers.
Remarkably, this is not the case! This is because the quantum computer
also has characteristics that are digital. Recall that any measurement of the
state of a qubit always yields a binary result. Because of this, quantum errors
are continuous, but measured quantum errors are discrete. Amazingly this
makes it possible to perform quantum error correction and keep the calcula-
tion running even on an imperfect and noisy computer. In many ways, this
discovery by Peter Shor in 1995 and Andrew Steane in 1996 is even more
profound and unexpected than the discovery of efficient quantum algorithms
that work on ideal computers.
It would seem obvious that quantum error correction is impossible be-
cause the act of measurement to check if there is an error would collapse the
state, destroying any possible quantum superposition information. Remark-
ably however, one can encode the information in such a way that the presence
of an error can be detected by measurement, and if the code is sufficiently
sophisticated, the error can be corrected, just as in classical computation.
Classically, the only error that exists is the bit flip. Quantum mechani-
cally there are other types of errors (e.g. phase flip, energy decay, erasure
channels, etc.). However codes have been developed (using a minimum of
5 qubits) which will correct all possible quantum errors. By concatenating
these codes to higher levels of redundancy, even small imperfections in the
error correction process itself can be corrected. Thus quantum superposi-
tions can in principle be made to last essentially forever even in an imperfect
noisy system. It is this remarkable insight that makes quantum computation
possible.
As an entrée to this rich field, we will consider a simplified example of one
qubit in some state α|0⟩ + β|1⟩ plus two ancillary qubits in state |0⟩ which we
would like to use to protect the quantum information in the first qubit. As
already noted, the simplest classical error correction code simply replicates
the first bit twice and then uses majority voting to correct for (single) bit
flip errors. This procedure fails in the quantum case because the no-cloning
theorem (see Box 3.4) prevents replication of an unknown qubit state. Thus
the transformation

(α|0⟩ + β|1⟩)|0⟩|0⟩ −→ (α|0⟩ + β|1⟩)(α|0⟩ + β|1⟩)(α|0⟩ + β|1⟩)

is forbidden.
As was mentioned earlier, this is clear from the fact that the above transfor-
mation is not linear in the amplitudes α and β and quantum mechanics is
linear. One can however perform the repetition code transformation:
|0⟩log = |000⟩
|1⟩log = |111⟩. (6.8)
The analog of the single-qubit Pauli operators for this logical qubit are readily
seen to be
Xlog = X1 X2 X3
Ylog = iXlog Zlog
Zlog = Z1 Z2 Z3 . (6.9)
Next consider the following pair of operators, known as ‘stabilizers':

S1 = Z1 Z2    (6.10)
S2 = Z2 Z3.    (6.11)
These have the nice property that they commute both with each other (i.e.,
[S1 , S2 ] = S1 S2 − S2 S1 = 0) and with all three of the logical qubit operators
listed in Eq. (6.9). This means that they can both be measured simulta-
neously and that the act of measurement does not destroy the quantum
information stored in any superposition of the two logical qubit states. Fur-
thermore they each commute or anticommute with the four error operators
in such a way that we can uniquely identify what error (if any) has occurred.
Each of the four possible error states (including no error) is an eigenstate of
both stabilizers with the eigenvalues listed in the table below
error   S1   S2
I       +1   +1
X1      −1   +1
X2      −1   −1
X3      +1   −1
Thus measurement of the two stabilizers yields two bits of classical informa-
tion (called the ‘error syndrome’) which uniquely identify which of the four
possible error states the system is in and allows the experimenter to correct
the situation by applying the appropriate error operator, I, X1 , X2 , X3 to the
system to cancel the original error.
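The syndrome table can be verified in a small statevector sketch (the logical amplitudes and the choice of an X2 error are arbitrary):

```python
import numpy as np

# Statevector sketch of the three-qubit bit-flip code and its syndrome table.
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def kron3(a, b, c):
    return np.kron(a, np.kron(b, c))

S1 = kron3(Z, Z, I2)                      # stabilizer Z1 Z2
S2 = kron3(I2, Z, Z)                      # stabilizer Z2 Z3
recovery = {(+1, +1): kron3(I2, I2, I2),  # syndrome -> correction operator
            (-1, +1): kron3(X, I2, I2),
            (-1, -1): kron3(I2, X, I2),
            (+1, -1): kron3(I2, I2, X)}

alpha, beta = 0.6, 0.8                    # arbitrary logical amplitudes
psi0 = np.zeros(8)
psi0[0b000], psi0[0b111] = alpha, beta    # alpha|000> + beta|111>

corrupted = kron3(I2, X, I2) @ psi0       # an X2 error occurs
# The corrupted state is a joint eigenstate of S1 and S2; the eigenvalues
# (the syndrome) identify the error without revealing alpha or beta.
syndrome = (round(corrupted @ S1 @ corrupted),
            round(corrupted @ S2 @ corrupted))
fixed = recovery[syndrome] @ corrupted
```

The X2 error produces the syndrome (−1, −1), in agreement with the table, and applying the corresponding correction restores the original state exactly.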
We now have our first taste of the fantastic power of quantum error correction.
We have however glossed over some important details by assuming that either
an error has occurred or it hasn’t (that is, we have been assuming we are
in a definite error state). At the next level of sophistication we have to
recognize that we need to be able to handle the possibility of a quantum
superposition of an error and no error. After all, in a system described by
smoothly evolving superposition amplitudes, errors can develop continuously.
Suppose for example that the correct state of the three physical qubits is |Ψ0⟩ = α|000⟩ + β|111⟩,
and that there is some perturbation to the Hamiltonian such that after some
time there is a small amplitude ϵ that error X2 has occurred. Then the state
of the system is
|Ψ⟩ = [√(1 − |ϵ|²) I + ϵX2] |Ψ0⟩.    (6.13)
(The reader may find it instructive to verify that the normalization is correct.)
What happens if we apply our error correction scheme to this state? The
measurement of each stabilizer will always yield a binary result, thus illustrat-
ing the dual digital/analog nature of quantum information processing. With
probability P0 = 1 − |ϵ|2 , the measurement result will be S1 = S2 = +1.
In this case the state collapses back to the original ideal one and the error
is removed! Indeed, the experimenter has no idea whether ϵ had ever even
developed a non-zero value. All she knows is that if there was an error, it is
now gone. This is the essence of the quantum Zeno effect: repeated
observation can stop dynamical evolution. (It is also, once again, a
clear illustration of the maxim that in quantum mechanics ‘You get what you
see.’) Rarely however (with probability P1 = |ϵ|2 ) the measurement result
will be S1 = S2 = −1 heralding the presence of an X2 error. The correction
protocol then proceeds as originally described above. Thus error correction
still works for superpositions of no error and one error. A simple extension of
this argument shows that it works for an arbitrary superposition of all four
error states.
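A sketch of the small-ϵ case of Eq. (6.13), showing that the projective measurement of S1 removes the error with probability 1 − |ϵ|² (amplitudes are arbitrary):

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def kron3(a, b, c):
    return np.kron(a, np.kron(b, c))

alpha, beta, eps = 0.6, 0.8, 0.1
psi0 = np.zeros(8)
psi0[0b000], psi0[0b111] = alpha, beta
# Small-amplitude X2 error, Eq. (6.13):
psi = np.sqrt(1 - eps**2) * psi0 + eps * (kron3(I2, X, I2) @ psi0)

S1 = kron3(Z, Z, I2)
P_plus = (np.eye(8) + S1) / 2                       # projector onto S1 = +1
p_no_error = np.linalg.norm(P_plus @ psi) ** 2      # = 1 - |eps|^2
collapsed = (P_plus @ psi) / np.sqrt(p_no_error)    # Zeno-like collapse
```

The S1 = +1 outcome occurs with probability 1 − |ϵ|² and the collapsed state is exactly |Ψ0⟩: the error amplitude is erased by the measurement.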
More realistically, the error amplitude develops through entanglement of the
system with the bath:

|Ψ⟩ = √(1 − |ϵ|²) |Ψ0, Bath0⟩ + ϵ X2 |Ψ0, Bath2⟩.    (6.14)
For example, the error could be caused by the second qubit having a coupling
to a bath operator O2 of the form
V2 = g X2 O2 , (6.15)
Notice that once the stabilizers have been measured, either the experimenter
obtained the result S1 = S2 = +1 and the state of the system plus bath
collapses to |Ψ0, Bath0⟩, or she obtained S1 = S2 = −1, heralding an X2 error,
and the state collapses to X2|Ψ0, Bath2⟩. Both results yield a product state
in which the logical qubit is unentangled with the bath. Hence the algorithm
can simply proceed as before and will work.
Finally, there is one more twist in this plot. We have so far described a
measurement-based protocol for removing the entropy associated with errors.
There exists another route to the same goal in which purely unitary multi-
qubit operations are used to move the entropy from the logical qubit to
some ancillae, and then the ancillae are reset to the ground state to remove
the entropy. The reset operation could consist, for example, of putting the
ancillae in contact with a cold bath and allowing the qubits to spontaneously
and irreversibly decay into the bath. Because the ancillae are in a mixed
state with some probability to be in the excited state and some to be in the
ground state, the bath ends up in a mixed state containing (or not containing)
photons resulting from the decay. Thus the entropy ends up in the bath. It
is important for this process to work that the bath be cold so that the qubits
always relax to the ground state and are never driven to the excited state.
We could if we wished, measure the state of the bath and determine which
error (if any) occurred, but in this protocol, no actions conditioned on the
outcome of such a measurement are required.
Quantum error correction is extremely challenging to carry out in prac-
tice. In fact the first error correction protocol to actually reach the break-even
point (where carrying out the protocol extends rather than shortens
the lifetime of the quantum information) was achieved by the Yale group in
2016. This was done not using two-level systems as qubits but rather by
storing the quantum information in the states of a harmonic oscillator (a
superconducting microwave resonator containing superpositions of 0, 1, 2, . . .
photons).
Chapter 7
Yet To Do
1. Dave Bacon's lecture notes on reversible classical gates have a nice dis-
cussion of Charlie Bennett's "uncomputation" trick to erase scratch
registers. Add a discussion of this.
2. Add more discussion of joint measurements with examples for students
to make sure they understand that Z ⊗X is not measured by measuring
Z ⊗ I and I ⊗ X since this gives too much information. However the
expectation value of this operator can be measured by averaging
the product of the individual measurement results. Another example
is measuring XY = iZ which is NOT the product of the individual
measurements since they are incompatible.
3. Define mutual information, useful for QEC.
191
CHAPTER 7. YET TO DO 192
that this is important for Simon's algorithm and for quantum error
correction.
10. Did I insert proof that length preservation implies general inner product
conservation which implies unitary? Proof could be improved.
12. 3 CNOTS make a SWAP but apply to a general state or to the X basis
16. Make sure this statement: “Peculiar quantum interference effects per-
mit the Toffoli gate to be synthesized from two-qubit gates, something
that is not possible in (reversible) classical computation. More on this
later!” in Chapter 1 gets followed up. Show the quantum synthesis of
Toffoli from CNOTs.
19. Need to make sure that we derive (perhaps in appendix B) that eigen-
vectors of any Hermitian operator form a complete basis. Then give an
exercise where students derive that the variance of measurement results
of an operator is ⟨ψ|Q̂2 |ψ⟩ − ⟨ψ|Q̂|ψ⟩2 .
21. Add discussion of classical Euler angles to Sec. 3.4 and show that in
quantum a final rotation along the qubit polarization axis just intro-
duces a global phase on the state. [See Lecture 08 spring 2023.]
24. Redo Box and Figure on physical implementation of CNOT to use |0⟩
and |1⟩ instead of | ↑⟩ and | ↓⟩. Also change sign of Hamiltonian.
27. create a Box to explain that Toffoli is universal for classical computa-
tion and Toffoli + Hadamard is universal for quantum (Dorit paper).
Classically the Toffoli cannot be synthesized from CNOTs and NOTs
but quantum mechanically it CAN be synthesized from CNOTs and
single qubit (non-Clifford) rotations.
28. Define the Clifford hierarchy somewhere. Perhaps when discussing sta-
bilizer codes in QEC.
29. IMPORTANT: Show that the space of binary strings is a vector space
over the field {0, 1}. The only allowed scalars are 0 and 1, and vectors
are added bitwise mod 2. Refer to this when we do the Deutsch and
the Deutsch-Jozsa algorithms.
A.1 Randomness
Randomness plays an essential role in the theory of information–both classical
and quantum. It is therefore useful for us to review basic concepts from
probability and statistics.
What is randomness anyway? In the classical world randomness is re-
lated to ignorance. We lack knowledge of all the conditions and parameters
needed to make accurate predictions. For example, flipping a coin and see-
ing if it lands face up or face down is considered random. But it isn’t really
random. If Bob watches Alice flip a coin and were able to measure exactly how
rapidly she made it spin and measure its initial upward velocity, he could
predict (using Newton’s laws of classical dynamics) how long it will be in
the air and whether it will land face up or face down. In more complicated
dynamical systems with several interacting degrees of freedom, the motion
can be chaotic. Tiny changes in initial conditions (positions and velocities)
can lead to large changes in the subsequent trajectory. Thus even though
classical mechanics is completely deterministic, motion on long time scales
can appear to be random.
Many computer programs rely on so-called random number generators.
They are not actually random but rather chaotic iterative maps–they take
an initial seed number and compute some complicated function of that seed
to produce a new number. That number is then used as the input to the next
195
APPENDIX A. QUICK REVIEW OF PROBABILITY AND STATISTICS 196
round of iteration. The results may look random and may even pass many
statistical tests for randomness, but if an observer knows the program that
was used and the starting seed, he or she can predict the entire sequence of
numbers perfectly.
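A minimal sketch of such an iterative map, a linear congruential generator with the common 'Numerical Recipes' constants, makes the determinism obvious:

```python
def lcg(seed, count, a=1664525, c=1013904223, m=2**32):
    """Minimal linear congruential generator: x -> (a*x + c) mod m.
    The same seed always reproduces the same 'random' stream."""
    out, x = [], seed
    for _ in range(count):
        x = (a * x + c) % m
        out.append(x)
    return out
```

Two runs with the same seed produce identical sequences; change the seed and the sequence changes completely, yet each is perfectly predictable.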
In quantum mechanics randomness is an ineluctable feature. It is not due
to ignorance of initial conditions but rather is an essential part of the theory.
Alice can prepare N truly identical copies of a quantum state and Bob can
make a measurement of some physical property on each copy of that state
and obtain truly random (not pseudo-random) results. The results are not
random because of some ‘hidden variable’ whose value Alice forgot to fix or
Bob failed to measure. They are truly random–it is impossible to predict the
outcome of the measurement before it is performed.
A.2 Probabilities
The probability pj of an event j is a non-negative number obeying 0 ≤ pj ≤ 1.
The probabilities of all possible events (in a universe of M possible events)
obey the sum rule

Σ_{j=1}^{M} pj = 1.    (A.1)
This simply means that one of the possible events (from the complete set of
all possible mutually exclusive events) definitely happened.
As an example, suppose we have an M -sided die (die is the singular form
of dice). On each face of the die is a number. Let xj denote the number of
the jth face of the die, and pj be the probability that the jth face lands face
up so that the number xj is showing when the die is randomly tossed onto a
table. We can define a so-called ‘random variable’ X to be the number that
comes up when we toss the die. X randomly takes on one of the allowed
values xj ; j = 1, . . . , M . We can now ask simple questions like, what is the
mean (i.e. average) value of X? This is also known as the expectation value
of X and is often denoted by an overbar or by double brackets
X̄ = ⟨⟨X⟩⟩ = Σ_{j=1}^{M} pj xj.    (A.2)
This sum over all possible results weighted by their frequency of occurrence
gives the value one would obtain by averaging the results of a huge number
of trials of the experiment (of rolling the die).
As an example, suppose we have a standard cube-shaped die with the six
faces numbered 1 through 6, that is xj = j. We will take the ‘measurement
result’ to be the number on the top face of the die after it stops rolling. If
the die is fair, the probability of result xj is pj = 16 , and thus the mean value
will be
X̄ = Σ_{j=1}^{6} pj xj = Σ_{j=1}^{6} (1/6) j = 3.5.    (A.3)
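The computation in Eq. (A.3) takes a couple of lines:

```python
probs = [1 / 6] * 6                # fair die: each face equally likely
values = [1, 2, 3, 4, 5, 6]
mean = sum(p * x for p, x in zip(probs, values))
```

The weighted sum gives the expectation value 3.5, even though no single throw can ever yield 3.5.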
a) What are the unique possible values for the sum of the two num-
bers showing on the top faces of the dice?
For later purposes, it will also be useful to consider what happens when
we have two independent rolls of the die. Let the random variable X1 be
the number that comes up on the first toss and let X2 be the number that
comes up on the second toss. What is the joint probability distribution for
the two results? That is, what is the probability P(xj , xk ) that X1 = xj
and X2 = xk ? Because each result is drawn independently from the same
probability distribution we simply have that the probability for a given result
of two tosses is just the product of the probabilities of the individual results
P(xj , xk ) = pj pk . (A.4)
This is simply the statement that the joint probability distribution factorizes
into two separate distributions for the individual events (assuming the events
are independent).
Similarly we can define the mean of the square, ⟨⟨X²⟩⟩ = Σ_{j=1}^{M} pj (xj)².
This is simply the mean of the square of the number that comes up when we
toss the die. Does the mean of the square bear any relation to the square
of the mean, X̄ 2 ? To find out, let us consider the so-called variance of the
distribution, the mean of the square of the deviation of the random variable
from its own mean
M
2
2 X 2
σ ≡ ⟨⟨ X − X̄ ⟩⟩ = pj xj − X̄
j=1
M
X
pj (xj )2 − 2X̄xj + X̄ 2
=
j=1
and
M
X M
X
pj 2X̄xj = 2X̄ pj xj = 2X̄ 2 . (A.9)
j=1 j=1
a) Assuming the die is fair (as in the left panel of Fig. A.1).
b) Assuming the die is biased with the probabilities given in the right
panel of Fig. A.1.
Figure A.1: Left panel: Graph of the probability distribution for the outcome of the throw
of a fair die (pj = 1/6; j = 1, . . . , 6). The variance is large. Right panel: Graph of the
probability distribution of an unfair (highly biased) die having p3 = 0.8, and p1 = p2 =
p4 = p5 = p6 = 0.04. The variance is smaller.
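The contrast between the two panels of Fig. A.1 can be made quantitative with a short Python sketch (an illustration, not part of the original notes) that evaluates the mean and variance for both the fair die and the biased die with p_3 = 0.8:

```python
# Variance sigma^2 = sum_j p_j (x_j - mean)^2 for the two dice of Fig. A.1.
faces = [1, 2, 3, 4, 5, 6]

def mean_and_variance(probs):
    m = sum(p * x for p, x in zip(probs, faces))
    var = sum(p * (x - m) ** 2 for p, x in zip(probs, faces))
    return m, var

fair = [1 / 6] * 6
biased = [0.04, 0.04, 0.8, 0.04, 0.04, 0.04]  # p3 = 0.8, as in the right panel

m_fair, var_fair = mean_and_variance(fair)
m_biased, var_biased = mean_and_variance(biased)
print(var_fair, var_biased)  # the biased die has the much smaller variance
```

The fair die has variance 35/12 ≈ 2.92, while the highly biased die, whose outcomes cluster near 3, has a variance of only about 0.75, consistent with the figure caption.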
To estimate the mean from data, we average the results of N repeated trials,

\tilde{X}(N) = \frac{1}{N} \sum_{k=1}^{N} X_k ,    (A.12)

where X_k is the number that came up in the kth run of the experiment (e.g.
throw of the die). We use the tilde over the X to indicate that this is an
estimator for the mean, not the true mean.
For finite N , this estimator is not likely to be exact, but for large N we
expect it to become better and better. Is there a way to determine how
accurate this estimator is likely to be? Indeed there is. Let us write
\tilde{X}(N) = \bar{X} + \delta_N ,    (A.13)

where \delta_N is the (random) error in the estimate for a particular set of N trials.
Since \langle\langle X_k \rangle\rangle = \bar{X} for every k, we have \langle\langle \delta_N \rangle\rangle = 0.
Thus the average error vanishes. That is, our estimator is unbiased, as expected.
We can get a sense of the typical size of the error by considering its
variance:
\sigma_N^2 \equiv \langle\langle (\delta_N)^2 \rangle\rangle .

Because the throws are statistically independent, \langle\langle X_j X_k \rangle\rangle = \bar{X}^2 for j \ne k, so only the j = k terms survive:

\sigma_N^2 = \frac{1}{N^2} \sum_{j=1}^{N} \sum_{k=1}^{N} \langle\langle (X_j - \bar{X})(X_k - \bar{X}) \rangle\rangle
 = \frac{1}{N^2} \sum_{j=1}^{N} \sum_{k=1}^{N} \left[ \langle\langle X_j X_k \rangle\rangle - \bar{X}^2 \right]
 = \frac{1}{N^2} \sum_{j \ne k} \left[ \langle\langle X_j X_k \rangle\rangle - \bar{X}^2 \right] + \frac{1}{N^2} \sum_{j = k} \left[ \langle\langle X_j X_k \rangle\rangle - \bar{X}^2 \right]
 = \frac{1}{N^2} \sum_{j=1}^{N} \left[ \langle\langle (X_j)^2 \rangle\rangle - \bar{X}^2 \right]
 = \frac{1}{N} \sigma_1^2 ,    (A.16)
where σ1 = σ, the standard error for a single throw of the die defined in
Eq. (A.11). Thus our estimator has a random error whose variance decreases
inversely with N . The standard error thus is
\sigma_N = \frac{1}{\sqrt{N}}\, \sigma_1 .    (A.17)
This is a simple estimate of the size of the error that our estimator of the
mean is likely to have.
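The 1/\sqrt{N} scaling of the standard error in Eq. (A.17) can be seen directly in simulation. The following Python sketch (an illustration, not part of the original notes) measures the spread of the sample-mean estimator for fair-die rolls and compares it with \sigma_1/\sqrt{N}:

```python
import random

random.seed(1)
faces = [1, 2, 3, 4, 5, 6]
sigma1 = (35 / 12) ** 0.5  # standard deviation of a single fair-die roll

def estimator_std(N, trials=2000):
    # Empirical standard deviation of the sample-mean estimator over many
    # independent repetitions of the N-roll experiment.
    means = []
    for _ in range(trials):
        s = sum(random.choice(faces) for _ in range(N))
        means.append(s / N)
    mu = sum(means) / trials
    return (sum((m - mu) ** 2 for m in means) / trials) ** 0.5

for N in (10, 40):
    print(N, round(estimator_std(N), 4), round(sigma1 / N ** 0.5, 4))
```

Quadrupling N halves the standard error, in agreement with Eq. (A.17).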
Box A.1. Sample variance vs. true variance Care must be exercised
when estimating the variance σ12 of an unknown probability distribution from
a finite sample size drawn from the distribution. If (somehow) we know the
true mean of the distribution, then we can simply use as our estimator of the
variance
\tilde{\sigma}^2 = \frac{1}{N} \sum_{j=1}^{N} (X_j - \bar{X})^2 .    (A.18)

The situation is not so simple when we do not know the mean of the un-
known distribution and are forced to estimate it using Eq. (A.12). While
this estimator of the mean is unbiased, it still will have some small error, and
that error is positively correlated with the values of X_j in our sample. This
means that if we use as our estimator of the variance

\tilde{\sigma}^2 = \frac{1}{N} \sum_{j=1}^{N} \left( X_j - \tilde{X}(N) \right)^2 ,    (A.20)

we will systematically underestimate the true variance. The bias is removed
by replacing the prefactor 1/N by 1/(N-1) (the so-called sample variance,
or Bessel's correction).
Consider next a random walk in which each step is a random variable \Delta = \pm\epsilon, so that the position after N steps is x = \sum_{j=1}^{N} \Delta_j, where \Delta_j is the value of the random variable \Delta for the jth step.
In Fig. A.2 we see plots of the probability distribution for x for different
values of N , together with the Gaussian approximation to it. We see that the
Gaussian approximation is quite good even for modest values of N . This is
the essence of the central limit theorem. The sum of a large number of ran-
dom variables (with bounded variance) is well-approximated by a Gaussian
distribution.
Exercise A.3 provides an opportunity to prove the central limit theorem
for the particular case of a random walk. To see how this works, let us
derive the exact probability distribution for a random walk of N steps, each
of size ϵ = 1/2. We take N even so that the final position m of the walker is
always an integer and lies in the interval m ∈ [−N/2, +N/2]. Let p± be the
probability of stepping to the right or left respectively. Let the number of
steps to the right be R and the number to the left be L. We have N = R + L
and the final position is given by m = (R - L)\epsilon. The probability of any given
sequence of steps is

P = p_+^{R}\, p_-^{L} .    (A.26)
The number of different walks with R steps to the right and L steps to the
left can be determined from combinatorics. Think of a string of N symbols,
ℓ and r denoting the direction of each step in the random walk. There
are altogether N ! permutations of the order of these symbols. However L!
permutations merely swap ℓ’s with other ℓ’s and so should not be counted
as distinct walks. Similarly there are R! permutations of the r’s that should
not be counted. The number of distinct walks M(R, L) is therefore

M(R, L) = \frac{N!}{R!\, L!} .    (A.27)
The probability of ending up at m = (R - L)\epsilon after N = R + L (even) steps
is therefore

P(N, m) = \frac{N!}{R!\, L!}\, p_+^{R}\, p_-^{L} = \binom{N}{R} p_+^{R}\, p_-^{(N-R)} ,    (A.28)

where the last expression involves the binomial coefficient

\binom{N}{R} = \frac{N!}{R!\,(N - R)!} .    (A.29)

For this reason, this probability distribution is known as the binomial distri-
bution. Notice that this expression is correctly normalized because

\sum_{m=-N/2}^{+N/2} P(N, m) = \sum_{R=0}^{N} \frac{N!}{R!\,(N - R)!}\, p_+^{R}\, p_-^{(N-R)} = (p_+ + p_-)^{N} = (1)^{N} = 1 .    (A.30)
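The binomial walk distribution of Eq. (A.28) and its normalization, Eq. (A.30), can be checked directly. The Python sketch below (an illustration, not part of the original notes) tabulates P(N, m) for a symmetric walk with \epsilon = 1/2, using m = R - N/2:

```python
from math import comb

# Binomial distribution for a random walk (Eq. A.28) with step length 1/2,
# N even, so the final position is m = R - N/2.
def walk_distribution(N, p_plus=0.5):
    p_minus = 1.0 - p_plus
    # map final position m -> probability P(N, m)
    return {R - N // 2: comb(N, R) * p_plus**R * p_minus**(N - R)
            for R in range(N + 1)}

P = walk_distribution(16)          # symmetric walk, p+ = p- = 1/2
total = sum(P.values())            # should equal 1 (Eq. A.30)
mean_m = sum(m * p for m, p in P.items())  # should vanish by symmetry
print(round(total, 12), round(mean_m, 12))
```

The probabilities sum to one, and for p_+ = p_- the mean position vanishes by symmetry, as expected.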
(a) Derive exact expressions for the mean and variance of the position
m after N steps by two different methods. Hint: You can either
derive the appropriate properties of the binomial distribution, or
you can use the fact that the individual walk steps have a mean
and variance and are statistically independent. For the former it
is useful to notice that p_+ \frac{\partial}{\partial p_+} (p_+)^R = R\, (p_+)^R.
This constitutes a proof of the central limit theorem for this special case.
∗ Note that an asymptotic expansion does not mean that the difference
between the LHS and RHS of Eq. (A.32) becomes arbitrarily close to zero
for large N . It means that the ratio of the RHS to the LHS approaches
unity for large N . Since both quantities are diverging, their difference
can still be large even if their ratio approaches unity.
Integrals of the Gaussian form

I = \int_{-\infty}^{+\infty} dz\; e^{-a z^2 + b z}

(and their multi-dimensional generalizations) turn out to be ubiquitous in
physics, so it is handy to know how to carry them out. We begin by
'completing the square' by writing

I = \int_{-\infty}^{+\infty} dz\; e^{-a\left[z - \frac{b}{2a}\right]^2 + \frac{b^2}{4a}} .    (A.35)

Shifting the dummy variable of integration to x = z - \frac{b}{2a} we have

I = e^{\frac{b^2}{4a}} \int_{-\infty}^{+\infty} dx\; e^{-a x^2} .    (A.36)

It turns out to be easier to consider the square of the integral, which we can
write as

I^2 = \left[ e^{\frac{b^2}{4a}} \right]^2 \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} dx\, dy\; e^{-a[x^2 + y^2]} .    (A.37)

This two-dimensional integral is readily carried out using polar coordinates
r, \theta with r = \sqrt{x^2 + y^2}

I^2 = \left[ e^{\frac{b^2}{4a}} \right]^2 \int_{0}^{\infty} 2\pi r\, dr\; e^{-a r^2} = \left[ e^{\frac{b^2}{4a}} \right]^2 \frac{\pi}{a} ,    (A.38)

so that

I = \sqrt{\frac{\pi}{a}}\; e^{\frac{b^2}{4a}} .
From this we see that the Gaussian probability distribution in Eq. (A.23) is
properly normalized to unity.
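The closed form I = \sqrt{\pi/a}\, e^{b^2/4a} is easy to verify numerically. The Python sketch below (an illustration, not part of the original notes) approximates the integral with a midpoint Riemann sum over a wide interval, which is more than adequate because the integrand decays so rapidly:

```python
from math import exp, pi, sqrt

# Numerically check I = int e^{-a z^2 + b z} dz = sqrt(pi/a) e^{b^2/(4a)}
# using a simple midpoint Riemann sum on the interval [-L, L].
a, b = 1.3, 0.7
L, steps = 20.0, 40_000
dz = 2 * L / steps

numeric = 0.0
for k in range(steps):
    z = -L + (k + 0.5) * dz      # midpoint of the k-th subinterval
    numeric += exp(-a * z * z + b * z) * dz

closed_form = sqrt(pi / a) * exp(b ** 2 / (4 * a))
print(round(numeric, 6), round(closed_form, 6))
```

The numerical sum and the closed-form expression agree to many digits, confirming the completing-the-square calculation.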
Figure A.2: [Panels: N = 2, N = 4, N = 8, N = 16.] Dots: probability distribution for the
ending position of random walks of N steps with step length \epsilon = 1/2. The smooth curve
is the Gaussian distribution with standard deviation \sqrt{N}/2. The Gaussian approximation
rapidly becomes very accurate as N increases.
That is, it is the sum of the probabilities of all the independent ways that q0
can be found with measured value A independent of what B is. Notice that
it follows immediately from the definition in Eq. (A.42) and the sum rule in
Eq. (A.41) that this distribution is properly normalized
\sum_{A=\pm 1} P_1(A) = \sum_{A=\pm 1} \sum_{B=\pm 1} P(A, B) = 1 .    (A.43)
with B held fixed. This however is not properly normalized. The correctly
normalized expression is
P(A|B) = \frac{P(A, B)}{\sum_{A=\pm 1} P(A, B)}    (A.46)
       = \frac{P(A, B)}{P_0(B)} .    (A.47)
From this it follows that P(A, B) = P(A|B)\, P_0(B), and, by the same reasoning with the roles of A and B interchanged, P(A, B) = P(B|A)\, P_1(A).
Equating the above two expressions for P (A, B) yields the very important
Bayes rule:
P(B|A) = P(A|B)\, \frac{P_0(B)}{P_1(A)} .    (A.52)
precisely the situation where Bayes Rule is useful. We have from Eq. (A.52)
P(\psi_j \,|\, z = +1) = P(z = +1 \,|\, \psi_j)\, \frac{P_\psi(\psi_j)}{P_z(z = +1)} ,    (A.61)

where P_\psi(\psi_j) is the prior probability that Alice selects state \psi_j. In this
case, since Alice chooses each state with equal probability, we have P_\psi(\psi_j) =
\frac{1}{4} for all four values of j. P_z(z) is the prior probability of obtaining the
measurement result z. In this case, we find using Eqs. (A.57-A.60)

P_z(z = +1) = \sum_{j=0}^{3} P(z = +1 \,|\, \psi_j)\, P_\psi(\psi_j) = \frac{1}{4}\left[ 1 + 0 + \frac{1}{2} + \frac{1}{2} \right] = \frac{1}{2} .    (A.62)

We see that P_z(z = +1) in the denominator on the RHS of Eq. (A.61) is just a constant factor
needed to fix the normalization of the posterior probability distribution on
the LHS.
Combining all these results with Eq. (A.61) yields the results given in
Box 4.6:

P(\psi_0 \,|\, z = +1) = \frac{1}{2} ,    (A.63)
P(\psi_1 \,|\, z = +1) = 0 ,    (A.64)
P(\psi_2 \,|\, z = +1) = \frac{1}{4} ,    (A.65)
P(\psi_3 \,|\, z = +1) = \frac{1}{4} .    (A.66)
APPENDIX B. FORMAL DEFINITION OF A VECTOR SPACE
The general definition of a norm for a vector space is (more or less) any
mapping from the vectors to the non-negative real numbers satisfying the
triangle inequality |\vec{A} + \vec{B}| \le |\vec{A}| + |\vec{B}|. It is important to note that a given
vector space need not have a norm defined.
For ordinary vectors we are used to the dot product

\vec{A} \cdot \vec{B} = A_x B_x + A_y B_y + A_z B_z = |\vec{A}|\, |\vec{B}| \cos\theta_{AB} ,    (B.9)
where θAB is the angle between the two vectors. The dot product is a specific
example of the general concept of an inner product. An inner product of two
abstract vectors V1 and V2 is a mapping onto a scalar s, often denoted
(V1 , V2 ) = s. (B.10)
For the case where the vector space is defined over the field of complex
numbers, an inner product satisfies the following requirements:
Positive semi-definite:

(V_1, V_1) \ge 0 ,    (B.15)

and

(V_1, V_1) = 0 \iff V_1 = \vec{0} .    (B.16)

[Note the first zero above is the scalar zero and the arrow is placed on
the second zero to make clear that this is the null vector (additive zero
vector), not the scalar zero.]
Two vectors are defined to be orthogonal if their inner product vanishes.
Exercise B.1. Prove that the usual dot product for real three-
dimensional vectors satisfies the definition of an inner product.
The Pythagorean norm defined above for ordinary real vectors is thus
related to the inner product of the vector with itself

|\vec{A}| = \sqrt{\vec{A} \cdot \vec{A}} .    (B.17)

Exercise B.2. Prove that for a general complex vector space, \sqrt{(V, V)}
satisfies the definition of a norm.
In describing qubit states we will deal with two-component complex-valued
vectors of the form

\Psi_1 = (\alpha_1, \beta_1),    (B.18)
\Psi_2 = (\alpha_2, \beta_2).    (B.19)
The standard inner product and norm for such vectors are defined in Ex. B.3.
Exercise B.3. Prove that the following generalization of the dot prod-
uct to complex vectors satisfies the definition of an inner product
where |\Psi_1\rangle is referred to as a 'ket' vector and the inner product is represented
by

\langle \Psi_1 | \Psi_2 \rangle = (\alpha_1^*,\ \beta_1^*) \begin{pmatrix} \alpha_2 \\ \beta_2 \end{pmatrix} = \alpha_1^* \alpha_2 + \beta_1^* \beta_2 .    (B.25)
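The bra-ket inner product of Eq. (B.25) is a one-liner with ordinary complex numbers. The Python sketch below (an illustration, not part of the original notes) evaluates \langle\Psi_1|\Psi_2\rangle for two example qubit states:

```python
# Inner product <Psi1|Psi2> = conj(alpha1)*alpha2 + conj(beta1)*beta2
# (Eq. B.25), using plain Python complex numbers.
def inner(psi1, psi2):
    return sum(a.conjugate() * b for a, b in zip(psi1, psi2))

psi1 = [1 / 2**0.5, 1j / 2**0.5]   # (alpha1, beta1)
psi2 = [1 / 2**0.5, -1j / 2**0.5]  # (alpha2, beta2)

print(inner(psi1, psi1))  # norm-squared of a normalized state, approximately 1
print(inner(psi1, psi2))  # these two states are orthogonal, approximately 0
```

Note that the complex conjugation of the first argument is essential: without it, \langle\Psi|\Psi\rangle would not be a non-negative real number.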
The Dirac notation is also very convenient for defining the outer product
of two vectors which, as described in Chapter 3, is a linear operator that maps
the vector space back onto itself (i.e. maps vectors onto other vectors)

O\, |\Psi_3\rangle = (|\Psi_2\rangle\langle\Psi_1|)\, |\Psi_3\rangle = |\Psi_2\rangle \langle \Psi_1 | \Psi_3 \rangle = |\Psi_2\rangle\, (\langle \Psi_1 | \Psi_3 \rangle) = s_{13}\, |\Psi_2\rangle ,    (B.27)

where the scalar s_{13} is given by the inner product (also known in quantum
parlance as the 'overlap')

s_{13} = \langle \Psi_1 | \Psi_3 \rangle .    (B.28)
Substituting the definitions of the bra and ket vectors in Eq. (B.26), we see
that the abstract operator can be represented as a matrix

O = \begin{pmatrix} \alpha_2 \\ \beta_2 \end{pmatrix} (\alpha_1^*,\ \beta_1^*) = \begin{pmatrix} \alpha_2 \alpha_1^* & \alpha_2 \beta_1^* \\ \beta_2 \alpha_1^* & \beta_2 \beta_1^* \end{pmatrix} .    (B.29)

By simply switching the order of the row and column vector from that in
Eq. (B.25), the rules of matrix multiplication give us a matrix instead of a
scalar!
Exercise B.4. Use the matrix representation of O in Eq. (B.29) and
apply it to the column vector representation of

|\Psi_3\rangle = \begin{pmatrix} \gamma \\ \delta \end{pmatrix} .
(AB)^\dagger = B^\dagger A^\dagger .    (B.30)

It works the same way in the Dirac notation for the outer product of two
vectors that forms an operator. Thus the adjoint of the operator in Eq. (B.26)
is simply

O^\dagger = (|\Psi_2\rangle\langle\Psi_1|)^\dagger = |\Psi_1\rangle\langle\Psi_2| .    (B.31)

To see that this is true, it is best to work with the representation in Eq. (B.29)

O^\dagger = \left[ \begin{pmatrix} \alpha_2 \\ \beta_2 \end{pmatrix} (\alpha_1^*,\ \beta_1^*) \right]^\dagger = \begin{pmatrix} \alpha_2 \alpha_1^* & \alpha_2 \beta_1^* \\ \beta_2 \alpha_1^* & \beta_2 \beta_1^* \end{pmatrix}^\dagger    (B.32)
 = \begin{pmatrix} \alpha_1 \alpha_2^* & \alpha_1 \beta_2^* \\ \beta_1 \alpha_2^* & \beta_1 \beta_2^* \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \beta_1 \end{pmatrix} (\alpha_2^*,\ \beta_2^*)    (B.33)
 = |\Psi_1\rangle\langle\Psi_2| .    (B.34)
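The adjoint identity in Eqs. (B.32-B.34) is straightforward to confirm numerically. The Python sketch below (an illustration, not part of the original notes) builds the outer product as a 2x2 matrix of nested lists and checks that its conjugate transpose swaps the roles of the two vectors:

```python
# Check that the adjoint of |Psi2><Psi1| equals |Psi1><Psi2| (Eqs. B.32-B.34),
# using plain Python lists as 2x2 complex matrices.
def outer(ket, bra):
    # |ket><bra| : matrix with entries ket[i] * conj(bra[j])
    return [[ki * bj.conjugate() for bj in bra] for ki in ket]

def dagger(M):
    # conjugate transpose of a 2x2 matrix
    return [[M[j][i].conjugate() for j in range(2)] for i in range(2)]

psi1 = [1.0, 2.0j]
psi2 = [3.0, 1.0 - 1.0j]

O = outer(psi2, psi1)                  # |Psi2><Psi1|
print(dagger(O) == outer(psi1, psi2))  # True
```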
For the Hilbert space of n qubits, we will use the following Dirac notation
for state vectors and their duals in the computational basis

|b_{n-1} \ldots b_1 b_0\rangle, \qquad \langle b_{n-1} \ldots b_1 b_0| ,

in which the qubits are numbered from 0 to n-1 and their values b_j \in \{0, 1\}
are ordered from right to left. (Note that we maintain this same label ordering
in the dual vector.) The computational basis is simply the tensor product
of the computational basis of the individual qubits (illustrated here for the
case of two qubits)
|00\rangle = |0\rangle \otimes |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}    (B.40)

|01\rangle = |0\rangle \otimes |1\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}    (B.41)

|10\rangle = |1\rangle \otimes |0\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}    (B.42)

|11\rangle = |1\rangle \otimes |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} .    (B.43)
Notice that if we label the ordinal positions in the column vector starting with
0 at the top and ending with 3 as in Eq. (B.37), then the binary representation
of the position of the entry containing 1 gives the state of the two qubits
in the computational basis. For example |11\rangle corresponds to the binary
representation of the number 3, which in turn corresponds to the location of
the entry 1 being at the bottom of the column vector in Eq. (B.43).
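The correspondence between basis labels and column-vector positions can be generated mechanically. The Python sketch below (an illustration, not part of the original notes) builds the four two-qubit basis vectors with a small Kronecker product and confirms that the single 1 sits at the position given by the binary value of the label:

```python
# Computational-basis column vectors for two qubits via the tensor
# (Kronecker) product, as in Eqs. (B.40-B.43).
def kron(u, v):
    # Kronecker product of two column vectors represented as flat lists.
    return [ui * vj for ui in u for vj in v]

ket0, ket1 = [1, 0], [0, 1]

basis = {
    "00": kron(ket0, ket0),
    "01": kron(ket0, ket1),
    "10": kron(ket1, ket0),
    "11": kron(ket1, ket1),
}
# The single 1 sits at the position given by the binary value of the label.
for label, vec in basis.items():
    print(label, vec, vec.index(1) == int(label, 2))
```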
We can also write the dual vectors associated with the above two-qubit
state vectors. For example the dual of the vector in Eq. (B.41) is
\langle 01| = \langle 0| \otimes \langle 1| = (1\ \ 0) \otimes (0\ \ 1) = (0\ \ 1\ \ 0\ \ 0) .    (B.44)
Recall from Eq. (B.30) for ordinary products of matrices we need to reverse
the order of the matrices when forming the transpose. However in forming
the dual of the tensor product |0⟩ ⊗ |1⟩, we do not reverse the order of the
two terms in the tensor product. This is because of our convention of keeping
the bit order the same when writing the dual of |01⟩ as ⟨01| rather than ⟨10|.
As examples of operators acting on this Hilbert space consider the joint
Pauli operators

Z \otimes X = \begin{pmatrix} 0 & +1 & 0 & 0 \\ +1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & -1 & 0 \end{pmatrix} ,    (B.45)

and

X \otimes Z = \begin{pmatrix} 0 & 0 & +1 & 0 \\ 0 & 0 & 0 & -1 \\ +1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{pmatrix} .    (B.46)
the tensor product of two single qubit states. But we also see that this
is exactly equivalent to each operator acting separately on their respective
qubits and then taking the tensor product of the resulting state vectors. If
we want an operator that acts on only qubit q0 we simply tensor it with the
identity acting on q1 . For example,
X_0 = I \otimes X = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} ,    (B.55)
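These operator tensor products can be generated the same way as the basis vectors. The Python sketch below (an illustration, not part of the original notes) implements a Kronecker product for 2x2 matrices and reproduces Z \otimes X, X \otimes Z, and X_0 = I \otimes X:

```python
# Tensor products of Pauli operators (Eqs. B.45, B.46, B.55), built with a
# small Kronecker product for 2x2 matrices represented as nested lists.
def kron2(A, B):
    n = len(B)
    return [[A[i // n][j // n] * B[i % n][j % n]
             for j in range(2 * n)] for i in range(2 * n)]

I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

ZX = kron2(Z, X)
XZ = kron2(X, Z)
X0 = kron2(I, X)  # X acting on qubit q0 only
print(ZX == XZ)   # False: the order of the tensor factors matters
```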
\vec{x} \oplus \vec{y} \equiv \vec{x} + \vec{y} \ (\mathrm{mod}\ 2),    (B.57)

where the mod 2 operation is performed bitwise. Strangely, this means that
every vector is its own additive inverse

\vec{x} \oplus \vec{x} = \vec{0} .    (B.58)

Similarly the only allowed scalars are s = 0, 1 because only for these values
are vectors of the form s\vec{x} in the space of bit strings. The set of scalars \{0, 1\}
where \vec{x} \cdot \vec{y} is the ordinary Euclidean dot product. (The inner product has
to be a scalar and there are only two allowed scalars, which is why the mod
2 arithmetic is required in the inner product.) The 'length' of a vector,
L^2 = L = \vec{x} \cdot \vec{x}, is thus the parity of the number of non-zero bits in the string:
L = 0 if the vector contains an even number of 1's and L = 1 if the vector
contains an odd number of 1's.
This notion of ‘length’ does not give a very complete notion of the dis-
tance between two vectors. To remedy this one can define the notion of the
Hamming distance between two vectors
d_H(\vec{x}, \vec{y}) = \sum_{j=0}^{n-1} (x_j \oplus y_j) .    (B.60)
Because xj ⊕ yj is zero if the two bits agree and one if they differ, the
Hamming distance is the total number of instances where the bit strings
differ. Equivalently it is the total number of bits in ⃗x that would need to be
flipped to convert ⃗x into ⃗y .
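Eq. (B.60) translates directly into code. The Python sketch below (an illustration, not part of the original notes) computes the Hamming distance of two bit strings with the bitwise XOR of each pair of bits:

```python
# Hamming distance (Eq. B.60): the number of positions at which two
# equal-length bit strings differ, via the bitwise XOR of each pair of bits.
def hamming(x, y):
    assert len(x) == len(y)
    return sum(xj ^ yj for xj, yj in zip(x, y))

x = [1, 0, 1, 1, 0]
y = [1, 1, 1, 0, 0]
print(hamming(x, y))  # the strings differ in positions 1 and 3 -> distance 2
```

Flipping exactly d_H(\vec{x}, \vec{y}) bits of \vec{x} converts it into \vec{y}, which is why this distance counts bit-flip errors in classical error correction.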
The notion of Hamming distance is very important in classical error cor-
rection, because the Hamming distance between a code word (a bit string in
the code space) and a word that has been corrupted by errors (bit flips) is
equal to the number of bitflip errors. These various notions of length and
distance are not to be confused with the ‘length’ n of a bit string in the
ordinary sense of the number of bits in the bit string vectors, which is the
dimension of the vector space.
Appendix C

Handy Mathematical Identities
The representation of the basis states in the \hat{n} basis in terms of the stan-
dard basis states is

|+\hat{n}\rangle = \cos\frac{\theta}{2}\, |0\rangle + e^{i\varphi} \sin\frac{\theta}{2}\, |1\rangle
|-\hat{n}\rangle = \sin\frac{\theta}{2}\, |0\rangle - e^{i\varphi} \cos\frac{\theta}{2}\, |1\rangle

where \hat{n} = (\sin\theta\cos\varphi,\ \sin\theta\sin\varphi,\ \cos\theta).
Pauli matrices:

X = \sigma^x = \begin{pmatrix} 0 & +1 \\ +1 & 0 \end{pmatrix}
Y = \sigma^y = \begin{pmatrix} 0 & -i \\ +i & 0 \end{pmatrix}
Z = \sigma^z = \begin{pmatrix} +1 & 0 \\ 0 & -1 \end{pmatrix}
I = \sigma^0 = \begin{pmatrix} +1 & 0 \\ 0 & +1 \end{pmatrix} .
Trace of Pauli matrices:

\mathrm{Tr}\, X = \mathrm{Tr}\, Y = \mathrm{Tr}\, Z = 0
\mathrm{Tr}\, I = 2 .
Products of Pauli matrices:

X^2 = Y^2 = Z^2 = I
XY = -YX = iZ
YZ = -ZY = iX
ZX = -XZ = iY
XYZ = iI .
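These product identities are easy to verify by direct matrix multiplication. The Python sketch below (an illustration, not part of the original notes) checks two of them, X^2 = I and XY = iZ, with 2x2 complex matrices:

```python
# Check the Pauli product identities X^2 = I and XY = iZ with 2x2 complex
# matrices represented as nested lists.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
iZ = [[1j * z for z in row] for row in Z]

print(matmul(X, X) == I, matmul(X, Y) == iZ)  # True True
```

The remaining identities in the list above can be checked the same way.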
In addition to the standard orthonormal computational basis states for
two qubits, \{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}, another commonly used orthonormal basis
for two qubits is the set of so-called Bell states:

|B_0\rangle = \frac{1}{\sqrt{2}} \left[ |01\rangle - |10\rangle \right]
|B_1\rangle = \frac{1}{\sqrt{2}} \left[ |01\rangle + |10\rangle \right]
|B_2\rangle = \frac{1}{\sqrt{2}} \left[ |00\rangle - |11\rangle \right]
|B_3\rangle = \frac{1}{\sqrt{2}} \left[ |00\rangle + |11\rangle \right] .
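Orthonormality of the Bell states is quick to verify numerically. The Python sketch below (an illustration, not part of the original notes) writes each Bell state as a four-component vector in the computational basis and computes the full Gram matrix of inner products:

```python
# Verify that the four Bell states form an orthonormal basis:
# <Bi|Bj> = delta_ij.
s = 1 / 2**0.5
B = [
    [0, s, -s, 0],   # |B0> = (|01> - |10>)/sqrt(2)
    [0, s,  s, 0],   # |B1> = (|01> + |10>)/sqrt(2)
    [s, 0, 0, -s],   # |B2> = (|00> - |11>)/sqrt(2)
    [s, 0, 0,  s],   # |B3> = (|00> + |11>)/sqrt(2)
]

def inner(u, v):
    # amplitudes here are real, so no conjugation is needed
    return sum(a * b for a, b in zip(u, v))

gram = [[inner(u, v) for v in B] for u in B]
print(gram)  # approximately the 4x4 identity matrix
```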
The TFs and the Peer Tutors, as well as members of the Yale Undergraduate
Quantum Computing Club, can help with selecting and carrying out your
project. The goal is to learn something not covered in class and produce a
∼10-page pedagogical write-up that will allow other students to learn the
topic as well.
APPENDIX D. SUGGESTIONS FOR PROJECTS
D.3 Algorithms
We will cover some of the algorithms described in Kaye, Laflamme and Mosca
(An Introduction to Quantum Computing) but not all, so this is a useful
reference. Rieffel and Polak (Quantum Computing: A Gentle Introduction)
is also useful.
The review by Ashley Montanaro (doi:10.1038/npjqi.2015.23) on quantum
algorithms is a good starting point. This and other papers can be found in
the "Papers for Projects" folder in the files section of Canvas.
The following websites maintain a large list of quantum algorithms and
protocols
https://quantumalgorithmzoo.org/
https://wiki.veriqloud.fr/index.php?title=Protocol_Library
3. Oblivious Amplification
5. Phase estimation (may cover some aspects of this in class, but you could
go into more depth or could experiment with executing it on the IBM
machines)
6. Quantum Fourier Transform (We will do this in class, but you could do
a more detailed comparison with the classical Fast Fourier Transform
algorithm, or experiment with executing the QFT on the IBM machines.)
https://www.youtube.com/watch?v=NZ5PmIaJ5IE
14. Variational Quantum Eigensolver (VQE) (will cover in class, but you
could apply this to a specific problem using the IBM Q system)
4. Superconducting qubits