Quantum Information Theory (Lecture Notes)
Renato Renner
with contributions by Matthias Christandl
February 6, 2013
Contents
1 Introduction 5
2 Probability Theory 6
2.1 What is probability? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Definition of probability spaces and random variables . . . . . . . . . . . . . 7
2.2.1 Probability space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.2 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.3 Notation for events . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.4 Conditioning on events . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Probability theory with discrete random variables . . . . . . . . . . . . . . . 9
2.3.1 Discrete random variables . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3.2 Marginals and conditional distributions . . . . . . . . . . . . . . . . 9
2.3.3 Special distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3.4 Independence and Markov chains . . . . . . . . . . . . . . . . . . . . 10
2.3.5 Functions of random variables, expectation values, and Jensen’s inequality . . . . . . . . . . . . 10
2.3.6 Trace distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3.7 I.i.d. distributions and the law of large numbers . . . . . . . . . . . 12
2.3.8 Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3 Information Theory 15
3.1 Quantifying information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.1 Approaches to define information and entropy . . . . . . . . . . . . . 15
3.1.2 Entropy of events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.1.3 Entropy of random variables . . . . . . . . . . . . . . . . . . . . . . 17
3.1.4 Conditional entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.5 Mutual information . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1.6 Smooth min- and max- entropies . . . . . . . . . . . . . . . . . . . . 20
3.1.7 Shannon entropy as a special case of min- and max-entropy . . . . . 20
3.2 An example application: channel coding . . . . . . . . . . . . . . . . . . . . 21
3.2.1 Definition of the problem . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.2 The general channel coding theorem . . . . . . . . . . . . . . . . . . 21
3.2.3 Channel coding for i.i.d. channels . . . . . . . . . . . . . . . . . . . . 24
3.2.4 The converse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4 Quantum States and Operations 26
4.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.1 Hilbert spaces and operators on them . . . . . . . . . . . . . . . . . 26
4.1.2 The bra-ket notation . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1.3 Tensor products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1.4 Trace and partial trace . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.1.5 Decompositions of operators and vectors . . . . . . . . . . . . . . . . 29
4.1.6 Operator norms and the Hilbert-Schmidt inner product . . . . . . . 31
4.1.7 The vector space of Hermitian operators . . . . . . . . . . . . . . . . 32
4.2 Postulates of quantum mechanics . . . . . . . . . . . . . . . . . . . . . . . . 32
4.3 Quantum states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3.1 Density operators — definition and properties . . . . . . . . . . . . . 34
4.3.2 Quantum-mechanical postulates in the language of density operators 34
4.3.3 Partial trace and purification . . . . . . . . . . . . . . . . . . . . . . 35
4.3.4 Mixtures of states . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3.5 Hybrid classical-quantum states . . . . . . . . . . . . . . . . . . . . . 37
4.3.6 Distance between states . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.4 Evolution and measurements . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.4.1 Completely positive maps (CPMs) . . . . . . . . . . . . . . . . . . . 44
4.4.2 The Choi-Jamiolkowski isomorphism . . . . . . . . . . . . . . . . . . 45
4.4.3 Stinespring dilation . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4.4 Operator-sum representation . . . . . . . . . . . . . . . . . . . . . . 48
4.4.5 Measurements as CPMs . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.4.6 Positive operator valued measures (POVMs) . . . . . . . . . . . . . 50
4.4.7 The diamond norm of CPMs . . . . . . . . . . . . . . . . . . . . . . 51
4.4.8 Example: Why to enlarge the Hilbert space . . . . . . . . . . . . . . 54
6 Basic Protocols 65
6.1 Teleportation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.2 Superdense coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.3 Entanglement conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.3.1 Majorisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
7.5 Conditional min-entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
8 Resource Inequalities 85
8.1 Resources and inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.2 Monotones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
8.3 Teleportation is optimal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
8.4 Superdense coding is optimal . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8.5 Entanglement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
8.6 Cloning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
1 Introduction
The very process of doing physics is to acquire information about the world around us.
At the same time, the storage and processing of information is necessarily a physical
process. It is thus not surprising that physics and the theory of information are inherently
connected.1 Quantum information theory is an interdisciplinary research area whose goal
is to explore this connection.
As the name indicates, the information carriers in quantum information theory are
quantum-mechanical systems (e.g., the spin of a single electron). This is in contrast to
classical information theory where information is assumed to be represented by systems
that are accurately characterized by the laws of classical mechanics and electrodynamics
(e.g., a classical computer, or simply a piece of paper). Because any such classical system
can in principle be described in the language of quantum mechanics, classical information
theory is a (practically significant) special case of quantum information theory.
The course starts with a quick introduction to classical probability and information the-
ory. Many of the relevant concepts, e.g., the notion of entropy as a measure of uncertainty,
can already be defined in the purely classical case. I thus consider this classical part as a
good preparation as well as a source of intuition for the more general quantum-mechanical
treatment.
We will then move on to the quantum setting, where we will spend a considerable
amount of time to introduce a convenient framework for representing and manipulating
quantum states and quantum operations. This framework will be the prerequisite for
formulating and studying typical information-theoretic problems such as information stor-
age and transmission (with possibly noisy devices). Furthermore, we will learn in what
sense information represented by quantum systems is different from information that is
represented classically. Finally, we will have a look at applications such as quantum key
distribution.
I would like to emphasize that it is not the intention of this course to give a complete
treatment of quantum information theory. Instead, the goal is to focus on certain key
concepts and to study them in more detail. For further reading, I recommend the standard
textbook by Nielsen and Chuang [1].
1 This connection has been noticed by numerous famous scientists over the past fifty years, among them Rolf Landauer with his claim “information is physical.”
2 Probability Theory
Information theory is largely based on probability theory. Therefore, before introducing
information-theoretic concepts, we need to recall some key notions of probability theory.
The following section is, however, not intended as an introduction to probability theory.
Rather, its main purpose is to summarize some basic facts as well as the notation we are
going to use in this course.
P_X^cand(1) = P_X^cand(2) = 1/2   and   P_X^cand(3) = 0 .
When interpreting a probability distribution as a state of knowledge and, hence, as
subjective quantity, we need to carefully specify whose state of knowledge we are referring
to. This is particularly relevant for the analysis of information-theoretic settings, which
usually involve more than one party. For example, in a communication scenario, we might
have a sender who intends to transmit a message M to a receiver. Clearly, before M is
sent, the sender and the receiver have different knowledge about M and, consequently,
would assign different probability distributions to M . In the following, when describing
such settings, we will typically understand all distributions as states of knowledge of an
outside observer.
P : E → R_+

that assigns to each event E ∈ E a nonnegative real number P[E], called the probability of E. It must satisfy the probability axioms P[Ω] = 1 and P[⋃_{i∈N} E_i] = Σ_{i∈N} P[E_i] for any family (E_i)_{i∈N} of pairwise disjoint events.
2.2.2 Random variables

A random variable X is a function from the sample space Ω to some set X, measurable with respect to the σ-algebras E and F. This means that the preimage of any F ∈ F is an event, i.e., X^{−1}(F) ∈ E. The probability measure P on (Ω, E) induces a probability measure P_X on the measurable space (X, F), which is also called the range of X,

P_X[F] := P[X^{−1}(F)]   ∀F ∈ F . (2.1)
A pair (X, Y) of random variables can obviously be seen as a new random variable. More precisely, if X and Y are random variables with range (X, F) and (Y, G), respectively, then (X, Y) is the random variable with range (X × Y, F × G) defined by3

(X, Y) : ω ↦ (X(ω), Y(ω)) .
We will typically write PXY to denote the joint probability measure P(X,Y ) on (X × Y, F × G)
induced by (X, Y ). This convention can, of course, be extended to more than two random
variables in a straightforward way. For example, we will write PX1 ···Xn for the probability
measure induced by an n-tuple of random variables (X1 , . . . , Xn ).
In a context involving only finitely many random variables X1 , . . . , Xn , it is usually
sufficient to specify the joint probability measure PX1 ···Xn , while the underlying probability
space (Ω, E, P ) is irrelevant. In fact, as long as we are only interested in events defined in
terms of the random variables X1 , . . . , Xn (see Section 2.2.3 below), we can without loss
of generality identify the sample space (Ω, E) with the range of the tuple (X1 , . . . , Xn ) and
define the probability measure P to be equal to PX1 ···Xn .
2.2.4 Conditioning on events

The probability of an event E conditioned on an event E′ with P[E′] > 0 is defined by

P[E|E′] := P[E ∩ E′] / P[E′]   ∀E ∈ E .
Similarly, we can define P_{X|E′} as the probability measure of a random variable X conditioned on E′. Analogously to (2.1), it is the probability measure induced by P[·|E′], i.e.,

P_{X|E′}[F] := P[X^{−1}(F)|E′]   ∀F ∈ F .

More generally, for an event E′ with P[E′] > 0, the probability mass function of X conditioned on E′ is given by P_{X|E′}(x) := P_{X|E′}[{x}], and also satisfies the normalization condition (2.2).
Given a joint probability mass function P_{XY}, the marginal distribution of X is

P_X(x) := Σ_{y∈Y} P_{XY}(x, y)   ∀x ∈ X , (2.3)

and likewise for P_Y. Furthermore, for any y ∈ Y with P_Y(y) > 0, the distribution P_{X|Y=y} of X conditioned on the event Y = y obeys

P_{X|Y=y}(x) = P_{XY}(x, y) / P_Y(y)   ∀x ∈ X . (2.4)
4 It is easy to see that the power set of X is indeed a σ-algebra over X .
5 Note that X and Y can themselves be tuples of random variables.
2.3.3 Special distributions
Certain distributions are important enough to be given a name. We call PX flat if all
non-zero probabilities are equal, i.e.,
PX (x) ∈ {0, q} ∀x ∈ X
for some q ∈ [0, 1]. Because of the normalization condition (2.2), we have q = 1/|supp P_X|, where supp P_X := {x ∈ X : P_X(x) > 0} is the support of the function P_X. Furthermore,
if PX is flat and has no zero probabilities, i.e.,
P_X(x) = 1/|X |   ∀x ∈ X ,
we call it uniform.
For a random variable X whose alphabet X is a module over the reals R (i.e., there is a
notion of addition and multiplication with reals), we define the expectation value of X by
⟨X⟩_{P_X} := Σ_{x∈X} P_X(x) x .
6 P_X × P_Y denotes the function (x, y) ↦ P_X(x) P_Y(y).
If the distribution PX is clear from the context, we sometimes omit the subscript.
For a convex real function f on a convex set X, the expectation values of X and f(X) are related by Jensen's inequality

⟨f(X)⟩ ≥ f(⟨X⟩) .
The inequality is essentially a direct consequence of the definition of convexity (see Fig. 2.1).
2.3.6 Trace distance

The trace distance between two probability mass functions P and Q on X is defined as7

δ(P, Q) := (1/2) Σ_{x∈X} |P(x) − Q(x)| .

In the literature, the trace distance is also called statistical distance, variational distance, or Kolmogorov distance.8 It is easy to verify that δ is indeed a metric, that is, it is symmetric, nonnegative, zero if and only if P = Q, and it satisfies the triangle inequality. Furthermore, δ(P, Q) ≤ 1 with equality if and only if P and Q have disjoint support.
Because P and Q satisfy the normalization condition (2.2), the trace distance can equiv-
alently be written as
δ(P, Q) = 1 − Σ_{x∈X} min[P(x), Q(x)] . (2.5)
The trace distance between the probability mass functions Q_X and Q_{X′} of two random variables X and X′ has a simple interpretation. It can be seen as the minimum probability that X and X′ take different values.
7 The definition can easily be generalized to probability measures.
8 We use the term trace distance because, as we shall see, it is a special case of the trace distance for density operators.
Lemma 2.3.1. Let Q_X and Q_{X′} be probability mass functions on X. Then

δ(Q_X, Q_{X′}) = min_{P_{XX′}} P_{XX′}[X ≠ X′] ,

where the minimum ranges over all joint probability mass functions P_{XX′} with marginals P_X = Q_X and P_{X′} = Q_{X′}.
Proof. To prove the inequality δ(Q_X, Q_{X′}) ≤ min_{P_{XX′}} P_{XX′}[X ≠ X′], we use (2.5) and the fact that, for any joint probability mass function P_{XX′}, min[P_X(x), P_{X′}(x)] ≥ P_{XX′}(x, x), which gives

δ(P_X, P_{X′}) = 1 − Σ_{x∈X} min[P_X(x), P_{X′}(x)] ≤ 1 − Σ_{x∈X} P_{XX′}(x, x) = P_{XX′}[X ≠ X′] .
We thus have δ(P_X, P_{X′}) ≤ P_{XX′}[X ≠ X′] for any probability mass function P_{XX′}. Taking the minimum over all P_{XX′} with P_X = Q_X and P_{X′} = Q_{X′} gives the desired inequality.
The proof of the opposite inequality is given in the exercises.
An important property of the trace distance is that it can only decrease under the
operation of taking marginals.
Lemma 2.3.2. For any two probability mass functions P_{XY} and Q_{XY},

δ(P_X, Q_X) ≤ δ(P_{XY}, Q_{XY}) .
Proof. Applying the triangle inequality for the absolute value, we find
(1/2) Σ_{x,y} |P_{XY}(x, y) − Q_{XY}(x, y)| ≥ (1/2) Σ_x |Σ_y (P_{XY}(x, y) − Q_{XY}(x, y))| = (1/2) Σ_x |P_X(x) − Q_X(x)| ,

where the equality follows from the definition (2.3) of the marginals. The assertion then follows from the definition of the
trace distance.
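The monotonicity under marginals is easy to check numerically. The following is a minimal sketch in Python (assuming numpy; the joint distributions P and Q are invented for illustration):

    import numpy as np

    def trace_distance(p, q):
        # delta(P, Q) = 1/2 * sum_x |P(x) - Q(x)|
        return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

    # Joint distributions P_XY and Q_XY on a 2x3 alphabet (rows: x, columns: y)
    P = np.array([[0.1, 0.2, 0.1],
                  [0.3, 0.2, 0.1]])
    Q = np.array([[0.2, 0.1, 0.2],
                  [0.1, 0.3, 0.1]])

    # Marginals P_X and Q_X are obtained by summing over y, cf. (2.3)
    d_joint = trace_distance(P.ravel(), Q.ravel())
    d_marginal = trace_distance(P.sum(axis=1), Q.sum(axis=1))
    print(d_joint, d_marginal)    # 0.3 and 0.1
    assert d_marginal <= d_joint  # Lemma 2.3.2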
The i.i.d. property thus characterizes situations where a certain process is repeated n times
independently. In the context of information theory, the i.i.d. property is often used to
describe the statistics of noise, e.g., in repeated uses of a communication channel (see
Section 3.2).
The law of large numbers characterizes the “typical behavior” of real-valued i.i.d. ran-
dom variables X1 , . . . , Xn in the limit of large n. It usually comes in two versions, called
the weak and the strong law of large numbers. As the name suggests, the latter implies the former.
Let µ = ⟨X_i⟩ be the expectation value of X_i (which, by the i.i.d. assumption, is the same for all X_1, . . . , X_n), and let

Z_n := (1/n) Σ_{i=1}^{n} X_i
be the sample mean. Then, according to the weak law of large numbers, the probability
that Zn is ε-close to µ for any positive ε converges to one, i.e.,
lim_{n→∞} P[ |Z_n − µ| < ε ] = 1   ∀ε > 0 . (2.6)
The weak law of large numbers will be sufficient for our purposes. However, for com-
pleteness, we mention the strong law of large numbers which says that Zn converges to µ
with probability 1,
P[ lim_{n→∞} Z_n = µ ] = 1 .
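The weak law (2.6) is easy to observe in a simulation; the following sketch (a fair-coin example of our choosing, assuming numpy) estimates P[|Z_n − µ| < ε] for growing n:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, eps = 0.5, 0.05   # expectation of a fair coin flip and accuracy eps
    for n in [10, 100, 1000, 10000]:
        # 2000 independent experiments, each a sample mean of n coin flips
        Z = rng.integers(0, 2, size=(2000, n)).mean(axis=1)
        print(n, np.mean(np.abs(Z - mu) < eps))  # tends to 1 as n grows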
2.3.8 Channels
A channel p is a probabilistic mapping that assigns to each value of an input alphabet X a value of the output alphabet Y. Formally, p is a function

p : X × Y → R_+
(x, y) ↦ p(y|x)

such that p(·|x) is a probability mass function on Y for each x ∈ X, i.e., Σ_{y∈Y} p(y|x) = 1.
Channels can be seen as abstractions of any (classical) physical device that takes an
input X and outputs Y . A typical example for such a device is, of course, a communication
channel, e.g., an optical fiber, where X is the input provided by a sender and where Y is
the (possibly noisy) version of X delivered to a receiver. A practically relevant question
9 It is easy to verify that PXY is indeed a probability mass function.
then is how much information one can transmit reliably over such a channel, using an
appropriate encoding.
Channels not only carry information over space, but also over time. Typical
examples are memory devices, e.g., a hard drive or a CD (where one wants to model the
errors introduced between storage and reading out of data). Here, the question is how
much redundancy we need to introduce in the stored data in order to correct these errors.
The notion of channels is illustrated by the following two examples.
Example 2.3.3. The channel depicted in Fig. 2.2 maps the input 0 with equal probability
to either 0 or 1; the input 1 is always mapped to 2. The channel has the property that its
input is uniquely determined by its output. As we shall see later, such a channel allows one to reliably transmit one classical bit of information.
Example 2.3.4. The channel shown in Fig. 2.3 maps each possible input with equal
probability to either 0 or 1. The output is thus completely independent of the input. Such
a channel is obviously not useful to transmit information.
The notion of i.i.d. random variables naturally translates to channels. A channel p_n from X × ··· × X to Y × ··· × Y is said to be i.i.d. if it can be written as p_n = p^{×n} := p × ··· × p.
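On finite alphabets a channel is conveniently represented as a stochastic matrix. The sketch below (the array layout and names are ours) encodes the channel of Example 2.3.3 and samples from it:

    import numpy as np

    # Row x holds the distribution p(.|x); inputs {0,1}, outputs {0,1,2}
    p = np.array([[0.5, 0.5, 0.0],   # input 0 -> output 0 or 1, probability 1/2 each
                  [0.0, 0.0, 1.0]])  # input 1 -> output 2 with certainty

    rng = np.random.default_rng(1)
    def apply_channel(x):
        return rng.choice(3, p=p[x])

    print([apply_channel(x) for x in [0, 0, 1, 1]])
    # An i.i.d. channel p^{xn} would apply p independently to each of n inputs.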
3 Information Theory
3.1 Quantifying information
The main object of interest in information theory, of course, is information and the way it
is processed. The quantification of information thus plays a central role. The aim of this
section is to introduce some notions and techniques that are needed for the quantitative
study of classical information, i.e., information that can be represented by the state of a
classical (in contrast to quantum) system.
generating the exact sequence of bits X is, most likely, simply the program that has the
whole sequence already stored.1
Despite the elegance of its definition, the algorithmic entropy has a fundamental disad-
vantage when being used as a measure for uncertainty: it is not computable. This means
that there cannot exist a method (e.g., a computer program) that estimates the algorith-
mic complexity of a given string X. This deficiency as well as its implications2 render the
algorithmic complexity unsuitable as a measure of entropy for most practical applications.
In this course, we will consider a different approach which is based on ideas developed in
thermodynamics. The approach was proposed in 1948 by Shannon [3] and, since then,
has proved highly successful, with numerous applications in various scientific disciplines
(including, of course, physics). It can also be seen as the theoretical foundation of modern
information and communication technology. Today, Shannon’s theory is viewed as the
standard approach to information theory.
In contrast to the algorithmic approach described above, where the entropy is defined
as a function of the actual data X, the information measures used in Shannon’s theory
depend on the probability distribution of the data. More precisely, the entropy of a random variable X is a measure for the likelihood with which its possible values occur. Applied to the above
compression problem, this means that one needs to assign a probability mass function to
the data to be compressed. The method used for compression might then be optimized
for the particular probability mass function assigned to the data.
3.1.2 Entropy of events

We first define a measure of uncertainty for events, i.e., a function

H : E → R ∪ {∞}
E ↦ H(E) .
For the following, we assume that the events are defined on a probability space with
probability measure P . The function H should then satisfy the following properties.
1. Independence of the representation: H(E) only depends on the probability P [E] of
the event E.
2. Continuity: H is continuous in the probability measure P (relative to the topology
induced by the trace distance).
3. Additivity: H(E ∩ E 0 ) = H(E) + H(E 0 ) for two independent events E and E 0 .
4. Normalization: H(E) = 1 for E with P[E] = 1/2.
1 In fact, a (deterministic) computer can only generate pseudo-random numbers, i.e., numbers that cannot be distinguished (using any efficient method) from true random numbers.
2 An immediate implication is that there cannot exist a compression method that takes as input data X
The axioms appear natural if we think of H as a measure of uncertainty. Indeed,
Axiom 3 reflects the idea that our total uncertainty about two independent events is
simply the sum of the uncertainty about the individual events. We also note that the
normalization imposed by Axiom 4 can be chosen arbitrarily; the convention, however, is
to assign entropy 1 to the event corresponding to the outcome of a fair coin flip.
The axioms uniquely define the function H.
Lemma 3.1.3. The function H satisfies the above axioms if and only if it has the form

H(E) = −log₂ P[E] .
Proof. It is straightforward that H as defined in the lemma satisfies all the axioms. It
thus remains to show that the definition is unique. For this, we make the ansatz

H(E) = f(−log₂ P[E]) ,

where f is an arbitrary function from R_+ ∪ {∞} to R ∪ {∞}. We note that, apart from taking into account the first axiom, this is no restriction of generality, because any possible function of P[E] can be written in this form.
From the continuity axiom, it follows that f must be continuous. Furthermore, inserting the additivity axiom for independent events E and E′ with probabilities p and p′, respectively, and setting a := −log₂ p and a′ := −log₂ p′, gives

f(a) + f(a′) = f(a + a′) .
Together with the continuity axiom, we conclude that f is linear, i.e., f (x) = γx for some
γ ∈ R. The normalization axiom then implies that γ = 1.
3.1.3 Entropy of random variables

Let X be a random variable with alphabet X and let h(x) := −log₂ P_X(x). Then the Shannon entropy is defined as the expectation value of h(X), i.e.,

H(X) := ⟨h(X)⟩ = −Σ_{x∈X} P_X(x) log₂ P_X(x) .
If the probability measure P is unclear from the context, we will include it in the notation
as a subscript, i.e., we write H(X)P .
Similarly, the min-entropy, denoted Hmin, is defined as the minimum entropy H(E_x) of the events E_x := {X = x}, i.e.,

Hmin(X) := min_{x∈X} H(E_x) = −log₂ max_{x∈X} P_X(x) .

A slightly different entropy measure is the max-entropy, denoted Hmax. Despite the similarity of its name to the above measure, the definition does not rely on the entropy of events, but rather on the cardinality of the support supp P_X := {x ∈ X : P_X(x) > 0} of P_X,

Hmax(X) := log₂ |supp P_X| .
The three entropy measures satisfy Hmin(X) ≤ H(X) ≤ Hmax(X), with equality if the probability mass function P_X is flat. Furthermore, they have various
properties in common. The following holds for H, Hmin , and Hmax ; to keep the notation
simple, however, we only write H.
1. H is invariant under permutations of the elements, i.e., H(X) = H(π(X)), for any
permutation π.
2. H is nonnegative.3
3. H is upper bounded by the logarithm of the alphabet size, i.e., H(X) ≤ log2 |X |.
4. H equals zero if and only if exactly one of the entries of PX equals one, i.e., if
|suppPX | = 1.
3 Note that this will no longer be true for the conditional entropy of quantum states.
The conditional Shannon entropy of X given Y is defined as the expectation value of

h(x|y) := −log₂ P_{X|Y=y}(x) , (3.3)

i.e., H(X|Y) := ⟨h(X|Y)⟩. For the definition of the min-entropy of X given Y, the expectation value is replaced by a minimum, i.e.,

Hmin(X|Y) := min_{x∈X, y∈Y} h(x|y) = −log₂ max_{x∈X, y∈Y} P_{X|Y=y}(x) .
The conditional entropies H, Hmin , and Hmax satisfy the rules listed in Section 3.1.3.
Furthermore, the entropies can only decrease when conditioning on an additional random
variable Z, i.e.,
H(X|Y ) ≥ H(X|Y Z) . (3.4)
This relation is also known as strong subadditivity and we will prove it in the more general
quantum case.
Finally, it is straightforward to verify that the Shannon entropy H satisfies the chain
rule
H(X|Y Z) = H(XY |Z) − H(Y |Z) .
In particular, if we omit the random variable Z, we get
H(X|Y ) = H(XY ) − H(Y )
that is, the uncertainty of X given Y can be seen as the uncertainty about the pair (X, Y )
minus the uncertainty about Y . We note here that a slightly modified version of the chain
rule also holds for Hmin and Hmax , but we will not go further into this.
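These quantities are easy to compute; the sketch below (the joint distribution is our example) evaluates H, Hmin and Hmax and confirms the chain rule H(X|Y) = H(XY) − H(Y):

    import numpy as np

    def H(p):
        p = np.asarray(p).ravel()
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    P_XY = np.array([[0.4, 0.1],     # rows: x, columns: y
                     [0.2, 0.3]])
    P_Y = P_XY.sum(axis=0)

    # H(X|Y) via the chain rule and via the expectation of h(x|y), cf. (3.3)
    H_chain = H(P_XY) - H(P_Y)
    H_direct = (P_XY * -np.log2(P_XY / P_Y)).sum()
    print(np.isclose(H_chain, H_direct))          # True

    H_min = -np.log2(P_XY.max())                  # min-entropy of the pair (X, Y)
    H_max = np.log2(np.count_nonzero(P_XY))       # max-entropy of the pair (X, Y)
    print(H_min, H(P_XY), H_max)                  # H_min <= H <= H_max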
3.1.6 Smooth min- and max- entropies
The dependency of the min- and max-entropy of a random variable on the underlying
probability mass functions is discontinuous. To see this, consider a random variable X
with alphabet {1, . . . , 2^ℓ} and probability mass function P_X^ε given by

P_X^ε(1) = 1 − ε
P_X^ε(x) = ε/(2^ℓ − 1)   if x > 1 ,
where ε ∈ [0, 1]. It is easy to see that, for ε = 0,

Hmax(X)_{P_X^0} = 0 ,

whereas, for any ε > 0,

Hmax(X)_{P_X^ε} = ℓ .

Note also that the trace distance between the two distributions satisfies δ(P_X^0, P_X^ε) = ε.
That is, an arbitrarily small change in the distribution can change the entropy Hmax (X)
by an arbitrary amount. In contrast, a small change of the underlying probability mass
function is often irrelevant in applications. This motivates the following definition of
smooth min- and max-entropies, which extends the above definition.
Let X and Y be random variables with joint probability mass function PXY , and let
ε ≥ 0. The ε-smooth min-entropy of X conditioned on Y is defined as

Hmin^ε(X|Y) := max_{Q_{XY} ∈ B^ε(P_{XY})} Hmin(X|Y)_{Q_{XY}} ,

where the maximum ranges over the ε-ball B^ε(P_{XY}) of probability mass functions Q_{XY} satisfying δ(P_{XY}, Q_{XY}) ≤ ε. Similarly, the ε-smooth max-entropy of X conditioned on Y is defined as

Hmax^ε(X|Y) := min_{Q_{XY} ∈ B^ε(P_{XY})} Hmax(X|Y)_{Q_{XY}} .
Note that the original definitions of Hmin and Hmax can be seen as the special case
where ε = 0.
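The discontinuity that motivates the smoothing is easily reproduced; a sketch (with ℓ = 8, our choice):

    import numpy as np

    l, eps = 8, 1e-6
    P0 = np.zeros(2**l); P0[0] = 1.0                       # P_X^0
    Pe = np.full(2**l, eps / (2**l - 1)); Pe[0] = 1 - eps  # P_X^eps

    Hmax = lambda p: np.log2(np.count_nonzero(p))
    print(Hmax(P0), Hmax(Pe))   # 0.0 versus 8.0, although delta(P0, Pe) = eps
    # Since P_X^0 lies in the eps-ball around P_X^eps, the smooth max-entropy
    # of the perturbed distribution satisfies Hmax^eps(X) = 0.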
Proof. The lemma is a consequence of the law of large numbers (see Section 2.3.7), applied
to the random variables Zi := h(Xi |Yi ), for h(x|y) defined by (3.3). More details are given
in the exercises.
The transmission is successful if M = M′. More generally, for any fixed encoding and decoding procedures enc_ℓ and dec_ℓ, and for any message m ∈ {0, 1}^ℓ, we can define
p_err^{enc_ℓ,dec_ℓ}(m) := P[dec_ℓ ◦ p ◦ enc_ℓ(M) ≠ M | M = m]

as the probability that the decoded message M′ := dec_ℓ ◦ p ◦ enc_ℓ(M) generated by the process (3.5) does not coincide with M.5
In the following, we analyze the maximum number of message bits ℓ that can be transmitted in one use of the channel p if we tolerate a maximum error probability ε,

ℓ^ε(p) := max{ ℓ ∈ N : ∃ enc_ℓ, dec_ℓ such that p_err^{enc_ℓ,dec_ℓ}(m) ≤ ε for all m ∈ {0, 1}^ℓ } .
Theorem 3.2.1. For any channel p and any ε ≥ 0,

ℓ^ε(p) ≥ max_{P_X} (Hmin(X) − Hmax(X|Y)) − log₂(1/ε) − 3 ,
where the entropies on the right hand side are evaluated for the random variables X and
Y jointly distributed according to PXY = PX p.6
The proof idea is illustrated in Fig. 3.1.
Figure 3.1: The figure illustrates the proof idea of the channel coding theorem. The range of the encoding function enc_ℓ is called the code and its elements are the codewords.
Proof. The argument is based on a randomized construction of the encoding function. Let
PX be the distribution that maximizes the right hand side of the claim of the theorem
and let ℓ be given by

ℓ = ⌊Hmin(X) − Hmax(X|Y) − log₂(2/ε)⌋ . (3.6)
In a first step, we consider an encoding function enc_ℓ chosen at random by assigning to each m ∈ {0, 1}^ℓ a value enc_ℓ(m) := X where X is chosen according to P_X. We then show that for a decoding function dec_ℓ that maps y ∈ Y to an arbitrary value m′ ∈ {0, 1}^ℓ that is compatible with y, i.e., enc_ℓ(m′) ∈ supp P_{X|Y=y}, the error probability for a message M chosen uniformly at random satisfies
p_err^{enc_ℓ,dec_ℓ}(M) = P[dec_ℓ ◦ p ◦ enc_ℓ(M) ≠ M] ≤ ε/2 . (3.7)
In a second step, we use this bound to show that there exist enc′_{ℓ−1} and dec′_{ℓ−1} such that

p_err^{enc′_{ℓ−1},dec′_{ℓ−1}}(m) ≤ ε   ∀m ∈ {0, 1}^{ℓ−1} . (3.8)
We then have
ℓ^ε(p) ≥ ℓ − 1
= ⌊Hmin(X) − Hmax(X|Y) − log₂(2/ε)⌋ − 1
≥ Hmin(X) − Hmax(X|Y) − log₂(1/ε) − 3 .
To prove (3.7), let enc_ℓ and M be chosen at random as described, let Y := p ◦ enc_ℓ(M) be the channel output, and let M′ := dec_ℓ(Y) be the decoded message. We then consider any pair (m, y) such that P_{MY}(m, y) > 0. It is easy to see that, conditioned on the event that (M, Y) = (m, y), the decoding function dec_ℓ described above can only fail, i.e., produce an M′ ≠ M, if there exists m′ ≠ m such that enc_ℓ(m′) ∈ supp P_{X|Y=y}. Hence, by the union bound, the probability that the decoding fails is bounded by

P[M ≠ M′ | M = m, Y = y] ≤ Σ_{m′≠m} P[enc_ℓ(m′) ∈ supp P_{X|Y=y}] .

Because, by construction, enc_ℓ(m′) is a value chosen at random according to the distribution P_X, the probability in the sum on the right hand side of the inequality is given by

P[enc_ℓ(m′) ∈ supp P_{X|Y=y}] = Σ_{x ∈ supp P_{X|Y=y}} P_X(x) ≤ |supp P_{X|Y=y}| · max_{x} P_X(x) ≤ 2^{Hmax(X|Y) − Hmin(X)} ,
where the last two inequalities follow from the definitions of Hmax and Hmin. Combining this with the above and observing that there are only 2^ℓ − 1 values m′ ≠ m, we find

P[M ≠ M′ | M = m, Y = y] ≤ 2^{ℓ − (Hmin(X) − Hmax(X|Y))} ≤ ε/2 .

Because this holds for any m and y, we have

P[M ≠ M′] ≤ max_{m,y} P[M ≠ M′ | M = m, Y = y] ≤ ε/2 .
This immediately implies that (3.7) holds on average over all choices of enc` . But this
also implies that there exists at least one specific choice for enc` such that (3.7) holds.
It remains to show inequality (3.8). For this, we divide the set of messages {0, 1}^ℓ into two equally large sets M and M̄ such that p_err^{enc_ℓ,dec_ℓ}(m) ≤ p_err^{enc_ℓ,dec_ℓ}(m̄) for any m ∈ M and m̄ ∈ M̄. We then have

max_{m∈M} p_err^{enc_ℓ,dec_ℓ}(m) ≤ min_{m̄∈M̄} p_err^{enc_ℓ,dec_ℓ}(m̄) ≤ 2^{−(ℓ−1)} Σ_{m̄∈M̄} p_err^{enc_ℓ,dec_ℓ}(m̄) .
Using (3.7), we conclude

max_{m∈M} p_err^{enc_ℓ,dec_ℓ}(m) ≤ 2 Σ_{m∈{0,1}^ℓ} 2^{−ℓ} p_err^{enc_ℓ,dec_ℓ}(m) = 2 p_err^{enc_ℓ,dec_ℓ}(M) ≤ ε .
Inequality (3.8) then follows by defining enc′_{ℓ−1} as the encoding function enc_ℓ restricted to M, and adapting the decoding function accordingly.
where the entropies on the right hand side are evaluated for P_{XY} := P_X p.

3.2.4 The converse

A key ingredient for the converse is the data processing inequality

I(U : W) ≤ I(U : V) ,
which holds for any random variables such that U ↔ V ↔ W is a Markov chain. The inequality is proved by

I(U : W) ≤ I(U : W) + I(U : V|W) = I(U : V W) = I(U : V) + I(U : W|V) = I(U : V) ,

where the first inequality holds because the mutual information cannot be negative and the last equality follows because I(U : W|V) = 0 (see end of Section 3.1.5). The remaining equalities are essentially rewritings of the chain rule (for the Shannon entropy).
Let now M, X, Y, and M′ be defined as in (3.5). If the decoding is successful then M = M′, which implies

I(M : M′) = H(M) .

Combining this with (3.11) and assuming that the message M is uniformly distributed over the set {0, 1}^ℓ of bitstrings of length ℓ gives

ℓ = H(M) = I(M : M′) ≤ max_{P_X} I(X : Y) .
It is straightforward to verify that the statement still holds approximately if ℓ on the left hand side is replaced by ℓ^ε, for some small decoding error ε > 0. Taking the limits as in (3.10) finally gives the converse: the rate of reliable communication through the channel cannot exceed the maximum mutual information between its input and output.
4 Quantum States and Operations
The mathematical formalism used in quantum information theory to describe quantum
mechanical systems is in many ways more general than that of typical introductory books
on quantum mechanics. This is why we devote a whole chapter to it. The main concepts
to be treated in the following are density operators, which represent the state of a system,
as well as positive operator valued measures (POVMs) and completely positive maps (CPMs), which
describe measurements and, more generally, the evolution of a system.
4.1 Preliminaries
4.1.1 Hilbert spaces and operators on them
An inner product space is a vector space (over R or C) equipped with an inner product
(·, ·). A Hilbert space H is an inner product space such that the metric defined by the norm ‖α‖ ≡ √(α, α) is complete, i.e., every Cauchy sequence is convergent. We will often deal with finite-dimensional spaces, where the completeness condition always holds, i.e., inner product spaces are equivalent to Hilbert spaces.
We denote the set of homomorphisms (i.e., the linear maps) from a Hilbert space H to a
Hilbert space H′ by Hom(H, H′). Furthermore, End(H) is the set of endomorphisms (i.e., the homomorphisms from a space to itself) on H, that is, End(H) = Hom(H, H). The
identity operator α 7→ α that maps any vector α ∈ H to itself is denoted by id.
The adjoint of a homomorphism S ∈ Hom(H, H0 ), denoted S ∗ , is the unique operator
in Hom(H0 , H) such that
(α0 , Sα) = (S ∗ α0 , α) ,
for any α ∈ H and α0 ∈ H0 . In particular, we have (S ∗ )∗ = S. If S is represented as a
matrix, then the adjoint operation can be thought of as the conjugate transpose.
In the following, we list some properties of endomorphisms S ∈ End(H).
• S is normal if SS ∗ = S ∗ S.
• S is unitary if SS ∗ = S ∗ S = id. Unitary operators S are always normal.
• S is Hermitian if S ∗ = S. Hermitian operators are always normal.
• S is positive if (α, Sα) ≥ 0 for all α ∈ H. Positive operators are always Hermitian.
We will sometimes write S ≥ 0 to express that S is positive.
• S is a projector if SS = S and S = S∗. Projectors are always positive.
Given an orthonormal basis {ei }i of H, we also say that S is diagonal with respect to {ei }i
if the matrix (Si,j ) defined by the elements Si,j = (ei , Sej ) is diagonal.
4.1.2 The bra-ket notation
In this script, we will make extensive use of a variant of Dirac’s bra-ket notation, where
vectors are interpreted as operators. More precisely, we identify any vector α ∈ H with a homomorphism |α⟩ ∈ Hom(C, H), called ket, and defined as

|α⟩ : γ ↦ γα
for any γ ∈ C. The adjoint |αi∗ of this mapping is called bra and denoted by hα|. It is
easy to see that hα| is an element of the dual space H∗ := Hom(H, C), namely the linear
functional defined by
hα| : β 7→ (α, β)
for any β ∈ H.
Using this notation, the concatenation hα||βi of a bra hα| ∈ Hom(H, C) with a ket
|βi ∈ Hom(C, H) results in an element of Hom(C, C), which can be identified with C. It
follows immediately from the above definitions that, for any α, β ∈ H,
hα||βi ≡ (α, β) .
More generally, any operator S ∈ Hom(H, H′) can be written as a linear combination of the form S = Σ_i |β_i⟩⟨α_i| for some families of vectors {α_i}_i and {β_i}_i. For example, the identity id ∈ End(H) can be written as

id = Σ_i |e_i⟩⟨e_i| ,

where {e_i}_i is an orthonormal basis of H.
4.1.3 Tensor products

The tensor product H_A ⊗ H_B of two Hilbert spaces H_A and H_B is spanned by products α ⊗ β of vectors α ∈ H_A and β ∈ H_B. The product is bilinear; in particular,

• α ⊗ (β + β′) = α ⊗ β + α ⊗ β′
• 0 ⊗ β = α ⊗ 0 = 0
for any α, α′ ∈ H_A and β, β′ ∈ H_B, where 0 denotes the zero vector. Furthermore, the inner product of H_A ⊗ H_B is defined by the linear extension (and completion) of

(α ⊗ β, α′ ⊗ β′) := (α, α′)(β, β′)

for any α, α′ ∈ H_A and β, β′ ∈ H_B. Similarly, for S ∈ Hom(H_A, H′_A) and T ∈ Hom(H_B, H′_B), the tensor product S ⊗ T is defined by the linear extension of

(S ⊗ T)(α ⊗ β) := (Sα) ⊗ (Tβ) . (4.1)

The space spanned by the products S ⊗ T can be canonically identified1 with the tensor product of the spaces of the homomorphisms, i.e.,

Hom(H_A, H′_A) ⊗ Hom(H_B, H′_B) ≅ Hom(H_A ⊗ H_B, H′_A ⊗ H′_B) . (4.2)
In particular, for vectors α ∈ H_A and β ∈ H_B,

|α⟩ ⊗ |β⟩ = |α ⊗ β⟩ .

4.1.4 Trace and partial trace

The trace of an operator S ∈ End(H) is defined by2

tr(S) := Σ_i ⟨e_i|S|e_i⟩ ,

where {e_i}_i is any orthonormal basis of H. The trace is well defined because the above expression is independent of the choice of the basis, as one can easily verify.
The trace operation tr is obviously linear, i.e.,

tr(uS + vT) = u tr(S) + v tr(T) ,

for any S, T ∈ End(H) and u, v ∈ C. It also commutes with the operation of taking the adjoint,3

tr(S∗) = tr(S)∗ .

Furthermore, the trace is cyclic, i.e.,

tr(ST) = tr(T S) .
1 That is, the mapping defined by (4.1) is an isomorphism between these two vector spaces.
2 More precisely, the trace is only defined for trace class operators over a separable Hilbert space. However,
all endomorphisms on a finite-dimensional Hilbert space are trace class operators.
3 The adjoint of a complex number γ ∈ C is simply its complex conjugate.
Also, it is easy to verify4 that the trace tr(S) of a positive operator S ≥ 0 is positive.
More generally
(S ≥ 0) ∧ (T ≥ 0) =⇒ tr(ST ) ≥ 0 . (4.3)
The partial trace tr_B is the mapping from End(H_A ⊗ H_B) onto End(H_A) defined as the linear extension of6

tr_B : S ⊗ T ↦ tr(T) S ,

for any S ∈ End(H_A) and T ∈ End(H_B). It satisfies

tr(T_A tr_B(S_AB)) = tr((T_A ⊗ id_B) S_AB) (4.4)

for any T_A ∈ End(H_A), and

tr_B((T_A ⊗ id_B) S_AB) = T_A tr_B(S_AB) . (4.5)
We will also make use of the property that the trace on a bipartite system can be decomposed into partial traces on the individual subsystems, i.e.,

tr(S_AB) = tr_A(tr_B(S_AB)) . (4.6)

4.1.5 Decompositions of operators and vectors

Spectral decomposition. Any normal operator S ∈ End(H) can be written as

S = U D U∗ ,

where U is unitary and D is diagonal with respect to a fixed orthonormal basis.
4 The assertion can, for instance, be proved using the spectral decomposition of S and T (see below for
a review of the spectral decomposition).
5 Here and in the following, we will use subscripts to indicate the space on which an operator acts.
6 Alternatively, the partial trace tr_B can be defined as a product mapping I ⊗ tr where I is the identity operation on End(H_A) and tr is the trace mapping elements of End(H_B) to End(C). Because the trace is a completely positive map (see definition below) the same is true for the partial trace.
7 More generally, the partial trace commutes with any mapping that acts like the identity on End(H_B).
The spectral decomposition implies that, for any normal S ∈ End(H), there exists a
basis {ei }i of H with respect to which S is diagonal. That is, S can be written as
S = Σ_i α_i |e_i⟩⟨e_i| , (4.7)
Polar decomposition. Let S ∈ End(H). Then there exists a unitary U ∈ End(H) such that

S = √(SS∗) U

and

S = U √(S∗S) .
Singular value decomposition. For any S ∈ End(H) there exist unitaries U and V and a diagonal operator D with nonnegative entries such that

S = V D U .

In particular, for any S ∈ Hom(H, H′), there exist bases {e_i}_i of H and {e′_i}_i of H′ such that the matrix defined by the elements (e′_i, S e_j) is diagonal.
Schmidt decomposition. Any vector Ψ ∈ H_A ⊗ H_B can be written in the form

Ψ = Σ_i γ_i e_i ⊗ e′_i ,

where e_i ∈ H_A and e′_i ∈ H_B are eigenvectors of the operators ρ_A := tr_B(|Ψ⟩⟨Ψ|) and ρ_B := tr_A(|Ψ⟩⟨Ψ|), respectively, and where γ_i² are the corresponding eigenvalues. In particular, the existence of the Schmidt decomposition implies that ρ_A and ρ_B have the same nonzero eigenvalues.
4.1.6 Operator norms and the Hilbert-Schmidt inner product
The Hilbert-Schmidt inner product between two operators S, T ∈ End(H) is defined by
(S, T ) := tr(S ∗ T ) .
The induced norm ‖S‖₂ := √(S, S) is called Hilbert-Schmidt norm. If S is normal with spectral decomposition S = Σ_i α_i |e_i⟩⟨e_i| then

‖S‖₂ = √(Σ_i |α_i|²) .
The following lemma establishes the implication (4.3) stated earlier.

Lemma. Let S, T ∈ End(H) be positive. Then

tr(ST) ≥ 0 .

Proof. If S is positive we have S = √S √S and, likewise, T = √T √T. Hence, using the cyclicity of the trace, we have

tr(ST) = tr(√S √S √T √T) = tr(V∗ V) ,

where V := √S √T. Because the trace of a positive operator is positive, it suffices to show that V∗V ≥ 0. This, however, follows from the fact that, for any φ ∈ H,

⟨φ|V∗V|φ⟩ = ‖V φ‖² ≥ 0 .
The trace norm of an operator S ∈ End(H) is defined by

‖S‖₁ := tr|S| ,

where

|S| := √(S∗S) .

If S is normal with spectral decomposition S = Σ_i α_i |e_i⟩⟨e_i| then

‖S‖₁ = Σ_i |α_i| .
The trace norm admits the variational characterization

‖S‖₁ = max_U |tr(US)| , (4.8)

where the maximum ranges over all unitaries U on H.

Proof. We need to show that, for any unitary U, |tr(US)| ≤ tr|S|; writing S = V|S| in polar form, this follows from the Cauchy-Schwarz inequality for the Hilbert-Schmidt inner product, which proves (4.8). Finally, it is easy to see that equality holds for U := V∗.
4.1.7 The vector space of Hermitian operators

The Hermitian operators on a Hilbert space H form a real vector space of dimension

dim Herm(H) = dim(H)² . (4.9)

Moreover, with the tensor product taken over the reals,

Herm(H_A) ⊗ Herm(H_B) ≅ Herm(H_A ⊗ H_B) . (4.10)
To see this, consider the canonical mapping from Herm(H_A) ⊗ Herm(H_B) to Herm(H_A ⊗ H_B) defined by (4.1). It is easy to verify that this mapping is injective. Furthermore, because by (4.9) the dimension of both spaces equals dim(H_A)² dim(H_B)², it is a bijection, which proves (4.10).
in the following, e.g., the postulates of quantum mechanics, might appear to lack a clear
motivation.
In this section, we pursue one of the standard approaches to quantum mechanics. It
is based on a number of postulates about the states of physical systems as well as their
evolution. (For more details, we refer to Section 2 of [1], where an equivalent approach is
described.) The postulates are as follows:
1. States: The state of an isolated physical system is represented by a normalized vector φ of a Hilbert space H, called the state space of the system.

2. Composition: The state space of a composite system with state spaces H_A and H_B is the tensor product H_A ⊗ H_B; in particular, if the subsystems are prepared independently in states φ ∈ H_A and φ′ ∈ H_B, the joint state is

Ψ = φ ⊗ φ′ ∈ H_A ⊗ H_B .
3. Evolutions: For any possible evolution of an isolated physical system with state space H and for any fixed time interval [t₀, t₁] there exists a unitary U describing the mapping of states φ ∈ H at time t₀ to states

φ′ = U φ

at time t₁.
4. Measurements: Any measurement on a system with state space H is specified by an observable, i.e., a Hermitian operator O ∈ Herm(H). When a system in state φ ∈ H is measured with respect to O, the outcome is an eigenvalue x of O, occurring with probability

P_X(x) = ‖P_x φ‖² ,

where P_x denotes the projector onto the eigenspace belonging to the eigenvalue x, i.e., O = Σ_x x P_x. Finally, the state φ′_x of the system after the measurement, conditioned on the event that the outcome is x, equals

φ′_x := (1/√(P_X(x))) P_x φ .
a scenario where a system might be in two possible states φ0 or φ1 , chosen according to
a certain probability distribution. Another simple example is a system consisting of two
correlated parts A and B in a state
Ψ = √(1/2) (e₀ ⊗ e₀ + e₁ ⊗ e₁) ∈ H_A ⊗ H_B , (4.11)
where {e0 , e1 } are orthonormal vectors in HA = HB . From the point of view of an
observer that has no access to system B, the state of A does not correspond to a fixed
vector φ ∈ HA , but is rather described by a mixture of such states. In this section,
we introduce the density operator formalism, which allows for a simple and convenient
characterization of such situations.
4.3.1 Density operators — definition and properties

A density operator ρ on H is a positive operator ρ ∈ End(H) with unit trace, tr(ρ) = 1. It is called pure if it has rank one, i.e., if it is of the form ρ = |φ⟩⟨φ| for some normalized vector φ ∈ H. Any density operator can be written in terms of a spectral decomposition,

ρ = Σ_x P_X(x) |e_x⟩⟨e_x| ,

where P_X is the probability mass function defined by the eigenvalues P_X(x) of ρ and {e_x}_x are the corresponding eigenvectors. Given this representation, it is easy to see that a density operator is pure if and only if exactly one of the eigenvalues equals 1 whereas the others are 0. In particular, we have the following lemma.
Lemma 4.3.2. A density operator ρ is pure if and only if tr(ρ2 ) = 1.
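Lemma 4.3.2 can be checked numerically; the sketch below (the random-state construction is ours, assuming numpy) compares tr(ρ²) for a mixed and a pure state:

    import numpy as np

    rng = np.random.default_rng(2)

    def random_density_operator(d, rank):
        # rho = A A* / tr(A A*) is positive with unit trace by construction
        A = rng.normal(size=(d, rank)) + 1j * rng.normal(size=(d, rank))
        rho = A @ A.conj().T
        return rho / np.trace(rho)

    purity = lambda rho: np.trace(rho @ rho).real
    rho_mixed = random_density_operator(4, rank=3)
    rho_pure = random_density_operator(4, rank=1)
    print(purity(rho_mixed) < 1, np.isclose(purity(rho_pure), 1.0))  # True True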
2. Composition: The states of a composite system with state spaces HA and HB are
represented as density operators on HA ⊗ HB . Furthermore, if the states of the
individual subsystems are independent of each other and represented by density
operators ρA and ρB , respectively, then the state of the joint system is ρA ⊗ ρB .
3. Evolution: Any isolated evolution of a subsystem of a composite system over a
fixed time interval [t0 , t1 ] corresponds to a unitary on the state space H of the
subsystem. For a composite system with state space HA ⊗HB and isolated evolutions
on both subsystems described by U_A and U_B, respectively, any state ρ_AB at time t₀ is transformed into the state10

ρ′_AB = (U_A ⊗ U_B) ρ_AB (U_A ⊗ U_B)∗

at time t₁.11
It is straightforward to verify that these postulates are indeed compatible with those of
Section 4.2. What is new is merely the fact that the evolution and measurements can be
restricted to individual subsystems of a composite system. As we shall see, this extension
is, however, very powerful because it allows us to examine parts of a subsystem without
the need of keeping track of the state of the entire system.
To illustrate this, consider a measurement of an observable O_A = Σ_x x P_x on the first subsystem after an evolution U_A ⊗ U_B, which yields outcomes distributed according to

P_X(x) = tr((P_x ⊗ id_B)(U_A ⊗ U_B) ρ_AB (U_A ⊗ U_B)∗) ,

where U_B is an arbitrary isolated evolution on H_B. Using rules (4.6) and (4.4), this can be transformed into

P_X(x) = tr(P_x U_A tr_B(ρ_AB) U_A∗) ,
which is independent of UB . Observe now that this expression could be obtained equiv-
alently by simply applying the above postulates to the reduced state ρA := trB (ρAB ). In
other words, the reduced state already fully characterizes all observable properties of the
subsystem HA .
This principle, which is sometimes called locality, plays a crucial role in many information-
theoretic considerations. For example, it implies that it is impossible to influence system
HA by local actions on system HB . In particular, communication between the two subsys-
tems is impossible as long as their evolution is determined by local operations UA ⊗ UB .
In this context, it is important to note that the reduced state ρA of a pure joint state
ρAB is not necessarily pure. For instance, if the joint system is in state ρAB = |ΨihΨ| for
Ψ defined by (4.11) then
ρ_A = (1/2)|e₀⟩⟨e₀| + (1/2)|e₁⟩⟨e₁| , (4.15)
i.e., the density operator ρA is fully mixed. In the next section, we will give an interpre-
tation of non-pure, or mixed, density operators.
Conversely, any mixed density operator can be seen as part of a pure state on a larger
system. More precisely, given ρA on HA , there exists a pure density operator ρAB on a
joint system HA ⊗ HB (where the dimension of HB is at least as large as the rank of ρA )
such that
ρA = trB (ρAB ) (4.16)
A pure density operator ρAB for which (4.16) holds is called a purification of ρA .
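An explicit purification can be obtained from the spectral decomposition of ρ_A: if ρ_A = Σ_x P_X(x)|e_x⟩⟨e_x|, then Ψ := Σ_x √(P_X(x)) e_x ⊗ e_x is a purification. A numerical sketch (basis choice and index conventions are ours):

    import numpy as np

    rho_A = np.array([[0.5, 0.0],
                      [0.0, 0.5]])     # the fully mixed state of (4.15)

    evals, evecs = np.linalg.eigh(rho_A)
    # Psi = sum_x sqrt(P_X(x)) e_x (x) e_x in H_A (x) H_B
    Psi = sum(np.sqrt(lam) * np.kron(evecs[:, i], evecs[:, i])
              for i, lam in enumerate(evals))
    rho_AB = np.outer(Psi, Psi.conj())

    # Partial trace over B recovers rho_A, cf. (4.16)
    d = rho_A.shape[0]
    rho_t = rho_AB.reshape(d, d, d, d)                       # indices (a, b, a', b')
    print(np.allclose(np.einsum('abcb->ac', rho_t), rho_A))  # True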
By linearity, this can be rewritten as
PX (x) = tr(Px UA ρA UA∗ ) . (4.17)
where

ρ_A := Σ_z P_Z(z) ρ_A^z .
where {ez }z is a family of orthonormal vectors on HZ .
More generally, we use the following definition of classicality.
Definition 4.3.3. Let H_A and H_Z be Hilbert spaces and let {e_z}_z be a fixed orthonormal basis of H_Z. Then a density operator ρ_AZ ∈ S(H_A ⊗ H_Z) is said to be classical on H_Z (with respect to {e_z}_z) if12 it is of the form

ρ_AZ = Σ_z P_Z(z) ρ_{A,z} ⊗ |e_z⟩⟨e_z|

for a probability mass function P_Z and a family {ρ_{A,z}}_z of density operators on H_A.
For density operators, the trace distance is defined in analogy to the classical case as δ(ρ, σ) := (1/2) tr|ρ − σ|. If ρ and σ are classical on H with respect to the same basis {e_x}_x, with corresponding probability mass functions P and Q, then

δ(ρ, σ) = δ(P, Q) .

More generally, the following lemma implies that for any (not necessarily classical) ρ and σ there is always a measurement O that “conserves” the trace distance.
Lemma 4.3.5. Let ρ, σ ∈ S(H). Then

δ(ρ, σ) = max_O δ(P, Q) ,

where the maximum ranges over all observables O ∈ Herm(H) and where P and Q are the probability mass functions of the outcomes when applying the measurement described by O to ρ and σ, respectively.
12 If the classical system H_Z itself has a tensor product structure (e.g., H_Z = H_{Z′} ⊗ H_{Z″}) we typically assume that the basis used for defining classical states has the same product structure (i.e., the basis vectors are of the form e = e′ ⊗ e″ with e′ ∈ H_{Z′} and e″ ∈ H_{Z″}).
Proof. Define ∆ := ρ − σ and let ∆ = Σ_i α_i |e_i⟩⟨e_i| be a spectral decomposition. Furthermore, let R and S be positive operators defined by

R := Σ_{i: α_i ≥ 0} α_i |e_i⟩⟨e_i|
S := −Σ_{i: α_i < 0} α_i |e_i⟩⟨e_i| ,
that is,
∆=R−S (4.19)
|∆| = R + S . (4.20)
Finally, let O = Σ_x x P_x be a spectral decomposition of O, where each P_x is a projector onto the eigenspace corresponding to the eigenvalue x. Then

δ(P, Q) = (1/2) Σ_x |P(x) − Q(x)| = (1/2) Σ_x |tr(P_x ρ) − tr(P_x σ)| = (1/2) Σ_x |tr(P_x ∆)| . (4.21)

Using (4.19) and (4.20), this can be bounded by

(1/2) Σ_x |tr(P_x ∆)| ≤ (1/2) Σ_x (tr(P_x R) + tr(P_x S)) = (1/2) tr(R + S) = (1/2) tr|∆| = δ(ρ, σ) . (4.22)
This proves that the maximum max_O δ(P, Q) on the right hand side of the assertion of the lemma cannot be larger than δ(ρ, σ). To see that equality holds, it suffices to verify that the inequality in (4.22) becomes an equality if for any x the projector P_x lies either in the support of R or in the support of S. Such a choice of the projectors is always possible because R and S have mutually orthogonal support.
An implication of Lemma 4.3.5 is that the trace distance between two states ρ and σ can
be interpreted as the maximum distinguishing probability, i.e., the maximum probability
by which a difference between ρ and σ can be detected (see Lemma 2.3.1). Another
consequence of Lemma 4.3.5 is that the trace distance cannot increase under the partial
trace, as stated by the following lemma.
Lemma 4.3.6. Let ρ_AB and σ_AB be bipartite density operators and let ρ_A := tr_B(ρ_AB) and σ_A := tr_B(σ_AB) be the reduced states on the first subsystem. Then

δ(ρ_A, σ_A) ≤ δ(ρ_AB, σ_AB) .
Proof. Let P and Q be the probability mass functions of the outcomes when applying a measurement O_A to ρ_A and σ_A, respectively. Then, for an appropriately chosen O_A, we have according to Lemma 4.3.5

δ(ρ_A, σ_A) = δ(P, Q) .

Consider now the observable O_AB on the joint system defined by O_AB := O_A ⊗ id_B. It follows from property (4.4) of the partial trace that, when applying the measurement described by O_AB to the joint states ρ_AB and σ_AB, we get the same probability mass functions P and Q. Now, using again Lemma 4.3.5,

δ(ρ_AB, σ_AB) ≥ δ(P, Q) = δ(ρ_A, σ_A) .
The fidelity of two density operators ρ and σ is defined as

F(ρ, σ) := ‖√ρ √σ‖₁ . (4.25)

For pure states ρ = |φ⟩⟨φ| and σ = |ψ⟩⟨ψ|, this reduces to F(φ, ψ) = |⟨φ|ψ⟩|. The fidelity between pure states thus simply corresponds to the (absolute value of the) scalar product between the states.

The following theorem, due to Uhlmann, generalizes this statement to arbitrary states.

Theorem 4.3.8 (Uhlmann). Let ρ_A and σ_A be density operators on a Hilbert space H_A. Then

F(ρ_A, σ_A) = max F(ρ_AB, σ_AB) ,

where the maximum ranges over all purifications ρ_AB and σ_AB of ρ_A and σ_A, respectively.
Proof. Because any finite-dimensional Hilbert space can be embedded into any other
Hilbert space with higher dimension, we can assume without loss of generality that HA
and HB have equal dimension.
Let {ei }i and {fi }i be orthonormal bases of HA and HB , respectively, and define
Θ := Σ_i e_i ⊗ f_i .
Furthermore, let W ∈ Hom(HA , HB ) be the transformation of the basis {ei }i to the basis
{fi }i , that is,
W : ei 7→ fi .
Writing out the definition of Θ, it is easy to verify that, for any S_B ∈ End(H_B),

(id_A ⊗ S_B)Θ = (S′_A ⊗ id_B)Θ , (4.26)

where S′_A := W^{−1} S_B^T W, and where S_B^T denotes the transpose of S_B with respect to the basis {f_i}_i.
Let now ρ_AB = |Ψ⟩⟨Ψ| and let

Ψ = Σ_i α_i e′_i ⊗ f′_i

be a Schmidt decomposition of Ψ. Because the coefficients α_i are the square roots of the eigenvalues of ρ_A, we have

Ψ = (√ρ_A ⊗ id_B)(U_A ⊗ U_B)Θ ,
where UA is the transformation of {ei }i to {e0i }i and, likewise, UB is the transformation
of {f_i}_i to {f′_i}_i. Using (4.26), this can be rewritten as

Ψ = (√ρ_A V ⊗ id_B)Θ

for V := U_A W^{−1} U_B^T W unitary. Similarly, for σ_AB = |Ψ′⟩⟨Ψ′|, we have

Ψ′ = (√σ_A V′ ⊗ id_B)Θ

for some appropriately chosen unitary V′. Thus, using (4.25), we find

F(ρ_AB, σ_AB) = |⟨Ψ|Ψ′⟩| = |⟨Θ|V∗ √ρ_A √σ_A V′|Θ⟩| = |tr(V∗ √ρ_A √σ_A V′)| ,
where the last equality is a consequence of the definition of Θ. Using the fact that any
unitary V′ can be obtained by an appropriate choice of the purification σ_AB, this can be rewritten as

max F(ρ_AB, σ_AB) = max_U |tr(U √ρ_A √σ_A)| = ‖√ρ_A √σ_A‖₁ = F(ρ_A, σ_A) ,

where the last equality but one uses the characterization (4.8) of the trace norm. This proves the theorem.
Uhlmann’s theorem is very useful for deriving properties of the fidelity, as, e.g., the
following lemma.
Lemma 4.3.9. Let ρAB and σAB be bipartite states. Then
F (ρAB , σAB ) ≤ F (ρA , σA ) .
Proof. According to Uhlmann’s theorem, there exist purifications ρABC and σABC of ρAB
and σAB such that
F (ρAB , σAB ) = F (ρABC , σABC ) . (4.27)
Trivially, ρABC and σABC are also purifications of ρA and σA , respectively. Hence, again
by Uhlmann’s theorem,
F (ρA , σA ) ≥ F (ρABC , σABC ) . (4.28)
Combining (4.27) and (4.28) concludes the proof.
The trace distance and the fidelity are related to each other. In fact, for pure states,
represented by normalized vectors φ and ψ, we have
δ(φ, ψ) = √(1 − F(φ, ψ)²) . (4.29)
To see this, let φ⊥ be a normalized vector orthogonal to φ such that ψ = αφ + βφ⊥, for some α, β ∈ R₊ such that α² + β² = 1. (Because the phases of both φ, φ⊥, ψ are irrelevant, the coefficients α and β can without loss of generality be assumed to be real and positive.) The operators |φ⟩⟨φ| and |ψ⟩⟨ψ| can then be written as matrices with respect to the basis {φ, φ⊥},

|φ⟩⟨φ| = ( 1  0 ; 0  0 )
|ψ⟩⟨ψ| = ( |α|²  αβ∗ ; α∗β  |β|² ) ,

where ( a b ; c d ) denotes the matrix with rows (a, b) and (c, d). In particular, the trace distance takes the form

δ(φ, ψ) = (1/2) ‖ |φ⟩⟨φ| − |ψ⟩⟨ψ| ‖₁ = (1/2) ‖ ( 1 − |α|²  −αβ∗ ; −α∗β  −|β|² ) ‖₁ .

The eigenvalues of the matrix on the right hand side are α₀ = β and α₁ = −β. We thus find

δ(φ, ψ) = (1/2)(|α₀| + |α₁|) = β .

Furthermore, by the definition of β, we have

β = √(1 − |⟨φ|ψ⟩|²) .
The assertion (4.29) then follows from (4.25).
Equality (4.29) together with Uhlmann’s theorem are sufficient to prove one direction
of the following lemma.
Lemma 4.3.10. Let ρ and σ be density operators. Then
1 − F(ρ, σ) ≤ δ(ρ, σ) ≤ √(1 − F(ρ, σ)²) .
Proof. We only prove the second inequality. For a proof of the first, we refer to [1].
Consider two density operators ρ_A and σ_A and let ρ_AB and σ_AB be purifications such that

F(ρ_A, σ_A) = F(ρ_AB, σ_AB) ,

as in Uhlmann's theorem. Combining this with equality (4.29) and Lemma 4.3.6, we find
√(1 − F(ρ_A, σ_A)²) = √(1 − F(ρ_AB, σ_AB)²) = δ(ρ_AB, σ_AB) ≥ δ(ρ_A, σ_A) .
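The two bounds of Lemma 4.3.10 can be verified numerically. A sketch (the sampling routine is ours; F(ρ, σ) = ‖√ρ √σ‖₁ is evaluated as a sum of singular values):

    import numpy as np

    rng = np.random.default_rng(3)

    def random_state(d):
        A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
        rho = A @ A.conj().T
        return rho / np.trace(rho)

    def msqrt(rho):
        # operator square root via the spectral decomposition
        w, v = np.linalg.eigh(rho)
        return (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T

    def trace_distance(rho, sigma):
        return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

    def fidelity(rho, sigma):
        return np.linalg.svd(msqrt(rho) @ msqrt(sigma), compute_uv=False).sum()

    rho, sigma = random_state(3), random_state(3)
    D, F = trace_distance(rho, sigma), fidelity(rho, sigma)
    print(1 - F <= D <= np.sqrt(1 - F**2) + 1e-9)  # True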
A set of states which are often used in quantum information are the Bell states. As we
will use them later in the course we state here the definition.
Definition 4.3.11. The Bell states or EPR pairs are four specific two-qubit states β₀, ..., β₃ defined by

|β_μ⟩ := (1/√2) Σ_{a,b∈{0,1}} (σ_μ)_{ab} |a, b⟩ ,

where σ₀ denotes the identity and σ₁, σ₂, σ₃ the Pauli matrices.
Having defined what Bell states are we can define the ebit as follows.
Definition 4.3.12. An ebit is one unit of bipartite entanglement, the amount of entan-
glement that is contained in a maximally entangled two-qubit state, i.e. a Bell state.
In other words, when we speak of an ebit, we mean one of the four Bell states.
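Definition 4.3.11 is reproduced by the following sketch, which builds the four Bell states from the identity and the three Pauli matrices (our ordering convention σ₀ = id, σ₁ = X, σ₂ = Y, σ₃ = Z):

    import numpy as np

    sigma = [np.eye(2),                      # sigma_0 = id
             np.array([[0, 1], [1, 0]]),     # sigma_1 = X
             np.array([[0, -1j], [1j, 0]]),  # sigma_2 = Y
             np.array([[1, 0], [0, -1]])]    # sigma_3 = Z

    # |beta_mu> = 2^{-1/2} sum_{a,b} (sigma_mu)_{ab} |a,b>
    bell = [np.array([s[a, b] for a in (0, 1) for b in (0, 1)]) / np.sqrt(2)
            for s in sigma]

    for mu, b in enumerate(bell):
        print(mu, np.round(b, 3), np.isclose(np.linalg.norm(b), 1.0))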
4.4.1 Completely positive maps (CPMs)
Let HA and HB be the Hilbert spaces describing certain (not necessarily disjoint) parts
of a physical system. The evolution of the system over a time interval [t0 , t1 ] induces a
mapping E from the set of states S(HA ) on subsystem HA at time t0 to the set of states
S(HB ) on subsystem HB at time t1 . This and the following sections are devoted to the
study of this mapping.
Obviously, not every function E from S(HA ) to S(HB ) corresponds to a physically
possible evolution. In fact, based on the considerations in the previous sections, we have
the following requirement. If ρ is a mixture of two states ρ0 and ρ1 , then we expect that
E(ρ) is the mixture of E(ρ0 ) and E(ρ1 ). In other words, a physical mapping E needs to
conserve the convex structure of the set of density operators, that is,
E(p ρ₀ + (1 − p) ρ₁) = p E(ρ₀) + (1 − p) E(ρ₁) , (4.30)

for any p ∈ [0, 1]. Moreover, since E maps density operators to density operators, it must be positive, i.e., E(S) ≥ 0 for any S ≥ 0. An example of a positive map is the transpose

T_A : S ↦ S^T ,

where S^T denotes the transpose with respect to some fixed basis. To see that T_A is positive, note that S ≥ 0 implies ⟨φ̄|S|φ̄⟩ ≥ 0 for any vector φ̄, where φ̄ denotes the complex conjugate of φ. Hence ⟨φ|S^T|φ⟩ = ⟨φ̄|S|φ̄⟩ ≥ 0, from which we conclude S^T ≥ 0.
Remarkably, positivity of two maps E and F does not necessarily imply positivity of the
tensor map E ⊗ F defined by

(E ⊗ F)(S ⊗ T) := E(S) ⊗ F(T)

and its linear extension. In fact, it is straightforward to verify that the map I_A ⊗ T_{A′} applied to the positive operator ρ_{AA′} := |Ψ⟩⟨Ψ|, for Ψ defined by (4.11), results in a non-positive operator.
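This is quickly verified numerically: applying I_A ⊗ T_{A′} to |Ψ⟩⟨Ψ| for Ψ from (4.11) produces an operator with a negative eigenvalue. A sketch (the index bookkeeping is ours):

    import numpy as np

    # Psi = 2^{-1/2}(e0 (x) e0 + e1 (x) e1), cf. (4.11)
    Psi = np.array([1, 0, 0, 1]) / np.sqrt(2)
    rho = np.outer(Psi, Psi)

    # Transpose on the second subsystem: swap its row and column indices
    rho_t = rho.reshape(2, 2, 2, 2)               # indices (a, b, a', b')
    rho_pt = rho_t.transpose(0, 3, 2, 1).reshape(4, 4)

    print(np.linalg.eigvalsh(rho_pt))  # one eigenvalue equals -1/2 < 0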
To guarantee that tensor products of mappings such as E ⊗ F are positive, a stronger
requirement is needed, called complete positivity.
Definition 4.4.2. A linear map E ∈ Hom(End(HA ), End(HB )) is said to be completely
positive if for any Hilbert space HR , the map E ⊗ IR is positive.
Definition 4.4.3. A linear map E ∈ Hom(End(HA ), End(HB )) is said to be trace pre-
serving if tr(E(S)) = tr(S) for any S ∈ End(HA ).
We will use the abbreviation CPM to denote completely positive maps. Moreover, we
denote by TPCPM(HA , HB ) the set of trace-preserving completely positive maps from
End(HA ) to End(HB ).
4.4.2 The Choi-Jamiolkowski isomorphism

Let H_{A′} be a Hilbert space isomorphic to H_A, let {e_i}_i be a fixed orthonormal basis of H_{A′} ≅ H_A, let d := dim(H_A), and let Ψ := (1/√d) Σ_i e_i ⊗ e_i ∈ H_{A′} ⊗ H_A. The Choi-Jamiolkowski isomorphism takes a map E ∈ Hom(End(H_A), End(H_B)) to the operator obtained by applying it to one half of |Ψ⟩⟨Ψ|,

τ : E ↦ (I_{A′} ⊗ E)(|Ψ⟩⟨Ψ|) .

Lemma. The mapping τ is a bijection, with inverse

τ^{−1} : ρ_{A′B} ↦ E ,   E : S_A ↦ d · tr_{A′}((T_{A→A′}(S_A) ⊗ id_B) ρ_{A′B}) ,

where T_{A→A′} denotes the transpose mapping, i.e., the linear extension of |e_i⟩⟨e_j| ↦ |e_j⟩⟨e_i|.
Proof. It suffices to verify that the mapping τ^{−1} defined in the lemma is indeed an inverse of τ. We first check that τ ◦ τ^{−1} is the identity on End(H_{A′} ⊗ H_B). That is, we show that for any operator ρ_{A′B} ∈ End(H_{A′} ⊗ H_B), the operator

τ(τ^{−1}(ρ_{A′B})) = d · (I_{A′} ⊗ tr_{A′}) [ ((I_{A′} ⊗ T_{A→A′})(|Ψ⟩⟨Ψ|) ⊗ id_B) (id_{A′} ⊗ ρ_{A′B}) ] (4.31)
equals ρ_{A′B} (where we have written I_{A′} ⊗ tr_{A′} instead of tr_{A′} to indicate that the trace only acts on the second subsystem H_{A′}). Inserting the definition of Ψ, we find

τ(τ^{−1}(ρ_{A′B})) = (I_{A′} ⊗ tr_{A′}) [ Σ_{i,j} (|e_i⟩⟨e_j|_{A′} ⊗ |e_j⟩⟨e_i|_{A′} ⊗ id_B)(id_{A′} ⊗ ρ_{A′B}) ]
= Σ_{i,j} (|e_i⟩⟨e_i|_{A′} ⊗ id_B) ρ_{A′B} (|e_j⟩⟨e_j|_{A′} ⊗ id_B) = ρ_{A′B} ,
Together with the fact that tr_{A′}(|Ψ⟩⟨Ψ|) = (1/d) id_A, this implies

E(S_A) = d · E(S_A tr_{A′}(|Ψ⟩⟨Ψ|))
= d · tr_{A′}((I_{A′} ⊗ E)((id_{A′} ⊗ S_A)|Ψ⟩⟨Ψ|))
= d · tr_{A′}((I_{A′} ⊗ E)((T_{A→A′}(S_A) ⊗ id_A)|Ψ⟩⟨Ψ|))
= d · tr_{A′}((T_{A→A′}(S_A) ⊗ id_B)(I_{A′} ⊗ E)(|Ψ⟩⟨Ψ|)) .
Assume now that τ (E) = 0. Then, by definition, (IA0 ⊗ E)(|ΨihΨ|) = 0. By virtue of the
above equality, this implies E(SA ) = 0 for any SA and, hence, E = 0. In other words,
τ (E) = 0 implies E = 0, i.e., τ is injective.
In the following, we focus on trace-preserving CPMs. The set TPCPM(HA , HB ) ob-
viously is a subset of Hom(End(HA ), End(HB )). Consequently, τ (TPCPM(HA , HB )) is
also a subset of End(HA0 ⊗ HB ). It follows immediately from the complete positivity
property that τ (TPCPM(HA , HB )) only contains positive operators. Moreover, by the
trace-preserving property, any ρ_{A′B} ∈ τ(TPCPM(H_A, H_B)) satisfies

tr_B(ρ_{A′B}) = (1/d) id_{A′} . (4.32)
In particular, ρA0 B is a density operator.
Conversely, the following lemma implies13 that any density operator ρ_{A′B} that satisfies (4.32) is the image of some trace-preserving CPM. We therefore have the following characterization of the image of TPCPM(H_A, H_B) under the Choi-Jamiolkowski isomorphism:

τ(TPCPM(H_A, H_B)) = { ρ_{A′B} ∈ S(H_{A′} ⊗ H_B) : tr_B(ρ_{A′B}) = (1/d) id_{A′} } .
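Numerically, τ(E) is obtained by applying E to one half of |Ψ⟩⟨Ψ|. The following sketch (our example: the completely depolarizing channel E(S) = tr(S) id/d) computes the Choi-Jamiolkowski operator and checks (4.32):

    import numpy as np

    d = 2
    def channel(S):
        # completely depolarizing channel: E(S) = tr(S) * id/d (our example)
        return np.trace(S) * np.eye(d) / d

    # tau(E) = (I (x) E)(|Psi><Psi|) with Psi = d^{-1/2} sum_i e_i (x) e_i
    choi = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E_ij = np.zeros((d, d)); E_ij[i, j] = 1.0   # |e_i><e_j| on A
            choi += np.kron(E_ij, channel(E_ij)) / d

    # Trace preservation of E translates into (4.32): tr_B(choi) = id/d
    tr_B = np.einsum('abcb->ac', choi.reshape(d, d, d, d))
    print(np.allclose(tr_B, np.eye(d) / d))  # True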
Lemma 4.4.6. Let Φ ∈ H_{A′} ⊗ H_B be such that tr_B(|Φ⟩⟨Φ|) = (1/d) id_{A′}. Then the mapping E := τ^{−1}(|Φ⟩⟨Φ|) has the form

E : S_A ↦ U S_A U∗

for some isometry U ∈ Hom(H_A, H_B).

Proof. Writing out the definition of τ^{−1}, one finds E(S_A) = Σ_{i,j} E_i S_A E_j∗, where E_i := √d · (⟨e_i| ⊗ id_B)|Φ⟩⟨e_i|. Defining U := Σ_i E_i, we conclude that E has the desired form, i.e., E(S_A) = U S_A U∗.
To show that U is an isometry, let

Φ = (1/√d) Σ_i e_i ⊗ f_i

be a Schmidt decomposition of Φ. (Note that, because tr_B(|Φ⟩⟨Φ|) is fully mixed, the basis {e_i}_i can be chosen to coincide with the basis used for the definition of τ.) Then √d (⟨e_i| ⊗ id_B)|Φ⟩ = |f_i⟩, i.e., E_i = |f_i⟩⟨e_i|, and hence

U∗U = Σ_{i,j} |e_j⟩⟨f_j| |f_i⟩⟨e_i| = Σ_i |e_i⟩⟨e_i| = id_A .
4.4.3 Stinespring dilation

Theorem (Stinespring dilation). For any E ∈ TPCPM(H_A, H_B) there exist a Hilbert space H_R and an isometry U ∈ Hom(H_A, H_B ⊗ H_R) such that

E : S_A ↦ tr_R(U S_A U∗) .
Proof. Let E_{A→B} := E, define ρ_AB := τ(E), and let ρ_ABR be a purification of ρ_AB. We then define E′ = E′_{A→(B,R)} := τ^{−1}(ρ_ABR). According to Lemma 4.4.6, because tr_BR(ρ_ABR) is fully mixed, E′_{A→(B,R)} has the form

E′_{A→(B,R)} : S_A ↦ U S_A U∗ ,
where U is an isometry. The assertion then follows from the fact that the diagram below
commutes, which can be readily verified from the definition of the Choi-Jamiolkowski
isomorphism. (Note that the arrow on the top corresponds to the operation E′ ↦ tr_R ◦ E′.)

                tr_R ◦ ·
    E_{A→B}  ←───────────  E′_{A→(B,R)}
       │                        ↑
     τ │                        │ τ^{−1}
       ↓                        │
    ρ_{A′B}  ───────────→  ρ_{A′BR}
             purification
Ũ : v ⊗ w₀ ↦ U v ,

for some fixed vector w₀. Using the fact that U is an isometry, it is easy to see that there always exists such a Ũ. By construction, the unitary Ũ satisfies

E(S_A) = tr_R(Ũ (S_A ⊗ |w₀⟩⟨w₀|) Ũ∗) ,

i.e., the isometry U may be replaced by a unitary acting on an enlarged input space.
14 In the sense that there is less redundant information in the description of the CPM.
4.4.4 Operator-sum representation

Lemma 4.4.8 (Operator-sum representation). For any E ∈ TPCPM(H_A, H_B) there exists a family {E_x}_x of operators E_x ∈ Hom(H_A, H_B) such that

E : S_A ↦ Σ_x E_x S_A E_x∗ (4.33)

and Σ_x E_x∗ E_x = id_{H_A}.
Note that the family {Ex }x is not uniquely determined by the CPM E. This is easily
seen by the following example. Let E be the trace-preserving CPM from End(HA ) to
End(HB ) defined by
E : SA 7→ tr(SA )|wihw|
for any operator SA ∈ End(HA ) and some fixed w ∈ HB . That is, E maps any density
operator to the state |wihw|. It is easy to verify that this CPM can be written in the
form (4.33) for

E_x := |w⟩⟨e_x| ,

where {e_x}_x is any orthonormal basis of H_A.
4.4.5 Measurements as CPMs
An elegant approach to describe measurements is to use the notion of classical states. Let ρ_AB be a density operator on H_A ⊗ H_B and let O = Σ_x x P_x be an observable on H_A. Then, according to the measurement postulate of Section 4.3.2, the measurement process produces a classical value X distributed according to the probability distribution P_X specified by (4.13), and the post-measurement state ρ′_{AB,x} conditioned on the outcome x is given by (4.14). This situation is described by a density operator

ρ′_{XAB} := Σ_x P_X(x) |e_x⟩⟨e_x| ⊗ ρ′_{AB,x} .
Note that the mapping E from ρ_AB to ρ′_{XAB} can be written in the operator-sum representation (4.33) with
E_x := |e_x⟩ ⊗ P_x ⊗ id_B,
where
Σ_x E_x* E_x = Σ_x P_x ⊗ id_B = id_{AB}.
Hence,
E : ρ_AB ↦ ρ′_{XAB}
is a trace-preserving CPM.
This is a remarkable statement: combined with the Stinespring dilation theorem, it tells us that any measurement can be seen as a unitary evolution on a larger system. In other words, a measurement is just a special type of evolution of the system.
In particular, if we apply the CPM E to a density operator ρ_A, the distribution P_X of the measurement outcome X is given by
P_X(x) = tr(E_x ρ_A E_x*) = tr(M_x ρ_A),
where M_x := E_x* E_x.
From this we conclude that, as long as we are only interested in the probability distri-
bution of X, it suffices to characterize the evolution and the measurement by the family
of operators M_x. Note, however, that the operators M_x do not characterize the full evolution. In fact, distinct operators E_x can give rise to the same operator M_x = E_x* E_x.
It is easy to see from Lemma 4.4.8 that the family {Mx }x of operators defined as above
satisfies the following definition.
Definition 4.4.9. A positive operator valued measure (POVM) (on H) is a family {Mx }x
of positive operators Mx ∈ Herm(H) such that
Σ_x M_x = id_H.
Conversely, any POVM {M_x}_x corresponds to a (not unique) physically possible evolution followed by a measurement. This can easily be seen by defining a CPM via the operator-sum representation with operators E_x := √M_x.
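This construction is equally direct numerically. The sketch below (Python with numpy/scipy; the particular qubit POVM is an arbitrary choice of ours) builds Kraus operators E_x := √M_x and checks that tr(E_x ρ E_x*) reproduces tr(M_x ρ):

import numpy as np
from scipy.linalg import sqrtm

# A simple qubit POVM: two positive operators summing to the identity
M0 = np.array([[0.8, 0.0], [0.0, 0.3]])
M1 = np.eye(2) - M0
povm = [M0, M1]

# Kraus operators E_x := sqrt(M_x) define one CPM realising the POVM
kraus = [sqrtm(M) for M in povm]

rho = np.array([[0.6, 0.2], [0.2, 0.4]])

# Outcome distribution P_X(x) = tr(M_x rho) = tr(E_x rho E_x^*)
for M, E in zip(povm, kraus):
    p_direct = np.trace(M @ rho).real
    p_kraus = np.trace(E @ rho @ E.conj().T).real
    post = E @ rho @ E.conj().T / p_kraus     # post-measurement state
    assert np.isclose(p_direct, p_kraus)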
if one recalls the "maximal distinguishing probability property" of the trace distance. Up to a factor 1/2 this is the maximal probability to distinguish the CPMs E and F in an experiment which works with initial states in the Hilbert space H. But this is not the best way to distinguish the CPMs E and F in an experiment! Note that in our naive definition above we have excluded the possibility of considering initial states in "larger" Hilbert spaces in the maximization procedure. The probability to distinguish the CPMs E and F in an experiment may increase if we "enlarge" the input Hilbert space H by an additional tensor factor,
H → H ⊗ H_E,
and apply the CPMs E and F as E ⊗ I_E and F ⊗ I_E to states in S(H ⊗ H_E). This replacement leads to a simultaneous replacement of the output Hilbert space:
H′ → H′ ⊗ H_E.
In Section 4.4.8, an explicit example is discussed which shows that there exist situations in which
d̃(E, F) < d̃(E ⊗ I_E, F ⊗ I_E)
for some Hilbert space H_E. This shows why we discard the immediate use of d̃(E, F) and use a distance measure of the form d̃(E ⊗ I_E, F ⊗ I_E) instead. We will still have to figure out the optimal choice for the Hilbert space H_E, which will lead to the definition of the so-called "diamond norm".
As motivated above we consider
d̃(E, F) := max_{ρ∈S(H⊗H_E)} ‖(E ⊗ I_E)(ρ) − (F ⊗ I_E)(ρ)‖₁
instead of our naive approach. Next one asks how the distinguishing probability depends on the choice of the Hilbert space H_E. To that purpose we state and prove Lemma 4.4.10. In the final definition of the distance between CPMs we will then use a Hilbert space H_E which maximizes the probability for distinguishing the CPMs E and F.
Lemma 4.4.10. Let ρ_AB be a pure state on a Hilbert space H_A ⊗ H_B and let ρ′_{AB′} be an arbitrary state on a Hilbert space H_A ⊗ H_{B′}, such that
tr_B ρ_AB = tr_{B′} ρ′_{AB′}.
Then there exists a CPM E : S(H_B) → S(H_{B′}) such that
ρ′_{AB′} = (I_A ⊗ E)(ρ_AB).
Proof. Assume first that ρ′_{AB′} is pure. Since ρ_AB is pure (and by the assumption made in the lemma) there exist states ψ ∈ H_A ⊗ H_B and ψ′ ∈ H_A ⊗ H_{B′} such that ρ_AB = |ψ⟩⟨ψ| and ρ′_{AB′} = |ψ′⟩⟨ψ′|. Let
|ψ⟩ = Σ_i √λ_i |v_i⟩_A ⊗ |w_i⟩_B
and
|ψ′⟩ = Σ_i √λ′_i |v′_i⟩_A ⊗ |w′_i⟩_{B′}
be the Schmidt decompositions of |ψ⟩ and |ψ′⟩. Without loss of generality we assume |v_i⟩_A = |v′_i⟩_A (and hence λ_i = λ′_i), because v_i and v′_i are both eigenvectors of the operator ρ_A := tr_B ρ_AB = tr_{B′} ρ′_{AB′}. Define the map
U := Σ_i |w′_i⟩_{B′}⟨w_i|_B.
This map U is an isometry because {w_i}_i and {w′_i}_i are orthonormal systems in H_B and H_{B′}, respectively. Consequently,
ψ′ = (id_A ⊗ U)(ψ),
which proves the lemma for ρ′_{AB′} pure.
Now assume that ρ′_{AB′} is not pure and consider a purification ρ′_{AB′R} of ρ′_{AB′}. Then (according to the statement proved so far) there exists an isometry
U : H_B → H_{B′} ⊗ H_R
such that
ρ′_{AB′R} = (id_A ⊗ U)ρ_AB(id_A ⊗ U*).
Now we simply define E := tr_R ∘ ad_U, where ad_U(·) := U(·)U*, and thus
ρ′_{AB′} = (I_A ⊗ E)(ρ_AB).
We have proved in an earlier chapter about quantum states and operations that trace-preserving CPMs can never increase the distance between states. Since, by Lemma 4.4.10, any state ρ′_{AB′} with the same marginal on A as the pure state ρ_AB satisfies ρ′_{AB′} = (I_A ⊗ E)(ρ_AB) for some CPM E, and since I_A ⊗ E commutes with E_i ⊗ I, we thus get
‖(E₁ ⊗ I_{B′})(ρ′_{AB′}) − (E₂ ⊗ I_{B′})(ρ′_{AB′})‖₁ ≤ ‖(E₁ ⊗ I_B)(ρ_AB) − (E₂ ⊗ I_B)(ρ_AB)‖₁.
This inequality holds for any choice of H_B and states in S(H_A ⊗ H_B). We conclude that the right hand side of the inequality, evaluated on a purification of the input state, describes the best way to distinguish the CPMs E₁ and E₂ in an experiment. Consequently, this is the best choice for the distance measure between CPMs. This distance measure is induced by the following norm.
Definition 4.4.11 (Diamond norm for CPMs). Let H and G be two Hilbert spaces and let
E : S(H) → S(G)
be a CPM. Then the diamond norm ‖E‖_♦ of E is defined as
‖E‖_♦ := ‖E ⊗ I_H‖₁,
where ‖·‖₁ denotes the trace norm induced on maps, defined as ‖F‖₁ := max_ρ ‖F(ρ)‖₁, the maximum ranging over all density operators on the input space of F.
4.4.8 Example: Why to enlarge the Hilbert space
We consider an explicit example to show that situations occur in which
d̃(E, F) < d̃(E ⊗ I_E, F ⊗ I_E).
For this purpose we compare the identity I with the depolarizing channel
E : S(C²) → S(C²), ρ ↦ E(ρ) = (1 − p)ρ + (p/2) id_{C²}.
We first compute the left hand side explicitly and prove the inequality afterwards, building on the explicit result derived for the left hand side.
According to the proposed distance measure d̃(·, ·), we have to maximize ‖E(ρ) − I(ρ)‖₁ over all states ρ ∈ S(C²).
Claim 4.4.12. The maximum of ‖E(ρ) − I(ρ)‖₁ is attained on pure states.
Proof. Write ρ as a mixture
ρ = q ρ₁ + (1 − q) ρ₂.
Then
‖E(ρ) − ρ‖₁ ≤ q ‖E(ρ₁) − ρ₁‖₁ + (1 − q) ‖E(ρ₂) − ρ₂‖₁,
where we have used the linearity of CPMs and the triangle inequality in the first step. Applying this to finer and finer decompositions leads to pure states in the end. This proves the claim.
Claim 4.4.13. The distance ‖E(ρ) − I(ρ)‖₁ is invariant under unitary transformations of ρ, i.e.,
‖E(ρ) − ρ‖₁ = ‖E(UρU*) − UρU*‖₁.
Proof. The explicit definition of the map E gives E(UρU*) = U E(ρ) U*, since the identity is invariant under unitary conjugation. Because of the invariance of the trace norm under unitaries,
‖E(UρU*) − UρU*‖₁ = ‖U(E(ρ) − ρ)U*‖₁ = ‖E(ρ) − ρ‖₁.
This proves the claim.
Together, these two claims imply that we can use any pure state ρ = |ψ⟩⟨ψ| to maximize ‖E(ρ) − ρ‖₁. We choose |ψ⟩ = |0⟩, where {|0⟩, |1⟩} is the computational basis of C². We get
d̃(E, I) = ‖E(|0⟩⟨0|) − |0⟩⟨0|‖₁ = ‖diag(−p/2, p/2)‖₁ = p.
Now that we have computed d̃(E, I), we have a closer look at an experiment where the experimentalist implements the maps E and I as E ⊗ I_E = E ⊗ I and I ⊗ I_E = I ⊗ I, respectively. We thus have to show that
p < d̃(E ⊗ I_E, I ⊗ I_E).
According to the definition of d̃(·, ·) it is sufficient to find a state ρ ∈ S(C² ⊗ C²) such that
‖(E ⊗ I)(ρ) − (I ⊗ I)(ρ)‖₁ > d̃(E, I) = p.
For simplicity, we assume p = 1/2. Our ansatz for ρ is the Bell state |β₀⟩⟨β₀|, where
|β₀⟩ = (1/√2)(|00⟩ + |11⟩),
as introduced in Definition 4.3.11. For p = 1/2 we obtain
(E ⊗ I)(ρ) − (I ⊗ I)(ρ) = (1/8)(−|00⟩⟨00| + |01⟩⟨01| + |10⟩⟨10| − |11⟩⟨11| − 2|00⟩⟨11| − 2|11⟩⟨00|).
The eigenvalues of this operator are 1/8, 1/8, 1/8 and −3/8, so its trace norm equals 3/4 > 1/2 = p, which establishes the strict inequality.
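The whole example can be checked numerically. The following sketch (Python with numpy; sampling over random pure states is our own heuristic for the maximum, not a proof) estimates d̃(E, I) without an extension and evaluates the Bell-state value with an extension:

import numpy as np

p = 0.5

def trace_norm(A):
    # trace norm of a Hermitian matrix = sum of |eigenvalues|
    return np.abs(np.linalg.eigvalsh(A)).sum()

def depol_ext(rho, dB):
    # (E (x) I_B)(rho) = (1 - p) rho + (p/2) id_2 (x) tr_A(rho)
    R = rho.reshape(2, dB, 2, dB)
    trA = np.einsum('abad->bd', R)
    return (1 - p) * rho + (p / 2) * np.kron(np.eye(2), trA)

# No extension (dB = 1): maximum over pure qubit states is p
rng = np.random.default_rng(0)
best = 0.0
for _ in range(5000):
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    v /= np.linalg.norm(v)
    rho = np.outer(v, v.conj())
    best = max(best, trace_norm(depol_ext(rho, 1) - rho))
print(best)                                   # ~ 0.5 = p

# Bell-state extension: value 3p/2 > p
beta = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho = np.outer(beta, beta)
print(trace_norm(depol_ext(rho, 2) - rho))    # 0.75 = 3p/2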
5 The Completeness of Quantum
Theory
In this section we prove that, based on two weak assumptions,
• that quantum theory is correct and
• that measurement settings can be chosen freely,
quantum theory is complete, i.e., there cannot exist an extension of quantum theory with improved predictive power.
5.1 Motivation
[Figure: Stern-Gerlach-type setup, a particle in state |φ⟩ sent towards detectors.]
5.2 Preliminary definitions
Before stating the result we have to introduce some notation. Note that we do not make
any restrictions or assumptions in this section.
A physical theory usually takes some parameters A and then makes a prediction about
some observable quantity X.
Example 5.2.1. Classical mechanics can predict how long it takes until a mass hits the
ground when falling freely from a certain altitude. In this case A is the altitude and X is
the time that it takes to hit the ground.
Example 5.2.2. In the Stern-Gerlach experiment above, A would be the state in which
the particle is prepared and the angle of the apparatus. X denotes the coordinate where
the particle hits the screen.
More generally, we may have a set of parameters and outcomes. A physical theory
generally imposes a causal structure on these values. Mathematically, we will look at a set
of random variables {A, B, X, Y } with a partial order A → X. In the following we often
consider the following example situation.
[Figure: causal structure of the random variables, with A → X and B → Y.]
If A → X, we say that A and X are causally ordered, or that X is in the causal future of
A. If A → X does not hold we write A 6→ X.
[Figure: two space-time diagrams showing possible positions of the points A, B, X, Y.]
Assume that any random variable is associated to a space-time point (the point where a
parameter is chosen or where it is observed).
Definition 5.2.4. A parameter A is called freely random if and only if PAΓA = PA × PΓA ,
where ΓA := {W ∈ Γ : A 6→ W }.
[Figure: the same space-time diagrams, with dashed ellipses marking the region Γ_A.]
Example 5.2.5. For the same scenario as above, the dashed ellipses denote the set ΓA .
[Figure: Alice chooses A and observes X; Bob chooses B and observes Y; their devices share a system described by Λ.]
Because B is freely random, PBAΛX = PB × PAΛX. In particular we have PB|AΛX = PB|AΛ = PB. Hence (5.1) and (5.2) are equivalent to
PBX|AΛ = PX|BAΛ PB and (5.3)
PBX|AΛ = PX|AΛ PB . (5.4)
This allows us to conclude that
PX|BAΛ = PX|AΛ . (5.5)
Note that (5.5) is called a non-signalling condition. (It implies that an agent choosing B
cannot use his choice to communicate to an agent that chooses A and sees X.)
By symmetry we also have a second non-signalling condition
PY |BΛA = PY |BΛ . (5.6)
Consider now an experiment where the two systems (Alice and Bob) are prepared in a state |Ψ⟩ = (1/√2)(|↑↑⟩ + |↓↓⟩) and N denotes a large integer. On Alice's side, we perform a measurement in the basis
{|α⟩, |α⊥⟩}, where |α⟩ = cos(α/2)|↑⟩ + sin(α/2)|↓⟩ (5.7)
and where α = A·π/(2N) with A ∈ {0, 2, 4, . . . , 2N − 2}. Bob measures with respect to the basis
Let Z be an arbitrary value computed from Λ, i.e. Z = f(Λ) for some function f(·). The intuition behind Z is that it can be viewed as a guess for the outcome X if A = 0 and B = 1. We define p := Pr[Z = X|A = 0, B = 1]; for neighbouring measurement settings the correlations guarantee Pr[X ≠ Y] ≤ ε. The four possible cases are summarized in the following table:

                    X = Y        X ≠ Y (prob. ≤ ε)
Z = X (prob. p)     ⇒ Z = Y      ⇒ Z ≠ Y
Z ≠ X               ⇒ Z ≠ Y      ⇒ Z = Y

From the table we conclude¹
Pr[Z = Y|B = 1] ≥ p − ε. (5.18)
Using once more the non-signalling condition we can write Pr[Z = Y|A = 2, B = 1] ≥ p − ε. We next apply the same step recursively. Considering the same argumentation we used above (cf. probability table) we obtain
¹ If the dotted sets are denoted by A and B, basic probability theory tells us that P(A ∪ B) = P(A) + P(B\A) ≥ p − ε.
We can also find an upper bound for Pr[Z = Y|A = 0, B = 2N − 1] using (5.11). We can use a similar argument as was used before with the diagram on the previous page, but now the left column of the diagram will be upper bounded by ε as dictated by (5.11). The top row will still be equal to p. The bottom row clearly is (using the non-signalling condition and (5.15))
Pr[Z ≠ X|A = 0, B = 2N − 1] = 1 − p. (5.22)
Now we just need to find the maximum probability for the top left and bottom right squares together. This occurs when the top left square's probability is ε and the bottom left square's probability is 0. This gives an upper bound of
Pr[Z = Y|A = 0, B = 2N − 1] ≤ 1 − p + ε, (5.23)
while chaining the recursive bound above yields Pr[Z = Y|A = 0, B = 2N − 1] ≥ p − (2N − 1)ε. Combining the two bounds gives
p − (2N − 1)ε ≤ 1 − p + ε, (5.24)
which is equivalent to
2p ≤ 1 + 2Nε. (5.25)
Since this must hold for arbitrary values of N, and ε = O(1/N²), we conclude that p ≤ 1/2. Ergo we have found that for any Z (computed from Λ), Pr[Z = X|A = 0, B = 1] ≤ 1/2. This means we cannot guess X with probability of success better than 1/2. If one took X itself as the guess, the prediction would be defined via the measurement outcomes, which contradicts the requirement that the guess be computed from Λ alone.
In particular note that PXΛ|A = PX × PΛ|A . This can be proven by contradiction. The
main idea is that if we assume that Λ depends on X, then there exists a function f such
that Z = f (Λ) depends on X. This proves that Λ is useless for predicting the outcome X
for the particular measurement that we considered.
Note that with an additional proof step (cf. [4]) the statement can be extended to
arbitrary measurements on arbitrary states.
For strengthenings and results related to the theorem above, see [5, 6].
In the following we give a very brief overview of earlier work about the question whether
quantum theory is complete or not.
Einstein, Podolsky and Rosen (1935) In 1935 Einstein, Podolsky and Rosen tried to
come up with an argument for the incompleteness of quantum theory [7]. They considered
an entangled state between two systems A and B, (1/√2)(|00⟩_AB + |11⟩_AB).² Furthermore they
considered to have a measurement device that measures the state with respect to a certain
basis {|αi, |α⊥ i} (as introduced earlier in this chapter) and outputs the measurement
outcome X. Assume that we perform such a measurement on the system A. Let ϕα,X
denote the post measurement state in B. We obtain that if X = 0, we have ϕα,0 = |αi
and if X = 1, we get ϕα,1 = |α⊥ i. If we now (after having measured on A) measure on B
in a basis rotated by α, with an outcome Y , we can predict the outcome perfectly since
we have Y = X. Their argumentation now consists of three steps.
1. Since Y can be predicted with certainty, it is considered as an element of reality.
2. The quantum-mechanical description of system B does not determine this element of reality in advance.
3. Hence, the quantum-mechanical description is incomplete.
Kochen and Specker (1967) Kochen and Specker [8] considered a natural property
called non-contextuality.
² In their original work they considered a different state. However, it is convenient to use the Bell state for their argumentation.
[Figure: measurement bases a and a′, related by a rotation in the xy-plane, with labelled directions 1, 2, 3.]
Their theorem states that there cannot exist a theory with the following properties:
1. non-contextual
2. deterministic
3. free randomness
4. compatible with quantum theory.
Note that the theorem we discussed in this chapter tells us that Properties 3 and 4 imply
that Λ is independent of X and therefore cannot determine X.
Bell (1964) In 1964, Bell published a theorem [9] which tells us that there cannot exist
a theory with the following properties:
1. non-signalling
2. deterministic
3. free randomness
4. compatible with quantum theory.
Aspect (see also Zeilinger & Gisin) Since the 1980s, experimentalists have been trying to produce experimental evidence for the theorems we have seen [10]. Note that all theorems we have seen in this chapter assume compatibility with quantum theory. This assumption can be replaced by actual experimental data. It has been shown that I₂ < 1 holds experimentally, without assuming the correctness of quantum theory. It then follows from Bell's argument that these experimental data cannot be explained by any theory (not necessarily compatible with quantum theory) that is non-signalling, deterministic and compatible with free randomness.
6 Basic Protocols
6.1 Teleportation
Bennett, Brassard, Crépeau, Jozsa, Peres, Wootters, 1993.
“An unknown quantum state |φi can be disassembled into, then later recon-
structed from, purely classical information and purely nonclassical Einstein-
Podolsky-Rosen (EPR) correlations. To do so the sender, Alice, and the re-
ceiver, Bob, must prearrange the sharing of an EPR-correlated pair of particles.
Alice makes a joint measurement on her EPR particle and the unknown quan-
tum system, and sends Bob the classical result of this measurement. Knowing
this, Bob can convert the state of his EPR particle into an exact replica of the
unknown state |φi which Alice destroyed.”
With EPR correlations, Bennett et al. mean our familiar ebit (1/√2)(|00⟩ + |11⟩). In more precise terms, we are interested in performing the following task:
Task: Alice wants to communicate the unknown state ρ of one qubit in system S to Bob.
They share one Bell state. She can also send him two classical bits.
The protocol that achieves this makes integral use of the Bell measurement. This is a measurement of two qubits and consists of projectors onto the four Bell states
|ψ⁰⁰⟩ = (1/√2)(|00⟩ + |11⟩)
|ψ⁰¹⟩ = (1/√2)(|00⟩ − |11⟩)
|ψ¹⁰⟩ = (1/√2)(|01⟩ + |10⟩)
|ψ¹¹⟩ = (1/√2)(|01⟩ − |10⟩).
More compactly, we can write
|ψ^{ij}⟩ = (id ⊗ σ^{ij})|ψ⁰⁰⟩,
where σ^{ij} = σ_x^i σ_z^j. For simplicity of the exposition, let ρ = |φ⟩⟨φ| be a pure state,
|φi = α|0i + β|1i (the more general case of mixed ρ follows then by linearity of the
protocol). The global state before the protocol is therefore given by |φiS ⊗ |ψ 00 iAB . The
protocol is as follows:
Protocol
1. Alice measures S and A (her half of the entangled state) in the Bell basis.
2. Alice sends the classical bits that describe her outcome, i, j, to Bob.
3. Bob applies σ ij on his qubit.
The resulting state is |φi as one easily verifies.
[Figure: teleportation circuit. Alice performs a Bell measurement on ρ and her half of |ψ⁰⁰⟩, sends the outcome (i, j) to Bob, who applies σ^{ij} to his half and thereby recovers ρ.]
Note that each outcome is equally probable and that entanglement between ρ and the
rest of the universe is preserved.
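A direct simulation of the protocol makes this explicit. The sketch below (Python with numpy; the input amplitudes 0.6 and 0.8 are an arbitrary choice of ours) verifies that Bob's reduced state equals ρ after the correction, for a randomly sampled measurement outcome:

import numpy as np

X = np.array([[0, 1], [1, 0]]); Z = np.array([[1, 0], [0, -1]])

def sigma(i, j):
    return np.linalg.matrix_power(X, i) @ np.linalg.matrix_power(Z, j)

psi00 = np.array([1, 0, 0, 1]) / np.sqrt(2)
bell = {(i, j): np.kron(np.eye(2), sigma(i, j)) @ psi00
        for i in range(2) for j in range(2)}

phi = np.array([0.6, 0.8])                     # unknown qubit state on S
state = np.kron(phi, psi00)                    # register order: S, A, B

# Alice's Bell measurement on S and A
outcomes = []
for (i, j), b in bell.items():
    P = np.kron(np.outer(b, b.conj()), np.eye(2))
    amp = P @ state
    outcomes.append(((i, j), np.linalg.norm(amp) ** 2, amp))

rng = np.random.default_rng(1)
probs = np.array([p for _, p, _ in outcomes])
(i, j), p, amp = outcomes[rng.choice(4, p=probs / probs.sum())]
amp = amp / np.linalg.norm(amp)

# Bob applies sigma^{ij}; his reduced state is then |phi><phi|
final = np.kron(np.eye(4), sigma(i, j)) @ amp
R = final.reshape(4, 2)                        # split (SA, B)
rho_B = np.einsum('ab,ac->bc', R, R.conj())    # trace out S and A
assert np.allclose(rho_B, np.outer(phi, phi))
print("outcome", (i, j), "with probability", round(p, 3))   # each is 1/4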
Diagrammatically, we can summarise teleportation as the following conversion of resources:
2 cbits + 1 ebit ≥ 1 qubit,
where "cbit" stands for the sending of a classical bit (a straight arrow in the original diagram), "ebit" for a shared ebit (a wiggly line) and "qubit" for the sending of a qubit (a wiggly arrow). The inequality sign means that there exists a protocol that can transform the resources of one ebit and two bits of classical communication into the resource of sending one qubit.
6.2 Superdense coding
Superdense coding answers the question of how many classical bits we can send with one
use of a quantum channel if we are allowed to use preshared ebits.
Task Alice wants to send two classical bits, i and j, to Bob. They share one Bell state.
She can also send him one qubit.
Protocol
1. Alice applies a local unitary operation, σ^{ij}, on her half of the entangled state. Recall that the states |ψ^{ij}⟩ form a basis for two qubits: the Bell basis.
2. Alice sends her qubit to Bob.
3. Bob measures the two qubits in the Bell basis. Outcome of his measurement: i, j.
[Figure: superdense coding circuit. Alice applies σ^{ij} to her half of |ψ⁰⁰⟩ and sends her qubit to Bob, who performs a Bell measurement and obtains (i, j).]
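The corresponding simulation (same conventions as the teleportation sketch above) checks that Bob's Bell measurement recovers (i, j) with certainty:

import numpy as np

X = np.array([[0, 1], [1, 0]]); Z = np.array([[1, 0], [0, -1]])

def sigma(i, j):
    return np.linalg.matrix_power(X, i) @ np.linalg.matrix_power(Z, j)

psi00 = np.array([1, 0, 0, 1]) / np.sqrt(2)
# Bell basis |psi^{kl}> = (id (x) sigma^{kl}) |psi^{00}>
bell = {(k, l): np.kron(np.eye(2), sigma(k, l)) @ psi00
        for k in range(2) for l in range(2)}

for bits in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    # Alice encodes (i, j) on her half and sends her qubit to Bob
    state = np.kron(sigma(*bits), np.eye(2)) @ psi00
    # Bob's Bell measurement yields (i, j) with certainty
    probs = {kl: abs(np.vdot(b, state)) ** 2 for kl, b in bell.items()}
    assert np.isclose(probs[bits], 1.0)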
6.3 Entanglement conversion
With teleportation and superdense coding we have seen two tasks that can be solved nicely
when we have access to ebits. In a realistic scenario, unfortunately, it is difficult to obtain
or generate ebits exactly. It is therefore important to understand when and how we can
distill ebits from other quantum correlations or more generally, how to convert one type
of quantum correlation into another one. In this section, we will consider the simplest
instance of this problem, namely the conversion of one bipartite pure state into another
one. Before we state the main result, we need to do some preparatory work and introduce
the concept of majorisation.
6.3.1 Majorisation
Given two d-dimensional real vectors x and y with entries in non-increasing order (i.e. x_i ≥ x_{i+1} and y_i ≥ y_{i+1}) which satisfy Σ_i x_i = Σ_i y_i, we say that y majorises x, and write x ≺ y, if
Σ_{i=1}^{k} x_i ≤ Σ_{i=1}^{k} y_i
for all k ∈ {1, . . . , d}.
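The defining condition is straightforward to test numerically. The following small helper (Python with numpy; the numerical tolerance is our own choice) checks x ≺ y for given vectors:

import numpy as np

def majorises(y, x):
    """True iff x is majorised by y (x ≺ y), for vectors of equal sum."""
    xs = np.sort(x)[::-1]                  # non-increasing order
    ys = np.sort(y)[::-1]
    assert np.isclose(xs.sum(), ys.sum())
    return bool(np.all(np.cumsum(xs) <= np.cumsum(ys) + 1e-12))

# The uniform distribution is majorised by every distribution
assert majorises([0.7, 0.2, 0.1], [1/3, 1/3, 1/3])
assert not majorises([1/3, 1/3, 1/3], [0.7, 0.2, 0.1])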
Lemma 6.3.2. x ≺ y if and only if there exist permutation matrices π_i and probabilities p_i such that x = Σ_i p_i π_i y.
Proof. We first show, inductively, that x ≺ y implies that x is such a mixture. Clearly the case d = 1 is true and we will therefore focus on the inductive step d − 1 → d.
x ≺ y implies that x₁ ≤ y₁, which in turn implies that there exists j such that y_j ≤ x₁ ≤ y_{j−1} ≤ y₁. Consequently, there is a t ∈ [0, 1] such that x₁ = t y₁ + (1 − t)y_j. Let T be the transposition that interchanges places 1 and j and let P = t·id + (1 − t)T. Then P y = (x₁, y₂, . . . , y_{j−1}, (1 − t)y₁ + t y_j, y_{j+1}, . . .). It remains to show that x̃ ≺ ỹ, where ỹ denotes P y without its first entry and x̃ is x without x₁, since then the result follows by applying the inductive hypothesis to x̃ and ỹ. This is shown as follows. For k < j:
Σ_{i=1}^{k−1} x̃_i = Σ_{i=2}^{k} x_i ≤ Σ_{i=2}^{k} x₁ ≤ Σ_{i=2}^{k} y_{j−1} ≤ Σ_{i=2}^{k} y_i = Σ_{i=1}^{k−1} ỹ_i.
For k ≥ j:
Σ_{i=1}^{k−1} x̃_i = Σ_{i=2}^{k} x_i
≤ (Σ_{i=2}^{k} y_i) + (y₁ − x₁)
= Σ_{i=2, i≠j}^{k} y_i + (y₁ − (1 − t)y_j − t y₁ + y_j)
= Σ_{i=2, i≠j}^{k} y_i + ((1 − t)y₁ + t y_j)
= Σ_{i=1}^{k−1} ỹ_i.
For the converse direction one verifies that a mixture of permutations of y is majorised by y, where we use Ky Fan's principle, which characterises the sum of the k largest eigenvalues in a variational way.
Corollary 6.3.3. Let r and s be the eigenvalues (incl. multiplicities) of density matrices ρ and σ in non-increasing order. Then r ≺ s iff there exists a finite set of unitaries and associated probabilities {U_i, p_i} such that
ρ = Σ_i p_i U_i σ U_i^{−1}.
which is equivalent to the claim for U_i := U^{−1} π_i V.
Conversely, Lemma 6.3.2 applied to ρ = Σ_i p_i U_i σ U_i^{−1} implies
s = EV(σ) ≻ EV(Σ_i p_i U_i σ U_i^{−1}) = EV(ρ) = r,
where EV(σ) denotes the non-increasingly ordered vector containing the eigenvalues of σ.
We now want to argue that any measurement on Bob’s side of the state |ψi can be
replaced by a measurement on Alice’s side and a unitary on Bob’s side dependent on
Alice’s measurement outcome. Note that this is only possible since we know the state on
which the measurement will be applied – without this knowledge this is impossible. In
order to see how it works, we write |ψi in its Schmidt decomposition
|ψ⟩ = Σ_i ψ_i |i⟩_A |i⟩_B.
We now define measurement operators for Alice with respect to her Schmidt basis,
A′_k = Σ_{ij} b_{k,ji} |j⟩⟨i|_A,
where the coefficients b_{k,ji} are the matrix elements of Bob's measurement operators B_k = Σ_{ij} b_{k,ji} |j⟩⟨i|_B,
which means that we can simulate the measurement on Bob's side on |ψ⟩ by a measurement on Alice's side (with Kraus operators A_k = U_k A′_k) followed by a unitary V_k on Bob's side. This way we can reduce an arbitrary LOCC² protocol between Alice and Bob (applied to |ψ⟩) to a measurement on Alice's side followed by a unitary on Bob's side conditioned on Alice's measurement outcome.
This preparation allows us to prove the following result due to Nielsen.
Theorem 6.3.4. |φi can be transformed into |ψi by LOCC iff r ≺ s, where r and s are
the local eigenvalues of |ψi and |φi, respectively.
¹ F = Σ_{ij} |j⟩⟨i| ⊗ |i⟩⟨j|
² local operations and classical communication
Proof. Define ρ_AB = |ψ⟩⟨ψ|_AB and σ_AB = |φ⟩⟨φ|_AB with reduced states ρ_A and σ_A. By the above it suffices to consider protocols where Alice performs a measurement with Kraus operators A_k followed by a unitary V_k on Bob's side. Since the protocol must transform Alice's local state for each measurement outcome into the local part of the final state, we have
A_k ρ_A A_k† = p_k σ_A. (6.1)
Let
A_k √ρ_A = √(A_k ρ_A A_k†) U_k
be the polar decomposition of the LHS. Multiplying this equation with its hermitian conjugate and using (6.1) we find
√ρ_A A_k† A_k √ρ_A = p_k U_k† σ_A U_k.
Summing over k and using Σ_k A_k† A_k = id yields
ρ_A = Σ_k p_k U_k† σ_A U_k, (6.2)
so r ≺ s follows from Corollary 6.3.3.
Conversely, assume r ≺ s, where we assume for simplicity that ρ_A is invertible (the other case can be considered as a limiting case). By Corollary 6.3.3 there are unitaries U_k and probabilities p_k satisfying (6.2), and we may define A_k := √p_k √σ_A U_k ρ_A^{−1/2}. It is easy to verify that Σ_k A_k† A_k = id. Clearly
A_k ρ_A A_k† = p_k σ_A,
and therefore there exist unitaries V_k on Bob's side such that the final state is |φ⟩.
7 Entropy of Quantum States
In Chapter 3 we have discussed the definitions and properties of classical entropy mea-
sures and we have learned about their usefulness in the discussion of the channel coding
theorem. After the introduction of the quantum mechanical basics in Chapter 4 and after
Chapter 5 about the completeness of quantum theory, we are ready to introduce the notion
of entropy in the quantum mechanical context. Textbooks usually start the discussion of
quantum mechanical entropy with the definition of the so called von Neumann entropy
and justify the explicit expression as being the most natural analog of the classical Shan-
non entropy for quantum systems. But this explanation is not completely satisfactory.
Hence a lot of effort is made to replace the von Neumann entropy by the smooth min-
and max-entropies which can be justified by its profound operational interpretation (re-
call for example the discussion of the channel coding theorem where we worked with the
min-entropy and where the Shannon entropy only appears as a special case).
One can prove that the smooth min-entropy of a product state ρ⊗n converges for large
n to n-times the von Neumann entropy of the state ρ. The quantum mechanical min-
entropy thus generalizes the von Neumann entropy in some sense. But since this work is
still in progress we forgo this modern point of view and begin with the definition of the von
Neumann entropy and only indicate at the end of the chapter these new developments.
Consider a state of the form
ρ_Z = Σ_z P_Z(z)|z⟩⟨z|,
where P_Z(z) is the probability distribution for measuring |z⟩ in a measurement of ρ in the basis {|z⟩}_z. Our central demand on the definition of the entropy measures of quantum states is that they generalize the classical entropies. More precisely, we demand that the evaluation of the quantum entropy on ρ_Z yields the corresponding classical entropy of the distribution P_Z(z). The following definitions meet these requirements, as we will see below.
Definition 7.1.1. Let ρ be an arbitrary state on a Hilbert space HA . Then the von
Neumann entropy H is the quantum mechanical generalization of the Shannon entropy.
It is defined by
H(A)ρ := −tr(ρ log ρ).
The quantum mechanical min-entropy Hmin generalizes the classical min-entropy. It is
defined by
Hmin (A)ρ := − log2 kρk∞ .
The quantum mechanical max-entropy Hmax generalizes the classical max-entropy. It is
defined by
Hmax (A)ρ := log2 |supp(ρ)|,
where supp(ρ) denotes the support of the operator ρ.
Now we check that our requirement from above really is fulfilled. To that purpose we consider again the state
ρ_Z = Σ_z P_Z(z)|z⟩⟨z|.
Since the map ρ → ρ log ρ is defined through the eigenvalues of ρ,
H(Z)_ρ = −tr(ρ log ρ) = −Σ_z P_Z(z) log₂ P_Z(z),
which reproduces the Shannon entropy as demanded. Recall that ‖ρ‖_∞ is the operator norm, which equals the greatest eigenvalue of the operator ρ. Thus, the quantum mechanical min-entropy reproduces the classical min-entropy:
H_min(Z)_ρ = −log₂ ‖ρ‖_∞ = −log₂ max_{z∈Z} P_Z(z).
To show that the classical max-entropy emerges as a special case of the quantum mechanical max-entropy we make the simple observation
H_max(Z)_ρ = log₂ |supp ρ| = log₂ |supp P_Z|.
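All three entropies are simple spectral quantities, which the following sketch illustrates (Python with numpy; the example distribution is our own choice). For a classical state ρ_Z it reproduces the Shannon, min- and max-entropies of P_Z, exactly as derived above:

import numpy as np

def entropies(rho, tol=1e-12):
    """von Neumann, min- and max-entropy of a density operator (base 2)."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > tol]                           # support of rho
    H = -(ev * np.log2(ev)).sum()               # -tr(rho log rho)
    Hmin = -np.log2(ev.max())                   # -log ||rho||_inf
    Hmax = np.log2(len(ev))                     # log |supp(rho)|
    return H, Hmin, Hmax

# Classical state rho_Z = sum_z P_Z(z)|z><z|
P = np.array([0.5, 0.25, 0.25])
H, Hmin, Hmax = entropies(np.diag(P))
assert np.isclose(H, 1.5)           # Shannon entropy of P_Z
assert np.isclose(Hmin, 1.0)        # -log2 max_z P_Z(z)
assert np.isclose(Hmax, np.log2(3)) # log2 |supp P_Z|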
Notation. Let ρAB be a density operator on the Hilbert space HA ⊗ HB and let ρA
and ρB be defined as the partial traces
ρA := trB ρAB , ρB := trA ρAB .
Then the entropies of the states ρAB ∈ S(HA ⊗ HB ), ρA ∈ S(HA ) and ρB ∈ S(HB ) are
denoted by
H(AB)ρ := H(AB)ρAB , H(A)ρ := H(A)ρA , H(B)ρ := H(B)ρB .
Lemma 7.2.1. Let ρ be a state on a Hilbert space H_A. Then H(A)_ρ ≥ 0, with equality iff ρ is pure.
Proof. Let {|j⟩}_j be a complete orthonormal system which diagonalizes ρ, i.e.,
ρ = Σ_j p_j |j⟩⟨j|,
with Σ_j p_j = 1. Therefore,
H(A)_ρ = −Σ_j p_j log p_j. (7.1)
The function −x log x is non-negative on [0, 1]. Consequently, the RHS above is non-negative, which shows that the entropy is non-negative. It is left to show that H(A)_ρ = 0 iff ρ is pure.
Assume H(A)_ρ = 0. Since the function −x log x is non-negative on [0, 1], each term in the summation in (7.1) has to vanish separately. Thus, either p_k = 0 or p_k = 1 for all k. Because of the constraint Σ_j p_j = 1, exactly one coefficient p_m is equal to one whereas all the others vanish. We conclude that ρ describes the pure state |m⟩.
Conversely, if ρ = |φ⟩⟨φ| is pure, its only non-zero eigenvalue is 1 and hence H(A)_ρ = 0.
Note that a function f applied to an operator is defined through the operator's eigenvalues. In particular,
f(U M U^{−1}) = U f(M) U^{−1}
for U ∈ GL(H) arbitrary. Indeed, let D = V M V^{−1} denote the diagonal matrix similar to M. The operator V U^{−1} diagonalizes U M U^{−1}. According to the definition above,
f(U M U^{−1}) = U V^{−1} f(V U^{−1} · U M U^{−1} · U V^{−1}) V U^{−1} = U V^{−1} f(V M V^{−1}) V U^{−1} = U f(M) U^{−1}.
Lemma 7.2.3. Let H_A and H_B be Hilbert spaces, let |ψ⟩ be a pure state on H_A ⊗ H_B and let ρ_AB := |ψ⟩⟨ψ|. Then,
H(A)_ρ = H(B)_ρ.
Proof. According to the Schmidt decomposition there exist orthonormal families {|i_A⟩} and {|i_B⟩} in H_A and H_B, respectively, and positive real numbers {λ_i} with the property Σ_i λ_i² = 1 such that
|ψ⟩ = Σ_i λ_i |i_A⟩ ⊗ |i_B⟩.
Hence, tr_B(ρ_AB) and tr_A(ρ_AB) have the same eigenvalues and thus H(A)_{ρ_AB} = H(B)_{ρ_AB}.
For a product state ρ_A ⊗ ρ_B with eigenvalues {p_i^A} and {p_j^B} of ρ_A and ρ_B, respectively, we deduce (Lemma 7.2.4)
H(AB)_{ρ_A⊗ρ_B} = −Σ_{ij} p_i^A p_j^B log(p_i^A p_j^B) = H(A)_{ρ_A} + H(B)_{ρ_B}.
Lemma 7.2.5. Let
ρ = p₁ρ₁ + ... + p_nρ_n
be a mixture of density operators ρ_i with mutually orthogonal support. Then
H(A)_ρ = H_class({p_i}_i) + Σ_i p_i H(A)_{ρ_i},
where H_class({p_i}_i) denotes the Shannon entropy of the probability distribution {p_i}_i.
Proof. Let {λ_j^{(i)}} and {|j^{(i)}⟩} be the eigenvalues and eigenvectors of the density operators {ρ_i}. Thus,
ρ = Σ_{i,j} p_i λ_j^{(i)} |j^{(i)}⟩⟨j^{(i)}|
and consequently,
H(A)_ρ = −Σ_{i,j} p_i λ_j^{(i)} log(p_i λ_j^{(i)})
= −Σ_i (Σ_j λ_j^{(i)}) p_i log p_i − Σ_i p_i Σ_j λ_j^{(i)} log λ_j^{(i)}
= H_class({p_i}) + Σ_i p_i H(A)_{ρ_i}.
A consequence of this lemma is that the entropy is concave. More precisely, let ρ₁, ..., ρ_n be density operators on the same Hilbert space H_A and consider a mixture of those density operators according to a probability distribution {p_j}_j on {1, ..., n}, ρ = Σ_j p_j ρ_j. Then
H(A)_ρ ≥ Σ_j p_j H(A)_{ρ_j},
and thus,
p₁H(ρ_A^{(1)}) + ... + p_nH(ρ_A^{(n)}) ≤ H(p₁ρ_A^{(1)} + ... + p_nρ_A^{(n)}).
Lemma 7.2.6. Let H_A and H_Z be Hilbert spaces and let ρ_AZ be a state on H_A ⊗ H_Z which is classical on H_Z with respect to the basis {|z⟩}_z of H_Z, i.e., ρ_AZ is of the form
ρ_AZ = Σ_z P_Z(z) ρ_A^{(z)} ⊗ |z⟩⟨z|.
Then
H(AZ)_ρ = H_class({P_Z(z)}_z) + Σ_z P_Z(z) H(A)_{ρ_A^{(z)}}.
z
Proof. Define
ρ̃_z := ρ_A^{(z)} ⊗ |z⟩⟨z|,
apply Lemma 7.2.5 with ρ_i replaced by ρ̃_z, use Lemma 7.2.4 and apply Lemma 7.2.1.
Recall the identity H(A|B) = H(AB) − H(B), which we encountered for classical entropies in the chapter about classical information theory. We use exactly this identity to define conditional entropy in the context of quantum information theory.
Definition 7.3.1. Let H_A and H_B be two Hilbert spaces and let ρ_AB be a state on H_A ⊗ H_B. Then, the conditional entropy H(A|B)_ρ is defined by
H(A|B)_ρ := H(AB)_ρ − H(B)_ρ.
Recasting this defining equation leads immediately to the so-called chain rule
H(AB)_ρ = H(A|B)_ρ + H(B)_ρ.
Lemma 7.3.2. Let ρAB be a pure state on a Hilbert space HA ⊗HB . Then H(A|B)ρAB < 0
iff ρAB is entangled, i.e. H(AB)ρAB 6= H(A)ρAB + H(B)ρAB .
Proof. Observe that
H(A|B)ρAB = H(AB)ρAB − H(B)ρAB .
Recall from Lemma 7.2.1 that the entropy of a state is zero iff it is pure. The state
trA (ρAB ) is pure iff ρAB is not entangled. Thus, indeed H(A|B)ρAB is negative iff ρAB is
entangled.
Hence, the conditional entropy can be negative.
Lemma 7.3.3. Let HA , HB and HC be Hilbert spaces and let ρABC be a pure state on
HA ⊗ HB ⊗ HC . Then,
H(A|B)ρABC = −H(A|C)ρABC .
Proof. We have seen in Lemma 7.2.3 that ρ_ABC pure implies H(AB)_ρ = H(C)_ρ and H(B)_ρ = H(AC)_ρ. Thus,
H(A|B)_ρ = H(AB)_ρ − H(B)_ρ = H(C)_ρ − H(AC)_ρ = −H(A|C)_ρ.
Lemma 7.3.4. Let H_A and H_Z be Hilbert spaces, let {|z⟩}_z be a complete orthonormal basis in H_Z and let ρ_AZ be classical on H_Z with respect to the basis {|z⟩}_z, i.e.,
ρ_AZ = Σ_z P_Z(z) ρ_A^{(z)} ⊗ |z⟩⟨z|.
Then
H(A|Z)_ρ = Σ_z P_Z(z) H(A)_{ρ_A^{(z)}}.
Moreover,
H(A|Z)_ρ ≥ 0.
Proof. Apply Lemma 7.2.6 to get
H(A|Z)_ρ = H(AZ)_ρ − H(Z)_ρ = Σ_z P_Z(z) H(A)_{ρ_A^{(z)}}.
In Lemma 7.2.1 we have seen that H(ρ) ≥ 0 for all states ρ. Hence, H(A|Z)_ρ ≥ 0.
Now it’s time to state one of the central identities in quantum information theory: the
so called strong subadditivity.
Theorem 7.3.5. Let ρABC be a state on HA ⊗ HB ⊗ HC . Then,
H(A|B)ρABC ≥ H(A|BC)ρABC .
In textbooks one presently finds rather complex proofs of this theorem, based on the Araki-Lieb inequality (see for example [1]). An alternative, shorter proof can be found in [11].
Lemma 7.3.6. Let ρ be an arbitrary state on a d-dimensional Hilbert space H. Then,
H(ρ) ≤ log₂ d,
with equality iff ρ is the completely mixed state (1/d) id_H.
Proof. Let ρ be a state on H which maximizes the entropy and let {|j⟩} be the diagonalizing basis, i.e.,
ρ = Σ_j p_j |j⟩⟨j|.
The entropy depends only on the state's eigenvalues; thus, in order to maximize the entropy, we may consider the entropy H as a function mapping ρ's eigenvalues (p₁, ..., p_d) ∈ [0, 1]^d to R. Consequently, we have to maximize the function H(p₁, ..., p_d)
under the constraint p₁ + ... + p_d = 1. This is usually done using Lagrange multipliers. One gets p_j = 1/d for all j = 1, ..., d and therefore
ρ = (1/d) id_H
(this is the completely mixed state). This description of the state uniquely characterizes the state, independently of the choice of the basis the matrix above refers to, since the identity id_H is unaffected by similarity transformations. This proves that ρ is the only state that maximizes the entropy. The immediate observation that H(ρ) = log₂ d for this state concludes the proof.
A related statement holds when conditioning on a quantum system: if ρ_XB is classical on X, then
H(X|B)_ρ ≥ 0.
To see this, let X′ be a copy of the classical value X, i.e., consider the state ρ_BXX′ obtained from ρ_XB by appending a register X′ containing the same value as X. Hence,
H(X|B)_{ρ_BXX′} = H(BX)_{ρ_BXX′} − H(B)_{ρ_BXX′}
and
H(X|BX′)_{ρ_BXX′} = H(BXX′)_{ρ_BXX′} − H(BX′)_{ρ_BXX′}.
According to the strong subadditivity,
H(X|B)_{ρ_BXX′} ≥ H(X|BX′)_{ρ_BXX′}.
To prove the assertion we have to show that the RHS vanishes or, equivalently, that H(BXX′)_{ρ_BXX′} is equal to H(BX′)_{ρ_BXX′}. Let ρ_BX′ denote the state which emerges from ρ_BXX′ after the application of tr_X(·). Hence, H(BX′)_{ρ_BXX′} = H(BX′)_{ρ_BX′}. Further, consider the state ρ_BX′ ⊗ |0⟩⟨0|, where |0⟩ is a state in the basis {|x⟩}_x of the Hilbert space H_X. Define the map
S : H_X ⊗ H_X′ → H_X ⊗ H_X′
by
S(|z0⟩) := |zz⟩
S(|zz⟩) := |z0⟩
S(|xy⟩) := |xy⟩ (otherwise).
We observe
[I_B ⊗ S](ρ_BX′ ⊗ |0⟩⟨0|)[I_B ⊗ S]^{−1} = ρ_BXX′.
Obviously, [I_B ⊗ S] ∈ GL(H_X ⊗ H_X′) (the general linear group; in fact S is unitary) and thus does not change the entropy:
H(BXX′)_{ρ_BXX′} = H(BX′X)_{ρ_BX′⊗|0⟩⟨0|} = H(BX′)_{ρ_BX′}.
The conditional entropy cannot decrease when a trace-preserving CPM is applied to the conditioning system. More precisely, let E ∈ TPCPM(H_B, H_B′) and let ρ_AB′ := [I_A ⊗ E](ρ_AB) be a state on H_A ⊗ H_B′. Then,
H(A|B)_{ρ_AB} ≤ H(A|B′)_{ρ_AB′}.
According to the Stinespring dilation, the Hilbert space H_R can be chosen such that there exists a unitary U with the property
E = tr_R ∘ ad_U(· ⊗ |0⟩⟨0|),
where ad_U(·) := U(·)U^{−1} and |0⟩⟨0| is a fixed pure state on an ancilla system. Since the entropy is invariant under similarity transformations we can use this transformation U to get
H(A|B)_{ρ_AB} = H(AB′R)_{[I_A⊗ad_U](ρ_AB⊗|0⟩⟨0|)} − H(B′R)_{[I_A⊗ad_U](ρ_AB⊗|0⟩⟨0|)}
= H(A|B′R)_{[I_A⊗ad_U](ρ_AB⊗|0⟩⟨0|)}
≤ H(A|B′)_{[I_A⊗tr_R∘ad_U](ρ_AB⊗|0⟩⟨0|)}
= H(A|B′)_{[I_A⊗E](ρ_AB)}
= H(A|B′)_{ρ_AB′},
where we have used the strong subadditivity and the Stinespring dilation.
Definition 7.4.1. The mutual information of a bipartite state ρ_AB is
I(A : B) := H(A)_ρ + H(B)_ρ − H(AB)_ρ = H(A)_ρ − H(A|B)_ρ,
and the conditional mutual information given a system C is
I(A : B|C) := H(A|C)_ρ − H(A|BC)_ρ.
We observe that the definition of quantum mutual information and the definition of classical mutual information are formally identical. Next we prove a small number of properties of the mutual information.
Lemma 7.4.2. Let ρ_ABC be a state on a Hilbert space H_A ⊗ H_B ⊗ H_C. Then,
I(A : B|C) ≥ 0.
Proof. By the definition of the conditional mutual information, this is simply a restatement of the strong subadditivity (Theorem 7.3.5).
Lemma 7.4.4. Let HA , HB , HC be Hilbert spaces and let ρABC be a state on a Hilbert
space HA ⊗ HB ⊗ HC . Then,
I(A : BC) = I(A : B) + I(A : C|B).
To prove this statement we simply have to plug in the definition of mutual information
and conditional mutual information.
Exercise (Bell state). Compute the mutual information I(A : B) of a Bell state ρAB .
You should get H(A) = 1, H(A|B) = −1 and thus I(A : B) = 2.
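The exercise can be verified numerically. The sketch below (Python with numpy) computes H(A), H(A|B) and I(A : B) for the Bell state:

import numpy as np

def vn_entropy(rho, tol=1e-12):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > tol]
    return -(ev * np.log2(ev)).sum()

beta = np.array([1, 0, 0, 1]) / np.sqrt(2)     # Bell state
rho_AB = np.outer(beta, beta)
R = rho_AB.reshape(2, 2, 2, 2)                 # indices (A, B, A', B')
rho_A = np.einsum('abcb->ac', R)               # tr_B
rho_B = np.einsum('abac->bc', R)               # tr_A

H_AB = vn_entropy(rho_AB)                      # 0 (pure state)
H_A = vn_entropy(rho_A)                        # 1
H_B = vn_entropy(rho_B)                        # 1
print("H(A|B) =", H_AB - H_B)                  # -1
print("I(A:B) =", H_A + H_B - H_AB)            # 2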
Definition. The conditional min-entropy of A given B of a state ρ_AB is
H_min(A|B)_ρ := max_{σ_B} (−log min{λ : λ · id_A ⊗ σ_B ≥ ρ_AB}),
where the maximisation is taken over density operators σ_B. Here σ_B^{−1} denotes the pseudo-inverse of σ_B, i.e. the operator
σ_B^{−1} = U diag(λ₁^{−1}, . . . , λ_ℓ^{−1}, 0, . . . , 0) U†,
where λ₁, . . . , λ_ℓ are the non-zero eigenvalues of σ_B = U diag(λ₁, . . . , λ_ℓ, 0, . . . , 0) U†.
The following lemma shows that the conditional min-entropy characterises the maximum probability of guessing a value X correctly given access to quantum information in a register B.
Lemma 7.5.1. Let ρ_XB = Σ_x |x⟩⟨x| ⊗ ρ_x. Then
H_min(X|B) = −log p_guess(X|B),
where
p_guess(X|B) = max_{{E_x} POVM} Σ_x tr[ρ_x E_x]
is the maximum probability of guessing X correctly given access to B.
Proof. The proof uses semidefinite programming, an extension of linear programming; for a review see [12]. Defining C = −Σ_x |x⟩⟨x| ⊗ ρ_x, X̃ = Σ_x |x⟩⟨x| ⊗ E_x, A_ij = id ⊗ e_ij, where e_ij denotes a matrix with a one in column i and row j, and b_ij := δ_ij, p_guess takes the standard form of a primal semidefinite programme:
−min{ tr(C X̃) : X̃ ≥ 0, tr(A_ij X̃) = b_ij for all i, j }.
The corresponding dual programme is min{ tr(σ_B) : id_X ⊗ σ_B ≥ Σ_x |x⟩⟨x| ⊗ ρ_x }, which by definition evaluates to 2^{−H_min(X|B)}. Both programmes are strictly feasible, since the points X̃ = id and σ = id are feasible points, respectively. By semidefinite programming duality, the two programmes therefore have the same value. This proves the claim.
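For a binary X the optimal POVM is known in closed form (the Helstrom measurement, a fact not derived in these notes), giving p_guess = (1/2)(1 + ‖ρ₀ − ρ₁‖₁) for the subnormalised ρ_x. The sketch below (Python with numpy; the example states are our own choice) evaluates this together with H_min(X|B) = −log p_guess:

import numpy as np

def trace_norm(A):
    return np.abs(np.linalg.eigvalsh(A)).sum()

# cq-state rho_XB = sum_x |x><x| (x) rho_x with subnormalised rho_x:
# X is uniform, B carries |0> or |+> depending on X
ketp = np.array([1, 1]) / np.sqrt(2)
rho0 = 0.5 * np.diag([1.0, 0.0])
rho1 = 0.5 * np.outer(ketp, ketp)

# Helstrom: p_guess = 1/2 (tr(rho0 + rho1) + ||rho0 - rho1||_1)
p_guess = 0.5 * (1 + trace_norm(rho0 - rho1))
H_min = -np.log2(p_guess)
print(p_guess, H_min)      # ~0.854, ~0.228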
Recall that by definition the conditional von Neumann entropy satisfies H(A|B) =
H(AB) − H(B). From the definition of the conditional min-entropy such an inequality is
certainly non-obvious and indeed false when taken literally. For most purposes, a set of
inequalities replaces this important equality (which is often known as a chain rule). To
give you the flavor of such inequalities we will prove the most basic one:
Lemma 7.5.2.
Hmin (A|B) ≥ Hmin (AB) − Hmax (B)
Proof.
H_min(A|B)_ρ = max_{σ_B}(−log min{λ : λ id_A ⊗ σ_B ≥ ρ_AB}) (7.2)
≥ −log min{λ : λ id_A ⊗ (ρ_B⁰/|supp ρ_B|) ≥ ρ_AB} (7.3)
= −log min{µ |supp ρ_B| : µ id_A ⊗ id_B ≥ ρ_AB} (7.4)
= −log min{µ : µ id_A ⊗ id_B ≥ ρ_AB} − log |supp ρ_B| (7.5)
= H_min(AB)_ρ − H_max(B)_ρ, (7.6)
where ρ_B⁰ denotes the projector onto the support of ρ_B.
Strong subadditivity of the von Neumann entropy is the inequality
H(AB) + H(BC) ≥ H(ABC) + H(B).
Using the definition of the conditional von Neumann entropy, this is equivalent to the inequality
H(A|B) ≥ H(A|BC),
which is often interpreted as “conditioning reduces entropy”. In this form, it has a direct
analog for conditional min entropy:
Lemma 7.5.3.
Hmin (A|B) ≥ Hmin (A|BC)
Proof. Since λ id_A ⊗ σ_BC ≥ ρ_ABC implies λ id_A ⊗ σ_B ≥ ρ_AB (the partial trace tr_C preserves operator inequalities), the optimal σ_BC for H_min(A|BC) yields a feasible pair (λ, σ_B) for H_min(A|B). Hence H_min(A|B) ≥ H_min(A|BC).
In the exercises, you will show that these two lemmas also hold for the smooth min- and max-entropy. Combined with the asymptotic equipartition property that we discussed in the part on classical information theory, you will then prove strong subadditivity of the von Neumann entropy. This fundamental result by the mathematical physicists Mary Beth Ruskai and Elliott Lieb was proven in 1973 and remains the only known inequality for the von Neumann entropy; there may be more, we just haven't discovered them yet!
8 Resource Inequalities
We have seen that ebits, classical communication and quantum communication can be seen as valuable resources with which we can achieve certain tasks. An important example was the teleportation protocol, which shows that one ebit and two bits of classical communication can simulate the transmission of one qubit. In the following we will develop a framework for the transformation of resources and present a technique that allows one to show the optimality of certain transformations.
The basic resources are:
• n bits of classical communication, or cbits (Alice sends n bits to Bob)
• n shared entanglement, or ebits (Alice and Bob share n Bell pairs)
• n shared bits
for all ε > 0 and n large enough.
In the remainder we will only be concerned with an exact conversion of perfect resources
with the main goal to show that the teleportation and superdense coding protocols are
optimal.
8.2 Monotones
Given a class of quantum operations, a monotone M is a function from states into the real
numbers that has the property that it does not increase under any operations from the
class. Rather than making this definition too formal (e.g. by specifying exactly on which
systems the operations act), we will consider a few characteristic examples.
Example 8.2.1. For bipartite states, the quantum mutual information is a monotone for the class of local operations. More precisely, given a bipartite state ρ_AB and a local quantum operation (CPTP map), say on Bob's side, Λ : End(B) → End(B′), we have
I(A : B) ≥ I(A : B′).
To see this, let B′ together with an environment B″ arise from B by an isometry (a Stinespring dilation of Λ). Then
I(A : B) = I(A : B′B″) = I(A : B′) + I(A : B″|B′).
Strong subadditivity implies that the second term is nonnegative, which leads us to the desired conclusion.
A similar argument shows that the conditional mutual information I(A : B|E) does not increase under local operations either, where ρ_ABE is an arbitrary extension of ρ_AB, i.e. satisfies tr_E ρ_ABE = ρ_AB.
Example 8.2.2 (Squashed entanglement). The squashed entanglement of a state ρ_AB is given by
E_sq(A : B) := (1/2) inf_E I(A : B|E),
where the minimisation extends over all extensions ρ_ABE of ρ_AB. Note that we do not impose a limit on the dimension of E. (That is why we do not know whether the minimum is achieved and write inf rather than min.) Squashed entanglement is a monotone under local operations and classical communication (often abbreviated as LOCC). That squashed entanglement is monotone under local operations follows immediately from the previous example. It only remains to verify that it does not increase under classical communication.
Consider the case where Alice sends a classical system C to Bob (e.g. a bit string).
We want to compare E_sq(AC : B) and E_sq(A : BC). For any extension E, we have
I(B : AC|E) = H(B|E) − H(B|ACE)
≥ H(B|EC) − H(B|ACE) (strong subadditivity)
= I(B : A|EC)
= I(BC : A|E′) (with E′ := EC)
≥ inf_{E′} I(BC : A|E′).
This shows that E_sq(AC : B) ≥ E_sq(A : BC). By symmetry E_sq(AC : B) = E_sq(A : BC) follows.
8.3 Teleportation is optimal
We want to show that the resource inequality
n ebits + unlimited classical communication ≥ m qubits
implies
n ≥ m.
Since the transmission of m qubits can in particular be used to establish m ebits, the above would give
n ebits + unlimited classical communication ≥ m ebits,
so we only need to show that we cannot increase the number of ebits by classical commu-
nication. This sounds easy, but in fact needs our monotone squashed entanglement. Since
every possible extension ρABE of a pure state ρAB (for instance the n ebits) is of the form
ρABE = ρAB ⊗ ρE we find
2 E_sq(A : B) = inf_E I(A : B|E) = I(A : B) = 2n. (8.1)
Since squashed entanglement cannot increase under LOCC and equals m for m ebits, m ≤ n follows.
8.4 Superdense coding is optimal
We want to prove that we need at least one qubit channel in order to send two classical
bits, regardless of how many ebits we have available:
n qubits + unlimited shared ebits ≥ 2m cbits
implies m ≤ n.
Now we have to prove that this implies n ≥ m, i.e. entanglement does not help us to
send more qubits. For this, we consider an additional player Charlie who holds system
C and shares ebits with Alice. Let Bi be Bob’s initial system, Q an n qubit system that
Alice sends to Bob, Λ Bob’s local operation and Bf Bob’s final system. Clearly, if an n
qubit channel could simulate an m qubit channel for m > n, then Alice could send m
fresh halves of ebits that she shares with Charlie to Bob, thereby increasing the quantum
mutual information between Charlie and Bob by 2m.
[Figure: Charlie shares ebits with Alice; Alice sends n qubits to Bob, who holds B_i and applies Λ to obtain B_f.]
We are now going to show that the amount of quantum mutual information that Bob
and Charlie share cannot increase by more than two times the number of qubits that he
receives from Alice, i.e. by 2n. For this we bound Bob’s final quantum mutual information
with Charlie by
I(C : B_f) ≤ I(C : B_i Q)
= I(C : B_i) + I(C : Q|B_i)
≤ I(C : B_i) + 2n,
where the last step uses I(C : Q|B_i) ≤ 2 log₂ dim(Q) = 2n.
Therefore m ≤ n. This concludes our proof that the superdense coding protocol is optimal.
Interestingly, for this argument, we did not use a monotone such as the squashed entanglement from above. We merely used the property that the quantum mutual information cannot increase by too much under communication. Quantities that have the opposite behaviour (i.e. that can increase sharply when only a few qubits are communicated) are known as lockable quantities and have been a focus of attention in quantum information theory in recent years. So, we might also say that the quantum mutual information is nonlockable.
8.5 Entanglement
We have already encountered the word entanglement many times. Formally, we say that a quantum state ρ_AB is separable if it can be written as a convex combination of product states, i.e.
ρ_AB = Σ_k p_k τ_k ⊗ σ_k,
where the p_k form a probability distribution, the τ_k are states on A and the σ_k are states on B. A state is then called entangled if it is not separable.
A characteristic example of a separable state is
• ρ_AB = |φ⟩⟨φ|_A ⊗ |ψ⟩⟨ψ|_B.
Characteristic examples of entangled states are
• non-maximally entangled pure states of the form Σ_i α_i |ii⟩, where the α_i are not all of equal magnitude. In certain cases they can be converted (distilled) into maximally entangled states (of lower dimension) using Nielsen's majorisation criterion [13].
• the totally antisymmetric state ρ_AB = (1/(d(d−1))) Σ_{i<j} |ij − ji⟩⟨ij − ji|_AB, which can be seen to be entangled, since every pure state supported on the antisymmetric subspace is entangled.
Theorem 8.5.1. For any state ρAB we have that Esq (A : B) = 0 iff ρAB is separable.
Proof. We only prove here that ρ_AB separable implies E_sq(A : B) = 0. The converse is beyond the scope of this course and has been proven recently [14]. We consider the following extension of the separable state, a classical-quantum state
ρ_ABC = Σ_i p_i ρ_A^i ⊗ ρ_B^i ⊗ |i⟩⟨i|_C,
with the p_i forming a probability distribution (i.e. p_i ≥ 0 and Σ_i p_i = 1). Using the definition of the mutual information we can write
I(A : B|C) = H(A|C) − H(A|BC) = Σ_i p_i H(A)_{ρ_A^i} − Σ_i p_i H(A)_{ρ_A^i} = 0.
The first two equalities follow by definition, and the final step can be verified by the chain rule: conditioned on the classical register C = i, the state of AB is the product ρ_A^i ⊗ ρ_B^i, so both H(A|C) and H(A|BC) equal Σ_i p_i H(A)_{ρ_A^i}.
Since ebits are so useful, we can ask ourselves how many ebits we can extract per given
copy of ρAB , as the number of copies approaches infinity. Formally, this number is known
as the distillable entanglement of ρAB :
E_D(ρ_AB) = lim_{ε→0} limsup_{n→∞} sup_{Λ LOCC} { m/n : ⟨ebit|^{⊗m} Λ(ρ_AB^{⊗n}) |ebit⟩^{⊗m} ≥ 1 − ε }.
This number is obviously very difficult to compute, but there is a whole theory of entanglement measures with the aim of providing upper bounds on the distillable entanglement. A particularly easy upper bound is given by the squashed entanglement. The proof uses only the monotonicity of squashed entanglement under LOCC operations and the fact that the squashed entanglement of a state that is close to n ebits (in the purified distance) is close to n. In the exercises you will show that the squashed entanglement of a separable state is zero. This then immediately implies that one cannot extract any ebits from separable states.
8.6 Cloning
The very important no-cloning theorem [15, 16] states that there cannot exist a quantum
operation that takes a state |ψi to |ψi ⊗ |ψi for all states |ψi. It has far-reaching con-
sequences and there exist several different proofs. It is desirable to have a proof that is as
independent of the underlying theory as possible. For example, a proof based on the linearity of quantum mechanics is problematic, as the proof would become invalid if someone were to detect non-linear quantum effects, which in principle could exist.
We next present two different proofs of the no-cloning theorem. Recall that for any
state ρABC we have1
H(A|B) + H(A|C) ≥ 0. (8.2)
Assume that we have a machine that takes some system Q and outputs two copies Q1
and Q2 . Furthermore let R denote a reference system (e.g. a qubit). Let I(R : Q) = 2,
then after the cloning we must have I(R : Q1 ) = I(R : Q2 ) = 2. Using the definition of
the mutual information we obtain H(R) + H(R|Q1 ) = 2 and H(R) + H(R|Q2 ) = 2. Let
H(R) = 1, we then have H(R|Q1 ) = H(R|Q2 ) = −1 which contradicts (8.2) and hence
proves that such a cloning machine cannot exist.
We next present an even more theory independent proof. Consider the following exper-
iment.
[Figure: Alice prepares a state parametrized by an angle α; a cloning machine produces copies Q₁ and Q₂ for Bob₁ and Bob₂, who each output a guess X.]
¹ Let ρ_ABCD be a purification; then H(A|B) + H(A|CD) = 0. Using the data processing inequality gives H(A|B) + H(A|C) ≥ 0.
9 Quantum Key Distribution
9.1 Introduction
In this chapter, we introduce the concept of quantum key distribution. Traditionally,
cryptography is concerned with the problem of securely sending a secret message from A
to B. Note however that secure message transmission is only one branch of cryptography.
Another example for a problem studied in cryptography is coin tossing. There the problem
is that two parties, Alice and Bob, who are physically separated and do not trust each other, want to toss a coin over the telephone. Blum showed that this problem cannot be solved as long as one does not introduce additional assumptions [17]. Note that coin tossing is possible using quantum communication.
To start, we introduce the concept of cryptographic resources. A classical insecure
communication channel is denoted by
[Diagram: an insecure classical channel between Alice and Bob, with arrows indicating that Eve receives all messages and can modify what Bob receives.]
The arrows to the adversary, Eve, indicate that she can receive all messages sent by Alice.
Furthermore Eve is able to modify the message which Bob finally receives. This channel
does not provide any guarantees. It can be used to model for example email traffic.
A classical authentic channel is denoted by
and guarantees that messages received by Bob are sent by Alice. It can be used to describe
e.g. a telephone conversation with voice authentication.
The most restrictive classical channel model we consider is the so-called secure channel
which has the same guarantees as the authentic channel and ensures in addition that no
information leaks. It is denoted by
In the quantum setup, an insecure quantum channel that has no guarantees is repre-
sented by
Note that an authentic quantum channel is automatically also a secure quantum channel
since reading out a message always changes the message.
The following symbol,
k,
denotes k classical secret bits, i.e. k bits that are uniformly distributed and maximally correlated between Alice and Bob.
A desirable goal of quantum cryptography would be to have a protocol that simulates a secure classical channel using only an insecure quantum channel. However, such a protocol cannot exist, since this scenario has complete symmetry between Alice and Eve, which makes it impossible for Bob to distinguish between them. If we add a classical authentic channel in addition to the insecure quantum channel, it is possible, as we shall see, to simulate a classical secret channel, i.e.,
classical authentic channel + insecure quantum channel ≥ n secret bits
is possible.
In classical cryptography there exists a protocol [18], called an authentication protocol, that achieves the following:
k secret bits + insecure classical channel ≥ classical authentic channel.
Thus, if Alice and Bob have a (short) password, they can use an insecure channel to simulate an authentic channel. This implies
k secret bits + insecure classical channel + insecure quantum channel ≥ n secret bits.
Let M be a message bit and S a secret key bit. The operation ⊕ denotes an addition
modulo 2. Alice first computes C = M ⊕ S and sends C over a classical authentic channel
to Bob. Bob then computes M 0 = C ⊕ S. The protocol is correct as
M 0 = C ⊕ S = (M ⊕ S) ⊕ S = M ⊕ (S ⊕ S) = M.
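The one-time pad is a one-liner in code. The following sketch (Python, standard library only; the sample message is arbitrary) encrypts and decrypts with a uniformly random key of the same length as the message:

import secrets

def otp(message: bytes, key: bytes) -> bytes:
    # C = M xor S, bitwise; the key must be as long as the message
    assert len(key) == len(message)
    return bytes(m ^ s for m, s in zip(message, key))

message = b"attack at dawn"
key = secrets.token_bytes(len(message))   # uniformly random key S
cipher = otp(message, key)                # Alice: C = M xor S
assert otp(cipher, key) == message        # Bob:   M = C xor S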
Theorem 9.2.1 (Shannon [19]). Information theoretically secure encryption requires a key at least as long as the message, i.e. k ≥ n.
Proof. Let M ∈ {0, 1}ⁿ be the message which should be sent secretly from Alice to Bob.
Alice and Bob share a secret key S ∈ {0, 1}k . Alice first encrypts the message M and
sends a string C over a public channel to Bob. Bob decrypts the message, i.e. he computes
a string M 0 out of C and his key S. We assume that the protocol fulfills the following two
requirements.
1. Reliability: Bob should be able to reproduce M (i.e. M 0 = M ).
2. Secrecy: Eve does not gain information about M .
We consider a message that is uniformly distributed on {0, 1}n . The secrecy requirement
can be written as I(M : C) = 0 which implies that H(M |C) = H(M ) = n. We thus
obtain
I(M : S|C) = H(M |C) − H(M |CS) = n, (9.1)
where we also used the reliability requirement H(M |CS) = 0 in the last equality. Using
the data processing inequality and the non negativity of the Shannon entropy, we can
write
I(M : S|C) = H(S|C) − H(S|CM ) ≤ H(S). (9.2)
Combining (9.1) and (9.2) gives n ≤ H(S) which implies that k ≥ n.
Shannon’s result shows that information theoretic secrecy (i.e. I(M : C) ≈ 0) cannot
be achieved unless one uses very long keys (as long as the message).
In computational cryptography, one relaxes the security criterion. More precisely, the
mutual information I(M : C) is no longer small, but it is still computationally hard (i.e.
it takes a lot of time) to compute M from C. In other words, we no longer have the
requirement that H(M |C) is large. In fact, for public key cryptosystems (such as RSA
and DH), we have H(M |C) = 0. This implies that there exists a function f such that
M = f (C), which means that it is in principle possible to compute M from C. Security
is obtained because f is believed1 to be hard to compute. Note, however, that for the
protocol to be practical, one requires that there exists an efficiently computable function
g, such that M = g(C, S).
Quantum mechanics, however, allows this limitation to be circumvented. Note that this does not contradict Shannon's proof of Theorem 9.2.1, since in the quantum
regime the no-cloning theorem (cf. Section 8.6) forbids that Bob and Eve receive the same
state, i.e., the ciphertext C is not generally available to both of them. Therefore, Shannon’s
proof is not valid in the quantum setup, which allows quantum cryptography to go beyond
classical cryptography.
As we will see in the following, it is sufficient to consider quantum key distribution
(QKD), which does the following.
Quantum key distribution (QKD) produces ≈ n secret bits from an insecure quantum channel and a classical authentic channel:
insecure quantum channel + classical authentic channel ≥ ≈ n secret bits. (∗)
The protocol (∗) implies secure message transmission, as we can concatenate it with the one-time pad encryption: the ≈ n secret bits generated by QKD serve as the one-time pad key for an n-bit message,
¹ In classical cryptography one usually makes statements of the following form: if f were easy to compute, then some other function F would also be easy to compute. For example, F could be the decomposition of a number into its prime factors.
which is the justification that we can focus on the task of QKD in the following.
We next define more precisely what we mean by a secret key, as denoted by SA and SB .
In quantum cryptography, we generally consider the following three requirements, where ε ≥ 0:
1. Correctness: Pr[S_A ≠ S_B] ≤ ε.
2. Robustness: if the adversary is passive, then Pr[S_A = ⊥] ≤ ε.
3. Secrecy: ‖ρ_{S_A E} − (p ρ_⊥ ⊗ ρ_E^⊥ + (1 − p)ρ_∥ ⊗ ρ_E^∥)‖₁ ≤ ε, where ρ_E^⊥, ρ_E^∥ are arbitrary density operators, ρ_⊥ = |⊥⟩⟨⊥| and ρ_∥ is the completely mixed state on {0, 1}ⁿ, i.e. ρ_∥ = 2^{−n} Σ_{s∈{0,1}ⁿ} |s⟩⟨s|. The cq-state ρ_{S_A E} describes the key S_A together with the system E held by the adversary after the protocol execution. The parameter p can be viewed as the failure probability of the protocol.
The secrecy condition implies that there is either a uniform and uncorrelated (to E) key
or there is no key at all.
BB84 Protocol:
Distribution step Alice and Bob perform the following task N times and let i =
1, . . . , N . Alice first chooses Bi , Xi ∈ {0, 1} at random and prepares a state of
a qubit Qi (with basis {|0i, |1i}) according to
B X Q
0 0 |0i
0 1 |1i
1 0 |0̄i
1 1 |1̄i.
Alice then sends Qi to Bob.
Bob next chooses B′_i ∈ {0, 1} and measures Q_i either in the basis {|0⟩, |1⟩} (if B′_i = 0) or in the basis {|0̄⟩, |1̄⟩} (if B′_i = 1) and stores the result in X′_i. Recall that all the steps so far are repeated N times.
Sifting step Alice sends B₁, . . . , B_N to Bob and vice versa, using the classical authentic channel. Bob discards all outcomes for which B_i ≠ B′_i, and Alice does so as well. For better understanding we consider the following example situation.
Q |1i |1i |1̄i |0̄i |0i |1̄i |0̄i |1i |1̄i
B 0 0 1 1 0 1 1 0 1
X 1 1 1 0 0 1 0 1 1
B’ 0 0 0 1 1 0 1 1 0
X’ 1 1 1 0 1 1 0 0 1
no. 1 2 3 4 5 6 7 8 9
Hence, Alice and Bob discard columns 3 , 5 , 6 , 8 and 9 .
Checking step Alice and Bob compare (via communication over the classical authentic channel) X_i and X′_i for a randomly chosen sample of size √n. If there is disagreement, the protocol aborts, i.e. S_A = S_B = ⊥.
Extraction step We consider here the simplest case where we assume to have no
errors (due to noise). The key SA is equal to the remaining bits of X1 , . . . , Xn
and the key SB are the remaining bits of X10 , . . . , Xn0 . Note that the protocol
can be generalized such that it also works in the presence of noise.
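The distribution and sifting steps are easy to simulate in the noiseless, adversary-free case. In the sketch below (Python with numpy), we take |0̄⟩, |1̄⟩ to be the Hadamard basis and the round count N = 1000 is arbitrary; the sifted keys then agree exactly:

import numpy as np

rng = np.random.default_rng(0)
N = 1000

# States |0>, |1> and the conjugate basis |0bar>, |1bar>
kets = {(0, 0): np.array([1.0, 0.0]), (0, 1): np.array([0.0, 1.0]),
        (1, 0): np.array([1.0, 1.0]) / np.sqrt(2),
        (1, 1): np.array([1.0, -1.0]) / np.sqrt(2)}

B = rng.integers(2, size=N)       # Alice's bases
Xa = rng.integers(2, size=N)      # Alice's bits
Bp = rng.integers(2, size=N)      # Bob's bases

Xb = np.empty(N, dtype=int)
for i in range(N):
    q = kets[(B[i], Xa[i])]
    # Bob measures in basis Bp[i]: outcome probabilities |<ket|q>|^2
    p0 = abs(np.vdot(kets[(Bp[i], 0)], q)) ** 2
    p0 = min(max(p0, 0.0), 1.0)   # guard against rounding
    Xb[i] = rng.choice(2, p=[p0, 1 - p0])

# Sifting: keep rounds with matching bases; without noise, keys agree
keep = B == Bp
assert np.array_equal(Xa[keep], Xb[keep])
print("sifted key length:", keep.sum())   # ~ N/2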
The security analysis relies on an uncertainty relation of the form
H(X|E) + H(Z|B) ≥ 1, (9.3)
where Z denotes a measurement in the basis {|0⟩, |1⟩}, X denotes a measurement in the basis {|0̄⟩, |1̄⟩} and where B and E are arbitrary quantum systems.
Ekert91 protocol: Similarly to the BB84 protocol this scheme also consists of four different
steps.
Distribution step (repeated N times) Alice prepares entangled qubit pairs and sends one half of each pair to Bob (over the insecure quantum channel). Alice and Bob then measure their qubits in random bases B_i (for Alice) and B′_i (for Bob) and record the outcomes X_i (for Alice) and X′_i (for Bob).
Sifting step Alice and Bob discard all (Xi , Xi0 ) for which Bi 6= Bi0 .
Checking step For a random sample of positions i Alice and Bob check whether
Xi = Xi0 . If the test fails they abort the protocol by outputting ⊥.
Extracting step Alice’s key SA consists of the remaining bits of X1 , X2 , . . .. Bob’s
key SB consists of the remaining bits X10 , X20 , . . ..
We next show that Ekert91 is equivalent to BB84. On Bob’s side it is easy to verify
that the two protocols are equivalent since Bob has to perform exactly the same tasks
for both. The following schematic figure summarizes Alice’s task in the Ekert91 and the
BB84 protocol.
[Figure: schematic comparison of Alice's devices in Ekert91 and BB84; each takes B_i and produces X_i and the qubit Q_i.]
In Ekert91, Alice's measurement on her half of the entangled pair prepares the state
ρ^{E91}_{B_iX_iQ_i} = (1/4) Σ_{b∈{0,1}} Σ_{x∈{0,1}} |b⟩⟨b|_{B_i} ⊗ |x⟩⟨x|_{X_i} ⊗ |ϕ_{b,x}⟩⟨ϕ_{b,x}|_{Q_i},
where |ϕ_{0,0}⟩ = |0⟩, |ϕ_{0,1}⟩ = |1⟩, |ϕ_{1,0}⟩ = |0̄⟩, and |ϕ_{1,1}⟩ = |1̄⟩. The BB84 protocol leads to the same state
ρ^{BB84}_{B_iX_iQ_i} = (1/4) Σ_{b∈{0,1}} Σ_{x∈{0,1}} |b⟩⟨b|_{B_i} ⊗ |x⟩⟨x|_{X_i} ⊗ |ϕ_{b,x}⟩⟨ϕ_{b,x}|_{Q_i}. (9.5)
We thus conclude that, viewed from outside the dashed box, the two protocols are equivalent in terms of security, and hence to prove security for BB84 it is sufficient to prove security for Ekert91. Note that both protocols have advantages and drawbacks: while for Ekert91 it is easier to prove security, the BB84 protocol is technologically simpler to implement.
[Figure: tripartite state ρ_ABE; Alice measures A in the basis {|0̄⟩, |1̄⟩} (outcome X) or {|0⟩, |1⟩} (outcome Z); Bob measures B analogously (outcomes X′, Z′).]
It remains to prove that the Ekert91 protocol is secure. The idea is to consider the state of the entire system (i.e. Alice, Bob and Eve) after the distribution of the
entangled qubit pairs over the insecure channel (which may be arbitrarily modified by
Eve) but before Alice and Bob have measured. The state ρABE is arbitrary except that
the subsystem A is a fully mixed state (i.e. ρA is maximally mixed). At this point the
completeness of quantum theory (cf. Chapter 5) shows up again. Since quantum theory is
complete, we know that anything Eve could possibly do is described within our framework.
We now consider two alternative measurements for Alice (B = 0, B = 1). Call the
outcome of the measurement Z if B = 0 and X if B = 1. The uncertainty relation (9.3)5
now implies that
H(Z|E) + H(X|B) ≥ 1, (9.6)
which holds for arbitrary states ρ_ABE, where the first term is evaluated for ρ_ZBE and the second term is evaluated for ρ_XBE. The state ρ_XBE is defined as the post-measurement state when measuring ρ_ABE in the basis {|0̄⟩, |1̄⟩} and the state ρ_ZBE is defined as the post-measurement state when measuring ρ_ABE in the basis {|0⟩, |1⟩}. Using (9.6), we can
bound Eve’s information as H(Z|E) ≥ 1 − H(X|B). We next show that H(X|B) = 0
which implies that H(Z|E) = 1, i.e. Eve has no information about Alice’s state. The data
processing inequality implies H(Z|E) ≥ 1 − H(X|X 0 ).
In the protocol, there is a step called the testing phase where two alternative things can
happen
• if Pr[X 6= X 0 ] > 0, then Alice and Bob detect a deviation in their sample and abort
the protocol.
• if Pr[X = X 0 ] ≈ 1, Alice and Bob output a key.
Let us therefore assume that Pr[X ≠ X′] = δ for δ ≈ 0. In this case, we have H(Z|E) ≥ 1 − h(δ), where h(δ) := −δ log₂ δ − (1 − δ) log₂(1 − δ) denotes the binary entropy function and h(δ) → 0 for δ → 0. Note that also H(Z) = 1, which implies that I(Z : E) = H(Z) − H(Z|E) ≤ h(δ). Recall that I(Z : E) = D(ρ_ZE‖ρ_Z ⊗ ρ_E). Thus, for δ → 0 we have D(ρ_ZE‖ρ_Z ⊗ ρ_E) → 0. This implies that ρ_ZE → ρ_Z ⊗ ρ_E, which completes the security proof.⁶
Important remarks on the security proof The proof given above establishes security under the assumption that there are no correlations between the rounds of the protocol. Note that if the state involved in the i-th round is described by ρ_{A_i B_i E_i}, we have in general

ρ_{A_1 B_1 E_1 ··· A_n B_n E_n} ≠ ρ_{A_1 B_1 E_1} ⊗ ··· ⊗ ρ_{A_n B_n E_n}.

Therefore, it is not sufficient to analyze the rounds individually; so far we have only proved security against i.i.d. attacks, but not against general attacks. Fortunately, there is a solution to this problem: the de Finetti theorem shows that the proof for individual attacks also implies security for general attacks. A rigorous proof of this statement is beyond the scope of this course; it can be found in [11] and in [27], which uses a postselection technique.
^5 The uncertainty relation was the topic of an exercise.
^6 In principle, we have to repeat the whole argument in the complementary basis, i.e. using the uncertainty relation H(X|E) + H(Z|B) ≥ 1 (cf. (9.6)).
Bibliography
[1] Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000.
[2] David H. Mellor. Probability: A Philosophical Introduction. Routledge, 2005.
[3] Claude E. Shannon. A mathematical theory of communication. Bell System Technical
Journal, 27:379–423 and 623–656, 1948.
[4] Roger Colbeck and Renato Renner. The completeness of quantum theory for predict-
ing measurement outcomes. arXiv:1208.4123.
[5] Matthew F. Pusey, Jonathan Barrett, and Terry Rudolph. On the reality of the
quantum state. DOI:10.1038/nphys2309, arXiv:1111.3328.
[6] Roger Colbeck and Renato Renner. Is a system’s wave function in one-to-one cor-
respondence with its elements of reality? DOI:10.1103/PhysRevLett.108.150402,
arXiv:1111.6597.
[7] Albert Einstein, Boris Podolsky, and Nathan Rosen. Can quantum-mechanical description of physical reality be considered complete? Phys. Rev., 47:777–780, 1935. DOI:10.1103/PhysRev.47.777.
[8] Simon B. Kochen and Ernst P. Specker. The problem of hidden variables in quantum
mechanics. Journal of Mathematics and Mechanics, 17, 1967.
[9] John S. Bell. On the Einstein Podolsky Rosen Paradox. Physics, 1, 1964.
[10] Alain Aspect, Jean Dalibard, and Gérard Roger. Experimental test of Bell's inequalities using time-varying analyzers. Phys. Rev. Lett., 49:1804–1807, 1982. DOI:10.1103/PhysRevLett.49.1804.
[11] Renato Renner. Security of quantum key distribution. PhD thesis, ETH Zurich, December 2005. arXiv:quant-ph/0512258.
[12] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004. Available at http://www.stanford.edu/~boyd/cvxbook/.
[13] Michael A. Nielsen. Conditions for a class of entanglement transformations. Phys.
Rev. Lett., 83:436–439, Jul 1999. DOI:10.1103/PhysRevLett.83.436.
[14] Fernando Brandao, Matthias Christandl, and Jon Yard. Faithful squashed
entanglement. Communications in Mathematical Physics, 306:805–830, 2011.
DOI:10.1007/s00220-011-1302-1.
[15] William K. Wootters and Wojciech H. Zurek. A single quantum cannot be cloned. Nature, 299:802–803, 1982. DOI:10.1038/299802a0.
[16] Dennis Dieks. Communication by EPR devices. Physics Letters A, 92(6):271–272, 1982. DOI:10.1016/0375-9601(82)90084-6.
[17] Manuel Blum. Coin flipping by telephone - a protocol for solving impossible problems. SIGACT News, 15(1):23–27, January 1983. DOI:10.1145/1008908.1008911.
[18] Douglas R. Stinson. Cryptography: Theory and Practice. CRC Press, 2005.
[19] Claude E. Shannon. Communication theory of secrecy systems. Bell System Technical
Journal, 28:656–715, 1949.
[20] Stephen Wiesner. Conjugate coding. SIGACT News, 15(1):78–88, January 1983. DOI:10.1145/1008908.1008920.
[21] Charles H. Bennett and Gilles Brassard. Quantum cryptography: Public key distribu-
tion and coin tossing. Proceedings International Conference on Computers, Systems
& Signal Processing, 1984.
[22] Dominic Mayers. Unconditionally secure quantum bit commitment is impossible.
Phys. Rev. Lett., 78:3414–3417, Apr 1997. DOI:10.1103/PhysRevLett.78.3414.
[23] Peter W. Shor and John Preskill. Simple proof of security of the BB84
quantum key distribution protocol. Phys. Rev. Lett., 85:441–444, Jul 2000.
DOI:10.1103/PhysRevLett.85.441.
[24] Eli Biham, Michel Boyer, P. Oscar Boykin, Tal Mor, and Vwani Roychowdhury. A
proof of the security of quantum key distribution. Journal of Cryptology, 19:381–439,
2006. DOI:10.1007/s00145-005-0011-3.
[25] Artur K. Ekert. Quantum cryptography based on Bell's theorem. Phys. Rev. Lett., 67:661–663, Aug 1991. DOI:10.1103/PhysRevLett.67.661.
[26] Mario Berta, Matthias Christandl, Roger Colbeck, Joseph M. Renes, and Renato
Renner. The uncertainty principle in the presence of quantum memory. Nature
Physics, 6, 2010. DOI:10.1038/nphys1734.
[27] Matthias Christandl, Robert König, and Renato Renner. Postselection technique
for quantum channels with applications to quantum cryptography. Phys. Rev. Lett.,
102:020504, Jan 2009.