arXiv:2011.04672v2 [quant-ph] 11 Feb 2024

Principles of
Quantum Communication Theory:
A Modern Approach

Sumeet Khatri and Mark M. Wilde

February 13, 2024

Preface
[IN PROGRESS]

Acknowledgements
[IN PROGRESS]
We dedicate this book to the memory of Jonathan P. Dowling. Jon was generous
and kind-hearted, and he always gave all of his students his full, unwavering support.
His tremendous impact on the lives of everyone who met him will ensure that his
memory lives on and that he will not be forgotten. We will especially remember
Jon’s humour and his sharp wit. We are sure that, as he had promised, this book
would have made the perfect doorstop for his office.
Sumeet Khatri acknowledges support from the National Science Foundation
under Grant No. 1714215 and the Natural Sciences and Engineering Research
Council of Canada postgraduate scholarship. Mark M. Wilde acknowledges support
from the National Science Foundation over the past decade (specifically from Grant
Nos. 1350397, 1714215, 1907615, 2014010), and is indebted and grateful to Patrick
Hayden for hosting him for a sabbatical at Stanford University during calendar year
2020, with support from Stanford QFARM and AFOSR (FA9550-19-1-0369).

Table of Contents

1 Introduction

I Preliminaries

2 Mathematical Tools
2.1 Finite-Dimensional Hilbert Spaces
2.2 Linear Operators
2.2.1 Tensor Product
2.2.2 Image, Kernel, and Support
2.2.3 Trace
2.2.4 Transpose and Conjugate Transpose
2.2.5 Hilbert–Schmidt Inner Product, Vectorization, and Transpose Trick
2.2.6 Notable Classes of Linear Operators
2.2.7 Singular Value, Schmidt, and Polar Decompositions
2.2.8 Spectral Theorem
2.2.9 Norms
2.2.10 Operator Inequalities
2.2.11 Superoperators
2.3 Analysis and Probability
2.3.1 Limits, Infimum, Supremum, and Continuity
2.3.2 Compact Sets
2.3.3 Convex Sets and Functions
2.3.4 Fenchel–Eggleston–Carathéodory Theorem
2.3.5 Minimax Theorems
2.3.6 Probability Distributions
2.4 Semi-Definite Programming
2.4.1 SDPs for Spectral and Trace Norm, Maximum and Minimum Eigenvalue
2.5 Symmetric Subspace
2.6 Bibliographic Notes
2.7 Problems

3 Quantum States and Measurements
3.1 Axioms of Quantum Mechanics
3.2 Quantum Systems and States
3.2.1 Bipartite States and Schmidt Decomposition
3.2.2 Partial Trace
3.2.3 Separable and Entangled States
3.2.4 Bell States
3.2.5 Purifications and Extensions
3.2.6 Multipartite States and Permutations
3.2.7 Group-Invariant States
3.2.8 Ensembles and Classical–Quantum States
3.2.9 Partial Transpose and PPT States
3.2.10 Isotropic and Werner States
3.3 Measurements
3.4 Bibliographic Notes
Appendix 3.A Proof of Lemma 3.3
Appendix 3.B Proof of Lemma 3.4

4 Quantum Channels
4.1 Definition
4.2 Choi Representation
4.3 Characterizations of Channels: Choi, Kraus, Stinespring
4.3.1 Relating Quantum State Extensions and Purifications
4.3.2 Complementary Channels
4.3.3 Unitary Extensions of Quantum Channels from Isometric Extensions
4.4 General Types of Channels
4.4.1 Preparation, Appending, and Replacement Channels
4.4.2 Trace and Partial-Trace Channels
4.4.3 Isometric and Unitary Channels
4.4.4 Classical–Quantum and Quantum–Classical Channels
4.4.5 Quantum Instruments
4.4.6 Entanglement-Breaking Channels
4.4.7 Hadamard Channels
4.4.8 Covariant Channels
4.4.9 Bipartite and Multipartite Channels
4.5 Examples of Communication Channels
4.5.1 (Generalized) Amplitude Damping Channel
4.5.2 Erasure Channel
4.5.3 Pauli Channels
4.5.4 Generalized Pauli Channels
4.6 Special Types of Channels
4.6.1 Petz Recovery Map
4.6.2 LOCC Channels
4.6.3 Completely PPT-Preserving Channels
4.6.4 Non-Signaling Channels
4.7 Summary
4.8 Bibliographic Notes
4.9 Problems

5 Fundamental Quantum Information Processing Tasks
5.1 Quantum Teleportation
5.1.1 Qubit Teleportation Protocol
5.1.2 Qudit Teleportation Protocol
5.1.3 Post-Selected Teleportation
5.1.4 Teleportation-Simulable Channels
5.2 Quantum Super-Dense Coding
5.3 Quantum Hypothesis Testing
5.3.1 Symmetric Case (State Discrimination)
5.3.2 Multiple State Discrimination
5.3.3 Asymmetric Case
5.4 Quantum Channel Discrimination
5.5 Summary
5.6 Bibliographic Notes
5.7 Problems

6 Distinguishability Measures for Quantum States and Channels
6.1 Trace Distance
6.2 Fidelity
6.2.1 Sine Distance
6.3 Diamond Distance
6.4 Fidelity Measures for Channels
6.5 Bibliographic Notes
Appendix 6.A SDP for Normalized Diamond Distance
Appendix 6.B SDPs for Fidelity of States and Channels
6.B.1 Proof of Proposition 6.6
6.B.2 Proof of Proposition 6.24

7 Quantum Entropies and Information
7.1 Preview
7.2 Quantum Relative Entropy
7.2.1 Information Measures from Quantum Relative Entropy
7.2.2 Quantum Conditional Mutual Information
7.2.3 Quantum Mutual Information
7.3 Generalized Divergences
7.4 Petz–Rényi Relative Entropy
7.5 Sandwiched Rényi Relative Entropy
7.6 Geometric Rényi Relative Entropy
7.6.1 Proof of Proposition 7.41
7.6.2 Proof of Proposition 7.40
7.7 Belavkin–Staszewski Relative Entropy
7.8 Max-Relative Entropy
7.8.1 Smooth Max-Relative Entropy
7.9 Hypothesis Testing Relative Entropy
7.9.1 Connection to Quantum Relative Entropy
7.9.2 Connections to Quantum Rényi Relative Entropies
7.10 Quantum Stein’s Lemma
7.10.1 Error and Strong Converse Exponents
7.11 Information Measures for Quantum Channels
7.11.1 Simplified Formulas for Rényi Information Measures
7.11.2 Remarks on Defining Channel Quantities from State Quantities
7.12 Summary
7.13 Bibliographic Notes

8 Information Measures for Quantum Channels

9 Entanglement Measures
9.1 Definition and Basic Properties
9.1.1 Examples
9.2 Generalized Divergence of Entanglement
9.2.1 Cone Program Formulations
9.3 Generalized Rains Divergence
9.3.1 Semi-Definite Program Formulations
9.4 Squashed Entanglement
9.5 Summary
9.6 Bibliographic Notes
Appendix 9.A Semi-Definite Programs for Negativity

10 Entanglement Measures for Quantum Channels
10.1 Definition and Basic Properties
10.2 Amortized Entanglement
10.2.1 Amortized Entanglement and Teleportation Simulation
10.3 Generalized Divergence of Entanglement
10.4 Generalized Rains Divergence
10.5 Squashed Entanglement
10.6 Amortization Collapses
10.6.1 Max-Relative Entropy of Entanglement
10.6.2 Max-Rains Information
10.6.3 Squashed Entanglement
10.7 Summary
10.8 Bibliographic Notes
10.9 Problems
Appendix 10.A The 𝛼 → 1 and 𝛼 → ∞ Limits of the Sandwiched Rényi Entanglement Measures

II Quantum Communication Protocols

11 Entanglement-Assisted Classical Communication
11.1 One-Shot Setting
11.1.1 Protocol Over a Useless Channel
11.1.2 Upper Bound on the Number of Transmitted Bits
11.1.3 Lower Bound on the Number of Transmitted Bits via Position-Based Coding and Sequential Decoding
11.2 Entanglement-Assisted Classical Capacity of a Quantum Channel
11.2.1 Proof of Achievability
11.2.2 Additivity of the Sandwiched Rényi Mutual Information of a Channel
11.2.3 Proof of the Strong Converse
11.2.4 Proof of the Weak Converse
11.3 Examples
11.3.1 Covariant Channels
11.3.2 Generalized Amplitude Damping Channel
11.4 Summary
11.5 Bibliographic Notes
Appendix 11.A Proof of Theorem 11.7
Appendix 11.B The 𝛼 → 1 Limit of the Sandwiched Rényi Mutual Information of a Channel
Appendix 11.C Achievability from a Different Point of View
Appendix 11.D Proof of Lemma 11.20
Appendix 11.E Alternate Expression for the 1 → 𝛼 CB Norm
Appendix 11.F Proof of the Multiplicativity of the 1 → 𝛼 CB Norm
Appendix 11.G The Strong Converse from a Different Point of View

12 Classical Communication
12.1 One-Shot Setting
12.1.1 Protocol Over a Useless Channel
12.1.2 Upper Bound on the Number of Transmitted Bits
12.1.3 Lower Bound on the Number of Transmitted Bits
12.2 Classical Capacity of a Quantum Channel
12.2.1 Proof of Achievability
12.2.2 Proof of the Weak Converse
12.2.3 The Additivity Question
12.2.4 Proof of the Strong Converse for Entanglement-Breaking Channels
12.2.5 General Upper Bounds on the Strong Converse Classical Capacity
12.3 Examples
12.3.1 Covariant Channels
12.3.2 Amplitude Damping Channel
12.3.3 Hadamard Channels
12.4 Summary
12.5 Bibliographic Notes
Appendix 12.A The 𝛼 → 1 Limit of the Sandwiched Rényi Υ-Information of a Channel
Appendix 12.B Proof of the Additivity of 𝐶_𝛽(N)

13 Entanglement Distillation
13.1 One-Shot Setting
13.1.1 Upper Bounds on the Number of Ebits
13.1.2 Lower Bound on the Number of Ebits via Decoupling
13.2 Distillable Entanglement of a Quantum State
13.2.1 Proof of Achievability
13.2.2 Proof of the Weak Converse
13.2.3 Rains Relative Entropy Strong Converse Upper Bound
13.2.4 Squashed Entanglement Weak Converse Upper Bound
13.2.5 One-Way Entanglement Distillation
13.3 Examples
13.3.1 Pure States
13.3.2 Degradable and Anti-Degradable States
13.4 Summary
13.5 Bibliographic Notes
Appendix 13.A One-Shot Decoupling and Proof of Theorem 13.11

14 Quantum Communication
14.1 One-Shot Setting
14.1.1 Protocol for a Useless Channel
14.1.2 Upper Bound on the Number of Transmitted Qubits
14.1.3 Lower Bound on the Number of Transmitted Qubits via Entanglement Distillation
14.2 Quantum Capacity of a Quantum Channel
14.2.1 Proof of Achievability
14.2.2 Proof of the Weak Converse
14.2.3 The Additivity Question
14.2.4 Rains Information Strong Converse Upper Bound
14.2.5 Squashed Entanglement Weak Converse Bound
14.3 Examples
14.3.1 Degradable Channels
14.3.2 Anti-Degradable Channels
14.3.3 Generalized Amplitude Damping Channel
14.4 Summary
14.5 Bibliographic Notes
Appendix 14.A Alternative Notions of Quantum Communication

15 Secret Key Distillation
15.1 One-Shot Setting
15.1.1 Tripartite Key States and Bipartite Private States
15.1.2 Equivalence of Tripartite Key Distillation and Bipartite Private State Distillation
15.1.3 Upper Bounds on the Number of Secret-Key Bits
15.1.4 Lower Bound on the Number of Secret-Key Bits via Position-Based Coding and Convex Splitting
15.2 Distillable Key of a Quantum State
15.2.1 Proof of Achievability
15.2.2 Proof of the Weak Converse
15.2.3 Relative Entropy of Entanglement Strong Converse Upper Bound
15.2.4 Squashed Entanglement Weak Converse Upper Bound
15.3 One-Way Secret Key Distillation
15.4 Examples
15.4.1 Pure States
15.4.2 Degradable and Anti-Degradable States
15.5 Summary
15.6 Bibliographic Notes
Appendix 15.A Proof of Smooth Convex Split Lemma
Appendix 15.B Relating Two Variants of Smooth-Max Mutual Information

16 Private Communication
16.1 One-Shot Setting
16.1.1 Private Communication and Quantum Communication
16.1.2 Secret-Key Transmission and Bipartite Private-State Transmission
16.1.3 Upper Bounds on the Number of Transmitted Private Bits
16.1.4 Lower Bound on the Number of Transmitted Private Bits via Position-Based Coding and Convex Splitting
16.2 Private Capacity of a Quantum Channel
16.2.1 Proof of Achievability
16.2.2 Proof of the Weak Converse
16.2.3 Relative Entropy of Entanglement Strong Converse Bound
16.2.4 Squashed Entanglement Weak Converse Bound
16.3 Examples
16.3.1 Degradable Channels
16.3.2 Anti-Degradable Channels
16.4 Summary
16.5 Bibliographic Notes

III Quantum Communication Protocols With Feedback Assistance

17 Quantum-Feedback-Assisted Communication
17.1 𝑛-Shot Quantum Feedback-Assisted Communication Protocols
17.1.1 Protocol over a Useless Channel
17.1.2 Upper Bound on the Number of Transmitted Bits
17.1.3 The Amortized Perspective
17.2 Quantum Feedback-Assisted Classical Capacity of a Quantum Channel
17.3 Bibliographic Notes

18 Classical-Feedback-Assisted Communication
18.1 𝑛-Shot Classical Feedback-Assisted Communication Protocols
18.2 Protocol over a Useless Channel
18.3 Upper Bounds on the Number of Transmitted Bits
18.3.1 Upper Bounds for Entanglement-Breaking Channels
18.3.2 Entropy Upper Bound on the Number of Transmitted Bits
18.3.3 Geometric Υ-Information Upper Bound on the Number of Transmitted Bits
18.4 Classical Capacity of a Quantum Channel Assisted by Classical Feedback
18.5 Examples
18.6 Summary
18.7 Bibliographic Notes

19 LOCC-Assisted Quantum Communication
19.1 𝑛-Shot LOCC-Assisted Quantum Communication Protocol
19.1.1 Lower Bound on the Number of Transmitted Qubits
19.1.2 Amortized Entanglement as a General Upper Bound for LOCC-Assisted Quantum Communication Protocols
19.1.3 Squashed Entanglement Upper Bound on the Number of Transmitted Qubits
19.2 𝑛-Shot PPT-Assisted Quantum Communication Protocol
19.2.1 Rényi–Rains Information Upper Bounds on the Number of Transmitted Qubits
19.3 LOCC- and PPT-Assisted Quantum Capacities of Quantum Channels
19.4 Examples
19.5 Bibliographic Notes

20 Secret Key Agreement
20.1 𝑛-Shot Secret-Key-Agreement Protocol
20.1.1 Equivalence between Secret Key Agreement and LOPC-Assisted Private Communication
20.2 Equivalence between Secret Key Agreement and LOCC-Assisted Private-State Distillation
20.2.1 The Purified Protocol
20.2.2 LOCC-Assisted Bipartite Private-State Distillation
20.2.3 Relation between Secret Key Agreement and LOCC-Assisted Quantum Communication
20.2.4 𝑛-Shot Secret-Key-Agreement Protocol Assisted by Public Separable Channels
20.2.5 Amortized Entanglement Bound for Secret-Key-Agreement Protocols
20.3 Squashed Entanglement Upper Bound on the Number of Transmitted Private Bits
20.3.1 Squashed Entanglement and Approximate Private States
20.3.2 Squashed Entanglement Upper Bound
20.4 Relative Entropy of Entanglement Upper Bounds on the Number of Transmitted Private Bits
20.5 Secret-Key-Agreement Capacities of Quantum Channels
20.6 Examples
20.7 Bibliographic Notes

Summary

Appendix A Analyzing General Communication Scenarios

Chapter 1

Introduction
[IN PROGRESS]

Part I

Preliminaries

Before starting our journey through quantum communication protocols, it
is necessary for us to learn about and understand the various mathematical
and physical concepts involved in their construction and analysis. To this
end, we begin in Chapter 2 by providing an overview of the mathematics
required for understanding quantum communication protocols, and
quantum information more broadly. Then, in Chapters 3–6, we study
the basic axioms of quantum mechanics, including quantum states and
measurements (Chapter 3); followed by quantum channels, with the general
theory and many examples (Chapter 4); followed by fundamental quantum
information processing tasks, such as teleportation, super-dense coding,
and hypothesis testing (Chapter 5); and then distinguishability measures
for states and channels, such as fidelity, trace distance, and diamond
distance (Chapter 6). Entropies and entanglement measures are crucial in
quantifying the performance of quantum communication protocols, but
they are also interesting in their own right, and they have applications
in other areas of mathematical physics. In Chapters 7–9, we study these
quantities in detail.
Chapter 2

Mathematical Tools
In this chapter, we learn about the various mathematical concepts required for the
analysis of quantum communication protocols. We mostly provide a summary of
the main definitions and results needed in later chapters, and we omit several of the
proofs. For further details on the concepts presented here, as well as for proofs not
explicitly given here, please consult the Bibliographic Notes (Section 2.6) at the
end of the chapter.
Linear algebra forms the core mathematical foundation of quantum information
theory for finite-dimensional quantum systems, and thus it is worthwhile for us
to start by reviewing the basics of linear algebra, with an emphasis on linear
operators. We then proceed to give a summary of several relevant definitions
and results in real and convex analysis, probability theory, and semi-definite
programming. Concepts from real analysis play an important role in quantum
information theory. Indeed, as we discover later, the capacity of a quantum channel
is defined as a limit, which is a core notion in real analysis. Convexity plays a
prominent role as well. Not only is the set of quantum states a convex set, but
also the operator Jensen inequality, a foundational statement about operator convex
functions, is a fundamental inequality that leads to various quantum data-processing
inequalities. The latter data-processing principle is one of the central tenets of
quantum information that allows for placing limitations on the communication
capacities of quantum channels. Probability theory is essential as well, due to the
probabilistic nature of quantum mechanics and the inevitable and unpredictable
errors that occur when communicating information over quantum channels. Finally,
semi-definite programming is a remarkably useful tool, not only as an analytical tool
but also for numerically calculating relevant quantities of interest. Semi-definite
programming has also played a pivotal role in many of the substantive advances that
have taken place in quantum information theory during the past several decades,
and so it has become one of the standard tools in the quantum information theorist’s
toolkit.

2.1 Finite-Dimensional Hilbert Spaces

The primary mathematical object in quantum theory is the Hilbert space. We
consider only finite-dimensional Hilbert spaces, denoted by H, throughout this
book, and we use dim(H) to denote the dimension of H. Although we consider
finite-dimensional spaces exclusively in this book, we note here that many of the
statements and claims extend directly to the case of separable, infinite-dimensional
Hilbert spaces, especially for operationally-defined tasks and information quantities.
However, we do not delve into these details.
A 𝑑-dimensional Hilbert space (1 ≤ 𝑑 < ∞) is defined to be a complex vector
space equipped with an inner product.¹ We use the notation |𝜓⟩ to denote a vector
in H. An inner product is a function ⟨·|·⟩ : H × H → ℂ that satisfies the following
properties:
• Non-negativity: ⟨𝜓|𝜓⟩ ≥ 0 for all |𝜓⟩ ∈ H, and ⟨𝜓|𝜓⟩ = 0 if and only if
|𝜓⟩ = 0.
• Conjugate bilinearity: For all |𝜓1⟩, |𝜓2⟩, |𝜙1⟩, |𝜙2⟩ ∈ H and 𝛼1, 𝛽1, 𝛼2, 𝛽2 ∈ ℂ,

\begin{align}
\langle \alpha_1 \psi_1 + \beta_1 \phi_1 | \alpha_2 \psi_2 + \beta_2 \phi_2 \rangle
  &= \overline{\alpha_1} \alpha_2 \langle \psi_1 | \psi_2 \rangle
   + \overline{\alpha_1} \beta_2 \langle \psi_1 | \phi_2 \rangle \notag \\
  &\quad + \overline{\beta_1} \alpha_2 \langle \phi_1 | \psi_2 \rangle
   + \overline{\beta_1} \beta_2 \langle \phi_1 | \phi_2 \rangle. \tag{2.1.1}
\end{align}

• Conjugate symmetry: $\langle \psi | \phi \rangle = \overline{\langle \phi | \psi \rangle}$ for all |𝜓⟩, |𝜙⟩ ∈ H.

In the above, $\overline{\alpha}$ denotes the complex conjugate of 𝛼 ∈ ℂ. Throughout this book,
the term “Hilbert space” always refers to a finite-dimensional Hilbert space.
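
The three properties above can be checked numerically. The following is a minimal sketch, assuming the Euclidean inner product on ℂ² with vectors stored as Python lists of complex numbers; the helper names `inner` and `lin_comb` and all sample vectors and scalars are hypothetical choices made only for illustration.

```python
def inner(psi, phi):
    """Euclidean inner product <psi|phi>; conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(psi, phi))

def lin_comb(a, u, b, v):
    """Return the vector a*u + b*v."""
    return [a * x + b * y for x, y in zip(u, v)]

# Illustrative vectors in C^2 and scalars in C (assumptions, not from the text).
psi1, phi1 = [1 + 2j, 3j], [2 - 1j, 1 + 1j]
psi2, phi2 = [0.5j, -1 + 0j], [1 + 0j, 2 - 2j]
a1, b1, a2, b2 = 2 - 1j, 0.5j, -1 + 3j, 1 + 1j

# Conjugate bilinearity, as in (2.1.1): scalars in the first slot get conjugated.
lhs = inner(lin_comb(a1, psi1, b1, phi1), lin_comb(a2, psi2, b2, phi2))
rhs = (a1.conjugate() * a2 * inner(psi1, psi2)
       + a1.conjugate() * b2 * inner(psi1, phi2)
       + b1.conjugate() * a2 * inner(phi1, psi2)
       + b1.conjugate() * b2 * inner(phi1, phi2))
bilinear_ok = abs(lhs - rhs) < 1e-12

# Conjugate symmetry: <psi|phi> equals the complex conjugate of <phi|psi>.
symmetric_ok = abs(inner(psi1, phi1) - inner(phi1, psi1).conjugate()) < 1e-12

# Non-negativity: <psi|psi> is real and strictly positive for a nonzero vector.
self_overlap = inner(psi1, psi1)
nonneg_ok = abs(self_overlap.imag) < 1e-12 and self_overlap.real > 0
```

Checking a handful of sample vectors does not prove the properties, of course; it only illustrates which argument of the inner product carries the complex conjugate.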
All 𝑑-dimensional Hilbert spaces are isomorphic to the vector space ℂ^𝑑 equipped
with the Euclidean inner product. By two Hilbert spaces H and H′ being
isomorphic, we mean that there is a bijective linear mapping 𝑈 : H → H′ such that

⟨𝑈𝜑|𝑈𝜓⟩ = ⟨𝜑|𝜓⟩, (2.1.2)

for all |𝜑⟩, |𝜓⟩ ∈ H, and 𝑈 is called an isomorphism. For the finite-dimensional case
of interest for us, 𝑈 is a unitary operator (discussed in more detail in Section 2.2.6).

¹ This definition suffices in the finite-dimensional case. More generally, a Hilbert space is a
complete inner product space; please consult the Bibliographic Notes (Section 2.6).
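
To make condition (2.1.2) concrete, the following is a small sketch that applies one particular 2 × 2 unitary matrix to two vectors and compares the inner products before and after. The matrix `U`, the helper `apply`, and the sample vectors are assumptions chosen for illustration; they do not come from the text.

```python
import math

def inner(psi, phi):
    """Euclidean inner product <psi|phi>; conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(psi, phi))

def apply(U, psi):
    """Matrix-vector product U|psi>, with U given as a list of rows."""
    return [sum(u * x for u, x in zip(row, psi)) for row in U]

s = 1 / math.sqrt(2)
# Hypothetical unitary: its rows (1, 1)/sqrt(2) and (i, -i)/sqrt(2) are orthonormal.
U = [[s, s],
     [1j * s, -1j * s]]

phi = [1 + 1j, 2 - 1j]
psi = [0.5j, 3 + 0j]

before = inner(phi, psi)                    # <phi|psi>
after = inner(apply(U, phi), apply(U, psi)) # <U phi|U psi>
preserved = abs(before - after) < 1e-12
```

Replacing `U` with a matrix that is not unitary, for example by rescaling one row, would generally make `preserved` evaluate to `False`.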
Note that ℂ^𝑑 is the vector space of 𝑑-dimensional column vectors with elements
in ℂ. We let $\{|i\rangle\}_{i=0}^{d-1}$ denote an orthonormal basis, called the standard basis or
computational basis, for the Hilbert space with respect to the Euclidean inner
product. The vector |𝑖⟩ is defined to be a column vector with its (𝑖 + 1)th entry equal
to one and all others equal to zero, so that

\[
|0\rangle = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad
|1\rangle = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \dots, \quad
|d-1\rangle = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}. \tag{2.1.3}
\]

The inner product ⟨𝑖|𝑗⟩ evaluates to ⟨𝑖|𝑗⟩ = 𝛿_{𝑖,𝑗} for all 𝑖, 𝑗 ∈ {0, . . . , 𝑑 − 1}, where
the Kronecker delta function is defined as

\[
\delta_{i,j} := \begin{cases} 0 & \text{if } i \neq j, \\ 1 & \text{if } i = j. \end{cases} \tag{2.1.4}
\]
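More generally, for two vectors $|\psi\rangle = \sum_{i=0}^{d-1}\alpha_i|i\rangle$ and $|\phi\rangle = \sum_{i=0}^{d-1}\beta_i|i\rangle$, with
$\alpha_i = \langle i|\psi\rangle \in \mathbb{C}$ and $\beta_i = \langle i|\phi\rangle \in \mathbb{C}$ being the respective components of |𝜓⟩ and |𝜙⟩
in the standard basis, the inner product ⟨𝜓|𝜙⟩ is defined as
\[
\langle\psi|\phi\rangle \coloneqq \sum_{i=0}^{d-1} \overline{\alpha_i}\,\beta_i. \tag{2.1.5}
\]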

The Euclidean norm, denoted by $\| |\psi\rangle \|_2$, of a vector |𝜓⟩ ∈ H is the norm induced
by the inner product, i.e.,
\[
\| |\psi\rangle \|_2 \coloneqq \sqrt{\langle\psi|\psi\rangle}. \tag{2.1.6}
\]

The Cauchy–Schwarz inequality is the following statement: for two vectors
|𝜓⟩, |𝜙⟩ ∈ H, the following inequality holds:

\[
|\langle\psi|\phi\rangle|^2 \leq \langle\psi|\psi\rangle \cdot \langle\phi|\phi\rangle = \| |\psi\rangle \|_2^2 \cdot \| |\phi\rangle \|_2^2, \tag{2.1.7}
\]

with equality if and only if |𝜓⟩ and |𝜙⟩ are linearly dependent (for |𝜓⟩ ≠ 0, this means |𝜙⟩ = 𝛼|𝜓⟩ for some 𝛼 ∈ C).
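The definitions in (2.1.5)–(2.1.7) can be checked numerically. The following is a minimal sketch in plain Python, with vectors stored as lists of their standard-basis components; the function names `inner` and `norm` and the example vectors are illustrative choices, not notation from the text.

```python
import math

def inner(psi, phi):
    """Euclidean inner product <psi|phi> of (2.1.5): conjugate-linear in psi."""
    return sum(complex(a).conjugate() * b for a, b in zip(psi, phi))

def norm(psi):
    """Euclidean norm of (2.1.6), induced by the inner product."""
    return math.sqrt(inner(psi, psi).real)

# Two illustrative vectors in C^3 (components in the standard basis).
psi = [1 + 1j, 2, 0]
phi = [1j, 1, 3 - 1j]

overlap = inner(psi, phi)                  # <psi|phi>
lhs = abs(overlap) ** 2                    # |<psi|phi>|^2
rhs = norm(psi) ** 2 * norm(phi) ** 2      # ||psi||^2 * ||phi||^2
print(overlap, lhs <= rhs)                 # Cauchy-Schwarz inequality (2.1.7)

# Equality case: a scalar multiple of psi saturates the inequality.
scaled = [(2 - 1j) * a for a in psi]
print(math.isclose(abs(inner(psi, scaled)) ** 2,
                   norm(psi) ** 2 * norm(scaled) ** 2))
```

Conjugate symmetry can be observed in the same way by comparing `inner(phi, psi)` with `inner(psi, phi).conjugate()`.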

Given a vector |𝜓⟩ ∈ H, its dual vector, denoted by ⟨𝜓|, is defined to be a
linear functional from H to C such that ⟨𝜓|(|𝜙⟩) = ⟨𝜓|𝜙⟩ for all |𝜙⟩ ∈ H. If
$|\psi\rangle = \sum_{i=0}^{d-1}\alpha_i|i\rangle$, then $\langle\psi| = \sum_{i=0}^{d-1}\overline{\alpha_i}\langle i|$, where ⟨𝑖| can be interpreted, based on
(2.1.3), as a row vector with its (𝑖 + 1)th entry equal to one and all other entries
equal to zero; i.e., $\langle i| = (|i\rangle)^{\mathsf{T}}$, where $(\cdot)^{\mathsf{T}}$ denotes the matrix transpose.
The tensor product of vectors, operators, and Hilbert spaces plays an important
role in quantum theory. For example, it is used to describe the state of multiple
quantum systems. For two Hilbert spaces $\mathcal{H}_A$ and $\mathcal{H}_B$ with dimensions $d_A$ and $d_B$,
respectively, along with associated orthonormal bases $\{|i\rangle_A\}_{i=0}^{d_A-1}$ and $\{|j\rangle_B\}_{j=0}^{d_B-1}$,
the tensor product vector $|i\rangle_A \otimes |j\rangle_B$ is a vector in a $(d_A d_B)$-dimensional Hilbert
space with a one in its $(i \cdot d_B + j + 1)$th entry and zeros elsewhere. Notice here that
we have employed the labels 𝐴 and 𝐵 in order to keep track of the Hilbert spaces of
the vectors in the tensor product. Later on, when we move to the study of quantum
information, we will see that the label 𝐴 can be associated to a quantum system in
possession of “Alice” and the label 𝐵 can be associated to a quantum system in
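possession of “Bob.” As an example of the tensor-product vector $|i\rangle_A \otimes |j\rangle_B$, if
$d_A = 2$, $d_B = 3$, $i = 0$, and $j = 2$, then

\[
|0\rangle_A \otimes |2\rangle_B
= \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 \cdot \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \\ 0 \cdot \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \tag{2.1.8}
\]

More generally, for vectors $|\psi\rangle_A = \sum_{i=0}^{d_A-1}\alpha_i|i\rangle_A$ and $|\phi\rangle_B = \sum_{j=0}^{d_B-1}\beta_j|j\rangle_B$, the tensor product is given by

\begin{align}
|\psi\rangle_A \otimes |\phi\rangle_B
&= \left(\sum_{i=0}^{d_A-1}\alpha_i|i\rangle_A\right) \otimes \left(\sum_{j=0}^{d_B-1}\beta_j|j\rangle_B\right) \tag{2.1.9}\\
&= \sum_{i=0}^{d_A-1}\sum_{j=0}^{d_B-1} \alpha_i\beta_j\, |i\rangle_A \otimes |j\rangle_B. \tag{2.1.10}
\end{align}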

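As an example with $d_A = 2$ and $d_B = 3$, we find that $|\psi\rangle_A \otimes |\phi\rangle_B$ can be calculated
by a generalization of the “stack-and-multiply” procedure used in (2.1.8):

\[
|\psi\rangle_A \otimes |\phi\rangle_B
= \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} \otimes \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}
= \begin{pmatrix} \alpha_0 \cdot \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix} \\ \alpha_1 \cdot \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix} \end{pmatrix}
= \begin{pmatrix} \alpha_0\beta_0 \\ \alpha_0\beta_1 \\ \alpha_0\beta_2 \\ \alpha_1\beta_0 \\ \alpha_1\beta_1 \\ \alpha_1\beta_2 \end{pmatrix}. \tag{2.1.11}
\]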
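The “stack-and-multiply” procedure translates directly into code. The sketch below assumes vectors are stored as flat Python lists of standard-basis components; the helper names `kron_vec` and `basis` are illustrative and do not come from the text.

```python
def kron_vec(psi, phi):
    """Tensor product of two column vectors: each entry of psi multiplies a
    full copy of phi, and the copies are stacked in order."""
    return [a * b for a in psi for b in phi]

def basis(i, d):
    """Standard basis vector |i> of C^d (zero-based label i)."""
    return [1 if k == i else 0 for k in range(d)]

d_A, d_B = 2, 3

# |0>_A ⊗ |2>_B: the single one sits at zero-based index i*d_B + j = 2,
# i.e. in the (i*d_B + j + 1)th = 3rd entry.
e = kron_vec(basis(0, d_A), basis(2, d_B))
print(e)                      # [0, 0, 1, 0, 0, 0]

# A general product vector, following the pattern of (2.1.11).
alpha = [2, 3]
beta = [5, 7, 11]
print(kron_vec(alpha, beta))  # [10, 14, 22, 15, 21, 33]
```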

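The tensor-product Hilbert space $\mathcal{H}_A \otimes \mathcal{H}_B$ is defined to be the Hilbert space
spanned by the vectors $|i\rangle_A \otimes |j\rangle_B$ defined above:

\[
\mathcal{H}_A \otimes \mathcal{H}_B \coloneqq \operatorname{span}\{|i\rangle_A \otimes |j\rangle_B : 0 \leq i \leq d_A - 1,\ 0 \leq j \leq d_B - 1\}. \tag{2.1.12}
\]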

The inner product on H 𝐴 ⊗ H𝐵 is given by

(⟨𝑖| 𝐴 ⊗ ⟨ 𝑗 | 𝐵 )(|𝑖′⟩ 𝐴 ⊗ | 𝑗 ′⟩𝐵 ) = ⟨𝑖|𝑖′⟩⟨ 𝑗 | 𝑗 ′⟩ = 𝛿𝑖,𝑖 ′ 𝛿 𝑗, 𝑗 ′ (2.1.13)

for all 𝑖, 𝑖′, 𝑗, 𝑗 ′ satisfying 0 ≤ 𝑖, 𝑖′ ≤ 𝑑 𝐴 − 1 and 0 ≤ 𝑗, 𝑗 ′ ≤ 𝑑 𝐵 − 1. The Hilbert

space H 𝐴 ⊗ H𝐵 consequently has dimension 𝑑 𝐴 𝑑 𝐵 . We often use the notation
H 𝐴𝐵 ≡ H 𝐴 ⊗ H𝐵 , as well as the abbreviation |𝑖, 𝑗⟩ 𝐴𝐵 ≡ |𝑖⟩ 𝐴 ⊗ | 𝑗⟩𝐵 . We often also
use the notation $\mathcal{H}_{A^n} \equiv \mathcal{H}_A^{\otimes n}$ to refer to the 𝑛-fold tensor product of $\mathcal{H}_A$.
The direct sum of H 𝐴 and H𝐵 , denoted by H 𝐴 ⊕ H𝐵 , is defined to be the Hilbert
space of vectors of the form |𝜓⟩ 𝐴 ⊕ |𝜙⟩𝐵 , with |𝜓⟩ 𝐴 ∈ H 𝐴 and |𝜙⟩𝐵 ∈ H𝐵 , where

\[
|\psi\rangle_A \oplus |\phi\rangle_B \coloneqq \begin{pmatrix} |\psi\rangle_A \\ |\phi\rangle_B \end{pmatrix}. \tag{2.1.14}
\]

In other words, H 𝐴 ⊕ H𝐵 can be viewed as the Hilbert space of column vectors

formed by stacking elements of the constituent Hilbert spaces. Observe that if H 𝐴
has the same dimension as H𝐵 , then we can write

|𝜓⟩ 𝐴 ⊕ |𝜙⟩𝐵 = |0⟩ ⊗ |𝜓⟩ 𝐴 + |1⟩ ⊗ |𝜙⟩𝐵 , (2.1.15)

where {|0⟩, |1⟩} is the standard basis for a two-dimensional Hilbert space.

Exercise 2.1
Verify (2.1.15).

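If $\{|i\rangle_A\}_{i=0}^{d_A-1}$ and $\{|j\rangle_B\}_{j=0}^{d_B-1}$ are orthonormal bases for $\mathcal{H}_A$ and $\mathcal{H}_B$, respectively,
then
\[
\left\{ \begin{pmatrix} |i\rangle_A \\ 0 \end{pmatrix} : 0 \leq i \leq d_A - 1 \right\} \cup \left\{ \begin{pmatrix} 0 \\ |j\rangle_B \end{pmatrix} : 0 \leq j \leq d_B - 1 \right\} \tag{2.1.16}
\]
is an orthonormal basis for $\mathcal{H}_A \oplus \mathcal{H}_B$ under the inner product

\[
(\langle i|_A \oplus \langle j|_B)(|i'\rangle_A \oplus |j'\rangle_B) = \langle i|i'\rangle + \langle j|j'\rangle = \delta_{i,i'} + \delta_{j,j'} \tag{2.1.17}
\]

for all $0 \leq i, i' \leq d_A - 1$ and $0 \leq j, j' \leq d_B - 1$. Consequently, $\mathcal{H}_A \oplus \mathcal{H}_B$ has
dimension $d_A + d_B$. One of the simplest examples of a direct-sum Hilbert space is
$\mathbb{C} \oplus \mathbb{C} = \mathbb{C}^2$. More generally, the 𝑑-fold direct sum $\mathbb{C}^{\oplus d}$ is equal to $\mathbb{C}^d$.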
If H is a 𝑑-dimensional Hilbert space, then the 𝑘-fold direct sum $\mathcal{H}^{\oplus k}$ is a
$kd$-dimensional Hilbert space. Consequently, it is isomorphic to $\mathbb{C}^k \otimes \mathcal{H}$, and
the isomorphism is a generalization of the simple example presented in (2.1.15).
Indeed, let $\mathcal{H}_X \equiv \mathbb{C}^k$, with orthonormal basis $\{|i\rangle_X\}_{i=0}^{k-1}$, and let $\mathcal{H}_A \equiv \mathcal{H}$, with
orthonormal basis $\{|j\rangle_A\}_{j=0}^{d-1}$. We then have the correspondence

\[
|i\rangle_X \otimes |j\rangle_A \leftrightarrow \begin{pmatrix} 0 \\ \vdots \\ 0 \\ |j\rangle_A \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \tag{2.1.18}
\]
holding for all $0 \leq i \leq k - 1$ and all $0 \leq j \leq d - 1$, where on the right-hand side
there is a one in the $(i \cdot d + j + 1)$th entry of the column vector and zeros elsewhere.
Then, for an element $|\psi_0\rangle_A \oplus |\psi_1\rangle_A \oplus \cdots \oplus |\psi_{k-1}\rangle_A \in \mathcal{H}_A^{\oplus k}$, we have

\[
\begin{pmatrix} |\psi_0\rangle_A \\ |\psi_1\rangle_A \\ \vdots \\ |\psi_{k-1}\rangle_A \end{pmatrix} \leftrightarrow \sum_{i=0}^{k-1} |i\rangle_X \otimes |\psi_i\rangle_A. \tag{2.1.19}
\]
The isomorphism between $\mathcal{H}^{\oplus k}$ and $\mathbb{C}^k \otimes \mathcal{H}$ given by (2.1.18) and (2.1.19) is
relevant in the context of superpositions of quantum states and entanglement.
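The correspondence in (2.1.19) can be illustrated with a small numerical check. The sketch below, in plain Python, stacks 𝑘 = 3 vectors of dimension 𝑑 = 2 and compares the result with the sum of tensor products on the right-hand side of (2.1.19); the helper names and the example blocks are illustrative assumptions.

```python
def kron_vec(u, v):
    """Tensor product of two column vectors stored as flat lists."""
    return [a * b for a in u for b in v]

def basis(i, d):
    """Standard basis vector |i> of C^d (zero-based label i)."""
    return [1 if n == i else 0 for n in range(d)]

def direct_sum(blocks):
    """Direct sum: stack the constituent column vectors on top of each other."""
    return [x for block in blocks for x in block]

k, d = 3, 2
blocks = [[1, 2j], [3, 4], [5j, 6]]   # |psi_0>, |psi_1>, |psi_2> in C^2

stacked = direct_sum(blocks)          # left-hand side of (2.1.19)

# Right-hand side of (2.1.19): sum_i |i>_X ⊗ |psi_i>_A.
total = [0] * (k * d)
for i, psi_i in enumerate(blocks):
    term = kron_vec(basis(i, k), psi_i)
    total = [s + t for s, t in zip(total, term)]

print(total == stacked)
```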


2.2 Linear Operators

Linear operators are relevant in quantum theory for describing states of quantum
systems, as well as physical evolutions of the states, including measurements and
unitary evolutions as special cases of general physical evolutions. Given a Hilbert
space H 𝐴 with dimension 𝑑 𝐴 and a Hilbert space H𝐵 with dimension 𝑑 𝐵 , a linear
operator 𝑋 : H 𝐴 → H𝐵 is defined to be a function such that

𝑋 (𝛼|𝜓⟩ 𝐴 + 𝛽|𝜙⟩ 𝐴 ) = 𝛼𝑋 |𝜓⟩ 𝐴 + 𝛽𝑋 |𝜙⟩ 𝐴 (2.2.1)

for all 𝛼, 𝛽 ∈ C and |𝜓⟩ 𝐴 , |𝜙⟩ 𝐴 ∈ H 𝐴 . For clarity, we sometimes write 𝑋 𝐴→𝐵 to
explicitly indicate the input and output Hilbert spaces of the linear operator 𝑋.
We use 1 to denote the identity operator, which is defined as the unique linear
operator such that 1 |𝜓⟩ = |𝜓⟩ for every vector |𝜓⟩. For clarity, when needed, we
write 1𝑑 to indicate the identity operator acting on a 𝑑-dimensional Hilbert space.

Exercise 2.2
Given an orthonormal basis $\{|e_k\rangle\}_{k=1}^{d}$ for a 𝑑-dimensional Hilbert space, prove
that
\[
\mathbb{1}_d = \sum_{k=1}^{d} |e_k\rangle\langle e_k|. \tag{2.2.2}
\]

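We denote the set of all linear operators from $\mathcal{H}_A$ to $\mathcal{H}_B$ by $\mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$. If
$\mathcal{H}_A = \mathcal{H}_B$, then $\mathrm{L}(\mathcal{H}_A) \coloneqq \mathrm{L}(\mathcal{H}_A, \mathcal{H}_A)$, and we sometimes indicate the input
Hilbert space $\mathcal{H}_A$ of $X \in \mathrm{L}(\mathcal{H}_A)$ by writing $X_A$. In particular, we often write $X_{AB}$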
when referring to linear operators in L(H 𝐴 ⊗ H𝐵 ), i.e., when referring to linear
operators acting on a tensor-product Hilbert space.
The set L(H 𝐴 , H𝐵 ) is itself a 𝑑 𝐴 𝑑 𝐵 -dimensional vector space. The standard
basis for L(H 𝐴 , H𝐵 ) is defined to be

{|𝑖⟩𝐵 ⟨ 𝑗 | 𝐴 : 0 ≤ 𝑖 ≤ 𝑑 𝐵 − 1, 0 ≤ 𝑗 ≤ 𝑑 𝐴 − 1}. (2.2.3)

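By applying (2.1.3), we see that the operator $|i\rangle_B\langle j|_A$ has a matrix representation
as a $d_B \times d_A$ matrix with the $(i + 1, j + 1)$th entry equal to one and all other entries
equal to zero, i.e.,

\begin{align}
|0\rangle_B\langle 0|_A &= \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}, \quad
|0\rangle_B\langle 1|_A = \begin{pmatrix} 0 & 1 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}, \quad \dots, \notag\\
|d_B - 1\rangle_B\langle d_A - 1|_A &= \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}. \tag{2.2.4}
\end{align}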
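Using this basis, we can write a linear operator $X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$ as
\[
X_{A\to B} = \sum_{i=0}^{d_B-1}\sum_{j=0}^{d_A-1} X_{i,j}\, |i\rangle_B\langle j|_A, \tag{2.2.5}
\]

where $X_{i,j} \coloneqq \langle i|_B X |j\rangle_A$. This follows because

\begin{align}
X_{A\to B} &= \mathbb{1}_B\, X_{A\to B}\, \mathbb{1}_A \tag{2.2.6}\\
&= \left(\sum_{i=0}^{d_B-1} |i\rangle\langle i|_B\right) X_{A\to B} \left(\sum_{j=0}^{d_A-1} |j\rangle\langle j|_A\right) \tag{2.2.7}\\
&= \sum_{i=0}^{d_B-1}\sum_{j=0}^{d_A-1} \langle i|_B X_{A\to B} |j\rangle_A\, |i\rangle_B\langle j|_A \tag{2.2.8}\\
&= \sum_{i=0}^{d_B-1}\sum_{j=0}^{d_A-1} X_{i,j}\, |i\rangle_B\langle j|_A. \tag{2.2.9}
\end{align}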

We can thus interpret a linear operator $X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$ as a $d_B \times d_A$ matrix with
the $(i + 1, j + 1)$th element equal to $X_{i,j} = \langle i|_B X |j\rangle_A$, where $0 \leq i \leq d_B - 1$ and
$0 \leq j \leq d_A - 1$. For example, if $d_A = 2$ and $d_B = 3$, then

\[
X = \begin{pmatrix} X_{0,0} & X_{0,1} \\ X_{1,0} & X_{1,1} \\ X_{2,0} & X_{2,1} \end{pmatrix}. \tag{2.2.10}
\]


Exercise 2.3
Show that every linear operator $X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$, expressed as in (2.2.5), can
be written as
\[
X_{A\to B} = \sum_{i=0}^{d_B-1} |i\rangle_B\langle\psi_i|_A = \sum_{j=0}^{d_A-1} |\phi_j\rangle_B\langle j|_A, \tag{2.2.11}
\]
where $\{\langle\psi_i|_A\}_{i=0}^{d_B-1}$ and $\{|\phi_j\rangle_B\}_{j=0}^{d_A-1}$ are the rows and columns, respectively,
of 𝑋.

2.2.1 Tensor Product

Given two linear operators 𝑋 ∈ L(H 𝐴 , H𝐵 ) and 𝑌 ∈ L(H 𝐴′ , H𝐵′ ), their tensor
product 𝑋 ⊗ 𝑌 is a linear operator in L(H 𝐴 ⊗ H 𝐴′ , H𝐵 ⊗ H𝐵′ ) such that

(𝑋 ⊗ 𝑌 )(|𝜓⟩ 𝐴 ⊗ |𝜙⟩ 𝐴′ ) = 𝑋 |𝜓⟩ 𝐴 ⊗ 𝑌 |𝜙⟩ 𝐴′ (2.2.12)

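for all $|\psi\rangle_A \in \mathcal{H}_A$ and $|\phi\rangle_{A'} \in \mathcal{H}_{A'}$. The matrix representation of 𝑋 ⊗ 𝑌 is the
Kronecker product of the matrix representations of 𝑋 and 𝑌, which is a matrix
generalization of the “stack-and-multiply” procedure from (2.1.11). For example,
if $d_A = d_B = 2$ and $d_{A'} = d_{B'} = 3$, then

\begin{align}
X \otimes Y
&= \begin{pmatrix} X_{0,0} & X_{0,1} \\ X_{1,0} & X_{1,1} \end{pmatrix} \otimes \begin{pmatrix} Y_{0,0} & Y_{0,1} & Y_{0,2} \\ Y_{1,0} & Y_{1,1} & Y_{1,2} \\ Y_{2,0} & Y_{2,1} & Y_{2,2} \end{pmatrix} \tag{2.2.13}\\
&= \begin{pmatrix}
X_{0,0} \cdot \begin{pmatrix} Y_{0,0} & Y_{0,1} & Y_{0,2} \\ Y_{1,0} & Y_{1,1} & Y_{1,2} \\ Y_{2,0} & Y_{2,1} & Y_{2,2} \end{pmatrix} &
X_{0,1} \cdot \begin{pmatrix} Y_{0,0} & Y_{0,1} & Y_{0,2} \\ Y_{1,0} & Y_{1,1} & Y_{1,2} \\ Y_{2,0} & Y_{2,1} & Y_{2,2} \end{pmatrix} \\
X_{1,0} \cdot \begin{pmatrix} Y_{0,0} & Y_{0,1} & Y_{0,2} \\ Y_{1,0} & Y_{1,1} & Y_{1,2} \\ Y_{2,0} & Y_{2,1} & Y_{2,2} \end{pmatrix} &
X_{1,1} \cdot \begin{pmatrix} Y_{0,0} & Y_{0,1} & Y_{0,2} \\ Y_{1,0} & Y_{1,1} & Y_{1,2} \\ Y_{2,0} & Y_{2,1} & Y_{2,2} \end{pmatrix}
\end{pmatrix} \tag{2.2.14}\\
&= \begin{pmatrix}
X_{0,0}Y_{0,0} & X_{0,0}Y_{0,1} & X_{0,0}Y_{0,2} & X_{0,1}Y_{0,0} & X_{0,1}Y_{0,1} & X_{0,1}Y_{0,2} \\
X_{0,0}Y_{1,0} & X_{0,0}Y_{1,1} & X_{0,0}Y_{1,2} & X_{0,1}Y_{1,0} & X_{0,1}Y_{1,1} & X_{0,1}Y_{1,2} \\
X_{0,0}Y_{2,0} & X_{0,0}Y_{2,1} & X_{0,0}Y_{2,2} & X_{0,1}Y_{2,0} & X_{0,1}Y_{2,1} & X_{0,1}Y_{2,2} \\
X_{1,0}Y_{0,0} & X_{1,0}Y_{0,1} & X_{1,0}Y_{0,2} & X_{1,1}Y_{0,0} & X_{1,1}Y_{0,1} & X_{1,1}Y_{0,2} \\
X_{1,0}Y_{1,0} & X_{1,0}Y_{1,1} & X_{1,0}Y_{1,2} & X_{1,1}Y_{1,0} & X_{1,1}Y_{1,1} & X_{1,1}Y_{1,2} \\
X_{1,0}Y_{2,0} & X_{1,0}Y_{2,1} & X_{1,0}Y_{2,2} & X_{1,1}Y_{2,0} & X_{1,1}Y_{2,1} & X_{1,1}Y_{2,2}
\end{pmatrix}. \tag{2.2.15}
\end{align}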
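As a sanity check of the defining property (2.2.12), the Kronecker product can be implemented in a few lines. The sketch below uses plain Python with matrices stored as lists of rows; the helper names and the integer-valued examples are illustrative assumptions chosen so that the arithmetic is exact.

```python
def kron(X, Y):
    """Kronecker product: block (i, j) of the result is X[i][j] * Y."""
    return [[x * y for x in row_x for y in row_y]
            for row_x in X for row_y in Y]

def kron_vec(psi, phi):
    """Tensor product of two column vectors stored as flat lists."""
    return [a * b for a in psi for b in phi]

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

X = [[1, 2], [3, 4]]                     # acts on a 2-dimensional space
Y = [[0, 1, 0], [1, 0, 2], [1, 1, 1]]    # acts on a 3-dimensional space
psi = [1, -1]
phi = [2, 0, 1]

K = kron(X, Y)                           # 6 x 6 matrix, laid out as in (2.2.15)

# Defining property (2.2.12): (X ⊗ Y)(psi ⊗ phi) = X psi ⊗ Y phi.
lhs = matvec(K, kron_vec(psi, phi))
rhs = kron_vec(matvec(X, psi), matvec(Y, phi))
print(lhs, lhs == rhs)
```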

2.2.2 Image, Kernel, and Support

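The image of a linear operator $X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$, denoted by im(𝑋), is the set
defined as

\[
\operatorname{im}(X) \coloneqq \{|\phi\rangle_B \in \mathcal{H}_B : |\phi\rangle_B = X|\psi\rangle_A \text{ for some } |\psi\rangle_A \in \mathcal{H}_A\}. \tag{2.2.16}
\]

It is also known as the column space or range of 𝑋. The image of 𝑋 is a subspace
of $\mathcal{H}_B$. The rank of 𝑋, denoted by rank(𝑋), is defined² to be the dimension of
im(𝑋). Note that $\operatorname{rank}(X) \leq \min\{d_A, d_B\}$ for all $X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$.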
The kernel of a linear operator $X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$, denoted by ker(𝑋), is defined
to be the set of vectors in the input space $\mathcal{H}_A$ of 𝑋 for which the output is the zero
vector; i.e.,
\[
\ker(X) \coloneqq \{|\psi\rangle_A \in \mathcal{H}_A : X|\psi\rangle_A = 0\}. \tag{2.2.17}
\]
It is also known as the null space of 𝑋. The following dimension formula holds:

𝑑 𝐴 = rank(𝑋) + dim(ker(𝑋)), (2.2.18)

and it is known as the rank-nullity theorem (the quantity dim(ker(𝑋)) is called the
nullity of 𝑋).
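The support of a linear operator $X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$, denoted by supp(𝑋), is
defined to be the orthogonal complement of its kernel:

\[
\operatorname{supp}(X) \coloneqq \ker(X)^{\perp} \coloneqq \{|\psi\rangle \in \mathcal{H}_A : \langle\psi|\phi\rangle = 0 \ \ \forall\, |\phi\rangle \in \ker(X)\}. \tag{2.2.19}
\]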

It is also known as the row space or coimage of 𝑋.

See Figure 2.1 for a visual representation of the subspaces im(𝑋), ker(𝑋), and
supp(𝑋) corresponding to a linear operator 𝑋 ∈ L(H 𝐴 , H𝐵 ). We use the notions
of support and kernel extensively in Chapter 7, when proving properties of quantum
relative entropy and its variants, which are core distinguishability measures in
quantum information.
Figure 2.1: Visual representation of the subspaces im(𝑋), ker(𝑋), and supp(𝑋)
corresponding to a linear operator $X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$. Note that only the zero
vector is contained in both ker(𝑋) and supp(𝑋).

A linear operator $X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$ is called injective (or one-to-one) if, for
all $|\psi\rangle, |\phi\rangle \in \mathcal{H}_A$, $X|\psi\rangle = X|\phi\rangle$ implies $|\psi\rangle = |\phi\rangle$. A necessary and sufficient
condition for 𝑋 to be injective is that the kernel of 𝑋 contains only the zero vector (i.e.,
the column vector in which all of the elements are equal to zero), which implies
that dim(ker(𝑋)) = 0.

² The rank of a linear operator can also be equivalently defined as the number of its singular
values; please see Theorem 2.1.
A linear operator 𝑋 ∈ L(H 𝐴 , H𝐵 ) is called surjective (or onto) if, for all
|𝜙⟩ ∈ H𝐵 , there exists |𝜓⟩ ∈ H 𝐴 such that 𝑋 |𝜓⟩ = |𝜙⟩. A necessary and sufficient
condition for 𝑋 to be surjective is that rank(𝑋) = 𝑑 𝐵 .

Exercise 2.4
Prove that a linear operator 𝑋 ∈ L(H) with the same, finite-dimensional input
and output Hilbert space H is injective if and only if it is surjective. (Hint: use
the rank-nullity theorem in (2.2.18).)

A linear operator 𝑋 ∈ L(H) that is both injective and surjective is known as a
bijection. By definition, every bijection is invertible, meaning that there exists a
unique linear operator, denoted by $X^{-1}$, such that $X X^{-1} = X^{-1} X = \mathbb{1}$.


2.2.3 Trace

The trace of a linear operator 𝑋 ∈ L(H) acting on a 𝑑-dimensional Hilbert space H
is defined as
\[
\operatorname{Tr}[X] \coloneqq \sum_{i=0}^{d-1} \langle i|X|i\rangle, \tag{2.2.20}
\]
which can be interpreted as the sum of the diagonal elements of the matrix
corresponding to 𝑋 in the standard basis.

Exercise 2.5
Prove that the trace of a linear operator is independent of the choice of basis
used in (2.2.20). In other words, prove that $\sum_{i=0}^{d-1}\langle i|X|i\rangle = \sum_{k=1}^{d}\langle e_k|X|e_k\rangle$ for
every orthonormal basis $\{|e_k\rangle\}_{k=1}^{d}$. (Hint: use (2.2.2).)

The trace satisfies the cyclicity property: for 𝑋, 𝑌 , 𝑍 ∈ L(H),

Tr[𝑋𝑌 𝑍] = Tr[𝑌 𝑍 𝑋] = Tr[𝑍 𝑋𝑌 ]. (2.2.21)

More generally, the cyclicity property holds for linear operators with different input
and output Hilbert spaces: for 𝑍 𝐴→𝐵 ∈ L(H 𝐴 , H𝐵 ), 𝑌𝐵→𝐶 ∈ L(H𝐵 , H𝐶 ), and
𝑋𝐶→𝐴 ∈ L(H𝐶 , H 𝐴 ),

Tr[𝑋𝐶→𝐴𝑌𝐵→𝐶 𝑍 𝐴→𝐵 ] = Tr[𝑌𝐵→𝐶 𝑍 𝐴→𝐵 𝑋𝐶→𝐴 ] (2.2.22)

= Tr[𝑍 𝐴→𝐵 𝑋𝐶→𝐴𝑌𝐵→𝐶 ]. (2.2.23)
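The general cyclicity property in (2.2.22) and (2.2.23) involves three products of different sizes that nevertheless share the same trace. The following plain-Python sketch illustrates this with $d_A = 2$, $d_B = 3$, and $d_C = 1$; the matrices and helper names are illustrative assumptions.

```python
def matmul(A, B):
    """Matrix product of A (m x n) and B (n x p), stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

Z = [[1, 2], [0, 1j], [3, -1]]   # Z: H_A -> H_B, a 3 x 2 matrix
Y = [[1, 0, 2j]]                 # Y: H_B -> H_C, a 1 x 3 matrix
X = [[2], [1 - 1j]]              # X: H_C -> H_A, a 2 x 1 matrix

t1 = trace(matmul(matmul(X, Y), Z))   # Tr[XYZ], trace of a 2 x 2 matrix
t2 = trace(matmul(matmul(Y, Z), X))   # Tr[YZX], trace of a 1 x 1 matrix
t3 = trace(matmul(matmul(Z, X), Y))   # Tr[ZXY], trace of a 3 x 3 matrix
print(t1, t2, t3)
```

Note that only cyclic shifts are covered by this property; an arbitrary reordering of the factors need not even be a well-defined product here.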

Exercise 2.6
1. Prove the equalities in (2.2.22) and (2.2.23).
2. Prove that Tr[𝑋 ⊗ 𝑌 ] = Tr[𝑋]Tr[𝑌 ] for all 𝑋 ∈ L(H 𝐴 ) and 𝑌 ∈ L(H𝐵 ).


2.2.4 Transpose and Conjugate Transpose

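Consider $X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$ as written in (2.2.5). The transpose of 𝑋 is denoted
by $X^{\mathsf{T}}$ or alternatively by T(𝑋), and it is defined as
\[
X^{\mathsf{T}} \equiv \mathrm{T}(X) \coloneqq \sum_{i=0}^{d_B-1}\sum_{j=0}^{d_A-1} X_{i,j}\, |j\rangle_A\langle i|_B. \tag{2.2.24}
\]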

Note that the transpose is basis dependent, in the sense that it is defined with
respect to a particular basis (in the case above, we have defined it with respect to
the standard bases of H 𝐴 and H𝐵 ). Furthermore, taking the transpose with respect
to one orthonormal basis can lead to an operator different from that found by taking
the transpose with respect to a different orthonormal basis. In this sense, we could
more precisely refer to the operation in (2.2.24) as the “standard transpose.” The
standard transpose can also be understood as a linear superoperator (an operator on
operators) with the following representation:
\[
\mathrm{T}(X) = \sum_{i=0}^{d_B-1}\sum_{j=0}^{d_A-1} (|j\rangle_A\langle i|_B)\, X\, (|j\rangle_A\langle i|_B) = \sum_{i=0}^{d_B-1}\sum_{j=0}^{d_A-1} X_{i,j}\, |j\rangle_A\langle i|_B. \tag{2.2.25}
\]

Superoperators are discussed in more detail in Section 2.2.11.

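The conjugate transpose of $X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$, also known as the Hermitian
conjugate or the adjoint of 𝑋, is the linear operator $X^{\dagger} \in \mathrm{L}(\mathcal{H}_B, \mathcal{H}_A)$ defined as
\[
X^{\dagger} \coloneqq \sum_{i=0}^{d_B-1}\sum_{j=0}^{d_A-1} \overline{X_{i,j}}\, |j\rangle_A\langle i|_B. \tag{2.2.26}
\]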

The adjoint of 𝑋 is the unique linear operator that satisfies

⟨𝜙|𝑋𝜓⟩ = ⟨𝑋 † 𝜙|𝜓⟩ (2.2.27)

for all |𝜓⟩ ∈ H 𝐴 and |𝜙⟩ ∈ H𝐵 .
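The defining relation (2.2.27) of the adjoint is easy to verify numerically. The sketch below, in plain Python, forms the conjugate transpose of a $3 \times 2$ matrix and compares both sides of (2.2.27); the helper names and the example data are illustrative assumptions.

```python
def dagger(X):
    """Conjugate transpose of a matrix stored as a list of rows, as in (2.2.26)."""
    rows, cols = len(X), len(X[0])
    return [[complex(X[i][j]).conjugate() for i in range(rows)]
            for j in range(cols)]

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inner(psi, phi):
    """Euclidean inner product <psi|phi>, conjugate-linear in psi."""
    return sum(complex(a).conjugate() * b for a, b in zip(psi, phi))

X = [[1, 1j], [2 - 1j, 0], [3, 4j]]   # X: H_A (dim 2) -> H_B (dim 3)
psi = [1j, 2]                          # vector in H_A
phi = [1, -1j, 1 + 1j]                 # vector in H_B

lhs = inner(phi, matvec(X, psi))           # <phi | X psi>
rhs = inner(matvec(dagger(X), phi), psi)   # <X† phi | psi>
print(lhs, rhs)
```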

Exercise 2.7
Prove that the conjugate transpose is a basis-independent operation, i.e., that
one does not need to specify a basis in order to take the conjugate transpose of
a linear operator.

2.2.5 Hilbert–Schmidt Inner Product, Vectorization, and Transpose Trick

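On the vector space $\mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$, we define the Hilbert–Schmidt inner product as
follows:
\[
\langle X, Y \rangle \coloneqq \operatorname{Tr}[X^{\dagger} Y], \qquad X, Y \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B). \tag{2.2.28}
\]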
A key application of the Hilbert–Schmidt inner product in quantum mechanics
is the Born rule: In Section 3.3, we learn that the probability of obtaining an
outcome in an experiment is equal to the Hilbert–Schmidt inner product of the
state in which the system is prepared and a measurement operator corresponding to
the measurement outcome. With this inner product, the vector space L(H 𝐴 , H𝐵 )
becomes an inner product space, and thus a Hilbert space. The basis defined
in (2.2.3) is an orthonormal basis for L(H 𝐴 , H𝐵 ) under this inner product. The
Hilbert–Schmidt norm (or Schatten 2-norm) of an operator $X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$ is
defined from the Hilbert–Schmidt inner product as follows:
\[
\|X\|_2 \coloneqq \sqrt{\langle X, X \rangle}, \tag{2.2.29}
\]

which is a generalization of the Euclidean norm in (2.1.6). The Cauchy–Schwarz
inequality for the Hilbert–Schmidt inner product is as follows:

\[
|\langle X, Y \rangle|^2 \leq \langle X, X \rangle \cdot \langle Y, Y \rangle = \|X\|_2^2 \cdot \|Y\|_2^2, \tag{2.2.30}
\]

for all $X, Y \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$. Observe that the inequality in (2.2.30) is a generalization
of that in (2.1.7).
The Hilbert space $\mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$ is isomorphic to the tensor-product Hilbert space
$\mathcal{H}_A \otimes \mathcal{H}_B$, where the isomorphism is defined by

\[
|i\rangle_B\langle j|_A \leftrightarrow |j\rangle_A \otimes |i\rangle_B \eqqcolon \operatorname{vec}(|i\rangle_B\langle j|_A) \tag{2.2.31}
\]

for all $i \in \{0, \dots, d_B - 1\}$ and $j \in \{0, \dots, d_A - 1\}$. The operation vec on the right
in (2.2.31) is the “vectorize” operation, which takes the columns of a $d_B \times d_A$
matrix, written with respect to the standard basis, and stacks them
in order, one on top of the next, to construct a $d_A d_B$-dimensional column
vector. Specifically, for a linear operator $X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$ written as in (2.2.5),
\[
\operatorname{vec}(X) = \sum_{i=0}^{d_B-1}\sum_{j=0}^{d_A-1} X_{i,j}\, |j\rangle_A \otimes |i\rangle_B. \tag{2.2.32}
\]

The following are useful identities involving the vec operation that we call upon
repeatedly throughout this book.
1. For every linear operator 𝑋 ∈ L(H 𝐴 , H𝐵 ),

vec(𝑋) = ( 1 𝐴 ⊗ 𝑋 𝐴→𝐵 )|Γ⟩ 𝐴𝐴 , (2.2.33)

where
\[
|\Gamma\rangle_{AA} \coloneqq \sum_{i=0}^{d_A-1} |i, i\rangle_{AA}. \tag{2.2.34}
\]

For reasons that become clear later, we refer to |Γ⟩ 𝐴𝐴 as the “maximally
entangled vector.” Note that |Γ⟩ 𝐴𝐴 = vec( 1 𝐴 ). For clarity, when needed, we
write |Γ𝑑 ⟩ to refer to the vector defined in (2.2.34) when each Hilbert space
has dimension 𝑑.
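We also note that, for two vectors $|\psi\rangle_B \in \mathcal{H}_B$ and $|\phi\rangle_A \in \mathcal{H}_A$,

\[
\operatorname{vec}(|\psi\rangle_B\langle\phi|_A) = \overline{|\phi\rangle}_A \otimes |\psi\rangle_B, \tag{2.2.35}
\]

where $\overline{|\phi\rangle}$ denotes the vector obtained from |𝜙⟩ by complex conjugating its components in the standard basis.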

Exercise 2.8
1. Prove (2.2.33).
2. Prove the equality in (2.2.35) by writing both $|\psi\rangle_B$ and $|\phi\rangle_A$ in terms
of the orthonormal bases $\{|i\rangle_B\}_{i=0}^{d_B-1}$ and $\{|j\rangle_A\}_{j=0}^{d_A-1}$, respectively, and
using (2.2.31).

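2. For every vector $|\psi\rangle_{AB} \in \mathcal{H}_A \otimes \mathcal{H}_B$, there exists a linear operator $X^{\psi}_{A\to B} \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$ such that

\[
|\psi\rangle_{AB} = (\mathbb{1}_A \otimes X^{\psi}_{A\to B})|\Gamma\rangle_{AA} = \operatorname{vec}(X^{\psi}_{A\to B}). \tag{2.2.36}
\]

In particular, if $|\psi\rangle_{AB} = \sum_{i=0}^{d_A-1}\sum_{j=0}^{d_B-1} \alpha_{i,j}|i, j\rangle_{AB}$, then we can set

\[
X^{\psi}_{A\to B} = \sum_{j=0}^{d_B-1}\sum_{i=0}^{d_A-1} \alpha_{i,j}\, |j\rangle_B\langle i|_A. \tag{2.2.37}
\]

Alternatively, there exists a linear operator $Y^{\psi}_{B\to A} \in \mathrm{L}(\mathcal{H}_B, \mathcal{H}_A)$ such that

\[
|\psi\rangle_{AB} = (Y^{\psi}_{B\to A} \otimes \mathbb{1}_B)|\Gamma\rangle_{BB}. \tag{2.2.38}
\]

In particular, we can set

\[
Y^{\psi}_{B\to A} = \sum_{j=0}^{d_B-1}\sum_{i=0}^{d_A-1} \alpha_{i,j}\, |i\rangle_A\langle j|_B, \tag{2.2.39}
\]

and subsequently find by inspection of (2.2.37)–(2.2.39) that $X^{\psi}_{A\to B} = \mathrm{T}(Y^{\psi}_{B\to A})$.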
3. Transpose trick: For every linear operator 𝑋 ∈ L(H 𝐴 , H𝐵 ), the following
equality holds:

( 1 𝐴 ⊗ 𝑋 𝐴→𝐵 )|Γ⟩ 𝐴𝐴 = ((𝑋 T ) 𝐵→𝐴 ⊗ 1𝐵 )|Γ⟩𝐵𝐵 . (2.2.40)

4. For every linear operator 𝑋 ∈ L(C𝑑 ),

Tr[𝑋] = ⟨Γ|(𝑋 ⊗ 1)|Γ⟩ = ⟨Γ|( 1 ⊗ 𝑋)|Γ⟩ = ⟨Γ|vec(𝑋). (2.2.41)
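The identities (2.2.33), (2.2.40), and (2.2.41) can be checked on small examples. The sketch below is written in plain Python under the convention of (2.2.32), in which the coefficient of $|j\rangle_A \otimes |i\rangle_B$ in vec(𝑋) is $X_{i,j}$; the helper names (`vec`, `gamma`, and so on) and the test matrices are illustrative assumptions.

```python
def identity(d):
    return [[1 if i == j else 0 for j in range(d)] for i in range(d)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def kron(X, Y):
    """Kronecker product: block (i, j) of the result is X[i][j] * Y."""
    return [[x * y for x in row_x for y in row_y]
            for row_x in X for row_y in Y]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vec(X):
    """Stack the columns of X, so that entry j*d_B + i equals X[i][j]."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]

def gamma(d):
    """Unnormalized maximally entangled vector sum_i |i, i> of (2.2.34)."""
    return [1 if a == b else 0 for a in range(d) for b in range(d)]

d_A, d_B = 2, 3
X = [[1, 2j], [3, 4], [5j, 6]]          # X: H_A -> H_B, a 3 x 2 matrix

v = vec(X)
lhs = matvec(kron(identity(d_A), X), gamma(d_A))             # (1_A ⊗ X)|Γ>_AA
rhs = matvec(kron(transpose(X), identity(d_B)), gamma(d_B))  # (X^T ⊗ 1_B)|Γ>_BB
print(v == lhs, lhs == rhs)             # checks (2.2.33) and (2.2.40)

# Trace identity (2.2.41) for a square operator; <Γ| has real entries,
# so no complex conjugation is needed in the final contraction.
M = [[1, 2], [3, 4j]]
w = matvec(kron(M, identity(2)), gamma(2))
tr = sum(g * x for g, x in zip(gamma(2), w))
print(tr)
```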

Exercise 2.9
1. Prove (2.2.40).
2. Prove (2.2.41).
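3. Let $X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$, $Y \in \mathrm{L}(\mathcal{H}_C, \mathcal{H}_A)$, and $Z \in \mathrm{L}(\mathcal{H}_C, \mathcal{H}_D)$. Prove that

\[
\operatorname{vec}(X Y Z^{\dagger}) = (\overline{Z} \otimes X)\operatorname{vec}(Y), \tag{2.2.42}
\]

where $\overline{Z}$ denotes the entrywise complex conjugate of 𝑍 in the standard basis.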

4. Prove that ∥ 𝑋 ∥ 2 = ∥vec(𝑋)∥ 2 for every linear operator 𝑋 ∈ L(H 𝐴 , H𝐵 ).


2.2.6 Notable Classes of Linear Operators

The following classes of linear operators are particularly notable.

• Hermitian operators: Also known as self-adjoint operators, they are linear
operators that are equal to their conjugate transpose. That is, 𝑋 ∈ L(H) is
Hermitian if $X^{\dagger} = X$. The set of all Hermitian operators is a real vector space
with dimension $d^2$, where 𝑑 = dim(H). This means that every Hermitian
operator 𝑋 can be expanded in terms of an orthonormal basis $\{B_k\}_{k=1}^{d^2}$ of
Hermitian operators such that
\[
X = \sum_{k=1}^{d^2} x_k B_k, \tag{2.2.43}
\]
where $x_k$ are real numbers. Orthonormality is with respect to the Hilbert–Schmidt
inner product, which means that $\langle B_k, B_\ell \rangle = \operatorname{Tr}[B_k B_\ell] = \delta_{k,\ell}$ and
$x_k = \langle B_k, X \rangle = \operatorname{Tr}[B_k X]$ for all $k, \ell \in \{1, 2, \dots, d^2\}$.
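An example of an orthonormal basis of $d^2$ Hermitian operators is the following:

\begin{align}
\sigma^{(d)}_{0,0} &\coloneqq \frac{\mathbb{1}_d}{\sqrt{d}}, \tag{2.2.44}\\
\sigma^{(d;+)}_{k,\ell} &\coloneqq \frac{1}{\sqrt{2}}\left(|k\rangle\langle\ell| + |\ell\rangle\langle k|\right), \quad 0 \leq k < \ell \leq d - 1, \tag{2.2.45}\\
\sigma^{(d;\mathrm{i})}_{k,\ell} &\coloneqq \frac{1}{\sqrt{2}}\left(-\mathrm{i}|k\rangle\langle\ell| + \mathrm{i}|\ell\rangle\langle k|\right), \quad 0 \leq k < \ell \leq d - 1, \tag{2.2.46}\\
\sigma^{(d)}_{k,k} &\coloneqq \frac{1}{\sqrt{k(k+1)}}\left(\left(\sum_{j=0}^{k-1} |j\rangle\langle j|\right) - k|k\rangle\langle k|\right), \quad 1 \leq k \leq d - 1. \tag{2.2.47}
\end{align}

Observe that there are $\frac{d(d-1)}{2}$ operators labeled as $\sigma^{(d;+)}_{k,\ell}$, all of which are
traceless, and $\frac{d(d-1)}{2}$ operators labeled as $\sigma^{(d;\mathrm{i})}_{k,\ell}$, which are also all traceless.
The $d - 1$ operators $\sigma^{(d)}_{k,k}$ are also traceless. If we scale each of the above
operators by $\sqrt{d}$, then they are called the generalized Gell-Mann matrices.
When 𝑑 = 2, the generalized Gell-Mann matrices reduce to the Pauli matrices:

\begin{align}
\mathbb{1} &\coloneqq \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \sqrt{2}\,\sigma^{(2)}_{0,0}, &
X &\coloneqq \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \sqrt{2}\,\sigma^{(2;+)}_{0,1}, \tag{2.2.48}\\
Y &\coloneqq \begin{pmatrix} 0 & -\mathrm{i} \\ \mathrm{i} & 0 \end{pmatrix} = \sqrt{2}\,\sigma^{(2;\mathrm{i})}_{0,1}, &
Z &\coloneqq \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \sqrt{2}\,\sigma^{(2)}_{1,1}. \tag{2.2.49}
\end{align}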
i 0 0 −1

The Pauli matrices are important in the context of quantum mechanics, and
quantum information more generally, as they can be used to describe the
quantum states of two-dimensional quantum systems, as well as their evolution.
They are also involved in fundamental quantum information processing
protocols such as quantum teleportation. We elaborate upon these points in
Chapters 3–5.

Exercise 2.10
Prove that the operators in (2.2.44)–(2.2.47) do indeed form an orthonormal
basis for the vector space of Hermitian operators. More generally, prove
that they form an orthonormal basis for the vector space L(C𝑑 ) of all linear
operators.

• Positive semi-definite operators: A Hermitian operator 𝑋 ∈ L(H) is positive

semi-definite if ⟨𝜓|𝑋 |𝜓⟩ ≥ 0 for all |𝜓⟩ ∈ H. For every positive semi-definite
operator 𝑋, there exists a linear operator 𝑌 such that 𝑋 = 𝑌 †𝑌 . 𝑋 is called
positive definite if ⟨𝜓|𝑋 |𝜓⟩ > 0 for all |𝜓⟩ ∈ H such that |𝜓⟩ ≠ 0. We write
𝑋 ≥ 0 if 𝑋 is positive semi-definite, and 𝑋 > 0 if 𝑋 is positive definite. If
𝑋 − 𝑍 is positive semi-definite for Hermitian operators 𝑋 and 𝑍, then we write
𝑋 ≥ 𝑍, and if 𝑋 − 𝑍 is positive definite, then we write 𝑋 > 𝑍. The ordering
𝑋 ≥ 𝑍 on Hermitian operators is a partial order called the Löwner order and is
discussed more in Definition 2.12.

Exercise 2.11
Let 𝑋 ∈ L(H 𝐴 , H𝐵 ) be a linear operator, with H 𝐴 and H𝐵 arbitrary. Prove
that 𝑋 † 𝑋 is positive semi-definite.

• Density operators: These are Hermitian operators that are positive semi-definite
and have unit trace. Density operators are generalizations of probability
distributions from classical information theory and describe the states of a
quantum system, as detailed in Chapter 3.
• Unitary operators: These are linear operators whose inverses are equal to their
adjoints. That is, 𝑈 ∈ L(H) is unitary if $U^{\dagger}U = UU^{\dagger} = \mathbb{1}$. Unitary operators
generalize invertible maps or permutations from classical information theory
and describe the noiseless evolution of the state of a quantum system, as
detailed in Chapters 3 and 4.

Exercise 2.12
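Let 𝑈 ∈ L(H) be a unitary operator acting on a 𝑑-dimensional Hilbert
space H.
1. Given an orthonormal basis $\{|e_k\rangle\}_{k=1}^{d}$ for H, prove that the set
$\{|f_k\rangle\}_{k=1}^{d}$, with $|f_k\rangle \coloneqq U|e_k\rangle$ for all $1 \leq k \leq d$, is another orthonormal
basis for H.
2. Using the transpose trick identity in (2.2.40), prove that

\[
|\Gamma\rangle = (U \otimes \overline{U})|\Gamma\rangle, \tag{2.2.50}
\]

where $\overline{U}$ denotes the entrywise complex conjugate of 𝑈 in the standard basis.
3. Using 1. and 2., conclude that the following identity holds for every
orthonormal basis $\{|e_k\rangle\}_{k=1}^{d}$ for H:

\[
|\Gamma\rangle = \sum_{k=1}^{d} |e_k\rangle \otimes \overline{|e_k\rangle}. \tag{2.2.51}
\]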

• Isometric operators or isometries: A linear operator 𝑉 ∈ L(H 𝐴 , H𝐵 ) is

isometric if 𝑉 †𝑉 = 1 𝐴 (we also say that 𝑉 is an isometry). Isometries also
describe the noiseless evolution of the state of a quantum system, as detailed
in Chapters 3 and 4.

Exercise 2.13
Let 𝑉 ∈ L(H 𝐴 , H𝐵 ) be an isometry.
1. Prove that ⟨𝑉𝜓|𝑉 𝜙⟩ = ⟨𝜓|𝜙⟩ for all |𝜓⟩, |𝜙⟩ ∈ H 𝐴 .
2. Using 1., prove that 𝑉 is injective.
3. Using 2., prove that $d_B \geq d_A$. (Hint: use the rank-nullity theorem in
(2.2.18).)
4. Prove that, if $d_A = d_B$, then 𝑉 is a unitary. (Hint: use the result of
Exercise 2.4.)

• Projection operators: A linear operator 𝑃 ∈ L(H) is a projection if it is
Hermitian and satisfies $P^2 = P$. Such operators are also sometimes called
orthogonal projection operators.
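Returning to the Hermitian operator basis in (2.2.44)–(2.2.47), its orthonormality under the Hilbert–Schmidt inner product can be checked numerically for small 𝑑. The following plain-Python sketch does this for 𝑑 = 3; the function names and the numerical tolerance are illustrative assumptions, and the check is of course not a substitute for the proof requested in Exercise 2.10.

```python
import math

def zeros(d):
    return [[0j] * d for _ in range(d)]

def hs_inner(A, B):
    """Hilbert-Schmidt inner product <A, B> = Tr[A† B]."""
    return sum(a.conjugate() * b
               for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def hermitian_basis(d):
    """The d^2 operators of (2.2.44)-(2.2.47) as d x d matrices."""
    ops = []
    first = zeros(d)
    for i in range(d):
        first[i][i] = 1 / math.sqrt(d)               # (2.2.44)
    ops.append(first)
    for k in range(d):
        for l in range(k + 1, d):
            sym = zeros(d)                           # (2.2.45)
            sym[k][l] = sym[l][k] = 1 / math.sqrt(2)
            ops.append(sym)
            asym = zeros(d)                          # (2.2.46)
            asym[k][l] = -1j / math.sqrt(2)
            asym[l][k] = 1j / math.sqrt(2)
            ops.append(asym)
    for k in range(1, d):
        diag = zeros(d)                              # (2.2.47)
        c = 1 / math.sqrt(k * (k + 1))
        for j in range(k):
            diag[j][j] = c
        diag[k][k] = -k * c
        ops.append(diag)
    return ops

d = 3
ops = hermitian_basis(d)
gram_ok = all(
    abs(hs_inner(A, B) - (1 if m == n else 0)) < 1e-12
    for m, A in enumerate(ops) for n, B in enumerate(ops)
)
print(len(ops), gram_ok)
```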
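Note that every linear operator 𝑋 can be decomposed as

\[
X = \operatorname{Re}[X] + \mathrm{i}\operatorname{Im}[X], \tag{2.2.52}
\]

where $\operatorname{Re}[X] \coloneqq \frac{1}{2}(X + X^{\dagger})$ and $\operatorname{Im}[X] \coloneqq \frac{1}{2\mathrm{i}}(X - X^{\dagger})$ are both Hermitian operators,
generalizing how complex numbers can be decomposed into real and imaginary
parts.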

2.2.7 Singular Value, Schmidt, and Polar Decompositions

An important fact that we make use of throughout this book is the singular value
decomposition theorem.

Theorem 2.1 Singular Value Decomposition

For every linear operator $X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$ with $r \coloneqq \operatorname{rank}(X)$, there exist strictly
positive real numbers $\{s_k > 0 : 1 \leq k \leq r\}$, called the singular values of 𝑋,
and orthonormal vectors $\{|e_k\rangle_B : 1 \leq k \leq r\}$ and $\{|f_k\rangle_A : 1 \leq k \leq r\}$, such
that
\[
X = \sum_{k=1}^{r} s_k\, |e_k\rangle_B\langle f_k|_A. \tag{2.2.53}
\]
This is called the singular value decomposition of 𝑋.

Remark: From (2.2.53), we see that the rank of a linear operator 𝑋, which we defined earlier
as the dimension of the image of 𝑋, is equal to the number of singular values of 𝑋.

The singular value decomposition theorem can be written in the following matrix
form that is familiar from elementary linear algebra. We first extend the orthonormal
vectors $\{|e_k\rangle_B : 1 \leq k \leq r\}$ in $\mathcal{H}_B$ to an orthonormal basis $\{|e_k\rangle_B : 1 \leq k \leq d_B\}$
for $\mathcal{H}_B$, and similarly, we extend the set $\{|f_k\rangle_A : 1 \leq k \leq r\}$ of orthonormal vectors
in $\mathcal{H}_A$ to an orthonormal basis $\{|f_k\rangle_A : 1 \leq k \leq d_A\}$ for $\mathcal{H}_A$. Then, by defining
the unitaries
\[
W \coloneqq \sum_{k=1}^{d_B} |e_k\rangle_B\langle k - 1|_B, \qquad V \coloneqq \sum_{k=1}^{d_A} |f_k\rangle_A\langle k - 1|_A, \tag{2.2.54}
\]
we can write (2.2.53) as
\[
X = W S V^{\dagger}, \tag{2.2.55}
\]
where
\[
S \coloneqq \sum_{k=1}^{r} s_k\, |k - 1\rangle_B\langle k - 1|_A \tag{2.2.56}
\]
is the matrix of singular values. Note that the matrix 𝑆 is a $d_B \times d_A$ rectangular
matrix and (2.2.56) is only specifying the non-zero entries of 𝑆 along the diagonal.
The singular value decomposition can be used to prove the following useful
theorem for vectors in a tensor-product Hilbert space.

Theorem 2.2 Schmidt Decomposition

Let $|\psi\rangle_{AB}$ be a vector in the tensor-product Hilbert space $\mathcal{H}_{AB}$. Let $X_{A\to B}$
be the linear operator with matrix elements $\langle j|_B X |i\rangle_A = \langle i, j|\psi\rangle_{AB}$, and let
$r = \operatorname{rank}(X)$. Then, there exist strictly positive Schmidt coefficients $\{\lambda_k\}_{k=1}^{r}$,
and orthonormal vectors $\{|e_k\rangle_A\}_{k=1}^{r}$ and $\{|f_k\rangle_B\}_{k=1}^{r}$, such that
\[
|\psi\rangle_{AB} = \sum_{k=1}^{r} \sqrt{\lambda_k}\, |e_k\rangle_A \otimes |f_k\rangle_B. \tag{2.2.57}
\]

The quantity 𝑟 is called the Schmidt rank, satisfying $r \leq \min\{d_A, d_B\}$.

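Proof: This is a direct application of the singular value decomposition (Theorem 2.1).
Consider the operator 𝑋 defined in the statement of the theorem, having
matrix elements $\langle j|_B X |i\rangle_A = \langle i, j|\psi\rangle_{AB}$. Observe then that $\operatorname{vec}(X) = |\psi\rangle_{AB}$ (see
(2.2.31)).
Now, by the singular value decomposition (Theorem 2.1), we can write 𝑋 as
$X = \sum_{k=1}^{r} s_k\, |f_k\rangle_B\langle g_k|_A$, where $s_k$ are strictly positive numbers, $r = \operatorname{rank}(X) \leq \min\{d_A, d_B\}$,
and $\{|f_k\rangle_B : 1 \leq k \leq r\}$ and $\{|g_k\rangle_A : 1 \leq k \leq r\}$ are sets of
orthonormal vectors in $\mathcal{H}_B$ and $\mathcal{H}_A$, respectively. Letting $\lambda_k = s_k^2$, and upon
vectorizing 𝑋 as written in this form (see (2.2.31) and (2.2.35)), we find that
\[
|\psi\rangle_{AB} = \operatorname{vec}(X) = \sum_{k=1}^{r} \sqrt{\lambda_k}\, \overline{|g_k\rangle}_A \otimes |f_k\rangle_B, \tag{2.2.58}
\]
where $\overline{|g_k\rangle}$ denotes the complex conjugate of $|g_k\rangle$ with respect to the standard basis. Complex conjugation preserves orthonormality, so letting $|e_k\rangle_A \coloneqq \overline{|g_k\rangle}_A$, the result follows. ■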

A simple but important consequence of the Schmidt decomposition theorem is
that every vector $|\psi\rangle_{AB}$ in a tensor-product Hilbert space $\mathcal{H}_{AB}$ can be written as
\[
|\psi\rangle_{AB} = \sum_{k=1}^{\min\{d_A, d_B\}} \sqrt{p_k}\, |u_k\rangle_A \otimes |v_k\rangle_B, \tag{2.2.59}
\]
where $p_k = s_k^2$ for $1 \leq k \leq r$ and $p_k = 0$ for $r < k \leq \min\{d_A, d_B\}$. The vectors
$|u_k\rangle_A$ and $|v_k\rangle_B$ are such that $|u_k\rangle_A = |e_k\rangle_A$ and $|v_k\rangle_B = |f_k\rangle_B$ for $1 \leq k \leq r$,
and the remaining vectors combine to form orthonormal bases for a $\min\{d_A, d_B\}$-dimensional
subspace of $\mathcal{H}_A \otimes \mathcal{H}_B$.
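For two-qubit vectors ($d_A = d_B = 2$), the Schmidt coefficients can be computed in closed form, because they are the eigenvalues of the $2 \times 2$ matrix $X^{\dagger}X$, whose trace and determinant are easy to evaluate. The sketch below uses this observation in plain Python; the function names, the tolerance, and the example vectors are illustrative assumptions, and the approach does not extend beyond this smallest case without a general singular value routine.

```python
import math

def schmidt_coefficients(psi):
    """Schmidt coefficients (largest first) of a two-qubit vector
    psi = [a00, a01, a10, a11]. They are the squared singular values of the
    2 x 2 matrix X with entries <j|X|i> = a_ij, i.e. the roots of
    lambda^2 - Tr[X†X] lambda + det(X†X) = 0."""
    a00, a01, a10, a11 = psi
    t = sum(abs(a) ** 2 for a in psi)            # Tr[X†X] = lambda_1 + lambda_2
    det = abs(a00 * a11 - a01 * a10) ** 2        # det(X†X) = lambda_1 * lambda_2
    disc = math.sqrt(max(t * t - 4 * det, 0.0))  # clip tiny negative round-off
    return (t + disc) / 2, (t - disc) / 2

def schmidt_rank(psi, tol=1e-9):
    """Number of Schmidt coefficients exceeding the (assumed) tolerance."""
    return sum(1 for lam in schmidt_coefficients(psi) if lam > tol)

s = 1 / math.sqrt(2)
bell = [s, 0, 0, s]        # (|0,0> + |1,1>)/sqrt(2)
product = [s, s, 0, 0]     # |0> ⊗ (|0> + |1>)/sqrt(2)

print(schmidt_coefficients(bell), schmidt_rank(bell))
print(schmidt_coefficients(product), schmidt_rank(product))
```

In this sketch the entangled vector has two equal Schmidt coefficients close to 1/2, whereas the product vector has a single non-zero coefficient, in agreement with the bound $r \leq \min\{d_A, d_B\}$.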

Exercise 2.14
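Using arguments similar to the proof of Theorem 2.2, prove that every linear
operator $X_{AB} \in \mathrm{L}(\mathcal{H}_A \otimes \mathcal{H}_B)$ can be written as
\[
X_{AB} = \sum_{k=1}^{r} \sqrt{\lambda_k}\, E_A^k \otimes F_B^k, \tag{2.2.60}
\]
where the set $\{\lambda_k\}_{k=1}^{r}$ consists of strictly positive reals, $\{E_A^k\}_{k=1}^{r}$ and $\{F_B^k\}_{k=1}^{r}$ are
orthonormal sets of linear operators acting on $\mathcal{H}_A$ and $\mathcal{H}_B$, respectively, and
$r = \operatorname{rank}(M)$, where $M \in \mathrm{L}(\mathcal{H}_A \otimes \mathcal{H}_A, \mathcal{H}_B \otimes \mathcal{H}_B)$ is defined by
$\langle j, \ell|_{BB} M |i, k\rangle_{AA} = \langle i, j|_{AB} X |k, \ell\rangle_{AB}$ for all $0 \leq i, k \leq d_A - 1$ and $0 \leq j, \ell \leq d_B - 1$.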

Another important decomposition of a linear operator is the polar decomposition.

Theorem 2.3 Polar Decomposition

Every linear operator 𝑋 ∈ L(H) can be written as 𝑋 = 𝑈𝑃, where 𝑈 is
a unitary operator and 𝑃 is a positive semi-definite operator. In particular,
$P = |X| \coloneqq \sqrt{X^{\dagger}X}$.

Proof: If $X = W S V^{\dagger}$ is the singular value decomposition of 𝑋, then we can take
$P = V S V^{\dagger}$ and $U = W V^{\dagger}$. ■

The polar decomposition generalizes the polar form $z = r\mathrm{e}^{\mathrm{i}\theta}$ of a complex
number 𝑧. Indeed, the fact that 𝑟 ≥ 0 is in correspondence with 𝑃 being a positive
semi-definite operator, and the phase $\mathrm{e}^{\mathrm{i}\theta}$ is in correspondence with the unitary 𝑈,
the latter having eigenvalues on the unit circle (i.e., of the form $\mathrm{e}^{\mathrm{i}\theta}$).
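For an invertible $2 \times 2$ operator, the polar decomposition can be written down without computing a singular value decomposition. The sketch below relies on a closed-form square root that follows from the Cayley–Hamilton theorem for a $2 \times 2$ positive semi-definite matrix $M = X^{\dagger}X$, namely $\sqrt{M} = (M + s\mathbb{1})/t$ with $s = \sqrt{\det M}$ and $t = \sqrt{\operatorname{Tr}[M] + 2s}$. This shortcut is an assumption of the example, not the construction used in the proof of Theorem 2.3, and the function names are illustrative.

```python
import math

def dagger(X):
    return [[complex(X[i][j]).conjugate() for i in range(len(X))]
            for j in range(len(X[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def polar_2x2(X):
    """Return (U, P) with X = U P for an invertible 2 x 2 matrix X."""
    M = matmul(dagger(X), X)                              # X†X, positive definite
    s = abs(X[0][0] * X[1][1] - X[0][1] * X[1][0])        # sqrt(det M) = |det X|
    t = math.sqrt((M[0][0] + M[1][1]).real + 2 * s)
    # Closed-form square root of M, valid only in dimension two.
    P = [[(M[i][j] + (s if i == j else 0)) / t for j in range(2)]
         for i in range(2)]
    det_p = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    P_inv = [[P[1][1] / det_p, -P[0][1] / det_p],
             [-P[1][0] / det_p, P[0][0] / det_p]]
    U = matmul(X, P_inv)                                  # U = X P^{-1}
    return U, P

X = [[1, 1], [0, 1]]
U, P = polar_2x2(X)
print(U)
print(P)
```

For this particular 𝑋, one can verify by hand that $P = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$ and $U = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix}$.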

2.2.8 Spectral Theorem

Given a linear operator 𝑋 ∈ L(H) acting on some Hilbert space H, if there exists a
vector |𝜓⟩ ∈ H such that
𝑋 |𝜓⟩ = 𝜆|𝜓⟩, (2.2.61)
then |𝜓⟩ is said to be an eigenvector of 𝑋 with associated eigenvalue 𝜆. The set
of all eigenvectors associated with an eigenvalue 𝜆 is a subspace of H called the
eigenspace of 𝑋 associated with 𝜆, and the multiplicity of 𝜆 is the number of linearly
independent eigenvectors of 𝑋 that are associated with 𝜆 (in other words, it is
the dimension of the eigenspace of 𝑋 associated with 𝜆). The eigenspace of 𝑋
associated with 𝜆 is equal to ker(𝑋 − 𝜆𝐼).
The spectral theorem, which we state below, allows us to decompose every
normal operator 𝑋, i.e., an operator that commutes with its adjoint, so that
𝑋 𝑋 † = 𝑋 † 𝑋, in terms of its eigenvalues and projections onto its corresponding
eigenspaces. We employ it most often when analyzing quantum states and
observables.


Theorem 2.4 Spectral Theorem

For every normal operator 𝑋 ∈ L(H), there exists 𝑛 ∈ N such that
\[
X = \sum_{j=1}^{n} \lambda_j \Pi_j, \tag{2.2.62}
\]
where $\lambda_1, \lambda_2, \dots, \lambda_n \in \mathbb{C}$ are the distinct eigenvalues of 𝑋 and $\Pi_1, \Pi_2, \dots, \Pi_n$
are the spectral projections onto the corresponding eigenspaces, which satisfy
$\Pi_1 + \Pi_2 + \cdots + \Pi_n = \mathbb{1}_{\mathcal{H}}$ and $\Pi_i \Pi_j = \delta_{i,j}\Pi_i$. The decomposition in (2.2.62) is
unique and is called the spectral decomposition of 𝑋. The spectrum of 𝑋 is
denoted by $\operatorname{spec}(X) \coloneqq \{\lambda_1, \lambda_2, \dots, \lambda_n\}$.

Remark: The multiplicity of an eigenvalue $\lambda_i$ is equal to the rank of the corresponding
projection $\Pi_i$.
The property $\Pi_i \Pi_j = \delta_{i,j}\Pi_i$ of the spectral projections indicates that the eigenspaces of
a normal operator are orthogonal to each other. In other words, for every eigenvector $|\psi_{\lambda_1}\rangle$
associated with the eigenvalue $\lambda_1$ and every eigenvector $|\psi_{\lambda_2}\rangle$ associated with the eigenvalue $\lambda_2$,
where $\lambda_1 \neq \lambda_2$, we have that $\langle\psi_{\lambda_1}|\psi_{\lambda_2}\rangle = 0$.

If we take the spectral decomposition of a normal operator 𝑋 ∈ L(H) in
(2.2.62) and find orthonormal bases for the corresponding eigenspaces, then 𝑋 can
be written as
\[
X = \sum_{k=1}^{d} \lambda_k\, |\psi_k\rangle\langle\psi_k|, \tag{2.2.63}
\]
where 𝑑 = dim(H) and $\{|\psi_k\rangle\}_{k=1}^{d}$ is a set of orthonormal vectors such that

\[
|\psi_1\rangle\langle\psi_1| + |\psi_2\rangle\langle\psi_2| + \cdots + |\psi_d\rangle\langle\psi_d| = \mathbb{1}_{\mathcal{H}}. \tag{2.2.64}
\]

From this decomposition, we see clearly that $|\psi_k\rangle$ is an eigenvector of 𝑋 with
associated eigenvalue $\lambda_k$.
The spectral theorem is a statement of a fact familiar from elementary linear
algebra, that every normal operator can be diagonalized by a unitary. Indeed,
if we consider the spectral decomposition in (2.2.63), and we define the unitary
operator $U := \sum_{k=1}^{d} |\psi_k\rangle\langle k-1|$, then (2.2.63) can be written as $X = UDU^\dagger$, where
$D := \sum_{k=1}^{d} \lambda_k |k-1\rangle\langle k-1|$ is a diagonal matrix.
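As a small illustration (our own example, not taken from the text), consider the Pauli operator $\sigma_X$ on $\mathbb{C}^2$, with $|\pm\rangle := (|0\rangle \pm |1\rangle)/\sqrt{2}$. It is Hermitian, hence normal, and its spectral decomposition and unitary diagonalization can be written as follows:

```latex
\begin{align*}
\sigma_X &= \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
  = (+1)\,|+\rangle\langle +| \;+\; (-1)\,|-\rangle\langle -|,
  \qquad
  \Pi_{\pm} = |\pm\rangle\langle\pm| = \tfrac{1}{2}\left(\mathbb{1} \pm \sigma_X\right), \\
U &= |+\rangle\langle 0| + |-\rangle\langle 1|
  = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},
  \qquad
  D = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},
  \qquad
  \sigma_X = U D U^\dagger .
\end{align*}
```

Here both eigenvalues have multiplicity one, so the decompositions of the form (2.2.62) and (2.2.63) coincide.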

Note that for the decomposition in (2.2.63) the numbers $\lambda_k \in \mathbb{C}$ need not all be
distinct because the eigenspace associated with each eigenvalue can have dimension
greater than one. Also, note that the decomposition in (2.2.63) is generally not
unique because the decomposition of the spectral projections into orthonormal
vectors is not unique.
From (2.2.63), it is evident that the rank of a normal operator 𝑋 (recall the
discussion around (2.2.16)) is equal to the number of non-zero eigenvalues of 𝑋
(including their multiplicities). Furthermore, the support of 𝑋 (recall (2.2.19)) is
equal to the span of all eigenvectors of 𝑋 associated with the non-zero eigenvalues
of $X$. In particular,
$$\Pi_X := \sum_{k:\lambda_k \neq 0} |\psi_k\rangle\langle\psi_k| \tag{2.2.65}$$
is the projection onto the support of $X$. It is also evident that the trace of $X$ is equal
to the sum of its eigenvalues, i.e., $\operatorname{Tr}[X] = \sum_{k=1}^{d} \lambda_k = \sum_{k:\lambda_k \neq 0} \lambda_k$.

Exercise 2.15
Let 𝑃 be a projection operator.
1. Prove that the eigenvalues of 𝑃 are either 0 or 1. Prove that Tr[𝑃] = rank(𝑃).
2. Using 1., conclude that rank(𝑋) = Tr[Π 𝑋 ] for every linear operator 𝑋,
where Π 𝑋 is the projection onto the support of 𝑋, as defined in (2.2.65).

The singular values of a linear operator $X$ (not necessarily normal) are related
to the eigenvalues of $X^\dagger X$ and $XX^\dagger$ in the following way. Let $\{s_k\}_{k=1}^{\operatorname{rank}(X)}$ be the set
of singular values of $X$, and let $\{\lambda_k\}_{k=1}^{\operatorname{rank}(X)}$ be the non-zero eigenvalues of $X^\dagger X$,
which are the same as the non-zero eigenvalues of $XX^\dagger$. (Note that both $X^\dagger X$ and $XX^\dagger$ are
normal operators, so that the spectral theorem applies to them.) Then, $s_k = \sqrt{\lambda_k}$
for all $1 \leq k \leq \operatorname{rank}(X)$. In particular, if $X$ is a Hermitian operator with non-zero
eigenvalues $\{\omega_k\}_{k=1}^{\operatorname{rank}(X)}$, then $s_k = \sqrt{\omega_k^2} = |\omega_k|$ for all $1 \leq k \leq \operatorname{rank}(X)$.
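For a non-normal operator, singular values and eigenvalues can differ completely. A minimal worked example (our own choice of operator, $X = 2|0\rangle\langle 1|$ on $\mathbb{C}^2$) illustrates the relation described above:

```latex
\begin{align*}
X = \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix}, \qquad
X^\dagger X = \begin{pmatrix} 0 & 0 \\ 0 & 4 \end{pmatrix}, \qquad
X X^\dagger = \begin{pmatrix} 4 & 0 \\ 0 & 0 \end{pmatrix}.
\end{align*}
% rank(X) = 1; the only non-zero eigenvalue of X^\dagger X (and of X X^\dagger) is 4,
% so the single singular value is s_1 = \sqrt{4} = 2,
% whereas both eigenvalues of X itself are equal to 0.
```

Note that $X^\dagger X \neq XX^\dagger$ here, so $X$ is not normal, yet the two products still share the same non-zero eigenvalue.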

Exercise 2.16
1. Prove that every Hermitian operator has real eigenvalues.
2. Prove that every unitary operator has eigenvalues with unit modulus; i.e., if


𝜆 is an eigenvalue of a given unitary operator, then |𝜆| = 1.

If 𝑋 is a Hermitian operator, then we can split it into a positive part, denoted

by 𝑋+ , and a negative part, denoted by 𝑋− , so that
𝑋 = 𝑋+ − 𝑋− . (2.2.66)
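This follows because $X$ has real eigenvalues (see Exercise 2.16). In particular, if
$X = \sum_{k=1}^{d} \lambda_k |\psi_k\rangle\langle\psi_k|$ is a spectral decomposition of $X$, then
$$X_+ := \sum_{k:\lambda_k \geq 0} \lambda_k |\psi_k\rangle\langle\psi_k|, \qquad X_- := \sum_{k:\lambda_k < 0} |\lambda_k|\, |\psi_k\rangle\langle\psi_k|. \tag{2.2.67}$$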
Note that both 𝑋+ and 𝑋− are positive semi-definite operators, and they are supported
on orthogonal subspaces, meaning that 𝑋+ 𝑋− = 0. Such a decomposition of a
Hermitian operator 𝑋 into positive and negative parts is called the Jordan–Hahn
decomposition of 𝑋.
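To make the construction concrete, here is a worked example with an operator of our own choosing, using $|\pm\rangle := (|0\rangle \pm |1\rangle)/\sqrt{2}$:

```latex
\begin{align*}
X &= \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}
   = 3\,|+\rangle\langle +| \;-\; 1\,|-\rangle\langle -|, \\
X_+ &= 3\,|+\rangle\langle +| = \frac{3}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},
\qquad
X_- = |-\rangle\langle -| = \frac{1}{2}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \\
X_+ - X_- &= X, \qquad X_+ X_- = 0 .
\end{align*}
```

Both parts are positive semi-definite, and they are supported on the orthogonal subspaces spanned by $|+\rangle$ and $|-\rangle$, respectively.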

Exercise 2.17
Using the Jordan–Hahn decomposition, prove that there exists a basis for $\mathrm{L}(\mathbb{C}^d)$
consisting entirely of positive semi-definite operators, for all 𝑑 ≥ 2.

Another important fact is that a Hermitian operator 𝑋 is positive semi-definite

if and only if all of its eigenvalues are non-negative, and positive definite if and
only if all of its eigenvalues are strictly positive. The latter implies that every
positive definite operator 𝑋 ∈ L(H) has full support, i.e., supp(𝑋) = H and
rank(𝑋) = dim(H). Thus, positive definite operators are invertible.

Exercise 2.18
For every positive semi-definite operator $X$, prove that the operator $X + \varepsilon \mathbb{1}$ is
positive definite for all 𝜀 > 0.

2.2.8.1 Functions of Hermitian Operators

Using the spectral decomposition as in (2.2.63), we can define a function of a

Hermitian operator. Basic functions of density operators that we employ extensively
are $x \mapsto -x \log_2 x$ for the quantum entropy and $x \mapsto x^\alpha$ for the Rényi entropy.

Let 𝑓 : R → R be a real-valued function with domain dom( 𝑓 ). For every

Hermitian operator 𝑋 ∈ L(H) with spectral decomposition
$$X = \sum_{k=1}^{d} \lambda_k |\psi_k\rangle\langle\psi_k|, \tag{2.2.68}$$

where 𝑑 = dim(H), we define 𝑓 (𝑋) as the following operator:

$$f(X) := \sum_{k:\lambda_k \in \operatorname{dom}(f)} f(\lambda_k)\, |\psi_k\rangle\langle\psi_k|. \tag{2.2.69}$$

An immediate consequence of this definition is that if 𝑋 is a Hermitian operator

and 𝑈 is a unitary operator, then

𝑓 (𝑈 𝑋𝑈 † ) = 𝑈 𝑓 (𝑋)𝑈 † . (2.2.70)

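This is because, if $X = \sum_k \lambda_k |\psi_k\rangle\langle\psi_k|$ is a spectral decomposition of $X$, then
$UXU^\dagger = \sum_k \lambda_k\, U|\psi_k\rangle\langle\psi_k|U^\dagger$ is a spectral decomposition of $UXU^\dagger$: the
eigenvalues are unchanged and the eigenvectors become $U|\psi_k\rangle$.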
Functions that arise frequently throughout this book are as follows:
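• Power functions: for every $\alpha \in \mathbb{N}$, the function $f(x) = x^\alpha$, with $x \in \mathbb{R}$, extends
to Hermitian operators via (2.2.69) as
$$X^\alpha := \sum_{k=1}^{d} \lambda_k^\alpha\, |\psi_k\rangle\langle\psi_k|, \qquad \alpha \in \mathbb{N}. \tag{2.2.71}$$

The function $f(x) = x^{-\alpha} = \frac{1}{x^\alpha}$, for $\alpha \in \mathbb{N}$, has domain $\{x \in \mathbb{R} : x \neq 0\}$, so that
$$X^{-\alpha} := \sum_{k:\lambda_k \neq 0} \frac{1}{\lambda_k^\alpha}\, |\psi_k\rangle\langle\psi_k|, \qquad \alpha \in \mathbb{N}. \tag{2.2.72}$$

More generally, for $\alpha \in (0, \infty) \setminus \mathbb{N}$, the function $f(x) = x^\alpha$ has domain
$\{x \in \mathbb{R} : x \geq 0\}$, so that
$$X^\alpha := \sum_{k:\lambda_k \geq 0} \lambda_k^\alpha\, |\psi_k\rangle\langle\psi_k|, \qquad \alpha \in (0, \infty). \tag{2.2.73}$$

If $X$ is a positive semi-definite operator (so that $\lambda_k \geq 0$ for all $k$), then in the
case $\alpha = \frac{1}{2}$ we typically use the notation $\sqrt{X}$ to refer to $X^{\frac{1}{2}}$. In particular, $\sqrt{X}$
is the unique positive semi-definite operator such that $\sqrt{X}\sqrt{X} = X$.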

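For $\alpha = 0$, the following equality holds

$$X^0 := \sum_{k:\lambda_k \neq 0} |\psi_k\rangle\langle\psi_k| = \Pi_X, \tag{2.2.74}$$

because $f(x) = x^0$ has domain $\{x \in \mathbb{R} : x \neq 0\}$. In the above, $\Pi_X$ is the
projection onto the support of $X$; recall (2.2.65).
The function $f(x) = x^{-\alpha} = \frac{1}{x^\alpha}$, for $\alpha \in (0, \infty) \setminus \mathbb{N}$, has domain $\{x \in \mathbb{R} : x > 0\}$,
so that
$$X^{-\alpha} := \sum_{k:\lambda_k > 0} \frac{1}{\lambda_k^\alpha}\, |\psi_k\rangle\langle\psi_k|, \qquad \alpha \in (0, \infty). \tag{2.2.75}$$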

• Logarithm functions: For the function $\log_b : (0, \infty) \to \mathbb{R}$ with base $b > 0$, $b \neq 1$, we
define
$$\log_b(X) := \sum_{k:\lambda_k > 0} \log_b(\lambda_k)\, |\psi_k\rangle\langle\psi_k|. \tag{2.2.76}$$

We deal throughout this book exclusively with the base-2 logarithm log2 and
the base-e logarithm loge ≡ ln.
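The role of the domain restriction in (2.2.69) is easiest to see on a rank-deficient operator. The following worked example is our own illustration, with $|\pm\rangle := (|0\rangle \pm |1\rangle)/\sqrt{2}$:

```latex
\begin{align*}
X &= \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
   = 2\,|+\rangle\langle +| \;+\; 0\,|-\rangle\langle -|, \\
\sqrt{X} &= \sqrt{2}\,|+\rangle\langle +|, \qquad
X^{0} = |+\rangle\langle +| = \Pi_X, \qquad
X^{-1} = \tfrac{1}{2}\,|+\rangle\langle +|, \qquad
\log_2(X) = \log_2(2)\,|+\rangle\langle +| = |+\rangle\langle +| .
\end{align*}
% The eigenvalue 0 lies outside the domain of x^0, x^{-1}, and \log_2,
% so the term |-\rangle\langle -| is simply dropped from those sums.
```

In particular, $X^{-1}$ in this convention is the inverse of $X$ on its support: $X^{-1} X = \Pi_X$, which differs from the identity whenever $X$ does not have full rank.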
We end this section with a lemma, which is used several times in Chapter 7.

Lemma 2.5
Let 𝑋 ∈ L(H), and let 𝑓 be a function such that the squares of the singular
values of 𝑋 are in the domain of 𝑓 . Then

𝑋 𝑓 (𝑋 † 𝑋) = 𝑓 (𝑋 𝑋 † ) 𝑋. (2.2.77)

Proof: This is a direct consequence of the singular value decomposition theorem

(Theorem 2.1). Let $X = WSV^\dagger$ be a singular value decomposition of $X$, where $W$
and $V$ are unitary operators and $S$ is a diagonal, positive semi-definite operator.
Then
\begin{align}
X f(X^\dagger X) &= WSV^\dagger f\!\left((WSV^\dagger)^\dagger\, WSV^\dagger\right) \tag{2.2.78} \\
&= WSV^\dagger f(VSW^\dagger WSV^\dagger) \tag{2.2.79} \\
&= WSV^\dagger f(VS^2V^\dagger). \tag{2.2.80}
\end{align}


Now, we use the fact that $f(VS^2V^\dagger) = V f(S^2) V^\dagger$, which holds because the function
$(\cdot) \mapsto V(\cdot)V^\dagger$, with $V$ unitary, preserves the eigenvalues. Using as well the fact
that $S f(S^2) = f(S^2) S$, we obtain

\begin{align}
X f(X^\dagger X) &= W S f(S^2) V^\dagger \tag{2.2.81} \\
&= W f(S^2) S V^\dagger \tag{2.2.82} \\
&= W f(S V^\dagger V S) W^\dagger W S V^\dagger \tag{2.2.83} \\
&= f(W S V^\dagger V S W^\dagger)\, W S V^\dagger \tag{2.2.84} \\
&= f(X X^\dagger)\, X. \tag{2.2.85}
\end{align}

This concludes the proof. ■
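As a sanity check on Lemma 2.5 (not a replacement for the proof above), note that for monomials $f(x) = x^n$ with $n \in \mathbb{N}$ the identity is just a regrouping of factors, and it then extends to polynomials by linearity:

```latex
\begin{align*}
X\,(X^\dagger X)^n
 = X \underbrace{(X^\dagger X)(X^\dagger X)\cdots(X^\dagger X)}_{n \text{ factors}}
 = \underbrace{(X X^\dagger)(X X^\dagger)\cdots(X X^\dagger)}_{n \text{ factors}}\, X
 = (X X^\dagger)^n\, X .
\end{align*}
```

The singular value decomposition argument is what allows the same identity to be stated for more general functions $f$.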

2.2.9 Norms

A norm on a Hilbert space H (more generally a vector space) is a function

∥·∥ : H → R that satisfies the following properties:
• Non-negativity: ∥|𝜓⟩∥ ≥ 0 for all |𝜓⟩ ∈ H, and ∥|𝜓⟩∥ = 0 if and only if
|𝜓⟩ = 0.
• Scaling: For every 𝛼 ∈ C and |𝜓⟩ ∈ H, ∥𝛼|𝜓⟩∥ = |𝛼| ∥|𝜓⟩∥.
• Triangle inequality: For all |𝜓⟩, |𝜙⟩ ∈ H, ∥|𝜓⟩ + |𝜙⟩∥ ≤ ∥|𝜓⟩∥ + ∥|𝜙⟩∥.
An immediate consequence of the scaling property and the triangle inequality is
that a norm is convex:

∥𝜆|𝜓⟩ + (1 − 𝜆)|𝜙⟩∥ ≤ 𝜆 ∥|𝜓⟩∥ + (1 − 𝜆) ∥|𝜙⟩∥ (2.2.86)

for all vectors |𝜓⟩ and |𝜙⟩ and all 𝜆 ∈ [0, 1].
In this section, we are primarily interested in the Hilbert space L(H) of linear
operators 𝑋 : H → H for some Hilbert space H. The following norm for linear
operators is used extensively in this book.


Definition 2.6 Schatten Norm

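For every linear operator $X \in \mathrm{L}(\mathcal{H})$ acting on a Hilbert space $\mathcal{H}$, we define its
Schatten $\alpha$-norm as
$$\|X\|_\alpha := \left(\operatorname{Tr}[|X|^\alpha]\right)^{\frac{1}{\alpha}}, \tag{2.2.87}$$
for all $\alpha \in [1, \infty)$, where $|X| := \sqrt{X^\dagger X}$. We let

$$\|X\|_\infty := \lim_{\alpha \to \infty} \|X\|_\alpha. \tag{2.2.88}$$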

Throughout this book, we extend the function ∥·∥ 𝛼 to include 𝛼 ∈ (0, 1) (with
the definition exactly as in (2.2.87)), although in this case it is not a norm because
it does not satisfy the triangle inequality.
Norms are typically employed in pure mathematics to measure the lengths of
vectors or operators, and different norms give different ways of measuring length.
In quantum information, we employ norms to measure entropy and information of
quantum states and channels (see Chapter 7). The parameter 𝛼 for the Schatten
norm then becomes the Rényi parameter for the Rényi entropy.

Exercise 2.19
Let $X$ be a linear operator, and let $\{s_k\}_{k=1}^{r}$ be the set of singular values of $X$,
where $r := \operatorname{rank}(X)$. Prove that

$$\|X\|_\alpha = \left(\sum_{k=1}^{r} s_k^\alpha\right)^{\frac{1}{\alpha}} \tag{2.2.89}$$

for all $\alpha \in (0, \infty)$.

If $X$ is a Hermitian operator with non-zero eigenvalues $\{\lambda_k\}_{k=1}^{r}$, then its singular
values are $s_k = |\lambda_k|$, where $k \in \{1, \ldots, r\}$ (see Section 2.2.8). Therefore, for
all $\alpha \in (0, \infty)$,
$$\|X\|_\alpha = \left(\sum_{k=1}^{r} |\lambda_k|^\alpha\right)^{\frac{1}{\alpha}} \qquad (X \text{ Hermitian}). \tag{2.2.90}$$
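A quick worked instance of (2.2.90), for a Hermitian operator of our own choosing with eigenvalues $3$ and $-4$:

```latex
\begin{align*}
X = \begin{pmatrix} 3 & 0 \\ 0 & -4 \end{pmatrix}: \qquad
\|X\|_1 = 3 + 4 = 7, \qquad
\|X\|_2 = \sqrt{3^2 + 4^2} = 5, \qquad
\|X\|_\infty = \lim_{\alpha\to\infty} \left(3^\alpha + 4^\alpha\right)^{\frac{1}{\alpha}} = 4 .
\end{align*}
```

The values decrease as $\alpha$ grows, and the limit picks out the largest singular value.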


Exercise 2.20
Let $X$ be a linear operator and $\alpha \in (0, \infty)$. Prove that

$$\left\|X^\dagger X\right\|_\alpha = \left\|X X^\dagger\right\|_\alpha = \|X\|_{2\alpha}^2. \tag{2.2.91}$$

We now state several important properties of the Schatten norm.

Proposition 2.7 Properties of Schatten Norm

For all 𝛼 ∈ [1, ∞], the Schatten norm ∥·∥ 𝛼 has the following properties.
1. Monotonicity: The Schatten norm is monotonically non-increasing with 𝛼:
for 𝛼 ≥ 𝛽 and a linear operator 𝑋, the following holds:
∥ 𝑋 ∥𝛼 ≤ ∥ 𝑋 ∥ 𝛽 . (2.2.92)
In particular, we then have that ∥ 𝑋 ∥ ∞ ≤ ∥ 𝑋 ∥ 𝛼 ≤ ∥ 𝑋 ∥ 1 for all 𝛼 ∈ [1, ∞]
and every linear operator 𝑋.
2. Isometric invariance: For all isometries $U$ and $V$,
$$\|X\|_\alpha = \left\|U X V^\dagger\right\|_\alpha. \tag{2.2.93}$$

3. Submultiplicativity: For all linear operators 𝑋, 𝑌 , and 𝑍,

∥ 𝑋𝑌 𝑍 ∥ 𝛼 ≤ ∥ 𝑋 ∥ ∞ ∥𝑌 ∥ 𝛼 ∥𝑍 ∥ ∞ . (2.2.94)
Consequently, for all linear operators 𝑋 and 𝑌 ,
∥ 𝑋𝑌 ∥ 𝛼 ≤ ∥ 𝑋 ∥ 𝛼 ∥𝑌 ∥ 𝛼 . (2.2.95)

4. Multiplicativity with respect to tensor product: For all linear operators 𝑋

and 𝑌 ,
∥ 𝑋 ⊗ 𝑌 ∥ 𝛼 = ∥ 𝑋 ∥ 𝛼 ∥𝑌 ∥ 𝛼 . (2.2.96)
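5. Direct-sum property: Given a collection $\{X_x\}_{x \in \mathcal{X}}$ of linear operators
indexed by a finite alphabet $\mathcal{X}$, the following equality holds for every
orthonormal basis $\{|x\rangle\}_{x \in \mathcal{X}}$:
$$\left\|\sum_{x \in \mathcal{X}} |x\rangle\langle x| \otimes X_x\right\|_\alpha^\alpha = \sum_{x \in \mathcal{X}} \|X_x\|_\alpha^\alpha. \tag{2.2.97}$$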


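6. Duality: For every linear operator $X$,

$$\|X\|_\alpha = \sup_{Y \neq 0} \left\{ \left|\operatorname{Tr}[Y^\dagger X]\right| : \|Y\|_\beta \leq 1,\ \frac{1}{\alpha} + \frac{1}{\beta} = 1 \right\}. \tag{2.2.98}$$

The equality above implies Hölder's inequality:

$$\left|\operatorname{Tr}[Z^\dagger X]\right| \leq \|X\|_\alpha \|Z\|_\beta, \tag{2.2.99}$$

which holds for all linear operators $X$ and $Z$, where $\alpha, \beta \in [1, \infty]$ satisfy
$\frac{1}{\alpha} + \frac{1}{\beta} = 1$. In this sense, the norms $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$, with $\frac{1}{\alpha} + \frac{1}{\beta} = 1$, are said
to be dual to each other.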

Proof:
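1. In the case that $X = 0$, the statement is trivial. So we focus on the case $X \neq 0$.
Let $\{s_k\}_{k=1}^{r}$ denote the singular values of $X$, where $r := \operatorname{rank}(X)$. If we can
show that $\frac{\mathrm{d}}{\mathrm{d}\alpha} \|X\|_\alpha \leq 0$ for all $\alpha \geq 1$, then it follows that $\|X\|_\alpha$ is monotone
non-increasing with $\alpha$. To this end, starting with (2.2.89), consider that

\begin{align}
\frac{\mathrm{d}}{\mathrm{d}\alpha} \|X\|_\alpha
&= \frac{\mathrm{d}}{\mathrm{d}\alpha} \left(\sum_{k=1}^{r} s_k^\alpha\right)^{\frac{1}{\alpha}}
 = \frac{\mathrm{d}}{\mathrm{d}\alpha}\, \mathrm{e}^{\frac{1}{\alpha} \ln \sum_{k=1}^{r} s_k^\alpha} \tag{2.2.100} \\
&= \mathrm{e}^{\frac{1}{\alpha} \ln \sum_{k=1}^{r} s_k^\alpha}\, \frac{\mathrm{d}}{\mathrm{d}\alpha} \left( \frac{1}{\alpha} \ln \sum_{k=1}^{r} s_k^\alpha \right) \tag{2.2.101} \\
&= \left(\sum_{k=1}^{r} s_k^\alpha\right)^{\frac{1}{\alpha}} \left( -\frac{1}{\alpha^2} \ln \sum_{k=1}^{r} s_k^\alpha + \frac{1}{\alpha} \frac{\mathrm{d}}{\mathrm{d}\alpha} \left[ \ln \sum_{k=1}^{r} s_k^\alpha \right] \right) \tag{2.2.102} \\
&= \left(\sum_{k=1}^{r} s_k^\alpha\right)^{\frac{1}{\alpha}} \left( -\frac{1}{\alpha^2} \ln \sum_{k=1}^{r} s_k^\alpha + \frac{1}{\alpha \sum_{k=1}^{r} s_k^\alpha} \frac{\mathrm{d}}{\mathrm{d}\alpha} \left[ \sum_{k=1}^{r} s_k^\alpha \right] \right) \tag{2.2.103} \\
&= \left(\sum_{k=1}^{r} s_k^\alpha\right)^{\frac{1}{\alpha} - 1} \left( -\frac{1}{\alpha^2} \left[ \sum_{k=1}^{r} s_k^\alpha \right] \ln \left[ \sum_{k=1}^{r} s_k^\alpha \right] + \frac{1}{\alpha} \frac{\mathrm{d}}{\mathrm{d}\alpha} \left[ \sum_{k=1}^{r} s_k^\alpha \right] \right) \tag{2.2.104} \\
&= \left(\sum_{k=1}^{r} s_k^\alpha\right)^{\frac{1}{\alpha} - 1} \left( \frac{\alpha \frac{\mathrm{d}}{\mathrm{d}\alpha} \sum_{k=1}^{r} s_k^\alpha - \left(\sum_{k=1}^{r} s_k^\alpha\right) \ln \left(\sum_{k=1}^{r} s_k^\alpha\right)}{\alpha^2} \right) \tag{2.2.105} \\
&= \left(\sum_{k=1}^{r} s_k^\alpha\right)^{\frac{1}{\alpha} - 1} \left( \frac{\alpha \sum_{k=1}^{r} s_k^\alpha \ln s_k - \left(\sum_{k=1}^{r} s_k^\alpha\right) \ln \left(\sum_{k=1}^{r} s_k^\alpha\right)}{\alpha^2} \right) \tag{2.2.106} \\
&= \left(\sum_{k=1}^{r} s_k^\alpha\right)^{\frac{1}{\alpha} - 1} \left( \frac{\sum_{k=1}^{r} s_k^\alpha \ln s_k^\alpha - \left(\sum_{k=1}^{r} s_k^\alpha\right) \ln \left(\sum_{k=1}^{r} s_k^\alpha\right)}{\alpha^2} \right). \tag{2.2.107}
\end{align}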
The term on the very left in the last line is non-negative and so is the denominator
with $\alpha^2$. The inequality
$$\sum_{k=1}^{r} s_k^\alpha \ln s_k^\alpha - \left(\sum_{k=1}^{r} s_k^\alpha\right) \ln \left(\sum_{k=1}^{r} s_k^\alpha\right) \leq 0 \tag{2.2.108}$$

then follows because it is equivalent to

$$\sum_{k=1}^{r} \frac{s_k^\alpha}{\sum_{k'=1}^{r} s_{k'}^\alpha} \ln s_k^\alpha \leq \ln \left(\sum_{k=1}^{r} s_k^\alpha\right). \tag{2.2.109}$$

This latter inequality follows by defining the probabilities

$$p_k := \frac{s_k^\alpha}{\sum_{k'=1}^{r} s_{k'}^\alpha} \tag{2.2.110}$$

so that
$$\sum_{k=1}^{r} \frac{s_k^\alpha}{\sum_{k'=1}^{r} s_{k'}^\alpha} \ln s_k^\alpha = \sum_{k=1}^{r} p_k \ln s_k^\alpha \leq \ln \left(\sum_{k=1}^{r} p_k s_k^\alpha\right) \leq \ln \left(\sum_{k=1}^{r} s_k^\alpha\right), \tag{2.2.111}$$

where we applied concavity of the logarithm function, as well as the monotonicity
of the logarithm and the fact that $p_k \leq 1$ for all $k$. So we conclude
that $\frac{\mathrm{d}}{\mathrm{d}\alpha} \|X\|_\alpha \leq 0$ for all $\alpha \geq 1$, which implies that $\|X\|_\alpha$ is monotone
non-increasing. In fact, observe that the proof given holds for $\alpha > 0$, which implies
that $\|X\|_\alpha$ is monotone non-increasing for all $\alpha > 0$.
2. Isometric invariance holds because the singular values of a linear operator 𝑋
do not change under the action 𝑋 ↦→ 𝑈 𝑋𝑉 † , for isometries 𝑈 and 𝑉.
3. For a proof of (2.2.94), see the Bibliographic Notes (Section 2.6). Submultiplicativity
in (2.2.95) follows immediately from (2.2.94) by taking $Z = \mathbb{1}$, using
the fact that $\|\mathbb{1}\|_\infty = 1$ (see Section 2.2.9.1 below), and using monotonicity,
which implies that $\|X\|_\infty \leq \|X\|_\alpha$.

4.-5. Multiplicativity with respect to the tensor product and the direct sum property
follow immediately from the definition of ∥·∥ 𝛼 .
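6. We provide a proof of (2.2.98) in the special case $\alpha = 1$, $\beta = \infty$ in Proposition 2.10
below. For all other values of $\alpha$ and $\beta$, please consult the Bibliographic
Notes (Section 2.6). Given (2.2.98), for all linear operators $X$ and $Z$ with $Z \neq 0$
(the case $Z = 0$ is trivial), let $Y = \frac{Z}{\|Z\|_\beta}$. Then, $\|Y\|_\beta \leq 1$, which means that

$$\frac{1}{\|Z\|_\beta} \left|\operatorname{Tr}[Z^\dagger X]\right| = \left|\operatorname{Tr}[Y^\dagger X]\right| \leq \|X\|_\alpha \quad \Rightarrow \quad \left|\operatorname{Tr}[Z^\dagger X]\right| \leq \|X\|_\alpha \|Z\|_\beta, \tag{2.2.112}$$
which is the inequality in (2.2.99). ■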

In addition to the variational characterization of the Schatten norm ∥·∥ 𝛼 given

in (2.2.98), we have the following variational characterization, which extends to
𝛼 ∈ (0, 1).

Proposition 2.8
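Let $\alpha \in (0, 1) \cup (1, \infty]$. Then, for every $\beta$ such that $\frac{1}{\alpha} + \frac{1}{\beta} = 1$, and every
positive semi-definite operator $X$,
$$\|X\|_\alpha = \begin{cases} \inf\{\operatorname{Tr}[X Y^{\frac{1}{\beta}}] : Y > 0,\ \operatorname{Tr}[Y] = 1\} & \text{if } \alpha \in (0, 1), \\ \sup\{\operatorname{Tr}[X Y^{\frac{1}{\beta}}] : Y \geq 0,\ \operatorname{Tr}[Y] = 1\} & \text{if } \alpha \in (1, \infty]. \end{cases} \tag{2.2.113}$$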

Proof: Please consult the Bibliographic Notes (Section 2.6). ■

2.2.9.1 Schatten ∞-Norm (Spectral/Operator Norm)

An important case of the Schatten norms is the Schatten ∞-norm, which we recall
from (2.2.88) is defined as

$$\|X\|_\infty := \lim_{\alpha \to \infty} \|X\|_\alpha. \tag{2.2.114}$$


Proposition 2.9 Schatten ∞-Norm is Largest Singular Value

For every linear operator 𝑋 ∈ L(H 𝐴 , H𝐵 ), ∥ 𝑋 ∥ ∞ is equal to the largest singular
value of 𝑋, which we denote by 𝑠max , i.e.,

∥ 𝑋 ∥ ∞ = 𝑠max . (2.2.115)

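Proof: Let $\vec{s} := (s_k)_{k=1}^{r}$ denote the vector of singular values of $X$, where $r := \operatorname{rank}(X)$.
Then for all $\alpha \geq 1$ and $k \in \{1, \ldots, r\}$, the inequality $s_k^\alpha \leq \sum_{k'=1}^{r} s_{k'}^\alpha$ holds,
which implies that $s_k \leq \|\vec{s}\|_\alpha$, where $\|\vec{s}\|_\alpha := \left(\sum_{k=1}^{r} s_k^\alpha\right)^{\frac{1}{\alpha}}$. Note that $\|\vec{s}\|_\alpha = \|X\|_\alpha$.
So we conclude that $s_{\max} \leq \|X\|_\alpha$ for all $\alpha \geq 1$. Using the definition in (2.2.114),
we obtain
$$s_{\max} \leq \|X\|_\infty. \tag{2.2.116}$$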

We now prove the opposite inequality. Consider for 𝛼 > 𝛽 > 1 that

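\begin{align}
\|X\|_\alpha = \|\vec{s}\|_\alpha
&= \left(\sum_{k=1}^{r} s_k^{\alpha - \beta} s_k^\beta\right)^{\frac{1}{\alpha}}
 \leq \left(s_{\max}^{\alpha - \beta} \sum_{k=1}^{r} s_k^\beta\right)^{\frac{1}{\alpha}} \tag{2.2.117} \\
&= s_{\max}^{1 - \frac{\beta}{\alpha}} \left(\sum_{k=1}^{r} s_k^\beta\right)^{\frac{1}{\alpha}}
 = s_{\max}^{1 - \frac{\beta}{\alpha}}\, \|\vec{s}\|_\beta^{\frac{\beta}{\alpha}}
 = s_{\max}^{1 - \frac{\beta}{\alpha}}\, \|X\|_\beta^{\frac{\beta}{\alpha}}. \tag{2.2.118}
\end{align}

We thus have
$$\|X\|_\alpha \leq s_{\max}^{1 - \frac{\beta}{\alpha}}\, \|X\|_\beta^{\frac{\beta}{\alpha}}. \tag{2.2.119}$$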
For every fixed 𝛽, we find that the limit 𝛼 → ∞ of the right-hand side of the
above inequality is equal to 𝑠max . Therefore, ∥ 𝑋 ∥ ∞ ≤ 𝑠max , which concludes the
proof. ■
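The convergence $\|X\|_\alpha \to s_{\max}$ and the monotonicity in $\alpha$ can be observed numerically. The sketch below evaluates the singular-value formula (2.2.89) using only the Python standard library; the function name `schatten_norm` and the list of singular values are our own hypothetical choices, and $s_{\max}$ is factored out of the sum purely to avoid overflow for large $\alpha$.

```python
def schatten_norm(singular_values, alpha):
    """Evaluate (sum_k s_k^alpha)^(1/alpha) from a list of singular values."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    positive = [s for s in singular_values if s > 0]
    if not positive:
        return 0.0  # the zero operator
    s_max = max(positive)
    # Factor out s_max: (sum s_k^a)^(1/a) = s_max * (sum (s_k/s_max)^a)^(1/a)
    total = sum((s / s_max) ** alpha for s in positive)
    return s_max * total ** (1.0 / alpha)


# Hypothetical singular values of some operator X (largest one is 3).
singular_values = [3.0, 1.0, 0.5]
alphas = [0.5, 1, 2, 4, 16, 64, 256]
norms = [schatten_norm(singular_values, a) for a in alphas]

for a, n in zip(alphas, norms):
    print(f"alpha = {a:>5}: {n:.6f}")
```

The printed values decrease towards the largest singular value as $\alpha$ grows, in line with Propositions 2.7 and 2.9; the entry for $\alpha = 0.5$ is included only to illustrate that the monotonicity argument also covers $\alpha \in (0, 1)$.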

Due to Proposition 2.9, the term spectral norm is often used to refer to the
Schatten ∞-norm. It is also referred to as the operator norm, because it is the
norm induced by the Euclidean norm on the underlying Hilbert space on which the
operator 𝑋 acts, i.e.,
$$\|X\|_\infty = \sup_{|\psi\rangle \neq 0} \frac{\|X|\psi\rangle\|_2}{\||\psi\rangle\|_2} = \sup_{|\psi\rangle : \||\psi\rangle\|_2 = 1} \|X|\psi\rangle\|_2. \tag{2.2.120}$$


In the equation above, we have employed the shorthand sup, which stands for
supremum. We also often employ inf for infimum. These concepts are reviewed in
Section 2.3.1.

Exercise 2.21
Using the fact that $X$ has a singular value decomposition of the form
$X = \sum_{k=1}^{\operatorname{rank}(X)} s_k |e_k\rangle\langle f_k|$ (see Theorem 2.1), prove (2.2.120). Similarly, prove that

∥ 𝑋 ∥ ∞ = sup{|⟨𝜓|𝑋 |𝜙⟩| : ∥|𝜓⟩∥ 2 = ∥|𝜙⟩∥ 2 = 1}. (2.2.121)

If 𝑋 is Hermitian and positive semi-definite, then ∥ 𝑋 ∥ ∞ is equal to the largest

eigenvalue of 𝑋, and we can write

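\begin{align}
\|X\|_\infty &= \sup_{|\psi\rangle : \||\psi\rangle\|_2 = 1} \langle\psi|X|\psi\rangle \tag{2.2.122} \\
&= \sup_{\rho \geq 0} \{\operatorname{Tr}[X\rho] : \operatorname{Tr}[\rho] = 1\} \qquad (X \text{ positive semi-definite}). \tag{2.2.123}
\end{align}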

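More generally, if $X$ is Hermitian and $\{\lambda_k\}_{k=1}^{\operatorname{rank}(X)}$ is the set of its non-zero eigenvalues, then

\begin{align}
\|X\|_\infty &= \sup_{|\psi\rangle : \||\psi\rangle\|_2 = 1} |\langle\psi|X|\psi\rangle| \tag{2.2.124} \\
&= \max_{1 \leq k \leq \operatorname{rank}(X)} |\lambda_k| \qquad (X \text{ Hermitian}). \tag{2.2.125}
\end{align}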

Exercise 2.22
Let 𝑈 be a unitary operator. Prove that ∥𝑈 ∥ ∞ = 1. More generally, prove that
∥𝑉 ∥ ∞ = 1 for every isometry 𝑉.

2.2.9.2 Schatten 1-Norm (Trace Norm)

Another important special case of the Schatten 𝛼-norm is 𝛼 = 1. In this case, we

refer to it as the trace norm, and by applying (2.2.89), it is equal to the sum of the
singular values of 𝑋:
$$\|X\|_1 = \sum_{k=1}^{\operatorname{rank}(X)} s_k. \tag{2.2.126}$$


If 𝑋 is Hermitian and positive semi-definite, then ∥ 𝑋 ∥ 1 is equal to the sum of the

eigenvalues of 𝑋, i.e., to the trace of 𝑋:
∥ 𝑋 ∥ 1 = Tr[𝑋] (X positive semi-definite). (2.2.127)
More generally, if $X$ is Hermitian and $\{\lambda_k\}_{k=1}^{\operatorname{rank}(X)}$ is the set of its non-zero eigenvalues, then
$$\|X\|_1 = \sum_{k=1}^{\operatorname{rank}(X)} |\lambda_k| \qquad (X \text{ Hermitian}). \tag{2.2.128}$$

Exercise 2.23
Consider two vectors |𝜓⟩, |𝜙⟩ ∈ C𝑑 , with 𝑑 ≥ 2. Show that

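$$\left\| |\psi\rangle\langle\psi| - |\phi\rangle\langle\phi| \right\|_1^2 = \left(\langle\psi|\psi\rangle + \langle\phi|\phi\rangle\right)^2 - 4\, |\langle\psi|\phi\rangle|^2. \tag{2.2.129}$$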

We now provide a proof of the variational characterization of the Schatten norm

in (2.2.98) for the special case of 𝛼 = 1 and 𝛽 = ∞.

Proposition 2.10 Variational Characterization of Trace Norm

For all 𝑋 ∈ L(H 𝐴 , H𝐵 ), the trace norm of 𝑋 has the following variational
characterization:
$$\|X\|_1 = \sup_{Y \neq 0 : \|Y\|_\infty \leq 1} \left|\operatorname{Tr}[Y^\dagger X]\right|, \tag{2.2.130}$$

where the optimization is with respect to all non-zero 𝑌 ∈ L(H 𝐴 , H𝐵 ) with

spectral norm bounded from above by one.

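Proof: Let $X = \sum_{k=1}^{r} s_k |e_k\rangle_B \langle f_k|_A$ be a singular value decomposition of $X$,
where $r := \operatorname{rank}(X)$. Let $Y \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$ be such that $\|Y\|_\infty \leq 1$. Then,
\begin{align}
\left|\operatorname{Tr}[Y^\dagger X]\right| &= \left|\operatorname{Tr}\!\left[ Y^\dagger \left( \sum_{k=1}^{r} s_k |e_k\rangle_B \langle f_k|_A \right) \right]\right| \tag{2.2.131} \\
&= \left| \sum_{k=1}^{r} s_k\, \overline{\langle e_k|_B\, Y\, |f_k\rangle_A} \right| \tag{2.2.132} \\
&\leq \sum_{k=1}^{r} s_k \left| \langle e_k|_B\, Y\, |f_k\rangle_A \right|, \tag{2.2.133}
\end{align}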


where the last line is due to the triangle inequality. Now, using (2.2.121), we have

|⟨𝑒 𝑘 | 𝐵𝑌 | 𝑓 𝑘 ⟩ 𝐴 | ≤ ∥𝑌 ∥ ∞ ≤ 1, (2.2.134)

for every 𝑘 ∈ {1, 2, . . . , 𝑟 }. Therefore,

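$$\left|\operatorname{Tr}[Y^\dagger X]\right| \leq \sum_{k=1}^{r} s_k = \|X\|_1, \tag{2.2.135}$$

which holds for every non-zero $Y \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$ satisfying $\|Y\|_\infty \leq 1$, so that the
inequality
$$\sup_{Y \neq 0 : \|Y\|_\infty \leq 1} \left|\operatorname{Tr}[Y^\dagger X]\right| \leq \|X\|_1 \tag{2.2.136}$$

holds. The opposite inequality holds by making a particular choice for $Y$. We pick
$Y$ to be the following linear operator defined from the singular value decomposition
of $X$: $Y = \sum_{k=1}^{r} |e_k\rangle_B \langle f_k|_A$. Observe that $\|Y\|_\infty = 1$. Thus,
\begin{align}
\sup_{Y \neq 0 : \|Y\|_\infty \leq 1} \left|\operatorname{Tr}[Y^\dagger X]\right|
&\geq \left| \operatorname{Tr}\!\left[ \left( \sum_{k'=1}^{r} |f_{k'}\rangle_A \langle e_{k'}|_B \right) \left( \sum_{k=1}^{r} s_k |e_k\rangle_B \langle f_k|_A \right) \right] \right| \tag{2.2.137} \\
&= \sum_{k=1}^{r} s_k \tag{2.2.138} \\
&= \|X\|_1. \tag{2.2.139}
\end{align}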

This completes the proof. ■

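Remark: Observe that Proposition 2.10 can be generalized as follows for every linear operator
$X_{A \to B} \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$:
$$\|X\|_1 = \sup_{Y \neq 0 : \|Y\|_\infty \leq 1} \operatorname{Re}\!\left(\operatorname{Tr}[Y^\dagger X]\right), \tag{2.2.140}$$

where, as before, the optimization is with respect to every non-zero operator $Y \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$
with spectral norm bounded from above by one. Indeed, for every complex number $z \in \mathbb{C}$, the
inequality $\operatorname{Re}(z) \leq |\operatorname{Re}(z)| \leq |z|$ holds, which means that

$$\sup_{Y \neq 0 : \|Y\|_\infty \leq 1} \operatorname{Re}\!\left(\operatorname{Tr}[Y^\dagger X]\right) \leq \sup_{Y \neq 0 : \|Y\|_\infty \leq 1} \left|\operatorname{Tr}[Y^\dagger X]\right| = \|X\|_1. \tag{2.2.141}$$

Then, to obtain the opposite inequality, the same choice for $Y$ as in the proof of Proposition 2.10
can be made, because for that choice of $Y$ we have $\operatorname{Tr}[Y^\dagger X] = \|X\|_1$, which is real, so that
$\operatorname{Re}(\operatorname{Tr}[Y^\dagger X]) = \|X\|_1$. We can thus conclude (2.2.140).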


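We also remark that in both (2.2.130) and (2.2.140), it suffices to optimize with respect to
isometries. In particular, because $\|U\|_\infty = 1$ for every isometry $U$ (see Exercise 2.22), using
similar techniques as in the proof of Proposition 2.10, it is straightforward to prove that for all
$X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$,

\begin{align}
\|X\|_1 &= \sup_{\substack{U_{B \to A} \\ \text{isometry}}} \left|\operatorname{Tr}[U_{B \to A} X_{A \to B}]\right| = \sup_{\substack{U_{B \to A} \\ \text{isometry}}} \operatorname{Re}\!\left(\operatorname{Tr}[U_{B \to A} X_{A \to B}]\right), \qquad d_A \geq d_B, \tag{2.2.142} \\
\|X\|_1 &= \sup_{\substack{V_{A \to B} \\ \text{isometry}}} \left|\operatorname{Tr}[V_{A \to B} (X_{A \to B})^\dagger]\right| = \sup_{\substack{V_{A \to B} \\ \text{isometry}}} \operatorname{Re}\!\left(\operatorname{Tr}[V_{A \to B} (X_{A \to B})^\dagger]\right), \qquad d_A \leq d_B. \tag{2.2.143}
\end{align}

In particular, if $d_A = d_B = d$, then the optimization in (2.2.142) and (2.2.143) is with respect to
unitary operators, so that for all $X \in \mathrm{L}(\mathbb{C}^d)$,

$$\|X\|_1 = \sup_{\substack{U \in \mathrm{L}(\mathbb{C}^d) \\ \text{unitary}}} \left|\operatorname{Tr}[UX]\right| = \sup_{\substack{U \in \mathrm{L}(\mathbb{C}^d) \\ \text{unitary}}} \operatorname{Re}\!\left(\operatorname{Tr}[UX]\right). \tag{2.2.144}$$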

The monotonicity result in Proposition 2.7 implies that

$$\|X\|_\infty \leq \|X\|_1 \tag{2.2.145}$$
for every linear operator $X$. The following lemma gives a tighter bound than
the one in (2.2.145) for the special case when $X$ is a traceless Hermitian operator.

Lemma 2.11
Let 𝑋 be a Hermitian operator satisfying Tr[𝑋] = 0. Then,
$$\|X\|_\infty \leq \frac{1}{2} \|X\|_1. \tag{2.2.146}$$

Proof: Let the Jordan–Hahn decomposition of 𝑋 be given by

𝑋 = 𝑋+ − 𝑋− , (2.2.147)
where 𝑋+ , 𝑋− ≥ 0 and 𝑋+ 𝑋− = 0. Then,
∥ 𝑋 ∥ 1 = Tr[𝑋+ ] + Tr[𝑋− ]. (2.2.148)
Since Tr[𝑋] = 0, we have that Tr[𝑋+ ] = Tr[𝑋− ], which means that
∥ 𝑋 ∥ 1 = 2Tr[𝑋+ ]. (2.2.149)

We also have that

∥ 𝑋 ∥ ∞ = max{∥ 𝑋+ ∥ ∞ , ∥ 𝑋− ∥ ∞ }, (2.2.150)
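because $X_+ X_- = 0$. Then,
$$\|X\|_\infty = \max\{\|X_+\|_\infty, \|X_-\|_\infty\} \leq \operatorname{Tr}[X_+] = \frac{1}{2} \|X\|_1, \tag{2.2.151}$$
with the inequality following from (2.2.145), applied to $X_+$ and $X_-$, and the fact that
$\|X_+\|_1 = \operatorname{Tr}[X_+] = \operatorname{Tr}[X_-] = \|X_-\|_1$. ■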

We remark that the monotonicity inequality $\|X\|_\infty \leq \|X\|_1$ in (2.2.145) can be
reversed to give
$$\|X\|_1 \leq d\, \|X\|_\infty \tag{2.2.152}$$
for every linear operator $X$ acting on a $d$-dimensional Hilbert space. This follows
because
$$\|X\|_1 = \sum_{k=1}^{r} s_k \leq \sum_{k=1}^{r} \max_{k' \in \{1, \ldots, r\}} s_{k'} = r \max_{k \in \{1, \ldots, r\}} s_k = r\, \|X\|_\infty \leq d\, \|X\|_\infty, \tag{2.2.153}$$

where $\{s_k\}_{k=1}^{r}$ is the set of singular values of $X$ and $r := \operatorname{rank}(X)$.
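The two bounds relating $\|\cdot\|_\infty$ and $\|\cdot\|_1$ can be checked directly on the eigenvalues of a Hermitian operator, using (2.2.125) and (2.2.128). The snippet below is a minimal sketch with a hypothetical traceless spectrum of our own choosing; the helper names are not from the text.

```python
def trace_norm(eigenvalues):
    """Schatten 1-norm of a Hermitian operator: sum of |eigenvalue|."""
    return sum(abs(x) for x in eigenvalues)


def spectral_norm(eigenvalues):
    """Schatten infinity-norm of a Hermitian operator: largest |eigenvalue|."""
    return max(abs(x) for x in eigenvalues)


# Hypothetical spectrum of a traceless Hermitian operator on a 4-dimensional space.
eigs = [2, 1, -1, -2]
d = len(eigs)

print("trace of X       :", sum(eigs))
print("trace norm       :", trace_norm(eigs))
print("spectral norm    :", spectral_norm(eigs))
print("Lemma 2.11 holds :", spectral_norm(eigs) <= trace_norm(eigs) / 2)
print("(2.2.152) holds  :", trace_norm(eigs) <= d * spectral_norm(eigs))
```

For this spectrum the traceless bound is strict; it is saturated whenever the positive part or the negative part has rank one.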

Using Proposition 2.10, we can establish the following slight strengthening of
the Hölder inequality in (2.2.99):
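$$\left\|Z^\dagger X\right\|_1 \leq \|X\|_\alpha \|Z\|_\beta, \tag{2.2.154}$$
which holds for all linear operators $X$ and $Z$ and $\alpha, \beta \in [1, \infty]$ satisfying $\frac{1}{\alpha} + \frac{1}{\beta} = 1$.
This actually follows by a direct application of the Hölder inequality itself:
$$\left|\operatorname{Tr}[U Z^\dagger X]\right| \leq \|X\|_\alpha \left\|Z U^\dagger\right\|_\beta = \|X\|_\alpha \|Z\|_\beta, \tag{2.2.155}$$
which holds for every isometry $U$. Therefore, it follows from (2.2.130), with the optimization
restricted to isometries $U$ as noted in the remark above, that
$$\left\|Z^\dagger X\right\|_1 = \sup_{U} \left|\operatorname{Tr}[U Z^\dagger X]\right| \leq \|X\|_\alpha \|Z\|_\beta. \tag{2.2.156}$$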

2.2.10 Operator Inequalities

Throughout this book, we make use of the Löwner partial order for Hermitian
operators. It is useful as a way of comparing two Hermitian operators in L(H),
generalizing the way in which we compare two real numbers.


Definition 2.12 Löwner Partial Order for Hermitian Operators

For two Hermitian operators 𝑋 and 𝑌 , the expression 𝑋 ≥ 𝑌 is an operator
inequality and means that 𝑋 − 𝑌 ≥ 0, i.e., that 𝑋 − 𝑌 is positive semi-definite.
We also write 𝑋 ≤ 𝑌 to mean 𝑌 − 𝑋 ≥ 0. The expressions 𝑋 > 𝑌 and 𝑌 < 𝑋
mean that 𝑋 − 𝑌 is positive definite.

The relations “≥” and “≤” satisfy the following expected properties: 𝑋 ≤ 𝑌
and 𝑋 ≥ 𝑌 imply that 𝑋 = 𝑌 , and 𝑋 ≤ 𝑌 and 𝑌 ≤ 𝑍 imply that 𝑋 ≤ 𝑍. The term
“partial order” is used because not every pair (𝑋, 𝑌 ) of Hermitian operators satisfies
either 𝑋 ≥ 𝑌 or 𝑋 ≤ 𝑌 .
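A minimal example of an incomparable pair (our own illustration) is given by two orthogonal rank-one projections on $\mathbb{C}^2$:

```latex
\begin{align*}
X = |0\rangle\langle 0| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad
Y = |1\rangle\langle 1| = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
X - Y = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\end{align*}
% X - Y has eigenvalues +1 and -1, so neither X - Y \geq 0 nor Y - X \geq 0 holds.
```

Hence neither $X \geq Y$ nor $X \leq Y$ holds, even though both operators are positive semi-definite.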

Definition 2.13 Operator Convex, Concave, Monotone Functions

Let 𝑋 and 𝑌 be Hermitian operators, and let 𝑓 : R → R be a function extended
to Hermitian operators as in (2.2.69).
1. The function 𝑓 is called operator convex if for all 𝜆 ∈ [0, 1] and Hermitian
operators 𝑋 and 𝑌 , the following inequality holds:

𝑓 (𝜆𝑋 + (1 − 𝜆)𝑌 ) ≤ 𝜆 𝑓 (𝑋) + (1 − 𝜆) 𝑓 (𝑌 ). (2.2.157)

We call 𝑓 operator concave if − 𝑓 is operator convex.

2. The function 𝑓 is called operator monotone if 𝑋 ≤ 𝑌 implies 𝑓 (𝑋) ≤ 𝑓 (𝑌 )
for all Hermitian operators 𝑋 and 𝑌 . We call 𝑓 operator anti-monotone if
− 𝑓 is operator monotone.

The functions considered in Section 2.2.8.1 have the following properties with
respect to Definition 2.13:
• The function $x \mapsto x^\alpha$ is operator monotone for $\alpha \in [0, 1]$ and $x \in [0, \infty)$,
operator anti-monotone for $\alpha \in [-1, 0)$ and $x \in (0, \infty)$, operator convex for
$\alpha \in [-1, 0)$ and $x \in (0, \infty)$, operator convex for $\alpha \in [1, 2]$ and $x \in [0, \infty)$, and
operator concave for $\alpha \in (0, 1]$ and $x \in [0, \infty)$. Note that the function $x \mapsto x^\alpha$
is neither operator monotone, operator convex, nor operator concave for $\alpha < -1$
or $\alpha > 2$.
• The function $x \mapsto \log_b(x)$, for every base $b > 1$ and $x \in (0, \infty)$, is operator
monotone and operator concave.

• The function $x \mapsto x \log_b(x)$, for every base $b > 1$ and $x \in [0, \infty)$, is operator
convex³.
For proofs of these properties, please see the Bibliographic Notes (Section 2.6).
We note here that these properties are critical for understanding quantum entropies,
as detailed in Chapter 7. Especially, the data-processing inequality for quantum
relative entropy, which is at the heart of understanding quantum communication
limits, is intimately related to operator convexity.
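Operator monotonicity is strictly stronger than ordinary monotonicity. According to the list above, $x \mapsto x^\alpha$ is operator monotone only for $\alpha \in [0, 1]$, so $x \mapsto x^2$ should fail, even though it is increasing on $[0, \infty)$. The sketch below checks a standard $2 \times 2$ counterexample with exact integer arithmetic; the matrices and helper names are our own choices, and the positivity test used is valid only for real symmetric $2 \times 2$ matrices.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def sub(a, b):
    """Entrywise difference a - b of two 2x2 matrices."""
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def is_psd_2x2(m):
    """A real symmetric 2x2 matrix is PSD iff its trace and determinant are >= 0."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return trace >= 0 and det >= 0


A = [[1, 1], [1, 1]]   # positive semi-definite (eigenvalues 0 and 2)
B = [[2, 1], [1, 1]]   # B - A = |0><0| is positive semi-definite, so A <= B

diff_squares = sub(matmul(B, B), matmul(A, A))

print("A <= B     :", is_psd_2x2(sub(B, A)))
print("B^2 - A^2  :", diff_squares)
print("A^2 <= B^2 :", is_psd_2x2(diff_squares))
```

Here $B^2 - A^2$ has determinant $-1$, so it has a negative eigenvalue and $A^2 \leq B^2$ fails although $0 \leq A \leq B$.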
We now state some basic operator inequalities that we use repeatedly throughout
the book.

Lemma 2.14 Basic Operator Inequalities

Let 𝑋, 𝑌 ∈ L(H) be Hermitian operators acting on a Hilbert space H.
1. 𝑋 ≥ 0 ⇒ 𝑍 𝑋 𝑍 † ≥ 0 for all 𝑍 ∈ L(H, H′). In particular, 𝑋 ≥ 𝑌 ⇒
𝑍 𝑋 𝑍 † ≥ 𝑍𝑌 𝑍 † for all 𝑍 ∈ L(H, H′).
2. 𝑋 ≥ 𝑌 ⇒ Tr[𝑋] ≥ Tr[𝑌 ]. More generally, 𝑋 ≥ 𝑌 ⇒ Tr[𝑊 𝑋] ≥ Tr[𝑊𝑌 ]
for all 𝑊 ∈ L(H) satisfying 𝑊 ≥ 0.
3. For every Hermitian operator $X$ with maximum and minimum eigenvalues
$\lambda_{\max}$ and $\lambda_{\min}$, respectively, $\lambda_{\min} \mathbb{1} \leq X \leq \lambda_{\max} \mathbb{1}$.
4. Let 𝑋 and 𝑌 have their spectrum in some interval 𝐼 ⊂ R, and let 𝑓 : 𝐼 → R
be a monotone increasing function. If 𝑋 ≤ 𝑌 , then Tr[ 𝑓 (𝑋)] ≤ Tr[ 𝑓 (𝑌 )].
In particular, if 𝑋 and 𝑌 are positive semi-definite, then

0 ≤ 𝑋 ≤ 𝑌 ⇒ Tr[𝑋 𝛼 ] ≤ Tr[𝑌 𝛼 ] ∀ 𝛼 > 0. (2.2.158)

Proof:
1. 𝑋 ≥ 0 implies that ⟨𝜓|𝑋 |𝜓⟩ ≥ 0 for all |𝜓⟩ ∈ H. Then, for every vector
|𝜙⟩ ∈ H′, we have ⟨𝜙|𝑍 𝑋 𝑍 † |𝜙⟩ ≥ 0 because 𝑍 † |𝜙⟩ ≡ |𝜓⟩ is some vector in H.
Therefore, 𝑍 𝑋 𝑍 † ≥ 0.
Now, $X \geq Y$ is equivalent to $X - Y \geq 0$. Let $W = X - Y$. Then, from the
arguments in the previous paragraph, we have $Z W Z^\dagger \geq 0$ for all $Z$, which
implies that $Z X Z^\dagger - Z Y Z^\dagger \geq 0$, i.e., $Z X Z^\dagger \geq Z Y Z^\dagger$, as required.

Footnote 3: Note that, because $\lim_{x \to 0} x \log_b(x) = 0$, we take the convention that $0 \log_b(0) = 0$ throughout
this book.

2. 𝑋 ≥ 𝑌 implies that 𝑋 − 𝑌 ≥ 0. The trace of a positive semi-definite operator
is non-negative, since positive semi-definite operators have non-negative
eigenvalues and the trace of every normal operator is equal to the sum of its
eigenvalues. Thus, 𝑋 − 𝑌 ≥ 0 implies Tr[𝑋 − 𝑌 ] ≥ 0, which implies that
Tr[𝑋] ≥ Tr[𝑌 ], as required.
Next, let $W$ be a positive semi-definite operator. Using 1. above, $X \geq Y$ implies
that $\sqrt{W} X \sqrt{W} \geq \sqrt{W} Y \sqrt{W}$. Then, using the result of the previous paragraph,
we obtain $\operatorname{Tr}[\sqrt{W} X \sqrt{W}] \geq \operatorname{Tr}[\sqrt{W} Y \sqrt{W}]$. Finally, by cyclicity of the trace
(recall (2.2.21)), we find that $\operatorname{Tr}[WX] \geq \operatorname{Tr}[WY]$, as required.
3. This result follows from the fact that, for every Hermitian operator $X \in \mathrm{L}(\mathcal{H})$
with eigenvalues $\{\lambda_k\}_{k=1}^{\dim(\mathcal{H})}$, the eigenvalues of $X + t\mathbb{1}$ are equal to $\{\lambda_k + t\}_{k=1}^{\dim(\mathcal{H})}$
for every $t \in \mathbb{R}$. In particular, then, by definition of the minimum eigenvalue,
$X - \lambda_{\min} \mathbb{1} \geq 0$, because all of the eigenvalues of $X - \lambda_{\min} \mathbb{1}$ are non-negative.
Similarly, by definition of the maximum eigenvalue, $X - \lambda_{\max} \mathbb{1} \leq 0$, because
all of the eigenvalues of $X - \lambda_{\max} \mathbb{1}$ are non-positive.

4. Let $\lambda_i^\downarrow(X)$ denote the sequence of decreasingly ordered eigenvalues of $X$.
Then the inequalities $\lambda_i^\downarrow(X) \leq \lambda_i^\downarrow(Y)$ hold for all $i \in \{1, \ldots, \dim(\mathcal{H})\}$. These
inequalities are a consequence of the Courant–Fischer–Weyl minimax principle
(please consult the Bibliographic Notes in Section 2.6 for a reference to
this principle). Then, the desired inequality follows directly from the fact
that $\operatorname{Tr}[f(X)] = \sum_{i=1}^{\dim(\mathcal{H})} f(\lambda_i^\downarrow(X))$, as well as the monotonicity of $f$. The
inequality in (2.2.158) follows because the function $x^\alpha$ with domain $x \geq 0$ is
monotone for all $\alpha > 0$. ■

Lemma 2.15 Araki–Lieb–Thirring Inequality

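Let $X$ and $Y$ be positive semi-definite operators acting on a finite-dimensional
Hilbert space. For all $q \geq 0$:
1. $\operatorname{Tr}\!\left[\left(Y^{\frac{1}{2}} X Y^{\frac{1}{2}}\right)^{rq}\right] \geq \operatorname{Tr}\!\left[\left(Y^{\frac{r}{2}} X^r Y^{\frac{r}{2}}\right)^{q}\right]$ for all $0 \leq r < 1$.

2. $\operatorname{Tr}\!\left[\left(Y^{\frac{1}{2}} X Y^{\frac{1}{2}}\right)^{rq}\right] \leq \operatorname{Tr}\!\left[\left(Y^{\frac{r}{2}} X^r Y^{\frac{r}{2}}\right)^{q}\right]$ for all $r \geq 1$.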


Proof: Please consult the Bibliographic Notes (Section 2.6). ■

The operator Jensen inequality below is the linchpin of several quantum data-
processing inequalities presented later on in Chapter 7. These in turn are repeatedly
used in Parts II and III to place fundamental limits on quantum communication
protocols. As such, the operator Jensen inequality is a significant bridge that
connects convexity to information processing.

Theorem 2.16 Operator Jensen Inequality

Let 𝑓 : R → R be a continuous function with dom( 𝑓 ) = 𝐼 ⊂ R (where 𝐼 is an
interval). Then, the following are equivalent:
1. 𝑓 is operator convex.
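2. For all $n \in \mathbb{N}$, the inequality
$$f\!\left(\sum_{k=1}^{n} A_k^\dagger X_k A_k\right) \leq \sum_{k=1}^{n} A_k^\dagger f(X_k) A_k \tag{2.2.159}$$

holds for every collection $\{X_k\}_{k=1}^{n}$ of Hermitian operators acting on
a Hilbert space $\mathcal{H}$ with spectrum contained in $I$ and every collection
$\{A_k\}_{k=1}^{n}$ of linear operators in $\mathrm{L}(\mathcal{H}', \mathcal{H})$ satisfying $\sum_{k=1}^{n} A_k^\dagger A_k = \mathbb{1}_{\mathcal{H}'}$.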
Í

3. For every Hermitian operator 𝑋 ∈ L(H) with spectrum in 𝐼 and every

isometry 𝑉 ∈ L(H′, H), the following inequality holds:

𝑓 (𝑉 † 𝑋𝑉) ≤ 𝑉 † 𝑓 (𝑋)𝑉 . (2.2.160)

Proof: We first prove that 2. ⇒ 1. Let $X$ and $Y$ be Hermitian operators with
their eigenvalues in $I$. Let $\lambda \in [0, 1]$. We can take $n = 2$, $A_1 = \sqrt{\lambda}\, \mathbb{1}$, $X_1 = X$,
$A_2 = \sqrt{1 - \lambda}\, \mathbb{1}$, $X_2 = Y$, and the following operator inequality is an immediate
consequence of (2.2.159):

𝑓 (𝜆𝑋 + (1 − 𝜆) 𝑌 ) ≤ 𝜆 𝑓 (𝑋) + (1 − 𝜆) 𝑓 (𝑌 ). (2.2.161)

Since 𝑋, 𝑌 , and 𝜆 are arbitrary, it follows that 𝑓 is operator convex.

Next, 2. ⇒ 3. holds because 3. is actually a special case of 2., found by setting $n = 1$ and taking
$A_1 = V$ and $X_1 = X$, with $V$ an isometry and $X$ Hermitian with eigenvalues in $I$.

Now we prove that 3. ⇒ 2. Fix $n \in \mathbb{N}$ and the sets $\{A_k\}_{k=1}^{n}$ and $\{X_k\}_{k=1}^{n}$ of
operators such that they satisfy the conditions specified in 2. Define the following
Hermitian operator:
$$X := \sum_{k=1}^{n} X_k \otimes |k\rangle\langle k|, \tag{2.2.162}$$
as well as the isometry
$$V := \sum_{k=1}^{n} A_k \otimes |k\rangle, \tag{2.2.163}$$

where $\{|k\rangle\}_{k=1}^{n}$ is an orthonormal basis. The condition $\sum_{k=1}^{n} A_k^\dagger A_k = \mathbb{1}$ and a
calculation imply that $V$ is an isometry (satisfying $V^\dagger V = \mathbb{1}$). Another calculation
implies that
$$V^\dagger X V = \sum_{k=1}^{n} A_k^\dagger X_k A_k. \tag{2.2.164}$$
Since
$$f(X) = f\!\left(\sum_{k=1}^{n} X_k \otimes |k\rangle\langle k|\right) = \sum_{k=1}^{n} f(X_k) \otimes |k\rangle\langle k|, \tag{2.2.165}$$
which follows as a consequence of (2.2.69), a similar calculation implies that
$$V^\dagger f(X) V = \sum_{k=1}^{n} A_k^\dagger f(X_k) A_k. \tag{2.2.166}$$

Then the desired inequality in (2.2.159) follows from (2.2.164), (2.2.166), and
(2.2.160).
We finally prove that 1. ⇒ 3. Fix the operator 𝑋 and isometry 𝑉, as specified in 3.
Let 𝑀 be a Hermitian operator in L(H′) with spectrum in 𝐼. Let 𝑃 B 1H − 𝑉𝑉 † ,
and observe that 𝑃 is a projection (i.e., 𝑃2 = 𝑃), 𝑉 † 𝑃 = 0, and 𝑃𝑉 = 0. Set

𝑋 0 𝑉 𝑃 𝑉 −𝑃
𝑍B , 𝑈B , 𝑊B . (2.2.167)
0 𝑀 0 −𝑉 † 0 𝑉†

Observe that 𝑈 and 𝑊 are unitary operators (these are called unitary dilations of
the isometry 𝑉). By direct calculation, we then find that
† †𝑋𝑃

𝑉 𝑋𝑉 𝑉
𝑈 † 𝑍𝑈 = , (2.2.168)
𝑃𝑋𝑉 𝑃𝑋 𝑃 + 𝑉 𝑀𝑉 †
47
Chapter 2: Mathematical Tools

† 𝑋𝑉 †𝑋𝑃

𝑉 −𝑉
𝑊 † 𝑍𝑊 = , (2.2.169)
−𝑃𝑋𝑉 𝑃𝑋 𝑃 + 𝑉 𝑀𝑉 †
so that
1 † 𝑉 † 𝑋𝑉 0

†
𝑈 𝑍𝑈 + 𝑊 𝑍𝑊 = . (2.2.170)
2 0 𝑃𝑋 𝑃 + 𝑉 𝑀𝑉 †
From the same reasoning that leads to (2.2.165), and using (2.2.170), we find that
𝑓 𝑉 † 𝑋𝑉

0
𝑓 𝑃𝑋 𝑃 + 𝑉 𝐵𝑉 †

0
†
𝑉 𝑋𝑉 0
= 𝑓 (2.2.171)
0 𝑃𝑋 𝑃 + 𝑉 𝐵𝑉 †

1 †
= 𝑓 𝑈 𝑍𝑈 + 𝑊 † 𝑍𝑊 (2.2.172)
2
1 † 1 †
≤ 𝑓 𝑈 𝑍𝑈 + 𝑓 𝑊 𝑍𝑊 (2.2.173)
2 2
1 1
= 𝑈 † 𝑓 (𝑍) 𝑈 + 𝑊 † 𝑓 (𝑍) 𝑊 (2.2.174)
2
† 2
𝑉 𝑓 (𝑋)𝑉 0
= . (2.2.175)
0 𝑃 𝑓 (𝑋)𝑃 + 𝑉 𝑓 (𝐵)𝑉 †
The inequality follows from the assumption that 𝑓 is operator convex. The third
equality follows from (2.2.69). The final equality follows because

\[ f(Z) = \begin{pmatrix} f(X) & 0 \\ 0 & f(M) \end{pmatrix}, \tag{2.2.176} \]
and by applying (2.2.170) again, with the substitutions 𝑍 → 𝑓 (𝑍), 𝑋 → 𝑓 (𝑋),
and 𝑀 → 𝑓 (𝑀). It follows that
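\[ \begin{pmatrix} f(V^\dagger X V) & 0 \\ 0 & f(P X P + V M V^\dagger) \end{pmatrix} \leq \begin{pmatrix} V^\dagger f(X) V & 0 \\ 0 & P f(X) P + V f(M) V^\dagger \end{pmatrix}, \tag{2.2.177} \]
and we finally conclude that $f(V^\dagger X V) \leq V^\dagger f(X) V$ by examining the upper left
blocks in the operator inequality in (2.2.177). ■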
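The unitary dilations in (2.2.167) and the averaging identity in (2.2.170) can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: the dimensions, the random isometry, and the choice $f(x) = x^2$ (a standard example of an operator convex function on $\mathbb{R}$) are all picked for this example and are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
dH, dHp = 4, 2  # dim(H) and dim(H'), illustrative choices

def random_hermitian(dim):
    """Return a random Hermitian matrix of the given dimension."""
    M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (M + M.conj().T) / 2

G = rng.normal(size=(dH, dHp)) + 1j * rng.normal(size=(dH, dHp))
V, _ = np.linalg.qr(G)                 # isometry V : H' -> H
Vd = V.conj().T
P = np.eye(dH) - V @ Vd                # projection onto the complement of range(V)
X, M = random_hermitian(dH), random_hermitian(dHp)

# Block operators from (2.2.167).
Z = np.block([[X, np.zeros((dH, dHp))], [np.zeros((dHp, dH)), M]])
U = np.block([[V, P], [np.zeros((dHp, dHp)), -Vd]])
W = np.block([[V, -P], [np.zeros((dHp, dHp)), Vd]])

eye = np.eye(dH + dHp)
unitary_error = max(np.linalg.norm(U.conj().T @ U - eye),
                    np.linalg.norm(W.conj().T @ W - eye))

# Averaging the two conjugations removes the off-diagonal blocks, as in (2.2.170).
avg = (U.conj().T @ Z @ U + W.conj().T @ Z @ W) / 2
off_diagonal_error = np.linalg.norm(avg[:dHp, dHp:])
upper_left_error = np.linalg.norm(avg[:dHp, :dHp] - Vd @ X @ V)

# For f(x) = x^2, the conclusion reads (V^† X V)^2 <= V^† X^2 V.
gap = Vd @ X @ X @ V - (Vd @ X @ V) @ (Vd @ X @ V)
min_gap_eigenvalue = np.linalg.eigvalsh(gap).min()
```

For this choice of $f$, the gap equals $V^\dagger X P X V$, which makes the positivity visible directly.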

2.2.11 Superoperators

Just as we have been considering linear operators of the form 𝑋 : H 𝐴 → H𝐵 , with

input and output Hilbert spaces H 𝐴 and H𝐵 , we can consider linear operators with
input Hilbert space L(H 𝐴 ) and output Hilbert space L(H𝐵 ). We use the term
superoperator to refer to a linear operator acting on the Hilbert space of linear
operators. Specifically, a superoperator is a function N : L(H 𝐴 ) → L(H𝐵 ) such
that
N(𝛼𝑋 + 𝛽𝑌 ) = 𝛼N(𝑋) + 𝛽N(𝑌 ) (2.2.178)
for all 𝛼, 𝛽 ∈ C and 𝑋, 𝑌 ∈ L(H 𝐴 ). It is often helpful to indicate explicitly the input
and output Hilbert spaces of a superoperator N : L(H 𝐴 ) → L(H𝐵 ) by writing
N 𝐴→𝐵 . We make use of this notation throughout the book.
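For every superoperator $\mathcal{N}_{A\to B}$, there exists $n \in \mathbb{N}$, and sets $\{K_i\}_{i=1}^{n}$ and $\{L_i\}_{i=1}^{n}$
of operators in $\mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$ such that
\[ \mathcal{N}_{A\to B}(X_A) = \sum_{i=1}^{n} K_i X_A L_i^\dagger, \tag{2.2.179} \]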

for all 𝑋 𝐴 ∈ L(H 𝐴 ). This follows as a consequence of the requirement that N 𝐴→𝐵
has a linear action on 𝑋 𝐴 and the isomorphism in (2.2.31). The transpose operation
discussed previously in (2.2.25) is an example of a superoperator. In Chapter 4,
we see that quantum physical evolutions of quantum states, known as quantum
channels, are other examples of superoperators with additional constraints on the
sets $\{K_i\}_{i=1}^{n}$ and $\{L_i\}_{i=1}^{n}$.
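As an illustration of the operator-sum form in (2.2.179), the following sketch writes the transpose operation as a superoperator. It assumes the choice $K_{ij} = |i\rangle\langle j|$ and $L_{ij} = |j\rangle\langle i|$, which is one possible representation picked for this example; the helper names are hypothetical.

```python
import numpy as np

def apply_superoperator(Ks, Ls, X):
    """Return sum_i K_i X L_i^† for lists Ks, Ls of equal length, as in (2.2.179)."""
    return sum(K @ X @ L.conj().T for K, L in zip(Ks, Ls))

def matrix_unit(d, i, j):
    """Return the d x d matrix |i><j|."""
    E = np.zeros((d, d))
    E[i, j] = 1.0
    return E

d = 3
# With K_ij = |i><j| and L_ij = |j><i|, each term is |i><j| X |i><j|,
# and the sum has matrix elements <a| T(X) |b> = <b| X |a>.
Ks = [matrix_unit(d, i, j) for i in range(d) for j in range(d)]
Ls = [matrix_unit(d, j, i) for i in range(d) for j in range(d)]

rng = np.random.default_rng(2)
X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
T_of_X = apply_superoperator(Ks, Ls, X)
transpose_error = np.linalg.norm(T_of_X - X.T)
```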

We denote the identity superoperator by id : L(H) → L(H), and by definition

it satisfies id(𝑋) = 𝑋 for all 𝑋 ∈ L(H).
Given two superoperators N 𝐴→𝐴′ and M𝐵→𝐵′ , their tensor product N 𝐴→𝐴′
⊗ M𝐵→𝐵′ is a superoperator with input Hilbert space L(H 𝐴 ⊗ H𝐵 ) and output
Hilbert space L(H 𝐴′ ⊗ H𝐵′ ), such that

(N 𝐴→𝐴′ ⊗ M𝐵→𝐵′ )(𝑋 𝐴 ⊗ 𝑌𝐵 ) = N 𝐴→𝐴′ (𝑋 𝐴 ) ⊗ M𝐵→𝐵′ (𝑌𝐵 ) (2.2.180)

for all 𝑋 𝐴 ∈ L(H 𝐴 ) and 𝑌𝐵 ∈ L(H𝐵 ). We use the abbreviation

N 𝐴→𝐴′ ⊗ id𝐵→𝐵′ ≡ N 𝐴→𝐴′ , id 𝐴→𝐴′ ⊗ M𝐵→𝐵′ ≡ M𝐵→𝐵′ (2.2.181)

throughout this book whenever a superoperator acts only on one of the tensor
factors of the underlying Hilbert space of linear operators.

Definition 2.17 Hermiticity-Preserving Superoperator

A superoperator N is called Hermiticity preserving if N(𝑋) is Hermitian for
every Hermitian input 𝑋.

Exercise 2.24
Using (2.2.52), prove that a superoperator N is Hermiticity preserving if and
only if N(𝑋 † ) = N(𝑋) † for every linear operator 𝑋.

Definition 2.18 Adjoint of a Superoperator

The adjoint of a superoperator N : L(H 𝐴 ) → L(H 𝐴′ ) is the unique superoper-
ator N† : L(H 𝐴′ ) → L(H 𝐴 ) that satisfies

⟨𝑌 , N(𝑋)⟩ = ⟨N† (𝑌 ), 𝑋⟩ (2.2.182)

for all 𝑋 ∈ L(H 𝐴 ) and 𝑌 ∈ L(H 𝐴′ ), where we recall that ⟨·, ·⟩ is the Hilbert–
Schmidt inner product defined in (2.2.28).

Exercise 2.25
Let N 𝐴→𝐵 be a superoperator represented as in (2.2.179).
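1. Prove that the adjoint $\mathcal{N}^\dagger$ is given by $\mathcal{N}^\dagger(Y) = \sum_{i=1}^{n} K_i^\dagger Y L_i$ for every linear
operator $Y$.
2. If $\mathcal{N}$ is Hermiticity preserving, then prove that an alternate operator-sum
representation of $\mathcal{N}$ is $\mathcal{N}(X) = \sum_{i=1}^{n} L_i X K_i^\dagger$ for all $X \in \mathrm{L}(\mathcal{H}_A)$.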
3. Using 1. and 2., prove that if N is Hermiticity preserving, then so is its
adjoint N† .
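Before attempting the proof in Exercise 2.25, it can be reassuring to test the defining relation (2.2.182) numerically. The sketch below does not replace the proof. It assumes that the Hilbert–Schmidt inner product in (2.2.28) is $\langle S, T \rangle = \operatorname{Tr}[S^\dagger T]$, and the dimensions and function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
dA, dB, n = 3, 2, 5  # illustrative dimensions and number of terms

def random_matrix(rows, cols):
    """Return a random complex matrix."""
    return rng.normal(size=(rows, cols)) + 1j * rng.normal(size=(rows, cols))

Ks = [random_matrix(dB, dA) for _ in range(n)]
Ls = [random_matrix(dB, dA) for _ in range(n)]

def N(X):
    """Operator-sum action (2.2.179): sum_i K_i X L_i^†."""
    return sum(K @ X @ L.conj().T for K, L in zip(Ks, Ls))

def N_adjoint(Y):
    """Candidate adjoint from Exercise 2.25: sum_i K_i^† Y L_i."""
    return sum(K.conj().T @ Y @ L for K, L in zip(Ks, Ls))

def hs_inner(S, T):
    """Hilbert-Schmidt inner product <S, T> = Tr[S^† T] (assumed convention)."""
    return np.trace(S.conj().T @ T)

X, Y = random_matrix(dA, dA), random_matrix(dB, dB)
# Both sides of (2.2.182) should agree up to round-off.
adjoint_error = abs(hs_inner(Y, N(X)) - hs_inner(N_adjoint(Y), X))
```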

Definition 2.19 Trace-Preserving and Unital Superoperator

Let N 𝐴→𝐵 be a superoperator.

1. N is called trace preserving if Tr[N(𝑋)] = Tr[𝑋] for all 𝑋 ∈ L(H 𝐴 ).

2. N is called unital if N( 1 𝐴 ) = 1𝐵 .

Remark: Observe that if N is trace preserving and unital, and if H 𝐴 and H 𝐵 have finite
dimensions, then we find that 𝑑 𝐴 = 𝑑 𝐵 , by taking the trace on both sides of N( 1 𝐴) = 1 𝐵 . This
means that, in finite dimensions, it is necessary for trace-preserving and unital superoperators to
have the same input and output dimensions.

Exercise 2.26
Let N 𝐴→𝐵 be a trace-preserving superoperator represented as in (2.2.179).
1. Prove that $\sum_{i=1}^{n} K_i^\dagger L_i = \mathbb{1}_A$.
2. Using 1., show that the adjoint N† is unital. Thus, the adjoint of every
trace-preserving superoperator is unital.

For every superoperator N 𝐴→𝐵 : L(H 𝐴 ) → L(H𝐵 ), we define its induced trace
norm ∥N∥ 1 as

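\begin{align}
\|\mathcal{N}\|_1 &:= \sup\left\{ \frac{\|\mathcal{N}(X)\|_1}{\|X\|_1} : X \in \mathrm{L}(\mathcal{H}_A),\ X \neq 0 \right\} \tag{2.2.183} \\
&= \sup\left\{ \|\mathcal{N}(X)\|_1 : X \in \mathrm{L}(\mathcal{H}_A),\ \|X\|_1 \leq 1 \right\}. \tag{2.2.184}
\end{align}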

Then, for all 𝑋 ∈ L(H 𝐴 ), it immediately follows that

∥N(𝑋) ∥ 1 ≤ ∥N∥ 1 ∥ 𝑋 ∥ 1 . (2.2.185)

Exercise 2.27
Prove that
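\[ \|\mathcal{N}\|_1 = \sup_{\substack{U \in \mathrm{L}(\mathcal{H}_B) \\ \text{unitary}}} \left\| \mathcal{N}^\dagger(U) \right\|_\infty \tag{2.2.186} \]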

for every superoperator N 𝐴→𝐵 , where the optimization is with respect to every
unitary operator 𝑈 acting on H𝐵 .

Definition 2.20 Diamond Norm

Let N 𝐴→𝐵 be a superoperator. The quantity

∥N∥⋄ B sup{∥ (id 𝑅 ⊗ N 𝐴→𝐵 )(𝑋 𝑅 𝐴 )∥ 1 : 𝑋 𝑅 𝐴 ∈ L(H 𝑅 𝐴 ), ∥ 𝑋 𝑅 𝐴 ∥ 1 ≤ 1}

(2.2.187)
is known as the diamond norm of N, where the optimization is with respect to
every linear operator 𝑋 𝑅 𝐴 , and there is an implicit optimization over Hilbert
spaces H 𝑅 of dimension 𝑑 𝑅 ≥ 1.

Theorem 2.21
For every superoperator N 𝐴→𝐵 ,

∥N∥⋄ = ∥id 𝐴 ⊗ N 𝐴→𝐵 ∥ 1 (2.2.188)

= sup{∥ (id 𝐴 ⊗ N 𝐴→𝐵 )(|𝜓⟩⟨𝜙| 𝐴𝐴 )∥ 1 : ∥|𝜓⟩ 𝐴𝐴 ∥ 2 = ∥|𝜙⟩ 𝐴𝐴 ∥ 2 = 1}.
(2.2.189)

Furthermore, if N is Hermiticity preserving, then

∥N∥⋄ = sup{∥ (id 𝐴 ⊗ N 𝐴→𝐵 )(|𝜓⟩⟨𝜓| 𝐴𝐴 )∥ 1 : ∥|𝜓⟩ 𝐴𝐴 ∥ 2 = 1}. (2.2.190)

Proof: Please see the Bibliographic Notes in Section 2.6. ■
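To see why the reference system in Definition 2.20 matters, consider the transpose operation $T$. Since $X$ and $X^{\mathsf{T}}$ have the same singular values, $\|T(X)\|_1 = \|X\|_1$, so the induced trace norm of $T$ equals one. The sketch below evaluates the objective in (2.2.190) at a maximally entangled vector, which gives a lower bound on $\|T\|_\diamond$. The dimension and the reshape-based partial transpose are choices made for this example.

```python
import numpy as np

d = 3
# Maximally entangled unit vector sum_i |i>|i> / sqrt(d) on the reference and input systems.
phi = np.eye(d).reshape(d * d) / np.sqrt(d)
rho = np.outer(phi, phi.conj())

# Apply id ⊗ T: transpose only the second tensor factor (partial transpose).
rho_pt = rho.reshape(d, d, d, d).transpose(0, 3, 2, 1).reshape(d * d, d * d)

input_trace_norm = np.linalg.norm(rho, ord="nuc")      # trace norm of the input
lower_bound = np.linalg.norm(rho_pt, ord="nuc")        # trace norm of the output
print(input_trace_norm, lower_bound)
```

The output operator is the swap operator divided by $d$, so its trace norm is $d$, even though the input has trace norm one.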

We study the diamond norm in detail in Chapter 6 in the context of quantum
channels.

2.3 Analysis and Probability

In this section, we briefly review some essential concepts from mathematical
analysis, in particular the concepts of the limit, supremum, and infimum, as well as
the continuity of real-valued functions. We also discuss compact sets, convex sets,
and functions, as well as the basic notions of probability distributions.

2.3.1 Limits, Infimum, Supremum, and Continuity

Limit of a sequence

We start with the definition of the limit of a sequence of real numbers. A sequence
{𝑠𝑛 }𝑛∈N ⊂ R of real numbers is said to have the limit ℓ, written lim𝑛→∞ 𝑠𝑛 = ℓ, if
for all 𝜀 > 0 there exists 𝑛𝜀 ∈ N such that, for all 𝑛 ≥ 𝑛𝜀 , the inequality |𝑠𝑛 − ℓ| < 𝜀
holds.
One can think of the concept of a limit intuitively as a game between two
players, an antagonist and a protagonist. The antagonist goes first, and gets to pick
an arbitrary 𝜀 > 0. The protagonist wins if he reports back an entry in the sequence
{𝑠𝑛 }𝑛 such that |𝑠𝑛 − ℓ| < 𝜀. If the protagonist reports back an entry 𝑠𝑛 such that
|𝑠𝑛 − ℓ| ≥ 𝜀, then the antagonist wins. If the limit exists and is equal to ℓ, then the
protagonist always wins by taking 𝑛 sufficiently large (i.e., larger than 𝑛𝜀 ) and then
reporting back 𝑠𝑛 . If the limit does not exist or if the limit is not equal to ℓ, then the
protagonist cannot necessarily win with the strategy of taking 𝑛 sufficiently large;
in this case, there exists a choice of 𝜀 > 0, such that for all 𝑛𝜀 ∈ N, there exists
𝑛 ≥ 𝑛𝜀 such that |𝑠𝑛 − ℓ| ≥ 𝜀. In this latter case, the choice of 𝜀 > 0 can again be
understood as a strategy of the antagonist.

Infimum and supremum

Let us now recall the concepts of the infimum and supremum of subsets of the
real numbers. Roughly speaking, they are generalizations of the concepts of the
minimum and maximum, respectively, of a set. Formally, let 𝐸 ⊂ R.
• A point 𝑥 ∈ R is a lower bound of 𝐸 if 𝑦 ≥ 𝑥 for all 𝑦 ∈ 𝐸. If 𝑥 is the greatest
such lower bound, then 𝑥 is called the infimum of 𝐸, and we write 𝑥 = inf 𝐸.
• A point 𝑥 ∈ R is an upper bound of 𝐸 if 𝑦 ≤ 𝑥 for all 𝑦 ∈ 𝐸. If 𝑥 is the least
such upper bound, then 𝑥 is called the supremum of 𝐸, and we write 𝑥 = sup 𝐸.
The supremum and infimum may or may not be contained in the subset 𝐸. For
example, let $E = \{1/n\}_{n\in\mathbb{N}}$. Then, sup 𝐸 = 1 ∈ 𝐸, but inf 𝐸 = 0 ∉ 𝐸. As another
example, let 𝐸 = [0, 1). Then sup 𝐸 = 1 ∉ 𝐸 and inf 𝐸 = 0 ∈ 𝐸. If the supremum
is contained in 𝐸, then it is equal to the maximum element of 𝐸. Similarly, if the
infimum is contained in 𝐸, then it is equal to the minimum element of 𝐸.

When considering a function $F : S \to \mathbb{R}$ defined on a subset $S$ of $\mathrm{L}(\mathcal{H})$, its
infimum and supremum are defined for the set $E = \{F(X) : X \in S\}$. Specifically,
\[ \inf_{X \in S} F(X) := \inf\{F(X) : X \in S\}, \tag{2.3.1} \]
and
\[ \sup_{X \in S} F(X) := \sup\{F(X) : X \in S\}. \tag{2.3.2} \]

Limit inferior and limit superior

Turning back to limits, the limit of a sequence need not always exist. A particularly
illuminating example is the sequence $\{r^n\}_{n\in\mathbb{N}}$ for 𝑟 ∈ R. If −1 < 𝑟 < 1, then the
limit exists and is equal to zero. If 𝑟 = 1, then the sequence is constant and the limit is equal to one. If 𝑟 > 1, then the sequence never converges to a
finite value and so the limit does not exist. We say that the sequence diverges to +∞
in this case. If 𝑟 < −1, then the sequence oscillates and diverges (but it does not
specifically diverge to either +∞ or −∞). If 𝑟 = −1, then the sequence oscillates
back and forth between −1 and +1 and so the limit does not exist.
Given that the limit of a sequence need not always exist, it can be helpful to have
a reasonable substitute for this asymptotic concept that does always exist. Such a
substitute is provided by two quantities: the limit inferior and limit superior of a
sequence. We now define the limit inferior and limit superior, noting that they can
be understood as asymptotic versions of the infimum and supremum just discussed.
• We say that 𝑠 is an asymptotic lower bound on the sequence {𝑠𝑛 }𝑛∈N if for all
𝜀 > 0 there exists 𝑛𝜀 ∈ N such that, for all 𝑛 ≥ 𝑛𝜀 , the inequality 𝑠𝑛 > 𝑠 − 𝜀
holds. The limit inferior is the greatest asymptotic lower bound and is denoted
by
\[ \liminf_{n\to\infty} s_n. \tag{2.3.3} \]

• The definition of the limit superior is essentially opposite to that of the limit
inferior. We say that 𝑠 is an asymptotic upper bound on the sequence {𝑠𝑛 }𝑛∈N
if for all 𝜀 > 0, there exists 𝑛𝜀 ∈ N, such that for all 𝑛 ≥ 𝑛𝜀 , the inequality
𝑠𝑛 < 𝑠 + 𝜀 holds. The limit superior is the least asymptotic upper bound and is
denoted by
\[ \limsup_{n\to\infty} s_n. \tag{2.3.4} \]

The limit inferior and limit superior always exist by extending the real line R to
include −∞ and +∞. Furthermore, every asymptotic lower bound on the sequence
cannot exceed an asymptotic upper bound, implying that the following inequality
holds for every sequence {𝑠𝑛 }𝑛∈N :

\[ \liminf_{n\to\infty} s_n \leq \limsup_{n\to\infty} s_n. \tag{2.3.5} \]

If the opposite inequality holds for a sequence {𝑠𝑛 }𝑛∈N , then the limit of the
sequence exists and we can write

\[ \liminf_{n\to\infty} s_n = \limsup_{n\to\infty} s_n = \lim_{n\to\infty} s_n. \tag{2.3.6} \]

This collapse is a direct consequence of the definitions of limit, limit inferior, and
limit superior.
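The limit inferior and limit superior can equivalently be computed as $\sup_N \inf_{n \geq N} s_n$ and $\inf_N \sup_{n \geq N} s_n$. The following sketch uses this formulation on a finite window of the sequence $s_n = (-1)^n (1 + 1/n)$, which has no limit. The sequence, the window length, and the variable names are chosen for illustration; a finite window only approximates the asymptotic quantities.

```python
import numpy as np

n = np.arange(1, 10001)
s = (-1.0) ** n * (1 + 1 / n)   # oscillates; the limit does not exist

# tail_inf[i] = min(s[i:]) and tail_sup[i] = max(s[i:]) within the finite window.
tail_inf = np.minimum.accumulate(s[::-1])[::-1]
tail_sup = np.maximum.accumulate(s[::-1])[::-1]

# Tail bounds starting at n = 5001, well inside the window.
approx_liminf = tail_inf[5000]
approx_limsup = tail_sup[5000]
print(approx_liminf, approx_limsup)
```

The tail infima increase toward $-1$ and the tail suprema decrease toward $+1$, in agreement with (2.3.5) holding with strict inequality for this sequence.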

Limits and continuity

We now consider the limit and continuity of a function. Specifically, we consider

real-valued functions 𝐹 : L(H) → R defined on the space of linear operators acting
on a Hilbert space H. We view this space as a normed vector space with either the
trace norm ∥·∥ 1 or the spectral norm ∥·∥ ∞ . In the definitions that follow, we use
∥·∥ to denote either one of these norms.
• Limit: We write lim 𝑋→𝑋0 𝐹 (𝑋) = 𝑦 to mean the following: for all 𝜀 > 0,
there exists 𝛿𝜀 > 0 such that |𝐹 (𝑋) − 𝑦| < 𝜀 for all 𝑋 ∈ L(H) for which
∥ 𝑋 − 𝑋0 ∥ < 𝛿𝜀 .
• Continuity at a point: We say that 𝐹 is continuous at 𝑋0 if for all 𝜀 > 0,
there exists 𝛿𝜀 > 0 such that |𝐹 (𝑋) − 𝐹 (𝑋0 )| < 𝜀 for every point 𝑋 for which
∥ 𝑋 − 𝑋0 ∥ < 𝛿𝜀 . 𝐹 is said to be continuous if 𝐹 is continuous at 𝑋0 for all
𝑋0 ∈ L(H).
• Uniform continuity: We say that 𝐹 is uniformly continuous if for all 𝜀 > 0,
there exists 𝛿𝜀 > 0 such that |𝐹 (𝑋) − 𝐹 (𝑋 ′)| < 𝜀 for all 𝑋, 𝑋 ′ ∈ L(H) for
which ∥ 𝑋 − 𝑋 ′ ∥ < 𝛿𝜀 .
• Upper and lower semi-continuity: We say that 𝐹 is upper semi-continuous
at 𝑋0 if for all 𝜀 > 0, there exists a neighborhood 𝑁 𝑋0 ,𝜀 of 𝑋0 such that: if
𝐹 (𝑋0 ) > −∞, then 𝐹 (𝑋) ≤ 𝐹 (𝑋0 ) + 𝜀 for all 𝑋 ∈ 𝑁 𝑋0 ,𝜀 ; if 𝐹 (𝑋0 ) = −∞,
then lim 𝑋→𝑋0 𝐹 (𝑋) = −∞. 𝐹 is called upper semi-continuous if it is upper
semi-continuous at 𝑋0 for all 𝑋0 ∈ L(H).
We say that 𝐹 is lower semi-continuous at 𝑋0 if for all 𝜀 > 0, there exists a
neighborhood 𝑁 𝑋0 ,𝜀 of 𝑋0 such that: if 𝐹 (𝑋0 ) < +∞, then 𝐹 (𝑋) ≥ 𝐹 (𝑋0 ) − 𝜀
for all 𝑋 ∈ 𝑁 𝑋0 ,𝜀 ; if 𝐹 (𝑋0 ) = +∞, then lim 𝑋→𝑋0 𝐹 (𝑋) = +∞. 𝐹 is called
lower semi-continuous if it is lower semi-continuous at 𝑋0 for all 𝑋0 ∈ L(H).

2.3.2 Compact Sets

A subset 𝑆 of a topological vector space is called compact if every sequence of

elements in 𝑆 has a subsequence that converges to an element in 𝑆. For finite-
dimensional vector spaces, a subset 𝑆 is compact if and only if it is closed and
bounded.
An important fact about compact sets is that the infimum and supremum of a
continuous function defined on a compact set always exist and are attained at
points of the set. Consequently, for optimization problems over compact sets, the infimum can
be replaced by a minimum and the supremum can be replaced by a maximum. In
other words, if 𝐹 : 𝑆 → R is a continuous function defined on a compact subset 𝑆
of L(H), then

\[ \inf_{X \in S} F(X) = \min_{X \in S} F(X) \quad \text{and} \quad \sup_{X \in S} F(X) = \max_{X \in S} F(X). \tag{2.3.7} \]

An important example of a compact set is the set $\{X \in \mathrm{L}(\mathcal{H}) : X \geq 0,\ \operatorname{Tr}[X] \leq 1\}$
of positive semi-definite operators with trace bounded from above by one. This
set contains the set of density operators acting on H.

2.3.3 Convex Sets and Functions

A subset 𝐶 of a vector space is called convex if, for all elements 𝑢, 𝑣 ∈ 𝐶 and for
all 𝜆 ∈ [0, 1], we have 𝜆𝑢 + (1 − 𝜆)𝑣 ∈ 𝐶. We often call 𝜆𝑢 + (1 − 𝜆)𝑣 a convex
combination of 𝑢 and 𝑣. More generally, for every set $S = \{v_x\}_{x\in\mathcal{X}}$ of elements of
a real vector space indexed by an alphabet $\mathcal{X}$, and every function $p : \mathcal{X} \to [0, 1]$
with $p(x) \geq 0$ for all $x \in \mathcal{X}$ and $\sum_{x\in\mathcal{X}} p(x) = 1$, the sum $\sum_{x\in\mathcal{X}} p(x) v_x$ is called a
convex combination of the vectors in 𝑆. The convex hull of 𝑆 is the convex set of
all possible convex combinations of the vectors in 𝑆.

Throughout this book, in the context of convex sets and functions, we consider
the real vector space of Hermitian operators acting on some Hilbert space. Then,
an important example of a convex subset is the set of all positive semi-definite
operators. Indeed, if 𝑋 and 𝑌 are positive semi-definite operators, then 𝜆𝑋 + (1−𝜆)𝑌
is a positive semi-definite operator for all 𝜆 ∈ [0, 1]. From now on, we assume 𝐶
to be a convex subset of the set of Hermitian operators, and we use 𝑋, 𝑌 , and 𝑍 to
denote arbitrary elements of 𝐶.
An element 𝑍 ∈ 𝐶 is called an extreme point of 𝐶 if 𝑍 cannot be written as a non-
trivial convex combination of other vectors in 𝐶. Formally, 𝑍 is an extreme point if
every decomposition of 𝑍 as the convex combination 𝑍 = 𝜆𝑋 + (1 − 𝜆)𝑌 , such that
𝜆 ∈ (0, 1) (so that the decomposition is non-trivial), implies that 𝑋 = 𝑌 = 𝑍. An
important fact is that every compact convex subset of a finite-dimensional vector
space is equal to the convex hull of its extreme points.
We now define convex and concave functions.

Definition 2.22 Convex and Concave Functions

A function 𝐹 : 𝐶 → R defined on a convex subset 𝐶 ⊆ L(H) is a convex
function if, for all 𝑋, 𝑌 ∈ 𝐶 and 𝜆 ∈ [0, 1], the following inequality holds:

𝐹 (𝜆𝑋 + (1 − 𝜆)𝑌 ) ≤ 𝜆𝐹 (𝑋) + (1 − 𝜆)𝐹 (𝑌 ). (2.3.8)

A function 𝐹 : 𝐶 → R is a concave function if −𝐹 is a convex function.

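It follows from an inductive argument using Definition 2.22 that a convex
function $F : C \to \mathbb{R}$ on a convex subset $C \subseteq \mathrm{L}(\mathcal{H})$ satisfies
\[ F\!\left( \sum_{x\in\mathcal{X}} p(x) X_x \right) \leq \sum_{x\in\mathcal{X}} p(x) F(X_x) \tag{2.3.9} \]
for every set $\{X_x\}_{x\in\mathcal{X}}$ of elements in $C$ and every function $p : \mathcal{X} \to [0, 1]$ defined
on $\mathcal{X}$, such that $p(x) \geq 0$ for all $x \in \mathcal{X}$ and $\sum_{x\in\mathcal{X}} p(x) = 1$.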

• A function $F : C \to \mathbb{R}$ defined on a convex subset $C \subseteq \mathrm{L}(\mathcal{H})$ is called
quasi-convex if for all $X, Y \in C$ and all $\lambda \in [0, 1]$ the following inequality
holds:
\[ F(\lambda X + (1 - \lambda) Y) \leq \max\{F(X), F(Y)\}. \tag{2.3.10} \]
• A function $F : C \to \mathbb{R}$ defined on a convex subset $C \subseteq \mathrm{L}(\mathcal{H})$ is called
quasi-concave if $-F$ is quasi-convex. Specifically, $F$ is called quasi-concave if
for all $X, Y \in C$ and all $\lambda \in [0, 1]$, the following inequality holds:
\[ F(\lambda X + (1 - \lambda) Y) \geq \min\{F(X), F(Y)\}. \tag{2.3.11} \]

2.3.4 Fenchel–Eggleston–Carathéodory Theorem

We mentioned above that the convex hull of a subset 𝑆 of a real vector space is the
convex set consisting of all convex combinations of the vectors in 𝑆. A fundamental
result is that if the underlying vector space has dimension 𝑑, then, in order to obtain
an element in the convex hull of 𝑆, it suffices to take a convex combination of no
more than 𝑑 + 1 elements of 𝑆. If 𝑆 is connected and compact, then no more than 𝑑
elements are required. We state this formally as follows.

Theorem 2.23 Fenchel–Eggleston–Carathéodory Theorem

Let 𝑆 be a set of vectors in a real 𝑑-dimensional vector space (𝑑 < ∞), and let
conv(𝑆) denote the convex hull of 𝑆. Then, an element 𝑣 ∈ conv(𝑆) if and only
if 𝑣 can be expressed as a convex combination of 𝑚 ≤ 𝑑 + 1 elements in 𝑆. If 𝑆
is connected and compact, then the same statement holds with 𝑚 ≤ 𝑑.

Proof: Please consult the Bibliographic Notes (Section 2.6). ■

2.3.5 Minimax Theorems

We often encounter expressions of the following form in quantum information
theory:
\[ \inf_{X \in S} \sup_{Y \in S'} F(X, Y). \tag{2.3.12} \]

The expression above contains both an infimum and a supremum over subsets
𝑆, 𝑆′ ⊆ L(H) of some real-valued function 𝐹 : 𝑆 × 𝑆′ → R.
Expressions such as the one in (2.3.12) arise in the context of two-player
zero-sum games. In such a game, the function 𝐹 (𝑋, 𝑌 ) represents the reward of a
protagonist, who chooses elements 𝑌 ∈ 𝑆′ in order to maximize 𝐹. The antagonist
chooses elements 𝑋 ∈ 𝑆 in order to minimize 𝐹, i.e., to minimize the reward to the
protagonist.⁴ The worst-case scenario for the antagonist is that, no matter what element
𝑋 ∈ 𝑆 it chooses, the protagonist chooses the “best” possible element in 𝑆′ for its
benefit, so that the reward is $G(X) := \sup_{Y \in S'} F(X, Y)$. The optimal value for
the antagonist in this scenario is thus given by $\inf_{X \in S} G(X)$, which is the quantity
in (2.3.12).
On the other hand, the worst-case scenario for the protagonist is that, no matter what
element 𝑌 ∈ 𝑆′ it chooses, the antagonist chooses the “best” possible element in 𝑆
for its benefit, so that the reward is $\widetilde{G}(Y) := \inf_{X \in S} F(X, Y)$. The optimal reward
of the protagonist in this scenario is thus given by $\sup_{Y \in S'} \widetilde{G}(Y)$, i.e.,
\[ \sup_{Y \in S'} \inf_{X \in S} F(X, Y). \tag{2.3.13} \]

Intuitively, the protagonist can achieve a reward at least as high when playing
second (in reaction to the antagonist’s choice) as when playing first. This intuition is
captured by the following “max-min inequality”:
\[ \sup_{Y \in S'} \inf_{X \in S} F(X, Y) \leq \inf_{X \in S} \sup_{Y \in S'} F(X, Y), \tag{2.3.14} \]

which always holds. We now prove this formally. Observe that for all 𝑋 ∈ 𝑆, 𝑌 ∈ 𝑆′,
we have that $\widetilde{G}(Y) \leq F(X, Y)$, where $\widetilde{G}(Y) := \inf_{X \in S} F(X, Y)$. It then follows that
$\sup_{Y \in S'} \widetilde{G}(Y) \leq \sup_{Y \in S'} F(X, Y)$
by applying the definition of supremum. Since this latter inequality holds for all 𝑋 ∈
𝑆, the definition of infimum implies that $\sup_{Y \in S'} \widetilde{G}(Y) \leq \inf_{X \in S} \sup_{Y \in S'} F(X, Y)$,
which is precisely the inequality in (2.3.14).
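For finite sets $S$ and $S'$, the function $F$ is a matrix and the max-min inequality (2.3.14) can be checked directly. The sketch below uses two small payoff matrices chosen for this example (rows index the antagonist's choice $X$, columns index the protagonist's choice $Y$); the function names are hypothetical. The first matrix shows that the inequality can be strict, and the second has a saddle point, so that both sides coincide.

```python
import numpy as np

def max_min(F):
    """sup over columns Y of inf over rows X of F[X, Y]."""
    return F.min(axis=0).max()

def min_max(F):
    """inf over rows X of sup over columns Y of F[X, Y]."""
    return F.max(axis=1).min()

# "Matching pennies": no pure-strategy saddle point, so the inequality is strict.
pennies = np.array([[1.0, -1.0],
                    [-1.0, 1.0]])

# A game with a saddle point at row 0, column 0.
saddle = np.array([[3.0, 1.0],
                   [4.0, 2.0]])

for name, F in [("pennies", pennies), ("saddle", saddle)]:
    print(name, max_min(F), min_max(F))
```

The finite sets used here are not convex, which is consistent with the fact that equality need not hold without additional assumptions on $S$, $S'$, and $F$.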
Many proofs that we present in this book require determining when the inequality
opposite to the one in (2.3.14) holds, i.e.,
\[ \inf_{X \in S} \sup_{Y \in S'} F(X, Y) \overset{?}{\leq} \sup_{Y \in S'} \inf_{X \in S} F(X, Y), \tag{2.3.15} \]

which is known as the “min-max inequality.” If it holds, then the inequality in
(2.3.14) is saturated and becomes an equality. The game-theoretic interpretation
of the situation when the reverse inequality holds is that, for the sets 𝑆 and 𝑆′ and
the reward function 𝐹, there is no advantage in the order of play; i.e., the reward is the
same regardless of who goes first, as long as the protagonist and antagonist play
optimal strategies. It is thus important to know under what conditions this reverse
inequality holds.

⁴ This is due to the fact that the reward of the antagonist is equal to −𝐹 (𝑋, 𝑌 ) as a consequence of
the zero-sum property of the game.

We now present theorems for two classes of functions that tell us when the
inequality in (2.3.15) holds.

Theorem 2.24 Sion Minimax

Let 𝑆 be a compact and convex subset of a normed vector space and let 𝑆′ be a
convex subset of a normed vector space. Let 𝐹 : 𝑆 × 𝑆′ → R be a real-valued
function such that
1. The function 𝐹 (𝑋, ·) : 𝑆′ → R is upper semi-continuous and quasi-concave
on 𝑆′ for every 𝑋 ∈ 𝑆.
2. The function 𝐹 (·, 𝑌 ) : 𝑆 → R is lower semi-continuous and quasi-convex
on 𝑆 for every 𝑌 ∈ 𝑆′.
Then,
\[ \inf_{X \in S} \sup_{Y \in S'} F(X, Y) = \sup_{Y \in S'} \inf_{X \in S} F(X, Y). \tag{2.3.16} \]

Furthermore, the infimum can be replaced with a minimum.

Proof: Please consult the Bibliographic Notes (Section 2.6). ■

Theorem 2.25 Mosonyi–Hiai Minimax

Let 𝑆 be a compact subset of a normed vector space, let 𝑆′ ⊆ R, and let 𝐹 : 𝑆 × 𝑆′ →
R ∪ {+∞, −∞}. Suppose that
1. The function 𝐹 (·, 𝑦) : 𝑆 → R ∪ {+∞, −∞} is lower semi-continuous for
every 𝑦 ∈ 𝑆′.
2. The function 𝐹 (𝑋, ·) : 𝑆′ → R ∪ {+∞, −∞} is either monotonically
increasing or monotonically decreasing for every 𝑋 ∈ 𝑆.
Then,
\[ \inf_{X \in S} \sup_{y \in S'} F(X, y) = \sup_{y \in S'} \inf_{X \in S} F(X, y). \tag{2.3.17} \]

Furthermore, the infimum can be replaced with a minimum.

Proof: Please consult the Bibliographic Notes (Section 2.6). ■

2.3.6 Probability Distributions

Throughout this book, we are concerned for the most part with discrete probability
distributions, and the following definitions suffice for our needs. A discrete
probability distribution is a function $p : \mathcal{X} \to [0, 1]$ defined on a finite alphabet $\mathcal{X}$
such that $p(x) \geq 0$ for all $x \in \mathcal{X}$ and $\sum_{x\in\mathcal{X}} p(x) = 1$. Formally, we can consider the
alphabet X to be the set of realizations of a discrete random variable 𝑋 : Ω → X
from the space Ω of experimental outcomes, called the sample space, to the set X.
We then write 𝑝 𝑋 to denote the probability distribution of the random variable 𝑋,
i.e., 𝑝 𝑋 (𝑥) ≡ Pr[𝑋 = 𝑥].
The expected value or mean E[𝑋] of a random variable 𝑋 taking values in
X ⊂ R is defined as
\[ \mathbb{E}[X] = \sum_{x\in\mathcal{X}} x\, p_X(x). \tag{2.3.18} \]

For every function $g : \mathcal{X} \to \mathbb{R}$, we define $g(X)$ to be the random variable
$g \circ X : \Omega \to \mathbb{R}$ with image $\{g(x) : x \in \mathcal{X}\}$. Then,
\[ \mathbb{E}[g(X)] = \sum_{x\in\mathcal{X}} g(x)\, p_X(x). \tag{2.3.19} \]

A useful fact is Markov’s inequality: if $X$ is a non-negative random variable,
then for all $a > 0$ we have
\[ \Pr[X \geq a] \leq \frac{\mathbb{E}[X]}{a}. \tag{2.3.20} \]

Jensen’s inequality is the following: if $X$ is a random variable with finite mean,
and $f$ is a real-valued convex function acting on the output of $X$, then
\[ f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]. \tag{2.3.21} \]

This is a very special case of the more elaborate operator Jensen inequality presented
previously in Theorem 2.16.
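The definitions (2.3.18) and (2.3.19), together with Markov's and Jensen's inequalities, can be illustrated on a small discrete distribution. The realizations, probabilities, threshold $a$, and convex function $f(x) = x^2$ below are all arbitrary choices made for this example.

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0, 5.0])   # realizations of a non-negative random variable
p = np.array([0.4, 0.3, 0.2, 0.1])    # probabilities, summing to one

mean = np.sum(xs * p)                 # expected value, as in (2.3.18)

# Markov's inequality (2.3.20): Pr[X >= a] <= E[X] / a.
a = 2.0
tail_probability = np.sum(p[xs >= a])
markov_bound = mean / a

# Jensen's inequality (2.3.21) for the convex function f(x) = x^2.
f = np.square
jensen_lhs = f(mean)                  # f(E[X])
jensen_rhs = np.sum(f(xs) * p)        # E[f(X)], as in (2.3.19) with g = f

print(tail_probability, markov_bound, jensen_lhs, jensen_rhs)
```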
As we explain in Chapter 3, observables $O$ in quantum mechanics (which
are merely Hermitian operators) generalize random variables, such that their
expectation is given by $\mathbb{E}[O] \equiv \langle O \rangle_\rho := \operatorname{Tr}[O\rho]$ for a density operator $\rho$. In this
case, an application of the operator Jensen inequality from Theorem 2.16 leads to
the following: for a Hermitian operator $O$, a density operator $\rho$, and an operator
convex function $f$,
\[ f(\operatorname{Tr}[O\rho]) \leq \operatorname{Tr}[f(O)\rho], \tag{2.3.22} \]
which we can alternatively write as
\[ f(\langle O \rangle_\rho) \leq \langle f(O) \rangle_\rho. \tag{2.3.23} \]
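A quick numerical illustration of (2.3.22) follows. It is a sketch under stated assumptions: the observable and the density operator are drawn at random, $f(x) = x^2$ is used as a standard example of an operator convex function, and `apply_function` is a hypothetical helper that applies $f$ through the spectral decomposition.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 4

G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
O = (G + G.conj().T) / 2                        # Hermitian observable

H = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = H @ H.conj().T
rho = rho / np.trace(rho).real                  # density operator: rho >= 0, Tr[rho] = 1

def apply_function(f, herm):
    """Apply f to a Hermitian matrix through its spectral decomposition."""
    w, vecs = np.linalg.eigh(herm)
    return (vecs * f(w)) @ vecs.conj().T

f = np.square                                   # x -> x^2
lhs = f(np.trace(O @ rho).real)                 # f(Tr[O rho])
rhs = np.trace(apply_function(f, O) @ rho).real # Tr[f(O) rho]
print(lhs, rhs)
```

For this choice of $f$, the difference between the two sides is the variance of $O$ in the state $\rho$, which is non-negative.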

2.4 Semi-Definite Programming

Semi-definite programs (SDPs) constitute an important class of optimization
problems that arise frequently in quantum information theory. An SDP is a
constrained optimization problem in which the optimization variable is a positive
semi-definite operator 𝑋, the objective function is linear in 𝑋, and the constraint is
an operator inequality featuring a linear function of 𝑋. Not only are SDPs useful as
an analytical tool, but there also exist a number of numerical solvers that can be
used for evaluating these optimization problems (one can use the CVX package for
Matlab or the CVXPY package for Python).
Semi-definite programs play an important role in quantum information because,
for a number of operational tasks of interest, we are trying to maximize a linear
objective function over the sets of quantum states or measurements, which are
specified by semi-definite constraints. Furthermore, many of the different communi-
cation capacities of quantum channels are difficult to characterize or compute, and it
can be helpful to find semi-definite relaxations of them that are efficiently computable.
These are two common ways in which semi-definite programs arise in this book.

Definition 2.26 Semi-Definite Program

Given a Hermiticity-preserving superoperator Φ and Hermitian operators 𝐴
and 𝐵, a semi-definite program (SDP) corresponds to two optimization problems.
The first is the primal SDP, which is defined as

maximize Tr[ 𝐴𝑋]

subject to Φ(𝑋) ≤ 𝐵, (2.4.1)
𝑋 ≥ 0.

The second optimization problem is the dual SDP, which is defined as

minimize Tr[𝐵𝑌 ]
subject to Φ† (𝑌 ) ≥ 𝐴, (2.4.2)
𝑌 ≥ 0.

We let

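\begin{align}
S(\Phi, A, B) &:= \sup_{X \geq 0} \{\operatorname{Tr}[AX] : \Phi(X) \leq B\}, \tag{2.4.3} \\
\widehat{S}(\Phi, A, B) &:= \inf_{Y \geq 0} \{\operatorname{Tr}[BY] : \Phi^\dagger(Y) \geq A\} \tag{2.4.4}
\end{align}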

denote the optimal values of the primal and dual SDPs, respectively.

A variable $X$ for the primal SDP in (2.4.3) is called a feasible point if it is
positive semi-definite ($X \geq 0$) and satisfies the constraint $\Phi(X) \leq B$, and it is
called a strictly feasible point if $X$ is positive definite ($X > 0$) and the constraint is
satisfied with a strict inequality, i.e., if $\Phi(X) < B$. The same definitions apply to
the dual SDP in (2.4.4): a variable $Y$ is a feasible point if $Y \geq 0$ and $\Phi^\dagger(Y) \geq A$,
and it is strictly feasible if $Y > 0$ and $\Phi^\dagger(Y) > A$. By convention, if there is no
primal feasible operator $X$, then $S(\Phi, A, B) = -\infty$, and if there is no dual feasible
operator $Y$, then $\widehat{S}(\Phi, A, B) = +\infty$. It is also possible for $S(\Phi, A, B) = +\infty$ or
$\widehat{S}(\Phi, A, B) = -\infty$. A simple example of $S(\Phi, A, B) = +\infty$ is when $A = 1$ is a scalar,
$\Phi(X) = 0$, and $B = 1$ (so that the constraint in (2.4.3) is trivially satisfied), and a
simple example of $\widehat{S}(\Phi, A, B) = -\infty$ is when $B = -1$, $\Phi^\dagger(Y) = 0$, and $A = -1$.

Proposition 2.27 Weak Duality

For every SDP corresponding to Φ, 𝐴, and 𝐵, the following weak duality
inequality holds:
\[ S(\Phi, A, B) \leq \widehat{S}(\Phi, A, B). \tag{2.4.5} \]

Proof: Let 𝑋 ≥ 0 be primal feasible, and let 𝑌 ≥ 0 be dual feasible. Then the
following holds
Tr[ 𝐴𝑋] ≤ Tr[Φ† (𝑌 ) 𝑋] = Tr[𝑌 Φ(𝑋)] ≤ Tr[𝑌 𝐵]. (2.4.6)
The first inequality follows from the assumption that 𝑌 is dual feasible, so that
we have 𝐴 ≤ Φ† (𝑌 ), and by applying 2. of Lemma 2.14. The equality holds by
definition of the adjoint map Φ† ; see (2.2.182). The last inequality follows from the
assumption that 𝑋 is primal feasible, so that we have Φ(𝑋) ≤ 𝐵, and by applying
2. of Lemma 2.14. Since the inequality holds for all primal feasible 𝑋 and for all
dual feasible 𝑌 , we can take a supremum over the left-hand side of (2.4.6) and an
infimum over the right-hand side of (2.4.6), and we thus arrive at the weak duality
inequality in (2.4.5). ■
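Weak duality can be illustrated on a small SDP whose primal and dual values are known in closed form. The sketch below assumes the particular choice $\Phi(X) = \operatorname{Tr}[X]$ and $B = 1$, made for this example, so that $\Phi^\dagger(y) = y \mathbb{1}$ and the dual variable is a non-negative scalar $y$ with $y \mathbb{1} \geq A$. It uses only NumPy and does not call an SDP solver.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
A = (G + G.conj().T) / 2                 # Hermitian objective operator

eigvals, eigvecs = np.linalg.eigh(A)     # eigenvalues in ascending order
lam_max = eigvals[-1]

# A primal feasible point: a density operator has X >= 0 and Tr[X] = 1 <= B.
H = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
X = H @ H.conj().T
X = X / np.trace(X).real
primal_value = np.trace(A @ X).real      # Tr[A X]

# A dual feasible point: a scalar y >= 0 with y * 1 >= A.
y = max(lam_max, 0.0) + 0.5
dual_value = y                           # Tr[B Y] with B = 1

# Candidate optimal points: the top eigenprojector (or zero) and y = max(lam_max, 0).
v = eigvecs[:, -1]
X_opt = np.outer(v, v.conj()) if lam_max > 0 else np.zeros((d, d))
primal_opt = np.trace(A @ X_opt).real
dual_opt = max(lam_max, 0.0)

print(primal_value, dual_value, primal_opt, dual_opt)
```

Every feasible primal value lies below every feasible dual value, as in (2.4.6), and for this SDP the two candidate optimal values coincide.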

There is a deep connection between the weak duality inequality in Proposi-

tion 2.27 and the max-min inequality in (2.3.14). This is realized by introducing
the Lagrangian L(Φ, 𝐴, 𝐵, 𝑋, 𝑌 ) for the SDP as follows:

L(Φ, 𝐴, 𝐵, 𝑋, 𝑌 ) B Tr[ 𝐴𝑋] + Tr[𝐵𝑌 ] − Tr[Φ(𝑋)𝑌 ]. (2.4.7)

Note that the following equalities hold, which are helpful in the discussion below:

L(Φ, 𝐴, 𝐵, 𝑋, 𝑌 ) = Tr[ 𝐴𝑋] + Tr[(𝐵 − Φ(𝑋))𝑌 ] (2.4.8)

= Tr[𝐵𝑌 ] + Tr[( 𝐴 − Φ† (𝑌 )) 𝑋]. (2.4.9)

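By first taking an infimum over $Y \geq 0$ and then a supremum over $X \geq 0$, we
find that
\[ \sup_{X \geq 0} \inf_{Y \geq 0} \mathcal{L}(\Phi, A, B, X, Y) = S(\Phi, A, B). \tag{2.4.10} \]
This equality follows because
\[ \sup_{X \geq 0} \inf_{Y \geq 0} \mathcal{L}(\Phi, A, B, X, Y) = \sup_{X \geq 0} \left\{ \operatorname{Tr}[AX] + \inf_{Y \geq 0} \operatorname{Tr}[(B - \Phi(X))Y] \right\}. \tag{2.4.11} \]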

The inner infimum with respect to 𝑌 ≥ 0 forces the outer optimization to be with
respect to every feasible point 𝑋 for the primal SDP in (2.4.3). In this sense,
the variable 𝑌 can be thought of as a “Lagrange multiplier”, analogous to Lagrange
multipliers that are used in constrained optimization problems in elementary
calculus. Indeed, suppose that an infeasible 𝑋 ≥ 0 is chosen, meaning that the
constraint Φ(𝑋) ≤ 𝐵 is violated. This means that there exists a non-trivial negative
eigenspace of 𝐵 − Φ(𝑋). Let |𝜑⟩ be a unit vector in this negative eigenspace.
We can then pick 𝑌 = 𝑐|𝜑⟩⟨𝜑| for 𝑐 > 0 and take the limit 𝑐 → ∞, so that
inf𝑌 ≥0 Tr[(𝐵 − Φ(𝑋))𝑌 ] = −∞. So a violation of the constraint Φ(𝑋) ≤ 𝐵 incurs
an infinite cost for the outer optimization with respect to 𝑋 ≥ 0, which is suboptimal.
The constraint Φ(𝑋) ≤ 𝐵 is therefore forced to be satisfied, leading to the equality
in (2.4.10).
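The penalty mechanism described above can be made concrete. The following sketch assumes, purely for illustration, that $\Phi$ is the identity superoperator and $B = \mathbb{1}$ on a two-dimensional space, and it picks an $X \geq 0$ that violates $\Phi(X) \leq B$. It then evaluates $\operatorname{Tr}[(B - \Phi(X))Y]$ for $Y = c|\varphi\rangle\langle\varphi|$ with growing $c$.

```python
import numpy as np

B = np.eye(2)
X = np.diag([2.0, 0.5])          # X >= 0, but Φ(X) = X is not bounded above by B
gap = B - X                      # B - Φ(X), which has a negative eigenvalue

w, vecs = np.linalg.eigh(gap)    # eigenvalues in ascending order
phi = vecs[:, 0]                 # unit vector in the negative eigenspace

# Tr[(B - Φ(X)) Y] for Y = c |phi><phi| decreases without bound as c grows.
cs = (1.0, 10.0, 100.0, 1000.0)
penalties = [np.trace(gap @ (c * np.outer(phi, phi.conj()))).real for c in cs]
print(w[0], penalties)
```

Each penalty equals $c$ times the negative eigenvalue, so the inner infimum over $Y \geq 0$ is $-\infty$ for this infeasible $X$.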

If we instead take a supremum over 𝑋 ≥ 0 first and then take an infimum over
𝑌 ≥ 0, it follows that
\[ \inf_{Y \geq 0} \sup_{X \geq 0} \mathcal{L}(\Phi, A, B, X, Y) = \widehat{S}(\Phi, A, B). \tag{2.4.12} \]
This time, the equality follows because
\[ \inf_{Y \geq 0} \sup_{X \geq 0} \mathcal{L}(\Phi, A, B, X, Y) = \inf_{Y \geq 0} \left\{ \operatorname{Tr}[BY] + \sup_{X \geq 0} \operatorname{Tr}[(A - \Phi^\dagger(Y))X] \right\}. \tag{2.4.13} \]
Similar to what was argued previously, the inner optimization variable $X$ is a
Lagrange multiplier that forces the outer optimization to be with respect to every
feasible point 𝑌 for the dual SDP in (2.4.4). Indeed, suppose that an infeasible
𝑌 ≥ 0 is chosen, meaning that the constraint Φ† (𝑌 ) ≥ 𝐴 is violated. This means
that 𝐴 − Φ† (𝑌 ) has a non-trivial positive eigenspace. Let |𝜑⟩ be a unit vector in
this positive eigenspace. We can then pick 𝑋 = 𝑐|𝜑⟩⟨𝜑| for 𝑐 > 0 and take the limit
𝑐 → ∞, so that sup 𝑋 ≥0 Tr[( 𝐴 − Φ† (𝑌 )) 𝑋] = +∞. So a violation of the constraint
Φ† (𝑌 ) ≥ 𝐴 incurs an infinite penalty for the outer optimization with respect to
𝑌 ≥ 0, which is suboptimal. The constraint Φ† (𝑌 ) ≥ 𝐴 is therefore forced to be
satisfied, leading to the equality in (2.4.12).
Now, by examining (2.3.14), (2.4.10), and (2.4.12), we see that the weak duality
inequality in Proposition 2.27 can be understood as a consequence of the max-min
inequality in (2.3.14).
The inequality opposite to the one in (2.4.5) does not hold in general; if it does,
it implies that $S(\Phi, A, B) = \widehat{S}(\Phi, A, B)$. We then say that the SDP corresponding
to Φ, 𝐴, and 𝐵 has the strong duality property, or that it satisfies strong duality.
Considering the discussion above in terms of the Lagrangian of the SDP, we also
can understand strong duality as being equivalent to a minimax theorem holding.

Theorem 2.28 Slater’s Condition

Slater’s condition is a sufficient condition for strong duality to hold, and it is
given as follows:
1. If there exists $X \geq 0$ such that $\Phi(X) \leq B$ and there exists $Y > 0$ such that
$\Phi^\dagger(Y) > A$, then $S(\Phi, A, B) = \widehat{S}(\Phi, A, B)$. Furthermore, there exists a
primal feasible operator $X$ for which $\operatorname{Tr}[AX] = S(\Phi, A, B)$.
2. If there exists $Y \geq 0$ such that $\Phi^\dagger(Y) \geq A$ and there exists $X > 0$ such
that $\Phi(X) < B$, then $S(\Phi, A, B) = \widehat{S}(\Phi, A, B)$. Furthermore, there exists
a dual feasible operator $Y$ for which $\operatorname{Tr}[BY] = \widehat{S}(\Phi, A, B)$.

Remark: The nomenclature Slater’s “condition” (rather than “conditions”) is commonly used,
but note that one can check either one of the two sufficient conditions above to determine if
strong duality holds.

For many SDPs of interest, it is straightforward to determine if Slater’s condition
holds. We provide an example in Section 2.4.1.
Complementary slackness for SDPs is useful for understanding further con-
straints on an optimal primal operator 𝑋 and an optimal dual operator 𝑌 .

Proposition 2.29 Complementary Slackness of SDPs

Consider an arbitrary SDP corresponding to Φ, 𝐴, and 𝐵, and suppose that
strong duality holds. Then the following complementary slackness conditions
hold for feasible 𝑋 and 𝑌 if and only if they are optimal:

𝑌 𝐵 = 𝑌 Φ(𝑋), (2.4.14)
Φ† (𝑌 ) 𝑋 = 𝐴𝑋. (2.4.15)

Proof: On the one hand, suppose that 𝑋 is primal feasible, that 𝑌 is dual feasible,
and that they satisfy (2.4.14)–(2.4.15). Then it is clear by inspecting (2.4.6) that
the inequalities are saturated, thus implying that 𝑋 is primal optimal and 𝑌 is dual
optimal.
On the other hand, suppose that 𝑋 is primal optimal and that 𝑌 is dual optimal.
Then, by this assumption, it follows that Tr[ 𝐴𝑋] = Tr[𝐵𝑌 ] so that the inequalities
in (2.4.6) are saturated. This means that
Tr[(Φ† (𝑌 ) − 𝐴) 𝑋] = 0 (2.4.16)
Tr[𝑌 (𝐵 − Φ(𝑋))] = 0. (2.4.17)

Since $\Phi^\dagger(Y) - A$ and $X$ are positive semi-definite, the equality in (2.4.16) implies that $(\Phi^\dagger(Y) - A)X = 0$, which is equivalent to (2.4.15). Similarly, since $B - \Phi(X)$ and $Y$ are positive semi-definite, the equality in (2.4.17) implies that $Y(B - \Phi(X)) = 0$, which is equivalent to (2.4.14). ■
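The step used twice in the proof above, namely that a vanishing trace of a product of two positive semi-definite operators forces the product itself to vanish, can be spelled out as follows. This is a standard linear-algebra argument included for convenience; the symbols $P$ and $Q$ are placeholders introduced here for arbitrary positive semi-definite operators and are not notation from the surrounding text.

```latex
% Assumption: P, Q >= 0 (positive semi-definite) and Tr[PQ] = 0.
\begin{align}
0 = \operatorname{Tr}[PQ]
  &= \operatorname{Tr}\!\big[P^{1/2} Q^{1/2} Q^{1/2} P^{1/2}\big] \\
  &= \operatorname{Tr}\!\big[(Q^{1/2} P^{1/2})^\dagger (Q^{1/2} P^{1/2})\big]
   = \big\lVert Q^{1/2} P^{1/2} \big\rVert_2^2 .
\end{align}
% A norm vanishes only on the zero operator, so Q^{1/2} P^{1/2} = 0, and hence
\begin{equation}
QP = Q^{1/2}\big(Q^{1/2} P^{1/2}\big)P^{1/2} = 0,
\qquad
PQ = (QP)^\dagger = 0 .
\end{equation}
```

Taking $P = \Phi^\dagger(Y) - A$ and $Q = X$ recovers the first implication in the proof, and taking $P = Y$ and $Q = B - \Phi(X)$ recovers the second.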

If the matrices 𝐴 and 𝐵 and the map Φ involved in an SDP are of reasonable size,
then the SDP can be computed efficiently using numerical solvers (specifically, the
time required is polynomial in the size of these objects and polynomial in the inverse
of the numerical accuracy desired). As mentioned earlier, SDPs arise frequently in
quantum information, with some examples appearing in Chapter 6. Furthermore,
SDPs appear in some of the upper bounds for rates of quantum communication
protocols that we consider in Parts II and III.

Exercise 2.28
Consider the following pair of primal and dual optimization problems:

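\[
\begin{array}{llcll}
\text{maximize} & \operatorname{Tr}[CZ] & \qquad & \text{minimize} & \operatorname{Tr}[DW] \\
\text{subject to} & \Psi(Z) = D, & & \text{subject to} & \Psi^\dagger(W) \geq C, \\
 & Z \geq 0 & & & W \text{ Hermitian},
\end{array}
\tag{2.4.18}
\]

where $C$ and $D$ are Hermitian operators and $\Psi$ is a Hermiticity-preserving superoperator. Let us show that these problems are SDPs, i.e., that they are equivalent to the optimization problems in (2.4.1) and (2.4.2).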
1. Given 𝐶, 𝐷, and Ψ for the optimization problems in (2.4.18), find 𝐴,
𝐵, and Φ such that these optimization problems can be expressed in the
forms presented in (2.4.1) and (2.4.2). (Hint: Start by using the fact that
Ψ(𝑍) = 𝐷 if and only if Ψ(𝑍) ≤ 𝐷 and −Ψ(𝑍) ≤ −𝐷.)
2. Conversely, given 𝐴, 𝐵, and Φ for the SDPs in (2.4.1) and (2.4.2), find 𝐶,
𝐷, and Ψ such that those SDPs can be expressed in the forms in (2.4.18).
(Hint: Start by using the fact that Φ(𝑋) ≤ 𝐵 if and only if there exists
𝑆 ≥ 0 such that Φ(𝑋) + 𝑆 = 𝐵.)

Reasoning analogous to that in Exercise 2.28 can be used to show that the
following pair of optimization problems are also SDPs, equivalent to the ones in
(2.4.1) and (2.4.2):

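\[
\begin{array}{llcll}
\text{minimize} & \operatorname{Tr}[CZ] & \qquad & \text{maximize} & \operatorname{Tr}[DW] \\
\text{subject to} & \Psi(Z) = D, & & \text{subject to} & \Psi^\dagger(W) \leq C, \\
 & Z \geq 0 & & & W \text{ Hermitian}.
\end{array}
\tag{2.4.19}
\]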

Exercise 2.29
1. Consider the following SDP in primal form:

\[
\sup_{X \geq 0} \left\{ \operatorname{Tr}[AX] : \Phi_1(X) \leq B_1,\ \Phi_2(X) = B_2 \right\},
\tag{2.4.20}
\]

where $A$, $B_1$, $B_2$ are Hermitian operators and $\Phi_1$, $\Phi_2$ are Hermiticity-preserving superoperators. Show that the dual SDP is given by

\[
\inf_{\substack{Y_1 \geq 0, \\ Y_2 \text{ Hermitian}}} \left\{ \operatorname{Tr}[B_1 Y_1] + \operatorname{Tr}[B_2 Y_2] : \Phi_1^\dagger(Y_1) + \Phi_2^\dagger(Y_2) \geq A \right\}.
\tag{2.4.21}
\]

Furthermore, evaluate Slater’s conditions for strong duality, as well as the conditions for complementary slackness.
2. Now suppose that the primal SDP has the form

\[
\inf_{Y \geq 0} \left\{ \operatorname{Tr}[BY] : \Phi_1(Y) \geq A_1,\ \Phi_2(Y) = A_2 \right\},
\tag{2.4.22}
\]

where $A_1$, $A_2$, $B$ are Hermitian operators and $\Phi_1$, $\Phi_2$ are Hermiticity-preserving superoperators. Show that the dual SDP is given by

\[
\sup_{\substack{X_1 \geq 0, \\ X_2 \text{ Hermitian}}} \left\{ \operatorname{Tr}[A_1 X_1] + \operatorname{Tr}[A_2 X_2] : \Phi_1^\dagger(X_1) + \Phi_2^\dagger(X_2) \leq B \right\}.
\tag{2.4.23}
\]

Furthermore, evaluate Slater’s conditions for strong duality, as well as the conditions for complementary slackness.

2.4.1 SDPs for Spectral and Trace Norm, Maximum and Minimum Eigenvalue

In this section, we provide semi-definite programs for calculating the spectral and
trace norms of Hermitian operators, as well as their largest and smallest eigenvalues.


Theorem 2.30 SDPs for the Spectral Norm of Hermitian Operators

Let $H$ be a Hermitian operator, and consider the following functions:

\[
f(H) := \sup_{X_1, X_2 \geq 0} \left\{ \operatorname{Tr}[H(X_1 - X_2)] : \operatorname{Tr}[X_1 + X_2] \leq 1 \right\},
\tag{2.4.24}
\]
\[
\widehat{f}(H) := \inf_{t \geq 0} \left\{ t : -t\mathbb{1} \leq H \leq t\mathbb{1} \right\}.
\tag{2.4.25}
\]

The quantities above can be computed via SDPs, and in fact, the following equality holds:
\[
f(H) = \widehat{f}(H) = \lVert H \rVert_\infty.
\tag{2.4.26}
\]
That is, $f(H)$ is equal to the largest singular value of the Hermitian operator $H$.

Proof: Given that the optimization in (2.4.24) is a maximization, let us first show that (2.4.24) can be written in the form of $S(\Phi, A, B)$ in (2.4.3). Indeed, if we let
\[
X = \begin{pmatrix} X_1 & Z^\dagger \\ Z & X_2 \end{pmatrix}, \qquad
A = \begin{pmatrix} H & 0 \\ 0 & -H \end{pmatrix},
\tag{2.4.27}
\]
\[
\Phi(X) = \operatorname{Tr}[X_1 + X_2], \qquad B = 1,
\tag{2.4.28}
\]
then we have that
\[
f(H) = \sup_{X \geq 0} \left\{ \operatorname{Tr}[AX] : \Phi(X) \leq B \right\}.
\tag{2.4.29}
\]
The constraint $X \geq 0$ implies that $X_1, X_2 \geq 0$. Furthermore, notice that the operator $Z$ appears neither in the objective function $\operatorname{Tr}[H(X_1 - X_2)]$ nor in the constraint $\operatorname{Tr}[X_1 + X_2] \leq 1$. Thus, the operator $Z$ plays no role in the optimization, and so we can simply set $Z = 0$, so that
\[
X = \begin{pmatrix} X_1 & 0 \\ 0 & X_2 \end{pmatrix}.
\tag{2.4.30}
\]
Thus, (2.4.24) is indeed an SDP in primal form.
Now, recall from (2.2.124) that the spectral norm of 𝐻 is given by the maximum
of the absolute values of the eigenvalues of 𝐻. In particular, we can write
∥𝐻 ∥ ∞ = max {|𝜆 max | , |𝜆 min |} , (2.4.31)
where 𝜆 max and 𝜆 min are the maximum and minimum eigenvalues, respectively,
of 𝐻. Note that we always have 𝜆 max ≥ 𝜆 min . Let |𝜙max ⟩ be an eigenvector of 𝐻
satisfying 𝐻|𝜙max ⟩ = 𝜆 max |𝜙max ⟩, and let |𝜙min ⟩ be an eigenvector of 𝐻 satisfying
𝐻|𝜙min ⟩ = 𝜆 min |𝜙min ⟩. Let us suppose at first that 𝜆 max ≥ 0. Then one feasible
choice of 𝑋1 and 𝑋2 in (2.4.24) is 𝑋1 = |𝜙max ⟩⟨𝜙max | and 𝑋2 = 0, and for this
choice, we find that 𝑓 (𝐻) ≥ 𝜆 max = |𝜆 max |. If 𝜆 max ≤ 0, then another feasible
choice of 𝑋1 and 𝑋2 in (2.4.24) is 𝑋1 = 0 and 𝑋2 = |𝜙max ⟩⟨𝜙max |, and for this choice,
we find that 𝑓 (𝐻) ≥ −𝜆 max = |𝜆 max |. Therefore, we conclude that

𝑓 (𝐻) ≥ |𝜆 max | . (2.4.32)

Now, suppose that 𝜆 min ≥ 0. Then a feasible choice of 𝑋1 and 𝑋2 in (2.4.24) is

𝑋1 = |𝜙min ⟩⟨𝜙min | and 𝑋2 = 0, and for this choice, we find that 𝑓 (𝐻) ≥ 𝜆 min = |𝜆 min |.
If 𝜆 min ≤ 0, then another feasible choice of 𝑋1 and 𝑋2 in (2.4.24) is 𝑋1 = 0 and
𝑋2 = |𝜙min ⟩⟨𝜙min |, and for this choice, we find that 𝑓 (𝐻) ≥ −𝜆 min = |𝜆min |.
Therefore, we conclude that

𝑓 (𝐻) ≥ max {|𝜆 max | , |𝜆min |} = ∥𝐻 ∥ ∞ . (2.4.33)

It now remains to prove the reverse inequality, namely, the inequality $f(H) \leq \lVert H \rVert_\infty$. To prove this, let us show that $\widehat{f}(H)$, as defined in (2.4.25), is given by the SDP dual to the one that defines $f(H)$. In order to do this, we should determine the map $\Phi^\dagger$, which is the adjoint of $\Phi$. Since $B = 1$ and $\Phi(X) = \operatorname{Tr}[X_1 + X_2]$ are scalars, we take $Y = t$ to be a scalar also. Then, we find that

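\begin{align}
\operatorname{Tr}[Y \Phi(X)] &= t \operatorname{Tr}[X_1 + X_2] \tag{2.4.34} \\
&= \operatorname{Tr}\!\left[ \begin{pmatrix} t\mathbb{1} & 0 \\ 0 & t\mathbb{1} \end{pmatrix} \begin{pmatrix} X_1 & 0 \\ 0 & X_2 \end{pmatrix} \right] \tag{2.4.35} \\
&= \operatorname{Tr}[\Phi^\dagger(Y) X], \tag{2.4.36}
\end{align}

from which we conclude that

\[
\Phi^\dagger(Y) = \Phi^\dagger(t) = \begin{pmatrix} t\mathbb{1} & 0 \\ 0 & t\mathbb{1} \end{pmatrix}.
\tag{2.4.37}
\]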

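Plugging this into the standard form of the dual in (2.4.4), we find that
\begin{align}
\inf_{Y \geq 0} \left\{ \operatorname{Tr}[BY] : \Phi^\dagger(Y) \geq A \right\}
&= \inf_{t \geq 0} \left\{ t : \begin{pmatrix} t\mathbb{1} & 0 \\ 0 & t\mathbb{1} \end{pmatrix} \geq \begin{pmatrix} H & 0 \\ 0 & -H \end{pmatrix} \right\} \tag{2.4.38} \\
&= \inf_{t \geq 0} \left\{ t : t\mathbb{1} \geq H,\ t\mathbb{1} \geq -H \right\} \tag{2.4.39} \\
&= \inf_{t \geq 0} \left\{ t : -t\mathbb{1} \leq H \leq t\mathbb{1} \right\} \tag{2.4.40} \\
&= \widehat{f}(H). \tag{2.4.41}
\end{align}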

Let us now recall property 3. of Lemma 2.14, which states that $\lambda_{\min} \mathbb{1} \leq H \leq \lambda_{\max} \mathbb{1}$. By combining with (2.4.31), we find that $\lambda_{\max} \mathbb{1} \leq \lVert H \rVert_\infty \mathbb{1}$ and $\lambda_{\min} \mathbb{1} \geq -\lVert H \rVert_\infty \mathbb{1}$, which implies that
\[
-\lVert H \rVert_\infty \mathbb{1} \leq H \leq \lVert H \rVert_\infty \mathbb{1}.
\tag{2.4.42}
\]
Thus, we see that $\lVert H \rVert_\infty$ is a feasible choice for $t$ in (2.4.40), which implies that
\[
\widehat{f}(H) \leq \lVert H \rVert_\infty.
\tag{2.4.43}
\]

Now, combining the inequalities in (2.4.33) and (2.4.43) gives us $f(H) \geq \lVert H \rVert_\infty \geq \widehat{f}(H)$. Then, using the weak duality inequality from Proposition 2.27, which for our case implies that $f(H) \leq \widehat{f}(H)$, we conclude that the primal and dual optimal values are equal to each other and equal to the spectral norm of $H$: $f(H) = \widehat{f}(H) = \lVert H \rVert_\infty$. ■
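The feasible points used in the proof above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `spectral_norm_certificates` and the example matrix are hypothetical choices made for illustration. It builds the primal point from the eigenvector whose eigenvalue has the largest absolute value, takes the dual point to be that absolute value, and confirms that the two objective values coincide.

```python
import numpy as np

def spectral_norm_certificates(H):
    """Return (primal value, dual value, dual feasibility flag) for the
    feasible points used in the proof of Theorem 2.30."""
    evals, evecs = np.linalg.eigh(H)            # eigenvalues in ascending order
    lam_min, lam_max = evals[0], evals[-1]
    d = H.shape[0]
    # Pick the eigenvector whose eigenvalue has the larger absolute value.
    k = d - 1 if abs(lam_max) >= abs(lam_min) else 0
    phi = evecs[:, k]
    proj = np.outer(phi, phi.conj())
    zero = np.zeros((d, d))
    # Put the projector in X1 for a non-negative eigenvalue, in X2 otherwise.
    X1, X2 = (proj, zero) if evals[k] >= 0 else (zero, proj)
    primal = np.trace(H @ (X1 - X2)).real       # objective of (2.4.24)
    t = max(abs(lam_max), abs(lam_min))         # candidate point for (2.4.25)
    eye = np.eye(d)
    # Dual feasibility: t*1 - H >= 0 and t*1 + H >= 0 (up to rounding).
    dual_feasible = bool(np.linalg.eigvalsh(t * eye - H).min() >= -1e-9
                         and np.linalg.eigvalsh(t * eye + H).min() >= -1e-9)
    return primal, t, dual_feasible

# Hypothetical example: a Hermitian matrix with eigenvalues 2 and -4.
H = np.array([[1, 2 - 1j], [2 + 1j, -3]])
primal, dual, ok = spectral_norm_certificates(H)
print(primal, dual, ok)
```

For this example both values should agree with the spectral norm, which is 4, up to floating-point rounding. Matching primal and dual values certify optimality by weak duality, which is the same logic as in the proof.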

We proved (2.4.26) by employing clever guesses for primal feasible and dual feasible points. Doing so is possible in this case because the problem is simple enough to begin with, and we could apply knowledge from linear algebra to make these clever guesses. Although it is sometimes possible to make clever guesses and arrive at analytical solutions like we did above, in many cases it is not possible. In such cases, it can be helpful to check Slater’s condition in Theorem 2.28 explicitly in order to see if strong duality holds. So let us do so for the SDPs corresponding to $f(H)$ and $\widehat{f}(H)$. For the primal SDP in (2.4.24), a strictly feasible point consists of the choice $X_1 = \alpha \mathbb{1}_d$ and $X_2 = \beta \mathbb{1}_d$ such that $\alpha, \beta > 0$ and $\alpha + \beta < 1/d$, where $d$ is the dimension of the underlying Hilbert space. Then we clearly have $X_1 > 0$, $X_2 > 0$, and $\operatorname{Tr}[X_1 + X_2] = (\alpha + \beta) d < 1$, so that $X_1$ and $X_2$ are strictly feasible, as claimed. A feasible point for the dual consists of the choice $t \geq \lVert H \rVert_\infty$. Thus, by the second condition in Theorem 2.28, strong duality holds, further confirming that $f(H) = \widehat{f}(H)$, as shown above.
We now remark about the complementary slackness conditions from Proposition 2.29 for the SDPs corresponding to $f(H)$ and $\widehat{f}(H)$, which apply to optimal primal $X$ and optimal dual $Y$. In this case, the conditions reduce to
\[
t = t \operatorname{Tr}[X_1 + X_2],
\tag{2.4.44}
\]
\[
\begin{pmatrix} t\mathbb{1} & 0 \\ 0 & t\mathbb{1} \end{pmatrix} \begin{pmatrix} X_1 & 0 \\ 0 & X_2 \end{pmatrix}
= \begin{pmatrix} H & 0 \\ 0 & -H \end{pmatrix} \begin{pmatrix} X_1 & 0 \\ 0 & X_2 \end{pmatrix},
\tag{2.4.45}
\]
and the latter is the same as the following two separate conditions:

\[
t X_1 = H X_1, \qquad -t X_2 = H X_2.
\tag{2.4.46}
\]

If we have prior knowledge about the operator $H$ (e.g., that one of its eigenvalues is non-zero), then we conclude that the optimal $t \neq 0$, and the condition in (2.4.44) implies that $\operatorname{Tr}[X_1 + X_2] = 1$. In this case, the inequality constraint in (2.4.24) is saturated at the optimum, and it suffices to optimize over $X_1$ and $X_2$ satisfying the constraint with equality. The conditions in (2.4.46) indicate that the image of the optimal $X_1$ should be in the eigenspace of $H$ with eigenvalue $t$, and the image of the optimal $X_2$ should be in the eigenspace of $H$ with eigenvalue $-t$. Observe that these complementary slackness conditions are consistent with the choices that we made above.
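The complementary slackness conditions in (2.4.44) and (2.4.46) can also be evaluated directly for candidate points. The sketch below assumes NumPy; the helper name `slackness_residuals` and the diagonal example are hypothetical and chosen so that both $+t$ and $-t$ are eigenvalues of $H$. It reports how far a candidate pair $(X_1, X_2)$ is from satisfying each condition.

```python
import numpy as np

def slackness_residuals(H, t, X1, X2):
    """Residuals of (2.4.44) and of the two conditions in (2.4.46)."""
    r_trace = abs(t - t * np.trace(X1 + X2).real)   # t = t Tr[X1 + X2]
    r_plus = np.linalg.norm(t * X1 - H @ X1)        # t X1 = H X1
    r_minus = np.linalg.norm(-t * X2 - H @ X2)      # -t X2 = H X2
    return r_trace, r_plus, r_minus

# Hypothetical example: eigenvalues +2 and -2 both attain the spectral norm.
H = np.diag([2.0, -2.0, 1.0])
t = np.linalg.norm(H, 2)                            # optimal dual value

e0 = np.diag([1.0, 0.0, 0.0])                       # projector onto eigenvalue +2
e1 = np.diag([0.0, 1.0, 0.0])                       # projector onto eigenvalue -2
e2 = np.diag([0.0, 0.0, 1.0])                       # projector onto eigenvalue +1

p = 0.3                                             # any weight in [0, 1] works
optimal = slackness_residuals(H, t, p * e0, (1 - p) * e1)
suboptimal = slackness_residuals(H, t, e2, np.zeros((3, 3)))
print(optimal, suboptimal)
```

The first candidate has its support in the eigenspaces with eigenvalues $+t$ and $-t$, so all residuals vanish. The second candidate is feasible but supported on the eigenvalue $1$, and the condition $t X_1 = H X_1$ fails, which flags it as suboptimal.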
As a final remark, if 𝐻 is actually positive semi-definite, then the lower bound
constraint in (2.4.25) is unnecessary. Letting 𝑃 be a positive semi-definite operator,
we thus find that
\[
f(P) = \lVert P \rVert_\infty = \inf_{t \geq 0} \left\{ t : P \leq t\mathbb{1} \right\}.
\tag{2.4.47}
\]

Note that, in this case, ∥𝑃∥ ∞ is the largest eigenvalue of 𝑃.

Exercise 2.30 SDPs for the Trace Norm of Hermitian Operators

1. Let $H$ be a Hermitian operator. Like the spectral norm of $H$, as shown above, prove that the trace norm of $H$ can also be computed using an SDP. Specifically, prove that

\[
\lVert H \rVert_1 = \sup_{\Lambda_1, \Lambda_2 \geq 0} \left\{ \operatorname{Tr}[H(\Lambda_1 - \Lambda_2)] : \Lambda_1, \Lambda_2 \leq \mathbb{1} \right\}.
\tag{2.4.48}
\]

(Hint: Use (2.2.67) and (2.2.128).)

2. Show that an alternate SDP formulation for $\lVert H \rVert_1$ is

\[
\lVert H \rVert_1 = \inf_{Y_1, Y_2 \geq 0} \left\{ \operatorname{Tr}[Y_1 + Y_2] : Y_1 \geq H,\ Y_2 \geq -H \right\}.
\tag{2.4.49}
\]

(Hint: Show that the SDP in (2.4.49) is dual to the one in (2.4.48), and then prove strong duality.)


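Exercise 2.31 SDPs for the Maximum and Minimum Eigenvalue of Hermitian Operators
Let $H$ be a Hermitian operator. Prove that the maximum and minimum eigenvalues of $H$, denoted by $\lambda_{\max}(H)$ and $\lambda_{\min}(H)$, respectively, have the following SDP characterizations:

\begin{align}
\lambda_{\min}(H) &= \inf_{\rho \geq 0} \left\{ \operatorname{Tr}[H\rho] : \operatorname{Tr}[\rho] = 1 \right\} \tag{2.4.50} \\
&= \sup_{t \in \mathbb{R}} \left\{ t : H \geq t\mathbb{1} \right\} \tag{2.4.51}
\end{align}

and

\begin{align}
\lambda_{\max}(H) &= \sup_{\rho \geq 0} \left\{ \operatorname{Tr}[H\rho] : \operatorname{Tr}[\rho] = 1 \right\} \tag{2.4.52} \\
&= \inf_{t \in \mathbb{R}} \left\{ t : t\mathbb{1} \geq H \right\}. \tag{2.4.53}
\end{align}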

(Hint: Use the spectral theorem (Theorem 2.4) and the duality of SDPs.)

2.5 Symmetric Subspace

Given a $d$-dimensional Hilbert space $\mathcal{H}$ and an $n$-fold tensor product $\mathcal{H}^{\otimes n}$ of $\mathcal{H}$, for $n \geq 2$, it is often important to consider a permutation of the individual Hilbert spaces in the $n$-fold tensor product. This is especially the case in quantum information theory, because the resources involved often have permutation symmetry, either by assumption or as a matter of fact. These permutations can be implemented using a unitary representation of the symmetric group on $n$ elements.
The symmetric group on 𝑛 elements, denoted by S𝑛 , is defined to be the set
of permutations of the set {1, 2, . . . , 𝑛}. A permutation in S𝑛 is an invertible
function 𝜋 : {1, 2, . . . , 𝑛} → {1, 2, . . . , 𝑛} that describes how each element in the
set {1, 2, . . . , 𝑛} should be rearranged, or permuted. An example of a permutation
in S3 is the function 𝜋 such that 𝜋(1) = 3, 𝜋(2) = 1, and 𝜋(3) = 2. Since there
are 𝑛! ways to permute 𝑛 distinct elements, it follows that the set S𝑛 contains 𝑛!
elements.
Given a permutation $\pi \in \mathcal{S}_n$ and an orthonormal basis $\{|i\rangle\}_{i=0}^{d-1}$ for $\mathcal{H}$, we define the unitary permutation operators $W^\pi$ acting on $\mathcal{H}^{\otimes n}$ by

\[
W^\pi |i_1, i_2, \ldots, i_n\rangle = |i_{\pi(1)}, i_{\pi(2)}, \ldots, i_{\pi(n)}\rangle, \qquad 0 \leq i_1, i_2, \ldots, i_n \leq d - 1.
\tag{2.5.1}
\]

Since the set $\{|i_1, i_2, \ldots, i_n\rangle : 0 \leq i_1, i_2, \ldots, i_n \leq d - 1\}$ is an orthonormal basis for $\mathcal{H}^{\otimes n}$, the definition in (2.5.1) extends to every vector in $\mathcal{H}^{\otimes n}$ by linearity. The operators in the set $\{W^\pi\}_{\pi \in \mathcal{S}_n}$ constitute a unitary representation of $\mathcal{S}_n$, in the sense that
\[
(W^\pi)^\dagger = W^{\pi^{-1}}, \qquad W^{\pi_1} W^{\pi_2} = W^{\pi_1 \circ \pi_2}
\tag{2.5.2}
\]
for all $\pi, \pi_1, \pi_2 \in \mathcal{S}_n$.
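The definition in (2.5.1) translates directly into a permutation matrix. The following is a minimal sketch, assuming NumPy; the helper names `perm_operator` and `inverse` are hypothetical, and permutations are stored 0-based as tuples, so that the example $\pi(1) = 3$, $\pi(2) = 1$, $\pi(3) = 2$ from the text becomes `(2, 0, 1)`. The sketch checks unitarity and the first identity in (2.5.2).

```python
import itertools
import numpy as np

def perm_operator(pi, d):
    """Matrix of W^pi on n copies of C^d, following (2.5.1).

    pi is a 0-based tuple: the k-th output slot holds the input index
    sitting at position pi[k]."""
    n = len(pi)
    dims = (d,) * n
    W = np.zeros((d**n, d**n))
    for idx in itertools.product(range(d), repeat=n):
        out = tuple(idx[pi[k]] for k in range(n))
        col = np.ravel_multi_index(idx, dims)   # column: input basis vector
        row = np.ravel_multi_index(out, dims)   # row: permuted basis vector
        W[row, col] = 1.0
    return W

def inverse(pi):
    """Inverse permutation of a 0-based tuple."""
    inv = [0] * len(pi)
    for k, v in enumerate(pi):
        inv[v] = k
    return tuple(inv)

pi = (2, 0, 1)          # 0-based form of pi(1)=3, pi(2)=1, pi(3)=2
W = perm_operator(pi, d=2)

is_unitary = np.allclose(W.T @ W, np.eye(8))
adjoint_is_inverse = np.array_equal(W.T, perm_operator(inverse(pi), d=2))
print(is_unitary, adjoint_is_inverse)
```

Because $W^\pi$ has real entries, its adjoint is simply its transpose, which is why `W.T` is used above. The matrices grow as $d^n \times d^n$, so this explicit construction is only practical for small $d$ and $n$.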
Given a 𝑑-dimensional Hilbert space H and the unitary representation {𝑊 𝜋 } 𝜋∈S𝑛
of S𝑛 defined in (2.5.1), we are interested in the subspace of vectors |𝜓⟩ ∈ H ⊗𝑛
that are invariant under permutations, i.e., 𝑊 𝜋 |𝜓⟩ = |𝜓⟩ for all 𝜋 ∈ S𝑛 . We call
this subspace the symmetric subspace of H ⊗𝑛 , and it is formally defined as

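\[
\operatorname{Sym}_n(\mathcal{H}) := \operatorname{span}\{|\psi\rangle \in \mathcal{H}^{\otimes n} : W^\pi |\psi\rangle = |\psi\rangle \text{ for all } \pi \in \mathcal{S}_n\}.
\tag{2.5.3}
\]

A vector $|\psi\rangle \in \operatorname{Sym}_n(\mathcal{H})$ is sometimes called symmetric. The subspace

\[
\operatorname{ASym}_n(\mathcal{H}) := \operatorname{span}\{|\psi\rangle \in \mathcal{H}^{\otimes n} : W^\pi |\psi\rangle = \operatorname{sgn}(\pi)|\psi\rangle \text{ for all } \pi \in \mathcal{S}_n\}
\tag{2.5.4}
\]

is called the anti-symmetric subspace of $\mathcal{H}^{\otimes n}$, where $\operatorname{sgn}(\pi)$ is the sign of the permutation $\pi$, defined as $\operatorname{sgn}(\pi) = (-1)^{T(\pi)}$, where $T(\pi)$ is the number of transpositions into which $\pi$ can be decomposed$^5$.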
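The operator
\[
\Pi_{\operatorname{Sym}_n(\mathcal{H})} := \frac{1}{n!} \sum_{\pi \in \mathcal{S}_n} W^\pi
\tag{2.5.5}
\]

is the orthogonal projection onto the symmetric subspace of $\mathcal{H}^{\otimes n}$, while

\[
\Pi_{\operatorname{ASym}_n(\mathcal{H})} := \frac{1}{n!} \sum_{\pi \in \mathcal{S}_n} \operatorname{sgn}(\pi) W^\pi
\tag{2.5.6}
\]

is the orthogonal projection onto the anti-symmetric subspace of $\mathcal{H}^{\otimes n}$.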
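For small $d$ and $n$, the two projections in (2.5.5) and (2.5.6) can be assembled explicitly. The sketch below assumes NumPy; the helper names `perm_operator`, `sign`, and `projectors` are hypothetical, permutations are 0-based tuples, and the sign is computed by counting inversions, which has the same parity as the number of transpositions. It verifies idempotence and mutual orthogonality numerically.

```python
import itertools
import math
import numpy as np

def perm_operator(pi, d):
    """Matrix of W^pi on n copies of C^d, following (2.5.1); pi is 0-based."""
    n = len(pi)
    dims = (d,) * n
    W = np.zeros((d**n, d**n))
    for idx in itertools.product(range(d), repeat=n):
        out = tuple(idx[pi[k]] for k in range(n))
        W[np.ravel_multi_index(out, dims), np.ravel_multi_index(idx, dims)] = 1.0
    return W

def sign(pi):
    """Sign of a permutation, via the parity of its number of inversions."""
    n = len(pi)
    inversions = sum(1 for a in range(n) for b in range(a + 1, n) if pi[a] > pi[b])
    return -1 if inversions % 2 else 1

def projectors(d, n):
    """Return (Pi_Sym, Pi_ASym) as in (2.5.5) and (2.5.6)."""
    perms = list(itertools.permutations(range(n)))
    sym = sum(perm_operator(p, d) for p in perms) / math.factorial(n)
    asym = sum(sign(p) * perm_operator(p, d) for p in perms) / math.factorial(n)
    return sym, asym

sym, asym = projectors(d=3, n=2)
print(np.allclose(sym @ sym, sym))      # idempotent
print(np.allclose(asym @ asym, asym))   # idempotent
print(np.allclose(sym @ asym, 0))       # orthogonal subspaces
print(round(np.trace(sym)))             # rank of the symmetric projection
```

The trace of a projection equals the dimension of the subspace it projects onto, so the last line gives the dimension of the symmetric subspace for the chosen $d$ and $n$.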

5 A transposition is a permutation that permutes only two elements of the set $\{1, 2, \ldots, n\}$. Any permutation $\pi \in \mathcal{S}_n$ can be decomposed into a product of transpositions. Although this decomposition is in general not unique, the parity of the number $T(\pi)$ of transpositions into which $\pi$ can be decomposed is unique, so that $\operatorname{sgn}(\pi)$ is well defined.


Exercise 2.32
1. Prove that ΠSym𝑛 (H) and ΠASym𝑛 (H) are projections, as claimed above.
2. Prove that Sym𝑛 (H) and ASym𝑛 (H) are orthogonal subspaces of H ⊗𝑛 by
showing that
ΠSym𝑛 (H) ΠASym𝑛 (H) = 0. (2.5.7)
This implies that ⟨𝜓 𝑠 |𝜓𝑎 ⟩ = 0 for all |𝜓 𝑠 ⟩ ∈ Sym𝑛 (H) and |𝜓𝑎 ⟩ ∈
ASym𝑛 (H).

Exercise 2.33
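Let $\mathcal{H}$ be a $d$-dimensional Hilbert space, $d \geq 2$. Show that, for $n = 2$,
\[
\Pi_{\operatorname{Sym}_2(\mathcal{H})} = \frac{1}{2}(\mathbb{1}_d \otimes \mathbb{1}_d + F),
\tag{2.5.8}
\]
\[
\Pi_{\operatorname{ASym}_2(\mathcal{H})} = \frac{1}{2}(\mathbb{1}_d \otimes \mathbb{1}_d - F),
\tag{2.5.9}
\]
where $F := W^\pi$ is the representation of the permutation $\pi = (1\ 2)$, i.e.,
\[
F = \sum_{k,k'=0}^{d-1} |k\rangle\langle k'| \otimes |k'\rangle\langle k|.
\tag{2.5.10}
\]

In quantum information theory, $F$ is referred to as the swap operator.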

We focus primarily on the symmetric subspace of $\mathcal{H}^{\otimes n}$ in this book, and so we now provide some additional facts about it.
The following set of vectors constitutes an orthonormal basis for the symmetric subspace $\operatorname{Sym}_n(\mathcal{H})$ corresponding to the $d$-dimensional Hilbert space $\mathcal{H}$:
\[
|n_1, n_2, \ldots, n_d\rangle := \frac{1}{\sqrt{n! \prod_{j=1}^{d} n_j!}} \sum_{\pi \in \mathcal{S}_n} W^\pi \left( |0\rangle^{\otimes n_1} \otimes |1\rangle^{\otimes n_2} \otimes \cdots \otimes |d-1\rangle^{\otimes n_d} \right),
\tag{2.5.11}
\]
where $n_1, n_2, \ldots, n_d \geq 0$ are such that $\sum_{j=1}^{d} n_j = n$. We often call this the occupation number basis for $\operatorname{Sym}_n(\mathcal{H})$. The reason for this name is that, physically, each of the $n$ Hilbert spaces $\mathcal{H}$ corresponds to a quantum system, and each $n_j$ tells us how many of the $n$ quantum systems are in the state given by $|j-1\rangle$. (We formally draw the correspondence between Hilbert spaces and quantum systems in Chapter 3.) The number of elements in this basis is equal to the number of ways of selecting $n$ elements, with repetition, from a set of $d$ distinct elements. This number is equal to $\binom{d+n-1}{n}$. Consequently, the dimension of $\operatorname{Sym}_n(\mathcal{H})$ is

\[
\dim(\operatorname{Sym}_n(\mathcal{H})) = \binom{d+n-1}{n} = \binom{d+n-1}{d-1}.
\tag{2.5.12}
\]
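The counting argument behind (2.5.12) can be reproduced by listing the occupation numbers explicitly. This is a small sketch using only the Python standard library; the function name `occupation_numbers` is a hypothetical choice. Each way of selecting $n$ basis labels with repetition is converted to a tuple $(n_1, \ldots, n_d)$, and the number of tuples is compared with the binomial coefficient.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb

def occupation_numbers(d, n):
    """All tuples (n_1, ..., n_d) of non-negative integers summing to n."""
    result = []
    # Each multiset of n labels drawn from {0, ..., d-1} gives one tuple.
    for labels in combinations_with_replacement(range(d), n):
        counts = Counter(labels)
        result.append(tuple(counts.get(j, 0) for j in range(d)))
    return result

for d, n in [(2, 2), (3, 2), (2, 3), (4, 3)]:
    basis_labels = occupation_numbers(d, n)
    # The count matches both forms of the binomial coefficient in (2.5.12).
    assert len(basis_labels) == comb(d + n - 1, n) == comb(d + n - 1, d - 1)

print(occupation_numbers(2, 2))   # [(2, 0), (1, 1), (0, 2)]
```

For two qubits ($d = 2$, $n = 2$) the three tuples label the three symmetric basis vectors, in agreement with the dimension $\binom{3}{2} = 3$.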

Exercise 2.34
Let $d \geq 2$ and $n = 2$. Show that the basis elements $|n_1, n_2, \ldots, n_d\rangle$ of $\operatorname{Sym}_2(\mathbb{C}^d)$ are given as follows:

\[
|n_1, n_2, \ldots, n_d\rangle = |j-1, j-1\rangle,
\tag{2.5.13}
\]

if $n_j = 2$, $n_\ell = 0\ \forall \ell \neq j$, and

\[
|n_1, n_2, \ldots, n_d\rangle = \frac{1}{\sqrt{2}} \left( |j-1, k-1\rangle + |k-1, j-1\rangle \right),
\tag{2.5.14}
\]
if $n_j = n_k = 1$, $k \neq j$ and $n_\ell = 0\ \forall \ell \neq j, k$.

Remark: The direct sum vector space

\[
\mathcal{F}_B(\mathcal{H}) := \bigoplus_{n=0}^{\infty} \operatorname{Sym}_n(\mathcal{H})
\tag{2.5.15}
\]

is called the bosonic Fock space. (Note that $\operatorname{Sym}_0(\mathcal{H})$ is the set of complex scalars, i.e., $\operatorname{Sym}_0(\mathcal{H}) = \mathbb{C}$.) It is an infinite-dimensional Hilbert space that is relevant for the study of quantum optical and other continuous-variable quantum systems.

An important fact that we state without proof (please consult the Bibliographic Notes in Section 2.6) is that for every $d$-dimensional Hilbert space $\mathcal{H}$,
\[
\Pi_{\operatorname{Sym}_n(\mathcal{H})} = \binom{d+n-1}{n} \int \psi^{\otimes n} \, \mathrm{d}\psi,
\tag{2.5.16}
\]
where the integral on the right-hand side is taken with respect to the Haar measure over all unit vectors.
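The identity in (2.5.16) can be probed with a Monte Carlo estimate. The sketch below assumes NumPy and reads $\psi$ as the projector $|\psi\rangle\langle\psi|$; the function name `haar_average_sym_projector`, the sample size, and the seed are hypothetical choices. It relies on the standard fact that normalizing a vector of independent complex Gaussian entries yields a uniformly distributed unit vector, and it compares the estimate for $d = 2$, $n = 2$ against the expression $(\mathbb{1} \otimes \mathbb{1} + F)/2$ from Exercise 2.33.

```python
from math import comb
import numpy as np

def haar_average_sym_projector(d, n, samples, seed=0):
    """Monte Carlo estimate of the right-hand side of (2.5.16)."""
    rng = np.random.default_rng(seed)
    g = rng.normal(size=(samples, d)) + 1j * rng.normal(size=(samples, d))
    psi = g / np.linalg.norm(g, axis=1, keepdims=True)   # random unit vectors
    # Build the n-fold tensor power of every sampled vector.
    psi_n = psi
    for _ in range(n - 1):
        psi_n = np.einsum('si,sj->sij', psi_n, psi).reshape(samples, -1)
    # Average of |psi^n><psi^n| over the samples.
    avg = np.einsum('si,sj->ij', psi_n, psi_n.conj()) / samples
    return comb(d + n - 1, n) * avg

# Exact projection for d = 2, n = 2: (1 + F) / 2 with F the swap operator.
F = np.eye(4)[[0, 2, 1, 3]]
exact = (np.eye(4) + F) / 2

estimate = haar_average_sym_projector(d=2, n=2, samples=20000)
max_error = np.abs(estimate - exact).max()
print(max_error)
```

The deviation shrinks roughly like the inverse square root of the number of samples, so this is a sanity check rather than a proof; the exact value of `max_error` depends on the seed.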


Remark: The measure $\mathrm{d}\psi$ is also called the Fubini-Study measure. A concrete coordinate representation of the measure can be obtained by using the following parameterization of every unit vector $|\psi\rangle$ in a $d$-dimensional Hilbert space $\mathcal{H}$:
\[
|\psi\rangle = \sum_{k=0}^{d-1} r_k \mathrm{e}^{\mathrm{i}\varphi_k} |k\rangle,
\tag{2.5.17}
\]

where $0 \leq \varphi_k \leq 2\pi$ and $r_k \geq 0$ for all $0 \leq k \leq d - 1$. Furthermore, since $|\psi\rangle$ is a unit vector, we require that $\sum_{k=0}^{d-1} r_k^2 = 1$. The conditions on the coefficients $r_k$ imply that they parameterize the positive octant of a sphere in $d$ dimensions. As such, each $r_k$ can be written as
\[
r_0 = \prod_{k=1}^{d-1} \sin\frac{\theta_k}{2},
\tag{2.5.18}
\]
\[
r_m = \cos\frac{\theta_m}{2} \prod_{k=m+1}^{d-1} \sin\frac{\theta_k}{2}, \qquad 1 \leq m \leq d - 2,
\tag{2.5.19}
\]
\[
r_{d-1} = \cos\frac{\theta_{d-1}}{2},
\tag{2.5.20}
\]
where $0 \leq \theta_i \leq \pi$. Similarly, the angles $\varphi_k$ parameterize a torus in $d$ dimensions. The Fubini-Study measure $\mathrm{d}\psi$ is then the volume element of the coordinate system formed from the $r_k$ and the coordinate system formed by the $\varphi_k$:
\[
\mathrm{d}\psi = \frac{(d-1)!}{(2\pi)^{d-1}} \prod_{i=1}^{d-1} \cos\frac{\theta_i}{2} \sin^{2i-1}\frac{\theta_i}{2} \, \mathrm{d}\theta_i \, \mathrm{d}\varphi_i.
\tag{2.5.21}
\]

(Please consult the Bibliographic Notes in Section 2.6 for details.) In the case $d = 2$, we have that
\[
\mathrm{d}\psi = \frac{1}{2\pi} \cos\frac{\theta_1}{2} \sin\frac{\theta_1}{2} \, \mathrm{d}\theta_1 \, \mathrm{d}\varphi_1 \qquad (d = 2).
\tag{2.5.22}
\]

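We often consider the case that the Hilbert space $\mathcal{H}$ is a tensor product of two Hilbert spaces, i.e., $\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_B \equiv \mathcal{H}_{AB}$, with $\mathcal{H}_A$ a $d_A$-dimensional Hilbert space and $\mathcal{H}_B$ a $d_B$-dimensional Hilbert space. As we have seen above, if $\{|i\rangle_A\}_{i=0}^{d_A-1}$ is an orthonormal basis for $\mathcal{H}_A$ and $\{|j\rangle_B\}_{j=0}^{d_B-1}$ is an orthonormal basis for $\mathcal{H}_B$, then $\{|i,j\rangle_{AB} \equiv |i\rangle_A \otimes |j\rangle_B : 0 \leq i \leq d_A - 1,\ 0 \leq j \leq d_B - 1\}$ is an orthonormal basis for $\mathcal{H}_{AB}$. In this case, if we consider the $n$-fold tensor product $\mathcal{H}_{AB}^{\otimes n}$, then the unitary representation $\{W^\pi_{(AB)^n}\}_{\pi \in \mathcal{S}_n}$ defined in (2.5.1) acts as follows:
\[
W^\pi_{(AB)^n} \left( |i_1, j_1\rangle_{A_1 B_1} \otimes \cdots \otimes |i_n, j_n\rangle_{A_n B_n} \right) = |i_{\pi(1)}, j_{\pi(1)}\rangle_{A_1 B_1} \otimes \cdots \otimes |i_{\pi(n)}, j_{\pi(n)}\rangle_{A_n B_n}
\tag{2.5.23}
\]
for all $0 \leq i_1, i_2, \ldots, i_n \leq d_A - 1$ and all $0 \leq j_1, j_2, \ldots, j_n \leq d_B - 1$. However, by rearranging the tensor factors, we find that the right-hand side of the above equation can be written as

\[
W^\pi_{A^n} |i_1, i_2, \ldots, i_n\rangle_{A_1 \cdots A_n} \otimes W^\pi_{B^n} |j_1, j_2, \ldots, j_n\rangle_{B_1 B_2 \cdots B_n},
\tag{2.5.24}
\]

where $\{W^\pi_{A^n}\}_{\pi \in \mathcal{S}_n}$ and $\{W^\pi_{B^n}\}_{\pi \in \mathcal{S}_n}$ are the unitary representations of $\mathcal{S}_n$ acting on $\mathcal{H}_A^{\otimes n}$ and $\mathcal{H}_B^{\otimes n}$, respectively. We can thus write the projection onto $\operatorname{Sym}_n(\mathcal{H}_A \otimes \mathcal{H}_B)$ as
\[
\Pi_{\operatorname{Sym}_n(\mathcal{H}_A \otimes \mathcal{H}_B)} = \frac{1}{n!} \sum_{\pi \in \mathcal{S}_n} W^\pi_{(AB)^n} \equiv \frac{1}{n!} \sum_{\pi \in \mathcal{S}_n} W^\pi_{A^n} \otimes W^\pi_{B^n}.
\tag{2.5.25}
\]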

2.6 Bibliographic Notes

The study of inner product spaces, including Hilbert spaces, is the primary focus of
functional analysis, for which we refer to the following books: (Reed and Simon,
1981; Kreyszig, 1989; Hall, 2013). In the case of finite-dimensional Hilbert spaces,
which is what we consider throughout this book, many of the concepts studied in
functional analysis reduce to those studied in linear algebra and matrix analysis. For
these topics, we refer to Bhatia (1997); Horn and Johnson (2013); Strang (2016).
The generalized Gell-Mann matrices discussed after (2.2.47) were presented by
Hioe and Eberly (1981); Bertlmann and Krammer (2008).
A review of operator monotone, operator concave, and operator convex functions
is given by Bhatia (1997). The short course of Carlen (2010) is also helpful. For
proofs of the properties listed immediately after Definition 2.13, see (Bhatia, 1997,
Chapter V).
The proof of (2.2.94) follows immediately from (Bhatia, 1997, Problem III.6.2).
A proof of (2.2.98), and therefore, of the Hölder inequality in (2.2.99), can be
found in (Bhatia, 1997, Section IV & Exercise IV.2.12).
Lemma 2.11 can be found in (Audenaert and Eisert, 2005, Lemma 4). A
proof of Proposition 2.8 can be found in Müller-Lennert et al. (2013, Lemma 12).
Lemma 2.15 was proved by Lieb and Thirring (1976); Araki (1990). The Courant–
Fischer–Weyl minimax principle, which is invoked in the proof of property 4 of
Lemma 2.14, is presented in (Bhatia, 1997, Corollary III.1.2).


A proof of the operator Jensen inequality (Theorem 2.16) was given by Hansen
and Pedersen (2003). In presenting the implication 1. ⇒ 3. of Theorem 2.16, we
followed the proof given by Fujii et al. (2004, Theorem 3).
The notation ∥·∥⋄ for the quantity on the right-hand side of (2.2.187) was
introduced by Kitaev (1997), and it is known as a completely bounded trace norm
in the mathematics literature; see, for example, (Paulsen, 2003). The result in
(2.2.188) is due to Smith (1983) (see Theorem 2.10 therein), but it can also be
found in (Kitaev, 1997; Aharonov et al., 1998). For a proof of (2.2.190), see
Theorem 3.51 in (Watrous, 2018), which also contains several more properties of
the diamond norm.
For an introduction to real analysis, see (Rudin, 1976).
For an introduction to convex analysis, see (Rockafellar, 1970; Boyd and
Vandenberghe, 2004), and for a proof of the Fenchel–Eggleston–Carathéodory
theorem, see (Eggleston, 1958; Rockafellar, 1970).
Sion’s minimax theorem (Theorem 2.24) is due to Sion (1958), and it is a
generalization of a minimax theorem of von Neumann (1928). A short proof of
Sion’s minimax theorem can be found in (Komiya, 1988). The minimax theorem in
Theorem 2.25 was presented by Mosonyi and Hiai (2011).
For an introduction to probability theory, see (Feller, 1968; Ross, 2019). Proofs
of Markov’s inequality (2.3.20) and Jensen’s inequality (2.3.21) can be found in,
e.g., (Fristedt and Gray, 1997).
For further details on semi-definite programming, see Vandenberghe and Boyd
(1996); Watrous (2018). Various polynomial-time algorithms for solving semi-
definite programs were developed by Khachiyan (1980); Arora et al. (2005); Arora
and Kale (2007); Arora et al. (2012); Lee et al. (2015). A proof of Slater’s Theorem
(Theorem 2.28) can be found in (Boyd and Vandenberghe, 2004, Section 5.3.2).
For further details about the symmetric subspace of a tensor product of finite-
dimensional Hilbert spaces, as well as for a proof of (2.5.16), see (Harrow, 2013)
(see also Bengtsson and Zyczkowski (2017, Section 12.7)). Further details about
the Fubini-Study measure d𝜓 introduced in (2.5.16) and elaborated upon in the
remark immediately below it may be found in (Bengtsson and Zyczkowski, 2017,
Chapter 4).


2.7 Problems
1. Prove that a linear operator 𝑋 ∈ L(H) is positive semi-definite if and only if it can be
written as 𝑋 = 𝑌 †𝑌 for some 𝑌 ∈ L(H, H′).

2. Prove that the columns of every isometry form an orthonormal set of vectors. Similarly,
prove that the rows and columns of every unitary operator form orthonormal sets of
vectors. (Hint: Consider using the expressions in (2.2.11).)

3. Let 𝑋 ∈ L(H 𝐴 ) and 𝑌 ∈ L(H𝐵 ) be normal operators, and consider their so-called
Kronecker sum:
\[
X \oplus_{\mathrm{K}} Y := X \otimes \mathbb{1}_B + \mathbb{1}_A \otimes Y.
\tag{2.7.1}
\]
Prove that spec(𝑋 ⊕K 𝑌 ) = {𝜆 + 𝜇 : 𝜆 ∈ spec(𝑋), 𝜇 ∈ spec(𝑌 )}. Also prove that the
associated eigenvectors are of the form |𝜓⟩ ⊗ |𝜙⟩, where |𝜓⟩ is an eigenvector of 𝑋 and
|𝜙⟩ is an eigenvector of 𝑌 .

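4. The Hadamard product, also known as the Schur product, of two linear operators $X, Y \in \mathrm{L}(\mathbb{C}^d)$, with $d \geq 2$, is defined to be the element-wise product of $X$ and $Y$: if $X = \sum_{i,j=0}^{d-1} X_{i,j} |i\rangle\langle j|$ and $Y = \sum_{i,j=0}^{d-1} Y_{i,j} |i\rangle\langle j|$, then

\[
X * Y := \sum_{i,j=0}^{d-1} X_{i,j} Y_{i,j} |i\rangle\langle j|.
\tag{2.7.2}
\]

(a) Verify that for all $|\psi\rangle, |\phi\rangle \in \mathbb{C}^d$,

\[
\langle\psi| X * Y |\phi\rangle = \operatorname{Tr}\!\left[ X^{\mathsf{T}} \operatorname{diag}(\langle\psi|)\, Y \operatorname{diag}(|\phi\rangle) \right],
\tag{2.7.3}
\]
where for $|\psi\rangle = \sum_{i=0}^{d-1} \alpha_i |i\rangle$ and $|\phi\rangle = \sum_{j=0}^{d-1} \beta_j |j\rangle$,

\[
\operatorname{diag}(\langle\psi|) := \sum_{i=0}^{d-1} \overline{\alpha_i}\, |i\rangle\langle i|, \qquad
\operatorname{diag}(|\phi\rangle) := \sum_{j=0}^{d-1} \beta_j |j\rangle\langle j|.
\tag{2.7.4}
\]

(b) Prove that for all $|\psi\rangle, |\phi\rangle \in \mathbb{C}^d$,

\[
|\psi\rangle\langle\psi| * |\phi\rangle\langle\phi| = (|\psi\rangle * |\phi\rangle)(\langle\psi| * \langle\phi|).
\tag{2.7.5}
\]

(c) Prove that the Hadamard product of two positive semi-definite operators is positive semi-definite.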

5. Let $\{|\psi_j\rangle\}_{j=1}^{d}$ be a set of $d$ linearly independent vectors in $\mathbb{C}^d$, with $d \geq 2$. By definition, this means that, for all $c_1, c_2, \ldots, c_d \in \mathbb{C}$, the equation $c_1 |\psi_1\rangle + c_2 |\psi_2\rangle + \cdots + c_d |\psi_d\rangle = 0$ implies $c_1 = c_2 = \cdots = c_d = 0$.
(a) Let
\[
T := \sum_{j=1}^{d} |\psi_j\rangle\langle j-1|.
\tag{2.7.6}
\]

The operator $T$ can be thought of as a $d \times d$ matrix whose columns are given by the vectors $|\psi_j\rangle$. Prove that $T$ is invertible. (Hint: First prove that $T$ is injective, by showing that its kernel contains only the zero vector. Then use the result of Exercise 2.4.)

(b) Using (a), prove that $\{|\psi_j\rangle\}_{j=1}^{d}$ is a basis for $\mathbb{C}^d$. In other words, prove that every vector $|\phi\rangle \in \mathbb{C}^d$ can be written as a unique linear combination of the vectors $|\psi_j\rangle$.
We thus have that every set of $d$ linearly independent vectors in $\mathbb{C}^d$ is a basis for $\mathbb{C}^d$.

By combining this result with the result of Exercise 2.2, we have that a linearly independent set $\{|\psi_j\rangle\}_{j=1}^{d}$ of vectors in $\mathbb{C}^d$ is an orthonormal basis if and only if $\sum_{j=1}^{d} |\psi_j\rangle\langle\psi_j| = \mathbb{1}_d$.

6. Let $\{B_j\}_{j=1}^{d^2}$ be an orthonormal basis for $\mathrm{L}(\mathbb{C}^d)$, with $d \geq 2$.
(a) Prove that
\[
\sum_{j=1}^{d^2} \overline{B_j} \otimes B_j = \Gamma_d,
\tag{2.7.7}
\]
where we recall that $\Gamma_d = |\Gamma_d\rangle\langle\Gamma_d| = \sum_{i,j=0}^{d-1} |i,i\rangle\langle j,j|$; see (2.2.34). Similarly, prove that
\[
\sum_{j=1}^{d^2} B_j^\dagger \otimes B_j = F,
\tag{2.7.8}
\]
where we recall that $F = \sum_{i,j=0}^{d-1} |i,j\rangle\langle j,i|$; see (2.5.10).
(Hint: Start by verifying that $\{\overline{B_j}\}_{j=1}^{d^2}$ is an orthonormal basis for $\mathrm{L}(\mathbb{C}^d)$. Then, use the fact that every linear operator $Z \in \mathrm{L}(\mathbb{C}^d \otimes \mathbb{C}^d)$ can be written as $Z = \sum_{j,k=1}^{d^2} c_{j,k} \overline{B_j} \otimes B_k$ for some coefficients $c_{j,k} \in \mathbb{C}$.)

(b) Prove that for all $X \in \mathrm{L}(\mathbb{C}^d)$,

\[
\sum_{j=1}^{d^2} B_j X B_j^\dagger = \operatorname{Tr}[X]\, \mathbb{1}_d.
\tag{2.7.9}
\]

(Hint: Use (2.7.7), along with the identities in (2.2.40)–(2.2.42).)

7. For all 𝑑 ≥ 2, construct a basis for L(C𝑑 ) that consists entirely of density operators.
(Hint: Consider using the eigenvectors of the orthonormal basis of Hermitian operators
defined in (2.2.45)–(2.2.47).)

8. Let $\{|\psi_j\rangle\}_{j=1}^{d}$ be a set of linearly independent, normalized, but non-orthogonal vectors in $\mathbb{C}^d$, with $d \geq 2$. We would like to transform these vectors into a new set $\{|\phi_j\rangle\}_{j=1}^{d}$ of orthonormal vectors via an invertible linear operator $X$, such that $|\phi_j\rangle = X |\psi_j\rangle$ for all $j \in \{1, 2, \ldots, d\}$.
(a) Prove that the operator $S$ defined as
\[
S := \sum_{j=1}^{d} |\psi_j\rangle\langle\psi_j|
\tag{2.7.10}
\]

is invertible and positive definite. (Hint: Write $S$ in terms of the operator $T$ defined in (2.7.6).)

(b) Let
\[
|\phi_j\rangle := S^{-\frac{1}{2}} |\psi_j\rangle
\tag{2.7.11}
\]
for all $j \in \{1, 2, \ldots, d\}$. Prove that $\{|\phi_j\rangle\}_{j=1}^{d}$ is an orthonormal basis for $\mathbb{C}^d$. (Hint: See problem 5.(c).) Also, prove that $\langle\phi_i|\psi_j\rangle = \langle i-1| G^{\frac{1}{2}} |j-1\rangle$ for all $i, j \in \{1, 2, \ldots, d\}$, where $G := T^\dagger T$ and $T := \sum_{j=1}^{d} |\psi_j\rangle\langle j-1|$.

(c) Let us now show that the vectors defined in (2.7.11) are optimal with respect to the Euclidean norm, in the following sense:
\begin{align}
&\inf_{X} \left\{ \sum_{j=1}^{d} \big\lVert |\psi_j\rangle - |\phi_j\rangle \big\rVert_2^2 : |\phi_j\rangle = X |\psi_j\rangle,\ \langle\phi_i|\phi_j\rangle = \delta_{i,j}\ \forall\, 1 \leq i, j \leq d \right\} \tag{2.7.12} \\
&\qquad = \sum_{j=1}^{d} \big\lVert |\psi_j\rangle - S^{-\frac{1}{2}} |\psi_j\rangle \big\rVert_2^2, \tag{2.7.13}
\end{align}

where the optimization in (2.7.12) is with respect to invertible linear operators $X$.

i. Prove that solving the optimization problem given by (2.7.12) can be reduced to solving the optimization problem given by

\[
\sup_{X} \left\{ \operatorname{Tr}[(X + X^\dagger) S] : X S X^\dagger = \mathbb{1}_d \right\}.
\tag{2.7.14}
\]

ii. Prove that the constraint $X S X^\dagger = \mathbb{1}_d$ in (2.7.14) implies $X = U S^{-\frac{1}{2}}$, where $U$ is a unitary operator. (Hint: Consider a polar decomposition of $X$; see Theorem 2.3.)

Hence, show that the optimization problem given by (2.7.14) is equivalent to

\[
\sup_{U} \operatorname{Re}\!\left( \operatorname{Tr}[U S^{\frac{1}{2}}] \right),
\tag{2.7.15}
\]

where the optimization is with respect to unitary operators $U$ acting on $\mathbb{C}^d$.

iii. Prove that the solution to the optimization problem given by (2.7.15) is $U = \mathbb{1}_d$, implying that the optimal $X$ in (2.7.12) is indeed $S^{-\frac{1}{2}}$. (Hint: Use Proposition 2.10.)
(Bibliographic Note: The vectors $|\phi_j\rangle$ defined in (2.7.11) are known as the symmetric orthogonalization of the original vectors $|\psi_j\rangle$, and this construction is attributed to Löwdin (1950); see also (Löwdin, 1970). An alternate proof of the optimality of this construction, as worked out in part (c) of this problem, can be found in (Mayer, 2002).)

9. For the case 𝑑 = 2 and 𝑛 = 2, verify the equalities given by (2.5.5) and (2.5.16) by
making use of (2.5.22).

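10. Prove that the right-hand side of (2.5.5) is indeed the projection onto $\operatorname{Sym}_n(\mathcal{H})$ by showing that
\[
\sum_{\substack{n_1, n_2, \ldots, n_d \geq 0, \\ \sum_{j=1}^{d} n_j = n}} |n_1, n_2, \ldots, n_d\rangle\langle n_1, n_2, \ldots, n_d| = \Pi_{\operatorname{Sym}_n(\mathcal{H})}.
\tag{2.7.16}
\]
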
Chapter 3

Quantum States and Measurements
In the previous chapter, we studied several important topics in mathematics that
collectively form one foundational piece for the study of quantum information
processing. Another foundational piece is quantum mechanics, and in this and the
following chapter, we provide an overview of it, placing particular emphasis on
those aspects of it that are useful for the communication protocols that we discuss
in later chapters. Many aspects of quantum mechanics cannot be explained by
classical reasoning. For example, there is no strong classical analogue for pure
quantum states or entanglement, and this leads to stark differences between what
is possible in the classical and quantum worlds. However, at the same time, it is
important to emphasize that all of classical information theory is subsumed by
quantum information theory, so that whatever is possible with classical information
processing is also possible with quantum information processing. Interestingly as
well, quantum information processing allows for richer possibilities, with protocols
such as quantum teleportation and super-dense coding.

3.1 Axioms of Quantum Mechanics

The mathematical description of quantum systems can be summarized by the
following axioms. Each of these axioms is elaborated upon in the section indicated.
1. Quantum systems: A quantum system 𝐴 is associated with a Hilbert space H 𝐴 .
The state of the system 𝐴 is described by a density operator, which is a unit-trace,
positive semi-definite linear operator acting on H 𝐴 . (See Section 3.2.)
2. Bipartite quantum systems: For distinct quantum systems 𝐴 and 𝐵 with
associated Hilbert spaces H 𝐴 and H𝐵 , the composite system 𝐴𝐵 is associated
with the Hilbert space H 𝐴 ⊗ H𝐵 . (See Section 3.2.1.)
3. Measurement: The measurement of a quantum system $A$ is described by a positive operator-valued measure (POVM) $\{M_x\}_{x \in \mathcal{X}}$, which is defined to be a collection of positive semi-definite operators satisfying $\sum_{x \in \mathcal{X}} M_x = \mathbb{1}_{\mathcal{H}_A}$, where $\mathcal{X}$ is a finite alphabet$^1$. If the system is in the state $\rho$ and the measurement outcome is described by a random variable $X$, then the probability $\Pr[X = x]$ of obtaining the outcome $x$ is given by the Born rule as
\[
\Pr[X = x] = \operatorname{Tr}[M_x \rho].
\tag{3.1.1}
\]
Furthermore, a physical observable $O$ corresponds to a Hermitian operator acting on the underlying Hilbert space. Recall from the spectral theorem (Theorem 2.4) that $O$ has a spectral decomposition as follows:
\[
O = \sum_{\lambda \in \operatorname{spec}(O)} \lambda \Pi_\lambda,
\tag{3.1.2}
\]
where $\operatorname{spec}(O)$ is the set of distinct eigenvalues of $O$ and $\Pi_\lambda$ is a spectral projection. A measurement of $O$ is described by the POVM $\{\Pi_\lambda\}_\lambda$, which is indexed by the distinct eigenvalues $\lambda$ of $O$. The expected value $\langle O \rangle_\rho$ of the observable $O$ when the state is $\rho$ is given by
\[
\langle O \rangle_\rho := \operatorname{Tr}[O \rho] = \sum_{\lambda \in \operatorname{spec}(O)} \lambda \operatorname{Tr}[\Pi_\lambda \rho].
\tag{3.1.3}
\]
(See Section 3.3.)

4. Evolution: The evolution of the state of a quantum system is described by a
quantum channel, which is a linear, completely positive, and trace-preserving
map acting on the state of the system. (See Chapter 4.)
Note that the second axiom for the description of bipartite quantum systems is
sufficient to conclude that the multipartite quantum system 𝐴1 𝐴2 · · · 𝐴 𝑘 , comprising
𝑘 distinct quantum systems 𝐴1 , 𝐴2 , . . . , 𝐴 𝑘 , is associated with the Hilbert space
H 𝐴1 ⊗ H 𝐴2 ⊗ · · · ⊗ H 𝐴 𝑘 .
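As a numerical illustration of the measurement axiom, the following minimal sketch evaluates the Born rule (3.1.1) and the expectation value (3.1.3) for a single qubit. The choice of state (the vector (|0⟩ + |1⟩)/√2), the observable (the Pauli operator 𝑍), the use of NumPy, and all variable names are assumptions made only for this example.

```python
import numpy as np

# Assumed example state: rho = |+><+|, a unit-trace, positive semi-definite operator.
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(plus, plus.conj())

# Assumed example observable: O = Z, with eigenvalues +1 and -1
# and spectral projections |0><0| and |1><1|.
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
eigvals = [1.0, -1.0]
projectors = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

# Born rule (3.1.1): Pr[X = x] = Tr[M_x rho].
probs = [np.trace(P @ rho).real for P in projectors]

# Expectation value (3.1.3), computed directly and via the spectral decomposition.
expect_direct = np.trace(Z @ rho).real
expect_spectral = sum(lam * p for lam, p in zip(eigvals, probs))

print(probs, expect_direct, expect_spectral)
```

Both outcomes are equally likely for this state, so the two ways of computing the expectation value agree and vanish.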
¹ POVMs need not contain a finite number of elements, but we consider POVMs with a finite
number of elements exclusively throughout this book.


3.2 Quantum Systems and States

Each quantum system is associated with a Hilbert space. In this book, we consider
only finite-dimensional quantum systems, that is, quantum systems described by
finite-dimensional Hilbert spaces. In the following, we provide a mathematical
description of several finite-dimensional quantum systems, along with examples of
how these systems can be physically realized.

1. Qubit systems: The qubit is perhaps the most fundamental quantum system and
is the quantum analogue of the (classical) bit. Every physical system with two
distinct degrees of freedom obeying the laws of quantum mechanics can be
considered a qubit system. The Hilbert space associated with a qubit system is
$\mathbb{C}^2$, whose standard orthonormal basis is denoted by {|0⟩, |1⟩}. Three common
ways of physically realizing qubit systems are as follows:
(a) The two spin states of a spin-1/2 particle.
(b) Two distinct energy levels of an atom, such as the ground state and one of
the excited states.
(c) Clockwise and counter-clockwise directions of current flow in a supercon-
ducting electronic circuit.
2. Qutrit systems: A qutrit system is a quantum system consisting of three distinct
physical degrees of freedom. The Hilbert space of a qutrit is $\mathbb{C}^3$, with the
standard orthonormal basis denoted by {|0⟩, |1⟩, |2⟩}. Qutrit systems are less
commonly considered than qubit systems for implementations, although one
important example of an implementation of a qutrit system occurs in quantum
optical systems, which we discuss below. Like qubit systems, qutrit systems
can also be physically realized using, for example, the spin states of a spin-1
atom or three distinct energy levels of an atom.
3. Qudit systems: A qudit system is a quantum system with 𝑑 distinct degrees
of freedom and is described by the Hilbert space $\mathbb{C}^d$, with the standard
orthonormal basis denoted by {|0⟩, |1⟩, . . . , |𝑑 − 1⟩}. The spin states of every
spin-𝑗 atom can be used to realize a qudit system with 𝑑 = 2𝑗 + 1. Another
physical realization of a qudit system is with the 𝑑 distinct energy levels of an
atom.
4. Quantum optical systems: An important quantum system, particularly for the
implementation of many quantum communication protocols, is a quantum

optical system. By a quantum optical system, we mean a physical system,

such as an optical cavity or a fiber-optic cable, in which modes of light, with
photons as information carriers, propagate. A mode of light has a well defined
momentum, frequency, polarization, and spatial direction.
Formally, a quantum optical system with $d$ distinct modes is described by the
Fock space $\mathrm{F}_{\mathrm{B}}(\mathbb{C}^d)$, which is a Hilbert space equipped with the orthonormal
occupation number basis $\{|n_1, \dots, n_d\rangle : n_1, \dots, n_d \geq 0\}$, where $n_j$, for
$j \in \{1, \dots, d\}$, indicates the number of photons occupying mode $j$. See
(2.5.11) and the surrounding discussion for a brief review of the occupation
number basis and Fock space.
The Fock space is infinite dimensional, but by restricting to particular subspaces,
it is possible to use photons to physically realize finite-dimensional quantum
systems. The following two realizations of a qubit system are particularly
important:
(a) A single-mode optical system, with Hilbert space $\mathrm{F}_{\mathrm{B}}(\mathbb{C})$, restricted to
the subspace spanned by the orthonormal vectors {|0⟩, |1⟩}, interpreted
as either zero photons or one photon occupying the mode. The vector |0⟩
corresponding to no photons is commonly called the vacuum state vector
of the mode.
(b) A two-mode optical system, with Hilbert space $\mathrm{F}_{\mathrm{B}}(\mathbb{C}^2)$, restricted to the
subspace spanned by the orthonormal vectors {|0, 1⟩, |1, 0⟩}, consisting
of only one photon in total occupying either one of the two modes. This
realization of a qubit system is commonly called the dual-rail encoding
because it makes use of two modes of light. Two distinct polarization
degrees of freedom of photons, such as horizontal and vertical polarizations,
are commonly used as the two modes in dual-rail encodings of a qubit.
One then usually lets |𝐻⟩ ≡ |0, 1⟩ and |𝑉⟩ ≡ |1, 0⟩ denote a horizontally-
and vertically-polarized photon, respectively.
By considering the three-dimensional subspace spanned by {|0, 0⟩, |0, 1⟩,
|1, 0⟩}, that is, the dual-rail qubit system with the additional orthogonal
vacuum state vector |0, 0⟩ of the two modes, we obtain a physical realization
of a qutrit system. This particular realization of a qutrit system is relevant
for communication protocols in the context of the erasure channel, which
is discussed in Section 4.5.2.

Having discussed how a quantum system is mathematically described, let us

now move on to the mathematical description of the state of a quantum system.

Definition 3.1 Quantum State

The state of a quantum system is described by a density operator acting on
the underlying Hilbert space of the quantum system. A density operator is
a unit-trace, positive semi-definite linear operator. Throughout the book, we
identify a state with its corresponding density operator. We denote the set of
density operators on a Hilbert space H as D(H).

We typically use the Greek letters 𝜌, 𝜎, 𝜏, or 𝜔 to denote quantum states.

Exercise 3.1
Prove that the set of quantum states is a convex set. (Recall the definition of a
convex set from Section 2.3.3.) In other words, prove that for every alphabet X
and set {𝜌 𝑥 }𝑥∈X of quantum states, along with every probability distribution
𝑝 : X → [0, 1], the following convex combination is a quantum state:
\[ \rho = \sum_{x \in \mathcal{X}} p(x) \rho_x. \tag{3.2.1} \]

The extremal points in the convex set of quantum states are called pure states. A
pure state is a rank-one projection onto a unit vector in the Hilbert space. Concretely,
pure states are of the form |𝜓⟩⟨𝜓| where |𝜓⟩ ∈ H is a normalized vector. For
convenience, we sometimes denote |𝜓⟩⟨𝜓| simply as 𝜓, and refer to the unit vector
|𝜓⟩ as a state vector. Since every element of a convex set can be written as a convex
combination of the extremal points in the set, every quantum state 𝜌 that is not a
pure state can be written as
\[ \rho = \sum_{x \in \mathcal{X}} p(x) |\psi_x\rangle\langle\psi_x| \tag{3.2.2} \]
for some set {|𝜓𝑥 ⟩}𝑥∈X of state vectors defined with respect to a finite alphabet X,
where 𝑝 : X → [0, 1] is a probability distribution.

Exercise 3.2
Prove that a quantum state 𝜌 is pure if and only if $\rho^2 = \rho$. More generally, prove
that 𝜌 is pure if and only if $\operatorname{Tr}[\rho^2] = 1$. The quantity $\operatorname{Tr}[\rho^2]$ is known as the
purity of 𝜌.


A state 𝜌 that is not pure is called a mixed state, because it can be thought of as
arising from a lack of knowledge about which pure state from the set $\{|\psi_x\rangle\}_{x \in \mathcal{X}}$ in
(3.2.2) the system has been prepared in. Note that the decomposition in (3.2.2) of a
quantum state into pure states is generally not unique.
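The non-uniqueness of the decomposition (3.2.2) can be seen in a small numerical sketch. It assumes NumPy and two uniform qubit ensembles chosen only for illustration, {|0⟩, |1⟩} and {(|0⟩ ± |1⟩)/√2}; the helper names `ket_to_dm` and `purity` are hypothetical.

```python
import numpy as np

def ket_to_dm(psi):
    """Return the density operator |psi><psi| of a state vector psi."""
    psi = np.asarray(psi, dtype=complex)
    return np.outer(psi, psi.conj())

def purity(rho):
    """Return Tr[rho^2]."""
    return np.trace(rho @ rho).real

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
ket_plus = (ket0 + ket1) / np.sqrt(2)
ket_minus = (ket0 - ket1) / np.sqrt(2)

# Two different uniform ensembles of pure states ...
rho_a = 0.5 * ket_to_dm(ket0) + 0.5 * ket_to_dm(ket1)
rho_b = 0.5 * ket_to_dm(ket_plus) + 0.5 * ket_to_dm(ket_minus)

# ... give the same density operator, which is mixed (purity below one).
same_state = np.allclose(rho_a, rho_b)
print(same_state, purity(rho_a), purity(ket_to_dm(ket_plus)))
```

The two ensembles cannot be distinguished by any measurement, since all measurement statistics depend only on the density operator.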
A state 𝜌 is called maximally mixed if the set {|𝜓𝑥 ⟩}𝑥∈X in (3.2.2) consists of
𝑑 orthonormal state vectors and the probability distribution {𝑝(𝑥)}𝑥∈X is uniform
(i.e., $p(x) = \frac{1}{d}$ for all $x \in \mathcal{X}$). In this case, it follows that

\[ \rho = \frac{\mathbb{1}_d}{d} =: \pi_d, \tag{3.2.3} \]
as a consequence of Exercise 2.2. The state 𝜋 𝑑 is called maximally mixed because it
corresponds to having the most uncertainty about which state from the set $\{|\psi_k\rangle\}_{k=1}^{d}$
the system is in. This uncertainty can be quantified by using quantum entropy, and
in Chapter 7, we find that the maximally mixed state 𝜋 𝑑 has the largest entropy
among all states of a finite-dimensional system of dimension 𝑑, thus justifying the
term “maximally mixed.”
Now, let us recall the orthonormal basis of Hermitian operators defined in
(2.2.44)–(2.2.47). In quantum information, it is common to scale these operators
by $\sqrt{d}$, where $d$ is the dimension, so that we have an orthogonal basis $\{S_k^{(d)}\}_{k=0}^{d^2-1}$
of Hermitian operators, with $S_0^{(d)} = \mathbb{1}_d$ and $S_k^{(d)}$, $k \in \{1, 2, \dots, d^2 - 1\}$, equal to
the traceless operators in (2.2.45)–(2.2.47) multiplied by $\sqrt{d}$. Note here that
we have also relabeled the indices of the set of operators defined in (2.2.44)–
(2.2.47). These operators satisfy $\operatorname{Tr}[(S_k^{(d)})^2] = d$ and $\operatorname{Tr}[S_k^{(d)} S_\ell^{(d)}] = d\,\delta_{k,\ell}$ for all
$k, \ell \in \{0, 1, \dots, d^2 - 1\}$. We often suppress the dimension and write $S_k \equiv S_k^{(d)}$ if
the dimension is unimportant or clear from the context. Using these operators, we
can write every density operator $\rho \in \mathrm{D}(\mathbb{C}^d)$ in the following form:
\[ \rho = \frac{1}{d} \left( \mathbb{1} + \sum_{k=1}^{d^2-1} r_k S_k \right), \tag{3.2.4} \]
where $r_k = \langle S_k, \rho \rangle = \operatorname{Tr}[S_k \rho] \in \mathbb{R}$ for all $k \in \{1, 2, \dots, d^2 - 1\}$. The vector
$\vec{r}_\rho := (r_1, r_2, \dots, r_{d^2-1}) \in \mathbb{R}^{d^2-1}$ is sometimes called the Bloch vector, or the

coherence vector, of 𝜌; please see the Bibliographic Notes (Section 3.4) for more
information on this terminology.


Exercise 3.3
Let 𝜌 be the quantum state represented as in (3.2.4).
1. Verify that Tr[𝜌] = 1.
2. Prove that 𝜌 is pure if and only if $\sum_{k=1}^{d^2-1} r_k^2 = d - 1$.

At this point, it is instructive to look at an example. Let us consider quantum

systems with 𝑑 = 2, i.e., qubits. The representation of an arbitrary quantum state 𝜌
in (3.2.4) becomes
\[ \rho = \frac{1}{2} \left( \mathbb{1} + r_1 X + r_2 Y + r_3 Z \right) \quad \text{(qubit state)}, \tag{3.2.5} \]
where 𝑋, 𝑌 , and 𝑍 are the Pauli operators, which we defined in (2.2.48) and (2.2.49):

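\[ X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -\mathrm{i} \\ \mathrm{i} & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{3.2.6} \]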

The fact that Tr[𝜌] = 1 follows from the fact that 𝑋, 𝑌 , and 𝑍 are traceless operators,
while Tr[1] = 2. The condition for 𝜌 to be positive semi-definite is left to the
following exercise.

Exercise 3.4
Show that the positive semi-definiteness of every qubit state 𝜌, as represented
in (3.2.5), is equivalent to $r_1^2 + r_2^2 + r_3^2 \leq 1$.

The condition $r_1^2 + r_2^2 + r_3^2 \leq 1$ for every qubit state 𝜌 represented as in (3.2.5),
along with the fact that $r_1, r_2, r_3 \in \mathbb{R}$, implies that the vector $\vec{r}_\rho = (r_1, r_2, r_3)$ lies on
or inside the unit sphere in three dimensions, for every qubit state 𝜌. Furthermore,
the condition in Exercise 3.3 for 𝜌 to be pure implies that a qubit state is pure if and
only if the vector 𝑟®𝜌 lies on the surface of the unit sphere. In quantum mechanics,
this unit sphere is known as the Bloch sphere, and if we include all mixed states
corresponding to the interior of the sphere, then we use the term Bloch ball to refer
to the set of all qubit states. See Figure 3.1 for a visual representation of the Bloch
ball.
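The correspondence between qubit states and points of the Bloch ball can be explored numerically. The following is a minimal sketch assuming NumPy; the function names `bloch_vector` and `state_from_bloch` and the sample vectors are hypothetical choices for illustration.

```python
import numpy as np

# Pauli operators from (3.2.6).
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_vector(rho):
    """Return (r_1, r_2, r_3) with r_k = Tr[S_k rho] for S = (X, Y, Z)."""
    return np.array([np.trace(P @ rho).real for P in (X, Y, Z)])

def state_from_bloch(r):
    """Return (1 + r_1 X + r_2 Y + r_3 Z) / 2, as in (3.2.5)."""
    return 0.5 * (np.eye(2) + r[0] * X + r[1] * Y + r[2] * Z)

# A vector strictly inside the unit ball gives a mixed state.
r_mixed = np.array([0.3, 0.0, 0.4])
rho_mixed = state_from_bloch(r_mixed)
min_eig_inside = np.linalg.eigvalsh(rho_mixed).min()

# A vector outside the unit ball gives an operator with a negative eigenvalue.
min_eig_outside = np.linalg.eigvalsh(state_from_bloch([1.0, 1.0, 0.0])).min()

# The state vector (|0> + i|1>)/sqrt(2) sits at (0, 1, 0) on the sphere.
plus_i = np.array([1.0, 1j]) / np.sqrt(2)
r_plus_i = bloch_vector(np.outer(plus_i, plus_i.conj()))

print(min_eig_inside, min_eig_outside, r_plus_i)
```

The eigenvalues of the operator in (3.2.5) are (1 ± ‖r‖)/2, which is why positivity fails exactly when the vector leaves the unit ball.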


[Figure: Bloch ball with the state vectors |0⟩, |1⟩, |+⟩, |−⟩, |+i⟩, and |−i⟩ marked on its surface.]

Figure 3.1: Every quantum state in $\mathrm{D}(\mathbb{C}^2)$ of a qubit system can be
represented as a point in the so-called Bloch ball. All pure states lie on the
surface of the Bloch ball, which is known as the Bloch sphere. Shown are the
basis state vectors |0⟩ and |1⟩, corresponding to the Bloch vectors (0, 0, 1) and
(0, 0, −1), respectively. The superposition state vectors $|\pm\rangle := \frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)$
correspond to the Bloch vectors (±1, 0, 0), and the superposition state vectors
$|\pm\mathrm{i}\rangle := \frac{1}{\sqrt{2}}(|0\rangle \pm \mathrm{i}|1\rangle)$ correspond to the Bloch vectors (0, ±1, 0).

3.2.1 Bipartite States and Schmidt Decomposition

The joint state of two distinct quantum systems 𝐴 and 𝐵 is described by a bipartite
quantum state 𝜌 𝐴𝐵 ∈ D(H 𝐴 ⊗ H𝐵 ). For brevity, the joint Hilbert space H 𝐴 ⊗ H𝐵
of the composite system 𝐴𝐵 is denoted by H 𝐴𝐵 .
Let $\{|i\rangle_A\}_{i=0}^{d_A-1}$ and $\{|j\rangle_B\}_{j=0}^{d_B-1}$ be orthonormal bases for $\mathcal{H}_A$ and $\mathcal{H}_B$, respectively.
Then,
{|𝑖⟩ 𝐴 ⊗ | 𝑗⟩𝐵 : 0 ≤ 𝑖 ≤ 𝑑 𝐴 − 1, 0 ≤ 𝑗 ≤ 𝑑 𝐵 − 1} (3.2.7)
is an orthonormal basis for H 𝐴𝐵 . For brevity, we often write |𝑖, 𝑗⟩ 𝐴𝐵 instead of
|𝑖⟩ 𝐴 ⊗ | 𝑗⟩𝐵 . Every state vector |𝜓⟩ 𝐴𝐵 ∈ H 𝐴𝐵 can thus be written as
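\[ |\psi\rangle_{AB} = \sum_{i=0}^{d_A-1} \sum_{j=0}^{d_B-1} \alpha_{i,j} |i, j\rangle_{AB}, \tag{3.2.8} \]

where $\alpha_{i,j} = \langle i, j|\psi\rangle \in \mathbb{C}$ and $\sum_{i=0}^{d_A-1} \sum_{j=0}^{d_B-1} |\alpha_{i,j}|^2 = 1$. By the Schmidt decomposition
theorem (Theorem 2.2), we can alternatively write $|\psi\rangle_{AB}$ as

\[ |\psi\rangle_{AB} = \sum_{k=1}^{r} \sqrt{\lambda_k}\, |e_k\rangle_A \otimes |f_k\rangle_B, \tag{3.2.9} \]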

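where each Schmidt coefficient $\lambda_k$ is strictly positive and they all satisfy $\sum_{k=1}^{r} \lambda_k = 1$,
$\{|e_k\rangle_A\}_{k=1}^{r}$ and $\{|f_k\rangle_B\}_{k=1}^{r}$ are orthonormal sets of vectors in $\mathcal{H}_A$ and $\mathcal{H}_B$,
respectively, and $r = \operatorname{rank}(X)$, where $X \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$ is defined as $\langle j|_B X |i\rangle_A =
\langle i, j|\psi\rangle_{AB}$ for all $0 \leq i \leq d_A - 1$ and $0 \leq j \leq d_B - 1$.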
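In practice, a Schmidt decomposition of the form (3.2.9) can be computed from a singular value decomposition of the coefficient matrix of the state vector. The sketch below assumes NumPy, the ordering of `np.kron` for the basis |𝑖, 𝑗⟩ (index 𝑖·𝑑_B + 𝑗), and a hypothetical helper name `schmidt_decomposition`; the coefficient matrix used here is the transpose of the operator 𝑋 defined above, so it has the same rank.

```python
import numpy as np

def schmidt_decomposition(psi, d_a, d_b, tol=1e-12):
    """Return (lambdas, e_vecs, f_vecs) such that
    psi = sum_k sqrt(lambdas[k]) * kron(e_vecs[k], f_vecs[k])."""
    # Coefficient matrix alpha[i, j] = <i, j|psi> in the kron ordering.
    alpha = np.asarray(psi, dtype=complex).reshape(d_a, d_b)
    u, s, vh = np.linalg.svd(alpha, full_matrices=False)
    r = int(np.sum(s > tol))  # Schmidt rank = rank of alpha
    lambdas = s[:r] ** 2
    e_vecs = [u[:, k] for k in range(r)]   # orthonormal vectors for system A
    f_vecs = [vh[k, :] for k in range(r)]  # orthonormal vectors for system B
    return lambdas, e_vecs, f_vecs

# Assumed example: a random normalized state vector on C^2 (x) C^3.
rng = np.random.default_rng(0)
d_a, d_b = 2, 3
psi = rng.normal(size=d_a * d_b) + 1j * rng.normal(size=d_a * d_b)
psi /= np.linalg.norm(psi)

lambdas, e_vecs, f_vecs = schmidt_decomposition(psi, d_a, d_b)
psi_rebuilt = sum(np.sqrt(l) * np.kron(e, f)
                  for l, e, f in zip(lambdas, e_vecs, f_vecs))

# A product state vector has a single nonzero term.
psi_prod = np.kron([1.0, 0.0], [0.0, 1.0, 0.0])
rank_prod = len(schmidt_decomposition(psi_prod, d_a, d_b)[0])
print(lambdas, rank_prod)
```

The squared singular values play the role of the 𝜆ₖ in (3.2.9), and they sum to one because the state vector is normalized.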
More generally, recall from Chapter 2 that we can define the orthonormal bases
{|𝑖⟩⟨𝑖′ | 𝐴 : 0 ≤ 𝑖, 𝑖′ ≤ 𝑑 𝐴 − 1}, {| 𝑗⟩⟨ 𝑗 ′ | 𝐵 : 0 ≤ 𝑗, 𝑗 ′ ≤ 𝑑 𝐵 − 1}, (3.2.10)
for L(H 𝐴 ) and L(H𝐵 ), respectively. Then, the set
{|𝑖, 𝑗⟩⟨𝑖′, 𝑗 ′ | 𝐴𝐵 ≡ |𝑖⟩⟨𝑖′ | 𝐴 ⊗| 𝑗⟩⟨ 𝑗 ′ | 𝐵 : 0 ≤ 𝑖, 𝑖′ ≤ 𝑑 𝐴 −1, 0 ≤ 𝑗, 𝑗 ′ ≤ 𝑑 𝐵 −1} (3.2.11)
is an orthonormal basis for L(H 𝐴𝐵 ). It follows that every mixed state 𝜌 𝐴𝐵 ∈
D(H 𝐴𝐵 ) can be written as
\[ \rho_{AB} = \sum_{i,i'=0}^{d_A-1} \sum_{j,j'=0}^{d_B-1} \beta_{i,j;i',j'} |i, j\rangle\langle i', j'|_{AB}, \tag{3.2.12} \]

where 𝛽𝑖, 𝑗;𝑖 ′ , 𝑗 ′ = ⟨|𝑖, 𝑗⟩⟨𝑖′, 𝑗 ′ |, 𝜌 𝐴𝐵 ⟩ = ⟨𝑖, 𝑗 |𝜌 𝐴𝐵 |𝑖′, 𝑗 ′⟩ ∈ C. Similarly, consider

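orthogonal bases $\{S_k^A\}_{k=0}^{d_A^2-1}$ and $\{S_\ell^B\}_{\ell=0}^{d_B^2-1}$ for $\mathrm{L}(\mathcal{H}_A)$ and $\mathrm{L}(\mathcal{H}_B)$, respectively, as
defined in the paragraph above (3.2.4). Then, $\{S_k^A \otimes S_\ell^B : 0 \leq k \leq d_A^2 - 1,\ 0 \leq
\ell \leq d_B^2 - 1\}$ is an orthogonal basis for $\mathrm{L}(\mathcal{H}_{AB})$, so that every quantum state
$\rho_{AB} \in \mathrm{D}(\mathcal{H}_{AB})$ can be written as

\[ \rho_{AB} = \frac{1}{d_A d_B} \sum_{k=0}^{d_A^2-1} \sum_{\ell=0}^{d_B^2-1} r_{k,\ell}\, S_k^A \otimes S_\ell^B, \tag{3.2.13} \]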
where
𝑟 𝑘,ℓ = ⟨𝑆 𝑘𝐴 ⊗ 𝑆 ℓ𝐵 , 𝜌 𝐴𝐵 ⟩ = Tr[(𝑆 𝑘𝐴 ⊗ 𝑆 ℓ𝐵 ) 𝜌 𝐴𝐵 ] (3.2.14)
for all $k \in \{0, 1, \dots, d_A^2 - 1\}$ and $\ell \in \{0, 1, \dots, d_B^2 - 1\}$. Also, as with the Schmidt
decomposition of state vectors in (3.2.9), from Exercise 2.14 we have that every
mixed state $\rho_{AB}$ can be written as

\[ \rho_{AB} = \sum_{k=1}^{r} \sqrt{\lambda_k}\, E_A^k \otimes F_B^k, \tag{3.2.15} \]

where each coefficient $\lambda_k$ is strictly positive, $\{E_A^k\}_{k=1}^{r}$ and $\{F_B^k\}_{k=1}^{r}$ are orthonormal
sets of linear operators acting on $\mathcal{H}_A$ and $\mathcal{H}_B$, respectively, and $r = \operatorname{rank}(M)$, where
$M \in \mathrm{L}(\mathcal{H}_A \otimes \mathcal{H}_A, \mathcal{H}_B \otimes \mathcal{H}_B)$ is defined by $\langle j, \ell|_{BB} M |i, k\rangle_{AA} = \langle i, j|\rho_{AB}|k, \ell\rangle$ for
all $0 \leq i, k \leq d_A - 1$ and $0 \leq j, \ell \leq d_B - 1$.

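[Figure: a bipartite state ρ_AB of systems A and B. Discarding B leaves A in the state ρ_A ≡ Tr_B[ρ_AB] (left); discarding A leaves B in the state ρ_B ≡ Tr_A[ρ_AB] (right).]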

Figure 3.2: The partial trace superoperator (see Definition 3.2) is the math-
ematical representation of physically discarding a subsystem of a composite
quantum system. In other words, given a bipartite state 𝜌 𝐴𝐵 for the two quantum
systems 𝐴 and 𝐵, the partial trace Tr 𝐵 allows us to determine the quantum state
of system 𝐴 when we do not have access to system 𝐵 (left), and Tr 𝐴 allows
us to determine the quantum state of system 𝐵 when we do not have access to
system 𝐴 (right).

3.2.2 Partial Trace

Recall from Section 2.2 that the trace of a linear operator 𝑋 acting on a 𝑑-dimensional
Hilbert space can be written as
\[ \operatorname{Tr}[X] = \sum_{i=0}^{d-1} \langle i|X|i\rangle, \tag{3.2.16} \]

where $\{|i\rangle\}_{i=0}^{d-1}$ is the standard orthonormal basis. We can interpret the trace as

the sum of the diagonal elements of the matrix corresponding to 𝑋 written in the
standard basis. From Exercise 2.5, however, we have that the trace is independent
of the choice of basis used to evaluate it.
The trace is physically relevant, especially when it acts on one part of a bipartite
quantum state, in which case we call it the partial trace. To be specific, given a
state 𝜌 𝐴𝐵 for the bipartite system 𝐴𝐵, we are often interested in determining the
state of only one of its subsystems. The partial trace Tr 𝐵 , which we define formally
below, takes a state 𝜌 𝐴𝐵 acting on the space H 𝐴𝐵 and returns a state 𝜌 𝐴 ≡ Tr 𝐵 [𝜌 𝐴𝐵 ]
acting on the space H 𝐴 . The partial trace is therefore the mathematical operation
used to determine the state of one of the subsystems given the state of a composite
system comprising two or more subsystems, and it can be thought of as the action
of “discarding” one of the subsystems; see Figure 3.2. The partial trace generalizes
the notion of marginalizing a joint probability distribution. Later, in Chapter 4, we
see that partial trace is a particular kind of quantum channel corresponding to this
discarding action.

Definition 3.2 Partial Trace

Given quantum systems 𝐴 and 𝐵, the partial trace over 𝐵 is denoted by
Tr 𝐵 ≡ id 𝐴 ⊗ Tr 𝐵 , and it is defined as

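\[ \operatorname{Tr}_B[X_{AB}] = (\operatorname{id}_A \otimes \operatorname{Tr})(X_{AB}) = \sum_{j=0}^{d_B-1} (\mathbb{1}_A \otimes \langle j|_B)\, X_{AB}\, (\mathbb{1}_A \otimes |j\rangle_B) \tag{3.2.17} \]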

for every linear operator 𝑋 𝐴𝐵 ∈ L(H 𝐴 ⊗ H𝐵 ). Similarly, the partial trace over
𝐴 is denoted by Tr 𝐴 ≡ Tr 𝐴 ⊗ id𝐵 and is defined as

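\[ \operatorname{Tr}_A[X_{AB}] = (\operatorname{Tr} \otimes \operatorname{id}_B)(X_{AB}) = \sum_{i=0}^{d_A-1} (\langle i|_A \otimes \mathbb{1}_B)\, X_{AB}\, (|i\rangle_A \otimes \mathbb{1}_B) \tag{3.2.18} \]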

for all 𝑋 𝐴𝐵 ∈ L(H 𝐴 ⊗ H𝐵 ).

Remark: For every linear operator 𝑋 𝐴𝐵 acting on H 𝐴𝐵 , we can define the partial trace
Tr 𝐵 [𝑋 𝐴𝐵 ] more abstractly as the unique linear operator 𝑋 𝐴 acting on H 𝐴 such that

Tr[(𝑀 𝐴 ⊗ 1 𝐵 ) 𝑋 𝐴𝐵 ] = Tr[𝑀 𝐴 𝑋 𝐴] (3.2.19)

for every operator 𝑀 𝐴 ∈ L(H 𝐴). If we let 𝑋 𝐴𝐵 be the state 𝜌 𝐴𝐵 and 𝑀 𝐴 be a Hermitian operator,
then we can interpret this equation physically in the following way: in order to determine the
expectation value of an observable 𝑀 𝐴 acting on only one of the subsystems (in this case, the 𝐴
subsystem), it suffices to know the reduced state 𝜌 𝐴 of the subsystem 𝐴 rather than the joint state
𝜌 𝐴𝐵 of the total system.

It is clear from Definition 3.2 that the partial trace is a linear superoperator.
In particular, the expressions in (3.2.17) and (3.2.18) define the partial trace in
precisely the operator-sum form for superoperators shown in (2.2.179).
Now, in order to explicitly determine the partial trace of a given linear operator
𝑋 𝐴𝐵 ∈ L(H 𝐴𝐵 ), it suffices to know the action of the partial trace on basis elements
of L(H 𝐴𝐵 ) because the action of every linear superoperator is completely defined
by its action on basis elements. Using the orthonormal basis for L(H 𝐴𝐵 ) given in
(3.2.11), it is straightforward to see that the action of the partial trace Tr 𝐵 on this


basis is
Tr 𝐵 [|𝑖⟩⟨𝑖′ | 𝐴 ⊗ | 𝑗⟩⟨ 𝑗 ′ | 𝐵 ] = |𝑖⟩⟨𝑖′ | 𝐴 𝛿 𝑗, 𝑗 ′ (3.2.20)
for all 0 ≤ 𝑖, 𝑖′ ≤ 𝑑 𝐴 − 1, 0 ≤ 𝑗, 𝑗 ′ ≤ 𝑑 𝐵 − 1. Similarly, for Tr 𝐴 , we obtain

Tr 𝐴 [|𝑖⟩⟨𝑖′ | 𝐴 ⊗ | 𝑗⟩⟨ 𝑗 ′ | 𝐵 ] = 𝛿𝑖,𝑖 ′ | 𝑗⟩⟨ 𝑗 ′ | 𝐵 (3.2.21)

for all 0 ≤ 𝑖, 𝑖′ ≤ 𝑑 𝐴 − 1, 0 ≤ 𝑗, 𝑗 ′ ≤ 𝑑 𝐵 − 1. Then, by decomposing every linear

operator 𝑋 𝐴𝐵 as
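\begin{align}
X_{AB} &= \sum_{i,i'=0}^{d_A-1} \sum_{j,j'=0}^{d_B-1} X_{i,j;i',j'}\, |i\rangle\langle i'|_A \otimes |j\rangle\langle j'|_B \notag \\
&= \sum_{i,i'=0}^{d_A-1} \sum_{j,j'=0}^{d_B-1} X_{i,j;i',j'}\, |i, j\rangle\langle i', j'|_{AB}, \tag{3.2.22}
\end{align}

where $X_{i,j;i',j'} := \langle i, j|X_{AB}|i', j'\rangle$, we find that

\begin{align}
\operatorname{Tr}_B[X_{AB}] &= \sum_{i,i'=0}^{d_A-1} \left( \sum_{j=0}^{d_B-1} X_{i,j;i',j} \right) |i\rangle\langle i'|_A, \tag{3.2.23} \\
\operatorname{Tr}_A[X_{AB}] &= \sum_{j,j'=0}^{d_B-1} \left( \sum_{i=0}^{d_A-1} X_{i,j;i,j'} \right) |j\rangle\langle j'|_B. \tag{3.2.24}
\end{align}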

For every bipartite linear operator 𝑋 𝐴𝐵 , we let

𝑋 𝐴 ≡ Tr 𝐵 [𝑋 𝐴𝐵 ] and 𝑋𝐵 ≡ Tr 𝐴 [𝑋 𝐴𝐵 ] (3.2.25)

denote its partial traces. For states, we also use the terms marginal states or reduced
states to refer to their partial traces.
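The component formulas for the partial trace translate directly into array operations. The following sketch assumes NumPy and the `np.kron` ordering of the product basis (row index 𝑖·𝑑_B + 𝑗); the function name `partial_trace` and the random test operators are hypothetical. It also checks the defining property (3.2.19) numerically.

```python
import numpy as np

def partial_trace(x_ab, d_a, d_b, over):
    """Partial trace of an operator on C^{d_a} (x) C^{d_b}.

    over="B" returns Tr_B[X_AB] (an operator on A);
    over="A" returns Tr_A[X_AB] (an operator on B).
    """
    # x[i, j, i', j'] = <i, j| X_AB |i', j'>
    x = np.asarray(x_ab).reshape(d_a, d_b, d_a, d_b)
    if over == "B":
        return np.einsum("ijkj->ik", x)  # sum over j = j'
    if over == "A":
        return np.einsum("ijil->jl", x)  # sum over i = i'
    raise ValueError("over must be 'A' or 'B'")

rng = np.random.default_rng(0)
d_a, d_b = 2, 3

# On a product operator, Tr_B[X_A (x) X_B] = X_A * Tr[X_B].
x_a = rng.normal(size=(d_a, d_a))
x_b = rng.normal(size=(d_b, d_b))
product_check = np.allclose(
    partial_trace(np.kron(x_a, x_b), d_a, d_b, "B"), x_a * np.trace(x_b))

# Defining property (3.2.19): Tr[(M_A (x) 1_B) X_AB] = Tr[M_A X_A].
x_ab = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
m_a = rng.normal(size=(d_a, d_a)) + 1j * rng.normal(size=(d_a, d_a))
lhs = np.trace(np.kron(m_a, np.eye(d_b)) @ x_ab)
rhs = np.trace(m_a @ partial_trace(x_ab, d_a, d_b, "B"))
print(product_check, np.isclose(lhs, rhs))
```

Reshaping to a four-index array avoids building the operators 𝟙 ⊗ ⟨𝑗| explicitly, which is a design choice of this sketch rather than part of the definition.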
An immediate consequence of the Schmidt decomposition theorem is that the
marginal states 𝜌 𝐴 B Tr 𝐵 [|𝜓⟩⟨𝜓| 𝐴𝐵 ] and 𝜌 𝐵 B Tr 𝐴 [|𝜓⟩⟨𝜓| 𝐴𝐵 ] of every pure state
|𝜓⟩⟨𝜓| 𝐴𝐵 have the same non-zero eigenvalues. Indeed, using (3.2.9), we find that
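\begin{align}
\rho_A &= \sum_{k,k'=1}^{r} \sqrt{\lambda_k \lambda_{k'}}\, \operatorname{Tr}_B[|e_k\rangle\langle e_{k'}|_A \otimes |f_k\rangle\langle f_{k'}|_B] \tag{3.2.26} \\
&= \sum_{k,k'=1}^{r} \sqrt{\lambda_k \lambda_{k'}}\, |e_k\rangle\langle e_{k'}|_A\, \delta_{k,k'} \tag{3.2.27} \\
&= \sum_{k=1}^{r} \lambda_k |e_k\rangle\langle e_k|_A, \tag{3.2.28}
\end{align}

and

\begin{align}
\rho_B &= \sum_{k,k'=1}^{r} \sqrt{\lambda_k \lambda_{k'}}\, \operatorname{Tr}_A[|e_k\rangle\langle e_{k'}|_A \otimes |f_k\rangle\langle f_{k'}|_B] \tag{3.2.29} \\
&= \sum_{k,k'=1}^{r} \sqrt{\lambda_k \lambda_{k'}}\, \delta_{k,k'}\, |f_k\rangle\langle f_{k'}|_B \tag{3.2.30} \\
&= \sum_{k=1}^{r} \lambda_k |f_k\rangle\langle f_k|_B, \tag{3.2.31}
\end{align}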

in which the equalities in (3.2.28) and (3.2.31) contain spectral decompositions of

𝜌 𝐴 and 𝜌 𝐵 .
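The equality of the non-zero eigenvalues of the two marginals can be observed numerically. The sketch below assumes NumPy and a random pure state on C² ⊗ C³; it uses the shortcut that, for a state vector with coefficient matrix α (entries ⟨𝑖, 𝑗|𝜓⟩), the marginals are α α† and αᵀ α*, which follows from (3.2.23) and (3.2.24). The variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d_a, d_b = 2, 3

# Coefficient matrix alpha[i, j] = <i, j|psi> of a random pure state.
alpha = rng.normal(size=(d_a, d_b)) + 1j * rng.normal(size=(d_a, d_b))
alpha /= np.linalg.norm(alpha)  # Frobenius norm, so that <psi|psi> = 1

# Marginals of |psi><psi|: rho_A = alpha alpha^dagger, rho_B = alpha^T alpha^*.
rho_a = alpha @ alpha.conj().T
rho_b = alpha.T @ alpha.conj()

# Eigenvalues sorted in decreasing order.
eig_a = np.sort(np.linalg.eigvalsh(rho_a))[::-1]
eig_b = np.sort(np.linalg.eigvalsh(rho_b))[::-1]
print(eig_a, eig_b)
```

Here the marginal on the larger system 𝐵 has one additional eigenvalue, which is zero up to numerical precision.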

Exercise 3.5
Consider two quantum systems 𝐴 and 𝐵, with 𝑑 𝐴 = 𝑑 𝐵 = 𝑑.
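1. Calculate $\operatorname{Tr}_A[|\Gamma\rangle\langle\Gamma|_{AB}]$ and $\operatorname{Tr}_B[|\Gamma\rangle\langle\Gamma|_{AB}]$, where we recall from (2.2.34)
that $|\Gamma\rangle_{AB} = \sum_{j=0}^{d-1} |j, j\rangle_{AB}$.

2. Calculate $\operatorname{Tr}_A[F_{AB}]$ and $\operatorname{Tr}_B[F_{AB}]$, where we recall from (2.5.10) that
$F_{AB} = \sum_{k,k'=0}^{d-1} |k, k'\rangle\langle k', k|_{AB}$.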

Below are two useful lemmas about how the support of a bipartite linear
operator (recall the definition of support from Section 2.2) relates to the support of
its partial traces. Their proofs are somewhat technical, and so we provide them in
Appendices 3.A and 3.B.

Lemma 3.3
Let 𝑋 𝐴𝐵 ∈ L(H 𝐴 ⊗ H𝐵 ) be positive semi-definite, and let 𝑋 𝐴 B Tr 𝐵 [𝑋 𝐴𝐵 ]
and 𝑋𝐵 B Tr 𝐴 [𝑋 𝐴𝐵 ]. Then supp(𝑋 𝐴𝐵 ) ⊆ supp(𝑋 𝐴 ) ⊗ supp(𝑋𝐵 ).


Lemma 3.4
Let 𝑋 𝐴𝐵 , 𝑌 𝐴𝐵 ∈ L(H 𝐴 ⊗ H𝐵 ) be positive semi-definite, and suppose that
supp(𝑋 𝐴𝐵 ) ⊆ supp(𝑌 𝐴𝐵 ). Then supp(𝑋 𝐴 ) ⊆ supp(𝑌 𝐴 ), where 𝑋 𝐴 B
Tr 𝐵 [𝑋 𝐴𝐵 ] and 𝑌 𝐴 B Tr 𝐵 [𝑌 𝐴𝐵 ].

3.2.3 Separable and Entangled States

The concepts of separable and entangled states are at the heart of virtually all
of the communication protocols that we consider in this book. More generally,
entanglement is a key distinction between the classical and quantum theories
of information; it simply is not present and therefore does not play a role in
classical information theory. Entanglement, in particular, is a key element of
private communication and secure key distillation, and the successful distribution
of entangled states among several spatially separated parties is a crucial ingredient
in the implementation of such protocols over the future quantum internet. If the
parties share only separable, unentangled states, then it is not possible for them to
distill a key that is secure against a general quantum adversary.
We begin this section by defining separable and entangled states.

Definition 3.5 Separable and Entangled States

A bipartite state 𝜎𝐴𝐵 is called separable if there exists a finite alphabet X, a
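probability distribution $p : \mathcal{X} \to [0, 1]$ on $\mathcal{X}$, and sets $\{\omega_A^x\}_{x \in \mathcal{X}}$ and $\{\tau_B^x\}_{x \in \mathcal{X}}$ of
states for 𝐴 and 𝐵, respectively, such that

\[ \sigma_{AB} = \sum_{x \in \mathcal{X}} p(x)\, \omega_A^x \otimes \tau_B^x. \tag{3.2.32} \]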

In other words, a state is called separable if it can be written as a convex

combination of product states, each of which has the form 𝜔 𝐴 ⊗ 𝜏𝐵 . The set of
separable states on H 𝐴𝐵 is denoted by SEP( 𝐴 : 𝐵).
A state that is not separable is called entangled.


Remark: Note that a separable state can always be written in the form
\[ \sigma_{AB} = \sum_{x' \in \mathcal{X}'} q(x')\, |\psi_{x'}\rangle\langle\psi_{x'}|_A \otimes |\phi_{x'}\rangle\langle\phi_{x'}|_B \tag{3.2.33} \]

for some probability distribution 𝑞 : X′ → [0, 1] on a finite alphabet X′ and sets of pure states
{|𝜓 𝑥 ′ ⟩⟨𝜓 𝑥 ′ | 𝐴 : 𝑥 ′ ∈ X′ }, {|𝜙 𝑥 ′ ⟩⟨𝜙 𝑥 ′ | 𝐵 : 𝑥 ′ ∈ X′ }. In other words, separable states can always
be written as a convex combination of pure product states. Indeed, from (3.2.32), we can take
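spectral decompositions of $\omega_A^x$ and $\tau_B^x$,

\[ \omega_A^x = \sum_{k=1}^{r_A^x} t_k^x\, |e_k^x\rangle\langle e_k^x|_A, \qquad \tau_B^x = \sum_{\ell=1}^{r_B^x} s_\ell^x\, |f_\ell^x\rangle\langle f_\ell^x|_B, \tag{3.2.34} \]

where $r_A^x = \operatorname{rank}(\omega_A^x)$ and $r_B^x = \operatorname{rank}(\tau_B^x)$, so that

\[ \sigma_{AB} = \sum_{x \in \mathcal{X}} \sum_{k=1}^{r_A^x} \sum_{\ell=1}^{r_B^x} p(x)\, t_k^x s_\ell^x\, |e_k^x\rangle\langle e_k^x|_A \otimes |f_\ell^x\rangle\langle f_\ell^x|_B. \tag{3.2.35} \]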

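Then, define the alphabet $\mathcal{X}' = \{x' := (x, k, \ell) : x \in \mathcal{X},\ 1 \leq k \leq r_A^x,\ 1 \leq \ell \leq r_B^x\}$, so that $x'$ is a
superindex, and the unit vectors

\[ |\psi_{x'}\rangle_A := |e_k^x\rangle_A, \qquad |\phi_{x'}\rangle_B := |f_\ell^x\rangle_B. \tag{3.2.36} \]

Also, define the probability distribution $q : \mathcal{X}' \to [0, 1]$ by

\[ q(x, k, \ell) = p(x)\, t_k^x s_\ell^x. \tag{3.2.37} \]

Therefore, (3.2.35) can be written as

\[ \sigma_{AB} = \sum_{x' \in \mathcal{X}'} q(x')\, |\psi_{x'}\rangle\langle\psi_{x'}|_A \otimes |\phi_{x'}\rangle\langle\phi_{x'}|_B. \tag{3.2.38} \]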

From the development above, it follows that the set of separable states is the convex hull of the
set of pure product states. By an application of the Fenchel–Eggleston–Carathéodory Theorem
(Theorem 2.23), it follows that $\sigma_{AB}$ can be written as a convex combination of no more than
$\operatorname{rank}(\sigma_{AB})^2$ pure product states.

In the sense that follows, bipartite separable states can be thought of as exhibiting
purely classical correlations between the two parties, Alice and Bob. Suppose that
Alice draws $x$ with probability $p(x)$, prepares her system in the state $\omega_A^x$, sends
$x$ to Bob over a classical channel, who then prepares his system in the state $\tau_B^x$,
where $x \in \mathcal{X}$ and $\mathcal{X}$ is a finite alphabet. This procedure corresponds to preparing
the ensemble $\{(p(x), \omega_A^x \otimes \tau_B^x)\}_{x \in \mathcal{X}}$, and if Alice and Bob discard the label $x$, then
their shared joint state is the separable state $\sigma_{AB} = \sum_{x \in \mathcal{X}} p(x)\, \omega_A^x \otimes \tau_B^x$.
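The preparation procedure just described can be mimicked numerically. The following sketch assumes NumPy and a deliberately simple, hypothetical ensemble: Alice draws a uniformly random bit 𝑥 and both parties prepare |𝑥⟩⟨𝑥|. It builds the resulting separable state and checks that it is a valid density operator whose marginal on 𝐴 is the average of Alice's states.

```python
import numpy as np

def proj(v):
    """Return the rank-one projection |v><v|."""
    v = np.asarray(v, dtype=complex)
    return np.outer(v, v.conj())

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Assumed ensemble: p(x) uniform, omega_A^x = tau_B^x = |x><x| for x in {0, 1}.
p = [0.5, 0.5]
omega = [proj(ket0), proj(ket1)]
tau = [proj(ket0), proj(ket1)]

# Separable state (3.2.32) obtained after discarding the label x.
sigma_ab = sum(px * np.kron(w, t) for px, w, t in zip(p, omega, tau))

# Marginal on A via the partial trace over B (kron ordering of indices).
sigma_a = np.einsum("ijkj->ik", sigma_ab.reshape(2, 2, 2, 2))

trace_ok = np.isclose(np.trace(sigma_ab).real, 1.0)
min_eig = np.linalg.eigvalsh(sigma_ab).min()
print(trace_ok, min_eig, sigma_a.real)
```

The outcomes of measuring both systems in the standard basis are perfectly correlated for this state, yet it is a convex combination of product states and hence separable by Definition 3.5.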


On the other hand, no such procedure consisting of only local operations by

Alice and Bob, supplemented by classical communication between them, can ever
be used to generate an entangled state between them (without them already sharing
some entanglement beforehand). In Section 4.6.2, we introduce local operations
and classical communication (LOCC) channels and explain this point in more detail.
Essentially, two entangled quantum systems are intrinsically linked in such a way
that it is insufficient to describe each one individually.
Observe that a pure state 𝜓 𝐴𝐵 is separable if and only if it is a product state, i.e.,
if and only if there exist pure states 𝜙 𝐴 and 𝜑 𝐵 such that 𝜓 𝐴𝐵 = 𝜙 𝐴 ⊗ 𝜑 𝐵 . Recalling
that every pure state has a Schmidt decomposition (see Theorem 2.2), we obtain
the following result:

A pure state is entangled if and only if its Schmidt rank

is strictly greater than one.

An important example of an entangled pure state is the state Φ 𝐴𝐵 on two

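$d$-dimensional systems 𝐴 and 𝐵, defined as $\Phi_{AB} := |\Phi\rangle\langle\Phi|_{AB}$, where

\[ |\Phi\rangle_{AB} := \frac{1}{\sqrt{d}} \sum_{i=0}^{d-1} |i\rangle_A \otimes |i\rangle_B = \frac{1}{\sqrt{d}} |\Gamma\rangle_{AB}, \tag{3.2.39} \]

and $|\Gamma\rangle_{AB} = \sum_{i=0}^{d-1} |i\rangle_A \otimes |i\rangle_B$ is the vector defined in (2.2.34).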

Exercise 3.6

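1. Show that $\operatorname{Tr}_A[\Phi_{AB}] = \operatorname{Tr}_B[\Phi_{AB}] = \frac{\mathbb{1}_d}{d}$.

2. Define $\Phi^U_{AB} := (U_A \otimes \mathbb{1}_B)\, \Phi_{AB}\, (U_A \otimes \mathbb{1}_B)^\dagger$, for $U_A$ a unitary acting on
system 𝐴. Show that $\operatorname{Tr}_A[\Phi^U_{AB}] = \operatorname{Tr}_B[\Phi^U_{AB}] = \frac{\mathbb{1}_d}{d}$.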

The state Φ 𝐴𝐵 is an example of a maximally entangled state.


Definition 3.6 Maximally Entangled Pure State

A pure state 𝜓 𝐴𝐵 = |𝜓⟩⟨𝜓| 𝐴𝐵 , for two systems 𝐴 and 𝐵 of the same dimension
$d$, is called maximally entangled if the Schmidt coefficients of $|\psi\rangle_{AB}$ are all
equal to $\frac{1}{\sqrt{d}}$, with $d$ being the Schmidt rank of $|\psi\rangle_{AB}$.

In other words, 𝜓 𝐴𝐵 is called maximally entangled if |𝜓⟩ 𝐴𝐵 has the Schmidt

decomposition
\[ |\psi\rangle_{AB} = \frac{1}{\sqrt{d}} \sum_{k=1}^{d} |e_k\rangle_A \otimes |f_k\rangle_B \tag{3.2.40} \]
for some orthonormal sets {|𝑒 𝑘 ⟩ 𝐴 : 1 ≤ 𝑘 ≤ 𝑑} and {| 𝑓 𝑘 ⟩𝐵 : 1 ≤ 𝑘 ≤ 𝑑}. Observe
then that
\[ \operatorname{Tr}_A[|\psi\rangle\langle\psi|_{AB}] = \frac{\mathbb{1}_d}{d} = \operatorname{Tr}_B[|\psi\rangle\langle\psi|_{AB}]. \tag{3.2.41} \]
In other words, like the state Φ 𝐴𝐵 , the marginal states of every maximally entangled
state are maximally mixed for 𝐴 and 𝐵.
Maximally entangled states provide a good example of why entangled quantum
systems are, in a sense, greater than the sum of their parts. Since maximally
entangled states have maximally mixed marginal states, the individual quantum
systems in a maximally entangled state can be viewed as being in a completely
random state since, as we have seen, maximally mixed states can be written as
the expected state of every ensemble of orthonormal pure states with uniform
probability distribution. However, intriguingly, the overall composite system is in a
pure, definite state.
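This contrast between a definite global state and completely random marginals can be checked numerically for the state Φ in (3.2.39). The sketch assumes NumPy, the dimension 𝑑 = 3 chosen only as an example, and the `np.kron` index ordering; purity Tr[ρ²] is used here as a simple indicator of mixedness.

```python
import numpy as np

d = 3  # assumed example dimension

# |Phi> = (1/sqrt(d)) sum_i |i, i>, stored in the kron ordering i * d + j.
phi = np.eye(d).reshape(d * d) / np.sqrt(d)
rho_ab = np.outer(phi, phi.conj())

# Marginals via the partial trace, using x[i, j, i', j'] = <i, j|rho|i', j'>.
x = rho_ab.reshape(d, d, d, d)
rho_a = np.einsum("ijkj->ik", x)  # Tr_B
rho_b = np.einsum("ijil->jl", x)  # Tr_A

purity_ab = np.trace(rho_ab @ rho_ab).real  # equals 1 for a pure state
purity_a = np.trace(rho_a @ rho_a).real     # equals 1/d for the maximally mixed state
print(purity_ab, purity_a)
```

The global purity equals one while each marginal has the smallest possible purity 1/𝑑, consistent with (3.2.41).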

Exercise 3.7
Prove that every state vector of the form $(\mathbb{1}_d \otimes U)|\Phi_d\rangle = \frac{1}{\sqrt{d}} \operatorname{vec}(U)$ and
$(U \otimes \mathbb{1}_d)|\Phi_d\rangle$, with $d \geq 2$ and $U$ a unitary operator, corresponds to a maximally
entangled pure state. Conversely, given a maximally entangled pure state
$|\psi\rangle\langle\psi|_{AB}$, prove that there exists a unitary $U_A$ such that $|\psi\rangle_{AB} = (U_A \otimes \mathbb{1}_B)|\Phi\rangle_{AB}$.


3.2.4 Bell States

In Exercise 3.7, we learned that every state vector of the form ( 1 ⊗ 𝑈)|Φ⟩ and
(𝑈 ⊗ 1)|Φ⟩, with 𝑈 a unitary operator, is a maximally entangled state. We now
provide an important example of a class of maximally entangled states, known
as Bell states, for every dimension 𝑑 ≥ 2. These states are defined by particular
choices for the unitary 𝑈. The Bell states are an important element of many
quantum information processing tasks, most prominently quantum teleportation
and super-dense coding, which we discuss in Chapter 5.
We start with dimension 𝑑 = 2. Recall the Pauli operators 𝑋 and 𝑍 from (3.2.6):

\[ X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{3.2.42} \]

Observe that, in addition to being Hermitian, these operators are unitary, which is
due to the fact that $X^2 = Z^2 = \mathbb{1}$. The operator $Y$ defined in (3.2.6) is also unitary,
since $Y^2 = \mathbb{1}$, from which it follows that the operator $ZX = \mathrm{i}Y$ is also unitary.
Using the operators 𝑋, 𝑍, and 𝑍 𝑋, we define the following set of four entangled,
two-qubit state vectors:

\[ |\Phi_{z,x}\rangle := (Z^z X^x \otimes \mathbb{1})|\Phi\rangle, \tag{3.2.43} \]

for $z, x \in \{0, 1\}$, where we recall from (3.2.39) that $|\Phi\rangle := \frac{1}{\sqrt{2}}(|0, 0\rangle + |1, 1\rangle)$. The
corresponding density operators Φ𝑧,𝑥 are known as the two-qubit Bell states. The
following notation is commonly used:
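\begin{align}
|\Phi^+\rangle \equiv |\Phi_{0,0}\rangle &= \frac{1}{\sqrt{2}}(|0, 0\rangle + |1, 1\rangle), \tag{3.2.44} \\
|\Phi^-\rangle \equiv |\Phi_{1,0}\rangle &= \frac{1}{\sqrt{2}}(|0, 0\rangle - |1, 1\rangle), \tag{3.2.45} \\
|\Psi^+\rangle \equiv |\Phi_{0,1}\rangle &= \frac{1}{\sqrt{2}}(|0, 1\rangle + |1, 0\rangle), \tag{3.2.46} \\
|\Psi^-\rangle \equiv |\Phi_{1,1}\rangle &= \frac{1}{\sqrt{2}}(|0, 1\rangle - |1, 0\rangle). \tag{3.2.47}
\end{align}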
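As a numerical sanity check (not a proof), the four two-qubit Bell state vectors can be generated from (3.2.43) and compared with the explicit forms in (3.2.44)–(3.2.47). The sketch assumes NumPy and the `np.kron` ordering |0,0⟩, |0,1⟩, |1,0⟩, |1,1⟩ of the standard basis; the function name `bell_vector` is hypothetical.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)

# |Phi> = (|0,0> + |1,1>)/sqrt(2) in the ordering |00>, |01>, |10>, |11>.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

def bell_vector(z, x):
    """Return |Phi_{z,x}> = (Z^z X^x (x) 1)|Phi>, as in (3.2.43)."""
    op = np.linalg.matrix_power(Z, z) @ np.linalg.matrix_power(X, x)
    return np.kron(op, I2) @ phi

# Explicit forms from (3.2.44)-(3.2.47).
expected = {
    (0, 0): np.array([1, 0, 0, 1]) / np.sqrt(2),   # Phi^+
    (1, 0): np.array([1, 0, 0, -1]) / np.sqrt(2),  # Phi^-
    (0, 1): np.array([0, 1, 1, 0]) / np.sqrt(2),   # Psi^+
    (1, 1): np.array([0, 1, -1, 0]) / np.sqrt(2),  # Psi^-
}
matches = all(np.allclose(bell_vector(z, x), v) for (z, x), v in expected.items())

# Gram matrix of the four vectors (they are real here); identity means orthonormal.
vectors = np.array([bell_vector(z, x) for z in (0, 1) for x in (0, 1)])
gram = vectors @ vectors.T
print(matches, np.allclose(gram, np.eye(4)))
```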


Exercise 3.8
1. Prove that the two-qubit Bell state vectors defined in (3.2.43) form an
orthonormal basis for $\mathbb{C}^2 \otimes \mathbb{C}^2$.
2. Prove that the state vectors $|\Phi^+\rangle$, $|\Phi^-\rangle$, and $|\Psi^+\rangle$ form an orthonormal basis
for $\operatorname{Sym}_2(\mathbb{C}^2)$. (Hint: See (2.5.11) and Exercise 2.34.) For this reason, the
subspace $\operatorname{Sym}_2(\mathbb{C}^2)$ is called the triplet subspace.
3. Prove that $\operatorname{ASym}_2(\mathbb{C}^2) = \operatorname{span}\{|\Psi^-\rangle\}$. For this reason, the subspace
$\operatorname{ASym}_2(\mathbb{C}^2)$ is called the singlet subspace and the state $|\Psi^-\rangle$ is called the
singlet state vector.

We can generalize the Bell state vectors in (3.2.43) to systems with dimension
𝑑 > 2. Doing so requires a generalization of the qubit Pauli operators 𝑋 and 𝑍 to
unitary operators for qudits².

Definition 3.7 Heisenberg–Weyl Operators

Let 𝑑 ≥ 2. The Heisenberg–Weyl operators make up the set {𝑊𝑧,𝑥 : 0 ≤ 𝑧, 𝑥 ≤
𝑑 − 1} of 𝑑 2 unitary operators acting on C𝑑 , defined as follows:

𝑊𝑧,𝑥 = 𝑍 (𝑧) 𝑋 (𝑥), (3.2.48)

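\begin{align}
Z(z) &:= \sum_{k=0}^{d-1} \mathrm{e}^{\frac{2\pi\mathrm{i}kz}{d}} |k\rangle\langle k|, \tag{3.2.49} \\
X(x) &:= \sum_{k=0}^{d-1} |k + x\rangle\langle k|, \tag{3.2.50}
\end{align}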

where the addition operation in the definition of 𝑋 (𝑥) is performed modulo 𝑑.

² The qudit operators defined in (2.2.44)–(2.2.47) represent one generalization of the qubit Pauli
operators. Although they are Hermitian, they are not generally unitary. What we require here is a
generalization to qudit operators that are unitary.


Exercise 3.9
1. Verify that when 𝑑 = 2, the Heisenberg–Weyl operators reduce to the qubit
Pauli operators 𝑍, 𝑋, and 𝑍 𝑋.
2. Prove that the operators 𝑍 (𝑧) and 𝑋 (𝑥) defined in (3.2.49) and (3.2.50)
satisfy the commutation relation
\[ Z(z) X(x) = \mathrm{e}^{\frac{2\pi\mathrm{i}xz}{d}} X(x) Z(z), \tag{3.2.51} \]

for all 𝑧, 𝑥 ∈ {0, 1, . . . , 𝑑 − 1}.

The Heisenberg–Weyl operators are unitary, just like the Pauli operators;
however, unlike the Pauli operators, they are not Hermitian. In particular,
\[ W_{z,x}^\dagger = \mathrm{e}^{-\frac{2\pi\mathrm{i}xz}{d}} W_{-z,-x}. \tag{3.2.52} \]

It is also straightforward to show that

\[ W_{z_1,x_1} W_{z_2,x_2} = \mathrm{e}^{-\frac{2\pi\mathrm{i}x_1 z_2}{d}} W_{z_1+z_2,\,x_1+x_2}. \tag{3.2.53} \]

Furthermore, the Heisenberg–Weyl operators are orthogonal with respect to the

Hilbert–Schmidt inner product, meaning that

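\[ \langle W_{z_1,x_1}, W_{z_2,x_2} \rangle = \operatorname{Tr}[W_{z_1,x_1}^\dagger W_{z_2,x_2}] = d\, \delta_{z_1,z_2} \delta_{x_1,x_2} \tag{3.2.54} \]

for all $0 \leq z_1, z_2, x_1, x_2 \leq d - 1$. This implies that the scaled Heisenberg–Weyl
operators $\left\{ \frac{1}{\sqrt{d}} W_{z,x} : 0 \leq z, x \leq d - 1 \right\}$ form an orthonormal basis for $\mathrm{L}(\mathbb{C}^d)$ for all
$d \geq 2$.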
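The relations (3.2.51)–(3.2.54) can be checked numerically for a small dimension before proving them. The sketch below assumes NumPy and 𝑑 = 3 as an example; the function names `Z_op`, `X_op`, and `W_op` are hypothetical, and indices such as −𝑧 or 𝑧₁ + 𝑧₂ are understood modulo 𝑑, which the implementation handles automatically.

```python
import numpy as np

def Z_op(z, d):
    """Z(z) = sum_k exp(2 pi i k z / d) |k><k|, as in (3.2.49)."""
    return np.diag(np.exp(2j * np.pi * np.arange(d) * z / d))

def X_op(x, d):
    """X(x) = sum_k |k + x mod d><k|, as in (3.2.50): a cyclic row shift."""
    return np.roll(np.eye(d), x, axis=0)

def W_op(z, x, d):
    """Heisenberg-Weyl operator W_{z,x} = Z(z) X(x), as in (3.2.48)."""
    return Z_op(z, d) @ X_op(x, d)

d = 3  # assumed example dimension
w = np.exp(2j * np.pi / d)
labels = [(z, x) for z in range(d) for x in range(d)]

# (3.2.51): Z(z) X(x) = w^{xz} X(x) Z(z).
commutation_ok = all(
    np.allclose(Z_op(z, d) @ X_op(x, d), w ** (x * z) * X_op(x, d) @ Z_op(z, d))
    for z, x in labels)

# (3.2.52): W_{z,x}^dagger = w^{-xz} W_{-z,-x}.
adjoint_ok = all(
    np.allclose(W_op(z, x, d).conj().T, w ** (-x * z) * W_op(-z, -x, d))
    for z, x in labels)

# (3.2.53): W_{z1,x1} W_{z2,x2} = w^{-x1 z2} W_{z1+z2, x1+x2}.
product_ok = all(
    np.allclose(W_op(z1, x1, d) @ W_op(z2, x2, d),
                w ** (-x1 * z2) * W_op(z1 + z2, x1 + x2, d))
    for z1, x1 in labels for z2, x2 in labels)

# (3.2.54): Hilbert-Schmidt Gram matrix equals d times the identity.
gram = np.array([[np.trace(W_op(z1, x1, d).conj().T @ W_op(z2, x2, d))
                  for z2, x2 in labels] for z1, x1 in labels])
orthogonal_ok = np.allclose(gram, d * np.eye(d * d))
print(commutation_ok, adjoint_ok, product_ok, orthogonal_ok)
```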

Exercise 3.10
Prove (3.2.52), (3.2.53), and (3.2.54).


Exercise 3.11
Let 𝑑 ≥ 2, and consider the operator 𝑄 𝑑 defined as
\[ Q_d := \frac{1}{\sqrt{d}} \sum_{k,\ell=0}^{d-1} \mathrm{e}^{\frac{2\pi\mathrm{i}k\ell}{d}} |k\rangle\langle\ell|. \tag{3.2.55} \]

1. Show that 𝑄 𝑑 is a unitary operator.

2. Prove that

\begin{align}
Q_d X(x) Q_d^\dagger &= Z(x), \tag{3.2.56} \\
Q_d Z(z) Q_d^\dagger &= X(z)^\dagger, \tag{3.2.57}
\end{align}

for all 𝑧, 𝑥 ∈ {0, 1, . . . , 𝑑 − 1}.

The unitary operator 𝑄 𝑑 is known as the (discrete) quantum Fourier transform
operator.
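A quick numerical check of the statements in this exercise (again, not a substitute for the proof) is sketched below. It assumes NumPy and 𝑑 = 4 as an example dimension; the helper names `Z_op` and `X_op` are hypothetical implementations of (3.2.49) and (3.2.50).

```python
import numpy as np

d = 4  # assumed example dimension
k = np.arange(d)

# Q_d from (3.2.55): entries exp(2 pi i k l / d) / sqrt(d).
Q = np.exp(2j * np.pi * np.outer(k, k) / d) / np.sqrt(d)

def Z_op(z):
    """Z(z) from (3.2.49)."""
    return np.diag(np.exp(2j * np.pi * k * z / d))

def X_op(x):
    """X(x) from (3.2.50), implemented as a cyclic shift of rows."""
    return np.roll(np.eye(d), x, axis=0)

unitary_ok = np.allclose(Q.conj().T @ Q, np.eye(d))

# (3.2.56): conjugation by Q_d maps X(x) to Z(x).
x_to_z_ok = all(np.allclose(Q @ X_op(x) @ Q.conj().T, Z_op(x)) for x in range(d))

# (3.2.57): conjugation by Q_d maps Z(z) to X(z)^dagger.
z_to_x_ok = all(np.allclose(Q @ Z_op(z) @ Q.conj().T, X_op(z).conj().T)
                for z in range(d))
print(unitary_ok, x_to_z_ok, z_to_x_ok)
```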

Using the Heisenberg–Weyl operators, we now define the set of qudit Bell states
in a manner analogous to (3.2.43).

Definition 3.8 Qudit Bell States

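Let $d \geq 2$. The qudit Bell states are $d^2$ pure quantum states
$\Phi_{z,x} \coloneqq |\Phi_{z,x}\rangle\langle\Phi_{z,x}|$ in $\mathrm{D}(\mathbb{C}^d \otimes \mathbb{C}^d)$, where
\[
  |\Phi_{z,x}\rangle \coloneqq (W_{z,x} \otimes \mathbb{1}_d)|\Phi_d\rangle \tag{3.2.58}
\]
for all $z, x \in \{0, 1, \dotsc, d-1\}$, and $|\Phi_d\rangle = \frac{1}{\sqrt{d}} \sum_{j=0}^{d-1} |j,j\rangle$.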

Exercise 3.12
Prove that the two-qudit Bell state vectors defined in (3.2.58) form an orthonor-
mal basis for C𝑑 ⊗ C𝑑 for all 𝑑 ≥ 2.

The fact that the two-qudit Bell state vectors form an orthonormal basis for
$\mathbb{C}^d \otimes \mathbb{C}^d$ implies, from Exercise 2.2, that
\[
  \sum_{z,x=0}^{d-1} |\Phi_{z,x}\rangle\langle\Phi_{z,x}| = \mathbb{1}_d \otimes \mathbb{1}_d. \tag{3.2.59}
\]

A two-qudit state $\rho_{AB}$ is known as a Bell-diagonal state if it is diagonal in the
two-qudit Bell basis, so that it is of the form
\[
  \sum_{z,x=0}^{d-1} p(z,x)\,|\Phi_{z,x}\rangle\langle\Phi_{z,x}| \tag{3.2.60}
\]
for some probability distribution $p : \{0, 1, \dotsc, d-1\}^2 \to [0,1]$, meaning that
$0 \leq p(z,x) \leq 1$ for all $z, x \in \{0, 1, \dotsc, d-1\}$ and $\sum_{z,x=0}^{d-1} p(z,x) = 1$.
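To make the qudit Bell basis concrete, the following sketch constructs the vectors in (3.2.58), checks orthonormality and the resolution of the identity (3.2.59), and assembles a Bell-diagonal state as in (3.2.60). It assumes the ordering $W_{z,x} = Z(z)X(x)$; the distribution `p` is drawn at random purely for illustration.

```python
import numpy as np

def W(z, x, d):
    """Heisenberg-Weyl operator, assuming the convention W_{z,x} = Z(z) X(x)."""
    Zz = np.diag(np.exp(2j * np.pi * np.arange(d) * z / d))
    Xx = np.roll(np.eye(d), x, axis=0)
    return Zz @ Xx

def bell_basis(d):
    """Matrix whose columns are |Phi_{z,x}> = (W_{z,x} ⊗ 1)|Phi_d>, as in (3.2.58)."""
    phi_d = np.eye(d).reshape(d * d) / np.sqrt(d)   # (1/sqrt(d)) sum_j |j,j>
    cols = [np.kron(W(z, x, d), np.eye(d)) @ phi_d
            for z in range(d) for x in range(d)]
    return np.stack(cols, axis=1)

d = 3
B = bell_basis(d)
orthonormal = np.allclose(B.conj().T @ B, np.eye(d * d))   # Exercise 3.12
complete = np.allclose(B @ B.conj().T, np.eye(d * d))      # (3.2.59)

# A Bell-diagonal state (3.2.60) for an arbitrary probability distribution p.
rng = np.random.default_rng(0)
p = rng.random(d * d)
p /= p.sum()
rho = B @ np.diag(p) @ B.conj().T
```

Because the Bell vectors are orthonormal, the eigenvalues of `rho` are exactly the probabilities `p`.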

3.2.5 Purifications and Extensions

One of the most useful concepts in quantum information is the notion of purification.
There is no strong classical analogue of this concept, and thus this notion represents
another distinction between the classical and quantum theories of information.

Definition 3.9 Purification

Let 𝜌 𝐴 be a state of a system 𝐴. A purification of 𝜌 𝐴 is a pure state |𝜓⟩⟨𝜓| 𝑅 𝐴
for a bipartite system 𝑅 𝐴 such that

Tr 𝑅 [|𝜓⟩⟨𝜓| 𝑅 𝐴 ] = 𝜌 𝐴 . (3.2.61)

We often call the reference system 𝑅 the “purifying system.”

The following simple theorem establishes that every state 𝜌 𝐴 has a purification.

Theorem 3.10 State Purification

For every state 𝜌 𝐴 , there exists a purification |𝜓⟩⟨𝜓| 𝑅 𝐴 of 𝜌 𝐴 with 𝑑 𝑅 ≥
rank(𝜌 𝐴 ).


Proof: Consider a spectral decomposition of $\rho_A$,
\[
  \rho_A = \sum_{k=1}^{r} \lambda_k |\phi_k\rangle\langle\phi_k|, \tag{3.2.62}
\]

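where $r = \mathrm{rank}(\rho_A)$. Consider a reference system $R$ with $d_R \geq r$ and an arbitrary
set $\{|\varphi_k\rangle_R : 1 \leq k \leq r\}$ of orthonormal states. The unit vector
\[
  |\psi\rangle_{RA} \coloneqq \sum_{k=1}^{r} \sqrt{\lambda_k}\, |\varphi_k\rangle_R \otimes |\phi_k\rangle_A \tag{3.2.63}
\]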

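then satisfies
\begin{align}
  \mathrm{Tr}_R[|\psi\rangle\langle\psi|_{RA}]
  &= \sum_{k,k'=1}^{r} \sqrt{\lambda_k \lambda_{k'}}\, \underbrace{\mathrm{Tr}[|\varphi_k\rangle\langle\varphi_{k'}|_R]}_{\delta_{k,k'}}\, |\phi_k\rangle\langle\phi_{k'}|_A \tag{3.2.64} \\
  &= \sum_{k=1}^{r} \lambda_k |\phi_k\rangle\langle\phi_k|_A \tag{3.2.65} \\
  &= \rho_A, \tag{3.2.66}
\end{align}
so that $|\psi\rangle\langle\psi|_{RA}$ is a purification of $\rho_A$, as required. ■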

Remark: The theorem above states that the condition 𝑑 𝑅 ≥ rank(𝜌 𝐴) on the dimension of the
purifying system 𝑅 is sufficient to guarantee the existence of a purification. This condition is
also necessary, meaning that it is not possible to have a purifying system whose dimension is less
than the rank of 𝜌 𝐴.

The proof of the theorem above not only tells us that every state has a purification,
but it also tells us explicitly how to construct one such purification. We can also
construct a purification of every state 𝜌 𝐴 as follows:
\[
  |\psi\rangle_{RA} = (\mathbb{1}_R \otimes \sqrt{\rho_A})|\Gamma\rangle_{RA} = \mathrm{vec}(\sqrt{\rho_A}), \tag{3.2.67}
\]
where $|\Gamma\rangle_{RA} = \sum_{i=0}^{d_A-1} |i,i\rangle_{RA}$ and where the operation vec is defined in (2.2.31).
We often call the state $|\psi\rangle\langle\psi|_{RA}$ the canonical purification of $\rho_A$. Note that the
canonical purification is very closely related to the purification used in the proof of
Theorem 3.10. Indeed, if
\[
  \rho_A = \sum_{k=1}^{r} \lambda_k |\phi_k\rangle\langle\phi_k| \tag{3.2.68}
\]
is a spectral decomposition of $\rho_A$, with $r = \mathrm{rank}(\rho_A)$, then
\[
  \mathrm{vec}(\sqrt{\rho_A}) = \sum_{k=1}^{r} \sqrt{\lambda_k}\, |\overline{\phi_k}\rangle_R \otimes |\phi_k\rangle_A, \tag{3.2.69}
\]
where we have made use of (2.2.35).
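The canonical purification in (3.2.67) is straightforward to compute. The sketch below forms $(\mathbb{1}_R \otimes \sqrt{\rho_A})|\Gamma\rangle_{RA}$ for a randomly generated state and recovers $\rho_A$ by tracing out $R$. The helper names are illustrative, and the matrix square root is taken via an eigendecomposition, which is one of several possible choices.

```python
import numpy as np

def sqrt_psd(rho):
    """Square root of a positive semi-definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(rho)
    vals = np.clip(vals, 0, None)           # remove tiny negative rounding errors
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def canonical_purification(rho):
    """|psi>_{RA} = (1_R ⊗ sqrt(rho_A)) |Gamma>_{RA}, with |Gamma> = sum_i |i,i>."""
    d = rho.shape[0]
    gamma = np.eye(d).reshape(d * d)
    return np.kron(np.eye(d), sqrt_psd(rho)) @ gamma

def reduced_state_on_A(psi, d_R, d_A):
    """Tr_R[|psi><psi|] for a vector psi on R ⊗ A."""
    m = psi.reshape(d_R, d_A)               # m[r, a] = <r, a|psi>
    return m.T @ m.conj()

# A random density matrix on a three-dimensional system A (illustrative input).
rng = np.random.default_rng(0)
G = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = G @ G.conj().T
rho /= np.trace(rho).real

psi = canonical_purification(rho)
rho_back = reduced_state_on_A(psi, 3, 3)
```

Here `rho_back` agrees with `rho` up to floating-point error, and `psi` has unit norm because $\mathrm{Tr}[\rho_A] = 1$.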

Physically, the fact that every state 𝜌 𝐴 has a purification means that every
quantum system 𝐴 in a mixed state can be viewed as being entangled with some
system 𝑅 to which we do not have access, such that the global state is a pure state
|𝜓⟩⟨𝜓| 𝑅 𝐴 . Since we do not have access to 𝑅, our description of the state of system
𝐴 has to be as the partial trace of |𝜓⟩⟨𝜓| 𝑅 𝐴 over 𝑅, i.e., by 𝜌 𝐴 .
Observe that if the state 𝜌 𝐴 is pure, i.e., if 𝜌 𝐴 = |𝜙⟩⟨𝜙| 𝐴 , then the only possible
purification of it is of the form

|𝜓⟩⟨𝜓| 𝑅 𝐴 = |𝜑⟩⟨𝜑| 𝑅 ⊗ |𝜙⟩⟨𝜙| 𝐴 , (3.2.70)

with |𝜑⟩⟨𝜑| 𝑅 a pure state of the system 𝑅. In other words, purifications of pure states
can only be pure product states. Somewhat technically, according to Theorem 3.10,
the dimension of system 𝑅 need only satisfy 𝑑 𝑅 ≥ rank(𝜌 𝐴 ). In the case of a pure
state, the rank is equal to one, so that the reference system can be a trivial system of
dimension one. Thus, in this technical sense, pure states already purify themselves.
If we take the reference system to satisfy 𝑑 𝑅 ≥ 2, then indeed the purification has
the form given in (3.2.70).
Purifications of states are not unique. In fact, given a state 𝜌 𝐴 and a purification
|𝜓⟩⟨𝜓| 𝑅 𝐴 of 𝜌 𝐴 as in (3.2.63), let 𝑉𝑅→𝑅′ be an isometry (i.e., a linear operator
satisfying 𝑉 †𝑉 = 1 𝑅 ) acting on the 𝑅 system. Defining

|𝜓 ′⟩ 𝑅′ 𝐴 = (𝑉𝑅→𝑅′ ⊗ 1 𝐴 )|𝜓⟩ 𝑅 𝐴 , (3.2.71)

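we find that
\begin{align}
  \mathrm{Tr}_{R'}[|\psi'\rangle\langle\psi'|_{R'A}]
  &= \sum_{k,k'=1}^{r} \sqrt{\lambda_k \lambda_{k'}}\, \mathrm{Tr}[V|\varphi_k\rangle\langle\varphi_{k'}|_R V^\dagger]\, |\phi_k\rangle\langle\phi_{k'}|_A \tag{3.2.72} \\
  &= \sum_{k=1}^{r} \lambda_k |\phi_k\rangle\langle\phi_k|_A \tag{3.2.73} \\
  &= \rho_A, \tag{3.2.74}
\end{align}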
where we conclude that Tr[𝑉 |𝜑 𝑘 ⟩⟨𝜑 𝑘 ′ |𝑉 † ] = 𝛿 𝑘,𝑘 ′ from cyclicity of the trace and
𝑉 †𝑉 = 1 𝑅 . So |𝜓 ′⟩⟨𝜓 ′ | 𝑅′ 𝐴 is also a purification of 𝜌 𝐴 .
A converse statement holds as well by employing the Schmidt decomposition
(Theorem 2.2): if |𝜓⟩⟨𝜓| 𝑅 𝐴 and |𝜓 ′⟩⟨𝜓 ′ | 𝑅′ 𝐴 are two purifications of the state 𝜌 𝐴 ,
then they are related by an isometry taking one reference system to the other. By
combining this statement and the previous one, we can thus say that purifications
are unique “up to isometries acting on the reference system.”
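The non-uniqueness of purifications can be seen numerically. The sketch below builds the purification (3.2.63) from a spectral decomposition, applies a randomly generated isometry $V_{R \to R'}$ to the reference system as in (3.2.71), and confirms that the reduced state on $A$ is unchanged. The dimensions and the QR-based construction of the isometry are choices made for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_state(d):
    """A random density matrix, used only as illustrative input."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def purify(rho):
    """|psi>_{RA} = sum_k sqrt(lambda_k) |k>_R ⊗ |phi_k>_A, as in (3.2.63)."""
    lam, phi = np.linalg.eigh(rho)          # columns of phi are the |phi_k>
    lam = np.clip(lam, 0, None)
    d = rho.shape[0]
    # Row k of the coefficient matrix is sqrt(lambda_k) times the vector phi_k.
    return (np.sqrt(lam)[:, None] * phi.T).reshape(d * d)

def reduced_state_on_A(psi, d_A):
    """Trace out the first (reference) system of a vector on R ⊗ A."""
    m = psi.reshape(-1, d_A)
    return m.T @ m.conj()

d_A, d_R2 = 2, 5
rho = random_state(d_A)
psi = purify(rho)                           # here the reference R has dimension d_A

# A random isometry V: R -> R' (orthonormal columns) from a QR decomposition.
A = rng.normal(size=(d_R2, d_A)) + 1j * rng.normal(size=(d_R2, d_A))
V, _ = np.linalg.qr(A)
psi2 = np.kron(V, np.eye(d_A)) @ psi        # (3.2.71)
```

Both `psi` and `psi2` reduce to the same state on $A$, even though they live on reference systems of different dimension.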
A purification is an example of an “extension” of a quantum state.

Definition 3.11 Extension

An extension of a quantum state 𝜌 𝐴 is a state 𝜔 𝑅 𝐴 satisfying Tr 𝑅 [𝜔 𝑅 𝐴 ] = 𝜌 𝐴 ,
where 𝑅 is a reference system.

Remark: For a purification, it is required that 𝑑 𝑅 ≥ rank(𝜌 𝐴). However, there is no such
requirement for an extension.

Note that if the state 𝜌 𝐴 is pure, i.e., if 𝜌 𝐴 = |𝜙⟩⟨𝜙| 𝐴 , then every extension 𝜔 𝑅 𝐴
of 𝜌 𝐴 must be a product state, i.e., we must have 𝜔 𝑅 𝐴 = 𝜎𝑅 ⊗ |𝜙⟩⟨𝜙| 𝐴 for some
state 𝜎𝑅 .

3.2.6 Multipartite States and Permutations

A multipartite quantum state is a quantum state of more than two quantum systems.
Let 𝐴1 , . . . , 𝐴𝑛 denote 𝑛 ≥ 2 quantum systems. Then, every quantum state 𝜌 𝐴1 ···𝐴𝑛
can be represented as
\[
  \rho_{A_1 \cdots A_n} = \sum_{i_1,i_1'=0}^{d_{A_1}-1} \cdots \sum_{i_n,i_n'=0}^{d_{A_n}-1}
  \beta_{i_1,\dotsc,i_n;\,i_1',\dotsc,i_n'}\, |i_1,\dotsc,i_n\rangle\langle i_1',\dotsc,i_n'|_{A_1 \cdots A_n}, \tag{3.2.75}
\]

where 𝛽𝑖1 ,...,𝑖 𝑛 ;𝑖1′ ,...,𝑖 ′𝑛 = ⟨𝑖 1 , . . . , 𝑖 𝑛 |𝜌 𝐴1 ···𝐴𝑛 |𝑖′1 , . . . , 𝑖′𝑛 ⟩. This representation is simply
the generalization of the representation in (3.2.12) to 𝑛 ≥ 2 quantum systems.
Similarly, the generalization of the representation in (3.2.13) to $n \geq 2$ quantum
systems is
\[
  \rho_{A_1 \cdots A_n} = \frac{1}{d_{A_1} \cdots d_{A_n}} \sum_{k_1=0}^{d_{A_1}^2-1} \cdots \sum_{k_n=0}^{d_{A_n}^2-1}
  r_{k_1,\dotsc,k_n}\, S_{A_1}^{k_1} \otimes \cdots \otimes S_{A_n}^{k_n}. \tag{3.2.76}
\]

If the quantum systems 𝐴1 , 𝐴2 , . . . , 𝐴𝑛 are identical, meaning that the Hilbert

spaces H 𝐴1 , . . . , H 𝐴𝑛 are isomorphic to each other, so that 𝑑 𝐴1 = 𝑑 𝐴2 = · · · = 𝑑 𝐴𝑛 ,
then we can identify them with a single quantum system 𝐴 of dimension 𝑑 𝐴 . In this
case, for brevity, we often write 𝜌 𝐴𝑛 ≡ 𝜌 𝐴1 ···𝐴𝑛 to denote a quantum state for the 𝑛
identical systems. For the remainder of this section, we assume that the systems
𝐴1 , 𝐴2 , . . . , 𝐴𝑛 are identical.
Unlike for bipartite systems, the entanglement of multipartite systems is less
straightforward to define, because there are different notions of separability that one
can define. A discussion of these different notions of separability for multipartite
quantum systems is beyond the scope of this book. Please see the Bibliographic
Notes (Section 3.4) for references.
An important consideration for a collection of identical quantum systems is
permutations. For example, many quantum systems, such as bosons and fermions,
have quantum states that are symmetric and anti-symmetric, respectively, under
permutation of the individual systems. This is due to the fact that bosons and
fermions are not only identical particles but also indistinguishable. (See the
Bibliographic Notes in Section 3.4 for more information about the quantum theory
of bosons and fermions.) For our purposes, in quantum information, permutations
are a useful tool for establishing certain information quantities as upper bounds on
the rates of some quantum communication tasks.
Recall that we discussed the notion of permutations in Section 2.5. Specifically,
in (2.5.1), we defined a unitary operator $W^\pi_{A^n}$ acting on $\mathcal{H}_A^{\otimes n}$, for every permutation
$\pi \in \mathcal{S}_n$, as follows:
\[
  W^\pi_{A^n} |i_1, i_2, \dotsc, i_n\rangle_{A^n} = |i_{\pi(1)}, i_{\pi(2)}, \dotsc, i_{\pi(n)}\rangle_{A^n}, \tag{3.2.77}
\]
for all $0 \leq i_1, i_2, \dotsc, i_n \leq d-1$. Physically, the operators $W^\pi_{A^n}$ correspond to
permuting the states of the (identical) systems $A_1, A_2, \dotsc, A_n$. As an example,
consider $n$ quantum states $\rho^1, \rho^2, \dotsc, \rho^n \in \mathrm{D}(\mathcal{H}_A)$. Then, for every permutation
$\pi \in \mathcal{S}_n$,
\[
  W^\pi_{A^n} \left( \rho^1_{A_1} \otimes \rho^2_{A_2} \otimes \cdots \otimes \rho^n_{A_n} \right) W^{\pi\dagger}_{A^n}
  = \rho^{\pi(1)}_{A_1} \otimes \rho^{\pi(2)}_{A_2} \otimes \cdots \otimes \rho^{\pi(n)}_{A_n}, \tag{3.2.78}
\]

Figure 3.3: Depiction of the permutation operator $W^\pi$ defined in (3.2.77) for
$n = 4$ quantum systems and the permutation $\pi$ defined by $\pi(1) = 4$, $\pi(2) = 1$,
$\pi(3) = 2$, and $\pi(4) = 3$.

which follows straightforwardly from the definition in (3.2.77). See Figure 3.3 for
a visual depiction of the action of 𝑊 𝐴𝜋 𝑛 .

Exercise 3.13
1. Verify (3.2.78).
2. Let $n \geq 2$, and consider the cyclic permutation $\pi = (1\ 2\ \dotsc\ n)$, which
satisfies $\pi(i) = i + 1$ for all $i \in \{1, 2, \dotsc, n-1\}$ and $\pi(n) = 1$. Prove
that for all quantum states $\rho^1_{A_1}, \rho^2_{A_2}, \dotsc, \rho^n_{A_n}$, with $A_1, A_2, \dotsc, A_n$ being
identical quantum systems,
\[
  \mathrm{Tr}[W^\pi_{A^n} (\rho^1_{A_1} \otimes \rho^2_{A_2} \otimes \cdots \otimes \rho^n_{A_n})] = \mathrm{Tr}[\rho^1_{A_1} \rho^2_{A_2} \cdots \rho^n_{A_n}]. \tag{3.2.79}
\]

If, in (3.2.78), we have that $\rho^1 = \rho^2 = \cdots = \rho^n = \rho$, then the state $\rho^{\otimes n}$ is
invariant under every permutation, i.e.,
\[
  W^\pi_{A^n} \rho_A^{\otimes n} W^{\pi\dagger}_{A^n} = \rho_A^{\otimes n} \tag{3.2.80}
\]
for all $\pi \in \mathcal{S}_n$.
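The action of the permutation operators can be illustrated on a small example. The sketch below builds $W^\pi$ directly from (3.2.77) and checks (3.2.78) and (3.2.79) for three qubits. Permutations are stored 0-indexed, so `perm[k]` plays the role of $\pi(k+1) - 1$; this encoding and the function names are assumptions of the example, not notation from the text.

```python
import itertools
from functools import reduce
import numpy as np

rng = np.random.default_rng(2)

def random_state(d):
    """A random density matrix, used only as illustrative input."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def kron_all(ops):
    """Tensor product of a list of operators, in the given order."""
    return reduce(np.kron, ops)

def perm_operator(perm, d):
    """W^pi |i_1,...,i_n> = |i_pi(1),...,i_pi(n)>, as in (3.2.77); perm is 0-indexed."""
    n = len(perm)
    dims = (d,) * n
    W = np.zeros((d ** n, d ** n))
    for idx in itertools.product(range(d), repeat=n):
        out = tuple(idx[perm[k]] for k in range(n))
        W[np.ravel_multi_index(out, dims), np.ravel_multi_index(idx, dims)] = 1
    return W

d, n = 2, 3
rhos = [random_state(d) for _ in range(n)]

perm = (2, 0, 1)                       # pi(1) = 3, pi(2) = 1, pi(3) = 2
Wp = perm_operator(perm, d)
lhs = Wp @ kron_all(rhos) @ Wp.conj().T
rhs = kron_all([rhos[perm[k]] for k in range(n)])        # right-hand side of (3.2.78)

cyc = perm_operator((1, 2, 0), d)      # cyclic permutation pi(i) = i + 1, pi(n) = 1
trace_lhs = np.trace(cyc @ kron_all(rhos))
trace_rhs = np.trace(rhos[0] @ rhos[1] @ rhos[2])        # right-hand side of (3.2.79)
```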

Definition 3.12 Permutation-Invariant State

A state $\rho \in \mathrm{D}(\mathcal{H}^{\otimes n})$ is called permutation invariant if
\[
  \rho = W^\pi \rho W^{\pi\dagger} \tag{3.2.81}
\]
for every permutation $\pi \in \mathcal{S}_n$, where the unitary permutation operator $W^\pi$ is
defined in (3.2.77).

Remark: Note that the permutation-invariance condition in (3.2.81) does not imply that the
state $\rho$ is supported on the symmetric subspace $\mathrm{Sym}_n(\mathcal{H})$ of $\mathcal{H}^{\otimes n}$. In other words, the condition
in (3.2.81) does not imply that
\[
  \Pi_{\mathrm{Sym}_n(\mathcal{H})}\, \rho\, \Pi_{\mathrm{Sym}_n(\mathcal{H})} = \rho. \tag{3.2.82}
\]
As a simple example, suppose that $\mathcal{H} = \mathbb{C}^2$, and let $\rho = |\Psi^-\rangle\langle\Psi^-|$, where
$|\Psi^-\rangle = \frac{1}{\sqrt{2}}(|0,1\rangle - |1,0\rangle)$ is the two-qubit Bell state defined in (3.2.47). Then, it is easy
to see that $W^\pi \rho W^{\pi\dagger} = \rho$ for all $\pi \in \mathcal{S}_2$, while
$\Pi_{\mathrm{Sym}_2(\mathcal{H})}\, \rho\, \Pi_{\mathrm{Sym}_2(\mathcal{H})} = 0$. The latter is true because $|\Psi^-\rangle$ is an anti-symmetric
state, i.e., $|\Psi^-\rangle \in \mathrm{ASym}_2(\mathcal{H})$. The state $\rho$ is thus supported on the anti-symmetric subspace,
even though it is permutation invariant.

Exercise 3.14
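Let $\rho_{AB} = \sum_{x \in \mathcal{X}} p(x)\, \sigma_A^x \otimes \tau_B^x$, where $\mathcal{X}$ is a finite alphabet, $p : \mathcal{X} \to [0,1]$ is a
probability distribution, and $\{\sigma_A^x\}_{x \in \mathcal{X}}$, $\{\tau_B^x\}_{x \in \mathcal{X}}$ are sets of quantum states.

1. Prove that $\tilde{\rho}_{ABB'} \coloneqq \sum_{x \in \mathcal{X}} p(x)\, \sigma_A^x \otimes \tau_B^x \otimes \tau_{B'}^x$ is an extension of $\rho_{AB}$, with
$B'$ being the reference system, in accordance with Definition 3.11, such
that $d_{B'} = d_B$. Prove also that $\tilde{\rho}_{AB'} \coloneqq \mathrm{Tr}_B[\tilde{\rho}_{ABB'}] = \rho_{AB}$.

2. Now, let $\tilde{\rho}_{AB_1 B_2 \cdots B_k} \coloneqq \sum_{x \in \mathcal{X}} p(x)\, \sigma_A^x \otimes \tau_{B_1}^x \otimes \tau_{B_2}^x \otimes \cdots \otimes \tau_{B_k}^x$, where $k \in \mathbb{N}$.
Prove that $\tilde{\rho}_{AB_\ell} \coloneqq \mathrm{Tr}_{B_j : j \neq \ell}[\tilde{\rho}_{AB_1 B_2 \cdots B_k}] = \rho_{AB}$ for all $\ell \in \{1, 2, \dotsc, k\}$,
and that $W^\pi_{B_1 \cdots B_k}\, \tilde{\rho}_{AB_1 \cdots B_k}\, W^{\pi\dagger}_{B_1 \cdots B_k} = \tilde{\rho}_{AB_1 \cdots B_k}$ for all $\pi \in \mathcal{S}_k$. The notation
$\mathrm{Tr}_{B_j : j \neq \ell}$ indicates to take the partial trace over all of the $B$ systems except
for $B_\ell$.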

In the case 𝑛 = 2, meaning that there are only two quantum systems under
consideration, there is only one non-trivial permutation, 𝜋 = (1 2), which swaps
the two elements of the set {1, 2}. Recall from Exercise 2.33 that
\[
  W^{(1\,2)} = F \coloneqq \sum_{k,k'=0}^{d-1} |k,k'\rangle\langle k',k|. \tag{3.2.83}
\]

We call 𝐹 the swap operator, because 𝐹 (𝜌 ⊗ 𝜎)𝐹 † = 𝜎 ⊗ 𝜌 for all quantum states
𝜌 and 𝜎, which is a simple consequence of (3.2.78). In other words, the two states
𝜌 and 𝜎 become “swapped” with respect to the quantum systems after the action of
the operator $F$. The swap operator is Hermitian and satisfies $F^2 = \mathbb{1}$, meaning that
it is also unitary and self-inverse. Also, as a consequence of (3.2.79), we have
\[
  \mathrm{Tr}[F(\rho \otimes \sigma)] = \mathrm{Tr}[\rho\sigma] \tag{3.2.84}
\]
for all quantum states 𝜌 and 𝜎.
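The properties of the swap operator listed above, along with the singlet example from the remark following Definition 3.12, can be checked in a few lines. In the sketch below, the projector onto the symmetric subspace of two systems is taken to be $(\mathbb{1} + F)/2$, which is a standard fact used here as an assumption; the function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_state(d):
    """A random density matrix, used only as illustrative input."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def swap_operator(d):
    """F = sum_{k,k'} |k,k'><k',k| on C^d ⊗ C^d, as in (3.2.83)."""
    return np.eye(d * d).reshape(d, d, d, d).transpose(1, 0, 2, 3).reshape(d * d, d * d)

d = 3
F = swap_operator(d)
rho, sigma = random_state(d), random_state(d)

hermitian_involution = np.allclose(F, F.T) and np.allclose(F @ F, np.eye(d * d))
swaps_states = np.allclose(F @ np.kron(rho, sigma) @ F.T, np.kron(sigma, rho))
trace_identity = np.isclose(np.trace(F @ np.kron(rho, sigma)),
                            np.trace(rho @ sigma))            # (3.2.84)

# The singlet state is permutation invariant although it is an anti-symmetric vector.
F2 = swap_operator(2)
psi_minus = np.array([0, 1, -1, 0]) / np.sqrt(2)
singlet = np.outer(psi_minus, psi_minus)
proj_sym = (np.eye(4) + F2) / 2          # assumed form of the symmetric projector
singlet_invariant = np.allclose(F2 @ singlet @ F2.T, singlet)
singlet_not_symmetric = np.allclose(proj_sym @ singlet @ proj_sym, 0)
```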

Exercise 3.15
1. Verify that 𝐹 |Φ𝑑 ⟩ = |Φ𝑑 ⟩ for all 𝑑 ≥ 2.
2. For the two-qubit Bell state vectors defined in (3.2.43), prove that
$F|\Phi_{z,x}\rangle = (-1)^{zx} |\Phi_{z,x}\rangle$ for all $z, x \in \{0, 1\}$.
3. More generally, for the two-qudit Bell state vectors defined in (3.2.58),
prove that
\[
  F|\Phi_{z,x}\rangle = e^{\frac{2\pi\mathrm{i}zx}{d}} |\Phi_{z,-x}\rangle \tag{3.2.85}
\]
for all $z, x \in \{0, 1, \dotsc, d-1\}$.

Exercise 3.16
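Let $\rho_{A^n} \in \mathrm{D}(\mathcal{H}^{\otimes n})$ be an arbitrary quantum state, and consider the state
\[
  \sigma_{A^n} \coloneqq \frac{1}{n!} \sum_{\pi \in \mathcal{S}_n} W^\pi_{A^n} \rho_{A^n} W^{\pi\dagger}_{A^n}. \tag{3.2.86}
\]

1. Prove that $\sigma_{A^n}$ is a permutation-invariant state.

2. Let $|\phi\rangle_{RA^n}$ be a purification of $\rho_{A^n}$. Verify that
\[
  |\psi\rangle_{XRA^n} \coloneqq \frac{1}{\sqrt{n!}} \sum_{\pi \in \mathcal{S}_n} |\pi\rangle_X \otimes (\mathbb{1}_R \otimes W^\pi_{A^n}) |\phi\rangle_{RA^n} \tag{3.2.87}
\]
is a purification of $\sigma_{A^n}$, where $\{|\pi\rangle\}_{\pi \in \mathcal{S}_n}$ is an orthonormal basis indexed
by the elements of $\mathcal{S}_n$, such that $\langle\pi|\pi'\rangle = \delta_{\pi,\pi'}$ for all $\pi, \pi' \in \mathcal{S}_n$.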

It turns out that for permutation-invariant states, we can construct a purification
that is itself permutation invariant.

Lemma 3.13 Purification of Permutation-Invariant States

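Let $\rho_{A^n} \in \mathrm{D}(\mathcal{H}_A^{\otimes n})$ be a permutation-invariant state, i.e.,
\[
  \rho_{A^n} = W^\pi_{A^n} \rho_{A^n} W^{\pi\dagger}_{A^n} \tag{3.2.88}
\]
for all $\pi \in \mathcal{S}_n$, where the unitary operators in the set $\{W^\pi_{A^n}\}_{\pi \in \mathcal{S}_n}$ are defined in
(2.5.1). Then, there exists a permutation-invariant purification $|\psi^\rho\rangle\langle\psi^\rho|_{\hat{A}^n A^n}$ of
$\rho_{A^n}$, such that $|\psi^\rho\rangle_{\hat{A}^n A^n} \in \mathrm{Sym}_n(\mathcal{H}_{\hat{A}A})$. This means that
\[
  |\psi^\rho\rangle_{\hat{A}^n A^n} = (W^\pi_{\hat{A}^n} \otimes W^\pi_{A^n}) |\psi^\rho\rangle_{\hat{A}^n A^n} \tag{3.2.89}
\]
for all $\pi \in \mathcal{S}_n$, where the dimension of $\hat{A}$ is equal to the dimension of $A$.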

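Proof: Consider the canonical purification of $\rho_{A^n}$ as defined in (3.2.67); i.e., let
\[
  |\psi^\rho\rangle_{\hat{A}^n A^n} \coloneqq (\mathbb{1}_{\hat{A}^n} \otimes \sqrt{\rho_{A^n}}) |\Gamma\rangle_{\hat{A}^n A^n}. \tag{3.2.90}
\]
Then, because the operators $W^\pi_{A^n}$ are real in the standard basis, meaning that
$\overline{W^\pi_{\hat{A}^n}} = W^\pi_{\hat{A}^n}$ for all $\pi \in \mathcal{S}_n$, and using the transpose trick in (2.2.42), we obtain
\begin{align}
  (W^\pi_{\hat{A}^n} \otimes W^\pi_{A^n}) |\psi^\rho\rangle_{\hat{A}^n A^n}
  &= (W^\pi_{\hat{A}^n} \otimes W^\pi_{A^n}) (\mathbb{1}_{\hat{A}^n} \otimes \sqrt{\rho_{A^n}}) |\Gamma\rangle_{\hat{A}^n A^n} \tag{3.2.91} \\
  &= (\mathbb{1}_{\hat{A}^n} \otimes W^\pi_{A^n} \sqrt{\rho_{A^n}}\, W^{\pi\dagger}_{A^n}) |\Gamma\rangle_{\hat{A}^n A^n} \tag{3.2.92} \\
  &= \left( \mathbb{1}_{\hat{A}^n} \otimes \sqrt{W^\pi_{A^n} \rho_{A^n} W^{\pi\dagger}_{A^n}} \right) |\Gamma\rangle_{\hat{A}^n A^n} \tag{3.2.93} \\
  &= (\mathbb{1}_{\hat{A}^n} \otimes \sqrt{\rho_{A^n}}) |\Gamma\rangle_{\hat{A}^n A^n} \tag{3.2.94} \\
  &= |\psi^\rho\rangle_{\hat{A}^n A^n} \tag{3.2.95}
\end{align}
for all $\pi \in \mathcal{S}_n$, where the third equality follows from (2.2.70) and the fourth equality
follows from the permutation invariance of $\rho_{A^n}$. ■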
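Lemma 3.13 can be illustrated for $n = 2$ qubits, where the only non-trivial permutation operator is the swap operator $F$. The sketch below symmetrizes a random two-qubit state as in (3.2.86), forms its canonical purification (3.2.90), and checks the invariance (3.2.89). The specific dimensions and helper names are assumptions of the example.

```python
import numpy as np

def sqrt_psd(rho):
    """Square root of a positive semi-definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(rho)
    vals = np.clip(vals, 0, None)
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def swap_operator(d):
    """F = sum_{k,k'} |k,k'><k',k|, the permutation operator for pi = (1 2)."""
    return np.eye(d * d).reshape(d, d, d, d).transpose(1, 0, 2, 3).reshape(d * d, d * d)

rng = np.random.default_rng(4)
d = 2
D = d * d                                   # dimension of A^2 (and of A-hat^2)
G = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
rho = G @ G.conj().T
rho /= np.trace(rho).real

F = swap_operator(d)
sigma = (rho + F @ rho @ F.T) / 2           # uniform average over S_2, cf. (3.2.86)

gamma = np.eye(D).reshape(D * D)            # |Gamma> on (A-hat^2) ⊗ (A^2)
psi = np.kron(np.eye(D), sqrt_psd(sigma)) @ gamma    # canonical purification (3.2.90)

# (3.2.89): the purification is invariant under the same permutation on both parts.
invariant = np.allclose(np.kron(F, F) @ psi, psi)
```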

3.2.7 Group-Invariant States

So far, we have seen two special types of unitary operators: the Heisenberg–Weyl
operators $\{W_{z,x}\}_{z,x=0}^{d-1}$, introduced in Definition 3.7, and the permutation operators
$\{W^\pi\}_{\pi \in \mathcal{S}_n}$ defined in (3.2.77). Both sets of operators are examples of projective
unitary group representations. Specifically, the Heisenberg–Weyl operators form
a projective unitary representation of the group $\mathbb{Z}_d \times \mathbb{Z}_d$, and the permutation
operators form a unitary representation of the symmetric group $\mathcal{S}_n$.
Let us now formally define the concepts of a group and group representations.
A group 𝐺 is a tuple (G, ∗) consisting of a set G of objects and an associative
operation ∗ used to combine them. We write 𝑔 ∈ 𝐺 to mean that the object 𝑔
belongs to the set G. Then, the operation ∗ is such that 𝑔 ∗ 𝑔′ ∈ 𝐺 for all 𝑔, 𝑔′ ∈ 𝐺.
Furthermore, there is an identity element id such that 𝑔 ∗ id = 𝑔 = id ∗ 𝑔 for all
𝑔 ∈ 𝐺, and corresponding to every element 𝑔 ∈ 𝐺 is an inverse 𝑔 −1 ∈ 𝐺 such that
𝑔 ∗ 𝑔 −1 = 𝑔 −1 ∗ 𝑔 = id. We mostly consider finite groups in this book, and we
use |𝐺 | to denote the number of elements in the associated set G. Please see the
Bibliographic Notes (Section 3.4) for more information about groups.
A unitary representation of a group $G$ is a set $\{U^g\}_{g \in G}$ of unitary operators,
with one unitary operator associated to each element of $G$. The unitary operators
respect the group operation $*$, in the sense that $U^g U^{g'} = U^{g * g'}$ for all $g, g' \in G$. In
particular, this implies that $U^{\mathrm{id}} = \mathbb{1}$ and $U^{g^{-1}} = (U^g)^\dagger$ for all $g \in G$. A unitary
representation of a group $G$ is called projective if the unitaries respect the group
operation up to a phase factor, i.e., if $U^g U^{g'} = \omega(g, g')\, U^{g * g'}$ for all $g, g' \in G$, where
$\omega(g, g') \in \mathbb{C}$ satisfies $|\omega(g, g')| = 1$ for all $g, g' \in G$. Please see the Bibliographic
Notes (Section 3.4) for more information about group representations.
The action of the group representation on a quantum system is defined by the
channels 𝜌 ↦→ 𝑈 𝑔 𝜌𝑈 𝑔† for all 𝑔 ∈ 𝐺, where 𝜌 is a state of the quantum system and
{𝑈 𝑔 }𝑔∈𝐺 is a (projective) unitary representation of the group 𝐺. Physically, groups
are used to model certain types of operations on a system (such as permutations,
translations, rotations, etc.). Mathematically, systems that have symmetries are
such that the states of the system are invariant under the action of the corresponding
group.

Definition 3.14 Group-Invariant State

Let 𝐺 be a (finite) group and {𝑈 𝑔 }𝑔∈𝐺 a 𝑑-dimensional unitary representation
of 𝐺, with 𝑑 ≥ 2. A quantum state 𝜌 ∈ D(C𝑑 ) is called group invariant, or
𝐺-invariant, if 𝜌 = 𝑈 𝑔 𝜌𝑈 𝑔† for all 𝑔 ∈ 𝐺.


Exercise 3.17
Let $\rho_A$ be a quantum state for a $d$-dimensional quantum system $A$, let $G$ be a
group, and let $\{U_A^g\}_{g \in G}$ be a $d$-dimensional unitary representation of $G$.
1. Prove that the state
\[
  \mathcal{T}^G(\rho_A) \coloneqq \frac{1}{|G|} \sum_{g \in G} U_A^g \rho_A U_A^{g\dagger} \tag{3.2.96}
\]
is group invariant. The quantum channel $\mathcal{T}^G$ is known as the twirl channel
with respect to the unitary representation $\{U^g\}_{g \in G}$ of the group $G$.
2. Let $|\phi\rangle_{RA}$ be a purification of $\rho_A$. Verify that
\[
  \frac{1}{\sqrt{|G|}} \sum_{g \in G} |g\rangle_X \otimes (\mathbb{1}_R \otimes U_A^g) |\phi\rangle_{RA} \tag{3.2.97}
\]
is a purification of $\mathcal{T}^G(\rho_A)$.

The twirl map in (3.2.96) corresponding to the Heisenberg–Weyl unitaries has
the following special form.

Lemma 3.15 Heisenberg–Weyl Twirl

For every linear operator $M \in \mathrm{L}(\mathbb{C}^d)$,
\[
  \frac{1}{d^2} \sum_{z,x=0}^{d-1} W_{z,x} M W_{z,x}^\dagger = \mathrm{Tr}[M]\, \frac{\mathbb{1}_d}{d}. \tag{3.2.98}
\]

Proof: The Heisenberg–Weyl operators form an irreducible projective unitary
representation of the group $\mathbb{Z}_d \times \mathbb{Z}_d$. This fact can be used to prove (3.2.98) (see
Bibliographic Notes in Section 3.4). Alternatively, the result is immediate using
orthonormality of the set $\{\frac{1}{\sqrt{d}} W_{z,x}\}_{z,x=0}^{d-1}$ (see (3.2.54)) along with Problem 6 in
Section 2.7. For an alternative approach, see Exercise 3.18. ■


Exercise 3.18
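Provide a direct proof of Lemma 3.15. Do this by first showing that
$\frac{1}{d^2} \sum_{z,x=0}^{d-1} W_{z,x} M W_{z,x}^\dagger = (\mathcal{D}_X \circ \mathcal{D}_Z)(M)$, where
\begin{align}
  \mathcal{D}_X(M) &\coloneqq \frac{1}{d} \sum_{x=0}^{d-1} X(x)\, M\, X(x)^\dagger, \tag{3.2.99} \\
  \mathcal{D}_Z(M) &\coloneqq \frac{1}{d} \sum_{z=0}^{d-1} Z(z)\, M\, Z(z)^\dagger. \tag{3.2.100}
\end{align}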

Recall that the Heisenberg–Weyl operators reduce to the Pauli operators for
𝑑 = 2. In this case, we obtain
\[
  \frac{1}{4} M + \frac{1}{4} XMX + \frac{1}{4} YMY + \frac{1}{4} ZMZ = \mathrm{Tr}[M]\, \frac{\mathbb{1}_2}{2}, \tag{3.2.101}
\]
for every 𝑀 ∈ L(C2 ).
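Lemma 3.15 lends itself to a direct numerical check. The following sketch averages $W_{z,x} M W_{z,x}^\dagger$ over all $d^2$ Heisenberg–Weyl operators for a random matrix $M$ and compares the result with $\mathrm{Tr}[M]\,\mathbb{1}_d/d$. It assumes the ordering $W_{z,x} = Z(z)X(x)$; since the operators enter only through conjugation, the overall phase convention does not affect the outcome.

```python
import numpy as np

def W(z, x, d):
    """Heisenberg-Weyl operator, assuming the convention W_{z,x} = Z(z) X(x)."""
    Zz = np.diag(np.exp(2j * np.pi * np.arange(d) * z / d))
    Xx = np.roll(np.eye(d), x, axis=0)
    return Zz @ Xx

def hw_twirl(M):
    """Left-hand side of (3.2.98): uniform average of W M W^dagger over all (z, x)."""
    d = M.shape[0]
    total = sum(W(z, x, d) @ M @ W(z, x, d).conj().T
                for z in range(d) for x in range(d))
    return total / d ** 2

rng = np.random.default_rng(5)
d = 3
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))   # arbitrary linear operator
twirled = hw_twirl(M)
expected = np.trace(M) * np.eye(d) / d                        # right-hand side of (3.2.98)
```

For $d = 2$ the same function reproduces the Pauli twirl in (3.2.101).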

Exercise 3.19
Let $G$ be a group, and let $\{U^g\}_{g \in G}$ be a $d$-dimensional unitary representation
of $G$, with $d \geq 2$. If $\rho \in \mathrm{D}(\mathbb{C}^d)$ is a group-invariant state, then prove that there
exists a purification $|\psi^\rho\rangle$ of $\rho$ such that $(\overline{U^g} \otimes U^g) |\psi^\rho\rangle = |\psi^\rho\rangle$ for all $g \in G$.

3.2.8 Ensembles and Classical–Quantum States

For a finite alphabet $\mathcal{X}$, an ensemble is a collection $\{(p(x), \rho^x)\}_{x \in \mathcal{X}}$ consisting of a
probability distribution $p : \mathcal{X} \to [0,1]$ such that each probability $p(x)$ is paired
with a quantum state $\rho^x$. Ensembles are used to describe quantum systems that are
known to be in one of a given set of states with some probability.
Suppose that Alice is in possession of a quantum system and that she prepares
the system in the state 𝜌 𝑥 with probability 𝑝(𝑥). The state of the system can thus be
described by the ensemble {( 𝑝(𝑥), 𝜌 𝑥 )}𝑥∈X . If she sends the system to Bob without
telling him in which of the states the system has been prepared, but Bob knows
the ensemble {( 𝑝(𝑥), 𝜌 𝑥 )}𝑥∈X describing the system, then from Bob’s perspective
the state of the system is given by the expected state $\rho$ of the ensemble, which is
specified as
\[
  \rho = \sum_{x \in \mathcal{X}} p(x)\, \rho^x. \tag{3.2.102}
\]

On the other hand, if Alice sends Bob classical information about which state
she has prepared, then from Bob’s perspective the state of the system can be
described by the following classical–quantum state:
\[
  \rho_{XB} = \sum_{x \in \mathcal{X}} p(x)\, |x\rangle\langle x|_X \otimes \rho_B^x, \tag{3.2.103}
\]
where $X$ is the $|\mathcal{X}|$-dimensional quantum system corresponding to the register
holding the information about which state was sent and $\{|x\rangle\}_{x \in \mathcal{X}}$ is an orthonormal
basis for $\mathcal{H}_X$. Furthermore, the reduced state of Bob’s system $B$, after discarding $X$,
is
\[
  \mathrm{Tr}_X[\rho_{XB}] = \sum_{x \in \mathcal{X}} p(x)\, \rho^x = \rho, \tag{3.2.104}
\]
consistent with (3.2.102).
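Classical–quantum states have a block-diagonal structure, in the sense that they
can equivalently be written as the following block-diagonal matrix, in which the
elements of $\mathcal{X}$ are enumerated as $x_1, \dotsc, x_{|\mathcal{X}|}$:
\[
  \rho_{XB} =
  \begin{pmatrix}
    p(x_1)\, \rho^{x_1} & & \\
    & \ddots & \\
    & & p(x_{|\mathcal{X}|})\, \rho^{x_{|\mathcal{X}|}}
  \end{pmatrix}. \tag{3.2.105}
\]
We can represent (3.2.105) more compactly as
\[
  \rho_{XB} = \bigoplus_{x \in \mathcal{X}} p(x)\, \rho^x. \tag{3.2.106}
\]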
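The block-diagonal structure of a classical–quantum state is easy to see numerically. The sketch below assembles $\rho_{XB}$ from (3.2.103) for a small, randomly chosen ensemble, inspects its blocks, and recovers the expected state (3.2.102) by tracing out the classical register. The alphabet size, the dimension of $B$, and the variable names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)

def random_state(d):
    """A random density matrix, used only as illustrative input."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

n_X, d_B = 3, 2
p = rng.random(n_X)
p /= p.sum()
states = [random_state(d_B) for _ in range(n_X)]

# rho_XB = sum_x p(x) |x><x|_X ⊗ rho^x_B, as in (3.2.103).
basis = np.eye(n_X)
rho_XB = sum(p[x] * np.kron(np.outer(basis[x], basis[x]), states[x])
             for x in range(n_X))

# Block view: blocks[x, :, y, :] is the (x, y) block, of size d_B x d_B.
blocks = rho_XB.reshape(n_X, d_B, n_X, d_B)

# Partial trace over X sums the diagonal blocks, cf. (3.2.104).
rho_B = np.einsum('xbxc->bc', blocks)
expected = sum(p[x] * states[x] for x in range(n_X))     # (3.2.102)
```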

Exercise 3.20
Construct a purification of the classical–quantum state 𝜌 𝑋 𝐵 in (3.2.103). (Hint:
Consider a purification analogous to the one in (3.2.87).)


3.2.9 Partial Transpose and PPT States

In Section 3.2.2, we defined the partial trace superoperator as a generalization of the
usual trace to the case that it acts only on one part of a composite quantum system.
In a similar manner, in this section, we define the partial transpose superoperator.
The partial transpose plays an important role in quantum information theory due to
its connection with entanglement. In fact, as we show in this section, it leads to a
sufficient condition for a bipartite state to be entangled.
Recall from (2.2.25) that the action of the transpose map $\mathrm{T}$ on a linear operator
$X_{A \to A'}$ can be written as
\[
  \mathrm{T}(X) = \sum_{i=0}^{d_A-1} \sum_{j=0}^{d_{A'}-1} |i\rangle_A \langle j|_{A'}\, X\, |i\rangle_A \langle j|_{A'}, \tag{3.2.107}
\]
where we have defined the transpose with respect to the orthonormal bases
$\{|i\rangle_A : 0 \leq i \leq d_A - 1\}$ and $\{|j\rangle_{A'} : 0 \leq j \leq d_{A'} - 1\}$. This is consistent with the
familiar definition of the transpose of a matrix $X$ as being the matrix $X^{\mathsf{T}}$ with its
rows and columns flipped relative to $X$. Indeed, if $X$ has the matrix representation
$X = \sum_{j=0}^{d_{A'}-1} \sum_{i=0}^{d_A-1} X_{j,i}\, |j\rangle_{A'} \langle i|_A$, then it follows that the transpose of $X$ is
\[
  X^{\mathsf{T}} = \sum_{j=0}^{d_{A'}-1} \sum_{i=0}^{d_A-1} X_{j,i}\, |i\rangle_A \langle j|_{A'} = \mathrm{T}(X). \tag{3.2.108}
\]

Note that, unlike the trace or the conjugate transpose, the transpose depends on the
choice of orthonormal bases used to evaluate it. Throughout the rest of this book,
we use both T(𝑋) and 𝑋 T to refer to the transpose of 𝑋 with respect to the standard
orthonormal basis.

Exercise 3.21
For all $d \geq 2$, prove that the transpose map can be realized as follows, in terms
of the Heisenberg–Weyl operators from Definition 3.7:
\[
  \mathrm{T}(X) = \frac{1}{d} \sum_{z,x=0}^{d-1} e^{-\frac{2\pi\mathrm{i}zx}{d}}\, W_{z,x}\, X\, W_{z,-x}^\dagger, \tag{3.2.109}
\]
where the equality holds for every linear operator $X \in \mathrm{L}(\mathbb{C}^d)$.


The transpose map is known as the partial transpose when it acts on one
subsystem of a bipartite linear operator 𝑋 𝐴𝐵 .

Definition 3.16 Partial Transpose

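Given quantum systems $A$ and $B$, the partial transpose on $B$ is denoted by
$\mathrm{T}_B \equiv \mathrm{id}_A \otimes \mathrm{T}_B$, and it is defined as
\[
  \mathrm{T}_B(X_{AB}) \coloneqq \sum_{j,j'=0}^{d_B-1} (\mathbb{1}_A \otimes |j\rangle\langle j'|_B)\, X_{AB}\, (\mathbb{1}_A \otimes |j\rangle\langle j'|_B) \tag{3.2.110}
\]
for every linear operator $X_{AB} \in \mathrm{L}(\mathcal{H}_A \otimes \mathcal{H}_B)$. Similarly, the partial transpose
on $A$ is denoted by $\mathrm{T}_A \equiv \mathrm{T}_A \otimes \mathrm{id}_B$, and it is defined as
\[
  \mathrm{T}_A(X_{AB}) \coloneqq \sum_{i,i'=0}^{d_A-1} (|i\rangle\langle i'|_A \otimes \mathbb{1}_B)\, X_{AB}\, (|i\rangle\langle i'|_A \otimes \mathbb{1}_B) \tag{3.2.111}
\]
for all $X_{AB} \in \mathrm{L}(\mathcal{H}_A \otimes \mathcal{H}_B)$.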

Given an expansion of $X_{AB}$ as
\[
  X_{AB} = \sum_{i,j=0}^{d_B-1} X_A^{i,j} \otimes |i\rangle\langle j|_B, \tag{3.2.112}
\]
where each $X_A^{i,j} \coloneqq \langle i|_B X_{AB} |j\rangle_B$ is a linear operator acting on system $A$, the partial
transpose map $\mathrm{T}_B$ has the action
\[
  \mathrm{T}_B(X_{AB}) = \sum_{i,j=0}^{d_B-1} X_A^{i,j} \otimes |j\rangle\langle i|_B = \sum_{i,j=0}^{d_B-1} X_A^{j,i} \otimes |i\rangle\langle j|_B. \tag{3.2.113}
\]
The partial transpose map is self-inverse, i.e.,

(id 𝐴 ⊗ T𝐵 ) ◦ (id 𝐴 ⊗ T𝐵 ) = id 𝐴𝐵 , (3.2.114)

and it is self-adjoint with respect to the Hilbert–Schmidt inner product, in the sense
that
⟨𝑋 𝐴𝐵 , T𝐵 (𝑌 𝐴𝐵 )⟩ = ⟨T𝐵 (𝑋 𝐴𝐵 ), 𝑌 𝐴𝐵 ⟩, (3.2.115)
for all operators 𝑋 𝐴𝐵 and 𝑌 𝐴𝐵 .

We also have the following generalization of the transpose trick from (2.2.40):

(𝑋 𝑅 𝐴 ⊗ 1𝐵 )( 1 𝑅 ⊗ |Γ⟩ 𝐴𝐵 ) = ( 1 𝐴 ⊗ T𝐵 (𝑋 𝑅𝐵 ))( 1 𝑅 ⊗ |Γ⟩ 𝐴𝐵 ), (3.2.116)

where the Hilbert spaces corresponding to the systems 𝐴 and 𝐵 are isomorphic and
𝑋 𝑅 𝐴 ∈ L(H 𝑅 ⊗ H 𝐴 ).
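In numerical work, the partial transpose is usually implemented by reshaping the operator into a four-index array and swapping the two indices belonging to $B$. The sketch below does this, compares the result with a direct evaluation of the sum in (3.2.110), and tests the self-inverse property (3.2.114) and the self-adjointness (3.2.115) on random operators. The function names are illustrative.

```python
import numpy as np

def partial_transpose_B(X_AB, d_A, d_B):
    """T_B(X_AB): exchange the two B indices of the array X[a, b, a', b']."""
    T = X_AB.reshape(d_A, d_B, d_A, d_B).transpose(0, 3, 2, 1)
    return T.reshape(d_A * d_B, d_A * d_B)

def partial_transpose_B_from_definition(X_AB, d_A, d_B):
    """Direct evaluation of the double sum in (3.2.110)."""
    out = np.zeros_like(X_AB)
    for j in range(d_B):
        for jp in range(d_B):
            E = np.zeros((d_B, d_B))
            E[j, jp] = 1                          # the operator |j><j'| on B
            K = np.kron(np.eye(d_A), E)
            out = out + K @ X_AB @ K
    return out

rng = np.random.default_rng(7)
d_A, d_B = 2, 3
D = d_A * d_B
X = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
Y = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))

agrees = np.allclose(partial_transpose_B(X, d_A, d_B),
                     partial_transpose_B_from_definition(X, d_A, d_B))
self_inverse = np.allclose(
    partial_transpose_B(partial_transpose_B(X, d_A, d_B), d_A, d_B), X)   # (3.2.114)
# Hilbert-Schmidt inner product <X, Y> = Tr[X^dagger Y] = np.vdot(X, Y).
self_adjoint = np.isclose(np.vdot(X, partial_transpose_B(Y, d_A, d_B)),
                          np.vdot(partial_transpose_B(X, d_A, d_B), Y))   # (3.2.115)
```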

Exercise 3.22
Verify (3.2.114), (3.2.115), and (3.2.116).

Definition 3.17 PPT State

A bipartite state 𝜌 𝐴𝐵 is called a positive partial transpose (PPT) state if the
partial transpose T𝐵 (𝜌 𝐴𝐵 ) is positive semi-definite.
The set of PPT states is denoted by PPT( 𝐴 : 𝐵), so that

PPT( 𝐴 : 𝐵) B {𝜎𝐴𝐵 : 𝜎𝐴𝐵 ≥ 0, T𝐵 (𝜎𝐴𝐵 ) ≥ 0, Tr[𝜎𝐴𝐵 ] = 1} . (3.2.117)

Lemma 3.18
Given quantum systems 𝐴 and 𝐵, the set PPT( 𝐴 : 𝐵) does not depend on which
system the transpose is taken, nor does it depend on which orthonormal basis is
used to define the transpose map.

Proof: To see the first statement, suppose that 𝜌 𝐴𝐵 ∈ PPT( 𝐴 : 𝐵). This means
that T𝐵 (𝜌 𝐴𝐵 ) ≥ 0. But since the eigenvalues are invariant under a full transpose
T 𝐴 ⊗ T𝐵 , this means that (T 𝐴 ⊗ T𝐵 )(T𝐵 (𝜌 𝐴𝐵 )) ≥ 0, the latter being the same
as T 𝐴 (𝜌 𝐴𝐵 ) ≥ 0 due to the self-inverse property of the partial transpose. So
T𝐵 (𝜌 𝐴𝐵 ) ≥ 0 implies T 𝐴 (𝜌 𝐴𝐵 ) ≥ 0, and vice versa.
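To see the second statement, let $\mathrm{T}_B(\rho_{AB}) \geq 0$, and let $\{|\phi_\ell\rangle_B\}_{\ell=0}^{d_B-1}$ be some
other orthonormal basis for $B$. The partial transpose with respect to this basis is
given by
\[
  \sum_{\ell,\ell'=0}^{d_B-1} (\mathbb{1}_A \otimes |\phi_\ell\rangle\langle\phi_{\ell'}|_B)\, \rho_{AB}\, (\mathbb{1}_A \otimes |\phi_\ell\rangle\langle\phi_{\ell'}|_B). \tag{3.2.118}
\]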

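Now, consider that $\sum_{\ell=0}^{d_B-1} |\phi_\ell\rangle\langle\phi_\ell| = \mathbb{1}_B$, so that
\begin{align}
  \mathrm{T}_B(\rho_{AB})
  &= \sum_{j,j'=0}^{d_B-1} (\mathbb{1}_A \otimes |j\rangle\langle j'|_B)\, \rho_{AB}\, (\mathbb{1}_A \otimes |j\rangle\langle j'|_B) \notag \\
  &= \sum_{j,j',\ell,\ell'=0}^{d_B-1} (\mathbb{1}_A \otimes |j\rangle\langle j'|\phi_\ell\rangle\langle\phi_\ell|_B)\, \rho_{AB}\, (\mathbb{1}_A \otimes |\phi_{\ell'}\rangle\langle\phi_{\ell'}|j\rangle\langle j'|_B) \notag \\
  &= \sum_{j,j',\ell,\ell'=0}^{d_B-1} (\mathbb{1}_A \otimes \langle\phi_{\ell'}|j\rangle\, |j\rangle\langle\phi_\ell|_B)\, \rho_{AB}\, (\mathbb{1}_A \otimes |\phi_{\ell'}\rangle\langle j'|_B\, \langle j'|\phi_\ell\rangle) \notag \\
  &= \sum_{\ell,\ell'=0}^{d_B-1} \Bigl( \mathbb{1}_A \otimes \Bigl( \sum_{j=0}^{d_B-1} \langle\phi_{\ell'}|j\rangle\, |j\rangle \Bigr) \langle\phi_\ell|_B \Bigr)\, \rho_{AB}\, \Bigl( \mathbb{1}_A \otimes |\phi_{\ell'}\rangle \Bigl( \sum_{j'=0}^{d_B-1} \langle j'|_B\, \langle j'|\phi_\ell\rangle \Bigr) \Bigr) \notag \\
  &= \sum_{\ell,\ell'=0}^{d_B-1} (\mathbb{1}_A \otimes |\overline{\phi_{\ell'}}\rangle\langle\phi_\ell|_B)\, \rho_{AB}\, (\mathbb{1}_A \otimes |\phi_{\ell'}\rangle\langle\overline{\phi_\ell}|_B), \tag{3.2.119}
\end{align}
where in the last line we defined $|\overline{\phi_\ell}\rangle \coloneqq \sum_{j=0}^{d_B-1} \langle\phi_\ell|j\rangle\, |j\rangle$ for $0 \leq \ell \leq d_B - 1$. Note
that the set $\{|\overline{\phi_\ell}\rangle\}_{\ell=0}^{d_B-1}$ is orthonormal, so that $U_B \coloneqq \sum_{\ell=0}^{d_B-1} |\phi_\ell\rangle\langle\overline{\phi_\ell}|_B$ is a unitary
operator. Then we find that
\[
  \sum_{\ell,\ell'=0}^{d_B-1} (\mathbb{1}_A \otimes |\phi_{\ell'}\rangle\langle\phi_\ell|_B)\, \rho_{AB}\, (\mathbb{1}_A \otimes |\phi_{\ell'}\rangle\langle\phi_\ell|_B) = U_B\, \mathrm{T}_B(\rho_{AB})\, U_B^\dagger \geq 0, \tag{3.2.120}
\]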

where the last inequality follows from the condition T𝐵 (𝜌 𝐴𝐵 ) ≥ 0 and property 1
of Lemma 2.14. We thus conclude that the PPT property does not depend on which
orthonormal basis is used to define the transpose map. ■

We mentioned at the beginning of this section that the partial transpose is useful
in quantum information because it leads to a sufficient condition for a bipartite state
to be entangled. We now derive the sufficient condition.
Suppose that a state $\sigma_{AB}$ is a separable, unentangled state of the following form:
\[
  \sigma_{AB} = \sum_{x \in \mathcal{X}} p(x)\, \omega_A^x \otimes \tau_B^x, \tag{3.2.121}
\]
where $p : \mathcal{X} \to [0,1]$ is a probability distribution on a finite alphabet $\mathcal{X}$ and
$\{\omega_A^x\}_{x \in \mathcal{X}}$ and $\{\tau_B^x\}_{x \in \mathcal{X}}$ are sets of quantum states. Then, the action of the partial
transpose map $\mathrm{T}_B$ on $\sigma_{AB}$ is as follows:
\[
  \mathrm{T}_B(\sigma_{AB}) = \sum_{x \in \mathcal{X}} p(x)\, \omega_A^x \otimes \mathrm{T}(\tau_B^x), \tag{3.2.122}
\]

which is a separable quantum state, being the expected state of the ensemble
{( 𝑝(𝑥), 𝜔𝑥𝐴 ⊗ T(𝜏𝐵𝑥 ))}𝑥∈X . Each element of the ensemble is indeed a quantum state
because the transpose is a positive map, i.e., T(𝜏𝐵𝑥 ) ≥ 0 if 𝜏𝐵𝑥 ≥ 0. Due to this fact,
we conclude that T𝐵 (𝜎𝐴𝐵 ) ≥ 0, so that 𝜎𝐴𝐵 is a PPT state. Thus, we conclude the
following:

If a state is separable, then it is PPT.

This is called the PPT criterion. Equivalently, by taking the contrapositive of this
statement, we obtain the following:

If a state is not PPT, then it is entangled.

So the condition of a state being NPT (non-positive partial transpose) is sufficient
for detecting entanglement.
As an example, let us consider applying the partial transpose map $\mathrm{T}_B$ to the
maximally entangled state $\Phi_{AB}$, as defined in (3.2.39). In the case that the bases of
the partial transpose and the maximally entangled state are the same, we find that
\begin{align}
  \mathrm{T}_B(\Phi_{AB}) &= \mathrm{T}_B\!\left( \frac{1}{d} \sum_{i,i'=0}^{d-1} |i\rangle\langle i'|_A \otimes |i\rangle\langle i'|_B \right) \tag{3.2.123} \\
  &= \frac{1}{d} \sum_{i,i'=0}^{d-1} |i\rangle\langle i'|_A \otimes |i'\rangle\langle i|_B \tag{3.2.124} \\
  &= \frac{1}{d} F_{AB}, \tag{3.2.125}
\end{align}
where $F_{AB}$ is the swap operator defined in (3.2.83). If the bases are not the same,
then we find, by applying the same development from (3.2.118)–(3.2.120), that
there exists a unitary $U_B$ such that
\[
  \mathrm{T}_B(\Phi_{AB}) = \frac{1}{d}\, U_B F_{AB} U_B^\dagger. \tag{3.2.126}
\]

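From (2.5.8) and (2.5.9), the swap operator has the following spectral decomposition:
\[
  F_{AB} = \Pi_{AB}^{\mathrm{Sym}} - \Pi_{AB}^{\mathrm{ASym}}, \tag{3.2.127}
\]
where $\Pi_{AB}^{\mathrm{Sym}} \equiv \Pi_{\mathrm{Sym}_2(\mathbb{C}^d)}$ is the projection onto the symmetric subspace of $\mathbb{C}^d \otimes \mathbb{C}^d$
and $\Pi_{AB}^{\mathrm{ASym}} \equiv \Pi_{\mathrm{ASym}_2(\mathbb{C}^d)}$ is the projection onto the anti-symmetric subspace of
$\mathbb{C}^d \otimes \mathbb{C}^d$. Indeed, we have that $\Pi_{AB}^{\mathrm{Sym}} + \Pi_{AB}^{\mathrm{ASym}} = \mathbb{1}_{AB}$ and $\Pi_{AB}^{\mathrm{Sym}} \Pi_{AB}^{\mathrm{ASym}} = 0$. Thus,
the swap operator, and hence $\mathrm{T}_B(\Phi_{AB})$, has negative eigenvalues, which by the PPT
criterion means that $\Phi_{AB}$ is an entangled state, as expected.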
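The PPT criterion can be tried out on the two examples discussed above. The sketch below computes the partial transpose of the maximally entangled state, confirms that it equals $F_{AB}/d$ with smallest eigenvalue $-1/d$, and, for contrast, checks that a randomly generated separable state of the form (3.2.121) has a positive semi-definite partial transpose. The dimension $d = 3$, the number of terms in the separable mixture, and the helper names are choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(8)

def random_state(d):
    """A random density matrix, used only as illustrative input."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def partial_transpose_B(X_AB, d_A, d_B):
    """T_B(X_AB): exchange the two B indices of the array X[a, b, a', b']."""
    T = X_AB.reshape(d_A, d_B, d_A, d_B).transpose(0, 3, 2, 1)
    return T.reshape(d_A * d_B, d_A * d_B)

d = 3
phi = np.eye(d).reshape(d * d) / np.sqrt(d)          # |Phi> = (1/sqrt(d)) sum_j |j,j>
Phi = np.outer(phi, phi)
F = np.eye(d * d).reshape(d, d, d, d).transpose(1, 0, 2, 3).reshape(d * d, d * d)

pt_Phi = partial_transpose_B(Phi, d, d)
equals_swap = np.allclose(pt_Phi, F / d)             # (3.2.125)
min_eig_entangled = np.linalg.eigvalsh(pt_Phi).min() # negative, so Phi is not PPT

# A separable state sum_x p(x) omega^x ⊗ tau^x, as in (3.2.121).
p = rng.random(4)
p /= p.sum()
sep = sum(p[x] * np.kron(random_state(d), random_state(d)) for x in range(4))
min_eig_separable = np.linalg.eigvalsh(partial_transpose_B(sep, d, d)).min()
```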
Although the PPT criterion is generally only a necessary condition for separabil-
ity of a bipartite state, it is known that the PPT criterion is necessary and sufficient
for every quantum state 𝜌 𝐴𝐵 for which both 𝐴 and 𝐵 are qubits or 𝐴 is a qubit and
𝐵 is a qutrit; please consult the Bibliographic Notes in Section 3.4. In particular,
therefore, in higher dimensions there are PPT states that are entangled. These PPT
entangled states turn out to be useless for the task of entanglement distillation (see
Chapter 13), and thus they are called bound entangled (although they are entangled,
they cannot be used to extract pure maximally entangled states at a non-zero rate).

Exercise 3.23
Prove that the swap operator 𝐹𝐴𝐵 possesses the following symmetry:

(𝑈 𝐴 ⊗ 𝑉𝐵 )𝐹𝐴𝐵 = 𝐹𝐴𝐵 (𝑉𝐴 ⊗ 𝑈𝐵 ) (3.2.128)

for all unitaries 𝑈 and 𝑉.

3.2.10 Isotropic and Werner States

In Section 3.2.6, we defined permutation-invariant states, which are states that are
invariant under the action of the unitary operator 𝑊 𝜋 for every permutation 𝜋 ∈ S𝑛 .
Another important class of quantum states in quantum information theory consists
of bipartite states that are invariant under certain kinds of unitaries. There are two
distinct such classes of states that we define in this section.

Definition 3.19 Isotropic States

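Consider two quantum systems $A$ and $B$, with $d_A = d_B = d \geq 2$. A quantum
state $\rho_{AB}$ is called an isotropic state if it is invariant under the action of $U \otimes \overline{U}$
for every unitary $U$, i.e., if
\[
  \rho_{AB} = (U \otimes \overline{U})\, \rho_{AB}\, (U \otimes \overline{U})^\dagger \tag{3.2.129}
\]
for every unitary $U$. For every isotropic state $\rho_{AB}$, there exists $p \in [0,1]$ such
that $\rho_{AB} = \rho_{AB}^{\mathrm{iso};p}$, where
\[
  \rho_{AB}^{\mathrm{iso};p} \coloneqq p\, |\Phi\rangle\langle\Phi|_{AB} + \frac{1-p}{d^2-1} \left( \mathbb{1}_{AB} - |\Phi\rangle\langle\Phi|_{AB} \right), \tag{3.2.130}
\]
where $|\Phi\rangle_{AB} = \frac{1}{\sqrt{d}} \sum_{j=0}^{d-1} |j,j\rangle_{AB}$.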

Using (3.2.59), we can write every isotropic state as follows:

$$\rho_{AB}^{\text{iso};p} = p|\Phi\rangle\!\langle\Phi|_{AB} + \frac{1-p}{d^2-1}\sum_{\substack{0\leq z,x\leq d-1\\ (z,x)\neq(0,0)}}|\Phi_{z,x}\rangle\!\langle\Phi_{z,x}|. \tag{3.2.131}$$

In other words, the isotropic state can be viewed as a probabilistic mixture of the qudit Bell states defined in (3.2.58), such that the state $|\Phi\rangle\!\langle\Phi|$ is prepared with probability $p$, and each of the states $|\Phi_{z,x}\rangle\!\langle\Phi_{z,x}|$, with $(z,x)\neq(0,0)$, is prepared with probability $\frac{1-p}{d^2-1}$. This implies that every isotropic state is a Bell-diagonal state (see (3.2.60)), that it has full rank for $p\in(0,1)$, and that its eigenvalues are $p$ and $\frac{1-p}{d^2-1}$ (the latter with multiplicity $d^2-1$).
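The spectrum and the symmetry of the isotropic state can be checked numerically. The following is a minimal sketch using NumPy; the helper names `isotropic_state` and `random_unitary`, and the values $d = 3$, $p = 0.4$, are illustrative assumptions and not notation from the text. The invariance is tested under $U \otimes \overline{U}$, where $\overline{U}$ is the complex conjugate of $U$.

```python
import numpy as np

def isotropic_state(d, p):
    """p*Phi + (1-p)/(d^2-1) * (1 - Phi), following (3.2.130)."""
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)   # |Phi> = d^{-1/2} sum_j |j,j>
    Phi = np.outer(phi, phi.conj())
    return p * Phi + (1 - p) / (d**2 - 1) * (np.eye(d * d) - Phi)

def random_unitary(d, rng):
    """Some unitary, from the QR decomposition of a complex Gaussian matrix."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(g)
    return q

d, p = 3, 0.4                                     # illustrative values
rho = isotropic_state(d, p)
eigs = np.sort(np.linalg.eigvalsh(rho))           # ascending order

U = random_unitary(d, np.random.default_rng(0))
W = np.kron(U, U.conj())                          # U (x) conj(U)
invariant = np.allclose(W @ rho @ W.conj().T, rho)
```

For these values, the $d^2 - 1 = 8$ smallest eigenvalues should all equal $(1-p)/(d^2-1) = 0.075$ and the largest should equal $p$.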

Exercise 3.24

1. Verify that the isotropic state in (3.2.130) is invariant under $U \otimes \overline{U}$ for every unitary $U$, i.e., verify that

$$(U_A \otimes \overline{U}_B)\,\rho_{AB}^{\text{iso};p}\,(U_A \otimes \overline{U}_B)^\dagger = \rho_{AB}^{\text{iso};p} \tag{3.2.132}$$

for every unitary $U$ and all $p\in[0,1]$.

2. Show that, for all $p\in[0,1]$, the isotropic state $\rho_{AB}^{\text{iso};p}$ can be represented as $a|\Phi\rangle\!\langle\Phi|_{AB} + (1-a)\frac{\mathbb{1}_{AB}}{d^2}$, where $a = \frac{pd^2-1}{d^2-1}$. Conclude that $a\in\left[-\frac{1}{d^2-1}, 1\right]$.

Just as every multipartite quantum state can be made permutation invariant via the construction in (3.2.86), every bipartite quantum state $\rho_{AB}$, with $d_A = d_B$, can be made invariant under $U\otimes\overline{U}$, i.e., isotropic, via the following construction:

$$\int_U (U\otimes\overline{U})\,\rho_{AB}\,(U\otimes\overline{U})^\dagger \,\mathrm{d}U. \tag{3.2.133}$$

This construction can be thought of as a uniform average over unitaries, analogous to the uniform average over permutations in (3.2.86). The object “$\mathrm{d}U$” is known as the Haar measure. Intuitively, an integral is used due to the fact that the set of unitaries is continuous; however, the integral can be evaluated using a uniform average over a discrete set of unitaries known as a unitary two-design. In particular, for every state $\rho_{AB}$,

$$\int_U (U\otimes\overline{U})\,\rho_{AB}\,(U\otimes\overline{U})^\dagger \,\mathrm{d}U = \rho_{AB}^{\text{iso};p}, \qquad p = \langle\Phi|\rho_{AB}|\Phi\rangle. \tag{3.2.134}$$

Please see the Bibliographic Notes (Section 3.4) for more information about this
result, as well as about integrals of functions of unitaries with respect to the Haar
measure.
The isotropic states constitute one class of bipartite quantum states in which
every state is invariant under the action of a unitary acting on the individual
subsystems. We now define a second class of such states.

Definition 3.20 Werner States

Consider two quantum systems $A$ and $B$, with $d_A = d_B = d \geq 2$. A quantum state $\rho_{AB}$ is called a Werner state if it is invariant under the action of $U\otimes U$ for every unitary $U$, i.e., if

$$\rho_{AB} = (U\otimes U)\,\rho_{AB}\,(U\otimes U)^\dagger \tag{3.2.135}$$

for every unitary $U$. For every Werner state $\rho_{AB}$, there exists $p\in[0,1]$ such that $\rho_{AB} = \rho_{AB}^{\text{W};p}$, where

$$\rho_{AB}^{\text{W};p} := p\,\zeta_{AB} + (1-p)\,\zeta_{AB}^{\perp}. \tag{3.2.136}$$

Here, $\zeta_{AB}$ and $\zeta_{AB}^{\perp}$ are quantum states defined as

\begin{align}
\zeta_{AB} &:= \frac{1}{d(d-1)}\left(\mathbb{1}_{AB} - F_{AB}\right), \tag{3.2.137}\\
\zeta_{AB}^{\perp} &:= \frac{1}{d(d+1)}\left(\mathbb{1}_{AB} + F_{AB}\right), \tag{3.2.138}
\end{align}

and $F_{AB} = \sum_{i,j=0}^{d-1}|i,j\rangle\!\langle j,i|_{AB}$ is the swap operator.

Observe that the states $\zeta_{AB}$ and $\zeta_{AB}^{\perp}$ in Definition 3.20 are proportional to the projections $\Pi_{AB}^{\text{ASym}} \equiv \Pi_{\text{ASym}_2(\mathbb{C}^d)}$ and $\Pi_{AB}^{\text{Sym}} \equiv \Pi_{\text{Sym}_2(\mathbb{C}^d)}$ onto the anti-symmetric and symmetric subspaces, respectively, of $\mathbb{C}^d\otimes\mathbb{C}^d$ (recall (2.5.8) and (2.5.9)). In particular,

$$\zeta_{AB} = \frac{2}{d(d-1)}\,\Pi_{AB}^{\text{ASym}}, \qquad \zeta_{AB}^{\perp} = \frac{2}{d(d+1)}\,\Pi_{AB}^{\text{Sym}}. \tag{3.2.139}$$

Exercise 3.25
1. Verify that the Werner state in (3.2.136) is invariant under $U\otimes U$ for every unitary $U$, i.e., verify that

$$(U_A\otimes U_B)\,\rho_{AB}^{\text{W};p}\,(U_A\otimes U_B)^\dagger = \rho_{AB}^{\text{W};p} \tag{3.2.140}$$

for every unitary $U$ and all $p\in[0,1]$.

2. Show that, for all $p\in[0,1]$, the Werner state $\rho_{AB}^{\text{W};p}$ can be represented as $\frac{1}{d^2-da}\left(\mathbb{1}_{AB} - aF_{AB}\right)$, where $a = \frac{d(2p-1)+1}{2p-1+d}$. Conclude that $a\in[-1,1]$.

3. Prove that for $d=2$, $\zeta_{AB} = \Pi_{AB}^{\text{ASym}} = |\Psi^-\rangle\!\langle\Psi^-|$, where $|\Psi^-\rangle$ is defined in (3.2.47).

As with the isotropic states, every bipartite quantum state $\rho_{AB}$, with $d_A = d_B$, can be made into a Werner state via the following construction:

$$\int_U (U\otimes U)\,\rho_{AB}\,(U\otimes U)^\dagger\,\mathrm{d}U. \tag{3.2.141}$$

As before, the integral represents the uniform average over all unitaries, which can be evaluated using a unitary two-design. In particular, for every state $\rho_{AB}$,

$$\int_U (U\otimes U)\,\rho_{AB}\,(U\otimes U)^\dagger\,\mathrm{d}U = \rho_{AB}^{\text{W};p}, \qquad p = \operatorname{Tr}\!\left[\Pi_{AB}^{\text{ASym}}\rho_{AB}\right]. \tag{3.2.142}$$

Please see the Bibliographic Notes (Section 3.4) for more information.
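As a numerical illustration of the Werner-state definitions, the following sketch (NumPy; the helper names and the values $d = 3$, $p = 0.3$ are illustrative assumptions) builds the Werner state from the swap operator, checks that its weight on the anti-symmetric subspace is $p$, and tests invariance under $U \otimes U$ for one randomly generated unitary. It is a sanity check, not a substitute for the exercises above.

```python
import numpy as np

def swap_operator(d):
    """F = sum_{i,j} |i,j><j,i| on C^d (x) C^d."""
    F = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            F[i * d + j, j * d + i] = 1.0
    return F

def werner_state(d, p):
    """p*zeta + (1-p)*zeta_perp, following (3.2.136)-(3.2.138)."""
    F, I = swap_operator(d), np.eye(d * d)
    zeta = (I - F) / (d * (d - 1))
    zeta_perp = (I + F) / (d * (d + 1))
    return p * zeta + (1 - p) * zeta_perp

d, p = 3, 0.3                                       # illustrative values
rho = werner_state(d, p)
P_asym = (np.eye(d * d) - swap_operator(d)) / 2     # anti-symmetric projection
weight = np.trace(P_asym @ rho).real                # Tr[Pi^ASym rho]

# One randomly generated unitary (QR of a complex Gaussian matrix).
rng = np.random.default_rng(0)
g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
U, _ = np.linalg.qr(g)
W = np.kron(U, U)                                   # U (x) U
invariant = np.allclose(W @ rho @ W.conj().T, rho)
```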

3.3 Measurements
Measurements in quantum mechanics are described by positive operator-valued
measures (POVMs).

Definition 3.21 Positive Operator-Valued Measure (POVM)

A POVM is a set $\{M_x\}_{x\in\mathcal{X}}$ of operators satisfying

$$M_x \geq 0 \quad \forall\, x\in\mathcal{X}, \qquad \sum_{x\in\mathcal{X}} M_x = \mathbb{1}. \tag{3.3.1}$$

For our purposes, it suffices to consider finite sets of such operators. The
elements of the finite alphabet X are used to label the outcomes of the
measurement.

The measurement of a quantum system in the state 𝜌 according to the POVM

{𝑀𝑥 }𝑥∈X induces a probability distribution 𝑝 𝑋 : X → [0, 1]. This distribution
corresponds to a random variable 𝑋, which takes values in the alphabet X, and is
defined by the Born rule:

𝑝 𝑋 (𝑥) = Pr[𝑋 = 𝑥] = Tr[𝑀𝑥 𝜌]. (3.3.2)
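To make the Born rule concrete, here is a small sketch in NumPy. The three-outcome "trine" POVM used below is an illustrative choice that does not appear in the text, and the function names are hypothetical.

```python
import numpy as np

def trine_povm():
    """Three elements (2/3)|psi_k><psi_k| with real qubit vectors at angles 2*pi*k/3."""
    elems = []
    for k in range(3):
        theta = 2 * np.pi * k / 3
        psi = np.array([np.cos(theta), np.sin(theta)])
        elems.append((2 / 3) * np.outer(psi, psi))
    return elems

def born_probabilities(povm, rho):
    """p_X(x) = Tr[M_x rho], following (3.3.2)."""
    return np.array([np.trace(M @ rho).real for M in povm])

povm = trine_povm()
rho = np.array([[1, 0], [0, 0]], dtype=complex)     # the state |0><0|
probs = born_probabilities(povm, rho)
completeness = sum(povm)                            # should be the identity
```

For the state $|0\rangle\!\langle 0|$, the outcome probabilities work out to $(2/3)\cos^2(2\pi k/3)$, i.e., $2/3$, $1/6$, and $1/6$.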

If every element 𝑀𝑥 of the POVM is a projection, then the corresponding

measurement is called a projective measurement. If the POVM of a projective
measurement consists solely of rank-one projections, then the measurement is
sometimes called a von Neumann measurement. Observe using (2.2.2) that every orthonormal basis $\{|e_k\rangle\}_{k=1}^{d}$ for a $d$-dimensional Hilbert space $\mathcal{H}$ defines a von Neumann measurement via the POVM $\{|e_k\rangle\!\langle e_k|\}_{k=1}^{d}$. Furthermore, as stated at the beginning of this chapter, every Hermitian operator defines a projective measurement, with the corresponding POVM given by its spectral projections.
Projective measurements, in particular von Neumann measurements, are viewed
as the simplest type of measurement that can be performed on a quantum system.
However, as it turns out, we can implement every measurement as a von Neumann
measurement if we have access to an auxiliary quantum system, and this is the
content of Naimark’s theorem.

Theorem 3.22 Naimark’s Theorem

For every POVM {𝑀𝑥 }𝑥∈X , there exists an isometry 𝑉 such that

𝑀𝑥 = 𝑉 † ( 1 ⊗ |𝑥⟩⟨𝑥|)𝑉 ∀ 𝑥 ∈ X, (3.3.3)

where {|𝑥⟩}𝑥∈X is an orthonormal set.

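Proof: This follows immediately by defining the isometry $V$ as

$$V = \sum_{x\in\mathcal{X}} \sqrt{M_x}\otimes|x\rangle. \tag{3.3.4}$$

This is indeed an isometry because

\begin{align}
V^\dagger V &= \left(\sum_{x\in\mathcal{X}}\sqrt{M_x}\otimes\langle x|\right)\left(\sum_{x'\in\mathcal{X}}\sqrt{M_{x'}}\otimes|x'\rangle\right) \tag{3.3.5}\\
&= \sum_{x,x'\in\mathcal{X}}\sqrt{M_x}\sqrt{M_{x'}}\,\langle x|x'\rangle \tag{3.3.6}\\
&= \sum_{x\in\mathcal{X}} M_x \tag{3.3.7}\\
&= \mathbb{1}, \tag{3.3.8}
\end{align}

where the last equality holds because $\{M_x\}_{x\in\mathcal{X}}$ is a POVM. It is straightforward to check that (3.3.3) is satisfied for the choice of $V$ in (3.3.4). ■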
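The construction in the proof can be carried out numerically. The sketch below (NumPy) builds $V = \sum_x \sqrt{M_x} \otimes |x\rangle$ and checks both the isometry property and (3.3.3). The helper names and the example POVM (a three-outcome "trine" POVM on a qubit) are assumptions made for illustration only.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a positive semi-definite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(M)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def naimark_isometry(povm):
    """V = sum_x sqrt(M_x) (x) |x>, mapping C^d into C^d (x) C^|X|."""
    n = len(povm)
    V = 0
    for x, M in enumerate(povm):
        e_x = np.zeros((n, 1))
        e_x[x, 0] = 1.0
        V = V + np.kron(psd_sqrt(M), e_x)
    return V

# Illustrative POVM: (2/3)|psi_k><psi_k| with angles 2*pi*k/3.
povm = []
for k in range(3):
    t = 2 * np.pi * k / 3
    psi = np.array([np.cos(t), np.sin(t)])
    povm.append((2 / 3) * np.outer(psi, psi))

d, n = 2, len(povm)
V = naimark_isometry(povm)
is_isometry = np.allclose(V.conj().T @ V, np.eye(d))

# Recover each M_x as V^dagger (1 (x) |x><x|) V.
recovered = []
for x in range(n):
    proj = np.zeros((n, n))
    proj[x, x] = 1.0
    recovered.append(V.conj().T @ np.kron(np.eye(d), proj) @ V)
all_match = all(np.allclose(R, M) for R, M in zip(recovered, povm))
```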

The physical relevance of Naimark’s theorem is illustrated in Figure 3.4. We model the measurement of the system $A$ of interest as an interaction of the system with a probe system $P$ followed by a projective measurement of the probe system described by the POVM $\{|x\rangle\!\langle x| : x\in\mathcal{X}\}$. The interaction is described by an isometry $V_{A\to A'P}$, and if $\rho_{A'P} = V\rho_A V^\dagger$ is the joint state of the system and the probe after the interaction, then the measurement outcome probabilities are

\begin{align}
p_X(x) &= \operatorname{Tr}[(\mathbb{1}_{A'}\otimes|x\rangle\!\langle x|)\,\rho_{A'P}] \tag{3.3.9}\\
&= \operatorname{Tr}[(\mathbb{1}_{A'}\otimes|x\rangle\!\langle x|)\,V\rho_A V^\dagger] \tag{3.3.10}\\
&= \operatorname{Tr}[V^\dagger(\mathbb{1}_{A'}\otimes|x\rangle\!\langle x|)V\rho_A] \tag{3.3.11}\\
&= \operatorname{Tr}[M_x\rho_A], \tag{3.3.12}
\end{align}

where we let $M_x := V^\dagger(\mathbb{1}_{A'}\otimes|x\rangle\!\langle x|)V$. This shows us that the system-probe model of measurement, in which the probe is measured after it interacts with the system of interest, can effectively be described by a POVM. Naimark’s theorem, on the other hand, tells us the converse: for every measurement described by a POVM, there exists an isometry such that the measurement can be described in the system-probe model, as depicted in Figure 3.4.

Figure 3.4: The system-probe model of measurement. In order to measure the system $A$ of interest, we allow it to first interact with a probe system $P$ via an isometry $V_{A\to A'P}$. We then measure the probe with a projective measurement consisting of rank-one elements.

Exercise 3.26
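Consider the qubit state vectors

$$|\psi_k\rangle := \cos\!\left(\frac{2\pi k}{5}\right)|0\rangle + \sin\!\left(\frac{2\pi k}{5}\right)|1\rangle, \qquad k\in\{0,1,2,3,4\}. \tag{3.3.13}$$

Verify that the set $\left\{\frac{2}{5}|\psi_k\rangle\!\langle\psi_k|\right\}_{k=0}^{4}$ is a POVM. Note that this POVM gives us an example of a non-projective measurement.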

When performing measurements in quantum mechanics, we are typically interested not only in the measurement outcomes and their probabilities but also in the so-called post-measurement states of the system being measured. That is, we are interested in knowing the state of the system after we have measured it and observed the outcome.
Since every POVM element $M_x$ is positive semi-definite, there exists an operator $K_x$ such that $M_x = K_x^\dagger K_x$ for all $x\in\mathcal{X}$. For example, we could let $K_x$ be the square root of $M_x$, so that $K_x = \sqrt{M_x}$. Then, the Born rule in (3.3.2) for the probability of the measurement outcome $x\in\mathcal{X}$ can be written as $p_X(x) = \operatorname{Tr}[K_x\rho K_x^\dagger]$. In this case, the post-measurement state corresponding to the outcome $x\in\mathcal{X}$ is as follows:

$$\rho^x := \frac{K_x\rho K_x^\dagger}{\operatorname{Tr}[K_x\rho K_x^\dagger]}. \tag{3.3.14}$$
The state $\rho^x$ can be understood to capture the experimenter’s description of the state of the system given that the measurement outcome was observed to be $x$.
The post-measurement states $\rho^x$ give rise to the ensemble $\{(p_X(x),\rho^x)\}_{x\in\mathcal{X}}$. The expected density operator of the ensemble is

$$\rho_{\mathcal{M}} := \sum_{x\in\mathcal{X}} p_X(x)\,\rho^x = \sum_{x\in\mathcal{X}} K_x\rho K_x^\dagger. \tag{3.3.15}$$

This expected density operator is the state of the system after the measurement when the experimenter does not have access to the measurement outcome.
Due to the unitary freedom in the decomposition $M_x = K_x^\dagger K_x$, other choices of $K_x$ are given by $K_x = U_x\sqrt{M_x}$ for some unitary $U_x$, so that there is not a unique way to determine the post-measurement state when starting from a POVM.
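The following sketch (NumPy) computes outcome probabilities, post-measurement states as in (3.3.14), and the expected density operator as in (3.3.15), using the choice $K_x = \sqrt{M_x}$. The particular two-outcome POVM, the input state, and the helper names are illustrative assumptions.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a positive semi-definite matrix."""
    w, v = np.linalg.eigh(M)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T

def measure(kraus_ops, rho):
    """Return outcome probabilities and normalized post-measurement states."""
    probs, states = [], []
    for K in kraus_ops:
        unnorm = K @ rho @ K.conj().T
        p = np.trace(unnorm).real
        probs.append(p)
        states.append(unnorm / p)        # assumes p > 0 for every outcome
    return np.array(probs), states

# Illustrative two-outcome POVM {M0, 1 - M0} and input state.
M0 = np.array([[0.75, 0.25], [0.25, 0.25]])
povm = [M0, np.eye(2) - M0]
rho = np.array([[0.6, 0.2], [0.2, 0.4]])

kraus = [psd_sqrt(M) for M in povm]
probs, states = measure(kraus, rho)

# Expected density operator: sum_x p(x) rho^x = sum_x K_x rho K_x^dagger.
rho_avg = sum(p * s for p, s in zip(probs, states))
rho_avg_direct = sum(K @ rho @ K.conj().T for K in kraus)
```

Replacing each `K` by `U @ K` for a unitary `U` leaves `probs` unchanged but rotates the corresponding post-measurement state, which is the non-uniqueness noted above.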
Suppose now that we perform a measurement on a subsystem of a composite system. Specifically, consider measuring a system $A$ that is in a joint state $\rho_{RA}$ with a reference system $R$, and let the measurement be described by the POVM $\{M_A^x\}_{x\in\mathcal{X}}$ for some finite alphabet $\mathcal{X}$. If we let $M_A^x = K_A^{x\dagger}K_A^x$, then according to (3.3.14), the measurement probabilities are given by $p_X(x) = \operatorname{Tr}[(\mathbb{1}_R\otimes K_A^x)\rho_{RA}(\mathbb{1}_R\otimes K_A^{x\dagger})]$ and the post-measurement states are as follows:

\begin{align}
\rho_{RA}^x &= \frac{(\mathbb{1}_R\otimes K_A^x)\rho_{RA}(\mathbb{1}_R\otimes K_A^{x\dagger})}{\operatorname{Tr}[(\mathbb{1}_R\otimes K_A^x)\rho_{RA}(\mathbb{1}_R\otimes K_A^{x\dagger})]} \tag{3.3.16}\\
&= \frac{1}{p_X(x)}(\mathbb{1}_R\otimes K_A^x)\rho_{RA}(\mathbb{1}_R\otimes K_A^{x\dagger}) \tag{3.3.17}
\end{align}

for all $x\in\mathcal{X}$. The state of the system $R$ conditioned on the measurement outcome $x$ is then

\begin{align}
\rho_R^x &:= \operatorname{Tr}_A[\rho_{RA}^x] \tag{3.3.18}\\
&= \frac{1}{p_X(x)}\operatorname{Tr}_A[(\mathbb{1}_R\otimes K_A^x)\rho_{RA}(\mathbb{1}_R\otimes K_A^{x\dagger})] \tag{3.3.19}\\
&= \frac{1}{p_X(x)}\operatorname{Tr}_A[(\mathbb{1}_R\otimes K_A^{x\dagger}K_A^x)\rho_{RA}] \tag{3.3.20}\\
&= \frac{1}{p_X(x)}\operatorname{Tr}_A[(\mathbb{1}_R\otimes M_A^x)\rho_{RA}]. \tag{3.3.21}
\end{align}

We thus see that, although the post-measurement state on the system $A$ being measured is not uniquely defined due to the unitary freedom in the decomposition $M_A^x = K_A^{x\dagger}K_A^x$, as described earlier, the post-measurement state on the reference system $R$ not being measured is uniquely defined because it depends directly on each POVM element $M_A^x$. If the system $A$ undergoes a measurement for which $M_A^x = |\psi^x\rangle\!\langle\psi^x|_A$, then (3.3.21) can be written as

$$\rho_R^x = \frac{1}{p_X(x)}\langle\psi^x|_A\,\rho_{RA}\,|\psi^x\rangle_A. \tag{3.3.22}$$
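The uniqueness of the conditional state on the reference system can be observed numerically. In the sketch below (NumPy), the joint state, the POVM element `M`, and the unitary used to build a second decomposition of `M` are illustrative assumptions; the helper names are hypothetical.

```python
import numpy as np

def partial_trace_A(X_RA, d_R, d_A):
    """Tr_A of an operator on R (x) A, with R the first tensor factor."""
    return np.einsum('iaja->ij', X_RA.reshape(d_R, d_A, d_R, d_A))

d_R = d_A = 2
# Illustrative joint state: R maximally entangled with A.
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho_RA = np.outer(phi, phi.conj()).astype(complex)

# One POVM element M on A and two operators K with K^dagger K = M.
M = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)
w, v = np.linalg.eigh(M)
K1 = (v * np.sqrt(w)) @ v.conj().T                  # K = sqrt(M)
hadamard = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
K2 = hadamard @ K1                                  # K = U sqrt(M)

def conditional_states(K):
    """Outcome probability, post-measurement state on RA, and its R marginal."""
    op = np.kron(np.eye(d_R), K)
    unnorm = op @ rho_RA @ op.conj().T
    p = np.trace(unnorm).real
    post_RA = unnorm / p
    return p, post_RA, partial_trace_A(post_RA, d_R, d_A)

p1, post1, rho_R1 = conditional_states(K1)
p2, post2, rho_R2 = conditional_states(K2)

# Direct evaluation of (3.3.21), which involves only M.
direct = partial_trace_A(np.kron(np.eye(d_R), M) @ rho_RA, d_R, d_A) / p1
```

The joint post-measurement states `post1` and `post2` differ, while the marginals on $R$ coincide with each other and with the expression in (3.3.21).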

Exercise 3.27
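Consider a finite set $\{\rho_x\}_{x\in\mathcal{X}}$ of quantum states, and let $R := \sum_{x\in\mathcal{X}}\rho_x$.
1. If $R$ is invertible, let

$$M_x := R^{-\frac{1}{2}}\rho_x R^{-\frac{1}{2}} \quad \forall\, x\in\mathcal{X}. \tag{3.3.23}$$

Prove that $\{M_x\}_{x\in\mathcal{X}}$ is a POVM.

2. If $R$ is not invertible, let

$$M_x := R^{-\frac{1}{2}}\rho_x R^{-\frac{1}{2}} + \frac{1}{|\mathcal{X}|}\left(\mathbb{1} - \Pi_R\right) \quad \forall\, x\in\mathcal{X}, \tag{3.3.24}$$

where $\Pi_R$ is the projection onto the support of $R$ and the inverse is taken on the support of $R$. Prove that $\{M_x\}_{x\in\mathcal{X}}$ is a POVM. (Hint: Recall the definitions from Section 2.2.8.1.)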


3.4 Bibliographic Notes

More details on many of the concepts that have been presented in this chapter can
be found in the books of Nielsen and Chuang (2000), Holevo (2013), Hayashi
(2017), Wilde (2017a), and Watrous (2018). For a treatment of these concepts in
infinite-dimensional Hilbert spaces, see the books by Holevo (2013) and Heinosaari
and Ziman (2012).
The mathematical theory of quantum mechanics was developed by von Neumann
(1927a, 1932) and Landau (1927). The book by Helstrom (1976) also contains
early developments in the theory of quantum measurements.
We have used quantum-optical modes as a concrete example of qubit encodings
throughout this chapter. For a detailed reference on this topic, see the review by
Kok et al. (2007).
Recall the vector $\vec{r}_\rho$ in (3.2.4) of coefficients of a quantum state in terms of
the orthogonal basis defined via the operators in (2.2.44)–(2.2.47). This vector is
known as both the “coherence vector” and the “Bloch vector” of 𝜌. Perhaps the
earliest use of the term “coherence vector” is in the work of Hioe and Eberly (1981),
in which the orthogonality convention for the operators in (2.2.44)–(2.2.47) differs
from the convention used in this book. Positivity of linear operators in terms of the
coherence vector was presented by Byrd and Khaneja (2003); Kimura (2003). The
latter work by Kimura (2003) uses the term “Bloch vector”, which has also been
used by Bertlmann and Krammer (2008).
Lemma 3.13, regarding purifications of permutation-invariant states, is due to
Renner (2005). Separable and entangled states were defined by Werner (1989b).
For an in-depth review of the properties of entanglement and its various applications
in quantum information theory, including a discussion of multipartite entanglement,
we refer to the article by Horodecki et al. (2009b). For an in-depth discussion of
multipartite entanglement, we refer to (Walter et al., 2016). The relevance of the
partial transpose for characterizing entanglement in quantum information was given
by Peres (1996); Horodecki et al. (1996), and the set of positive-partial-transpose
states was discussed by Horodecki (1997); Horodecki et al. (1998). In particular,
the existence of PPT entangled states was found by Horodecki (1997).
Isotropic states were defined by Horodecki and Horodecki (1999), and Werner
states were defined by Werner (1989b). Proofs of formulas for the integration of
unitary operators with respect to the Haar measure, including proofs of (3.2.134)
and (3.2.142), can be found in (Collins, 2003; Collins and Śniady, 2006; Roy and
Scott, 2009).

Appendix 3.A Proof of Lemma 3.3

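Proof: First suppose that $X_{AB}$ is rank one, so that $X_{AB} = |\Psi\rangle\!\langle\Psi|_{AB}$ for some vector $|\Psi\rangle_{AB}\in\mathcal{H}_A\otimes\mathcal{H}_B$. Due to the Schmidt decomposition theorem (Theorem 2.2), we have that

$$|\Psi\rangle_{AB} = \sum_{z\in\mathcal{Z}}\gamma_z|\theta_z\rangle_A\otimes|\xi_z\rangle_B, \tag{3.A.1}$$

where $|\mathcal{Z}|\leq\min\{\dim(\mathcal{H}_A),\dim(\mathcal{H}_B)\}$, the set $\{\gamma_z\}_z$ is a set of strictly positive numbers, and $\{|\theta_z\rangle_A\}_z$ and $\{|\xi_z\rangle_B\}_z$ are orthonormal sets. Then

\begin{align}
\operatorname{supp}(X_{AB}) &= \operatorname{span}\{|\Psi\rangle_{AB}\} \tag{3.A.2}\\
&\subseteq \operatorname{span}\{|\theta_z\rangle_A : z\in\mathcal{Z}\}\otimes\operatorname{span}\{|\xi_z\rangle_B : z\in\mathcal{Z}\}. \tag{3.A.3}
\end{align}

The statement then follows for this case because $\operatorname{supp}(X_A) = \operatorname{span}\{|\theta_z\rangle_A : z\in\mathcal{Z}\}$ and $\operatorname{supp}(X_B) = \operatorname{span}\{|\xi_z\rangle_B : z\in\mathcal{Z}\}$.
Now suppose that $X_{AB}$ is not rank one. It admits a decomposition into rank-one operators of the following form:

$$X_{AB} = \sum_{x\in\mathcal{X}}|\Psi^x\rangle\!\langle\Psi^x|_{AB}, \tag{3.A.4}$$

where $|\Psi^x\rangle_{AB}\in\mathcal{H}_A\otimes\mathcal{H}_B$ for all $x\in\mathcal{X}$. Set $\Psi^x_{AB} = |\Psi^x\rangle\!\langle\Psi^x|_{AB}$, and let $\Psi^x_A := \operatorname{Tr}_B[\Psi^x_{AB}]$ and $\Psi^x_B := \operatorname{Tr}_A[\Psi^x_{AB}]$. Then

\begin{align}
\operatorname{supp}(X_{AB}) &= \operatorname{span}\{|\Psi^x\rangle_{AB} : x\in\mathcal{X}\} \tag{3.A.5}\\
&\subseteq \operatorname{span}\left[\bigcup_{x\in\mathcal{X}}\operatorname{supp}(\Psi^x_A)\otimes\operatorname{supp}(\Psi^x_B)\right] \tag{3.A.6}\\
&\subseteq \operatorname{span}\left[\bigcup_{x\in\mathcal{X}}\operatorname{supp}(\Psi^x_A)\right]\otimes\operatorname{span}\left[\bigcup_{x\in\mathcal{X}}\operatorname{supp}(\Psi^x_B)\right] \tag{3.A.7}\\
&= \operatorname{supp}(X_A)\otimes\operatorname{supp}(X_B), \tag{3.A.8}
\end{align}

concluding the proof. ■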


Appendix 3.B Proof of Lemma 3.4

Proof: First suppose that 𝑋 𝐴𝐵 is rank one, as in the first part of the proof of the
previous lemma, and let us use the same notation as given there. Applying the
same lemma gives that

supp(𝑋 𝐴𝐵 ) ⊆ supp(𝑌 𝐴𝐵 ) ⊆ supp(𝑌 𝐴 ) ⊗ supp(𝑌𝐵 ), (3.B.1)

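which in turn implies that $\operatorname{supp}(X_{AB}) = \operatorname{span}\{|\Psi\rangle_{AB}\}\subseteq\operatorname{supp}(Y_A)\otimes\operatorname{supp}(Y_B)$. This implies that $|\theta_z\rangle_A\in\operatorname{supp}(Y_A)$ for all $z\in\mathcal{Z}$, and thus that $\operatorname{span}\{|\theta_z\rangle_A\}\subseteq\operatorname{supp}(Y_A)$. We can then conclude the statement in this case because $\operatorname{span}\{|\theta_z\rangle_A\} = \operatorname{supp}(X_A)$.
Now suppose that $X_{AB}$ is not rank one. Then it admits a decomposition as given in the proof of the previous lemma. Using the same notation, we have that $\operatorname{supp}(\Psi^x_{AB})\subseteq\operatorname{supp}(Y_{AB})$ holds for all $x\in\mathcal{X}$. Since we have proven the lemma for rank-one operators, we can conclude that $\operatorname{supp}(\Psi^x_A)\subseteq\operatorname{supp}(Y_A)$ holds for all $x\in\mathcal{X}$. As a consequence, we find that

$$\operatorname{supp}(X_A) = \operatorname{span}\left[\bigcup_{x\in\mathcal{X}}\operatorname{supp}(\Psi^x_A)\right]\subseteq\operatorname{supp}(Y_A), \tag{3.B.2}$$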

concluding the proof. ■

Chapter 4

Quantum Channels
In the previous chapter, we studied quantum states and measurements. These two
topics constitute the first three axioms of quantum mechanics, as presented in
Section 3.1. The fourth and final axiom is about the evolution of quantum systems,
which is the subject of this chapter. Mathematically, the evolution is described by a
quantum channel. As quantum communication necessarily involves the evolution
of quantum systems (such as the evolution of photons when travelling through
an optical fiber), quantum channels are the primary objects of study in this book.
This chapter is devoted to a detailed study of quantum channels, including their
properties, representations, and various examples that are relevant for quantum
communication and quantum information more broadly.
The fourth axiom in Section 3.1 states that a quantum channel is a “linear,
completely positive, and trace-preserving map acting on the state of the system.” At
first glance, this appears to be a purely mathematical statement (which we elaborate
upon in Section 4.1), with seemingly little connection to physics. However, we can
connect this statement to the axiom of evolution of quantum systems as taught in a
basic quantum physics course. In such a course, one learns that the evolution of a
(non-relativistic) quantum system is governed by the Schrödinger equation:

$$\mathrm{i}\hbar\frac{\partial}{\partial t}|\psi(t)\rangle = H(t)|\psi(t)\rangle, \tag{4.0.1}$$

where $|\psi(t)\rangle$ is the state vector of the system at time $t\geq 0$ and $H(t)$ is the Hamiltonian operator of the system at time $t$. The Hamiltonian operator is a Hermitian operator that describes the energy of the system. Now, we know from Chapter 3 that the state of a quantum system is described more generally by a density operator. The analogue of (4.0.1) for density operators is known as the von Neumann equation:

$$\mathrm{i}\hbar\frac{\partial\rho(t)}{\partial t} = [H(t),\rho(t)], \tag{4.0.2}$$

where $\rho(t)$ is the density operator describing the state of the system at time $t\geq 0$, and $[H(t),\rho(t)] = H(t)\rho(t) - \rho(t)H(t)$ is the commutator of the Hamiltonian $H(t)$ and the state $\rho(t)$.
Both (4.0.1) and (4.0.2) describe the evolution of so-called closed quantum
systems, and this evolution is given by unitary maps. In other words, the solution
to (4.0.1) is |𝜓(𝑡)⟩ = 𝑈 (𝑡)|𝜓0 ⟩ for all 𝑡 ≥ 0, where |𝜓0 ⟩ is an initial state vector of
the system (at time 𝑡 = 0) and 𝑈 (𝑡) is a unitary operator. Similarly, the solution
to (4.0.2) is 𝜌(𝑡) = 𝑈 (𝑡) 𝜌0𝑈 (𝑡) † for all 𝑡 ≥ 0, where 𝜌0 is an initial quantum
state of the system (at time 𝑡 = 0) and 𝑈 (𝑡) is a unitary operator. We refer to the
Bibliographic Notes in Section 4.8 for references on explicit forms for the unitary
𝑈 (𝑡). We show in this chapter that unitary maps are quantum channels. This fact
provides a connection between the mathematical statement of the evolution axiom
in Section 3.1 and the statement of the evolution axiom typically taught in quantum
physics courses.
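For the special case of a time-independent Hamiltonian $H$, the unitary takes the standard form $U(t) = \mathrm{e}^{-\mathrm{i}Ht/\hbar}$. The sketch below (NumPy) uses this special case, with $\hbar = 1$ and a Pauli-$X$ Hamiltonian as illustrative assumptions not taken from the text, and checks by a central finite difference that $\rho(t) = U(t)\rho_0 U(t)^\dagger$ satisfies the von Neumann equation (4.0.2).

```python
import numpy as np

HBAR = 1.0   # natural units (an assumption for this sketch)

def evolution_unitary(H, t):
    """U(t) = exp(-i H t / hbar) for a time-independent Hermitian H."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * w * t / HBAR)) @ v.conj().T

H = np.array([[0, 1], [1, 0]], dtype=complex)       # toy Hamiltonian (Pauli X)
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)    # initial state |0><0|

def rho_at(t):
    """rho(t) = U(t) rho0 U(t)^dagger."""
    U = evolution_unitary(H, t)
    return U @ rho0 @ U.conj().T

# Central finite-difference check of i*hbar*d(rho)/dt = [H, rho].
t, h = 0.7, 1e-5
lhs = 1j * HBAR * (rho_at(t + h) - rho_at(t - h)) / (2 * h)
rho_t = rho_at(t)
rhs = H @ rho_t - rho_t @ H

# At t = pi/2 this Hamiltonian maps |0><0| to |1><1|.
rho_flip = rho_at(np.pi / 2)
```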
More generally, we are interested in the evolution of open quantum systems,
i.e., quantum systems that interact with an external environment that is out of
our control. For such systems, the same connection as before holds. In fact, the
evolution is given by a joint unitary evolution of the system and environment
followed by discarding the state of the environment, and as we show in Section 4.3,
every completely positive trace-preserving map (i.e., every quantum channel) can
be viewed in terms of a joint unitary evolution with an environment followed by
discarding the state of the environment. (Please see the Bibliographic Notes in
Section 4.8 for references on open quantum systems.) Thus, from an abstract,
information-theoretic perspective, the evolution of a quantum system is given
simply by a quantum channel, and the details of the actual physical system of
interest (which would be given by the Hamiltonian operator) are unimportant. This
viewpoint is powerful: with it, we realize that virtually every operation on quantum
states, including measurements, is a quantum channel.


4.1 Definition
We can motivate the definition of a quantum channel by using the following basic
mathematical facts that should be satisfied by a map N : L(H) → L(H′) that
represents the evolution of a quantum system:

1. If N acts on a mixture of quantum states, then the output state should be equal
to the mixture of the individual outputs. That is,
N(𝜆𝜌 + (1 − 𝜆)𝜎) = 𝜆N(𝜌) + (1 − 𝜆)N(𝜎) (4.1.1)
for all states 𝜌 and 𝜎 and 𝜆 ∈ [0, 1]. This requirement is called convex linearity
on density operators, and for each convex linear map acting on the convex set
of density operators, it is possible to define a unique linear map acting on the
space of all linear operators. The latter is the mathematical object that we
employ, and so we require that N be a linear map, i.e., a superoperator. (Recall
the definition of a superoperator from Section 2.2.11.)
2. The map N should accept a quantum state (or a mixture of quantum states)
as input and output a legitimate quantum state. This means that N should
be trace preserving and positive. However, it is furthermore reasonable to
demand that if the channel acts on one share 𝐴 of a bipartite quantum state
𝜌 𝑅 𝐴 , then the output should be a legitimate bipartite quantum state. So we
demand additionally that a quantum channel should be not just positive, but
additionally completely positive. Let us now define these terms.
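(a) $\mathcal{N}$ is called trace preserving if $\operatorname{Tr}[\mathcal{N}(X)] = \operatorname{Tr}[X]$ for every linear operator $X$. More generally, $\mathcal{N}$ is called trace non-increasing if $\operatorname{Tr}[\mathcal{N}(X)]\leq\operatorname{Tr}[X]$ for every positive semi-definite operator $X$.
(b) $\mathcal{N}$ is called positive if it maps positive semi-definite operators to positive semi-definite operators, i.e., $\mathcal{N}(X)\geq 0$ for all $X\geq 0$. It is called $k$-positive, with $k\geq 1$, if the map $\mathrm{id}_k\otimes\mathcal{N}$ is positive. Note that if $\mathcal{N}$ is a map acting on linear operators in $\mathrm{L}(\mathbb{C}^d)$, then the map $\mathrm{id}_k\otimes\mathcal{N}$ acts on linear operators in $\mathrm{L}(\mathbb{C}^{kd})$. In other words, for every linear operator $X$ acting on a $kd$-dimensional Hilbert space, which we can decompose as the block matrix

$$X = \begin{pmatrix} X_{0,0} & \cdots & X_{0,k-1}\\ \vdots & \ddots & \vdots\\ X_{k-1,0} & \cdots & X_{k-1,k-1}\end{pmatrix}, \tag{4.1.2}$$

such that $X_{i,j}$ is a $d\times d$ matrix for all $0\leq i,j\leq k-1$, the action of the map $\mathrm{id}_k\otimes\mathcal{N}$ is defined as

$$(\mathrm{id}_k\otimes\mathcal{N})(X) = \begin{pmatrix}\mathcal{N}(X_{0,0}) & \cdots & \mathcal{N}(X_{0,k-1})\\ \vdots & \ddots & \vdots\\ \mathcal{N}(X_{k-1,0}) & \cdots & \mathcal{N}(X_{k-1,k-1})\end{pmatrix}. \tag{4.1.3}$$

We can write this more compactly as follows. Noting that $\mathrm{L}(\mathbb{C}^{kd})$ is isomorphic to $\mathrm{L}(\mathbb{C}^k)\otimes\mathrm{L}(\mathbb{C}^d)$, we can write $X\in\mathrm{L}(\mathbb{C}^{kd})$ as

$$X = \sum_{i,j=0}^{k-1}|i\rangle\!\langle j|\otimes X_{i,j}. \tag{4.1.4}$$

Then the action of $\mathrm{id}_k\otimes\mathcal{N}$ is defined as

$$(\mathrm{id}_k\otimes\mathcal{N})(X) = \sum_{i,j=0}^{k-1}|i\rangle\!\langle j|\otimes\mathcal{N}(X_{i,j}). \tag{4.1.5}$$

The superoperator $\mathcal{N}$ is called completely positive if $\mathrm{id}_k\otimes\mathcal{N}$ is positive for every integer $k\geq 1$.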
Physically, the complete positivity of N takes into account the fact that the
system of interest might be entangled with another system that is outside
of our control, so that simply letting N be positive is not sufficient to
ensure that all positive semi-definite operators get mapped to positive semi-
definite operators. Letting N be completely positive means that positive
semi-definite operators are mapped to positive semi-definite operators even
in this more general setting.
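A standard example separating positivity from complete positivity is the transpose map, which is positive but not 2-positive. The sketch below (NumPy) applies $\mathrm{id}_k \otimes \mathcal{N}$ blockwise, as in (4.1.5), to the unnormalized maximally entangled operator $|\Gamma\rangle\!\langle\Gamma|$ for $d = k = 2$. The function names are hypothetical, and the helper assumes that the map sends $d \times d$ matrices to $d \times d$ matrices.

```python
import numpy as np

def apply_id_k_tensor(X, k, d, linear_map):
    """(id_k (x) N)(X) = sum_{i,j} |i><j| (x) N(X_{i,j}), following (4.1.5).

    Assumes linear_map sends d x d matrices to d x d matrices.
    """
    blocks = X.reshape(k, d, k, d)          # blocks[i, :, j, :] is X_{i,j}
    out = np.zeros((k, d, k, d), dtype=complex)
    for i in range(k):
        for j in range(k):
            out[i, :, j, :] = linear_map(blocks[i, :, j, :])
    return out.reshape(k * d, k * d)

def transpose_map(X):
    """The transpose map, which preserves eigenvalues and hence positivity."""
    return X.T

d = k = 2
gamma = np.eye(d).reshape(d * d)            # |Gamma> = sum_i |i,i>
X = np.outer(gamma, gamma)                  # positive semi-definite input
Y = apply_id_k_tensor(X, k, d, transpose_map)

min_eig_in = np.linalg.eigvalsh(X).min()    # non-negative (up to rounding)
min_eig_out = np.linalg.eigvalsh(Y).min()   # negative: output is not PSD
```

Here the output `Y` is the swap operator on two qubits, whose smallest eigenvalue is $-1$, so the transpose map fails to be 2-positive.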

The defining properties of linearity, complete positivity, and trace preservation

for quantum channels together ensure that quantum states for the systems of interest
get mapped to quantum states, even if they happen to be entangled with other
external systems outside of our control. These properties are consistent with what
is observed in real physical systems.
Throughout this book, we write N 𝐴→𝐵 to denote a map N : L(H 𝐴 ) → L(H𝐵 )
taking a quantum system 𝐴 to a quantum system 𝐵. We sometimes write N 𝐴 if the
input and output systems of the channel have the same dimension. We drop the
subscript indicating the input and output systems if they are not important in the
context being considered. Physically, the quantum channel, being a description of the time evolution of a quantum system, describes the transition of a quantum system between two distinct points in time. The systems $A$ and $B$ at the input and output of the channel, respectively, thus represent quantum systems at two distinct points in time, as shown in Figure 4.1(a). However, in the context of communication, we regard the systems $A$ and $B$ as being separated both in time as well as in space, with $A$ belonging to an individual “Alice” and $B$ belonging to “Bob.” We show this physical separation explicitly throughout this book according to the convention shown in Figure 4.1(b).

Figure 4.1: Our convention for drawing quantum channels throughout this book, with time increasing horizontally towards the right, and spatial separations indicated vertically. In (a), the input and output systems $A$ and $B$, respectively, of the quantum channel $\mathcal{N}$ are temporally separated but not spatially separated. In (b), $A$ and $B$ are both spatially and temporally separated. We often draw a dashed line to indicate the spatial separation explicitly.

Exercise 4.1
Let N be a 𝑘-positive superoperator, for an integer 𝑘 ≥ 1. Prove that N is
Hermiticity preserving (recall Definition 2.17). (Hint: See Exercise 2.17 and
use the Jordan–Hahn decomposition.)

Exercise 4.2
Prove that a superoperator N 𝐴→𝐵 is trace-non-increasing if and only if its adjoint
is subunital, meaning that N† ( 1𝐵 ) ≤ 1 𝐴 . Prove that the inequality is saturated,
i.e., that N† ( 1𝐵 ) = 1 𝐴 (N† is unital), if and only if N is trace preserving.


Exercise 4.3
1. Let N 𝐴→𝐵 be a positive superoperator. Starting with (2.2.186), prove that

$$\|\mathcal{N}\|_1 = \left\|\mathcal{N}^\dagger(\mathbb{1}_B)\right\|_\infty. \tag{4.1.6}$$

2. Using 1., conclude the following:

(a) If N 𝐴→𝐵 is a 𝑘-positive, trace-non-increasing superoperator, then
∥id𝑘 ⊗ N∥ 1 ≤ 1;
(b) If N 𝐴→𝐵 is a 𝑘-positive, trace-preserving superoperator, then
∥id𝑘 ⊗ N∥ 1 = 1.

3. Using 1. and 2., conclude the following:

(a) If N 𝐴→𝐵 is a completely positive, trace-non-increasing superoperator,
then ∥N∥⋄ ≤ 1;
(b) If N 𝐴→𝐵 is a quantum channel, then ∥N∥⋄ = 1. (The diamond norm
∥·∥⋄ is introduced in Definition 2.20.)

Combining the result of Exercise 4.3 and (2.2.185), we conclude that for every
positive, trace-non-increasing superoperator N 𝐴→𝐵 , and every linear operator
𝑋 ∈ L(H 𝐴 ),
∥N(𝑋) ∥ 1 ≤ ∥ 𝑋 ∥ 1 . (4.1.7)
An inequality of this type is called a data-processing inequality, for which we
provide an interpretation later on in Section 6.1. We encounter numerous such
inequalities with respect to various different quantities throughout the rest of this
book, and they turn out to be of central importance in the analysis of quantum
communication protocols, and in quantum information theory more generally.

4.2 Choi Representation

The Choi representation of a quantum channel gives a way to represent a quantum
channel as a bipartite operator and is an essential concept in quantum information
theory.

Figure 4.2: The normalized Choi representation $\Phi^{\mathcal{N}}_{AB}$ of a superoperator $\mathcal{N}_{A\to B}$ is the bipartite operator resulting from sending one share of the maximally entangled state $\Phi_{AA'}$, defined in (3.2.39), through $\mathcal{N}$.

Definition 4.1 Choi Representation

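For every superoperator $\mathcal{N}_{A\to B}$, its Choi representation, or Choi operator, is defined as

$$\Gamma^{\mathcal{N}}_{AB} := \mathcal{N}_{A'\to B}(|\Gamma\rangle\!\langle\Gamma|_{AA'}) = \sum_{i,j=0}^{d_A-1}|i\rangle\!\langle j|_A\otimes\mathcal{N}(|i\rangle\!\langle j|_{A'}), \tag{4.2.1}$$

where $\mathcal{H}_{A'}$ is isomorphic to the Hilbert space $\mathcal{H}_A$ corresponding to the channel input system $A$. We also define the operator

$$\Phi^{\mathcal{N}}_{AB} := \frac{1}{d_A}\Gamma^{\mathcal{N}}_{AB} = \mathcal{N}_{A'\to B}(\Phi_{AA'}), \tag{4.2.2}$$

where $\Phi_{AA'} = |\Phi\rangle\!\langle\Phi|_{AA'}$ is the maximally entangled state defined in (3.2.39).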

The Choi representation $\Gamma^{\mathcal{N}}_{AB}$ of the superoperator $\mathcal{N}_{A\to B}$ is an operator acting on $\mathcal{H}_{AB}$ and uniquely characterizes the map because it specifies the action of $\mathcal{N}$ on the basis $\{|i\rangle\!\langle j|_A : 0\leq i,j\leq d_A-1\}$ of linear operators acting on $\mathcal{H}_A$. As shown in Figure 4.2, the operator $\Phi^{\mathcal{N}}_{AB}$ is simply the Choi representation normalized by the dimension $d_A$ of the input system $A$ of the superoperator $\mathcal{N}$, and it is the linear operator resulting from sending one share of a maximally entangled state through $\mathcal{N}$. When $\mathcal{N}$ is a quantum channel, we refer to $\Phi^{\mathcal{N}}_{AB}$ as the Choi state of $\mathcal{N}$, because $\Phi^{\mathcal{N}}_{AB}$ is positive semi-definite and has unit trace; see Theorem 4.3 below.
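The Choi operator can be assembled block by block directly from (4.2.1). The sketch below (NumPy) does this for a qubit amplitude damping channel given by two Kraus operators; this channel, its parameter `gamma`, and the function names are illustrative assumptions, chosen only because the channel is simple to write down.

```python
import numpy as np

def amplitude_damping(X, gamma=0.3):
    """Example channel with Kraus operators K0, K1 (illustrative choice)."""
    K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]])
    K1 = np.array([[0, np.sqrt(gamma)], [0, 0]])
    return K0 @ X @ K0.conj().T + K1 @ X @ K1.conj().T

def choi_operator(channel, d_in):
    """Gamma^N = sum_{i,j} |i><j| (x) N(|i><j|), following (4.2.1)."""
    blocks = []
    for i in range(d_in):
        row = []
        for j in range(d_in):
            E = np.zeros((d_in, d_in), dtype=complex)
            E[i, j] = 1.0                   # matrix unit |i><j|
            row.append(channel(E))
        blocks.append(row)
    return np.block(blocks)

d_A = d_B = 2
choi = choi_operator(amplitude_damping, d_A)
choi_state = choi / d_A                     # normalized operator Phi^N

# Marginal on the input system: Tr_B[Gamma^N].
marginal_A = np.einsum('ibjb->ij', choi.reshape(d_A, d_B, d_A, d_B))
min_eig = np.linalg.eigvalsh(choi).min()
```

For this channel, `marginal_A` is the identity (reflecting trace preservation), `choi` is positive semi-definite up to rounding, and `choi_state` has unit trace.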


Exercise 4.4
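Let $\mathcal{N}_{A\to B}$ be a superoperator.

1. Prove that $\operatorname{Tr}_A[\Gamma^{\mathcal{N}}_{AB}] = \mathcal{N}_{A\to B}(\mathbb{1}_A)$.

2. Prove that $\mathcal{N}_{A\to B}$ is trace preserving if and only if $\operatorname{Tr}_B[\Gamma^{\mathcal{N}}_{AB}] = \mathbb{1}_A$. Conclude that $\operatorname{Tr}[\Phi^{\mathcal{N}}_{AB}] = 1$ in this case.

3. Prove that $\left\langle\overline{X}_A\otimes Y_B,\Gamma^{\mathcal{N}}_{AB}\right\rangle = \langle Y_B,\mathcal{N}_{A\to B}(X_A)\rangle$ for all $X_A\in\mathrm{L}(\mathcal{H}_A)$ and $Y_B\in\mathrm{L}(\mathcal{H}_B)$, where $\overline{X}_A$ denotes the complex conjugate of $X_A$.

4. Using 3., prove that the Choi representation of $\mathcal{N}$ can be expressed using the adjoint $\mathcal{N}^\dagger$ as follows:

$$\Gamma^{\mathcal{N}}_{AB} = \sum_{k,\ell=0}^{d_B-1}\overline{\mathcal{N}^\dagger(|k\rangle\!\langle\ell|_B)}\otimes|k\rangle\!\langle\ell|_B. \tag{4.2.3}$$

Conclude that $\operatorname{Tr}_B[\Gamma^{\mathcal{N}}_{AB}] = \overline{\mathcal{N}^\dagger(\mathbb{1}_B)}$.

5. Prove that, for every unitary operator $U_A$,

$$\Gamma^{\mathcal{N}}_{AB} = (\mathcal{U}_A\otimes(\mathcal{N}_{A'\to B}\circ\overline{\mathcal{U}}_{A'}))(\Gamma_{AA'}), \tag{4.2.4}$$

where $\mathcal{U}_A(\cdot) := U_A(\cdot)U_A^\dagger$ and $\overline{\mathcal{U}}_A(\cdot) := \overline{U}_A(\cdot)\overline{U}_A^\dagger$.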

Proposition 4.2 Quantum Dynamics from Choi Operator

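Let $\mathcal{N}_{A\to B}$ be a superoperator, and let $X_{RA}$ be a bipartite operator, with $R$ an arbitrary reference system. Then the action of the superoperator $\mathrm{id}_R\otimes\mathcal{N}_{A\to B}$ on $X_{RA}$ can be expressed in terms of the Choi operator $\Gamma^{\mathcal{N}}_{AB}$ as follows:

\begin{align}
(\mathrm{id}_R\otimes\mathcal{N}_{A\to B})(X_{RA}) &= \operatorname{Tr}_A[(\mathrm{T}_A(X_{RA})\otimes\mathbb{1}_B)(\mathbb{1}_R\otimes\Gamma^{\mathcal{N}}_{AB})] \tag{4.2.5}\\
&= \langle\Gamma|_{A'A}(X_{RA'}\otimes\Gamma^{\mathcal{N}}_{AB})|\Gamma\rangle_{A'A}, \tag{4.2.6}
\end{align}

where $\mathrm{T}_A$ denotes the partial transpose from Definition 3.16.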


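Proof: Observe that (4.2.1) implies that

$$\mathcal{N}_{A\to B}(|i\rangle\!\langle j|_A) = (\langle i|_A\otimes\mathbb{1}_B)\,\Gamma^{\mathcal{N}}_{AB}\,(|j\rangle_A\otimes\mathbb{1}_B) \tag{4.2.7}$$

for all $0\leq i,j\leq d_A-1$. We extend this by linearity to apply to every input operator $X_A$ by the following reasoning. Expanding $X_A$ as $X_A = \sum_{i,j=0}^{d_A-1}X_{i,j}|i\rangle\!\langle j|_A$, we find that

\begin{align}
\mathcal{N}_{A\to B}(X_A) &= \sum_{i,j=0}^{d_A-1}X_{i,j}\,\mathcal{N}_{A\to B}(|i\rangle\!\langle j|_A) \tag{4.2.8}\\
&= \sum_{i,j=0}^{d_A-1}X_{i,j}\,(\langle i|_A\otimes\mathbb{1}_B)\,\Gamma^{\mathcal{N}}_{AB}\,(|j\rangle_A\otimes\mathbb{1}_B) \tag{4.2.9}\\
&= \sum_{i,j=0}^{d_A-1}X_{i,j}\operatorname{Tr}_A[(|j\rangle\!\langle i|_A\otimes\mathbb{1}_B)\,\Gamma^{\mathcal{N}}_{AB}] \tag{4.2.10}\\
&= \operatorname{Tr}_A[(X_A^{\mathsf{T}}\otimes\mathbb{1}_B)\,\Gamma^{\mathcal{N}}_{AB}]. \tag{4.2.11}
\end{align}

So we conclude that the action of $\mathcal{N}_{A\to B}$ on every linear operator $X_A$ can be expressed using the Choi representation as

$$\mathcal{N}_{A\to B}(X_A) = \operatorname{Tr}_A[(X_A^{\mathsf{T}}\otimes\mathbb{1}_B)\,\Gamma^{\mathcal{N}}_{AB}]. \tag{4.2.12}$$

Now employing (2.2.41) and (2.2.40), we conclude that the action of $\mathcal{N}_{A\to B}$ on every linear operator $X_A$ can be expressed alternatively as

$$\mathcal{N}_{A\to B}(X_A) = \langle\Gamma|_{A'A}(X_{A'}\otimes\Gamma^{\mathcal{N}}_{AB})|\Gamma\rangle_{A'A}. \tag{4.2.13}$$

The identities in (4.2.12) and (4.2.13) extend more generally to the case of the superoperator $\mathrm{id}_R\otimes\mathcal{N}_{A\to B}$ acting on a bipartite operator $X_{RA}$ by expanding $X_{RA}$ as $X_{RA} = \sum_{i,j=0}^{d_R-1}|i\rangle\!\langle j|_R\otimes X_A^{i,j}$ and using linearity. We thus conclude (4.2.5) and (4.2.6). ■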
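The identity (4.2.12) lends itself to a direct numerical check. In the sketch below (NumPy), the channel is built from Kraus operators obtained by slicing a randomly generated isometry; this construction, the dimensions, and the function names are assumptions made for illustration.

```python
import numpy as np

d_in, d_out, n_kraus = 2, 3, 2
rng = np.random.default_rng(0)

# Kraus operators from the row blocks of an isometry V (so sum_k K_k^dagger K_k = 1).
g = rng.normal(size=(d_out * n_kraus, d_in)) + 1j * rng.normal(size=(d_out * n_kraus, d_in))
V, _ = np.linalg.qr(g)                       # reduced QR: V has orthonormal columns
kraus = [V[k * d_out:(k + 1) * d_out, :] for k in range(n_kraus)]

def channel(X):
    """N(X) = sum_k K_k X K_k^dagger."""
    return sum(K @ X @ K.conj().T for K in kraus)

def choi_operator(channel, d_in):
    """Gamma^N = sum_{i,j} |i><j| (x) N(|i><j|), following (4.2.1)."""
    rows = []
    for i in range(d_in):
        row = []
        for j in range(d_in):
            E = np.zeros((d_in, d_in), dtype=complex)
            E[i, j] = 1.0
            row.append(channel(E))
        rows.append(row)
    return np.block(rows)

def apply_via_choi(choi, X, d_in, d_out):
    """N(X) = Tr_A[(X^T (x) 1_B) Gamma^N], following (4.2.12)."""
    prod = np.kron(X.T, np.eye(d_out)) @ choi
    return np.einsum('ibic->bc', prod.reshape(d_in, d_out, d_in, d_out))

choi = choi_operator(channel, d_in)
X = rng.normal(size=(d_in, d_in)) + 1j * rng.normal(size=(d_in, d_in))
direct = channel(X)
from_choi = apply_via_choi(choi, X, d_in, d_out)
```

Note that `X.T` is the plain transpose, not the conjugate transpose, matching $X_A^{\mathsf{T}}$ in (4.2.12); the test input `X` is a generic complex matrix rather than a state, since the identity holds for every linear operator.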

Exercise 4.5
Prove that a superoperator $\mathcal{N}_{A\to B}$ is Hermiticity preserving (recall the definition in Section 2.2.11) if and only if its Choi representation $\Gamma^{\mathcal{N}}_{AB}$ is Hermitian.


Using the definition of the Choi state and the maximally entangled state $\Phi_{A'A}$, we can write (4.2.6) as

$$\langle\Phi|_{A'A}(X_{RA'}\otimes\Phi^{\mathcal{N}}_{AB})|\Phi\rangle_{A'A} = \frac{1}{d_A^2}\,\mathcal{N}_{A'\to B}(X_{RA'}). \tag{4.2.14}$$

Comparing this equation with (3.3.22), we see that it has the following physical interpretation: if we start with the systems $R$, $A'$, $A$, and $B$ in the state $\rho_{RA'}\otimes\Phi^{\mathcal{N}}_{AB}$ and we measure $A'$ and $A$ according to the POVM $\{|\Phi\rangle\!\langle\Phi|_{A'A},\ \mathbb{1}_{A'A} - |\Phi\rangle\!\langle\Phi|_{A'A}\}$, then the outcome corresponding to $|\Phi\rangle\!\langle\Phi|_{A'A}$ occurs with probability $\frac{1}{d_A^2}$ and the post-measurement state on systems $R$ and $B$ is $\mathcal{N}_{A'\to B}(\rho_{RA'})$. The Choi state $\Phi^{\mathcal{N}}_{AB}$ can thus be viewed as a resource state for the probabilistic implementation of the channel $\mathcal{N}$. We return to this point in Section 5.1 when we discuss post-selected quantum teleportation.
The concept of the Choi state allows us to associate to each quantum channel
N 𝐴→𝐵 a bipartite quantum state. Conversely, given a bipartite state 𝜌 𝐴𝐵 , we can
associate a map given by

𝑋 𝐴 ↦→ 𝑑 𝐴 Tr 𝐴 [(𝑋 𝐴T ⊗ 1𝐵 ) 𝜌 𝐴𝐵 ]. (4.2.15)

It is straightforward to see that this map is completely positive; however, it is trace
preserving if and only if $\operatorname{Tr}_B[\rho_{AB}] = \pi_A$. On the other hand, the map $\mathcal{N}^{\rho}_{A\to B}$ defined
as
\begin{equation}
\mathcal{N}^{\rho}_{A\to B}(X_A) := \operatorname{Tr}_A\!\left[(X_A^{\mathsf{T}} \otimes \mathbb{1}_B)\, \rho_A^{-\frac{1}{2}} \rho_{AB}\, \rho_A^{-\frac{1}{2}}\right] \tag{4.2.16}
\end{equation}
is a quantum channel whenever $\rho_A$ is positive definite, where $\rho_A = \operatorname{Tr}_B[\rho_{AB}]$. The
operator $\rho_A^{-\frac{1}{2}} \rho_{AB}\, \rho_A^{-\frac{1}{2}}$ is sometimes called a “conditional state,” motivated by the
fact that it reduces to a conditional probability distribution when $\rho_{AB}$ is a fully
classical state, so that it can be written as $\rho_{AB} = \sum_{x,y} p(x,y) |x\rangle\langle x|_A \otimes |y\rangle\langle y|_B$,
where $p(x,y)$ is a probability distribution. Note that if $\rho_A$ is not invertible, then the
inverse in (4.2.16) should be taken on the support of $\rho_A$ (as in (2.2.72)), in which
case the channel $\mathcal{N}^{\rho}_{A\to B}$ is defined as in (4.2.16) only on the support of $\rho_A$.
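
The classical special case mentioned above can be checked numerically. The following is a minimal NumPy sketch, not part of the original text: the joint distribution `p` and the helper names `partial_trace_B` and `inv_sqrt` are illustrative assumptions. It builds the conditional state for a fully classical $\rho_{AB}$, compares its diagonal with $p(y|x)$, and confirms that tracing out $B$ yields the identity on $A$.

```python
import numpy as np

def partial_trace_B(op, dA, dB):
    """Trace out the second tensor factor of an operator on A (x) B."""
    return np.trace(op.reshape(dA, dB, dA, dB), axis1=1, axis2=3)

def inv_sqrt(rho):
    """Inverse square root of a positive definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(rho)
    return (vecs / np.sqrt(vals)) @ vecs.conj().T

# Hypothetical joint distribution p(x, y) on a 2 x 3 alphabet (illustrative values).
p = np.array([[0.10, 0.25, 0.05],
              [0.20, 0.10, 0.30]])
dA, dB = p.shape

# Fully classical state rho_AB = sum_{x,y} p(x,y) |x><x| (x) |y><y|.
rho_AB = np.diag(p.reshape(-1)).astype(complex)
rho_A = partial_trace_B(rho_AB, dA, dB)

# Conditional state rho_A^{-1/2} rho_AB rho_A^{-1/2} (acting as identity on B).
S = np.kron(inv_sqrt(rho_A), np.eye(dB))
cond = S @ rho_AB @ S

# Its diagonal should reproduce the conditional distribution p(y|x).
cond_diag = np.real(np.diag(cond)).reshape(dA, dB)
p_y_given_x = p / p.sum(axis=1, keepdims=True)

# Tracing out B should give the identity on A (trace-preserving condition).
tr_B_cond = partial_trace_B(cond, dA, dB)
```

The same two helpers work for a non-classical positive definite $\rho_{AB}$; only the comparison with $p(y|x)$ is specific to the classical case.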

Exercise 4.6
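1. Given two superoperators $(\mathcal{N}_1)_{A_1\to B_1}$ and $(\mathcal{N}_2)_{A_2\to B_2}$, prove that the Choi
representation of the tensor-product superoperator $\mathcal{N}_1 \otimes \mathcal{N}_2$ is given by
\begin{equation}
\Gamma^{\mathcal{N}_1 \otimes \mathcal{N}_2}_{A_1 A_2 B_1 B_2} = \Gamma^{\mathcal{N}_1}_{A_1 B_1} \otimes \Gamma^{\mathcal{N}_2}_{A_2 B_2}. \tag{4.2.17}
\end{equation}

2. Given two superoperators $\mathcal{N}_{A\to B}$ and $\mathcal{M}_{B\to C}$, prove that the Choi representation
of the composition $(\mathcal{M} \circ \mathcal{N})_{A\to C}$ is given by
\begin{align}
\Gamma^{\mathcal{M}\circ\mathcal{N}}_{AC} &= \mathcal{M}_{B\to C}(\Gamma^{\mathcal{N}}_{AB}) \tag{4.2.18}\\
&= \operatorname{Tr}_B[\mathrm{T}_B(\Gamma^{\mathcal{N}}_{AB})\, \Gamma^{\mathcal{M}}_{BC}] \tag{4.2.19}\\
&= \langle\Gamma|_{BB'}\, \Gamma^{\mathcal{N}}_{AB} \otimes \Gamma^{\mathcal{M}}_{B'C}\, |\Gamma\rangle_{BB'}. \tag{4.2.20}
\end{align}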

Exercise 4.7
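Let $\mathcal{N}_{A\to B}$ be a superoperator. Prove that
\begin{equation}
\frac{1}{d_A} \left\| \Gamma^{\mathcal{N}}_{AB} \right\|_1 \le \left\| \mathcal{N} \right\|_\diamond \le \left\| \Gamma^{\mathcal{N}}_{AB} \right\|_1. \tag{4.2.21}
\end{equation}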

(Hint: Start with Theorem 2.21. Then, for the right-most inequality, start with
the discussion around (2.2.38) and then use (2.2.94).)

4.3 Characterizations of Channels: Choi, Kraus, Stinespring
The following theorem provides three useful ways to characterize quantum channels,
and as such, it is one of the most important theorems in quantum information theory.

Theorem 4.3 Characterizations of Quantum Channels

Let N 𝐴→𝐵 be a linear map from L(H 𝐴 ) to L(H𝐵 ). Then the following are
equivalent:
1. N is a quantum channel.
2. Choi: The Choi representation $\Gamma^{\mathcal{N}}_{AB}$ is positive semi-definite and satisfies
$\operatorname{Tr}_B[\Gamma^{\mathcal{N}}_{AB}] = \mathbb{1}_A$.
3. Kraus: There exists a set $\{K_i\}_{i=1}^{r}$ of operators, called Kraus operators,
such that
\begin{equation}
\mathcal{N}(X_A) = \sum_{i=1}^{r} K_i X_A K_i^\dagger \tag{4.3.1}
\end{equation}
for every linear operator $X_A$, where $K_i \in \mathrm{L}(\mathcal{H}_A, \mathcal{H}_B)$ for all $i \in \{1, \dots, r\}$,
$r \ge \operatorname{rank}(\Gamma^{\mathcal{N}}_{AB})$, and $\sum_{i=1}^{r} K_i^\dagger K_i = \mathbb{1}_A$.
4. Stinespring: There exists an isometry $V_{A\to BE}$, called an isometric extension,
with $d_E \ge \operatorname{rank}(\Gamma^{\mathcal{N}}_{AB})$, such that

N(𝑋 𝐴 ) = Tr𝐸 [𝑉 𝑋 𝐴𝑉 † ] (4.3.2)

for every linear operator 𝑋 𝐴 .

Please consult the Bibliographic Notes in Section 3.4 for references containing
a proof of this theorem.
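
As a numerical illustration of the equivalence between the Kraus and Choi characterizations, the following NumPy sketch (an illustration under stated assumptions, not the book's construction) draws random Kraus operators by slicing a random isometry, forms $\Gamma^{\mathcal{N}}_{AB} = \sum_{i,j} |i\rangle\langle j|_A \otimes \mathcal{N}(|i\rangle\langle j|)$, and checks positivity, the partial-trace condition, the rank bound, and the identity (4.2.12). The helper names `random_kraus`, `choi`, and the partial-trace functions are hypothetical.

```python
import numpy as np

def random_kraus(dA, dB, r, seed=0):
    """Random Kraus operators obtained by slicing a random isometry A -> B (x) E."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(dB * r, dA)) + 1j * rng.normal(size=(dB * r, dA))
    V, _ = np.linalg.qr(G)            # reduced QR: V has orthonormal columns
    # Rows of V are ordered as (b, e); K_e = (1_B (x) <e|) V.
    return [V.reshape(dB, r, dA)[:, e, :] for e in range(r)]

def choi(kraus, dA):
    """Choi representation sum_{i,j} |i><j|_A (x) N(|i><j|)."""
    dB = kraus[0].shape[0]
    gamma = np.zeros((dA * dB, dA * dB), dtype=complex)
    for i in range(dA):
        for j in range(dA):
            E = np.zeros((dA, dA))
            E[i, j] = 1.0
            out = sum(K @ E @ K.conj().T for K in kraus)
            gamma += np.kron(E, out)
    return gamma

def partial_trace_A(op, dA, dB):
    return np.trace(op.reshape(dA, dB, dA, dB), axis1=0, axis2=2)

def partial_trace_B(op, dA, dB):
    return np.trace(op.reshape(dA, dB, dA, dB), axis1=1, axis2=3)

dA, dB, r = 2, 3, 4
kraus = random_kraus(dA, dB, r)
gamma = choi(kraus, dA)

eigs = np.linalg.eigvalsh(gamma)              # should be non-negative
tr_B = partial_trace_B(gamma, dA, dB)         # should equal the identity on A

# Check N(X) = Tr_A[(X^T (x) 1_B) Gamma], as in (4.2.12), on a random operator X.
rng = np.random.default_rng(1)
X = rng.normal(size=(dA, dA)) + 1j * rng.normal(size=(dA, dA))
lhs = sum(K @ X @ K.conj().T for K in kraus)
rhs = partial_trace_A(np.kron(X.T, np.eye(dB)) @ gamma, dA, dB)
```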

Remark: Theorem 4.3 holds for quantum channels, i.e., completely positive trace-preserving
maps. More generally, if N 𝐴→𝐵 is completely positive and trace non-increasing, then the
trace-preserving condition for the Choi, Kraus, and Stinespring representations of N 𝐴→𝐵 changes
as follows.

• The trace-preserving condition $\operatorname{Tr}_B[\Gamma^{\mathcal{N}}_{AB}] = \mathbb{1}_A$ on the Choi representation of $\mathcal{N}_{A\to B}$
changes to $\operatorname{Tr}_B[\Gamma^{\mathcal{N}}_{AB}] \le \mathbb{1}_A$ when $\mathcal{N}_{A\to B}$ is trace non-increasing.

• The trace-preserving condition $\sum_{i=1}^{r} K_i^\dagger K_i = \mathbb{1}_A$ on a set of Kraus operators changes to
$\sum_{i=1}^{r} K_i^\dagger K_i \le \mathbb{1}_A$ when $\mathcal{N}_{A\to B}$ is trace non-increasing.

• The isometric property of the operator $V$ in (4.3.2), which corresponds to the trace-
preserving property of every quantum channel, changes to $V^\dagger V \le \mathbb{1}_A$ when $\mathcal{N}_{A\to B}$ is trace
non-increasing.

Completely positive trace-non-increasing maps arise in the context of quantum instruments,
which we discuss in Section 4.4.5.

The Kraus operators $K_i$ in (4.3.1) have an interpretation in quantum error
correction as “error operators” that characterize various errors that a quantum
system undergoes. Kraus operators for a given quantum channel are, however, not
unique in general. If $\{K_i\}_{i=1}^{r}$ is a set of Kraus operators for the channel $\mathcal{N}$, then,
given an $s \times r$ isometric matrix $V$ with elements $\{V_{i,j} : 1 \le i \le s,\ 1 \le j \le r\}$, the
operators $\{K_i'\}_{i=1}^{s}$ defined as
\begin{equation}
K_i' = \sum_{j=1}^{r} V_{i,j} K_j \tag{4.3.3}
\end{equation}

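are also Kraus operators for $\mathcal{N}$. Indeed, for every linear operator $X$, the following
equality holds
\begin{align}
\sum_{i=1}^{s} K_i' X (K_i')^\dagger &= \sum_{i=1}^{s} \sum_{j,j'=1}^{r} V_{i,j} \overline{V_{i,j'}}\, K_j X K_{j'}^\dagger \tag{4.3.4}\\
&= \sum_{j,j'=1}^{r} \underbrace{\left( \sum_{i=1}^{s} (V^\dagger)_{j',i} V_{i,j} \right)}_{(V^\dagger V)_{j',j} = \delta_{j',j}} K_j X K_{j'}^\dagger \tag{4.3.5}\\
&= \sum_{j=1}^{r} K_j X K_j^\dagger \tag{4.3.6}\\
&= \mathcal{N}(X), \tag{4.3.7}
\end{align}
where the second equality follows because $\overline{V_{i,j'}} = (V^\dagger)_{j',i}$.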

We note also that a converse statement holds: if $\{K_i\}_{i=1}^{r}$ and $\{K_i'\}_{i=1}^{s}$ are two
sets of Kraus operators that realize the same quantum channel, then they are related
by an isometry as in (4.3.3). This is a dynamical version of the statement made
earlier in Section 3.2.5, the statement there being that all purifications of a state are
related by an isometry acting on the purifying system.
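
The non-uniqueness expressed by (4.3.3) is easy to test numerically. The sketch below is illustrative only: the two-operator dephasing-type qubit channel and the helper `apply` are assumptions chosen for concreteness. It mixes the Kraus operators with a random $4 \times 2$ isometric matrix and confirms that the new set implements the same map and still satisfies the completeness relation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical qubit channel with r = 2 Kraus operators (dephasing-type example).
p = 0.3
K = [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * np.diag([1.0, -1.0])]
r, s = len(K), 4

# Random s x r isometric matrix V (V^dag V = 1_r) from a reduced QR decomposition.
G = rng.normal(size=(s, r)) + 1j * rng.normal(size=(s, r))
V, _ = np.linalg.qr(G)

# K'_i = sum_j V_{i,j} K_j, as in (4.3.3).
K_new = [sum(V[i, j] * K[j] for j in range(r)) for i in range(s)]

def apply(kraus, X):
    """Apply the map X -> sum_i K_i X K_i^dag."""
    return sum(A @ X @ A.conj().T for A in kraus)

# Compare both Kraus sets on a random (not necessarily Hermitian) operator.
X = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
same_output = np.allclose(apply(K, X), apply(K_new, X))
completeness = sum(A.conj().T @ A for A in K_new)
```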

Exercise 4.8
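Show that the Choi representation of a quantum channel $\mathcal{N}_{A\to B}$ can be expressed
using a set $\{K_i\}_{i=1}^{r}$ of its Kraus operators, with $r \ge \operatorname{rank}(\Gamma^{\mathcal{N}}_{AB})$, as
\begin{equation}
\Gamma^{\mathcal{N}}_{AB} = \sum_{i=1}^{r} \operatorname{vec}(K_i) \operatorname{vec}(K_i)^\dagger. \tag{4.3.8}
\end{equation}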

[Figure 4.3: two circuit diagrams. Left: $\rho_A$ enters the isometry $V_{A\to BE}$; the output on $B$ is $\mathcal{N}(\rho_A)$ and $E$ is discarded. Right: $\rho_A$ and $|0\rangle_{E'}$ enter the unitary $U_{AE'\to BE''}$; the output on $B$ is $\mathcal{N}(\rho_A)$ and $E''$ is discarded.]

Figure 4.3: (Left) According to Stinespring’s theorem, the evolution of every
quantum system $A$ via a quantum channel $\mathcal{N}_{A\to B}$ can be described as an
interaction of the system $A$ with its environment $E$ via an isometry $V_{A\to BE}$,
followed by discarding the environment. (Right) The isometry $V_{A\to BE}$ can be
extended to a unitary $U_{AE'\to BE''}$ using, e.g., the construction in (4.3.27), such
that $A$ and $E'$ are initially in a product state, with $E'$ starting in a pure state. The
two systems then interact according to $U$, and after discarding the environment
$E''$, the resulting state is the output of the channel.

The isometric extension $V_{A\to BE}$ in (4.3.2) can be thought of physically as
modelling the interaction of the quantum system of interest with its environment, i.e.,
anything external to the quantum system that is not under our control. Stinespring’s
theorem then tells us that the evolution of every quantum system can be thought of
as first an interaction of the system with its environment, followed by discarding
the environment; see Figure 4.3. Given a set $\{K_i\}_{i=1}^{r}$ of Kraus operators for $\mathcal{N}$, we
can let the environment $E$ correspond to a space of dimension $r$ and define the
isometry $V_{A\to BE}$ as
\begin{equation}
V_{A\to BE} = \sum_{j=1}^{r} K_j \otimes |j-1\rangle_E. \tag{4.3.9}
\end{equation}
It is straightforward to show that this is indeed an isometric extension of $\mathcal{N}$ since
$\operatorname{Tr}_E[V X V^\dagger] = \sum_{j=1}^{r} K_j X K_j^\dagger = \mathcal{N}(X)$.
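
The construction in (4.3.9) translates directly into a few lines of NumPy. The following sketch is illustrative: the dephasing-type Kraus operators, the input state, and the helper `trace_out_E` are assumptions, and the output ordering is taken to be $B \otimes E$. It checks that $V$ is an isometry and that tracing out $E$ reproduces the Kraus form of the channel.

```python
import numpy as np

# Hypothetical qubit channel (dephasing-type) with two Kraus operators.
p = 0.25
K = [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * np.diag([1.0, -1.0])]
dA, dB, r = 2, 2, len(K)

# V = sum_j K_j (x) |j-1>_E, as in (4.3.9); zero-based index j plays the role of j-1.
V = sum(np.kron(Kj, np.eye(r)[:, [j]]) for j, Kj in enumerate(K))

def trace_out_E(op, dB, dE):
    """Partial trace over E for an operator on B (x) E."""
    return np.trace(op.reshape(dB, dE, dB, dE), axis1=1, axis2=3)

# Illustrative input state on A.
rho = np.array([[0.6, 0.2 - 0.1j],
                [0.2 + 0.1j, 0.4]])

via_stinespring = trace_out_E(V @ rho @ V.conj().T, dB, r)
via_kraus = sum(Kj @ rho @ Kj.conj().T for Kj in K)
```

Because the environment basis vector is placed in the second tensor factor, the row index of `V` runs over pairs $(b, e)$, which is what the reshape inside `trace_out_E` relies on.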

Exercise 4.9
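Show that the Choi representation of a quantum channel $\mathcal{N}_{A\to B}$ can be expressed
using an isometric extension $V_{A\to BE}$ of $\mathcal{N}$, with $d_E \ge \operatorname{rank}(\Gamma^{\mathcal{N}}_{AB})$, as
\begin{equation}
\Gamma^{\mathcal{N}}_{AB} = \operatorname{Tr}_E[\operatorname{vec}(V) \operatorname{vec}(V)^\dagger]. \tag{4.3.10}
\end{equation}
Hence, conclude that $\frac{1}{\sqrt{d_A}} \operatorname{vec}(V) = (\mathbb{1}_A \otimes V_{A'\to BE}) |\Phi\rangle_{AA'}$ is a purification of
the Choi state $\Phi^{\mathcal{N}}_{AB}$ of $\mathcal{N}$.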

Exercise 4.10
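Let $\mathcal{N}_{A\to B}$ be a quantum channel with the following Kraus and Stinespring
representations:
\begin{equation}
\mathcal{N}(X) = \sum_{i=1}^{r} K_i X K_i^\dagger = \operatorname{Tr}_E[V X V^\dagger]. \tag{4.3.11}
\end{equation}

1. Verify using (2.2.182) that the adjoint map $\mathcal{N}^\dagger$ can be represented in the
following two ways:
\begin{equation}
\mathcal{N}^\dagger(Y) = \sum_{i=1}^{r} K_i^\dagger Y K_i = V^\dagger (Y \otimes \mathbb{1}_E) V. \tag{4.3.12}
\end{equation}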

2. Using 1., verify the following facts:

(a) The adjoint of a completely positive map is completely positive.
(b) The adjoint of a trace preserving map is a unital map (recall Defini-
tion 2.19). More generally, the adjoint of a trace non-increasing map is
subunital, meaning that N† ( 1𝐵 ) ≤ 1 𝐴 .
(c) The adjoint of a unital map is trace preserving, and the adjoint of a
subunital map is trace non-increasing.

Exercise 4.11
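1. Let $\mathcal{N}_{A\to B}$ be a positive trace-preserving map. Prove that the set
$\{\mathcal{N}^\dagger(|i\rangle\langle i|)\}_{i=0}^{d_B-1}$ is a POVM. More generally, prove that the set
$\{\mathcal{N}^\dagger(E_j \rho E_j^\dagger)\}_{j=1}^{d_B^2}$ is a POVM for every orthonormal basis $\{E_j\}_{j=1}^{d_B^2}$ for
$\mathrm{L}(\mathcal{H}_B)$ and every quantum state $\rho \in \mathrm{D}(\mathcal{H}_B)$.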
2. Conversely, let {𝑀𝑥 }𝑥∈X be a POVM, where X is a finite set. Prove that
there exists a quantum channel N such that 𝑀𝑥 = N† (|𝑥⟩⟨𝑥|) for all 𝑥 ∈ X,
where {|𝑥⟩}𝑥∈X is an orthonormal set. (Hint: Recall Naimark’s Theorem
(Theorem 3.22).)

4.3.1 Relating Quantum State Extensions and Purifications

We can now establish a fundamental relationship between a purification $\psi_{RA}$ of a
state $\rho_A$ and an extension $\omega_{R'A}$ of $\rho_A$.

Proposition 4.4
Let 𝜌 𝐴 be a quantum state with purification 𝜓 𝑅 𝐴 . For every extension 𝜔 𝑅′ 𝐴
of 𝜌 𝐴 , there exists a quantum channel N 𝑅→𝑅′ such that

N 𝑅→𝑅′ (𝜓 𝑅 𝐴 ) = 𝜔 𝑅′ 𝐴 . (4.3.13)

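Proof: Consider a purification $\phi_{R''R'A}$ of $\omega_{R'A}$, with purifying system $R''$ satisfying
$d_{R''} \ge \operatorname{rank}(\omega_{R'A})$. Since $\omega_{R'A}$ is an extension of $\rho_A$, we have that
\begin{equation}
\operatorname{Tr}_{R''R'}[\phi_{R''R'A}] = \rho_A, \tag{4.3.14}
\end{equation}
which means that $\phi_{R''R'A}$ is a purification of $\rho_A$. The state $\psi_{RA}$ is also a purification
of $\rho_A$, which means that, by the isometric equivalence of purifications (see (3.2.71)–
(3.2.74) and the paragraph thereafter), there exists an isometry $V_{R\to R''R'}$ such
that
\begin{equation}
V_{R\to R''R'} |\psi\rangle_{RA} = |\phi\rangle_{R''R'A}. \tag{4.3.15}
\end{equation}
Now, let us use this isometry to define the channel $\mathcal{N}_{R\to R'}$:
\begin{equation}
\mathcal{N}_{R\to R'}(\cdot) = \operatorname{Tr}_{R''}[V_{R\to R''R'} (\cdot) V_{R\to R''R'}^\dagger]. \tag{4.3.16}
\end{equation}
It then follows that
\begin{equation}
\mathcal{N}_{R\to R'}(\psi_{RA}) = \operatorname{Tr}_{R''}[V_{R\to R''R'}\, \psi_{RA}\, V_{R\to R''R'}^\dagger] = \operatorname{Tr}_{R''}[\phi_{R''R'A}] = \omega_{R'A}, \tag{4.3.17}
\end{equation}
as required. ■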

Proposition 4.4 tells us that an extension of a quantum state can be “reached” via
a quantum channel acting on a purification of the state. In this sense, a purification
can be viewed as the “strongest” extension of a state.

4.3.2 Complementary Channels

As stated earlier, the Stinespring representation $\mathcal{N}_{A\to B}(X_A) = \operatorname{Tr}_E[V X V^\dagger]$ of a
quantum channel $\mathcal{N}_{A\to B}$, where $V_{A\to BE}$ is an isometry, can be interpreted as an
interaction of the quantum system $A$ of interest with its environment $E$ followed
by discarding $E$. If instead we discard the system $B$ of the isometric channel
$\mathcal{V}_{A\to BE}(X_A) = V X V^\dagger$, then we obtain the state of the environment after the
interaction with $A$. This defines a channel complementary to $\mathcal{N}_{A\to B}$.

Definition 4.5 Complementary Channel

Let 𝑉𝐴→𝐵𝐸 be an isometry and N 𝐴→𝐵 a quantum channel defined as

N 𝐴→𝐵 (𝑋 𝐴 ) = Tr𝐸 [𝑉 𝑋𝑉 † ]. (4.3.18)

The complementary channel for N 𝐴→𝐵 associated with the isometric extension
𝑉𝐴→𝐵𝐸 is denoted by N𝑐𝐴→𝐸 and is defined as

N𝑐𝐴→𝐸 (𝑋 𝐴 ) B Tr 𝐵 [𝑉 𝑋𝑉 † ]. (4.3.19)

Related to the above, the channel M𝑐𝐴→𝐸 is a complementary channel for the
channel M 𝐴→𝐵 if there exists an isometric channel W 𝐴→𝐵𝐸 such that

M 𝐴→𝐵 = Tr𝐸 ◦ W 𝐴→𝐵𝐸 (4.3.20)

M𝑐𝐴→𝐸 = Tr 𝐵 ◦ W 𝐴→𝐵𝐸 . (4.3.21)

Given a channel N 𝐴→𝐵 , it does not have a unique complementary channel, just
as it does not have a unique Kraus representation, nor does a given quantum state 𝜌 𝐴
have a unique purification. Similar to the latter scenarios, however, it is possible
to show that all complementary channels for N 𝐴→𝐵 are related by an isometric
channel acting on their output. That is, let us suppose that (N1 ) 𝑐𝐴→𝐸 and (N2 ) 𝑐𝐴→𝐸 ′
are complementary channels for N 𝐴→𝐵 . Then there exists an isometric channel
S𝐸→𝐸 ′ such that
(N2 ) 𝑐𝐴→𝐸 ′ = S𝐸→𝐸 ′ ◦ (N1 ) 𝑐𝐴→𝐸 . (4.3.22)
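
The relation (4.3.22) can be illustrated numerically. In the sketch below (an illustration under assumptions: the random isometries, the dimensions, and the helper names are all hypothetical), a second isometric extension is obtained from the first by applying an isometry $S_{E\to E'}$ to the environment. Both extensions give the same channel to $B$, while their complementary outputs differ exactly by conjugation with $S$.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_isometry(rows, cols, rng):
    """Random matrix with orthonormal columns (rows >= cols)."""
    G = rng.normal(size=(rows, cols)) + 1j * rng.normal(size=(rows, cols))
    Q, _ = np.linalg.qr(G)
    return Q

def trace_out_B(op, dB, dE):
    """Partial trace over B for an operator on B (x) E."""
    return np.trace(op.reshape(dB, dE, dB, dE), axis1=0, axis2=2)

def trace_out_E(op, dB, dE):
    """Partial trace over E for an operator on B (x) E."""
    return np.trace(op.reshape(dB, dE, dB, dE), axis1=1, axis2=3)

dA, dB, dE, dE2 = 2, 2, 3, 5
V1 = random_isometry(dB * dE, dA, rng)       # isometric extension V_{A->BE}
S = random_isometry(dE2, dE, rng)            # isometry S_{E->E'}
V2 = np.kron(np.eye(dB), S) @ V1             # another extension, V_{A->BE'}

rho = np.array([[0.7, 0.3j],
                [-0.3j, 0.3]])               # illustrative input state

out1, out2 = V1 @ rho @ V1.conj().T, V2 @ rho @ V2.conj().T
N1 = trace_out_E(out1, dB, dE)               # channel output from V1
N2 = trace_out_E(out2, dB, dE2)              # channel output from V2
N1c = trace_out_B(out1, dB, dE)              # complementary output on E
N2c = trace_out_B(out2, dB, dE2)             # complementary output on E'
```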

Exercise 4.12
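Let $\mathcal{N}_{A\to B}$ be a quantum channel with a set $\{K_i\}_{i=1}^{r}$ of Kraus operators, where
$r \ge \operatorname{rank}(\Gamma^{\mathcal{N}}_{AB})$.

1. Using (4.3.9), show that a channel complementary to $\mathcal{N}$ is given by
\begin{equation}
\mathcal{N}^c(X) = \sum_{i,i'=1}^{r} \operatorname{Tr}[K_i X K_{i'}^\dagger]\, |i-1\rangle\langle i'-1|_E \tag{4.3.23}
\end{equation}
for all $X \in \mathrm{L}(\mathcal{H}_A)$.

2. Let $W := \sum_{i=1}^{r} K_i^\dagger \otimes |i-1\rangle$. Show that the Choi representation of the
complementary channel in (4.3.23) is
\begin{equation}
\Gamma^{\mathcal{N}^c}_{AE} = (W W^\dagger)^{\mathsf{T}}. \tag{4.3.24}
\end{equation}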

The notion of a complementary channel arises in the scenario in which two
parties, Alice and Bob, wish to communicate to each other using a channel $\mathcal{N}_{A\to B}$
in the presence of an eavesdropper Eve. In this scenario, we can naturally identify
Alice and Bob with the quantum systems 𝐴 and 𝐵 and Eve with the system 𝐸, where
𝐸 is as given in Definition 4.5. Any signals sent through the quantum channel by
Alice are received by Bob via the action of N 𝐴→𝐵 , while Eve receives a signal
via the action of N𝑐𝐴→𝐸 . These concepts are important for private communication
over quantum channels, which is the topic of Chapter 16. Two important classes of
channels are relevant in this context.

Definition 4.6 Degradable and Anti-Degradable Channels

A channel N 𝐴→𝐵 is called degradable if there exists a channel D𝐵→𝐸 , called a
degrading channel, such that

D𝐵→𝐸 ◦ N 𝐴→𝐵 = N𝑐𝐴→𝐸 , (4.3.25)

where N𝑐𝐴→𝐸 is a complementary channel of N 𝐴→𝐵 . The channel N 𝐴→𝐵 is

called anti-degradable if a complementary channel N𝑐𝐴→𝐸 of it is degradable,
i.e., if there exists a channel A𝐸→𝐵 , called an anti-degrading channel, such that

A𝐸→𝐵 ◦ N𝑐𝐴→𝐸 = N 𝐴→𝐵 . (4.3.26)

See Figure 4.4 for a schematic depiction of degradable and anti-degradable
channels. A degradable channel is one whose complement can be simulated (via $\mathcal{D}$)
using the output of $\mathcal{N}$. This means that Bob can simulate Eve’s received signal. On
the other hand, an anti-degradable channel is such that $\mathcal{N}$ can be simulated (via $\mathcal{A}$)
using the output of $\mathcal{N}^c$, which means that Eve can simulate Bob’s received signal.

[Figure 4.4: two diagrams in which the input $A$ is sent to $B$ via $\mathcal{N}$ and to $E$ via $\mathcal{N}^c$. (a) $\mathcal{N}$ degradable: $\mathcal{D}$ maps $B$ to $E$. (b) $\mathcal{N}$ anti-degradable: $\mathcal{A}$ maps $E$ to $B$.]

Figure 4.4: Degradable and anti-degradable channels. In (a), the channel $\mathcal{N}$
is degradable, meaning that there exists a channel $\mathcal{D}$ that Bob can apply to the
output he receives via $\mathcal{N}$ that can be used to simulate what Eve receives via $\mathcal{N}^c$.
In (b) on the other hand, $\mathcal{N}$ is anti-degradable since Eve can simulate, using the
channel $\mathcal{A}$, what Bob receives.
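
A concrete instance of degradability can be checked numerically. The sketch below is an illustration and not taken from this section: it assumes the standard qubit amplitude damping channel with damping parameter $\gamma \le 1/2$, computes Eve's output from the complementary-channel formula (4.3.23), and uses as a candidate degrading channel another amplitude damping channel with parameter $(1-2\gamma)/(1-\gamma)$. All function names are hypothetical.

```python
import numpy as np

def ad_kraus(gamma):
    """Kraus operators of the qubit amplitude damping channel (assumed example)."""
    K0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - gamma)]])
    K1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    return [K0, K1]

def apply(kraus, X):
    """Apply the map X -> sum_i K_i X K_i^dag."""
    return sum(K @ X @ K.conj().T for K in kraus)

def complementary(kraus, X):
    """N^c(X) = sum_{i,i'} Tr[K_i X K_{i'}^dag] |i-1><i'-1|, as in (4.3.23)."""
    r = len(kraus)
    return np.array([[np.trace(kraus[i] @ X @ kraus[k].conj().T)
                      for k in range(r)] for i in range(r)])

gamma = 0.3                                   # illustrative value in [0, 1/2]
gamma_deg = (1 - 2 * gamma) / (1 - gamma)     # damping of the candidate degrading channel

rho = np.array([[0.6, 0.2 - 0.1j],
                [0.2 + 0.1j, 0.4]])           # illustrative input state

bob = apply(ad_kraus(gamma), rho)                 # what Bob receives via N
eve = complementary(ad_kraus(gamma), rho)         # what Eve receives via N^c
eve_simulated = apply(ad_kraus(gamma_deg), bob)   # D applied to Bob's output
```

For $\gamma > 1/2$ the parameter `gamma_deg` becomes negative, so this particular candidate is no longer a valid channel; the sketch makes no claim about that regime.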

Exercise 4.13
Prove that a quantum channel $\mathcal{N}_{A\to B}$ is anti-degradable if and only if its Choi state
$\Phi^{\mathcal{N}}_{AB}$ is two-extendible, meaning that there exists a state $\sigma_{ABB'}$, with $d_{B'} = d_B$,
such that $\operatorname{Tr}_{B'}[\sigma_{ABB'}] = \operatorname{Tr}_B[\sigma_{ABB'}] = \Phi^{\mathcal{N}}_{AB}$. (Hint: Use Proposition 4.4.)

4.3.3 Unitary Extensions of Quantum Channels from Isometric Extensions

We can always extend every isometric extension $V_{A\to BE}$ of a channel $\mathcal{N}_{A\to B}$ to a
unitary $U_{AE'\to BE''}$ in a way similar to what we used in (2.2.167) in the proof of the
operator Jensen inequality (Theorem 2.16). Let the unitary $U_{AE'\to BE''}$ be defined
as the following block matrix:
\begin{equation}
U = \begin{pmatrix}
V & \mathbb{1} - VV^\dagger & 0_{d_B d_E \times d'} \\
0_{d_A \times d_A} & V^\dagger & 0_{d_A \times d'} \\
0_{d' \times d_A} & 0_{d' \times d_B d_E} & \mathbb{1}_{d'}
\end{pmatrix}, \tag{4.3.27}
\end{equation}
where we set $d' := (d_B - 1) d_A$. Without the various dimensions indicated, $U$ is
more simply expressed as
\begin{equation}
U = \begin{pmatrix}
V & \mathbb{1} - VV^\dagger & 0 \\
0 & V^\dagger & 0 \\
0 & 0 & \mathbb{1}
\end{pmatrix}. \tag{4.3.28}
\end{equation}
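Note that $V$ is a $d_B d_E \times d_A$ matrix and $VV^\dagger$ is a $d_B d_E \times d_B d_E$ matrix. Furthermore,
let us suppose that $d_E = d_A d_B$. Note that, by the Stinespring theorem, it is always
possible to pick this as the dimension of $\mathcal{H}_E$ since $0 < \operatorname{rank}(\Gamma^{\mathcal{N}}_{AB}) \le d_A d_B$. Then
\begin{equation}
d_B d_E + d_A + d' = d_A d_B (d_B + 1), \tag{4.3.29}
\end{equation}
and we conclude that $U$ is a $(d_A d_B (d_B + 1)) \times (d_A d_B (d_B + 1))$ matrix. It is also
indeed a unitary because
\begin{align}
U U^\dagger &= \begin{pmatrix}
V & \mathbb{1} - VV^\dagger & 0 \\
0 & V^\dagger & 0 \\
0 & 0 & \mathbb{1}
\end{pmatrix}
\begin{pmatrix}
V^\dagger & 0 & 0 \\
\mathbb{1} - VV^\dagger & V & 0 \\
0 & 0 & \mathbb{1}
\end{pmatrix} \tag{4.3.30}\\
&= \begin{pmatrix}
VV^\dagger + (\mathbb{1} - VV^\dagger)(\mathbb{1} - VV^\dagger) & (\mathbb{1} - VV^\dagger) V & 0 \\
V^\dagger (\mathbb{1} - VV^\dagger) & V^\dagger V & 0 \\
0 & 0 & \mathbb{1}
\end{pmatrix} \tag{4.3.31}\\
&= \begin{pmatrix}
\mathbb{1} & 0 & 0 \\
0 & \mathbb{1} & 0 \\
0 & 0 & \mathbb{1}
\end{pmatrix}. \tag{4.3.32}
\end{align}
Similarly, it can be shown that $U^\dagger U = \mathbb{1}$. By defining the system $E'$ with dimension
$d_{E'} = d_B (d_B + 1)$, we can think of $U$ as acting on the input tensor-product space
$\mathcal{H}_{E'} \otimes \mathcal{H}_A$. Then, we can embed the state $\rho_A$ into this larger space as
\begin{equation}
|0\rangle\langle 0|_{E'} \otimes \rho_A = \begin{pmatrix}
\rho_A & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{pmatrix}, \tag{4.3.33}
\end{equation}
so that
\begin{equation}
U \begin{pmatrix}
\rho_A & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{pmatrix} U^\dagger = \begin{pmatrix}
V \rho_A V^\dagger & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{pmatrix}. \tag{4.3.34}
\end{equation}
By defining the system $E''$ with dimension $d_{E''} = d_A (d_B + 1)$, we can think of the
output space of $U$ as the tensor-product space $\mathcal{H}_B \otimes \mathcal{H}_{E''}$, so that
\begin{equation}
\mathcal{N}(\rho_A) = \operatorname{Tr}_E[V \rho_A V^\dagger] = \operatorname{Tr}_{E''}[U (|0\rangle\langle 0|_{E'} \otimes \rho_A) U^\dagger]. \tag{4.3.35}
\end{equation}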
The above construction is not necessarily an efficient construction (using as
few extra degrees of freedom as possible), but it illustrates the principle that every
quantum channel can be thought of as arising from
1. adjoining an environment state |0⟩⟨0| 𝐸 ′ to the input system 𝐴,
2. performing a unitary interaction 𝑈, and then
3. tracing over an output environment system 𝐸 ′′,
thus assigning a strong physical meaning to the notion of isometric extension of a
quantum channel; see Figure 4.3.
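
The block-matrix construction can be verified numerically. The sketch below is a simplified illustration under assumptions: it uses a random isometry of hypothetical size $6 \times 2$ and keeps only the first two block rows and columns of (4.3.28), omitting the identity padding block of size $d'$, which does not affect unitarity.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sizes: n plays the role of d_A, m the role of d_B * d_E.
n, m = 2, 6
G = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))
V, _ = np.linalg.qr(G)                          # random isometry, V^dag V = 1_n

# Projection onto the orthogonal complement of the range of V.
P = np.eye(m) - V @ V.conj().T

# First two block rows and columns of (4.3.28); the padding block is omitted.
U = np.block([[V, P],
              [np.zeros((n, n)), V.conj().T]])

# Embed rho in the top-left corner, mirroring (4.3.33).
rho = np.array([[0.6, 0.2 - 0.1j],
                [0.2 + 0.1j, 0.4]])
rho_big = np.zeros((m + n, m + n), dtype=complex)
rho_big[:n, :n] = rho

out = U @ rho_big @ U.conj().T                  # mirrors (4.3.34)
top_left = out[:m, :m]                          # should equal V rho V^dag
```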
Another construction of a unitary operator that is less explicit but more
efficient is as follows. Let $V_{A\to BE}$ be the isometric extension from (4.3.9), and
let $r = \operatorname{rank}(\Gamma^{\mathcal{N}}_{AB})$. Since $r \le d_A d_B$, without loss of generality, we can take the
output environment system $E$ to have dimension $d_E = d_A d_B$ (with the convention
that $K_j := 0$ for $j > r$). Let $\{|k\rangle_A \otimes |\ell\rangle_{E'}\}_{k,\ell}$
be an orthonormal basis for the input system $A$ and an input environment system
$E'$, with $d_{E'} = d_B^2$, where $0 \le k \le d_A - 1$ and $0 \le \ell \le d_{E'} - 1$. Then define the
orthonormal vectors $|\phi_{k,0}\rangle_{BE}$ for $0 \le k \le d_A - 1$ as follows:
\begin{align}
|\phi_{k,0}\rangle_{BE} &:= \left( \sum_{j=1}^{d_E} K_j \otimes |j-1\rangle_E \langle 0|_{E'} \right) |k\rangle_A \otimes |0\rangle_{E'} \tag{4.3.36}\\
&= V_{A\to BE} |k\rangle_A. \tag{4.3.37}
\end{align}
The fact that these vectors form an orthonormal set is a consequence of the facts
that $V^\dagger V = \mathbb{1}_A$ and $\{|k\rangle_A\}_{k=0}^{d_A-1}$ is an orthonormal set. We then define the action of
the unitary $U_{AE'\to BE}$ on these vectors $|k\rangle_A \otimes |0\rangle_{E'}$ as follows:
\begin{equation}
U_{AE'\to BE}\, |k\rangle_A \otimes |0\rangle_{E'} = |\phi_{k,0}\rangle_{BE}, \tag{4.3.38}
\end{equation}
for $0 \le k \le d_A - 1$. By construction, we have that
\begin{equation}
\sum_{k=0}^{d_A-1} |\phi_{k,0}\rangle\langle\phi_{k,0}|_{BE} = V V^\dagger. \tag{4.3.39}
\end{equation}

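Now let a spectral decomposition of the projection $\mathbb{1}_{BE} - V V^\dagger$ of dimension
$d_B d_E - d_A = (d_B^2 - 1) d_A$ be given by
\begin{equation}
\mathbb{1}_{BE} - V V^\dagger = \sum_{k=0}^{d_A-1} \sum_{\ell=1}^{d_B^2-1} |\phi_{k,\ell}\rangle\langle\phi_{k,\ell}|_{BE}. \tag{4.3.40}
\end{equation}
We can thus complete the action of the unitary $U_{AE'\to BE}$ on the remaining vectors
as follows:
\begin{equation}
U_{AE'\to BE}\, |k\rangle_A \otimes |\ell\rangle_{E'} = |\phi_{k,\ell}\rangle_{BE}, \tag{4.3.41}
\end{equation}
for $0 \le k \le d_A - 1$ and $1 \le \ell \le d_B^2 - 1$. Thus, the full unitary $U_{AE'\to BE}$ is specified
as
\begin{equation}
U_{AE'\to BE} := \sum_{k=0}^{d_A-1} \sum_{\ell=0}^{d_B^2-1} |\phi_{k,\ell}\rangle_{BE} \langle k|_A \otimes \langle\ell|_{E'}. \tag{4.3.42}
\end{equation}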
By construction, the following identity holds for every input state 𝜌 𝐴 :
𝑈 𝐴𝐸 ′ →𝐵𝐸 (𝜌 𝐴 ⊗ |0⟩⟨0| 𝐸 ′ ) (𝑈 𝐴𝐸 ′ →𝐵𝐸 ) † = 𝑉 𝜌 𝐴𝑉 † , (4.3.43)
so that we can realize the isometric channel 𝜌 𝐴 ↦→ 𝑉 𝜌 𝐴𝑉 † by tensoring in an
environment state |0⟩⟨0| 𝐸 ′ and applying the unitary 𝑈 𝐴𝐸 ′ →𝐵𝐸 . We then realize
the original channel N 𝐴→𝐵 by applying a final partial trace over the environment
system 𝐸:
N 𝐴→𝐵 (𝜌 𝐴 ) = Tr𝐸 [𝑈 𝐴𝐸 ′ →𝐵𝐸 (𝜌 𝐴 ⊗ |0⟩⟨0| 𝐸 ′ ) (𝑈 𝐴𝐸 ′ →𝐵𝐸 ) † ]. (4.3.44)
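
The second construction amounts to completing the columns of $V$ to an orthonormal basis. One way to sketch this numerically is with a complete QR decomposition, whose trailing columns span the orthogonal complement of the range of $V$; this is an illustrative choice, not the spectral decomposition used in the text. The Kraus operators below are a hypothetical dephasing-type example, zero-padded so that $d_E = d_A d_B$, and the code orders the input as $E' \otimes A$ so that $|0\rangle_{E'} \otimes |k\rangle_A$ corresponds to column $k$.

```python
import numpy as np

dA, dB, dE = 2, 2, 4                  # d_E = d_A * d_B
dEp = dB * dE // dA                   # d_{E'} = d_B^2

# Isometry V_{A->BE} as in (4.3.9), from hypothetical Kraus operators
# (Kraus operators with index beyond r = 2 are implicitly zero).
p = 0.25
K = [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * np.diag([1.0, -1.0])]
V = sum(np.kron(Kj, np.eye(dE)[:, [j]]) for j, Kj in enumerate(K))

# Complete QR: columns dA, dA+1, ... of Q are orthonormal and orthogonal to range(V).
Q, _ = np.linalg.qr(V, mode="complete")
U = np.hstack([V, Q[:, dA:]])         # first dA columns are exactly V

def trace_out_E(op, dB, dE):
    """Partial trace over E for an operator on B (x) E."""
    return np.trace(op.reshape(dB, dE, dB, dE), axis1=1, axis2=3)

rho = np.array([[0.6, 0.2 - 0.1j],
                [0.2 + 0.1j, 0.4]])
e00 = np.zeros((dEp, dEp))
e00[0, 0] = 1.0

out = U @ np.kron(e00, rho) @ U.conj().T        # mirrors (4.3.43)
channel_out = trace_out_E(out, dB, dE)          # mirrors (4.3.44)
kraus_out = sum(Kj @ rho @ Kj.conj().T for Kj in K)
```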

4.4 General Types of Channels

4.4.1 Preparation, Appending, and Replacement Channels

The preparation of a quantum system in a given (fixed) state, as well as taking the
tensor product of a state with a given (fixed) state, can both be viewed as quantum
channels.

Definition 4.7 Preparation and Appending Channels

For a quantum system 𝐴 and a state 𝜌 𝐴 , the preparation channel P 𝜌 𝐴 is defined
for 𝛼 ∈ C as
P 𝜌 𝐴 (𝛼) B 𝛼𝜌 𝐴 . (4.4.1)
When acting in parallel with the identity channel on a linear operator 𝑌𝐵 of
another quantum system 𝐵, the preparation channel P 𝜌 𝐴 is called the appending
channel P 𝜌 𝐴 and is defined as
P 𝜌 𝐴 (𝑌𝐵 ) ≡ (P 𝜌 𝐴 ⊗ id𝐵 )(𝑌𝐵 ) = 𝜌 𝐴 ⊗ 𝑌𝐵 . (4.4.2)

In other words, the appending channel P 𝜌 𝐴 ⊗ id𝐵 takes the tensor product of its
argument with 𝜌 𝐴 .

One way to determine whether a map $\mathcal{N}$ on quantum states is completely
positive is to find a Kraus representation for it, i.e., a set $\{K_i\}_i$ of operators such
that $\mathcal{N}(X) = \sum_i K_i X K_i^\dagger$ for every operator $X$, for then by Theorem 4.3 we get that
$\mathcal{N}$ is completely positive. If in addition the Kraus operators satisfy $\sum_i K_i^\dagger K_i = \mathbb{1}$,
then $\mathcal{N}$ is also trace preserving and therefore a quantum channel.
Supposing that $\rho_A$ has a spectral decomposition of the form
\begin{equation}
\rho_A = \sum_k \lambda_k |\phi_k\rangle\langle\phi_k|_A, \tag{4.4.3}
\end{equation}
one set of Kraus operators for the preparation channel $\mathcal{P}_{\rho_A}$ is $\{\sqrt{\lambda_k}\, |\phi_k\rangle_A\}_k$. A set
of Kraus operators for the appending channel $\mathcal{P}_{\rho_A} \otimes \mathrm{id}_B$ is then $\{\sqrt{\lambda_k}\, |\phi_k\rangle \otimes \mathbb{1}_B\}_k$.

Exercise 4.14
Determine the Choi representation and a Stinespring representation of the
preparation channel P 𝜌 𝐴 corresponding to the quantum state 𝜌 𝐴 .

Definition 4.8 Replacement Channel

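For a state $\sigma_B$, the replacement channel $\mathcal{R}^{\sigma_B}_{A\to B}$ is defined as the channel that
traces out its input and replaces it with the state $\sigma_B$; i.e.,
\begin{equation}
\mathcal{R}^{\sigma_B}_{A\to B}(X_A) = \operatorname{Tr}[X_A]\, \sigma_B \tag{4.4.4}
\end{equation}
for every linear operator $X_A$. When acting on one share of a bipartite state $\rho_{RA}$,
the replacement channel $\mathcal{R}^{\sigma_B}_{A\to B}$ has the following action:
\begin{equation}
\mathcal{R}^{\sigma_B}_{A\to B}(\rho_{RA}) = \operatorname{Tr}_A[\rho_{RA}] \otimes \sigma_B = \rho_R \otimes \sigma_B. \tag{4.4.5}
\end{equation}

Observe that we can write the replacement channel $\mathcal{R}^{\sigma_B}_{A\to B}$ as the composition
of the partial trace over $A$ followed by the preparation/appending channel $\mathcal{P}_{\sigma_B}$:
\begin{equation}
\mathcal{R}^{\sigma_B}_{A\to B} = \mathcal{P}_{\sigma_B} \circ \operatorname{Tr}_A. \tag{4.4.6}
\end{equation}
We often omit the superscript on $\mathcal{R}^{\sigma_B}_{A\to B}$ when it is clear from the context that the
replacement state is $\sigma_B$.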

Exercise 4.15
Given a quantum state $\sigma_B$, determine the Choi representation, as well as Kraus
and Stinespring representations, of the replacement channel $\mathcal{R}^{\sigma_B}_{A\to B}$.

4.4.2 Trace and Partial-Trace Channels

Recall the trace and partial trace of a linear operator from Definition 3.2. As a
map acting on linear operators, one can ask whether the partial trace is a channel.
The answer, perhaps not surprisingly, is “yes.” In fact, observe that the definition
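in (3.2.17) of the partial trace $\operatorname{Tr}_B$ over $B$ is already in Kraus form, with Kraus
operators $K_j = \mathbb{1}_A \otimes \langle j|_B$. This means that $\operatorname{Tr}_B$ is completely positive. It is also
trace preserving because
\begin{equation}
\sum_{j=0}^{d_B-1} K_j^\dagger K_j = \sum_{j=0}^{d_B-1} (\mathbb{1}_A \otimes |j\rangle_B)(\mathbb{1}_A \otimes \langle j|_B) = \mathbb{1}_A \otimes \sum_{j=0}^{d_B-1} |j\rangle\langle j|_B = \mathbb{1}_{AB}, \tag{4.4.7}
\end{equation}
where we used the fact that $\sum_{j=0}^{d_B-1} |j\rangle\langle j|_B = \mathbb{1}_B$.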

Exercise 4.16
1. Determine the Choi representation, as well as a Stinespring representation,
of the partial trace channel Tr 𝐵 .
2. Prove that the adjoint of the partial trace channel Tr 𝐵 is

Tr†𝐵 (𝑋 𝐴 ) = 𝑋 𝐴 ⊗ 1𝐵 . (4.4.8)

Unlike the trace and partial trace, the transpose and partial transpose are trace-
preserving maps but not completely positive. Indeed, for the latter, recall from
(3.2.83) that $\mathrm{T}_B(\Phi_{AB}) = \frac{1}{d} F_{AB}$, so that its Choi representation is $\Gamma^{\mathrm{T}_B}_{AB} = F_{AB}$, which
we know has negative eigenvalues, as shown in (3.2.127). So by Theorem 4.3, the
transpose map $\mathrm{T}_B$ is not completely positive.
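
This failure of complete positivity is simple to reproduce numerically. The sketch below (illustrative, with the dimension $d = 3$ chosen arbitrarily, and assuming that $F_{AB}$ denotes the swap operator $F|a\rangle|b\rangle = |b\rangle|a\rangle$) builds the Choi representation of the transpose map, compares it with the swap operator, and inspects its spectrum.

```python
import numpy as np

d = 3  # illustrative dimension

# Choi representation of the transpose map: sum_{i,j} |i><j| (x) T(|i><j|).
choi_T = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        E = np.zeros((d, d))
        E[i, j] = 1.0
        choi_T += np.kron(E, E.T)     # T(|i><j|) = |j><i|

# Swap operator F|a>|b> = |b>|a> (assumed to be the operator F_AB of the text).
F = np.zeros((d * d, d * d))
for a in range(d):
    for b in range(d):
        F[b * d + a, a * d + b] = 1.0

# The swap operator has eigenvalues +1 and -1, so the Choi representation
# is not positive semi-definite.
eigs = np.linalg.eigvalsh(choi_T)
```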

4.4.3 Isometric and Unitary Channels

Two more simple examples of quantum channels are isometric and unitary channels.
An isometric channel conjugates the channel input by an isometry, and a unitary
channel conjugates the channel input by a unitary. Specifically, the isometric
channel V corresponding to an isometry 𝑉 is

V(𝑋) B 𝑉 𝑋𝑉 † . (4.4.9)

Similarly, the unitary channel U corresponding to a unitary 𝑈 is

U(𝑋) B 𝑈 𝑋𝑈 † . (4.4.10)

Since every unitary is also an isometry, it follows that every unitary channel is an
isometric channel. Isometric channels are completely positive because they can
be described using only one Kraus operator, the isometry 𝑉. In fact, a quantum
channel is isometric if and only if it has a single Kraus operator.
Observe that by the unitarity of 𝑈, the map

U† (𝑌 ) B 𝑈 †𝑌𝑈, (4.4.11)

i.e., the conjugation by 𝑈 † , is also a channel. In particular,

U† ◦ U = U ◦ U† = id, (4.4.12)

so that U† is the inverse channel of U.

On the other hand, for an isometry 𝑉, conjugation by 𝑉 † is not necessarily
a channel: although the map V† (𝑌 ) B 𝑉 †𝑌𝑉 is completely positive, it is not
necessarily trace preserving because 𝑉𝑉 † ≠ 1 in general. However, by defining the
reversal channel R𝑉 as

R𝑉 (𝑌 ) B V† (𝑌 ) + Tr[( 1 − 𝑉𝑉 † )𝑌 ]𝜎, (4.4.13)

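where $\sigma$ is an arbitrary (but fixed) state, we find that $\mathcal{R}_V$ is trace preserving:
\begin{align}
\operatorname{Tr}[\mathcal{R}_V(Y)] &= \operatorname{Tr}\!\left[ \mathcal{V}^\dagger(Y) + \operatorname{Tr}[(\mathbb{1} - VV^\dagger) Y]\, \sigma \right] \tag{4.4.14}\\
&= \operatorname{Tr}[V^\dagger Y V] + \operatorname{Tr}[Y] - \operatorname{Tr}[VV^\dagger Y] \tag{4.4.15}\\
&= \operatorname{Tr}[Y]. \tag{4.4.16}
\end{align}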

Since it is also completely positive, being the sum of completely positive maps, the
map R𝑉 is indeed a quantum channel. Like U† , the reversal channel R𝑉 reverses
the action of V:

(R𝑉 ◦ V)(𝑋) = V† (V(𝑋)) + Tr[( 1 − 𝑉𝑉 † )V(𝑋)]𝜎 (4.4.17)

= 𝑉 †𝑉 𝑋𝑉 †𝑉 + Tr[( 1 − 𝑉𝑉 † )𝑉 𝑋𝑉 † ]𝜎 (4.4.18)
= 𝑋 + (Tr[𝑉 𝑋𝑉 † ] − Tr[𝑉𝑉 †𝑉 𝑋𝑉 † ])𝜎 (4.4.19)
= 𝑋. (4.4.20)

In the above sense, $\mathcal{R}_V$ is a left inverse of $\mathcal{V}$. Unlike $\mathcal{U}^\dagger$, however, $\mathcal{R}_V$ is not the
right inverse of $\mathcal{V}$ because the equality $(\mathcal{V} \circ \mathcal{R}_V)(Y) = Y$ need not hold.
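
The left-inverse property and the failure of the right-inverse property can both be seen in a small numerical experiment. The following sketch is illustrative: the random isometry, the choice of $\sigma$ as the maximally mixed state, and the function names `iso` and `reversal` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Random isometry V from C^n to C^m (hypothetical sizes).
n, m = 2, 4
G = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))
V, _ = np.linalg.qr(G)
P = np.eye(m) - V @ V.conj().T          # 1 - V V^dag
sigma = np.eye(n) / n                   # arbitrary fixed state (maximally mixed here)

def iso(X):
    """Isometric channel V(X) = V X V^dag, as in (4.4.9)."""
    return V @ X @ V.conj().T

def reversal(Y):
    """Reversal channel R_V(Y), as in (4.4.13)."""
    return V.conj().T @ Y @ V + np.trace(P @ Y) * sigma

X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
Y = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))

left = reversal(iso(X))                 # should reproduce X
right = iso(reversal(Y))                # generally differs from Y
```

The output `right` has rank at most $n$, whereas a generic $Y$ has full rank $m$, which is one way to see why the right-inverse property fails.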

Exercise 4.17
Determine the Choi representation, as well as a Kraus and Stinespring represen-
tation, of the reversal channel R𝑉 corresponding to an isometry 𝑉.

4.4.4 Classical–Quantum and Quantum–Classical Channels

Any classical probability distribution $p : \mathcal{X} \to [0, 1]$ over a finite alphabet $\mathcal{X}$ can
be represented as a quantum state of a $|\mathcal{X}|$-dimensional system $X$ that is diagonal in
an orthonormal basis {|𝑥⟩}𝑥∈X of 𝑋. Specifically, the probability distribution can
be represented as the state
∑︁
𝜌𝑋 = 𝑝(𝑥)|𝑥⟩⟨𝑥| 𝑋 . (4.4.21)
𝑥∈X

States that are diagonal in a preferred basis, as in the equation above, are typically
called classical states. In addition to being in one-to-one correspondence with
classical probability distributions, classical states do not exhibit the quantum
properties of coherence and entanglement.
In Chapter 12, however, we are interested in classical communication over
quantum channels. We are then interested in so-called classical–quantum channels,
which we discuss in this section, that take a classical state as input and output a
quantum state.

Definition 4.9 Classical–Quantum Channel

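A classical–quantum channel is a map from a system $X$ with alphabet $\mathcal{X}$ and
orthonormal basis $\{|x\rangle : x \in \mathcal{X}\}$ to a quantum system $A$ with a specified set
$\{\sigma_A^x : x \in \mathcal{X}\}$ of states such that
\begin{equation}
|x\rangle\langle x'| \mapsto \delta_{x,x'}\, \sigma_A^x \quad \forall\, x, x' \in \mathcal{X}. \tag{4.4.22}
\end{equation}

If $\mathcal{N}^{\mathrm{cq}}$ is a classical–quantum channel, then for every classical state
$\rho_X = \sum_{x \in \mathcal{X}} p(x) |x\rangle\langle x|$, we have that
\begin{equation}
\mathcal{N}^{\mathrm{cq}}(\rho_X) = \sum_{x \in \mathcal{X}} p(x)\, \sigma_A^x. \tag{4.4.23}
\end{equation}
More generally, for every $|\mathcal{X}|$-dimensional system $A$ and every state $\rho_A$ that is
not necessarily classical and is expressed in the computational basis as
$\rho_A = \sum_{x,x' \in \mathcal{X}} \langle x|\rho_A|x'\rangle\, |x\rangle\langle x'|$, we find that
\begin{equation}
\mathcal{N}^{\mathrm{cq}}(\rho_A) = \sum_{x,x' \in \mathcal{X}} \langle x|\rho_A|x'\rangle\, \delta_{x,x'}\, \sigma_A^x = \sum_{x \in \mathcal{X}} \langle x|\rho_A|x\rangle\, \sigma_A^x. \tag{4.4.24}
\end{equation}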

Therefore, for every quantum state, the classical–quantum channel Ncq takes
the input state, measures it in the computational basis {|𝑥⟩}𝑥∈X , and with the
corresponding outcome probability ⟨𝑥|𝜌 𝐴 |𝑥⟩, outputs the state 𝜎𝐴𝑥 .

Exercise 4.18
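Show that the Choi state of a classical–quantum channel $\mathcal{N}^{\mathrm{cq}}$ is
\begin{equation}
\Phi^{\mathcal{N}^{\mathrm{cq}}}_{XA} = \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} |x\rangle\langle x|_X \otimes \sigma_A^x. \tag{4.4.25}
\end{equation}
In other words, the Choi state of a classical–quantum channel is a classical–
quantum state.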

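If the states $\sigma_A^x$ have spectral decomposition
\begin{equation}
\sigma_A^x = \sum_{j=1}^{r_x} \lambda_j^x\, |\varphi_j^x\rangle\langle\varphi_j^x|, \tag{4.4.26}
\end{equation}
where $r_x = \operatorname{rank}(\sigma_A^x)$, then $\{K_j^x : x \in \mathcal{X},\ 1 \le j \le r_x\}$ is a set of Kraus operators
for $\mathcal{N}^{\mathrm{cq}}$, where $K_j^x = \sqrt{\lambda_j^x}\, |\varphi_j^x\rangle_A \langle x|_X$. Indeed, for all $x, x' \in \mathcal{X}$,
\begin{align}
\sum_{x'' \in \mathcal{X}} \sum_{j=1}^{r_{x''}} K_j^{x''}\, |x\rangle\langle x'|\, (K_j^{x''})^\dagger
&= \sum_{x'' \in \mathcal{X}} \sum_{j=1}^{r_{x''}} \sqrt{\lambda_j^{x''}}\, |\varphi_j^{x''}\rangle \langle x''|x\rangle \langle x'|x''\rangle \langle\varphi_j^{x''}|\, \sqrt{\lambda_j^{x''}} \tag{4.4.27}\\
&= \delta_{x,x'} \sum_{j=1}^{r_x} \lambda_j^x\, |\varphi_j^x\rangle\langle\varphi_j^x| \tag{4.4.28}\\
&= \delta_{x,x'}\, \sigma_A^x \tag{4.4.29}\\
&= \mathcal{N}^{\mathrm{cq}}(|x\rangle\langle x'|), \tag{4.4.30}
\end{align}
and
\begin{align}
\sum_{x \in \mathcal{X}} \sum_{j=1}^{r_x} (K_j^x)^\dagger K_j^x &= \sum_{x \in \mathcal{X}} \sum_{j=1}^{r_x} \lambda_j^x\, |x\rangle \underbrace{\langle\varphi_j^x|\varphi_j^x\rangle}_{=1} \langle x| \tag{4.4.31}\\
&= \sum_{x \in \mathcal{X}} \underbrace{\sum_{j=1}^{r_x} \lambda_j^x}_{=1\ \forall x}\, |x\rangle\langle x| \tag{4.4.32}\\
&= \mathbb{1}_X. \tag{4.4.33}
\end{align}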

Also, observe from the construction above that every classical–quantum channel
has a Kraus representation with unit-rank Kraus operators.
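
The Kraus operators $K_j^x = \sqrt{\lambda_j^x}\, |\varphi_j^x\rangle\langle x|$ can be assembled numerically from the spectral decompositions of the output states. The sketch below is illustrative: the two qubit states in `sigmas` and the non-classical input `rho` are hypothetical choices. It checks the completeness relation, the unit rank of each Kraus operator, and the measure-and-prepare action in (4.4.24).

```python
import numpy as np

# Hypothetical output states sigma^x on a qubit, for a two-letter alphabet.
sigmas = [np.array([[1.0, 0.0], [0.0, 0.0]]),
          np.array([[0.5, 0.5], [0.5, 0.5]])]
dX, dA = len(sigmas), 2

# Build K^x_j = sqrt(lambda^x_j) |phi^x_j><x| from each spectral decomposition.
kraus = []
for x, sig in enumerate(sigmas):
    vals, vecs = np.linalg.eigh(sig)
    for lam, phi in zip(vals, vecs.T):      # rows of vecs.T are eigenvectors
        if lam > 1e-12:                     # keep only the support of sigma^x
            bra_x = np.eye(dX)[[x], :]      # <x| as a 1 x dX row vector
            kraus.append(np.sqrt(lam) * phi.reshape(-1, 1) @ bra_x)

# Illustrative input with off-diagonal terms (not a classical state).
rho = np.array([[0.7, 0.2],
                [0.2, 0.3]])

output = sum(K @ rho @ K.conj().T for K in kraus)
expected = sum(rho[x, x] * sigmas[x] for x in range(dX))   # as in (4.4.24)
completeness = sum(K.conj().T @ K for K in kraus)
```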
Having described classical–quantum channels, let us now describe channels for
which the situation is opposite, such that they accept quantum inputs and provide
classical outputs.

Definition 4.10 Quantum–Classical Channel

Given a quantum system $A$ and a measurement on $A$ with corresponding POVM
$\{M_x\}_{x \in \mathcal{X}}$ indexed by elements of a finite alphabet $\mathcal{X}$, a quantum–classical
channel, or measurement channel, is a map $\mathcal{M}_{A\to X}$ from the quantum system $A$
to a classical system $X$ with alphabet $\mathcal{X}$ such that
\begin{equation}
\mathcal{M}_{A\to X}(\rho_A) = \sum_{x \in \mathcal{X}} \operatorname{Tr}[M_x \rho_A]\, |x\rangle\langle x|_X \tag{4.4.34}
\end{equation}

for every state 𝜌 𝐴 on 𝐴, where {|𝑥⟩ : 𝑥 ∈ X} is an orthonormal basis for 𝑋.

A measurement channel thus takes the measurement outcome probabilities
$\operatorname{Tr}[M_x \rho_A]$ and arranges them into a classical state.

Exercise 4.19
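Prove that the Choi state of a measurement channel $\mathcal{M}_{A\to X}$, as defined in
(4.4.34), is
\begin{equation}
\Phi^{\mathcal{M}}_{AX} = \frac{1}{d_A} \sum_{x \in \mathcal{X}} (M_x^{\mathsf{T}})_A \otimes |x\rangle\langle x|_X. \tag{4.4.35}
\end{equation}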

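By writing a spectral decomposition of $M_x$ as
\begin{equation}
M_x = \sum_{j=1}^{r_x} \mu_j^x\, |\phi_j^x\rangle\langle\phi_j^x|, \tag{4.4.36}
\end{equation}
where $r_x = \operatorname{rank}(M_x)$, we can write
\begin{equation}
\operatorname{Tr}[M_x \rho] = \sum_{j=1}^{r_x} \mu_j^x\, \langle\phi_j^x|\rho|\phi_j^x\rangle. \tag{4.4.37}
\end{equation}
Therefore, the action of a quantum–classical channel $\mathcal{N}^{\mathrm{qc}}$ can be written as
\begin{align}
\mathcal{N}^{\mathrm{qc}}(\rho) &= \sum_{x \in \mathcal{X}} \operatorname{Tr}[M_x \rho]\, |x\rangle\langle x| \tag{4.4.38}\\
&= \sum_{x \in \mathcal{X}} \sum_{j=1}^{r_x} \mu_j^x\, \langle\phi_j^x|\rho|\phi_j^x\rangle\, |x\rangle\langle x| \tag{4.4.39}\\
&= \sum_{x \in \mathcal{X}} \sum_{j=1}^{r_x} \sqrt{\mu_j^x}\, |x\rangle\langle\phi_j^x|\, \rho\, |\phi_j^x\rangle\langle x|\, \sqrt{\mu_j^x} \tag{4.4.40}\\
&= \sum_{x \in \mathcal{X}} \sum_{j=1}^{r_x} K_j^x\, \rho\, (K_j^x)^\dagger, \tag{4.4.41}
\end{align}
where
\begin{equation}
K_j^x := \sqrt{\mu_j^x}\, |x\rangle\langle\phi_j^x| \quad \forall\, x \in \mathcal{X},\ 1 \le j \le r_x. \tag{4.4.42}
\end{equation}
Since
\begin{align}
\sum_{x \in \mathcal{X}} \sum_{j=1}^{r_x} (K_j^x)^\dagger K_j^x &= \sum_{x \in \mathcal{X}} \sum_{j=1}^{r_x} \mu_j^x\, |\phi_j^x\rangle\langle x|x\rangle\langle\phi_j^x| \tag{4.4.43}\\
&= \sum_{x \in \mathcal{X}} \underbrace{\sum_{j=1}^{r_x} \mu_j^x\, |\phi_j^x\rangle\langle\phi_j^x|}_{M_x} \tag{4.4.44}\\
&= \sum_{x \in \mathcal{X}} M_x \tag{4.4.45}\\
&= \mathbb{1}_A, \tag{4.4.46}
\end{align}
it holds that $\{K_j^x : x \in \mathcal{X},\ 1 \le j \le r_x\}$ is a set of Kraus operators for $\mathcal{N}^{\mathrm{qc}}$.
This means that all quantum–classical channels have a Kraus representation with
unit-rank Kraus operators.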

4.4.5 Quantum Instruments

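Recall from Section 3.3 that, for an arbitrary POVM $\{M_x\}_{x \in \mathcal{X}}$, a set of post-measurement
states corresponding to an initial state $\rho$ can be given by (3.3.14),
\begin{equation}
\rho^x = \frac{K_x \rho K_x^\dagger}{\operatorname{Tr}[K_x \rho K_x^\dagger]} \quad \forall\, x \in \mathcal{X}, \tag{4.4.47}
\end{equation}
where $\{K_x\}_{x \in \mathcal{X}}$ is a set of operators such that $M_x = K_x^\dagger K_x$ for all $x \in \mathcal{X}$. Also recall
that the expected density operator of the measurement is given by (3.3.15),
\begin{equation}
\sum_{x \in \mathcal{X}} K_x \rho K_x^\dagger. \tag{4.4.48}
\end{equation}
This expected state can be seen as arising from a quantum channel with Kraus
operators $\{K_x\}_{x \in \mathcal{X}}$. Note that this map is indeed a channel since
$\sum_{x \in \mathcal{X}} K_x^\dagger K_x = \sum_{x \in \mathcal{X}} M_x = \mathbb{1}$. We can view the channel as being the sum of the completely positive
and trace-non-increasing maps $\mathcal{M}_x$ defined as
\begin{equation}
\mathcal{M}_x(\rho) = K_x \rho K_x^\dagger. \tag{4.4.49}
\end{equation}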
A quantum instrument generalizes this notion to maps $\mathcal{M}_x$ that are completely
positive and trace non-increasing with an arbitrary number of Kraus operators, not
just one.

Definition 4.11 Quantum Instrument

A quantum instrument is a collection $\{\mathcal{M}_x\}_{x \in \mathcal{X}}$ of completely positive and
trace-non-increasing maps indexed by elements of a finite alphabet $\mathcal{X}$, such that
the sum map $\sum_{x \in \mathcal{X}} \mathcal{M}_x$ is a quantum channel.

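A quantum instrument corresponds to a more general notion of a measurement
in which, as before, the elements of the alphabet $\mathcal{X}$ correspond to the measurement
outcomes. If the initial state of the system is $\rho$, then the corresponding measurement
outcome probabilities $p(x)$ are given by
\begin{equation}
p(x) = \operatorname{Tr}[\mathcal{M}_x(\rho)], \tag{4.4.50}
\end{equation}
and the corresponding post-measurement state is
\begin{equation}
\rho^x = \frac{\mathcal{M}_x(\rho)}{\operatorname{Tr}[\mathcal{M}_x(\rho)]}. \tag{4.4.51}
\end{equation}
The expected state of the ensemble $\{(p(x), \rho^x)\}_{x \in \mathcal{X}}$ is then
\begin{equation}
\sum_{x \in \mathcal{X}} \mathcal{M}_x(\rho). \tag{4.4.52}
\end{equation}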

It is customary to define the output of a quantum instrument $\{\mathcal{M}_x\}_{x \in \mathcal{X}}$ as the
channel $\mathcal{M}$ defined by
∑︁
M(𝜌) = |𝑥⟩⟨𝑥| 𝑋 ⊗ M𝑥 (𝜌). (4.4.53)
𝑥∈X

That is, the output of a quantum instrument is a classical–quantum state such that
the classical register 𝑋 stores the outcome of the measurement. This is unlike
the expected state in (4.4.52), which represents a lack of knowledge of which
measurement outcome occurred.
Note that the channel in (4.4.53) corresponding to a quantum instrument reduces
to the measurement channel defined in (4.4.34) if we consider a measurement with
POVM $\{M_x\}_{x \in \mathcal{X}}$ and we define the maps $\mathcal{M}_x$ as $\mathcal{M}_x(\rho) = \operatorname{Tr}[M_x \rho]$ for all $x \in \mathcal{X}$.
In this case, the channel in (4.4.53) becomes
\begin{equation}
\mathcal{M}(\rho) = \sum_{x \in \mathcal{X}} \operatorname{Tr}[M_x \rho]\, |x\rangle\langle x|_X, \tag{4.4.54}
\end{equation}
which is precisely the measurement channel in (4.4.34).
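
To make the instrument formalism concrete, the following NumPy sketch (illustrative; the POVM element `M0`, the choice $K_x = \sqrt{M_x}$, and the input state are assumptions) builds a two-outcome qubit instrument with one Kraus operator per outcome, forms the classical–quantum output of (4.4.53), and checks that discarding the quantum register leaves the measurement-channel output of (4.4.34).

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

# Hypothetical two-outcome qubit POVM {M_0, M_1 = 1 - M_0}.
M0 = np.array([[0.8, 0.1], [0.1, 0.3]])
M1 = np.eye(2) - M0
povm = [M0, M1]

# One Kraus operator per outcome, K_x = sqrt(M_x), so that M_x(rho) = K_x rho K_x^dag.
K = [psd_sqrt(M) for M in povm]

rho = np.array([[0.5, 0.25],
                [0.25, 0.5]])                    # illustrative input state

branches = [Kx @ rho @ Kx.conj().T for Kx in K]  # unnormalized M_x(rho)
probs = [np.trace(b).real for b in branches]     # p(x), as in (4.4.50)
post_states = [b / q for b, q in zip(branches, probs)]   # as in (4.4.51)

# Classical-quantum output sum_x |x><x|_X (x) M_x(rho), as in (4.4.53).
output = sum(np.kron(np.diag(np.eye(2)[x]), branches[x]) for x in range(2))

# Discarding the quantum register leaves the classical state of outcome probabilities.
classical = np.trace(output.reshape(2, 2, 2, 2), axis1=1, axis2=3)
```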

4.4.6 Entanglement-Breaking Channels

An important class of channels in quantum communication consists of those that,

when acting on one share of a bipartite state, eliminate any entanglement between
the two systems, such that the resulting state is separable (recall Definition 3.5).
Such channels are called entanglement breaking, and we define them as follows.

Definition 4.12 Entanglement-Breaking Channel

A channel N 𝐴→𝐵 is called entanglement breaking if (id 𝑅 ⊗ N 𝐴→𝐵 )(𝜌 𝑅 𝐴 ) is a
separable state for every state 𝜌 𝑅 𝐴 , where 𝑅 is an arbitrary reference system.

Although Definition 4.12 suggests that it is necessary to check an infinite

number of input states to determine whether a channel is entanglement breaking,
the following proposition states that it is only necessary to check the output of the
channel on the maximally entangled state.

Proposition 4.13
A channel $\mathcal{N}$ is entanglement breaking if and only if its Choi state $\Phi^{\mathcal{N}}_{AB}$ is
separable.

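Proof: Observe that if $\mathcal{N}_{A\to B}$ is entanglement breaking, then its Choi state $\Phi^{\mathcal{N}}_{AB}$
is separable, because the Choi state is the output of the channel when the input state in Definition 4.12 is the maximally entangled state. On the other hand, if the Choi state $\Phi^{\mathcal{N}}_{AB}$ of a given channel $\mathcal{N}$ is
separable, then it is of the form
\[ \Phi^{\mathcal{N}}_{AB} = \sum_{x\in\mathcal{X}} p(x)\,\sigma_A^x \otimes \tau_B^x \tag{4.4.55} \]
for some probability distribution $p : \mathcal{X} \to [0, 1]$ on a finite alphabet $\mathcal{X}$ and sets
$\{\sigma_A^x : x \in \mathcal{X}\}$ and $\{\tau_B^x : x \in \mathcal{X}\}$ of states. We note that the property $\operatorname{Tr}_B[\Phi^{\mathcal{N}}_{AB}] = \pi_A$
of the Choi state translates to
\[ \pi_A = \frac{\mathbb{1}_A}{d_A} = \sum_{x\in\mathcal{X}} p(x)\,\sigma_A^x. \tag{4.4.56} \]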

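Then, for every reference system $R$ and state $\xi_{RA}$ acting on $\mathcal{H}_{RA}$, we find, by using
(4.2.5), that
\begin{align}
(\mathrm{id}_R \otimes \mathcal{N})(\xi_{RA}) &= d_A \operatorname{Tr}_A[(\mathrm{T}_A(\xi_{RA}) \otimes \mathbb{1}_B)(\mathbb{1}_R \otimes \Phi^{\mathcal{N}}_{AB})] \tag{4.4.57} \\
&= \sum_{x\in\mathcal{X}} q(x)\,\omega_R^x \otimes \tau_B^x, \tag{4.4.58}
\end{align}
where
\begin{align}
\omega_R^x &:= \frac{\operatorname{Tr}_A[\mathrm{T}_A(\xi_{RA})(\mathbb{1}_R \otimes d_A\, p(x)\,\sigma_A^x)]}{q(x)}, \tag{4.4.59} \\
q(x) &:= p(x)\, d_A \operatorname{Tr}[\xi_A^{\mathrm{T}} \sigma_A^x]. \tag{4.4.60}
\end{align}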

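Now, the map $x \mapsto q(x)$ is a probability distribution on $\mathcal{X}$ since $q(x) \geq 0$ for all
$x \in \mathcal{X}$ and
\begin{align}
\sum_{x\in\mathcal{X}} q(x) &= d_A \operatorname{Tr}\!\left[\xi_A^{\mathrm{T}} \left(\sum_{x\in\mathcal{X}} p(x)\,\sigma_A^x\right)\right] \tag{4.4.61} \\
&= d_A \operatorname{Tr}[\xi_A^{\mathrm{T}} \pi_A] \tag{4.4.62} \\
&= \operatorname{Tr}[\xi_A] \tag{4.4.63} \\
&= 1, \tag{4.4.64}
\end{align}
where we have made use of (4.4.56). So $(\mathrm{id}_R \otimes \mathcal{N})(\xi_{RA})$ is a separable state. ■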
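
The criterion in Proposition 4.13 can be explored numerically for qubit channels. The sketch below (Python with NumPy, our choice) builds the Choi state from Kraus operators, assuming the convention $\Phi^{\mathcal{N}}_{AB} = \frac{1}{d_A}\sum_{i,j}|i\rangle\langle j|_A \otimes \mathcal{N}(|i\rangle\langle j|)$, and then tests positivity of the partial transpose. The use of the positive-partial-transpose test is an assumption brought in from outside this section: it is equivalent to separability only for qubit-qubit and qubit-qutrit systems, so this is an illustration rather than a general separability test. All function names are hypothetical.

```python
import numpy as np

def choi_state(kraus_ops):
    """Choi state (1/d_A) sum_{i,j} |i><j|_A (x) N(|i><j|) of the channel N."""
    d_out, d_in = kraus_ops[0].shape
    choi = np.zeros((d_in * d_out, d_in * d_out), dtype=complex)
    for i in range(d_in):
        for j in range(d_in):
            E = np.zeros((d_in, d_in), dtype=complex)
            E[i, j] = 1.0
            choi += np.kron(E, sum(K @ E @ K.conj().T for K in kraus_ops))
    return choi / d_in

def has_positive_partial_transpose(state, d_a, d_b, tol=1e-10):
    """Check whether the partial transpose on system B is positive semi-definite."""
    reshaped = state.reshape(d_a, d_b, d_a, d_b)
    pt = reshaped.transpose(0, 3, 2, 1).reshape(d_a * d_b, d_a * d_b)
    return np.linalg.eigvalsh(pt).min() >= -tol

Z = np.diag([1.0, -1.0]).astype(complex)
identity_channel = [np.eye(2, dtype=complex)]
completely_dephasing = [np.eye(2) / np.sqrt(2), Z / np.sqrt(2)]

eb_identity = has_positive_partial_transpose(choi_state(identity_channel), 2, 2)
eb_dephasing = has_positive_partial_transpose(choi_state(completely_dephasing), 2, 2)
```

In this example the identity channel fails the test, since its Choi state is maximally entangled, while the completely dephasing channel passes it.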

Another useful characterization of entanglement breaking channels is through

their Kraus representations, as shown in the following proposition:

Proposition 4.14
A channel N is entanglement breaking if and only if there exists a set of Kraus
operators for N, with each Kraus operator having unit rank.

Proof: First suppose that the Kraus operators of $\mathcal{N}$ have unit rank. They are
therefore of the form $|\phi_j\rangle_B\langle\psi_j|_A =: K_j$ for $1 \leq j \leq r$. Without loss of generality,
we can let each vector in the set $\{|\phi_j\rangle\}_{j=1}^r$ be normalized. Then, since $\mathcal{N}$ is trace
preserving, it holds that
\[ \mathbb{1}_A = \sum_{j=1}^r K_j^\dagger K_j = \sum_{j=1}^r |\psi_j\rangle_A \langle\phi_j|\phi_j\rangle \langle\psi_j|_A = \sum_{j=1}^r |\psi_j\rangle\langle\psi_j|_A. \tag{4.4.65} \]

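Now, for every reference system $R$ of arbitrary dimension and every state $\rho_{RA}$,
we find that
\begin{align}
(\mathrm{id}_R \otimes \mathcal{N})(\rho_{RA}) &= \sum_{j=1}^r (\mathbb{1}_R \otimes K_j)\, \rho_{RA}\, (\mathbb{1}_R \otimes K_j^\dagger) \tag{4.4.66} \\
&= \sum_{j=1}^r (\mathbb{1}_R \otimes |\phi_j\rangle_B\langle\psi_j|_A)\, \rho_{RA}\, (\mathbb{1}_R \otimes |\psi_j\rangle_A\langle\phi_j|_B) \tag{4.4.67} \\
&= \sum_{j=1}^r (\mathbb{1}_R \otimes \langle\psi_j|_A)(\rho_{RA})(\mathbb{1}_R \otimes |\psi_j\rangle_A) \otimes |\phi_j\rangle\langle\phi_j|_B \tag{4.4.68} \\
&= \sum_{j=1}^r p(j)\, \sigma_R^j \otimes |\phi_j\rangle\langle\phi_j|_B, \tag{4.4.69}
\end{align}
where
\[ \sigma_R^j := \frac{(\mathbb{1}_R \otimes \langle\psi_j|_A)(\rho_{RA})(\mathbb{1}_R \otimes |\psi_j\rangle_A)}{p(j)}, \qquad p(j) := \langle\psi_j|_A\, \rho_A\, |\psi_j\rangle_A. \tag{4.4.70} \]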
Note that $j \mapsto p(j)$ is a probability distribution since $p(j) \geq 0$ for all $j$, and
\begin{align}
\sum_{j=1}^r p(j) &= \sum_{j=1}^r \langle\psi_j|_A\, \rho_A\, |\psi_j\rangle_A \tag{4.4.71} \\
&= \sum_{j=1}^r \operatorname{Tr}[|\psi_j\rangle\langle\psi_j|\rho_A] \tag{4.4.72} \\
&= \operatorname{Tr}\!\left[\left(\sum_{j=1}^r |\psi_j\rangle\langle\psi_j|_A\right)\rho_A\right] \tag{4.4.73} \\
&= \operatorname{Tr}[\rho_A] \tag{4.4.74} \\
&= 1, \tag{4.4.75}
\end{align}
where we have made use of (4.4.65). Therefore, $(\mathrm{id}_R \otimes \mathcal{N})(\rho_{RA})$ is a separable
state, so that $\mathcal{N}$ is entanglement breaking.
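Now, suppose that $\mathcal{N}$ is entanglement breaking. This means that its Choi state
$\Phi^{\mathcal{N}}_{AB}$ is a separable state, which means that it can be written as
\[ \Phi^{\mathcal{N}}_{AB} = \sum_{x\in\mathcal{X}} p(x)\, |\psi_x\rangle\langle\psi_x|_A \otimes |\phi_x\rangle\langle\phi_x|_B \tag{4.4.76} \]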

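for some probability distribution $p : \mathcal{X} \to [0, 1]$ on a finite alphabet $\mathcal{X}$ and sets of
pure states $\{|\psi_x\rangle_A : x \in \mathcal{X}\}$, $\{|\phi_x\rangle_B : x \in \mathcal{X}\}$. Define the unit-rank operators
\[ K_x := \sqrt{d_A\, p(x)}\, |\phi_x\rangle_B \langle\overline{\psi_x}|_A, \quad x \in \mathcal{X}, \tag{4.4.77} \]
where $|\overline{\psi_x}\rangle$ denotes the complex conjugate of $|\psi_x\rangle$ with respect to the standard basis. Then, for every state $|i\rangle_A$ of the standard basis of $A$, we have
\begin{align}
K_x |i\rangle_A &= \sqrt{d_A\, p(x)}\, |\phi_x\rangle_B \langle\overline{\psi_x}|i\rangle \tag{4.4.78} \\
&= \sqrt{d_A\, p(x)}\, \langle i|\psi_x\rangle\, |\phi_x\rangle_B \tag{4.4.79} \\
&= \sqrt{d_A\, p(x)}\, (\langle i|_A \otimes \mathbb{1}_B)(|\psi_x\rangle_A \otimes |\phi_x\rangle_B). \tag{4.4.80}
\end{align}
Using this, we find that
\begin{align}
\mathcal{N}(|i\rangle\langle i'|_A) &= d_A\, (\langle i|_A \otimes \mathbb{1}_B)\, \Phi^{\mathcal{N}}_{AB}\, (|i'\rangle_A \otimes \mathbb{1}_B) \notag \\
&= \sum_{x\in\mathcal{X}} \sqrt{d_A\, p(x)}\, (\langle i|_A \otimes \mathbb{1}_B)(|\psi_x\rangle_A \otimes |\phi_x\rangle_B) \notag \\
&\qquad \times (\langle\psi_x|_A \otimes \langle\phi_x|_B)(|i'\rangle_A \otimes \mathbb{1}_B)\, \sqrt{d_A\, p(x)} \notag \\
&= \sum_{x\in\mathcal{X}} K_x\, |i\rangle\langle i'|_A\, K_x^\dagger. \tag{4.4.81}
\end{align}
This holds for all $0 \leq i, i' \leq d_A - 1$, which means that $\{K_x\}_{x\in\mathcal{X}}$ is a set of Kraus
operators for $\mathcal{N}$, each of which has unit rank. ■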

From this proposition, we immediately see that both quantum–classical and

classical–quantum channels are entanglement breaking, because each one has a
Kraus representation with unit-rank Kraus operators. This implies that a composition
of a quantum–classical channel followed by a classical–quantum channel is also
entanglement breaking, and every such map can be written as
\[ \rho \mapsto \sum_{x\in\mathcal{X}} \operatorname{Tr}[M_x\rho]\, \sigma^x, \tag{4.4.82} \]
where $\mathcal{X}$ is a finite alphabet, $\{\sigma^x\}_{x\in\mathcal{X}}$ is a set of quantum states, and $\{M_x\}_{x\in\mathcal{X}}$ is a
POVM. Indeed, if each POVM element $M_x$ has a spectral decomposition of the
form
\[ M_x = \sum_{k=1}^{r_x} \lambda_k^x\, |\psi_k^x\rangle\langle\psi_k^x|, \tag{4.4.83} \]
where $r_x = \operatorname{rank}(M_x)$, and each state $\sigma^x$ has a spectral decomposition of the form
\[ \sigma^x = \sum_{\ell=1}^{s_x} \alpha_\ell^x\, |\phi_\ell^x\rangle\langle\phi_\ell^x|, \tag{4.4.84} \]
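where $s_x = \operatorname{rank}(\sigma^x)$, then (4.4.82) can be written as
\begin{align}
\rho &\mapsto \sum_{x\in\mathcal{X}} \sum_{k=1}^{r_x} \sum_{\ell=1}^{s_x} \lambda_k^x \alpha_\ell^x\, |\phi_\ell^x\rangle\langle\phi_\ell^x|\, \langle\psi_k^x|\rho|\psi_k^x\rangle \tag{4.4.85} \\
&= \sum_{x\in\mathcal{X}} \sum_{k=1}^{r_x} \sum_{\ell=1}^{s_x} \sqrt{\lambda_k^x \alpha_\ell^x}\, |\phi_\ell^x\rangle\langle\psi_k^x|\, \rho\, |\psi_k^x\rangle\langle\phi_\ell^x|\, \sqrt{\lambda_k^x \alpha_\ell^x} \tag{4.4.86} \\
&= \sum_{x\in\mathcal{X}} \sum_{k=1}^{r_x} \sum_{\ell=1}^{s_x} K_{k,\ell}^x\, \rho\, (K_{k,\ell}^x)^\dagger, \tag{4.4.87}
\end{align}
where $K_{k,\ell}^x := \sqrt{\lambda_k^x \alpha_\ell^x}\, |\phi_\ell^x\rangle\langle\psi_k^x|$. Since $\{K_{k,\ell}^x : x \in \mathcal{X},\ 1 \leq k \leq r_x,\ 1 \leq \ell \leq s_x\}$ is a
set of Kraus operators for the map, with each Kraus operator having unit rank, it
holds by Proposition 4.14 that the map in (4.4.82) is entanglement breaking.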
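
The construction in (4.4.83)–(4.4.87) can be checked numerically. The following sketch (Python with NumPy, our choice; the function name `unit_rank_kraus` and the example POVM and states are hypothetical) diagonalizes each POVM element and each state, forms the operators $K_{k,\ell}^x$, and compares the resulting Kraus form with the measure-and-prepare form (4.4.82) on one test state.

```python
import numpy as np

def unit_rank_kraus(povm, states, tol=1e-12):
    """Unit-rank Kraus operators sqrt(lam_k alpha_l) |phi_l><psi_k| for each x."""
    kraus_ops = []
    for M, sigma in zip(povm, states):
        lam, psi = np.linalg.eigh(M)        # M = sum_k lam[k] |psi_k><psi_k|
        alpha, phi = np.linalg.eigh(sigma)  # sigma = sum_l alpha[l] |phi_l><phi_l|
        for k in range(len(lam)):
            for l in range(len(alpha)):
                weight = lam[k] * alpha[l]
                if weight > tol:            # skip zero eigenvalues
                    kraus_ops.append(
                        np.sqrt(weight) * np.outer(phi[:, l], psi[:, k].conj()))
    return kraus_ops

# Hypothetical two-outcome POVM and prepared states on a qubit.
M0 = 0.5 * np.array([[1, 0.5], [0.5, 1]], dtype=complex)
M1 = np.eye(2) - M0
sigma0 = 0.5 * np.ones((2, 2), dtype=complex)
sigma1 = np.diag([0.25, 0.75]).astype(complex)
rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])

kraus_ops = unit_rank_kraus([M0, M1], [sigma0, sigma1])
direct = sum(np.trace(M @ rho) * s for M, s in zip([M0, M1], [sigma0, sigma1]))
via_kraus = sum(K @ rho @ K.conj().T for K in kraus_ops)
completeness = sum(K.conj().T @ K for K in kraus_ops)
```

Here `direct` evaluates (4.4.82) and `via_kraus` evaluates (4.4.87); `completeness` should equal the identity because the POVM elements sum to the identity and each state has unit trace.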
A channel of the form (4.4.82) is sometimes called a “measure-and-prepare
channel” or an “intercept-resend channel” since the input to the channel is first
measured then replaced by a new state conditioned on the measurement outcome.
Remarkably, any entanglement breaking channel can be written in the form (4.4.82),
as we now show.

Theorem 4.15
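For every entanglement-breaking channel $\mathcal{N}$, there exists a finite alphabet $\mathcal{X}$, a
set $\{\sigma^x\}_{x\in\mathcal{X}}$ of states, and a POVM $\{M_x\}_{x\in\mathcal{X}}$ such that the action of $\mathcal{N}$ can be
written as
\[ \mathcal{N}(\rho) = \sum_{x\in\mathcal{X}} \operatorname{Tr}[M_x\rho]\, \sigma^x \tag{4.4.88} \]
for every state $\rho$.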

Proof: By Proposition 4.14, the action of $\mathcal{N}$ can be written as
\[ \mathcal{N}(\rho) = \sum_{j=1}^r K_j\, \rho\, K_j^\dagger, \tag{4.4.89} \]

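where $r = \operatorname{rank}(\Gamma^{\mathcal{N}})$ and $\{K_j\}_{j=1}^r$ is a set of Kraus operators for $\mathcal{N}$, with each Kraus
operator having unit rank. Since each Kraus operator has unit rank, it holds that
$K_j = |\phi_j\rangle\langle\psi_j|$ for all $1 \leq j \leq r$, where $\{|\phi_j\rangle\}_{j=1}^r$ and $\{|\psi_j\rangle\}_{j=1}^r$ are sets of vectors
(without loss of generality, we can take $\{|\phi_j\rangle\}_{j=1}^r$ to be a set of pure states). Since
$\mathcal{N}$ is trace preserving, it holds that
\[ \sum_{j=1}^r K_j^\dagger K_j = \sum_{j=1}^r |\psi_j\rangle\langle\phi_j|\phi_j\rangle\langle\psi_j| = \sum_{j=1}^r |\psi_j\rangle\langle\psi_j| = \mathbb{1}. \tag{4.4.90} \]
This implies that $\{|\psi_j\rangle\langle\psi_j|\}_{j=1}^r$ is a POVM. Therefore, defining the alphabet
$\mathcal{X} = \{1, 2, \ldots, r\}$, the POVM elements $M_x := |\psi_x\rangle\langle\psi_x|$ and states $\sigma^x := |\phi_x\rangle\langle\phi_x|$,
we have that
\[ \mathcal{N}(\rho) = \sum_{x\in\mathcal{X}} \operatorname{Tr}[M_x\rho]\, \sigma^x, \tag{4.4.91} \]
as required. ■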

An extreme example of a measure-and-prepare channel, as in (4.4.88), is one for
which the output is a fixed state $\sigma$ for every outcome of the measurement described
by the POVM $\{M_x\}_{x\in\mathcal{X}}$. In this case, the channel in (4.4.88) takes the form
\[ \mathcal{N}(\rho) = \sum_{x\in\mathcal{X}} \operatorname{Tr}[M_x\rho]\, \sigma = \operatorname{Tr}[\rho]\, \sigma = \mathcal{R}_\sigma(\rho), \tag{4.4.92} \]
where the second equality follows because $\sum_{x\in\mathcal{X}} M_x = \mathbb{1}$, and the last equality from
the definition of the replacement channel for $\sigma$ in Definition 4.8. This means that
every replacement channel is a measure-and-prepare channel, and in particular an
entanglement-breaking channel.
The development above Proposition 4.14 tells us that to every entanglement
breaking channel there is associated a separable state, namely, the Choi state. The
converse statement also holds; see Exercise 4.20.

Exercise 4.20
Given a separable state $\rho_{AB} = \sum_{x\in\mathcal{X}} p(x)\, \omega_A^x \otimes \tau_B^x$, show that the channel $\mathcal{N}^{\rho}_{A\to B}$
defined in (4.2.16) has the form
\[ \mathcal{N}^{\rho}_{A\to B}(X_A) = \sum_{x\in\mathcal{X}} \operatorname{Tr}[X_A M_A^x]\, \tau_B^x, \tag{4.4.93} \]
where
\[ M_A^x = p(x) \left(\rho_A^{-\frac{1}{2}}\, \omega_A^x\, \rho_A^{-\frac{1}{2}}\right)^{\mathrm{T}} \tag{4.4.94} \]
for all $x \in \mathcal{X}$. In other words, by the discussion after (4.4.82), every separable
state can be associated with an entanglement-breaking channel.

4.4.7 Hadamard Channels

It turns out that every entanglement-breaking channel can be regarded as the

complement of what is called a Hadamard channel, which we now define.

Definition 4.16 Hadamard Channel

A channel N is called a Hadamard channel or a Schur channel if there exists a
positive semi-definite operator 𝑁 with unit diagonal elements (in the standard
basis) and an isometry 𝑉 such that

N(𝑋) = 𝑁 ∗ 𝑉 𝑋𝑉 † (4.4.95)

for every linear operator 𝑋, where 𝑁 ∗ 𝑉 𝑋𝑉 † is the Hadamard product, also

called the Schur product, which is defined as the element-wise product of
the operators 𝑁 and 𝑉 𝑋𝑉 † when represented as matrices with respect to the
standard basis.

Given linear operators $X$ and $Y$ acting on a $d$-dimensional Hilbert space, with
matrix representations in the standard basis as
\[ X = \sum_{i,j=0}^{d-1} X_{i,j}\, |i\rangle\langle j|, \qquad Y = \sum_{i,j=0}^{d-1} Y_{i,j}\, |i\rangle\langle j|, \tag{4.4.96} \]
the Hadamard product $X * Y$ is the operator with matrix representation in the
standard basis given by the product of the matrix elements of $X$ and $Y$:
\[ X * Y = \sum_{i,j=0}^{d-1} X_{i,j} Y_{i,j}\, |i\rangle\langle j|. \tag{4.4.97} \]

Note that the Hadamard product can be defined as the element-wise product with
respect to an arbitrary orthonormal basis; however, in this book, we only consider
the Hadamard product with respect to the standard basis.
The positive semi-definiteness of 𝑁 in the definition of a Hadamard channel is
necessary and sufficient for the map N defined in (4.4.95) to be completely positive,
while the fact that 𝑁 has unit diagonal elements in the standard basis and that 𝑉 is
an isometry ensures that N is trace preserving.
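
To make Definition 4.16 concrete, the sketch below (Python with NumPy, our choice; the function names and parameter values are hypothetical) implements the map in (4.4.95) with an entrywise product, builds $N$ as a Gram matrix of two normalized real vectors, and compares the result with a qubit dephasing map $\rho \mapsto (1-p)\rho + pZ\rho Z$, which corresponds to the choice $N = \begin{pmatrix} 1 & 1-2p \\ 1-2p & 1 \end{pmatrix}$ and $V = \mathbb{1}$.

```python
import numpy as np

def hadamard_channel(N, V, X):
    """N(X) = N * (V X V^dagger), with * the entrywise (Hadamard) product."""
    return N * (V @ X @ V.conj().T)

def gram_matrix(vectors):
    """Gram matrix with entries <psi_i|psi_j> (np.vdot conjugates its first argument)."""
    return np.array([[np.vdot(a, b) for b in vectors] for a in vectors])

p = 0.2
theta = np.arccos(1 - 2 * p)  # overlap <psi_0|psi_1> = 1 - 2p
vectors = [np.array([1.0, 0.0]), np.array([np.cos(theta), np.sin(theta)])]
N = gram_matrix(vectors)

rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])
Z = np.diag([1.0, -1.0])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # an example unitary for V

out_identity = hadamard_channel(N, np.eye(2), rho)
dephased = (1 - p) * rho + p * Z @ rho @ Z
out_rotated = hadamard_channel(N, H, rho)
```

With $V = \mathbb{1}$ the two outputs `out_identity` and `dephased` should coincide, and the trace of `out_rotated` stays equal to one because $N$ has unit diagonal and $V$ is an isometry.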

Exercise 4.21
1. Show that a dephasing channel, as defined in (4.5.35), is a Hadamard
channel.
2. Let $\{K_j\}_{j=1}^r$ be a set of Kraus operators. Prove that the channel, defined by
the set $\{K_j \otimes |j\rangle\}_{j=1}^r$ of Kraus operators, is a Hadamard channel.

The following fact about complements of Hadamard channels provides an

important characterization of Hadamard channels:

Proposition 4.17
Any complement of a Hadamard channel is entanglement breaking.

Proof: Suppose $\mathcal{N}_{A\to B}$ is a Hadamard channel between systems $A$ and $B$ with
associated positive semi-definite operator $N$ having unit diagonal elements in the
standard basis and with associated isometry $V$. Since $N$ is positive semi-definite
and it has unit diagonal elements, it can be expressed as the Gram matrix of some
set $\{|\psi_i\rangle : 0 \leq i \leq d-1\}$ of normalized vectors, so that
\[ N = \sum_{i,j=0}^{d-1} \langle\psi_i|\psi_j\rangle\, |i\rangle\langle j|. \tag{4.4.98} \]

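Therefore, using (4.4.97) and Definition 4.16, the action of $\mathcal{N}$ can be written as
\[ \mathcal{N}(X) = \sum_{i,j=0}^{d-1} \langle\psi_i|\psi_j\rangle\, \langle i|VXV^\dagger|j\rangle\, |i\rangle\langle j|_B. \tag{4.4.99} \]
Now, set
\[ |\phi_i\rangle := V^\dagger|i\rangle \quad \Rightarrow \quad \langle\phi_i| = \langle i|V, \tag{4.4.100} \]
so that
\[ \mathcal{N}(X) = \sum_{i,j=0}^{d-1} \langle\psi_i|\psi_j\rangle\, \langle\phi_i|X|\phi_j\rangle\, |i\rangle\langle j|_B. \tag{4.4.101} \]
Consider the operator $(U^{\mathcal{N}})_{A\to BE}$ defined as
\[ U^{\mathcal{N}} := \sum_{i=0}^{d-1} |i\rangle_B \langle\phi_i|_A \otimes |\psi_i\rangle_E. \tag{4.4.102} \]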

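Since $V$ is an isometry, and the vectors $\{|\psi_i\rangle : 0 \leq i \leq d-1\}$ are normalized, it follows
that
\[ (U^{\mathcal{N}})^\dagger U^{\mathcal{N}} = \sum_{i=0}^{d-1} |\phi_i\rangle\langle\phi_i| = \sum_{i=0}^{d-1} V^\dagger |i\rangle\langle i| V = V^\dagger V = \mathbb{1}. \tag{4.4.103} \]
This means that $U^{\mathcal{N}}$ is an isometry. Furthermore,
\[ U^{\mathcal{N}} X (U^{\mathcal{N}})^\dagger = \sum_{i,j=0}^{d-1} \langle\phi_i|X|\phi_j\rangle\, |\psi_i\rangle\langle\psi_j|_E \otimes |i\rangle\langle j|_B, \tag{4.4.104} \]
so that $\operatorname{Tr}_E[U^{\mathcal{N}} X (U^{\mathcal{N}})^\dagger] = \mathcal{N}(X)$. The operator $U^{\mathcal{N}}$ is therefore an isometric
extension of $\mathcal{N}$. A complementary channel then results from tracing out $B$ in
(4.4.104), i.e.,
\begin{align}
\mathcal{N}^c(X) &= \operatorname{Tr}_B[U^{\mathcal{N}} X (U^{\mathcal{N}})^\dagger] \tag{4.4.105} \\
&= \sum_{i=0}^{d-1} \langle\phi_i|X|\phi_i\rangle\, |\psi_i\rangle\langle\psi_i|_E \tag{4.4.106} \\
&= \sum_{i=0}^{d-1} |\psi_i\rangle\langle\phi_i|\, X\, |\phi_i\rangle\langle\psi_i| \tag{4.4.107} \\
&= \sum_{i=0}^{d-1} K_i X K_i^\dagger, \tag{4.4.108}
\end{align}
where $K_i := |\psi_i\rangle\langle\phi_i|$. So $\mathcal{N}^c$ has a Kraus representation with unit-rank Kraus
operators, which means, by Proposition 4.14, that $\mathcal{N}^c$ is entanglement breaking.
Every complement of a channel is related to another complement by an isometric
channel acting on the output of the complement, and this does not change the
entanglement-breaking property. ■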

By following the proof above backwards, we find that every entanglement-

breaking channel is the complement of some Hadamard channel.

4.4.8 Covariant Channels

In Section 3.2.7, we defined states that are invariant under the action of a unitary
representation of a group. We now define an analogous notion of invariance for
quantum channels.

Definition 4.18 Group-Covariant Channel

Let $G$ be a group. A channel $\mathcal{N}_{A\to B}$ is called covariant with respect to
$G$, group-covariant, $G$-covariant, or simply covariant, if there exist unitary
representations $\{U_A^g\}_{g\in G}$ and $\{V_B^g\}_{g\in G}$ of $G$ such that for every state $\rho_A$,
\[ \mathcal{N}_{A\to B}(U_A^g\, \rho_A\, U_A^{g\dagger}) = V_B^g\, \mathcal{N}_{A\to B}(\rho_A)\, V_B^{g\dagger} \tag{4.4.109} \]
for all $g \in G$.

Exercise 4.22
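Let $\mathcal{N}_{A\to B}$ be a group-covariant channel, as per Definition 4.18.
1. Show that the condition in (4.4.109) can be written more compactly as
follows:
\[ \mathcal{N}_{A\to B} = \mathcal{V}_B^{g\dagger} \circ \mathcal{N}_{A\to B} \circ \mathcal{U}_A^g \tag{4.4.110} \]
for all $g \in G$, where $\mathcal{V}_B^{g\dagger}(\cdot) := V_B^{g\dagger}(\cdot)V_B^g$ and $\mathcal{U}_A^g(\cdot) := U_A^g(\cdot)U_A^{g\dagger}$.
2. Show that the Choi representation of $\mathcal{N}_{A\to B}$ is invariant under the action of
$U_A^{g\mathrm{T}} \otimes V_B^{g\dagger}$ for all $g \in G$; i.e., show that
\[ \Gamma^{\mathcal{N}}_{AB} = (U_A^{g\mathrm{T}} \otimes V_B^{g\dagger})\, \Gamma^{\mathcal{N}}_{AB}\, (U_A^{g\mathrm{T}} \otimes V_B^{g\dagger})^\dagger \tag{4.4.111} \]
for all $g \in G$.
3. For every set $\{K_i\}_{i=1}^r$ of Kraus operators for $\mathcal{N}$, with $r \geq \operatorname{rank}(\Gamma^{\mathcal{N}}_{AB})$, show
that $\{K_i^g\}_{i=1}^r$, with $K_i^g := V_B^{g\dagger} K_i U_A^g$, is another set of Kraus operators for $\mathcal{N}$
for all $g \in G$.
4. For every isometric extension $W_{A\to BE}$ of $\mathcal{N}$, with $d_E \geq \operatorname{rank}(\Gamma^{\mathcal{N}}_{AB})$, show
that $W^g_{A\to BE} := V_B^{g\dagger}\, W\, U_A^g$ is another isometric extension of $\mathcal{N}$ for all $g \in G$.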

Group covariant channels have group covariant isometric extensions, as the

following lemma states.

Lemma 4.19 Isometric Extensions of Group Covariant Channels

Suppose that a channel $\mathcal{N}_{A\to B}$ is covariant with respect to a group $G$. For an
isometric extension $U^{\mathcal{N}}_{A\to BE}$ of $\mathcal{N}_{A\to B}$, there is a set of unitaries $\{W_E^g\}_{g\in G}$ such
that the following covariance holds for all $g \in G$:
\[ U^{\mathcal{N}}_{A\to BE}\, U_A^g = (V_B^g \otimes W_E^g)\, U^{\mathcal{N}}_{A\to BE}. \tag{4.4.112} \]

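Proof: Given is a group $G$ and a quantum channel $\mathcal{N}_{A\to B}$ that is covariant in the
following sense:
\[ \mathcal{N}_{A\to B}(U_A^g\, \rho_A\, U_A^{g\dagger}) = V_B^g\, \mathcal{N}_{A\to B}(\rho_A)\, V_B^{g\dagger}, \tag{4.4.113} \]
for sets of unitaries $\{U_A^g\}_{g\in G}$ and $\{V_B^g\}_{g\in G}$.
Let a Kraus representation of $\mathcal{N}_{A\to B}$ be given as
\[ \mathcal{N}_{A\to B}(\rho_A) = \sum_j L^j \rho_A L^{j\dagger}. \tag{4.4.114} \]
We can rewrite (4.4.113) as
\[ V_B^{g\dagger}\, \mathcal{N}_{A\to B}(U_A^g\, \rho_A\, U_A^{g\dagger})\, V_B^g = \mathcal{N}_{A\to B}(\rho_A), \tag{4.4.115} \]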
which means that for all $g$, the following equality holds:
\[ \sum_j L^j \rho_A L^{j\dagger} = \sum_j V_B^{g\dagger} L^j U_A^g\, \rho_A \left(V_B^{g\dagger} L^j U_A^g\right)^\dagger. \tag{4.4.116} \]
Thus, the channel has two different Kraus representations $\{L^j\}_j$ and $\{V_B^{g\dagger} L^j U_A^g\}_j$,
and these are necessarily related by a unitary with matrix elements $w^g_{jk}$ (see the
discussion after (4.3.7)):
\[ V_B^{g\dagger} L^j U_A^g = \sum_k w^g_{jk} L^k. \tag{4.4.117} \]

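A canonical isometric extension $U^{\mathcal{N}}_{A\to BE}$ of $\mathcal{N}_{A\to B}$ is given as
\[ U^{\mathcal{N}}_{A\to BE} = \sum_j L^j \otimes |j\rangle_E, \tag{4.4.118} \]
where $\{|j\rangle_E\}_j$ is an orthonormal basis. Defining $W_E^g$ as the following unitary
\[ W_E^g |k\rangle_E = \sum_j w^g_{jk} |j\rangle_E, \tag{4.4.119} \]
where the states $|k\rangle_E$ are chosen from $\{|j\rangle_E\}_j$, consider that
\begin{align}
U^{\mathcal{N}}_{A\to BE}\, U_A^g &= \sum_j L^j U_A^g \otimes |j\rangle_E \tag{4.4.120} \\
&= \sum_j V_B^g V_B^{g\dagger} L^j U_A^g \otimes |j\rangle_E \tag{4.4.121} \\
&= \sum_j V_B^g \left[\sum_k w^g_{jk} L^k\right] \otimes |j\rangle_E \tag{4.4.122} \\
&= \sum_k V_B^g L^k \otimes \sum_j w^g_{jk} |j\rangle_E \tag{4.4.123} \\
&= \sum_k V_B^g L^k \otimes W_E^g |k\rangle_E \tag{4.4.124} \\
&= (V_B^g \otimes W_E^g)\, U^{\mathcal{N}}_{A\to BE}. \tag{4.4.125}
\end{align}
This concludes the proof. ■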

Recall the definition of the twirling map in Exercise 3.17:
\[ \mathcal{T}^G(\rho) := \frac{1}{|G|} \sum_{g\in G} U^g \rho\, U^{g\dagger}. \tag{4.4.126} \]

This map, which is evidently a quantum channel, takes an arbitrary state $\rho$
and makes it invariant under the action of the group $G$ given by the unitary
representation $\{U^g\}_{g\in G}$. Similarly, we can define the twirl of a quantum channel,
which takes an arbitrary quantum channel $\mathcal{N}_{A\to B}$ and makes it group covariant, as
per Definition 4.18:
\[ \mathcal{N}^G_{A\to B} := \frac{1}{|G|} \sum_{g\in G} \mathcal{V}_B^{g\dagger} \circ \mathcal{N}_{A\to B} \circ \mathcal{U}_A^g. \tag{4.4.127} \]
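
As an illustration of the channel twirl in (4.4.127), the sketch below (Python with NumPy, our choice) twirls a channel given by Kraus operators and spot-checks the covariance condition (4.4.109) on a single test state. The example group $\{\mathbb{1}, Z\}$ with $U^g = V^g$, the choice of the Hadamard gate as the channel to be twirled, and all function names are hypothetical. A check on one state is only a sanity check, not a proof of covariance.

```python
import numpy as np

def apply_channel(kraus_ops, rho):
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

def twirl_kraus(kraus_ops, U_rep, V_rep):
    """Kraus operators of (1/|G|) sum_g V^{g dagger} o N o U^g."""
    scale = 1 / np.sqrt(len(U_rep))
    return [scale * V.conj().T @ K @ U
            for U, V in zip(U_rep, V_rep) for K in kraus_ops]

def is_covariant_on(kraus_ops, U_rep, V_rep, rho):
    """Check N(U rho U^dagger) = V N(rho) V^dagger for each group element."""
    return all(
        np.allclose(apply_channel(kraus_ops, U @ rho @ U.conj().T),
                    V @ apply_channel(kraus_ops, rho) @ V.conj().T)
        for U, V in zip(U_rep, V_rep))

I2 = np.eye(2, dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

group = [I2, Z]        # representation of the two-element group
original = [H]         # unitary channel rho -> H rho H
twirled = twirl_kraus(original, group, group)

rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])
original_ok = is_covariant_on(original, group, group, rho)
twirled_ok = is_covariant_on(twirled, group, group, rho)
```

In this example the original channel is not covariant on the test state, while its twirl is.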

Exercise 4.23
Given a quantum channel $\mathcal{N}_{A\to B}$, prove that the twirled channel $\mathcal{N}^G_{A\to B}$, as
defined in (4.4.127), is group covariant.

Proposition 4.20
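Let $\mathcal{N}_{A\to B}$ be a Hermiticity-preserving superoperator that is covariant with
respect to a group $G$, as defined in Definition 4.18.
1. For every pure state $\psi_{A'A}$, with $d_{A'} = d_A$,
\[ \|\mathcal{N}_{A\to B}(\psi_{A'A})\|_1 \leq \left\|\mathcal{N}_{A\to B}(\phi^{\bar\rho}_{A'A})\right\|_1, \tag{4.4.128} \]
where $\rho_A := \psi_A = \operatorname{Tr}_{A'}[\psi_{A'A}]$, $\bar\rho_A := \mathcal{T}^G_A(\rho_A) = \frac{1}{|G|}\sum_{g\in G} U_A^g\, \rho_A\, U_A^{g\dagger}$, and
$\phi^{\bar\rho}_{A'A}$ is a purification of $\bar\rho_A$.
2. The diamond norm of $\mathcal{N}$ is given by
\[ \|\mathcal{N}\|_\diamond = \sup_{\psi_{A'A}} \left\{ \|(\mathrm{id}_{A'} \otimes \mathcal{N}_{A\to B})(\psi_{A'A})\|_1 : \psi_A = \mathcal{T}^G_A(\psi_A) \right\}, \tag{4.4.129} \]
where the optimization is with respect to pure states $\psi_{A'A}$, with $d_{A'} = d_A$,
such that the reduced state $\psi_A$ is invariant under the twirling channel
$\mathcal{T}^G_A(\cdot) := \frac{1}{|G|}\sum_{g\in G} U_A^g (\cdot) U_A^{g\dagger}$ defined by the representation $\{U_A^g\}_{g\in G}$.
3. If the representation $\{U_A^g\}_{g\in G}$ is such that $\mathcal{T}^G_A(\cdot) = \operatorname{Tr}[\cdot]\frac{\mathbb{1}_A}{d_A}$, then
\[ \|\mathcal{N}\|_\diamond = \left\|\Phi^{\mathcal{N}}_{AB}\right\|_1. \tag{4.4.130} \]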

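Proof:
1. Let $\psi_{A'A}$ be an arbitrary pure state, $\rho_A = \psi_A$, $\bar\rho_A = \mathcal{T}^G_A(\rho_A)$, and let $\phi^{\bar\rho}_{A'A}$ be a
purification of $\bar\rho_A$. Let us also consider the following purification of $\bar\rho_A$:
\[ |\psi^{\bar\rho}\rangle_{RA'A} := \frac{1}{\sqrt{|G|}} \sum_{g\in G} |g\rangle_R \otimes U_A^g |\psi\rangle_{A'A}, \tag{4.4.131} \]
where $\{|g\rangle\}_{g\in G}$ is an orthonormal set. (Recall Exercise 3.17.) Now, because
all purifications of a state can be mapped to each other via isometries acting
on the purifying system (see Section 3.2.5), there exists an isometry $W_{A'\to RA'}$
such that $|\psi^{\bar\rho}\rangle_{RA'A} = W_{A'\to RA'} |\phi^{\bar\rho}\rangle_{A'A}$. Therefore,
\begin{align}
\left\|\mathcal{N}_{A\to B}(\psi^{\bar\rho}_{RA'A})\right\|_1 &= \left\|W_{A'\to RA'}\, \mathcal{N}_{A\to B}(\phi^{\bar\rho}_{A'A})\, (W_{A'\to RA'})^\dagger\right\|_1 \tag{4.4.132} \\
&= \left\|\mathcal{N}_{A\to B}(\phi^{\bar\rho}_{A'A})\right\|_1, \tag{4.4.133}
\end{align}
where the last line follows from isometric invariance of the trace norm (see
(2.2.93)).
Now, let us apply the quantum channel $X \mapsto \sum_{g\in G} |g\rangle\langle g| X |g\rangle\langle g|$ to the
system $R$. By the data-processing inequality in (4.1.7), we find that
\begin{align}
\left\|\mathcal{N}_{A\to B}(\psi^{\bar\rho}_{RA'A})\right\|_1 &\geq \left\|\frac{1}{|G|} \sum_{g\in G} |g\rangle\langle g|_R \otimes (\mathcal{N}_{A\to B} \circ \mathcal{U}_A^g)(\psi_{A'A})\right\|_1 \tag{4.4.134} \\
&= \left\|\frac{1}{|G|} \sum_{g\in G} |g\rangle\langle g|_R \otimes (\mathcal{V}_B^{g\dagger} \circ \mathcal{N}_{A\to B} \circ \mathcal{U}_A^g)(\psi_{A'A})\right\|_1, \tag{4.4.135}
\end{align}
where the last line follows from applying the unitary channel defined by the
unitary $\sum_{g\in G} |g\rangle\langle g|_R \otimes V_B^{g\dagger}$ and from unitary invariance of the trace norm.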

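Finally, using the covariance of $\mathcal{N}$, in particular (4.4.110), along with (4.4.133),
we obtain
\begin{align}
\left\|\mathcal{N}_{A\to B}(\phi^{\bar\rho}_{A'A})\right\|_1 &= \left\|\mathcal{N}_{A\to B}(\psi^{\bar\rho}_{RA'A})\right\|_1 \tag{4.4.136} \\
&\geq \left\|\frac{1}{|G|} \sum_{g\in G} |g\rangle\langle g|_R \otimes \mathcal{N}_{A\to B}(\psi_{A'A})\right\|_1 \tag{4.4.137} \\
&= \|\mathcal{N}_{A\to B}(\psi_{A'A})\|_1, \tag{4.4.138}
\end{align}
where the last line follows from (2.2.96). The derived inequality is precisely
(4.4.128).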
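2. Note that, by definition, every purification $\phi^{\bar\rho}_{A'A}$ of $\bar\rho_A$ is such that its reduced
state on $A$ is invariant under the channel $\mathcal{T}^G_A$. Therefore, using (4.4.128), for
every pure state $\psi_{A'A}$, we obtain
\begin{align}
\|\mathcal{N}_{A\to B}(\psi_{A'A})\|_1 &\leq \left\|\mathcal{N}_{A\to B}(\phi^{\bar\rho}_{A'A})\right\|_1 \tag{4.4.139} \\
&\leq \sup_{\phi_{A'A}} \left\{ \|\mathcal{N}_{A\to B}(\phi_{A'A})\|_1 : \phi_A = \mathcal{T}^G_A(\phi_A) \right\}. \tag{4.4.140}
\end{align}
Since this inequality holds for every pure state $\psi_{A'A}$, and because $\mathcal{N}$ is
Hermiticity preserving, we can use (2.2.190) to conclude that
\begin{align}
\|\mathcal{N}\|_\diamond &= \sup_{\psi_{A'A}} \|\mathcal{N}_{A\to B}(\psi_{A'A})\|_1 \tag{4.4.141} \\
&\leq \sup_{\psi_{A'A}} \left\{ \|\mathcal{N}_{A\to B}(\psi_{A'A})\|_1 : \psi_A = \mathcal{T}^G_A(\psi_A) \right\}. \tag{4.4.142}
\end{align}
The opposite inequality is immediate, because the set $\{\psi_{A'A} : \psi_A = \mathcal{T}^G_A(\psi_A)\}$
is a subset of all pure states. This concludes the proof of (4.4.129).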
3. If $\mathcal{T}^G_A(\cdot) = \operatorname{Tr}[\cdot]\frac{\mathbb{1}_A}{d_A}$, then the optimization in (4.4.129) is with respect to pure
states $\psi_{A'A}$ whose reduced state on $A$ is maximally mixed. All such pure
states are maximally entangled (see Definition 3.6 and the discussion below
it), which means that there exists a unitary $U$ such that $\psi_{A'A} = \mathcal{U}_{A'}(\Phi_{A'A})$,
where $\Phi_{A'A} = |\Phi\rangle\langle\Phi|_{A'A}$ is the maximally entangled state defined by $|\Phi\rangle_{A'A} = \frac{1}{\sqrt{d_A}} \sum_{i=0}^{d_A-1} |i, i\rangle_{A'A}$ (see Exercise 3.7). By unitary invariance of the trace
norm, we thus immediately obtain $\|\mathcal{N}\|_\diamond = \|\mathcal{N}_{A\to B}(\Phi_{A'A})\|_1 = \left\|\Phi^{\mathcal{N}}_{A'B}\right\|_1$, as
required. ■

4.4.9 Bipartite and Multipartite Channels

4.5 Examples of Communication Channels

4.5.1 (Generalized) Amplitude Damping Channel

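The amplitude damping channel with decay parameter $\gamma \in [0, 1]$ is the channel
$\mathcal{A}_\gamma$ given by $\mathcal{A}_\gamma(\rho) = A_1 \rho A_1^\dagger + A_2 \rho A_2^\dagger$, with the two Kraus operators $A_1$ and $A_2$
defined as
\[ A_1 = \sqrt{\gamma}\, |0\rangle\langle 1|, \qquad A_2 = |0\rangle\langle 0| + \sqrt{1-\gamma}\, |1\rangle\langle 1|. \tag{4.5.1} \]
It is straightforward to verify that $A_1^\dagger A_1 + A_2^\dagger A_2 = \mathbb{1}$, so that $\mathcal{A}_\gamma$ is indeed trace
preserving.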
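Let $\rho$ be a state with a matrix representation in the standard basis $\{|0\rangle, |1\rangle\}$ as
\[ \rho = \begin{pmatrix} 1-\lambda & \alpha \\ \bar\alpha & \lambda \end{pmatrix}. \tag{4.5.2} \]
In order for $\rho$ to be a state (positive semi-definite with unit trace), the conditions
$0 \leq \lambda \leq 1$ and $\lambda(1-\lambda) \geq |\alpha|^2$ should hold, where $\alpha \in \mathbb{C}$. The output state $\mathcal{A}_\gamma(\rho)$
has the matrix representation
\[ \mathcal{A}_\gamma(\rho) = \begin{pmatrix} 1-(1-\gamma)\lambda & \sqrt{1-\gamma}\,\alpha \\ \sqrt{1-\gamma}\,\bar\alpha & (1-\gamma)\lambda \end{pmatrix}. \tag{4.5.3} \]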
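
The output matrix in (4.5.3) can be confirmed numerically from the Kraus operators in (4.5.1). The following is a small sketch in Python with NumPy (our choice); the parameter values are arbitrary test values satisfying $\lambda(1-\lambda) \geq |\alpha|^2$, and the function name is hypothetical.

```python
import numpy as np

def amplitude_damping_kraus(gamma):
    """Kraus operators A_1 and A_2 from (4.5.1)."""
    A1 = np.sqrt(gamma) * np.array([[0, 1], [0, 0]], dtype=complex)
    A2 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    return [A1, A2]

gamma, lam, alpha = 0.3, 0.4, 0.2 + 0.1j   # arbitrary test values
rho = np.array([[1 - lam, alpha], [np.conj(alpha), lam]])

kraus_ops = amplitude_damping_kraus(gamma)
output = sum(A @ rho @ A.conj().T for A in kraus_ops)

# Closed form (4.5.3).
expected = np.array(
    [[1 - (1 - gamma) * lam, np.sqrt(1 - gamma) * alpha],
     [np.sqrt(1 - gamma) * np.conj(alpha), (1 - gamma) * lam]])

# Trace preservation: A_1^dagger A_1 + A_2^dagger A_2 should be the identity.
completeness = sum(A.conj().T @ A for A in kraus_ops)
```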

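To obtain a physical interpretation of the amplitude damping channel, consider
that it can be written in the Stinespring form as
\[ \mathcal{A}_\gamma(\rho) = \operatorname{Tr}_E[U^\eta (\rho \otimes |0\rangle\langle 0|_E)(U^\eta)^\dagger], \tag{4.5.4} \]
where $E$ is a qubit environment system, $\eta := 1 - \gamma$, and
\[ U^\eta = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \sqrt{\eta} & \mathrm{i}\sqrt{1-\eta} & 0 \\ 0 & \mathrm{i}\sqrt{1-\eta} & \sqrt{\eta} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{4.5.5} \]
is unitary. Note that the action of $\mathcal{A}_\gamma$ on the pure states $|0\rangle$ and $|1\rangle$ is, respectively,
\begin{align}
\mathcal{A}_{1-\eta}(|0\rangle\langle 0|_A) &= |0\rangle\langle 0|_B, \notag \\
\mathcal{A}_{1-\eta}(|1\rangle\langle 1|_A) &= (1-\eta)|0\rangle\langle 0|_B + \eta|1\rangle\langle 1|_B. \tag{4.5.6}
\end{align}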
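A complementary channel $\mathcal{A}^c_\gamma$ for $\mathcal{A}_\gamma$ is defined from
\[ \mathcal{A}^c_\gamma(\rho) := \operatorname{Tr}_B[U^{1-\gamma} (\rho \otimes |0\rangle\langle 0|_E)(U^{1-\gamma})^\dagger]. \tag{4.5.7} \]

Exercise 4.24
Prove that $\mathcal{A}^c_\gamma = \mathcal{A}_{1-\gamma}$ for all $\gamma \in [0, 1]$.

Using the result of Exercise 4.24, we have that the action of the complementary
channel $\mathcal{A}^c_{1-\eta}$ on these states is
\begin{align}
\mathcal{A}^c_{1-\eta}(|0\rangle\langle 0|_A) &= |0\rangle\langle 0|_E, \notag \\
\mathcal{A}^c_{1-\eta}(|1\rangle\langle 1|_A) &= \eta|0\rangle\langle 0|_E + (1-\eta)|1\rangle\langle 1|_E. \tag{4.5.8}
\end{align}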
We see that whenever the state |0⟩⟨0| is input to the channel, the output systems
𝐵 and 𝐸 are both in the state |0⟩⟨0|. On the other hand, if the input state is |1⟩⟨1|,
then 𝐵 receives a mixed state: with probability 1 − 𝜂, the state is |0⟩⟨0|, and with
probability 𝜂, the state is |1⟩⟨1|. The situation for 𝐸 is reversed, receiving |0⟩⟨0|
with probability 𝜂 and |1⟩⟨1| with probability 1 − 𝜂. The unitary 𝑈 𝜂 can thus be
viewed as a qubit analogue of a beamsplitter, and the amplitude damping channel
A1−𝜂 can be viewed as a qubit analogue of the pure-loss bosonic channel; see
Figure 4.5.
A beamsplitter is an optical device that takes two beams of light as input and
splits them into two separate output beams, with one of the output beams containing
a fraction 𝜂 of the intensity of the incoming beam and the other output beam
containing the remaining fraction 1 − 𝜂 of the incoming intensity. When one of the
input ports of the beamsplitter is empty, i.e., is in the vacuum state, the output to
the receiver is by definition the pure-loss bosonic channel. In the case of a single
incoming photon, the pure-loss channel either transmits the photon with probability
𝜂 (allowing it to go to the receiver) or reflects it with probability 1 − 𝜂 (sending it
to the environment).
To draw the correspondence between the qubit amplitude damping channel and
the pure-loss bosonic channel described above, we can think of the qubit state |0⟩
Figure 4.5: The amplitude damping channel A1−𝜂 can be interpreted, using
(4.5.4), as an interaction with a qubit analogue of a bosonic beamsplitter
unitary 𝑈 𝜂 , followed by discarding the output state of the environment. The
channel from the sender to the receiver is from the left to the right, while the
input and output environment systems are at the top and the bottom, respectively.
In the bosonic case, the state |0⟩⟨0| of the environmental input arm of the
beamsplitter corresponds to the vacuum state, which contains no photons, and
the channel to the receiver’s end is called the pure-loss bosonic channel.

as the vacuum state and the qubit state |1⟩ as the state of a single photon. It is
possible to show that the output states of the amplitude damping channel in (4.5.6)
and (4.5.8) for the receiver and environment, respectively, then exactly match the
action of the bosonic pure-loss channel on the subspace spanned by |0⟩ and |1⟩
(please consult the Bibliographic Notes in Section 3.4 for further references on this
connection). The amplitude damping channel can indeed, therefore, be viewed as
the qubit analogue of the bosonic pure-loss channel.
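By replacing the initial state $|0\rangle\langle 0|$ of the environment in (4.5.4) with the state
\[ \theta_{N_B} := (1 - N_B)|0\rangle\langle 0| + N_B|1\rangle\langle 1|, \quad N_B \in [0, 1], \tag{4.5.9} \]
we define the generalized amplitude damping channel $\mathcal{A}_{1-\eta,N_B}$ as
\[ \mathcal{A}_{\gamma,N_B}(\rho) \equiv \mathcal{A}_{1-\eta,N_B}(\rho) := \operatorname{Tr}_E[U^\eta (\rho \otimes \theta_{N_B})(U^\eta)^\dagger], \tag{4.5.10} \]
where we again use the relation $\gamma = 1 - \eta$. This channel has the following four
Kraus operators:
\begin{align}
A_1 &= \sqrt{1-N_B} \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-\gamma} \end{pmatrix}, & A_2 &= \sqrt{1-N_B} \begin{pmatrix} 0 & \sqrt{\gamma} \\ 0 & 0 \end{pmatrix}, \tag{4.5.11} \\
A_3 &= \sqrt{N_B} \begin{pmatrix} \sqrt{1-\gamma} & 0 \\ 0 & 1 \end{pmatrix}, & A_4 &= \sqrt{N_B} \begin{pmatrix} 0 & 0 \\ \sqrt{\gamma} & 0 \end{pmatrix}. \tag{4.5.12}
\end{align}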


Note that the amplitude damping channel is a special case of the generalized
amplitude damping channel in which the thermal noise 𝑁 𝐵 = 0, so that A1−𝜂 =
A1−𝜂,0 .
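
As a consistency check between the Stinespring form (4.5.10) and the Kraus operators in (4.5.11)–(4.5.12), the sketch below (Python with NumPy, our choice) evaluates both on a test state. It assumes that the matrix in (4.5.5) is written in the basis $|0,0\rangle, |0,1\rangle, |1,0\rangle, |1,1\rangle$ with the system listed first and the environment second; this ordering is our reading of the text, and the function names and parameter values are hypothetical.

```python
import numpy as np

def gadc_kraus(gamma, N):
    """The four Kraus operators in (4.5.11)-(4.5.12)."""
    A1 = np.sqrt(1 - N) * np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    A2 = np.sqrt(1 - N) * np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    A3 = np.sqrt(N) * np.array([[np.sqrt(1 - gamma), 0], [0, 1]], dtype=complex)
    A4 = np.sqrt(N) * np.array([[0, 0], [np.sqrt(gamma), 0]], dtype=complex)
    return [A1, A2, A3, A4]

def gadc_stinespring(rho, gamma, N):
    """Tr_E[U (rho (x) theta_N) U^dagger], assuming the ordering (system, environment)."""
    eta = 1 - gamma
    s, c = np.sqrt(eta), 1j * np.sqrt(1 - eta)
    U = np.array([[1, 0, 0, 0],
                  [0, s, c, 0],
                  [0, c, s, 0],
                  [0, 0, 0, 1]], dtype=complex)          # (4.5.5)
    theta = np.diag([1 - N, N]).astype(complex)          # (4.5.9)
    joint = U @ np.kron(rho, theta) @ U.conj().T
    # Indices (b, e, b', e'): trace over the environment index e.
    return np.einsum('aebe->ab', joint.reshape(2, 2, 2, 2))

gamma, N = 0.3, 0.2                                      # arbitrary test values
rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])

kraus_ops = gadc_kraus(gamma, N)
via_kraus = sum(A @ rho @ A.conj().T for A in kraus_ops)
via_unitary = gadc_stinespring(rho, gamma, N)
completeness = sum(A.conj().T @ A for A in kraus_ops)
```

The two outputs agree even though some Kraus operators obtained from $U^\eta$ carry a factor of $\mathrm{i}$, because a global phase on a Kraus operator cancels in $A\rho A^\dagger$.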

Exercise 4.25
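1. Prove that $\mathcal{A}_{\gamma,N} = (1-N)\mathcal{A}_{\gamma,0} + N\mathcal{A}_{\gamma,1}$ for all $\gamma, N \in [0, 1]$.
2. Prove that $\mathcal{A}_{\gamma,N} = \mathcal{A}_{\gamma_2,N_2} \circ \mathcal{A}_{\gamma_1,N_1}$, where $\gamma = \gamma_1 + \gamma_2 - \gamma_1\gamma_2$ and $N = \frac{\gamma_1(1-\gamma_2)N_1 + \gamma_2 N_2}{\gamma_1 + \gamma_2 - \gamma_1\gamma_2}$.
3. Using 2., along with the result of Exercise 4.24, prove that the amplitude
damping channel $\mathcal{A}_{\gamma,0}$ is degradable for all $\gamma \in \left[0, \frac{1}{2}\right]$, with degrading
channel $\mathcal{A}_{\frac{1-2\gamma}{1-\gamma},0}$.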

The state 𝜃 𝑁 𝐵 in (4.5.10) is a qubit thermal state and can be regarded as a qubit
analogue of the bosonic thermal state. The latter is a state corresponding to a heat
bath with an average number of photons equal to 𝑁 𝐵 . The generalized amplitude
damping channel can then be seen as a qubit analogue of the bosonic thermal noise
channel.

Exercise 4.26
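Recall the Pauli operators $X$, $Y$, $Z$ from (3.2.6), and consider the generalized
amplitude-damping channel $\mathcal{A}_{\gamma,N}$, with $\gamma, N \in [0, 1]$. Show that
\begin{align}
\mathcal{A}_{\gamma,N}(X) &= \sqrt{1-\gamma}\, X, \tag{4.5.13} \\
\mathcal{A}_{\gamma,N}(Y) &= \sqrt{1-\gamma}\, Y, \tag{4.5.14} \\
\mathcal{A}_{\gamma,N}(Z) &= (1-\gamma) Z, \tag{4.5.15} \\
\mathcal{A}_{\gamma,N}(\mathbb{1}) &= \mathbb{1} + \gamma(1-2N) Z. \tag{4.5.16}
\end{align}
From this, conclude that the generalized amplitude-damping channel is covariant
with respect to the group defined by the operators $\{\mathbb{1}, Z\}$, so that for all
$\gamma, N \in [0, 1]$,
\[ \mathcal{A}_{\gamma,N}(Z\rho Z^\dagger) = Z \mathcal{A}_{\gamma,N}(\rho) Z^\dagger \tag{4.5.17} \]
for every state $\rho$.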


4.5.2 Erasure Channel

The qudit erasure channel $\mathcal{E}_p$ with erasure probability $p \in [0, 1]$ is defined as
follows for every $\rho \in \mathrm{L}(\mathbb{C}^d)$, with $d \geq 2$:
\[ \mathcal{E}_p(\rho) = (1-p)\rho + p \operatorname{Tr}[\rho]\, |e\rangle\langle e|, \tag{4.5.18} \]
where $|e\rangle$ is some unit vector that is orthogonal to all states in the input qudit
Hilbert space and $|e\rangle\langle e|$ is called the erasure state. For example, if the input qudit
Hilbert space is spanned by the standard basis $\{|0\rangle, |1\rangle, \ldots, |d-1\rangle\}$, then we can
set $|e\rangle = |d\rangle$. Thus, the erasure channel takes a linear operator acting on $\mathbb{C}^d$ and
outputs a linear operator acting on $\mathbb{C}^{d+1}$.

Exercise 4.27
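Verify that a set of Kraus operators for the erasure channel $\mathcal{E}_p$ is
\[ \left\{ \sqrt{1-p}\,(|0\rangle\langle 0| + \cdots + |d-1\rangle\langle d-1|),\ \sqrt{p}\,|e\rangle\langle 0|,\ \ldots,\ \sqrt{p}\,|e\rangle\langle d-1| \right\}. \tag{4.5.19} \]
Also, show that the Choi representation of $\mathcal{E}_p$ is
\[ \Gamma^{\mathcal{E}_p} = (1-p)\Gamma_d + p\, \mathbb{1}_d \otimes |e\rangle\langle e|. \tag{4.5.20} \]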
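
A numerical spot check of the Kraus operators in (4.5.19) is straightforward. The sketch below (Python with NumPy, our choice) uses the choice $|e\rangle = |d\rangle$ mentioned in the text, so that each Kraus operator is a $(d+1) \times d$ matrix, and compares the Kraus form with (4.5.18) for a qubit test state. It is not a substitute for the proof requested in the exercise, and the function name and parameter values are hypothetical.

```python
import numpy as np

def erasure_kraus(p, d):
    """Kraus operators of the erasure channel, with erasure vector |e> = |d>."""
    embed = np.vstack([np.eye(d), np.zeros((1, d))])  # sum_i |i><i| as a (d+1) x d matrix
    e = np.zeros(d + 1)
    e[d] = 1.0
    kraus_ops = [np.sqrt(1 - p) * embed]
    for i in range(d):
        bra_i = np.zeros(d)
        bra_i[i] = 1.0
        kraus_ops.append(np.sqrt(p) * np.outer(e, bra_i))  # sqrt(p) |e><i|
    return kraus_ops

p, d = 0.25, 2                                             # arbitrary test values
rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])

kraus_ops = erasure_kraus(p, d)
output = sum(K @ rho @ K.conj().T for K in kraus_ops)

# Closed form (4.5.18), written as a block matrix on C^{d+1}.
expected = np.zeros((d + 1, d + 1), dtype=complex)
expected[:d, :d] = (1 - p) * rho
expected[d, d] = p * np.trace(rho)

completeness = sum(K.conj().T @ K for K in kraus_ops)
```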

We now discuss how the pure-loss bosonic channel restricted to acting on

dual-rail single-photon inputs, an important model for transmission of single
photons through an optical fiber, corresponds to a qubit erasure channel. The
correspondence is illustrated in Figure 4.6, and recall our earlier discussion from
Section 3.2.
Consider a dual-rail optical system, i.e., a quantum optical system with two
distinct optical modes 𝐴1 and 𝐴2 representing the input to the channel. As described
at the beginning of Section 3.2, the two-dimensional subspace spanned by the states
{|0, 1⟩ 𝐴1 ,𝐴2 , |1, 0⟩ 𝐴1 ,𝐴2 }, consisting of a total of one photon in either one of the two
modes, constitutes a qubit system. We also let 𝐸 1 and 𝐸 2 be two distinct optical
modes constituting a dual-rail qubit system representing the environment of the
channel. Finally, let 𝐵1 and 𝐵2 be two distinct optical modes spanned by the states
{|0, 0⟩𝐵1 ,𝐵2 , |0, 1⟩𝐵1 ,𝐵2 , |1, 0⟩𝐵1 ,𝐵2 }, so that together 𝐵1 and 𝐵2 constitute a qutrit
system.
When the unitary corresponding to a quantum-optical beamsplitter acts on the

Figure 4.6: The qubit erasure channel E1−𝜂 , for 𝜂 ∈ [0, 1], can be physically
realized by using a photonic dual-rail qubit system and passing each of the two
modes through a beamsplitter, modeled by the unitary 𝑈 𝜂 in (4.5.21), such that
the input from the environment is the vacuum state.

space spanned by $\{|0, 0\rangle, |0, 1\rangle, |1, 0\rangle\}$, it is equivalent to the upper left $3 \times 3$ matrix
in (4.5.5). We can thus make use of this fact because the environment state for
the bosonic pure-loss channel is prepared in the state $|0\rangle\langle 0|$. Let $U^\eta_{A_iE_i\to B_iE_i}$ then
denote the beamsplitter unitary, for $i \in \{1, 2\}$, with the following action on the
basis $\{|0, 0\rangle, |0, 1\rangle, |1, 0\rangle\}$ of the input and output modes (in that order):
\[ U^\eta_{A_iE_i\to B_iE_i} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \sqrt{\eta} & \mathrm{i}\sqrt{1-\eta} \\ 0 & \mathrm{i}\sqrt{1-\eta} & \sqrt{\eta} \end{pmatrix}. \tag{4.5.21} \]
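Letting $\rho_{A_1A_2}$ be an arbitrary qubit state on the two modes $A_1$ and $A_2$ defined by
\begin{multline}
\rho_{A_1,A_2} = (1-\lambda)|0, 1\rangle\langle 0, 1|_{A_1,A_2} + \alpha|0, 1\rangle\langle 1, 0|_{A_1,A_2} + \bar\alpha|1, 0\rangle\langle 0, 1|_{A_1,A_2} \\
+ \lambda|1, 0\rangle\langle 1, 0|_{A_1,A_2}, \tag{4.5.22}
\end{multline}
the pure-loss bosonic channel on the dual-rail qubit system $A_1A_2$ is given by
\begin{multline}
\operatorname{Tr}_{E_1,E_2}\Big[ (U^\eta_{A_1E_1\to B_1E_1} \otimes U^\eta_{A_2E_2\to B_2E_2})(\rho_{A_1,A_2} \otimes |0, 0\rangle\langle 0, 0|_{E_1,E_2}) \\
\times (U^\eta_{A_1E_1\to B_1E_1} \otimes U^\eta_{A_2E_2\to B_2E_2})^\dagger \Big]. \tag{4.5.23}
\end{multline}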

Although this has a similar form to (4.5.4), which defines the amplitude damping
channel, our particular realization of the qubit system in terms of dual-rail single
photons results in a completely different output from that of the amplitude damping
channel. In particular, using (4.5.22) along with (4.5.21), it is straightforward to
show that
\begin{align}
&\operatorname{Tr}_{E_1,E_2}\Big[ (U^\eta_{A_1E_1\to B_1E_1} \otimes U^\eta_{A_2E_2\to B_2E_2})(\rho_{A_1,A_2} \otimes |0, 0\rangle\langle 0, 0|_{E_1,E_2}) \notag \\
&\qquad\qquad \times (U^\eta_{A_1E_1\to B_1E_1} \otimes U^\eta_{A_2E_2\to B_2E_2})^\dagger \Big] \notag \\
&\quad = \eta\, \rho_{B_1,B_2} + (1-\eta)|0, 0\rangle\langle 0, 0|_{B_1,B_2} \notag \\
&\quad = \mathcal{E}_{1-\eta}(\rho_{A_1,A_2}). \tag{4.5.24}
\end{align}

In other words, the pure-loss bosonic channel on a dual-rail qubit system is simply
the qubit erasure channel E1−𝜂 with erasure probability 1 − 𝜂 and erasure state
|0, 0⟩⟨0, 0| 𝐵1 ,𝐵2 . This means that a dual-rail single-photonic qubit sent through a
pure-loss bosonic channel is transmitted to the receiver unchanged with probability 𝜂,
or it is lost, and replaced by the vacuum state |0, 0⟩⟨0, 0|, with probability 1 − 𝜂.

4.5.3 Pauli Channels

The Pauli channels constitute an important class of channels on qubit systems.

They are based on the qubit Pauli operators 𝑋, 𝑌 , 𝑍, which we first introduced in
(2.2.48) and (2.2.49) and have the following matrix representation in the standard
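basis $\{|0\rangle, |1\rangle\}$:
\[ X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad Y = \begin{pmatrix} 0 & -\mathrm{i} \\ \mathrm{i} & 0 \end{pmatrix}, \qquad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{4.5.25} \]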

Recall that the Pauli operators are Hermitian, satisfy 𝑋 2 = 𝑌 2 = 𝑍 2 = 1, and

mutually anti-commute.
A general Pauli channel is one whose Kraus operators are proportional to the
Pauli operators, i.e.,

𝜌 ↦→ 𝑝 𝐼 𝜌 + 𝑝 𝑋 𝑋 𝜌𝑋 + 𝑝𝑌 𝑌 𝜌𝑌 + 𝑝 𝑍 𝑍 𝜌𝑍, (4.5.26)

where 𝑝 𝐼 , 𝑝 𝑋 , 𝑝𝑌 , 𝑝 𝑍 ≥ 0, 𝑝 𝐼 + 𝑝 𝑋 + 𝑝𝑌 + 𝑝 𝑍 = 1.
Here we highlight two particular Pauli channels of interest.

1. Dephasing channel: We let 𝑝 𝐼 = 1 − 𝑝 and 𝑝 𝑍 = 𝑝 for 0 ≤ 𝑝 ≤ 1, and

𝑝 𝑋 = 𝑝𝑌 = 0. The action of the dephasing channel is thus
𝜌 ↦→ (1 − 𝑝) 𝜌 + 𝑝𝑍 𝜌𝑍. (4.5.27)
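To see why this is called the dephasing channel, consider again a general state
$\rho$ of the form (4.5.2) and let $p = \frac{1}{2}$. In this case, we call the channel the
completely dephasing channel, and it is straightforward to see that
\[ \rho = \begin{pmatrix} 1-\lambda & \alpha \\ \bar\alpha & \lambda \end{pmatrix} \mapsto \frac{1}{2}\rho + \frac{1}{2}Z\rho Z = \begin{pmatrix} 1-\lambda & 0 \\ 0 & \lambda \end{pmatrix}. \tag{4.5.28} \]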
In other words, the completely dephasing channel eliminates the off-diagonal
elements of the input state (when represented in the standard basis, which is
the same basis in which 𝑍 is diagonal), so that the relative phase between the
|0⟩ state and the |1⟩ state vanishes and the state becomes effectively classical.
In the more general case when 𝑝 ≠ 12 , the effect of the dephasing channel is to
reduce the magnitude of the off-diagonal elements:

1−𝜆 𝛼
𝜌= ↦→ (1 − 𝑝) 𝜌 + 𝑝𝑍 𝜌𝑍 (4.5.29)
𝛼 𝜆

1−𝜆 (1 − 2𝑝)𝛼
= . (4.5.30)
(1 − 2𝑝)𝛼 𝜆

2. Depolarizing channel: For 𝑝 ∈ [0, 1], the depolarizing channel is defined by
letting $p_I = 1 - p$ and $p_X = p_Y = p_Z = \frac{p}{3}$, so that
\[
\rho \mapsto (1-p)\rho + \frac{p}{3}\,(X\rho X + Y\rho Y + Z\rho Z). \tag{4.5.31}
\]
By using the identity
\[
\frac{1}{4}\rho + \frac{1}{4}X\rho X + \frac{1}{4}Y\rho Y + \frac{1}{4}Z\rho Z = \operatorname{Tr}[\rho]\,\pi \tag{4.5.32}
\]
(see Lemma 3.15 and (3.2.101)), we can equivalently write the action of the
depolarizing channel as
\[
\rho \mapsto \left(1 - \frac{4p}{3}\right)\rho + \frac{4p}{3}\operatorname{Tr}[\rho]\,\pi, \tag{4.5.33}
\]
which has the interpretation that the state of the system is replaced by the
maximally mixed state 𝜋 with probability $\frac{4p}{3}$. Observe, however, that $\frac{4p}{3}$ can
be interpreted as a probability only for $0 \leq p \leq \frac{3}{4}$.
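The two special cases above can be checked numerically. The following is a minimal sketch, assuming NumPy; the function names and the example values of 𝜆, 𝛼, and 𝑝 are illustrative choices and do not come from the text. It applies (4.5.26) directly and compares the results against (4.5.30) and (4.5.33).

```python
import numpy as np

# Pauli operators in the standard basis, as in (4.5.25).
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_channel(rho, p_i, p_x, p_y, p_z):
    """Apply the general Pauli channel (4.5.26) to a 2x2 operator rho."""
    assert min(p_i, p_x, p_y, p_z) >= 0
    assert np.isclose(p_i + p_x + p_y + p_z, 1)
    return (p_i * rho + p_x * X @ rho @ X
            + p_y * Y @ rho @ Y + p_z * Z @ rho @ Z)

def dephasing(rho, p):
    return pauli_channel(rho, 1 - p, 0, 0, p)

def depolarizing(rho, p):
    return pauli_channel(rho, 1 - p, p / 3, p / 3, p / 3)

# Illustrative qubit state with off-diagonal element alpha (assumed values).
lam, alpha = 0.3, 0.2 + 0.1j
rho = np.array([[1 - lam, alpha], [np.conj(alpha), lam]])

p = 0.2
deph = dephasing(rho, p)
depol = depolarizing(rho, p)

# Right-hand side of (4.5.33), with pi the maximally mixed state.
pi = I2 / 2
depol_alt = (1 - 4 * p / 3) * rho + (4 * p / 3) * np.trace(rho) * pi
```

Here `deph[0, 1]` equals `(1 - 2 * p) * alpha`, and `depol` agrees with `depol_alt` up to floating-point error.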

4.5.4 Generalized Pauli Channels

Using the Heisenberg–Weyl operators defined in (3.2.48), we can generalize Pauli

channels to the qudit case. For all 𝑑 ≥ 2, a generalized Pauli channel is defined as
\[
\rho \mapsto \sum_{z,x=0}^{d-1} p(z,x)\, W_{z,x}\, \rho\, W_{z,x}^{\dagger}, \tag{4.5.34}
\]
where $p : \{0, 1, \ldots, d-1\}^2 \to [0,1]$ is a probability distribution, so that
$0 \leq p(z,x) \leq 1$ for all $z, x \in \{0, 1, \ldots, d-1\}$ and $\sum_{z,x=0}^{d-1} p(z,x) = 1$. In other words, a
generalized Pauli channel randomly applies one of the Heisenberg–Weyl operators
to the input. The Kraus operators of a generalized Pauli channel are therefore
\[
\left\{ \sqrt{p(z,x)}\, W_{z,x} \right\}_{z,x=0}^{d-1}.
\]
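As an illustration of (4.5.34), here is a minimal NumPy sketch. It assumes the convention $W_{z,x} = Z(z)X(x)$, with $X(x)|k\rangle = |k + x \bmod d\rangle$ and $Z(z)|k\rangle = e^{2\pi \mathrm{i} zk/d}|k\rangle$; equation (3.2.48) is not reproduced here, so this convention, the function names, and the example distributions are assumptions.

```python
import numpy as np

def heisenberg_weyl(d):
    """Return {(z, x): W_zx} with W_zx = Z(z) X(x) (assumed convention)."""
    omega = np.exp(2j * np.pi / d)
    shift = np.roll(np.eye(d), 1, axis=0)      # X(1)|k> = |k+1 mod d>
    clock = np.diag(omega ** np.arange(d))     # Z(1)|k> = omega^k |k>
    mp = np.linalg.matrix_power
    return {(z, x): mp(clock, z) @ mp(shift, x)
            for z in range(d) for x in range(d)}

def generalized_pauli(rho, p):
    """Apply (4.5.34); p is a d x d array with p[z, x] >= 0 summing to 1."""
    d = rho.shape[0]
    W = heisenberg_weyl(d)
    return sum(p[z, x] * W[z, x] @ rho @ W[z, x].conj().T
               for z in range(d) for x in range(d))

d = 3
rng = np.random.default_rng(0)
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = G @ G.conj().T / np.trace(G @ G.conj().T)   # random density operator

# A generic probability distribution over (z, x).
p = rng.random((d, d))
p /= p.sum()
out = generalized_pauli(rho, p)

# A distribution supported on x = 0 only: only phase operators are applied.
p_deph = np.zeros((d, d))
p_deph[:, 0] = [0.5, 0.3, 0.2]
out_deph = generalized_pauli(rho, p_deph)
```

The output `out` has unit trace, and when the distribution is supported on 𝑥 = 0 the diagonal of the input is left unchanged.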

Exercise 4.28
Show that the Choi state of a generalized Pauli channel is a Bell-diagonal state.
(Recall the definition of a Bell-diagonal state in (3.2.60).)

A special case of a generalized Pauli channel is a 𝑑-dimensional dephasing

channel, which is obtained by letting 𝑝(𝑧, 𝑥) = 0 for all 𝑥 ∈ {1, 2, . . . , 𝑑 − 1}:
\[
\rho \mapsto \sum_{z=0}^{d-1} p(z,0)\, Z(z)\, \rho\, Z(z)^{\dagger}. \tag{4.5.35}
\]
For this special case, only the phase operators 𝑍 (𝑧) are applied randomly to the
input 𝜌.

Exercise 4.29
For the 𝑑-dimensional dephasing channel defined in (4.5.35), prove that, in the
standard basis, only the off-diagonal elements of the input state 𝜌 are affected
by the channel.

The 𝑑-dimensional depolarizing channel is defined analogously to the qubit

case as follows:
\[
\mathcal{D}_p(\rho) := (1-p)\rho + \frac{p}{d^2-1} \sum_{(z,x)\neq(0,0)} W_{z,x}\, \rho\, W_{z,x}^{\dagger}, \tag{4.5.36}
\]
for all 𝑝 ∈ [0, 1].

Exercise 4.30
Using (3.2.98), prove that for all 𝑝 ∈ [0, 1], the action of the depolarizing
channel D 𝑝 can be written as

\[
\mathcal{D}_p(\rho) = (1-q)\rho + q \operatorname{Tr}[\rho]\, \frac{\mathbb{1}_d}{d} \tag{4.5.37}
\]
for every linear operator 𝜌, where $q = \frac{p d^2}{d^2 - 1}$.
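A numerical sanity check of (4.5.37) is sketched below; it is not a proof and does not replace the exercise. It assumes NumPy and the convention $W_{z,x} = Z(z)X(x)$ for the Heisenberg–Weyl operators, and the values of 𝑑 and 𝑝 are arbitrary.

```python
import numpy as np

d, p = 3, 0.4
omega = np.exp(2j * np.pi / d)
shift = np.roll(np.eye(d), 1, axis=0)      # X(1)|k> = |k+1 mod d>
clock = np.diag(omega ** np.arange(d))     # Z(1)|k> = omega^k |k>
mp = np.linalg.matrix_power

rng = np.random.default_rng(1)
# An arbitrary linear operator (not necessarily a state).
rho = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

# Left-hand side: the definition (4.5.36), summing over (z, x) != (0, 0).
lhs = (1 - p) * rho
for z in range(d):
    for x in range(d):
        if (z, x) != (0, 0):
            W = mp(clock, z) @ mp(shift, x)
            lhs = lhs + p / (d**2 - 1) * W @ rho @ W.conj().T

# Right-hand side: (4.5.37) with q = p d^2 / (d^2 - 1).
q = p * d**2 / (d**2 - 1)
rhs = (1 - q) * rho + q * np.trace(rho) * np.eye(d) / d
```

For this input, `lhs` and `rhs` agree up to floating-point error.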

Exercise 4.31
Prove that the Choi state of the depolarizing channel D 𝑝 is the isotropic state
𝜌 iso;1−𝑝 (recall (3.2.131)). Conversely, using (4.2.16), prove that every isotropic
state is the Choi state of a depolarizing channel. In other words, prove that for
all 𝑝 ∈ [0, 1],
D 𝑝 (𝑋) = 𝑑Tr[(𝑋 T ⊗ 1) 𝜌 iso;1−𝑝 ] (4.5.38)
for all 𝑋 ∈ L(C𝑑 ).

Exercise 4.32
1. Using the result of Exercise 4.31, along with (3.2.132), show that the
depolarizing channel has the following covariance property:

D 𝑝 = U† ◦ D 𝑝 ◦ U, (4.5.39)

for all 𝑝 ∈ [0, 1] and every unitary 𝑈.

2. Using the result of Exercise 4.31, along with (3.2.134), prove that for every
quantum channel N 𝐴→𝐵 , with 𝑑 𝐴 = 𝑑 𝐵 = 𝑑,
\[
\int_U \mathcal{U} \circ \mathcal{N} \circ \mathcal{U}^{\dagger} \, \mathrm{d}U = \mathcal{D}_{1-p}, \tag{4.5.40}
\]
where 𝑝 = ⟨Φ|ΦN |Φ⟩ is the entanglement fidelity of N, which we define
formally in Section 6.4.


4.6 Special Types of Channels

4.6.1 Petz Recovery Map

The following channel plays an important role in variations of the data-processing

inequality for the quantum relative entropy and other entropic quantities that we
define in the next chapter.

Definition 4.21 Petz Recovery Map

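Let 𝜎 ∈ L(H) be a positive semi-definite operator and let N : L(H) → L(H′)
be a quantum channel. The Petz recovery map for 𝜎 and N is the completely
positive and trace-non-increasing map $\mathcal{P}_{\sigma,\mathcal{N}} : \mathrm{L}(\mathcal{H}') \to \mathrm{L}(\mathcal{H})$ defined as
\[
\mathcal{P}_{\sigma,\mathcal{N}}(X) := \sigma^{\frac{1}{2}}\, \mathcal{N}^{\dagger}\!\left( \mathcal{N}(\sigma)^{-\frac{1}{2}}\, X\, \mathcal{N}(\sigma)^{-\frac{1}{2}} \right) \sigma^{\frac{1}{2}} \tag{4.6.1}
\]
for all 𝑋 ∈ L(H′).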

Remark: If the operator N(𝜎) is invertible, then the Petz recovery map $\mathcal{P}_{\sigma,\mathcal{N}}$ is a channel. If
the operator N(𝜎) is not invertible, then the inverse $\mathcal{N}(\sigma)^{-\frac{1}{2}}$ is taken on the support of N(𝜎),
following the convention from Section 2.2.8.1. In this latter case, the Petz recovery map $\mathcal{P}_{\sigma,\mathcal{N}}$ is
a trace-non-increasing map.

The Petz recovery map is indeed completely positive because it is the composition
of the following completely positive maps:
1. Sandwiching by the positive semi-definite operator $\mathcal{N}(\sigma)^{-\frac{1}{2}}$.
2. The adjoint N† of N.
3. Sandwiching by the positive semi-definite operator $\sigma^{\frac{1}{2}}$.
The Petz recovery map is also trace non-increasing, as we can readily verify. For
every positive semi-definite operator 𝑋, the following holds
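\begin{align}
\operatorname{Tr}[\mathcal{P}_{\sigma,\mathcal{N}}(X)]
&= \operatorname{Tr}\!\left[ \sigma^{\frac{1}{2}}\, \mathcal{N}^{\dagger}\!\left( \mathcal{N}(\sigma)^{-\frac{1}{2}} X \mathcal{N}(\sigma)^{-\frac{1}{2}} \right) \sigma^{\frac{1}{2}} \right] \tag{4.6.2} \\
&= \operatorname{Tr}\!\left[ \sigma\, \mathcal{N}^{\dagger}\!\left( \mathcal{N}(\sigma)^{-\frac{1}{2}} X \mathcal{N}(\sigma)^{-\frac{1}{2}} \right) \right] \tag{4.6.3} \\
&= \operatorname{Tr}\!\left[ \mathcal{N}(\sigma)\, \mathcal{N}(\sigma)^{-\frac{1}{2}} X \mathcal{N}(\sigma)^{-\frac{1}{2}} \right] \tag{4.6.4} \\
&= \operatorname{Tr}\!\left[ \mathcal{N}(\sigma)^{-\frac{1}{2}}\, \mathcal{N}(\sigma)\, \mathcal{N}(\sigma)^{-\frac{1}{2}} X \right] \tag{4.6.5} \\
&= \operatorname{Tr}[\Pi_{\mathcal{N}(\sigma)} X] \tag{4.6.6} \\
&\leq \operatorname{Tr}[X], \tag{4.6.7}
\end{align}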

where ΠN(𝜎) is the projection onto the support of N(𝜎), which arises because N(𝜎)
need not be invertible. If the support of 𝑋 is contained in the support of N(𝜎), then Tr[ΠN(𝜎) 𝑋] =
Tr[𝑋], which means that the Petz recovery map is trace-preserving for all inputs
with support contained in the support of N(𝜎).
One of the important properties of the Petz recovery channel P𝜎,N is that it
reverses the action of N on 𝜎 whenever N(𝜎) is invertible. In particular,
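\begin{align}
\mathcal{P}_{\sigma,\mathcal{N}}(\mathcal{N}(\sigma))
&= \sigma^{\frac{1}{2}}\, \mathcal{N}^{\dagger}\!\left( \mathcal{N}(\sigma)^{-\frac{1}{2}}\, \mathcal{N}(\sigma)\, \mathcal{N}(\sigma)^{-\frac{1}{2}} \right) \sigma^{\frac{1}{2}} \tag{4.6.8} \\
&= \sigma^{\frac{1}{2}}\, \mathcal{N}^{\dagger}(\mathbb{1}_{\mathcal{H}'})\, \sigma^{\frac{1}{2}} \tag{4.6.9} \\
&= \sigma, \tag{4.6.10}
\end{align}
where we have used the fact that N(𝜎) is invertible, so that
\[
\mathcal{N}(\sigma)^{-\frac{1}{2}}\, \mathcal{N}(\sigma)\, \mathcal{N}(\sigma)^{-\frac{1}{2}} = \mathbb{1}_{\mathcal{H}'}. \tag{4.6.11}
\]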

Then, since N is a channel, its adjoint is unital, which leads to the final equality.
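The reversal property can be illustrated numerically. The sketch below assumes NumPy and represents the channel by a list of Kraus operators, so that the adjoint is $Y \mapsto \sum_i K_i^{\dagger} Y K_i$. The amplitude damping channel with parameter `gamma`, the particular 𝜎, and all function names are illustrative assumptions. Inverse square roots are taken on the support, so the same code also covers the case in which N(𝜎) is not invertible.

```python
import numpy as np

def mpow(A, s):
    """Power of a positive semi-definite matrix, taken on its support."""
    w, V = np.linalg.eigh(A)
    keep = w > 1e-12
    return (V[:, keep] * w[keep] ** s) @ V[:, keep].conj().T

def channel(kraus, X):
    return sum(K @ X @ K.conj().T for K in kraus)

def adjoint(kraus, Y):
    return sum(K.conj().T @ Y @ K for K in kraus)

def petz(kraus, sigma, X):
    """Petz recovery map (4.6.1) for the channel with the given Kraus operators."""
    n = mpow(channel(kraus, sigma), -0.5)
    s = mpow(sigma, 0.5)
    return s @ adjoint(kraus, n @ X @ n) @ s

# Illustrative channel: amplitude damping with damping parameter gamma.
gamma = 0.3
kraus = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
         np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]

# Full-rank sigma: N(sigma) is invertible.
sigma = np.array([[0.6, 0.2], [0.2, 0.4]], dtype=complex)
recovered = petz(kraus, sigma, channel(kraus, sigma))

# Rank-one sigma = |0><0|: N(sigma) is not invertible.
sigma0 = np.diag([1.0, 0.0]).astype(complex)
recovered0 = petz(kraus, sigma0, channel(kraus, sigma0))
```

In both cases the recovered operator equals the original 𝜎 up to floating-point error.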

Remark: The equality in (4.6.10) holds more generally; i.e., it holds even when N(𝜎) is not
invertible. To see this, we use the fact that the projection ΠN( 𝜎) onto the support of N(𝜎)
satisfies ΠN( 𝜎) ≤ 1H′ . Then, we find that
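\begin{align}
\mathcal{P}_{\sigma,\mathcal{N}}(\mathcal{N}(\sigma))
&= \sigma^{\frac{1}{2}}\, \mathcal{N}^{\dagger}\!\left( \mathcal{N}(\sigma)^{-\frac{1}{2}}\, \mathcal{N}(\sigma)\, \mathcal{N}(\sigma)^{-\frac{1}{2}} \right) \sigma^{\frac{1}{2}} \tag{4.6.12} \\
&= \sigma^{\frac{1}{2}}\, \mathcal{N}^{\dagger}(\Pi_{\mathcal{N}(\sigma)})\, \sigma^{\frac{1}{2}} \tag{4.6.13} \\
&\leq \sigma^{\frac{1}{2}}\, \mathcal{N}^{\dagger}(\mathbb{1}_{\mathcal{H}'})\, \sigma^{\frac{1}{2}} \tag{4.6.14} \\
&= \sigma. \tag{4.6.15}
\end{align}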

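On the other hand, if we let 𝑈 : H → H′ ⊗ H𝐸 be an isometric extension of N, then we can
use Lemma 3.3, which implies that $\operatorname{supp}(U\sigma U^{\dagger}) \subseteq \operatorname{supp}(\mathcal{N}(\sigma) \otimes \mathbb{1}_E)$, and in turn implies that
$\Pi_{U\sigma U^{\dagger}} \leq \Pi_{\mathcal{N}(\sigma)} \otimes \mathbb{1}_E$. Then, for every vector |𝜓⟩ ∈ H, we obtain
\begin{align}
\langle\psi|\Pi_{\sigma}|\psi\rangle
&= \langle\psi| U^{\dagger}\, \Pi_{U\sigma U^{\dagger}}\, U |\psi\rangle \tag{4.6.16} \\
&\leq \langle\psi| U^{\dagger} (\Pi_{\mathcal{N}(\sigma)} \otimes \mathbb{1}_E) U |\psi\rangle \tag{4.6.17} \\
&= \operatorname{Tr}[U|\psi\rangle\langle\psi|U^{\dagger} (\Pi_{\mathcal{N}(\sigma)} \otimes \mathbb{1}_E)] \tag{4.6.18} \\
&= \operatorname{Tr}[\operatorname{Tr}_E[U|\psi\rangle\langle\psi|U^{\dagger}]\, \Pi_{\mathcal{N}(\sigma)}] \tag{4.6.19} \\
&= \operatorname{Tr}[\mathcal{N}(|\psi\rangle\langle\psi|)\, \Pi_{\mathcal{N}(\sigma)}] \tag{4.6.20} \\
&= \operatorname{Tr}[|\psi\rangle\langle\psi|\, \mathcal{N}^{\dagger}(\Pi_{\mathcal{N}(\sigma)})] \tag{4.6.21} \\
&= \langle\psi| \mathcal{N}^{\dagger}(\Pi_{\mathcal{N}(\sigma)}) |\psi\rangle. \tag{4.6.22}
\end{align}
Since |𝜓⟩ is arbitrary, it holds that $\Pi_{\sigma} \leq \mathcal{N}^{\dagger}(\Pi_{\mathcal{N}(\sigma)})$. Using this, we find that
\[
\mathcal{P}_{\sigma,\mathcal{N}}(\mathcal{N}(\sigma)) = \sigma^{\frac{1}{2}}\, \mathcal{N}^{\dagger}(\Pi_{\mathcal{N}(\sigma)})\, \sigma^{\frac{1}{2}} \geq \sigma^{\frac{1}{2}}\, \Pi_{\sigma}\, \sigma^{\frac{1}{2}} = \sigma. \tag{4.6.23}
\]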

Having shown that P 𝜎,N (N(𝜎)) ≤ 𝜎 and P 𝜎,N (N(𝜎)) ≥ 𝜎, we conclude that

P 𝜎,N (N(𝜎)) = 𝜎, (4.6.24)

even when N(𝜎) is not invertible.

4.6.1.1 Petz Recovery Channel for Partial Trace

Let the input Hilbert space to the channel N in the definition of the Petz recovery
map be H = H 𝐴𝐵 . Then, let N = Tr 𝐵 be the partial trace over 𝐵, and note that (see
Exercise 4.16)
\[
\mathcal{N}^{\dagger}(\sigma_A) = \sigma_A \otimes \mathbb{1}_B. \tag{4.6.25}
\]
Indeed, using Definition 2.18 for the adjoint of a superoperator, we have

⟨N† (𝜎𝐴 ), 𝜎𝐴𝐵 ⟩ = ⟨𝜎𝐴 ⊗ 1𝐵 , 𝜎𝐴𝐵 ⟩ (4.6.26)

= Tr[(𝜎𝐴 ⊗ 1𝐵 ) † 𝜎𝐴𝐵 ] (4.6.27)
= Tr[𝜎𝐴† Tr 𝐵 (𝜎𝐴𝐵 )] (4.6.28)
= ⟨𝜎𝐴 , N(𝜎𝐴𝐵 )⟩. (4.6.29)

Therefore, the Petz recovery map corresponding to the partial trace over 𝐵 is

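\[
\mathcal{P}_{\sigma_{AB},\operatorname{Tr}_B}(X_A) = \sigma_{AB}^{\frac{1}{2}} \left( \sigma_A^{-\frac{1}{2}}\, X_A\, \sigma_A^{-\frac{1}{2}} \otimes \mathbb{1}_B \right) \sigma_{AB}^{\frac{1}{2}}. \tag{4.6.30}
\]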

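By writing the identity on 𝐵 as $\mathbb{1}_B = \sum_{j=0}^{d_B-1} |j\rangle\langle j|_B$, we can write the action of $\mathcal{P}_{\sigma_{AB},\operatorname{Tr}_B}$
as
\begin{align}
\mathcal{P}_{\sigma_{AB},\operatorname{Tr}_B}(X_A)
&= \sigma_{AB}^{\frac{1}{2}} \left( \sigma_A^{-\frac{1}{2}}\, X_A\, \sigma_A^{-\frac{1}{2}} \otimes \mathbb{1}_B \right) \sigma_{AB}^{\frac{1}{2}} \tag{4.6.31} \\
&= \sum_{j=0}^{d_B-1} \sigma_{AB}^{\frac{1}{2}} \left( \sigma_A^{-\frac{1}{2}}\, X_A\, \sigma_A^{-\frac{1}{2}} \otimes |j\rangle\langle j|_B \right) \sigma_{AB}^{\frac{1}{2}} \tag{4.6.32} \\
&= \sum_{j=0}^{d_B-1} \sigma_{AB}^{\frac{1}{2}} \left( \sigma_A^{-\frac{1}{2}} \otimes |j\rangle_B \right) X_A \left( \sigma_A^{-\frac{1}{2}} \otimes \langle j|_B \right) \sigma_{AB}^{\frac{1}{2}} \tag{4.6.33} \\
&= \sum_{j=0}^{d_B-1} K_j\, X_A\, K_j^{\dagger}, \tag{4.6.34}
\end{align}
where
\[
K_j := \sigma_{AB}^{\frac{1}{2}} \left( \sigma_A^{-\frac{1}{2}} \otimes |j\rangle_B \right). \tag{4.6.35}
\]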

The operators 𝐾 𝑗 , for 0 ≤ 𝑗 ≤ 𝑑 𝐵 − 1, are thus Kraus operators for the Petz recovery
map for the partial trace. Using (4.3.9), and letting the environment 𝐸 be denoted
by 𝐵ˆ (since the dimension of the environment in the construction (4.3.9) is the
same as the number of Kraus operators, which in this case is equal to the dimension
of 𝐵), we find that
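\begin{align}
V_{A\to B\hat{B}}
&= \sum_{j=0}^{d_B-1} \sigma_{AB}^{\frac{1}{2}} \left( \sigma_A^{-\frac{1}{2}} \otimes |j\rangle_B \right) \otimes |j\rangle_{\hat{B}} \tag{4.6.36} \\
&= \sum_{j=0}^{d_B-1} \sigma_{AB}^{\frac{1}{2}} \left( \sigma_A^{-\frac{1}{2}} \otimes |j\rangle_B \otimes |j\rangle_{\hat{B}} \right) \tag{4.6.37} \\
&= (\sigma_{AB}^{\frac{1}{2}} \otimes \mathbb{1}_{\hat{B}}) (\sigma_A^{-\frac{1}{2}} \otimes \mathbb{1}_B \otimes \mathbb{1}_{\hat{B}}) \left( \mathbb{1}_A \otimes \sum_{j=0}^{d_B-1} |j,j\rangle_{B\hat{B}} \right) \tag{4.6.38} \\
&= (\sigma_{AB}^{\frac{1}{2}} \otimes \mathbb{1}_{\hat{B}}) (\sigma_A^{-\frac{1}{2}} \otimes \mathbb{1}_{B\hat{B}}) (\mathbb{1}_A \otimes |\Gamma\rangle_{B\hat{B}}), \tag{4.6.39}
\end{align}
which is an isometric extension of the Petz recovery map $\mathcal{P}_{\sigma_{AB},\operatorname{Tr}_B}$. Omitting
identity operators, this can be written more simply as follows:
\[
V_{A\to B\hat{B}} = \sigma_{AB}^{\frac{1}{2}}\, \sigma_A^{-\frac{1}{2}}\, |\Gamma\rangle_{B\hat{B}}. \tag{4.6.40}
\]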
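The Kraus operators in (4.6.35) and the isometric extension in (4.6.36) can be checked numerically. The sketch below assumes NumPy, a randomly generated full-rank $\sigma_{AB}$ (so that $\sigma_A$ is invertible), and the tensor-product ordering of `np.kron`; the dimensions and variable names are illustrative.

```python
import numpy as np

def mpow(A, s):
    """Power of a positive semi-definite matrix, taken on its support."""
    w, V = np.linalg.eigh(A)
    keep = w > 1e-12
    return (V[:, keep] * w[keep] ** s) @ V[:, keep].conj().T

dA, dB = 2, 3
rng = np.random.default_rng(0)
G = rng.normal(size=(dA * dB, dA * dB)) + 1j * rng.normal(size=(dA * dB, dA * dB))
sigma_AB = G @ G.conj().T
sigma_AB /= np.trace(sigma_AB).real

# Marginal on A: partial trace over B.
sigma_A = np.trace(sigma_AB.reshape(dA, dB, dA, dB), axis1=1, axis2=3)

s_half = mpow(sigma_AB, 0.5)
a_inv_half = mpow(sigma_A, -0.5)

# Kraus operators (4.6.35), each of shape (dA*dB, dA).
kets = np.eye(dB)
kraus = [s_half @ np.kron(a_inv_half, kets[:, [j]]) for j in range(dB)]

# Compare the Kraus form (4.6.34) with the direct formula (4.6.30).
X_A = rng.normal(size=(dA, dA)) + 1j * rng.normal(size=(dA, dA))
via_kraus = sum(K @ X_A @ K.conj().T for K in kraus)
direct = s_half @ np.kron(a_inv_half @ X_A @ a_inv_half, np.eye(dB)) @ s_half

# Isometric extension (4.6.36): V = sum_j K_j ⊗ |j> on the extra system.
V = sum(np.kron(kraus[j], kets[:, [j]]) for j in range(dB))
```

Here `via_kraus` matches `direct`, `V.conj().T @ V` is the identity on 𝐴, and applying the map to `sigma_A` returns `sigma_AB`.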


Exercise 4.33
Recall the Bayes theorem from probability theory:

𝑝 𝑋 |𝑌 (𝑥|𝑦) 𝑝𝑌 (𝑦) = 𝑝𝑌 |𝑋 (𝑦|𝑥) 𝑝 𝑋 (𝑥), (4.6.41)

where 𝑋 and 𝑌 are random variables with joint probability distribution 𝑝 𝑋𝑌 (𝑥, 𝑦)
over the alphabets X and Y, and the distributions given above are derived from
this joint distribution as
\[
p_X(x) = \sum_{y\in\mathcal{Y}} p_{XY}(x,y), \qquad p_Y(y) = \sum_{x\in\mathcal{X}} p_{XY}(x,y), \tag{4.6.42}
\]
\[
p_{Y|X}(y|x) = \frac{p_{XY}(x,y)}{p_X(x)}, \qquad p_{X|Y}(x|y) = \frac{p_{XY}(x,y)}{p_Y(y)}. \tag{4.6.43}
\]
We now develop a connection between Bayes theorem and the Petz recovery
map. Let {𝜌𝑥 }𝑥∈X be a set of states, let N be a channel, and let {𝑀𝑦 } 𝑦∈Y be a
POVM. Set
𝑝𝑌 |𝑋 (𝑦|𝑥) = Tr[𝑀𝑦 N(𝜌𝑥 )]. (4.6.44)
1. Show that (4.6.41) is satisfied with

𝑝 𝑋 |𝑌 (𝑥|𝑦) = Tr[𝐿 𝑥 P(𝜎𝑦 )], (4.6.45)

for the set {𝜎𝑦 } 𝑦∈Y of states, channel P, and POVM {𝐿 𝑥 }𝑥∈X chosen as

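\begin{align}
\sigma_y &= \frac{1}{p_Y(y)}\, [\mathcal{N}(\rho)]^{\frac{1}{2}}\, M_y\, [\mathcal{N}(\rho)]^{\frac{1}{2}}, \tag{4.6.46} \\
\mathcal{P}(\cdot) &= \rho^{\frac{1}{2}}\, \mathcal{N}^{\dagger}\!\left( [\mathcal{N}(\rho)]^{-\frac{1}{2}} (\cdot) [\mathcal{N}(\rho)]^{-\frac{1}{2}} \right) \rho^{\frac{1}{2}}, \tag{4.6.47} \\
L_x &= p_X(x)\, [\rho]^{-\frac{1}{2}}\, \rho_x\, [\rho]^{-\frac{1}{2}}, \tag{4.6.48}
\end{align}
where
\[
\rho = \sum_{x\in\mathcal{X}} p_X(x)\, \rho_x. \tag{4.6.49}
\]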

2. Verify that {𝜎𝑦 } 𝑦∈Y is a set of states, P is a channel, and {𝐿 𝑥 }𝑥∈X is a

POVM. For simplicity, suppose that the states 𝜌 and N(𝜌) are positive
definite.


4.6.2 LOCC Channels

A very common physical scenario encountered in quantum information theory is

one in which two parties, Alice and Bob, who are distantly separated, perform local
quantum operations (consisting of channels and/or measurements) at their respective
locations and communicate classically with each other in order to transform some
initially shared state to some final desired state. This sequence of local operations
and classical communication is abbreviated LOCC and is an important element
of many quantum communication protocols, such as entanglement distillation and
secret key distillation.
As with every other transformation in quantum theory, an LOCC operation
corresponds mathematically to a channel, which we call an LOCC channel. Formally,
an LOCC channel is defined as follows:

Definition 4.22 LOCC Channel

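Let X be a finite alphabet, let {M𝑥 }𝑥∈X be a quantum instrument (a set of
completely positive trace-non-increasing maps such that $\sum_{x\in\mathcal{X}} \mathcal{M}^x$ is a channel),
and let {N𝑥 }𝑥∈X be a set of quantum channels. Then, a one-way LOCC channel
from Alice to Bob is the channel $\mathcal{L}^{\to}_{AB\to A'B'}$, with Alice's initial and final systems
𝐴 and 𝐴′ and Bob's initial and final systems 𝐵 and 𝐵′, defined as
\[
\mathcal{L}^{\to}_{AB\to A'B'} = \sum_{x\in\mathcal{X}} \mathcal{M}^x_{A\to A'} \otimes \mathcal{N}^x_{B\to B'}. \tag{4.6.50}
\]
Similarly, a one-way LOCC channel from Bob to Alice is a channel of the form
\[
\mathcal{L}^{\leftarrow}_{AB\to A'B'} = \sum_{x\in\mathcal{X}} \mathcal{N}^x_{A\to A'} \otimes \mathcal{M}^x_{B\to B'}. \tag{4.6.51}
\]
An LOCC channel $\mathcal{L}^{\leftrightarrow}_{AB\to A'B'}$ is a finite composition of one-way LOCC channels,
and it can be written as
\[
\mathcal{L}^{\leftrightarrow}_{AB\to A'B'} = \sum_{y\in\mathcal{Y}} \mathcal{S}^y_{A\to A'} \otimes \mathcal{W}^y_{B\to B'} \tag{4.6.52}
\]
for some finite alphabet Y and sets {S𝑦 } 𝑦∈Y , {W𝑦 } 𝑦∈Y of completely positive
trace-non-increasing maps such that $\sum_{y\in\mathcal{Y}} \mathcal{S}^y \otimes \mathcal{W}^y$ is trace preserving.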

Consider the one-way LOCC channel $\mathcal{L}^{\to}_{AB\to A'B'}$ from Alice to Bob defined
as in (4.6.50). The values in the set X form the possible messages that can be
communicated from Alice to Bob and constitute the “classical communication”
part of LOCC. The set {M𝑥 }𝑥∈X of completely positive trace-non-increasing maps
and the set {N𝑥 }𝑥∈X of quantum channels specify the actions of Alice and Bob
for each value 𝑥 ∈ X and constitute the “local operations” part of LOCC. The set
{M𝑥 }𝑥∈X of completely positive trace-non-increasing maps that sum to a channel
essentially specifies a quantum instrument. The operations corresponding to these
maps can thus be considered probabilistic since the maps are not trace preserving.
In general, the party that performs the quantum instrument determines the direction
of classical communication and thus the direction of the LOCC channel. In this
case, Alice performs the classical communication since she performs the quantum
instrument. The values in the set X specify the outcomes of the instrument, and
Alice communicates the outcome to Bob, who performs the corresponding channel
selected from his set {N𝑥 }𝑥∈X of channels.
In more detail, let 𝜌 𝐴𝐵 be the initial state shared by Alice and Bob. Using
the definition in (4.4.53) for the channel corresponding to the quantum instrument
{M𝑥 }𝑥∈X , the state after applying the quantum instrument is
\[
\sum_{x\in\mathcal{X}} \mathcal{M}^x_{A\to A'}(\rho_{AB}) \otimes |x\rangle\langle x|_{X_A}, \tag{4.6.53}
\]

where the system 𝑋 𝐴 stores the classical information corresponding to the outcome
of the instrument. Alice then communicates the outcome of the instrument to Bob.
This classical communication can be understood via the noiseless classical channel
from 𝑋 𝐴 to 𝑋𝐵 defined by
\[
\theta_{X_A} \mapsto \sum_{x\in\mathcal{X}} \langle x|_{X_A}\, \theta_{X_A}\, |x\rangle_{X_A}\, |x\rangle\langle x|_{X_B}. \tag{4.6.54}
\]

The state in (4.6.53) thus gets transformed to

\[
\sum_{x\in\mathcal{X}} \mathcal{M}^x_{A\to A'}(\rho_{AB}) \otimes |x\rangle\langle x|_{X_B}. \tag{4.6.55}
\]

Figure 4.7: Illustration of an LOCC channel with 𝑘 rounds of alternating
one-way LOCC channels from Alice to Bob and from Bob to Alice. In each
round 𝑖, there is a finite alphabet X𝑖 , a set $\{\mathcal{M}^i_x\}_{x\in\mathcal{X}_i}$ of completely positive
trace-non-increasing maps that sum to a channel, and a set $\{\mathcal{N}^i_x\}_{x\in\mathcal{X}_i}$ of quantum
channels.

Finally, Bob applies the channel specified by
\[
\tau_B \otimes \omega_{X_B} \mapsto \sum_{x\in\mathcal{X}} \mathcal{N}^x_{B\to B'}(\tau_B) \otimes \langle x|_{X_B}\, \omega_{X_B}\, |x\rangle_{X_B}, \tag{4.6.56}
\]
which corresponds to Bob measuring his system 𝑋𝐵 in the basis $\{|x\rangle_{X_B}\}_{x\in\mathcal{X}}$ and
applying a quantum channel from the set $\{\mathcal{N}^x_{B\to B'}\}_{x\in\mathcal{X}}$ to the system 𝐵 based on the
outcome. The final state is then
\[
\sum_{x\in\mathcal{X}} (\mathcal{M}^x_{A\to A'} \otimes \mathcal{N}^x_{B\to B'})(\rho_{AB}), \tag{4.6.57}
\]

which is precisely the output of the one-way LOCC channel $\mathcal{L}^{\to}_{AB\to A'B'}$ defined in
(4.6.50). We can succinctly write the steps in (4.6.53)–(4.6.56) as
\[
\mathcal{L}^{\to}_{AB\to A'B'}(\rho_{AB}) = (\mathcal{N}_{BX_B\to B'} \circ \mathcal{C}_{X_A\to X_B} \circ \mathcal{M}_{A\to A'X_A})(\rho_{AB}), \tag{4.6.58}
\]
where the channel $\mathcal{M}_{A\to A'X_A}$ is defined as
\[
\mathcal{M}_{A\to A'X_A}(\xi_A) := \sum_{x\in\mathcal{X}} \mathcal{M}^x_{A\to A'}(\xi_A) \otimes |x\rangle\langle x|_{X_A}, \tag{4.6.59}
\]
and the channel $\mathcal{N}_{BX_B\to B'}$ is defined as
\[
\mathcal{N}_{BX_B\to B'}(\tau_B \otimes |x\rangle\langle x|_{X_B}) := \mathcal{N}^x_{B\to B'}(\tau_B) \tag{4.6.60}
\]
for all 𝑥 ∈ X. As indicated above, the channel $\mathcal{C}_{X_A\to X_B}$, defined in (4.6.54), is a
noiseless classical channel that transforms the classical register 𝑋𝐴 , held by Alice,
to the classical register 𝑋𝐵 (which is simply a copy of 𝑋𝐴 ), held by Bob.
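The one-way LOCC form in (4.6.50) can be made concrete with a small example. In the sketch below (assuming NumPy), Alice's instrument is a standard-basis measurement, $\mathcal{M}^x(\cdot) = |x\rangle\langle x|(\cdot)|x\rangle\langle x|$, and Bob's channel for outcome 𝑥 is conjugation by $X^x$. These specific choices, the input state, and the function names are illustrative assumptions and are not taken from the text.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
proj = [np.diag([1, 0]).astype(complex), np.diag([0, 1]).astype(complex)]

def one_way_locc(rho_AB):
    """Apply sum_x (M^x ⊗ N^x) and return the output and outcome probabilities."""
    out = np.zeros_like(rho_AB)
    probs = []
    for x in (0, 1):
        # Alice's instrument element x, tensored with Bob's correction X^x.
        K = np.kron(proj[x], np.linalg.matrix_power(X, x))
        branch = K @ rho_AB @ K.conj().T
        probs.append(np.trace(branch).real)   # probability of message x
        out = out + branch
    return out, probs

# Illustrative input: the maximally entangled state (|00> + |11>)/sqrt(2).
phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_AB = np.outer(phi, phi.conj())

out, probs = one_way_locc(rho_AB)

# Bob's reduced state after the protocol (partial trace over Alice's system).
rho_B = np.trace(out.reshape(2, 2, 2, 2), axis1=0, axis2=2)
```

For this input, each message occurs with probability 1/2, and Bob's correction leaves his system in the state |0⟩⟨0| regardless of the message.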


An example of an LOCC channel is illustrated in Figure 4.7. This is an

LOCC channel consisting of 𝑘 rounds of alternating Alice-to-Bob and Bob-to-Alice
one-way LOCC channels and is of the form

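\[
\mathcal{L}^{\leftrightarrow}_{A_0B_0\to A_kB_k} = \mathcal{L}^{k,\to}_{A_{k-1}B_{k-1}\to A_kB_k} \circ \mathcal{L}^{k-1,\leftarrow}_{A_{k-2}B_{k-2}\to A_{k-1}B_{k-1}} \circ \cdots \circ \mathcal{L}^{2,\leftarrow}_{A_1B_1\to A_2B_2} \circ \mathcal{L}^{1,\to}_{A_0B_0\to A_1B_1}. \tag{4.6.61}
\]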

For each round 𝑖, there is a finite alphabet X𝑖 consisting of the messages commu-
nicated in the round, along with a set {M𝑖𝑥 }𝑥∈X𝑖 of completely positive trace-non-
increasing maps that sum to a quantum channel and a set {N𝑖𝑥 }𝑥∈X𝑖 of quantum
channels. In a multi-round LOCC channel such as this one, it is possible for the
operation sets {M𝑖𝑥 }𝑥∈X𝑖 and {N𝑖𝑥 }𝑥∈X𝑖 on the 𝑖th round to depend on the outcomes
and actions taken in previous rounds. The quantum teleportation protocol in
Section 5.1 provides a concrete example of a one-way LOCC protocol from Alice
to Bob.

Definition 4.23 Separable Channel

A separable channel is a quantum channel S 𝐴𝐵→𝐴′ 𝐵′ such that
\[
\mathcal{S}_{AB\to A'B'} = \sum_{x\in\mathcal{X}} \mathcal{C}^x_{A\to A'} \otimes \mathcal{D}^x_{B\to B'} \tag{4.6.62}
\]

for some finite alphabet X and sets of completely positive and trace-non-
increasing maps {C𝑥 }𝑥∈X and {D𝑥 }𝑥∈X such that S is trace preserving.

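Every separable channel has a set of Kraus operators in product form; i.e., for
every separable channel $\mathcal{S}_{AB\to A'B'}$ as in (4.6.62) there exists a finite alphabet Y and
sets $\{C^y_{A\to A'}\}_{y\in\mathcal{Y}}$ and $\{D^y_{B\to B'}\}_{y\in\mathcal{Y}}$ such that
\[
\mathcal{S}_{AB\to A'B'}(\rho_{AB}) = \sum_{y\in\mathcal{Y}} (C^y_{A\to A'} \otimes D^y_{B\to B'})\, \rho_{AB}\, (C^y_{A\to A'} \otimes D^y_{B\to B'})^{\dagger} \tag{4.6.63}
\]
for all $\rho_{AB}$.
A key property of a separable channel is that it outputs a separable state if the input
state is separable. To see this, consider the separable state $\sigma_{AB} = \sum_{z\in\mathcal{Z}} p(z)\, \tau^z_A \otimes \omega^z_B$.
Then the output state $\mathcal{S}_{AB\to A'B'}(\sigma_{AB})$ is given by
\begin{align}
\mathcal{S}_{AB\to A'B'}(\sigma_{AB})
&= \sum_{y\in\mathcal{Y}} (C^y_{A\to A'} \otimes D^y_{B\to B'}) \left( \sum_{z\in\mathcal{Z}} p(z)\, \tau^z_A \otimes \omega^z_B \right) (C^y_{A\to A'} \otimes D^y_{B\to B'})^{\dagger} \tag{4.6.64} \\
&= \sum_{y\in\mathcal{Y},\, z\in\mathcal{Z}} p(z)\, C^y_{A\to A'}\, \tau^z_A\, (C^y_{A\to A'})^{\dagger} \otimes D^y_{B\to B'}\, \omega^z_B\, (D^y_{B\to B'})^{\dagger}, \tag{4.6.65}
\end{align}
and is manifestly separable.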

Proposition 4.24 LOCC Channels are Separable Channels

From (4.6.52) and (4.6.62), we conclude that every LOCC channel is a separable
channel.

The converse of Proposition 4.24 is not true. For example, let us define the
following operators:
\[
K_1 := \frac{1}{\sqrt{2}}\, |0\rangle\langle 0| + |1\rangle\langle 1|, \qquad K_2 := |0\rangle\langle 0|, \qquad K_3 := |1\rangle\langle 1|. \tag{4.6.66}
\]
Then, following the notation in (4.6.62), let
C1𝐴→𝐴′ (·) = 𝐾1 (·)𝐾1† , (4.6.67)
C2𝐴→𝐴′ (·) = 𝐾2 (·)𝐾2† , (4.6.68)
C3𝐴→𝐴′ (·) = 𝐾3 (·)𝐾3† , (4.6.69)
and
D1𝐵→𝐵′ (·) = 𝐾2 (·)𝐾2† , (4.6.70)
D2𝐵→𝐵′ (·) = 𝐾1 (·)𝐾1† , (4.6.71)
D3𝐵→𝐵′ (·) = 𝐾3 (·)𝐾3† . (4.6.72)
Then, the map
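\begin{align}
\mathcal{S}_{AB\to A'B'}(\cdot) &:= \sum_{x=1}^{3} (\mathcal{C}^x_{A\to A'} \otimes \mathcal{D}^x_{B\to B'})(\cdot) \tag{4.6.73} \\
&= (K_1 \otimes K_2)(\cdot)(K_1 \otimes K_2)^{\dagger} + (K_2 \otimes K_1)(\cdot)(K_2 \otimes K_1)^{\dagger} \tag{4.6.74} \\
&\quad + (K_3 \otimes K_3)(\cdot)(K_3 \otimes K_3)^{\dagger} \tag{4.6.75}
\end{align}
is a separable channel, but it can be shown that it is not an LOCC channel; please
consult the Bibliographic Notes in Section 4.8.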
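The claim that the map in (4.6.73) is a channel can be confirmed by checking that its product Kraus operators satisfy the completeness relation. The sketch below assumes NumPy; the product input state used for the second check is an arbitrary illustrative choice. The code does not address the harder claim that the channel is not LOCC.

```python
import numpy as np

# Operators from (4.6.66).
K1 = np.diag([1 / np.sqrt(2), 1.0])
K2 = np.diag([1.0, 0.0])
K3 = np.diag([0.0, 1.0])

# Product Kraus operators of the separable map in (4.6.74)-(4.6.75).
kraus = [np.kron(K1, K2), np.kron(K2, K1), np.kron(K3, K3)]

# Completeness relation: sum_y K_y^dagger K_y should be the identity.
completeness = sum(K.conj().T @ K for K in kraus)

# Apply the channel to a product (hence separable) input state |+>|+>.
plus = np.ones(2) / np.sqrt(2)
rho_in = np.kron(np.outer(plus, plus), np.outer(plus, plus))
rho_out = sum(K @ rho_in @ K.conj().T for K in kraus)

# Partial transpose on Bob's system; a separable output must be PPT.
pt = rho_out.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
min_eig = np.linalg.eigvalsh(pt).min()
```

The completeness relation holds exactly, and the partial transpose of the output has no negative eigenvalues, as expected for a separable output state.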
Figure 4.8: Depiction of an LOCC-simulable channel $\mathcal{N}_{A\to B}$ with associated
resource state $\omega_{RB'}$. The channel $\mathcal{N}_{A\to B}$ can be realized via the LOCC channel
$\mathcal{L}^{\leftrightarrow}_{RAB'\to B}$ and the resource state $\omega_{RB'}$, for some auxiliary system 𝑅. The LOCC
channel $\mathcal{L}^{\leftrightarrow}_{RAB'\to B}$ could in principle consist of a sequence of several rounds of
one-way LOCC channels, as depicted in Figure 4.7.

4.6.2.1 LOCC and Separable Simulation of Channels

Given a bipartite quantum channel N 𝐴𝐵→𝐴′ 𝐵′ , an important question related to the

physical realization of the channel is whether the channel is an LOCC channel.
This amounts to determining whether the channel can be decomposed as in (4.6.52).
More generally, one can ask whether a given channel N 𝐴→𝐵 can be simulated by an
LOCC channel acting on the input and a resource state, a notion that is illustrated
in Figure 4.8 and defined below.

Definition 4.25 LOCC-Simulable Channel

A channel $\mathcal{N}_{A\to B}$ is called LOCC-simulable with associated resource state $\omega_{RB'}$
if there exists an auxiliary system 𝑅 and an LOCC channel $\mathcal{L}^{\leftrightarrow}_{RAB'\to B}$ such that,
for every state $\rho_A$,
\[
\mathcal{N}(\rho_A) = \mathcal{L}^{\leftrightarrow}_{RAB'\to B}(\rho_A \otimes \omega_{RB'}). \tag{4.6.76}
\]
For the LOCC channel $\mathcal{L}^{\leftrightarrow}_{RAB'\to B}$, the input systems 𝑅𝐴 are Alice's, the input
system 𝐵′ is Bob's, Alice's output system is trivial, and Bob's output system
is 𝐵.

If a channel N 𝐴→𝐵 is LOCC-simulable with associated resource state 𝜔 𝑅𝐵′ , it

means that Alice and Bob, the sender and receiver of the channel, respectively,
can execute an LOCC channel of the form as depicted in Figure 4.7, with the
assistance of the auxiliary system 𝑅, such that the output on Bob’s system at the


end is N(𝜌 𝐴 ). The resource state 𝜔 𝑅𝐵′ is fixed, being such that the same resource
state can be used for every input state 𝜌 𝐴 on Alice’s system. A concrete example of
an LOCC simulation of a channel is shown in the context of quantum teleportation
in Section 5.1 below.
Due to the fact that separable channels strictly contain LOCC channels, it is
sensible to generalize the notion of teleportation simulation even further to this
case:

Definition 4.26 Separable-Simulable Channel

A channel N 𝐴→𝐵 is called separable-simulable with associated resource state
𝜔 𝑅𝐵′ if there exists an auxiliary system 𝑅 and a separable channel S 𝑅 𝐴𝐵′ →𝐵
such that, for every state 𝜌 𝐴 ,

N(𝜌 𝐴 ) = S 𝑅 𝐴𝐵′ →𝐵 (𝜌 𝐴 ⊗ 𝜔 𝑅𝐵′ ). (4.6.77)

For the separable channel $\mathcal{S}_{RAB'\to B}$, the input systems 𝑅𝐴 are Alice's, the input
system 𝐵′ is Bob's, Alice's output system is trivial, and Bob's output system
is 𝐵.

4.6.3 Completely PPT-Preserving Channels

In Definition 3.17, we introduced PPT states as bipartite states 𝜌 𝐴𝐵 such that

the partial transpose T𝐵 (𝜌 𝐴𝐵 ) is positive semi-definite. A class of channels that
preserve PPT states are called completely PPT-preserving channels, abbreviated as
C-PPT-P channels, and we define them formally as follows.

Definition 4.27 Completely PPT-Preserving Channel

A channel P 𝐴𝐵→𝐴′ 𝐵′ is called completely PPT-preserving if the following map
is a channel:
T𝐵′ ◦ P 𝐴𝐵→𝐴′ 𝐵′ ◦ T𝐵 . (4.6.78)

Proposition 4.28
Completely PPT-preserving channels preserve the set of PPT states.


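Proof: Suppose that $\rho_{AB}$ is a PPT state and that $\mathcal{P}_{AB\to A'B'}$ is a completely PPT-preserving
channel. If we take the partial transpose $T_{B'}$ on the output state
$\sigma_{A'B'} = \mathcal{P}_{AB\to A'B'}(\rho_{AB})$, then we find that
\begin{align}
T_{B'}(\sigma_{A'B'}) &= (T_{B'} \circ \mathcal{P}_{AB\to A'B'})(\rho_{AB}) \tag{4.6.79} \\
&= (T_{B'} \circ \mathcal{P}_{AB\to A'B'} \circ T_B)(T_B(\rho_{AB})), \tag{4.6.80}
\end{align}
where the second equality holds because $T_B \circ T_B$ is the identity map.
Since $\mathcal{P}_{AB\to A'B'}$ is completely PPT-preserving, by definition $T_{B'} \circ \mathcal{P}_{AB\to A'B'} \circ T_B$
is completely positive. Since $T_B(\rho_{AB})$ is positive semi-definite, this implies that $T_{B'}(\sigma_{A'B'})$ is
positive semi-definite, which means that the output state is a PPT state. ■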

Proposition 4.29
Every separable channel is a completely PPT-preserving channel.

Proof: Let S 𝐴𝐵→𝐴′ 𝐵′ be a separable channel. By definition, it has the form

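\[
\mathcal{S}_{AB\to A'B'} = \sum_{x\in\mathcal{X}} \mathcal{R}^x_{A\to A'} \otimes \mathcal{W}^x_{B\to B'}, \tag{4.6.81}
\]
where X is a finite alphabet and $\{\mathcal{R}^x\}_{x\in\mathcal{X}}$ and $\{\mathcal{W}^x\}_{x\in\mathcal{X}}$ are sets of completely
positive trace-non-increasing maps such that $\sum_{x\in\mathcal{X}} \mathcal{R}^x \otimes \mathcal{W}^x$ is trace preserving.
Then,
\[
T_{B'} \circ \mathcal{S}_{AB\to A'B'} \circ T_B = \sum_{x\in\mathcal{X}} \mathcal{R}^x_{A\to A'} \otimes (T_{B'} \circ \mathcal{W}^x_{B\to B'} \circ T_B). \tag{4.6.82}
\]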

By applying Lemma 4.30 below, we conclude that the maps T𝐵′ ◦ W𝑥𝐵→𝐵′ ◦ T𝐵
are completely positive for all 𝑥 ∈ X, which means that T𝐵′ ◦ S 𝐴𝐵→𝐴′ 𝐵′ ◦ T𝐵 is
completely positive. Therefore, S 𝐴𝐵→𝐴′ 𝐵′ is completely PPT-preserving. ■

As a consequence of Proposition 4.29, it follows that every LOCC channel is a

completely PPT-preserving channel, because every LOCC channel is a separable
channel.
The next lemma applies to channels that do not have a bipartite structure, i.e.,
there is no input or output system for Alice.


Lemma 4.30
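Let $\mathcal{N}_{B\to B'}$ be a completely positive map. Then the map $T_{B'} \circ \mathcal{N}_{B\to B'} \circ T_B$ is
completely positive, and its Choi operator is given by the full transpose of the
Choi operator for $\mathcal{N}_{B\to B'}$, i.e.,
\[
\Gamma^{T_{B'} \circ \mathcal{N}_{B\to B'} \circ T_B}_{BB'} = T(\Gamma^{\mathcal{N}}_{BB'}). \tag{4.6.83}
\]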

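Proof: Let $\Gamma_{\hat{B}B} = |\Gamma\rangle\langle\Gamma|_{\hat{B}B}$, where |Γ⟩ is defined in (2.2.34) and $d_{\hat{B}} = d_B$. Observe
that $T_B(\Gamma_{\hat{B}B}) = T_{\hat{B}}(\Gamma_{\hat{B}B})$, since
\begin{align}
T_B(\Gamma_{\hat{B}B}) &= \sum_{i,j=0}^{d_B-1} |i\rangle\langle j|_{\hat{B}} \otimes (|i\rangle\langle j|_B)^{T} \tag{4.6.84} \\
&= \sum_{i,j=0}^{d_B-1} |i\rangle\langle j|_{\hat{B}} \otimes |j\rangle\langle i|_B \tag{4.6.85} \\
&= \sum_{i,j=0}^{d_B-1} (|j\rangle\langle i|_{\hat{B}})^{T} \otimes |j\rangle\langle i|_B \tag{4.6.86} \\
&= T_{\hat{B}}(\Gamma_{\hat{B}B}). \tag{4.6.87}
\end{align}
Then the following holds for the Choi representation of $T_{B'} \circ \mathcal{N}_{B\to B'} \circ T_B$:
\begin{align}
\Gamma^{T_{B'} \circ \mathcal{N}_{B\to B'} \circ T_B}_{BB'} &= (T_{B'} \circ \mathcal{N}_{B\to B'} \circ T_B)(\Gamma_{\hat{B}B}) \tag{4.6.88} \\
&= ((T_{\hat{B}} \otimes T_{B'}) \circ \mathcal{N}_{B\to B'})(\Gamma_{\hat{B}B}) \tag{4.6.89} \\
&= T(\Gamma^{\mathcal{N}}_{BB'}). \tag{4.6.90}
\end{align}
Since the map $\mathcal{N}_{B\to B'}$ is completely positive, its Choi representation $\Gamma^{\mathcal{N}}_{BB'}$ is positive
semi-definite. Since positive semi-definiteness is preserved under transposition, we
find that $T(\Gamma^{\mathcal{N}}_{BB'})$ is positive semi-definite, which means that the map $T_{B'} \circ \mathcal{N}_{B\to B'} \circ T_B$
is completely positive (by applying Theorem 4.3). ■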
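The identity (4.6.83) can be checked numerically for a randomly chosen completely positive map. The sketch below assumes NumPy and uses the Choi operator convention $\sum_{i,j} |i\rangle\langle j| \otimes \mathcal{N}(|i\rangle\langle j|)$; the dimensions, the random Kraus operators, and the helper names are illustrative assumptions.

```python
import numpy as np

def choi(fn, d_in):
    """Choi operator sum_{i,j} |i><j| ⊗ fn(|i><j|) of a linear map fn."""
    blocks = []
    for i in range(d_in):
        row = []
        for j in range(d_in):
            E = np.zeros((d_in, d_in), dtype=complex)
            E[i, j] = 1
            row.append(fn(E))
        blocks.append(row)
    return np.block(blocks)

d_in, d_out = 2, 3
rng = np.random.default_rng(0)
# A completely positive map given by two random Kraus operators.
kraus = [rng.normal(size=(d_out, d_in)) + 1j * rng.normal(size=(d_out, d_in))
         for _ in range(2)]

def N(X):
    return sum(K @ X @ K.conj().T for K in kraus)

def conjugated(X):
    """The map T ∘ N ∘ T, with T the transpose in the standard basis."""
    return N(X.T).T

choi_N = choi(N, d_in)
choi_conj = choi(conjugated, d_in)
min_eig = np.linalg.eigvalsh(choi_conj).min()
```

Here `choi_conj` coincides with the full transpose `choi_N.T`, and its smallest eigenvalue is non-negative up to numerical precision, in line with the lemma.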

As a generalization of the lemma above, we have the following:


Proposition 4.31 Choi States of C-PPT-P Channels

Let $\mathcal{N}_{AB\to A'B'}$ be a bipartite channel. The channel $\mathcal{N}_{AB\to A'B'}$ is completely
PPT-preserving if and only if its Choi state $\Phi^{\mathcal{N}}_{ABA'B'}$ is a PPT state with respect
to the bipartite cut $AA'|BB'$.

Proof: We begin by proving the only-if part. Suppose that $\mathcal{N}_{AB\to A'B'}$ is completely
PPT-preserving. By definition, this implies that $T_{B'} \circ \mathcal{N}_{AB\to A'B'} \circ T_B$ is
completely positive. By the definition of complete positivity, we conclude that
$(T_{B'} \circ \mathcal{N}_{AB\to A'B'} \circ T_B)(\Phi_{\bar{A}\bar{B}AB})$ is a positive semi-definite operator, where the
Hilbert spaces corresponding to systems $\bar{A}$ and $\bar{B}$ are isomorphic to the Hilbert
spaces corresponding to the input systems 𝐴 and 𝐵, respectively. By employing a
calculation similar to that in (4.6.84)–(4.6.90), we conclude that
\[
(T_{B'} \circ \mathcal{N}_{AB\to A'B'} \circ T_B)(\Phi_{\bar{A}\bar{B}AB}) = ((T_{\bar{B}} \otimes T_{B'}) \circ \mathcal{N}_{AB\to A'B'})(\Phi_{\bar{A}\bar{B}AB}), \tag{4.6.91}
\]
which implies that the Choi state $\mathcal{N}_{AB\to A'B'}(\Phi_{\bar{A}\bar{B}AB})$ is a PPT state with respect to
the bipartite cut $AA'|BB'$. Running this calculation backwards and making use of
Theorem 4.3 establishes the if-part of the proposition. ■

4.6.3.1 PPT Simulation of Channels

Just as we asked whether a given channel N 𝐴→𝐵 can be simulated by an LOCC

channel, we can ask whether the channel N 𝐴→𝐵 can be simulated by a completely
PPT-preserving channel. This leads to the following definition:

Definition 4.32 PPT-Simulable Channel

A channel N 𝐴→𝐵 is called PPT-simulable with associated resource state 𝜔 𝑅𝐵′
if there exists an auxiliary system 𝑅 and a completely PPT-preserving channel
P 𝑅 𝐴𝐵′ →𝐵 such that for every state 𝜌 𝐴

N(𝜌 𝐴 ) = P 𝑅 𝐴𝐵′ →𝐵 (𝜌 𝐴 ⊗ 𝜔 𝑅𝐵′ ). (4.6.92)

For the C-PPT-P channel P 𝑅 𝐴𝐵′ →𝐵 , the input systems 𝑅 𝐴 are Alice’s, the input
system 𝐵′ is Bob’s, Alice’s output system is trivial, and Bob’s output system
is 𝐵.

Figure 4.9: Depiction of a PPT-simulable channel $\mathcal{N}_{A\to B}$ with associated
resource state $\omega_{RB'}$. The channel $\mathcal{N}_{A\to B}$ can be realized via the completely
PPT-preserving channel $\mathcal{P}_{RAB'\to B}$ and the resource state $\omega_{RB'}$, for some auxiliary
system 𝑅.

The definition of a PPT-simulable channel is illustrated in Figure 4.9.

4.6.4 Non-Signaling Channels

One of the main applications considered in this book is communication and, more
specifically, when communication is possible or impossible. To this end, suppose
that Alice and Bob are connected by means of a bipartite channel N 𝐴𝐵→𝐴′ 𝐵′ . Such
a channel is said to be non-signaling from Alice to Bob if it is impossible for Alice
to use it to communicate a message to Bob. We give a precise definition as follows:

Definition 4.33 Non-Signaling Channel

A bipartite channel N 𝐴𝐵→𝐴′ 𝐵′ is non-signaling from Alice to Bob if the following
condition holds

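\[
\operatorname{Tr}_{A'} \circ\, \mathcal{N}_{AB\to A'B'} = \operatorname{Tr}_{A'} \circ\, \mathcal{N}_{AB\to A'B'} \circ \mathcal{R}^{\pi}_A, \tag{4.6.93}
\]
where $\mathcal{R}^{\pi}_A$ is a replacer channel, defined as $\mathcal{R}^{\pi}_A(\cdot) := \operatorname{Tr}_A[\cdot]\,\pi_A$, with $\pi_A := \mathbb{1}_A/d_A$
the maximally mixed state on system 𝐴.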

To interpret the condition in (4.6.93), consider the following. For Bob, the
reduced state of his output system 𝐵′ is obtained by tracing out Alice’s output
system 𝐴′. Note that the reduced state on 𝐵′ is all that Bob can access at the output
in this scenario. If the condition in (4.6.93) holds, then the reduced state on Bob’s


output system 𝐵′ has no dependence on Alice’s input system. Thus, if (4.6.93)

holds, then Alice cannot use N 𝐴𝐵→𝐴′ 𝐵′ to send a signal to Bob.

Proposition 4.34 Choi Operator of a Non-Signaling Channel

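Let $\mathcal{N}_{AB\to A'B'}$ be a bipartite channel. Let
\[
\Gamma^{\mathcal{N}}_{ABA'B'} := \mathcal{N}_{\bar{A}\bar{B}\to A'B'}(\Gamma_{A\bar{A}} \otimes \Gamma_{B\bar{B}}) \tag{4.6.94}
\]
be the Choi operator of $\mathcal{N}_{AB\to A'B'}$, where $\bar{A}$ is isomorphic to 𝐴 and $\bar{B}$ is
isomorphic to 𝐵. The channel $\mathcal{N}_{AB\to A'B'}$ is non-signaling from Alice to Bob if
and only if its Choi operator $\Gamma^{\mathcal{N}}_{ABA'B'}$ satisfies the following condition:
\[
\operatorname{Tr}_{A'}[\Gamma^{\mathcal{N}}_{ABA'B'}] = \pi_A \otimes \operatorname{Tr}_{A'A}[\Gamma^{\mathcal{N}}_{ABA'B'}]. \tag{4.6.95}
\]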

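Proof: We begin by proving the only-if part. Consider that
\[
(\operatorname{Tr}_{A'} \circ\, \mathcal{N}_{\bar{A}\bar{B}\to A'B'})(\Gamma_{A\bar{A}} \otimes \Gamma_{B\bar{B}}) = \operatorname{Tr}_{A'}[\Gamma^{\mathcal{N}}_{ABA'B'}]. \tag{4.6.96}
\]
Also, consider that
\begin{align}
(\operatorname{Tr}_{A'} \circ\, \mathcal{N}_{\bar{A}\bar{B}\to A'B'} \circ \mathcal{R}^{\pi}_{\bar{A}})(\Gamma_{A\bar{A}} \otimes \Gamma_{B\bar{B}})
&= (\operatorname{Tr}_{A'} \circ\, \mathcal{N}_{\bar{A}\bar{B}\to A'B'})(\mathbb{1}_A \otimes \pi_{\bar{A}} \otimes \Gamma_{B\bar{B}}) \tag{4.6.97} \\
&= (\operatorname{Tr}_{A'} \circ\, \mathcal{N}_{\bar{A}\bar{B}\to A'B'})(\pi_A \otimes \mathbb{1}_{\bar{A}} \otimes \Gamma_{B\bar{B}}) \tag{4.6.98} \\
&= \pi_A \otimes (\operatorname{Tr}_{A'} \circ\, \mathcal{N}_{\bar{A}\bar{B}\to A'B'})(\mathbb{1}_{\bar{A}} \otimes \Gamma_{B\bar{B}}) \tag{4.6.99} \\
&= \pi_A \otimes (\operatorname{Tr}_{AA'} \circ\, \mathcal{N}_{\bar{A}\bar{B}\to A'B'})(\Gamma_{A\bar{A}} \otimes \Gamma_{B\bar{B}}) \tag{4.6.100} \\
&= \pi_A \otimes \operatorname{Tr}_{A'A}[\Gamma^{\mathcal{N}}_{ABA'B'}]. \tag{4.6.101}
\end{align}
If (4.6.93) holds, then the left-hand sides of (4.6.96) and (4.6.97) are equal.
Thus, we conclude that (4.6.93) implies (4.6.95).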

To see the other implication, we simply run the reasoning given above backwards
and note that two channels are equal if and only if their Choi operators are equal
(see Theorem 4.3). ■

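A one-way LOCC channel from Bob to Alice is an interesting example of
a bipartite channel that is non-signaling from Alice to Bob. Indeed, consider a
bipartite channel of the form in (4.6.51), and let us check that the condition in
(4.6.93) holds for such a channel. By tracing over the output system 𝐴′, we find that
\begin{align}
\operatorname{Tr}_{A'}[\mathcal{L}^{\leftarrow}_{AB\to A'B'}(\rho_{AB})]
&= \sum_{x\in\mathcal{X}} \operatorname{Tr}_{A'}[(\mathcal{N}^x_{A\to A'} \otimes \mathcal{M}^x_{B\to B'})(\rho_{AB})] \tag{4.6.102} \\
&= \sum_{x\in\mathcal{X}} \mathcal{M}^x_{B\to B'}(\operatorname{Tr}_A[\rho_{AB}]) \tag{4.6.103} \\
&= \sum_{x\in\mathcal{X}} \mathcal{M}^x_{B\to B'}(\rho_B). \tag{4.6.104}
\end{align}
The second equality follows because $\mathcal{N}^x_{A\to A'}$ is trace preserving for all 𝑥. Also,
consider that
\begin{align}
(\operatorname{Tr}_{A'} \circ\, \mathcal{L}^{\leftarrow}_{AB\to A'B'} \circ \mathcal{R}^{\pi}_A)(\rho_{AB})
&= (\operatorname{Tr}_{A'} \circ\, \mathcal{L}^{\leftarrow}_{AB\to A'B'})(\pi_A \otimes \rho_B) \tag{4.6.105} \\
&= \sum_{x\in\mathcal{X}} \operatorname{Tr}_{A'}[(\mathcal{N}^x_{A\to A'} \otimes \mathcal{M}^x_{B\to B'})(\pi_A \otimes \rho_B)] \tag{4.6.106} \\
&= \sum_{x\in\mathcal{X}} \mathcal{M}^x_{B\to B'}(\operatorname{Tr}_A[\pi_A \otimes \rho_B]) \tag{4.6.107} \\
&= \sum_{x\in\mathcal{X}} \mathcal{M}^x_{B\to B'}(\rho_B). \tag{4.6.108}
\end{align}
Thus, the condition in (4.6.93) holds, and as expected, $\mathcal{L}^{\leftarrow}_{AB\to A'B'}$ is non-signaling
from Alice to Bob.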
4.7 Summary

4.8 Bibliographic Notes

For an introduction to quantum dynamics, and in particular to the Schrödinger
equation and explicit forms for the unitaries 𝑈 (𝑡) describing the evolution of closed
quantum systems, please see the book of Sakurai (1994). The book of Breuer
and Petruccione (2002) provides a general introduction to the evolution of open
quantum systems as a quantum generalization of classical stochastic processes,
and shows how completely positive trace-preserving maps (quantum channels)
arise from extensions of the Schrödinger and von Neumann equations to “master
equations”. We also refer to the tutorial article of Milz and Modi (2021) for a

similar exposition on quantum channels from the perspective of stochastic processes

and master equations.
The Choi representation of a linear map is named after Choi (1975), with
his paper establishing the equivalence between complete positivity of a map and
positivity of its Choi representation. Naimark’s dilation theorem for positive
operator-valued measures was established by Naimark (1940) (see also Gelfand and
Naimark (1943)). Stinespring’s dilation theorem for completely positive maps was
established by Stinespring (1955). An early exposition of the theory of quantum
channels was presented by Kraus (1983). Leifer (2007) proposed the concept of
conditional states (see also (Leifer and Spekkens, 2013)). Proposition 4.4 was
found by Christandl and Winter (2004). The channel in (4.4.13) for reversing the
action of an isometric channel was presented by Wilde (2017a). The notions of a
complementary channel and a degradable channel were defined by Devetak and
Shor (2005). Anti-degradable channels were defined by Caruso and Giovannetti
(2006). The quantum erasure channel was defined by Grassl et al. (1997). The
connection of the amplitude damping channel to the bosonic pure-loss channel was
realized by Giovannetti and Fazio (2005). The quantum instrument formalism was
developed by Davies and Lewis (1970) and further developed by Ozawa (1984).
Entanglement-breaking channels were defined by Horodecki et al. (2003) and
several of their properties were established therein. Hadamard channels were
defined by King et al. (2007).
The Petz recovery map was established by Petz (1986b) and Petz (1988) in the
context of proving the conditions under which equality holds in the data-processing
inequality for the quantum relative entropy. The results therein are stated for von
Neumann algebras. A more accessible exposition that considers operators acting
only on finite-dimensional Hilbert spaces can be found in Petz (2003) and Mosonyi
and Petz (2004) (see also Hayden et al. (2004)).
The paradigm of local operations and classical communication was defined
by Bennett et al. (1996c), and its mathematical properties were explored in more
detail by Chitambar et al. (2014). See Section 4.3 of Chitambar et al. (2014) for a
justification of the example provided in Section 4.6.2 of a separable channel that is
not an LOCC channel. Separable channels were defined by Vedral et al. (1997);
Barnum et al. (1998). The existence of a separable channel that is not LOCC was
found by Bennett et al. (1999a). Completely PPT-preserving channels were defined
by Rains (1999a) and further developed by Chitambar et al. (2020). Non-signaling
channels were introduced by Beckman et al. (2001) and further considered by
Eggeling et al. (2002); Piani et al. (2006).

4.9 Problems
1. Let $\mathcal{N}$ be a quantum channel, and let $\{K_i\}_{i=1}^{r}$ and $\{K'_i\}_{i=1}^{s}$ be two sets of Kraus operators
for $\mathcal{N}$. Prove that these sets of Kraus operators are related by an isometry as in (4.3.3).

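2. Let $F_{AA'} = \sum_{i,j=0}^{d_A-1} |j,i\rangle\langle i,j|_{AA'}$ be the swap operator, as defined in (3.2.83), and let
$\mathcal{N}_{A\to B}$ be a superoperator. Consider the operator

$$F^{\mathcal{N}}_{AB} := \mathcal{N}_{A'\to B}(F_{AA'}) = \sum_{i,j=0}^{d_A-1} |j\rangle\langle i|_A \otimes \mathcal{N}(|i\rangle\langle j|_{A'}). \tag{4.9.1}$$

(a) Prove that $F^{\mathcal{N}}_{AB} = \mathrm{T}_A(\Gamma^{\mathcal{N}}_{AB})$.

(b) Prove that $\operatorname{Tr}_A[F^{\mathcal{N}}_{AB}] = \mathcal{N}_{A\to B}(\mathbb{1}_A)$.

(c) If $\mathcal{N}_{A\to B}$ is trace preserving, then prove that $\operatorname{Tr}_B[F^{\mathcal{N}}_{AB}] = \mathbb{1}_A$.

(d) Prove that $\langle X_A^{\dagger} \otimes Y_B, F^{\mathcal{N}}_{AB}\rangle = \langle Y_B, \mathcal{N}_{A\to B}(X_A)\rangle$ for all $X_A \in \mathrm{L}(\mathcal{H}_A)$ and $Y_B \in \mathrm{L}(\mathcal{H}_B)$.

(e) Prove that $F^{\mathcal{N}}_{AB}$ uniquely characterizes $\mathcal{N}$, just as the Choi representation, by showing
that
$$\mathcal{N}_{A\to B}(X_A) = \operatorname{Tr}_A[(X_A \otimes \mathbb{1}_B) F^{\mathcal{N}}_{AB}] \tag{4.9.2}$$
for every linear operator $X_A$.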

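(f) Prove that $F^{\mathcal{N}}_{AB}$ can be expressed in terms of the adjoint $\mathcal{N}^{\dagger}$ as follows:

$$F^{\mathcal{N}}_{AB} = \sum_{k,\ell=0}^{d_B-1} \mathcal{N}^{\dagger}(|k\rangle\langle \ell|_B)^{\dagger} \otimes |k\rangle\langle \ell|_B. \tag{4.9.3}$$

Conclude that $\operatorname{Tr}_B[F^{\mathcal{N}}_{AB}] = \mathcal{N}^{\dagger}(\mathbb{1}_B)^{\dagger}$. Furthermore, if $\mathcal{N}$ is Hermiticity preserving,
then conclude that $\operatorname{Tr}_B[F^{\mathcal{N}}_{AB}] = \mathcal{N}^{\dagger}(\mathbb{1}_B)$ and that $F^{\mathcal{N}}_{AB} = (\mathcal{N}^{\dagger})_{B'\to A}(F_{B'B})$, where
$F_{B'B} = \sum_{k,\ell=0}^{d_B-1} |\ell,k\rangle\langle k,\ell|_{B'B}$ is the swap operator for system $B$.

(g) Prove that, for every unitary operator $U_A$,

$$F^{\mathcal{N}}_{AB} = (\mathcal{U}_A \otimes \mathcal{N}_{A'\to B} \circ \mathcal{U}_{A'})(F_{AA'}). \tag{4.9.4}$$

(h) Prove that $\mathcal{N}_{A\to B}$ is a positive map if and only if $(\langle\psi|_A \otimes \langle\phi|_B) F^{\mathcal{N}}_{AB} (|\psi\rangle_A \otimes |\phi\rangle_B) \geq 0$
for all $|\psi\rangle_A \in \mathcal{H}_A$ and $|\phi\rangle_B \in \mathcal{H}_B$.
(Bibliographic Note: The representation in (4.9.1) was defined by de Pillis (1967). It is
sometimes called the Jamiołkowski representation of $\mathcal{N}$ due to the work of Jamiołkowski
(1972), who proved the necessary and sufficient condition on $F^{\mathcal{N}}_{AB}$ in (h) such that $\mathcal{N}_{A\to B}$
is positive.)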

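3. Consider the states $\zeta_{AB}$ and $\zeta^{\perp}_{AB}$ defined in (3.2.137) and (3.2.138), respectively, and
recall the channel $\mathcal{N}^{\rho}_{A\to B}$ defined in (4.2.16) for a given bipartite state $\rho_{AB}$.
(a) Show that
$$\mathcal{N}^{\zeta}_{A\to B}(X) = \frac{1}{d-1}\left(\operatorname{Tr}[X]\,\mathbb{1}_d - X^{\mathsf{T}}\right), \tag{4.9.5}$$
$$\mathcal{N}^{\zeta^{\perp}}_{A\to B}(X) = \frac{1}{d+1}\left(\operatorname{Tr}[X]\,\mathbb{1}_d + X^{\mathsf{T}}\right) \tag{4.9.6}$$
for every linear operator $X \in \mathrm{L}(\mathcal{H}_A)$.
(Bibliographic Note: The quantum channels $\mathcal{N}^{\zeta}$ and $\mathcal{N}^{\zeta^{\perp}}$ are sometimes called
Werner–Holevo channels, after Werner and Holevo (2002).)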

Chapter 5

Fundamental Quantum
Information Processing Tasks
Having studied quantum states, measurements, and channels in detail in the
previous two chapters, we are now ready to study three fundamental tasks in
quantum information processing: quantum teleportation, quantum super-dense
coding, and quantum hypothesis testing. Quantum hypothesis testing has been
studied since the late 1960s, with the aim of generalizing (classical) statistical
hypothesis testing to the quantum setting. The discovery of quantum teleportation
and super-dense coding in the early 1990s demonstrated the practical advantages
that entanglement could allow for with respect to communication, and it contributed
to the rise of quantum information science as a prominent field of study in both
theoretical and experimental physics.
All of the tasks that we study in this chapter provide us with prototypes of some
of the quantum communication scenarios that we consider in Parts II and III of this
book. In particular, listed below are the tasks and protocols that we study in this
chapter and how they are connected to the communication tasks that we study later.
• Quantum teleportation (Section 5.1) is connected to the task of quantum
communication (Chapter 14), and in particular to LOCC-assisted quantum
communication (Chapter 19).
• Quantum super-dense coding (Section 5.2) is connected to the task of entangle-
ment-assisted classical communication (Chapter 11).
• Quantum hypothesis testing (Section 5.3), in particular state discrimination
in Section 5.3.1 and Section 5.3.2, is connected to classical communication
(Chapter 12). Furthermore, asymmetric hypothesis testing in Section 5.3.3 is
fundamental to the analysis of every communication scenario that we consider
in this book, as it provides us with a method for placing an upper bound on the
rate of communication with a finite number of uses of a quantum channel.
Several fundamental quantities used in quantum information theory, particularly
in the analysis of quantum communication protocols, arise naturally in the context
of quantum hypothesis testing. For example, the trace distance arises in terms of
the optimal success probability for discriminating between two quantum states in
the task of symmetric hypothesis testing (Section 5.3.1), and similarly the diamond
distance arises in terms of the optimal success probability for discriminating between
two quantum channels in the task of symmetric quantum channel hypothesis testing
(Section 5.4). The Chernoff divergence quantifies the optimal error exponent
for symmetric hypothesis testing of two quantum states in the asymptotic setting
(Section 5.3.1.1), and this quantity is related to the Petz–Rényi relative entropy,
which we define later in Section 7.4. With respect to asymmetric hypothesis testing
of two quantum states, the optimal error probability defines the so-called hypothesis
testing relative entropy (Section 7.9), and in the asymptotic setting the optimal
error exponent is given by the quantum relative entropy (Section 7.2). The task
of hypothesis testing thus provides several fundamental quantities in quantum
information theory with an operational meaning, and we devote Chapters 6 and 7
to the detailed study of these and other quantities.

5.1 Quantum Teleportation

Quantum teleportation is a remarkable and fundamental protocol in quantum
information theory. The simplest version of the protocol allows two parties, Alice
and Bob, to transfer the state of a qubit from Alice to Bob while making use of
one shared pair of qubits in a maximally entangled state, along with two bits of
classical communication.

5.1.1 Qubit Teleportation Protocol

Before stating the basic teleportation protocol, let us start by introducing a key
element of the protocol, the Bell measurement.

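The Bell measurement is a measurement on two qubits defined by the POVM
$\{|\Phi_{z,x}\rangle\langle\Phi_{z,x}| : z,x \in \{0,1\}\}$, where we recall from (3.2.43) that the two-qubit Bell
states are defined as
$$|\Phi_{z,x}\rangle_{AB} = (\mathbb{1}_A \otimes X_B^{x} Z_B^{z})|\Phi\rangle_{AB} = (Z_A^{z} X_A^{x} \otimes \mathbb{1}_B)|\Phi\rangle_{AB} \tag{5.1.1}$$
for all $x, z \in \{0,1\}$, so that
\begin{align}
|\Phi_{0,0}\rangle &:= \frac{1}{\sqrt{2}}(|0,0\rangle + |1,1\rangle), \tag{5.1.2}\\
|\Phi_{1,0}\rangle &:= \frac{1}{\sqrt{2}}(|0,0\rangle - |1,1\rangle), \tag{5.1.3}\\
|\Phi_{0,1}\rangle &:= \frac{1}{\sqrt{2}}(|0,1\rangle + |1,0\rangle), \tag{5.1.4}\\
|\Phi_{1,1}\rangle &:= \frac{1}{\sqrt{2}}(|0,1\rangle - |1,0\rangle). \tag{5.1.5}
\end{align}
The Bell states are all maximally entangled, and they form an orthonormal basis for
the Hilbert space $\mathbb{C}^2 \otimes \mathbb{C}^2$ of two qubits. As such, we have that $\sum_{z,x=0}^{1} |\Phi_{z,x}\rangle\langle\Phi_{z,x}| = \mathbb{1}_2 \otimes \mathbb{1}_2$,
so that the set $\{|\Phi_{z,x}\rangle\langle\Phi_{z,x}| : z,x \in \{0,1\}\}$ is indeed a POVM. Furthermore,
the classical bits $x$ and $z$ can be viewed as being the outcomes of the measurement.
We can write the usual computational basis states for two qubits in terms of the
Bell states as
\begin{align}
|0,0\rangle &= \frac{1}{\sqrt{2}}(|\Phi_{0,0}\rangle + |\Phi_{1,0}\rangle), \tag{5.1.6}\\
|0,1\rangle &= \frac{1}{\sqrt{2}}(|\Phi_{0,1}\rangle + |\Phi_{1,1}\rangle), \tag{5.1.7}\\
|1,0\rangle &= \frac{1}{\sqrt{2}}(|\Phi_{0,1}\rangle - |\Phi_{1,1}\rangle), \tag{5.1.8}\\
|1,1\rangle &= \frac{1}{\sqrt{2}}(|\Phi_{0,0}\rangle - |\Phi_{1,0}\rangle). \tag{5.1.9}
\end{align}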
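The orthonormality and completeness of the Bell states can be checked numerically. The following is a minimal sketch, assuming NumPy and the tensor-factor ordering (A, B); the names `bell_state`, `gram`, and `povm_sum` are illustrative and do not come from the text.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# |Phi> = (|0,0> + |1,1>)/sqrt(2), with tensor-factor ordering (A, B).
phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)


def bell_state(z, x):
    """Return |Phi_{z,x}> = (Z^z X^x (x) 1)|Phi>, following (5.1.1)."""
    op = np.linalg.matrix_power(Z, z) @ np.linalg.matrix_power(X, x)
    return np.kron(op, I2) @ phi


labels = [(z, x) for z in (0, 1) for x in (0, 1)]
vecs = np.array([bell_state(z, x) for z, x in labels])

# Gram matrix of inner products; the identity means the states are orthonormal.
gram = vecs.conj() @ vecs.T

# Sum of the four rank-one projectors; the identity means they form a POVM.
povm_sum = sum(np.outer(v, v.conj()) for v in vecs)

print(np.allclose(gram, np.eye(4)), np.allclose(povm_sum, np.eye(4)))
```

Both checks should hold up to floating-point tolerance, which is all that such a numerical test can establish.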

We now detail the teleportation protocol; see Figure 5.1 for a circuit diagram
depicting the protocol. The protocol starts with Alice and Bob sharing two qubits
in the state |Φ⟩ 𝐴𝐵 . Alice has an additional qubit, which is in the state |𝜓⟩ 𝐴′ , that
she wishes to teleport to Bob, where
$$|\psi\rangle_{A'} = \alpha|0\rangle_{A'} + \beta|1\rangle_{A'}, \qquad \alpha, \beta \in \mathbb{C}, \qquad |\alpha|^2 + |\beta|^2 = 1. \tag{5.1.10}$$

Figure 5.1: Circuit diagram for the quantum teleportation protocol. The
protocol accomplishes the task of sending a quantum state $\psi$ from Alice to
Bob using a shared entangled state and two bits of classical communication.
The outcomes $(x, z)$ of Alice's Bell measurement on her qubits $A$ and $A'$ are
communicated to Bob, who applies the unitary $Z^{z} X^{x}$ on his qubit to transform
it to the state $\psi$ that Alice wished to send.

The state |𝜓⟩ 𝐴′ is arbitrary and need not be known to either Alice or Bob. The
overall joint state between Alice and Bob at the start of the protocol is therefore
$$|\psi\rangle_{A'} \otimes |\Phi\rangle_{AB} = \frac{1}{\sqrt{2}}\left(\alpha|0,0,0\rangle_{A'AB} + \alpha|0,1,1\rangle_{A'AB} + \beta|1,0,0\rangle_{A'AB} + \beta|1,1,1\rangle_{A'AB}\right). \tag{5.1.11}$$
Alice and Bob then proceed as follows.
1. Alice performs a Bell measurement on her two qubits 𝐴′ and 𝐴. To determine
the measurement outcomes and their probabilities, it is helpful to write down the
initial state (5.1.11) in the Bell basis on Alice’s systems. Using (5.1.6)–(5.1.9),
we find that
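\begin{align}
|\psi\rangle_{A'} \otimes |\Phi\rangle_{AB}
&= \frac{1}{2}\Big[|\Phi_{0,0}\rangle_{A'A} \otimes (\alpha|0\rangle_B + \beta|1\rangle_B) + |\Phi_{1,0}\rangle_{A'A} \otimes (\alpha|0\rangle_B - \beta|1\rangle_B) \notag\\
&\qquad + |\Phi_{0,1}\rangle_{A'A} \otimes (\alpha|1\rangle_B + \beta|0\rangle_B) + |\Phi_{1,1}\rangle_{A'A} \otimes (\alpha|1\rangle_B - \beta|0\rangle_B)\Big] \tag{5.1.12}\\
&= \frac{1}{2}\Big[|\Phi_{0,0}\rangle_{A'A} \otimes |\psi\rangle_B + |\Phi_{1,0}\rangle_{A'A} \otimes Z_B|\psi\rangle_B \notag\\
&\qquad + |\Phi_{0,1}\rangle_{A'A} \otimes X_B|\psi\rangle_B + |\Phi_{1,1}\rangle_{A'A} \otimes X_B Z_B|\psi\rangle_B\Big]. \tag{5.1.13}
\end{align}
From this, it is clear that each outcome $(x, z) \in \{0,1\}^2$ of the Bell measurement
occurs with equal probability $\frac{1}{4}$ and that the state of Bob's qubit after the
measurement is $X_B^{x} Z_B^{z}|\psi\rangle_B$.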

Exercise 5.1
Verify (5.1.12) and (5.1.13).

2. Alice communicates to Bob the two classical bits 𝑥 and 𝑧 resulting from the
Bell measurement.
3. Upon receiving the measurement outcomes, Bob performs 𝑋 𝑥 and then 𝑍 𝑧 on
his qubit. The resulting state of Bob’s qubit is |𝜓⟩.
Although we have described the teleportation protocol using a pure state |𝜓⟩ 𝐴′
as the state being teleported, the protocol applies just as well if the state to be
teleported is a mixed state 𝜌 𝐴′ .
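The three steps of the qubit protocol can be simulated directly on state vectors. The sketch below assumes NumPy, the ordering (A′, A, B) for the joint state, and a randomly drawn input state with a fixed seed; the helper `pw` and the dictionary `results` are illustrative names only.

```python
import numpy as np

rng = np.random.default_rng(0)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)


def pw(op, k):
    """Integer matrix power, so that pw(op, 0) is the identity."""
    return np.linalg.matrix_power(op, k)


# Random qubit state |psi> = alpha|0> + beta|1> held by Alice on A'.
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)

# Joint state |psi>_{A'} (x) |Phi>_{AB}, as in (5.1.11).
joint = np.kron(psi, phi)

results = {}
for z in (0, 1):
    for x in (0, 1):
        # Step 1: project A'A onto |Phi_{z,x}>; what remains is Bob's
        # unnormalized state (<Phi_{z,x}|_{A'A} (x) 1_B)|joint>.
        bell = np.kron(pw(Z, z) @ pw(X, x), I2) @ phi
        bob = bell.conj() @ joint.reshape(4, 2)
        prob = np.vdot(bob, bob).real
        # Steps 2 and 3: given (z, x), Bob applies X^x and then Z^z.
        corrected = pw(Z, z) @ pw(X, x) @ bob / np.sqrt(prob)
        fidelity = abs(np.vdot(psi, corrected)) ** 2
        results[(z, x)] = (prob, fidelity)

for outcome, (prob, fidelity) in results.items():
    print(outcome, round(prob, 6), round(fidelity, 6))
```

Under these assumptions, every outcome should appear with probability 1/4 and unit fidelity after the correction, in line with (5.1.13).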

5.1.2 Qudit Teleportation Protocol

The teleportation protocol for qubits described above can be easily generalized to
qudits using the Heisenberg–Weyl operators {𝑊𝑧,𝑥 : 0 ≤ 𝑧, 𝑥 ≤ 𝑑 − 1} introduced
in Definition 3.7. Specifically, recall from (3.2.58) that we define the two-qudit
Bell states in terms of the Heisenberg–Weyl operators as follows:

$$|\Phi_{z,x}\rangle_{AB} := (W_A^{z,x} \otimes \mathbb{1}_B)|\Phi\rangle_{AB}, \tag{5.1.14}$$

which is a direct generalization of (5.1.1), where $|\Phi\rangle_{AB} = \frac{1}{\sqrt{d}}\sum_{i=0}^{d-1} |i,i\rangle_{AB}$. Just
like the qubit Bell states, the qudit Bell states form an orthonormal basis for C𝑑 ⊗ C𝑑
(see Exercise 3.12). This means that the set of operators

{|Φ𝑧,𝑥 ⟩⟨Φ𝑧,𝑥 | : 0 ≤ 𝑧, 𝑥 ≤ 𝑑 − 1} (5.1.15)

constitutes a POVM, which is the POVM corresponding to the Bell measurement
on two qudits.
Now we start, like before, with Alice holding two qudits, one shared with Bob
and in the joint state $|\Phi\rangle_{AB}$, and the other in the state $|\psi\rangle_{A'}$, where
$$|\psi\rangle_{A'} = \sum_{i=0}^{d-1} c_i |i\rangle_{A'}, \qquad \sum_{i=0}^{d-1} |c_i|^2 = 1. \tag{5.1.16}$$
The state $|\psi\rangle_{A'}$ is the one to be teleported to Bob's system. The starting joint state
on the three qudits $A'$, $A$, and $B$ is
$$|\psi\rangle_{A'} \otimes |\Phi\rangle_{AB} = \frac{1}{\sqrt{d}}\sum_{i,j=0}^{d-1} c_i |i,j,j\rangle_{A'AB}. \tag{5.1.17}$$

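Alice then performs a Bell measurement on her two qudits. By writing the Bell
states $|\Phi_{z,x}\rangle_{A'A}$ as
$$|\Phi_{z,x}\rangle_{A'A} = \frac{1}{\sqrt{d}}\sum_{k=0}^{d-1} e^{\frac{2\pi i (k+x) z}{d}} |k+x, k\rangle_{A'A}, \tag{5.1.18}$$
with all arithmetic on basis labels taken modulo $d$, for each outcome $(z, x) \in \{0, \dots, d-1\}^2$, we use (3.3.22) to find that the
corresponding (unnormalized) post-measurement state of Bob's qudit is
\begin{align}
(\langle\Phi_{z,x}|_{A'A} \otimes \mathbb{1}_B)(|\psi\rangle_{A'} \otimes |\Phi\rangle_{AB})
&= \frac{1}{d}\sum_{j,k,\ell=0}^{d-1} c_{\ell}\, e^{-\frac{2\pi i (k+x) z}{d}} \langle k+x|\ell\rangle\langle k|j\rangle |j\rangle_B \tag{5.1.19}\\
&= \frac{1}{d}\sum_{k=0}^{d-1} c_{k+x}\, e^{-\frac{2\pi i (k+x) z}{d}} |k\rangle_B \tag{5.1.20}\\
&= \frac{1}{d}\sum_{k'=0}^{d-1} c_{k'}\, e^{-\frac{2\pi i k' z}{d}} |k'-x\rangle_B \tag{5.1.21}\\
&= \frac{1}{d}\sum_{k'=0}^{d-1} c_{k'}\, X(-x) Z(-z) |k'\rangle_B \tag{5.1.22}\\
&= \frac{1}{d} X(-x) Z(-z) |\psi\rangle_B. \tag{5.1.23}
\end{align}
Therefore, each outcome occurs with probability $\frac{1}{d^2}$ and the corresponding
post-measurement state of Bob's qudit is $X(-x) Z(-z)|\psi\rangle_B$. This means that Bob,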
upon receiving the two classical values corresponding to the outcome of the Bell
measurement, can apply the unitary 𝑍 (𝑧) 𝑋 (𝑥) = 𝑊𝑧,𝑥 in order to transform the
state of his qudit to |𝜓⟩𝐵 , completing the teleportation protocol.
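The derivation in (5.1.19)–(5.1.23) can be checked numerically for a small dimension. The sketch below assumes NumPy, takes d = 3 as an arbitrary choice, and uses the conventions X(x)|k⟩ = |k + x mod d⟩ and Z(z)|k⟩ = e^{2πikz/d}|k⟩ that are implied by (5.1.18); the function names `shift`, `clock`, and `weyl` are illustrative.

```python
import numpy as np

d = 3  # arbitrary small dimension for the check
rng = np.random.default_rng(1)
omega = np.exp(2j * np.pi / d)


def shift(x):
    """X(x)|k> = |k + x mod d>."""
    return np.roll(np.eye(d), x, axis=0)


def clock(z):
    """Z(z)|k> = exp(2*pi*i*k*z/d)|k>."""
    return np.diag(omega ** (z * np.arange(d)))


def weyl(z, x):
    """Heisenberg-Weyl operator W_{z,x} = Z(z) X(x)."""
    return clock(z) @ shift(x)


# Maximally entangled state (1/sqrt(d)) sum_i |i,i>.
phi = np.eye(d).reshape(d * d) / np.sqrt(d)

# Random state to be teleported, and the joint state on (A', A, B).
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
joint = np.kron(psi, phi)

checks = []
for z in range(d):
    for x in range(d):
        bell = np.kron(weyl(z, x), np.eye(d)) @ phi      # |Phi_{z,x}>_{A'A}
        bob = bell.conj() @ joint.reshape(d * d, d)       # unnormalized state of B
        prob = np.vdot(bob, bob).real
        matches = np.allclose(bob, shift(-x) @ clock(-z) @ psi / d)  # (5.1.23)
        recovered = weyl(z, x) @ bob / np.sqrt(prob)      # Bob applies W_{z,x}
        checks.append((np.isclose(prob, 1 / d**2), matches,
                       np.allclose(recovered, psi)))

print(all(all(c) for c in checks))
```

The same script should work for other values of `d`, since nothing in it is specific to d = 3.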
Of course, the teleportation protocol works just as well if the state to be
teleported is mixed.

Figure 5.2: The qudit teleportation protocol, depicted on the left, can be
regarded as an LOCC channel $\mathcal{T}_{A'AB\to B}$ with the input states $\rho_{A'}$ and $|\Phi\rangle\langle\Phi|_{AB}$
and the output state $\rho_B$, as shown on the right.

Also, as shown in Figure 5.2, we can write down the entire
teleportation protocol as a one-way LOCC channel $\mathcal{T}_{A'AB\to B}$ from Alice to Bob,
with the state 𝜌 𝐴′ to be teleported and the state |Φ⟩⟨Φ| 𝐴𝐵 as the inputs. By the
analysis above, the output state received by Bob is exactly the same as the original
input state:
T 𝐴′ 𝐴𝐵→𝐵 (𝜌 𝐴′ ⊗ |Φ⟩⟨Φ| 𝐴𝐵 ) = 𝜌 𝐵 . (5.1.24)
This equation can also be interpreted as follows:
T 𝐴′ 𝐴𝐵→𝐵 ((·) 𝐴′ ⊗ |Φ⟩⟨Φ| 𝐴𝐵 ) = id 𝐴′ →𝐵 (·), (5.1.25)
i.e., that the teleportation protocol simulates the identity channel.
To see that the teleportation protocol can indeed be viewed as a one-way LOCC
channel from Alice to Bob, let us explicitly write down the quantum channel
T 𝐴′ 𝐴𝐵→𝐵 defined above in the form of (4.6.58), i.e.,
T 𝐴′ 𝐴𝐵→𝐵 = D𝐵𝑌1𝑌2 →𝐵 ◦ C 𝑋1 𝑋2 →𝑌1𝑌2 ◦ E 𝐴′ 𝐴→𝑋1 𝑋2 (5.1.26)
which we recall is equivalent to the form in (4.6.50). We have that
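\begin{align}
\mathcal{E}_{A'A\to X_1X_2} &= \sum_{z,x=0}^{d-1} \mathcal{E}^{z,x}_{A'A\to\emptyset} \otimes |z,x\rangle\langle z,x|_{X_1X_2}, \tag{5.1.27}\\
\mathcal{E}^{z,x}_{A'A\to\emptyset}(\cdot) &:= \operatorname{Tr}_{A'A}\!\left[|\Phi_{z,x}\rangle\langle\Phi_{z,x}|_{A'A}(\cdot)\right], \tag{5.1.28}\\
\mathcal{C}_{X_1X_2\to Y_1Y_2}(|z,x\rangle\langle z,x|_{X_1X_2}) &= |z,x\rangle\langle z,x|_{Y_1Y_2}, \tag{5.1.29}\\
\mathcal{D}_{BY_1Y_2\to B}((\cdot)_B \otimes |z,x\rangle\langle z,x|_{Y_1Y_2}) &= \mathcal{D}^{z,x}_{B\to B}(\cdot), \tag{5.1.30}\\
\mathcal{D}^{z,x}_{B\to B}(\cdot) &:= W_{z,x}(\cdot)W_{z,x}^{\dagger}. \tag{5.1.31}
\end{align}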

Exercise 5.2
Combine the quantum channels in (5.1.27)–(5.1.31) according to (5.1.26) and
conclude that the channel T 𝐴′ 𝐴𝐵→𝐵 can be written as
$$\mathcal{T}_{A'AB\to B}(\sigma_{A'AB}) = \sum_{z,x=0}^{d-1} \operatorname{Tr}_{A'A}\!\left[\Phi^{z,x}_{A'A}\, W^{z,x}_B (\sigma_{A'AB}) (W^{z,x}_B)^{\dagger}\right] \tag{5.1.32}$$

for every state 𝜎𝐴′ 𝐴𝐵 . Verify that, for the input state 𝜎𝐴′ 𝐴𝐵 = 𝜌 𝐴′ ⊗ Φ 𝐴𝐵 , we
get T 𝐴′ 𝐴𝐵→𝐵 (𝜌 𝐴′ ⊗ Φ 𝐴𝐵 ) = 𝜌 𝐵 , as expected.
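As a numerical sanity check of (5.1.32), and not a substitute for the exercise, the channel can be written with Kraus-type operators ⟨Φ_{z,x}|_{A′A} ⊗ W_{z,x} acting on A′AB. The sketch below assumes NumPy, d = 2, the ordering (A′, A, B), and a random mixed input; `teleport_channel` is an illustrative name.

```python
import numpy as np

d = 2  # arbitrary small dimension for the check
rng = np.random.default_rng(2)
omega = np.exp(2j * np.pi / d)


def weyl(z, x):
    """W_{z,x} = Z(z) X(x) with X(x)|k> = |k + x mod d>."""
    shift = np.roll(np.eye(d), x, axis=0)
    clock = np.diag(omega ** (z * np.arange(d)))
    return clock @ shift


phi = np.eye(d).reshape(d * d) / np.sqrt(d)
Phi = np.outer(phi, phi.conj())


def teleport_channel(sigma):
    """Apply (5.1.32) to an operator sigma on A'AB, returning an operator on B."""
    out = np.zeros((d, d), dtype=complex)
    for z in range(d):
        for x in range(d):
            W = weyl(z, x)
            bell = np.kron(W, np.eye(d)) @ phi
            # K = <Phi_{z,x}|_{A'A} (x) W_B maps A'AB to B, so that
            # Tr_{A'A}[Phi^{z,x} W sigma W^dagger] = K sigma K^dagger.
            K = np.kron(bell.conj().reshape(1, d * d), W)
            out += K @ sigma @ K.conj().T
    return out


# Random mixed state rho on A'.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real

out = teleport_channel(np.kron(rho, Phi))
print(np.allclose(out, rho))
```

Because the map is built from the operators K, complete positivity is automatic; trace preservation follows from the Bell states forming a basis and can be tested on any input state.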

We can also connect with the previously defined notion of LOCC simulation of
a quantum channel (Definition 4.25). That is, we can understand the teleportation
protocol and (5.1.24) as demonstrating that the identity channel is LOCC simulable
with associated resource state given by the maximally entangled state. Now, by
using the teleportation protocol in conjunction with a quantum channel N 𝐴→𝐵 ,
we find that every quantum channel N 𝐴→𝐵 is LOCC simulable with associated
resource state given by the maximally entangled state of an appropriate Schmidt
rank. To see this, observe that the channel N 𝐴→𝐵 can be trivially written as
N 𝐴→𝐵 = N𝐵′ →𝐵 ◦ id 𝐴→𝐵′ , where 𝐵′ is an auxiliary system with the same dimension
as 𝐴. Then, by (5.1.25), we can simulate the identity channel id 𝐴→𝐵′ using the usual
teleportation protocol, so that the overall LOCC channel L is N𝐵′ →𝐵 ◦ T 𝐴𝐴′ 𝐵′ →𝐵′
and the resource state is |Φ⟩⟨Φ| 𝐴′ 𝐵′ , with the dimension of 𝐴′ equal to the dimension
of 𝐴. This is illustrated in Figure 5.3. We can also simulate N via teleportation
in a different manner, in which Alice locally applies the channel N to her input
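state $\rho_A$, then teleports the resulting state to Bob. Mathematically, we write this
as $\mathcal{N}_{A\to B} = \mathrm{id}_{\hat{A}\to B} \circ \mathcal{N}_{A\to\hat{A}} = \mathcal{T}_{\hat{A}\tilde{A}B\to B} \circ \mathcal{N}_{A\to\hat{A}}$, where $\hat{A}$, $\tilde{A}$ are auxiliary systems
with the same dimension as $B$. We thus have the following two ways to
represent the action of the channel $\mathcal{N}$ using teleportation:
\begin{align}
\mathcal{N}_{A\to B}(\rho_A) &= \mathcal{N}_{B'\to B}\!\left(\mathcal{T}_{AA'B'\to B'}(\rho_A \otimes |\Phi\rangle\langle\Phi|_{A'B'})\right) \tag{5.1.33}\\
&= \mathcal{T}_{\hat{A}\tilde{A}B\to B}\!\left(\mathcal{N}_{A\to\hat{A}}(\rho_A) \otimes |\Phi\rangle\langle\Phi|_{\tilde{A}B}\right). \tag{5.1.34}
\end{align}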

Figure 5.3: Every quantum channel $\mathcal{N}$ is teleportation simulable, because Alice
and Bob can first perform the usual teleportation protocol, and then Bob can
apply the channel $\mathcal{N}$ to the state teleported by Alice. The combination of these
two steps can be taken as the LOCC channel $\mathcal{L}^{\to}_{A'AB'\to B}$.

Depending on whether the input dimension is smaller than the output dimension
of the channel, there can be a more economical way to perform the simulation.
If the channel's output dimension is smaller than its input dimension, then the
more economical way to simulate the channel is for Alice to apply $\mathcal{N}_{A\to B}$ first and
then for Alice and Bob to perform the teleportation protocol. In this way, they
exploit a maximally entangled state of Schmidt rank $d_B$ in order to accomplish the
simulation. If the channel’s input dimension is smaller than its output dimension,
then the more economical way to simulate the channel is for Alice and Bob to
perform the teleportation protocol first, and then for Bob to apply the channel
locally. In this way, they exploit a maximally entangled state of Schmidt rank 𝑑 𝐴
in order to accomplish the simulation. As we see in Section 5.1.4, depending on
the channel and its symmetries, there can be even more economical methods to
simulate a quantum channel via teleportation.

5.1.2.1 Qudit Teleportation Protocol With Respect to a Finite Group

The qudit teleportation protocol outlined above is based on the Heisenberg–Weyl
operators $\{W_{z,x} : 0 \leq z, x \leq d-1\}$ acting on a $d$-dimensional system, which are
used to form the qudit Bell states and thus the Bell measurement.
More generally, we can consider an arbitrary finite group $G$ and an irreducible
unitary representation $\{U^{g}\}_{g\in G}$ of $G$ acting on a $d$-dimensional Hilbert space,
where $d^2 \leq |G|$. It then follows from Schur's lemma (see Bibliographic Notes in
Section 3.4) that
$$\frac{1}{|G|}\sum_{g\in G} U^{g} \rho\, U^{g\dagger} = \operatorname{Tr}[\rho]\,\frac{\mathbb{1}_d}{d} \tag{5.1.35}$$
for every state $\rho$. In particular, for bipartite states $\rho_{AB}$ in which the systems $A$ and
$B$ are both $d$-dimensional, we find that
$$\frac{1}{|G|}\sum_{g\in G} (U^{g}_A \otimes \mathbb{1}_B)\, \rho_{AB}\, (U^{g\dagger}_A \otimes \mathbb{1}_B) = \frac{\mathbb{1}_A}{d} \otimes \operatorname{Tr}_A[\rho_{AB}]. \tag{5.1.36}$$

Now, let us take the maximally entangled state |Φ⟩ 𝐴𝐵 and define the states

$$|\Phi^{g}\rangle_{AB} := (U^{g}_A \otimes \mathbb{1}_B)|\Phi\rangle_{AB}. \tag{5.1.37}$$

We call these states the generalized Bell states. We see that these states are a direct
generalization of the usual qudit Bell states in (3.2.58).

Exercise 5.3
Using (5.1.36), prove that
$$\frac{1}{|G|}\sum_{g\in G} |\Phi^{g}\rangle\langle\Phi^{g}|_{AB} = \frac{\mathbb{1}_{AB}}{d^2}. \tag{5.1.38}$$

By defining the operators

$$M^{g}_{AB} := \frac{d^2}{|G|}\,|\Phi^{g}\rangle\langle\Phi^{g}|_{AB}, \tag{5.1.39}$$
we find that
$$\sum_{g\in G} M^{g}_{AB} = \mathbb{1}_{AB}. \tag{5.1.40}$$

Since the operators $M^{g}_{AB}$ satisfy $0 \leq M^{g}_{AB} \leq \mathbb{1}_{AB}$ for all $g \in G$ (the right-most
inequality due to the assumption $d^2 \leq |G|$), we conclude that the set $\{M^{g}_{AB}\}_{g\in G}$
constitutes a POVM. This POVM defines the $G$-Bell measurement.
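To illustrate a case with $d^2 < |G|$, the sketch below uses the single-qubit Pauli group of order 16 (the four Pauli matrices with phases ±1, ±i), whose defining representation on a qubit is irreducible. This choice of group is an assumption made for illustration and is not taken from the text; NumPy is assumed, and the names `group`, `twirl`, and `povm` are illustrative.

```python
import numpy as np

d = 2
rng = np.random.default_rng(3)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Single-qubit Pauli group: |G| = 16, so d^2 = 4 < |G|.
group = [phase * P for phase in (1, 1j, -1, -1j) for P in (I2, X, Y, Z)]

phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

# Check (5.1.35) on a random state rho.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real
twirl = sum(U @ rho @ U.conj().T for U in group) / len(group)


def g_bell_projector(U):
    """|Phi^g><Phi^g| with |Phi^g> = (U (x) 1)|Phi>, as in (5.1.37)."""
    v = np.kron(U, I2) @ phi
    return np.outer(v, v.conj())


# POVM elements M^g from (5.1.39).
povm = [(d**2 / len(group)) * g_bell_projector(U) for U in group]
povm_sum = sum(povm)
largest_eig = max(np.linalg.eigvalsh(M).max() for M in povm)

print(np.allclose(twirl, I2 / d), np.allclose(povm_sum, np.eye(d * d)), largest_eig)
```

Here each element $M^g$ is a rank-one operator with eigenvalue $d^2/|G| = 1/4$, so the elements are not projections, but they still sum to the identity.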
We use the 𝐺-Bell measurement defined by the POVM {𝑀 𝑔 }𝑔∈𝐺 in order to
construct the generalized teleportation protocol. The protocol proceeds as follows.
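As before, Alice and Bob start by sharing two qudits in the state $|\Phi\rangle_{AB}$, with Alice
holding an extra qudit in the state $|\psi\rangle_{A'}$ that is to be teleported to Bob.
1. Alice performs, on her qudits $A'$ and $A$, the generalized Bell measurement
given by the POVM $\{M^{g}_{A'A}\}_{g\in G}$. For each outcome $g \in G$ of the measurement,
according to (3.3.22) the (unnormalized) post-measurement state of Bob's
qudit is
\begin{align}
&\frac{d}{\sqrt{|G|}}\left(\langle\Phi^{g}|_{A'A} \otimes \mathbb{1}_B\right)\left(|\psi\rangle_{A'} \otimes |\Phi\rangle_{AB}\right) \notag\\
&\quad= \frac{d}{\sqrt{|G|}}\left(\langle\Phi|_{A'A}(U^{g\dagger}_{A'} \otimes \mathbb{1}_A) \otimes \mathbb{1}_B\right)\left(|\psi\rangle_{A'} \otimes |\Phi\rangle_{AB}\right) \tag{5.1.41}\\
&\quad= \frac{d}{\sqrt{|G|}}\left(\frac{1}{\sqrt{d}}\sum_{k=1}^{d} \langle k,k|_{A'A}(U^{g\dagger}_{A'} \otimes \mathbb{1}_A) \otimes \mathbb{1}_B\right)\left(|\psi\rangle_{A'} \otimes \frac{1}{\sqrt{d}}\sum_{k'=1}^{d} |k',k'\rangle_{AB}\right) \tag{5.1.42}\\
&\quad= \frac{1}{\sqrt{|G|}}\sum_{k,k'=1}^{d} \langle k|_{A'} U^{g\dagger}_{A'} |\psi\rangle_{A'}\, |k\rangle_B \langle k|k'\rangle \tag{5.1.43}\\
&\quad= \frac{1}{\sqrt{|G|}}\sum_{k=1}^{d} \langle k|_{A'} U^{g\dagger}_{A'} |\psi\rangle_{A'}\, |k\rangle_B \tag{5.1.44}\\
&\quad= \frac{1}{\sqrt{|G|}}\, U^{g\dagger}_B |\psi\rangle_B. \tag{5.1.45}
\end{align}
We see that each outcome occurs with probability $\frac{1}{|G|}$, and the post-measurement
state of Bob's qudit is $U^{g\dagger}_B|\psi\rangle_B$.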
2. Alice communicates the outcome 𝑔 resulting from the measurement to Bob.
3. Upon receiving the measurement outcome, Bob applies 𝑈 𝑔 on his qudit. The
resulting state of Bob’s qudit is |𝜓⟩𝐵 .
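The three steps above can be simulated for a concrete group. The sketch below again takes the single-qubit Pauli group of order 16 as an illustrative assumption (it is not the example used in the text), assumes NumPy and the ordering (A′, A, B), and checks (5.1.45) together with Bob's correction; the list `outcomes` is an illustrative name.

```python
import numpy as np

d = 2
rng = np.random.default_rng(4)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
group = [phase * P for phase in (1, 1j, -1, -1j) for P in (I2, X, Y, Z)]
order = len(group)

phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

# Random state to be teleported, and the joint state on (A', A, B).
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
joint = np.kron(psi, phi)

outcomes = []
for U in group:
    # Step 1: outcome g of the G-Bell measurement on A'A. Since M^g is
    # rank one, (d/sqrt|G|) <Phi^g|_{A'A} serves as its measurement operator.
    bell = np.kron(U, I2) @ phi
    bob = (d / np.sqrt(order)) * (bell.conj() @ joint.reshape(d * d, d))
    prob = np.vdot(bob, bob).real
    matches = np.allclose(bob, U.conj().T @ psi / np.sqrt(order))  # (5.1.45)
    # Steps 2 and 3: Bob learns g and applies U^g.
    corrected = U @ bob / np.sqrt(prob)
    outcomes.append((prob, matches, np.allclose(corrected, psi)))

print(all(np.isclose(p, 1 / order) and m and c for p, m, c in outcomes))
```

With this group, four outcomes that differ only by a phase of $U^g$ lead to the same correction up to that phase, which is why the larger group gives no advantage over the Heisenberg–Weyl choice in this example.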
Observe that the original qudit teleportation protocol is a special case of the
generalized teleportation protocol outlined above, in which the group 𝐺 is Z𝑑 × Z𝑑
and its irreducible projective unitary representation {𝑈 𝑔 }𝑔∈𝐺 is taken to be the set
of Heisenberg–Weyl operators. Then, the generalized Bell states Φ𝑔 are precisely
the qudit Bell states $\Phi_{z,x}$ defined in (3.2.58). Furthermore, since $|G| = d^2$, the
POVM elements $M^{g} = \frac{d^2}{|G|}|\Phi^{g}\rangle\langle\Phi^{g}|$ are the projections onto the qudit Bell states,
exactly as in the qudit teleportation protocol.

Exercise 5.4
By following a development similar to that in (5.1.26)–(5.1.31) and Exercise 5.2,
verify that the one-way LOCC channel corresponding to the generalized
teleportation protocol presented above has the following form analogous to
(5.1.32):
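$$\mathcal{T}^{G}_{A'AB\to B}(\sigma_{A'AB}) = \sum_{g\in G} \operatorname{Tr}_{A'A}\!\left[M^{g}_{A'A}\, U^{g}_B (\sigma_{A'AB})\, U^{g\dagger}_B\right] \tag{5.1.46}$$

for every state $\sigma_{A'AB}$. Conclude that $\mathcal{T}^{G}_{A'AB\to B}(\rho_{A'} \otimes \Phi_{AB}) = \rho_B$, as expected.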

5.1.3 Post-Selected Teleportation

Throughout this section, we have described teleportation protocols that involve
performing a particular kind of Bell measurement between a system 𝐴′, whose state
is to be teleported, and a system 𝐴 that is one share of a bipartite system 𝐴𝐵 in
the joint resource state |Φ⟩⟨Φ| 𝐴𝐵 . Based on the outcome of the Bell measurement
performed on 𝐴′ 𝐴, Bob applies a particular correction operation in order to obtain
the initial state of 𝐴′ in his system 𝐵. Thus, although the individual outcomes of the
Bell measurement occur with some probability, the overall teleportation protocol is
deterministic, due to the correction operations; i.e., it succeeds with probability
one.
If Bob does not have the ability to apply correction operations to his system
based on the Bell measurement outcomes, then Alice and Bob can perform what
is called post-selected teleportation. Post-selected teleportation is based on the
fact that, in the teleportation protocols that we have considered, Bob does not need
to apply a correction operation on his system if the outcome corresponding to
|Φ⟩⟨Φ| 𝐴′ 𝐴 occurs in the Bell measurement performed on 𝐴′ 𝐴. This is due to the
fact that the post-measurement state on Bob’s system, conditioned on the |Φ⟩⟨Φ| 𝐴′ 𝐴
outcome, is precisely the initial state 𝜌 𝐴′ to be teleported:
$$\langle\Phi|_{A'A}\left(\rho_{A'} \otimes |\Phi\rangle\langle\Phi|_{AB}\right)|\Phi\rangle_{A'A} = \frac{1}{d^2}\,\rho_B, \tag{5.1.47}$$
where $d = d_A = d_B$. This is a special case of (4.2.14) in which $\mathcal{N}_{A'\to B} = \mathrm{id}_{A'\to B}$
and $X_{RA'} = \rho_{A'}$. If we thus modify the teleportation protocol such that we consider
the $|\Phi\rangle\langle\Phi|_{A'A}$ outcome of the Bell measurement as a “success” and the rest of the
outcomes as a “failure”, then we obtain post-selected teleportation. Post-selected
teleportation is probabilistic by definition. In particular, from (5.1.47), we see that
it succeeds with probability $\frac{1}{d^2}$.

Figure 5.4: Circuit diagram of a modified teleportation protocol in which Bob
applies a channel $\mathcal{N}$ to his qudit before performing a correction operation $V^{g}$,
which is a unitary operation from the set $\{V^{g} : g \in G\}$ of unitary operators.
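The identity (5.1.47) for post-selected teleportation can be checked numerically. The sketch below assumes NumPy, the ordering (A′, A, B), an arbitrary dimension d = 3, and a random mixed input state; `post` and `success_prob` are illustrative names.

```python
import numpy as np

d = 3  # arbitrary small dimension for the check
rng = np.random.default_rng(5)

phi = np.eye(d).reshape(d * d) / np.sqrt(d)   # (1/sqrt(d)) sum_i |i,i>
Phi = np.outer(phi, phi.conj())

# Random mixed state rho on A'.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real

# K = <Phi|_{A'A} (x) 1_B keeps only the "success" outcome on A'A.
K = np.kron(phi.conj().reshape(1, d * d), np.eye(d))
post = K @ np.kron(rho, Phi) @ K.conj().T      # unnormalized state of B

success_prob = np.trace(post).real

print(np.allclose(post, rho / d**2), np.isclose(success_prob, 1 / d**2))
```

Dividing `post` by `success_prob` gives Bob's normalized state conditioned on success, which should coincide with `rho` without any correction being applied.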

5.1.4 Teleportation-Simulable Channels

Let us now consider the even more general protocol depicted in Figure 5.4. Let 𝐺 be
a finite group. As before, Alice and Bob start with a shared pair of qudits in the state
|Φ⟩⟨Φ| 𝐴𝐵 , while Alice holds an extra qudit in the state 𝜌 𝐴′ to be teleported to Bob.
Unlike the teleportation protocol above, however, Bob applies the channel N to his
qudit before he receives the results of the Bell measurement. Once he receives the
measurement results, he applies the unitary operation 𝑉 𝑔 from the set {𝑉 𝑔 : 𝑔 ∈ 𝐺}
of pre-determined unitary operators constituting a projective unitary representation
of 𝐺.
The initial tripartite joint state of the protocol is

$$\rho_{A'} \otimes (\mathrm{id}_A \otimes \mathcal{N}_B)(|\Phi\rangle\langle\Phi|_{AB}). \tag{5.1.48}$$

Figure 5.5: A mathematically equivalent way of describing the protocol in
Figure 5.4. In this case, Alice and Bob start with the Choi state $\Phi^{\mathcal{N}}_{AB}$ of the
channel $\mathcal{N}$ instead of the maximally entangled state $|\Phi\rangle\langle\Phi|_{AB}$.

Alice performs the same generalized Bell measurement as before on $A'$ and $A$, which
we recall has the POVM $\{M^{g}_{A'A}\}_{g\in G}$ with elements $M^{g}_{A'A}$ defined as in (5.1.39). Recall
that this POVM corresponds to an irreducible projective unitary representation of
$G$ given by $\{U^{g}\}_{g\in G}$. Since the Bell measurement operates only on the systems
$A'$ and $A$, it commutes with the action of $\mathcal{N}$ on Bob's share of the state
$|\Phi\rangle\langle\Phi|_{AB}$. This means that the analysis for the qudit teleportation protocol from
Section 5.1.2.1 carries over exactly in this case. In other words, each outcome
$g \in G$ occurs with an equal probability of $\frac{1}{|G|}$, and the post-measurement state on
Bob's qudit corresponding to the outcome $g$ is

$$\mathcal{N}(U^{g\dagger} \rho\, U^{g}). \tag{5.1.49}$$

After Bob applies the unitary $V^{g}$, the state of Bob's qudit at the end of the protocol
is
$$V^{g} \mathcal{N}(U^{g\dagger} \rho\, U^{g}) V^{g\dagger}. \tag{5.1.50}$$
This occurs with probability $\frac{1}{|G|}$ for all $g \in G$.
Now, observe that the state (5.1.48) can be written as

$$\rho_{A'} \otimes \Phi^{\mathcal{N}}_{AB}, \tag{5.1.51}$$

where we recall that $\Phi^{\mathcal{N}}_{AB} = (\mathrm{id}_A \otimes \mathcal{N}_B)(|\Phi\rangle\langle\Phi|_{AB})$ is the Choi state of the channel
$\mathcal{N}$. In other words, the protocol depicted in Figure 5.4 is mathematically equivalent
to the teleportation protocol over a group $G$ outlined above, except that instead of
starting with the shared maximally entangled state $|\Phi\rangle_{AB}$, Alice and Bob start with
the shared state $\Phi^{\mathcal{N}}_{AB}$. This equivalent protocol is depicted in Figure 5.5.
If Bob discards the classical message 𝑔 at the end of the protocol, then the state
of his system is given by
$$\frac{1}{|G|}\sum_{g\in G} V^{g} \mathcal{N}(U^{g\dagger} \rho\, U^{g}) V^{g\dagger}. \tag{5.1.52}$$

Figure 5.6: Teleportation simulation of a $G$-covariant channel $\mathcal{N}$, where the
operators $\{V^{g} : g \in G\}$ form a unitary representation of $G$ on the output space
of $\mathcal{N}$.

Recall from (4.4.127) that this state is simply the output state of the twirl of N
with respect to the unitary representations {𝑈 𝑔 }𝑔∈𝐺 and {𝑉 𝑔 }𝑔∈𝐺 , because the
twirled channel N is a symmetrized version of the original channel N. Thus, the
generalized teleportation protocol gives an explicit procedure for implementing a
channel twirl by implementing the teleportation protocol using the Choi state of the
channel as the resource state.
Suppose now that the channel N satisfies the group covariance property from
Definition 4.18 for all 𝑔 ∈ 𝐺. In this case, we see that N(𝑈 𝑔† 𝜌𝑈 𝑔 ) = 𝑉 𝑔† N(𝜌)𝑉 𝑔
for every outcome 𝑔 of Alice’s generalized Bell measurement. Therefore, after Bob
applies 𝑉 𝑔 , the state of his qudit is N(𝜌). This generalized teleportation protocol
therefore effectively applies the channel N to the state 𝜌 𝐴′ and transfers the resulting
state to Bob’s qudit; see Figure 5.6. We say that the teleportation protocol simulates
the action of the channel N on the input state 𝜌 𝐴′ . As stated earlier, in this sense,
the original teleportation protocol can be regarded as a way to simulate the identity
channel.
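The discussion above can be illustrated with two qubit channels and the Pauli operators as the (projective) representation, with $V^g = U^g$. The sketch below is an assumption-laden example, not the construction of the text: it assumes NumPy, a dephasing channel (which is Pauli-covariant), and an amplitude damping channel (which is not), with arbitrarily chosen parameters 0.2 and 0.3.

```python
import numpy as np

rng = np.random.default_rng(6)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Z, Z @ X]   # projective representation of Z_2 x Z_2

phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
Phi = np.outer(phi, phi.conj())


def apply_kraus(kraus, rho):
    """Apply the channel with the given Kraus operators."""
    return sum(K @ rho @ K.conj().T for K in kraus)


def choi_state(kraus):
    """Choi state (id_A (x) N_B)(|Phi><Phi|_AB)."""
    return sum(np.kron(I2, K) @ Phi @ np.kron(I2, K).conj().T for K in kraus)


def teleport_with_resource(rho, resource):
    """Bell measurement on A'A, correction V^g = U^g on B, outcome discarded."""
    sigma = np.kron(rho, resource)                     # ordering (A', A, B)
    out = np.zeros((2, 2), dtype=complex)
    for U in paulis:
        bell = np.kron(U, I2) @ phi
        K = np.kron(bell.conj().reshape(1, 4), U)      # <Phi^g|_{A'A} (x) U^g_B
        out += K @ sigma @ K.conj().T
    return out


def twirl(kraus, rho):
    """Right-hand side of (5.1.52) with V^g = U^g."""
    return sum(U @ apply_kraus(kraus, U.conj().T @ rho @ U) @ U.conj().T
               for U in paulis) / len(paulis)


p, gamma = 0.2, 0.3                                    # illustrative parameters
dephasing = [np.sqrt(1 - p) * I2, np.sqrt(p) * Z]
damping = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
           np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]

A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = A @ A.conj().T
rho /= np.trace(rho).real

out_dephasing = teleport_with_resource(rho, choi_state(dephasing))
out_damping = teleport_with_resource(rho, choi_state(damping))

print(np.allclose(out_dephasing, apply_kraus(dephasing, rho)))
print(np.allclose(out_damping, twirl(damping, rho)))
```

For the covariant dephasing channel the protocol should reproduce the channel output itself, whereas for amplitude damping it should reproduce only the twirled output in (5.1.52).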
The notion of simulation of a channel by a teleportation protocol can be extended
to a one-way LOCC channel L→ , as introduced in Definition 4.22, to obtain the
following definition.

Definition 5.1 Teleportation-Simulable Channel

A channel $\mathcal{N}_{A\to B}$ is called teleportation-simulable with associated resource
state $\omega_{RB'}$ if there exists a one-way LOCC channel $\mathcal{L}^{\to}_{RAB'\to B}$ such that, for every
input state $\rho_A$,
$$\mathcal{N}(\rho_A) = \mathcal{L}^{\to}_{RAB'\to B}(\rho_A \otimes \omega_{RB'}). \tag{5.1.53}$$

Figure 5.7: Depiction of a teleportation-simulable channel with associated
resource state $\omega_{RB'}$. The teleportation-simulable channel $\mathcal{N}_{A\to B}$ can be realized
via the LOCC channel $\mathcal{L}^{\to}_{RAB'\to B}$ and the resource state $\omega_{RB'}$.

Figure 5.7 illustrates the concept of a teleportation-simulable channel. Note
that in (5.1.53) the resource state $\omega_{RB'}$ is fixed, as well as the LOCC channel $\mathcal{L}^{\to}$.
Both sides of the equation should thus be regarded as functions of 𝜌 𝐴 .
From the discussions above, we conclude that every group-covariant channel
is teleportation-simulable, where the one-way LOCC channel L is simply the
teleportation protocol with respect to the group, and the resource state is the Choi
state of the channel.

5.2 Quantum Super-Dense Coding

We now discuss the quantum super-dense coding protocol. This protocol can be
viewed as a “dual” to the quantum teleportation protocol in the following sense:
while in the basic quantum teleportation protocol, Alice and Bob make use of
two qubits in the entangled state vector $|\Phi^{+}\rangle = \frac{1}{\sqrt{2}}(|0,0\rangle + |1,1\rangle)$ and two bits of
classical information to simulate a noiseless qubit channel, in quantum super-dense
coding they make use of the shared entangled state |Φ+ ⟩ along with one use of a
noiseless qubit channel to communicate two bits of classical information. This
is remarkable because, without the shared entanglement and with only one use of the
noiseless qubit channel, they can communicate at most one bit of classical
information. The quantum super-dense coding protocol thus represents one of the
simplest examples in which prior shared entanglement provides an advantage for
classical communication.

Figure 5.8: Circuit diagram for the super-dense coding protocol. Using the bits
$(z, x)$ that she wishes to send, Alice applies the appropriate Pauli $X$ and/or $Z$
operators to her share $A$ of the maximally entangled qubits that are in the state
$|\Phi^{+}\rangle_{AB}$ and sends it through a noiseless qubit channel to Bob. Bob then performs
a Bell measurement on the two qubits to recover the encoded bits $(z, x)$.

Let us now go through the quantum super-dense coding protocol. See Figure
5.8 for a depiction of the protocol. Alice wishes to send two classical bits
$(z, x) \in \{0,1\}^2$ to Bob by making use of a shared pair of qubits in the maximally
entangled state $|\Phi^{+}\rangle_{AB}$ and one use of a noiseless qubit channel. Depending on the
bits she wishes to send, she performs the following operations on her share of the
entangled qubits:
• To send the bits (0, 0), she does nothing.
• To send the bits (0, 1), she applies the Pauli 𝑋 operator to her qubit, transforming
the joint state |Φ+ ⟩ 𝐴𝐵 to |Ψ+ ⟩ 𝐴𝐵 .
• To send the bits (1, 0), she applies the Pauli 𝑍 operator to her qubit, transforming
the joint state |Φ+ ⟩ 𝐴𝐵 to |Φ− ⟩ 𝐴𝐵 .
• To send the bits (1, 1), she applies the 𝑋 operator followed by the 𝑍 operator,
so that the joint state becomes |Ψ− ⟩ 𝐴𝐵 .
After applying the appropriate operation, Alice sends her qubit to Bob with the one
allowed use of a noiseless qubit channel.


Bob now holds both qubits, and they are in one of the four Bell states

\[ |\Phi_{z,x}\rangle_{AB} = (Z_A^z X_A^x \otimes \mathbb{1}_B)\,|\Phi^+\rangle_{AB} \tag{5.2.1} \]

depending on the bits (𝑧, 𝑥) Alice sent. Bob then performs a Bell measurement on
his two qubits, and the outcome of this measurement consists precisely of the bits
(𝑧, 𝑥) that Alice wished to send.
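
The qubit protocol above can be checked numerically. The following is a minimal sketch, assuming NumPy and the qubit ordering "Alice first, Bob second"; the function names `bell_state`, `bell_measurement`, and `superdense_coding` are illustrative choices and do not come from the text. It encodes each pair of bits via (5.2.1) and decodes it by computing the Bell-measurement outcome probabilities.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
# |Phi+> = (|00> + |11>)/sqrt(2), with Alice's qubit listed first.
PHI_PLUS = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

def bell_state(z, x):
    """Return (Z^z X^x tensor 1)|Phi+>, i.e., the state in (5.2.1)."""
    encoder = np.linalg.matrix_power(Z, z) @ np.linalg.matrix_power(X, x)
    return np.kron(encoder, I2) @ PHI_PLUS

def bell_measurement(state):
    """Outcome probabilities of the Bell measurement, keyed by (z, x)."""
    return {(z, x): float(abs(np.vdot(bell_state(z, x), state)) ** 2)
            for z in (0, 1) for x in (0, 1)}

def superdense_coding(z, x):
    """Alice encodes (z, x); Bob returns the most likely measurement outcome."""
    probs = bell_measurement(bell_state(z, x))
    return max(probs, key=probs.get)

for bits in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(bits, "->", superdense_coding(*bits))
```

Because the four Bell states are orthonormal, each outcome probability is either zero or one, so the decoded pair always agrees with the encoded pair.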
The super-dense coding protocol has a simple generalization to the qudit case.
In this case, Alice and Bob share the qudit Bell state |Φ⟩ 𝐴𝐵 before communication
begins, and by applying one of the 𝑑² Heisenberg–Weyl operators 𝑊𝑧,𝑥 from (3.2.48)
on her share of the state, Alice can rotate the global state to one of the 𝑑² qudit Bell
states in (3.2.58). After Alice sends her share of the encoded state over a noiseless
qudit channel to Bob, Bob can then perform the qudit Bell measurement to decode
which of the 𝑑² messages Alice transmitted.
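
The qudit generalization can be sanity-checked in the same way. The sketch below assumes the common convention 𝑊𝑧,𝑥 = 𝑍(𝑧)𝑋(𝑥), with 𝑋(𝑥)|𝑘⟩ = |𝑘 + 𝑥 mod 𝑑⟩ and 𝑍(𝑧)|𝑘⟩ = exp(2πi 𝑧𝑘/𝑑)|𝑘⟩; this convention is an assumption here and should be compared against (3.2.48). The helper names are illustrative. The code verifies that the 𝑑² encoded states are orthonormal, which is what allows Bob to decode them perfectly.

```python
import numpy as np

def heisenberg_weyl(d, z, x):
    """Assumed convention: W_{z,x} = Z(z) X(x) on a d-dimensional system."""
    shift = np.roll(np.eye(d), x, axis=0)          # X(x)|k> = |k + x mod d>
    phase = np.diag(np.exp(2j * np.pi * z * np.arange(d) / d))  # Z(z)
    return phase @ shift

def qudit_bell_states(d):
    """All d^2 states (W_{z,x} tensor 1)|Phi>, keyed by (z, x)."""
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)    # sum_k |k,k> / sqrt(d)
    return {(z, x): np.kron(heisenberg_weyl(d, z, x), np.eye(d)) @ phi
            for z in range(d) for x in range(d)}

def gram_matrix(d):
    """Matrix of inner products between the d^2 encoded states."""
    vecs = np.array(list(qudit_bell_states(d).values()))
    return vecs.conj() @ vecs.T

# An identity Gram matrix means the d^2 messages are perfectly distinguishable.
print(np.allclose(gram_matrix(3), np.eye(9)))
```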

5.3 Quantum Hypothesis Testing

We now consider the task of quantum hypothesis testing, which is a generalization of
classical statistical hypothesis testing to the quantum setting. In the quantum setting,
the statistical hypotheses are represented by the states of a particular quantum
system¹, and the task is to determine which of the hypotheses is “true”, i.e., to
determine the state of the quantum system.
To be more specific, consider the following scenario. Bob is given a quantum
system by Alice, which is either in the state 𝜌 or in the state 𝜎, and his task is to
determine in which state the system has been prepared. Bob’s strategy consists
of performing a measurement of the system, described by the POVM {𝑀 𝜌 , 𝑀𝜎 }
(so that 𝑀 𝜌 , 𝑀𝜎 ≥ 0 and 𝑀 𝜌 + 𝑀𝜎 = 1), and then guessing “𝜌” if the outcome
corresponds to 𝑀 𝜌 and guessing “𝜎” if the outcome corresponds to 𝑀𝜎 ; see
Figure 5.9. Of course, Bob’s guess might not always be correct, and there are two
types of errors that can occur:
1. Type-I Error: Bob guesses “𝜎”, but the system is in the state 𝜌. The probability
of this occurring is Tr[𝑀𝜎 𝜌].
2. Type-II Error: Bob guesses “𝜌”, but the system is in the state 𝜎. The probability
of this occurring is Tr[𝑀 𝜌 𝜎].

¹ The hypotheses can be represented by quantum channels more generally, as we detail in
Section 5.4.

Figure 5.9: In quantum hypothesis testing, a given quantum system is known
to be either in the state 𝜌 or the state 𝜎. The most general strategy to determine
the state of the system consists of measuring it according to a two-outcome
POVM {𝑀 𝜌 , 𝑀𝜎 }. If the outcome corresponding to 𝑀 𝜌 occurs, then we guess
that the system is in the state 𝜌, and if the outcome corresponding to 𝑀𝜎 occurs,
then we guess that the system is in the state 𝜎. The goal is to minimize the
probability of error of this general strategy.

In order to obtain an optimal strategy for Bob, there are two cases that are
typically considered.
• Symmetric Case: Also called quantum state discrimination, in this setting, Bob
has some prior knowledge about the state he is given. Specifically, he knows
that the state is 𝜌 with probability 𝜆 ∈ [0, 1] and 𝜎 with probability 1 − 𝜆. The
goal is then to minimize the average of the type-I and type-II error probabilities
with respect to this probability distribution. In other words, letting 𝑀 ≡ 𝑀 𝜌 ,
the goal is to minimize the function

\[ p_{\mathrm{err}}(\lambda, \rho, \sigma, M) := \lambda \operatorname{Tr}[(\mathbb{1} - M)\rho] + (1 - \lambda)\operatorname{Tr}[M\sigma]. \tag{5.3.1} \]

The optimization problem we are interested in is thus:

\[ \begin{array}{ll} \text{minimize} & p_{\mathrm{err}}(\lambda, \rho, \sigma, M) \\ \text{subject to} & 0 \leq M \leq \mathbb{1}, \end{array} \tag{5.3.2} \]

where the minimization is with respect to operators 0 ≤ 𝑀 ≤ 1 representing
the two-outcome POVM {𝑀, 1 − 𝑀 }. We discuss symmetric hypothesis
testing, and the optimization problem in (5.3.2), in detail in Section 5.3.1.
• Asymmetric Case: In this setting, the goal is to minimize the type-II error
probability, given an upper bound on the type-I error probability. In other
words, letting 𝑀 ≡ 𝑀 𝜌 , the optimal measurement is given by solving the
following optimization problem:
\[ \begin{array}{ll} \text{minimize} & \operatorname{Tr}[M\sigma] \\ \text{subject to} & \operatorname{Tr}[(\mathbb{1} - M)\rho] \leq \varepsilon, \\ & 0 \leq M \leq \mathbb{1}, \end{array} \tag{5.3.3} \]
where 𝜀 ∈ [0, 1] is the upper bound on the type-I error probability, and the
optimization is with respect to every operator 𝑀 satisfying 0 ≤ 𝑀 ≤ 1,
representing the two-outcome POVM {𝑀, 1 − 𝑀 }. We discuss asymmetric
hypothesis testing, and the optimization problem in (5.3.3), in detail in
Section 5.3.3.
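
To make the two error types and the objective in (5.3.1) concrete, here is a small sketch that evaluates them for a given measurement operator. It assumes NumPy; the example states (𝜌 = |0⟩⟨0|, 𝜎 = |+⟩⟨+|) and the choice 𝑀 = |0⟩⟨0| are illustrative only and are not claimed to be optimal.

```python
import numpy as np

def error_probabilities(rho, sigma, M):
    """Type-I and type-II error probabilities for the POVM {M, 1 - M},
    where the outcome M corresponds to guessing 'rho'."""
    identity = np.eye(rho.shape[0])
    type_one = np.trace((identity - M) @ rho).real   # guess sigma, state is rho
    type_two = np.trace(M @ sigma).real              # guess rho, state is sigma
    return type_one, type_two

def p_err(lam, rho, sigma, M):
    """Average error probability from (5.3.1) with prior lam on rho."""
    type_one, type_two = error_probabilities(rho, sigma, M)
    return lam * type_one + (1 - lam) * type_two

# Illustrative example: rho = |0><0|, sigma = |+><+|, M = |0><0|.
rho = np.array([[1, 0], [0, 0]], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
sigma = np.outer(plus, plus.conj())
M = rho.copy()

print(error_probabilities(rho, sigma, M))  # type-I is 0, type-II is 1/2
print(p_err(0.5, rho, sigma, M))           # average error 1/4
```

In the asymmetric setting of (5.3.3), the same two quantities appear separately: the first as a constraint and the second as the objective.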

Exercise 5.5
Consider a very simple hypothesis testing strategy in which Bob discards the
state of the quantum system and simply guesses “𝜌” with some probability
𝑞 ∈ [0, 1] and “𝜎” with probability 1 − 𝑞.
1. What is the POVM corresponding to this strategy?
2. Evaluate the type-I and type-II error probabilities for this strategy.
3. If, in the symmetric setting, the prior probability for the state 𝜌 is 𝜆 ∈ [0, 1],
then evaluate the error probability in (5.3.1) for this strategy.

Now, suppose that Bob is given several copies, say 𝑛 ≥ 1, of a quantum system,
each one of which is either in the state 𝜌 or the state 𝜎. His strategy to determine the
state can now make use of these multiple copies in an adaptive manner, for example,
and could allow the error probabilities to go below the “single-shot” (𝑛 = 1) error
probabilities defined above. Since Bob ultimately has to make a decision between 𝜌
and 𝜎, his strategy is still described by a two-outcome POVM, which we denote by
{𝑀_𝜌^(𝑛), 𝑀_𝜎^(𝑛)}. This setting of hypothesis testing with multiple copies is depicted in
Figure 5.10. The type-I and type-II error probabilities are defined in an analogous
manner as before. Specifically, the type-I error is Tr[𝑀_𝜎^(𝑛) 𝜌^⊗𝑛] and the type-II error
is Tr[𝑀_𝜌^(𝑛) 𝜎^⊗𝑛]. In the symmetric case, if 𝜆 ∈ [0, 1] is the probability that each
system is in the state 𝜌, then the error probability is
\begin{align}
\lambda \operatorname{Tr}[M_\sigma^{(n)} \rho^{\otimes n}] + (1 - \lambda)\operatorname{Tr}[M_\rho^{(n)} \sigma^{\otimes n}]
&= \lambda \operatorname{Tr}[(\mathbb{1}^{\otimes n} - M^{(n)}) \rho^{\otimes n}] + (1 - \lambda)\operatorname{Tr}[M^{(n)} \sigma^{\otimes n}] \tag{5.3.4} \\
&= p_{\mathrm{err}}(\lambda, \rho^{\otimes n}, \sigma^{\otimes n}, M^{(n)}), \tag{5.3.5}
\end{align}
where we have let 𝑀_𝜌^(𝑛) ≡ 𝑀^(𝑛).

Figure 5.10: Quantum hypothesis testing with 𝑛 ≥ 1 copies of the state. As
in the case of one copy shown in Figure 5.9 (𝑛 = 1), the most general decision
strategy in this case consists of a measurement of all 𝑛 copies of the system
according to a POVM {𝑀_𝜌^(𝑛), 𝑀_𝜎^(𝑛)}. If the outcome corresponding to 𝑀_𝜌^(𝑛)
occurs, then we guess that each copy of the system is in the state 𝜌, and if the
outcome corresponding to 𝑀_𝜎^(𝑛) occurs, then we guess that each copy of the
system is in the state 𝜎.

Exercise 5.6
Consider states 𝜌 and 𝜎 along with a POVM {𝑀0 , 𝑀1 } representing a strategy
for hypothesis testing of a single copy of the quantum system, where the outcome
“0” corresponds to 𝜌 and the outcome “1” corresponds to 𝜎. Let 𝜆 ∈ [0, 1] be
the prior probability for 𝜌, and let 𝑛 ≥ 2. Construct the POVM {𝑀_𝜌^(𝑛), 𝑀_𝜎^(𝑛)},
and evaluate the type-I and type-II error probabilities for the following two
decision strategies for hypothesis testing of 𝑛 copies of the quantum system.
1. The majority-vote decision strategy: (1) Measure each system according
to the POVM {𝑀0 , 𝑀1 }, and let 𝑁𝑥 be the number of times the outcome
𝑥 occurs. (2) If 𝑁0 > 𝑁1 , guess “𝜌”, and if 𝑁1 > 𝑁0 , guess “𝜎”. If 𝑛 is
even and 𝑁0 = 𝑁1 , then guess “𝜌” with probability 𝑞 ∈ [0, 1] and guess
“𝜎” with probability 1 − 𝑞.
2. The unanimous-vote decision strategy: (1) Measure each system according
to the POVM {𝑀0 , 𝑀1 }, and let 𝑁𝑥 be the number of times the outcome 𝑥
occurs. (2) If 𝑁0 = 𝑛, then guess “𝜌”; otherwise, guess “𝜎”.

5.3.1 Symmetric Case (State Discrimination)

Given quantum states 𝜌 and 𝜎, the goal of symmetric hypothesis testing, also known
as quantum state discrimination, is to devise a measurement strategy that minimizes
the error probability defined in (5.3.1), where 𝜆 ∈ [0, 1] is the probability that
the state is 𝜌 and 1 − 𝜆 is the probability that the state is 𝜎. The value of the
corresponding optimization problem in (5.3.2) is

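\[ p^*_{\mathrm{err}}(\lambda, \rho, \sigma) := \inf_{M : 0 \leq M \leq \mathbb{1}} \left\{ \operatorname{Tr}[(\mathbb{1} - M)(\lambda\rho)] + \operatorname{Tr}[M(1 - \lambda)\sigma] \right\}. \tag{5.3.6} \]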

Exercise 5.7
Show that 𝑝 ∗err (𝜆, 𝜌, 𝜎) can be evaluated using a semi-definite program. Then,
using strong duality, prove that an alternate expression for 𝑝 ∗err (𝜆, 𝜌, 𝜎) is

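\[ p^*_{\mathrm{err}}(\lambda, \rho, \sigma) = \sup_{W\ \mathrm{Hermitian}} \left\{ \operatorname{Tr}[W] : W \leq \lambda\rho,\ W \leq (1 - \lambda)\sigma \right\}. \tag{5.3.7} \]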

Finally, evaluate the complementary slackness conditions from Proposition 2.29.

An optimal operator 𝑊 is known as the “greatest lower bound operator”.

Exercise 5.8
Prove that 𝑝 ∗err (𝜆, 𝜌, 𝜎) is isometrically invariant: for every isometry 𝑉,
𝑝 ∗err (𝜆, 𝜌, 𝜎) = 𝑝 ∗err (𝜆, 𝑉 𝜌𝑉 † , 𝑉 𝜎𝑉 † ).

More generally than isometric invariance, the following data-processing
inequality holds for the error probability for discriminating two quantum states. The
intuition behind the proof of Proposition 5.2 is as follows: Suppose that we are
given a quantum system in an unknown state. Before applying a measurement to
determine the state, we could perform a quantum channel N. However, if we do so,
this strategy is not necessarily an optimal strategy, and the error probability is never
smaller than if we simply apply an optimal measurement to distinguish the states.


Proposition 5.2 Data-Processing Inequality for State Discrimination

Consider states 𝜌 and 𝜎, 𝜆 ∈ [0, 1], and let N be a positive and trace preserving
superoperator. Then,

𝑝 ∗err (𝜆, 𝜌, 𝜎) ≤ 𝑝 ∗err (𝜆, N(𝜌), N(𝜎)) (5.3.8)

Proof: Let 𝑀′ be an operator satisfying 0 ≤ 𝑀′ ≤ 1, and consider the operator
N†(𝑀′) (this is the measurement operator corresponding to performing the
channel N first and then applying the measurement operator 𝑀′). Due to the positivity
of N, and thus of N†, we have N†(𝑀′) ≥ 0. Now, the condition 𝑀′ ≤ 1 implies
1 − 𝑀′ ≥ 0. Thus, by the positivity of N†, it follows that N†(1 − 𝑀′) ≥ 0, which
implies N†(𝑀′) ≤ N†(1). Now, N† is unital, because N is trace preserving (see
Exercise 4.10), which means that N†(1) = 1. We thus conclude that 0 ≤ N†(𝑀′) ≤ 1.
Therefore, N†(𝑀′) is a measurement operator and thus a feasible point in the
optimization problem for 𝑝*_err(𝜆, 𝜌, 𝜎), so that
\begin{align}
p^*_{\mathrm{err}}(\lambda, \rho, \sigma) &= \inf_{M : 0 \leq M \leq \mathbb{1}} \left\{ \operatorname{Tr}[(\mathbb{1} - M)(\lambda\rho)] + \operatorname{Tr}[M(1 - \lambda)\sigma] \right\} \tag{5.3.9} \\
&\leq \operatorname{Tr}[(\mathbb{1} - \mathcal{N}^\dagger(M'))(\lambda\rho)] + \operatorname{Tr}[\mathcal{N}^\dagger(M')(1 - \lambda)\sigma] \tag{5.3.10} \\
&= \operatorname{Tr}[(\mathbb{1} - M')(\lambda\mathcal{N}(\rho))] + \operatorname{Tr}[M'(1 - \lambda)\mathcal{N}(\sigma)], \tag{5.3.11}
\end{align}
where the last line follows from the definition of the adjoint of a superoperator.
Finally, because the inequality
\[ p^*_{\mathrm{err}}(\lambda, \rho, \sigma) \leq \operatorname{Tr}[(\mathbb{1} - M')(\lambda\mathcal{N}(\rho))] + \operatorname{Tr}[M'(1 - \lambda)\mathcal{N}(\sigma)] \tag{5.3.12} \]
holds for all 𝑀′ satisfying 0 ≤ 𝑀′ ≤ 1, we conclude that
\begin{align}
p^*_{\mathrm{err}}(\lambda, \rho, \sigma) &\leq \inf_{M' : 0 \leq M' \leq \mathbb{1}} \left\{ \operatorname{Tr}[(\mathbb{1} - M')(\lambda\mathcal{N}(\rho))] + \operatorname{Tr}[M'(1 - \lambda)\mathcal{N}(\sigma)] \right\} \tag{5.3.13} \\
&= p^*_{\mathrm{err}}(\lambda, \mathcal{N}(\rho), \mathcal{N}(\sigma)), \tag{5.3.14}
\end{align}
as required. ■

It turns out that 𝑝*_err(𝜆, 𝜌, 𝜎) can be written in terms of the trace norm
(Section 2.2.9.2) as
\[ p^*_{\mathrm{err}}(\lambda, \rho, \sigma) = \frac{1}{2}\left(1 - \left\| \lambda\rho - (1 - \lambda)\sigma \right\|_1\right), \tag{5.3.15} \]
which is an immediate consequence of the following theorem.


Theorem 5.3 Helstrom–Holevo Theorem

For all positive semi-definite operators 𝐴 and 𝐵,
\[ \inf_{M : 0 \leq M \leq \mathbb{1}} \left\{ \operatorname{Tr}[(\mathbb{1} - M)A] + \operatorname{Tr}[MB] \right\} = \frac{1}{2}\left(\operatorname{Tr}[A + B] - \|A - B\|_1\right). \tag{5.3.16} \]
A measurement operator 𝑀 is optimal if and only if 𝑀 = Π+ + Λ0 , where Π+ is
the projection onto the strictly positive part of 𝐴 − 𝐵, the operator Π0 is the
projection onto the zero eigenspace of 𝐴 − 𝐵, and 0 ≤ Λ0 ≤ Π0 . Furthermore,
\[ \sup_{M : 0 \leq M \leq \mathbb{1}} \operatorname{Tr}[M(A - B)] = \frac{1}{2}\left(\operatorname{Tr}[A - B] + \|A - B\|_1\right), \tag{5.3.17} \]
and the conditions for an optimal 𝑀 are the same as given above.

Remark: Letting 𝐴 = 𝜆𝜌 and 𝐵 = (1 − 𝜆)𝜎 in the statement of Theorem 5.3, we recognize that
the objective function on the left-hand side of (5.3.16) is equal to 𝑝 err (𝜆, 𝜌, 𝜎, 𝑀) as defined
in (5.3.1). We thus obtain (5.3.15). Note that Theorem 5.3 also gives us a measurement that
achieves the minimal error probability.
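
As a numerical illustration of Theorem 5.3 with 𝐴 = 𝜆𝜌 and 𝐵 = (1 − 𝜆)𝜎, the sketch below computes the optimal error probability from the trace norm and builds the projection Π+ onto the strictly positive part of 𝜆𝜌 − (1 − 𝜆)𝜎 (that is, the choice Λ0 = 0). It assumes NumPy; the function names and the numerical tolerance are illustrative choices.

```python
import numpy as np

def trace_norm(H):
    """Trace norm of a Hermitian matrix: sum of absolute eigenvalues."""
    return np.abs(np.linalg.eigvalsh(H)).sum()

def helstrom(lam, rho, sigma, tol=1e-12):
    """Return (optimal error probability, optimal projector Pi_+) for
    normalized states rho and sigma with prior lam on rho."""
    delta = lam * rho - (1 - lam) * sigma
    evals, evecs = np.linalg.eigh(delta)
    positive = evecs[:, evals > tol]              # eigenvectors with eigenvalue > 0
    projector = positive @ positive.conj().T      # Pi_+
    p_opt = 0.5 * (1 - trace_norm(delta))         # formula (5.3.15)
    return p_opt, projector

def p_err(lam, rho, sigma, M):
    """Objective (5.3.1) for the measurement operator M."""
    identity = np.eye(rho.shape[0])
    return (lam * np.trace((identity - M) @ rho)
            + (1 - lam) * np.trace(M @ sigma)).real

# Illustrative example: rho = |0><0|, sigma = |+><+|, equal priors.
rho = np.array([[1, 0], [0, 0]], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
sigma = np.outer(plus, plus.conj())

p_opt, proj = helstrom(0.5, rho, sigma)
print(p_opt, p_err(0.5, rho, sigma, proj))  # both equal (1 - 1/sqrt(2))/2
```

The agreement of the two printed numbers reflects the statement that Π+ attains the infimum in (5.3.16).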

Proof: Let 𝑀 be an arbitrary operator satisfying 0 ≤ 𝑀 ≤ 1. Let Δ ≔ 𝐴 − 𝐵
and let Δ+ and Δ− be the positive and negative parts, respectively, of Δ, so that
𝐴 − 𝐵 = Δ+ − Δ− and Δ+ Δ− = 0 (recall (2.2.66)). We can then write the objective
function in (5.3.16) as
\[ \operatorname{Tr}[(\mathbb{1} - M)A] + \operatorname{Tr}[MB] = \operatorname{Tr}[A] - \left(\operatorname{Tr}[M\Delta_+] - \operatorname{Tr}[M\Delta_-]\right). \tag{5.3.18} \]
Now, since Tr[𝑀Δ− ] ≥ 0, on account of both 𝑀 and Δ− being positive semi-definite,
we find that
\[ \operatorname{Tr}[M\Delta_+] - \operatorname{Tr}[M\Delta_-] \leq \operatorname{Tr}[M\Delta_+] \leq \operatorname{Tr}[\Delta_+], \tag{5.3.19} \]
where the last inequality follows because 𝑀 ≤ 1. The equality in (5.3.18) and the
inequality in (5.3.19) imply that
\[ \operatorname{Tr}[(\mathbb{1} - M)A] + \operatorname{Tr}[MB] \geq \operatorname{Tr}[A] - \operatorname{Tr}[\Delta_+]. \tag{5.3.20} \]

Since
\[ \|A - B\|_1 = \operatorname{Tr}[|A - B|] = \operatorname{Tr}[\Delta_+] + \operatorname{Tr}[\Delta_-] \tag{5.3.21} \]
and
\[ \Delta_- = \Delta_+ + B - A, \tag{5.3.22} \]
we can write Tr[Δ+ ] as
\[ \operatorname{Tr}[\Delta_+] = \frac{1}{2}\left(\|A - B\|_1 - \operatorname{Tr}[B - A]\right). \tag{5.3.23} \]
This means that the objective function on the left-hand side of (5.3.18) can be
bounded from below by ½(Tr[𝐴 + 𝐵] − ∥𝐴 − 𝐵∥₁). We have thus shown that
\[ \operatorname{Tr}[(\mathbb{1} - M)A] + \operatorname{Tr}[MB] \geq \frac{1}{2}\left(\operatorname{Tr}[A + B] - \|A - B\|_1\right) \tag{5.3.24} \]
for all 𝑀 such that 0 ≤ 𝑀 ≤ 1, which implies that
\[ \inf_{M : 0 \leq M \leq \mathbb{1}} \left\{ \operatorname{Tr}[(\mathbb{1} - M)A] + \operatorname{Tr}[MB] \right\} \geq \frac{1}{2}\left(\operatorname{Tr}[A + B] - \|A - B\|_1\right). \tag{5.3.25} \]

To see the reverse inequality, let 𝑀 = Π+ + Λ0 , where Π+ is the projection
onto the strictly positive part of 𝐴 − 𝐵 and Λ0 satisfies 0 ≤ Λ0 ≤ Π0 , with Π0 the
projection onto the zero eigenspace of 𝐴 − 𝐵. Then,

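\begin{align}
\operatorname{Tr}[M(A - B)] &= \operatorname{Tr}[(\Pi_+ + \Lambda_0)(\Delta_+ - \Delta_-)] \tag{5.3.26} \\
&= \operatorname{Tr}[(\Pi_+ + \Lambda_0)\Delta_+] - \operatorname{Tr}[(\Pi_+ + \Lambda_0)\Delta_-] \tag{5.3.27} \\
&= \operatorname{Tr}[\Delta_+], \tag{5.3.28}
\end{align}
where the last equality follows because Tr[Π+ Δ+ ] = Tr[Δ+ ] and Tr[Π+ Δ− ] =
0, since Π+ and Δ− are by definition orthogonal. We also used Tr[Λ0 Δ+ ] =
Tr[Λ0 Δ− ] = 0, with these latter equalities following because 0 ≤ Tr[Λ0 Δ± ] ≤
Tr[Π0 Δ± ] = 0. Therefore, using (5.3.23), we find that
\[ \operatorname{Tr}[A] - \operatorname{Tr}[(\Pi_+ + \Lambda_0)(A - B)] = \frac{1}{2}\left(\operatorname{Tr}[A + B] - \|A - B\|_1\right). \tag{5.3.29} \]
The operator Π+ + Λ0 thus achieves the bound in (5.3.24), which means that
\[ \inf_{M : 0 \leq M \leq \mathbb{1}} \left\{ \operatorname{Tr}[(\mathbb{1} - M)A] + \operatorname{Tr}[MB] \right\} = \frac{1}{2}\left(\operatorname{Tr}[A + B] - \|A - B\|_1\right), \tag{5.3.30} \]
so that (5.3.16) is established.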
To see that Π+ + Λ0 is the only form for an optimal measurement operator,
suppose that 𝑀 is optimal, i.e., satisfies 0 ≤ 𝑀 ≤ 1 and saturates (5.3.16) with
equality. Then it follows that the two inequalities in (5.3.19) are saturated with
equality, so that
\[ \operatorname{Tr}[M(\Delta_+ - \Delta_-)] = \operatorname{Tr}[M\Delta_+] = \operatorname{Tr}[\Delta_+] = \operatorname{Tr}[\Pi_+\Delta_+]. \tag{5.3.31} \]
The leftmost equality implies that Tr[𝑀Δ− ] = 0, where Δ− is the negative
part of 𝐴 − 𝐵. Since both 𝑀 and Δ− are positive semi-definite and Π− is the
projection onto the support of Δ− (i.e., onto the strictly negative part of 𝐴 − 𝐵),
we conclude that 𝑀Π− = Π− 𝑀 = 0. This in turn implies that
\[ M(\Pi_+ + \Pi_0) = (\Pi_+ + \Pi_0)M = M, \tag{5.3.32} \]
which, after sandwiching 𝑀 ≤ 1 on the left and right by Π+ + Π0 , implies that
𝑀 ≤ Π+ + Π0 . Since
\begin{align}
0 &= \operatorname{Tr}[\Delta_+(\Pi_+ - M)] \tag{5.3.33} \\
&= \operatorname{Tr}[\Delta_+(\Pi_+ - \Pi_+ M \Pi_+)] \tag{5.3.34} \\
&= \operatorname{Tr}[\Delta_+ \Pi_+(\mathbb{1} - M)\Pi_+], \tag{5.3.35}
\end{align}
we find that Π+ (1 − 𝑀)Π+ = 0. Now consider that Π− (1 − 𝑀)Π+ = 0 because
Π− Π+ = 0 and Π− 𝑀Π+ = Π− (Π+ + Π0 ) 𝑀Π+ = 0. So then Π+ (1 − 𝑀)Π+ = 0 and
Π− (1 − 𝑀)Π+ = 0 imply that (1 − 𝑀)Π+ = 0. From this equation, we conclude
that Π+ = Π+ 𝑀 = 𝑀Π+ . By sandwiching Π+ ≤ 1 by 𝑀 and applying operator
monotonicity of the square-root function (see Section 2.2.8.1), we conclude that
Π+ ≤ 𝑀. Combining this operator inequality with the previous one, we conclude
that an optimal 𝑀 satisfies Π+ ≤ 𝑀 ≤ Π+ + Π0 , which is equivalent to 𝑀
decomposing as 𝑀 = Π+ + Λ0 for 0 ≤ Λ0 ≤ Π0 .
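The equality in (5.3.17) follows as a rewrite of (5.3.16):
\begin{align}
\frac{1}{2}\|A - B\|_1 &= \frac{1}{2}\operatorname{Tr}[A + B] - \inf_{M : 0 \leq M \leq \mathbb{1}} \left\{ \operatorname{Tr}[(\mathbb{1} - M)A] + \operatorname{Tr}[MB] \right\} \tag{5.3.36} \\
&= \sup_{M : 0 \leq M \leq \mathbb{1}} \left\{ \frac{1}{2}\operatorname{Tr}[A + B] - \left(\operatorname{Tr}[(\mathbb{1} - M)A] + \operatorname{Tr}[MB]\right) \right\} \tag{5.3.37} \\
&= \sup_{M : 0 \leq M \leq \mathbb{1}} \left\{ \frac{1}{2}\operatorname{Tr}[B - A] + \operatorname{Tr}[M(A - B)] \right\} \tag{5.3.38} \\
&= \frac{1}{2}\operatorname{Tr}[B - A] + \sup_{M : 0 \leq M \leq \mathbb{1}} \operatorname{Tr}[M(A - B)]. \tag{5.3.39}
\end{align}
Rearranging this equality, we arrive at (5.3.17). An optimal 𝑀 having the form
Π+ + Λ0 again follows from (5.3.19), (5.3.26)–(5.3.28), and the reasoning given
above. ■
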

Exercise 5.9
Let 𝜌 = |𝜓⟩⟨𝜓| ≡ 𝜓 and 𝜎 = |𝜙⟩⟨𝜙| ≡ 𝜙 be pure states, and let 𝜆 ∈ [0, 1].
Show that
\[ p^*_{\mathrm{err}}(\lambda, \psi, \phi) = \frac{1}{2}\left(1 - \sqrt{1 - 4\lambda(1 - \lambda)\,|\langle\psi|\phi\rangle|^2}\right). \tag{5.3.40} \]
What is a measurement that achieves this optimal error probability?

Observe from (5.3.40) that if |𝜓⟩ and |𝜙⟩ are orthogonal, then 𝑝 ∗err (𝜆, 𝜓, 𝜙) = 0.

Exercise 5.10
Let 𝜌 and 𝜎 be quantum states that are orthogonal, in the sense that Π 𝜌 Π𝜎 =
Π𝜎 Π 𝜌 = 0, where Π 𝜌 and Π𝜎 are the projections onto the support of 𝜌 and
𝜎, respectively (recall (2.2.65)). Prove that the optimal error probability
for discriminating 𝜌 and 𝜎 vanishes, i.e., that 𝑝 ∗err (𝜆, 𝜌, 𝜎) = 0. What is a
measurement achieving this optimal error probability?

Exercise 5.11

Evaluate the optimal error probability 𝑝*_err(𝜆, 𝜌_𝐴𝐵^(iso;𝑝₁), 𝜌_𝐴𝐵^(W;𝑝₂)) for discriminating
between the isotropic state 𝜌_𝐴𝐵^(iso;𝑝₁), 𝑝₁ ∈ [0, 1], and the Werner state 𝜌_𝐴𝐵^(W;𝑝₂),
𝑝₂ ∈ [0, 1], where 𝜆 ∈ [0, 1].

The Helstrom–Holevo theorem gives us the lowest possible error probability
in distinguishing between two states 𝜌 and 𝜎, given just one copy of either state.
Suppose now that, instead of just one copy, Alice sends Bob 𝑛 copies of either
𝜌 or 𝜎. Bob’s task is then to discriminate between the states 𝜌^⊗𝑛 and 𝜎^⊗𝑛. The
Helstrom–Holevo theorem still applies in this case, so that
\[ p^*_{\mathrm{err}}(\lambda, \rho^{\otimes n}, \sigma^{\otimes n}) = \frac{1}{2}\left(1 - \left\| \lambda\rho^{\otimes n} - (1 - \lambda)\sigma^{\otimes n} \right\|_1\right) \tag{5.3.41} \]
is the lowest possible error probability. However, because Bob now has 𝑛 copies of
either 𝜌 or 𝜎, he can perform a discrimination strategy that involves a collective
measurement acting on the 𝑛 copies of the state. This means that the optimal error
probability can generally be lower with 𝑛 ≥ 2 copies than with just one copy.

Exercise 5.12
Prove that the optimal error probability 𝑝*_err(𝜆, 𝜌^⊗𝑛, 𝜎^⊗𝑛) for quantum state
discrimination is monotonically non-increasing with 𝑛, i.e., prove that

\[ p^*_{\mathrm{err}}(\lambda, \rho^{\otimes (n+1)}, \sigma^{\otimes (n+1)}) \leq p^*_{\mathrm{err}}(\lambda, \rho^{\otimes n}, \sigma^{\otimes n}) \tag{5.3.42} \]

for all 𝑛 ≥ 1.

5.3.1.1 Asymptotic Setting

Given states 𝜌 and 𝜎 and 𝜆 ∈ (0, 1), how does the optimal error probability
𝑝*_err(𝜆, 𝜌^⊗𝑛, 𝜎^⊗𝑛) behave as the number 𝑛 of copies of the state increases? If 𝜌 ≡ 𝜓
and 𝜎 ≡ 𝜙 are pure states, then because 𝜓^⊗𝑛 and 𝜙^⊗𝑛 are both pure states, with
overlap |⟨𝜓|𝜙⟩|^(2𝑛), we can use (5.3.40) and the expansion ½(1 − √(1 − 4𝑥)) = 𝑥 + 𝑂(𝑥²)
to see that the following approximation holds as 𝑛 becomes large:
\[ p^*_{\mathrm{err}}(\lambda, \psi^{\otimes n}, \phi^{\otimes n}) \approx \lambda(1 - \lambda)\,|\langle\psi|\phi\rangle|^{2n} = \lambda(1 - \lambda)\,2^{-n\left(-\log_2 |\langle\psi|\phi\rangle|^2\right)}. \tag{5.3.43} \]
Now, because |⟨𝜓|𝜙⟩|² ∈ [0, 1], we have that − log₂ |⟨𝜓|𝜙⟩|² ≥ 0, with strict
inequality whenever the two states are distinct, which means that, as 𝑛 becomes
large, the optimal error probability decays exponentially to
zero. Does the exponential decay hold more generally? In other words, for
arbitrary states 𝜌 and 𝜎, is it true that 𝑝*_err(𝜆, 𝜌^⊗𝑛, 𝜎^⊗𝑛) ≈ 2^(−𝑛 𝜉(𝜆,𝜌,𝜎)) as 𝑛 becomes
large, where 𝜉(𝜆, 𝜌, 𝜎) = −(1/𝑛) log₂ 𝑝*_err(𝜆, 𝜌^⊗𝑛, 𝜎^⊗𝑛) is a non-negative asymptotic
error exponent that is independent of 𝑛? To be more precise, does the limit
lim_{𝑛→∞} −(1/𝑛) log₂ 𝑝*_err(𝜆, 𝜌^⊗𝑛, 𝜎^⊗𝑛) exist, and if so, what is its value?
The following theorem provides positive answers to both questions. The
characterization given below is useful because the quantity 𝑝 ∗err (𝜆, 𝜌 ⊗𝑛 , 𝜎 ⊗𝑛 )
becomes more and more difficult to calculate as 𝑛 increases, so that the asymptotic
error exponent is a helpful characterization of 𝑝 ∗err (𝜆, 𝜌 ⊗𝑛 , 𝜎 ⊗𝑛 ).

Theorem 5.4 Quantum Chernoff Bound

For all quantum states 𝜌 and 𝜎, and 𝜆 ∈ (0, 1), the following limit exists and is
equal to the quantum Chernoff divergence of 𝜌 and 𝜎:
\[ \lim_{n \to \infty} -\frac{1}{n} \log_2 p^*_{\mathrm{err}}(\lambda, \rho^{\otimes n}, \sigma^{\otimes n}) = C(\rho\|\sigma), \tag{5.3.44} \]
where
\[ C(\rho\|\sigma) := \sup_{s \in (0,1)} \left( -\log_2 \operatorname{Tr}[\rho^s \sigma^{1-s}] \right). \tag{5.3.45} \]
That is, 𝐶 (𝜌∥𝜎) is the optimal asymptotic error exponent for symmetric
hypothesis testing of 𝜌 and 𝜎.

Remark: Theorem 5.4 tells us that, as 𝑛 becomes large, the following approximation holds

\[ p^*_{\mathrm{err}}(\lambda, \rho^{\otimes n}, \sigma^{\otimes n}) \approx 2^{-n\,C(\rho\|\sigma)}, \tag{5.3.46} \]

so that the optimal error probability does indeed decay exponentially with the number 𝑛 of copies
of the state. In particular, the quantum Chernoff divergence is the optimal asymptotic error
exponent for the exponential decay of the error probability as the number 𝑛 of copies increases.
Note that the optimal error exponent in (5.3.44) is independent of the prior probability
distribution. This means that, in the asymptotic limit (i.e., in the limit 𝑛 → ∞), the prior
probability distribution is irrelevant for the optimal error exponent.
We call Theorem 5.4 the quantum Chernoff bound because it is analogous to the Chernoff
bound from (classical) symmetric hypothesis testing, which is the task of discriminating between
two hypotheses, each of which has a corresponding probability distribution (please consult the
Bibliographic Notes in Section 3.4 for details).
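
For small systems, both sides of (5.3.44) can be explored numerically. The following sketch assumes NumPy and approximates the supremum in (5.3.45) by a grid search over 𝑠, which is a simplification and not the only way to evaluate it; the function names, grid size, and tolerance are illustrative choices. It also computes the finite-𝑛 exponent −(1/𝑛) log₂ 𝑝*_err(𝜆, 𝜌^⊗𝑛, 𝜎^⊗𝑛) from (5.3.41) by forming the tensor powers explicitly, which is feasible only for small 𝑛.

```python
import numpy as np

def psd_power(A, s, tol=1e-12):
    """A^s for a positive semi-definite matrix A, taken on its support."""
    evals, evecs = np.linalg.eigh(A)
    powered = np.where(evals > tol, evals, 0.0) ** s
    return (evecs * powered) @ evecs.conj().T

def chernoff_divergence(rho, sigma, grid=999):
    """Grid approximation of sup_{s in (0,1)} -log2 Tr[rho^s sigma^(1-s)]."""
    s_values = np.linspace(0.0, 1.0, grid + 2)[1:-1]   # interior points only
    q_values = [np.trace(psd_power(rho, s) @ psd_power(sigma, 1 - s)).real
                for s in s_values]
    return -np.log2(min(q_values))

def finite_n_exponent(lam, rho, sigma, n):
    """-(1/n) log2 of the optimal error probability for n copies, via (5.3.41)."""
    rho_n, sigma_n = rho, sigma
    for _ in range(n - 1):
        rho_n, sigma_n = np.kron(rho_n, rho), np.kron(sigma_n, sigma)
    delta = lam * rho_n - (1 - lam) * sigma_n
    p_opt = 0.5 * (1 - np.abs(np.linalg.eigvalsh(delta)).sum())
    return -np.log2(p_opt) / n

# Illustrative commuting example: two diagonal qubit states.
rho = np.diag([0.9, 0.1])
sigma = np.diag([0.2, 0.8])
print("Chernoff divergence (grid):", chernoff_divergence(rho, sigma))
for n in range(1, 7):
    print(n, finite_n_exponent(0.5, rho, sigma, n))
```

By (5.3.76), each finite-𝑛 exponent is at least 𝐶(𝜌∥𝜎); the theorem states that the gap closes as 𝑛 → ∞, although the convergence can be slow.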

An important element of the proof of the quantum Chernoff bound is Lemma 5.5
below.

Lemma 5.5
Let 𝐴 and 𝐵 be positive semi-definite operators. Then, for all 𝑠 ∈ (0, 1),
\[ \frac{1}{2}\left(\operatorname{Tr}[A + B] - \|A - B\|_1\right) \leq \operatorname{Tr}[A^s B^{1-s}]. \tag{5.3.47} \]

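Proof: Let Δ ≔ 𝐴 − 𝐵, and let Δ+ and Δ− be the positive and negative parts,
respectively, of Δ, so that Δ = Δ+ − Δ− (recall (2.2.66)). Then,
\[ |\Delta| = |\Delta_+ - \Delta_-| = \Delta_+ + \Delta_-. \tag{5.3.48} \]
Therefore,
\[ \|A - B\|_1 = \|\Delta\|_1 = \operatorname{Tr}[|\Delta|] = \operatorname{Tr}[\Delta_+] + \operatorname{Tr}[\Delta_-]. \tag{5.3.49} \]
In addition, we write
\[ A + B = A - B + 2B = \Delta_+ - \Delta_- + 2B, \tag{5.3.50} \]
so that
\begin{align}
\frac{1}{2}\left(\operatorname{Tr}[A + B] - \|A - B\|_1\right)
&= \frac{1}{2}\left(\operatorname{Tr}[\Delta_+] - \operatorname{Tr}[\Delta_-] + 2\operatorname{Tr}[B] - \operatorname{Tr}[\Delta_+] - \operatorname{Tr}[\Delta_-]\right) \tag{5.3.51} \\
&= \operatorname{Tr}[B] - \operatorname{Tr}[\Delta_-]. \tag{5.3.52}
\end{align}
So it suffices to prove that the following inequality holds for all 𝑠 ∈ (0, 1):
\[ \operatorname{Tr}[B] - \operatorname{Tr}[\Delta_-] \leq \operatorname{Tr}[A^s B^{1-s}]. \tag{5.3.53} \]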
Using the fact that Δ+ ≥ 0, by definition of the positive part of Δ, we obtain
\[ B + \Delta_+ \geq B. \tag{5.3.54} \]
Similarly, using 𝐴 − 𝐵 = Δ+ − Δ− , we obtain
\[ A + \Delta_- = B + \Delta_+ \geq B. \tag{5.3.55} \]
By the operator monotonicity of 𝑥 ↦→ 𝑥^𝑠 for 𝑠 ∈ (0, 1) (recall Definition 2.13 and
thereafter), the inequality in (5.3.55) implies that
\[ B^s \leq (A + \Delta_-)^s. \tag{5.3.56} \]
Using this, along with the fact that Tr[𝐵] = Tr[𝐵^𝑠 𝐵^(1−𝑠)], we find that
\begin{align}
\operatorname{Tr}[B] - \operatorname{Tr}[A^s B^{1-s}] &= \operatorname{Tr}[(B^s - A^s)B^{1-s}] \tag{5.3.57} \\
&\leq \operatorname{Tr}[((A + \Delta_-)^s - A^s)B^{1-s}] \tag{5.3.58} \\
&\leq \operatorname{Tr}[((A + \Delta_-)^s - A^s)(A + \Delta_-)^{1-s}] \tag{5.3.59} \\
&= \operatorname{Tr}[A] + \operatorname{Tr}[\Delta_-] - \operatorname{Tr}[A^s (A + \Delta_-)^{1-s}] \tag{5.3.60} \\
&\leq \operatorname{Tr}[A] + \operatorname{Tr}[\Delta_-] - \operatorname{Tr}[A] \tag{5.3.61} \\
&= \operatorname{Tr}[\Delta_-]. \tag{5.3.62}
\end{align}
The first inequality follows because 𝐵^(1−𝑠) ≥ 0 and from (5.3.56). The second
inequality follows because (𝐴 + Δ− )^𝑠 ≥ 𝐴^𝑠 and from (5.3.56) with the substitution
𝑠 → 1 − 𝑠. The last inequality follows because
\[ \operatorname{Tr}[A^s (A + \Delta_-)^{1-s}] \geq \operatorname{Tr}[A^s A^{1-s}] = \operatorname{Tr}[A], \tag{5.3.63} \]
which in turn follows because 𝐴^𝑠 ≥ 0 and (𝐴 + Δ− )^(1−𝑠) ≥ 𝐴^(1−𝑠). Therefore, we
conclude that
\[ \operatorname{Tr}[B] - \operatorname{Tr}[A^s B^{1-s}] \leq \operatorname{Tr}[\Delta_-], \tag{5.3.64} \]
establishing the desired inequality in (5.3.53). ■
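
A numerical spot check of Lemma 5.5 can help build intuition, although it is of course no substitute for the proof. The sketch below assumes NumPy and draws random positive semi-definite matrices of the form 𝐺𝐺†; the sampling scheme, seed, and helper names are illustrative choices. It evaluates the gap between the right-hand and left-hand sides of (5.3.47).

```python
import numpy as np

def psd_power(A, s, tol=1e-12):
    """A^s for a positive semi-definite matrix A, taken on its support."""
    evals, evecs = np.linalg.eigh(A)
    powered = np.where(evals > tol, evals, 0.0) ** s
    return (evecs * powered) @ evecs.conj().T

def random_psd(d, rng):
    """Random positive semi-definite matrix G G^dagger (not normalized)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return G @ G.conj().T

def lemma_gap(A, B, s):
    """Right-hand side minus left-hand side of (5.3.47); should be >= 0."""
    lhs = 0.5 * (np.trace(A + B).real - np.abs(np.linalg.eigvalsh(A - B)).sum())
    rhs = np.trace(psd_power(A, s) @ psd_power(B, 1 - s)).real
    return rhs - lhs

rng = np.random.default_rng(1)
gaps = [lemma_gap(random_psd(3, rng), random_psd(3, rng), s)
        for _ in range(20) for s in (0.1, 0.5, 0.9)]
print(min(gaps) >= -1e-9)
```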

Exercise 5.13
Let 𝐴 and 𝐵 be positive semi-definite operators and 𝑠 ∈ (0, 1). Starting with
Lemma 5.5, prove that
\[ \frac{1}{2}\|A - B\|_1 \geq \frac{1}{2}\operatorname{Tr}[A + B] - \left\| A^s B^{1-s} \right\|_1. \tag{5.3.65} \]
(Hint: Recall Theorem 2.10.)

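Remark: In the case 𝑠 = ½, the inequality in (5.3.65) becomes
\begin{align}
\frac{1}{2}\|A - B\|_1 &\geq \frac{1}{2}\operatorname{Tr}[A + B] - \left\| \sqrt{A}\sqrt{B} \right\|_1 \tag{5.3.66} \\
&= \frac{1}{2}\operatorname{Tr}[A + B] - \sqrt{F(A, B)}, \tag{5.3.67}
\end{align}
where in the second line we let
\[ F(A, B) := \left\| \sqrt{A}\sqrt{B} \right\|_1^2. \tag{5.3.68} \]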
The quantity 𝐹 ( 𝐴, 𝐵) is called the fidelity between 𝐴 and 𝐵, and it plays a critical role in
the analysis of quantum communication protocols. We study the fidelity function in detail in
Chapter 6.

We now proceed to the proof of the quantum Chernoff bound.

Proof of the Quantum Chernoff Bound (Theorem 5.4): Since the limit on
the left-hand side of (5.3.44) need not exist a priori, let us define the following
quantities based on the discussion in Section 2.3.1:
\begin{align}
\underline{\xi}(\rho, \sigma) &:= \liminf_{n \to \infty} -\frac{1}{n} \log_2 p^*_{\mathrm{err}}(\lambda, \rho^{\otimes n}, \sigma^{\otimes n}), \tag{5.3.69} \\
\overline{\xi}(\rho, \sigma) &:= \limsup_{n \to \infty} -\frac{1}{n} \log_2 p^*_{\mathrm{err}}(\lambda, \rho^{\otimes n}, \sigma^{\otimes n}). \tag{5.3.70}
\end{align}
Note that, by definition, we always have \underline{\xi}(\rho, \sigma) ≤ \overline{\xi}(\rho, \sigma); see (2.3.5). Our goal
is to prove that \underline{\xi}(\rho, \sigma) = \overline{\xi}(\rho, \sigma) = lim_{𝑛→∞} −(1/𝑛) log₂ 𝑝*_err(𝜆, 𝜌^⊗𝑛, 𝜎^⊗𝑛) = 𝐶 (𝜌∥𝜎).
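Now, if 𝜆 is the probability with which 𝜌 is chosen, and 1 − 𝜆 the probability
with which 𝜎 is chosen, then an application of Lemma 5.5, with 𝐴 = 𝜆𝜌^⊗𝑛,
𝐵 = (1 − 𝜆)𝜎^⊗𝑛, and 𝑠 ∈ (0, 1), gives the following:
\begin{align}
p^*_{\mathrm{err}}(\lambda, \rho^{\otimes n}, \sigma^{\otimes n}) &= \frac{1}{2}\left(1 - \left\| \lambda\rho^{\otimes n} - (1 - \lambda)\sigma^{\otimes n} \right\|_1\right) \tag{5.3.71} \\
&\leq \operatorname{Tr}\!\left[\lambda^s (\rho^{\otimes n})^s (1 - \lambda)^{1-s} (\sigma^{\otimes n})^{1-s}\right] \tag{5.3.72} \\
&= \lambda^s (1 - \lambda)^{1-s} \operatorname{Tr}[(\rho^s)^{\otimes n} (\sigma^{1-s})^{\otimes n}] \tag{5.3.73} \\
&= \lambda^s (1 - \lambda)^{1-s} \left(\operatorname{Tr}[\rho^s \sigma^{1-s}]\right)^n \tag{5.3.74} \\
&\leq \left(\operatorname{Tr}[\rho^s \sigma^{1-s}]\right)^n, \tag{5.3.75}
\end{align}
where the last line follows from the fact that 𝜆^𝑠 (1 − 𝜆)^(1−𝑠) ≤ 1. By taking a negative
logarithm and dividing by 𝑛, this bound becomes
\[ -\frac{1}{n} \log_2 p^*_{\mathrm{err}}(\lambda, \rho^{\otimes n}, \sigma^{\otimes n}) \geq -\log_2 \operatorname{Tr}[\rho^s \sigma^{1-s}]. \tag{5.3.76} \]
Since the bound above holds for all 𝑠 ∈ (0, 1), we obtain the following bound:
\begin{align}
\underline{\xi}(\rho, \sigma) &= \liminf_{n \to \infty} -\frac{1}{n} \log_2 p^*_{\mathrm{err}}(\lambda, \rho^{\otimes n}, \sigma^{\otimes n}) \tag{5.3.77} \\
&\geq \sup_{s \in (0,1)} \left( -\log_2 \operatorname{Tr}[\rho^s \sigma^{1-s}] \right) \tag{5.3.78} \\
&= C(\rho\|\sigma). \tag{5.3.79}
\end{align}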

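For the opposite inequality, we start by observing that it suffices to optimize over
projectors when determining the optimal error probability 𝑝*_err(𝜆, 𝜌^⊗𝑛, 𝜎^⊗𝑛). (This
is true because one choice of an optimal measurement is a projective measurement,
as shown in the proof of Theorem 5.3, where we can set Λ0 = 0.) Next, suppose
that 𝜌 and 𝜎 have the following spectral decompositions:
\[ \rho = \sum_{i=0}^{d-1} \lambda_i |\psi_i\rangle\langle\psi_i|, \qquad \sigma = \sum_{j=0}^{d-1} \mu_j |\phi_j\rangle\langle\phi_j|, \tag{5.3.80} \]
where 𝑑 is the dimension of the underlying Hilbert space. Then, for every
projection Π,
\begin{align}
\operatorname{Tr}[(\mathbb{1} - \Pi)\rho] &= \sum_{i=0}^{d-1} \lambda_i \operatorname{Tr}[(\mathbb{1} - \Pi)|\psi_i\rangle\langle\psi_i|] \tag{5.3.81} \\
&= \sum_{i=0}^{d-1} \lambda_i \operatorname{Tr}[(\mathbb{1} - \Pi)|\psi_i\rangle\langle\psi_i|(\mathbb{1} - \Pi)] \tag{5.3.82} \\
&= \sum_{i,j=0}^{d-1} \lambda_i \operatorname{Tr}[|\phi_j\rangle\langle\phi_j|(\mathbb{1} - \Pi)|\psi_i\rangle\langle\psi_i|(\mathbb{1} - \Pi)] \tag{5.3.83} \\
&= \sum_{i,j=0}^{d-1} \lambda_i \left| \langle\psi_i|(\mathbb{1} - \Pi)|\phi_j\rangle \right|^2, \tag{5.3.84}
\end{align}
where we have used the fact that 1 = Σ_{𝑗=0}^{𝑑−1} |𝜙𝑗⟩⟨𝜙𝑗|. Similarly, using the fact that
1 = Σ_{𝑖=0}^{𝑑−1} |𝜓𝑖⟩⟨𝜓𝑖|, we have
\begin{align}
\operatorname{Tr}[\Pi\sigma] &= \sum_{j=0}^{d-1} \mu_j \operatorname{Tr}[\Pi|\phi_j\rangle\langle\phi_j|] \tag{5.3.85} \\
&= \sum_{j=0}^{d-1} \mu_j \operatorname{Tr}[\Pi|\phi_j\rangle\langle\phi_j|\Pi] \tag{5.3.86} \\
&= \sum_{i,j=0}^{d-1} \mu_j \operatorname{Tr}[|\psi_i\rangle\langle\psi_i|\Pi|\phi_j\rangle\langle\phi_j|\Pi] \tag{5.3.87} \\
&= \sum_{i,j=0}^{d-1} \mu_j \left| \langle\psi_i|\Pi|\phi_j\rangle \right|^2. \tag{5.3.88}
\end{align}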

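Then, using the fact that 𝜆𝜆𝑖 ≥ min{𝜆𝜆𝑖 , (1 − 𝜆)𝜇 𝑗 } and (1 − 𝜆)𝜇 𝑗 ≥ min{𝜆𝜆𝑖 , (1 − 𝜆)𝜇 𝑗 }
for all 0 ≤ 𝑖, 𝑗 ≤ 𝑑 − 1, the error probability under the measurement given by Π is
\begin{align}
p_{\mathrm{err}}(\lambda, \rho, \sigma, \Pi)
&= \operatorname{Tr}[(\mathbb{1} - \Pi)(\lambda\rho)] + \operatorname{Tr}[\Pi(1 - \lambda)\sigma] \tag{5.3.89} \\
&= \sum_{i,j=0}^{d-1} \left( \lambda\lambda_i \left| \langle\psi_i|(\mathbb{1} - \Pi)|\phi_j\rangle \right|^2 + (1 - \lambda)\mu_j \left| \langle\psi_i|\Pi|\phi_j\rangle \right|^2 \right) \tag{5.3.90} \\
&\geq \sum_{i,j=0}^{d-1} \min\{\lambda\lambda_i, (1 - \lambda)\mu_j\} \left( |x_{i,j}|^2 + |y_{i,j}|^2 \right), \tag{5.3.91}
\end{align}
where we have defined 𝑥𝑖,𝑗 ≔ ⟨𝜓𝑖 |(1 − Π)|𝜙 𝑗 ⟩ and 𝑦𝑖,𝑗 ≔ ⟨𝜓𝑖 |Π|𝜙 𝑗 ⟩ in the last line.
Using the inequality |𝑎|² + |𝑏|² ≥ ½|𝑎 + 𝑏|², which holds for all 𝑎, 𝑏 ∈ C, together
with 𝑥𝑖,𝑗 + 𝑦𝑖,𝑗 = ⟨𝜓𝑖 |𝜙 𝑗 ⟩, we obtain
\[ p_{\mathrm{err}}(\lambda, \rho, \sigma, \Pi) \geq \frac{1}{2} \sum_{i,j=0}^{d-1} \min\{\lambda\lambda_i, (1 - \lambda)\mu_j\} \left| \langle\psi_i|\phi_j\rangle \right|^2. \tag{5.3.92} \]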

Now, let us define two probability distributions, 𝑝, 𝑞 : {0, 1, . . . , 𝑑 − 1}² → [0, 1]
as follows:
\[ p(i, j) := \lambda_i \left| \langle\psi_i|\phi_j\rangle \right|^2, \qquad q(i, j) := \mu_j \left| \langle\psi_i|\phi_j\rangle \right|^2, \qquad \forall\ 0 \leq i, j \leq d - 1. \tag{5.3.93} \]
It is straightforward to verify that 𝑝 and 𝑞 are indeed probability distributions.
Then, since the projector Π is arbitrary, and it suffices to optimize over projective
measurements, as explained above, the following inequality holds
\[ p^*_{\mathrm{err}}(\lambda, \rho, \sigma) \geq \frac{1}{2} \sum_{i,j=0}^{d-1} \min\{\lambda p(i, j), (1 - \lambda) q(i, j)\}. \tag{5.3.94} \]

The expression on the right-hand side of the above inequality is precisely half the
optimal error probability for discriminating the two probability distributions 𝑝 and 𝑞,
with a prior probability of 𝜆. Indeed, we can see this by an application of Theorem 5.3
to the case of commutative 𝐴 and 𝐵. To this end, letting 𝐴 = Σ_{𝑖=0}^{𝑑−1} 𝑎𝑖 |𝑖⟩⟨𝑖| and
𝐵 = Σ_{𝑖=0}^{𝑑−1} 𝑏𝑖 |𝑖⟩⟨𝑖|, where 𝑎𝑖 , 𝑏𝑖 ≥ 0 for all 𝑖 ∈ {0, . . . , 𝑑 − 1}, it follows that an optimal
measurement operator is Π+ = Σ_{𝑖:𝑎𝑖 ≥𝑏𝑖} |𝑖⟩⟨𝑖|, with complement Π− = Σ_{𝑖:𝑎𝑖 <𝑏𝑖} |𝑖⟩⟨𝑖|,
so that
\begin{align}
\inf_{M : 0 \leq M \leq \mathbb{1}} \left\{ \operatorname{Tr}[(\mathbb{1} - M)A] + \operatorname{Tr}[MB] \right\} &= \operatorname{Tr}[\Pi_- A] + \operatorname{Tr}[\Pi_+ B] \tag{5.3.95} \\
&= \sum_{i : a_i < b_i} a_i + \sum_{i : a_i \geq b_i} b_i \tag{5.3.96} \\
&= \sum_{i=0}^{d-1} \min\{a_i, b_i\}. \tag{5.3.97}
\end{align}
We denote this optimal error probability by 𝑝*_err(𝜆, 𝑝, 𝑞). Therefore,
\[ p^*_{\mathrm{err}}(\lambda, \rho, \sigma) \geq \frac{1}{2}\, p^*_{\mathrm{err}}(\lambda, p, q). \tag{5.3.98} \]

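Now, for 𝑛 ≥ 2, by following the same arguments as above, we obtain
\[ p^*_{\mathrm{err}}(\lambda, \rho^{\otimes n}, \sigma^{\otimes n}) \geq \frac{1}{2}\, p^*_{\mathrm{err}}(\lambda, p^{(n)}, q^{(n)}), \tag{5.3.99} \]
where 𝑝^(𝑛), 𝑞^(𝑛) are the 𝑛-fold i.i.d. probability distributions corresponding to 𝑝
and 𝑞, respectively. Then, by the classical Chernoff bound (please consult the
Bibliographic Notes in Section 3.4), we find that
\begin{align}
\overline{\xi}(\rho, \sigma) &= \limsup_{n \to \infty} -\frac{1}{n} \log_2 p^*_{\mathrm{err}}(\lambda, \rho^{\otimes n}, \sigma^{\otimes n}) \tag{5.3.100} \\
&\leq \limsup_{n \to \infty} -\frac{1}{n} \log_2 p^*_{\mathrm{err}}(\lambda, p^{(n)}, q^{(n)}) \tag{5.3.101} \\
&= \sup_{s \in (0,1)} \left( -\log_2 \sum_{i,j=0}^{d-1} p(i, j)^s q(i, j)^{1-s} \right), \tag{5.3.102}
\end{align}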

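where the factor of ½ in (5.3.99) vanishes in the asymptotic limit. Finally, observe
that
\begin{align}
\sum_{i,j=0}^{d-1} p(i, j)^s q(i, j)^{1-s} &= \sum_{i,j=0}^{d-1} \lambda_i^s \left| \langle\psi_i|\phi_j\rangle \right|^{2s} \mu_j^{1-s} \left| \langle\psi_i|\phi_j\rangle \right|^{2(1-s)} \tag{5.3.103} \\
&= \sum_{i,j=0}^{d-1} \lambda_i^s \mu_j^{1-s} \left| \langle\psi_i|\phi_j\rangle \right|^2 \tag{5.3.104} \\
&= \sum_{i,j=0}^{d-1} \lambda_i^s \mu_j^{1-s} \operatorname{Tr}[|\psi_i\rangle\langle\psi_i|\phi_j\rangle\langle\phi_j|] \tag{5.3.105} \\
&= \operatorname{Tr}\!\left[ \left( \sum_{i=0}^{d-1} \lambda_i^s |\psi_i\rangle\langle\psi_i| \right) \left( \sum_{j=0}^{d-1} \mu_j^{1-s} |\phi_j\rangle\langle\phi_j| \right) \right] \tag{5.3.106} \\
&= \operatorname{Tr}[\rho^s \sigma^{1-s}]. \tag{5.3.107}
\end{align}
Therefore,
\[ \overline{\xi}(\rho, \sigma) \leq \sup_{s \in (0,1)} \left( -\log_2 \operatorname{Tr}[\rho^s \sigma^{1-s}] \right) = C(\rho\|\sigma), \tag{5.3.108} \]
which, combined with (5.3.79) and \underline{\xi}(\rho, \sigma) ≤ \overline{\xi}(\rho, \sigma), implies that
\[ \underline{\xi}(\rho, \sigma) = \overline{\xi}(\rho, \sigma) = \sup_{s \in (0,1)} \left( -\log_2 \operatorname{Tr}[\rho^s \sigma^{1-s}] \right) = C(\rho\|\sigma), \tag{5.3.109} \]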
which is what we set out to prove. ■
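
The key identity (5.3.103)–(5.3.107), which links the distributions 𝑝 and 𝑞 of (5.3.93) to Tr[𝜌^𝑠 𝜎^(1−𝑠)], is easy to confirm numerically. The sketch below assumes NumPy and uses randomly generated full-rank states; the function names, seed, and the value 𝑠 = 0.3 are illustrative choices.

```python
import numpy as np

def psd_power(A, s, tol=1e-12):
    """A^s for a positive semi-definite matrix A, taken on its support."""
    evals, evecs = np.linalg.eigh(A)
    powered = np.where(evals > tol, evals, 0.0) ** s
    return (evecs * powered) @ evecs.conj().T

def random_state(d, rng):
    """Random density matrix G G^dagger / Tr[G G^dagger]."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def eigenbasis_distributions(rho, sigma):
    """The distributions p(i, j) and q(i, j) from (5.3.93), as d x d arrays."""
    lam, psi = np.linalg.eigh(rho)      # columns of psi are the |psi_i>
    mu, phi = np.linalg.eigh(sigma)     # columns of phi are the |phi_j>
    overlaps = np.abs(psi.conj().T @ phi) ** 2        # |<psi_i|phi_j>|^2
    p = np.clip(lam, 0, None)[:, None] * overlaps
    q = np.clip(mu, 0, None)[None, :] * overlaps
    return p, q

rng = np.random.default_rng(0)
rho, sigma = random_state(3, rng), random_state(3, rng)
p, q = eigenbasis_distributions(rho, sigma)

s = 0.3
classical = np.sum(p ** s * q ** (1 - s))
quantum = np.trace(psd_power(rho, s) @ psd_power(sigma, 1 - s)).real
print(p.sum(), q.sum())        # both are 1 up to rounding
print(classical, quantum)      # the two values agree up to rounding
```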


5.3.2 Multiple State Discrimination

We now briefly consider state discrimination when there are more than two states.
Suppose that Alice prepares a quantum system in a state chosen randomly from a
set {𝜌 𝑥 }𝑥∈X of states. We assume that X is some finite alphabet with size |X| ≥ 2
and that the state 𝜌 𝑥 is chosen with probability 𝑝(𝑥), where 𝑝 : X → [0, 1] is a
probability distribution. Alice sends her chosen state to Bob, whose task is to guess
the value of 𝑥, i.e., which state Alice sent. Bob’s knowledge of the system can be
described by the ensemble {(𝑝(𝑥), 𝜌_𝐵^𝑥)}_{𝑥∈X}, which has the following associated
classical–quantum state:
\[ \rho_{X_A B} := \sum_{x \in \mathcal{X}} p(x)\, |x\rangle\langle x|_{X_A} \otimes \rho_B^x, \tag{5.3.110} \]
where 𝑋 𝐴 is a classical |X|-dimensional register that contains Alice’s choice of
state. Note that while Bob knows both the prior probability distribution 𝑝 and
the association 𝑥 ↔ 𝜌 𝑥 between the labels 𝑥 and the states 𝜌 𝑥 , he does not have
access to the register 𝑋 𝐴 . (If Bob did have access to the classical register 𝑋 𝐴 , he
could simply measure it in the basis {|𝑥⟩}𝑥∈X and figure out what state Alice sent.)
Therefore, as before, Bob must make a measurement. His strategy is to choose a
POVM {𝑀_𝐵^𝑥}_{𝑥∈X} with elements indexed by the elements of X. If he obtains the
outcome corresponding to 𝑥 ∈ X, then he guesses that the state sent was 𝜌 𝑥 .
The scenario of multiple state discrimination is very similar to the task of
classical communication over a quantum channel N that we consider in Chapter 12.
The classical messages to be sent correspond to the labels 𝑥 ∈ X, while the states 𝜌 𝑥
correspond to an encoding of the messages into quantum states, which are then sent
through the quantum channel, and 𝑝 corresponds to the prior probability over the set
of messages. The measurement performed, in order to guess the state, corresponds
to a decoding channel that is applied at the receiving end of the quantum channel in
order to determine the message that was sent. The quantity in (5.3.119), evaluated
for the ensemble {( 𝑝(𝑥), N(𝜌 𝑥 ))}𝑥∈X , is then the optimal average probability of
correctly guessing the message sent, where the optimization is over all POVMs
indexed by the messages.
Let $\mathcal{M}_{B\to X_B}(\cdot) := \sum_{x\in\mathcal{X}} \operatorname{Tr}[M_B^{x}(\cdot)]\,|x\rangle\!\langle x|_{X_B}$ be the measurement channel corresponding to the POVM $\{M_B^{x}\}_{x\in\mathcal{X}}$, where $X_B$ is an $|\mathcal{X}|$-dimensional classical register. (Recall the definition of a measurement channel from Definition 4.10.) After the measurement, the classical–quantum state in (5.3.110) transforms to
\begin{align}
\omega_{X_A X_B} &:= \mathcal{M}_{B\to X_B}(\rho_{X_A B}) \tag{5.3.111}\\
&= \sum_{x,x'\in\mathcal{X}} p(x)\operatorname{Tr}[M_B^{x'}\rho_B^{x}]\,|x\rangle\!\langle x|_{X_A} \otimes |x'\rangle\!\langle x'|_{X_B}. \tag{5.3.112}
\end{align}

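Let
\begin{equation}
\Pi^{\text{succ}}_{X_A X_B} := \sum_{x\in\mathcal{X}} |x\rangle\!\langle x|_{X_A} \otimes |x\rangle\!\langle x|_{X_B}, \tag{5.3.113}
\end{equation}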
which is a projector corresponding to the registers 𝑋 𝐴 and 𝑋𝐵 having the same
value; this is what we want for state discrimination to be successful. The expected
success probability of the strategy given by the POVM {𝑀 𝑥 }𝑥∈X is thus

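\begin{align}
p_{\text{succ}}(\{(p(x),\rho^{x})\}_x, \{M^{x}\}_x) &:= \operatorname{Tr}[\Pi^{\text{succ}}_{X_A X_B}\, \omega_{X_A X_B}] \tag{5.3.114}\\
&= \sum_{x\in\mathcal{X}} p(x)\operatorname{Tr}[M^{x}\rho^{x}]. \tag{5.3.115}
\end{align}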
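As a concrete illustration of (5.3.115), the following Python sketch evaluates the average success probability of a fixed POVM on a small ensemble. The helper names (`matmul`, `trace`, `success_probability`) and the example ensemble (the states |0⟩ and |+⟩ with equal priors, measured in the computational basis) are assumptions made for illustration only; the measurement used here is not claimed to be optimal.

```python
# Minimal sketch: average success probability sum_x p(x) Tr[M^x rho^x].
# Matrices are square nested lists of real or complex numbers.

def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    """Trace of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


def success_probability(priors, states, povm):
    """Return sum_x p(x) Tr[M^x rho^x]; the three lists share the index x."""
    return sum(p * trace(matmul(m, rho)).real
               for p, rho, m in zip(priors, states, povm))


# Hypothetical example: Alice sends |0> or |+> with probability 1/2 each,
# and Bob measures in the computational basis.
rho_zero = [[1, 0], [0, 0]]            # |0><0|
rho_plus = [[0.5, 0.5], [0.5, 0.5]]    # |+><+|
povm = [[[1, 0], [0, 0]],              # M^0 = |0><0|
        [[0, 0], [0, 1]]]              # M^1 = |1><1|

p_succ = success_probability([0.5, 0.5], [rho_zero, rho_plus], povm)
p_err = 1 - p_succ
print(p_succ, p_err)
```

For this ensemble and measurement, the success probability is 3/4 and the error probability is 1/4. Other POVMs may achieve a higher success probability for the same ensemble.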

Exercise 5.14
Show that the error probability for discriminating the states in the ensemble
{( 𝑝(𝑥), 𝜌 𝑥 )}𝑥∈X using the POVM {𝑀 𝑥 }𝑥∈X is

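\begin{align}
p_{\text{err}}(\{(p(x),\rho^{x})\}_x, \{M^{x}\}_x) &:= 1 - p_{\text{succ}}(\{(p(x),\rho^{x})\}_x, \{M^{x}\}_x) \tag{5.3.116}\\
&= \operatorname{Tr}[(\mathbb{1}_{X_A X_B} - \Pi^{\text{succ}}_{X_A X_B})\, \omega_{X_A X_B}] \tag{5.3.117}\\
&= \sum_{x\in\mathcal{X}} p(x)\operatorname{Tr}[(\mathbb{1} - M^{x})\rho^{x}]. \tag{5.3.118}
\end{align}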

The optimal success probability for discriminating the states in the ensemble
{( 𝑝(𝑥), 𝜌 𝑥 )}𝑥∈X is

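\begin{equation}
p^{*}_{\text{succ}}(\{(p(x),\rho^{x})\}_x) := \sup_{\{M^{x}\}_x} \left\{ \sum_{x\in\mathcal{X}} p(x)\operatorname{Tr}[M^{x}\rho^{x}] : 0 \leq M^{x} \leq \mathbb{1}\ \forall x,\ \sum_{x\in\mathcal{X}} M^{x} = \mathbb{1} \right\}. \tag{5.3.119}
\end{equation}

The optimal error probability is then $p^{*}_{\text{err}}(\{(p(x),\rho^{x})\}_x) := 1 - p^{*}_{\text{succ}}(\{(p(x),\rho^{x})\}_x)$.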

Exercise 5.15
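Given an ensemble $\{(p(x),\rho^{x})\}_{x\in\mathcal{X}}$ of quantum states with associated classical–quantum state $\rho_{XB} = \sum_{x\in\mathcal{X}} p(x)|x\rangle\!\langle x|_X \otimes \rho_B^{x}$, show that the optimal success probability $p^{*}_{\text{succ}}(\{(p(x),\rho^{x})\}_x)$ can be evaluated using the following semi-definite program:
\begin{equation}
\begin{array}{ll}
\text{maximize} & \operatorname{Tr}[M_{XB}\,\rho_{XB}] \\
\text{subject to} & \operatorname{Tr}_X[M_{XB}] = \mathbb{1}_B, \\
& M_{XB} \geq 0.
\end{array} \tag{5.3.120}
\end{equation}
In other words, show that
\begin{equation}
p^{*}_{\text{succ}}(\{(p(x),\rho^{x})\}_x) = \sup_{M_{XB}\geq 0} \left\{ \operatorname{Tr}[M_{XB}\,\rho_{XB}] : \operatorname{Tr}_X[M_{XB}] = \mathbb{1}_B \right\}. \tag{5.3.121}
\end{equation}
Then, using strong duality, prove that
\begin{align}
p^{*}_{\text{succ}}(\{(p(x),\rho^{x})\}_x) &= \inf_{Y_B\geq 0} \left\{ \operatorname{Tr}[Y_B] : \mathbb{1}_X \otimes Y_B \geq \rho_{XB} \right\} \tag{5.3.122}\\
&= \inf_{Y_B\geq 0} \left\{ \operatorname{Tr}[Y_B] : Y_B \geq p(x)\rho_B^{x}\ \forall x\in\mathcal{X} \right\}. \tag{5.3.123}
\end{align}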

Finally, evaluate the complementary slackness conditions from Proposition 2.29.

Exercise 5.16
Let {|𝜓 𝑥 ⟩}𝑥∈X be a finite set of mutually orthogonal vectors, let 𝜌 𝑥 = |𝜓 𝑥 ⟩⟨𝜓 𝑥 |
for all 𝑥 ∈ X, and consider the ensemble {( 𝑝(𝑥), 𝜌 𝑥 )}𝑥∈X , where 𝑝 : X → [0, 1]
is a probability distribution. Prove that 𝑝 ∗succ ({( 𝑝(𝑥), 𝜌 𝑥 )}𝑥 ) = 1 and construct
the corresponding optimal measurement.

Exercise 5.17
Generalize Proposition 5.2 to the case of multiple state discrimination. Specif-
ically, for every finite ensemble {( 𝑝(𝑥), 𝜌 𝑥 )}𝑥∈X of states and every positive,
trace preserving map N, prove that

𝑝 ∗succ ({( 𝑝(𝑥), 𝜌 𝑥 )}𝑥 ) ≥ 𝑝 ∗succ ({( 𝑝(𝑥), N(𝜌 𝑥 ))}𝑥 ). (5.3.124)

5.3.3 Asymmetric Case

Given quantum states 𝜌 and 𝜎, the goal of asymmetric quantum hypothesis testing
is to minimize the type-II error probability given an upper bound on the type-I
error probability, as per the optimization problem in (5.3.3). The value of that
optimization problem is given by

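\begin{equation}
\beta_{\varepsilon}(\rho\|\sigma) := \inf\{\operatorname{Tr}[M\sigma] : 0 \leq M \leq \mathbb{1},\ \operatorname{Tr}[M\rho] \geq 1-\varepsilon\}, \tag{5.3.125}
\end{equation}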

with 𝜀 ∈ [0, 1] being an upper bound on the type-I error probability. Intuitively,
we might expect a trade-off between the type-I and type-II error probabilities. In
particular, we might expect that we can achieve a lower minimum type-II error
probability by increasing our tolerance on the type-I error probability. This is indeed
true. Observe that every measurement operator 𝑀 that satisfies Tr[𝑀 𝜌] ≥ 1 − 𝜀
also satisfies Tr[𝑀 𝜌] ≥ 1 − 𝜀′ for all 𝜀′ greater than 𝜀. All such measurement
operators are thus feasible points in the optimization for 𝛽𝜀′ (𝜌∥𝜎). Therefore,

𝛽𝜀′ (𝜌∥𝜎) ≤ 𝛽𝜀 (𝜌∥𝜎) ∀ 𝜀′ ∈ [𝜀, 1]. (5.3.126)

Exercise 5.18
Show that 𝛽𝜀 (𝜌∥𝜎) can be evaluated using a semi-definite program. Then,
using strong duality, prove that an alternate expression for 𝛽𝜀 (𝜌∥𝜎) is

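\begin{equation}
\beta_{\varepsilon}(\rho\|\sigma) = \sup_{\mu\geq 0,\, Z\geq 0} \{\mu(1-\varepsilon) - \operatorname{Tr}[Z] : \mu\rho \leq \sigma + Z\}. \tag{5.3.127}
\end{equation}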

Finally, evaluate the complementary slackness conditions from Proposition 2.29

and conclude that

𝑀 (𝜎 + 𝑍) = 𝜇𝑀 𝜌, Tr[𝑀 𝜌] 𝜇 = (1 − 𝜀)𝜇, 𝑀 𝑍 = 𝑍. (5.3.128)

Exercise 5.19
Prove that the minimum type-II error probability for asymmetric hypothesis
testing of states 𝜌 and 𝜎, with 𝜀 ∈ [0, 1], is isometrically invariant: for every
isometry 𝑉, we have that 𝛽𝜀 (𝜌∥𝜎) = 𝛽𝜀 (𝑉 𝜌𝑉 † ∥𝑉 𝜎𝑉 † ).

As with the minimum error probability for symmetric hypothesis testing, the
minimum type-II error probability for asymmetric hypothesis testing obeys the
following data-processing inequality.

Proposition 5.6 Data-Processing Inequality for Asymmetric Hypothesis Testing
Consider states 𝜌 and 𝜎, 𝜀 ∈ [0, 1], and let N be a positive and trace preserving
map. Then,
𝛽𝜀 (𝜌∥𝜎) ≤ 𝛽𝜀 (N(𝜌)∥N(𝜎)). (5.3.129)

Proof: The proof is analogous to the proof of Proposition 5.2, and the intuition
behind it is as well. Let 𝑀 ′ be an operator satisfying 0 ≤ 𝑀 ′ ≤ 1 and Tr[𝑀 ′N(𝜌)] ≥
1 − 𝜀. Then, due to the positivity of N, and thus of N† , we have that N† (𝑀 ′) ≥ 0
and N† ( 1 − 𝑀 ′) ≥ 0 ⇒ N† (𝑀 ′) ≤ N† ( 1). Since N is trace preserving, the
adjoint N† is unital (see Exercise 4.10), which means that N† ( 1) = 1, so that
0 ≤ N† (𝑀 ′) ≤ 1. Furthermore, by definition of the adjoint of a superoperator, the
inequality Tr[N† (𝑀 ′) 𝜌] ≥ 1 − 𝜀 holds. Therefore, N† (𝑀 ′) is a feasible point in
the optimization for 𝛽𝜀 (𝜌∥𝜎), so that

𝛽𝜀 (𝜌∥𝜎) = inf{Tr[𝑀𝜎] : 0 ≤ 𝑀 ≤ 1, Tr[𝑀 𝜌] ≥ 1 − 𝜀} (5.3.130)

≤ Tr[N† (𝑀 ′)𝜎] (5.3.131)
= Tr[𝑀 ′N(𝜎)], (5.3.132)

where the last line follows by definition of the adjoint of a superoperator. Finally,
because the inequality 𝛽𝜀 (𝜌∥𝜎) ≤ Tr[𝑀 ′N(𝜎)] holds for every operator 𝑀 ′
satisfying 0 ≤ 𝑀 ′ ≤ 1 and Tr[𝑀 ′N(𝜌)] ≥ 1 − 𝜀, we conclude that

𝛽𝜀 (𝜌∥𝜎) ≤ inf{Tr[𝑀 ′N(𝜎)] : 0 ≤ 𝑀 ′ ≤ 1, Tr[𝑀 ′N(𝜌)] ≥ 1 − 𝜀} (5.3.133)

= 𝛽𝜀 (N(𝜌)∥N(𝜎)), (5.3.134)

as required. ■

Proposition 5.7 Optimal Measurement for Asymmetric Hypothesis Testing
For every state $\rho$, positive semi-definite operator $\sigma$, and $\varepsilon\in[0,1]$, the minimum type-II error probability$^{a}$ $\beta_{\varepsilon}(\rho\|\sigma)$ is achieved by the following measurement operator:
\begin{equation}
M(\mu^{*},p^{*}) := \Pi_{\mu^{*}\rho>\sigma} + p^{*}\,\Pi_{\mu^{*}\rho=\sigma}, \tag{5.3.135}
\end{equation}
where $\Pi_{\mu^{*}\rho>\sigma}$ is the projection onto the strictly positive part of $\mu^{*}\rho-\sigma$, the projection $\Pi_{\mu^{*}\rho=\sigma}$ projects onto the zero eigenspace of $\mu^{*}\rho-\sigma$, and $\mu^{*}\geq 0$ and $p^{*}\in[0,1]$ are chosen as follows:
\begin{align}
\mu^{*} &:= \sup\left\{\mu : \operatorname{Tr}[\Pi_{\mu\rho>\sigma}\,\rho] \leq 1-\varepsilon\right\}, \tag{5.3.136}\\
p^{*} &:= \frac{1-\varepsilon-\operatorname{Tr}[\Pi_{\mu^{*}\rho>\sigma}\,\rho]}{\operatorname{Tr}[\Pi_{\mu^{*}\rho=\sigma}\,\rho]}. \tag{5.3.137}
\end{align}
$^{a}$ Even though this quantity need not be an error probability when $\sigma$ is a general positive semi-definite operator, we still refer to it as such.

Proof: First, observe that it suffices to optimize with respect to every measurement
operator 𝑀 that meets the constraint Tr[𝑀 𝜌] ≥ 1 − 𝜀 with equality. This follows
because for every measurement operator 𝑀 such that Tr[𝑀 𝜌] > 1 − 𝜀, there exists
a positive number 𝜆 ∈ [0, 1) such that Tr[(𝜆𝑀) 𝜌] = 1 − 𝜀. Note that 0 ≤ 𝜆𝑀 ≤ 1,
so that 𝜆𝑀 is a measurement operator, and because Tr[(𝜆𝑀)𝜎] < Tr[𝑀𝜎], we
conclude that
𝛽𝜀 (𝜌∥𝜎) = inf{Tr[𝑀𝜎] : 0 ≤ 𝑀 ≤ 1, Tr[𝑀 𝜌] = 1 − 𝜀}. (5.3.138)
Based on this, let 𝑀 be a measurement operator satisfying Tr[𝑀 𝜌] = 1 − 𝜀 and let
𝜇 ≥ 0. Then,
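\begin{align}
\operatorname{Tr}[M\sigma] &= \operatorname{Tr}[M\sigma] + \mu\left(1-\varepsilon-\operatorname{Tr}[M\rho]\right) \tag{5.3.139}\\
&= -\mu\varepsilon + \operatorname{Tr}[(\mathbb{1}-M)\mu\rho] + \operatorname{Tr}[M\sigma] \tag{5.3.140}\\
&\geq -\mu\varepsilon + \frac{1}{2}\left(\operatorname{Tr}[\mu\rho+\sigma] - \|\mu\rho-\sigma\|_1\right) \tag{5.3.141}\\
&= -\mu\varepsilon + \frac{1}{2}\left(\mu + \operatorname{Tr}[\sigma] - \|\mu\rho-\sigma\|_1\right). \tag{5.3.142}
\end{align}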
The sole inequality follows as an application of Theorem 5.3, with 𝐵 = 𝜎 and
𝐴 = 𝜇𝜌. Observe that the final expression is a universal bound independent of 𝑀.
To determine an optimal measurement operator, we can look to Theorem 5.3.
There, it was established that the following measurement operator is an optimal
one for inf 𝑀:0≤𝑀 ≤1 {Tr[( 1 − 𝑀)𝜇𝜌] + Tr[𝑀𝜎]}:
𝑀 (𝜇, 𝑝) B Π 𝜇𝜌>𝜎 + 𝑝Π 𝜇𝜌=𝜎 , (5.3.143)
where Π 𝜇𝜌>𝜎 is the projection onto the strictly positive part of 𝜇𝜌 − 𝜎, the
projection Π 𝜇𝜌=𝜎 projects onto the zero eigenspace of 𝜇𝜌 − 𝜎, and 𝑝 ∈ [0, 1]. The
measurement operator 𝑀 (𝜇, 𝑝) is called a quantum Neyman–Pearson test. We still
need to choose the parameters 𝜇 ≥ 0 and 𝑝 ∈ [0, 1]. Let us pick 𝜇 according to the
following optimization:

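\begin{equation}
\mu^{*} := \sup\left\{\mu : \operatorname{Tr}[\Pi_{\mu\rho>\sigma}\,\rho] \leq 1-\varepsilon\right\}. \tag{5.3.144}
\end{equation}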

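If it happens that $\mu^{*}$ is such that $\operatorname{Tr}[\Pi_{\mu^{*}\rho>\sigma}\,\rho] = 1-\varepsilon$, then we are done; we can pick $p^{*} = 0$. However, if $\mu^{*}$ is such that $\operatorname{Tr}[\Pi_{\mu^{*}\rho>\sigma}\,\rho] < 1-\varepsilon$, then we pick $p^{*}\in[0,1]$ such that
\begin{equation}
p^{*} := \frac{1-\varepsilon-\operatorname{Tr}[\Pi_{\mu^{*}\rho>\sigma}\,\rho]}{\operatorname{Tr}[\Pi_{\mu^{*}\rho=\sigma}\,\rho]}, \tag{5.3.145}
\end{equation}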
with it following that 𝑝 ∗ ∈ [0, 1] because

Tr[Π 𝜇∗ 𝜌>𝜎 𝜌] < 1 − 𝜀 ≤ Tr[Π 𝜇∗ 𝜌≥𝜎 𝜌]. (5.3.146)

With these choices, we then find that

Tr[𝑀 (𝜇∗ , 𝑝 ∗ ) 𝜌] = 1 − 𝜀. (5.3.147)

By the analysis in (5.3.139)–(5.3.142), it then follows that

Tr[𝑀𝜎] ≥ Tr[𝑀 (𝜇∗ , 𝑝 ∗ )𝜎] (5.3.148)

for every measurement operator 𝑀 satisfying 0 ≤ 𝑀 ≤ 1 and Tr[𝑀 𝜌] = 1 − 𝜀. ■
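When 𝜌 and 𝜎 commute, the quantum Neyman–Pearson test reduces to a classical likelihood-ratio test on their eigenvalues in a common eigenbasis. The Python sketch below covers only this commuting special case; the function name `neyman_pearson` and the example eigenvalues are hypothetical. Sorting the eigenvectors by the ratio q_i/p_i corresponds to increasing 𝜇 in the projection onto the positive part of 𝜇𝜌 − 𝜎, and the fractional weight placed on the boundary eigenvector plays the role of 𝑝∗.

```python
def neyman_pearson(p, q, eps):
    """Commuting (classical) special case of the Neyman-Pearson test.

    p and q are the eigenvalues of rho and sigma in a common eigenbasis.
    Returns (beta, m): the minimum type-II error and the weights m[i] in [0, 1]
    of the measurement operator M = sum_i m[i] |i><i|.
    """
    target = 1.0 - eps
    m = [0.0] * len(p)
    # Eigenvectors with p[i] = 0 cannot help meet Tr[M rho] >= 1 - eps.
    order = sorted((i for i in range(len(p)) if p[i] > 0),
                   key=lambda i: q[i] / p[i])
    acc, beta = 0.0, 0.0
    for i in order:
        if acc >= target:
            break
        # Full weight if possible, otherwise the fraction needed to hit target.
        m[i] = min(1.0, (target - acc) / p[i])
        acc += m[i] * p[i]
        beta += m[i] * q[i]
    return beta, m


# Hypothetical example: rho = diag(0.5, 0.5), sigma = diag(0.1, 0.9).
beta_tight, m_tight = neyman_pearson([0.5, 0.5], [0.1, 0.9], eps=0.25)
beta_loose, m_loose = neyman_pearson([0.5, 0.5], [0.1, 0.9], eps=0.5)
print(beta_tight, m_tight)
print(beta_loose, m_loose)
```

In the first call the test keeps the first eigenvector with weight 1 and the second with weight 1/2, giving a type-II error of about 0.55; relaxing the type-I tolerance to 0.5 lowers the type-II error to about 0.1, in line with (5.3.126). When several eigenvectors share the same ratio, this sketch distributes the boundary weight greedily rather than uniformly, which gives the same value of Tr[𝑀𝜎].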

5.3.3.1 Asymptotic Setting

Now, suppose that multiple copies, say 𝑛, of the state (𝜌 or 𝜎) are available.
Based on the discussion at the beginning of Section 5.3, the optimal type-II
error probability, given an upper bound of 𝜀 on the type-I error probability, is
𝛽𝜀 (𝜌 ⊗𝑛 ∥𝜎 ⊗𝑛 ) for all 𝑛 ≥ 1. Then, as in the symmetric case, we are interested in
the behaviour of this type-II error probability as 𝑛 becomes large. Furthermore,
based on the earlier discussion of the trade-off between the type-I and type-II error
probabilities, we might imagine that as 𝑛 becomes large it is possible to bring the
type-I error probability all the way down to zero, because the states become more
distinguishable as 𝑛 increases. In order to investigate this possibility, we consider
the optimal type-II error exponent, by analogy with the error exponent for state
discrimination that we considered in Section 5.3.1.1. To be precise, we consider the following quantities:
\begin{align}
E(\rho,\sigma) &:= \inf_{\varepsilon\in(0,1)} \liminf_{n\to\infty} -\frac{1}{n}\log_2 \beta_{\varepsilon}(\rho^{\otimes n}\|\sigma^{\otimes n}), \tag{5.3.149}\\
\widetilde{E}(\rho,\sigma) &:= \sup_{\varepsilon\in(0,1)} \limsup_{n\to\infty} -\frac{1}{n}\log_2 \beta_{\varepsilon}(\rho^{\otimes n}\|\sigma^{\otimes n}). \tag{5.3.150}
\end{align}
We refer to $E(\rho,\sigma)$ as the optimal achievable type-II error exponent and $\widetilde{E}(\rho,\sigma)$ as the optimal strong converse type-II error exponent. Note that the following inequality is a direct consequence of definitions:
\begin{equation}
E(\rho,\sigma) \leq \widetilde{E}(\rho,\sigma). \tag{5.3.151}
\end{equation}

The following result, known as the quantum Stein’s lemma, provides us with a
tractable expression for $E(\rho,\sigma)$ and $\widetilde{E}(\rho,\sigma)$ in terms of the quantum relative
entropy, an important quantity in quantum information theory that we introduce
formally in Chapter 7. We also delay the proof of the result to Chapter 7, when all
of the required elements of the proof become available to us.

Theorem 5.8 Quantum Stein’s Lemma

For all states 𝜌 and 𝜎, the optimal achievable and strong converse rates are
equal to the quantum relative entropy of 𝜌 and 𝜎, i.e.,

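\begin{equation}
E(\rho,\sigma) = \widetilde{E}(\rho,\sigma) = D(\rho\|\sigma), \tag{5.3.152}
\end{equation}
where
\begin{equation}
D(\rho\|\sigma) := \begin{cases} \operatorname{Tr}[\rho(\log_2\rho - \log_2\sigma)] & \text{if } \operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma), \\ +\infty & \text{otherwise} \end{cases} \tag{5.3.153}
\end{equation}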

is the quantum relative entropy.

Proof: See Section 7.10. ■

Remark: See Section 2.2.8.1 for the definition of the logarithm of a Hermitian operator. See
Section 7.2 for a more detailed explanation of the support conditions in the definition of the
quantum relative entropy.
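For commuting states, the quantum relative entropy in (5.3.153) reduces to the classical relative entropy of the eigenvalue distributions. The following Python sketch evaluates it in that special case only, including the support condition; the function name `relative_entropy` and the example distributions are illustrative assumptions.

```python
import math


def relative_entropy(p, q):
    """D(p||q) in bits for the eigenvalues of two commuting states.

    Returns +infinity when the support of p is not contained in that of q.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue                # convention: 0 log 0 = 0
        if q_i == 0:
            return math.inf         # support condition violated
        total += p_i * (math.log2(p_i) - math.log2(q_i))
    return total


# Hypothetical example: rho = diag(0.5, 0.5), sigma = diag(0.25, 0.75).
d_finite = relative_entropy([0.5, 0.5], [0.25, 0.75])
d_infinite = relative_entropy([0.5, 0.5], [1.0, 0.0])
print(d_finite, d_infinite)
```

Here the first value is roughly 0.2075 bits, which by Theorem 5.8 is the optimal exponential decay rate, per copy, of the type-II error probability for this pair. The second value is infinite because the first state has weight outside the support of the second.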

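[Figure 5.11 diagram: the state ρ_RA is prepared; system A is sent through the unknown channel (N or M) while the reference system R is kept; the two-outcome POVM {M_N, M_M} is then applied to the reference system and the channel output. Guess “N” if the outcome is M_N, and guess “M” if the outcome is M_M.]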

Figure 5.11: The most general strategy for discriminating two quantum channels
N and M is to prepare a bipartite state 𝜌 𝑅 𝐴 , with the reference system 𝑅 having
arbitrary dimension, send the system 𝐴 through the unknown quantum
channel, and then measure both systems 𝑅 and 𝐴 according to a two-outcome
POVM {𝑀N , 𝑀M }. If the outcome corresponding to 𝑀N occurs, then we guess
that the unknown channel is N; otherwise, we guess that it is M. The minimum
error probability among all such strategies is given by Theorem 5.9.

5.4 Quantum Channel Discrimination

In Sections 5.3.1 and 5.3.2, we considered symmetric hypothesis testing with
respect to quantum states, also known as quantum state discrimination, in which the
hypotheses are modelled as quantum states, and the task is to devise a measurement
strategy that maximizes the probability of success of correctly identifying the state
of a given quantum system. Let us now consider the analogous scenario with
quantum channels, which we refer to as quantum channel discrimination. We do
not discuss asymmetric hypothesis testing with respect to quantum channels in this
book; please see the Bibliographic Notes in Section 5.6 for references.
The task of quantum channel discrimination is as follows. Consider that a
quantum system 𝐴 undergoes an evolution according to one of two quantum
channels, N 𝐴→𝐵 or M 𝐴→𝐵 . Furthermore, we suppose that the quantum channel is
chosen to be N with probability 𝜆 ∈ [0, 1] and M with probability 1 − 𝜆. The goal
is to devise a strategy that correctly guesses the unknown quantum channel with
the highest probability.
The most general strategy for quantum channel discrimination is illustrated in
Figure 5.11. The strategy consists of a state 𝜌 𝑅 𝐴 and a two-outcome measurement
described by the POVM {𝑀N , 𝑀M }. The bipartite state 𝜌 𝑅 𝐴 is such that the
system 𝑅 can be arbitrarily large in principle, in order to try to achieve the lowest
error probability. The measurement acts on both the system 𝑅 and the system
𝐴 after system 𝐴 has passed through the unknown channel. The expected error
probability of this strategy is analogous to the expected error probability of a
strategy for quantum state discrimination: it is the expectation, with respect to
the prior probability distribution given by 𝜆, of the probabilities of the two types
of errors that can occur: guessing “M” when the channel is N, and guessing “N”
when the channel is M. In other words, the expected error probability is

𝜆Tr[𝑀M N 𝐴→𝐵 (𝜌 𝑅 𝐴 )] + (1 − 𝜆)Tr[𝑀N M 𝐴→𝐵 (𝜌 𝑅 𝐴 )]

= 𝑝 err (𝜆, N 𝐴→𝐵 (𝜌 𝑅 𝐴 ), M 𝐴→𝐵 (𝜌 𝑅 𝐴 ), 𝑀), (5.4.1)

where the last line follows by letting 𝑀N ≡ 𝑀 and from the definition of 𝑝 err in
(5.3.1). We see that, given a state 𝜌 𝑅 𝐴 , the task of discriminating N and M reduces
to the task of discriminating the states N 𝐴→𝐵 (𝜌 𝑅 𝐴 ) and M 𝐴→𝐵 (𝜌 𝑅 𝐴 ). The optimal
error probability is obtained by optimizing with respect to every state 𝜌 𝑅 𝐴 and
measurement operator 𝑀, so that

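\begin{align}
p^{*}_{\text{err}}(\lambda,\mathcal{N},\mathcal{M}) &:= \inf_{\substack{\rho_{RA},\\ M:\,0\leq M\leq\mathbb{1}}} p_{\text{err}}(\lambda,\mathcal{N}_{A\to B}(\rho_{RA}),\mathcal{M}_{A\to B}(\rho_{RA}),M) \tag{5.4.2}\\
&= \inf_{\rho_{RA}} p^{*}_{\text{err}}(\lambda,\mathcal{N}_{A\to B}(\rho_{RA}),\mathcal{M}_{A\to B}(\rho_{RA})), \tag{5.4.3}
\end{align}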

where the optimization is with respect to every quantum state 𝜌 𝑅 𝐴 , and there is
an implicit optimization with respect to the dimension of the system 𝑅. This
gives us a first look into how a quantity defined initially for quantum states can be
“lifted” to a quantity defined for quantum channels. In particular, the quantity 𝑝 ∗err ,
initially defined for quantum states as in (5.3.6), has been extended to quantum
channels by evaluating the state quantity with respect to the states N 𝐴→𝐵 (𝜌 𝑅 𝐴 ) and
M 𝐴→𝐵 (𝜌 𝑅 𝐴 ) and then optimizing with respect to both the state 𝜌 𝑅 𝐴 and the
dimension of 𝑅. Such constructions of channel quantities from state quantities
arise throughout the rest of the book.
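To illustrate the reduction from channel discrimination to state discrimination, the Python sketch below fixes one input state, applies two example qubit channels, and evaluates the optimal state-discrimination error ½(1 − ∥𝜆𝜌 − (1 − 𝜆)𝜎∥₁) for the two outputs. Everything specific here is an assumption made for illustration: the two channels (the identity channel and a completely dephasing channel), the inputs, the helper names, and the restriction to a qubit input with no reference system. Because the input is fixed rather than optimized, each value is only an upper bound on the optimal error probability for the two channels.

```python
import math


def trace_norm_2x2(h):
    """Trace norm of a 2x2 Hermitian matrix, computed from its eigenvalues."""
    a, b, d = h[0][0].real, h[0][1], h[1][1].real
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return abs(mean + radius) + abs(mean - radius)


def state_discrimination_error(lam, rho, sigma):
    """Optimal error (1/2)(1 - ||lam*rho - (1-lam)*sigma||_1) for qubit states."""
    diff = [[lam * rho[i][j] - (1 - lam) * sigma[i][j] for j in range(2)]
            for i in range(2)]
    return 0.5 * (1 - trace_norm_2x2(diff))


def identity_channel(rho):
    """Example channel: leaves the input unchanged."""
    return [row[:] for row in rho]


def dephasing_channel(rho):
    """Example channel: removes the off-diagonal entries of the input."""
    return [[rho[0][0], 0.0], [0.0, rho[1][1]]]


plus = [[0.5, 0.5], [0.5, 0.5]]   # |+><+|
zero = [[1.0, 0.0], [0.0, 0.0]]   # |0><0|

err_plus = state_discrimination_error(
    0.5, identity_channel(plus), dephasing_channel(plus))
err_zero = state_discrimination_error(
    0.5, identity_channel(zero), dephasing_channel(zero))
print(err_plus, err_zero)
```

With the input |+⟩ the two outputs differ and the error is 1/4, whereas with the input |0⟩ both channels produce the same output and the error is 1/2, no better than guessing. This shows why the choice of input state matters in the optimization.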
Just as the optimal error probability for discriminating two states can be
expressed using the trace norm (recall (5.3.15)), we now show that, analogously, the
optimal error probability for discriminating two quantum channels can be expressed
in terms of the diamond norm.

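Theorem 5.9 Minimum Error Probability for Quantum Channel Discrimination
Let $\mathcal{N}_{A\to B}$ and $\mathcal{M}_{A\to B}$ be quantum channels, and let $\lambda\in[0,1]$. The optimal error probability for discriminating $\mathcal{N}$ and $\mathcal{M}$ is given by
\begin{align}
p^{*}_{\text{err}}(\lambda,\mathcal{N},\mathcal{M}) &= \inf_{\psi_{A'A}} p^{*}_{\text{err}}(\lambda,\mathcal{N}_{A\to B}(\psi_{A'A}),\mathcal{M}_{A\to B}(\psi_{A'A})) \tag{5.4.4}\\
&= \frac{1}{2}\left(1 - \|\lambda\mathcal{N} - (1-\lambda)\mathcal{M}\|_{\diamond}\right), \tag{5.4.5}
\end{align}
where, in the first line, the optimization is with respect to every pure state $\psi_{A'A}$ with $d_{A'} = d_A$.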

Proof: Using (5.3.15), we have

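\begin{equation}
p^{*}_{\text{err}}(\lambda,\mathcal{N},\mathcal{M}) = \frac{1}{2}\left(1 - \sup_{\rho_{RA}} \|\lambda\mathcal{N}_{A\to B}(\rho_{RA}) - (1-\lambda)\mathcal{M}_{A\to B}(\rho_{RA})\|_1\right). \tag{5.4.6}
\end{equation}
Now, consider a state $\rho_{RA}$ in the optimization above, and let $\psi_{R'RA}$ be a purification of $\rho_{RA}$. Then,
\begin{align}
&\|\lambda\mathcal{N}_{A\to B}(\rho_{RA}) - (1-\lambda)\mathcal{M}_{A\to B}(\rho_{RA})\|_1 \notag\\
&\quad= \|\lambda\operatorname{Tr}_{R'}[\mathcal{N}_{A\to B}(\psi_{R'RA})] - (1-\lambda)\operatorname{Tr}_{R'}[\mathcal{M}_{A\to B}(\psi_{R'RA})]\|_1 \tag{5.4.7}\\
&\quad\leq \|\lambda\mathcal{N}_{A\to B}(\psi_{R'RA}) - (1-\lambda)\mathcal{M}_{A\to B}(\psi_{R'RA})\|_1, \tag{5.4.8}
\end{align}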

where the inequality follows from (4.1.7), with respect to the partial trace channel $\operatorname{Tr}_{R'}$. Now, without loss of generality, we can let $d_{R'} \geq d_R d_A$; see Section 3.2.5. Then, by the Schmidt decomposition theorem (Theorem 2.2), in particular (2.2.59), the state vector $|\psi\rangle_{R'RA}$ can be expressed according to the $R'R|A$ bipartition as $|\psi\rangle_{R'RA} = \sum_{k=1}^{d_A} \sqrt{p_k}\,|u_k\rangle_{R'R} \otimes |v_k\rangle_A$, where each $p_k$ is a probability and the vectors $|u_k\rangle_{R'R}$ and $|v_k\rangle_A$ form orthonormal bases for a $d_A$-dimensional vector space. In other words, only a $d_A$-dimensional subspace of $\mathcal{H}_{R'R}$, call it $\mathcal{H}_{A'}$, is relevant for calculating the trace norm in (5.4.8), and there exists an isometry $V_{A'\to R'R}$ such that $V_{A'\to R'R}|\psi\rangle_{A'A} = |\psi\rangle_{R'RA}$. Adopting the shorthand $V \equiv V_{A'\to R'R}$, it follows that

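\begin{align}
&\|\lambda\mathcal{N}_{A\to B}(\rho_{RA}) - (1-\lambda)\mathcal{M}_{A\to B}(\rho_{RA})\|_1 \notag\\
&\quad\leq \|\lambda\mathcal{N}_{A\to B}(V\psi_{A'A}V^{\dagger}) - (1-\lambda)\mathcal{M}_{A\to B}(V\psi_{A'A}V^{\dagger})\|_1 \tag{5.4.9}\\
&\quad= \|\lambda\mathcal{N}_{A\to B}(\psi_{A'A}) - (1-\lambda)\mathcal{M}_{A\to B}(\psi_{A'A})\|_1 \tag{5.4.10}\\
&\quad\leq \sup_{\psi_{A'A}} \|\lambda\mathcal{N}_{A\to B}(\psi_{A'A}) - (1-\lambda)\mathcal{M}_{A\to B}(\psi_{A'A})\|_1 \tag{5.4.11}\\
&\quad= \|\lambda\mathcal{N} - (1-\lambda)\mathcal{M}\|_{\diamond}, \tag{5.4.12}
\end{align}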
where the last line follows from (2.2.190), because the map 𝜆N − (1 − 𝜆)M is
Hermiticity preserving. Therefore,
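\begin{equation}
p^{*}_{\text{err}}(\lambda,\mathcal{N},\mathcal{M}) \geq \frac{1}{2}\left(1 - \|\lambda\mathcal{N} - (1-\lambda)\mathcal{M}\|_{\diamond}\right). \tag{5.4.13}
\end{equation}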
The opposite inequality holds simply by restricting the optimization in (5.4.3) to
every pure state 𝜓 𝐴′ 𝐴 satisfying 𝑑 𝐴′ = 𝑑 𝐴 . We thus obtain (5.4.5), as required. ■

Exercise 5.20
Consider quantum channels N 𝐴→𝐵 and M 𝐴→𝐵 , and let 𝜆 ∈ [0, 1]. Using
(5.4.4) and (5.4.1), show that the optimal error probability 𝑝 ∗err (𝜆, N, M) can
be evaluated using a semi-definite program. Then, using strong duality, prove
that an alternate expression for 𝑝 ∗err (𝜆, N, M) is

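\begin{equation}
p^{*}_{\text{err}}(\lambda,\mathcal{N},\mathcal{M}) = \sup_{W_{AB}\ \text{Hermitian}} \left\{ \lambda_{\min}(\operatorname{Tr}_B[W_{AB}]) : W_{AB} \leq \lambda\Gamma^{\mathcal{N}}_{AB},\ W_{AB} \leq (1-\lambda)\Gamma^{\mathcal{M}}_{AB} \right\}, \tag{5.4.14}
\end{equation}
where $\lambda_{\min}(\operatorname{Tr}_B[W_{AB}])$ is the smallest eigenvalue of $\operatorname{Tr}_B[W_{AB}]$; see Exercise 2.31.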

Exercise 5.21
Consider quantum channels N 𝐴→𝐵 and M 𝐴→𝐵 , and let 𝜆 ∈ [0, 1]. Prove the
following bounds on the optimal error probability for discriminating N and M
in terms of the optimal error probability for discriminating the Choi states of N
and M:

𝑑 𝐴 𝑝 ∗err (𝜆, ΦN M ∗ ∗ N M
𝐴𝐵 , Φ 𝐴𝐵 ) ≤ 𝑝 err (𝜆, N, M) ≤ 𝑝 err (𝜆, Φ 𝐴𝐵 , Φ 𝐴𝐵 ). (5.4.15)

(Hint: See Exercise 4.7.)

The upper bound in (5.4.15) corresponds to the strategy that consists of letting the state $\rho_{RA}$ in Figure 5.11 be the maximally entangled state $\Phi_{A'A} = |\Phi\rangle\!\langle\Phi|_{A'A}$, with $|\Phi\rangle_{A'A} = \frac{1}{\sqrt{d_A}}\sum_{i=0}^{d_A-1} |i,i\rangle_{A'A}$. The following exercise tells us when this strategy is optimal, i.e., when the upper bound in (5.4.15) is achieved.

Exercise 5.22
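Let the quantum channels $\mathcal{N}_{A\to B}$ and $\mathcal{M}_{A\to B}$ be jointly covariant with respect to a group $G$, so that
\begin{align}
\mathcal{N}_{A\to B}(U_A^{g}\rho_A U_A^{g\dagger}) &= V_B^{g}\,\mathcal{N}_{A\to B}(\rho_A)\,V_B^{g\dagger}, \tag{5.4.16}\\
\mathcal{M}_{A\to B}(U_A^{g}\rho_A U_A^{g\dagger}) &= V_B^{g}\,\mathcal{M}_{A\to B}(\rho_A)\,V_B^{g\dagger}, \tag{5.4.17}
\end{align}
for every $g\in G$ and every state $\rho_A$, where $\{U_A^{g}\}_{g\in G}$ and $\{V_B^{g}\}_{g\in G}$ are unitary representations of $G$ acting on $\mathcal{H}_A$ and $\mathcal{H}_B$, respectively. Furthermore, let $\{U_A^{g}\}_{g\in G}$ be such that
\begin{equation}
\mathcal{T}_A^{G}(\cdot) := \frac{1}{|G|}\sum_{g\in G} U_A^{g}(\cdot)U_A^{g\dagger} = \operatorname{Tr}[\cdot]\,\frac{\mathbb{1}_A}{d_A}. \tag{5.4.18}
\end{equation}
Prove that $p^{*}_{\text{err}}(\lambda,\mathcal{N},\mathcal{M}) = p^{*}_{\text{err}}(\lambda,\Phi^{\mathcal{N}}_{AB},\Phi^{\mathcal{M}}_{AB})$. (Hint: Use Proposition 4.20.)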

5.5 Summary

5.6 Bibliographic Notes

The quantum teleportation protocol for qubits and qudits presented in Section 5.1
was devised by Bennett et al. (1993). The generalization to groups that act
irreducibly on the input state was presented by Braunstein et al. (2000); Werner
(2001). Covariant channels were considered by Holevo (2002b). For an overview
of group- and representation-theoretic concepts for finite groups, see Steinberg
(2011).
The notion of teleportation simulation of a quantum channel was introduced by
Bennett et al. (1996c) (see Section V therein). The LOCC simulation of a quantum
channel was given by Horodecki et al. (1999) (see Eq. (10) therein), and the related
PPT simulation of a quantum channel was given by Kaur and Wilde (2017). The
generalized teleportation simulation of a quantum channel was developed for groups
that act irreducibly on the channel input space by Chiribella et al. (2009) (see
Section VII therein). Pirandola et al. (2017) played a role in developing the tool of
LOCC/teleportation simulation more recently. The fact that the channel twirl can be
realized by generalized teleportation simulation was observed by Kaur and Wilde
(2017) (see Appendix B therein). The teleportation protocol was extended to states
of bosonic systems by Braunstein and Kimble (1998). Teleportation simulation of
bosonic Gaussian channels was given by Giedke and Ignacio Cirac (2002); Niset
et al. (2009), and a detailed analysis of convergence in this scenario was presented
by Wilde (2018a).
The quantum super-dense coding protocol was discovered by Bennett and
Wiesner (1992).
The problem of state discrimination, as described in Section 5.3.1, was con-
sidered by Helstrom (1967) (see also Helstrom (1969)) and Holevo (1972a), who
determined the optimal success probability in the case of projective measurements
and general POVMs, respectively. The proof of the optimal measurement operators
in Theorem 5.3 is due to Jencova (2010). Lemma 5.5 is due to Audenaert et al.
(2007), with the simple proof presented here attributed by Jaksic et al. (2012) and
Audenaert (2014) as being due to N. Ozawa. The Chernoff bound for probability
distributions was given by Chernoff (1952). The corresponding bound for quantum
states in Theorem 5.4 was established in two works: Audenaert et al. (2007) deter-
mined the upper bound on the optimal error exponent, while Nussbaum and Szkoła
(2009) established the lower bound. The semi-definite program and complementary
slackness conditions for multiple state discrimination in Section 5.3.2 are due to
Yuen et al. (1975). For work on measurement strategies in two-state discrimination,
including adaptive strategies and strategies that achieve the Chernoff bound, we
refer to the work of Brody and Meister (1996); Ban et al. (1997); Acín et al. (2005);
Higgins et al. (2011); Brandsen et al. (2020).
The operational interpretation of diamond distance in terms of symmetric
hypothesis testing of quantum channels (specifically, Theorem 5.9) was given by
Rosgen and Watrous (2005); Sacchi (2005). The work of Kretschmann and Werner
(2004); Gilchrist et al. (2005) provides a different operational interpretation of
diamond distance in terms of quantifying the error between an ideal channel and an
experimental approximation of it, which we elaborate upon in Chapter 6. ...(refer-
ences for multiple channel discrimination...references for asymmetric hypothesis
testing for channels...)

5.7 Problems
1. Let 𝜌 𝐴𝐵 be a quantum state with 𝑑 𝐴 = 𝑑 𝐵 = 𝑑 ≥ 2, and consider the quantity

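\begin{equation}
q_{\text{corr}}(A|B)_{\rho} \equiv q_{\text{corr}}(\rho_{AB}) := d\,\sup_{\mathcal{N}_B} \langle\Phi|_{AB}\,(\operatorname{id}_A \otimes \mathcal{N}_B)(\rho_{AB})\,|\Phi\rangle_{AB}, \tag{5.7.1}
\end{equation}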

where the optimization is with respect to quantum channels N𝐵 acting on system 𝐵.

(a) Show that 𝑞 corr (𝜌 𝐴𝐵 ) can be evaluated using a semi-definite program, and determine
the corresponding dual program.
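(b) Suppose that $\rho_{AB}$ is a classical–quantum state, i.e., $\rho_{AB} \equiv \rho_{XB} = \sum_{x\in\mathcal{X}} p(x)|x\rangle\!\langle x|_X \otimes \rho_B^{x}$, where $\mathcal{X}$ is a set of size $d$, $p:\mathcal{X}\to[0,1]$ is a probability distribution, and $\{\rho_B^{x}\}_{x\in\mathcal{X}}$ is a set of states. Prove that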

𝑞 corr (𝜌 𝑋 𝐵 ) = 𝑝 ∗succ ({( 𝑝(𝑥), 𝜌 𝑥 )}𝑥 ). (5.7.2)

In other words, for classical–quantum states, the quantity 𝑞 corr reduces to the optimal
success probability for multiple state discrimination of the set {𝜌 𝑥𝐵 }𝑥∈X . (Hint: See
Exercise 4.11.)
(Bibliographic Note: The function 𝑞 corr was defined by Koenig et al. (2009) within the
context of the min-entropy (a quantity that we encounter in Chapter 7) and its operational
meaning.)

Chapter 6

Distinguishability Measures for Quantum States and Channels
[in progress]

6.1 Trace Distance

The trace distance is a distance measure based on the trace norm; see Section 2.2.9.2.
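For two quantum states $\rho$ and $\sigma$, we define the normalized trace distance between $\rho$ and $\sigma$ as $\frac{1}{2}\|\rho-\sigma\|_1$.
Using (2.23), observe that the normalized trace distance between pure states $|\psi\rangle\!\langle\psi|$ and $|\phi\rangle\!\langle\phi|$ is given by
\begin{equation}
\frac{1}{2}\left\||\psi\rangle\!\langle\psi| - |\phi\rangle\!\langle\phi|\right\|_1 = \sqrt{1 - |\langle\psi|\phi\rangle|^{2}}. \tag{6.1.1}
\end{equation}
In Section 5.3.1, we showed that the trace distance arises in terms of the optimal error probability for discriminating two quantum states. Specifically, for two quantum states $\rho$ and $\sigma$, and for $\lambda = \frac{1}{2}$, we have
\begin{equation}
p^{*}_{\text{err}}(1/2,\rho,\sigma) = \frac{1}{2}\left(1 - \frac{1}{2}\|\rho-\sigma\|_1\right). \tag{6.1.2}
\end{equation}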
The trace distance thus has an operational meaning as quantifying the optimal error
probability for distinguishing two quantum states with equal prior probability.
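The relations (6.1.1) and (6.1.2) can be checked numerically for qubits. The Python sketch below is a minimal illustration under stated assumptions: it handles only 2×2 density matrices, for which 𝜌 − 𝜎 is Hermitian and traceless, so its eigenvalues are ±r and the normalized trace distance equals r. The function names and the example states |0⟩ and |+⟩ are hypothetical.

```python
import math


def outer(psi):
    """Density matrix |psi><psi| of a state vector given as a list."""
    return [[x * y.conjugate() for y in psi] for x in psi]


def normalized_trace_distance_qubit(rho, sigma):
    """(1/2)||rho - sigma||_1 for 2x2 density matrices.

    rho - sigma is Hermitian and traceless, so its eigenvalues are +r and -r
    with r = sqrt(a^2 + |b|^2), where a and b are its (0,0) and (0,1) entries.
    """
    a = (rho[0][0] - sigma[0][0]).real
    b = rho[0][1] - sigma[0][1]
    return math.sqrt(a * a + abs(b) ** 2)


# Hypothetical example: |psi> = |0> and |phi> = |+>.
psi = [1.0, 0.0]
phi = [1 / math.sqrt(2), 1 / math.sqrt(2)]

distance = normalized_trace_distance_qubit(outer(psi), outer(phi))
overlap = sum(x.conjugate() * y for x, y in zip(psi, phi))
pure_state_formula = math.sqrt(1 - abs(overlap) ** 2)   # right side of (6.1.1)
error_equal_priors = 0.5 * (1 - distance)               # right side of (6.1.2)
print(distance, pure_state_formula, error_equal_priors)
```

Both ways of computing the distance give about 0.7071 for this pair, and the corresponding optimal error probability with equal priors is about 0.1464.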
An alternative operational meaning of the trace distance is in terms of assessing
the performance of a quantum information processing protocol in which the ideal
state to be generated is 𝜌 but the actual state generated is 𝜎. To see that this is the
case, suppose that a third party is trying to assess how distinguishable the actual state
𝜎 is from the ideal state 𝜌. Such an individual can do so by performing a quantum
measurement described by the POVM {𝑀𝑥 }𝑥∈X whose elements are indexed by
some finite set X. In the case that 𝜌 was prepared, the probability of obtaining the
outcome corresponding to 𝑥 is Tr[𝑀𝑥 𝜌], and in the case that 𝜎 was prepared, this
probability is Tr[𝑀𝑥 𝜎]. In order for 𝜌 and 𝜎 to be considered “close,” what we
demand is that the absolute deviation between the actual probability Tr[𝑀𝑥 𝜎] and
the ideal probability Tr[𝑀𝑥 𝜌] be no larger than some desired tolerance 𝜀 > 0, so
that |Tr[𝑀𝑥 𝜌] − Tr[𝑀𝑥 𝜎]| ≤ 𝜀. We want this condition to hold for all possible
measurements that one could perform, so what we demand mathematically is that

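\begin{equation}
\sup_{0\leq M\leq\mathbb{1}} \left|\operatorname{Tr}[M\rho] - \operatorname{Tr}[M\sigma]\right| \leq \varepsilon. \tag{6.1.3}
\end{equation}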

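As stated in Theorem 6.1 below, the following identity holds:
\begin{equation}
\sup_{0\leq M\leq\mathbb{1}} \left|\operatorname{Tr}[M\rho] - \operatorname{Tr}[M\sigma]\right| = \frac{1}{2}\|\rho-\sigma\|_1, \tag{6.1.4}
\end{equation}
indicating that if $\frac{1}{2}\|\rho-\sigma\|_1 \leq \varepsilon$, then the deviation between probabilities for every possible measurement operator never exceeds $\varepsilon$, so that the approximation between states $\rho$ and $\sigma$ is naturally quantified by the trace distance $\frac{1}{2}\|\rho-\sigma\|_1$.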
Let us now prove the variational characterization of the trace distance stated in
(6.1.4).

Theorem 6.1 Trace Distance via Measurement

For two states 𝜌 and 𝜎, the following equality holds
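\begin{equation}
\frac{1}{2}\|\rho-\sigma\|_1 = \sup_{0\leq M\leq\mathbb{1}} \left|\operatorname{Tr}[M(\rho-\sigma)]\right|. \tag{6.1.5}
\end{equation}
An optimal measurement operator $M$ is equal to $\Pi_{+} + \Lambda_0$, where $\Pi_{+}$ is the projection onto the strictly positive part of $\rho-\sigma$ and $\Lambda_0$ is an operator satisfying $0 \leq \Lambda_0 \leq \Pi_0$, with $\Pi_0$ the projection onto the zero eigenspace of $\rho-\sigma$.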

Proof: Consider that

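\begin{equation}
\sup_{0\leq M\leq\mathbb{1}} \left|\operatorname{Tr}[M(\rho-\sigma)]\right| = \sup_{0\leq M\leq\mathbb{1}} \operatorname{Tr}[M(\rho-\sigma)], \tag{6.1.6}
\end{equation}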

because there are always choices for 𝑀 such that Tr[𝑀 (𝜌 − 𝜎)] ≥ 0. Then the
equality follows as a direct application of (5.3.17) and the optimality statement
following it, with 𝐴 = 𝜌 and 𝐵 = 𝜎. ■

From Exercise 2.30, we know that, for Hermitian operators, the trace norm
can be evaluated using semi-definite programming. The normalized trace distance
$\frac{1}{2}\|\rho-\sigma\|_1$ can therefore be evaluated using semi-definite programming, because
𝜌 − 𝜎 is Hermitian. Since 𝜌 and 𝜎 are positive semi-definite, we obtain the
following simpler semi-definite programs for their normalized trace distance.

Proposition 6.2 SDPs for Normalized Trace Distance

The trace distance between every two quantum states 𝜌 and 𝜎 can be written as
the following semi-definite programs:
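\begin{align}
\frac{1}{2}\|\rho-\sigma\|_1 &= \sup_{M\geq 0} \{\operatorname{Tr}[M(\rho-\sigma)] : M \leq \mathbb{1}\} \tag{6.1.7}\\
&= \inf_{Z\geq 0} \{\operatorname{Tr}[Z] : Z \geq \rho-\sigma\}. \tag{6.1.8}
\end{align}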

Proof: The expression in (6.1.7) is immediate from (5.3.17), which already
provides an expression for $\frac{1}{2}\|\rho-\sigma\|_1$ as a semi-definite program in the primal
form as in (2.4.3). Obtaining the expression in (6.1.8) is then straightforward; see
Exercise 6.1. ■

Exercise 6.1
Prove (6.1.8).

The trace distance obeys the following data-processing inequality.

Theorem 6.3 Data-Processing Inequality for Trace Distance

Let 𝜌 and 𝜎 be states, and let N be a positive, trace-non-increasing map. Then,

∥ 𝜌 − 𝜎∥ 1 ≥ ∥N(𝜌) − N(𝜎)∥ 1 . (6.1.9)

Proof: This is immediate from (4.1.7), which tells us that the trace norm is
monotone non-increasing under the action of every positive trace-non-increasing
superoperator for every linear operator. It is also possible to provide a direct proof
using the expression in (6.1.7); see Exercise 6.2. ■

Exercise 6.2
Provide a direct proof of (6.1.9) using the expression in (6.1.7). (Hint: See the
proof of Proposition 5.2.)

By combining the results of Theorems 6.1 and 6.3, we find that the trace distance
is achieved by a measurement channel:

Theorem 6.4 Trace Distance Achieved by Measurement Channel

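For two states $\rho$ and $\sigma$, the following equality holds:
\begin{equation}
\|\rho-\sigma\|_1 = \max_{\{\Lambda_x\}_x} \sum_{x\in\mathcal{X}} \left|\operatorname{Tr}[\Lambda_x\rho] - \operatorname{Tr}[\Lambda_x\sigma]\right|, \tag{6.1.10}
\end{equation}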

where the optimization is performed over POVMs {Λ𝑥 }𝑥∈X defined with respect
to a finite alphabet X, and an optimal POVM is given by {Λ∗ , 1 − Λ∗ }, where
Λ∗ = Π+ + Λ0 , the projection Π+ is the projection onto the strictly positive part
of 𝜌 − 𝜎, and Λ0 satisfies 0 ≤ Λ0 ≤ Π0 , with Π0 the projection onto the zero
eigenspace of 𝜌 − 𝜎.

Proof: The inequality
\begin{equation}
\|\rho-\sigma\|_1 \geq \max_{\{\Lambda_x\}_x} \sum_{x\in\mathcal{X}} \left|\operatorname{Tr}[\Lambda_x\rho] - \operatorname{Tr}[\Lambda_x\sigma]\right| \tag{6.1.11}
\end{equation}
follows from Theorem 6.3 by taking the channel $\mathcal{N}$ to be the quantum–classical channel defined as $\mathcal{N}(\omega) = \sum_{x\in\mathcal{X}} \operatorname{Tr}[\Lambda_x\omega]\,|x\rangle\!\langle x|$. Then, from Theorem 6.1, we have the equality

∥ 𝜌 − 𝜎∥ 1 = |Tr[Λ∗ (𝜌 − 𝜎)]| + |Tr[( 1 − Λ∗ )(𝜌 − 𝜎)]|, (6.1.13)

so that an optimal POVM is given by {Λ∗ , 1 − Λ∗ }. ■
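In the commuting case, Theorem 6.4 reduces to a statement about probability distributions: the ℓ1 distance between two distributions is attained by the two-outcome coarse-graining onto the set where the first distribution exceeds the second. The Python sketch below checks this by brute force over all two-outcome coarse-grainings of a small example. The function names and the example distributions are assumptions made for illustration, and the sketch does not cover non-commuting states.

```python
from itertools import combinations


def l1_distance(p, q):
    """Sum over x of |p(x) - q(x)|, the classical counterpart of ||rho - sigma||_1."""
    return sum(abs(a - b) for a, b in zip(p, q))


def two_outcome_value(p, q, subset):
    """Value of the two-outcome measurement {subset, complement}.

    Assumes p and q are normalized probability distributions.
    """
    p_s = sum(p[i] for i in subset)
    q_s = sum(q[i] for i in subset)
    return abs(p_s - q_s) + abs((1 - p_s) - (1 - q_s))


# Hypothetical example distributions (eigenvalues of commuting rho and sigma).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

# Classical analogue of the projection onto the strictly positive part of rho - sigma.
positive_part = [i for i in range(len(p)) if p[i] > q[i]]

best_by_search = max(
    two_outcome_value(p, q, subset)
    for size in range(len(p) + 1)
    for subset in combinations(range(len(p)), size)
)
print(l1_distance(p, q), two_outcome_value(p, q, positive_part), best_by_search)
```

All three printed values agree at about 0.6 for this example, so no two-outcome coarse-graining does better than the one built from the positive part.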

6.2 Fidelity
In addition to the trace distance, another distinguishability measure for states that
we consider in this book is the fidelity (also called Uhlmann fidelity).

Definition 6.5 Fidelity

For two quantum states 𝜌 and 𝜎, the fidelity between 𝜌 and 𝜎, denoted by
𝐹 (𝜌, 𝜎), is defined as

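\begin{equation}
F(\rho,\sigma) := \left\|\sqrt{\rho}\sqrt{\sigma}\right\|_1^{2} = \left(\operatorname{Tr}\!\left[\sqrt{\sqrt{\sigma}\rho\sqrt{\sigma}}\right]\right)^{2}. \tag{6.2.1}
\end{equation}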

Observe that the fidelity is symmetric in its arguments. We also have that
𝐹 (𝜌, 𝜎) ∈ [0, 1] for all states 𝜌 and 𝜎, a fact that we prove below.
For a pure state |𝜓⟩ and mixed state 𝜌, the fidelity between them is equal to

𝐹 (𝜌, |𝜓⟩⟨𝜓|) = ⟨𝜓|𝜌|𝜓⟩ = Tr[|𝜓⟩⟨𝜓|𝜌]. (6.2.2)

Also, for two pure states |𝜓⟩ and |𝜙⟩, the fidelity is simply

𝐹 (|𝜓⟩⟨𝜓|, |𝜙⟩⟨𝜙|) = |⟨𝜓|𝜙⟩| 2 . (6.2.3)

The formula in (6.2.2) gives the fidelity an operational meaning that we employ
in later chapters. Suppose that the goal of a quantum information processing
protocol is to produce the pure state |𝜓⟩⟨𝜓|, but it instead produces a mixed state
𝜌. Then the fidelity 𝐹 (𝜌, |𝜓⟩⟨𝜓|) is equal to the probability that the actual state 𝜌
passes a test for being the ideal state |𝜓⟩⟨𝜓|, with the test being given by the POVM

{|𝜓⟩⟨𝜓|, 1 − |𝜓⟩⟨𝜓|}. That is, the probability of obtaining the first outcome of the
measurement (i.e., “success”) is equal to 𝐹 (𝜌, |𝜓⟩⟨𝜓|). In this way, the fidelity
provides another natural way for assessing the performance of quantum information
processing protocols.
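As a small illustration of (6.2.2) and (6.2.3), the following sketch computes the probability that a noisy state passes the test for an ideal pure state. The example state (the state $|+\rangle$ mixed with 10% of the maximally mixed state) and the function names are assumptions made for illustration only.

```python
import math

def pure_state_fidelity(psi, phi):
    """Squared overlap of two normalized state vectors, as in (6.2.3)."""
    return abs(sum(a.conjugate() * b for a, b in zip(psi, phi))) ** 2

def pass_probability(rho, psi):
    """Expectation of rho in the state psi, i.e. the probability that rho
    passes the test for psi, as in (6.2.2)."""
    n = len(psi)
    return sum(psi[i].conjugate() * rho[i][j] * psi[j]
               for i in range(n) for j in range(n)).real

s = 1 / math.sqrt(2)
plus = [complex(s), complex(s)]          # ideal state
rho = [[0.5, 0.45], [0.45, 0.5]]         # 0.9 * plus-state + 0.1 * identity / 2
print(pass_probability(rho, plus))                 # probability of the "success" outcome
print(pure_state_fidelity(plus, [1 + 0j, 0j]))     # overlap of the plus state with the first basis state
```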
Like the trace distance, the fidelity can be computed via a semi-definite program,
as stated in the following proposition:

Proposition 6.6 SDPs for Root Fidelity of States

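The root fidelity $\sqrt{F}(\rho,\sigma) = \left\|\sqrt{\rho}\sqrt{\sigma}\right\|_1$ of quantum states $\rho$ and $\sigma$ is characterized by the following primal and dual semi-definite programs:
\begin{align}
\sqrt{F}(\rho,\sigma) &= \frac{1}{2} \sup_{X\in\mathrm{L}(\mathcal{H})} \left\{ \operatorname{Tr}[X] + \operatorname{Tr}[X^\dagger] : \begin{pmatrix} \rho & X \\ X^\dagger & \sigma \end{pmatrix} \geq 0 \right\} \tag{6.2.4} \\
&= \frac{1}{2} \inf_{Y,Z\geq 0} \left\{ \operatorname{Tr}[Y\rho] + \operatorname{Tr}[Z\sigma] : \begin{pmatrix} Y & \mathbb{1} \\ \mathbb{1} & Z \end{pmatrix} \geq 0 \right\}. \tag{6.2.5}
\end{align}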

Proof: See Appendix 6.B.1. ■

Theorem 6.7 Basic Properties of Fidelity

1. For two states 𝜌 and 𝜎, the inequalities 0 ≤ 𝐹 (𝜌, 𝜎) ≤ 1 hold. Furthermore,

𝐹 (𝜌, 𝜎) = 1 if and only if 𝜌 = 𝜎, and 𝐹 (𝜌, 𝜎) = 0 if and only if 𝜌 and 𝜎
are supported on orthogonal subspaces.
2. Isometric invariance: For all states 𝜌 and 𝜎, and for every isometry 𝑉,

𝐹 (𝜌, 𝜎) = 𝐹 (𝑉 𝜌𝑉 † , 𝑉 𝜎𝑉 † ). (6.2.6)

3. Multiplicativity: The fidelity is multiplicative with respect to tensor-product

states, meaning that for all states 𝜌1 , 𝜎1 , 𝜌2 , 𝜎2 , we have

𝐹 (𝜌1 ⊗ 𝜌2 , 𝜎1 ⊗ 𝜎2 ) = 𝐹 (𝜌1 , 𝜎1 )𝐹 (𝜌2 , 𝜎2 ). (6.2.7)

Proof:
1. The fact that $F(\rho,\sigma) \geq 0$ for all states $\rho$ and $\sigma$ follows from the definition of the fidelity as the squared trace norm and the fact that the trace norm is always non-negative. If $\rho$ and $\sigma$ are supported on orthogonal subspaces, then $\sqrt{\rho}\sqrt{\sigma} = 0$, which means that $F(\rho,\sigma) = 0$. Conversely, if $F(\rho,\sigma) = 0$, then $\|\sqrt{\rho}\sqrt{\sigma}\|_1 = 0$, which implies (by definition of a norm) that $\sqrt{\rho}\sqrt{\sigma} = 0$, which in turn implies that $\rho$ and $\sigma$ are supported on orthogonal subspaces.
Now, using (2.2.130), there exists a unitary $U$ such that
\[
F(\rho,\sigma) = \left\|\sqrt{\rho}\sqrt{\sigma}\right\|_1^2 = \left|\operatorname{Tr}[U\sqrt{\rho}\sqrt{\sigma}]\right|^2. \tag{6.2.8}
\]
Then, using the Cauchy–Schwarz inequality for the Hilbert–Schmidt inner product (see (2.2.30)), we find that
\begin{align}
F(\rho,\sigma) &= \left|\operatorname{Tr}[U\sqrt{\rho}\sqrt{\sigma}]\right|^2 \tag{6.2.9} \\
&\leq \operatorname{Tr}[U\sqrt{\rho}\sqrt{\rho}U^\dagger]\operatorname{Tr}[\sqrt{\sigma}\sqrt{\sigma}] \tag{6.2.10} \\
&= \operatorname{Tr}[\sqrt{\rho}\sqrt{\rho}]\operatorname{Tr}[\sqrt{\sigma}\sqrt{\sigma}] \tag{6.2.11} \\
&= \operatorname{Tr}[\rho]\operatorname{Tr}[\sigma] \tag{6.2.12} \\
&= 1. \tag{6.2.13}
\end{align}
If $\rho = \sigma$, then $F(\rho,\sigma) = \|\rho\|_1^2 = (\operatorname{Tr}[\rho])^2 = 1$. On the other hand, if $F(\rho,\sigma) = 1$, then the Cauchy–Schwarz inequality is saturated. The Cauchy–Schwarz inequality is saturated if and only if the two operators involved are proportional to each other. This means that $\rho = \alpha\sigma$ for some $\alpha > 0$. But since both $\rho$ and $\sigma$ are states, it must be the case that $\alpha = 1$, which means that $\rho = \sigma$.
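2. Proof of isometric invariance: For every isometry $V$ and every two states $\rho$ and $\sigma$, since the action of an isometry does not change the eigenvalues, we have that $\sqrt{V\rho V^\dagger} = V\sqrt{\rho}V^\dagger$ and $\sqrt{V\sigma V^\dagger} = V\sqrt{\sigma}V^\dagger$. Therefore,
\begin{align}
F(V\rho V^\dagger, V\sigma V^\dagger) &= \left\|\sqrt{V\rho V^\dagger}\sqrt{V\sigma V^\dagger}\right\|_1^2 \tag{6.2.14} \\
&= \left\|V\sqrt{\rho}V^\dagger V\sqrt{\sigma}V^\dagger\right\|_1^2 \tag{6.2.15} \\
&= \left\|V\sqrt{\rho}\sqrt{\sigma}V^\dagger\right\|_1^2 \tag{6.2.16} \\
&= \left\|\sqrt{\rho}\sqrt{\sigma}\right\|_1^2, \tag{6.2.17}
\end{align}
as required, where the third line uses $V^\dagger V = \mathbb{1}$ and the last line is due to the isometric invariance of the Schatten norms, as stated in (2.2.93).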
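3. Proof of multiplicativity: Using the fact that $\sqrt{\rho_1\otimes\rho_2} = \sqrt{\rho_1}\otimes\sqrt{\rho_2}$, and similarly for $\sigma_1\otimes\sigma_2$, and using the multiplicativity of the trace norm with respect to the tensor product (see (2.2.96)), we find that
\begin{align}
F(\rho_1\otimes\rho_2, \sigma_1\otimes\sigma_2) &= \left\|\sqrt{\rho_1\otimes\rho_2}\sqrt{\sigma_1\otimes\sigma_2}\right\|_1^2 \tag{6.2.18} \\
&= \left\|(\sqrt{\rho_1}\otimes\sqrt{\rho_2})(\sqrt{\sigma_1}\otimes\sqrt{\sigma_2})\right\|_1^2 \tag{6.2.19} \\
&= \left\|\sqrt{\rho_1}\sqrt{\sigma_1}\otimes\sqrt{\rho_2}\sqrt{\sigma_2}\right\|_1^2 \tag{6.2.20} \\
&= \left\|\sqrt{\rho_1}\sqrt{\sigma_1}\right\|_1^2 \left\|\sqrt{\rho_2}\sqrt{\sigma_2}\right\|_1^2 \tag{6.2.21} \\
&= F(\rho_1,\sigma_1)F(\rho_2,\sigma_2), \tag{6.2.22}
\end{align}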

as required. ■
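The properties in Theorem 6.7 are easy to check in the commuting case. The following sketch assumes that all states are diagonal in the same basis, in which case $\sqrt{\rho}\sqrt{\sigma}$ is diagonal with non-negative entries and the fidelity reduces to $\left(\sum_i \sqrt{p_i q_i}\right)^2$; it does not exercise the general non-commuting case, and the function names are illustrative.

```python
import math

def fidelity_diag(p, q):
    """Fidelity of two states that are diagonal in the same basis."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q)) ** 2

def kron_diag(p1, p2):
    """Diagonal of the tensor product of two diagonal states."""
    return [a * b for a in p1 for b in p2]

p1, q1 = [0.7, 0.3], [0.4, 0.6]
p2, q2 = [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]

# Multiplicativity (6.2.7)
lhs = fidelity_diag(kron_diag(p1, p2), kron_diag(q1, q2))
rhs = fidelity_diag(p1, q1) * fidelity_diag(p2, q2)
print(lhs, rhs)

# Extreme values: orthogonal supports give 0, equal states give 1
print(fidelity_diag([1.0, 0.0], [0.0, 1.0]), fidelity_diag(p1, p1))
```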

Theorem 6.8 Uhlmann’s Theorem

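For two quantum states $\rho_A$ and $\sigma_A$, let $|\psi^\rho\rangle_{RA} := (\mathbb{1}_R\otimes\sqrt{\rho_A})|\Gamma\rangle_{RA}$ and $|\psi^\sigma\rangle_{RA} := (\mathbb{1}_R\otimes\sqrt{\sigma_A})|\Gamma\rangle_{RA}$ be purifications of $\rho$ and $\sigma$, respectively, with the dimension of $R$ equal to the dimension of $A$. Then,
\[
F(\rho,\sigma) = \max_{U_R} \left|\langle\psi^\rho|_{RA}(U_R\otimes\mathbb{1}_A)|\psi^\sigma\rangle_{RA}\right|^2, \tag{6.2.23}
\]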

where the optimization is over unitaries on 𝑅.

Remark: Since all purifications are related to each other by isometries on the purifying system
(which is the system 𝑅 as in the statement of the theorem), Uhlmann’s theorem tells us that the
fidelity between two quantum states is equal to the maximum overlap between their purifications.
Furthermore, it is straightforward to show that it suffices to take the dimension of 𝑅 the
same as the dimension of 𝐴, as we have done in the statement of the theorem. In other words,
performing an optimization over the dimension of 𝑅 leads to the same result as in (6.2.23).

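Proof: Using the definitions of $|\psi^\rho\rangle_{RA}$ and $|\psi^\sigma\rangle_{RA}$, we find for every unitary $U_R$ that
\begin{align}
\left|\langle\psi^\rho|_{RA}(U_R\otimes\mathbb{1}_A)|\psi^\sigma\rangle_{RA}\right|^2 &= \left|\langle\Gamma|_{RA}(\mathbb{1}_R\otimes\sqrt{\rho_A})(U_R\otimes\mathbb{1}_A)(\mathbb{1}_R\otimes\sqrt{\sigma_A})|\Gamma\rangle_{RA}\right|^2 \tag{6.2.24} \\
&= \left|\langle\Gamma|_{RA}(U_R\otimes\sqrt{\rho_A}\sqrt{\sigma_A})|\Gamma\rangle_{RA}\right|^2 \tag{6.2.25} \\
&= \left|\langle\Gamma|_{RA}(\mathbb{1}_R\otimes\sqrt{\rho_A}\sqrt{\sigma_A}U_A^{\mathsf{T}})|\Gamma\rangle_{RA}\right|^2, \tag{6.2.26}
\end{align}
where the last line follows from the transpose trick in (2.2.40). Then, using (2.2.41), we find that
\[
\left|\langle\psi^\rho|_{RA}(U_R\otimes\mathbb{1}_A)|\psi^\sigma\rangle_{RA}\right|^2 = \left|\operatorname{Tr}[\sqrt{\rho_A}\sqrt{\sigma_A}U_A^{\mathsf{T}}]\right|^2. \tag{6.2.27}
\]
Since $U$ is arbitrary, and $U^{\mathsf{T}}$ is also a unitary, we use (2.2.130) to obtain
\begin{align}
\max_{U} \left|\langle\psi^\rho|_{RA}(U_R\otimes\mathbb{1}_A)|\psi^\sigma\rangle_{RA}\right|^2 &= \max_{U} \left|\operatorname{Tr}[\sqrt{\rho_A}\sqrt{\sigma_A}U_A^{\mathsf{T}}]\right|^2 \tag{6.2.28} \\
&= \max_{U} \left|\operatorname{Tr}[\sqrt{\rho_A}\sqrt{\sigma_A}U_A]\right|^2 \tag{6.2.29} \\
&= \left\|\sqrt{\rho_A}\sqrt{\sigma_A}\right\|_1^2, \tag{6.2.30}
\end{align}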
as required. ■

Theorem 6.9 Data-Processing Inequality for Fidelity

Let 𝜌 and 𝜎 be states, and let N be a quantum channel. Then,

𝐹 (𝜌, 𝜎) ≤ 𝐹 (N(𝜌), N(𝜎)). (6.2.31)

Proof: Recall that every quantum channel $\mathcal{N}_{A\to B}$ can be written in the Stinespring form as $\mathcal{N}_{A\to B}(\rho_A) = \operatorname{Tr}_E[V\rho_A V^\dagger]$, where $V \equiv V_{A\to BE}$ is some isometric extension of $\mathcal{N}$ and $d_E \geq \operatorname{rank}(\Gamma^{\mathcal{N}}_{AB})$. Since we have shown that the fidelity is invariant under isometric channels, it remains to show that the fidelity is non-decreasing
under the action of the partial trace. To this end, consider bipartite states 𝜌 𝐴𝐵
and 𝜎𝐴𝐵 , and let |𝜓⟩ 𝑅 𝐴𝐵 be an arbitrary purification of 𝜌 𝐴𝐵 and let |𝜙⟩ 𝑅 𝐴𝐵 be an
arbitrary purification of 𝜎𝐴𝐵 , where 𝑑 𝑅 = 𝑑 𝐴 𝑑 𝐵 . Observe that |𝜓⟩ 𝑅 𝐴𝐵 and |𝜙⟩ 𝑅 𝐴𝐵
are also purifications of 𝜌 𝐴 = Tr 𝐵 [𝜌 𝐴𝐵 ] and 𝜎𝐴 = Tr 𝐵 [𝜎𝐴𝐵 ], respectively. Then,
by Uhlmann’s theorem, we have that
\[
F(\rho_A,\sigma_A) = \max_{U_{RB}} \left|\langle\psi|_{RAB}(U_{RB}\otimes\mathbb{1}_A)|\phi\rangle_{RAB}\right|^2. \tag{6.2.32}
\]
By restricting the maximization above to unitaries of the form $U_R\otimes\mathbb{1}_B$, we have that
\[
F(\rho_A,\sigma_A) \geq \left|\langle\psi|_{RAB}(U_R\otimes\mathbb{1}_{AB})|\phi\rangle_{RAB}\right|^2 \tag{6.2.33}
\]
for all unitaries $U_R$. Therefore,
\[
\max_{U_R} \left|\langle\psi|_{RAB}(U_R\otimes\mathbb{1}_{AB})|\phi\rangle_{RAB}\right|^2 \leq \max_{U_{RB}} \left|\langle\psi|_{RAB}(U_{RB}\otimes\mathbb{1}_A)|\phi\rangle_{RAB}\right|^2. \tag{6.2.34}
\]
But, by Uhlmann’s theorem,
\[
\max_{U_R} \left|\langle\psi|_{RAB}(U_R\otimes\mathbb{1}_{AB})|\phi\rangle_{RAB}\right|^2 = F(\rho_{AB},\sigma_{AB}). \tag{6.2.35}
\]
Therefore,
\[
F(\rho_{AB},\sigma_{AB}) \leq F(\rho_A,\sigma_A) = F(\operatorname{Tr}_B[\rho_{AB}], \operatorname{Tr}_B[\sigma_{AB}]). \tag{6.2.36}
\]

The fidelity thus satisfies the data-processing inequality with respect to the partial
trace.
Using the data-processing inequality for the fidelity with respect to the partial
trace, along with its invariance under isometries, we conclude that

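\begin{align}
F(\mathcal{N}(\rho),\mathcal{N}(\sigma)) &= F\!\left(\operatorname{Tr}_E[V\rho V^\dagger], \operatorname{Tr}_E[V\sigma V^\dagger]\right) \tag{6.2.37} \\
&\geq F(V\rho V^\dagger, V\sigma V^\dagger) \tag{6.2.38} \\
&= F(\rho,\sigma), \tag{6.2.39}
\end{align}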

as required. ■
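The data-processing inequality (6.2.31) can be illustrated on qubits. The sketch below relies on the closed-form expression $F(\rho,\sigma) = \operatorname{Tr}[\rho\sigma] + 2\sqrt{\det\rho\,\det\sigma}$, which is valid for qubit states only, and takes a dephasing channel as an assumed example; the function names are illustrative.

```python
import math

def det2(m):
    """Determinant of a 2x2 Hermitian matrix (real part)."""
    return (m[0][0] * m[1][1] - m[0][1] * m[1][0]).real

def qubit_fidelity(rho, sigma):
    """Qubit-only formula: F = Tr[rho sigma] + 2 sqrt(det rho * det sigma)."""
    tr = sum(rho[i][j] * sigma[j][i] for i in range(2) for j in range(2)).real
    return tr + 2 * math.sqrt(max(det2(rho), 0.0) * max(det2(sigma), 0.0))

def dephase(rho, p):
    """Assumed example channel: rho -> (1 - p) rho + p Z rho Z."""
    return [[rho[0][0], (1 - 2 * p) * rho[0][1]],
            [(1 - 2 * p) * rho[1][0], rho[1][1]]]

rho = [[0.8, 0.3], [0.3, 0.2]]
sigma = [[0.5, -0.4j], [0.4j, 0.5]]
before = qubit_fidelity(rho, sigma)
after = qubit_fidelity(dephase(rho, 0.25), dephase(sigma, 0.25))
print(before, after)  # the fidelity should not decrease under the channel
```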

With Uhlmann’s theorem and the data-processing inequality for the fidelity in
hand, we can now establish two more properties of the fidelity.

Theorem 6.10 Concavity of Fidelity

The fidelity is concave in either one of its arguments:
\[
F\!\left(\sum_{x\in\mathcal{X}} p(x)\rho^x, \sigma\right) \geq \sum_{x\in\mathcal{X}} p(x) F(\rho^x,\sigma), \tag{6.2.40}
\]

where X is a finite alphabet, 𝑝 : X → [0, 1] is a probability distribution, and 𝜎

and {𝜌 𝑥 }𝑥∈X are states.

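Proof: By Uhlmann’s theorem, we know that the fidelity is given by the maximum overlap between the purifications of the two states under consideration. Based on this, let $|\psi^\sigma\rangle_{RA}$ be a purification of $\sigma_A$. Then, for $x\in\mathcal{X}$, let $|\phi^x\rangle_{RA}$ be a purification of $\rho_A^x$ such that $F(\rho_A^x,\sigma_A) = |\langle\phi^x|\psi^\sigma\rangle|^2$. Then,
\begin{align}
\sum_{x\in\mathcal{X}} p(x) F(\rho_A^x,\sigma_A) &= \sum_{x\in\mathcal{X}} p(x) \left|\langle\phi^x|\psi^\sigma\rangle\right|^2 \tag{6.2.41} \\
&= \langle\psi^\sigma|_{RA}\left(\sum_{x\in\mathcal{X}} p(x)|\phi^x\rangle\langle\phi^x|_{RA}\right)|\psi^\sigma\rangle_{RA} \tag{6.2.42} \\
&= F\!\left(\sum_{x\in\mathcal{X}} p(x)|\phi^x\rangle\langle\phi^x|_{RA}, |\psi^\sigma\rangle\langle\psi^\sigma|_{RA}\right), \tag{6.2.43}
\end{align}
where the last line follows from the formula in (6.2.2) for the fidelity between a pure state and a mixed state. Then, using the data-processing inequality for the fidelity with respect to the partial trace, we obtain
\begin{align}
\sum_{x\in\mathcal{X}} p(x) F(\rho_A^x,\sigma_A) &= F\!\left(\sum_{x\in\mathcal{X}} p(x)|\phi^x\rangle\langle\phi^x|_{RA}, |\psi^\sigma\rangle\langle\psi^\sigma|_{RA}\right) \tag{6.2.44} \\
&\leq F\!\left(\sum_{x\in\mathcal{X}} p(x)\operatorname{Tr}_R[|\phi^x\rangle\langle\phi^x|_{RA}], \operatorname{Tr}_R[|\psi^\sigma\rangle\langle\psi^\sigma|_{RA}]\right) \tag{6.2.45} \\
&= F\!\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x, \sigma_A\right), \tag{6.2.46}
\end{align}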

which is the required result. ■

A more general result than the concavity result proved above, namely joint
concavity, can be obtained if we consider instead the square root of the fidelity,
which we call the “root fidelity” and denote by
\[
\sqrt{F}(\rho,\sigma) := \sqrt{F(\rho,\sigma)} = \left\|\sqrt{\rho}\sqrt{\sigma}\right\|_1. \tag{6.2.47}
\]

Theorem 6.11 Joint Concavity of Root Fidelity

The root fidelity is jointly concave:
\[
\sqrt{F}\!\left(\sum_{x\in\mathcal{X}} p(x)\rho^x, \sum_{x\in\mathcal{X}} p(x)\sigma^x\right) \geq \sum_{x\in\mathcal{X}} p(x)\sqrt{F}(\rho^x,\sigma^x), \tag{6.2.48}
\]
where $\mathcal{X}$ is a finite alphabet, $p:\mathcal{X}\to[0,1]$ is a probability distribution, and $\{\rho^x\}_{x\in\mathcal{X}}$ and $\{\sigma^x\}_{x\in\mathcal{X}}$ are sets of states.

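Proof: This result follows by defining the classical–quantum states
\begin{align}
\rho_{XA} &:= \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X\otimes\rho_A^x, \tag{6.2.49} \\
\sigma_{XA} &:= \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X\otimes\sigma_A^x, \tag{6.2.50}
\end{align}
and observing that
\begin{align}
\sqrt{F}(\rho_{XA},\sigma_{XA}) &= \left\|\sqrt{\rho_{XA}}\sqrt{\sigma_{XA}}\right\|_1 \tag{6.2.51} \\
&= \left\|\left(\sum_{x\in\mathcal{X}} |x\rangle\langle x|_X\otimes\sqrt{p(x)\rho_A^x}\right)\left(\sum_{x'\in\mathcal{X}} |x'\rangle\langle x'|_X\otimes\sqrt{p(x')\sigma_A^{x'}}\right)\right\|_1 \tag{6.2.52} \\
&= \left\|\sum_{x\in\mathcal{X}} |x\rangle\langle x|_X\otimes p(x)\sqrt{\rho_A^x}\sqrt{\sigma_A^x}\right\|_1 \tag{6.2.53} \\
&= \sum_{x\in\mathcal{X}} \left\|p(x)\sqrt{\rho_A^x}\sqrt{\sigma_A^x}\right\|_1 \tag{6.2.54} \\
&= \sum_{x\in\mathcal{X}} p(x)\left\|\sqrt{\rho_A^x}\sqrt{\sigma_A^x}\right\|_1 \tag{6.2.55} \\
&= \sum_{x\in\mathcal{X}} p(x)\sqrt{F}(\rho_A^x,\sigma_A^x). \tag{6.2.56}
\end{align}
From the above and the data-processing inequality for the fidelity under partial trace (Theorem 6.9), we conclude that
\[
\sum_{x\in\mathcal{X}} p(x)\sqrt{F}(\rho_A^x,\sigma_A^x) = \sqrt{F}(\rho_{XA},\sigma_{XA}) \leq \sqrt{F}(\rho_A,\sigma_A), \tag{6.2.57}
\]
with $\rho_A = \sum_{x\in\mathcal{X}} p(x)\rho_A^x$ and $\sigma_A = \sum_{x\in\mathcal{X}} p(x)\sigma_A^x$ the reduced states of $\rho_{XA}$ and $\sigma_{XA}$,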

which is equivalent to (6.2.48). ■
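Joint concavity as in (6.2.48) can be checked on a two-element ensemble of qubit states. This is a minimal sketch that again assumes the qubit-only formula $F(\rho,\sigma) = \operatorname{Tr}[\rho\sigma] + 2\sqrt{\det\rho\,\det\sigma}$; the ensemble and the function names are chosen purely for illustration.

```python
import math

def det2(m):
    """Determinant of a 2x2 Hermitian matrix (real part)."""
    return (m[0][0] * m[1][1] - m[0][1] * m[1][0]).real

def qubit_root_fidelity(rho, sigma):
    """Root fidelity of qubit states via the qubit-only closed form."""
    tr = sum(rho[i][j] * sigma[j][i] for i in range(2) for j in range(2)).real
    f = tr + 2 * math.sqrt(max(det2(rho), 0.0) * max(det2(sigma), 0.0))
    return math.sqrt(max(f, 0.0))

def mix(weights, states):
    """Convex combination of 2x2 density matrices."""
    return [[sum(w * s[i][j] for w, s in zip(weights, states)) for j in range(2)]
            for i in range(2)]

p = [0.3, 0.7]
rhos = [[[1.0, 0.0], [0.0, 0.0]], [[0.5, 0.5], [0.5, 0.5]]]
sigmas = [[[0.5, 0.5], [0.5, 0.5]], [[0.2, 0.0], [0.0, 0.8]]]

lhs = qubit_root_fidelity(mix(p, rhos), mix(p, sigmas))
rhs = sum(w * qubit_root_fidelity(r, s) for w, r, s in zip(p, rhos, sigmas))
print(lhs, rhs)  # joint concavity predicts lhs >= rhs
```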

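The steps in (6.2.51)–(6.2.56), repeated with a second distribution $q$ in place of $p$ in the definition of $\sigma_{XA}$, demonstrate that the root fidelity satisfies the direct-sum property: for every finite alphabet $\mathcal{X}$, probability distributions $p, q:\mathcal{X}\to[0,1]$, and sets $\{\rho_A^x\}_{x\in\mathcal{X}}$, $\{\sigma_A^x\}_{x\in\mathcal{X}}$ of states, we have
\[
\sqrt{F}\!\left(\sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X\otimes\rho_A^x, \sum_{x\in\mathcal{X}} q(x)|x\rangle\langle x|_X\otimes\sigma_A^x\right) = \sum_{x\in\mathcal{X}} \sqrt{p(x)q(x)}\,\sqrt{F}(\rho_A^x,\sigma_A^x). \tag{6.2.58}
\]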

Just as the trace distance can be achieved with a measurement, so it holds that
the fidelity can also be achieved with a measurement, as we now show.

Theorem 6.12 Fidelity via Measurement

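For two states $\rho$ and $\sigma$, the following equality holds:
\[
F(\rho,\sigma) = \min_{\{\Lambda_x\}_{x\in\mathcal{X}}} \left(\sum_{x\in\mathcal{X}} \sqrt{\operatorname{Tr}[\Lambda_x\rho]}\sqrt{\operatorname{Tr}[\Lambda_x\sigma]}\right)^2, \tag{6.2.59}
\]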

where the optimization is with respect to all POVMs {Λ𝑥 }𝑥∈X defined with
respect to a finite alphabet X.

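Proof: Let $\mathcal{M}$ be a measurement channel defined as
\[
\mathcal{M}(\rho) = \sum_{x\in\mathcal{X}} \operatorname{Tr}[\Lambda_x\rho]\,|x\rangle\langle x|, \tag{6.2.60}
\]
where $\{\Lambda_x\}_{x\in\mathcal{X}}$ is a POVM and $\mathcal{X}$ is a finite alphabet. Then, by the data-processing inequality for the fidelity with respect to the channel $\mathcal{M}$, and since the action of $\mathcal{M}$ leads to a state that is diagonal in the orthonormal basis $\{|x\rangle\}_{x\in\mathcal{X}}$, we obtain
\begin{align}
F(\rho,\sigma) &\leq F(\mathcal{M}(\rho),\mathcal{M}(\sigma)) \tag{6.2.61} \\
&= \left\|\sqrt{\mathcal{M}(\rho)}\sqrt{\mathcal{M}(\sigma)}\right\|_1^2 \tag{6.2.62} \\
&= \left\|\sqrt{\sum_{x\in\mathcal{X}} \operatorname{Tr}[\Lambda_x\rho]|x\rangle\langle x|}\,\sqrt{\sum_{x\in\mathcal{X}} \operatorname{Tr}[\Lambda_x\sigma]|x\rangle\langle x|}\right\|_1^2 \tag{6.2.63} \\
&= \left\|\left(\sum_{x\in\mathcal{X}} \sqrt{\operatorname{Tr}[\Lambda_x\rho]}\,|x\rangle\langle x|\right)\left(\sum_{x\in\mathcal{X}} \sqrt{\operatorname{Tr}[\Lambda_x\sigma]}\,|x\rangle\langle x|\right)\right\|_1^2 \tag{6.2.64} \\
&= \left\|\sum_{x\in\mathcal{X}} \sqrt{\operatorname{Tr}[\Lambda_x\rho]}\sqrt{\operatorname{Tr}[\Lambda_x\sigma]}\,|x\rangle\langle x|\right\|_1^2 \tag{6.2.65} \\
&= \left(\sum_{x\in\mathcal{X}} \sqrt{\operatorname{Tr}[\Lambda_x\rho]}\sqrt{\operatorname{Tr}[\Lambda_x\sigma]}\right)^2. \tag{6.2.66}
\end{align}
Since the POVM $\{\Lambda_x\}_{x\in\mathcal{X}}$ is arbitrary, we find that
\[
F(\rho,\sigma) \leq \min_{\{\Lambda_x\}_{x\in\mathcal{X}}} \left(\sum_{x\in\mathcal{X}} \sqrt{\operatorname{Tr}[\Lambda_x\rho]}\sqrt{\operatorname{Tr}[\Lambda_x\sigma]}\right)^2. \tag{6.2.67}
\]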

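We now prove the reverse inequality by explicitly constructing a POVM that achieves the fidelity. First, observe that we can write the fidelity $F(\rho,\sigma)$ as
\[
F(\rho,\sigma) = \left\|\sqrt{\rho}\sqrt{\sigma}\right\|_1^2 = \left(\operatorname{Tr}\!\left[\sqrt{\sqrt{\sigma}\rho\sqrt{\sigma}}\right]\right)^2 = \left(\operatorname{Tr}[A\sigma]\right)^2, \tag{6.2.68}
\]
where
\[
A := \sigma^{-\frac{1}{2}}\left(\sigma^{\frac{1}{2}}\rho\,\sigma^{\frac{1}{2}}\right)^{\frac{1}{2}}\sigma^{-\frac{1}{2}}. \tag{6.2.69}
\]
If $\sigma$ is not invertible, then the inverse is understood to be on the support of $\sigma$, in which case $A$ is supported on the support of $\sigma$. So the fidelity is simply equal to the squared expectation value of the Hermitian operator $A$ with respect to the state $\sigma$. Observe that we can also write
\begin{align}
F(\rho,\sigma) &= \left(\operatorname{Tr}\!\left[\sqrt{\sqrt{\sigma}\rho\sqrt{\sigma}}\right]\right)^2 \tag{6.2.70} \\
&= \left(\operatorname{Tr}\!\left[\sqrt{\sqrt{\sigma}\,\Pi_\sigma\rho\,\Pi_\sigma\sqrt{\sigma}}\right]\right)^2 \tag{6.2.71} \\
&= F(\Pi_\sigma\rho\,\Pi_\sigma, \sigma), \tag{6.2.72}
\end{align}
where $\Pi_\sigma$ is the projection onto the support of $\sigma$. This holds because $\sqrt{\sigma} = \Pi_\sigma\sqrt{\sigma} = \sqrt{\sigma}\,\Pi_\sigma$.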
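Now, let us consider a measurement in the eigenbasis of $A$. Let $\{|\psi_i\rangle\}_{i=0}^{r-1}$ be the eigenvectors of $A$, where $r = \operatorname{rank}(A)$. If $\sigma$ is not invertible, then we can always add a set $\{|\psi_i\rangle\}_{i=r}^{d-1}$ of orthonormal pure states orthogonal to the eigenbasis of $A$ in order to obtain a POVM on the full $d$-dimensional space. Therefore, suppose that
\[
A = \sum_{i=0}^{d-1} \lambda_i|\psi_i\rangle\langle\psi_i|, \tag{6.2.73}
\]
where we have included the vectors $\{|\psi_i\rangle\}_{i=r}^{d-1}$ that have corresponding eigenvalues equal to zero. Then,
\begin{align}
\operatorname{Tr}[A\sigma] &= \operatorname{Tr}\!\left[\sum_{i=0}^{d-1} \lambda_i|\psi_i\rangle\langle\psi_i|\sigma\right] \tag{6.2.74} \\
&= \sum_{i=0}^{d-1} \lambda_i\langle\psi_i|\sigma|\psi_i\rangle \tag{6.2.75} \\
&= \sum_{i=0}^{d-1} \sqrt{\langle\psi_i|\lambda_i\sigma\lambda_i|\psi_i\rangle}\sqrt{\langle\psi_i|\sigma|\psi_i\rangle} \tag{6.2.76} \\
&= \sum_{i=0}^{r-1} \sqrt{\langle\psi_i|A\sigma A|\psi_i\rangle}\sqrt{\langle\psi_i|\sigma|\psi_i\rangle}, \tag{6.2.77}
\end{align}
where the third line uses $\lambda_i \geq 0$ (because $A$ is positive semi-definite), and the last line follows because $A|\psi_i\rangle = \lambda_i|\psi_i\rangle$ for all $0\leq i\leq r-1$, and we have used the fact that $\lambda_i = 0$ for all $r\leq i\leq d-1$. Now, it is straightforward to show that $A\sigma A = \Pi_\sigma\rho\,\Pi_\sigma$. Therefore,
\begin{align}
F(\rho,\sigma) = \left(\operatorname{Tr}[A\sigma]\right)^2 &= \left(\sum_{i=0}^{r-1} \sqrt{\langle\psi_i|\Pi_\sigma\rho\,\Pi_\sigma|\psi_i\rangle}\sqrt{\langle\psi_i|\sigma|\psi_i\rangle}\right)^2 \tag{6.2.78} \\
&= \left(\sum_{i=0}^{r-1} \sqrt{\langle\psi_i|\rho|\psi_i\rangle}\sqrt{\langle\psi_i|\sigma|\psi_i\rangle}\right)^2, \tag{6.2.79}
\end{align}
where the last line follows because $A$ is defined on the support of $\sigma$, which means that $\Pi_\sigma|\psi_i\rangle = |\psi_i\rangle$ for all $0\leq i\leq r-1$. We thus have
\begin{align}
\min_{\{\Lambda_x\}_{x\in\mathcal{X}}} \left(\sum_{x\in\mathcal{X}} \sqrt{\operatorname{Tr}[\Lambda_x\rho]}\sqrt{\operatorname{Tr}[\Lambda_x\sigma]}\right)^2 &\leq \left(\sum_{i=0}^{r-1} \sqrt{\langle\psi_i|\rho|\psi_i\rangle}\sqrt{\langle\psi_i|\sigma|\psi_i\rangle}\right)^2 \tag{6.2.80} \\
&= F(\rho,\sigma), \tag{6.2.81}
\end{align}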

which is precisely the reverse inequality, as desired. ■
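Theorem 6.12 can be explored numerically by scanning over measurements. The sketch below assumes real, full-rank qubit states, for which the operator $A$ is real symmetric and so an optimal projective measurement can be taken in a real orthonormal basis parametrized by one angle; it uses the qubit-only formula $F = \operatorname{Tr}[\rho\sigma] + 2\sqrt{\det\rho\,\det\sigma}$ as the reference value, and a coarse grid search stands in for the explicit eigenbasis construction of the proof.

```python
import math

def qubit_fidelity(rho, sigma):
    """Qubit-only closed form for real symmetric 2x2 density matrices."""
    tr = sum(rho[i][j] * sigma[j][i] for i in range(2) for j in range(2))
    det_r = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    det_s = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    return tr + 2 * math.sqrt(max(det_r, 0.0) * max(det_s, 0.0))

def measured_fidelity(rho, sigma, theta):
    """Right-hand side of (6.2.59) for the projective measurement in the
    basis (cos t, sin t), (-sin t, cos t)."""
    c, s = math.cos(theta), math.sin(theta)
    total = 0.0
    for v in ((c, s), (-s, c)):
        p = sum(v[i] * rho[i][j] * v[j] for i in range(2) for j in range(2))
        q = sum(v[i] * sigma[i][j] * v[j] for i in range(2) for j in range(2))
        total += math.sqrt(max(p, 0.0) * max(q, 0.0))
    return total ** 2

rho = [[0.8, 0.3], [0.3, 0.2]]
sigma = [[0.4, -0.2], [-0.2, 0.6]]
f = qubit_fidelity(rho, sigma)
steps = 20000
best = min(measured_fidelity(rho, sigma, k * math.pi / steps) for k in range(steps))
print(f, best)  # every measurement gives at least f; the best one nearly attains it
```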

By employing Theorem 6.12, we obtain the following data-processing inequality for the fidelity, which strengthens Theorem 6.9 by extending it from quantum channels to all positive, trace-preserving maps:


Proposition 6.13 Improved Data Processing for Fidelity

Let 𝜌 and 𝜎 be quantum states, and let N be a positive, trace-preserving map.
Then the following inequality holds:

𝐹 (𝜌, 𝜎) ≤ 𝐹 (N(𝜌), N(𝜎)). (6.2.82)

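Proof: The reasoning here closely follows the proof of Theorem 6.3. Let $\{\Lambda'_x\}_{x\in\mathcal{X}}$ be a POVM. Then consider that
\begin{align}
\sum_{x\in\mathcal{X}} \sqrt{\operatorname{Tr}[\Lambda'_x\mathcal{N}(\rho)]\operatorname{Tr}[\Lambda'_x\mathcal{N}(\sigma)]} &= \sum_{x\in\mathcal{X}} \sqrt{\operatorname{Tr}[\mathcal{N}^\dagger(\Lambda'_x)\rho]\operatorname{Tr}[\mathcal{N}^\dagger(\Lambda'_x)\sigma]} \tag{6.2.83} \\
&\geq \min_{\{\Lambda_x\}_{x\in\mathcal{X}}} \sum_{x\in\mathcal{X}} \sqrt{\operatorname{Tr}[\Lambda_x\rho]\operatorname{Tr}[\Lambda_x\sigma]} \tag{6.2.84} \\
&= \sqrt{F}(\rho,\sigma). \tag{6.2.85}
\end{align}
The inequality follows because $\{\mathcal{N}^\dagger(\Lambda'_x)\}_{x\in\mathcal{X}}$ is a POVM, since $\{\Lambda'_x\}_{x\in\mathcal{X}}$ is a POVM and $\mathcal{N}$ is a positive, trace-preserving map, so that $\mathcal{N}^\dagger(\Lambda'_x)\geq 0$ for all $x\in\mathcal{X}$ and $\sum_{x\in\mathcal{X}} \mathcal{N}^\dagger(\Lambda'_x) = \mathcal{N}^\dagger\!\left(\sum_{x\in\mathcal{X}} \Lambda'_x\right) = \mathcal{N}^\dagger(\mathbb{1}) = \mathbb{1}$. The last equality follows from Theorem 6.12. Since the inequality holds for all POVMs $\{\Lambda'_x\}_{x\in\mathcal{X}}$, we conclude that
\begin{align}
\sqrt{F}(\mathcal{N}(\rho),\mathcal{N}(\sigma)) &= \min_{\{\Lambda'_x\}_{x\in\mathcal{X}}} \sum_{x\in\mathcal{X}} \sqrt{\operatorname{Tr}[\Lambda'_x\mathcal{N}(\rho)]\operatorname{Tr}[\Lambda'_x\mathcal{N}(\sigma)]} \tag{6.2.86} \\
&\geq \sqrt{F}(\rho,\sigma), \tag{6.2.87}
\end{align}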

where we have again employed Theorem 6.12 for the equality. ■

We now establish a useful relation between trace distance and fidelity.

Theorem 6.14 Relation Between Trace Distance and Fidelity

For two states $\rho$ and $\sigma$, the following chain of inequalities relates their trace distance with the fidelity between them:
\[
1 - \sqrt{F(\rho,\sigma)} \leq \frac{1}{2}\|\rho-\sigma\|_1 \leq \sqrt{1 - F(\rho,\sigma)}. \tag{6.2.88}
\]


Proof: We first prove the upper bound. To do so, recall the formula in (6.1.1)
for the trace distance between two pure states. If we let |𝜓 𝜌 ⟩ 𝑅 𝐴 and |𝜓 𝜎 ⟩ 𝑅 𝐴 be
purifications of 𝜌 𝐴 and 𝜎𝐴 , respectively, such that 𝐹 (𝜌 𝐴 , 𝜎𝐴 ) = |⟨𝜓 𝜌 |𝜓 𝜎 ⟩| 2 , and
if we use the data-processing inequality for the trace distance with respect to the
partial trace channel Tr 𝑅 , then we obtain
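\begin{align}
\frac{1}{2}\|\rho_A-\sigma_A\|_1 &= \frac{1}{2}\left\|\operatorname{Tr}_R\!\left[|\psi^\rho\rangle\langle\psi^\rho|_{RA} - |\psi^\sigma\rangle\langle\psi^\sigma|_{RA}\right]\right\|_1 \tag{6.2.89} \\
&\leq \frac{1}{2}\left\||\psi^\rho\rangle\langle\psi^\rho|_{RA} - |\psi^\sigma\rangle\langle\psi^\sigma|_{RA}\right\|_1 \tag{6.2.90} \\
&= \sqrt{1 - |\langle\psi^\rho|\psi^\sigma\rangle|^2} \tag{6.2.91} \\
&= \sqrt{1 - F(\rho_A,\sigma_A)}, \tag{6.2.92}
\end{align}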
as required.
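For the lower bound, we use the results of Theorems 6.12 and 6.4. Theorem 6.12 tells us that there exists a POVM $\{\Lambda_x\}_{x\in\mathcal{X}}$ such that
\begin{align}
F(\rho,\sigma) &= \left(\sum_{x\in\mathcal{X}} \sqrt{\operatorname{Tr}[\Lambda_x\rho]}\sqrt{\operatorname{Tr}[\Lambda_x\sigma]}\right)^2 \tag{6.2.93} \\
&\equiv \left(\sum_{x\in\mathcal{X}} \sqrt{p(x)q(x)}\right)^2, \tag{6.2.94}
\end{align}
where we have let $p(x) := \operatorname{Tr}[\Lambda_x\rho]$ and $q(x) := \operatorname{Tr}[\Lambda_x\sigma]$. Using this, observe that
\begin{align}
\sum_{x\in\mathcal{X}} \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 &= \sum_{x\in\mathcal{X}} \left(p(x) - 2\sqrt{p(x)q(x)} + q(x)\right) \tag{6.2.95} \\
&= 2 - 2\sum_{x\in\mathcal{X}} \sqrt{p(x)q(x)} \tag{6.2.96} \\
&= 2 - 2\sqrt{F(\rho,\sigma)}. \tag{6.2.97}
\end{align}
Now, Theorem 6.4 tells us that
\[
\|\rho-\sigma\|_1 = \max_{\{\Omega_y\}_y} \sum_{y} |r(y) - s(y)|, \tag{6.2.98}
\]
where $r(y) := \operatorname{Tr}[\Omega_y\rho]$ and $s(y) := \operatorname{Tr}[\Omega_y\sigma]$. In particular, for the POVM $\{\Lambda_x\}_{x\in\mathcal{X}}$ that achieves the fidelity, we have
\[
\sum_{x\in\mathcal{X}} |p(x) - q(x)| \leq \|\rho-\sigma\|_1. \tag{6.2.99}
\]
Furthermore, since $|\sqrt{p(x)} - \sqrt{q(x)}| \leq \sqrt{p(x)} + \sqrt{q(x)}$, every term satisfies $(\sqrt{p(x)} - \sqrt{q(x)})^2 \leq |\sqrt{p(x)} - \sqrt{q(x)}|\,(\sqrt{p(x)} + \sqrt{q(x)}) = |p(x) - q(x)|$. So we have
\[
2 - 2\sqrt{F(\rho,\sigma)} = \sum_{x\in\mathcal{X}} \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 \leq \sum_{x\in\mathcal{X}} |p(x) - q(x)| \leq \|\rho-\sigma\|_1, \tag{6.2.103}
\]
so that $1 - \sqrt{F(\rho,\sigma)} \leq \frac{1}{2}\|\rho-\sigma\|_1$,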

which is the required lower bound. ■
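The chain of inequalities (6.2.88) can be tested on randomly sampled qubit states. The sketch below works with Bloch vectors $r$ and $s$ and assumes the standard qubit expressions $\frac{1}{2}\|\rho-\sigma\|_1 = \frac{1}{2}|r-s|$ and $F(\rho,\sigma) = \frac{1}{2}\left(1 + r\cdot s + \sqrt{(1-|r|^2)(1-|s|^2)}\right)$; the sampling scheme and names are illustrative, and a numerical check of this kind is of course no substitute for the proof.

```python
import math
import random

def random_bloch_vector(rng):
    """Rejection-sample a point of the unit ball (an arbitrary sampling choice)."""
    while True:
        v = [rng.uniform(-1, 1) for _ in range(3)]
        if sum(c * c for c in v) <= 1:
            return v

def trace_distance(r, s):
    """Normalized trace distance of two qubit states from their Bloch vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s))) / 2

def fidelity(r, s):
    """Qubit fidelity from Bloch vectors, clipped to [0, 1] against rounding."""
    dot = sum(a * b for a, b in zip(r, s))
    nr = max(0.0, 1 - sum(a * a for a in r))
    ns = max(0.0, 1 - sum(b * b for b in s))
    return min(1.0, max(0.0, (1 + dot + math.sqrt(nr * ns)) / 2))

rng = random.Random(0)
violations = 0
for _ in range(1000):
    r, s = random_bloch_vector(rng), random_bloch_vector(rng)
    f, t = fidelity(r, s), trace_distance(r, s)
    lower_ok = 1 - math.sqrt(f) <= t + 1e-9
    upper_ok = t <= math.sqrt(1 - f) + 1e-9
    if not (lower_ok and upper_ok):
        violations += 1
print(violations)
```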

Lemma 6.15 Gentle Measurement

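Let $\rho$ be a density operator and $\Lambda$ a measurement operator, satisfying $0\leq\Lambda\leq I$ and $\operatorname{Tr}[\Lambda\rho]\geq 1-\varepsilon$, for $\varepsilon\in[0,1]$. Then the post-measurement state
\[
\rho' := \frac{\sqrt{\Lambda}\rho\sqrt{\Lambda}}{\operatorname{Tr}[\Lambda\rho]} \tag{6.2.104}
\]
satisfies
\begin{align}
F(\rho,\rho') &\geq 1-\varepsilon, \tag{6.2.105} \\
\frac{1}{2}\|\rho-\rho'\|_1 &\leq \sqrt{\varepsilon}. \tag{6.2.106}
\end{align}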
2

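Proof: Suppose first that $\rho$ is a pure state $|\psi\rangle\langle\psi|$. The post-measurement state is then
\[
\frac{\sqrt{\Lambda}|\psi\rangle\langle\psi|\sqrt{\Lambda}}{\langle\psi|\Lambda|\psi\rangle}. \tag{6.2.107}
\]
The fidelity between the original state $|\psi\rangle$ and the post-measurement state above is as follows:
\begin{align}
\langle\psi|\left(\frac{\sqrt{\Lambda}|\psi\rangle\langle\psi|\sqrt{\Lambda}}{\langle\psi|\Lambda|\psi\rangle}\right)|\psi\rangle = \frac{|\langle\psi|\sqrt{\Lambda}|\psi\rangle|^2}{\langle\psi|\Lambda|\psi\rangle} &\geq \frac{|\langle\psi|\Lambda|\psi\rangle|^2}{\langle\psi|\Lambda|\psi\rangle} \tag{6.2.108} \\
&= \langle\psi|\Lambda|\psi\rangle \geq 1-\varepsilon. \tag{6.2.109}
\end{align}
The first inequality follows because $\sqrt{\Lambda}\geq\Lambda$ when $\Lambda\leq I$. The second inequality follows from the hypothesis of the lemma. Now let us consider when we have mixed states $\rho_A$ and $\rho'_A$. Suppose $|\psi\rangle_{RA}$ and $|\psi'\rangle_{RA}$ are respective purifications of $\rho_A$ and $\rho'_A$, where
\[
|\psi'\rangle_{RA} \equiv \frac{(I_R\otimes\sqrt{\Lambda_A})|\psi\rangle_{RA}}{\sqrt{\langle\psi|I_R\otimes\Lambda_A|\psi\rangle_{RA}}}. \tag{6.2.110}
\]
Then we can apply the data-processing inequality for fidelity (Proposition 6.13) and the result above for pure states to conclude that
\[
F(\rho_A,\rho'_A) \geq F(\psi_{RA},\psi'_{RA}) \geq 1-\varepsilon. \tag{6.2.111}
\]
We obtain the bound on the normalized trace distance $\frac{1}{2}\|\rho_A-\rho'_A\|_1$ by exploiting the upper bound in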
Theorem 6.14. ■
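A worked instance of the pure-state step (6.2.108)–(6.2.109) follows. The sketch assumes a measurement operator that is diagonal in the computational basis, so that $\sqrt{\Lambda}$ is obtained entrywise; the state, the operator, and the function name are hypothetical choices for illustration.

```python
import math

def gentle_measurement_fidelity(psi, lam_diag):
    """Fidelity between a pure state and its post-measurement state for a
    diagonal measurement operator Lambda = diag(lam_diag), together with the
    acceptance probability Tr[Lambda rho]."""
    probs = [abs(a) ** 2 for a in psi]
    accept = sum(l * p for l, p in zip(lam_diag, probs))               # expectation of Lambda
    overlap = sum(math.sqrt(l) * p for l, p in zip(lam_diag, probs))   # expectation of sqrt(Lambda)
    return overlap ** 2 / accept, accept

psi = [math.sqrt(0.9), math.sqrt(0.1)]
fid, accept = gentle_measurement_fidelity(psi, [1.0, 0.2])
eps = 1 - accept
# Both states are pure here, so the normalized trace distance is sqrt(1 - F).
dist = math.sqrt(1 - fid)
print(fid, 1 - eps, dist, math.sqrt(eps))
```

In this example the acceptance probability is 0.92, so $\varepsilon = 0.08$, and both bounds of the lemma hold with room to spare.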

6.2.1 Sine Distance

Unlike the trace distance, the fidelity is not a distance measure in the mathematical
sense because it does not satisfy the triangle inequality. The following distance
measure based on the fidelity, however, does satisfy the triangle inequality, along
with the other properties that define a distance measure.

Definition 6.16 Sine Distance

For two states $\rho$ and $\sigma$, we define the sine distance between $\rho$ and $\sigma$ as
\[
P(\rho,\sigma) := \sqrt{1 - F(\rho,\sigma)}. \tag{6.2.112}
\]

The measure $P(\rho,\sigma)$ is known as the sine distance due to the fact that $F(\rho,\sigma)$ has the interpretation as the largest value of the squared cosine of the angle between two arbitrary purifications of $\rho$ and $\sigma$ (see Theorem 6.8), which means that $\sqrt{1 - F(\rho,\sigma)}$ has the interpretation as the sine of the same angle. Related to this interpretation, the measure $P(\rho,\sigma)$ is equal to the minimum trace distance between purifications of $\rho$ and $\sigma$:
\[
\inf_{|\psi^\rho\rangle_{RA},|\psi^\sigma\rangle_{RA}} \frac{1}{2}\left\||\psi^\rho\rangle\langle\psi^\rho|_{RA} - |\psi^\sigma\rangle\langle\psi^\sigma|_{RA}\right\|_1 = \inf_{|\psi^\rho\rangle_{RA},|\psi^\sigma\rangle_{RA}} \sqrt{1 - |\langle\psi^\rho|\psi^\sigma\rangle_{RA}|^2} = P(\rho,\sigma), \tag{6.2.113}
\]

where the optimization is over all purifications |𝜓 𝜌 ⟩ 𝑅 𝐴 and |𝜓 𝜎 ⟩ 𝑅 𝐴 of 𝜌 and 𝜎,

respectively. This follows by applying (6.1.1), as well as Uhlmann’s theorem
(Theorem 6.8).
Since the fidelity satisfies the data-processing inequality with respect to positive,
trace-preserving maps (Proposition 6.13), so does the sine distance: for two states
𝜌 and 𝜎 and a positive, trace-preserving map N, we have that

𝑃(𝜌, 𝜎) ≥ 𝑃(N(𝜌), N(𝜎)). (6.2.114)

Lemma 6.17 Triangle Inequality for Sine Distance

Let 𝜌, 𝜎, and 𝜔 be quantum states. Then the triangle inequality holds for the
sine distance:
𝑃(𝜌, 𝜎) ≤ 𝑃(𝜌, 𝜔) + 𝑃(𝜔, 𝜎). (6.2.115)

Proof: Define the canonical purifications as
\begin{align}
|\psi^\rho\rangle_{RA} &= (\mathbb{1}_R\otimes\sqrt{\rho_A})|\Gamma\rangle_{RA}, \tag{6.2.116} \\
|\psi^\sigma\rangle_{RA} &= (\mathbb{1}_R\otimes\sqrt{\sigma_A})|\Gamma\rangle_{RA}, \tag{6.2.117} \\
|\psi^\omega\rangle_{RA} &= (\mathbb{1}_R\otimes\sqrt{\omega_A})|\Gamma\rangle_{RA}, \tag{6.2.118}
\end{align}
where $|\Gamma\rangle_{RA}$ is the maximally entangled vector from (2.2.34). Recalling (6.1.1), for pure states $|\phi\rangle$ and $|\varphi\rangle$, we have that
\[
\frac{1}{2}\left\||\phi\rangle\langle\phi| - |\varphi\rangle\langle\varphi|\right\|_1 = \sqrt{1 - F(\phi,\varphi)} = \sqrt{1 - |\langle\phi|\varphi\rangle|^2}. \tag{6.2.119}
\]
Let $U_R$ and $V_R$ be arbitrary unitaries acting on the reference system $R$. From the fact that trace distance obeys the triangle inequality and the equality given above, we find that
\begin{align}
\sqrt{1 - |\langle\psi^\sigma|_{RA}(W_R\otimes\mathbb{1}_A)|\psi^\rho\rangle_{RA}|^2} &\leq \sqrt{1 - |\langle\psi^\sigma|_{RA}(U_R^\dagger\otimes\mathbb{1}_A)|\psi^\omega\rangle_{RA}|^2} \notag \\
&\quad + \sqrt{1 - |\langle\psi^\omega|_{RA}(V_R\otimes\mathbb{1}_A)|\psi^\rho\rangle_{RA}|^2}, \tag{6.2.120}
\end{align}
where $W_R := U_R^\dagger V_R$. By minimizing the left-hand side with respect to all unitaries $W_R$, and applying Uhlmann’s theorem, we find that
\begin{align}
\sqrt{1 - F(\sigma_A,\rho_A)} &\leq \sqrt{1 - |\langle\psi^\sigma|_{RA}(U_R^\dagger\otimes\mathbb{1}_A)|\psi^\omega\rangle_{RA}|^2} \notag \\
&\quad + \sqrt{1 - |\langle\psi^\omega|_{RA}(V_R\otimes\mathbb{1}_A)|\psi^\rho\rangle_{RA}|^2}. \tag{6.2.121}
\end{align}
Since the inequality holds for arbitrary unitaries $U$ and $V$, it holds for the minimum of each term on the right, and so this, combined with Uhlmann’s theorem (Theorem 6.8), implies the desired result:
\[
\sqrt{1 - F(\sigma_A,\rho_A)} \leq \sqrt{1 - F(\sigma_A,\omega_A)} + \sqrt{1 - F(\omega_A,\rho_A)}, \tag{6.2.122}
\]

concluding the proof. ■
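The triangle inequality (6.2.115) can be spot-checked on random triples of qubit states. As a sketch under stated assumptions, the code below represents qubit states by Bloch vectors and uses the qubit-only expression $F(\rho,\sigma) = \frac{1}{2}\left(1 + r\cdot s + \sqrt{(1-|r|^2)(1-|s|^2)}\right)$; the sampling scheme and function names are illustrative.

```python
import math
import random

def random_bloch_vector(rng):
    """Rejection-sample a point of the unit ball (an arbitrary sampling choice)."""
    while True:
        v = [rng.uniform(-1, 1) for _ in range(3)]
        if sum(c * c for c in v) <= 1:
            return v

def sine_distance(r, s):
    """P = sqrt(1 - F) for qubit states given by Bloch vectors r and s."""
    dot = sum(a * b for a, b in zip(r, s))
    nr = max(0.0, 1 - sum(a * a for a in r))
    ns = max(0.0, 1 - sum(b * b for b in s))
    f = min(1.0, max(0.0, (1 + dot + math.sqrt(nr * ns)) / 2))
    return math.sqrt(1 - f)

rng = random.Random(1)
worst_gap = float("inf")
for _ in range(1000):
    r, s, w = (random_bloch_vector(rng) for _ in range(3))
    gap = sine_distance(r, w) + sine_distance(w, s) - sine_distance(r, s)
    worst_gap = min(worst_gap, gap)
print(worst_gap)  # should be non-negative up to rounding
```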

6.3 Diamond Distance

Just as there are measures of distinguishability for quantum states, it is important
to develop measures of distinguishability for quantum channels, in order to assess
the performance of quantum information-processing protocols that attempt to
simulate an ideal process. The measures that we introduce in this section are again
motivated by operational concerns, stemming from the ability of an experimenter to
distinguish one quantum channel from another when given access to a single use of
the channel. In what follows, our discussion mirrors and generalizes the discussion
in Section 6.1 that motivated trace distance as a measure of distinguishability for
quantum states.
How should we measure the distance between two quantum channels $\mathcal{N}_{A\to B}$ and $\mathcal{M}_{A\to B}$? Relatedly, how should we assess the performance of a quantum information-processing protocol in which the ideal channel to be simulated is $\mathcal{N}_{A\to B}$ but the channel realized in practice is $\mathcal{M}_{A\to B}$? Suppose that a third party is trying to assess how distinguishable the actual channel $\mathcal{M}_{A\to B}$ is from the ideal channel $\mathcal{N}_{A\to B}$. Such an individual has access to both the input and output ports of the channel, and so the most general strategy for the distinguisher to employ is to prepare a state $\rho_{RA}$ of a reference system $R$ and the channel input system $A$. The distinguisher transmits the $A$ system of $\rho_{RA}$ into the unknown channel. After that, the distinguisher receives the channel output system $B$ and then performs a measurement described by the POVM $\{\Lambda^x_{RB}\}_x$ on the reference system $R$ and the channel output system $B$. The probability of obtaining a particular outcome $\Lambda^x_{RB}$ is given by the Born rule. In the case that the unknown channel is $\mathcal{N}_{A\to B}$, this probability is $\operatorname{Tr}[\Lambda^x_{RB}\mathcal{N}_{A\to B}(\rho_{RA})]$, and in the case that the unknown channel is $\mathcal{M}_{A\to B}$, this probability is $\operatorname{Tr}[\Lambda^x_{RB}\mathcal{M}_{A\to B}(\rho_{RA})]$. What we demand is that the absolute deviation between the two probabilities $\operatorname{Tr}[\Lambda^x_{RB}\mathcal{N}_{A\to B}(\rho_{RA})]$ and $\operatorname{Tr}[\Lambda^x_{RB}\mathcal{M}_{A\to B}(\rho_{RA})]$ is no larger than some tolerance $\varepsilon$. Since this should be the case for all possible input states and measurement outcomes, what we demand mathematically is that
\[
\sup_{\rho_{RA},\,0\leq\Lambda_{RB}\leq\mathbb{1}_{RB}} \left|\operatorname{Tr}[\Lambda_{RB}\mathcal{N}_{A\to B}(\rho_{RA})] - \operatorname{Tr}[\Lambda_{RB}\mathcal{M}_{A\to B}(\rho_{RA})]\right| \leq \varepsilon. \tag{6.3.1}
\]
As a consequence of the characterization of trace distance from Theorem 6.1, we have
\[
\sup_{\rho_{RA},\,0\leq\Lambda_{RB}\leq\mathbb{1}_{RB}} \left|\operatorname{Tr}[\Lambda_{RB}(\mathcal{N}_{A\to B} - \mathcal{M}_{A\to B})(\rho_{RA})]\right| = \sup_{\rho_{RA}} \frac{1}{2}\left\|\mathcal{N}_{A\to B}(\rho_{RA}) - \mathcal{M}_{A\to B}(\rho_{RA})\right\|_1 =: \frac{1}{2}\|\mathcal{N}-\mathcal{M}\|_\diamond, \tag{6.3.2}
\]
where $\frac{1}{2}\|\mathcal{N}-\mathcal{M}\|_\diamond$ is defined to be the normalized diamond distance between $\mathcal{N}$ and $\mathcal{M}$. This indicates that if $\frac{1}{2}\|\mathcal{N}-\mathcal{M}\|_\diamond\leq\varepsilon$, then the absolute deviation between probabilities for every possible input state and measurement operator never exceeds $\varepsilon$, so that the approximation between quantum channels $\mathcal{N}_{A\to B}$ and $\mathcal{M}_{A\to B}$ is naturally quantified by the normalized diamond distance $\frac{1}{2}\|\mathcal{N}-\mathcal{M}\|_\diamond$.
With the above in mind, we now define the diamond norm for Hermiticity-
preserving maps, from which the diamond distance measure for channels arises.

Definition 6.18 Diamond Norm

The diamond norm of a Hermiticity-preserving map $\mathcal{P}_{A\to B}$ is defined as
\[
\|\mathcal{P}\|_\diamond := \sup_{\rho_{RA}} \left\|\mathcal{P}_{A\to B}(\rho_{RA})\right\|_1, \tag{6.3.3}
\]
where the optimization is over all states $\rho_{RA}$, with the dimension of $R$ unbounded.

For two quantum channels N and M, note that the difference N − M is a

Hermiticity-preserving map. An important simplification of the diamond norm of
a Hermiticity-preserving map is given by the following proposition:

Proposition 6.19
The diamond norm of a Hermiticity-preserving map $\mathcal{P}_{A\to B}$ can be calculated as
\[
\|\mathcal{P}\|_\diamond = \sup_{\psi_{RA}} \left\|\mathcal{P}_{A\to B}(\psi_{RA})\right\|_1, \tag{6.3.4}
\]
where the optimization is over all pure states $\psi_{RA}$, such that the dimension of $R$ is equal to the dimension of the system $A$.

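Proof: Let $\rho_{RA}$ be an arbitrary state. It has a spectral decomposition as follows:
\[
\rho_{RA} = \sum_x p(x)\psi^x_{RA}, \tag{6.3.5}
\]
where $\{p(x)\}_x$ is a probability distribution and $\{\psi^x_{RA}\}_x$ is a set of pure states. From the convexity of the trace norm (see Section 2.2.9), it follows that
\begin{align}
\left\|\mathcal{P}_{A\to B}(\rho_{RA})\right\|_1 &\leq \sum_x p(x)\left\|\mathcal{P}_{A\to B}(\psi^x_{RA})\right\|_1 \tag{6.3.6} \\
&\leq \sup_x \left\|\mathcal{P}_{A\to B}(\psi^x_{RA})\right\|_1 \tag{6.3.7} \\
&\leq \sup_{\psi_{RA}} \left\|\mathcal{P}_{A\to B}(\psi_{RA})\right\|_1. \tag{6.3.8}
\end{align}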

From the Schmidt decomposition (Theorem 2.2), it follows that the rank of the
reduced state 𝜓 𝑅 is no larger than the dimension of system 𝐴. So then it suffices to
optimize with respect to all pure states 𝜓 𝑅 𝐴 , such that the dimension of 𝑅 is equal
to the dimension of the system 𝐴. ■

The normalized diamond distance between two quantum channels can be

computed via a semi-definite program (SDP). The following proposition states this
fact formally and also states the dual optimization problem. Appendix 6.A provides
a proof.

Proposition 6.20 SDPs for Normalized Diamond Distance

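The normalized diamond distance between two quantum channels $\mathcal{N}_{A\to B}$ and $\mathcal{M}_{A\to B}$ can be written as the following semi-definite programs:
\begin{align}
\frac{1}{2}\|\mathcal{N}-\mathcal{M}\|_\diamond &= \sup_{\substack{\rho_R\geq 0,\\ \Omega_{RB}\geq 0}} \left\{\operatorname{Tr}[\Omega_{RB}(\Gamma^{\mathcal{N}}_{RB} - \Gamma^{\mathcal{M}}_{RB})] : \Omega_{RB}\leq\rho_R\otimes\mathbb{1}_B,\ \operatorname{Tr}[\rho_R] = 1\right\} \tag{6.3.9} \\
&= \inf_{\substack{\mu\geq 0,\\ Z_{RB}\geq 0}} \left\{\mu : Z_{RB}\geq\Gamma^{\mathcal{N}}_{RB} - \Gamma^{\mathcal{M}}_{RB},\ \mu\mathbb{1}_R\geq\operatorname{Tr}_B[Z_{RB}]\right\}. \tag{6.3.10}
\end{align}
The latter expression is equal to
\[
\inf_{Z_{RB}\geq 0} \left\{\left\|\operatorname{Tr}_B[Z_{RB}]\right\|_\infty : Z_{RB}\geq\Gamma^{\mathcal{N}}_{RB} - \Gamma^{\mathcal{M}}_{RB}\right\}. \tag{6.3.11}
\]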

As described at the beginning of this section, the diamond distance has an

operational meaning in terms of the task of channel discrimination, which is a
generalization of state discrimination (see Section 5.3.1). Let us now analyze the
task of channel discrimination in more detail.
Suppose that Alice gives Bob a device that implements either the channel N or
the channel M, but she does not tell him which channel it implements. Bob’s task
is to decide which channel the device implements. Suppose that the channels N
and M have prior probabilities 𝜆 and 1 − 𝜆 of being selected, respectively. The only
way for Bob to determine which channel the device implements (without guessing
randomly) is to pass a quantum system, say in the state 𝜌, through it. He can then
perform a measurement on the resulting output state and make a guess as to which
channel was implemented. Therefore, in addition to having the freedom to choose
any binary measurement (which is the case in state discrimination), in channel
discrimination Bob also has the freedom to prepare a system 𝐴 and a reference
system 𝑅 in any state 𝜌 𝑅 𝐴 of his choosing, with the system 𝐴 being passed through
the device.
For every fixed input state 𝜌 𝑅 𝐴 , there are two possible output states, depending
on which channel was implemented. This means that, for every fixed input state,
the task of channel discrimination reduces to the task of state discrimination. Using
the result of Theorem 5.3, for the input state 𝜌 𝑅 𝐴 the corresponding optimal error
probability (i.e., the error probability obtained by optimizing over all measurements) is
\[
p^*_{\text{err}}(\rho_{RA}) = \frac{1}{2}\left(1 - \left\|\lambda\mathcal{N}_{A\to B}(\rho_{RA}) - (1-\lambda)\mathcal{M}_{A\to B}(\rho_{RA})\right\|_1\right). \tag{6.3.12}
\]
Then, optimizing over all input states $\rho_{RA}$ in order to minimize the error probability, we find that
\begin{align}
\inf_{\rho_{RA}} p^*_{\text{err}}(\rho_{RA}) &= \frac{1}{2}\left(1 - \sup_{\rho_{RA}} \left\|(\lambda\mathcal{N}_{A\to B} - (1-\lambda)\mathcal{M}_{A\to B})(\rho_{RA})\right\|_1\right) \tag{6.3.13} \\
&= \frac{1}{2}\left(1 - \left\|\lambda\mathcal{N} - (1-\lambda)\mathcal{M}\right\|_\diamond\right), \tag{6.3.14}
\end{align}
where the last line follows from the definition of the diamond norm. The optimal
error probability for the task of channel discrimination is thus a simple function of
the diamond norm.
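To make (6.3.12) concrete, consider two qubit Pauli channels and the maximally entangled input state. This is a sketch under stated assumptions: a Pauli channel with probability vector $p$ maps the maximally entangled state to the mixture $\sum_i p_i\Phi^i$ of the four mutually orthogonal Bell states, so the operator inside the trace norm is diagonal in the Bell basis. Because the input state is fixed rather than optimized, the value returned is in general only an upper bound on the optimal error probability in (6.3.14). The function name and the example channels are illustrative.

```python
def error_probability_bell_input(p, q, prior=0.5):
    """Error probability (6.3.12) for two Pauli channels with probability
    vectors p and q when the input is the maximally entangled state.
    `prior` is the prior probability of the first channel."""
    # The operator prior*N(Phi) - (1-prior)*M(Phi) is diagonal in the Bell basis.
    trace_norm = sum(abs(prior * a - (1 - prior) * b) for a, b in zip(p, q))
    return (1 - trace_norm) / 2

identity = [1.0, 0.0, 0.0, 0.0]       # identity channel
noisy = [0.7, 0.1, 0.1, 0.1]          # an assumed depolarizing-type Pauli channel
print(error_probability_bell_input(identity, noisy))
print(error_probability_bell_input(noisy, noisy))   # identical channels: guessing
```

For identical channels with equal priors the function returns 1/2, which corresponds to random guessing.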

6.4 Fidelity Measures for Channels

The diamond distance is a distance measure for channels that is based on the trace distance for states. We now define a fidelity-based quantity for channels that can be used to assess a channel's ability to preserve entanglement.

Definition 6.21 Entanglement Fidelity of a Channel

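For a quantum channel $\mathcal{N}_A$ with input and output systems of equal dimension $d$, we define its entanglement fidelity as
\[
F_e(\mathcal{N}) := \langle\Phi|_{RA}(\operatorname{id}_R\otimes\mathcal{N}_A)(\Phi_{RA})|\Phi\rangle_{RA}, \tag{6.4.1}
\]
where
\[
|\Phi\rangle_{RA} = \frac{1}{\sqrt{d}}\sum_{i=0}^{d-1} |i,i\rangle_{RA}. \tag{6.4.2}
\]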

Notice that the entanglement fidelity of a channel is the fidelity of the maximally
entangled state with the Choi state of the channel. Intuitively, then, the entanglement
fidelity quantifies how good a channel is at preserving the entanglement between
two systems when it acts on one of the two systems.
It turns out that the entanglement fidelity is very closely related to another fidelity-based measure on quantum channels called the average fidelity:
\[
\overline{F}(\mathcal{N}) := \int_\psi \langle\psi|\mathcal{N}(\psi)|\psi\rangle\,\mathrm{d}\psi, \tag{6.4.3}
\]
where we integrate over all pure states acting on the input Hilbert space of $\mathcal{N}$ with respect to the Haar measure. The Haar measure is the uniform probability measure on pure quantum states (see the remark after (2.5.16)). For a quantum channel $\mathcal{N}$ with input system dimension $d$, the following identity holds:
\[
\overline{F}(\mathcal{N}) = \frac{dF_e(\mathcal{N}) + 1}{d+1}. \tag{6.4.4}
\]

Instead of taking the average as in (6.4.3), we can take the minimum over all input states to obtain the minimum fidelity:
\[
F_{\min}(\mathcal{N}) := \inf_\psi \langle\psi|\mathcal{N}(\psi)|\psi\rangle, \tag{6.4.5}
\]

where the optimization is over all pure states 𝜓 acting on the input Hilbert space of
the channel N. By introducing a reference system 𝑅 and optimizing over all joint
states |𝜓⟩ 𝑅 𝐴 of 𝑅 and the input system 𝐴 of the channel N, we obtain a fidelity
measure that generalizes the entanglement fidelity.

Definition 6.22 Fidelity of a Quantum Channel

For a quantum channel $\mathcal{N}_A$ with equal input and output system dimension, we define the fidelity of $\mathcal{N}$ as
\[
F(\mathcal{N}) := \inf_{\psi_{RA}} \langle\psi|_{RA}(\operatorname{id}_R\otimes\mathcal{N}_A)(\psi_{RA})|\psi\rangle_{RA}, \tag{6.4.6}
\]
where we take the infimum over all pure states $|\psi\rangle_{RA}$, with the dimension of $R$ equal to the dimension of $A$.

Note that the state |𝜓⟩ 𝑅 𝐴 = |Φ⟩ 𝑅 𝐴 is a special case in the optimization in (6.4.6).
This implies that, for a channel N, 𝐹 (N) ≤ 𝐹𝑒 (N).
More generally, we define the fidelity between two quantum channels N 𝐴→𝐵
and M 𝐴→𝐵 as follows:


Definition 6.23 Fidelity of Quantum Channels

Let N 𝐴→𝐵 and M 𝐴→𝐵 be quantum channels. Their channel fidelity is defined
as
\[ F(\mathcal{N}, \mathcal{M}) = \inf_{\rho_{RA}} F(\mathcal{N}_{A\to B}(\rho_{RA}), \mathcal{M}_{A\to B}(\rho_{RA})), \tag{6.4.7} \]

where the infimum is taken over all bipartite states 𝜌 𝑅 𝐴 , with the dimension of
𝑅 arbitrarily large.

Remark: Similar to the diamond distance, we define the channel fidelity as above in order
to indicate its operational meaning with an infimum over all possible input states, but it is not
necessary to take the infimum over all bipartite states. One can instead restrict the infimum to be
over pure bipartite states where the reference system 𝑅 is isomorphic to the channel input system
𝐴, so that
\[ F(\mathcal{N}, \mathcal{M}) = \inf_{\psi_{RA}} F(\mathcal{N}_{A\to B}(\psi_{RA}), \mathcal{M}_{A\to B}(\psi_{RA})), \tag{6.4.8} \]
where 𝜓 𝑅 𝐴 is a pure bipartite state with system 𝑅 isomorphic to system 𝐴. The same statement
thus applies to (6.4.6). An argument for this is similar to that given in the proof of Proposition 6.19,
except using the joint concavity of root fidelity rather than convexity of the trace norm.
Here, we provide a different argument for this fact. First, we have that
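\[ \inf_{\rho_{RA}} F(\mathcal{N}_{A\to B}(\rho_{RA}), \mathcal{M}_{A\to B}(\rho_{RA})) \leq \inf_{\psi_{RA}} F(\mathcal{N}_{A\to B}(\psi_{RA}), \mathcal{M}_{A\to B}(\psi_{RA})), \tag{6.4.9} \]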

which holds simply by restricting the optimization on the left-hand side to pure states.
Next, given a state 𝜌 𝑅 𝐴, with the dimension of 𝑅 not necessarily equal to the dimension of
𝐴, we can purify it to a state 𝜓 𝑅′ 𝑅 𝐴. Then, using the data-processing inequality for the fidelity
with respect to the partial trace channel Tr 𝑅′ (Proposition 6.13), we find that
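\begin{align}
F(\mathcal{N}_{A\to B}(\rho_{RA}), \mathcal{M}_{A\to B}(\rho_{RA})) &= F(\mathcal{N}_{A\to B}(\mathrm{Tr}_{R'}[\psi_{R'RA}]), \mathcal{M}_{A\to B}(\mathrm{Tr}_{R'}[\psi_{R'RA}])) \tag{6.4.10} \\
&= F(\mathrm{Tr}_{R'}[\mathcal{N}_{A\to B}(\psi_{R'RA})], \mathrm{Tr}_{R'}[\mathcal{M}_{A\to B}(\psi_{R'RA})]) \tag{6.4.11} \\
&\geq F(\mathcal{N}_{A\to B}(\psi_{R'RA}), \mathcal{M}_{A\to B}(\psi_{R'RA})) \tag{6.4.12} \\
&\geq \inf_{\psi_{R'RA}} F(\mathcal{N}_{A\to B}(\psi_{R'RA}), \mathcal{M}_{A\to B}(\psi_{R'RA})). \tag{6.4.13}
\end{align}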

Since the state 𝜌 𝑅 𝐴 is arbitrary, we obtain

\[ \inf_{\rho_{RA}} F(\mathcal{N}_{A\to B}(\rho_{RA}), \mathcal{M}_{A\to B}(\rho_{RA})) \geq \inf_{\psi_{RA}} F(\mathcal{N}_{A\to B}(\psi_{RA}), \mathcal{M}_{A\to B}(\psi_{RA})). \tag{6.4.14} \]

Finally, by the Schmidt decomposition theorem (Theorem 2.2), for every pure state 𝜓 𝑅 𝐴, the
rank of the reduced state 𝜓 𝑅 does not exceed the dimension of 𝐴, implying that it suffices to
optimize over pure states for which the system 𝑅 has the same dimension as the system 𝐴. We
thus obtain
\begin{align}
\inf_{\rho_{RA}} F(\mathcal{N}_{A\to B}(\rho_{RA}), \mathcal{M}_{A\to B}(\rho_{RA})) &= \inf_{\psi_{RA}} F(\mathcal{N}_{A\to B}(\psi_{RA}), \mathcal{M}_{A\to B}(\psi_{RA})) \tag{6.4.15} \\
&= F(\mathcal{N}, \mathcal{M}). \tag{6.4.16}
\end{align}

We then have that 𝐹 (N) = 𝐹 (N, id). In other words, the fidelity 𝐹 (N) of
a quantum channel N can be viewed as the fidelity between N and the identity
channel id.
Similar to the diamond distance, the fidelity of quantum channels can be
computed by means of primal and dual semi-definite programs:

Proposition 6.24 SDP for Root Fidelity of Channels

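Let N 𝐴→𝐵 and M 𝐴→𝐵 be quantum channels with respective Choi operators
$\Gamma^{\mathcal{N}}_{RB}$ and $\Gamma^{\mathcal{M}}_{RB}$. Then their root channel fidelity

\[ \sqrt{F}(\mathcal{N}_{A\to B}, \mathcal{M}_{A\to B}) := \inf_{\psi_{RA}} \sqrt{F}(\mathcal{N}_{A\to B}(\psi_{RA}), \mathcal{M}_{A\to B}(\psi_{RA})) \tag{6.4.17} \]

can be calculated by means of the following semi-definite program:

\begin{align}
\sqrt{F}(\mathcal{N}_{A\to B}, \mathcal{M}_{A\to B})
&= \sup_{\lambda \geq 0,\, Q_{RB}} \left\{ \lambda : \lambda I_R \leq \mathrm{Re}[\mathrm{Tr}_B[Q_{RB}]],\ \begin{pmatrix} \Gamma^{\mathcal{N}}_{RB} & Q_{RB}^\dagger \\ Q_{RB} & \Gamma^{\mathcal{M}}_{RB} \end{pmatrix} \geq 0 \right\} \tag{6.4.18} \\
&= \frac{1}{2} \inf_{\rho_R \geq 0,\, W_{RB},\, Z_{RB}} \left\{ \mathrm{Tr}[\Gamma^{\mathcal{N}}_{RB} W_{RB}] + \mathrm{Tr}[\Gamma^{\mathcal{M}}_{RB} Z_{RB}] : \mathrm{Tr}[\rho_R] = 1,\ \begin{pmatrix} W_{RB} & \rho_R \otimes I_B \\ \rho_R \otimes I_B & Z_{RB} \end{pmatrix} \geq 0 \right\}. \tag{6.4.19}
\end{align}

The expression in (6.4.18) is equal to

\[ \sup_{Q_{RB}} \left\{ \lambda_{\min}(\mathrm{Re}[\mathrm{Tr}_B[Q_{RB}]]) : \begin{pmatrix} \Gamma^{\mathcal{N}}_{RB} & Q_{RB}^\dagger \\ Q_{RB} & \Gamma^{\mathcal{M}}_{RB} \end{pmatrix} \geq 0 \right\}, \tag{6.4.20} \]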

where 𝜆min denotes the minimum eigenvalue of its argument.

Proof: See Appendix 6.B.2. ■


The inequality in (6.2.88) relating the fidelity between two states 𝜌 and 𝜎
and their trace distance can be used to relate the fidelity-based distance measure
𝐹 (N, M) on channels and the diamond distance $\frac{1}{2}\|\mathcal{N} - \mathcal{M}\|_\diamond$. It is straightforward
to show that
\[ 1 - \sqrt{F(\mathcal{N}, \mathcal{M})} \leq \frac{1}{2}\|\mathcal{N} - \mathcal{M}\|_\diamond \leq \sqrt{1 - F(\mathcal{N}, \mathcal{M})}. \tag{6.4.21} \]

The following proposition relates the fidelity 𝐹 (N) of a channel N to its
minimum fidelity 𝐹min (N) in (6.4.5), telling us that if 𝐹min (N) is large then so is
𝐹 (N).

Proposition 6.25
Let N be a quantum channel. For all 𝜀 ∈ [0, 1], if 𝐹min (N) ≥ 1 − 𝜀, then
$F(\mathcal{N}) \geq 1 - 2\sqrt{\varepsilon}$.

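Proof: The inequality 𝐹min (N) ≥ 1 − 𝜀 implies that the following inequality
holds for all state vectors |𝜙⟩ ∈ H:

\[ \langle\phi| \big[ |\phi\rangle\langle\phi| - \mathcal{N}(|\phi\rangle\langle\phi|) \big] |\phi\rangle \leq \varepsilon. \tag{6.4.22} \]

By (6.2.88), this implies that

\[ \big\| |\phi\rangle\langle\phi| - \mathcal{N}(|\phi\rangle\langle\phi|) \big\|_1 \leq 2\sqrt{\varepsilon}, \tag{6.4.23} \]

for all state vectors |𝜙⟩ ∈ H. We will show that

\[ \big| \langle\phi| \big[ |\phi\rangle\langle\phi^\perp| - \mathcal{N}(|\phi\rangle\langle\phi^\perp|) \big] |\phi^\perp\rangle \big| \leq 2\sqrt{\varepsilon}, \tag{6.4.24} \]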

for every orthonormal pair {|𝜙⟩, |𝜙⊥ ⟩} of state vectors in H. Set

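\[ |w_k\rangle := \frac{|\phi\rangle + \mathrm{i}^k |\phi^\perp\rangle}{\sqrt{2}}, \tag{6.4.25} \]
for 𝑘 ∈ {0, 1, 2, 3}. Then, it follows that
\[ |\phi\rangle\langle\phi^\perp| = \frac{1}{2} \sum_{k=0}^{3} \mathrm{i}^k |w_k\rangle\langle w_k|. \tag{6.4.26} \]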
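
The decomposition (6.4.26) of the off-diagonal operator into four rank-one projectors can be verified numerically. The sketch below is an illustration only: it draws a random orthonormal pair in dimension three (an arbitrary choice) from a QR decomposition and compares both sides of (6.4.26). The variable names are not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random orthonormal pair {phi, phi_perp}: the columns of the Q factor
# of a random complex 3x2 matrix are orthonormal.
d = 3
A = rng.normal(size=(d, 2)) + 1j * rng.normal(size=(d, 2))
Q_factor, _ = np.linalg.qr(A)
phi, phi_perp = Q_factor[:, 0], Q_factor[:, 1]

# The four vectors |w_k> = (|phi> + i^k |phi_perp>) / sqrt(2) from (6.4.25).
w = [(phi + 1j**k * phi_perp) / np.sqrt(2) for k in range(4)]

# Right-hand side of (6.4.26): (1/2) sum_k i^k |w_k><w_k|.
reconstructed = 0.5 * sum(1j**k * np.outer(w[k], w[k].conj()) for k in range(4))

# Left-hand side of (6.4.26): |phi><phi_perp|.
target = np.outer(phi, phi_perp.conj())

print(np.allclose(reconstructed, target))
```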


Consider now that

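\begin{align}
\big| \langle\phi| \big[ |\phi\rangle\langle\phi^\perp| - \mathcal{N}(|\phi\rangle\langle\phi^\perp|) \big] |\phi^\perp\rangle \big|
&\leq \big\| |\phi\rangle\langle\phi^\perp| - \mathcal{N}(|\phi\rangle\langle\phi^\perp|) \big\|_\infty \tag{6.4.27} \\
&\leq \frac{1}{2} \sum_{k=0}^{3} \big\| |w_k\rangle\langle w_k| - \mathcal{N}(|w_k\rangle\langle w_k|) \big\|_\infty \tag{6.4.28} \\
&\leq \frac{1}{4} \sum_{k=0}^{3} \big\| |w_k\rangle\langle w_k| - \mathcal{N}(|w_k\rangle\langle w_k|) \big\|_1 \tag{6.4.29} \\
&\leq 2\sqrt{\varepsilon}. \tag{6.4.30}
\end{align}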

The first inequality follows from the characterization of the operator norm in
(2.2.121) as ∥ 𝑋 ∥ ∞ = sup|𝜙⟩,|𝜓⟩ |⟨𝜓|𝑋 |𝜙⟩|, where the optimization is with respect to
pure states. The second inequality follows from substituting (6.4.26) and applying
the triangle inequality and homogeneity of the ∞-norm. The third inequality follows
because the ∞-norm of a traceless Hermitian operator is bounded from above by
half of its trace norm (see Lemma 2.11 below). The final inequality follows from
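applying (6.4.23).
Let |𝜓⟩ ∈ H′ ⊗ H be an arbitrary state. All such states have a
Schmidt decomposition of the following form:
\[ |\psi\rangle = \sum_x \sqrt{p(x)}\, |\zeta_x\rangle \otimes |\varphi_x\rangle, \tag{6.4.31} \]

where {𝑝(𝑥)}𝑥 is a probability distribution and {|𝜁𝑥 ⟩}𝑥 and {|𝜑𝑥 ⟩}𝑥 are orthonormal sets of
states. Then, consider that

\begin{align}
1 - \langle\psi|(\mathrm{id}_{\mathcal{H}'} \otimes \mathcal{N})(|\psi\rangle\langle\psi|)|\psi\rangle
&= \langle\psi|(\mathrm{id}_{\mathcal{H}'} \otimes \mathrm{id}_{\mathcal{H}} - \mathrm{id}_{\mathcal{H}'} \otimes \mathcal{N})(|\psi\rangle\langle\psi|)|\psi\rangle \tag{6.4.32} \\
&= \langle\psi|(\mathrm{id}_{\mathcal{H}'} \otimes [\mathrm{id}_{\mathcal{H}} - \mathcal{N}])(|\psi\rangle\langle\psi|)|\psi\rangle \tag{6.4.33} \\
&= \sum_{x,y} p(x) p(y) \langle\varphi_x| \big[ |\varphi_x\rangle\langle\varphi_y| - \mathcal{N}(|\varphi_x\rangle\langle\varphi_y|) \big] |\varphi_y\rangle. \tag{6.4.34}
\end{align}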

Now, applying the triangle inequality and (6.4.24), we find that the following holds
for all |𝜓⟩ ∈ H′ ⊗ H:

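\begin{align}
1 - \langle\psi|(\mathrm{id}_{\mathcal{H}'} \otimes \mathcal{N})(|\psi\rangle\langle\psi|)|\psi\rangle
&= \sum_{x,y} p(x) p(y) \langle\varphi_x| \big[ |\varphi_x\rangle\langle\varphi_y| - \mathcal{N}(|\varphi_x\rangle\langle\varphi_y|) \big] |\varphi_y\rangle \tag{6.4.35} \\
&\leq \sum_{x,y} p(x) p(y) \big| \langle\varphi_x| \big[ |\varphi_x\rangle\langle\varphi_y| - \mathcal{N}(|\varphi_x\rangle\langle\varphi_y|) \big] |\varphi_y\rangle \big| \tag{6.4.36} \\
&\leq 2\sqrt{\varepsilon}. \tag{6.4.37}
\end{align}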

This concludes the proof. ■

6.5 Bibliographic Notes

The quantum fidelity was defined by Uhlmann (1976), and Theorem 6.8 is due to
Uhlmann (1976). The semi-definite program for root fidelity in Proposition 6.6
was established by Watrous (2013), and we have followed the proof therein. The
fact that the fidelity is achieved by a quantum measurement was realized by Fuchs
and Caves (1995). The relation between trace distance and fidelity presented in
Theorem 6.14 was proved by Fuchs and van de Graaf (1998). For very closely related
inequalities, with the fidelity replaced by the “Holevo fidelity” $(\mathrm{Tr}[\sqrt{\rho}\sqrt{\sigma}])^2$, see
Holevo (1972b). The sine distance was defined by Rastegin (2002, 2003); Gilchrist
et al. (2005); Rastegin (2006), and its interpretation in terms of the minimal trace
distance of purifications was given by Rastegin (2006). The sine distance was
generalized to subnormalized states by Tomamichel et al. (2010), where it was
given the name “purified distance.”
The diamond norm was presented and studied by Kitaev (1997), who applied
it to problems in quantum information theory and quantum computation. The
operational interpretation of the diamond distance in terms of hypothesis testing
of quantum channels was given by Kretschmann and Werner (2004); Rosgen and
Watrous (2005); Gilchrist et al. (2005). More properties of the diamond norm
can be found in Watrous (2018). The SDP in Proposition 6.20 for the normalized
diamond distance of quantum channels was given by Watrous (2009).
Schumacher (1996) introduced the entanglement fidelity of a quantum channel,
and Barnum et al. (2000) made further observations regarding it. Nielsen (2002)
provided a simple proof for the relation between entanglement fidelity and average
fidelity in (6.4.4). The fidelity of quantum channels was introduced by Gilchrist
et al. (2005), and it can be understood as a special case of the generalized channel
divergence (Leditzky et al., 2018). A semi-definite program for the root fidelity of
channels was given by Yuan and Fung (2017). The particular semi-definite program
in Proposition 6.24, for the root fidelity of channels, was presented by Katariya
and Wilde (2021). Proposition 6.25 was established by Barnum et al. (2000) and
reviewed by Kretschmann and Werner (2004). Here we followed the proof given
by Watrous (2018, Theorem 3.56), which therein established a relation between
trace distance and diamond distance between an arbitrary channel and the identity
channel.

Appendix 6.A SDP for Normalized Diamond Distance

Here, we provide a proof of Proposition 6.20.

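Proof of Proposition 6.20: Employing (6.3.2), consider that

\[ \frac{1}{2}\|\mathcal{N} - \mathcal{M}\|_\diamond = \sup_{\psi_{RA} \geq 0,\, \Lambda_{RB} \geq 0} \left\{ \mathrm{Tr}[\Lambda_{RB}(\mathcal{N}_{A\to B} - \mathcal{M}_{A\to B})(\psi_{RA})] : \Lambda_{RB} \leq \mathbb{1}_{RB},\ \mathrm{Tr}[\psi_{RA}] = 1,\ \mathrm{Tr}[\psi_{RA}^2] = 1 \right\}, \tag{6.A.1} \]

where the constraints 𝜓 𝑅 𝐴 ≥ 0, Tr[𝜓 𝑅 𝐴 ] = 1, and $\mathrm{Tr}[\psi_{RA}^2] = 1$ correspond to 𝜓 𝑅 𝐴
being a pure bipartite state. Note that the above is equal to

\[ \sup_{\psi_{RA} \geq 0,\, \Lambda_{RB} \geq 0} \left\{ \mathrm{Tr}[\Lambda_{RB}(\mathcal{N}_{A\to B} - \mathcal{M}_{A\to B})(\psi_{RA})] : \Lambda_{RB} \leq \mathbb{1}_{RB},\ \psi_R > 0,\ \mathrm{Tr}[\psi_{RA}] = 1,\ \mathrm{Tr}[\psi_{RA}^2] = 1 \right\}, \tag{6.A.2} \]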

due to the fact that the set of pure states with reduced state 𝜓 𝑅 positive definite is
dense in the set of all pure states. Now, recall from (2.2.38) that any such pure
state can be written as $\psi_{RA} = X_R \Gamma_{RA} X_R^\dagger$ for some linear operator 𝑋 𝑅 such that
$\mathrm{Tr}[X_R^\dagger X_R] = 1$ and |𝑋 𝑅 | > 0, where Γ𝑅 𝐴 is defined in (2.2.34). Using this, we find
that the objective function can be rewritten as

\begin{align}
\mathrm{Tr}[\Lambda_{RB}(\mathcal{N}_{A\to B} - \mathcal{M}_{A\to B})(X_R \Gamma_{RA} X_R^\dagger)] \tag{6.A.3}
&= \mathrm{Tr}[X_R^\dagger \Lambda_{RB} X_R (\mathcal{N}_{A\to B} - \mathcal{M}_{A\to B})(\Gamma_{RA})] \tag{6.A.4} \\
&= \mathrm{Tr}[X_R^\dagger \Lambda_{RB} X_R (\Gamma^{\mathcal{N}}_{RB} - \Gamma^{\mathcal{M}}_{RB})]. \tag{6.A.5}
\end{align}

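Now, observe the following equivalence:

\[ 0 \leq \Lambda_{RB} \leq \mathbb{1}_{RB} \quad \Longleftrightarrow \quad 0 \leq X_R^\dagger \Lambda_{RB} X_R \leq X_R^\dagger X_R \otimes \mathbb{1}_B. \tag{6.A.6} \]

Thus, defining $\Omega_{RB} := X_R^\dagger \Lambda_{RB} X_R$ and $\rho_R := X_R^\dagger X_R$, the optimization in (6.A.2) is
equivalent to the following one:

\[ \sup_{\rho_R,\, \Omega_{RB} \geq 0} \left\{ \mathrm{Tr}[\Omega_{RB}(\Gamma^{\mathcal{N}}_{RB} - \Gamma^{\mathcal{M}}_{RB})] : \Omega_{RB} \leq \rho_R \otimes \mathbb{1}_B,\ \rho_R > 0,\ \mathrm{Tr}[\rho_R] = 1 \right\}, \tag{6.A.7} \]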

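giving the equality in (6.3.9). Finally, setting

\begin{align}
Y &:= \begin{pmatrix} \Omega_{RB} & 0 \\ 0 & \rho_R \end{pmatrix}, \tag{6.A.8} \\
D &:= \begin{pmatrix} \Gamma^{\mathcal{N}}_{RB} - \Gamma^{\mathcal{M}}_{RB} & 0 \\ 0 & 0 \end{pmatrix}, \tag{6.A.9} \\
\Phi(Y) &:= \begin{pmatrix} \Omega_{RB} - \rho_R \otimes \mathbb{1}_B & 0 & 0 \\ 0 & \mathrm{Tr}[\rho_R] & 0 \\ 0 & 0 & -\mathrm{Tr}[\rho_R] \end{pmatrix}, \tag{6.A.10} \\
C &:= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \tag{6.A.11}
\end{align}
we find that (6.3.9) is now in the standard form from (2.4.3), namely,

\[ \sup_{Y \geq 0} \{ \mathrm{Tr}[DY] : \Phi(Y) \leq C \}. \tag{6.A.12} \]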

Now, to establish the dual SDP in (6.3.10), we first determine the adjoint Φ† of
Φ using
Tr[Φ(𝑌 )𝑍] = Tr[𝑌 Φ† (𝑍)], (6.A.13)
where without loss of generality we can take 𝑍 to be

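\[ Z := \begin{pmatrix} Z_{RB} & 0 & 0 \\ 0 & \mu_1 & 0 \\ 0 & 0 & \mu_2 \end{pmatrix}. \tag{6.A.14} \]
Then, we find that

\begin{align}
\mathrm{Tr}[\Phi(Y) Z] &= \mathrm{Tr}[(\Omega_{RB} - \rho_R \otimes \mathbb{1}_B) Z_{RB}] + \mathrm{Tr}[\rho_R]\, \mu_1 - \mathrm{Tr}[\rho_R]\, \mu_2 \tag{6.A.15} \\
&= \mathrm{Tr}[\Omega_{RB} Z_{RB}] + \mathrm{Tr}[\rho_R((\mu_1 - \mu_2) \mathbb{1}_R - \mathrm{Tr}_B[Z_{RB}])], \tag{6.A.16}
\end{align}

from which we conclude that

\[ \Phi^\dagger(Z) = \begin{pmatrix} Z_{RB} & 0 \\ 0 & (\mu_1 - \mu_2) \mathbb{1}_R - \mathrm{Tr}_B[Z_{RB}] \end{pmatrix}. \tag{6.A.17} \]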

The standard form of the dual SDP from (2.4.4), which is

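\[ \inf_{Z \geq 0} \{ \mathrm{Tr}[CZ] : \Phi^\dagger(Z) \geq D \}, \tag{6.A.18} \]

then becomes

\[ \inf_{\mu_1 \geq 0,\, \mu_2 \geq 0,\, Z_{RB} \geq 0} \left\{ \mu_1 - \mu_2 : Z_{RB} \geq \Gamma^{\mathcal{N}}_{RB} - \Gamma^{\mathcal{M}}_{RB},\ (\mu_1 - \mu_2) \mathbb{1}_R \geq \mathrm{Tr}_B[Z_{RB}] \right\}. \tag{6.A.19} \]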

Now, observe that the variables 𝜇1 and 𝜇2 always appear together in the above
optimization as 𝜇1 − 𝜇2 , and so can be reduced to a single real variable 𝜇 ∈ R.
Then, the condition $\mu \mathbb{1}_R \geq \mathrm{Tr}_B[Z_{RB}]$ implies that 𝜇 ≥ 0. Thus, the optimization
in (6.A.19) can be simplified to
\[ \inf_{\mu \geq 0,\, Z_{RB} \geq 0} \left\{ \mu : Z_{RB} \geq \Gamma^{\mathcal{N}}_{RB} - \Gamma^{\mathcal{M}}_{RB},\ \mu \mathbb{1}_R \geq \mathrm{Tr}_B[Z_{RB}] \right\}, \tag{6.A.20} \]

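which is precisely (6.3.10). Equality of the primal and dual SDPs is due to strong
duality, which holds for the SDP in (6.3.10) because $Z_{RB} = \Gamma^{\mathcal{N}}_{RB} - \Gamma^{\mathcal{M}}_{RB} + \delta \mathbb{1}_{RB}$
and $\mu = \mathrm{Tr}_B[Z_{RB}] + \delta \mathbb{1}_R$ together form a strictly feasible point for all 𝛿 > 0 and a
feasible point for the primal is 𝜌 𝑅 = 𝜋 𝑅 and Ω 𝑅𝐵 = 𝜋 𝑅 ⊗ 1𝐵 .
Finally, the equality

\[ \inf_{\mu \geq 0,\, Z_{RB} \geq 0} \left\{ \mu : Z_{RB} \geq \Gamma^{\mathcal{N}}_{RB} - \Gamma^{\mathcal{M}}_{RB},\ \mu \mathbb{1}_R \geq \mathrm{Tr}_B[Z_{RB}] \right\} = \inf_{Z_{RB} \geq 0} \left\{ \|\mathrm{Tr}_B[Z_{RB}]\|_\infty : Z_{RB} \geq \Gamma^{\mathcal{N}}_{RB} - \Gamma^{\mathcal{M}}_{RB} \right\} \tag{6.A.21} \]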

holds by the expression in (2.4.47) for the Schatten ∞-norm for positive semi-definite
operators.

Appendix 6.B SDPs for Fidelity of States and Channels

6.B.1 Proof of Proposition 6.6

First, let us verify that strong duality holds for the primal and dual semi-definite
programs in (6.2.4) and (6.2.5), respectively. Consider that 𝑋 = 0 is a feasible
choice for the primal program, while 𝑌 = 𝑍 = 2 · 1 (twice the identity) is strictly feasible for the dual
program. Thus, strong duality holds according to Theorem 2.28.
In order to prove the equality in (6.2.4), we start with the following lemma:

Lemma 6.26
Let 𝑃 and 𝑄 be positive semi-definite operators in L(H), and let 𝑋 ∈ L(H).
Then the operator
\[ \begin{pmatrix} P & X \\ X^\dagger & Q \end{pmatrix} \tag{6.B.1} \]
is positive semi-definite if and only if there exists 𝐾 ∈ L(H) satisfying
∥𝐾 ∥ ∞ ≤ 1 and $X = \sqrt{P} K \sqrt{Q}$.

Proof: See Theorem IX.5.9 of Bhatia (1997). ■

It follows from this lemma (applied with 𝑃 = 𝜌 and 𝑄 = 𝜎) that the operators 𝑋 in the primal optimization in
(6.2.4) can range over $X = \sqrt{\rho} K \sqrt{\sigma}$ such that 𝐾 ∈ L(H) and ∥𝐾 ∥ ∞ ≤ 1, so that
the optimization over 𝑋 is reduced to an optimization over 𝐾. We then find that the
primal optimal value is given by

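\begin{align}
\frac{1}{2} \sup_{X \in \mathrm{L}(\mathcal{H})} \left\{ \mathrm{Tr}[X] + \mathrm{Tr}[X^\dagger] : \begin{pmatrix} \rho & X \\ X^\dagger & \sigma \end{pmatrix} \geq 0 \right\}
&= \frac{1}{2} \sup_{K : \|K\|_\infty \leq 1} \left( \mathrm{Tr}[\sqrt{\rho} K \sqrt{\sigma}] + \mathrm{Tr}[\sqrt{\sigma} K^\dagger \sqrt{\rho}] \right) \tag{6.B.2} \\
&= \sup_{K : \|K\|_\infty \leq 1} \mathrm{Re}[\mathrm{Tr}[\sqrt{\rho} K \sqrt{\sigma}]] \tag{6.B.3} \\
&= \sup_{K : \|K\|_\infty \leq 1} \left| \mathrm{Tr}[\sqrt{\rho} K \sqrt{\sigma}] \right| \tag{6.B.4} \\
&= \left\| \sqrt{\rho} \sqrt{\sigma} \right\|_1 = \sqrt{F}(\rho, \sigma). \tag{6.B.5}
\end{align}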

The first equality follows from Lemma 6.26. The third equality follows because we
can use the optimization over 𝐾 to adjust a global phase such that the real part is
equal to the absolute value (here, one should think of the fact that Re[𝑧] = 𝑟 cos(𝜃)
for $z = r e^{i\theta}$, and then one can optimize the value of 𝜃 so that Re[𝑧] = 𝑟). The
final equality follows by a generalization of Proposition 2.10 (in fact the same
proof given there implies that the optimization can be with respect to 𝑈 satisfying
∥𝑈 ∥ ∞ ≤ 1, rather than just with respect to isometries).
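
The chain of equalities ending in (6.B.5) can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the book's construction: it computes the root fidelity as the sum of the singular values of √ρ√σ, and it builds a contraction K that attains the supremum from the singular value decomposition √σ√ρ = U S V†, namely K = V U†. The two example states and all function names are arbitrary choices made for this sketch.

```python
import numpy as np

def psd_sqrt(A):
    """Square root of a positive semi-definite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(A)
    vals = np.clip(vals, 0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def root_fidelity(rho, sigma):
    """sqrt(F)(rho, sigma) = || sqrt(rho) sqrt(sigma) ||_1, as in (6.B.5)."""
    singular_values = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False)
    return float(singular_values.sum())

# Two example qubit states (Hermitian, unit trace, positive definite).
rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
sigma = np.array([[0.5, -0.1j], [0.1j, 0.5]], dtype=complex)

# Optimal K from the SVD sqrt(sigma) sqrt(rho) = U S V^dagger: K = V U^dagger.
# K is unitary, so its operator norm equals 1 and it is feasible.
U, S, Vh = np.linalg.svd(psd_sqrt(sigma) @ psd_sqrt(rho))
K = Vh.conj().T @ U.conj().T
attained = np.trace(psd_sqrt(rho) @ K @ psd_sqrt(sigma))

# The attained value should be real and equal to the root fidelity.
print(attained, root_fidelity(rho, sigma))
```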
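We now prove that (6.2.5) is the dual program of (6.2.4). We can rewrite the
primal SDP as
\[ \frac{1}{2} \sup \left( \mathrm{Tr}[X] + \mathrm{Tr}[X^\dagger] \right) \tag{6.B.6} \]
subject to
\[ \begin{pmatrix} \rho & 0 \\ 0 & \sigma \end{pmatrix} \geq \begin{pmatrix} 0 & X \\ X^\dagger & 0 \end{pmatrix}, \qquad \begin{pmatrix} R & X \\ X^\dagger & S \end{pmatrix} \geq 0, \tag{6.B.7} \]
because 𝑅 and 𝑆 are not involved in the objective function and can always be chosen
so that the second operator is PSD. Also, the following equivalences hold:

\begin{align}
\begin{pmatrix} \rho & X \\ X^\dagger & \sigma \end{pmatrix} \geq 0 \quad &\Longleftrightarrow \quad \begin{pmatrix} \rho & -X \\ -X^\dagger & \sigma \end{pmatrix} \geq 0 \tag{6.B.8} \\
&\Longleftrightarrow \quad \begin{pmatrix} \rho & 0 \\ 0 & \sigma \end{pmatrix} \geq \begin{pmatrix} 0 & X \\ X^\dagger & 0 \end{pmatrix}. \tag{6.B.9}
\end{align}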

As given in (2.4.3) and (2.4.4), the standard forms of primal and dual SDPs for
Hermitian 𝐴 and 𝐵 and Hermiticity-preserving map Φ are as follows:

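\begin{align}
&\sup_{G \geq 0} \{ \mathrm{Tr}[AG] : \Phi(G) \leq B \}, \tag{6.B.10} \\
&\inf_{Y \geq 0} \{ \mathrm{Tr}[BY] : \Phi^\dagger(Y) \geq A \}. \tag{6.B.11}
\end{align}

So the SDP above is in standard form with

\begin{align}
G &= \begin{pmatrix} R & X \\ X^\dagger & S \end{pmatrix}, & A &= \begin{pmatrix} 0 & \mathbb{1} \\ \mathbb{1} & 0 \end{pmatrix}, \tag{6.B.12} \\
\Phi(G) &= \begin{pmatrix} 0 & X \\ X^\dagger & 0 \end{pmatrix}, & B &= \begin{pmatrix} \rho & 0 \\ 0 & \sigma \end{pmatrix}. \tag{6.B.13}
\end{align}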

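Setting
\[ Y = \begin{pmatrix} W & V \\ V^\dagger & Z \end{pmatrix}, \tag{6.B.14} \]
the adjoint of Φ is given by

\begin{align}
\mathrm{Tr}[Y \Phi(G)] &= \mathrm{Tr}\left[ \begin{pmatrix} W & V \\ V^\dagger & Z \end{pmatrix} \begin{pmatrix} 0 & X \\ X^\dagger & 0 \end{pmatrix} \right] \tag{6.B.15} \\
&= \mathrm{Tr}\left[ \begin{pmatrix} V X^\dagger & W X \\ Z X^\dagger & V^\dagger X \end{pmatrix} \right] \tag{6.B.16} \\
&= \mathrm{Tr}[V X^\dagger] + \mathrm{Tr}[X V^\dagger] \tag{6.B.17} \\
&= \mathrm{Tr}\left[ \begin{pmatrix} 0 & V \\ V^\dagger & 0 \end{pmatrix} \begin{pmatrix} R & X \\ X^\dagger & S \end{pmatrix} \right], \tag{6.B.18}
\end{align}

so that
\[ \Phi^\dagger(Y) = \begin{pmatrix} 0 & V \\ V^\dagger & 0 \end{pmatrix}. \tag{6.B.19} \]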
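Then the dual is given by

\[ \frac{1}{2} \inf \mathrm{Tr}\left[ \begin{pmatrix} \rho & 0 \\ 0 & \sigma \end{pmatrix} \begin{pmatrix} W & V \\ V^\dagger & Z \end{pmatrix} \right] \tag{6.B.20} \]

subject to
\[ \begin{pmatrix} 0 & V \\ V^\dagger & 0 \end{pmatrix} \geq \begin{pmatrix} 0 & \mathbb{1} \\ \mathbb{1} & 0 \end{pmatrix}, \qquad \begin{pmatrix} W & V \\ V^\dagger & Z \end{pmatrix} \geq 0. \tag{6.B.21} \]

This simplifies to
\[ \frac{1}{2} \inf \left( \mathrm{Tr}[\rho W] + \mathrm{Tr}[\sigma Z] \right), \tag{6.B.22} \]
subject to
\[ \begin{pmatrix} 0 & V \\ V^\dagger & 0 \end{pmatrix} \geq \begin{pmatrix} 0 & \mathbb{1} \\ \mathbb{1} & 0 \end{pmatrix}, \qquad \begin{pmatrix} W & V \\ V^\dagger & Z \end{pmatrix} \geq 0. \tag{6.B.23} \]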

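Since

\begin{align}
\begin{pmatrix} W & V \\ V^\dagger & Z \end{pmatrix} \geq 0 \quad &\Longleftrightarrow \quad \begin{pmatrix} W & -V \\ -V^\dagger & Z \end{pmatrix} \geq 0 \tag{6.B.24} \\
&\Longleftrightarrow \quad \begin{pmatrix} W & 0 \\ 0 & Z \end{pmatrix} \geq \begin{pmatrix} 0 & V \\ V^\dagger & 0 \end{pmatrix}, \tag{6.B.25}
\end{align}

we find that there is a single condition

\[ \begin{pmatrix} W & 0 \\ 0 & Z \end{pmatrix} \geq \begin{pmatrix} 0 & V \\ V^\dagger & 0 \end{pmatrix} \geq \begin{pmatrix} 0 & \mathbb{1} \\ \mathbb{1} & 0 \end{pmatrix}, \tag{6.B.26} \]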

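and since 𝑉 plays no role in the objective function, we can set 𝑉 = 1. So the final
SDP simplifies as follows:
\[ \frac{1}{2} \inf_{W, Z} \left( \mathrm{Tr}[\rho W] + \mathrm{Tr}[\sigma Z] \right) \tag{6.B.27} \]
subject to
\[ \begin{pmatrix} W & -\mathbb{1} \\ -\mathbb{1} & Z \end{pmatrix} \geq 0. \tag{6.B.28} \]

Using the fact that

\[ \begin{pmatrix} W & -\mathbb{1} \\ -\mathbb{1} & Z \end{pmatrix} \geq 0 \quad \Longleftrightarrow \quad \begin{pmatrix} W & \mathbb{1} \\ \mathbb{1} & Z \end{pmatrix} \geq 0, \tag{6.B.29} \]

we can do one final rewriting as follows:

\[ \frac{1}{2} \inf_{W, Z} \left( \mathrm{Tr}[\rho W] + \mathrm{Tr}[\sigma Z] \right) \tag{6.B.30} \]
subject to
\[ \begin{pmatrix} W & \mathbb{1} \\ \mathbb{1} & Z \end{pmatrix} \geq 0. \tag{6.B.31} \]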

6.B.2 Proof of Proposition 6.24

First, strong duality holds, according to Theorem 2.28, because 𝑄 𝑅𝐵 = 0 and 𝜆 = 0
is feasible for the primal program, while $\rho_R = \mathbb{1}_R/|R|$ and $W_{RB} = Z_{RB} = 2 \cdot \mathbb{1}_{RB}$ is
strictly feasible for the dual.
For a pure bipartite state 𝜓 𝑅 𝐴 , we use (2.2.38) to conclude that

\[ \psi_{RA} = X_R \Gamma_{RA} X_R^\dagger, \tag{6.B.32} \]

where $\mathrm{Tr}[X_R^\dagger X_R] = 1$, to see that

\[ \mathcal{N}_{A\to B}(\psi_{RA}) = X_R \Gamma^{\mathcal{N}}_{RB} X_R^\dagger, \qquad \mathcal{M}_{A\to B}(\psi_{RA}) = X_R \Gamma^{\mathcal{M}}_{RB} X_R^\dagger, \tag{6.B.33} \]

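and then plug in to (6.2.5) to get that

\[ \sqrt{F}(\mathcal{N}_{A\to B}, \mathcal{M}_{A\to B}) = \frac{1}{2} \inf_{W_{RB},\, Z_{RB}} \left( \mathrm{Tr}[X_R \Gamma^{\mathcal{N}}_{RB} X_R^\dagger W_{RB}] + \mathrm{Tr}[X_R \Gamma^{\mathcal{M}}_{RB} X_R^\dagger Z_{RB}] \right) \tag{6.B.34} \]
subject to
\[ \begin{pmatrix} W_{RB} & \mathbb{1}_{RB} \\ \mathbb{1}_{RB} & Z_{RB} \end{pmatrix} \geq 0. \tag{6.B.35} \]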

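Consider that the objective function can be written as

\[ \mathrm{Tr}[\Gamma^{\mathcal{N}}_{RB} W'_{RB}] + \mathrm{Tr}[\Gamma^{\mathcal{M}}_{RB} Z'_{RB}], \tag{6.B.36} \]

with
\[ W'_{RB} := X_R^\dagger W_{RB} X_R, \qquad Z'_{RB} := X_R^\dagger Z_{RB} X_R. \tag{6.B.37} \]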
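Now consider that the inequality in (6.B.35) is equivalent to

\[ \begin{pmatrix} X_R & 0 \\ 0 & X_R \end{pmatrix}^\dagger \begin{pmatrix} W_{RB} & \mathbb{1}_{RB} \\ \mathbb{1}_{RB} & Z_{RB} \end{pmatrix} \begin{pmatrix} X_R & 0 \\ 0 & X_R \end{pmatrix} \geq 0. \tag{6.B.38} \]

(Here we have assumed that 𝑋 𝑅 is invertible, but it suffices to do so for this
optimization because the set of invertible 𝑋 𝑅 is dense in the set of all possible 𝑋 𝑅 .)
Multiplying out the last matrix we find that

\begin{align}
\begin{pmatrix} X_R & 0 \\ 0 & X_R \end{pmatrix}^\dagger \begin{pmatrix} W_{RB} & \mathbb{1}_{RB} \\ \mathbb{1}_{RB} & Z_{RB} \end{pmatrix} \begin{pmatrix} X_R & 0 \\ 0 & X_R \end{pmatrix}
&= \begin{pmatrix} X_R^\dagger W_{RB} X_R & X_R^\dagger X_R \otimes \mathbb{1}_B \\ X_R^\dagger X_R \otimes \mathbb{1}_B & X_R^\dagger Z_{RB} X_R \end{pmatrix} \tag{6.B.39} \\
&= \begin{pmatrix} W'_{RB} & \rho_R \otimes \mathbb{1}_B \\ \rho_R \otimes \mathbb{1}_B & Z'_{RB} \end{pmatrix}, \tag{6.B.40}
\end{align}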

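where we defined $\rho_R = X_R^\dagger X_R$. Observing that 𝜌 𝑅 ≥ 0 and Tr[𝜌 𝑅 ] = 1, we can
write the final SDP as follows:
\[ \sqrt{F}(\mathcal{N}_{A\to B}, \mathcal{M}_{A\to B}) = \frac{1}{2} \inf_{\rho_R,\, W_{RB},\, Z_{RB}} \left( \mathrm{Tr}[\Gamma^{\mathcal{N}}_{RB} W_{RB}] + \mathrm{Tr}[\Gamma^{\mathcal{M}}_{RB} Z_{RB}] \right), \tag{6.B.41} \]
subject to
\[ \rho_R \geq 0, \qquad \mathrm{Tr}[\rho_R] = 1, \qquad \begin{pmatrix} W_{RB} & \rho_R \otimes \mathbb{1}_B \\ \rho_R \otimes \mathbb{1}_B & Z_{RB} \end{pmatrix} \geq 0. \tag{6.B.42} \]
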

Now let us calculate the dual SDP to this, using the following standard forms
for primal and dual SDPs, with Hermitian operators 𝐴 and 𝐵 and a Hermiticity-
preserving map Φ (as given in (2.4.3) and (2.4.4)):

\[ \sup_{X \geq 0} \{ \mathrm{Tr}[AX] : \Phi(X) \leq B \}, \qquad \inf_{Y \geq 0} \{ \mathrm{Tr}[BY] : \Phi^\dagger(Y) \geq A \}. \tag{6.B.43} \]

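Consider that the constraint in (6.B.42) implies 𝑊 𝑅𝐵 ≥ 0 and 𝑍 𝑅𝐵 ≥ 0, so that we
can set
\begin{align}
Y &= \begin{pmatrix} W_{RB} & 0 & 0 \\ 0 & Z_{RB} & 0 \\ 0 & 0 & \rho_R \end{pmatrix}, \qquad B = \begin{pmatrix} \Gamma^{\mathcal{N}}_{RB} & 0 & 0 \\ 0 & \Gamma^{\mathcal{M}}_{RB} & 0 \\ 0 & 0 & 0 \end{pmatrix}, \tag{6.B.44} \\
\Phi^\dagger(Y) &= \begin{pmatrix} W_{RB} & \rho_R \otimes \mathbb{1}_B & 0 & 0 \\ \rho_R \otimes \mathbb{1}_B & Z_{RB} & 0 & 0 \\ 0 & 0 & \mathrm{Tr}[\rho_R] & 0 \\ 0 & 0 & 0 & -\mathrm{Tr}[\rho_R] \end{pmatrix}, \tag{6.B.45} \\
A &= \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \tag{6.B.46}
\end{align}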
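Then with
\[ X = \begin{pmatrix} P_{RB} & Q_{RB}^\dagger & 0 & 0 \\ Q_{RB} & S_{RB} & 0 & 0 \\ 0 & 0 & \lambda & 0 \\ 0 & 0 & 0 & \mu \end{pmatrix} \tag{6.B.47} \]
the map Φ is given by

\begin{align}
\mathrm{Tr}[X \Phi^\dagger(Y)]
&= \mathrm{Tr}\left[ \begin{pmatrix} P_{RB} & Q_{RB}^\dagger & 0 & 0 \\ Q_{RB} & S_{RB} & 0 & 0 \\ 0 & 0 & \lambda & 0 \\ 0 & 0 & 0 & \mu \end{pmatrix} \begin{pmatrix} W_{RB} & \rho_R \otimes \mathbb{1}_B & 0 & 0 \\ \rho_R \otimes \mathbb{1}_B & Z_{RB} & 0 & 0 \\ 0 & 0 & \mathrm{Tr}[\rho_R] & 0 \\ 0 & 0 & 0 & -\mathrm{Tr}[\rho_R] \end{pmatrix} \right] \notag \\
&= \mathrm{Tr}[P_{RB} W_{RB}] + \mathrm{Tr}[Q_{RB}^\dagger (\rho_R \otimes \mathbb{1}_B)] + \mathrm{Tr}[Q_{RB} (\rho_R \otimes \mathbb{1}_B)] + \mathrm{Tr}[S_{RB} Z_{RB}] + (\lambda - \mu) \mathrm{Tr}[\rho_R] \notag \\
&= \mathrm{Tr}[P_{RB} W_{RB}] + \mathrm{Tr}[S_{RB} Z_{RB}] + \mathrm{Tr}[(\mathrm{Tr}_B[Q_{RB} + Q_{RB}^\dagger] + (\lambda - \mu) \mathbb{1}_R) \rho_R] \notag \\
&= \mathrm{Tr}\left[ \begin{pmatrix} P_{RB} & 0 & 0 \\ 0 & S_{RB} & 0 \\ 0 & 0 & \mathrm{Tr}_B[Q_{RB} + Q_{RB}^\dagger] + (\lambda - \mu) \mathbb{1}_R \end{pmatrix} \begin{pmatrix} W_{RB} & 0 & 0 \\ 0 & Z_{RB} & 0 \\ 0 & 0 & \rho_R \end{pmatrix} \right]. \tag{6.B.48}
\end{align}
So then
\[ \Phi(X) = \begin{pmatrix} P_{RB} & 0 & 0 \\ 0 & S_{RB} & 0 \\ 0 & 0 & \mathrm{Tr}_B[Q_{RB} + Q_{RB}^\dagger] + (\lambda - \mu) \mathbb{1}_R \end{pmatrix}. \tag{6.B.49} \]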

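The primal is then given by

\[ \frac{1}{2} \sup \mathrm{Tr}\left[ \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} P_{RB} & Q_{RB}^\dagger & 0 & 0 \\ Q_{RB} & S_{RB} & 0 & 0 \\ 0 & 0 & \lambda & 0 \\ 0 & 0 & 0 & \mu \end{pmatrix} \right], \tag{6.B.50} \]
subject to

\begin{align}
\begin{pmatrix} P_{RB} & 0 & 0 \\ 0 & S_{RB} & 0 \\ 0 & 0 & \mathrm{Tr}_B[Q_{RB} + Q_{RB}^\dagger] + (\lambda - \mu) \mathbb{1}_R \end{pmatrix} &\leq \begin{pmatrix} \Gamma^{\mathcal{N}}_{RB} & 0 & 0 \\ 0 & \Gamma^{\mathcal{M}}_{RB} & 0 \\ 0 & 0 & 0 \end{pmatrix}, \tag{6.B.51} \\
\begin{pmatrix} P_{RB} & Q_{RB}^\dagger & 0 & 0 \\ Q_{RB} & S_{RB} & 0 & 0 \\ 0 & 0 & \lambda & 0 \\ 0 & 0 & 0 & \mu \end{pmatrix} &\geq 0, \tag{6.B.52}
\end{align}
which simplifies to
\[ \frac{1}{2} \sup (\lambda - \mu) \tag{6.B.53} \]
subject to
\begin{align}
P_{RB} &\leq \Gamma^{\mathcal{N}}_{RB}, \tag{6.B.54} \\
S_{RB} &\leq \Gamma^{\mathcal{M}}_{RB}, \tag{6.B.55} \\
\mathrm{Tr}_B[Q_{RB} + Q_{RB}^\dagger] + (\lambda - \mu) \mathbb{1}_R &\leq 0, \tag{6.B.56} \\
\begin{pmatrix} P_{RB} & Q_{RB}^\dagger \\ Q_{RB} & S_{RB} \end{pmatrix} &\geq 0, \tag{6.B.57} \\
\lambda, \mu &\geq 0. \tag{6.B.58}
\end{align}
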

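We can simplify this even more. We can set 𝜆′ = 𝜆 − 𝜇 ∈ R, and we can substitute
𝑄 𝑅𝐵 with −𝑄 𝑅𝐵 without changing the value, so then it becomes
\[ \frac{1}{2} \sup \lambda' \tag{6.B.59} \]
subject to
\begin{align}
P_{RB} &\leq \Gamma^{\mathcal{N}}_{RB}, \tag{6.B.60} \\
S_{RB} &\leq \Gamma^{\mathcal{M}}_{RB}, \tag{6.B.61} \\
\lambda' \mathbb{1}_R &\leq \mathrm{Tr}_B[Q_{RB} + Q_{RB}^\dagger], \tag{6.B.62} \\
\begin{pmatrix} P_{RB} & -Q_{RB}^\dagger \\ -Q_{RB} & S_{RB} \end{pmatrix} &\geq 0, \tag{6.B.63} \\
\lambda' &\in \mathbb{R}. \tag{6.B.64}
\end{align}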

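We can rewrite
\begin{align}
\begin{pmatrix} P_{RB} & -Q_{RB}^\dagger \\ -Q_{RB} & S_{RB} \end{pmatrix} \geq 0 \quad &\Longleftrightarrow \quad \begin{pmatrix} P_{RB} & Q_{RB}^\dagger \\ Q_{RB} & S_{RB} \end{pmatrix} \geq 0 \tag{6.B.65} \\
&\Longleftrightarrow \quad \begin{pmatrix} P_{RB} & 0 \\ 0 & S_{RB} \end{pmatrix} \geq \begin{pmatrix} 0 & -Q_{RB}^\dagger \\ -Q_{RB} & 0 \end{pmatrix}. \tag{6.B.66}
\end{align}

We then have the simplified condition

\[ \begin{pmatrix} 0 & -Q_{RB}^\dagger \\ -Q_{RB} & 0 \end{pmatrix} \leq \begin{pmatrix} P_{RB} & 0 \\ 0 & S_{RB} \end{pmatrix} \leq \begin{pmatrix} \Gamma^{\mathcal{N}}_{RB} & 0 \\ 0 & \Gamma^{\mathcal{M}}_{RB} \end{pmatrix}. \tag{6.B.67} \]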

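Since 𝑃 𝑅𝐵 and 𝑆 𝑅𝐵 do not appear in the objective function, we can set them to their
largest value and obtain the following simplification:
\[ \frac{1}{2} \sup \lambda' \tag{6.B.68} \]
subject to
\[ \lambda' \mathbb{1}_R \leq \mathrm{Tr}_B[Q_{RB} + Q_{RB}^\dagger], \qquad \begin{pmatrix} \Gamma^{\mathcal{N}}_{RB} & Q_{RB}^\dagger \\ Q_{RB} & \Gamma^{\mathcal{M}}_{RB} \end{pmatrix} \geq 0, \qquad \lambda' \in \mathbb{R}. \tag{6.B.69} \]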

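Since a feasible solution is 𝜆′ = 0 and 𝑄 𝑅𝐵 = 0, it is clear that we can restrict to
𝜆′ ≥ 0. After a relabeling, this becomes

\begin{align}
&\frac{1}{2} \sup_{\lambda \geq 0,\, Q_{RB}} \left\{ \lambda : \lambda \mathbb{1}_R \leq \mathrm{Tr}_B[Q_{RB} + Q_{RB}^\dagger],\ \begin{pmatrix} \Gamma^{\mathcal{N}}_{RB} & Q_{RB}^\dagger \\ Q_{RB} & \Gamma^{\mathcal{M}}_{RB} \end{pmatrix} \geq 0 \right\} \notag \\
&\quad = \sup_{\lambda \geq 0,\, Q_{RB}} \left\{ \lambda : \lambda \mathbb{1}_R \leq \mathrm{Re}[\mathrm{Tr}_B[Q_{RB}]],\ \begin{pmatrix} \Gamma^{\mathcal{N}}_{RB} & Q_{RB}^\dagger \\ Q_{RB} & \Gamma^{\mathcal{M}}_{RB} \end{pmatrix} \geq 0 \right\}. \tag{6.B.70}
\end{align}

This is equivalent to
\[ \sup_{Q_{RB}} \left\{ \lambda_{\min}(\mathrm{Re}[\mathrm{Tr}_B[Q_{RB}]]) : \begin{pmatrix} \Gamma^{\mathcal{N}}_{RB} & Q_{RB}^\dagger \\ Q_{RB} & \Gamma^{\mathcal{M}}_{RB} \end{pmatrix} \geq 0 \right\}. \tag{6.B.71} \]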

This concludes the proof.

Chapter 7

Quantum Entropies and Information
In this chapter, we introduce various entropic and information quantities that play
a fundamental role in the analysis of quantum communication protocols. Here
we see that the notions of entropy and information take many forms. The most
basic and fundamental entropic and information measures are members of the “von
Neumann family,” and these often correspond to optimal communication rates for
information-theoretic tasks in the asymptotic regime of many uses of an independent
and identically distributed (i.i.d.) resource. More refined entropy measures belong
to the “Rényi family,” and interestingly, in part due to the non-commutativity of
quantum states, there are several interesting ways of generalizing the classical
Rényi relative entropy that are meaningful for understanding information-theoretic
tasks. The Rényi measures reduce to the von Neumann ones in a particular limit,
and they are useful for characterizing optimal rates of information-theoretic tasks
in the non-asymptotic regime, in particular when trying to determine how fast
an error probability is converging to zero or one (so-called error exponents and
strong converse exponents, respectively). Even more broadly, we define entropy
measures from the meaningful “one-shot” information-theoretic task given by
quantum hypothesis testing. Even though one might debate whether such an
operationally defined quantity is truly an entropy, our opinion is that this perspective
is quite powerful, and so we adopt it in this chapter and the rest of the book.
Thus, this chapter explores quite broadly a variety of entropic measures and their
mathematical properties, as they are the basis for analyzing a wide variety of
quantum information-processing protocols.

We start our development with a brief preview of quantum entropies and
information (Section 7.1) and then proceed to the well-known quantum relative
entropy (Section 7.2), which plays a foundational role in understanding properties
of other entropies. We then proceed to defining a generalized divergence, which is
a concept that plays an important role in the proofs of strong converse theorems
throughout this book (Section 7.3). Of particular focus are several prominent
examples of generalized divergences, the Petz–Rényi relative entropy (Section 7.4),
the sandwiched Rényi relative entropy (Section 7.5), the geometric Rényi relative
entropy (Section 7.6), the Belavkin–Staszewski relative entropy (Section 7.7), the
(smooth) max-relative entropy (Section 7.8), and the hypothesis testing relative
entropy (Section 7.9). The first of these plays an important role in achievability
proofs for quantum channel capacities, and the second plays an important role
for strong converses. The hypothesis testing relative entropy is fundamental in
establishing bounds on one-shot channel capacities.

7.1 Preview
Arguably one of the most important quantities in quantum information theory is the
von Neumann entropy, which is the quantum generalization of the Shannon entropy.
We also refer to it as the quantum entropy and do so from now on. For a quantum
system 𝐴 in the state 𝜌 𝐴 ∈ D(H 𝐴 ), the von Neumann entropy is defined as

\[ H(\rho_A) \equiv H(A)_\rho := -\mathrm{Tr}[\rho_A \log_2 \rho_A]. \tag{7.1.1} \]

If 𝜌 𝐴 has a spectral decomposition of the form

\[ \rho_A = \sum_{i=1}^{r} \lambda_i |\psi_i\rangle\langle\psi_i|_A, \tag{7.1.2} \]

where 𝑟 ≡ rank(𝜌 𝐴 ), then we can write 𝐻 (𝜌 𝐴 ) in terms of the non-zero eigenvalues
$\{\lambda_i\}_{i=1}^{r}$ of $\rho_A$ as
\[ H(\rho_A) = -\sum_{i=1}^{r} \lambda_i \log_2 \lambda_i. \tag{7.1.3} \]
Note that the zero eigenvalues of 𝜌 𝐴 do not contribute to the entropy due to the
convention that 0 log2 0 = 0, which is taken because lim𝑥→0+ 𝑥 log2 𝑥 = 0. By
viewing 𝜌 𝐴 as a probabilistic mixture of the pure states $\{|\psi_i\rangle_A\}_{i=1}^{r}$ defined in (7.1.2),
the quantum entropy quantifies the uncertainty about which of these pure states the
system 𝐴 is in. In particular, 𝐻 (𝜌 𝐴 ) is, in a rough sense, the expected information
gain upon performing an experiment to determine the state of the system.
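
As a small illustration of (7.1.3), the following sketch computes the quantum entropy of a density matrix from its eigenvalues. It is a minimal example rather than a reference implementation: the function name and the numerical tolerance `tol`, which decides when an eigenvalue is treated as zero, are assumptions of this sketch.

```python
import numpy as np

def von_neumann_entropy(rho, tol=1e-12):
    """H(rho) = -sum_i lambda_i log2(lambda_i) over the non-zero eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(rho)
    # Drop (numerically) zero eigenvalues: convention 0 log2 0 = 0.
    nonzero = eigenvalues[eigenvalues > tol]
    return float(-np.sum(nonzero * np.log2(nonzero)))

pure_state = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
maximally_mixed = np.eye(2) / 2                         # 1/2 times the identity

# A pure state has zero entropy; the maximally mixed qubit state has one bit.
print(von_neumann_entropy(pure_state), von_neumann_entropy(maximally_mixed))
```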
Note that the right-hand side of (7.1.3) is the formula for the Shannon entropy
of the probability distribution {𝜆𝑖 }𝑖 corresponding to the eigenvalues of 𝜌 𝐴 . The
Shannon entropy of the probability distribution {𝑝, 1 − 𝑝}, for 𝑝 ∈ [0, 1], shows
up frequently and is denoted by ℎ2 ( 𝑝), i.e.,
\[ h_2(p) := -p \log_2 p - (1 - p) \log_2 (1 - p). \tag{7.1.4} \]
It is called the binary entropy function.
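
A direct transcription of (7.1.4) into code needs to handle the endpoints 𝑝 = 0 and 𝑝 = 1 separately, using the convention 0 log2 0 = 0. The following is a minimal sketch; the function name is illustrative.

```python
import math

def binary_entropy(p):
    """Binary entropy h_2(p) in bits, with the convention 0 log2 0 = 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 0.0 or p == 1.0:
        return 0.0  # both terms vanish under the convention 0 log2 0 = 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# h_2 is symmetric about p = 1/2, where it attains its maximum of one bit.
print(binary_entropy(0.5), binary_entropy(0.1), binary_entropy(0.9))
```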
Just as the Shannon entropy has an operational meaning as the optimal rate of
(classical) data compression, the quantum entropy has an operational interpretation
as the optimal rate of quantum data compression. More precisely, given the state
$\rho_A^{\otimes n}$, the quantum entropy 𝐻 (𝜌 𝐴 ) is the minimum number of qubits per copy of the
state 𝜌 𝐴 that are needed to faithfully represent $\rho_A^{\otimes n}$, when 𝑛 becomes large. This
task is also called Schumacher compression.
Other fundamental information-theoretic quantities, which are functions of
the Shannon entropy, have straightforward generalizations to the quantum setting.
Let 𝜌 𝐴𝐵 be a bipartite state, and let 𝜎𝐴𝐵𝐶 be a tripartite state.
1. The quantum conditional entropy is defined as
\[ H(A|B)_\rho := H(AB)_\rho - H(B)_\rho. \tag{7.1.5} \]
The quantum conditional entropy quantifies the uncertainty about the state of
the system 𝐴 in the presence of additional quantum side information in the
form of the quantum system 𝐵.
2. The coherent information is defined as
\[ I(A\rangle B)_\rho := H(B)_\rho - H(AB)_\rho = -H(A|B)_\rho, \tag{7.1.6} \]
and it arises in the context of communication of quantum information over
quantum channels (see Chapter 14). The coherent information is asymmetric
and can be interpreted as having a directionality. We obtain a quantity called
the reverse coherent information by swapping the systems 𝐴 and 𝐵 in (7.1.6):
\[ I(B\rangle A)_\rho := H(A)_\rho - H(AB)_\rho = -H(B|A)_\rho. \tag{7.1.7} \]
This quantity arises when studying feedback-assisted quantum communication.

3. The quantum mutual information is defined as
\begin{align}
I(A;B)_\rho &:= H(A)_\rho + H(B)_\rho - H(AB)_\rho \tag{7.1.8} \\
&= H(A)_\rho - H(A|B)_\rho \tag{7.1.9} \\
&= H(B)_\rho - H(B|A)_\rho, \tag{7.1.10}
\end{align}
and it arises in the context of communicating classical information over
quantum channels (see Chapters 11 and 12).
4. The quantum conditional mutual information is defined as
\[ I(A;B|C)_\sigma := H(A|C)_\sigma + H(B|C)_\sigma - H(AB|C)_\sigma, \tag{7.1.11} \]
and it is the basis for an entanglement measure called squashed entanglement
(see Chapter 9). An important result is that the quantum conditional mutual
information is non-negative, i.e., 𝐼 ( 𝐴; 𝐵|𝐶)𝜎 ≥ 0 for every tripartite state 𝜎𝐴𝐵𝐶 .
This inequality goes by the name strong subadditivity of quantum entropy,
and we show at the end of Section 7.2 that it follows from the data-processing
inequality for the quantum relative entropy (Theorem 7.4).
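
The quantities in (7.1.5), (7.1.6), and (7.1.8) can be evaluated directly from the entropies of a bipartite state and its marginals. The sketch below is an illustrative example, not taken from the text: it uses the two-qubit maximally entangled state, and the helper names `entropy` and `partial_trace` are assumptions of this sketch. For this state the conditional entropy is negative, which has no classical counterpart.

```python
import numpy as np

def entropy(rho, tol=1e-12):
    """Quantum entropy in bits, computed from the non-zero eigenvalues."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > tol]
    return float(-np.sum(ev * np.log2(ev)))

def partial_trace(rho_AB, dA, dB, keep):
    """Reduced state on system 'A' or 'B' of an operator on C^dA (x) C^dB."""
    r = rho_AB.reshape(dA, dB, dA, dB)
    if keep == "A":
        return np.einsum("ijkj->ik", r)  # trace out B
    return np.einsum("ijil->jl", r)      # trace out A

# Two-qubit maximally entangled state (|00> + |11>)/sqrt(2).
phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_AB = np.outer(phi, phi.conj())

H_AB = entropy(rho_AB)
H_A = entropy(partial_trace(rho_AB, 2, 2, "A"))
H_B = entropy(partial_trace(rho_AB, 2, 2, "B"))

conditional_entropy = H_AB - H_B        # H(A|B), see (7.1.5)
coherent_information = H_B - H_AB       # I(A>B), see (7.1.6)
mutual_information = H_A + H_B - H_AB   # I(A;B), see (7.1.8)
print(conditional_entropy, coherent_information, mutual_information)
```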
As it turns out, all of these quantities, including the entropy itself, can be derived
from a single parent quantity, the quantum relative entropy, which we introduce in
the next section.

7.2 Quantum Relative Entropy

We define the quantum relative entropy as follows:

Definition 7.1 Quantum Relative Entropy

For every state 𝜌 and positive semi-definite operator 𝜎, the quantum relative
entropy of 𝜌 and 𝜎, denoted by 𝐷 (𝜌∥𝜎), is defined as

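\[ D(\rho\|\sigma) = \begin{cases} \mathrm{Tr}[\rho(\log_2 \rho - \log_2 \sigma)] & \text{if } \mathrm{supp}(\rho) \subseteq \mathrm{supp}(\sigma), \\ +\infty & \text{otherwise.} \end{cases} \tag{7.2.1} \]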


Remark: More generally, we could define the quantum relative entropy exactly as above, but
with both arguments being positive semi-definite operators. For our purposes in this book,
however, it suffices to restrict the first argument to be a state.

The quantum relative entropy is a particular quantum generalization of the
Kullback–Leibler divergence or classical relative entropy, which for two probability
distributions 𝑝, 𝑞 : X → [0, 1] defined on a finite alphabet X is given by

\[ D(p\|q) = \sum_{x \in \mathcal{X}} p(x) \log_2 \frac{p(x)}{q(x)}. \tag{7.2.2} \]
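
A short sketch of (7.2.2) follows. It assumes the usual conventions, which mirror the support condition in Definition 7.1 but are not spelled out in (7.2.2) itself: terms with 𝑝(𝑥) = 0 contribute zero, and the divergence is +∞ if 𝑝(𝑥) > 0 while 𝑞(𝑥) = 0 for some 𝑥. The function name is illustrative.

```python
import math

def classical_relative_entropy(p, q):
    """D(p||q) in bits for two distributions given as equal-length sequences."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue          # convention: 0 log2(0 / q) = 0
        if qx == 0:
            return math.inf   # support of p is not contained in support of q
        total += px * math.log2(px / qx)
    return total

print(classical_relative_entropy([1.0, 0.0], [0.5, 0.5]))
print(classical_relative_entropy([0.5, 0.5], [1.0, 0.0]))
```

The two calls illustrate that the relative entropy is not symmetric in its arguments.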

The quantum relative entropy has an operational meaning in terms of the task of
quantum hypothesis testing, as is shown in Section 7.10 on the quantum Stein’s
lemma (Theorem 7.78). The quantum relative entropy 𝐷 (𝜌∥𝜎) can also be
interpreted as a distinguishability measure for the quantum states 𝜌 and 𝜎, in part
due to the facts that 𝐷 (𝜌∥𝜎) ≥ 0 and 𝐷 (𝜌∥𝜎) = 0 if and only if 𝜌 = 𝜎, which is
shown in Proposition 7.3 below.
The support condition supp(𝜌) ⊆ supp(𝜎) in the definition of the quantum
relative entropy essentially has to do with the term Tr[𝜌 log2 𝜎] and the fact that the
logarithm of an operator is really only well defined for positive definite operators,
while we allow for 𝜎 to be positive semi-definite, which means that it could have
some eigenvalues equal to zero. Recall that the expression 𝜌 log2 𝜌 is well defined
even for states with eigenvalues equal to zero since we set 0 log2 0 = 0. We justified
this with the fact that lim𝑥→0+ 𝑥 log2 𝑥 = 0. We can similarly make sense of the
support condition in the definition of the quantum relative entropy by using the
following fact.

Proposition 7.2
For every state 𝜌 and positive semi-definite operator 𝜎,

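\[ D(\rho\|\sigma) = \lim_{\varepsilon \to 0^+} \mathrm{Tr}[\rho(\log_2 \rho - \log_2(\sigma + \varepsilon \mathbb{1}))]. \tag{7.2.3} \]

Consequently, whenever 𝜎 does not have full support (i.e., it is positive semi-definite
as opposed to positive definite), we can write 𝐷 (𝜌∥𝜎) as the following
limit:
\[ D(\rho\|\sigma) = \lim_{\varepsilon \to 0^+} D(\rho\|\sigma + \varepsilon \mathbb{1}). \tag{7.2.4} \]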
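
The limit in (7.2.3) can be observed numerically. The sketch below is an illustration only, with example operators chosen for this purpose. It evaluates Tr[𝜌(log2 𝜌 − log2(𝜎 + 𝜀1))] for decreasing 𝜀 in two cases: one in which 𝜎 lacks full support but supp(𝜌) ⊆ supp(𝜎), so that the values converge, and one in which the support condition fails, so that the values grow without bound. The function names are assumptions of this sketch.

```python
import numpy as np

def log2_positive_definite(A):
    """Matrix log2 of a positive definite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.log2(vals)) @ vecs.conj().T

def regularized_relative_entropy(rho, sigma, eps):
    """Tr[rho (log2 rho - log2(sigma + eps 1))], using 0 log2 0 = 0 for rho."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    tr_rho_log_rho = np.sum(p * np.log2(p))
    d = sigma.shape[0]
    log_sigma_eps = log2_positive_definite(sigma + eps * np.eye(d))
    tr_rho_log_sigma = np.trace(rho @ log_sigma_eps).real
    return float(tr_rho_log_rho - tr_rho_log_sigma)

rho = np.diag([1.0, 0.0])              # pure state |0><0|
sigma_supported = np.diag([0.5, 0.0])  # rank one, supp(rho) inside supp(sigma)
sigma_orthogonal = np.diag([0.0, 1.0]) # supp(rho) not inside supp(sigma)

for eps in (1e-2, 1e-4, 1e-8):
    print(eps,
          regularized_relative_entropy(rho, sigma_supported, eps),
          regularized_relative_entropy(rho, sigma_orthogonal, eps))
```

In the first case the values approach −log2(1/2) = 1, while in the second they behave like −log2 𝜀.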


Proof: Observe that, for all 𝜀 > 0, the operator 𝜎 + 𝜀 1 has full support; i.e.,
supp(𝜎 + 𝜀 1) = H for all 𝜀 > 0, where H is the underlying Hilbert space. This
means that the quantity
Tr[𝜌 log2 (𝜎 + 𝜀 1)] (7.2.5)
is finite for all 𝜀 > 0. Now, let us decompose the Hilbert space H into the direct sum
of the orthogonal subspaces supp(𝜎) and ker(𝜎), so that H = supp(𝜎) ⊕ ker(𝜎).
Let Π𝜎 be the projection onto supp(𝜎) and Π𝜎⊥ the projection onto ker(𝜎). Then
with respect to this decomposition, the operators 𝜌 and 𝜎 can be written as the
block matrices
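\begin{align}
\rho &= \begin{pmatrix} \Pi_\sigma \rho \Pi_\sigma & \Pi_\sigma \rho \Pi_\sigma^\perp \\ \Pi_\sigma^\perp \rho \Pi_\sigma & \Pi_\sigma^\perp \rho \Pi_\sigma^\perp \end{pmatrix} \equiv \begin{pmatrix} \rho_{0,0} & \rho_{0,1} \\ \rho_{0,1}^\dagger & \rho_{1,1} \end{pmatrix}, \notag \\
\sigma &= \begin{pmatrix} \Pi_\sigma \sigma \Pi_\sigma & \Pi_\sigma \sigma \Pi_\sigma^\perp \\ \Pi_\sigma^\perp \sigma \Pi_\sigma & \Pi_\sigma^\perp \sigma \Pi_\sigma^\perp \end{pmatrix} = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}. \tag{7.2.6}
\end{align}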

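Now, let us first suppose that supp(𝜌) ⊆ supp(𝜎). This means that 𝜌0,0 = 𝜌
and that 𝜌0,1 = 0 and 𝜌1,1 = 0. Using the fact that $\mathbb{1}_{\mathcal{H}} = \begin{pmatrix} \Pi_\sigma & 0 \\ 0 & \Pi_\sigma^\perp \end{pmatrix}$, we can write
the second term in (7.2.3) as

\begin{align}
\mathrm{Tr}[\rho \log_2(\sigma + \varepsilon \mathbb{1})] &= \mathrm{Tr}\left[ \begin{pmatrix} \rho_{0,0} & 0 \\ 0 & 0 \end{pmatrix} \log_2 \begin{pmatrix} \sigma + \varepsilon \Pi_\sigma & 0 \\ 0 & \varepsilon \Pi_\sigma^\perp \end{pmatrix} \right] \tag{7.2.7} \\
&= \mathrm{Tr}\left[ \begin{pmatrix} \rho_{0,0} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \log_2(\sigma + \varepsilon \Pi_\sigma) & 0 \\ 0 & \log_2(\varepsilon \Pi_\sigma^\perp) \end{pmatrix} \right] \tag{7.2.8} \\
&= \mathrm{Tr}[\rho_{0,0} \log_2(\sigma + \varepsilon \Pi_\sigma)]. \tag{7.2.9}
\end{align}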
Therefore,
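\lim_{\varepsilon\to 0^+} \mathrm{Tr}[\rho\log_2(\sigma + \varepsilon\mathbb{1})] = \lim_{\varepsilon\to 0^+} \mathrm{Tr}[\rho_{0,0}\log_2(\sigma + \varepsilon\Pi_\sigma)] = \mathrm{Tr}[\rho\log_2\sigma], (7.2.10)

which means that

\lim_{\varepsilon\to 0^+} \mathrm{Tr}[\rho(\log_2\rho - \log_2(\sigma + \varepsilon\mathbb{1}))] = \mathrm{Tr}[\rho(\log_2\rho - \log_2\sigma)] (7.2.11)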

whenever supp(𝜌) ⊆ supp(𝜎).

If supp(𝜌) ⊈ supp(𝜎), then the block 𝜌1,1 of 𝜌 is non-zero (and the block 𝜌0,1
could be non-zero), and we obtain

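\mathrm{Tr}[\rho\log_2(\sigma + \varepsilon\mathbb{1})] = \mathrm{Tr}\left[\begin{pmatrix} \rho_{0,0} & \rho_{0,1} \\ \rho_{0,1}^\dagger & \rho_{1,1} \end{pmatrix} \begin{pmatrix} \log_2(\sigma + \varepsilon\Pi_\sigma) & 0 \\ 0 & \log_2(\varepsilon\Pi_\sigma^\perp) \end{pmatrix}\right] (7.2.12)
= \mathrm{Tr}[\rho_{0,0}\log_2(\sigma + \varepsilon\Pi_\sigma)] + \mathrm{Tr}[\rho_{1,1}\log_2(\varepsilon\Pi_\sigma^\perp)] (7.2.13)
= \mathrm{Tr}[\rho_{0,0}\log_2(\sigma + \varepsilon\Pi_\sigma)] + \log_2(\varepsilon)\,\mathrm{Tr}[\rho_{1,1}\Pi_\sigma^\perp]. (7.2.14)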

Then, using the fact that lim𝜀→0+ (− log2 𝜀) = +∞, we find that

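\lim_{\varepsilon\to 0^+} \mathrm{Tr}[\rho(\log_2\rho - \log_2(\sigma + \varepsilon\mathbb{1}))] = \mathrm{Tr}[\rho\log_2\rho] - \lim_{\varepsilon\to 0^+} \mathrm{Tr}[\rho_{0,0}\log_2(\sigma + \varepsilon\Pi_\sigma)] - \lim_{\varepsilon\to 0^+} \log_2(\varepsilon)\,\mathrm{Tr}[\rho_{1,1}\Pi_\sigma^\perp] = +\infty. (7.2.15)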

We thus conclude that

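\lim_{\varepsilon\to 0^+} \mathrm{Tr}[\rho(\log_2\rho - \log_2(\sigma + \varepsilon\mathbb{1}))]
= \begin{cases} \mathrm{Tr}[\rho(\log_2\rho - \log_2\sigma)] & \text{if } \mathrm{supp}(\rho) \subseteq \mathrm{supp}(\sigma), \\ +\infty & \text{otherwise} \end{cases} (7.2.16)
= D(\rho\|\sigma),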

as required. ■
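The limit in Proposition 7.2 can be explored numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `log2m` and `d_regularized` and the example matrices are illustrative choices and do not come from the text. It evaluates the regularized expression in (7.2.3) for a state whose support lies inside supp(𝜎) and for one that has weight on ker(𝜎).

```python
import numpy as np

def log2m(a):
    """Base-2 logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log2(w)) @ v.conj().T

def d_regularized(rho, sigma, eps):
    """Tr[rho (log2 rho - log2(sigma + eps*1))], using 0 log2 0 = 0."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]                         # drop zero eigenvalues of rho
    reg = sigma + eps * np.eye(len(sigma))   # positive definite for eps > 0
    return float(np.sum(p * np.log2(p)) - np.trace(rho @ log2m(reg)).real)

# sigma is supported on the span of the first two basis vectors.
sigma = np.diag([0.7, 0.3, 0.0])

# Case 1: supp(rho) is contained in supp(sigma), so the limit is finite.
rho_in = np.array([[0.75, 0.25, 0.0],
                   [0.25, 0.25, 0.0],
                   [0.0,  0.0,  0.0]])

# Case 2: rho has weight on ker(sigma), so the expression diverges.
rho_out = np.diag([0.5, 0.0, 0.5])

for eps in (1e-3, 1e-6, 1e-9):
    print(eps, d_regularized(rho_in, sigma, eps), d_regularized(rho_out, sigma, eps))
```

In the first case the printed values settle as 𝜀 shrinks, while in the second case they grow like −Tr[𝜌1,1 Π𝜎⊥] log2 𝜀, in line with (7.2.14) and (7.2.15).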

The proposition above, in particular the fact that we can write the quantum
relative entropy as 𝐷 (𝜌∥𝜎) = lim𝜀→0+ 𝐷 (𝜌∥𝜎+𝜀 1), allows us to take the logarithm
log2 (𝜎) of 𝜎 on only the support of 𝜎 when determining the quantum relative
entropy. We can use this fact to write another formula for the quantum relative
entropy. We start by writing a spectral decomposition of the state 𝜌 and positive
semi-definite operator 𝜎. In particular, setting 𝑟 𝜌 ≡ rank(𝜌) and 𝑟 𝜎 ≡ rank(𝜎), let

\rho = \sum_{j=1}^{d} p_j |\psi_j\rangle\langle\psi_j| = \sum_{j=1}^{r_\rho} p_j |\psi_j\rangle\langle\psi_j|, (7.2.17)
\sigma = \sum_{k=1}^{d} q_k |\phi_k\rangle\langle\phi_k| = \sum_{k=1}^{r_\sigma} q_k |\phi_k\rangle\langle\phi_k|, (7.2.18)

be spectral decompositions of 𝜌 and 𝜎, where 𝑑 = dim(H) and in the second equality we have restricted the sum to only those eigenvalues that are non-zero. Then,

D(\rho\|\sigma) = \sum_{j=1}^{r_\rho} p_j \log_2 p_j - \mathrm{Tr}\left[\left(\sum_{j=1}^{r_\rho} p_j |\psi_j\rangle\langle\psi_j|\right) \log_2\left(\sum_{k=1}^{r_\sigma} q_k |\phi_k\rangle\langle\phi_k|\right)\right] (7.2.19)
= \sum_{j=1}^{r_\rho} p_j \log_2 p_j - \sum_{j=1}^{r_\rho}\sum_{k=1}^{r_\sigma} |\langle\psi_j|\phi_k\rangle|^2 p_j \log_2 q_k (7.2.20)
= \sum_{j=1}^{r_\rho} \left[ p_j \log_2 p_j - \sum_{k=1}^{r_\sigma} |\langle\psi_j|\phi_k\rangle|^2 p_j \log_2 q_k \right]. (7.2.21)

Now, using the fact that the eigenvectors {|𝜙 𝑘 ⟩ : 1 ≤ 𝑘 ≤ 𝑑} form a complete orthonormal basis for H, so that \mathbb{1}_H = \sum_{k=1}^{d} |\phi_k\rangle\langle\phi_k|, we conclude that

1 = \langle\psi_j|\psi_j\rangle = \sum_{k=1}^{d} \langle\psi_j|\phi_k\rangle\langle\phi_k|\psi_j\rangle = \sum_{k=1}^{r_\sigma} |\langle\psi_j|\phi_k\rangle|^2, (7.2.22)

for 1 ≤ 𝑗 ≤ 𝑟 𝜌 , where the last equality follows from the assumption that supp(𝜌) ⊆
supp(𝜎), which implies that the eigenvectors {|𝜓 𝑗 ⟩ : 1 ≤ 𝑗 ≤ 𝑟 𝜌 } of 𝜌 can be
expressed as a linear combination of the eigenvectors {|𝜙 𝑘 ⟩ : 1 ≤ 𝑘 ≤ 𝑟 𝜎 } of 𝜎.
Therefore,

D(\rho\|\sigma) = \sum_{j=1}^{r_\rho}\sum_{k=1}^{r_\sigma} |\langle\psi_j|\phi_k\rangle|^2 p_j \log_2\frac{p_j}{q_k}, (7.2.23)

whenever supp(𝜌) ⊆ supp(𝜎).
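As a sanity check on (7.2.23), the overlap formula can be compared numerically with the trace formula Tr[𝜌(log2 𝜌 − log2 𝜎)]. The sketch below assumes NumPy and, for simplicity, full-rank inputs; the function names and the random sampling of states are illustrative assumptions, not part of the text.

```python
import numpy as np

def random_state(d, rng):
    """Random full-rank density matrix (an illustrative sampling choice)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def rel_entropy_overlaps(rho, sigma):
    """D(rho||sigma) via the overlap formula (7.2.23); full-rank inputs assumed."""
    p, psi = np.linalg.eigh(rho)      # columns of psi are the eigenvectors |psi_j>
    q, phi = np.linalg.eigh(sigma)    # columns of phi are the eigenvectors |phi_k>
    c = np.abs(psi.conj().T @ phi) ** 2          # c[j, k] = |<psi_j|phi_k>|^2
    return float(np.sum(c * p[:, None] * np.log2(p[:, None] / q[None, :])))

def rel_entropy_trace(rho, sigma):
    """D(rho||sigma) = Tr[rho (log2 rho - log2 sigma)]; full-rank inputs assumed."""
    def log2m(a):
        w, v = np.linalg.eigh(a)
        return (v * np.log2(w)) @ v.conj().T
    return float(np.trace(rho @ (log2m(rho) - log2m(sigma))).real)

rng = np.random.default_rng(0)
rho, sigma = random_state(3, rng), random_state(3, rng)
print(rel_entropy_overlaps(rho, sigma), rel_entropy_trace(rho, sigma))
```

The two printed numbers should agree up to floating-point error, since (7.2.23) is just the trace formula written out in the eigenbases of 𝜌 and 𝜎.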

We now detail some important mathematical properties of the quantum relative
entropy.

Proposition 7.3 Basic Properties of the Quantum Relative Entropy

The quantum relative entropy satisfies the following properties for all states
𝜌, 𝜌1 , 𝜌2 and positive semi-definite operators 𝜎, 𝜎1 , 𝜎2 :
1. Isometric invariance: For every isometry 𝑉,

𝐷 (𝑉 𝜌𝑉 † ∥𝑉 𝜎𝑉 † ) = 𝐷 (𝜌∥𝜎). (7.2.24)

2. (a) Klein’s inequality: If Tr(𝜎) ≤ 1, then 𝐷 (𝜌∥𝜎) ≥ 0.

(b) Faithfulness: If Tr(𝜎) ≤ 1, then 𝐷 (𝜌∥𝜎) = 0 if and only if 𝜌 = 𝜎.

(c) If 𝜌 ≤ 𝜎, then 𝐷 (𝜌∥𝜎) ≤ 0.

(d) If 𝜎 ≤ 𝜎′, then 𝐷 (𝜌∥𝜎) ≥ 𝐷 (𝜌∥𝜎′).
3. Additivity:

𝐷 (𝜌1 ⊗ 𝜌2 ∥𝜎1 ⊗ 𝜎2 ) = 𝐷 (𝜌1 ∥𝜎1 ) + 𝐷 (𝜌2 ∥𝜎2 ). (7.2.25)

As a special case, for all 𝛽 ∈ (0, ∞),

D(\rho\|\beta\sigma) = D(\rho\|\sigma) + \log_2\frac{1}{\beta}. (7.2.26)

4. Direct-sum property: Let 𝑝 : X → [0, 1] be a probability distribution

over a finite alphabet X with associated |X|-dimensional system 𝑋, and
let 𝑞 : X → [0, ∞) be a non-negative function on X. Let {𝜌 𝑥𝐴 }𝑥∈X be a set
of states on a system 𝐴, and let {𝜎𝐴𝑥 }𝑥∈X be a set of positive semi-definite
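operators on 𝐴. Then,

D(\rho_{XA}\|\sigma_{XA}) = D(p\|q) + \sum_{x\in\mathcal{X}} p(x) D(\rho_A^x\|\sigma_A^x), (7.2.27)

where

\rho_{XA} := \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho_A^x, (7.2.28)
\sigma_{XA} := \sum_{x\in\mathcal{X}} q(x) |x\rangle\langle x|_X \otimes \sigma_A^x. (7.2.29)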

Remark: If we let the first argument of the relative entropy be a general positive semi-definite operator instead of just a state, then (7.2.26) can be generalized for every 𝛼, 𝛽 ∈ (0, ∞) as

D(\alpha\rho\|\beta\sigma) = \alpha D(\rho\|\sigma) + \alpha\log_2\frac{\alpha}{\beta}. (7.2.30)

Proof:
1. Proof of isometric invariance: When supp(𝜌) ⊈ supp(𝜎), there is nothing to prove because supp(𝑉 𝜌𝑉 † ) ⊈ supp(𝑉 𝜎𝑉 † ), which means that both 𝐷 (𝑉 𝜌𝑉 † ∥𝑉 𝜎𝑉 † ) and 𝐷 (𝜌∥𝜎) are equal to +∞.

Suppose that supp(𝜌) ⊆ supp(𝜎), which implies that supp(𝑉 𝜌𝑉 † ) ⊆ supp(𝑉 𝜎𝑉 † ).
Let 𝜌 and 𝜎 have the spectral decompositions given in (7.2.17) and (7.2.18),
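respectively. Using the formula in (7.2.23), we find that

D(V\rho V^\dagger\|V\sigma V^\dagger) = \sum_{j=1}^{r_\rho}\sum_{k=1}^{r_\sigma} |\langle\psi_j|V^\dagger V|\phi_k\rangle|^2 p_j \log_2\frac{p_j}{q_k}
= \sum_{j=1}^{r_\rho}\sum_{k=1}^{r_\sigma} |\langle\psi_j|\phi_k\rangle|^2 p_j \log_2\frac{p_j}{q_k} (7.2.31)
= D(\rho\|\sigma).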

We used the fact that 𝑉 is an isometry, i.e., satisfying 𝑉 †𝑉 = 1.

2. (a) Proof of Klein’s inequality: The result is trivial in the case that supp(𝜌) ⊈
supp(𝜎), and so we assume that supp(𝜌) ⊆ supp(𝜎). We can write the
relative entropy as in (7.2.21),

D(\rho\|\sigma) = \sum_{j=1}^{d} \left[ p_j \log_2 p_j - \sum_{k=1}^{d} |\langle\psi_j|\phi_k\rangle|^2 p_j \log_2 q_k \right], (7.2.32)

where we have extended the sums to include all terms up to 𝑑 = dim(H). Now, let c_{j,k} \equiv |\langle\psi_j|\phi_k\rangle|^2, and observe that

c_{j,k} \geq 0 \ \ \forall\, 1 \leq j, k \leq d, \quad\text{and}\quad \sum_{k=1}^{d} c_{j,k} = 1 \ \ \forall\, 1 \leq j \leq d. (7.2.33)

The latter indeed holds because

\sum_{k=1}^{d} c_{j,k} = \sum_{k=1}^{d} \langle\psi_j|\phi_k\rangle\langle\phi_k|\psi_j\rangle (7.2.34)
= \langle\psi_j| \underbrace{\left(\sum_{k=1}^{d} |\phi_k\rangle\langle\phi_k|\right)}_{\mathbb{1}} |\psi_j\rangle (7.2.35)
= \langle\psi_j|\psi_j\rangle (7.2.36)
= 1. (7.2.37)

Therefore, for each 𝑗, the set {𝑐 𝑗,𝑘 : 1 ≤ 𝑘 ≤ 𝑑} constitutes a probability distribution over 𝑘. Using the concavity of the function log2 , we thus obtain

\sum_{k=1}^{d} c_{j,k} \log_2(q_k) \leq \log_2\left(\sum_{k=1}^{d} c_{j,k} q_k\right) = \log_2(r_j) (7.2.38)

for all 1 ≤ 𝑗 ≤ 𝑑, where

r_j := \sum_{k=1}^{d} c_{j,k} q_k. (7.2.39)

Therefore, we obtain

D(\rho\|\sigma) = \sum_{j=1}^{d} p_j \log_2(p_j) - \sum_{j=1}^{d} p_j \left(\sum_{k=1}^{d} c_{j,k} \log_2(q_k)\right) (7.2.40)
\geq \sum_{j=1}^{d} p_j \log_2\frac{p_j}{r_j} (7.2.41)
= -\sum_{j=1}^{d} p_j \log_2\frac{r_j}{p_j}. (7.2.42)

Now, we make use of the fact that

-\log_2(x) \geq \frac{1 - x}{\ln(2)} \quad \forall\, x > 0, (7.2.43)

with equality if and only if 𝑥 = 1. This fact can be readily verified by elementary calculus. Using this and (7.2.40)–(7.2.42), we obtain

D(\rho\|\sigma) \geq \frac{1}{\ln(2)} \sum_{j=1}^{d} p_j \left(1 - \frac{r_j}{p_j}\right) (7.2.44)
= \frac{1}{\ln(2)} \sum_{j=1}^{d} p_j - \frac{1}{\ln(2)} \sum_{j=1}^{d} r_j. (7.2.45)

Now, \sum_{j=1}^{d} p_j = \mathrm{Tr}(\rho) = 1, and since

\sum_{j=1}^{d} c_{j,k} = \langle\phi_k| \underbrace{\left(\sum_{j=1}^{d} |\psi_j\rangle\langle\psi_j|\right)}_{\mathbb{1}} |\phi_k\rangle = 1, (7.2.46)

we obtain

\sum_{j=1}^{d} r_j = \sum_{k=1}^{d} q_k = \mathrm{Tr}(\sigma). (7.2.47)

Therefore,

D(\rho\|\sigma) \geq \frac{1}{\ln(2)} (1 - \mathrm{Tr}(\sigma)) \geq 0, (7.2.48)

as required, where the last inequality holds by the assumption that Tr(𝜎) ≤ 1.
(b) We are now interested in the case of equality in the statement 𝐷 (𝜌∥𝜎) ≥ 0
that we just proved in (a). In that proof, we made use of two inequalities.
The first was in (7.2.38), where we made use of the concavity of the
logarithm. Equality holds in (7.2.38) if and only if for each 𝑗 there exists 𝑘
such that 𝑐 𝑗,𝑘 = 1. The second inequality we used was in (7.2.43), where
equality holds if and only if 𝑥 = 1. Therefore, equality holds in (7.2.44)
if and only if 𝑝 𝑗 = 𝑟 𝑗 for all 𝑗, and equality in (7.2.38) is true if and only
if the eigenvectors of 𝜌 and 𝜎 are, up to relabeling, the same. Therefore
𝐷 (𝜌∥𝜎) = 0 if and only if 𝑝 𝑗 = 𝑞 𝑗 for all 𝑗 and the corresponding
eigenvectors are (up to relabeling) equal, which is true if and only if 𝜌 = 𝜎.
(c) Suppose that both 𝜌 and 𝜎 are positive definite. Since the logarithm is operator monotone (see Section 2.2.8.1), the operator inequality 𝜌 ≤ 𝜎 implies that log2 (𝜌) ≤ log2 (𝜎). This implies the inequality \rho^{1/2} \log_2(\rho) \rho^{1/2} \leq \rho^{1/2} \log_2(\sigma) \rho^{1/2}, which implies that Tr[𝜌 log2 (𝜌)] ≤ Tr[𝜌 log2 (𝜎)], proving the result. In the case that 𝜌 and/or 𝜎 are not positive definite, we first apply the result to the positive definite state (1 − 𝛿) 𝜌 + 𝛿𝜋 and the positive definite operator 𝜎 + 𝜀 1, with 𝛿, 𝜀 > 0, so that 𝐷 ((1 − 𝛿) 𝜌 + 𝛿𝜋∥𝜎 + 𝜀 1) ≤ 0. Then, using

\lim_{\varepsilon\to 0^+} \lim_{\delta\to 0^+} D((1 - \delta)\rho + \delta\pi \| \sigma + \varepsilon\mathbb{1}) = D(\rho\|\sigma), (7.2.49)

we obtain the desired result.

(d) As in (c), first suppose that 𝜌, 𝜎, and 𝜎′ are positive definite. Since the
logarithm is operator monotone, the operator inequality 𝜎′ ≥ 𝜎 implies that
log2 (𝜎′) ≥ log2 (𝜎), which implies that Tr[𝜌 log2 (𝜎′)] ≥ Tr[𝜌 log2 𝜎].
Therefore,

𝐷 (𝜌∥𝜎) = Tr[𝜌(log2 𝜌 − log2 𝜎)] (7.2.50)


≥ Tr[𝜌(log2 𝜌 − log2 𝜎′)] (7.2.51)

= 𝐷 (𝜌∥𝜎′), (7.2.52)

as required. In the general case that the operators are not positive definite,
as in (c) we apply the result to the positive definite operators (1 − 𝛿) 𝜌 + 𝛿𝜋,
𝜎 + 𝜀 1, and 𝜎′ + 𝜀′1, for 𝛿, 𝜀, 𝜀′ > 0, and then use (7.2.49) to obtain the
result.
3. Proof of additivity: Since supp(𝜌1 ⊗ 𝜌2 ) = supp(𝜌1 ) ⊗ supp(𝜌2 ) and supp(𝜎1 ⊗
𝜎2 ) = supp(𝜎1 ) ⊗ supp(𝜎2 ), the condition supp(𝜌1 ⊗ 𝜌2 ) ⊈ supp(𝜎1 ⊗ 𝜎2 )
is equivalent to the condition supp(𝜌1 ) ⊈ supp(𝜎1 ) or supp(𝜌2 ) ⊈ supp(𝜎2 ).
Therefore, 𝐷 (𝜌1 ⊗ 𝜌2 ∥𝜎1 ⊗𝜎2 ) = +∞ and 𝐷 (𝜌1 ∥𝜎1 ) = +∞ or 𝐷 (𝜌2 ∥𝜎2 ) = +∞
if one of the support conditions is violated. Now suppose that supp(𝜌1 ⊗ 𝜌2 ) ⊆
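supp(𝜎1 ⊗ 𝜎2 ). Letting 𝜌1 and 𝜌2 have spectral decompositions

\rho_1 = \sum_{j=1}^{d} p_j^1 |\psi_j^1\rangle\langle\psi_j^1|, \quad \rho_2 = \sum_{k=1}^{d} p_k^2 |\psi_k^2\rangle\langle\psi_k^2|, (7.2.53)

we find that

\log_2(\rho_1 \otimes \rho_2) = \log_2\left(\sum_{j,k=1}^{d} p_j^1 |\psi_j^1\rangle\langle\psi_j^1| \otimes p_k^2 |\psi_k^2\rangle\langle\psi_k^2|\right) (7.2.54)
= \sum_{j,k=1}^{d} \log_2(p_j^1 p_k^2) |\psi_j^1\rangle\langle\psi_j^1| \otimes |\psi_k^2\rangle\langle\psi_k^2| (7.2.55)
= \sum_{j,k=1}^{d} \log_2(p_j^1) |\psi_j^1\rangle\langle\psi_j^1| \otimes |\psi_k^2\rangle\langle\psi_k^2| (7.2.56)
+ \sum_{j,k=1}^{d} \log_2(p_k^2) |\psi_j^1\rangle\langle\psi_j^1| \otimes |\psi_k^2\rangle\langle\psi_k^2| (7.2.57)
= \log_2(\rho_1) \otimes \mathbb{1} + \mathbb{1} \otimes \log_2(\rho_2). (7.2.58)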

Similarly, log2 (𝜎1 ⊗ 𝜎2 ) = log2 (𝜎1 ) ⊗ 1 + 1 ⊗ log2 (𝜎2 ). Therefore,

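D(\rho_1 \otimes \rho_2 \| \sigma_1 \otimes \sigma_2)
= \mathrm{Tr}[(\rho_1 \otimes \rho_2)(\log_2(\rho_1) \otimes \mathbb{1} + \mathbb{1} \otimes \log_2(\rho_2))] - \mathrm{Tr}[(\rho_1 \otimes \rho_2)(\log_2(\sigma_1) \otimes \mathbb{1} + \mathbb{1} \otimes \log_2(\sigma_2))] (7.2.59)
= \mathrm{Tr}(\rho_2) \left(\mathrm{Tr}[\rho_1 \log_2(\rho_1)] - \mathrm{Tr}[\rho_1 \log_2(\sigma_1)]\right) + \mathrm{Tr}(\rho_1) \left(\mathrm{Tr}[\rho_2 \log_2(\rho_2)] - \mathrm{Tr}[\rho_2 \log_2(\sigma_2)]\right) (7.2.60)
= D(\rho_1\|\sigma_1) + D(\rho_2\|\sigma_2). (7.2.61)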

Then, to see (7.2.26), let 𝜌1 = 𝜌, 𝜌2 = 1, 𝜎1 = 𝜎, and 𝜎2 = 𝛽. Recognizing that the tensor product with a scalar is just multiplication by the scalar, we find that

D(\rho\|\beta\sigma) = D(\rho\|\sigma) + D(1\|\beta)
= D(\rho\|\sigma) + (\log_2(1) - \log_2\beta) (7.2.62)
= D(\rho\|\sigma) + \log_2\frac{1}{\beta}.
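4. Proof of the direct-sum property: Define the classical–quantum operators

\rho_{XA} = \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho_A^x, \quad \sigma_{XA} = \sum_{x\in\mathcal{X}} q(x) |x\rangle\langle x|_X \otimes \sigma_A^x. (7.2.63)

Observe that

\log_2 \rho_{XA} = \sum_{x\in\mathcal{X}} |x\rangle\langle x|_X \otimes \log_2(p(x)\rho_A^x) (7.2.64)
= \sum_{x\in\mathcal{X}} |x\rangle\langle x|_X \otimes \log_2(p(x)) \mathbb{1}_A + \sum_{x\in\mathcal{X}} |x\rangle\langle x|_X \otimes \log_2 \rho_A^x, (7.2.65)
\log_2 \sigma_{XA} = \sum_{x\in\mathcal{X}} |x\rangle\langle x|_X \otimes \log_2(q(x)\sigma_A^x) (7.2.66)
= \sum_{x\in\mathcal{X}} |x\rangle\langle x|_X \otimes \log_2(q(x)) \mathbb{1}_A + \sum_{x\in\mathcal{X}} |x\rangle\langle x|_X \otimes \log_2 \sigma_A^x. (7.2.67)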

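Then, in the case supp(𝜌 𝑋 𝐴 ) ⊆ supp(𝜎𝑋 𝐴 ), we obtain

D(\rho_{XA}\|\sigma_{XA}) = \mathrm{Tr}[\rho_{XA} \log_2 \rho_{XA}] - \mathrm{Tr}[\rho_{XA} \log_2 \sigma_{XA}] (7.2.68)
= \mathrm{Tr}\left[\sum_{x\in\mathcal{X}} p(x) \log_2(p(x)) |x\rangle\langle x|_X \otimes \rho_A^x + \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho_A^x \log_2 \rho_A^x\right]
- \mathrm{Tr}\left[\sum_{x\in\mathcal{X}} p(x) \log_2(q(x)) |x\rangle\langle x|_X \otimes \rho_A^x + \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho_A^x \log_2 \sigma_A^x\right] (7.2.69)
= \sum_{x\in\mathcal{X}} \left[p(x) \log_2 p(x) - p(x) \log_2 q(x)\right] + \sum_{x\in\mathcal{X}} p(x) \mathrm{Tr}\left[\rho_A^x \log_2 \rho_A^x - \rho_A^x \log_2 \sigma_A^x\right] (7.2.70)
= D(p\|q) + \sum_{x\in\mathcal{X}} p(x) D(\rho_A^x\|\sigma_A^x), (7.2.71)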

as required. ■
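Several items of Proposition 7.3 can be spot-checked numerically. The following sketch assumes NumPy and uses randomly sampled full-rank qubit states; the helper names and the sampling method are illustrative assumptions rather than anything prescribed by the text. It checks Klein's inequality, additivity (7.2.25), and the scaling identity (7.2.26).

```python
import numpy as np

def random_state(d, rng):
    """Random full-rank density matrix (an illustrative sampling choice)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def log2m(a):
    """Base-2 logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log2(w)) @ v.conj().T

def rel_entropy(rho, sigma):
    """Tr[rho (log2 rho - log2 sigma)]; both arguments assumed positive definite."""
    return float(np.trace(rho @ (log2m(rho) - log2m(sigma))).real)

rng = np.random.default_rng(1)
rho1, rho2, sig1, sig2 = (random_state(2, rng) for _ in range(4))

# Klein's inequality: D >= 0 because Tr(sig1) = 1.
klein = rel_entropy(rho1, sig1)

# Additivity (7.2.25): the gap should vanish up to rounding error.
additive_gap = (rel_entropy(np.kron(rho1, rho2), np.kron(sig1, sig2))
                - rel_entropy(rho1, sig1) - rel_entropy(rho2, sig2))

# Scaling (7.2.26) with beta = 1/4: D(rho||beta*sigma) = D(rho||sigma) + log2(1/beta).
beta = 0.25
scaling_gap = rel_entropy(rho1, beta * sig1) - (rel_entropy(rho1, sig1) + np.log2(1 / beta))

print(klein, additive_gap, scaling_gap)
```

A numerical check of this kind is not a proof, but it is a quick way to catch sign or normalization mistakes when implementing the relative entropy.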

An important consequence of Klein’s inequality from the proposition above is that

𝐷 (𝜌∥𝜎) ≥ 0 for all states 𝜌, 𝜎, and 𝐷 (𝜌∥𝜎) = 0 if and only if 𝜌 = 𝜎. (7.2.72)

This allows us to use the quantum relative entropy as a distinguishability measure

for quantum states. We emphasize, however, that the quantum relative entropy
is not a metric in the mathematical sense since it is neither symmetric in its two
arguments nor does it satisfy the triangle inequality.
We now come to one of the most important properties of the quantum relative
entropy that is used frequently throughout this book: the data-processing inequality.
It is also called the monotonicity of the quantum relative entropy.

Theorem 7.4 Data-Processing Inequality for Quantum Relative Entropy

Let 𝜌 be a state, 𝜎 a positive semi-definite operator, and N a quantum channel.
Then,
𝐷 (𝜌∥𝜎) ≥ 𝐷 (N(𝜌)∥N(𝜎)). (7.2.73)

In other words, the quantum relative entropy 𝐷 (𝜌∥𝜎) can only decrease or
stay the same if we apply the same quantum channel N to the states 𝜌 and 𝜎.
When the quantum relative entropy is interpreted as a distinguishability measure on
quantum states, the data-processing inequality tells us that the distinguishability of
two quantum states cannot increase when we act on them with the same quantum
channel; see Figure 7.1. We postpone the proof of the data-processing inequality to later in the chapter, where it follows easily as a consequence of the data-processing inequality for the Petz–Rényi and sandwiched Rényi relative entropies (see Corollaries 7.25 and 7.34, respectively).

Figure 7.1: Illustration of the data-processing inequality for the quantum relative entropy (Theorem 7.4). The quantum states 𝜌, 𝜎, N(𝜌), and N(𝜎) are represented by spheres that signify the amount of “space” they occupy in the Hilbert space. While the states 𝜌 and 𝜎 are nearly distinguishable as depicted, since their spheres do not overlap, after processing with the channel N, the states become much less distinguishable because the spheres overlap significantly.
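The data-processing inequality of Theorem 7.4 can be illustrated with a concrete channel. The sketch below assumes NumPy and uses the depolarizing channel X ↦ (1 − λ)X + λ Tr[X] 1/d as the example; the choice of channel, the helper names, and the random states are assumptions made only for illustration.

```python
import numpy as np

def random_state(d, rng):
    """Random full-rank density matrix (an illustrative sampling choice)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def log2m(a):
    """Base-2 logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log2(w)) @ v.conj().T

def rel_entropy(rho, sigma):
    """Tr[rho (log2 rho - log2 sigma)]; both arguments assumed positive definite."""
    return float(np.trace(rho @ (log2m(rho) - log2m(sigma))).real)

def depolarize(x, lam):
    """Depolarizing channel: (1 - lam) * x + lam * Tr[x] * 1/d, for lam in [0, 1]."""
    d = len(x)
    return (1 - lam) * x + lam * np.trace(x) * np.eye(d) / d

rng = np.random.default_rng(2)
rho, sigma = random_state(3, rng), random_state(3, rng)

lams = (0.0, 0.25, 0.5, 0.75, 1.0)
values = [rel_entropy(depolarize(rho, lam), depolarize(sigma, lam)) for lam in lams]
for lam, val in zip(lams, values):
    print(f"lambda = {lam:.2f}:  D(N(rho)||N(sigma)) = {val:.6f}")
```

The value at λ = 0 is 𝐷 (𝜌∥𝜎) itself. Because a depolarizing channel with a larger λ can be written as a further depolarizing channel applied after one with a smaller λ, the printed values should be non-increasing in λ, and they reach zero at λ = 1, where both outputs equal the maximally mixed state.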
It is typically interesting and illuminating to investigate the conditions under
which an important inequality is saturated. The data-processing inequality for
quantum relative entropy is no exception. In the case that the action of the quantum
channel can be reversed on 𝜌 and 𝜎, so that there exists a recovery channel R
satisfying
𝜌 = (R ◦ N)(𝜌), 𝜎 = (R ◦ N)(𝜎), (7.2.74)
then it follows from an application of Theorem 7.4 that

𝐷 (N(𝜌)∥N(𝜎)) ≥ 𝐷 ((R ◦ N)(𝜌)∥(R ◦ N)(𝜎)) (7.2.75)

= 𝐷 (𝜌∥𝜎). (7.2.76)

Thus, by combining with Theorem 7.4, we in fact have that

𝐷 (𝜌∥𝜎) = 𝐷 (N(𝜌)∥N(𝜎)), (7.2.77)

so that the existence of a recovery channel implies saturation of the data-processing

inequality in Theorem 7.4.

On the other hand, suppose that 𝜌, 𝜎, and N are such that the inequality in
Theorem 7.4 is saturated:

𝐷 (𝜌∥𝜎) = 𝐷 (N(𝜌)∥N(𝜎)). (7.2.78)

Then it is a non-trivial result that there exists a recovery channel R such that the
equality in (7.2.74) holds. In fact, this channel can be taken as the Petz recovery
channel from Definition 4.21. We do not provide a proof here and instead point to
the Bibliographic Notes in Section 7.13 for more details.
One of the remarkable aspects of the data-processing inequality for the quantum
relative entropy is that it alone can be used to prove many of the properties of the
quantum relative entropy stated in Proposition 7.3. For example, Klein’s inequality
follows by considering the trace channel Tr, so that for every state 𝜌 and positive
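semi-definite operator 𝜎 such that Tr[𝜎] ≤ 1, we find that

D(\rho\|\sigma) \geq D(\mathrm{Tr}(\rho)\|\mathrm{Tr}(\sigma)) = \mathrm{Tr}(\rho) \log_2\frac{\mathrm{Tr}(\rho)}{\mathrm{Tr}(\sigma)} (7.2.79)
= \log_2\frac{1}{\mathrm{Tr}(\sigma)} \geq 0. (7.2.80)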
Isometric invariance also follows from the data-processing inequality. The inequality
𝐷 (𝜌∥𝜎) ≥ 𝐷 (𝑉 𝜌𝑉 † ∥𝑉 𝜎𝑉 † ) follows from data processing because (·) → 𝑉 (·)𝑉 †
is a channel. The reverse inequality also follows from data processing because
𝐷 (𝑉 𝜌𝑉 † ∥𝑉 𝜎𝑉 † ) ≥ 𝐷 (R𝑉 (𝑉 𝜌𝑉 † )∥R𝑉 (𝑉 𝜎𝑉 † )) = 𝐷 (𝜌∥𝜎), where R𝑉 is the
reversal channel defined in (4.4.13) and we used (4.4.17)–(4.4.20).
Another important fact that follows from the data-processing inequality for
quantum relative entropy is its joint convexity.

Proposition 7.5 Joint Convexity of Quantum Relative Entropy

Let 𝑝 : X → [0, 1] be a probability distribution over a finite alphabet X
with associated |X|-dimensional system 𝑋, let {𝜌 𝑥𝐴 }𝑥∈X be a set of states on a
system 𝐴, and let {𝜎𝐴𝑥 }𝑥∈X be a set of positive semi-definite operators on 𝐴.
Then,

\sum_{x\in\mathcal{X}} p(x) D(\rho_A^x\|\sigma_A^x) \geq D\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x \,\Big\|\, \sum_{x\in\mathcal{X}} p(x)\sigma_A^x\right). (7.2.81)

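Proof: Define the classical–quantum state and operator, respectively, as

\rho_{XA} := \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho_A^x, \quad \sigma_{XA} := \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \sigma_A^x. (7.2.82)

By the direct-sum property of the quantum relative entropy (Proposition 7.3), we find that

D(\rho_{XA}\|\sigma_{XA}) = \sum_{x\in\mathcal{X}} p(x) D(\rho_A^x\|\sigma_A^x). (7.2.83)

Then, since we have \rho_A = \mathrm{Tr}_X[\rho_{XA}] = \sum_{x\in\mathcal{X}} p(x)\rho_A^x and \sigma_A = \mathrm{Tr}_X[\sigma_{XA}] = \sum_{x\in\mathcal{X}} p(x)\sigma_A^x, we apply the data-processing inequality for quantum relative entropy with respect to the partial trace channel Tr 𝑋 to obtain

\sum_{x\in\mathcal{X}} p(x) D(\rho_A^x\|\sigma_A^x) = D(\rho_{XA}\|\sigma_{XA}) (7.2.84)
\geq D(\mathrm{Tr}_X(\rho_{XA})\|\mathrm{Tr}_X(\sigma_{XA})) (7.2.85)
= D\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x \,\Big\|\, \sum_{x\in\mathcal{X}} p(x)\sigma_A^x\right), (7.2.86)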
which is the desired joint convexity of quantum relative entropy. ■
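Joint convexity (7.2.81) is also easy to probe numerically. The sketch below assumes NumPy; the two-element ensemble, its probabilities, and the helper names are illustrative assumptions. It compares the average of the relative entropies with the relative entropy of the averages.

```python
import numpy as np

def random_state(d, rng):
    """Random full-rank density matrix (an illustrative sampling choice)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def log2m(a):
    """Base-2 logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log2(w)) @ v.conj().T

def rel_entropy(rho, sigma):
    """Tr[rho (log2 rho - log2 sigma)]; both arguments assumed positive definite."""
    return float(np.trace(rho @ (log2m(rho) - log2m(sigma))).real)

rng = np.random.default_rng(3)
p = np.array([0.3, 0.7])                       # probability distribution p(x)
rhos = [random_state(2, rng) for _ in p]       # states rho^x
sigmas = [random_state(2, rng) for _ in p]     # operators sigma^x (states here)

# Left-hand side of (7.2.81): average of the relative entropies.
lhs = sum(px * rel_entropy(r, s) for px, r, s in zip(p, rhos, sigmas))

# Right-hand side of (7.2.81): relative entropy of the averaged operators.
rho_avg = sum(px * r for px, r in zip(p, rhos))
sigma_avg = sum(px * s for px, s in zip(p, sigmas))
rhs = rel_entropy(rho_avg, sigma_avg)

print(lhs, rhs)
```

The first printed number should be at least as large as the second, reflecting the fact that discarding the classical label 𝑋 cannot increase distinguishability.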

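Exercise 7.1
Let \rho_{XA} := \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho_A^x and \sigma_{XA} := \sum_{x\in\mathcal{X}} q(x) |x\rangle\langle x|_X \otimes \sigma_A^x. Prove that

D(\rho_{XA}\|\sigma_{XA}) \geq D(p\|q) + D\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x \,\Big\|\, \sum_{x\in\mathcal{X}} p(x)\sigma_A^x\right). (7.2.87)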

7.2.1 Information Measures from Quantum Relative Entropy

As stated at the beginning of this chapter, the quantum relative entropy acts, as
in the classical case, as a parent quantity for all of the fundamental information-
theoretic quantities based on the quantum entropy. Indeed, using the properties of
the quantum relative entropy stated previously, it is straightforward to verify the
following:


1. The quantum entropy 𝐻 (𝜌) of a state 𝜌 is given by

𝐻 (𝜌) = −𝐷 (𝜌∥ 1). (7.2.88)

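2. The quantum conditional entropy 𝐻 ( 𝐴|𝐵) 𝜌 of a bipartite state 𝜌 𝐴𝐵 is given by

H(A|B)_\rho = -D(\rho_{AB}\|\mathbb{1}_A \otimes \rho_B) (7.2.89)
= -\inf_{\sigma_B \in \mathrm{D}(\mathcal{H}_B)} D(\rho_{AB}\|\mathbb{1}_A \otimes \sigma_B), (7.2.90)

and the coherent information 𝐼 ( 𝐴⟩𝐵) 𝜌 of a bipartite state 𝜌 𝐴𝐵 is given by

I(A\rangle B)_\rho = D(\rho_{AB}\|\mathbb{1}_A \otimes \rho_B) (7.2.91)
= \inf_{\sigma_B \in \mathrm{D}(\mathcal{H}_B)} D(\rho_{AB}\|\mathbb{1}_A \otimes \sigma_B). (7.2.92)

Observe that

H(A|B)_\rho = -I(A\rangle B)_\rho (7.2.93)

for every bipartite state 𝜌 𝐴𝐵 . Similarly, we can write the reverse coherent information as

I(B\rangle A)_\rho = D(\rho_{AB}\|\rho_A \otimes \mathbb{1}_B) (7.2.94)
= \inf_{\sigma_A \in \mathrm{D}(\mathcal{H}_A)} D(\rho_{AB}\|\sigma_A \otimes \mathbb{1}_B). (7.2.95)

3. The quantum mutual information 𝐼 ( 𝐴; 𝐵) 𝜌 of a bipartite state 𝜌 𝐴𝐵 is given by

I(A;B)_\rho = D(\rho_{AB}\|\rho_A \otimes \rho_B) (7.2.96)
= \inf_{\sigma_B \in \mathrm{D}(\mathcal{H}_B)} D(\rho_{AB}\|\rho_A \otimes \sigma_B) (7.2.97)
= \inf_{\tau_A \in \mathrm{D}(\mathcal{H}_A)} D(\rho_{AB}\|\tau_A \otimes \rho_B) (7.2.98)
= \inf_{\tau_A \in \mathrm{D}(\mathcal{H}_A),\, \sigma_B \in \mathrm{D}(\mathcal{H}_B)} D(\rho_{AB}\|\tau_A \otimes \sigma_B). (7.2.99)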

4. The quantum conditional mutual information 𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 of a tripartite state 𝜌 𝐴𝐵𝐶 is given by

𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 = 𝐷 (𝜌 𝐴𝐵𝐶 ∥𝜎𝐴𝐵𝐶 ), (7.2.100)

where

\sigma_{ABC} := 2^{\log_2(\rho_{AC} \otimes \mathbb{1}_B) + \log_2(\rho_{BC} \otimes \mathbb{1}_A) - \log_2(\mathbb{1}_{AB} \otimes \rho_C)}. (7.2.101)

The expressions in (7.2.90), (7.2.92), and (7.2.97), which write the conditional entropy, coherent information, and mutual information, respectively, as optimizations of the quantum relative entropy, are useful for defining information-theoretic quantities analogous to these ones in the context of generalized divergences, which we introduce in the next section.
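To make the identities (7.2.88), (7.2.89), (7.2.91), and (7.2.96) concrete, the sketch below evaluates them for a maximally entangled two-qubit state. It assumes NumPy; the helper names and the choice of example state are illustrative assumptions. The second argument of the relative entropy is positive definite in every call, so a matrix logarithm via the eigendecomposition suffices.

```python
import numpy as np

def log2m(a):
    """Base-2 logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log2(w)) @ v.conj().T

def rel_entropy(rho, sigma):
    """D(rho||sigma) in bits; rho may be rank deficient, sigma positive definite."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]                              # convention 0 log2 0 = 0
    return float(np.sum(p * np.log2(p)) - np.trace(rho @ log2m(sigma)).real)

# Maximally entangled two-qubit state (|00> + |11>)/sqrt(2).
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_ab = np.outer(phi, phi)

t = rho_ab.reshape(2, 2, 2, 2)        # tensor indices (a, b, a', b')
rho_a = np.einsum('abcb->ac', t)      # partial trace over B
rho_b = np.einsum('abad->bd', t)      # partial trace over A
eye2 = np.eye(2)

entropy_ab = -rel_entropy(rho_ab, np.eye(4))                 # H(AB), via (7.2.88)
cond_entropy = -rel_entropy(rho_ab, np.kron(eye2, rho_b))    # H(A|B), via (7.2.89)
coherent_info = rel_entropy(rho_ab, np.kron(eye2, rho_b))    # I(A>B), via (7.2.91)
mutual_info = rel_entropy(rho_ab, np.kron(rho_a, rho_b))     # I(A;B), via (7.2.96)

print(entropy_ab, cond_entropy, coherent_info, mutual_info)
```

For this state the joint entropy vanishes, the conditional entropy is −1, the coherent information is 1, and the mutual information is 2, which shows that the conditional entropy of an entangled state can be negative.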

Exercise 7.2
Verify the equalities in (7.2.90), (7.2.92), and (7.2.97). Hint: First prove that

𝐷 (𝜌 𝐴𝐵 ∥𝜏𝐴 ⊗ 𝜌 𝐵 ) + 𝐷 (𝜏𝐴 ⊗ 𝜌 𝐵 ∥𝜏𝐴 ⊗ 𝜎𝐵 ) = 𝐷 (𝜌 𝐴𝐵 ∥𝜏𝐴 ⊗ 𝜎𝐵 ), (7.2.102)

for all states 𝜏𝐴 and 𝜎𝐵 . Then use the fact that

𝐷 (𝜏𝐴 ⊗ 𝜌 𝐵 ∥𝜏𝐴 ⊗ 𝜎𝐵 ) ≥ 0 (7.2.103)

which holds for all states 𝜏𝐴 and 𝜎𝐵 , by Klein’s inequality as stated in (7.2.72).
Set 𝜏𝐴 = 𝜋 𝐴 to prove (7.2.90) and (7.2.92), and set 𝜏𝐴 = 𝜌 𝐴 to prove (7.2.97).

The properties of the quantum relative entropy, such as the ones stated in
Propositions 7.3 and 7.5, can be directly translated to properties of the derived
information measures stated above. Some of these properties are used frequently
throughout the book, and so we state them here for convenience. They are
straightforward to verify using definitions and properties of the quantum relative
entropy.
• Additivity of the quantum entropy for product states 𝜌 and 𝜏:
𝐻 (𝜌 ⊗ 𝜏) = 𝐻 (𝜌) + 𝐻 (𝜏). (7.2.104)

• Isometric invariance of the quantum entropy for a state 𝜌 and an isometry 𝑉:

𝐻 (𝜌) = 𝐻 (𝑉 𝜌𝑉 † ). (7.2.105)

• Concavity of the quantum entropy: The joint convexity of the quantum relative entropy, as stated in Proposition 7.5, and the identity in (7.2.88) imply that the quantum entropy is concave in its input: if 𝑝 : X → [0, 1] is a probability distribution over a finite alphabet X and {𝜌 𝑥𝐴 }𝑥∈X is a set of states on a system 𝐴, then

H\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x\right) \geq \sum_{x\in\mathcal{X}} p(x) H(\rho_A^x). (7.2.106)

By taking a spectral decomposition of a state 𝜌, applying the concavity inequality above, and using the fact that the entropy of a pure state is equal to zero, we conclude that the quantum entropy is non-negative for every state 𝜌:

𝐻 (𝜌) ≥ 0. (7.2.107)

By employing the mixing property of the Heisenberg–Weyl unitaries from (3.2.98), the invariance of entropy under a unitary, its concavity, and the fact that the entropy of the maximally mixed state 𝜋 𝐴 is equal to log2 𝑑 𝐴 , we conclude the following dimension bound for the entropy of a state of system 𝐴:

𝐻 (𝜌) ≤ log2 𝑑 𝐴 . (7.2.108)

• Direct-sum property of the quantum entropy: The direct-sum property of the quantum relative entropy (see Proposition 7.3) translates to the following for the quantum entropy: If 𝑝 : X → [0, 1] is a probability distribution over a finite alphabet X with associated |X|-dimensional system 𝑋 and {𝜌 𝑥𝐴 }𝑥∈X is a set of states on a system 𝐴, then

H\left(\sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho_A^x\right) = H(p) + \sum_{x\in\mathcal{X}} p(x) H(\rho_A^x), (7.2.109)

where H(p) := -\sum_{x\in\mathcal{X}} p(x) \log_2 p(x) is the (classical) Shannon entropy of the probability distribution 𝑝.
• Chain rule for conditional entropy: For every state 𝜌 𝐴𝐵𝐶 , the following equality
holds
𝐻 ( 𝐴𝐵|𝐶) 𝜌 = 𝐻 ( 𝐴|𝐶) 𝜌 + 𝐻 (𝐵| 𝐴𝐶) 𝜌 . (7.2.110)
If the system 𝐶 is trivial, so that the state is a bipartite state 𝜌 𝐴𝐵 , then this
equality reduces to the following:
𝐻 ( 𝐴𝐵) 𝜌 = 𝐻 ( 𝐴) 𝜌 + 𝐻 (𝐵| 𝐴) 𝜌 . (7.2.111)

• Chain rule for quantum mutual information: For every state 𝜌 𝐴𝐵𝐶 , the following
equality holds
𝐼 ( 𝐴; 𝐵𝐶) 𝜌 = 𝐼 ( 𝐴; 𝐵) 𝜌 + 𝐼 ( 𝐴; 𝐶 |𝐵) 𝜌 . (7.2.112)
We call this the chain rule because it can be interpreted as saying that the
correlations between 𝐴 and 𝐵𝐶 can be built up by first establishing correlations
between 𝐴 and 𝐵 (signified by 𝐼 ( 𝐴; 𝐵) 𝜌 ), then establishing correlations between
𝐴 and 𝐶, given the correlations with 𝐵 (signified by 𝐼 ( 𝐴; 𝐶 |𝐵) 𝜌 ).


Exercise 7.3
Provide explicit proofs of the properties in (7.2.104)–(7.2.109), by following
what is stated above.

Exercise 7.4
Verify the chain rules stated in (7.2.110) and (7.2.112). More generally, prove
that 𝐼 ( 𝐴; 𝐵𝐶 |𝐷) 𝜌 = 𝐼 ( 𝐴; 𝐵|𝐷) 𝜌 + 𝐼 ( 𝐴; 𝐶 |𝐵𝐷) 𝜌 for a four-party state 𝜌 𝐴𝐵𝐶𝐷 .

7.2.2 Quantum Conditional Mutual Information

We now develop in detail some properties of the quantum conditional mutual

information that we use later in the book.
We start with the proof of the strong subadditivity property of the quantum
entropy, which we recall from (7.1.11) is the statement that

𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 = 𝐻 ( 𝐴|𝐶) 𝜌 + 𝐻 (𝐵|𝐶) 𝜌 − 𝐻 ( 𝐴𝐵|𝐶) 𝜌 ≥ 0 (7.2.113)

for every state 𝜌 𝐴𝐵𝐶 .

Theorem 7.6 Strong Subadditivity of Quantum Entropy

For every state 𝜌 𝐴𝐵𝐶 , the following inequality holds

𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 ≥ 0. (7.2.114)

Proof: One way to prove this result is by means of the data-processing inequality for the quantum relative entropy, along with the expression for the quantum conditional entropy in (7.2.89). We start by using the definition of the quantum conditional entropy to rewrite the quantum conditional mutual information defined in (7.1.11) as

𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 = 𝐻 ( 𝐴|𝐶) 𝜌 + 𝐻 (𝐵|𝐶) 𝜌 − 𝐻 ( 𝐴𝐵|𝐶) 𝜌 (7.2.115)
= 𝐻 (𝐵|𝐶) 𝜌 − 𝐻 (𝐵| 𝐴𝐶) 𝜌 . (7.2.116)

Then, using the expression in (7.2.89) for the quantum conditional entropy in terms of the quantum relative entropy, we find that

𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 = 𝐷 (𝜌 𝐴𝐵𝐶 ∥ 1𝐵 ⊗ 𝜌 𝐴𝐶 ) − 𝐷 (𝜌 𝐵𝐶 ∥ 1𝐵 ⊗ 𝜌𝐶 ). (7.2.117)

Finally, observe that, by the data-processing inequality for the quantum relative entropy with respect to the partial trace channel Tr 𝐴 , we obtain

𝐷 (𝜌 𝐴𝐵𝐶 ∥ 1𝐵 ⊗ 𝜌 𝐴𝐶 ) ≥ 𝐷 (Tr 𝐴 [𝜌 𝐴𝐵𝐶 ] ∥ 1𝐵 ⊗ Tr 𝐴 [𝜌 𝐴𝐶 ]) (7.2.118)
= 𝐷 (𝜌 𝐵𝐶 ∥ 1𝐵 ⊗ 𝜌𝐶 ), (7.2.119)

which implies that 𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 ≥ 0, as required. ■
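Strong subadditivity can be checked numerically on small systems by computing the conditional mutual information from marginal entropies, 𝐼 ( 𝐴; 𝐵|𝐶) = 𝐻 ( 𝐴𝐶) + 𝐻 (𝐵𝐶) − 𝐻 (𝐶) − 𝐻 ( 𝐴𝐵𝐶). The following sketch assumes NumPy and three qubits; the helper names, the random sampling, and the GHZ example are illustrative assumptions.

```python
import numpy as np

def random_state(d, rng):
    """Random full-rank density matrix (an illustrative sampling choice)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def entropy(rho):
    """Quantum entropy H(rho) in bits, with the convention 0 log2 0 = 0."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def cond_mutual_info(rho_abc):
    """I(A;B|C) = H(AC) + H(BC) - H(C) - H(ABC) for a three-qubit state."""
    t = rho_abc.reshape((2,) * 6)                          # indices (a, b, c, a', b', c')
    rho_ac = np.einsum('abcdbf->acdf', t).reshape(4, 4)    # trace out B
    rho_bc = np.einsum('abcaef->bcef', t).reshape(4, 4)    # trace out A
    rho_c = np.einsum('abcabf->cf', t)                     # trace out A and B
    return entropy(rho_ac) + entropy(rho_bc) - entropy(rho_c) - entropy(rho_abc)

# Random three-qubit states: the values should never be negative.
samples = [cond_mutual_info(random_state(8, np.random.default_rng(s))) for s in range(5)]
print(samples)

# GHZ state (|000> + |111>)/sqrt(2): H(AC) = H(BC) = H(C) = 1 and H(ABC) = 0.
ghz = np.zeros(8)
ghz[0] = ghz[7] = 1 / np.sqrt(2)
cmi_ghz = cond_mutual_info(np.outer(ghz, ghz))
print(cmi_ghz)
```

For the GHZ state the formula gives 1 + 1 − 1 − 0 = 1, so the inequality is satisfied with room to spare.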

Two direct consequences of strong subadditivity are that the conditional entropy is concave and that it is non-negative for every separable state. We detail these properties below.

Corollary 7.7 Concavity of Conditional Entropy

The conditional entropy is concave, and the coherent information is convex:

H(A|B)_{\bar\rho} \geq \sum_{x\in\mathcal{X}} p(x) H(A|B)_{\rho^x}, (7.2.120)
I(A\rangle B)_{\bar\rho} \leq \sum_{x\in\mathcal{X}} p(x) I(A\rangle B)_{\rho^x}, (7.2.121)

where X is a finite alphabet, 𝑝 : X → [0, 1] is a probability distribution, {𝜌 𝑥𝐴𝐵 }𝑥∈X is a set of bipartite states, and \bar\rho := \sum_{x\in\mathcal{X}} p(x)\rho_{AB}^x.

Proof: This follows directly by constructing the following classical–quantum state 𝜌 𝑋 𝐴𝐵 :

\rho_{XAB} = \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x| \otimes \rho_{AB}^x, (7.2.122)

applying strong subadditivity 𝐻 ( 𝐴|𝐵) 𝜌 − 𝐻 ( 𝐴|𝐵𝑋) 𝜌 = 𝐼 ( 𝐴; 𝑋 |𝐵) 𝜌 ≥ 0, and observing that

H(A|BX)_\rho = \sum_{x\in\mathcal{X}} p(x) H(A|B)_{\rho^x}. (7.2.123)

By applying (7.2.93), we conclude from (7.2.120) that coherent information is convex. ■


Corollary 7.8 Non-Negativity of Conditional Entropy on Separable States

The conditional entropy 𝐻 ( 𝐴|𝐵)𝜎 is non-negative for every separable state 𝜎𝐴𝐵 .

Proof: Recall from Definition 3.5 that 𝜎𝐴𝐵 is separable if it can be written as

\sigma_{AB} = \sum_{x\in\mathcal{X}} p(x)\, \tau_A^x \otimes \omega_B^x, (7.2.124)

where 𝑝 : X → [0, 1] is a probability distribution over a finite alphabet X and {𝜏𝐴𝑥 }𝑥∈X and {𝜔𝑥𝐵 }𝑥∈X are sets of states. Defining the extension 𝜎𝑋 𝐴𝐵 of 𝜎𝐴𝐵 as

\sigma_{XAB} := \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \tau_A^x \otimes \omega_B^x, (7.2.125)

we then conclude that 𝐼 ( 𝐴; 𝑋 |𝐵)𝜎 ≥ 0, which implies the desired inequality:

H(A|B)_\sigma \geq H(A|BX)_\sigma (7.2.126)
= \sum_{x\in\mathcal{X}} p(x) H(A|B)_{\tau^x \otimes \omega^x} (7.2.127)
= \sum_{x\in\mathcal{X}} p(x) H(A)_{\tau^x} \geq 0. (7.2.128)

This concludes the proof. ■

We now prove some other properties of the quantum conditional mutual

information that recur throughout the book.

Proposition 7.9 Properties of Quantum Conditional Mutual Information

The quantum conditional mutual information has the following properties:
1. Symmetry: For every state 𝜌 𝐴𝐵𝐶 , we have 𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 = 𝐼 (𝐵; 𝐴|𝐶) 𝜌 .
2. Local entropy and dimension bounds: For every state 𝜌 𝐴𝐵𝐶 ,

𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 ≤ 2 min{𝐻 ( 𝐴) 𝜌 , 𝐻 (𝐵) 𝜌 } ≤ 2 log2 (min{𝑑 𝐴 , 𝑑 𝐵 }). (7.2.129)


Let 𝜎𝑋 𝐵𝐶 be a classical–quantum state of the form

\sigma_{XBC} = \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho_{BC}^x, (7.2.130)

where 𝑝 : X → [0, 1] is a probability distribution over a finite alphabet X with associated |X|-dimensional system 𝑋, and {𝜌 𝑥𝐵𝐶 }𝑥∈X is a set of states. Then the following bounds hold

𝐼 (𝑋; 𝐵|𝐶)𝜎 ≤ 𝐻 (𝑋)𝜎 ≤ log2 |X|. (7.2.131)

3. Product conditioning system: For a state 𝜌 𝐴𝐵𝐶 = 𝜎𝐴𝐵 ⊗ 𝜏𝐶 ,

𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 = 𝐼 ( 𝐴; 𝐵)𝜎 . (7.2.132)

4. Additivity: For states 𝜌 𝐴1 𝐵1𝐶1 and 𝜏𝐴2 𝐵2𝐶2 , the following equality holds for
the product state 𝜌 𝐴1 𝐵1𝐶1 ⊗ 𝜏𝐴2 𝐵2𝐶2 :

𝐼 ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 |𝐶1𝐶2 ) 𝜌⊗𝜏 = 𝐼 ( 𝐴1 ; 𝐵1 |𝐶1 ) 𝜌 + 𝐼 ( 𝐴2 ; 𝐵2 |𝐶2 )𝜏 . (7.2.133)

5. Direct-sum property: For the classical–quantum state

\sigma_{XABC} = \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho_{ABC}^x (7.2.134)

the following equality holds

I(A;B|CX)_\sigma = \sum_{x\in\mathcal{X}} p(x) I(A;B|C)_{\rho^x}. (7.2.135)

6. Chain rule: For every state 𝜌 𝐴𝐵1 𝐵2𝐶 ,

𝐼 ( 𝐴; 𝐵1 𝐵2 |𝐶) 𝜌 = 𝐼 ( 𝐴; 𝐵1 |𝐶) 𝜌 + 𝐼 ( 𝐴; 𝐵2 |𝐵1𝐶) 𝜌 (7.2.136)

7. Data-processing inequality for local channels: For every state 𝜌 𝐴𝐵𝐶 and
all local channels N 𝐴→𝐴′ and M𝐵→𝐵′ , the following inequality holds

𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 ≥ 𝐼 ( 𝐴′; 𝐵′ |𝐶)𝜔 , (7.2.137)

where 𝜔 𝐴′ 𝐵′𝐶 := (N 𝐴→𝐴′ ⊗ M𝐵→𝐵′ )(𝜌 𝐴𝐵𝐶 ).


Proof: We establish the various properties one by one.

1. Symmetry under exchange of 𝐴 and 𝐵 follows immediately from the definition.
2. Using the definition of conditional entropy, we can write 𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 as

𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 = 𝐻 ( 𝐴𝐶) 𝜌 − 𝐻 (𝐶) 𝜌 + 𝐻 (𝐵𝐶) 𝜌 − 𝐻 ( 𝐴𝐵𝐶) 𝜌 (7.2.138)

= 𝐻 ( 𝐴|𝐶) 𝜌 − 𝐻 ( 𝐴|𝐵𝐶) 𝜌 . (7.2.139)

The inequality 𝐻 ( 𝐴|𝐶) 𝜌 ≤ 𝐻 ( 𝐴) 𝜌 is a direct consequence of strong subaddi-

tivity, under a relabeling and with trivial conditioning system. Then, by the
dimension bound from (7.2.108), it follows that 𝐻 ( 𝐴) 𝜌 ≤ log2 𝑑 𝐴 . Therefore,

𝐻 ( 𝐴|𝐶) 𝜌 ≤ log2 𝑑 𝐴 . (7.2.140)

Now, let |𝜓⟩ 𝐴𝐵𝐶𝐸 be a purification of 𝜌 𝐴𝐵𝐶 . Then, since 𝜌 𝐴𝐵𝐶 and 𝜓 𝐸 have
the same spectrum, and since 𝜌 𝐵𝐶 and 𝜓 𝐴𝐸 have the same spectrum, we obtain

𝐻 ( 𝐴|𝐵𝐶) 𝜌 = 𝐻 ( 𝐴𝐵𝐶) 𝜌 − 𝐻 (𝐵𝐶) 𝜌 (7.2.141)

= 𝐻 (𝐸)𝜓 − 𝐻 ( 𝐴𝐸)𝜓 (7.2.142)
= −𝐻 ( 𝐴|𝐸)𝜓 (7.2.143)
≥ −𝐻 ( 𝐴) 𝜌 (7.2.144)
≥ − log2 𝑑 𝐴 , (7.2.145)

where the first inequality follows from (7.2.140), and the second inequality, as
before, from the fact that 𝐻 ( 𝐴) 𝜌 ≤ log2 𝑑 𝐴 for every state 𝜌. Therefore,

𝐻 ( 𝐴|𝐵𝐶) 𝜌 ≥ −𝐻 ( 𝐴) 𝜌 ≥ − log2 𝑑 𝐴 , (7.2.146)

which means that 𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 = 𝐻 ( 𝐴|𝐶) 𝜌 −𝐻 ( 𝐴|𝐵𝐶) 𝜌 ≤ 2𝐻 ( 𝐴) 𝜌 ≤ 2 log2 𝑑 𝐴 .

By applying symmetry under exchange of 𝐴 and 𝐵 and the same argument,
we conclude that 𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 ≤ 2𝐻 (𝐵) 𝜌 ≤ 2 log2 𝑑 𝐵 . Thus, we conclude
(7.2.129).
If the system 𝐴 is classical (as in (7.2.130)), then the state is separable with
respect to the bipartite cut 𝐴|𝐵𝐶. As such, the lower bound in (7.2.146)
improves to 𝐻 ( 𝐴|𝐵𝐶) 𝜌 ≥ 0 (as a consequence of Corollary 7.8), implying that
the upper bound improves to 𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 ≤ 𝐻 ( 𝐴) 𝜌 ≤ log2 𝑑 𝐴 in this case.
3. For 𝜌 𝐴𝐵𝐶 = 𝜎𝐴𝐵 ⊗ 𝜏𝐶 , we have

𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 = 𝐻 ( 𝐴|𝐶)𝜎⊗𝜏 + 𝐻 (𝐵|𝐶)𝜎⊗𝜏 − 𝐻 ( 𝐴𝐵|𝐶)𝜎⊗𝜏 . (7.2.147)


Now, we use the fact that

𝐻 ( 𝐴|𝐶)𝜎⊗𝜏 = 𝐻 ( 𝐴)𝜎 , (7.2.148)

which follows from (7.2.104). This means that 𝐻 (𝐵|𝐶)𝜎⊗𝜏 = 𝐻 (𝐵)𝜎 and
𝐻 ( 𝐴𝐵|𝐶)𝜎⊗𝜏 = 𝐻 ( 𝐴𝐵)𝜎 , and we find that

𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 = 𝐻 ( 𝐴)𝜎 + 𝐻 (𝐵)𝜎 − 𝐻 ( 𝐴𝐵)𝜎 = 𝐼 ( 𝐴; 𝐵)𝜎 . (7.2.149)

4. By writing

𝐼 ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 |𝐶1𝐶2 ) 𝜌⊗𝜏 = 𝐻 ( 𝐴1 𝐴2 |𝐶1𝐶2 ) 𝜌⊗𝜏 + 𝐻 (𝐵1 𝐵2 |𝐶1𝐶2 ) 𝜌⊗𝜏

− 𝐻 ( 𝐴1 𝐴2 𝐵1 𝐵2 |𝐶1𝐶2 ) 𝜌⊗𝜏 , (7.2.150)

and recognizing that each conditional entropy on the right-hand side is evaluated
on a product state, we can use (7.2.104) to obtain the desired result. As an
example, we evaluate the first term on the right-hand side:

𝐻 ( 𝐴1 𝐴2 |𝐶1𝐶2 ) 𝜌⊗𝜏
= 𝐻 ( 𝐴1 𝐴2𝐶1𝐶2 ) 𝜌⊗𝜏 − 𝐻 (𝐶1𝐶2 ) 𝜌⊗𝜏 (7.2.151)
= 𝐻 ( 𝐴1𝐶1 ) 𝜌 + 𝐻 ( 𝐴2𝐶2 )𝜏 − 𝐻 (𝐶1 ) 𝜌 − 𝐻 (𝐶2 )𝜏 (7.2.152)
= 𝐻 ( 𝐴1 |𝐶1 ) 𝜌 + 𝐻 ( 𝐴2 |𝐶2 )𝜏 . (7.2.153)

5. By definition, we have that

𝐼 ( 𝐴; 𝐵|𝐶 𝑋)𝜎 = 𝐻 ( 𝐴|𝐶 𝑋)𝜎 + 𝐻 (𝐵|𝐶 𝑋)𝜎 − 𝐻 ( 𝐴𝐵|𝐶 𝑋)𝜎 . (7.2.154)

Let us consider the first term on the right-hand side. Using the definition of the
quantum conditional entropy and using the direct-sum property of the quantum
entropy, as stated in (7.2.109), we obtain

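H(A|CX)_\sigma
= H(ACX)_\sigma - H(CX)_\sigma (7.2.155)
= H(p) + \sum_{x\in\mathcal{X}} p(x) H(AC)_{\rho^x} - H(p) - \sum_{x\in\mathcal{X}} p(x) H(C)_{\rho^x} (7.2.156)
= \sum_{x\in\mathcal{X}} p(x) \left[H(AC)_{\rho^x} - H(C)_{\rho^x}\right] (7.2.157)
= \sum_{x\in\mathcal{X}} p(x) H(A|C)_{\rho^x}. (7.2.158)

The same reasoning applies to the other two terms on the right-hand side of (7.2.154), and combining the three expressions gives (7.2.135).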


6. This follows straightforwardly by using the definition of the quantum conditional

mutual information to expand both sides of (7.2.136) to confirm that they are
equal to each other.
7. Let us first prove the following inequality,
𝐼 ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 |𝐶) 𝜌 ≥ 𝐼 ( 𝐴1 ; 𝐵1 |𝐶) 𝜌 (7.2.160)
for every state 𝜌 𝐴1 𝐴2 𝐵1 𝐵2𝐶 . This inequality implies that discarding parts
of the non-conditioning systems never increases the quantum conditional
mutual information. Applying the chain rule in (7.2.136), along with strong
subadditivity, we find that
𝐼 ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 |𝐶) 𝜌 = 𝐼 ( 𝐴1 𝐴2 ; 𝐵1 |𝐶) 𝜌 + 𝐼 ( 𝐴1 𝐴2 ; 𝐵2 |𝐵1𝐶) 𝜌 (7.2.161)
≥ 𝐼 ( 𝐴1 𝐴2 ; 𝐵1 |𝐶) 𝜌 , (7.2.162)
where the last line follows from strong subadditivity, which implies that
𝐼 ( 𝐴1 𝐴2 ; 𝐵2 |𝐵1𝐶) 𝜌 ≥ 0. Applying the chain rule in (7.2.136) and strong
subadditivity once again, we conclude that
𝐼 ( 𝐴1 𝐴2 ; 𝐵1 |𝐶) 𝜌 = 𝐼 ( 𝐴1 ; 𝐵1 |𝐶) 𝜌 + 𝐼 ( 𝐴2 ; 𝐵1 | 𝐴1𝐶) 𝜌 (7.2.163)
≥ 𝐼 ( 𝐴1 ; 𝐵1 |𝐶) 𝜌 . (7.2.164)
So we have 𝐼 ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 |𝐶) 𝜌 ≥ 𝐼 ( 𝐴1 ; 𝐵1 |𝐶) 𝜌 .
Now, let 𝑉𝐴→𝐴′ 𝐸1 and 𝑊𝐵→𝐵′ 𝐸2 be isometric extensions of the channels N 𝐴→𝐴′
and M𝐵→𝐵′ , respectively, with appropriate environment systems 𝐸 1 and 𝐸 2 .
Then, by (7.2.160) and the isometric invariance of the quantum entropy, we
find that
𝐼 ( 𝐴′; 𝐵′ |𝐶)𝜔 ≤ 𝐼 ( 𝐴′ 𝐸 1 ; 𝐵′ 𝐸 2 |𝐶) (𝑉 ⊗𝑊) 𝜌(𝑉 ⊗𝑊) † (7.2.165)
= 𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 , (7.2.166)
as required. ■

Proposition 7.10 Uniform Continuity of Conditional Mutual Information

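Let $\rho_{ABC}$ and $\sigma_{ABC}$ be tripartite quantum states such that
\[ \frac{1}{2} \left\| \rho_{ABC} - \sigma_{ABC} \right\|_1 \leq \varepsilon, \tag{7.2.167} \]
for $\varepsilon \in [0, 1]$. Then the following bound applies to their conditional mutual
informations:

\[ \left| I(A;B|C)_\rho - I(A;B|C)_\sigma \right| \leq 2\varepsilon \log_2(\min\{d_A, d_B\}) + 2 g_2(\varepsilon), \tag{7.2.168} \]

where $g_2(\varepsilon) := (\varepsilon + 1) \log_2(\varepsilon + 1) - \varepsilon \log_2 \varepsilon$. If the system $A$ is classical (so
that both $\rho_{ABC}$ and $\sigma_{ABC}$ are classical–quantum–quantum states of the form in
(7.2.130)), then the following bound holds:

\[ \left| I(A;B|C)_\rho - I(A;B|C)_\sigma \right| \leq \varepsilon \log_2 d_A + 2 g_2(\varepsilon). \tag{7.2.169} \]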

Remark: See Section 2.3 for a definition of uniform continuity.

Proof: Suppose without loss of generality that 𝜀 > 0 (otherwise the statement
trivially holds). Let 𝜔𝜆𝐴𝐵𝐶 B 𝜆𝜌 𝐴𝐵𝐶 + (1 − 𝜆) 𝜎𝐴𝐵𝐶 for 𝜆 ∈ [0, 1]. Then the
following inequality holds
𝜆𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 + (1 − 𝜆) 𝐼 ( 𝐴; 𝐵|𝐶)𝜎 ≤ 𝐼 ( 𝐴; 𝐵|𝐶)𝜔𝜆 + ℎ2 (𝜆), (7.2.170)
because for the classical–quantum state
𝜔𝜆𝐴𝐵𝐶 𝑋 B 𝜆𝜌 𝐴𝐵𝐶 ⊗ |0⟩⟨0| 𝑋 + (1 − 𝜆) 𝜎𝐴𝐵𝐶 ⊗ |1⟩⟨1| 𝑋 , (7.2.171)
we have that
𝜆𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 + (1 − 𝜆) 𝐼 ( 𝐴; 𝐵|𝐶)𝜎 = 𝐼 ( 𝐴; 𝐵|𝐶 𝑋)𝜔𝜆 (7.2.172)
≤ 𝐼 ( 𝐴𝑋; 𝐵|𝐶)𝜔𝜆 (7.2.173)
= 𝐼 ( 𝐴; 𝐵|𝐶)𝜔𝜆 + 𝐼 (𝑋; 𝐵|𝐶 𝐴)𝜔𝜆 (7.2.174)
≤ 𝐼 ( 𝐴; 𝐵|𝐶)𝜔𝜆 + 𝐻 (𝑋)𝜔𝜆 (7.2.175)
= 𝐼 ( 𝐴; 𝐵|𝐶)𝜔𝜆 + ℎ2 (𝜆). (7.2.176)
The first equality follows from (7.2.135). The first inequality follows from the chain
rule and strong subadditivity. The second equality follows from the chain rule. The
second inequality follows from the local entropy bound in (7.2.131). We also have
that
𝐼 ( 𝐴; 𝐵|𝐶)𝜔𝜆 ≤ 𝐼 ( 𝐴𝑋; 𝐵|𝐶)𝜔𝜆 (7.2.177)
= 𝐼 (𝑋; 𝐵|𝐶)𝜔𝜆 + 𝐼 ( 𝐴; 𝐵|𝐶 𝑋)𝜔𝜆 (7.2.178)
≤ ℎ2 (𝜆) + 𝜆𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 + (1 − 𝜆) 𝐼 ( 𝐴; 𝐵|𝐶)𝜎 , (7.2.179)
which together imply that
\[ \left| \lambda I(A;B|C)_\rho + (1-\lambda) I(A;B|C)_\sigma - I(A;B|C)_{\omega^\lambda} \right| \leq h_2(\lambda). \tag{7.2.180} \]
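Then consider the state
\[ \zeta_{ABC} := \frac{1}{1+\varepsilon} \left( \rho_{ABC} + [\sigma_{ABC} - \rho_{ABC}]_+ \right), \tag{7.2.181} \]
where $[\cdot]_+$ denotes the positive part of an operator, and for this choice, we have that
\begin{align}
\frac{1}{1+\varepsilon} \rho_{ABC} + \frac{\varepsilon}{1+\varepsilon} \xi^1_{ABC} &= \zeta_{ABC} \tag{7.2.182} \\
&= \frac{1}{1+\varepsilon} \sigma_{ABC} + \frac{\varepsilon}{1+\varepsilon} \xi^2_{ABC}, \tag{7.2.183}
\end{align}
analogous to the approach of Thales of Miletus, where the states $\xi^1_{ABC}$ and $\xi^2_{ABC}$ are
defined as
\begin{align}
\xi^1_{ABC} &:= \frac{1}{\varepsilon} [\sigma_{ABC} - \rho_{ABC}]_+ , \tag{7.2.184} \\
\xi^2_{ABC} &:= \frac{1}{\varepsilon} \left( (1+\varepsilon) \zeta_{ABC} - \sigma_{ABC} \right). \tag{7.2.185}
\end{align}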
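Applying (7.2.180) to the convex decompositions above, we find that
\begin{align}
\frac{1}{1+\varepsilon} \left[ I(A;B|C)_\rho - I(A;B|C)_\sigma \right]
&\leq \frac{\varepsilon}{1+\varepsilon} \left[ I(A;B|C)_{\xi^2} - I(A;B|C)_{\xi^1} \right] + 2 h_2\!\left( \frac{\varepsilon}{1+\varepsilon} \right) \tag{7.2.186} \\
&\leq \frac{\varepsilon}{1+\varepsilon} I(A;B|C)_{\xi^2} + 2 h_2\!\left( \frac{\varepsilon}{1+\varepsilon} \right) \tag{7.2.187} \\
&\leq \frac{\varepsilon}{1+\varepsilon} \, 2 \log_2(\min\{d_A, d_B\}) + 2 h_2\!\left( \frac{\varepsilon}{1+\varepsilon} \right), \tag{7.2.188}
\end{align}
where the last line follows from the dimension bound in (7.2.129). Multiplying
through by $1 + \varepsilon$ and using the fact that $g_2(\varepsilon) = (1+\varepsilon) \, h_2\!\left( \frac{\varepsilon}{1+\varepsilon} \right)$, we conclude that
\[ I(A;B|C)_\rho - I(A;B|C)_\sigma \leq 2\varepsilon \log_2(\min\{d_A, d_B\}) + 2 g_2(\varepsilon). \tag{7.2.189} \]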

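To arrive at the other inequality, we again apply (7.2.180) to the convex
decompositions above and find that
\[ \frac{1}{1+\varepsilon} \left[ I(A;B|C)_\sigma - I(A;B|C)_\rho \right] \leq \frac{\varepsilon}{1+\varepsilon} \left[ I(A;B|C)_{\xi^1} - I(A;B|C)_{\xi^2} \right] + 2 h_2\!\left( \frac{\varepsilon}{1+\varepsilon} \right). \tag{7.2.190} \]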
Then we apply the same reasoning as above to find that
𝐼 ( 𝐴; 𝐵|𝐶)𝜎 − 𝐼 ( 𝐴; 𝐵|𝐶) 𝜌 ≤ 2𝜀 log2 (min{𝑑 𝐴 , 𝑑 𝐵 }) + 2𝑔2 (𝜀). (7.2.191)

The inequality in (7.2.169) follows from the same proof, but applying the observation
that the 𝐴 systems of the states 𝜉 1𝐴𝐵𝐶 and 𝜉 2𝐴𝐵𝐶 in (7.2.184)–(7.2.185) are classical
when 𝜌 𝐴𝐵𝐶 and 𝜎𝐴𝐵𝐶 are classical on 𝐴. Here, we also apply the dimension bound
in (7.2.130) at the step (7.2.188) above. ■
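
The identity $g_2(\varepsilon) = (1+\varepsilon)\, h_2(\varepsilon/(1+\varepsilon))$ used in the last step of the proof can be checked numerically. The following is a minimal sketch in plain Python; the function names `h2`, `g2`, and `cmi_continuity_bound` are illustrative choices and not notation from the text.

```python
import math

def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def g2(eps):
    """g2(eps) = (eps + 1) log2(eps + 1) - eps log2(eps), with g2(0) = 0."""
    if eps == 0.0:
        return 0.0
    return (eps + 1) * math.log2(eps + 1) - eps * math.log2(eps)

def cmi_continuity_bound(eps, d_A, d_B):
    """Right-hand side of (7.2.168) for trace distance at most eps."""
    return 2 * eps * math.log2(min(d_A, d_B)) + 2 * g2(eps)

# Check g2(eps) = (1 + eps) * h2(eps / (1 + eps)) on a few sample points.
for eps in (0.01, 0.1, 0.5, 1.0):
    assert math.isclose(g2(eps), (1 + eps) * h2(eps / (1 + eps)))

bound_small = cmi_continuity_bound(0.01, 2, 2)
bound_large = cmi_continuity_bound(0.5, 2, 2)
```

The bound vanishes as $\varepsilon \to 0$, which is the content of uniform continuity: the modulus depends only on $\varepsilon$ and the smaller of the two dimensions, not on the states themselves.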

7.2.3 Quantum Mutual Information

The quantum mutual information of a bipartite state 𝜌 𝐴𝐵 is a measure of all

correlations between the 𝐴 and 𝐵 systems. We defined it previously in (7.1.8) and
(7.2.96), and we recall its definition here:
𝐼 ( 𝐴; 𝐵) 𝜌 B 𝐻 ( 𝐴) 𝜌 + 𝐻 (𝐵) 𝜌 − 𝐻 ( 𝐴𝐵) 𝜌 . (7.2.192)

It can be understood as a special case of the conditional mutual information in

which the 𝐶 system is trivial (i.e., formally, a one-dimensional system that is in
tensor product with 𝜌 𝐴𝐵 and equal to the number one). As such, all of the properties
of the conditional mutual information apply directly to mutual information, and we
list them here for convenience:

Corollary 7.11 Non-Negativity of Quantum Mutual Information

For every state 𝜌 𝐴𝐵 , the following inequality holds

𝐼 ( 𝐴; 𝐵) 𝜌 ≥ 0. (7.2.193)

We can view non-negativity in (7.2.193) as a consequence of strong subadditivity
in (7.2.114). However, the conclusion in (7.2.193) follows more easily as a
consequence of non-negativity of quantum relative entropy for quantum states from
(7.2.72) because
𝐼 ( 𝐴; 𝐵) 𝜌 = 𝐷 (𝜌 𝐴𝐵 ∥ 𝜌 𝐴 ⊗ 𝜌 𝐵 ) ≥ 0. (7.2.194)
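
As a numerical illustration of (7.2.192) and (7.2.194), the sketch below computes the mutual information of a random bipartite state in two ways: from the three entropies, and as a relative entropy to the product of the marginals. It assumes NumPy and full-rank states; `random_state`, `log2m`, and the other helper names are hypothetical and introduced only for this example.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits (zero eigenvalues are skipped)."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def log2m(rho):
    """Base-2 matrix logarithm of a positive definite matrix."""
    evals, vecs = np.linalg.eigh(rho)
    return (vecs * np.log2(evals)) @ vecs.conj().T

def rel_entropy(rho, sigma):
    """D(rho||sigma) in bits, assuming rho and sigma are positive definite."""
    return float(np.trace(rho @ (log2m(rho) - log2m(sigma))).real)

def random_state(dim, rng):
    """Hypothetical helper: a random full-rank density matrix."""
    G = rng.normal(size=(dim, 2 * dim)) + 1j * rng.normal(size=(dim, 2 * dim))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(0)
dA, dB = 2, 3
rho_AB = random_state(dA * dB, rng)

# Partial traces via a (dA, dB, dA, dB) reshape of the density matrix.
R = rho_AB.reshape(dA, dB, dA, dB)
rho_A = np.einsum('ibjb->ij', R)
rho_B = np.einsum('aiaj->ij', R)

mi_entropies = entropy(rho_A) + entropy(rho_B) - entropy(rho_AB)
mi_relent = rel_entropy(rho_AB, np.kron(rho_A, rho_B))
```

Both quantities agree up to floating-point error, and the value is non-negative, as (7.2.193) requires.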

Proposition 7.12 Properties of Quantum Mutual Information

The quantum mutual information has the following properties:
1. Symmetry: For every state 𝜌 𝐴𝐵 , we have 𝐼 ( 𝐴; 𝐵) 𝜌 = 𝐼 (𝐵; 𝐴) 𝜌 .
2. Local entropy and dimension bounds: For every state 𝜌 𝐴𝐵 , the following
bounds hold
𝐼 ( 𝐴; 𝐵) 𝜌 ≤ 2 min{𝐻 ( 𝐴) 𝜌 , 𝐻 (𝐵) 𝜌 } (7.2.195)
≤ 2 log2 (min{𝑑 𝐴 , 𝑑 𝐵 }). (7.2.196)
Let $\sigma_{XB}$ be a classical–quantum state of the form
\[ \sigma_{XB} = \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho^x_B, \tag{7.2.197} \]

where 𝑝 : X → [0, 1] is a probability distribution over a finite alphabet X

with associated |X|-dimensional system 𝑋, and {𝜌 𝑥𝐵 }𝑥∈X is a set of states.
Then the following bounds hold
𝐼 (𝑋; 𝐵)𝜎 ≤ 𝐻 (𝑋)𝜎 ≤ log2 |X|. (7.2.198)

3. Additivity: For states 𝜌 𝐴1 𝐵1 and 𝜏𝐴2 𝐵2 , the following equality holds for the
product state 𝜌 𝐴1 𝐵1 ⊗ 𝜏𝐴2 𝐵2 :
𝐼 ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 ) 𝜌⊗𝜏 = 𝐼 ( 𝐴1 ; 𝐵1 ) 𝜌 + 𝐼 ( 𝐴2 ; 𝐵2 )𝜏 . (7.2.199)

4. Direct-sum property: Let $p : \mathcal{X} \to [0, 1]$ be a probability distribution
over a finite alphabet $\mathcal{X}$ with associated $|\mathcal{X}|$-dimensional system $X$, and let
$\{\rho^x_{AB}\}_{x\in\mathcal{X}}$ be a set of states. Then, for the classical–quantum state
\[ \sigma_{XAB} = \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho^x_{AB}, \tag{7.2.200} \]
the following equality holds:
\[ I(A;B|X)_\sigma = \sum_{x\in\mathcal{X}} p(x) I(A;B)_{\rho^x}. \tag{7.2.201} \]

5. Data-processing inequality for local channels: For every state 𝜌 𝐴𝐵 , the

quantum mutual information is non-increasing under the action of local
channels. In other words, for every state 𝜌 𝐴𝐵 and all local channels N 𝐴→𝐴′
and M𝐵→𝐵′ , the following inequality holds

𝐼 ( 𝐴; 𝐵) 𝜌 ≥ 𝐼 ( 𝐴′; 𝐵′)𝜔 , (7.2.202)

where 𝜔 𝐴′ 𝐵′ B (N 𝐴→𝐴′ ⊗ M𝐵→𝐵′ )(𝜌 𝐴𝐵 ).

Proposition 7.13 Uniform Continuity of Mutual Information

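Let $\rho_{AB}$ and $\sigma_{AB}$ be bipartite quantum states such that
\[ \frac{1}{2} \left\| \rho_{AB} - \sigma_{AB} \right\|_1 \leq \varepsilon, \tag{7.2.203} \]
for $\varepsilon \in [0, 1]$. Then the following bound applies to their mutual informations:

\[ \left| I(A;B)_\rho - I(A;B)_\sigma \right| \leq 2\varepsilon \log_2(\min\{d_A, d_B\}) + 2 g_2(\varepsilon), \tag{7.2.204} \]

where $g_2(\varepsilon) := (\varepsilon + 1) \log_2(\varepsilon + 1) - \varepsilon \log_2 \varepsilon$.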

Proposition 7.14 Mutual Information of Classical–Quantum States

Let 𝑝 : X → [0, 1] be a probability distribution over a finite alphabet X with
associated |X|-dimensional system 𝑋, and let {𝜌 𝑥𝐴 }𝑥∈X be a set of states. Then,
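for the classical–quantum state
\[ \rho_{XA} = \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho^x_A, \tag{7.2.205} \]
the following equalities hold:
\begin{align}
I(X;A)_\rho &= H(\overline{\rho}_A) - \sum_{x\in\mathcal{X}} p(x) H(\rho^x_A) \tag{7.2.206} \\
&= \sum_{x\in\mathcal{X}} p(x) D(\rho^x_A \| \overline{\rho}_A), \tag{7.2.207}
\end{align}
where $\overline{\rho}_A := \sum_{x\in\mathcal{X}} p(x) \rho^x_A$.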

Remark: The mutual information 𝐼 (𝑋; 𝐴)𝜌 of a classical–quantum state 𝜌 𝑋 𝐴 is often called
Holevo information, a term we use throughout this book.

Proof: Consider from (7.2.96) that

𝐼 (𝑋; 𝐴) 𝜌 = 𝐷 (𝜌 𝑋 𝐴 ∥ 𝜌 𝑋 ⊗ 𝜌 𝐴 ) (7.2.208)
= Tr[𝜌 𝑋 𝐴 log2 𝜌 𝑋 𝐴 ] − Tr[𝜌 𝑋 𝐴 log2 (𝜌 𝑋 ⊗ 𝜌 𝐴 )]. (7.2.209)
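Using (7.2.65) in the proof of Proposition 7.3, we find that
\[ \log_2 \rho_{XA} = \sum_{x\in\mathcal{X}} |x\rangle\langle x|_X \otimes \log_2 p(x) \mathbb{1}_A + \sum_{x\in\mathcal{X}} |x\rangle\langle x|_X \otimes \log_2 \rho^x_A, \tag{7.2.210} \]
which leads to
\begin{align}
\operatorname{Tr}[\rho_{XA} \log_2 \rho_{XA}] &= \sum_{x\in\mathcal{X}} p(x) \log_2 p(x) + \sum_{x\in\mathcal{X}} p(x) \operatorname{Tr}[\rho^x_A \log_2 \rho^x_A] \tag{7.2.211} \\
&= -H(X) - \sum_{x\in\mathcal{X}} p(x) H(\rho^x_A). \tag{7.2.212}
\end{align}
Then, using $\log_2(\rho_X \otimes \rho_A) = \log_2 \rho_X \otimes \mathbb{1}_A + \mathbb{1}_X \otimes \log_2 \rho_A$, we conclude that
\begin{align}
\operatorname{Tr}[\rho_{XA} \log_2(\rho_X \otimes \rho_A)] &= \operatorname{Tr}[\rho_X \log_2 \rho_X] + \operatorname{Tr}[\rho_A \log_2 \rho_A] \tag{7.2.213} \\
&= -H(X) - H(\rho_A). \tag{7.2.214}
\end{align}
However, the marginal state satisfies $\rho_A = \overline{\rho}_A = \sum_{x\in\mathcal{X}} p(x) \rho^x_A$, so that
\[ I(X;A)_\rho = H(\overline{\rho}_A) - \sum_{x\in\mathcal{X}} p(x) H(\rho^x_A), \tag{7.2.215} \]
which is the statement in (7.2.206).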
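Now we prove (7.2.207). Starting from (7.2.206), and writing $\overline{\rho}_A = \sum_{x\in\mathcal{X}} p(x) \rho^x_A$, consider that
\begin{align}
H(\overline{\rho}_A) - \sum_{x\in\mathcal{X}} p(x) H(\rho^x_A)
&= -\operatorname{Tr}[\overline{\rho}_A \log_2 \overline{\rho}_A] + \sum_{x\in\mathcal{X}} p(x) \operatorname{Tr}[\rho^x_A \log_2 \rho^x_A] \tag{7.2.216} \\
&= -\sum_{x\in\mathcal{X}} p(x) \operatorname{Tr}[\rho^x_A \log_2 \overline{\rho}_A] + \sum_{x\in\mathcal{X}} p(x) \operatorname{Tr}[\rho^x_A \log_2 \rho^x_A] \tag{7.2.217} \\
&= \sum_{x\in\mathcal{X}} p(x) \operatorname{Tr}[\rho^x_A (\log_2 \rho^x_A - \log_2 \overline{\rho}_A)] \tag{7.2.218} \\
&= \sum_{x\in\mathcal{X}} p(x) D(\rho^x_A \| \overline{\rho}_A), \tag{7.2.219}
\end{align}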
which establishes (7.2.207). ■
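
The two expressions for the Holevo information in (7.2.206) and (7.2.207) can be compared numerically, together with the mutual information computed directly from the block-diagonal classical–quantum state. This is a minimal sketch assuming NumPy and full-rank ensemble states; the ensemble and the helper names are hypothetical.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits (zero eigenvalues are skipped)."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def log2m(rho):
    """Base-2 matrix logarithm of a positive definite matrix."""
    evals, vecs = np.linalg.eigh(rho)
    return (vecs * np.log2(evals)) @ vecs.conj().T

def rel_entropy(rho, sigma):
    """D(rho||sigma) in bits, assuming both arguments are positive definite."""
    return float(np.trace(rho @ (log2m(rho) - log2m(sigma))).real)

def random_state(dim, rng):
    """Hypothetical helper: a random full-rank density matrix."""
    G = rng.normal(size=(dim, 2 * dim)) + 1j * rng.normal(size=(dim, 2 * dim))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(1)
p = np.array([0.2, 0.5, 0.3])                  # distribution p(x)
states = [random_state(2, rng) for _ in p]     # ensemble states rho^x_A
rho_bar = sum(px * r for px, r in zip(p, states))

# Entropy form (7.2.206) and divergence form (7.2.207).
holevo_entropy = entropy(rho_bar) - sum(px * entropy(r) for px, r in zip(p, states))
holevo_divergence = sum(px * rel_entropy(r, rho_bar) for px, r in zip(p, states))

# Direct evaluation of I(X;A) from the block-diagonal cq state rho_XA.
rho_XA = np.zeros((6, 6), dtype=complex)
for x, (px, r) in enumerate(zip(p, states)):
    rho_XA[2 * x:2 * x + 2, 2 * x:2 * x + 2] = px * r
mutual_info = entropy(np.diag(p)) + entropy(rho_bar) - entropy(rho_XA)
```

All three numbers coincide up to rounding, and they are bounded from above by $H(X)$, consistent with (7.2.198).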

7.3 Generalized Divergences

The quantum relative entropy is a function on states and positive semi-definite
operators that satisfies the data-processing inequality under quantum channels. The
data-processing inequality is quite powerful and a unifying concept, in the sense
that it allows for establishing many properties of information and distinguishability
measures. As such, this motivates the definition of generalized divergence as
a function on states and positive semi-definite operators that satisfies the data-
processing inequality.

Definition 7.15 Generalized Divergence

For every Hilbert space H, a function 𝑫 : D(H) × L+ (H) → R ∪ {+∞} is
called a generalized divergence if it satisfies the data-processing inequality
under every channel, i.e., for all channels N, states 𝜌, and positive semi-definite
operators 𝜎,
𝑫 (𝜌∥𝜎) ≥ 𝑫 (N(𝜌)∥N(𝜎)). (7.3.1)

Many of the strong converse theorems in this book rely heavily on the data-
processing inequality, and so we employ the generalized divergence to emphasize
this point.
We have already mentioned the quantum relative entropy as an example of a
generalized divergence. Other examples discussed later in this chapter, which are
relevant in the context of channel capacity theorems, are the Petz–, sandwiched,
and geometric Rényi relative entropies.
From the fact that generalized divergences satisfy the data-processing inequality
by definition, we immediately obtain two properties of interest.

Proposition 7.16 Basic Properties of the Generalized Divergence

For every generalized divergence 𝑫, for every state 𝜌, and every positive
semi-definite operator 𝜎:
1. The generalized divergence is isometrically invariant; i.e., for every isome-
try 𝑉,
𝑫 (𝜌∥𝜎) = 𝑫 (𝑉 𝜌𝑉 † ∥𝑉 𝜎𝑉 † ). (7.3.2)

2. For every state 𝜏,

𝑫 (𝜌∥𝜎) = 𝑫 (𝜌 ⊗ 𝜏∥𝜎 ⊗ 𝜏). (7.3.3)

Proof:
1. We follow the same approach discussed after (7.2.80). Since the map 𝜌 ↦→
𝑉 𝜌𝑉 † is a channel, we immediately obtain 𝑫 (𝜌∥𝜎) ≥ 𝑫 (𝑉 𝜌𝑉 † ∥𝑉 𝜎𝑉 † ). To
prove that 𝑫 (𝜌∥𝜎) ≤ 𝑫 (𝑉 𝜌𝑉 † ∥𝑉 𝜎𝑉 † ), consider the channel R𝑉 , which was
defined in (4.4.13) as

R𝑉 (𝜔) = 𝑉 † 𝜔𝑉 + Tr[( 1 − 𝑉𝑉 † )𝜔]𝜏 (7.3.4)

for every positive semi-definite operator 𝜔, where 𝜏 is an arbitrary (but fixed)

state. Recall from Section 4.4.3 that this is a general way to take the map
𝜌 ↦→ 𝑉 † 𝜌𝑉, which is completely positive but not trace preserving, and make it
trace preserving and hence a channel. Then, recall that R𝑉 (𝑉 𝜌𝑉 † ) = 𝜌 and
R𝑉 (𝑉 𝜎𝑉 † ) = 𝜎. Therefore, by the data-processing inequality,

𝑫 (𝑉 𝜌𝑉 † ∥𝑉 𝜎𝑉 † ) ≥ 𝑫 (R𝑉 (𝑉 𝜌𝑉 † )∥R𝑉 (𝑉 𝜎𝑉 † )) = 𝑫 (𝜌∥𝜎), (7.3.5)

and so 𝑫 (𝜌∥𝜎) = 𝑫 (𝑉 𝜌𝑉 † ∥𝑉 𝜎𝑉 † ).
2. Since taking the tensor product with a fixed state is a channel (recall Def-
inition 4.7), by definition of generalized divergence we obtain 𝑫 (𝜌∥𝜎) ≥
𝑫 (𝜌 ⊗ 𝜏∥𝜎 ⊗ 𝜏). On the other hand, the partial trace is also a channel, and so
by discarding the second system in the operators 𝜌 ⊗ 𝜏 and 𝜎 ⊗ 𝜏, we obtain

𝑫 (𝜌∥𝜎) = 𝑫 (Tr2 (𝜌 ⊗ 𝜏)∥Tr2 (𝜎 ⊗ 𝜏)) ≤ 𝑫 (𝜌 ⊗ 𝜏∥𝜎 ⊗ 𝜏), (7.3.6)

which means that 𝑫 (𝜌∥𝜎) = 𝑫 (𝜌 ⊗ 𝜏∥𝜎 ⊗ 𝜏). ■
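
The key step in the proof of isometric invariance is that the channel in (7.3.4) undoes the isometry. The following sketch, assuming NumPy, builds a random isometry from a reduced QR decomposition and checks that the map is trace preserving and that it recovers $\rho$ from $V\rho V^\dagger$. The names `reversal_channel` and `random_state` are hypothetical.

```python
import numpy as np

def random_state(dim, rng):
    """Hypothetical helper: a random full-rank density matrix."""
    G = rng.normal(size=(dim, 2 * dim)) + 1j * rng.normal(size=(dim, 2 * dim))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(2)
d_in, d_out = 2, 4

# Random isometry V: C^2 -> C^4 (orthonormal columns from a reduced QR).
G = rng.normal(size=(d_out, d_in)) + 1j * rng.normal(size=(d_out, d_in))
V, _ = np.linalg.qr(G)

tau = random_state(d_in, rng)  # arbitrary but fixed state, as in (7.3.4)

def reversal_channel(omega):
    """R_V(omega) = V^dag omega V + Tr[(1 - V V^dag) omega] tau."""
    leak = np.trace((np.eye(d_out) - V @ V.conj().T) @ omega)
    return V.conj().T @ omega @ V + leak * tau

rho = random_state(d_in, rng)
omega = random_state(d_out, rng)

recovered = reversal_channel(V @ rho @ V.conj().T)
trace_after = np.trace(reversal_channel(omega)).real
```

Because both $\rho \mapsto V\rho V^\dagger$ and this reversal map are channels, data processing applies in both directions, which is what forces equality in (7.3.2).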

Proposition 7.17
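Suppose that the generalized divergence obeys the following direct-sum property:
\[ \boldsymbol{D}(\rho_{XA} \| \sigma_{XA}) = \sum_{x\in\mathcal{X}} p(x) \boldsymbol{D}(\rho^x_A \| \sigma^x_A), \tag{7.3.7} \]
where $\mathcal{X}$ is a finite alphabet, $p : \mathcal{X} \to [0, 1]$ is a probability distribution,
$\{\rho^x_A\}_{x\in\mathcal{X}}$ is a set of states, $\{\sigma^x_A\}_{x\in\mathcal{X}}$ is a set of positive semi-definite operators,
and
\begin{align}
\rho_{XA} &:= \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho^x_A, \tag{7.3.8} \\
\sigma_{XA} &:= \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \sigma^x_A. \tag{7.3.9}
\end{align}
Then the generalized divergence is jointly convex; i.e., the following inequality
holds:
\[ \sum_{x\in\mathcal{X}} p(x) \boldsymbol{D}(\rho^x_A \| \sigma^x_A) \geq \boldsymbol{D}(\overline{\rho}_A \| \overline{\sigma}_A), \tag{7.3.10} \]
where $\overline{\rho}_A := \sum_{x\in\mathcal{X}} p(x) \rho^x_A$ and $\overline{\sigma}_A := \sum_{x\in\mathcal{X}} p(x) \sigma^x_A$.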

Proof: The proof is the same as the proof of Proposition 7.5 with 𝐷 replaced by
𝑫. ■

Now, just as we defined entropic quantities like the entropy, conditional entropy,
and mutual information using the quantum relative entropy, we can define their
generalized counterparts using the generalized divergence.

Definition 7.18 Generalized Information Measures for States

Let 𝑫 be a generalized divergence, as given in Definition 7.15.
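1. The generalized quantum entropy $\boldsymbol{H}(\rho)$, for a state $\rho$, is defined as
\[ \boldsymbol{H}(\rho) := -\boldsymbol{D}(\rho \| \mathbb{1}). \tag{7.3.11} \]
2. The generalized quantum conditional entropy $\boldsymbol{H}(A|B)_\rho$, for a bipartite
state $\rho_{AB}$, is defined as
\[ \boldsymbol{H}(A|B)_\rho := -\inf_{\sigma_B \in \mathrm{D}(\mathcal{H}_B)} \boldsymbol{D}(\rho_{AB} \| \mathbb{1}_A \otimes \sigma_B), \tag{7.3.12} \]
and the generalized coherent information $\boldsymbol{I}(A\rangle B)_\rho$, for a bipartite state
$\rho_{AB}$, is defined as
\[ \boldsymbol{I}(A\rangle B)_\rho := \inf_{\sigma_B \in \mathrm{D}(\mathcal{H}_B)} \boldsymbol{D}(\rho_{AB} \| \mathbb{1}_A \otimes \sigma_B). \tag{7.3.13} \]
3. The generalized quantum mutual information $\boldsymbol{I}(A;B)_\rho$, for a bipartite state
$\rho_{AB}$, is defined as
\[ \boldsymbol{I}(A;B)_\rho := \inf_{\sigma_B \in \mathrm{D}(\mathcal{H}_B)} \boldsymbol{D}(\rho_{AB} \| \rho_A \otimes \sigma_B). \tag{7.3.14} \]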

The data-processing inequality for every generalized divergence translates to

the derived generalized information measures for states in the following way.

Proposition 7.19 Data-Processing Inequality for Generalized Information Measures for States
Let 𝑫 be a generalized divergence.
1. The generalized quantum entropy does not decrease under the action of a
unital channel, i.e.,
𝑯(𝜌) ≤ 𝑯(N(𝜌)), (7.3.15)
for every state 𝜌 and every unital channel N.
2. For every bipartite state 𝜌 𝐴𝐵 , the generalized quantum conditional entropy
does not decrease under the action of an arbitrary unital channel on 𝐴 and
an arbitrary channel on 𝐵, i.e.,

𝑯( 𝐴|𝐵) 𝜌 ≤ 𝑯( 𝐴′ |𝐵′) 𝜌′ , (7.3.16)

where 𝜌′𝐴′ 𝐵′ B (N 𝐴→𝐴′ ⊗ M𝐵→𝐵′ )(𝜌 𝐴𝐵 ), with N 𝐴→𝐴′ an arbitrary unital

channel and M𝐵→𝐵′ an arbitrary channel. It follows by definition that

𝑰( 𝐴⟩𝐵) 𝜌 ≥ 𝑰( 𝐴′⟩𝐵′) 𝜌′ . (7.3.17)

3. For every bipartite state 𝜌 𝐴𝐵 , the generalized quantum mutual information

does not increase under the action of arbitrary channels on 𝐴 and 𝐵, i.e.,

𝑰( 𝐴; 𝐵) 𝜌 ≥ 𝑰( 𝐴′; 𝐵′) 𝜌′ , (7.3.18)

where 𝜌′𝐴′ 𝐵′ = (N 𝐴→𝐴′ ⊗ M𝐵→𝐵′ )(𝜌 𝐴𝐵 ), with N 𝐴→𝐴′ and M𝐵→𝐵′ arbitrary
channels.

Proof:

1. Let 𝜌 be an arbitrary state, and let N be a unital channel. The unitality of N

means, by definition, that N( 1) = 1. Using this, along with the data-processing
inequality for the generalized divergence 𝑫, we obtain

𝑯(N(𝜌)) = −𝑫 (N(𝜌)∥ 1) (7.3.19)

= −𝑫 (N(𝜌)∥N( 1)) (7.3.20)
≥ −𝑫 (𝜌∥ 1) (7.3.21)
= 𝑯(𝜌), (7.3.22)

as required.
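2. Let $\rho_{AB}$ be an arbitrary bipartite state, let $\mathcal{N}_{A\to A'}$ be an arbitrary unital channel,
and let $\mathcal{M}_{B\to B'}$ be an arbitrary channel. Also, let $\sigma_B$ be an arbitrary state. Then,
using the data-processing inequality of the generalized divergence $\boldsymbol{D}$ and the
unitality of $\mathcal{N}$, we obtain
\begin{align}
\boldsymbol{D}(\rho_{AB} \| \mathbb{1}_A \otimes \sigma_B) &\geq \boldsymbol{D}((\mathcal{N} \otimes \mathcal{M})(\rho_{AB}) \| (\mathcal{N} \otimes \mathcal{M})(\mathbb{1}_A \otimes \sigma_B)) \tag{7.3.23} \\
&= \boldsymbol{D}(\rho'_{A'B'} \| \mathcal{N}(\mathbb{1}_A) \otimes \mathcal{M}(\sigma_B)) \tag{7.3.24} \\
&= \boldsymbol{D}(\rho'_{A'B'} \| \mathbb{1}_{A'} \otimes \mathcal{M}(\sigma_B)) \tag{7.3.25} \\
&\geq \inf_{\sigma_{B'}} \boldsymbol{D}(\rho'_{A'B'} \| \mathbb{1}_{A'} \otimes \sigma_{B'}) \tag{7.3.26} \\
&= -\boldsymbol{H}(A'|B')_{\rho'}. \tag{7.3.27}
\end{align}
Since the state $\sigma_B$ is arbitrary, we find that
\[ \boldsymbol{H}(A|B)_\rho = -\inf_{\sigma_B} \boldsymbol{D}(\rho_{AB} \| \mathbb{1}_A \otimes \sigma_B) \leq \boldsymbol{H}(A'|B')_{\rho'}, \tag{7.3.28} \]
as required. The data-processing inequality in (7.3.17) for the generalized
coherent information follows from the fact that, by definition, $\boldsymbol{I}(A\rangle B)_\rho = -\boldsymbol{H}(A|B)_\rho$.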
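3. Let $\rho_{AB}$ be an arbitrary bipartite state, and let $\mathcal{N}_{A\to A'}$ and $\mathcal{M}_{B\to B'}$ be arbitrary
channels. Also, let $\sigma_B$ be an arbitrary state. Then, using the data-processing
inequality for the generalized divergence $\boldsymbol{D}$, we obtain
\begin{align}
\boldsymbol{D}(\rho_{AB} \| \rho_A \otimes \sigma_B) &\geq \boldsymbol{D}((\mathcal{N} \otimes \mathcal{M})(\rho_{AB}) \| (\mathcal{N} \otimes \mathcal{M})(\rho_A \otimes \sigma_B)) \tag{7.3.29} \\
&= \boldsymbol{D}(\rho'_{A'B'} \| \rho'_{A'} \otimes \mathcal{M}(\sigma_B)) \tag{7.3.30} \\
&\geq \inf_{\sigma_{B'}} \boldsymbol{D}(\rho'_{A'B'} \| \rho'_{A'} \otimes \sigma_{B'}) \tag{7.3.31} \\
&= \boldsymbol{I}(A';B')_{\rho'}. \tag{7.3.32}
\end{align}
Since the state $\sigma_B$ is arbitrary, we find that
\[ \boldsymbol{I}(A;B)_\rho = \inf_{\sigma_B} \boldsymbol{D}(\rho_{AB} \| \rho_A \otimes \sigma_B) \geq \boldsymbol{I}(A';B')_{\rho'}, \tag{7.3.33} \]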

as required. ■

In the above, we have proved most properties of a generalized divergence

by employing its defining property only (i.e., by employing the data-processing
inequality in (7.3.1)). In further applications, it can be useful to make some very
minimal extra assumptions about a generalized divergence. In what follows, we list
two of these minimal assumptions. If we ever employ these assumptions in future
applications, we indicate this explicitly.
1. A first assumption is that
𝑫 (1∥𝑐) ≥ 0 (7.3.34)
for 𝑐 ∈ (0, 1]. That is, if we plug in a trivial one-dimensional density operator
𝜌 (i.e., the number 1) and a trivial positive semi-definite operator with trace less
than or equal to one, then the generalized divergence evaluates to a non-negative
real. This assumption is satisfied by all examples of generalized divergences
that we employ in this book.
A consequence of the assumption in (7.3.34) is that

𝑫 (𝜌∥𝜎) ≥ 0 (7.3.35)

for every density operator 𝜌 and positive semi-definite operator 𝜎 satisfying

Tr[𝜎] ≤ 1. This follows from (7.3.1) and (7.3.34) by applying the trace-out
channel.
2. A second minimal assumption is that

𝑫 (𝜌∥ 𝜌) = 0 (7.3.36)

for every state 𝜌. We should clarify that this assumption is quite minimal.
The reason is that it is essentially a direct consequence of (7.3.1) up to an
inessential additive constant. That is, (7.3.1) implies that there exists a constant 𝑐
such that
𝑫 (𝜌∥ 𝜌) = 𝑐 (7.3.37)
for every state 𝜌. To see this, consider that one can get from the state 𝜌 to
another state 𝜔 by means of a trace and replace channel Tr[·]𝜔, so that (7.3.1)
implies that
𝑫 (𝜌∥ 𝜌) ≥ 𝑫 (𝜔∥𝜔). (7.3.38)
However, by the same argument, 𝑫 (𝜔∥𝜔) ≥ 𝑫 (𝜌∥ 𝜌), so that the claim holds.
So the assumption in (7.3.36) amounts to a redefinition of the generalized
divergence as
𝑫 ′ (𝜌∥𝜎) := 𝑫 (𝜌∥𝜎) − 𝑐. (7.3.39)

7.4 Petz–Rényi Relative Entropy

One important example of a generalized divergence is the Petz–Rényi relative
entropy.

Definition 7.20 Petz–Rényi Relative Entropy

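For all $\alpha \in (0, 1) \cup (1, \infty)$, we define the Petz–Rényi relative quasi-entropy for
every state $\rho$ and positive semi-definite operator $\sigma$ as
\[ Q_\alpha(\rho \| \sigma) := \begin{cases} \operatorname{Tr}[\rho^\alpha \sigma^{1-\alpha}] & \text{if } \alpha \in (0, 1), \text{ or } \alpha \in (1, \infty) \text{ and } \operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma), \\ +\infty & \text{otherwise.} \end{cases} \tag{7.4.1} \]
The Petz–Rényi relative entropy is then defined as
\[ D_\alpha(\rho \| \sigma) := \frac{1}{\alpha - 1} \log_2 Q_\alpha(\rho \| \sigma). \tag{7.4.2} \]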

By following the recipe in (7.3.11), i.e., setting $\sigma = \mathbb{1}$ and applying a minus
sign out in front, the Petz–Rényi relative entropy reduces to the Rényi entropy of a
quantum state $\rho$:
\[ H_\alpha(\rho) = -D_\alpha(\rho \| \mathbb{1}) = \frac{1}{1-\alpha} \log_2 \operatorname{Tr}[\rho^\alpha] = \frac{\alpha}{1-\alpha} \log_2 \| \rho \|_\alpha . \tag{7.4.3} \]
If the state $\rho$ is defined on system $A$, we also employ the notation
\[ H_\alpha(A)_\rho \equiv H_\alpha(\rho). \tag{7.4.4} \]
The Rényi entropy is defined for all $\alpha \in (0, 1) \cup (1, \infty)$, and one evaluates its value
at $\alpha \in \{0, 1, \infty\}$ by taking limits:
\begin{align}
H_0(\rho) &:= \lim_{\alpha\to 0} H_\alpha(\rho) = \log_2 \operatorname{rank}(\rho), \tag{7.4.5} \\
H_1(\rho) &:= \lim_{\alpha\to 1} H_\alpha(\rho) = -\operatorname{Tr}[\rho \log_2 \rho] = H(\rho), \tag{7.4.6} \\
H_\infty(\rho) &:= \lim_{\alpha\to\infty} H_\alpha(\rho) = -\log_2 \lambda_{\max}(\rho). \tag{7.4.7}
\end{align}
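
The three limiting cases in (7.4.5)–(7.4.7) can be observed numerically by evaluating the Rényi entropy at values of $\alpha$ close to the limit points. The sketch below assumes NumPy; `renyi_entropy` is an illustrative name, and the rank-deficient test state is an arbitrary choice.

```python
import numpy as np

def renyi_entropy(rho, alpha, tol=1e-12):
    """Renyi entropy H_alpha(rho) in bits, with the limits alpha = 0, 1, inf."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > tol]          # restrict to the support of rho
    if alpha == 0:
        return float(np.log2(len(evals)))
    if alpha == 1:
        return float(-np.sum(evals * np.log2(evals)))
    if np.isinf(alpha):
        return float(-np.log2(evals.max()))
    return float(np.log2(np.sum(evals ** alpha)) / (1 - alpha))

# A rank-3 state on a 4-dimensional space.
rho = np.diag([0.5, 0.3, 0.2, 0.0])

near_zero = renyi_entropy(rho, 1e-4)
near_one = renyi_entropy(rho, 1 + 1e-6)
large_alpha = renyi_entropy(rho, 500.0)

# Values at a few orders, from alpha = 0 up to alpha = infinity.
ladder = [renyi_entropy(rho, a) for a in (0, 0.5, 1, 2, np.inf)]
```

For this example the values in `ladder` decrease with $\alpha$, from $\log_2 3$ at $\alpha = 0$ down to $-\log_2 0.5 = 1$ at $\alpha = \infty$.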

Turning back to the Petz–Rényi relative entropy, note that 1 − 𝛼 is negative

for 𝛼 > 1. In this case, the inverse of 𝜎 is taken on its support. An alternative
to this convention is to define 𝐷 𝛼 (𝜌∥𝜎) for 𝛼 > 1 for only positive definite 𝜎
and for positive semi-definite 𝜎 define 𝐷 𝛼 (𝜌∥𝜎) = lim𝜀→0 𝐷 𝛼 (𝜌∥𝜎 + 𝜀 1). Both
alternatives are equivalent, as we now show.

Proposition 7.21
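For every state $\rho$ and positive semi-definite operator $\sigma$,
\[ D_\alpha(\rho \| \sigma) = \lim_{\varepsilon\to 0^+} \frac{1}{\alpha-1} \log_2 \operatorname{Tr}\!\left[ \rho^\alpha (\sigma + \varepsilon \mathbb{1})^{1-\alpha} \right]. \tag{7.4.8} \]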

Proof: For 𝛼 ∈ (0, 1), this is immediate from the fact that the logarithm, trace,
and power functions are continuous, so that the limit can be brought inside the trace
and inside the power (𝜎 + 𝜀 1) 1−𝛼 .
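For $\alpha \in (1, \infty)$, since $1 - \alpha$ is negative and $\sigma$ is not necessarily invertible, we
first decompose the underlying Hilbert space $\mathcal{H}$ as $\mathcal{H} = \operatorname{supp}(\sigma) \oplus \ker(\sigma)$, just as
we did in (7.2.6), in order to write
\[ \rho = \begin{pmatrix} \rho_{0,0} & \rho_{0,1} \\ \rho_{0,1}^\dagger & \rho_{1,1} \end{pmatrix}, \qquad \sigma = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}. \tag{7.4.9} \]
Then, writing $\mathbb{1} = \Pi_\sigma + \Pi_\sigma^\perp$, where $\Pi_\sigma$ is the projection onto the support of $\sigma$ and
$\Pi_\sigma^\perp$ is the projection onto the orthogonal complement of $\operatorname{supp}(\sigma)$, we find that
\[ \sigma + \varepsilon \mathbb{1} = \begin{pmatrix} \sigma + \varepsilon \Pi_\sigma & 0 \\ 0 & \varepsilon \Pi_\sigma^\perp \end{pmatrix}, \tag{7.4.10} \]
which implies that
\[ (\sigma + \varepsilon \mathbb{1})^{1-\alpha} = \begin{pmatrix} (\sigma + \varepsilon \Pi_\sigma)^{1-\alpha} & 0 \\ 0 & (\varepsilon \Pi_\sigma^\perp)^{1-\alpha} \end{pmatrix}. \tag{7.4.11} \]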

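If $\operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma)$, then $\rho = \rho_{0,0}$, $\rho_{0,1} = 0$, and $\rho_{1,1} = 0$, which means that
\[ \rho^\alpha (\sigma + \varepsilon \mathbb{1})^{1-\alpha} = \begin{pmatrix} \rho_{0,0}^\alpha (\sigma + \varepsilon \Pi_\sigma)^{1-\alpha} & 0 \\ 0 & 0 \end{pmatrix}, \tag{7.4.12} \]
so that
\begin{align}
\lim_{\varepsilon\to 0^+} \frac{1}{\alpha-1} \log_2 \operatorname{Tr}\!\left[ \rho^\alpha (\sigma + \varepsilon \mathbb{1})^{1-\alpha} \right]
&= \lim_{\varepsilon\to 0^+} \frac{1}{\alpha-1} \log_2 \operatorname{Tr}\!\left[ \rho^\alpha (\sigma + \varepsilon \Pi_\sigma)^{1-\alpha} \right] \tag{7.4.13} \\
&= \frac{1}{\alpha-1} \log_2 \operatorname{Tr}\!\left[ \rho^\alpha \sigma^{1-\alpha} \right] \tag{7.4.14} \\
&= D_\alpha(\rho \| \sigma), \tag{7.4.15}
\end{align}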

as required.
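If $\operatorname{supp}(\rho) \nsubseteq \operatorname{supp}(\sigma)$, then the blocks $\rho_{0,1}$ and $\rho_{1,1}$ are generally non-zero,
and we obtain
\begin{align}
\rho^\alpha (\sigma + \varepsilon \mathbb{1})^{1-\alpha}
&= \begin{pmatrix} \rho_{0,0} & \rho_{0,1} \\ \rho_{0,1}^\dagger & \rho_{1,1} \end{pmatrix}^{\!\alpha} \begin{pmatrix} (\sigma + \varepsilon \Pi_\sigma)^{1-\alpha} & 0 \\ 0 & (\varepsilon \Pi_\sigma^\perp)^{1-\alpha} \end{pmatrix} \tag{7.4.16} \\
&= \varepsilon^{1-\alpha} \left[ \begin{pmatrix} \rho_{0,0} & \rho_{0,1} \\ \rho_{0,1}^\dagger & \rho_{1,1} \end{pmatrix}^{\!\alpha} \begin{pmatrix} \varepsilon^{\alpha-1} (\sigma + \varepsilon \Pi_\sigma)^{1-\alpha} & 0 \\ 0 & (\Pi_\sigma^\perp)^{1-\alpha} \end{pmatrix} \right]. \tag{7.4.17}
\end{align}
Due to the fact that $\alpha \in (1, \infty)$ it holds that $\lim_{\varepsilon\to 0^+} \varepsilon^{1-\alpha} = +\infty$, and since the limit
$\varepsilon \to 0^+$ of the matrix in square brackets in (7.4.17) is finite, we find that
\[ \lim_{\varepsilon\to 0^+} \frac{1}{\alpha-1} \log_2 \operatorname{Tr}\!\left[ \rho^\alpha (\sigma + \varepsilon \mathbb{1})^{1-\alpha} \right] = +\infty = D_\alpha(\rho \| \sigma) \tag{7.4.18} \]
for the case $\alpha \in (1, \infty)$ and $\operatorname{supp}(\rho) \nsubseteq \operatorname{supp}(\sigma)$. ■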
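
The two conventions compared in Proposition 7.21 can be tried out on a small example with a singular $\sigma$. The sketch below, assuming NumPy, implements Definition 7.20 with the inverse taken on the support, and separately the $\varepsilon$-regularized expression from (7.4.8). The function names and the test operators are hypothetical choices for illustration.

```python
import numpy as np

def mpow(A, t, tol=1e-12):
    """A^t on the support of a PSD matrix A (zero eigenvalues map to 0)."""
    evals, vecs = np.linalg.eigh(A)
    powered = np.zeros_like(evals)
    mask = evals > tol
    powered[mask] = evals[mask] ** t
    return (vecs * powered) @ vecs.conj().T

def petz_renyi(rho, sigma, alpha):
    """D_alpha per Definition 7.20, with sigma^(1-alpha) taken on supp(sigma)."""
    proj_sigma = mpow(sigma, 0)  # projector onto supp(sigma)
    inside = np.isclose(np.trace(rho @ proj_sigma).real, np.trace(rho).real)
    if alpha > 1 and not inside:
        return np.inf
    q = np.trace(mpow(rho, alpha) @ mpow(sigma, 1 - alpha)).real
    return np.inf if q <= 0 else float(np.log2(q) / (alpha - 1))

def petz_renyi_regularized(rho, sigma, alpha, eps):
    """The expression inside the limit in (7.4.8), at a fixed eps > 0."""
    sigma_eps = sigma + eps * np.eye(sigma.shape[0])
    q = np.trace(mpow(rho, alpha) @ mpow(sigma_eps, 1 - alpha)).real
    return float(np.log2(q) / (alpha - 1))

alpha = 2.0
sigma = np.diag([0.25, 0.75, 0.0])             # singular sigma
plus = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
rho_inside = np.outer(plus, plus)              # supp(rho) inside supp(sigma)
rho_outside = np.diag([0.5, 0.0, 0.5])         # support condition violated

eps_values = (1e-2, 1e-4, 1e-8)
exact = petz_renyi(rho_inside, sigma, alpha)
approx = [petz_renyi_regularized(rho_inside, sigma, alpha, e) for e in eps_values]
diverging = [petz_renyi_regularized(rho_outside, sigma, alpha, e) for e in eps_values]
```

When the support condition holds, the regularized values approach the value obtained with the inverse on the support; when it fails, they grow without bound as $\varepsilon$ shrinks, matching the value $+\infty$ assigned by the definition.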

The Petz–Rényi relative entropy is a natural generalization of the classical Rényi
relative entropy, which is defined for a probability distribution $p : \mathcal{X} \to [0, 1]$ and
a positive measure $q : \mathcal{X} \to [0, \infty)$ over the same alphabet $\mathcal{X}$ as
\[ D_\alpha(p \| q) = \frac{1}{\alpha-1} \log_2 \sum_{x\in\mathcal{X}} p(x)^\alpha q(x)^{1-\alpha} \tag{7.4.19} \]
for $\alpha \in (0, 1)$. For $\alpha \in (1, \infty)$, the same expression defines $D_\alpha(p \| q)$ whenever
$q(x) = 0$ implies $p(x) = 0$ for all $x \in \mathcal{X}$; otherwise, $D_\alpha(p \| q) = +\infty$.
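Recall the quantum Chernoff bound from Theorem 5.4 in Section 5.3.1, which
states that the optimal error exponent for the task of discriminating between the states
$\rho$ and $\sigma$ is
\[ \underline{\xi}(\rho, \sigma) = \overline{\xi}(\rho, \sigma) = C(\rho \| \sigma) := \sup_{\alpha\in(0,1)} -\log_2 \operatorname{Tr}[\rho^\alpha \sigma^{1-\alpha}]. \tag{7.4.20} \]
Using the definition of the Petz–Rényi relative entropy, we have
$(1 - \alpha) D_\alpha(\rho \| \sigma) = -\log_2 Q_\alpha(\rho \| \sigma)$ for $\alpha \in (0, 1)$, and so we find that
\[ C(\rho \| \sigma) = \sup_{\alpha\in(0,1)} (1 - \alpha) D_\alpha(\rho \| \sigma). \tag{7.4.21} \]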
The Petz–Rényi relative entropy thus plays a role in providing the optimal error ex-
ponent for the task of discriminating between two states (i.e., symmetric hypothesis
testing).
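
To make the optimization over $\alpha$ in (7.4.20) concrete, the sketch below approximates the Chernoff divergence by a grid search, using the relation $(1-\alpha) D_\alpha(\rho\|\sigma) = -\log_2 \operatorname{Tr}[\rho^\alpha \sigma^{1-\alpha}]$. It assumes NumPy; the grid, the function names, and the commuting test pair are hypothetical choices. For the pair $\rho = \mathrm{diag}(p, 1-p)$, $\sigma = \mathrm{diag}(1-p, p)$, the trace is symmetric about $\alpha = 1/2$ and the optimum is $-\log_2\!\big(2\sqrt{p(1-p)}\big)$.

```python
import numpy as np

def mpow(A, t, tol=1e-12):
    """A^t on the support of a PSD matrix A."""
    evals, vecs = np.linalg.eigh(A)
    powered = np.zeros_like(evals)
    mask = evals > tol
    powered[mask] = evals[mask] ** t
    return (vecs * powered) @ vecs.conj().T

def chernoff_divergence(rho, sigma, alphas):
    """Grid approximation of sup over alpha in (0,1) of -log2 Tr[rho^a sigma^(1-a)]."""
    values = [-np.log2(np.trace(mpow(rho, a) @ mpow(sigma, 1 - a)).real)
              for a in alphas]
    best = int(np.argmax(values))
    return float(values[best]), float(alphas[best])

alphas = np.linspace(0.01, 0.99, 99)   # grid that contains alpha = 0.5
p = 0.1
rho = np.diag([p, 1 - p])
sigma = np.diag([1 - p, p])

C, alpha_star = chernoff_divergence(rho, sigma, alphas)
closed_form = -np.log2(2 * np.sqrt(p * (1 - p)))
```

A grid search is only a crude stand-in for the supremum; since $\alpha \mapsto \log_2 \operatorname{Tr}[\rho^\alpha \sigma^{1-\alpha}]$ is convex, a one-dimensional convex optimizer would be the natural refinement.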
Like the quantum relative entropy, the Petz–Rényi relative entropy is faithful,
meaning that, for 𝛼 ∈ (0, 1) ∪ (1, ∞) and states 𝜌, 𝜎,
𝐷 𝛼 (𝜌∥𝜎) = 0 ⇐⇒ 𝜌 = 𝜎. (7.4.22)
We prove this in Proposition 7.36 in the next section, as it requires results from
both this section and the next one. Also, like the quantum relative entropy, the
Petz–Rényi relative entropy is a generalized divergence for certain values of 𝛼,
which is shown in Theorem 7.24 below.
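Before getting to Theorem 7.24, we first discuss several important properties of
the Petz–Rényi relative entropy. Let us note that, if $\rho$ and $\sigma$ act on a $d$-dimensional
Hilbert space and are invertible, then the Petz–Rényi relative quasi-entropy can be
written as
\[ Q_\alpha(\rho \| \sigma) = \langle \varphi^\rho | (\rho^{-1} \otimes \sigma^{\mathsf{T}})^{1-\alpha} | \varphi^\rho \rangle, \tag{7.4.23} \]
where $|\varphi^\rho\rangle := (\rho^{1/2} \otimes \mathbb{1}) |\Gamma\rangle$ is a purification of $\rho$ and $|\Gamma\rangle = \sum_{i=1}^{d} |i, i\rangle$. This is due
to the transpose trick in (2.2.40) and the identity in (2.2.41). We can extend the
applicability of this expression to states $\rho$ and positive semi-definite operators $\sigma$
that are not invertible by noting that
\[ D_\alpha(\rho \| \sigma) = \lim_{\varepsilon\to 0^+} \lim_{\delta\to 0^+} \frac{1}{\alpha-1} \log_2 \operatorname{Tr}\!\left[ ((1-\delta)\rho + \delta\pi)^\alpha (\sigma + \varepsilon \mathbb{1})^{1-\alpha} \right], \tag{7.4.24} \]
where $\pi$ denotes the maximally mixed state.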

We start by establishing the important fact that the quantum relative entropy is
a special case of the Petz–Rényi relative entropy in the limit 𝛼 → 1.


Proposition 7.22
Let $\rho$ be a state and $\sigma$ a positive semi-definite operator. Then, in the limit
$\alpha \to 1$, the Petz–Rényi relative entropy converges to the quantum relative
entropy:
\[ \lim_{\alpha\to 1} D_\alpha(\rho \| \sigma) = D(\rho \| \sigma). \tag{7.4.25} \]

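Proof: Let us first consider the case $\alpha \in (1, \infty)$. If $\operatorname{supp}(\rho) \nsubseteq \operatorname{supp}(\sigma)$, then
$D_\alpha(\rho \| \sigma) = +\infty$, so that $\lim_{\alpha\to 1^+} D_\alpha(\rho \| \sigma) = +\infty$, consistent with the definition of
the quantum relative entropy in this case (see Definition 7.1). If $\operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma)$,
then $D_\alpha(\rho \| \sigma)$ is finite and we can write
\[ D_\alpha(\rho \| \sigma) = \frac{1}{\alpha-1} \log_2 Q_\alpha(\rho \| \sigma). \tag{7.4.26} \]
Now, let us define the function
\[ Q_{\alpha,\beta}(\rho \| \sigma) := \operatorname{Tr}[\rho^\alpha \sigma^{1-\beta}], \tag{7.4.27} \]
so that $Q_\alpha(\rho \| \sigma) = Q_{\alpha,\alpha}(\rho \| \sigma)$. By noting that $\operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma)$ implies
$Q_1(\rho \| \sigma) = \operatorname{Tr}[\rho \Pi_\sigma] = 1$ (where $\Pi_\sigma$ is the projection onto the support of $\sigma$), and
since $\log_2 1 = 0$, we can write $D_\alpha(\rho \| \sigma)$ as
\[ D_\alpha(\rho \| \sigma) = \frac{\log_2 Q_\alpha(\rho \| \sigma) - \log_2 Q_1(\rho \| \sigma)}{\alpha - 1}, \tag{7.4.28} \]
so that
\begin{align}
\lim_{\alpha\to 1} D_\alpha(\rho \| \sigma) &= \left. \frac{\mathrm{d}}{\mathrm{d}\alpha} \log_2 Q_\alpha(\rho \| \sigma) \right|_{\alpha=1} \tag{7.4.29} \\
&= \frac{1}{\ln(2)} \frac{\left. \frac{\mathrm{d}}{\mathrm{d}\alpha} Q_\alpha(\rho \| \sigma) \right|_{\alpha=1}}{Q_1(\rho \| \sigma)} \tag{7.4.30} \\
&= \frac{1}{\ln(2)} \left. \frac{\mathrm{d}}{\mathrm{d}\alpha} Q_\alpha(\rho \| \sigma) \right|_{\alpha=1}, \tag{7.4.31}
\end{align}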

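where the first equality follows from the definition of the derivative and the second
equality from the derivative of the natural logarithm, along with the chain rule.
Using the function $Q_{\alpha,\beta}$ and the chain rule, we write
\[ \left. \frac{\mathrm{d}}{\mathrm{d}\alpha} Q_\alpha(\rho \| \sigma) \right|_{\alpha=1} = \left. \frac{\mathrm{d}}{\mathrm{d}\alpha} Q_{\alpha,1}(\rho \| \sigma) \right|_{\alpha=1} + \left. \frac{\mathrm{d}}{\mathrm{d}\beta} Q_{1,\beta}(\rho \| \sigma) \right|_{\beta=1}. \tag{7.4.32} \]
Then,
\[ \frac{\mathrm{d}}{\mathrm{d}\alpha} Q_{\alpha,1}(\rho \| \sigma) = \frac{\mathrm{d}}{\mathrm{d}\alpha} \operatorname{Tr}[\rho^\alpha \Pi_\sigma] = \frac{\mathrm{d}}{\mathrm{d}\alpha} \operatorname{Tr}[\rho^\alpha] = \operatorname{Tr}[\rho^\alpha \ln \rho], \tag{7.4.33} \]
where we used the fact that $\rho^\alpha \Pi_\sigma = \rho^\alpha$ since $\operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma)$. Therefore,
\[ \left. \frac{\mathrm{d}}{\mathrm{d}\alpha} Q_{\alpha,1}(\rho \| \sigma) \right|_{\alpha=1} = \operatorname{Tr}[\rho \ln \rho]. \tag{7.4.34} \]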

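Similarly,
\[ \frac{\mathrm{d}}{\mathrm{d}\beta} Q_{1,\beta}(\rho \| \sigma) = \frac{\mathrm{d}}{\mathrm{d}\beta} \operatorname{Tr}[\rho \sigma^{1-\beta}] = -\operatorname{Tr}[\rho \sigma^{1-\beta} \ln \sigma], \tag{7.4.35} \]
so that
\[ \left. \frac{\mathrm{d}}{\mathrm{d}\beta} Q_{1,\beta}(\rho \| \sigma) \right|_{\beta=1} = -\operatorname{Tr}[\rho \Pi_\sigma \ln \sigma] = -\operatorname{Tr}[\rho \ln \sigma], \tag{7.4.36} \]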

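where the last equality follows from the fact that the support condition $\operatorname{supp}(\rho) \subseteq
\operatorname{supp}(\sigma)$ holds. So we find that
\[ \lim_{\alpha\to 1} D_\alpha(\rho \| \sigma) = \frac{1}{\ln(2)} \left. \frac{\mathrm{d}}{\mathrm{d}\alpha} Q_\alpha(\rho \| \sigma) \right|_{\alpha=1} = \operatorname{Tr}[\rho \log_2 \rho] - \operatorname{Tr}[\rho \log_2 \sigma] \tag{7.4.37} \]
when $\operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma)$, which means that for $\alpha \in (1, \infty)$,
\begin{align}
\lim_{\alpha\to 1^+} D_\alpha(\rho \| \sigma) &= \begin{cases} \operatorname{Tr}[\rho \log_2 \rho] - \operatorname{Tr}[\rho \log_2 \sigma] & \text{if } \operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma), \\ +\infty & \text{otherwise} \end{cases} \notag \\
&= D(\rho \| \sigma). \tag{7.4.38}
\end{align}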
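Let us now consider the case $\alpha \in (0, 1)$. If $\operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma)$, then since the
limit in (7.4.37) holds from both sides, we find that
\[ \lim_{\alpha\to 1^-} D_\alpha(\rho \| \sigma) = \operatorname{Tr}[\rho \log_2 \rho] - \operatorname{Tr}[\rho \log_2 \sigma]. \tag{7.4.39} \]
If $\operatorname{supp}(\rho) \nsubseteq \operatorname{supp}(\sigma)$ (and $\operatorname{Tr}[\rho\sigma] \neq 0$), then observe that we can write $D_\alpha$ as
\[ D_\alpha(\rho \| \sigma) = \frac{\log_2 Q_\alpha(\rho \| \sigma) - \log_2 Q_1(\rho \| \sigma)}{\alpha - 1} + \frac{\log_2 Q_1(\rho \| \sigma)}{\alpha - 1}, \tag{7.4.40} \]
so that
\[ \lim_{\alpha\to 1^-} D_\alpha(\rho \| \sigma) = \left. \frac{\mathrm{d}}{\mathrm{d}\alpha} \log_2 Q_\alpha(\rho \| \sigma) \right|_{\alpha=1} + \lim_{\alpha\to 1^-} \frac{\log_2 \operatorname{Tr}[\rho \Pi_\sigma]}{\alpha - 1}, \tag{7.4.41} \]
where we have used $Q_1(\rho \| \sigma) = \operatorname{Tr}[\rho \Pi_\sigma]$. Now, since $\operatorname{Tr}[\rho\sigma] \neq 0$ and $\operatorname{supp}(\rho)
\nsubseteq \operatorname{supp}(\sigma)$, we have that $0 < \operatorname{Tr}[\rho \Pi_\sigma] < 1$, which means that $\log_2 \operatorname{Tr}[\rho \Pi_\sigma] < 0$.
Since $\lim_{\alpha\to 1^-} \frac{1}{\alpha-1} = -\infty$, we get that the second term in (7.4.41) is equal to $+\infty$,
which means that $\lim_{\alpha\to 1^-} D_\alpha(\rho \| \sigma) = +\infty$. Therefore,
\begin{align}
\lim_{\alpha\to 1^-} D_\alpha(\rho \| \sigma) &= \begin{cases} \operatorname{Tr}[\rho \log_2 \rho] - \operatorname{Tr}[\rho \log_2 \sigma] & \text{if } \operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma), \\ +\infty & \text{otherwise} \end{cases} \tag{7.4.42} \\
&= D(\rho \| \sigma). \notag
\end{align}
To conclude, we have that $\lim_{\alpha\to 1^+} D_\alpha(\rho \| \sigma) = \lim_{\alpha\to 1^-} D_\alpha(\rho \| \sigma) = D(\rho \| \sigma)$,
which means that
\[ \lim_{\alpha\to 1} D_\alpha(\rho \| \sigma) = D(\rho \| \sigma), \tag{7.4.43} \]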
as required. ■
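
The convergence in (7.4.25) can be observed numerically for full-rank operators, where no support issues arise. The following sketch assumes NumPy and evaluates $D_\alpha$ slightly below and slightly above $\alpha = 1$; the helper names and the random test states are hypothetical.

```python
import numpy as np

def mfun(A, f):
    """Apply a scalar function f to a Hermitian matrix via its eigenvalues."""
    evals, vecs = np.linalg.eigh(A)
    return (vecs * f(evals)) @ vecs.conj().T

def petz_renyi(rho, sigma, alpha):
    """D_alpha(rho||sigma) in bits for positive definite rho and sigma."""
    q = np.trace(mfun(rho, lambda x: x ** alpha)
                 @ mfun(sigma, lambda x: x ** (1 - alpha))).real
    return float(np.log2(q) / (alpha - 1))

def rel_entropy(rho, sigma):
    """Quantum relative entropy in bits for positive definite rho and sigma."""
    return float(np.trace(rho @ (mfun(rho, np.log2) - mfun(sigma, np.log2))).real)

def random_state(dim, rng):
    """Hypothetical helper: a random full-rank density matrix."""
    G = rng.normal(size=(dim, 2 * dim)) + 1j * rng.normal(size=(dim, 2 * dim))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(3)
rho, sigma = random_state(3, rng), random_state(3, rng)

D = rel_entropy(rho, sigma)
below = petz_renyi(rho, sigma, 1 - 1e-5)
above = petz_renyi(rho, sigma, 1 + 1e-5)
```

The value from below approaches $D(\rho\|\sigma)$ from underneath and the value from above from overhead, which is what monotonicity of $D_\alpha$ in $\alpha$ predicts.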

Proposition 7.23 Properties of the Petz–Rényi Relative Entropy

For all states 𝜌, 𝜌1 , 𝜌2 and positive semi-definite operators 𝜎, 𝜎1 , 𝜎2 , the
Petz–Rényi relative entropy satisfies the following properties.
1. Isometric invariance: For all 𝛼 ∈ (0, 1) ∪ (1, ∞) and for all isometries 𝑉,

𝐷 𝛼 (𝜌∥𝜎) = 𝐷 𝛼 (𝑉 𝜌𝑉 † ∥𝑉 𝜎𝑉 † ). (7.4.44)

2. Monotonicity in 𝛼: For all 𝛼 ∈ (0, 1) ∪ (1, ∞), 𝐷 𝛼 is monotonically

increasing in 𝛼, i.e., 𝛼 < 𝛽 implies 𝐷 𝛼 (𝜌∥𝜎) ≤ 𝐷 𝛽 (𝜌∥𝜎).
3. Additivity: For all 𝛼 ∈ (0, 1) ∪ (1, ∞),

𝐷 𝛼 (𝜌1 ⊗ 𝜌2 ∥𝜎1 ⊗ 𝜎2 ) = 𝐷 𝛼 (𝜌1 ∥𝜎1 ) + 𝐷 𝛼 (𝜌2 ∥𝜎2 ). (7.4.45)

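4. Direct-sum property: Let $p : \mathcal{X} \to [0, 1]$ be a probability distribution
over a finite alphabet $\mathcal{X}$ with associated $|\mathcal{X}|$-dimensional system $X$, and let
$q : \mathcal{X} \to [0, \infty)$ be a positive function on $\mathcal{X}$. Let $\{\rho^x_A\}_{x\in\mathcal{X}}$ be a set of states
on a system $A$, and let $\{\sigma^x_A\}_{x\in\mathcal{X}}$ be a set of positive semi-definite operators
on $A$. Then,
\[ Q_\alpha(\rho_{XA} \| \sigma_{XA}) = \sum_{x\in\mathcal{X}} p(x)^\alpha q(x)^{1-\alpha} Q_\alpha(\rho^x_A \| \sigma^x_A), \tag{7.4.46} \]
where
\begin{align}
\rho_{XA} &:= \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho^x_A, \tag{7.4.47} \\
\sigma_{XA} &:= \sum_{x\in\mathcal{X}} q(x) |x\rangle\langle x|_X \otimes \sigma^x_A. \tag{7.4.48}
\end{align}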

Remark: Observe that the direct-sum property analogous to that for the quantum relative entropy
(see Proposition 7.3) does not hold for the Petz–Rényi relative entropy for every 𝛼 ∈ (0, 1) ∪ (1, ∞).
We can instead only make a statement for the Petz–Rényi relative quasi-entropy.

Proof:
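1. Proof of isometric invariance: Let us start by writing $D_\alpha(\rho \| \sigma)$ as in (7.4.8):
\[ D_\alpha(\rho \| \sigma) = \lim_{\varepsilon\to 0^+} \frac{1}{\alpha-1} \log_2 \operatorname{Tr}\!\left[ \rho^\alpha (\sigma + \varepsilon \mathbb{1})^{1-\alpha} \right]. \tag{7.4.49} \]
Then, using the fact that $(V \rho V^\dagger)^\alpha = V \rho^\alpha V^\dagger$, we find that
\begin{align}
D_\alpha(V \rho V^\dagger \| V \sigma V^\dagger)
&= \lim_{\varepsilon\to 0^+} \frac{1}{\alpha-1} \log_2 \operatorname{Tr}\!\left[ (V \rho V^\dagger)^\alpha (V \sigma V^\dagger + \varepsilon \mathbb{1})^{1-\alpha} \right] \tag{7.4.50} \\
&= \lim_{\varepsilon\to 0^+} \frac{1}{\alpha-1} \log_2 \operatorname{Tr}\!\left[ V \rho^\alpha V^\dagger (V \sigma V^\dagger + \varepsilon \mathbb{1})^{1-\alpha} \right]. \tag{7.4.51}
\end{align}
Now, let $\Pi := V V^\dagger$ be the projection onto the image of $V$, and let $\hat{\Pi} := \mathbb{1} - \Pi$.
Then, we can write
\[ V \sigma V^\dagger + \varepsilon \mathbb{1} = V \sigma V^\dagger + \varepsilon \Pi + \varepsilon \hat{\Pi} = V (\sigma + \varepsilon \mathbb{1}) V^\dagger + \varepsilon \hat{\Pi}. \tag{7.4.52} \]
Since $V (\sigma + \varepsilon \mathbb{1}) V^\dagger$ and $\varepsilon \hat{\Pi}$ are supported on orthogonal subspaces, we obtain
\[ (V \sigma V^\dagger + \varepsilon \mathbb{1})^{1-\alpha} = V (\sigma + \varepsilon \mathbb{1})^{1-\alpha} V^\dagger + \varepsilon^{1-\alpha} \hat{\Pi}. \tag{7.4.53} \]
Therefore,
\begin{align}
\operatorname{Tr}\!\left[ V \rho^\alpha V^\dagger (V \sigma V^\dagger + \varepsilon \mathbb{1})^{1-\alpha} \right]
&= \operatorname{Tr}\!\left[ V \rho^\alpha V^\dagger V (\sigma + \varepsilon \mathbb{1})^{1-\alpha} V^\dagger + \varepsilon^{1-\alpha} V \rho^\alpha V^\dagger \hat{\Pi} \right] \tag{7.4.54} \\
&= \operatorname{Tr}\!\left[ V \rho^\alpha (\sigma + \varepsilon \mathbb{1})^{1-\alpha} V^\dagger \right] \tag{7.4.55} \\
&= \operatorname{Tr}\!\left[ \rho^\alpha (\sigma + \varepsilon \mathbb{1})^{1-\alpha} \right], \tag{7.4.56}
\end{align}
where the second equality follows from the fact that $V^\dagger \hat{\Pi} V = V^\dagger V - V^\dagger V V^\dagger V =
\mathbb{1} - \mathbb{1} = 0$, and the last equality from cyclicity of the trace. Therefore,
\begin{align}
D_\alpha(V \rho V^\dagger \| V \sigma V^\dagger) &= \lim_{\varepsilon\to 0^+} \frac{1}{\alpha-1} \log_2 \operatorname{Tr}\!\left[ \rho^\alpha (\sigma + \varepsilon \mathbb{1})^{1-\alpha} \right] \tag{7.4.57} \\
&= D_\alpha(\rho \| \sigma), \tag{7.4.58}
\end{align}
as required.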
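2. Proof of monotonicity in $\alpha$: Using the expression in (7.4.2) for $D_\alpha$ along with
the form in (7.4.23) for the quasi-entropy $Q_\alpha$, let us write $D_\alpha(\rho \| \sigma)$ as
\[ D_\alpha(\rho \| \sigma) = \frac{1}{\alpha-1} \frac{\ln \langle \varphi^\rho | X^{1-\alpha} | \varphi^\rho \rangle}{\ln(2)} = -\frac{1}{\gamma} \frac{\ln \langle \varphi^\rho | X^\gamma | \varphi^\rho \rangle}{\ln(2)}, \tag{7.4.59} \]
where $X = \rho^{-1} \otimes \sigma^{\mathsf{T}}$, $\gamma := 1 - \alpha$, and $|\varphi^\rho\rangle = (\rho^{1/2} \otimes \mathbb{1}) |\Gamma\rangle$ is a purification of
$\rho$. We first prove the result for $\rho$ invertible, and the proof for non-invertible
states $\rho$ follows by (7.4.24). Now, since $\frac{\mathrm{d}}{\mathrm{d}\alpha} = \frac{\mathrm{d}\gamma}{\mathrm{d}\alpha} \frac{\mathrm{d}}{\mathrm{d}\gamma} = -\frac{\mathrm{d}}{\mathrm{d}\gamma}$, we find that
\begin{align}
\frac{\mathrm{d}}{\mathrm{d}\alpha} D_\alpha(\rho \| \sigma)
&= \frac{1}{\ln(2)} \frac{\mathrm{d}}{\mathrm{d}\gamma} \left( \frac{1}{\gamma} \ln \langle \varphi^\rho | X^\gamma | \varphi^\rho \rangle \right) \tag{7.4.60} \\
&= \frac{1}{\ln(2)} \left( -\frac{1}{\gamma^2} \ln \langle \varphi^\rho | X^\gamma | \varphi^\rho \rangle + \frac{1}{\gamma} \frac{\langle \varphi^\rho | X^\gamma \ln X | \varphi^\rho \rangle}{\langle \varphi^\rho | X^\gamma | \varphi^\rho \rangle} \right) \tag{7.4.61} \\
&= \frac{1}{\ln(2)} \frac{-\langle \varphi^\rho | X^\gamma | \varphi^\rho \rangle \ln \langle \varphi^\rho | X^\gamma | \varphi^\rho \rangle + \gamma \langle \varphi^\rho | X^\gamma \ln X | \varphi^\rho \rangle}{\gamma^2 \langle \varphi^\rho | X^\gamma | \varphi^\rho \rangle} \tag{7.4.62} \\
&= \frac{1}{\ln(2)} \frac{-\langle \varphi^\rho | X^\gamma | \varphi^\rho \rangle \ln \langle \varphi^\rho | X^\gamma | \varphi^\rho \rangle + \langle \varphi^\rho | X^\gamma \ln X^\gamma | \varphi^\rho \rangle}{\gamma^2 \langle \varphi^\rho | X^\gamma | \varphi^\rho \rangle}. \tag{7.4.63}
\end{align}
Letting $g(x) := x \log_2 x$, it follows that
\[ \frac{\mathrm{d}}{\mathrm{d}\alpha} D_\alpha(\rho \| \sigma) = \frac{\langle \varphi^\rho | g(X^\gamma) | \varphi^\rho \rangle - g(\langle \varphi^\rho | X^\gamma | \varphi^\rho \rangle)}{\gamma^2 \langle \varphi^\rho | X^\gamma | \varphi^\rho \rangle}. \tag{7.4.64} \]
Then, since $g$ is operator convex, by the operator Jensen inequality in (2.3.23),
we conclude that
\[ \langle \varphi^\rho | g(X^\gamma) | \varphi^\rho \rangle \geq g(\langle \varphi^\rho | X^\gamma | \varphi^\rho \rangle), \tag{7.4.65} \]
which means that $\frac{\mathrm{d}}{\mathrm{d}\alpha} D_\alpha(\rho \| \sigma) \geq 0$. Therefore, $D_\alpha(\rho \| \sigma)$ is monotonically
increasing in $\alpha$, as required.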
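3. Proof of additivity: When all quantities are finite, we have that
\[ D_\alpha(\rho_1 \otimes \rho_2 \| \sigma_1 \otimes \sigma_2) = \frac{1}{\alpha-1} \log_2 \operatorname{Tr}\!\left[ (\rho_1 \otimes \rho_2)^\alpha (\sigma_1 \otimes \sigma_2)^{1-\alpha} \right]. \tag{7.4.66} \]
Using the fact that $(X \otimes Y)^\beta = X^\beta \otimes Y^\beta$ for all positive semi-definite operators
$X$, $Y$ and all $\beta \in \mathbb{R}$, we obtain
\begin{align}
Q_\alpha(\rho_1 \otimes \rho_2 \| \sigma_1 \otimes \sigma_2) &= \operatorname{Tr}\!\left[ (\rho_1 \otimes \rho_2)^\alpha (\sigma_1 \otimes \sigma_2)^{1-\alpha} \right] \tag{7.4.67} \\
&= \operatorname{Tr}\!\left[ (\rho_1^\alpha \otimes \rho_2^\alpha)(\sigma_1^{1-\alpha} \otimes \sigma_2^{1-\alpha}) \right] \tag{7.4.68} \\
&= \operatorname{Tr}\!\left[ \rho_1^\alpha \sigma_1^{1-\alpha} \otimes \rho_2^\alpha \sigma_2^{1-\alpha} \right] \tag{7.4.69} \\
&= \operatorname{Tr}[\rho_1^\alpha \sigma_1^{1-\alpha}] \cdot \operatorname{Tr}[\rho_2^\alpha \sigma_2^{1-\alpha}] \tag{7.4.70} \\
&= Q_\alpha(\rho_1 \| \sigma_1) \cdot Q_\alpha(\rho_2 \| \sigma_2). \tag{7.4.71}
\end{align}
Applying $\frac{1}{\alpha-1} \log_2$ and definitions, additivity follows.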
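4. Proof of the direct-sum property: Define the classical–quantum operators
\[ \rho_{XA} := \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho^x_A, \qquad \sigma_{XA} := \sum_{x\in\mathcal{X}} q(x) |x\rangle\langle x|_X \otimes \sigma^x_A. \tag{7.4.72} \]
Then,
\begin{align}
\rho_{XA}^\alpha &= \sum_{x\in\mathcal{X}} |x\rangle\langle x|_X \otimes (p(x) \rho^x_A)^\alpha, \tag{7.4.73} \\
\sigma_{XA}^{1-\alpha} &= \sum_{x\in\mathcal{X}} |x\rangle\langle x|_X \otimes (q(x) \sigma^x_A)^{1-\alpha}, \tag{7.4.74}
\end{align}
so that
\begin{align}
\rho_{XA}^\alpha \sigma_{XA}^{1-\alpha} &= \sum_{x\in\mathcal{X}} |x\rangle\langle x|_X \otimes (p(x) \rho^x_A)^\alpha (q(x) \sigma^x_A)^{1-\alpha} \tag{7.4.75} \\
&= \sum_{x\in\mathcal{X}} p(x)^\alpha q(x)^{1-\alpha} |x\rangle\langle x|_X \otimes (\rho^x_A)^\alpha (\sigma^x_A)^{1-\alpha}, \tag{7.4.76}
\end{align}
and
\begin{align}
Q_\alpha(\rho_{XA} \| \sigma_{XA}) &= \operatorname{Tr}[\rho_{XA}^\alpha \sigma_{XA}^{1-\alpha}] \tag{7.4.77} \\
&= \sum_{x\in\mathcal{X}} p(x)^\alpha q(x)^{1-\alpha} \operatorname{Tr}[(\rho^x_A)^\alpha (\sigma^x_A)^{1-\alpha}] \tag{7.4.78} \\
&= \sum_{x\in\mathcal{X}} p(x)^\alpha q(x)^{1-\alpha} Q_\alpha(\rho^x_A \| \sigma^x_A), \tag{7.4.79}
\end{align}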

as required. ■
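As a quick numerical sanity check of properties 2 and 3 (not part of the proof), the following sketch specializes to commuting $\rho$ and $\sigma$. In that case the Petz–Rényi relative entropy reduces to the classical Rényi relative entropy of the eigenvalue vectors. The function names and the example distributions are illustrative assumptions, not taken from the text.

```python
import math

def renyi_rel_entropy(p, q, alpha):
    """Classical Renyi relative entropy in bits, for alpha != 1.

    p and q play the role of the eigenvalues of commuting rho and sigma.
    Assumes q[i] > 0 wherever p[i] > 0.
    """
    quasi = sum(pi**alpha * qi**(1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log2(quasi) / (alpha - 1)

def tensor(v1, v2):
    """Eigenvalues of the tensor product of two diagonal operators."""
    return [a * b for a in v1 for b in v2]

# Hypothetical example distributions.
p1, q1 = [0.7, 0.3], [0.4, 0.6]
p2, q2 = [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]

# Monotonicity in alpha: the values should be non-decreasing.
alphas = [0.1, 0.5, 0.9, 1.5, 2.0]
values = [renyi_rel_entropy(p1, q1, a) for a in alphas]
is_monotone = all(v <= w + 1e-12 for v, w in zip(values, values[1:]))

# Additivity under tensor products, here at alpha = 1.5.
lhs = renyi_rel_entropy(tensor(p1, p2), tensor(q1, q2), 1.5)
rhs = renyi_rel_entropy(p1, q1, 1.5) + renyi_rel_entropy(p2, q2, 1.5)
```

Here `is_monotone` should evaluate to true and `lhs` should agree with `rhs` up to floating-point error; a non-commuting check would need matrix powers instead of entrywise powers.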

We now prove the data-processing inequality for the Petz–Rényi relative entropy
for 𝛼 ∈ [0, 1) ∪ (1, 2].

Theorem 7.24 Data-Processing Inequality for Petz–Rényi Relative Entropy
Let 𝜌 be a state, 𝜎 a positive semi-definite operator, and N a quantum channel.
Then, for all 𝛼 ∈ [0, 1) ∪ (1, 2],

𝐷 𝛼 (𝜌∥𝜎) ≥ 𝐷 𝛼 (N(𝜌)∥N(𝜎)). (7.4.80)

Proof: We prove the statement for 𝛼 ∈ (0, 1) ∪ (1, 2]. The case of 𝛼 = 0 then
follows by taking the limit 𝛼 → 0. From Stinespring’s theorem (Theorem 4.3), we
know that the action of every channel N on a linear operator 𝑋 can be written as

N(𝑋) = Tr𝐸 [𝑉 𝑋𝑉 † ], (7.4.81)

for some 𝑉, where 𝑉 is an isometry and 𝐸 is an auxiliary system with dimension

𝑑 𝐸 ≥ rank(ΓN ). As stated in (7.4.44), 𝐷 𝛼 is isometrically invariant. Therefore, it
suffices to show the data-processing inequality for 𝐷 𝛼 under partial trace; i.e., it
suffices to show that for every state 𝜌 𝐴𝐵 and every positive semi-definite operator
𝜎𝐴𝐵 ,
𝐷 𝛼 (𝜌 𝐴𝐵 ∥𝜎𝐴𝐵 ) ≥ 𝐷 𝛼 (𝜌 𝐴 ∥𝜎𝐴 ), 𝛼 ∈ (0, 1) ∪ (1, 2]. (7.4.82)
We now proceed to prove this inequality. We prove it for $\rho_{AB}$, and hence $\rho_A$, invertible, as well as for $\sigma_{AB}$ and $\sigma_A$ invertible. The result follows in the general case of $\rho_{AB}$ and/or $\rho_A$ non-invertible, as well as $\sigma_{AB}$ and/or $\sigma_A$ non-invertible, by applying the result to the invertible operators $(1-\delta)\rho_{AB}+\delta\pi_{AB}$ and $\sigma_{AB}+\varepsilon\mathbb{1}_{AB}$, with $\delta,\varepsilon>0$, and taking the limit $\delta\to 0^+$, followed by $\varepsilon\to 0^+$, because
\begin{align}
D_\alpha(\rho_{AB}\|\sigma_{AB}) &= \lim_{\varepsilon\to 0^+}\lim_{\delta\to 0^+} D_\alpha((1-\delta)\rho_{AB}+\delta\pi_{AB}\,\|\,\sigma_{AB}+\varepsilon\mathbb{1}_{AB}), \tag{7.4.83}\\
D_\alpha(\rho_{A}\|\sigma_{A}) &= \lim_{\varepsilon\to 0^+}\lim_{\delta\to 0^+} D_\alpha((1-\delta)\rho_{A}+\delta\pi_{A}\,\|\,\sigma_{A}+d_B\varepsilon\mathbb{1}_{A}), \tag{7.4.84}
\end{align}
which can be verified in a similar manner to the proof of (7.4.8) in Proposition 7.21. Using the quasi-entropy $Q_\alpha$, we can equivalently write (7.4.82) as
\[
\begin{aligned}
Q_\alpha(\rho_{AB}\|\sigma_{AB}) &\geq Q_\alpha(\rho_A\|\sigma_A), \quad\text{for } \alpha\in(1,2],\\
Q_\alpha(\rho_{AB}\|\sigma_{AB}) &\leq Q_\alpha(\rho_A\|\sigma_A), \quad\text{for } \alpha\in(0,1).
\end{aligned} \tag{7.4.85}
\]
The remainder of this proof is thus devoted to establishing (7.4.85).

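Consider that
\begin{align}
Q_\alpha(\rho_{AB}\|\sigma_{AB}) &= \langle\varphi^{\rho_{AB}}|f(\rho_{AB}^{-1}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}})|\varphi^{\rho_{AB}}\rangle, \tag{7.4.86}\\
Q_\alpha(\rho_{A}\|\sigma_{A}) &= \langle\varphi^{\rho_{A}}|f(\rho_{A}^{-1}\otimes\sigma_{\hat{A}}^{\mathsf{T}})|\varphi^{\rho_{A}}\rangle, \tag{7.4.87}
\end{align}
where we have set
\begin{align}
f(x) &:= x^{1-\alpha}, \tag{7.4.88}\\
|\varphi^{\rho_{AB}}\rangle &:= (\rho_{AB}^{1/2}\otimes\mathbb{1}_{\hat{A}\hat{B}})|\Gamma\rangle_{A\hat{A}}|\Gamma\rangle_{B\hat{B}}, \tag{7.4.89}\\
|\varphi^{\rho_{A}}\rangle &:= (\rho_{A}^{1/2}\otimes\mathbb{1}_{\hat{A}})|\Gamma\rangle_{A\hat{A}}. \tag{7.4.90}
\end{align}
Now, let us define the isometry $V_{A\hat{A}\to AB\hat{A}\hat{B}}$¹ as
\[
V_{A\hat{A}\to AB\hat{A}\hat{B}} = \rho_{AB}^{1/2}(\rho_A^{-1/2}\otimes\mathbb{1}_{\hat{A}})|\Gamma\rangle_{B\hat{B}}. \tag{7.4.91}
\]
Observe then that
\begin{align}
V_{A\hat{A}\to AB\hat{A}\hat{B}}|\varphi^{\rho_A}\rangle_{A\hat{A}} &= \rho_{AB}^{1/2}(\rho_A^{-1/2}\otimes\mathbb{1}_{\hat{A}})(\rho_A^{1/2}\otimes\mathbb{1}_{\hat{A}})|\Gamma\rangle_{A\hat{A}}|\Gamma\rangle_{B\hat{B}} \tag{7.4.92}\\
&= (\rho_{AB}^{1/2}\otimes\mathbb{1}_{\hat{A}\hat{B}})|\Gamma\rangle_{A\hat{A}}|\Gamma\rangle_{B\hat{B}} \tag{7.4.93}\\
&= |\varphi^{\rho_{AB}}\rangle. \tag{7.4.94}
\end{align}
¹ Observe that the isometry is related to the isometric extension in (4.6.39) of the Petz recovery channel for the partial trace, as discussed in Section 4.6.1.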
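We thus obtain, using the operator Jensen inequality (Theorem 2.16),
\begin{align}
Q_\alpha(\rho_{AB}\|\sigma_{AB}) &= \langle\varphi^{\rho_A}|V^\dagger f(\rho_{AB}^{-1}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}})V|\varphi^{\rho_A}\rangle \tag{7.4.95}\\
&\geq \langle\varphi^{\rho_A}|f(V^\dagger(\rho_{AB}^{-1}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}})V)|\varphi^{\rho_A}\rangle, \quad\text{for } \alpha\in(1,2], \tag{7.4.96}
\end{align}
and
\begin{align}
Q_\alpha(\rho_{AB}\|\sigma_{AB}) &= \langle\varphi^{\rho_A}|V^\dagger f(\rho_{AB}^{-1}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}})V|\varphi^{\rho_A}\rangle \tag{7.4.97}\\
&\leq \langle\varphi^{\rho_A}|f(V^\dagger(\rho_{AB}^{-1}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}})V)|\varphi^{\rho_A}\rangle, \quad\text{for } \alpha\in(0,1). \tag{7.4.98}
\end{align}
Note that the operator Jensen inequality is applicable because for $\alpha\in(1,2]$ the function $f$ in (7.4.88) is operator convex and for $\alpha\in(0,1)$ it is operator concave.²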
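Now, consider that
\begin{align}
&V^\dagger(\rho_{AB}^{-1}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}})V \notag\\
&\quad= \langle\Gamma|_{B\hat{B}}(\rho_A^{-1/2}\otimes\mathbb{1}_{\hat{A}})\,\rho_{AB}^{1/2}(\rho_{AB}^{-1}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}})\rho_{AB}^{1/2}\,(\rho_A^{-1/2}\otimes\mathbb{1}_{\hat{A}})|\Gamma\rangle_{B\hat{B}} \tag{7.4.99}\\
&\quad= \langle\Gamma|_{B\hat{B}}(\rho_A^{-1/2}\otimes\mathbb{1}_{\hat{A}})(\rho_{AB}^{0}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}})(\rho_A^{-1/2}\otimes\mathbb{1}_{\hat{A}})|\Gamma\rangle_{B\hat{B}} \tag{7.4.100}\\
&\quad= \langle\Gamma|_{B\hat{B}}(\rho_A^{-1/2}\otimes\mathbb{1}_{\hat{A}})(\mathbb{1}_{AB}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}})(\rho_A^{-1/2}\otimes\mathbb{1}_{\hat{A}})|\Gamma\rangle_{B\hat{B}} \tag{7.4.101}\\
&\quad= \rho_A^{-1}\otimes\langle\Gamma|_{B\hat{B}}\,\sigma_{\hat{A}\hat{B}}^{\mathsf{T}}\,|\Gamma\rangle_{B\hat{B}} \tag{7.4.102}\\
&\quad= \rho_A^{-1}\otimes\sigma_{\hat{A}}^{\mathsf{T}}, \tag{7.4.103}
\end{align}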
where the last equality follows from the fact that
⟨Γ| 𝐵 𝐵ˆ 𝜎𝐴Tˆ 𝐵ˆ |Γ⟩𝐵 𝐵ˆ = Tr 𝐵ˆ [𝜎𝐴Tˆ 𝐵ˆ ] = 𝜎𝐴Tˆ , (7.4.104)
the last equality due to the fact that the transpose is taken on a product basis for
H 𝐴ˆ ⊗ H𝐵ˆ . Therefore, we find that
𝑄 𝛼 (𝜌 𝐴𝐵 ∥𝜎𝐴𝐵 ) ≥ 𝑄 𝛼 (𝜌 𝐴 ∥𝜎𝐴 ), for 𝛼 ∈ (1, 2], (7.4.105)
𝑄 𝛼 (𝜌 𝐴𝐵 ∥𝜎𝐴𝐵 ) ≤ 𝑄 𝛼 (𝜌 𝐴 ∥𝜎𝐴 ), for 𝛼 ∈ (0, 1), (7.4.106)
as required. This establishes the data-processing inequality for 𝐷 𝛼 under partial
trace. Combining this with the isometric invariance of 𝐷 𝛼 and Stinespring’s
theorem, we conclude that
𝐷 𝛼 (𝜌∥𝜎) ≥ 𝐷 𝛼 (N(𝜌)∥N(𝜎)), 𝛼 ∈ (0, 1) ∪ (1, 2] (7.4.107)
for every state 𝜌, positive semi-definite operator 𝜎, and channel N. ■
2 Indeed, the function 𝑥 𝛽 is operator convex for 𝛽 ∈ [−1, 0) ∪ [1, 2] and operator concave for
𝛽 ∈ (0, 1], where here 𝛽 = 1 − 𝛼.


By taking the limit 𝛼 → 1 in the statement of data-processing inequality for 𝐷 𝛼 ,

and using Proposition 7.22, we immediately obtain the data-processing inequality
for the quantum relative entropy, stated previously as Theorem 7.4.

Corollary 7.25 Data-Processing Inequality for Quantum Relative Entropy
Let 𝜌 be a state, 𝜎 a positive semi-definite operator, and N a quantum channel.
Then,
𝐷 (𝜌∥𝜎) ≥ 𝐷 (N(𝜌)∥N(𝜎)). (7.4.108)

The data-processing inequality for the Petz–Rényi relative entropy can be written using the Petz–Rényi relative quasi-entropy $Q_\alpha$ as
\[
\frac{1}{\alpha-1}\log_2 Q_\alpha(\rho\|\sigma) \geq \frac{1}{\alpha-1}\log_2 Q_\alpha(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)) \tag{7.4.109}
\]
for all 𝛼 ∈ [0, 1) ∪ (1, 2]. Then, since 𝛼 − 1 is negative for 𝛼 ∈ [0, 1), we can use
the monotonicity of the function log2 to conclude that

𝑄 𝛼 (𝜌∥𝜎) ≥ 𝑄 𝛼 (N(𝜌)∥N(𝜎)), 𝛼 ∈ (1, 2], (7.4.110)

𝑄 𝛼 (𝜌∥𝜎) ≤ 𝑄 𝛼 (N(𝜌)∥N(𝜎)), 𝛼 ∈ [0, 1). (7.4.111)
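The directions of the inequalities in (7.4.110)–(7.4.111) can be illustrated numerically in the commuting special case, where a quantum channel acting on diagonal operators is a classical channel (a column-stochastic matrix). The sketch below is only an illustration under that assumption; the channel and the distributions are hypothetical.

```python
import math

def quasi_entropy(p, q, alpha):
    """Classical analogue of Q_alpha: sum of p^alpha * q^(1-alpha)."""
    return sum(pi**alpha * qi**(1 - alpha) for pi, qi in zip(p, q) if pi > 0)

def renyi_rel_entropy(p, q, alpha):
    """Classical Renyi relative entropy in bits, for alpha != 1."""
    return math.log2(quasi_entropy(p, q, alpha)) / (alpha - 1)

def apply_channel(channel, p):
    """channel[y][x] is the probability of output y given input x."""
    return [sum(row[x] * p[x] for x in range(len(p))) for row in channel]

# Hypothetical classical channel (each column sums to one) and inputs.
channel = [[0.9, 0.2, 0.5],
           [0.1, 0.8, 0.5]]
p = [0.6, 0.3, 0.1]
q = [0.2, 0.3, 0.5]
p_out, q_out = apply_channel(channel, p), apply_channel(channel, q)

# D_alpha should not increase under the channel for any of these alpha.
dpi_holds = all(
    renyi_rel_entropy(p, q, a) >= renyi_rel_entropy(p_out, q_out, a) - 1e-12
    for a in (0.5, 1.5, 2.0)
)
# Q_alpha decreases for alpha > 1 and increases for alpha < 1.
q_decreases = quasi_entropy(p, q, 2.0) >= quasi_entropy(p_out, q_out, 2.0)
q_increases = quasi_entropy(p, q, 0.5) <= quasi_entropy(p_out, q_out, 0.5)
```

All three flags are expected to be true, reflecting the sign flip caused by the prefactor $\frac{1}{\alpha-1}$.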

With the data-processing inequality for the Petz–Rényi relative entropy in hand,
it is now straightforward to prove some of the following additional properties.

Proposition 7.26 Additional Properties of Petz–Rényi Relative Entropy

The Petz–Rényi relative entropy 𝐷 𝛼 satisfies the following properties for every
state 𝜌 and positive semi-definite operator 𝜎 for 𝛼 ∈ (0, 1) ∪ (1, 2]:
1. If Tr(𝜎) ≤ Tr(𝜌) = 1, then 𝐷 𝛼 (𝜌∥𝜎) ≥ 0.
2. Faithfulness: If Tr[𝜎] ≤ 1, we have that 𝐷 𝛼 (𝜌∥𝜎) = 0 if and only if
𝜌 = 𝜎.
3. If 𝜌 ≤ 𝜎, then 𝐷 𝛼 (𝜌∥𝜎) ≤ 0.
4. For every positive semi-definite operator 𝜎′ such that 𝜎′ ≥ 𝜎, we have
𝐷 𝛼 (𝜌∥𝜎) ≥ 𝐷 𝛼 (𝜌∥𝜎′).


Proof:
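1. By the data-processing inequality for $D_\alpha$ with respect to the trace channel $\mathrm{Tr}$, and letting $x = \mathrm{Tr}(\rho) = 1$ and $y = \mathrm{Tr}(\sigma)$, we find that
\begin{align}
D_\alpha(\rho\|\sigma) \geq D_\alpha(x\|y) &= \frac{1}{\alpha-1}\log_2\mathrm{Tr}[x^{\alpha}y^{1-\alpha}] \tag{7.4.112}\\
&= \frac{1}{\alpha-1}\log_2(y^{1-\alpha}) \tag{7.4.113}\\
&= \frac{1-\alpha}{\alpha-1}\log_2 y \tag{7.4.114}\\
&= -\log_2 y \tag{7.4.115}\\
&\geq 0, \tag{7.4.116}
\end{align}
where the last line follows from the assumption that $y = \mathrm{Tr}(\sigma)\leq 1$.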
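2. Proof of faithfulness: If $\rho = \sigma$, then the following equalities hold for all $\alpha\in(0,1)\cup(1,2)$:
\begin{align}
D_\alpha(\rho\|\rho) &= \frac{1}{\alpha-1}\log_2\mathrm{Tr}[\rho^{\alpha}\rho^{1-\alpha}] \tag{7.4.117}\\
&= \frac{1}{\alpha-1}\log_2\mathrm{Tr}(\rho) \tag{7.4.118}\\
&= 0. \tag{7.4.119}
\end{align}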

Next, suppose that 𝛼 ∈ (0, 1) ∪ (1, 2) and 𝐷 𝛼 (𝜌∥𝜎) = 0. From the above, we
conclude that 𝐷 𝛼 (Tr(𝜌)∥Tr(𝜎)) = − log2 𝑦 ≥ 0. From the fact that log2 𝑦 = 0
if and only if 𝑦 = 1, we conclude that 𝐷 𝛼 (𝜌∥𝜎) = 0 implies Tr(𝜎) = Tr(𝜌) = 1,
so that 𝜎 is a density operator. Then, for every measurement channel M,

𝐷 𝛼 (M(𝜌)∥M(𝜎)) ≤ 𝐷 𝛼 (𝜌∥𝜎) = 0. (7.4.120)

On the other hand, since $\mathrm{Tr}(\sigma) = \mathrm{Tr}(\rho)$,
\begin{align}
D_\alpha(\mathcal{M}(\rho)\|\mathcal{M}(\sigma)) &\geq D_\alpha(\mathrm{Tr}(\mathcal{M}(\rho))\|\mathrm{Tr}(\mathcal{M}(\sigma))) \tag{7.4.121}\\
&= D_\alpha(\mathrm{Tr}(\rho)\|\mathrm{Tr}(\sigma)) \tag{7.4.122}\\
&= 0, \tag{7.4.123}
\end{align}
which means that $D_\alpha(\mathcal{M}(\rho)\|\mathcal{M}(\sigma)) = 0$ for all measurement channels $\mathcal{M}$. Now, recall that $\mathcal{M}(\rho)$ and $\mathcal{M}(\sigma)$ are effectively probability distributions determined by the measurement. Since the classical Rényi relative entropy is equal to zero if and only if its two arguments are equal, we can conclude that $\mathcal{M}(\rho) = \mathcal{M}(\sigma)$. Since this is true for every measurement channel, we conclude from Theorem 6.4 and the fact that the trace norm is a norm that $\rho = \sigma$.
So we have that $D_\alpha(\rho\|\sigma) = 0$ if and only if $\rho = \sigma$, as required.
3. Consider that $\rho\leq\sigma$ implies that $\sigma-\rho\geq 0$. Then define the following positive semi-definite operators:
\begin{align}
\hat{\rho} &:= |0\rangle\langle 0|\otimes\rho, \tag{7.4.124}\\
\hat{\sigma} &:= |0\rangle\langle 0|\otimes\rho + |1\rangle\langle 1|\otimes(\sigma-\rho). \tag{7.4.125}
\end{align}
By exploiting the direct-sum property of Petz–Rényi relative entropy in (7.4.46) and the data-processing inequality, we find that
\[
0 = D_\alpha(\rho\|\rho) = D_\alpha(\hat{\rho}\|\hat{\sigma}) \geq D_\alpha(\rho\|\sigma), \tag{7.4.126}
\]
where the inequality follows from data processing with respect to partial trace over the classical register.
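4. Consider the state $\hat{\rho} := |0\rangle\langle 0|\otimes\rho$ and the operator $\hat{\sigma} := |0\rangle\langle 0|\otimes\sigma + |1\rangle\langle 1|\otimes(\sigma'-\sigma)$, which is positive semi-definite because $\sigma'\geq\sigma$ by assumption. Then
\[
\hat{\rho}^{\alpha}\hat{\sigma}^{1-\alpha} = |0\rangle\langle 0|\otimes\rho^{\alpha}\sigma^{1-\alpha}, \tag{7.4.127}
\]
which implies that
\[
D_\alpha(\hat{\rho}\|\hat{\sigma}) = D_\alpha(\rho\|\sigma). \tag{7.4.128}
\]
Then, observing that $\mathrm{Tr}_1[\hat{\sigma}] = \sigma'$, and using the data-processing inequality for $D_\alpha$ with respect to the partial trace channel $\mathrm{Tr}_1$, we conclude that
\[
D_\alpha(\rho\|\sigma') = D_\alpha(\mathrm{Tr}_1(\hat{\rho})\|\mathrm{Tr}_1(\hat{\sigma})) \leq D_\alpha(\hat{\rho}\|\hat{\sigma}) = D_\alpha(\rho\|\sigma), \tag{7.4.129}
\]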

as required. ■
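Properties 1, 3, and 4 of Proposition 7.26 are easy to probe numerically for commuting operators, where $\rho\leq\sigma$ becomes an entrywise inequality between eigenvalue vectors. The following is a minimal sketch under that commuting assumption; the vectors are hypothetical examples chosen to satisfy the hypotheses of each property.

```python
import math

def renyi_rel_entropy(p, s, alpha):
    """Classical Renyi relative entropy in bits; s need not be normalized."""
    quasi = sum(pi**alpha * si**(1 - alpha) for pi, si in zip(p, s) if pi > 0)
    return math.log2(quasi) / (alpha - 1)

rho = [0.5, 0.3, 0.2]         # eigenvalues of a state, Tr(rho) = 1
sigma_sub = [0.4, 0.2, 0.1]   # Tr(sigma) = 0.7 <= 1      (property 1)
sigma_dom = [0.6, 0.5, 0.2]   # rho <= sigma entrywise    (property 3)
sigma_big = [0.5, 0.6, 0.3]   # sigma_big >= sigma_sub    (property 4)

tol = 1e-12
alphas = (0.5, 1.5, 2.0)
prop1 = all(renyi_rel_entropy(rho, sigma_sub, a) >= -tol for a in alphas)
prop3 = all(renyi_rel_entropy(rho, sigma_dom, a) <= tol for a in alphas)
prop4 = all(
    renyi_rel_entropy(rho, sigma_sub, a)
    >= renyi_rel_entropy(rho, sigma_big, a) - tol
    for a in alphas
)
```

Each flag is expected to be true; the check says nothing about the non-commuting case, which is what the proof above actually covers.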


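Proposition 7.27 Joint Convexity & Concavity of the Petz–Rényi Relative Quasi-Entropy
Let $p:\mathcal{X}\to[0,1]$ be a probability distribution over a finite alphabet $\mathcal{X}$ with associated $|\mathcal{X}|$-dimensional system $X$, let $\{\rho_A^x\}_{x\in\mathcal{X}}$ be a set of states on a system $A$, and let $\{\sigma_A^x\}_{x\in\mathcal{X}}$ be a set of positive semi-definite operators on $A$. Then,
\begin{align}
Q_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x\,\Big\|\,\sum_{x\in\mathcal{X}} p(x)\sigma_A^x\right) &\leq \sum_{x\in\mathcal{X}} p(x)\,Q_\alpha(\rho_A^x\|\sigma_A^x), \quad\text{for } \alpha\in(1,2], \tag{7.4.130}\\
Q_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x\,\Big\|\,\sum_{x\in\mathcal{X}} p(x)\sigma_A^x\right) &\geq \sum_{x\in\mathcal{X}} p(x)\,Q_\alpha(\rho_A^x\|\sigma_A^x), \quad\text{for } \alpha\in[0,1). \tag{7.4.131}
\end{align}
Furthermore, the Petz–Rényi relative entropy $D_\alpha$ is jointly convex for $\alpha\in[0,1)$:
\[
D_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x\,\Big\|\,\sum_{x\in\mathcal{X}} p(x)\sigma_A^x\right) \leq \sum_{x\in\mathcal{X}} p(x)\,D_\alpha(\rho_A^x\|\sigma_A^x), \quad \alpha\in[0,1). \tag{7.4.132}
\]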

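Proof: By the direct-sum property of $Q_\alpha$ and applying (7.4.110)–(7.4.111) and Proposition 7.17, we conclude (7.4.130)–(7.4.131).
For $\alpha\in[0,1)$, applying $\log_2$ to both sides of (7.4.131) and multiplying by $\frac{1}{\alpha-1}$, which is negative, we conclude that
\[
\frac{1}{\alpha-1}\log_2 Q_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x\,\Big\|\,\sum_{x\in\mathcal{X}} p(x)\sigma_A^x\right) \leq \frac{1}{\alpha-1}\log_2\!\left(\sum_{x\in\mathcal{X}} p(x)\,Q_\alpha(\rho_A^x\|\sigma_A^x)\right). \tag{7.4.133}
\]
Then, since $-\log_2$ is a convex function, and using the definition of $D_\alpha$ in terms of $Q_\alpha$, we find that
\begin{align}
D_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x\,\Big\|\,\sum_{x\in\mathcal{X}} p(x)\sigma_A^x\right) &\leq \sum_{x\in\mathcal{X}} p(x)\,\frac{1}{\alpha-1}\log_2 Q_\alpha(\rho_A^x\|\sigma_A^x) \tag{7.4.134}\\
&= \sum_{x\in\mathcal{X}} p(x)\,D_\alpha(\rho_A^x\|\sigma_A^x), \tag{7.4.135}
\end{align}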

as required. ■
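The joint concavity and convexity statements of Proposition 7.27 can be illustrated in the commuting case with a two-element ensemble. The sketch below is a hedged illustration only: the weights and the eigenvalue vectors are hypothetical, and entrywise powers stand in for operator powers.

```python
import math

def quasi(p, s, alpha):
    """Classical analogue of the quasi-entropy Q_alpha."""
    return sum(pi**alpha * si**(1 - alpha) for pi, si in zip(p, s) if pi > 0)

def renyi(p, s, alpha):
    """Classical Renyi relative entropy in bits, for alpha != 1."""
    return math.log2(quasi(p, s, alpha)) / (alpha - 1)

def mix(weights, vectors):
    """Convex combination of vectors with the given weights."""
    return [sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(len(vectors[0]))]

weights = [0.3, 0.7]
rhos = [[0.8, 0.2], [0.1, 0.9]]      # two states (eigenvalue vectors)
sigmas = [[0.5, 0.5], [0.3, 0.4]]    # two positive operators
rho_avg, sigma_avg = mix(weights, rhos), mix(weights, sigmas)

def average(fn, alpha):
    """Ensemble average of fn over the pairs (rho^x, sigma^x)."""
    return sum(w * fn(r, s, alpha) for w, r, s in zip(weights, rhos, sigmas))

# (7.4.130): joint convexity of Q_alpha for alpha in (1, 2].
q_convex = quasi(rho_avg, sigma_avg, 2.0) <= average(quasi, 2.0) + 1e-12
# (7.4.131): joint concavity of Q_alpha for alpha in [0, 1).
q_concave = quasi(rho_avg, sigma_avg, 0.5) >= average(quasi, 0.5) - 1e-12
# (7.4.132): joint convexity of D_alpha for alpha in [0, 1).
d_convex = renyi(rho_avg, sigma_avg, 0.5) <= average(renyi, 0.5) + 1e-12
```

All three flags should be true for this ensemble.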

7.5 Sandwiched Rényi Relative Entropy

A second example of a generalized divergence is the sandwiched Rényi relative
entropy, which we define as follows.

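Definition 7.28 Sandwiched Rényi Relative Entropy

For all $\alpha\in(0,1)\cup(1,\infty)$, we define the sandwiched Rényi relative quasi-entropy for every state $\rho$ and positive semi-definite operator $\sigma$ as
\[
\widetilde{Q}_\alpha(\rho\|\sigma) :=
\begin{cases}
\mathrm{Tr}\!\left[\left(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right] & \text{if } \alpha\in(0,1), \text{ or } \alpha\in(1,\infty),\ \mathrm{supp}(\rho)\subseteq\mathrm{supp}(\sigma),\\
+\infty & \text{otherwise.}
\end{cases} \tag{7.5.1}
\]
The sandwiched Rényi relative entropy is then defined as
\[
\widetilde{D}_\alpha(\rho\|\sigma) := \frac{1}{\alpha-1}\log_2\widetilde{Q}_\alpha(\rho\|\sigma). \tag{7.5.2}
\]

Observe that we can use the definition of the Schatten norm from (2.2.87) to write the sandwiched Rényi relative entropy $\widetilde{D}_\alpha$ in the following different ways:
\begin{align}
\widetilde{D}_\alpha(\rho\|\sigma) &= \frac{\alpha}{\alpha-1}\log_2\left\|\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\right\|_{\alpha} \tag{7.5.3}\\
&= \frac{\alpha}{\alpha-1}\log_2\left\|\rho^{\frac{1}{2}}\sigma^{\frac{1-\alpha}{\alpha}}\rho^{\frac{1}{2}}\right\|_{\alpha} \tag{7.5.4}\\
&= \frac{2\alpha}{\alpha-1}\log_2\left\|\rho^{\frac{1}{2}}\sigma^{\frac{1-\alpha}{2\alpha}}\right\|_{2\alpha}. \tag{7.5.5}
\end{align}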
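The expression in (7.5.4) and Proposition 2.8 then lead us to the following variational characterization of $\widetilde{D}_\alpha$ for $\alpha\in(0,1)\cup(1,\infty)$:
\[
\widetilde{D}_\alpha(\rho\|\sigma) = \sup_{\substack{\tau>0,\\ \mathrm{Tr}(\tau)=1}} \widetilde{D}_\alpha(\rho\|\sigma;\tau), \tag{7.5.6}
\]
where
\[
\widetilde{D}_\alpha(\rho\|\sigma;\tau) :=
\begin{cases}
+\infty & \text{if } \alpha>1,\ \mathrm{supp}(\rho)\nsubseteq\mathrm{supp}(\sigma),\\
\frac{\alpha}{\alpha-1}\log_2\mathrm{Tr}\!\left[\rho^{\frac{1}{2}}\sigma^{\frac{1-\alpha}{\alpha}}\rho^{\frac{1}{2}}\tau^{\frac{\alpha-1}{\alpha}}\right] & \text{otherwise.}
\end{cases} \tag{7.5.7}
\]
Also observe that for $\alpha = \frac{1}{2}$, the sandwiched Rényi relative entropy $\widetilde{D}_{\frac{1}{2}}(\rho\|\sigma)$ can be expressed as
\[
\widetilde{D}_{\frac{1}{2}}(\rho\|\sigma) = -\log_2 F(\rho,\sigma), \tag{7.5.8}
\]
where we recall the definition of the fidelity $F(\rho,\sigma)$ from Definition 6.5.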
In the case $\alpha\in(1,\infty)$, since $1-\alpha$ is negative, we take the inverse of $\sigma$. In case $\sigma$ is not invertible, we take the inverse of $\sigma$ on its support. An alternative to this convention is to define $\widetilde{D}_\alpha(\rho\|\sigma)$ for $\alpha>1$ using only positive definite $\sigma$, and for positive semi-definite $\sigma$, define
\[
\widetilde{D}_\alpha(\rho\|\sigma) = \lim_{\varepsilon\to 0^+}\widetilde{D}_\alpha(\rho\|\sigma+\varepsilon\mathbb{1}). \tag{7.5.9}
\]
Both alternatives are equivalent, as we now show (similar to what we did in the proofs of Propositions 7.2 and 7.21).

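Proposition 7.29
For every state $\rho$ and positive semi-definite operator $\sigma$,
\[
\widetilde{D}_\alpha(\rho\|\sigma) = \lim_{\varepsilon\to 0^+}\frac{1}{\alpha-1}\log_2\mathrm{Tr}\!\left[\left(\rho^{\frac{1}{2}}(\sigma+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}\rho^{\frac{1}{2}}\right)^{\alpha}\right]. \tag{7.5.10}
\]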
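Proof: For $\alpha\in(1,\infty)$, since $1-\alpha$ is negative and $\sigma$ is not necessarily invertible, let us start by decomposing the underlying Hilbert space $\mathcal{H}$ as $\mathcal{H} = \mathrm{supp}(\sigma)\oplus\ker(\sigma)$, as in (7.2.6), so that
\[
\rho^{\frac{1}{2}} = \begin{pmatrix} (\rho^{\frac{1}{2}})_{0,0} & (\rho^{\frac{1}{2}})_{0,1}\\ (\rho^{\frac{1}{2}})_{0,1}^{\dagger} & (\rho^{\frac{1}{2}})_{1,1} \end{pmatrix}, \qquad \sigma = \begin{pmatrix} \sigma & 0\\ 0 & 0 \end{pmatrix}. \tag{7.5.11}
\]
Then, writing $\mathbb{1} = \Pi_\sigma + \Pi_\sigma^{\perp}$, where $\Pi_\sigma$ is the projection onto the support of $\sigma$ and $\Pi_\sigma^{\perp}$ is the projection onto the orthogonal complement of $\mathrm{supp}(\sigma)$, we find that
\[
\sigma+\varepsilon\mathbb{1} = \begin{pmatrix} \sigma+\varepsilon\Pi_\sigma & 0\\ 0 & \varepsilon\Pi_\sigma^{\perp} \end{pmatrix}, \tag{7.5.12}
\]
which implies that
\[
(\sigma+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}} = \begin{pmatrix} (\sigma+\varepsilon\Pi_\sigma)^{\frac{1-\alpha}{\alpha}} & 0\\ 0 & (\varepsilon\Pi_\sigma^{\perp})^{\frac{1-\alpha}{\alpha}} \end{pmatrix}. \tag{7.5.13}
\]
If $\mathrm{supp}(\rho)\subseteq\mathrm{supp}(\sigma)$, then $(\rho^{\frac{1}{2}})_{0,1} = 0$, $(\rho^{\frac{1}{2}})_{1,1} = 0$, and $(\rho^{\frac{1}{2}})_{0,0} = \rho^{\frac{1}{2}}$, which means that
\[
\rho^{\frac{1}{2}}(\sigma+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}\rho^{\frac{1}{2}} = \begin{pmatrix} (\rho^{\frac{1}{2}})_{0,0}(\sigma+\varepsilon\Pi_\sigma)^{\frac{1-\alpha}{\alpha}}(\rho^{\frac{1}{2}})_{0,0} & 0\\ 0 & 0 \end{pmatrix}, \tag{7.5.14}
\]
so that
\[
\lim_{\varepsilon\to 0^+}\frac{1}{\alpha-1}\log_2\mathrm{Tr}\!\left[\left(\rho^{\frac{1}{2}}(\sigma+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}\rho^{\frac{1}{2}}\right)^{\alpha}\right] = \widetilde{D}_\alpha(\rho\|\sigma), \quad \alpha\in(1,\infty), \tag{7.5.15}
\]
as required.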
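If $\mathrm{supp}(\rho)\nsubseteq\mathrm{supp}(\sigma)$, then $(\rho^{\frac{1}{2}})_{1,1}$ is non-zero. In this case, we use the fact that
\[
(\sigma+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}} = \begin{pmatrix} (\sigma+\varepsilon\Pi_\sigma)^{\frac{1-\alpha}{\alpha}} & 0\\ 0 & (\varepsilon\Pi_\sigma^{\perp})^{\frac{1-\alpha}{\alpha}} \end{pmatrix} \geq \begin{pmatrix} 0 & 0\\ 0 & (\varepsilon\Pi_\sigma^{\perp})^{\frac{1-\alpha}{\alpha}} \end{pmatrix} \tag{7.5.16}
\]
to conclude that
\begin{align}
\rho^{\frac{1}{2}}(\sigma+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}\rho^{\frac{1}{2}}
&\geq \begin{pmatrix}
(\rho^{\frac{1}{2}})_{0,1}(\varepsilon\Pi_\sigma^{\perp})^{\frac{1-\alpha}{\alpha}}(\rho^{\frac{1}{2}})_{0,1}^{\dagger} & (\rho^{\frac{1}{2}})_{0,1}(\varepsilon\Pi_\sigma^{\perp})^{\frac{1-\alpha}{\alpha}}(\rho^{\frac{1}{2}})_{1,1}\\
(\rho^{\frac{1}{2}})_{1,1}(\varepsilon\Pi_\sigma^{\perp})^{\frac{1-\alpha}{\alpha}}(\rho^{\frac{1}{2}})_{0,1}^{\dagger} & (\rho^{\frac{1}{2}})_{1,1}(\varepsilon\Pi_\sigma^{\perp})^{\frac{1-\alpha}{\alpha}}(\rho^{\frac{1}{2}})_{1,1}
\end{pmatrix} \tag{7.5.17}\\
&= \varepsilon^{\frac{1-\alpha}{\alpha}}\begin{pmatrix}
(\rho^{\frac{1}{2}})_{0,1}(\Pi_\sigma^{\perp})^{\frac{1-\alpha}{\alpha}}(\rho^{\frac{1}{2}})_{0,1}^{\dagger} & (\rho^{\frac{1}{2}})_{0,1}(\Pi_\sigma^{\perp})^{\frac{1-\alpha}{\alpha}}(\rho^{\frac{1}{2}})_{1,1}\\
(\rho^{\frac{1}{2}})_{1,1}(\Pi_\sigma^{\perp})^{\frac{1-\alpha}{\alpha}}(\rho^{\frac{1}{2}})_{0,1}^{\dagger} & (\rho^{\frac{1}{2}})_{1,1}(\Pi_\sigma^{\perp})^{\frac{1-\alpha}{\alpha}}(\rho^{\frac{1}{2}})_{1,1}
\end{pmatrix}. \tag{7.5.18}
\end{align}
Now, since $\alpha\in(1,\infty)$, we have that $\lim_{\varepsilon\to 0^+}\varepsilon^{\frac{1-\alpha}{\alpha}} = +\infty$; therefore, by continuity arguments similar to those given above, we conclude that
\[
\lim_{\varepsilon\to 0^+}\frac{1}{\alpha-1}\log_2\mathrm{Tr}\!\left[\left(\rho^{\frac{1}{2}}(\sigma+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}\rho^{\frac{1}{2}}\right)^{\alpha}\right] \geq +\infty, \tag{7.5.19}
\]
for the case $\alpha\in(1,\infty)$ and $\mathrm{supp}(\rho)\nsubseteq\mathrm{supp}(\sigma)$. This implies that
\[
\lim_{\varepsilon\to 0^+}\frac{1}{\alpha-1}\log_2\mathrm{Tr}\!\left[\left(\rho^{\frac{1}{2}}(\sigma+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}\rho^{\frac{1}{2}}\right)^{\alpha}\right] = \widetilde{D}_\alpha(\rho\|\sigma), \tag{7.5.20}
\]
for the case $\alpha\in(1,\infty)$ and $\mathrm{supp}(\rho)\nsubseteq\mathrm{supp}(\sigma)$, as required. ■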

The Petz–Rényi and sandwiched Rényi relative entropies are two ways of
defining a quantum generalization of the classical Rényi relative entropy in (7.4.19).
Indeed, if 𝜌 and 𝜎 are both classical, commuting states (i.e., both are diagonal in
the same basis), then both $D_\alpha(\rho\|\sigma)$ and $\widetilde{D}_\alpha(\rho\|\sigma)$ reduce to the classical Rényi
relative entropy in (7.4.19). In general, there are often many (in fact, typically
infinitely many) ways to generalize classical quantities to the quantum (i.e., non-
commutative) case such that we recover the original classical quantity in the special
case of commuting operators. What distinguishes one generalization from another
is the role that they play in characterizing operational tasks in quantum information
theory, which is a theme explored throughout this book.
We now establish the important fact that the quantum relative entropy is a
special case of the sandwiched Rényi relative entropy in the limit 𝛼 → 1. The
proof proceeds very similarly to the proof of the same property for the Petz–Rényi
relative entropy.

Proposition 7.30
Let $\rho$ be a state and $\sigma$ a positive semi-definite operator. Then, in the limit $\alpha\to 1$, the sandwiched Rényi relative entropy converges to the quantum relative entropy:
\[
\lim_{\alpha\to 1}\widetilde{D}_\alpha(\rho\|\sigma) = D(\rho\|\sigma). \tag{7.5.21}
\]


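Proof: Let us first consider the case $\alpha\in(1,\infty)$. If $\mathrm{supp}(\rho)\nsubseteq\mathrm{supp}(\sigma)$, then $\widetilde{D}_\alpha(\rho\|\sigma) = +\infty$ for all $\alpha\in(1,\infty)$, so that $\lim_{\alpha\to 1^+}\widetilde{D}_\alpha(\rho\|\sigma) = +\infty$. If $\mathrm{supp}(\rho)\subseteq\mathrm{supp}(\sigma)$, then $\widetilde{D}_\alpha(\rho\|\sigma)$ is finite and using (7.5.4) we write
\[
\widetilde{D}_\alpha(\rho\|\sigma) = \frac{1}{\alpha-1}\log_2\widetilde{Q}_\alpha(\rho\|\sigma) = \frac{1}{\alpha-1}\log_2\mathrm{Tr}\!\left[\left(\rho^{\frac{1}{2}}\sigma^{\frac{1-\alpha}{\alpha}}\rho^{\frac{1}{2}}\right)^{\alpha}\right]. \tag{7.5.22}
\]
Let us define the function
\[
\widetilde{Q}_{\alpha,\beta}(\rho\|\sigma) := \mathrm{Tr}\!\left[\left(\rho^{\frac{1}{2}}\sigma^{\frac{1-\alpha}{\alpha}}\rho^{\frac{1}{2}}\right)^{\beta}\right], \tag{7.5.23}
\]
so that $\widetilde{Q}_\alpha(\rho\|\sigma) = \widetilde{Q}_{\alpha,\alpha}(\rho\|\sigma)$. By noting that $\mathrm{supp}(\rho)\subseteq\mathrm{supp}(\sigma)$ implies $\widetilde{Q}_1(\rho\|\sigma) = \mathrm{Tr}[\rho\Pi_\sigma] = 1$ (where $\Pi_\sigma$ is the projection onto the support of $\sigma$), and since $\log_2 1 = 0$, we can write $\widetilde{D}_\alpha(\rho\|\sigma)$ as
\[
\widetilde{D}_\alpha(\rho\|\sigma) = \frac{\log_2\widetilde{Q}_\alpha(\rho\|\sigma) - \log_2\widetilde{Q}_1(\rho\|\sigma)}{\alpha-1}, \tag{7.5.24}
\]
so that
\begin{align}
\lim_{\alpha\to 1}\widetilde{D}_\alpha(\rho\|\sigma) &= \left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\log_2\widetilde{Q}_\alpha(\rho\|\sigma)\right|_{\alpha=1} \tag{7.5.25}\\
&= \frac{1}{\ln(2)}\,\frac{\left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\widetilde{Q}_\alpha(\rho\|\sigma)\right|_{\alpha=1}}{\widetilde{Q}_1(\rho\|\sigma)} \tag{7.5.26}\\
&= \frac{1}{\ln(2)}\left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\widetilde{Q}_\alpha(\rho\|\sigma)\right|_{\alpha=1}, \tag{7.5.27}
\end{align}
and
\[
\left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\widetilde{Q}_\alpha(\rho\|\sigma)\right|_{\alpha=1} = \left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\widetilde{Q}_{\alpha,1}(\rho\|\sigma)\right|_{\alpha=1} + \left.\frac{\mathrm{d}}{\mathrm{d}\beta}\widetilde{Q}_{1,\beta}(\rho\|\sigma)\right|_{\beta=1}. \tag{7.5.28}
\]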
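Then,
\[
\frac{\mathrm{d}}{\mathrm{d}\alpha}\widetilde{Q}_{\alpha,1}(\rho\|\sigma) = \frac{\mathrm{d}}{\mathrm{d}\alpha}\mathrm{Tr}\!\left[\rho\,\sigma^{\frac{1-\alpha}{\alpha}}\right] = -\frac{1}{\alpha^2}\mathrm{Tr}\!\left[\rho\,\sigma^{\frac{1-\alpha}{\alpha}}\ln\sigma\right], \tag{7.5.29}
\]
so that
\[
\left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\widetilde{Q}_{\alpha,1}(\rho\|\sigma)\right|_{\alpha=1} = -\mathrm{Tr}[\rho\,\Pi_\sigma\ln\sigma] = -\mathrm{Tr}[\rho\ln\sigma], \tag{7.5.30}
\]
where we used the fact that $\mathrm{supp}(\rho)\subseteq\mathrm{supp}(\sigma)$ to obtain the last equality. Similarly,
\[
\frac{\mathrm{d}}{\mathrm{d}\beta}\widetilde{Q}_{1,\beta}(\rho\|\sigma) = \frac{\mathrm{d}}{\mathrm{d}\beta}\mathrm{Tr}\!\left[\left(\rho^{\frac{1}{2}}\Pi_\sigma\rho^{\frac{1}{2}}\right)^{\beta}\right] = \frac{\mathrm{d}}{\mathrm{d}\beta}\mathrm{Tr}[\rho^{\beta}] = \mathrm{Tr}[\rho^{\beta}\ln\rho], \tag{7.5.31}
\]
where we again used the fact that $\mathrm{supp}(\rho)\subseteq\mathrm{supp}(\sigma)$ in order to conclude that $\rho^{\frac{1}{2}}\Pi_\sigma\rho^{\frac{1}{2}} = \rho$. Therefore,
\[
\left.\frac{\mathrm{d}}{\mathrm{d}\beta}\widetilde{Q}_{1,\beta}(\rho\|\sigma)\right|_{\beta=1} = \mathrm{Tr}[\rho\ln\rho]. \tag{7.5.32}
\]
So we find that
\[
\lim_{\alpha\to 1}\widetilde{D}_\alpha(\rho\|\sigma) = \frac{1}{\ln(2)}\left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\widetilde{Q}_\alpha(\rho\|\sigma)\right|_{\alpha=1} = \mathrm{Tr}[\rho\log_2\rho] - \mathrm{Tr}[\rho\log_2\sigma] \tag{7.5.33}
\]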

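when $\mathrm{supp}(\rho)\subseteq\mathrm{supp}(\sigma)$. Therefore, for $\alpha\in(1,\infty)$,
\begin{align}
\lim_{\alpha\to 1^+}\widetilde{D}_\alpha(\rho\|\sigma) &= \begin{cases} \mathrm{Tr}[\rho\log_2\rho] - \mathrm{Tr}[\rho\log_2\sigma] & \text{if } \mathrm{supp}(\rho)\subseteq\mathrm{supp}(\sigma),\\ +\infty & \text{otherwise} \end{cases} \tag{7.5.34}\\
&= D(\rho\|\sigma). \notag
\end{align}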

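Let us now consider the case $\alpha\in(0,1)$. If $\mathrm{supp}(\rho)\subseteq\mathrm{supp}(\sigma)$, then since the limit in (7.5.33) holds from both sides, we find that
\[
\lim_{\alpha\to 1^-}\widetilde{D}_\alpha(\rho\|\sigma) = \mathrm{Tr}[\rho\log_2\rho] - \mathrm{Tr}[\rho\log_2\sigma]. \tag{7.5.35}
\]
If $\mathrm{supp}(\rho)\nsubseteq\mathrm{supp}(\sigma)$ (and $\mathrm{Tr}[\rho\sigma]\neq 0$), then observe that we can write $\widetilde{D}_\alpha$ as
\[
\widetilde{D}_\alpha(\rho\|\sigma) = \frac{\log_2\widetilde{Q}_\alpha(\rho\|\sigma) - \log_2\widetilde{Q}_1(\rho\|\sigma)}{\alpha-1} + \frac{\log_2\widetilde{Q}_1(\rho\|\sigma)}{\alpha-1}, \tag{7.5.36}
\]
so that
\[
\lim_{\alpha\to 1^-}\widetilde{D}_\alpha(\rho\|\sigma) = \left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\log_2\widetilde{Q}_\alpha(\rho\|\sigma)\right|_{\alpha=1} + \lim_{\alpha\to 1^-}\frac{\log_2\mathrm{Tr}[\rho\Pi_\sigma]}{\alpha-1}, \tag{7.5.37}
\]
where we have used $\widetilde{Q}_1(\rho\|\sigma) = \mathrm{Tr}[\rho\Pi_\sigma]$. Now, since $\mathrm{supp}(\rho)\nsubseteq\mathrm{supp}(\sigma)$ and $\mathrm{Tr}[\rho\sigma]\neq 0$, we have that $0 < \mathrm{Tr}[\rho\Pi_\sigma] < 1$, which means that $\log_2\mathrm{Tr}[\rho\Pi_\sigma] < 0$. Since $\lim_{\alpha\to 1^-}\frac{1}{\alpha-1} = -\infty$, we find that the second term in (7.5.37) is equal to $+\infty$, which means that $\lim_{\alpha\to 1^-}\widetilde{D}_\alpha(\rho\|\sigma) = +\infty$. Therefore,
\begin{align}
\lim_{\alpha\to 1^-}\widetilde{D}_\alpha(\rho\|\sigma) &= \begin{cases} \mathrm{Tr}[\rho\log_2\rho] - \mathrm{Tr}[\rho\log_2\sigma] & \text{if } \mathrm{supp}(\rho)\subseteq\mathrm{supp}(\sigma),\\ +\infty & \text{otherwise} \end{cases} \tag{7.5.38}\\
&= D(\rho\|\sigma). \notag
\end{align}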

To conclude, we have that $\lim_{\alpha\to 1^+}\widetilde{D}_\alpha(\rho\|\sigma) = \lim_{\alpha\to 1^-}\widetilde{D}_\alpha(\rho\|\sigma) = D(\rho\|\sigma)$,
which means that (7.5.21) holds. ■
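The limit in (7.5.21) can be observed numerically in the commuting case, where the sandwiched Rényi relative entropy coincides with the classical Rényi relative entropy and the quantum relative entropy with the Kullback–Leibler divergence. The sketch below is an illustration under that assumption; the distributions and the step size `1e-4` are arbitrary choices.

```python
import math

def renyi_rel_entropy(p, q, alpha):
    """Classical Renyi relative entropy in bits, for alpha != 1."""
    quasi = sum(pi**alpha * qi**(1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log2(quasi) / (alpha - 1)

def rel_entropy(p, q):
    """Classical relative entropy (Kullback-Leibler divergence) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.6, 0.3, 0.1]
q = [0.2, 0.3, 0.5]

target = rel_entropy(p, q)
from_below = renyi_rel_entropy(p, q, 1 - 1e-4)   # alpha slightly below 1
from_above = renyi_rel_entropy(p, q, 1 + 1e-4)   # alpha slightly above 1
```

By monotonicity in $\alpha$, `from_below` and `from_above` should bracket `target`, and both should lie within a small distance of it.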

In the following proposition, we state several basic properties of the sandwiched

Rényi relative entropy. The proofs of the first four properties are analogous to those
of the same properties of the Petz–Rényi relative entropy. The last property in the
proposition establishes that the sandwiched Rényi relative entropy is always less
than or equal to the Petz–Rényi relative entropy.

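Proposition 7.31 Properties of Sandwiched Rényi Relative Entropy

For all states $\rho, \rho_1, \rho_2$ and positive semi-definite operators $\sigma, \sigma_1, \sigma_2$, the sandwiched Rényi relative entropy $\widetilde{D}_\alpha$ satisfies the following properties:
1. Isometric invariance: For all $\alpha\in(0,1)\cup(1,\infty)$ and for every isometry $V$,
\[
\widetilde{D}_\alpha(\rho\|\sigma) = \widetilde{D}_\alpha(V\rho V^\dagger\|V\sigma V^\dagger). \tag{7.5.39}
\]
2. Monotonicity in $\alpha$: For all $\alpha\in(0,1)\cup(1,\infty)$, $\widetilde{D}_\alpha$ is monotonically increasing in $\alpha$, i.e., $\alpha<\beta$ implies $\widetilde{D}_\alpha(\rho\|\sigma)\leq\widetilde{D}_\beta(\rho\|\sigma)$.
3. Additivity: For all $\alpha\in(0,1)\cup(1,\infty)$,
\[
\widetilde{D}_\alpha(\rho_1\otimes\rho_2\|\sigma_1\otimes\sigma_2) = \widetilde{D}_\alpha(\rho_1\|\sigma_1) + \widetilde{D}_\alpha(\rho_2\|\sigma_2). \tag{7.5.40}
\]
4. Direct-sum property: Let $p:\mathcal{X}\to[0,1]$ be a probability distribution over a finite alphabet $\mathcal{X}$ with associated $|\mathcal{X}|$-dimensional system $X$, and let $q:\mathcal{X}\to[0,\infty)$ be a positive function on $\mathcal{X}$. Let $\{\rho_A^x\}_{x\in\mathcal{X}}$ be a set of states on a system $A$, and let $\{\sigma_A^x\}_{x\in\mathcal{X}}$ be a set of positive semi-definite operators on $A$. Then,
\[
\widetilde{Q}_\alpha(\rho_{XA}\|\sigma_{XA}) = \sum_{x\in\mathcal{X}} p(x)^{\alpha}q(x)^{1-\alpha}\,\widetilde{Q}_\alpha(\rho_A^x\|\sigma_A^x), \tag{7.5.41}
\]
where
\begin{align}
\rho_{XA} &:= \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X\otimes\rho_A^x, \tag{7.5.42}\\
\sigma_{XA} &:= \sum_{x\in\mathcal{X}} q(x)|x\rangle\langle x|_X\otimes\sigma_A^x. \tag{7.5.43}
\end{align}
5. If $\rho_1\leq\gamma\rho_2$ for some $\gamma\geq 1$, then
\[
\widetilde{D}_\alpha(\rho_1\|\sigma) \leq \frac{\alpha}{\alpha-1}\log_2\gamma + \widetilde{D}_\alpha(\rho_2\|\sigma), \quad \alpha>1. \tag{7.5.44}
\]
6. For all $\alpha\in(0,1)\cup(1,\infty)$, the sandwiched Rényi relative entropy $\widetilde{D}_\alpha$ is always less than or equal to the Petz–Rényi relative entropy $D_\alpha$, i.e.,
\[
\widetilde{D}_\alpha(\rho\|\sigma) \leq D_\alpha(\rho\|\sigma). \tag{7.5.45}
\]
Furthermore, for $\alpha\in(0,1)$, we have
\[
\alpha D_\alpha(\rho\|\sigma) + (1-\alpha)(-\log_2\mathrm{Tr}[\sigma]) \leq \widetilde{D}_\alpha(\rho\|\sigma). \tag{7.5.46}
\]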

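Proof:
1. Proof of isometric invariance: Let us start by writing $\widetilde{D}_\alpha(\rho\|\sigma)$ using the function $\|\cdot\|_\alpha$ as in (7.5.4):
\[
\widetilde{D}_\alpha(\rho\|\sigma) = \lim_{\varepsilon\to 0}\frac{\alpha}{\alpha-1}\log_2\left\|\rho^{\frac{1}{2}}(\sigma+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}\rho^{\frac{1}{2}}\right\|_{\alpha}, \tag{7.5.47}
\]
where we have also made use of the fact that for positive semi-definite operators, $\widetilde{D}_\alpha(\rho\|\sigma)$ can be defined as in (7.5.10). Now,
\[
\widetilde{D}_\alpha(V\rho V^\dagger\|V\sigma V^\dagger) = \lim_{\varepsilon\to 0}\frac{\alpha}{\alpha-1}\log_2\left\|(V\rho V^\dagger)^{\frac{1}{2}}(V\sigma V^\dagger+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}(V\rho V^\dagger)^{\frac{1}{2}}\right\|_{\alpha}. \tag{7.5.48}
\]
Since $(V\rho V^\dagger)^{\frac{1}{2}} = V\rho^{\frac{1}{2}}V^\dagger$, we find that
\begin{align}
\left\|(V\rho V^\dagger)^{\frac{1}{2}}(V\sigma V^\dagger+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}(V\rho V^\dagger)^{\frac{1}{2}}\right\|_{\alpha}
&= \left\|V\rho^{\frac{1}{2}}V^\dagger(V\sigma V^\dagger+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}V\rho^{\frac{1}{2}}V^\dagger\right\|_{\alpha} \tag{7.5.49}\\
&= \left\|\rho^{\frac{1}{2}}V^\dagger(V\sigma V^\dagger+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}V\rho^{\frac{1}{2}}\right\|_{\alpha}, \tag{7.5.50}
\end{align}
where the last equality follows from the isometric invariance of the function $\|\cdot\|_\alpha$. Now, let $\Pi := VV^\dagger$ be the projection onto the image of $V$, and let $\hat{\Pi} := \mathbb{1}-\Pi$. Then, we write
\[
V\sigma V^\dagger+\varepsilon\mathbb{1} = V\sigma V^\dagger+\varepsilon\Pi+\varepsilon\hat{\Pi} = V(\sigma+\varepsilon\mathbb{1})V^\dagger+\varepsilon\hat{\Pi}. \tag{7.5.51}
\]
Since $V(\sigma+\varepsilon\mathbb{1})V^\dagger$ and $\varepsilon\hat{\Pi}$ are supported on orthogonal subspaces, we obtain
\[
(V\sigma V^\dagger+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}} = V(\sigma+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}V^\dagger+\varepsilon^{\frac{1-\alpha}{\alpha}}\hat{\Pi}. \tag{7.5.52}
\]
Continuing from (7.5.50), we thus find that
\begin{align}
\left\|\rho^{\frac{1}{2}}V^\dagger(V\sigma V^\dagger+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}V\rho^{\frac{1}{2}}\right\|_{\alpha}
&= \left\|\rho^{\frac{1}{2}}V^\dagger\left(V(\sigma+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}V^\dagger+\varepsilon^{\frac{1-\alpha}{\alpha}}\hat{\Pi}\right)V\rho^{\frac{1}{2}}\right\|_{\alpha} \tag{7.5.53}\\
&= \left\|\rho^{\frac{1}{2}}V^\dagger V(\sigma+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}V^\dagger V\rho^{\frac{1}{2}}+\varepsilon^{\frac{1-\alpha}{\alpha}}\rho^{\frac{1}{2}}V^\dagger\hat{\Pi}V\rho^{\frac{1}{2}}\right\|_{\alpha} \tag{7.5.54}\\
&= \left\|\rho^{\frac{1}{2}}(\sigma+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}\rho^{\frac{1}{2}}\right\|_{\alpha}, \tag{7.5.55}
\end{align}
where the last equality follows from the fact that $V^\dagger\hat{\Pi}V = V^\dagger V - V^\dagger VV^\dagger V = \mathbb{1}-\mathbb{1} = 0$. Therefore,
\begin{align}
\widetilde{D}_\alpha(V\rho V^\dagger\|V\sigma V^\dagger) &= \lim_{\varepsilon\to 0}\frac{\alpha}{\alpha-1}\log_2\left\|\rho^{\frac{1}{2}}(\sigma+\varepsilon\mathbb{1})^{\frac{1-\alpha}{\alpha}}\rho^{\frac{1}{2}}\right\|_{\alpha} \notag\\
&= \widetilde{D}_\alpha(\rho\|\sigma), \tag{7.5.56}
\end{align}
as required.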
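2. Proof of monotonicity in $\alpha$: We make use of the function $\widetilde{D}_\alpha(\rho\|\sigma;\tau)$ defined in (7.5.7), which we can write as
\[
\widetilde{D}_\alpha(\rho\|\sigma;\tau) = -\frac{1}{\gamma}\log_2\langle\varphi^\rho|X^{\gamma}|\varphi^\rho\rangle = -\frac{1}{\gamma}\,\frac{\ln\langle\varphi^\rho|X^{\gamma}|\varphi^\rho\rangle}{\ln(2)}, \tag{7.5.57}
\]
where $X = \tau^{-1}\otimes\sigma^{\mathsf{T}}$, $\gamma := \frac{1-\alpha}{\alpha}$, and $|\varphi^\rho\rangle = (\rho^{\frac{1}{2}}\otimes\mathbb{1})|\Gamma\rangle$ is a purification of $\rho$. We prove monotonicity of this quantity by taking its derivative with respect to $\alpha$ and showing that it is non-negative. Since $\frac{\mathrm{d}\gamma}{\mathrm{d}\alpha} = -\frac{1}{\alpha^2}$, we can express the derivative with respect to $\alpha$ in terms of the derivative with respect to $\gamma$ using $\frac{\mathrm{d}}{\mathrm{d}\alpha} = \frac{\mathrm{d}\gamma}{\mathrm{d}\alpha}\frac{\mathrm{d}}{\mathrm{d}\gamma} = -\frac{1}{\alpha^2}\frac{\mathrm{d}}{\mathrm{d}\gamma}$. Therefore,
\[
\frac{\mathrm{d}}{\mathrm{d}\alpha}\widetilde{D}_\alpha(\rho\|\sigma;\tau) = -\frac{1}{\alpha^2}\frac{\mathrm{d}}{\mathrm{d}\gamma}\left(-\frac{1}{\gamma}\,\frac{\ln\langle\varphi^\rho|X^{\gamma}|\varphi^\rho\rangle}{\ln(2)}\right) = \frac{1}{\alpha^2}\frac{\mathrm{d}f}{\mathrm{d}\gamma}, \tag{7.5.58}
\]
where
\[
f(\gamma) := \frac{1}{\gamma}\,\frac{\ln\langle\varphi^\rho|X^{\gamma}|\varphi^\rho\rangle}{\ln(2)}. \tag{7.5.59}
\]
Now,
\begin{align}
\frac{\mathrm{d}f}{\mathrm{d}\gamma} &= \frac{1}{\ln(2)}\left(-\frac{1}{\gamma^2}\ln\langle\varphi^\rho|X^{\gamma}|\varphi^\rho\rangle + \frac{1}{\gamma}\,\frac{\langle\varphi^\rho|X^{\gamma}\ln X|\varphi^\rho\rangle}{\langle\varphi^\rho|X^{\gamma}|\varphi^\rho\rangle}\right) \tag{7.5.60}\\
&= \frac{-\langle\varphi^\rho|X^{\gamma}|\varphi^\rho\rangle\log_2\langle\varphi^\rho|X^{\gamma}|\varphi^\rho\rangle + \langle\varphi^\rho|X^{\gamma}\log_2 X^{\gamma}|\varphi^\rho\rangle}{\gamma^2\langle\varphi^\rho|X^{\gamma}|\varphi^\rho\rangle}. \tag{7.5.61}
\end{align}
Now, let $g(x) := x\log_2 x$. Then, we can write
\[
\frac{\mathrm{d}f}{\mathrm{d}\gamma} = \frac{\langle\varphi^\rho|g(X^{\gamma})|\varphi^\rho\rangle - g(\langle\varphi^\rho|X^{\gamma}|\varphi^\rho\rangle)}{\gamma^2\langle\varphi^\rho|X^{\gamma}|\varphi^\rho\rangle}. \tag{7.5.62}
\]
Since $g$ is operator convex, the operator Jensen inequality in (2.3.23) implies that
\[
\langle\varphi^\rho|g(X^{\gamma})|\varphi^\rho\rangle \geq g(\langle\varphi^\rho|X^{\gamma}|\varphi^\rho\rangle), \tag{7.5.63}
\]
which implies that $\frac{\mathrm{d}f}{\mathrm{d}\gamma}\geq 0$. Therefore, $\widetilde{D}_\alpha(\rho\|\sigma;\tau)$ is monotonically increasing in $\alpha$ for all $\rho, \sigma, \tau$. By (7.5.6), we conclude that $\widetilde{D}_\alpha(\rho\|\sigma)$ is monotonically increasing in $\alpha$, as required.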
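3. Proof of additivity: When all quantities are finite, we have that
\[
\widetilde{D}_\alpha(\rho_1\otimes\rho_2\|\sigma_1\otimes\sigma_2) = \frac{1}{\alpha-1}\log_2\mathrm{Tr}\!\left[\left((\sigma_1\otimes\sigma_2)^{\frac{1-\alpha}{2\alpha}}(\rho_1\otimes\rho_2)(\sigma_1\otimes\sigma_2)^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right]. \tag{7.5.64}
\]
Using the fact that $(X\otimes Y)^{\beta} = X^{\beta}\otimes Y^{\beta}$ for all positive semi-definite operators $X, Y$ and all $\beta\in\mathbb{R}$, we obtain
\begin{align}
\widetilde{Q}_\alpha(\rho_1\otimes\rho_2\|\sigma_1\otimes\sigma_2)
&= \mathrm{Tr}\!\left[\left((\sigma_1\otimes\sigma_2)^{\frac{1-\alpha}{2\alpha}}(\rho_1\otimes\rho_2)(\sigma_1\otimes\sigma_2)^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right] \tag{7.5.65}\\
&= \mathrm{Tr}\!\left[\left(\left(\sigma_1^{\frac{1-\alpha}{2\alpha}}\otimes\sigma_2^{\frac{1-\alpha}{2\alpha}}\right)(\rho_1\otimes\rho_2)\left(\sigma_1^{\frac{1-\alpha}{2\alpha}}\otimes\sigma_2^{\frac{1-\alpha}{2\alpha}}\right)\right)^{\alpha}\right] \tag{7.5.66}\\
&= \mathrm{Tr}\!\left[\left(\sigma_1^{\frac{1-\alpha}{2\alpha}}\rho_1\sigma_1^{\frac{1-\alpha}{2\alpha}}\otimes\sigma_2^{\frac{1-\alpha}{2\alpha}}\rho_2\sigma_2^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right] \tag{7.5.67}\\
&= \mathrm{Tr}\!\left[\left(\sigma_1^{\frac{1-\alpha}{2\alpha}}\rho_1\sigma_1^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\otimes\left(\sigma_2^{\frac{1-\alpha}{2\alpha}}\rho_2\sigma_2^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right] \tag{7.5.68}\\
&= \mathrm{Tr}\!\left[\left(\sigma_1^{\frac{1-\alpha}{2\alpha}}\rho_1\sigma_1^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right]\cdot\mathrm{Tr}\!\left[\left(\sigma_2^{\frac{1-\alpha}{2\alpha}}\rho_2\sigma_2^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right] \tag{7.5.69}\\
&= \widetilde{Q}_\alpha(\rho_1\|\sigma_1)\cdot\widetilde{Q}_\alpha(\rho_2\|\sigma_2). \tag{7.5.70}
\end{align}
Applying $\frac{1}{\alpha-1}\log_2$ and definitions, additivity follows.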
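4. Proof of the direct-sum property: Define the classical–quantum state and operator, respectively, as
\[
\rho_{XA} := \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X\otimes\rho_A^x, \qquad \sigma_{XA} := \sum_{x\in\mathcal{X}} q(x)|x\rangle\langle x|_X\otimes\sigma_A^x. \tag{7.5.71}
\]
Then, since
\[
\sigma_{XA}^{\frac{1-\alpha}{2\alpha}} = \sum_{x\in\mathcal{X}} |x\rangle\langle x|_X\otimes(q(x)\sigma_A^x)^{\frac{1-\alpha}{2\alpha}}, \tag{7.5.72}
\]
we find that
\begin{align}
\sigma_{XA}^{\frac{1-\alpha}{2\alpha}}\rho_{XA}\sigma_{XA}^{\frac{1-\alpha}{2\alpha}} &= \sum_{x\in\mathcal{X}} |x\rangle\langle x|_X\otimes(q(x)\sigma_A^x)^{\frac{1-\alpha}{2\alpha}}(p(x)\rho_A^x)(q(x)\sigma_A^x)^{\frac{1-\alpha}{2\alpha}} \tag{7.5.73}\\
&= \sum_{x\in\mathcal{X}} p(x)q(x)^{\frac{1-\alpha}{\alpha}}|x\rangle\langle x|_X\otimes(\sigma_A^x)^{\frac{1-\alpha}{2\alpha}}\rho_A^x(\sigma_A^x)^{\frac{1-\alpha}{2\alpha}}, \tag{7.5.74}
\end{align}
which means that
\[
\left(\sigma_{XA}^{\frac{1-\alpha}{2\alpha}}\rho_{XA}\sigma_{XA}^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha} = \sum_{x\in\mathcal{X}} p(x)^{\alpha}q(x)^{1-\alpha}|x\rangle\langle x|_X\otimes\left((\sigma_A^x)^{\frac{1-\alpha}{2\alpha}}\rho_A^x(\sigma_A^x)^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}. \tag{7.5.75}
\]
Taking the trace on both sides of this equation, and using the definition of $\widetilde{Q}_\alpha$, we conclude that
\[
\widetilde{Q}_\alpha(\rho_{XA}\|\sigma_{XA}) = \sum_{x\in\mathcal{X}} p(x)^{\alpha}q(x)^{1-\alpha}\,\widetilde{Q}_\alpha(\rho_A^x\|\sigma_A^x), \tag{7.5.76}
\]
as required.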
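5. From the assumption that $\rho_1\leq\gamma\rho_2$, we have that
\[
\sigma^{\frac{1-\alpha}{2\alpha}}\rho_1\sigma^{\frac{1-\alpha}{2\alpha}} \leq \gamma\,\sigma^{\frac{1-\alpha}{2\alpha}}\rho_2\sigma^{\frac{1-\alpha}{2\alpha}}. \tag{7.5.77}
\]
Then, using (2.2.158), we obtain
\[
\mathrm{Tr}\!\left[\left(\sigma^{\frac{1-\alpha}{2\alpha}}\rho_1\sigma^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right] \leq \gamma^{\alpha}\,\mathrm{Tr}\!\left[\left(\sigma^{\frac{1-\alpha}{2\alpha}}\rho_2\sigma^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right]. \tag{7.5.78}
\]
The result follows after applying the logarithm and dividing by $\alpha-1$ on both sides of this inequality.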
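6. This follows from the Araki–Lieb–Thirring inequalities, which we state here without proof (see the Bibliographic Notes in Section 7.13): for positive semi-definite operators $A$ and $B$ acting on a finite-dimensional Hilbert space, and for $q\geq 0$, the following inequalities hold
(a) $\mathrm{Tr}\!\left[\left(B^{\frac{1}{2}}AB^{\frac{1}{2}}\right)^{rq}\right] \geq \mathrm{Tr}\!\left[\left(B^{\frac{r}{2}}A^{r}B^{\frac{r}{2}}\right)^{q}\right]$ for all $r\in[0,1]$.
(b) $\mathrm{Tr}\!\left[\left(B^{\frac{1}{2}}AB^{\frac{1}{2}}\right)^{rq}\right] \leq \mathrm{Tr}\!\left[\left(B^{\frac{r}{2}}A^{r}B^{\frac{r}{2}}\right)^{q}\right]$ for all $r\geq 1$.
For $\alpha\in(0,1)$, we make use of the first of these inequalities. In particular, we set $q = 1$, $r = \alpha$, $A = \rho$ and $B = \sigma^{\frac{1-\alpha}{\alpha}}$. Then, letting $\gamma := \frac{1-\alpha}{2\alpha}$, we obtain
\[
\mathrm{Tr}\!\left[(\sigma^{\gamma}\rho\,\sigma^{\gamma})^{\alpha}\right] \geq \mathrm{Tr}[\sigma^{\alpha\gamma}\rho^{\alpha}\sigma^{\alpha\gamma}] = \mathrm{Tr}\!\left[\sigma^{\frac{1-\alpha}{2}}\rho^{\alpha}\sigma^{\frac{1-\alpha}{2}}\right] = \mathrm{Tr}[\rho^{\alpha}\sigma^{1-\alpha}], \tag{7.5.79}
\]
where the last equality holds by cyclicity of the trace. Since the logarithm function is monotonically increasing, this inequality implies that
\[
\log_2\mathrm{Tr}\!\left[\left(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right] \geq \log_2\mathrm{Tr}\!\left[\rho^{\alpha}\sigma^{1-\alpha}\right]. \tag{7.5.80}
\]
Finally, since $\alpha-1<0$ for all $\alpha\in(0,1)$, we conclude that
\[
\frac{1}{\alpha-1}\log_2\mathrm{Tr}\!\left[\left(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right] \leq \frac{1}{\alpha-1}\log_2\mathrm{Tr}[\rho^{\alpha}\sigma^{1-\alpha}]. \tag{7.5.81}
\]
That is, $\widetilde{D}_\alpha(\rho\|\sigma)\leq D_\alpha(\rho\|\sigma)$, as required.
For $\alpha\in(1,\infty)$, we make use of the second Araki–Lieb–Thirring inequality above. As before, we let $q = 1$, $r = \alpha$, $A = \rho$, and $B^{\frac{1}{2}} = \sigma^{\gamma}$. We find that
\[
\mathrm{Tr}\!\left[\left(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right] \leq \mathrm{Tr}\!\left[\sigma^{\frac{1-\alpha}{2}}\rho^{\alpha}\sigma^{\frac{1-\alpha}{2}}\right] = \mathrm{Tr}[\rho^{\alpha}\sigma^{1-\alpha}]. \tag{7.5.82}
\]
Then, since the logarithm function is a monotonically increasing function and $\alpha-1>0$ for all $\alpha\in(1,\infty)$, we conclude that
\[
\frac{1}{\alpha-1}\log_2\mathrm{Tr}\!\left[\left(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right] \leq \frac{1}{\alpha-1}\log_2\mathrm{Tr}[\rho^{\alpha}\sigma^{1-\alpha}], \tag{7.5.83}
\]
i.e., $\widetilde{D}_\alpha(\rho\|\sigma)\leq D_\alpha(\rho\|\sigma)$, as required.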
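For the inequality in (7.5.46), with $\rho$ a state and $\sigma$ a positive semi-definite operator, we use the following “reverse” Araki–Lieb–Thirring inequality, which we state here without proof (see the Bibliographic Notes in Section 7.13):
\[
\mathrm{Tr}\!\left[\left(B^{\frac{1}{2}}AB^{\frac{1}{2}}\right)^{rq}\right] \leq \left(\mathrm{Tr}\!\left[\left(B^{\frac{r}{2}}A^{r}B^{\frac{r}{2}}\right)^{q}\right]\right)^{r}\left\|A^{\frac{1-r}{2}}\right\|_{a}^{2rq}\left\|B^{\frac{1-r}{2}}\right\|_{b}^{2rq}. \tag{7.5.84}
\]
This inequality holds for all positive semi-definite operators $A$ and $B$, as well as for $q>0$, $r\in(0,1]$, $a,b\in(0,\infty]$, and for $\frac{1}{2rq} = \frac{1}{2q}+\frac{1}{a}+\frac{1}{b}$. Taking $q = 1$, $r = \alpha$, $A = \rho$, $B = \sigma^{\frac{1-\alpha}{\alpha}}$, $a = \frac{2}{1-\alpha}$, and $b = \frac{2\alpha}{(1-\alpha)^2}$, we obtain
\[
\mathrm{Tr}\!\left[\left(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right] \leq \left(\mathrm{Tr}\!\left[\sigma^{\frac{1-\alpha}{2}}\rho^{\alpha}\sigma^{\frac{1-\alpha}{2}}\right]\right)^{\alpha}\left\|\rho^{\frac{1-\alpha}{2}}\right\|_{\frac{2}{1-\alpha}}^{2\alpha}\left\|\sigma^{\frac{(1-\alpha)^2}{2\alpha}}\right\|_{\frac{2\alpha}{(1-\alpha)^2}}^{2\alpha}. \tag{7.5.85}
\]
Now, because $\rho$ is a state,
\begin{align}
\left\|\rho^{\frac{1-\alpha}{2}}\right\|_{\frac{2}{1-\alpha}}^{2\alpha} &= \left(\mathrm{Tr}\!\left[\left|\rho^{\frac{1-\alpha}{2}}\right|^{\frac{2}{1-\alpha}}\right]\right)^{\alpha(1-\alpha)} \tag{7.5.86}\\
&= \left(\mathrm{Tr}\!\left[\left(\rho^{\frac{1-\alpha}{2}}\right)^{\frac{2}{1-\alpha}}\right]\right)^{\alpha(1-\alpha)} \tag{7.5.87}\\
&= (\mathrm{Tr}[\rho])^{\alpha(1-\alpha)} \tag{7.5.88}\\
&= 1. \tag{7.5.89}
\end{align}
For $\sigma$, we obtain
\[
\left\|\sigma^{\frac{(1-\alpha)^2}{2\alpha}}\right\|_{\frac{2\alpha}{(1-\alpha)^2}}^{2\alpha} = (\mathrm{Tr}[\sigma])^{(1-\alpha)^2}. \tag{7.5.90}
\]
Therefore,
\[
\mathrm{Tr}\!\left[\left(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right] \leq \left(\mathrm{Tr}[\rho^{\alpha}\sigma^{1-\alpha}]\right)^{\alpha}\cdot(\mathrm{Tr}[\sigma])^{(1-\alpha)^2}. \tag{7.5.91}
\]
Taking the logarithm of both sides and multiplying by $\frac{1}{\alpha-1}$, which is negative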
for 𝛼 ∈ (0, 1), we obtain the inequality in (7.5.46). ■
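The two-sided relation between $\widetilde{D}_\alpha$ and $D_\alpha$ in (7.5.45)–(7.5.46) can be checked numerically on a non-commuting example. The sketch below is a minimal illustration restricted to real symmetric positive definite $2\times 2$ matrices, for which matrix powers have a closed form; the helper names and the particular $\rho$ and $\sigma$ are assumptions made for this example.

```python
import math

def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(a, power):
    """Real power of a real symmetric positive definite 2x2 matrix,
    computed from its eigendecomposition (a single Jacobi rotation)."""
    theta = 0.5 * math.atan2(2 * a[0][1], a[0][0] - a[1][1])
    c, s = math.cos(theta), math.sin(theta)
    lam1 = c * c * a[0][0] + 2 * c * s * a[0][1] + s * s * a[1][1]
    lam2 = s * s * a[0][0] - 2 * c * s * a[0][1] + c * c * a[1][1]
    f1, f2 = lam1 ** power, lam2 ** power
    off = c * s * (f1 - f2)
    return [[c * c * f1 + s * s * f2, off], [off, s * s * f1 + c * c * f2]]

def trace(a):
    return a[0][0] + a[1][1]

def petz(rho, sigma, alpha):
    """Petz-Renyi relative entropy: log2 Tr[rho^a sigma^(1-a)] / (a - 1)."""
    q = trace(mat_mul(mat_pow(rho, alpha), mat_pow(sigma, 1 - alpha)))
    return math.log2(q) / (alpha - 1)

def sandwiched(rho, sigma, alpha):
    """Sandwiched Renyi relative entropy as in (7.5.1)-(7.5.2)."""
    s = mat_pow(sigma, (1 - alpha) / (2 * alpha))
    inner = mat_mul(mat_mul(s, rho), s)
    return math.log2(trace(mat_pow(inner, alpha))) / (alpha - 1)

rho = [[0.7, 0.2], [0.2, 0.3]]       # a state: trace one, positive definite
sigma = [[0.6, -0.1], [-0.1, 0.3]]   # positive definite, trace 0.9
tol = 1e-9

# (7.5.45): sandwiched <= Petz for alpha on both sides of 1.
upper_ok = all(sandwiched(rho, sigma, a) <= petz(rho, sigma, a) + tol
               for a in (0.5, 0.8, 1.5, 2.0, 3.0))
# (7.5.46): lower bound on the sandwiched quantity for alpha in (0, 1).
lower_ok = all(
    a * petz(rho, sigma, a) - (1 - a) * math.log2(trace(sigma))
    <= sandwiched(rho, sigma, a) + tol
    for a in (0.3, 0.5, 0.8)
)
```

Both flags are expected to be true. Since `rho` and `sigma` do not commute here, the two entropies generally differ, unlike in the classical case.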

The monotonicity in $\alpha$ of the sandwiched Rényi relative entropy establishes an inequality relating the quantum relative entropy and the fidelity of quantum states $\rho$ and $\sigma$, by picking $\alpha = 1$ and $\alpha = 1/2$, respectively, and applying Proposition 7.30:
\[
D(\rho\|\sigma) \geq -\log_2 F(\rho,\sigma). \tag{7.5.92}
\]
We can modify the lower bound a bit to establish an inequality relating the quantum relative entropy and the trace distance:

Corollary 7.32 Quantum Pinsker Inequality

Let $\rho$ and $\sigma$ be quantum states. Then the following inequality holds
\[
D(\rho\|\sigma) \geq \frac{1}{4\ln 2}\|\rho-\sigma\|_1^2. \tag{7.5.93}
\]

Remark: The constant prefactor can be improved from $\frac{1}{4\ln 2}$ to $\frac{1}{2\ln 2}$, but we do not give a proof here (please consult the Bibliographic Notes in Section 7.13).

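Proof: We can rewrite (7.5.92) as follows:
\begin{align}
D(\rho\|\sigma) &\geq -\frac{1}{\ln 2}\ln F(\rho,\sigma) \tag{7.5.94}\\
&= -\frac{1}{\ln 2}\ln[1-(1-F(\rho,\sigma))] \tag{7.5.95}\\
&\geq \frac{1}{\ln 2}(1-F(\rho,\sigma)) \tag{7.5.96}\\
&\geq \frac{1}{4\ln 2}\|\rho-\sigma\|_1^2. \tag{7.5.97}
\end{align}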
The second inequality follows from − ln(1 − 𝑥) ≥ 𝑥, which holds for 𝑥 ∈ [0, 1].
The final inequality follows from Theorem 6.14. ■
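The chain (7.5.94)–(7.5.97) can be traced numerically for commuting states, where the fidelity becomes the squared Bhattacharyya coefficient and the trace norm becomes the $\ell_1$ distance between probability vectors. The following sketch assumes that classical reduction; the two distributions are hypothetical.

```python
import math

def rel_entropy(p, q):
    """Classical relative entropy in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def fidelity(p, q):
    """Classical fidelity: squared Bhattacharyya coefficient."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q)) ** 2

def l1_distance(p, q):
    """Trace norm of the difference of two commuting states."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

p = [0.6, 0.3, 0.1]
q = [0.2, 0.3, 0.5]

f = fidelity(p, q)
# The four quantities appearing, in order, in (7.5.94)-(7.5.97).
chain = [
    rel_entropy(p, q),
    -math.log2(f),
    (1 - f) / math.log(2),
    l1_distance(p, q) ** 2 / (4 * math.log(2)),
]
chain_is_decreasing = all(a >= b for a, b in zip(chain, chain[1:]))
```

For these inputs the entries of `chain` should be non-increasing, which is exactly the content of the corollary in the classical case.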

Like the quantum relative entropy and the Petz–Rényi relative entropy, the sandwiched Rényi relative entropy is faithful, meaning that for all states $\rho, \sigma$ and all $\alpha\in(0,1)\cup(1,\infty)$,
\[
\widetilde{D}_\alpha(\rho\|\sigma) = 0 \iff \rho = \sigma. \tag{7.5.98}
\]
We prove this in Proposition 7.36 below.

We now prove the data-processing inequality for the sandwiched Rényi relative entropy $\widetilde{D}_\alpha$ for $\alpha\in[1/2,1)\cup(1,\infty)$. This, along with Proposition 7.30, gives us a different way (apart from using the data-processing inequality for the Petz–Rényi relative entropy) to prove the data-processing inequality for the quantum relative entropy.

Theorem 7.33 Data-Processing Inequality for Sandwiched Rényi Relative Entropy
Let $\rho$ be a state, $\sigma$ a positive semi-definite operator, and $\mathcal{N}$ a quantum channel. Then, for all $\alpha\in[1/2,1)\cup(1,\infty)$,
\[
\widetilde{D}_\alpha(\rho\|\sigma) \geq \widetilde{D}_\alpha(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)). \tag{7.5.99}
\]

Proof: This proof follows steps very similar to those in the proof of the data-
processing inequality for the Petz–Rényi relative entropy (Theorem 7.24), with the
key difference being that in this case we make use of the fact that the sandwiched
Rényi relative entropy can be written as the optimization in (7.5.6).
From Stinespring’s theorem (Theorem 4.3), we know that the action of a channel $\mathcal{N}$ on a linear operator $X$ can be written as
\[
\mathcal{N}(X) = \mathrm{Tr}_E[VXV^\dagger], \tag{7.5.100}
\]
for some $V$, where $V$ is an isometry and $E$ is an auxiliary system with dimension $d_E\geq\mathrm{rank}(\Gamma^{\mathcal{N}})$. As stated in (7.5.39), $\widetilde{D}_\alpha$ is isometrically invariant. Therefore, it suffices to prove the data-processing inequality for $\widetilde{D}_\alpha$ under partial trace; i.e., it suffices to show that for every state $\rho_{AB}$, every positive semi-definite operator $\sigma_{AB}$, and all $\alpha\in[1/2,1)\cup(1,\infty)$:
\[
\widetilde{D}_\alpha(\rho_{AB}\|\sigma_{AB}) \geq \widetilde{D}_\alpha(\rho_A\|\sigma_A). \tag{7.5.101}
\]
We now proceed to prove this inequality. We prove it for $\rho_{AB}$, and hence $\rho_A$, invertible, as well as for $\sigma_{AB}$ and $\sigma_A$ invertible. The result follows in the general case of $\rho_{AB}$ and/or $\rho_A$ non-invertible, as well as $\sigma_{AB}$ and/or $\sigma_A$ non-invertible, by applying the result to the invertible operators $(1-\delta)\rho_{AB}+\delta\pi_{AB}$ and $\sigma_{AB}+\varepsilon\mathbb{1}_{AB}$, with $\delta,\varepsilon>0$, and taking the limits $\varepsilon\to 0^+$ and $\delta\to 0^+$, since
\begin{align}
\widetilde{D}_\alpha(\rho_{AB}\|\sigma_{AB}) &= \lim_{\varepsilon\to 0^+}\lim_{\delta\to 0^+}\widetilde{D}_\alpha((1-\delta)\rho_{AB}+\delta\pi_{AB}\,\|\,\sigma_{AB}+\varepsilon\mathbb{1}_{AB}), \tag{7.5.102}\\
\widetilde{D}_\alpha(\rho_{A}\|\sigma_{A}) &= \lim_{\varepsilon\to 0^+}\lim_{\delta\to 0^+}\widetilde{D}_\alpha((1-\delta)\rho_{A}+\delta\pi_{A}\,\|\,\sigma_{A}+d_B\varepsilon\mathbb{1}_{A}), \tag{7.5.103}
\end{align}
which can be verified in a similar manner to the proof of (7.5.10) in Proposition 7.29.
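Let us start by defining the quantity $\widetilde{Q}_\alpha(\rho\|\sigma;\tau)$ as
\[
\widetilde{Q}_\alpha(\rho\|\sigma;\tau) := \langle\varphi^\rho|(\tau^{-1}\otimes\sigma^{\mathsf{T}})^{\frac{1-\alpha}{\alpha}}|\varphi^\rho\rangle, \tag{7.5.104}
\]
where $\tau$ is a positive definite state and
\[
|\varphi^\rho\rangle := (\rho^{\frac{1}{2}}\otimes\mathbb{1})|\Gamma\rangle \tag{7.5.105}
\]
is a purification of $\rho$. We note that
\[
\widetilde{Q}_\alpha(\rho\|\sigma;\tau) = \mathrm{Tr}\!\left[\rho^{\frac{1}{2}}\sigma^{\frac{1-\alpha}{\alpha}}\rho^{\frac{1}{2}}\tau^{\frac{\alpha-1}{\alpha}}\right] \tag{7.5.106}
\]
so that
\[
\widetilde{D}_\alpha(\rho\|\sigma;\tau) = \frac{\alpha}{\alpha-1}\log_2\widetilde{Q}_\alpha(\rho\|\sigma;\tau), \tag{7.5.107}
\]
where we recall the quantity $\widetilde{D}_\alpha(\rho\|\sigma;\tau)$ defined in (7.5.7). Now, to prove (7.5.101), we show that for every positive definite state $\omega_A$, there exists a positive definite state $\tau_{AB}$ such that
\[
\begin{aligned}
\widetilde{Q}_\alpha(\rho_{AB}\|\sigma_{AB};\tau_{AB}) &\geq \widetilde{Q}_\alpha(\rho_A\|\sigma_A;\omega_A), \quad\text{for } \alpha\in(1,\infty),\\
\widetilde{Q}_\alpha(\rho_{AB}\|\sigma_{AB};\tau_{AB}) &\leq \widetilde{Q}_\alpha(\rho_A\|\sigma_A;\omega_A), \quad\text{for } \alpha\in[1/2,1).
\end{aligned} \tag{7.5.108}
\]
With these two inequalities, along with (7.5.107) and (7.5.6), the result follows.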
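Consider that
\begin{align}
\widetilde{Q}_\alpha(\rho_{AB}\|\sigma_{AB};\tau_{AB}) &= \langle\varphi^{\rho_{AB}}| f(\tau_{AB}^{-1}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}}) |\varphi^{\rho_{AB}}\rangle, \tag{7.5.109}\\
\widetilde{Q}_\alpha(\rho_{A}\|\sigma_{A};\omega_{A}) &= \langle\varphi^{\rho_{A}}| f(\omega_{A}^{-1}\otimes\sigma_{\hat{A}}^{\mathsf{T}}) |\varphi^{\rho_{A}}\rangle, \tag{7.5.110}
\end{align}
where we have set
\begin{equation}
f(x) := x^{\frac{1-\alpha}{\alpha}}, \tag{7.5.111}
\end{equation}
and $|\varphi^{\rho_{AB}}\rangle$ and $|\varphi^{\rho_{A}}\rangle$ are the purifications of $\rho_{AB}$ and $\rho_A$, respectively, of the form in (7.5.105).
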
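Now, let us use the same isometry $V_{A\hat{A}\to AB\hat{A}\hat{B}}$ from (7.4.91) that we used in the proof of the data-processing inequality for the Petz–Rényi relative entropy; that is, let
\begin{equation}
V_{A\hat{A}\to AB\hat{A}\hat{B}} := \rho_{AB}^{1/2}(\rho_A^{-1/2}\otimes\mathbb{1}_{\hat{A}})|\Gamma\rangle_{B\hat{B}}. \tag{7.5.114}
\end{equation}
Recall that
\begin{align}
V_{A\hat{A}\to AB\hat{A}\hat{B}}|\varphi^{\rho_A}\rangle_{A\hat{A}} &= \rho_{AB}^{1/2}(\rho_A^{-1/2}\otimes\mathbb{1}_{\hat{A}})(\rho_A^{1/2}\otimes\mathbb{1}_{\hat{A}})|\Gamma\rangle_{A\hat{A}}|\Gamma\rangle_{B\hat{B}} \tag{7.5.115}\\
&= (\rho_{AB}^{1/2}\otimes\mathbb{1}_{\hat{A}\hat{B}})|\Gamma\rangle_{A\hat{A}}|\Gamma\rangle_{B\hat{B}} \tag{7.5.116}\\
&= |\varphi^{\rho_{AB}}\rangle. \tag{7.5.117}
\end{align}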

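We thus obtain, for all $\tau_{AB}$,
\begin{equation}
\begin{aligned}
\widetilde{Q}_\alpha(\rho_{AB}\|\sigma_{AB};\tau_{AB}) &= \langle\varphi^{\rho_A}|V^\dagger f(\tau_{AB}^{-1}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}})V|\varphi^{\rho_A}\rangle\\
&\geq \langle\varphi^{\rho_A}| f(V^\dagger(\tau_{AB}^{-1}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}})V)|\varphi^{\rho_A}\rangle
\end{aligned}
\tag{7.5.118}
\end{equation}
for $\alpha\in(1,\infty)$ and
\begin{equation}
\begin{aligned}
\widetilde{Q}_\alpha(\rho_{AB}\|\sigma_{AB};\tau_{AB}) &= \langle\varphi^{\rho_A}|V^\dagger f(\tau_{AB}^{-1}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}})V|\varphi^{\rho_A}\rangle\\
&\leq \langle\varphi^{\rho_A}| f(V^\dagger(\tau_{AB}^{-1}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}})V)|\varphi^{\rho_A}\rangle
\end{aligned}
\tag{7.5.119}
\end{equation}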

for 𝛼 ∈ [1/2, 1), where to obtain the last inequality in each case we used the operator
Jensen inequality (Theorem 2.16), which is applicable since for 𝛼 ∈ (1, ∞) the
function 𝑓 in (7.5.111) is operator convex and for 𝛼 ∈ [1/2, 1) it is operator concave.
Now, recall that to conclude (7.5.101), we should optimize $\widetilde{D}_\alpha(\rho_{AB}\|\sigma_{AB};\tau_{AB})$ over invertible states $\tau_{AB}$, as per the definition in (7.5.6), in order to obtain $\widetilde{D}_\alpha(\rho_{AB}\|\sigma_{AB})$. Since we only require a lower bound on $\widetilde{D}_\alpha(\rho_{AB}\|\sigma_{AB})$, we can obtain the lower bound in (7.5.101) by simply picking a particular state $\tau_{AB}$ in the optimization in (7.5.6). Let us therefore take
\begin{equation}
\tau_{AB} = \xi_{AB}(\omega_A) := \rho_{AB}^{1/2}\left(\rho_A^{-1/2}\omega_A\rho_A^{-1/2}\otimes\mathbb{1}_B\right)\rho_{AB}^{1/2}, \tag{7.5.120}
\end{equation}
where $\omega_A$ is an arbitrary invertible state. Note that this choice of $\tau_{AB}$ is indeed a state because it is the result of applying the Petz recovery channel $\mathcal{P}_{\rho_{AB},\operatorname{Tr}_B}$ defined in (4.6.30) to $\omega_A$. It is also invertible; in particular,
\begin{equation}
\tau_{AB}^{-1} = [\xi_{AB}(\omega_A)]^{-1} = \rho_{AB}^{-1/2}\left(\rho_A^{1/2}\omega_A^{-1}\rho_A^{1/2}\otimes\mathbb{1}_B\right)\rho_{AB}^{-1/2}. \tag{7.5.121}
\end{equation}

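With the choice in (7.5.120) for $\tau_{AB}$, we find that
\begin{align}
&V^\dagger(\tau_{AB}^{-1}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}})V \notag\\
&\quad= \langle\Gamma|_{B\hat{B}}(\rho_A^{-1/2}\otimes\mathbb{1}_{\hat{A}})\rho_{AB}^{1/2}(\tau_{AB}^{-1}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}})\rho_{AB}^{1/2}(\rho_A^{-1/2}\otimes\mathbb{1}_{\hat{A}})|\Gamma\rangle_{B\hat{B}} \tag{7.5.122}\\
&\quad= \langle\Gamma|_{B\hat{B}}\left[\rho_A^{-1/2}\rho_{AB}^{1/2}\rho_{AB}^{-1/2}\left(\rho_A^{1/2}\omega_A^{-1}\rho_A^{1/2}\otimes\mathbb{1}_B\right)\rho_{AB}^{-1/2}\rho_{AB}^{1/2}\rho_A^{-1/2}\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}}\right]|\Gamma\rangle_{B\hat{B}} \tag{7.5.123}\\
&\quad= \langle\Gamma|_{B\hat{B}}\left(\omega_A^{-1}\otimes\mathbb{1}_B\otimes\sigma_{\hat{A}\hat{B}}^{\mathsf{T}}\right)|\Gamma\rangle_{B\hat{B}} \tag{7.5.124}\\
&\quad= \omega_A^{-1}\otimes\langle\Gamma|_{B\hat{B}}\sigma_{\hat{A}\hat{B}}^{\mathsf{T}}|\Gamma\rangle_{B\hat{B}} \tag{7.5.125}\\
&\quad= \omega_A^{-1}\otimes\sigma_{\hat{A}}^{\mathsf{T}}, \tag{7.5.126}
\end{align}
where we have used the fact that $\langle\Gamma|_{B\hat{B}}\sigma_{\hat{A}\hat{B}}^{\mathsf{T}}|\Gamma\rangle_{B\hat{B}} = \operatorname{Tr}_{\hat{B}}[\sigma_{\hat{A}\hat{B}}^{\mathsf{T}}] = \sigma_{\hat{A}}^{\mathsf{T}}$, the last equality due to the fact that the transpose is taken on a product basis for $\mathcal{H}_{\hat{A}}\otimes\mathcal{H}_{\hat{B}}$.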
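Therefore, for $\alpha\in(1,\infty)$, taking the logarithm on both sides of (7.5.118) and using the state in (7.5.120), we find that
\begin{align}
\log_2\widetilde{Q}_\alpha(\rho_{AB}\|\sigma_{AB};\xi_{AB}(\omega_A)) &\geq \log_2\langle\varphi^{\rho_A}| f(\omega_A^{-1}\otimes\sigma_{\hat{A}}^{\mathsf{T}})|\varphi^{\rho_A}\rangle \tag{7.5.127}\\
&= \log_2\widetilde{Q}_\alpha(\rho_A\|\sigma_A;\omega_A). \tag{7.5.128}
\end{align}
Multiplying both sides of this inequality by $\frac{\alpha}{\alpha-1}$, which is positive in this case, we obtain
\begin{align}
\widetilde{D}_\alpha(\rho_{AB}\|\sigma_{AB}) &= \sup_{\substack{\tau_{AB}>0,\\ \operatorname{Tr}[\tau_{AB}]=1}} \widetilde{D}_\alpha(\rho_{AB}\|\sigma_{AB};\tau_{AB}) \tag{7.5.129}\\
&\geq \widetilde{D}_\alpha(\rho_{AB}\|\sigma_{AB};\xi_{AB}(\omega_A)) \tag{7.5.130}\\
&\geq \widetilde{D}_\alpha(\rho_A\|\sigma_A;\omega_A) \tag{7.5.131}
\end{align}
for all invertible states $\omega_A$. Finally, taking the supremum over the set $\{\omega_A : \omega_A > 0, \operatorname{Tr}[\omega_A] = 1\}$, we conclude that
\begin{equation}
\widetilde{D}_\alpha(\rho_{AB}\|\sigma_{AB}) \geq \widetilde{D}_\alpha(\rho_A\|\sigma_A), \quad \text{for } \alpha\in(1,\infty). \tag{7.5.132}
\end{equation}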

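For $\alpha\in[1/2,1)$, taking the logarithm on both sides of (7.5.119) and using the state in (7.5.120), we conclude that
\begin{align}
\log_2\widetilde{Q}_\alpha(\rho_{AB}\|\sigma_{AB};\xi_{AB}(\omega_A)) &\leq \log_2\langle\varphi^{\rho_A}| f(\omega_A^{-1}\otimes\sigma_{\hat{A}}^{\mathsf{T}})|\varphi^{\rho_A}\rangle \tag{7.5.133}\\
&= \log_2\widetilde{Q}_\alpha(\rho_A\|\sigma_A;\omega_A). \tag{7.5.134}
\end{align}
Multiplying both sides of this inequality by $\frac{\alpha}{\alpha-1}$, which is negative in this case, reverses its direction, so that
\begin{equation}
\frac{\alpha}{\alpha-1}\log_2\widetilde{Q}_\alpha(\rho_{AB}\|\sigma_{AB};\xi_{AB}(\omega_A)) \geq \frac{\alpha}{\alpha-1}\log_2\widetilde{Q}_\alpha(\rho_A\|\sigma_A;\omega_A). \tag{7.5.135}
\end{equation}
We thus obtain
\begin{align}
\widetilde{D}_\alpha(\rho_{AB}\|\sigma_{AB}) &= \sup_{\substack{\tau_{AB}>0,\\ \operatorname{Tr}[\tau_{AB}]=1}} \widetilde{D}_\alpha(\rho_{AB}\|\sigma_{AB};\tau_{AB}) \tag{7.5.136}\\
&\geq \widetilde{D}_\alpha(\rho_{AB}\|\sigma_{AB};\xi_{AB}(\omega_A)) \tag{7.5.137}\\
&\geq \widetilde{D}_\alpha(\rho_A\|\sigma_A;\omega_A) \tag{7.5.138}
\end{align}
for all invertible states $\omega_A$. Finally, taking the supremum over the set $\{\omega_A : \omega_A > 0, \operatorname{Tr}[\omega_A] = 1\}$, we conclude that
\begin{equation}
\widetilde{D}_\alpha(\rho_{AB}\|\sigma_{AB}) \geq \widetilde{D}_\alpha(\rho_A\|\sigma_A), \quad \text{for } \alpha\in[1/2,1). \tag{7.5.139}
\end{equation}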

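Having established the data-processing inequality for $\widetilde{D}_\alpha$ under the partial trace channel for $\alpha\in[1/2,1)\cup(1,\infty)$, we conclude that
\begin{equation}
\widetilde{D}_\alpha(\rho\|\sigma) \geq \widetilde{D}_\alpha(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)), \tag{7.5.140}
\end{equation}
for $\alpha\in[1/2,1)\cup(1,\infty)$, all states $\rho$, positive semi-definite operators $\sigma$, and all channels $\mathcal{N}$. ■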
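
The reduction to the partial trace used in the proof above can be probed numerically. The following is a minimal sketch, assuming positive definite inputs so that all matrix powers are well defined; the helper names (`mpow`, `sandwiched_renyi`, `partial_trace_B`, `random_pd`) and the random sampling are illustrative choices of ours, not part of the text.

```python
import numpy as np

def mpow(A, p):
    """Real power of a Hermitian positive definite matrix via eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * w**p) @ U.conj().T

def sandwiched_renyi(rho, sigma, alpha):
    """Sandwiched Renyi relative entropy (base 2), positive definite inputs."""
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    q = np.trace(mpow(s @ rho @ s, alpha)).real
    return np.log2(q) / (alpha - 1)

def partial_trace_B(X, dA, dB):
    """Trace out the second tensor factor of an operator on A (x) B."""
    return np.trace(X.reshape(dA, dB, dA, dB), axis1=1, axis2=3)

def random_pd(d, rng):
    """Random positive definite matrix (illustrative sampling only)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return G @ G.conj().T + 0.1 * np.eye(d)

rng = np.random.default_rng(0)
dA, dB = 2, 3
rho_AB = random_pd(dA * dB, rng)
rho_AB = rho_AB / np.trace(rho_AB).real      # normalize to a state
sigma_AB = random_pd(dA * dB, rng)           # positive definite, not normalized
rho_A = partial_trace_B(rho_AB, dA, dB)
sigma_A = partial_trace_B(sigma_AB, dA, dB)

for alpha in (0.5, 0.75, 1.5, 3.0):
    before = sandwiched_renyi(rho_AB, sigma_AB, alpha)
    after = sandwiched_renyi(rho_A, sigma_A, alpha)
    print(f"alpha={alpha}: D(AB)={before:.6f}, D(A)={after:.6f}")
```

A numerical check of this kind only illustrates the inequality on sampled instances; it is not a substitute for the proof.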

By taking the limit $\alpha\to 1$ in the statement of the data-processing inequality for $\widetilde{D}_\alpha$, along with Proposition 7.30, we immediately obtain the data-processing inequality for the quantum relative entropy.

Corollary 7.34 Data-Processing Inequality for Quantum Relative Entropy
Let 𝜌 be a state, 𝜎 a positive semi-definite operator, and N a quantum channel.
Then,
𝐷 (𝜌∥𝜎) ≥ 𝐷 (N(𝜌)∥N(𝜎)). (7.5.141)
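
As an illustration of (7.5.141), the sketch below evaluates the quantum relative entropy before and after a completely dephasing channel. The choice of channel, the helper names, and the restriction to positive definite states are assumptions made here for simplicity.

```python
import numpy as np

def logm2(A):
    """Base-2 logarithm of a Hermitian positive definite matrix."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * np.log2(w)) @ U.conj().T

def relative_entropy(rho, sigma):
    """Quantum relative entropy D(rho||sigma) in bits, positive definite inputs."""
    return np.trace(rho @ (logm2(rho) - logm2(sigma))).real

def dephase(X):
    """Completely dephasing channel in the computational basis."""
    return np.diag(np.diag(X))

def random_state(d, rng):
    """Random positive definite density matrix (illustrative sampling only)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    A = G @ G.conj().T + 0.1 * np.eye(d)
    return A / np.trace(A).real

rng = np.random.default_rng(3)
rho, sigma = random_state(4, rng), random_state(4, rng)
d_in = relative_entropy(rho, sigma)
d_out = relative_entropy(dephase(rho), dephase(sigma))
print(f"D(rho||sigma) = {d_in:.6f}, D(N(rho)||N(sigma)) = {d_out:.6f}")
```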

With the data-processing inequality for the sandwiched Rényi relative entropy in
hand, it is now straightforward to prove some of the following additional properties.


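Proposition 7.35 Additional Properties of Sandwiched Rényi Relative Entropy
The sandwiched Rényi relative entropy $\widetilde{D}_\alpha$ satisfies the following properties for every state $\rho$ and positive semi-definite operator $\sigma$ for $\alpha\in[1/2,1)\cup(1,\infty)$.
1. If $\operatorname{Tr}(\sigma) \leq \operatorname{Tr}(\rho) = 1$, then $\widetilde{D}_\alpha(\rho\|\sigma) \geq 0$.
2. Faithfulness: If $\operatorname{Tr}[\sigma] \leq 1$, we have that $\widetilde{D}_\alpha(\rho\|\sigma) = 0$ if and only if $\rho = \sigma$.
3. If $\rho \leq \sigma$, then $\widetilde{D}_\alpha(\rho\|\sigma) \leq 0$.
4. For every positive semi-definite operator $\sigma'$ such that $\sigma' \geq \sigma$, we have $\widetilde{D}_\alpha(\rho\|\sigma) \geq \widetilde{D}_\alpha(\rho\|\sigma')$.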

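Proof:
1. By the data-processing inequality for $\widetilde{D}_\alpha$ with respect to the trace channel $\operatorname{Tr}$, and letting $x = \operatorname{Tr}(\rho) = 1$ and $y = \operatorname{Tr}(\sigma)$, we find that
\begin{align}
\widetilde{D}_\alpha(\rho\|\sigma) \geq \widetilde{D}_\alpha(x\|y) &= \frac{1}{\alpha-1}\log_2\operatorname{Tr}\!\left[\left(y^{\frac{1-\alpha}{2\alpha}}\,x\,y^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right] \tag{7.5.142}\\
&= \frac{1}{\alpha-1}\log_2(y^{1-\alpha}) \tag{7.5.143}\\
&= \frac{1-\alpha}{\alpha-1}\log_2 y \tag{7.5.144}\\
&= -\log_2 y \tag{7.5.145}\\
&\geq 0, \tag{7.5.146}
\end{align}
where the last line follows from the assumption that $y = \operatorname{Tr}(\sigma) \leq 1$.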
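2. Proof of faithfulness: If $\rho = \sigma$, then the following equalities hold for all $\alpha\in[1/2,1)\cup(1,\infty)$:
\begin{align}
\widetilde{D}_\alpha(\rho\|\rho) &= \frac{1}{\alpha-1}\log_2\operatorname{Tr}\!\left[\left(\rho^{\frac{1-\alpha}{2\alpha}}\rho\,\rho^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right] \tag{7.5.147}\\
&= \frac{1}{\alpha-1}\log_2\operatorname{Tr}\!\left[\rho^{\frac{1-\alpha}{2}}\rho^{\alpha}\rho^{\frac{1-\alpha}{2}}\right] \tag{7.5.148}\\
&= \frac{1}{\alpha-1}\log_2\operatorname{Tr}[\rho^{1-\alpha}\rho^{\alpha}] \tag{7.5.149}\\
&= \frac{1}{\alpha-1}\log_2\operatorname{Tr}(\rho) \tag{7.5.150}\\
&= 0. \tag{7.5.151}
\end{align}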

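Next, suppose that $\alpha\in[1/2,1)\cup(1,\infty)$ and $\widetilde{D}_\alpha(\rho\|\sigma) = 0$. From the above, we conclude that $0 = \widetilde{D}_\alpha(\rho\|\sigma) \geq \widetilde{D}_\alpha(\operatorname{Tr}(\rho)\|\operatorname{Tr}(\sigma)) = -\log_2 y \geq 0$, so that $\log_2 y = 0$. From the fact that $\log_2 y = 0$ if and only if $y = 1$, we conclude that $\widetilde{D}_\alpha(\rho\|\sigma) = 0$ implies $\operatorname{Tr}(\sigma) = \operatorname{Tr}(\rho) = 1$, so that $\sigma$ is a density operator. Then, for every measurement channel $\mathcal{M}$,
\begin{equation}
\widetilde{D}_\alpha(\mathcal{M}(\rho)\|\mathcal{M}(\sigma)) \leq \widetilde{D}_\alpha(\rho\|\sigma) = 0. \tag{7.5.152}
\end{equation}
On the other hand, since $\operatorname{Tr}(\sigma) = \operatorname{Tr}(\rho)$,
\begin{align}
\widetilde{D}_\alpha(\mathcal{M}(\rho)\|\mathcal{M}(\sigma)) &\geq \widetilde{D}_\alpha(\operatorname{Tr}(\mathcal{M}(\rho))\|\operatorname{Tr}(\mathcal{M}(\sigma))) \tag{7.5.153}\\
&= \widetilde{D}_\alpha(\operatorname{Tr}(\rho)\|\operatorname{Tr}(\sigma)) \tag{7.5.154}\\
&= 0, \tag{7.5.155}
\end{align}
which means that $\widetilde{D}_\alpha(\mathcal{M}(\rho)\|\mathcal{M}(\sigma)) = 0$ for all measurement channels $\mathcal{M}$.

Now, recall that $\mathcal{M}(\rho)$ and $\mathcal{M}(\sigma)$ are effectively probability distributions determined by the measurement. Since the classical Rényi relative entropy is equal to zero if and only if its two arguments are equal, we can conclude that $\mathcal{M}(\rho) = \mathcal{M}(\sigma)$. Since this is true for every measurement channel, we conclude from Theorem 6.4 and the fact that the trace norm is a norm that $\rho = \sigma$. So we have that $\widetilde{D}_\alpha(\rho\|\sigma) = 0$ if and only if $\rho = \sigma$, as required.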

3. Consider that $\rho \leq \sigma$ implies that $\sigma - \rho \geq 0$. Then define the following positive semi-definite operators:
\begin{align}
\hat{\rho} &:= |0\rangle\langle 0|\otimes\rho, \tag{7.5.156}\\
\hat{\sigma} &:= |0\rangle\langle 0|\otimes\rho + |1\rangle\langle 1|\otimes(\sigma-\rho). \tag{7.5.157}
\end{align}
By exploiting the direct-sum property of sandwiched Rényi relative entropy (Proposition 7.31) and the data-processing inequality, we find that
\begin{equation}
0 = \widetilde{D}_\alpha(\rho\|\rho) = \widetilde{D}_\alpha(\hat{\rho}\|\hat{\sigma}) \geq \widetilde{D}_\alpha(\rho\|\sigma), \tag{7.5.158}
\end{equation}
where the inequality follows from data processing with respect to partial trace over the classical register.

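4. Consider the state $\hat{\rho} := |0\rangle\langle 0|\otimes\rho$ and the operator $\hat{\sigma} := |0\rangle\langle 0|\otimes\sigma + |1\rangle\langle 1|\otimes(\sigma'-\sigma)$, which is positive semi-definite because $\sigma' \geq \sigma$ by assumption. Then
\begin{equation}
\hat{\sigma}^{\frac{1-\alpha}{2\alpha}}\hat{\rho}\,\hat{\sigma}^{\frac{1-\alpha}{2\alpha}} = |0\rangle\langle 0|\otimes\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}, \tag{7.5.159}
\end{equation}
which implies that
\begin{equation}
\widetilde{D}_\alpha(\hat{\rho}\|\hat{\sigma}) = \widetilde{D}_\alpha(\rho\|\sigma). \tag{7.5.160}
\end{equation}
Then, observing that $\operatorname{Tr}_1[\hat{\sigma}] = \sigma'$, and using the data-processing inequality for $\widetilde{D}_\alpha$ with respect to the partial trace channel $\operatorname{Tr}_1$, we conclude that
\begin{equation}
\widetilde{D}_\alpha(\rho\|\sigma') = \widetilde{D}_\alpha(\operatorname{Tr}_1(\hat{\rho})\|\operatorname{Tr}_1(\hat{\sigma})) \leq \widetilde{D}_\alpha(\hat{\rho}\|\hat{\sigma}) = \widetilde{D}_\alpha(\rho\|\sigma), \tag{7.5.161}
\end{equation}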

as required. ■
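
Properties 1, 3, and 4 of Proposition 7.35 lend themselves to a quick numerical illustration. The sketch below assumes positive definite operators and builds the dominating operators by adding a random positive definite "bump"; these constructions and all helper names are hypothetical choices made for the example.

```python
import numpy as np

def mpow(A, p):
    """Real power of a Hermitian positive definite matrix via eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * w**p) @ U.conj().T

def sandwiched_renyi(rho, sigma, alpha):
    """Sandwiched Renyi relative entropy (base 2), positive definite inputs."""
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    return np.log2(np.trace(mpow(s @ rho @ s, alpha)).real) / (alpha - 1)

def random_pd(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return G @ G.conj().T + 0.1 * np.eye(d)

rng = np.random.default_rng(1)
d = 3
rho = random_pd(d, rng)
rho = rho / np.trace(rho).real
sigma = random_pd(d, rng)
sigma = sigma / np.trace(sigma).real
bump = random_pd(d, rng)                      # positive definite perturbation

results = []
for alpha in (0.5, 0.8, 2.0, 5.0):
    d_states = sandwiched_renyi(rho, sigma, alpha)           # property 1: two states
    d_dominated = sandwiched_renyi(rho, rho + bump, alpha)   # property 3: rho <= rho + bump
    d_larger = sandwiched_renyi(rho, sigma + bump, alpha)    # property 4: sigma + bump >= sigma
    results.append((alpha, d_states, d_dominated, d_larger))
    print(f"alpha={alpha}: {d_states:.4f} (>= 0), {d_dominated:.4f} (<= 0), {d_larger:.4f} (<= first)")
```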

Let us now prove the faithfulness of both the Petz–Rényi and sandwiched Rényi
relative entropies for the full range of parameters for which they are defined.

Proposition 7.36 Faithfulness of the Petz–Rényi and Sandwiched Rényi Relative Entropies
For all $\alpha\in(0,1)\cup(1,\infty)$ and for all states $\rho$, $\sigma$, the Petz–Rényi and sandwiched Rényi relative entropies are faithful, meaning that
\begin{align}
D_\alpha(\rho\|\sigma) &= 0 \quad \text{if and only if} \quad \rho = \sigma, \tag{7.5.162}\\
\widetilde{D}_\alpha(\rho\|\sigma) &= 0 \quad \text{if and only if} \quad \rho = \sigma. \tag{7.5.163}
\end{align}

Proof: Note that the equality $\widetilde{D}_\alpha(\rho\|\rho) = 0$ for all $\alpha\in(0,1)\cup(1,\infty)$ is immediate from the definition (see also (7.5.147)–(7.5.151)). The converse statement has already been established in property 2. of Proposition 7.35 for $\alpha\in[1/2,1)\cup(1,\infty)$. Before getting to the range $\alpha\in(0,1/2)$, let us consider the Petz–Rényi relative entropy.
It is immediately clear from the definition that $D_\alpha(\rho\|\rho) = 0$ for all $\alpha\in(0,1)\cup(1,\infty)$. For $\alpha\in(0,1)\cup(1,2]$, the converse follows from the data-processing inequality, which holds for this parameter range as shown in Theorem 7.24, as well as from arguments analogous to those in the proof of property 2. in Proposition 7.35. For $\alpha\in(2,\infty)$, we use the fact that $D_\alpha(\rho\|\sigma) \geq \widetilde{D}_\alpha(\rho\|\sigma)$ for all $\rho$, $\sigma$, as shown in Proposition 7.31. In particular, if $D_\alpha(\rho\|\sigma) = 0$, then $\widetilde{D}_\alpha(\rho\|\sigma) \leq 0$. However, because $\rho$ and $\sigma$ are states, by property 1. of Proposition 7.35, we have that $\widetilde{D}_\alpha(\rho\|\sigma) \geq 0$, which means that $\widetilde{D}_\alpha(\rho\|\sigma) = 0$. Then, by property 2. of Proposition 7.35, we immediately get that $\rho = \sigma$.
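Finally, suppose that $\widetilde{D}_\alpha(\rho\|\sigma) = 0$, where $\alpha\in(0,1/2)$. Then, using (7.5.46), we have that $\alpha D_\alpha(\rho\|\sigma) \leq 0$, and hence $D_\alpha(\rho\|\sigma) \leq 0$ because $\alpha > 0$. However, because $\rho$ and $\sigma$ are states, by the data-processing inequality we have that
\begin{equation}
D_\alpha(\rho\|\sigma) \geq D_\alpha(\operatorname{Tr}[\rho]\|\operatorname{Tr}[\sigma]) = \frac{1}{\alpha-1}\log_2\!\left(\operatorname{Tr}[\rho]^{\alpha}\operatorname{Tr}[\sigma]^{1-\alpha}\right) = 0. \tag{7.5.164}
\end{equation}
Therefore, $D_\alpha(\rho\|\sigma) = 0$, which implies that $\rho = \sigma$ by the faithfulness of the Petz–Rényi relative entropy, which we just proved. ■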
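
The last step of the proof relies on a comparison between the two Rényi families. The sketch below checks, on random positive definite states, the chain $\alpha D_\alpha \leq \widetilde{D}_\alpha \leq D_\alpha$ for $\alpha\in(0,1)$; we are assuming that this is the content of (7.5.46) and Proposition 7.31 as they are used above, since those statements are not reproduced in this section. Helper names are illustrative.

```python
import numpy as np

def mpow(A, p):
    """Real power of a Hermitian positive definite matrix via eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * w**p) @ U.conj().T

def petz_renyi(rho, sigma, alpha):
    """Petz-Renyi relative entropy: log2 Tr[rho^a sigma^(1-a)] / (a - 1)."""
    q = np.trace(mpow(rho, alpha) @ mpow(sigma, 1 - alpha)).real
    return np.log2(q) / (alpha - 1)

def sandwiched_renyi(rho, sigma, alpha):
    """Sandwiched Renyi relative entropy (base 2), positive definite inputs."""
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    return np.log2(np.trace(mpow(s @ rho @ s, alpha)).real) / (alpha - 1)

def random_state(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    A = G @ G.conj().T + 0.1 * np.eye(d)
    return A / np.trace(A).real

rng = np.random.default_rng(4)
rho, sigma = random_state(3, rng), random_state(3, rng)

chain = []
for alpha in (0.2, 0.4, 0.8):
    petz = petz_renyi(rho, sigma, alpha)
    sand = sandwiched_renyi(rho, sigma, alpha)
    chain.append((alpha, petz, sand))
    print(f"alpha={alpha}: alpha*D={alpha * petz:.5f} <= D~={sand:.5f} <= D={petz:.5f}")
```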

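The data-processing inequality for the sandwiched Rényi relative entropy can be written using the sandwiched Rényi relative quasi-entropy $\widetilde{Q}_\alpha$ as
\begin{equation}
\frac{1}{\alpha-1}\log_2\widetilde{Q}_\alpha(\rho\|\sigma) \geq \frac{1}{\alpha-1}\log_2\widetilde{Q}_\alpha(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)). \tag{7.5.165}
\end{equation}
Then, since $\alpha - 1$ is negative for $\alpha\in[1/2,1)$ and positive for $\alpha\in(1,\infty)$, we can use the monotonicity of the function $\log_2$ to obtain
\begin{align}
\widetilde{Q}_\alpha(\rho\|\sigma) &\geq \widetilde{Q}_\alpha(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)), \quad \text{for } \alpha\in(1,\infty), \tag{7.5.166}\\
\widetilde{Q}_\alpha(\rho\|\sigma) &\leq \widetilde{Q}_\alpha(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)), \quad \text{for } \alpha\in[1/2,1). \tag{7.5.167}
\end{align}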

Just as with the Petz–Rényi relative entropy, we can use this to prove the joint
convexity of the sandwiched Rényi relative entropy.

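Proposition 7.37 Joint Convexity & Concavity of Sandwiched Rényi Relative Quasi-Entropy
Let $p : \mathcal{X} \to [0,1]$ be a probability distribution over a finite alphabet $\mathcal{X}$ with associated $|\mathcal{X}|$-dimensional system $X$, let $\{\rho_A^x\}_{x\in\mathcal{X}}$ be a set of states on a system $A$, and let $\{\sigma_A^x\}_{x\in\mathcal{X}}$ be a set of positive semi-definite operators on $A$. Then, for $\alpha\in(1,\infty)$,
\begin{equation}
\widetilde{Q}_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x \,\Big\|\, \sum_{x\in\mathcal{X}} p(x)\sigma_A^x\right) \leq \sum_{x\in\mathcal{X}} p(x)\,\widetilde{Q}_\alpha(\rho_A^x\|\sigma_A^x), \tag{7.5.168}
\end{equation}
and for $\alpha\in[1/2,1)$,
\begin{equation}
\widetilde{Q}_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x \,\Big\|\, \sum_{x\in\mathcal{X}} p(x)\sigma_A^x\right) \geq \sum_{x\in\mathcal{X}} p(x)\,\widetilde{Q}_\alpha(\rho_A^x\|\sigma_A^x). \tag{7.5.169}
\end{equation}
Consequently, the sandwiched Rényi relative entropy $\widetilde{D}_\alpha$ is jointly convex for $\alpha\in[1/2,1)$:
\begin{equation}
\widetilde{D}_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x \,\Big\|\, \sum_{x\in\mathcal{X}} p(x)\sigma_A^x\right) \leq \sum_{x\in\mathcal{X}} p(x)\,\widetilde{D}_\alpha(\rho_A^x\|\sigma_A^x). \tag{7.5.170}
\end{equation}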

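Proof: By the direct-sum property of $\widetilde{Q}_\alpha$ and applying (7.5.166)–(7.5.167) and Proposition 7.17, we conclude (7.5.168)–(7.5.169).
For $\alpha\in[1/2,1)$, taking $\log_2$ of both sides of (7.5.169) and multiplying by $\frac{1}{\alpha-1}$, which is negative, we find that
\begin{equation}
\frac{1}{\alpha-1}\log_2\widetilde{Q}_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x \,\Big\|\, \sum_{x\in\mathcal{X}} p(x)\sigma_A^x\right) \leq \frac{1}{\alpha-1}\log_2\!\left(\sum_{x\in\mathcal{X}} p(x)\,\widetilde{Q}_\alpha(\rho_A^x\|\sigma_A^x)\right). \tag{7.5.171}
\end{equation}
Then, since $-\log_2$ is a convex function, and using the definition of $\widetilde{D}_\alpha$ in terms of $\widetilde{Q}_\alpha$, we conclude that
\begin{align}
\widetilde{D}_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x \,\Big\|\, \sum_{x\in\mathcal{X}} p(x)\sigma_A^x\right) &\leq \sum_{x\in\mathcal{X}} p(x)\,\frac{1}{\alpha-1}\log_2\widetilde{Q}_\alpha(\rho_A^x\|\sigma_A^x) \tag{7.5.172}\\
&= \sum_{x\in\mathcal{X}} p(x)\,\widetilde{D}_\alpha(\rho_A^x\|\sigma_A^x), \tag{7.5.173}
\end{align}
as required. ■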

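Although the sandwiched Rényi relative entropy is not jointly convex for $\alpha\in(1,\infty)$, it is jointly quasi-convex, in the sense that
\begin{equation}
\widetilde{D}_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x \,\Big\|\, \sum_{x\in\mathcal{X}} p(x)\sigma_A^x\right) \leq \max_{x\in\mathcal{X}} \widetilde{D}_\alpha(\rho_A^x\|\sigma_A^x), \tag{7.5.174}
\end{equation}
for every finite alphabet $\mathcal{X}$, probability distribution $p : \mathcal{X} \to [0,1]$, set $\{\rho_A^x\}_{x\in\mathcal{X}}$ of states, and set $\{\sigma_A^x\}_{x\in\mathcal{X}}$ of positive semi-definite operators. Indeed, from (7.5.168), we immediately obtain
\begin{equation}
\widetilde{Q}_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)\rho_A^x \,\Big\|\, \sum_{x\in\mathcal{X}} p(x)\sigma_A^x\right) \leq \max_{x\in\mathcal{X}} \widetilde{Q}_\alpha(\rho_A^x\|\sigma_A^x). \tag{7.5.175}
\end{equation}
Taking the logarithm and multiplying by $\frac{1}{\alpha-1}$ on both sides of this inequality leads to (7.5.174).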
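
The joint convexity in (7.5.170) and the quasi-convexity in (7.5.174) can be illustrated on a small random ensemble. The following sketch assumes positive definite operators; the ensemble size, the value $\alpha = 0.6$ for the convex range, and $\alpha = 2$ for the quasi-convex range are arbitrary illustrative choices.

```python
import numpy as np

def mpow(A, p):
    """Real power of a Hermitian positive definite matrix via eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * w**p) @ U.conj().T

def sandwiched_renyi(rho, sigma, alpha):
    """Sandwiched Renyi relative entropy (base 2), positive definite inputs."""
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    return np.log2(np.trace(mpow(s @ rho @ s, alpha)).real) / (alpha - 1)

def random_pd(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return G @ G.conj().T + 0.1 * np.eye(d)

rng = np.random.default_rng(2)
d, n = 3, 4
p = rng.random(n)
p = p / p.sum()                                   # probability distribution p(x)
rhos = []
for _ in range(n):
    A = random_pd(d, rng)
    rhos.append(A / np.trace(A).real)             # states rho^x
sigmas = [random_pd(d, rng) for _ in range(n)]    # positive definite sigma^x

rho_bar = sum(px * r for px, r in zip(p, rhos))
sigma_bar = sum(px * s for px, s in zip(p, sigmas))

# Joint convexity for alpha in [1/2, 1).
lhs_convex = sandwiched_renyi(rho_bar, sigma_bar, 0.6)
rhs_convex = sum(px * sandwiched_renyi(r, s, 0.6) for px, r, s in zip(p, rhos, sigmas))

# Joint quasi-convexity for alpha in (1, infinity).
lhs_quasi = sandwiched_renyi(rho_bar, sigma_bar, 2.0)
rhs_quasi = max(sandwiched_renyi(r, s, 2.0) for r, s in zip(rhos, sigmas))

print(f"alpha=0.6: {lhs_convex:.5f} <= {rhs_convex:.5f}")
print(f"alpha=2.0: {lhs_quasi:.5f} <= {rhs_quasi:.5f}")
```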

7.6 Geometric Rényi Relative Entropy

In the previous two sections, we considered two examples of generalized divergences,
the Petz– and sandwiched Rényi relative entropies. Both of these are quantum
generalizations of the classical Rényi relative entropy defined in (7.4.19).
In this section, we consider another generalization of the classical Rényi relative
entropy, called the geometric Rényi relative entropy. Unlike the two previous Rényi
relative entropies, the geometric Rényi relative entropy does not converge to the
quantum relative entropy in the 𝛼 → 1 limit. Instead, it converges to what is
called the Belavkin–Staszewski relative entropy, as shown in Section 7.7. This
latter quantity represents a different quantum generalization of the classical relative
entropy in (7.2.2). The main use of the geometric Rényi and Belavkin–Staszewski
relative entropies is in establishing upper bounds on the rates of feedback-assisted
quantum communication protocols, the latter of which is the main focus of Part III
of this book.

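Definition 7.38 Geometric Rényi Relative Entropy
Let $\rho$ be a state, $\sigma$ a positive semi-definite operator, and $\alpha\in(0,1)\cup(1,\infty)$. The geometric Rényi relative quasi-entropy is defined as
\begin{equation}
\widehat{Q}_\alpha(\rho\|\sigma) := \lim_{\varepsilon\to 0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] = \lim_{\varepsilon\to 0^+}\operatorname{Tr}[G_\alpha(\sigma_\varepsilon,\rho)], \tag{7.6.1}
\end{equation}
where $\sigma_\varepsilon := \sigma + \varepsilon\mathbb{1}$ and
\begin{equation}
G_\alpha(\sigma_\varepsilon,\rho) := \sigma_\varepsilon^{1/2}\left(\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}\sigma_\varepsilon^{1/2} \tag{7.6.2}
\end{equation}
is the weighted operator geometric mean of $\sigma_\varepsilon$ and $\rho$. The geometric Rényi relative entropy is then defined as
\begin{equation}
\widehat{D}_\alpha(\rho\|\sigma) := \frac{1}{\alpha-1}\log_2\widehat{Q}_\alpha(\rho\|\sigma). \tag{7.6.3}
\end{equation}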
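
As a sanity check on this definition, consider the special case (worked here for illustration, under the assumptions that $\rho$ and $\sigma$ commute and that $\sigma$ is positive definite) in which both operators are diagonal in a common orthonormal basis $\{|x\rangle\}_x$, with eigenvalues $p(x)$ and $q(x) > 0$. The limit in (7.6.1) is then trivial, and the quasi-entropy reduces to the classical expression:

```latex
\begin{align}
\rho &= \sum_x p(x)|x\rangle\langle x|, \qquad \sigma = \sum_x q(x)|x\rangle\langle x|, \qquad q(x) > 0,\\
\widehat{Q}_\alpha(\rho\|\sigma)
  &= \operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right]
   = \sum_x q(x)\left(\frac{p(x)}{q(x)}\right)^{\alpha}
   = \sum_x p(x)^{\alpha} q(x)^{1-\alpha},\\
\widehat{D}_\alpha(\rho\|\sigma)
  &= \frac{1}{\alpha-1}\log_2\sum_x p(x)^{\alpha} q(x)^{1-\alpha}.
\end{align}
```

In this commuting case the geometric Rényi relative entropy therefore coincides with the classical Rényi relative entropy of the distributions $p$ and $q$, consistent with its role as a quantum generalization of that quantity.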

Remark: In general, the weighted operator geometric mean of two positive definite operators $X$ and $Y$ is defined as
\begin{equation}
G_\beta(X,Y) := X^{1/2}\left(X^{-1/2}YX^{-1/2}\right)^{\beta}X^{1/2}, \tag{7.6.4}
\end{equation}
where $\beta\in\mathbb{R}$ is the weight parameter. We recover the standard operator geometric mean for $\beta = \frac{1}{2}$.
An important property of the weighted operator geometric mean is that
\begin{equation}
G_\beta(X,Y) = G_{1-\beta}(Y,X) \tag{7.6.5}
\end{equation}
for all positive definite $X$, $Y$, and all $\beta\in\mathbb{R}$. To see this, observe that
\begin{align}
G_{1-\beta}(Y,X) &= Y^{1/2}\left(Y^{-1/2}XY^{-1/2}\right)^{1-\beta}Y^{1/2} \tag{7.6.6}\\
&= Y^{1/2}\left(Y^{-1/2}XY^{-1/2}\right)\left(Y^{-1/2}XY^{-1/2}\right)^{-\beta}Y^{1/2} \tag{7.6.7}\\
&= X^{1/2}\left(X^{1/2}Y^{-1/2}\right)\left(Y^{-1/2}X^{1/2}X^{1/2}Y^{-1/2}\right)^{-\beta}Y^{1/2}. \tag{7.6.8}
\end{align}
Now we apply Lemma 2.5. Specifically, we set $L = X^{1/2}Y^{-1/2}$ and $f(x) = x^{-\beta}$ therein to conclude that
\begin{align}
G_{1-\beta}(Y,X) &= X^{1/2}\left(X^{1/2}Y^{-1/2}Y^{-1/2}X^{1/2}\right)^{-\beta}X^{1/2}Y^{-1/2}Y^{1/2} \tag{7.6.9}\\
&= X^{1/2}\left(X^{-1/2}YX^{-1/2}\right)^{\beta}X^{1/2} \tag{7.6.10}\\
&= G_\beta(X,Y). \tag{7.6.11}
\end{align}
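
The symmetry (7.6.5) is easy to test numerically. The sketch below assumes positive definite $X$ and $Y$ and compares both sides in a relative Frobenius-norm sense; the function names and the sampled weights $\beta$ are illustrative choices.

```python
import numpy as np

def mpow(A, p):
    """Real power of a Hermitian positive definite matrix via eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * w**p) @ U.conj().T

def geometric_mean(X, Y, beta):
    """Weighted operator geometric mean G_beta(X, Y) as in (7.6.4)."""
    X_half, X_inv_half = mpow(X, 0.5), mpow(X, -0.5)
    return X_half @ mpow(X_inv_half @ Y @ X_inv_half, beta) @ X_half

def random_pd(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return G @ G.conj().T + np.eye(d)

rng = np.random.default_rng(5)
X, Y = random_pd(3, rng), random_pd(3, rng)

rel_errors = {}
for beta in (0.3, 0.5, -0.7, 1.8):
    lhs = geometric_mean(X, Y, beta)
    rhs = geometric_mean(Y, X, 1 - beta)
    rel_errors[beta] = np.linalg.norm(lhs - rhs) / np.linalg.norm(lhs)
    print(f"beta={beta}: relative difference {rel_errors[beta]:.2e}")
```

For $\beta = 1/2$ the check specializes to the familiar symmetry of the standard operator geometric mean in its two arguments.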

Definition 7.38 of the geometric Rényi relative entropy involves a limit, which
has to do with the possibility that 𝜎 might not be invertible (i.e., it might not be
positive definite). Recall that the same situation arises for the Petz– and sandwiched
Rényi relative entropies, which leads to expressions for them in terms of a limit
in Propositions 7.21 and 7.29, respectively. For these two quantities, the limits
evaluate to a finite value with an explicit expression under the condition 𝛼 ∈ (0, 1)
and Tr[𝜌𝜎] ≠ 0, or 𝛼 ∈ (1, ∞) and supp(𝜌) ⊆ supp(𝜎). For the geometric Rényi
relative entropy, however, there are several cases for which the limit in (7.6.1) is
finite and has an explicit expression. The following proposition outlines some of
the simpler cases in which 𝜎 is positive definite:

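Proposition 7.39
Let $\rho$ be a state, and let $\sigma$ be a positive definite operator. Then,
\begin{equation}
\widehat{Q}_\alpha(\rho\|\sigma) = \operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right] = \operatorname{Tr}[G_\alpha(\sigma,\rho)] \tag{7.6.12}
\end{equation}
for all $\alpha\in(0,1)\cup(1,\infty)$. If $\rho$ is a positive definite state and $\sigma$ a positive definite operator, then
\begin{align}
\widehat{Q}_\alpha(\rho\|\sigma) &= \operatorname{Tr}\!\left[\rho\left(\rho^{-1/2}\sigma\rho^{-1/2}\right)^{1-\alpha}\right] = \operatorname{Tr}[G_{1-\alpha}(\rho,\sigma)] \tag{7.6.13}\\
&= \operatorname{Tr}\!\left[\rho\left(\rho^{1/2}\sigma^{-1}\rho^{1/2}\right)^{\alpha-1}\right] \tag{7.6.14}
\end{align}
for all $\alpha\in(0,1)\cup(1,\infty)$.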

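Proof: If $\sigma$ is positive definite, then the support of $\sigma$ is the entire Hilbert space, and so the limit $\varepsilon\to 0^+$ in (7.6.1) simply evaluates to $\widehat{Q}_\alpha(\rho\|\sigma) = \operatorname{Tr}[G_\alpha(\sigma,\rho)]$ for all $\alpha\in(0,1)\cup(1,\infty)$.
If $\rho$ is also positive definite, then by invoking the equality in (7.6.5), we conclude that $\operatorname{Tr}[G_\alpha(\sigma,\rho)] = \operatorname{Tr}[G_{1-\alpha}(\rho,\sigma)]$ for all $\alpha\in(0,1)\cup(1,\infty)$. Furthermore, since both $\rho$ and $\sigma$ are positive definite, the following equality holds:
\begin{equation}
\left(\rho^{-1/2}\sigma\rho^{-1/2}\right)^{1-\alpha} = \left(\rho^{1/2}\sigma^{-1}\rho^{1/2}\right)^{\alpha-1}. \tag{7.6.15}
\end{equation}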

Therefore, the equality in (7.6.14) holds. ■
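
The three expressions in (7.6.12)–(7.6.14) can be compared directly for positive definite inputs. The sketch below is an illustrative check with hypothetical helper names; it reports the three values for a few choices of $\alpha$.

```python
import numpy as np

def mpow(A, p):
    """Real power of a Hermitian positive definite matrix via eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * w**p) @ U.conj().T

def q_via_sigma(rho, sigma, alpha):
    """Tr[sigma (sigma^{-1/2} rho sigma^{-1/2})^alpha], cf. (7.6.12)."""
    s = mpow(sigma, -0.5)
    return np.trace(sigma @ mpow(s @ rho @ s, alpha)).real

def q_via_rho(rho, sigma, alpha):
    """Tr[rho (rho^{-1/2} sigma rho^{-1/2})^(1-alpha)], cf. (7.6.13)."""
    r = mpow(rho, -0.5)
    return np.trace(rho @ mpow(r @ sigma @ r, 1 - alpha)).real

def q_via_inverse(rho, sigma, alpha):
    """Tr[rho (rho^{1/2} sigma^{-1} rho^{1/2})^(alpha-1)], cf. (7.6.14)."""
    r = mpow(rho, 0.5)
    return np.trace(rho @ mpow(r @ mpow(sigma, -1.0) @ r, alpha - 1)).real

def random_pd(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return G @ G.conj().T + np.eye(d)

rng = np.random.default_rng(6)
rho = random_pd(3, rng)
rho = rho / np.trace(rho).real
sigma = random_pd(3, rng)

values = {}
for alpha in (0.3, 0.7, 1.5, 3.0):
    values[alpha] = (q_via_sigma(rho, sigma, alpha),
                     q_via_rho(rho, sigma, alpha),
                     q_via_inverse(rho, sigma, alpha))
    print(alpha, values[alpha])
```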

We now provide explicit expressions for the geometric Rényi relative quasi-entropy $\widehat{Q}_\alpha(\rho\|\sigma)$ that are consistent with the limit-based definition in (7.6.1) whenever $\rho$ and/or $\sigma$ are not positive definite. The expressions given in (7.6.16) below cover all possible values of $\alpha\in(0,1)\cup(1,\infty)$ and support conditions. Additional expressions are given in (7.6.19).


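Proposition 7.40 Explicit Expressions for Geometric Rényi Relative Quasi-Entropy
For every state $\rho$, positive semi-definite operator $\sigma$, and $\alpha\in(0,1)\cup(1,\infty)$, the following equality holds for the geometric Rényi relative quasi-entropy:
\begin{equation}
\widehat{Q}_\alpha(\rho\|\sigma) =
\begin{cases}
\operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right] & \text{if } \alpha\in(0,1)\cup(1,\infty) \text{ and } \operatorname{supp}(\rho)\subseteq\operatorname{supp}(\sigma),\\[4pt]
\operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\tilde{\rho}\,\sigma^{-1/2}\right)^{\alpha}\right] & \text{if } \alpha\in(0,1) \text{ and } \operatorname{supp}(\rho)\nsubseteq\operatorname{supp}(\sigma),\\[4pt]
+\infty & \text{if } \alpha\in(1,\infty) \text{ and } \operatorname{supp}(\rho)\nsubseteq\operatorname{supp}(\sigma),
\end{cases}
\tag{7.6.16}
\end{equation}
where
\begin{align}
\tilde{\rho} &:= \rho_{0,0} - \rho_{0,1}\rho_{1,1}^{-1}\rho_{0,1}^{\dagger}, \qquad \rho = \begin{pmatrix} \rho_{0,0} & \rho_{0,1}\\ \rho_{0,1}^{\dagger} & \rho_{1,1} \end{pmatrix}, \tag{7.6.17}\\
\rho_{0,0} &:= \Pi_\sigma\rho\,\Pi_\sigma, \qquad \rho_{0,1} := \Pi_\sigma\rho\,\Pi_\sigma^{\perp}, \qquad \rho_{1,1} := \Pi_\sigma^{\perp}\rho\,\Pi_\sigma^{\perp}, \tag{7.6.18}
\end{align}
$\Pi_\sigma$ is the projection onto the support of $\sigma$, $\Pi_\sigma^{\perp}$ is the projection onto the kernel of $\sigma$, and the inverses $\sigma^{-1/2}$ and $\rho_{1,1}^{-1}$ are taken on the supports of $\sigma$ and $\rho_{1,1}$, respectively. We also have the alternative expressions below for certain cases:
\begin{equation}
\widehat{Q}_\alpha(\rho\|\sigma) =
\begin{cases}
\operatorname{Tr}\!\left[\rho\left(\rho^{-1/2}\sigma\rho^{-1/2}\right)^{1-\alpha}\right] & \text{if } \alpha\in(0,1) \text{ and } \operatorname{supp}(\sigma)\subseteq\operatorname{supp}(\rho),\\[4pt]
\operatorname{Tr}\!\left[\rho\left(\rho^{1/2}\sigma^{-1}\rho^{1/2}\right)^{\alpha-1}\right] & \text{if } \alpha\in(1,\infty) \text{ and } \operatorname{supp}(\rho)\subseteq\operatorname{supp}(\sigma),
\end{cases}
\tag{7.6.19}
\end{equation}
where the inverses $\rho^{-1/2}$ and $\sigma^{-1}$ are taken on the supports of $\rho$ and $\sigma$, respectively.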

Proof: The proof is similar in spirit to the proofs of Propositions 7.21 and 7.29, but
it is more complicated than these previous proofs. We provide it in Section 7.6.2. ■
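
To make the second case of (7.6.16) concrete, the sketch below compares the Schur-complement formula with the regularized quantity in (7.6.1) for a small value of $\varepsilon$. The qubit operators, the choice $\alpha = 1/2$, and the values of $\varepsilon$ are illustrative assumptions; the agreement is only approximate because the limit is approached slowly (heuristically, at a rate like $\varepsilon^{1-\alpha}$ in this example).

```python
import numpy as np

def mpow(A, p):
    """Real power of a Hermitian positive definite matrix via eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * w**p) @ U.conj().T

alpha = 0.5
sigma = np.diag([1.0, 0.0])                  # supp(sigma) = span{|0>}
rho = np.array([[0.5, 0.3], [0.3, 0.5]])     # full rank, so supp(rho) is not inside supp(sigma)

# Schur complement (7.6.17); every block is 1x1 in this example.
rho_tilde = rho[0, 0] - rho[0, 1] * rho[1, 0] / rho[1, 1]
# Tr[sigma (sigma^{-1/2} rho_tilde sigma^{-1/2})^alpha] with sigma = 1 on its support.
closed_form = rho_tilde ** alpha

def q_regularized(eps):
    """Tr[sigma_eps (sigma_eps^{-1/2} rho sigma_eps^{-1/2})^alpha], sigma_eps = sigma + eps*1."""
    s = sigma + eps * np.eye(2)
    s_inv_half = mpow(s, -0.5)
    return np.trace(s @ mpow(s_inv_half @ rho @ s_inv_half, alpha)).real

coarse, fine = q_regularized(1e-4), q_regularized(1e-8)
print(f"closed form: {closed_form:.6f}")
print(f"eps=1e-4:    {coarse:.6f}")
print(f"eps=1e-8:    {fine:.6f}")
```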

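Observe that when $\operatorname{supp}(\rho)\subseteq\operatorname{supp}(\sigma)$ and $\alpha\in(0,1)$, the expression $\operatorname{Tr}[\sigma(\sigma^{-1/2}\rho\,\sigma^{-1/2})^{\alpha}]$ is actually a special case of $\operatorname{Tr}[\sigma(\sigma^{-1/2}\tilde{\rho}\,\sigma^{-1/2})^{\alpha}]$, because the operators $\rho_{0,1}$ and $\rho_{1,1}$ are both equal to zero in this case, so that $\Pi_\sigma\rho = \rho\,\Pi_\sigma = \rho$ and $\tilde{\rho} = \rho_{0,0}$.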
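The main intuition behind the first expression in (7.6.16) and those in (7.6.19) is as follows. If $\rho$ and $\sigma$ are positive definite, then the following equalities hold
\begin{align}
\operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right] &= \operatorname{Tr}\!\left[\rho\left(\rho^{-1/2}\sigma\rho^{-1/2}\right)^{1-\alpha}\right] \tag{7.6.20}\\
&= \operatorname{Tr}\!\left[\rho\left(\rho^{1/2}\sigma^{-1}\rho^{1/2}\right)^{\alpha-1}\right], \tag{7.6.21}
\end{align}
for all $\alpha\in(0,1)\cup(1,\infty)$, as shown previously in Proposition 7.39.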

1. If the support condition $\operatorname{supp}(\rho)\subseteq\operatorname{supp}(\sigma)$ holds, then we can think of $\operatorname{supp}(\sigma)$ as being the whole Hilbert space and $\sigma$ being invertible on the whole space. So then generalized inverses like $\sigma^{-1/2}$ or $\sigma^{-1}$ are true inverses on $\operatorname{supp}(\sigma)$, and the expression $\operatorname{Tr}[\sigma(\sigma^{-1/2}\rho\,\sigma^{-1/2})^{\alpha}]$ is sensible for $\alpha\in(0,1)\cup(1,\infty)$, with the only inverse in the expression being $\sigma^{-1/2}$; this expression results after taking the limit $\varepsilon\to 0^+$.
2. Similarly, the expression $\operatorname{Tr}[\rho(\rho^{1/2}\sigma^{-1}\rho^{1/2})^{\alpha-1}]$ is sensible for $\alpha\in(1,\infty)$, with the only inverse in the expression being $\sigma^{-1}$; this latter expression results after taking the limit $\varepsilon\to 0^+$.
3. If the support condition $\operatorname{supp}(\sigma)\subseteq\operatorname{supp}(\rho)$ holds, then we can think of $\operatorname{supp}(\rho)$ as being the whole Hilbert space and $\rho$ being invertible on the whole space. So then the generalized inverse $\rho^{-1/2}$ is a true inverse on $\operatorname{supp}(\rho)$, and the expression $\operatorname{Tr}[\rho(\rho^{-1/2}\sigma\rho^{-1/2})^{1-\alpha}]$ is sensible for $\alpha\in(0,1)$, with the only inverse in the expression being $\rho^{-1/2}$; this expression also results after taking the limit $\varepsilon\to 0^+$.
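In order to understand the first expression in (7.6.19) further, observe that the following identities hold for all $\alpha\in(0,1)\cup(1,\infty)$:
\begin{align}
\widehat{Q}_\alpha(\rho\|\sigma) &= \lim_{\varepsilon\to 0^+}\lim_{\delta\to 0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho_\delta\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] \tag{7.6.22}\\
&= \lim_{\varepsilon\to 0^+}\lim_{\delta\to 0^+}\operatorname{Tr}[G_\alpha(\sigma_\varepsilon,\rho_\delta)], \tag{7.6.23}
\end{align}
where
\begin{equation}
\rho_\delta := (1-\delta)\rho + \delta\pi, \tag{7.6.24}
\end{equation}
and $\pi$ is the maximally mixed state. This holds because the expression for the geometric Rényi relative quasi-entropy in Definition 7.38 does not involve an inverse of the state $\rho$.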
As it turns out, the order of the limits in (7.6.22) does not matter for 𝛼 ∈ (0, 1):

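Lemma 7.41 Limit Interchange for Geometric Rényi Relative Quasi-Entropy
Let $\rho$ be a state and $\sigma$ a positive semi-definite operator. For $\alpha\in(0,1)$, the following equality holds
\begin{align}
\widehat{Q}_\alpha(\rho\|\sigma) &= \lim_{\varepsilon\to 0^+}\lim_{\delta\to 0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho_\delta\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] \tag{7.6.25}\\
&= \inf_{\varepsilon,\delta>0}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho_\delta\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] \tag{7.6.26}\\
&= \lim_{\delta\to 0^+}\lim_{\varepsilon\to 0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho_\delta\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right], \tag{7.6.27}
\end{align}
where $\rho_\delta := (1-\delta)\rho + \delta\pi$, $\delta\in(0,1)$, $\pi$ is the maximally mixed state, $\sigma_\varepsilon := \sigma + \varepsilon\mathbb{1}$, and $\varepsilon > 0$.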

Proof: See Section 7.6.1. ■

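Now, because both $\sigma_\varepsilon$ and $\rho_\delta$ are positive definite for $\varepsilon,\delta > 0$, we can use the property in (7.6.5), along with Lemma 7.41, to obtain the following for $\alpha\in(0,1)$:
\begin{align}
\widehat{Q}_\alpha(\rho\|\sigma) &= \lim_{\delta\to 0^+}\lim_{\varepsilon\to 0^+}\operatorname{Tr}[G_{1-\alpha}(\rho_\delta,\sigma_\varepsilon)] \tag{7.6.28}\\
&= \lim_{\delta\to 0^+}\lim_{\varepsilon\to 0^+}\operatorname{Tr}\!\left[\rho_\delta\left(\rho_\delta^{-1/2}\sigma_\varepsilon\,\rho_\delta^{-1/2}\right)^{1-\alpha}\right] \tag{7.6.29}\\
&= \lim_{\delta\to 0^+}\operatorname{Tr}\!\left[\rho_\delta\left(\rho_\delta^{-1/2}\sigma\,\rho_\delta^{-1/2}\right)^{1-\alpha}\right], \tag{7.6.30}
\end{align}
where the last equality holds for the analogous reason that (7.6.22) holds, namely, that the inverse of $\sigma$ is not involved. We are now in a situation that looks like the expression in (7.6.1), except that the roles of $\rho$ and $\sigma$ are reversed and $\alpha$ is substituted with $1-\alpha$. Then, in the limit $\delta\to 0^+$, if the support condition $\operatorname{supp}(\sigma)\subseteq\operatorname{supp}(\rho)$ holds, the expression converges to $\operatorname{Tr}[\rho(\rho^{-1/2}\sigma\rho^{-1/2})^{1-\alpha}]$.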
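It is worthwhile to consider the special case of $\alpha = 2$. In this case, the geometric Rényi relative quasi-entropy collapses to the Petz–Rényi relative quasi-entropy when $\operatorname{supp}(\rho)\subseteq\operatorname{supp}(\sigma)$:
\begin{align}
\widehat{Q}_2(\rho\|\sigma) &= \lim_{\varepsilon\to 0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\right)^{2}\right] \tag{7.6.31}\\
&= \lim_{\varepsilon\to 0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\,\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\right] \tag{7.6.32}\\
&= \lim_{\varepsilon\to 0^+}\operatorname{Tr}[\rho\,\sigma_\varepsilon^{-1}\rho] \tag{7.6.33}\\
&= \lim_{\varepsilon\to 0^+}\operatorname{Tr}[\rho^{2}\sigma_\varepsilon^{-1}] \tag{7.6.34}\\
&= Q_2(\rho\|\sigma), \tag{7.6.35}
\end{align}
with the last line following from Proposition 7.21. The development above implies that the corresponding Rényi relative entropies are equal:
\begin{equation}
\widehat{D}_2(\rho\|\sigma) = D_2(\rho\|\sigma). \tag{7.6.36}
\end{equation}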
The geometric and sandwiched Rényi relative entropies also converge to the same
value in the limit 𝛼 → ∞, as shown in Section 7.8.
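
The coincidence of the geometric and Petz–Rényi quantities at $\alpha = 2$ can be confirmed on a random example. The sketch below assumes a positive definite $\sigma$, so that no limit is needed, and uses $\operatorname{Tr}[\rho^2\sigma^{-1}]$ for the Petz–Rényi quasi-entropy at $\alpha = 2$; helper names are illustrative.

```python
import numpy as np

def mpow(A, p):
    """Real power of a Hermitian positive definite matrix via eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * w**p) @ U.conj().T

def random_pd(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return G @ G.conj().T + np.eye(d)

rng = np.random.default_rng(7)
rho = random_pd(4, rng)
rho = rho / np.trace(rho).real
sigma = random_pd(4, rng)

# Geometric quasi-entropy at alpha = 2: Tr[sigma (sigma^{-1/2} rho sigma^{-1/2})^2].
s_inv_half = mpow(sigma, -0.5)
q_geometric = np.trace(sigma @ mpow(s_inv_half @ rho @ s_inv_half, 2.0)).real

# Petz-Renyi quasi-entropy at alpha = 2: Tr[rho^2 sigma^{-1}].
q_petz = np.trace(rho @ rho @ np.linalg.inv(sigma)).real

# With alpha - 1 = 1, both entropies are simply log2 of the quasi-entropies.
d_geometric, d_petz = np.log2(q_geometric), np.log2(q_petz)
print(f"geometric: {d_geometric:.8f}, Petz: {d_petz:.8f}")
```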
A first property of the geometric Rényi relative entropy that we establish is its
relation to the sandwiched Rényi relative entropy.

Proposition 7.42 Ordering of Sandwiched and Geometric Rényi Relative Entropies
Let $\rho$ be a state and $\sigma$ a positive semi-definite operator. The geometric Rényi relative entropy is not smaller than the sandwiched Rényi relative entropy for all $\alpha\in(0,1)\cup(1,\infty)$:
\begin{equation}
\widetilde{D}_\alpha(\rho\|\sigma) \leq \widehat{D}_\alpha(\rho\|\sigma). \tag{7.6.37}
\end{equation}

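Proof: This is a direct consequence of the Araki–Lieb–Thirring inequality (Lemma 2.15), which we recall here for convenience. For positive semi-definite operators $X$ and $Y$, $q \geq 0$, and $r\in[0,1]$, the following inequality holds
\begin{equation}
\operatorname{Tr}\!\left[\left(Y^{1/2}XY^{1/2}\right)^{rq}\right] \geq \operatorname{Tr}\!\left[\left(Y^{r/2}X^{r}Y^{r/2}\right)^{q}\right]. \tag{7.6.38}
\end{equation}
For $r \geq 1$, the following inequality holds
\begin{equation}
\operatorname{Tr}\!\left[\left(Y^{1/2}XY^{1/2}\right)^{rq}\right] \leq \operatorname{Tr}\!\left[\left(Y^{r/2}X^{r}Y^{r/2}\right)^{q}\right]. \tag{7.6.39}
\end{equation}
By employing (7.6.38) with $q = 1$, $r = \alpha\in(0,1)$, $Y = \sigma_\varepsilon^{1/\alpha}$, and $X = \sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}$, and recalling that $\sigma_\varepsilon := \sigma + \varepsilon\mathbb{1}$, we find that
\begin{align}
\widehat{Q}_\alpha(\rho\|\sigma_\varepsilon) &= \operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] \tag{7.6.40}\\
&= \operatorname{Tr}\!\left[\left(\sigma_\varepsilon^{\frac{1}{2\alpha}}\right)^{\alpha}\left(\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}\left(\sigma_\varepsilon^{\frac{1}{2\alpha}}\right)^{\alpha}\right] \tag{7.6.41}\\
&\leq \operatorname{Tr}\!\left[\left(\sigma_\varepsilon^{\frac{1}{2\alpha}}\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\sigma_\varepsilon^{\frac{1}{2\alpha}}\right)^{\alpha}\right] \tag{7.6.42}\\
&= \operatorname{Tr}\!\left[\left(\sigma_\varepsilon^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma_\varepsilon^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right] \tag{7.6.43}\\
&= \widetilde{Q}_\alpha(\rho\|\sigma_\varepsilon), \tag{7.6.44}
\end{align}
which implies for $\alpha\in(0,1)$, by using definitions and the fact that $\frac{1}{\alpha-1} < 0$, that
\begin{equation}
\widetilde{D}_\alpha(\rho\|\sigma_\varepsilon) \leq \widehat{D}_\alpha(\rho\|\sigma_\varepsilon). \tag{7.6.45}
\end{equation}
Now taking the limit as $\varepsilon\to 0^+$, employing Proposition 7.29 and Definition 7.38, we arrive at the inequality in (7.6.37).
Since the Araki–Lieb–Thirring inequality is reversed for $r = \alpha\in(1,\infty)$, we can employ similar reasoning as above, using (7.6.39) and definitions, to arrive at (7.6.37) for $\alpha\in(1,\infty)$. ■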
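
The ordering (7.6.37) can be illustrated numerically for positive definite inputs, where both quantities are given by explicit trace formulas. The sketch below uses hypothetical helper names and a handful of sample values of $\alpha$ on both sides of 1.

```python
import numpy as np

def mpow(A, p):
    """Real power of a Hermitian positive definite matrix via eigendecomposition."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    return (U * w**p) @ U.conj().T

def sandwiched_renyi(rho, sigma, alpha):
    """Sandwiched Renyi relative entropy (base 2), positive definite inputs."""
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    return np.log2(np.trace(mpow(s @ rho @ s, alpha)).real) / (alpha - 1)

def geometric_renyi(rho, sigma, alpha):
    """Geometric Renyi relative entropy (base 2), positive definite inputs."""
    s = mpow(sigma, -0.5)
    return np.log2(np.trace(sigma @ mpow(s @ rho @ s, alpha)).real) / (alpha - 1)

def random_pd(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return G @ G.conj().T + np.eye(d)

rng = np.random.default_rng(8)
rho = random_pd(3, rng)
rho = rho / np.trace(rho).real
sigma = random_pd(3, rng)

pairs = []
for alpha in (0.3, 0.7, 1.5, 4.0):
    sand = sandwiched_renyi(rho, sigma, alpha)
    geo = geometric_renyi(rho, sigma, alpha)
    pairs.append((alpha, sand, geo))
    print(f"alpha={alpha}: sandwiched={sand:.6f} <= geometric={geo:.6f}")
```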

If the state 𝜌 is pure, then the geometric Rényi relative entropy simplifies as
follows, such that it is independent of 𝛼:

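Proposition 7.43 Geometric Rényi Relative Entropy for Pure States
Let $\rho = |\psi\rangle\langle\psi|$ be a pure state and $\sigma$ a positive semi-definite operator. Then the following equality holds for all $\alpha\in(0,1)\cup(1,\infty)$:
\begin{equation}
\widehat{D}_\alpha(\rho\|\sigma) =
\begin{cases}
\log_2\langle\psi|\sigma^{-1}|\psi\rangle & \text{if } \operatorname{supp}(|\psi\rangle\langle\psi|)\subseteq\operatorname{supp}(\sigma),\\
+\infty & \text{otherwise},
\end{cases}
\tag{7.6.46}
\end{equation}
where the inverse $\sigma^{-1}$ is taken on the support of $\sigma$. If $\sigma$ is also a rank-one operator, so that $\sigma = |\phi\rangle\langle\phi|$ and $\||\phi\rangle\|_2 > 0$, then the following equality holds for all $\alpha\in(0,1)\cup(1,\infty)$:
\begin{equation}
\widehat{D}_\alpha(\rho\|\sigma) =
\begin{cases}
-\log_2\||\phi\rangle\|_2^2 & \text{if } \exists\, c\in\mathbb{C} \text{ such that } |\psi\rangle = c|\phi\rangle,\\
+\infty & \text{otherwise}.
\end{cases}
\tag{7.6.47}
\end{equation}
In particular, if $\sigma = |\phi\rangle\langle\phi|$ is a state so that $\||\phi\rangle\|_2^2 = 1$, then
\begin{equation}
\widehat{D}_\alpha(\rho\|\sigma) =
\begin{cases}
0 & \text{if } |\psi\rangle = |\phi\rangle,\\
+\infty & \text{otherwise}.
\end{cases}
\tag{7.6.48}
\end{equation}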

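Proof: Defining $\sigma_\varepsilon := \sigma + \varepsilon\mathbb{1}$, consider that
\begin{align}
\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] &= \operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}|\psi\rangle\langle\psi|\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] \tag{7.6.49}\\
&= \left(\left\|\sigma_\varepsilon^{-1/2}|\psi\rangle\right\|_2^2\right)^{\alpha}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\frac{\sigma_\varepsilon^{-1/2}|\psi\rangle\langle\psi|\sigma_\varepsilon^{-1/2}}{\left\|\sigma_\varepsilon^{-1/2}|\psi\rangle\right\|_2^2}\right)^{\alpha}\right] \tag{7.6.50}\\
&= \left(\left\|\sigma_\varepsilon^{-1/2}|\psi\rangle\right\|_2^2\right)^{\alpha}\operatorname{Tr}\!\left[\sigma_\varepsilon\,\frac{\sigma_\varepsilon^{-1/2}|\psi\rangle\langle\psi|\sigma_\varepsilon^{-1/2}}{\left\|\sigma_\varepsilon^{-1/2}|\psi\rangle\right\|_2^2}\right] \tag{7.6.51}\\
&= \left(\left\|\sigma_\varepsilon^{-1/2}|\psi\rangle\right\|_2^2\right)^{\alpha-1}\operatorname{Tr}\!\left[\sigma_\varepsilon\,\sigma_\varepsilon^{-1/2}|\psi\rangle\langle\psi|\sigma_\varepsilon^{-1/2}\right] \tag{7.6.52}\\
&= \left(\left\|\sigma_\varepsilon^{-1/2}|\psi\rangle\right\|_2^2\right)^{\alpha-1}\operatorname{Tr}[|\psi\rangle\langle\psi|] \tag{7.6.53}\\
&= \left(\langle\psi|\sigma_\varepsilon^{-1}|\psi\rangle\right)^{\alpha-1}. \tag{7.6.54}
\end{align}
The third equality follows because $(|\varphi\rangle\langle\varphi|)^{\alpha} = |\varphi\rangle\langle\varphi|$ for all $\alpha\in(0,1)\cup(1,\infty)$ when $\||\varphi\rangle\|_2 = 1$. Applying the chain of equalities above, we find that
\begin{align}
\frac{1}{\alpha-1}\log_2\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] &= \frac{1}{\alpha-1}\log_2\left(\langle\psi|\sigma_\varepsilon^{-1}|\psi\rangle\right)^{\alpha-1} \tag{7.6.55}\\
&= \log_2\langle\psi|\sigma_\varepsilon^{-1}|\psi\rangle. \tag{7.6.56}
\end{align}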

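Now let a spectral decomposition of $\sigma$ be given by
\begin{equation}
\sigma = \sum_y \mu_y Q_y, \tag{7.6.57}
\end{equation}
where $\mu_y$ are the non-negative eigenvalues and $Q_y$ are the eigenprojections. In this decomposition, we are including values of $\mu_y$ for which $\mu_y = 0$. Then it follows that
\begin{equation}
\sigma_\varepsilon = \sigma + \varepsilon\mathbb{1} = \sum_y (\mu_y + \varepsilon)\,Q_y, \tag{7.6.58}
\end{equation}
and we find that
\begin{equation}
\sigma_\varepsilon^{-1} = \sum_y (\mu_y + \varepsilon)^{-1}Q_y. \tag{7.6.59}
\end{equation}
We then conclude that
\begin{align}
\langle\psi|\sigma_\varepsilon^{-1}|\psi\rangle &= \langle\psi|\left(\sum_y (\mu_y + \varepsilon)^{-1}Q_y\right)|\psi\rangle \tag{7.6.60}\\
&= \sum_y (\mu_y + \varepsilon)^{-1}\langle\psi|Q_y|\psi\rangle \tag{7.6.61}\\
&= \sum_{y:\mu_y\neq 0} (\mu_y + \varepsilon)^{-1}\langle\psi|Q_y|\psi\rangle + \varepsilon^{-1}\langle\psi|Q_{y_0}|\psi\rangle, \tag{7.6.62}
\end{align}
where $y_0$ is the value of $y$ for which $\mu_y = 0$ (if no such value of $y$ exists, then $Q_{y_0}$ is equal to the zero operator). Thus, if $\langle\psi|Q_{y_0}|\psi\rangle \neq 0$ (equivalent to $|\psi\rangle$ being outside the support of $\sigma$), then it follows that
\begin{equation}
\lim_{\varepsilon\to 0^+}\log_2\langle\psi|\sigma_\varepsilon^{-1}|\psi\rangle = +\infty. \tag{7.6.63}
\end{equation}
Otherwise the expression converges as claimed.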

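Now suppose that $\sigma$ is a rank-one operator, so that $\sigma = |\phi\rangle\langle\phi|$ and $\||\phi\rangle\|_2 > 0$. By defining
\begin{equation}
|\phi'\rangle := \frac{|\phi\rangle}{\sqrt{N}}, \qquad N := \||\phi\rangle\|_2^2, \tag{7.6.64}
\end{equation}
we find that
\begin{align}
\sigma_\varepsilon &= |\phi\rangle\langle\phi| + \varepsilon\mathbb{1} \tag{7.6.65}\\
&= N|\phi'\rangle\langle\phi'| + \varepsilon\left(\mathbb{1} - |\phi'\rangle\langle\phi'| + |\phi'\rangle\langle\phi'|\right) \tag{7.6.66}\\
&= (N+\varepsilon)|\phi'\rangle\langle\phi'| + \varepsilon\left(\mathbb{1} - |\phi'\rangle\langle\phi'|\right), \tag{7.6.67}
\end{align}
so that
\begin{align}
\sigma_\varepsilon^{-1} &= (N+\varepsilon)^{-1}|\phi'\rangle\langle\phi'| + \varepsilon^{-1}\left(\mathbb{1} - |\phi'\rangle\langle\phi'|\right) \tag{7.6.68}\\
&= \left((N+\varepsilon)^{-1} - \varepsilon^{-1}\right)|\phi'\rangle\langle\phi'| + \varepsilon^{-1}\mathbb{1} \tag{7.6.69}
\end{align}
and then
\begin{align}
\langle\psi|\sigma_\varepsilon^{-1}|\psi\rangle &= \langle\psi|\left[\left((N+\varepsilon)^{-1} - \varepsilon^{-1}\right)|\phi'\rangle\langle\phi'| + \varepsilon^{-1}\mathbb{1}\right]|\psi\rangle \tag{7.6.70}\\
&= \left((N+\varepsilon)^{-1} - \varepsilon^{-1}\right)|\langle\psi|\phi'\rangle|^2 + \varepsilon^{-1} \tag{7.6.71}\\
&= \frac{|\langle\psi|\phi'\rangle|^2}{N+\varepsilon} + \frac{1 - |\langle\psi|\phi'\rangle|^2}{\varepsilon}. \tag{7.6.72}
\end{align}
Note that we always have $|\langle\psi|\phi'\rangle|^2\in[0,1]$ because $|\psi\rangle$ and $|\phi'\rangle$ are unit vectors. In the case that $|\langle\psi|\phi'\rangle|^2\in[0,1)$, we find that
\begin{align}
\lim_{\varepsilon\to 0^+}\log_2\langle\psi|\sigma_\varepsilon^{-1}|\psi\rangle &= \lim_{\varepsilon\to 0^+}\log_2\!\left(\frac{|\langle\psi|\phi'\rangle|^2}{N+\varepsilon} + \frac{1 - |\langle\psi|\phi'\rangle|^2}{\varepsilon}\right) \tag{7.6.73}\\
&= +\infty. \tag{7.6.74}
\end{align}
Otherwise, if $|\langle\psi|\phi'\rangle|^2 = 1$, then
\begin{align}
\lim_{\varepsilon\to 0^+}\log_2\langle\psi|\sigma_\varepsilon^{-1}|\psi\rangle &= \lim_{\varepsilon\to 0^+}\log_2\!\left(\frac{|\langle\psi|\phi'\rangle|^2}{N+\varepsilon} + \frac{1 - |\langle\psi|\phi'\rangle|^2}{\varepsilon}\right) \tag{7.6.75}\\
&= \lim_{\varepsilon\to 0^+}\log_2\!\left(\frac{1}{N+\varepsilon}\right) \tag{7.6.76}\\
&= -\log_2 N, \tag{7.6.77}
\end{align}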

concluding the proof. ■
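
The independence of $\alpha$ asserted in (7.6.46) can be observed numerically. The sketch below assumes a positive definite $\sigma$ (so that the support condition holds automatically) and sets numerically negligible eigenvalues of the rank-one operator to zero before taking powers, which is our way of taking the power on the support; all names are illustrative.

```python
import numpy as np

def psd_pow(A, p, rel_tol=1e-12):
    """Power of a Hermitian PSD matrix on its support (tiny eigenvalues set to zero)."""
    w, U = np.linalg.eigh((A + A.conj().T) / 2)
    keep = w > rel_tol * w.max()
    wp = np.zeros_like(w)
    wp[keep] = w[keep] ** p
    return (U * wp) @ U.conj().T

def geometric_renyi(rho, sigma, alpha):
    """Geometric Renyi relative entropy (base 2) for positive definite sigma."""
    s = psd_pow(sigma, -0.5)
    q = np.trace(sigma @ psd_pow(s @ rho @ s, alpha)).real
    return np.log2(q) / (alpha - 1)

rng = np.random.default_rng(9)
d = 3
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi = psi / np.linalg.norm(psi)                 # unit vector |psi>
rho = np.outer(psi, psi.conj())                 # pure state |psi><psi|
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
sigma = G @ G.conj().T + np.eye(d)              # positive definite sigma

# Right-hand side of (7.6.46): log2 <psi| sigma^{-1} |psi>.
target = np.log2((psi.conj() @ np.linalg.inv(sigma) @ psi).real)

values = {alpha: geometric_renyi(rho, sigma, alpha) for alpha in (0.3, 0.9, 2.0, 5.0)}
for alpha, val in values.items():
    print(f"alpha={alpha}: D^={val:.10f}, log2<psi|sigma^-1|psi>={target:.10f}")
```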

We note here that, for pure states 𝜌 and 𝜎 and as indicated by (7.6.48), the
geometric Rényi relative entropy is either equal to zero or +∞, depending on
whether 𝜌 = 𝜎. This behavior of the geometric Rényi relative entropy for pure
states 𝜌 and 𝜎 is very different from that of the Petz– and sandwiched Rényi relative

entropies. The latter quantities always evaluate to a finite value if the pure states
are non-orthogonal.
The geometric Rényi relative entropy possesses a number of useful properties,
similar to those for the Petz– and sandwiched Rényi relative entropies, which we
delineate now.

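Proposition 7.44 Properties of Geometric Rényi Relative Entropy
For all states $\rho$, $\rho_1$, $\rho_2$ and positive semi-definite operators $\sigma$, $\sigma_1$, $\sigma_2$, the geometric Rényi relative entropy satisfies the following properties:
1. Isometric invariance: For all $\alpha\in(0,1)\cup(1,\infty)$ and for every isometry $V$,
\begin{equation}
\widehat{D}_\alpha(\rho\|\sigma) = \widehat{D}_\alpha(V\rho V^\dagger\|V\sigma V^\dagger). \tag{7.6.78}
\end{equation}
2. Monotonicity in $\alpha$: For all $\alpha\in(0,1)\cup(1,\infty)$, the geometric Rényi relative entropy $\widehat{D}_\alpha$ is monotonically increasing in $\alpha$; i.e., $\alpha < \beta$ implies $\widehat{D}_\alpha(\rho\|\sigma) \leq \widehat{D}_\beta(\rho\|\sigma)$.
3. Additivity: For all $\alpha\in(0,1)\cup(1,\infty)$,
\begin{equation}
\widehat{D}_\alpha(\rho_1\otimes\rho_2\|\sigma_1\otimes\sigma_2) = \widehat{D}_\alpha(\rho_1\|\sigma_1) + \widehat{D}_\alpha(\rho_2\|\sigma_2). \tag{7.6.79}
\end{equation}
4. Direct-sum property: Let $p : \mathcal{X}\to[0,1]$ be a probability distribution over a finite alphabet $\mathcal{X}$ with associated $|\mathcal{X}|$-dimensional system $X$, and let $q : \mathcal{X}\to[0,\infty)$ be a positive function on $\mathcal{X}$. Let $\{\rho_A^x\}_{x\in\mathcal{X}}$ be a set of states on a system $A$, and let $\{\sigma_A^x\}_{x\in\mathcal{X}}$ be a set of positive semi-definite operators on $A$. Then,
\begin{equation}
\widehat{Q}_\alpha(\rho_{XA}\|\sigma_{XA}) = \sum_{x\in\mathcal{X}} p(x)^{\alpha}q(x)^{1-\alpha}\,\widehat{Q}_\alpha(\rho_A^x\|\sigma_A^x), \tag{7.6.80}
\end{equation}
where
\begin{align}
\rho_{XA} &:= \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X\otimes\rho_A^x, \tag{7.6.81}\\
\sigma_{XA} &:= \sum_{x\in\mathcal{X}} q(x)|x\rangle\langle x|_X\otimes\sigma_A^x. \tag{7.6.82}
\end{align}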

Proof:
b𝛼 (𝜌∥𝜎) as in (7.6.1)–
1. Proof of isometric invariance: Let us start by writing 𝐷
(7.6.3):
𝛼
1 − 21 − 12
b𝛼 (𝜌∥𝜎) = lim
𝐷 log2 Tr 𝜎𝜀 𝜎𝜀 𝜌𝜎𝜀 . (7.6.83)
𝜀→0+ 𝛼 − 1

where
𝜎𝜀 B 𝜎 + 𝜀 1. (7.6.84)
Let 𝑉 be an isometry. Then, defining

𝜔𝜀 B 𝑉 𝜎𝑉 † + 𝜀 1, (7.6.85)

we find that
𝛼
1 1 1
− −
b𝛼 (𝑉 𝜌𝑉 † ∥𝑉 𝜎𝑉 † ) = lim
𝐷 log2 Tr 𝜔𝜀 𝜔𝜀 2 𝑉 𝜌𝑉 † 𝜔𝜀 2 . (7.6.86)
𝜀→0 𝛼 − 1
+

Now let Π B 𝑉𝑉 † be the projection onto the image of 𝑉, so that Π𝑉 = 𝑉, and

let Π̂ B 1 − Π. Then we can write

𝜔𝜀 = 𝑉 𝜎𝑉 † + 𝜀Π + 𝜀 Π̂ = 𝑉 𝜎𝜀𝑉 † + 𝜀 Π̂. (7.6.87)

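Since $V\sigma_\varepsilon V^\dagger$ and $\varepsilon\hat{\Pi}$ are supported on orthogonal subspaces, we obtain
\[ \omega_\varepsilon^{-1/2} = V\sigma_\varepsilon^{-1/2}V^\dagger + \varepsilon^{-1/2}\hat{\Pi}. \tag{7.6.88} \]
Consider then that
\begin{align}
\omega_\varepsilon^{-1/2}V\rho V^\dagger\omega_\varepsilon^{-1/2}
&= \left(V\sigma_\varepsilon^{-1/2}V^\dagger + \varepsilon^{-1/2}\hat{\Pi}\right)\Pi V\rho V^\dagger\Pi\left(V\sigma_\varepsilon^{-1/2}V^\dagger + \varepsilon^{-1/2}\hat{\Pi}\right) \tag{7.6.89}\\
&= V\sigma_\varepsilon^{-1/2}V^\dagger\,\Pi V\rho V^\dagger\Pi\, V\sigma_\varepsilon^{-1/2}V^\dagger \tag{7.6.90}\\
&= V\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}V^\dagger, \tag{7.6.91}
\end{align}
where the second equality follows because $\hat{\Pi}\Pi = \Pi\hat{\Pi} = 0$. Thus,
\[ \left(\omega_\varepsilon^{-1/2}V\rho V^\dagger\omega_\varepsilon^{-1/2}\right)^{\alpha} = V\left(\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}V^\dagger, \tag{7.6.92} \]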


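and we find that
\begin{align}
\operatorname{Tr}\!\left[\omega_\varepsilon\left(\omega_\varepsilon^{-1/2}V\rho V^\dagger\omega_\varepsilon^{-1/2}\right)^{\alpha}\right]
&= \operatorname{Tr}\!\left[\left(V\sigma_\varepsilon V^\dagger + \varepsilon\hat{\Pi}\right)V\left(\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}V^\dagger\right] \tag{7.6.93}\\
&= \operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right]. \tag{7.6.94}
\end{align}
Since the equality
\[ \operatorname{Tr}\!\left[\omega_\varepsilon\left(\omega_\varepsilon^{-1/2}V\rho V^\dagger\omega_\varepsilon^{-1/2}\right)^{\alpha}\right] = \operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] \tag{7.6.95} \]
holds for all 𝜀 > 0, we conclude the proof of isometric invariance by taking
the limit $\varepsilon\to0^+$.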
2. Proof of monotonicity in 𝛼: We prove this by showing that the derivative is
non-negative for all 𝛼 > 0. By applying (7.6.22), we can consider 𝜌 and 𝜎 to
be positive definite without loss of generality. By applying (7.6.14), consider
that

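\begin{align}
\widehat{Q}_\alpha(\rho\|\sigma) &= \operatorname{Tr}\!\left[\rho\left(\rho^{-1/2}\sigma\rho^{-1/2}\right)^{1-\alpha}\right] \tag{7.6.96}\\
&= \operatorname{Tr}\!\left[\rho\left(\rho^{1/2}\sigma^{-1}\rho^{1/2}\right)^{\alpha-1}\right]. \tag{7.6.97}
\end{align}
Now defining $|\varphi^\rho\rangle = (\rho^{1/2}\otimes\mathbb{1})|\Gamma\rangle$ as a purification of 𝜌, and setting
\begin{align}
\gamma &:= \alpha - 1, \tag{7.6.98}\\
X &:= \rho^{1/2}\sigma^{-1}\rho^{1/2}, \tag{7.6.99}
\end{align}
we can write the geometric Rényi relative entropy as
\[ \widehat{D}_\alpha(\rho\|\sigma) = \frac{1}{\gamma}\log_2\langle\varphi^\rho|X^\gamma\otimes\mathbb{1}|\varphi^\rho\rangle = \frac{1}{\gamma}\,\frac{\ln\langle\varphi^\rho|X^\gamma\otimes\mathbb{1}|\varphi^\rho\rangle}{\ln(2)}, \tag{7.6.100} \]
where we made use of (7.6.97). Then $\frac{d}{d\alpha} = \frac{d\gamma}{d\alpha}\frac{d}{d\gamma} = \frac{d}{d\gamma}$, and so we find that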

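\begin{align}
\ln(2)\,\frac{d}{d\alpha}\widehat{D}_\alpha(\rho\|\sigma)
&= \frac{d}{d\gamma}\left[\frac{1}{\gamma}\ln\langle\varphi^\rho|X^\gamma\otimes\mathbb{1}|\varphi^\rho\rangle\right] \tag{7.6.101}\\
&= -\frac{1}{\gamma^2}\ln\langle\varphi^\rho|X^\gamma\otimes\mathbb{1}|\varphi^\rho\rangle + \frac{1}{\gamma}\frac{d}{d\gamma}\ln\langle\varphi^\rho|X^\gamma\otimes\mathbb{1}|\varphi^\rho\rangle \tag{7.6.102}\\
&= -\frac{1}{\gamma^2}\ln\langle\varphi^\rho|X^\gamma\otimes\mathbb{1}|\varphi^\rho\rangle + \frac{1}{\gamma}\,\frac{\langle\varphi^\rho|X^\gamma\ln X\otimes\mathbb{1}|\varphi^\rho\rangle}{\langle\varphi^\rho|X^\gamma\otimes\mathbb{1}|\varphi^\rho\rangle} \tag{7.6.103}\\
&= \frac{-\langle\varphi^\rho|X^\gamma\otimes\mathbb{1}|\varphi^\rho\rangle\ln\langle\varphi^\rho|X^\gamma\otimes\mathbb{1}|\varphi^\rho\rangle + \gamma\langle\varphi^\rho|X^\gamma\ln X\otimes\mathbb{1}|\varphi^\rho\rangle}{\gamma^2\langle\varphi^\rho|X^\gamma\otimes\mathbb{1}|\varphi^\rho\rangle} \tag{7.6.104}\\
&= \frac{-\langle\varphi^\rho|X^\gamma\otimes\mathbb{1}|\varphi^\rho\rangle\ln\langle\varphi^\rho|X^\gamma\otimes\mathbb{1}|\varphi^\rho\rangle + \langle\varphi^\rho|X^\gamma\ln X^\gamma\otimes\mathbb{1}|\varphi^\rho\rangle}{\gamma^2\langle\varphi^\rho|X^\gamma\otimes\mathbb{1}|\varphi^\rho\rangle}. \tag{7.6.105}
\end{align}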

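Letting $g(x) := x\log_2 x$, we write
\[ \frac{d}{d\alpha}\widehat{D}_\alpha(\rho\|\sigma) = \frac{\langle\varphi^\rho|g(X^\gamma\otimes\mathbb{1})|\varphi^\rho\rangle - g(\langle\varphi^\rho|(X^\gamma\otimes\mathbb{1})|\varphi^\rho\rangle)}{\gamma^2\langle\varphi^\rho|X^\gamma\otimes\mathbb{1}|\varphi^\rho\rangle}. \tag{7.6.106} \]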
Then, since 𝑔(𝑥) is operator convex, by the operator Jensen inequality in
(2.3.23), we conclude that

⟨𝜑 𝜌 |𝑔(𝑋 𝛾 ⊗ 1)|𝜑 𝜌 ⟩ ≥ 𝑔(⟨𝜑 𝜌 |(𝑋 𝛾 ⊗ 1)|𝜑 𝜌 ⟩), (7.6.107)

which means that $\frac{d}{d\alpha}\widehat{D}_\alpha(\rho\|\sigma) \geq 0$. Therefore, $\widehat{D}_\alpha(\rho\|\sigma)$ is monotonically
increasing in 𝛼, as required.
3. Proof of additivity: The proof of (7.6.79) is found by direct evaluation. Consider
that

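\[ \lim_{\varepsilon_1\to0^+}\widehat{Q}_\alpha(\rho_1\|\sigma_{1,\varepsilon_1})\cdot\lim_{\varepsilon_2\to0^+}\widehat{Q}_\alpha(\rho_2\|\sigma_{2,\varepsilon_2}) = \lim_{\varepsilon_1\to0^+}\lim_{\varepsilon_2\to0^+}\widehat{Q}_\alpha(\rho_1\|\sigma_{1,\varepsilon_1})\cdot\widehat{Q}_\alpha(\rho_2\|\sigma_{2,\varepsilon_2}), \tag{7.6.108} \]
where $\sigma_{1,\varepsilon_1} := \sigma_1 + \varepsilon_1\mathbb{1}$ and $\sigma_{2,\varepsilon_2} := \sigma_2 + \varepsilon_2\mathbb{1}$. We then find that
\begin{align}
&\widehat{Q}_\alpha(\rho_1\|\sigma_{1,\varepsilon_1})\cdot\widehat{Q}_\alpha(\rho_2\|\sigma_{2,\varepsilon_2}) \notag\\
&\quad= \operatorname{Tr}\!\left[\sigma_{1,\varepsilon_1}\left(\sigma_{1,\varepsilon_1}^{-1/2}\rho_1\sigma_{1,\varepsilon_1}^{-1/2}\right)^{\alpha}\right]\operatorname{Tr}\!\left[\sigma_{2,\varepsilon_2}\left(\sigma_{2,\varepsilon_2}^{-1/2}\rho_2\sigma_{2,\varepsilon_2}^{-1/2}\right)^{\alpha}\right] \tag{7.6.109}\\
&\quad= \operatorname{Tr}\!\left[\sigma_{1,\varepsilon_1}\left(\sigma_{1,\varepsilon_1}^{-1/2}\rho_1\sigma_{1,\varepsilon_1}^{-1/2}\right)^{\alpha}\otimes\sigma_{2,\varepsilon_2}\left(\sigma_{2,\varepsilon_2}^{-1/2}\rho_2\sigma_{2,\varepsilon_2}^{-1/2}\right)^{\alpha}\right] \tag{7.6.110}\\
&\quad= \operatorname{Tr}\!\left[\left(\sigma_{1,\varepsilon_1}\otimes\sigma_{2,\varepsilon_2}\right)\left(\left(\sigma_{1,\varepsilon_1}^{-1/2}\rho_1\sigma_{1,\varepsilon_1}^{-1/2}\right)^{\alpha}\otimes\left(\sigma_{2,\varepsilon_2}^{-1/2}\rho_2\sigma_{2,\varepsilon_2}^{-1/2}\right)^{\alpha}\right)\right] \tag{7.6.111}\\
&\quad= \operatorname{Tr}\!\left[\left(\sigma_{1,\varepsilon_1}\otimes\sigma_{2,\varepsilon_2}\right)\left(\sigma_{1,\varepsilon_1}^{-1/2}\rho_1\sigma_{1,\varepsilon_1}^{-1/2}\otimes\sigma_{2,\varepsilon_2}^{-1/2}\rho_2\sigma_{2,\varepsilon_2}^{-1/2}\right)^{\alpha}\right] \tag{7.6.112}\\
&\quad= \operatorname{Tr}\!\left[\left(\sigma_{1,\varepsilon_1}\otimes\sigma_{2,\varepsilon_2}\right)\left(\left(\sigma_{1,\varepsilon_1}\otimes\sigma_{2,\varepsilon_2}\right)^{-1/2}(\rho_1\otimes\rho_2)\left(\sigma_{1,\varepsilon_1}\otimes\sigma_{2,\varepsilon_2}\right)^{-1/2}\right)^{\alpha}\right] \tag{7.6.113}\\
&\quad= \widehat{Q}_\alpha(\rho_1\otimes\rho_2\|\sigma_{1,\varepsilon_1}\otimes\sigma_{2,\varepsilon_2}). \tag{7.6.114}
\end{align}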

By considering that

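\[ \lim_{\varepsilon_1\to0^+}\lim_{\varepsilon_2\to0^+}\sigma_{1,\varepsilon_1}\otimes\sigma_{2,\varepsilon_2} = \lim_{\varepsilon\to0^+}\sigma_1\otimes\sigma_2 + \varepsilon\mathbb{1}\otimes\mathbb{1}, \tag{7.6.115} \]
along with continuity of the underlying functions, we conclude that
\[ \lim_{\varepsilon_1\to0^+}\lim_{\varepsilon_2\to0^+}\widehat{Q}_\alpha(\rho_1\otimes\rho_2\|\sigma_{1,\varepsilon_1}\otimes\sigma_{2,\varepsilon_2}) = \lim_{\varepsilon\to0^+}\widehat{Q}_\alpha(\rho_1\otimes\rho_2\|\sigma_1\otimes\sigma_2 + \varepsilon\mathbb{1}\otimes\mathbb{1}). \tag{7.6.116} \]
Finally, by applying the continuous function $\frac{1}{\alpha-1}\log_2(\cdot)$ to all sides of the
equalities established, we conclude that additivity holds.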
4. Proof of direct-sum property: Define the classical–quantum state 𝜌 𝑋 𝐴 and
operator 𝜎𝑋 𝐴 , respectively, as
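\[ \rho_{XA} := \sum_{x\in\mathcal{X}}p(x)|x\rangle\langle x|_X\otimes\rho_A^x, \qquad \sigma_{XA} := \sum_{x\in\mathcal{X}}q(x)|x\rangle\langle x|_X\otimes\sigma_A^x. \tag{7.6.117} \]
Define
\begin{align}
\sigma^\varepsilon_{XA} &:= \sum_{x\in\mathcal{X}}q(x)|x\rangle\langle x|_X\otimes\sigma_A^x + \varepsilon\mathbb{1}_X\otimes\mathbb{1}_A \tag{7.6.118}\\
&= \sum_{x\in\mathcal{X},\,q(x)\neq0}q(x)|x\rangle\langle x|_X\otimes\sigma_A^x + \varepsilon\sum_{x\in\mathcal{X}}|x\rangle\langle x|_X\otimes\mathbb{1}_A \tag{7.6.119}\\
&= \sum_{x\in\mathcal{X},\,q(x)\neq0}|x\rangle\langle x|_X\otimes q(x)\sigma^x_{A,\varepsilon} + \sum_{x\in\mathcal{X},\,q(x)=0}|x\rangle\langle x|_X\otimes\varepsilon\mathbb{1}_A, \tag{7.6.120}
\end{align}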


where
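\[ \sigma^x_{A,\varepsilon} := \sigma_A^x + \varepsilon\mathbb{1}_A. \tag{7.6.121} \]
Then we find that
\[ \left(\sigma^\varepsilon_{XA}\right)^{-1/2} = \sum_{x\in\mathcal{X},\,q(x)\neq0}|x\rangle\langle x|_X\otimes\left(q(x)\sigma^x_{A,\varepsilon}\right)^{-1/2} + \sum_{x\in\mathcal{X},\,q(x)=0}|x\rangle\langle x|_X\otimes\varepsilon^{-1/2}\mathbb{1}_A, \tag{7.6.122} \]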

so that (omitting some lines of calculation)

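\begin{align}
\left(\left(\sigma^\varepsilon_{XA}\right)^{-1/2}\rho_{XA}\left(\sigma^\varepsilon_{XA}\right)^{-1/2}\right)^{\alpha}
&= \sum_{\substack{x\in\mathcal{X},\,q(x)\neq0,\\ p(x)\neq0}}|x\rangle\langle x|_X\otimes\left(\frac{p(x)}{q(x)}\right)^{\alpha}\left(\left(\sigma^x_{A,\varepsilon}\right)^{-1/2}\rho_A^x\left(\sigma^x_{A,\varepsilon}\right)^{-1/2}\right)^{\alpha} \notag\\
&\quad+ \sum_{\substack{x\in\mathcal{X},\,q(x)=0,\\ p(x)\neq0}}|x\rangle\langle x|_X\otimes\varepsilon^{-\alpha}(\rho_A^x)^{\alpha}. \tag{7.6.123}
\end{align}
Defining
\[ \omega_A^x := \left(\sigma^x_{A,\varepsilon}\right)^{-1/2}\rho_A^x\left(\sigma^x_{A,\varepsilon}\right)^{-1/2}, \tag{7.6.124} \]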
it then follows that
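\begin{align}
&\widehat{Q}_\alpha(\rho_{XA}\|\sigma^\varepsilon_{XA}) \notag\\
&= \operatorname{Tr}\!\left[\sigma^\varepsilon_{XA}\left(\left(\sigma^\varepsilon_{XA}\right)^{-1/2}\rho_{XA}\left(\sigma^\varepsilon_{XA}\right)^{-1/2}\right)^{\alpha}\right] \tag{7.6.125}\\
&= \operatorname{Tr}\!\left[\left(\sum_{\substack{x\in\mathcal{X},\\ q(x)\neq0}}q(x)|x\rangle\langle x|_X\otimes\sigma^x_{A,\varepsilon}\right)\left(\sum_{\substack{x'\in\mathcal{X},\\ q(x')\neq0,\\ p(x')\neq0}}|x'\rangle\langle x'|_X\otimes\left(\frac{p(x')}{q(x')}\right)^{\alpha}\left(\omega_A^{x'}\right)^{\alpha}\right)\right] \notag\\
&\quad+ \operatorname{Tr}\!\left[\left(\sum_{\substack{x\in\mathcal{X},\\ q(x)\neq0}}q(x)|x\rangle\langle x|_X\otimes\sigma^x_{A,\varepsilon}\right)\left(\sum_{\substack{x'\in\mathcal{X},\\ q(x')=0,\\ p(x')\neq0}}|x'\rangle\langle x'|_X\otimes\varepsilon^{-\alpha}\left(\rho_A^{x'}\right)^{\alpha}\right)\right] \notag\\
&\quad+ \operatorname{Tr}\!\left[\left(\sum_{\substack{x\in\mathcal{X},\\ q(x)=0}}|x\rangle\langle x|_X\otimes\varepsilon\mathbb{1}_A\right)\left(\sum_{\substack{x'\in\mathcal{X},\\ q(x')\neq0,\\ p(x')\neq0}}|x'\rangle\langle x'|_X\otimes\left(\frac{p(x')}{q(x')}\right)^{\alpha}\left(\omega_A^{x'}\right)^{\alpha}\right)\right] \notag\\
&\quad+ \operatorname{Tr}\!\left[\left(\sum_{\substack{x\in\mathcal{X},\\ q(x)=0}}|x\rangle\langle x|_X\otimes\varepsilon\mathbb{1}_A\right)\left(\sum_{\substack{x'\in\mathcal{X},\\ q(x')=0,\\ p(x')\neq0}}|x'\rangle\langle x'|_X\otimes\varepsilon^{-\alpha}\left(\rho_A^{x'}\right)^{\alpha}\right)\right] \tag{7.6.126}\\
&= \sum_{x\in\mathcal{X},\,q(x)\neq0,\,p(x)\neq0}p(x)^{\alpha}q(x)^{1-\alpha}\operatorname{Tr}\!\left[\sigma^x_{A,\varepsilon}\left(\omega_A^x\right)^{\alpha}\right] + \sum_{x\in\mathcal{X},\,q(x)=0,\,p(x)\neq0}\varepsilon^{1-\alpha}\operatorname{Tr}\!\left[(\rho_A^x)^{\alpha}\right]. \tag{7.6.127}
\end{align}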
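Now observing that $\operatorname{Tr}\!\left[\sigma^x_{A,\varepsilon}\left(\omega_A^x\right)^{\alpha}\right] = \widehat{Q}_\alpha(\rho_A^x\|\sigma^x_{A,\varepsilon})$ and taking the limit
$\varepsilon\to0^+$ in the last line above, we find that
\begin{align}
&\lim_{\varepsilon\to0^+}\left(\sum_{\substack{x\in\mathcal{X},\\ q(x)\neq0,\\ p(x)\neq0}}p(x)^{\alpha}q(x)^{1-\alpha}\widehat{Q}_\alpha(\rho_A^x\|\sigma^x_{A,\varepsilon}) + \sum_{\substack{x\in\mathcal{X},\\ q(x)=0,\\ p(x)\neq0}}\varepsilon^{1-\alpha}\operatorname{Tr}\!\left[(\rho_A^x)^{\alpha}\right]\right) \notag\\
&\qquad= \sum_{x\in\mathcal{X}}p(x)^{\alpha}q(x)^{1-\alpha}\widehat{Q}_\alpha(\rho_A^x\|\sigma_A^x) \tag{7.6.128}
\end{align}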

if 𝛼 ∈ (0, 1) or if 𝛼 ∈ (1, ∞), supp(𝜌 𝑥𝐴 ) ⊆ supp(𝜎𝐴𝑥 ), and there does not exist
a value of 𝑥 for which 𝑝(𝑥) ≠ 0 and 𝑞(𝑥) = 0. The latter support conditions
are precisely the same as supp(𝜌 𝑋 𝐴 ) ⊆ supp(𝜎𝑋 𝐴 ). If 𝛼 ∈ (1, ∞) and the
support conditions do not hold, then the limit evaluates to +∞, consistent
with the right-hand side above. This concludes the proof of the direct-sum
property. ■
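Two steps of the proof above lend themselves to a numerical spot check: the identity (7.6.95), which holds at every fixed 𝜀 > 0 and not only in the limit, and the direct-sum formula (7.6.80). The sketch below assumes full-rank inputs for the direct-sum check; the helper names, the value of 𝜀, the isometry construction by QR decomposition, and the particular 𝑝 and 𝑞 are illustrative assumptions.

```python
import numpy as np

def mpow(a, t):
    """Real power t of a Hermitian positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # remove tiny negative round-off
    return (v * w**t) @ v.conj().T

def geometric_q(rho, sigma, alpha):
    """Tr[sigma (sigma^{-1/2} rho sigma^{-1/2})^alpha] for invertible sigma."""
    s = mpow(sigma, -0.5)
    return np.trace(sigma @ mpow(s @ rho @ s, alpha)).real

rng = np.random.default_rng(1)

def rand_pd(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return g @ g.conj().T + 0.1 * np.eye(d)

def rand_state(d):
    a = rand_pd(d)
    return a / np.trace(a).real

alpha, eps = 1.6, 1e-3

# Identity (7.6.95): omega_eps = V sigma V^dag + eps*1 versus sigma_eps = sigma + eps*1.
d_in, d_out = 2, 4
rho, sigma = rand_state(d_in), rand_pd(d_in)
z = rng.normal(size=(d_out, d_in)) + 1j * rng.normal(size=(d_out, d_in))
V, _ = np.linalg.qr(z)  # reduced QR: columns of V are orthonormal, so V is an isometry
sigma_eps = sigma + eps * np.eye(d_in)
omega_eps = V @ sigma @ V.conj().T + eps * np.eye(d_out)
q_small = geometric_q(rho, sigma_eps, alpha)
q_big = geometric_q(V @ rho @ V.conj().T, omega_eps, alpha)

# Direct-sum property (7.6.80) for a two-letter alphabet.
p = np.array([0.3, 0.7])   # probability distribution
q = np.array([0.5, 2.0])   # positive function, not normalized
rhos = [rand_state(2) for _ in p]
sigmas = [rand_pd(2) for _ in q]
proj = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
rho_XA = sum(p[x] * np.kron(proj[x], rhos[x]) for x in range(2))
sigma_XA = sum(q[x] * np.kron(proj[x], sigmas[x]) for x in range(2))
lhs = geometric_q(rho_XA, sigma_XA, alpha)
rhs = sum(p[x]**alpha * q[x]**(1 - alpha) * geometric_q(rhos[x], sigmas[x], alpha)
          for x in range(2))
```

Here `q_big` and `q_small` should agree up to round-off, as should `lhs` and `rhs`.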

We now establish the data-processing inequality for the geometric Rényi relative
entropy for 𝛼 ∈ (0, 1) ∪ (1, 2].


Theorem 7.45 Data-Processing Inequality for Geometric Rényi Relative Entropy
Let 𝜌 be a state, 𝜎 a positive semi-definite operator, and N a quantum channel.
Then, for all 𝛼 ∈ (0, 1) ∪ (1, 2],
\[ \widehat{D}_\alpha(\rho\|\sigma) \geq \widehat{D}_\alpha(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)). \tag{7.6.129} \]

Proof: From Stinespring’s dilation theorem (Theorem 4.3), we know that the
action of a quantum channel N on every linear operator 𝑋 can be written as
N(𝑋) = Tr𝐸 [𝑉 𝑋𝑉 † ], (7.6.130)
where 𝑉 is an isometry and 𝐸 is an auxiliary system with dimension $d_E \geq \operatorname{rank}(\Gamma^{\mathcal{N}}_{AB})$,
with $\Gamma^{\mathcal{N}}_{AB}$ the Choi operator for the channel N. As stated in Proposition 7.44, the
geometric Rényi relative entropy $\widehat{D}_\alpha$ is isometrically invariant. Therefore, it suffices
to establish the data-processing inequality for $\widehat{D}_\alpha$ under partial trace; i.e., it suffices
to show that for every state $\rho_{AB}$, positive semi-definite operator $\sigma_{AB}$, and for all
𝛼 ∈ (0, 1) ∪ (1, 2]:
\[ \widehat{D}_\alpha(\rho_{AB}\|\sigma_{AB}) \geq \widehat{D}_\alpha(\rho_A\|\sigma_A). \tag{7.6.131} \]
We now proceed to prove this inequality. We prove it for 𝜌 𝐴𝐵 , and hence 𝜌 𝐴 ,
invertible, as well as for 𝜎𝐴𝐵 and 𝜎𝐴 invertible. The result follows in the general
case of 𝜌 𝐴𝐵 and/or 𝜌 𝐴 non-invertible, as well as 𝜎𝐴𝐵 and/or 𝜎𝐴 non-invertible, by
applying the result to the invertible operators (1 − 𝛿) 𝜌 𝐴𝐵 + 𝛿𝜋 𝐴𝐵 and 𝜎𝐴𝐵 + 𝜀 1 𝐴𝐵 ,
with 𝛿 ∈ (0, 1) and 𝜀 > 0, and taking the limit 𝛿 → 0+ followed by 𝜀 → 0+ ,
because
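\begin{align}
\widehat{D}_\alpha(\rho_{AB}\|\sigma_{AB}) &= \lim_{\varepsilon\to0^+}\lim_{\delta\to0^+}\widehat{D}_\alpha((1-\delta)\rho_{AB} + \delta\pi_{AB}\|\sigma_{AB} + \varepsilon\mathbb{1}_{AB}), \tag{7.6.132}\\
\widehat{D}_\alpha(\rho_A\|\sigma_A) &= \lim_{\varepsilon\to0^+}\lim_{\delta\to0^+}\widehat{D}_\alpha((1-\delta)\rho_A + \delta\pi_A\|\sigma_A + d_B\varepsilon\mathbb{1}_A), \tag{7.6.133}
\end{align}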

which follows from (7.6.22) and the fact that the dimensional factor 𝑑 𝐵 does not
affect the limit in the second quantity above.
To establish the data-processing inequality, we make use of the Petz recovery
channel for partial trace (see Section 4.6.1.1), as well as the operator Jensen
inequality (Theorem 2.16). Recall that the Petz recovery channel P𝜎𝐴𝐵 ,Tr𝐵 for
partial trace is defined as

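\[ \mathcal{P}_{\sigma_{AB},\operatorname{Tr}_B}(X_A) \equiv \mathcal{P}(X_A) := \sigma_{AB}^{1/2}\left(\sigma_A^{-1/2}X_A\sigma_A^{-1/2}\otimes\mathbb{1}_B\right)\sigma_{AB}^{1/2}. \tag{7.6.134} \]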


The Petz recovery channel has the following property:

P(𝜎𝐴 ) = 𝜎𝐴𝐵 , (7.6.135)

which can be verified by inspection. Since P𝜎𝐴𝐵 ,Tr𝐵 is completely positive and trace
preserving, it follows that its adjoint
\[ \mathcal{P}^\dagger(Y_{AB}) := \sigma_A^{-1/2}\operatorname{Tr}_B\!\left[\sigma_{AB}^{1/2}Y_{AB}\sigma_{AB}^{1/2}\right]\sigma_A^{-1/2}, \tag{7.6.136} \]

is completely positive and unital. Observe that

\[ \mathcal{P}^\dagger\!\left(\sigma_{AB}^{-1/2}\rho_{AB}\sigma_{AB}^{-1/2}\right) = \sigma_A^{-1/2}\rho_A\sigma_A^{-1/2}. \tag{7.6.137} \]

We then find for 𝛼 ∈ (1, 2] that

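\begin{align}
\widehat{Q}_\alpha(\rho_{AB}\|\sigma_{AB}) &= \operatorname{Tr}\!\left[\sigma_{AB}\left(\sigma_{AB}^{-1/2}\rho_{AB}\sigma_{AB}^{-1/2}\right)^{\alpha}\right] \tag{7.6.138}\\
&= \operatorname{Tr}\!\left[\mathcal{P}(\sigma_A)\left(\sigma_{AB}^{-1/2}\rho_{AB}\sigma_{AB}^{-1/2}\right)^{\alpha}\right] \tag{7.6.139}\\
&= \operatorname{Tr}\!\left[\sigma_A\,\mathcal{P}^\dagger\!\left(\left(\sigma_{AB}^{-1/2}\rho_{AB}\sigma_{AB}^{-1/2}\right)^{\alpha}\right)\right] \tag{7.6.140}\\
&\geq \operatorname{Tr}\!\left[\sigma_A\left(\mathcal{P}^\dagger\!\left(\sigma_{AB}^{-1/2}\rho_{AB}\sigma_{AB}^{-1/2}\right)\right)^{\alpha}\right] \tag{7.6.141}\\
&= \operatorname{Tr}\!\left[\sigma_A\left(\sigma_A^{-1/2}\rho_A\sigma_A^{-1/2}\right)^{\alpha}\right] \tag{7.6.142}\\
&= \widehat{Q}_\alpha(\rho_A\|\sigma_A). \tag{7.6.143}
\end{align}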

The second equality follows from (7.6.135). The sole inequality is a consequence of
the operator Jensen inequality and the fact that 𝑥 𝛼 is operator convex for 𝛼 ∈ (1, 2].
Indeed, for M a completely positive unital map, item 2. of Theorem 2.16 implies
that
𝑓 (M(𝑋)) ≤ M( 𝑓 (𝑋)) (7.6.144)
for Hermitian 𝑋 and an operator convex function 𝑓 . The second-to-last equality
follows from (7.6.137).
Applying the same reasoning as above, but using the fact that 𝑥 𝛼 is operator
concave for 𝛼 ∈ (0, 1), we find for 𝛼 ∈ (0, 1) that
\[ \widehat{Q}_\alpha(\rho_A\|\sigma_A) \geq \widehat{Q}_\alpha(\rho_{AB}\|\sigma_{AB}). \tag{7.6.145} \]

Putting together the above and employing definitions, we find that the following
inequality holds for 𝛼 ∈ (0, 1) ∪ (1, 2]:
\[ \widehat{D}_\alpha(\rho_{AB}\|\sigma_{AB}) \geq \widehat{D}_\alpha(\rho_A\|\sigma_A), \tag{7.6.146} \]

concluding the proof. ■
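The partial-trace form (7.6.131) of the data-processing inequality, together with the Petz-adjoint identity (7.6.137) used in the proof, can be spot-checked on a random bipartite instance. This is a minimal sketch assuming full-rank $\rho_{AB}$ and $\sigma_{AB}$; the helper names, the dimensions, and the grid of 𝛼 values are illustrative assumptions.

```python
import numpy as np

def mpow(a, t):
    """Real power t of a Hermitian positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)
    return (v * w**t) @ v.conj().T

def geometric_d(rho, sigma, alpha):
    """Geometric Renyi relative entropy for invertible sigma (base-2 logarithm)."""
    s = mpow(sigma, -0.5)
    q = np.trace(sigma @ mpow(s @ rho @ s, alpha)).real
    return np.log2(q) / (alpha - 1)

def ptrace_B(x, dA, dB):
    """Partial trace over the second tensor factor of an operator on A (x) B."""
    return np.trace(x.reshape(dA, dB, dA, dB), axis1=1, axis2=3)

rng = np.random.default_rng(2)
dA, dB = 2, 3

def rand_pd(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return g @ g.conj().T + 0.1 * np.eye(d)

sigma_AB = rand_pd(dA * dB)
rho_AB = rand_pd(dA * dB)
rho_AB = rho_AB / np.trace(rho_AB).real
rho_A, sigma_A = ptrace_B(rho_AB, dA, dB), ptrace_B(sigma_AB, dA, dB)

# Adjoint of the Petz recovery channel for partial trace, as in (7.6.136).
sAB_half, sAB_inv_half = mpow(sigma_AB, 0.5), mpow(sigma_AB, -0.5)
sA_inv_half = mpow(sigma_A, -0.5)

def petz_adjoint(Y):
    return sA_inv_half @ ptrace_B(sAB_half @ Y @ sAB_half, dA, dB) @ sA_inv_half

lhs_137 = petz_adjoint(sAB_inv_half @ rho_AB @ sAB_inv_half)
rhs_137 = sA_inv_half @ rho_A @ sA_inv_half

# Gap D(rho_AB || sigma_AB) - D(rho_A || sigma_A); Theorem 7.45 says it is >= 0.
gaps = {a: geometric_d(rho_AB, sigma_AB, a) - geometric_d(rho_A, sigma_A, a)
        for a in (0.3, 0.8, 1.2, 1.7, 2.0)}
```

Every entry of `gaps` should be non-negative up to round-off, and `lhs_137` should match `rhs_137`.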

With the data-processing inequality for the geometric Rényi relative entropy in
hand, we can establish some additional properties.

Proposition 7.46 Additional Properties of Geometric Rényi Relative Entropy
The geometric Rényi relative entropy $\widehat{D}_\alpha$ satisfies the following properties for
every state 𝜌 and positive semi-definite operator 𝜎 for 𝛼 ∈ (0, 1) ∪ (1, 2]:
1. If Tr[𝜎] ≤ Tr[𝜌] = 1, then $\widehat{D}_\alpha(\rho\|\sigma) \geq 0$.

2. Faithfulness: Suppose that Tr[𝜎] ≤ Tr[𝜌] = 1 and let 𝛼 ∈ (0, 1) ∪ (1, ∞).
Then $\widehat{D}_\alpha(\rho\|\sigma) = 0$ if and only if 𝜌 = 𝜎.

3. If 𝜌 ≤ 𝜎, then $\widehat{D}_\alpha(\rho\|\sigma) \leq 0$.

4. For every positive semi-definite operator 𝜎′ such that 𝜎′ ≥ 𝜎, the following
inequality holds: $\widehat{D}_\alpha(\rho\|\sigma) \geq \widehat{D}_\alpha(\rho\|\sigma')$.

Proof:
1. Apply the data-processing inequality with the channel being the full trace-out
channel:
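\begin{align}
\widehat{D}_\alpha(\rho\|\sigma) &\geq \widehat{D}_\alpha(\operatorname{Tr}[\rho]\|\operatorname{Tr}[\sigma]) \tag{7.6.147}\\
&= \frac{1}{\alpha-1}\log_2\!\left[(\operatorname{Tr}[\rho])^{\alpha}(\operatorname{Tr}[\sigma])^{1-\alpha}\right] \tag{7.6.148}\\
&= -\log_2\operatorname{Tr}[\sigma] \tag{7.6.149}\\
&\geq 0. \tag{7.6.150}
\end{align}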

2. If 𝜌 = 𝜎, then it follows by direct evaluation that $\widehat{D}_\alpha(\rho\|\sigma) = 0$ for 𝛼 ∈
(0, 1) ∪ (1, ∞).
To see the other implication, suppose first that 𝛼 ∈ (0, 1) ∪ (1, 2]. Then $\widehat{D}_\alpha(\rho\|\sigma) = 0$
implies that equality is achieved in the two inequalities in item 1. above.
So then Tr[𝜎] = 1. Furthermore, we conclude from data processing that
$\widehat{D}_\alpha(\mathcal{M}(\rho)\|\mathcal{M}(\sigma)) = 0$ for all measurement channels M. This includes the
measurement that achieves the trace distance. By applying the faithfulness of
the classical Rényi relative entropy on the distributions that result from the
optimal trace-distance measurement, we conclude that 𝜌 = 𝜎. To get the range
outside the data-processing interval of (0, 1) ∪ (1, 2], note that $\widehat{D}_\alpha(\rho\|\sigma) = 0$
for 𝛼 > 2 implies, by monotonicity (Property 2 of Proposition 7.44) and the
non-negativity from item 1. above, that $\widehat{D}_\alpha(\rho\|\sigma) = 0$ for all 𝛼 ∈ (0, 1) ∪ (1, 2].
Then it follows that 𝜌 = 𝜎.

3. Consider that 𝜌 ≤ 𝜎 implies that 𝜎 − 𝜌 ≥ 0. Then define the following positive
semi-definite operators:
\begin{align}
\hat{\rho} &:= |0\rangle\langle0|\otimes\rho, \tag{7.6.151}\\
\hat{\sigma} &:= |0\rangle\langle0|\otimes\rho + |1\rangle\langle1|\otimes(\sigma-\rho). \tag{7.6.152}
\end{align}
By exploiting the direct-sum property of geometric Rényi relative entropy
(Proposition 7.44) and the data-processing inequality (Theorem 7.45), we find
that
\[ 0 = \widehat{D}_\alpha(\rho\|\rho) = \widehat{D}_\alpha(\hat{\rho}\|\hat{\sigma}) \geq \widehat{D}_\alpha(\rho\|\sigma), \tag{7.6.153} \]
where the inequality follows from data processing with respect to partial trace
over the classical register.
4. Similar to the above proof, the condition 𝜎′ ≥ 𝜎 implies that 𝜎′ − 𝜎 ≥ 0.
Then define the following positive semi-definite operators:

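\begin{align}
\hat{\rho} &:= |0\rangle\langle0|\otimes\rho, \tag{7.6.154}\\
\hat{\sigma} &:= |0\rangle\langle0|\otimes\sigma + |1\rangle\langle1|\otimes(\sigma'-\sigma). \tag{7.6.155}
\end{align}
By exploiting the direct-sum property of geometric Rényi relative entropy
(Proposition 7.44) and the data-processing inequality (Theorem 7.45), we find
that
\[ \widehat{D}_\alpha(\rho\|\sigma) = \widehat{D}_\alpha(\hat{\rho}\|\hat{\sigma}) \geq \widehat{D}_\alpha(\rho\|\sigma'), \tag{7.6.156} \]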
where the inequality follows from data processing with respect to partial trace
over the classical register. ■
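Items 1, 3, and 4 of Proposition 7.46 are inequalities that can be probed on random inputs. The sketch below builds one instance for each item; the constructions (`0.8 * rand_state(d)` for a sub-normalized 𝜎, adding a random positive definite operator to enforce 𝜌 ≤ 𝜎 or 𝜎′ ≥ 𝜎) and all helper names are illustrative assumptions.

```python
import numpy as np

def mpow(a, t):
    """Real power t of a Hermitian positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)
    return (v * w**t) @ v.conj().T

def geometric_d(rho, sigma, alpha):
    """Geometric Renyi relative entropy for invertible sigma (base-2 logarithm)."""
    s = mpow(sigma, -0.5)
    q = np.trace(sigma @ mpow(s @ rho @ s, alpha)).real
    return np.log2(q) / (alpha - 1)

rng = np.random.default_rng(3)
d = 3

def rand_pd(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return g @ g.conj().T + 0.1 * np.eye(d)

def rand_state(d):
    a = rand_pd(d)
    return a / np.trace(a).real

rho = rand_state(d)
alphas = (0.5, 1.5, 2.0)

# Item 1: Tr[sigma] <= 1 implies D >= 0.
sigma_sub = 0.8 * rand_state(d)
item1 = [geometric_d(rho, sigma_sub, a) for a in alphas]

# Item 3: rho <= sigma implies D <= 0.
sigma_dom = rho + rand_pd(d)
item3 = [geometric_d(rho, sigma_dom, a) for a in alphas]

# Item 4: sigma' >= sigma implies D(rho||sigma) >= D(rho||sigma').
sigma = rand_pd(d)
sigma_prime = sigma + rand_pd(d)
item4 = [geometric_d(rho, sigma, a) - geometric_d(rho, sigma_prime, a) for a in alphas]
```

The lists `item1` and `item4` should contain only non-negative numbers and `item3` only non-positive ones, up to round-off.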


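The data-processing inequality for the geometric Rényi relative entropy can be
written using the geometric Rényi relative quasi-entropy $\widehat{Q}_\alpha(\rho\|\sigma)$ as
\[ \frac{1}{\alpha-1}\log_2\widehat{Q}_\alpha(\rho\|\sigma) \geq \frac{1}{\alpha-1}\log_2\widehat{Q}_\alpha(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)). \tag{7.6.157} \]
Since 𝛼 − 1 is positive for 𝛼 ∈ (1, 2] and negative for 𝛼 ∈ (0, 1), we can use the
monotonicity of the function log2 to obtain
\begin{align}
\widehat{Q}_\alpha(\rho\|\sigma) &\geq \widehat{Q}_\alpha(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)), \quad\text{for } \alpha\in(1,2], \tag{7.6.158}\\
\widehat{Q}_\alpha(\rho\|\sigma) &\leq \widehat{Q}_\alpha(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)), \quad\text{for } \alpha\in(0,1). \tag{7.6.159}
\end{align}
We can use this to establish some convexity/concavity statements for the geometric
Rényi relative entropy.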

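Proposition 7.47 Joint Convexity & Concavity of the Geometric Rényi Relative Quasi-Entropy
Let 𝑝 : X → [0, 1] be a probability distribution over a finite alphabet X with
associated |X|-dimensional system 𝑋, let $\{\rho_A^x\}_{x\in\mathcal{X}}$ be a set of states on system
𝐴, and let $\{\sigma_A^x\}_{x\in\mathcal{X}}$ be a set of positive semi-definite operators on 𝐴. Then, for
𝛼 ∈ (1, 2],
\[ \widehat{Q}_\alpha\!\left(\sum_{x\in\mathcal{X}}p(x)\rho_A^x\,\middle\|\,\sum_{x\in\mathcal{X}}p(x)\sigma_A^x\right) \leq \sum_{x\in\mathcal{X}}p(x)\,\widehat{Q}_\alpha(\rho_A^x\|\sigma_A^x), \tag{7.6.160} \]
and for 𝛼 ∈ (0, 1),
\[ \widehat{Q}_\alpha\!\left(\sum_{x\in\mathcal{X}}p(x)\rho_A^x\,\middle\|\,\sum_{x\in\mathcal{X}}p(x)\sigma_A^x\right) \geq \sum_{x\in\mathcal{X}}p(x)\,\widehat{Q}_\alpha(\rho_A^x\|\sigma_A^x). \tag{7.6.161} \]
Consequently, the geometric Rényi relative entropy $\widehat{D}_\alpha$ is jointly convex for
𝛼 ∈ (0, 1):
\[ \widehat{D}_\alpha\!\left(\sum_{x\in\mathcal{X}}p(x)\rho_A^x\,\middle\|\,\sum_{x\in\mathcal{X}}p(x)\sigma_A^x\right) \leq \sum_{x\in\mathcal{X}}p(x)\,\widehat{D}_\alpha(\rho_A^x\|\sigma_A^x). \tag{7.6.162} \]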

Proof: The first two inequalities follow directly from the direct-sum property of the
geometric Rényi relative entropy (Proposition 7.44), the data-processing inequality
(Theorem 7.45), and Proposition 7.17. The last inequality follows from (7.6.161) by
applying the logarithm, which is monotone and concave, and scaling by 1/(𝛼 − 1),
which is negative for 𝛼 ∈ (0, 1). ■

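Although the geometric Rényi relative entropy is not jointly convex for 𝛼 ∈
(1, 2], it is jointly quasi-convex, in the sense that
\[ \widehat{D}_\alpha\!\left(\sum_{x\in\mathcal{X}}p(x)\rho_A^x\,\middle\|\,\sum_{x\in\mathcal{X}}p(x)\sigma_A^x\right) \leq \max_{x\in\mathcal{X}}\widehat{D}_\alpha(\rho_A^x\|\sigma_A^x), \tag{7.6.163} \]
for every finite alphabet X, probability distribution 𝑝 : X → [0, 1], set $\{\rho_A^x\}_{x\in\mathcal{X}}$ of
states, and set $\{\sigma_A^x\}_{x\in\mathcal{X}}$ of positive semi-definite operators. Indeed, from (7.6.160),
we immediately obtain
\[ \widehat{Q}_\alpha\!\left(\sum_{x\in\mathcal{X}}p(x)\rho_A^x\,\middle\|\,\sum_{x\in\mathcal{X}}p(x)\sigma_A^x\right) \leq \max_{x\in\mathcal{X}}\widehat{Q}_\alpha(\rho_A^x\|\sigma_A^x). \tag{7.6.164} \]
Taking the logarithm and multiplying by $\frac{1}{\alpha-1}$ on both sides of this inequality leads
to (7.6.163).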
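The convexity statements (7.6.160) to (7.6.163) can be illustrated on a small random ensemble. The following sketch assumes full-rank operators; the ensemble size, the weights `p`, the choices 𝛼 = 1.5 and 𝛼 = 0.5, and the helper names are illustrative assumptions.

```python
import numpy as np

def mpow(a, t):
    """Real power t of a Hermitian positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)
    return (v * w**t) @ v.conj().T

def geometric_q(rho, sigma, alpha):
    s = mpow(sigma, -0.5)
    return np.trace(sigma @ mpow(s @ rho @ s, alpha)).real

def geometric_d(rho, sigma, alpha):
    return np.log2(geometric_q(rho, sigma, alpha)) / (alpha - 1)

rng = np.random.default_rng(4)

def rand_pd(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return g @ g.conj().T + 0.1 * np.eye(d)

def rand_state(d):
    a = rand_pd(d)
    return a / np.trace(a).real

p = np.array([0.2, 0.5, 0.3])
rhos = [rand_state(2) for _ in p]
sigmas = [rand_pd(2) for _ in p]
rho_bar = sum(px * r for px, r in zip(p, rhos))
sigma_bar = sum(px * s for px, s in zip(p, sigmas))

def q_gap(alpha):
    """Average of the quasi-entropies minus the quasi-entropy of the averages."""
    avg = sum(px * geometric_q(r, s, alpha) for px, r, s in zip(p, rhos, sigmas))
    return avg - geometric_q(rho_bar, sigma_bar, alpha)

convex_gap = q_gap(1.5)    # (7.6.160): expected >= 0
concave_gap = q_gap(0.5)   # (7.6.161): expected <= 0

# Joint quasi-convexity (7.6.163) at alpha = 1.5.
quasi_gap = max(geometric_d(r, s, 1.5) for r, s in zip(rhos, sigmas)) \
    - geometric_d(rho_bar, sigma_bar, 1.5)

# Joint convexity (7.6.162) at alpha = 0.5.
d_convex_gap = sum(px * geometric_d(r, s, 0.5) for px, r, s in zip(p, rhos, sigmas)) \
    - geometric_d(rho_bar, sigma_bar, 0.5)
```

The signs of the four gaps should match the comments, up to round-off.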
The geometric Rényi relative entropy has another interpretation, which is
worthwhile to mention.

Proposition 7.48 Geometric Rényi Relative Entropy from Classical Preparations
Let 𝜌 be a state and 𝜎 a positive semi-definite operator satisfying supp(𝜌) ⊆
supp(𝜎). For all 𝛼 ∈ (0, 1) ∪ (1, 2], the geometric Rényi relative entropy is
equal to the smallest value that the classical Rényi relative entropy can take by
minimizing over classical–quantum channels that realize the state 𝜌 and the
positive semi-definite operator 𝜎. That is, the following equality holds:
\[ \widehat{D}_\alpha(\rho\|\sigma) = \inf_{\{p,q,\mathcal{P}\}}\left\{D_\alpha(p\|q) : \mathcal{P}(\omega(p)) = \rho,\ \mathcal{P}(\omega(q)) = \sigma\right\}, \tag{7.6.165} \]
where the classical Rényi relative entropy is defined in (7.4.19), the channel
P is a classical–quantum channel, 𝑝 : X → [0, 1] is a probability distribution
over a finite alphabet X, 𝑞 : X → [0, ∞) is a positive function on X,
$\omega(p) := \sum_{x\in\mathcal{X}}p(x)|x\rangle\langle x|$, and $\omega(q) := \sum_{x\in\mathcal{X}}q(x)|x\rangle\langle x|$.


Proof: First, suppose that there exists a quantum channel P such that

P(𝜔( 𝑝)) = 𝜌, P(𝜔(𝑞)) = 𝜎. (7.6.166)

Then consider the following chain of inequalities:

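\begin{align}
D_\alpha(p\|q) &= \widehat{D}_\alpha(\omega(p)\|\omega(q)) \tag{7.6.167}\\
&\geq \widehat{D}_\alpha(\mathcal{P}(\omega(p))\|\mathcal{P}(\omega(q))) \tag{7.6.168}\\
&= \widehat{D}_\alpha(\rho\|\sigma). \tag{7.6.169}
\end{align}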

The first equality follows because the geometric Rényi relative entropy reduces to
the classical Rényi relative entropy for commuting operators. The inequality is
a consequence of the data-processing inequality for the geometric Rényi relative
entropy (Theorem 7.45). The final equality follows from the constraint in (7.6.166).
Since the inequality holds for arbitrary 𝑝, 𝑞, and P satisfying (7.6.166), we conclude
that
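\[ \inf_{\{p,q,\mathcal{P}\}}\left\{D_\alpha(p\|q) : \mathcal{P}(\omega(p)) = \rho,\ \mathcal{P}(\omega(q)) = \sigma\right\} \geq \widehat{D}_\alpha(\rho\|\sigma). \tag{7.6.170} \]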

The equality in (7.6.165) then follows by demonstrating a specific distribution

𝑝, positive function 𝑞, and preparation channel P that saturate the inequality in
(7.6.170). The optimal choices of 𝑝, 𝑞, and P are given by

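\begin{align}
p(x) &:= \lambda_x q(x), \tag{7.6.171}\\
q(x) &:= \operatorname{Tr}[\Pi_x\sigma], \tag{7.6.172}\\
\mathcal{P}(\cdot) &:= \sum_x\langle x|(\cdot)|x\rangle\,\frac{\sigma^{1/2}\Pi_x\sigma^{1/2}}{q(x)}, \tag{7.6.173}
\end{align}
where the spectral decomposition of the positive semi-definite operator $\sigma^{-1/2}\rho\,\sigma^{-1/2}$
is given by
\[ \sigma^{-1/2}\rho\,\sigma^{-1/2} = \sum_x\lambda_x\Pi_x. \tag{7.6.174} \]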

The choice of 𝑝(𝑥) above is a probability distribution because

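\[ \sum_x p(x) = \sum_x\lambda_x q(x) = \sum_x\lambda_x\operatorname{Tr}[\Pi_x\sigma] = \operatorname{Tr}[\sigma^{-1/2}\rho\,\sigma^{-1/2}\sigma] = \operatorname{Tr}[\Pi_\sigma\rho] = 1. \tag{7.6.175} \]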


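The preparation channel P is a classical–quantum channel that measures the input
in the basis $\{|x\rangle\}_x$ and prepares the state $\frac{\sigma^{1/2}\Pi_x\sigma^{1/2}}{q(x)}$ if the measurement outcome is 𝑥.
We find that
\begin{align}
\mathcal{P}(\omega(p)) &= \sum_x\frac{p(x)}{q(x)}\sigma^{1/2}\Pi_x\sigma^{1/2} = \sum_x\frac{\lambda_x q(x)}{q(x)}\sigma^{1/2}\Pi_x\sigma^{1/2} = \sigma^{1/2}\left(\sum_x\lambda_x\Pi_x\right)\sigma^{1/2} \notag\\
&= \sigma^{1/2}\sigma^{-1/2}\rho\,\sigma^{-1/2}\sigma^{1/2} = \Pi_\sigma\rho\,\Pi_\sigma = \rho, \tag{7.6.176}
\end{align}
and
\[ \mathcal{P}(\omega(q)) = \sum_x\frac{q(x)}{q(x)}\sigma^{1/2}\Pi_x\sigma^{1/2} = \sigma^{1/2}\left(\sum_x\Pi_x\right)\sigma^{1/2} = \sigma. \tag{7.6.177} \]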

Finally, consider the classical Rényi relative quasi-entropy:

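\begin{align}
\sum_x p(x)^{\alpha}q(x)^{1-\alpha} &= \sum_x(\lambda_x q(x))^{\alpha}q(x)^{1-\alpha} = \sum_x\lambda_x^{\alpha}q(x) = \sum_x\lambda_x^{\alpha}\operatorname{Tr}[\Pi_x\sigma] \notag\\
&= \operatorname{Tr}\!\left[\sigma\left(\sum_x\lambda_x^{\alpha}\Pi_x\right)\right] = \operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right] = \widehat{Q}_\alpha(\rho\|\sigma), \tag{7.6.178}
\end{align}
where the second-to-last equality follows from the spectral decomposition in
(7.6.174) and the form of the geometric Rényi relative quasi-entropy from Proposition 7.40.
As a consequence of the equality
\[ \sum_x p(x)^{\alpha}q(x)^{1-\alpha} = \widehat{Q}_\alpha(\rho\|\sigma), \tag{7.6.179} \]
and the fact that these choices of 𝑝, 𝑞, and P satisfy the constraints $\mathcal{P}(\omega(p)) = \rho$ and
$\mathcal{P}(\omega(q)) = \sigma$, we conclude that
\[ D_\alpha(p\|q) = \widehat{D}_\alpha(\rho\|\sigma). \tag{7.6.180} \]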

Combining this equality with (7.6.170), we conclude the equality in (7.6.165). ■
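The optimal preparation in (7.6.171) to (7.6.173) is explicit enough to build numerically. The sketch below constructs 𝑝, 𝑞, and the prepared operators from the eigendecomposition of $\sigma^{-1/2}\rho\,\sigma^{-1/2}$ and compares the classical and geometric quantities. It assumes full-rank 𝜌 and 𝜎 and uses rank-one eigenprojectors returned by `numpy.linalg.eigh`; the helper names and the value of 𝛼 are illustrative.

```python
import numpy as np

def mpow(a, t):
    """Real power t of a Hermitian positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)
    return (v * w**t) @ v.conj().T

rng = np.random.default_rng(5)
d, alpha = 3, 1.4

def rand_pd(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return g @ g.conj().T + 0.1 * np.eye(d)

rho = rand_pd(d)
rho = rho / np.trace(rho).real
sigma = rand_pd(d)

s_half, s_inv_half = mpow(sigma, 0.5), mpow(sigma, -0.5)
core = s_inv_half @ rho @ s_inv_half

# Spectral decomposition (7.6.174) with rank-one projectors Pi_x.
lam, vecs = np.linalg.eigh(core)
projs = [np.outer(vecs[:, x], vecs[:, x].conj()) for x in range(d)]

q = np.array([np.trace(P @ sigma).real for P in projs])   # (7.6.172)
p = lam * q                                               # (7.6.171)

# Operators prepared by the channel (7.6.173) for each classical outcome x.
prepared = [s_half @ projs[x] @ s_half / q[x] for x in range(d)]
rho_out = sum(p[x] * prepared[x] for x in range(d))       # P(omega(p))
sigma_out = sum(q[x] * prepared[x] for x in range(d))     # P(omega(q))

classical_d = np.log2(np.sum(p**alpha * q**(1 - alpha))) / (alpha - 1)
geometric = np.log2(np.trace(sigma @ mpow(core, alpha)).real) / (alpha - 1)
```

In this instance `p` should sum to one, `rho_out` and `sigma_out` should reproduce 𝜌 and 𝜎, and `classical_d` should equal `geometric`, mirroring (7.6.175) to (7.6.180).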

The following proposition establishes the ordering between the sandwiched,

Petz–, and geometric Rényi relative entropies for the interval 𝛼 ∈ (0, 1) ∪ (1, 2]. It
follows by applying similar reasoning as in the proof of Proposition 7.48.


Proposition 7.49
Let 𝜌 be a state and 𝜎 a positive semi-definite operator. For 𝛼 ∈ (0, 1) ∪ (1, 2],
the following inequalities hold:
\[ \widetilde{D}_\alpha(\rho\|\sigma) \leq D_\alpha(\rho\|\sigma) \leq \widehat{D}_\alpha(\rho\|\sigma), \tag{7.6.181} \]
for the sandwiched ($\widetilde{D}_\alpha$), Petz ($D_\alpha$), and geometric ($\widehat{D}_\alpha$) Rényi relative
entropies.

Proof: The first inequality was stated as the last property of Proposition 7.31.
So we establish the proof of the second inequality here. Suppose that P is a
classical–quantum channel, 𝑝 : X → [0, 1] is a probability distribution over a finite
alphabet X, and 𝑞 : X → (0, ∞) is a positive function on X satisfying

P(𝜔( 𝑝)) = 𝜌, P(𝜔(𝑞)) = 𝜎, (7.6.182)

where
\[ \omega(p) := \sum_{x\in\mathcal{X}}p(x)|x\rangle\langle x|, \qquad \omega(q) := \sum_{x\in\mathcal{X}}q(x)|x\rangle\langle x|. \tag{7.6.183} \]

Then consider the following chain of inequalities:

𝐷 𝛼 ( 𝑝∥𝑞) = 𝐷 𝛼 (𝜔( 𝑝)∥𝜔(𝑞)) (7.6.184)

≥ 𝐷 𝛼 (P(𝜔( 𝑝))∥P(𝜔(𝑞))) (7.6.185)
= 𝐷 𝛼 (𝜌∥𝜎). (7.6.186)

The first equality follows because the Petz–Rényi relative entropy reduces to
the classical Rényi relative entropy for commuting operators. The inequality
follows from the data-processing inequality for the Petz–Rényi relative entropy for
𝛼 ∈ (0, 1) ∪ (1, 2] (Theorem 7.24). The final equality follows from the constraint in
(7.6.182). Since the inequality above holds for all 𝑝, 𝑞, and P satisfying (7.6.182),
we conclude that

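\[ \inf_{\{p,q,\mathcal{P}\}}\left\{D_\alpha(p\|q) : \mathcal{P}(\omega(p)) = \rho,\ \mathcal{P}(\omega(q)) = \sigma\right\} \geq D_\alpha(\rho\|\sigma). \tag{7.6.187} \]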

Now applying Proposition 7.48, we conclude the second inequality in (7.6.181). ■
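The ordering (7.6.181) can be observed numerically. The sketch below uses the standard closed forms of the Petz quasi-entropy, $\operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}]$, and the sandwiched quasi-entropy, $\operatorname{Tr}[(\sigma^{(1-\alpha)/2\alpha}\rho\,\sigma^{(1-\alpha)/2\alpha})^{\alpha}]$, for full-rank inputs; these formulas are assumed here rather than quoted from this section, and the helper names and test values are illustrative.

```python
import numpy as np

def mpow(a, t):
    """Real power t of a Hermitian positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)
    return (v * w**t) @ v.conj().T

def geometric_d(rho, sigma, alpha):
    s = mpow(sigma, -0.5)
    q = np.trace(sigma @ mpow(s @ rho @ s, alpha)).real
    return np.log2(q) / (alpha - 1)

def petz_d(rho, sigma, alpha):
    q = np.trace(mpow(rho, alpha) @ mpow(sigma, 1 - alpha)).real
    return np.log2(q) / (alpha - 1)

def sandwiched_d(rho, sigma, alpha):
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    q = np.trace(mpow(s @ rho @ s, alpha)).real
    return np.log2(q) / (alpha - 1)

rng = np.random.default_rng(6)
d = 3

def rand_pd(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return g @ g.conj().T + 0.1 * np.eye(d)

rho = rand_pd(d)
rho = rho / np.trace(rho).real
sigma = rand_pd(d)

# One (sandwiched, Petz, geometric) triple per value of alpha.
triples = {a: (sandwiched_d(rho, sigma, a), petz_d(rho, sigma, a), geometric_d(rho, sigma, a))
           for a in (0.3, 0.8, 1.3, 2.0)}
```

Each triple in `triples` should be non-decreasing from left to right. For commuting 𝜌 and 𝜎 the three quantities coincide, so the gaps seen here reflect non-commutativity.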


7.6.1 Proof of Proposition 7.41

First consider that

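\[ (1-\delta)\rho'_\delta \leq \rho_\delta \leq \rho'_\delta, \tag{7.6.188} \]
where
\[ \rho'_\delta := \rho + \delta\pi. \tag{7.6.189} \]
By operator monotonicity of $x^{\alpha}$ for 𝛼 ∈ (0, 1), we conclude that
\begin{align}
(1-\delta)^{\alpha}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho'_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] &\leq \operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] \tag{7.6.190}\\
&\leq \operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho'_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right]. \tag{7.6.191}
\end{align}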

The prefactors in these bounds to the left of the trace expressions are uniform and
independent of 𝜀, and so it follows that
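\begin{align}
\lim_{\varepsilon\to0^+}\lim_{\delta\to0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] &= \lim_{\varepsilon\to0^+}\lim_{\delta\to0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho'_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right], \tag{7.6.192}\\
\lim_{\delta\to0^+}\lim_{\varepsilon\to0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] &= \lim_{\delta\to0^+}\lim_{\varepsilon\to0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho'_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right]. \tag{7.6.193}
\end{align}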

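Again from the operator monotonicity of $x^{\alpha}$ for 𝛼 ∈ (0, 1), we conclude for fixed
𝜀 > 0 that
\[ \delta_1 \leq \delta_2 \quad\Rightarrow\quad \operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho'_{\delta_1}\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] \leq \operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho'_{\delta_2}\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right], \tag{7.6.194} \]
where $\delta_1 > 0$. By exploiting the identity
\[ \operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho'_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] = \operatorname{Tr}\!\left[\rho'_\delta\left((\rho'_\delta)^{-1/2}\sigma_\varepsilon(\rho'_\delta)^{-1/2}\right)^{1-\alpha}\right] \tag{7.6.195} \]
from Proposition 7.39 and operator monotonicity of $x^{1-\alpha}$ for 𝛼 ∈ (0, 1), we
conclude for fixed 𝛿 > 0 that
\[ \varepsilon_1 \leq \varepsilon_2 \quad\Rightarrow\quad \operatorname{Tr}\!\left[\sigma_{\varepsilon_1}\left(\sigma_{\varepsilon_1}^{-1/2}\rho'_\delta\sigma_{\varepsilon_1}^{-1/2}\right)^{\alpha}\right] \leq \operatorname{Tr}\!\left[\sigma_{\varepsilon_2}\left(\sigma_{\varepsilon_2}^{-1/2}\rho'_\delta\sigma_{\varepsilon_2}^{-1/2}\right)^{\alpha}\right], \tag{7.6.196} \]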

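where $\varepsilon_1 > 0$. Thus, we find that
\begin{align}
\lim_{\varepsilon\to0^+}\lim_{\delta\to0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho'_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] &= \inf_{\varepsilon>0}\inf_{\delta>0}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho'_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right], \tag{7.6.197}\\
\lim_{\delta\to0^+}\lim_{\varepsilon\to0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho'_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] &= \inf_{\delta>0}\inf_{\varepsilon>0}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho'_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right]. \tag{7.6.198}
\end{align}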

Since infima can be exchanged, we conclude the statement of the proposition.

7.6.2 Proof of Proposition 7.40

First suppose that 𝛼 ∈ (1, ∞) and supp(𝜌) ⊈ supp(𝜎). Then from Propositions 7.29
and 7.42 and the fact that the sandwiched Rényi relative quasi-entropy $\widetilde{Q}_\alpha(\rho\|\sigma) = +\infty$
in this case, it follows that $\widehat{Q}_\alpha(\rho\|\sigma) = +\infty$, thus establishing the third
expression in (7.6.16).
Now suppose that 𝛼 ∈ (0, 1) ∪ (1, ∞) and supp(𝜌) ⊆ supp(𝜎). Let us employ
the decomposition of the Hilbert space H as H = supp(𝜎) ⊕ ker(𝜎). Then we can
write 𝜌 as
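\[ \rho = \begin{pmatrix}\rho_{0,0} & \rho_{0,1}\\ \rho_{0,1}^\dagger & \rho_{1,1}\end{pmatrix}, \qquad \sigma = \begin{pmatrix}\sigma & 0\\ 0 & 0\end{pmatrix}. \tag{7.6.199} \]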
Writing 1 = Π𝜎 + Π𝜎⊥ , where Π𝜎 is the projection onto the support of 𝜎 and Π𝜎⊥ is
the projection onto the orthogonal complement of supp(𝜎), we find that

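\[ \sigma_\varepsilon = \begin{pmatrix}\sigma + \varepsilon\Pi_\sigma & 0\\ 0 & \varepsilon\Pi_\sigma^\perp\end{pmatrix}, \tag{7.6.200} \]
which implies that
\[ \sigma_\varepsilon^{-1/2} = \begin{pmatrix}(\sigma + \varepsilon\Pi_\sigma)^{-1/2} & 0\\ 0 & \varepsilon^{-1/2}\Pi_\sigma^\perp\end{pmatrix}. \tag{7.6.201} \]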

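The condition supp(𝜌) ⊆ supp(𝜎) implies that $\rho_{0,1} = 0$ and $\rho_{1,1} = 0$. Then
\[ \sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2} = \begin{pmatrix}(\sigma + \varepsilon\Pi_\sigma)^{-1/2}\rho_{0,0}(\sigma + \varepsilon\Pi_\sigma)^{-1/2} & 0\\ 0 & 0\end{pmatrix}, \tag{7.6.202} \]
so that
\begin{align}
\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right]
&= \operatorname{Tr}\!\left[\begin{pmatrix}\sigma + \varepsilon\Pi_\sigma & 0\\ 0 & \varepsilon\Pi_\sigma^\perp\end{pmatrix}\begin{pmatrix}\left[(\sigma + \varepsilon\Pi_\sigma)^{-1/2}\rho_{0,0}(\sigma + \varepsilon\Pi_\sigma)^{-1/2}\right]^{\alpha} & 0\\ 0 & 0\end{pmatrix}\right] \tag{7.6.203}\\
&= \operatorname{Tr}\!\left[(\sigma + \varepsilon\Pi_\sigma)\left[(\sigma + \varepsilon\Pi_\sigma)^{-1/2}\rho_{0,0}(\sigma + \varepsilon\Pi_\sigma)^{-1/2}\right]^{\alpha}\right]. \tag{7.6.204}
\end{align}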

Taking the limit 𝜀 → 0+ then leads to

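\begin{align}
\lim_{\varepsilon\to0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] &= \operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\rho_{0,0}\sigma^{-1/2}\right)^{\alpha}\right] \tag{7.6.205}\\
&= \operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right], \tag{7.6.206}
\end{align}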
thus establishing the first expression in (7.6.16).

We now establish (7.6.19). For 𝛼 ∈ (1, ∞) and supp(𝜌) ⊆ supp(𝜎), the same
analysis implies that
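\[ \operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] = \operatorname{Tr}\!\left[\hat{\sigma}_\varepsilon\left(\hat{\sigma}_\varepsilon^{-1/2}\rho_{0,0}\hat{\sigma}_\varepsilon^{-1/2}\right)^{\alpha}\right], \tag{7.6.207} \]
where
\[ \hat{\sigma}_\varepsilon := \sigma + \varepsilon\Pi_\sigma. \tag{7.6.208} \]
Since
\[ \left(\hat{\sigma}_\varepsilon^{-1/2}\rho_{0,0}\hat{\sigma}_\varepsilon^{-1/2}\right)^{\alpha} = \left(\hat{\sigma}_\varepsilon^{-1/2}\rho_{0,0}\hat{\sigma}_\varepsilon^{-1/2}\right)\left(\hat{\sigma}_\varepsilon^{-1/2}\rho_{0,0}\hat{\sigma}_\varepsilon^{-1/2}\right)^{\alpha-1} \tag{7.6.209} \]
for 𝛼 > 1, we have that
\begin{align}
&\operatorname{Tr}\!\left[\hat{\sigma}_\varepsilon\hat{\sigma}_\varepsilon^{-1/2}\rho_{0,0}\hat{\sigma}_\varepsilon^{-1/2}\left(\hat{\sigma}_\varepsilon^{-1/2}\rho_{0,0}\hat{\sigma}_\varepsilon^{-1/2}\right)^{\alpha-1}\right] \notag\\
&\quad= \operatorname{Tr}\!\left[\hat{\sigma}_\varepsilon^{1/2}\rho_{0,0}^{1/2}\rho_{0,0}^{1/2}\hat{\sigma}_\varepsilon^{-1/2}\left(\hat{\sigma}_\varepsilon^{-1/2}\rho_{0,0}^{1/2}\rho_{0,0}^{1/2}\hat{\sigma}_\varepsilon^{-1/2}\right)^{\alpha-1}\right] \tag{7.6.210}\\
&\quad= \operatorname{Tr}\!\left[\hat{\sigma}_\varepsilon^{1/2}\rho_{0,0}^{1/2}\left(\rho_{0,0}^{1/2}\hat{\sigma}_\varepsilon^{-1/2}\hat{\sigma}_\varepsilon^{-1/2}\rho_{0,0}^{1/2}\right)^{\alpha-1}\rho_{0,0}^{1/2}\hat{\sigma}_\varepsilon^{-1/2}\right] \tag{7.6.211}\\
&\quad= \operatorname{Tr}\!\left[\rho_{0,0}^{1/2}\hat{\sigma}_\varepsilon^{-1/2}\hat{\sigma}_\varepsilon^{1/2}\rho_{0,0}^{1/2}\left(\rho_{0,0}^{1/2}\hat{\sigma}_\varepsilon^{-1}\rho_{0,0}^{1/2}\right)^{\alpha-1}\right] \tag{7.6.212}\\
&\quad= \operatorname{Tr}\!\left[\rho_{0,0}\left(\rho_{0,0}^{1/2}\hat{\sigma}_\varepsilon^{-1}\rho_{0,0}^{1/2}\right)^{\alpha-1}\right], \tag{7.6.213}
\end{align}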


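where we applied Lemma 2.5 with $f(x) = x^{\alpha-1}$ and $L = \rho_{0,0}^{1/2}\hat{\sigma}_\varepsilon^{-1/2}$. Now taking the
limit $\varepsilon\to0^+$, we conclude that
\begin{align}
\lim_{\varepsilon\to0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho\,\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] &= \lim_{\varepsilon\to0^+}\operatorname{Tr}\!\left[\rho_{0,0}\left(\rho_{0,0}^{1/2}\hat{\sigma}_\varepsilon^{-1}\rho_{0,0}^{1/2}\right)^{\alpha-1}\right] \tag{7.6.214}\\
&= \operatorname{Tr}\!\left[\rho_{0,0}\left(\rho_{0,0}^{1/2}\sigma^{-1}\rho_{0,0}^{1/2}\right)^{\alpha-1}\right] \tag{7.6.215}\\
&= \operatorname{Tr}\!\left[\rho\left(\rho^{1/2}\sigma^{-1}\rho^{1/2}\right)^{\alpha-1}\right], \tag{7.6.216}
\end{align}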
for the case 𝛼 ∈ (1, ∞) and supp(𝜌) ⊆ supp(𝜎), thus establishing (7.6.19).
For the case that 𝛼 ∈ (0, 1) and supp(𝜎) ⊆ supp(𝜌), we can employ the limit
exchange from Lemma 7.41 and a similar argument as in (7.6.199)–(7.6.206), but
with respect to the decomposition H = supp(𝜌) ⊕ ker(𝜌), to conclude that

\[ \widehat{Q}_\alpha(\rho\|\sigma) = \operatorname{Tr}\!\left[\rho\left(\rho^{-1/2}\sigma\rho^{-1/2}\right)^{1-\alpha}\right], \tag{7.6.217} \]

thus establishing the second expression in (7.6.16). This case amounts to the
exchange 𝜌 ↔ 𝜎 and 𝛼 ↔ 1 − 𝛼.
We finally consider the case 𝛼 ∈ (0, 1) and supp(𝜌) ⊈ supp(𝜎), which is the
most involved case. Consider that

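\[ \sigma_\varepsilon := \sigma + \varepsilon\mathbb{1} = \begin{pmatrix}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon\Pi_\sigma^\perp\end{pmatrix}, \tag{7.6.218} \]
where $\hat{\sigma}_\varepsilon := \sigma + \varepsilon\Pi_\sigma$. Let us define
\[ \rho_\delta := (1-\delta)\rho + \delta\pi, \tag{7.6.219} \]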

with 𝛿 ∈ (0, 1) and 𝜋 the maximally mixed state. By invoking Lemma 7.41, we
conclude that the following exchange of limits is possible for 𝛼 ∈ (0, 1):

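\[ \lim_{\varepsilon\to0^+}\widehat{D}_\alpha(\rho\|\sigma_\varepsilon) = \lim_{\varepsilon\to0^+}\lim_{\delta\to0^+}\widehat{D}_\alpha(\rho_\delta\|\sigma_\varepsilon) = \lim_{\delta\to0^+}\lim_{\varepsilon\to0^+}\widehat{D}_\alpha(\rho_\delta\|\sigma_\varepsilon). \tag{7.6.220} \]
Now define
\[ \rho^\delta_{0,0} := \Pi_\sigma\rho_\delta\Pi_\sigma, \qquad \rho^\delta_{0,1} := \Pi_\sigma\rho_\delta\Pi_\sigma^\perp, \qquad \rho^\delta_{1,1} := \Pi_\sigma^\perp\rho_\delta\Pi_\sigma^\perp, \tag{7.6.221} \]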

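so that
\[ \rho_\delta = \begin{pmatrix}\rho^\delta_{0,0} & \rho^\delta_{0,1}\\ (\rho^\delta_{0,1})^\dagger & \rho^\delta_{1,1}\end{pmatrix}. \tag{7.6.222} \]
Then
\[ \widehat{D}_\alpha(\rho_\delta\|\sigma_\varepsilon) = \frac{1}{\alpha-1}\log_2\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right]. \tag{7.6.223} \]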
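Consider that
\begin{align}
\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}
&= \begin{pmatrix}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon\Pi_\sigma^\perp\end{pmatrix}^{-1/2}\begin{pmatrix}\rho^\delta_{0,0} & \rho^\delta_{0,1}\\ (\rho^\delta_{0,1})^\dagger & \rho^\delta_{1,1}\end{pmatrix}\begin{pmatrix}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon\Pi_\sigma^\perp\end{pmatrix}^{-1/2} \tag{7.6.224}\\
&= \begin{pmatrix}\hat{\sigma}_\varepsilon^{-1/2} & 0\\ 0 & \varepsilon^{-1/2}\Pi_\sigma^\perp\end{pmatrix}\begin{pmatrix}\rho^\delta_{0,0} & \rho^\delta_{0,1}\\ (\rho^\delta_{0,1})^\dagger & \rho^\delta_{1,1}\end{pmatrix}\begin{pmatrix}\hat{\sigma}_\varepsilon^{-1/2} & 0\\ 0 & \varepsilon^{-1/2}\Pi_\sigma^\perp\end{pmatrix} \tag{7.6.225}\\
&= \begin{pmatrix}\hat{\sigma}_\varepsilon^{-1/2}\rho^\delta_{0,0}\hat{\sigma}_\varepsilon^{-1/2} & \varepsilon^{-1/2}\hat{\sigma}_\varepsilon^{-1/2}\rho^\delta_{0,1}\Pi_\sigma^\perp\\ \varepsilon^{-1/2}\Pi_\sigma^\perp(\rho^\delta_{0,1})^\dagger\hat{\sigma}_\varepsilon^{-1/2} & \varepsilon^{-1}\Pi_\sigma^\perp\rho^\delta_{1,1}\Pi_\sigma^\perp\end{pmatrix} \tag{7.6.226}\\
&= \begin{pmatrix}\hat{\sigma}_\varepsilon^{-1/2}\rho^\delta_{0,0}\hat{\sigma}_\varepsilon^{-1/2} & \varepsilon^{-1/2}\hat{\sigma}_\varepsilon^{-1/2}\rho^\delta_{0,1}\\ \varepsilon^{-1/2}(\rho^\delta_{0,1})^\dagger\hat{\sigma}_\varepsilon^{-1/2} & \varepsilon^{-1}\rho^\delta_{1,1}\end{pmatrix}. \tag{7.6.227}
\end{align}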

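So then
\begin{align}
&\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] \notag\\
&= \operatorname{Tr}\!\left[\begin{pmatrix}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon\Pi_\sigma^\perp\end{pmatrix}\begin{pmatrix}\hat{\sigma}_\varepsilon^{-1/2}\rho^\delta_{0,0}\hat{\sigma}_\varepsilon^{-1/2} & \varepsilon^{-1/2}\hat{\sigma}_\varepsilon^{-1/2}\rho^\delta_{0,1}\\ \varepsilon^{-1/2}(\rho^\delta_{0,1})^\dagger\hat{\sigma}_\varepsilon^{-1/2} & \varepsilon^{-1}\rho^\delta_{1,1}\end{pmatrix}^{\alpha}\right] \tag{7.6.228}\\
&= \operatorname{Tr}\!\left[\begin{pmatrix}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon\Pi_\sigma^\perp\end{pmatrix}\left(\varepsilon^{-1}\begin{pmatrix}\varepsilon\hat{\sigma}_\varepsilon^{-1/2}\rho^\delta_{0,0}\hat{\sigma}_\varepsilon^{-1/2} & \varepsilon^{1/2}\hat{\sigma}_\varepsilon^{-1/2}\rho^\delta_{0,1}\\ \varepsilon^{1/2}(\rho^\delta_{0,1})^\dagger\hat{\sigma}_\varepsilon^{-1/2} & \rho^\delta_{1,1}\end{pmatrix}\right)^{\alpha}\right] \tag{7.6.229}\\
&= \operatorname{Tr}\!\left[\begin{pmatrix}\varepsilon^{-\alpha}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon^{1-\alpha}\Pi_\sigma^\perp\end{pmatrix}\begin{pmatrix}\varepsilon\hat{\sigma}_\varepsilon^{-1/2}\rho^\delta_{0,0}\hat{\sigma}_\varepsilon^{-1/2} & \varepsilon^{1/2}\hat{\sigma}_\varepsilon^{-1/2}\rho^\delta_{0,1}\\ \varepsilon^{1/2}(\rho^\delta_{0,1})^\dagger\hat{\sigma}_\varepsilon^{-1/2} & \rho^\delta_{1,1}\end{pmatrix}^{\alpha}\right]. \tag{7.6.230}
\end{align}
Let us define
\[ K(\varepsilon) := \begin{pmatrix}\varepsilon\hat{\sigma}_\varepsilon^{-1/2}\rho^\delta_{0,0}\hat{\sigma}_\varepsilon^{-1/2} & \varepsilon^{1/2}\hat{\sigma}_\varepsilon^{-1/2}\rho^\delta_{0,1}\\ \varepsilon^{1/2}(\rho^\delta_{0,1})^\dagger\hat{\sigma}_\varepsilon^{-1/2} & \rho^\delta_{1,1}\end{pmatrix}, \tag{7.6.231} \]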

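so that we can write
\[ \operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] = \operatorname{Tr}\!\left[\begin{pmatrix}\varepsilon^{-\alpha}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon^{1-\alpha}\Pi_\sigma^\perp\end{pmatrix}(K(\varepsilon))^{\alpha}\right]. \tag{7.6.232} \]
Now let us invoke Lemma 7.50 with the substitutions
\begin{align}
A &\leftrightarrow \rho^\delta_{1,1}, \tag{7.6.233}\\
B &\leftrightarrow (\rho^\delta_{0,1})^\dagger\hat{\sigma}_\varepsilon^{-1/2}, \tag{7.6.234}\\
C &\leftrightarrow \hat{\sigma}_\varepsilon^{-1/2}\rho^\delta_{0,0}\hat{\sigma}_\varepsilon^{-1/2}, \tag{7.6.235}\\
\varepsilon &\leftrightarrow \varepsilon^{1/2}. \tag{7.6.236}
\end{align}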
Defining

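\begin{align}
L(\varepsilon) &:= \begin{pmatrix}\varepsilon S(\rho_\delta,\hat{\sigma}_\varepsilon) & 0\\ 0 & \rho^\delta_{1,1} + \varepsilon R\end{pmatrix}, \tag{7.6.237}\\
S(\rho_\delta,\hat{\sigma}_\varepsilon) &:= \hat{\sigma}_\varepsilon^{-1/2}\left(\rho^\delta_{0,0} - \rho^\delta_{0,1}(\rho^\delta_{1,1})^{-1}(\rho^\delta_{0,1})^\dagger\right)\hat{\sigma}_\varepsilon^{-1/2}, \tag{7.6.238}\\
R &:= \operatorname{Re}\!\left[(\rho^\delta_{1,1})^{-1}(\rho^\delta_{0,1})^\dagger(\hat{\sigma}_\varepsilon)^{-1}(\rho^\delta_{0,1})\right], \tag{7.6.239}
\end{align}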

we conclude from Lemma 7.50 that

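\[ \left\|K(\varepsilon) - e^{-i\sqrt{\varepsilon}G}L(\varepsilon)e^{i\sqrt{\varepsilon}G}\right\|_\infty \leq o(\varepsilon), \tag{7.6.240} \]
where 𝐺 in Lemma 7.50 is defined from 𝐴 and 𝐵 above. The inequality in (7.6.240)
in turn implies the following operator inequalities:
\[ e^{-i\sqrt{\varepsilon}G}L(\varepsilon)e^{i\sqrt{\varepsilon}G} - o(\varepsilon)\mathbb{1} \leq K(\varepsilon) \leq e^{-i\sqrt{\varepsilon}G}L(\varepsilon)e^{i\sqrt{\varepsilon}G} + o(\varepsilon)\mathbb{1}. \tag{7.6.241} \]
Observe that
\[ e^{-i\sqrt{\varepsilon}G}L(\varepsilon)e^{i\sqrt{\varepsilon}G} + o(\varepsilon)\mathbb{1} = e^{-i\sqrt{\varepsilon}G}\left[L(\varepsilon) + o(\varepsilon)\mathbb{1}\right]e^{i\sqrt{\varepsilon}G}. \tag{7.6.242} \]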

Now invoking these and the operator monotonicity of the function 𝑥 𝛼 for 𝛼 ∈ (0, 1),
we find that
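\begin{align}
&\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] \tag{7.6.243}\\
&= \operatorname{Tr}\!\left[\begin{pmatrix}\varepsilon^{-\alpha}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon^{1-\alpha}\Pi_\sigma^\perp\end{pmatrix}(K(\varepsilon))^{\alpha}\right] \tag{7.6.244}\\
&\leq \operatorname{Tr}\!\left[\begin{pmatrix}\varepsilon^{-\alpha}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon^{1-\alpha}\Pi_\sigma^\perp\end{pmatrix}\left(e^{-i\sqrt{\varepsilon}G}\left[L(\varepsilon) + o(\varepsilon)\mathbb{1}\right]e^{i\sqrt{\varepsilon}G}\right)^{\alpha}\right] \tag{7.6.245}\\
&= \operatorname{Tr}\!\left[\begin{pmatrix}\varepsilon^{-\alpha}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon^{1-\alpha}\Pi_\sigma^\perp\end{pmatrix}e^{-i\sqrt{\varepsilon}G}\left(L(\varepsilon) + o(\varepsilon)\mathbb{1}\right)^{\alpha}e^{i\sqrt{\varepsilon}G}\right]. \tag{7.6.246}
\end{align}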

Consider that

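\begin{align}
\left(L(\varepsilon) + o(\varepsilon)\mathbb{1}\right)^{\alpha}
&= \begin{pmatrix}\varepsilon S(\rho_\delta,\hat{\sigma}_\varepsilon) + o(\varepsilon)\mathbb{1} & 0\\ 0 & \rho^\delta_{1,1} + \varepsilon R + o(\varepsilon)\mathbb{1}\end{pmatrix}^{\alpha} \tag{7.6.247}\\
&= \begin{pmatrix}\left(\varepsilon S(\rho_\delta,\hat{\sigma}_\varepsilon) + o(\varepsilon)\mathbb{1}\right)^{\alpha} & 0\\ 0 & \left(\rho^\delta_{1,1} + \varepsilon R + o(\varepsilon)\mathbb{1}\right)^{\alpha}\end{pmatrix} \tag{7.6.248}\\
&= \begin{pmatrix}\varepsilon^{\alpha}\left(S(\rho_\delta,\hat{\sigma}_\varepsilon) + o(1)\mathbb{1}\right)^{\alpha} & 0\\ 0 & \left(\rho^\delta_{1,1} + \varepsilon R + o(\varepsilon)\mathbb{1}\right)^{\alpha}\end{pmatrix}. \tag{7.6.249}
\end{align}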
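Now expanding $e^{i\sqrt{\varepsilon}G}$ to first order in order to evaluate (7.6.246) (higher order
terms will end up being irrelevant), we find that
\begin{align}
&\operatorname{Tr}\!\left[\begin{pmatrix}\varepsilon^{-\alpha}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon^{1-\alpha}\Pi_\sigma^\perp\end{pmatrix}e^{-i\sqrt{\varepsilon}G}\left(L(\varepsilon) + o(\varepsilon)\mathbb{1}\right)^{\alpha}e^{i\sqrt{\varepsilon}G}\right] \notag\\
&= \operatorname{Tr}\!\left[\begin{pmatrix}\varepsilon^{-\alpha}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon^{1-\alpha}\Pi_\sigma^\perp\end{pmatrix}\left(L(\varepsilon) + o(\varepsilon)\mathbb{1}\right)^{\alpha}\right] \notag\\
&\quad+ \operatorname{Tr}\!\left[\begin{pmatrix}\varepsilon^{-\alpha}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon^{1-\alpha}\Pi_\sigma^\perp\end{pmatrix}\left(-i\sqrt{\varepsilon}G\right)\left(L(\varepsilon) + o(\varepsilon)\mathbb{1}\right)^{\alpha}\right] \notag\\
&\quad+ \operatorname{Tr}\!\left[\begin{pmatrix}\varepsilon^{-\alpha}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon^{1-\alpha}\Pi_\sigma^\perp\end{pmatrix}\left(L(\varepsilon) + o(\varepsilon)\mathbb{1}\right)^{\alpha}\left(i\sqrt{\varepsilon}G\right)\right] + o(1) \tag{7.6.250}\\
&= \operatorname{Tr}\!\left[\begin{pmatrix}\hat{\sigma}_\varepsilon\left(S(\rho_\delta,\hat{\sigma}_\varepsilon) + o(1)\mathbb{1}\right)^{\alpha} & 0\\ 0 & \varepsilon^{1-\alpha}\Pi_\sigma^\perp\left(\rho^\delta_{1,1} + \varepsilon R + o(\varepsilon)\mathbb{1}\right)^{\alpha}\end{pmatrix}\right] \notag\\
&\quad- i\sqrt{\varepsilon}\operatorname{Tr}\!\left[\begin{pmatrix}\left(S(\rho_\delta,\hat{\sigma}_\varepsilon) + o(1)\mathbb{1}\right)^{\alpha}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon^{1-\alpha}\left(\rho^\delta_{1,1} + \varepsilon R + o(\varepsilon)\mathbb{1}\right)^{\alpha}\Pi_\sigma^\perp\end{pmatrix}G\right] \notag\\
&\quad+ i\sqrt{\varepsilon}\operatorname{Tr}\!\left[\begin{pmatrix}\hat{\sigma}_\varepsilon\left(S(\rho_\delta,\hat{\sigma}_\varepsilon) + o(1)\mathbb{1}\right)^{\alpha} & 0\\ 0 & \varepsilon^{1-\alpha}\Pi_\sigma^\perp\left(\rho^\delta_{1,1} + \varepsilon R + o(\varepsilon)\mathbb{1}\right)^{\alpha}\end{pmatrix}G\right] + o(1) \tag{7.6.251}\\
&= \operatorname{Tr}\!\left[\begin{pmatrix}\hat{\sigma}_\varepsilon\left(S(\rho_\delta,\hat{\sigma}_\varepsilon) + o(1)\mathbb{1}\right)^{\alpha} & 0\\ 0 & \varepsilon^{1-\alpha}\Pi_\sigma^\perp\left(\rho^\delta_{1,1} + \varepsilon R + o(\varepsilon)\mathbb{1}\right)^{\alpha}\end{pmatrix}\right] + o(1) \tag{7.6.252}\\
&= \operatorname{Tr}\!\left[\hat{\sigma}_\varepsilon\left(S(\rho_\delta,\hat{\sigma}_\varepsilon) + o(1)\mathbb{1}\right)^{\alpha}\right] + \varepsilon^{1-\alpha}\operatorname{Tr}\!\left[\Pi_\sigma^\perp\left(\rho^\delta_{1,1} + \varepsilon R + o(\varepsilon)\mathbb{1}\right)^{\alpha}\right] + o(1). \tag{7.6.253}
\end{align}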
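By observing the last line, we see that higher order terms for $e^{i\sqrt{\varepsilon}G}$ include prefactors
of 𝜀 (or higher powers), which vanish in the $\varepsilon\to0^+$ limit. Now taking the limit
$\varepsilon\to0^+$, we find that
\begin{align}
&\lim_{\varepsilon\to0^+}\operatorname{Tr}\!\left[\begin{pmatrix}\varepsilon^{-\alpha}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon^{1-\alpha}\Pi_\sigma^\perp\end{pmatrix}e^{-i\sqrt{\varepsilon}G}\left(L(\varepsilon) + o(\varepsilon)\mathbb{1}\right)^{\alpha}e^{i\sqrt{\varepsilon}G}\right] \notag\\
&\qquad= \operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\left(\rho^\delta_{0,0} - \rho^\delta_{0,1}(\rho^\delta_{1,1})^{-1}(\rho^\delta_{0,1})^\dagger\right)\sigma^{-1/2}\right)^{\alpha}\right], \tag{7.6.254}
\end{align}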

where the inverses are taken on the support of 𝜎. By proceeding in a similar way,
but using the lower bound in (7.6.241), we find the following lower bound on
(7.6.243):
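\[ \operatorname{Tr}\!\left[\begin{pmatrix}\varepsilon^{-\alpha}\hat{\sigma}_\varepsilon & 0\\ 0 & \varepsilon^{1-\alpha}\Pi_\sigma^\perp\end{pmatrix}e^{-i\sqrt{\varepsilon}G}\left(L(\varepsilon) - o(\varepsilon)\mathbb{1}\right)^{\alpha}e^{i\sqrt{\varepsilon}G}\right]. \tag{7.6.255} \]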
Then by the same argument above, the lower bound on (7.6.243) after taking the
limit 𝜀 → 0+ is the same as in (7.6.254). So we conclude that
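\[ \lim_{\varepsilon\to0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] = \operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\left(\rho^\delta_{0,0} - \rho^\delta_{0,1}(\rho^\delta_{1,1})^{-1}(\rho^\delta_{0,1})^\dagger\right)\sigma^{-1/2}\right)^{\alpha}\right]. \tag{7.6.256} \]
Now consider that
\[ \lim_{\delta\to0^+}\rho^\delta_{0,0} - \rho^\delta_{0,1}(\rho^\delta_{1,1})^{-1}(\rho^\delta_{0,1})^\dagger = \rho_{0,0} - \rho_{0,1}\rho_{1,1}^{-1}\rho_{0,1}^\dagger, \tag{7.6.257} \]
where the inverse on the right is taken on the support of $\rho_{1,1}$. This follows because
the image of $\rho_{0,1}^\dagger$ is contained in the support of $\rho_{1,1}$. Thus, we take the limit $\delta\to0^+$,
and find that
\[ \lim_{\delta\to0^+}\lim_{\varepsilon\to0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}\right)^{\alpha}\right] = \operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\left(\rho_{0,0} - \rho_{0,1}\rho_{1,1}^{-1}\rho_{0,1}^\dagger\right)\sigma^{-1/2}\right)^{\alpha}\right], \tag{7.6.258} \]
where all inverses are taken on the support. This concludes the proof.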


Lemma 7.50
Let 𝐴 be an invertible Hermitian operator, 𝐵 a linear operator, 𝐶 a Hermitian
operator, and let 𝜀 > 0. Then with

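\begin{align}
M(\varepsilon) &:= \begin{pmatrix}A & \varepsilon B\\ \varepsilon B^\dagger & \varepsilon^2C\end{pmatrix}, \tag{7.6.259}\\
D(\varepsilon) &:= \begin{pmatrix}A + \varepsilon^2\operatorname{Re}[A^{-1}BB^\dagger] & 0\\ 0 & \varepsilon^2\left(C - B^\dagger A^{-1}B\right)\end{pmatrix}, \tag{7.6.260}\\
G &:= \begin{pmatrix}0 & -iA^{-1}B\\ iB^\dagger A^{-1} & 0\end{pmatrix}, \tag{7.6.261}
\end{align}
the following inequality holds:
\[ \left\|M(\varepsilon) - e^{-i\varepsilon G}D(\varepsilon)e^{i\varepsilon G}\right\|_\infty \leq o(\varepsilon^2). \tag{7.6.262} \]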

Proof. Observe that 𝐺 is Hermitian and consider that

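\[ e^{i\varepsilon G}M(\varepsilon)e^{-i\varepsilon G} = \left(I + i\varepsilon G - \frac{\varepsilon^2}{2}G^2\right)M(\varepsilon)\left(I - i\varepsilon G - \frac{\varepsilon^2}{2}G^2\right) + o(\varepsilon^2). \]
Then we find that
\begin{align}
\left(I + i\varepsilon G - \frac{\varepsilon^2}{2}G^2\right)M(\varepsilon)\left(I - i\varepsilon G - \frac{\varepsilon^2}{2}G^2\right) &= M(\varepsilon) + i\varepsilon\left[GM(\varepsilon) - M(\varepsilon)G\right] \notag\\
&\quad+ \varepsilon^2\left(GM(\varepsilon)G - \frac{1}{2}G^2M(\varepsilon) - \frac{1}{2}M(\varepsilon)G^2\right) + o(\varepsilon^2). \tag{7.6.263}
\end{align}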
Now observe that
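\begin{align}
GM(\varepsilon) &= \begin{pmatrix}0 & -iA^{-1}B\\ iB^\dagger A^{-1} & 0\end{pmatrix}\begin{pmatrix}A & \varepsilon B\\ \varepsilon B^\dagger & \varepsilon^2C\end{pmatrix} \tag{7.6.264}\\
&= \begin{pmatrix}-i\varepsilon A^{-1}BB^\dagger & -i\varepsilon^2A^{-1}BC\\ iB^\dagger & i\varepsilon B^\dagger A^{-1}B\end{pmatrix} \tag{7.6.265}\\
&= \begin{pmatrix}-i\varepsilon A^{-1}BB^\dagger & o(\varepsilon)\\ iB^\dagger & i\varepsilon B^\dagger A^{-1}B\end{pmatrix}, \tag{7.6.266}\\
M(\varepsilon)G &= \left[GM(\varepsilon)\right]^\dagger \tag{7.6.267}\\
&= \begin{pmatrix}i\varepsilon BB^\dagger A^{-1} & -iB\\ o(\varepsilon) & -i\varepsilon B^\dagger A^{-1}B\end{pmatrix}, \tag{7.6.268}
\end{align}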

which implies that

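\begin{align}
&i\varepsilon\left[GM(\varepsilon) - M(\varepsilon)G\right] \notag\\
&= i\varepsilon\left[\begin{pmatrix}-i\varepsilon A^{-1}BB^\dagger & o(\varepsilon)\\ iB^\dagger & i\varepsilon B^\dagger A^{-1}B\end{pmatrix} - \begin{pmatrix}i\varepsilon BB^\dagger A^{-1} & -iB\\ o(\varepsilon) & -i\varepsilon B^\dagger A^{-1}B\end{pmatrix}\right] \tag{7.6.269}\\
&= \begin{pmatrix}2\varepsilon^2\operatorname{Re}[A^{-1}BB^\dagger] & -\varepsilon B + o(\varepsilon^2)\\ -\varepsilon B^\dagger + o(\varepsilon^2) & -2\varepsilon^2B^\dagger A^{-1}B\end{pmatrix}. \tag{7.6.270}
\end{align}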

Also, observe that

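\begin{align}
GM(\varepsilon)G &= \begin{pmatrix}o(1) & o(\varepsilon)\\ iB^\dagger & o(1)\end{pmatrix}\begin{pmatrix}0 & -iA^{-1}B\\ iB^\dagger A^{-1} & 0\end{pmatrix} \tag{7.6.271}\\
&= \begin{pmatrix}o(\varepsilon) & o(1)\\ o(1) & B^\dagger A^{-1}B\end{pmatrix}, \tag{7.6.272}\\
G^2M(\varepsilon) &= G\left[GM(\varepsilon)\right] \tag{7.6.273}\\
&= \begin{pmatrix}0 & -iA^{-1}B\\ iB^\dagger A^{-1} & 0\end{pmatrix}\begin{pmatrix}o(1) & o(\varepsilon)\\ iB^\dagger & o(1)\end{pmatrix} \tag{7.6.274}\\
&= \begin{pmatrix}A^{-1}BB^\dagger & o(1)\\ o(1) & o(\varepsilon)\end{pmatrix}, \tag{7.6.275}\\
M(\varepsilon)G^2 &= \left[G^2M(\varepsilon)\right]^\dagger \tag{7.6.276}\\
&= \begin{pmatrix}BB^\dagger A^{-1} & o(1)\\ o(1) & o(\varepsilon)\end{pmatrix}. \tag{7.6.277}
\end{align}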

So then we find that

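\begin{align}
&\varepsilon^2\left(GM(\varepsilon)G - \frac{1}{2}G^2M(\varepsilon) - \frac{1}{2}M(\varepsilon)G^2\right) \notag\\
&= \varepsilon^2\left[\begin{pmatrix}o(\varepsilon) & o(1)\\ o(1) & B^\dagger A^{-1}B\end{pmatrix} - \frac{1}{2}\begin{pmatrix}A^{-1}BB^\dagger & o(1)\\ o(1) & o(\varepsilon)\end{pmatrix} - \frac{1}{2}\begin{pmatrix}BB^\dagger A^{-1} & o(1)\\ o(1) & o(\varepsilon)\end{pmatrix}\right] \tag{7.6.278}\\
&= \begin{pmatrix}-\varepsilon^2\operatorname{Re}[A^{-1}BB^\dagger] + o(\varepsilon^3) & o(\varepsilon^2)\\ o(\varepsilon^2) & \varepsilon^2B^\dagger A^{-1}B + o(\varepsilon^3)\end{pmatrix}. \tag{7.6.279}
\end{align}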

So then

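\begin{align}
&\left(I + i\varepsilon G - \frac{\varepsilon^2}{2}G^2\right) M(\varepsilon) \left(I - i\varepsilon G - \frac{\varepsilon^2}{2}G^2\right) \notag\\
&\quad= M(\varepsilon) + i\varepsilon\left[G M(\varepsilon) - M(\varepsilon) G\right] + \varepsilon^2\left(G M(\varepsilon) G - \frac{1}{2}G^2 M(\varepsilon) - \frac{1}{2}M(\varepsilon) G^2\right) + o(\varepsilon^2) \tag{7.6.280}\\
&\quad= \begin{pmatrix} A & \varepsilon B \\ \varepsilon B^\dagger & \varepsilon^2 C \end{pmatrix} + \begin{pmatrix} 2\varepsilon^2 \operatorname{Re}[A^{-1}BB^\dagger] & -\varepsilon B + o(\varepsilon^2) \\ -\varepsilon B^\dagger + o(\varepsilon^2) & -2\varepsilon^2 B^\dagger A^{-1}B \end{pmatrix} \notag\\
&\qquad + \begin{pmatrix} -\varepsilon^2 \operatorname{Re}[A^{-1}BB^\dagger] + o(\varepsilon^3) & o(\varepsilon^2) \\ o(\varepsilon^2) & \varepsilon^2 B^\dagger A^{-1}B + o(\varepsilon^3) \end{pmatrix} + o(\varepsilon^2) \tag{7.6.281}\\
&\quad= \begin{pmatrix} A + \varepsilon^2 \operatorname{Re}[A^{-1}BB^\dagger] & 0 \\ 0 & \varepsilon^2\left(C - B^\dagger A^{-1}B\right) \end{pmatrix} + o(\varepsilon^2) \tag{7.6.282}\\
&\quad= D(\varepsilon) + o(\varepsilon^2). \tag{7.6.283}
\end{align}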

So we conclude that

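\[
e^{i\varepsilon G} M(\varepsilon)\, e^{-i\varepsilon G} = D(\varepsilon) + o(\varepsilon^2), \tag{7.6.284}
\]
which in turn implies that
\[
M(\varepsilon) = e^{-i\varepsilon G} D(\varepsilon)\, e^{i\varepsilon G} + o(\varepsilon^2), \tag{7.6.285}
\]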

from which we conclude the claim in (7.6.262). ■
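
As a numerical sanity check of Lemma 7.50, the following sketch builds M(ε), D(ε), and G for randomly chosen A, B, and C and prints the error in (7.6.262) divided by ε². It assumes NumPy; the helper names `rand_herm` and `lemma_error`, the matrix sizes, and the random seed are illustrative choices and not part of the text. If the lemma holds, the printed ratio should shrink as ε decreases.

```python
import numpy as np

def rand_herm(d, rng):
    """Random d x d Hermitian matrix."""
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (X + X.conj().T) / 2

def lemma_error(eps, A, B, C):
    """Spectral norm of M(eps) - exp(-i eps G) D(eps) exp(i eps G)."""
    n, m = B.shape
    Ainv, Bd = np.linalg.inv(A), B.conj().T
    re = lambda X: (X + X.conj().T) / 2  # Hermitian part Re[X]
    M = np.block([[A, eps * B], [eps * Bd, eps**2 * C]])
    D = np.block([[A + eps**2 * re(Ainv @ B @ Bd), np.zeros((n, m))],
                  [np.zeros((m, n)), eps**2 * (C - Bd @ Ainv @ B)]])
    G = np.block([[np.zeros((n, n)), -1j * Ainv @ B],
                  [1j * Bd @ Ainv, np.zeros((m, m))]])
    w, V = np.linalg.eigh(G)                     # G is Hermitian
    U = (V * np.exp(1j * eps * w)) @ V.conj().T  # U = exp(i eps G)
    return np.linalg.norm(M - U.conj().T @ D @ U, 2)

rng = np.random.default_rng(0)
# A is Hermitian and invertible (eigenvalues 1, -2, 3), but not positive.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
A = (Q * np.array([1.0, -2.0, 3.0])) @ Q.conj().T
B = 0.5 * (rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2)))
C = rand_herm(2, rng)

eps_values = [1e-2, 1e-3, 1e-4]
ratios = [lemma_error(e, A, B, C) / e**2 for e in eps_values]
for e, r in zip(eps_values, ratios):
    print(f"eps = {e:.0e}:  error / eps^2 = {r:.3e}")
```

The unitary is formed from an eigendecomposition of G rather than a series expansion, so the check does not reuse the second-order truncation on which the proof relies.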

7.7 Belavkin–Staszewski Relative Entropy

A different quantum generalization of the classical relative entropy is given by the
Belavkin–Staszewski3 relative entropy:

Definition 7.51 Belavkin–Staszewski Relative Entropy

The Belavkin–Staszewski relative entropy of a quantum state 𝜌 and a positive
semi-definite operator 𝜎 is defined as
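\[
\widehat{D}(\rho\|\sigma) := \begin{cases} \operatorname{Tr}\!\left[\rho \log_2\!\left(\rho^{1/2}\sigma^{-1}\rho^{1/2}\right)\right] & \text{if } \operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma) \\ +\infty & \text{otherwise} \end{cases}, \tag{7.7.1}
\]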

where the inverse 𝜎 −1 is taken on the support of 𝜎 and the logarithm is evaluated
on the support of 𝜌.

3The name Staszewski is pronounced Stah·shev·ski, with emphasis on the second syllable.


This quantum generalization of classical relative entropy is not known to possess
an information-theoretic meaning. However, it is quite useful for obtaining upper
bounds on quantum channel capacities and quantum channel discrimination rates,
as considered in Part III of this book.
An important property of the Belavkin–Staszewski relative entropy is that it is
the limit of the geometric Rényi relative entropy as 𝛼 → 1.

Proposition 7.52
Let 𝜌 be a state and 𝜎 a positive semi-definite operator. Then, in the limit 𝛼 → 1,
the geometric Rényi relative entropy converges to the Belavkin–Staszewski
relative entropy:
\[
\lim_{\alpha\to 1} \widehat{D}_\alpha(\rho\|\sigma) = \widehat{D}(\rho\|\sigma). \tag{7.7.2}
\]

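Proof: Suppose at first that supp(𝜌) ⊆ supp(𝜎). Then $\widehat{D}_\alpha(\rho\|\sigma)$ is finite for
all 𝛼 ∈ (0, 1) ∪ (1, ∞), and we can write the following explicit formula for the
geometric Rényi relative entropy by employing Proposition 7.40:
\begin{align}
\widehat{D}_\alpha(\rho\|\sigma) &= \frac{1}{\alpha-1}\log_2 \widehat{Q}_\alpha(\rho\|\sigma) \tag{7.7.3}\\
&= \frac{1}{\alpha-1}\log_2 \operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right]. \tag{7.7.4}
\end{align}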
Our assumption implies that Tr[Π𝜎 𝜌] = 1, and we find that
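\begin{align}
\widehat{Q}_1(\rho\|\sigma) &= \operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)\right] \tag{7.7.5}\\
&= \operatorname{Tr}[\Pi_\sigma \rho] \tag{7.7.6}\\
&= 1. \tag{7.7.7}
\end{align}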

Since log2 1 = 0, we can write

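\[
\widehat{D}_\alpha(\rho\|\sigma) = \frac{\log_2 \widehat{Q}_\alpha(\rho\|\sigma) - \log_2 \widehat{Q}_1(\rho\|\sigma)}{\alpha-1}, \tag{7.7.8}
\]
so that
\begin{align}
\lim_{\alpha\to 1}\widehat{D}_\alpha(\rho\|\sigma) &= \lim_{\alpha\to 1}\frac{\log_2 \widehat{Q}_\alpha(\rho\|\sigma) - \log_2 \widehat{Q}_1(\rho\|\sigma)}{\alpha-1} \tag{7.7.9}\\
&= \left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\log_2 \widehat{Q}_\alpha(\rho\|\sigma)\right|_{\alpha=1} \tag{7.7.10}\\
&= \frac{1}{\ln(2)}\,\frac{\left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\widehat{Q}_\alpha(\rho\|\sigma)\right|_{\alpha=1}}{\widehat{Q}_1(\rho\|\sigma)} \tag{7.7.11}\\
&= \frac{1}{\ln(2)}\left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\widehat{Q}_\alpha(\rho\|\sigma)\right|_{\alpha=1}. \tag{7.7.12}
\end{align}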

Then
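\begin{align}
\left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\widehat{Q}_\alpha(\rho\|\sigma)\right|_{\alpha=1}
&= \left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right]\right|_{\alpha=1} \notag\\
&= \operatorname{Tr}\!\left[\sigma \left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right|_{\alpha=1}\right]. \notag
\end{align}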

For a positive semi-definite operator 𝑋 with spectral decomposition
\[
X = \sum_z \nu_z \Pi_z, \tag{7.7.13}
\]

it follows that

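\begin{align}
\left.\frac{\mathrm{d}}{\mathrm{d}\alpha}X^\alpha\right|_{\alpha=1} &= \left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\sum_z \nu_z^\alpha \Pi_z\right|_{\alpha=1} \tag{7.7.14}\\
&= \sum_z \left(\left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\nu_z^\alpha\right|_{\alpha=1}\right)\Pi_z \tag{7.7.15}\\
&= \sum_z \left(\left.\nu_z^\alpha \ln \nu_z\right|_{\alpha=1}\right)\Pi_z \tag{7.7.16}\\
&= \sum_z \left(\nu_z \ln \nu_z\right)\Pi_z \tag{7.7.17}\\
&= X \ln_* X, \tag{7.7.18}
\end{align}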

where
\[
\ln_*(x) := \begin{cases} \ln(x) & x > 0 \\ 0 & x = 0 \end{cases}. \tag{7.7.19}
\]
Thus we find that

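\begin{align}
&\left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right]\right|_{\alpha=1} \notag\\
&\quad= \operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)\ln_*\!\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)\right] \tag{7.7.20}\\
&\quad= \operatorname{Tr}\!\left[\sigma^{1/2}\rho^{1/2}\rho^{1/2}\sigma^{-1/2}\ln_*\!\left(\sigma^{-1/2}\rho^{1/2}\rho^{1/2}\sigma^{-1/2}\right)\right] \tag{7.7.21}\\
&\quad= \operatorname{Tr}\!\left[\sigma^{1/2}\rho^{1/2}\ln_*\!\left(\rho^{1/2}\sigma^{-1/2}\sigma^{-1/2}\rho^{1/2}\right)\rho^{1/2}\sigma^{-1/2}\right] \tag{7.7.22}\\
&\quad= \operatorname{Tr}\!\left[\rho^{1/2}\Pi_\sigma\rho^{1/2}\ln_*\!\left(\rho^{1/2}\sigma^{-1/2}\sigma^{-1/2}\rho^{1/2}\right)\right] \tag{7.7.23}\\
&\quad= \operatorname{Tr}\!\left[\rho\ln\!\left(\rho^{1/2}\sigma^{-1}\rho^{1/2}\right)\right]. \tag{7.7.24}
\end{align}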

The third equality follows from Lemma 2.5. The final equality follows from the
assumption supp(𝜌) ⊆ supp(𝜎) and by applying the interpretation of the logarithm
exactly as stated in Definition 7.51. Then we find that
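\[
\lim_{\alpha\to 1}\widehat{D}_\alpha(\rho\|\sigma) = \operatorname{Tr}\!\left[\rho\log_2\!\left(\rho^{1/2}\sigma^{-1}\rho^{1/2}\right)\right], \tag{7.7.25}
\]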

for the case in which supp(𝜌) ⊆ supp(𝜎).

Now suppose that 𝛼 ∈ (1, ∞) and supp(𝜌) ⊈ supp(𝜎). Then $\widehat{D}_\alpha(\rho\|\sigma) = +\infty$,
so that $\lim_{\alpha\to 1^+}\widehat{D}_\alpha(\rho\|\sigma) = +\infty$, consistent with the definition of the Belavkin–
Staszewski relative entropy in this case (see Definition 7.51).
Suppose that 𝛼 ∈ (0, 1) and supp(𝜌) ⊈ supp(𝜎). Employing Proposition 7.42,
we have that $\widehat{D}_\alpha(\rho\|\sigma) \ge \widetilde{D}_\alpha(\rho\|\sigma)$ for all 𝛼 ∈ (0, 1). Since $\lim_{\alpha\to 1^-}\widetilde{D}_\alpha(\rho\|\sigma) = +\infty$
in this case, it follows that $\lim_{\alpha\to 1^-}\widehat{D}_\alpha(\rho\|\sigma) = +\infty$.

Therefore,

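\begin{align}
\lim_{\alpha\to 1^-}\widehat{D}_\alpha(\rho\|\sigma) &= \begin{cases} \operatorname{Tr}\!\left[\rho\log_2\!\left(\rho^{1/2}\sigma^{-1}\rho^{1/2}\right)\right] & \text{if } \operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma) \\ +\infty & \text{otherwise} \end{cases} \tag{7.7.26}\\
&= \widehat{D}(\rho\|\sigma). \tag{7.7.27}
\end{align}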

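To conclude, we have established that $\lim_{\alpha\to 1^+}\widehat{D}_\alpha(\rho\|\sigma) = \lim_{\alpha\to 1^-}\widehat{D}_\alpha(\rho\|\sigma) = \widehat{D}(\rho\|\sigma)$, which means that
\[
\lim_{\alpha\to 1}\widehat{D}_\alpha(\rho\|\sigma) = \widehat{D}(\rho\|\sigma), \tag{7.7.28}
\]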

as required. ■
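
The convergence in Proposition 7.52 can be observed numerically. The sketch below assumes NumPy and restricts to full-rank states so that all inverses exist; the helper names (`herm_fun`, `random_state`, `geometric_renyi`, `belavkin_staszewski`) and the seed are hypothetical choices made for illustration. It evaluates (7.7.4) at values of α close to 1 and compares the result with the formula in Definition 7.51.

```python
import numpy as np

def herm_fun(X, f):
    """Apply the scalar function f to the eigenvalues of a Hermitian matrix X."""
    w, V = np.linalg.eigh(X)
    return (V * f(w)) @ V.conj().T

def random_state(d, rng):
    """Random density matrix, mixed with 1/d so that it is safely full rank."""
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = Z @ Z.conj().T
    return 0.8 * rho / np.trace(rho).real + 0.2 * np.eye(d) / d

def geometric_renyi(rho, sigma, alpha):
    """(1/(alpha-1)) log2 Tr[sigma (sigma^{-1/2} rho sigma^{-1/2})^alpha], as in (7.7.4)."""
    s = herm_fun(sigma, lambda w: w**-0.5)
    X = s @ rho @ s
    Q = np.trace(sigma @ herm_fun(X, lambda w: w**alpha)).real
    return np.log2(Q) / (alpha - 1)

def belavkin_staszewski(rho, sigma):
    """Tr[rho log2(rho^{1/2} sigma^{-1} rho^{1/2})], as in (7.7.1), full-rank case."""
    r = herm_fun(rho, np.sqrt)
    Y = r @ np.linalg.inv(sigma) @ r
    return np.trace(rho @ herm_fun(Y, np.log2)).real

rng = np.random.default_rng(1)
rho, sigma = random_state(3, rng), random_state(3, rng)
bs = belavkin_staszewski(rho, sigma)
for alpha in [0.9, 0.99, 1 - 1e-5, 1 + 1e-5, 1.01, 1.1]:
    print(f"alpha = {alpha:<8}  geometric Renyi = {geometric_renyi(rho, sigma, alpha):.6f}")
print(f"Belavkin-Staszewski        = {bs:.6f}")
```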

The following inequality relates the quantum relative entropy to the Belavkin–
Staszewski relative entropy:

Proposition 7.53
Let 𝜌 be a state and 𝜎 a positive semi-definite operator. Then the quantum
relative entropy is never larger than the Belavkin–Staszewski relative entropy:

\[
D(\rho\|\sigma) \le \widehat{D}(\rho\|\sigma). \tag{7.7.29}
\]

Proof: If supp(𝜌) ⊈ supp(𝜎), then there is nothing to prove because
\[
D(\rho\|\sigma) = \widehat{D}(\rho\|\sigma) = +\infty, \tag{7.7.30}
\]
and so the inequality in (7.7.29) holds trivially. So let us suppose
instead that supp(𝜌) ⊆ supp(𝜎). From Propositions 7.42 and 7.40, we conclude
for all 𝛼 ∈ (0, 1) ∪ (1, ∞) that
\[
\widetilde{D}_\alpha(\rho\|\sigma) \le \widehat{D}_\alpha(\rho\|\sigma). \tag{7.7.31}
\]

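From Proposition 7.30, we know that
\[
\lim_{\alpha\to 1}\widetilde{D}_\alpha(\rho\|\sigma) = D(\rho\|\sigma), \tag{7.7.32}
\]
while from Proposition 7.52, we know that
\[
\lim_{\alpha\to 1}\widehat{D}_\alpha(\rho\|\sigma) = \widehat{D}(\rho\|\sigma). \tag{7.7.33}
\]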

Thus, applying the limit 𝛼 → 1 to (7.7.31) and the two equalities above, we
conclude (7.7.29). ■
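
The inequality in Proposition 7.53 can be spot-checked numerically. The following sketch assumes NumPy and full-rank states; the function names and the seed are illustrative only. It draws a few random pairs and prints the gap between the two quantities, and it also checks a commuting (diagonal) pair, for which both formulas reduce to the classical relative entropy.

```python
import numpy as np

def herm_fun(X, f):
    """Apply the scalar function f to the eigenvalues of a Hermitian matrix X."""
    w, V = np.linalg.eigh(X)
    return (V * f(w)) @ V.conj().T

def random_state(d, rng):
    """Random density matrix, mixed with 1/d so that it is safely full rank."""
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = Z @ Z.conj().T
    return 0.8 * rho / np.trace(rho).real + 0.2 * np.eye(d) / d

def relative_entropy(rho, sigma):
    """Quantum relative entropy Tr[rho (log2 rho - log2 sigma)], full-rank case."""
    return np.trace(rho @ (herm_fun(rho, np.log2) - herm_fun(sigma, np.log2))).real

def belavkin_staszewski(rho, sigma):
    """Tr[rho log2(rho^{1/2} sigma^{-1} rho^{1/2})], full-rank case."""
    r = herm_fun(rho, np.sqrt)
    return np.trace(rho @ herm_fun(r @ np.linalg.inv(sigma) @ r, np.log2)).real

rng = np.random.default_rng(2)
gaps = []
for _ in range(5):
    rho, sigma = random_state(3, rng), random_state(3, rng)
    gaps.append(belavkin_staszewski(rho, sigma) - relative_entropy(rho, sigma))
print("gaps (Belavkin-Staszewski minus relative entropy):", np.round(gaps, 6))

# Commuting case: both quantities reduce to the classical relative entropy.
p, q = np.array([0.5, 0.3, 0.2]), np.array([0.2, 0.2, 0.6])
classical = float(np.sum(p * np.log2(p / q)))
bs_diag = belavkin_staszewski(np.diag(p), np.diag(q))
d_diag = relative_entropy(np.diag(p), np.diag(q))
print(classical, bs_diag, d_diag)
```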

Similar to what was shown in Proposition 7.2, Definition 7.51 is consistent with
the following limit:

Proposition 7.54
For every state 𝜌 and positive semi-definite operator 𝜎, the following limit
holds:
\[
\widehat{D}(\rho\|\sigma) = \lim_{\varepsilon\to 0^+}\lim_{\delta\to 0^+}\operatorname{Tr}\!\left[\rho_\delta\log_2\!\left(\rho_\delta^{1/2}\sigma_\varepsilon^{-1}\rho_\delta^{1/2}\right)\right], \tag{7.7.34}
\]

where 𝛿 ∈ (0, 1) and

\[
\rho_\delta := (1-\delta)\rho + \delta\pi, \qquad \sigma_\varepsilon := \sigma + \varepsilon\mathbb{1}, \tag{7.7.35}
\]

with 𝜋 the maximally mixed state.

Proof: Suppose first that supp(𝜌) ⊆ supp(𝜎). We follow an approach similar to
that given in the proof of Proposition 7.40. Let us employ the decomposition of the
Hilbert space into supp(𝜎) ⊕ ker(𝜎). Then we can write 𝜌 and 𝜎 as in (7.6.199),
so that
\[
\sigma_\varepsilon^{-1} = \begin{pmatrix} (\sigma + \varepsilon\Pi_\sigma)^{-1} & 0 \\ 0 & \varepsilon^{-1}\Pi_\sigma^\perp \end{pmatrix}, \tag{7.7.36}
\]
where we have followed the developments in (7.6.199)–(7.6.201). The condition
supp(𝜌) ⊆ supp(𝜎) implies that $\rho_{0,1} = 0$ and $\rho_{1,1} = 0$. It thus follows that
$\lim_{\delta\to 0^+}\rho_\delta = \rho_{0,0}$. We then find that

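\begin{align}
\operatorname{Tr}\!\left[\rho_\delta\log_2\!\left(\rho_\delta^{1/2}\sigma_\varepsilon^{-1}\rho_\delta^{1/2}\right)\right]
&= \operatorname{Tr}\!\left[\rho_\delta^{1/2}\sigma_\varepsilon^{1/2}\sigma_\varepsilon^{-1/2}\rho_\delta^{1/2}\log_2\!\left(\rho_\delta^{1/2}\sigma_\varepsilon^{-1/2}\sigma_\varepsilon^{-1/2}\rho_\delta^{1/2}\right)\right] \tag{7.7.37}\\
&= \operatorname{Tr}\!\left[\rho_\delta^{1/2}\sigma_\varepsilon^{1/2}\log_2\!\left(\sigma_\varepsilon^{-1/2}\rho_\delta^{1/2}\rho_\delta^{1/2}\sigma_\varepsilon^{-1/2}\right)\sigma_\varepsilon^{-1/2}\rho_\delta^{1/2}\right] \tag{7.7.38}\\
&= \operatorname{Tr}\!\left[\log_2\!\left(\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}\right)\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}\sigma_\varepsilon\right] \tag{7.7.39}\\
&= \operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}\right)\log_2\!\left(\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}\right)\right] \tag{7.7.40}\\
&= \operatorname{Tr}\!\left[\sigma_\varepsilon\,\eta\!\left(\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}\right)\right], \tag{7.7.41}
\end{align}
where the second equality follows from applying Lemma 2.5 with $f = \log_2$ and
$L = \rho_\delta^{1/2}\sigma_\varepsilon^{-1/2}$. The second-to-last equality follows because $\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}$ commutes
with $\log_2(\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2})$, and by employing cyclicity of trace. In the last line, we
made use of the following function:
\[
\eta(x) := x\log_2 x, \tag{7.7.42}
\]
defined for all 𝑥 ∈ [0, ∞) with 𝜂(0) = 0. By appealing to the continuity of the
function 𝜂(𝑥) on 𝑥 ∈ [0, ∞) and the fact that $\lim_{\delta\to 0^+}\rho_\delta = \rho_{0,0}$, we find that
\[
\lim_{\delta\to 0^+}\operatorname{Tr}\!\left[\sigma_\varepsilon\,\eta\!\left(\sigma_\varepsilon^{-1/2}\rho_\delta\sigma_\varepsilon^{-1/2}\right)\right] = \operatorname{Tr}\!\left[\sigma_\varepsilon\,\eta\!\left(\sigma_\varepsilon^{-1/2}\rho_{0,0}\sigma_\varepsilon^{-1/2}\right)\right]. \tag{7.7.43}
\]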

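Now let $\log_{2,*}$ denote the base-2 analogue of the function $\ln_*$ defined in (7.7.19). Using it, we can write
\begin{align}
&\operatorname{Tr}\!\left[\sigma_\varepsilon\,\eta\!\left(\sigma_\varepsilon^{-1/2}\rho_{0,0}\sigma_\varepsilon^{-1/2}\right)\right] \notag\\
&\quad= \operatorname{Tr}\!\left[\sigma_\varepsilon\left(\sigma_\varepsilon^{-1/2}\rho_{0,0}\sigma_\varepsilon^{-1/2}\right)\log_{2,*}\!\left(\sigma_\varepsilon^{-1/2}\rho_{0,0}\sigma_\varepsilon^{-1/2}\right)\right] \tag{7.7.44}\\
&\quad= \operatorname{Tr}\!\left[\sigma_\varepsilon^{1/2}\rho_{0,0}^{1/2}\rho_{0,0}^{1/2}\sigma_\varepsilon^{-1/2}\log_{2,*}\!\left(\sigma_\varepsilon^{-1/2}\rho_{0,0}^{1/2}\rho_{0,0}^{1/2}\sigma_\varepsilon^{-1/2}\right)\right] \tag{7.7.45}\\
&\quad= \operatorname{Tr}\!\left[\sigma_\varepsilon^{1/2}\rho_{0,0}^{1/2}\log_{2,*}\!\left(\rho_{0,0}^{1/2}\sigma_\varepsilon^{-1/2}\sigma_\varepsilon^{-1/2}\rho_{0,0}^{1/2}\right)\rho_{0,0}^{1/2}\sigma_\varepsilon^{-1/2}\right] \tag{7.7.46}\\
&\quad= \operatorname{Tr}\!\left[\rho_{0,0}\log_{2,*}\!\left(\rho_{0,0}^{1/2}\sigma_\varepsilon^{-1}\rho_{0,0}^{1/2}\right)\right] \tag{7.7.47}\\
&\quad= \operatorname{Tr}\!\left[\rho_{0,0}\log_{2,*}\!\left(\rho_{0,0}^{1/2}(\sigma+\varepsilon\Pi_\sigma)^{-1}\rho_{0,0}^{1/2}\right)\right], \tag{7.7.48}
\end{align}
where the last line follows from the block form of $\sigma_\varepsilon^{-1}$ in (7.7.36), because
\begin{align}
\rho_{0,0}^{1/2}\,\sigma_\varepsilon^{-1}\,\rho_{0,0}^{1/2}
&= \begin{pmatrix} \rho_{0,0}^{1/2} & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} (\sigma+\varepsilon\Pi_\sigma)^{-1} & 0 \\ 0 & \varepsilon^{-1}\Pi_\sigma^\perp \end{pmatrix}\begin{pmatrix} \rho_{0,0}^{1/2} & 0 \\ 0 & 0 \end{pmatrix} \tag{7.7.49}\\
&= \begin{pmatrix} \rho_{0,0}^{1/2}(\sigma+\varepsilon\Pi_\sigma)^{-1}\rho_{0,0}^{1/2} & 0 \\ 0 & 0 \end{pmatrix}. \tag{7.7.50}
\end{align}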

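Now taking the limit as 𝜀 → 0+, and appealing to continuity of $\log_{2,*}(x)$ and $x^{-1}$
for 𝑥 > 0, we find that
\begin{align}
\lim_{\varepsilon\to 0^+}\operatorname{Tr}\!\left[\rho_{0,0}\log_{2,*}\!\left(\rho_{0,0}^{1/2}(\sigma+\varepsilon\Pi_\sigma)^{-1}\rho_{0,0}^{1/2}\right)\right]
&= \operatorname{Tr}\!\left[\rho_{0,0}\log_{2,*}\!\left(\rho_{0,0}^{1/2}\sigma^{-1}\rho_{0,0}^{1/2}\right)\right] \tag{7.7.51}\\
&= \operatorname{Tr}\!\left[\rho\log_2\!\left(\rho^{1/2}\sigma^{-1}\rho^{1/2}\right)\right], \tag{7.7.52}
\end{align}
where the formula in the last line is interpreted exactly as stated in Definition 7.51.
Thus, we conclude that
\[
\lim_{\varepsilon\to 0^+}\lim_{\delta\to 0^+}\operatorname{Tr}\!\left[\rho_\delta\log_2\!\left(\rho_\delta^{1/2}\sigma_\varepsilon^{-1}\rho_\delta^{1/2}\right)\right] = \operatorname{Tr}\!\left[\rho\log_2\!\left(\rho^{1/2}\sigma^{-1}\rho^{1/2}\right)\right]. \tag{7.7.53}
\]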

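Now suppose that supp(𝜌) ⊈ supp(𝜎). Then applying Proposition 7.53, we
find that the following inequality holds for all 𝛿 ∈ (0, 1) and 𝜀 > 0:
\[
\widehat{D}(\rho_\delta\|\sigma_\varepsilon) \ge D(\rho_\delta\|\sigma_\varepsilon). \tag{7.7.54}
\]
Now taking limits and applying Proposition 7.2, we find that
\begin{align}
\lim_{\varepsilon\to 0^+}\lim_{\delta\to 0^+}\widehat{D}(\rho_\delta\|\sigma_\varepsilon) &\ge \lim_{\varepsilon\to 0^+}\lim_{\delta\to 0^+}D(\rho_\delta\|\sigma_\varepsilon) \tag{7.7.55}\\
&= \lim_{\varepsilon\to 0^+}D(\rho\|\sigma_\varepsilon) \tag{7.7.56}\\
&= +\infty. \tag{7.7.57}
\end{align}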

This concludes the proof. ■

By taking the limit 𝛼 → 1 in the statement of the data-processing inequality
for $\widehat{D}_\alpha$, and applying Proposition 7.52, we immediately obtain the data-processing
inequality for the Belavkin–Staszewski relative entropy.

Corollary 7.55 Data-Processing Inequality for Belavkin–Staszewski Relative Entropy
Let 𝜌 be a state, 𝜎 a positive semi-definite operator, and $\mathcal{N}$ a quantum channel.
Then
\[
\widehat{D}(\rho\|\sigma) \ge \widehat{D}(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)). \tag{7.7.58}
\]

Some basic properties of the Belavkin–Staszewski relative entropy are as
follows:

Proposition 7.56 Properties of Belavkin–Staszewski Relative Entropy

The Belavkin–Staszewski relative entropy satisfies the following properties for
states 𝜌, 𝜌1 , 𝜌2 and positive semi-definite operators 𝜎, 𝜎1 , 𝜎2 .
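1. Isometric invariance: For every isometry 𝑉,
\[
\widehat{D}(V\rho V^\dagger\|V\sigma V^\dagger) = \widehat{D}(\rho\|\sigma). \tag{7.7.59}
\]

2. (a) If Tr[𝜎] ≤ 1, then $\widehat{D}(\rho\|\sigma) \ge 0$.
(b) Faithfulness: Suppose that Tr[𝜎] ≤ Tr[𝜌] = 1. Then $\widehat{D}(\rho\|\sigma) = 0$ if
and only if 𝜌 = 𝜎.
(c) If 𝜌 ≤ 𝜎, then $\widehat{D}(\rho\|\sigma) \le 0$.
(d) If 𝜎 ≤ 𝜎′, then $\widehat{D}(\rho\|\sigma) \ge \widehat{D}(\rho\|\sigma')$.

3. Additivity:
\[
\widehat{D}(\rho_1\otimes\rho_2\|\sigma_1\otimes\sigma_2) = \widehat{D}(\rho_1\|\sigma_1) + \widehat{D}(\rho_2\|\sigma_2). \tag{7.7.60}
\]
As a special case, for every 𝛽 ∈ (0, ∞),
\[
\widehat{D}(\rho\|\beta\sigma) = \widehat{D}(\rho\|\sigma) + \log_2\!\left(\frac{1}{\beta}\right). \tag{7.7.61}
\]

4. Direct-sum property: Let $p : \mathcal{X} \to [0,1]$ be a probability distribution
over a finite alphabet $\mathcal{X}$ with associated $|\mathcal{X}|$-dimensional system 𝑋, and let
$q : \mathcal{X} \to [0,\infty)$ be a positive function on $\mathcal{X}$. Let $\{\rho_A^x : x \in \mathcal{X}\}$ be a set of
states on a system 𝐴, and let $\{\sigma_A^x : x \in \mathcal{X}\}$ be a set of positive semi-definite
operators on 𝐴. Then,
\[
\widehat{D}(\rho_{XA}\|\sigma_{XA}) = \widehat{D}(p\|q) + \sum_{x\in\mathcal{X}} p(x)\,\widehat{D}(\rho_A^x\|\sigma_A^x), \tag{7.7.62}
\]
where
\begin{align}
\rho_{XA} &:= \sum_{x\in\mathcal{X}} p(x)\,|x\rangle\!\langle x|_X \otimes \rho_A^x, \tag{7.7.63}\\
\sigma_{XA} &:= \sum_{x\in\mathcal{X}} q(x)\,|x\rangle\!\langle x|_X \otimes \sigma_A^x. \tag{7.7.64}
\end{align}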


Proof:
1. Isometric invariance is a direct consequence of Propositions 7.44 and 7.52.
2. All of the properties in the second item follow from data processing (Corol-
lary 7.55).
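(a) Applying the trace-out channel, we find that
\begin{align}
\widehat{D}(\rho\|\sigma) &\ge \widehat{D}(\operatorname{Tr}[\rho]\,\|\operatorname{Tr}[\sigma]) \tag{7.7.65}\\
&= \operatorname{Tr}[\rho]\log_2(\operatorname{Tr}[\rho]/\operatorname{Tr}[\sigma]) \tag{7.7.66}\\
&= -\log_2\operatorname{Tr}[\sigma] \tag{7.7.67}\\
&\ge 0. \tag{7.7.68}
\end{align}

(b) If 𝜌 = 𝜎, then it follows by direct evaluation that $\widehat{D}(\rho\|\sigma) = 0$. Conversely, if
$\widehat{D}(\rho\|\sigma) = 0$ and Tr[𝜎] ≤ 1, then $D(\rho\|\sigma) \le 0$ by Proposition 7.53. Since the
quantum relative entropy is non-negative when Tr[𝜎] ≤ Tr[𝜌] = 1, we have $D(\rho\|\sigma) = 0$, and
we conclude that 𝜌 = 𝜎 from faithfulness of the quantum relative entropy
(Proposition 7.3).
(c) If 𝜌 ≤ 𝜎, then 𝜎 − 𝜌 is positive semi-definite, and the following operator
is positive semi-definite:
\[
\hat{\sigma} := |0\rangle\!\langle 0| \otimes \rho + |1\rangle\!\langle 1| \otimes (\sigma - \rho). \tag{7.7.69}
\]
Defining $\hat{\rho} := |0\rangle\!\langle 0| \otimes \rho$, we find from the direct-sum property that
\[
0 = \widehat{D}(\rho\|\rho) = \widehat{D}(\hat{\rho}\|\hat{\sigma}) \ge \widehat{D}(\rho\|\sigma), \tag{7.7.70}
\]
where the inequality follows from data processing by tracing out the first
classical register of $\hat{\rho}$ and $\hat{\sigma}$.
(d) If 𝜎 ≤ 𝜎′, then the operator 𝜎′ − 𝜎 is positive semi-definite and so is the
following one:
\[
\hat{\sigma} := |0\rangle\!\langle 0| \otimes \sigma + |1\rangle\!\langle 1| \otimes (\sigma' - \sigma). \tag{7.7.71}
\]
Defining $\hat{\rho} := |0\rangle\!\langle 0| \otimes \rho$, we find from the direct-sum property that
\[
\widehat{D}(\rho\|\sigma) = \widehat{D}(\hat{\rho}\|\hat{\sigma}) \ge \widehat{D}(\rho\|\sigma'), \tag{7.7.72}
\]
where the inequality follows from data processing by tracing out the first
classical register of $\hat{\rho}$ and $\hat{\sigma}$.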

3. Additivity follows by direct evaluation.

4. The direct-sum property follows by direct evaluation. ■
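
The "direct evaluation" behind additivity (7.7.60) and the scaling rule (7.7.61) can be mirrored numerically. The following sketch assumes NumPy and full-rank inputs; the helper names, dimensions, and seed are illustrative choices. It compares both sides of each identity on random states.

```python
import numpy as np

def herm_fun(X, f):
    """Apply the scalar function f to the eigenvalues of a Hermitian matrix X."""
    w, V = np.linalg.eigh(X)
    return (V * f(w)) @ V.conj().T

def random_state(d, rng):
    """Random density matrix, mixed with 1/d so that it is safely full rank."""
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = Z @ Z.conj().T
    return 0.8 * rho / np.trace(rho).real + 0.2 * np.eye(d) / d

def belavkin_staszewski(rho, sigma):
    """Tr[rho log2(rho^{1/2} sigma^{-1} rho^{1/2})], full-rank case."""
    r = herm_fun(rho, np.sqrt)
    return np.trace(rho @ herm_fun(r @ np.linalg.inv(sigma) @ r, np.log2)).real

rng = np.random.default_rng(3)
rho1, sigma1 = random_state(2, rng), random_state(2, rng)
rho2, sigma2 = random_state(3, rng), random_state(3, rng)

# Additivity (7.7.60): tensor products on the left, a sum on the right.
lhs = belavkin_staszewski(np.kron(rho1, rho2), np.kron(sigma1, sigma2))
rhs = belavkin_staszewski(rho1, sigma1) + belavkin_staszewski(rho2, sigma2)
print("additivity:", lhs, rhs)

# Scaling (7.7.61) with beta = 1/4, so that log2(1/beta) = 2.
beta = 0.25
shift = belavkin_staszewski(rho1, beta * sigma1) - belavkin_staszewski(rho1, sigma1)
print("scaling shift:", shift)
```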

A statement similar to that made by Proposition 7.48 holds for the Belavkin–
Staszewski relative entropy:

Proposition 7.57 Belavkin–Staszewski Relative Entropy from Classical Preparations
Let 𝜌 be a state and 𝜎 a positive semi-definite operator satisfying supp(𝜌) ⊆
supp(𝜎). The Belavkin–Staszewski relative entropy is equal to the smallest
value that the classical relative entropy can take by minimizing over classical–
quantum channels that realize the state 𝜌 and the positive semi-definite operator
𝜎. That is, the following equality holds:
\[
\widehat{D}(\rho\|\sigma) = \inf_{\{p,q,\mathcal{P}\}}\left\{D(p\|q) : \mathcal{P}(\omega(p)) = \rho,\ \mathcal{P}(\omega(q)) = \sigma\right\}, \tag{7.7.73}
\]
where the classical relative entropy is defined in (7.2.2), the channel $\mathcal{P}$ is a
classical–quantum channel, $p : \mathcal{X} \to [0,1]$ is a probability distribution over
a finite alphabet $\mathcal{X}$, $q : \mathcal{X} \to [0,\infty)$ is a positive function on $\mathcal{X}$,
$\omega(p) := \sum_{x\in\mathcal{X}} p(x)\,|x\rangle\!\langle x|$, and $\omega(q) := \sum_{x\in\mathcal{X}} q(x)\,|x\rangle\!\langle x|$.

Proof: The proof is very similar to the proof of Proposition 7.48, and so we use
the same notation to provide a brief proof. By following the same reasoning that
leads to (7.6.170), it follows that

\[
\inf_{\{p,q,\mathcal{P}\}}\left\{D(p\|q) : \mathcal{P}(\omega(p)) = \rho,\ \mathcal{P}(\omega(q)) = \sigma\right\} \ge \widehat{D}(\rho\|\sigma). \tag{7.7.74}
\]

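The optimal choices of 𝑝, 𝑞, and $\mathcal{P}$ saturating the inequality in (7.7.74) are again
given by (7.6.171)–(7.6.173). Consider for those choices that
\begin{align}
\sum_x p(x)\log_2\!\left(\frac{p(x)}{q(x)}\right) &= \sum_x p(x)\log_2(\lambda_x) \tag{7.7.75}\\
&= \sum_x \lambda_x\, q(x)\log_2(\lambda_x) \tag{7.7.76}\\
&= \sum_x \lambda_x\operatorname{Tr}[\Pi_x\sigma]\log_2(\lambda_x) \tag{7.7.77}\\
&= \operatorname{Tr}\!\left[\sigma\left(\sum_x \lambda_x\log_2(\lambda_x)\,\Pi_x\right)\right] \tag{7.7.78}\\
&= \operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)\log_2\!\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)\right] \tag{7.7.79}\\
&= \operatorname{Tr}\!\left[\rho\log_2\!\left(\rho^{1/2}\sigma^{-1}\rho^{1/2}\right)\right], \tag{7.7.80}
\end{align}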

where the last equality follows from reasoning similar to that used to justify
(7.7.20)–(7.7.24). Then by following the reasoning at the end of the proof of
Proposition 7.48, we conclude (7.7.73). ■

7.8 Max-Relative Entropy

An important generalized divergence that appears in the context of placing upper
bounds on communication rates of feedback-assisted protocols is the max-relative
entropy.

Definition 7.58 Max-Relative Entropy

The max-relative entropy 𝐷 max (𝜌∥𝜎) of a state 𝜌 and a positive semi-definite
operator 𝜎 is defined as
\[
D_{\max}(\rho\|\sigma) := \log_2\left\|\sigma^{-1/2}\rho\,\sigma^{-1/2}\right\|_\infty, \tag{7.8.1}
\]

if supp(𝜌) ⊆ supp(𝜎); otherwise, 𝐷 max (𝜌∥𝜎) = +∞.

The max-relative entropy has the following equivalent representations:

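\begin{align}
D_{\max}(\rho\|\sigma) &= \log_2\left\|\rho^{1/2}\sigma^{-1}\rho^{1/2}\right\|_\infty \tag{7.8.2}\\
&= 2\log_2\left\|\rho^{1/2}\sigma^{-1/2}\right\|_\infty \tag{7.8.3}\\
&= \log_2\inf_{\lambda\ge 0}\{\lambda : \rho \le \lambda\sigma\} \tag{7.8.4}\\
&= \inf_{\lambda\in\mathbb{R}}\{\lambda : \rho \le 2^\lambda\sigma\} \tag{7.8.5}\\
&= \log_2\sup_{M\ge 0}\{\operatorname{Tr}[M\rho] : \operatorname{Tr}[M\sigma] \le 1\}. \tag{7.8.6}
\end{align}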

The equality in (7.8.4) demonstrates that $D_{\max}(\rho\|\sigma)$ can be calculated using a
semi-definite program (SDP) (see Section 2.4). Indeed, the optimization in (7.8.4)
can be cast in the standard form in (2.4.4), i.e.,
\[
\inf\{\lambda : \rho \le \lambda\sigma\} = \left\{\begin{array}{ll} \text{infimum} & \operatorname{Tr}[BY] \\ \text{subject to} & \Phi^\dagger(Y) \ge A, \\ & Y \ge 0, \end{array}\right. \tag{7.8.7}
\]
with 𝑌 ≡ 𝜆, 𝐴 ≡ 𝜌, 𝐵 ≡ 1, and $\Phi^\dagger(Y) = Y\sigma$. (Note that taking the trace on both
sides of the constraint 𝜌 ≤ 𝜆𝜎 in (7.8.4) results in 𝜆 ≥ 1/Tr[𝜎], so that 𝜆 ≥ 0.)
The final equality in (7.8.6) results from calculating the SDP dual to that in (7.8.4).
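
For full-rank inputs, the representations (7.8.1), (7.8.3), and (7.8.5) can be compared directly without an SDP solver. The following sketch assumes NumPy; the function names and the seed are hypothetical and chosen only for illustration. It computes the max-relative entropy from two operator norms and then checks the operator inequality 𝜌 ≤ 2^𝜆 𝜎 at 𝜆 equal to the computed value and at a slightly smaller 𝜆.

```python
import numpy as np

def herm_fun(X, f):
    """Apply the scalar function f to the eigenvalues of a Hermitian matrix X."""
    w, V = np.linalg.eigh(X)
    return (V * f(w)) @ V.conj().T

def random_state(d, rng):
    """Random density matrix, mixed with 1/d so that it is safely full rank."""
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = Z @ Z.conj().T
    return 0.8 * rho / np.trace(rho).real + 0.2 * np.eye(d) / d

def d_max(rho, sigma):
    """log2 of the spectral norm of sigma^{-1/2} rho sigma^{-1/2}, as in (7.8.1)."""
    s = herm_fun(sigma, lambda w: w**-0.5)
    return np.log2(np.linalg.norm(s @ rho @ s, 2))

def d_max_alt(rho, sigma):
    """2 log2 of the spectral norm of rho^{1/2} sigma^{-1/2}, as in (7.8.3)."""
    L = herm_fun(rho, np.sqrt) @ herm_fun(sigma, lambda w: w**-0.5)
    return 2 * np.log2(np.linalg.norm(L, 2))

def min_eig(lam, rho, sigma):
    """Smallest eigenvalue of 2^lam sigma - rho; non-negative iff rho <= 2^lam sigma."""
    return np.linalg.eigvalsh(2.0**lam * sigma - rho).min()

rng = np.random.default_rng(4)
rho, sigma = random_state(3, rng), random_state(3, rng)
d1, d2 = d_max(rho, sigma), d_max_alt(rho, sigma)
print("D_max via (7.8.1):", d1, " via (7.8.3):", d2)
print("min eig at lambda = D_max:       ", min_eig(d1, rho, sigma))
print("min eig at lambda = D_max - 0.01:", min_eig(d1 - 0.01, rho, sigma))
```

The smallest eigenvalue is numerically zero at 𝜆 = D_max and negative just below it, which is what the infimum in (7.8.5) expresses.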
The max-relative entropy has the following alternative representation, which,
when 𝜌 and 𝜎 are states, allows for thinking of it as being related to the largest
weight that one can place on 𝜌 to realize 𝜎 as a probabilistic mixture of 𝜌 and some
other state.

Lemma 7.59
The max-relative entropy 𝐷 max (𝜌∥𝜎) of a state 𝜌 and a positive semi-definite
operator 𝜎 can be written as follows:
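\[
D_{\max}(\rho\|\sigma) = \inf_{\lambda\in\mathbb{R},\,\omega\ge 0}\left\{\lambda : \sigma = 2^{-\lambda}\rho + \left(1 - 2^{-\lambda}\right)\omega,\ \operatorname{Tr}[\omega] = 1\right\}. \tag{7.8.8}
\]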

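Proof: Let 𝜇 ∈ ℝ be such that $\rho \le 2^\mu\sigma$. Then it follows that $2^\mu\sigma - \rho \ge 0$, so that
$\omega := \frac{2^\mu\sigma - \rho}{2^\mu - 1}$ is a quantum state. Now consider that
\begin{align}
2^{-\mu}\rho + (1 - 2^{-\mu})\,\omega &= 2^{-\mu}\rho + (1 - 2^{-\mu})\,\frac{2^\mu\sigma - \rho}{2^\mu - 1} \tag{7.8.9}\\
&= 2^{-\mu}\rho + (1 - 2^{-\mu})\,\frac{2^\mu(\sigma - 2^{-\mu}\rho)}{2^\mu - 1} \tag{7.8.10}\\
&= 2^{-\mu}\rho + \sigma - 2^{-\mu}\rho \tag{7.8.11}\\
&= \sigma. \tag{7.8.12}
\end{align}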

Thus, 𝜇 and 𝜔 satisfy the constraints for the optimization problem in (7.8.8), and
we conclude that
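\[
\mu \ge \inf_{\lambda\in\mathbb{R},\,\omega\ge 0}\left\{\lambda : \sigma = 2^{-\lambda}\rho + \left(1 - 2^{-\lambda}\right)\omega,\ \operatorname{Tr}[\omega] = 1\right\}. \tag{7.8.13}
\]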

By taking an infimum over all 𝜇 satisfying $\rho \le 2^\mu\sigma$ and applying (7.8.5), we
conclude that
\[
D_{\max}(\rho\|\sigma) \ge \inf_{\lambda\in\mathbb{R},\,\omega\ge 0}\left\{\lambda : \sigma = 2^{-\lambda}\rho + \left(1 - 2^{-\lambda}\right)\omega,\ \operatorname{Tr}[\omega] = 1\right\}. \tag{7.8.14}
\]

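Now we prove the opposite inequality. Let 𝜆 ∈ ℝ and let 𝜔 be an arbitrary state
satisfying $\sigma = 2^{-\lambda}\rho + (1 - 2^{-\lambda})\,\omega$. Then it follows that
\[
\sigma = 2^{-\lambda}\rho + \left(1 - 2^{-\lambda}\right)\omega \ge 2^{-\lambda}\rho, \tag{7.8.15}
\]
from which we conclude that $\rho \le 2^\lambda\sigma$. So it follows that $\lambda \ge D_{\max}(\rho\|\sigma)$. Since
𝜔 and 𝜆 are arbitrary, we conclude that
\[
\inf_{\lambda\in\mathbb{R},\,\omega\ge 0}\left\{\lambda : \sigma = 2^{-\lambda}\rho + \left(1 - 2^{-\lambda}\right)\omega,\ \operatorname{Tr}[\omega] = 1\right\} \ge D_{\max}(\rho\|\sigma), \tag{7.8.16}
\]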

which completes the proof. ■
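
The construction in the first half of this proof can be reproduced numerically when 𝜌 and 𝜎 are both states. The sketch below assumes NumPy and full-rank random states; the names and the choice 𝜇 = D_max + 0.5 are illustrative (any 𝜇 > 0 with 𝜌 ≤ 2^𝜇 𝜎 would do). It builds 𝜔 as in the proof and checks that it is a state and that the mixture in (7.8.9)–(7.8.12) returns 𝜎.

```python
import numpy as np

def herm_fun(X, f):
    """Apply the scalar function f to the eigenvalues of a Hermitian matrix X."""
    w, V = np.linalg.eigh(X)
    return (V * f(w)) @ V.conj().T

def random_state(d, rng):
    """Random density matrix, mixed with 1/d so that it is safely full rank."""
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = Z @ Z.conj().T
    return 0.8 * rho / np.trace(rho).real + 0.2 * np.eye(d) / d

def d_max(rho, sigma):
    """log2 of the spectral norm of sigma^{-1/2} rho sigma^{-1/2}."""
    s = herm_fun(sigma, lambda w: w**-0.5)
    return np.log2(np.linalg.norm(s @ rho @ s, 2))

rng = np.random.default_rng(5)
rho, sigma = random_state(3, rng), random_state(3, rng)

mu = d_max(rho, sigma) + 0.5          # feasible: rho <= 2^mu sigma, and mu > 0
w = 2.0**mu
omega = (w * sigma - rho) / (w - 1)   # the operator omega from the proof

trace_omega = np.trace(omega).real
min_eig_omega = np.linalg.eigvalsh(omega).min()
mixture = rho / w + (1 - 1 / w) * omega   # 2^{-mu} rho + (1 - 2^{-mu}) omega

print("Tr[omega] =", trace_omega)
print("smallest eigenvalue of omega =", min_eig_omega)
print("mixture reproduces sigma:", np.allclose(mixture, sigma))
```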

Proposition 7.60 Data-Processing Inequality for Max-Relative Entropy

Let 𝜌 be a state, 𝜎 a positive semi-definite operator, and N a quantum channel.
Then,
𝐷 max (𝜌∥𝜎) ≥ 𝐷 max (N(𝜌)∥N(𝜎)). (7.8.17)

Remark: This result holds more generally for positive maps that are not necessarily trace
preserving.

Proof: To see this, let 𝜆 ∈ R be such that the operator inequality 𝜌 ≤ 2𝜆 𝜎 holds.
Then the operator inequality N(𝜌) ≤ 2𝜆 N(𝜎) holds because the quantum channel
N is a positive map. Then

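\[
D_{\max}(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)) = \inf_{\mu\in\mathbb{R}}\left\{\mu : \mathcal{N}(\rho) \le 2^\mu\mathcal{N}(\sigma)\right\} \le \lambda. \tag{7.8.18}
\]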

Since the inequality holds for all choices of 𝜆 such that 𝜌 ≤ 2𝜆 𝜎 holds, we conclude
that

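\[
D_{\max}(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)) \le \inf_{\lambda\in\mathbb{R}}\left\{\lambda : \rho \le 2^\lambda\sigma\right\} = D_{\max}(\rho\|\sigma). \tag{7.8.19}
\]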

This concludes the proof. ■


It turns out that the max-relative entropy is a limiting case of the sandwiched
and geometric Rényi relative entropies, as we now show.

Proposition 7.61
The sandwiched and geometric Rényi relative entropies converge to the max-
relative entropy in the limit 𝛼 → ∞:
\[
\lim_{\alpha\to\infty}\widetilde{D}_\alpha(\rho\|\sigma) = \lim_{\alpha\to\infty}\widehat{D}_\alpha(\rho\|\sigma) = D_{\max}(\rho\|\sigma). \tag{7.8.20}
\]

Proof: We begin with the case in which supp(𝜌) ⊈ supp(𝜎). We trivially have
$\widetilde{D}_\alpha(\rho\|\sigma) = \widehat{D}_\alpha(\rho\|\sigma) = D_{\max}(\rho\|\sigma) = +\infty$ for all 𝛼 > 1, which implies the
equality in (7.8.20) in this case.
In the case that supp(𝜌) ⊆ supp(𝜎), we can consider, without loss of generality,
that supp(𝜎) = H, so that 𝜆 min (𝜎) > 0.
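We begin with the sandwiched Rényi relative entropy. Consider that
\begin{align}
\widetilde{D}_\alpha(\rho\|\sigma) &= \frac{1}{\alpha-1}\log_2\operatorname{Tr}\!\left[\left(\rho^{1/2}\sigma^{\frac{1-\alpha}{\alpha}}\rho^{1/2}\right)^{\alpha}\right] \tag{7.8.21}\\
&= \frac{1}{\alpha-1}\log_2\operatorname{Tr}\!\left[\left(\rho^{1/2}\sigma^{-1/2}\sigma^{1/\alpha}\sigma^{-1/2}\rho^{1/2}\right)^{\alpha}\right]. \tag{7.8.22}
\end{align}
By the operator inequalities $[\lambda_{\min}(\sigma)]^{1/\alpha}\mathbb{1} \le \sigma^{1/\alpha} \le [\lambda_{\max}(\sigma)]^{1/\alpha}\mathbb{1}$ and the
monotonicity $\operatorname{Tr}[X^\alpha] \le \operatorname{Tr}[Y^\alpha]$ for positive semi-definite 𝑋 and 𝑌 satisfying 𝑋 ≤ 𝑌 (see
Lemma 2.14), we find for 𝛼 > 1 that
\begin{align}
\lambda_{\min}(\sigma)\operatorname{Tr}\!\left[\left(\rho^{1/2}\sigma^{-1}\rho^{1/2}\right)^{\alpha}\right] &\le \operatorname{Tr}\!\left[\left(\rho^{1/2}\sigma^{-1/2}\sigma^{1/\alpha}\sigma^{-1/2}\rho^{1/2}\right)^{\alpha}\right] \tag{7.8.23}\\
&\le \lambda_{\max}(\sigma)\operatorname{Tr}\!\left[\left(\rho^{1/2}\sigma^{-1}\rho^{1/2}\right)^{\alpha}\right]. \tag{7.8.24}
\end{align}
Using the fact that
\[
\operatorname{Tr}\!\left[\left(\rho^{1/2}\sigma^{-1}\rho^{1/2}\right)^{\alpha}\right] = \left\|\rho^{1/2}\sigma^{-1}\rho^{1/2}\right\|_\alpha^\alpha, \tag{7.8.25}
\]
and taking a logarithm followed by multiplication by $\frac{1}{\alpha-1}$, we find that
\begin{align}
\frac{1}{\alpha-1}\log_2\lambda_{\min}(\sigma) + \frac{\alpha}{\alpha-1}\log_2\left\|\rho^{1/2}\sigma^{-1}\rho^{1/2}\right\|_\alpha
&\le \widetilde{D}_\alpha(\rho\|\sigma) \notag\\
&\le \frac{1}{\alpha-1}\log_2\lambda_{\max}(\sigma) + \frac{\alpha}{\alpha-1}\log_2\left\|\rho^{1/2}\sigma^{-1}\rho^{1/2}\right\|_\alpha. \tag{7.8.26}
\end{align}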

Now taking the limit 𝛼 → ∞ and applying the fact that $\lim_{\alpha\to\infty}\|X\|_\alpha = \|X\|_\infty$
(Proposition 2.9), we conclude the equality $\lim_{\alpha\to\infty}\widetilde{D}_\alpha(\rho\|\sigma) = D_{\max}(\rho\|\sigma)$.
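We now consider the geometric Rényi relative entropy. Since we have that
\[
\lambda_{\min}(\sigma)\mathbb{1} \le \sigma \le \lambda_{\max}(\sigma)\mathbb{1}, \tag{7.8.27}
\]
it follows that
\begin{align}
\lambda_{\min}(\sigma)\operatorname{Tr}\!\left[\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right] &\le \operatorname{Tr}\!\left[\sigma\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right] \tag{7.8.28}\\
&\le \lambda_{\max}(\sigma)\operatorname{Tr}\!\left[\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right]. \tag{7.8.29}
\end{align}
Now taking a logarithm, dividing by 𝛼 − 1, and applying definitions, we find that
the following inequalities hold for 𝛼 > 1:
\begin{align}
\frac{1}{\alpha-1}\log_2\lambda_{\min}(\sigma) + \frac{1}{\alpha-1}\log_2\operatorname{Tr}\!\left[\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right]
&\le \widehat{D}_\alpha(\rho\|\sigma) \tag{7.8.30}\\
&\le \frac{1}{\alpha-1}\log_2\lambda_{\max}(\sigma) + \frac{1}{\alpha-1}\log_2\operatorname{Tr}\!\left[\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right]. \tag{7.8.31}
\end{align}
We can rewrite the second term as
\begin{align}
\frac{1}{\alpha-1}\log_2\operatorname{Tr}\!\left[\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right] &= \frac{\alpha}{\alpha-1}\log_2\left(\operatorname{Tr}\!\left[\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right]\right)^{1/\alpha} \tag{7.8.32}\\
&= \frac{\alpha}{\alpha-1}\log_2\left\|\sigma^{-1/2}\rho\,\sigma^{-1/2}\right\|_\alpha. \tag{7.8.33}
\end{align}
Then by applying $\lim_{\alpha\to\infty}\|X\|_\alpha = \|X\|_\infty$, it follows that
\[
\lim_{\alpha\to\infty}\frac{1}{\alpha-1}\log_2\operatorname{Tr}\!\left[\left(\sigma^{-1/2}\rho\,\sigma^{-1/2}\right)^{\alpha}\right] = D_{\max}(\rho\|\sigma). \tag{7.8.34}
\]
Combining this limit with the inequalities in (7.8.30) and (7.8.31), we arrive at the
equality $\lim_{\alpha\to\infty}\widehat{D}_\alpha(\rho\|\sigma) = D_{\max}(\rho\|\sigma)$. ■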
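
The two limits in Proposition 7.61 can also be observed numerically. The sketch below assumes NumPy and full-rank states; all function names are hypothetical. Because large powers 𝛼 can overflow floating-point arithmetic, the traces are evaluated in the log domain: the sandwiched quantity from the eigenvalues of $\rho^{1/2}\sigma^{(1-\alpha)/\alpha}\rho^{1/2}$, and the geometric quantity from the eigendecomposition of $\sigma^{-1/2}\rho\,\sigma^{-1/2}$, using $\operatorname{Tr}[\sigma X^\alpha] = \sum_j x_j^\alpha \langle v_j|\sigma|v_j\rangle$.

```python
import numpy as np

def herm_fun(X, f):
    """Apply the scalar function f to the eigenvalues of a Hermitian matrix X."""
    w, V = np.linalg.eigh(X)
    return (V * f(w)) @ V.conj().T

def random_state(d, rng):
    """Random density matrix, mixed with 1/d so that it is safely full rank."""
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = Z @ Z.conj().T
    return 0.8 * rho / np.trace(rho).real + 0.2 * np.eye(d) / d

def log2_sum_exp2(a):
    """Stable evaluation of log2(sum_i 2**a_i)."""
    m = np.max(a)
    return m + np.log2(np.sum(np.exp2(a - m)))

def sandwiched_renyi(rho, sigma, alpha):
    """Formula (7.8.21), evaluated from eigenvalues in the log domain."""
    r = herm_fun(rho, np.sqrt)
    s = herm_fun(sigma, lambda w: w**((1 - alpha) / alpha))
    e = np.linalg.eigvalsh(r @ s @ r)
    return log2_sum_exp2(alpha * np.log2(e)) / (alpha - 1)

def geometric_renyi(rho, sigma, alpha):
    """(1/(alpha-1)) log2 Tr[sigma X^alpha] with X = sigma^{-1/2} rho sigma^{-1/2}."""
    s = herm_fun(sigma, lambda w: w**-0.5)
    x, V = np.linalg.eigh(s @ rho @ s)
    weights = np.einsum('ij,ik,kj->j', V.conj(), sigma, V).real  # <v_j|sigma|v_j>
    return log2_sum_exp2(alpha * np.log2(x) + np.log2(weights)) / (alpha - 1)

def d_max(rho, sigma):
    """log2 of the spectral norm of sigma^{-1/2} rho sigma^{-1/2}."""
    s = herm_fun(sigma, lambda w: w**-0.5)
    return np.log2(np.linalg.norm(s @ rho @ s, 2))

rng = np.random.default_rng(6)
rho, sigma = random_state(3, rng), random_state(3, rng)
dm = d_max(rho, sigma)
for alpha in [2, 10, 100, 1000]:
    print(alpha, sandwiched_renyi(rho, sigma, alpha), geometric_renyi(rho, sigma, alpha))
print("D_max =", dm)
```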

As a consequence of Proposition 7.61, it is customary to use the notations
\[
\widetilde{D}_\infty(\rho\|\sigma) \equiv D_{\max}(\rho\|\sigma) \equiv \widehat{D}_\infty(\rho\|\sigma) \tag{7.8.35}
\]
to denote the max-relative entropy. It also follows that the max-relative entropy
satisfies the properties of isometric invariance and additivity, as stated in Propo-
sition 7.31. Proposition 7.31 also tells us that the sandwiched Rényi relative
entropy $\widetilde{D}_\alpha$ is monotonically increasing in 𝛼, which means that the max-relative
entropy has the largest value among all sandwiched Rényi relative entropies. Due
to Proposition 7.44, a similar conclusion holds for the geometric Rényi relative
entropies. The max-relative entropy also satisfies all of the properties stated in
Proposition 7.35.
The conditional entropy arising from the max-relative entropy (according to the
general definition in (7.3.12)) is known as the conditional min-entropy:
\[
H_{\min}(A|B)_\rho := \widetilde{H}_\infty(A|B)_\rho = -\inf_{\sigma_B} D_{\max}(\rho_{AB}\|\mathbb{1}_A\otimes\sigma_B) \tag{7.8.36}
\]
for all bipartite states $\rho_{AB}$, where the optimization is over states $\sigma_B$. Since the
max-relative entropy has the largest value among all the sandwiched Rényi relative
entropies, the quantity in (7.8.36) has the smallest value among all conditional
sandwiched Rényi entropies, which is why it is called the conditional min-entropy.
Note that the conditional sandwiched Rényi entropy is defined through (7.3.12),
with the generalized divergence $\boldsymbol{D}$ therein replaced by the sandwiched Rényi relative
entropy $\widetilde{D}_\alpha$, the latter defined in (7.5.2). On the other hand, the quantity
\[
H_{\max}(A|B)_\rho := \widetilde{H}_{1/2}(A|B)_\rho = -\inf_{\sigma_B} \widetilde{D}_{1/2}(\rho_{AB}\|\mathbb{1}_A\otimes\sigma_B) \tag{7.8.37}
\]
is known as the conditional max-entropy of the state $\rho_{AB}$, where the optimization
is over states $\sigma_B$. This name comes from the fact that $\alpha = \frac{1}{2}$ is the smallest value
of 𝛼 for which the sandwiched Rényi relative entropy is known to satisfy the
data-processing inequality (recall Theorem 7.33). Since the sandwiched Rényi
relative entropy $\widetilde{D}_\alpha$ is monotonically increasing in 𝛼, the quantity in (7.8.37) is
known as the conditional max-entropy because it has the largest value among all
conditional sandwiched Rényi entropies for which the data-processing inequality is
known to hold.

Remark: Let $\rho_{XB}$ be a classical–quantum state of the form
\[
\rho_{XB} = \sum_{x\in\mathcal{X}} p(x)\,|x\rangle\!\langle x|_X \otimes \rho_B^x, \tag{7.8.38}
\]
where $\mathcal{X}$ is a finite alphabet, $p : \mathcal{X} \to [0,1]$ is a probability distribution, and $\{\rho_B^x\}_{x\in\mathcal{X}}$ is a set of
states. Using the duality of semi-definite programs (see Section 2.4), as done in (5.3.122), it
follows that
\[
H_{\min}(X|B)_\rho = -\log_2 p^*_{\mathrm{succ}}(\{(p(x), \rho_B^x)\}_x), \tag{7.8.39}
\]
where $p^*_{\mathrm{succ}}(\{(p(x), \rho_B^x)\}_x)$, defined in (5.3.119), is the optimal success probability for multiple
state discrimination. The conditional min-entropy of a classical–quantum state thus has
an operational interpretation in terms of the optimal success probability for multiple state
discrimination.


7.8.1 Smooth Max-Relative Entropy

For the analysis of lower bounds on quantum and private communication rates,
we require the smooth max-relative entropy, which is an example of a smooth
generalized divergence. A smooth generalized divergence, denoted by $\boldsymbol{D}^\varepsilon(\rho\|\sigma)$, is
defined by taking a generalized divergence $\boldsymbol{D}(\rho\|\sigma)$, for a given state 𝜌 and positive
semi-definite operator 𝜎, and optimizing the quantity $\boldsymbol{D}(\widetilde{\rho}\|\sigma)$ over states $\widetilde{\rho}$ that
are within a distance 𝜀 from the given state 𝜌. Specifically, it is defined as follows:
\[
\boldsymbol{D}^\varepsilon(\rho\|\sigma) := \inf_{\widetilde{\rho}\in\mathcal{B}^\varepsilon(\rho)} \boldsymbol{D}(\widetilde{\rho}\|\sigma), \tag{7.8.40}
\]
where
\[
\mathcal{B}^\varepsilon(\rho) := \{\tau : \tau \ge 0,\ \operatorname{Tr}[\tau] = 1,\ P(\rho,\tau) \le \varepsilon\} \tag{7.8.41}
\]
is the set of states 𝜏 that are 𝜀-close to 𝜌 in terms of the sine distance (Definition 6.16).

Definition 7.62 Smooth Max-Relative Entropy

Let 𝜌 be a state and 𝜎 a positive semi-definite operator. Then, the 𝜀-smooth
max-relative entropy, for 𝜀 ∈ [0, 1), is defined as
\[
D^\varepsilon_{\max}(\rho\|\sigma) := \inf_{\widetilde{\rho}\in\mathcal{B}^\varepsilon(\rho)} D_{\max}(\widetilde{\rho}\|\sigma). \tag{7.8.42}
\]

Just like the max-relative entropy, the smooth max-relative entropy is a general-
ized divergence, satisfying the data-processing inequality:

Proposition 7.63 Data-Processing Inequality for Smooth Max-Relative Entropy
Let 𝜌 be a state, 𝜎 a positive semi-definite operator, and $\mathcal{N}$ a quantum channel.
The smooth max-relative entropy obeys the following data-processing inequality
for all 𝜀 ∈ (0, 1):
\[
D^\varepsilon_{\max}(\rho\|\sigma) \ge D^\varepsilon_{\max}(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)). \tag{7.8.43}
\]

Proof: To see this, let $\widetilde{\rho}$ be an arbitrary state such that
\[
P(\widetilde{\rho}, \rho) \le \varepsilon. \tag{7.8.44}
\]
Then from the data-processing inequality for the sine distance under positive
trace-preserving maps (see (6.2.114)), it follows that
\[
P(\mathcal{N}(\widetilde{\rho}), \mathcal{N}(\rho)) \le \varepsilon. \tag{7.8.45}
\]
So it follows that
\begin{align}
D_{\max}(\widetilde{\rho}\|\sigma) &\ge D_{\max}(\mathcal{N}(\widetilde{\rho})\|\mathcal{N}(\sigma)) \tag{7.8.46}\\
&\ge D^\varepsilon_{\max}(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)). \tag{7.8.47}
\end{align}
Since the inequality holds for an arbitrary state $\widetilde{\rho}$ satisfying (7.8.44), we conclude
(7.8.43). ■

Remark: The proof given above holds more generally when N is a positive, trace-preserving
map, so that (7.8.43) holds in this more general case.

The smooth max-relative entropy can be related to the sandwiched Rényi relative
entropy as follows:

Proposition 7.64 Smooth Max- to Sandwiched Rényi Relative Entropy

Let 𝜌 be a state, 𝜎 a positive semi-definite operator, 𝛼 ∈ (1, ∞), and 𝜀 ∈ (0, 1).
Then,

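\[
D^\varepsilon_{\max}(\rho\|\sigma) \le \widetilde{D}_\alpha(\rho\|\sigma) + \frac{1}{\alpha-1}\log_2\!\left(\frac{1}{\varepsilon^2}\right) + \log_2\!\left(\frac{1}{1-\varepsilon^2}\right). \tag{7.8.48}
\]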

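Proof: The statement is trivially true if 𝜌 = 𝜎 or if supp(𝜌) ⊈ supp(𝜎). In the
former case, $D^\varepsilon_{\max}(\rho\|\sigma) = \widetilde{D}_\alpha(\rho\|\sigma) = 0$, and in the latter, $\widetilde{D}_\alpha(\rho\|\sigma) = +\infty$.
So going forward, we assume that 𝜌 ≠ 𝜎 and supp(𝜌) ⊆ supp(𝜎). As
mentioned in (7.8.6), the SDP dual of $D_{\max}(\tau\|\omega)$ is given by
\[
D_{\max}(\tau\|\omega) = \log_2\sup_{\Lambda\ge 0}\{\operatorname{Tr}[\Lambda\tau] : \operatorname{Tr}[\Lambda\omega] \le 1\}, \tag{7.8.49}
\]
implying that
\[
D^\varepsilon_{\max}(\rho\|\sigma) = \log_2\inf_{\widetilde{\rho}\,:\,P(\widetilde{\rho},\rho)\le\varepsilon}\ \sup_{\Lambda\ge 0,\ \operatorname{Tr}[\Lambda\sigma]\le 1}\operatorname{Tr}[\Lambda\widetilde{\rho}]. \tag{7.8.50}
\]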

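Since the objective function $\operatorname{Tr}[\Lambda\widetilde{\rho}]$ is linear in Λ and $\widetilde{\rho}$, the set {Λ : Λ ≥
0, Tr[Λ𝜎] ≤ 1} is compact and convex, and the set
\[
\{\widetilde{\rho} : P(\widetilde{\rho}, \rho) \le \varepsilon,\ \widetilde{\rho} \ge 0,\ \operatorname{Tr}[\widetilde{\rho}] = 1\} \tag{7.8.51}
\]
is compact and convex (due to convexity of sine distance), the minimax theorem
(Theorem 2.24) applies and we find that
\[
D^\varepsilon_{\max}(\rho\|\sigma) = \log_2\sup_{\Lambda\ge 0,\ \operatorname{Tr}[\Lambda\sigma]\le 1}\ \inf_{\widetilde{\rho}\,:\,P(\widetilde{\rho},\rho)\le\varepsilon}\operatorname{Tr}[\Lambda\widetilde{\rho}]. \tag{7.8.52}
\]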

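For a fixed operator Λ ≥ 0 with spectral decomposition
\[
\Lambda = \sum_i \lambda_i\,|\phi_i\rangle\!\langle\phi_i|, \tag{7.8.53}
\]
let us define the following set, for a choice of 𝜆 > 0 to be specified later:
\[
\mathcal{S} := \left\{i : \langle\phi_i|\rho|\phi_i\rangle > 2^\lambda\langle\phi_i|\sigma|\phi_i\rangle\right\}. \tag{7.8.54}
\]
Let
\[
\Pi := \sum_{i\in\mathcal{S}} |\phi_i\rangle\!\langle\phi_i|. \tag{7.8.55}
\]
Then from the definition, we find that
\[
\operatorname{Tr}[\Pi\rho] > 2^\lambda\operatorname{Tr}[\Pi\sigma], \tag{7.8.56}
\]
which implies that
\[
\frac{\operatorname{Tr}[\Pi\rho]}{\operatorname{Tr}[\Pi\sigma]} > 2^\lambda. \tag{7.8.57}
\]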
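Now consider from the data-processing inequality under the channel
\[
\Delta(\omega) := \operatorname{Tr}[\Pi\omega]\,|0\rangle\!\langle 0| + \operatorname{Tr}[\hat{\Pi}\omega]\,|1\rangle\!\langle 1|, \tag{7.8.58}
\]
with $\hat{\Pi} := \mathbb{1} - \Pi$, that
\begin{align}
\widetilde{D}_\alpha(\rho\|\sigma) &\ge \widetilde{D}_\alpha(\Delta(\rho)\|\Delta(\sigma)) \tag{7.8.59}\\
&= \frac{1}{\alpha-1}\log_2\!\left((\operatorname{Tr}[\Pi\rho])^\alpha(\operatorname{Tr}[\Pi\sigma])^{1-\alpha} + (\operatorname{Tr}[\hat{\Pi}\rho])^\alpha(\operatorname{Tr}[\hat{\Pi}\sigma])^{1-\alpha}\right) \tag{7.8.60}\\
&\ge \frac{1}{\alpha-1}\log_2\!\left((\operatorname{Tr}[\Pi\rho])^\alpha(\operatorname{Tr}[\Pi\sigma])^{1-\alpha}\right) \tag{7.8.61}\\
&= \frac{1}{\alpha-1}\log_2\!\left(\operatorname{Tr}[\Pi\rho]\left(\frac{\operatorname{Tr}[\Pi\rho]}{\operatorname{Tr}[\Pi\sigma]}\right)^{\alpha-1}\right) \tag{7.8.62}\\
&= \frac{1}{\alpha-1}\log_2(\operatorname{Tr}[\Pi\rho]) + \log_2\!\left(\frac{\operatorname{Tr}[\Pi\rho]}{\operatorname{Tr}[\Pi\sigma]}\right) \tag{7.8.63}\\
&\ge \frac{1}{\alpha-1}\log_2(\operatorname{Tr}[\Pi\rho]) + \lambda. \tag{7.8.64}
\end{align}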
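Now picking
\[
\lambda = \widetilde{D}_\alpha(\rho\|\sigma) + \frac{1}{\alpha-1}\log_2\!\left(\frac{1}{\varepsilon^2}\right), \tag{7.8.65}
\]
we conclude that
\[
\operatorname{Tr}[\Pi\rho] \le \varepsilon^2. \tag{7.8.66}
\]
Defining $\hat{\Pi} := \mathbb{1} - \Pi$, this means that
\[
\operatorname{Tr}[\hat{\Pi}\rho] \ge 1 - \varepsilon^2. \tag{7.8.67}
\]
Thus, the state
\[
\rho' := \frac{\hat{\Pi}\rho\hat{\Pi}}{\operatorname{Tr}[\hat{\Pi}\rho]} \tag{7.8.68}
\]
is such that
\[
F(\rho, \rho') \ge 1 - \varepsilon^2, \tag{7.8.69}
\]
by applying Lemma 6.15, and in turn that
\[
P(\rho, \rho') \le \varepsilon. \tag{7.8.70}
\]
We also have that
\[
\rho' \le \frac{\hat{\Pi}\rho\hat{\Pi}}{1 - \varepsilon^2}. \tag{7.8.71}
\]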
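Now let Λ be an arbitrary operator satisfying Λ ≥ 0 and Tr[Λ𝜎] ≤ 1, and let Π
be the projection defined in (7.8.55) for this choice of Λ. Then we find that
\begin{align}
\left(1 - \varepsilon^2\right)\operatorname{Tr}[\Lambda\rho'] &\le \operatorname{Tr}[\Lambda\hat{\Pi}\rho\hat{\Pi}] \tag{7.8.72}\\
&= \operatorname{Tr}[\hat{\Pi}\Lambda\hat{\Pi}\rho] \tag{7.8.73}\\
&= \sum_{i\notin\mathcal{S}} \lambda_i\langle\phi_i|\rho|\phi_i\rangle \tag{7.8.74}\\
&\le 2^\lambda\sum_{i\notin\mathcal{S}} \lambda_i\langle\phi_i|\sigma|\phi_i\rangle \tag{7.8.75}\\
&\le 2^\lambda\operatorname{Tr}[\Lambda\sigma] \tag{7.8.76}\\
&\le 2^\lambda. \tag{7.8.77}
\end{align}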
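Thus, we have found the following uniform bound for any operator Λ satisfying
Λ ≥ 0 and Tr[Λ𝜎] ≤ 1, with 𝜌′ the state in (7.8.68) depending on Λ and satisfying
𝑃(𝜌, 𝜌′) ≤ 𝜀:
\[
\operatorname{Tr}[\Lambda\rho'] \le 2^{\lambda + \log_2\left(\frac{1}{1-\varepsilon^2}\right)}. \tag{7.8.78}
\]
Then it follows that
\begin{align}
D^\varepsilon_{\max}(\rho\|\sigma) &= \log_2\sup_{\Lambda\ge 0,\ \operatorname{Tr}[\Lambda\sigma]\le 1}\ \inf_{\widetilde{\rho}\,:\,P(\widetilde{\rho},\rho)\le\varepsilon}\operatorname{Tr}[\Lambda\widetilde{\rho}] \tag{7.8.79}\\
&\le \log_2\sup_{\Lambda\ge 0,\ \operatorname{Tr}[\Lambda\sigma]\le 1}\operatorname{Tr}[\Lambda\rho'] \tag{7.8.80}\\
&\le \lambda + \log_2\!\left(\frac{1}{1-\varepsilon^2}\right). \tag{7.8.81}
\end{align}
This concludes the proof. ■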

A quantity of interest is the smooth conditional min-entropy, which is a
conditional entropy that we define via the general construction of conditional
entropies in (7.3.12). For every bipartite state $\rho_{AB}$, and every 𝜀 ∈ [0, 1), we define
it as
\[
H^\varepsilon_{\min}(A|B)_\rho := -\inf_{\sigma_B} D^\varepsilon_{\max}(\rho_{AB}\|\mathbb{1}_A\otimes\sigma_B), \tag{7.8.82}
\]
where we take the infimum over states $\sigma_B$.

Using the definition of the smooth conditional min-entropy in (7.8.82) and
applying Proposition 7.64, we conclude that
\[
H^\varepsilon_{\min}(A|B)_\rho \ge \widetilde{H}_\alpha(A|B)_\rho - \frac{1}{\alpha-1}\log_2\!\left(\frac{1}{\varepsilon^2}\right) - \log_2\!\left(\frac{1}{1-\varepsilon^2}\right), \tag{7.8.83}
\]
for all 𝛼 > 1 and 𝜀 ∈ (0, 1). Note that the conditional sandwiched Rényi entropy
$\widetilde{H}_\alpha(A|B)_\rho$ is defined through (7.3.12), with the generalized divergence $\boldsymbol{D}$ therein
replaced by the sandwiched Rényi relative entropy $\widetilde{D}_\alpha$, the latter defined in (7.5.2).

7.9 Hypothesis Testing Relative Entropy

We now explore another important generalized divergence, called the hypothesis
testing relative entropy. This particular entropy is defined to be the optimal value
of an operationally defined problem in the context of quantum hypothesis testing.
As such, it is debatable as to whether such a quantity should be given the name
“entropy.” However, our perspective is that the advantages of doing so far outweigh
this semantic point about nomenclature, and so we adopt this perspective here and
throughout the book. At the most fundamental level, the hypothesis testing relative
entropy obeys the quantum data-processing inequality, and for this reason and
others, it is useful for characterizing the optimal limits of various communication
protocols.

Definition 7.65 𝜺-Hypothesis Testing Relative Entropy

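Given a state 𝜌, a positive semi-definite operator 𝜎, and 𝜀 ∈ [0, 1], the
𝜀-hypothesis testing relative entropy is defined as
\[
D^\varepsilon_H(\rho\|\sigma) := -\log_2\beta_\varepsilon(\rho\|\sigma), \tag{7.9.1}
\]
where
\[
\beta_\varepsilon(\rho\|\sigma) := \inf_\Lambda\{\operatorname{Tr}[\Lambda\sigma] : 0 \le \Lambda \le \mathbb{1},\ \operatorname{Tr}[\Lambda\rho] \ge 1 - \varepsilon\}. \tag{7.9.2}
\]

Observe that $D^\varepsilon_H(\rho\|\sigma)$ can be written as
\[
D^\varepsilon_H(\rho\|\sigma) = \sup_\Lambda\{-\log_2\operatorname{Tr}[\Lambda\sigma] : 0 \le \Lambda \le \mathbb{1},\ \operatorname{Tr}[\Lambda\rho] = 1 - \varepsilon\}. \tag{7.9.3}
\]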

That is, the monotonicity of the log₂ function allows us to bring − log₂ inside
the minimization in the definition of $\beta_\varepsilon(\rho\|\sigma)$ (turning the infimum into a supremum), and it suffices to optimize over
measurement operators that meet the constraint Tr[Λ𝜌] ≥ 1 − 𝜀 with equality. This
follows because for every measurement operator Λ such that Tr[Λ𝜌] > 1 − 𝜀, we can
modify it by scaling it by a number 𝜆 ∈ [0, 1) such that Tr[(𝜆Λ)𝜌] = 1 − 𝜀.
The new operator 𝜆Λ is a legitimate measurement operator and the error probability
Tr[(𝜆Λ)𝜎] does not increase under this scaling (i.e., Tr[(𝜆Λ)𝜎] ≤ Tr[Λ𝜎]), which
allows us to conclude (7.9.3).
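
For intuition, consider the special case in which 𝜌 and 𝜎 commute, so that they are described by a probability vector p and a non-negative vector q in a common eigenbasis. Under the assumption (not argued in the text above) that the test operator Λ can then be taken diagonal in that basis, the optimization in (7.9.2) becomes a small linear program that a greedy likelihood-ratio rule solves: accept outcomes in increasing order of q(x)/p(x) until the acceptance probability under p reaches 1 − 𝜀, taking only a fraction of the last outcome. The function name `beta_eps_classical` and the example numbers below are hypothetical; the sketch assumes NumPy.

```python
import numpy as np

def beta_eps_classical(p, q, eps):
    """Greedy solution of min sum_x t(x) q(x) s.t. 0 <= t <= 1, sum_x t(x) p(x) >= 1 - eps."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    need = 1.0 - eps                      # acceptance probability still required under p
    support = np.flatnonzero(p > 0)       # outcomes with p(x) = 0 never help
    order = support[np.argsort(q[support] / p[support])]
    beta = 0.0
    for x in order:
        if need <= 0:
            break
        t = min(1.0, need / p[x])         # fraction of outcome x that is accepted
        beta += t * q[x]
        need -= t * p[x]
    return beta

p = [0.5, 0.5]
q = [0.9, 0.1]
for eps in [0.0, 0.25, 0.5]:
    beta = beta_eps_classical(p, q, eps)
    print(f"eps = {eps}: beta = {beta:.4f}, D_H = {-np.log2(beta):.4f}")
```

In this example, 𝜀 = 0.25 requires acceptance probability 0.75 under p: the second outcome is accepted fully (cost 0.1) and half of the first outcome is added (cost 0.45), giving 𝛽 = 0.55. The fractional last step plays the same role as the scaling argument above, since the constraint is met with equality.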
The hypothesis testing relative entropy can be computed using a semi-definite
program, as indicated in the following proposition:

Proposition 7.66 Hypothesis Testing Relative Entropy as an SDP

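For every state 𝜌, positive semi-definite operator 𝜎, and 𝜀 ∈ [0, 1], the
𝜀-hypothesis testing relative entropy can be expressed as the following SDPs:
\begin{align}
D^\varepsilon_H(\rho\|\sigma) &= -\log_2\inf_{\Lambda\ge 0}\{\operatorname{Tr}[\Lambda\sigma] : \Lambda \le \mathbb{1},\ \operatorname{Tr}[\Lambda\rho] \ge 1 - \varepsilon\} \tag{7.9.4}\\
&= -\log_2\sup_{\mu\ge 0,\,Z\ge 0}\{\mu(1-\varepsilon) - \operatorname{Tr}[Z] : \mu\rho \le \sigma + Z\}. \tag{7.9.5}
\end{align}
Complementary slackness implies that the following equalities hold for optimal
Λ, 𝜇, and 𝑍:
\[
\Lambda(\sigma + Z) = \mu\Lambda\rho, \qquad \operatorname{Tr}[\Lambda\rho]\,\mu = (1-\varepsilon)\mu, \qquad \Lambda Z = Z. \tag{7.9.6}
\]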

Proof: The primal formulation in (7.9.4) is immediate from Definition 7.65.

Indeed, considering the standard form in (2.4.4), we see that we can set

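\[ B = \sigma, \quad Y = \Lambda, \quad \Phi^{\dagger}(Y) = \begin{pmatrix} \operatorname{Tr}[\Lambda\rho] & 0 \\ 0 & -\Lambda \end{pmatrix}, \quad A = \begin{pmatrix} 1-\varepsilon & 0 \\ 0 & -\mathbb{1} \end{pmatrix}. \tag{7.9.7} \]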

To figure out the dual and having already identified 𝐴, 𝐵, and Φ† , we need to
determine the map Φ and plug into the standard form in (2.4.3). Letting

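\[ X := \begin{pmatrix} \mu & 0 \\ 0 & Z \end{pmatrix}, \tag{7.9.8} \]
we find that
\begin{align}
\operatorname{Tr}[\Phi^{\dagger}(Y) X] &= \operatorname{Tr}\!\left[ \begin{pmatrix} \operatorname{Tr}[\Lambda\rho] & 0 \\ 0 & -\Lambda \end{pmatrix} \begin{pmatrix} \mu & 0 \\ 0 & Z \end{pmatrix} \right] \tag{7.9.9} \\
&= \mu \operatorname{Tr}[\Lambda\rho] - \operatorname{Tr}[\Lambda Z] \tag{7.9.10} \\
&= \operatorname{Tr}[\Lambda(\mu\rho - Z)], \tag{7.9.11}
\end{align}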

which implies that

Φ(𝑋) = 𝜇𝜌 − 𝑍. (7.9.12)
Now substituting into (2.4.3) and simplifying, we conclude that the right-hand side
of (7.9.5) is the dual SDP.

To show that this is equal to 𝐷 𝜀𝐻 (𝜌∥𝜎), we should demonstrate that the primal
and dual SDPs satisfy the strong duality property. It is clear that Λ = 1 is a feasible
point for the primal SDP. Furthermore, the choices 𝜇 = 1 and 𝑍 = 1 + [𝜎 − 𝜌] + ,
where [𝜎 − 𝜌] + is the positive part of 𝜎 − 𝜌, are strictly feasible for the dual. Thus,
we conclude (7.9.5) by applying Theorem 2.28.
The complementary slackness conditions in (7.9.6) follow directly from Propo-
sition 2.29. ■
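
As a sanity check on Proposition 7.66, the following minimal sketch evaluates both the primal (7.9.4) and the dual (7.9.5) in the special case where 𝜌 and 𝜎 commute and are therefore given by diagonal entries. That restriction is an assumption made here so that the SDP reduces to a linear program solvable by hand; the function names `beta_primal` and `beta_dual` and the example numbers are hypothetical. In the dual, 𝑍 is taken to be the positive part of 𝜇𝜌 − 𝜎, which is the cheapest feasible choice for a fixed 𝜇.

```python
import math

def beta_primal(p, q, eps):
    """Primal (7.9.4) for diagonal rho=diag(p), sigma=diag(q):
    min sum_i q[i]*t[i] subject to 0 <= t[i] <= 1, sum_i p[i]*t[i] >= 1 - eps."""
    need = 1.0 - eps  # mass of p that the test must still accept
    beta = 0.0
    items = [(pi, qi) for pi, qi in zip(p, q) if pi > 0]  # p[i] = 0 never helps
    # Accept outcomes in order of increasing q/p (cheapest type-II cost first).
    for pi, qi in sorted(items, key=lambda item: item[1] / item[0]):
        if need <= 0.0:
            break
        t = min(1.0, need / pi)  # fractional acceptance at the threshold
        beta += t * qi
        need -= t * pi
    return beta

def beta_dual(p, q, eps):
    """Dual (7.9.5) with Z = [mu*rho - sigma]_+ for diagonal rho, sigma."""
    def g(mu):
        return mu * (1.0 - eps) - sum(max(mu * pi - qi, 0.0) for pi, qi in zip(p, q))
    # g is concave and piecewise linear in mu, so its maximum over mu >= 0
    # is attained at mu = 0 or at one of the breakpoints q[i]/p[i].
    candidates = [0.0] + [qi / pi for pi, qi in zip(p, q) if pi > 0]
    return max(g(mu) for mu in candidates)

p = [0.5, 0.3, 0.2]   # eigenvalues of rho (hypothetical example)
q = [0.1, 0.3, 0.6]   # eigenvalues of sigma (hypothetical example)
eps = 0.1
primal = beta_primal(p, q, eps)
dual = beta_dual(p, q, eps)
print("primal:", primal, "dual:", dual, "D_H:", -math.log2(primal))
```

The two values agreeing is the strong-duality statement of the proposition, specialized to this commuting example.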

Proposition 7.67 Optimal Measurement for Hypothesis Testing Relative Entropy

For every state 𝜌, positive semi-definite operator 𝜎, and 𝜀 ∈ [0, 1], the 𝜀-
hypothesis testing relative entropy 𝐷 𝜀𝐻 (𝜌∥𝜎) is achieved by the following
measurement operator:

\[ \Lambda(\mu^{*}, p^{*}) := \Pi_{\mu^{*}\rho > \sigma} + p^{*}\, \Pi_{\mu^{*}\rho = \sigma}, \tag{7.9.13} \]

where Π 𝜇∗ 𝜌>𝜎 is the projection onto the strictly positive part of 𝜇∗ 𝜌 − 𝜎, the
projection Π 𝜇∗ 𝜌=𝜎 projects onto the zero eigenspace of 𝜇∗ 𝜌 − 𝜎, and 𝜇∗ ≥ 0
and 𝑝 ∗ ∈ [0, 1] are chosen as follows:

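\begin{align}
\mu^{*} &:= \sup\{ \mu : \operatorname{Tr}[\Pi_{\mu\rho > \sigma}\,\rho] \leq 1-\varepsilon \}, \tag{7.9.14} \\
p^{*} &:= \frac{1-\varepsilon-\operatorname{Tr}[\Pi_{\mu^{*}\rho > \sigma}\,\rho]}{\operatorname{Tr}[\Pi_{\mu^{*}\rho = \sigma}\,\rho]}. \tag{7.9.15}
\end{align}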

Proof: To find the form of an optimal measurement operator for the hypothesis
testing relative entropy, let Λ be a measurement operator satisfying Tr[Λ𝜌] = 1 − 𝜀
and let 𝜇 ≥ 0. Then

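\begin{align}
\operatorname{Tr}[\Lambda\sigma] &= \operatorname{Tr}[\Lambda\sigma] + \mu\,(1-\varepsilon-\operatorname{Tr}[\Lambda\rho]) \tag{7.9.16} \\
&= -\mu\varepsilon + \operatorname{Tr}[(I-\Lambda)\mu\rho] + \operatorname{Tr}[\Lambda\sigma] \tag{7.9.17} \\
&\geq -\mu\varepsilon + \frac{1}{2}\left( \operatorname{Tr}[\mu\rho+\sigma] - \|\mu\rho-\sigma\|_1 \right) \tag{7.9.18} \\
&= -\mu\varepsilon + \frac{1}{2}\left( \mu + \operatorname{Tr}[\sigma] - \|\mu\rho-\sigma\|_1 \right). \tag{7.9.19}
\end{align}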
The sole inequality follows as an application of Theorem 5.3, with 𝐵 = 𝜎 and
𝐴 = 𝜇𝜌. Observe that the final expression is a universal bound independent of
Λ. To determine an optimal measurement operator, we can look to Theorem 5.3.
There, it was established that the following measurement operator is an optimal
one for inf_{Λ : 0 ≤ Λ ≤ 𝐼} {Tr[(𝐼 − Λ)𝜇𝜌] + Tr[Λ𝜎]}:

\[ \Lambda(\mu, p) := \Pi_{\mu\rho > \sigma} + p\, \Pi_{\mu\rho = \sigma}, \tag{7.9.20} \]

where Π 𝜇𝜌>𝜎 is the projection onto the strictly positive part of 𝜇𝜌 − 𝜎, the
projection Π 𝜇𝜌=𝜎 projects onto the zero eigenspace of 𝜇𝜌 − 𝜎, and 𝑝 ∈ [0, 1]. The
measurement operator Λ(𝜇, 𝑝) is called a quantum Neyman–Pearson test. We still
need to choose the parameters 𝜇 ≥ 0 and 𝑝 ∈ [0, 1]. Let us pick 𝜇 according to the
following optimization:

\[ \mu^{*} := \sup\{ \mu : \operatorname{Tr}[\Pi_{\mu\rho > \sigma}\,\rho] \leq 1-\varepsilon \}. \tag{7.9.21} \]

If it so happens that 𝜇∗ is such that Tr[Π 𝜇∗ 𝜌>𝜎 𝜌] = 1 − 𝜀, then we are done; we
can pick 𝑝 = 0. However, if 𝜇∗ is such that Tr[Π 𝜇∗ 𝜌>𝜎 𝜌] < 1 − 𝜀, then we pick
𝑝 ∗ ∈ [0, 1] such that
\[ p^{*} := \frac{1-\varepsilon-\operatorname{Tr}[\Pi_{\mu^{*}\rho > \sigma}\,\rho]}{\operatorname{Tr}[\Pi_{\mu^{*}\rho = \sigma}\,\rho]}, \tag{7.9.22} \]
with it following that 𝑝 ∗ ∈ [0, 1] because

Tr[Π 𝜇∗ 𝜌>𝜎 𝜌] < 1 − 𝜀 ≤ Tr[Π 𝜇∗ 𝜌≥𝜎 𝜌]. (7.9.23)

With these choices, we then find that

Tr[Λ(𝜇∗ , 𝑝 ∗ ) 𝜌] = 1 − 𝜀. (7.9.24)

By the analysis in (7.9.16)–(7.9.19), it then follows that

Tr[Λ𝜎] ≥ Tr[Λ(𝜇∗ , 𝑝 ∗ )𝜎] (7.9.25)

for all measurement operators Λ satisfying 0 ≤ Λ ≤ 𝐼 and Tr[Λ𝜌] = 1 − 𝜀. ■
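
The construction of 𝜇∗ and 𝑝∗ can be illustrated in the commuting case, where 𝜌 and 𝜎 are diagonal and the quantum Neyman–Pearson test becomes a randomized likelihood-ratio threshold test. The sketch below is written under that assumption; `neyman_pearson` and the example distributions are hypothetical names and numbers, and 𝜀 is assumed to lie in (0, 1). One simplification is labeled in the code: when the accepted mass hits 1 − 𝜀 exactly at a breakpoint, the function returns that breakpoint with 𝑝∗ = 1, which yields the same test operator as the choice in (7.9.14) with 𝑝∗ = 0.

```python
import math

def neyman_pearson(p, q, eps, tol=1e-12):
    """Randomized threshold test for rho=diag(p), sigma=diag(q), eps in (0, 1).
    Returns (mu_star, p_star, test) with test[i] the weight of outcome i."""
    target = 1.0 - eps
    # mu*p[i] > q[i] is the same as q[i]/p[i] < mu when p[i] > 0;
    # outcomes with p[i] = 0 carry no rho-mass and are given weight 0 here.
    ratio = [qi / pi if pi > 0 else math.inf for pi, qi in zip(p, q)]
    for mu in sorted(set(r for r in ratio if r < math.inf)):
        above = sum(pi for pi, r in zip(p, ratio) if r < mu)  # Tr[Pi_{mu rho > sigma} rho]
        tie = sum(pi for pi, r in zip(p, ratio) if r == mu)   # Tr[Pi_{mu rho = sigma} rho]
        if above + tie >= target - tol:
            # Simplification: an exact hit returns p_star = 1 at this breakpoint.
            p_star = min(1.0, max(0.0, (target - above) / tie))
            test = [1.0 if r < mu else (p_star if r == mu else 0.0) for r in ratio]
            return mu, p_star, test
    raise ValueError("expected eps in (0, 1) and p summing to 1")

p = [0.5, 0.3, 0.2]   # eigenvalues of rho (hypothetical example)
q = [0.1, 0.3, 0.6]   # eigenvalues of sigma (hypothetical example)
eps = 0.1
mu_star, p_star, test = neyman_pearson(p, q, eps)
accept_rho = sum(t * pi for t, pi in zip(test, p))    # Tr[Lambda rho]
type_two = sum(t * qi for t, qi in zip(test, q))      # Tr[Lambda sigma]
print(mu_star, p_star, test, accept_rho, type_two)
```

Here `accept_rho` plays the role of (7.9.24), and `type_two` is the value of 𝛽𝜀 achieved by the test in this example.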

Note that the other generalized divergences we have considered so far satisfy
𝑫 (𝜌∥ 𝜌) = 0 for all states 𝜌. The 𝜀-hypothesis testing relative entropy, however,
does not have this property unless 𝜀 = 0. In fact, it is clear from the definition,
along with (7.9.3), that

𝐷 𝜀𝐻 (𝜌∥ 𝜌) = − log2 (1 − 𝜀) (7.9.26)

for all states 𝜌 and 𝜀 ∈ [0, 1].
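
For completeness, the computation behind (7.9.26) can be written out in one step. It uses only the equality-constrained form (7.9.3) and introduces no new notation: when 𝜎 = 𝜌, every feasible Λ has the same objective value, and the feasible set is non-empty because Λ = (1 − 𝜀)𝟙 satisfies the constraints.

```latex
\begin{align}
D_H^{\varepsilon}(\rho\|\rho)
&= \sup_{\Lambda}\{ -\log_2 \operatorname{Tr}[\Lambda\rho] : 0 \leq \Lambda \leq \mathbb{1},\ \operatorname{Tr}[\Lambda\rho] = 1-\varepsilon \} \\
&= -\log_2(1-\varepsilon).
\end{align}
```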


Like the quantum relative entropy, the Petz–Rényi relative entropy, the sandwiched
Rényi relative entropy, and the max-relative entropy, the 𝜀-hypothesis
testing relative entropy is also a generalized divergence, meaning that it satisfies
the data-processing inequality.

Theorem 7.68 Data-Processing Inequality for Hypothesis Testing Relative Entropy

Let 𝜌 be a state, 𝜎 a positive semi-definite operator, and N a quantum channel.
Then, for all 𝜀 ∈ [0, 1],

𝐷 𝜀𝐻 (𝜌∥𝜎) ≥ 𝐷 𝜀𝐻 (N(𝜌)∥N(𝜎)). (7.9.27)

Proof: The intuition for this proof is as follows: A measurement operator Λ

satisfying the constraints 0 ≤ Λ ≤ 1 and Tr[ΛN(𝜌)] ≥ 1 − 𝜀 can be understood as
a particular measurement strategy for distinguishing 𝜌 from 𝜎 in which we first apply
the channel N and then apply the measurement operator Λ. Then this particular
measurement strategy cannot perform better than the optimal measurement strategy
for distinguishing 𝜌 from 𝜎.
We start by writing 𝐷 𝜀𝐻 (N(𝜌)∥N(𝜎)) as in (7.9.1):
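\[ D_H^{\varepsilon}(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)) = \sup_{\Lambda} \{ -\log_2 \operatorname{Tr}[\Lambda\mathcal{N}(\sigma)] : 0 \leq \Lambda \leq \mathbb{1},\ \operatorname{Tr}[\Lambda\mathcal{N}(\rho)] \geq 1-\varepsilon \}. \tag{7.9.28} \]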
Fix Λ such that 0 ≤ Λ ≤ 1 and Tr[ΛN(𝜌)] ≥ 1 − 𝜀. By definition of the adjoint,
we have that
Tr[ΛN(𝜎)] = Tr[N† (Λ)𝜎], Tr[ΛN(𝜌)] = Tr[N† (Λ) 𝜌]. (7.9.29)
Also, note that 0 ≤ N† (Λ) ≤ 1. The leftmost inequality is due to the fact that N†
is a positive map because N is. The rightmost inequality is due to the fact that N†
is subunital because N is trace non-increasing. By the positivity of N† , we obtain
Λ ≤ 1 ⇒ 1 − Λ ≥ 0 ⇒ N† ( 1 − Λ) ≥ 0 ⇒ N† (Λ) ≤ N† ( 1) ≤ 1. (7.9.30)
Since Λ is arbitrary, we bound (7.9.28) as
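\[ D_H^{\varepsilon}(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)) \leq \sup_{\Lambda} \{ -\log_2 \operatorname{Tr}[\mathcal{N}^{\dagger}(\Lambda)\sigma] : 0 \leq \mathcal{N}^{\dagger}(\Lambda) \leq \mathbb{1},\ \operatorname{Tr}[\mathcal{N}^{\dagger}(\Lambda)\rho] \geq 1-\varepsilon \}. \tag{7.9.31} \]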

Now, by enlarging the optimization set from measurement operators N† (Λ) satisfy-
ing 0 ≤ N† (Λ) ≤ 1 and Tr[N† (Λ) 𝜌] ≥ 1 − 𝜀 to all measurement operators, say
Λ′, satisfying 0 ≤ Λ′ ≤ 1 and Tr[Λ′ 𝜌] ≥ 1 − 𝜀, we obtain
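\begin{align}
D_H^{\varepsilon}(\mathcal{N}(\rho)\|\mathcal{N}(\sigma))
&\leq \sup_{\Lambda} \{ -\log_2 \operatorname{Tr}[\mathcal{N}^{\dagger}(\Lambda)\sigma] : 0 \leq \mathcal{N}^{\dagger}(\Lambda) \leq \mathbb{1},\ \operatorname{Tr}[\mathcal{N}^{\dagger}(\Lambda)\rho] \geq 1-\varepsilon \} \tag{7.9.32} \\
&\leq \sup_{\Lambda'} \{ -\log_2 \operatorname{Tr}[\Lambda'\sigma] : 0 \leq \Lambda' \leq \mathbb{1},\ \operatorname{Tr}[\Lambda'\rho] \geq 1-\varepsilon \} \tag{7.9.33} \\
&= D_H^{\varepsilon}(\rho\|\sigma), \tag{7.9.34}
\end{align}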
as required. ■

Remark: Inspection of the proof above reveals that it holds more generally for N a positive,
trace-non-increasing map.
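
The data-processing inequality can be checked numerically in the classical special case, where states are probability vectors and a channel is a column-stochastic matrix. The following sketch assumes that setting; `beta_eps`, `d_h`, `apply_channel`, the matrix `W`, and the distributions are hypothetical illustration choices rather than anything defined in the text.

```python
import math

def beta_eps(p, q, eps):
    """Classical beta_eps: min sum q*t s.t. 0 <= t <= 1, sum p*t >= 1 - eps."""
    need, beta = 1.0 - eps, 0.0
    items = [(pi, qi) for pi, qi in zip(p, q) if pi > 0]
    for pi, qi in sorted(items, key=lambda item: item[1] / item[0]):
        if need <= 0.0:
            break
        t = min(1.0, need / pi)  # fractional acceptance at the threshold
        beta += t * qi
        need -= t * pi
    return beta

def d_h(p, q, eps):
    """Hypothesis testing relative entropy for diagonal rho, sigma."""
    beta = beta_eps(p, q, eps)
    return math.inf if beta == 0.0 else -math.log2(beta)

def apply_channel(W, v):
    """W[y][x] is the probability of output y given input x (columns sum to 1)."""
    return [sum(W[y][x] * v[x] for x in range(len(v))) for y in range(len(W))]

W = [[0.9, 0.5, 0.1],
     [0.1, 0.5, 0.9]]          # a noisy coarse-graining of three outcomes into two
p = [0.5, 0.3, 0.2]
q = [0.1, 0.3, 0.6]
checks = []
for eps in (0.01, 0.1, 0.3, 0.7):
    before = d_h(p, q, eps)
    after = d_h(apply_channel(W, p), apply_channel(W, q), eps)
    checks.append((eps, before, after))
    print(eps, before, after)
```

For each 𝜀, the value after the channel should not exceed the value before it, in line with (7.9.27).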

Proposition 7.69 Properties of Hypothesis Testing Relative Entropy

The 𝜀-hypothesis testing relative entropy satisfies the following properties for
all 𝜀 ∈ [0, 1]:
1. If 𝜀′ ∈ (𝜀, 1], then
\[ D_H^{\varepsilon}(\rho\|\sigma) \leq D_H^{\varepsilon'}(\rho\|\sigma). \tag{7.9.35} \]

2. The following limit holds
\[ \lim_{\varepsilon \to 0} D_H^{\varepsilon}(\rho\|\sigma) = D_0(\rho\|\sigma), \tag{7.9.36} \]
where 𝐷 0 (𝜌∥𝜎) = − log2 Tr[Π 𝜌 𝜎] is the Petz–Rényi relative entropy of
order zero and Π 𝜌 is the projection onto the support of 𝜌.
3. For every state 𝜌 and positive semi-definite operators 𝜎, 𝜎′ such that
𝜎′ ≥ 𝜎, we have that 𝐷 𝜀𝐻 (𝜌∥𝜎) ≥ 𝐷 𝜀𝐻 (𝜌∥𝜎′).
4. For every state 𝜌, positive semi-definite operator 𝜎, and 𝛼 > 0, we have
that 𝐷 𝜀𝐻 (𝜌∥𝛼𝜎) = 𝐷 𝜀𝐻 (𝜌∥𝜎) − log2 𝛼.
5. Let 𝑝, 𝑞 : X → [0, 1] be two probability distributions over a finite alphabet
X with associated |X|-dimensional system 𝑋, let {𝜌 𝑥𝐴 }𝑥∈X be a set of states
on a system 𝐴, and let {𝜎𝐴𝑥 }𝑥∈X be a set of states on system 𝐴. Then,
\[ D_H^{\varepsilon}\!\left( \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes \rho_A^x \,\middle\|\, \sum_{x\in\mathcal{X}} q(x)|x\rangle\langle x|_X \otimes \sigma_A^x \right) \geq \min_{x\in\mathcal{X}} D_H^{\varepsilon}(\rho_A^x\|\sigma_A^x). \tag{7.9.37} \]

Proof:
1. Eq. (7.9.35) follows from Definition 7.65: increasing 𝜀 increases the set of
measurement operators Λ over which we can optimize, and 𝐷 𝜀𝐻 (𝜌∥𝜎) does
not decrease under such a change.
2. Consider that the following inequality holds for all 𝜀 ∈ (0, 1):
𝐷 𝜀𝐻 (𝜌∥𝜎) ≥ 𝐷 0 (𝜌∥𝜎), (7.9.38)
because the measurement operator Π 𝜌 (projection onto support of 𝜌) satisfies
Tr[Π 𝜌 𝜌] ≥ 1 − 𝜀 for all 𝜀 ∈ (0, 1). So we conclude that
\[ \liminf_{\varepsilon \to 0} D_H^{\varepsilon}(\rho\|\sigma) \geq D_0(\rho\|\sigma). \tag{7.9.39} \]

For the reverse inequality, suppose that Λ is a measurement operator satisfying Tr[Λ𝜌] =
1 − 𝜀 (note that when optimizing 𝐷 𝜀𝐻 , it suffices to optimize over measurement
operators satisfying the constraint Tr[Λ𝜌] ≥ 1 − 𝜀 with equality, as mentioned
in (7.9.3)). Then applying the data-processing inequality for 𝐷 𝛼 (𝜌∥𝜎) under
the measurement {Λ, 𝐼 − Λ}, which holds for 𝛼 ∈ (0, 1), we find that
\[ D_{\alpha}(\rho\|\sigma) \geq \frac{1}{\alpha-1} \log_2\!\left[ (1-\varepsilon)^{\alpha} \operatorname{Tr}[\Lambda\sigma]^{1-\alpha} + \varepsilon^{\alpha} (1-\operatorname{Tr}[\Lambda\sigma])^{1-\alpha} \right]. \tag{7.9.40} \]
Since this bound holds for all measurement operators Λ satisfying Tr[Λ𝜌] =
1 − 𝜀, we conclude the following bound for all 𝛼 ∈ (0, 1):

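\[ D_{\alpha}(\rho\|\sigma) \geq \frac{1}{\alpha-1} \log_2\!\left[ (1-\varepsilon)^{\alpha} \left( 2^{-D_H^{\varepsilon}(\rho\|\sigma)} \right)^{1-\alpha} + \varepsilon^{\alpha} \left( 1 - 2^{-D_H^{\varepsilon}(\rho\|\sigma)} \right)^{1-\alpha} \right]. \tag{7.9.41} \]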
Now taking the limit of the right-hand side as 𝜀 → 0, we find that the following
bound holds for all 𝛼 ∈ (0, 1):
\[ D_{\alpha}(\rho\|\sigma) \geq \limsup_{\varepsilon \to 0} D_H^{\varepsilon}(\rho\|\sigma). \tag{7.9.42} \]

Since the bound holds for all 𝛼 ∈ (0, 1), we can take the limit on the left-hand
side to arrive at

\[ \lim_{\alpha \to 0} D_{\alpha}(\rho\|\sigma) = D_0(\rho\|\sigma) \geq \limsup_{\varepsilon \to 0} D_H^{\varepsilon}(\rho\|\sigma). \tag{7.9.43} \]

Now putting together (7.9.39) and (7.9.43), we conclude (7.9.36).

3. Fix an operator Λ satisfying 0 ≤ Λ ≤ 1. The assumption 𝜎′ ≥ 𝜎 implies that
Tr[Λ𝜎′] ≥ Tr[Λ𝜎], which in turn implies that

− log2 Tr[Λ𝜎′] ≤ − log2 Tr[Λ𝜎]. (7.9.44)

Since the operator Λ is arbitrary, we obtain 𝐷 𝜀𝐻 (𝜌∥𝜎) ≥ 𝐷 𝜀𝐻 (𝜌∥𝜎′), as

required.
4. This follows immediately from the fact that

− log2 Tr[Λ(𝛼𝜎)] = − log2 (𝛼Tr[Λ𝜎]) = − log2 Tr[Λ𝜎] − log2 𝛼 (7.9.45)

for all operators Λ satisfying 0 ≤ Λ ≤ 1.

5. Since 𝐷 𝜀𝐻 (𝜌∥𝜎) = − log2 𝛽𝜀 (𝜌∥𝜎), where 𝛽𝜀 (𝜌∥𝜎) is defined in (5.3.125),
we can equivalently show that
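\[ \beta_{\varepsilon}\!\left( \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes \rho_A^x \,\middle\|\, \sum_{x\in\mathcal{X}} q(x)|x\rangle\langle x|_X \otimes \sigma_A^x \right) \leq \max_{x\in\mathcal{X}} \beta_{\varepsilon}(\rho_A^x\|\sigma_A^x). \tag{7.9.46} \]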

Now, in the definition of 𝛽𝜀 on the left-hand side of the inequality above, let
us restrict the infimum over all measurement operators to those of the form
Λ 𝑋 𝐴 = ∑_{𝑥∈X} |𝑥⟩⟨𝑥| 𝑋 ⊗ 𝑀 𝐴𝑥 such that 0 ≤ 𝑀 𝐴𝑥 ≤ 1 𝐴 and Tr[𝑀 𝐴𝑥 𝜌 𝑥𝐴 ] ≥ 1 − 𝜀 for
all 𝑥 ∈ X. Doing this leads to
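\begin{align}
& \beta_{\varepsilon}\!\left( \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes \rho_A^x \,\middle\|\, \sum_{x\in\mathcal{X}} q(x)|x\rangle\langle x|_X \otimes \sigma_A^x \right) \tag{7.9.47} \\
&\leq \inf_{\{M_A^x\}_{x\in\mathcal{X}}} \left\{ \sum_{x\in\mathcal{X}} q(x) \operatorname{Tr}[M_A^x \sigma_A^x] : 0 \leq M_A^x \leq \mathbb{1}_A,\ \operatorname{Tr}[M_A^x \rho_A^x] \geq 1-\varepsilon \ \ \forall x \in \mathcal{X} \right\} \tag{7.9.48} \\
&= \sum_{x\in\mathcal{X}} q(x) \inf_{M_A^x} \left\{ \operatorname{Tr}[M_A^x \sigma_A^x] : 0 \leq M_A^x \leq \mathbb{1}_A,\ \operatorname{Tr}[M_A^x \rho_A^x] \geq 1-\varepsilon \right\} \tag{7.9.49} \\
&= \sum_{x\in\mathcal{X}} q(x)\, \beta_{\varepsilon}(\rho_A^x\|\sigma_A^x) \tag{7.9.50} \\
&\leq \max_{x\in\mathcal{X}} \beta_{\varepsilon}(\rho_A^x\|\sigma_A^x). \tag{7.9.51}
\end{align}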

The last inequality follows because 𝛽𝜀 (𝜌 𝑥𝐴 ∥𝜎𝐴𝑥 ) ≤ max𝑥∈X 𝛽𝜀 (𝜌 𝑥𝐴 ∥𝜎𝐴𝑥 ) for all
𝑥 ∈ X. So we have shown that the inequality (7.9.46) holds, which completes
the proof. ■
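
Properties 1 through 4 of Proposition 7.69 lend themselves to a quick numerical illustration in the commuting case. The sketch below assumes diagonal 𝜌 and 𝜎; the helper names `beta_eps` and `d_h`, the scaling factor `scale`, and all numbers are hypothetical choices made for illustration. The example 𝜌 has a zero eigenvalue so that 𝐷0 differs from the trivial value.

```python
import math

def beta_eps(p, q, eps):
    """Classical beta_eps: min sum q*t s.t. 0 <= t <= 1, sum p*t >= 1 - eps."""
    need, beta = 1.0 - eps, 0.0
    items = [(pi, qi) for pi, qi in zip(p, q) if pi > 0]
    for pi, qi in sorted(items, key=lambda item: item[1] / item[0]):
        if need <= 0.0:
            break
        t = min(1.0, need / pi)
        beta += t * qi
        need -= t * pi
    return beta

def d_h(p, q, eps):
    beta = beta_eps(p, q, eps)
    return math.inf if beta == 0.0 else -math.log2(beta)

p = [0.5, 0.5, 0.0]          # rho with a one-dimensional kernel
q = [0.2, 0.3, 0.5]          # sigma
q_big = [0.3, 0.3, 0.6]      # sigma' >= sigma entrywise
scale = 2.0                  # the factor called alpha in property 4

d0 = -math.log2(sum(qi for pi, qi in zip(p, q) if pi > 0))  # D_0 = -log2 Tr[Pi_rho sigma]
prop1 = (d_h(p, q, 0.05), d_h(p, q, 0.2))                   # monotone in eps
prop2 = d_h(p, q, 1e-9)                                      # close to D_0 for small eps
prop3 = (d_h(p, q, 0.1), d_h(p, q_big, 0.1))                 # anti-monotone in sigma
prop4 = (d_h(p, [scale * qi for qi in q], 0.1), d_h(p, q, 0.1) - math.log2(scale))
print(d0, prop1, prop2, prop3, prop4)
```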

7.9.1 Connection to Quantum Relative Entropy

We now prove a bound relating the 𝜀-hypothesis testing relative entropy to the
quantum relative entropy.

Proposition 7.70 Hypothesis Testing to Quantum Relative Entropy

Fix 𝜀 ∈ [0, 1). Let 𝜌 be a state and 𝜎 a positive semi-definite operator. Then
the following bound relates the 𝜀-hypothesis testing relative entropy to the
quantum relative entropy:
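\[ D_H^{\varepsilon}(\rho\|\sigma) \leq \frac{1}{1-\varepsilon} \left( D(\rho\|\sigma) + h_2(\varepsilon) + \varepsilon \log_2 \operatorname{Tr}[\sigma] \right), \tag{7.9.52} \]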
where ℎ2 (𝜀) is the binary entropy defined in (7.1.4).

Proof: To see this, we use the fact that the optimization in the definition of
𝐷 𝜀𝐻 (𝜌∥𝜎) can be restricted as in (7.9.3), i.e.,
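\[ D_H^{\varepsilon}(\rho\|\sigma) = \sup_{\Lambda} \{ -\log_2 \operatorname{Tr}[\Lambda\sigma] : 0 \leq \Lambda \leq \mathbb{1},\ \operatorname{Tr}[\Lambda\rho] = 1-\varepsilon \}. \tag{7.9.53} \]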

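For every measurement operator Λ such that Tr[Λ𝜌] = 1 − 𝜀, the data-processing
inequality for the quantum relative entropy (Theorem 7.4) implies that
\begin{align}
D(\rho\|\sigma) &\geq D(\{1-\varepsilon, \varepsilon\} \| \{\operatorname{Tr}[\Lambda\sigma], \operatorname{Tr}[\sigma]-\operatorname{Tr}[\Lambda\sigma]\}) \notag \\
&= (1-\varepsilon) \log_2\!\left( \frac{1-\varepsilon}{\operatorname{Tr}[\Lambda\sigma]} \right) + \varepsilon \log_2\!\left( \frac{\varepsilon}{\operatorname{Tr}[\sigma]-\operatorname{Tr}[\Lambda\sigma]} \right) \tag{7.9.54} \\
&= (1-\varepsilon) \log_2(1-\varepsilon) - (1-\varepsilon) \log_2(\operatorname{Tr}[\Lambda\sigma]) \notag \\
&\quad + \varepsilon \log_2(\varepsilon) + \varepsilon \log_2\!\left( \frac{1}{\operatorname{Tr}[\sigma]\,(1-\operatorname{Tr}[\Lambda\sigma/\operatorname{Tr}[\sigma]])} \right) \tag{7.9.55} \\
&= -(1-\varepsilon) \log_2 \operatorname{Tr}[\Lambda\sigma] - h_2(\varepsilon) \notag \\
&\quad + \varepsilon \log_2\!\left( \frac{1}{1-\operatorname{Tr}[\Lambda\sigma/\operatorname{Tr}[\sigma]]} \right) - \varepsilon \log_2 \operatorname{Tr}[\sigma] \tag{7.9.56} \\
&\geq -(1-\varepsilon) \log_2 \operatorname{Tr}[\Lambda\sigma] - h_2(\varepsilon) - \varepsilon \log_2 \operatorname{Tr}[\sigma], \tag{7.9.57}
\end{align}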

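where the inequality holds because 𝜀 log2 (1/(1 − Tr[Λ𝜎/Tr[𝜎]])) ≥ 0. Rewriting this gives
\[ -\log_2 \operatorname{Tr}[\Lambda\sigma] \leq \frac{1}{1-\varepsilon} \left( D(\rho\|\sigma) + h_2(\varepsilon) + \varepsilon \log_2 \operatorname{Tr}[\sigma] \right). \tag{7.9.58} \]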
Since this bound holds for all measurement operators Λ satisfying Tr[Λ𝜌] = 1 − 𝜀,
we conclude (7.9.52). ■
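
The bound (7.9.52) can be spot-checked for commuting 𝜌 and 𝜎, including an unnormalized 𝜎 so that the term 𝜀 log2 Tr[𝜎] is exercised. This is a sketch under the assumption of diagonal operators with supp(𝜌) ⊆ supp(𝜎); the function names and the numbers are hypothetical.

```python
import math

def beta_eps(p, q, eps):
    """Classical beta_eps: min sum q*t s.t. 0 <= t <= 1, sum p*t >= 1 - eps."""
    need, beta = 1.0 - eps, 0.0
    items = [(pi, qi) for pi, qi in zip(p, q) if pi > 0]
    for pi, qi in sorted(items, key=lambda item: item[1] / item[0]):
        if need <= 0.0:
            break
        t = min(1.0, need / pi)
        beta += t * qi
        need -= t * pi
    return beta

def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def rel_entropy(p, q):
    """D(p||q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def upper_bound(p, q, eps):
    """Right-hand side of (7.9.52) for diagonal rho, sigma."""
    return (rel_entropy(p, q) + h2(eps) + eps * math.log2(sum(q))) / (1.0 - eps)

p = [0.5, 0.3, 0.2]
results = []
for q in ([0.1, 0.3, 0.6], [0.2, 0.6, 1.2]):   # normalized and unnormalized sigma
    for eps in (0.01, 0.1, 0.5, 0.9):
        lhs = -math.log2(beta_eps(p, q, eps))
        rhs = upper_bound(p, q, eps)
        results.append((lhs, rhs))
        print(sum(q), eps, lhs, rhs)
```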

7.9.2 Connections to Quantum Rényi Relative Entropies

We now show a connection between the hypothesis testing relative entropy and the
Petz– and sandwiched Rényi relative entropies.

Proposition 7.71 Hypothesis Testing to Sandwiched Rényi Relative Entropy

Let 𝜌 be a state and 𝜎 a positive semi-definite operator. Let 𝛼 ∈ (1, ∞) and
𝜀 ∈ [0, 1). Then the following inequality holds

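\[ D_H^{\varepsilon}(\rho\|\sigma) \leq \widetilde{D}_{\alpha}(\rho\|\sigma) + \frac{\alpha}{\alpha-1} \log_2\!\left( \frac{1}{1-\varepsilon} \right). \tag{7.9.59} \]

In particular, in the limit 𝛼 → ∞,

\[ D_H^{\varepsilon}(\rho\|\sigma) \leq D_{\max}(\rho\|\sigma) + \log_2\!\left( \frac{1}{1-\varepsilon} \right). \tag{7.9.60} \]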

Proof: If the support condition supp(𝜌) ⊆ supp(𝜎) does not hold, then the
right-hand side of (7.9.59) is equal to +∞, and so the result is trivially true. Thus,

in what follows, we suppose that the support condition supp(𝜌) ⊆ supp(𝜎) holds.
Let Λ denote a measurement operator such that Tr[Λ𝜌] = 1 − 𝜀. Let 𝑞 B Tr[Λ𝜎].
By the data-processing inequality for the sandwiched Rényi relative entropy for
𝛼 > 1 (Theorem 7.33), under the measurement channel
𝜔 ↦→ Tr[Λ𝜔]|0⟩⟨0| + Tr[( 1 − Λ)𝜔]|1⟩⟨1|, (7.9.61)
we find that
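\begin{align}
\widetilde{D}_{\alpha}(\rho\|\sigma) &\geq \widetilde{D}_{\alpha}(\{1-\varepsilon, \varepsilon\} \| \{q, \operatorname{Tr}[\sigma]-q\}) \tag{7.9.62} \\
&= \frac{1}{\alpha-1} \log_2\!\left[ (1-\varepsilon)^{\alpha} q^{1-\alpha} + \varepsilon^{\alpha} (\operatorname{Tr}[\sigma]-q)^{1-\alpha} \right] \tag{7.9.63} \\
&\geq \frac{1}{\alpha-1} \log_2\!\left[ (1-\varepsilon)^{\alpha} q^{1-\alpha} \right] \tag{7.9.64} \\
&= \frac{\alpha}{\alpha-1} \log_2(1-\varepsilon) - \log_2 q. \tag{7.9.65}
\end{align}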
The statement in (7.9.59) follows by taking the supremum over all Λ such that
Tr[Λ𝜌] = 1 − 𝜀. Furthermore, (7.9.60) follows because lim_{𝛼→∞} 𝐷̃_𝛼(𝜌∥𝜎) =
𝐷_max(𝜌∥𝜎) (as shown in Proposition 7.61) and the fact that lim_{𝛼→∞} 𝛼/(𝛼 − 1) = 1. ■
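
A numerical spot-check of (7.9.59) and (7.9.60) is straightforward when 𝜌 and 𝜎 commute, because the sandwiched Rényi relative entropy then reduces to the classical Rényi divergence of the eigenvalue vectors. The sketch below relies on that assumption, with supp(𝜌) ⊆ supp(𝜎); `renyi`, `d_max`, and the numbers are hypothetical names and values.

```python
import math

def beta_eps(p, q, eps):
    """Classical beta_eps: min sum q*t s.t. 0 <= t <= 1, sum p*t >= 1 - eps."""
    need, beta = 1.0 - eps, 0.0
    items = [(pi, qi) for pi, qi in zip(p, q) if pi > 0]
    for pi, qi in sorted(items, key=lambda item: item[1] / item[0]):
        if need <= 0.0:
            break
        t = min(1.0, need / pi)
        beta += t * qi
        need -= t * pi
    return beta

def renyi(p, q, alpha):
    """Classical Renyi divergence of order alpha != 1, in bits."""
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log2(s) / (alpha - 1.0)

def d_max(p, q):
    """Classical max-relative entropy: log2 of the largest ratio p[i]/q[i]."""
    return math.log2(max(pi / qi for pi, qi in zip(p, q) if pi > 0))

p = [0.5, 0.3, 0.2]
q = [0.1, 0.3, 0.6]
eps = 0.1
lhs = -math.log2(beta_eps(p, q, eps))
bounds = {alpha: renyi(p, q, alpha) + alpha / (alpha - 1.0) * math.log2(1.0 / (1.0 - eps))
          for alpha in (1.5, 2.0, 10.0)}
bound_max = d_max(p, q) + math.log2(1.0 / (1.0 - eps))
print(lhs, bounds, bound_max)
```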

The following proposition establishes an inequality relating hypothesis testing

relative entropy and the Petz–Rényi relative entropy, and it represents a counterpart
to Proposition 7.71. After giving its proof, we show how Proposition 7.71 and the
following proposition lead to a proof of the quantum Stein’s lemma.

Proposition 7.72 Hypothesis Testing to Petz–Rényi Relative Entropy

Let 𝜌 be a state, and let 𝜎 be a positive semi-definite operator. Let 𝛼 ∈ (0, 1)
and 𝜀 ∈ (0, 1]. Then the following inequality holds:

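\[ D_H^{\varepsilon}(\rho\|\sigma) \geq D_{\alpha}(\rho\|\sigma) + \frac{\alpha}{\alpha-1} \log_2\!\left( \frac{1}{\varepsilon} \right). \tag{7.9.66} \]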

Proof: The statement is trivially true if 𝜌𝜎 = 0 because both 𝐷 𝜀𝐻 (𝜌∥𝜎) = +∞

and 𝐷 𝛼 (𝜌∥𝜎) = +∞ in this case. So we consider the non-trivial case when this
equality does not hold. Recall from Lemma 5.5 that the following inequality holds
for positive semi-definite operators 𝐴 and 𝐵 and for 𝛼 ∈ (0, 1):
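\begin{align}
\inf_{\Lambda : 0 \leq \Lambda \leq \mathbb{1}} \left\{ \operatorname{Tr}[(\mathbb{1}-\Lambda) A] + \operatorname{Tr}[\Lambda B] \right\} &= \frac{1}{2} \left( \operatorname{Tr}[A+B] - \|A-B\|_1 \right) \tag{7.9.67} \\
&\leq \operatorname{Tr}[A^{\alpha} B^{1-\alpha}], \tag{7.9.68}
\end{align}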

where the first equality is the result of Theorem 5.3. For 𝑝 ∈ (0, 1), pick 𝐴 = 𝑝𝜌
and 𝐵 = (1 − 𝑝) 𝜎. Plugging in to the inequality above, we find that there exists a
measurement operator Λ∗ = Λ( 𝑝, 𝜌, 𝜎) such that

𝑝Tr[( 1 − Λ∗ ) 𝜌] + (1 − 𝑝)Tr[Λ∗ 𝜎] ≤ 𝑝 𝛼 (1 − 𝑝) 1−𝛼 Tr[𝜌 𝛼 𝜎 1−𝛼 ]. (7.9.69)

This implies that

𝑝Tr[( 1 − Λ∗ ) 𝜌] ≤ 𝑝 𝛼 (1 − 𝑝) 1−𝛼 Tr[𝜌 𝛼 𝜎 1−𝛼 ], (7.9.70)

and in turn that

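\[ \operatorname{Tr}[(\mathbb{1}-\Lambda^{*})\rho] \leq \left( \frac{1-p}{p} \right)^{1-\alpha} \operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}]. \tag{7.9.71} \]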
For a given 𝜀 ∈ (0, 1] and 𝛼 ∈ (0, 1), we pick 𝑝 ∈ (0, 1) such that
\[ \left( \frac{1-p}{p} \right)^{1-\alpha} \operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}] = \varepsilon. \tag{7.9.72} \]
This is possible because we can rewrite the equation above as
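\begin{align}
\varepsilon = \left( \frac{1-p}{p} \right)^{1-\alpha} \operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}] &= \left( \frac{1}{p}-1 \right)^{1-\alpha} \operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}] \tag{7.9.73} \\
\iff \quad \left( \frac{1}{p}-1 \right)^{1-\alpha} &= \frac{\varepsilon}{\operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}]} \tag{7.9.74} \\
\iff \quad \frac{1}{p} &= \left( \frac{\varepsilon}{\operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}]} \right)^{\frac{1}{1-\alpha}} + 1 \tag{7.9.75} \\
\iff \quad p &= \frac{1}{\left( \frac{\varepsilon}{\operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}]} \right)^{\frac{1}{1-\alpha}} + 1} \in (0, 1). \tag{7.9.76}
\end{align}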

This means that Λ∗ = Λ( 𝑝, 𝜌, 𝜎), with 𝑝 selected as above, is a measurement

operator such that
Tr[( 1 − Λ∗ ) 𝜌] ≤ 𝜀. (7.9.77)

Now, we use the fact that

(1 − 𝑝)Tr[Λ∗ 𝜎] ≤ 𝑝 𝛼 (1 − 𝑝) 1−𝛼 Tr[𝜌 𝛼 𝜎 1−𝛼 ] (7.9.78)

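implies
\[ \operatorname{Tr}[\Lambda^{*}\sigma] \leq \left( \frac{p}{1-p} \right)^{\alpha} \operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}]. \tag{7.9.79} \]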
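Considering that
\[ \varepsilon = \left( \frac{1-p}{p} \right)^{1-\alpha} \operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}] = \left( \frac{p}{1-p} \right)^{\alpha-1} \operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}] \tag{7.9.80} \]
implies that
\[ \left( \frac{\varepsilon}{\operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}]} \right)^{\frac{1}{\alpha-1}} = \frac{p}{1-p}, \tag{7.9.81} \]
we get that
\begin{align}
\operatorname{Tr}[\Lambda^{*}\sigma] &\leq \left( \frac{p}{1-p} \right)^{\alpha} \operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}] \tag{7.9.82} \\
&= \left( \left( \frac{\varepsilon}{\operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}]} \right)^{\frac{1}{\alpha-1}} \right)^{\alpha} \operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}] \tag{7.9.83} \\
&= \varepsilon^{\frac{\alpha}{\alpha-1}} \left( \operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}] \right)^{\frac{\alpha}{1-\alpha}} \operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}] \tag{7.9.84} \\
&= \varepsilon^{\frac{\alpha}{\alpha-1}} \left( \operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}] \right)^{\frac{1}{1-\alpha}}. \tag{7.9.85}
\end{align}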

Then, by taking the negative logarithm and optimizing over all Λ∗ satisfying
(7.9.77), we find that

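\begin{align}
D_H^{\varepsilon}(\rho\|\sigma) &\geq -\log_2 \operatorname{Tr}[\Lambda^{*}\sigma] \tag{7.9.86} \\
&\geq -\log_2\!\left( \varepsilon^{\frac{\alpha}{\alpha-1}} \left( \operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}] \right)^{\frac{1}{1-\alpha}} \right) \tag{7.9.87} \\
&= -\frac{\alpha}{\alpha-1} \log_2(\varepsilon) + \frac{1}{\alpha-1} \log_2 \operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}] \tag{7.9.88} \\
&= -\frac{\alpha}{\alpha-1} \log_2(\varepsilon) + D_{\alpha}(\rho\|\sigma). \tag{7.9.89}
\end{align}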
Rearranging this inequality leads to (7.9.66), as required. ■
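
The proof above is constructive, and its steps can be traced numerically in the commuting case: compute the prior from (7.9.76), form the test that accepts exactly where 𝑝𝜌 exceeds (1 − 𝑝)𝜎, and compare its two error probabilities with (7.9.77) and (7.9.85). The sketch assumes diagonal 𝜌 and 𝜎 with 𝜌𝜎 ≠ 0. To avoid a clash with the eigenvalue list `p`, the prior is called `s` in the code; all function names and numbers are hypothetical.

```python
import math

def proof_test(p, q, eps, alpha):
    """Follow the proof of Proposition 7.72 for rho=diag(p), sigma=diag(q)."""
    trace_term = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    # Prior chosen as in (7.9.76); it is named s here instead of p.
    s = 1.0 / ((eps / trace_term) ** (1.0 / (1.0 - alpha)) + 1.0)
    # Optimal test for A = s*rho, B = (1-s)*sigma: project onto {A > B}.
    test = [1.0 if s * pi > (1.0 - s) * qi else 0.0 for pi, qi in zip(p, q)]
    return trace_term, s, test

p = [0.7, 0.2, 0.1]
q = [0.1, 0.2, 0.7]
eps, alpha = 0.5, 0.5
trace_term, s, test = proof_test(p, q, eps, alpha)

type_one = sum((1.0 - t) * pi for t, pi in zip(test, p))   # Tr[(1 - Lambda*) rho]
type_two = sum(t * qi for t, qi in zip(test, q))           # Tr[Lambda* sigma]
type_two_bound = eps ** (alpha / (alpha - 1.0)) * trace_term ** (1.0 / (1.0 - alpha))

d_alpha = math.log2(trace_term) / (alpha - 1.0)            # Petz-Renyi, commuting case
lower_bound = d_alpha + alpha / (alpha - 1.0) * math.log2(1.0 / eps)
print(s, test, type_one, type_two, type_two_bound, lower_bound)
```

Since the test satisfies the type-I constraint, −log2 of `type_two` is a lower bound on 𝐷 𝜀𝐻 for this example, and it in turn dominates `lower_bound`, the right-hand side of (7.9.66).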

7.10 Quantum Stein’s Lemma

In this section, we show how Propositions 7.71 and 7.72 lead to a proof of the
quantum Stein’s lemma, which is one of the most important results in the asymptotic
theory of quantum hypothesis testing. Before doing so, let us discuss the task of
asymmetric hypothesis testing.
The general setting of asymmetric i.i.d. hypothesis testing is illustrated in Figure
7.2. Bob is given 𝑛 copies of a quantum system, each of which is either in the state
𝜌 or in the state 𝜎, and his task is to determine in which state the systems have
been prepared. Bob’s strategy consists of performing a joint measurement on all of
the systems at once, described by the POVM {Λ (𝑛) , 1⊗𝑛 − Λ (𝑛) }, guessing “𝜌” if
the outcome corresponds to Λ (𝑛) and guessing “𝜎” if the outcome corresponds to
1⊗𝑛 − Λ (𝑛) . In this case, there are two types of errors that can occur.
1. Type-I Error: Bob guesses “𝜎”, but the systems are in the state 𝜌 ⊗𝑛 . The
probability of this occurring is Tr[( 1⊗𝑛 − Λ (𝑛) ) 𝜌 ⊗𝑛 ].
2. Type-II Error: Bob guesses “𝜌”, but the systems are in the state 𝜎 ⊗𝑛 . The
probability of this occurring is Tr[Λ (𝑛) 𝜎 ⊗𝑛 ].
The quantity 𝛽𝜀 (𝜌 ⊗𝑛 ∥𝜎 ⊗𝑛 ), defined from (5.3.125), can be interpreted as
the minimum type-II error probability subject to the constraint 1 − Tr[Λ (𝑛) 𝜌 ⊗𝑛 ] =
Tr[( 1⊗𝑛 − Λ (𝑛) ) 𝜌 ⊗𝑛 ] ≤ 𝜀 on the type-I error probability. The rate of this protocol,
given a type-I error probability constraint of 𝜀, is (1/𝑛) 𝐷 𝜀𝐻 (𝜌 ⊗𝑛 ∥𝜎 ⊗𝑛 ), which is
essentially the minimum type-II error probability exponent per copy of the states
𝜌 and 𝜎. By increasing the number 𝑛 of copies, one might imagine that this
normalized minimum type-II error probability exponent can be increased. Also,
there is generally a trade-off between the type-I and type-II error probabilities,
meaning that both cannot be made arbitrarily small simultaneously. However, by
increasing the number 𝑛 of copies of the states 𝜌 and 𝜎, we might expect that the
type-I error probability can be brought down all the way to zero. We thus define
the maximum rate for hypothesis testing of the states 𝜌 and 𝜎 as the largest value of
the normalized type-II error probability exponent, as 𝑛 → ∞, such that the type-I
error probability vanishes in this limit, i.e.,
\[ \inf_{\varepsilon\in(0,1)} \liminf_{n\to\infty} \frac{D_H^{\varepsilon}(\rho^{\otimes n}\|\sigma^{\otimes n})}{n}. \tag{7.10.1} \]
A tractable expression for this optimal rate is given by the quantum Stein’s lemma,
which we state and prove in Theorem 7.78 below.

Figure 7.2: Schematic depiction of hypothesis testing. Many copies, say 𝑛, of
an identical quantum system are each prepared either in the state 𝜌 or the state
𝜎. A joint binary measurement, described by the POVM {Λ (𝑛) , 1⊗𝑛 − Λ (𝑛) }, is
made on the overall state, which is either 𝜌 ⊗𝑛 or 𝜎 ⊗𝑛 , and its outcome is the
guess “𝜌” or “𝜎”.

The task of asymmetric hypothesis testing is similar to the task of state
discrimination that we considered in Section 5.3.1. While in hypothesis testing we
consider two error probabilities, the type-I and type-II error probabilities, in state
discrimination we consider only one error probability that is in fact the average of
the type-I and type-II error probabilities taken with respect to a prior probability
distribution. If 𝜆 ∈ [0, 1] is the probability of choosing the state 𝜌, then the average
of the type-I and type-II error probabilities is

𝜆Tr[( 1⊗𝑛 − Λ (𝑛) ) 𝜌 ⊗𝑛 ] + (1 − 𝜆)Tr[Λ (𝑛) 𝜎 ⊗𝑛 ]

= 𝑝 err ({Λ (𝑛) , 1⊗𝑛 − Λ (𝑛) }, {𝜌 ⊗𝑛 , 𝜎 ⊗𝑛 }), (7.10.2)

where we recall the definition of the error probability 𝑝 err ({Λ (𝑛) , 1⊗𝑛 −Λ (𝑛) }; {𝜌, 𝜎})
of state discrimination from (5.3.1). Due to the fact that we combine both the type-I
and type-II error probabilities, the task of state discrimination is often referred to
as symmetric hypothesis testing, while the task of hypothesis testing that we are
considering in this section is referred to as asymmetric hypothesis testing.
More formally, a hypothesis testing protocol is defined by the four elements
(𝑛, 𝜌, 𝜎, Λ (𝑛) ), where 𝑛 is the number of copies of the system, each of which is
either in the state 𝜌 or 𝜎, and 0 ≤ Λ (𝑛) ≤ 1⊗𝑛 is the operator defining the POVM
{Λ (𝑛) , 1⊗𝑛 − Λ (𝑛) } used to decide the state of the system.


Definition 7.73 (𝒏, 𝜺II , 𝜺I ) Hypothesis Testing Protocol

Let (𝑛, 𝜌, 𝜎, Λ (𝑛) ) be the elements of a hypothesis testing protocol. The
protocol is called an (𝑛, 𝜀II , 𝜀I ) protocol for 𝜌 and 𝜎 if the type-I error
probability Tr[( 1⊗𝑛 − Λ (𝑛) ) 𝜌 ⊗𝑛 ] satisfies

Tr[( 1⊗𝑛 − Λ (𝑛) ) 𝜌 ⊗𝑛 ] ≤ 𝜀I , (7.10.3)

and the type-II error probability Tr[Λ (𝑛) 𝜎 ⊗𝑛 ] satisfies

Tr[Λ (𝑛) 𝜎 ⊗𝑛 ] ≤ 𝜀II . (7.10.4)

Observe that, by definition, if there exists an (𝑛, 𝜀II , 𝜀I ) hypothesis testing

protocol for 𝜌 and 𝜎, then there exists an (𝑛, 𝜀′II , 𝜀I ) hypothesis testing protocol for
all 𝜀′II ≥ 𝜀II because every measurement operator Λ (𝑛) for which Tr[Λ (𝑛) 𝜎 ⊗𝑛 ] ≤ 𝜀II
clearly satisfies Tr[Λ (𝑛) 𝜎 ⊗𝑛 ] ≤ 𝜀′II . Similarly, if there does not exist an (𝑛, 𝜀II , 𝜀I )
hypothesis testing protocol, then there does not exist an (𝑛, 𝜀′II , 𝜀I ) hypothesis
testing protocol for all 𝜀′II ≤ 𝜀II .
We define the rate of a hypothesis testing protocol (𝑛, 𝜌, 𝜎, Λ (𝑛) ) as
\[ R(n, \rho, \sigma, \Lambda^{(n)}) := -\frac{1}{n} \log_2 \operatorname{Tr}[\Lambda^{(n)} \sigma^{\otimes n}]. \tag{7.10.5} \]
The rate is also called the (normalized) type-II error exponent because the type-II
error probability Tr[Λ (𝑛) 𝜎 ⊗𝑛 ] can be expressed as Tr[Λ (𝑛) 𝜎 ⊗𝑛 ] = 2−𝑛𝑅 . The goal
of the hypothesis testing protocol is to find the highest rate such that the type-I
error probability tends to zero as 𝑛 increases.

Definition 7.74 Achievable Rate for Hypothesis Testing

Given quantum states 𝜌 and 𝜎, a rate 𝑅 ∈ R+ is called an achievable rate for
hypothesis testing of 𝜌 and 𝜎 if for all 𝜀 ∈ (0, 1], 𝛿 > 0, and sufficiently large
𝑛 there exists an (𝑛, 2−𝑛(𝑅−𝛿) , 𝜀) hypothesis testing protocol for 𝜌 and 𝜎.

Remark: When we say that there exists an (𝑛, 2−𝑛(𝑅− 𝛿 ) , 𝜀) hypothesis testing protocol for 𝜌
and 𝜎 for all 𝜀 ∈ (0, 1], 𝛿 > 0, and “sufficiently large 𝑛”, we mean that for all 𝜀 ∈ (0, 1] and
𝛿 > 0, there exists a number 𝑁 𝜀, 𝛿 ∈ N such that for all 𝑛 ≥ 𝑁 𝜀, 𝛿 , there exists an (𝑛, 2−𝑛(𝑅− 𝛿 ) , 𝜀)
hypothesis testing protocol for 𝜌 and 𝜎. This convention with the nomenclature “sufficiently
large 𝑛” is taken throughout the rest of the book.


Note that, by definition, for every achievable rate 𝑅 there exists a value of 𝑛
such that the type-I error probability 𝜀 becomes arbitrarily close to zero.

Definition 7.75 Optimal Achievable Rate for Hypothesis Testing

Given quantum states 𝜌 and 𝜎, the optimal achievable rate, denoted by 𝐸 (𝜌, 𝜎),
is defined as the supremum of all achievable rates for hypothesis testing of 𝜌
and 𝜎, i.e.,

𝐸 (𝜌, 𝜎) B sup{𝑅 : 𝑅 is an achievable rate for 𝜌, 𝜎}. (7.10.6)

Definition 7.76 Strong Converse Rate for Hypothesis Testing

Given quantum states 𝜌 and 𝜎, a rate 𝑅 ∈ R+ is called a strong converse rate
for hypothesis testing of 𝜌 and 𝜎 if for all 𝜀 ∈ [0, 1), 𝛿 > 0, and sufficiently
large 𝑛, there does not exist an (𝑛, 2−𝑛(𝑅+𝛿) , 𝜀) hypothesis testing protocol for 𝜌
and 𝜎.

Note that, by definition, for every strong converse rate 𝑅 there exists a value of
𝑛 such that the type-I error probability 𝜀 is arbitrarily close to one.

Definition 7.77 Optimal Strong Converse Rate for Hypothesis Testing

Given quantum states 𝜌 and 𝜎, the optimal strong converse rate, denoted by
𝐸̃(𝜌, 𝜎), is defined as the infimum of all strong converse rates for hypothesis
testing of 𝜌 and 𝜎, i.e.,
\[ \widetilde{E}(\rho, \sigma) := \inf\{ R : R \text{ is a strong converse rate for } \rho, \sigma \}. \tag{7.10.7} \]

Note that the following inequality is a direct consequence of definitions:

\[ E(\rho, \sigma) \leq \widetilde{E}(\rho, \sigma). \tag{7.10.8} \]

Indeed, suppose for a contradiction that this is not true, i.e., that 𝐸 (𝜌, 𝜎) > 𝐸̃(𝜌, 𝜎)
holds. This means that there exists an achievable rate 𝑅 such that 𝐸̃(𝜌, 𝜎) < 𝑅 <
𝐸 (𝜌, 𝜎), so that for all 𝜀 ∈ (0, 1], all 𝛿 > 0, and all sufficiently large 𝑛 there exists
an (𝑛, 2^{−𝑛(𝑅−𝛿)}, 𝜀) hypothesis testing protocol. On the other hand, since 𝐸̃(𝜌, 𝜎) is
the optimal strong converse rate, there exists a 𝛿 > 0 such that 𝑅 − 𝛿 > 𝐸̃(𝜌, 𝜎)
is a strong converse rate. This implies, by definition, that an (𝑛, 2^{−𝑛(𝑅−𝛿+𝛿′)}, 𝜀)
protocol, with 𝛿′ ∈ (0, 𝛿), does not exist. However, since 𝑅 was claimed to be an
achievable rate, such a protocol should exist since 𝑅 − 𝛿 + 𝛿′ < 𝑅. We have thus
reached a contradiction. The inequality 𝐸 (𝜌, 𝜎) > 𝐸̃(𝜌, 𝜎) therefore cannot be
true, which means that 𝐸 (𝜌, 𝜎) ≤ 𝐸̃(𝜌, 𝜎).

We now state and prove the quantum Stein’s lemma: both the optimal achievable
and strong converse rates are equal to the quantum relative entropy. As alluded
to at the beginning of Section 7.2 on the quantum relative entropy, the quantum
Stein’s lemma gives the quantum relative entropy its most fundamental operational
meaning as the optimal rate in asymmetric quantum hypothesis testing.

Theorem 7.78 Quantum Stein’s Lemma

For all states 𝜌 and 𝜎, the optimal achievable and strong converse rates are
equal to the quantum relative entropy of 𝜌 and 𝜎, i.e.,

\[ E(\rho, \sigma) = \widetilde{E}(\rho, \sigma) = D(\rho\|\sigma). \tag{7.10.9} \]

Proof: Note that if supp(𝜌) ⊈ supp(𝜎), then 𝐷 (𝜌∥𝜎) = +∞ by definition. In this
singular case, the optimal strong converse rate 𝐸̃(𝜌, 𝜎) is undefined, and so we
prove that 𝐸 (𝜌, 𝜎) = +∞.
Fix 𝜀 ∈ (0, 1], and let Π𝜎 be the projection onto the support of 𝜎. Note that
since supp(𝜌) ⊈ supp(𝜎), we have that Tr[Π𝜎 𝜌] < 1. Now, pick 𝑛 large enough so
that (Tr[Π𝜎 𝜌]) 𝑛 ≤ 𝜀. Define the measurement operator Λ (𝑛) as Λ (𝑛) := 1⊗𝑛 − (Π𝜎 ) ⊗𝑛 .
Observe that the type-I error probability is

Tr[( 1⊗𝑛 − Λ (𝑛) ) 𝜌 ⊗𝑛 ] = (Tr[Π𝜎 𝜌]) 𝑛 ≤ 𝜀 (7.10.10)

and the type-II error probability is

Tr[Λ (𝑛) 𝜎 ⊗𝑛 ] = 1 − (Tr[Π𝜎 𝜎]) 𝑛 = 0. (7.10.11)

Therefore, for all sufficiently large 𝑛 such that (Tr[Π𝜎 𝜌]) 𝑛 ≤ 𝜀 holds, the elements
(𝑛, 𝜌, 𝜎, Λ (𝑛) ) constitute an (𝑛, 0, 𝜀) hypothesis testing protocol for 𝜌 and 𝜎, the
rate of which is +∞ = 𝐷 (𝜌∥𝜎). Since 𝜀 is arbitrary, we conclude that, for all
𝜀 ∈ (0, 1] and sufficiently large 𝑛, there exists a hypothesis testing protocol for 𝜌

and 𝜎 with rate 𝑅 = +∞. This implies that 𝐸 (𝜌, 𝜎) = +∞ in the singular case of
supp(𝜌) ⊈ supp(𝜎).
For the remainder of the proof, we assume that the support condition supp(𝜌) ⊆
supp(𝜎) holds, so that 𝐷 (𝜌∥𝜎) is finite.
Let us first show that 𝐷 (𝜌∥𝜎) is an achievable rate, which establishes that
𝐸 (𝜌, 𝜎) ≥ 𝐷 (𝜌∥𝜎). To this end, fix 𝜀 ∈ (0, 1] and 𝛿 > 0. Let 𝛿1 , 𝛿2 > 0 be such
that
𝛿1 + 𝛿2 = 𝛿. (7.10.12)
Set 𝛼 ∈ (0, 1) such that
𝛿1 ≥ 𝐷 (𝜌∥𝜎) − 𝐷 𝛼 (𝜌∥𝜎), (7.10.13)
which is possible because lim𝛼→1 𝐷 𝛼 (𝜌∥𝜎) = 𝐷 (𝜌∥𝜎) by Proposition 7.22 and
𝐷 𝛼 is monotonically increasing in 𝛼, as established in Proposition 7.23. Then, with
this choice of 𝛼, take 𝑛 large enough so that

\[ \delta_2 \geq \frac{\alpha}{n(1-\alpha)} \log_2\!\left( \frac{1}{\varepsilon} \right). \tag{7.10.14} \]

Now, let 0 ≤ Λ (𝑛) ≤ 1⊗𝑛 be a measurement operator that achieves the 𝜀-hypothesis
testing relative entropy 𝐷 𝜀𝐻 (𝜌 ⊗𝑛 ∥𝜎 ⊗𝑛 ), which means that
\[ \operatorname{Tr}[(\mathbb{1}^{\otimes n} - \Lambda^{(n)}) \rho^{\otimes n}] = \varepsilon \tag{7.10.15} \]
and
\[ -\frac{1}{n} \log_2 \operatorname{Tr}[\Lambda^{(n)} \sigma^{\otimes n}] = \frac{1}{n} D_H^{\varepsilon}(\rho^{\otimes n}\|\sigma^{\otimes n}). \tag{7.10.16} \]
The elements (𝑛, 𝜌, 𝜎, Λ (𝑛) ) thus constitute an (𝑛, 2^{−𝑛((1/𝑛) 𝐷_𝐻^𝜀(𝜌⊗𝑛∥𝜎⊗𝑛))}, 𝜀) hypothesis

testing protocol for 𝜌 and 𝜎. We now apply Proposition 7.72 and the additivity of
the Petz–Rényi relative entropy from Proposition 7.23 to find that

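\begin{align}
\frac{1}{n} D_H^{\varepsilon}(\rho^{\otimes n}\|\sigma^{\otimes n}) &\geq \frac{1}{n} D_{\alpha}(\rho^{\otimes n}\|\sigma^{\otimes n}) + \frac{\alpha}{n(\alpha-1)} \log_2\!\left( \frac{1}{\varepsilon} \right) \tag{7.10.17} \\
&= D_{\alpha}(\rho\|\sigma) + \frac{\alpha}{n(\alpha-1)} \log_2\!\left( \frac{1}{\varepsilon} \right). \tag{7.10.18}
\end{align}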
Rearranging the right-hand side of this inequality and using (7.10.12)–(7.10.14),
we conclude that
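\begin{align}
\frac{1}{n} D_H^{\varepsilon}(\rho^{\otimes n}\|\sigma^{\otimes n})
&\geq D(\rho\|\sigma) - \left( D(\rho\|\sigma) - D_{\alpha}(\rho\|\sigma) + \frac{\alpha}{n(1-\alpha)} \log_2\!\left( \frac{1}{\varepsilon} \right) \right) \tag{7.10.19} \\
&\geq D(\rho\|\sigma) - (\delta_1 + \delta_2) \tag{7.10.20} \\
&\geq D(\rho\|\sigma) - \delta. \tag{7.10.21}
\end{align}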

We thus have
\[ D(\rho\|\sigma) - \delta \leq \frac{1}{n} D_H^{\varepsilon}(\rho^{\otimes n}\|\sigma^{\otimes n}). \tag{7.10.22} \]
The error 2^{−𝑛(𝐷(𝜌∥𝜎)−𝛿)} is then greater than or equal to 2^{−𝑛((1/𝑛) 𝐷_𝐻^𝜀(𝜌⊗𝑛∥𝜎⊗𝑛))}, which
means, by the fact stated in the paragraph immediately after Definition 7.73, that
there exists an (𝑛, 2−𝑛(𝑅−𝛿) , 𝜀) hypothesis testing protocol with 𝑅 = 𝐷 (𝜌∥𝜎) for
all sufficiently large 𝑛 such that (7.10.14) holds. Since 𝜀 and 𝛿 are arbitrary, we
conclude that for all 𝜀 ∈ (0, 1], 𝛿 > 0, and sufficiently large 𝑛, there exists an
(𝑛, 2−𝑛(𝑅−𝛿) , 𝜀) hypothesis testing protocol with 𝑅 = 𝐷 (𝜌∥𝜎). Then 𝐷 (𝜌∥𝜎) is
an achievable rate, so that

𝐸 (𝜌, 𝜎) ≥ 𝐷 (𝜌∥𝜎). (7.10.23)

Let us now show that the quantum relative entropy 𝐷 (𝜌∥𝜎) is a strong converse
rate, which establishes that 𝐸̃(𝜌, 𝜎) ≤ 𝐷 (𝜌∥𝜎). Fix 𝜀 ∈ [0, 1) and 𝛿 > 0. Let
𝛿1 , 𝛿2 > 0 be such that
\[ \delta > \delta_1 + \delta_2 =: \delta'. \tag{7.10.24} \]
Set 𝛼 ∈ (1, ∞) such that

\[ \delta_1 \geq \widetilde{D}_{\alpha}(\rho\|\sigma) - D(\rho\|\sigma), \tag{7.10.25} \]
which is possible because lim_{𝛼→1} 𝐷̃_𝛼(𝜌∥𝜎) = 𝐷 (𝜌∥𝜎) by Proposition 7.30 and
𝐷̃_𝛼 is monotonically increasing in 𝛼, as established in Proposition 7.31. With this
value of 𝛼, take 𝑛 large enough so that
\[ \delta_2 \geq \frac{\alpha}{n(\alpha-1)} \log_2\!\left( \frac{1}{1-\varepsilon} \right). \tag{7.10.26} \]

Now, consider an arbitrary measurement operator Λ (𝑛) such that the hypothesis
testing protocol given by (𝑛, 𝜌, 𝜎, Λ (𝑛) ) satisfies 𝜀 ≥ Tr[( 1⊗𝑛 − Λ (𝑛) ) 𝜌 ⊗𝑛 ] and
𝜀II ≥ Tr[Λ (𝑛) 𝜎 ⊗𝑛 ]. By definition of the hypothesis testing relative entropy, we
have that − log2 Tr[Λ (𝑛) 𝜎 ⊗𝑛 ] ≤ 𝐷 𝜀𝐻 (𝜌 ⊗𝑛 ∥𝜎 ⊗𝑛 ). Applying Proposition 7.71, we
thus find that
\begin{align}
-\frac{1}{n} \log_2 \operatorname{Tr}[\Lambda^{(n)} \sigma^{\otimes n}] &\leq \frac{1}{n} D_H^{\varepsilon}(\rho^{\otimes n}\|\sigma^{\otimes n}) \tag{7.10.27} \\
&\leq \frac{\alpha}{n(\alpha-1)} \log_2\!\left( \frac{1}{1-\varepsilon} \right) + \frac{1}{n} \widetilde{D}_{\alpha}(\rho^{\otimes n}\|\sigma^{\otimes n}) \tag{7.10.28} \\
&= \frac{\alpha}{n(\alpha-1)} \log_2\!\left( \frac{1}{1-\varepsilon} \right) + \widetilde{D}_{\alpha}(\rho\|\sigma), \tag{7.10.29}
\end{align}
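where the last line follows from the additivity of the sandwiched Rényi relative
entropy, as stated in Proposition 7.31. Rearranging the right-hand side of this
inequality and using (7.10.24)–(7.10.26), we obtain
\begin{align}
-\frac{1}{n} \log_2 \operatorname{Tr}[\Lambda^{(n)} \sigma^{\otimes n}] &\leq D(\rho\|\sigma) + \widetilde{D}_{\alpha}(\rho\|\sigma) - D(\rho\|\sigma) + \frac{\alpha}{n(\alpha-1)} \log_2\!\left( \frac{1}{1-\varepsilon} \right) \tag{7.10.30} \\
&\leq D(\rho\|\sigma) + \delta' \tag{7.10.31} \\
&< D(\rho\|\sigma) + \delta. \tag{7.10.32}
\end{align}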

We thus have that Tr[Λ (𝑛) 𝜎 ⊗𝑛 ] > 2−𝑛(𝐷 (𝜌∥𝜎)+𝛿) . Since Λ (𝑛) is an arbitrary measure-
ment operator satisfying 𝜀 ≥ Tr[( 1⊗𝑛 − Λ (𝑛) ) 𝜌 ⊗𝑛 ], we see that, for all sufficiently
large 𝑛 such that (7.10.26) holds, an (𝑛, 2−𝑛(𝐷 (𝜌∥𝜎)+𝛿) , 𝜀) hypothesis testing protocol
cannot exist, for if it did we would have Tr[Λ (𝑛) 𝜎 ⊗𝑛 ] ≤ 2−𝑛(𝐷 (𝜌∥𝜎)+𝛿) for some Λ (𝑛) .
Since 𝜀 and 𝛿 are arbitrary, we have that for all 𝜀 ∈ [0, 1), 𝛿 > 0, and sufficiently
large 𝑛, there does not exist an (𝑛, 2−𝑛(𝐷 (𝜌∥𝜎)+𝛿) , 𝜀) hypothesis testing protocol for
𝜌 and 𝜎, which means that 𝐷 (𝜌∥𝜎) is a strong converse rate, so that
\[ \widetilde{E}(\rho, \sigma) \leq D(\rho\|\sigma). \tag{7.10.33} \]

Using (7.10.23), (7.10.33), and (7.10.8), we obtain

\[ E(\rho, \sigma) \leq \widetilde{E}(\rho, \sigma) \leq D(\rho\|\sigma) \leq E(\rho, \sigma), \tag{7.10.34} \]

which means that 𝐸 (𝜌, 𝜎) = 𝐸̃(𝜌, 𝜎) = 𝐷 (𝜌∥𝜎). ■

We can conclude the main result of Theorem 7.78 in a different yet related way.
Recall that an alternate definition of the optimal type-II error exponent is given in
(7.10.1), i.e.,
\[ E(\rho, \sigma) = \inf_{\varepsilon\in(0,1)} \liminf_{n\to\infty} \frac{D_H^{\varepsilon}(\rho^{\otimes n}\|\sigma^{\otimes n})}{n}. \tag{7.10.35} \]
It is straightforward to show that

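\[ \inf_{\varepsilon\in(0,1)} \liminf_{n\to\infty} \frac{D_H^{\varepsilon}(\rho^{\otimes n}\|\sigma^{\otimes n})}{n} = D(\rho\|\sigma). \tag{7.10.36} \]
Indeed, using (7.10.18), we find that
\[ \inf_{\varepsilon\in(0,1)} \liminf_{n\to\infty} \frac{1}{n} D_H^{\varepsilon}(\rho^{\otimes n}\|\sigma^{\otimes n}) \geq D_{\alpha}(\rho\|\sigma) \tag{7.10.37} \]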

for all 𝛼 ∈ (0, 1), so that taking the supremum over 𝛼 ∈ (0, 1) on the right-hand
side leads to
𝐸 (𝜌, 𝜎) ≥ 𝐷 (𝜌∥𝜎). (7.10.38)
We can also write the optimal strong converse type-II error exponent as

\[ \widetilde{E}(\rho, \sigma) = \sup_{\varepsilon\in(0,1)} \limsup_{n\to\infty} \frac{1}{n} D_H^{\varepsilon}(\rho^{\otimes n}\|\sigma^{\otimes n}). \tag{7.10.39} \]

Similarly, using (7.10.29), we conclude that

1
sup lim sup 𝐷 𝜀𝐻 (𝜌 ⊗𝑛 ∥𝜎 ⊗𝑛 ) ≤ 𝐷
e𝛼 (𝜌∥𝜎) (7.10.40)
𝜀∈(0,1) 𝑛→∞ 𝑛

for all 𝛼 > 1, so that taking the infimum over 𝛼 ∈ (1, ∞) on the right-hand side
leads to
e(𝜌, 𝜎) ≤ 𝐷 (𝜌∥𝜎).
𝐸 (7.10.41)
We therefore conclude that 𝐸 (𝜌, 𝜎) = 𝐸 e(𝜌, 𝜎) = 𝐷 (𝜌∥𝜎). Note that in the
arguments above for the lower bound on 𝐸 (𝜌, 𝜎) and the upper bound on 𝐸
e(𝜌, 𝜎),
we did not have to explicitly take the infimum or supremum over 𝜀 ∈ (0, 1),
respectively.

7.10.1 Error and Strong Converse Exponents

Given states $\rho$ and $\sigma$, as well as the number $n$ of copies of the states, we now change perspective from that of the previous section and instead determine bounds on the type-I error probability $\varepsilon$. In particular, we are interested in how fast the type-I error probability converges to zero if the type-II error exponent is equal to a constant smaller than the quantum relative entropy, and in how fast it converges to one if the type-II error exponent is equal to a constant larger than the quantum relative entropy. To assist with this analysis, we establish the following propositions, whose proofs are closely related to the proofs of Propositions 7.71 and 7.72.

Proposition 7.79
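Let $\rho$ be a state, and let $\sigma$ be a positive semi-definite operator. Let $\alpha > 1$ and $R \geq 0$. Then, for $\Lambda$ a measurement operator satisfying
\[ \operatorname{Tr}[\Lambda\sigma] \leq 2^{-R}, \tag{7.10.42} \]
the following bound holds
\[ \operatorname{Tr}[(I-\Lambda)\rho] \geq 1 - 2^{-\left(\frac{\alpha-1}{\alpha}\right)\left(R - \widetilde{D}_\alpha(\rho\Vert\sigma)\right)}. \tag{7.10.43} \]

Remark: Note that the second bound is nontrivial only in the case that $R > D(\rho\Vert\sigma)$ because $\widetilde{D}_\alpha(\rho\Vert\sigma) \geq D(\rho\Vert\sigma)$ for $\alpha > 1$.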

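Proof: The proof is similar to the proof of Proposition 7.71. Let $p := \operatorname{Tr}[\Lambda\rho]$ and $q := \operatorname{Tr}[\Lambda\sigma]$. By applying the data-processing inequality for the sandwiched Rényi relative entropy along with the measurement channel from (7.9.61), we conclude that
\begin{align}
\widetilde{D}_\alpha(\rho\Vert\sigma) &\geq \widetilde{D}_\alpha(\{p, 1-p\}\Vert\{q, \operatorname{Tr}[\sigma]-q\}) \tag{7.10.44}\\
&= \frac{1}{\alpha-1}\log_2\!\left(p^\alpha q^{1-\alpha} + (1-p)^\alpha(\operatorname{Tr}[\sigma]-q)^{1-\alpha}\right) \tag{7.10.45}\\
&\geq \frac{1}{\alpha-1}\log_2\!\left(p^\alpha q^{1-\alpha}\right) \tag{7.10.46}\\
&= \frac{\alpha}{\alpha-1}\log_2 p - \log_2 q \tag{7.10.47}\\
&\geq \frac{\alpha}{\alpha-1}\log_2 p + R \tag{7.10.48}\\
&= \frac{\alpha}{\alpha-1}\log_2\operatorname{Tr}[\Lambda\rho] + R, \tag{7.10.49}
\end{align}
which implies that
\[ \operatorname{Tr}[\Lambda\rho] \leq 2^{-\left(\frac{\alpha-1}{\alpha}\right)\left(R - \widetilde{D}_\alpha(\rho\Vert\sigma)\right)}. \tag{7.10.50} \]
Rewriting this gives the bound in the statement of the proposition. ■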
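The bound in (7.10.50) can be sanity-checked numerically. The following Python sketch draws random full-rank states and a random measurement operator, sets $R = -\log_2\operatorname{Tr}[\Lambda\sigma]$ so that (7.10.42) holds with equality, and tests the inequality. All function names, dimensions, and parameter values are illustrative assumptions and are not taken from the text; the sandwiched Rényi relative entropy is evaluated from its standard trace formula.

```python
import numpy as np

def random_state(d, rng):
    """Random full-rank density matrix (hypothetical helper for this sketch)."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def mpow(h, t):
    """Real power of a positive definite Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * w**t) @ v.conj().T

def sandwiched_renyi(rho, sigma, alpha):
    """Sandwiched Renyi relative entropy in bits, for full-rank sigma."""
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    w = np.clip(np.linalg.eigvalsh(s @ rho @ s), 0, None)
    return float(np.log2(np.sum(w**alpha)) / (alpha - 1))

def check_prop_7_79(d=3, alpha=2.0, trials=200, seed=0):
    """Return True if Tr[Lambda rho] respects (7.10.50) on every random trial."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        rho, sigma = random_state(d, rng), random_state(d, rng)
        lam = random_state(d, rng)
        lam = lam / np.linalg.eigvalsh(lam).max()      # enforce 0 <= Lambda <= I
        R = -np.log2(np.trace(lam @ sigma).real)       # Tr[Lambda sigma] = 2^(-R)
        lhs = np.trace(lam @ rho).real
        exponent = (alpha - 1) / alpha * (R - sandwiched_renyi(rho, sigma, alpha))
        if lhs > 2.0 ** (-exponent) + 1e-9:
            return False
    return True

if __name__ == "__main__":
    print(check_prop_7_79())
```

Such a check does not replace the proof; it only guards against sign or exponent slips when the bound is transcribed.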

Proposition 7.80
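Let $\rho$ be a state, and let $\sigma$ be a positive semi-definite operator. Let $\alpha \in (0,1)$ and $R \geq 0$. Then, there exists a measurement operator $\Lambda$ such that
\begin{align}
\operatorname{Tr}[\Lambda\sigma] &\leq 2^{-R}, \tag{7.10.51}\\
\operatorname{Tr}[(I-\Lambda)\rho] &\leq 2^{-\left(\frac{1-\alpha}{\alpha}\right)\left(D_\alpha(\rho\Vert\sigma) - R\right)}. \tag{7.10.52}
\end{align}

Remark: Note that the second bound above is nontrivial only in the case that $R < D(\rho\Vert\sigma)$ because $D_\alpha(\rho\Vert\sigma) \leq D(\rho\Vert\sigma)$ for $\alpha \in (0,1)$.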

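Proof: The proof is similar to the proof of Proposition 7.72. Employing the same measurement operator $\Lambda^*$ therein, we conclude from the same reasoning in that proof that
\begin{align}
\operatorname{Tr}[(I-\Lambda^*)\rho] &\leq \left(\frac{1-p}{p}\right)^{1-\alpha}\operatorname{Tr}[\rho^\alpha\sigma^{1-\alpha}], \tag{7.10.53}\\
\operatorname{Tr}[\Lambda^*\sigma] &\leq \left(\frac{p}{1-p}\right)^{\alpha}\operatorname{Tr}[\rho^\alpha\sigma^{1-\alpha}]. \tag{7.10.54}
\end{align}
We then pick $p \in (0,1)$ such that the following equation is satisfied
\begin{align}
2^{-R} &= \left(\frac{p}{1-p}\right)^{\alpha}\operatorname{Tr}[\rho^\alpha\sigma^{1-\alpha}] \tag{7.10.55}\\
&= \left(\frac{p}{1-p}\right)^{\alpha} 2^{(\alpha-1)D_\alpha(\rho\Vert\sigma)} \tag{7.10.56}\\
\Longleftrightarrow\quad 2^{-R}\,2^{(1-\alpha)D_\alpha(\rho\Vert\sigma)} &= \left(\frac{p}{1-p}\right)^{\alpha} \tag{7.10.57}\\
\Longleftrightarrow\quad 2^{R/\alpha}\,2^{(\alpha-1)D_\alpha(\rho\Vert\sigma)/\alpha} &= \frac{1-p}{p}. \tag{7.10.58}
\end{align}
We see that picking $p$ in such a way is always possible because one more step of the development above leads to the conclusion that
\[ p = \frac{1}{1 + 2^{R/\alpha}\,2^{(\alpha-1)D_\alpha(\rho\Vert\sigma)/\alpha}} \in (0,1). \tag{7.10.59} \]
Substituting into (7.10.53), we find that
\begin{align}
\operatorname{Tr}[(I-\Lambda^*)\rho] &\leq \left(2^{R/\alpha}\,2^{(\alpha-1)D_\alpha(\rho\Vert\sigma)/\alpha}\right)^{1-\alpha} 2^{(\alpha-1)D_\alpha(\rho\Vert\sigma)} \tag{7.10.60}\\
&= 2^{\left(\frac{1-\alpha}{\alpha}\right)R}\,2^{-\frac{(1-\alpha)^2}{\alpha}D_\alpha(\rho\Vert\sigma)}\,2^{(\alpha-1)D_\alpha(\rho\Vert\sigma)} \tag{7.10.61}\\
&= 2^{\left(\frac{1-\alpha}{\alpha}\right)R}\,2^{\left(\frac{\alpha-1}{\alpha}\right)D_\alpha(\rho\Vert\sigma)} \tag{7.10.62}\\
&= 2^{-\left(\frac{1-\alpha}{\alpha}\right)\left(D_\alpha(\rho\Vert\sigma) - R\right)}, \tag{7.10.63}
\end{align}
concluding the proof. ■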
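The algebra in (7.10.55)–(7.10.63) involves only scalars, so it can be checked directly. The short Python sketch below treats $\alpha$, $R$, and the value of $D_\alpha(\rho\Vert\sigma)$ as free numerical inputs (the sample values in the usage loop are arbitrary assumptions), computes $p$ from (7.10.59), and returns the quantities that should agree.

```python
def optimal_p(alpha, R, d_alpha):
    """The choice of p in (7.10.59)."""
    return 1.0 / (1.0 + 2.0 ** (R / alpha) * 2.0 ** ((alpha - 1) * d_alpha / alpha))

def check_choice_of_p(alpha, R, d_alpha):
    """Evaluate both sides of (7.10.55) and of (7.10.60)-(7.10.63).

    Returns (type2_bound, 2^-R, type1_bound, closed_form).
    """
    p = optimal_p(alpha, R, d_alpha)
    trace_term = 2.0 ** ((alpha - 1) * d_alpha)            # Tr[rho^alpha sigma^(1-alpha)]
    type2_bound = (p / (1 - p)) ** alpha * trace_term      # right side of (7.10.54)
    type1_bound = ((1 - p) / p) ** (1 - alpha) * trace_term  # right side of (7.10.53)
    closed_form = 2.0 ** (-(1 - alpha) / alpha * (d_alpha - R))  # (7.10.63)
    return type2_bound, 2.0 ** (-R), type1_bound, closed_form

if __name__ == "__main__":
    for alpha, R, d_alpha in [(0.5, 1.0, 2.0), (0.9, 0.3, 0.7), (0.2, 2.0, 0.5)]:
        print(alpha, R, d_alpha, check_choice_of_p(alpha, R, d_alpha))
```

For each input triple, the first two returned numbers should coincide, as should the last two.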

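Applying Propositions 7.79 and 7.80 to the $n$-copy states $\rho^{\otimes n}$ and $\sigma^{\otimes n}$, with $R$ replaced by $nR$, and using the additivity of the Petz–Rényi and sandwiched Rényi relative entropies, we obtain the following bounds on the type-I error probability $\varepsilon$ for quantum hypothesis testing when the type-II error probability is at most $2^{-nR}$ for a fixed rate $R$:
\[ 1 - 2^{-n\left(\frac{\alpha-1}{\alpha}\right)\left(R - \widetilde{D}_\alpha(\rho\Vert\sigma)\right)} \leq \varepsilon \leq 2^{-n\left(\frac{1-\alpha}{\alpha}\right)\left(D_\alpha(\rho\Vert\sigma) - R\right)}. \tag{7.10.64} \]
The left inequality holds for every protocol and for all $\alpha > 1$, while the right inequality can be achieved by a suitable protocol for every $\alpha \in (0,1)$. Let us now examine the behavior of $\varepsilon$ above and below the optimal rate $D(\rho\Vert\sigma)$.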

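1. Consider a sequence $\{(n, 2^{-nR}, \varepsilon_n)\}_{n\in\mathbb{N}}$ of hypothesis testing protocols for $\rho$ and $\sigma$ such that the rate of each protocol has some arbitrary (but fixed) value $R < D(\rho\Vert\sigma)$. By Proposition 7.80, the sequence can be chosen such that each element satisfies the right-most inequality in (7.10.64), so that
\[ \varepsilon_n \leq 2^{-n\left(\frac{1-\alpha}{\alpha}\right)\left(D_\alpha(\rho\Vert\sigma) - R\right)}. \tag{7.10.65} \]
Since the Petz–Rényi relative entropy $D_\alpha(\rho\Vert\sigma)$ is monotonically increasing in $\alpha$, as established in Proposition 7.23, and since $\lim_{\alpha\to 1} D_\alpha(\rho\Vert\sigma) = D(\rho\Vert\sigma)$, there exists an $\alpha^* < 1$ such that $D_{\alpha^*}(\rho\Vert\sigma)$ lies in between $D(\rho\Vert\sigma)$ and $R$; i.e., we have that $R < D_{\alpha^*}(\rho\Vert\sigma)$. We thus obtain
\[ \varepsilon_n \leq 2^{-n\left(\frac{1-\alpha^*}{\alpha^*}\right)\left(D_{\alpha^*}(\rho\Vert\sigma) - R\right)}. \tag{7.10.66} \]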

[Figure 7.3: plot of the type-I error probability $\varepsilon_n$ (vertical axis, from 0 to 1) against the type-II error exponent $R$ (horizontal axis, with $D(\rho\Vert\sigma)$ marked), in the limit $n \to \infty$.]

Figure 7.3: The type-I error probability 𝜀 𝑛 as a function of the rate 𝑅, i.e., the
type-II error exponent, as the number 𝑛 of copies of the system approaches
infinity for the task of asymmetric hypothesis testing for the states 𝜌 and 𝜎.
The optimal rate of 𝐷 (𝜌∥𝜎) for this task, as established by the quantum Stein’s
lemma in Theorem 7.78, has what is called the strong converse property, which
means that it is the optimal strong converse rate. Therefore, for every rate above
it, the type-I error probability converges to one in the limit of arbitrarily many
copies of the system.

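Since $R < D_{\alpha^*}(\rho\Vert\sigma)$, taking the limit $n \to \infty$ on both sides of this inequality gives us $\lim_{n\to\infty}\varepsilon_n \leq 0$. However, $\varepsilon_n \geq 0$ for all $n$ because $\varepsilon_n$ is by definition a probability. So we find that
\[ \varepsilon_n \to 0 \ \text{ as } n \to \infty \ \text{ if } R < D(\rho\Vert\sigma). \tag{7.10.67} \]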

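2. Now consider a sequence $\{(n, 2^{-nR}, \varepsilon_n)\}_{n\in\mathbb{N}}$ of hypothesis testing protocols for $\rho$ and $\sigma$ such that the rate of each protocol has some arbitrary (but fixed) value $R > D(\rho\Vert\sigma)$. In other words, suppose that we would like to perform hypothesis testing at a rate above the optimal rate. Each element of the sequence satisfies the left-most inequality in (7.10.64), so that
\[ \varepsilon_n \geq 1 - 2^{-n\left(\frac{\alpha-1}{\alpha}\right)\left(R - \widetilde{D}_\alpha(\rho\Vert\sigma)\right)}. \tag{7.10.68} \]
Then, recalling from Proposition 7.31 that the sandwiched Rényi relative entropy $\widetilde{D}_\alpha(\rho\Vert\sigma)$ is monotonically increasing in $\alpha$, and since $\lim_{\alpha\to 1}\widetilde{D}_\alpha(\rho\Vert\sigma) = D(\rho\Vert\sigma)$, there exists an $\alpha^* > 1$ such that $\widetilde{D}_{\alpha^*}(\rho\Vert\sigma)$ lies in between $D(\rho\Vert\sigma)$ and $R$; i.e., we have that $R > \widetilde{D}_{\alpha^*}(\rho\Vert\sigma)$. We thus obtain
\[ \varepsilon_n \geq 1 - 2^{-n\left(\frac{\alpha^*-1}{\alpha^*}\right)\left(R - \widetilde{D}_{\alpha^*}(\rho\Vert\sigma)\right)}. \tag{7.10.69} \]
Since $R > \widetilde{D}_{\alpha^*}(\rho\Vert\sigma)$, taking the limit $n \to \infty$ on both sides of this inequality gives us $\lim_{n\to\infty}\varepsilon_n \geq 1$. However, $\varepsilon_n \leq 1$ for all $n$ because $\varepsilon_n$ is by definition a probability. So we conclude that
\[ \varepsilon_n \to 1 \ \text{ as } n \to \infty \ \text{ if } R > D(\rho\Vert\sigma). \tag{7.10.70} \]
So the type-I error probability converges exponentially fast to one in the limit $n \to \infty$ for every rate above the optimal rate.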

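The optimal rate $D(\rho\Vert\sigma)$ is therefore a sharp dividing point: for every rate below it, the type-I error probability $\varepsilon_n$ can be made to converge to zero exponentially fast as $n \to \infty$, and for every rate above it, $\varepsilon_n$ necessarily converges to one exponentially fast as $n \to \infty$. This behavior is illustrated in Figure 7.3.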
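As a numerical illustration of the two regimes, the following Python sketch evaluates the bounds in (7.10.64) in the special case of commuting states. This is an assumption made here for simplicity: $\rho$ and $\sigma$ are taken to be diagonal with eigenvalue vectors `p` and `q`, so that both the Petz–Rényi and the sandwiched Rényi relative entropies reduce to the classical Rényi divergence. The distributions, rates, and function names are illustrative choices and do not come from the text.

```python
import numpy as np

# Illustrative commuting states: eigenvalues of rho and sigma (assumed values).
p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])

def relative_entropy(p, q):
    """Classical relative entropy D(p||q) in bits (full-support inputs)."""
    return float(np.sum(p * np.log2(p / q)))

def renyi_divergence(p, q, alpha):
    """Classical Renyi divergence in bits; for commuting rho and sigma it
    coincides with both quantum Renyi relative entropies."""
    return float(np.log2(np.sum(p**alpha * q**(1 - alpha))) / (alpha - 1))

def type1_upper_bound(p, q, R, alpha, n):
    """Right-hand side of (7.10.64), intended for alpha in (0, 1)."""
    d_alpha = renyi_divergence(p, q, alpha)
    return 2.0 ** (-n * (1 - alpha) / alpha * (d_alpha - R))

def type1_lower_bound(p, q, R, alpha, n):
    """Left-hand side of (7.10.64), intended for alpha > 1."""
    d_alpha = renyi_divergence(p, q, alpha)
    return 1.0 - 2.0 ** (-n * (alpha - 1) / alpha * (R - d_alpha))

if __name__ == "__main__":
    print("D(p||q) =", relative_entropy(p, q))
    for n in (10, 100, 1000):
        below = type1_upper_bound(p, q, R=0.2, alpha=0.5, n=n)  # rate below D
        above = type1_lower_bound(p, q, R=0.9, alpha=2.0, n=n)  # rate above D
        print(n, below, above)
```

With these sample values, the rate 0.2 lies below $D_{1/2}$ and the rate 0.9 lies above $D_2$, so the upper bound shrinks toward zero and the lower bound grows toward one as $n$ increases.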

7.11 Information Measures for Quantum Channels

We conclude this chapter with a discussion of information measures for quantum
channels.
Given an information measure defined for quantum states, we define a cor-
responding information measure for channels by sending one share of a bipar-
tite state through a given channel and evaluating the information measure on
the corresponding output state. Specifically, for every generalized divergence
𝑫 : D(H) × L+ (H) → R ∪ {+∞}, which was given in Definition 7.15, we define
the generalized channel divergence as follows:

Definition 7.81 Generalized Channel Divergence

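Given a generalized divergence $\boldsymbol{D} : \mathrm{D}(\mathcal{H}) \times \mathrm{L}_+(\mathcal{H}) \to \mathbb{R} \cup \{+\infty\}$, a quantum channel $\mathcal{N}_{A\to B}$, and a completely positive map $\mathcal{M}_{A\to B}$, we define the generalized channel divergence of $\mathcal{N}$ and $\mathcal{M}$ as
\[ \boldsymbol{D}(\mathcal{N}\Vert\mathcal{M}) := \sup_{\rho_{RA}} \boldsymbol{D}(\mathcal{N}_{A\to B}(\rho_{RA})\Vert\mathcal{M}_{A\to B}(\rho_{RA})), \tag{7.11.1} \]
where the supremum is over all mixed states $\rho_{RA}$ with an arbitrary reference system $R$.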

Proposition 7.82
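Let $\boldsymbol{D}$, $\mathcal{N}_{A\to B}$, and $\mathcal{M}_{A\to B}$ be as given in Definition 7.81. It suffices to optimize the generalized channel divergence $\boldsymbol{D}(\mathcal{N}\Vert\mathcal{M})$ with respect to pure states $\psi_{RA}$ with the reference system $R$ isomorphic to the input system $A$:
\begin{align}
\boldsymbol{D}(\mathcal{N}\Vert\mathcal{M}) &= \sup_{\psi_{RA}} \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{RA})\Vert\mathcal{M}_{A\to B}(\psi_{RA})) \tag{7.11.2}\\
&= \sup_{\rho_A} \boldsymbol{D}(\sqrt{\rho_A}\,\Gamma^{\mathcal{N}}_{AB}\sqrt{\rho_A}\,\Vert\,\sqrt{\rho_A}\,\Gamma^{\mathcal{M}}_{AB}\sqrt{\rho_A}). \tag{7.11.3}
\end{align}
In the second line, $\Gamma^{\mathcal{N}}_{AB}$ and $\Gamma^{\mathcal{M}}_{AB}$ denote the Choi operators of $\mathcal{N}_{A\to B}$ and $\mathcal{M}_{A\to B}$, respectively, and the optimization is with respect to every density operator $\rho_A$.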

Proof: Observe that if we take a purification $|\phi\rangle_{R'RA}$ of the state $\rho_{RA}$ in the optimization in (7.11.1), then we find that
\begin{align}
&\boldsymbol{D}(\mathcal{N}_{A\to B}(\rho_{RA})\Vert\mathcal{M}_{A\to B}(\rho_{RA})) \tag{7.11.4}\\
&\quad= \boldsymbol{D}(\mathcal{N}_{A\to B}(\operatorname{Tr}_{R'}[\phi_{R'RA}])\Vert\mathcal{M}_{A\to B}(\operatorname{Tr}_{R'}[\phi_{R'RA}])) \tag{7.11.5}\\
&\quad= \boldsymbol{D}(\operatorname{Tr}_{R'}[\mathcal{N}_{A\to B}(\phi_{R'RA})]\Vert\operatorname{Tr}_{R'}[\mathcal{M}_{A\to B}(\phi_{R'RA})]) \tag{7.11.6}\\
&\quad\leq \boldsymbol{D}(\mathcal{N}_{A\to B}(\phi_{R'RA})\Vert\mathcal{M}_{A\to B}(\phi_{R'RA})), \tag{7.11.7}
\end{align}
where we have used the data-processing inequality for the generalized divergence in the last line. This means that for every state $\rho_{RA}$, the generalized divergence in (7.11.4) is never larger than the corresponding generalized divergence evaluated on a purification of $\rho_{RA}$, so it suffices in (7.11.1) to optimize over only pure states. Furthermore, by the Schmidt decomposition theorem (Theorem 2.2), the purifying space $\mathcal{H}_{R'R}$ need not have dimension exceeding that of $\mathcal{H}_A$. Therefore, the generalized channel divergence can be written as in (7.11.2).
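To see (7.11.3), we first use the fact in (2.2.38), which implies that for every pure state $\psi_{RA}$, there exists a state $\rho_R$ and a unitary $U_R$ such that
\[ |\psi\rangle_{RA} = (U_R\sqrt{\rho_R}\otimes\mathbb{1}_A)|\Gamma\rangle_{RA}. \tag{7.11.8} \]
Thus, it follows that
\begin{align}
\mathcal{N}_{A\to B}(\psi_{RA}) &= \mathcal{N}_{A\to B}((U_R\sqrt{\rho_R}\otimes\mathbb{1}_A)\Gamma_{RA}(\sqrt{\rho_R}U_R^\dagger\otimes\mathbb{1}_A)) \tag{7.11.9}\\
&= (U_R\sqrt{\rho_R}\otimes\mathbb{1}_B)\,\mathcal{N}_{A\to B}(\Gamma_{RA})\,(\sqrt{\rho_R}U_R^\dagger\otimes\mathbb{1}_B) \tag{7.11.10}\\
&= (U_R\sqrt{\rho_R}\otimes\mathbb{1}_B)\,\Gamma^{\mathcal{N}}_{RB}\,(\sqrt{\rho_R}U_R^\dagger\otimes\mathbb{1}_B), \tag{7.11.11}
\end{align}
where we employed the Choi representation $\Gamma^{\mathcal{N}}_{RB}$ of the channel $\mathcal{N}$. Similarly, $\mathcal{M}_{A\to B}(\psi_{RA}) = U_R\sqrt{\rho_R}\,\Gamma^{\mathcal{M}}_{RB}\sqrt{\rho_R}U_R^\dagger$. By employing the unitary invariance of a generalized divergence, we conclude that
\[ \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{RA})\Vert\mathcal{M}_{A\to B}(\psi_{RA})) = \boldsymbol{D}(\sqrt{\rho_A}\,\Gamma^{\mathcal{N}}_{AB}\sqrt{\rho_A}\,\Vert\,\sqrt{\rho_A}\,\Gamma^{\mathcal{M}}_{AB}\sqrt{\rho_A}). \tag{7.11.12} \]
Then it suffices to optimize over states $\rho_A$, so that (7.11.3) holds. ■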
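The step from (7.11.9) to (7.11.11) can be checked numerically for a concrete channel. The Python sketch below uses the qubit depolarizing channel as an illustrative example (this choice, the helper names, and the test state are assumptions, not taken from the text) and takes $U_R = \mathbb{1}$ in (7.11.8). It compares the channel output on $\psi_{RA}$ with the Choi operator sandwiched by $\sqrt{\rho_R}\otimes\mathbb{1}_B$.

```python
import numpy as np

def depolarizing_kraus(prob):
    """Kraus operators of the qubit depolarizing channel (illustrative choice)."""
    I = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    return [np.sqrt(1 - 3 * prob / 4) * I] + [np.sqrt(prob / 4) * P for P in (X, Y, Z)]

def apply_on_A(kraus, op_RA, d_R):
    """Apply the channel to the second tensor factor of an operator on R (x) A."""
    I_R = np.eye(d_R)
    return sum(np.kron(I_R, K) @ op_RA @ np.kron(I_R, K).conj().T for K in kraus)

def sqrtm_psd(rho):
    """Square root of a positive semi-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(rho)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T

def check_choi_identity(rho_R, kraus):
    """Return (identity_holds, psi_is_normalized) for the relation (7.11.11)."""
    d = rho_R.shape[0]
    gamma_vec = np.eye(d).reshape(d * d)           # |Gamma> = sum_i |i>|i>
    gamma = np.outer(gamma_vec, gamma_vec.conj())
    choi = apply_on_A(kraus, gamma, d)             # Choi operator Gamma^N_{RB}
    s = np.kron(sqrtm_psd(rho_R), np.eye(d))       # sqrt(rho_R) (x) 1, with d_B = d here
    psi_vec = s @ gamma_vec                        # (sqrt(rho_R) (x) 1)|Gamma>, U_R = 1
    psi = np.outer(psi_vec, psi_vec.conj())
    lhs = apply_on_A(kraus, psi, d)                # N_{A->B}(psi_RA)
    rhs = s @ choi @ s
    return bool(np.allclose(lhs, rhs)), bool(np.isclose(np.vdot(psi_vec, psi_vec).real, 1.0))

if __name__ == "__main__":
    rho_R = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
    print(check_choi_identity(rho_R, depolarizing_kraus(0.3)))
```

The identity holds because the channel acts on $A$ while $\sqrt{\rho_R}$ acts on $R$, so the two operations commute.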

Proposition 7.83
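Let $\boldsymbol{D}$, $\mathcal{N}_{A\to B}$, and $\mathcal{M}_{A\to B}$ be as given in Definition 7.81, and suppose that $\boldsymbol{D}$ obeys the direct-sum property in (7.3.7). Then the function
\[ f(\rho_A, \mathcal{M}_{A\to B}) := \boldsymbol{D}(\sqrt{\rho_A}\,\Gamma^{\mathcal{N}}_{AB}\sqrt{\rho_A}\,\Vert\,\sqrt{\rho_A}\,\Gamma^{\mathcal{M}}_{AB}\sqrt{\rho_A}) \tag{7.11.13} \]
is concave in the first argument and convex in the second. If $\mathfrak{M}$ is a convex set of completely positive maps, then
\[ \inf_{\mathcal{M}\in\mathfrak{M}}\sup_{\rho_A} \boldsymbol{D}(\sqrt{\rho_A}\,\Gamma^{\mathcal{N}}_{AB}\sqrt{\rho_A}\,\Vert\,\sqrt{\rho_A}\,\Gamma^{\mathcal{M}}_{AB}\sqrt{\rho_A}) = \sup_{\rho_A}\inf_{\mathcal{M}\in\mathfrak{M}} \boldsymbol{D}(\sqrt{\rho_A}\,\Gamma^{\mathcal{N}}_{AB}\sqrt{\rho_A}\,\Vert\,\sqrt{\rho_A}\,\Gamma^{\mathcal{M}}_{AB}\sqrt{\rho_A}). \tag{7.11.14} \]
Equivalently,
\[ \inf_{\mathcal{M}\in\mathfrak{M}} \boldsymbol{D}(\mathcal{N}\Vert\mathcal{M}) = \sup_{\psi_{RA}}\inf_{\mathcal{M}\in\mathfrak{M}} \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{RA})\Vert\mathcal{M}_{A\to B}(\psi_{RA})), \tag{7.11.15} \]
where $\psi_{RA}$ is a pure state with system $R$ isomorphic to system $A$.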

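Proof: To see the concavity, let $\psi^0_{RA}$ and $\psi^1_{RA}$ be pure states with reduced states $\psi^0_A$ and $\psi^1_A$. Let $\psi^\lambda_{SRA}$ denote the following pure state:
\[ |\psi^\lambda\rangle_{SRA} := \sqrt{1-\lambda}\,|0\rangle_S|\psi^0\rangle_{RA} + \sqrt{\lambda}\,|1\rangle_S|\psi^1\rangle_{RA}. \tag{7.11.16} \]
Observe that
\[ \psi^\lambda_A = (1-\lambda)\psi^0_A + \lambda\psi^1_A, \tag{7.11.17} \]
so that the reduced state $\psi^\lambda_A$ is a convex combination of the reduced states $\psi^0_A$ and $\psi^1_A$. Define
\[ \overline{\psi}^\lambda_{SRA} := (1-\lambda)|0\rangle\langle 0|_S\otimes\psi^0_{RA} + \lambda|1\rangle\langle 1|_S\otimes\psi^1_{RA}, \tag{7.11.18} \]
which is the state resulting from the action of a completely dephasing qubit channel on system $S$ of $\psi^\lambda_{SRA}$. Let $\phi^\lambda_{RA}$ be an arbitrary pure state with reduced state equal to $\psi^\lambda_A$. Then we find that
\begin{align}
\boldsymbol{D}(\mathcal{N}_{A\to B}(\phi^\lambda_{RA})\Vert\mathcal{M}_{A\to B}(\phi^\lambda_{RA})) &= \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi^\lambda_{SRA})\Vert\mathcal{M}_{A\to B}(\psi^\lambda_{SRA})) \tag{7.11.19}\\
&\geq \boldsymbol{D}(\mathcal{N}_{A\to B}(\overline{\psi}^\lambda_{SRA})\Vert\mathcal{M}_{A\to B}(\overline{\psi}^\lambda_{SRA})) \tag{7.11.20}\\
&= (1-\lambda)\,\boldsymbol{D}(\mathcal{N}_{A\to B}(\psi^0_{RA})\Vert\mathcal{M}_{A\to B}(\psi^0_{RA})) \notag\\
&\qquad + \lambda\,\boldsymbol{D}(\mathcal{N}_{A\to B}(\psi^1_{RA})\Vert\mathcal{M}_{A\to B}(\psi^1_{RA})). \tag{7.11.21}
\end{align}
The first equality follows because every two purifications of the same state are related by an isometric channel acting on the reference system, as well as the isometric invariance of the generalized divergence. The inequality follows from quantum data processing, by the action of a completely dephasing qubit channel on the system $S$. The final equality follows from the direct-sum property, with the weights $1-\lambda$ and $\lambda$ taken from (7.11.18), and because the generalized divergence is invariant under tensoring in the same state (Proposition 7.16). Finally, we have the following equalities, by employing the isometric invariance of the generalized divergence (Proposition 7.16), the equality in (7.11.12), and the definition in (7.11.13):
\begin{align}
\boldsymbol{D}(\mathcal{N}_{A\to B}(\phi^\lambda_{RA})\Vert\mathcal{M}_{A\to B}(\phi^\lambda_{RA})) &= f(\psi^\lambda_A, \mathcal{M}_{A\to B}), \tag{7.11.22}\\
\boldsymbol{D}(\mathcal{N}_{A\to B}(\psi^0_{RA})\Vert\mathcal{M}_{A\to B}(\psi^0_{RA})) &= f(\psi^0_A, \mathcal{M}_{A\to B}), \tag{7.11.23}\\
\boldsymbol{D}(\mathcal{N}_{A\to B}(\psi^1_{RA})\Vert\mathcal{M}_{A\to B}(\psi^1_{RA})) &= f(\psi^1_A, \mathcal{M}_{A\to B}). \tag{7.11.24}
\end{align}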

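To see the convexity in $\mathcal{M}$, consider that for every $\lambda \in [0,1]$ and completely positive maps $\mathcal{M}_1, \mathcal{M}_2 \in \mathfrak{M}$, the joint convexity of the generalized divergence (Proposition 7.17) gives that
\begin{align}
f(\rho_A, \lambda\mathcal{M}_1 + (1-\lambda)\mathcal{M}_2) &= \boldsymbol{D}(\sqrt{\rho_A}\,\Gamma^{\mathcal{N}}_{AB}\sqrt{\rho_A}\,\Vert\,\sqrt{\rho_A}\,\Gamma^{\lambda\mathcal{M}_1+(1-\lambda)\mathcal{M}_2}_{AB}\sqrt{\rho_A}) \tag{7.11.25}\\
&= \boldsymbol{D}(\lambda\sqrt{\rho_A}\,\Gamma^{\mathcal{N}}_{AB}\sqrt{\rho_A} + (1-\lambda)\sqrt{\rho_A}\,\Gamma^{\mathcal{N}}_{AB}\sqrt{\rho_A} \notag\\
&\qquad\Vert\,\lambda\sqrt{\rho_A}\,\Gamma^{\mathcal{M}_1}_{AB}\sqrt{\rho_A} + (1-\lambda)\sqrt{\rho_A}\,\Gamma^{\mathcal{M}_2}_{AB}\sqrt{\rho_A}) \tag{7.11.26}\\
&\leq \lambda\,\boldsymbol{D}(\sqrt{\rho_A}\,\Gamma^{\mathcal{N}}_{AB}\sqrt{\rho_A}\,\Vert\,\sqrt{\rho_A}\,\Gamma^{\mathcal{M}_1}_{AB}\sqrt{\rho_A}) \notag\\
&\qquad + (1-\lambda)\,\boldsymbol{D}(\sqrt{\rho_A}\,\Gamma^{\mathcal{N}}_{AB}\sqrt{\rho_A}\,\Vert\,\sqrt{\rho_A}\,\Gamma^{\mathcal{M}_2}_{AB}\sqrt{\rho_A}) \tag{7.11.27}\\
&= \lambda f(\rho_A, \mathcal{M}_1) + (1-\lambda) f(\rho_A, \mathcal{M}_2). \tag{7.11.28}
\end{align}
The equality in (7.11.14) follows from what was just shown and Sion’s minimax theorem (Theorem 2.24). The equality in (7.11.15) follows from (7.11.12) and Proposition 7.82. ■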

The generalized channel divergence takes a simple form if the channel N and
the completely positive map M both happen to be jointly covariant with respect to
a group, as shown in the following proposition.

Proposition 7.84 Generalized Divergence for Jointly Covariant Channels

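Let $\boldsymbol{D} : \mathrm{D}(\mathcal{H}) \times \mathrm{L}_+(\mathcal{H}) \to \mathbb{R} \cup \{+\infty\}$ be a generalized divergence and $G$ a finite group with unitary representations $\{U_A^g\}_{g\in G}$ and $\{V_B^g\}_{g\in G}$. Let $\mathcal{N}_{A\to B}$ be a quantum channel and $\mathcal{M}_{A\to B}$ a completely positive map that are both covariant with respect to $G$, i.e., (Definition 4.18)
\begin{equation}
\begin{aligned}
\mathcal{N}(U_g\rho U_g^\dagger) &= V_g\mathcal{N}(\rho)V_g^\dagger,\\
\mathcal{M}(U_g\rho U_g^\dagger) &= V_g\mathcal{M}(\rho)V_g^\dagger,
\end{aligned}
\tag{7.11.29}
\end{equation}
for every $g \in G$ and state $\rho$. Then, for every state $\psi_{A'A}$, with the dimension of $A'$ equal to the dimension of $A$,
\[ \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{A'A})\Vert\mathcal{M}_{A\to B}(\psi_{A'A})) \leq \boldsymbol{D}(\mathcal{N}_{A\to B}(\phi^{\overline{\rho}}_{RA})\Vert\mathcal{M}_{A\to B}(\phi^{\overline{\rho}}_{RA})), \tag{7.11.30} \]
where $\rho_A := \psi_A = \operatorname{Tr}_{A'}[\psi_{A'A}]$,
\[ \overline{\rho}_A := \mathcal{T}_G(\rho_A) := \frac{1}{|G|}\sum_{g\in G} U_A^g\rho_A U_A^{g\dagger}, \tag{7.11.31} \]
and $\phi^{\overline{\rho}}_{RA}$ is a purification of $\overline{\rho}_A$. Consequently, the generalized channel divergence $\boldsymbol{D}(\mathcal{N}\Vert\mathcal{M})$ is given by optimizing over pure states $\psi_{A'A}$ such that the reduced state $\psi_A$ is invariant under the channel $\mathcal{T}_G$; i.e.,
\[ \boldsymbol{D}(\mathcal{N}\Vert\mathcal{M}) = \sup_{\psi_{A'A}}\{\boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{A'A})\Vert\mathcal{M}_{A\to B}(\psi_{A'A})) : \psi_A = \mathcal{T}_G(\psi_A)\}. \tag{7.11.32} \]
In particular, if the representation $\{U_A^g\}_{g\in G}$ is irreducible, then the optimal state in (7.11.32) is the maximally entangled state $\Phi_{A'A}$, so that
\[ \boldsymbol{D}(\mathcal{N}\Vert\mathcal{M}) = \boldsymbol{D}(\rho^{\mathcal{N}}_{AB}\Vert\rho^{\mathcal{M}}_{AB}), \tag{7.11.33} \]
where $\rho^{\mathcal{N}}_{AB}$ and $\rho^{\mathcal{M}}_{AB}$ are Choi states of $\mathcal{N}$ and $\mathcal{M}$, respectively.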

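Proof: The inequality
\[ \boldsymbol{D}(\mathcal{N}\Vert\mathcal{M}) \geq \sup_{\psi_{A'A}}\{\boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{A'A})\Vert\mathcal{M}_{A\to B}(\psi_{A'A})) : \psi_A = \mathcal{T}_G(\psi_A)\} \tag{7.11.34} \]
is immediate from the fact that the set $\{\psi_{A'A} : \psi_A = \mathcal{T}_G(\psi_A)\}$ of pure states is a subset of all pure states. The remainder of this proof is devoted to the reverse inequality.
Let $\psi_{A'A}$ be an arbitrary state, let $\rho_A = \psi_A$, and let $\overline{\rho}_A = \mathcal{T}_G(\rho_A)$. Furthermore, let $\phi^{\overline{\rho}}_{RA}$ be a purification of $\overline{\rho}_A$. Let us also consider the following purification of $\overline{\rho}_A$:
\[ |\psi^{\overline{\rho}}\rangle_{R'A'A} := \frac{1}{\sqrt{|G|}}\sum_{g\in G}|g\rangle_{R'}\otimes(\mathbb{1}_{A'}\otimes U_A^g)|\psi\rangle_{A'A}, \tag{7.11.35} \]
where $\{|g\rangle\}_{g\in G}$ is an orthonormal basis for $\mathcal{H}_{R'}$ indexed by the elements of $G$. Since all purifications of a state can be mapped to each other by isometries acting on the purifying systems, there exists an isometry $W_{R\to R'A'}$ such that $|\psi^{\overline{\rho}}\rangle_{R'A'A} = W_{R\to R'A'}|\phi^{\overline{\rho}}\rangle_{RA}$. Using this, we find that
\begin{align}
&\boldsymbol{D}(\mathcal{N}_{A\to B}(\psi^{\overline{\rho}}_{R'A'A})\Vert\mathcal{M}_{A\to B}(\psi^{\overline{\rho}}_{R'A'A})) \tag{7.11.36}\\
&\quad= \boldsymbol{D}(\mathcal{N}_{A\to B}(\mathcal{W}_{R\to R'A'}(\phi^{\overline{\rho}}_{RA}))\Vert\mathcal{M}_{A\to B}(\mathcal{W}_{R\to R'A'}(\phi^{\overline{\rho}}_{RA}))) \tag{7.11.37}\\
&\quad= \boldsymbol{D}(\mathcal{W}_{R\to R'A'}(\mathcal{N}_{A\to B}(\phi^{\overline{\rho}}_{RA}))\Vert\mathcal{W}_{R\to R'A'}(\mathcal{M}_{A\to B}(\phi^{\overline{\rho}}_{RA}))) \tag{7.11.38}\\
&\quad= \boldsymbol{D}(\mathcal{N}_{A\to B}(\phi^{\overline{\rho}}_{RA})\Vert\mathcal{M}_{A\to B}(\phi^{\overline{\rho}}_{RA})), \tag{7.11.39}
\end{align}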

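where the last equality follows from the fact that every generalized divergence is isometrically invariant (recall Proposition 7.16). Now, let us apply the dephasing map $X \mapsto \sum_{g\in G}|g\rangle\langle g|X|g\rangle\langle g|$ to the $R'$ system. Since this map is a channel, by the data-processing inequality for the generalized divergence, we obtain
\begin{multline}
\boldsymbol{D}\!\left(\mathcal{N}_{A\to B}(\psi^{\overline{\rho}}_{R'A'A})\,\Big\Vert\,\mathcal{M}_{A\to B}(\psi^{\overline{\rho}}_{R'A'A})\right)\\
\geq \boldsymbol{D}\!\left(\frac{1}{|G|}\sum_{g\in G}|g\rangle\langle g|_{R'}\otimes(\mathcal{N}_{A\to B}\circ\mathcal{U}_A^g)(\psi_{A'A})\,\Big\Vert\,\frac{1}{|G|}\sum_{g\in G}|g\rangle\langle g|_{R'}\otimes(\mathcal{M}_{A\to B}\circ\mathcal{U}_A^g)(\psi_{A'A})\right). \tag{7.11.40}
\end{multline}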
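Then, because generalized divergences are invariant under unitaries, we can apply the unitary channel given by the unitary $\sum_{g\in G}|g\rangle\langle g|\otimes V_B^{g\dagger}$ at the output of $\mathcal{N}$ and $\mathcal{M}$ to obtain
\begin{multline}
\boldsymbol{D}\!\left(\mathcal{N}_{A\to B}(\psi^{\overline{\rho}}_{R'A'A})\,\Big\Vert\,\mathcal{M}_{A\to B}(\psi^{\overline{\rho}}_{R'A'A})\right)\\
\geq \boldsymbol{D}\!\left(\frac{1}{|G|}\sum_{g\in G}|g\rangle\langle g|_{R'}\otimes((\mathcal{V}_B^g)^\dagger\circ\mathcal{N}_{A\to B}\circ\mathcal{U}_A^g)(\psi_{A'A})\,\Big\Vert\,\frac{1}{|G|}\sum_{g\in G}|g\rangle\langle g|_{R'}\otimes((\mathcal{V}_B^g)^\dagger\circ\mathcal{M}_{A\to B}\circ\mathcal{U}_A^g)(\psi_{A'A})\right). \tag{7.11.41}
\end{multline}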
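Finally, since the group-covariance of $\mathcal{N}$ and $\mathcal{M}$ with respect to the representations $\{U_A^g\}_{g\in G}$ and $\{V_B^g\}_{g\in G}$ implies that
\[ (\mathcal{V}_B^g)^\dagger\circ\mathcal{N}\circ\mathcal{U}_A^g = \mathcal{N}, \qquad (\mathcal{V}_B^g)^\dagger\circ\mathcal{M}\circ\mathcal{U}_A^g = \mathcal{M} \tag{7.11.42} \]
for all $g \in G$, and since from Proposition 7.16 generalized divergences are invariant under tensoring with the same state in both arguments, we obtain
\begin{align}
&\boldsymbol{D}(\mathcal{N}_{A\to B}(\phi^{\overline{\rho}}_{RA})\Vert\mathcal{M}_{A\to B}(\phi^{\overline{\rho}}_{RA})) \tag{7.11.43}\\
&\quad= \boldsymbol{D}\!\left(\mathcal{N}_{A\to B}(\psi^{\overline{\rho}}_{R'A'A})\,\Big\Vert\,\mathcal{M}_{A\to B}(\psi^{\overline{\rho}}_{R'A'A})\right) \tag{7.11.44}\\
&\quad\geq \boldsymbol{D}\!\left(\frac{1}{|G|}\sum_{g\in G}|g\rangle\langle g|_{R'}\otimes\mathcal{N}_{A\to B}(\psi_{A'A})\,\Big\Vert\,\frac{1}{|G|}\sum_{g\in G}|g\rangle\langle g|_{R'}\otimes\mathcal{M}_{A\to B}(\psi_{A'A})\right) \tag{7.11.45}\\
&\quad= \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{A'A})\Vert\mathcal{M}_{A\to B}(\psi_{A'A})), \tag{7.11.46}
\end{align}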
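which is precisely (7.11.30). By definition, the pure state $\phi^{\overline{\rho}}_{RA}$ is such that its reduced state on $A$ is invariant under the channel $\mathcal{T}_G$. Therefore, optimizing over all such pure states, we obtain
\[ \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{A'A})\Vert\mathcal{M}_{A\to B}(\psi_{A'A})) \leq \sup_{\phi_{RA}}\{\boldsymbol{D}(\mathcal{N}_{A\to B}(\phi_{RA})\Vert\mathcal{M}_{A\to B}(\phi_{RA})) : \phi_A = \mathcal{T}_G(\phi_A)\}. \tag{7.11.47} \]
Since this inequality holds for all pure states $\psi_{A'A}$, we obtain
\begin{align}
\boldsymbol{D}(\mathcal{N}\Vert\mathcal{M}) &= \sup_{\psi_{A'A}}\boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{A'A})\Vert\mathcal{M}_{A\to B}(\psi_{A'A})) \tag{7.11.48}\\
&\leq \sup_{\psi_{A'A}}\{\boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{A'A})\Vert\mathcal{M}_{A\to B}(\psi_{A'A})) : \psi_A = \mathcal{T}_G(\psi_A)\}. \tag{7.11.49}
\end{align}
Combining this inequality with (7.11.34) gives us (7.11.32).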

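Finally, if the representation $\{U_A^g\}_{g\in G}$ acting on the input space of the channel $\mathcal{N}$ and the completely positive map $\mathcal{M}$ is irreducible, then for every state $\psi_{A'A}$ such that $\rho_A = \psi_A$, it holds that $\overline{\rho}_A = \mathbb{1}_A/d_A$. Then, since the maximally entangled state is a purification of the maximally mixed state, we let $\phi^{\overline{\rho}}_{RA} = \Phi_{RA}$, which implies by (7.11.39) that
\begin{align}
&\boldsymbol{D}(\mathcal{N}_{A\to B}(\psi^{\overline{\rho}}_{R'A'A})\Vert\mathcal{M}_{A\to B}(\psi^{\overline{\rho}}_{R'A'A})) \tag{7.11.50}\\
&\quad= \boldsymbol{D}(\mathcal{N}_{A\to B}(\Phi_{RA})\Vert\mathcal{M}_{A\to B}(\Phi_{RA})) \tag{7.11.51}\\
&\quad= \boldsymbol{D}(\rho^{\mathcal{N}}_{RB}\Vert\rho^{\mathcal{M}}_{RB}). \tag{7.11.52}
\end{align}
Then, by (7.11.46), we have that
\[ \boldsymbol{D}(\rho^{\mathcal{N}}_{RB}\Vert\rho^{\mathcal{M}}_{RB}) \geq \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{A'A})\Vert\mathcal{M}_{A\to B}(\psi_{A'A})) \tag{7.11.53} \]
for all states $\psi_{A'A}$, so that $\boldsymbol{D}(\rho^{\mathcal{N}}_{RB}\Vert\rho^{\mathcal{M}}_{RB}) \geq \boldsymbol{D}(\mathcal{N}\Vert\mathcal{M})$. Since the reverse inequality trivially holds, we obtain (7.11.33). ■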
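The role of irreducibility in the last step can be seen in a small numerical example. The Python sketch below implements the twirling channel $\mathcal{T}_G$ from (7.11.31) for a list of unitaries and applies it to a qubit state. The two sets of unitaries are illustrative assumptions: the four Pauli operators (a projective representation of $\mathbb{Z}_2\times\mathbb{Z}_2$, which is enough for the twirl because phases cancel) and the reducible set $\{\mathbb{1}, Z\}$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

PAULIS = [I2, X, Y, Z]      # acts irreducibly on a qubit (assumed example)
Z_ONLY = [I2, Z]            # reducible: leaves the Z eigenbasis invariant

def twirl(rho, unitaries):
    """Twirling channel T_G(rho) = (1/|G|) sum_g U_g rho U_g^dagger."""
    return sum(U @ rho @ U.conj().T for U in unitaries) / len(unitaries)

if __name__ == "__main__":
    rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
    print(twirl(rho, PAULIS))   # maximally mixed state
    print(twirl(rho, Z_ONLY))   # diagonal part of rho
```

In the irreducible case every input is mapped to the maximally mixed state, so the only candidate in (7.11.32) is a purification of $\mathbb{1}_A/d_A$. In the reducible case a family of invariant states remains, and the optimization in (7.11.32) is still nontrivial.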

We can also define information measures for channels based on information

measures for states derived from generalized divergences. Using the generalized
coherent information and the generalized mutual information defined in (7.3.13)
and (7.3.14), respectively, we make the following definitions.

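Definition 7.85 Generalized Information Measures for Quantum Channels
Let $\boldsymbol{D}$ be a generalized divergence, as defined in Definition 7.15, and let $\mathcal{N}_{A\to B}$ be a quantum channel.

1. The generalized mutual information of $\mathcal{N}$, denoted by $\boldsymbol{I}(\mathcal{N})$, is defined as
\begin{equation}
\begin{aligned}
\boldsymbol{I}(\mathcal{N}) &:= \sup_{\psi_{RA}}\boldsymbol{I}(R;B)_\omega\\
&= \sup_{\psi_{RA}}\inf_{\sigma_B}\boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{RA})\Vert\psi_R\otimes\sigma_B),
\end{aligned}
\tag{7.11.54}
\end{equation}
where $\omega_{RB} = \mathcal{N}_{A\to B}(\psi_{RA})$, $\psi_{RA}$ is a pure state with the dimension of $R$ the same as the dimension of $A$, and the generalized mutual information $\boldsymbol{I}(A;B)_\rho$ of a bipartite state $\rho_{AB}$ has been defined in (7.3.14).
2. The generalized coherent information of $\mathcal{N}$, denoted by $\boldsymbol{I}^c(\mathcal{N})$, is defined as
\begin{equation}
\begin{aligned}
\boldsymbol{I}^c(\mathcal{N}) &:= \sup_{\psi_{RA}}\boldsymbol{I}(R\rangle B)_\omega\\
&= \sup_{\psi_{RA}}\inf_{\sigma_B}\boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{RA})\Vert\mathbb{1}_R\otimes\sigma_B),
\end{aligned}
\tag{7.11.55}
\end{equation}
where $\omega_{RB} = \mathcal{N}_{A\to B}(\psi_{RA})$, $\psi_{RA}$ is a pure state with the dimension of $R$ the same as the dimension of $A$, and the generalized coherent information $\boldsymbol{I}(A\rangle B)_\rho$ of a bipartite state $\rho_{AB}$ has been defined in (7.3.13).
3. The generalized Holevo information of $\mathcal{N}$, denoted by $\boldsymbol{\chi}(\mathcal{N})$, is defined as
\begin{equation}
\begin{aligned}
\boldsymbol{\chi}(\mathcal{N}) &:= \sup_{\rho_{XA}}\boldsymbol{I}(X;B)_\omega\\
&= \sup_{\rho_{XA}}\inf_{\sigma_B}\boldsymbol{D}(\mathcal{N}_{A\to B}(\rho_{XA})\Vert\rho_X\otimes\sigma_B),
\end{aligned}
\tag{7.11.56}
\end{equation}
where $\omega_{XB} = \mathcal{N}_{A\to B}(\rho_{XA})$. Here, $\rho_{XA} = \sum_{x\in\mathcal{X}}p(x)|x\rangle\langle x|_X\otimes\rho^x_A$ is a classical–quantum state, with $\mathcal{X}$ a finite alphabet with corresponding $|\mathcal{X}|$-dimensional system $X$, $\{\rho^x_A\}_{x\in\mathcal{X}}$ is a set of states, and $p : \mathcal{X} \to [0,1]$ is a probability distribution on $\mathcal{X}$. The supremum is additionally over the classical system $X$; i.e., no restriction is made on the size of $\mathcal{X}$.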

The generalized mutual information $\boldsymbol{I}(\mathcal{N})$ and the generalized coherent information $\boldsymbol{I}^c(\mathcal{N})$ are both defined in terms of an optimization over pure states. It is straightforward to show that this is without loss of generality, i.e., that nothing is gained by optimizing over mixed states; the argument is similar to that in (7.11.4)–(7.11.7) above.
For the generalized mutual information of a covariant channel, we can prove a result that is analogous to Proposition 7.84.

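Proposition 7.86 Generalized Mutual Information for Covariant Channels
Let $\mathcal{N}_{A\to B}$ be a $G$-covariant quantum channel for a finite group $G$. Then, for all pure states $\psi_{RA}$, with the dimension of $R$ equal to the dimension of $A$, we have that
\[ \boldsymbol{I}(R;B)_\omega \leq \boldsymbol{I}(R;B)_{\overline{\omega}}, \tag{7.11.57} \]
where $\omega_{RB} := \mathcal{N}_{A\to B}(\psi_{RA})$, $\overline{\omega}_{RB} := \mathcal{N}_{A\to B}(\phi^{\overline{\rho}}_{RA})$,
\[ \overline{\rho}_A = \frac{1}{|G|}\sum_{g\in G}U_A^g\rho_A U_A^{g\dagger} =: \mathcal{T}_G(\rho_A), \tag{7.11.58} \]
$\rho_A = \psi_A = \operatorname{Tr}_R[\psi_{RA}]$, and $\phi^{\overline{\rho}}_{RA}$ is a purification of $\overline{\rho}_A$. Consequently,
\[ \boldsymbol{I}(\mathcal{N}) = \sup_{\phi_{RA}}\{\boldsymbol{I}(R;B)_\omega : \phi_A = \mathcal{T}_G(\phi_A),\ \omega_{RB} = \mathcal{N}_{A\to B}(\phi_{RA})\}. \tag{7.11.59} \]
In other words, in order to calculate $\boldsymbol{I}(\mathcal{N})$, it suffices to optimize over pure states $\psi_{RA}$ such that the reduced state $\psi_A$ is invariant under the channel $\mathcal{T}_G$ defined in (7.11.58).
If the representation $\{U_A^g\}_{g\in G}$ is irreducible, then $\boldsymbol{I}(\mathcal{N})$ is equal to the generalized mutual information of the Choi state of the channel, i.e.,
\[ \boldsymbol{I}(\mathcal{N}) = \boldsymbol{I}(R;B)_{\rho^{\mathcal{N}}}. \tag{7.11.60} \]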

Proof: The proof is similar to the proof of Proposition 7.84 and uses some steps
therein.
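The inequality
\[ \boldsymbol{I}(\mathcal{N}) \geq \sup_{\phi_{RA}}\{\boldsymbol{I}(R;B)_\omega : \phi_A = \mathcal{T}_G(\phi_A),\ \omega_{RB} = \mathcal{N}_{A\to B}(\phi_{RA})\} \tag{7.11.61} \]
holds simply by restricting the optimization in the definition of $\boldsymbol{I}(\mathcal{N})$ to pure states $\phi_{RA}$ whose reduced states $\phi_A$ are invariant under the channel $\mathcal{T}_G$. The remainder of the proof is devoted to showing that the reverse inequality holds as well.
Let $\psi_{RA}$ be an arbitrary pure state, let $\rho_A = \psi_A$, and let $\overline{\rho}_A = \mathcal{T}_G(\rho_A)$. Furthermore, let $\phi^{\overline{\rho}}_{RA}$ be a purification of $\overline{\rho}_A$. Let $\mathcal{M}_{A\to B}$ be the replacement channel that traces out the $A$ system and replaces it with a state $\sigma_B$. Now following the steps in (7.11.35)–(7.11.41), and incorporating the covariance of the channel $\mathcal{N}$, we conclude that
\begin{align}
&\boldsymbol{D}(\mathcal{N}_{A\to B}(\phi^{\overline{\rho}}_{RA})\Vert\mathcal{M}_{A\to B}(\phi^{\overline{\rho}}_{RA})) \notag\\
&\quad= \boldsymbol{D}(\mathcal{N}_{A\to B}(\phi^{\overline{\rho}}_{RA})\Vert\phi^{\overline{\rho}}_R\otimes\sigma_B) \tag{7.11.62}\\
&\quad\geq \boldsymbol{D}\!\left(\frac{1}{|G|}\sum_{g\in G}|g\rangle\langle g|_{R'}\otimes\mathcal{N}_{A\to B}(\psi_{RA})\,\Big\Vert\,\frac{1}{|G|}\sum_{g\in G}|g\rangle\langle g|_{R'}\otimes\psi_R\otimes V_B^{g\dagger}\sigma_B V_B^g\right) \tag{7.11.63}\\
&\quad\geq \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{RA})\Vert\psi_R\otimes\tau_B), \tag{7.11.64}
\end{align}
where the last line follows from the data-processing inequality under the partial-trace channel $\operatorname{Tr}_{R'}$ and we let $\tau_B := \frac{1}{|G|}\sum_{g\in G}V_B^{g\dagger}\sigma_B V_B^g$. By taking the infimum over all states $\tau_B$ on the right-hand side of the inequality above, we find that
\begin{align}
\boldsymbol{D}(\mathcal{N}_{A\to B}(\phi^{\overline{\rho}}_{RA})\Vert\phi^{\overline{\rho}}_R\otimes\sigma_B) &\geq \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{RA})\Vert\psi_R\otimes\tau_B) \tag{7.11.65}\\
&\geq \inf_{\tau_B}\boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{RA})\Vert\psi_R\otimes\tau_B) \tag{7.11.66}\\
&= \boldsymbol{I}(R;B)_\omega, \tag{7.11.67}
\end{align}
where $\omega_{RB} = \mathcal{N}_{A\to B}(\psi_{RA})$. The inequality above holds for all states $\psi_{RA}$ and all states $\sigma_B$. Therefore, optimizing over all states $\sigma_B$ on the left-hand side of the above inequality leads to
\[ \inf_{\sigma_B}\boldsymbol{D}(\mathcal{N}_{A\to B}(\phi^{\overline{\rho}}_{RA})\Vert\phi^{\overline{\rho}}_R\otimes\sigma_B) = \boldsymbol{I}(R;B)_{\overline{\omega}} \geq \boldsymbol{I}(R;B)_\omega, \tag{7.11.68} \]
where $\overline{\omega}_{RB} = \mathcal{N}_{A\to B}(\phi^{\overline{\rho}}_{RA})$. Thus, we conclude (7.11.57).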
Next, by construction, the state $\phi^{\overline{\rho}}_{RA}$ is such that its reduced state on $A$ is invariant under the channel $\mathcal{T}_G$. Optimizing over all such states leads to
\[ \sup_{\phi_{RA}}\{\boldsymbol{I}(R;B)_\omega : \phi_A = \mathcal{T}_G(\phi_A),\ \omega_{RB} = \mathcal{N}_{A\to B}(\phi_{RA})\} \geq \boldsymbol{I}(R;B)_\omega. \tag{7.11.69} \]
Since this inequality holds for all pure states $\psi_{RA}$, we finally obtain
\[ \sup_{\phi_{RA}}\{\boldsymbol{I}(R;B)_\omega : \phi_A = \mathcal{T}_G(\phi_A),\ \omega_{RB} = \mathcal{N}_{A\to B}(\phi_{RA})\} \geq \sup_{\psi_{RA}}\boldsymbol{I}(R;B)_\omega = \boldsymbol{I}(\mathcal{N}), \tag{7.11.70} \]
as required.
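To prove (7.11.60), note that if $\{U_A^g\}_{g\in G}$ is irreducible, then for every state $\psi_{RA}$, the state $\rho_A = \psi_A$ satisfies $\overline{\rho}_A = \mathcal{T}_G(\rho_A) = \mathbb{1}_A/d_A$. Then, since the maximally entangled state is a purification of the maximally mixed state, we let $\phi^{\overline{\rho}}_{RA} = \Phi_{RA}$, which implies via (7.11.68) that
\[ \inf_{\sigma_B}\boldsymbol{D}(\mathcal{N}_{A\to B}(\Phi_{RA})\Vert\pi_R\otimes\sigma_B) = \boldsymbol{I}(R;B)_{\rho^{\mathcal{N}}} \geq \boldsymbol{I}(R;B)_\omega, \tag{7.11.71} \]
where $\omega_{RB} = \mathcal{N}_{A\to B}(\psi_{RA})$. Since the pure state $\psi_{RA}$ is arbitrary, we have that
\[ \boldsymbol{I}(R;B)_{\rho^{\mathcal{N}}} \geq \sup_{\psi_{RA}}\boldsymbol{I}(R;B)_\omega = \boldsymbol{I}(\mathcal{N}). \tag{7.11.72} \]
The reverse inequality holds simply by restricting the optimization in the definition of $\boldsymbol{I}(\mathcal{N})$ to the maximally entangled state $\Phi_{RA}$. We thus have (7.11.60), as required. ■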

The following proposition is helpful in simplifying the computation of the

generalized Holevo information 𝝌(N) of a quantum channel N:

Proposition 7.87
Let $\mathcal{N}$ be a quantum channel. To compute its generalized Holevo information $\boldsymbol{\chi}(\mathcal{N})$, as defined in (7.11.56), it suffices to optimize over ensembles consisting of pure states. If the underlying generalized divergence is continuous, then no more than $d^2$ pure states are needed for the optimization, where $d$ is the dimension of the input space of $\mathcal{N}$.

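Proof: Let $\rho_{XA}$ be a classical–quantum state of the form
\[ \rho_{XA} = \sum_{x\in\mathcal{X}}p(x)|x\rangle\langle x|_X\otimes\rho^x_A, \tag{7.11.73} \]
where $\mathcal{X}$ is a finite alphabet with associated $|\mathcal{X}|$-dimensional system $X$, $p : \mathcal{X} \to [0,1]$ is a probability distribution on $\mathcal{X}$, and $\{\rho^x_A\}_{x\in\mathcal{X}}$ is a set of states.
First, by simply restricting the optimization in the definition of the Holevo information to ensembles containing pure states only, we obtain
\[ \boldsymbol{\chi}(\mathcal{N}) = \sup_{\rho_{XA}}\boldsymbol{I}(X;B)_{\mathcal{N}_{A\to B}(\rho_{XA})} \geq \sup_{\tau_{ZA}}\boldsymbol{I}(Z;B)_{\mathcal{N}_{A\to B}(\tau_{ZA})}, \tag{7.11.74} \]
where $\tau_{ZA}$ is a classical–quantum state consisting only of pure states, i.e.,
\[ \tau_{ZA} = \sum_{z\in\mathcal{Z}}q(z)|z\rangle\langle z|_Z\otimes|\psi^z\rangle\langle\psi^z|_A, \tag{7.11.75} \]
with $q$ a probability distribution on a finite alphabet $\mathcal{Z}$.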
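Now, let each state $\rho^x_A$ in the classical–quantum state $\rho_{XA}$ have a spectral decomposition of the form
\[ \rho^x_A = \sum_{y=1}^{r_x}\lambda^x_y|\phi^x_y\rangle\langle\phi^x_y|, \tag{7.11.76} \]
where $r_x = \operatorname{rank}(\rho^x_A)$. So $\rho_{XA}$ can be written as
\[ \rho_{XA} = \sum_{x\in\mathcal{X}}\sum_{y=1}^{r_x}p(x)\lambda^x_y|x\rangle\langle x|_X\otimes|\phi^x_y\rangle\langle\phi^x_y|_A. \tag{7.11.77} \]
Now, let us define the state
\[ \omega_{XYA} = \sum_{x\in\mathcal{X}}\sum_{y=1}^{r_x}p(x)\lambda^x_y|x\rangle\langle x|_X\otimes|y\rangle\langle y|_Y\otimes|\phi^x_y\rangle\langle\phi^x_y|_A. \tag{7.11.78} \]
Then, we have that
\[ \rho_{XA} = \operatorname{Tr}_Y[\omega_{XYA}]. \tag{7.11.79} \]
Therefore, by the data-processing inequality for the generalized divergence, we find that
\begin{align}
\boldsymbol{I}(X;B)_{\mathcal{N}(\rho)} &= \inf_{\sigma_B}\boldsymbol{D}(\mathcal{N}_{A\to B}(\rho_{XA})\Vert\rho_X\otimes\sigma_B) \tag{7.11.80}\\
&= \inf_{\sigma_B}\boldsymbol{D}(\operatorname{Tr}_Y[\mathcal{N}_{A\to B}(\omega_{XYA})]\Vert\operatorname{Tr}_Y[\omega_{XY}]\otimes\sigma_B) \tag{7.11.81}\\
&\leq \inf_{\sigma_B}\boldsymbol{D}(\mathcal{N}_{A\to B}(\omega_{XYA})\Vert\omega_{XY}\otimes\sigma_B) \tag{7.11.82}\\
&= \boldsymbol{I}(XY;B)_{\mathcal{N}(\omega)}. \tag{7.11.83}
\end{align}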
Since $\omega_{XYA}$ is a classical–quantum state with pure states, it holds that
\[ \boldsymbol{I}(XY;B)_{\mathcal{N}(\omega)} \leq \sup_{\tau_{ZA}}\boldsymbol{I}(Z;B)_{\mathcal{N}(\tau)}. \tag{7.11.84} \]
Therefore,
\[ \boldsymbol{\chi}(\mathcal{N}) = \sup_{\rho_{XA}}\boldsymbol{I}(X;B)_{\mathcal{N}_{A\to B}(\rho_{XA})} \leq \sup_{\tau_{ZA}}\boldsymbol{I}(Z;B)_{\mathcal{N}_{A\to B}(\tau_{ZA})}, \tag{7.11.85} \]
which means that
\[ \boldsymbol{\chi}(\mathcal{N}) = \sup_{\tau_{ZA}}\boldsymbol{I}(Z;B)_{\mathcal{N}_{A\to B}(\tau_{ZA})}, \tag{7.11.86} \]
as required.
When the underlying generalized divergence is continuous, the fact that the alphabet $\mathcal{Z}$ of the classical–quantum states $\tau_{ZA}$ need not exceed $d^2$ elements is due to the Fenchel–Eggleston–Carathéodory Theorem (Theorem 2.23) and the fact that the dimension of the space of density operators on a $d$-dimensional space is $d^2$. ■

The information measures for channels on which we primarily focus in this book
are those based on the following generalized divergences: the quantum relative
entropy, the Petz–Rényi relative entropy, the sandwiched Rényi relative entropy,
and the hypothesis testing relative entropy. Specifically, given a channel N 𝐴→𝐵 , we
are interested in the following mutual information quantities. In each case, 𝜓 𝑅 𝐴
is a pure state with the dimension of the system 𝑅 the same as that of 𝐴, the state
𝜔 𝑅𝐵 = N 𝐴→𝐵 (𝜓 𝑅 𝐴 ), and 𝜎𝐵 is a state.
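1. $\varepsilon$-hypothesis testing mutual information of $\mathcal{N}$, defined for $\varepsilon \in [0,1]$ as
\[ I_H^\varepsilon(\mathcal{N}) := \sup_{\psi_{RA}}I_H^\varepsilon(R;B)_\omega, \tag{7.11.87} \]
where
\[ I_H^\varepsilon(A;B)_\rho := \inf_{\sigma_B}D_H^\varepsilon(\rho_{AB}\Vert\rho_A\otimes\sigma_B) \tag{7.11.88} \]
is the $\varepsilon$-hypothesis testing mutual information of the bipartite state $\rho_{AB}$.
2. Petz–Rényi mutual information of $\mathcal{N}$:
\[ I_\alpha(\mathcal{N}) := \sup_{\psi_{RA}}I_\alpha(R;B)_\omega \quad \forall\,\alpha \in [0,1)\cup(1,2], \tag{7.11.89} \]
where
\[ I_\alpha(A;B)_\rho := \inf_{\sigma_B}D_\alpha(\rho_{AB}\Vert\rho_A\otimes\sigma_B) \tag{7.11.90} \]
is the Petz–Rényi mutual information of the bipartite state $\rho_{AB}$.
3. Sandwiched Rényi mutual information of $\mathcal{N}$:
\[ \widetilde{I}_\alpha(\mathcal{N}) := \sup_{\psi_{RA}}\widetilde{I}_\alpha(R;B)_\omega \quad \forall\,\alpha \in [1/2,1)\cup(1,\infty), \tag{7.11.91} \]
where
\[ \widetilde{I}_\alpha(A;B)_\rho := \inf_{\sigma_B}\widetilde{D}_\alpha(\rho_{AB}\Vert\rho_A\otimes\sigma_B) \tag{7.11.92} \]
is the sandwiched Rényi mutual information of the bipartite state $\rho_{AB}$.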

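For each of these quantities, we are also interested in the special case of classical–quantum states. If $\mathcal{X}$ is a finite alphabet with corresponding $|\mathcal{X}|$-dimensional system $X$, $\{\rho^x_A\}_{x\in\mathcal{X}}$ is a set of states, $p : \mathcal{X} \to [0,1]$ a probability distribution on $\mathcal{X}$, and $\rho_{XA} = \sum_{x\in\mathcal{X}}p(x)|x\rangle\langle x|_X\otimes\rho^x_A$, then for every channel $\mathcal{N}$ we define the following quantities. In each case, we define $\omega_{XB} := \mathcal{N}_{A\to B}(\rho_{XA})$.
1. $\varepsilon$-hypothesis testing Holevo information of $\mathcal{N}$, defined for all $\varepsilon \in [0,1]$ as
\[ \chi_H^\varepsilon(\mathcal{N}) := \sup_{\rho_{XA}}I_H^\varepsilon(X;B)_\omega. \tag{7.11.93} \]
2. Petz–Rényi Holevo information of $\mathcal{N}$:
\[ \chi_\alpha(\mathcal{N}) := \sup_{\rho_{XA}}I_\alpha(X;B)_\omega \quad \forall\,\alpha \in [0,1)\cup(1,2]. \tag{7.11.94} \]
3. Sandwiched Rényi Holevo information of $\mathcal{N}$:
\[ \widetilde{\chi}_\alpha(\mathcal{N}) := \sup_{\rho_{XA}}\widetilde{I}_\alpha(X;B)_\omega \quad \forall\,\alpha \in [1/2,1)\cup(1,\infty). \tag{7.11.95} \]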

We are also interested in the corresponding coherent information quantities. In each case, $\psi_{RA}$ is a pure state with the dimension of the system $R$ the same as that of $A$, the state $\omega_{RB} = \mathcal{N}_{A\to B}(\psi_{RA})$, and $\sigma_B$ is a state. The coherent information quantities are defined as follows for every quantum channel $\mathcal{N}_{A\to B}$:
1. $\varepsilon$-hypothesis testing coherent information of $\mathcal{N}$, defined for all $\varepsilon \in [0,1]$ as
\[ I_H^{c,\varepsilon}(\mathcal{N}) := \sup_{\psi_{RA}}I_H^\varepsilon(R\rangle B)_\omega, \tag{7.11.96} \]
where
\[ I_H^\varepsilon(A\rangle B)_\rho := \inf_{\sigma_B}D_H^\varepsilon(\rho_{AB}\Vert\mathbb{1}_A\otimes\sigma_B) \tag{7.11.97} \]
is the $\varepsilon$-hypothesis testing coherent information of the bipartite state $\rho_{AB}$.
2. Petz–Rényi coherent information of $\mathcal{N}$:
\[ I_\alpha^c(\mathcal{N}) := \sup_{\psi_{RA}}I_\alpha(R\rangle B)_\omega \quad \forall\,\alpha \in [0,1)\cup(1,2], \tag{7.11.98} \]
where
\[ I_\alpha(A\rangle B)_\rho := \inf_{\sigma_B}D_\alpha(\rho_{AB}\Vert\mathbb{1}_A\otimes\sigma_B) \tag{7.11.99} \]
is the Petz–Rényi coherent information of the bipartite state $\rho_{AB}$.
3. Sandwiched Rényi coherent information of $\mathcal{N}$:
\[ \widetilde{I}_\alpha^c(\mathcal{N}) := \sup_{\psi_{RA}}\widetilde{I}_\alpha(R\rangle B)_\omega \quad \forall\,\alpha \in [1/2,1)\cup(1,\infty), \tag{7.11.100} \]
where
\[ \widetilde{I}_\alpha(A\rangle B)_\rho := \inf_{\sigma_B}\widetilde{D}_\alpha(\rho_{AB}\Vert\mathbb{1}_A\otimes\sigma_B) \tag{7.11.101} \]
is the sandwiched Rényi coherent information of the bipartite state $\rho_{AB}$.
For all of the quantities defined above, we define the corresponding quantities
based on the quantum relative entropy by taking the limit 𝛼 → 1. The key such
quantities of interest in this book are the following:
1. Mutual information of N, denoted by 𝐼 (N) and defined as
𝐼 (N) B sup 𝐼 (𝑅; 𝐵)𝜔 , (7.11.102)
𝜓𝑅 𝐴

where 𝜔 𝑅𝐵 = N 𝐴→𝐵 (𝜓 𝑅 𝐴 ), and we recall from (7.2.96) that the mutual

information 𝐼 ( 𝐴; 𝐵) 𝜌 of a bipartite state 𝜌 𝐴𝐵 is given by
𝐼 ( 𝐴; 𝐵) 𝜌 = 𝐻 ( 𝐴) 𝜌 + 𝐻 (𝐵) 𝜌 − 𝐻 ( 𝐴𝐵) 𝜌 (7.11.103)
= 𝐷 (𝜌 𝐴𝐵 ∥ 𝜌 𝐴 ⊗ 𝜌 𝐵 ) (7.11.104)
= inf 𝐷 (𝜌 𝐴𝐵 ∥ 𝜌 𝐴 ⊗ 𝜎𝐵 ), (7.11.105)
𝜎𝐵

where the optimization in the last line is over states 𝜎𝐵 .

486
Chapter 7: Quantum Entropies and Information

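2. Holevo information of $\mathcal{N}$, denoted by $\chi(\mathcal{N})$ and defined as

\[
\chi(\mathcal{N}) := \sup_{\rho_{XA}} I(X;B)_\omega, \tag{7.11.106}
\]

where $\omega_{XB} = \mathcal{N}_{A\to B}(\rho_{XA})$, and the supremum is over all classical-quantum states of the form $\rho_{XA} = \sum_{x\in\mathcal{X}} p(x)\,|x\rangle\!\langle x|_X \otimes \rho_A^x$, with $\mathcal{X}$ a finite alphabet with associated $|\mathcal{X}|$-dimensional system $X$, $\{\rho_A^x\}_{x\in\mathcal{X}}$ a set of states, and $p : \mathcal{X} \to [0,1]$ a probability distribution on $\mathcal{X}$.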
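3. Coherent information of $\mathcal{N}$, denoted by $I^c(\mathcal{N})$ and defined as

\[
I^c(\mathcal{N}) := \sup_{\psi_{RA}} I(R\rangle B)_\omega, \tag{7.11.107}
\]

where $\omega_{RB} = \mathcal{N}_{A\to B}(\psi_{RA})$, and we recall from (7.2.91) that the coherent information $I(A\rangle B)_\rho$ of a bipartite state $\rho_{AB}$ is given by

\begin{align*}
I(A\rangle B)_\rho &= H(B)_\rho - H(AB)_\rho \tag{7.11.108}\\
&= D(\rho_{AB} \Vert \mathbb{1}_A \otimes \rho_B) \tag{7.11.109}\\
&= \inf_{\sigma_B} D(\rho_{AB} \Vert \mathbb{1}_A \otimes \sigma_B), \tag{7.11.110}
\end{align*}

where the optimization in the last line is over states $\sigma_B$.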

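Using (7.11.108), we can write the coherent information of the channel $\mathcal{N}$ as

\[
I^c(\mathcal{N}) = \sup_{\psi_{RA}} \{ H(B)_\omega - H(RB)_\omega \}. \tag{7.11.111}
\]

Given a Stinespring representation of $\mathcal{N}$, so that $\mathcal{N}(\rho) = \operatorname{Tr}_E[V\rho V^\dagger]$, where $V : \mathcal{H}_A \to \mathcal{H}_B \otimes \mathcal{H}_E$ is an isometry such that $d_E \geq \operatorname{rank}(\Gamma^{\mathcal{N}})$, observe that

\[
H(RB)_\omega = H(E)_\tau, \tag{7.11.112}
\]

where $\tau_E = \mathcal{N}^c(\psi_A)$. This is due to the fact that the state $\phi_{RBE} := V\psi_{RA}V^\dagger$ is pure, meaning that $\omega_{RB}$ and $\tau_E = \operatorname{Tr}_{RB}[V\psi_{RA}V^\dagger] = \mathcal{N}^c(\psi_A)$ have the same spectrum. Furthermore, the state on which $H(B)_\omega$ is evaluated is equal to $\mathcal{N}(\psi_A)$. Therefore, we have that

\[
I^c(\mathcal{N}) = \sup_{\rho} \{ H(\mathcal{N}(\rho)) - H(\mathcal{N}^c(\rho)) \}, \tag{7.11.113}
\]

where the optimization is over all states $\rho$.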
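
As a rough numerical illustration of (7.11.113), the following Python sketch evaluates $H(\mathcal{N}(\rho)) - H(\mathcal{N}^c(\rho))$ from a Stinespring isometry. The choice of the qubit amplitude damping channel, its Kraus operators, and all function names are assumptions made here for illustration only. The grid search is restricted to diagonal inputs, so in general it only yields a lower bound on the supremum.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits (zero eigenvalues are skipped)."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def amplitude_damping_isometry(gamma):
    """Isometry V: A -> B (x) E built from two Kraus operators (illustrative choice)."""
    K0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - gamma)]])
    K1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    e0, e1 = np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])
    # V = sum_k K_k (x) |k>_E, output index ordered as (B, E)
    return np.kron(K0, e0) + np.kron(K1, e1)

def channel_and_complement(V, rho, d_B, d_E):
    """Return N(rho) = Tr_E[V rho V^dag] and N^c(rho) = Tr_B[V rho V^dag]."""
    full = (V @ rho @ V.conj().T).reshape(d_B, d_E, d_B, d_E)
    out_B = np.einsum('iaja->ij', full)  # trace over E
    out_E = np.einsum('aiaj->ij', full)  # trace over B
    return out_B, out_E

def coherent_information_at(V, rho, d_B, d_E):
    """Objective of (7.11.113) for one fixed input state rho."""
    out_B, out_E = channel_and_complement(V, rho, d_B, d_E)
    return entropy(out_B) - entropy(out_E)

def coherent_information_diagonal_inputs(V, d_B, d_E, num=201):
    """Grid search over diagonal qubit inputs; a lower bound on the supremum in general."""
    grid = np.linspace(0.0, 1.0, num)
    return max(coherent_information_at(V, np.diag([1.0 - p, p]), d_B, d_E) for p in grid)

if __name__ == "__main__":
    for gamma in (0.0, 0.25, 0.5):
        V = amplitude_damping_isometry(gamma)
        print(gamma, coherent_information_diagonal_inputs(V, 2, 2))
```

Working with the isometry directly avoids constructing the complementary channel separately: both marginals come from the same operator $V\rho V^\dagger$.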


7.11.1 Simplified Formulas for Rényi Information Measures

In this section, we provide some simplified formulas for the Petz–Rényi information
quantities for general bipartite states and for all Rényi information quantities for
pure bipartite states.

Proposition 7.88 Quantum Sibson Identities

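Let $\rho_{AB}$ be a bipartite state. Then the Petz–Rényi mutual and coherent informations simplify as follows for all $\alpha \in (0,1) \cup (1,\infty)$:

\begin{align*}
I_\alpha(A;B)_\rho &= \frac{\alpha}{\alpha-1} \log_2 \operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\rho_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}\right], \tag{7.11.114}\\
I_\alpha(A\rangle B)_\rho &= \frac{\alpha}{\alpha-1} \log_2 \operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}]\right)^{\frac{1}{\alpha}}\right]. \tag{7.11.115}
\end{align*}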

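Proof: We show both identities with a unified approach. Let $\tau_A$ be a positive semi-definite operator, let $\sigma_B$ be a state, and let $\omega_B(\alpha)$ denote the following state:

\[
\omega_B(\alpha) := \frac{\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}}{\operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}\right]}, \tag{7.11.116}
\]

so that

\[
\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\right)^{\frac{1}{\alpha}} = \operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}\right] \cdot \omega_B(\alpha). \tag{7.11.117}
\]
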
We first prove that

𝐷 𝛼 (𝜌 𝐴𝐵 ∥𝜏𝐴 ⊗ 𝜎𝐵 ) = 𝐷 𝛼 (𝜌 𝐴𝐵 ∥𝜏𝐴 ⊗ 𝜔 𝐵 (𝛼)) + 𝐷 𝛼 (𝜔 𝐵 (𝛼)∥𝜎𝐵 ) (7.11.118)

≥ 𝐷 𝛼 (𝜌 𝐴𝐵 ∥𝜏𝐴 ⊗ 𝜔 𝐵 (𝛼)), (7.11.119)

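where the inequality follows because $D_\alpha(\omega_B(\alpha)\Vert\sigma_B) \geq 0$ for all states. Consider that

\begin{align*}
Q_\alpha(\rho_{AB}\Vert\tau_A\otimes\sigma_B)
&= \operatorname{Tr}[\rho_{AB}^{\alpha}(\tau_A\otimes\sigma_B)^{1-\alpha}] \tag{7.11.120}\\
&= \operatorname{Tr}[\rho_{AB}^{\alpha}(\tau_A^{1-\alpha}\otimes\sigma_B^{1-\alpha})] \tag{7.11.121}\\
&= \operatorname{Tr}_B[\operatorname{Tr}_A[\rho_{AB}^{\alpha}(\tau_A^{1-\alpha}\otimes\sigma_B^{1-\alpha})]] \tag{7.11.122}\\
&= \operatorname{Tr}[\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\,\sigma_B^{1-\alpha}] \tag{7.11.123}\\
&= \operatorname{Tr}\!\left[\left(\operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}\right]\cdot\omega_B(\alpha)\right)^{\alpha}\sigma_B^{1-\alpha}\right] \tag{7.11.124}\\
&= \left(\operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}\right]\right)^{\alpha}\cdot\operatorname{Tr}[\omega_B(\alpha)^{\alpha}\sigma_B^{1-\alpha}]. \tag{7.11.125}
\end{align*}

Applying the function $(\cdot)\mapsto\frac{1}{\alpha-1}\log_2(\cdot)$ to both sides, we conclude that

\[
D_\alpha(\rho_{AB}\Vert\tau_A\otimes\sigma_B) = \frac{\alpha}{\alpha-1}\log_2\operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}\right] + D_\alpha(\omega_B(\alpha)\Vert\sigma_B). \tag{7.11.126}
\]
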
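Now consider that

\begin{align*}
Q_\alpha(\rho_{AB}\Vert\tau_A\otimes\omega_B(\alpha))
&= \operatorname{Tr}[\rho_{AB}^{\alpha}(\tau_A^{1-\alpha}\otimes\omega_B(\alpha)^{1-\alpha})] \tag{7.11.127}\\
&= \operatorname{Tr}[\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\,\omega_B(\alpha)^{1-\alpha}] \tag{7.11.128}\\
&= \operatorname{Tr}\!\left[\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\left(\frac{\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}}{\operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}\right]}\right)^{1-\alpha}\right] \tag{7.11.129}\\
&= \left(\operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}\right]\right)^{\alpha-1}\operatorname{Tr}\!\left[\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\right)^{\frac{1-\alpha}{\alpha}}\right] \tag{7.11.130}\\
&= \left(\operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}\right]\right)^{\alpha-1}\operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}\right] \tag{7.11.131}\\
&= \left(\operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}\right]\right)^{\alpha}. \tag{7.11.132}
\end{align*}

Now applying the function $(\cdot)\mapsto\frac{1}{\alpha-1}\log_2(\cdot)$ to both sides, we conclude that

\[
D_\alpha(\rho_{AB}\Vert\tau_A\otimes\omega_B(\alpha)) = \frac{\alpha}{\alpha-1}\log_2\operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}\right]. \tag{7.11.133}
\]
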
So this establishes (7.11.118). We then conclude from (7.11.119) that

\[
\inf_{\sigma_B} D_\alpha(\rho_{AB}\Vert\tau_A\otimes\sigma_B) = D_\alpha(\rho_{AB}\Vert\tau_A\otimes\omega_B(\alpha)), \tag{7.11.134}
\]

because the lower bound in (7.11.119) is achieved by picking $\sigma_B = \omega_B(\alpha)$. So this establishes that

\[
\inf_{\sigma_B} D_\alpha(\rho_{AB}\Vert\tau_A\otimes\sigma_B) = \frac{\alpha}{\alpha-1}\log_2\operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\rho_{AB}^{\alpha}\tau_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}\right]. \tag{7.11.135}
\]

We conclude the formula in (7.11.114) by setting $\tau_A = \rho_A$, and we conclude the formula in (7.11.115) by setting $\tau_A = \mathbb{1}_A$. ■
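
The decomposition in (7.11.118) and the closed form in (7.11.135) can be checked numerically. The Python sketch below is only an illustration under stated assumptions: it uses randomly generated full-rank states, and all helper names (`mpow`, `petz_renyi`, `sibson`, and so on) are hypothetical and do not come from the text.

```python
import numpy as np

def mpow(M, p):
    """Power of a positive definite Hermitian matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * w ** p) @ U.conj().T

def random_state(d, rng):
    """Random full-rank density matrix (illustrative sampling choice)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def trace_out_A(M, dA, dB):
    """Partial trace over the first tensor factor."""
    return np.einsum('aiaj->ij', M.reshape(dA, dB, dA, dB))

def petz_renyi(rho, sigma, alpha):
    """D_alpha(rho||sigma) = log2 Tr[rho^alpha sigma^(1-alpha)] / (alpha - 1)."""
    Q = np.trace(mpow(rho, alpha) @ mpow(sigma, 1 - alpha)).real
    return float(np.log2(Q) / (alpha - 1))

def sibson(rho_AB, tau_A, alpha, dA, dB):
    """Closed form in (7.11.135) together with the optimizer omega_B(alpha) of (7.11.116)."""
    X = trace_out_A(mpow(rho_AB, alpha) @ np.kron(mpow(tau_A, 1 - alpha), np.eye(dB)), dA, dB)
    X = (X + X.conj().T) / 2          # remove numerical anti-Hermitian noise
    Y = mpow(X, 1 / alpha)
    t = np.trace(Y).real
    return float(alpha / (alpha - 1) * np.log2(t)), Y / t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dA, dB, alpha = 2, 3, 1.5
    rho_AB, sigma_B = random_state(dA * dB, rng), random_state(dB, rng)
    tau_A = np.eye(dA)                 # tau_A = 1_A gives the coherent-information case
    closed, omega_B = sibson(rho_AB, tau_A, alpha, dA, dB)
    lhs = petz_renyi(rho_AB, np.kron(tau_A, sigma_B), alpha)
    # lhs should equal closed + D_alpha(omega_B || sigma_B), as in (7.11.118)
    print(lhs, closed + petz_renyi(omega_B, sigma_B, alpha))
```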

Proposition 7.89 Rényi Information Measures for Pure Bipartite States

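Let $\psi_{AB}$ be a pure bipartite state, and let $\alpha \in (0,1) \cup (1,\infty)$. Then the Petz–, sandwiched, and geometric Rényi mutual informations simplify as follows:

\begin{align*}
I_\alpha(A;B)_\psi &= 2H_{\frac{2-\alpha}{\alpha}}(A)_\psi, \tag{7.11.136}\\
\widetilde{I}_\alpha(A;B)_\psi &= 2H_{\frac{1}{2\alpha-1}}(A)_\psi, \tag{7.11.137}\\
\widehat{I}_\alpha(A;B)_\psi &= 2H_0(A)_\psi, \tag{7.11.138}
\end{align*}

where the Rényi entropy $H_\alpha(A)$ is defined in (7.4.3). The Petz–, sandwiched, and geometric Rényi coherent informations simplify as follows:

\begin{align*}
I_\alpha(A\rangle B)_\psi &= H_{\frac{1}{\alpha}}(A)_\psi, \tag{7.11.139}\\
\widetilde{I}_\alpha(A\rangle B)_\psi &= H_{\frac{\alpha}{2\alpha-1}}(A)_\psi, \tag{7.11.140}\\
\widehat{I}_\alpha(A\rangle B)_\psi &= H_{\frac{1}{2}}(A)_\psi. \tag{7.11.141}
\end{align*}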

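Proof: We start by proving (7.11.136). We assume that $\psi_{AB}$ is in its Schmidt form, so that without loss of generality the Hilbert spaces for systems $A$ and $B$ are isomorphic and each have dimension equal to the Schmidt rank of $\psi_{AB}$. With $\Gamma_{AB}$ the maximally entangled operator with local bases chosen to match those from the Schmidt decomposition, we have that $\psi_{AB} = \psi_A^{\frac{1}{2}}\Gamma_{AB}\psi_A^{\frac{1}{2}}$, where $\psi_A = \operatorname{Tr}_B[\psi_{AB}]$. We apply the Sibson identity in (7.11.114) to find that

\begin{align*}
2^{\frac{\alpha-1}{\alpha}I_\alpha(A;B)_\psi}
&= \operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\psi_{AB}^{\alpha}\psi_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}\right] = \operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\psi_{AB}\psi_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}\right] \tag{7.11.142}\\
&= \operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\psi_A^{\frac{1}{2}}\Gamma_{AB}\psi_A^{\frac{1}{2}}\psi_A^{1-\alpha}]\right)^{\frac{1}{\alpha}}\right] \tag{7.11.143}\\
&= \operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\Gamma_{AB}\psi_A^{\frac{1}{2}}\psi_A^{1-\alpha}\psi_A^{\frac{1}{2}}]\right)^{\frac{1}{\alpha}}\right] \tag{7.11.144}\\
&= \operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\Gamma_{AB}\psi_A^{2-\alpha}]\right)^{\frac{1}{\alpha}}\right] \tag{7.11.145}\\
&= \operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\Gamma_{AB}(\psi_B^{T})^{2-\alpha}]\right)^{\frac{1}{\alpha}}\right] = \operatorname{Tr}\!\left[\left((\psi_B^{T})^{2-\alpha}\right)^{\frac{1}{\alpha}}\right] \tag{7.11.146}\\
&= \operatorname{Tr}\!\left[\psi_B^{\frac{2-\alpha}{\alpha}}\right] = \operatorname{Tr}\!\left[\psi_A^{\frac{2-\alpha}{\alpha}}\right]. \tag{7.11.147}
\end{align*}

The fourth equality follows from cyclicity of partial trace and the sixth follows from the transpose trick in (2.2.40). Rearranging the first and last lines gives

\begin{align*}
I_\alpha(A;B)_\psi &= \frac{\alpha}{\alpha-1}\log_2\operatorname{Tr}\!\left[\psi_A^{\frac{2-\alpha}{\alpha}}\right] \tag{7.11.148}\\
&= 2\left(\frac{1}{1-\frac{2-\alpha}{\alpha}}\log_2\operatorname{Tr}\!\left[\psi_A^{\frac{2-\alpha}{\alpha}}\right]\right) \tag{7.11.149}\\
&= 2H_{\frac{2-\alpha}{\alpha}}(A)_\psi. \tag{7.11.150}
\end{align*}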

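We now prove (7.11.139). It follows from the Sibson identity in (7.11.115):

\begin{align*}
2^{\frac{\alpha-1}{\alpha}I_\alpha(A\rangle B)_\psi}
&= \operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\psi_{AB}^{\alpha}]\right)^{\frac{1}{\alpha}}\right] = \operatorname{Tr}\!\left[\left(\operatorname{Tr}_A[\psi_{AB}]\right)^{\frac{1}{\alpha}}\right] \tag{7.11.151}\\
&= \operatorname{Tr}\!\left[(\psi_B)^{\frac{1}{\alpha}}\right] = \operatorname{Tr}\!\left[\psi_A^{\frac{1}{\alpha}}\right]. \tag{7.11.152}
\end{align*}

Rearranging this gives

\[
I_\alpha(A\rangle B)_\psi = \frac{\alpha}{\alpha-1}\log_2\operatorname{Tr}\!\left[\psi_A^{\frac{1}{\alpha}}\right] = \frac{1}{1-\frac{1}{\alpha}}\log_2\operatorname{Tr}\!\left[\psi_A^{\frac{1}{\alpha}}\right] = H_{\frac{1}{\alpha}}(A)_\psi. \tag{7.11.153}
\]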

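We now prove (7.11.137). Consider, for an arbitrary state $\sigma_B$, that

\begin{align*}
\widetilde{Q}_\alpha(\psi_{AB}\Vert\psi_A\otimes\sigma_B)
&= \operatorname{Tr}\!\left[\left(\psi_{AB}^{\frac{1}{2}}(\psi_A\otimes\sigma_B)^{\frac{1-\alpha}{\alpha}}\psi_{AB}^{\frac{1}{2}}\right)^{\alpha}\right] \tag{7.11.154}\\
&= \operatorname{Tr}\!\left[\left(|\psi\rangle\!\langle\psi|_{AB}(\psi_A\otimes\sigma_B)^{\frac{1-\alpha}{\alpha}}|\psi\rangle\!\langle\psi|_{AB}\right)^{\alpha}\right] \tag{7.11.155}\\
&= \left(\langle\psi|_{AB}(\psi_A\otimes\sigma_B)^{\frac{1-\alpha}{\alpha}}|\psi\rangle_{AB}\right)^{\alpha}\operatorname{Tr}\!\left[|\psi\rangle\!\langle\psi|_{AB}^{\alpha}\right] \tag{7.11.156}\\
&= \left(\langle\psi|_{AB}(\psi_A\otimes\sigma_B)^{\frac{1-\alpha}{\alpha}}|\psi\rangle_{AB}\right)^{\alpha} \tag{7.11.157}\\
&= \left(\langle\Gamma|_{AB}\,\psi_A^{\frac{1}{2}}\left(\psi_A^{\frac{1-\alpha}{\alpha}}\otimes\sigma_B^{\frac{1-\alpha}{\alpha}}\right)\psi_A^{\frac{1}{2}}\,|\Gamma\rangle_{AB}\right)^{\alpha} \tag{7.11.158}\\
&= \left(\langle\Gamma|_{AB}\,\psi_A^{\frac{1}{\alpha}}\otimes\sigma_B^{\frac{1-\alpha}{\alpha}}\,|\Gamma\rangle_{AB}\right)^{\alpha} \tag{7.11.159}\\
&= \left(\langle\Gamma|_{AB}\,\psi_A^{\frac{1}{\alpha}}[T_A(\sigma_A)]^{\frac{1-\alpha}{\alpha}}\otimes\mathbb{1}_B\,|\Gamma\rangle_{AB}\right)^{\alpha} \tag{7.11.160}\\
&= \left(\operatorname{Tr}\!\left[\psi_A^{\frac{1}{\alpha}}[T_A(\sigma_A)]^{\frac{1-\alpha}{\alpha}}\right]\right)^{\alpha}. \tag{7.11.161}
\end{align*}

Now applying the function $(\cdot)\mapsto\frac{1}{\alpha-1}\log_2(\cdot)$ to both sides, we conclude that

\[
\widetilde{D}_\alpha(\psi_{AB}\Vert\psi_A\otimes\sigma_B) = \frac{\alpha}{\alpha-1}\log_2\operatorname{Tr}\!\left[\psi_A^{\frac{1}{\alpha}}[T_A(\sigma_A)]^{\frac{1-\alpha}{\alpha}}\right], \tag{7.11.162}
\]

and applying Proposition 2.8, we conclude that

\begin{align*}
\widetilde{I}_\alpha(A;B)_\psi &= \inf_{\sigma_B}\widetilde{D}_\alpha(\psi_{AB}\Vert\psi_A\otimes\sigma_B) \tag{7.11.163}\\
&= \inf_{\sigma_A}\frac{\alpha}{\alpha-1}\log_2\operatorname{Tr}\!\left[\psi_A^{\frac{1}{\alpha}}[T_A(\sigma_A)]^{\frac{1-\alpha}{\alpha}}\right] \tag{7.11.164}\\
&= \frac{\alpha}{\alpha-1}\log_2\left\Vert\psi_A^{\frac{1}{\alpha}}\right\Vert_{\frac{\alpha}{2\alpha-1}} \tag{7.11.165}\\
&= \frac{\alpha}{\alpha-1}\log_2\left(\operatorname{Tr}\!\left[\left(\psi_A^{\frac{1}{\alpha}}\right)^{\frac{\alpha}{2\alpha-1}}\right]\right)^{\frac{2\alpha-1}{\alpha}} \tag{7.11.166}\\
&= \frac{\alpha}{\alpha-1}\cdot\frac{2\alpha-1}{\alpha}\log_2\operatorname{Tr}\!\left[\left(\psi_A^{\frac{1}{\alpha}}\right)^{\frac{\alpha}{2\alpha-1}}\right] \tag{7.11.167}\\
&= \frac{2\alpha-1}{\alpha-1}\log_2\operatorname{Tr}\!\left[\psi_A^{\frac{1}{2\alpha-1}}\right] \tag{7.11.168}\\
&= 2\left(\frac{1}{1-\frac{1}{2\alpha-1}}\log_2\operatorname{Tr}\!\left[\psi_A^{\frac{1}{2\alpha-1}}\right]\right) \tag{7.11.169}\\
&= 2H_{\frac{1}{2\alpha-1}}(A)_\psi. \tag{7.11.170}
\end{align*}

The third equality follows from Proposition 2.8.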

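We now prove (7.11.140):

\begin{align*}
\widetilde{Q}_\alpha(\psi_{AB}\Vert\mathbb{1}_A\otimes\sigma_B)
&= \operatorname{Tr}\!\left[\left(\psi_{AB}^{\frac{1}{2}}(\mathbb{1}_A\otimes\sigma_B)^{\frac{1-\alpha}{\alpha}}\psi_{AB}^{\frac{1}{2}}\right)^{\alpha}\right] \tag{7.11.171}\\
&= \operatorname{Tr}\!\left[\left(|\psi\rangle\!\langle\psi|_{AB}\left(\mathbb{1}_A\otimes\sigma_B^{\frac{1-\alpha}{\alpha}}\right)|\psi\rangle\!\langle\psi|_{AB}\right)^{\alpha}\right] \tag{7.11.172}\\
&= \left(\langle\psi|_{AB}\,\mathbb{1}_A\otimes\sigma_B^{\frac{1-\alpha}{\alpha}}\,|\psi\rangle_{AB}\right)^{\alpha}\operatorname{Tr}\!\left[(|\psi\rangle\!\langle\psi|_{AB})^{\alpha}\right] \tag{7.11.173}\\
&= \left(\langle\psi|_{AB}\,\mathbb{1}_A\otimes\sigma_B^{\frac{1-\alpha}{\alpha}}\,|\psi\rangle_{AB}\right)^{\alpha} \tag{7.11.174}\\
&= \left(\langle\Gamma|_{AB}\,\psi_A\otimes\sigma_B^{\frac{1-\alpha}{\alpha}}\,|\Gamma\rangle_{AB}\right)^{\alpha} \tag{7.11.175}\\
&= \left(\langle\Gamma|_{AB}\,\psi_A[T_A(\sigma_A)]^{\frac{1-\alpha}{\alpha}}\otimes\mathbb{1}_B\,|\Gamma\rangle_{AB}\right)^{\alpha} \tag{7.11.176}\\
&= \left(\operatorname{Tr}\!\left[\psi_A[T_A(\sigma_A)]^{\frac{1-\alpha}{\alpha}}\right]\right)^{\alpha}. \tag{7.11.177}
\end{align*}

Then consider that

\begin{align*}
\widetilde{I}_\alpha(A\rangle B)_\psi &= \inf_{\sigma_B}\frac{1}{\alpha-1}\log_2\widetilde{Q}_\alpha(\psi_{AB}\Vert\mathbb{1}_A\otimes\sigma_B) \tag{7.11.178}\\
&= \inf_{\sigma_A}\frac{1}{\alpha-1}\log_2\left(\operatorname{Tr}\!\left[\psi_A[T_A(\sigma_A)]^{\frac{1-\alpha}{\alpha}}\right]\right)^{\alpha} \tag{7.11.179}\\
&= \frac{\alpha}{\alpha-1}\log_2\Vert\psi_A\Vert_{\frac{\alpha}{2\alpha-1}} \tag{7.11.180}\\
&= \frac{\alpha}{\alpha-1}\log_2\left(\operatorname{Tr}\!\left[\psi_A^{\frac{\alpha}{2\alpha-1}}\right]\right)^{\frac{2\alpha-1}{\alpha}} \tag{7.11.181}\\
&= \frac{\alpha}{\alpha-1}\cdot\frac{2\alpha-1}{\alpha}\log_2\operatorname{Tr}\!\left[\psi_A^{\frac{\alpha}{2\alpha-1}}\right] \tag{7.11.182}\\
&= \frac{2\alpha-1}{\alpha-1}\log_2\operatorname{Tr}\!\left[\psi_A^{\frac{\alpha}{2\alpha-1}}\right] \tag{7.11.183}\\
&= \frac{1}{1-\frac{\alpha}{2\alpha-1}}\log_2\operatorname{Tr}\!\left[\psi_A^{\frac{\alpha}{2\alpha-1}}\right] \tag{7.11.184}\\
&= H_{\frac{\alpha}{2\alpha-1}}(A)_\psi. \tag{7.11.185}
\end{align*}

The third equality follows from Proposition 2.8.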


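We now prove (7.11.138). Let $\sigma_B$ be a state with the same support as $\psi_B$. Recall the formula in Proposition 7.43 for the geometric Rényi relative entropy when the state $\rho$ is pure. We use this to conclude that

\begin{align*}
\widehat{D}_\alpha(\psi_{AB}\Vert\psi_A\otimes\sigma_B) &= \log_2\langle\psi|_{AB}(\psi_A\otimes\sigma_B)^{-1}|\psi\rangle_{AB} \tag{7.11.186}\\
&= \log_2\langle\psi|_{AB}\,\psi_A^{-1}\otimes\sigma_B^{-1}\,|\psi\rangle_{AB} \tag{7.11.187}\\
&= \log_2\langle\Gamma|_{AB}\,\psi_A^{\frac{1}{2}}\left(\psi_A^{-1}\otimes\sigma_B^{-1}\right)\psi_A^{\frac{1}{2}}\,|\Gamma\rangle_{AB} \tag{7.11.188}\\
&= \log_2\langle\Gamma|_{AB}\,\mathbb{1}_A\otimes\sigma_B^{-1}\,|\Gamma\rangle_{AB} \tag{7.11.189}\\
&= \log_2\operatorname{Tr}[\sigma_B^{-1}]. \tag{7.11.190}
\end{align*}

Now consider that the infimum $\inf_{\sigma_B}\operatorname{Tr}[\sigma_B^{-1}]$ is attained when $\sigma_B$ is the maximally mixed state $\pi_B$. This follows from using the Lagrange multiplier method (or alternatively $\inf_{\sigma_B}\operatorname{Tr}[\sigma_B^{-1}]$ can be evaluated as $d_B^2$ by applying Proposition 2.8 again, with an implicit identity operator acting on the support of $\psi_B$). We then conclude that

\begin{align*}
\widehat{I}_\alpha(A;B)_\psi &= \inf_{\sigma_B}\log_2\operatorname{Tr}[\sigma_B^{-1}] = \log_2\operatorname{Tr}[\pi_B^{-1}] \tag{7.11.191}\\
&= 2\log_2\operatorname{rank}(\psi_A) = 2H_0(A)_\psi. \tag{7.11.192}
\end{align*}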

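We finally prove (7.11.141):

\begin{align*}
\widehat{D}_\alpha(\psi_{AB}\Vert\mathbb{1}_A\otimes\sigma_B) &= \log_2\langle\psi|_{AB}(\mathbb{1}_A\otimes\sigma_B)^{-1}|\psi\rangle_{AB} \tag{7.11.193}\\
&= \log_2\langle\psi|_{AB}\,\mathbb{1}_A\otimes\sigma_B^{-1}\,|\psi\rangle_{AB} \tag{7.11.194}\\
&= \log_2\langle\Gamma|_{AB}\,\psi_A\otimes\sigma_B^{-1}\,|\Gamma\rangle_{AB} \tag{7.11.195}\\
&= \log_2\langle\Gamma|_{AB}\,\mathbb{1}_A\otimes\sigma_B^{-1}T_B(\psi_B)\,|\Gamma\rangle_{AB} \tag{7.11.196}\\
&= \log_2\operatorname{Tr}[\sigma_B^{-1}T_B(\psi_B)]. \tag{7.11.197}
\end{align*}

Now applying Proposition 2.8, we conclude that

\begin{align*}
\widehat{I}_\alpha(A\rangle B)_\psi &= \inf_{\sigma_B}\widehat{D}_\alpha(\psi_{AB}\Vert\mathbb{1}_A\otimes\sigma_B) = \inf_{\sigma_B}\log_2\operatorname{Tr}[\sigma_B^{-1}T_B(\psi_B)] \tag{7.11.198}\\
&= \log_2\Vert T_B(\psi_B)\Vert_{\frac{1}{2}} = \log_2\Vert\psi_B\Vert_{\frac{1}{2}} \tag{7.11.199}\\
&= \log_2\Vert\psi_A\Vert_{\frac{1}{2}} = H_{\frac{1}{2}}(A)_\psi. \tag{7.11.200}
\end{align*}

This concludes the proof. ■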
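
The pure-state formulas (7.11.136) and (7.11.139) lend themselves to a quick numerical sanity check: evaluate the Sibson closed forms of Proposition 7.88 on a random pure state and compare with the Rényi entropies of the reduced state. The following Python sketch does this under the assumption of a randomly sampled pure state of full Schmidt rank with $d_A = d_B$; the helper names are hypothetical and chosen for illustration.

```python
import numpy as np

def mpow(M, p, tol=1e-12):
    """Power of a positive semi-definite matrix, taken on its support."""
    w, U = np.linalg.eigh(M)
    wp = np.zeros_like(w)
    keep = w > tol
    wp[keep] = w[keep] ** p
    return (U * wp) @ U.conj().T

def marginal_A(rho_AB, d):
    """Reduced state on A (partial trace over B), with d_A = d_B = d."""
    return np.einsum('iaja->ij', rho_AB.reshape(d, d, d, d))

def trace_out_A(M, d):
    """Partial trace over A, with d_A = d_B = d."""
    return np.einsum('aiaj->ij', M.reshape(d, d, d, d))

def renyi_entropy(rho, beta):
    """H_beta(rho) = log2 Tr[rho^beta] / (1 - beta), for beta != 1."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(np.log2(np.sum(w ** beta)) / (1 - beta))

def sibson_value(psi_AB, tau_A, alpha, d):
    """Right-hand side of (7.11.114) for tau_A = psi_A, or of (7.11.115) for tau_A = 1_A."""
    X = trace_out_A(mpow(psi_AB, alpha) @ np.kron(mpow(tau_A, 1 - alpha), np.eye(d)), d)
    X = (X + X.conj().T) / 2          # remove numerical anti-Hermitian noise
    return float(alpha / (alpha - 1) * np.log2(np.trace(mpow(X, 1 / alpha)).real))

def random_pure_state(d, rng):
    """Random pure state on a d x d bipartite system, as a density matrix."""
    v = rng.normal(size=d * d) + 1j * rng.normal(size=d * d)
    v /= np.linalg.norm(v)
    return np.outer(v, v.conj())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, alpha = 3, 1.7
    psi = random_pure_state(d, rng)
    psi_A = marginal_A(psi, d)
    # Mutual information: Sibson form versus 2 H_{(2 - alpha)/alpha}(A)
    print(sibson_value(psi, psi_A, alpha, d), 2 * renyi_entropy(psi_A, (2 - alpha) / alpha))
    # Coherent information: Sibson form versus H_{1/alpha}(A)
    print(sibson_value(psi, np.eye(d), alpha, d), renyi_entropy(psi_A, 1 / alpha))
```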


7.11.2 Remarks on Defining Channel Quantities from State Quantities

Observe that all of the generalized information measures for quantum channels given
in Definition 7.85, as well as all of the channel information measures given above
for specific generalized divergences, are defined in a common manner. Specifically,
given a function $f : \mathrm{D}(\mathcal{H}_{AB}) \to \mathbb{R}$ for bipartite states, the corresponding function $f$ for quantum channels4 is defined as

\[
f(\mathcal{N}) := \sup_{\rho_{RA}} f(R;B)_\omega, \tag{7.11.201}
\]

where N 𝐴→𝐵 is a quantum channel, 𝜌 𝑅 𝐴 is a quantum state, with the dimension

of 𝑅 unbounded, and 𝜔 𝑅𝐵 = N 𝐴→𝐵 (𝜌 𝑅 𝐴 ). In other words, we define the channel
quantity by taking a quantum state 𝜌 𝑅 𝐴 of a bipartite system consisting of the input
system 𝐴 of the channel and a reference system 𝑅 (whose dimension is in general
unbounded), passing 𝐴 through the channel, then evaluating the state quantity on
the output state N 𝐴→𝐵 (𝜌 𝑅 𝐴 ). We then optimize over all states 𝜌 𝑅 𝐴 .
A similar principle as in (7.11.201) has been used in Definition 7.81 for the
generalized divergence between two quantum channels. In particular, if we have a
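function $f : \mathrm{D}(\mathcal{H}) \times \mathrm{L}_+(\mathcal{H}) \to \mathbb{R} \cup \{+\infty\}$ on two quantum states,5 then we define a corresponding quantity $f$ for two quantum channels as

\[
f(\mathcal{N},\mathcal{M}) := \sup_{\rho_{RA}} f(\mathcal{N}_{A\to B}(\rho_{RA}), \mathcal{M}_{A\to B}(\rho_{RA})), \tag{7.11.202}
\]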

where N 𝐴→𝐵 and M 𝐴→𝐵 are quantum channels and 𝜌 𝑅 𝐴 is a quantum state,
with the dimension of 𝑅 unbounded. In other words, we define the channel
quantity by evaluating the state quantity on the states N 𝐴→𝐵 (𝜌 𝑅 𝐴 ) and M 𝐴→𝐵 (𝜌 𝑅 𝐴 )
and optimizing over all states 𝜌 𝑅 𝐴 . We have already seen this principle being
used in Chapter 6 to define the diamond distance (Definition 6.18) and fidelity
(Definition 6.23) of two quantum channels, in which case the state quantity 𝑓 is the
trace distance or fidelity.

4 In a slight abuse of notation, we use the same letter to denote the channel quantity as the state quantity.
5 We allow for the second argument to be a positive semi-definite operator more generally.

In both (7.11.201) and (7.11.202), properties of the underlying state quantity (namely, the data-processing inequality), as well as the Schmidt decomposition theorem, allow us to restrict the optimizations in (7.11.201) and (7.11.202) to pure states $|\psi\rangle_{RA}$ without loss of generality, with the dimension of $R$ equal to the dimension of $A$. If the underlying state quantity $f$ in (7.11.201) is invariant with respect to local unitaries, then we can use this simplification to write $f(\mathcal{N})$ as

\[
f(\mathcal{N}) = \sup_{\rho_A} f(\rho_A, \mathcal{N}_{A\to B}), \tag{7.11.203}
\]

where the optimization is now only over states $\rho_A$ for the input system $A$ for the channel $\mathcal{N}$, and

\[
f(\rho_A, \mathcal{N}_{A\to B}) := f(A;B)_\omega, \qquad \omega_{AB} = \sqrt{\rho_A}\,\Gamma^{\mathcal{N}}_{AB}\sqrt{\rho_A}. \tag{7.11.204}
\]


This holds due to (2.2.38), which states that for every purification $|\psi^{\rho}\rangle_{AA'}$ of $\rho_A$ there exists an operator $Y_A$ such that $(Y_A \otimes \mathbb{1}_{A'})|\Gamma\rangle_{AA'} = |\psi^{\rho}\rangle_{AA'}$. Then, by the polar decomposition (Theorem 2.3), and the fact that $Y_A Y_A^\dagger = \rho_A$, it holds that $Y_A = U_A\sqrt{\rho_A}$ for some unitary $U_A$. Finally, using the definition of the Choi representation $\Gamma^{\mathcal{N}}_{AB}$ and the unitary invariance of $f$, we obtain (7.11.203). This equivalent formulation of the channel quantity $f(\mathcal{N})$ has been used in (7.11.113) for the coherent information of a channel.
If the underlying state quantity in (7.11.202) is unitarily invariant, then by using the same reasoning as above we can write $f(\mathcal{N},\mathcal{M})$ in a form analogous to (7.11.203):

\[
f(\mathcal{N},\mathcal{M}) = \sup_{\rho_A} f\!\left(\sqrt{\rho_A}\,\Gamma^{\mathcal{N}}_{AB}\sqrt{\rho_A},\ \sqrt{\rho_A}\,\Gamma^{\mathcal{M}}_{AB}\sqrt{\rho_A}\right). \tag{7.11.205}
\]
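
To make the construction $\omega_{AB} = \sqrt{\rho_A}\,\Gamma^{\mathcal{N}}_{AB}\sqrt{\rho_A}$ in (7.11.204) concrete, the following Python sketch builds the Choi operator of a channel from Kraus operators, forms $\omega_{AB}$ for a given input $\rho_A$, and evaluates the mutual information as one example of a state quantity $f$. The Kraus-operator input format and all function names are assumptions made for illustration; they are not taken from the text.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a positive semi-definite matrix."""
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.conj().T

def entropy(rho):
    """von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

def choi_operator(kraus, dA):
    """Gamma^N_AB = sum_{i,j} |i><j|_A (x) N(|i><j|), from a list of Kraus operators."""
    dB = kraus[0].shape[0]
    gamma = np.zeros((dA * dB, dA * dB), dtype=complex)
    for i in range(dA):
        for j in range(dA):
            E = np.zeros((dA, dA))
            E[i, j] = 1.0
            gamma += np.kron(E, sum(K @ E @ K.conj().T for K in kraus))
    return gamma

def omega_state(rho_A, gamma, dB):
    """omega_AB = sqrt(rho_A) Gamma^N_AB sqrt(rho_A), with sqrt(rho_A) acting on A only."""
    S = np.kron(psd_sqrt(rho_A), np.eye(dB))
    return S @ gamma @ S

def marginals(omega, dA, dB):
    """Return (omega_A, omega_B)."""
    T = omega.reshape(dA, dB, dA, dB)
    return np.einsum('iaja->ij', T), np.einsum('aiaj->ij', T)

def mutual_information(omega, dA, dB):
    """I(A;B) = H(A) + H(B) - H(AB), used here as the state quantity f."""
    om_A, om_B = marginals(omega, dA, dB)
    return entropy(om_A) + entropy(om_B) - entropy(omega)

if __name__ == "__main__":
    identity_channel = [np.eye(2)]
    omega = omega_state(np.eye(2) / 2, choi_operator(identity_channel, 2), 2)
    print(mutual_information(omega, 2, 2))
```

Note that the $A$ marginal of $\omega_{AB}$ is $\rho_A$ itself, because $\operatorname{Tr}_B[\Gamma^{\mathcal{N}}_{AB}] = \mathbb{1}_A$ for a trace-preserving map, while the $B$ marginal is $\mathcal{N}(\rho_A^{T})$, with the transpose taken in the basis that defines $\Gamma_{AB}$.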

7.12 Summary
In this chapter, we studied various entropic quantities, starting with quantum
relative entropy. We proved many of its most important properties, and we saw
that it acts as a parent quantity for well-known quantities such as von Neumann
entropy, quantum conditional entropy, quantum mutual information and conditional
mutual information, and coherent information. We then studied the Petz–Rényi,
sandwiched Rényi, geometric Rényi, and hypothesis testing relative entropies, and
we proved many of their most important properties.
The unifying concept of this chapter is that of generalized divergence. A
generalized divergence is a function 𝑫 : D(H) × L+ (H) → R that satisfies the
data-processing inequality: for every state 𝜌, positive semi-definite operator 𝜎, and
quantum channel N,
𝑫 (𝜌∥𝜎) ≥ 𝑫 (N(𝜌)∥N(𝜎)). (7.12.1)
This inequality holds for all of the quantum relative entropies that we considered in
this chapter, and many of their important properties (such as joint convexity) can
be derived using it. The data-processing inequality is a core concept in information
theory, and it underlies virtually all of the results that we present in this book.
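
As a small numerical illustration of the data-processing inequality (7.12.1), the following Python sketch compares the quantum relative entropy of two random full-rank states before and after a partial trace, which is one particular quantum channel. The sampling procedure and the helper names are assumptions made for illustration.

```python
import numpy as np

def random_state(d, rng):
    """Random full-rank density matrix (illustrative sampling choice)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def mlog2(M):
    """Base-2 matrix logarithm of a positive definite Hermitian matrix."""
    w, U = np.linalg.eigh(M)
    return (U * np.log2(w)) @ U.conj().T

def relative_entropy(rho, sigma):
    """D(rho||sigma) = Tr[rho (log2 rho - log2 sigma)] for full-rank inputs."""
    return float(np.trace(rho @ (mlog2(rho) - mlog2(sigma))).real)

def partial_trace_second(M, d1, d2):
    """Partial trace over the second tensor factor; this map is a quantum channel."""
    return np.einsum('iaja->ij', M.reshape(d1, d2, d1, d2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rho, sigma = random_state(6, rng), random_state(6, rng)
    before = relative_entropy(rho, sigma)
    after = relative_entropy(partial_trace_second(rho, 2, 3),
                             partial_trace_second(sigma, 2, 3))
    # Data processing predicts before >= after
    print(before, after)
```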
At the end of the chapter, we defined information measures for quantum channels.
Given a state information measure (or, more generally, a generalized divergence),
we define an information measure for channels in a manner analogous to the way
that the diamond norm (a generalized divergence for channels) is defined from the
trace norm (a generalized divergence for states): we send one share of a bipartite
pure state through the channel, evaluate the state measure, and then optimize over
all input states. All of the upper and lower bounds on communication capacities
that we present in this book are given in terms of channel information measures
defined in this way.

7.13 Bibliographic Notes

The von Neumann quantum entropy was originally defined by von Neumann
(1927b). The quantum conditional entropy was considered implicitly by Lieb and
Ruskai (1973a,b), proposed as a quantum-information theoretic quantity of interest
by Cerf and Adami (1997), and the coherent information thereafter by Schumacher
and Nielsen (1996). The modern definition of the quantum mutual information was
proposed by Stratonovich (1965). Its non-negativity was proved by Molière and
Delbrück (1935); Lanford and Robinson (1968). The quantum conditional mutual
information was considered implicitly by Lanford and Robinson (1968); Lieb and
Ruskai (1973a,b) and was proposed as a quantum information-theoretic quantity
of interest by Cerf and Adami (1997, 1998). Chain rules for quantum conditional
entropy and information were employed by Cerf and Adami (1998).
The definition of the quantum relative entropy as presented in Definition 7.1 is
due to Umegaki (1962). It took many years after this until the paper by Hiai and
Petz (1991) was published, which solidified the operational interpretation of the
“Umegaki quantum relative entropy” in terms of quantum hypothesis testing and
the quantum Stein’s lemma. The strong converse for the quantum Stein’s lemma
was established by Ogawa and Nagaoka (2005).

The non-negativity of quantum relative entropy follows as a consequence of
an inequality by Klein (1931), and its data-processing inequality for quantum
channels was established by Lindblad (1975). The data-processing inequality for
the quantum relative entropy under partial trace was established by Lieb and Ruskai
(1973a), from which its joint convexity follows. The expression in (7.2.100) for the
quantum conditional mutual information was given implicitly by Lieb and Ruskai
(1973a). Strong subadditivity of quantum entropy (or equivalently, non-negativity
of quantum conditional mutual information) was established by Lieb and Ruskai
(1973a,b). The data-processing inequality for the quantum conditional mutual
information under local channels was established by Christandl and Winter (2004).
The uniform continuity bound for quantum conditional mutual information, as
presented in Proposition 7.10, was established by Shirokov (2017).
The notion of generalized divergence in the classical case was proposed by
Polyanskiy and Verdú (2010), and in the quantum case by Sharma and Warsi
(2013). Proposition 7.16 was established by Wilde et al. (2014); Tomamichel et al.
(2017). The generalized mutual information and conditional quantum entropy were
proposed by Sharma and Warsi (2013).
The Petz–Rényi relative entropy was proposed by Petz (1985, 1986a), wherein its
data-processing inequality was established for the case of partial trace. The general
definition of Petz–Rényi relative entropy incorporating support conditions and its
data-processing inequality for general channels was established by Tomamichel
et al. (2009), along with several of its other properties such as monotonicity in 𝛼.
The expression in (7.4.23) for the Petz–Rényi relative entropy was presented by
Tomamichel et al. (2009); Sharma (2012).
The sandwiched Rényi relative entropy was independently proposed by Müller-
Lennert et al. (2013) and Wilde et al. (2014), and the alternative expression in (7.5.5)
was given by Dupuis and Wilde (2016). The variational expression in (7.5.6), as
well as Proposition 7.29, are due to Müller-Lennert et al. (2013). Proposition 7.30
was established by Müller-Lennert et al. (2013); Wilde et al. (2014). The fact
that sandwiched Rényi relative entropy is monotone with respect to 𝛼 is due to
Müller-Lennert et al. (2013), with an independent proof for 𝛼 > 1 by Beigi (2013).
The inequality in (7.5.45) for 𝛼 > 1 is due to Wilde et al. (2014), as a direct
consequence of the inequality by Lieb and Thirring (1976). The inequality in
(7.5.45) for 𝛼 ∈ (0, 1) is due to Datta and Leditzky (2014), as a direct consequence
of the Araki–Lieb–Thirring inequalities by Lieb and Thirring (1976); Araki (1990).

The “reverse” Araki–Lieb–Thirring inequality, which leads to the inequality in

(7.5.46), was proved by Iten et al. (2017). The data-processing inequality for the
sandwiched Rényi relative entropy was established in a number of papers for various
parameter ranges of 𝛼: Müller-Lennert et al. (2013); Wilde et al. (2014); Frank and
Lieb (2013); Beigi (2013); Mosonyi and Ogawa (2015), being established for the
full range 𝛼 ∈ [1/2, ∞] by Frank and Lieb (2013). The proof that we presented here
is due to Wilde (2018b). Counterexamples to data processing for the sandwiched
Rényi relative entropy in the range 𝛼 ∈ (0, 1/2) were given by Berta et al. (2017).
The geometric Rényi relative entropy has its roots in work of Petz and Ruskai
(1998), and it was further developed by Matsumoto (2013, 2018). See also
(Tomamichel, 2015; Hiai and Mosonyi, 2017) for other expositions. It was given the
name “geometric Rényi relative entropy” by Fang and Fawzi (2021) because it is a
function of the matrix geometric mean of its arguments. See, e.g., Lawson and Lim
(2001) for a review of matrix geometric means. Proposition 7.40 was established
by Katariya and Wilde (2021), with roots in the earlier work of Matsumoto
(2014a,b). In particular, the expression $\operatorname{Tr}[\sigma(\sigma^{-1/2}\tilde{\rho}\,\sigma^{-1/2})^{\alpha}]$ for 𝛼 = 1/2 and
supp(𝜌) ⊈ supp(𝜎) was identified in Matsumoto (2014a, Section 3) and later
generalized to all 𝛼 ∈ (0, 1) in Matsumoto (2014b, Section 2). Lemma 7.41
and Proposition 7.43 were also established by Katariya and Wilde (2021), along
with monotonicity in 𝛼 for all 𝛼 ∈ (0, 1) ∪ (1, ∞) (in Proposition 7.44). The
inequality in Proposition 7.42 was established for the interval 𝛼 ∈ (0, 1) ∪ (1, 2]
in Tomamichel (2015) (by making use of a general result in Matsumoto (2013,
2018)) and for the full interval 𝛼 ∈ (1, ∞) in Wang et al. (2019a). Here, we have
followed the approach of Wang et al. (2019a) and offered a unified proof in terms
of the Araki–Lieb–Thirring inequality (Araki, 1990; Lieb and Thirring, 1976).
The first inequality in Proposition 7.49 was established for 𝛼 ∈ (1, 2] in Wilde
et al. (2014) and for 𝛼 ∈ (0, 1) in Datta and Leditzky (2014), by employing the
Araki–Lieb–Thirring inequality (Araki, 1990; Lieb and Thirring, 1976). The
second inequality was established by Matsumoto (2013, 2018) and reviewed by
Tomamichel (2015). Data processing was established by an operator-theoretic
approach in Petz and Ruskai (1998) and by an operational method in Matsumoto
(2013, 2018). The operator-theoretic approach taken here has its roots in Hiai
and Petz (1991, Proposition 2.5) and was reviewed in Hiai and Mosonyi (2017,
Corollary 3.31). The interpretation of geometric Rényi relative entropy given in
Proposition 7.48 was discovered by Matsumoto (2013, 2018). Lemma 7.50 was
presented by Katariya and Wilde (2021) and is based on Zhou and Jiang (2019,
Lemma 3).

Belavkin and Staszewski (1982) discovered the quantum generalization of

the classical relative entropy given in Section 7.7, now known as the Belavkin–
Staszewski relative entropy. Matsumoto (2013, 2018) showed that the Belavkin–
Staszewski relative entropy is the limit of the geometric Rényi relative entropy as
𝛼 → 1. The proof given here was presented by Katariya and Wilde (2021). Hiai and
Petz (1991) found the inequality in Proposition 7.53 relating the quantum relative
entropy to the Belavkin–Staszewski relative entropy (the proof given here is due to
Katariya and Wilde (2021)). Hiai and Petz (1991) established the data-processing
inequality for the Belavkin–Staszewski relative entropy, by a method different
from that given here. Matsumoto (2013, 2018) found the interpretation of the
Belavkin–Staszewski relative entropy given in Proposition 7.57.
The max-relative entropy was proposed by Datta (2009b) as a quantum
information-theoretic quantity of interest. Datta also established many basic
information processing properties of the max-relative entropy and studied its role
in quantum hypothesis testing and entanglement theory (Datta, 2009b,a).
Proposition 7.61 is due to Müller-Lennert et al. (2013); Katariya and Wilde (2021).
Proposition 7.64 is due to Wang and Wilde (2019) and Anshu et al. (2019).
The conditional min-entropy was defined by Renner (2005), as well as the
smooth conditional min-entropy. The operational interpretations of the conditional
min- and max-entropies were examined by Koenig et al. (2009).
The hypothesis testing relative entropy was studied implicitly by a variety of
authors for a long time in the context of quantum hypothesis testing: Hiai and
Petz (1991); Ogawa and Nagaoka (2005); Hayashi and Nagaoka (2003); Nagaoka
(2006); Hayashi (2007). It was proposed as a quantum information-theoretic
quantity of interest by Buscemi and Datta (2010a) (in the context of operator
smoothing of a one-shot entropic quantity), and given the name hypothesis testing
relative entropy by Wang and Renner (2012), wherein its connection to classical
communication was explored (see also Hayashi and Nagaoka (2003)). An alternate
proof of the data-processing inequality for quantum relative entropy in terms
of hypothesis testing was given by Bjelakovic and Siegmund-Schultze (2003).
Various properties of the hypothesis testing relative entropy were established by
Dupuis et al. (2013) and Datta et al. (2016), including the fact that it can be
written as an SDP (see also Wang and Wilde (2019) in this context). Eq. (7.9.36)
was established by Wang and Wilde (2019). Proposition 7.67 was considered
by Helstrom (1976) and discussed more recently by Vazquez-Vilar (2016). A
special case of Proposition 7.70 was established by Wang and Renner (2012);

Matthews and Wehner (2014). Proposition 7.71 was established by Cooney et al.
(2016). Proposition 7.72 is essentially due to Hayashi (2007), with a refinement by
Audenaert et al. (2012) and a later rediscovery of it, formulated in a different way,
by Qi et al. (2018b). Proposition 7.80 is essentially due to Hayashi (2007).
The generalized channel divergence of Definition 7.81 was proposed by Leditzky
et al. (2018), and Proposition 7.84 was established as well by Leditzky et al. (2018).
The various generalized channel information measures can be found in the papers
of Wilde et al. (2014); Gupta and Wilde (2015), and the related channel information
measures based on hypothesis testing, Petz–Rényi, and sandwiched Rényi relative
entropy are from Koenig and Wehner (2009); Sharma and Warsi (2013); Wilde
et al. (2014); Gupta and Wilde (2015); Datta et al. (2016). The channel mutual
information was defined by Adami and Cerf (1997), the channel Holevo information
by Schumacher and Westmoreland (1997) (based on the Holevo quantity for
ensembles Holevo (1973)), and the channel coherent information by Lloyd (1997).
These papers together thus developed a general concept of promoting a measure
of correlations in a quantum state to a measure of a channel’s ability to create the
same correlations, by optimizing the state measure with respect to a (subset of) all
of the states that can be generated by means of the channel.
The review by Ruskai (2002) is helpful not only for understanding entropy
inequalities in quantum information, but also for understanding the history of
developments with respect to quantum entropy and information. The book of
Tomamichel (2015) provides an exposition of Rényi relative entropies and their
properties (see also Leditzky (2016)). The book of Wilde (2017a) provides an
overview of entropies in the von Neumann family, their properties, and the derived
channel information measures.

Chapter 8

Information Measures for Quantum Channels

Chapter 9

Entanglement Measures
In the previous chapter, we laid the foundation for analyzing quantum communication
protocols by defining entropic quantities, such as the Petz– and sandwiched Rényi
relative entropies, as well as information measures for quantum states and channels
derived from these relative entropies. We now use these information measures to
define entanglement measures for quantum states and channels. Given quantum
systems 𝐴 and 𝐵, an entanglement measure is a function 𝐸 : D(H 𝐴𝐵 ) → R that
quantifies the amount of entanglement present in a state 𝜌 𝐴𝐵 of these systems.
The notion of “quantifying entanglement” is explained in Section 9.1 below,
with the defining requirement of an entanglement measure being that it does not
increase under channels realized by local operations and classical communication
(Definition 4.22). We can think of this requirement of “LOCC monotonicity” as
a restricted form of the data-processing inequality, but now applied to a single
bipartite state rather than to a pair of states. The data-processing inequality indicates
that the distinguishability of two states does not increase under the action of the
same quantum channel (Definition 7.15), whereas LOCC monotonicity indicates
that entanglement does not increase under the action of an LOCC channel on a
bipartite state.
Given an entanglement measure 𝐸 for states, the corresponding entanglement
measure for channels is defined using the general principle in Section 7.11.2, which
is to optimize the state measure with respect to all bipartite states that can be shared
between the sender and receiver of the channel by making use of the channel. We
develop entanglement measures for channels in Chapter 10, and these naturally
quantify how much entanglement can be generated by a channel connecting a
sender to a receiver.

Entanglement measures feature prominently in the analysis of optimal rates for

distillation and communication protocols. In particular, entanglement measures
for states arise as upper bounds on their distillable entanglement and secret key,
which we examine in Chapters 13 and 15, respectively. Entanglement measures
for quantum channels arise as upper bounds on the rates of quantum and private
communication over a quantum channel, which we consider in Chapters 14 and 16,
respectively, as well as for their feedback-assisted counterparts that we consider in
Part III of this book.
Being a uniquely quantum-mechanical property, it is perhaps not surprising
that entanglement features prominently in quantum communication protocols, both
in the encoding and decoding of messages, as well as in the analysis of their
optimal rates. In fact, any state that is not entangled (i.e., separable) is useless
for entanglement and secret key distillation, and similarly, entanglement breaking
channels are useless for quantum and private communication. The distinguishability
of a given state from the set of separable states, which we show in this chapter
is an entanglement measure for states, can thus give an indication of how much
entanglement or secret key can be distilled from it. Similarly, for communication
tasks over quantum channels, the distinguishability of a given quantum channel
from the set of entanglement breaking channels, which we show in this chapter is
an entanglement measure for channels, can be used to determine how good the
channel is for quantum or private communication.
The rest of this chapter proceeds as follows. We start in Section 9.1 by formally
defining what it means for a function 𝐸 to be an entanglement measure for bipartite
states and by providing examples of entanglement measures. In Section 9.2, we
consider entanglement measures that quantify the distinguishability of a given state
𝜌 𝐴𝐵 from the set SEP( 𝐴 : 𝐵) of separable states, with the measure given by some
generalized divergence 𝑫 (see Definition 7.15). We also consider in Section 9.3
a class of entanglement measures based on the distinguishability of a given state
from the larger set PPT′ ⊃ SEP. In Section 9.4, we consider a different kind of
entanglement measure for states called squashed entanglement.


9.1 Definition and Basic Properties

Recall from Definition 3.5 that a bipartite state $\rho_{AB}$ is called entangled if it is not
separable, meaning that it cannot be written in the following form

\[
\sum_{x\in\mathcal{X}} p(x)\,\tau_A^x \otimes \omega_B^x, \tag{9.1.1}
\]

for some finite alphabet X, probability distribution 𝑝 : X → [0, 1], and sets {𝜏𝐴𝑥 }𝑥∈X
and {𝜔𝑥𝐵 }𝑥∈X of states. Determining whether a given quantum state 𝜌 𝐴𝐵 is entangled
is a fundamental problem in quantum information theory. In Section 3.2.1, in the
discussion after Definition 3.5, we listed the following criteria for the entanglement
of pure and mixed states:
• Schmidt rank criterion: A pure bipartite state 𝜓 𝐴𝐵 is entangled if and only if
its Schmidt rank is strictly greater than one.
• PPT criterion: If a bipartite mixed state 𝜌 𝐴𝐵 has negative partial transpose
(i.e., the partial transpose $\rho_{AB}^{T_B}$ has at least one negative eigenvalue), then it is
entangled. If both systems 𝐴 and 𝐵 are qubit systems or if one of the systems
is a qubit and the other a qutrit, then 𝜌 𝐴𝐵 is entangled if and only if $\rho_{AB}^{T_B}$ has
at least one negative eigenvalue.
In the case of mixed states, there is generally not a simple necessary and sufficient
criterion to determine whether a given bipartite state is entangled, and in fact it is
known that it is computationally difficult, in a precise sense, to decide if a state is
entangled (please consult the Bibliographic Notes in Section 9.6).
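
As an illustration of the PPT criterion, the following is a minimal numerical sketch, assuming NumPy and density operators given as matrices in the standard basis. The helper names `partial_transpose_B` and `has_negative_partial_transpose`, as well as the tolerance `tol`, are choices made for this example and do not come from the text.

```python
import numpy as np

def partial_transpose_B(rho, dA, dB):
    """Transpose the B system of an operator on A (x) B, in the standard basis."""
    r = np.asarray(rho).reshape(dA, dB, dA, dB)   # indices (a, b, a', b')
    return r.transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)  # swap b <-> b'

def has_negative_partial_transpose(rho, dA, dB, tol=1e-12):
    """Return True if the partial transpose has an eigenvalue below -tol."""
    eigs = np.linalg.eigvalsh(partial_transpose_B(rho, dA, dB))
    return bool(eigs.min() < -tol)

# Two-qubit Bell state (|00> + |11>)/sqrt(2): entangled.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell = np.outer(phi, phi.conj())

# Product state |0><0| (x) I/2: separable.
product = np.kron(np.diag([1.0, 0.0]), np.eye(2) / 2)

print(has_negative_partial_transpose(bell, 2, 2))     # negative eigenvalue present
print(has_negative_partial_transpose(product, 2, 2))  # no negative eigenvalue
```

For two qubits or a qubit and a qutrit, the test is conclusive in both directions; in larger dimensions a result of `False` does not certify separability.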
In addition to determining whether or not a given quantum state is entangled, we
are interested in quantifying the amount of entanglement present in a quantum state.
Doing so allows us to compare quantum states based on the amount of entanglement
present in them. An entanglement measure is a function 𝐸 : D(H 𝐴𝐵 ) → R from
the set of density operators acting on the Hilbert space of a bipartite system to
the set of real numbers, and it quantifies the entanglement of a quantum state
𝜌 𝐴𝐵 ∈ D(H 𝐴𝐵 ). (The formal definition of an entanglement measure is given in
Definition 9.1.) To indicate the partitioning of the subsystems explicitly, we often
write 𝐸 ( 𝐴; 𝐵) 𝜌 instead of 𝐸 (𝜌 𝐴𝐵 ).
How exactly do we quantify entanglement? Suppose that we have a bipartite
state 𝜌 𝐴𝐵 and we would like to quantify the entanglement between the systems
𝐴 and 𝐵. One fundamental observation is that the entanglement of 𝜌 𝐴𝐵 cannot

increase under the action of a local operations and classical communication (LOCC)
channel (recall Definition 4.22). This is intuitive because entanglement is a non-
local property of a bipartite state, and so local operations alone do not increase it.
Similarly, classical communication should only affect the classical correlations
between the two systems 𝐴 and 𝐵, and not the quantum correlations, i.e., the
entanglement.
Given the reasoning above, the defining property of an entanglement measure
𝐸 : D(H 𝐴𝐵 ) → R for the quantum systems 𝐴 and 𝐵 is that it does not increase
under the action of an LOCC channel:

Definition 9.1 Entanglement Measure

We say that 𝐸 ( 𝐴; 𝐵) 𝜌 is an entanglement measure for a bipartite state 𝜌 𝐴𝐵 if
the following inequality holds for every bipartite state 𝜌 𝐴𝐵 and every LOCC
channel L 𝐴𝐵→𝐴′ 𝐵′ that acts on 𝜌 𝐴𝐵 :

𝐸 ( 𝐴; 𝐵) 𝜌 ≥ 𝐸 ( 𝐴′; 𝐵′)𝜔 , (9.1.2)

where $\omega_{A'B'} := \mathcal{L}_{AB\to A'B'}(\rho_{AB})$.

This property of LOCC monotonicity is the most operational property of an

entanglement measure, in the sense that it relates directly to the distillation or
communication tasks that we consider in this book, which involve LOCC between
the two parties sharing a state 𝜌 𝐴𝐵 or between sender and receiver at the terminals
of a quantum channel, respectively. It is also analogous to the data-processing
inequality for generalized divergences.
The defining requirement of LOCC monotonicity implies that an entanglement
measure 𝐸 takes on its minimum value on the set of separable states. To see this,
recall from the discussion after Definition 3.5 that a separable state can be prepared
by LOCC. Thus, starting from an arbitrary state 𝜌 𝐴𝐵 , Alice and Bob can trace out
their local systems 𝐴 and 𝐵 and perform LOCC to prepare a separable state 𝜎𝐴′ 𝐵′ .
The serial concatenation of these two actions is itself an LOCC channel. Thus, it
follows from the definition above that

𝐸 ( 𝐴; 𝐵) 𝜌 ≥ 𝐸 ( 𝐴′; 𝐵′)𝜎 (9.1.3)

for every separable state $\sigma_{A'B'}$. Now, given another separable state $\sigma'_{A''B''}$, it
is possible to transform between $\sigma_{A'B'}$ and $\sigma'_{A''B''}$ using LOCC, meaning that
$E(A';B')_{\sigma} \geq E(A'';B'')_{\sigma'}$ and $E(A';B')_{\sigma} \leq E(A'';B'')_{\sigma'}$. Therefore,
\[
E(A';B')_{\sigma} = E(A'';B'')_{\sigma'}, \tag{9.1.4}
\]
for all separable states $\sigma_{A'B'}$ and $\sigma'_{A''B''}$. As a consequence, an entanglement measure
𝐸 takes on its minimum value and is equal to a constant 𝑐 ∈ R for all separable
states. It is often convenient and simpler if an entanglement measure 𝐸 is equal to
zero for all separable states. If this is not the case, then we can simply redefine
the entanglement measure as 𝐸 ′ ( 𝐴; 𝐵) 𝜌 = 𝐸 ( 𝐴; 𝐵) 𝜌 − 𝑐. By this reasoning and
adjustment (if needed), every entanglement measure (as per Definition 9.1) satisfies
the following two properties of non-negativity on all states and vanishing on
separable states:
1. Non-negativity: 𝐸 (𝜌 𝐴𝐵 ) ≥ 0 for every state 𝜌 𝐴𝐵 .
2. Vanishing for separable states: 𝐸 (𝜎𝐴𝐵 ) = 0 for every separable state 𝜎𝐴𝐵 .
Other properties that are desirable for an entanglement measure 𝐸 are as follows:
1. Faithfulness: 𝐸 (𝜎𝐴𝐵 ) = 0 if and only if 𝜎𝐴𝐵 is separable, so that 𝐸 (𝜌 𝐴𝐵 ) > 0
if and only if 𝜌 𝐴𝐵 is entangled.
2. Invariance under classical communication: For every finite alphabet X,
probability distribution 𝑝 : X → [0, 1] on X, and set $\{\rho_{AB}^x\}_{x\in\mathcal{X}}$ of states,
define the following classical–quantum state:
\[
\rho_{XAB} := \sum_{x\in\mathcal{X}} p(x)\,|x\rangle\!\langle x|_X \otimes \rho_{AB}^x. \tag{9.1.5}
\]

Then the entanglement measure 𝐸 satisfies invariance under classical communication if
\[
E(XA;B)_\rho = E(A;BX)_\rho = \sum_{x\in\mathcal{X}} p(x)\,E(A;B)_{\rho^x}. \tag{9.1.6}
\]

This property has the interpretation of invariance under classical communication

because the equality 𝐸 (𝑋 𝐴; 𝐵) 𝜌 = 𝐸 ( 𝐴; 𝐵𝑋) 𝜌 indicates that the classical value
𝑥 in register 𝑋 can be communicated classically to Bob and discarded locally,
and the entanglement measure does not change under this action. Furthermore,
the value of the entanglement is simply the expected entanglement, where the
expectation is calculated with respect to the probability distribution 𝑝(𝑥).
This property is also known as the “flags” property in the research literature.
3. Convexity: For every finite alphabet X, probability distribution 𝑝 : X → [0, 1]
on X, and set $\{\rho_{AB}^x\}_{x\in\mathcal{X}}$ of states,
\[
\sum_{x\in\mathcal{X}} p(x)\,E(\rho_{AB}^x) \geq E\!\left(\sum_{x\in\mathcal{X}} p(x)\,\rho_{AB}^x\right). \tag{9.1.7}
\]

Convexity is an intuitive property of entanglement, and for entanglement

measures that are invariant under classical communication (obeying (9.1.6)), it
captures the idea that entanglement should not increase on average if classical
information about the identity of a state is lost. (In fact, convexity is an
immediate consequence of LOCC monotonicity and invariance under classical
communication.)
4. Additivity: The entanglement of a tensor-product state 𝜌 𝐴1 𝐴2 𝐵1 𝐵2 = 𝜏𝐴1 𝐵1 ⊗
𝜔 𝐴2 𝐵2 is the sum of the entanglement of the individual states in the tensor
product:
𝐸 ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 )𝜏⊗𝜔 = 𝐸 ( 𝐴1 ; 𝐵1 )𝜏 + 𝐸 ( 𝐴2 ; 𝐵2 )𝜔 . (9.1.8)
If instead we have only that

𝐸 ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 )𝜏⊗𝜔 ≤ 𝐸 ( 𝐴1 ; 𝐵1 )𝜏 + 𝐸 ( 𝐴2 ; 𝐵2 )𝜔 (9.1.9)

for all states 𝜏𝐴1 𝐵1 and 𝜔 𝐴2 𝐵2 , then the entanglement measure 𝐸 is subadditive.
5. Selective LOCC monotonicity: A property stronger than LOCC monotonicity
is that 𝐸 is non-increasing on average under an LOCC instrument. In more
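detail, let 𝜌 𝐴𝐵 be a bipartite state, and let $\{\mathcal{L}^x_{AB\to A'B'}\}_{x\in\mathcal{X}}$ be a collection of
maps, such that $\mathcal{L}^{\leftrightarrow}_{AB\to A'B'}$ is an LOCC channel of the form
\[
\mathcal{L}^{\leftrightarrow}_{AB\to A'B'} = \sum_{x\in\mathcal{X}} \mathcal{L}^x_{AB\to A'B'}, \tag{9.1.10}
\]
for some finite alphabet X, where each map $\mathcal{L}^x_{AB\to A'B'}$ is completely positive
such that the sum map $\mathcal{L}^{\leftrightarrow}_{AB\to A'B'}$ is trace preserving (i.e., a quantum channel).
Furthermore, each map $\mathcal{L}^x_{AB\to A'B'}$ can be written in the form of (4.6.52), as
follows:
\[
\mathcal{L}^x_{AB\to A'B'} = \sum_{y\in\mathcal{Y}} \mathcal{E}^{x,y}_{A\to A'} \otimes \mathcal{F}^{x,y}_{B\to B'}, \tag{9.1.11}
\]
where $\{\mathcal{E}^{x,y}_{A\to A'}\}_{x\in\mathcal{X},\,y\in\mathcal{Y}}$ and $\{\mathcal{F}^{x,y}_{B\to B'}\}_{x\in\mathcal{X},\,y\in\mathcal{Y}}$ are sets of completely positive maps. Set
\[
p(x) := \operatorname{Tr}[\mathcal{L}^x_{AB\to A'B'}(\rho_{AB})], \tag{9.1.12}
\]
and for 𝑥 ∈ X such that 𝑝(𝑥) ≠ 0, set
\[
\omega^x_{AB} := \frac{1}{p(x)}\,\mathcal{L}^x_{AB\to A'B'}(\rho_{AB}). \tag{9.1.13}
\]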
If the classical value of 𝑥 is not discarded, then the given state 𝜌 𝐴𝐵 is transformed
to the ensemble {( 𝑝(𝑥), 𝜔𝑥𝐴𝐵 )}𝑥∈X via LOCC.
The entanglement measure 𝐸 satisfies selective LOCC monotonicity if
\[
E(\rho_{AB}) \geq \sum_{x\in\mathcal{X}:\,p(x)\neq 0} p(x)\,E(\omega^x_{AB}), \tag{9.1.14}
\]

for every ensemble {( 𝑝(𝑥), 𝜔𝑥𝐴𝐵 )}𝑥∈X that arises from 𝜌 𝐴𝐵 via LOCC as
specified above. Selective LOCC monotonicity indicates that entanglement
does not increase on average under the action of LOCC. Many entanglement
measures satisfy this stronger property.
Observe that selective LOCC monotonicity in (9.1.14) implies LOCC mono-
tonicity in (9.1.2), simply because the alphabet X in (9.1.10) can consist of
only one letter.
The entanglement measures that we consider in this chapter satisfy many of the
properties listed above.
Given that we would like to quantify entanglement, it makes sense to ask what
the basic unit of entanglement should be. We take as our unit of entanglement the
two-qubit maximally entangled Bell state $|\Phi\rangle = \frac{1}{\sqrt{2}}(|0,0\rangle + |1,1\rangle)$, and we thus say
that the state |Φ⟩ represents “one ebit.” A maximally entangled state of Schmidt rank
𝑑 is then referred to as having “log2 𝑑 ebits.” All of the entanglement measures that
we consider in this chapter are equal to one for a two-qubit maximally entangled state,
which is another justification for using it as the unit of entanglement.1 Similarly, for
a maximally entangled state of Schmidt rank 𝑑, all of the entanglement measures
that we consider in this chapter are equal to log2 𝑑.
To close out this introductory section, we prove a lemma that helps to reduce
the difficulty in determining whether a given function is an entanglement measure.

1 This “normalization” condition is sometimes taken to be a requirement for an entanglement measure.


Lemma 9.2
Let 𝐸 : D(H 𝐴𝐵 ) → R be a function that, for every bipartite state 𝜌 𝐴𝐵 ,
1. is invariant under classical communication, as defined in (9.1.6), and
2. obeys data processing under local channels, in the sense that

𝐸 ( 𝐴; 𝐵) 𝜌 ≥ 𝐸 ( 𝐴′; 𝐵′)𝜔 , (9.1.15)

for all channels N 𝐴→𝐴′ and M𝐵→𝐵′ , where

$\omega_{A'B'} := (\mathcal{N}_{A\to A'} \otimes \mathcal{M}_{B\to B'})(\rho_{AB})$. (9.1.16)

Then 𝐸 is convex, as defined in (9.1.7), and a selective LOCC monotone, as

defined in (9.1.14).

Proof: We first prove convexity. Let
\[
\rho_{XAB} := \sum_{x\in\mathcal{X}} p(x)\,|x\rangle\!\langle x|_X \otimes \rho_{AB}^x, \tag{9.1.17}
\]
where X is a finite alphabet, 𝑝 : X → [0, 1] is a probability distribution, and
$\{\rho_{AB}^x\}_{x\in\mathcal{X}}$ is a set of states. Then
\begin{align}
\sum_{x\in\mathcal{X}} p(x)\,E(A;B)_{\rho^x} &= E(XA;B)_\rho \tag{9.1.18}\\
&\geq E(A;B)_\rho, \tag{9.1.19}
\end{align}
where the entanglement $E(A;B)_\rho$ in the last line is evaluated with respect to the
reduced state $\rho_{AB} = \sum_{x\in\mathcal{X}} p(x)\,\rho_{AB}^x$. The equality follows from the assumption of
invariance under classical communication, as defined in (9.1.6), and the inequality
follows because the partial trace channel Tr 𝑋 is a local channel that discards the
classical system 𝑋.
To establish selective LOCC monotonicity, consider that an LOCC channel
$\mathcal{L}^{\leftrightarrow}_{AB\to A'B'}$ of the form in (9.1.10), by Definition 4.22, can be built up as a serial
concatenation of one-way LOCC channels. Furthermore, each one-way LOCC
channel can keep some classical information and discard some of it. It is helpful
conceptually to think of the retained classical information as part of the variable
𝑥 in (9.1.10) and the discarded classical information as part of the variable 𝑦 in
(9.1.11). In more detail, each such one-way LOCC channel (from Alice to Bob in
the case discussed below) has the following form:
\[
\sum_{k,\ell} \mathcal{E}^{k,\ell}_{A\to A'} \otimes \mathcal{F}^{k,\ell}_{B\to B'}, \tag{9.1.20}
\]
where $\{\mathcal{E}^{k,\ell}_{A\to A'}\}_{k,\ell}$ is a collection of completely positive maps such that the sum map
$\sum_{k,\ell} \mathcal{E}^{k,\ell}_{A\to A'}$ is trace preserving, and $\{\mathcal{F}^{k,\ell}_{B\to B'}\}_{k,\ell}$ is a collection of quantum channels. For
now and for simplicity, let us use the superindex $m := (k, \ell)$. We should think of the
classical information 𝑘 as that which is being kept and that in ℓ as that which is
being lost or let go. The one-way LOCC channel in (9.1.20) can be implemented
using the following steps:

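1. Alice applies the following local quantum channel:
\[
\tau_{AB} \to \sum_m \mathcal{E}^m_{A\to A'}(\tau_{AB}) \otimes |m\rangle\!\langle m|_{M_A}. \tag{9.1.21}
\]

2. Alice employs a classical communication channel
\[
(\cdot)_{M_A} \to \sum_m |m\rangle_{M_B}\langle m|_{M_A} (\cdot)_{M_A} |m\rangle_{M_A}\langle m|_{M_B} \tag{9.1.22}
\]
to communicate the value in $M_A$ to Bob:
\[
\sum_m \mathcal{E}^m_{A\to A'}(\tau_{AB}) \otimes |m\rangle\!\langle m|_{M_A} \to \sum_m \mathcal{E}^m_{A\to A'}(\tau_{AB}) \otimes |m\rangle\!\langle m|_{M_B}. \tag{9.1.23}
\]

3. Bob performs the local channel
\[
(\cdot)_{BM_B} \to \sum_m \mathcal{F}^m_{B\to B'}(\cdot) \otimes |m\rangle\!\langle m|_{M_B} (\cdot)_{M_B} |m\rangle\!\langle m|_{M_B}, \tag{9.1.24}
\]
which can be understood as “looking in the classical register $M_B$” to determine
the value 𝑚 and performing the local quantum channel $\mathcal{F}^m_{B\to B'}$ based on the
value 𝑚 found. Under this local channel, the global state becomes as follows:
\[
\sum_m \mathcal{E}^m_{A\to A'}(\tau_{AB}) \otimes |m\rangle\!\langle m|_{M_B} \to \sum_m (\mathcal{E}^m_{A\to A'} \otimes \mathcal{F}^m_{B\to B'})(\tau_{AB}) \otimes |m\rangle\!\langle m|_{M_B}. \tag{9.1.25}
\]

4. Bob then discards the ℓ part of the classical information 𝑚, as follows:
\begin{align}
&\sum_m (\mathcal{E}^m_{A\to A'} \otimes \mathcal{F}^m_{B\to B'})(\tau_{AB}) \otimes |m\rangle\!\langle m|_{M_B} \notag\\
&\quad = \sum_{k,\ell} (\mathcal{E}^{k,\ell}_{A\to A'} \otimes \mathcal{F}^{k,\ell}_{B\to B'})(\tau_{AB}) \otimes |k,\ell\rangle\!\langle k,\ell|_{K_B L_B} \tag{9.1.26}\\
&\quad \to \sum_{k,\ell} (\mathcal{E}^{k,\ell}_{A\to A'} \otimes \mathcal{F}^{k,\ell}_{B\to B'})(\tau_{AB}) \otimes |k\rangle\!\langle k|_{K_B} \tag{9.1.27}\\
&\quad = \sum_k p(k)\,\omega^k_{A'B'} \otimes |k\rangle\!\langle k|_{K_B}, \tag{9.1.28}
\end{align}
where
\begin{align}
p(k) &:= \operatorname{Tr}\!\left[\sum_\ell (\mathcal{E}^{k,\ell}_{A\to A'} \otimes \mathcal{F}^{k,\ell}_{B\to B'})(\tau_{AB})\right], \tag{9.1.29}\\
\omega^k_{A'B'} &:= \frac{1}{p(k)} \sum_\ell (\mathcal{E}^{k,\ell}_{A\to A'} \otimes \mathcal{F}^{k,\ell}_{B\to B'})(\tau_{AB}). \tag{9.1.30}
\end{align}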

Bob could, if desired, finally discard the classical register 𝐾 𝐵 to implement the
one-way LOCC channel in (9.1.20). However, it is helpful to hold on to it for
our analysis below.

Now we analyze how the entanglement changes under each of these steps,
omitting the state subscripts at each step except for the first and last lines, because
those not shown are clear from the context:
\begin{align}
E(A;B)_\tau &\geq E(A'M_A;B) \tag{9.1.31}\\
&= E(A';BM_B) \tag{9.1.32}\\
&\geq E(A';B'M_B) \tag{9.1.33}\\
&= E(A';B'K_BL_B) \tag{9.1.34}\\
&\geq E(A';B'K_B) \tag{9.1.35}\\
&= \sum_k p(k)\,E(A';B')_{\omega^k}. \tag{9.1.36}
\end{align}

The first inequality follows from data processing under the local channel in
(9.1.21). The first equality follows from the assumption of invariance under classical
communication, i.e., invariance under the action of the classical channel in (9.1.22).
The second inequality follows from data processing under the local channel in
(9.1.25). The second equality is trivial, following because $M_B = (K_B, L_B)$ by
definition. The third inequality follows again from data processing under the local
channel in (9.1.27). The final equality follows again from invariance under classical
communication.
Thus, we have shown selective one-way LOCC monotonicity (from Alice to
Bob) in the following sense:
\[
E(A;B)_\tau \geq \sum_k p(k)\,E(A';B')_{\omega^k}, \tag{9.1.37}
\]

where the ensemble {( 𝑝(𝑘), 𝜔 𝑘𝐴′ 𝐵′ )} 𝑘 arises from the state 𝜏𝐴𝐵 by means of one-way
LOCC from Alice to Bob. By the same argument, but flipping the role of Alice and
Bob, it follows that selective one-way LOCC monotonicity from Bob to Alice holds
for the function 𝐸. Since every LOCC channel is built up as a serial concatenation
of one-way LOCC channels and since we have proven that selective monotonicity
holds for the function 𝐸 for each of them, it follows that 𝐸 obeys selective LOCC
monotonicity. ■
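
To make the inequality (9.1.37) concrete, here is a small numerical sketch, assuming NumPy, of one particular one-way LOCC protocol acting on a Bell state: Alice measures her qubit in the computational basis, sends the outcome 𝑚 to Bob, and Bob applies the Pauli-𝑋 operator if 𝑚 = 1. The protocol, the function name `reduced_entropy`, and the use of the entropy of entanglement as the measure are all choices made for this illustration; there is no discarded part ℓ in this simple case.

```python
import numpy as np

def reduced_entropy(psi, dA, dB):
    """Entropy (in bits) of the reduced state of a normalized pure state |psi>_AB."""
    s = np.linalg.svd(np.asarray(psi, dtype=complex).reshape(dA, dB), compute_uv=False)
    lam = s**2
    lam = lam[lam > 1e-15]
    return float(-np.sum(lam * np.log2(lam))) + 0.0

# Initial state: Bell state (|00> + |11>)/sqrt(2).
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
initial = reduced_entropy(phi, 2, 2)

# Alice's measurement operators (index m) and Bob's conditional unitaries.
alice = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
bob = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]

# Post-measurement ensemble {(p(m), omega^m)} kept with the classical label m.
ensemble = []
for m in range(2):
    out = np.kron(alice[m], bob[m]) @ phi
    p = float(np.vdot(out, out).real)
    ensemble.append((p, out / np.sqrt(p)))

average = sum(p * reduced_entropy(psi, 2, 2) for p, psi in ensemble)
print(initial, average)  # the average entanglement does not exceed the initial value
```

In this example each branch leaves Alice and Bob with a product state, so the average entanglement drops from one ebit to zero, consistent with (9.1.37).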

9.1.1 Examples

Let us now consider some examples of entanglement measures. The first two
entanglement measures that we consider are related to the Schmidt rank criterion
and the PPT criterion, respectively, stated in Section 3.2.1 and reiterated at the
beginning of this chapter. They are known as the entanglement of formation and the
log-negativity, respectively, and are some of the simplest and earliest entanglement
measures defined. They are also conceptually linked to more complex entanglement
measures such as squashed entanglement and the Rains relative entropy, which
are the best known upper bounds on distillable entanglement (studied in
Chapter 13).

9.1.1.1 Entanglement of Formation

Given a pure bipartite state 𝜓 𝐴𝐵 = |𝜓⟩⟨𝜓| 𝐴𝐵 , there exists a Schmidt decomposition
of |𝜓⟩ 𝐴𝐵 such that
\[
|\psi\rangle_{AB} = \sum_{k=1}^{r} \sqrt{\lambda_k}\,|e_k\rangle_A \otimes |f_k\rangle_B, \tag{9.1.38}
\]
where 𝑟 is the Schmidt rank, 𝜆 𝑘 > 0 are the Schmidt coefficients, and $\{|e_k\rangle_A\}_{k=1}^r$,
$\{|f_k\rangle_B\}_{k=1}^r$ are orthonormal sets. Observe that the reduced states $\psi_A := \operatorname{Tr}_B[\psi_{AB}]$
and $\psi_B := \operatorname{Tr}_A[\psi_{AB}]$ have the same non-zero eigenvalues, which means that their
entropies are equal, i.e., 𝐻 (𝜓 𝐴 ) = 𝐻 (𝜓 𝐵 ). Furthermore, 𝐻 (𝜓 𝐴 ) = 0 if and only
if 𝑟 = 1, and 𝑟 = 1 if and only if 𝜓 𝐴𝐵 is separable, by the Schmidt rank criterion.
Therefore, the entropy of the reduced state of a pure bipartite state provides us with
a signature of entanglement for pure bipartite states:
𝜓 𝐴𝐵 entangled ⇐⇒ 𝐻 (𝜓 𝐴 ) > 0. (9.1.39)
We let
\[
E_F(\psi_{AB}) := H(\operatorname{Tr}_B[\psi_{AB}]) = -\sum_{k=1}^{r} \lambda_k \log_2 \lambda_k \tag{9.1.40}
\]
for every pure state 𝜓 𝐴𝐵 .
The function 𝐸 𝐹 is an entanglement measure, as proven in Proposition 9.6
below. When evaluated on pure bipartite states as above, it is known as the entropy of
entanglement or entanglement entropy, and it is often simply denoted by 𝐸 (𝜓 𝐴𝐵 )
in the research literature. By (9.1.39), it is also a faithful entanglement measure
on pure states, i.e., 𝐸 𝐹 (𝜓 𝐴𝐵 ) = 0 if and only if 𝜓 𝐴𝐵 is a separable state. Recall
from Section 3.2.3 that a maximally entangled state is defined by having 𝜆 𝑘 = 1/𝑟
for all 1 ≤ 𝑘 ≤ 𝑟. For such states, 𝐸 𝐹 (𝜓 𝐴𝐵 ) = log2 𝑟, which justifies calling them
maximally entangled because log2 𝑟 is the largest value of the quantum entropy for
states supported on an 𝑟-dimensional space.
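
The formula (9.1.40) can be evaluated numerically from the singular values of the coefficient matrix of |𝜓⟩ 𝐴𝐵, whose squares are the Schmidt coefficients. The following is a minimal sketch, assuming NumPy and a normalized state vector ordered as the tensor product of the standard bases; the function name `entropy_of_entanglement` and the cutoff `1e-15` are assumptions of this example.

```python
import numpy as np

def entropy_of_entanglement(psi, dA, dB):
    """E_F of a normalized pure state |psi>_AB, in bits, via the Schmidt coefficients."""
    M = np.asarray(psi, dtype=complex).reshape(dA, dB)  # coefficient matrix psi[a, b]
    s = np.linalg.svd(M, compute_uv=False)              # singular values = sqrt(lambda_k)
    lam = s**2
    lam = lam[lam > 1e-15]                               # drop numerically zero coefficients
    return float(-np.sum(lam * np.log2(lam))) + 0.0

bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)      # one ebit
product = np.array([1.0, 0.0, 0.0, 0.0])                 # |0>|0>, Schmidt rank one
qutrit = np.eye(3).reshape(9) / np.sqrt(3)               # maximally entangled, r = 3

print(entropy_of_entanglement(bell, 2, 2))
print(entropy_of_entanglement(product, 2, 2))
print(entropy_of_entanglement(qutrit, 3, 3), np.log2(3))
```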
So far, the entanglement measure 𝐸 𝐹 in (9.1.40) has been
defined only for pure states. To extend the definition to mixed states, we use the
fact that a mixed state 𝜌 𝐴𝐵 can be decomposed into a convex combination of pure
states as follows:
\[
\rho_{AB} = \sum_{x\in\mathcal{X}} p(x)\,\psi_{AB}^x, \tag{9.1.41}
\]
where X is a finite alphabet, 𝑝 : X → [0, 1] is a probability distribution, and
$\{\psi_{AB}^x\}_{x\in\mathcal{X}}$ is a set of pure states. We can then measure the entanglement of
𝜌 𝐴𝐵 by taking the expected entanglement of the pure states involved in the
decomposition of 𝜌 𝐴𝐵 , i.e., by $\sum_{x\in\mathcal{X}} p(x)\,H(\psi_A^x)$, where $\psi_A^x := \operatorname{Tr}_B[\psi_{AB}^x]$. However,
this strategy can lead to different values for the entanglement of 𝜌 𝐴𝐵 , depending on
the chosen decomposition, because the decomposition of mixed states into a convex
combination of pure states is generally not unique. We can address this issue by
minimizing over all possible decompositions, leading to the following definition:


Definition 9.3 Entanglement of Formation

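The entanglement of formation of a bipartite state 𝜌 𝐴𝐵 is defined as
\[
E_F(\rho_{AB}) := \inf_{\{(p(x),\psi_{AB}^x)\}_{x\in\mathcal{X}}} \left\{ \sum_{x\in\mathcal{X}} p(x)\,H(\psi_A^x) : \rho_{AB} = \sum_{x\in\mathcal{X}} p(x)\,\psi_{AB}^x \right\}, \tag{9.1.42}
\]
where $\rho_{AB} = \sum_{x\in\mathcal{X}} p(x)\,\psi_{AB}^x$ is a pure-state decomposition of 𝜌 𝐴𝐵 .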

It suffices to take $|\mathcal{X}| \leq \dim(\mathcal{H}_{AB})^2$ in the optimization in (9.1.42), and
furthermore, the infimum is achieved by at least one pure-state decomposition. The
fact that the alphabet X need not exceed $\dim(\mathcal{H}_{AB})^2$ elements is due to the entropy
being a continuous function, the Fenchel–Eggleston–Carathéodory Theorem
(Theorem 2.23), and the fact that the dimension of the space of density operators on a
$\dim(\mathcal{H}_{AB})$-dimensional space is $\dim(\mathcal{H}_{AB})^2$. The fact that the infimum is achieved
follows because the optimization is with respect to a compact space and the function
being optimized is continuous.
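
The need for the infimum in (9.1.42) can be seen in a small numerical sketch, assuming NumPy: the two-qubit maximally mixed state admits one pure-state decomposition into Bell states, with average reduced entropy equal to one, and another into product states, with average reduced entropy equal to zero. The function names below are assumptions of this example.

```python
import numpy as np

def reduced_entropy(psi, dA, dB):
    """Entropy (in bits) of the reduced state of a normalized pure state |psi>_AB."""
    s = np.linalg.svd(np.asarray(psi, dtype=complex).reshape(dA, dB), compute_uv=False)
    lam = s**2
    lam = lam[lam > 1e-15]
    return float(-np.sum(lam * np.log2(lam))) + 0.0

def ensemble_state(ensemble):
    """Density operator sum_x p(x) |psi^x><psi^x| of a pure-state ensemble."""
    return sum(p * np.outer(psi, psi.conj()) for p, psi in ensemble)

def average_entropy(ensemble, dA, dB):
    """Objective of (9.1.42) evaluated on one particular decomposition."""
    return sum(p * reduced_entropy(psi, dA, dB) for p, psi in ensemble)

s = 1 / np.sqrt(2)
bell_basis = [np.array([s, 0, 0, s]), np.array([s, 0, 0, -s]),
              np.array([0, s, s, 0]), np.array([0, s, -s, 0])]
bell_ensemble = [(0.25, v) for v in bell_basis]
product_ensemble = [(0.25, np.eye(4)[i]) for i in range(4)]

# Both ensembles decompose the same state, the maximally mixed state I/4.
print(np.allclose(ensemble_state(bell_ensemble), ensemble_state(product_ensemble)))
print(average_entropy(bell_ensemble, 2, 2), average_entropy(product_ensemble, 2, 2))
```

Since the product-state decomposition attains the value zero, the entanglement of formation of the maximally mixed state is zero, as expected for a separable state.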
For every pure bipartite state 𝜓 𝐴𝐵 , the equality in (9.1.40) holds because it is not
possible to decompose a pure state 𝜓 𝐴𝐵 as a mixture of other pure states different
from 𝜓 𝐴𝐵 . As mentioned above, the entanglement of formation is also known as
the entropy of entanglement for the special case of a pure bipartite state, due to the
fact that it is equal to the entropy of the reduced state.
It is a direct consequence of Definition 9.3 and the non-negativity of quantum
entropy in (7.2.107) that the entanglement of formation is non-negative, i.e.,
𝐸 𝐹 (𝜌 𝐴𝐵 ) ≥ 0 (9.1.43)
for every bipartite state 𝜌 𝐴𝐵 .
Not only is the entanglement of formation non-negative, but it is also faithful;
and we can make an even more refined statement about approximate faithfulness
(when 𝜌 𝐴𝐵 is close to separable or when 𝐸 𝐹 (𝜌 𝐴𝐵 ) is close to zero).
Before establishing this statement in Proposition 9.5 below, we first prove a
uniform continuity bound for the entanglement of formation. Uniform continuity is a
desirable property of an entanglement measure: if two bipartite states 𝜌 𝐴𝐵 and 𝜎𝐴𝐵
are not very distinguishable from each other (according to some distinguishability
measure), then the difference of their entanglement should be small. (See Section 2.3
for a definition of uniform continuity.)

Proposition 9.4 Uniform Continuity of Entanglement of Formation

Let 𝜌 𝐴𝐵 and 𝜎𝐴𝐵 be states satisfying
\[
\frac{1}{2}\left\| \rho_{AB} - \sigma_{AB} \right\|_1 \leq \varepsilon, \tag{9.1.44}
\]
where 𝜀 ∈ [0, 1]. Then the entanglements of formation of 𝜌 𝐴𝐵 and 𝜎𝐴𝐵 satisfy
\[
|E_F(\rho_{AB}) - E_F(\sigma_{AB})| \leq \delta \log_2 \min\{d_A, d_B\} + g_2(\delta), \tag{9.1.45}
\]
where $\delta := \sqrt{\varepsilon(2-\varepsilon)}$ and $g_2(x) := (x+1)\log_2(x+1) - x\log_2 x$.

Proof: By applying Theorem 6.14 to (9.1.44), we find that
\[
\sqrt{F(\rho_{AB}, \sigma_{AB})} \geq 1 - \varepsilon, \tag{9.1.46}
\]
which implies that
\[
\sqrt{1 - F(\rho_{AB}, \sigma_{AB})} \leq \sqrt{1 - (1-\varepsilon)^2} = \sqrt{\varepsilon(2-\varepsilon)}. \tag{9.1.47}
\]
Let
\[
|\psi^\rho\rangle_{RAB} := \sum_x \sqrt{p(x)}\,|x\rangle_R |\psi^x\rangle_{AB} \tag{9.1.48}
\]
be a purification of 𝜌 𝐴𝐵 , with $\rho_{AB} = \sum_x p(x)\,\psi_{AB}^x$ a pure-state decomposition of 𝜌 𝐴𝐵 .
By applying Uhlmann’s theorem (Theorem 6.8), there exists a purification $\psi^\sigma_{RAB}$
of 𝜎𝐴𝐵 such that $F(\psi^\rho_{RAB}, \psi^\sigma_{RAB}) = F(\rho_{AB}, \sigma_{AB})$. By combining this observation
with (9.1.47), and the fact that the sine distance of two pure states is equal to the
normalized trace distance (see (6.1.1)), we conclude that
\[
\frac{1}{2}\left\| \psi^\rho_{RAB} - \psi^\sigma_{RAB} \right\|_1 \leq \sqrt{\varepsilon(2-\varepsilon)}. \tag{9.1.49}
\]
We now apply the measurement channel $\mathcal{M}_{R\to X}(\cdot) := \sum_x |x\rangle_X\langle x|_R (\cdot) |x\rangle_R\langle x|_X$ to
the 𝑅 systems, as well as the data-processing inequality for the trace distance, to
conclude that
\[
\frac{1}{2}\left\| \mathcal{M}_{R\to X}(\psi^\rho_{RAB}) - \mathcal{M}_{R\to X}(\psi^\sigma_{RAB}) \right\|_1 \leq \sqrt{\varepsilon(2-\varepsilon)}. \tag{9.1.50}
\]
Consider that
\[
\rho_{XAB} := \mathcal{M}_{R\to X}(\psi^\rho_{RAB}) = \sum_x p(x)\,|x\rangle\!\langle x|_X \otimes \psi_{AB}^x, \tag{9.1.51}
\]
and there exists a probability distribution 𝑞(𝑥) and a set $\{\varphi_{AB}^x\}_x$, satisfying
\[
\sigma_{AB} = \sum_x q(x)\,\varphi_{AB}^x, \tag{9.1.52}
\]
such that
\[
\sigma_{XAB} := \mathcal{M}_{R\to X}(\psi^\sigma_{RAB}) = \sum_x q(x)\,|x\rangle\!\langle x|_X \otimes \varphi_{AB}^x. \tag{9.1.53}
\]
Now applying the uniform continuity of conditional mutual information (Proposition 7.10),
we conclude that
\[
\frac{1}{2} I(A;B|X)_\sigma \leq \frac{1}{2} I(A;B|X)_\rho + \delta \log_2 \min\{d_A, d_B\} + g_2(\delta), \tag{9.1.54}
\]
with $\delta = \sqrt{\varepsilon(2-\varepsilon)}$. Since the states of systems 𝐴𝐵 are pure when conditioned on
the classical system 𝑋, for both 𝜌 𝑋 𝐴𝐵 and 𝜎𝑋 𝐴𝐵 , consider that
\[
\frac{1}{2} I(A;B|X)_\sigma = H(A|X)_\sigma, \qquad \frac{1}{2} I(A;B|X)_\rho = H(A|X)_\rho. \tag{9.1.55}
\]
So we conclude that

𝐸 𝐹 (𝜎𝐴𝐵 ) ≤ 𝐻 ( 𝐴|𝑋)𝜎 ≤ 𝐻 ( 𝐴|𝑋) 𝜌 + 𝛿 log2 min{𝑑 𝐴 , 𝑑 𝐵 } + 𝑔2 (𝛿), (9.1.56)

where the first inequality follows from Definition 9.3 and (9.1.52). Since the
pure-state decomposition of 𝜌 𝐴𝐵 is arbitrary, we conclude that

𝐸 𝐹 (𝜎𝐴𝐵 ) ≤ 𝐸 𝐹 (𝜌 𝐴𝐵 ) + 𝛿 log2 min{𝑑 𝐴 , 𝑑 𝐵 } + 𝑔2 (𝛿). (9.1.57)

Running the argument again, but starting from an arbitrary pure-state decomposition
of 𝜎𝐴𝐵 , we conclude the inequality

𝐸 𝐹 (𝜌 𝐴𝐵 ) ≤ 𝐸 𝐹 (𝜎𝐴𝐵 ) + 𝛿 log2 min{𝑑 𝐴 , 𝑑 𝐵 } + 𝑔2 (𝛿), (9.1.58)

which, together with (9.1.57), implies (9.1.45). ■
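
To get a feel for the size of the bound in (9.1.45), the right-hand side can be evaluated directly. The following sketch uses only the Python standard library; the function names `g2` and `ef_continuity_bound` are assumptions of this example, and the value of $g_2$ at zero is set to its limiting value.

```python
import math

def g2(x):
    """g_2(x) = (x + 1) log2(x + 1) - x log2(x), with g_2(0) = 0 by continuity."""
    if x == 0:
        return 0.0
    return (x + 1) * math.log2(x + 1) - x * math.log2(x)

def ef_continuity_bound(eps, dA, dB):
    """Right-hand side of (9.1.45) for normalized trace distance at most eps."""
    delta = math.sqrt(eps * (2 - eps))
    return delta * math.log2(min(dA, dB)) + g2(delta)

for eps in (0.0, 1e-4, 1e-2, 1.0):
    print(eps, ef_continuity_bound(eps, 2, 2))
```

The bound vanishes as 𝜀 → 0, but only at a rate governed by $\sqrt{\varepsilon}$ (through 𝛿), so it is informative mainly for rather small 𝜀.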


Proposition 9.5 Faithfulness of Entanglement of Formation

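The entanglement of formation is faithful, so that 𝐸 𝐹 (𝜌 𝐴𝐵 ) = 0 if and only if
the state 𝜌 𝐴𝐵 is separable. More quantitatively, for a state 𝜌 𝐴𝐵 and 𝜀 ∈ [0, 1], if
\[
\inf_{\sigma_{AB}\in\operatorname{SEP}(A:B)} \frac{1}{2}\left\| \rho_{AB} - \sigma_{AB} \right\|_1 \leq \varepsilon, \tag{9.1.59}
\]
then
\[
E_F(\rho_{AB}) \leq \delta \log_2 \min\{d_A, d_B\} + g_2(\delta), \tag{9.1.60}
\]
where $\delta := \sqrt{\varepsilon(2-\varepsilon)}$. Conversely, for 𝜀 ≥ 0, if
\[
E_F(\rho_{AB}) \leq \varepsilon, \tag{9.1.61}
\]
then
\[
\inf_{\sigma_{AB}\in\operatorname{SEP}(A:B)} \frac{1}{2}\left\| \rho_{AB} - \sigma_{AB} \right\|_1 \leq \sqrt{\varepsilon \ln 2}. \tag{9.1.62}
\]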

Proof: We begin by proving the first statement. Suppose that the state 𝜎𝐴𝐵
is separable. Then by the remark after Definition 3.5, there exists a pure-state
decomposition of 𝜎𝐴𝐵 as
\[
\sigma_{AB} = \sum_x p(x)\,\phi_A^x \otimes \varphi_B^x. \tag{9.1.63}
\]
For this decomposition, we have that $\sum_x p(x)\,H(\phi_A^x) = 0$ because the quantum
entropy is equal to zero for a pure state. This implies by definition that 𝐸 𝐹 (𝜎𝐴𝐵 ) = 0.
The statement in (9.1.59)–(9.1.60) then follows by combining this observation with
(9.1.44)–(9.1.45), as well as the fact that the function on the right-hand side of
(9.1.60) is monotone in 𝜀.
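To see the second statement, let
\[
\rho_{AB} = \sum_x p(x)\,\psi_{AB}^x \tag{9.1.64}
\]
be an arbitrary pure-state decomposition of 𝜌 𝐴𝐵 . By applying the observation in
(9.1.55), we find that
\begin{align}
\sum_x p(x)\,H(\psi_A^x) &= \frac{1}{2} \sum_x p(x)\,I(A;B)_{\psi^x} \tag{9.1.65}\\
&= \frac{1}{2} \sum_x p(x)\,D(\psi_{AB}^x \| \psi_A^x \otimes \psi_B^x) \tag{9.1.66}\\
&\geq \frac{1}{4\ln 2} \sum_x p(x) \left\| \psi_{AB}^x - \psi_A^x \otimes \psi_B^x \right\|_1^2 \tag{9.1.67}\\
&\geq \frac{1}{4\ln 2} \left\| \sum_x p(x)\,\psi_{AB}^x - \sum_x p(x)\,\psi_A^x \otimes \psi_B^x \right\|_1^2 \tag{9.1.68}\\
&= \frac{1}{4\ln 2} \left\| \rho_{AB} - \sum_x p(x)\,\psi_A^x \otimes \psi_B^x \right\|_1^2 \tag{9.1.69}\\
&\geq \frac{1}{\ln 2} \left( \inf_{\sigma_{AB}\in\operatorname{SEP}(A:B)} \frac{1}{2}\left\| \rho_{AB} - \sigma_{AB} \right\|_1 \right)^2. \tag{9.1.70}
\end{align}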

The second equality follows from rewriting the mutual information in terms of
relative entropy (see (7.2.96)). The first inequality follows from the quantum Pinsker
inequality (Corollary 7.32 and the remark thereafter). The second inequality follows
from convexity of the square function and the trace norm. Since the inequality
holds for an arbitrary pure-state decomposition of 𝜌 𝐴𝐵 , we conclude that
\[
E_F(\rho_{AB}) \geq \frac{1}{\ln 2} \left( \inf_{\sigma_{AB}\in\operatorname{SEP}(A:B)} \frac{1}{2}\left\| \rho_{AB} - \sigma_{AB} \right\|_1 \right)^2. \tag{9.1.71}
\]

From this and (9.1.61), we conclude (9.1.62).

Finally, to conclude exact faithfulness from approximate faithfulness, we argue
that the infimum in (9.1.59) and (9.1.62) is achieved (i.e., can be replaced with a
minimum). This follows because the trace distance is continuous in 𝜎𝐴𝐵 and the
set of separable states is compact. ■

As the following proposition indicates, the entanglement of formation is an

entanglement measure according to Definition 9.1. The proof detailed below is
also a good opportunity to apply Lemma 9.2 to a simple example.

Proposition 9.6
The entanglement of formation is convex, so that (9.1.7) holds with 𝐸 set to 𝐸 𝐹 ,
and it is a selective LOCC monotone, so that (9.1.14) holds with 𝐸 set to 𝐸 𝐹 .


Proof: By Lemma 9.2, we only need to show that the entanglement of formation
does not increase under the action of a local channel and is invariant under
classical communication. We begin with the first one. Let 𝜌 𝐴𝐵 be a bipartite state,
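and let $\mathcal{N}_{A\to A'}$ be a local quantum channel. Let $\{(p(x), \psi_{AB}^x)\}_x$ be a pure-state
decomposition of 𝜌 𝐴𝐵 , i.e., satisfying $\sum_x p(x)\,\psi_{AB}^x = \rho_{AB}$. Let $\mathcal{N}_{A\to A'}$ have the
Kraus representation $\{N^y_{A\to A'}\}_y$. Then
\[
\omega_{A'B} := \mathcal{N}_{A\to A'}(\rho_{AB}) = \sum_x p(x)\,\mathcal{N}_{A\to A'}(\psi_{AB}^x), \tag{9.1.72}
\]
and
\[
\mathcal{N}_{A\to A'}(\psi_{AB}^x) = \sum_y N^y_{A\to A'}\,\psi_{AB}^x\,(N^y_{A\to A'})^\dagger = \sum_y p(y|x)\,\varphi^{x,y}_{A'B}, \tag{9.1.73}
\]
where
\begin{align}
p(y|x) &:= \operatorname{Tr}[N^y_{A\to A'}\,\psi_{AB}^x\,(N^y_{A\to A'})^\dagger], \tag{9.1.74}\\
\varphi^{x,y}_{A'B} &:= \frac{1}{p(y|x)}\,N^y_{A\to A'}\,\psi_{AB}^x\,(N^y_{A\to A'})^\dagger. \tag{9.1.75}
\end{align}
Thus, $\{(p(x)p(y|x), \varphi^{x,y}_{A'B})\}_{x,y}$ is a pure-state decomposition of $\omega_{A'B}$. Also, observe
that
\begin{align}
\psi_B^x &= \operatorname{Tr}_A[\psi_{AB}^x] \tag{9.1.76}\\
&= \operatorname{Tr}_{A'}[\mathcal{N}_{A\to A'}(\psi_{AB}^x)] \tag{9.1.77}\\
&= \sum_y p(y|x)\,\operatorname{Tr}_{A'}[\varphi^{x,y}_{A'B}] \tag{9.1.78}\\
&= \sum_y p(y|x)\,\varphi^{x,y}_B. \tag{9.1.79}
\end{align}
Then we have that
\begin{align}
\sum_x p(x)\,H(\psi_B^x) &\geq \sum_{x,y} p(x)\,p(y|x)\,H(\varphi^{x,y}_B) \tag{9.1.80}\\
&\geq E_F(A';B)_\omega, \tag{9.1.81}
\end{align}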
where the first inequality follows from the concavity of entropy (see (7.2.106)) and
the second from the definition of entanglement of formation. Since the inequality
holds for all pure-state decompositions of 𝜌 𝐴𝐵 , we conclude the desired inequality:
𝐸 𝐹 ( 𝐴; 𝐵) 𝜌 ≥ 𝐸 𝐹 ( 𝐴′; 𝐵)𝜔 . (9.1.82)

By flipping the role of Alice and Bob in the analysis above, we conclude that the
entanglement of formation does not increase under the action of a local channel on
Bob’s system.
Now we prove that 𝐸 𝐹 is invariant under classical communication. Let 𝜌 𝑋 𝐴𝐵
be the classical–quantum state defined in (9.1.5). A pure-state decomposition of
𝜌 𝑋 𝐴𝐵 has the form $\{(p(x)p(y|x), |x\rangle\!\langle x|_X \otimes \psi^{x,y}_{AB})\}_{x,y}$, where
\[
\rho^x_{AB} = \sum_y p(y|x)\,\psi^{x,y}_{AB}. \tag{9.1.83}
\]
This ensemble serves as a decomposition of 𝜌 𝑋 𝐴𝐵 for 𝐸 𝐹 (𝑋 𝐴; 𝐵) 𝜌 . Then
\[
\sum_{x,y} p(x)\,p(y|x)\,H(\psi^{x,y}_B) \geq \sum_x p(x)\,E_F(A;B)_{\rho^x}. \tag{9.1.84}
\]
Since the inequality holds for all pure-state decompositions of 𝜌 𝑋 𝐴𝐵 , we conclude
that
\[
E_F(XA;B)_\rho \geq \sum_x p(x)\,E_F(A;B)_{\rho^x}. \tag{9.1.85}
\]
Now let $\{(p(y|x), \psi^{x,y}_{AB})\}_y$ be a pure-state decomposition of $\rho^x_{AB}$. Then we find that
\[
\sum_{x,y} p(x)\,p(y|x)\,H(\psi^{x,y}_B) \geq E_F(XA;B)_\rho \tag{9.1.86}
\]
because the ensemble $\{(p(x)p(y|x), |x\rangle\!\langle x|_X \otimes \psi^{x,y}_{AB})\}_{x,y}$ is a particular pure-state
decomposition of 𝜌 𝑋 𝐴𝐵 . Since the inequality holds for all pure-state decompositions
of $\rho^x_{AB}$, we conclude that
\[
\sum_x p(x)\,E_F(A;B)_{\rho^x} \geq E_F(XA;B)_\rho. \tag{9.1.87}
\]
Putting together (9.1.85) and (9.1.87), we conclude that
\[
\sum_x p(x)\,E_F(A;B)_{\rho^x} = E_F(XA;B)_\rho. \tag{9.1.88}
\]
By the same argument, but exchanging the roles of Alice and Bob, we conclude that
\[
\sum_x p(x)\,E_F(A;B)_{\rho^x} = E_F(A;BX)_\rho. \tag{9.1.89}
\]

This concludes the proof. ■


Proposition 9.7 Subadditivity of Entanglement of Formation

The entanglement of formation is subadditive; i.e., (9.1.9) holds with 𝐸 set to
𝐸𝐹 .

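Proof: Let $\sum_x p(x)\,\psi^x_{A_1B_1}$ and $\sum_y q(y)\,\phi^y_{A_2B_2}$ be respective pure-state decompositions
of $\tau_{A_1B_1}$ and $\omega_{A_2B_2}$. Then $\sum_{x,y} p(x)\,q(y)\,\psi^x_{A_1B_1} \otimes \phi^y_{A_2B_2}$ is a pure-state
decomposition of $\tau_{A_1B_1} \otimes \omega_{A_2B_2}$. It follows that
\begin{align}
E_F(A_1A_2;B_1B_2)_{\tau\otimes\omega} &\leq \sum_{x,y} p(x)\,q(y)\,H(\psi^x_{A_1} \otimes \phi^y_{A_2}) \tag{9.1.90}\\
&= \sum_{x,y} p(x)\,q(y)\,[H(\psi^x_{A_1}) + H(\phi^y_{A_2})] \tag{9.1.91}\\
&= \sum_x p(x)\,H(\psi^x_{A_1}) + \sum_y q(y)\,H(\phi^y_{A_2}). \tag{9.1.92}
\end{align}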

Since the inequality holds for arbitrary pure-state decompositions of 𝜏𝐴1 𝐵1 and
𝜔 𝐴2 𝐵2 , we conclude that subadditivity holds. ■

The opposite inequality, superadditivity of entanglement of formation, is known
not to hold in general. Thus, the entanglement of formation is non-additive
in general. The proof of this statement is highly nontrivial (please consult the
Bibliographic Notes in Section 9.6).
The entanglement of formation is connected to an information-theoretic task:
𝐸 𝐹 (𝜌 𝐴𝐵 ) is an achievable rate for the task of preparing the state 𝜌 𝐴𝐵 from many
copies of the two-qubit maximally entangled state |Φ⟩ 𝐴𝐵 when allowing LOCC for
free (please consult the Bibliographic Notes in Section 9.6).
For two-qubit states 𝜌 𝐴𝐵 , the entanglement of formation has the following
analytic expression:
\[
E_F(\rho_{AB}) = h_2\!\left(\frac{1 + \sqrt{1 - C(\rho_{AB})^2}}{2}\right) \quad \text{(two-qubit states)} \tag{9.1.93}
\]
(please consult the Bibliographic Notes in Section 9.6). Here,
\[
C(\rho_{AB}) = \max\{0, \lambda_1 - \lambda_2 - \lambda_3 - \lambda_4\}, \tag{9.1.94}
\]
where $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ are the eigenvalues of $\sqrt{\sqrt{\rho_{AB}}\,\tilde{\rho}_{AB}\,\sqrt{\rho_{AB}}}$ in decreasing order.
The operator $\tilde{\rho}_{AB}$ is defined as $\tilde{\rho}_{AB} := (Y \otimes Y)\,\overline{\rho}_{AB}\,(Y \otimes Y)$, with 𝑌 being the Pauli-𝑌
operator (see (4.5.25)) and $\overline{\rho}_{AB}$ being the complex conjugate of 𝜌 𝐴𝐵 in the standard
basis.
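
The closed-form expression for two-qubit states can be evaluated numerically. Below is a minimal sketch, assuming NumPy and a 4 × 4 density matrix in the standard basis; the function names (`psd_sqrt`, `concurrence`, `h2`, `entanglement_of_formation_two_qubit`) are assumptions of this example, and small negative eigenvalues arising from rounding are clipped to zero.

```python
import numpy as np

Y = np.array([[0, -1j], [1j, 0]])
YY = np.kron(Y, Y)

def psd_sqrt(rho):
    """Square root of a positive semi-definite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(rho)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T

def concurrence(rho):
    """C(rho) from (9.1.94) for a two-qubit density matrix."""
    rho = np.asarray(rho, dtype=complex)
    rho_tilde = YY @ rho.conj() @ YY          # (Y x Y) conj(rho) (Y x Y)
    s = psd_sqrt(rho)
    M = s @ rho_tilde @ s
    M = (M + M.conj().T) / 2                  # remove rounding asymmetry
    lam = np.sqrt(np.clip(np.linalg.eigvalsh(M), 0, None))
    lam = np.sort(lam)[::-1]                  # decreasing order
    return float(max(0.0, lam[0] - lam[1] - lam[2] - lam[3]))

def h2(p):
    """Binary entropy in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def entanglement_of_formation_two_qubit(rho):
    """E_F from (9.1.93)."""
    c = concurrence(rho)
    return h2((1 + np.sqrt(max(0.0, 1 - c**2))) / 2)

phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell = np.outer(phi, phi)
werner = 0.8 * bell + 0.2 * np.eye(4) / 4     # Bell state mixed with white noise
product = np.diag([1.0, 0.0, 0.0, 0.0])

for name, rho in [("bell", bell), ("werner", werner), ("product", product)]:
    print(name, concurrence(rho), entanglement_of_formation_two_qubit(rho))
```

For the noisy Bell state with weight 0.8, the eigenvalues can be found by hand, and the concurrence works out to (3 · 0.8 − 1)/2 = 0.7, which serves as a sanity check on the numerics.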

9.1.1.2 Negativity and Logarithmic Negativity

Similar to how we motivated the entanglement of formation from the Schmidt rank
criterion for pure states, we can also motivate an entanglement measure from the
PPT criterion. The PPT criterion states that if the partial transpose T𝐵 (𝜌 𝐴𝐵 ) of a
given state 𝜌 𝐴𝐵 has a negative eigenvalue, then 𝜌 𝐴𝐵 is entangled.2 We use this fact
to define the negativity of 𝜌 𝐴𝐵 as
\[
N(\rho_{AB}) := \frac{\left\| \mathrm{T}_B(\rho_{AB}) \right\|_1 - 1}{2}, \tag{9.1.95}
\]
and the logarithmic negativity (often written simply as log-negativity) of 𝜌 𝐴𝐵 as
\[
E_N(\rho_{AB}) := \log_2 \left\| \mathrm{T}_B(\rho_{AB}) \right\|_1. \tag{9.1.96}
\]

Both the negativity and the log-negativity quantify the extent to which the
partial transpose T𝐵 (𝜌 𝐴𝐵 ) has negative eigenvalues. In particular, suppose that
T𝐵 (𝜌 𝐴𝐵 ) has the following Jordan–Hahn decomposition:

T𝐵 (𝜌 𝐴𝐵 ) = 𝑃 − 𝑁, (9.1.97)

where 𝑃 and 𝑁 are the positive and negative parts of T𝐵 (𝜌 𝐴𝐵 ), satisfying 𝑃, 𝑁 ≥ 0

and 𝑃𝑁 = 0, and we have used (2.2.66) and (2.2.67). By definition of the trace
norm,
∥T𝐵 (𝜌 𝐴𝐵 ) ∥ 1 = Tr[𝑃 + 𝑁]. (9.1.98)
On the other hand, observe that Tr[T𝐵 (𝜌 𝐴𝐵 )] = Tr[𝜌 𝐴𝐵 ] = 1, so that

1 = Tr[T𝐵 (𝜌 𝐴𝐵 )] = Tr[𝑃 − 𝑁]. (9.1.99)

Therefore,
$$N(\rho_{AB}) = \frac{\|\mathrm{T}_B(\rho_{AB})\|_1 - 1}{2} = \frac{\|\mathrm{T}_B(\rho_{AB})\|_1 - \mathrm{Tr}[\mathrm{T}_B(\rho_{AB})]}{2} = \mathrm{Tr}[N]. \tag{9.1.100}$$
(Footnote 2: Note that it does not matter in which basis the transpose is defined.)

So, according to (2.2.67), the negativity is the sum of the absolute values of the negative eigenvalues of $\rho_{AB}^{\mathrm{T}_B}$.
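The definitions in (9.1.95)–(9.1.96) and the identity $N(\rho_{AB}) = \mathrm{Tr}[N]$ in (9.1.100) can be checked numerically. The sketch below assumes NumPy; the helper names `partial_transpose_B`, `negativity`, and `log_negativity` are illustrative and not taken from the text.

```python
import numpy as np

def partial_transpose_B(rho, dim_a, dim_b):
    """Partial transpose on B in the standard basis."""
    r = rho.reshape(dim_a, dim_b, dim_a, dim_b)
    # Swap the row and column indices of system B only.
    return r.transpose(0, 3, 2, 1).reshape(dim_a * dim_b, dim_a * dim_b)

def trace_norm_pt(rho, dim_a, dim_b):
    """Trace norm of T_B(rho); T_B(rho) is Hermitian, so sum |eigenvalues|."""
    eigs = np.linalg.eigvalsh(partial_transpose_B(rho, dim_a, dim_b))
    return float(np.sum(np.abs(eigs)))

def negativity(rho, dim_a, dim_b):
    """N(rho) as in (9.1.95)."""
    return (trace_norm_pt(rho, dim_a, dim_b) - 1) / 2

def log_negativity(rho, dim_a, dim_b):
    """E_N(rho) as in (9.1.96), in bits."""
    return float(np.log2(trace_norm_pt(rho, dim_a, dim_b)))

# Maximally entangled two-qubit state.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell = np.outer(phi, phi)

# Sum of the absolute values of the negative eigenvalues of T_B(rho).
eigs = np.linalg.eigvalsh(partial_transpose_B(bell, 2, 2))
neg_from_eigs = float(-np.sum(eigs[eigs < 0]))

neg_bell = negativity(bell, 2, 2)
logneg_bell = log_negativity(bell, 2, 2)
```

For this state the partial transpose has eigenvalues 1/2 (three times) and -1/2, so both ways of computing the negativity agree.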
By utilizing Hölder duality and semi-definite programming duality, it is possible to write $\|\mathrm{T}_B(\rho_{AB})\|_1$ as the following primal and dual semi-definite programs:
\begin{align}
\|\mathrm{T}_B(\rho_{AB})\|_1 &= \sup_{R_{AB}} \left\{ \mathrm{Tr}[R_{AB}\rho_{AB}] : -\mathbb{1}_{AB} \leq \mathrm{T}_B(R_{AB}) \leq \mathbb{1}_{AB} \right\} \tag{9.1.101} \\
&= \inf_{K_{AB}, L_{AB} \geq 0} \left\{ \mathrm{Tr}[K_{AB} + L_{AB}] : \mathrm{T}_B(K_{AB} - L_{AB}) = \rho_{AB} \right\}, \tag{9.1.102}
\end{align}
where the optimization in the first line is with respect to Hermitian $R_{AB}$. We give a proof of (9.1.101)–(9.1.102) in Appendix 9.A.

Proposition 9.8
The log-negativity is non-negative for all bipartite states, and it is faithful on
the set of PPT states (i.e., it is equal to zero if and only if a state is PPT).

Proof: To see the first statement, we note that the choice 𝑅 𝐴𝐵 = 1 𝐴𝐵 is feasible
for the primal SDP in (9.1.101), so that ∥T𝐵 (𝜌 𝐴𝐵 ) ∥ 1 ≥ 1, and hence 𝐸 𝑁 (𝜌 𝐴𝐵 ) ≥ 0,
for every bipartite state 𝜌 𝐴𝐵 .
Suppose that 𝜌 𝐴𝐵 is a PPT state. Then ∥T𝐵 (𝜌 𝐴𝐵 )∥ 1 = Tr[T𝐵 (𝜌 𝐴𝐵 )] =
Tr[𝜌 𝐴𝐵 ] = 1 due to the assumption that T𝐵 (𝜌 𝐴𝐵 ) ≥ 0, implying that 𝐸 𝑁 (𝜌 𝐴𝐵 ) = 0
for every PPT state.
Finally, suppose that 𝐸 𝑁 (𝜌 𝐴𝐵 ) = 0. Then ∥T𝐵 (𝜌 𝐴𝐵 ) ∥ 1 = 1, and employing the
notation of (9.1.98)–(9.1.99), we conclude that 1 = Tr[𝑃 + 𝑁] = Tr[𝑃 − 𝑁], which
implies that Tr[𝑁] = 0. Since 𝑁 ≥ 0, this implies that 𝑁 = 0. Thus, T𝐵 (𝜌 𝐴𝐵 ) has
no negative part and 𝜌 𝐴𝐵 is thus a PPT state. ■

Definition 9.9 Selective PPT Monotonicity

As a generalization of selective LOCC monotonicity defined in (9.1.14), we say that a function $E : \mathrm{D}(\mathcal{H}_{AB}) \to \mathbb{R}$ is a selective PPT monotone if it satisfies
$$E(\rho_{AB}) \geq \sum_{x \in \mathcal{X} : p(x) > 0} p(x)\, E(\omega^x_{A'B'}), \tag{9.1.103}$$
for every bipartite state $\rho_{AB}$ and C-PPT-P instrument $\{\mathcal{P}^x_{AB \to A'B'}\}_{x \in \mathcal{X}}$, with
\begin{align}
p(x) &:= \mathrm{Tr}[\mathcal{P}^x_{AB \to A'B'}(\rho_{AB})], \tag{9.1.104} \\
\omega^x_{A'B'} &:= \frac{1}{p(x)} \mathcal{P}^x_{AB \to A'B'}(\rho_{AB}). \tag{9.1.105}
\end{align}
A C-PPT-P instrument is such that every map $\mathcal{P}^x_{AB \to A'B'}$ is completely positive, every map $\mathrm{T}_{B'} \circ \mathcal{P}^x_{AB \to A'B'} \circ \mathrm{T}_B$ is completely positive, and the sum map $\sum_{x \in \mathcal{X}} \mathcal{P}^x_{AB \to A'B'}$ is trace preserving.
It follows that $E$ is a PPT monotone if it is a selective PPT monotone, because the former is a special case of the latter in which the alphabet $\mathcal{X}$ has only one letter.

An LOCC instrument in (9.1.10) is a C-PPT-P instrument because every map $\mathcal{L}^x_{AB \to A'B'}$ in an LOCC instrument satisfies the requirements for a C-PPT-P instrument.
The negativity and the log-negativity are entanglement measures, as shown in
Proposition 9.10 below. Interestingly, the method of proof does not involve making
use of Lemma 9.2 because it is impossible to do so for the log-negativity, as the
latter is not convex. In any case, we prove a stronger result than selective LOCC
monotonicity for the log-negativity: we prove that it is a selective PPT monotone.

Proposition 9.10
The negativity and the log-negativity are selective PPT monotones, satisfying
(9.1.103). The negativity is convex, satisfying (9.1.7), but the log-negativity is
not.

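Proof: Let $\{\mathcal{P}^x_{AB \to A'B'}\}_{x \in \mathcal{X}}$ be a C-PPT-P instrument, and define $p(x)$ and $\omega^x_{A'B'}$ from $\rho_{AB}$ as in (9.1.104)–(9.1.105). Let $K_{AB}$ and $L_{AB}$ be arbitrary positive semi-definite operators satisfying
$$\mathrm{T}_B(K_{AB} - L_{AB}) = \rho_{AB}. \tag{9.1.106}$$
Then we find that
\begin{align}
p(x)\,\omega^x_{A'B'} &= \mathcal{P}^x_{AB \to A'B'}(\rho_{AB}) \tag{9.1.107} \\
&= \mathcal{P}^x_{AB \to A'B'}(\mathrm{T}_B(K_{AB} - L_{AB})) \tag{9.1.108} \\
&= \mathrm{T}_{B'}\big((\mathrm{T}_{B'} \circ \mathcal{P}^x_{AB \to A'B'} \circ \mathrm{T}_B)(K_{AB} - L_{AB})\big). \tag{9.1.109}
\end{align}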

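Let us define
\begin{align}
K^x_{A'B'} &:= \frac{1}{p(x)} (\mathrm{T}_{B'} \circ \mathcal{P}^x_{AB \to A'B'} \circ \mathrm{T}_B)(K_{AB}), \tag{9.1.110} \\
L^x_{A'B'} &:= \frac{1}{p(x)} (\mathrm{T}_{B'} \circ \mathcal{P}^x_{AB \to A'B'} \circ \mathrm{T}_B)(L_{AB}), \tag{9.1.111}
\end{align}
so that
$$\omega^x_{A'B'} = \mathrm{T}_{B'}(K^x_{A'B'} - L^x_{A'B'}). \tag{9.1.112}$$
Furthermore, since $\mathrm{T}_{B'} \circ \mathcal{P}^x_{AB \to A'B'} \circ \mathrm{T}_B$ is completely positive, it follows that $K^x_{A'B'}, L^x_{A'B'} \geq 0$. Thus, $K^x_{A'B'}$ and $L^x_{A'B'}$ are feasible for the SDP in (9.1.102) for $\|\mathrm{T}_{B'}(\omega^x_{A'B'})\|_1$, and we conclude that
$$\mathrm{Tr}[K^x_{A'B'} + L^x_{A'B'}] \geq \|\mathrm{T}_{B'}(\omega^x_{A'B'})\|_1. \tag{9.1.113}$$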

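Then consider that
\begin{align}
\mathrm{Tr}[K_{AB} + L_{AB}] &= \mathrm{Tr}[\mathrm{T}_B(K_{AB} + L_{AB})] \tag{9.1.114} \\
&= \sum_{x \in \mathcal{X} : p(x) > 0} \mathrm{Tr}[\mathcal{P}^x_{AB \to A'B'}(\mathrm{T}_B(K_{AB} + L_{AB}))] \tag{9.1.115} \\
&= \sum_{x \in \mathcal{X} : p(x) > 0} \mathrm{Tr}[(\mathrm{T}_{B'} \circ \mathcal{P}^x_{AB \to A'B'} \circ \mathrm{T}_B)(K_{AB} + L_{AB})] \tag{9.1.116} \\
&= \sum_{x \in \mathcal{X} : p(x) > 0} p(x)\, \mathrm{Tr}[K^x_{A'B'} + L^x_{A'B'}] \tag{9.1.117} \\
&\geq \sum_{x \in \mathcal{X} : p(x) > 0} p(x)\, \|\mathrm{T}_{B'}(\omega^x_{A'B'})\|_1. \tag{9.1.118}
\end{align}
The first and third equalities hold because the trace is invariant under a partial transpose. The second equality follows because the sum map $\sum_x \mathcal{P}^x_{AB \to A'B'}$ is trace preserving. The fourth equality follows from the definitions in (9.1.110)–(9.1.111). The inequality follows from (9.1.113). Since the inequality holds for all $K_{AB}, L_{AB} \geq 0$ satisfying (9.1.106), we conclude from the dual SDP in (9.1.102) that
$$\|\mathrm{T}_B(\rho_{AB})\|_1 \geq \sum_{x \in \mathcal{X} : p(x) > 0} p(x)\, \|\mathrm{T}_{B'}(\omega^x_{A'B'})\|_1. \tag{9.1.119}$$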

By applying the definition in (9.1.95), we conclude that the negativity is a selective PPT monotone. Now considering (9.1.119) and taking the logarithm, and using its monotonicity and concavity, we conclude that the log-negativity is a selective PPT monotone:
\begin{align}
E_N(\rho_{AB}) &= \log_2 \|\mathrm{T}_B(\rho_{AB})\|_1 \tag{9.1.120} \\
&\geq \log_2\!\left( \sum_{x \in \mathcal{X} : p(x) > 0} p(x)\, \|\mathrm{T}_{B'}(\omega^x_{A'B'})\|_1 \right) \tag{9.1.121} \\
&\geq \sum_{x \in \mathcal{X} : p(x) > 0} p(x) \log_2 \|\mathrm{T}_{B'}(\omega^x_{A'B'})\|_1 \tag{9.1.122} \\
&= \sum_{x \in \mathcal{X} : p(x) > 0} p(x)\, E_N(\omega^x_{A'B'}). \tag{9.1.123}
\end{align}

That the negativity is convex follows directly from the definition, convexity of
the trace norm, and linearity of the partial transpose.
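The lack of convexity of log-negativity follows from direct evaluation for the states $\Phi_{AB} := \frac{1}{2}\sum_{i,j \in \{0,1\}} |ii\rangle\langle jj|_{AB}$, $\sigma_{AB} := \frac{1}{2}\sum_{i \in \{0,1\}} |ii\rangle\langle ii|_{AB}$, and $\rho_{AB} := \frac{1}{2}(\Phi_{AB} + \sigma_{AB})$, for which we have
$$E_N(\Phi_{AB}) = 1, \qquad E_N(\sigma_{AB}) = 0, \qquad E_N(\rho_{AB}) = \log_2 \frac{3}{2}, \tag{9.1.124}$$
so that
$$E_N(\rho_{AB}) > \frac{1}{2}\left(E_N(\Phi_{AB}) + E_N(\sigma_{AB})\right). \tag{9.1.125}$$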
This concludes the proof. ■
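The values in (9.1.124) and the strict inequality in (9.1.125) can be reproduced numerically. The following is a small sketch assuming NumPy; the helper `log_negativity` is an illustrative name and not part of the text.

```python
import numpy as np

def log_negativity(rho, dim_a, dim_b):
    """E_N = log2 of the trace norm of the partial transpose on B."""
    r = rho.reshape(dim_a, dim_b, dim_a, dim_b)
    rho_tb = r.transpose(0, 3, 2, 1).reshape(dim_a * dim_b, dim_a * dim_b)
    return float(np.log2(np.sum(np.abs(np.linalg.eigvalsh(rho_tb)))))

# States from the proof: Phi, sigma, and their equal mixture rho.
phi_vec = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
big_phi = np.outer(phi_vec, phi_vec)
sigma = np.diag([0.5, 0.0, 0.0, 0.5])
rho = 0.5 * (big_phi + sigma)

en_phi = log_negativity(big_phi, 2, 2)
en_sigma = log_negativity(sigma, 2, 2)
en_rho = log_negativity(rho, 2, 2)

# Convexity would require en_rho <= average; here it fails.
average = 0.5 * (en_phi + en_sigma)
violates_convexity = en_rho > average
```

The partial transpose of the mixture has eigenvalues 1/2, 1/2, 1/4, and -1/4, giving a trace norm of 3/2, which exceeds the value 2^{1/2} that convexity would allow.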

Proposition 9.11 Additivity of Log-Negativity

The logarithmic negativity is additive; i.e., (9.1.8) holds with 𝐸 set to 𝐸 𝑁 .

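Proof: For every two states $\tau_{A_1B_1}$ and $\omega_{A_2B_2}$, consider that
\begin{align}
E_N(A_1A_2; B_1B_2)_{\tau \otimes \omega} &= \log_2 \|\mathrm{T}_{B_1B_2}(\tau_{A_1B_1} \otimes \omega_{A_2B_2})\|_1 \tag{9.1.126} \\
&= \log_2 \|\mathrm{T}_{B_1}(\tau_{A_1B_1}) \otimes \mathrm{T}_{B_2}(\omega_{A_2B_2})\|_1 \tag{9.1.127} \\
&= \log_2 \left( \|\mathrm{T}_{B_1}(\tau_{A_1B_1})\|_1 \cdot \|\mathrm{T}_{B_2}(\omega_{A_2B_2})\|_1 \right) \tag{9.1.128} \\
&= \log_2 \|\mathrm{T}_{B_1}(\tau_{A_1B_1})\|_1 + \log_2 \|\mathrm{T}_{B_2}(\omega_{A_2B_2})\|_1 \tag{9.1.129} \\
&= E_N(A_1; B_1)_\tau + E_N(A_2; B_2)_\omega. \tag{9.1.130}
\end{align}
The third equality holds because the trace norm is multiplicative with respect to tensor products.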

This concludes the proof. ■
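As a numerical illustration of additivity, the sketch below (assuming NumPy) evaluates the log-negativity of a tensor product of two two-qubit states across the cut $A_1A_2 : B_1B_2$ and compares it with the sum of the individual values. The helper names and the choice of example states are illustrative assumptions.

```python
import numpy as np

def log_negativity(rho, dim_a, dim_b):
    """E_N = log2 of the trace norm of the partial transpose on B."""
    r = rho.reshape(dim_a, dim_b, dim_a, dim_b)
    rho_tb = r.transpose(0, 3, 2, 1).reshape(dim_a * dim_b, dim_a * dim_b)
    return float(np.log2(np.sum(np.abs(np.linalg.eigvalsh(rho_tb)))))

def regroup(joint):
    """Reorder a 16x16 operator from A1 B1 A2 B2 to A1 A2 B1 B2 (all qubits)."""
    t = joint.reshape([2] * 8)
    # Row indices (a1, b1, a2, b2) -> (a1, a2, b1, b2); same for columns.
    return t.transpose(0, 2, 1, 3, 4, 6, 5, 7).reshape(16, 16)

phi_vec = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
tau = np.outer(phi_vec, phi_vec)                          # state on A1 B1
omega = 0.5 * (tau + np.diag([0.5, 0.0, 0.0, 0.5]))       # state on A2 B2

joint = regroup(np.kron(tau, omega))                      # ordered A1 A2 B1 B2
lhs = log_negativity(joint, 4, 4)                         # E_N(A1A2; B1B2)
rhs = log_negativity(tau, 2, 2) + log_negativity(omega, 2, 2)
```

Here the two sides agree up to numerical precision, both being equal to log2(3) for this particular pair of states.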

Proposition 9.12 Log-Negativity of Pure Bipartite States

For a pure bipartite state $\psi_{AB}$, the log-negativity is equal to the Rényi entropy of order $\frac{1}{2}$ of the reduced state $\psi_A$:
$$E_N(\psi_{AB}) = H_{\frac{1}{2}}(\psi_A). \tag{9.1.131}$$

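Proof: For every pure state $\psi_{AB} = |\psi\rangle\langle\psi|_{AB}$ such that
$$|\psi\rangle_{AB} = \sum_{k=1}^{r} \sqrt{\lambda_k}\, |e_k\rangle_A \otimes |f_k\rangle_B \tag{9.1.132}$$
is a Schmidt decomposition, we have that
$$\mathrm{T}_B(|\psi\rangle\langle\psi|_{AB}) = \sum_{k,k'=1}^{r} \sqrt{\lambda_k \lambda_{k'}}\, |e_k\rangle\langle e_{k'}|_A \otimes |f_{k'}\rangle\langle f_k|_B, \tag{9.1.133}$$
where we have taken the partial transpose with respect to the orthonormal set $\{|f_k\rangle_B\}_{k=1}^{r}$. Observe that
$$\mathrm{T}_B(|\psi\rangle\langle\psi|_{AB}) = F_{AB} \left( \sum_{k'=1}^{r} \sqrt{\lambda_{k'}}\, |e_{k'}\rangle\langle e_{k'}|_A \otimes \sum_{k=1}^{r} \sqrt{\lambda_k}\, |f_k\rangle\langle f_k|_B \right), \tag{9.1.134}$$
where $F_{AB} = \sum_{k,k'=1}^{r} |e_{k'}\rangle\langle e_k|_A \otimes |f_k\rangle\langle f_{k'}|_B$ is a unitary swap operator. Thus, by unitary invariance of the trace norm, the trace norm of the partial transpose is equal to the trace of the positive semi-definite operator in parentheses in (9.1.134), which is $\left(\sum_{k=1}^{r} \sqrt{\lambda_k}\right)^2$, and we obtain
$$E_N(\psi_{AB}) = \log_2 \left( \sum_{k=1}^{r} \sqrt{\lambda_k} \right)^{2} = 2 \log_2 \left( \sum_{k=1}^{r} \sqrt{\lambda_k} \right). \tag{9.1.135}$$
By comparing with (7.4.3), we conclude the statement of the proposition. ■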

For a maximally entangled state ($\lambda_k = \frac{1}{r}$ for all $1 \leq k \leq r$), we find that $E_N(\psi_{AB}) = \log_2 r$, exactly as with the entanglement of formation.
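The closed form in (9.1.135) can be compared against a direct evaluation via the partial transpose. The sketch below assumes NumPy; the function name `pure_state_quantities` and the choice of Schmidt coefficients are hypothetical.

```python
import numpy as np

def pure_state_quantities(lam):
    """For Schmidt coefficients lam (probabilities summing to one), return
    (E_N via partial transpose, 2*log2(sum sqrt(lam)), Renyi-1/2 entropy of psi_A)."""
    lam = np.asarray(lam, dtype=float)
    r = len(lam)
    psi = np.zeros(r * r)
    for k in range(r):
        psi[k * r + k] = np.sqrt(lam[k])      # |psi> = sum_k sqrt(lam_k) |k>|k>
    rho = np.outer(psi, psi)

    # Log-negativity by direct evaluation of ||T_B(psi)||_1.
    rho_tb = rho.reshape(r, r, r, r).transpose(0, 3, 2, 1).reshape(r * r, r * r)
    en_direct = float(np.log2(np.sum(np.abs(np.linalg.eigvalsh(rho_tb)))))

    # Closed form from (9.1.135).
    en_schmidt = float(2 * np.log2(np.sum(np.sqrt(lam))))

    # Renyi entropy of order 1/2 of the reduced state psi_A.
    m = psi.reshape(r, r)
    psi_a = m @ m.conj().T
    eig_a = np.clip(np.linalg.eigvalsh(psi_a), 0.0, None)
    alpha = 0.5
    renyi_half = float(np.log2(np.sum(eig_a ** alpha)) / (1 - alpha))
    return en_direct, en_schmidt, renyi_half

en_direct, en_schmidt, renyi_half = pure_state_quantities([0.5, 0.3, 0.2])
en_max, _, _ = pure_state_quantities([1 / 3, 1 / 3, 1 / 3])
```

All three quantities coincide for the example coefficients, and the uniform case with r = 3 gives log2(3).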

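[Figure: the set $\mathrm{D}(\mathcal{H}_{AB})$ of states containing the subset $\mathrm{SEP}(A:B)$; a state $\rho_{AB}$ outside the subset is joined to its closest separable state $\sigma^*_{AB}$, with the divergence $\boldsymbol{D}(\rho_{AB}\|\sigma^*_{AB})$ marked between them.]

Figure 9.1: A simple way to measure the entanglement of a bipartite state $\rho_{AB}$ is to calculate its divergence with the set of separable states. If we use a generalized divergence $\boldsymbol{D}$ as our measure, then the measure of the entanglement in $\rho_{AB}$ is given by the smallest value of $\boldsymbol{D}(\rho_{AB}\|\sigma_{AB})$, where $\sigma_{AB} \in \mathrm{SEP}(A:B)$ is a separable state, i.e., by the quantity $\boldsymbol{D}(\rho_{AB}\|\sigma^*_{AB}) = \inf_{\sigma_{AB} \in \mathrm{SEP}(A:B)} \boldsymbol{D}(\rho_{AB}\|\sigma_{AB})$.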

9.1.1.3 Divergence-Based Measures

The two entanglement measures considered above are based on specific mathemati-
cal properties of entanglement. However, using the fact that entangled states are, by
definition, not separable, we can construct a broad class of entanglement measures
by finding the divergence of a given state 𝜌 𝐴𝐵 with the set of separable states. This
idea is illustrated in Figure 9.1. We primarily consider such divergence-based
entanglement measures in this book (in the research literature, these are also called
“distance-based” entanglement measures, even though divergences that are not
distances, such as relative entropy, are used in this approach).
As an example of a divergence-based entanglement measure, let us consider a concrete divergence, the normalized trace distance, which we defined in Section 6.1 as $\frac{1}{2}\|\rho - \sigma\|_1$ for every two states $\rho$ and $\sigma$. Mathematically, the distance of a point to a set is defined by finding the element of that set that is closest to the given point. With this idea, we define the trace distance of entanglement of a state $\rho_{AB}$ as the normalized trace distance from $\rho_{AB}$ to the closest state $\sigma_{AB} \in \mathrm{SEP}(A:B)$:
$$E_T(A;B)_\rho := \inf_{\sigma_{AB} \in \mathrm{SEP}(A:B)} \frac{1}{2} \|\rho_{AB} - \sigma_{AB}\|_1. \tag{9.1.136}$$
Note that the infimum is indeed achieved, because $\mathrm{SEP}(A:B)$ is a compact set and the trace norm is continuous in $\sigma_{AB}$, so that there always exists a closest separable state to the given state $\rho_{AB}$. Recall that we implicitly introduced the trace distance of entanglement in Proposition 9.5, when considering approximate faithfulness of the entanglement of formation.
The quantity $E_T$ is indeed an entanglement measure. To see this, we use the data-processing inequality for the trace distance (Theorem 6.3), and the fact that separable states are preserved under LOCC channels (which follows immediately from the definition of LOCC channels). Then, for every state $\rho_{AB}$ and every LOCC channel $\mathcal{L}_{AB \to A'B'}$, letting $\omega_{A'B'} = \mathcal{L}_{AB \to A'B'}(\rho_{AB})$, we obtain
\begin{align}
E_T(A;B)_\rho &= \inf_{\sigma_{AB} \in \mathrm{SEP}(A:B)} \frac{1}{2} \|\rho_{AB} - \sigma_{AB}\|_1 \tag{9.1.137} \\
&\geq \inf_{\sigma_{AB} \in \mathrm{SEP}(A:B)} \frac{1}{2} \|\mathcal{L}_{AB \to A'B'}(\rho_{AB}) - \mathcal{L}_{AB \to A'B'}(\sigma_{AB})\|_1 \tag{9.1.138} \\
&\geq \inf_{\tau_{A'B'} \in \mathrm{SEP}(A':B')} \frac{1}{2} \|\mathcal{L}_{AB \to A'B'}(\rho_{AB}) - \tau_{A'B'}\|_1 \tag{9.1.139} \\
&= E_T(A';B')_\omega. \tag{9.1.140}
\end{align}

Although the simple proof above makes it clear that the trace distance of entangle-
ment is an LOCC monotone, it is known that the trace distance of entanglement
is not a selective LOCC monotone, as defined in (9.1.14) (please consult the
Bibliographic Notes in Section 9.6).
The trace distance of entanglement is also faithful, which is due to the fact that the infimum in (9.1.136) is achieved and the trace distance is a metric in the mathematical sense: $\frac{1}{2}\|\rho_{AB} - \sigma_{AB}\|_1 \geq 0$ for all states $\rho_{AB}, \sigma_{AB}$, and $\frac{1}{2}\|\rho_{AB} - \sigma_{AB}\|_1 = 0$ if and only if $\rho_{AB} = \sigma_{AB}$.
Beyond the trace distance, we can take any distinguishability measure and
define an entanglement measure analogous to the one in (9.1.136). That is, we can
take any generalized divergence 𝑫 as our divergence. Recall from Definition 7.15
that a generalized divergence is a function 𝑫 : D(H) × L+ (H) → R ∪ {+∞} that
obeys the data-processing inequality. We then define the generalized divergence of entanglement of $\rho_{AB}$ as follows:
$$\boldsymbol{E}(A;B)_\rho := \inf_{\sigma_{AB} \in \mathrm{SEP}(A:B)} \boldsymbol{D}(\rho_{AB}\|\sigma_{AB}). \tag{9.1.141}$$
See Figure 9.1 for a visual depiction of the idea behind this quantity. If the generalized divergence $\boldsymbol{D}$ is continuous in its second argument, then the infimum in (9.1.141) is achieved. We study this entanglement measure in much more detail in Section 9.2. By the data-processing inequality for the generalized divergence, as well as the fact that separable states are preserved under LOCC channels, it follows that $\boldsymbol{E}(A;B)_\rho$ is an entanglement measure. We prove this and other properties
of the generalized divergence of entanglement in Proposition 9.16. As was the
case for the trace distance of entanglement, it does not necessarily follow that the
generalized divergence of entanglement is a selective LOCC monotone, as defined
in (9.1.14), but it does hold for some important cases.
Although the generalized divergence of entanglement of a bipartite state is
conceptually simple, it is in general difficult to optimize over the set of separable
states because it does not have a simple characterization (except in low dimensions).
This means that the generalized divergence of entanglement is difficult to compute
in most cases.
To obtain an entanglement measure that is simpler to compute, one idea is to
relax the optimization in (9.1.141) from the set of separable states to some other set
of states that contains the set of separable states. It is ideal if this other set is easier
to characterize than the set of separable states. As a first step, let us recall the PPT
criterion from Section 3.2.9, which states that if a bipartite state is separable, then it
is PPT, meaning that it has positive partial transpose (recall Definition 3.17). This
fact immediately leads to the containment SEP( 𝐴 : 𝐵) ⊆ PPT( 𝐴 : 𝐵). (As stated in
Section 3.2.9, if both 𝐴 and 𝐵 are qubits, or if one of them is a qubit and the other
a qutrit, then $\mathrm{PPT}(A:B) = \mathrm{SEP}(A:B)$.) We can thus define the PPT generalized divergence of a bipartite state $\rho_{AB}$ as
$$\boldsymbol{E}_{\mathrm{PPT}}(A;B)_\rho := \inf_{\sigma_{AB} \in \mathrm{PPT}(A:B)} \boldsymbol{D}(\rho_{AB}\|\sigma_{AB}). \tag{9.1.142}$$
If the generalized divergence $\boldsymbol{D}$ is continuous in its second argument, then the infimum is achieved. Also, note that $\boldsymbol{E}_{\mathrm{PPT}}(A;B)_\rho = \boldsymbol{E}(A;B)_\rho$ when both $A$ and $B$ are qubits or when one of them is a qubit and the other a qutrit.
Like the generalized divergence of entanglement, the PPT divergence is an
entanglement measure. In fact, the PPT divergence is monotone under C-PPT-P
channels, as defined in Definition 4.27. This is due to the data-processing inequality
for the generalized divergence and the fact that the set of PPT states is closed
under C-PPT-P channels (see Proposition 4.28). Since the set of LOCC channels is
contained in the set of C-PPT-P channels (Propositions 4.24 and 4.29), it follows
that the PPT divergence is an entanglement measure.
Unlike the generalized divergence of entanglement, the PPT divergence is not a faithful entanglement measure. It is true that $\boldsymbol{E}_{\mathrm{PPT}}(A;B)_\rho = 0$ for all separable states $\rho_{AB}$ due to the containment
$$\mathrm{SEP}(A:B) \subseteq \mathrm{PPT}(A:B). \tag{9.1.143}$$
However, the converse statement is not true because the infimum in (9.1.142), if achieved, need not be achieved by a separable state. In other words, there exist PPT entangled states $\rho_{AB}$ for which $\boldsymbol{E}_{\mathrm{PPT}}(A;B)_\rho = 0$.

[Figure: nested sets, with $\mathrm{SEP}(A:B)$ inside $\mathrm{PPT}(A:B)$ inside $\mathrm{PPT}'(A:B)$.]

Figure 9.2: The set $\mathrm{SEP}(A:B)$ of separable states acting on the Hilbert space $\mathcal{H}_{AB}$ is contained in the set $\mathrm{PPT}(A:B)$ of positive partial transpose (PPT) states, which in turn is contained in the set $\mathrm{PPT}'(A:B)$ of operators defined in (9.1.144). The sets PPT and PPT′ are relaxations of the set of separable states that can be easily characterized in terms of semi-definite constraints.

It turns out to be useful to relax the set of PPT states further:

Definition 9.13 PPT’

Let PPT′ ( 𝐴 : 𝐵) denote the following convex set of positive semi-definite
operators:

PPT′ ( 𝐴 : 𝐵) B {𝜎𝐴𝐵 : 𝜎𝐴𝐵 ≥ 0, ∥T𝐵 (𝜎𝐴𝐵 ) ∥ 1 ≤ 1} . (9.1.144)

Convexity of the set $\mathrm{PPT}'(A:B)$ follows from convexity of the trace norm. Furthermore, the set $\mathrm{PPT}'(A:B)$ contains the set of PPT states because every PPT state $\sigma_{AB}$ satisfies $\|\mathrm{T}_B(\sigma_{AB})\|_1 = 1$. In addition, every operator $\sigma_{AB} \in \mathrm{PPT}'(A:B)$ is subnormalized, satisfying $\mathrm{Tr}[\sigma_{AB}] \leq 1$, which follows because
$$\mathrm{Tr}[\sigma_{AB}] = \mathrm{Tr}[\mathrm{T}_B(\sigma_{AB})] \leq \|\mathrm{T}_B(\sigma_{AB})\|_1 \leq 1. \tag{9.1.145}$$
The set $\mathrm{PPT}'(A:B)$ can be written equivalently as
$$\mathrm{PPT}'(A:B) = \{\sigma_{AB} : \sigma_{AB} \geq 0,\ E_N(\sigma_{AB}) \leq 0\}, \tag{9.1.146}$$
by inspecting the formula for log-negativity in (9.1.96). By comparing with (3.2.117), we clearly have the containment
$$\mathrm{SEP}(A:B) \subseteq \mathrm{PPT}(A:B) \subseteq \mathrm{PPT}'(A:B). \tag{9.1.147}$$
See Figure 9.2 for a visual depiction of this containment.
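Membership in the set PPT′ from (9.1.144) amounts to two checks, positive semi-definiteness and a trace-norm bound on the partial transpose, both of which are easy to test numerically. The sketch below assumes NumPy; the function name `in_ppt_prime` and the tolerance are illustrative assumptions.

```python
import numpy as np

def partial_transpose_B(op, dim_a, dim_b):
    """Partial transpose on B in the standard basis."""
    r = op.reshape(dim_a, dim_b, dim_a, dim_b)
    return r.transpose(0, 3, 2, 1).reshape(dim_a * dim_b, dim_a * dim_b)

def in_ppt_prime(sigma, dim_a, dim_b, tol=1e-9):
    """Check sigma >= 0 and ||T_B(sigma)||_1 <= 1, as in (9.1.144)."""
    is_psd = np.min(np.linalg.eigvalsh(sigma)) >= -tol
    pt_eigs = np.linalg.eigvalsh(partial_transpose_B(sigma, dim_a, dim_b))
    trace_norm = np.sum(np.abs(pt_eigs))
    return bool(is_psd and trace_norm <= 1 + tol)

phi_vec = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell = np.outer(phi_vec, phi_vec)
mixed = np.eye(4) / 4

bell_in = in_ppt_prime(bell, 2, 2)             # entangled state: trace norm 2
half_bell_in = in_ppt_prime(0.5 * bell, 2, 2)  # subnormalized operator: trace norm 1
mixed_in = in_ppt_prime(mixed, 2, 2)           # separable state
```

The operator `0.5 * bell` passes the check even though it has trace 1/2, which illustrates that PPT′ contains subnormalized operators that are not quantum states.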

We define the generalized Rains divergence of a bipartite state $\rho_{AB}$ as
$$\boldsymbol{R}(A;B)_\rho := \inf_{\sigma_{AB} \in \mathrm{PPT}'(A:B)} \boldsymbol{D}(\rho_{AB}\|\sigma_{AB}). \tag{9.1.148}$$
If the underlying generalized divergence $\boldsymbol{D}$ is continuous in its second argument, then the infimum is achieved in (9.1.148). Like the entanglement measures $\boldsymbol{E}(A;B)_\rho$ and $\boldsymbol{E}_{\mathrm{PPT}}(A;B)_\rho$, the generalized Rains divergence is an entanglement measure. In fact, it is monotone under C-PPT-P channels, which follows from
the data-processing inequality for the generalized divergence and because the set
PPT′ is preserved under C-PPT-P channels (a consequence of Lemma 9.14 below).
Since the set of LOCC channels is contained in the set of C-PPT-P channels
(Propositions 4.24 and 4.29), it follows that the generalized Rains divergence is an
entanglement measure.
As with the PPT divergence, the generalized Rains divergence is not a faithful
entanglement measure. It is true that 𝑹( 𝐴; 𝐵) 𝜌 = 0 for all separable states 𝜌 𝐴𝐵 , due
to the containment SEP( 𝐴 : 𝐵) ⊆ PPT′ ( 𝐴 : 𝐵). However, the converse statement is
not true because the infimum in (9.1.148) need not be achieved by a separable state.
Depending on the form of the generalized divergence 𝑫, the relaxation from
the set SEP to the set PPT′ leads to an entanglement measure that can be computed
efficiently via semi-definite programming (Section 2.4). We investigate one such
example of an entanglement measure in Section 9.3.1. Also, due to the containments
in (9.1.147), we have that
$$\boldsymbol{E}(A;B)_\rho \geq \boldsymbol{E}_{\mathrm{PPT}}(A;B)_\rho \geq \boldsymbol{R}(A;B)_\rho, \tag{9.1.149}$$
for every bipartite state $\rho_{AB}$. Thus, as we show later in the book, the relaxation from SEP to PPT′ via the generalized Rains divergence not only allows for the possibility of efficiently computable entanglement measures, but due to the inequality in (9.1.149), it also allows for the possibility of obtaining a tighter upper bound on communication rates in certain scenarios. We investigate the properties of the generalized Rains divergence in detail in Section 9.3.
Before proceeding, let us state some properties of the set PPT′.

Lemma 9.14 Properties of the Set PPT′

The set PPT′ ( 𝐴 : 𝐵) defined in (9.1.144) has the following properties:
1. It is closed under completely PPT-preserving channels (recall Definition
4.27). In more detail, let P 𝐴𝐵→𝐴′ 𝐵′ be a completely PPT-preserving channel.
Then, for every state 𝜌 𝐴𝐵 ∈ PPT′ ( 𝐴 : 𝐵), we have that P 𝐴𝐵→𝐴′ 𝐵′ (𝜌 𝐴𝐵 ) ∈
PPT′ ( 𝐴′ : 𝐵′).
2. It is closed under LOCC channels (recall Definition 4.22). In more detail, let
L 𝐴𝐵→𝐴′ 𝐵′ be an LOCC channel. Then, for every state 𝜌 𝐴𝐵 ∈ PPT′ ( 𝐴 : 𝐵),
it holds that L 𝐴𝐵→𝐴′ 𝐵′ (𝜌 𝐴𝐵 ) ∈ PPT′ ( 𝐴′ : 𝐵′).

Remark: We emphasize that not all operators in the set PPT′ are quantum states, meaning that
not all operators 𝜎𝐴𝐵 ∈ PPT′ ( 𝐴 : 𝐵) satisfy Tr[𝜎𝐴𝐵 ] = 1.

Proof:
1. Let 𝜎𝐴′ 𝐵′ = P 𝐴𝐵→𝐴′ 𝐵′ (𝜌 𝐴𝐵 ). Since P 𝐴𝐵→𝐴′ 𝐵′ is a channel, and 𝜌 𝐴𝐵 is a state,
we have that 𝜎𝐴′ 𝐵′ ≥ 0. Then,

T𝐵′ (𝜎𝐴′ 𝐵′ ) = (T𝐵′ ◦ P 𝐴𝐵→𝐴′ 𝐵′ )(𝜌 𝐴𝐵 ) (9.1.150)

= (T𝐵′ ◦ P 𝐴𝐵→𝐴′ 𝐵′ ◦ T𝐵 ◦ T𝐵 )(𝜌 𝐴𝐵 ) (9.1.151)
= (T𝐵′ ◦ P 𝐴𝐵→𝐴′ 𝐵′ ◦ T𝐵 )(T𝐵 (𝜌 𝐴𝐵 )). (9.1.152)

Now, consider that the induced trace norm of T𝐵′ ◦ P 𝐴𝐵→𝐴′ 𝐵′ ◦ T𝐵 satisfies
∥T𝐵′ ◦ P 𝐴𝐵→𝐴′ 𝐵′ ◦ T𝐵 ∥ 1 = 1, which follows from (2.2.184)–(2.2.185) and
the fact that T𝐵′ ◦ P 𝐴𝐵→𝐴′ 𝐵′ ◦ T𝐵 is a channel by definition of P 𝐴𝐵→𝐴′ 𝐵′ .
Furthermore, we have that ∥T𝐵 (𝜌 𝐴𝐵 ) ∥ 1 ≤ 1 because 𝜌 𝐴𝐵 ∈ PPT′ ( 𝐴 : 𝐵).
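Putting these observations together, we find that
\begin{align}
\|\mathrm{T}_{B'}(\sigma_{A'B'})\|_1 &= \|(\mathrm{T}_{B'} \circ \mathcal{P}_{AB \to A'B'} \circ \mathrm{T}_B)(\mathrm{T}_B(\rho_{AB}))\|_1 \tag{9.1.153} \\
&\leq \|\mathrm{T}_{B'} \circ \mathcal{P}_{AB \to A'B'} \circ \mathrm{T}_B\|_1\, \|\mathrm{T}_B(\rho_{AB})\|_1 \tag{9.1.154} \\
&\leq 1. \tag{9.1.155}
\end{align}
Therefore, $\sigma_{A'B'} \in \mathrm{PPT}'(A':B')$.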

2. Since every LOCC channel is a completely PPT-preserving channel (Proposi-
tions 4.24 and 4.29), the result immediately follows from the proof above. ■

9.1.1.4 Squashed Entanglement

In the previous example, we considered the generalized divergence of entanglement, which is simply the generalized divergence of a given bipartite state with the set of separable states. If we restrict ourselves to product states only, i.e., states of the form $\tau_A \otimes \sigma_B$, and we consider the quantum relative entropy, then we obtain
$$\inf_{\tau_A, \sigma_B} D(\rho_{AB}\|\tau_A \otimes \sigma_B) = I(A;B)_\rho, \tag{9.1.156}$$
where we recall the expression in (7.2.99) for the mutual information $I(A;B)_\rho$ of the state $\rho_{AB}$. Thus, the mutual information is the minimal relative entropy between
the state of interest and the set of product states. We can thus view the mutual
information as a measure of the correlations contained in the bipartite state 𝜌 𝐴𝐵 .
However, the mutual information quantifies both classical and quantum correlations
because it detects any correlation whatsoever. Thus, it cannot be the case that the
mutual information is an entanglement measure, because it is strictly positive even
for some separable states that have only classical correlations, and so it can in
general increase under LOCC channels.
Regardless, we can still use the mutual information in a meaningful way to quantify entanglement. For example, suppose that $\sigma_{AB}$ is a separable state, so that
$$\sigma_{AB} = \sum_{x \in \mathcal{X}} p(x)\, \rho^x_A \otimes \tau^x_B, \tag{9.1.157}$$
for a finite alphabet $\mathcal{X}$, probability distribution $p : \mathcal{X} \to [0,1]$, and sets $\{\rho^x_A\}_{x \in \mathcal{X}}$, $\{\tau^x_B\}_{x \in \mathcal{X}}$ of states. Let us form the following extension of $\sigma_{AB}$ to a state $\omega_{ABX}$, with $X$ a classical register:
$$\omega_{ABX} = \sum_{x \in \mathcal{X}} p(x)\, \rho^x_A \otimes \tau^x_B \otimes |x\rangle\langle x|_X. \tag{9.1.158}$$
This is indeed an extension because $\mathrm{Tr}_X[\omega_{ABX}] = \sigma_{AB}$. Let us now consider the conditional mutual information $I(A;B|X)_\omega$ of this extension (recall the definition of the quantum conditional mutual information in (7.1.11)). Since $\omega_{ABX}$ is a classical–quantum state, it follows that
$$I(A;B|X)_\omega = \sum_{x \in \mathcal{X}} p(x)\, I(A;B)_{\rho^x \otimes \tau^x} = 0. \tag{9.1.159}$$
Therefore, while the mutual information of a separable state can in general be non-zero, the conditional mutual information of this classical extension is always zero. Intuitively, this is due to the fact that the classical system acts as a “probe,” which, when measured, reveals a value $x \in \mathcal{X}$ for the classical system $X$. Conditioned on this value, the joint state $\rho^x_A \otimes \tau^x_B$ is product.
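The contrast between the mutual information and the conditional mutual information of a classical extension can be seen in a small numerical example. The sketch below assumes NumPy and uses a classically correlated two-qubit state; the helper names `entropy` and `partial_trace`, and the ordering of systems as A, B, X, are illustrative assumptions.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]
    return float(-np.sum(vals * np.log2(vals)))

def partial_trace(rho, dims, keep):
    """Keep the subsystems listed in `keep` (ascending indices), trace out the rest."""
    n = len(dims)
    r = rho.reshape(list(dims) + list(dims))
    # Trace out from the highest index down so lower axis positions stay valid.
    for i in sorted(set(range(n)) - set(keep), reverse=True):
        r = np.trace(r, axis1=i, axis2=i + r.ndim // 2)
    d = int(np.prod([dims[i] for i in keep]))
    return r.reshape(d, d)

p0 = np.diag([1.0, 0.0])
p1 = np.diag([0.0, 1.0])
dims = [2, 2, 2]  # systems ordered as A, B, X

# omega_ABX = 1/2 |0><0| x |0><0| x |0><0| + 1/2 |1><1| x |1><1| x |1><1|
omega = 0.5 * np.kron(np.kron(p0, p0), p0) + 0.5 * np.kron(np.kron(p1, p1), p1)

def h(keep):
    return entropy(partial_trace(omega, dims, keep))

# I(A;B) of the separable marginal sigma_AB = Tr_X[omega_ABX].
mutual_info = h([0]) + h([1]) - h([0, 1])
# I(A;B|X) = H(AX) + H(BX) - H(ABX) - H(X).
cond_mutual_info = h([0, 2]) + h([1, 2]) - entropy(omega) - h([2])
```

For this example the marginal on AB carries one bit of (purely classical) mutual information, whereas conditioning on the register X removes it entirely.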
Thus, for every separable state $\rho_{AB}$, there exists a classical extension $\omega_{ABX}$ such that the conditional mutual information $I(A;B|X)_\omega$ is equal to zero. Using this idea, we could propose a potential measure of entanglement as follows:
$$\frac{1}{2} \inf_{\omega_{ABX}} \left\{ I(A;B|X)_\omega : \mathrm{Tr}_X[\omega_{ABX}] = \rho_{AB} \right\}, \tag{9.1.160}$$
where the optimization is with respect to extensions of $\rho_{AB}$ having a classical extension system $X$ of arbitrary (finite) dimension $|\mathcal{X}|$, so that
$$\omega_{ABX} = \sum_{x \in \mathcal{X}} p(x)\, \rho^x_{AB} \otimes |x\rangle\langle x|_X \tag{9.1.161}$$
for some set $\{\rho^x_{AB}\}_{x \in \mathcal{X}}$ of states and a probability distribution $p(x)$ satisfying $\rho_{AB} = \sum_{x \in \mathcal{X}} p(x)\, \rho^x_{AB}$. The normalization factor of $\frac{1}{2}$ is there for reasons that become apparent later. If we require that every state $\rho^x_{AB}$ in the extension $\omega_{ABX}$ should be pure, then the measure in (9.1.160) reduces to the entanglement of formation (this was actually used in (9.1.55) in the proof of Proposition 9.4).
The quantity proposed in (9.1.160) is non-negative for every state 𝜌 𝐴𝐵 , due
to the non-negativity of mutual information and the fact that conditional mutual
information with a classical conditioning system is equal to a convex combination
of mutual informations. It is already clear that the quantity proposed in (9.1.160) is
equal to zero for every separable state—if a state is separable, then the optimization
in (9.1.160) finds the separable decomposition and the value of the quantity is zero,
as discussed just after (9.1.159). The converse is also true, which follows from the
same proof given for (9.1.61)–(9.1.62). It is actually also possible to show that
the quantity in (9.1.160) is an entanglement measure. However, we do not make
further use of this quantity in this book, because there is an entanglement measure
more suitable for our purposes, as introduced below.
Instead of taking a classical extension of the separable state $\sigma_{AB}$ in (9.1.157), as we did in (9.1.158), we can take a “quantum extension,” in the sense that we could define an extension $\omega_{ABE}$ in which the system $E$ is some finite-dimensional quantum system. Optimizing over all such extensions, we obtain the squashed entanglement:
$$E_{\mathrm{sq}}(A;B)_\rho := \frac{1}{2} \inf_{\omega_{ABE}} \left\{ I(A;B|E)_\omega : \mathrm{Tr}_E[\omega_{ABE}] = \rho_{AB} \right\}, \tag{9.1.162}$$
which is never larger than the quantity proposed in (9.1.160), because every classical extension is a special case of a quantum extension. Note that we optimize with respect to extensions $\omega_{ABE}$ for which the extension system $E$ can have arbitrary finite dimension. It is not known whether the optimization can be restricted to extension systems of a certain fixed dimension. In general, therefore, it is not known whether the infimum in (9.1.162) can be replaced by a minimum.
The squashed entanglement is indeed an entanglement measure, as we show
in Section 9.4. It is also a faithful entanglement measure. That it vanishes for
separable states follows from the arguments presented above. For a proof of the
converse direction, please consult the Bibliographic Notes in Section 9.6. We
establish other important properties of the squashed entanglement in Section 9.4.

9.2 Generalized Divergence of Entanglement

In this section, we investigate properties of the generalized divergence of entangle-
ment, which is a general construction of an entanglement measure. Let us recall
from (9.1.141) above that the generalized divergence of entanglement of a bipartite
state is the generalized divergence between that state and the set of separable states.

Definition 9.15 Generalized Divergence of Entanglement

Let $\boldsymbol{D}$ be a generalized divergence (see Definition 7.15). For every bipartite state $\rho_{AB}$, we define the generalized divergence of entanglement of $\rho_{AB}$ as
$$\boldsymbol{E}(A;B)_\rho := \inf_{\sigma_{AB} \in \mathrm{SEP}(A:B)} \boldsymbol{D}(\rho_{AB}\|\sigma_{AB}). \tag{9.2.1}$$
If the underlying generalized divergence $\boldsymbol{D}$ is continuous in its second argument, then the infimum can be replaced by a minimum.


We are particularly interested throughout the rest of this book in the following generalized divergences of entanglement for every state $\rho_{AB}$:
1. The relative entropy of entanglement of $\rho_{AB}$,
$$E_R(A;B)_\rho := \inf_{\sigma_{AB} \in \mathrm{SEP}(A:B)} D(\rho_{AB}\|\sigma_{AB}), \tag{9.2.2}$$
where $D(\rho_{AB}\|\sigma_{AB})$ is the quantum relative entropy of $\rho_{AB}$ and $\sigma_{AB}$ (Definition 7.1).
2. The $\varepsilon$-hypothesis testing relative entropy of entanglement of $\rho_{AB}$,
$$E_R^{\varepsilon}(A;B)_\rho := \inf_{\sigma_{AB} \in \mathrm{SEP}(A:B)} D_H^{\varepsilon}(\rho_{AB}\|\sigma_{AB}), \tag{9.2.3}$$
where $D_H^{\varepsilon}(\rho_{AB}\|\sigma_{AB})$ is the $\varepsilon$-hypothesis testing relative entropy of $\rho_{AB}$ and $\sigma_{AB}$ (Definition 7.65).
3. The sandwiched Rényi relative entropy of entanglement of $\rho_{AB}$,
$$\widetilde{E}_\alpha(A;B)_\rho := \inf_{\sigma_{AB} \in \mathrm{SEP}(A:B)} \widetilde{D}_\alpha(\rho_{AB}\|\sigma_{AB}), \tag{9.2.4}$$
where $\widetilde{D}_\alpha(\rho_{AB}\|\sigma_{AB})$, $\alpha \in [1/2, 1) \cup (1, \infty)$, is the sandwiched Rényi relative entropy of $\rho_{AB}$ and $\sigma_{AB}$ (Definition 7.28). Note that $\widetilde{E}_\alpha(A;B)_\rho$ is monotonically increasing in $\alpha$ for all $\rho_{AB}$ (see Proposition 7.31). This fact, along with the fact that $\lim_{\alpha \to 1} \widetilde{D}_\alpha = D$ (see Proposition 7.30), leads to
$$\lim_{\alpha \to 1} \widetilde{E}_\alpha(A;B)_\rho = E_R(A;B)_\rho \tag{9.2.5}$$
for every state $\rho_{AB}$. See Appendix 10.A for details of the proof.
4. The max-relative entropy of entanglement of $\rho_{AB}$,
$$E_{\max}(A;B)_\rho := \inf_{\sigma_{AB} \in \mathrm{SEP}(A:B)} D_{\max}(\rho_{AB}\|\sigma_{AB}), \tag{9.2.6}$$
where $D_{\max}(\rho_{AB}\|\sigma_{AB})$ is the max-relative entropy of $\rho_{AB}$ and $\sigma_{AB}$ (Definition 7.58). Using the fact that $\lim_{\alpha \to \infty} \widetilde{D}_\alpha = D_{\max}$ (see Proposition 7.61), we find that
$$E_{\max}(A;B)_\rho = \lim_{\alpha \to \infty} \widetilde{E}_\alpha(A;B)_\rho \tag{9.2.7}$$
for every state $\rho_{AB}$. See Appendix 10.A for details of the proof. As a consequence of this fact, and the fact that $\widetilde{E}_\alpha(A;B)_\rho$ is monotonically increasing in $\alpha$ for all $\rho_{AB}$, we have that
$$E_{\max}(A;B)_\rho \geq \widetilde{E}_\alpha(A;B)_\rho \tag{9.2.8}$$
for all $\alpha \in (1, \infty)$ and every state $\rho_{AB}$.

In addition to the quantities above, we can also define the Petz–Rényi and geometric Rényi relative entropies of entanglement in a similar way, but based on Definitions 7.20 and 7.38, respectively, for the range of $\alpha$ for which data processing holds. These are denoted by $E_\alpha(A;B)_\rho$ and $\widehat{E}_\alpha(A;B)_\rho$, respectively.
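Because each quantity above is an infimum over separable states, every particular separable state $\sigma_{AB}$ yields an upper bound. The sketch below, assuming NumPy, evaluates the quantum relative entropy $D(\rho_{AB}\|\sigma_{AB})$ for a maximally entangled two-qubit state against a few separable candidates and so produces upper bounds on $E_R(A;B)_\rho$ in (9.2.2); it does not perform the optimization itself. The function name `relative_entropy` and the tolerances are illustrative assumptions.

```python
import numpy as np

def relative_entropy(rho, sigma, tol=1e-12):
    """D(rho||sigma) in bits; returns inf if supp(rho) is not inside supp(sigma)."""
    r_vals = np.linalg.eigvalsh(rho)
    r_vals = r_vals[r_vals > tol]
    term_rho = float(np.sum(r_vals * np.log2(r_vals)))       # Tr[rho log rho]

    s_vals, s_vecs = np.linalg.eigh(sigma)
    # weights[j] = <v_j| rho |v_j> for each eigenvector v_j of sigma
    weights = np.real(np.einsum('ij,ik,kj->j', s_vecs.conj(), rho, s_vecs))
    term_sigma = 0.0                                         # Tr[rho log sigma]
    for s, w in zip(s_vals, weights):
        if s > tol:
            term_sigma += w * np.log2(s)
        elif w > 1e-9:
            return float('inf')                              # support violation
    return term_rho - term_sigma

phi_vec = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell = np.outer(phi_vec, phi_vec)

# Three separable candidates for sigma_AB.
dephased = np.diag([0.5, 0.0, 0.0, 0.5])    # classically correlated state
mixed = np.eye(4) / 4                        # maximally mixed state
product = np.diag([1.0, 0.0, 0.0, 0.0])      # pure product state |00><00|

bound_dephased = relative_entropy(bell, dephased)
bound_mixed = relative_entropy(bell, mixed)
bound_product = relative_entropy(bell, product)
best_bound = min(bound_dephased, bound_mixed, bound_product)
```

Among these three candidates, the classically correlated state gives the smallest value, so this sketch certifies only that the relative entropy of entanglement of the example state is at most that value.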

Proposition 9.16 Properties of Generalized Divergence of Entanglement

Let 𝑫 be a generalized divergence that is continuous in its second argument,
and consider the generalized divergence of entanglement 𝑬 ( 𝐴; 𝐵) 𝜌 of a state
𝜌 𝐴𝐵 , as defined in (9.2.1).
1. Separable monotonicity: For every separable channel S 𝐴𝐵→𝐴′ 𝐵′ , the gener-
alized divergence of entanglement is monotone non-increasing:

𝑬 ( 𝐴; 𝐵) 𝜌 ≥ 𝑬 ( 𝐴′; 𝐵′)𝜔 , (9.2.9)

where 𝜔 𝐴′ 𝐵′ = S 𝐴𝐵→𝐴′ 𝐵′ (𝜌 𝐴𝐵 ). Since every LOCC channel is a separable

channel, the generalized divergence of entanglement is also monotone
non-increasing under LOCC channels. It is therefore an entanglement
measure as per Definition 9.1.
2. Faithfulness: If 𝑫 satisfies 𝑫 (𝜌∥𝜎) ≥ 0 and 𝑫 (𝜌∥𝜎) = 0 if and only
if 𝜌 = 𝜎 (for all states 𝜌 and 𝜎), then 𝑬 ( 𝐴; 𝐵) 𝜌 = 0 if and only if
𝜌 𝐴𝐵 ∈ SEP( 𝐴 : 𝐵). The generalized divergence of entanglement is then a
faithful entanglement measure.
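3. Subadditivity: If $\boldsymbol{D}$ is additive for product positive semi-definite operators, i.e., $\boldsymbol{D}(\rho \otimes \omega\|\sigma \otimes \tau) = \boldsymbol{D}(\rho\|\sigma) + \boldsymbol{D}(\omega\|\tau)$, then for every two quantum states $\rho_{A_1B_1}$ and $\omega_{A_2B_2}$, the generalized divergence of entanglement is subadditive:
$$\boldsymbol{E}(A_1A_2; B_1B_2)_{\rho \otimes \omega} \leq \boldsymbol{E}(A_1; B_1)_\rho + \boldsymbol{E}(A_2; B_2)_\omega. \tag{9.2.10}$$
4. Convexity: If $\boldsymbol{D}$ is jointly convex, meaning that
$$\boldsymbol{D}\!\left( \sum_{x \in \mathcal{X}} p(x)\, \rho^x_{AB} \,\Big\|\, \sum_{x \in \mathcal{X}} p(x)\, \sigma^x_{AB} \right) \leq \sum_{x \in \mathcal{X}} p(x)\, \boldsymbol{D}(\rho^x_{AB}\|\sigma^x_{AB}), \tag{9.2.11}$$
for every finite alphabet $\mathcal{X}$, probability distribution $p : \mathcal{X} \to [0,1]$, and sets $\{\rho^x_{AB}\}_{x \in \mathcal{X}}$, $\{\sigma^x_{AB}\}_{x \in \mathcal{X}}$ of states, then the generalized divergence of entanglement is convex:
$$\boldsymbol{E}(A;B)_\rho \leq \sum_{x \in \mathcal{X}} p(x)\, \boldsymbol{E}(A;B)_{\rho^x}, \tag{9.2.12}$$
where $\rho_{AB} = \sum_{x \in \mathcal{X}} p(x)\, \rho^x_{AB}$.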
Properties 1., 2., 3. are satisfied when the generalized divergence is the quantum
relative entropy, the Petz–, sandwiched, and geometric Rényi relative entropy,
and the max-relative entropy. Property 4. is satisfied when the generalized
divergence is the quantum relative entropy and the Petz–, sandwiched, and
geometric Rényi relative entropies for the range of 𝛼 < 1 for which data
processing holds.

Proof:
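1. For $\omega_{A'B'} = \mathcal{S}_{AB \to A'B'}(\rho_{AB})$, we have by definition,
\begin{align}
\boldsymbol{E}(A';B')_\omega &= \inf_{\tau_{A'B'} \in \mathrm{SEP}(A':B')} \boldsymbol{D}(\omega_{A'B'}\|\tau_{A'B'}) \tag{9.2.13} \\
&= \inf_{\tau_{A'B'} \in \mathrm{SEP}(A':B')} \boldsymbol{D}(\mathcal{S}_{AB \to A'B'}(\rho_{AB})\|\tau_{A'B'}). \tag{9.2.14}
\end{align}
Now, recall that every separable channel $\mathcal{S}_{AB \to A'B'}$ takes $\sigma_{AB} \in \mathrm{SEP}(A:B)$ to a state in $\mathrm{SEP}(A':B')$, as shown already in (4.6.64)–(4.6.65). Therefore, restricting the optimization in (9.2.14) to states of the form $\mathcal{S}_{AB \to A'B'}(\sigma_{AB})$ with $\sigma_{AB} \in \mathrm{SEP}(A:B)$ leads to
\begin{align}
\boldsymbol{E}(A';B')_\omega &\leq \inf_{\sigma_{AB} \in \mathrm{SEP}(A:B)} \boldsymbol{D}(\mathcal{S}_{AB \to A'B'}(\rho_{AB})\|\mathcal{S}_{AB \to A'B'}(\sigma_{AB})) \tag{9.2.15} \\
&\leq \inf_{\sigma_{AB} \in \mathrm{SEP}(A:B)} \boldsymbol{D}(\rho_{AB}\|\sigma_{AB}) \tag{9.2.16} \\
&= \boldsymbol{E}(A;B)_\rho, \tag{9.2.17}
\end{align}
as required, where we used the data-processing inequality for the generalized divergence to obtain the second inequality.
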

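2. We have
$$\boldsymbol{E}(A;B)_\rho = \inf_{\sigma_{AB} \in \mathrm{SEP}(A:B)} \boldsymbol{D}(\rho_{AB}\|\sigma_{AB}). \tag{9.2.18}$$
If $\rho_{AB} \in \mathrm{SEP}(A:B)$, then the state $\rho_{AB}$ itself achieves the minimum in (9.2.18) because $\boldsymbol{D}(\rho_{AB}\|\rho_{AB}) = 0$. We thus have $\boldsymbol{E}(A;B)_\rho = 0$. On the other hand, if $\boldsymbol{E}(A;B)_\rho = 0$, then, because the infimum is achieved when $\boldsymbol{D}$ is continuous in its second argument, there exists a separable state $\sigma^*_{AB}$ such that $\boldsymbol{D}(\rho_{AB}\|\sigma^*_{AB}) = 0$, which by assumption implies that $\rho_{AB} = \sigma^*_{AB}$, i.e., that $\rho_{AB}$ is separable.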
3. By definition, the optimization in the definition of 𝑬 ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 ) 𝜌⊗𝜔 is
over the set SEP( 𝐴1 𝐴2 : 𝐵1 𝐵2 ). It is straightforward to see that this set
contains states of the form 𝜉 𝐴1 𝐵1 ⊗ 𝜏𝐴2 𝐵2 , where 𝜉 𝐴1 𝐵1 ∈ SEP( 𝐴1 : 𝐵1 ) and
𝜏𝐴2 𝐵2 ∈ SEP( 𝐴2 : 𝐵2 ). By restricting the optimization to such states, and by
using additivity of the generalized divergence 𝑫, we obtain
𝑬 ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 ) 𝜌⊗𝜔 ≤ 𝑫 (𝜌 𝐴1 𝐵1 ⊗ 𝜔 𝐴2 𝐵2 ∥𝜉 𝐴1 𝐵1 ⊗ 𝜏𝐴2 𝐵2 ) (9.2.19)
= 𝑫 (𝜌 𝐴1 𝐵1 ∥𝜉 𝐴1 𝐵1 ) + 𝑫 (𝜔 𝐴2 𝐵2 ∥𝜏𝐴2 𝐵2 ). (9.2.20)
Since 𝜉 𝐴1 𝐵1 ∈ SEP( 𝐴1 : 𝐵1 ) and 𝜏𝐴2 𝐵2 ∈ SEP( 𝐴2 : 𝐵2 ) are arbitrary, the
inequality in (9.2.10) follows.
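4. We have
\begin{equation}
\boldsymbol{E}(A;B)_{\rho} = \inf_{\sigma_{AB}\in\operatorname{SEP}(A:B)} \boldsymbol{D}\!\left(\sum_{x\in\mathcal{X}} p(x)\rho^x_{AB}\,\middle\Vert\,\sigma_{AB}\right). \tag{9.2.21}
\end{equation}
Let us restrict the optimization over all separable states to an optimization over sets $\{\sigma^x_{AB}\}_{x\in\mathcal{X}}$ of separable states indexed by the alphabet $\mathcal{X}$. Then, $\sum_{x\in\mathcal{X}} p(x)\sigma^x_{AB}$ is a separable state because the set of separable states is convex. Therefore, using the joint convexity of $\boldsymbol{D}$, we obtain
\begin{align}
\boldsymbol{E}(A;B)_{\rho} &\leq \inf_{\{\sigma^x_{AB}\}_x\subset\operatorname{SEP}(A:B)} \boldsymbol{D}\!\left(\sum_{x\in\mathcal{X}} p(x)\rho^x_{AB}\,\middle\Vert\,\sum_{x\in\mathcal{X}} p(x)\sigma^x_{AB}\right) \tag{9.2.22}\\
&\leq \inf_{\{\sigma^x_{AB}\}_x\subset\operatorname{SEP}(A:B)} \sum_{x\in\mathcal{X}} p(x)\,\boldsymbol{D}(\rho^x_{AB}\Vert\sigma^x_{AB}) \tag{9.2.23}\\
&\leq \sum_{x\in\mathcal{X}} p(x) \inf_{\sigma^x_{AB}\in\operatorname{SEP}(A:B)} \boldsymbol{D}(\rho^x_{AB}\Vert\sigma^x_{AB}) \tag{9.2.24}\\
&= \sum_{x\in\mathcal{X}} p(x)\,\boldsymbol{E}(A;B)_{\rho^x}, \tag{9.2.25}
\end{align}
as required. ■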

We now delve a bit more into particular examples of the generalized divergence
of entanglement, which are based on the relative entropy and the Petz–, sandwiched,
and geometric Rényi relative entropies.

Proposition 9.17
The relative entropy of entanglement is invariant under classical communication;
i.e., (9.1.6) holds with 𝐸 set to 𝐸 𝑅 .

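Proof: Let $\rho_{XAB}$ be a classical–quantum state of the form in (9.1.5). Let $\{\sigma^x_{AB}\}_{x\in\mathcal{X}}$ be an arbitrary set of separable states, and set
\begin{equation}
\sigma_{XAB} := \sum_{x\in\mathcal{X}} p(x)|x\rangle\!\langle x|_X\otimes\sigma^x_{AB}. \tag{9.2.26}
\end{equation}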

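Consider that
\begin{align}
E_R(XA;B)_{\rho} &= \inf_{\sigma_{XAB}\in\operatorname{SEP}(XA:B)} D(\rho_{XAB}\Vert\sigma_{XAB}) \tag{9.2.27}\\
&\leq D(\rho_{XAB}\Vert\sigma_{XAB}) \tag{9.2.28}\\
&= \sum_{x\in\mathcal{X}} p(x)\,D(\rho^x_{AB}\Vert\sigma^x_{AB}), \tag{9.2.29}
\end{align}
where the last equality follows from the direct-sum property of relative entropy in (7.2.27). Since the inequality holds for every set $\{\sigma^x_{AB}\}_{x\in\mathcal{X}}$ of separable states, we conclude that
\begin{equation}
E_R(XA;B)_{\rho} \leq \sum_{x\in\mathcal{X}} p(x)\,E_R(A;B)_{\rho^x}. \tag{9.2.30}
\end{equation}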

Now suppose that $\sigma_{XAB}$ is an arbitrary separable state of the systems $XA|B$ (here we assume that the system $X$ is not necessarily classical). After performing the completely dephasing channel $\Delta_X(\cdot) := \sum_{x\in\mathcal{X}} |x\rangle\!\langle x|_X(\cdot)|x\rangle\!\langle x|_X$ on the $X$ system of $\sigma_{XAB}$, the resulting state is a classical–quantum state of the following form:
\begin{equation}
\Delta_X(\sigma_{XAB}) = \sum_{x\in\mathcal{X}} q(x)|x\rangle\!\langle x|_X\otimes\sigma^x_{AB}, \tag{9.2.31}
\end{equation}
where $q(x)$ is a probability distribution and $\{\sigma^x_{AB}\}_{x\in\mathcal{X}}$ is a set of separable states.
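Then consider that
\begin{align}
D(\rho_{XAB}\Vert\sigma_{XAB}) &\geq D(\Delta_X(\rho_{XAB})\Vert\Delta_X(\sigma_{XAB})) \tag{9.2.32}\\
&= \sum_{x\in\mathcal{X}} p(x)\,D(\rho^x_{AB}\Vert\sigma^x_{AB}) + D(p\Vert q) \tag{9.2.33}\\
&\geq \sum_{x\in\mathcal{X}} p(x)\,D(\rho^x_{AB}\Vert\sigma^x_{AB}) \tag{9.2.34}\\
&\geq \sum_{x\in\mathcal{X}} p(x)\,E_R(A;B)_{\rho^x}. \tag{9.2.35}
\end{align}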
The first inequality follows from the data-processing inequality for relative entropy
(Theorem 7.4). The equality follows from the direct-sum property of relative
entropy in (7.2.27). The second inequality follows from the non-negativity of the
classical relative entropy 𝐷 ( 𝑝∥𝑞). The final inequality follows from the definition
of the relative entropy of entanglement and the fact that $\sigma^x_{AB}$ is separable. Since

the chain of inequalities holds for every separable state $\sigma_{XAB}$, we conclude that
\begin{equation}
E_R(XA;B)_{\rho} \geq \sum_{x\in\mathcal{X}} p(x)\,E_R(A;B)_{\rho^x}. \tag{9.2.36}
\end{equation}
Putting together (9.2.30) and (9.2.36), and noting that the same argument applies
when exchanging the roles of Alice and Bob, we conclude the statement of the
proposition. ■
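The direct-sum property used in (9.2.29) and (9.2.33) can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated full-rank states (the helper names `random_state`, `rel_entropy`, and `direct_sum` are illustrative and not from the text). The states $\sigma^x$ here are not constrained to be separable, since the identity itself does not depend on separability.

```python
import numpy as np

def random_state(d, rng):
    """Random full-rank density matrix of dimension d (Ginibre construction)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def log2m(M):
    """Base-2 matrix logarithm of a positive definite Hermitian matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.log2(w)) @ V.conj().T

def rel_entropy(rho, sigma):
    """Quantum relative entropy D(rho||sigma) in bits (full-rank inputs assumed)."""
    return np.trace(rho @ (log2m(rho) - log2m(sigma))).real

def direct_sum(blocks):
    """Block-diagonal matrix, modelling sum_x |x><x| (x) M_x."""
    d = sum(b.shape[0] for b in blocks)
    out = np.zeros((d, d), dtype=complex)
    i = 0
    for b in blocks:
        n = b.shape[0]
        out[i:i + n, i:i + n] = b
        i += n
    return out

rng = np.random.default_rng(0)
p = np.array([0.3, 0.7])
q = np.array([0.6, 0.4])
rhos = [random_state(4, rng) for _ in p]
sigmas = [random_state(4, rng) for _ in q]

rho_XAB = direct_sum([px * r for px, r in zip(p, rhos)])
sigma_XAB = direct_sum([qx * s for qx, s in zip(q, sigmas)])

lhs = rel_entropy(rho_XAB, sigma_XAB)
classical = float(np.sum(p * np.log2(p / q)))  # D(p||q)
rhs = sum(px * rel_entropy(r, s) for px, r, s in zip(p, rhos, sigmas)) + classical
```

Here `lhs` and `rhs` are the two sides of (9.2.33), and `classical` is the non-negative term dropped in (9.2.34).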

As an immediate corollary of Proposition 9.17, Lemma 9.2, and Property 1. of

Proposition 9.16, we conclude that the relative entropy of entanglement is a selective
LOCC monotone. However, we can conclude something stronger, which is what
we prove in Proposition 9.19 below after defining selective separable monotonicity.

Definition 9.18 Selective Separable Monotonicity

As a generalization of selective LOCC monotonicity defined in (9.1.14) and in
the spirit of the selective PPT monotonicity from Definition 9.9, we say that a
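function $E : \mathrm{D}(\mathcal{H}_{AB})\to\mathbb{R}$ is a selective separable monotone if it satisfies
\begin{equation}
E(\rho_{AB}) \geq \sum_{x\in\mathcal{X}:\,p(x)>0} p(x)\,E(\omega^x_{A'B'}), \tag{9.2.37}
\end{equation}
for every bipartite state $\rho_{AB}$ and separable instrument $\{\mathcal{S}^x_{AB\to A'B'}\}_{x\in\mathcal{X}}$, with
\begin{align}
p(x) &:= \operatorname{Tr}[\mathcal{S}^x_{AB\to A'B'}(\rho_{AB})], \tag{9.2.38}\\
\omega^x_{A'B'} &:= \frac{1}{p(x)}\,\mathcal{S}^x_{AB\to A'B'}(\rho_{AB}). \tag{9.2.39}
\end{align}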


A separable instrument is such that every map $\mathcal{S}^x_{AB\to A'B'}$ is completely positive and separable (with Kraus operators of the form in (4.6.63)), and the sum map $\sum_{x\in\mathcal{X}}\mathcal{S}^x_{AB\to A'B'}$ is trace preserving.

It follows that 𝐸 is a separable monotone if it is a selective separable monotone,

because the former is a special case of the latter in which the alphabet X has
only one letter.
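As a concrete illustration of (9.2.38)–(9.2.39), the sketch below builds one simple separable instrument, namely a computational-basis measurement on Alice's system with Kraus operators $|x\rangle\!\langle x|_A\otimes\mathbb{1}_B$, and applies it to a maximally entangled two-qubit state. This particular instrument and the function names are assumptions chosen for illustration; NumPy is assumed.

```python
import numpy as np

def local_measurement_instrument(dA, dB):
    """Kraus operators |x><x|_A (x) 1_B, one per outcome x (product form, hence separable)."""
    kraus = []
    for x in range(dA):
        proj = np.zeros((dA, dA))
        proj[x, x] = 1.0
        kraus.append(np.kron(proj, np.eye(dB)))
    return kraus

def apply_instrument(rho, kraus):
    """Return the pairs (p(x), omega^x) of (9.2.38)-(9.2.39), skipping outcomes with p(x) = 0."""
    out = []
    for K in kraus:
        unnormalized = K @ rho @ K.conj().T
        px = np.trace(unnormalized).real
        if px > 1e-12:
            out.append((px, unnormalized / px))
    return out

# Maximally entangled two-qubit state as the input rho_AB.
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho = np.outer(phi, phi.conj())

outcomes = apply_instrument(rho, local_measurement_instrument(2, 2))
probs = [px for px, _ in outcomes]
```

Each outcome occurs with probability 1/2 and leaves a product state, so every post-measurement term on the right-hand side of (9.2.37) vanishes for any $E$ that is zero on separable states.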

Proposition 9.19 Selective Separable Monotonicity of Relative Entropies of Entanglement
The relative entropy of entanglement is a selective separable monotone; i.e.,
(9.2.37) holds with 𝐸 set to 𝐸 𝑅 . The Petz–, sandwiched, and geometric Rényi
relative entropies of entanglement are selective separable monotones for the
range 𝛼 > 1 for which data processing holds.

Proof: Let us begin with the relative entropy of entanglement. Let 𝜌 𝐴𝐵 be an

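arbitrary bipartite state, let $\{\mathcal{S}^x_{AB\to A'B'}\}_{x\in\mathcal{X}}$ be a separable instrument, and let $\mathcal{S}_{AB\to XA'B'}$ denote the following quantum channel:
\begin{equation}
\mathcal{S}_{AB\to XA'B'}(\omega_{AB}) := \sum_{x\in\mathcal{X}} |x\rangle\!\langle x|_X\otimes\mathcal{S}^x_{AB\to A'B'}(\omega_{AB}). \tag{9.2.40}
\end{equation}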

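Let $\tau_{XA'B'} := \mathcal{S}_{AB\to XA'B'}(\rho_{AB})$, and note that
\begin{equation}
\tau_{XA'B'} = \sum_{x\in\mathcal{X}} p(x)|x\rangle\!\langle x|_X\otimes\tau^x_{A'B'}, \tag{9.2.41}
\end{equation}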

for some probability distribution 𝑝(𝑥) and set {𝜏𝐴𝑥 ′ 𝐵′ }𝑥∈X of states. Then consider
that
\begin{align}
E_R(A;B)_{\rho} &\geq E_R(XA';B')_{\tau} \tag{9.2.42}\\
&= \sum_{x\in\mathcal{X}} p(x)\,E_R(A';B')_{\tau^x}. \tag{9.2.43}
\end{align}
The inequality follows because $E_R$ is monotone under separable channels (Property 1. of Proposition 9.16), and the equality follows from Proposition 9.17.
Let us now consider proving the statement for the Petz–Rényi relative entropy
of entanglement for $\alpha\in(1,2]$. Consider the same channel $\mathcal{S}_{AB\to XA'B'}$ and state $\tau_{XA'B'}$ defined above. Let $\sigma_{AB}$ be an arbitrary separable state, and consider that
\begin{equation}
\mathcal{S}_{AB\to XA'B'}(\sigma_{AB}) = \sum_{x\in\mathcal{X}} q(x)|x\rangle\!\langle x|_X\otimes\sigma^x_{A'B'}, \tag{9.2.44}
\end{equation}

for some probability distribution 𝑞(𝑥) and set {𝜎𝐴𝑥 ′ 𝐵′ }𝑥∈X of separable states. Then
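we find that
\begin{align}
Q_\alpha(\rho_{AB}\Vert\sigma_{AB}) &\geq Q_\alpha(\mathcal{S}_{AB\to XA'B'}(\rho_{AB})\Vert\mathcal{S}_{AB\to XA'B'}(\sigma_{AB})) \tag{9.2.45}\\
&= \sum_{x\in\mathcal{X}:\,p(x)>0} p(x)^{\alpha}q(x)^{1-\alpha}\,Q_\alpha(\tau^x_{A'B'}\Vert\sigma^x_{A'B'}) \tag{9.2.46}\\
&= \sum_{x\in\mathcal{X}:\,p(x)>0} p(x)\left(\frac{p(x)}{q(x)}\right)^{\alpha-1} Q_\alpha(\tau^x_{A'B'}\Vert\sigma^x_{A'B'}). \tag{9.2.47}
\end{align}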

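The first inequality follows from the data-processing inequality for the Petz–Rényi relative quasi-entropy (Theorem 7.24), and the first equality follows from its direct-sum property (see (7.4.46)). Now applying the monotonicity and concavity of the function $(\cdot)\mapsto\frac{1}{\alpha-1}\log_2(\cdot)$ for $\alpha\in(1,2]$, we find that
\begin{align}
D_\alpha(\rho_{AB}\Vert\sigma_{AB}) &= \frac{1}{\alpha-1}\log_2 Q_\alpha(\rho_{AB}\Vert\sigma_{AB}) \tag{9.2.48}\\
&\geq \frac{1}{\alpha-1}\log_2\!\left[\sum_{x\in\mathcal{X}:\,p(x)>0} p(x)\left(\frac{p(x)}{q(x)}\right)^{\alpha-1} Q_\alpha(\tau^x_{A'B'}\Vert\sigma^x_{A'B'})\right] \tag{9.2.49}\\
&\geq \sum_{x\in\mathcal{X}:\,p(x)>0} p(x)\left[\frac{1}{\alpha-1}\log_2\!\left(\left(\frac{p(x)}{q(x)}\right)^{\alpha-1} Q_\alpha(\tau^x_{A'B'}\Vert\sigma^x_{A'B'})\right)\right] \tag{9.2.50}\\
&= \sum_{x\in\mathcal{X}:\,p(x)>0} p(x)\left[\log_2\frac{p(x)}{q(x)} + D_\alpha(\tau^x_{A'B'}\Vert\sigma^x_{A'B'})\right] \tag{9.2.51}\\
&= D(p\Vert q) + \sum_{x\in\mathcal{X}:\,p(x)>0} p(x)\,D_\alpha(\tau^x_{A'B'}\Vert\sigma^x_{A'B'}) \tag{9.2.52}\\
&\geq \sum_{x\in\mathcal{X}:\,p(x)>0} p(x)\,E_\alpha(A';B')_{\tau^x}. \tag{9.2.53}
\end{align}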

The final two equalities follow by direct evaluation and applying definitions. The
final inequality follows because 𝐷 ( 𝑝∥𝑞) ≥ 0 for probability distributions 𝑝 and
𝑞, and it also follows from the definition of the Petz–Rényi relative entropy of
entanglement and the fact that the state 𝜎𝐴𝑥 ′ 𝐵′ is separable. Since the inequality
holds for every separable state $\sigma_{AB}$, we conclude the desired inequality:
\begin{equation}
E_\alpha(A;B)_{\rho} \geq \sum_{x\in\mathcal{X}:\,p(x)>0} p(x)\,E_\alpha(A';B')_{\tau^x}. \tag{9.2.54}
\end{equation}

By applying the same method of proof for the sandwiched and geometric Rényi
relative entropies for the range of 𝛼 > 1 for which data processing holds, along with
their data processing and direct-sum properties, we conclude the same inequality
for the sandwiched and geometric Rényi relative entropies of entanglement. ■
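Two steps of the proof above lend themselves to a quick numerical sanity check: the direct-sum identity (9.2.46) for the Petz–Rényi quasi-entropy $Q_\alpha(\rho\Vert\sigma) = \operatorname{Tr}[\rho^{\alpha}\sigma^{1-\alpha}]$, and the concavity (Jensen) step from (9.2.49) to (9.2.50). The sketch below assumes NumPy, random full-rank inputs, and $\alpha = 1.5$; the helper names are illustrative, and the $\sigma^x$ are not required to be separable for these two identities.

```python
import numpy as np

def random_state(d, rng):
    """Random full-rank density matrix of dimension d."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def mpow(M, t):
    """Real power of a positive definite Hermitian matrix."""
    w, V = np.linalg.eigh(M)
    return (V * w**t) @ V.conj().T

def Q(rho, sigma, alpha):
    """Petz-Renyi relative quasi-entropy Tr[rho^alpha sigma^(1-alpha)]."""
    return np.trace(mpow(rho, alpha) @ mpow(sigma, 1 - alpha)).real

def direct_sum(blocks):
    """Block-diagonal matrix, modelling sum_x |x><x| (x) M_x."""
    d = sum(b.shape[0] for b in blocks)
    out = np.zeros((d, d), dtype=complex)
    i = 0
    for b in blocks:
        n = b.shape[0]
        out[i:i + n, i:i + n] = b
        i += n
    return out

alpha = 1.5
rng = np.random.default_rng(7)
p = np.array([0.25, 0.75])
q = np.array([0.5, 0.5])
taus = [random_state(4, rng) for _ in p]
sigmas = [random_state(4, rng) for _ in q]

# Direct-sum property, as in (9.2.46).
lhs = Q(direct_sum([px * t for px, t in zip(p, taus)]),
        direct_sum([qx * s for qx, s in zip(q, sigmas)]), alpha)
terms = np.array([Q(t, s, alpha) for t, s in zip(taus, sigmas)])
rhs = float(np.sum(p**alpha * q**(1 - alpha) * terms))

# Jensen step from (9.2.49) to (9.2.50): log of an average vs. average of logs.
a = (p / q)**(alpha - 1) * terms
jensen_upper = float(np.log2(np.sum(p * a)) / (alpha - 1))
jensen_lower = float(np.sum(p * np.log2(a)) / (alpha - 1))
```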

The following additional facts are known specifically about the relative entropy
of entanglement and the Petz–, sandwiched, and geometric Rényi relative entropies
of entanglement.

Proposition 9.20

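1. For every bipartite state $\rho_{AB}$,
\begin{align}
E_R(A;B)_{\rho} &\geq \max\{I(A\rangle B)_{\rho},\, I(B\rangle A)_{\rho}\}, \tag{9.2.55}\\
E_\alpha(A;B)_{\rho} &\geq \max\{I_\alpha(A\rangle B)_{\rho},\, I_\alpha(B\rangle A)_{\rho}\}, \tag{9.2.56}\\
\widetilde{E}_\alpha(A;B)_{\rho} &\geq \max\{\widetilde{I}_\alpha(A\rangle B)_{\rho},\, \widetilde{I}_\alpha(B\rangle A)_{\rho}\}, \tag{9.2.57}\\
\widehat{E}_\alpha(A;B)_{\rho} &\geq \max\{\widehat{I}_\alpha(A\rangle B)_{\rho},\, \widehat{I}_\alpha(B\rangle A)_{\rho}\}, \tag{9.2.58}
\end{align}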

where the last three inequalities hold for the range of 𝛼 for which data
processing holds.
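2. For every pure bipartite state $\psi_{AB}$,
\begin{align}
E_R(A;B)_{\psi} &= H(A)_{\psi}, \tag{9.2.59}\\
E_\alpha(A;B)_{\psi} &= H_{\frac{1}{\alpha}}(A)_{\psi}, \tag{9.2.60}\\
\widetilde{E}_\alpha(A;B)_{\psi} &= H_{\frac{\alpha}{2\alpha-1}}(A)_{\psi}, \tag{9.2.61}\\
\widehat{E}_\alpha(A;B)_{\psi} &= H_{\frac{1}{2}}(A)_{\psi}, \tag{9.2.62}
\end{align}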

where the last three equalities hold for the range of 𝛼 for which data
processing holds.


Remark: Observe that for pure states, the relative entropy of entanglement is equal to the
entanglement of formation (see (9.1.40)).

Proof:
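1. Let $\sigma_{AB}$ be an arbitrary separable state, which can be written as
\begin{equation}
\sigma_{AB} = \sum_{x\in\mathcal{X}} p(x)\,\omega^x_A\otimes\tau^x_B, \tag{9.2.63}
\end{equation}
where $\mathcal{X}$ is some finite alphabet, $p:\mathcal{X}\to[0,1]$ is a probability distribution, and $\{\omega^x_A\}_{x\in\mathcal{X}}$, $\{\tau^x_B\}_{x\in\mathcal{X}}$ are sets of states. Since every $\omega^x_A$ is a state (in particular, since all of its eigenvalues are bounded from above by one), the inequality $\omega^x_A\leq\mathbb{1}_A$ holds, which implies that $\mathbb{1}_A\otimes\tau^x_B\geq\omega^x_A\otimes\tau^x_B$ for all $x\in\mathcal{X}$. With $\sigma_B = \operatorname{Tr}_A[\sigma_{AB}] = \sum_{x\in\mathcal{X}} p(x)\tau^x_B$, this implies that
\begin{equation}
\mathbb{1}_A\otimes\sigma_B = \sum_{x\in\mathcal{X}} p(x)\,\mathbb{1}_A\otimes\tau^x_B \geq \sum_{x\in\mathcal{X}} p(x)\,\omega^x_A\otimes\tau^x_B = \sigma_{AB}. \tag{9.2.64}
\end{equation}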

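Therefore, using property 2.(d) in Proposition 7.3, we have that
\begin{align}
D(\rho_{AB}\Vert\sigma_{AB}) &\geq D(\rho_{AB}\Vert\mathbb{1}_A\otimes\sigma_B) \tag{9.2.65}\\
&\geq \inf_{\sigma_B} D(\rho_{AB}\Vert\mathbb{1}_A\otimes\sigma_B) \tag{9.2.66}\\
&= I(A\rangle B)_{\rho} \tag{9.2.67}
\end{align}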

for all separable states, where the optimization is with respect to every state
𝜎𝐵 on the right-hand side, and where we have used the expression in (7.2.92)
for coherent information. We thus have
\begin{equation}
E_R(A;B)_{\rho} = \inf_{\sigma_{AB}\in\operatorname{SEP}(A:B)} D(\rho_{AB}\Vert\sigma_{AB}) \geq I(A\rangle B)_{\rho}. \tag{9.2.68}
\end{equation}

By the same argument, but flipping the roles of Alice and Bob, we conclude
that
𝐸 𝑅 ( 𝐴; 𝐵) 𝜌 ≥ 𝐼 (𝐵⟩ 𝐴) 𝜌 . (9.2.69)
Combining this inequality with the one in (9.2.68) leads to the desired result.
The same proof, but using (9.2.64) and Property 4. of Propositions 7.26, 7.35,
and 7.46, leads to the inequalities in (9.2.56)–(9.2.58).


2. Let
\begin{equation}
|\psi\rangle_{AB} = \sum_{k=1}^{r}\sqrt{\lambda_k}\,|e_k\rangle_A\otimes|f_k\rangle_B \tag{9.2.70}
\end{equation}
be the Schmidt decomposition of |𝜓⟩ 𝐴𝐵 , where 𝑟 is the Schmidt rank, 𝜆 𝑘 > 0
for all 1 ≤ 𝑘 ≤ 𝑟, and {|𝑒 𝑘 ⟩ 𝐴 }𝑟𝑘=1 , {| 𝑓 𝑘 ⟩𝐵 }𝑟𝑘=1 are orthonormal sets of vectors.
Since the entropy 𝐻 ( 𝐴𝐵)𝜓 vanishes for all pure states, we immediately have

𝐼 ( 𝐴⟩𝐵)𝜓 = 𝐻 (𝐵)𝜓 = 𝐻 ( 𝐴)𝜓 = 𝐼 (𝐵⟩ 𝐴)𝜓 , (9.2.71)

where the equality 𝐻 (𝐵)𝜓 = 𝐻 ( 𝐴)𝜓 follows from the Schmidt decomposition
in (9.2.70), which tells us that the reduced states 𝜓 𝐴 and 𝜓 𝐵 have the same
non-zero eigenvalues. Based on the fact that $E_R(A;B)_{\psi}\geq I(A\rangle B)_{\psi}$, which we
just proved above, we thus have the lower bound

𝐸 𝑅 ( 𝐴; 𝐵)𝜓 ≥ 𝐻 ( 𝐴)𝜓 . (9.2.72)

The same reasoning, but using the lower bounds in (9.2.56)–(9.2.58), as well
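as (7.11.139)–(7.11.141), leads to the inequalities
\begin{align}
E_\alpha(A;B)_{\psi} &\geq H_{\frac{1}{\alpha}}(A)_{\psi}, \tag{9.2.73}\\
\widetilde{E}_\alpha(A;B)_{\psi} &\geq H_{\frac{\alpha}{2\alpha-1}}(A)_{\psi}, \tag{9.2.74}\\
\widehat{E}_\alpha(A;B)_{\psi} &\geq H_{\frac{1}{2}}(A)_{\psi}. \tag{9.2.75}
\end{align}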

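For the reverse inequality, let
\begin{equation}
\Pi := \sum_{k=1}^{r} |e_k\rangle\!\langle e_k|_A\otimes|f_k\rangle\!\langle f_k|_B \tag{9.2.76}
\end{equation}
be the projection onto the $r$-dimensional subspace of $\mathcal{H}_{AB}$ spanned by $\{|e_k\rangle_A\otimes|f_k\rangle_B\}_{k=1}^{r}$, which contains the support of $\psi_{AB}$. Also, define a channel $\mathcal{N}$ as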

N(𝑋 𝐴𝐵 ) B Π𝑋 𝐴𝐵 Π + ( 1 𝐴𝐵 − Π) 𝑋 𝐴𝐵 ( 1 𝐴𝐵 − Π). (9.2.77)

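Note that, by definition, $\mathcal{N}(\psi_{AB}) = \psi_{AB}$. Also, let $\sigma_B$ be a state, and, for $p(k) := \langle f_k|\sigma_B|f_k\rangle_B$, set
\begin{equation}
\overline{\sigma}_{AB} := \sum_{k=1}^{r} p(k)\,|e_k\rangle\!\langle e_k|_A\otimes|f_k\rangle\!\langle f_k|_B, \tag{9.2.78}
\end{equation}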

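which is a separable state. It is straightforward to show that
\begin{equation}
\Pi(\mathbb{1}_A\otimes\sigma_B)\Pi = \overline{\sigma}_{AB}. \tag{9.2.79}
\end{equation}
Also, we have that
\begin{align}
D(\mathcal{N}(\psi_{AB})\Vert\mathcal{N}(\mathbb{1}_A\otimes\sigma_B)) &= D(\psi_{AB}\Vert\overline{\sigma}_{AB} + (\mathbb{1}_{AB}-\Pi)(\mathbb{1}_A\otimes\sigma_B)(\mathbb{1}_{AB}-\Pi)) \tag{9.2.80}\\
&= D(\psi_{AB}\Vert\overline{\sigma}_{AB}), \tag{9.2.81}
\end{align}
because the operator $(\mathbb{1}_{AB}-\Pi)(\mathbb{1}_A\otimes\sigma_B)(\mathbb{1}_{AB}-\Pi)$ is supported on the space orthogonal to the support of $\psi_{AB}$. Therefore, by the data-processing inequality for quantum relative entropy, we obtain
\begin{align}
E_R(A;B)_{\psi} &= \inf_{\sigma_{AB}\in\operatorname{SEP}(A:B)} D(\psi_{AB}\Vert\sigma_{AB}) \tag{9.2.82}\\
&\leq D(\psi_{AB}\Vert\overline{\sigma}_{AB}) \tag{9.2.83}\\
&= D(\mathcal{N}(\psi_{AB})\Vert\mathcal{N}(\mathbb{1}_A\otimes\sigma_B)) \tag{9.2.84}\\
&\leq D(\psi_{AB}\Vert\mathbb{1}_A\otimes\sigma_B). \tag{9.2.85}
\end{align}
Since the inequality holds for every state $\sigma_B$, we conclude that
\begin{equation}
E_R(A;B)_{\psi} \leq \inf_{\sigma_B} D(\psi_{AB}\Vert\mathbb{1}_A\otimes\sigma_B) = I(A\rangle B)_{\psi} = H(A)_{\psi}. \tag{9.2.86}
\end{equation}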

Combining this with (9.2.72) gives us 𝐸 𝑅 ( 𝐴; 𝐵)𝜓 = 𝐻 ( 𝐴)𝜓 , as required.

Applying the same method of proof, but using the properties of the Petz–,
sandwiched, and geometric Rényi relative entropies, as well as (7.11.139)–
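(7.11.141), we conclude the inequalities
\begin{align}
E_\alpha(A;B)_{\psi} &\leq H_{\frac{1}{\alpha}}(A)_{\psi}, \tag{9.2.87}\\
\widetilde{E}_\alpha(A;B)_{\psi} &\leq H_{\frac{\alpha}{2\alpha-1}}(A)_{\psi}, \tag{9.2.88}\\
\widehat{E}_\alpha(A;B)_{\psi} &\leq H_{\frac{1}{2}}(A)_{\psi}, \tag{9.2.89}
\end{align}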

which, combined with (9.2.73)–(9.2.75), leads to (9.2.60)–(9.2.62). ■
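For pure states, the closed forms (9.2.59)–(9.2.62) reduce the computation to Rényi entropies of the Schmidt coefficients $\lambda_k$ in (9.2.70). The following is a minimal sketch, assuming NumPy, the standard definition $H_\alpha = \frac{1}{1-\alpha}\log_2\sum_k\lambda_k^{\alpha}$, and a state vector stored in the product basis of $A\otimes B$; the function names and dictionary keys are illustrative, and $\alpha\neq 1$ is assumed.

```python
import numpy as np

def schmidt_coefficients(psi, dA, dB):
    """Squared singular values lambda_k of the dA x dB coefficient matrix of |psi>_AB."""
    s = np.linalg.svd(np.reshape(psi, (dA, dB)), compute_uv=False)
    lam = s**2
    return lam[lam > 1e-14]

def renyi_entropy(lam, alpha):
    """Renyi entropy H_alpha in bits; alpha = 1 gives the von Neumann entropy."""
    if np.isclose(alpha, 1.0):
        return float(-np.sum(lam * np.log2(lam)))
    return float(np.log2(np.sum(lam**alpha)) / (1 - alpha))

def pure_state_entanglement(psi, dA, dB, alpha):
    """Right-hand sides of (9.2.59)-(9.2.62) for a pure state psi_AB and alpha != 1."""
    lam = schmidt_coefficients(psi, dA, dB)
    return {
        "E_R": renyi_entropy(lam, 1.0),
        "E_alpha_petz": renyi_entropy(lam, 1 / alpha),
        "E_alpha_sandwiched": renyi_entropy(lam, alpha / (2 * alpha - 1)),
        "E_alpha_geometric": renyi_entropy(lam, 0.5),
    }

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)      # maximally entangled two-qubit state
product = np.kron([1.0, 0.0], [0.0, 1.0])       # |0>_A |1>_B
bell_values = pure_state_entanglement(bell, 2, 2, alpha=1.5)
product_values = pure_state_entanglement(product, 2, 2, alpha=1.5)
```

For the maximally entangled state all four quantities equal one bit, since the Schmidt spectrum is flat; for the product state they all vanish.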

9.2.1 Cone Program Formulations

Computing a generalized divergence of entanglement involves an optimization

over the set of separable states, which is known to be hard (please consult the
Bibliographic Notes in Section 9.6). The optimization is made more complicated

by the fact that most of the generalized divergences we consider in this book are
non-linear functions of the input state (for example, sandwiched Rényi relative
entropy). However, both the max-relative entropy and hypothesis testing relative
entropy can be formulated as semi-definite programs (SDPs). Indeed, recall from
(7.8.4) that
𝐷 max (𝜌∥𝜎) = log2 inf{𝜆 : 𝜌 ≤ 𝜆𝜎}, (9.2.90)
and recall from (5.3.125) that

𝐷 𝜀𝐻 (𝜌∥𝜎) = − log2 inf{Tr[Λ𝜎] : 0 ≤ Λ ≤ 1, Tr[Λ𝜌] ≥ 1 − 𝜀}. (9.2.91)

As discussed earlier, both of these generalized divergences can be cast into the
SDP standard forms in Definition 2.26, and thus their corresponding generalized
divergence of entanglement can be formulated as a cone program. A cone program
is an optimization problem over a convex cone³ with a convex objective function.
An SDP is a special case of a cone program in which the convex cone is the set of
positive semi-definite operators.
The convex cone of interest here is the set $\widehat{\operatorname{SEP}}(A:B)$ of all separable operators, which we define as follows: $X_{AB}\in\widehat{\operatorname{SEP}}(A:B)$ if there exists a positive integer $\ell$ and positive semi-definite operators $\{P^x_A\}_{x=1}^{\ell}$ and $\{Q^x_B\}_{x=1}^{\ell}$ such that
\begin{equation}
X_{AB} = \sum_{x=1}^{\ell} P^x_A\otimes Q^x_B. \tag{9.2.92}
\end{equation}
In what follows, we sometimes employ the shorthands $\operatorname{SEP}$ and $\widehat{\operatorname{SEP}}$ when the bipartition is clear from the context.
We now show that the max-relative entropy of entanglement can be written as a
cone program.

Proposition 9.21 Cone Program for Max-Relative Entropy of Entanglement
Let 𝜌 𝐴𝐵 be a bipartite state. Then,
𝐸 max ( 𝐴; 𝐵) 𝜌 = log2 𝐺 max ( 𝐴; 𝐵) 𝜌 , (9.2.93)

³ A subset $C$ of a vector space is called a cone if $\alpha x\in C$ for every $x\in C$ and $\alpha>0$. A convex cone is one for which $\alpha x+\beta y\in C$ for all $\alpha,\beta>0$ and $x,y\in C$.


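where
\begin{equation}
G_{\max}(A;B)_{\rho} := \inf\left\{\operatorname{Tr}[X_{AB}] : \rho_{AB}\leq X_{AB},\ X_{AB}\in\widehat{\operatorname{SEP}}\right\}. \tag{9.2.94}
\end{equation}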

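Proof: Employing the expression in (9.2.90), we find that
\begin{align}
E_{\max}(A;B)_{\rho} &= \inf_{\sigma_{AB}\in\operatorname{SEP}(A:B)} D_{\max}(\rho_{AB}\Vert\sigma_{AB}) \notag\\
&= \log_2\inf\{\mu : \rho_{AB}\leq\mu\sigma_{AB},\ \sigma_{AB}\in\operatorname{SEP}\} \tag{9.2.95}\\
&= \log_2\inf\left\{\operatorname{Tr}[X_{AB}] : \rho_{AB}\leq X_{AB},\ X_{AB}\in\widehat{\operatorname{SEP}}\right\}, \tag{9.2.96}
\end{align}
as required, where in the last step we made the change of variable $\mu\sigma_{AB}\equiv X_{AB}$. Since $\sigma_{AB}\in\operatorname{SEP}(A:B)$ and $\mu\geq 0$, we have that $X_{AB}\in\widehat{\operatorname{SEP}}(A:B)$, and $\operatorname{Tr}[X_{AB}] = \mu$ because $\operatorname{Tr}[\sigma_{AB}] = 1$. ■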
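The change of variable $X_{AB} = \mu\sigma_{AB}$ can be made concrete with a single feasible point of (9.2.94). The sketch below assumes NumPy and a two-qubit example: $\rho$ is the maximally entangled state $\Phi$, and $\sigma = \mathbb{1}/6 + \Phi/3$ is an isotropic state chosen for illustration. Its separability is confirmed here through the PPT criterion, which suffices for two qubits. For full-rank $\sigma$, the code uses the formula $D_{\max}(\rho\Vert\sigma) = \log_2\lambda_{\max}(\sigma^{-1/2}\rho\,\sigma^{-1/2})$, which is equivalent to (9.2.90) in that case.

```python
import numpy as np

def partial_transpose_B(X, dA, dB):
    """Partial transpose on the B factor of an operator on A (x) B."""
    return X.reshape(dA, dB, dA, dB).transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

def d_max(rho, sigma):
    """D_max(rho||sigma) in bits for full-rank sigma."""
    w, V = np.linalg.eigh(sigma)
    inv_sqrt = (V / np.sqrt(w)) @ V.conj().T
    return float(np.log2(np.linalg.eigvalsh(inv_sqrt @ rho @ inv_sqrt).max()))

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
Phi = np.outer(phi, phi)                 # rho_AB: maximally entangled two-qubit state
sigma = np.eye(4) / 6 + Phi / 3          # candidate separable state (illustrative choice)

dmax_value = d_max(Phi, sigma)
mu = 2 ** dmax_value
X = mu * sigma                           # feasible point of the cone program (9.2.94)

ppt_min_eig = np.linalg.eigvalsh(partial_transpose_B(sigma, 2, 2)).min()
slack_min_eig = np.linalg.eigvalsh(X - Phi).min()   # rho <= X iff this is >= 0
```

This gives $\operatorname{Tr}[X] = 2$, hence $E_{\max}(A;B)_{\Phi}\leq 1$. Since $D\leq D_{\max}$ and $E_R(A;B)_{\Phi} = H(A)_{\Phi} = 1$ by (9.2.59), the bound is tight for this example.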

Next, we show that the hypothesis testing relative entropy of entanglement can
be written as a cone program.

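Proposition 9.22 Cone Program for Hypothesis Testing Relative Entropy of Entanglement
Let $\rho_{AB}$ be a bipartite state. Then, for all $\varepsilon\in[0,1]$,
\begin{multline}
E_R^{\varepsilon}(A;B)_{\rho} = -\log_2\sup\{\mu(1-\varepsilon) - \operatorname{Tr}[Z_{AB}] : \mu\geq 0,\ Z_{AB}\geq 0, \\
\sigma_{AB}\in\widehat{\operatorname{SEP}},\ \mu\rho_{AB}\leq\sigma_{AB}+Z_{AB},\ \operatorname{Tr}[\sigma_{AB}] = 1\}. \tag{9.2.97}
\end{multline}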

Proof: This follows from the definition in (9.2.3) and the dual formulation of the
hypothesis testing relative entropy stated in (7.9.5). ■

Recall that SEP = PPT in the case of qubit-qubit and qubit-qutrit states, which
means that the optimizations in (9.2.94) and (9.2.97) are SDPs when 𝜌 𝐴𝐵 is either
a two-qubit state or a qubit-qutrit state.


9.3 Generalized Rains Divergence

As explained in Section 9.1.1, the generalized divergence of entanglement is in
general complicated to compute because the set of separable states does not admit
a simple characterization, making optimization over separable states difficult. By
relaxing the set of separable states to the set PPT′ defined in (9.1.144), which
has a simple characterization in terms of semi-definite constraints, we defined the
generalized Rains divergence in (9.1.148). Recall that
PPT′ ( 𝐴 : 𝐵) = {𝜎𝐴𝐵 : 𝜎𝐴𝐵 ≥ 0, ∥T𝐵 (𝜎𝐴𝐵 ) ∥ 1 ≤ 1}. (9.3.1)

In this section, we investigate properties of the generalized Rains divergence.

We start by recalling its definition.

Definition 9.23 Generalized Rains Divergence of a Bipartite State

Let 𝑫 be a generalized divergence (see Definition 7.15). For every bipartite
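state $\rho_{AB}$, we define the generalized Rains divergence of $\rho_{AB}$ as
\begin{equation}
\boldsymbol{R}(A;B)_{\rho} := \inf_{\sigma_{AB}\in\operatorname{PPT}'(A:B)} \boldsymbol{D}(\rho_{AB}\Vert\sigma_{AB}). \tag{9.3.2}
\end{equation}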

If 𝑫 is continuous in its second argument, then the infimum can be replaced by

a minimum.

Since SEP ⊆ PPT′, optimizing over states in PPT′ can never lead to a value that
is greater than the value obtained by optimizing over separable states. Therefore,
as stated in (9.1.149),
𝑹( 𝐴; 𝐵) 𝜌 ≤ 𝑬 ( 𝐴; 𝐵) 𝜌 (9.3.3)
for every state 𝜌 𝐴𝐵 .
We are particularly interested throughout the rest of this book in the following
generalized Rains divergences for every state 𝜌 𝐴𝐵 :
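1. The Rains relative entropy of $\rho_{AB}$,
\begin{equation}
R(A;B)_{\rho} := \inf_{\sigma_{AB}\in\operatorname{PPT}'(A:B)} D(\rho_{AB}\Vert\sigma_{AB}), \tag{9.3.4}
\end{equation}
where $D(\rho_{AB}\Vert\sigma_{AB})$ is the quantum relative entropy of $\rho_{AB}$ and $\sigma_{AB}$ (Definition 7.1). For two-qubit states, it is known that the infimum in (9.3.4) is achieved by a separable state, which means that $E_R(A;B)_{\rho} = R(A;B)_{\rho}$ for two-qubit states $\rho_{AB}$ (please consult the Bibliographic Notes in Section 9.6).
2. The $\varepsilon$-hypothesis testing Rains relative entropy of $\rho_{AB}$,
\begin{equation}
R_H^{\varepsilon}(A;B)_{\rho} := \inf_{\sigma_{AB}\in\operatorname{PPT}'(A:B)} D_H^{\varepsilon}(\rho_{AB}\Vert\sigma_{AB}), \tag{9.3.5}
\end{equation}
where $D_H^{\varepsilon}(\rho_{AB}\Vert\sigma_{AB})$ is the $\varepsilon$-hypothesis testing relative entropy of $\rho_{AB}$ and $\sigma_{AB}$ (Definition 7.65).
3. The sandwiched Rényi Rains relative entropy of $\rho_{AB}$,
\begin{equation}
\widetilde{R}_\alpha(A;B)_{\rho} := \inf_{\sigma_{AB}\in\operatorname{PPT}'(A:B)} \widetilde{D}_\alpha(\rho_{AB}\Vert\sigma_{AB}), \tag{9.3.6}
\end{equation}
where $\widetilde{D}_\alpha(\rho_{AB}\Vert\sigma_{AB})$, $\alpha\in[1/2,1)\cup(1,\infty)$, is the sandwiched Rényi relative entropy of $\rho_{AB}$ and $\sigma_{AB}$ (Definition 7.28). Note that $\widetilde{R}_\alpha(A;B)_{\rho}$ is monotonically increasing in $\alpha$ for all $\rho_{AB}$ (see Proposition 7.31). This fact, along with the fact that $\lim_{\alpha\to 1}\widetilde{D}_\alpha = D$ (see Proposition 7.30), leads to
\begin{equation}
\lim_{\alpha\to 1}\widetilde{R}_\alpha(A;B)_{\rho} = R(A;B)_{\rho} \tag{9.3.7}
\end{equation}
for every state $\rho_{AB}$. See Appendix 10.A for details of the proof.
4. The max-Rains relative entropy of $\rho_{AB}$,
\begin{equation}
R_{\max}(A;B)_{\rho} := \inf_{\sigma_{AB}\in\operatorname{PPT}'(A:B)} D_{\max}(\rho_{AB}\Vert\sigma_{AB}), \tag{9.3.8}
\end{equation}
where $D_{\max}(\rho_{AB}\Vert\sigma_{AB})$ is the max-relative entropy of $\rho_{AB}$ and $\sigma_{AB}$ (Definition 7.58). Using the fact that $\lim_{\alpha\to\infty}\widetilde{D}_\alpha = D_{\max}$ (see Proposition 7.61), we find that
\begin{equation}
R_{\max}(A;B)_{\rho} = \lim_{\alpha\to\infty}\widetilde{R}_\alpha(A;B)_{\rho} \tag{9.3.9}
\end{equation}
for every state $\rho_{AB}$. See Appendix 10.A for details of the proof. Due to this fact, along with the fact that $\widetilde{R}_\alpha(A;B)_{\rho}$ is monotonically increasing in $\alpha$ for all $\rho_{AB}$, we have that
\begin{equation}
R_{\max}(A;B)_{\rho} \geq \widetilde{R}_\alpha(A;B)_{\rho} \tag{9.3.10}
\end{equation}
for all $\alpha\in(1,\infty)$ and every state $\rho_{AB}$.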
The following inequalities relate the log-negativity, as defined in (9.1.96), to
the Rains relative entropy, the sandwiched Rényi Rains relative entropy, and the
max-Rains relative entropy:


Proposition 9.24 Log-Negativity to Rains Relative Entropies

For a bipartite state 𝜌 𝐴𝐵 , the following inequalities hold

𝑅( 𝐴; 𝐵) 𝜌 ≤ 𝑅max ( 𝐴; 𝐵) 𝜌 ≤ 𝐸 𝑁 ( 𝐴; 𝐵) 𝜌 . (9.3.11)

Furthermore, for all $\alpha,\beta\in[1/2,1)\cup(1,\infty)$ such that $\alpha<\beta$, we have that
\begin{equation}
\widetilde{R}_\alpha(A;B)_{\rho} \leq \widetilde{R}_\beta(A;B)_{\rho}. \tag{9.3.12}
\end{equation}

Proof: The inequality 𝑅( 𝐴; 𝐵) 𝜌 ≤ 𝑅max ( 𝐴; 𝐵) 𝜌 and the inequality in (9.3.12) are

a direct consequence of the monotonicity in 𝛼 of the sandwiched Rényi relative
entropy (Proposition 7.31), as well as (9.3.7) and (9.3.9).
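To see the inequality $R_{\max}(A;B)_{\rho}\leq E_N(A;B)_{\rho}$, consider picking
\begin{equation}
\sigma_{AB} = \frac{\rho_{AB}}{\left\Vert\mathrm{T}_B(\rho_{AB})\right\Vert_1} \tag{9.3.13}
\end{equation}
in (9.3.8). For this choice, we have that $\sigma_{AB}\geq 0$ and $\left\Vert\mathrm{T}_B(\sigma_{AB})\right\Vert_1\leq 1$, so that $\sigma_{AB}\in\operatorname{PPT}'(A:B)$. Thus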
𝑅max ( 𝐴; 𝐵) 𝜌 ≤ 𝐷 max (𝜌 𝐴𝐵 ∥𝜎𝐴𝐵 ) (9.3.14)
= 𝐷 max (𝜌 𝐴𝐵 ∥ 𝜌 𝐴𝐵 ) + log2 ∥T𝐵 (𝜌 𝐴𝐵 )∥ 1 (9.3.15)
= 𝐸 𝑁 ( 𝐴; 𝐵) 𝜌 . (9.3.16)
The first equality follows by direct evaluation using Definition 7.58. The last equality
follows because 𝐷 max (𝜌 𝐴𝐵 ∥ 𝜌 𝐴𝐵 ) = 0 and from the definition in (9.1.96). ■
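The choice (9.3.13) is easy to reproduce numerically. The sketch below, assuming NumPy, computes the log-negativity as $\log_2\Vert\mathrm{T}_B(\rho_{AB})\Vert_1$ for a two-qubit state $\rho = 0.8\,\Phi + 0.2\,\mathbb{1}/4$ (an illustrative example, not from the text) and confirms that $\sigma = \rho/\Vert\mathrm{T}_B(\rho)\Vert_1$ lies in $\operatorname{PPT}'$ and satisfies $\rho\leq\lambda\sigma$ with $\lambda = \Vert\mathrm{T}_B(\rho)\Vert_1$.

```python
import numpy as np

def partial_transpose_B(X, dA, dB):
    """Partial transpose on the B factor of an operator on A (x) B."""
    return X.reshape(dA, dB, dA, dB).transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

def trace_norm_hermitian(M):
    """Trace norm of a Hermitian matrix: sum of absolute eigenvalues."""
    return float(np.abs(np.linalg.eigvalsh(M)).sum())

def log_negativity(rho, dA, dB):
    """E_N(A;B) = log2 || T_B(rho) ||_1."""
    return float(np.log2(trace_norm_hermitian(partial_transpose_B(rho, dA, dB))))

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
Phi = np.outer(phi, phi)
rho = 0.8 * Phi + 0.2 * np.eye(4) / 4

c = trace_norm_hermitian(partial_transpose_B(rho, 2, 2))
sigma = rho / c                                   # the choice in (9.3.13)

sigma_is_psd = np.linalg.eigvalsh(sigma).min() >= -1e-12
sigma_pt_norm = trace_norm_hermitian(partial_transpose_B(sigma, 2, 2))
slack_min_eig = np.linalg.eigvalsh(c * sigma - rho).min()   # rho <= c * sigma
E_N = log_negativity(rho, 2, 2)
bell_E_N = log_negativity(Phi, 2, 2)
```

For this example the partial transpose has eigenvalues 0.45 (three times) and -0.35, so $\Vert\mathrm{T}_B(\rho)\Vert_1 = 1.7$ and the resulting upper bound on $R_{\max}$ is $\log_2 1.7$.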

Proposition 9.25 Properties of Generalized Rains Divergence

Let 𝑫 be a generalized divergence, and consider the generalized Rains diver-
gence 𝑹( 𝐴; 𝐵) 𝜌 of a state 𝜌 𝐴𝐵 as defined in (9.3.2).
1. PPT monotonicity: For every completely PPT-preserving channel $\mathcal{P}_{AB\to A'B'}$,
\begin{equation}
\boldsymbol{R}(A;B)_{\rho} \geq \boldsymbol{R}(A';B')_{\omega}, \tag{9.3.17}
\end{equation}
where $\omega_{A'B'} = \mathcal{P}_{AB\to A'B'}(\rho_{AB})$. In other words, the generalized Rains divergence is monotonically non-increasing under completely PPT-preserving channels. Since every LOCC channel is a completely PPT-preserving channel (Propositions 4.24 and 4.29), the generalized Rains divergence is monotone non-increasing under LOCC channels, and thus it is an entanglement measure as per Definition 9.1.
2. Subadditivity: If 𝑫 is additive for product positive semi-definite operators,
i.e., 𝑫 (𝜌 ⊗ 𝜔∥𝜎 ⊗ 𝜏) = 𝑫 (𝜌∥𝜎) + 𝑫 (𝜔∥𝜏), then for every two quantum
states 𝜌 𝐴1 𝐵1 and 𝜔 𝐴2 𝐵2 the generalized Rains relative entropy is subadditive:

𝑹( 𝐴1 𝐴2 ; 𝐵1 𝐵2 ) 𝜌⊗𝜔 ≤ 𝑹( 𝐴1 ; 𝐵1 ) 𝜌 + 𝑹( 𝐴2 ; 𝐵2 )𝜔 . (9.3.18)

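3. Convexity: If $\boldsymbol{D}$ is jointly convex, meaning that
\begin{equation}
\boldsymbol{D}\!\left(\sum_{x\in\mathcal{X}} p(x)\rho^x_{AB}\,\middle\Vert\,\sum_{x\in\mathcal{X}} p(x)\sigma^x_{AB}\right) \leq \sum_{x\in\mathcal{X}} p(x)\,\boldsymbol{D}(\rho^x_{AB}\Vert\sigma^x_{AB}) \tag{9.3.19}
\end{equation}
for every finite alphabet $\mathcal{X}$, probability distribution $p:\mathcal{X}\to[0,1]$, set $\{\rho^x_{AB}\}_{x\in\mathcal{X}}$ of states, and set $\{\sigma^x_{AB}\}_{x\in\mathcal{X}}$ of positive semi-definite operators, then the generalized Rains divergence is convex:
\begin{equation}
\boldsymbol{R}(A;B)_{\rho} \leq \sum_{x\in\mathcal{X}} p(x)\,\boldsymbol{R}(A;B)_{\rho^x}, \tag{9.3.20}
\end{equation}
where $\rho_{AB} = \sum_{x\in\mathcal{X}} p(x)\rho^x_{AB}$.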
Properties 1. and 2. are satisfied when the generalized divergence is the quantum
relative entropy, the Petz–, sandwiched, and geometric Rényi relative entropies,
and the max-relative entropy. Property 3. is satisfied when the generalized
divergence is the quantum relative entropy and the Petz–, sandwiched, and
geometric Rényi relative entropies for the range of 𝛼 < 1 for which data
processing holds.

Remark: Note that the generalized Rains divergence is generally not a faithful entanglement
measure. Although 𝑹( 𝐴; 𝐵)𝜌 = 0 for all separable states 𝜌 𝐴𝐵 due to the containment SEP( 𝐴 :
𝐵) ⊆ PPT′ ( 𝐴 : 𝐵), the converse statement is not generally true because the infimum in the
definition of 𝑹( 𝐴; 𝐵)𝜌 is not generally achieved by a separable state.

Proof:


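1. For $\omega_{A'B'} = \mathcal{P}_{AB\to A'B'}(\rho_{AB})$, we have by definition,
\begin{align}
\boldsymbol{R}(A';B')_{\omega} &= \inf_{\tau_{A'B'}\in\operatorname{PPT}'(A':B')} \boldsymbol{D}(\omega_{A'B'}\Vert\tau_{A'B'}) \tag{9.3.21}\\
&= \inf_{\tau_{A'B'}\in\operatorname{PPT}'(A':B')} \boldsymbol{D}(\mathcal{P}_{AB\to A'B'}(\rho_{AB})\Vert\tau_{A'B'}). \tag{9.3.22}
\end{align}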

Now, recall from Lemma 9.14 that the set PPT′ is closed under completely
PPT-preserving channels. Based on this, it follows that the output operators of
the completely PPT-preserving channel P 𝐴𝐵→𝐴′ 𝐵′ are in the set PPT′ ( 𝐴′ : 𝐵′).
In other words, we have

{P 𝐴𝐵→𝐴′ 𝐵′ (𝜎𝐴𝐵 ) : 𝜎𝐴𝐵 ∈ PPT′ ( 𝐴 : 𝐵)} ⊆ PPT′ ( 𝐴′ : 𝐵′). (9.3.23)

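Therefore, restricting the optimization in (9.3.22) to this set leads to
\begin{align}
\boldsymbol{R}(A';B')_{\omega} &\leq \inf_{\sigma_{AB}\in\operatorname{PPT}'(A:B)} \boldsymbol{D}(\mathcal{P}_{AB\to A'B'}(\rho_{AB})\Vert\mathcal{P}_{AB\to A'B'}(\sigma_{AB})) \tag{9.3.24}\\
&\leq \inf_{\sigma_{AB}\in\operatorname{PPT}'(A:B)} \boldsymbol{D}(\rho_{AB}\Vert\sigma_{AB}) \tag{9.3.25}\\
&= \boldsymbol{R}(A;B)_{\rho}, \tag{9.3.26}
\end{align}
as required, where to obtain the second inequality we used the data-processing inequality for the generalized divergence.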
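2. By definition, the optimization in the definition of $\boldsymbol{R}(A_1A_2;B_1B_2)_{\rho\otimes\omega}$ is over the set
\begin{equation}
\operatorname{PPT}'(A_1A_2:B_1B_2) = \left\{\sigma_{A_1A_2B_1B_2} : \sigma_{A_1A_2B_1B_2}\geq 0,\ \left\Vert\mathrm{T}_{B_1B_2}(\sigma_{A_1A_2B_1B_2})\right\Vert_1\leq 1\right\}, \tag{9.3.27}
\end{equation}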

which contains operators of the form 𝜎𝐴1 𝐴2 𝐵1 𝐵2 = 𝜉 𝐴1 𝐵1 ⊗ 𝜏𝐴2 𝐵2 , where 𝜉 𝐴1 𝐵1 ∈

PPT′ ( 𝐴1 : 𝐵1 ) and 𝜏𝐴2 𝐵2 ∈ PPT′ ( 𝐴2 : 𝐵2 ). By restricting the optimization to
such tensor product operators, and by using the additivity of the generalized
divergence 𝑫, we obtain

𝑹( 𝐴1 𝐴2 ; 𝐵1 𝐵2 ) 𝜌⊗𝜔 ≤ 𝑫 (𝜌 𝐴1 𝐵1 ⊗ 𝜔 𝐴2 𝐵2 ∥𝜉 𝐴1 𝐵1 ⊗ 𝜏𝐴2 𝐵2 ) (9.3.28)

= 𝑫 (𝜌 𝐴1 𝐵1 ∥𝜉 𝐴1 𝐵1 ) + 𝑫 (𝜔 𝐴2 𝐵2 ∥𝜏𝐴2 𝐵2 ). (9.3.29)

Since 𝜉 𝐴1 𝐵1 ∈ PPT′ ( 𝐴1 : 𝐵1 ) and 𝜏𝐴2 𝐵2 ∈ PPT′ ( 𝐴2 : 𝐵2 ) are arbitrary, the

inequality in (9.3.18) follows.
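The claim that $\xi_{A_1B_1}\otimes\tau_{A_2B_2}$ belongs to $\operatorname{PPT}'(A_1A_2:B_1B_2)$ rests on the multiplicativity of the trace norm under tensor products. The sketch below checks this numerically, assuming NumPy, two-qubit factors, and the subsystem ordering $A_1B_1A_2B_2$ produced by `np.kron`; the helper `partial_transpose` and the example operators are illustrative.

```python
import numpy as np

def partial_transpose(X, dims, systems):
    """Partial transpose of X on the listed tensor factors (0-based positions in dims)."""
    dims = tuple(dims)
    n = len(dims)
    perm = list(range(2 * n))
    for s in systems:
        perm[s], perm[n + s] = perm[n + s], perm[s]   # swap row and column index of factor s
    D = int(np.prod(dims))
    return np.transpose(X.reshape(dims + dims), perm).reshape(D, D)

def trace_norm_hermitian(M):
    """Trace norm of a Hermitian matrix."""
    return float(np.abs(np.linalg.eigvalsh(M)).sum())

def random_state(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(3)
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)

# xi: subnormalized maximally entangled operator with ||T_B1(xi)||_1 = 0.9.
xi = 0.9 * np.outer(phi, phi) / 2
# tau: a random state rescaled so that ||T_B2(tau)||_1 = 1.
tau = random_state(4, rng)
tau = tau / trace_norm_hermitian(partial_transpose(tau, (2, 2), (1,)))

n1 = trace_norm_hermitian(partial_transpose(xi, (2, 2), (1,)))
n2 = trace_norm_hermitian(partial_transpose(tau, (2, 2), (1,)))

joint = np.kron(xi, tau)                               # ordering A1 B1 A2 B2
joint_norm = trace_norm_hermitian(partial_transpose(joint, (2, 2, 2, 2), (1, 3)))
joint_min_eig = np.linalg.eigvalsh(joint).min()
```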

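3. We have
\begin{equation}
\boldsymbol{R}(A;B)_{\rho} = \inf_{\sigma_{AB}\in\operatorname{PPT}'(A:B)} \boldsymbol{D}\!\left(\sum_{x\in\mathcal{X}} p(x)\rho^x_{AB}\,\middle\Vert\,\sigma_{AB}\right). \tag{9.3.30}
\end{equation}
Let us restrict the optimization over all $\operatorname{PPT}'$ operators to an optimization over sets $\{\sigma^x_{AB}\}_{x\in\mathcal{X}}$ of $\operatorname{PPT}'$ operators indexed by the alphabet $\mathcal{X}$. Then, because $\operatorname{PPT}'(A:B)$ is a convex set, we have that $\sum_{x\in\mathcal{X}} p(x)\sigma^x_{AB}\in\operatorname{PPT}'(A:B)$. Therefore, using the joint convexity of $\boldsymbol{D}$, we obtain
\begin{align}
\boldsymbol{R}(A;B)_{\rho} &\leq \inf_{\{\sigma^x_{AB}\}_x\subset\operatorname{PPT}'(A:B)} \boldsymbol{D}\!\left(\sum_{x\in\mathcal{X}} p(x)\rho^x_{AB}\,\middle\Vert\,\sum_{x\in\mathcal{X}} p(x)\sigma^x_{AB}\right) \tag{9.3.31}\\
&\leq \inf_{\{\sigma^x_{AB}\}_x\subset\operatorname{PPT}'(A:B)} \sum_{x\in\mathcal{X}} p(x)\,\boldsymbol{D}(\rho^x_{AB}\Vert\sigma^x_{AB}) \tag{9.3.32}\\
&\leq \sum_{x\in\mathcal{X}} p(x) \inf_{\sigma^x_{AB}\in\operatorname{PPT}'(A:B)} \boldsymbol{D}(\rho^x_{AB}\Vert\sigma^x_{AB}) \tag{9.3.33}\\
&= \sum_{x\in\mathcal{X}} p(x)\,\boldsymbol{R}(A;B)_{\rho^x}, \tag{9.3.34}
\end{align}
as required. ■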

By Proposition 9.25, the sandwiched Rényi Rains relative entropy $\widetilde{R}_\alpha$ is convex for $\alpha\in[1/2,1)$. Although convexity does not hold for $\alpha>1$, we do have quasi-convexity, which we now prove.

Proposition 9.26 Quasi-Convexity of Rényi Rains Relative Entropy

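Let $p:\mathcal{X}\to[0,1]$ be a probability distribution over a finite alphabet $\mathcal{X}$, and let $\{\rho^x_{AB}\}_{x\in\mathcal{X}}$ be a set of states. Then, for all $\alpha\in(1,\infty)$,
\begin{equation}
\widetilde{R}_\alpha(A;B)_{\rho} \leq \max_{x\in\mathcal{X}} \widetilde{R}_\alpha(A;B)_{\rho^x}, \tag{9.3.35}
\end{equation}
where $\rho_{AB} = \sum_{x\in\mathcal{X}} p(x)\rho^x_{AB}$.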

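Proof: We have
\begin{equation}
\widetilde{R}_\alpha(A;B)_{\rho} = \inf_{\sigma_{AB}\in\operatorname{PPT}'(A:B)} \widetilde{D}_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)\rho^x_{AB}\,\middle\Vert\,\sigma_{AB}\right). \tag{9.3.36}
\end{equation}
Let us restrict the optimization over all $\operatorname{PPT}'$ operators to an optimization over sets $\{\sigma^x_{AB}\}_{x\in\mathcal{X}}$ of $\operatorname{PPT}'$ operators indexed by the alphabet $\mathcal{X}$. Then, because $\operatorname{PPT}'(A:B)$ is a convex set, we have that $\sum_{x\in\mathcal{X}} p(x)\sigma^x_{AB}\in\operatorname{PPT}'(A:B)$. Let us also recall from (7.5.174) that the sandwiched Rényi relative entropy is jointly quasi-convex, meaning that
\begin{equation}
\widetilde{D}_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)\rho^x_{AB}\,\middle\Vert\,\sum_{x\in\mathcal{X}} p(x)\sigma^x_{AB}\right) \leq \max_{x\in\mathcal{X}} \widetilde{D}_\alpha(\rho^x_{AB}\Vert\sigma^x_{AB}). \tag{9.3.37}
\end{equation}
We thus obtain
\begin{align}
\widetilde{R}_\alpha(A;B)_{\rho} &\leq \inf_{\{\sigma^x_{AB}\}_x\subset\operatorname{PPT}'(A:B)} \widetilde{D}_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)\rho^x_{AB}\,\middle\Vert\,\sum_{x\in\mathcal{X}} p(x)\sigma^x_{AB}\right) \tag{9.3.38}\\
&\leq \inf_{\{\sigma^x_{AB}\}_x\subset\operatorname{PPT}'(A:B)} \max_{x\in\mathcal{X}} \widetilde{D}_\alpha(\rho^x_{AB}\Vert\sigma^x_{AB}) \tag{9.3.39}\\
&\leq \max_{x\in\mathcal{X}} \inf_{\sigma^x_{AB}\in\operatorname{PPT}'(A:B)} \widetilde{D}_\alpha(\rho^x_{AB}\Vert\sigma^x_{AB}) \tag{9.3.40}\\
&= \max_{x\in\mathcal{X}} \widetilde{R}_\alpha(A;B)_{\rho^x}, \tag{9.3.41}
\end{align}
as required. ■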

9.3.1 Semi-Definite Program Formulations

One of the advantages of using the generalized Rains divergence as an entanglement

measure is that the set PPT′ involved in its definition has a simple characterization
in terms of semi-definite constraints. Indeed, let us recall that

PPT′ ( 𝐴 : 𝐵) = {𝜎𝐴𝐵 : 𝜎𝐴𝐵 ≥ 0, ∥T𝐵 (𝜎𝐴𝐵 ) ∥ 1 ≤ 1}. (9.3.42)

Using the expression in (9.1.102) for ∥T𝐵 (𝜎𝐴𝐵 )∥ 1 , this set can equivalently be
written as follows:

PPT′ ( 𝐴 : 𝐵) = {𝜎𝐴𝐵 : 𝜎𝐴𝐵 ≥ 0, ∃𝐾 𝐴𝐵 , 𝐿 𝐴𝐵 ≥ 0 such that

Tr[𝐾 𝐴𝐵 + 𝐿 𝐴𝐵 ] ≤ 1, T𝐵 (𝐾 𝐴𝐵 − 𝐿 𝐴𝐵 ) = 𝜎𝐴𝐵 }, (9.3.43)

which can be further simplified to
\begin{multline}
\operatorname{PPT}'(A:B) = \{\mathrm{T}_B(K_{AB}-L_{AB}) : \mathrm{T}_B(K_{AB}-L_{AB})\geq 0, \\
K_{AB},L_{AB}\geq 0,\ \operatorname{Tr}[K_{AB}+L_{AB}]\leq 1\}. \tag{9.3.44}
\end{multline}

In this section, we show how these characterizations of the set PPT′ allow us to
compute both the max-Rains relative entropy and the hypothesis testing Rains
relative entropy via semi-definite programs (Section 2.4).
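The characterization (9.3.43) can be turned into a simple membership test: take $K_{AB}$ and $L_{AB}$ to be the positive and negative parts of $\mathrm{T}_B(\sigma_{AB})$, so that $\mathrm{T}_B(K_{AB}-L_{AB}) = \sigma_{AB}$ and $\operatorname{Tr}[K_{AB}+L_{AB}] = \Vert\mathrm{T}_B(\sigma_{AB})\Vert_1$. The following is a minimal sketch assuming NumPy; the function names and the tolerance are illustrative choices.

```python
import numpy as np

def partial_transpose_B(X, dA, dB):
    """Partial transpose on the B factor of an operator on A (x) B."""
    return X.reshape(dA, dB, dA, dB).transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

def jordan_hahn(H):
    """Positive and negative parts (K, L) of a Hermitian H, with H = K - L and K L = 0."""
    w, V = np.linalg.eigh(H)
    K = (V * np.clip(w, 0, None)) @ V.conj().T
    L = (V * np.clip(-w, 0, None)) @ V.conj().T
    return K, L

def ppt_prime_witness(sigma, dA, dB, tol=1e-10):
    """Return (is_member, K, L), where K, L witness membership as in (9.3.43)."""
    K, L = jordan_hahn(partial_transpose_B(sigma, dA, dB))
    is_psd = np.linalg.eigvalsh(sigma).min() >= -tol
    return bool(is_psd and np.trace(K + L).real <= 1 + tol), K, L

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
Phi = np.outer(phi, phi)

bell_member, _, _ = ppt_prime_witness(Phi, 2, 2)        # ||T_B(Phi)||_1 = 2
half_member, K, L = ppt_prime_witness(Phi / 2, 2, 2)    # ||T_B(Phi/2)||_1 = 1
reconstruction_error = np.abs(partial_transpose_B(K - L, 2, 2) - Phi / 2).max()
```

The maximally entangled state itself is not in $\operatorname{PPT}'$, while the subnormalized operator $\Phi/2$ is, which is the same rescaling used in (9.3.13).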
We first consider the max-Rains relative entropy, which we recall is defined as
\begin{equation}
R_{\max}(A;B)_{\rho} = \inf_{\sigma_{AB}\in\operatorname{PPT}'(A:B)} D_{\max}(\rho_{AB}\Vert\sigma_{AB}). \tag{9.3.45}
\end{equation}

Let us also recall from (7.8.4) that 𝐷 max (𝜌 𝐴𝐵 ∥𝜎𝐴𝐵 ) can be written as follows:

𝐷 max (𝜌 𝐴𝐵 ∥𝜎𝐴𝐵 ) = log2 inf{𝜆 : 𝜌 𝐴𝐵 ≤ 𝜆𝜎𝐴𝐵 }. (9.3.46)

As shown in the discussion after (7.8.4), the optimization in the equation above is a
semi-definite program (SDP). This, along with the definition in (9.3.42) of the set
PPT′ ( 𝐴 : 𝐵), leads to the following SDP formulation for 𝑅max .

Proposition 9.27 SDPs for the Max-Rains Relative Entropy

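Let $\rho_{AB}$ be a bipartite state. Then the max-Rains relative entropy can be written as
\begin{equation}
R_{\max}(A;B)_{\rho} = \log_2 W_{\max}(A;B)_{\rho}, \tag{9.3.47}
\end{equation}
where
\begin{align}
W_{\max}(A;B)_{\rho} &= \inf_{K_{AB},L_{AB}\geq 0}\{\operatorname{Tr}[K_{AB}+L_{AB}] : \mathrm{T}_B[K_{AB}-L_{AB}]\geq\rho_{AB}\} \tag{9.3.48}\\
&= \sup_{Y_{AB}\geq 0}\{\operatorname{Tr}[Y_{AB}\rho_{AB}] : \left\Vert\mathrm{T}_B[Y_{AB}]\right\Vert_\infty\leq 1\}. \tag{9.3.49}
\end{align}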

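Proof: First, we establish the equality for $W_{\max}(A;B)_{\rho}$ in (9.3.48). Due to the fact that the infimum over $\operatorname{PPT}'$ operators in the definition of $R_{\max}$ can be achieved, we have that
\begin{align}
R_{\max}(A;B)_{\rho} &= \inf_{\sigma_{AB}\in\operatorname{PPT}'(A:B)} D_{\max}(\rho_{AB}\Vert\sigma_{AB}) \tag{9.3.50}\\
&= \inf_{\sigma_{AB}\in\operatorname{PPT}'(A:B)} \log_2\inf\{\lambda : \rho_{AB}\leq\lambda\sigma_{AB}\} \tag{9.3.51}\\
&= \log_2\inf\{\lambda : \rho_{AB}\leq\lambda\sigma_{AB},\ \sigma_{AB}\geq 0,\ \left\Vert\mathrm{T}_B[\sigma_{AB}]\right\Vert_1\leq 1\}. \tag{9.3.52}
\end{align}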
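The constraint $\rho_{AB}\leq\lambda\sigma_{AB}$ implies that $\lambda\geq 1$: taking the trace on both sides of the inequality gives $1\leq\lambda\operatorname{Tr}[\sigma_{AB}]$, and $\operatorname{Tr}[\sigma_{AB}] = \operatorname{Tr}[\mathrm{T}_B(\sigma_{AB})]\leq\left\Vert\mathrm{T}_B(\sigma_{AB})\right\Vert_1\leq 1$. This in turn implies that $\lambda\sigma_{AB}\geq 0$ and that $\left\Vert\mathrm{T}_B[\lambda\sigma_{AB}]\right\Vert_1\leq\lambda$. So we have
\begin{equation}
R_{\max}(A;B)_{\rho} = \log_2\inf\{\lambda : \rho_{AB}\leq\lambda\sigma_{AB},\ \lambda\sigma_{AB}\geq 0,\ \left\Vert\mathrm{T}_B[\lambda\sigma_{AB}]\right\Vert_1\leq\lambda\}. \tag{9.3.53}
\end{equation}
Let us now make the change of variable $S_{AB}\equiv\lambda\sigma_{AB}$. We then have
\begin{align}
R_{\max}(A;B)_{\rho} &= \log_2\inf\{\lambda : \rho_{AB}\leq S_{AB},\ S_{AB}\geq 0,\ \left\Vert\mathrm{T}_B[S_{AB}]\right\Vert_1\leq\lambda\} \tag{9.3.54}\\
&= \log_2\inf\{\left\Vert\mathrm{T}_B[S_{AB}]\right\Vert_1 : \rho_{AB}\leq S_{AB}\}, \tag{9.3.55}
\end{align}
where to obtain the last line we eliminated the constraint $S_{AB}\geq 0$ (because it is implied by $\rho_{AB}\leq S_{AB}$) and we used the fact that $\lambda\geq\left\Vert\mathrm{T}_B[S_{AB}]\right\Vert_1$, meaning that the smallest value of $\lambda$ is $\left\Vert\mathrm{T}_B[S_{AB}]\right\Vert_1$.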
Now, for an arbitrary operator 𝑆 𝐴𝐵 satisfying 𝜌 𝐴𝐵 ≤ 𝑆 𝐴𝐵 , recall from (2.2.67)
that the Jordan–Hahn decomposition of T𝐵 [𝑆 𝐴𝐵 ] is given by T𝐵 (𝑆 𝐴𝐵 ) = 𝐾 𝐴𝐵 − 𝐿 𝐴𝐵 ,
with 𝐾 𝐴𝐵 , 𝐿 𝐴𝐵 ≥ 0 and 𝐾 𝐴𝐵 𝐿 𝐴𝐵 = 0. We then find that

∥T𝐵 [𝑆 𝐴𝐵 ] ∥ 1 = Tr[𝐾 𝐴𝐵 + 𝐿 𝐴𝐵 ], (9.3.56)

𝑆 𝐴𝐵 = T𝐵 [𝐾 𝐴𝐵 − 𝐿 𝐴𝐵 ]. (9.3.57)

The following inequality thus holds:

inf{Tr[𝐾 𝐴𝐵 + 𝐿 𝐴𝐵 ] : 𝜌 𝐴𝐵 ≤ T𝐵 [𝐾 𝐴𝐵 − 𝐿 𝐴𝐵 ], 𝐾 𝐴𝐵 , 𝐿 𝐴𝐵 ≥ 0}
≤ inf{∥T𝐵 (𝑆 𝐴𝐵 )∥ 1 : 𝜌 𝐴𝐵 ≤ 𝑆 𝐴𝐵 }. (9.3.58)

To see the opposite inequality, let 𝐾 𝐴𝐵 and 𝐿 𝐴𝐵 be arbitrary operators such that
𝐾 𝐴𝐵 , 𝐿 𝐴𝐵 ≥ 0 and 𝜌 𝐴𝐵 ≤ T𝐵 [𝐾 𝐴𝐵 − 𝐿 𝐴𝐵 ]. Then, setting 𝑆 𝐴𝐵 = T𝐵 [𝐾 𝐴𝐵 − 𝐿 𝐴𝐵 ],
we find that 𝜌 𝐴𝐵 ≤ 𝑆 𝐴𝐵 and

∥T𝐵 (𝑆 𝐴𝐵 ) ∥ 1 = ∥𝐾 𝐴𝐵 − 𝐿 𝐴𝐵 ∥ 1 (9.3.59)
≤ ∥𝐾 𝐴𝐵 ∥ 1 + ∥𝐿 𝐴𝐵 ∥ 1 (9.3.60)
= Tr[𝐾 𝐴𝐵 + 𝐿 𝐴𝐵 ], (9.3.61)

where we used the triangle inequality. This implies that


inf{∥T𝐵 (𝑆 𝐴𝐵 )∥ 1 : 𝜌 𝐴𝐵 ≤ 𝑆 𝐴𝐵 } ≤
inf{Tr[𝐾 𝐴𝐵 + 𝐿 𝐴𝐵 ] : 𝜌 𝐴𝐵 ≤ T𝐵 (𝐾 𝐴𝐵 − 𝐿 𝐴𝐵 ), 𝐾 𝐴𝐵 , 𝐿 𝐴𝐵 ≥ 0}. (9.3.62)

Putting together (9.3.58) and (9.3.62), we conclude that 𝑊max ( 𝐴; 𝐵) 𝜌 is given by

the equality in (9.3.48).
To see the equality in (9.3.49), we employ semi-definite programming duality
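(see Section 2.4). We first put (9.3.48) into standard form (see Definition 2.26) as follows:
\begin{equation}
\inf_{X\geq 0}\{\operatorname{Tr}[CX] : \Phi(X)\geq D\}, \tag{9.3.63}
\end{equation}
with
\begin{align}
X &= \begin{pmatrix} K_{AB} & 0\\ 0 & L_{AB}\end{pmatrix}, & C &= \begin{pmatrix}\mathbb{1}_{AB} & 0\\ 0 & \mathbb{1}_{AB}\end{pmatrix}, \tag{9.3.64}\\
\Phi(X) &= \mathrm{T}_B[K_{AB}-L_{AB}], & D &= \rho_{AB}. \tag{9.3.65}
\end{align}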

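The dual program is then given by
\begin{equation}
\sup_{Y\geq 0}\{\operatorname{Tr}[DY] : \Phi^{\dagger}(Y)\leq C\}. \tag{9.3.66}
\end{equation}
In order to determine the adjoint $\Phi^{\dagger}$ of $\Phi$, consider that
\begin{align}
\operatorname{Tr}[\Phi(X)Y] &= \operatorname{Tr}[\mathrm{T}_B[K_{AB}-L_{AB}]\,Y_{AB}] \tag{9.3.67}\\
&= \operatorname{Tr}[(K_{AB}-L_{AB})\,\mathrm{T}_B[Y_{AB}]] \tag{9.3.68}\\
&= \operatorname{Tr}\!\left[\begin{pmatrix} K_{AB} & 0\\ 0 & L_{AB}\end{pmatrix}\begin{pmatrix}\mathrm{T}_B[Y_{AB}] & 0\\ 0 & -\mathrm{T}_B[Y_{AB}]\end{pmatrix}\right]. \tag{9.3.69}
\end{align}
Therefore,
\begin{equation}
\Phi^{\dagger}(Y) = \begin{pmatrix}\mathrm{T}_B[Y_{AB}] & 0\\ 0 & -\mathrm{T}_B[Y_{AB}]\end{pmatrix}, \tag{9.3.70}
\end{equation}
so that $\Phi^{\dagger}(Y)\leq C$ is equivalent to
\begin{equation}
\begin{pmatrix}\mathrm{T}_B[Y_{AB}] & 0\\ 0 & -\mathrm{T}_B[Y_{AB}]\end{pmatrix} \leq \begin{pmatrix}\mathbb{1}_{AB} & 0\\ 0 & \mathbb{1}_{AB}\end{pmatrix}. \tag{9.3.71}
\end{equation}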

This is equivalent to the condition −1 𝐴𝐵 ≤ T𝐵 [𝑌 𝐴𝐵 ] ≤ 1 𝐴𝐵 , which is equivalent

to ∥T𝐵 [𝑌 𝐴𝐵 ] ∥ ∞ ≤ 1 (see (2.4.26)). We thus conclude that (9.3.66) is equal to
(9.3.49).
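The adjoint computation in (9.3.67)–(9.3.70) relies on the partial transpose being self-adjoint with respect to the Hilbert–Schmidt inner product. A minimal numerical check, assuming NumPy and randomly generated positive semi-definite $K_{AB}$, $L_{AB}$, $Y_{AB}$ on two qubits (all names below are illustrative):

```python
import numpy as np

def partial_transpose_B(X, dA, dB):
    """Partial transpose on the B factor of an operator on A (x) B."""
    return X.reshape(dA, dB, dA, dB).transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

def random_psd(d, rng):
    """Random positive semi-definite matrix."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return G @ G.conj().T

rng = np.random.default_rng(1)
dA = dB = 2
K, L, Y = (random_psd(dA * dB, rng) for _ in range(3))
TY = partial_transpose_B(Y, dA, dB)
Z = np.zeros((dA * dB, dA * dB))

# Left-hand side of (9.3.67): Tr[Phi(X) Y] with Phi(X) = T_B[K - L].
lhs = np.trace(partial_transpose_B(K - L, dA, dB) @ Y).real

# Right-hand side of (9.3.69): Tr[X Phi^dagger(Y)] in block form.
X_block = np.block([[K, Z], [Z, L]])
adjoint_block = np.block([[TY, Z], [Z, -TY]])
rhs = np.trace(X_block @ adjoint_block).real

# Rescale Y so that ||T_B[Y]||_inf = 1; then -1 <= T_B[Y] <= 1 holds as in (9.3.71).
Y_scaled = Y / np.linalg.norm(TY, 2)
eigs = np.linalg.eigvalsh(partial_transpose_B(Y_scaled, dA, dB))
```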

To arrive at the equality in (9.3.48)–(9.3.49), we need to verify that strong duality holds, and so we check the conditions of Theorem 2.28. For the dual program in (9.3.49), pick $Y_{AB} = \frac{1}{2}\mathbb{1}_{AB}$. Then the constraints in (9.3.49) are strict, because $Y_{AB}>0$ and $\left\Vert\mathrm{T}_B(Y_{AB})\right\Vert_\infty = \frac{1}{2} < 1$ for this choice. Furthermore, for the primal in (9.3.48), pick $K_{AB}$ and $L_{AB}$ equal to the positive and negative parts of $\mathrm{T}_B(\rho_{AB})$, respectively. These are positive semi-definite by definition and satisfy $\rho_{AB} = \mathrm{T}_B(K_{AB}-L_{AB})$, so that they are feasible for the primal. ■

It is worthwhile to observe the close connection of the SDP in (9.3.48), related

to the max-Rains relative entropy, to that in (9.1.102), related to the log-negativity.
In fact, the SDP in (9.3.48) is a relaxation of that in (9.1.102), and this leads to an
alternate proof of the rightmost inequality from (9.3.11):
𝑅max ( 𝐴; 𝐵) 𝜌 ≤ 𝐸 𝑁 ( 𝐴; 𝐵) 𝜌 , (9.3.72)
holding for every bipartite state 𝜌 𝐴𝐵 .
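
As a numerical illustration of how the primal and dual programs bracket 𝑊max, the following sketch (Python with NumPy; the helper name `partial_transpose_B` and all variable names are our own, not from the text) evaluates one feasible point of (9.3.48) and one feasible point of (9.3.49) for the two-qubit maximally entangled state. The primal point is the positive/negative-part decomposition of the partially transposed state, which also satisfies the equality-constrained program (9.A.2) for the trace norm; the dual point 𝑌 = 𝑑 · 𝜌 is simply a convenient guess for this particular state.

```python
import numpy as np

def partial_transpose_B(op, dA, dB):
    """Partial transpose on B of an operator acting on C^dA (x) C^dB."""
    t = op.reshape(dA, dB, dA, dB)              # indices (a, b, a', b')
    return t.transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

d = 2
phi = np.eye(d).reshape(d * d) / np.sqrt(d)     # (|00> + |11>)/sqrt(2)
rho = np.outer(phi, phi)

# Primal point for (9.3.48): K, L are the positive / negative parts of T_B(rho),
# so T_B(K - L) = rho and the constraint rho <= T_B(K - L) holds with equality.
w, v = np.linalg.eigh(partial_transpose_B(rho, d, d))
K = (v * np.clip(w, 0, None)) @ v.T
L = (v * np.clip(-w, 0, None)) @ v.T
primal_value = np.trace(K + L)

# Dual point for (9.3.49): Y = d * rho is positive semi-definite, and T_B(Y) is
# the swap operator, whose operator norm equals 1.
Y = d * rho
assert np.linalg.norm(partial_transpose_B(Y, d, d), 2) <= 1 + 1e-12
dual_value = np.trace(Y @ rho)

# Weak duality gives dual_value <= W_max <= primal_value.
r_max = np.log2(primal_value)
print(primal_value, dual_value, r_max)
```

Since the two feasible values coincide for this state, weak duality alone fixes 𝑊max, and the max-Rains relative entropy then equals the log-negativity, so (9.3.72) is saturated here.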
Furthermore, the form of the SDP in (9.3.48), along with the same proof given
for Proposition 9.10, except with (9.1.106) and (9.1.112) replaced with inequality
constraints, leads to the following conclusion:

Proposition 9.28 Max-Rains Relative Entropy is a Selective PPT Monotone

The max-Rains relative entropy is a selective PPT monotone; i.e., (9.1.103)
holds with 𝐸 set to 𝑅max .

Due to the additivity of 𝐷 max , Proposition 9.25 implies that 𝑅max is subadditive:
𝑅max ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 ) 𝜌⊗𝜔 ≤ 𝑅max ( 𝐴1 ; 𝐵1 ) 𝜌 + 𝑅max ( 𝐴2 ; 𝐵2 )𝜔 , (9.3.73)
where the inequality holds for all states 𝜌 𝐴1 𝐵1 and 𝜔 𝐴2 𝐵2 . Using the dual formulation
in (9.3.49) for 𝑅max , we find that the reverse inequality also holds, implying that
𝑅max is an additive entanglement measure.

Proposition 9.29 Additivity of Max-Rains Relative Entropy

Let 𝜌 𝐴1 𝐵1 and 𝜔 𝐴2 𝐵2 be quantum states. Then,
𝑅max ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 ) 𝜌⊗𝜔 = 𝑅max ( 𝐴1 ; 𝐵1 ) 𝜌 + 𝑅max ( 𝐴2 ; 𝐵2 )𝜔 . (9.3.74)


Proof: We prove the inequality reverse to the one in (9.3.73). To this end, we
employ the dual formulation of 𝑅max in (9.3.49). Let 𝑌 𝐴1 𝐵1 and 𝑆 𝐴2 𝐵2 be arbitrary
operators satisfying

\| \mathrm{T}_{B_1}[Y_{A_1B_1}] \|_\infty \leq 1, \quad Y_{A_1B_1} \geq 0,    (9.3.75)
\| \mathrm{T}_{B_2}[S_{A_2B_2}] \|_\infty \leq 1, \quad S_{A_2B_2} \geq 0.    (9.3.76)

Then it follows from multiplicativity of the Schatten ∞-norm under tensor products
(see (2.2.96)) that

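\| \mathrm{T}_{B_1B_2}[Y_{A_1B_1} \otimes S_{A_2B_2}] \|_\infty = \| \mathrm{T}_{B_1}[Y_{A_1B_1}] \otimes \mathrm{T}_{B_2}[S_{A_2B_2}] \|_\infty    (9.3.77)
= \| \mathrm{T}_{B_1}[Y_{A_1B_1}] \|_\infty \cdot \| \mathrm{T}_{B_2}[S_{A_2B_2}] \|_\infty    (9.3.78)
\leq 1.    (9.3.79)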

Furthermore, we have that 𝑌 𝐴1 𝐵1 ⊗ 𝑆 𝐴2 𝐵2 ≥ 0. So it follows that

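\log_2 \operatorname{Tr}[Y_{A_1B_1} \rho_{A_1B_1}] + \log_2 \operatorname{Tr}[S_{A_2B_2} \omega_{A_2B_2}]
= \log_2 \left( \operatorname{Tr}[Y_{A_1B_1} \rho_{A_1B_1}] \operatorname{Tr}[S_{A_2B_2} \omega_{A_2B_2}] \right)    (9.3.80)
= \log_2 \operatorname{Tr}[(Y_{A_1B_1} \otimes S_{A_2B_2})(\rho_{A_1B_1} \otimes \omega_{A_2B_2})]    (9.3.81)
\leq R_{\max}(A_1A_2;B_1B_2)_{\rho \otimes \omega}.    (9.3.82)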

The inequality follows because 𝑌 𝐴1 𝐵1 ⊗ 𝑆 𝐴2 𝐵2 is a particular operator satisfying the
constraints in (9.3.49) for 𝑅max ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 ) 𝜌⊗𝜔 . Since the inequality holds for all
𝑌 𝐴1 𝐵1 and 𝑆 𝐴2 𝐵2 satisfying (9.3.75)–(9.3.76), the superadditivity inequality

𝑅max ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 ) 𝜌⊗𝜔 ≥ 𝑅max ( 𝐴1 ; 𝐵1 ) 𝜌 + 𝑅max ( 𝐴2 ; 𝐵2 )𝜔 (9.3.83)

follows. ■
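
The key step in this proof, namely that a tensor product of dual-feasible operators is again dual feasible, can be checked numerically. The sketch below (Python with NumPy; the helper names `partial_transpose`, `random_psd`, and `op_norm` are ours, and the random operators are illustrative rather than optimal) verifies (9.3.77)–(9.3.79) for randomly generated operators. The product is stored in the ordering 𝐴1 𝐵1 𝐴2 𝐵2 rather than 𝐴1 𝐴2 𝐵1 𝐵2; the two orderings differ only by a permutation of tensor factors, which does not change the operator norm.

```python
import numpy as np

def partial_transpose(op, dims, sys):
    """Transpose tensor factor `sys` of an operator on a product space
    with local dimensions `dims`."""
    n = len(dims)
    axes = list(range(2 * n))
    axes[sys], axes[n + sys] = axes[n + sys], axes[sys]
    D = int(np.prod(dims))
    return op.reshape(*dims, *dims).transpose(axes).reshape(D, D)

def random_psd(dim, rng):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return g @ g.conj().T

def op_norm(m):
    return np.linalg.norm(m, 2)   # largest singular value (Schatten infinity-norm)

rng = np.random.default_rng(0)
dims1, dims2 = [2, 2], [2, 3]     # (A1, B1) and (A2, B2)

# Rescale random PSD operators so that they saturate (9.3.75) and (9.3.76).
Y = random_psd(4, rng)
Y = Y / op_norm(partial_transpose(Y, dims1, 1))
S = random_psd(6, rng)
S = S / op_norm(partial_transpose(S, dims2, 1))

# Product operator in the ordering A1 B1 A2 B2; transpose B1 (index 1) and B2 (index 3).
dims = dims1 + dims2
YS = np.kron(Y, S)
pt_YS = partial_transpose(partial_transpose(YS, dims, 1), dims, 3)

lhs = op_norm(pt_YS)
rhs = op_norm(partial_transpose(Y, dims1, 1)) * op_norm(partial_transpose(S, dims2, 1))
min_eig = np.linalg.eigvalsh(YS).min()
print(lhs, rhs, min_eig)
```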

We now consider the hypothesis testing Rains relative entropy, which is defined
as
R_H^{\varepsilon}(A;B)_\rho = \inf_{\sigma_{AB} \in \operatorname{PPT}'(A:B)} D_H^{\varepsilon}(\rho_{AB} \| \sigma_{AB}),    (9.3.84)

for 𝜀 ∈ [0, 1]. Recall the primal and dual formulations of 𝐷 𝜀𝐻 from Proposition 7.66.
Using the dual formulation, we obtain the following:


Proposition 9.30 SDP for Hypothesis Testing Rains Relative Entropy

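Let 𝜌 𝐴𝐵 be a bipartite state. Then, the hypothesis testing Rains relative entropy
can be written as
R_H^{\varepsilon}(A;B)_\rho = -\log_2 W_H^{\varepsilon}(A;B)_\rho,    (9.3.85)
for all 𝜀 ∈ [0, 1], where

W_H^{\varepsilon}(A;B)_\rho := \sup_{\mu \geq 0,\, Z_{AB}, K_{AB}, L_{AB} \geq 0} \{ \mu(1-\varepsilon) - \operatorname{Tr}[Z_{AB}] : \mu \rho_{AB} \leq \mathrm{T}_B(K_{AB} - L_{AB}) + Z_{AB},
    \mathrm{T}_B(K_{AB} - L_{AB}) \geq 0, \ \operatorname{Tr}[K_{AB} + L_{AB}] \leq 1 \}    (9.3.86)
= \inf_{M_{AB}, N_{AB} \geq 0} \{ \| \mathrm{T}_B(M_{AB} + N_{AB}) \|_\infty : \operatorname{Tr}[M_{AB} \rho_{AB}] \geq 1 - \varepsilon, \ M_{AB} \leq \mathbb{1}_{AB} \}.    (9.3.87)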

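Proof: Using the dual SDP formulation of the hypothesis testing relative entropy
from Proposition 7.66, we obtain

W_H^{\varepsilon}(A;B)_\rho = \sup_{\mu \geq 0,\, Z_{AB} \geq 0,\, \sigma_{AB} \in \operatorname{PPT}'(A:B)} \{ \mu(1-\varepsilon) - \operatorname{Tr}[Z_{AB}] : \mu \rho_{AB} \leq \sigma_{AB} + Z_{AB} \}.    (9.3.88)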

Now, combining with the characterization of PPT′ ( 𝐴 : 𝐵) from (9.3.44), we conclude

(9.3.86).
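The SDP for 𝑊^𝜀_𝐻 ( 𝐴; 𝐵) 𝜌 in (9.3.86) can be written in the standard form of
(2.4.3) as
\sup_{X \geq 0} \{ \operatorname{Tr}[A X] : \Phi(X) \leq B \},    (9.3.89)
where
X = \begin{pmatrix} \mu & 0 & 0 & 0 \\ 0 & Z_{AB} & 0 & 0 \\ 0 & 0 & K_{AB} & 0 \\ 0 & 0 & 0 & L_{AB} \end{pmatrix}, \quad A = \begin{pmatrix} 1-\varepsilon & 0 & 0 & 0 \\ 0 & -\mathbb{1}_{AB} & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},    (9.3.90)
\Phi(X) = \begin{pmatrix} \mu \rho_{AB} - \mathrm{T}_B(K_{AB} - L_{AB}) - Z_{AB} & 0 & 0 \\ 0 & -\mathrm{T}_B(K_{AB} - L_{AB}) & 0 \\ 0 & 0 & \operatorname{Tr}[K_{AB} + L_{AB}] \end{pmatrix},    (9.3.91)
B = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.    (9.3.92)
Setting
Y = \begin{pmatrix} M_{AB} & 0 & 0 \\ 0 & N_{AB} & 0 \\ 0 & 0 & \lambda \end{pmatrix},    (9.3.93)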
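we compute the dual map Φ† as follows:
\operatorname{Tr}[Y \Phi(X)]
= \operatorname{Tr}[M_{AB} (\mu \rho_{AB} - \mathrm{T}_B(K_{AB} - L_{AB}) - Z_{AB})] - \operatorname{Tr}[N_{AB} \mathrm{T}_B(K_{AB} - L_{AB})] + \lambda \operatorname{Tr}[K_{AB} + L_{AB}]    (9.3.94)
= \mu \operatorname{Tr}[M_{AB} \rho_{AB}] - \operatorname{Tr}[M_{AB} Z_{AB}] + \operatorname{Tr}[(\lambda \mathbb{1}_{AB} - \mathrm{T}_B(M_{AB} + N_{AB})) K_{AB}]
    + \operatorname{Tr}[(\lambda \mathbb{1}_{AB} + \mathrm{T}_B(M_{AB} + N_{AB})) L_{AB}],    (9.3.95)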
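which implies that
\Phi^\dagger(Y) = \begin{pmatrix} \operatorname{Tr}[M_{AB} \rho_{AB}] & 0 & 0 & 0 \\ 0 & -M_{AB} & 0 & 0 \\ 0 & 0 & \lambda \mathbb{1}_{AB} - \mathrm{T}_B(M_{AB} + N_{AB}) & 0 \\ 0 & 0 & 0 & \lambda \mathbb{1}_{AB} + \mathrm{T}_B(M_{AB} + N_{AB}) \end{pmatrix}.    (9.3.96)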
Then using the standard form of the dual program in (2.4.4), i.e.,
\inf_{Y \geq 0} \{ \operatorname{Tr}[B Y] : \Phi^\dagger(Y) \geq A \},    (9.3.97)
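we find that the dual SDP is given by
\inf_{\lambda, M_{AB}, N_{AB} \geq 0} \{ \lambda : \operatorname{Tr}[M_{AB} \rho_{AB}] \geq 1 - \varepsilon, \ M_{AB} \leq \mathbb{1}_{AB}, \ \lambda \mathbb{1}_{AB} \pm \mathrm{T}_B(M_{AB} + N_{AB}) \geq 0 \}.    (9.3.98)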
This can alternatively be written as
\inf_{M_{AB}, N_{AB} \geq 0} \{ \| \mathrm{T}_B(M_{AB} + N_{AB}) \|_\infty : \operatorname{Tr}[M_{AB} \rho_{AB}] \geq 1 - \varepsilon, \ M_{AB} \leq \mathbb{1}_{AB} \}.    (9.3.99)

Finally, we verify that strong duality holds, by applying Theorem 2.28. A

strictly feasible choice for the primal variables is 𝜇 = 1, 𝑍 𝐴𝐵 = 1 𝐴𝐵 , 𝐾 𝐴𝐵 = 𝜋 𝐴𝐵 /2,
𝐿 𝐴𝐵 = 𝜋 𝐴𝐵 /3. A feasible choice for the dual variables is 𝑀 𝐴𝐵 = 1 𝐴𝐵 and
𝑁 𝐴𝐵 = 1 𝐴𝐵 . ■

9.4 Squashed Entanglement

In this section, we investigate the properties of the squashed entanglement, which
we introduced as an entanglement measure in Section 9.1.1. We first recall the
definition of squashed entanglement from (9.1.162).

Definition 9.31 Squashed Entanglement

Let 𝜌 𝐴𝐵 be a bipartite state. Then, the squashed entanglement is defined as
E_{\mathrm{sq}}(A;B)_\rho := \frac{1}{2} \inf_{\omega_{ABE}} \{ I(A;B|E)_\omega : \operatorname{Tr}_E[\omega_{ABE}] = \rho_{AB} \},    (9.4.1)
where the quantum conditional mutual information 𝐼 ( 𝐴; 𝐵|𝐸)𝜔 is defined in
(7.1.11) as

𝐼 ( 𝐴; 𝐵|𝐸)𝜔 = 𝐻 ( 𝐴|𝐸)𝜔 + 𝐻 (𝐵|𝐸)𝜔 − 𝐻 ( 𝐴𝐵|𝐸)𝜔 (9.4.2)

= 𝐻 ( 𝐴𝐸)𝜔 − 𝐻 (𝐸)𝜔 + 𝐻 (𝐵𝐸)𝜔 − 𝐻 ( 𝐴𝐵𝐸)𝜔 (9.4.3)
= 𝐻 (𝐵|𝐸)𝜔 − 𝐻 (𝐵| 𝐴𝐸)𝜔 (9.4.4)

The optimization in the definition of squashed entanglement is with respect to

all extensions 𝜔 𝐴𝐵𝐸 of 𝜌 𝐴𝐵 , with the extension system 𝐸 having arbitrarily large,
yet finite dimension, which means that the infimum cannot in general be replaced
by a minimum. The fact that the extension system can have arbitrarily large, yet
finite dimension also means that computing the squashed entanglement is in general
difficult; however, we can always place an upper bound on it by calculating the
quantum conditional mutual information of a specific extension.
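
To make the last remark concrete, here is a minimal sketch (Python with NumPy; the helpers `entropy`, `reduced`, `cmi`, `ket`, and `proj` are our own illustrative names) that evaluates half the conditional mutual information of two specific extensions of the classically correlated state (|00⟩⟨00| + |11⟩⟨11|)/2, using the entropy expansion (9.4.3). Each value is an upper bound on the squashed entanglement; the second extension, in which 𝐸 holds a copy of the shared bit, shows how a well-chosen extension can reduce the bound.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

def reduced(rho, dims, keep):
    """Reduced state on the subsystems whose indices are listed in `keep`."""
    t = rho.reshape(*dims, *dims)
    # Trace out unwanted factors from the highest index down, so that the
    # remaining axis numbers stay valid.
    for s in sorted(set(range(len(dims))) - set(keep), reverse=True):
        t = np.trace(t, axis1=s, axis2=s + t.ndim // 2)
    d = int(np.prod([dims[k] for k in keep]))
    return t.reshape(d, d)

def cmi(omega, dims):
    """I(A;B|E) for a state on systems ordered (A, B, E), via (9.4.3)."""
    return (entropy(reduced(omega, dims, [0, 2])) - entropy(reduced(omega, dims, [2]))
            + entropy(reduced(omega, dims, [1, 2])) - entropy(omega))

def ket(bits):
    v = np.zeros(2 ** len(bits))
    v[int(bits, 2)] = 1.0
    return v

def proj(bits):
    return np.outer(ket(bits), ket(bits))

# rho_AB = (|00><00| + |11><11|)/2, a separable, classically correlated state.
rho_AB = 0.5 * (proj("00") + proj("11"))

# Trivial extension: E is one-dimensional, so I(A;B|E) = I(A;B).
bound_trivial = 0.5 * cmi(rho_AB, [2, 2, 1])

# Extension in which E holds a copy of the shared classical bit.
omega_ABE = 0.5 * (proj("000") + proj("111"))
bound_copy = 0.5 * cmi(omega_ABE, [2, 2, 2])
print(bound_trivial, bound_copy)
```

The trivial extension gives the bound 1/2, whereas the copying extension gives 0, consistent with the squashed entanglement vanishing on separable states.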
To understand why the quantity in (9.4.1) is called squashed entanglement,
it is helpful to think in terms of a cryptographic scenario. This cryptographic
perspective turns out to be useful in the context of secret key agreement, which we
consider in Chapters 15 and 20. Consider three parties, the protagonists Alice and
Bob, as well as an eavesdropper. Alice and Bob possess the quantum systems 𝐴 and
𝐵, respectively, and the eavesdropper possesses a system 𝐸, such that the global state
𝜔 𝐴𝐵𝐸 is consistent with the reduced state 𝜌 𝐴𝐵 of Alice and Bob. The conditional
mutual information 𝐼 ( 𝐴; 𝐵|𝐸)𝜔 corresponds to the correlations between Alice
and Bob from the perspective of the eavesdropper. The squashed entanglement
𝐸 sq ( 𝐴; 𝐵) 𝜌 is then an optimization over all possible global states 𝜔 𝐴𝐵𝐸 such that

Tr𝐸 [𝜔 𝐴𝐵𝐸 ] = 𝜌 𝐴𝐵 , and this optimization corresponds to the worst possible scenario
in which the eavesdropper attempts to “squash down” the correlations of Alice and
Bob, i.e., to reduce the value of 𝐼 ( 𝐴; 𝐵|𝐸)𝜔 as much as possible. This cryptographic
perspective actually allows us to write the squashed entanglement in an alternative
way, which we do in Proposition 9.37 below.
We begin by establishing some basic properties of squashed entanglement. As
we will see, the squashed entanglement possesses all of the desired properties of an
entanglement measure stated at the beginning of Section 9.1.

Proposition 9.32 Properties of Squashed Entanglement

The squashed entanglement 𝐸 sq ( 𝐴; 𝐵) 𝜌 has the following properties.
1. Non-negativity: For every bipartite state 𝜌 𝐴𝐵 ,

𝐸 sq ( 𝐴; 𝐵) 𝜌 ≥ 0. (9.4.5)

2. Faithfulness: We have 𝐸 sq ( 𝐴; 𝐵)𝜎 = 0 if and only if 𝜎𝐴𝐵 is a separable

state.
3. Convexity: Let X be a finite alphabet, 𝑝 : X → [0, 1] a probability
distribution on X, and {𝜌 𝑥𝐴𝐵 }𝑥∈X a set of states. Then,
E_{\mathrm{sq}}(A;B)_\rho \leq \sum_{x \in \mathcal{X}} p(x) E_{\mathrm{sq}}(A;B)_{\rho^x},    (9.4.6)
where 𝜌 𝐴𝐵 = ∑_{𝑥∈X} 𝑝(𝑥) 𝜌 𝑥𝐴𝐵 .
4. Monogamy: For every state 𝜌 𝐴1 𝐴2 𝐵1 𝐵2 , the overall squashed entanglement
is not smaller than the sum of the individual squashed entanglements:

𝐸 sq ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 ) 𝜌 ≥ 𝐸 sq ( 𝐴1 ; 𝐵1 ) 𝜌 + 𝐸 sq ( 𝐴1 ; 𝐵2 ) 𝜌 + 𝐸 sq ( 𝐴2 ; 𝐵1 ) 𝜌
+ 𝐸 sq ( 𝐴2 ; 𝐵2 ) 𝜌 . (9.4.7)

5. Additivity: For a tensor-product state 𝜎𝐴1 𝐴2 𝐵1 𝐵2 = 𝜔 𝐴1 𝐵1 ⊗ 𝜏𝐴2 𝐵2 , the

following additivity identity holds:

𝐸 sq ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 )𝜎 = 𝐸 sq ( 𝐴1 ; 𝐵1 )𝜔 + 𝐸 sq ( 𝐴2 ; 𝐵2 )𝜏 . (9.4.8)


Proof:
1. This follows immediately from the fact that the conditional mutual information
of an arbitrary state is non-negative (Theorem 7.6).
2. The statement “if 𝜎𝐴𝐵 is a separable state, then 𝐸 sq ( 𝐴; 𝐵)𝜎 = 0” follows from
the line of reasoning in (9.1.157)–(9.1.159) used to motivate the definition of
squashed entanglement. For a proof of the converse statement, please consult
the Bibliographic Notes in Section 9.6.
3. Let 𝜔𝑥𝐴𝐵𝐸 denote an arbitrary extension of 𝜌 𝑥𝐴𝐵 . Then

\omega_{ABEX} := \sum_{x \in \mathcal{X}} p(x) \omega^x_{ABE} \otimes |x\rangle\langle x|_X    (9.4.9)

is a particular extension of 𝜌 𝐴𝐵 . It follows that

2 \cdot E_{\mathrm{sq}}(A;B)_\rho \leq I(A;B|EX)_\omega    (9.4.10)
= \sum_{x \in \mathcal{X}} p(x) I(A;B|E)_{\omega^x},    (9.4.11)

where to obtain the equality we used the direct-sum property of conditional
mutual information (see Proposition 7.9). Since the inequality holds for
arbitrary extensions of 𝜌 𝑥𝐴𝐵 , we conclude the desired inequality.
4. To see the inequality in (9.4.7), let 𝜔 𝐴1 𝐴2 𝐵1 𝐵2 𝐸 be an arbitrary extension of
𝜌 𝐴1 𝐴2 𝐵1 𝐵2 . Then by two applications of the chain rule for conditional mutual
information, we find that
I(A_1A_2;B_1B_2|E)_\omega
= I(A_1;B_1B_2|E)_\omega + I(A_2;B_1B_2|EA_1)_\omega    (9.4.12)
= I(A_1;B_1|E)_\omega + I(A_1;B_2|EB_1)_\omega + I(A_2;B_2|EA_1)_\omega + I(A_2;B_1|EA_1B_2)_\omega    (9.4.13)
\geq 2 \left( E_{\mathrm{sq}}(A_1;B_1)_\rho + E_{\mathrm{sq}}(A_1;B_2)_\rho + E_{\mathrm{sq}}(A_2;B_1)_\rho + E_{\mathrm{sq}}(A_2;B_2)_\rho \right).    (9.4.14)
The inequality follows because 𝜔 𝐴1 𝐵1 𝐸 is a particular extension of the reduced
state 𝜌 𝐴1 𝐵1 , the state 𝜔 𝐴1 𝐵1 𝐵2 𝐸 is a particular extension of the reduced state
𝜌 𝐴1 𝐵2 , the state 𝜔 𝐴1 𝐴2 𝐵2 𝐸 is a particular extension of the reduced state 𝜌 𝐴2 𝐵2 ,
and 𝜔 𝐴1 𝐴2 𝐵1 𝐵2 𝐸 is a particular extension of the reduced state 𝜌 𝐴2 𝐵1 . Since the
extension 𝜔 𝐴1 𝐴2 𝐵1 𝐵2 𝐸 is arbitrary, optimizing over all such extensions on the
left-hand side of the inequality above gives (9.4.7).

5. To see the equality in (9.4.8), first consider that for a tensor-product state
𝜔 𝐴1 𝐵1 ⊗ 𝜏𝐴2 𝐵2 , the reduced state on systems 𝐴1 𝐵2 is the product state 𝜔 𝐴1 ⊗ 𝜏𝐵2 ,
and the reduced state on systems 𝐴2 𝐵1 is the product state 𝜏𝐴2 ⊗ 𝜔 𝐵1 . Thus,
faithfulness implies that 𝐸 sq ( 𝐴1 ; 𝐵2 )𝜎 = 𝐸 sq ( 𝐴2 ; 𝐵1 )𝜎 = 0, and then the
monogamy inequality in (9.4.7) implies that

𝐸 sq ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 )𝜎 ≥ 𝐸 sq ( 𝐴1 ; 𝐵1 )𝜔 + 𝐸 sq ( 𝐴2 ; 𝐵2 )𝜏 . (9.4.15)

Now let 𝜔 𝐴1 𝐵1 𝐸1 be an extension of 𝜔 𝐴1 𝐵1 , and let 𝜏𝐴2 𝐵2 𝐸2 be an extension of

𝜏𝐴2 𝐵2 . Then 𝜔 𝐴1 𝐵1 𝐸1 ⊗ 𝜏𝐴2 𝐵2 𝐸2 is an extension of 𝜔 𝐴1 𝐵1 ⊗ 𝜏𝐴2 𝐵2 . We then have
that

2 · 𝐸 sq ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 )𝜎 ≤ 𝐼 ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 |𝐸 1 𝐸 2 )𝜔⊗𝜏 (9.4.16)
= 𝐼 ( 𝐴1 ; 𝐵1 |𝐸 1 )𝜔 + 𝐼 ( 𝐴2 ; 𝐵2 |𝐸 2 )𝜏 , (9.4.17)

where the equality follows from the additivity of conditional mutual information
with respect to tensor-product states (Proposition 7.9). Since the extensions
𝜔 𝐴1 𝐵1 𝐸1 and 𝜏𝐴2 𝐵2 𝐸2 are arbitrary, the following inequality holds:

𝐸 sq ( 𝐴1 𝐴2 ; 𝐵1 𝐵2 )𝜎 ≤ 𝐸 sq ( 𝐴1 ; 𝐵1 )𝜔 + 𝐸 sq ( 𝐴2 ; 𝐵2 )𝜏 . (9.4.18)

We then conclude (9.4.8) by combining (9.4.15) and (9.4.18). ■
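
The chain-rule decomposition in (9.4.12)–(9.4.13), on which the monogamy inequality rests, can be sanity-checked numerically. The following sketch (Python with NumPy; `entropy`, `reduced`, and `cmi` are our own helper names, and the five-qubit random state is an arbitrary test case) compares the left-hand side with the sum of the four terms and confirms that each term is non-negative, as required by strong subadditivity.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

def reduced(rho, dims, keep):
    """Reduced state on the subsystems whose indices are listed in `keep`."""
    t = rho.reshape(*dims, *dims)
    for s in sorted(set(range(len(dims))) - set(keep), reverse=True):
        t = np.trace(t, axis1=s, axis2=s + t.ndim // 2)
    d = int(np.prod([dims[k] for k in keep]))
    return t.reshape(d, d)

def cmi(rho, dims, X, Y, Z):
    """I(X;Y|Z) = H(XZ) + H(YZ) - H(Z) - H(XYZ) for lists of subsystem indices."""
    H = lambda keep: entropy(reduced(rho, dims, sorted(keep)))
    return H(X + Z) + H(Y + Z) - H(Z) - H(X + Y + Z)

rng = np.random.default_rng(1)
dims = [2] * 5                       # systems A1, A2, B1, B2, E
A1, A2, B1, B2, E = range(5)

# Random full-rank density operator on five qubits.
g = rng.normal(size=(32, 32)) + 1j * rng.normal(size=(32, 32))
omega = g @ g.conj().T
omega = omega / np.trace(omega).real

lhs = cmi(omega, dims, [A1, A2], [B1, B2], [E])
terms = [
    cmi(omega, dims, [A1], [B1], [E]),
    cmi(omega, dims, [A1], [B2], [E, B1]),
    cmi(omega, dims, [A2], [B2], [E, A1]),
    cmi(omega, dims, [A2], [B1], [E, A1, B2]),
]
print(lhs, sum(terms))
```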

Let us now prove that the squashed entanglement is indeed an entanglement

measure, as per Definition 9.1.

Theorem 9.33 Selective LOCC Monotonicity of Squashed Entanglement

The squashed entanglement is a selective LOCC monotone, so that (9.1.14)
holds with 𝐸 set to 𝐸 sq .

Proof: We prove that the conditions of Lemma 9.2 hold and then we apply it. The
first part of the proof follows from the fact that conditional mutual information does
not increase under the action of local channels (see Proposition 7.9): for a tripartite
state 𝜉 𝐴𝐵𝐸 and local channels N 𝐴→𝐴′ and M𝐵→𝐵′ ,

𝐼 ( 𝐴; 𝐵|𝐸)𝜉 ≥ 𝐼 ( 𝐴′; 𝐵′ |𝐸) 𝜁 , (9.4.19)

where 𝜁 𝐴′ 𝐵′ 𝐸 ≔ (N 𝐴→𝐴′ ⊗ M𝐵→𝐵′ )(𝜉 𝐴𝐵𝐸 ). After incorporating optimizations, it

then follows immediately that squashed entanglement does not increase under the

action of local channels:

𝐸 sq ( 𝐴; 𝐵) 𝜌 ≥ 𝐸 sq ( 𝐴′; 𝐵′) 𝜅 , (9.4.20)

where 𝜅 𝐴′ 𝐵′ ≔ (N 𝐴→𝐴′ ⊗ M𝐵→𝐵′ )(𝜌 𝐴𝐵 ).

Also, the squashed entanglement is invariant under classical communication,
which follows from Lemma 9.34 below. Now applying Lemma 9.2, we conclude
the statement of the theorem. ■

Lemma 9.34 Invariance of Squashed Entanglement Under Classical Communication

Let 𝜌 𝑋 𝐴𝐵 be a classical–quantum state of the following form:
\rho_{XAB} := \sum_{x \in \mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho^x_{AB},    (9.4.21)
where X is a finite alphabet, 𝑝 : X → [0, 1] is a probability distribution, and
{𝜌 𝑥𝐴𝐵 }𝑥∈X is a set of states. Then
E_{\mathrm{sq}}(XA;B)_\rho = E_{\mathrm{sq}}(A;BX)_\rho = \sum_{x \in \mathcal{X}} p(x) E_{\mathrm{sq}}(A;B)_{\rho^x}.    (9.4.22)

Proof: Note that discarding or appending a classical state |𝑥⟩⟨𝑥| 𝑋 𝐴 is a local

channel, so that 𝐸 sq ( 𝐴; 𝐵) 𝜌 𝑥 = 𝐸 sq (𝑋 𝐴; 𝐵) |𝑥⟩⟨𝑥|⊗𝜌 𝑥 . In Proposition 9.32, we proved
that the squashed entanglement is a convex function. Using this fact, it follows that
\sum_{x \in \mathcal{X}} p(x) E_{\mathrm{sq}}(A;B)_{\rho^x} = \sum_{x \in \mathcal{X}} p(x) E_{\mathrm{sq}}(XA;B)_{|x\rangle\langle x| \otimes \rho^x} \geq E_{\mathrm{sq}}(XA;B)_\rho.    (9.4.23)

Similarly, we have that

\sum_{x \in \mathcal{X}} p(x) E_{\mathrm{sq}}(A;B)_{\rho^x} \geq E_{\mathrm{sq}}(A;BX)_\rho.    (9.4.24)

So it suffices to establish the following inequality:

\sum_{x \in \mathcal{X}} p(x) E_{\mathrm{sq}}(A;B)_{\rho^x} \leq \min\{ E_{\mathrm{sq}}(XA;B)_\rho, E_{\mathrm{sq}}(A;BX)_\rho \}.    (9.4.25)


To this end, let 𝜔 𝑋 𝐴𝐵𝐸 be an arbitrary extension of 𝜌 𝑋 𝐴𝐵 . After the action of a local
completely dephasing channel Δ 𝑋 (·) ≔ ∑_{𝑥∈X} |𝑥⟩⟨𝑥| 𝑋 (·)|𝑥⟩⟨𝑥| 𝑋 , it follows that the
state 𝜃 𝑋 𝐴𝐵𝐸 ≔ Δ 𝑋 (𝜔 𝑋 𝐴𝐵𝐸 ) has the following form:
\theta_{XABE} = \sum_{x \in \mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \theta^x_{ABE},    (9.4.26)

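where 𝜃 𝑥𝐴𝐵𝐸 is an extension of 𝜌 𝑥𝐴𝐵 for each 𝑥 ∈ X. To see this, let 𝜙𝑥𝐴𝐵𝑅 be a purification of 𝜌 𝑥𝐴𝐵 for each 𝑥, so that the pure state 𝜑 𝑋 𝑋𝐸 𝐴𝐵𝑅 with state vector

|\varphi\rangle_{X X_E A B R} = \sum_{x \in \mathcal{X}} \sqrt{p(x)} \, |x\rangle_X |x\rangle_{X_E} |\phi^x\rangle_{ABR}    (9.4.27)

is a purification of 𝜌 𝑋 𝐴𝐵 with purifying systems 𝑋𝐸 𝑅. By applying Proposition 4.4,
we conclude that the extension 𝜔 𝑋 𝐴𝐵𝐸 can be realized by the action of a quantum
channel N 𝑋𝐸 𝑅→𝐸 on systems 𝑋𝐸 𝑅 of 𝜑 𝑋 𝑋𝐸 𝐴𝐵𝑅 :

\omega_{XABE} = \mathcal{N}_{X_E R \to E}(\varphi_{X X_E A B R}).    (9.4.28)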

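The conclusion stated in (9.4.26) then follows because

\theta_{XABE} = (\Delta_X \otimes \mathcal{N}_{X_E R \to E})(\varphi_{X X_E A B R}) = \sum_{x \in \mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \theta^x_{ABE},    (9.4.29)

where 𝜃 𝑥𝐴𝐵𝐸 ≔ N 𝑋𝐸 𝑅→𝐸 (|𝑥⟩⟨𝑥| 𝑋𝐸 ⊗ 𝜙𝑥𝐴𝐵𝑅 ). We then find that

2 \sum_{x \in \mathcal{X}} p(x) E_{\mathrm{sq}}(A;B)_{\rho^x} \leq \sum_{x \in \mathcal{X}} p(x) I(A;B|E)_{\theta^x}    (9.4.30)
= I(A;B|EX)_\theta    (9.4.31)
\leq I(A;B|EX)_\theta + I(X;B|E)_\theta    (9.4.32)
= I(XA;B|E)_\theta    (9.4.33)
\leq I(XA;B|E)_\omega.    (9.4.34)

The first inequality holds because 𝜃 𝑥𝐴𝐵𝐸 is a particular extension of 𝜌 𝑥𝐴𝐵 , and the first
equality follows from the direct-sum property of conditional mutual information
(see Proposition 7.9). The last equality follows from the chain rule for conditional mutual information and
the second-to-last inequality from non-negativity of conditional mutual information:
𝐼 (𝑋; 𝐵|𝐸) 𝜃 ≥ 0. The final inequality holds because the conditional mutual
information does not increase under the action of a local channel on system 𝑋 (see
Proposition 7.9). Since the inequality holds for an arbitrary extension of 𝜌 𝑋 𝐴𝐵 , we
conclude that
\sum_{x \in \mathcal{X}} p(x) E_{\mathrm{sq}}(A;B)_{\rho^x} \leq E_{\mathrm{sq}}(XA;B)_\rho.    (9.4.35)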
𝑥∈X


A proof for the other inequality

\sum_{x \in \mathcal{X}} p(x) E_{\mathrm{sq}}(A;B)_{\rho^x} \leq E_{\mathrm{sq}}(A;BX)_\rho    (9.4.36)

follows similarly. ■

In Section 9.1.1, we introduced the entanglement of formation as an entanglement
measure. The following inequality relates the entanglement of formation to
the squashed entanglement:

Proposition 9.35 Squashed Entanglement and Entanglement of Formation

The entanglement of formation is never smaller than the squashed entanglement:

𝐸 sq ( 𝐴; 𝐵) 𝜌 ≤ 𝐸 𝐹 ( 𝐴; 𝐵) 𝜌 , (9.4.37)

for every bipartite state 𝜌 𝐴𝐵 .

Proof: We already alluded to this inequality when introducing squashed
entanglement in Section 9.1.1.4. If the extension 𝜔 𝐴𝐵𝐸 is restricted to be of the
form
\omega_{ABE} = \sum_{x} p(x) \phi^x_{AB} \otimes |x\rangle\langle x|,    (9.4.38)

where 𝑝(𝑥) is a probability distribution and {𝜙𝑥𝐴𝐵 }𝑥 is a set of pure states satisfying
𝜌 𝐴𝐵 = ∑_𝑥 𝑝(𝑥)𝜙𝑥𝐴𝐵 , then it follows that

E_{\mathrm{sq}}(A;B)_\rho \leq \frac{1}{2} I(A;B|X)_\omega = H(A|X)_\omega.    (9.4.39)
Since the inequality holds for all such extensions, the inequality in (9.4.37) follows
by applying the definition of entanglement of formation in (9.1.42). ■

For pure bipartite states, the entanglement of formation is simply the entropy
of the reduced state of one of the subsystems. It turns out that the squashed
entanglement reduces to the same quantity for pure bipartite states.


Proposition 9.36 Squashed Entanglement for Pure States

Let 𝜓 𝐴𝐵 be a pure bipartite state. Then, the squashed entanglement of 𝜓 𝐴𝐵 is
equal to the entropy of its reduced state on 𝐴:

𝐸 sq ( 𝐴; 𝐵)𝜓 = 𝐻 ( 𝐴)𝜓 . (9.4.40)

Proof: Every extension 𝜔 𝐴𝐵𝐸 of the pure bipartite state 𝜓 𝐴𝐵 is a product state of
the form 𝜔 𝐴𝐵𝐸 = 𝜓 𝐴𝐵 ⊗ 𝜌 𝐸 for some state 𝜌 𝐸 . By Proposition 7.9, it follows that

𝐼 ( 𝐴; 𝐵|𝐸)𝜓⊗𝜌 = 𝐼 ( 𝐴; 𝐵)𝜓 = 2𝐻 ( 𝐴)𝜓 , (9.4.41)

where the second equality holds because 𝐻 ( 𝐴𝐵)𝜓 = 0 and 𝐻 ( 𝐴)𝜓 = 𝐻 (𝐵)𝜓 for a
pure bipartite state. We thus have 𝐸 sq ( 𝐴; 𝐵)𝜓 = 𝐻 ( 𝐴)𝜓 . ■

An immediate consequence of Proposition 9.36 is that, for the maximally entangled
state |Φ⟩ 𝐴𝐵 = (1/√𝑑) ∑_{𝑖=0}^{𝑑−1} |𝑖, 𝑖⟩ 𝐴𝐵 of Schmidt rank 𝑑, the squashed entanglement
is equal to log2 𝑑:
E_{\mathrm{sq}}(A;B)_\Phi = \log_2 d.    (9.4.42)
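
Proposition 9.36 reduces the squashed entanglement of a pure state to an entropy computation, which is easy to carry out from the Schmidt coefficients. The sketch below (Python with NumPy; the function name `pure_state_squashed_entanglement` is our own) evaluates 𝐻(𝐴) for a maximally entangled state with 𝑑 = 3, chosen here only as an example.

```python
import numpy as np

def pure_state_squashed_entanglement(psi, dA, dB):
    """H(A) in bits for a pure state vector psi on C^dA (x) C^dB, which by
    Proposition 9.36 equals its squashed entanglement."""
    # Singular values of the dA x dB coefficient matrix are the Schmidt coefficients.
    s = np.linalg.svd(psi.reshape(dA, dB), compute_uv=False)
    p = s ** 2
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

d = 3
phi = np.eye(d).reshape(d * d) / np.sqrt(d)   # (1/sqrt(d)) * sum_i |i,i>
e_sq = pure_state_squashed_entanglement(phi, d, d)
print(e_sq, np.log2(d))
```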

Let us now return to the discussion after Definition 9.31 on the cryptographic
interpretation of squashed entanglement. We stated that the quantum conditional
mutual information 𝐼 ( 𝐴; 𝐵|𝐸)𝜔 can be interpreted as the amount of correlations
between Alice (𝐴) and Bob (𝐵) from the point of view of an eavesdropper (𝐸),
where 𝜔 𝐴𝐵𝐸 is the joint state shared by all three parties. If the eavesdropper wants
to reduce, or “squash” these correlations as much as possible, then we optimize
with respect to every state 𝜔 𝐴𝐵𝐸 that is consistent with the state 𝜌 𝐴𝐵 shared by
Alice and Bob, leading to the squashed entanglement 𝐸 sq ( 𝐴; 𝐵) 𝜌 . Now, recall
Proposition 4.4, which states that for every extension 𝜔 𝐴𝐵𝐸 of a given state 𝜌 𝐴𝐵
there exists a quantum channel S𝐸 ′ →𝐸 such that S𝐸 ′ →𝐸 (𝜓^𝜌_{𝐴𝐵𝐸′}) = 𝜔 𝐴𝐵𝐸 , where
𝜓^𝜌_{𝐴𝐵𝐸′} is a purification of 𝜌 𝐴𝐵 . We therefore immediately have the following.

Proposition 9.37 Other Representations of Squashed Entanglement

Let 𝜌 𝐴𝐵 be a bipartite quantum state, and let 𝜓^𝜌_{𝐴𝐵𝐸′} be a purification of it. Then
the squashed entanglement 𝐸 sq ( 𝐴; 𝐵) 𝜌 can be written as

E_{\mathrm{sq}}(A;B)_\rho = \frac{1}{2} \inf_{\mathcal{S}_{E' \to E}} \{ I(A;B|E)_\omega : \omega_{ABE} = \mathcal{S}_{E' \to E}(\psi^\rho_{ABE'}) \},    (9.4.43)

where the infimum is with respect to every quantum channel S𝐸 ′ →𝐸 . Another
representation of squashed entanglement is

E_{\mathrm{sq}}(A;B)_\rho = \frac{1}{2} \inf_{\mathcal{V}_{E' \to EF}} \{ H(B|E)_\theta + H(B|F)_\theta : \theta_{BEF} := \operatorname{Tr}_A[\mathcal{V}_{E' \to EF}(\psi^\rho_{ABE'})] \},    (9.4.44)

where the infimum is with respect to every isometric channel V𝐸 ′ →𝐸 𝐹 .

The act of “squashing” the correlations between Alice and Bob can thus be
thought of explicitly in terms of an eavesdropper applying the channel S𝐸 ′ →𝐸 to
their purifying system 𝐸 ′ of 𝜌 𝐴𝐵 . For this reason, we call S𝐸 ′ →𝐸 a squashing
channel.

Proof: The equality (9.4.43) follows immediately from Proposition 4.4.

In order to establish (9.4.44), let S𝐸 ′ →𝐸 be an arbitrary squashing channel, and let
V𝐸 ′ →𝐸 𝐹 be an isometric extension of S𝐸 ′ →𝐸 , so that S𝐸 ′ →𝐸 = Tr𝐹 ◦ V𝐸 ′ →𝐸 𝐹 , where
the system 𝐹 satisfies 𝑑 𝐹 ≥ rank(Γ^S_{𝐸𝐸′}). Consider that 𝜃 𝐴𝐵𝐸 𝐹 ≔ V𝐸 ′ →𝐸 𝐹 (𝜓^𝜌_{𝐴𝐵𝐸′})
is a purification of both 𝜔 𝐴𝐵𝐸 and 𝜃 𝐵𝐸 𝐹 , i.e., 𝜔 𝐴𝐵𝐸 = S𝐸 ′ →𝐸 (𝜓^𝜌_{𝐴𝐵𝐸′}) = Tr𝐹 [𝜃 𝐴𝐵𝐸 𝐹 ]
and 𝜃 𝐵𝐸 𝐹 = Tr 𝐴 [𝜃 𝐴𝐵𝐸 𝐹 ]. Then, using (9.4.4), it follows that

𝐼 ( 𝐴; 𝐵|𝐸)𝜔 = 𝐻 (𝐵|𝐸)𝜔 − 𝐻 (𝐵| 𝐴𝐸)𝜔 (9.4.45)

= 𝐻 (𝐵|𝐸)𝜃 − 𝐻 (𝐵| 𝐴𝐸)𝜃 . (9.4.46)

Now, since 𝜃 𝐴𝐵𝐸 𝐹 is a pure state, we have from duality of conditional entropy that

−𝐻 (𝐵| 𝐴𝐸)𝜃 = 𝐻 (𝐵|𝐹)𝜃 . (9.4.47)

Therefore,
𝐼 ( 𝐴; 𝐵|𝐸)𝜔 = 𝐻 (𝐵|𝐸)𝜃 + 𝐻 (𝐵|𝐹)𝜃 . (9.4.48)
We conclude that (9.4.44) holds because the squashing channel S𝐸 ′ →𝐸 is arbitrary
in the development above. ■
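
The two identities used in this proof, namely the duality of conditional entropy (9.4.47) and the resulting expression (9.4.48), hold for any pure state on 𝐴𝐵𝐸𝐹 and can therefore be checked on a random example. In the sketch below (Python with NumPy; `entropy` and `reduced` are our own helper names), a random four-qubit pure state stands in for the output of an isometric extension of a squashing channel; this stand-in is an assumption made purely for illustration.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

def reduced(rho, dims, keep):
    """Reduced state on the subsystems whose indices are listed in `keep`."""
    t = rho.reshape(*dims, *dims)
    for s in sorted(set(range(len(dims))) - set(keep), reverse=True):
        t = np.trace(t, axis1=s, axis2=s + t.ndim // 2)
    d = int(np.prod([dims[k] for k in keep]))
    return t.reshape(d, d)

rng = np.random.default_rng(2)
dims = [2, 2, 2, 2]                  # systems A, B, E, F
A, B, E, F = range(4)

# Random pure state theta_{ABEF}.
psi = rng.normal(size=16) + 1j * rng.normal(size=16)
psi = psi / np.linalg.norm(psi)
theta = np.outer(psi, psi.conj())

H = lambda keep: entropy(reduced(theta, dims, keep))

H_B_given_E = H([B, E]) - H([E])
H_B_given_F = H([B, F]) - H([F])
H_B_given_AE = H([A, B, E]) - H([A, E])
cmi_ABE = H([A, E]) + H([B, E]) - H([E]) - H([A, B, E])   # I(A;B|E), as in (9.4.3)
print(cmi_ABE, H_B_given_E + H_B_given_F)
```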


We now establish an explicit uniform continuity bound for the squashed

entanglement:

Proposition 9.38 Uniform Continuity of Squashed Entanglement

Let 𝜌 𝐴𝐵 and 𝜎𝐴𝐵 be bipartite states such that the following fidelity bound holds

𝐹 (𝜌 𝐴𝐵 , 𝜎𝐴𝐵 ) ≥ 1 − 𝜀, (9.4.49)

for 𝜀 ∈ [0, 1]. Then the following bound applies to their squashed entanglements:
\left| E_{\mathrm{sq}}(A;B)_\rho - E_{\mathrm{sq}}(A;B)_\sigma \right| \leq \sqrt{\varepsilon} \log_2 \min\{d_A, d_B\} + g_2(\sqrt{\varepsilon}),    (9.4.50)

where
𝑔2 (𝛿) B (𝛿 + 1) log2 (𝛿 + 1) − 𝛿 log2 𝛿. (9.4.51)

Proof: Due to Uhlmann’s theorem (Theorem 6.8) and Proposition 4.4, for an
arbitrary extension 𝜌 𝐴𝐵𝐸 of 𝜌 𝐴𝐵 , there exists an extension 𝜎𝐴𝐵𝐸 of 𝜎𝐴𝐵 such that

𝐹 (𝜌 𝐴𝐵𝐸 , 𝜎𝐴𝐵𝐸 ) = 𝐹 (𝜌 𝐴𝐵 , 𝜎𝐴𝐵 ) ≥ 1 − 𝜀. (9.4.52)

By the relation between trace distance and fidelity (Theorem 6.14), it follows that
\frac{1}{2} \| \rho_{ABE} - \sigma_{ABE} \|_1 \leq \sqrt{\varepsilon}.    (9.4.53)
Then, applying the uniform continuity of conditional mutual information (Proposition 7.10),
we find that

2 E_{\mathrm{sq}}(A;B)_\sigma \leq I(A;B|E)_\sigma    (9.4.54)
\leq I(A;B|E)_\rho + 2 \sqrt{\varepsilon} \log_2 \min\{d_A, d_B\} + 2 g_2(\sqrt{\varepsilon}).    (9.4.55)

Since the extension 𝜌 𝐴𝐵𝐸 is arbitrary, it follows that

E_{\mathrm{sq}}(A;B)_\sigma \leq E_{\mathrm{sq}}(A;B)_\rho + \sqrt{\varepsilon} \log_2 \min\{d_A, d_B\} + g_2(\sqrt{\varepsilon}).    (9.4.56)

The other bound,

E_{\mathrm{sq}}(A;B)_\rho \leq E_{\mathrm{sq}}(A;B)_\sigma + \sqrt{\varepsilon} \log_2 \min\{d_A, d_B\} + g_2(\sqrt{\varepsilon}),    (9.4.57)

follows from a similar proof. ■
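
For a sense of scale, the right-hand side of (9.4.50) is easy to evaluate directly. The following sketch (plain Python; the function names `g2` and `esq_continuity_bound` are ours) implements (9.4.51) and the bound, with the convention 0 · log2 0 = 0 assumed at 𝛿 = 0.

```python
import math

def g2(delta):
    """g_2(delta) = (delta + 1) log2(delta + 1) - delta log2(delta), as in (9.4.51)."""
    if delta == 0:
        return 0.0            # convention: 0 * log2(0) = 0
    return (delta + 1) * math.log2(delta + 1) - delta * math.log2(delta)

def esq_continuity_bound(eps, dA, dB):
    """Right-hand side of (9.4.50) for fidelity at least 1 - eps."""
    r = math.sqrt(eps)
    return r * math.log2(min(dA, dB)) + g2(r)

# Example: two qubits at several fidelity deficits.
bounds = [esq_continuity_bound(eps, 2, 2) for eps in (0.0, 1e-4, 1e-2)]
print(bounds)
```

Because the bound depends on 𝜀 only through √𝜀, it decays slowly: a fidelity deficit of 10⁻⁴ still enters through √𝜀 = 10⁻².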


9.5 Summary
In this chapter, we studied entanglement measures for quantum states and quantum
channels. The defining property of an entanglement measure for states is mono-
tonicity under local operations and classical communication (LOCC): a function
𝐸 : H 𝐴𝐵 → R is an entanglement measure if 𝐸 (𝜌 𝐴𝐵 ) ≥ 𝐸 (L(𝜌 𝐴𝐵 )) for every
bipartite state 𝜌 𝐴𝐵 and every LOCC channel L. LOCC monotonicity can be
thought of as a special kind of data-processing inequality, and it is a core concept
in entanglement theory in the same way that the data-processing inequality is the
core concept behind generalized divergence.
An important type of state entanglement measure for our purposes in this book
is a divergence-based measure, in which the entanglement in a given bipartite
quantum state is quantified by its divergence with the set of separable states. As
our divergence, we take a generalized divergence 𝑫 : D(H) × L+ (H) → R, and
we call the resulting quantity generalized divergence of entanglement. Due to the
data-processing inequality (which holds for a generalized divergence by definition),
we immediately obtain LOCC monotonicity for the generalized divergence of
entanglement, thus making it an entanglement measure. We also consider the
divergence with the larger set of PPT′ operators that contains all separable states,
and call the resulting quantity generalized Rains divergence.

9.6 Bibliographic Notes

Entanglement theory has a long history, with one of the seminal papers being that of
Bennett et al. (1996c) (see also Bennett et al. (1996b,a) for earlier works). Bennett
et al. (1996c) introduced the resource theory of entanglement, with the separable
states as the free states and LOCC channels as the free channels, along with the
related operational notions of distillable entanglement and entanglement cost. The
review of Horodecki et al. (2009b) is a useful resource for gaining an understanding
of notable accomplishments in the area. See also the review of Plenio and Virmani
(2007), which focuses specifically on entanglement measures.
The axiomatic approach to defining an entanglement measure that we have
taken in this chapter was proposed by Bennett et al. (1996c); Vedral et al. (1997);
Vedral and Plenio (1998); Vidal (2000), with LOCC monotonicity emerging as


the defining property of an entanglement measure (Horodecki et al., 2009b). Vidal
(2000) and Horodecki (2005) established simplified conditions for proving that a
function is a selective LOCC monotone. Lemma 9.2 is in this spirit.
Entanglement of formation was defined by Bennett et al. (1996c), and they
showed that it is a selective LOCC monotone. They also showed that it is an upper
bound on entanglement cost, and thus an upper bound on distillable entanglement
(we did not define these concepts here, but we will define distillable entanglement
in detail in Chapter 13). The uniform continuity bound for entanglement of
formation in Proposition 9.4 was given by Winter (2016). The faithfulness bounds
for entanglement of formation in Proposition 9.5 were given by Li and Winter
(2018) (see also Nielsen (2000)). The non-additivity of entanglement of formation
was established by Hastings (2009), which built upon an earlier result of Shor
(2004) connecting various additivity conjectures in quantum information theory.
The formula in (9.1.93) for the entanglement of formation of two-qubit states was
determined by Wootters (1998).
Negativity and log-negativity were defined by Zyczkowski et al. (1998) (see
also (Vidal and Werner, 2002) and (Plenio, 2005)). Vidal and Werner (2002)
showed that log-negativity is monotone under LOCC. They also proved that it is
additive. See also (Plenio, 2005) for a proof of selective LOCC monotonicity of
log-negativity, and for a proof of the fact that log-negativity is not convex. The
semi-definite programming approach, which we have taken here to prove selective
LOCC monotonicity of log-negativity, is based on Wang and Duan (2016a).
The relative entropy of entanglement was defined by Vedral et al. (1997);
Vedral and Plenio (1998), who also proved many of its properties, such as LOCC
monotonicity and convexity. They also proposed concepts closely related to the
generalized divergence of entanglement. The fact that trace distance of entanglement
is not a selective LOCC monotone was shown by Qiao et al. (2018). Optimization
over the set of separable states has been shown by Gurvits (2004); Gharibian
(2010) to be NP-hard in general (see also (Ioannou, 2007; Shi and Wu, 2012)). A
closed-form formula for the relative entropy of entanglement for two-qubit states
was derived by Miranowicz and Ishizaka (2008). The max-relative entropy of
entanglement was defined by Datta (2009b), the hypothesis testing relative entropy
of entanglement by Brandao and Datta (2011), and the sandwiched Rényi relative
entropy of entanglement by Wilde et al. (2017). The fact that the relative entropy
of entanglement is invariant under classical communication (Proposition 9.17) was
stated by Horodecki (2005) and proven by Kaur and Wilde (2017). Our proof


of selective separable monotonicity in Proposition 9.19, for the Rényi relative

entropies, is based on the approach from Wang and Wilde (2020). The proof of
Property 1. of Proposition 9.20 follows the approach from Plenio et al. (2000),
and the proof of Property 2. follows the approach of Tomamichel et al. (2017,
Proposition 10). Property 2. of Proposition 9.20 was first established by Vedral
and Plenio (1998). The cone program formulations of max-relative entropy of
entanglement of a bipartite state, as well as max-relative entropy of entanglement
of a quantum channel, were given by Berta and Wilde (2018).
The relative entropy with the set of PPT states was defined by Rains (1999a)
in the context of entanglement distillation (see Chapter 13). Rains (2001) then
modified the quantity to obtain a tighter bound on the distillable entanglement.
After this development, Audenaert et al. (2002) defined the set PPT′ and showed
that this improved bound can be written as the relative entropy with the set of
PPT′ operators, which we refer to here as the Rains relative entropy. The fact that
the set PPT′ is preserved by completely PPT preserving channels (Property 1. of
Lemma 9.14) was shown by Tomamichel et al. (2017). The sandwiched Rains
relative entropy of a bipartite state was defined by Tomamichel et al. (2017), as well
as the generalized Rains divergence, both as generalizations of the Rains relative
entropy of Rains (2001); Audenaert et al. (2002). The semi-definite program
formulation of the max-Rains relative entropy of a bipartite state was given by Wang
and Duan (2016a); Wang et al. (2019b), who also recognized that the semi-definite
programming bound from Wang and Duan (2016a) is equal to the max-Rains
relative entropy. Wang and Duan (2016a) proved that the max-Rains relative
entropy is a selective PPT monotone (Proposition 9.28) and that it is additive
(Proposition 9.29). Miranowicz and Ishizaka (2008) proved that the Rains relative
entropy is equal to the relative entropy of entanglement for two-qubit states.
The squashed entanglement of a bipartite state was defined by Christandl
and Winter (2004), who established several of its key properties mentioned in
Propositions 9.32, 9.33, and 9.36, including non-negativity, vanishing on separable
states, convexity, superadditivity in general, additivity for tensor-product states,
LOCC monotonicity, reduction for pure states, and the squashing channel represen-
tation in (9.4.43). The faithfulness of squashed entanglement was established by
Brandao et al. (2011). A function related to squashed entanglement was discussed
by Tucci (1999, 2002). Our discussions motivating squashed entanglement are
related to those presented by Tucci (1999, 2002). The representation of squashed
entanglement in (9.4.44) is due to Takeoka et al. (2014). Uniform continuity of the
squashed entanglement of a bipartite state was established by Alicki and Fannes


(2004), and the explicit bound given here is due to Shirokov (2017).

Appendix 9.A Semi-Definite Programs for Negativity

Here we prove that the quantity ∥T𝐵 (𝜌 𝐴𝐵 ) ∥ 1 has the following primal and dual
SDP formulations:

\| \mathrm{T}_B(\rho_{AB}) \|_1 = \sup_{R_{AB}} \{ \operatorname{Tr}[R_{AB} \rho_{AB}] : -\mathbb{1}_{AB} \leq \mathrm{T}_B(R_{AB}) \leq \mathbb{1}_{AB} \}    (9.A.1)
= \inf_{K_{AB}, L_{AB} \geq 0} \{ \operatorname{Tr}[K_{AB} + L_{AB}] : \mathrm{T}_B(K_{AB} - L_{AB}) = \rho_{AB} \},    (9.A.2)

where the optimization in the first line is with respect to Hermitian 𝑅 𝐴𝐵 . Since this
is the core quantity underlying both the negativity and the log-negativity, it follows
that these entanglement measures can be computed by means of semi-definite
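programs. To see the first equality, consider that

\| \mathrm{T}_B(\rho_{AB}) \|_1 = \| \mathrm{T}_B(\rho_{AB}) \|_1    (9.A.3)
= \sup_{\| R_{AB} \|_\infty \leq 1} \operatorname{Tr}[R_{AB} \mathrm{T}_B(\rho_{AB})]    (9.A.4)
= \sup_{\| R_{AB} \|_\infty \leq 1} \operatorname{Tr}[\mathrm{T}_B(R_{AB}) \rho_{AB}]    (9.A.5)
= \sup_{\| \mathrm{T}_B(R_{AB}) \|_\infty \leq 1} \operatorname{Tr}[R_{AB} \rho_{AB}]    (9.A.6)
= \sup_{R_{AB}} \{ \operatorname{Tr}[R_{AB} \rho_{AB}] : -\mathbb{1}_{AB} \leq \mathrm{T}_B(R_{AB}) \leq \mathbb{1}_{AB} \}.    (9.A.7)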

The second equality follows from Hölder duality (see (2.2.98)), and since T𝐵 (𝜌 𝐴𝐵 )
is Hermitian, it suffices to optimize over Hermitian 𝑅 𝐴𝐵 . The third equality follows
because the partial transpose is its own Hilbert–Schmidt adjoint. The fourth equality
follows from the substitution 𝑅 𝐴𝐵 → T𝐵 (𝑅 𝐴𝐵 ). The final equality follows because
the inequality ∥T𝐵 (𝑅 𝐴𝐵 )∥ ∞ ≤ 1 is equivalent to −1 𝐴𝐵 ≤ T𝐵 (𝑅 𝐴𝐵 ) ≤ 1 𝐴𝐵 for a
Hermitian operator T𝐵 (𝑅 𝐴𝐵 ).
Now consider that the set of Hermitian operators is equivalent to the set of
operators formed as differences of positive semi-definite operators. So this implies
that


\| \mathrm{T}_B(\rho_{AB}) \|_1 = \sup_{P_{AB}, Q_{AB} \geq 0} \{ \operatorname{Tr}[(P_{AB} - Q_{AB}) \rho_{AB}] : -\mathbb{1}_{AB} \leq \mathrm{T}_B(P_{AB} - Q_{AB}) \leq \mathbb{1}_{AB} \}.    (9.A.8)

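Then by setting
X = \begin{pmatrix} P_{AB} & 0 \\ 0 & Q_{AB} \end{pmatrix}, \quad A = \begin{pmatrix} \rho_{AB} & 0 \\ 0 & -\rho_{AB} \end{pmatrix}, \quad B = \begin{pmatrix} \mathbb{1}_{AB} & 0 \\ 0 & \mathbb{1}_{AB} \end{pmatrix},    (9.A.9)
\Phi(X) = \begin{pmatrix} \mathrm{T}_B(P_{AB} - Q_{AB}) & 0 \\ 0 & -\mathrm{T}_B(P_{AB} - Q_{AB}) \end{pmatrix},    (9.A.10)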
this primal SDP is now in the standard form of (2.4.3). Then, setting
Y = \begin{pmatrix} K_{AB} & 0 \\ 0 & L_{AB} \end{pmatrix},    (9.A.11)
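we can calculate the Hilbert–Schmidt adjoint of Φ as
\operatorname{Tr}[Y \Phi(X)]
= \operatorname{Tr}\left[ \begin{pmatrix} K_{AB} & 0 \\ 0 & L_{AB} \end{pmatrix} \begin{pmatrix} \mathrm{T}_B(P_{AB} - Q_{AB}) & 0 \\ 0 & -\mathrm{T}_B(P_{AB} - Q_{AB}) \end{pmatrix} \right]    (9.A.12)
= \operatorname{Tr}[K_{AB} \mathrm{T}_B(P_{AB} - Q_{AB})] - \operatorname{Tr}[L_{AB} \mathrm{T}_B(P_{AB} - Q_{AB})]    (9.A.13)
= \operatorname{Tr}[\mathrm{T}_B(K_{AB})(P_{AB} - Q_{AB})] - \operatorname{Tr}[\mathrm{T}_B(L_{AB})(P_{AB} - Q_{AB})]    (9.A.14)
= \operatorname{Tr}[\mathrm{T}_B(K_{AB} - L_{AB}) P_{AB}] + \operatorname{Tr}[\mathrm{T}_B(L_{AB} - K_{AB}) Q_{AB}]    (9.A.15)
= \operatorname{Tr}\left[ \begin{pmatrix} \mathrm{T}_B(K_{AB} - L_{AB}) & 0 \\ 0 & -\mathrm{T}_B(K_{AB} - L_{AB}) \end{pmatrix} \begin{pmatrix} P_{AB} & 0 \\ 0 & Q_{AB} \end{pmatrix} \right],    (9.A.16)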
so that
\[
\Phi^\dagger(Y) = \begin{pmatrix} \mathrm{T}_B(K_{AB} - L_{AB}) & 0 \\ 0 & -\mathrm{T}_B(K_{AB} - L_{AB}) \end{pmatrix}. \tag{9.A.17}
\]
Then, plugging into the standard form for the dual SDP in (2.4.4) and simplifying a
bit, we find that it is given by

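\[
\begin{aligned}
&\inf_{K_{AB}, L_{AB} \ge 0} \left\{ \operatorname{Tr}[K_{AB} + L_{AB}] : \mathrm{T}_B(K_{AB} - L_{AB}) \ge \rho_{AB},\ -\mathrm{T}_B(K_{AB} - L_{AB}) \ge -\rho_{AB} \right\} \\
&\quad = \inf_{K_{AB}, L_{AB} \ge 0} \{ \operatorname{Tr}[K_{AB} + L_{AB}] : \mathrm{T}_B(K_{AB} - L_{AB}) = \rho_{AB} \}.
\end{aligned}
\tag{9.A.18}
\]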

Strong duality holds according to Theorem 2.28. Indeed, setting 𝐾 𝐴𝐵 and 𝐿 𝐴𝐵

to the respective positive and negative parts of T𝐵 (𝜌 𝐴𝐵 ) is feasible for the dual,
while setting 𝑅 𝐴𝐵 = 1 𝐴𝐵 /2 is strictly feasible for the primal.
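
The feasible points named above can be checked numerically. The following is a minimal sketch, assuming NumPy and a two-qubit maximally entangled state as the test input (both are choices made here for illustration, not taken from the text). It builds the dual point from the positive and negative parts of the partial transpose, builds a primal point from the sign of the partial transpose, and confirms that both values coincide with the trace norm.

```python
import numpy as np

def partial_transpose_B(X, dA, dB):
    """Transpose the B factor of an operator acting on C^dA (x) C^dB."""
    X4 = X.reshape(dA, dB, dA, dB)  # indices (a, b, a', b')
    return X4.transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

# Illustrative input: |Phi> = (|00> + |11>)/sqrt(2) on two qubits.
dA = dB = 2
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho = np.outer(phi, phi.conj())

# Spectral decomposition of the Hermitian operator T_B(rho).
vals, vecs = np.linalg.eigh(partial_transpose_B(rho, dA, dB))
trace_norm = np.abs(vals).sum()

# Dual point: K and L are the positive and negative parts of T_B(rho).
K = (vecs * np.clip(vals, 0, None)) @ vecs.conj().T
L = (vecs * np.clip(-vals, 0, None)) @ vecs.conj().T
dual_value = np.trace(K + L).real
dual_feasible = np.allclose(partial_transpose_B(K - L, dA, dB), rho)

# Primal point: choose R so that T_B(R) is the sign operator of T_B(rho).
sign_op = (vecs * np.sign(vals)) @ vecs.conj().T
R = partial_transpose_B(sign_op, dA, dB)
primal_value = np.trace(R @ rho).real

print(trace_norm, primal_value, dual_value)
```

For this input the three printed numbers agree and equal 2, so the negativity-type quantities of the maximally entangled two-qubit state follow directly from either program.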
Chapter 10

Entanglement Measures for

Quantum Channels
So far we have considered entanglement measures for quantum states. We now
consider entanglement measures for channels. Using the general principle discussed
in Section 7.11.2 for constructing channel quantities out of state quantities, we
arrive at the following definition for the entanglement of a channel.
...

10.1 Definition and Basic Properties

...

Definition 10.1 Entanglement of a Quantum Channel

From an entanglement measure 𝐸 defined on bipartite quantum states, we
define the corresponding entanglement of a quantum channel N 𝐴→𝐵 as follows:

\[
E(\mathcal{N}) := \sup_{\rho_{RA}} E(R;B)_{\omega}, \tag{10.1.1}
\]

where 𝜔 𝑅𝐵 ≔ N 𝐴→𝐵 (𝜌 𝑅 𝐴 ), and the optimization is with respect to every
bipartite state 𝜌 𝑅 𝐴 , with system 𝑅 arbitrarily large, yet finite.


Remark: Note that it suffices to optimize (10.1.1) with respect to pure states 𝜓 𝑅 𝐴, with the
dimension of 𝑅 equal to the dimension of 𝐴, when calculating the entanglement of a channel, so
that
\[
E(\mathcal{N}) = \sup_{\psi_{RA}} E(R;B)_{\omega}, \tag{10.1.2}
\]
where 𝜔 𝑅𝐵 ≔ N 𝐴→𝐵 (𝜓 𝑅 𝐴). This follows from the fact that an entanglement measure for states
is, by definition, monotone under LOCC channels. It is therefore monotone under a local partial
trace channel. In particular, consider a mixed state 𝜌 𝑅 𝐴, with the dimension of 𝑅 not necessarily
equal to the dimension of 𝐴. Let 𝜔 𝑅𝐵 = N 𝐴→𝐵 (𝜌 𝑅 𝐴). Then, if we take a purification 𝜙 𝑅′ 𝑅 𝐴 of
𝜌 𝑅 𝐴, we obtain

𝐸 (𝑅; 𝐵) 𝜔 = 𝐸 (N 𝐴→𝐵 (𝜌 𝑅 𝐴)) (10.1.3)

= 𝐸 (N 𝐴→𝐵 (Tr 𝑅′ [𝜙 𝑅′ 𝑅 𝐴])) (10.1.4)
= 𝐸 (Tr 𝑅′ [N 𝐴→𝐵 (𝜙 𝑅′ 𝑅 𝐴)]) (10.1.5)
≤ 𝐸 (N 𝐴→𝐵 (𝜙 𝑅′ 𝑅 𝐴)) (10.1.6)
= 𝐸 (𝑅 ′ 𝑅; 𝐵) 𝜏 , (10.1.7)

where 𝜏𝑅′ 𝑅𝐵 = N 𝐴→𝐵 (𝜙 𝑅′ 𝑅 𝐴) and to obtain the inequality we used the fact that 𝐸 is monotone
under the partial trace channel Tr 𝑅′ . This demonstrates that it suffices to optimize with respect
to pure states when calculating the entanglement of a channel. Furthermore, by the Schmidt
decomposition theorem (Theorem 2.2), the dimension of the purifying system 𝑅 ′ 𝑅 need not
exceed the dimension of 𝐴.

Note that in the definition above the channel N 𝐴→𝐵 acts locally on the state
𝜓 𝑅 𝐴 to produce the state 𝜔 𝑅𝐵 = N 𝐴→𝐵 (𝜓 𝑅 𝐴 ). We can thus view N 𝐴→𝐵 as an
LOCC channel, which means that 𝐸 (𝑅; 𝐵)𝜔 ≤ 𝐸 (𝑅; 𝐴)𝜓 , by the definition of an
entanglement measure for states. In other words, by sending one share of a bipartite
state through the channel N, the entanglement can only stay the same or go down.
The quantity 𝐸 (N) thus indicates how well entanglement is preserved when one
share of it is sent through the channel N.
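
As a small numerical illustration of this monotonicity, the sketch below sends one share of a two-qubit maximally entangled state through a qubit depolarizing channel and compares the log-negativity before and after. The choice of log-negativity as the measure, the depolarizing channel, the parameter value 0.2, and all function names are assumptions made here for illustration only.

```python
import numpy as np

def log_negativity(rho, dA, dB):
    """log2 of the trace norm of the partial transpose over the second system."""
    pt = rho.reshape(dA, dB, dA, dB).transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)
    return np.log2(np.abs(np.linalg.eigvalsh(pt)).sum())

def depolarize_second_qubit(rho, p):
    """Apply X -> (1 - p) X + p Tr[X] 1/2 to the second qubit of a two-qubit state."""
    r4 = rho.reshape(2, 2, 2, 2)
    reduced_first = np.einsum('ibjb->ij', r4)  # partial trace over the second qubit
    return (1 - p) * rho + p * np.kron(reduced_first, np.eye(2) / 2)

# Input state psi_RA: maximally entangled state of two qubits.
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
psi_RA = np.outer(phi, phi)

# Output state omega_RB after the channel acts on the A share.
omega_RB = depolarize_second_qubit(psi_RA, p=0.2)

before = log_negativity(psi_RA, 2, 2)
after = log_negativity(omega_RB, 2, 2)
print(before, after)
```

The value after the channel is strictly smaller than the value before it, consistent with the statement that sending one share through the channel cannot increase entanglement.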
Let us consider three examples of entanglement measures for quantum channels,
defined using entanglement measures for bipartite quantum states.
1. The generalized divergence of entanglement of a channel N 𝐴→𝐵 , defined for
every generalized divergence 𝑫 as

\[
\begin{aligned}
\boldsymbol{E}(\mathcal{N}) &:= \sup_{\psi_{RA}} \boldsymbol{E}(R;B)_{\omega} && \text{(10.1.8)} \\
&= \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \mathrm{SEP}(R:B)} \boldsymbol{D}(\mathcal{N}_{A \to B}(\psi_{RA}) \| \sigma_{RB}), && \text{(10.1.9)}
\end{aligned}
\]

where 𝜔 𝑅𝐵 = N 𝐴→𝐵 (𝜓 𝑅 𝐴 ), and the optimization is with respect to pure states
𝜓 𝑅 𝐴 , with the dimension of 𝑅 equal to the dimension of 𝐴. We investigate this
entanglement measure in Section 10.3.
2. The generalized Rains divergence of a channel N 𝐴→𝐵 , defined for every
generalized divergence 𝑫 as
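\[
\begin{aligned}
\boldsymbol{R}(\mathcal{N}) &:= \sup_{\psi_{SA}} \boldsymbol{R}(S;B)_{\omega} && \text{(10.1.10)} \\
&= \sup_{\psi_{SA}} \inf_{\sigma_{SB} \in \mathrm{PPT}'(S:B)} \boldsymbol{D}(\mathcal{N}_{A \to B}(\psi_{SA}) \| \sigma_{SB}), && \text{(10.1.11)}
\end{aligned}
\]

where 𝜔 𝑆𝐵 = N 𝐴→𝐵 (𝜓 𝑆 𝐴 ), and the optimization is with respect to pure states
𝜓 𝑆 𝐴 , with the dimension of 𝑆 equal to the dimension of 𝐴. We investigate this
entanglement measure in Section 10.4.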
3. The squashed entanglement of a channel N 𝐴→𝐵 , defined as
\[
E_{\mathrm{sq}}(\mathcal{N}) := \sup_{\psi_{RA}} E_{\mathrm{sq}}(R;B)_{\omega}, \tag{10.1.12}
\]

where 𝜔 𝑅𝐵 = N 𝐴→𝐵 (𝜓 𝑅 𝐴 ), and the optimization is with respect to pure states
𝜓 𝑅 𝐴 , with the dimension of 𝑅 equal to the dimension of 𝐴. We investigate this
entanglement measure in Section 10.5.

Remark: Instead of defining an entanglement measure for channels via an entanglement

measure for states, consider that the channel analogue of a separable state is an entanglement-
breaking channel, which follows from the discussion in Section 4.4.6. Another way to construct
an entanglement measure for quantum channels is through the generalized channel divergence
(Definition 7.81) between the channel and the set of entanglement-breaking channels:

\[
\boldsymbol{E}'(\mathcal{N}) := \inf_{\mathcal{M} \in \mathrm{EB}(A \to B)} \boldsymbol{D}(\mathcal{N} \| \mathcal{M}), \tag{10.1.13}
\]

where EB( 𝐴 → 𝐵) denotes the set of entanglement-breaking channels taking system 𝐴 to system
𝐵. Now, using the expression for the generalized channel divergence in (7.11.2), we obtain

\[
\boldsymbol{E}'(\mathcal{N}) = \inf_{\mathcal{M} \in \mathrm{EB}(A \to B)} \sup_{\psi_{RA}} \boldsymbol{D}(\mathcal{N}_{A \to B}(\psi_{RA}) \| \mathcal{M}_{A \to B}(\psi_{RA})), \tag{10.1.14}
\]

where the supremum is with respect to pure states 𝜓 𝑅 𝐴 such that the dimension of 𝑅 is equal
to the dimension of 𝐴.
Now, because entanglement-breaking channels and separable states (with maximally mixed
reduced state) are in one-to-one correspondence (see Section 4.4.6), we find that the generalized
divergence of entanglement of N is bounded from above as follows:

\[
\boldsymbol{E}(\mathcal{N}) \le \sup_{\psi_{RA}} \inf_{\mathcal{M} \in \mathrm{EB}(A \to B)} \boldsymbol{D}(\mathcal{N}_{A \to B}(\psi_{RA}) \| \mathcal{M}_{A \to B}(\psi_{RA})). \tag{10.1.15}
\]

The right-hand side of the above inequality and the quantity 𝑬 ′ (N) differ in the order of the
infimum and supremum. From the discussion in Section 2.3, in particular (2.3.14), we conclude
that 𝑬 (N) ≤ 𝑬 ′ (N) for all quantum channels N. For the rest of this chapter, and throughout the
rest of this book, we thus stick with the definition of a channel entanglement measure given in
Definition 10.1.

Many properties of state entanglement measures carry over, or have an analogue,

to the corresponding channel entanglement measure.

Proposition 10.2 Properties of Entanglement Measures for Channels

Let 𝐸 be an entanglement measure for states, as defined in Definition 9.1,
and consider the corresponding channel entanglement measure defined in
Definition 10.1.
1. Faithfulness: If 𝐸 vanishes for all separable states, then 𝐸 (N) = 0 for all
entanglement breaking channels. If 𝐸 is faithful (vanishing if and only if
the input state is separable), then 𝐸 (N) = 0 implies that N is entanglement
breaking.
2. Convexity: If 𝐸 is a convex entanglement measure for states, then the
corresponding channel entanglement measure is convex: for every finite
alphabet X, probability distribution 𝑝 : X → [0, 1], and set $\{\mathcal{N}^x_{A \to B}\}_{x \in \mathcal{X}}$
of quantum channels,
\[
E\!\left( \sum_{x \in \mathcal{X}} p(x) \mathcal{N}^x \right) \le \sum_{x \in \mathcal{X}} p(x) E(\mathcal{N}^x). \tag{10.1.16}
\]

3. Superadditivity: If 𝐸 is superadditive, meaning that

\[
E(A_1 A_2; B_1 B_2)_{\rho \otimes \tau} \ge E(A_1; B_1)_{\rho} + E(A_2; B_2)_{\tau} \tag{10.1.17}
\]

for all states 𝜌 𝐴1 𝐵1 and 𝜏𝐴2 𝐵2 , then the channel entanglement measure is also
superadditive: for every two channels N 𝐴1 →𝐵1 and M 𝐴2 →𝐵2 ,

𝐸 (N ⊗ M) ≥ 𝐸 (N) + 𝐸 (M). (10.1.18)

Proof:
1. Let N be an entanglement breaking channel. This means that 𝜔 𝑅𝐵 =

N 𝐴→𝐵 (𝜓 𝑅 𝐴 ) is separable for every pure state 𝜓 𝑅 𝐴 . Therefore, since 𝐸 vanishes

for all separable states, we have 𝐸 (𝑅; 𝐵)𝜔 = 0 for every pure state 𝜓 𝑅 𝐴 , so that
𝐸 (N) = 0.
Now, suppose that 𝐸 is a faithful state entanglement measure, and let 𝐸 (N) = 0.
Since 𝐸 is faithful, it is non-negative for all input states, which implies that
𝐸 (𝑅; 𝐵)𝜔 = 0 for every state 𝜔 𝑅𝐵 = N 𝐴→𝐵 (𝜓 𝑅 𝐴 ), i.e., for every pure state
𝜓 𝑅 𝐴 . Furthermore, by faithfulness of 𝐸 for states, it holds that N 𝐴→𝐵 (𝜓 𝑅 𝐴 ) is
separable for every pure state 𝜓 𝑅 𝐴 . Therefore, N is entanglement breaking.
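2. Let 𝜓 𝑅 𝐴 be an arbitrary pure state, and let
\[
\omega_{RB} = \left( \sum_{x \in \mathcal{X}} p(x) \mathcal{N}^x_{A \to B} \right)(\psi_{RA}) = \sum_{x \in \mathcal{X}} p(x) \omega^x_{RB}, \tag{10.1.19}
\]
where $\omega^x_{RB} = \mathcal{N}^x_{A \to B}(\psi_{RA})$. Then, by the convexity of 𝐸 for states,
\[
E(R;B)_{\omega} \le \sum_{x \in \mathcal{X}} p(x) E(R;B)_{\omega^x} \le \sum_{x \in \mathcal{X}} p(x) E(\mathcal{N}^x), \tag{10.1.20}
\]
where for the last inequality we used the definition of the channel entanglement
measure. Therefore, for every pure state 𝜓 𝑅 𝐴 , we have
\[
E(R;B)_{\omega} \le \sum_{x \in \mathcal{X}} p(x) E(\mathcal{N}^x). \tag{10.1.21}
\]
Thus,
\[
E\!\left( \sum_{x \in \mathcal{X}} p(x) \mathcal{N}^x \right) = \sup_{\psi_{RA}} E(R;B)_{\omega} \le \sum_{x \in \mathcal{X}} p(x) E(\mathcal{N}^x), \tag{10.1.22}
\]
as required.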
3. By restricting the optimization in the definition of 𝐸 (N ⊗ M) to pure product
states 𝜙 𝑅1 𝐴1 ⊗ 𝜑 𝑅2 𝐴2 , letting 𝜉 𝑅1 𝐵1 = N 𝐴1 →𝐵1 (𝜙 𝑅1 𝐴1 ), 𝜏𝑅2 𝐵2 = M 𝐴2 →𝐵2 (𝜑 𝑅2 𝐴2 ),
and using superadditivity of the state entanglement measure 𝐸, we obtain
\[
\begin{aligned}
E(\mathcal{N} \otimes \mathcal{M}) &= \sup_{\psi_{R A_1 A_2}} E(R; B_1 B_2)_{\omega} && \text{(10.1.23)} \\
&\ge \sup_{\phi_{R_1 A_1} \otimes \varphi_{R_2 A_2}} E(R_1 R_2; B_1 B_2)_{\xi \otimes \tau} && \text{(10.1.24)} \\
&\ge \sup_{\phi_{R_1 A_1}} E(R_1; B_1)_{\xi} + \sup_{\varphi_{R_2 A_2}} E(R_2; B_2)_{\tau} && \text{(10.1.25)} \\
&= E(\mathcal{N}) + E(\mathcal{M}), && \text{(10.1.26)}
\end{aligned}
\]

as required. ■

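[Diagram: Alice holds systems 𝐴′ and 𝐴, and Bob holds system 𝐵′, in the joint state 𝜌 𝐴′ 𝐴𝐵′ ; the channel N maps 𝐴 to 𝐵, producing the state 𝜔 𝐴′ 𝐵𝐵′ .]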

Figure 10.1: Starting from a state 𝜌 𝐴′ 𝐴𝐵′ , Alice sends the system 𝐴 through
the channel N 𝐴→𝐵 to Bob, resulting in the state 𝜔 𝐴′ 𝐵𝐵′ = N 𝐴→𝐵 (𝜌 𝐴′ 𝐴𝐵′ ).
The difference between the final and initial entanglement (as quantified by an
entanglement measure for states), optimized over all initial states 𝜌 𝐴′ 𝐴𝐵′ , is
equal to the amortized entanglement of N; see Definition 10.3.

10.2 Amortized Entanglement

There is another way to define the entanglement of a quantum channel from an
entanglement measure on quantum states, and this method turns out to be useful in
the feedback-assisted communication protocols that we consider in Part III. To see
how this measure is defined, consider the situation shown in Figure 10.1. In this
setup, Alice and Bob have access to systems 𝐴′ and 𝐵′, respectively. Alice
also possesses the system 𝐴, and she passes it through the channel N 𝐴→𝐵 to Bob.
The initial joint state 𝜌 𝐴′ 𝐴𝐵′ then becomes 𝜔 𝐴′ 𝐵𝐵′ = N 𝐴→𝐵 (𝜌 𝐴′ 𝐴𝐵′ ). The systems
𝐴′ and 𝐵′ can be thought of as “memory systems” that hold auxiliary information.
In feedback-assisted quantum communication protocols, these memory systems
can hold the results of previous rounds of communication between Alice and Bob
for the purpose of deciding what local operations to perform in subsequent rounds.
By taking the difference between the final entanglement in the state 𝜔 𝐴′ 𝐵𝐵′ and the
initial entanglement in the state 𝜌 𝐴′ 𝐴𝐵′ , and optimizing over all initial states 𝜌 𝐴′ 𝐴𝐵′ ,
we arrive at what is called the amortized entanglement.


Definition 10.3 Amortized Entanglement of a Quantum Channel

From an entanglement measure 𝐸 defined on bipartite quantum states, we
define the amortized entanglement of a quantum channel N 𝐴→𝐵 as follows:

\[
E^{\mathcal{A}}(\mathcal{N}) := \sup_{\rho_{A'AB'}} \{ E(A'; BB')_{\omega} - E(A'A; B')_{\rho} \}, \tag{10.2.1}
\]

where 𝜔 𝐴′ 𝐵𝐵′ ≔ N 𝐴→𝐵 (𝜌 𝐴′ 𝐴𝐵′ ) and the optimization is with respect to states
𝜌 𝐴′ 𝐴𝐵′ . The systems 𝐴′ and 𝐵′ have arbitrarily large, yet finite dimensions.

Due to the fact that the systems 𝐴′ and 𝐵′ can be arbitrarily large, it is not
necessarily the case that the supremum above can be achieved. Thus, in general, it
might be difficult to compute a channel’s amortized entanglement.
For every entanglement measure 𝐸 that is equal to zero for all separable states,
we always have that the entanglement of the channel never exceeds the amortized
entanglement of the channel:

Lemma 10.4
For a given quantum channel N and entanglement measure 𝐸 that is equal to
zero for all separable states, the channel’s entanglement does not exceed its
amortized entanglement:
𝐸 (N) ≤ 𝐸 A (N). (10.2.2)

Proof: By choosing the input state 𝜌 𝐴′ 𝐴𝐵′ in the optimization for amortized
entanglement to have a trivial (one-dimensional) system 𝐵′ (so that 𝜌 𝐴′ 𝐴𝐵′ is trivially
a separable state between Alice and Bob), we find that 𝐸 ( 𝐴′; 𝐵𝐵′)𝜔 = 𝐸 ( 𝐴′; 𝐵)𝜔
and 𝐸 ( 𝐴𝐴′; 𝐵′) 𝜌 = 0. Since such a state is an arbitrary state to consider for
optimizing the channel’s entanglement, the inequality follows. ■

Whether the inequality reverse to the one in (10.2.2) holds, which would imply
that 𝐸 (N) = 𝐸 A (N), depends on the entanglement measure 𝐸. In Section 10.6
below, we show that this so-called “amortization collapse” occurs for some
entanglement measures.
The amortized entanglement of a channel has several interesting properties,
which we list in some detail in this section. These include convexity, faithfulness,


and (sub)additivity.

Proposition 10.5 Properties of Amortized Entanglement of a Quantum

Channel
1. Dimension bound: Let 𝐸 be a subadditive entanglement measure for states,
and let N 𝐴→𝐵 be a quantum channel. Then,
\[
E^{\mathcal{A}}(\mathcal{N}) \le \min\{ E(A; A')_{\Phi}, E(B; B')_{\Phi} \}, \tag{10.2.3}
\]
where 𝐴′ has the same dimension as 𝐴, 𝐵′ has the same dimension as 𝐵,
and Φ is a maximally entangled state of systems 𝐴𝐴′ or 𝐵𝐵′.
2. Faithfulness: Let 𝐸 be an entanglement measure that is equal to zero
for all separable states. If a channel N is entanglement-breaking, then
its amortized entanglement 𝐸 A (N) is equal to zero. If the entanglement
measure 𝐸 is faithful (equal to zero if and only if the state is separable)
and the amortized entanglement 𝐸 A (N) of a channel N is equal to zero,
then the channel N is entanglement breaking.
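3. Convexity: Let 𝐸 be a convex entanglement measure for states. Then,
the amortized entanglement 𝐸 A of a channel is convex: for every finite
alphabet X, probability distribution 𝑝 : X → [0, 1], and set $\{\mathcal{N}^x_{A \to B}\}_{x \in \mathcal{X}}$
of quantum channels,
\[
E^{\mathcal{A}}\!\left( \sum_{x \in \mathcal{X}} p(x) \mathcal{N}^x \right) \le \sum_{x \in \mathcal{X}} p(x) E^{\mathcal{A}}(\mathcal{N}^x). \tag{10.2.4}
\]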

4. Subadditivity and additivity: For every entanglement measure 𝐸, the
amortized entanglement 𝐸 A of a channel is subadditive, meaning that for
every two quantum channels N and M,
\[
E^{\mathcal{A}}(\mathcal{N} \otimes \mathcal{M}) \le E^{\mathcal{A}}(\mathcal{N}) + E^{\mathcal{A}}(\mathcal{M}). \tag{10.2.5}
\]
If 𝐸 is an additive entanglement measure, then the amortized entanglement
𝐸 A is additive, meaning that
\[
E^{\mathcal{A}}(\mathcal{N} \otimes \mathcal{M}) = E^{\mathcal{A}}(\mathcal{N}) + E^{\mathcal{A}}(\mathcal{M}) \tag{10.2.6}
\]
for every two quantum channels N and M.


Proof:
1. To prove (10.2.3), we use the fact that N can be simulated via teleportation.
Specifically, from (5.1.33) and (5.1.34), we can represent the action of N 𝐴→𝐵
on every state 𝜌 𝐴′ 𝐴𝐵′ in the following two ways:

\[
\mathcal{N}_{A \to B}(\rho_{A'AB'}) = \mathcal{N}_{B_i' \to B}\big( \mathcal{T}_{A A_i B_i \to B_i'}(\rho_{A'AB'} \otimes \Phi_{A_i B_i}) \big), \tag{10.2.7}
\]

\[
\mathcal{N}_{A \to B}(\rho_{A'AB'}) = \mathcal{T}_{A_o' A_o B_o \to B}\big( \mathcal{N}_{A \to A_o'}(\rho_{A'AB'}) \otimes \Phi_{A_o B_o} \big), \tag{10.2.8}
\]

where 𝐴𝑖 , 𝐵𝑖 , 𝐵𝑖′ are auxiliary systems such that 𝑑 𝐴𝑖 = 𝑑 𝐵𝑖 = 𝑑 𝐵𝑖′ = 𝑑 𝐴 (i.e.,

systems with the same dimension as the input system 𝐴 of the channel)
and 𝐴𝑜 , 𝐴′𝑜 , 𝐵𝑜 are auxiliary systems such that 𝑑 𝐴𝑜 = 𝑑 𝐴𝑜′ = 𝑑 𝐵𝑜 = 𝑑 𝐵 (i.e.,
systems with the same dimension as the output system 𝐵 of the channel). The
teleportation channel T is given by (5.1.26). Since T is an LOCC channel, and
N is a local channel, by LOCC monotonicity of the entanglement measure 𝐸
we obtain

𝐸 ( 𝐴′; 𝐵𝐵′)𝜔 ≤ 𝐸 ( 𝐴′ 𝐴𝐴𝑖 ; 𝐵′ 𝐵𝑖 ) 𝜌⊗Φ , (10.2.9)

𝐸 ( 𝐴′; 𝐵𝐵′)𝜔 ≤ 𝐸 ( 𝐴′ 𝐴𝐴𝑜 ; 𝐵′ 𝐵𝑜 ) 𝜌⊗Φ , (10.2.10)

where 𝜔 𝐴′ 𝐵𝐵′ = N 𝐴→𝐵 (𝜌 𝐴′ 𝐴𝐵′ ). Then, by subadditivity of 𝐸, we find that

𝐸 ( 𝐴′; 𝐵𝐵′)𝜔 ≤ 𝐸 ( 𝐴′ 𝐴; 𝐵′) 𝜌 + 𝐸 ( 𝐴𝑖 ; 𝐵𝑖 )Φ , (10.2.11)

𝐸 ( 𝐴′; 𝐵𝐵′)𝜔 ≤ 𝐸 ( 𝐴′ 𝐴; 𝐵′) 𝜌 + 𝐸 ( 𝐴𝑜 ; 𝐵𝑜 )Φ . (10.2.12)

Since the state 𝜌 𝐴′ 𝐴𝐵′ is arbitrary, we conclude that

𝐸 A (N) ≤ 𝐸 ( 𝐴𝑖 ; 𝐵𝑖 )Φ = 𝐸 ( 𝐴; 𝐴′)Φ , (10.2.13)

𝐸 A (N) ≤ 𝐸 ( 𝐴𝑜 ; 𝐵𝑜 )Φ = 𝐸 (𝐵; 𝐵′)Φ , (10.2.14)

where for the last equality in each case we used 𝑑 𝐴𝑖 = 𝑑 𝐵𝑖 = 𝑑 𝐴′ = 𝑑 𝐴 and

𝑑 𝐴𝑜 = 𝑑 𝐵𝑜 = 𝑑 𝐵′ = 𝑑 𝐵 . We thus have

𝐸 A (N) ≤ min{𝐸 ( 𝐴; 𝐴′)Φ , 𝐸 (𝐵; 𝐵′)Φ }, (10.2.15)

as required.
2. Let N be an entanglement breaking channel. For every state 𝜌 𝐴′ 𝐴𝐵′ , let
𝜔 𝐴′ 𝐵𝐵′ = N 𝐴→𝐵 (𝜌 𝐴′ 𝐴𝐵′ ). Recall from Section 4.4.6, specifically Theorem 4.15,
that every entanglement breaking channel can be represented as a measurement

of the input system followed by the preparation of a state on the output system
conditioned on the outcome of the measurement. As such, every entanglement
breaking channel can be simulated by an LOCC channel. Therefore, by the
monotonicity of the entanglement measure 𝐸 under LOCC,

𝐸 ( 𝐴′; 𝐵𝐵′)𝜔 ≤ 𝐸 ( 𝐴′ 𝐴; 𝐵′) 𝜌 , (10.2.16)

which means that

𝐸 ( 𝐴′; 𝐵𝐵′)𝜔 − 𝐸 ( 𝐴′ 𝐴; 𝐵′) 𝜌 ≤ 𝐸 ( 𝐴′ 𝐴; 𝐵′) 𝜌 − 𝐸 ( 𝐴′ 𝐴; 𝐵′) 𝜌 = 0 (10.2.17)

for every state 𝜌 𝐴′ 𝐴𝐵′ . Therefore, 𝐸 A (N) ≤ 0. On the other hand, because
𝐸 vanishes for all separable states, it holds that 𝐸 (N) ≥ 0. Therefore, by
Lemma 10.4, 𝐸 A (N) ≥ 0, and we conclude that 𝐸 A (N) = 0.
Now, let 𝐸 be a faithful entanglement measure, meaning that it vanishes if
and only if the input state is separable, and suppose that 𝐸 A (N) = 0. By
Lemma 10.4, we have that 0 = 𝐸 A (N) ≥ 𝐸 (N), which in turn implies that
𝐸 (N) = 0 because 𝐸 (N) ≥ 0 for all channels N. Therefore, by Proposi-
tion 10.2, we conclude that N is entanglement breaking.
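3. Let 𝜌 𝐴′ 𝐴𝐵′ be an arbitrary state, and let
\[
\omega_{A'BB'} = \left( \sum_{x \in \mathcal{X}} p(x) \mathcal{N}^x_{A \to B} \right)(\rho_{A'AB'}) = \sum_{x \in \mathcal{X}} p(x) \omega^x_{A'BB'}, \tag{10.2.18}
\]

where $\omega^x_{A'BB'} = \mathcal{N}^x_{A \to B}(\rho_{A'AB'})$ for all 𝑥 ∈ X. Then, by convexity of the
entanglement measure 𝐸, we obtain
\[
E(A'; BB')_{\omega} \le \sum_{x \in \mathcal{X}} p(x) E(A'; BB')_{\omega^x}. \tag{10.2.19}
\]

Also, $E(A'A; B')_{\rho} = \sum_{x \in \mathcal{X}} p(x) E(A'A; B')_{\rho}$. Therefore, by the definition of
amortized entanglement, we find that

\[
\begin{aligned}
E(A'; BB')_{\omega} - E(A'A; B')_{\rho}
&\le \sum_{x \in \mathcal{X}} p(x) \left[ E(A'; BB')_{\omega^x} - E(A'A; B')_{\rho} \right] && \text{(10.2.20)--(10.2.21)} \\
&\le \sum_{x \in \mathcal{X}} p(x) E^{\mathcal{A}}(\mathcal{N}^x). && \text{(10.2.22)}
\end{aligned}
\]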


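Since the state 𝜌 𝐴′ 𝐴𝐵′ is arbitrary, by optimizing over every state 𝜌 𝐴′ 𝐴𝐵′ on the
left-hand side of the inequality above, we obtain
\[
E^{\mathcal{A}}\!\left( \sum_{x \in \mathcal{X}} p(x) \mathcal{N}^x \right) \le \sum_{x \in \mathcal{X}} p(x) E^{\mathcal{A}}(\mathcal{N}^x), \tag{10.2.23}
\]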

as required.
4. Let 𝐴1 and 𝐵1 denote the respective input and output systems for the quantum
channel N, and let 𝐴2 and 𝐵2 denote the respective input and output quantum
systems for the quantum channel M. Let 𝜌 𝐴′ 𝐴1 𝐴2 𝐵′ be an arbitrary state. Let

𝜔 𝐴′ 𝐵1 𝐵2 𝐵′ = (N 𝐴1 →𝐵1 ⊗ M 𝐴2 →𝐵2 )(𝜌 𝐴′ 𝐴1 𝐴2 𝐵′ ) (10.2.24)

= N 𝐴1 →𝐵1 (𝜏𝐴′ 𝐴1 𝐵2 𝐵′ ), (10.2.25)

where
\[
\tau_{A' A_1 B_2 B'} := \mathcal{M}_{A_2 \to B_2}(\rho_{A' A_1 A_2 B'}). \tag{10.2.26}
\]
Observe that the state 𝜔 𝐴′ 𝐵1 𝐵2 𝐵′ is both an example of an output state in the
optimization defining 𝐸 A (N ⊗ M) and in the optimization defining 𝐸 A (N)
(with an appropriate identification of the 𝐴′, 𝐵, and 𝐵′ systems for the latter).
Observe also that 𝜏𝐴′ 𝐴1 𝐵2 𝐵′ is an example of an output state in the optimization
defining 𝐸 A (M). Therefore,

𝐸 ( 𝐴′; 𝐵1 𝐵2 𝐵′)𝜔 − 𝐸 ( 𝐴′ 𝐴1 𝐴2 ; 𝐵′) 𝜌

= 𝐸 ( 𝐴′; 𝐵1 𝐵2 𝐵′)𝜔 − 𝐸 ( 𝐴′ 𝐴1 ; 𝐵2 𝐵′)𝜏
+ 𝐸 ( 𝐴′ 𝐴1 ; 𝐵2 𝐵′)𝜏 − 𝐸 ( 𝐴′ 𝐴1 𝐴2 ; 𝐵′) 𝜌 (10.2.27)
≤ 𝐸 A (N) + 𝐸 A (M). (10.2.28)

Since the state 𝜌 𝐴′ 𝐴1 𝐴2 𝐵′ is arbitrary, we can optimize over all such states on
the left-hand side of the inequality above to obtain

𝐸 A (N ⊗ M) ≤ 𝐸 A (N) + 𝐸 A (M). (10.2.29)

Now, suppose that 𝐸 is an additive entanglement measure. Let us restrict the
optimization over states 𝜌 𝐴′ 𝐴1 𝐴2 𝐵′ in the definition of 𝐸 A (N ⊗ M) such that
𝐴′ ≡ 𝐴′1 𝐴′2 , 𝐵′ ≡ 𝐵′1 𝐵′2 , and $\rho_{A' A_1 A_2 B'} = \rho^1_{A_1' A_1 B_1'} \otimes \rho^2_{A_2' A_2 B_2'}$, for states $\rho^1_{A_1' A_1 B_1'}$
and $\rho^2_{A_2' A_2 B_2'}$. Then,

\[
\begin{aligned}
\omega_{A' B_1 B_2 B'} &= (\mathcal{N}_{A_1 \to B_1} \otimes \mathcal{M}_{A_2 \to B_2})(\rho_{A' A_1 A_2 B'}) && \text{(10.2.30)} \\
&= \mathcal{N}_{A_1 \to B_1}(\rho^1_{A_1' A_1 B_1'}) \otimes \mathcal{M}_{A_2 \to B_2}(\rho^2_{A_2' A_2 B_2'}) && \text{(10.2.31)} \\
&=: \omega^1_{A_1' B_1 B_1'} \otimes \omega^2_{A_2' B_2 B_2'}. && \text{(10.2.32)}
\end{aligned}
\]

Therefore, using additivity of 𝐸, we obtain

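\[
\begin{aligned}
E^{\mathcal{A}}(\mathcal{N} \otimes \mathcal{M})
&= \sup_{\rho_{A' A_1 A_2 B'}} \{ E(A'; B_1 B_2 B')_{\omega} - E(A' A_1 A_2; B')_{\rho} \} && \text{(10.2.33)} \\
&\ge \sup_{\rho^1_{A_1' A_1 B_1'} \otimes \rho^2_{A_2' A_2 B_2'}} \{ E(A_1' A_2'; B_1 B_2 B_1' B_2')_{\omega^1 \otimes \omega^2} - E(A_1' A_2' A_1 A_2; B_1' B_2')_{\rho^1 \otimes \rho^2} \} && \text{(10.2.34)} \\
&\ge \sup_{\rho^1_{A_1' A_1 B_1'} \otimes \rho^2_{A_2' A_2 B_2'}} \{ E(A_1'; B_1 B_1')_{\omega^1} + E(A_2'; B_2 B_2')_{\omega^2} - E(A_1' A_1; B_1')_{\rho^1} - E(A_2' A_2; B_2')_{\rho^2} \} && \text{(10.2.35)} \\
&= \sup_{\rho^1_{A_1' A_1 B_1'}} \{ E(A_1'; B_1 B_1')_{\omega^1} - E(A_1' A_1; B_1')_{\rho^1} \} + \sup_{\rho^2_{A_2' A_2 B_2'}} \{ E(A_2'; B_2 B_2')_{\omega^2} - E(A_2' A_2; B_2')_{\rho^2} \} && \text{(10.2.36)} \\
&= E^{\mathcal{A}}(\mathcal{N}) + E^{\mathcal{A}}(\mathcal{M}). && \text{(10.2.37)}
\end{aligned}
\]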
Combining this with (10.2.29), we conclude that
𝐸 A (N ⊗ M) = 𝐸 A (N) + 𝐸 A (M), (10.2.38)
as required. ■

An immediate consequence of (10.2.5) is the following inequality:

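\[
\sup_{\mathcal{M}} \{ E^{\mathcal{A}}(\mathcal{N} \otimes \mathcal{M}) - E^{\mathcal{A}}(\mathcal{M}) \} \le E^{\mathcal{A}}(\mathcal{N}), \tag{10.2.39}
\]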
where the supremum is with respect to quantum channels M. This inequality
demonstrates that no other channel can help to enhance the amortized entanglement
of a quantum channel.

10.2.1 Amortized Entanglement and Teleportation Simulation

Teleportation (or, more generally, LOCC, separable, or PPT) simulation of a

quantum channel is a key tool that we can use to establish upper bounds on

capacities of certain quantum channels when they are assisted by LOCC. Recalling
Section 5.1.4, the basic idea behind this tool is that a quantum channel can be
simulated by the action of a teleportation protocol, with a maximally entangled
resource state shared between the sender 𝐴 and receiver 𝐵. More generally, recalling
Definition 4.25, a channel N 𝐴→𝐵 with input system 𝐴 and output system 𝐵 is
defined to be LOCC-simulable with associated resource state 𝜔 𝑅𝐵′ if the following
equality holds for all input states 𝜌 𝐴 :

N 𝐴→𝐵 (𝜌 𝐴 ) = L 𝐴𝑅𝐵′ →𝐵 (𝜌 𝐴 ⊗ 𝜔 𝑅𝐵′ ), (10.2.40)

where L 𝐴𝑅𝐵′ →𝐵 is a quantum channel consisting of LOCC between the sender,

who has systems 𝐴 and 𝑅, and the receiver, who has system 𝐵′. Whenever the
underlying state entanglement measure is subadditive, the amortized entanglement
of an LOCC-simulable channel can be bounded from above by the entanglement of
the resource state. In fact, this is precisely what we did when proving the dimension
bound in Proposition 10.5 above. We can therefore understand the dimension bound
as being a consequence of the fact that all channels are teleportation simulable, and
hence LOCC simulable, by using a maximally entangled resource state.

Proposition 10.6
Let 𝐸 be a subadditive state entanglement measure (recall (9.1.9)). If a quantum
channel N 𝐴→𝐵 is LOCC-simulable with associated resource state 𝜔 𝑅𝐵′ , i.e.,

N 𝐴→𝐵 (𝜌 𝐴 ) = L 𝐴𝑅𝐵′ →𝐵 (𝜌 𝐴 ⊗ 𝜔 𝑅𝐵′ ), (10.2.41)

where L 𝐴𝑅𝐵′ →𝐵 is an LOCC channel, then the amortized entanglement 𝐸 A (N)

of N is bounded from above by the entanglement of the resource state:

𝐸 A (N) ≤ 𝐸 (𝑅; 𝐵′)𝜔 . (10.2.42)

Proof: For every state 𝜌 𝐴′ 𝐴𝐵′′ , we use monotonicity of the state entanglement
measure under LOCC, as well as subadditivity of the measure, to obtain

𝐸 ( 𝐴′; 𝐵𝐵′′)L(𝜌⊗𝜔) − 𝐸 ( 𝐴′ 𝐴; 𝐵′′) 𝜌

≤ 𝐸 ( 𝐴′ 𝐴𝑅; 𝐵′′ 𝐵′) 𝜌⊗𝜔 − 𝐸 ( 𝐴′ 𝐴; 𝐵′′) 𝜌 (10.2.43)
≤ 𝐸 ( 𝐴′ 𝐴; 𝐵′′) 𝜌 + 𝐸 (𝑅; 𝐵′)𝜔 − 𝐸 ( 𝐴′ 𝐴; 𝐵′′) 𝜌 (10.2.44)
= 𝐸 (𝑅; 𝐵′)𝜔 , (10.2.45)


where for the first inequality we made use of LOCC monotonicity and for the
second inequality we made use of the assumption of subadditivity. Since the state
𝜌 𝐴′ 𝐴𝐵′′ was arbitrary, we conclude (10.2.42). ■

If it happens that a channel N 𝐴→𝐵 is LOCC-simulable with resource state

𝜔 𝑅𝐵′ = N 𝐴→𝐵′ (𝜌 𝑅 𝐴 ) for some state 𝜌 𝑅 𝐴 , then the inequality in (10.2.42) becomes
an equality. In Section 5.1.4, we saw an example in which such a situation arises,
namely, when the channel N is group covariant. In this case, the resource state is
simply the Choi state of the channel.
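
To make the Choi-state case concrete, the following sketch simulates a qubit depolarizing channel by running the standard teleportation protocol (Bell measurement on the input and reference systems, followed by a Pauli correction on the receiver's system) with the channel's Choi state as the resource. The depolarizing channel, the parameter 0.3, the test input, and all function names are assumptions made here for illustration; the depolarizing channel is used because it is covariant with respect to the Pauli operators.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

# |Phi> = (|00> + |11>)/sqrt(2)
phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

def depolarizing(rho, p):
    """Qubit depolarizing channel X -> (1 - p) X + p Tr[X] 1/2."""
    return (1 - p) * rho + p * np.trace(rho) * I2 / 2

def choi_state_depolarizing(p):
    """Resource state (id_R (x) N_{A->B'})(Phi_RA) for the depolarizing channel."""
    return (1 - p) * np.outer(phi, phi.conj()) + p * np.eye(4) / 4

def teleportation_simulation(rho_A, omega_RB):
    """Bell measurement on A R, then Pauli correction on the receiver's system."""
    joint = np.kron(rho_A, omega_RB)  # system order: A, R, B'
    out = np.zeros((2, 2), dtype=complex)
    for P in PAULIS:
        bell = np.kron(P, I2) @ phi                 # Bell vector on A R
        M = np.kron(bell.conj().reshape(1, 4), I2)  # <bell|_{AR} (x) 1, shape (2, 8)
        out += P @ (M @ joint @ M.conj().T) @ P.conj().T
    return out

p = 0.3
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
simulated = teleportation_simulation(rho, choi_state_depolarizing(p))
direct = depolarizing(rho, p)
print(np.allclose(simulated, direct))
```

The sender's operations act only on the input and reference systems and the receiver's correction depends only on the communicated measurement outcome, so the simulating map has the LOCC structure required in (10.2.40).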

Proposition 10.7
Let 𝐸 be an entanglement measure that is subadditive with respect to states and
zero on separable states, and let 𝐸 A denote its amortized version. If a channel
N 𝐴→𝐵 is LOCC-simulable with associated resource state 𝜔 𝑅𝐵 = N 𝐴→𝐵 (𝜌 𝑅 𝐴 )
for some input state 𝜌 𝑅 𝐴 , then the following equality holds

𝐸 A (N) = 𝐸 (𝑅; 𝐵)𝜔 . (10.2.46)

Proof: From Proposition 10.6, we have that 𝐸 A (N) ≤ 𝐸 (𝑅; 𝐵)𝜔 . For the reverse
inequality, we take 𝜌 𝐴′ 𝐴𝐵′′ = 𝜌 𝑅 𝐴 in the optimization that defines 𝐸 A (N), where
we identify 𝐴′ ≡ 𝑅 and 𝐵′′ ≡ ∅ (i.e., 𝐵′′ is a trivial one-dimensional system). Then,
N 𝐴→𝐵 (𝜌 𝐴′ 𝐴𝐵′′ ) = N 𝐴→𝐵 (𝜌 𝑅 𝐴 ), which is the resource state. Furthermore, since
𝐵′′ is a one-dimensional system, the state 𝜌 𝐴′ 𝐴𝐵′′ is trivially separable, so that
𝐸 ( 𝐴′ 𝐴; 𝐵′′) 𝜌 = 0. Therefore, 𝐸 A (N) ≥ 𝐸 ( 𝐴′; 𝐵)𝜔 ≡ 𝐸 (𝑅; 𝐵)𝜔 . ■

10.3 Generalized Divergence of Entanglement

In this section, we examine the generalized divergence of entanglement of quantum
channels, which is a channel entanglement measure that arises from the generalized
divergence of entanglement for quantum states that we considered in Section 9.2.


Definition 10.8 Generalized Divergence of Entanglement

Let 𝑫 be a generalized divergence (see Definition 7.15). For every quantum
channel N 𝐴→𝐵 , we define the generalized divergence of entanglement of N as

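\[
\begin{aligned}
\boldsymbol{E}(\mathcal{N}) &:= \sup_{\psi_{RA}} \boldsymbol{E}(R;B)_{\omega} && \text{(10.3.1)} \\
&= \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \mathrm{SEP}(R:B)} \boldsymbol{D}(\mathcal{N}_{A \to B}(\psi_{RA}) \| \sigma_{RB}), && \text{(10.3.2)}
\end{aligned}
\]

where 𝜔 𝑅𝐵 ≔ N 𝐴→𝐵 (𝜓 𝑅 𝐴 ). The supremum is with respect to every pure state
𝜓 𝑅 𝐴 , with the dimension of 𝑅 equal to the dimension of 𝐴.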

In the remark immediately after Definition 10.1, we stated how it suffices

to optimize with respect to pure bipartite states (with equal dimension for each
subsystem) when calculating the generalized divergence of entanglement of a
quantum channel.
We can write the generalized divergence of entanglement of N 𝐴→𝐵 in the
following alternate form:

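\[
\boldsymbol{E}(\mathcal{N}) = \sup_{\rho_A} \boldsymbol{E}(\mathcal{N}_{A \to B}, \rho_A), \tag{10.3.3}
\]
\[
\boldsymbol{E}(\mathcal{N}_{A \to B}, \rho_A) := \inf_{\sigma_{AB} \in \mathrm{SEP}(A:B)} \boldsymbol{D}\big( \sqrt{\rho_A}\, \Gamma^{\mathcal{N}}_{AB} \sqrt{\rho_A} \,\big\|\, \sigma_{AB} \big). \tag{10.3.4}
\]
This holds for three reasons: every purification of 𝜌 𝐴 can be written as
$(V_{A'} \sqrt{\rho_{A'}} \otimes \mathbb{1}_A) |\Gamma\rangle_{A'A}$ for some isometry $V_{A'}$ (see (2.2.38) and Theorem 2.3), the set of
separable states is invariant under local isometries, and generalized divergences
are invariant under local isometries (see Proposition 7.16).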
As with the state quantities, we are interested in the following generalized
divergences of entanglement of a quantum channel N 𝐴→𝐵 :
1. The relative entropy of entanglement of N,

\[
\begin{aligned}
E_R(\mathcal{N}) &:= \sup_{\psi_{SA}} E_R(S;B)_{\omega} && \text{(10.3.5)} \\
&= \sup_{\psi_{SA}} \inf_{\sigma_{SB} \in \mathrm{SEP}(S:B)} D(\mathcal{N}_{A \to B}(\psi_{SA}) \| \sigma_{SB}), && \text{(10.3.6)}
\end{aligned}
\]

where 𝜔 𝑆𝐵 = N 𝐴→𝐵 (𝜓 𝑆 𝐴 ).

2. The 𝜀-hypothesis testing relative entropy of entanglement of N,

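\[
\begin{aligned}
E^{\varepsilon}_R(\mathcal{N}) &:= \sup_{\psi_{RA}} E^{\varepsilon}_R(R;B)_{\omega} && \text{(10.3.7)} \\
&= \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \mathrm{SEP}(R:B)} D^{\varepsilon}_H(\mathcal{N}_{A \to B}(\psi_{RA}) \| \sigma_{RB}), && \text{(10.3.8)}
\end{aligned}
\]

where 𝜔 𝑅𝐵 = N 𝐴→𝐵 (𝜓 𝑅 𝐴 ).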
3. The sandwiched Rényi relative entropy of entanglement of N,
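\[
\begin{aligned}
\widetilde{E}_{\alpha}(\mathcal{N}) &:= \sup_{\psi_{RA}} \widetilde{E}_{\alpha}(R;B)_{\omega} && \text{(10.3.9)} \\
&= \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \mathrm{SEP}(R:B)} \widetilde{D}_{\alpha}(\mathcal{N}_{A \to B}(\psi_{RA}) \| \sigma_{RB}), && \text{(10.3.10)}
\end{aligned}
\]

where 𝜔 𝑅𝐵 = N 𝐴→𝐵 (𝜓 𝑅 𝐴 ) and 𝛼 ∈ [1/2, 1) ∪ (1, ∞). It follows from Proposition 7.31
that $\widetilde{E}_{\alpha}(\mathcal{N})$ is monotonically increasing in 𝛼. Also, in Appendix 10.A,
we prove that
\[
E_R(\mathcal{N}) = \lim_{\alpha \to 1^+} \widetilde{E}_{\alpha}(\mathcal{N}) \tag{10.3.11}
\]
for every quantum channel N.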
4. The max-relative entropy of entanglement of N,

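\[
\begin{aligned}
E_{\max}(\mathcal{N}) &:= \sup_{\psi_{RA}} E_{\max}(R;B)_{\omega} && \text{(10.3.12)} \\
&= \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \mathrm{SEP}(R:B)} D_{\max}(\mathcal{N}_{A \to B}(\psi_{RA}) \| \sigma_{RB}), && \text{(10.3.13)}
\end{aligned}
\]

where 𝜔 𝑅𝐵 = N 𝐴→𝐵 (𝜓 𝑅 𝐴 ). In Appendix 10.A, we prove that

\[
E_{\max}(\mathcal{N}) = \lim_{\alpha \to \infty} \widetilde{E}_{\alpha}(\mathcal{N}) \tag{10.3.14}
\]

for every quantum channel N.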

The generalized divergence of entanglement of a quantum channel satisfies all
of the general properties of a channel entanglement measure shown in Proposition 10.2
except for the superadditivity property, which holds when the generalized
divergence of entanglement of a bipartite state is superadditive. However, as shown
in Proposition 9.16, the generalized divergence of entanglement of a bipartite state is
generally only subadditive. Thus, neither the superadditivity nor the subadditivity of the
generalized divergence of entanglement of a channel immediately follows. In
Section 10.6 below, we show that the max-relative entropy of entanglement of a
quantum channel is subadditive, meaning that
\[
E_{\max}(\mathcal{N} \otimes \mathcal{M}) \le E_{\max}(\mathcal{N}) + E_{\max}(\mathcal{M}) \tag{10.3.15}
\]
for all quantum channels N and M.
The amortized generalized divergence of entanglement, defined according to
Definition 10.3 as
\[
\boldsymbol{E}^{\mathcal{A}}(\mathcal{N}) := \sup_{\rho_{A'AB'}} \{ \boldsymbol{E}(A'; BB')_{\omega} - \boldsymbol{E}(A'A; B')_{\rho} \}, \tag{10.3.16}
\]

where 𝜔 𝐴′ 𝐵𝐵′ = N 𝐴→𝐵 (𝜌 𝐴′ 𝐴𝐵′ ), satisfies all of the properties stated in Proposition 10.5.
In particular, it is subadditive. We show in Section 10.6 below that
$E_{\max}(\mathcal{N}) = E^{\mathcal{A}}_{\max}(\mathcal{N})$ for every quantum channel N, and this is what leads to the
subadditivity statement in (10.3.15).
For covariant channels, the optimization over pure input states in the generalized
divergence of entanglement can be simplified, as we now show. This simplification
is similar to the simplification that occurs for the generalized channel divergence
for jointly covariant channels (see Proposition 7.84).

Proposition 10.9 Generalized Divergence of Entanglement for Covariant

Channels
Let N 𝐴→𝐵 be a 𝐺-covariant quantum channel for a finite group 𝐺 (recall
Definition 4.18). Then, for every pure state 𝜓 𝑅 𝐴 , with the dimension of 𝑅 equal
to the dimension of 𝐴, we have that

\[
\boldsymbol{E}(R;B)_{\omega} \le \boldsymbol{E}(R;B)_{\omega^{\rho}}, \tag{10.3.17}
\]

where $\omega_{RB} := \mathcal{N}_{A \to B}(\psi_{RA})$, $\omega^{\rho}_{RB} := \mathcal{N}_{A \to B}(\phi^{\rho}_{RA})$,

\[
\rho_A = \frac{1}{|G|} \sum_{g \in G} U^g_A \psi_A U^{g\dagger}_A =: \mathcal{T}_G(\psi_A), \tag{10.3.18}
\]

and $\phi^{\rho}_{RA}$ is a purification of $\rho_A$. Consequently,

\[
\boldsymbol{E}(\mathcal{N}) = \sup_{\phi_{RA}} \{ \boldsymbol{E}(R;B)_{\omega} : \phi_A = \mathcal{T}_G(\phi_A) \}, \tag{10.3.19}
\]

where 𝜔 𝑅𝐵 = N 𝐴→𝐵 (𝜙 𝑅 𝐴 ). In other words, in order to calculate 𝑬 (N), it
suffices to optimize with respect to pure states 𝜙 𝑅 𝐴 such that the reduced state
𝜙 𝐴 is invariant under the channel T𝐺 defined in (10.3.18).

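Remark: Using (10.3.3), we can write (10.3.17) as

\[
\boldsymbol{E}(\mathcal{N}, \rho) \le \boldsymbol{E}(\mathcal{N}, \mathcal{T}_G(\rho)), \tag{10.3.20}
\]

which holds for every state 𝜌 acting on the input space of the channel N. We can write (10.3.19)
as
\[
\boldsymbol{E}(\mathcal{N}) = \sup_{\rho} \{ \boldsymbol{E}(\mathcal{N}, \rho) : \rho = \mathcal{T}_G(\rho) \}. \tag{10.3.21}
\]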
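
As a concrete instance of the twirling channel in (10.3.18), the sketch below takes the four qubit Pauli operators as the unitaries (an illustrative choice made here, relevant for channels that are covariant with respect to the Pauli operators) and applies the resulting twirl to a randomly generated qubit state standing in for the reduced state 𝜓 𝐴. The function name `twirl` and the random seed are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def twirl(rho):
    """Uniform average of U rho U^dagger over the four Pauli operators."""
    return sum(P @ rho @ P.conj().T for P in PAULIS) / len(PAULIS)

# A random pure qubit state standing in for the reduced state psi_A.
rng = np.random.default_rng(0)
v = rng.normal(size=2) + 1j * rng.normal(size=2)
psi_A = np.outer(v, v.conj()) / np.vdot(v, v).real

rho_A = twirl(psi_A)
print(np.round(rho_A, 12))
```

For this choice of unitaries the twirled state is the maximally mixed state, which is the only state invariant under the twirl, so the restricted optimization in (10.3.21) then involves a single input state.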

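Proof: The proof is similar to the proof of Proposition 7.84. Let 𝜓 𝑅 𝐴 be an
arbitrary pure state, let $\rho_A = \mathcal{T}_G(\psi_A)$ as in (10.3.18), and let $\phi^{\rho}_{RA}$ be a
purification of $\rho_A$. Another purification of $\rho_A$ is given by

\[
|\psi^{\rho}\rangle_{R'RA} := \frac{1}{\sqrt{|G|}} \sum_{g \in G} |g\rangle_{R'} \otimes (\mathbb{1}_R \otimes U^g_A) |\psi\rangle_{RA}, \tag{10.3.22}
\]

where {|𝑔⟩ 𝑅′ }𝑔∈𝐺 is an orthonormal basis for H 𝑅′ indexed by the elements of
𝐺. Since all purifications of a state can be mapped to each other by isometries
on the purifying systems, there exists an isometry $W_{R \to R'R}$ such that
$|\psi^{\rho}\rangle_{R'RA} = W_{R \to R'R} |\phi^{\rho}\rangle_{RA}$. Then, because the set SEP of separable states is invariant
under local isometries, for every state 𝜎𝑅𝐵 ∈ SEP(𝑅 : 𝐵) we have that
$\tau_{R'RB} := \mathcal{W}_{R \to R'R}(\sigma_{RB}) \in \mathrm{SEP}(R'R : B)$. Therefore,
\[
\begin{aligned}
\boldsymbol{D}(\mathcal{N}_{A \to B}(\psi^{\rho}_{R'RA}) \| \tau_{R'RB})
&= \boldsymbol{D}(\mathcal{N}_{A \to B}(\mathcal{W}_{R \to R'R}(\phi^{\rho}_{RA})) \| \mathcal{W}_{R \to R'R}(\sigma_{RB})) && \text{(10.3.23)--(10.3.24)} \\
&= \boldsymbol{D}(\mathcal{W}_{R \to R'R}(\mathcal{N}_{A \to B}(\phi^{\rho}_{RA})) \| \mathcal{W}_{R \to R'R}(\sigma_{RB})) && \text{(10.3.25)} \\
&= \boldsymbol{D}(\mathcal{N}_{A \to B}(\phi^{\rho}_{RA}) \| \sigma_{RB}), && \text{(10.3.26)}
\end{aligned}
\]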

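where, to obtain the last equality, we used the fact that any generalized divergence
is isometrically invariant (recall Proposition 7.16). Now, if we apply the dephasing
channel $X \mapsto \sum_{g \in G} |g\rangle\langle g| X |g\rangle\langle g|$ to the 𝑅′ system, then by the data-processing
inequality for the generalized divergence 𝑫, we obtain
\[
\begin{aligned}
&\boldsymbol{D}(\mathcal{N}_{A \to B}(\psi^{\rho}_{R'RA}) \| \tau_{R'RB}) \\
&\quad \ge \boldsymbol{D}\!\left( \frac{1}{|G|} \sum_{g \in G} |g\rangle\langle g|_{R'} \otimes (\mathcal{N}_{A \to B} \circ \mathcal{U}^g_A)(\psi_{RA}) \,\middle\|\, \sum_{g \in G} p(g) |g\rangle\langle g|_{R'} \otimes \tau^g_{RB} \right) && \text{(10.3.27)} \\
&\quad = \boldsymbol{D}\!\left( \frac{1}{|G|} \sum_{g \in G} |g\rangle\langle g|_{R'} \otimes ((\mathcal{V}^g_B)^{\dagger} \circ \mathcal{N}_{A \to B} \circ \mathcal{U}^g_A)(\psi_{RA}) \,\middle\|\, \sum_{g \in G} p(g) |g\rangle\langle g|_{R'} \otimes V^{g\dagger}_B \tau^g_{RB} V^g_B \right), && \text{(10.3.28)}
\end{aligned}
\]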
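where to obtain the last line we applied the unitary channel given by the unitary
$\sum_{g \in G} |g\rangle\langle g|_{R'} \otimes V^{g\dagger}_B$ and we used the fact that generalized divergences are invariant
under unitaries. Furthermore, we wrote the action of the dephasing channel on $\tau_{R'RB}$
as $\sum_{g \in G} p(g) |g\rangle\langle g|_{R'} \otimes \tau^g_{RB}$, where 𝑝 : 𝐺 → [0, 1] is a probability distribution and
$\{\tau^g_{RB}\}_{g \in G}$ is a set of states. This operator is in the set SEP(𝑅′ 𝑅 : 𝐵) because the
set SEP is closed under local channels. Next, due to the covariance of N, we have
that $(\mathcal{V}^g_B)^{\dagger} \circ \mathcal{N} \circ \mathcal{U}^g_A = \mathcal{N}$, so that

\[
\begin{aligned}
\boldsymbol{D}(\mathcal{N}_{A \to B}(\psi^{\rho}_{R'RA}) \| \tau_{R'RB})
&\ge \boldsymbol{D}\!\left( \frac{1}{|G|} \sum_{g \in G} |g\rangle\langle g|_{R'} \otimes \mathcal{N}_{A \to B}(\psi_{RA}) \,\middle\|\, \sum_{g \in G} p(g) |g\rangle\langle g|_{R'} \otimes V^{g\dagger}_B \tau^g_{RB} V^g_B \right) && \text{(10.3.29)--(10.3.30)} \\
&\ge \boldsymbol{D}\!\left( \mathcal{N}_{A \to B}(\psi_{RA}) \,\middle\|\, \sum_{g \in G} p(g) V^{g\dagger}_B \tau^g_{RB} V^g_B \right), && \text{(10.3.31)}
\end{aligned}
\]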
where to obtain the last inequality we used the data-processing inequality for $\boldsymbol{D}$ under the channel $\operatorname{Tr}_{R'}$. Now, observe that the state $\sum_{g\in G} p(g)\,|g\rangle\langle g|_{R'} \otimes V^{g\dagger}_{B}\tau^{g}_{RB}V^{g}_{B}$ is in the set $\mathrm{SEP}(R'R:B)$. This is due to the fact that $\sum_{g\in G} |g\rangle\langle g|_{R'} \otimes V^{g\dagger}_{B}$ is a controlled unitary, and since register $R'$ is classical, this controlled unitary can be implemented as an LOCC channel. Also, the set SEP is closed under LOCC channels. It follows then that $\sum_{g\in G} p(g)\,V^{g\dagger}_{B}\tau^{g}_{RB}V^{g}_{B} \in \mathrm{SEP}(R:B)$ because we obtain it from the previous separable state by applying a local partial trace over $R'$. By taking the infimum over every state $\tau_{RB} \in \mathrm{SEP}(R:B)$ in (10.3.31), we have that
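\begin{align}
\boldsymbol{D}(\mathcal{N}_{A\to B}(\phi^{\overline{\rho}}_{RA}) \Vert \sigma_{RB}) &= \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi^{\overline{\rho}}_{R'RA}) \Vert \tau_{R'RB}) \tag{10.3.32} \\
&\geq \inf_{\tau_{RB}\in\mathrm{SEP}(R:B)} \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \tau_{RB}) \tag{10.3.33} \\
&= \boldsymbol{E}(R;B)_{\omega}, \tag{10.3.34}
\end{align}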

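where $\omega_{RB} = \mathcal{N}_{A\to B}(\psi_{RA})$. This inequality holds for every state $\psi_{RA}$ and every state $\sigma_{RB} \in \mathrm{SEP}(R:B)$. Therefore, optimizing over all $\sigma_{RB} \in \mathrm{SEP}(R:B)$ leads to
\[ \inf_{\sigma_{RB}\in\mathrm{SEP}(R:B)} \boldsymbol{D}(\mathcal{N}_{A\to B}(\phi^{\overline{\rho}}_{RA}) \Vert \sigma_{RB}) = \boldsymbol{E}(R;B)_{\overline{\omega}} \geq \boldsymbol{E}(R;B)_{\omega}, \tag{10.3.35} \]
where $\overline{\omega}_{RB} = \mathcal{N}_{A\to B}(\phi^{\overline{\rho}}_{RA})$. This is precisely the inequality in (10.3.17).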
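Next, by construction, the state $\phi^{\overline{\rho}}_{RA}$ is such that its reduced state on $A$ is invariant under the channel $\mathcal{T}_G$. Optimizing over all such states leads to
\[ \sup_{\phi_{RA}}\{\boldsymbol{E}(R;B)_{\omega} : \phi_A = \mathcal{T}_G(\phi_A),\ \omega_{RB} = \mathcal{N}_{A\to B}(\phi_{RA})\} \geq \boldsymbol{E}(R;B)_{\omega}. \tag{10.3.36} \]
Since this inequality holds for every pure state $\psi_{RA}$, we finally obtain
\[ \sup_{\phi_{RA}}\{\boldsymbol{E}(R;B)_{\omega} : \phi_A = \mathcal{T}_G(\phi_A),\ \omega_{RB} = \mathcal{N}_{A\to B}(\phi_{RA})\} \geq \boldsymbol{E}(\mathcal{N}). \tag{10.3.37} \]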

Since the reverse inequality trivially holds, we obtain (10.3.19). ■

We saw in Section 9.2.1 that both the max-relative entropy of entanglement

and the hypothesis testing relative entropy of entanglement can be formulated as
cone programs. We now show that the max-relative entropy of entanglement for
channels can also be formulated as a cone program.

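Proposition 10.10 Cone Program for the Max-Relative Entropy of Entanglement of a Quantum Channel
Let $\mathcal{N}_{A\to B}$ be a quantum channel. Then
\[ E_{\max}(\mathcal{N}) = \log_2 \Sigma_{\max}(\mathcal{N}), \tag{10.3.38} \]
where
\[ \Sigma_{\max}(\mathcal{N}) := \inf_{Y_{SB}\in\widehat{\mathrm{SEP}}} \left\{ \left\| \operatorname{Tr}_B[Y_{SB}] \right\|_{\infty} : \Gamma^{\mathcal{N}}_{SB} \leq Y_{SB} \right\}, \tag{10.3.39} \]
and $\Gamma^{\mathcal{N}}_{SB}$ is the Choi operator of the channel $\mathcal{N}_{A\to B}$.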


Proof: Using the definition of a channel’s max-relative entropy of entanglement,

and the cone program formulation of the max-relative entropy of entanglement for
states from Proposition 9.21, we have

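\begin{align}
E_{\max}(\mathcal{N}) &= \sup_{\psi_{SA}} E_{\max}(S;B)_{\omega} \tag{10.3.40} \\
&= \sup_{\psi_{SA}} \inf_{X_{SB}\in\widehat{\mathrm{SEP}}} \log_2 \{\operatorname{Tr}[X_{SB}] : \mathcal{N}_{A\to B}(\psi_{SA}) \leq X_{SB}\}, \tag{10.3.41}
\end{align}
where $\omega_{SB} = \mathcal{N}_{A\to B}(\psi_{SA})$. Now, recall from (2.2.38) that an arbitrary pure bipartite state $\psi_{SA}$ can be written as $Z_S \Gamma_{SA} Z_S^{\dagger}$, where $\Gamma_{SA} = |\Gamma\rangle\langle\Gamma|_{SA}$, $|\Gamma\rangle_{SA} = \sum_{i=0}^{d_A-1} |i,i\rangle_{SA}$, and $Z_S$ is an operator satisfying $\operatorname{Tr}[Z_S^{\dagger} Z_S] = 1$. Then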

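\begin{align}
\mathcal{N}_{A\to B}(\psi_{SA}) &= \mathcal{N}_{A\to B}(Z_S \Gamma_{SA} Z_S^{\dagger}) \tag{10.3.42} \\
&= Z_S\, \mathcal{N}_{A\to B}(\Gamma_{SA})\, Z_S^{\dagger} \tag{10.3.43} \\
&= Z_S\, \Gamma^{\mathcal{N}}_{SB}\, Z_S^{\dagger}, \tag{10.3.44}
\end{align}
where $\Gamma^{\mathcal{N}}_{SB} = \mathcal{N}_{A\to B}(\Gamma_{SA})$ is the Choi operator of $\mathcal{N}_{A\to B}$. Since the set of operators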
𝑍 𝑆 satisfying 𝑍 𝑆† 𝑍 𝑆 > 0 and Tr[𝑍 𝑆† 𝑍 𝑆 ] = 1 is dense in the set of all operators
satisfying Tr[𝑍 𝑆† 𝑍 𝑆 ] = 1, we find that

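\[ E_{\max}(\mathcal{N}) = \log_2 \sup_{Z_S} \inf_{X_{SB}\in\widehat{\mathrm{SEP}}} \left\{ \operatorname{Tr}[X_{SB}] : Z_S \Gamma^{\mathcal{N}}_{SB} Z_S^{\dagger} \leq X_{SB},\ Z_S^{\dagger} Z_S > 0,\ \operatorname{Tr}[Z_S^{\dagger} Z_S] = 1 \right\}. \tag{10.3.45} \]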
Let us now make a change of variable, defining the variable 𝑌𝑆𝐵 according to the
relation 𝑋𝑆𝐵 = 𝑍 𝑆𝑌𝑆𝐵 𝑍 𝑆† . Then, since
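\begin{align}
Z_S \Gamma^{\mathcal{N}}_{SB} Z_S^{\dagger} \leq X_{SB} = Z_S Y_{SB} Z_S^{\dagger} \quad &\Longleftrightarrow \quad \Gamma^{\mathcal{N}}_{SB} \leq Y_{SB}, \tag{10.3.46} \\
X_{SB} \in \widehat{\mathrm{SEP}} \quad &\Longleftrightarrow \quad Y_{SB} \in \widehat{\mathrm{SEP}}, \tag{10.3.47}
\end{align}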

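we find that the optimization inside the logarithm in (10.3.45) is equal to
\begin{align}
&\sup_{Z_S} \inf_{Y_{SB}\in\widehat{\mathrm{SEP}}} \left\{ \operatorname{Tr}[Z_S Y_{SB} Z_S^{\dagger}] : \Gamma^{\mathcal{N}}_{SB} \leq Y_{SB},\ Z_S^{\dagger} Z_S > 0,\ \operatorname{Tr}[Z_S^{\dagger} Z_S] = 1 \right\} \notag \\
&\quad = \sup_{Z_S} \inf_{Y_{SB}\in\widehat{\mathrm{SEP}}} \left\{ \operatorname{Tr}[Z_S^{\dagger} Z_S Y_{SB}] : \Gamma^{\mathcal{N}}_{SB} \leq Y_{SB},\ Z_S^{\dagger} Z_S > 0,\ \operatorname{Tr}[Z_S^{\dagger} Z_S] = 1 \right\} \notag \\
&\quad = \sup_{\rho_S} \inf_{Y_{SB}\in\widehat{\mathrm{SEP}}} \left\{ \operatorname{Tr}[\rho_S Y_{SB}] : \Gamma^{\mathcal{N}}_{SB} \leq Y_{SB} \right\}, \tag{10.3.48}
\end{align}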

where in the last line we made the substitution 𝜌 𝑆 = 𝑍 𝑆† 𝑍 𝑆 , so that the optimization
is with respect to density operators. Furthermore, we have employed the fact that
the set of density operators satisfying 𝜌 𝑆 > 0 is dense in the set of all density
operators. Now observing that the objective function is linear in each of $\rho_S$ and $Y_{SB}$, the
set of density operators is compact and convex, and the set of separable operators
is convex, the Sion minimax theorem (Theorem 2.24) applies, such that we can
exchange the optimizations to find that
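\begin{align}
&\sup_{\rho_S} \inf_{Y_{SB}\in\widehat{\mathrm{SEP}}} \left\{ \operatorname{Tr}[\rho_S Y_{SB}] : \Gamma^{\mathcal{N}}_{SB} \leq Y_{SB} \right\} \notag \\
&\quad = \inf_{Y_{SB}\in\widehat{\mathrm{SEP}}} \sup_{\rho_S} \left\{ \operatorname{Tr}[\rho_S Y_{SB}] : \Gamma^{\mathcal{N}}_{SB} \leq Y_{SB} \right\} \tag{10.3.49} \\
&\quad = \inf_{Y_{SB}\in\widehat{\mathrm{SEP}}} \sup_{\rho_S} \left\{ \operatorname{Tr}[\rho_S \operatorname{Tr}_B[Y_{SB}]] : \Gamma^{\mathcal{N}}_{SB} \leq Y_{SB} \right\} \tag{10.3.50} \\
&\quad = \inf_{Y_{SB}\in\widehat{\mathrm{SEP}}} \left\{ \left\| \operatorname{Tr}_B[Y_{SB}] \right\|_{\infty} : \Gamma^{\mathcal{N}}_{SB} \leq Y_{SB} \right\} \tag{10.3.51} \\
&\quad = \Sigma_{\max}(\mathcal{N}). \tag{10.3.52}
\end{align}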

The second equality follows from partial trace, and the third follows because
∥ 𝑋 ∥ ∞ = sup 𝜌 Tr[𝑋 𝜌] for positive semi-definite operators 𝑋, where the optimization
is with respect to density operators (see (2.2.123)). ■
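The last two steps of the proof above, namely $\operatorname{Tr}[\rho_S Y_{SB}] = \operatorname{Tr}[\rho_S \operatorname{Tr}_B[Y_{SB}]]$ and $\|X\|_\infty = \sup_\rho \operatorname{Tr}[X\rho]$, can be checked numerically on a small case. The following is only a sketch in plain Python: the two-qubit operator `Y`, the state `rho`, and all helper names are hypothetical choices made for illustration, and the closed-form eigenvalue routine assumes a real symmetric 2x2 matrix with a non-zero off-diagonal entry.

```python
import math

def partial_trace_B(Y, dS, dB):
    """Tr_B[Y] for a (dS*dB) x (dS*dB) matrix whose row/column index is s*dB + b."""
    return [[sum(Y[s * dB + b][t * dB + b] for b in range(dB))
             for t in range(dS)]
            for s in range(dS)]

def kron_identity(rho, dB):
    """rho (x) 1_B, using the same index convention s*dB + b."""
    dS = len(rho)
    return [[rho[s][t] if b == c else 0.0
             for t in range(dS) for c in range(dB)]
            for s in range(dS) for b in range(dB)]

def trace_of_product(A, B):
    """Tr[A B] for two square matrices of equal size."""
    n = len(A)
    return sum(A[i][k] * B[k][i] for i in range(n) for k in range(n))

def largest_eigenpair_2x2(X):
    """Largest eigenvalue and unit eigenvector of a real symmetric 2x2 matrix (b != 0)."""
    a, b, d = X[0][0], X[0][1], X[1][1]
    lam = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)
    v = (b, lam - a)
    norm = math.hypot(v[0], v[1])
    return lam, (v[0] / norm, v[1] / norm)

# Hypothetical positive semi-definite operator Y_SB on two qubits (illustrative numbers).
Y = [[2.0, 0.0, 0.5, 0.0],
     [0.0, 1.0, 0.0, 0.5],
     [0.5, 0.0, 1.0, 0.0],
     [0.0, 0.5, 0.0, 3.0]]
# Hypothetical density operator rho_S (unit trace, positive definite).
rho = [[0.25, 0.1],
       [0.1, 0.75]]

X = partial_trace_B(Y, 2, 2)                       # Tr_B[Y_SB]
lhs = trace_of_product(kron_identity(rho, 2), Y)   # Tr[(rho_S (x) 1_B) Y_SB]
rhs = trace_of_product(rho, X)                     # Tr[rho_S Tr_B[Y_SB]]

op_norm, v = largest_eigenpair_2x2(X)              # ||Tr_B[Y_SB]||_inf, since X >= 0
rho_opt = [[v[0] * v[0], v[0] * v[1]],
           [v[1] * v[0], v[1] * v[1]]]             # rank-one state attaining the supremum
attained = trace_of_product(rho_opt, X)
```

For any feasible $Y_{SB}$, the value `op_norm` is the objective of (10.3.39) evaluated at that point; the infimum over feasible $Y_{SB}$ is not computed here.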

10.4 Generalized Rains Divergence

We now examine the generalized Rains divergence of quantum channels, which is a
channel entanglement measure that arises from the generalized Rains divergence
for bipartite quantum states that we considered in Section 9.3.

Definition 10.11 Generalized Rains Information of a Quantum Channel

Let $\boldsymbol{D}$ be a generalized divergence (see Definition 7.15). For every quantum channel $\mathcal{N}_{A\to B}$, we define the generalized Rains information of $\mathcal{N}$ as
\begin{align}
\boldsymbol{R}(\mathcal{N}) &:= \sup_{\psi_{SA}} \boldsymbol{R}(S;B)_{\omega} \tag{10.4.1} \\
&= \sup_{\psi_{SA}} \inf_{\sigma_{SB}\in\mathrm{PPT}'(S:B)} \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{SA}) \Vert \sigma_{SB}), \tag{10.4.2}
\end{align}
where $\omega_{SB} := \mathcal{N}_{A\to B}(\psi_{SA})$. The supremum is with respect to every pure state $\psi_{SA}$, with the dimension of $S$ the same as the dimension of $A$.

In the remark immediately after Definition 10.1 we show that it suffices to

optimize with respect to pure bipartite states (with equal dimension for each
subsystem) when calculating the generalized Rains divergence of a quantum
channel.
We can write the generalized Rains divergence of N 𝐴→𝐵 in the following
alternate form:

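\begin{align}
\boldsymbol{R}(\mathcal{N}) &= \sup_{\rho_A} \boldsymbol{R}(\mathcal{N}_{A\to B}, \rho_A), \tag{10.4.3} \\
\boldsymbol{R}(\mathcal{N}_{A\to B}, \rho_A) &:= \inf_{\sigma_{AB}\in\mathrm{PPT}'(A:B)} \boldsymbol{D}\!\left(\sqrt{\rho_A}\, \Gamma^{\mathcal{N}}_{AB} \sqrt{\rho_A} \,\middle\Vert\, \sigma_{AB}\right). \tag{10.4.4}
\end{align}
This is indeed true because we can write every purification of $\rho_A$ as $(V_S \sqrt{\rho_S} \otimes 1_A)|\Gamma\rangle_{SA}$ for some isometry $V_S$ (see (2.2.38) and Theorem 2.3), the set of $\mathrm{PPT}'$ operators is invariant under local isometries, and the generalized divergences are invariant under local isometries (see Proposition 7.16).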
As with the state quantities, we are interested in the following generalized Rains
information quantities for every quantum channel N 𝐴→𝐵 . For every case below, we
define 𝜔 𝑆𝐵 = N 𝐴→𝐵 (𝜓 𝑆 𝐴 ).
1. The Rains information of N,

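\begin{align}
R(\mathcal{N}) &:= \sup_{\psi_{SA}} R(S;B)_{\omega} \tag{10.4.5} \\
&= \sup_{\psi_{SA}} \inf_{\sigma_{SB}\in\mathrm{PPT}'(S:B)} D(\mathcal{N}_{A\to B}(\psi_{SA}) \Vert \sigma_{SB}). \tag{10.4.6}
\end{align}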

2. The 𝜀-hypothesis testing Rains information of N,

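\begin{align}
R^{\varepsilon}_{H}(\mathcal{N}) &:= \sup_{\psi_{SA}} R^{\varepsilon}_{H}(S;B)_{\omega} \tag{10.4.7} \\
&= \sup_{\psi_{SA}} \inf_{\sigma_{SB}\in\mathrm{PPT}'(S:B)} D^{\varepsilon}_{H}(\mathcal{N}_{A\to B}(\psi_{SA}) \Vert \sigma_{SB}), \tag{10.4.8}
\end{align}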


3. The sandwiched Rényi Rains information of N,

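\begin{align}
\widetilde{R}_{\alpha}(\mathcal{N}) &:= \sup_{\psi_{SA}} \widetilde{R}_{\alpha}(S;B)_{\omega} \tag{10.4.9} \\
&= \sup_{\psi_{SA}} \inf_{\sigma_{SB}\in\mathrm{PPT}'(S:B)} \widetilde{D}_{\alpha}(\mathcal{N}_{A\to B}(\psi_{SA}) \Vert \sigma_{SB}), \tag{10.4.10}
\end{align}
where $\alpha \in [1/2, 1) \cup (1, \infty)$. It follows from Proposition 7.31 that $\widetilde{R}_{\alpha}(\mathcal{N})$ is monotonically increasing in $\alpha$. Also, in Appendix 10.A, we prove that
\[ R(\mathcal{N}) = \lim_{\alpha\to 1} \widetilde{R}_{\alpha}(\mathcal{N}) \tag{10.4.11} \]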

for every quantum channel N.

4. The max-Rains information of N,

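\begin{align}
R_{\max}(\mathcal{N}) &:= \sup_{\psi_{SA}} R_{\max}(S;B)_{\omega} \tag{10.4.12} \\
&= \sup_{\psi_{SA}} \inf_{\sigma_{SB}\in\mathrm{PPT}'(S:B)} D_{\max}(\mathcal{N}_{A\to B}(\psi_{SA}) \Vert \sigma_{SB}). \tag{10.4.13}
\end{align}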

In Appendix 10.A, we prove that

\[ R_{\max}(\mathcal{N}) = \lim_{\alpha\to\infty} \widetilde{R}_{\alpha}(\mathcal{N}) \tag{10.4.14} \]

for every quantum channel N.

The generalized Rains divergence of a quantum channel satisfies all of the
properties of a channel entanglement measure laid out in Proposition 10.2, except
for faithfulness and superadditivity. Faithfulness generally does not hold because
the generalized Rains divergence of a bipartite quantum state is not faithful.
Superadditivity does not hold in general because the Rains divergence of a bipartite
quantum state is generally only subadditive. The max-Rains information of a
quantum channel, however, is additive, meaning that

𝑅max (N ⊗ M) = 𝑅max (N) + 𝑅max (M) (10.4.15)

for all quantum channels N and M. We defer a proof of this to Section 10.6 below.
The amortized generalized Rains divergence $\boldsymbol{R}^{\mathcal{A}}(\mathcal{N})$, defined according to Definition 10.3 as
\[ \boldsymbol{R}^{\mathcal{A}}(\mathcal{N}) := \sup_{\rho_{A'AB'}} \{\boldsymbol{R}(A';BB')_{\omega} - \boldsymbol{R}(A'A;B')_{\rho}\}, \tag{10.4.16} \]
where $\omega_{A'BB'} = \mathcal{N}_{A\to B}(\rho_{A'AB'})$, satisfies all of the properties stated in Proposition 10.5 except for faithfulness, because the generalized Rains divergence of a bipartite quantum state is not faithful. In particular, due to additivity of the max-Rains relative entropy (Proposition 9.29), we immediately obtain additivity of the amortized max-Rains information of a quantum channel, i.e.,
\[ R^{\mathcal{A}}_{\max}(\mathcal{N}\otimes\mathcal{M}) = R^{\mathcal{A}}_{\max}(\mathcal{N}) + R^{\mathcal{A}}_{\max}(\mathcal{M}) \tag{10.4.17} \]
for all quantum channels $\mathcal{N}$ and $\mathcal{M}$. We show in Section 10.6 below that
\[ R^{\mathcal{A}}_{\max}(\mathcal{N}) = R_{\max}(\mathcal{N}) \tag{10.4.18} \]
for every quantum channel $\mathcal{N}$, and it is this fact that leads to the additivity statement in (10.4.15).
For covariant channels, the optimization over pure input states in the generalized
Rains divergence simplifies in the same way as it does for the generalized divergence
of entanglement.

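Proposition 10.12 Generalized Rains Information for Covariant Channels
Let $\mathcal{N}_{A\to B}$ be a $G$-covariant quantum channel for a finite group $G$ (recall Definition 4.18). Then, for every pure state $\psi_{SA}$, with the dimension of $S$ equal to the dimension of $A$, we have that
\[ \boldsymbol{R}(S;B)_{\omega} \leq \boldsymbol{R}(S;B)_{\overline{\omega}}, \tag{10.4.19} \]
where $\omega_{SB} := \mathcal{N}_{A\to B}(\psi_{SA})$, $\overline{\omega}_{SB} := \mathcal{N}_{A\to B}(\phi^{\overline{\rho}}_{SA})$,
\[ \overline{\rho}_A = \frac{1}{|G|} \sum_{g\in G} U^{g}_{A}\, \rho_A\, U^{g\dagger}_{A} =: \mathcal{T}_G(\rho_A), \tag{10.4.20} \]
$\rho_A = \psi_A = \operatorname{Tr}_S[\psi_{SA}]$, and $\phi^{\overline{\rho}}_{SA}$ is a purification of $\overline{\rho}_A$. Consequently,
\[ \boldsymbol{R}(\mathcal{N}) = \sup_{\phi_{SA}} \{\boldsymbol{R}(S;B)_{\omega} : \phi_A = \mathcal{T}_G(\phi_A),\ \omega_{SB} = \mathcal{N}_{A\to B}(\phi_{SA})\}. \tag{10.4.21} \]
In other words, in order to calculate $\boldsymbol{R}(\mathcal{N})$, it suffices to optimize with respect to pure states $\phi_{SA}$ such that the reduced state $\phi_A$ is invariant under the channel $\mathcal{T}_G$ defined in (10.4.20).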


Remark: Using (10.4.3), we can write (10.4.19) as

𝑹(N, 𝜌) ≤ 𝑹(N, T𝐺 (𝜌)), (10.4.22)

which holds for every state 𝜌 acting on the input space of the channel N. We can write (10.4.21)
as
\[ \boldsymbol{R}(\mathcal{N}) = \sup_{\rho} \{\boldsymbol{R}(\mathcal{N},\rho) : \rho = \mathcal{T}_G(\rho)\}. \tag{10.4.23} \]

Proof: The proof is identical to the proof of Proposition 10.9, with the exception
that the set PPT’ is involved rather than the set SEP. The LOCC channels discussed
there preserve the set PPT’, and this is the main reason why the same proof
applies. ■
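To make the invariance condition $\rho = \mathcal{T}_G(\rho)$ concrete, the sketch below evaluates the twirling channel of (10.4.20) for one assumed choice of group action: conjugation of a single qubit by the four Pauli operators. This choice, the input state `rho`, and the helper names are illustrative assumptions and are not taken from the text. For this action the twirl sends every qubit state to the maximally mixed state, so the restricted optimization in (10.4.23) has a single feasible input.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(A):
    """Conjugate transpose of a square matrix."""
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

def twirl(rho, unitaries):
    """T_G(rho) = (1/|G|) * sum_g U^g rho (U^g)^dagger for the given list of unitaries."""
    n = len(rho)
    out = [[0j] * n for _ in range(n)]
    for U in unitaries:
        term = matmul(matmul(U, rho), dagger(U))
        for i in range(n):
            for j in range(n):
                out[i][j] += term[i][j] / len(unitaries)
    return out

# Assumed group action: conjugation by the four single-qubit Pauli operators.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
PAULIS = [I2, X, Y, Z]

# Hypothetical input state (Hermitian, unit trace, positive definite).
rho = [[0.7, 0.2 - 0.1j],
       [0.2 + 0.1j, 0.3]]

rho_bar = twirl(rho, PAULIS)            # equals identity / 2 for this action
rho_bar_twice = twirl(rho_bar, PAULIS)  # twirling again changes nothing
```

Under this assumption, any channel that is covariant with respect to the Pauli action (the depolarizing channel is a standard example) attains its generalized Rains information on a purification of the maximally mixed state, that is, on a maximally entangled input.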

Using the SDP formulation of max-Rains relative entropy in Proposition 9.27,

we arrive at an SDP formulation for the max-Rains information of a quantum
channel.

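Proposition 10.13 SDPs for Max-Rains Information of a Quantum Channel
Let $\mathcal{N}_{A\to B}$ be a quantum channel. Then
\[ R_{\max}(\mathcal{N}) = \log_2 \Gamma_{\max}(\mathcal{N}), \tag{10.4.24} \]
where
\begin{align}
\Gamma_{\max}(\mathcal{N}) &= \inf_{Y_{SB},V_{SB}\geq 0} \left\{ \left\| \operatorname{Tr}_B[V_{SB} + Y_{SB}] \right\|_{\infty} : \mathrm{T}_B(V_{SB} - Y_{SB}) \geq \Gamma^{\mathcal{N}}_{SB} \right\} \tag{10.4.25} \\
&= \sup_{X_{SB},\rho_S\geq 0} \left\{ \operatorname{Tr}[\Gamma^{\mathcal{N}}_{SB} X_{SB}] : \operatorname{Tr}[\rho_S] \leq 1,\ -\rho_S \otimes 1_B \leq \mathrm{T}_B(X_{SB}) \leq \rho_S \otimes 1_B \right\}. \tag{10.4.26}
\end{align}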

Proof: To arrive at (10.4.26), consider that, with 𝜔 𝑆𝐵 = N 𝐴→𝐵 (𝜓 𝑆 𝐴 ),

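\begin{align}
2^{R_{\max}(\mathcal{N})} &= \sup_{\psi_{SA}} 2^{R_{\max}(S;B)_{\omega}} \tag{10.4.27} \\
&= \sup_{\psi_{SA},\,X_{SB}} \left\{ \operatorname{Tr}[\mathcal{N}_{A\to B}(\psi_{SA}) X_{SB}] : \left\| \mathrm{T}_B(X_{SB}) \right\|_{\infty} \leq 1,\ X_{SB} \geq 0 \right\}, \tag{10.4.28}
\end{align}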

where the last equality follows from (9.3.47)–(9.3.49). Recall from (2.2.38) that an arbitrary pure bipartite state $\psi_{SA}$ can be written as $Z_S \Gamma_{SA} Z_S^{\dagger}$, where $\Gamma_{SA} = |\Gamma\rangle\langle\Gamma|_{SA}$, $|\Gamma\rangle_{SA} = \sum_{i=0}^{d_A-1} |i,i\rangle_{SA}$, and $Z_S$ is an operator satisfying $\operatorname{Tr}[Z_S^{\dagger} Z_S] = 1$. Then

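\begin{align}
\operatorname{Tr}[\mathcal{N}_{A\to B}(\psi_{SA}) X_{SB}] &= \operatorname{Tr}[\mathcal{N}_{A\to B}(Z_S \Gamma_{SA} Z_S^{\dagger}) X_{SB}] \tag{10.4.29} \\
&= \operatorname{Tr}[Z_S\, \mathcal{N}_{A\to B}(\Gamma_{SA})\, Z_S^{\dagger} X_{SB}] \tag{10.4.30} \\
&= \operatorname{Tr}[\Gamma^{\mathcal{N}}_{SB} Z_S^{\dagger} X_{SB} Z_S], \tag{10.4.31}
\end{align}
where $\Gamma^{\mathcal{N}}_{SB} = \mathcal{N}_{A\to B}(\Gamma_{SA})$ denotes the Choi operator of the channel $\mathcal{N}_{A\to B}$. Since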
the set of operators 𝑍 𝑆 satisfying 𝑍 𝑆† 𝑍 𝑆 > 0 and Tr[𝑍 𝑆† 𝑍 𝑆 ] = 1 is dense in the set of
all operators satisfying Tr[𝑍 𝑆† 𝑍 𝑆 ] = 1, we find that

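\begin{align}
2^{R_{\max}(\mathcal{N})} = \sup \{ &\operatorname{Tr}[\Gamma^{\mathcal{N}}_{SB} Z_S^{\dagger} X_{SB} Z_S] : \left\| \mathrm{T}_B(X_{SB}) \right\|_{\infty} \leq 1, \notag \\
&X_{SB} \geq 0,\ Z_S^{\dagger} Z_S > 0,\ \operatorname{Tr}[Z_S^{\dagger} Z_S] = 1 \}. \tag{10.4.32}
\end{align}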

Consider that 𝑋𝑆𝐵 ≥ 0 ⇔ 𝑍 𝑆† 𝑋𝑆𝐵 𝑍 𝑆 ≥ 0 and

∥T𝐵 (𝑋𝑆𝐵 ) ∥ ∞ ≤ 1
⇐⇒ − 1𝑆𝐵 ≤ T𝐵 (𝑋𝑆𝐵 ) ≤ 1𝑆𝐵 (10.4.33)
⇐⇒ − 𝑍 𝑆† 𝑍 𝑆 ⊗ 1𝐵 ≤ 𝑍 𝑆† T𝐵 (𝑋𝑆𝐵 )𝑍 𝑆 ≤ 𝑍 𝑆† 𝑍 𝑆 ⊗ 1𝐵 (10.4.34)
⇐⇒ − 𝑍 𝑆† 𝑍 𝑆 ⊗ 1𝐵 ≤ T𝐵 (𝑍 𝑆† 𝑋𝑆𝐵 𝑍 𝑆 ) ≤ 𝑍 𝑆† 𝑍 𝑆 ⊗ 1𝐵 . (10.4.35)

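We now set $X'_{SB} := Z_S^{\dagger} X_{SB} Z_S$ and $\rho_S = Z_S^{\dagger} Z_S > 0$ and rewrite as follows:
\begin{align}
2^{R_{\max}(\mathcal{N})} = \sup \{ &\operatorname{Tr}[\Gamma^{\mathcal{N}}_{SB} X'_{SB}] : -\rho_S \otimes 1_B \leq \mathrm{T}_B(X'_{SB}) \leq \rho_S \otimes 1_B, \notag \\
&X'_{SB} \geq 0,\ \rho_S > 0,\ \operatorname{Tr}[\rho_S] = 1 \}, \tag{10.4.36}
\end{align}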

which is the equality in (10.4.24) and (10.4.26), after observing that the set
{𝜌 𝑆 : 𝜌 𝑆 > 0, Tr[𝜌 𝑆 ] = 1} is dense in the set {𝜌 𝑆 : 𝜌 𝑆 ≥ 0, Tr[𝜌 𝑆 ] = 1}.
To arrive at the equality in (10.4.25), we employ the dual formulation of the
max-Rains relative entropy in (9.3.49). Consider that

\begin{align}
2^{R_{\max}(\mathcal{N})} &= \sup_{\psi_{SA}} 2^{R_{\max}(S;B)_{\omega}} \tag{10.4.37} \\
&= \sup_{\psi_{SA}} \inf_{K_{SB},L_{SB}\geq 0} \{ \operatorname{Tr}[K_{SB} + L_{SB}] : \mathrm{T}_B(K_{SB} - L_{SB}) \geq \mathcal{N}_{A\to B}(\psi_{SA}) \}. \tag{10.4.38}
\end{align}

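Making the same observations as we did previously, we have that $\mathcal{N}_{A\to B}(\psi_{SA}) = Z_S \Gamma^{\mathcal{N}}_{SB} Z_S^{\dagger}$, as well as
\[ \mathrm{T}_B(K_{SB} - L_{SB}) \geq \mathcal{N}_{A\to B}(\psi_{SA}) \quad \Longleftrightarrow \quad \mathrm{T}_B(K'_{SB} - L'_{SB}) \geq \Gamma^{\mathcal{N}}_{SB}, \tag{10.4.39} \]
where $K'_{SB}$ and $L'_{SB}$ are such that $K_{SB} = Z_S K'_{SB} Z_S^{\dagger}$ and $L_{SB} = Z_S L'_{SB} Z_S^{\dagger}$, respectively. Then $K_{SB}, L_{SB} \geq 0 \Longleftrightarrow K'_{SB}, L'_{SB} \geq 0$, and we find that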

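\begin{align}
2^{R_{\max}(\mathcal{N})} = \sup_{Z_S} \inf_{K'_{SB},L'_{SB}\geq 0} \{ &\operatorname{Tr}[Z_S K'_{SB} Z_S^{\dagger} + Z_S L'_{SB} Z_S^{\dagger}] : \mathrm{T}_B(K'_{SB} - L'_{SB}) \geq \Gamma^{\mathcal{N}}_{SB}, \notag \\
&Z_S^{\dagger} Z_S > 0,\ \operatorname{Tr}[Z_S^{\dagger} Z_S] = 1 \}. \tag{10.4.40}
\end{align}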

Employing cyclicity of trace, setting 𝜌 𝑆 = 𝑍 𝑆† 𝑍 𝑆 , and exploiting the fact that the set
{𝜌 𝑆 : 𝜌 𝑆 > 0, Tr[𝜌 𝑆 ] = 1} is dense in the set {𝜌 𝑆 : 𝜌 𝑆 ≥ 0, Tr[𝜌 𝑆 ] = 1}, we find
that

\begin{align}
2^{R_{\max}(\mathcal{N})} = \sup_{\rho_S} \inf_{K'_{SB},L'_{SB}} \{ &\operatorname{Tr}[\rho_S (K'_{SB} + L'_{SB})] : K'_{SB}, L'_{SB} \geq 0, \notag \\
&\mathrm{T}_B(K'_{SB} - L'_{SB}) \geq \Gamma^{\mathcal{N}}_{SB},\ \rho_S \geq 0,\ \operatorname{Tr}[\rho_S] = 1 \}. \tag{10.4.41}
\end{align}
The function that we are optimizing is linear in $\rho_S$ and jointly convex in $K'_{SB}$ and $L'_{SB}$ (the set of density operators, with respect to which the supremum is performed, is also compact), so that the minimax theorem (Theorem 2.24) applies and we can exchange sup with inf to find that

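\begin{align}
2^{R_{\max}(\mathcal{N})} = \inf_{K'_{SB},L'_{SB}} \sup_{\rho_S} \{ &\operatorname{Tr}[\rho_S (K'_{SB} + L'_{SB})] : K'_{SB}, L'_{SB} \geq 0, \notag \\
&\mathrm{T}_B(K'_{SB} - L'_{SB}) \geq \Gamma^{\mathcal{N}}_{SB},\ \rho_S \geq 0,\ \operatorname{Tr}[\rho_S] = 1 \}. \tag{10.4.42}
\end{align}
For fixed $K'_{SB}$ and $L'_{SB}$, consider that
\begin{align}
&\sup_{\rho_S} \{ \operatorname{Tr}[\rho_S (K'_{SB} + L'_{SB})] : \rho_S \geq 0,\ \operatorname{Tr}[\rho_S] = 1 \} \notag \\
&\quad = \sup_{\rho_S} \{ \operatorname{Tr}[\rho_S \operatorname{Tr}_B[K'_{SB} + L'_{SB}]] : \rho_S \geq 0,\ \operatorname{Tr}[\rho_S] = 1 \} \tag{10.4.43} \\
&\quad = \left\| \operatorname{Tr}_B[K'_{SB} + L'_{SB}] \right\|_{\infty}, \tag{10.4.44}
\end{align}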


where for the last line we used (2.2.123). Substituting back in, we find that
\[ 2^{R_{\max}(\mathcal{N})} = \inf_{K'_{SB},L'_{SB}} \left\{ \left\| \operatorname{Tr}_B[K'_{SB} + L'_{SB}] \right\|_{\infty} : K'_{SB}, L'_{SB} \geq 0,\ \mathrm{T}_B(K'_{SB} - L'_{SB}) \geq \Gamma^{\mathcal{N}}_{SB} \right\}, \tag{10.4.45} \]
as claimed in (10.4.25).
According to Theorem 2.28, strong duality holds by picking $V_{SB}$ and $Y_{SB}$ equal to the positive and negative parts of $\mathrm{T}_B(\Gamma^{\mathcal{N}}_{SB})$, respectively, which are feasible for (10.4.25). Furthermore, we can pick $\rho_S = 1_S/(2d_S)$ and $X_{SB} = 1_{SB}/(3d_S)$, which are strictly feasible for (10.4.26). ■
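As a sanity check on the two programs in Proposition 10.13, the following worked example evaluates both for the identity channel on a $d$-dimensional system. The choice of channel and the symbols $F_{SB}$ (swap operator), $\Pi_{\mathrm{sym}}$ and $\Pi_{\mathrm{asym}}$ (projectors onto the symmetric and antisymmetric subspaces) are introduced here only for illustration and do not appear in the text above.

```latex
% Assumed example: identity channel, so \Gamma^{\mathrm{id}}_{SB} = |\Gamma\rangle\langle\Gamma|_{SB}
% and \mathrm{T}_B(\Gamma_{SB}) = F_{SB} = \Pi_{\mathrm{sym}} - \Pi_{\mathrm{asym}}.
\begin{align*}
\text{inf form (10.4.25):}\quad
  & V_{SB} = \Pi_{\mathrm{sym}},\quad Y_{SB} = \Pi_{\mathrm{asym}}
    \;\Longrightarrow\; \mathrm{T}_B(V_{SB} - Y_{SB}) = \mathrm{T}_B(F_{SB}) = \Gamma_{SB}, \\
  & \left\| \operatorname{Tr}_B[V_{SB} + Y_{SB}] \right\|_{\infty}
    = \left\| \operatorname{Tr}_B[1_{SB}] \right\|_{\infty} = d, \\[4pt]
\text{sup form (10.4.26):}\quad
  & \rho_S = 1_S/d,\quad X_{SB} = \Gamma_{SB}/d
    \;\Longrightarrow\; -\rho_S \otimes 1_B \leq \mathrm{T}_B(X_{SB}) = F_{SB}/d \leq \rho_S \otimes 1_B, \\
  & \operatorname{Tr}[\Gamma_{SB} X_{SB}] = \operatorname{Tr}[\Gamma_{SB}^2]/d = d^2/d = d.
\end{align*}
```

The feasible value $d$ of the infimum matches the feasible value $d$ of the supremum, so in this example $\Gamma_{\max}(\mathrm{id}_d) = d$ and $R_{\max}(\mathrm{id}_d) = \log_2 d$.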

10.5 Squashed Entanglement

We now move on to the squashed entanglement of a quantum channel, which is a
channel entanglement measure that arises from the squashed entanglement of a
bipartite state, the latter defined in Section 9.4.

Definition 10.14 Squashed Entanglement of a Quantum Channel

For every quantum channel N 𝐴→𝐵 , we define the squashed entanglement of N
as

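\begin{align}
E_{\mathrm{sq}}(\mathcal{N}) &= \sup_{\psi_{RA}} E_{\mathrm{sq}}(R;B)_{\omega} \tag{10.5.1} \\
&= \frac{1}{2} \sup_{\psi_{RA}} \inf_{\tau_{RBE}} \{ I(R;B|E)_{\tau} : \operatorname{Tr}_E[\tau_{RBE}] = \omega_{RB} \}, \tag{10.5.2}
\end{align}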

where 𝜔 𝑅𝐵 = N 𝐴→𝐵 (𝜓 𝑅 𝐴 ) and the quantum conditional mutual information

𝐼 (𝑅; 𝐵|𝐸)𝜏 is defined as

𝐼 (𝑅; 𝐵|𝐸)𝜏 = 𝐻 (𝑅|𝐸)𝜏 + 𝐻 (𝐵|𝐸)𝜏 − 𝐻 (𝑅𝐵|𝐸)𝜏 . (10.5.3)
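For computations it can help to expand (10.5.3) into joint entropies. The short derivation below assumes the standard definition of conditional entropy, $H(X|E)_\tau = H(XE)_\tau - H(E)_\tau$, which is not restated in this section.

```latex
\begin{align*}
I(R;B|E)_{\tau}
  &= H(R|E)_{\tau} + H(B|E)_{\tau} - H(RB|E)_{\tau} \\
  &= \bigl[H(RE)_{\tau} - H(E)_{\tau}\bigr] + \bigl[H(BE)_{\tau} - H(E)_{\tau}\bigr]
     - \bigl[H(RBE)_{\tau} - H(E)_{\tau}\bigr] \\
  &= H(RE)_{\tau} + H(BE)_{\tau} - H(RBE)_{\tau} - H(E)_{\tau}.
\end{align*}
```

With a trivial (one-dimensional) system $E$, the extension $\tau_{RBE}$ reduces to $\omega_{RB}$ and the expression becomes the mutual information $I(R;B)_\omega$, so the infimum in (10.5.2) gives $E_{\mathrm{sq}}(R;B)_\omega \leq \tfrac{1}{2} I(R;B)_\omega$.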

We can write the squashed entanglement of a quantum channel in the following

alternate form:
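\begin{align}
E_{\mathrm{sq}}(\mathcal{N}) &= \sup_{\rho_A} E_{\mathrm{sq}}(\mathcal{N}, \rho), \tag{10.5.4} \\
E_{\mathrm{sq}}(\mathcal{N}, \rho) &:= E_{\mathrm{sq}}(R;B)_{\omega}, \tag{10.5.5}
\end{align}
where $\omega_{RB} = \mathcal{N}_{A\to B}(\psi^{\rho}_{RA})$, with $\psi^{\rho}_{RA}$ some purification of $\rho_A$. This is indeed true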
because the squashed entanglement of a bipartite state is invariant under isometries
and because all purifications of a given state are related to each other by local
isometries acting on the purifying system.
Let us now recall the following alternate expression for the squashed entangle-
ment of a bipartite state from (9.4.44):
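\[ E_{\mathrm{sq}}(A;B)_{\rho} = \frac{1}{2} \inf_{\mathcal{V}_{E'\to EF}} \left\{ H(B|E)_{\theta} + H(B|F)_{\theta} : \theta_{BEF} = \mathcal{V}_{E'\to EF}(\psi^{\rho}_{ABE'}) \right\}, \tag{10.5.6} \]
where $\psi^{\rho}_{ABE'}$ is some purification of $\rho_{AB}$. Now, given an input state $\rho_A$ of a quantum channel $\mathcal{N}_{A\to B}$, let $\phi_{RA}$ be a purification of $\rho_A$. Then, for the state $\omega_{RB} := \mathcal{N}_{A\to B}(\phi_{RA})$, we can take a purification to be $\psi_{RBE'} = \mathcal{U}^{\mathcal{N}}_{A\to BE'}(\phi_{RA})$, where $\mathcal{U}^{\mathcal{N}}$ is an isometric channel that extends $\mathcal{N}$. Then, for every isometric channel $\mathcal{V}_{E'\to EF}$, we can define the state
\[ \theta_{BEF} = \mathcal{V}_{E'\to EF}(\psi_{BE'}) = (\mathcal{V}_{E'\to EF} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE'})(\rho_A), \tag{10.5.7} \]
where $\psi_{BE'} = \operatorname{Tr}_R[\psi_{RBE'}]$. We then have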

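\[ E_{\mathrm{sq}}(\mathcal{N}, \rho_A) = E_{\mathrm{sq}}(R;B)_{\omega} = \frac{1}{2} \inf_{\mathcal{V}_{E'\to EF}} \left\{ H(B|E)_{\theta} + H(B|F)_{\theta} : \theta_{BEF} = (\mathcal{V}_{E'\to EF} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE'})(\rho_A) \right\} \tag{10.5.8} \]
for every state $\rho_A$. We thus obtain the following expression for the squashed entanglement of a channel:
\[ E_{\mathrm{sq}}(\mathcal{N}) = \frac{1}{2} \sup_{\rho_A} \inf_{\mathcal{V}_{E'\to EF}} \left\{ H(B|E)_{\theta} + H(B|F)_{\theta} : \theta_{BEF} = (\mathcal{V}_{E'\to EF} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE'})(\rho_A) \right\}. \tag{10.5.9} \]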

The squashed entanglement of a quantum channel, as well as its amortized version $E^{\mathcal{A}}_{\mathrm{sq}}$ defined as
\[ E^{\mathcal{A}}_{\mathrm{sq}}(\mathcal{N}) := \sup_{\rho_{A'AB'}} \{ E_{\mathrm{sq}}(A';BB')_{\omega} - E_{\mathrm{sq}}(A'A;B')_{\rho} \}, \tag{10.5.10} \]
with $\omega_{A'BB'} = \mathcal{N}_{A\to B}(\rho_{A'AB'})$, satisfy all of the properties stated in Proposition 10.2 and Proposition 10.5, respectively. In particular, because the squashed entanglement for states is additive (see (9.4.8)), we immediately have that the amortized squashed entanglement of a channel is additive, i.e.,
\[ E^{\mathcal{A}}_{\mathrm{sq}}(\mathcal{N}\otimes\mathcal{M}) = E^{\mathcal{A}}_{\mathrm{sq}}(\mathcal{N}) + E^{\mathcal{A}}_{\mathrm{sq}}(\mathcal{M}) \tag{10.5.11} \]
for all quantum channels $\mathcal{N}$ and $\mathcal{M}$. In Section 10.6 below, we prove that $E_{\mathrm{sq}}(\mathcal{N}) = E^{\mathcal{A}}_{\mathrm{sq}}(\mathcal{N})$ for every quantum channel $\mathcal{N}$, which then implies the additivity of the squashed entanglement of a channel, i.e.,

𝐸 sq (N ⊗ M) = 𝐸 sq (N) + 𝐸 sq (M) (10.5.12)

for all channels N and M.

Proposition 10.15
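Let $\mathcal{N}_{A\to B}$ be a quantum channel. The function $\rho \mapsto E_{\mathrm{sq}}(\mathcal{N}, \rho)$, where $E_{\mathrm{sq}}(\mathcal{N}, \rho)$ is defined in (10.5.5), is concave: for $\mathcal{X}$ a finite alphabet, $p : \mathcal{X} \to [0,1]$ a probability distribution on $\mathcal{X}$, and $\{\rho^{x}_{A}\}_{x\in\mathcal{X}}$ a set of states, the following inequality holds
\[ E_{\mathrm{sq}}\!\left( \mathcal{N}, \sum_{x\in\mathcal{X}} p(x)\, \rho^{x}_{A} \right) \geq \sum_{x\in\mathcal{X}} p(x)\, E_{\mathrm{sq}}(\mathcal{N}, \rho^{x}_{A}). \tag{10.5.13} \]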

Proof: In order to prove this, we make use of the expression for 𝐸 sq (N, 𝜌 𝐴 ) in
(10.5.8).
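For every state $\rho^{x}_{A}$, with $x \in \mathcal{X}$, define the state
\[ \theta^{x}_{BEF} := (\mathcal{V}_{E'\to EF} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE'})(\rho^{x}_{A}), \tag{10.5.14} \]
where $\mathcal{V}_{E'\to EF}$ is an arbitrary isometric channel. Then, using (10.5.8), we have
\[ E_{\mathrm{sq}}(\mathcal{N}, \rho^{x}_{A}) = E_{\mathrm{sq}}(R;B)_{\omega^{x}} \leq \frac{1}{2} \left( H(B|E)_{\theta^{x}} + H(B|F)_{\theta^{x}} \right). \tag{10.5.15} \]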

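Now, let
\begin{align}
\rho_A &:= \sum_{x\in\mathcal{X}} p(x)\, \rho^{x}_{A}, \tag{10.5.16} \\
\theta_{BEF} &:= \sum_{x\in\mathcal{X}} p(x)\, \theta^{x}_{BEF} \tag{10.5.17} \\
&= \sum_{x\in\mathcal{X}} p(x)\, (\mathcal{V}_{E'\to EF} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE'})(\rho^{x}_{A}) \tag{10.5.18} \\
&= (\mathcal{V}_{E'\to EF} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE'})(\rho_A). \tag{10.5.19}
\end{align}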

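Using concavity of conditional entropy (see (7.2.120)), we obtain
\begin{align}
\sum_{x\in\mathcal{X}} p(x)\, E_{\mathrm{sq}}(\mathcal{N}, \rho^{x}_{A}) &\leq \frac{1}{2} \sum_{x\in\mathcal{X}} p(x) \left( H(B|E)_{\theta^{x}} + H(B|F)_{\theta^{x}} \right) \tag{10.5.20} \\
&\leq \frac{1}{2} \left( H(B|E)_{\theta} + H(B|F)_{\theta} \right). \tag{10.5.21}
\end{align}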
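Finally, since the isometric channel $\mathcal{V}_{E'\to EF}$ is arbitrary, taking the infimum over all such channels on the right-hand side of the inequality above and using (10.5.19) gives us
\begin{align}
\sum_{x\in\mathcal{X}} p(x)\, E_{\mathrm{sq}}(\mathcal{N}, \rho^{x}_{A}) &\leq \frac{1}{2} \inf_{\mathcal{V}_{E'\to EF}} \{ H(B|E)_{\theta} + H(B|F)_{\theta} \} \tag{10.5.22} \\
&= E_{\mathrm{sq}}(\mathcal{N}, \rho_A), \tag{10.5.23}
\end{align}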

which is what we set out to prove. ■

10.6 Amortization Collapses

In Lemma 10.4, we proved the following relation between the entanglement 𝐸 (N)
of a channel N and its amortized entanglement 𝐸 A (N):

𝐸 (N) ≤ 𝐸 A (N). (10.6.1)

In general, therefore, amortization can yield a larger value for the entanglement of
a channel than the usual channel entanglement measure.
For which entanglement measures does the reverse inequality hold? In this
section, we investigate this question, and we prove that three of the channel
entanglement measures that we have considered in this chapter — max-relative
entropy of entanglement, max-Rains information, and squashed entanglement
— satisfy the reverse inequality. Thus, for these three entanglement measures,

amortization does not yield a higher entanglement value than the usual channel
entanglement measure. This so-called “amortization collapse” is important because
it immediately implies additivity of the usual channel entanglement measure.

10.6.1 Max-Relative Entropy of Entanglement

We start by proving that the amortization collapse occurs for max-relative entropy
of entanglement. The key tools in the proof are Propositions 9.21 and 10.10,
which provide cone programs for both the max-relative entropy of entanglement for
bipartite states and the max-relative entropy of entanglement for quantum channels.
Let us recall these now:

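\begin{align}
E_{\max}(A;B)_{\rho} &= \log_2 G_{\max}(A;B)_{\rho} \tag{10.6.2} \\
&= \log_2 \inf_{X_{AB}\in\widehat{\mathrm{SEP}}} \{ \operatorname{Tr}[X_{AB}] : \rho_{AB} \leq X_{AB} \}, \tag{10.6.3} \\
E_{\max}(\mathcal{N}) &= \log_2 \Sigma_{\max}(\mathcal{N}) \tag{10.6.4} \\
&= \log_2 \inf_{Y_{SB}\in\widehat{\mathrm{SEP}}} \left\{ \left\| \operatorname{Tr}_B[Y_{SB}] \right\|_{\infty} : \Gamma^{\mathcal{N}}_{SB} \leq Y_{SB} \right\}, \tag{10.6.5}
\end{align}
where $\rho_{AB}$ is a bipartite state and $\mathcal{N}$ is a quantum channel with Choi operator $\Gamma^{\mathcal{N}}_{SB}$.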

Theorem 10.16 Amortization Collapse for Max-Relative Entropy of Entanglement
Let $\mathcal{N}_{A\to B}$ be a quantum channel. For every state $\rho_{A'AB'}$,
\[ E_{\max}(A';BB')_{\omega} \leq E_{\max}(\mathcal{N}) + E_{\max}(A'A;B')_{\rho}, \tag{10.6.6} \]
where $\omega_{A'BB'} := \mathcal{N}_{A\to B}(\rho_{A'AB'})$. Consequently,
\[ E^{\mathcal{A}}_{\max}(\mathcal{N}) \leq E_{\max}(\mathcal{N}), \tag{10.6.7} \]
and thus (by Lemma 10.4), we have that
\[ E^{\mathcal{A}}_{\max}(\mathcal{N}) = E_{\max}(\mathcal{N}) \tag{10.6.8} \]
for every quantum channel $\mathcal{N}$.


Proof: Using the cone program formulations in (10.6.2)–(10.6.5), we find that the inequality in (10.6.6) is equivalent to
\[ G_{\max}(A';BB')_{\omega} \leq \Sigma_{\max}(\mathcal{N}) \cdot G_{\max}(A'A;B')_{\rho}. \tag{10.6.9} \]
We now set out to prove this inequality.

Using (10.6.3), we find that

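\[ G_{\max}(A'A;B')_{\rho} = \inf \operatorname{Tr}[C_{A'AB'}], \tag{10.6.10} \]
subject to the constraints
\begin{align}
C_{A'AB'} &\in \widehat{\mathrm{SEP}}(A'A:B'), \tag{10.6.11} \\
C_{A'AB'} &\geq \rho_{A'AB'}, \tag{10.6.12}
\end{align}
and
\[ G_{\max}(A';BB')_{\omega} = \inf \operatorname{Tr}[D_{A'BB'}], \tag{10.6.13} \]
subject to the constraints
\begin{align}
D_{A'BB'} &\in \widehat{\mathrm{SEP}}(A':BB'), \tag{10.6.14} \\
D_{A'BB'} &\geq \mathcal{N}_{A\to B}(\rho_{A'AB'}). \tag{10.6.15}
\end{align}
Furthermore, (10.6.5) gives that
\[ \Sigma_{\max}(\mathcal{N}) = \inf \left\| \operatorname{Tr}_B[Y_{SB}] \right\|_{\infty}, \tag{10.6.16} \]
subject to the constraints
\begin{align}
Y_{SB} &\in \widehat{\mathrm{SEP}}(S:B), \tag{10.6.17} \\
Y_{SB} &\geq \Gamma^{\mathcal{N}}_{SB}. \tag{10.6.18}
\end{align}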

With these optimizations in place, we can now establish the inequality in (10.6.9) by making a judicious choice for $D_{A'BB'}$. Let $C_{A'AB'}$ be an arbitrary operator to consider in the optimization for $G_{\max}(A'A;B')_{\rho}$ (i.e., satisfying (10.6.11)–(10.6.12)), and let $Y_{SB}$ be an arbitrary operator to consider in the optimization for $\Sigma_{\max}(\mathcal{N})$ (i.e., satisfying (10.6.17)–(10.6.18)). Let $|\Gamma\rangle_{SA} = \sum_{i=0}^{d_A-1} |i,i\rangle_{SA}$. Pick
\[ D_{A'BB'} = \langle\Gamma|_{SA} \left( C_{A'AB'} \otimes Y_{SB} \right) |\Gamma\rangle_{SA}. \tag{10.6.19} \]


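We need to prove that $D_{A'BB'}$ is feasible for $G_{\max}(A';BB')_{\omega}$. To this end, consider that
\begin{align}
\langle\Gamma|_{SA} \left( C_{A'AB'} \otimes Y_{SB} \right) |\Gamma\rangle_{SA} &\geq \langle\Gamma|_{SA} \left( \rho_{A'AB'} \otimes \Gamma^{\mathcal{N}}_{SB} \right) |\Gamma\rangle_{SA} \tag{10.6.20} \\
&= \mathcal{N}_{A\to B}(\rho_{A'AB'}), \tag{10.6.21}
\end{align}
which follows from (10.6.12), (10.6.18), and (4.2.6). Now, since $C_{A'AB'} \in \widehat{\mathrm{SEP}}(A'A:B')$, it can be written as $\sum_{x\in\mathcal{X}} P^{x}_{A'A} \otimes Q^{x}_{B'}$ for some finite alphabet $\mathcal{X}$ and for sets $\{P^{x}_{A'A}\}_{x\in\mathcal{X}}$, $\{Q^{x}_{B'}\}_{x\in\mathcal{X}}$ of positive semi-definite operators. Similarly, since $Y_{SB} \in \widehat{\mathrm{SEP}}(S:B)$, it can be written as $\sum_{y\in\mathcal{Y}} L^{y}_{S} \otimes M^{y}_{B}$, for some finite alphabet $\mathcal{Y}$ and sets $\{L^{y}_{S}\}_{y\in\mathcal{Y}}$, $\{M^{y}_{B}\}_{y\in\mathcal{Y}}$ of positive semi-definite operators. Then, using (2.2.40) and (2.2.41), we have that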

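\begin{align}
&\langle\Gamma|_{SA} \left( C_{A'AB'} \otimes Y_{SB} \right) |\Gamma\rangle_{SA} \notag \\
&\quad = \sum_{x\in\mathcal{X},\, y\in\mathcal{Y}} \langle\Gamma|_{SA} \left( P^{x}_{A'A} \otimes Q^{x}_{B'} \otimes L^{y}_{S} \otimes M^{y}_{B} \right) |\Gamma\rangle_{SA} \tag{10.6.22} \\
&\quad = \sum_{x\in\mathcal{X},\, y\in\mathcal{Y}} \langle\Gamma|_{SA} \left( P^{x}_{A'A} \mathrm{T}_A(L^{y}_{A}) \otimes Q^{x}_{B'} \otimes 1_S \otimes M^{y}_{B} \right) |\Gamma\rangle_{SA} \tag{10.6.23} \\
&\quad = \sum_{x\in\mathcal{X},\, y\in\mathcal{Y}} \operatorname{Tr}_A[P^{x}_{A'A} \mathrm{T}_A(L^{y}_{A})] \otimes Q^{x}_{B'} \otimes M^{y}_{B} \tag{10.6.24} \\
&\quad \in \widehat{\mathrm{SEP}}(A':BB'). \tag{10.6.25}
\end{align}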
For the second equality, we used the transpose trick from (2.2.40), and for the third, we used (2.2.41). The last statement follows because
\[ \operatorname{Tr}_A[P^{x}_{A'A} \mathrm{T}_A(L^{y}_{A})] = \operatorname{Tr}_A\!\left[ \sqrt{\mathrm{T}_A(L^{y}_{A})}\, P^{x}_{A'A} \sqrt{\mathrm{T}_A(L^{y}_{A})} \right] \tag{10.6.26} \]

is positive semi-definite for each 𝑥 ∈ X and 𝑦 ∈ Y. Thus, 𝐷 𝐴′ 𝐵𝐵′ is feasible for

𝐺 max ( 𝐴′; 𝐵𝐵′)𝜔 . Finally, using (2.2.41) again, consider that

𝐺 max ( 𝐴′; 𝐵𝐵′)𝜔 ≤ Tr[𝐷 𝐴′ 𝐵𝐵′ ] (10.6.27)

= Tr[⟨Γ| 𝑆 𝐴𝐶 𝐴′ 𝐴𝐵′ ⊗ 𝑌𝑆𝐵 |Γ⟩𝑆 𝐴 ] (10.6.28)
= Tr[𝐶 𝐴′ 𝐴𝐵′ T 𝐴 (𝑌 𝐴𝐵 )] (10.6.29)
= Tr[𝐶 𝐴′ 𝐴𝐵′ T 𝐴 (Tr 𝐵 [𝑌 𝐴𝐵 ])] (10.6.30)


For the second equality, we used the transpose trick from (2.2.40). Since 𝐶 𝐴′ 𝐴𝐵′
and 𝑌𝑆𝐵 are positive semi-definite (this follows from (10.6.12) and (10.6.18),
respectively), using (2.2.98) we have that

Tr[𝐶 𝐴′ 𝐴𝐵′ T 𝐴 (Tr 𝐵 [𝑌 𝐴𝐵 ])] = |Tr[𝐶 𝐴′ 𝐴𝐵′ T 𝐴 (Tr 𝐵 [𝑌 𝐴𝐵 ])] | (10.6.31)

≤ ∥𝐶 𝐴′ 𝐴𝐵′ ∥ 1 ∥T 𝐴 (Tr 𝐵 [𝑌 𝐴𝐵 ]) ∥ ∞ (10.6.32)
= Tr[𝐶 𝐴′ 𝐴𝐵′ ] ∥T 𝐴 (Tr 𝐵 [𝑌 𝐴𝐵 ]) ∥ ∞ (10.6.33)
= Tr[𝐶 𝐴′ 𝐴𝐵′ ] ∥Tr 𝐵 [𝑌 𝐴𝐵 ] ∥ ∞ , (10.6.34)

where for the last equality we used the fact that the spectrum of an operator is
invariant under the action of a full transpose (note, in this case, that T 𝐴 is a full
transpose because the operator Tr 𝐵 [𝑌 𝐴𝐵 ] acts only on 𝐴). Therefore,

𝐺 max ( 𝐴′; 𝐵𝐵′)𝜔 ≤ Tr[𝐶 𝐴′ 𝐴𝐵′ ] ∥Tr 𝐵 [𝑌 𝐴𝐵 ] ∥ ∞ . (10.6.35)

Since this inequality holds for all 𝐶 𝐴′ 𝐴𝐵′ satisfying (10.6.11)–(10.6.12) and for all
𝑌𝑆𝐵 satisfying (10.6.17)–(10.6.18), we conclude (10.6.9) after taking an infimum
with respect to all such operators.
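Having shown that
\begin{align}
&E_{\max}(A';BB')_{\omega} \leq E_{\max}(\mathcal{N}) + E_{\max}(A'A;B')_{\rho} \notag \\
&\quad \Longrightarrow \quad E_{\max}(A';BB')_{\omega} - E_{\max}(A'A;B')_{\rho} \leq E_{\max}(\mathcal{N}) \tag{10.6.36}
\end{align}
for every state $\rho_{A'AB'}$, it follows immediately from the definition of $E^{\mathcal{A}}_{\max}(\mathcal{N})$ that $E^{\mathcal{A}}_{\max}(\mathcal{N}) \leq E_{\max}(\mathcal{N})$. Combined with the result of Lemma 10.4, which is the reverse inequality, we obtain $E^{\mathcal{A}}_{\max}(\mathcal{N}) = E_{\max}(\mathcal{N})$. ■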
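The feasibility step (10.6.20)–(10.6.21) rests on the identity $\langle\Gamma|_{SA}(\rho_A \otimes \Gamma^{\mathcal{N}}_{SB})|\Gamma\rangle_{SA} = \sum_{i,j} \langle i|\rho_A|j\rangle\, \mathcal{N}(|i\rangle\langle j|) = \mathcal{N}(\rho_A)$ from (4.2.6). The sketch below checks it numerically for one assumed channel, a qubit amplitude damping channel with damping parameter `gamma`. The channel, the input state, and all function names are illustrative choices rather than anything fixed by the text.

```python
import math

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(A):
    """Conjugate transpose of a square matrix."""
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

def apply_kraus(kraus, rho):
    """N(rho) = sum_k K_k rho K_k^dagger."""
    d = len(rho)
    out = [[0j] * d for _ in range(d)]
    for K in kraus:
        term = matmul(matmul(K, rho), dagger(K))
        for a in range(d):
            for b in range(d):
                out[a][b] += term[a][b]
    return out

def choi_blocks(kraus, d):
    """blocks[i][j] = N(|i><j|), i.e. the block <i|_S Gamma^N_SB |j>_S of the Choi operator."""
    blocks = []
    for i in range(d):
        row = []
        for j in range(d):
            unit = [[1.0 if (a == i and b == j) else 0.0 for b in range(d)]
                    for a in range(d)]
            row.append(apply_kraus(kraus, unit))
        blocks.append(row)
    return blocks

def apply_via_choi(blocks, rho):
    """<Gamma|_SA (rho_A (x) Gamma^N_SB) |Gamma>_SA = sum_ij rho[i][j] * N(|i><j|)."""
    d = len(rho)
    out = [[0j] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            for a in range(d):
                for b in range(d):
                    out[a][b] += rho[i][j] * blocks[i][j][a][b]
    return out

# Assumed example channel: qubit amplitude damping with damping parameter gamma.
gamma = 0.3
K0 = [[1.0, 0.0], [0.0, math.sqrt(1 - gamma)]]
K1 = [[0.0, math.sqrt(gamma)], [0.0, 0.0]]
kraus = [K0, K1]

# Hypothetical input state (Hermitian, unit trace, positive definite).
rho = [[0.6, 0.2 + 0.1j],
       [0.2 - 0.1j, 0.4]]

direct = apply_kraus(kraus, rho)
via_choi = apply_via_choi(choi_blocks(kraus, 2), rho)
max_gap = max(abs(direct[a][b] - via_choi[a][b]) for a in range(2) for b in range(2))
```

Both routes are linear in `rho`, so `max_gap` should vanish up to floating-point rounding. The same contraction with $C_{A'AB'} \otimes Y_{SB}$ in place of $\rho_{A'AB'} \otimes \Gamma^{\mathcal{N}}_{SB}$ is what defines $D_{A'BB'}$ in (10.6.19).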

With the equality $E^{\mathcal{A}}_{\max}(\mathcal{N}) = E_{\max}(\mathcal{N})$ in hand, the subadditivity of max-relative entropy of entanglement of quantum channels immediately follows.

Corollary 10.17 Subadditivity of Max-Relative Entropy of Entanglement of a Quantum Channel
The max-relative entropy of entanglement of a quantum channel is subadditive:
for every two quantum channels N and M,

𝐸 max (N ⊗ M) ≤ 𝐸 max (N) + 𝐸 max (M). (10.6.37)


Proof: Given that the amortized entanglement of a quantum channel is always

subadditive, regardless of whether or not the underlying state entanglement measure
is additive (see Proposition 10.5), we have that
\[ E^{\mathcal{A}}_{\max}(\mathcal{N}\otimes\mathcal{M}) \leq E^{\mathcal{A}}_{\max}(\mathcal{N}) + E^{\mathcal{A}}_{\max}(\mathcal{M}) \tag{10.6.38} \]
for all quantum channels N and M. Using (10.6.8), we immediately obtain the
desired result. ■
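Written out as a single chain, the argument in the proof above applies (10.6.8) to the tensor-product channel, then (10.6.38), then (10.6.8) to each factor:

```latex
\begin{align*}
E_{\max}(\mathcal{N}\otimes\mathcal{M})
  &= E^{\mathcal{A}}_{\max}(\mathcal{N}\otimes\mathcal{M})
     && \text{by (10.6.8)} \\
  &\leq E^{\mathcal{A}}_{\max}(\mathcal{N}) + E^{\mathcal{A}}_{\max}(\mathcal{M})
     && \text{by (10.6.38)} \\
  &= E_{\max}(\mathcal{N}) + E_{\max}(\mathcal{M})
     && \text{by (10.6.8)}.
\end{align*}
```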

10.6.2 Max-Rains Information

Let us now prove that the amortization collapse also occurs for the max-Rains
information of a quantum channel. The key tools needed for the proof are
Propositions 9.27 and 10.13, which provide semi-definite programs for both the
max-Rains relative entropy for bipartite states and the max-Rains information for
quantum channels. Let us recall these now:
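\begin{align}
2^{R_{\max}(A;B)_{\rho}} &= W_{\max}(A;B)_{\rho} \tag{10.6.39} \\
&= \inf_{K_{AB},L_{AB}\geq 0} \{ \operatorname{Tr}[K_{AB} + L_{AB}] : \mathrm{T}_B[K_{AB} - L_{AB}] \geq \rho_{AB} \}, \tag{10.6.40} \\
2^{R_{\max}(\mathcal{N})} &= \Gamma_{\max}(\mathcal{N}) \tag{10.6.41} \\
&= \inf_{Y_{SB},V_{SB}\geq 0} \left\{ \left\| \operatorname{Tr}_B[V_{SB} + Y_{SB}] \right\|_{\infty} : \mathrm{T}_B[V_{SB} - Y_{SB}] \geq \Gamma^{\mathcal{N}}_{SB} \right\}, \tag{10.6.42}
\end{align}
where $\rho_{AB}$ is a bipartite state and $\mathcal{N}$ is a quantum channel with Choi operator $\Gamma^{\mathcal{N}}_{SB}$.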

Theorem 10.18 Amortization Collapse for Max-Rains Information

Let N 𝐴→𝐵 be a quantum channel. For every state 𝜌 𝐴′ 𝐴𝐵′ ,
𝑅max ( 𝐴′; 𝐵𝐵′)𝜔 ≤ 𝑅max (N) + 𝑅max ( 𝐴′ 𝐴; 𝐵′) 𝜌 , (10.6.43)
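where $\omega_{A'BB'} := \mathcal{N}_{A\to B}(\rho_{A'AB'})$. Consequently,
\[ R^{\mathcal{A}}_{\max}(\mathcal{N}) \leq R_{\max}(\mathcal{N}), \tag{10.6.44} \]
and thus (by Lemma 10.4), we have that
\[ R^{\mathcal{A}}_{\max}(\mathcal{N}) = R_{\max}(\mathcal{N}) \tag{10.6.45} \]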
for every quantum channel N.


Proof: The proof given below is conceptually similar to the proof of Theorem 10.16,
but it has some key differences.
Using the semi-definite program formulations in (10.6.39)–(10.6.42), we find
that the inequality in (10.6.43) is equivalent to

𝑊max ( 𝐴′; 𝐵𝐵′)𝜔 ≤ Γmax (N) · 𝑊max ( 𝐴′ 𝐴; 𝐵′) 𝜌 , (10.6.46)

and so we aim to prove this one.

Using (10.6.40), we have that

\[ W_{\max}(A'A;B')_{\rho} = \inf \operatorname{Tr}[C_{A'AB'} + D_{A'AB'}], \tag{10.6.47} \]

subject to the constraints

𝐶 𝐴′ 𝐴𝐵′ , 𝐷 𝐴′ 𝐴𝐵′ ≥ 0, (10.6.48)

T𝐵′ (𝐶 𝐴′ 𝐴𝐵′ − 𝐷 𝐴′ 𝐴𝐵′ ) ≥ 𝜌 𝐴′ 𝐴𝐵′ , (10.6.49)

and we also have

\[ W_{\max}(A';BB')_{\omega} = \inf \operatorname{Tr}[G_{A'BB'} + F_{A'BB'}], \tag{10.6.50} \]

subject to the constraints

𝐺 𝐴′ 𝐵𝐵′ , 𝐹𝐴′ 𝐵𝐵′ ≥ 0, (10.6.51)

N 𝐴→𝐵 (𝜌 𝐴′ 𝐴𝐵′ ) ≤ T𝐵𝐵′ [𝐺 𝐴′ 𝐵𝐵′ − 𝐹𝐴′ 𝐵𝐵′ ]. (10.6.52)

Using (10.6.42), we have that

Γmax (N) = inf ∥Tr 𝐵 [𝑉𝑆𝐵 + 𝑌𝑆𝐵 ] ∥ ∞ , (10.6.53)

subject to the constraints

𝑌𝑆𝐵 , 𝑉𝑆𝐵 ≥ 0, (10.6.54)

\[ \mathrm{T}_B[V_{SB} - Y_{SB}] \geq \Gamma^{\mathcal{N}}_{SB}. \tag{10.6.55} \]

With these SDP formulations in place, we can now establish the inequality in
(10.6.46) by making judicious choices for 𝐺 𝐴′ 𝐵𝐵′ and 𝐹𝐴′ 𝐵𝐵′ . Let 𝐶 𝐴′ 𝐴𝐵′ and 𝐷 𝐴′ 𝐴𝐵′
be arbitrary operators in the optimization for 𝑊max ( 𝐴′ 𝐴; 𝐵′) 𝜌 , and let 𝑌𝑆𝐵 and 𝑉𝑆𝐵

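be arbitrary operators in the optimization for $\Gamma_{\max}(\mathcal{N})$. Let $|\Gamma\rangle_{SA} = \sum_{i=0}^{d_A-1} |i,i\rangle_{SA}$. Pick
\begin{align}
G_{A'BB'} &= \langle\Gamma|_{SA} \left( C_{A'AB'} \otimes V_{SB} + D_{A'AB'} \otimes Y_{SB} \right) |\Gamma\rangle_{SA}, \tag{10.6.56} \\
F_{A'BB'} &= \langle\Gamma|_{SA} \left( C_{A'AB'} \otimes Y_{SB} + D_{A'AB'} \otimes V_{SB} \right) |\Gamma\rangle_{SA}. \tag{10.6.57}
\end{align}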
Note that 𝐺 𝐴′ 𝐵𝐵′ , 𝐹𝐴′ 𝐵𝐵′ ≥ 0 because 𝐶 𝐴′ 𝐴𝐵′ , 𝐷 𝐴′ 𝐴𝐵′ , 𝑌𝑆𝐵 , 𝑉𝑆𝐵 ≥ 0. Using
(10.6.49) and (10.6.55), consider that
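\begin{align}
&\mathrm{T}_{BB'}[G_{A'BB'} - F_{A'BB'}] \notag \\
&\quad = \mathrm{T}_{BB'}[\langle\Gamma|_{SA} (C_{A'AB'} - D_{A'AB'}) \otimes (V_{SB} - Y_{SB}) |\Gamma\rangle_{SA}] \tag{10.6.58} \\
&\quad = \langle\Gamma|_{SA}\, \mathrm{T}_{B'}[C_{A'AB'} - D_{A'AB'}] \otimes \mathrm{T}_B[V_{SB} - Y_{SB}]\, |\Gamma\rangle_{SA} \tag{10.6.59} \\
&\quad \geq \langle\Gamma|_{SA}\, \rho_{A'AB'} \otimes \Gamma^{\mathcal{N}}_{SB}\, |\Gamma\rangle_{SA} \tag{10.6.60} \\
&\quad = \mathcal{N}_{A\to B}(\rho_{A'AB'}), \tag{10.6.61}
\end{align}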
where the last equality follows from (4.2.6). Our choices of 𝐺 𝐴′ 𝐵𝐵′ and 𝐹𝐴′ 𝐵𝐵′
are thus feasible points for 𝑊max ( 𝐴′; 𝐵𝐵′)𝜔 . Using this, along with (2.2.40) and
(2.2.41), we obtain
𝑊max ( 𝐴′; 𝐵𝐵′)𝜔
≤ Tr[𝐺 𝐴′ 𝐵𝐵′ + 𝐹𝐴′ 𝐵𝐵′ ] (10.6.62)
= Tr[⟨Γ| 𝑆 𝐴 (𝐶 𝐴′ 𝐴𝐵′ + 𝐷 𝐴′ 𝐴𝐵′ ) ⊗ (𝑉𝑆𝐵 + 𝑌𝑆𝐵 )|Γ⟩𝑆 𝐴 ] (10.6.63)
= Tr[(𝐶 𝐴′ 𝐴𝐵′ + 𝐷 𝐴′ 𝐴𝐵′ )T 𝐴 (𝑉𝐴𝐵 + 𝑌 𝐴𝐵 )] (10.6.64)
= Tr[(𝐶 𝐴′ 𝐴𝐵′ + 𝐷 𝐴′ 𝐴𝐵′ )T 𝐴 (Tr 𝐵 [𝑉𝐴𝐵 + 𝑌 𝐴𝐵 ])]. (10.6.65)
The second equality follows from the transpose trick from (2.2.40). Now, since
𝐶 𝐴′ 𝐴𝐵′ , 𝐷 𝐴′ 𝐴𝐵′ ≥ 0 (recall (10.6.48)), and 𝑉𝐴𝐵 , 𝑌 𝐴𝐵 ≥ 0 (recall (10.6.54)), we can
use (2.2.98) to conclude that
Tr[(𝐶 𝐴′ 𝐴𝐵′ + 𝐷 𝐴′ 𝐴𝐵′ )T 𝐴 (Tr 𝐵 [𝑉𝐴𝐵 + 𝑌 𝐴𝐵 ])] (10.6.66)
= |Tr[(𝐶 𝐴′ 𝐴𝐵′ + 𝐷 𝐴′ 𝐴𝐵′ )T 𝐴 (Tr 𝐵 [𝑉𝐴𝐵 + 𝑌 𝐴𝐵 ])] | (10.6.67)
≤ ∥𝐶 𝐴′ 𝐴𝐵′ + 𝐷 𝐴′ 𝐴𝐵′ ∥ 1 ∥T 𝐴 (Tr 𝐵 [𝑉𝐴𝐵 + 𝑌 𝐴𝐵 ])∥ ∞ (10.6.68)
= Tr[𝐶 𝐴′ 𝐴𝐵′ + 𝐷 𝐴′ 𝐴𝐵′ ] ∥T 𝐴 (Tr 𝐵 [𝑉𝐴𝐵 + 𝑌 𝐴𝐵 ])∥ ∞ (10.6.69)
= Tr[𝐶 𝐴′ 𝐴𝐵′ + 𝐷 𝐴′ 𝐴𝐵′ ] ∥Tr 𝐵 [𝑉𝐴𝐵 + 𝑌 𝐴𝐵 ] ∥ ∞ , (10.6.70)
where the final equality follows because the spectrum of an operator is invariant
under the action of a (full) transpose (note, in this case, that T 𝐴 is a full transpose
because the operator Tr 𝐵 [𝑉𝐴𝐵 + 𝑌 𝐴𝐵 ] acts only on system 𝐴). We thus have
𝑊max ( 𝐴′; 𝐵𝐵′)𝜔 ≤ Tr[𝐶 𝐴′ 𝐴𝐵′ + 𝐷 𝐴′ 𝐴𝐵′ ] ∥T 𝐴 [Tr 𝐵 [𝑉𝐴𝐵 + 𝑌 𝐴𝐵 ]] ∥ ∞ (10.6.71)

Since this inequality holds for all 𝐶 𝐴′ 𝐴𝐵′ and 𝐷 𝐴′ 𝐴𝐵′ satisfying (10.6.49) and for
all 𝑉𝐴𝐵 and 𝑌 𝐴𝐵 satisfying (10.6.55), we conclude the inequality in (10.6.46).
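Having shown that
\begin{align}
&R_{\max}(A';BB')_{\omega} \leq R_{\max}(\mathcal{N}) + R_{\max}(A'A;B')_{\rho} \notag \\
&\quad \Longrightarrow \quad R_{\max}(A';BB')_{\omega} - R_{\max}(A'A;B')_{\rho} \leq R_{\max}(\mathcal{N}) \tag{10.6.72}
\end{align}
for every state $\rho_{A'AB'}$, it immediately follows from the definition of amortized entanglement of a channel that
\[ R^{\mathcal{A}}_{\max}(\mathcal{N}) \leq R_{\max}(\mathcal{N}). \tag{10.6.73} \]
Thus, by Lemma 10.4, which proves the reverse inequality, we obtain
\[ R^{\mathcal{A}}_{\max}(\mathcal{N}) = R_{\max}(\mathcal{N}) \tag{10.6.74} \]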

for every quantum channel N. ■
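The choices (10.6.56)–(10.6.57) are designed so that the difference and the sum of $G_{A'BB'}$ and $F_{A'BB'}$ both factor into tensor products, which is what (10.6.58) and (10.6.63) use. The expansion below spells this out; system labels are dropped for readability, which is a notational shortcut made here rather than in the text.

```latex
\begin{align*}
G - F &= \langle\Gamma| \left( C \otimes V + D \otimes Y - C \otimes Y - D \otimes V \right) |\Gamma\rangle
       = \langle\Gamma| (C - D) \otimes (V - Y) |\Gamma\rangle, \\
G + F &= \langle\Gamma| \left( C \otimes V + D \otimes Y + C \otimes Y + D \otimes V \right) |\Gamma\rangle
       = \langle\Gamma| (C + D) \otimes (V + Y) |\Gamma\rangle.
\end{align*}
```

The first line lets the two constraints (10.6.49) and (10.6.55) be applied factor by factor, and the second line lets the objective separate into $\operatorname{Tr}[C + D]$ times $\|\operatorname{Tr}_B[V + Y]\|_\infty$.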

With the equality $R^{\mathcal{A}}_{\max}(\mathcal{N}) = R_{\max}(\mathcal{N})$ in hand, along with additivity of max-Rains relative entropy for bipartite states, additivity of max-Rains information of a quantum channel immediately follows.

Corollary 10.19 Additivity of Max-Rains Information of a Channel

The max-Rains information of a quantum channel is additive: for every two
quantum channels N and M,

𝑅max (N ⊗ M) = 𝑅max (N) + 𝑅max (M). (10.6.75)

Proof: The additivity of max-Rains relative entropy for bipartite states (see
Proposition 9.29), along with Proposition 10.5, implies that the amortized
max-Rains information of a quantum channel is additive, meaning that
\[ R^{\mathcal{A}}_{\max}(\mathcal{N} \otimes \mathcal{M}) = R^{\mathcal{A}}_{\max}(\mathcal{N}) + R^{\mathcal{A}}_{\max}(\mathcal{M}) \tag{10.6.76} \]

for all quantum channels N and M. Then, from (10.6.45), we obtain the desired
result. ■

10.6.3 Squashed Entanglement

Finally, let us prove that the amortization collapse occurs for the squashed entangle-
ment of a quantum channel.

Theorem 10.20 Amortization Collapse for Squashed Entanglement

The squashed entanglement of a channel does not increase under amortization,
i.e.,
\[ E^{\mathcal{A}}_{\mathrm{sq}}(\mathcal{N}) = E_{\mathrm{sq}}(\mathcal{N}) \tag{10.6.77} \]
for every quantum channel N.

Proof: The inequality $E_{\mathrm{sq}}(\mathcal{N}) \leq E^{\mathcal{A}}_{\mathrm{sq}}(\mathcal{N})$ has already been shown in Lemma 10.4.
We thus prove the reverse inequality.
Let $\rho_{A'AB'}$ be an arbitrary state, and let $\omega_{A'BB'} = \mathcal{N}_{A\to B}(\rho_{A'AB'})$. Let $\mathcal{U}^{\mathcal{N}}_{A\to BE}$
be an isometric channel extending $\mathcal{N}_{A\to B}$, and let $\varphi_{A'AB'R}$ be a purification of
$\rho_{A'AB'}$. Then, the state $\theta_{A'BB'ER} := \mathcal{U}^{\mathcal{N}}_{A\to BE}(\varphi_{A'AB'R})$ is a purification of $\omega_{A'BB'}$.

As we show in Lemma 10.22 below, the following inequality holds

𝐸 sq ( 𝐴′; 𝐵𝐵′)𝜔 = 𝐸 sq ( 𝐴′; 𝐵𝐵′)𝜃 ≤ 𝐸 sq ( 𝐴′ 𝐵′ 𝑅; 𝐵)𝜃 + 𝐸 sq ( 𝐴′ 𝐵𝐸; 𝐵′)𝜃 . (10.6.78)

Now, squashed entanglement is invariant under the action of an isometric channel,
in particular $\mathcal{U}^{\mathcal{N}}_{A\to BE}$, so that $E_{\mathrm{sq}}(A'BE; B')_\theta = E_{\mathrm{sq}}(A'A; B')_\rho$. Also, the state
$\theta_{A'BB'ER}$ can be viewed as a particular state in the optimization over states that
defines $E_{\mathrm{sq}}(\mathcal{N})$ because $\theta_{A'BB'ER}$ is a purification of $\omega_{A'BB'}$. Therefore,

𝐸 sq ( 𝐴′ 𝐵′ 𝑅; 𝐵)𝜃 ≤ 𝐸 sq (N). (10.6.79)

Altogether, we thus have

𝐸 sq ( 𝐴′; 𝐵𝐵′)𝜔 ≤ 𝐸 sq (N) + 𝐸 sq ( 𝐴′ 𝐴; 𝐵′) 𝜌

=⇒ 𝐸 sq ( 𝐴′; 𝐵𝐵′)𝜔 − 𝐸 sq ( 𝐴′ 𝐴; 𝐵′) 𝜌 ≤ 𝐸 sq (N) (10.6.80)

for every state $\rho_{A'AB'}$, which means by definition of amortized entanglement that
\[ E^{\mathcal{A}}_{\mathrm{sq}}(\mathcal{N}) \leq E_{\mathrm{sq}}(\mathcal{N}), \tag{10.6.81} \]

which is what we sought to prove. ■

With the equality $E^{\mathcal{A}}_{\mathrm{sq}}(\mathcal{N}) = E_{\mathrm{sq}}(\mathcal{N})$ in hand, along with additivity of squashed
entanglement for bipartite states (Property 4. of Proposition 9.32), additivity of
squashed entanglement for channels immediately follows.

Corollary 10.21 Additivity of Squashed Entanglement of a Channel

The squashed entanglement of a quantum channel is additive: for every two
quantum channels N and M,

𝐸 sq (N ⊗ M) = 𝐸 sq (N) + 𝐸 sq (M). (10.6.82)

Proof: The additivity of the squashed entanglement for bipartite states (see
Proposition 9.32), along with Proposition 10.5, implies that the amortized squashed
entanglement of a quantum channel is additive, meaning that
\[ E^{\mathcal{A}}_{\mathrm{sq}}(\mathcal{N} \otimes \mathcal{M}) = E^{\mathcal{A}}_{\mathrm{sq}}(\mathcal{N}) + E^{\mathcal{A}}_{\mathrm{sq}}(\mathcal{M}) \tag{10.6.83} \]
for all quantum channels N and M. Then, from (10.6.77), we obtain the desired
result. ■

Lemma 10.22
Let 𝜓 𝐾 𝐿 1 𝐿 2 𝑀1 𝑀2 be a pure state. Then

𝐸 sq (𝐾; 𝐿 1 𝐿 2 )𝜓 ≤ 𝐸 sq (𝐾 𝐿 2 𝑀2 ; 𝐿 1 )𝜓 + 𝐸 sq (𝐾 𝐿 1 𝑀1 ; 𝐿 2 )𝜓 . (10.6.84)

Proof: We make use of the squashing channel representation of squashed
entanglement in (9.4.43), namely,
\[ E_{\mathrm{sq}}(A;B)_\rho = \frac{1}{2} \inf_{\mathcal{S}_{E'\to E}} \left\{ I(A;B|E)_\omega : \omega_{ABE} = \mathcal{S}_{E'\to E}(\psi^{\rho}_{ABE'}) \right\}, \tag{10.6.85} \]
where $\psi^{\rho}_{ABE'}$ is a purification of $\rho$ and the infimum is with respect to every quantum
channel $\mathcal{S}_{E'\to E}$. Let us also recall that
𝐼 ( 𝐴; 𝐵|𝐸)𝜔 = 𝐻 ( 𝐴|𝐸)𝜔 + 𝐻 (𝐵|𝐸)𝜔 − 𝐻 ( 𝐴𝐵|𝐸)𝜔 (10.6.86)
= 𝐻 (𝐵|𝐸)𝜔 − 𝐻 (𝐵| 𝐴𝐸)𝜔 , (10.6.87)
and that strong subadditivity (Theorem 7.6) is the statement that 𝐼 ( 𝐴; 𝐵|𝐸)𝜔 ≥ 0.
From this we obtain the following two inequalities:
\begin{align}
H(AB|E)_\omega &\leq H(A|E)_\omega + H(B|E)_\omega, \tag{10.6.88} \\
H(B|AE)_\omega &\leq H(B|E)_\omega. \tag{10.6.89}
\end{align}

Now, the given pure state $\psi_{KL_1L_2M_1M_2}$ can be thought of as a purification of
the reduced state $\psi_{KL_1L_2}$ for which the squashed entanglement $E_{\mathrm{sq}}(K; L_1L_2)_\psi$ is
evaluated, with the purifying systems being $M_1$ and $M_2$. Then, considering an
arbitrary product squashing channel $\mathcal{S}^1_{M_1\to M_1'} \otimes \mathcal{S}^2_{M_2\to M_2'}$, and letting
\[ \omega_{KL_1L_2M_1'M_2'} = (\mathcal{S}^1_{M_1\to M_1'} \otimes \mathcal{S}^2_{M_2\to M_2'})(\psi_{KL_1L_2M_1M_2}), \tag{10.6.90} \]

we find from (10.6.85) that

2𝐸 sq (𝐾; 𝐿 1 𝐿 2 )𝜓 ≤ 𝐼 (𝐾; 𝐿 1 𝐿 2 |𝑀1′ 𝑀2′ )𝜔 . (10.6.91)

Expanding the quantum conditional mutual information using (10.6.87), we have

that

𝐼 (𝐾; 𝐿 1 𝐿 2 |𝑀1′ 𝑀2′ )𝜔 = 𝐻 (𝐿 1 𝐿 2 |𝑀1′ 𝑀2′ )𝜔 − 𝐻 (𝐿 1 𝐿 2 |𝑀1′ 𝑀2′ 𝐾)𝜔 . (10.6.92)

Now, let 𝜙 𝐾 𝐿 1 𝐿 2 𝑀1′ 𝑀2′ 𝑅 be a purification of 𝜔 𝐾 𝐿 1 𝐿 2 𝑀1′ 𝑀2′ with purifying system 𝑅.
Then, by definition of conditional entropy, and using the fact that 𝜙 𝐾 𝐿 1 𝐿 2 𝑀1′ 𝑀2′ 𝑅 is
pure, we obtain¹

𝐻 (𝐿 1 𝐿 2 |𝑀1′ 𝑀2′ 𝐾)𝜔 = 𝐻 (𝐿 1 𝐿 2 𝑀1′ 𝑀2′ 𝐾)𝜔 − 𝐻 (𝑀1′ 𝑀2′ 𝐾)𝜔 (10.6.93)
= 𝐻 (𝑅) 𝜙 − 𝐻 (𝐿 1 𝐿 2 𝑅) 𝜙 (10.6.94)
= −𝐻 (𝐿 1 𝐿 2 |𝑅) 𝜙 . (10.6.95)
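The duality relation used in these steps can be checked numerically. The following is a minimal sketch, assuming a randomly drawn pure state on three systems labeled A, B, E with arbitrarily chosen dimensions; the helper names `entropy` and `reduced_state` are ours and purely illustrative.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits of a density matrix."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # drop numerically zero eigenvalues
    return float(-np.sum(evals * np.log2(evals)))

def reduced_state(psi, dims, keep):
    """Reduced density matrix of the pure state `psi` on the subsystems in `keep`."""
    keep = list(keep)
    rest = [i for i in range(len(dims)) if i not in keep]
    t = np.transpose(psi.reshape(dims), keep + rest)
    d_keep = int(np.prod([dims[i] for i in keep]))
    m = t.reshape(d_keep, -1)             # rows: kept systems, columns: traced-out systems
    return m @ m.conj().T

rng = np.random.default_rng(0)
dims = (2, 3, 4)                          # dimensions of A, B, E (illustrative choice)
n = int(np.prod(dims))
psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)

H = lambda keep: entropy(reduced_state(psi, dims, keep))
H_A_given_B = H([0, 1]) - H([1])          # H(A|B) = H(AB) - H(B)
H_A_given_E = H([0, 2]) - H([2])          # H(A|E) = H(AE) - H(E)
duality_gap = H_A_given_B + H_A_given_E   # should vanish for a pure state on ABE
```

Here the role of $A$ is played by $L_1L_2$, that of $B$ by $M_1'M_2'K$, and that of $E$ by $R$.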

Therefore,

2𝐸 sq (𝐾; 𝐿 1 𝐿 2 )𝜓 ≤ 𝐻 (𝐿 1 𝐿 2 |𝑀1′ 𝑀2′ ) 𝜙 + 𝐻 (𝐿 1 𝐿 2 |𝑅) 𝜙 . (10.6.96)

Using the inequality in (10.6.88) followed by two applications of (10.6.89) (with

appropriate identification of subsystems in all three cases), we obtain

𝐻 (𝐿 1 𝐿 2 |𝑀1′ 𝑀2′ ) 𝜙 ≤ 𝐻 (𝐿 1 |𝑀1′ 𝑀2′ ) 𝜙 + 𝐻 (𝐿 2 |𝑀1′ 𝑀2′ ) 𝜙 (10.6.97)

≤ 𝐻 (𝐿 1 |𝑀1′ ) 𝜙 + 𝐻 (𝐿 2 |𝑀2′ ) 𝜙 . (10.6.98)
¹ The steps in (10.6.93)–(10.6.95) establish a general fact called duality of conditional entropy:
for every pure state $\psi_{ABE}$, the following equality holds: $H(A|B)_\psi + H(A|E)_\psi = 0$.

Next, using (10.6.88), we conclude that

𝐻 (𝐿 1 𝐿 2 |𝑅) 𝜙 ≤ 𝐻 (𝐿 1 |𝑅) 𝜙 + 𝐻 (𝐿 2 |𝑅) 𝜙 . (10.6.99)
Therefore, continuing from (10.6.96), we have
2𝐸 sq (𝐾; 𝐿 1 𝐿 2 )𝜓 ≤ 𝐻 (𝐿 1 |𝑀1′ ) 𝜙 +𝐻 (𝐿 2 |𝑀2′ ) 𝜙 +𝐻 (𝐿 1 |𝑅) 𝜙 +𝐻 (𝐿 2 |𝑅) 𝜙 . (10.6.100)
Now, applying reasoning analogous to that in (10.6.93)–(10.6.95) for the last two
terms, we find that
\begin{align}
2E_{\mathrm{sq}}(K; L_1L_2)_\psi &\leq H(L_1|M_1')_\omega - H(L_1|KL_2M_1'M_2')_\omega \notag \\
&\qquad + H(L_2|M_2')_\omega - H(L_2|KL_1M_1'M_2')_\omega \tag{10.6.101} \\
&= I(KL_2M_2'; L_1|M_1')_\omega + I(KL_1M_1'; L_2|M_2')_\omega, \tag{10.6.102}
\end{align}
where in the last line we used the expression in (10.6.87) for the conditional mutual
information.
Now, let us regard $\psi_{KL_1L_2M_1M_2}$ as a purification of the reduced state $\psi_{KL_2M_2L_1}$,
for which the squashed entanglement $E_{\mathrm{sq}}(KL_2M_2; L_1)_\psi$ is evaluated, with the
purifying system being $M_1$. Then, the state
\[ \tau_{KL_2M_2L_1M_1'} := \mathcal{S}^1_{M_1\to M_1'}(\psi_{KL_2M_2L_1M_1}) \tag{10.6.103} \]
is a particular extension of $\psi_{KL_2M_2L_1}$ in the optimization for $E_{\mathrm{sq}}(KL_2M_2; L_1)_\psi$.

Similarly, we can regard $\psi_{KL_1L_2M_1M_2}$ as a purification of the reduced state $\psi_{KL_1M_1L_2}$,
for which the squashed entanglement $E_{\mathrm{sq}}(KL_1M_1; L_2)_\psi$ is evaluated, with the
purifying system being $M_2$. Then, the state
\[ \sigma_{KL_1M_1L_2M_2'} := \mathcal{S}^2_{M_2\to M_2'}(\psi_{KL_1M_1L_2M_2}) \tag{10.6.104} \]
is a particular extension of $\psi_{KL_1M_1L_2}$ in the optimization for $E_{\mathrm{sq}}(KL_1M_1; L_2)_\psi$.

Using all of this, we proceed from (10.6.102) to obtain
2𝐸 sq (𝐾; 𝐿 1 𝐿 2 )𝜓 ≤ 𝐼 (𝐾 𝐿 2 𝑀2′ ; 𝐿 1 |𝑀1′ )𝜔 + 𝐼 (𝐾 𝐿 1 𝑀1′ ; 𝐿 2 |𝑀2′ )𝜔 (10.6.105)
≤ 𝐼 (𝐾 𝐿 2 𝑀2 ; 𝐿 1 |𝑀1′ )𝜏 + 𝐼 (𝐾 𝐿 1 𝑀1 ; 𝐿 2 |𝑀2′ )𝜎 , (10.6.106)
where for the second inequality we used the data-processing inequality for conditional
mutual information (Proposition 7.9). Since the squashing channels $\mathcal{S}^1_{M_1\to M_1'}$
and $\mathcal{S}^2_{M_2\to M_2'}$ are arbitrary, optimizing over all such channels on the right-hand side
of the inequality above leads us to
𝐸 sq (𝐾; 𝐿 1 𝐿 2 )𝜓 ≤ 𝐸 sq (𝐾 𝐿 2 𝑀2 ; 𝐿 1 )𝜓 + 𝐸 sq (𝐾 𝐿 1 𝑀1 ; 𝐿 2 )𝜓 , (10.6.107)
as required. ■
10.7 Summary
... We considered two types of channel entanglement measures. The first type
quantifies the entanglement of a bipartite state after one share of it is sent through
the given quantum channel, in a manner analogous to the channel information
measures defined in Chapter 7. The second type of channel entanglement measure
is called amortized entanglement, which essentially quantifies the difference in
entanglement between a bipartite state and the state obtained after sending one
share of it through the given channel. The concept of amortized entanglement
turns out to play an important role in feedback-assisted communication scenarios
(as considered in Part III), as it can be used to prove important properties of
entanglement measures of the first kind....

10.8 Bibliographic Notes

...The entanglement of a quantum channel was presented by Takeoka et al. (2014);
Tomamichel et al. (2017), by employing the squashed entanglement and the Rains
relative entropy entanglement measures, respectively. The amortized entanglement
of a quantum channel has its roots in early work by Bennett et al. (2003); Leifer
et al. (2003), and it was formally defined and its various properties established
by Kaur and Wilde (2017). The work by Rigovacca et al. (2018) is related to the
notion of amortized entanglement. The connection of amortized entanglement to
teleportation simulation of quantum channels was elucidated by Kaur and Wilde
(2017). A channel’s relative entropy of entanglement was defined by Pirandola
et al. (2017), max-relative entropy of entanglement by Christandl and Müller-
Hermes (2017), and the hypothesis testing and sandwiched Rényi relative entropy
of entanglement by Wilde et al. (2017). Proposition 10.9 is based on (Tomamichel
et al., 2017, Proposition 2) and (Wilde et al., 2017, Proposition 14). The generalized
Rains information of a quantum channel was defined by Tomamichel et al. (2017),
who explicitly considered the Rains information and the sandwiched Rényi Rains
information as special cases. Tomamichel et al. (2016) focused on the hypothesis
testing Rains information of a quantum channel. Wang et al. (2019b) observed
that the quantity defined by Wang and Duan (2016b) is equal to the max-Rains
information of a quantum channel. The semi-definite program formulation of
the max-Rains information of a quantum channel in Proposition 10.13 is due

to Wang and Duan (2016b); Wang et al. (2019b). Proposition 10.12 is due
to Tomamichel et al. (2017). Concavity of a channel’s unoptimized squashed
entanglement (Proposition 10.15) is due to Takeoka et al. (2014). The amortization
collapse of max-relative entropy of entanglement and of max-Rains information
(Theorems 10.16 and 10.18, respectively) were shown by Berta and Wilde (2018),
with the amortization collapse of max-relative entropy implicitly considered by
Christandl and Müller-Hermes (2017). Lemma 10.22 is due to Takeoka et al. (2014),
and the explicit observation that Lemma 10.22 implies that amortization does not
increase the squashed entanglement of a quantum channel (Theorem 10.20) was
realized by Kaur and Wilde (2017). Corollary 10.21 was established by Takeoka
et al. (2014). ...

10.9 Problems

Appendix 10.A The 𝜶 → 1 and 𝜶 → ∞ Limits of the Sandwiched Rényi Entanglement Measures
In this section, we prove the 𝛼 → 1 and 𝛼 → ∞ limits of the sandwiched Rényi
state and channel entanglement measures that we have considered in this chapter.
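Specifically, we consider the limits of
\begin{align}
\widetilde{E}_\alpha(A;B)_\rho &= \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} \widetilde{D}_\alpha(\rho_{AB} \| \sigma_{AB}), \tag{10.A.1} \\
\widetilde{E}_\alpha(\mathcal{N}) &= \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \operatorname{SEP}(R:B)} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}), \tag{10.A.2} \\
\widetilde{R}_\alpha(A;B)_\rho &= \inf_{\sigma_{AB} \in \operatorname{PPT}'(A:B)} \widetilde{D}_\alpha(\rho_{AB} \| \sigma_{AB}), \tag{10.A.3} \\
\widetilde{R}_\alpha(\mathcal{N}) &= \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \operatorname{PPT}'(R:B)} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}), \tag{10.A.4}
\end{align}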

where 𝛼 ∈ [1/2, 1) ∪ (1, ∞).

We make consistent use throughout this section of the fact that the sandwiched
Rényi relative entropy $\widetilde{D}_\alpha$ is monotonically increasing in $\alpha$ for all $\alpha \in (0,1) \cup (1,\infty)$
(see Proposition 7.31), as well as the fact that $\lim_{\alpha\to 1} \widetilde{D}_\alpha = D$, where $D$ is
the quantum relative entropy (see Proposition 7.30). We also use the fact that
$\lim_{\alpha\to\infty} \widetilde{D}_\alpha = D_{\max}$ (see Proposition 7.61).
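These three facts can be illustrated numerically. The sketch below assumes the standard definitions $\widetilde{D}_\alpha(\rho\|\sigma) = \frac{1}{\alpha-1}\log_2 \operatorname{Tr}[(\sigma^{\frac{1-\alpha}{2\alpha}} \rho\, \sigma^{\frac{1-\alpha}{2\alpha}})^\alpha]$ and $D_{\max}(\rho\|\sigma) = \log_2 \lambda_{\max}(\sigma^{-1/2}\rho\,\sigma^{-1/2})$ for full-rank states; the random test states and all function names are illustrative choices of ours.

```python
import numpy as np

def mfun(X, f):
    """Apply the scalar function f to the eigenvalues of a Hermitian matrix X."""
    w, V = np.linalg.eigh(X)
    return (V * f(w)) @ V.conj().T

def sandwiched_renyi(rho, sigma, alpha):
    """Sandwiched Renyi relative entropy in bits, for full-rank rho and sigma."""
    s = mfun(sigma, lambda w: w ** ((1 - alpha) / (2 * alpha)))
    M = s @ rho @ s
    w = np.clip(np.linalg.eigvalsh((M + M.conj().T) / 2), 1e-300, None)
    logs = alpha * np.log2(w)
    top = logs.max()
    log2_q = top + np.log2(np.sum(2.0 ** (logs - top)))   # stable log2 Tr[M^alpha]
    return float(log2_q / (alpha - 1))

def relative_entropy(rho, sigma):
    """Quantum relative entropy D(rho||sigma) in bits, for full-rank states."""
    diff = mfun(rho, np.log2) - mfun(sigma, np.log2)
    return float(np.real(np.trace(rho @ diff)))

def max_relative_entropy(rho, sigma):
    """D_max(rho||sigma) in bits, for full-rank sigma."""
    s = mfun(sigma, lambda w: w ** (-0.5))
    M = s @ rho @ s
    return float(np.log2(np.linalg.eigvalsh((M + M.conj().T) / 2).max()))

def random_full_rank_state(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    rho = rho / np.trace(rho).real
    return 0.9 * rho + 0.1 * np.eye(d) / d    # mix with identity to stay well conditioned

rng = np.random.default_rng(7)
rho = random_full_rank_state(3, rng)
sigma = random_full_rank_state(3, rng)
alphas = [0.5, 0.75, 0.9999, 1.0001, 2.0, 10.0, 1000.0]
values = [sandwiched_renyi(rho, sigma, a) for a in alphas]
D = relative_entropy(rho, sigma)
D_max = max_relative_entropy(rho, sigma)
```

The list `values` should be nondecreasing, with the entries near $\alpha = 1$ bracketing `D` and the entry at large $\alpha$ approaching `D_max` from below.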

10.A.0.1 𝜶 → 1 Limits

We start by showing that
\[ \lim_{\alpha\to 1} \widetilde{E}_\alpha(A;B)_\rho = E_R(A;B)_\rho = \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} D(\rho_{AB} \| \sigma_{AB}). \tag{10.A.5} \]

The proof of
\[ \lim_{\alpha\to 1} \widetilde{R}_\alpha(A;B)_\rho = R(A;B)_\rho = \inf_{\sigma_{AB} \in \operatorname{PPT}'(A:B)} D(\rho_{AB} \| \sigma_{AB}) \tag{10.A.6} \]

is analogous, and so we omit it.

When approaching one from above, due to monotonicity in $\alpha$ of $\widetilde{D}_\alpha$
(Proposition 7.31), we have that
\[ \lim_{\alpha\to 1^+} \widetilde{D}_\alpha = \inf_{\alpha \in (1,\infty)} \widetilde{D}_\alpha = D. \tag{10.A.7} \]

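Therefore, we readily obtain
\begin{align}
\lim_{\alpha\to 1^+} \widetilde{E}_\alpha(A;B)_\rho &= \inf_{\alpha \in (1,\infty)} \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} \widetilde{D}_\alpha(\rho_{AB} \| \sigma_{AB}) \tag{10.A.8} \\
&= \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} \inf_{\alpha \in (1,\infty)} \widetilde{D}_\alpha(\rho_{AB} \| \sigma_{AB}) \tag{10.A.9} \\
&= \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} D(\rho_{AB} \| \sigma_{AB}) \tag{10.A.10} \\
&= E_R(A;B)_\rho. \tag{10.A.11}
\end{align}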
Now, when approaching one from below, we have
\[ \lim_{\alpha\to 1^-} \widetilde{D}_\alpha = \sup_{\alpha \in (0,1)} \widetilde{D}_\alpha = D. \tag{10.A.12} \]

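Therefore,
\[ \lim_{\alpha\to 1^-} \widetilde{E}_\alpha(A;B)_\rho = \sup_{\alpha \in (1/2,1)} \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} \widetilde{D}_\alpha(\rho_{AB} \| \sigma_{AB}). \tag{10.A.13} \]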

Now, we apply Theorem 2.25. Specifically, we can apply the theorem in order to
exchange the order of the infimum and supremum because the function
\[ (\sigma_{AB}, \alpha) \mapsto \widetilde{D}_\alpha(\rho_{AB} \| \sigma_{AB}) \tag{10.A.14} \]
is continuous in the first argument and monotonically increasing in the second
argument (also, the set of separable states is compact). We thus obtain
\begin{align}
\lim_{\alpha\to 1^-} \widetilde{E}_\alpha(A;B)_\rho &= \sup_{\alpha \in (1/2,1)} \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} \widetilde{D}_\alpha(\rho_{AB} \| \sigma_{AB}) \tag{10.A.15} \\
&= \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} \sup_{\alpha \in (1/2,1)} \widetilde{D}_\alpha(\rho_{AB} \| \sigma_{AB}) \tag{10.A.16} \\
&= \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} D(\rho_{AB} \| \sigma_{AB}) \tag{10.A.17} \\
&= E_R(A;B)_\rho. \tag{10.A.18}
\end{align}

This concludes the proof of (10.A.5).

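Now, for the channel measure, we show that
\[ \lim_{\alpha\to 1} \widetilde{E}_\alpha(\mathcal{N}) = E_R(\mathcal{N}) = \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \operatorname{SEP}(R:B)} D(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}). \tag{10.A.19} \]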

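The proof of
\[ \lim_{\alpha\to 1} \widetilde{R}_\alpha(\mathcal{N}) = R(\mathcal{N}) = \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \operatorname{PPT}'(R:B)} D(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}) \tag{10.A.20} \]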

is analogous, and so we omit it.

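When approaching one from below, we use exactly the same arguments as above
to exchange the infimum and supremum, in order to conclude that
\begin{align}
\lim_{\alpha\to 1^-} \widetilde{E}_\alpha(\mathcal{N}) &= \sup_{\alpha \in (1/2,1)} \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \operatorname{SEP}(R:B)} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}) \tag{10.A.21} \\
&= \sup_{\psi_{RA}} \sup_{\alpha \in (1/2,1)} \inf_{\sigma_{RB} \in \operatorname{SEP}(R:B)} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}) \tag{10.A.22} \\
&= \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \operatorname{SEP}(R:B)} \sup_{\alpha \in (1/2,1)} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}) \tag{10.A.23} \\
&= \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \operatorname{SEP}(R:B)} D(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}) \tag{10.A.24} \\
&= E_R(\mathcal{N}). \tag{10.A.25}
\end{align}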

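Next, when approaching from above, we again use Theorem 2.25. This time, since
the function
\[ (\psi_{RA}, \alpha) \mapsto \inf_{\sigma_{RB} \in \operatorname{SEP}(R:B)} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}) \tag{10.A.26} \]
is continuous in the first argument and monotonically increasing in the second
argument, we can exchange the order of the infimum and supremum to obtain
\begin{align}
\lim_{\alpha\to 1^+} \widetilde{E}_\alpha(\mathcal{N}) &= \inf_{\alpha \in (1,\infty)} \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \operatorname{SEP}(R:B)} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}) \tag{10.A.27} \\
&= \sup_{\psi_{RA}} \inf_{\alpha \in (1,\infty)} \inf_{\sigma_{RB} \in \operatorname{SEP}(R:B)} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}) \tag{10.A.28} \\
&= \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \operatorname{SEP}(R:B)} \inf_{\alpha \in (1,\infty)} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}) \tag{10.A.29} \\
&= \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \operatorname{SEP}(R:B)} D(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}) \tag{10.A.30} \\
&= E_R(\mathcal{N}). \tag{10.A.31}
\end{align}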

This concludes the proof of (10.A.19).

10.A.0.2 𝜶 → ∞ Limits

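We now move on to the $\alpha \to \infty$ limits. We first show that
\[ \lim_{\alpha\to\infty} \widetilde{E}_\alpha(A;B)_\rho = E_{\max}(A;B)_\rho = \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} D_{\max}(\rho_{AB} \| \sigma_{AB}). \tag{10.A.32} \]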

The proof of
\[ \lim_{\alpha\to\infty} \widetilde{R}_\alpha(A;B)_\rho = R_{\max}(A;B)_\rho = \inf_{\sigma_{AB} \in \operatorname{PPT}'(A:B)} D_{\max}(\rho_{AB} \| \sigma_{AB}) \tag{10.A.33} \]

is analogous, and so we omit it.

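Note that due to monotonicity in $\alpha$ of $\widetilde{D}_\alpha$, the following equality holds:
\[ \lim_{\alpha\to\infty} \widetilde{D}_\alpha = \sup_{\alpha \in (1,\infty)} \widetilde{D}_\alpha = D_{\max}. \tag{10.A.34} \]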

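Therefore,
\[ \lim_{\alpha\to\infty} \widetilde{E}_\alpha(A;B)_\rho = \sup_{\alpha \in (1,\infty)} \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} \widetilde{D}_\alpha(\rho_{AB} \| \sigma_{AB}). \tag{10.A.35} \]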

Now, since the function
\[ (\sigma_{AB}, \alpha) \mapsto \widetilde{D}_\alpha(\rho_{AB} \| \sigma_{AB}) \tag{10.A.36} \]
is continuous in the first argument, monotonically increasing in the second argument,
and because the set of separable states is compact, we can use Theorem 2.25 to
change the order of the supremum and infimum to obtain
\begin{align}
\lim_{\alpha\to\infty} \widetilde{E}_\alpha(A;B)_\rho &= \sup_{\alpha \in (1,\infty)} \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} \widetilde{D}_\alpha(\rho_{AB} \| \sigma_{AB}) \tag{10.A.37} \\
&= \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} \sup_{\alpha \in (1,\infty)} \widetilde{D}_\alpha(\rho_{AB} \| \sigma_{AB}) \tag{10.A.38} \\
&= \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} D_{\max}(\rho_{AB} \| \sigma_{AB}) \tag{10.A.39} \\
&= E_{\max}(A;B)_\rho. \tag{10.A.40}
\end{align}

This completes the proof of (10.A.32).

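Finally, we prove that
\[ \lim_{\alpha\to\infty} \widetilde{E}_\alpha(\mathcal{N}) = E_{\max}(\mathcal{N}) = \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \operatorname{SEP}(R:B)} D_{\max}(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}). \tag{10.A.41} \]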

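The proof of
\[ \lim_{\alpha\to\infty} \widetilde{R}_\alpha(\mathcal{N}) = R_{\max}(\mathcal{N}) = \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \operatorname{PPT}'(R:B)} D_{\max}(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}) \tag{10.A.42} \]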

is analogous, and so we omit it.

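Using exactly the same argument as in the proof of (10.A.32) above, we obtain
\begin{align}
\lim_{\alpha\to\infty} \widetilde{E}_\alpha(\mathcal{N}) &= \sup_{\alpha \in (1,\infty)} \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \operatorname{SEP}(R:B)} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}) \tag{10.A.43} \\
&= \sup_{\psi_{RA}} \sup_{\alpha \in (1,\infty)} \inf_{\sigma_{RB} \in \operatorname{SEP}(R:B)} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}) \tag{10.A.44} \\
&= \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \operatorname{SEP}(R:B)} \sup_{\alpha \in (1,\infty)} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}) \tag{10.A.45} \\
&= \sup_{\psi_{RA}} \inf_{\sigma_{RB} \in \operatorname{SEP}(R:B)} D_{\max}(\mathcal{N}_{A\to B}(\psi_{RA}) \| \sigma_{RB}) \tag{10.A.46} \\
&= E_{\max}(\mathcal{N}), \tag{10.A.47}
\end{align}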

where in the third line we used Theorem 2.25 in order to exchange the infimum
and supremum based on exactly the same arguments used in the proof of (10.A.32)
above. This concludes the proof of (10.A.41).

Part II

Quantum Communication Protocols

We now begin our study of quantum communication protocols. In this

part of the book, we focus on point-to-point communication protocols,
most of which do not make use of feedback between the sender and
receiver. The settings include classical communication, entanglement-
assisted classical communication, entanglement distillation, quantum
communication, secret key distillation, and private communication. These
point-to-point protocols are the most basic communication models in
quantum information, and in many cases, they are relevant from a practical
perspective. Furthermore, these protocols serve as benchmarks for the
usefulness of the feedback-assisted protocols that we consider in Part III,
in the sense that the optimal communication rates of any feedback-assisted
protocol should not be smaller than the corresponding point-to-point
protocol, in order for the feedback-assisted protocol to be deemed useful
or advantageous.
Chapter 11

Entanglement-Assisted Classical Communication
The first communication task that we consider is entanglement-assisted classical
communication. In this scenario, Alice and Bob are allowed to share an unlimited
amount of entanglement prior to communication, and the goal is for Alice to transmit
the maximum possible amount of classical information over a given channel N, by
using this prior shared entanglement as a resource. We consider this particular
setting before all other communication settings because, perhaps unexpectedly, the
main information-theoretic results in this setting are much simpler than those in all
other communication settings that we consider in this book.
Entanglement is a uniquely quantum phenomenon, and it is natural to ask,
when communicating over quantum channels, whether it can be used to provide
an advantage for sending classical information. The super-dense coding protocol,
described in Section 5.2, is an example of such an advantage. Recall that in this
protocol, Alice and Bob share a pair of quantum systems in the maximally entangled
state $|\Phi^+\rangle = \frac{1}{\sqrt{2}}(|0,0\rangle + |1,1\rangle)$, and they are connected by a noiseless qubit channel.
With this shared entanglement, along with only one use of the channel, Alice can
communicate two bits of classical information to Bob. In the case of qudits, using
the maximally entangled state $\frac{1}{\sqrt{d}} \sum_{i=0}^{d-1} |i,i\rangle$, Alice can communicate $2\log_2 d$
bits to Bob with only one use of a noiseless qudit quantum channel. Does this
kind of advantage exist in general? Specifically, supposing that we allow Alice
and Bob unlimited shared entanglement, what is the maximum amount of classical
information that can be communicated over a given quantum channel N?
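The counting behind super-dense coding can be checked directly. The following is a minimal sketch, assuming that Alice encodes with the $d^2$ Heisenberg-Weyl (shift and clock) operators; this choice of encoding unitaries and the function name `superdense_overlaps` are our illustrative assumptions. It verifies that the $d^2$ encoded states are mutually orthogonal, so that they can be perfectly distinguished and carry $\log_2 d^2 = 2\log_2 d$ bits.

```python
import numpy as np

def superdense_overlaps(d):
    """Matrix of overlap magnitudes of the d^2 states (X^a Z^b tensor 1)|Phi>,
    where |Phi> = (1/sqrt d) sum_i |i,i>."""
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)           # maximally entangled state
    X = np.roll(np.eye(d), 1, axis=0)                     # shift: X|i> = |i+1 mod d>
    Z = np.diag(np.exp(2j * np.pi * np.arange(d) / d))    # clock: Z|i> = w^i |i>
    states = []
    for a in range(d):
        for b in range(d):
            U = np.linalg.matrix_power(X, a) @ np.linalg.matrix_power(Z, b)
            states.append(np.kron(U, np.eye(d)) @ phi)    # Alice acts on her half only
    S = np.array(states)
    return np.abs(S.conj() @ S.T)                         # entry (k, l) is |<s_k|s_l>|

gram_qubit = superdense_overlaps(2)   # 4 orthonormal states: 2 bits
gram_qutrit = superdense_overlaps(3)  # 9 orthonormal states: 2 log2(3) bits
```

For $d = 2$ the four encoded states are the Bell states, up to phases.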

The answer to this question is provided by Theorem 11.16, which tells us that
the entanglement-assisted classical capacity of a channel N is equal to the mutual
information 𝐼 (N) of the channel (see (7.11.102)). The strength of this result is that
it holds for all channels. Entanglement-assisted classical communication is one of
the few scenarios in which such a profoundly simple statement—applying to all
channels—can be made. Furthermore, the fact that the mutual information 𝐼 (N) is
the optimal rate for entanglement-assisted classical communication for all channels
N makes this communication scenario formally analogous to communication over
classical channels. Indeed, the famous result of Shannon from 1948 is that the
capacity of a classical channel described by a conditional probability distribution
𝑝𝑌 |𝑋 (𝑦|𝑥) with input and output random variables 𝑋 and 𝑌 , respectively, is equal
to $\max_{p_X} I(X;Y)$, where $I(X;Y)$ is the mutual information between the random
variables 𝑋 and 𝑌 and the optimization is performed over all probability distributions
𝑝 𝑋 corresponding to the input 𝑋. Entanglement-assisted classical communication
can thus be viewed as a “natural” analogue of classical communication in the
quantum setting.
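As a small illustration of Shannon's formula, the sketch below evaluates $\max_{p_X} I(X;Y)$ for a binary symmetric channel by a simple grid search over input distributions. The channel, the flip probability 0.1, and the grid resolution are illustrative assumptions of ours; for this channel the maximum is known to be $1 - h_2(\text{flip})$, attained at the uniform input.

```python
import numpy as np

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) in bits; row x of p_y_given_x is the distribution p(y|x)."""
    p_xy = p_x[:, None] * p_y_given_x
    p_y = p_xy.sum(axis=0)
    mask = p_xy > 0                                   # skip zero-probability pairs
    ratio = p_xy[mask] / (p_x[:, None] * p_y[None, :])[mask]
    return float(np.sum(p_xy[mask] * np.log2(ratio)))

flip = 0.1                                            # illustrative crossover probability
bsc = np.array([[1 - flip, flip], [flip, 1 - flip]])

grid = np.linspace(0.0, 1.0, 1001)                    # candidate values of p_X(0)
values = [mutual_information(np.array([t, 1 - t]), bsc) for t in grid]
capacity = max(values)
best_t = float(grid[int(np.argmax(values))])

binary_entropy = -flip * np.log2(flip) - (1 - flip) * np.log2(1 - flip)
closed_form = 1 - binary_entropy                      # known capacity of this channel
```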

11.1 One-Shot Setting

We begin by considering the one-shot setting for entanglement-assisted classical
communication over N, with such a protocol depicted in Figure 11.1. We call
this the “one-shot setting” because the channel N is used only once. This is in
contrast to the “asymptotic setting” that we consider in the next section, in which
the channel may be used an arbitrarily large number of times.
The protocol depicted in Figure 11.1 is defined by the four elements $(\mathcal{M}, \Psi_{A'B'},
\mathcal{E}_{M'A'\to A}, \mathcal{D}_{BB'\to\hat{M}})$, in which $\mathcal{M}$ is a message set, $\Psi_{A'B'}$ is an entangled state shared
by Alice and Bob, $\mathcal{E}_{M'A'\to A}$ is an encoding channel, and $\mathcal{D}_{BB'\to\hat{M}}$ is a decoding
channel. The triple $(\Psi, \mathcal{E}, \mathcal{D})$, consisting of the resource state and encoding and
decoding channels, is called an entanglement-assisted code or, more simply, a code.
In what follows, we employ the abbreviation

P ≡ (Ψ, E, D), (11.1.1)

where the notation P indicates protocol.

Given that there are $|\mathcal{M}|$ messages in the message set, it holds that each message
can be uniquely associated with a bit string of size at least $\log_2 |\mathcal{M}|$. The quantity
$\log_2 |\mathcal{M}|$ thus represents the number of bits that are communicated in the protocol.
One of the goals of this section is to obtain upper and lower bounds, in terms
of information measures for channels, on the maximum number $\log_2 |\mathcal{M}|$ of bits
that can be communicated in an entanglement-assisted classical communication
protocol.

Figure 11.1: Depiction of a protocol for entanglement-assisted classical
communication over one use of the quantum channel $\mathcal{N}$. Alice and Bob initially
share a pair of quantum systems in the state $\Psi_{A'B'}$. Alice, who wishes to send a
message $m$ selected from a set $\mathcal{M}$ of messages, first encodes the message into a
quantum state on a quantum system $A$ by using an encoding channel $\mathcal{E}$. She
then sends the quantum system $A$ through the channel $\mathcal{N}$. After Bob receives
the system $B$, he performs a measurement on both of his systems $BB'$, using the
outcome of the measurement to give an estimate $\hat{m}$ of the message sent by Alice.

The protocol proceeds as follows: let 𝑝 : M → [0, 1] be a probability
distribution over the message set. Alice starts by preparing two systems $M$ and $M'$
in the following classically correlated state:
\[ \Phi^{p}_{MM'} := \sum_{m\in\mathcal{M}} p(m)\, |m\rangle\!\langle m|_{M} \otimes |m\rangle\!\langle m|_{M'}. \tag{11.1.2} \]
Note that if Alice wishes to send a particular message 𝑚 deterministically, then
she can choose the distribution 𝑝 to be the degenerate distribution, equal to one
for 𝑚 and zero for all other messages. Alice and Bob share the state Ψ𝐴′ 𝐵′ before
communication begins, so that the global state shared between them is
\[ \Phi^{p}_{MM'} \otimes \Psi_{A'B'}. \tag{11.1.3} \]

Alice then sends the $M'$ and $A'$ registers through the encoding channel $\mathcal{E}_{M'A'\to A}$.
Due to the fact that the system $M'$ is classical, this encoding channel realizes a set
$\{\mathcal{E}^{m}_{A'\to A}\}_{m\in\mathcal{M}}$ of quantum channels as follows:
\[ \mathcal{E}^{m}_{A'\to A}(\tau_{A'}) := \mathcal{E}_{M'A'\to A}(|m\rangle\!\langle m|_{M'} \otimes \tau_{A'}) \tag{11.1.4} \]

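for every state $\tau_{A'}$. The global state after the encoding channel is therefore
\[ \mathcal{E}_{M'A'\to A}(\Phi^{p}_{MM'} \otimes \Psi_{A'B'}) = \sum_{m\in\mathcal{M}} p(m)\, |m\rangle\!\langle m|_{M} \otimes \mathcal{E}^{m}_{A'\to A}(\Psi_{A'B'}). \tag{11.1.5} \]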

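Alice then transmits the $A$ system through the channel $\mathcal{N}_{A\to B}$, leading to the
state
\begin{align}
(\mathcal{N}_{A\to B} \circ \mathcal{E}_{M'A'\to A})(\Phi^{p}_{MM'} \otimes \Psi_{A'B'})
&= \sum_{m\in\mathcal{M}} p(m)\, |m\rangle\!\langle m|_{M} \otimes (\mathcal{N}_{A\to B} \circ \mathcal{E}^{m}_{A'\to A})(\Psi_{A'B'}) \notag \\
&= \sum_{m\in\mathcal{M}} p(m)\, |m\rangle\!\langle m|_{M} \otimes \tau^{m}_{BB'}, \tag{11.1.6}
\end{align}
where
\[ \tau^{m}_{BB'} := (\mathcal{N}_{A\to B} \circ \mathcal{E}^{m}_{A'\to A})(\Psi_{A'B'}) \quad \forall\, m \in \mathcal{M}. \tag{11.1.7} \]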

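Bob, whose task is to determine which message Alice sent, applies a decoding
channel $\mathcal{D}_{BB'\to\hat{M}}$ on his system $B'$ and the system $B$ received through the channel $\mathcal{N}$.
The decoding channel is a quantum-classical channel (Definition 4.10) associated
with a POVM $\{\Lambda^{m}_{BB'}\}_{m\in\mathcal{M}}$, so that
\[ \mathcal{D}_{BB'\to\hat{M}}(\tau^{m}_{BB'}) := \sum_{\hat{m}\in\mathcal{M}} \operatorname{Tr}[\Lambda^{\hat{m}}_{BB'} \tau^{m}_{BB'}]\, |\hat{m}\rangle\!\langle\hat{m}|_{\hat{M}}, \tag{11.1.8} \]
for all $m \in \mathcal{M}$. The global state in (11.1.6) thus becomes
\begin{align}
\omega^{p}_{M\hat{M}} &:= (\mathcal{D}_{BB'\to\hat{M}} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{M'A'\to A})(\Phi^{p}_{MM'} \otimes \Psi_{A'B'}) \tag{11.1.9} \\
&= \sum_{m,\hat{m}\in\mathcal{M}} p(m)\, |m\rangle\!\langle m|_{M} \otimes \operatorname{Tr}[\Lambda^{\hat{m}}_{BB'} \mathcal{N}_{A\to B}(\mathcal{E}^{m}_{A'\to A}(\Psi_{A'B'}))]\, |\hat{m}\rangle\!\langle\hat{m}|_{\hat{M}}. \tag{11.1.10}
\end{align}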

The final decoding measurement by Bob induces the conditional probability
distribution $q : \mathcal{M} \times \mathcal{M} \to [0,1]$ defined by
\begin{align}
q(\hat{m}|m) &:= \Pr[\hat{M} = \hat{m} \,|\, M = m] \tag{11.1.11} \\
&= \operatorname{Tr}[\Lambda^{\hat{m}}_{BB'} \mathcal{N}_{A\to B}(\mathcal{E}^{m}_{A'\to A}(\Psi_{A'B'}))]. \tag{11.1.12}
\end{align}
Bob's strategy is such that if the outcome $\hat{m}$ occurs from his measurement, then he
declares that the message sent was $\hat{m}$.

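The probability that Bob correctly identifies a given message $m$ is then given by
$q(m|m)$. The message error probability of the code $\mathcal{P} \equiv (\Psi, \mathcal{E}, \mathcal{D})$ and message $m$
is then given by
\begin{align}
p_{\mathrm{err}}(m, \mathcal{P}; \mathcal{N}) &:= 1 - q(m|m) \notag \\
&= \operatorname{Tr}[(\mathbb{1}_{BB'} - \Lambda^{m}_{BB'}) \mathcal{N}_{A\to B}(\mathcal{E}^{m}_{A'\to A}(\Psi_{A'B'}))] \notag \\
&= \sum_{\hat{m}\in\mathcal{M}\setminus\{m\}} q(\hat{m}|m). \tag{11.1.13}
\end{align}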

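The average error probability of the code is
\begin{align}
p_{\mathrm{err}}(\mathcal{P}; p, \mathcal{N}) &:= \sum_{m\in\mathcal{M}} p(m)\, p_{\mathrm{err}}(m, \mathcal{P}; \mathcal{N}) \notag \\
&= \sum_{m\in\mathcal{M}} p(m)(1 - q(m|m)) \notag \\
&= \sum_{m\in\mathcal{M}} \sum_{\hat{m}\in\mathcal{M}\setminus\{m\}} p(m)\, q(\hat{m}|m). \tag{11.1.14}
\end{align}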

The maximal error probability of the code is
\[ p^{*}_{\mathrm{err}}(\mathcal{P}; \mathcal{N}) := \max_{m\in\mathcal{M}} p_{\mathrm{err}}(m, \mathcal{P}; \mathcal{N}). \tag{11.1.15} \]

Each of these three error probabilities can be used to assess the reliability of the
protocol, i.e., how well the encoding and decoding allow Alice to transmit her
message to Bob.
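To make the three error probabilities concrete, the following sketch simulates one instance of the protocol. All specific choices are illustrative assumptions of ours: the resource state is a two-qubit maximally entangled state, the encoding channels are the four Pauli unitaries, the channel is a qubit depolarizing channel with parameter `lam`, and the decoding POVM is the Bell measurement. For this instance one expects $q(m|m) = 1 - 3\,\texttt{lam}/4$ for every message.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
ENCODERS = [I2, X, Z, X @ Z]          # illustrative encoding unitaries, one per message m

PHI = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)   # resource state on A'B'
BELL = [np.kron(U, I2) @ PHI for U in ENCODERS]            # encoded states and decoding basis

def depolarize_first_qubit(rho, lam):
    """Apply N(X) = (1 - lam) X + lam Tr[X] 1/2 to the first qubit of a two-qubit state."""
    rho_b = np.trace(rho.reshape(2, 2, 2, 2), axis1=0, axis2=2)   # reduced state on B'
    return (1 - lam) * rho + lam * np.kron(I2 / 2, rho_b)

def transition_matrix(lam):
    """q[m_hat, m] = Tr[Lambda^{m_hat} (N after E^m)(Psi)] for the Bell measurement."""
    q = np.zeros((4, 4))
    for m, v in enumerate(BELL):
        tau = depolarize_first_qubit(np.outer(v, v.conj()), lam)
        for m_hat, w in enumerate(BELL):
            q[m_hat, m] = np.real(w.conj() @ tau @ w)
    return q

def error_probabilities(q, p):
    """Return the per-message, average, and maximal error probabilities."""
    per_message = 1.0 - np.diag(q)
    return per_message, float(p @ per_message), float(per_message.max())

q = transition_matrix(lam=0.2)
p = np.full(4, 0.25)                  # uniform prior over the four messages
per_message, average, maximal = error_probabilities(q, p)
```

Because this toy channel treats all four messages symmetrically, the average and maximal error probabilities coincide here; in general they differ.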

Definition 11.1 (|M|, 𝜺) Entanglement-Assisted Classical Communication Protocol

Let $(\mathcal{M}, \Psi_{A'B'}, \mathcal{E}_{M'A'\to A}, \mathcal{D}_{BB'\to\hat{M}})$ be the elements of an entanglement-assisted
classical communication protocol over the channel $\mathcal{N}_{A\to B}$. The protocol is
called an $(|\mathcal{M}|, \varepsilon)$ protocol, with $\varepsilon \in [0,1]$, if $p^{*}_{\mathrm{err}}(\mathcal{P}; \mathcal{N}) \leq \varepsilon$.

Lemma 11.2
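The following equalities hold:
\begin{align}
p_{\mathrm{err}}(\mathcal{P}; p, \mathcal{N}) &= \frac{1}{2} \left\| \Phi^{p}_{MM'} - \omega^{p}_{M\hat{M}} \right\|_1, \tag{11.1.16} \\
p^{*}_{\mathrm{err}}(\mathcal{P}; \mathcal{N}) &= \max_{p:\mathcal{M}\to[0,1]} \frac{1}{2} \left\| \Phi^{p}_{MM'} - \omega^{p}_{M\hat{M}} \right\|_1, \tag{11.1.17}
\end{align}
where $\Phi^{p}_{MM'}$ and $\omega^{p}_{M\hat{M}}$ are defined in (11.1.2) and (11.1.9), respectively.
Thus, the error criterion $p^{*}_{\mathrm{err}}(\mathcal{P}; \mathcal{N}) \leq \varepsilon$ is equivalent to
$\max_{p:\mathcal{M}\to[0,1]} \frac{1}{2} \| \Phi^{p}_{MM'} - \omega^{p}_{M\hat{M}} \|_1 \leq \varepsilon$.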

Remark: The final criterion above states that the normalized trace distance between the initial
and final states of the protocol, maximized over all possible prior probability distributions, does
not exceed 𝜀.

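Proof: To see this, let us first note that the normalized trace distance in (11.1.17)
is equal to the average error probability of the code. Indeed,
\begin{align}
&\frac{1}{2} \left\| \Phi^{p}_{MM'} - \omega^{p}_{M\hat{M}} \right\|_1 \notag \\
&= \frac{1}{2} \left\| \sum_{m\in\mathcal{M}} p(m)\, |m,m\rangle\!\langle m,m|_{MM'} - \sum_{m,\hat{m}\in\mathcal{M}} p(m)\, q(\hat{m}|m)\, |m,\hat{m}\rangle\!\langle m,\hat{m}|_{M\hat{M}} \right\|_1 \tag{11.1.18} \\
&= \frac{1}{2} \left\| \sum_{m\in\mathcal{M}} p(m)\, |m\rangle\!\langle m|_{M} \otimes \left( |m\rangle\!\langle m|_{M'} - \sum_{\hat{m}\in\mathcal{M}} q(\hat{m}|m)\, |\hat{m}\rangle\!\langle\hat{m}|_{\hat{M}} \right) \right\|_1 \tag{11.1.19} \\
&= \frac{1}{2} \sum_{m\in\mathcal{M}} p(m) \left\| |m\rangle\!\langle m|_{M'} - \sum_{\hat{m}\in\mathcal{M}} q(\hat{m}|m)\, |\hat{m}\rangle\!\langle\hat{m}|_{\hat{M}} \right\|_1 \tag{11.1.20} \\
&= \frac{1}{2} \sum_{m\in\mathcal{M}} p(m) \left\| (1 - q(m|m))\, |m\rangle\!\langle m| - \sum_{\hat{m}\in\mathcal{M}\setminus\{m\}} q(\hat{m}|m)\, |\hat{m}\rangle\!\langle\hat{m}|_{\hat{M}} \right\|_1 \tag{11.1.21} \\
&= \frac{1}{2} \sum_{m\in\mathcal{M}} p(m) \left( (1 - q(m|m)) + \sum_{\hat{m}\in\mathcal{M}\setminus\{m\}} q(\hat{m}|m) \right) \tag{11.1.22} \\
&= \frac{1}{2} \underbrace{\sum_{m\in\mathcal{M}} p(m)(1 - q(m|m))}_{p_{\mathrm{err}}(\mathcal{P};p,\mathcal{N})} + \frac{1}{2} \underbrace{\sum_{m\in\mathcal{M}} \sum_{\hat{m}\in\mathcal{M}\setminus\{m\}} p(m)\, q(\hat{m}|m)}_{p_{\mathrm{err}}(\mathcal{P};p,\mathcal{N})} \tag{11.1.23} \\
&= p_{\mathrm{err}}(\mathcal{P}; p, \mathcal{N}), \tag{11.1.24}
\end{align}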

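where the third and fifth equalities follow from (2.2.97) with $\alpha = 1$. Then, if $m^{*} \in \mathcal{M}$
is the message attaining the maximum error probability $p^{*}_{\mathrm{err}}$, let $\tilde{p} : \mathcal{M} \to [0,1]$
be the probability distribution such that $\tilde{p}(m^{*}) = 1$ and $\tilde{p}(m) = 0$ for all $m \neq m^{*}$.
Using this probability distribution, we obtain
\begin{align}
p^{*}_{\mathrm{err}}(\mathcal{P}; \mathcal{N}) &= \max_{m\in\mathcal{M}} p_{\mathrm{err}}(m, \mathcal{P}; \mathcal{N}) \tag{11.1.25} \\
&= \sum_{m\in\mathcal{M}} \tilde{p}(m)\, p_{\mathrm{err}}(m, \mathcal{P}; \mathcal{N}) \tag{11.1.26} \\
&= p_{\mathrm{err}}(\mathcal{P}; \tilde{p}, \mathcal{N}) \tag{11.1.27} \\
&= \frac{1}{2} \left\| \Phi^{\tilde{p}}_{MM'} - \omega^{\tilde{p}}_{M\hat{M}} \right\|_1 \tag{11.1.28} \\
&\leq \max_{p:\mathcal{M}\to[0,1]} \frac{1}{2} \left\| \Phi^{p}_{MM'} - \omega^{p}_{M\hat{M}} \right\|_1. \tag{11.1.29}
\end{align}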

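Furthermore, letting $p^{*}$ be the distribution attaining the maximum average error
probability, we find that
\begin{align}
\max_{p:\mathcal{M}\to[0,1]} \frac{1}{2} \left\| \Phi^{p}_{MM'} - \omega^{p}_{M\hat{M}} \right\|_1 &= p_{\mathrm{err}}(\mathcal{P}; p^{*}, \mathcal{N}) \tag{11.1.30} \\
&= \sum_{m\in\mathcal{M}} p^{*}(m)\, p_{\mathrm{err}}(m, \mathcal{P}; \mathcal{N}) \tag{11.1.31} \\
&\leq \sum_{m\in\mathcal{M}} p^{*}(m)\, p^{*}_{\mathrm{err}}(\mathcal{P}; \mathcal{N}) \tag{11.1.32} \\
&= p^{*}_{\mathrm{err}}(\mathcal{P}; \mathcal{N}), \tag{11.1.33}
\end{align}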

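where the inequality follows from the fact that
\[ p_{\mathrm{err}}(m, \mathcal{P}; \mathcal{N}) \leq \max_{m\in\mathcal{M}} p_{\mathrm{err}}(m, \mathcal{P}; \mathcal{N}) = p^{*}_{\mathrm{err}}(\mathcal{P}; \mathcal{N}) \tag{11.1.34} \]
for all $m \in \mathcal{M}$. So we find that
\begin{align}
p^{*}_{\mathrm{err}}(\mathcal{P}; \mathcal{N}) &= \max_{p:\mathcal{M}\to[0,1]} p_{\mathrm{err}}(\mathcal{P}; p, \mathcal{N}) \tag{11.1.35} \\
&= \max_{p:\mathcal{M}\to[0,1]} \frac{1}{2} \left\| \Phi^{p}_{MM'} - \omega^{p}_{M\hat{M}} \right\|_1. \tag{11.1.36}
\end{align}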

This concludes the proof. ■
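The two equalities of Lemma 11.2 can be checked numerically for a randomly generated decoding distribution. In the sketch below, the matrix `q` (with `q[m_hat, m]` standing for $q(\hat{m}|m)$), the prior `p`, and the function name are illustrative assumptions; the registers $M'$ and $\hat{M}$ are identified, as in the lemma. Since the average error probability is linear in $p$, the maximum over priors is attained at a point mass, so it suffices to scan the $|\mathcal{M}|$ point masses.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4                                              # number of messages |M| (illustrative)
q = rng.random((n, n))
q /= q.sum(axis=0, keepdims=True)                  # columns are distributions q(. | m)
p = rng.random(n)
p /= p.sum()                                       # prior over messages

def half_trace_distance(p, q):
    """(1/2) || Phi^p - omega^p ||_1 with both states written in the product basis |m, m_hat>."""
    n = len(p)
    phi = np.zeros((n * n, n * n))
    omega = np.zeros((n * n, n * n))
    for m in range(n):
        phi[m * n + m, m * n + m] = p[m]                      # p(m) |m,m><m,m|
        for m_hat in range(n):
            omega[m * n + m_hat, m * n + m_hat] = p[m] * q[m_hat, m]
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(phi - omega))))

average_error = float(np.sum(p * (1 - np.diag(q))))           # right side of (11.1.14)
maximal_error = float(np.max(1 - np.diag(q)))                 # (11.1.15)

# The maximum over priors is attained at a point mass, so scan the n point masses.
max_distance = max(half_trace_distance(e, q) for e in np.eye(n))
```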

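Another way to define the error criterion of an $(|\mathcal{M}|, \varepsilon)$ protocol, which is
equivalent to the average error probability, is through what is called the comparator
test. The comparator test is a measurement defined by the two-element POVM
$\{\Pi_{M\hat{M}}, \mathbb{1} - \Pi_{M\hat{M}}\}$, where $\Pi_{M\hat{M}}$ is the projection defined as
\[ \Pi_{M\hat{M}} := \sum_{m\in\mathcal{M}} |m\rangle\!\langle m|_{M} \otimes |m\rangle\!\langle m|_{\hat{M}}. \tag{11.1.37} \]
Note that $\operatorname{Tr}[\Pi_{M\hat{M}}\, \omega^{p}_{M\hat{M}}]$ is equal to the probability that the classical registers $M$
and $\hat{M}$ in the state $\omega^{p}_{M\hat{M}}$ have the same values. In particular, observe that
\begin{align}
\operatorname{Tr}\!\left[ \Pi_{M\hat{M}}\, \omega^{p}_{M\hat{M}} \right]
&= \operatorname{Tr}\!\left[ \left( \sum_{m\in\mathcal{M}} |m,m\rangle\!\langle m,m|_{M\hat{M}} \right) \left( \sum_{m',\hat{m}\in\mathcal{M}} p(m')\, q(\hat{m}|m')\, |m',\hat{m}\rangle\!\langle m',\hat{m}|_{M\hat{M}} \right) \right] \tag{11.1.38} \\
&= \sum_{m,m',\hat{m}\in\mathcal{M}} p(m')\, q(\hat{m}|m')\, \delta_{m,m'}\, \delta_{m,\hat{m}} \tag{11.1.39} \\
&= \sum_{m\in\mathcal{M}} p(m)\, q(m|m) \tag{11.1.40} \\
&= 1 - p_{\mathrm{err}}(\mathcal{P}; p, \mathcal{N}). \tag{11.1.41}
\end{align}
We can interpret this expression as the average success probability of the code
$\mathcal{P} \equiv (\Psi, \mathcal{E}, \mathcal{D})$, which we denote by
\[ p_{\mathrm{succ}}(\mathcal{P}; p, \mathcal{N}) := 1 - p_{\mathrm{err}}(\mathcal{P}; p, \mathcal{N}). \tag{11.1.42} \]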
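The comparator test can be written out explicitly as a matrix computation. The sketch below builds the projector on the two classical registers and the classical-classical final state from a prior `p` and a decoding distribution `q` (with `q[m_hat, m]` standing for $q(\hat{m}|m)$), and then evaluates the passing probability. The numerical values and function names are illustrative assumptions.

```python
import numpy as np

def basis_projector(i, n):
    """The rank-one projector |i><i| on an n-dimensional classical register."""
    e = np.zeros((n, n))
    e[i, i] = 1.0
    return e

def comparator_projector(n):
    """Pi = sum_m |m><m| (x) |m><m| on the registers M and M_hat."""
    return sum(np.kron(basis_projector(m, n), basis_projector(m, n)) for m in range(n))

def final_state(p, q):
    """omega^p = sum_{m, m_hat} p(m) q(m_hat|m) |m><m| (x) |m_hat><m_hat|."""
    n = len(p)
    return sum(
        p[m] * q[m_hat, m] * np.kron(basis_projector(m, n), basis_projector(m_hat, n))
        for m in range(n)
        for m_hat in range(n)
    )

# Illustrative three-message example; each column of q is a distribution q(. | m).
p = np.array([0.5, 0.3, 0.2])
q = np.array([[0.9, 0.1, 0.2],
              [0.05, 0.8, 0.1],
              [0.05, 0.1, 0.7]])

Pi = comparator_projector(3)
omega = final_state(p, q)
success = float(np.trace(Pi @ omega))          # probability that the test passes
average_error = float(np.sum(p * (1 - np.diag(q))))
```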

As mentioned at the beginning of this chapter, our goal is to bound (from
above and below) the maximum number $\log_2 |\mathcal{M}|$ of transmitted bits in any
entanglement-assisted classical communication protocol over $\mathcal{N}$. Given an error
probability tolerance of $\varepsilon$, we call the maximum number of transmitted bits the one-shot
entanglement-assisted classical capacity of $\mathcal{N}$.

Definition 11.3 One-Shot Entanglement-Assisted Classical Capacity

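Given a quantum channel $\mathcal{N}_{A\to B}$ and $\varepsilon \in [0,1]$, the one-shot $\varepsilon$-error
entanglement-assisted classical capacity of $\mathcal{N}$, denoted by $C^{\varepsilon}_{\mathrm{EA}}(\mathcal{N})$, is
defined to be the maximum number $\log_2 |\mathcal{M}|$ of transmitted bits among all
$(|\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocols over $\mathcal{N}$. In
other words,
\[ C^{\varepsilon}_{\mathrm{EA}}(\mathcal{N}) := \sup_{(\mathcal{M},\Psi,\mathcal{E},\mathcal{D})} \{ \log_2 |\mathcal{M}| : p^{*}_{\mathrm{err}}((\Psi,\mathcal{E},\mathcal{D}); \mathcal{N}) \leq \varepsilon \}, \tag{11.1.43} \]
where the optimization is with respect to all protocols
$(\mathcal{M}, \Psi_{A'B'}, \mathcal{E}_{M'A'\to A}, \mathcal{D}_{BB'\to\hat{M}})$ such that $d_{M'} = d_{\hat{M}} = |\mathcal{M}|$.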

In addition to finding, for a given $\varepsilon \in [0,1]$, the maximum number of transmitted
bits among all $(|\mathcal{M}|, \varepsilon)$ classical communication protocols over $\mathcal{N}_{A\to B}$, we can
consider the following complementary problem: for a given number of messages
$|\mathcal{M}|$, find the smallest possible error among all $(|\mathcal{M}|, \varepsilon)$ entanglement-assisted
classical communication protocols, which we denote by $\varepsilon^{*}_{\mathrm{EA}}(|\mathcal{M}|; \mathcal{N})$. In other
words, the problem is to determine
\[ \varepsilon^{*}_{\mathrm{EA}}(|\mathcal{M}|; \mathcal{N}) := \inf_{(\Psi,\mathcal{E},\mathcal{D})} \{ p^{*}_{\mathrm{err}}((\Psi,\mathcal{E},\mathcal{D}); \mathcal{N}) : d_{M'} = d_{\hat{M}} = |\mathcal{M}| \}, \tag{11.1.44} \]
where the optimization is over every state $\Psi_{A'B'}$, encoding channel $\mathcal{E}_{M'A'\to A}$, and
decoding channel $\mathcal{D}_{BB'\to\hat{M}}$, such that $d_{M'} = d_{\hat{M}} = |\mathcal{M}|$. In this chapter, we focus
primarily on the problem of optimizing the number of transmitted bits rather than
the error, and so our primary quantity of interest is the one-shot capacity $C^{\varepsilon}_{\mathrm{EA}}(\mathcal{N})$.

11.1.1 Protocol Over a Useless Channel

Our first goal is to obtain an upper bound on the one-shot entanglement-assisted classical capacity defined in (11.1.43). To do so, along with the entanglement-assisted classical communication protocol over the actual channel $\mathcal{N}$ described above, we also consider the same protocol but over the useless channel depicted in Figure 11.2. This useless channel discards the quantum state encoded with the message and replaces it with some arbitrary (but fixed) state $\sigma_B$. This replacement channel is useless for communication because the state $\sigma_B$ does not contain any information about the message $m$. Intuitively, we can say that such a channel corresponds to “cutting the communication line.” As we show in Lemma 11.4, comparing this protocol over the useless channel with the actual protocol allows us to obtain an upper bound on the quantity $\log_2|\mathcal{M}|$, which we recall represents the number of bits that are transmitted over the channel.

Figure 11.2: Depiction of a protocol that is useless for entanglement-assisted classical communication. The state encoding the message $m$ via $\mathcal{E}$ is discarded and replaced by an arbitrary (but fixed) state $\sigma_B$.

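The definition of the useless channel implies that, for every message $m \in \mathcal{M}$,
\[
\mathcal{E}^m_{A'\to A}(\Psi_{A'B'}) \mapsto (\mathcal{P}_{\sigma_B} \circ \mathrm{Tr}_A \circ \mathcal{E}^m_{A'\to A})(\Psi_{A'B'}) = \sigma_B \otimes \Psi_{B'}, \tag{11.1.45}
\]
where $\Psi_{B'} := \mathrm{Tr}_{A'}[\Psi_{A'B'}]$. Making use of the definition of the replacement channel $\mathcal{R}^{\sigma_B}_{A\to B}$ in Definition 4.8, we can write this as
\[
\mathcal{E}^m_{A'\to A}(\Psi_{A'B'}) \mapsto \mathcal{R}^{\sigma_B}_{A\to B}(\mathcal{E}^m_{A'\to A}(\Psi_{A'B'})). \tag{11.1.46}
\]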

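The state at the end of the protocol over the useless channel is then
\begin{align}
\tau^p_{M\hat{M}} &:= \sum_{m,\hat{m}\in\mathcal{M}} p(m)\,\mathrm{Tr}[\Lambda^{\hat{m}}_{BB'}(\sigma_B \otimes \Psi_{B'})]\,|m\rangle\!\langle m|_M \otimes |\hat{m}\rangle\!\langle\hat{m}|_{\hat{M}} \notag \\
&= \pi^p_M \otimes \sum_{\hat{m}\in\mathcal{M}} \mathrm{Tr}[\Lambda^{\hat{m}}_{BB'}(\sigma_B \otimes \Psi_{B'})]\,|\hat{m}\rangle\!\langle\hat{m}|_{\hat{M}}, \tag{11.1.47}
\end{align}
where
\[
\pi^p_M := \sum_{m\in\mathcal{M}} p(m)\,|m\rangle\!\langle m|_M. \tag{11.1.48}
\]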

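Now, recall from (11.1.10) that the state $\omega^p_{M\hat{M}}$ at the end of the actual protocol over the channel $\mathcal{N}$ is given by
\[
\omega^p_{M\hat{M}} = \sum_{m,\hat{m}\in\mathcal{M}} p(m)\,|m\rangle\!\langle m|_M \otimes \mathrm{Tr}[\Lambda^{\hat{m}}_{BB'}\,\mathcal{N}_{A\to B}(\mathcal{E}^m_{A'\to A}(\Psi_{A'B'}))]\,|\hat{m}\rangle\!\langle\hat{m}|_{\hat{M}}. \tag{11.1.49}
\]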
It is helpful in what follows to let
\[
\Phi_{MM'} := \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} |m\rangle\!\langle m|_M \otimes |m\rangle\!\langle m|_{M'} \tag{11.1.50}
\]
be the state in (11.1.2) in which the probability distribution $p$ is the uniform distribution over $\mathcal{M}$.
We now state a lemma that is helpful for placing an upper bound on the number
log2 |M| of bits communicated in an entanglement-assisted classical communication
protocol. This lemma can also be used for the same purpose for unassisted classical
communication protocols, as discussed in the next chapter.

Lemma 11.4
Let $\Phi_{MM'}$ be the state defined in (11.1.50), and let $\omega_{MM'}$ be a state on the two classical registers $M$ and $M'$ such that $\omega_M = \mathrm{Tr}_{M'}[\omega_{MM'}] = \pi_M = \frac{\mathbb{1}_M}{|\mathcal{M}|}$. If the probability $\mathrm{Tr}[\Pi_{MM'}\omega_{MM'}]$ that the state $\omega_{MM'}$ passes the comparator test defined by the POVM $\{\Pi_{MM'}, \mathbb{1} - \Pi_{MM'}\}$, where $\Pi_{MM'}$ is the projection defined in (11.1.37), satisfies
\[
\mathrm{Tr}[\Pi_{MM'}\omega_{MM'}] \geq 1 - \varepsilon, \tag{11.1.51}
\]
for some $\varepsilon \in [0,1]$, then
\[
\log_2|\mathcal{M}| \leq I_H^{\varepsilon}(M;M')_{\omega}, \tag{11.1.52}
\]
where the $\varepsilon$-hypothesis testing mutual information $I_H^{\varepsilon}(M;M')_{\omega}$ is defined in (7.11.88).

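Proof: By assumption, we have that
\[
\mathrm{Tr}[\Pi_{MM'}\omega_{MM'}] \geq 1 - \varepsilon. \tag{11.1.53}
\]
Now, consider a state $\tau_{MM'}$ of the form $\tau_{MM'} = \omega_M \otimes \sigma_{M'} = \pi_M \otimes \sigma_{M'}$, where $\sigma_{M'}$ is some state.¹ Then,
\begin{align}
\mathrm{Tr}[\Pi_{MM'}\tau_{MM'}] &= \mathrm{Tr}[\Pi_{MM'}(\pi_M \otimes \sigma_{M'})] \tag{11.1.54} \\
&= \frac{1}{|\mathcal{M}|}\mathrm{Tr}[\Pi_{MM'}(\mathbb{1}_M \otimes \sigma_{M'})] \tag{11.1.55} \\
&= \frac{1}{|\mathcal{M}|}\mathrm{Tr}[\mathrm{Tr}_M[\Pi_{MM'}]\,\sigma_{M'}] \tag{11.1.56} \\
&= \frac{1}{|\mathcal{M}|}\mathrm{Tr}[\mathbb{1}_{M'}\,\sigma_{M'}] \tag{11.1.57} \\
&= \frac{1}{|\mathcal{M}|}. \tag{11.1.58}
\end{align}
We thus obtain
\begin{align}
\log_2|\mathcal{M}| &= -\log_2 \mathrm{Tr}[\Pi_{MM'}\tau_{MM'}] \tag{11.1.59} \\
&\leq D_H^{\varepsilon}(\omega_{MM'}\Vert\tau_{MM'}) \tag{11.1.60} \\
&= D_H^{\varepsilon}(\omega_{MM'}\Vert\omega_M \otimes \sigma_{M'}), \tag{11.1.61}
\end{align}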

where the inequality follows from the definition of the hypothesis testing relative entropy in (7.9.1) (i.e., $\Pi_{MM'}$ is a particular measurement operator satisfying (11.1.53), but $D_H^{\varepsilon}(\omega_{MM'}\Vert\tau_{MM'})$ involves an optimization over all such operators). Since the state $\sigma_{M'}$ is arbitrary, we conclude that
\[
\log_2|\mathcal{M}| \leq \inf_{\sigma_{M'}} D_H^{\varepsilon}(\omega_{MM'}\Vert\omega_M \otimes \sigma_{M'}) = I_H^{\varepsilon}(M;M')_{\omega}, \tag{11.1.62}
\]
which is (11.1.52), as required. ■
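The key computation in the proof, (11.1.54)–(11.1.58), can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names (`comparator_projector`, `random_density_matrix`, `useless_pass_probability`) are illustrative and do not come from the text. It builds the comparator projection for a small message set, tensors the maximally mixed state with a randomly generated state, and evaluates the probability of passing the comparator test.

```python
import numpy as np

def comparator_projector(d):
    """Pi_{MM'} = sum_m |m><m|_M (x) |m><m|_{M'} for d messages."""
    proj = np.zeros((d * d, d * d))
    for m in range(d):
        proj[m * d + m, m * d + m] = 1.0
    return proj

def random_density_matrix(d, rng):
    """A random d x d density matrix (positive semi-definite, unit trace)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def useless_pass_probability(d, sigma):
    """Tr[Pi_{MM'} (pi_M (x) sigma_{M'})] with pi_M maximally mixed."""
    pi_m = np.eye(d) / d
    tau = np.kron(pi_m, sigma)
    return np.trace(comparator_projector(d) @ tau).real

rng = np.random.default_rng(seed=0)
num_messages = 8
sigma = random_density_matrix(num_messages, rng)
prob = useless_pass_probability(num_messages, sigma)
bits = -np.log2(prob)  # equals log2 |M| up to rounding, as in (11.1.59)
print(prob, bits)
```

Whatever state is supplied for `sigma`, the pass probability should equal the reciprocal of the number of messages up to floating-point error, which is why the negative logarithm recovers the number of bits.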

¹ Note that the state in (11.1.47) at the end of the entanglement-assisted classical communication protocol over the useless channel has precisely this form (when $p$ is taken to be the uniform distribution over the message set $\mathcal{M}$).

The right-hand side of (11.1.52) is an upper bound on the number $\log_2|\mathcal{M}|$ of bits communicated using an $(|\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocol over the channel $\mathcal{N}$. Indeed, since the error criterion $p^*_{\mathrm{err}}(\mathcal{P}) \leq \varepsilon$ holds by definition of an $(|\mathcal{M}|, \varepsilon)$ protocol, using (11.1.36) and (11.1.24) we obtain
\begin{align}
p_{\mathrm{err}}(\mathcal{P}; p, \mathcal{N}) &\leq \max_{p:\mathcal{M}\to[0,1]} p_{\mathrm{err}}(\mathcal{P}; p, \mathcal{N}) \tag{11.1.63} \\
&= \max_{p:\mathcal{M}\to[0,1]} \frac{1}{2}\left\Vert \Phi^p_{MM'} - \omega^p_{M\hat{M}} \right\Vert_1 \tag{11.1.64} \\
&= p^*_{\mathrm{err}}(\mathcal{P}; \mathcal{N}) \tag{11.1.65} \\
&\leq \varepsilon \tag{11.1.66}
\end{align}

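for every probability distribution $p$ on $\mathcal{M}$. In particular, the inequality above holds with $p$ being the uniform distribution on $\mathcal{M}$, so that $p(m) = \frac{1}{|\mathcal{M}|}$ for all $m \in \mathcal{M}$. Let us define the state
\[
\omega_{M\hat{M}} := \frac{1}{|\mathcal{M}|}\sum_{m,\hat{m}\in\mathcal{M}} \mathrm{Tr}[\Lambda^{\hat{m}}_{BB'}\,\mathcal{N}_{A\to B}(\mathcal{E}^m_{A'\to A}(\Psi_{A'B'}))]\,|m,\hat{m}\rangle\!\langle m,\hat{m}|_{M\hat{M}}, \tag{11.1.67}
\]
which is the state $\omega^p_{M\hat{M}}$ with $p$ being the uniform distribution over $\mathcal{M}$. Observe that
\begin{align}
\mathrm{Tr}_{\hat{M}}[\omega_{M\hat{M}}] &= \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} \mathrm{Tr}\!\left[\left(\sum_{\hat{m}\in\mathcal{M}} \Lambda^{\hat{m}}_{BB'}\right)\mathcal{N}_{A\to B}(\mathcal{E}^m_{A'\to A}(\Psi_{A'B'}))\right]|m\rangle\!\langle m|_M \tag{11.1.68} \\
&= \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} \mathrm{Tr}[\mathcal{N}_{A\to B}(\mathcal{E}^m_{A'\to A}(\Psi_{A'B'}))]\,|m\rangle\!\langle m|_M \tag{11.1.69} \\
&= \pi_M, \tag{11.1.70}
\end{align}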

where the last equality follows because the channels $\mathcal{N}_{A\to B}$ and $\mathcal{E}^m_{A'\to A}$ are trace preserving. Finally, since the probability of passing the comparator test is given by (11.1.41), i.e.,
\[
\mathrm{Tr}\!\left[\Pi_{M\hat{M}}\,\omega^p_{M\hat{M}}\right] = 1 - p_{\mathrm{err}}(\mathcal{P}; p, \mathcal{N}) \tag{11.1.71}
\]
for every probability distribution $p$, we find that $\mathrm{Tr}[\Pi_{M\hat{M}}\,\omega_{M\hat{M}}] \geq 1 - \varepsilon$. The state $\omega_{M\hat{M}}$ thus satisfies the condition of Lemma 11.4. We conclude that
\[
\log_2|\mathcal{M}| \leq I_H^{\varepsilon}(M;\hat{M})_{\omega} \tag{11.1.72}
\]
for every $(|\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocol.

Recall from Section 7.9 that the hypothesis testing relative entropy has an operational meaning as the optimal type-II error exponent in asymmetric hypothesis testing. The quantity $D_H^{\varepsilon}(\omega_{M\hat{M}}\Vert\omega_M \otimes \sigma_{\hat{M}})$ thus represents the optimal type-II error exponent, subject to the upper bound of $\varepsilon$ on the type-I error probability, for distinguishing between the state resulting from the actual entanglement-assisted classical communication protocol over $\mathcal{N}$ and the state resulting from an entanglement-assisted classical communication protocol over a useless channel, which discards the state encoded with the message and replaces it with the state $\sigma_{\hat{M}}$. By taking an infimum over all states $\sigma_{\hat{M}}$, the quantity $I_H^{\varepsilon}(M;\hat{M})_{\omega}$ represents the smallest possible optimal type-II error exponent. The bound in (11.1.72) thus establishes a close link between the tasks of reliable communication and hypothesis testing.
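To make the hypothesis-testing interpretation concrete, the following is a minimal sketch of the hypothesis testing relative entropy in the special case of two commuting states, that is, two probability vectors. It uses only the standard definition (minimize the type-II error subject to the type-I error being at most the tolerance) together with the classical Neyman–Pearson greedy rule; the function name `hypothesis_testing_relative_entropy` and the sample inputs are illustrative assumptions rather than anything taken from the text.

```python
import math

def hypothesis_testing_relative_entropy(p, q, eps):
    """D_H^eps(p || q) in bits for probability vectors p and q.

    Minimizes sum_i t_i q_i over tests 0 <= t_i <= 1 subject to
    sum_i t_i p_i >= 1 - eps, then returns -log2 of the minimum.
    """
    # Accept outcomes in order of increasing likelihood ratio q_i / p_i:
    # these add the least type-II error per unit of type-I coverage.
    outcomes = sorted(
        ((pi, qi) for pi, qi in zip(p, q) if pi > 0),
        key=lambda pair: pair[1] / pair[0],
    )
    remaining = 1.0 - eps  # probability mass under p still to be accepted
    type_two_error = 0.0
    for pi, qi in outcomes:
        if remaining <= 0:
            break
        weight = min(1.0, remaining / pi)  # fractional acceptance allowed
        type_two_error += weight * qi
        remaining -= weight * pi
    return math.inf if type_two_error == 0 else -math.log2(type_two_error)

# Identical distributions cannot be distinguished beyond the slack eps.
same = hypothesis_testing_relative_entropy([0.5, 0.5], [0.5, 0.5], 0.1)
# Perfectly distinguishable distributions give an unbounded exponent.
orthogonal = hypothesis_testing_relative_entropy([1.0, 0.0], [0.0, 1.0], 0.0)
print(same, orthogonal)
```

For general (non-commuting) states the optimization is a semi-definite program, which this sketch does not attempt to solve.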
Given a particular choice of the encoding and decoding channels, as well as a particular choice of the shared state $\Psi_{A'B'}$, if $p^*_{\mathrm{err}}(\mathcal{P};\mathcal{N}) \leq \varepsilon$, then the quantity $I_H^{\varepsilon}(M;\hat{M})_{\omega}$ in (11.1.72) is an upper bound on the maximum number of bits that can be transmitted over the channel $\mathcal{N}$. The optimal value of this upper bound can be realized by finding the state $\sigma_{\hat{M}}$ defining the useless channel that optimizes the quantity $I_H^{\varepsilon}(M;\hat{M})_{\omega}$ in addition to the measurement that achieves the $\varepsilon$-hypothesis testing relative entropy. Importantly, a different choice of encoding, decoding, and of the state $\Psi_{A'B'}$ produces a different value for this upper bound. We would thus like to find an upper bound that applies regardless of which specific protocol is chosen. In other words, we would like an upper bound that is a function of the channel $\mathcal{N}$ only, and this is the topic of the next section.

11.1.2 Upper Bound on the Number of Transmitted Bits

We now give a general upper bound on the number of transmitted bits possible for
an arbitrary one-shot entanglement-assisted classical communication protocol for
a channel N. This result is stated in Theorem 11.6. The upper bound obtained
therein holds independently of the encoding and decoding channels used in the
protocol and depends only on the given communication channel N.
Let us start with an arbitrary $(|\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocol over $\mathcal{N}$ corresponding to, as described at the beginning of this chapter, a message set $\mathcal{M}$, a prior shared entangled state $\Psi_{A'B'}$, an encoding channel $\mathcal{E}$, and a decoding channel $\mathcal{D}$. The error criterion $p^*_{\mathrm{err}}(\mathcal{P};\mathcal{N}) \leq \varepsilon$ holds by the definition of an $(|\mathcal{M}|, \varepsilon)$ protocol. Then, by the arguments at the end of the previous section, Lemma 11.4 implies that the inequality $\log_2|\mathcal{M}| \leq I_H^{\varepsilon}(M;\hat{M})_{\omega}$ holds. Using this bound on the number $\log_2|\mathcal{M}|$ of transmitted bits, we obtain the following result:


Proposition 11.5 Upper Bound on One-Shot Entanglement-Assisted Classical Capacity
Let $\mathcal{N}_{A\to B}$ be a quantum channel. For every $(|\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocol over $\mathcal{N}_{A\to B}$, with $\varepsilon \in [0,1]$, the number of bits transmitted over $\mathcal{N}$ is bounded from above by the $\varepsilon$-hypothesis testing mutual information of $\mathcal{N}$, defined in (7.11.87), i.e.,
\[
\log_2|\mathcal{M}| \leq I_H^{\varepsilon}(\mathcal{N}). \tag{11.1.73}
\]
Consequently, for the one-shot entanglement-assisted classical capacity, we have
\[
C_{\mathrm{EA}}^{\varepsilon}(\mathcal{N}) \leq I_H^{\varepsilon}(\mathcal{N}) \tag{11.1.74}
\]
for all $\varepsilon \in [0,1]$.

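Proof: First, let us apply Lemma 11.4 to conclude that
\[
\log_2|\mathcal{M}| \leq I_H^{\varepsilon}(M;\hat{M})_{\omega}, \tag{11.1.75}
\]
where $\omega_{M\hat{M}}$ is defined in (11.1.67). Using the data-processing inequality for the hypothesis testing mutual information under the action of the decoding channel $\mathcal{D}_{BB'\to\hat{M}}$ (from Proposition 7.19), we find that
\[
I_H^{\varepsilon}(M;\hat{M})_{\omega} \leq I_H^{\varepsilon}(M;BB')_{\theta}, \tag{11.1.76}
\]
where the state $\theta_{MBB'}$ is the same as that in (11.1.6):
\[
\theta_{MBB'} := (\mathcal{N}_{A\to B} \circ \mathcal{E}_{M'A'\to A})(\Phi_{MM'} \otimes \Psi_{A'B'}). \tag{11.1.77}
\]
Observe that the reduced state $\theta_{MB'}$ is a product state because the channel $\mathcal{N}_{A\to B}$ and encoding $\mathcal{E}_{M'A'\to A}$ are trace preserving:
\begin{align}
\theta_{MB'} &= \mathrm{Tr}_B[(\mathcal{N}_{A\to B} \circ \mathcal{E}_{M'A'\to A})(\Phi_{MM'} \otimes \Psi_{A'B'})] \tag{11.1.78} \\
&= \mathrm{Tr}_{M'A'}[\Phi_{MM'} \otimes \Psi_{A'B'}] \tag{11.1.79} \\
&= \Phi_M \otimes \Psi_{B'} \tag{11.1.80} \\
&= \theta_M \otimes \theta_{B'}. \tag{11.1.81}
\end{align}
Now, by definition, we have that
\[
I_H^{\varepsilon}(M;BB')_{\theta} = \inf_{\sigma_{BB'}} D_H^{\varepsilon}(\theta_{MBB'}\Vert\theta_M \otimes \sigma_{BB'}). \tag{11.1.82}
\]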


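Choosing $\sigma_{BB'}$ to be $\sigma_B \otimes \theta_{B'}$ and optimizing over $\sigma_B$ only, we find that
\begin{align}
I_H^{\varepsilon}(M;BB')_{\theta} &\leq \inf_{\sigma_B} D_H^{\varepsilon}(\theta_{MBB'}\Vert\theta_M \otimes \sigma_B \otimes \theta_{B'}) \tag{11.1.83} \\
&= \inf_{\sigma_B} D_H^{\varepsilon}(\theta_{MBB'}\Vert\theta_{MB'} \otimes \sigma_B) \tag{11.1.84} \\
&= I_H^{\varepsilon}(MB';B)_{\theta}, \tag{11.1.85}
\end{align}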

where the first equality follows from the observation in (11.1.81) and the second equality follows by definition. Now, observe that the state $\theta_{MBB'}$ has the form $\mathcal{N}_{A\to B}(\rho_{SA})$, where $S \equiv MB'$ and $\rho_{SA} \equiv \mathcal{E}_{M'A'\to A}(\Phi_{MM'} \otimes \Psi_{A'B'})$. This means that we can optimize over every state $\rho_{SA}$ to obtain
\[
I_H^{\varepsilon}(MB';B)_{\theta} \leq \sup_{\rho_{SA}} I_H^{\varepsilon}(S;B)_{\xi}, \tag{11.1.86}
\]
where $\xi_{SB} = \mathcal{N}_{A\to B}(\rho_{SA})$. Note that this optimization over states $\rho_{SA}$ is effectively an optimization over all possible encoding channels $\mathcal{E}_{M'A'\to A}$ that define the $(|\mathcal{M}|, \varepsilon)$ protocol. Now, since it suffices to take pure states when optimizing the $\varepsilon$-hypothesis testing mutual information of bipartite states, following from the same reasoning in (7.11.4)–(7.11.7) in the context of generalized divergences, we find that
\begin{align}
I_H^{\varepsilon}(MB';B)_{\theta} &\leq \sup_{\psi_{SA}} I_H^{\varepsilon}(S;B)_{\zeta} \tag{11.1.87} \\
&= I_H^{\varepsilon}(\mathcal{N}), \tag{11.1.88}
\end{align}
where $\psi_{SA}$ is a pure state, with the dimension of $S$ the same as that of $A$, and $\zeta_{SB} = \mathcal{N}_{A\to B}(\psi_{SA})$. So we have
\[
\log_2|\mathcal{M}| \leq I_H^{\varepsilon}(M;\hat{M})_{\omega} \leq I_H^{\varepsilon}(MB';B)_{\theta} \leq I_H^{\varepsilon}(\mathcal{N}), \tag{11.1.89}
\]
as required. ■

The result of Proposition 11.5 can be written explicitly as
\begin{align}
\log_2|\mathcal{M}| &\leq \sup_{\psi_{RA}} \inf_{\sigma_B} D_H^{\varepsilon}(\mathcal{N}_{A\to B}(\psi_{RA})\Vert\psi_R \otimes \sigma_B) \tag{11.1.90} \\
&= \sup_{\psi_{RA}} \inf_{\sigma_B} D_H^{\varepsilon}(\mathcal{N}_{A\to B}(\psi_{RA})\Vert\mathcal{R}^{\sigma_B}_{A\to B}(\psi_{RA})), \tag{11.1.91}
\end{align}
in which we explicitly see the comparison, via the hypothesis testing relative entropy, between the actual entanglement-assisted classical communication protocol and the protocols over the useless channels $\mathcal{R}^{\sigma_B}_{A\to B}$, with each labeled by the state $\sigma_B$. The state $\psi_{RA}$ corresponds to the state after the encoding channel, and optimizing over these states is effectively an optimization over all encoding channels and all shared entangled states. The latter is true since the preparation of the shared state $\Psi_{A'B'}$ can always be incorporated into a larger encoding channel. Specifically, the encoding $\mathcal{E}_{M'A'\to A}(\Phi_{MM'} \otimes \Psi_{A'B'})$ can be written as $\mathcal{E}'_{M'\to A}(\Phi_{MM'})$, where $\mathcal{E}'_{M'\to A} := \mathcal{E}_{M'A'\to A} \circ \mathcal{P}_{\Psi_{A'B'}}$.
As an immediate consequence of Propositions 11.5, 7.70, and 7.71, we have the
following two bounds:

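Theorem 11.6 One-Shot Upper Bounds for Entanglement-Assisted Classical Communication
Let $\mathcal{N}_{A\to B}$ be a quantum channel, and let $\varepsilon \in [0,1)$. For all $(|\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocols over the channel $\mathcal{N}$, the following bounds hold:
\begin{align}
\log_2|\mathcal{M}| &\leq \frac{1}{1-\varepsilon}\left(I(\mathcal{N}) + h_2(\varepsilon)\right), \tag{11.1.92} \\
\log_2|\mathcal{M}| &\leq \widetilde{I}_{\alpha}(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\, \alpha > 1, \tag{11.1.93}
\end{align}
where $I(\mathcal{N})$ is the mutual information of $\mathcal{N}$, as defined in (7.11.102), and $\widetilde{I}_{\alpha}(\mathcal{N})$ is the sandwiched Rényi mutual information of $\mathcal{N}$, as defined in (7.11.91).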

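Since the bounds in (11.1.92) and (11.1.93) hold for every $(|\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocol over $\mathcal{N}$, we have that
\begin{align}
C_{\mathrm{EA}}^{\varepsilon}(\mathcal{N}) &\leq \frac{1}{1-\varepsilon}\left(I(\mathcal{N}) + h_2(\varepsilon)\right), \tag{11.1.94} \\
C_{\mathrm{EA}}^{\varepsilon}(\mathcal{N}) &\leq \widetilde{I}_{\alpha}(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\, \alpha > 1, \tag{11.1.95}
\end{align}
for all $\varepsilon \in [0,1)$.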
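As a small illustration of how the two bounds behave, the sketch below evaluates the right-hand sides of (11.1.94) and (11.1.95) for user-supplied values of the channel quantities. It does not compute the mutual information or the sandwiched Rényi mutual information of a channel; those values are assumed inputs, and the function names are hypothetical. The example value of 2 bits is the mutual information of a noiseless qubit channel, which is a standard fact assumed here rather than derived in this section.

```python
import math

def binary_entropy(eps):
    """h_2(eps) in bits, with the convention 0 * log2(0) = 0."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)

def mutual_information_bound(mutual_info, eps):
    """Right-hand side of (11.1.94) for a given value of I(N)."""
    return (mutual_info + binary_entropy(eps)) / (1 - eps)

def renyi_bound(renyi_mutual_info, alpha, eps):
    """Right-hand side of (11.1.95) for a given value of the sandwiched
    Renyi mutual information of N at order alpha > 1."""
    return renyi_mutual_info + alpha / (alpha - 1) * math.log2(1 / (1 - eps))

# Assumed input: I(N) = 2 bits for a noiseless qubit channel.
for eps in (0.0, 0.01, 0.1):
    print(eps, mutual_information_bound(2.0, eps))
```

Both right-hand sides reduce to the supplied channel quantity at zero error and grow as the error tolerance increases.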

Let us recap the steps we took to arrive at the bounds in (11.1.92) and (11.1.93).
1. We first compared the entanglement-assisted classical communication protocol
over N with the same protocol over a useless channel, by using the hypothesis


testing relative entropy. Lemma 11.4 plays a role in bounding the maximum
number of transmitted bits for a particular protocol.
2. We then used the data-processing inequality for the hypothesis testing relative
entropy to obtain a quantity that is independent of the decoding channel, as
well as being minimized over all useless protocols when compared to the actual
protocol. This is done in (11.1.76) and (11.1.82)–(11.1.85) in the proof of
Proposition 11.5.
3. Finally, we optimized over all encoding channels (and, effectively, over all
shared states Ψ𝐴′ 𝐵′ ) in (11.1.86)–(11.1.88) to obtain Proposition 11.5, in which
the bound is a function solely of the channel and the error probability.
4. Using Propositions 7.70 and 7.71, which relate the hypothesis testing relative
entropy to the quantum relative entropy and the sandwiched Rényi relative
entropy, respectively, we arrived at Theorem 11.6.
The bounds in (11.1.92) and (11.1.93) are fundamental upper bounds on the number of transmitted bits for every entanglement-assisted classical communication protocol. A natural question to ask now is whether the upper bounds in (11.1.92) and (11.1.93) can be achieved. In other words, is it possible to devise protocols such that the number of transmitted bits is equal to the right-hand side of either (11.1.92) or (11.1.93)? In the one-shot setting, we do not know how to do so, especially if we demand that the right-hand side of either (11.1.92) or (11.1.93) be attained exactly. However, when given many uses of a channel (in the asymptotic setting), we can come close to achieving these upper bounds. This motivates finding lower bounds on the number of transmitted bits.

11.1.3 Lower Bound on the Number of Transmitted Bits via Position-Based Coding and Sequential Decoding

To obtain lower bounds on the number of transmitted bits, as discussed in Appendix A, we should devise particular coding schemes. Concretely, we should devise, for all $\varepsilon \in (0,1)$, an entanglement-assisted classical communication protocol $(\mathcal{M}, \Psi_{A'B'}, \mathcal{E}, \mathcal{D})$ that is an $(|\mathcal{M}|, \varepsilon)$ protocol, meaning that the maximal error probability $p^*_{\mathrm{err}}(\mathcal{P};\mathcal{N})$ satisfies $p^*_{\mathrm{err}}(\mathcal{P};\mathcal{N}) \leq \varepsilon$. Recall from (11.1.15) that the maximal error probability is defined as
\[
p^*_{\mathrm{err}}(\mathcal{P};\mathcal{N}) = \max_{m\in\mathcal{M}} p_{\mathrm{err}}(m,\mathcal{P};\mathcal{N}), \tag{11.1.96}
\]


where for $m \in \mathcal{M}$ the message error probability $p_{\mathrm{err}}(m,\mathcal{P};\mathcal{N})$ is defined in (11.1.13) as
\[
p_{\mathrm{err}}(m,\mathcal{P};\mathcal{N}) = 1 - q(m|m), \tag{11.1.97}
\]
with $q(\hat{m}|m)$ being the probability of identifying the message sent as $\hat{m}$, given that the message $m$ was sent.
We make use of a technique called position-based coding along with sequential
decoding to establish the lower bound (11.1.108) in Proposition 11.8 below, which
is analogous to the upper bound (11.1.73) in Proposition 11.5. We now give a brief
description of position-based coding and sequential decoding, while leaving the
details to the proof of Proposition 11.8.
Let us consider an entanglement-assisted classical communication protocol defined by the four elements $(\mathcal{M}, \rho_{A'B'}^{\otimes|\mathcal{M}|}, \mathcal{E}, \mathcal{D})$ and depicted in Figure 11.3. The state shared by Alice and Bob prior to communication is $|\mathcal{M}|$ copies of a state $\rho_{A'B'}$. The encoding $\mathcal{E}$ is defined such that if Alice wishes to send a message $m \in \mathcal{M}$, then she sends her $m$th $A'$ system through the channel. Specifically, the encoding channels $\mathcal{E}^m_{(A')^{|\mathcal{M}|}\to A}$ are defined as
\[
\mathcal{E}^m_{(A')^{|\mathcal{M}|}\to A}\!\left(\rho_{A'B'}^{\otimes|\mathcal{M}|}\right) = \mathrm{Tr}_{\bar{A}_m}\!\left[\rho_{A'B'}^{\otimes|\mathcal{M}|}\right] = \rho_{B_1'} \otimes \cdots \otimes \rho_{AB_m'} \otimes \cdots \otimes \rho_{B_{|\mathcal{M}|}'}, \tag{11.1.98}
\]
where $\bar{A}_m$ indicates all systems $A_k$ except for $A_m$. This encoding procedure is called position-based coding because the message is encoded into the particular $A'$ system that is sent to Bob. In other words, the message is encoded into the “position” of the $A'$ systems.²
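The state held by the receiver Bob after Alice sends the $A$ system in (11.1.98) through the channel $\mathcal{N}_{A\to B}$ is
\[
\tau^m_{B_1'\cdots B_m'\cdots B_{|\mathcal{M}|}'B} := \rho_{B_1'} \otimes \cdots \otimes \mathcal{N}_{A\to B}(\rho_{AB_m'}) \otimes \cdots \otimes \rho_{B_{|\mathcal{M}|}'}. \tag{11.1.99}
\]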

² In practice, it would be wasteful for the sender to discard so much entanglement by explicitly using the encoding procedure in (11.1.98). The explicit encoding given should thus be considered a conceptual tool for understanding that the $m$th system is sent through the channel, and in practice it can be realized simply by sending the $m$th system through the channel.

Figure 11.3: Schematic depiction of position-based coding. Alice and Bob start with $|\mathcal{M}|$ copies of a state $\rho_{A'B'}$. If Alice wants to send the message $m \in \mathcal{M}$, she sends the $m$th system through the channel $\mathcal{N}$ to Bob.

Bob, whose task is to determine the message $m$ sent to him, should apply a decoding channel that ideally succeeds with high probability. The sequential decoding strategy consists of Bob performing a sequence of measurements on systems $B_i'$ and $B$. Each of these measurements is defined by the POVM $\{\Pi_{B'BR}, \mathbb{1}_{B'BR} - \Pi_{B'BR}\}$, where $R$ is an arbitrary (finite-dimensional) reference system held by Bob that he makes use of to help with the decoding. In particular, Bob has identical reference systems $R_1, \ldots, R_{|\mathcal{M}|}$, each associated with his $B'$ systems. The projectors defining the sequential decoding strategy are then

\[
P_i := \mathbb{1}_{B_1'R_1} \otimes \cdots \otimes \mathbb{1}_{B_{i-1}'R_{i-1}} \otimes \Pi_{B_i'BR_i} \otimes \mathbb{1}_{B_{i+1}'R_{i+1}} \otimes \cdots \otimes \mathbb{1}_{B_{|\mathcal{M}|}'R_{|\mathcal{M}|}} \tag{11.1.100}
\]
for all $1 \leq i \leq |\mathcal{M}|$, and they correspond to measuring systems $B_i'BR_i$ with the POVM $\{\Pi_{B'BR}, \mathbb{1}_{B'BR} - \Pi_{B'BR}\}$. This measurement can be thought of intuitively as asking the question “Was the $i$th message sent?”, with the outcome corresponding to $P_i$ being “yes” and the outcome corresponding to
\[
\hat{P}_i := \mathbb{1} - P_i \tag{11.1.101}
\]
being “no.” Bob performs a measurement on the systems $B_1'BR_1$, followed by a measurement on $B_2'BR_2$, followed by a measurement on $B_3'BR_3$, etc., until he obtains the outcome corresponding to “yes.” The system number corresponding to this outcome is then his guess for the message. The probability $q(\hat{m}|m)$ of guessing $\hat{m}$ given that the message $m$ was sent is, therefore,
\[
q(\hat{m}|m) = \mathrm{Tr}\!\left[P_{\hat{m}}\hat{P}_{\hat{m}-1}\cdots\hat{P}_1\,\omega^m_{B_1'\cdots B_{|\mathcal{M}|}'BR_1\cdots R_{|\mathcal{M}|}}\,\hat{P}_1\cdots\hat{P}_{\hat{m}-1}P_{\hat{m}}\right], \tag{11.1.102}
\]

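where
\[
\omega^m_{B_1'\cdots B_{|\mathcal{M}|}'BR_1\cdots R_{|\mathcal{M}|}} := \tau^m_{B_1'\cdots B_{|\mathcal{M}|}'B} \otimes |0,\ldots,0\rangle\!\langle 0,\ldots,0|_{R_1\cdots R_{|\mathcal{M}|}}. \tag{11.1.103}
\]
The message error probability is then
\[
p_{\mathrm{err}}(m;\mathcal{P}) = 1 - \mathrm{Tr}\!\left[P_m\hat{P}_{m-1}\cdots\hat{P}_1\,\omega^m_{B_1'\cdots B_{|\mathcal{M}|}'BR_1\cdots R_{|\mathcal{M}|}}\,\hat{P}_1\cdots\hat{P}_{m-1}P_m\right] \tag{11.1.104}
\]
for all $m \in \mathcal{M}$.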
Recall that our goal is to place an upper bound on the maximal error probability $p^*_{\mathrm{err}}(\mathcal{P};\mathcal{N})$ of this position-based coding and sequential decoding protocol. We obtain an upper bound on $p_{\mathrm{err}}(m,\mathcal{P};\mathcal{N})$ for each message $m$ from applying the following theorem, called the quantum union bound, whose proof can be found in Appendix 11.A. This theorem can be thought of as a quantum generalization of the union bound from probability theory. Indeed, if $A_1, \ldots, A_N$ is a sequence of events, then the union bound is as follows:
\[
\Pr[(A_1 \cap \cdots \cap A_N)^c] = \Pr[A_1^c \cup \cdots \cup A_N^c] \leq \sum_{i=1}^{N} \Pr[A_i^c], \tag{11.1.105}
\]
where the superscript $c$ denotes the complement of an event.

Theorem 11.7 Quantum Union Bound

Let $\{P_i\}_{i=1}^{N}$ be a set of projectors. For every state $\rho$ and $c > 0$,
\begin{multline}
1 - \mathrm{Tr}[P_N P_{N-1} \cdots P_1\,\rho\,P_1 \cdots P_{N-1} P_N] \\
\leq (1+c)\,\mathrm{Tr}[(\mathbb{1} - P_N)\rho] + (2 + c + c^{-1})\sum_{i=1}^{N-1}\mathrm{Tr}[(\mathbb{1} - P_i)\rho]. \tag{11.1.106}
\end{multline}

Proof: See Appendix 11.A. ■
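The inequality in (11.1.106) can be spot-checked numerically on randomly generated instances. The following is a minimal sketch, assuming NumPy; the helper names (`random_projector`, `random_state`, `union_bound_sides`), the dimensions, and the choice of the constant are illustrative assumptions. It evaluates both sides of the bound for a sequence of random projectors and a random state. Such a check is not a proof; it only illustrates how the two sides are assembled.

```python
import numpy as np

def random_projector(dim, rank, rng):
    """Projector onto the span of `rank` random complex vectors."""
    g = rng.normal(size=(dim, rank)) + 1j * rng.normal(size=(dim, rank))
    q, _ = np.linalg.qr(g)  # reduced QR: columns of q are orthonormal
    return q @ q.conj().T

def random_state(dim, rng):
    """A random density matrix of the given dimension."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def union_bound_sides(projectors, rho, c):
    """Return (lhs, rhs) of (11.1.106) for projectors [P_1, ..., P_N]."""
    identity = np.eye(rho.shape[0])
    chain = identity
    for p in projectors:
        chain = p @ chain  # after the loop: P_N ... P_1
    lhs = 1.0 - np.trace(chain @ rho @ chain.conj().T).real
    misses = [np.trace((identity - p) @ rho).real for p in projectors]
    rhs = (1 + c) * misses[-1] + (2 + c + 1 / c) * sum(misses[:-1])
    return lhs, rhs

rng = np.random.default_rng(seed=1)
results = []
for _ in range(200):
    rho = random_state(6, rng)
    projectors = [random_projector(6, 5, rng) for _ in range(3)]
    results.append(union_bound_sides(projectors, rho, c=1.0))
all_hold = all(lhs <= rhs + 1e-9 for lhs, rhs in results)
print(all_hold)
```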

Using this theorem, we place the following upper bound on the message error probability $p_{\mathrm{err}}(m,\mathcal{P};\mathcal{N})$:
\begin{multline}
p_{\mathrm{err}}(m,\mathcal{P};\mathcal{N}) \leq (1+c)\,\mathrm{Tr}\!\left[\hat{P}_m\,\omega^m_{B_1'\cdots B_{|\mathcal{M}|}'BR_1\cdots R_{|\mathcal{M}|}}\right] \\
+ (2 + c + c^{-1})\sum_{i=1}^{m-1}\mathrm{Tr}\!\left[P_i\,\omega^m_{B_1'\cdots B_{|\mathcal{M}|}'BR_1\cdots R_{|\mathcal{M}|}}\right], \tag{11.1.107}
\end{multline}
which holds for all $c > 0$. By making a particular choice for the projectors $P_1, \ldots, P_{|\mathcal{M}|}$, and a particular choice for the constant $c$, we obtain the following.

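Proposition 11.8 Lower Bound on One-Shot Entanglement-Assisted Classical Capacity
Let $\mathcal{N}_{A\to B}$ be a quantum channel. For $\varepsilon \in (0,1)$ and $\eta \in (0,\varepsilon)$, there exists an $(|\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocol over $\mathcal{N}_{A\to B}$ such that
\[
\log_2|\mathcal{M}| = \overline{I}_H^{\varepsilon-\eta}(\mathcal{N}) - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right). \tag{11.1.108}
\]
Consequently, for all $\varepsilon \in (0,1)$ and for all $\eta \in (0,\varepsilon)$,
\[
C_{\mathrm{EA}}^{\varepsilon}(\mathcal{N}) \geq \overline{I}_H^{\varepsilon-\eta}(\mathcal{N}) - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right). \tag{11.1.109}
\]
Here,
\[
\overline{I}_H^{\varepsilon}(\mathcal{N}) := \sup_{\psi_{RA}} \overline{I}_H^{\varepsilon}(R;B)_{\omega}, \tag{11.1.110}
\]
where $\omega_{RB} = \mathcal{N}_{A\to B}(\psi_{RA})$, $\psi_{RA}$ is a pure state with the dimension of $R$ the same as that of $A$, and
\[
\overline{I}_H^{\varepsilon}(A;B)_{\rho} := D_H^{\varepsilon}(\rho_{AB}\Vert\rho_A \otimes \rho_B). \tag{11.1.111}
\]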

Remark: The quantity $\overline{I}_H^{\varepsilon}(A;B)_{\rho}$ defined in the statement of Proposition 11.8 above is similar to the quantity $I_H^{\varepsilon}(A;B)_{\rho}$ defined in (7.11.88), except that we do not perform an optimization over states $\sigma_B$. The resulting channel quantity $\overline{I}_H^{\varepsilon}(\mathcal{N})$ is then similar to the quantity $I_H^{\varepsilon}(\mathcal{N})$ defined in (7.11.87). The fact that it suffices to optimize over pure states $\psi_{RA}$ in $\overline{I}_H^{\varepsilon}(\mathcal{N})$, with the dimension of $R$ the same as that of $A$, follows from arguments analogous to those presented in Section 7.11.

Proof: Fix $\varepsilon \in (0,1)$ and $\eta \in (0,\varepsilon)$. Starting with the encoded state on Bob’s systems as defined in (11.1.99), observe that the state of the systems $B$ and $B_{\hat{m}}'$ is given by
\[
\tau^m_{BB_{\hat{m}}'} = \begin{cases} \mathcal{N}_{A'\to B}(\rho_{A'B'}) & \text{if } \hat{m} = m, \\ \mathcal{N}_{A'\to B}(\rho_{A'}) \otimes \rho_{B'} & \text{if } \hat{m} \neq m. \end{cases} \tag{11.1.112}
\]
Recall that the system $B$ results from the system $A_m$ being sent through the channel by Alice. If, along with system $B$, he measures the system $B_m'$, then Bob is performing a measurement on the state $\mathcal{N}_{A'\to B}(\rho_{A'B'})$. On the other hand, if Bob measures a system $B_{\hat{m}}'$, with $\hat{m} \neq m$, then Bob is performing a measurement on the state $\mathcal{N}_{A'\to B}(\rho_{A'}) \otimes \rho_{B'}$. Bob, knowing the states $\rho_{A'B'}$ as well as the channel $\mathcal{N}$, can devise the measurement $\{\Lambda_{B'B}, \mathbb{1}_{B'B} - \Lambda_{B'B}\}$ that achieves the minimum value of the probability $\mathrm{Tr}[\Lambda_{B'B}(\rho_{B'} \otimes \mathcal{N}_{A'\to B}(\rho_{A'}))]$ while satisfying
\[
\mathrm{Tr}[\Lambda_{B'B}\,\mathcal{N}_{A'\to B}(\rho_{A'B'})] \geq 1 - (\varepsilon - \eta). \tag{11.1.113}
\]

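Equivalently, by recalling Definition 7.65, the measurement achieves the $(\varepsilon-\eta)$-hypothesis testing relative entropy
\[
D_H^{\varepsilon-\eta}(\mathcal{N}_{A'\to B}(\rho_{A'B'})\Vert\rho_{B'} \otimes \mathcal{N}_{A'\to B}(\rho_{A'})), \tag{11.1.114}
\]
meaning that
\begin{align}
-\log_2 \mathrm{Tr}[\Lambda_{B'B}(\rho_{B'} \otimes \mathcal{N}_{A'\to B}(\rho_{A'}))] &= D_H^{\varepsilon-\eta}(\mathcal{N}_{A'\to B}(\rho_{A'B'})\Vert\rho_{B'} \otimes \mathcal{N}_{A'\to B}(\rho_{A'})) \notag \\
&= \overline{I}_H^{\varepsilon-\eta}(B';B)_{\xi}, \tag{11.1.115}
\end{align}
where $\xi_{B'B} := \mathcal{N}_{A'\to B}(\rho_{A'B'})$.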
The measurement with POVM $\{\Lambda_{B'B}, \mathbb{1}_{B'B} - \Lambda_{B'B}\}$ forms one part of Bob’s decoding strategy. The other part of the decoding strategy is based on the fact that Bob does not know the position corresponding to the system $B$ he receives through the channel from Alice. He therefore does not know which of the systems $B_1', \ldots, B_{|\mathcal{M}|}'$ to measure along with $B$. As described before the statement of the proposition, the sequential decoding strategy consists of Bob performing a sequence of projective measurements on the systems $B_i'BR_i$ corresponding to the question “Was the $i$th message sent?”. Let us define the projectors $\{\Pi_{B'BR}, \mathbb{1}_{B'BR} - \Pi_{B'BR}\}$ on which this measurement is based as follows:
\[
\Pi_{B'BR} := U_{B'BR}^{\dagger}(\mathbb{1}_{B'B} \otimes |1\rangle\!\langle 1|_R)U_{B'BR}, \tag{11.1.116}
\]
where $R$ is a qubit system and the unitary $U_{B'BR}$ is defined as
\[
U_{B'BR} := \sqrt{\mathbb{1}_{B'B} - \Lambda_{B'B}} \otimes (|0\rangle\!\langle 0|_R + |1\rangle\!\langle 1|_R) + \sqrt{\Lambda_{B'B}} \otimes (|1\rangle\!\langle 0|_R - |0\rangle\!\langle 1|_R). \tag{11.1.117}
\]
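Then, it follows that
\begin{align}
\mathrm{Tr}[\Pi_{B'BR}(\mathcal{N}_{A'\to B}(\rho_{A'B'}) \otimes |0\rangle\!\langle 0|_R)] &= \mathrm{Tr}[(\mathbb{1}_{B'B} \otimes \langle 0|_R)\Pi_{B'BR}(\mathbb{1}_{B'B} \otimes |0\rangle_R)\,\mathcal{N}_{A'\to B}(\rho_{A'B'})] \tag{11.1.118} \\
&= \mathrm{Tr}[\Lambda_{B'B}\,\mathcal{N}_{A'\to B}(\rho_{A'B'})] \tag{11.1.119} \\
&\geq 1 - (\varepsilon - \eta), \tag{11.1.120}
\end{align}
where the second equality holds by the definition of $\Pi_{B'BR}$ in (11.1.116) and the fact that $(\mathbb{1}_{B'B} \otimes \langle 1|_R)U_{B'BR}(\mathbb{1}_{B'B} \otimes |0\rangle_R) = \sqrt{\Lambda_{B'B}}$, which can be seen from (11.1.117). To obtain the inequality, we used (11.1.113). Defining the projection operators $\{P_i\}_{i=1}^{|\mathcal{M}|}$ as in (11.1.100), and defining the state $\omega$ as in (11.1.103), it also holds that
\[
\mathrm{Tr}\!\left[P_i\,\omega^m_{B_1'\cdots B_{|\mathcal{M}|}'BR_1\cdots R_{|\mathcal{M}|}}\right] = \mathrm{Tr}[\Lambda_{B'B}(\rho_{B'} \otimes \mathcal{N}_{A'\to B}(\rho_{A'}))] \tag{11.1.121}
\]
for all $i < m$, and
\[
\mathrm{Tr}\!\left[\hat{P}_m\,\omega^m_{B_1'\cdots B_{|\mathcal{M}|}'BR_1\cdots R_{|\mathcal{M}|}}\right] = \mathrm{Tr}[(\mathbb{1}_{B'B} - \Lambda_{B'B})\,\mathcal{N}_{A'\to B}(\rho_{B'A'})]. \tag{11.1.122}
\]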
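Now, recall that the message error probability $p_{\mathrm{err}}(m,\mathcal{P};\mathcal{N})$ is defined as in (11.1.104), i.e.,
\[
p_{\mathrm{err}}(m,\mathcal{P};\mathcal{N}) = 1 - \mathrm{Tr}\!\left[P_m\hat{P}_{m-1}\cdots\hat{P}_1\,\omega^m_{B_1'\cdots B_{|\mathcal{M}|}'BR_1\cdots R_{|\mathcal{M}|}}\,\hat{P}_1\cdots\hat{P}_{m-1}P_m\right], \tag{11.1.123}
\]
and that we can use the quantum union bound (Theorem 11.7) to place an upper bound on this quantity as in (11.1.107), i.e.,
\begin{multline}
p_{\mathrm{err}}(m;\mathcal{P}) \leq (1+c)\,\mathrm{Tr}\!\left[\hat{P}_m\,\omega^m_{B_1'\cdots B_{|\mathcal{M}|}'BR_1\cdots R_{|\mathcal{M}|}}\right] \\
+ (2 + c + c^{-1})\sum_{i=1}^{m-1}\mathrm{Tr}\!\left[P_i\,\omega^m_{B_1'\cdots B_{|\mathcal{M}|}'BR_1\cdots R_{|\mathcal{M}|}}\right] \tag{11.1.124}
\end{multline}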
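for all $c > 0$. Using (11.1.121) and (11.1.122), the inequality in (11.1.113), and the equality in (11.1.115), the upper bound can be simplified so that
\begin{align}
p_{\mathrm{err}}(m,\mathcal{P};\mathcal{N}) &\leq (1+c)\,\mathrm{Tr}[(\mathbb{1}_{B'B} - \Lambda_{B'B})\,\mathcal{N}_{A'\to B}(\rho_{B'A'})] \notag \\
&\qquad + (2 + c + c^{-1})(m-1)\,\mathrm{Tr}[\Lambda_{B'B}(\rho_{B'} \otimes \mathcal{N}_{A'\to B}(\rho_{A'}))] \tag{11.1.125} \\
&\leq (1+c)(\varepsilon - \eta) + (2 + c + c^{-1})\,|\mathcal{M}|\,2^{-\overline{I}_H^{\varepsilon-\eta}(B';B)_{\xi}} \tag{11.1.126}
\end{align}
for all $c > 0$, where the second inequality follows because $m - 1 \leq |\mathcal{M}|$. The inequality in (11.1.126) holds for all $m \in \mathcal{M}$, which means that for all $c > 0$,
\[
p^*_{\mathrm{err}}(\mathcal{P};\mathcal{N}) \leq (1+c)(\varepsilon - \eta) + (2 + c + c^{-1})\,|\mathcal{M}|\,2^{-\overline{I}_H^{\varepsilon-\eta}(B';B)_{\xi}}. \tag{11.1.127}
\]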
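Let us set $\gamma \equiv \overline{I}_H^{\varepsilon-\eta}(B';B)_{\xi}$ and solve for the value of $|\mathcal{M}|$ such that
\[
(1+c)(\varepsilon - \eta) + (2 + c + c^{-1})\,|\mathcal{M}|\,2^{-\gamma} = \varepsilon. \tag{11.1.128}
\]
We find that
\[
|\mathcal{M}| = 2^{\gamma}\,b(\eta - b\varepsilon), \tag{11.1.129}
\]
where $b \equiv \frac{c}{1+c}$. Since $b$ is a variable and our goal is to make $|\mathcal{M}|$ as large as possible for fixed $\varepsilon$ and $\eta$, let us maximize $|\mathcal{M}|$ with respect to $b$. Solving $\frac{\partial|\mathcal{M}|}{\partial b} = 0$, we find that $b = \frac{\eta}{2\varepsilon}$. This is a permissible value of $b$ since it is required that $b > 0$ and $\eta - b\varepsilon \geq 0$. Plugging back into (11.1.129), we find that
\[
|\mathcal{M}| = 2^{\gamma}\,\frac{\eta^2}{4\varepsilon} = 2^{\overline{I}_H^{\varepsilon-\eta}(B';B)_{\xi} - \log_2\left(\frac{4\varepsilon}{\eta^2}\right)}. \tag{11.1.130}
\]
Thus, with $|\mathcal{M}|$ given by (11.1.130), we conclude that
\[
p^*_{\mathrm{err}}(\mathcal{P}) \leq \varepsilon, \tag{11.1.131}
\]
and this proves the existence of an $(|\mathcal{M}|, \varepsilon)$ protocol with $|\mathcal{M}|$ given by (11.1.130).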
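However, (11.1.130) holds for every state $\rho_{A'B'}$, which means that we can take
\begin{align}
\log_2|\mathcal{M}| &= \sup_{\rho_{A'B'}} \overline{I}_H^{\varepsilon-\eta}(B';B)_{\xi} - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right) \notag \\
&= \overline{I}_H^{\varepsilon-\eta}(\mathcal{N}) - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right), \tag{11.1.132}
\end{align}
and have (11.1.131) hold. This is precisely (11.1.108), and since $\varepsilon \in (0,1)$ and $\eta \in (0,\varepsilon)$ are arbitrary, the proof is complete. ■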
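The optimization over the free constant in (11.1.128)–(11.1.130) is elementary, and a short numerical check may help confirm the algebra. The sketch below uses arbitrary sample values for $\gamma$, $\varepsilon$, and $\eta$ and hypothetical function names; it confirms that the choice $b = \eta/(2\varepsilon)$ maximizes (11.1.129) over a grid and that the resulting number of messages makes the error bound in (11.1.127) equal to $\varepsilon$. In an actual protocol the number of messages must be an integer, so one would round down, which only decreases the error bound.

```python
import math

def message_count(gamma, eps, eta, b):
    """|M| from (11.1.129) as a function of b = c / (1 + c)."""
    return 2 ** gamma * b * (eta - b * eps)

def error_bound(gamma, eps, eta, c, num_messages):
    """Right-hand side of (11.1.127)."""
    return (1 + c) * (eps - eta) + (2 + c + 1 / c) * num_messages * 2 ** (-gamma)

# Sample values, chosen only for illustration.
gamma, eps, eta = 10.0, 0.1, 0.05

b_opt = eta / (2 * eps)        # stationary point of (11.1.129)
c_opt = b_opt / (1 - b_opt)    # invert b = c / (1 + c)
m_opt = message_count(gamma, eps, eta, b_opt)
m_closed_form = 2 ** gamma * eta ** 2 / (4 * eps)  # first form in (11.1.130)
bound_at_opt = error_bound(gamma, eps, eta, c_opt, m_opt)

# Grid search over admissible b (b > 0 and eta - b * eps >= 0).
grid = [k / 1000 for k in range(1, 500)]
b_best = max(grid, key=lambda b: message_count(gamma, eps, eta, b))

penalty_bits = math.log2(4 * eps / eta ** 2)  # log term in (11.1.108)
print(m_opt, bound_at_opt, b_best, penalty_bits)
```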

Let us take note of the following two facts from the proof of Proposition 11.8 given above:
1. Given a particular $\varepsilon \in (0,1)$ and an $\eta \in (0,\varepsilon)$, we can construct a position-based coding and sequential decoding protocol achieving a maximal error probability of $p^*_{\mathrm{err}}(\mathcal{P}) \leq \varepsilon$ by taking
\[
\log_2|\mathcal{M}| = \overline{I}_H^{\varepsilon-\eta}(B';B)_{\xi} - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right), \tag{11.1.133}
\]
where $\xi_{B'B} = \mathcal{N}_{A'\to B}(\rho_{A'B'})$ and $\rho_{A'B'}$ is the shared state that Alice and Bob start with (see (11.1.130)). Note that this holds for every state $\rho_{A'B'}$. We take the supremum at the end of the proof in order to obtain the highest possible number of transmitted bits, and also to obtain a quantity that is a function of the channel $\mathcal{N}$ only. The right-hand side of (11.1.108) can thus be achieved by determining the optimal shared state $\rho_{A'B'}$.
2. Since it suffices to optimize over pure states in order to obtain $\overline{I}_H^{\varepsilon-\eta}(\mathcal{N})$, we see that the shared state $\rho_{A'B'}$ can be taken to be pure, with the dimension of $B'$ the same as that of $A'$.
An immediate consequence of Propositions 11.8 and 7.72 is the following
theorem.

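Theorem 11.9 One-Shot Lower Bounds for Entanglement-Assisted Classical Communication
Let $\mathcal{N}_{A\to B}$ be a quantum channel. For all $\varepsilon \in (0,1)$, $\eta \in (0,\varepsilon)$, and $\alpha \in (0,1)$, there exists an $(|\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocol over $\mathcal{N}_{A\to B}$ such that
\[
\log_2|\mathcal{M}| \geq \overline{I}_{\alpha}(\mathcal{N}) - \frac{\alpha}{1-\alpha}\log_2\!\left(\frac{1}{\varepsilon-\eta}\right) - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right). \tag{11.1.134}
\]
Here,
\[
\overline{I}_{\alpha}(\mathcal{N}) := \sup_{\psi_{RA}} \overline{I}_{\alpha}(R;B)_{\omega}, \tag{11.1.135}
\]
where $\omega_{RB} = \mathcal{N}_{A\to B}(\psi_{RA})$, $\psi_{RA}$ is a pure state, $R$ has dimension equal to that of $A$, and
\[
\overline{I}_{\alpha}(A;B)_{\rho} := D_{\alpha}(\rho_{AB}\Vert\rho_A \otimes \rho_B). \tag{11.1.136}
\]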
Remark: The quantity $\overline{I}_{\alpha}(A;B)_{\rho}$ defined in the statement of Theorem 11.9 above is similar to the quantity $I_{\alpha}(A;B)_{\rho}$ defined in (7.11.90), except that we do not perform an optimization over states $\sigma_B$. The resulting channel quantity $\overline{I}_{\alpha}(\mathcal{N})$ is then similar to the quantity $I_{\alpha}(\mathcal{N})$ defined in (7.11.89). The fact that it suffices to optimize over pure states $\psi_{RA}$ in $\overline{I}_{\alpha}(\mathcal{N})$, with the dimension of $R$ the same as that of $A$, follows from arguments analogous to those presented in Section 7.11.

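Proof: From Proposition 11.8, we know that for all $\varepsilon \in (0,1)$ and $\eta \in (0,\varepsilon)$ there exists an $(|\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocol such that
\[
\log_2|\mathcal{M}| = \overline{I}_H^{\varepsilon-\eta}(\mathcal{N}) - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right). \tag{11.1.137}
\]
Proposition 7.72 relates the hypothesis testing relative entropy to the Petz–Rényi relative entropy according to
\[
D_H^{\varepsilon}(\rho\Vert\sigma) \geq D_{\alpha}(\rho\Vert\sigma) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{\varepsilon}\right) \tag{11.1.138}
\]
for all $\alpha \in (0,1)$, which implies that
\[
\overline{I}_H^{\varepsilon}(\mathcal{N}) \geq \overline{I}_{\alpha}(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{\varepsilon}\right). \tag{11.1.139}
\]
Combining this inequality with (11.1.137), we obtain the desired result. ■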

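Since there exists an $(|\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocol satisfying the inequality in (11.1.134), we have that
\[
C_{\mathrm{EA}}^{\varepsilon}(\mathcal{N}) \geq \overline{I}_{\alpha}(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{\varepsilon-\eta}\right) - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right) \tag{11.1.140}
\]
for all $\alpha \in (0,1)$, where $\varepsilon \in (0,1)$ and $\eta \in (0,\varepsilon)$.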

11.2 Entanglement-Assisted Classical Capacity of a Quantum Channel
Let us now consider the asymptotic setting of entanglement-assisted classical
communication. In this scenario, depicted in Figure 11.4, instead of encoding the
message into one quantum system and consequently using the channel N only once,
Alice encodes the message into 𝑛 ≥ 1 quantum systems 𝐴1 , . . . , 𝐴𝑛 , all with the
same dimension as that of 𝐴, and sends each one of these through the channel N.
We call this the asymptotic setting because the number 𝑛 can be arbitrarily large.
Figure 11.4: The most general entanglement-assisted classical communication protocol over $n \geq 1$ uses of a quantum channel $\mathcal{N}$. Alice and Bob initially share a pair of quantum systems in the state $\Psi_{A'B'}$. Alice, who wishes to send a message $m$ from a set $\mathcal{M}$ of messages, first encodes the message into a quantum state on $n$ quantum systems using an encoding channel $\mathcal{E}$. She then sends each quantum system through the channel $\mathcal{N}$. After Bob receives the systems, he performs a joint measurement on them and the system $B'$, using the outcome of the measurement to give an estimate $\hat{m}$ of the message $m$ sent to him by Alice.

The analysis of the asymptotic setting is almost exactly the same as that of the one-shot setting. This is due to the fact that $n$ independent uses of the channel $\mathcal{N}$ can be regarded as a single use of the tensor-product channel $\mathcal{N}^{\otimes n}$. So the only change that needs to be made is to replace $\mathcal{N}$ with $\mathcal{N}^{\otimes n}$ and to define the states and POVM elements as acting on $n$ systems instead of just one. In particular, the state at the end of the protocol is
\[ \omega^{p}_{M\hat{M}} = (\mathcal{D}_{B^n B'\to\hat{M}} \circ \mathcal{N}^{\otimes n}_{A\to B} \circ \mathcal{E}_{M'A'\to A^n})(\Phi^{p}_{MM'} \otimes \Psi_{A'B'}), \tag{11.2.1} \]

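where $p$ is the prior probability distribution over the set $\mathcal{M}$ of messages, the encoding channel $\mathcal{E}_{M'A'\to A^n}$ defines a set $\{\mathcal{E}^{m}_{A'\to A^n}\}_{m\in\mathcal{M}}$ of channels so that
\[ \mathcal{E}_{M'A'\to A^n}(\Phi^{p}_{MM'} \otimes \Psi_{A'B'}) = \sum_{m\in\mathcal{M}} p(m)\,|m\rangle\!\langle m|_{M} \otimes \mathcal{E}^{m}_{A'\to A^n}(\Psi_{A'B'}), \tag{11.2.2} \]
and the decoding channel $\mathcal{D}_{B^nB'\to\hat{M}}$, with associated POVM $\{\Lambda^{m}_{B^nB'}\}_{m\in\mathcal{M}}$, is defined as
\[ \mathcal{D}_{B^nB'\to\hat{M}}(\tau_{B^nB'}) = \sum_{m\in\mathcal{M}} \operatorname{Tr}[\Lambda^{m}_{B^nB'}\,\tau_{B^nB'}]\,|m\rangle\!\langle m|_{\hat{M}}. \tag{11.2.3} \]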
Then, for a given code specified by the encoding and decoding channels, the
definitions of the message error probability of the code, the average error probability
of the code, and the maximal error probability of the code all follow analogously
from their definitions in (11.1.13), (11.1.14), and (11.1.15), respectively, from the
one-shot setting.


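Definition 11.10 $(n, |\mathcal{M}|, \varepsilon)$ Entanglement-Assisted Classical Communication Protocol
Let $(\mathcal{M}, \Psi_{A'B'}, \mathcal{E}_{M'A'\to A^n}, \mathcal{D}_{B^nB'\to\hat{M}})$ be the elements of an entanglement-assisted classical communication protocol over $n$ uses of the channel $\mathcal{N}_{A\to B}$. The protocol is called an $(n, |\mathcal{M}|, \varepsilon)$ protocol, with $\varepsilon \in [0,1]$, if $p^{*}_{\mathrm{err}}(\mathcal{P}; \mathcal{N}^{\otimes n}) \leq \varepsilon$.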

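Note that if there exists an $(n, |\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocol, then there exists an $(n, |\mathcal{M}'|, \varepsilon)$ entanglement-assisted classical communication protocol for all $\mathcal{M}'$ satisfying $|\mathcal{M}'| \leq |\mathcal{M}|$. Indeed, simply take a subset $\mathcal{M}' \subseteq \mathcal{M}$ of size $|\mathcal{M}'|$ and define the encoding and decoding channels $\mathcal{E}'$ and $\mathcal{D}'$ as the restrictions of the original channels $\mathcal{E}$, $\mathcal{D}$ to the set $\mathcal{M}'$. Then, using the shorthand $\mathcal{P}' \equiv (\Psi, \mathcal{E}', \mathcal{D}')$,
\begin{align}
p^{*}_{\mathrm{err}}(\mathcal{P}'; \mathcal{N}) &= \max_{m' \in \mathcal{M}'} p_{\mathrm{err}}(m', \mathcal{P}'; \mathcal{N}) \tag{11.2.4} \\
&\leq \max_{m \in \mathcal{M}} p_{\mathrm{err}}(m, \mathcal{P}; \mathcal{N}) \tag{11.2.5} \\
&= p^{*}_{\mathrm{err}}(\mathcal{P}; \mathcal{N}) \tag{11.2.6} \\
&\leq \varepsilon, \tag{11.2.7}
\end{align}
where the first inequality holds because $\mathcal{M}'$ is a subset of $\mathcal{M}$. So we have an $(n, |\mathcal{M}'|, \varepsilon)$ protocol. Similarly, if there does not exist an $(n, |\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocol, then there does not exist an $(n, |\mathcal{M}'|, \varepsilon)$ entanglement-assisted classical communication protocol for all $\mathcal{M}'$ satisfying $|\mathcal{M}'| \geq |\mathcal{M}|$.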
The rate of an entanglement-assisted classical communication protocol over 𝑛
uses of a channel is equal to the number of bits that can be transmitted per channel
use, i.e.,
\[ R(n, |\mathcal{M}|) := \frac{1}{n}\log_2 |\mathcal{M}|. \tag{11.2.8} \]
Observe that the rate depends only on the number of messages in the set and on
the number of uses of the channel. In particular, it does not directly depend on
the communication channel nor on the encoding and decoding channels. Given a
channel N 𝐴→𝐵 and 𝜀 ∈ (0, 1), the maximum rate of entanglement-assisted classical
communication over N among all (𝑛, |M|, 𝜀) protocols is
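\begin{align}
C_{\mathrm{EA}}^{n,\varepsilon}(\mathcal{N}) &:= \frac{1}{n}\, C_{\mathrm{EA}}^{\varepsilon}(\mathcal{N}^{\otimes n}) \tag{11.2.9} \\
&= \sup_{(\mathcal{M},\Psi,\mathcal{E},\mathcal{D})} \left\{ \frac{1}{n}\log_2 |\mathcal{M}| : p^{*}_{\mathrm{err}}((\Psi,\mathcal{E},\mathcal{D}); \mathcal{N}^{\otimes n}) \leq \varepsilon \right\}, \tag{11.2.10}
\end{align}
where the optimization is over all entanglement-assisted classical communication protocols $(\mathcal{M}, \Psi_{A'B'}, \mathcal{E}_{M'A'\to A^n}, \mathcal{D}_{B^nB'\to\hat{M}})$ over $\mathcal{N}^{\otimes n}$, with $d_{M'} = d_{\hat{M}} = |\mathcal{M}|$.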
The goal of an entanglement-assisted classical communication protocol in the
asymptotic setting is to maximize the rate while at the same time keeping the
maximal error probability low, using the number 𝑛 of channel uses as a tunable
parameter. Ideally, we would want the error probability to vanish, and since we
want to determine the highest possible rate, we are not necessarily concerned about
the practical question regarding how many channel uses might be required, at least
in the asymptotic setting. In particular, as indicated by definitions, it might take an
arbitrarily large number of channel uses to obtain the highest rate with a vanishing
error probability.

Definition 11.11 Achievable Rate for Entanglement-Assisted Classical Communication
Given a quantum channel $\mathcal{N}$, a rate $R \in \mathbb{R}^{+}$ is called an achievable rate for entanglement-assisted classical communication over $\mathcal{N}$ if for all $\varepsilon \in (0,1]$, $\delta > 0$, and sufficiently large $n$, there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ entanglement-assisted classical communication protocol.

As we prove in Appendix A,
\[ R \text{ achievable rate} \iff \lim_{n\to\infty} \varepsilon^{*}_{\mathrm{EA}}(2^{n(R-\delta)}; \mathcal{N}^{\otimes n}) = 0 \quad \forall\, \delta > 0. \tag{11.2.11} \]

In other words, a rate 𝑅 is achievable if the optimal error probability for a sequence
of protocols with rate 𝑅 − 𝛿, 𝛿 > 0, vanishes as the number 𝑛 of uses of N increases.

Definition 11.12 Entanglement-Assisted Classical Capacity of a Quantum Channel
The entanglement-assisted classical capacity of a quantum channel $\mathcal{N}$, denoted by $C_{\mathrm{EA}}(\mathcal{N})$, is defined as the supremum of all achievable rates, i.e.,
\[ C_{\mathrm{EA}}(\mathcal{N}) := \sup\{R : R \text{ is an achievable rate for } \mathcal{N}\}. \tag{11.2.12} \]


The entanglement-assisted classical capacity can also be written as

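\[ C_{\mathrm{EA}}(\mathcal{N}) = \inf_{\varepsilon\in(0,1]} \liminf_{n\to\infty} \frac{1}{n}\, C_{\mathrm{EA}}^{\varepsilon}(\mathcal{N}^{\otimes n}). \tag{11.2.13} \]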

See Appendix A for a proof.

Definition 11.13 Weak Converse Rate for Entanglement-Assisted Classical Communication
Given a quantum channel N, a rate 𝑅 ∈ R+ is called a weak converse rate for
entanglement-assisted classical communication over N if every 𝑅′ > 𝑅 is not
an achievable rate for N.

We show in Appendix A that
\[ R \text{ weak converse rate} \iff \lim_{n\to\infty} \varepsilon^{*}_{\mathrm{EA}}(2^{n(R+\delta)}; \mathcal{N}^{\otimes n}) > 0 \quad \forall\, \delta > 0. \tag{11.2.14} \]

In other words, a weak converse rate is a rate above which the optimal error
probability cannot be made to vanish in the limit of a large number of channel uses.

Definition 11.14 Strong Converse Rate for Entanglement-Assisted Classical Communication
Given a quantum channel $\mathcal{N}$, a rate $R \in \mathbb{R}^{+}$ is called a strong converse rate for entanglement-assisted classical communication over $\mathcal{N}$ if for all $\varepsilon \in [0,1)$, $\delta > 0$, and sufficiently large $n$, there does not exist an $(n, 2^{n(R+\delta)}, \varepsilon)$ entanglement-assisted classical communication protocol over $\mathcal{N}$.

We show in Appendix A that

\[ R \text{ strong converse rate} \iff \lim_{n\to\infty} \varepsilon^{*}_{\mathrm{EA}}(2^{n(R+\delta)}; \mathcal{N}^{\otimes n}) = 1 \quad \forall\, \delta > 0. \tag{11.2.15} \]

In other words, unlike the weak converse, in which the optimal error is required
to simply be bounded away from zero as the number 𝑛 of channel uses increases,
in order to have a strong converse rate the optimal error has to converge to one
as 𝑛 increases. By comparing (11.2.14) and (11.2.15), it is clear that every strong
converse rate is a weak converse rate.


Definition 11.15 Strong Converse Entanglement-Assisted Classical Capacity of a Quantum Channel
The strong converse entanglement-assisted classical capacity of a quantum channel $\mathcal{N}$, denoted by $\widetilde{C}_{\mathrm{EA}}(\mathcal{N})$, is defined as the infimum of all strong converse rates, i.e.,
\[ \widetilde{C}_{\mathrm{EA}}(\mathcal{N}) := \inf\{R : R \text{ is a strong converse rate for } \mathcal{N}\}. \tag{11.2.16} \]

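As shown in general in Appendix A, we have that
\[ C_{\mathrm{EA}}(\mathcal{N}) \leq \widetilde{C}_{\mathrm{EA}}(\mathcal{N}) \tag{11.2.17} \]
for every quantum channel $\mathcal{N}$. We can also write the strong converse entanglement-assisted classical capacity as
\[ \widetilde{C}_{\mathrm{EA}}(\mathcal{N}) = \sup_{\varepsilon\in[0,1)} \limsup_{n\to\infty} \frac{1}{n}\, C_{\mathrm{EA}}^{\varepsilon}(\mathcal{N}^{\otimes n}). \tag{11.2.18} \]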

See Appendix A for a proof.

Having defined the entanglement-assisted classical capacity of a quantum
channel, as well as the strong converse capacity, we now state the main theorem
of this chapter, which gives an expression for the entanglement-assisted classical
capacity of a quantum channel.

Theorem 11.16 Entanglement-Assisted Classical Capacity

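For every quantum channel $\mathcal{N}$, its entanglement-assisted classical capacity $C_{\mathrm{EA}}(\mathcal{N})$ and its strong converse entanglement-assisted classical capacity $\widetilde{C}_{\mathrm{EA}}(\mathcal{N})$ are both equal to the mutual information $I(\mathcal{N})$, i.e.,
\[ C_{\mathrm{EA}}(\mathcal{N}) = \widetilde{C}_{\mathrm{EA}}(\mathcal{N}) = I(\mathcal{N}), \tag{11.2.19} \]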

where 𝐼 (N) is defined in (7.11.102).

There are two ingredients to proving Theorem 11.16:

1. Achievability: We show that $I(\mathcal{N})$ is an achievable rate. In general, to show that $R \in \mathbb{R}^{+}$ is achievable, we define the shared entangled state $\Psi_{A'B'}$ and construct encoding and decoding channels such that for all $\varepsilon \in (0,1]$ and sufficiently large $n$, the encoding and decoding channels correspond to $(n, 2^{nr}, \varepsilon)$ protocols, as per Definition 11.10, with rates $r < R$. Thus, if $R$ is an achievable rate, then, for every error probability $\varepsilon$, it is possible to find an $n$ large enough, along with encoding and decoding channels, such that the resulting protocol has rate arbitrarily close to $R$ and maximal error probability bounded from above by $\varepsilon$. The achievability part of the proof establishes that $C_{\mathrm{EA}}(\mathcal{N}) \geq I(\mathcal{N})$.
2. Strong Converse: We show that $I(\mathcal{N})$ is a strong converse rate, from which it follows that $\widetilde{C}_{\mathrm{EA}}(\mathcal{N}) \leq I(\mathcal{N})$. In general, to show that $R \in \mathbb{R}^{+}$ is a strong converse rate, we show that, given any shared entangled state $\Psi_{A'B'}$ and any encoding and decoding channels, for every rate $r > R$, $\varepsilon \in [0,1)$, and sufficiently large $n$, the communication protocol defined by the encoding and decoding channels is not an $(n, 2^{nr}, \varepsilon)$ protocol.
After showing the achievability and strong converse parts, we can use the inequality in (11.2.17) to conclude that
\[ I(\mathcal{N}) \leq C_{\mathrm{EA}}(\mathcal{N}) \leq \widetilde{C}_{\mathrm{EA}}(\mathcal{N}) \leq I(\mathcal{N}), \tag{11.2.20} \]
which immediately implies that $C_{\mathrm{EA}}(\mathcal{N}) = \widetilde{C}_{\mathrm{EA}}(\mathcal{N}) = I(\mathcal{N})$.

We first establish in Section 11.2.1 that the rate 𝐼 (N) is achievable for entanglement-assisted
classical communication over N. We then address the additivity of
the mutual information of a channel, in particular of the sandwiched Rényi mutual
information of a channel, in Section 11.2.2. Finally, we prove that 𝐼 (N) is a strong
converse rate in Section 11.2.3. This implies that 𝐼 (N) is a weak converse rate;
however, in Section 11.2.4, we provide an independent proof of this fact, as the
technique used in the proof is useful for alternate communication scenarios (besides
entanglement-assisted communication) for which a strong converse theorem is not
known to hold.

11.2.1 Proof of Achievability

In this section, we prove that $I(\mathcal{N})$ is an achievable rate for entanglement-assisted classical communication over $\mathcal{N}$.
First, recall from Theorem 11.9 that for all $\varepsilon \in (0,1)$, $\eta \in (0,\varepsilon)$, and $\alpha \in (0,1)$, there exists an $(|\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocol over $\mathcal{N}$ such that
\[ \log_2 |\mathcal{M}| \geq I_\alpha(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{\varepsilon-\eta}\right) - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right), \tag{11.2.21} \]
where $I_\alpha(\mathcal{N})$ is defined in (11.1.135). We obtained this result through a position-based coding strategy along with sequential decoding. A simple corollary of this result is the following.

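Corollary 11.17 Lower Bound for Entanglement-Assisted Classical Communication in Asymptotic Setting
Let $\mathcal{N}$ be a quantum channel. For all $\varepsilon \in (0,1]$, $n \in \mathbb{N}$, and $\alpha \in (0,1)$, there exists an $(n, |\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocol over $n$ uses of $\mathcal{N}$ such that
\[ \frac{1}{n}\log_2 |\mathcal{M}| \geq I_\alpha(\mathcal{N}) - \frac{1}{n(1-\alpha)}\log_2\!\left(\frac{2}{\varepsilon}\right) - \frac{3}{n}. \tag{11.2.22} \]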

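Proof: Fix $\varepsilon \in (0,1]$. The inequality (11.2.21) holds for every channel $\mathcal{N}$, which means that it holds for $\mathcal{N}^{\otimes n}$. Applying the inequality in (11.2.21) to $\mathcal{N}^{\otimes n}$ and dividing both sides by $n$, we obtain
\[ \frac{\log_2 |\mathcal{M}|}{n} \geq \frac{1}{n} I_\alpha(\mathcal{N}^{\otimes n}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{\varepsilon-\eta}\right) - \frac{1}{n}\log_2\!\left(\frac{4\varepsilon}{\eta^2}\right) \tag{11.2.23} \]
for all $\alpha \in (0,1)$. By restricting the optimization in the definition of $I_\alpha(\mathcal{N}^{\otimes n})$ to tensor-power states, we conclude that $I_\alpha(\mathcal{N}^{\otimes n}) \geq n I_\alpha(\mathcal{N})$. This follows from the additivity of the Petz–Rényi relative entropy under tensor-product states (see Proposition 7.23). So we obtain
\[ \frac{\log_2 |\mathcal{M}|}{n} \geq I_\alpha(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{\varepsilon-\eta}\right) - \frac{1}{n}\log_2\!\left(\frac{4\varepsilon}{\eta^2}\right) \tag{11.2.24} \]
for all $\alpha \in (0,1)$. Letting $\eta = \frac{\varepsilon}{2}$, and using the fact that $\alpha - 1$ is negative for $\alpha \in (0,1)$, this inequality becomes
\[ \frac{\log_2 |\mathcal{M}|}{n} \geq I_\alpha(\mathcal{N}) - \frac{1}{n(1-\alpha)}\log_2\!\left(\frac{2}{\varepsilon}\right) - \frac{3}{n} \tag{11.2.25} \]
for all $\alpha \in (0,1)$. Since $\varepsilon$ is arbitrary, we find that for all $\varepsilon \in (0,1]$, there exists an $(n, |\mathcal{M}|, \varepsilon)$ protocol such that (11.2.22) is satisfied, as required. ■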
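The step from (11.2.24) to (11.2.25) can be spelled out as follows. This is a worked sketch under the choice $\eta = \varepsilon/2$, so that $\varepsilon - \eta = \varepsilon/2$ and $4\varepsilon/\eta^2 = 16/\varepsilon$; the three lines treat the two correction terms and then combine them.

```latex
\begin{align*}
\frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{\varepsilon-\eta}\right)
  &= -\frac{\alpha}{n(1-\alpha)}\log_2\!\left(\frac{2}{\varepsilon}\right), \\
\frac{1}{n}\log_2\!\left(\frac{4\varepsilon}{\eta^2}\right)
  &= \frac{1}{n}\log_2\!\left(\frac{16}{\varepsilon}\right)
   = \frac{3}{n} + \frac{1}{n}\log_2\!\left(\frac{2}{\varepsilon}\right), \\
\frac{\alpha}{n(1-\alpha)}\log_2\!\left(\frac{2}{\varepsilon}\right)
  + \frac{1}{n}\log_2\!\left(\frac{2}{\varepsilon}\right)
  &= \frac{1}{n(1-\alpha)}\log_2\!\left(\frac{2}{\varepsilon}\right).
\end{align*}
```

Subtracting the combined term and $3/n$ from $I_\alpha(\mathcal{N})$ gives exactly the right-hand side of (11.2.25).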

The inequality in (11.2.22) gives us, for every 𝜀 ∈ (0, 1] and 𝑛 ∈ N, a lower
bound on the size |M| of the message set that we can take for a corresponding
(𝑛, |M|, 𝜀) entanglement-assisted classical communication protocol defined by
position-based coding and sequential decoding. If instead we fix a particular
communication rate 𝑅 by letting |M| = 2𝑛𝑅 , then we can rearrange the inequality
in (11.2.22) to obtain an upper bound on the maximal error probability of the
corresponding (𝑛, 2𝑛𝑅 , 𝜀) entanglement-assisted classical communication protocol.
Specifically, we conclude that
\[ \varepsilon \leq 2 \cdot 2^{-n(1-\alpha)\left(I_\alpha(\mathcal{N}) - R - \frac{3}{n}\right)} \tag{11.2.26} \]
for all $\alpha \in (0,1)$.
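As a rough numerical illustration of (11.2.26), the following Python sketch evaluates its right-hand side. The function name and the sample numbers are hypothetical, the value of $I_\alpha(\mathcal{N})$ has to be supplied by the caller, and the result is capped at 1 because an error probability never exceeds 1.

```python
def achievability_error_bound(n, rate, i_alpha, alpha):
    """Right-hand side of (11.2.26), capped at 1.

    n       : number of channel uses
    rate    : communication rate R, so that |M| = 2^(n R)
    i_alpha : assumed value of the Petz-Renyi quantity I_alpha(N), in bits
    alpha   : Renyi parameter, must lie in (0, 1)
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    exponent = n * (1.0 - alpha) * (i_alpha - rate - 3.0 / n)
    # 2 * 2^(-exponent) >= 1 whenever exponent <= 1, so the bound is trivial there.
    if exponent <= 1.0:
        return 1.0
    return 2.0 ** (1.0 - exponent)


# Hypothetical numbers: I_alpha(N) = 1.5 bits, alpha = 0.5, rate R = 1.0.
for n in (10, 100, 1000):
    print(n, achievability_error_bound(n, rate=1.0, i_alpha=1.5, alpha=0.5))
```

Whenever $R < I_\alpha(\mathcal{N})$, the bound decays exponentially in $n$ once $n$ is large enough for the $3/n$ term to be negligible; for $R \geq I_\alpha(\mathcal{N})$ it carries no information.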
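The inequality in (11.2.22) implies that
\[ C_{\mathrm{EA}}^{n,\varepsilon}(\mathcal{N}) \geq I_\alpha(\mathcal{N}) - \frac{1}{n(1-\alpha)}\log_2\!\left(\frac{2}{\varepsilon}\right) - \frac{3}{n} \tag{11.2.27} \]
for all $\varepsilon \in (0,1]$ and $\alpha \in (0,1)$.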
We can now use (11.2.22) to prove that the mutual information 𝐼 (N) is an
achievable rate for entanglement-assisted classical communication over N.

Proof of the Achievability Part of Theorem 11.16

Fix $\varepsilon \in (0,1]$ and $\delta > 0$. Let $\delta_1, \delta_2 > 0$ be such that
\[ \delta = \delta_1 + \delta_2. \tag{11.2.28} \]
Set $\alpha \in (0,1)$ such that
\[ \delta_1 \geq I(\mathcal{N}) - I_\alpha(\mathcal{N}), \tag{11.2.29} \]
which is possible since $I_\alpha(\mathcal{N})$ is monotonically increasing in $\alpha$ (this follows from Proposition 7.23), and since $\lim_{\alpha\to 1^{-}} I_\alpha(\mathcal{N}) = I(\mathcal{N})$ (see Appendix 11.B for a proof). With this value of $\alpha$, take $n$ large enough so that
\[ \delta_2 \geq \frac{1}{n(1-\alpha)}\log_2\!\left(\frac{2}{\varepsilon}\right) + \frac{3}{n}. \tag{11.2.30} \]

Now, making use of the inequality in (11.2.22) of Corollary 11.17, there exists an $(n, |\mathcal{M}|, \varepsilon)$ protocol, with $n$ and $\varepsilon$ chosen as above, such that
\[ \frac{\log_2 |\mathcal{M}|}{n} \geq I_\alpha(\mathcal{N}) - \frac{1}{n(1-\alpha)}\log_2\!\left(\frac{2}{\varepsilon}\right) - \frac{3}{n}. \tag{11.2.31} \]
Rearranging the right-hand side of this inequality, and using (11.2.28)–(11.2.30), we find that
\begin{align}
\frac{\log_2 |\mathcal{M}|}{n} &\geq I(\mathcal{N}) - \left( I(\mathcal{N}) - I_\alpha(\mathcal{N}) + \frac{1}{n(1-\alpha)}\log_2\!\left(\frac{2}{\varepsilon}\right) + \frac{3}{n} \right) \tag{11.2.32} \\
&\geq I(\mathcal{N}) - (\delta_1 + \delta_2) \tag{11.2.33} \\
&= I(\mathcal{N}) - \delta. \tag{11.2.34}
\end{align}

We thus have $I(\mathcal{N}) - \delta \leq \frac{1}{n}\log_2 |\mathcal{M}|$. By the fact stated immediately after Definition 11.10, we conclude that there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ entanglement-assisted classical communication protocol with $R = I(\mathcal{N})$ for all sufficiently large $n$ such that (11.2.30) holds. Since $\varepsilon$ and $\delta$ are arbitrary, we have that for all $\varepsilon \in (0,1]$, $\delta > 0$, and sufficiently large $n$, there exists an $(n, 2^{n(I(\mathcal{N})-\delta)}, \varepsilon)$ entanglement-assisted classical communication protocol. This means that $I(\mathcal{N})$ is an achievable rate, and thus that $C_{\mathrm{EA}}(\mathcal{N}) \geq I(\mathcal{N})$. See Appendix 11.C for a discussion of a different way of seeing the achievability proof.
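To get a feeling for how large $n$ must be in (11.2.30), the condition can be solved for $n$: it holds exactly when $n \geq \frac{1}{\delta_2}\bigl[\frac{1}{1-\alpha}\log_2(2/\varepsilon) + 3\bigr]$. The Python sketch below computes this threshold; the function names and the sample parameters are hypothetical choices made only for illustration.

```python
import math


def rate_penalty(n, eps, alpha):
    """Right-hand side of (11.2.30) at blocklength n."""
    return math.log2(2.0 / eps) / (n * (1.0 - alpha)) + 3.0 / n


def min_channel_uses(eps, alpha, delta2):
    """Smallest integer n >= 1 for which (11.2.30) holds."""
    if not (0.0 < eps <= 1.0 and 0.0 < alpha < 1.0 and delta2 > 0.0):
        raise ValueError("need eps in (0, 1], alpha in (0, 1), delta2 > 0")
    threshold = (math.log2(2.0 / eps) / (1.0 - alpha) + 3.0) / delta2
    return max(1, math.ceil(threshold))


# Hypothetical parameters: eps = 0.01, alpha = 0.9, delta2 = 0.05.
n_min = min_channel_uses(0.01, 0.9, 0.05)
print(n_min, rate_penalty(n_min, 0.01, 0.9))
```

The threshold grows as $\alpha \to 1$, which reflects the trade-off in the proof: $\alpha$ close to 1 makes $\delta_1$ small but requires more channel uses to make the penalty term smaller than $\delta_2$.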

11.2.2 Additivity of the Sandwiched Rényi Mutual Information of a Channel

We now turn our attention to establishing converse bounds for entanglement-assisted classical communication in the asymptotic setting. Recall from Theorem 11.6 that, for every quantum channel $\mathcal{N}$, $\varepsilon \in [0,1)$, and all $(|\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocols over $\mathcal{N}$,
\begin{align}
\log_2 |\mathcal{M}| &\leq \frac{1}{1-\varepsilon}\left[ I(\mathcal{N}) + h_2(\varepsilon) \right], \tag{11.2.35} \\
\log_2 |\mathcal{M}| &\leq \widetilde{I}_\alpha(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\, \alpha > 1. \tag{11.2.36}
\end{align}

To obtain these inequalities, we considered an entanglement-assisted classical communication protocol over a useless channel and used the hypothesis testing relative entropy to compare this protocol with the actual protocol over the channel $\mathcal{N}$. The useless channel in the asymptotic setting is analogous to the one in Figure 11.2 and is shown in Figure 11.5. A simple corollary of Theorem 11.6, which is relevant for the asymptotic setting, is the following:

Figure 11.5: Depiction of a protocol that is useless for entanglement-assisted classical communication in the asymptotic setting. The state encoding the message $m$ via $\mathcal{E}$ is discarded and replaced by an arbitrary (but fixed) state $\sigma_{B^n}$.

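Corollary 11.18 Upper Bounds for Entanglement-Assisted Classical Communication in Asymptotic Setting
Let $\mathcal{N}$ be a quantum channel. For all $\varepsilon \in [0,1)$, $n \in \mathbb{N}$, and $(n, |\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocols over $n$ uses of $\mathcal{N}$, the rate of transmitted bits is bounded from above as follows:
\begin{align}
\frac{1}{n}\log_2 |\mathcal{M}| &\leq \frac{1}{1-\varepsilon}\left[ \frac{1}{n} I(\mathcal{N}^{\otimes n}) + \frac{1}{n} h_2(\varepsilon) \right], \tag{11.2.37} \\
\frac{1}{n}\log_2 |\mathcal{M}| &\leq \frac{1}{n}\widetilde{I}_\alpha(\mathcal{N}^{\otimes n}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\, \alpha > 1. \tag{11.2.38}
\end{align}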

Proof: Since the inequalities (11.2.35) and (11.2.36) of Theorem 11.6 hold for
every channel N, they hold for the channel N ⊗𝑛 . Therefore, applying (11.2.35) and
(11.2.36) to N ⊗𝑛 and dividing both sides by 𝑛, we immediately obtain the desired
result. ■

The inequalities in the corollary above give us, for all $\varepsilon \in [0,1)$ and $n \in \mathbb{N}$, an upper bound on the size $|\mathcal{M}|$ of the message set we can take for every corresponding $(n, |\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocol. If instead we fix a particular communication rate $R$ by letting $|\mathcal{M}| = 2^{nR}$, then we can obtain a lower bound on the maximal error probability of the corresponding $(n, 2^{nR}, \varepsilon)$ entanglement-assisted classical communication protocol. Specifically, using (11.2.38), we find that
\[ \varepsilon \geq 1 - 2^{-n\left(\frac{\alpha-1}{\alpha}\right)\left(R - \frac{1}{n}\widetilde{I}_\alpha(\mathcal{N}^{\otimes n})\right)} \tag{11.2.39} \]
for all $\alpha > 1$.
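The lower bound (11.2.39) can be evaluated numerically in the same spirit. In the Python sketch below, the argument `i_tilde_per_use` stands for $\frac{1}{n}\widetilde{I}_\alpha(\mathcal{N}^{\otimes n})$ and must be supplied by the caller; the function name and the sample numbers are hypothetical.

```python
def strong_converse_error_bound(n, rate, i_tilde_per_use, alpha):
    """Right-hand side of (11.2.39), floored at 0.

    n               : number of channel uses
    rate            : communication rate R, so that |M| = 2^(n R)
    i_tilde_per_use : assumed value of (1/n) * I~_alpha(N^{(x) n}), in bits
    alpha           : Renyi parameter, must be > 1
    """
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1")
    exponent = n * ((alpha - 1.0) / alpha) * (rate - i_tilde_per_use)
    # For exponent <= 0 the bound is non-positive and hence trivial.
    if exponent <= 0.0:
        return 0.0
    return 1.0 - 2.0 ** (-exponent)


# Hypothetical numbers: per-use quantity 1.5 bits, alpha = 2, rate R = 1.6.
for n in (10, 100, 1000):
    print(n, strong_converse_error_bound(n, rate=1.6, i_tilde_per_use=1.5, alpha=2.0))
```

For a rate above the per-use quantity, the bound approaches 1 exponentially fast in $n$, which is the behaviour that a strong converse requires.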
The inequality in (11.2.37) can be simplified by using the fact that the mutual
information of a channel is additive, as stated in the following theorem:

Theorem 11.19 Additivity of Mutual Information of a Quantum Channel

For every two quantum channels N1 and N2 , the mutual information of N1 ⊗ N2
is equal to the sum of mutual informations of N1 and N2 , i.e.,

𝐼 (N1 ⊗ N2 ) = 𝐼 (N1 ) + 𝐼 (N2 ). (11.2.40)

Proof: We first recall from (7.11.102) that the mutual information $I(\mathcal{N})$ of the channel $\mathcal{N}$ is defined as
\begin{align}
I(\mathcal{N}) &= \sup_{\psi_{RA}} I(R;B)_\omega \notag \\
&= \sup_{\psi_{RA}} D(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \psi_R \otimes \mathcal{N}_{A\to B}(\psi_A)), \tag{11.2.41}
\end{align}
where $\omega_{RB} = \mathcal{N}_{A\to B}(\psi_{RA})$, $\psi_{RA}$ is a pure state, and $R$ is a system with the same dimension as $A$. Note that, as shown in Section 7.11 in the context of generalized divergences, optimizing over all states $\rho_{RA}$ is not required, since
\[ \sup_{\rho_{RA}} I(R;B)_\omega = \sup_{\psi_{RA}} I(R;B)_\omega. \tag{11.2.42} \]

We also recall that the mutual information $I(A;B)_\rho$ of a bipartite state $\rho_{AB}$ is a special case of conditional mutual information $I(A;B|C)$ with trivial conditioning system $C$. By applying the additivity of conditional mutual information (see (7.2.133)), we thus conclude additivity of mutual information for product states $\tau_{A_1B_1} \otimes \omega_{A_2B_2}$:
\[ I(A_1A_2;B_1B_2)_{\tau\otimes\omega} = I(A_1;B_1)_\tau + I(A_2;B_2)_\omega. \tag{11.2.43} \]

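Using these facts, the inequality
\[ I(\mathcal{N}_1 \otimes \mathcal{N}_2) \geq I(\mathcal{N}_1) + I(\mathcal{N}_2) \tag{11.2.44} \]
is straightforward to establish. Letting
\begin{align}
\rho_{R_1R_2B_1B_2} &:= ((\mathcal{N}_1)_{A_1\to B_1} \otimes (\mathcal{N}_2)_{A_2\to B_2})(\psi_{R_1R_2A_1A_2}), \tag{11.2.45} \\
\tau_{R_1B_1} &:= (\mathcal{N}_1)_{A_1\to B_1}(\phi_{R_1A_1}), \tag{11.2.46} \\
\omega_{R_2B_2} &:= (\mathcal{N}_2)_{A_2\to B_2}(\varphi_{R_2A_2}), \tag{11.2.47}
\end{align}
and restricting the optimization in the mutual information of $\mathcal{N}_1 \otimes \mathcal{N}_2$ to pure product states $\phi \otimes \varphi$, we get that
\begin{align}
I(\mathcal{N}_1 \otimes \mathcal{N}_2) &= \sup_{\psi} I(R_1R_2;B_1B_2)_\rho \tag{11.2.48} \\
&\geq \sup_{\phi\otimes\varphi} I(R_1R_2;B_1B_2)_{\tau\otimes\omega} \tag{11.2.49} \\
&= \sup_{\phi\otimes\varphi} \left\{ I(R_1;B_1)_\tau + I(R_2;B_2)_\omega \right\} \tag{11.2.50} \\
&= \sup_{\phi} I(R_1;B_1)_\tau + \sup_{\varphi} I(R_2;B_2)_\omega \tag{11.2.51} \\
&= I(\mathcal{N}_1) + I(\mathcal{N}_2). \tag{11.2.52}
\end{align}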

To prove the reverse inequality, let $\rho_{RB_1B_2} := ((\mathcal{N}_1)_{A_1\to B_1} \otimes (\mathcal{N}_2)_{A_2\to B_2})(\psi_{RA_1A_2})$. Then, using the formula in (7.1.8) for the mutual information in terms of the quantum entropy, it is straightforward to verify that
\[ I(R;B_1B_2)_\rho = I(R;B_1)_\rho + I(RB_1;B_2)_\rho - I(B_1;B_2)_\rho. \tag{11.2.53} \]
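The verification of (11.2.53) can be written out explicitly. The sketch below assumes only the entropic formula $I(A;B) = H(A) + H(B) - H(AB)$; all entropies are evaluated on the state $\rho_{RB_1B_2}$ and its marginals.

```latex
\begin{align*}
& I(R;B_1)_\rho + I(RB_1;B_2)_\rho - I(B_1;B_2)_\rho \\
&\quad = \bigl[ H(R) + H(B_1) - H(RB_1) \bigr]
       + \bigl[ H(RB_1) + H(B_2) - H(RB_1B_2) \bigr]
       - \bigl[ H(B_1) + H(B_2) - H(B_1B_2) \bigr] \\
&\quad = H(R) + H(B_1B_2) - H(RB_1B_2) \\
&\quad = I(R;B_1B_2)_\rho .
\end{align*}
```

The terms $H(B_1)$, $H(B_2)$, and $H(RB_1)$ cancel pairwise, which leaves exactly the entropic expression for $I(R;B_1B_2)_\rho$.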

Now, Klein’s inequality in Proposition 7.3 implies that the mutual information is non-negative. Using this fact on the last term in (11.2.53), we find that
\[ I(R;B_1B_2)_\rho \leq I(R;B_1)_\rho + I(RB_1;B_2)_\rho. \tag{11.2.54} \]
Since $\mathcal{N}_2$ is trace preserving, we have that
\[ \rho_{RB_1} = \operatorname{Tr}_{B_2}[\rho_{RB_1B_2}] = (\mathcal{N}_1)_{A_1\to B_1}(\psi_{RA_1}). \tag{11.2.55} \]
Therefore,
\[ I(R;B_1)_\rho \leq \sup_{\rho_{RA_1}} I(R;B_1)_\tau = I(\mathcal{N}_1), \tag{11.2.56} \]
where the equality follows from (11.2.42). Similarly, by writing $\rho_{RB_1B_2}$ as
\[ \rho_{RB_1B_2} = (\mathcal{N}_2)_{A_2\to B_2}(\omega_{RB_1A_2}), \qquad \omega_{RB_1A_2} := (\mathcal{N}_1)_{A_1\to B_1}(\psi_{RA_1A_2}), \tag{11.2.57} \]
we get that
\[ I(RB_1;B_2)_\rho \leq I(\mathcal{N}_2). \tag{11.2.58} \]
Therefore,
\[ I(R;B_1B_2)_\rho \leq I(\mathcal{N}_1) + I(\mathcal{N}_2). \tag{11.2.59} \]
Since the state $\psi_{RA_1A_2}$ that we started with is arbitrary, we obtain
\[ I(\mathcal{N}_1 \otimes \mathcal{N}_2) = \sup_{\psi_{RA_1A_2}} I(R;B_1B_2)_\rho \leq I(\mathcal{N}_1) + I(\mathcal{N}_2), \tag{11.2.60} \]
as required. Combining this inequality with that in (11.2.44), we have the required equality, $I(\mathcal{N}_1 \otimes \mathcal{N}_2) = I(\mathcal{N}_1) + I(\mathcal{N}_2)$. ■
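The two state-level identities used in this proof, the chain-type identity (11.2.53) and the additivity (11.2.43) on product states, can be checked numerically on random states. The NumPy sketch below is only a sanity check under stated assumptions: the helper names (`entropy`, `partial_trace`, `mutual_information`, `random_state`) are hypothetical, all systems are qubits, and states are sampled as normalized Wishart matrices.

```python
import numpy as np


def entropy(rho):
    """Von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # drop numerically zero eigenvalues
    return float(-np.sum(evals * np.log2(evals)))


def partial_trace(rho, dims, keep):
    """Reduced state on the subsystems whose indices are listed in `keep`."""
    t = rho.reshape(list(dims) + list(dims))
    # Trace out unwanted subsystems from the highest index down, so that the
    # positions of the remaining row/column axes stay easy to track.
    for i in sorted(set(range(len(dims))) - set(keep), reverse=True):
        t = np.trace(t, axis1=i, axis2=i + t.ndim // 2)
    d = int(np.prod([dims[i] for i in keep]))
    return t.reshape(d, d)


def mutual_information(rho, dims, a, b):
    """I(a;b) = H(a) + H(b) - H(ab); a and b are lists of subsystem indices."""
    return (entropy(partial_trace(rho, dims, a))
            + entropy(partial_trace(rho, dims, b))
            - entropy(partial_trace(rho, dims, sorted(a + b))))


def random_state(d, rng):
    """Random full-rank density matrix of dimension d."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


rng = np.random.default_rng(0)

# Identity (11.2.53) on a random state of R, B1, B2 (indices 0, 1, 2).
dims3 = [2, 2, 2]
rho = random_state(8, rng)
lhs = mutual_information(rho, dims3, [0], [1, 2])
rhs = (mutual_information(rho, dims3, [0], [1])
       + mutual_information(rho, dims3, [0, 1], [2])
       - mutual_information(rho, dims3, [1], [2]))
chain_gap = abs(lhs - rhs)

# Additivity (11.2.43) for tau_{A1 B1} (x) omega_{A2 B2}
# (subsystem order A1, B1, A2, B2 corresponds to indices 0, 1, 2, 3).
tau, omega = random_state(4, rng), random_state(4, rng)
joint = mutual_information(np.kron(tau, omega), [2, 2, 2, 2], [0, 2], [1, 3])
parts = (mutual_information(tau, [2, 2], [0], [1])
         + mutual_information(omega, [2, 2], [0], [1]))
additivity_gap = abs(joint - parts)

print(chain_gap, additivity_gap)
```

Both gaps should vanish up to floating-point error. The check says nothing about the optimization over input states, which is where the actual proof of Theorem 11.19 does its work.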

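Using the additivity of the mutual information of a channel, the inequality in (11.2.37) can be rewritten as
\[ \frac{\log_2 |\mathcal{M}|}{n} \leq \frac{1}{1-\varepsilon}\left[ I(\mathcal{N}) + \frac{1}{n} h_2(\varepsilon) \right], \tag{11.2.61} \]
which implies that
\[ C_{\mathrm{EA}}^{n,\varepsilon}(\mathcal{N}) \leq \frac{1}{1-\varepsilon}\left[ I(\mathcal{N}) + \frac{1}{n} h_2(\varepsilon) \right] \tag{11.2.62} \]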
for all 𝑛 ≥ 1 and 𝜀 ∈ (0, 1). Using this inequality, it is straightforward to conclude
that 𝐼 (N) is a weak converse rate for entanglement-assisted classical communication,
and the interested reader can jump ahead to Section 11.2.4 to see this.
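The bound in (11.2.62) is easy to tabulate. The following Python sketch evaluates its right-hand side; the function names and the sample value $I(\mathcal{N}) = 1$ bit are hypothetical.

```python
import math


def binary_entropy(eps):
    """Binary entropy h_2(eps) in bits, with h_2(0) = h_2(1) = 0."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1.0 - eps) * math.log2(1.0 - eps)


def weak_converse_rate_bound(n, eps, mutual_info):
    """Right-hand side of (11.2.62) for n channel uses and error eps in [0, 1)."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    return (mutual_info + binary_entropy(eps) / n) / (1.0 - eps)


# Hypothetical channel with I(N) = 1 bit and error tolerance eps = 0.1.
for n in (1, 10, 100, 10**6):
    print(n, weak_converse_rate_bound(n, eps=0.1, mutual_info=1.0))
```

For fixed $\varepsilon > 0$ the bound tends to $I(\mathcal{N})/(1-\varepsilon)$ as $n \to \infty$, which exceeds $I(\mathcal{N})$ whenever $I(\mathcal{N}) > 0$. It therefore recovers $I(\mathcal{N})$ only in the additional limit $\varepsilon \to 0$, which is why it yields a weak converse rather than a strong one.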
We now show that the sandwiched Rényi mutual information $\widetilde{I}_\alpha(\mathcal{N})$ of a channel $\mathcal{N}$ is also additive, i.e., that
\[ \widetilde{I}_\alpha(\mathcal{N}_1 \otimes \mathcal{N}_2) = \widetilde{I}_\alpha(\mathcal{N}_1) + \widetilde{I}_\alpha(\mathcal{N}_2) \tag{11.2.63} \]
for all $\alpha > 1$. One ingredient of the proof is the additivity of the sandwiched Rényi mutual information of bipartite states with respect to tensor-product states, i.e.,
\[ \widetilde{I}_\alpha(A_1A_2;B_1B_2)_{\xi\otimes\omega} = \widetilde{I}_\alpha(A_1;B_1)_\xi + \widetilde{I}_\alpha(A_2;B_2)_\omega, \tag{11.2.64} \]
where $\xi_{A_1B_1}$ and $\omega_{A_2B_2}$ are states. To show this, let us first recall the definition of the sandwiched Rényi mutual information of a bipartite state $\rho_{AB}$ from (7.11.92):
\[ \widetilde{I}_\alpha(A;B)_\rho = \inf_{\sigma_B} \widetilde{D}_\alpha(\rho_{AB} \Vert \rho_A \otimes \sigma_B), \tag{11.2.65} \]
where the optimization is over states $\sigma_B$. This quantity, as well as the sandwiched Rényi mutual information of a channel, can be written in an alternate way, as we show in the following lemma, the proof of which can be found in Appendix 11.D.


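Lemma 11.20
For every bipartite state $\rho_{AB}$ and $\alpha > 1$, the sandwiched Rényi mutual information $\widetilde{I}_\alpha(A;B)_\rho$ can be written as
\[ \widetilde{I}_\alpha(A;B)_\rho = \frac{\alpha}{\alpha-1}\log_2 \sup_{\tau_C} \left\| \operatorname{Tr}_{AC}\!\left[ \left( \rho_A^{\frac{1-\alpha}{\alpha}} \otimes \tau_C^{\frac{\alpha-1}{\alpha}} \right) |\psi\rangle\!\langle\psi|_{ABC} \right] \right\|_{\frac{\alpha}{2\alpha-1}}, \tag{11.2.66} \]
where $|\psi\rangle_{ABC}$ is a purification of $\rho_{AB}$ and $\tau_C$ is a state. The sandwiched Rényi mutual information $\widetilde{I}_\alpha(\mathcal{N})$ of a channel $\mathcal{N}$ can be written as
\[ \widetilde{I}_\alpha(\mathcal{N}) = \frac{\alpha}{\alpha-1} \inf_{\sigma_B} \log_2 \left\| \mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N} \right\|_{\mathrm{CB},\,1\to\alpha}, \tag{11.2.67} \]
where $\mathcal{S}^{(\alpha)}_{\sigma_B}(\cdot) := \sigma_B^{\frac{1-\alpha}{2\alpha}} (\cdot)\, \sigma_B^{\frac{1-\alpha}{2\alpha}}$ and
\[ \left\| \mathcal{M} \right\|_{\mathrm{CB},\,1\to\alpha} := \sup_{\substack{Y_R > 0, \\ \operatorname{Tr}[Y_R] \leq 1}} \left\| \mathcal{M}_{A\to B}\!\left( Y_R^{\frac{1}{2\alpha}} |\Gamma\rangle\!\langle\Gamma|_{RA} Y_R^{\frac{1}{2\alpha}} \right) \right\|_{\alpha}, \tag{11.2.68} \]
with $\mathcal{M}$ a completely positive map. (See Appendix 11.E for an alternate expression for $\|\cdot\|_{\mathrm{CB},\,1\to\alpha}$.)

Proof: See Appendix 11.D. ■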

Using the alternate expression in (11.2.66), we establish the additivity statement in (11.2.64) of the sandwiched Rényi mutual information of a bipartite state.

Proposition 11.21 Additivity of Sandwiched Rényi Mutual Information of Bipartite States
For every product state $\xi_{A_1B_1} \otimes \omega_{A_2B_2}$ and $\alpha > 1$, the sandwiched Rényi mutual information $\widetilde{I}_\alpha(A_1A_2;B_1B_2)_{\xi\otimes\omega}$ is additive, i.e.,
\[ \widetilde{I}_\alpha(A_1A_2;B_1B_2)_{\xi\otimes\omega} = \widetilde{I}_\alpha(A_1;B_1)_\xi + \widetilde{I}_\alpha(A_2;B_2)_\omega. \tag{11.2.69} \]

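Proof: By definition, we have that
\[ \widetilde{I}_\alpha(A_1A_2;B_1B_2)_{\xi\otimes\omega} = \inf_{\sigma_{B_1B_2}} \widetilde{D}_\alpha(\xi_{A_1B_1} \otimes \omega_{A_2B_2} \Vert \xi_{A_1} \otimes \omega_{A_2} \otimes \sigma_{B_1B_2}). \tag{11.2.70} \]
If we restrict the optimization to product states $\sigma^1_{B_1} \otimes \sigma^2_{B_2}$, then we find that
\begin{align}
\widetilde{I}_\alpha(A_1A_2;B_1B_2)_{\xi\otimes\omega}
&\leq \inf_{\sigma^1_{B_1} \otimes \sigma^2_{B_2}} \widetilde{D}_\alpha(\xi_{A_1B_1} \otimes \omega_{A_2B_2} \Vert \xi_{A_1} \otimes \omega_{A_2} \otimes \sigma^1_{B_1} \otimes \sigma^2_{B_2}) \tag{11.2.71} \\
&= \inf_{\sigma^1_{B_1},\, \sigma^2_{B_2}} \left\{ \widetilde{D}_\alpha(\xi_{A_1B_1} \Vert \xi_{A_1} \otimes \sigma^1_{B_1}) + \widetilde{D}_\alpha(\omega_{A_2B_2} \Vert \omega_{A_2} \otimes \sigma^2_{B_2}) \right\} \tag{11.2.72} \\
&= \widetilde{I}_\alpha(A_1;B_1)_\xi + \widetilde{I}_\alpha(A_2;B_2)_\omega. \tag{11.2.73}
\end{align}
So
\[ \widetilde{I}_\alpha(A_1A_2;B_1B_2)_{\xi\otimes\omega} \leq \widetilde{I}_\alpha(A_1;B_1)_\xi + \widetilde{I}_\alpha(A_2;B_2)_\omega. \tag{11.2.74} \]

To show the reverse inequality, we use the alternate expression (11.2.66) in Lemma 11.20 for the sandwiched Rényi mutual information of a bipartite state. In this expression, if we take a product purification $|\psi_1\rangle_{A_1B_1C_1} \otimes |\psi_2\rangle_{A_2B_2C_2}$ of $\xi_{A_1B_1} \otimes \omega_{A_2B_2}$ and restrict the optimization to product states, we obtain
\begin{align}
& \widetilde{I}_\alpha(A_1A_2;B_1B_2)_{\xi\otimes\omega} \notag \\
&= \frac{\alpha}{\alpha-1}\log_2 \sup_{\tau_{C_1C_2}} \left\| \operatorname{Tr}_{A_1A_2C_1C_2}\!\left[ \left( (\xi_{A_1} \otimes \omega_{A_2})^{\frac{1-\alpha}{\alpha}} \otimes \tau_{C_1C_2}^{\frac{\alpha-1}{\alpha}} \right) \left( |\psi_1\rangle\!\langle\psi_1|_{A_1B_1C_1} \otimes |\psi_2\rangle\!\langle\psi_2|_{A_2B_2C_2} \right) \right] \right\|_{\frac{\alpha}{2\alpha-1}} \tag{11.2.75} \\
&\geq \frac{\alpha}{\alpha-1}\log_2 \sup_{\tau_{C_1} \otimes \tau_{C_2}} \left\| \operatorname{Tr}_{A_1A_2C_1C_2}\!\left[ \left( \xi_{A_1}^{\frac{1-\alpha}{\alpha}} \otimes \omega_{A_2}^{\frac{1-\alpha}{\alpha}} \otimes \tau_{C_1}^{\frac{\alpha-1}{\alpha}} \otimes \tau_{C_2}^{\frac{\alpha-1}{\alpha}} \right) \left( |\psi_1\rangle\!\langle\psi_1|_{A_1B_1C_1} \otimes |\psi_2\rangle\!\langle\psi_2|_{A_2B_2C_2} \right) \right] \right\|_{\frac{\alpha}{2\alpha-1}} \tag{11.2.76} \\
&= \frac{\alpha}{\alpha-1}\log_2 \sup_{\tau_{C_1},\, \tau_{C_2}} \left\| \operatorname{Tr}_{A_1C_1}\!\left[ \left( \xi_{A_1}^{\frac{1-\alpha}{\alpha}} \otimes \tau_{C_1}^{\frac{\alpha-1}{\alpha}} \right) |\psi_1\rangle\!\langle\psi_1|_{A_1B_1C_1} \right] \otimes \operatorname{Tr}_{A_2C_2}\!\left[ \left( \omega_{A_2}^{\frac{1-\alpha}{\alpha}} \otimes \tau_{C_2}^{\frac{\alpha-1}{\alpha}} \right) |\psi_2\rangle\!\langle\psi_2|_{A_2B_2C_2} \right] \right\|_{\frac{\alpha}{2\alpha-1}} \tag{11.2.77} \\
&= \frac{\alpha}{\alpha-1}\log_2 \sup_{\tau_{C_1},\, \tau_{C_2}} \left\{ \left\| \operatorname{Tr}_{A_1C_1}\!\left[ \left( \xi_{A_1}^{\frac{1-\alpha}{\alpha}} \otimes \tau_{C_1}^{\frac{\alpha-1}{\alpha}} \right) |\psi_1\rangle\!\langle\psi_1|_{A_1B_1C_1} \right] \right\|_{\frac{\alpha}{2\alpha-1}} \times \left\| \operatorname{Tr}_{A_2C_2}\!\left[ \left( \omega_{A_2}^{\frac{1-\alpha}{\alpha}} \otimes \tau_{C_2}^{\frac{\alpha-1}{\alpha}} \right) |\psi_2\rangle\!\langle\psi_2|_{A_2B_2C_2} \right] \right\|_{\frac{\alpha}{2\alpha-1}} \right\} \tag{11.2.78} \\
&= \frac{\alpha}{\alpha-1}\log_2 \sup_{\tau_{C_1}} \left\| \operatorname{Tr}_{A_1C_1}\!\left[ \left( \xi_{A_1}^{\frac{1-\alpha}{\alpha}} \otimes \tau_{C_1}^{\frac{\alpha-1}{\alpha}} \right) |\psi_1\rangle\!\langle\psi_1|_{A_1B_1C_1} \right] \right\|_{\frac{\alpha}{2\alpha-1}} \notag \\
&\qquad + \frac{\alpha}{\alpha-1}\log_2 \sup_{\tau_{C_2}} \left\| \operatorname{Tr}_{A_2C_2}\!\left[ \left( \omega_{A_2}^{\frac{1-\alpha}{\alpha}} \otimes \tau_{C_2}^{\frac{\alpha-1}{\alpha}} \right) |\psi_2\rangle\!\langle\psi_2|_{A_2B_2C_2} \right] \right\|_{\frac{\alpha}{2\alpha-1}} \tag{11.2.79} \\
&= \widetilde{I}_\alpha(A_1;B_1)_\xi + \widetilde{I}_\alpha(A_2;B_2)_\omega. \tag{11.2.80}
\end{align}

We have thus shown (11.2.69), as required. ■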
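The step from (11.2.71) to (11.2.72) rests on the additivity of the sandwiched Rényi relative entropy on tensor-product arguments. The NumPy sketch below checks this numerically for random full-rank states, assuming the standard formula $\widetilde{D}_\alpha(\rho\Vert\sigma) = \frac{1}{\alpha-1}\log_2 \operatorname{Tr}\bigl[\bigl(\sigma^{\frac{1-\alpha}{2\alpha}} \rho\, \sigma^{\frac{1-\alpha}{2\alpha}}\bigr)^{\alpha}\bigr]$; the helper names are hypothetical, and the infimum over $\sigma_B$ in (11.2.65) is not performed here.

```python
import numpy as np


def hermitian_power(a, p):
    """Real power of a positive definite Hermitian matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * w ** p) @ v.conj().T


def sandwiched_renyi(rho, sigma, alpha):
    """Sandwiched Renyi relative entropy in bits, for full-rank sigma and alpha != 1."""
    s = hermitian_power(sigma, (1.0 - alpha) / (2.0 * alpha))
    inner = s @ rho @ s
    # Symmetrize to remove rounding asymmetry before taking eigenvalues.
    w = np.linalg.eigvalsh((inner + inner.conj().T) / 2)
    w = np.clip(w, 0.0, None)
    return float(np.log2(np.sum(w ** alpha)) / (alpha - 1.0))


def random_state(d, rng):
    """Random full-rank density matrix of dimension d."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


rng = np.random.default_rng(1)
alpha = 1.5
rho1, rho2 = random_state(2, rng), random_state(3, rng)
sig1, sig2 = random_state(2, rng), random_state(3, rng)

joint = sandwiched_renyi(np.kron(rho1, rho2), np.kron(sig1, sig2), alpha)
parts = sandwiched_renyi(rho1, sig1, alpha) + sandwiched_renyi(rho2, sig2, alpha)
print(joint, parts)
```

The two printed numbers should agree up to floating-point error. The reverse inequality in the proof cannot be checked this way, since it relies on the dual expression (11.2.66) rather than on a product ansatz.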

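Theorem 11.22 Additivity of Sandwiched Rényi Mutual Information of a Channel
For every two channels $\mathcal{N}_1$ and $\mathcal{N}_2$, and for all $\alpha > 1$,
\[ \widetilde{I}_\alpha(\mathcal{N}_1 \otimes \mathcal{N}_2) = \widetilde{I}_\alpha(\mathcal{N}_1) + \widetilde{I}_\alpha(\mathcal{N}_2). \tag{11.2.81} \]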

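Proof: Recall that
\[ \widetilde{I}_\alpha(\mathcal{N}) = \sup_{\psi_{RA}} \widetilde{I}_\alpha(R;B)_\omega, \tag{11.2.82} \]
where $\omega_{RB} := \mathcal{N}_{A\to B}(\psi_{RA})$, and the supremum is taken over every pure state $\psi_{RA}$, with $R$ having the same dimension as $A$. The superadditivity of the sandwiched Rényi mutual information of a channel, namely,
\[ \widetilde{I}_\alpha(\mathcal{N}_1 \otimes \mathcal{N}_2) \geq \widetilde{I}_\alpha(\mathcal{N}_1) + \widetilde{I}_\alpha(\mathcal{N}_2), \tag{11.2.83} \]
follows immediately by restricting the optimization in the definition (11.2.82) to product states and using the additivity of the sandwiched Rényi mutual information of bipartite states, as proven in Proposition 11.21. An explicit proof of this statement goes along the same lines as (11.2.48)–(11.2.52), with $I$ replaced by $\widetilde{I}_\alpha$.
To prove the reverse inequality, namely, the subadditivity of the sandwiched Rényi mutual information of a channel, we use the expression in (11.2.67). By restricting the infimum in (11.2.67) to product states, we find that
\begin{align}
\widetilde{I}_\alpha(\mathcal{N}_1 \otimes \mathcal{N}_2)
&= \frac{\alpha}{\alpha-1} \inf_{\sigma_{B_1B_2}} \log_2 \left\| \mathcal{S}^{(\alpha)}_{\sigma_{B_1B_2}} \circ (\mathcal{N}_1 \otimes \mathcal{N}_2) \right\|_{\mathrm{CB},\,1\to\alpha} \tag{11.2.84} \\
&\leq \frac{\alpha}{\alpha-1} \inf_{\sigma^1_{B_1} \otimes \sigma^2_{B_2}} \log_2 \left\| \mathcal{S}^{(\alpha)}_{\sigma^1_{B_1} \otimes \sigma^2_{B_2}} \circ (\mathcal{N}_1 \otimes \mathcal{N}_2) \right\|_{\mathrm{CB},\,1\to\alpha} \tag{11.2.85} \\
&= \frac{\alpha}{\alpha-1} \inf_{\sigma^1_{B_1},\, \sigma^2_{B_2}} \log_2 \left\| \left( \mathcal{S}^{(\alpha)}_{\sigma^1_{B_1}} \circ \mathcal{N}_1 \right) \otimes \left( \mathcal{S}^{(\alpha)}_{\sigma^2_{B_2}} \circ \mathcal{N}_2 \right) \right\|_{\mathrm{CB},\,1\to\alpha}, \tag{11.2.86}
\end{align}
where the last equality follows because
\[ \mathcal{S}^{(\alpha)}_{\sigma^1_{B_1} \otimes \sigma^2_{B_2}} = \left( \mathcal{S}^{(\alpha)}_{\sigma^1_{B_1}} \otimes \mathrm{id}_{B_2} \right) \circ \left( \mathrm{id}_{B_1} \otimes \mathcal{S}^{(\alpha)}_{\sigma^2_{B_2}} \right). \tag{11.2.87} \]
Now, consider that the norm $\|\cdot\|_{\mathrm{CB},\,1\to\alpha}$ is multiplicative with respect to tensor products of completely positive maps, i.e.,
\[ \left\| \mathcal{M}_1 \otimes \mathcal{M}_2 \right\|_{\mathrm{CB},\,1\to\alpha} = \left\| \mathcal{M}_1 \right\|_{\mathrm{CB},\,1\to\alpha} \left\| \mathcal{M}_2 \right\|_{\mathrm{CB},\,1\to\alpha} \tag{11.2.88} \]
for every two completely positive maps $\mathcal{M}_1$, $\mathcal{M}_2$ and all $\alpha > 1$ (see Appendix 11.F for a proof). Using this, we find that
\begin{align}
\widetilde{I}_\alpha(\mathcal{N}_1 \otimes \mathcal{N}_2)
&\leq \frac{\alpha}{\alpha-1} \inf_{\sigma^1_{B_1},\, \sigma^2_{B_2}} \log_2 \left\| \left( \mathcal{S}^{(\alpha)}_{\sigma^1_{B_1}} \circ \mathcal{N}_1 \right) \otimes \left( \mathcal{S}^{(\alpha)}_{\sigma^2_{B_2}} \circ \mathcal{N}_2 \right) \right\|_{\mathrm{CB},\,1\to\alpha} \tag{11.2.89} \\
&= \frac{\alpha}{\alpha-1} \inf_{\sigma^1_{B_1}} \log_2 \left\| \mathcal{S}^{(\alpha)}_{\sigma^1_{B_1}} \circ \mathcal{N}_1 \right\|_{\mathrm{CB},\,1\to\alpha} + \frac{\alpha}{\alpha-1} \inf_{\sigma^2_{B_2}} \log_2 \left\| \mathcal{S}^{(\alpha)}_{\sigma^2_{B_2}} \circ \mathcal{N}_2 \right\|_{\mathrm{CB},\,1\to\alpha} \tag{11.2.90} \\
&= \widetilde{I}_\alpha(\mathcal{N}_1) + \widetilde{I}_\alpha(\mathcal{N}_2). \tag{11.2.91}
\end{align}

We have thus shown that $\widetilde{I}_\alpha(\mathcal{N}_1 \otimes \mathcal{N}_2) \leq \widetilde{I}_\alpha(\mathcal{N}_1) + \widetilde{I}_\alpha(\mathcal{N}_2)$, and by combining this with (11.2.83), we conclude (11.2.81). ■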

Note that the additivity of the mutual information of a channel, i.e., Theorem 11.19, follows straightforwardly from the theorem above by taking the limit $\alpha \to 1$ (see Appendix 11.B for a proof).
Using the additivity of the sandwiched Rényi mutual information of a channel (Theorem 11.22), the inequality in (11.2.38) can be written as
\[ \frac{1}{n}\log_2 |\mathcal{M}| \leq \widetilde{I}_\alpha(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \tag{11.2.92} \]
for all $\alpha > 1$. This implies that
\[ C_{\mathrm{EA}}^{n,\varepsilon}(\mathcal{N}) \leq \widetilde{I}_\alpha(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \tag{11.2.93} \]
for all $n \geq 1$, $\varepsilon \in (0,1)$, and $\alpha > 1$.

11.2.3 Proof of the Strong Converse

With the inequality in (11.2.93) in hand, we can now prove that the mutual information $I(\mathcal{N})$ is a strong converse rate for entanglement-assisted classical communication over $\mathcal{N}$ and establish that $\widetilde{C}_{\mathrm{EA}}(\mathcal{N}) = I(\mathcal{N})$.

Proof of the Strong Converse Part of Theorem 11.16

Fix $\varepsilon \in [0,1)$ and $\delta > 0$. Let $\delta_1, \delta_2 > 0$ be such that
\[ \delta > \delta_1 + \delta_2 =: \delta'. \tag{11.2.94} \]
Set $\alpha \in (1,\infty)$ such that
\[ \delta_1 \geq \widetilde{I}_\alpha(\mathcal{N}) - I(\mathcal{N}), \tag{11.2.95} \]
which is possible since $\widetilde{I}_\alpha(\mathcal{N})$ is monotonically increasing with $\alpha$ (following from Proposition 7.31), and since $\lim_{\alpha\to 1^{+}} \widetilde{I}_\alpha(\mathcal{N}) = I(\mathcal{N})$ (see Appendix 11.B for a proof). With this value of $\alpha$, take $n$ large enough so that
\[ \delta_2 \geq \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{11.2.96} \]

Now, with the values of $n$ and $\varepsilon$ as above, every $(n, |\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocol satisfies (11.2.92). Rearranging the right-hand side of this inequality, and using (11.2.94)–(11.2.96), we obtain
\begin{align}
\frac{\log_2 |\mathcal{M}|}{n} &\leq I(\mathcal{N}) + \widetilde{I}_\alpha(\mathcal{N}) - I(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \tag{11.2.97} \\
&\leq I(\mathcal{N}) + \delta_1 + \delta_2 \tag{11.2.98} \\
&= I(\mathcal{N}) + \delta' \tag{11.2.99} \\
&< I(\mathcal{N}) + \delta. \tag{11.2.100}
\end{align}
So we have that $I(\mathcal{N}) + \delta > \frac{\log_2 |\mathcal{M}|}{n}$ for all $(n, |\mathcal{M}|, \varepsilon)$ entanglement-assisted classical communication protocols and sufficiently large $n$. Due to this strict inequality, it follows that there cannot exist an $(n, 2^{n(I(\mathcal{N})+\delta)}, \varepsilon)$ entanglement-assisted classical communication protocol for all sufficiently large $n$ such that (11.2.96) holds, for if it did there would exist a set $\mathcal{M}$ such that $|\mathcal{M}| = 2^{n(I(\mathcal{N})+\delta)}$, which we have just seen is not possible. Since $\varepsilon$ and $\delta$ are arbitrary, we conclude that for all $\varepsilon \in [0,1)$, $\delta > 0$, and sufficiently large $n$, there does not exist an $(n, 2^{n(I(\mathcal{N})+\delta)}, \varepsilon)$ entanglement-assisted classical communication protocol. This means that $I(\mathcal{N})$ is a strong converse rate, and thus that $\widetilde{C}_{\mathrm{EA}}(\mathcal{N}) \leq I(\mathcal{N})$. See Appendix 11.G for a different way of understanding the strong converse.

11.2.4 Proof of the Weak Converse

We now conclude Section 11.2 by providing an independent proof of the fact that
the mutual information 𝐼 (N) of a channel N is a weak converse rate.3

Theorem 11.23 Weak Converse for Entanglement-Assisted Classical

Communication
For every quantum channel N, the mutual information 𝐼 (N) is a weak converse
rate for entanglement-assisted classical communication over N.

Proof: Suppose that 𝑅 is an achievable rate. Then, by definition, for all 𝜀 ∈ (0, 1],
𝛿 > 0, and sufficiently large 𝑛, there exists an (𝑛, 2𝑛(𝑅−𝛿) , 𝜀) entanglement-assisted
classical communication protocol over N. For all such protocols, the inequality
(11.2.61) holds by Corollary 11.18 and the additivity of the mutual information of
a channel, i.e.,
1 1
𝑅−𝛿 ≤ 𝐼 (N) + ℎ2 (𝜀) . (11.2.101)
1−𝜀 𝑛
Since this bound holds for all sufficiently large 𝑛, it holds in the limit 𝑛 → ∞, so
that
1
𝑅≤ 𝐼 (N) + 𝛿. (11.2.102)
1−𝜀
3 Recallthat any strong converse rate is also a weak converse rate, so that by the proof of the
strong converse part of Theorem 11.16 we can immediately conclude that 𝐼 (N) is a weak converse
rate.

677
Chapter 11: Entanglement-Assisted Classical Communication

Then, since this inequality holds for all 𝜀 ∈ (0, 1] and 𝛿 > 0, we obtain
\[ R \leq \lim_{\varepsilon,\delta\to 0} \left( \frac{1}{1-\varepsilon} I(\mathcal{N}) + \delta \right) = I(\mathcal{N}). \tag{11.2.103} \]
We have thus shown that if 𝑅 is an achievable rate, then 𝑅 ≤ 𝐼 (N). The
contrapositive of this statement is that if 𝑅 > 𝐼 (N), then 𝑅 is not an achievable rate.
By definition, therefore, 𝐼 (N) is a weak converse rate. ■
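To see how the bound used in the proof behaves numerically, the following minimal sketch evaluates the right-hand side of (11.2.101), read here as $(I(\mathcal{N}) + h_2(\varepsilon)/n)/(1-\varepsilon)$. The value `I_N = 1.0` is a hypothetical mutual information chosen only for illustration, and the function names are not from the text.

```python
import math

def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def weak_converse_rate_bound(mutual_info, eps, n):
    """Right-hand side of (11.2.101): (I(N) + h2(eps)/n) / (1 - eps)."""
    return (mutual_info + h2(eps) / n) / (1.0 - eps)

I_N = 1.0  # hypothetical value of I(N) in bits (an assumption for illustration)
for eps in (0.1, 0.01, 0.001):
    bounds = [weak_converse_rate_bound(I_N, eps, n) for n in (10, 1000, 100000)]
    print(eps, [round(b, 5) for b in bounds])
```

For fixed 𝜀 the bound tends to 𝐼 (N)/(1 − 𝜀) as 𝑛 grows, and it only approaches 𝐼 (N) itself once 𝜀 is also taken to zero, which is the order of limits used in the proof.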

11.3 Examples
In this section, we determine the entanglement-assisted classical capacity of some
of the channels that we introduced in Chapter 4. Recall that Theorem 11.16 states
that the entanglement-assisted classical capacity 𝐶EA (N) is given by the mutual
information of the channel N, i.e.,

\[ C_{\mathrm{EA}}(\mathcal{N}) = I(\mathcal{N}) = \sup_{\psi_{RA}} I(R;B)_{\omega}, \tag{11.3.1} \]
where $\omega_{RB} = \mathcal{N}_{A\to B}(\psi_{RA})$ and the optimization is over every pure state $\psi_{RA}$, with
the dimension of 𝑅 the same as the dimension of the input system 𝐴 of the channel.
The mutual information $I(R;B)_{\omega}$ can be calculated using either the quantum relative
entropy or the quantum entropy via
\begin{align}
I(R;B)_{\omega} &= D(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \psi_R \otimes \mathcal{N}_{A\to B}(\psi_A)) \tag{11.3.2} \\
&= H(R)_{\omega} + H(B)_{\omega} - H(RB)_{\omega}. \tag{11.3.3}
\end{align}
In what follows, we consider the entanglement-assisted classical capacity of
channels that are covariant with respect to a group 𝐺, and then we provide an explicit
expression for the entanglement-assisted classical capacity of the depolarizing,
erasure, and generalized amplitude damping channels (see Section 4.5). See
Figure 11.6 for a plot of these capacities.

11.3.1 Covariant Channels

Let us start with covariant channels. Recall from Definition 4.18 that a channel N is
covariant with respect to a group 𝐺 if there exist projective unitary representations
$\{U_A^g\}_{g\in G}$ and $\{V_B^g\}_{g\in G}$ such that
\[ \mathcal{N}(U_A^g \rho (U_A^g)^\dagger) = V_B^g \mathcal{N}(\rho) (V_B^g)^\dagger \tag{11.3.4} \]
for every state 𝜌 and all 𝑔 ∈ 𝐺.

Figure 11.6: The entanglement-assisted classical capacity $C_{\mathrm{EA}}$ (in bits, plotted against 𝑝) of the
depolarizing channel $\mathcal{D}_p$ (expressed in (11.3.19)), the erasure channel $\mathcal{E}_p$ (expressed in
(11.3.35)), and the amplitude damping channel $\mathcal{A}_p$ (expressed in (11.3.48)), all
of which are defined for the parameter 𝑝 ∈ [0, 1].

Suppose that a channel N is covariant with respect to a group 𝐺 such that the
representation $\{U_A^g\}_{g\in G}$ of 𝐺 acting on the input space of N satisfies
\[ \frac{1}{|G|} \sum_{g\in G} U_A^g \rho_A (U_A^g)^\dagger = \operatorname{Tr}[\rho_A] \frac{\mathbb{1}}{d} \tag{11.3.5} \]
for all $\rho_A$, where 𝑑 is the dimension of the input space of the channel N. Such
channels are called irreducibly covariant.
Let us now recall Proposition 7.86, which tells us that the generalized mutual
information for every covariant channel is given as follows:
\[ \boldsymbol{I}(\mathcal{N}) = \sup_{\phi_{RA}} \{ \boldsymbol{I}(R;B)_{\omega} : \phi_A = \mathcal{T}_G(\phi_A) \}, \tag{11.3.6} \]
where $\omega_{RB} = \mathcal{N}_{A\to B}(\phi_{RA})$ and $\mathcal{T}_G(\cdot) := \frac{1}{|G|} \sum_{g\in G} U^g (\cdot) U^{g\dagger}$. In other words, for
covariant channels, it suffices to optimize over pure states $\phi_{RA}$ for which the reduced
state $\phi_A$ is invariant under the channel $\mathcal{T}_G$. For irreducibly covariant channels, the
expression in (11.3.6) simplifies to
\[ \boldsymbol{I}(\mathcal{N}) = \boldsymbol{I}(A;B)_{\rho^{\mathcal{N}}}, \tag{11.3.7} \]
where $\rho^{\mathcal{N}}_{AB} = \mathcal{N}_{A'\to B}(\Phi_{AA'})$ is the Choi state of N.

Using (11.3.6), we immediately obtain the following theorem.

Theorem 11.24 Entanglement-Assisted Classical Capacity of Covariant Channels
If a channel $\mathcal{N}_{A\to B}$ is covariant with respect to a group 𝐺 as in (11.3.4), then
its entanglement-assisted classical capacity is given by
\[ C_{\mathrm{EA}}(\mathcal{N}) = \sup_{\phi_{RA}} \{ I(R;B)_{\omega} : \phi_A = \mathcal{T}_G(\phi_A) \}, \tag{11.3.8} \]
where $\omega_{RB} = \mathcal{N}_{A\to B}(\phi_{RA})$. If the representation $\{U_A^g\}_{g\in G}$ is irreducible,
then the entanglement-assisted classical capacity of N is given by the mutual
information of its Choi state $\rho^{\mathcal{N}}_{AB}$, i.e.,
\[ C_{\mathrm{EA}}(\mathcal{N}) = I(A;B)_{\rho^{\mathcal{N}}}. \tag{11.3.9} \]

11.3.1.1 Depolarizing Channel

In Section 4.5, specifically in (4.5.31), we defined the depolarizing channel on
qubits as
\[ \mathcal{D}_p(\rho) := (1-p)\rho + \frac{p}{3}(X\rho X + Y\rho Y + Z\rho Z). \tag{11.3.10} \]
From this, it follows that the channel is covariant with respect to the Pauli operators,
meaning that
\begin{align}
\mathcal{D}_p(X\rho X) &= X\mathcal{D}_p(\rho)X, \tag{11.3.11} \\
\mathcal{D}_p(Y\rho Y) &= Y\mathcal{D}_p(\rho)Y, \tag{11.3.12} \\
\mathcal{D}_p(Z\rho Z) &= Z\mathcal{D}_p(\rho)Z. \tag{11.3.13}
\end{align}
Furthermore, we have the identity in (4.5.32), which asserts that for every state 𝜌,
\[ \frac{1}{4}\rho + \frac{1}{4}X\rho X + \frac{1}{4}Y\rho Y + \frac{1}{4}Z\rho Z = \frac{\mathbb{1}}{2}. \tag{11.3.14} \]


This means that the operators {1, 𝑋, 𝑌 , 𝑍 } satisfy the property in (11.3.5).4 By
Theorem 11.24, we thus have that the entanglement-assisted classical capacity of
the depolarizing channel is simply the mutual information of its Choi state.
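The two ingredients used above, Pauli covariance of the depolarizing channel and the twirling identity (11.3.14), can be checked directly on a small example. The sketch below uses plain nested lists for 2 × 2 matrices; the helper names (`conjugate_by`, `weighted_sum`, `depolarize`) and the test state `rho` are illustrative choices, not notation from the text.

```python
# Pauli operators as 2x2 nested lists (plain Python, no external libraries).
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def conjugate_by(pauli, rho):
    """Return P rho P; every Pauli is Hermitian and unitary, so P^dagger = P."""
    return matmul(matmul(pauli, rho), pauli)

def weighted_sum(terms):
    """Sum of weight * matrix over (weight, matrix) pairs."""
    return [[sum(w * m[i][j] for w, m in terms) for j in range(2)]
            for i in range(2)]

def depolarize(rho, p):
    """Qubit depolarizing channel as written in (11.3.10)."""
    return weighted_sum([(1 - p, rho)] +
                        [(p / 3, conjugate_by(P, rho)) for P in (X, Y, Z)])

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

rho = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]  # an arbitrary example state

# Twirling over {1, X, Y, Z} with uniform weights, as in (11.3.14).
twirled = weighted_sum([(0.25, conjugate_by(P, rho)) for P in (I2, X, Y, Z)])
print(twirled)

# Covariance as in (11.3.11)-(11.3.13), checked at p = 0.4.
print(all(close(depolarize(conjugate_by(P, rho), 0.4),
                conjugate_by(P, depolarize(rho, 0.4))) for P in (X, Y, Z)))
```

A single test state is of course not a proof; the check is only meant to make the two identities concrete.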
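It is straightforward to see that the Choi state $\rho^{\mathcal{D}_p}_{AB}$ of the depolarizing channel is
\[ \rho^{\mathcal{D}_p}_{AB} = (1-p)|\Phi^+\rangle\langle\Phi^+|_{AB} + \frac{p}{3}\left( |\Psi^+\rangle\langle\Psi^+|_{AB} + |\Psi^-\rangle\langle\Psi^-|_{AB} + |\Phi^-\rangle\langle\Phi^-|_{AB} \right). \tag{11.3.15} \]
Since $H(A)_{\rho^{\mathcal{D}_p}} = H(B)_{\rho^{\mathcal{D}_p}} = \log_2(2) = 1$ and
\[ H(AB)_{\rho^{\mathcal{D}_p}} = -(1-p)\log_2(1-p) - p\log_2\frac{p}{3}, \tag{11.3.16} \]
we find that
\begin{align}
C_{\mathrm{EA}}(\mathcal{D}_p) = I(A;B)_{\rho^{\mathcal{D}_p}} &= H(A)_{\rho^{\mathcal{D}_p}} + H(B)_{\rho^{\mathcal{D}_p}} - H(AB)_{\rho^{\mathcal{D}_p}} \tag{11.3.17} \\
&= 2 + (1-p)\log_2(1-p) + p\log_2\frac{p}{3} \tag{11.3.18} \\
&= 2 - h_2(p) - p\log_2(3) \tag{11.3.19}
\end{align}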
for all 𝑝 ∈ [0, 1]. See Figure 11.6 above for a plot of the capacity.
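The passage from (11.3.18) to (11.3.19) uses only $\log_2\frac{p}{3} = \log_2 p - \log_2 3$ and the definition of the binary entropy. Written out as a short worked step:

```latex
\begin{align*}
2 + (1-p)\log_2(1-p) + p\log_2\frac{p}{3}
  &= 2 + (1-p)\log_2(1-p) + p\log_2 p - p\log_2 3 \\
  &= 2 - h_2(p) - p\log_2 3,
  \qquad h_2(p) := -p\log_2 p - (1-p)\log_2(1-p).
\end{align*}
```

As a sanity check, at 𝑝 = 3/4 the Choi state in (11.3.15) is maximally mixed, and indeed $h_2(3/4) = 2 - \frac{3}{4}\log_2 3$, so the expression evaluates to zero.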
Let us also briefly analyze the lower and upper bounds obtained in Corollar-
ies 11.17 and 11.18, respectively. Specifically, let us consider the following bounds
on the maximal error probability that results from these bounds, i.e.,

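\begin{align}
\varepsilon &\leq 2 \cdot 2^{-n(1-\alpha)\left( \overline{I}_\alpha(\mathcal{D}_p) - R - \frac{3}{n} \right)}, \tag{11.3.20} \\
\varepsilon &\geq 1 - 2^{-n\left(\frac{\alpha-1}{\alpha}\right)\left( R - \widetilde{I}_\alpha(\mathcal{D}_p) \right)}, \tag{11.3.21}
\end{align}
which are rearrangements of (11.2.22) and (11.2.38) and are discussed further in
Appendices 11.C and 11.G, respectively. Now, by (11.3.7), we have
\[ \widetilde{I}_\alpha(\mathcal{D}_p) = \widetilde{I}_\alpha(R;B)_{\rho^{\mathcal{D}_p}} \leq \widetilde{\overline{I}}_\alpha(R;B)_{\rho^{\mathcal{D}_p}}, \tag{11.3.22} \]
where
\[ \widetilde{\overline{I}}_\alpha(A;B)_\rho := \widetilde{D}_\alpha(\rho_{AB} \Vert \rho_A \otimes \rho_B). \tag{11.3.23} \]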
4 The operators {1, 𝑋, 𝑌 , 𝑍 } form a projective unitary representation of the group Z2 × Z2 =
{(0, 0), (0, 1), (1, 0), (1, 1)}, where Z2 is the group consisting of the set {0, 1} with addition modulo
two. Specifically, we have 𝑈 (0,0) = 1, 𝑈 (0,1) = 𝑋, 𝑈 (1,0) = 𝑍, and 𝑈 (1,1) = 𝑌 .

[Figure 11.7 appears here: two panels plotting the error 𝜀 against the rate 𝑅, with curves labeled 𝑛 = 10^5, 𝑛 = 2 · 10^5, 𝑛 = 10^6, and 𝑛 = 10^7.]

Figure 11.7: Plot of the error bounds in (11.3.24) and (11.3.25) for the
depolarizing channel D 𝑝 with 𝑝 = 0.4. By increasing the number 𝑛 of channel
uses, it is possible to communicate at rates closer to the capacity (indicated by
the black vertical line) with vanishing error probability. Furthermore, for every
rate above the capacity, as 𝑛 increases, the error probability approaches one at
an exponential rate, consistent with the fact that the mutual information 𝐼 (D 𝑝 )
is a strong converse rate.

For simplicity, let us use the quantity in (11.3.22), which does not involve an
optimization over states $\sigma_B$, to place a lower bound on 𝜀, so that
\[ \varepsilon \geq 1 - 2^{-n\left(\frac{\alpha-1}{\alpha}\right)\left( R - \widetilde{\overline{I}}_\alpha(R;B)_\omega \right)}, \tag{11.3.24} \]
where $\omega_{RB} = \rho^{\mathcal{D}_p}_{RB}$ is the Choi state of $\mathcal{D}_p$. Similarly, for simplicity, let us take the
quantity $\overline{I}_\alpha(\mathcal{N})$, which by definition involves an optimization over all pure states
$\psi_{RA}$, and let $\psi_{RA}$ be the maximally entangled state $\Phi^+_{RA}$. So we take the upper
bound on the error probability to be
\[ \varepsilon \leq 2 \cdot 2^{-n(1-\alpha)\left( \overline{I}_\alpha(R;B)_\omega - R - \frac{3}{n} \right)}, \tag{11.3.25} \]
where $\omega_{RB} = \rho^{\mathcal{D}_p}_{RB}$ is the Choi state of $\mathcal{D}_p$. Then, for 𝑝 = 0.4, we plot in
Figure 11.7 the bounds in (11.3.24) and (11.3.25) (with 𝛼 = 1.0001 and 𝛼 = 0.9999,
respectively) to obtain plots that are analogous to the generic plot in Figure 11.8 in
Appendix 11.G. As portrayed in Figure 11.8, we indeed see that, as the number
𝑛 of channel uses increases, the capacity $C_{\mathrm{EA}}(\mathcal{D}_p)$ becomes a clearer dividing
point between reliable communication (with nearly vanishing error probability)
and unreliable communication (with error probability approaching one at an
exponential rate).
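The two bounds can be evaluated in a few lines. The sketch below relies on one simplification that should be read as an assumption of this example: for the Choi state of $\mathcal{D}_p$ both marginals are maximally mixed, so $\rho_A \otimes \rho_B = \mathbb{1}/4$ commutes with $\rho_{AB}$, and the Petz–Rényi and sandwiched Rényi quantities with respect to the product of the marginals coincide and depend only on the eigenvalues in (11.3.15). Clipping the bounds to [0, 1] and the function names are additions made here for illustration.

```python
import math

def renyi_mutual_info_choi(p, alpha):
    """D_alpha(rho_AB || rho_A (x) rho_B) in bits for the Choi state of D_p.

    rho_A (x) rho_B = identity/4 commutes with rho_AB, so only the
    eigenvalues (1 - p, p/3, p/3, p/3) of rho_AB are needed.
    """
    eigs = [1 - p, p / 3, p / 3, p / 3]
    s = sum(lam ** alpha * 0.25 ** (1 - alpha) for lam in eigs if lam > 0)
    return math.log2(s) / (alpha - 1)

def error_upper_bound(n, rate, p, alpha=0.9999):
    """Bound of the form (11.3.25), clipped to [0, 1]."""
    exponent = -n * (1 - alpha) * (renyi_mutual_info_choi(p, alpha) - rate - 3 / n)
    return 1.0 if exponent >= 0 else min(1.0, 2.0 * 2.0 ** exponent)

def error_lower_bound(n, rate, p, alpha=1.0001):
    """Bound of the form (11.3.24), clipped to [0, 1]."""
    exponent = -n * ((alpha - 1) / alpha) * (rate - renyi_mutual_info_choi(p, alpha))
    return 0.0 if exponent >= 0 else 1.0 - 2.0 ** exponent

p = 0.4
for n in (10**5, 10**6, 10**7):
    row = [(r / 10, error_upper_bound(n, r / 10, p), error_lower_bound(n, r / 10, p))
           for r in range(1, 10)]
    print(n, [(r, round(u, 3), round(l, 3)) for r, u, l in row])
```

With 𝑝 = 0.4 the capacity from (11.3.19) is about 0.395 bits, and for large 𝑛 the printed bounds switch from near 0 to near 1 around that rate.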

Let us consider the qudit depolarizing channel, as defined in (4.5.36) in terms
of the Heisenberg–Weyl operators $\{W_{z,x}\}_{z,x}$ from (3.2.48). From the definition
in (4.5.36) and the properties stated in (3.2.51)–(3.2.53), it follows that the qudit
depolarizing channel is covariant with respect to the Heisenberg–Weyl operators.
Furthermore, the Heisenberg–Weyl operators form an irreducible projective unitary
representation of the group $\mathbb{Z}_d \times \mathbb{Z}_d$, and thus satisfy
\[ \frac{1}{d^2} \sum_{z,x=0}^{d-1} W_{z,x}\,\rho\,W_{z,x}^\dagger = \operatorname{Tr}[\rho]\frac{\mathbb{1}}{d} \tag{11.3.26} \]
for all 𝜌. Therefore, by Theorem 11.24, we have that
\[ C_{\mathrm{EA}}(\mathcal{D}_p^{(d)}) = I(A;B)_{\rho^{\mathcal{D}_p^{(d)}}}. \tag{11.3.27} \]
By calculations analogous to those above, we obtain
\[ C_{\mathrm{EA}}(\mathcal{D}_p^{(d)}) = 2\log_2 d - h_2(p) - p\log_2(d^2-1). \tag{11.3.28} \]
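The closed form (11.3.28) can be compared with an entropy computed from a spectrum. The sketch assumes, consistently with (11.3.28) but not stated explicitly in this section, that the Choi state of the qudit depolarizing channel has one eigenvalue 1 − 𝑝 and 𝑑² − 1 eigenvalues 𝑝/(𝑑² − 1), with maximally mixed marginals; the function names are illustrative.

```python
import math

def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def cea_qudit_depolarizing(p, d):
    """Closed form (11.3.28)."""
    return 2 * math.log2(d) - h2(p) - p * math.log2(d * d - 1)

def cea_from_choi_spectrum(p, d):
    """H(A) + H(B) - H(AB) with H(A) = H(B) = log2(d) and the assumed spectrum."""
    eigs = [1 - p] + [p / (d * d - 1)] * (d * d - 1)
    joint_entropy = -sum(q * math.log2(q) for q in eigs if q > 0)
    return 2 * math.log2(d) - joint_entropy

for d in (2, 3, 4):
    print(d, [round(cea_qudit_depolarizing(k / 4, d), 4) for k in range(5)])
```

For 𝑑 = 2 this reduces to (11.3.19), and the capacity vanishes at 𝑝 = 1 − 1/𝑑², where the assumed spectrum is uniform.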

11.3.1.2 Erasure Channel

In Section 4.5, specifically in (4.5.18), we defined the qubit erasure channel as

\[ \mathcal{E}_p(\rho) := (1-p)\rho + p\operatorname{Tr}[\rho]\,|e\rangle\langle e| \tag{11.3.29} \]

for 𝑝 ∈ [0, 1], where |𝑒⟩ is a state that is orthogonal to all states in the input
qubit system. Recall that if we let the output space simply be a qutrit system with
the orthonormal basis {|0⟩, |1⟩, |2⟩}, then the input qubit space can be naturally
embedded into the subspace of the qutrit system spanned by |0⟩ and |1⟩, so that we
can let the erasure state simply be |2⟩. This means that we can write the action of
the erasure channel as

E 𝑝 (𝜌) = (1 − 𝑝) 𝜌 + 𝑝|2⟩⟨2|. (11.3.30)

Observe that, like the depolarizing channel, the erasure channel is covariant
with respect to the group Z2 × Z2 , with the representation {1, 𝑋, 𝑌 , 𝑍 } on the input
qubit space and the representation {1 ⊕ |2⟩⟨2|, 𝑋 ⊕ |2⟩⟨2|, 𝑌 ⊕ |2⟩⟨2|, 𝑍 ⊕ |2⟩⟨2|}
on the output space. Then, by Theorem 11.24, the entanglement-assisted classical
capacity of the erasure channel is equal to the mutual information of its Choi state.

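The Choi state $\rho^{\mathcal{E}_p}_{AB}$ of the erasure channel is
\[ \rho^{\mathcal{E}_p}_{AB} = (1-p)|\Phi^+\rangle\langle\Phi^+|_{AB} + p\,\frac{\mathbb{1}_A}{2} \otimes |2\rangle\langle 2|. \tag{11.3.31} \]
It is straightforward to verify that
\begin{align}
H(A)_{\rho^{\mathcal{E}_p}} &= 1, \tag{11.3.32} \\
H(B)_{\rho^{\mathcal{E}_p}} &= -(1-p)\log_2\frac{1-p}{2} - p\log_2 p, \tag{11.3.33} \\
H(AB)_{\rho^{\mathcal{E}_p}} &= -(1-p)\log_2(1-p) - p\log_2\frac{p}{2}, \tag{11.3.34}
\end{align}
so that
\[ C_{\mathrm{EA}}(\mathcal{E}_p) = I(A;B)_{\rho^{\mathcal{E}_p}} = 2(1-p). \tag{11.3.35} \]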
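For completeness, here is the cancellation behind (11.3.35), obtained by combining (11.3.32)–(11.3.34) and grouping the logarithms with equal prefactors:

```latex
\begin{align*}
H(A) + H(B) - H(AB)
  &= 1 - (1-p)\log_2\frac{1-p}{2} - p\log_2 p
       + (1-p)\log_2(1-p) + p\log_2\frac{p}{2} \\
  &= 1 + (1-p)\left[\log_2(1-p) - \log_2\frac{1-p}{2}\right]
       + p\left[\log_2\frac{p}{2} - \log_2 p\right] \\
  &= 1 + (1-p) - p \\
  &= 2(1-p).
\end{align*}
```

Each bracket is a difference of logarithms whose arguments differ by a factor of two, so it contributes exactly +1 or −1.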

In general, as discussed in Section 4.5.2, we can consider an erasure channel $\mathcal{E}_p^{(d)}$
acting on a qudit system with dimension 𝑑 and orthonormal basis {|1⟩, . . . , |𝑑⟩},
so that the output of the erasure channel is a state on a (𝑑 + 1)-dimensional system,
i.e.,
\[ \mathcal{E}_p^{(d)}(\rho) = (1-p)\rho + p\operatorname{Tr}[\rho]\,|d+1\rangle\langle d+1|. \tag{11.3.36} \]
Then, it is straightforward to see that the qudit erasure channel is irreducibly
covariant with respect to the group $\mathbb{Z}_d \times \mathbb{Z}_d$, with the corresponding representation
on the input space being $\{W_{z,x} : 0 \leq z, x \leq d-1\}$ and the representation on
the output space being $\{W_{z,x} \oplus |d+1\rangle\langle d+1| : 0 \leq z, x \leq d-1\}$. Therefore, by
reasoning analogous to the above, we obtain
\[ C_{\mathrm{EA}}(\mathcal{E}_p^{(d)}) = 2(1-p)\log_2 d. \tag{11.3.37} \]

11.3.2 Generalized Amplitude Damping Channel

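In (4.5.10), we defined the generalized amplitude damping channel $\mathcal{A}_{\gamma,N}$ as the
channel with the four Kraus operators in (4.5.11) and (4.5.12), i.e.,
\[ \mathcal{A}_{\gamma,N}(\rho) = A_1\rho A_1^\dagger + A_2\rho A_2^\dagger + A_3\rho A_3^\dagger + A_4\rho A_4^\dagger, \tag{11.3.38} \]
where
\begin{align}
A_1 &= \sqrt{1-N}\begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-\gamma} \end{pmatrix}, &
A_2 &= \sqrt{1-N}\begin{pmatrix} 0 & \sqrt{\gamma} \\ 0 & 0 \end{pmatrix}, \tag{11.3.39} \\
A_3 &= \sqrt{N}\begin{pmatrix} \sqrt{1-\gamma} & 0 \\ 0 & 1 \end{pmatrix}, &
A_4 &= \sqrt{N}\begin{pmatrix} 0 & 0 \\ \sqrt{\gamma} & 0 \end{pmatrix}. \tag{11.3.40}
\end{align}
Now, it is straightforward to verify that $ZA_1 = A_1Z$, $ZA_2 = -A_2Z$, $ZA_3 = A_3Z$,
and $ZA_4 = -A_4Z$. Therefore,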
A𝛾,𝑁 (𝑍 𝜌𝑍) = 𝑍A𝛾,𝑁 (𝜌)𝑍 (11.3.41)
for every state 𝜌. The generalized amplitude damping channel is thus covariant
with respect to {1, 𝑍 }, which can be viewed as a representation of the group 𝐺 = Z2
on both the input and output spaces of the channel. Note that this representation
does not satisfy the property in (11.3.5).
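The commutation relations between 𝑍 and the Kraus operators, and the resulting covariance (11.3.41), are easy to confirm numerically. The following sketch uses the Kraus operators exactly as written in (11.3.39) and (11.3.40); the helper names and the sample values of 𝛾, 𝑁, and `rho` are illustrative assumptions.

```python
import math

Z = [[1.0, 0.0], [0.0, -1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def gadc_kraus(gamma, N):
    """Kraus operators A_1, ..., A_4 from (11.3.39) and (11.3.40)."""
    a, b = math.sqrt(1 - N), math.sqrt(N)
    s, t = math.sqrt(gamma), math.sqrt(1 - gamma)
    return [[[a, 0.0], [0.0, a * t]],
            [[0.0, a * s], [0.0, 0.0]],
            [[b * t, 0.0], [0.0, b]],
            [[0.0, 0.0], [b * s, 0.0]]]

def gadc(rho, gamma, N):
    """Apply the channel; the Kraus operators are real, so A^dagger = A^T."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for A in gadc_kraus(gamma, N):
        term = matmul(matmul(A, rho), transpose(A))
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out

def commutation_signs(gamma, N):
    """Return s_k such that Z A_k = s_k A_k Z, for k = 1, ..., 4."""
    signs = []
    for A in gadc_kraus(gamma, N):
        za, az = matmul(Z, A), matmul(A, Z)
        if close(za, az):
            signs.append(1)
        elif close(za, [[-x for x in row] for row in az]):
            signs.append(-1)
        else:
            signs.append(None)
    return signs

rho = [[0.6, 0.25], [0.25, 0.4]]  # an arbitrary real example state
print(commutation_signs(0.3, 0.2))
lhs = gadc(matmul(matmul(Z, rho), Z), 0.3, 0.2)
rhs = matmul(matmul(Z, gadc(rho, 0.3, 0.2)), Z)
print(close(lhs, rhs))
```

Because each sign appears twice in $A_k \rho A_k^\dagger$, the signs cancel term by term, which is why covariance holds even though two of the Kraus operators anticommute with 𝑍.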
Nevertheless, we can use the expression in (11.3.8) to determine the
entanglement-assisted classical capacity of the generalized amplitude damping channel.
First, observe that the channel $\rho \mapsto \frac{1}{|G|}\sum_{g\in G} U_g \rho U_g^\dagger$ is the same as the completely
dephasing channel defined in (4.5.28). Since the output of the completely dephasing
channel is always diagonal in the basis {|0⟩, |1⟩}, by (11.3.8) it suffices to optimize
over pure states $\psi_{RA}$ such that the reduced state $\psi_A$ is diagonal in the basis {|0⟩, |1⟩}.
Thus, up to an (irrelevant) unitary on the system 𝑅, the pure states $\psi_{RA}$ in (11.3.8)
can be taken to have the form
\[ |\psi\rangle_{RA} = \sqrt{1-p}\,|0,0\rangle_{RA} + \sqrt{p}\,|1,1\rangle_{RA} \tag{11.3.42} \]
for 𝑝 ∈ [0, 1]. For every such state, it is straightforward to show that the
corresponding output state $\omega_{RB} = (\mathcal{A}_{\gamma,N})_{A\to B}(\psi_{RA})$ has four eigenvalues: $(1-p)\gamma N$,
$(1-N)p\gamma$, and
\begin{align}
\lambda_\pm &:= \frac{1}{2}\left( 1 - \gamma(N + p - 2Np) \pm f(N,p,\gamma) \right), \tag{11.3.43} \\
f(N,p,\gamma) &:= \sqrt{\gamma^2(N-p)^2 - 2\gamma(N + p - 2Np) + 1}, \tag{11.3.44}
\end{align}
which means that
\begin{multline}
H(RB)_\omega = -(1-p)\gamma N\log_2((1-p)\gamma N) - (1-N)p\gamma\log_2((1-N)p\gamma) \\
- \lambda_+\log_2\lambda_+ - \lambda_-\log_2\lambda_-. \tag{11.3.45}
\end{multline}
Since $\omega_R = \psi_R = (1-p)|0\rangle\langle 0| + p|1\rangle\langle 1|$, we have that $H(R)_\omega = h_2(p)$. Finally,
we have
\[ H(B)_\omega = h_2(\gamma(p-N) + 1 - p). \tag{11.3.46} \]

Therefore,
\begin{multline}
I(\mathcal{A}_{\gamma,N}) = \max_{p\in[0,1]} \big\{ h_2(p) + h_2(\gamma(p-N) + 1 - p) + (1-p)\gamma N\log_2((1-p)\gamma N) \\
+ (1-N)p\gamma\log_2((1-N)p\gamma) + \lambda_+\log_2\lambda_+ + \lambda_-\log_2\lambda_- \big\}. \tag{11.3.47}
\end{multline}
In the case 𝑁 = 0, the channel $\mathcal{A}_{\gamma,0}$ is the amplitude damping channel $\mathcal{A}_\gamma$ (see
(4.5.1)). Using (11.3.47), we find that
\[ I(\mathcal{A}_\gamma) = \max_{p\in[0,1]} \{ h_2(p(1-\gamma)) + h_2(p) - h_2(\gamma p) \} \tag{11.3.48} \]
for all 𝛾 ∈ [0, 1].
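The maximization in (11.3.47) has no closed form in general, but it is a one-parameter problem and can be approximated on a grid. The sketch below evaluates the objective from the eigenvalues in (11.3.43)–(11.3.44) and compares the 𝑁 = 0 case with (11.3.48). The grid search (rather than a proper optimizer) and the function names are simplifications chosen here.

```python
import math

def xlog2x(x):
    """x * log2(x) with the convention 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0

def h2(x):
    return -xlog2x(x) - xlog2x(1.0 - x)

def gadc_mutual_info(p, gamma, N):
    """Objective of (11.3.47) for the input state in (11.3.42)."""
    t = gamma * (N + p - 2 * N * p)
    f = math.sqrt(max(0.0, gamma ** 2 * (N - p) ** 2 - 2 * t + 1))
    eigs = [(1 - p) * gamma * N, (1 - N) * p * gamma,
            (1 - t + f) / 2, (1 - t - f) / 2]
    h_rb = -sum(xlog2x(max(e, 0.0)) for e in eigs)   # (11.3.45)
    h_b = h2(gamma * (p - N) + 1 - p)                 # (11.3.46)
    return h2(p) + h_b - h_rb

def gadc_capacity(gamma, N, grid=2001):
    """Grid approximation of the maximum over p in [0, 1]."""
    return max(gadc_mutual_info(k / (grid - 1), gamma, N) for k in range(grid))

def adc_mutual_info(p, gamma):
    """Objective of (11.3.48), the N = 0 special case."""
    return h2(p * (1 - gamma)) + h2(p) - h2(gamma * p)

for gamma in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(gamma, round(gadc_capacity(gamma, 0.0), 4), round(gadc_capacity(gamma, 0.5), 4))
```

At 𝛾 = 0 the channel is the identity and the value is 2 bits, attained at 𝑝 = 1/2; at 𝛾 = 1 with 𝑁 = 0 every input is mapped to |0⟩ and the value is 0.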

11.4 Summary
In this chapter, we formally defined and studied entanglement-assisted classical
communication. We began with the fundamental one-shot setting, in which a quan-
tum channel is used just once for entanglement-assisted classical communication,
and we defined the one-shot entanglement-assisted classical capacity in (11.1.43).
We then derived upper and lower bounds on the one-shot capacity (Propositions 11.5
and 11.8), making a fundamental link between communication and hypothesis
testing for both bounds. To derive the upper bound, the main conceptual point was
to compare an actual protocol for entanglement-assisted communication with a
useless one. This approach led to the hypothesis testing mutual information as an
upper bound. To derive the lower bound, we employed the combined approach of
sequential decoding and position-based coding, which, at its core, is about how well
a correlated state can be distinguished from a product state. Stepping back a bit,
this is conceptually similar to the idea behind the converse upper bound, which
ultimately features a comparison between a correlated state and a product state. We
can consider the one-shot setting to contain the fundamental information theoretic
argument for entanglement-assisted communication.
With the one-shot setting in hand, we moved on to the asymptotic setting, in which
the channel is allowed to be used multiple times (as a model of how communication
channels would be used in practice). We defined various notions of communication
rates, including achievable rates, capacity, weak converse rates, strong converse
rates, and strong converse capacity. With the fundamental one-shot bounds in
place, we then substituted one use of the channel N with 𝑛 uses (the tensor-product
channel N ⊗𝑛 ) and applied various technical arguments to prove that the mutual
information of a channel is equal to both its capacity and strong converse capacity
for entanglement-assisted communication. As a main step to establish the capacity,
we proved that the mutual information of a channel is additive, and as a main step
to establish the strong converse capacity, we proved that the sandwiched Rényi
mutual information of a channel is additive.
Finally, we calculated the entanglement-assisted classical capacity for several key
channels, including the depolarizing, erasure, and generalized amplitude damping
channels, in order to illustrate the theory on some concrete examples.
As it turns out, the strongest results known in quantum information theory
are for the entanglement-assisted capacity. The results stated above hold for all
quantum channels, and thus can be viewed from the physics perspective as universal
physical laws delineating the ultimate limits of entanglement-assisted classical
communication for any physical process (i.e., described by a quantum channel). In
this sense, shared entanglement simplifies quantum information theory immensely.
Going forward from here, the same concepts such as capacity, achievable rate,
etc. can be defined for different communication tasks (i.e., unassisted classical
communication, quantum communication, private communication, etc.). What
changes is that the known results are not as strong as they are for the entanglement-
assisted setting. We know the capacity of these other communication tasks only
for certain subclasses of channels. This might be considered unfortunate, but a
different perspective is that it is exciting, because rather exotic phenomena such as
superadditivity and superactivation can occur.

11.5 Bibliographic Notes

Entanglement-assisted classical communication was devised by Bennett et al.
(1999b), as an information-theoretic extension of super-dense coding. The
entanglement-assisted classical capacity theorem was proven by Bennett et al.
(2002), and Holevo (2002a) gave a different proof for this theorem.
Entanglement-assisted classical communication in the one-shot setting was

considered by Datta and Hsieh (2013); Matthews and Wehner (2014); Datta et al.
(2016); Anshu et al. (2019); Qi et al. (2018b); Anshu et al. (2019). Proposition 11.5
is due to Matthews and Wehner (2014), and the proof given here was found
independently by Qi et al. (2018b) and Anshu et al. (2019).
The position-based coding method was introduced by Anshu et al. (2019).
It can be understood as a quantum generalization of pulse position modulation
(Verdu, 1990; Cariolaro and Erseghe, 2003). Sequential decoding was considered
by Giovannetti et al. (2012); Sen (2012); Wilde (2013); Gao (2015); Oskouei et al.
(2019), and Theorem 11.7 is due to Oskouei et al. (2019). Proposition 11.8 is due
to Qi et al. (2018b), and the proof given here was found by Oskouei et al. (2019).
The strong converse for entanglement-assisted classical capacity was established
by Bennett et al. (2014) and Gupta and Wilde (2015), with the latter paper employing
the Rényi entropic method used in this chapter. Eq. (11.2.66) and the additivity of
sandwiched Rényi mutual information of bipartite states (Proposition 11.21) were
established by Beigi (2013). Eq. (11.2.67) was established by Gupta and Wilde
(2015), and the completely bounded 1 → 𝛼 norm was studied in depth by Devetak
et al. (2006). Theorem 11.22 was proven by Gupta and Wilde (2015), by employing
the multiplicativity of completely bounded norms (Eq. (11.2.88)) found by Devetak
et al. (2006).
The entanglement-assisted classical capacity of the depolarizing and erasure
channels was evaluated by Bennett et al. (1999b), the same capacity for the amplitude
damping channel was evaluated by Giovannetti and Fazio (2005), and the same
capacity for the generalized amplitude damping channel was evaluated by Li-Zhen
and Mao-Fa (2007a).
The proofs in Appendix 11.B are due to Cooney et al. (2016), and the proofs in
Appendices 11.E and 11.F are due to Jencova (2006) (with the proofs in this book
containing some slight variations). The Lieb concavity theorem (Theorem 11.30)
is due to Lieb (1973).

Appendix 11.A Proof of Theorem 11.7

Theorem 11.7 is a consequence of the following more general result:


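Theorem 11.25
Let $\{P_i\}_{i=1}^N$ be a finite set of projectors. For every vector |𝜓⟩ and 𝑐 > 0,
\begin{multline}
\| |\psi\rangle \|_2^2 - \| P_N P_{N-1} \cdots P_1 |\psi\rangle \|_2^2 \leq (1+c)\| (\mathbb{1} - P_N)|\psi\rangle \|_2^2 \\
+ (2 + c + c^{-1}) \sum_{i=2}^{N-1} \| (\mathbb{1} - P_i)|\psi\rangle \|_2^2 + (2 + c^{-1})\| (\mathbb{1} - P_1)|\psi\rangle \|_2^2. \tag{11.A.1}
\end{multline}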

Indeed, recall from Theorem 2.4 that every density operator 𝜌 has a spectral
decomposition of the following form:
\[ \rho = \sum_{j\in\mathcal{J}} p(j)|\psi_j\rangle\langle\psi_j|, \tag{11.A.2} \]
where $p : \mathcal{J} \to [0,1]$ is a probability distribution, and $\{|\psi_j\rangle\}_{j\in\mathcal{J}}$ is an orthonormal
set of eigenvectors. Applying Theorem 11.25, we find that
\begin{align}
& 1 - \operatorname{Tr}[P_N P_{N-1} \cdots P_1 |\psi_j\rangle\langle\psi_j| P_1 \cdots P_{N-1}] \notag \\
&\quad = \| |\psi_j\rangle \|_2^2 - \| P_N P_{N-1} \cdots P_1 |\psi_j\rangle \|_2^2 \tag{11.A.3} \\
&\quad \leq (1+c)\| (\mathbb{1} - P_N)|\psi_j\rangle \|_2^2 + (2 + c + c^{-1}) \sum_{i=2}^{N-1} \| (\mathbb{1} - P_i)|\psi_j\rangle \|_2^2 + (2 + c^{-1})\| (\mathbb{1} - P_1)|\psi_j\rangle \|_2^2 \tag{11.A.4} \\
&\quad = (1+c)\operatorname{Tr}[(\mathbb{1} - P_N)|\psi_j\rangle\langle\psi_j|] + (2 + c + c^{-1}) \sum_{i=2}^{N-1} \operatorname{Tr}[(\mathbb{1} - P_i)|\psi_j\rangle\langle\psi_j|] + (2 + c^{-1})\operatorname{Tr}[(\mathbb{1} - P_1)|\psi_j\rangle\langle\psi_j|]. \tag{11.A.5}
\end{align}
The reduction from Theorem 11.25 to Theorem 11.7 follows by averaging the above
inequality over the probability distribution $p : \mathcal{J} \to [0,1]$ and from the fact that
the right-hand side of the resulting inequality can be bounded from above by
\[ (1+c)\operatorname{Tr}[(\mathbb{1} - P_N)\rho] + (2 + c + c^{-1}) \sum_{i=1}^{N-1} \operatorname{Tr}[(\mathbb{1} - P_i)\rho], \tag{11.A.6} \]
so that
\[ 1 - \operatorname{Tr}[P_N P_{N-1} \cdots P_1 \rho P_1 \cdots P_{N-1} P_N] \leq (1+c)\operatorname{Tr}[(\mathbb{1} - P_N)\rho] + (2 + c + c^{-1}) \sum_{i=1}^{N-1} \operatorname{Tr}[(\mathbb{1} - P_i)\rho]. \tag{11.A.7} \]
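Before turning to the proof, the inequality (11.A.1) can be spot-checked on random instances. The sketch below restricts attention to rank-one projectors on a real vector space, which is a special case chosen only to keep the code free of linear-algebra libraries; all function names, the dimension, and the way the instances are sampled are assumptions of this example.

```python
import math
import random

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def apply_projector(u, v):
    """Apply the rank-one projector |u><u| (u a real unit vector) to v."""
    overlap = sum(a * b for a, b in zip(u, v))
    return [overlap * a for a in u]

def sq_norm(v):
    return sum(x * x for x in v)

def sequential_bound_sides(us, psi, c):
    """Left- and right-hand sides of (11.A.1); us[0] is P_1, us[-1] is P_N."""
    out = list(psi)
    for u in us:                       # P_1 is applied first, P_N last
        out = apply_projector(u, out)
    lhs = sq_norm(psi) - sq_norm(out)
    # miss[i] = ||(1 - P_{i+1}) psi||^2
    miss = [sq_norm([a - b for a, b in zip(psi, apply_projector(u, psi))])
            for u in us]
    rhs = ((1 + c) * miss[-1]
           + (2 + c + 1 / c) * sum(miss[1:-1])
           + (2 + 1 / c) * miss[0])
    return lhs, rhs

def random_instance(rng, dim=4, num_projectors=5, spread=0.3):
    """A unit vector and projectors onto random directions near it."""
    psi = unit([rng.gauss(0, 1) for _ in range(dim)])
    us = [unit([x + spread * rng.gauss(0, 1) for x in psi])
          for _ in range(num_projectors)]
    return us, psi

rng = random.Random(0)
us, psi = random_instance(rng)
for c in (0.1, 1.0, 10.0):
    print(c, sequential_bound_sides(us, psi, c))
```

Passing such a check on sampled instances is only a consistency test of the statement as reconstructed above, not evidence beyond the proof that follows.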

We therefore shift our focus to proving Theorem 11.25, and we do so with the aid
of several lemmas. To simplify the notation, we hereafter employ the following
shorthands:

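\begin{align}
\| \cdots \|_\psi &\equiv \| \cdots |\psi\rangle \|_2, \tag{11.A.8} \\
\langle \cdots \rangle_\psi &\equiv \langle\psi| \cdots |\psi\rangle, \tag{11.A.9} \\
\hat{P}_i &\equiv \mathbb{1} - P_i, \tag{11.A.10}
\end{align}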

where for a given operator 𝐴 the expression ⟨𝐴⟩𝜓 = ⟨𝜓| 𝐴|𝜓⟩ is assumed to mean
⟨𝜓|( 𝐴|𝜓⟩). Furthermore, we also assume without loss of generality that the vector
|𝜓⟩ in Theorem 11.25 is a unit vector. This assumption can be dropped by scaling
the resulting inequality by an arbitrary positive number corresponding to the norm
of |𝜓⟩.
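First recall that, due to the fact that $P^2 = P$ for every projector 𝑃, we have the
following identities holding for all 𝑖 ∈ {1, 2, . . . , 𝑁 }:
\begin{align}
\langle \hat{P}_i P_{i-1} \cdots P_1 \rangle_\psi &= \langle \hat{P}_i \hat{P}_i P_{i-1} \cdots P_1 \rangle_\psi, \notag \\
\langle P_1 \cdots P_i \rangle_\psi &= \langle P_1 \cdots P_i P_i \rangle_\psi, \tag{11.A.11}
\end{align}
under the convention that $P_{i-1} \cdots P_1 = P_1 \cdots P_{i-1} = \mathbb{1}$ for 𝑖 = 1.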

Lemma 11.26
For a set $\{P_i\}_{i=1}^N$, a unit vector |𝜓⟩, and employing the shorthand in (11.A.8)–
(11.A.10), we have the following identities and inequality:
\begin{align}
\sum_{i=1}^N \langle \hat{P}_i P_{i-1} \cdots P_1 \rangle_\psi &= 1 - \langle P_N \cdots P_1 \rangle_\psi, \tag{11.A.12} \\
\sum_{i=1}^N \langle P_1 \cdots P_{i-1} \hat{P}_i \rangle_\psi &= 1 - \langle P_1 \cdots P_N \rangle_\psi, \tag{11.A.13} \\
\sum_{i=1}^N \langle P_1 \cdots P_{i-1} \hat{P}_i P_{i-1} \cdots P_1 \rangle_\psi &= 1 - \langle P_1 \cdots P_N \cdots P_1 \rangle_\psi, \tag{11.A.14} \\
1 - \sqrt{\langle P_N \rangle_\psi}\sqrt{\langle P_1 \cdots P_N \cdots P_1 \rangle_\psi} &\leq \sum_{i=1}^N \sqrt{\langle \hat{P}_i \rangle_\psi}\sqrt{\langle P_1 \cdots P_{i-1} \hat{P}_i P_{i-1} \cdots P_1 \rangle_\psi}, \tag{11.A.15}
\end{align}
under the convention that $P_{i-1} \cdots P_1 = P_1 \cdots P_{i-1} = \mathbb{1}$ for 𝑖 = 1.

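Proof: The following identities are straightforward to verify:
\begin{align}
1 &= \langle \hat{P}_1 \rangle_\psi + \langle \hat{P}_2 P_1 \rangle_\psi + \cdots + \langle \hat{P}_{N-1} P_{N-2} \cdots P_1 \rangle_\psi + \langle \hat{P}_N P_{N-1} \cdots P_1 \rangle_\psi \notag \\
&\quad + \langle P_N P_{N-1} \cdots P_1 \rangle_\psi, \tag{11.A.16} \\
1 &= \langle \hat{P}_1 \rangle_\psi + \langle P_1 \hat{P}_2 \rangle_\psi + \cdots + \langle P_1 \cdots P_{N-2} \hat{P}_{N-1} \rangle_\psi + \langle P_1 \cdots P_{N-1} \hat{P}_N \rangle_\psi \notag \\
&\quad + \langle P_1 \cdots P_{N-1} P_N \rangle_\psi, \tag{11.A.17} \\
1 &= \langle \hat{P}_1 \rangle_\psi + \langle P_1 \hat{P}_2 P_1 \rangle_\psi + \cdots + \langle P_1 \cdots P_{N-2} \hat{P}_{N-1} P_{N-2} \cdots P_1 \rangle_\psi \notag \\
&\quad + \langle P_1 \cdots P_{N-1} \hat{P}_N P_{N-1} \cdots P_1 \rangle_\psi + \langle P_1 \cdots P_{N-1} P_N P_{N-1} \cdots P_1 \rangle_\psi. \tag{11.A.18}
\end{align}
Consequently, from the equalities in (11.A.16), (11.A.17), and (11.A.18), we obtain
(11.A.12), (11.A.13), and (11.A.14), respectively. The following equality is a direct
consequence of (11.A.16) and (11.A.11):
\begin{align}
1 &= \langle \hat{P}_1 \rangle_\psi + \langle \hat{P}_2 \hat{P}_2 P_1 \rangle_\psi + \cdots + \langle \hat{P}_{N-1} \hat{P}_{N-1} P_{N-2} \cdots P_1 \rangle_\psi \notag \\
&\quad + \langle \hat{P}_N \hat{P}_N P_{N-1} \cdots P_1 \rangle_\psi + \langle P_N P_N P_{N-1} \cdots P_1 \rangle_\psi. \tag{11.A.19}
\end{align}
By applying the Cauchy–Schwarz inequality from (2.2.30) to (11.A.19), we find
that
\begin{align}
1 &\leq \langle \hat{P}_1 \rangle_\psi + \sqrt{\langle \hat{P}_2 \rangle_\psi}\sqrt{\langle P_1 \hat{P}_2 P_1 \rangle_\psi} \notag \\
&\quad + \cdots + \sqrt{\langle \hat{P}_N \rangle_\psi}\sqrt{\langle P_1 \cdots P_{N-1} \hat{P}_N P_{N-1} \cdots P_1 \rangle_\psi} \notag \\
&\quad + \sqrt{\langle P_N \rangle_\psi}\sqrt{\langle P_1 \cdots P_{N-1} P_N P_{N-1} \cdots P_1 \rangle_\psi}, \tag{11.A.20}
\end{align}
from which (11.A.15) immediately follows. ■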


Lemma 11.27
For a set $\{P_i\}_{i=1}^N$ of projectors, a unit vector |𝜓⟩, and employing the shorthand
in (11.A.8)–(11.A.10), the following inequality holds for 𝑁 ≥ 2:
\[ \sum_{i=1}^N \| \hat{P}_i (\mathbb{1} - P_{i-1} \cdots P_1) \|_\psi^2 \leq \sum_{i=1}^{N-1} \| \hat{P}_i \|_\psi^2, \tag{11.A.21} \]
under the convention that $P_{i-1} \cdots P_1 = P_1 \cdots P_{i-1} = \mathbb{1}$ for 𝑖 = 1. Equivalently,
\[ \sum_{i=2}^N \| \hat{P}_i (\mathbb{1} - P_{i-1} \cdots P_1) \|_\psi^2 \leq \sum_{i=1}^{N-1} \| \hat{P}_i \|_\psi^2, \tag{11.A.22} \]
due to the aforementioned convention.

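Proof: Consider the following chain of equalities:
\begin{align}
& \sum_{i=1}^N \| \hat{P}_i (\mathbb{1} - P_{i-1} \cdots P_1) \|_\psi^2 \notag \\
&= \sum_{i=1}^N \| \hat{P}_i - \hat{P}_i P_{i-1} \cdots P_1 \|_\psi^2 \tag{11.A.23} \\
&= \sum_{i=1}^N \left( \| \hat{P}_i \|_\psi^2 - \langle \hat{P}_i P_{i-1} \cdots P_1 \rangle_\psi - \langle P_1 \cdots P_{i-1} \hat{P}_i \rangle_\psi + \langle P_1 \cdots P_{i-1} \hat{P}_i P_{i-1} \cdots P_1 \rangle_\psi \right) \tag{11.A.24} \\
&= \left( \sum_{i=1}^N \| \hat{P}_i \|_\psi^2 \right) - 1 + \langle P_N \cdots P_1 \rangle_\psi - 1 + \langle P_1 \cdots P_N \rangle_\psi + 1 - \langle P_1 \cdots P_N \cdots P_1 \rangle_\psi \tag{11.A.25} \\
&= \left( \sum_{i=1}^N \| \hat{P}_i \|_\psi^2 \right) - 1 + \langle P_N P_N P_{N-1} \cdots P_1 \rangle_\psi + \langle P_1 \cdots P_{N-1} P_N P_N \rangle_\psi - \langle P_1 \cdots P_N \cdots P_1 \rangle_\psi. \tag{11.A.26}
\end{align}
To obtain (11.A.24), we used the identities in (11.A.11). Next, to get (11.A.25),
the identities in (11.A.12), (11.A.13), and (11.A.14) of Lemma 11.26 were used.
Continuing, we have that
\begin{align}
\text{Eq. (11.A.26)} &\leq \left( \sum_{i=1}^N \| \hat{P}_i \|_\psi^2 \right) - 1 - \langle P_1 \cdots P_N \cdots P_1 \rangle_\psi + 2\sqrt{\langle P_N \rangle_\psi}\sqrt{\langle P_1 \cdots P_N \cdots P_1 \rangle_\psi} \tag{11.A.27} \\
&= \left( \sum_{i=1}^N \| \hat{P}_i \|_\psi^2 \right) - 1 + \langle P_N \rangle_\psi - \left( \sqrt{\langle P_N \rangle_\psi} - \sqrt{\langle P_1 \cdots P_N \cdots P_1 \rangle_\psi} \right)^2 \tag{11.A.28} \\
&\leq \left( \sum_{i=1}^N \| \hat{P}_i \|_\psi^2 \right) - \| \hat{P}_N \|_\psi^2 = \sum_{i=1}^{N-1} \| \hat{P}_i \|_\psi^2. \tag{11.A.29}
\end{align}
To obtain (11.A.27), the Cauchy–Schwarz inequality was employed. ■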
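The telescoping identity (11.A.12) and the inequality (11.A.21) of Lemma 11.27 can both be spot-checked numerically. As a simplifying assumption made only for this example, the projectors are rank one and act on a real vector space; the function names are illustrative.

```python
import math
import random

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def apply_projector(u, v):
    """Apply the rank-one projector |u><u| (u a real unit vector) to v."""
    overlap = inner(u, v)
    return [overlap * a for a in u]

def complement(u, v):
    """Apply 1 - |u><u| to v, i.e., the operator written with a hat in (11.A.10)."""
    return [a - b for a, b in zip(v, apply_projector(u, v))]

def lemma_quantities(us, psi):
    """Return both sides of (11.A.12) and both sides of (11.A.21)."""
    partial = list(psi)      # P_{i-1} ... P_1 |psi>, equal to |psi> when i = 1
    telescoping = 0.0        # sum_i <psi| hat P_i P_{i-1} ... P_1 |psi>
    lhs = 0.0                # left-hand side of (11.A.21)
    for u in us:
        telescoping += inner(psi, complement(u, partial))
        diff = [a - b for a, b in zip(psi, partial)]   # (1 - P_{i-1} ... P_1)|psi>
        lhs += inner(complement(u, diff), complement(u, diff))
        partial = apply_projector(u, partial)
    rhs = sum(inner(complement(u, psi), complement(u, psi)) for u in us[:-1])
    return telescoping, 1.0 - inner(psi, partial), lhs, rhs

rng = random.Random(7)
psi = unit([rng.gauss(0, 1) for _ in range(4)])
us = [unit([x + 0.5 * rng.gauss(0, 1) for x in psi]) for _ in range(4)]
print(lemma_quantities(us, psi))
```

The first two returned numbers should agree up to rounding, and the third should not exceed the fourth.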

We are now in a position to prove Theorem 11.25.

Proof of Theorem 11.25

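Consider that
\begin{align}
1 - \| P_N \cdots P_1 \|_\psi^2 &= 1 - \langle P_1 \cdots P_N \cdots P_1 \rangle_\psi + 2\left( 1 - \sqrt{\langle P_N \rangle_\psi}\sqrt{\langle P_1 \cdots P_N \cdots P_1 \rangle_\psi} \right) \notag \\
&\quad - 2\left( 1 - \sqrt{\langle P_N \rangle_\psi}\sqrt{\langle P_1 \cdots P_N \cdots P_1 \rangle_\psi} \right) \tag{11.A.30} \\
&= 2\left( 1 - \sqrt{\langle P_N \rangle_\psi}\sqrt{\langle P_1 \cdots P_N \cdots P_1 \rangle_\psi} \right) - \left( \sqrt{\langle P_N \rangle_\psi} - \sqrt{\langle P_1 \cdots P_N \cdots P_1 \rangle_\psi} \right)^2 \notag \\
&\quad - 1 + \langle P_N \rangle_\psi. \tag{11.A.31}
\end{align}
Continuing, we have that
\begin{align}
\text{Eq. (11.A.31)} &\leq -\| \hat{P}_N \|_\psi^2 + 2\left( 1 - \sqrt{\langle P_N \rangle_\psi}\sqrt{\langle P_1 \cdots P_N \cdots P_1 \rangle_\psi} \right) \tag{11.A.32} \\
&\leq -\| \hat{P}_N \|_\psi^2 + 2\sum_{i=1}^N \sqrt{\langle \hat{P}_i \rangle_\psi}\sqrt{\langle P_1 \cdots P_{i-1} \hat{P}_i P_{i-1} \cdots P_1 \rangle_\psi} \tag{11.A.33} \\
&\leq -\| \hat{P}_N \|_\psi^2 + 2\sum_{i=1}^N \sqrt{\langle \hat{P}_i \rangle_\psi}\left( \| \hat{P}_i \|_\psi + \| \hat{P}_i (\mathbb{1} - P_{i-1} \cdots P_1) \|_\psi \right). \tag{11.A.34}
\end{align}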

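First, (11.A.32) is obtained by observing that
\begin{align}
-\left( \sqrt{\langle P_N \rangle_\psi} - \sqrt{\langle P_1 \cdots P_N \cdots P_1 \rangle_\psi} \right)^2 - 1 + \langle P_N \rangle_\psi &\leq -1 + \langle P_N \rangle_\psi \notag \\
&= -\| \hat{P}_N \|_\psi^2. \tag{11.A.35}
\end{align}
Next, (11.A.33) follows from (11.A.15) of Lemma 11.26. Then, (11.A.34) is a
consequence of the triangle inequality:
\begin{align}
\sqrt{\langle P_1 \cdots P_{i-1} \hat{P}_i P_{i-1} \cdots P_1 \rangle_\psi} &= \sqrt{\langle P_1 \cdots P_{i-1} \hat{P}_i \hat{P}_i P_{i-1} \cdots P_1 \rangle_\psi} \tag{11.A.36} \\
&= \| \hat{P}_i P_{i-1} \cdots P_1 \|_\psi \tag{11.A.37} \\
&= \| \hat{P}_i (-\mathbb{1} + \mathbb{1} - P_{i-1} \cdots P_1) \|_\psi \tag{11.A.38} \\
&= \| -\hat{P}_i + \hat{P}_i (\mathbb{1} - P_{i-1} \cdots P_1) \|_\psi \tag{11.A.39} \\
&\leq \| \hat{P}_i \|_\psi + \| \hat{P}_i (\mathbb{1} - P_{i-1} \cdots P_1) \|_\psi. \tag{11.A.40}
\end{align}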

Continuing, we have that
\begin{align}
\text{Eq. (11.A.34)} &= -\| \hat{P}_N \|_\psi^2 + 2\sum_{i=1}^N \| \hat{P}_i \|_\psi^2 + 2\sum_{i=1}^N \| \hat{P}_i \|_\psi \| \hat{P}_i (\mathbb{1} - P_{i-1} \cdots P_1) \|_\psi \tag{11.A.41} \\
&= -\| \hat{P}_N \|_\psi^2 + 2\sum_{i=1}^N \| \hat{P}_i \|_\psi^2 + 2\sum_{i=2}^N \| \hat{P}_i \|_\psi \| \hat{P}_i (\mathbb{1} - P_{i-1} \cdots P_1) \|_\psi \tag{11.A.42} \\
&\leq -\| \hat{P}_N \|_\psi^2 + 2\sum_{i=1}^N \| \hat{P}_i \|_\psi^2 + \sum_{i=2}^N \left( c\| \hat{P}_i \|_\psi^2 + c^{-1}\| \hat{P}_i (\mathbb{1} - P_{i-1} \cdots P_1) \|_\psi^2 \right) \tag{11.A.43} \\
&\leq -\| \hat{P}_N \|_\psi^2 + 2\sum_{i=1}^N \| \hat{P}_i \|_\psi^2 + c\sum_{i=2}^N \| \hat{P}_i \|_\psi^2 + c^{-1}\sum_{i=1}^{N-1} \| \hat{P}_i \|_\psi^2 \tag{11.A.44} \\
&\leq (1+c)\| \hat{P}_N \|_\psi^2 + (2 + c^{-1})\| \hat{P}_1 \|_\psi^2 + (2 + c + c^{-1})\sum_{i=2}^{N-1} \| \hat{P}_i \|_\psi^2. \tag{11.A.45}
\end{align}
Eq. (11.A.42) follows from the convention that $P_{i-1} \cdots P_1 = \mathbb{1}$ for 𝑖 = 1. Eq.
(11.A.43) is a consequence of the inequality $2xy \leq cx^2 + c^{-1}y^2$, holding for 𝑥, 𝑦 ∈ R
and 𝑐 > 0. Finally, (11.A.44) is obtained by using Lemma 11.27.
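The elementary inequality used for (11.A.43) follows from expanding a square, with the free parameter 𝑐 > 0 controlling how the cross term is split between the two squares:

```latex
\begin{align*}
0 \leq \left( \sqrt{c}\,x - \frac{y}{\sqrt{c}} \right)^2
  = c\,x^2 - 2xy + c^{-1}y^2
\quad\Longrightarrow\quad
2xy \leq c\,x^2 + c^{-1}y^2 .
\end{align*}
```

In the proof it is applied term by term with $x = \| \hat{P}_i \|_\psi$ and $y = \| \hat{P}_i (\mathbb{1} - P_{i-1} \cdots P_1) \|_\psi$.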

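Appendix 11.B The 𝜶 → 1 Limit of the Sandwiched Rényi Mutual Information of a Channel
In this appendix, we show that
\[ \lim_{\alpha\to 1^-} \overline{I}_\alpha(\mathcal{N}) = \lim_{\alpha\to 1^+} \widetilde{I}_\alpha(\mathcal{N}) = I(\mathcal{N}), \tag{11.B.1} \]
where we recall that
\begin{align}
\overline{I}_\alpha(\mathcal{N}) &= \sup_{\psi_{RA}} D_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \psi_R \otimes \mathcal{N}_{A\to B}(\psi_A)), \tag{11.B.2} \\
\widetilde{I}_\alpha(\mathcal{N}) &= \sup_{\psi_{RA}} \inf_{\sigma_B} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \psi_R \otimes \sigma_B). \tag{11.B.3}
\end{align}
In these definitions, $\psi_{RA}$ is a pure state, with the dimension of 𝑅 equal to the
dimension of 𝐴, and in the definition of $\widetilde{I}_\alpha(\mathcal{N})$ the infimum is over states $\sigma_B$.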
All of the arguments presented here are similar to those in Appendix 10.A,
which we refer to for additional details.
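As a consequence of the fact that $\overline{I}_\alpha(\mathcal{N})$ increases monotonically with 𝛼
(see Proposition 7.23), as well as the fact that $\lim_{\alpha\to 1} D_\alpha(\rho\Vert\sigma) = D(\rho\Vert\sigma)$ (see
Proposition 7.22), we find that
\begin{align}
\lim_{\alpha\to 1^-} \overline{I}_\alpha(\mathcal{N}) &= \sup_{\alpha\in(0,1)} \sup_{\psi_{RA}} D_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \psi_R \otimes \mathcal{N}_{A\to B}(\psi_A)) \tag{11.B.4} \\
&= \sup_{\psi_{RA}} \sup_{\alpha\in(0,1)} D_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \psi_R \otimes \mathcal{N}_{A\to B}(\psi_A)) \tag{11.B.5} \\
&= \sup_{\psi_{RA}} D(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \psi_R \otimes \mathcal{N}_{A\to B}(\psi_A)) \tag{11.B.6} \\
&= I(\mathcal{N}), \tag{11.B.7}
\end{align}
as required.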
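Similarly, for the sandwiched Rényi mutual information, we use the fact that
it increases monotonically with 𝛼 (see Proposition 7.31), along with the fact that
$\lim_{\alpha\to 1} \widetilde{D}_\alpha(\rho\Vert\sigma) = D(\rho\Vert\sigma)$ (see Proposition 7.30), to obtain
\begin{align}
\lim_{\alpha\to 1^+} \widetilde{I}_\alpha(\mathcal{N}) &= \inf_{\alpha\in(1,\infty)} \sup_{\psi_{RA}} \inf_{\sigma_B} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \psi_R \otimes \sigma_B) \tag{11.B.8} \\
&= \sup_{\psi_{RA}} \inf_{\alpha\in(1,\infty)} \inf_{\sigma_B} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \psi_R \otimes \sigma_B) \tag{11.B.9} \\
&= \sup_{\psi_{RA}} \inf_{\sigma_B} \inf_{\alpha\in(1,\infty)} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \psi_R \otimes \sigma_B) \tag{11.B.10} \\
&= \sup_{\psi_{RA}} \inf_{\sigma_B} D(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \psi_R \otimes \sigma_B) \tag{11.B.11} \\
&= \sup_{\psi_{RA}} D(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \psi_R \otimes \mathcal{N}_{A\to B}(\psi_A)) \tag{11.B.12} \\
&= I(\mathcal{N}). \tag{11.B.13}
\end{align}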
To obtain the second equality, we made use of the minimax theorem in Theorem 2.25
to exchange $\inf_{\alpha\in(1,\infty)}$ and $\sup_{\psi_{RA}}$. Specifically, we applied that theorem to the
function
\[ (\alpha, \psi_{RA}) \mapsto \inf_{\sigma_B} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \psi_R \otimes \sigma_B), \tag{11.B.14} \]
which is monotonically increasing in the first argument and continuous in the
second argument.
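The monotonicity in 𝛼 and the convergence as 𝛼 → 1 can be seen concretely in a case where everything reduces to eigenvalues. The sketch below is an illustration for one fixed state rather than for the channel quantities with their optimizations: it takes the Bell-diagonal Choi state of the qubit depolarizing channel from (11.3.15), for which $\rho_A \otimes \rho_B = \mathbb{1}/4$ commutes with $\rho_{AB}$, so that the Rényi relative entropy (Petz or sandwiched) equals 2 minus the Rényi entropy of the spectrum. The function names are illustrative.

```python
import math

def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha != 1, in bits."""
    return math.log2(sum(q ** alpha for q in probs if q > 0)) / (1 - alpha)

def shannon_entropy(probs):
    return -sum(q * math.log2(q) for q in probs if q > 0)

def renyi_mi_bell_diagonal(p, alpha):
    """D_alpha(rho_AB || identity/4) = 2 - H_alpha(spectrum of rho_AB)."""
    return 2.0 - renyi_entropy([1 - p, p / 3, p / 3, p / 3], alpha)

p = 0.4
limit = 2.0 - shannon_entropy([1 - p, p / 3, p / 3, p / 3])
for alpha in (0.5, 0.9, 0.999, 1.001, 1.1, 2.0):
    print(alpha, round(renyi_mi_bell_diagonal(p, alpha), 6))
print("alpha -> 1:", round(limit, 6))
```

The printed values increase with 𝛼 and bracket the 𝛼 → 1 value from below and above.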

Appendix 11.C Achievability from a Different Point of View

Here we show that the mutual information 𝐼 (N) is an achievable rate based on the
alternate definition given in Appendix A. According to that definition, a rate $R \in \mathbb{R}^+$
is an achievable rate for entanglement-assisted classical communication over N
if there exists a sequence $\{(n, |\mathcal{M}_n|, \varepsilon_n)\}_{n\in\mathbb{N}}$ of $(n, |\mathcal{M}|, \varepsilon)$ entanglement-assisted
classical communication protocols over 𝑛 uses of N such that
\[ \liminf_{n\to\infty} \frac{1}{n}\log_2|\mathcal{M}_n| \geq R \quad \text{and} \quad \lim_{n\to\infty} \varepsilon_n = 0. \tag{11.C.1} \]
To start, let us recall Corollary 11.17, which states that for all 𝜀 ∈ (0, 1], 𝑛 ∈ N,
and 𝛼 ∈ (0, 1), there exists an $(n, |\mathcal{M}|, \varepsilon)$ protocol satisfying
\[ \frac{1}{n}\log_2|\mathcal{M}| \geq \overline{I}_\alpha(\mathcal{N}) - \frac{1}{n(1-\alpha)}\log_2\frac{2}{\varepsilon} - \frac{3}{n}. \tag{11.C.2} \]
Fix constants $\delta_1, \delta_2$ satisfying $0 < \delta_2 < \delta_1 < 1$. Pick $\alpha_n \in (0,1)$ and $\varepsilon_n \in (0,1]$
as follows:
\[ \alpha_n := 1 - n^{-(1-\delta_1)}, \qquad \varepsilon_n := 2^{-n^{\delta_2}}. \tag{11.C.3} \]
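Plugging in to (11.C.2), we find that there exists a sequence of $\{(n, |\mathcal{M}_n|, \varepsilon_n)\}_{n\in\mathbb{N}}$
protocols satisfying
\begin{align}
\frac{1}{n}\log_2|\mathcal{M}_n| &\geq \overline{I}_{\alpha_n}(\mathcal{N}) - \frac{1}{n(1-\alpha_n)}\log_2\frac{2}{\varepsilon_n} - \frac{3}{n} \tag{11.C.4} \\
&= \overline{I}_{\alpha_n}(\mathcal{N}) - \frac{1 + n^{\delta_2}}{n^{\delta_1}} - \frac{3}{n} \tag{11.C.5} \\
&= \overline{I}_{\alpha_n}(\mathcal{N}) - \frac{1}{n^{\delta_1}} - \frac{1}{n^{\delta_1-\delta_2}} - \frac{3}{n}. \tag{11.C.6}
\end{align}
Now taking the limit 𝑛 → ∞, we find that
\begin{align}
\liminf_{n\to\infty} \frac{1}{n}\log_2|\mathcal{M}_n| &\geq \liminf_{n\to\infty} \left( \overline{I}_{\alpha_n}(\mathcal{N}) - \frac{1}{n^{\delta_1}} - \frac{1}{n^{\delta_1-\delta_2}} - \frac{3}{n} \right) \tag{11.C.7} \\
&= I(\mathcal{N}), \tag{11.C.8}
\end{align}
and $\lim_{n\to\infty} \varepsilon_n = 0$. The equality above follows because $\lim_{n\to\infty} \alpha_n = 1$ and
$\lim_{\alpha\to 1} \overline{I}_\alpha(\mathcal{N}) = I(\mathcal{N})$ (see Appendix 11.B for a proof). Thus, it follows that the
mutual information rate 𝐼 (N) is achievable according to the alternate definition
given in Appendix A.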
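The role of the two exponents is easiest to see by tabulating the rate penalty. The sketch below computes the penalty both directly from (11.C.4), using $\log_2(2/\varepsilon_n) = 1 + n^{\delta_2}$, and from the simplified form (11.C.6). The values $\delta_1 = 0.5$ and $\delta_2 = 0.25$ are arbitrary choices satisfying $0 < \delta_2 < \delta_1 < 1$, and the function name is illustrative.

```python
import math

def rate_penalty(n, delta1, delta2):
    """Penalty subtracted from the Renyi quantity in (11.C.4) and in (11.C.6)."""
    one_minus_alpha = n ** (-(1 - delta1))           # 1 - alpha_n from (11.C.3)
    log2_two_over_eps = 1 + n ** delta2              # log2(2 / eps_n), eps_n = 2^(-n^delta2)
    direct = log2_two_over_eps / (n * one_minus_alpha) + 3 / n
    simplified = n ** (-delta1) + n ** (-(delta1 - delta2)) + 3 / n
    return direct, simplified

delta1, delta2 = 0.5, 0.25   # hypothetical constants with 0 < delta2 < delta1 < 1
for k in (2, 4, 6, 8, 10, 12):
    n = 10.0 ** k
    direct, simplified = rate_penalty(n, delta1, delta2)
    print(f"n = 1e{k}: penalty = {simplified:.6f}, log2(eps_n) = {-(n ** delta2):.1f}")
```

The dominant term is $n^{-(\delta_1-\delta_2)}$, so the gap between the two exponents sets how quickly the rate approaches the limiting value, while $\delta_2$ alone sets how quickly the error probability decays.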
In the approach detailed above, the error probability decays subexponentially to
zero (i.e., slower than an exponential decay) and the rate increases to 𝐼 (N) with
increasing 𝑛. If we would like to have exponential decay of the error probability, then
we can instead fix the rate 𝑅 to be a constant satisfying 𝑅 < 𝐼 (N) and reconsider
the analysis. Rearranging the inequality in (11.2.22) in order to get a bound on
697
Chapter 11: Entanglement-Assisted Classical Communication

𝜀 𝑛 , we find that for all 𝛼 ∈ (0, 1), there exists a sequence of {(𝑛, |M𝑛 |, 𝜀 𝑛 )}𝑛∈N
protocols satisfying
𝜀 𝑛 ≤ 2 · 2−𝑛(1−𝛼) ( 𝐼 𝛼 (N)−𝑅− 𝑛 ) .
3
(11.C.9)
Since 𝑅 < 𝐼 (N), lim𝛼→1 𝐼 𝛼 (N) = 𝐼 (N), and since 𝐼 𝛼 (N) is monotonically
increasing in 𝛼 (this follows from Proposition 7.31), there exists an 𝛼∗ < 1 such
that 𝐼 𝛼∗ (N) > 𝑅. Applying the bound in (11.C.9) to this value of 𝛼, we find that
\[
  \varepsilon_n \leq 2\cdot 2^{-n(1-\alpha^*)\left(I_{\alpha^*}(\mathcal{N}) - R - \frac{3}{n}\right)}. \tag{11.C.10}
\]
Then, taking the limit 𝑛 → ∞ on both sides of this inequality, we conclude that
lim𝑛→∞ 𝜀 𝑛 = 0 exponentially fast. Thus, by choosing 𝑅 as a constant satisfying
𝑅 < 𝐼 (N) it follows that there exists a sequence of {(𝑛, 2𝑛𝑅 , 𝜀 𝑛 )}𝑛∈N protocols such
that the error probability 𝜀 𝑛 decays exponentially fast to zero.
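The exponential decay in (11.C.10) can be made concrete with a small sketch. The numbers used below (a Rényi value of 1.0 bits at 𝛼∗ = 0.9 and a rate 𝑅 = 0.8) are hypothetical placeholders rather than values for any particular channel, and the function name is illustrative.

```python
def error_upper_bound(n, alpha, renyi_info, rate):
    """Right-hand side of (11.C.9): 2 * 2^{-n (1 - alpha) (I_alpha - R - 3/n)}."""
    exponent = n * (1.0 - alpha) * (renyi_info - rate - 3.0 / n)
    return 2.0 * 2.0 ** (-exponent)


if __name__ == "__main__":
    # Hypothetical inputs: I_{alpha*}(N) = 1.0 at alpha* = 0.9, fixed rate R = 0.8
    for n in (100, 1000, 10000):
        print(n, error_upper_bound(n, 0.9, 1.0, 0.8))
```

Since the exponent grows linearly in 𝑛 whenever the Rényi value exceeds 𝑅, the bound falls off exponentially, up to the constant prefactor.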

Appendix 11.D Proof of Lemma 11.20

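We start by writing the definition of $\widetilde{I}_\alpha(A;B)_\rho$ as

\begin{align}
  \widetilde{I}_\alpha(A;B)_\rho &= \inf_{\sigma_B} \frac{\alpha}{\alpha-1}\log_2 \left\| \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \rho_{AB} \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \right\|_\alpha \tag{11.D.1} \\
  &= \frac{\alpha}{\alpha-1}\log_2 \inf_{\sigma_B} \left\| \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \rho_{AB} \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \right\|_\alpha, \tag{11.D.2}
\end{align}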

where 𝛼 > 1 and the optimization is over states 𝜎𝐵 . Then, for every purification
|𝜓⟩ 𝐴𝐵𝐶 of 𝜌 𝐴𝐵 , we have

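\begin{multline}
  \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \rho_{AB} \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \\
  = \operatorname{Tr}_C\!\left[ \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) |\psi\rangle\!\langle\psi|_{ABC} \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \right]. \tag{11.D.3}
\end{multline}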

Now, the operator inside Tr𝐶 on the last line in the equation above is rank one,
which means that

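\[
  \operatorname{Tr}_C\!\left[ \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) |\psi\rangle\!\langle\psi|_{ABC} \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \right]
\]
and
\[
  \operatorname{Tr}_{AB}\!\left[ \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) |\psi\rangle\!\langle\psi|_{ABC} \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \right]
\]
have the same non-zero eigenvalues. This means that their Schatten norms are
equal, so that
\begin{align}
  \widetilde{I}_\alpha(A;B)_\rho
  &= \frac{\alpha}{\alpha-1}\log_2 \inf_{\sigma_B} \left\| \operatorname{Tr}_C\!\left[ \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) |\psi\rangle\!\langle\psi|_{ABC} \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \right] \right\|_\alpha \tag{11.D.4} \\
  &= \frac{\alpha}{\alpha-1}\log_2 \inf_{\sigma_B} \left\| \operatorname{Tr}_{AB}\!\left[ \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) |\psi\rangle\!\langle\psi|_{ABC} \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \right] \right\|_\alpha. \tag{11.D.5}
\end{align}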
Now, we use the variational characterization of the Schatten norm in (2.2.98), which
states that for every operator 𝑋,
\[
  \|X\|_p = \sup_{\|Y\|_{p'}=1} \operatorname{Tr}[Y^\dagger X] \tag{11.D.6}
\]
for all 1 ≤ 𝑝 ≤ ∞, where 𝑝′ satisfies 1/𝑝 + 1/𝑝′ = 1. Note that if 𝑋 is positive semi-definite,
then we can restrict the optimization to positive semi-definite operators
𝑌. Using (11.D.6), with 𝑝 = 𝛼 and 𝑝′ = 𝛼/(𝛼 − 1), on the expression in (11.D.5), and
since the argument of the norm in that expression is positive semi-definite, we can
optimize over positive semi-definite operators 𝜏𝐶 to obtain
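\begin{align}
  \widetilde{I}_\alpha(A;B)_\rho
  &= \frac{\alpha}{\alpha-1}\log_2 \inf_{\sigma_B}\sup_{\tau_C} \operatorname{Tr}\!\left[ \tau_C^{\frac{\alpha-1}{\alpha}} \operatorname{Tr}_{AB}\!\left[ \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) |\psi\rangle\!\langle\psi|_{ABC} \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \right] \right] \tag{11.D.7} \\
  &= \frac{\alpha}{\alpha-1}\log_2 \inf_{\sigma_B}\sup_{\tau_C} \operatorname{Tr}\!\left[ \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}} \otimes \tau_C^{\frac{\alpha-1}{2\alpha}}\right) |\psi\rangle\!\langle\psi|_{ABC} \left(\rho_A^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}} \otimes \tau_C^{\frac{\alpha-1}{2\alpha}}\right) \right] \tag{11.D.8} \\
  &= \frac{\alpha}{\alpha-1}\log_2 \inf_{\sigma_B}\sup_{\tau_C} \operatorname{Tr}\!\left[ \left(\rho_A^{\frac{1-\alpha}{\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{\alpha}} \otimes \tau_C^{\frac{\alpha-1}{\alpha}}\right) |\psi\rangle\!\langle\psi|_{ABC} \right] \tag{11.D.9} \\
  &= \frac{\alpha}{\alpha-1}\log_2 \sup_{\tau_C}\inf_{\sigma_B} \operatorname{Tr}\!\left[ \left(\rho_A^{\frac{1-\alpha}{\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{\alpha}} \otimes \tau_C^{\frac{\alpha-1}{\alpha}}\right) |\psi\rangle\!\langle\psi|_{ABC} \right], \tag{11.D.10}
\end{align}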

where the last line follows by applying Sion’s minimax theorem (Theorem 2.24) to
the function

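\[
  (\tau_C, \sigma_B) \mapsto \operatorname{Tr}\!\left[ \left(\rho_A^{\frac{1-\alpha}{\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{\alpha}} \otimes \tau_C^{\frac{\alpha-1}{\alpha}}\right) |\psi\rangle\!\langle\psi|_{ABC} \right], \tag{11.D.11}
\]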

which is concave in the first argument because $\tau_C \mapsto \tau_C^{\frac{\alpha-1}{\alpha}}$ is operator concave and
convex in the second argument because $\sigma_B \mapsto \sigma_B^{\frac{1-\alpha}{\alpha}}$ is operator convex.
Finally, we use Proposition 2.8, which is that

\[
  \|X\|_p = \inf_{\substack{Y \geq 0, \\ \|Y\|_{p'}=1}} \operatorname{Tr}[XY] \tag{11.D.12}
\]

for all 0 < 𝑝 < 1, where 1/𝑝 + 1/𝑝′ = 1. Applying this to (11.D.10) with 𝑝′ = 𝛼/(1 − 𝛼), so
that 𝑝 = 𝛼/(2𝛼 − 1), we conclude that

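\begin{align}
  \widetilde{I}_\alpha(A;B)_\rho
  &= \frac{\alpha}{\alpha-1}\log_2 \sup_{\tau_C}\inf_{\sigma_B} \operatorname{Tr}\!\left[ \left(\rho_A^{\frac{1-\alpha}{\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{\alpha}} \otimes \tau_C^{\frac{\alpha-1}{\alpha}}\right) |\psi\rangle\!\langle\psi|_{ABC} \right] \tag{11.D.13} \\
  &= \frac{\alpha}{\alpha-1}\log_2 \sup_{\tau_C}\inf_{\sigma_B} \operatorname{Tr}\!\left[ \sigma_B^{\frac{1-\alpha}{\alpha}} \operatorname{Tr}_{AC}\!\left[ \left(\rho_A^{\frac{1-\alpha}{\alpha}} \otimes \tau_C^{\frac{\alpha-1}{\alpha}}\right) |\psi\rangle\!\langle\psi|_{ABC} \right] \right] \tag{11.D.14} \\
  &= \frac{\alpha}{\alpha-1}\log_2 \sup_{\tau_C} \left\| \operatorname{Tr}_{AC}\!\left[ \left(\rho_A^{\frac{1-\alpha}{\alpha}} \otimes \tau_C^{\frac{\alpha-1}{\alpha}}\right) |\psi\rangle\!\langle\psi|_{ABC} \right] \right\|_{\frac{\alpha}{2\alpha-1}}, \tag{11.D.15}
\end{align}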

the last line of which is (11.2.66), as required.

To prove (11.2.67), we use the fact that the definition of the sandwiched Rényi
mutual information of a bipartite state can be written as in (11.D.2), i.e.,

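\[
  \widetilde{I}_\alpha(R;B)_\rho = \frac{\alpha}{\alpha-1}\log_2 \inf_{\sigma_B} \left\| \left(\rho_R^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \rho_{RB} \left(\rho_R^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \right\|_\alpha, \tag{11.D.16}
\]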

which means that the definition in (11.2.82) can be written as

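\[
  \widetilde{I}_\alpha(\mathcal{N}) = \frac{\alpha}{\alpha-1} \sup_{\psi_{RA}} \log_2 \inf_{\sigma_B} \left\| \left(\psi_R^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \mathcal{N}_{A\to B}(\psi_{RA}) \left(\psi_R^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \right\|_\alpha. \tag{11.D.17}
\]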
Now, we use the fact mentioned in (2.2.36), which is that for every pure state 𝜓 𝑅 𝐴 ,
with the systems 𝑅 and 𝐴 having the same dimensions, there exists an operator 𝑋 𝑅
such that
|𝜓⟩ 𝑅 𝐴 = (𝑋 𝑅 ⊗ 1 𝐴 )|Γ⟩ 𝑅 𝐴 , (11.D.18)

and $\operatorname{Tr}[X_R^\dagger X_R] = 1$, with this latter equality following from (2.2.41). By taking a
polar decomposition of $X_R$ as $X_R = U_R\sqrt{\tau_R}$ for a unitary $U_R$ and a state $\tau_R$ (see
Theorem 2.3), we can then write
\[
  |\psi\rangle_{RA} = (U_R\sqrt{\tau_R} \otimes \mathbb{1}_A)|\Gamma\rangle_{RA}. \tag{11.D.19}
\]

This implies that

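\begin{align}
  \psi_R &= \operatorname{Tr}_B[\mathcal{N}_{A\to B}(\psi_{RA})] \tag{11.D.20} \\
  &= \operatorname{Tr}_B\!\left[(U_R\sqrt{\tau_R} \otimes \mathbb{1}_B)\,\mathcal{N}_{A\to B}(\Gamma_{RA})\,(\sqrt{\tau_R}\,U_R^\dagger \otimes \mathbb{1}_B)\right] \tag{11.D.21} \\
  &= U_R \tau_R U_R^\dagger, \tag{11.D.22}
\end{align}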

where the last equality follows because N is trace preserving and Tr 𝐴 [|Γ⟩⟨Γ| 𝑅 𝐴 ] =
1𝑅 . Using this, we find that

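\begin{align}
  &\left(\psi_R^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \mathcal{N}_{A\to B}(\psi_{RA}) \left(\psi_R^{\frac{1-\alpha}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \notag \\
  &\quad= \left(U_R \tau_R^{\frac{1-\alpha}{2\alpha}} U_R^\dagger \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) U_R\sqrt{\tau_R}\; \mathcal{N}_{A\to B}(\Gamma_{RA})\; \sqrt{\tau_R}\,U_R^\dagger \left(U_R \tau_R^{\frac{1-\alpha}{2\alpha}} U_R^\dagger \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \notag \\
  &\quad= \left(U_R \tau_R^{\frac{1}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \mathcal{N}_{A\to B}(\Gamma_{RA}) \left(\tau_R^{\frac{1}{2\alpha}} U_R^\dagger \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right). \tag{11.D.23}
\end{align}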

Therefore, by exploiting unitary invariance of the 𝛼-Schatten norm, we can write

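$\widetilde{I}_\alpha(\mathcal{N})$ as

\begin{align}
  \widetilde{I}_\alpha(\mathcal{N})
  &= \frac{\alpha}{\alpha-1} \sup_{\rho_R} \log_2 \inf_{\sigma_B} \left\| \left(\rho_R^{\frac{1}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \mathcal{N}_{A\to B}(\Gamma_{RA}) \left(\rho_R^{\frac{1}{2\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{2\alpha}}\right) \right\|_\alpha \tag{11.D.24} \\
  &= \frac{\alpha}{\alpha-1} \sup_{\rho_R} \log_2 \inf_{\sigma_B} \left\| \mathcal{N}_{A\to B}(\Gamma_{RA})^{\frac{1}{2}} \left(\rho_R^{\frac{1}{\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{\alpha}}\right) \mathcal{N}_{A\to B}(\Gamma_{RA})^{\frac{1}{2}} \right\|_\alpha. \tag{11.D.25}
\end{align}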

Now, the function

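\[
  (\rho_R, \sigma_B) \mapsto \left\| \mathcal{N}_{A\to B}(\Gamma_{RA})^{\frac{1}{2}} \left(\rho_R^{\frac{1}{\alpha}} \otimes \sigma_B^{\frac{1-\alpha}{\alpha}}\right) \mathcal{N}_{A\to B}(\Gamma_{RA})^{\frac{1}{2}} \right\|_\alpha \tag{11.D.26}
\]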

is concave in the first argument (this follows from Lemma 11.29 in Appendix 11.F
below) and convex in the second argument (this follows from the operator convexity
of $\sigma_B \mapsto \sigma_B^{\frac{1-\alpha}{\alpha}}$ for 𝛼 > 1 and convexity of the Schatten norm). Thus, by the Sion
minimax theorem (Theorem 2.24), we can exchange $\sup_{\rho_R}$ and $\inf_{\sigma_B}$. Also, we
define the completely positive map $\mathcal{S}^{(\alpha)}_{\sigma_B}$ by $\mathcal{S}^{(\alpha)}_{\sigma_B}(\cdot) := \sigma_B^{\frac{1-\alpha}{2\alpha}} (\cdot)\, \sigma_B^{\frac{1-\alpha}{2\alpha}}$. We can then
further rewrite $\widetilde{I}_\alpha(\mathcal{N})$ as

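\begin{align}
  \widetilde{I}_\alpha(\mathcal{N})
  &= \frac{\alpha}{\alpha-1} \inf_{\sigma_B} \log_2 \sup_{\rho_R} \left\| \left(\mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A\to B}\right)\!\left( \rho_R^{\frac{1}{2\alpha}} |\Gamma\rangle\!\langle\Gamma|_{RA}\, \rho_R^{\frac{1}{2\alpha}} \right) \right\|_\alpha \tag{11.D.27} \\
  &= \frac{\alpha}{\alpha-1} \inf_{\sigma_B} \log_2 \left\| \mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N} \right\|_{\mathrm{CB},1\to\alpha}, \tag{11.D.28}
\end{align}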

where, to arrive at the last line, we used the definition in (11.2.68). Also, consider
that the optimum in (11.2.68) is achieved when Tr[𝑌𝑅 ] = 1. Therefore,
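\[
  \widetilde{I}_\alpha(\mathcal{N}) = \frac{\alpha}{\alpha-1} \inf_{\sigma_B} \log_2 \left\| \mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N} \right\|_{\mathrm{CB},1\to\alpha}, \tag{11.D.29}
\]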

as required.

Appendix 11.E Alternate Expression for the 1 → 𝜶 CB Norm
In this section, we show that
\[
  \|\mathcal{M}\|_{\mathrm{CB},1\to\alpha} = \sup_{Y_{RA}>0} \frac{\|\mathcal{M}_{A\to B}(Y_{RA})\|_\alpha}{\|\operatorname{Tr}_A[Y_{RA}]\|_\alpha} \tag{11.E.1}
\]

for every completely positive map M. We start with the expression in (11.2.68)
and write it alternatively as follows:

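\begin{align}
  \|\mathcal{M}\|_{\mathrm{CB},1\to\alpha}
  &= \sup_{\substack{Y_R>0, \\ \operatorname{Tr}[Y_R]\leq 1}} \left\| \mathcal{M}_{A\to B}\!\left( Y_R^{\frac{1}{2\alpha}} |\Gamma\rangle\!\langle\Gamma|_{RA}\, Y_R^{\frac{1}{2\alpha}} \right) \right\|_\alpha \tag{11.E.2} \\
  &= \sup_{\substack{Y_R>0, \\ \|Y_R\|_\alpha\leq 1}} \left\| \mathcal{M}_{A\to B}\!\left( Y_R^{\frac{1}{2}} |\Gamma\rangle\!\langle\Gamma|_{RA}\, Y_R^{\frac{1}{2}} \right) \right\|_\alpha \tag{11.E.3} \\
  &= \sup_{Y_R>0} \frac{\left\| \mathcal{M}_{A\to B}\!\left( Y_R^{\frac{1}{2}} |\Gamma\rangle\!\langle\Gamma|_{RA}\, Y_R^{\frac{1}{2}} \right) \right\|_\alpha}{\|Y_R\|_\alpha}. \tag{11.E.4}
\end{align}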

Now, we use the fact that there is a one-to-one correspondence between the operators
𝑌𝑅 and the vectors
\[
  |\Gamma^Y\rangle_{RA} := (Y_R^{\frac{1}{2}} \otimes \mathbb{1}_A)|\Gamma\rangle_{RA}. \tag{11.E.5}
\]
This allows us to rewrite the optimization in (11.E.4) in terms of such vectors. Then,
by employing isometric invariance of the norms with respect to an isometry acting
on the reference system 𝑅, we can restrict the optimization to arbitrary vectors
|𝜓⟩ 𝑅 𝐴 . Therefore, we have that
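\[
  \|\mathcal{M}\|_{\mathrm{CB},1\to\alpha} = \sup_{\psi_{RA}} \frac{\|\mathcal{M}_{A\to B}(\psi_{RA})\|_\alpha}{\|\operatorname{Tr}_A[\psi_{RA}]\|_\alpha}, \tag{11.E.6}
\]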

where 𝜓 𝑅 𝐴 ≡ |𝜓⟩⟨𝜓| 𝑅 𝐴 . Since the optimization in (11.E.6) is over a subset of

the positive semi-definite operators 𝑌𝑅 𝐴 (and by approximation), we conclude the
inequality
\[
  \|\mathcal{M}\|_{\mathrm{CB},1\to\alpha} \leq \sup_{Y_{RA}>0} \frac{\|\mathcal{M}_{A\to B}(Y_{RA})\|_\alpha}{\|\operatorname{Tr}_A[Y_{RA}]\|_\alpha}. \tag{11.E.7}
\]

It remains to show the opposite inequality. Consider a vector |𝜙⟩𝑆𝑅 𝐴 that purifies
𝑌𝑅 𝐴 > 0, in the sense that Tr𝑆 [𝜙 𝑆𝑅 𝐴 ] = 𝑌𝑅 𝐴 . Then we have that
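\begin{align}
  \frac{\|\mathcal{M}_{A\to B}(Y_{RA})\|_\alpha}{\|\operatorname{Tr}_A[Y_{RA}]\|_\alpha}
  &= \frac{\|(\mathcal{M}_{A\to B} \otimes \operatorname{Tr}_S)(\phi_{SRA})\|_\alpha}{\|\operatorname{Tr}_{SA}[\phi_{SRA}]\|_\alpha} \tag{11.E.8} \\
  &\leq \sup_{|\phi\rangle_{SRA}} \frac{\|(\mathcal{M}_{A\to B} \otimes \operatorname{Tr}_S)(\phi_{SRA})\|_\alpha}{\|\operatorname{Tr}_{SA}[\phi_{SRA}]\|_\alpha} \tag{11.E.9} \\
  &= \|\mathcal{M} \otimes \operatorname{Tr}\|_{\mathrm{CB},1\to\alpha} \tag{11.E.10} \\
  &= \|\mathcal{M}\|_{\mathrm{CB},1\to\alpha}\, \|\operatorname{Tr}\|_{\mathrm{CB},1\to\alpha} \tag{11.E.11} \\
  &= \|\mathcal{M}\|_{\mathrm{CB},1\to\alpha}. \tag{11.E.12}
\end{align}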

The third-to-last equality follows from (11.E.6). The second-to-last equality follows
from (11.2.88), as shown in Appendix 11.F. The final equality follows because
∥Tr∥ CB,1→𝛼 = 1, as can be readily verified.

Appendix 11.F Proof of the Multiplicativity of the 1 → 𝜶 CB Norm
In this appendix, we prove the statement in (11.2.88), which is that

∥M1 ⊗ M2 ∥ CB,1→𝛼 = ∥M1 ∥ CB,1→𝛼 ∥M2 ∥ CB,1→𝛼 (11.F.1)

for every two completely positive maps M1 and M2 and all 𝛼 > 1. Recall from
(11.2.68) that
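\[
  \|\mathcal{M}\|_{\mathrm{CB},1\to\alpha} = \sup_{\substack{Y_R>0, \\ \operatorname{Tr}[Y_R]\leq 1}} \left\| Y_R^{\frac{1}{2\alpha}}\, \Gamma^{\mathcal{M}}_{RB}\, Y_R^{\frac{1}{2\alpha}} \right\|_\alpha, \tag{11.F.2}
\]
where $\Gamma^{\mathcal{M}}_{RB} := \mathcal{M}_{A\to B}(\Gamma_{RA})$ is the Choi representation of M, and the dimension of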
𝑅 is the same as the dimension of 𝐴. For a completely positive map P𝐶→𝐷 , let us
also define
\[
  \|\mathcal{P}\|_{\alpha\to\alpha} := \sup_{Z_C>0} \frac{\|\mathcal{P}_{C\to D}(Z_C)\|_\alpha}{\|Z_C\|_\alpha}. \tag{11.F.3}
\]
Note that
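\begin{align}
  \|\mathcal{P}\|_{\alpha\to\alpha}
  &= \sup_{Z_C>0} \frac{\|\mathcal{P}_{C\to D}(Z_C)\|_\alpha}{\|Z_C\|_\alpha} \tag{11.F.4} \\
  &= \sup_{\substack{Z_C>0, \\ \|Z_C\|_\alpha\leq 1}} \|\mathcal{P}_{C\to D}(Z_C)\|_\alpha \tag{11.F.5} \\
  &= \sup_{\substack{Y_C>0, \\ \operatorname{Tr}[Y_C]\leq 1}} \left\| \mathcal{P}_{C\to D}\!\left(Y_C^{\frac{1}{\alpha}}\right) \right\|_\alpha, \tag{11.F.6}
\end{align}

where the last equality follows from the substitution $Y_C = Z_C^\alpha$ so that $\operatorname{Tr}[Y_C] = \operatorname{Tr}[Z_C^\alpha] = \|Z_C\|_\alpha^\alpha$.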
Now, it immediately follows that

∥M1 ⊗ M2 ∥ CB,1→𝛼 ≥ ∥M1 ∥ CB,1→𝛼 ∥M2 ∥ CB,1→𝛼 . (11.F.7)

Indeed, due to the fact that the Choi representation of M1 ⊗ M2 has a tensor-product
form (see (4.2.17)), we can restrict the optimization in the definition of the norm
∥M1 ⊗ M2 ∥ CB,1→𝛼 to tensor-product operators 𝑌𝑅1 ⊗ 𝑌𝑅2 to obtain

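\begin{align}
  &\|\mathcal{M}_1 \otimes \mathcal{M}_2\|_{\mathrm{CB},1\to\alpha} \notag \\
  &\quad= \sup_{\substack{Y_{R_1R_2}>0, \\ \operatorname{Tr}[Y_{R_1R_2}]\leq 1}} \left\| Y_{R_1R_2}^{\frac{1}{2\alpha}} \left(\Gamma^{\mathcal{M}_1}_{R_1B_1} \otimes \Gamma^{\mathcal{M}_2}_{R_2B_2}\right) Y_{R_1R_2}^{\frac{1}{2\alpha}} \right\|_\alpha \tag{11.F.8} \\
  &\quad\geq \sup_{\substack{Y_{R_1}>0,\ Y_{R_2}>0, \\ \operatorname{Tr}[Y_{R_1}]\leq 1,\ \operatorname{Tr}[Y_{R_2}]\leq 1}} \left\| \left(Y_{R_1}^{\frac{1}{2\alpha}} \otimes Y_{R_2}^{\frac{1}{2\alpha}}\right) \left(\Gamma^{\mathcal{M}_1}_{R_1B_1} \otimes \Gamma^{\mathcal{M}_2}_{R_2B_2}\right) \left(Y_{R_1}^{\frac{1}{2\alpha}} \otimes Y_{R_2}^{\frac{1}{2\alpha}}\right) \right\|_\alpha \tag{11.F.9} \\
  &\quad= \sup_{\substack{Y_{R_1}>0,\ Y_{R_2}>0, \\ \operatorname{Tr}[Y_{R_1}]\leq 1,\ \operatorname{Tr}[Y_{R_2}]\leq 1}} \left\| Y_{R_1}^{\frac{1}{2\alpha}} \Gamma^{\mathcal{M}_1}_{R_1B_1} Y_{R_1}^{\frac{1}{2\alpha}} \otimes Y_{R_2}^{\frac{1}{2\alpha}} \Gamma^{\mathcal{M}_2}_{R_2B_2} Y_{R_2}^{\frac{1}{2\alpha}} \right\|_\alpha \tag{11.F.10} \\
  &\quad= \sup_{\substack{Y_{R_1}>0,\ Y_{R_2}>0, \\ \operatorname{Tr}[Y_{R_1}]\leq 1,\ \operatorname{Tr}[Y_{R_2}]\leq 1}} \left\| Y_{R_1}^{\frac{1}{2\alpha}} \Gamma^{\mathcal{M}_1}_{R_1B_1} Y_{R_1}^{\frac{1}{2\alpha}} \right\|_\alpha \left\| Y_{R_2}^{\frac{1}{2\alpha}} \Gamma^{\mathcal{M}_2}_{R_2B_2} Y_{R_2}^{\frac{1}{2\alpha}} \right\|_\alpha \tag{11.F.11} \\
  &\quad= \sup_{\substack{Y_{R_1}>0, \\ \operatorname{Tr}[Y_{R_1}]\leq 1}} \left\| Y_{R_1}^{\frac{1}{2\alpha}} \Gamma^{\mathcal{M}_1}_{R_1B_1} Y_{R_1}^{\frac{1}{2\alpha}} \right\|_\alpha \cdot \sup_{\substack{Y_{R_2}>0, \\ \operatorname{Tr}[Y_{R_2}]\leq 1}} \left\| Y_{R_2}^{\frac{1}{2\alpha}} \Gamma^{\mathcal{M}_2}_{R_2B_2} Y_{R_2}^{\frac{1}{2\alpha}} \right\|_\alpha \tag{11.F.12} \\
  &\quad= \|\mathcal{M}_1\|_{\mathrm{CB},1\to\alpha}\, \|\mathcal{M}_2\|_{\mathrm{CB},1\to\alpha}. \tag{11.F.13}
\end{align}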
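The step from (11.F.10) to (11.F.11) relies on the multiplicativity of the Schatten norm under tensor products. A minimal sketch of that fact for positive semi-definite operators specified by their eigenvalues is given below; the spectra and the helper names are hypothetical and chosen only for illustration.

```python
def schatten_norm(eigenvalues, alpha):
    """Schatten alpha-norm of a positive semi-definite operator, from its eigenvalues."""
    return sum(x ** alpha for x in eigenvalues) ** (1.0 / alpha)


def tensor_eigenvalues(eigs_a, eigs_b):
    """Eigenvalues of A (x) B are all pairwise products of the eigenvalues of A and B."""
    return [a * b for a in eigs_a for b in eigs_b]


if __name__ == "__main__":
    eigs_a = [0.5, 0.3, 0.2]  # hypothetical spectrum of the first operator
    eigs_b = [0.9, 0.1]       # hypothetical spectrum of the second operator
    alpha = 1.5
    lhs = schatten_norm(tensor_eigenvalues(eigs_a, eigs_b), alpha)
    rhs = schatten_norm(eigs_a, alpha) * schatten_norm(eigs_b, alpha)
    print(lhs, rhs)  # the two values agree up to floating-point error
```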

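Now we establish the opposite inequality. Let $\mathcal{U}^{\mathcal{M}}_{A\to BE}$ be a linear map that
extends $\mathcal{M}_{A\to B}$, in the sense that there is a linear operator $U^{\mathcal{M}}_{A\to BE}$ such that

\begin{align}
  \mathcal{U}^{\mathcal{M}}_{A\to BE}(Y_A) &= U^{\mathcal{M}}_{A\to BE}\, Y_A\, (U^{\mathcal{M}}_{A\to BE})^\dagger, \tag{11.F.14} \\
  \operatorname{Tr}_E[\mathcal{U}^{\mathcal{M}}_{A\to BE}(Y_A)] &= \mathcal{M}_{A\to B}(Y_A). \tag{11.F.15}
\end{align}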
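Due to the fact that $Y_R^{\frac{1}{2\alpha}}\, \mathcal{U}^{\mathcal{M}}_{A\to BE}(\Gamma_{RA})\, Y_R^{\frac{1}{2\alpha}}$ is a rank-one operator, and from an
application of a generalization of the Schmidt decomposition (Theorem 2.2), the
following operators have the same non-zero eigenvalues:
\[
  \operatorname{Tr}_E\!\left[ Y_R^{\frac{1}{2\alpha}}\, \mathcal{U}^{\mathcal{M}}_{A\to BE}(\Gamma_{RA})\, Y_R^{\frac{1}{2\alpha}} \right] = Y_R^{\frac{1}{2\alpha}}\, \mathcal{M}_{A\to B}(\Gamma_{RA})\, Y_R^{\frac{1}{2\alpha}} \tag{11.F.16}
\]
and
\begin{align}
  \operatorname{Tr}_{RB}\!\left[ Y_R^{\frac{1}{2\alpha}}\, \mathcal{U}^{\mathcal{M}}_{A\to BE}(\Gamma_{RA})\, Y_R^{\frac{1}{2\alpha}} \right]
  &= \operatorname{Tr}_R\!\left[ Y_R^{\frac{1}{2\alpha}}\, \mathcal{M}^c_{A\to E}(\Gamma_{RA})\, Y_R^{\frac{1}{2\alpha}} \right] \tag{11.F.17} \\
  &= \mathcal{M}^c_{A\to E}\!\left( (Y_A^{\mathsf{T}})^{\frac{1}{\alpha}} \right), \tag{11.F.18}
\end{align}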

where M𝑐𝐴→𝐸 = Tr 𝐵 ◦ UM 𝐴→𝐵𝐸 denotes the complementary map and the last equality
follows by applying the transpose trick in (2.2.40) and the fact that Tr 𝑅 [Γ𝑅 𝐴 ] = 1 𝐴 .

Then we find that

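\begin{align}
  \|\mathcal{M}\|_{\mathrm{CB},1\to\alpha}
  &= \sup_{\substack{Y_A>0, \\ \operatorname{Tr}[Y_A]\leq 1}} \left\| \mathcal{M}^c_{A\to E}\!\left( (Y_A^{\mathsf{T}})^{\frac{1}{\alpha}} \right) \right\|_\alpha \tag{11.F.19} \\
  &= \sup_{\substack{Y_A>0, \\ \operatorname{Tr}[Y_A]\leq 1}} \left\| \mathcal{M}^c_{A\to E}\!\left( Y_A^{\frac{1}{\alpha}} \right) \right\|_\alpha \tag{11.F.20} \\
  &= \sup_{Y_A>0} \frac{\left\| \mathcal{M}^c_{A\to E}(Y_A) \right\|_\alpha}{\|Y_A\|_\alpha} \tag{11.F.21} \\
  &= \|\mathcal{M}^c\|_{\alpha\to\alpha}, \tag{11.F.22}
\end{align}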

where the second equality follows from (11.F.4)–(11.F.6). So we have that

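\[
  \|\mathcal{M}_{A\to B}\|_{\mathrm{CB},1\to\alpha} = \left\| \mathcal{M}^c_{A\to E} \right\|_{\alpha\to\alpha}. \tag{11.F.23}
\]

Finally, for an arbitrary operator $Y_{A_1A_2}$ satisfying $Y_{A_1A_2} > 0$, and setting
$X_{A_1E_2} := (\mathcal{M}_2^c)_{A_2\to E_2}(Y_{A_1A_2})$, we can write

\begin{align}
  \left( (\mathcal{M}_1^c)_{A_1\to E_1} \otimes (\mathcal{M}_2^c)_{A_2\to E_2} \right)(Y_{A_1A_2})
  &= (\mathcal{M}_1^c)_{A_1\to E_1}\!\left( (\mathcal{M}_2^c)_{A_2\to E_2}(Y_{A_1A_2}) \right) \tag{11.F.24} \\
  &= (\mathcal{M}_1^c)_{A_1\to E_1}(X_{A_1E_2}). \tag{11.F.25}
\end{align}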

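Then, multiplying and dividing by $\|X_{A_1E_2}\|_\alpha = \|(\mathcal{M}_2^c)_{A_2\to E_2}(Y_{A_1A_2})\|_\alpha$ gives

\begin{align}
  &\frac{\left\| \left( (\mathcal{M}_1^c)_{A_1\to E_1} \otimes (\mathcal{M}_2^c)_{A_2\to E_2} \right)(Y_{A_1A_2}) \right\|_\alpha}{\|Y_{A_1A_2}\|_\alpha} \notag \\
  &\quad= \frac{\left\| (\mathcal{M}_1^c)_{A_1\to E_1}(X_{A_1E_2}) \right\|_\alpha}{\|X_{A_1E_2}\|_\alpha} \cdot \frac{\left\| (\mathcal{M}_2^c)_{A_2\to E_2}(Y_{A_1A_2}) \right\|_\alpha}{\|Y_{A_1A_2}\|_\alpha} \tag{11.F.26} \\
  &\quad\leq \left( \sup_{X_{A_1E_2}>0} \frac{\left\| (\mathcal{M}_1^c)_{A_1\to E_1}(X_{A_1E_2}) \right\|_\alpha}{\|X_{A_1E_2}\|_\alpha} \right) \times \left( \sup_{Y_{A_1A_2}>0} \frac{\left\| (\mathcal{M}_2^c)_{A_2\to E_2}(Y_{A_1A_2}) \right\|_\alpha}{\|Y_{A_1A_2}\|_\alpha} \right) \tag{11.F.27} \\
  &\quad= \left\| (\mathcal{M}_1^c)_{A_1\to E_1} \otimes \mathrm{id}_{E_2} \right\|_{\alpha\to\alpha} \left\| \mathrm{id}_{A_1} \otimes (\mathcal{M}_2^c)_{A_2\to E_2} \right\|_{\alpha\to\alpha} \tag{11.F.28} \\
  &\quad= \left\| (\mathcal{M}_1^c)_{A_1\to E_1} \right\|_{\alpha\to\alpha} \left\| (\mathcal{M}_2^c)_{A_2\to E_2} \right\|_{\alpha\to\alpha} \tag{11.F.29} \\
  &\quad= \|\mathcal{M}_1\|_{\mathrm{CB},1\to\alpha}\, \|\mathcal{M}_2\|_{\mathrm{CB},1\to\alpha}. \tag{11.F.30}
\end{align}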

The third equality follows from Lemma 11.28 below. The final equality holds by
(11.F.23). Since 𝑌 𝐴1 𝐴2 is arbitrary, we find that

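\begin{align}
  \|\mathcal{M}_1 \otimes \mathcal{M}_2\|_{\mathrm{CB},1\to\alpha}
  &= \left\| \mathcal{M}_1^c \otimes \mathcal{M}_2^c \right\|_{\alpha\to\alpha} \tag{11.F.31} \\
  &= \sup_{Y_{A_1A_2}>0} \frac{\left\| (\mathcal{M}_1^c \otimes \mathcal{M}_2^c)(Y_{A_1A_2}) \right\|_\alpha}{\|Y_{A_1A_2}\|_\alpha} \tag{11.F.32} \\
  &\leq \|\mathcal{M}_1\|_{\mathrm{CB},1\to\alpha}\, \|\mathcal{M}_2\|_{\mathrm{CB},1\to\alpha}. \tag{11.F.33}
\end{align}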

So we have that ∥M1 ⊗ M2 ∥ CB,1→𝛼 = ∥M1 ∥ CB,1→𝛼 ∥M2 ∥ CB,1→𝛼 .

Lemma 11.28
Let M be a completely positive map. Then, for id an arbitrary identity map, the
following equality holds,

∥id ⊗ M∥ 𝛼→𝛼 = ∥M∥ 𝛼→𝛼 . (11.F.34)

Proof: The inequality ∥id ⊗ M∥ 𝛼→𝛼 ≥ ∥M∥ 𝛼→𝛼 immediately follows by restricting
the optimization on the left-hand side of the inequality. So we now establish
the non-trivial inequality ∥id ⊗ M∥ 𝛼→𝛼 ≤ ∥M∥ 𝛼→𝛼 . Letting the identity map act
on a reference system 𝑅, consider from (11.F.6) that
\[
  \|\mathrm{id} \otimes \mathcal{M}\|_{\alpha\to\alpha} = \sup_{\substack{Y_{RA}>0, \\ \operatorname{Tr}[Y_{RA}]\leq 1}} \left\| \mathcal{M}_{A\to B}\!\left( Y_{RA}^{\frac{1}{\alpha}} \right) \right\|_\alpha. \tag{11.F.35}
\]

Let $\{V_R^i\}_{i=1}^{d_R^2}$ denote a set of Heisenberg–Weyl operators acting on the reference
system 𝑅 (see (3.2.48)), so that
\[
  \frac{1}{d_R^2} \sum_{i=1}^{d_R^2} V_R^i (\cdot) (V_R^i)^\dagger = \operatorname{Tr}[\cdot]\, \frac{\mathbb{1}}{d_R}. \tag{11.F.36}
\]
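The twirling identity (11.F.36) can be checked directly in the smallest case. The sketch below assumes a qubit reference system (so that the Heisenberg–Weyl operators are 1, X, Z, and XZ) and uses plain nested lists for 2 × 2 matrices; the helper names and the test operator are illustrative choices, not part of the text.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dagger(a):
    """Conjugate transpose of a matrix stored as nested lists."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]


IDENTITY = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
HEISENBERG_WEYL = [IDENTITY, X, Z, matmul(X, Z)]  # qubit case, d = 2


def twirl(op):
    """Left-hand side of (11.F.36) for d = 2: average of V op V^dagger over the four operators."""
    total = [[0j, 0j], [0j, 0j]]
    for v in HEISENBERG_WEYL:
        term = matmul(matmul(v, op), dagger(v))
        for i in range(2):
            for j in range(2):
                total[i][j] += term[i][j] / 4.0
    return total


if __name__ == "__main__":
    op = [[0.7, 0.2 + 0.1j], [0.2 - 0.1j, 0.3]]  # arbitrary test operator with trace 1
    print(twirl(op))  # expected to be close to Tr[op] * identity / 2
```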

Then, for an arbitrary 𝑌𝑅 𝐴 > 0 satisfying Tr[𝑌𝑅 𝐴 ] ≤ 1, we use the unitary invariance
of the Schatten norm to obtain
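\begin{align}
  \left\| \mathcal{M}_{A\to B}\!\left( Y_{RA}^{\frac{1}{\alpha}} \right) \right\|_\alpha
  &= \frac{1}{d_R^2} \sum_{i=1}^{d_R^2} \left\| V_R^i\, \mathcal{M}_{A\to B}\!\left( Y_{RA}^{\frac{1}{\alpha}} \right) (V_R^i)^\dagger \right\|_\alpha \tag{11.F.37} \\
  &= \frac{1}{d_R^2} \sum_{i=1}^{d_R^2} \left\| \mathcal{M}_{A\to B}\!\left( \left( V_R^i\, Y_{RA}\, (V_R^i)^\dagger \right)^{\frac{1}{\alpha}} \right) \right\|_\alpha \tag{11.F.38} \\
  &\leq \left\| \mathcal{M}_{A\to B}\!\left( \left[ \frac{1}{d_R^2} \sum_{i=1}^{d_R^2} V_R^i\, Y_{RA}\, (V_R^i)^\dagger \right]^{\frac{1}{\alpha}} \right) \right\|_\alpha \tag{11.F.39} \\
  &= \left\| \mathcal{M}_{A\to B}\!\left( [\pi_R \otimes Y_A]^{\frac{1}{\alpha}} \right) \right\|_\alpha, \tag{11.F.40}
\end{align}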
where the inequality follows from Lemma 11.29 below, which states that the
function $X \mapsto \left\| \mathcal{M}(X^{\frac{1}{\alpha}}) \right\|_\alpha$ is concave for all 𝛼 > 1. The last equality follows from
(3.2.98), with $\pi_R = \frac{\mathbb{1}_R}{|R|}$ the maximally mixed state and $Y_A = \operatorname{Tr}_R[Y_{RA}]$. Continuing,
we find that

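\begin{align}
  \left\| \mathcal{M}_{A\to B}\!\left( [\pi_R \otimes Y_A]^{\frac{1}{\alpha}} \right) \right\|_\alpha
  &= \left\| \mathcal{M}_{A\to B}\!\left( \pi_R^{\frac{1}{\alpha}} \otimes Y_A^{\frac{1}{\alpha}} \right) \right\|_\alpha \tag{11.F.41} \\
  &= \left\| \pi_R^{\frac{1}{\alpha}} \otimes \mathcal{M}_{A\to B}\!\left( Y_A^{\frac{1}{\alpha}} \right) \right\|_\alpha \tag{11.F.42} \\
  &= \left\| \pi_R^{\frac{1}{\alpha}} \right\|_\alpha \left\| \mathcal{M}_{A\to B}\!\left( Y_A^{\frac{1}{\alpha}} \right) \right\|_\alpha \tag{11.F.43} \\
  &= \left\| \mathcal{M}_{A\to B}\!\left( Y_A^{\frac{1}{\alpha}} \right) \right\|_\alpha \tag{11.F.44} \\
  &\leq \sup_{\substack{Y_A>0, \\ \operatorname{Tr}[Y_A]\leq 1}} \left\| \mathcal{M}_{A\to B}\!\left( Y_A^{\frac{1}{\alpha}} \right) \right\|_\alpha \tag{11.F.45} \\
  &= \|\mathcal{M}\|_{\alpha\to\alpha}. \tag{11.F.46}
\end{align}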
Since the inequality holds for arbitrary 𝑌𝑅 𝐴 > 0 satisfying Tr[𝑌𝑅 𝐴 ] ≤ 1, we find
that
∥id ⊗ M∥ 𝛼→𝛼 ≤ ∥M∥ 𝛼→𝛼 , (11.F.47)
concluding the proof. ■

Lemma 11.29
Let 𝑋 be a positive semi-definite operator, and let M be a completely positive
map. For 𝛼 > 1, the following function is concave:
\[
  X \mapsto \left\| \mathcal{M}\!\left( X^{\frac{1}{\alpha}} \right) \right\|_\alpha. \tag{11.F.48}
\]

Proof: Since M is completely positive, it has a Kraus representation as

\[
  \mathcal{M}(Z) = \sum_i M_i Z M_i^\dagger. \tag{11.F.49}
\]

From Proposition 2.8, consider that

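\begin{align}
  \left\| \mathcal{M}\!\left( X^{\frac{1}{\alpha}} \right) \right\|_\alpha
  &= \sup_{\substack{Y>0, \\ \|Y\|_{\frac{\alpha}{\alpha-1}}\leq 1}} \operatorname{Tr}\!\left[ \mathcal{M}\!\left( X^{\frac{1}{\alpha}} \right) Y \right] \tag{11.F.50} \\
  &= \sup_{\substack{Y>0, \\ \operatorname{Tr}[Y]\leq 1}} \operatorname{Tr}\!\left[ \mathcal{M}\!\left( X^{\frac{1}{\alpha}} \right) Y^{\frac{\alpha-1}{\alpha}} \right] \tag{11.F.51} \\
  &= \sup_{\substack{Y>0, \\ \operatorname{Tr}[Y]\leq 1}} \sum_i \operatorname{Tr}\!\left[ M_i X^{\frac{1}{\alpha}} M_i^\dagger\, Y^{\frac{\alpha-1}{\alpha}} \right]. \tag{11.F.52}
\end{align}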

The Lieb concavity theorem (see Theorem 11.30 below) is the statement that the
following function is jointly concave with respect to positive semi-definite 𝑅 and 𝑆
for arbitrary 𝑡 ∈ (0, 1) and an arbitrary operator 𝐾:

(𝑅, 𝑆) ↦→ Tr[𝐾 𝑅𝑡 𝐾 † 𝑆 1−𝑡 ]. (11.F.53)

Let 𝑋0 , 𝑋1 ≥ 0 and let 𝑌0 , 𝑌1 > 0 be such that Tr[𝑌0 ], Tr[𝑌1 ] ≤ 1. Then for
𝜆 ∈ (0, 1], and defining

𝑋𝜆 B 𝜆𝑋0 + (1 − 𝜆) 𝑋1 , 𝑌𝜆 B 𝜆𝑌0 + (1 − 𝜆) 𝑌1 , (11.F.54)

we find that

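\begin{align}
  &\lambda \operatorname{Tr}\!\left[ \mathcal{M}\!\left( X_0^{\frac{1}{\alpha}} \right) Y_0^{\frac{\alpha-1}{\alpha}} \right] + (1-\lambda) \operatorname{Tr}\!\left[ \mathcal{M}\!\left( X_1^{\frac{1}{\alpha}} \right) Y_1^{\frac{\alpha-1}{\alpha}} \right] \notag \\
  &\quad= \lambda \sum_i \operatorname{Tr}\!\left[ M_i X_0^{\frac{1}{\alpha}} M_i^\dagger\, Y_0^{\frac{\alpha-1}{\alpha}} \right] + (1-\lambda) \sum_i \operatorname{Tr}\!\left[ M_i X_1^{\frac{1}{\alpha}} M_i^\dagger\, Y_1^{\frac{\alpha-1}{\alpha}} \right] \tag{11.F.55} \\
  &\quad= \sum_i \left( \lambda \operatorname{Tr}\!\left[ M_i X_0^{\frac{1}{\alpha}} M_i^\dagger\, Y_0^{\frac{\alpha-1}{\alpha}} \right] + (1-\lambda) \operatorname{Tr}\!\left[ M_i X_1^{\frac{1}{\alpha}} M_i^\dagger\, Y_1^{\frac{\alpha-1}{\alpha}} \right] \right) \tag{11.F.56} \\
  &\quad\leq \sum_i \operatorname{Tr}\!\left[ M_i X_\lambda^{\frac{1}{\alpha}} M_i^\dagger\, Y_\lambda^{\frac{\alpha-1}{\alpha}} \right] \tag{11.F.57} \\
  &\quad= \operatorname{Tr}\!\left[ \mathcal{M}\!\left( X_\lambda^{\frac{1}{\alpha}} \right) Y_\lambda^{\frac{\alpha-1}{\alpha}} \right] \tag{11.F.58} \\
  &\quad\leq \sup_{\substack{Y_A>0, \\ \operatorname{Tr}[Y_A]\leq 1}} \operatorname{Tr}\!\left[ \mathcal{M}\!\left( X_\lambda^{\frac{1}{\alpha}} \right) Y_A^{\frac{\alpha-1}{\alpha}} \right] = \left\| \mathcal{M}\!\left( X_\lambda^{\frac{1}{\alpha}} \right) \right\|_\alpha, \tag{11.F.59}
\end{align}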

where the first inequality follows from an application of the Lieb concavity theorem,
and the second inequality follows from applying (11.F.52) and because 𝑌𝜆 is a
particular operator satisfying 𝑌𝜆 > 0 and Tr[𝑌𝜆 ] ≤ 1. Since the chain of inequalities
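holds for arbitrary 𝑌0 , 𝑌1 > 0 such that Tr[𝑌0 ], Tr[𝑌1 ] ≤ 1, we conclude that
\[
  \lambda \left\| \mathcal{M}\!\left( X_0^{\frac{1}{\alpha}} \right) \right\|_\alpha + (1-\lambda) \left\| \mathcal{M}\!\left( X_1^{\frac{1}{\alpha}} \right) \right\|_\alpha \leq \left\| \mathcal{M}\!\left( X_\lambda^{\frac{1}{\alpha}} \right) \right\|_\alpha, \tag{11.F.60}
\]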

which concludes the proof. ■
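A quick numerical sanity check of the concavity statement in Lemma 11.29 can be done in a classical special case. The sketch below assumes diagonal inputs and a completely positive map that acts on the diagonal entries through a matrix of nonnegative weights; this restriction, the weights, and the function names are all illustrative assumptions rather than part of the proof.

```python
def norm_after_map(x, weights, alpha):
    """|| M(X^{1/alpha}) ||_alpha for X = diag(x), where M sends diag(v) to
    diag(y) with y_i = sum_j weights[i][j] * v_j and all weights nonnegative."""
    powered = [v ** (1.0 / alpha) for v in x]
    out = [sum(w * p for w, p in zip(row, powered)) for row in weights]
    return sum(y ** alpha for y in out) ** (1.0 / alpha)


def mix(x0, x1, lam):
    """Convex combination lam * x0 + (1 - lam) * x1 of two diagonals."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x0, x1)]


if __name__ == "__main__":
    weights = [[0.5, 0.2], [0.1, 0.9]]  # hypothetical classical CP map
    x0, x1, lam, alpha = [0.2, 1.0], [0.9, 0.1], 0.3, 2.0
    lhs = lam * norm_after_map(x0, weights, alpha) \
        + (1.0 - lam) * norm_after_map(x1, weights, alpha)
    rhs = norm_after_map(mix(x0, x1, lam), weights, alpha)
    print(lhs, rhs, lhs <= rhs)  # concavity predicts lhs <= rhs, as in (11.F.60)
```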

Theorem 11.30 Lieb Concavity

The following function is jointly concave with respect to positive semi-definite
operators 𝑅 and 𝑆 for arbitrary 𝑡 ∈ (0, 1) and an arbitrary operator 𝐾:

(𝑅, 𝑆) ↦→ Tr[𝐾 𝑅𝑡 𝐾 † 𝑆 1−𝑡 ]. (11.F.61)

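Proof: We begin by restricting the first argument of the function in (11.F.61) to
positive definite operators. Defining $|K\rangle_{RA} = K_A^\dagger |\Gamma\rangle_{RA}$, consider that

\begin{align}
  \operatorname{Tr}[K R^t K^\dagger S^{1-t}]
  &= \langle\Gamma|_{RA} \left( \mathbb{1}_R \otimes \left( K R^t K^\dagger S^{1-t} \right)_A \right) |\Gamma\rangle_{RA} \tag{11.F.62} \\
  &= \langle\Gamma|_{RA} \left( (S_R^{\mathsf{T}})^{1-t} \otimes K_A R_A^t K_A^\dagger \right) |\Gamma\rangle_{RA} \tag{11.F.63} \\
  &= \langle\Gamma|_{RA}\, K_A \left( (S_R^{\mathsf{T}})^{1-t} \otimes R_A^t \right) K_A^\dagger\, |\Gamma\rangle_{RA} \tag{11.F.64} \\
  &= \langle K|_{RA}\, R_A^{\frac{1}{2}} \left( (S_R^{\mathsf{T}})^{1-t} \otimes R_A^{t-1} \right) R_A^{\frac{1}{2}}\, |K\rangle_{RA} \tag{11.F.65} \\
  &= \langle K|_{RA}\, R_A^{\frac{1}{2}} \left( S_R^{\mathsf{T}} \otimes R_A^{-1} \right)^{1-t} R_A^{\frac{1}{2}}\, |K\rangle_{RA} \tag{11.F.66} \\
  &= \langle K|_{RA}\, R_A^{\frac{1}{2}}\, g\!\left( S_R^{\mathsf{T}} \otimes R_A^{-1} \right) R_A^{\frac{1}{2}}\, |K\rangle_{RA}, \tag{11.F.67}
\end{align}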

where the fourth equality holds by the positive definiteness of 𝑅, and where
𝑔(𝑥) B 𝑥 1−𝑡 is an operator concave function. For 𝜆 ∈ [0, 1], let

𝑅𝜆 B 𝜆𝑅0 + (1 − 𝜆) 𝑅1 , 𝑆𝜆 B 𝜆𝑆0 + (1 − 𝜆) 𝑆1 , (11.F.68)

where 𝑅0 and 𝑅1 are positive definite and 𝑆0 and 𝑆1 are positive semi-definite. Also,
let

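\begin{align}
  G_0 &:= \mathbb{1} \otimes \sqrt{\lambda R_0}\, (R_\lambda)^{-\frac{1}{2}}, \tag{11.F.69} \\
  G_1 &:= \mathbb{1} \otimes \sqrt{(1-\lambda) R_1}\, (R_\lambda)^{-\frac{1}{2}}. \tag{11.F.70}
\end{align}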

Then

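\begin{align}
  G_0^\dagger G_0 + G_1^\dagger G_1
  &= \mathbb{1} \otimes (R_\lambda)^{-\frac{1}{2}}\, \lambda R_0\, (R_\lambda)^{-\frac{1}{2}} + \mathbb{1} \otimes (R_\lambda)^{-\frac{1}{2}}\, (1-\lambda) R_1\, (R_\lambda)^{-\frac{1}{2}} \tag{11.F.71} \\
  &= \mathbb{1} \otimes (R_\lambda)^{-\frac{1}{2}}\, R_\lambda\, (R_\lambda)^{-\frac{1}{2}} \tag{11.F.72} \\
  &= \mathbb{1} \otimes \mathbb{1}. \tag{11.F.73}
\end{align}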

A variation of the operator Jensen inequality (Theorem 2.16) is that the following
inequality holds for an operator concave function 𝑓 , a finite set {𝑋𝑖 }𝑖 of Hermitian
operators, and a finite set {𝐴𝑖 }𝑖 of operators satisfying $\sum_i A_i^\dagger A_i = \mathbb{1}$:

\[
  \sum_i A_i^\dagger f(X_i) A_i \leq f\!\left( \sum_i A_i^\dagger X_i A_i \right). \tag{11.F.74}
\]

Then from the operator Jensen inequality and (11.F.67), we conclude that

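\begin{align}
  &\lambda \operatorname{Tr}[K R_0^t K^\dagger S_0^{1-t}] + (1-\lambda) \operatorname{Tr}[K R_1^t K^\dagger S_1^{1-t}] \notag \\
  &\quad= \lambda \langle K|_{RA}\, (R_0)_A^{\frac{1}{2}}\, g\!\left( S_0^{\mathsf{T}} \otimes R_0^{-1} \right) (R_0)_A^{\frac{1}{2}}\, |K\rangle_{RA} \notag \\
  &\qquad + (1-\lambda) \langle K|_{RA}\, (R_1)_A^{\frac{1}{2}}\, g\!\left( S_1^{\mathsf{T}} \otimes R_1^{-1} \right) (R_1)_A^{\frac{1}{2}}\, |K\rangle_{RA} \tag{11.F.75} \\
  &\quad= \lambda \langle K|_{RA}\, (R_0)_A^{\frac{1}{2}}\, g\!\left( \lambda S_0^{\mathsf{T}} \otimes (\lambda R_0)^{-1} \right) (R_0)_A^{\frac{1}{2}}\, |K\rangle_{RA} \notag \\
  &\qquad + (1-\lambda) \langle K|_{RA}\, (R_1)_A^{\frac{1}{2}}\, g\!\left( (1-\lambda) S_1^{\mathsf{T}} \otimes ((1-\lambda) R_1)^{-1} \right) (R_1)_A^{\frac{1}{2}}\, |K\rangle_{RA} \tag{11.F.76} \\
  &\quad= \langle K|_{RA}\, (R_\lambda)_A^{\frac{1}{2}}\, G_0^\dagger\, g\!\left( \lambda S_0^{\mathsf{T}} \otimes (\lambda R_0)^{-1} \right) G_0\, (R_\lambda)_A^{\frac{1}{2}}\, |K\rangle_{RA} \notag \\
  &\qquad + \langle K|_{RA}\, (R_\lambda)_A^{\frac{1}{2}}\, G_1^\dagger\, g\!\left( (1-\lambda) S_1^{\mathsf{T}} \otimes ((1-\lambda) R_1)^{-1} \right) G_1\, (R_\lambda)_A^{\frac{1}{2}}\, |K\rangle_{RA} \tag{11.F.77} \\
  &\quad\leq \langle K|_{RA}\, (R_\lambda)_A^{\frac{1}{2}}\, g(L)\, (R_\lambda)_A^{\frac{1}{2}}\, |K\rangle_{RA}, \tag{11.F.78}
\end{align}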
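where the third equality follows because $\mathbb{1}_R \otimes \sqrt{\lambda}\, R_0^{\frac{1}{2}} = \left( \mathbb{1}_R \otimes (R_\lambda)^{\frac{1}{2}} \right) G_0^\dagger$. In the last
line, we have let

\[
  L := G_0^\dagger \left( \lambda S_0^{\mathsf{T}} \otimes (\lambda R_0)^{-1} \right) G_0 + G_1^\dagger \left( (1-\lambda) S_1^{\mathsf{T}} \otimes ((1-\lambda) R_1)^{-1} \right) G_1. \tag{11.F.79}
\]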

Consider that

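\begin{align}
  L &= G_0^\dagger \left( \lambda S_0^{\mathsf{T}} \otimes (\lambda R_0)^{-1} \right) G_0 + G_1^\dagger \left( (1-\lambda) S_1^{\mathsf{T}} \otimes ((1-\lambda) R_1)^{-1} \right) G_1 \notag \\
  &= \left( \mathbb{1} \otimes (R_\lambda)^{-\frac{1}{2}} \sqrt{\lambda R_0} \right) \left( \lambda S_0^{\mathsf{T}} \otimes (\lambda R_0)^{-1} \right) \left( \mathbb{1} \otimes \sqrt{\lambda R_0}\, (R_\lambda)^{-\frac{1}{2}} \right) \notag \\
  &\quad + \left( \mathbb{1} \otimes (R_\lambda)^{-\frac{1}{2}} \sqrt{(1-\lambda) R_1} \right) \left( (1-\lambda) S_1^{\mathsf{T}} \otimes ((1-\lambda) R_1)^{-1} \right) \left( \mathbb{1} \otimes \sqrt{(1-\lambda) R_1}\, (R_\lambda)^{-\frac{1}{2}} \right) \tag{11.F.80} \\
  &= \lambda S_0^{\mathsf{T}} \otimes (R_\lambda)^{-1} + (1-\lambda) S_1^{\mathsf{T}} \otimes (R_\lambda)^{-1} \tag{11.F.81} \\
  &= S_\lambda^{\mathsf{T}} \otimes (R_\lambda)^{-1}. \tag{11.F.82}
\end{align}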

Continuing, we find that

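\begin{align}
  &\lambda \operatorname{Tr}[K R_0^t K^\dagger S_0^{1-t}] + (1-\lambda) \operatorname{Tr}[K R_1^t K^\dagger S_1^{1-t}] \notag \\
  &\quad\leq \langle K|_{RA}\, (R_\lambda)_A^{\frac{1}{2}}\, g(L)\, (R_\lambda)_A^{\frac{1}{2}}\, |K\rangle_{RA} \tag{11.F.83} \\
  &\quad= \langle K|_{RA}\, (R_\lambda)_A^{\frac{1}{2}}\, g\!\left( S_\lambda^{\mathsf{T}} \otimes (R_\lambda)^{-1} \right) (R_\lambda)_A^{\frac{1}{2}}\, |K\rangle_{RA} \tag{11.F.84} \\
  &\quad= \operatorname{Tr}[K R_\lambda^t K^\dagger S_\lambda^{1-t}]. \tag{11.F.85}
\end{align}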

So the function (𝑅, 𝑆) ↦→ Tr[𝐾 𝑅𝑡 𝐾 † 𝑆 1−𝑡 ] is jointly concave when the first argument
is restricted to be a positive definite operator. The more general case of positive
semi-definite operators in the first argument can be established by adding 𝜀 1 to any
positive semi-definite operator to ensure that it is positive definite, applying the
above inequality, and then taking the limit 𝜀 → 0 at the end. This concludes the
proof. ■
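In the one-dimensional case, the function in (11.F.61) reduces to a weighted geometric mean, and its joint concavity can be probed numerically. The sketch below covers only this scalar special case; the sample points and the function names are hypothetical and serve as a sanity check, not as a substitute for the proof.

```python
def lieb_scalar(r, s, t, k=1.0):
    """Tr[K R^t K^dagger S^(1-t)] for 1x1 operators: |k|^2 * r^t * s^(1-t)."""
    return abs(k) ** 2 * r ** t * s ** (1.0 - t)


def concavity_gap(p0, p1, lam, t):
    """Value at the mixture minus the mixture of values; nonnegative under joint concavity."""
    (r0, s0), (r1, s1) = p0, p1
    r_mix = lam * r0 + (1.0 - lam) * r1
    s_mix = lam * s0 + (1.0 - lam) * s1
    mixed_values = lam * lieb_scalar(r0, s0, t) + (1.0 - lam) * lieb_scalar(r1, s1, t)
    return lieb_scalar(r_mix, s_mix, t) - mixed_values


if __name__ == "__main__":
    # Hypothetical sample points (r, s) with r > 0 and s >= 0
    print(concavity_gap((1.0, 4.0), (9.0, 0.5), lam=0.3, t=0.25))
```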

Appendix 11.G The Strong Converse from a Different Point of View

Here we show that the mutual information 𝐼 (N) is a strong converse rate based on
the alternate definition given in Appendix A. According to that definition, a
rate 𝑅 ∈ R+ is a strong converse rate for entanglement-assisted classical communication
over a channel N if for every sequence {(𝑛, |M𝑛 |, 𝜀 𝑛 )}𝑛∈N of (𝑛, |M|, 𝜀)
entanglement-assisted classical communication protocols over 𝑛 uses of N, we have
that $\liminf_{n\to\infty} \frac{1}{n}\log_2|\mathcal{M}_n| > R \Rightarrow \lim_{n\to\infty}\varepsilon_n = 1$.
Let us show that the mutual information 𝐼 (N) of the channel N is a strong
converse rate under this alternate definition. Let {(𝑛, |M𝑛 | , 𝜀 𝑛 )}𝑛∈N be a sequence
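of protocols satisfying $\liminf_{n\to\infty} \frac{1}{n}\log_2|\mathcal{M}_n| > I(\mathcal{N})$. Due to this strict inequality,
the fact that $\lim_{\alpha\to 1} \widetilde{I}_\alpha(\mathcal{N}) = I(\mathcal{N})$, and since the sandwiched Rényi mutual information
$\widetilde{I}_\alpha(\mathcal{N})$ is monotonically increasing in 𝛼 (this follows from Proposition 7.31),
there exists a value 𝛼∗ > 1 such that
\[
  \liminf_{n\to\infty} \frac{1}{n}\log_2|\mathcal{M}_n| > \widetilde{I}_{\alpha^*}(\mathcal{N}). \tag{11.G.1}
\]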
Now recall the following bound from (11.2.92), which holds for all 𝛼 > 1 and for
every (𝑛, |M| , 𝜀) protocol:

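\[
  \frac{1}{n}\log_2|\mathcal{M}| \leq \widetilde{I}_\alpha(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{11.G.2}
\]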
We can apply it in our case to conclude that
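\[
  \frac{1}{n}\log_2|\mathcal{M}_n| \leq \widetilde{I}_{\alpha^*}(\mathcal{N}) + \frac{\alpha^*}{n(\alpha^*-1)}\log_2\!\left(\frac{1}{1-\varepsilon_n}\right). \tag{11.G.3}
\]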
Now suppose that
\[
  \liminf_{n\to\infty} \varepsilon_n = c \in [0, 1). \tag{11.G.4}
\]
Then it follows that
\begin{align}
  \liminf_{n\to\infty} \frac{1}{n}\log_2|\mathcal{M}_n|
  &\leq \liminf_{n\to\infty} \left( \widetilde{I}_{\alpha^*}(\mathcal{N}) + \frac{\alpha^*}{n(\alpha^*-1)}\log_2\!\left(\frac{1}{1-\varepsilon_n}\right) \right) \tag{11.G.5} \\
  &= \widetilde{I}_{\alpha^*}(\mathcal{N}) + \liminf_{n\to\infty} \frac{\alpha^*}{n(\alpha^*-1)}\log_2\!\left(\frac{1}{1-\varepsilon_n}\right) \tag{11.G.6} \\
  &= \widetilde{I}_{\alpha^*}(\mathcal{N}), \tag{11.G.7}
\end{align}
where the last equality follows because 𝛼∗ > 1 is a constant and the sequence
{𝜀 𝑛 }𝑛∈N converges to a constant 𝑐 ∈ [0, 1). However, this contradicts (11.G.1).
Thus, (11.G.4) cannot hold, and so we conclude that lim inf 𝑛→∞ 𝜀 𝑛 = 1.
The argument given above makes no statement about how fast the error probabil-
ity converges to one in the large 𝑛 limit. If we fix the rate 𝑅 of communication to be a
constant satisfying 𝑅 > 𝐼 (N), then we can argue that the error probability converges
exponentially fast to one. To this end, consider a sequence {(𝑛, 2𝑛𝑅 , 𝜀 𝑛 )}𝑛∈N of
(𝑛, |M|, 𝜀) protocols, with each element of the sequence having an arbitrary (but
fixed) rate 𝑅 > 𝐼 (N). For each element of the sequence, the inequality in (11.2.92)
holds, which means that

\[
  R \leq \widetilde{I}_\alpha(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon_n}\right) \tag{11.G.8}
\]
for all 𝛼 > 1. Rearranging this inequality leads to the following lower bound on the
error probabilities 𝜀 𝑛 :
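\[
  \varepsilon_n \geq 1 - 2^{-n\left(\frac{\alpha-1}{\alpha}\right)\left(R - \widetilde{I}_\alpha(\mathcal{N})\right)} \tag{11.G.9}
\]
for all 𝛼 > 1. Now, since 𝑅 > 𝐼 (N), $\lim_{\alpha\to 1} \widetilde{I}_\alpha(\mathcal{N}) = I(\mathcal{N})$, and since the
sandwiched Rényi mutual information $\widetilde{I}_\alpha(\mathcal{N})$ is monotonically increasing in 𝛼
(this follows from Proposition 7.31), there exists an 𝛼∗ > 1 such that $R > \widetilde{I}_{\alpha^*}(\mathcal{N})$.
Applying the inequality in (11.G.9) to this value of 𝛼, we find that
\[
  \varepsilon_n \geq 1 - 2^{-n\left(\frac{\alpha^*-1}{\alpha^*}\right)\left(R - \widetilde{I}_{\alpha^*}(\mathcal{N})\right)}. \tag{11.G.10}
\]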
Then, taking the limit 𝑛 → ∞ on both sides of this inequality, we conclude that
lim𝑛→∞ 𝜀 𝑛 = 1 and the convergence to one is exponentially fast.
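To illustrate how quickly the lower bound in (11.G.10) approaches one, the following sketch evaluates its right-hand side for a few block lengths. The inputs (a sandwiched Rényi value of 1.1 bits at 𝛼∗ = 1.5 and a rate 𝑅 = 1.3) are hypothetical placeholders, not values computed for any specific channel.

```python
def error_lower_bound(n, alpha, sandwiched_info, rate):
    """Right-hand side of (11.G.9): 1 - 2^{-n ((alpha - 1) / alpha) (R - I_alpha)}."""
    exponent = n * ((alpha - 1.0) / alpha) * (rate - sandwiched_info)
    return 1.0 - 2.0 ** (-exponent)


if __name__ == "__main__":
    # Hypothetical inputs: sandwiched Renyi value 1.1 at alpha* = 1.5, rate R = 1.3
    for n in (10, 100, 1000):
        print(n, error_lower_bound(n, 1.5, 1.1, 1.3))
```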
From the arguments above, we find not only that 𝐼 (N) is a strong converse
rate according to the alternate definition provided in Appendix A, but also that
the maximal error probability of every sequence of (𝑛, |M|, 𝜀) entanglement-
assisted classical communication protocols with fixed rate strictly above the mutual
information 𝐼 (N) approaches one at an exponential rate.
In Appendix 11.C, we showed that the error probability vanishes in the limit 𝑛 → ∞
for every fixed rate 𝑅 < 𝐼 (N). We thus see that, as 𝑛 → ∞, the mutual information
𝐼 (N) is a sharp dividing point between reliable, error-free communication and
communication with error probability approaching one exponentially fast. This
situation is depicted in Figure 11.8.
[Plot: error probability 𝜀𝑛 (vertical axis) versus rate 𝑅𝑛 (horizontal axis) in the limit 𝑛 → ∞, with 𝐼(N) marked on the rate axis.]

Figure 11.8: The error probability 𝜀 𝑛 as a function of the rate 𝑅𝑛 for

entanglement-assisted classical communication over a quantum channel N.
As 𝑛 → ∞, for every rate below the mutual information 𝐼 (N), there exists a
sequence of protocols with error probability converging to zero. For every rate
above the mutual information 𝐼 (N), the error probability converges to one for
all possible protocols.

Chapter 12

Classical Communication
We now move on to classical communication over quantum channels. Unlike
the previous chapter, here we suppose that Alice and Bob do not have access
to shared entanglement prior to communication. Thus, the scenario considered
in this chapter is more practical than the entanglement-assisted setting—in the
previous chapter, we made the simplifying assumption that shared entanglement
is available for free to the sender and receiver. However, without widespread
entanglement-sharing networks available, this assumption is not really practical,
and so the entanglement-assisted capacity is mostly of academic interest at the
moment.
Without shared entanglement available to the sender and receiver, is it still
advantageous to use a quantum strategy to send classical information over a quantum
channel? At first glance, it may seem that, without prior shared entanglement, there
might not be any point in using a quantum strategy to send classical information
over a quantum channel. However, when using a channel multiple times, there is
still the possibility of encoding a message into a state at the encoder that is entangled
across multiple channel uses and then performing a collective measurement at the
decoder. For many examples of channels, it is known that collective measurements
can enhance communication capacity, and it is known that in principle, there exists
a channel for which entangled states at the encoder provide a further enhancement
to communication capacity.
Although it may seem that determining the maximum amount of classical
information that can be communicated using a given quantum channel, i.e.,
determining the classical capacity of a quantum channel, might be easier than its
entanglement-assisted counterpart, this problem turns out to be one of the most
challenging problems in quantum Shannon theory. This is due to the fact that, as
we discuss in this chapter, the relevant quantity in calculating the classical capacity
of a quantum channel N is related to its Holevo information 𝜒(N), and this quantity
is not known to be additive for all quantum channels. This means that the best we
can say for a given channel is that its Holevo information is an achievable rate for
classical communication—we cannot necessarily say that it is the highest possible
achievable rate. This is in stark contrast to the case of entanglement-assisted
classical communication, for which we know that the mutual information 𝐼 (N) of a
channel is additive for all channels and thus is equal to the entanglement-assisted
classical capacity of any quantum channel.
We start in the next brief section by considering some simple motivating
examples of communication. Then we consider the one-shot setting. This setting
for classical communication is similar to the one-shot setting of entanglement-
assisted classical communication from the previous chapter. The only difference
is that, in this case, there is no entanglement assistance. We then move on to
the asymptotic setting, for which we prove Theorem 12.13, which states that
the classical capacity of a quantum channel is equal to the regularized Holevo
information of the channel. This is a quantity that in general requires computing
the Holevo information for an arbitrarily large number of uses of the given channel,
and it is therefore intractable unless the Holevo information happens to be additive
for the channel. From here, we consider various classes of channels for which the
Holevo information is additive or for which we can establish the strong converse.
We also consider various methods for bounding the classical capacity from above.
Finally, we calculate the classical capacity for some examples of channels.

Simple Example of Classical Communication Over a Quantum Channel

At the beginning of the previous chapter, we stated that super-dense coding is a

simple example of an entanglement-assisted classical communication protocol over
a noiseless quantum channel. The essence of that protocol is the encoding of 𝑑²
messages into 𝑑² mutually orthogonal pure states (the maximally entangled states
defined in (3.2.58)). The 𝑑² messages contain log2 𝑑² = 2 log2 𝑑 bits of classical
information, which can be communicated without error from Alice to Bob, with
just one use of a noiseless qudit quantum channel.
Now, without the assistance of prior shared entanglement, one result of this
chapter is that the maximum amount of classical information that can be commu-
nicated over a noiseless quantum channel without error is log2 𝑑, where 𝑑 is the
dimension of the channel. Let us describe a simple protocol that achieves this
number of communicated bits. Consider a discrete set M of messages, and suppose
that Alice encodes each message 𝑚 ∈ M into a quantum state |𝑚⟩, such that the set
{|𝑚⟩}𝑚∈M is orthonormal, i.e., ⟨𝑚|𝑚′⟩ = 𝛿𝑚,𝑚′ for all 𝑚, 𝑚′ ∈ M. Bob, knowing
Alice’s encoding of the messages, devises a measurement, described by the POVM
{|𝑚⟩⟨𝑚|}𝑚∈M , to extract the message. His strategy is to guess that the message
sent was “𝑚” if the outcome of his measurement is 𝑚 ∈ M. If Alice sends the state
|𝑚⟩⟨𝑚| through a noiseless quantum channel, then Bob is guaranteed to receive the
state |𝑚⟩⟨𝑚| unaltered, so that his guess will always be correct. Alice can thus send
log2 |M| bits of classical information to Bob without error.
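The noiseless protocol just described can be mimicked with a few lines of code. The sketch below assumes computational-basis encodings and represents each state |𝑚⟩ by a real vector; the function names are illustrative and not taken from the text.

```python
import math


def basis_state(m, d):
    """Computational-basis vector |m> in dimension d."""
    return [1.0 if k == m else 0.0 for k in range(d)]


def outcome_probability(sent, guess, d):
    """Born rule for the POVM {|m><m|}: |<guess|sent>|^2."""
    overlap = sum(a * b for a, b in zip(basis_state(guess, d), basis_state(sent, d)))
    return overlap ** 2


def bits_sent(d):
    """Number of bits conveyed by d perfectly distinguishable messages."""
    return math.log2(d)


if __name__ == "__main__":
    d = 4
    for m in range(d):
        # Each row is the distribution of Bob's outcome given Alice's message m
        print(m, [outcome_probability(m, g, d) for g in range(d)])
    print(bits_sent(d))
```

Each message is decoded correctly with probability one, since the encoded states are orthonormal and the channel is assumed noiseless.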
Now, if the channel is noisy, the initially orthogonal states in general become
non-orthogonal, so that if Alice sends the state |𝑚⟩⟨𝑚| through the channel then
Bob generally receives a mixed state 𝜌 𝑚 instead. As a consequence of using a noisy
quantum channel, Bob’s decoding strategy will not always succeed, meaning that
there will be errors. In order to mitigate the effects of noise, Alice can choose
a more clever encoding of the message, and similarly Bob can devise a more
clever decoding strategy.1 Alice and Bob can also use the channel multiple times,
which can decrease the error in general, while also allowing for the messages to be
encoded into higher-dimensional entangled states.
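A toy calculation can illustrate how noise introduces errors and how repeated channel uses can reduce them. The sketch below assumes a depolarizing channel 𝜌 ↦ (1 − 𝑝)𝜌 + 𝑝 1/𝑑 with computational-basis encoding and decoding, and a simple repetition strategy with majority voting in the qubit case; both the channel model and the strategy are illustrative assumptions, not the schemes analyzed in this chapter, and the repetition strategy lowers the rate in exchange for reliability.

```python
from math import comb


def single_use_success(p, d):
    """Probability that Bob's basis measurement returns the sent message after
    an assumed depolarizing channel with parameter p in dimension d."""
    return 1.0 - p + p / d


def majority_vote_success(p, n):
    """Success probability when one bit is sent over n (odd) independent qubit
    channel uses and Bob takes a majority vote over his n outcomes."""
    q = single_use_success(p, 2)
    return sum(comb(n, k) * q ** k * (1.0 - q) ** (n - k)
               for k in range(n // 2 + 1, n + 1))


if __name__ == "__main__":
    p = 0.4  # hypothetical noise parameter
    for n in (1, 5, 21):
        print(n, majority_vote_success(p, n))
```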
Observe that the task of classical communication over a quantum channel is
closely related to the task of state discrimination (see Section 5.3.1). Recall that
the goal of state discrimination is to minimize the error probability for a given set
{𝜌 𝑚 } 𝑚∈M of states corresponding to the message set M and a particular decoding
POVM $\{\Lambda_B^m\}_{m\in\mathcal{M}}$ indexed by the messages. In classical communication, we focus
primarily on maximizing the rate $\frac{1}{n}\log_2|\mathcal{M}|$ of communication for a given error
probability 𝜀, and we are interested in determining the maximum rate 𝑅 for which
𝜀 vanishes as the number 𝑛 of channel uses increases.
1We assume, as in all communication tasks considered in this book, that Alice and Bob know
the channel connecting them, so that they can use this knowledge to develop their encoding and
decoding.

[Diagram: Alice selects 𝑚 ∈ M and applies the encoder E to prepare system 𝐴; the channel N maps 𝐴 to 𝐵; Bob decodes 𝐵 to obtain the estimate 𝑚̂.]

Figure 12.1: Depiction of a protocol for classical communication over one use
of the quantum channel N. Alice, who wishes to send a message 𝑚 chosen
from a set M of messages, first encodes the message into a quantum state on a
quantum system 𝐴, using a classical–quantum encoding channel E. She then
sends the quantum system 𝐴 through the channel N 𝐴→𝐵 . After Bob receives
the system 𝐵, he performs a measurement on it, using the outcome of the
measurement to give an estimate 𝑚̂ of the message sent by Alice.

12.1 One-Shot Setting

In the one-shot setting, we start by considering a classical communication protocol
over a quantum channel N, as depicted in Figure 12.1. The protocol is defined by
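the triple $(\mathcal{M}, \mathcal{E}_{M\to A}, \mathcal{D}_{B\to\hat{M}})$, consisting of a message set $\mathcal{M}$, an encoding channel
$\mathcal{E}_{M\to A}$, and a decoding channel $\mathcal{D}_{B\to\hat{M}}$. The pair $(\mathcal{E}, \mathcal{D})$ of encoding and decoding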
channels is often called a code and denoted by C = (E, D). The encoding channel
is a classical–quantum channel (see Definition 4.9), and the decoding channel is a
quantum–classical or measurement channel (see Definition 4.10).
Now, given that there are |M| messages in the message set, it follows that
each message can be uniquely associated with a bit string of size at least log2 |M|.
The quantity log2 |M| thus represents the number of bits communicated in the
protocol. One of the goals of this section is to obtain upper and lower bounds on
the maximum number log2 |M| of bits that can be communicated in any classical
communication protocol.
The protocol proceeds as follows: let 𝑝 : M → [0, 1] be a probability
distribution over the message set. With probability 𝑝(𝑚), Alice picks a message
𝑚 ∈ M and makes a local copy of it. Letting {|𝑚⟩} 𝑚∈M be an orthonormal basis
indexed by the messages, her initial state is described by the following classically
correlated state:
\begin{equation}
\Phi^p_{MM'} := \sum_{m\in\mathcal{M}} p(m)\,|m\rangle\!\langle m|_M \otimes |m\rangle\!\langle m|_{M'}. \tag{12.1.1}
\end{equation}
Note that if Alice wishes to send a particular message $m$ deterministically, then she
can choose the distribution $p$ to be the degenerate distribution, equal to one for $m$
and zero for all other messages.
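She then uses an encoding channel $\mathcal{E}_{M'\to A}$ to map the message to a quantum
state $\rho_A^m$. We can explicitly define the encoding channel $\mathcal{E}_{M'\to A}$ as
\begin{equation}
\mathcal{E}_{M'\to A}(|m\rangle\!\langle m'|_{M'}) = \delta_{m,m'}\,\rho_A^m \quad \forall\, m, m' \in \mathcal{M}. \tag{12.1.2}
\end{equation}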

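Note that this channel has the form of a classical–quantum channel (recall Definition 4.9).
The action of the encoding channel on the initial state in (12.1.1) is as follows:
\begin{align}
\mathcal{E}_{M'\to A}(\Phi^p_{MM'}) &= \sum_{m\in\mathcal{M}} p(m)\,|m\rangle\!\langle m|_M \otimes \mathcal{E}_{M'\to A}(|m\rangle\!\langle m|_{M'}) \tag{12.1.3}\\
&= \sum_{m\in\mathcal{M}} p(m)\,|m\rangle\!\langle m|_M \otimes \rho_A^m \tag{12.1.4}\\
&=: \rho^p_{MA}. \tag{12.1.5}
\end{align}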

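Alice then sends the system $A$ through the channel $\mathcal{N}_{A\to B}$, resulting in the state
\begin{equation}
\mathcal{N}_{A\to B}(\rho^p_{MA}) = \sum_{m\in\mathcal{M}} p(m)\,|m\rangle\!\langle m|_M \otimes \mathcal{N}_{A\to B}(\rho_A^m). \tag{12.1.6}
\end{equation}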

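Bob, whose task is to determine which message Alice sent, performs a decoding
measurement on his received system $B$, which has the corresponding POVM
$\{\Lambda_B^m\}_{m\in\mathcal{M}}$. The measurement is associated with the decoding channel $\mathcal{D}_{B\to\hat{M}}$,
which is simply a quantum–classical channel as given in Definition 4.10, i.e.,
\begin{equation}
\mathcal{D}_{B\to\hat{M}}(\tau_B) := \sum_{m\in\mathcal{M}} \operatorname{Tr}[\Lambda_B^m \tau_B]\,|m\rangle\!\langle m|_{\hat{M}} \tag{12.1.7}
\end{equation}
for every state $\tau_B$. So the final state of the protocol is
\begin{align}
\omega^p_{M\hat{M}} &:= (\mathcal{D}_{B\to\hat{M}} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{M'\to A})(\Phi^p_{MM'}) \tag{12.1.8}\\
&= \sum_{m,\hat{m}\in\mathcal{M}} p(m)\,|m\rangle\!\langle m|_M \otimes \operatorname{Tr}[\Lambda_B^{\hat{m}} \mathcal{N}_{A\to B}(\rho_A^m)]\,|\hat{m}\rangle\!\langle\hat{m}|_{\hat{M}}. \tag{12.1.9}
\end{align}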

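The measurement by Bob induces the conditional probability distribution
$q : \mathcal{M} \times \mathcal{M} \to [0, 1]$ defined by
\begin{equation}
q(\hat{m}|m) := \Pr[\hat{M} = \hat{m} \,|\, M = m] = \operatorname{Tr}[\Lambda_B^{\hat{m}} \mathcal{N}_{A\to B}(\rho_A^m)]. \tag{12.1.10}
\end{equation}
Bob's strategy is such that if the outcome $\hat{m}$ occurs from his measurement, then he
guesses that the message sent was $\hat{m}$. The probability that Bob correctly identifies
a given message $m$ is then equal to $q(m|m)$. The message error probability of the
code is given by
\begin{equation}
\begin{aligned}
p_{\mathrm{err}}(m, (\mathcal{E}, \mathcal{D}); \mathcal{N}) &:= 1 - q(m|m)\\
&= \operatorname{Tr}[(\mathbb{1}_B - \Lambda_B^m)\,\mathcal{N}_{A\to B}(\rho_A^m)]\\
&= \sum_{\hat{m}\in\mathcal{M}\setminus\{m\}} q(\hat{m}|m).
\end{aligned} \tag{12.1.11}
\end{equation}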

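The average error probability of the code is
\begin{align}
p_{\mathrm{err}}((\mathcal{E}, \mathcal{D}); p, \mathcal{N}) &:= \sum_{m\in\mathcal{M}} p(m)\, p_{\mathrm{err}}(m, (\mathcal{E}, \mathcal{D}); \mathcal{N}) \tag{12.1.12}\\
&= \sum_{m\in\mathcal{M}} p(m)\,(1 - q(m|m)) \tag{12.1.13}\\
&= \sum_{m\in\mathcal{M}} \sum_{\hat{m}\in\mathcal{M}\setminus\{m\}} p(m)\, q(\hat{m}|m). \tag{12.1.14}
\end{align}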

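The maximal error probability of the code is
\begin{equation}
p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) := \max_{m\in\mathcal{M}} p_{\mathrm{err}}(m, (\mathcal{E}, \mathcal{D}); \mathcal{N}). \tag{12.1.15}
\end{equation}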

Just as in the case of entanglement-assisted classical communication in Chapter 11,
each of these three error probabilities can be used to assess the reliability of the
protocol, i.e., how well the encoding and decoding allows Alice to transmit her
message to Bob.
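The three error probabilities in (12.1.11), (12.1.12), and (12.1.15) can be evaluated numerically for a small example. The following is a minimal sketch only: the qubit depolarizing channel, the two-message code, and all function names are illustrative assumptions and do not come from the text.

```python
import numpy as np

def depolarize(rho, lam):
    """Toy stand-in for N (an assumption): qubit depolarizing channel."""
    return (1.0 - lam) * rho + lam * np.trace(rho) * np.eye(2) / 2.0

def transition_matrix(codewords, povm, channel):
    """q[m_hat, m] = Tr[Lambda^{m_hat} N(rho^m)], as in (12.1.10)."""
    return np.array([[np.trace(lam_op @ channel(rho)).real for rho in codewords]
                     for lam_op in povm])

def message_error(q, m):
    """Message error probability 1 - q(m|m), as in (12.1.11)."""
    return 1.0 - q[m, m]

def average_error(q, p):
    """Average error probability for the prior p, as in (12.1.12)."""
    return float(sum(p[m] * message_error(q, m) for m in range(len(p))))

def maximal_error(q):
    """Maximal error probability over all messages, as in (12.1.15)."""
    return float(max(message_error(q, m) for m in range(q.shape[0])))

# Hypothetical code: encode two messages as |0><0| and |1><1|, and decode
# with a computational-basis measurement.
codewords = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
q = transition_matrix(codewords, povm, lambda rho: depolarize(rho, 0.2))
p_uniform = [0.5, 0.5]

print(q)
print(average_error(q, p_uniform), maximal_error(q))
```

For this symmetric toy code the average and maximal error probabilities coincide; in general the maximal error probability is at least as large as the average one for every prior.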

Definition 12.1 (|M|, 𝜺) Classical Communication Protocol

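A classical communication protocol $(\mathcal{M}, \mathcal{E}_{M\to A}, \mathcal{D}_{B\to\hat{M}})$ over the channel
$\mathcal{N}_{A\to B}$ is called an $(|\mathcal{M}|, \varepsilon)$ protocol, with $\varepsilon \in [0, 1]$, if $p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \leq \varepsilon$.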

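As with entanglement-assisted classical communication, the error criterion
$p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \leq \varepsilon$ is equivalent to
\begin{equation}
\max_{p:\mathcal{M}\to[0,1]} \frac{1}{2} \left\| \Phi^p_{MM'} - \omega^p_{M\hat{M}} \right\|_1 \leq \varepsilon, \tag{12.1.16}
\end{equation}
where the maximization is over probability distributions $p$ on $\mathcal{M}$, and the steps to show this are the same as those shown in the proof of Lemma 11.2.
In particular, the following equality holds
\begin{equation}
\frac{1}{2} \left\| \Phi^p_{MM'} - \omega^p_{M\hat{M}} \right\|_1 = p_{\mathrm{err}}((\mathcal{E}, \mathcal{D}); p, \mathcal{N}), \tag{12.1.17}
\end{equation}
which leads to
\begin{equation}
\begin{aligned}
p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) &= \max_{p:\mathcal{M}\to[0,1]} p_{\mathrm{err}}((\mathcal{E}, \mathcal{D}); p, \mathcal{N})\\
&= \max_{p:\mathcal{M}\to[0,1]} \frac{1}{2} \left\| \Phi^p_{MM'} - \omega^p_{M\hat{M}} \right\|_1.
\end{aligned} \tag{12.1.18}
\end{equation}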

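Also, as in Chapter 11, another way to define the error criterion of the protocol is
through a comparator test. Recall that the comparator test is a measurement defined
by the two-element POVM $\{\Pi_{M\hat{M}}, \mathbb{1}_{M\hat{M}} - \Pi_{M\hat{M}}\}$, where $\Pi_{M\hat{M}}$ is the projection
defined as
\begin{equation}
\Pi_{M\hat{M}} := \sum_{m\in\mathcal{M}} |m\rangle\!\langle m|_M \otimes |m\rangle\!\langle m|_{\hat{M}}. \tag{12.1.19}
\end{equation}
Note that $\operatorname{Tr}[\Pi_{M\hat{M}}\, \omega^p_{M\hat{M}}]$ is simply the probability that the classical registers $M$
and $\hat{M}$ in the state $\omega^p_{M\hat{M}}$ have the same values. In particular, following the same
steps as in (11.1.38)–(11.1.40), we have
\begin{equation}
\operatorname{Tr}\!\left[\Pi_{M\hat{M}}\, \omega^p_{M\hat{M}}\right] = 1 - p_{\mathrm{err}}((\mathcal{E}, \mathcal{D}); p, \mathcal{N}) =: p_{\mathrm{succ}}((\mathcal{E}, \mathcal{D}); p, \mathcal{N}), \tag{12.1.20}
\end{equation}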

where we have acknowledged that the expression on the left-hand side can be
interpreted as the average success probability of the code (E, D) and denoted it by
𝑝 succ ((E, D); 𝑝, N).
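The identities (12.1.17) and (12.1.20) can be checked numerically. Since both $\Phi^p_{MM'}$ and $\omega^p_{M\hat{M}}$ are diagonal in the product basis of the two classical registers, it suffices to work with their diagonal entries. The sketch below assumes a hypothetical prior `p` and a hypothetical transition matrix `q`, with the convention `q[m_hat, m]` for $q(\hat{m}|m)$; neither is taken from the text.

```python
import numpy as np

def protocol_states(p, q):
    """Diagonal entries of Phi^p and omega^p, stored as arrays indexed [m, m_hat].

    q[m_hat, m] is the conditional probability q(m_hat | m) from (12.1.10).
    """
    p = np.asarray(p, dtype=float)
    phi = np.diag(p)              # weight p(m) on the basis state |m>|m>
    omega = p[:, None] * q.T      # weight p(m) q(m_hat | m) on |m>|m_hat>
    return phi, omega

def normalized_trace_distance(phi, omega):
    """(1/2)||Phi - omega||_1 for two states that are diagonal in the same basis."""
    return 0.5 * np.abs(phi - omega).sum()

def comparator_success(omega):
    """Tr[Pi omega]: the total weight on the entries with m_hat == m."""
    return float(np.trace(omega))

# Hypothetical prior and transition matrix (each column of q sums to one).
p = np.array([0.3, 0.7])
q = np.array([[0.9, 0.2],
              [0.1, 0.8]])

phi, omega = protocol_states(p, q)
p_err = float(sum(p[m] * (1.0 - q[m, m]) for m in range(len(p))))

print(normalized_trace_distance(phi, omega), p_err)
print(comparator_success(omega), 1.0 - p_err)
```

Both printed pairs should agree up to floating-point rounding, in line with (12.1.17) and (12.1.20).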
As mentioned at the beginning of this chapter, our goal is to bound (from above
and below) the maximum number log2 |M| of transmitted bits for every classical
communication protocol over N. Given an error probability threshold of 𝜀, we call
the maximum number of transmitted bits the one-shot classical capacity of N.

Definition 12.2 One-Shot Classical Capacity of a Quantum Channel

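Given a quantum channel $\mathcal{N}_{A\to B}$ and $\varepsilon \in [0, 1]$, the one-shot $\varepsilon$-error classical
capacity of $\mathcal{N}$, denoted by $C^{\varepsilon}(\mathcal{N})$, is defined to be the maximum number $\log_2|\mathcal{M}|$
of transmitted bits among all $(|\mathcal{M}|, \varepsilon)$ classical communication protocols over
$\mathcal{N}$. In other words,
\begin{equation}
C^{\varepsilon}(\mathcal{N}) := \sup_{(\mathcal{M},\mathcal{E},\mathcal{D})} \{\log_2|\mathcal{M}| : p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \leq \varepsilon\}, \tag{12.1.21}
\end{equation}
where the optimization is over all protocols $(\mathcal{M}, \mathcal{E}_{M'\to A}, \mathcal{D}_{B\to\hat{M}})$ satisfying
$d_{M'} = d_{\hat{M}} = |\mathcal{M}|$.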

In addition to finding, for a given $\varepsilon \in [0, 1]$, the maximum number of transmitted
bits among all $(|\mathcal{M}|, \varepsilon)$ classical communication protocols over $\mathcal{N}_{A\to B}$, we can
consider the following complementary problem: for a given number of messages
$|\mathcal{M}|$, find the smallest possible error probability among all $(|\mathcal{M}|, \varepsilon)$ classical
communication protocols, which we denote by $\varepsilon^*_C(|\mathcal{M}|; \mathcal{N})$. In other words, the
problem is to determine
\begin{equation}
\varepsilon^*_C(|\mathcal{M}|; \mathcal{N}) := \inf_{\mathcal{E},\mathcal{D}} \{p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) : d_{M'} = d_{\hat{M}} = |\mathcal{M}|\}, \tag{12.1.22}
\end{equation}
where the optimization is over encoding channels $\mathcal{E}$ with input space dimension
$|\mathcal{M}|$ and decoding channels $\mathcal{D}$ with output space dimension $|\mathcal{M}|$. In this book, we
focus primarily on the problem of optimizing the number of transmitted bits rather
than the error probability, and so our primary quantity of interest is the one-shot
capacity $C^{\varepsilon}(\mathcal{N})$.

12.1.1 Protocol Over a Useless Channel

We now turn to establishing an upper bound on the one-shot classical capacity, and
our approach is similar to the approach outlined in Section 11.1.1. With this goal
in mind, along with the actual classical communication protocol, we also consider
the same protocol but performed over a useless channel as depicted in Figure 12.2.
This useless channel discards the state encoded with the message and replaces it
with some arbitrary (but fixed) state $\sigma_B$. In other words,
\begin{equation}
\rho_A^m \mapsto \sigma_B =: (\mathcal{P}_{\sigma_B} \circ \operatorname{Tr})(\rho_A^m) \quad \forall\, m \in \mathcal{M}, \tag{12.1.23}
\end{equation}
where the encoded states $\rho_A^m$ are defined in (12.1.2). This channel is useless
because the state $\sigma_B$ does not contain any information about the message. As with
entanglement-assisted classical communication, comparing this protocol over the

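[Figure 12.2 diagram: Alice's message $m \in \mathcal{M}$ enters the encoder $\mathcal{E}$, whose output system $A$ is discarded and replaced by the preparation $\mathcal{P}_{\sigma_B}$ of system $B$; Bob's decoder outputs the estimate $\hat{m}$.]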

Figure 12.2: Depiction of a protocol that is useless for classical communication.
The state encoding the message $m$ via $\mathcal{E}$ is discarded and replaced with an
arbitrary (but fixed) state $\sigma_B$.

useless channel with the actual protocol allows us to obtain an upper bound on the
quantity log2 |M|, which we recall represents the number of bits that are transmitted
over the channel.
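The state at the end of the protocol over the useless channel is the following
tensor-product state:
\begin{equation}
\tau^p_{M\hat{M}} := \sum_{m\in\mathcal{M}} p(m)\,|m\rangle\!\langle m|_M \otimes \sum_{\hat{m}\in\mathcal{M}} \operatorname{Tr}[\Lambda_B^{\hat{m}} \sigma_B]\,|\hat{m}\rangle\!\langle\hat{m}|_{\hat{M}}, \tag{12.1.24}
\end{equation}
which indicates that the decoded message system $\hat{M}$ is independent of the message
system $M$ in this case. Now, recall from (12.1.9) that the state $\omega^p_{M\hat{M}}$ at the end of
the actual protocol over the channel $\mathcal{N}$ is given by
\begin{equation}
\omega^p_{M\hat{M}} = \sum_{m,\hat{m}\in\mathcal{M}} p(m)\operatorname{Tr}[\Lambda_B^{\hat{m}} \mathcal{N}_{A\to B}(\rho_A^m)]\,|m\rangle\!\langle m|_M \otimes |\hat{m}\rangle\!\langle\hat{m}|_{\hat{M}}. \tag{12.1.25}
\end{equation}
Similar to the notation from Chapter 11, we let
\begin{equation}
\omega_{M\hat{M}} := \frac{1}{|\mathcal{M}|} \sum_{m,\hat{m}\in\mathcal{M}} \operatorname{Tr}[\Lambda_B^{\hat{m}} \mathcal{N}_{A\to B}(\rho_A^m)]\,|m\rangle\!\langle m|_M \otimes |\hat{m}\rangle\!\langle\hat{m}|_{\hat{M}} \tag{12.1.26}
\end{equation}
be the state $\omega^p_{M\hat{M}}$ with the probability distribution $p$ over the message set equal to
the uniform distribution, i.e., $p(m) = \frac{1}{|\mathcal{M}|}$. We also let
\begin{equation}
\Phi_{MM'} := \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} |m\rangle\!\langle m|_M \otimes |m\rangle\!\langle m|_{M'} \tag{12.1.27}
\end{equation}
be the state in (12.1.1) with $p$ being the uniform distribution over $\mathcal{M}$.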
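To see concretely why the replacement channel is useless, note that under the uniform prior the comparator test succeeds on the state in (12.1.24) with probability exactly $1/|\mathcal{M}|$, whichever decoding POVM and state $\sigma_B$ are used. The following is a minimal numerical sketch; the random POVM construction, the dimensions, and all names are illustrative assumptions.

```python
import numpy as np

def random_density_matrix(dim, rng):
    """Random full-rank density matrix built from a complex Gaussian matrix."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def random_povm(dim, num_outcomes, rng):
    """Random POVM: positive operators A_k rescaled to S^{-1/2} A_k S^{-1/2}."""
    ops = [random_density_matrix(dim, rng) for _ in range(num_outcomes)]
    total = sum(ops)
    vals, vecs = np.linalg.eigh(total)
    inv_sqrt = (vecs / np.sqrt(vals)) @ vecs.conj().T
    return [inv_sqrt @ a @ inv_sqrt for a in ops]

rng = np.random.default_rng(7)
num_messages, dim = 4, 3
povm = random_povm(dim, num_messages, rng)
sigma = random_density_matrix(dim, rng)

# Distribution of Bob's guess when every codeword is replaced by sigma_B.
guess_dist = np.array([np.trace(lam_op @ sigma).real for lam_op in povm])

# Diagonal of tau^p in (12.1.24) for uniform p: a product distribution.
p = np.full(num_messages, 1.0 / num_messages)
tau = np.outer(p, guess_dist)

# Comparator success probability Tr[Pi tau] = sum_m p(m) Tr[Lambda^m sigma].
success = float(np.trace(tau))
print(success, 1.0 / num_messages)
```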

Now, observe that $\operatorname{Tr}_{\hat{M}}[\omega_{M\hat{M}}] = \pi_M$, where $\pi_M$ is the maximally mixed state. Also, for every $(|\mathcal{M}|, \varepsilon)$ classical
communication protocol, the condition $p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \leq \varepsilon$ holds. By following the same
steps as in (11.1.63)–(11.1.66), this condition implies that $p_{\mathrm{err}}((\mathcal{E}, \mathcal{D}); p, \mathcal{N}) \leq \varepsilon$
for the uniform distribution $p$. Then, by (12.1.20), we find that
\begin{equation}
\operatorname{Tr}[\Pi_{M\hat{M}}\, \omega_{M\hat{M}}] \geq 1 - \varepsilon. \tag{12.1.28}
\end{equation}
We can therefore use Lemma 11.4 to conclude that
\begin{equation}
\log_2|\mathcal{M}| \leq I_H^{\varepsilon}(M; \hat{M})_{\omega} \tag{12.1.29}
\end{equation}
for every $(|\mathcal{M}|, \varepsilon)$ classical communication protocol. This means that, given a
particular choice of the encoding and decoding channels, if $p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \leq \varepsilon$, then
the upper bound in (12.1.29) is the maximum number of bits that can be transmitted
over the channel $\mathcal{N}$. The optimal value of this upper bound is realized by finding
the state $\sigma_{\hat{M}}$ defining the useless channel that optimizes the quantity $I_H^{\varepsilon}(M; \hat{M})_{\omega}$ in
addition to the measurement that achieves the $\varepsilon$-hypothesis testing relative entropy
in (11.1.61). Importantly, a different choice of encoding and decoding produces a
different value for this upper bound. We would thus like to find an upper bound
that applies regardless of which specific protocol is chosen. In other words, we
would like an upper bound that is a function of the channel $\mathcal{N}$ only.
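The bound (12.1.29) can be illustrated numerically when all states involved are classical. The sketch below assumes the standard definition $D_H^{\varepsilon}(\rho\|\sigma) = -\log_2 \min\{\operatorname{Tr}[\Lambda\sigma] : 0 \leq \Lambda \leq \mathbb{1},\ \operatorname{Tr}[\Lambda\rho] \geq 1-\varepsilon\}$ from Chapter 7 (not restated in this section). For probability distributions this is a fractional knapsack problem, solved greedily in the Neyman–Pearson manner. The joint distribution and the candidate states $\sigma_{\hat{M}}$ are hypothetical.

```python
import numpy as np

def hypothesis_testing_relative_entropy(p, q, eps):
    """D_H^eps(p || q) in bits for classical distributions (arrays of equal shape).

    Minimizes sum_i q_i t_i subject to sum_i p_i t_i >= 1 - eps and 0 <= t_i <= 1
    by accepting outcomes in order of increasing ratio q_i / p_i.
    """
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    support = p > 0                      # outcomes with p_i = 0 never help
    p, q = p[support], q[support]
    need = 1.0 - eps                     # p-mass the test still has to accept
    beta = 0.0                           # accumulated q-mass (type-II error)
    for i in np.argsort(q / p):
        if need <= 1e-12:
            break
        t = min(1.0, need / p[i])
        beta += t * q[i]
        need -= t * p[i]
    return -np.log2(beta) if beta > 0 else np.inf

# Hypothetical two-message protocol with q(m|m) = 0.9 and a uniform prior:
# the diagonal of omega_{M M_hat}, indexed [m, m_hat].
omega = 0.5 * np.array([[0.9, 0.1],
                        [0.1, 0.9]])
eps = 0.1
pi_m = np.array([0.5, 0.5])

# The bound log2|M| <= D_H^eps(omega || pi_M (x) sigma) holds for every sigma.
for sigma in (np.array([0.5, 0.5]), np.array([0.3, 0.7])):
    d_h = hypothesis_testing_relative_entropy(omega, np.outer(pi_m, sigma), eps)
    print(sigma, d_h)
```

Because the bound holds for each candidate `sigma`, it also holds for the infimum over them, which is how the mutual-information quantity in (12.1.29) arises.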

12.1.2 Upper Bound on the Number of Transmitted Bits

We now give a general upper bound on the number of transmitted bits that can
be communicated in any classical communication protocol. This result is stated
in Theorem 12.4, and the upper bound obtained therein holds independently of
the encoding and decoding channels used in the protocol and depends only on the
given communication channel N.
Let us start with an arbitrary (|M|, 𝜀) classical communication protocol over
the channel N, corresponding to, as described at the beginning of this chapter, a
message set M, an encoding channel E, and a decoding channel D. The error
criterion 𝑝 ∗err (E, D; N) ≤ 𝜀 holds by definition of an (|M|, 𝜀) protocol, which
implies the upper bound in (12.1.29) for the number log2 |M| of transmitted bits in
any (|M|, 𝜀) classical communication protocol. Using this upper bound, we obtain
the following:


Proposition 12.3 Upper Bound on One-Shot Classical Capacity

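Let $\mathcal{N}$ be a quantum channel. For every $(|\mathcal{M}|, \varepsilon)$ classical communication
protocol over $\mathcal{N}$, with $\varepsilon \in [0, 1]$, the number of bits transmitted over $\mathcal{N}$ is
bounded from above by the $\varepsilon$-hypothesis testing Holevo information of $\mathcal{N}$, as
defined in (7.11.93), i.e.,
\begin{equation}
\log_2|\mathcal{M}| \leq \chi_H^{\varepsilon}(\mathcal{N}). \tag{12.1.30}
\end{equation}
Therefore,
\begin{equation}
C^{\varepsilon}(\mathcal{N}) \leq \chi_H^{\varepsilon}(\mathcal{N}). \tag{12.1.31}
\end{equation}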

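Proof: We start with the upper bound in (12.1.29), i.e.,
\begin{equation}
\log_2|\mathcal{M}| \leq I_H^{\varepsilon}(M; \hat{M})_{\omega}, \tag{12.1.32}
\end{equation}
where the state $\omega_{M\hat{M}}$ is defined in (12.1.26). Recall that this bound follows from
Lemma 11.4. Note that the state $\omega_{M\hat{M}}$ can be written as
\begin{equation}
\omega_{M\hat{M}} = \mathcal{D}_{B\to\hat{M}}(\theta_{MB}), \tag{12.1.33}
\end{equation}
where
\begin{equation}
\theta_{MB} := \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} |m\rangle\!\langle m|_M \otimes \mathcal{N}_{A\to B}(\rho_A^m). \tag{12.1.34}
\end{equation}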

Now, from the data-processing inequality for the hypothesis testing relative
entropy under the action of the decoding channel $\mathcal{D}_{B\to\hat{M}}$, we find that
\begin{equation}
I_H^{\varepsilon}(M; \hat{M})_{\omega} = \inf_{\sigma_{\hat{M}}} D_H^{\varepsilon}(\omega_{M\hat{M}} \| \omega_M \otimes \sigma_{\hat{M}}) \leq I_H^{\varepsilon}(M; B)_{\theta}, \tag{12.1.35}
\end{equation}
where we have used the fact that $\theta_M = \pi_M = \omega_M$. Note that
\begin{equation}
\theta_{MB} = \mathcal{N}_{A\to B}(\rho_{MA}), \tag{12.1.36}
\end{equation}
where $\rho_{MA}$ is the classical–quantum state $\rho^p_{MA}$ defined in (12.1.5) with $p$ equal to
the uniform probability distribution. Optimizing over all classical–quantum states
$\xi_{MA}$ then leads to
\begin{equation}
I_H^{\varepsilon}(M; B)_{\theta} \leq \sup_{\xi_{MA}} I_H^{\varepsilon}(M; B)_{\zeta} = \chi_H^{\varepsilon}(\mathcal{N}), \tag{12.1.37}
\end{equation}
where $\zeta_{MB} = \mathcal{N}_{A\to B}(\xi_{MA})$ and we have used the definition in (7.11.93) for the
$\varepsilon$-hypothesis testing Holevo information of a channel. Note that this optimization
over all classical–quantum states is effectively an optimization over all possible
encoding channels $\mathcal{E}_{M'\to A}$ that define the $(|\mathcal{M}|, \varepsilon)$ protocol. Putting everything
together, we obtain
\begin{equation}
\log_2|\mathcal{M}| \leq I_H^{\varepsilon}(M; \hat{M})_{\omega} \leq I_H^{\varepsilon}(M; B)_{\theta} \leq \chi_H^{\varepsilon}(\mathcal{N}), \tag{12.1.38}
\end{equation}
as required. ■

The result of Proposition 12.3 can be written explicitly as
\begin{align}
\log_2|\mathcal{M}| &\leq \sup_{\rho_{MA}} \inf_{\sigma_B} D_H^{\varepsilon}(\mathcal{N}_{A\to B}(\rho_{MA}) \| \rho_M \otimes \sigma_B) \tag{12.1.39}\\
&= \sup_{\rho_{MA}} \inf_{\sigma_B} D_H^{\varepsilon}(\mathcal{N}_{A\to B}(\rho_{MA}) \| \mathcal{R}^{\sigma_B}_{A\to B}(\rho_{MA})), \tag{12.1.40}
\end{align}
where $\rho_{MA}$ is a classical–quantum state. By doing so, we explicitly see here the
comparison, via the hypothesis testing relative entropy, between the actual classical
communication protocol and the protocol over the useless channels $\mathcal{R}^{\sigma_B}_{A\to B}$, labeled
by the states $\sigma_B$. The state $\rho_{MA}$ corresponds to the state after the encoding channel,
and optimizing over these states is effectively an optimization over all encoding
channels.
As an immediate consequence of Propositions 12.3, 7.70, and 7.71, we have the
following two bounds:

Theorem 12.4 One-Shot Upper Bounds for Classical Communication

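Let $\mathcal{N}$ be a quantum channel, let $\varepsilon \in [0, 1)$, and let $\alpha > 1$. For every $(|\mathcal{M}|, \varepsilon)$
classical communication protocol over $\mathcal{N}$, the following bounds hold
\begin{align}
\log_2|\mathcal{M}| &\leq \frac{1}{1-\varepsilon}\left(\chi(\mathcal{N}) + h_2(\varepsilon)\right), \tag{12.1.41}\\
\log_2|\mathcal{M}| &\leq \widetilde{\chi}_{\alpha}(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right), \tag{12.1.42}
\end{align}
where $\chi(\mathcal{N})$ is the Holevo information of $\mathcal{N}$, as defined in (7.11.106), and $\widetilde{\chi}_{\alpha}(\mathcal{N})$
is the sandwiched Rényi Holevo information of $\mathcal{N}$, as defined in (7.11.95).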


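Since the bounds in (12.1.41) and (12.1.42) hold for every $(|\mathcal{M}|, \varepsilon)$ classical
communication protocol over $\mathcal{N}$, we have that
\begin{align}
C^{\varepsilon}(\mathcal{N}) &\leq \frac{1}{1-\varepsilon}\left(\chi(\mathcal{N}) + h_2(\varepsilon)\right), \tag{12.1.43}\\
C^{\varepsilon}(\mathcal{N}) &\leq \widetilde{\chi}_{\alpha}(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\, \alpha > 1, \tag{12.1.44}
\end{align}
for all $\varepsilon \in [0, 1)$.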
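To get a feel for how the bounds (12.1.43) and (12.1.44) behave as functions of $\varepsilon$ and $\alpha$, they can be evaluated directly once values of $\chi(\mathcal{N})$ and $\widetilde{\chi}_{\alpha}(\mathcal{N})$ are available. The sketch below takes these values as inputs rather than computing them from a channel; the function names and the numbers used are illustrative assumptions.

```python
import math

def binary_entropy(eps):
    """Binary entropy h_2(eps) in bits, with h_2(0) = h_2(1) = 0."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1.0 - eps) * math.log2(1.0 - eps)

def holevo_upper_bound(chi, eps):
    """Right-hand side of (12.1.43) for a given Holevo information chi."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    return (chi + binary_entropy(eps)) / (1.0 - eps)

def renyi_upper_bound(chi_alpha, alpha, eps):
    """Right-hand side of (12.1.44) for a given sandwiched Renyi Holevo information."""
    if alpha <= 1.0 or not 0.0 <= eps < 1.0:
        raise ValueError("need alpha > 1 and eps in [0, 1)")
    return chi_alpha + alpha / (alpha - 1.0) * math.log2(1.0 / (1.0 - eps))

# Hypothetical inputs: chi = 1 bit and chi_alpha = 1 bit at alpha = 2.
for eps in (0.0, 0.1, 0.5):
    print(eps, holevo_upper_bound(1.0, eps), renyi_upper_bound(1.0, 2.0, eps))
```

At $\varepsilon = 0$ both expressions reduce to the corresponding Holevo quantity, and both grow as the allowed error increases.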

Let us recap the steps that we took to arrive at the bounds in (12.1.41)
and (12.1.42).
1. We first compared the classical communication protocol over the channel N
with a protocol over a useless channel, by using the 𝜀-hypothesis testing relative
entropy. This led us to the upper bound in (12.1.29).
2. We then used the data-processing inequality for the hypothesis testing relative
entropy to obtain a quantity that is independent of the decoding channel, and
also optimized over all useless protocols. This is done in (12.1.35) in the proof
of Proposition 12.3.
3. Finally, to obtain a bound that is a function solely of the channel N and the error
probability, we optimized over all encoding channels to obtain Proposition 12.3.
4. Using Propositions 7.70 and 7.71, which relate the hypothesis testing relative
entropy to the quantum relative entropy and the sandwiched-Rényi relative
entropy, respectively, we arrived at Theorem 12.4.
The bounds in (12.1.41) and (12.1.42) are fundamental upper bounds on the
number of transmitted bits for every classical communication protocol. A natural
question to ask is whether the upper bounds in (12.1.41) and (12.1.42) can be
achieved. In other words, is it possible to devise protocols such that the number of
transmitted bits is equal to the right-hand side of either (12.1.41) or (12.1.42)? In
general, we do not know how to do so, especially if we demand that the right-hand
side of either (12.1.41) or (12.1.42) be attained exactly. However, when given many uses of a channel
(in the asymptotic setting), we can come close to achieving these upper bounds.
This motivates finding lower bounds on the number of transmitted bits.


12.1.3 Lower Bound on the Number of Transmitted Bits

Having obtained upper bounds on the number of transmitted bits in the previous
section, let us now determine lower bounds. The key result of this section is
Proposition 12.5, resulting in Theorem 12.6, which contains a lower bound on the
number of transmitted bits for every $(|\mathcal{M}|, \varepsilon)$ classical communication protocol.
As we saw in the previous chapter on entanglement-assisted classical communication,
in order to obtain a lower bound on the number of transmitted bits,
we should devise an explicit classical communication protocol $(\mathcal{M}, \mathcal{E}, \mathcal{D})$ such that
the maximal error probability satisfies $p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \leq \varepsilon$ for $\varepsilon \in [0, 1]$. Recall
from (12.1.15) that the maximal error probability is defined as
\begin{equation}
p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) = \max_{m\in\mathcal{M}} p_{\mathrm{err}}(m, (\mathcal{E}, \mathcal{D}); \mathcal{N}), \tag{12.1.45}
\end{equation}
where, for all $m \in \mathcal{M}$, the message error probability $p_{\mathrm{err}}(m, (\mathcal{E}, \mathcal{D}); \mathcal{N})$ is defined in
(12.1.11) as
\begin{equation}
p_{\mathrm{err}}(m, (\mathcal{E}, \mathcal{D}); \mathcal{N}) = 1 - q(m|m), \tag{12.1.46}
\end{equation}
with $q(\hat{m}|m)$ being the probability of identifying the message sent as $\hat{m}$ given that
the message $m$ was sent.
The classical communication protocol discussed here is related to the
entanglement-assisted classical communication protocol in Section 11.1.3. We suppose
at first that Alice and Bob have some shared randomness prior to communication.
This shared randomness is strictly speaking not part of the classical communication
protocol as outlined at the beginning of Section 12.1, but the advantage of using
it is that we can directly employ all of the developments for the position-based
coding and sequential decoding strategy from Section 11.1.3. We then perform
what is called derandomization and expurgation (both of which we outline below)
ultimately to remove this shared randomness from the protocol and thus obtain
the desired lower bound on the number of transmitted bits for the true unassisted
classical communication protocol.

Proposition 12.5 Lower Bound on One-Shot Classical Capacity

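Let $\mathcal{N}_{A\to B}$ be a quantum channel. For all $\varepsilon \in (0, 1)$ and $\eta \in \left(0, \frac{\varepsilon}{2}\right)$, there exists
an $(|\mathcal{M}|, \varepsilon)$ classical communication protocol over $\mathcal{N}_{A\to B}$ such that
\begin{equation}
\log_2|\mathcal{M}| = \overline{\chi}_H^{\frac{\varepsilon}{2}-\eta}(\mathcal{N}) - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right). \tag{12.1.47}
\end{equation}
Consequently, for all $\varepsilon \in (0, 1)$ and $\eta \in \left(0, \frac{\varepsilon}{2}\right)$,
\begin{equation}
C^{\varepsilon}(\mathcal{N}) \geq \overline{\chi}_H^{\frac{\varepsilon}{2}-\eta}(\mathcal{N}) - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right). \tag{12.1.48}
\end{equation}
Here,
\begin{equation}
\overline{\chi}_H^{\varepsilon}(\mathcal{N}) := \sup_{\rho_{XA}} \overline{I}_H^{\varepsilon}(X; B)_{\omega}, \tag{12.1.49}
\end{equation}
where $\omega_{XB} = \mathcal{N}_{A\to B}(\rho_{XA})$, the state $\rho_{XA}$ is a classical–quantum state, and
\begin{equation}
\overline{I}_H^{\varepsilon}(X; B)_{\omega} = D_H^{\varepsilon}(\omega_{XB} \| \omega_X \otimes \omega_B). \tag{12.1.50}
\end{equation}

Remark: The quantity $\overline{\chi}_H^{\varepsilon}(\mathcal{N})$ defined in the statement of Proposition 12.5 above is similar
to the quantity $\chi_H^{\varepsilon}(\mathcal{N})$ defined in (7.11.93), except that it is defined with respect to the mutual
information $\overline{I}_H^{\varepsilon}(X; B)_{\rho}$ that we encountered in Proposition 11.8, which does not involve an
optimization over states $\sigma_B$.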

Proof: As described before the statement of the proposition, we start with a
protocol based on randomness-assisted classical communication, in which Alice
and Bob have shared randomness prior to communication via the state $\rho_{XB'}^{\otimes|\mathcal{M}'|}$,
where $\mathcal{M}'$ is a message set and $\rho_{XB'}$ is the following classically correlated state:
\begin{equation}
\rho_{XB'} := \sum_{x\in\mathcal{X}} r(x)\,|x\rangle\!\langle x|_X \otimes |x\rangle\!\langle x|_{B'}. \tag{12.1.51}
\end{equation}
Here, X is a finite alphabet and 𝑟 : X → [0, 1] is a probability distribution on X.
The system 𝑋 is held by Alice and the system 𝐵′ is held by Bob. We denote the
encoding and decoding channels for this protocol by E′ and D′, respectively, and
they correspond to the position-based coding and sequential decoding strategies
developed in Section 11.1.3. The goal is to use this protocol to determine the
existence of encoding and decoding channels E and D for an (|M|, 𝜀) classical
communication protocol.
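As a first step, Alice processes each of her $X$ systems with a classical–quantum
channel $x \mapsto \rho_A^x$ (see Definition 4.9), so that the state shared by them becomes
$\rho_{A'B'}^{\otimes|\mathcal{M}'|}$, where
\begin{equation}
\rho_{A'B'} = \sum_{x\in\mathcal{X}} r(x)\,\rho_{A'}^x \otimes |x\rangle\!\langle x|_{B'}. \tag{12.1.52}
\end{equation}
Just as in Section 11.1.3, the rest of the encoding channel $\mathcal{E}'$ is defined such that
if Alice wishes to send the message $m \in \mathcal{M}'$, then she sends the $m$th $A'$ system
through the channel. Thus, the state shared by Alice and Bob becomes
\begin{equation}
\rho_{A'_1B'_1} \otimes \cdots \otimes \mathcal{N}_{A'_m\to B}(\rho_{A'_mB'_m}) \otimes \cdots \otimes \rho_{A'_{|\mathcal{M}'|}B'_{|\mathcal{M}'|}} \tag{12.1.53}
\end{equation}
for all $m \in \mathcal{M}'$, where
\begin{equation}
\rho_{A'_iB'_i} := \sum_{x\in\mathcal{X}} r(x)\,\rho_{A'_i}^x \otimes |x\rangle\!\langle x|_{B'_i}. \tag{12.1.54}
\end{equation}
The reduced state on Bob's systems is then
\begin{equation}
\tau^m_{B'_1\cdots B'_m\cdots B'_{|\mathcal{M}'|}B} = \rho_{B'_1} \otimes \cdots \otimes \mathcal{N}_{A'_m\to B}(\rho_{A'_mB'_m}) \otimes \cdots \otimes \rho_{B'_{|\mathcal{M}'|}}. \tag{12.1.55}
\end{equation}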

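In particular, we have that
\begin{equation}
\tau^m_{B'_{\hat{m}}B} =
\begin{cases}
\mathcal{N}_{A'\to B}(\rho_{A'B'}) & \text{if } \hat{m} = m,\\
\rho_{B'} \otimes \mathcal{N}_{A'\to B}(\rho_{A'}) & \text{if } \hat{m} \neq m,
\end{cases} \tag{12.1.56}
\end{equation}
for all $m, \hat{m} \in \mathcal{M}'$. Bob's task is to perform a test to guess which of the two states
$\mathcal{N}_{A'\to B}(\rho_{A'B'})$ and $\rho_{B'} \otimes \mathcal{N}_{A'\to B}(\rho_{A'})$ he has on $B$ and the $|\mathcal{M}'|$ systems $B'_1 \cdots B'_{|\mathcal{M}'|}$.
Since the shared state in the proof of Proposition 11.8 is arbitrary, we can apply
all the arguments in that proof with the state $\rho_{A'B'}$ defined in (12.1.54) above to
conclude immediately via (11.1.126) that
\begin{equation}
p_{\mathrm{err}}(m, (\mathcal{E}', \mathcal{D}'); \mathcal{N}) \leq \varepsilon \tag{12.1.57}
\end{equation}
for all $m \in \mathcal{M}'$, provided that
\begin{equation}
\log_2|\mathcal{M}'| = \overline{I}_H^{\varepsilon-\eta}(B'; B)_{\xi} - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right), \tag{12.1.58}
\end{equation}
where $\xi_{B'B} := \mathcal{N}_{A'\to B}(\rho_{A'B'})$. Note that $\xi_{B'B}$ is a classical–quantum state, which
means that $\overline{I}_H^{\varepsilon-\eta}(B'; B)_{\xi} = \overline{\chi}_H^{\varepsilon-\eta}(B'; B)_{\xi}$. Furthermore, by taking a supremum over
every input ensemble $\{(r(x), \rho_A^x)\}_{x\in\mathcal{X}}$, we find that
\begin{equation}
\log_2|\mathcal{M}'| = \overline{\chi}_H^{\varepsilon-\eta}(\mathcal{N}) - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right). \tag{12.1.59}
\end{equation}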

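Combining (12.1.57) and (12.1.59), we can already conclude that, when shared
randomness is available for free, a lower bound on the number of transmitted bits
is given by (12.1.59). The condition in (12.1.57) on the message error probability
implies that the average error probability $p_{\mathrm{err}}((\mathcal{E}', \mathcal{D}'); p, \mathcal{N})$, with $p$ the uniform
distribution over $\mathcal{M}'$, satisfies
\begin{equation}
p_{\mathrm{err}}((\mathcal{E}', \mathcal{D}'); p, \mathcal{N}) = \frac{1}{|\mathcal{M}'|} \sum_{m\in\mathcal{M}'} p_{\mathrm{err}}(m, (\mathcal{E}', \mathcal{D}'); \mathcal{N}) \leq \varepsilon. \tag{12.1.60}
\end{equation}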

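Let us now use the expression in (11.1.104) to derive an exact expression for
the average error probability. The expression in (11.1.104) is
\begin{equation}
p_{\mathrm{err}}(m, (\mathcal{E}, \mathcal{D}); \mathcal{N}) = 1 - \operatorname{Tr}\!\left[P_m \hat{P}_{m-1} \cdots \hat{P}_1\, \omega^m_{B'_1\cdots B'_{|\mathcal{M}'|}BR_1\cdots R_{|\mathcal{M}'|}}\, \hat{P}_1 \cdots \hat{P}_{m-1} P_m\right] \tag{12.1.61}
\end{equation}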

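for all $m \in \mathcal{M}'$. First, observe that both $\mathcal{N}_{A'\to B}(\rho_{A'B'})$ and $\rho_{B'} \otimes \mathcal{N}_{A'\to B}(\rho_{A'})$ are
classical–quantum states. This means that, for every measurement operator $\Lambda_{B'B}$,
we find that
\begin{align}
\operatorname{Tr}[\Lambda_{B'B}\, \mathcal{N}_{A'\to B}(\rho_{B'A'})] &= \sum_{x\in\mathcal{X}} r(x)\operatorname{Tr}[\Lambda_{B'B}(|x\rangle\!\langle x|_{B'} \otimes \rho_B^x)] \tag{12.1.62}\\
&= \sum_{x\in\mathcal{X}} r(x)\operatorname{Tr}[(\langle x|_{B'} \otimes \mathbb{1}_B)\,\Lambda_{B'B}\,(|x\rangle_{B'} \otimes \mathbb{1}_B)\,\rho_B^x] \tag{12.1.63}\\
&= \sum_{x\in\mathcal{X}} r(x)\operatorname{Tr}[M_B^x \rho_B^x], \tag{12.1.64}
\end{align}
where $\rho_B^x = \mathcal{N}_{A'\to B}(\rho_{A'}^x)$, and we have defined the operators
\begin{equation}
M_B^x := \operatorname{Tr}_{B'}[(|x\rangle\!\langle x|_{B'} \otimes \mathbb{1}_B)\,\Lambda_{B'B}]. \tag{12.1.65}
\end{equation}
Similarly, letting $\rho_B := \sum_{x\in\mathcal{X}} r(x)\,\rho_B^x$, we find that
\begin{align}
\operatorname{Tr}[\Lambda_{B'B}(\rho_{B'} \otimes \mathcal{N}_{A'\to B}(\rho_{A'}))] &= \sum_{x\in\mathcal{X}} r(x)\operatorname{Tr}[\Lambda_{B'B}(|x\rangle\!\langle x|_{B'} \otimes \rho_B)] \tag{12.1.66}\\
&= \sum_{x\in\mathcal{X}} r(x)\operatorname{Tr}[M_B^x \rho_B]. \tag{12.1.67}
\end{align}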

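This implies that the measurement operator $\Lambda^*_{B'B}$ that achieves the optimal value
for the quantity $D_H^{\varepsilon-\eta}(\mathcal{N}_{A'\to B}(\rho_{A'B'}) \| \rho_{B'} \otimes \mathcal{N}_{A'\to B}(\rho_{A'}))$ can be taken to have the
form
\begin{equation}
\Lambda^*_{B'B} = \sum_{x\in\mathcal{X}} |x\rangle\!\langle x|_{B'} \otimes M_B^x. \tag{12.1.68}
\end{equation}
Now using the fact that
\begin{equation}
\sqrt{\Lambda^*_{B'B}} = \sum_{x\in\mathcal{X}} |x\rangle\!\langle x|_{B'} \otimes \sqrt{M_B^x}, \tag{12.1.69}
\end{equation}
the projectors $\Pi_{B'BR}$ defined in (11.1.116) have the following form:
\begin{equation}
\Pi_{B'BR} = \sum_{x\in\mathcal{X}} |x\rangle\!\langle x|_{B'} \otimes \Pi^x_{BR}, \tag{12.1.70}
\end{equation}
where $R$ is a reference system held by Bob to help with the decoding, and the
projector $\Pi^x_{BR}$ is given by
\begin{align}
\Pi^x_{BR} &:= (U^x_{BR})^{\dagger}\,(\mathbb{1}_B \otimes |1\rangle\!\langle 1|_R)\,U^x_{BR}, \tag{12.1.71}\\
U^x_{BR} &:= \sqrt{\mathbb{1}_B - M_B^x} \otimes (|0\rangle\!\langle 0|_R + |1\rangle\!\langle 1|_R) + \sqrt{M_B^x} \otimes (|1\rangle\!\langle 0|_R - |0\rangle\!\langle 1|_R). \tag{12.1.72}
\end{align}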
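The construction in (12.1.71)–(12.1.72) can be sanity-checked numerically: $U^x_{BR}$ should be unitary, $\Pi^x_{BR}$ should be a projector, and, with the reference system prepared in $|0\rangle\!\langle 0|_R$, the projector should reproduce the statistics of the operator $M_B^x$. The sketch below uses a randomly generated operator `m_op` with $0 \leq M \leq \mathbb{1}$ and a random state `rho`; these inputs and all function names are illustrative assumptions.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive semi-definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)      # guard against tiny negative rounding
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def naimark_projector(m_op):
    """Return (U, Pi) on B (x) R built from M as in (12.1.71)-(12.1.72)."""
    dim = m_op.shape[0]
    id_b, id_r = np.eye(dim), np.eye(2)
    ket0 = np.array([[1.0], [0.0]])
    ket1 = np.array([[0.0], [1.0]])
    flip = ket1 @ ket0.T - ket0 @ ket1.T             # |1><0| - |0><1|
    u = np.kron(psd_sqrt(id_b - m_op), id_r) + np.kron(psd_sqrt(m_op), flip)
    proj = u.conj().T @ np.kron(id_b, ket1 @ ket1.T) @ u
    return u, proj

rng = np.random.default_rng(0)
dim = 3

# Random operator with eigenvalues in [0, 1] and a random density matrix.
g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
_, vecs = np.linalg.eigh(g + g.conj().T)
m_op = (vecs * rng.uniform(0.0, 1.0, size=dim)) @ vecs.conj().T
h = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
rho = h @ h.conj().T
rho = rho / np.trace(rho).real

u, proj = naimark_projector(m_op)
ref0 = np.diag([1.0, 0.0])                           # |0><0| on R

lhs = np.trace(proj @ np.kron(rho, ref0)).real       # Tr[Pi (rho (x) |0><0|)]
rhs = np.trace(m_op @ rho).real                      # Tr[M rho]
print(lhs, rhs)
```

Working with projectors on the enlarged space $BR$ is what allows the quantum union bound (Theorem 11.7), which is stated for projectors, to be applied to the sequential decoder.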

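This in turn implies that the measurement operators $P_i$, which are used for the
sequential decoding and are defined in (11.1.100), have the form
\begin{equation}
P_i = \sum_{x_1,\ldots,x_{|\mathcal{M}'|}\in\mathcal{X}} |\underline{x}\rangle\!\langle\underline{x}|_{B'_1\cdots B'_{|\mathcal{M}'|}} \otimes P_i^{x_i}, \tag{12.1.73}
\end{equation}
where $|\underline{x}\rangle \equiv |x_1, \ldots, x_{|\mathcal{M}'|}\rangle$ and
\begin{equation}
P_i^{x_i} := \mathbb{1}_{R_1} \otimes \cdots \otimes \mathbb{1}_{R_{i-1}} \otimes \Pi^{x_i}_{BR_i} \otimes \mathbb{1}_{R_{i+1}} \otimes \cdots \otimes \mathbb{1}_{R_{|\mathcal{M}'|}}. \tag{12.1.74}
\end{equation}
Finally, since we can write the state $\tau^m_{B'_1\cdots B'_{|\mathcal{M}'|}B}$ in (12.1.55) as
\begin{equation}
\tau^m_{B'_1\cdots B'_{|\mathcal{M}'|}B} = \sum_{x_1,\ldots,x_{|\mathcal{M}'|}\in\mathcal{X}} r(x_1)\cdots r(x_{|\mathcal{M}'|})\,|\underline{x}\rangle\!\langle\underline{x}|_{B'_1\cdots B'_{|\mathcal{M}'|}} \otimes \rho_B^{x_m}, \tag{12.1.75}
\end{equation}
we find that
\begin{align}
\omega^m_{B'_1\cdots B'_{|\mathcal{M}'|}BR_1\cdots R_{|\mathcal{M}'|}} &:= \tau^m_{B'_1\cdots B'_{|\mathcal{M}'|}B} \otimes |\underline{0}\rangle\!\langle\underline{0}|_{R_1\cdots R_{|\mathcal{M}'|}} \tag{12.1.76}\\
&= \sum_{x_1,\ldots,x_{|\mathcal{M}'|}\in\mathcal{X}} r(x_1)\cdots r(x_{|\mathcal{M}'|})\,|\underline{x}\rangle\!\langle\underline{x}|_{B'_1\cdots B'_{|\mathcal{M}'|}} \otimes \left(\rho_B^{x_m} \otimes |\underline{0}\rangle\!\langle\underline{0}|_{R_1\cdots R_{|\mathcal{M}'|}}\right), \tag{12.1.77}
\end{align}
where $|\underline{0}\rangle \equiv |0, \ldots, 0\rangle$. Therefore, by definition,
\begin{align}
p_{\mathrm{err}}(m, (\mathcal{E}', \mathcal{D}'); \mathcal{N}) &= 1 - \operatorname{Tr}\!\left[P_m \hat{P}_{m-1} \cdots \hat{P}_1\, \omega^m_{B'_1\cdots B'_{|\mathcal{M}'|}BR_1\cdots R_{|\mathcal{M}'|}}\, \hat{P}_1 \cdots \hat{P}_{m-1} P_m\right] \tag{12.1.78}\\
&= \sum_{x_1,\ldots,x_{|\mathcal{M}'|}\in\mathcal{X}} r(x_1)\cdots r(x_{|\mathcal{M}'|}) \left[1 - \operatorname{Tr}\!\left[\Omega_m^{x_m}\left(\rho_B^{x_m} \otimes |\underline{0}\rangle\!\langle\underline{0}|_{R_1\cdots R_{|\mathcal{M}'|}}\right)\right]\right], \tag{12.1.79}
\end{align}
for all $m \in \mathcal{M}'$, where
\begin{equation}
\Omega_m^{x_m} := \hat{P}_1^{x_1} \cdots \hat{P}_{m-1}^{x_{m-1}}\, P_m^{x_m}\, \hat{P}_{m-1}^{x_{m-1}} \cdots \hat{P}_1^{x_1}. \tag{12.1.80}
\end{equation}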

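Therefore, the average error probability is bounded as
\begin{align}
p_{\mathrm{err}}((\mathcal{E}', \mathcal{D}'); p, \mathcal{N}) &= \sum_{m\in\mathcal{M}'} \frac{1}{|\mathcal{M}'|} \sum_{x_1,\ldots,x_{|\mathcal{M}'|}\in\mathcal{X}} r(x_1)\cdots r(x_{|\mathcal{M}'|}) \left[1 - \operatorname{Tr}\!\left[\Omega_m^{x_m}\left(\rho_B^{x_m} \otimes |\underline{0}\rangle\!\langle\underline{0}|_{R_1\cdots R_{|\mathcal{M}'|}}\right)\right]\right] \tag{12.1.81}\\
&\leq \sum_{m\in\mathcal{M}'} \frac{1}{|\mathcal{M}'|} \sum_{x_1,\ldots,x_{|\mathcal{M}'|}\in\mathcal{X}} r(x_1)\cdots r(x_{|\mathcal{M}'|}) \left[\gamma_{\mathrm{I}} \operatorname{Tr}[(\mathbb{1}_B - M_B^{x_m})\rho_B^{x_m}] + \gamma_{\mathrm{II}} \sum_{i=1}^{m-1} \operatorname{Tr}[M_B^{x_i}\rho_B^{x_m}]\right] \tag{12.1.82}\\
&\leq \varepsilon, \tag{12.1.83}
\end{align}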

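where $\gamma_{\mathrm{I}} := 1 + c$ and $\gamma_{\mathrm{II}} := 2 + c + c^{-1}$, with $c = \frac{\eta}{2\varepsilon-\eta}$, and the inequality on the
last line holds due to (12.1.60). Exchanging the sum over $\mathcal{M}'$ with the sum over the
elements $x_1, \ldots, x_{|\mathcal{M}'|}$ of $\mathcal{X}$ (in the spirit of the famous trick of Shannon), we find
that
\begin{equation}
\sum_{x_1,\ldots,x_{|\mathcal{M}'|}\in\mathcal{X}} r(x_1)\cdots r(x_{|\mathcal{M}'|})\, p_{\mathrm{err}}(\mathcal{C}; p, \mathcal{N}) \leq \sum_{x_1,\ldots,x_{|\mathcal{M}'|}\in\mathcal{X}} r(x_1)\cdots r(x_{|\mathcal{M}'|})\, u_{\mathrm{err}}(\mathcal{C}; p, \mathcal{N}) \leq \varepsilon, \tag{12.1.84}
\end{equation}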

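where
\begin{equation}
p_{\mathrm{err}}(\mathcal{C}; p, \mathcal{N}) := \sum_{m\in\mathcal{M}'} \frac{1}{|\mathcal{M}'|} \left(1 - \operatorname{Tr}\!\left[\Omega_m^{x_m}\left(\rho_B^{x_m} \otimes |\underline{0}\rangle\!\langle\underline{0}|_{R_1\cdots R_{|\mathcal{M}'|}}\right)\right]\right) \tag{12.1.85}
\end{equation}
is the average error probability under a code $\mathcal{C}$ in which each message $m$ is encoded
as $m \mapsto x_m \mapsto \rho_A^{x_m}$ and
\begin{equation}
u_{\mathrm{err}}(\mathcal{C}; p, \mathcal{N}) := \sum_{m\in\mathcal{M}'} \frac{1}{|\mathcal{M}'|} \left(\gamma_{\mathrm{I}} \operatorname{Tr}[(\mathbb{1}_B - M_B^{x_m})\rho_B^{x_m}] + \gamma_{\mathrm{II}} \sum_{i=1}^{m-1} \operatorname{Tr}[M_B^{x_i}\rho_B^{x_m}]\right) \tag{12.1.86}
\end{equation}
is an upper bound on the average error probability $p_{\mathrm{err}}(\mathcal{C}; p, \mathcal{N})$. The decoding
is defined by the measurement operators $\{\Omega_m^{x_m}\}_{m\in\mathcal{M}'}$. Note that the code $\mathcal{C}$ is a
random variable, in the sense that the string $x_1, \ldots, x_{|\mathcal{M}'|}$ of length $|\mathcal{M}'|$ is used for
the encoding and decoding with probability $r(x_1)\cdots r(x_{|\mathcal{M}'|})$.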
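The constant $c = \frac{\eta}{2\varepsilon-\eta}$ can be motivated by a short numerical check. The sketch below assumes, following the structure of the argument imported from Section 11.1.3, that averaging the bound in (12.1.82) over the random code leaves a first term of at most $\gamma_{\mathrm{I}}(\varepsilon-\eta)$ and a second term of at most $\gamma_{\mathrm{II}}\,|\mathcal{M}'|\,2^{-D_H^{\varepsilon-\eta}}$, and that $|\mathcal{M}'|\,2^{-D_H^{\varepsilon-\eta}} = \frac{\eta^2}{4\varepsilon}$ by the choice of $|\mathcal{M}'|$ in (12.1.58). Under these assumptions the two terms add up to exactly $\varepsilon$. The function names are illustrative.

```python
def union_bound_constants(eps, eta):
    """gamma_I = 1 + c and gamma_II = 2 + c + 1/c with c = eta / (2 eps - eta)."""
    c = eta / (2.0 * eps - eta)
    return 1.0 + c, 2.0 + c + 1.0 / c

def averaged_error_bound(eps, eta):
    """Assumed form of the averaged bound: gamma_I (eps - eta) + gamma_II eta^2 / (4 eps)."""
    if not 0.0 < eta < eps < 1.0:
        raise ValueError("need 0 < eta < eps < 1")
    gamma_1, gamma_2 = union_bound_constants(eps, eta)
    missed_detection = eps - eta               # bound on Tr[(1 - Lambda*) xi]
    false_alarms = eta ** 2 / (4.0 * eps)      # |M'| 2^{-D_H}, fixed by (12.1.58)
    return gamma_1 * missed_detection + gamma_2 * false_alarms

for eps, eta in [(0.1, 0.03), (0.5, 0.2), (0.01, 0.005)]:
    print(eps, eta, averaged_error_bound(eps, eta))
```

Algebraically, $\gamma_{\mathrm{I}}(\varepsilon-\eta) = \frac{2\varepsilon(\varepsilon-\eta)}{2\varepsilon-\eta}$ and $\gamma_{\mathrm{II}}\frac{\eta^2}{4\varepsilon} = \frac{\varepsilon\eta}{2\varepsilon-\eta}$, whose sum is $\varepsilon$.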
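Since the minimum does not exceed the average, the inequality in (12.1.84)
implies that there exists a code $\mathcal{C}^*$, with corresponding string $x_1^*, \ldots, x_{|\mathcal{M}'|}^*$, such
that
\begin{equation}
u_{\mathrm{err}}(\mathcal{C}^*; p, \mathcal{N}) \leq \varepsilon, \tag{12.1.87}
\end{equation}
and in turn, via Theorem 11.7, that
\begin{equation}
p_{\mathrm{err}}(\mathcal{C}^*; p, \mathcal{N}) \leq u_{\mathrm{err}}(\mathcal{C}^*; p, \mathcal{N}) \leq \varepsilon. \tag{12.1.88}
\end{equation}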

By choosing this particular code, we can now follow through the entire argument
above without the shared randomness (in the form of the state 𝜌 𝐴′ 𝐵′ ) in order to
conclude that with the code C∗ , the number of transmitted bits is given by (12.1.59),
and the average error probability of the code is bounded from above by 𝜀. This
completes the derandomization part of the proof.
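Finally, we are interested in a code, call it $(\mathcal{E}, \mathcal{D})$, satisfying the maximal error
probability criterion $p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \leq \varepsilon$ instead of the average error probability
criterion. To find such a code, we can apply expurgation to the code $\mathcal{C}^*$ defined
above. Formally, this means the following: since we have a code satisfying
(12.1.87), by Markov's inequality (see (2.3.20)), half of the codewords in $\mathcal{C}^*$ (call
them $c_1, \ldots, c_{|\mathcal{M}'|/2}$) satisfy
\begin{equation}
u_{\mathrm{err}}(m, \mathcal{C}^*; \mathcal{N}) \leq 2\varepsilon, \tag{12.1.89}
\end{equation}
for all $m \in \mathcal{M}'$ corresponding to the codewords $c_1, \ldots, c_{|\mathcal{M}'|/2}$, where
\begin{equation}
u_{\mathrm{err}}(m, \mathcal{C}^*; \mathcal{N}) := \gamma_{\mathrm{I}} \operatorname{Tr}[(\mathbb{1}_B - M_B^{x_m})\rho_B^{x_m}] + \gamma_{\mathrm{II}} \sum_{i=1}^{m-1} \operatorname{Tr}[M_B^{x_i}\rho_B^{x_m}]. \tag{12.1.90}
\end{equation}
We thus define a new message set $\mathcal{M} \subset \mathcal{M}'$, with $|\mathcal{M}| = \frac{|\mathcal{M}'|}{2}$, by removing all but
those messages in $\mathcal{M}'$ whose encodings are given by $c_1, \ldots, c_{|\mathcal{M}'|/2}$. Let $\mathcal{C}$ denote
the expurgated code. Due to the fact that all of the terms in $u_{\mathrm{err}}(m, \mathcal{C}^*; \mathcal{N})$ are
non-negative, we find for all $m \in \mathcal{M}$ that
\begin{equation}
u_{\mathrm{err}}(m, \mathcal{C}; \mathcal{N}) \leq u_{\mathrm{err}}(m, \mathcal{C}^*; \mathcal{N}), \tag{12.1.91}
\end{equation}
where
\begin{equation}
u_{\mathrm{err}}(m, \mathcal{C}; \mathcal{N}) := \gamma_{\mathrm{I}} \operatorname{Tr}[(\mathbb{1}_B - M_B^{c_m})\rho_B^{c_m}] + \gamma_{\mathrm{II}} \sum_{i=1}^{m-1} \operatorname{Tr}[M_B^{c_i}\rho_B^{c_m}]. \tag{12.1.92}
\end{equation}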

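Again applying the quantum union bound (Theorem 11.7), we then find that
\begin{equation}
p_{\mathrm{err}}(m, \mathcal{C}; \mathcal{N}) \leq u_{\mathrm{err}}(m, \mathcal{C}; \mathcal{N}), \tag{12.1.93}
\end{equation}
where
\begin{equation}
p_{\mathrm{err}}(m, \mathcal{C}; \mathcal{N}) := 1 - \operatorname{Tr}\!\left[P_m^{c_m} \hat{P}_{m-1}^{c_{m-1}} \cdots \hat{P}_1^{c_1} \left(\rho_B^{c_m} \otimes |\underline{0}\rangle\!\langle\underline{0}|_{R_1\cdots R_{|\mathcal{M}'|}}\right) \hat{P}_1^{c_1} \cdots \hat{P}_{m-1}^{c_{m-1}} P_m^{c_m}\right]. \tag{12.1.94}
\end{equation}
We thus have a code $(\mathcal{E}, \mathcal{D})$ satisfying
\begin{equation}
p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \leq 2\varepsilon. \tag{12.1.95}
\end{equation}
Specifically, the encoding is given by $m \mapsto c_m$ for all $m \in \mathcal{M}$, and the decoding
is given by the sequential decoding procedure consisting of sequentially applying
the binary measurements $\{P_m^{c_m}, \hat{P}_m^{c_m}\}$ for all $m \in \mathcal{M}$ and decoding as message $m$ as
soon as the outcome $P_m^{c_m}$ occurs.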
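The Markov-inequality step behind expurgation can be illustrated in isolation: if the per-message bounds average to at most $\varepsilon$, then keeping the better half of the messages leaves every surviving message with a bound of at most $2\varepsilon$, at the cost of one bit. The sketch below uses randomly generated per-message bounds as a stand-in for the values $u_{\mathrm{err}}(m, \mathcal{C}^*; \mathcal{N})$; it does not model the additional fact, used in (12.1.91), that removing codewords can only decrease the bound for the remaining ones.

```python
import numpy as np

def expurgate(per_message_bounds):
    """Indices of the better half of the messages, kept in their original order."""
    bounds = np.asarray(per_message_bounds, dtype=float)
    best_half = np.argsort(bounds)[: len(bounds) // 2]
    return np.sort(best_half)

rng = np.random.default_rng(1)
bounds = rng.uniform(0.0, 0.2, size=16)   # hypothetical u_err(m, C*; N) values
average = bounds.mean()                   # plays the role of epsilon

kept = expurgate(bounds)
worst_kept = bounds[kept].max()
bits_lost = np.log2(len(bounds)) - np.log2(len(kept))

print(average, worst_kept, bits_lost)
```

Sorting the retained indices preserves the relative order of the surviving messages, which matters for a sequential decoder because each message's bound in (12.1.92) sums over the messages tested before it.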


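Therefore, we can use (12.1.59) to obtain the following for the number $\log_2|\mathcal{M}|$
of transmitted bits with the reduced message set:
\begin{align}
\log_2|\mathcal{M}| &= \log_2\!\left(\frac{|\mathcal{M}'|}{2}\right) = \log_2|\mathcal{M}'| - \log_2(2) \tag{12.1.96}\\
&= \overline{\chi}_H^{\varepsilon-\eta}(\mathcal{N}) - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right) - \log_2(2) \tag{12.1.97}\\
&= \overline{\chi}_H^{\varepsilon-\eta}(\mathcal{N}) - \log_2\!\left(\frac{8\varepsilon}{\eta^2}\right). \tag{12.1.98}
\end{align}
Since $\varepsilon$ and $\eta$ are arbitrary, we have shown that for all $\varepsilon \in (0, 1)$ and $\eta \in (0, \varepsilon)$,
there exists an $(|\mathcal{M}|, 2\varepsilon)$ classical communication protocol satisfying
$\log_2|\mathcal{M}| = \overline{\chi}_H^{\varepsilon-\eta}(\mathcal{N}) - \log_2\!\left(\frac{8\varepsilon}{\eta^2}\right)$. By the substitution $2\varepsilon \to \varepsilon$, we can finally say that for all
$\varepsilon \in (0, 1)$ and $\eta \in \left(0, \frac{\varepsilon}{2}\right)$, there exists an $(|\mathcal{M}|, \varepsilon)$ classical communication protocol
satisfying
\begin{equation}
\log_2|\mathcal{M}| = \overline{\chi}_H^{\frac{\varepsilon}{2}-\eta}(\mathcal{N}) - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right). \tag{12.1.99}
\end{equation}
This concludes the proof. ■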

An immediate consequence of Propositions 12.5 and 7.72 is the following
theorem.

Theorem 12.6 One-Shot Lower Bounds for Classical Communication

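Let $\mathcal{N}_{A\to B}$ be a quantum channel. For all $\varepsilon \in (0, 1)$, $\eta \in \left(0, \frac{\varepsilon}{2}\right)$, and $\alpha \in (0, 1)$,
there exists an $(|\mathcal{M}|, \varepsilon)$ classical communication protocol over $\mathcal{N}_{A\to B}$ such that
\begin{equation}
\log_2|\mathcal{M}| \geq \overline{\chi}_{\alpha}(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{\frac{\varepsilon}{2}-\eta}\right) - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right). \tag{12.1.100}
\end{equation}
Here,
\begin{equation}
\overline{\chi}_{\alpha}(\mathcal{N}) := \sup_{\rho_{XA}} \overline{I}_{\alpha}(X; B)_{\omega}, \tag{12.1.101}
\end{equation}
where $\omega_{XB} = \mathcal{N}_{A\to B}(\rho_{XA})$, the state $\rho_{XA}$ is a classical–quantum state, and
\begin{equation}
\overline{I}_{\alpha}(X; B)_{\omega} := D_{\alpha}(\omega_{XB} \| \omega_X \otimes \omega_B). \tag{12.1.102}
\end{equation}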


Remark: The quantity $\overline{\chi}_{\alpha}(\mathcal{N})$ defined in the statement of Theorem 12.6 above is similar to
the quantity $\chi_{\alpha}(\mathcal{N})$ defined in (7.11.94), except that it is defined with respect to the mutual
information $\overline{I}_{\alpha}(X; B)_{\omega}$ that we encountered in Theorem 11.9, which does not involve an
optimization over states $\sigma_B$.

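Proof: From Proposition 12.5, we know that for all $\varepsilon \in (0, 1)$ and $\eta \in \left(0, \frac{\varepsilon}{2}\right)$, there
exists an $(|\mathcal{M}|, \varepsilon)$ classical communication protocol such that
\begin{equation}
\log_2|\mathcal{M}| = \overline{\chi}_H^{\frac{\varepsilon}{2}-\eta}(\mathcal{N}) - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right). \tag{12.1.103}
\end{equation}
Proposition 7.72 relates the hypothesis testing relative entropy to the Petz–Rényi
relative entropy according to
\begin{equation}
D_H^{\varepsilon}(\rho\|\sigma) \geq D_{\alpha}(\rho\|\sigma) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{\varepsilon}\right) \tag{12.1.104}
\end{equation}
for all $\alpha \in (0, 1)$, which implies that
\begin{equation}
\overline{\chi}_H^{\varepsilon}(\mathcal{N}) \geq \overline{\chi}_{\alpha}(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{\varepsilon}\right). \tag{12.1.105}
\end{equation}
Combining this inequality with (12.1.103), we immediately get the desired result. ■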

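Since Theorem 12.6 guarantees the existence of an $(|\mathcal{M}|, \varepsilon)$ classical communication
protocol satisfying the inequality in (12.1.100), we have that
\begin{equation}
C^{\varepsilon}(\mathcal{N}) \geq \overline{\chi}_{\alpha}(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{\frac{\varepsilon}{2}-\eta}\right) - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right) \tag{12.1.106}
\end{equation}
for all $\alpha \in (0, 1)$, $\varepsilon \in (0, 1)$, and $\eta \in \left(0, \frac{\varepsilon}{2}\right)$.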
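The right-hand side of (12.1.106) is easy to evaluate once a value of $\overline{\chi}_{\alpha}(\mathcal{N})$ is available, which makes the size of the two penalty terms visible. The sketch below takes that value as an input; the function name and the numbers are illustrative assumptions. Note that $\frac{\alpha}{\alpha-1}$ is negative for $\alpha \in (0, 1)$, so both correction terms reduce the bound.

```python
import math

def renyi_lower_bound(chi_alpha, alpha, eps, eta):
    """Right-hand side of (12.1.106) for a given value chi_alpha of the Renyi quantity."""
    if not (0.0 < alpha < 1.0 and 0.0 < eps < 1.0 and 0.0 < eta < eps / 2.0):
        raise ValueError("need alpha in (0,1), eps in (0,1), eta in (0, eps/2)")
    renyi_penalty = alpha / (alpha - 1.0) * math.log2(1.0 / (eps / 2.0 - eta))
    code_penalty = math.log2(4.0 * eps / eta ** 2)
    return chi_alpha + renyi_penalty - code_penalty

# Hypothetical single-use value chi_alpha = 1 bit at alpha = 1/2.
print(renyi_lower_bound(1.0, 0.5, 0.5, 0.125))

# With n channel uses the first term can scale linearly in n while the
# penalties stay fixed; 1000.0 stands in for such a hypothetical n-use value.
print(renyi_lower_bound(1000.0, 0.5, 0.5, 0.125))
```

For a single channel use with these numbers the bound is negative and hence trivial; it becomes informative only when the first term is large compared with the penalties, which is the regime of many channel uses considered next.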

12.2 Classical Capacity of a Quantum Channel

Let us now consider the asymptotic setting of classical communication, as depicted
in Figure 12.3. Similar to entanglement-assisted classical communication, instead
of encoding the message into one quantum system and consequently using the

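[Figure 12.3 diagram: Alice's message $m \in \mathcal{M}$ enters the encoder $\mathcal{E}$, whose output systems $A_1, A_2, \ldots, A_{n-1}, A_n$ each pass through a copy of the channel $\mathcal{N}$ to systems $B_1, B_2, \ldots, B_{n-1}, B_n$; Bob's decoder outputs the estimate $\hat{m}$.]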

Figure 12.3: The most general classical communication protocol over
$n \geq 1$ uses of a quantum channel $\mathcal{N}$. Alice, who wishes to send a
message $m$ selected from a set $\mathcal{M}$, first encodes the message into a quantum
state on $n$ quantum systems using a classical–quantum encoding channel $\mathcal{E}$. She
then sends each quantum system through the channel $\mathcal{N}$. After Bob receives the
systems, he performs a collective measurement on them, using the outcome of
the measurement to give an estimate $\hat{m}$ of the message $m$ sent to him by Alice.

channel N only once, Alice encodes the message into 𝑛 ≥ 1 quantum systems
𝐴1 , . . . , 𝐴𝑛 , all with the same dimension as 𝐴, and sends each one of these through
the channel N. We call this the asymptotic setting because the number 𝑛 of channel
uses can be arbitrarily large.
Recall that in the case of entanglement-assisted classical communication, we
showed that encoding channels that entangle the 𝑛 systems 𝐴1 , . . . , 𝐴𝑛 do not help
to achieve higher rates in the asymptotic setting. This is due to the additivity of the
mutual information and the additivity of the sandwiched Rényi mutual information
of a channel for all channels and 𝛼 > 1. In the case of classical communication that
we consider in this chapter, the corresponding statement is known to be false in
general for the Holevo information of a quantum channel (please consult the
Bibliographic Notes in Section 12.5). That is, there exist channels for which the
Holevo information is not additive. Therefore, unlike entanglement-assisted
classical communication, concrete expressions for the classical capacity are known
only for specific classes of channels.
The analysis of the classical communication protocol in the asymptotic setting
is almost exactly the same as in the one-shot setting. This is due to the fact that 𝑛
independent uses of the channel N can be regarded as a single use of the channel
N ⊗𝑛 . So the only change that needs to be made is to replace N with N ⊗𝑛 and to
define the states and POVM elements as acting on 𝑛 systems instead of just one. In
particular, the state at the end of the protocol presented in (12.1.8)–(12.1.9) at the
beginning of Section 12.1 is

\[ \omega^{p}_{M\widehat{M}} = (\mathcal{D}_{B^n\to\widehat{M}} \circ \mathcal{N}^{\otimes n}_{A\to B} \circ \mathcal{E}_{M'\to A^n})(\Phi^{p}_{MM'}), \tag{12.2.1} \]

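where 𝑝 is the prior probability distribution over the message set M, the encoding
channel $\mathcal{E}_{M'\to A^n}$ is defined as

\[ \mathcal{E}_{M'\to A^n}(|m\rangle\langle m|_{M'}) = \rho^{m}_{A^n} \quad \forall\, m \in \mathcal{M}, \tag{12.2.2} \]

and the decoding channel $\mathcal{D}_{B^n\to\widehat{M}}$, with associated POVM $\{\Lambda^{m}_{B^n}\}_{m\in\mathcal{M}}$, is defined as

\[ \mathcal{D}_{B^n\to\widehat{M}}(\tau_{B^n}) = \sum_{m\in\mathcal{M}} \operatorname{Tr}[\Lambda^{m}_{B^n}\tau_{B^n}]\,|m\rangle\langle m|_{\widehat{M}}. \tag{12.2.3} \]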

Then, for every given code specified by the encoding and decoding channels, the
definitions of the message error probability of the code, the average error probability
of the code, and the maximal error probability of the code all follow analogously
from their definitions in (12.1.11), (12.1.13), and (12.1.15), respectively, in the
one-shot setting.

Definition 12.7 (𝒏, |M|, 𝜺) Classical Communication Protocol

A classical communication protocol $(\mathcal{M}, \mathcal{E}_{M'\to A^n}, \mathcal{D}_{B^n\to\widehat{M}})$ over 𝑛 uses of the
channel N 𝐴→𝐵 is called an (𝑛, |M|, 𝜀) protocol, with 𝜀 ∈ [0, 1], if $p^{*}_{\text{err}}(\mathcal{E}, \mathcal{D}) \leq \varepsilon$.

Just as in the case of entanglement-assisted classical communication, the rate of
a classical communication protocol over 𝑛 uses of a channel is simply the number
of bits that can be transmitted per channel use, i.e.,

\[ R(n, |\mathcal{M}|) := \frac{1}{n}\log_2|\mathcal{M}|. \tag{12.2.4} \]

Given a channel N 𝐴→𝐵 and 𝜀 ∈ [0, 1], the maximum rate of classical communication
over N among all (𝑛, |M|, 𝜀) protocols is

\[ C^{n,\varepsilon}(\mathcal{N}) := \frac{1}{n} C^{\varepsilon}(\mathcal{N}^{\otimes n}) = \sup_{(\mathcal{M},\mathcal{E},\mathcal{D})} \left\{ \frac{1}{n}\log_2|\mathcal{M}| : p^{*}_{\text{err}}(\mathcal{E},\mathcal{D};\mathcal{N}^{\otimes n}) \leq \varepsilon \right\}, \tag{12.2.5} \]

where the optimization is with respect to every classical communication protocol
$(\mathcal{M}, \mathcal{E}_{M'\to A^n}, \mathcal{D}_{B^n\to\widehat{M}})$ over $\mathcal{N}^{\otimes n}$, with $d_{M'} = d_{\widehat{M}} = |\mathcal{M}|$.

As with entanglement-assisted classical communication, the goal of a classical

communication protocol is to maximize the rate while at the same time keeping
the maximal error probability low. Ideally, we would want the error probability
to vanish, and since we want to determine the highest possible rate, we are not
concerned about the practical question regarding how many channel uses might be
required, at least in the asymptotic setting. In particular, as we will see below, it
might take an arbitrarily large number of channel uses to obtain the highest rate
with a vanishing error probability.

Definition 12.8 Achievable Rate for Classical Communication

Given a quantum channel N, a rate 𝑅 ∈ R+ is called an achievable rate for
classical communication over N if for all 𝜀 ∈ (0, 1], 𝛿 > 0, and sufficiently
large 𝑛, there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ classical communication protocol.

As we prove in Appendix A,

\[ R \text{ achievable rate} \iff \lim_{n\to\infty} \varepsilon^{*}_{C}(2^{n(R-\delta)}; \mathcal{N}^{\otimes n}) = 0 \quad \forall\, \delta > 0. \tag{12.2.6} \]

In other words, a rate 𝑅 is achievable if the optimal error probability for a sequence
of protocols with rate 𝑅 − 𝛿, 𝛿 > 0, vanishes as the number 𝑛 of uses of N increases.

Definition 12.9 Classical Capacity of a Quantum Channel

The classical capacity of a quantum channel N, denoted by 𝐶 (N), is defined
as the supremum of all achievable rates, i.e.,

𝐶 (N) B sup{𝑅 : 𝑅 is an achievable rate for N}. (12.2.7)

The classical capacity can also be written as

\[ C(\mathcal{N}) = \inf_{\varepsilon\in(0,1]} \liminf_{n\to\infty} \frac{1}{n} C^{\varepsilon}(\mathcal{N}^{\otimes n}). \tag{12.2.8} \]

See Appendix A for a proof.


Definition 12.10 Weak Converse Rate for Classical Communication

Given a quantum channel N, a rate 𝑅 ∈ R+ is called a weak converse rate for
classical communication over N if every 𝑅′ > 𝑅 is not an achievable rate for N.

We show in Appendix A that

\[ R \text{ weak converse rate} \iff \lim_{n\to\infty} \varepsilon^{*}_{C}(2^{n(R+\delta)}; \mathcal{N}^{\otimes n}) > 0 \quad \forall\, \delta > 0. \tag{12.2.9} \]
In other words, a weak converse rate is a rate above which the optimal error
probability cannot be made to vanish in the limit of a large number of channel uses.

Definition 12.11 Strong Converse Rate for Classical Communication

Given a quantum channel N, a rate 𝑅 ∈ R+ is called a strong converse rate for
classical communication over N if for all 𝜀 ∈ [0, 1), 𝛿 > 0, and sufficiently
large 𝑛, there does not exist an $(n, 2^{n(R+\delta)}, \varepsilon)$ classical communication protocol
over N.

We show in Appendix A that

\[ R \text{ strong converse rate} \iff \lim_{n\to\infty} \varepsilon^{*}_{C}(2^{n(R+\delta)}; \mathcal{N}^{\otimes n}) = 1 \quad \forall\, \delta > 0. \tag{12.2.10} \]
In other words, unlike the weak converse, in which the optimal error probability is
required to simply be bounded away from zero as the number 𝑛 of channel uses
increases, in order to have a strong converse rate the optimal error has to converge
to one as 𝑛 increases. By comparing (12.2.9) and (12.2.10), it is clear that every
strong converse rate is a weak converse rate.

Definition 12.12 Strong Converse Classical Capacity of a Quantum Channel

The strong converse classical capacity of a quantum channel N, denoted by
$\widetilde{C}(\mathcal{N})$, is defined as the infimum of all strong converse rates, i.e.,

\[ \widetilde{C}(\mathcal{N}) := \inf\{R : R \text{ is a strong converse rate for } \mathcal{N}\}. \tag{12.2.11} \]

As shown in general in Appendix A, the following inequality holds

\[ C(\mathcal{N}) \leq \widetilde{C}(\mathcal{N}) \tag{12.2.12} \]

for every quantum channel N. We can also write the strong converse classical
capacity as

\[ \widetilde{C}(\mathcal{N}) = \sup_{\varepsilon\in[0,1)} \limsup_{n\to\infty} \frac{1}{n} C^{\varepsilon}(\mathcal{N}^{\otimes n}). \tag{12.2.13} \]

See Appendix A for a proof.
Having defined the classical capacity of a quantum channel, as well as the strong
converse capacity, we now state one of the main theorems of this chapter, which
gives us a formal expression for the classical capacity of every quantum channel.

Theorem 12.13 Classical Capacity of a Quantum Channel

The classical capacity of a quantum channel N is equal to its regularized Holevo
information 𝜒reg (N), i.e.,

\[ C(\mathcal{N}) = \chi_{\text{reg}}(\mathcal{N}) := \lim_{n\to\infty} \frac{1}{n}\chi(\mathcal{N}^{\otimes n}). \tag{12.2.14} \]

Remark: The quantity $\chi_{\text{reg}}(\mathcal{N}) := \lim_{n\to\infty} \frac{1}{n}\chi(\mathcal{N}^{\otimes n})$ is called the regularization of the Holevo
information. It can be shown that the limit in the definition of 𝜒reg (N) does indeed exist (please
consult the Bibliographic Notes in Section 12.5).

Note that, unlike the case of entanglement-assisted classical communication in
Chapter 11, the right-hand side of (12.2.14) does not depend on only a single use
of the channel N. Rather, the capacity formula involves a limit over an arbitrarily
large number of uses of the channel and is essentially impossible to compute in
general, firstly because of the difficulty of computing the Holevo information of a
channel and secondly due to the limit over an arbitrarily large number of uses of
the channel. The issue is essentially that the Holevo information of a channel is not
additive in general, while the mutual information of a channel is
known to have this property (we show this in Theorem 11.19). However, as we
show below in Section 12.2.3, the Holevo information is superadditive, meaning
that $\chi(\mathcal{N}^{\otimes n}) \geq n\chi(\mathcal{N})$. This implies that the Holevo information is always a lower
bound on the classical capacity of every channel N:

\[ C(\mathcal{N}) \geq \chi(\mathcal{N}) \quad \text{for all channels } \mathcal{N}. \tag{12.2.15} \]
Channels for which the Holevo information is known to be additive include the
following:

1. All entanglement-breaking channels. (See Definition 4.12.)

2. All Hadamard channels. (See Definition 4.16.)
3. The depolarizing channel. (See (4.5.31).)
4. The erasure channel. (See (4.5.18).)
For all of these channels, we thus have that 𝐶 (N) = 𝜒(N).
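
For two of the channels in this list, the single-letter formula 𝐶 (N) = 𝜒(N) has a well-known closed form, sketched below. The parameterizations are assumptions made for this illustration: the erasure channel erases with probability p on a 𝑑-dimensional input, and the qubit depolarizing channel is written as 𝜌 ↦ (1 − q)𝜌 + q·1/2, which may differ from the convention in (4.5.31), so q would have to be converted accordingly.

```python
import math

def binary_entropy(x):
    """Binary entropy h2(x) in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def erasure_capacity(p, d):
    """Classical capacity (1 - p) log2(d) of a d-dimensional erasure channel."""
    return (1.0 - p) * math.log2(d)

def qubit_depolarizing_capacity(q):
    """Classical capacity 1 - h2(q/2) of rho -> (1 - q) rho + q I/2 (assumed form)."""
    return 1.0 - binary_entropy(q / 2.0)

c_erasure = erasure_capacity(0.25, 2)
c_depolarizing = qubit_depolarizing_capacity(0.2)
```

Both expressions depend only on a single use of the channel, in contrast to the regularized formula in (12.2.14).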
Also notice that, unlike Theorem 11.16 for entanglement-assisted classical
communication, Theorem 12.13 only makes a statement about the classical capacity
𝐶 (N) of all channels, not about the strong converse classical capacity $\widetilde{C}(\mathcal{N})$.
In the case of entanglement-assisted classical communication, proving that the
mutual information of a channel is a strong converse rate involved proving that
the sandwiched Rényi mutual information is additive for all channels, which we
established in Theorem 11.22. Similarly, in the case of classical communication
and attempting to follow a similar approach, the relevant quantity is the sandwiched
Rényi Holevo information $\widetilde{\chi}_\alpha$, defined in (7.11.95). Unlike the sandwiched Rényi
mutual information, the sandwiched Rényi Holevo information is not known to be
additive for all channels. However, as we show in Section 12.2.3.1, it is additive
for all entanglement-breaking channels. It is also additive for Hadamard and
depolarizing channels (please consult the Bibliographic Notes in Section 12.5).
The best we can do, at the moment, is to say that the regularized Holevo information
is a weak converse rate for all channels.
There are two ingredients to the proof of Theorem 12.13:
1. Achievability: We show that 𝜒reg (N) is an achievable rate. In general, to show
that 𝑅 ∈ R+ is achievable, we define encoding and decoding channels such that
for all 𝜀 ∈ (0, 1] and sufficiently large 𝑛, the encoding and decoding channels
correspond to $(n, 2^{nr}, \varepsilon)$ protocols with rates 𝑟 < 𝑅, as per Definition 12.8.
Thus, if 𝑅 is an achievable rate, then, given an error probability 𝜀, it is possible
to find an 𝑛 large enough, along with encoding and decoding channels, such
that the resulting protocol has rate arbitrarily close to 𝑅 and maximal error
probability bounded from above by 𝜀.
The achievability part of the proof establishes that 𝐶 (N) ≥ 𝜒reg (N).
2. Weak Converse: We show that 𝜒reg (N) is a weak converse rate, from which it
follows that 𝐶 (N) ≤ 𝜒reg (N). To show that 𝜒reg (N) is a weak converse rate,
we show that every achievable rate 𝑟 satisfies 𝑟 ≤ 𝜒reg (N).

The achievability and weak converse proofs establish that the classical capacity
is equal to the regularized Holevo information: 𝐶 (N) = 𝜒reg (N). Theorem 12.13
and the inequality in (12.2.12) allow us to conclude that

\[ \widetilde{C}(\mathcal{N}) \geq \lim_{n\to\infty} \frac{1}{n}\chi(\mathcal{N}^{\otimes n}). \tag{12.2.16} \]

We first establish in Section 12.2.1 that the rate 𝜒reg (N) is achievable for
classical communication over N. Then, in Section 12.2.2, we prove that 𝜒reg (N)
is a weak converse rate. We prove that the sandwiched Rényi Holevo information
of an entanglement-breaking channel is additive in Section 12.2.3. With this
additivity result, we prove in Section 12.2.4 that $C(\mathcal{N}) = \widetilde{C}(\mathcal{N}) = \chi(\mathcal{N})$ for all
entanglement-breaking channels.

12.2.1 Proof of Achievability

In this section, we prove that 𝜒reg (N) is an achievable rate for classical communi-
cation over N.
First, recall from Theorem 12.6 that for all 𝜀 ∈ (0, 1) and 𝜂 ∈ (0, 𝜀/2), there
exists an (|M|, 𝜀) classical communication protocol over N such that

\[ \log_2 |\mathcal{M}| \geq \chi_\alpha(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{\frac{\varepsilon}{2}-\eta}\right) - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right) \tag{12.2.17} \]

for all 𝛼 ∈ (0, 1), where we recall from (12.1.101) that

\begin{align}
\chi_\alpha(\mathcal{N}) &= \sup_{\rho_{XA}} I_\alpha(X;B)_\omega \tag{12.2.18} \\
&= \sup_{\rho_{XA}} D_\alpha(\mathcal{N}_{A\to B}(\rho_{XA}) \,\|\, \rho_X \otimes \mathcal{N}_{A\to B}(\rho_A)), \tag{12.2.19}
\end{align}

where $\omega_{XB} := \mathcal{N}_{A\to B}(\rho_{XA})$ and the optimization is over all classical–quantum
states $\rho_{XA}$. A simple corollary of this result is the following.


Corollary 12.14 Lower Bound for Classical Communication in Asymptotic Setting

Let N be a quantum channel. For all 𝜀 ∈ (0, 1], 𝑛 ∈ N, and 𝛼 ∈ (0, 1), there
exists an (𝑛, |M|, 𝜀) classical communication protocol over 𝑛 uses of N such
that

\[ \frac{1}{n}\log_2|\mathcal{M}| \geq \chi_\alpha(\mathcal{N}) - \frac{1}{n(1-\alpha)}\log_2\!\left(\frac{4}{\varepsilon}\right) - \frac{4}{n}. \tag{12.2.20} \]

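Proof: The inequality (12.2.17) holds for every channel N, which means that it
holds for $\mathcal{N}^{\otimes n}$. Applying the inequality in (12.2.17) to $\mathcal{N}^{\otimes n}$ and dividing both sides
by 𝑛, we obtain

\[ \frac{1}{n}\log_2|\mathcal{M}| \geq \frac{1}{n}\chi_\alpha(\mathcal{N}^{\otimes n}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{\frac{\varepsilon}{2}-\eta}\right) - \frac{1}{n}\log_2\!\left(\frac{4\varepsilon}{\eta^2}\right) \tag{12.2.21} \]

for all 𝛼 ∈ (0, 1). By restricting the optimization in the definition of $\chi_\alpha(\mathcal{N}^{\otimes n})$
to tensor-power states, we find that $\chi_\alpha(\mathcal{N}^{\otimes n}) \geq n\chi_\alpha(\mathcal{N})$. This follows from
the additivity of the Petz–Rényi relative entropy under tensor-product states (see
Proposition 7.23). So we obtain

\[ \frac{1}{n}\log_2|\mathcal{M}| \geq \chi_\alpha(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{\frac{\varepsilon}{2}-\eta}\right) - \frac{1}{n}\log_2\!\left(\frac{4\varepsilon}{\eta^2}\right) \tag{12.2.22} \]

for all 𝛼 ∈ (0, 1). Now, let 𝜂 = 𝜀/4, so that 𝜀/2 − 𝜂 = 𝜀/4 and
$\log_2(4\varepsilon/\eta^2) = \log_2(4/\varepsilon) + 4$. Using the fact that 𝛼 − 1 is negative for
𝛼 ∈ (0, 1), together with $\frac{\alpha}{1-\alpha} + 1 = \frac{1}{1-\alpha}$, the following inequality holds for all 𝛼 ∈ (0, 1):

\[ \frac{1}{n}\log_2|\mathcal{M}| \geq \chi_\alpha(\mathcal{N}) - \frac{1}{n(1-\alpha)}\log_2\!\left(\frac{4}{\varepsilon}\right) - \frac{4}{n}. \tag{12.2.23} \]

In other words, for all 𝜀 ∈ (0, 1], there exists an (𝑛, |M|, 𝜀) classical communication
protocol such that (12.2.20) is satisfied. This concludes the proof. ■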

The inequality in (12.2.20) gives us, for every 𝜀 ∈ (0, 1] and 𝑛 ∈ N, a lower
bound on the rate of a corresponding (𝑛, |M|, 𝜀) classical communication protocol,
which is known to exist due to Proposition 12.5. If instead we fix a particular
communication rate 𝑅 by letting $|\mathcal{M}| = 2^{nR}$, then we can rearrange the inequality
in (12.2.20) to obtain an exponentially decaying upper bound on the maximal
error probability of the corresponding $(n, 2^{nR}, \varepsilon)$ classical communication protocol.
Specifically, we find that

\[ \varepsilon \leq 4 \cdot 2^{-n(1-\alpha)\left(\chi_\alpha(\mathcal{N}) - R - \frac{4}{n}\right)} \tag{12.2.24} \]

for all 𝛼 ∈ (0, 1).
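
The trade-off between rate and error expressed by (12.2.20) and (12.2.24) can be tabulated with the short sketch below. The value of 𝜒 𝛼 (N), the rate, and the block lengths are hypothetical numbers chosen only to show the exponential decay; the function names are likewise assumptions.

```python
import math

def rate_lower_bound(chi_alpha, alpha, eps, n):
    """Right-hand side of (12.2.20): guaranteed rate at error eps and n uses."""
    return chi_alpha - math.log2(4.0 / eps) / (n * (1.0 - alpha)) - 4.0 / n

def error_upper_bound(chi_alpha, alpha, rate, n):
    """Right-hand side of (12.2.24): guaranteed error at a fixed rate and n uses."""
    exponent = n * (1.0 - alpha) * (chi_alpha - rate - 4.0 / n)
    return 4.0 * 2.0 ** (-exponent)

# Hypothetical numbers: chi_alpha(N) = 0.5 at alpha = 0.5, target rate 0.4.
chi_alpha, alpha, rate = 0.5, 0.5, 0.4
errors = [error_upper_bound(chi_alpha, alpha, rate, n) for n in (100, 200, 400)]
```

The two functions are inverse rearrangements of one another: inserting the error bound at a given 𝑛 back into `rate_lower_bound` returns the fixed rate. The error bound is only meaningful once 𝑅 + 4/𝑛 < 𝜒 𝛼 (N).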
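The inequality in (12.2.20) implies that

\[ C^{n,\varepsilon}(\mathcal{N}) \geq \chi_\alpha(\mathcal{N}) - \frac{1}{n(1-\alpha)}\log_2\!\left(\frac{4}{\varepsilon}\right) - \frac{4}{n} \tag{12.2.25} \]

for all 𝑛 ≥ 1, 𝜀 ∈ (0, 1), and 𝛼 ∈ (0, 1).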
We can now use (12.2.20) to prove that 𝜒reg (N) is an achievable rate for classical
communication over N.

Proof of the Achievability Part of Theorem 12.13

Fix 𝜀 ∈ (0, 1] and 𝛿 > 0. Let 𝛿1 , 𝛿2 > 0 be such that

\[ \delta = \delta_1 + \delta_2. \tag{12.2.26} \]

Set 𝛼 ∈ (0, 1) such that

\[ \delta_1 \geq \chi(\mathcal{N}) - \chi_\alpha(\mathcal{N}), \tag{12.2.27} \]

which is possible because $\chi_\alpha(\mathcal{N})$ is monotonically increasing in 𝛼 (this follows from
Proposition 7.23) and because $\lim_{\alpha\to 1^-} \chi_\alpha(\mathcal{N}) = \chi(\mathcal{N})$ (the proof of this is analogous
to the one presented in Appendix 11.B). With this value of 𝛼, take 𝑛 large enough
so that

\[ \delta_2 \geq \frac{1}{n(1-\alpha)}\log_2\!\left(\frac{4}{\varepsilon}\right) + \frac{4}{n}. \tag{12.2.28} \]

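Now, making use of the inequality in (12.2.20) of Corollary 12.14, there exists
an (𝑛, |M|, 𝜀) protocol, with 𝑛 and 𝜀 chosen as above, such that

\[ \frac{1}{n}\log_2|\mathcal{M}| \geq \chi_\alpha(\mathcal{N}) - \frac{1}{n(1-\alpha)}\log_2\!\left(\frac{4}{\varepsilon}\right) - \frac{4}{n}. \tag{12.2.29} \]

Rearranging the right-hand side of this inequality, and using (12.2.26)–(12.2.28),
we find that

\begin{align}
\frac{1}{n}\log_2|\mathcal{M}| &\geq \chi(\mathcal{N}) - \left(\chi(\mathcal{N}) - \chi_\alpha(\mathcal{N}) + \frac{1}{n(1-\alpha)}\log_2\!\left(\frac{4}{\varepsilon}\right) + \frac{4}{n}\right) \tag{12.2.30} \\
&\geq \chi(\mathcal{N}) - (\delta_1 + \delta_2) \tag{12.2.31} \\
&= \chi(\mathcal{N}) - \delta. \tag{12.2.32}
\end{align}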

We thus have $\chi(\mathcal{N}) - \delta \leq \frac{1}{n}\log_2|\mathcal{M}|$. Recall that if an (𝑛, |M|, 𝜀) protocol exists,
then an (𝑛, |M′ |, 𝜀) protocol also exists for all M′ satisfying |M′ | ≤ |M|. We thus conclude
that there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ classical communication protocol with 𝑅 = 𝜒(N) for
all sufficiently large 𝑛 such that (12.2.28) holds. Since 𝜀 and 𝛿 are arbitrary, we
conclude that, for all 𝜀 ∈ (0, 1], 𝛿 > 0, and sufficiently large 𝑛, there exists an
$(n, 2^{n(\chi(\mathcal{N})-\delta)}, \varepsilon)$ classical communication protocol. This means that 𝜒(N) is an
achievable rate, and thus that 𝐶 (N) ≥ 𝜒(N).
Now, we can repeat the arguments above for the tensor-power channel $\mathcal{N}^{\otimes k}$ with
𝑘 ≥ 1, and we conclude that $\frac{1}{k}\chi(\mathcal{N}^{\otimes k})$ is an achievable rate. Since this holds for all
𝑘, we conclude that $\lim_{k\to\infty} \frac{1}{k}\chi(\mathcal{N}^{\otimes k}) = \chi_{\text{reg}}(\mathcal{N})$ is an achievable rate. Therefore,
𝐶 (N) ≥ 𝜒reg (N).
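
The condition (12.2.28) that fixes how large 𝑛 must be can be solved explicitly, since both terms on its right-hand side scale as 1/𝑛. The sketch below computes the smallest such 𝑛; the parameter values are hypothetical and the function names are assumptions introduced for illustration.

```python
import math

def overhead(alpha, eps, n):
    """Right-hand side of (12.2.28): log2(4/eps) / (n (1 - alpha)) + 4 / n."""
    return math.log2(4.0 / eps) / (n * (1.0 - alpha)) + 4.0 / n

def minimum_channel_uses(alpha, eps, delta2):
    """Smallest integer n for which (12.2.28) holds."""
    needed = (math.log2(4.0 / eps) / (1.0 - alpha) + 4.0) / delta2
    n = max(1, math.ceil(needed))
    # Guard against floating-point rounding right at the boundary.
    while overhead(alpha, eps, n) > delta2:
        n += 1
    return n

n_min = minimum_channel_uses(alpha=0.9, eps=0.01, delta2=0.05)
```

Because 𝛼 must be taken close to 1 to make 𝛿1 small, the factor 1/(1 − 𝛼) grows, and the required number of channel uses grows with it.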

12.2.1.1 Achievability from a Different Point of View

Using arguments similar to those given in Appendix 11.C, we can make the
following statement: there exists a sequence $\{(n, 2^{nR_n}, \varepsilon_n)\}_{n\in\mathbb{N}}$ of (𝑛, |M|, 𝜀)
classical communication protocols over N, such that $\liminf_{n\to\infty} R_n \geq \chi(\mathcal{N})$ and
$\lim_{n\to\infty} \varepsilon_n = 0$. If we consider a sequence $\{(n, 2^{nR}, \varepsilon_n)\}_{n\in\mathbb{N}}$ of (𝑛, |M|, 𝜀) classical
communication protocols, this time keeping the rate at an arbitrary (but fixed) value
𝑅 < 𝜒(N) and varying the error probability, we conclude that there exists a sequence
of protocols for which the error probabilities $\varepsilon_n$ approach zero exponentially fast as
𝑛 → ∞.

12.2.2 Proof of the Weak Converse

We now show that the regularized Holevo information 𝜒reg (N) is a weak converse
rate. This establishes that 𝐶 (N) ≤ 𝜒reg (N) and therefore that 𝐶 (N) = 𝜒reg (N),
completing the proof of Theorem 12.13.

Figure 12.4: Depiction of a protocol that is useless for classical communication
in the asymptotic setting. The state encoding the message 𝑚 via E is discarded
and replaced by an arbitrary (but fixed) state $\sigma_{B^n}$.

Let us first recall from Theorem 12.4 that for every quantum channel N we have
the following: for all 𝜀 ∈ [0, 1) and (|M|, 𝜀) classical communication protocols
over N,

\begin{align}
\log_2|\mathcal{M}| &\leq \frac{1}{1-\varepsilon}\left(\chi(\mathcal{N}) + h_2(\varepsilon)\right), \tag{12.2.33} \\
\log_2|\mathcal{M}| &\leq \widetilde{\chi}_\alpha(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\, \alpha > 1. \tag{12.2.34}
\end{align}
To obtain these inequalities, we considered a classical communication protocol
over a useless channel and used the hypothesis testing relative entropy to compare
this protocol with the actual protocol over the channel N. The useless channel
in the asymptotic setting is analogous to the one in Figure 12.2 and is shown
in Figure 12.4. A simple corollary of Theorem 12.4, which is relevant for the
asymptotic setting, is the following.

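Corollary 12.15 Upper Bounds for Classical Communication in Asymptotic Setting

Let N be a quantum channel. For all 𝜀 ∈ [0, 1), 𝑛 ∈ N, and (𝑛, |M|, 𝜀) classical
communication protocols over 𝑛 uses of N, the rate of transmitted bits is
bounded from above as follows:

\begin{align}
\frac{\log_2|\mathcal{M}|}{n} &\leq \frac{1}{1-\varepsilon}\left(\frac{1}{n}\chi(\mathcal{N}^{\otimes n}) + \frac{1}{n}h_2(\varepsilon)\right), \tag{12.2.35} \\
\frac{\log_2|\mathcal{M}|}{n} &\leq \frac{1}{n}\widetilde{\chi}_\alpha(\mathcal{N}^{\otimes n}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\, \alpha > 1. \tag{12.2.36}
\end{align}
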

Proof: Since the inequalities in (12.2.33) and (12.2.34) of Theorem 12.4 hold for
every channel N, they hold for the channel N ⊗𝑛 . Therefore, applying (12.2.33) and
(12.2.34) to N ⊗𝑛 and dividing both sides by 𝑛, we immediately obtain the desired
result. ■

The inequalities in the corollary above give us, for every 𝜀 ∈ [0, 1) and
𝑛 ∈ N, an upper bound on the size |M| of the message set we can take for an
arbitrary (𝑛, |M|, 𝜀) classical communication protocol. If instead we fix a particular
communication rate 𝑅 by letting $|\mathcal{M}| = 2^{nR}$, then we can obtain a lower bound on
the maximal error probability of an arbitrary $(n, 2^{nR}, \varepsilon)$ classical communication
protocol. Specifically, using (12.2.36), we find that

\[ \varepsilon \geq 1 - 2^{-n\left(\frac{\alpha-1}{\alpha}\right)\left(R - \frac{1}{n}\widetilde{\chi}_\alpha(\mathcal{N}^{\otimes n})\right)} \tag{12.2.37} \]

for all 𝛼 > 1.
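
The lower bound (12.2.37) can be evaluated directly once a value for the sandwiched Rényi Holevo information of the 𝑛-fold channel is available. In the sketch below that value is a hypothetical input; the example additionally assumes, purely for illustration, that the quantity is additive with a per-use value of 0.5.

```python
def error_lower_bound(chi_tilde_alpha_n, alpha, rate, n):
    """Right-hand side of (12.2.37).

    chi_tilde_alpha_n is the sandwiched Renyi Holevo information of the
    n-fold channel (supplied by the caller), with alpha > 1.
    """
    exponent = n * ((alpha - 1.0) / alpha) * (rate - chi_tilde_alpha_n / n)
    return 1.0 - 2.0 ** (-exponent)

# Hypothetical additive example: per-use value 0.5, alpha = 2, n = 100.
above = error_lower_bound(0.5 * 100, 2.0, 0.6, 100)  # rate above 0.5
below = error_lower_bound(0.5 * 100, 2.0, 0.4, 100)  # rate below 0.5
```

When the rate exceeds the per-use value, the bound approaches one exponentially fast in 𝑛; when the rate is below it, the bound is negative and carries no information.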

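The inequalities in (12.2.35) and (12.2.36) imply that

\begin{align}
C^{n,\varepsilon}(\mathcal{N}) &\leq \frac{1}{1-\varepsilon}\left(\frac{1}{n}\chi(\mathcal{N}^{\otimes n}) + \frac{1}{n}h_2(\varepsilon)\right), \tag{12.2.38} \\
C^{n,\varepsilon}(\mathcal{N}) &\leq \frac{1}{n}\widetilde{\chi}_\alpha(\mathcal{N}^{\otimes n}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\, \alpha > 1, \tag{12.2.39}
\end{align}

where 𝑛 ≥ 1 and 𝜀 ∈ (0, 1).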
Using (12.2.35), we can now prove the weak converse part of Theorem 12.13.

Proof of the Weak Converse Part of Theorem 12.13

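Suppose that 𝑅 is an achievable rate. Then, by definition, for all 𝜀 ∈ (0, 1], 𝛿 > 0,
and sufficiently large 𝑛, there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ classical communication
protocol over N. For all such protocols, the inequality (12.2.35) in Corollary 12.15
holds, so that

\[ R - \delta \leq \frac{1}{1-\varepsilon}\left(\frac{1}{n}\chi(\mathcal{N}^{\otimes n}) + \frac{1}{n}h_2(\varepsilon)\right). \tag{12.2.40} \]

Since this bound holds for all sufficiently large 𝑛, it holds in the limit 𝑛 → ∞, so that

\begin{align}
R &\leq \lim_{n\to\infty} \frac{1}{1-\varepsilon}\left(\frac{1}{n}\chi(\mathcal{N}^{\otimes n}) + \frac{1}{n}h_2(\varepsilon)\right) + \delta \tag{12.2.41} \\
&= \frac{1}{1-\varepsilon}\lim_{n\to\infty} \frac{1}{n}\chi(\mathcal{N}^{\otimes n}) + \delta. \tag{12.2.42}
\end{align}

Since this inequality holds for all 𝜀, 𝛿 > 0, we conclude that

\[ R \leq \lim_{\varepsilon,\delta\to 0}\left\{\frac{1}{1-\varepsilon}\lim_{n\to\infty} \frac{1}{n}\chi(\mathcal{N}^{\otimes n}) + \delta\right\} = \lim_{n\to\infty} \frac{1}{n}\chi(\mathcal{N}^{\otimes n}). \tag{12.2.43} \]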
We have thus shown that if 𝑅 is an achievable rate, then 𝑅 ≤ 𝜒reg (N). The
contrapositive of this statement is that if 𝑅 > 𝜒reg (N), then 𝑅 is not an achievable
rate. By definition, therefore, 𝜒reg (N) is a weak converse rate.
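
The order of limits in the weak converse argument can be seen numerically in the sketch below, which evaluates the right-hand side of (12.2.35). For simplicity it assumes a hypothetical channel whose Holevo information is additive with 𝜒(N) = 0.5, so that the 𝑛-fold value is 0.5𝑛; this assumption is for illustration only.

```python
import math

def binary_entropy(x):
    """Binary entropy h2(x) in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def weak_converse_rate_bound(chi_n, eps, n):
    """Right-hand side of (12.2.35); chi_n is the Holevo information of the n-fold channel."""
    return (chi_n / n + binary_entropy(eps) / n) / (1.0 - eps)

chi = 0.5  # hypothetical, assumed additive: chi of the n-fold channel is n * chi
bounds_fixed_eps = [weak_converse_rate_bound(chi * n, 0.1, n) for n in (10, 100, 1000)]
bound_small_eps = weak_converse_rate_bound(chi * 10**6, 1e-6, 10**6)
```

For fixed 𝜀 the bound decreases towards 𝜒(N)/(1 − 𝜀) as 𝑛 grows, which still exceeds 𝜒(N); only after also taking 𝜀 → 0 does it reach 𝜒(N). This is why the argument yields a weak converse rather than a strong converse.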
Recall that Theorem 12.13 only gives an expression for the capacity 𝐶 (N),
and not for the strong converse capacity $\widetilde{C}(\mathcal{N})$. The sandwiched Rényi Holevo
information $\widetilde{\chi}_\alpha(\mathcal{N})$ of a channel N can be used to obtain the upper bound in
(12.2.36), holding for every (𝑛, |M|, 𝜀) protocol. This inequality then leads to an
expression for the strong converse capacity in the case that $\widetilde{\chi}_\alpha(\mathcal{N})$ happens to be
additive for N. We now, therefore, address this question regarding the additivity of
the sandwiched Rényi Holevo information.

12.2.3 The Additivity Question

Although we have shown that the classical capacity 𝐶 (N) of a channel N is given
by the regularized Holevo information $\chi_{\text{reg}}(\mathcal{N}) = \lim_{n\to\infty} \frac{1}{n}\chi(\mathcal{N}^{\otimes n})$, as mentioned
earlier, without the additivity of 𝜒(N) this result is not particularly helpful since it
is not known how to compute the regularized Holevo information in general.
Note, however, that for all channels N1 and N2 we always have the superadditivity
of the Holevo information, i.e.,

𝜒(N1 ⊗ N2 ) ≥ 𝜒(N1 ) + 𝜒(N2 ). (12.2.44)

This follows by performing exactly the same steps in (11.2.44)–(11.2.52), but with
the systems 𝑅1 and 𝑅2 therein taken to be classical systems. Therefore, to prove the
additivity of 𝜒 for a channel N, it suffices to show that 𝜒(N ⊗ M) ≤ 𝜒(N) + 𝜒(M).
Similarly, the sandwiched Rényi Holevo information is superadditive; i.e., for
all 𝛼 ≥ 1 and all channels N1 and N2 , it holds that

\[ \widetilde{\chi}_\alpha(\mathcal{N}_1 \otimes \mathcal{N}_2) \geq \widetilde{\chi}_\alpha(\mathcal{N}_1) + \widetilde{\chi}_\alpha(\mathcal{N}_2). \tag{12.2.45} \]


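First, recall that

\[ \widetilde{\chi}_\alpha(\mathcal{N}) = \sup_{\rho_{XA}} \widetilde{I}_\alpha(X;B)_\omega, \tag{12.2.46} \]

where $\omega_{XB} = \mathcal{N}_{A\to B}(\rho_{XA})$, and where we optimize over classical–quantum states
$\rho_{XA} = \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes \rho^{x}_{A}$, with $\mathcal{X}$ a finite alphabet with associated $|\mathcal{X}|$-dimensional
system 𝑋 and $\{\rho^{x}_{A}\}_{x\in\mathcal{X}}$ a set of states. Also, recall that for every state
$\rho_{AB}$,

\[ \widetilde{I}_\alpha(A;B)_\rho = \inf_{\sigma_B} \widetilde{D}_\alpha(\rho_{AB} \,\|\, \rho_A \otimes \sigma_B). \tag{12.2.47} \]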

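Now, the proof of (12.2.45) proceeds similarly to the proof of the corresponding
inequality (12.2.44) for the Holevo information. By restricting the optimization in
the definition of $\widetilde{\chi}_\alpha(\mathcal{N}_1 \otimes \mathcal{N}_2)$ to product states, and letting $\rho'_{X_1X_2B_1B_2}$ be defined as

\[ \rho'_{X_1X_2B_1B_2} := ((\mathcal{N}_1)_{A_1\to B_1} \otimes (\mathcal{N}_2)_{A_2\to B_2})(\rho_{X_1X_2A_1A_2}), \tag{12.2.48} \]

for a classical–quantum state $\rho_{X_1X_2A_1A_2}$ (𝑋 systems classical and 𝐴 systems quantum),
we find that

\begin{align}
\widetilde{\chi}_\alpha(\mathcal{N}_1 \otimes \mathcal{N}_2) &= \sup_{\rho} \widetilde{I}_\alpha(X_1X_2;B_1B_2)_{\rho'} \tag{12.2.49} \\
&\geq \sup_{\tau\otimes\omega} \widetilde{I}_\alpha(X_1X_2;B_1B_2)_{\tau'\otimes\omega'}, \tag{12.2.50}
\end{align}

where $\tau'_{X_1B_1} := (\mathcal{N}_1)_{A_1\to B_1}(\tau_{X_1A_1})$ and $\omega'_{X_2B_2} := (\mathcal{N}_2)_{A_2\to B_2}(\omega_{X_2A_2})$.
Proposition 11.21 states that the sandwiched Rényi mutual information $\widetilde{I}_\alpha$ is additive for
product states, meaning that

\[ \widetilde{I}_\alpha(A_1A_2;B_1B_2)_{\tau\otimes\omega} = \widetilde{I}_\alpha(A_1;B_1)_\tau + \widetilde{I}_\alpha(A_2;B_2)_\omega \tag{12.2.51} \]

for every state $\tau_{A_1B_1} \otimes \omega_{A_2B_2}$. Using this, we find that

\begin{align}
\widetilde{\chi}_\alpha(\mathcal{N}_1 \otimes \mathcal{N}_2) &\geq \sup_{\tau,\omega}\left\{\widetilde{I}_\alpha(X_1;B_1)_{\tau'} + \widetilde{I}_\alpha(X_2;B_2)_{\omega'}\right\} \tag{12.2.52} \\
&= \sup_{\tau} \widetilde{I}_\alpha(X_1;B_1)_{\tau'} + \sup_{\omega} \widetilde{I}_\alpha(X_2;B_2)_{\omega'} \tag{12.2.53} \\
&= \widetilde{\chi}_\alpha(\mathcal{N}_1) + \widetilde{\chi}_\alpha(\mathcal{N}_2), \tag{12.2.54}
\end{align}

i.e.,

\[ \widetilde{\chi}_\alpha(\mathcal{N}_1 \otimes \mathcal{N}_2) \geq \widetilde{\chi}_\alpha(\mathcal{N}_1) + \widetilde{\chi}_\alpha(\mathcal{N}_2), \tag{12.2.55} \]

as required.
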

We see that in order to show the additivity of the sandwiched Rényi Holevo
information for N, it suffices to show subadditivity for N, i.e.,

\[ \widetilde{\chi}_\alpha(\mathcal{N}^{\otimes n}) \leq n\,\widetilde{\chi}_\alpha(\mathcal{N}) \quad \forall\, n \geq 1. \tag{12.2.56} \]

We now show that subadditivity, and thus additivity, of the sandwiched Rényi
Holevo information holds for all entanglement-breaking channels.

12.2.3.1 Entanglement-Breaking Channels

In this section, we prove that the sandwiched Rényi Holevo information is additive
for all entanglement-breaking channels.

Theorem 12.16 Additivity of $\widetilde{\chi}_\alpha$ for Entanglement-Breaking Channels

For an entanglement-breaking channel N and an arbitrary channel M, the
following equality holds for all 𝛼 > 1:

\[ \widetilde{\chi}_\alpha(\mathcal{N} \otimes \mathcal{M}) = \widetilde{\chi}_\alpha(\mathcal{N}) + \widetilde{\chi}_\alpha(\mathcal{M}). \tag{12.2.57} \]

The proof of this theorem relies on two lemmas, the first of which states that the
sandwiched Rényi Holevo information $\widetilde{\chi}_\alpha(\mathcal{N})$ of a channel N is equal to a quantity
$\widetilde{K}_\alpha(\mathcal{N})$, called the sandwiched Rényi information radius of N.

Lemma 12.17
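For every quantum channel N and 𝛼 > 1, the following equality holds

\[ \widetilde{\chi}_\alpha(\mathcal{N}) = \inf_{\sigma}\sup_{\rho} \widetilde{D}_\alpha(\mathcal{N}(\rho)\|\sigma) =: \widetilde{K}_\alpha(\mathcal{N}), \tag{12.2.58} \]

where the optimizations are over states 𝜌 and 𝜎. The quantity $\widetilde{K}_\alpha(\mathcal{N})$ is called
the sandwiched Rényi information radius of N.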

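Proof: To prove this lemma, we show that $\widetilde{\chi}_\alpha(\mathcal{N}) \leq \widetilde{K}_\alpha(\mathcal{N})$ and $\widetilde{\chi}_\alpha(\mathcal{N}) \geq \widetilde{K}_\alpha(\mathcal{N})$.

First, using the definition in (7.11.95) of $\widetilde{\chi}_\alpha(\mathcal{N})$, we find that for every state $\tau_B$,

\begin{align}
\widetilde{\chi}_\alpha(\mathcal{N}) &= \sup_{\rho_{XA}} \widetilde{I}_\alpha(X;B)_\omega \tag{12.2.59} \\
&= \sup_{\rho_{XA}}\inf_{\sigma_B} \widetilde{D}_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes \mathcal{N}(\rho^{x}_{A}) \,\Big\|\, \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes \sigma_B\right) \tag{12.2.60} \\
&\leq \sup_{\rho_{XA}} \widetilde{D}_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes \mathcal{N}(\rho^{x}_{A}) \,\Big\|\, \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes \tau_B\right), \tag{12.2.61}
\end{align}

where $\omega_{XB} = \mathcal{N}_{A\to B}(\rho_{XA})$ and the supremum is over all classical–quantum states
$\rho_{XA}$ of the form $\rho_{XA} = \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes \rho^{x}_{A}$, with $\mathcal{X}$ a finite alphabet with
associated $|\mathcal{X}|$-dimensional quantum system 𝑋 and $\{\rho^{x}_{A}\}_{x\in\mathcal{X}}$ a set of states.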
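Now, recall from (7.5.174) that the sandwiched Rényi relative entropy is jointly
quasi-convex for 𝛼 > 1 and invariant under tensoring in the same state |𝑥⟩⟨𝑥|, which
implies that

\begin{align}
\widetilde{\chi}_\alpha(\mathcal{N}) &\leq \sup_{\rho_{XA}} \widetilde{D}_\alpha\!\left(\sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes \mathcal{N}(\rho^{x}_{A}) \,\Big\|\, \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes \tau_B\right) \tag{12.2.62} \\
&\leq \sup_{\rho_{XA}}\max_{x\in\mathcal{X}} \widetilde{D}_\alpha(\mathcal{N}(\rho^{x}_{A})\|\tau_B) \tag{12.2.63} \\
&\leq \sup_{\rho_{A}} \widetilde{D}_\alpha(\mathcal{N}(\rho_{A})\|\tau_B). \tag{12.2.64}
\end{align}

The final inequality above holds for every state $\tau_B$, which implies that

\[ \widetilde{\chi}_\alpha(\mathcal{N}) \leq \inf_{\tau_B}\sup_{\rho_A} \widetilde{D}_\alpha(\mathcal{N}(\rho_{A})\|\tau_B) = \widetilde{K}_\alpha(\mathcal{N}), \tag{12.2.65} \]

i.e., $\widetilde{\chi}_\alpha(\mathcal{N}) \leq \widetilde{K}_\alpha(\mathcal{N})$.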
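We now show that $\widetilde{K}_\alpha(\mathcal{N}) \leq \widetilde{\chi}_\alpha(\mathcal{N})$. First, consider that

\begin{align}
\widetilde{K}_\alpha(\mathcal{N}) &= \inf_{\sigma_B}\sup_{\rho_A} \widetilde{D}_\alpha(\mathcal{N}(\rho_{A})\|\sigma_B) \tag{12.2.66} \\
&= \inf_{\sigma_B}\sup_{\rho_A} \frac{1}{\alpha-1}\log_2 \widetilde{Q}_\alpha(\mathcal{N}(\rho_{A})\|\sigma_B) \tag{12.2.67} \\
&= \frac{1}{\alpha-1}\log_2 \inf_{\sigma_B}\sup_{\rho_A} \widetilde{Q}_\alpha(\mathcal{N}(\rho_{A})\|\sigma_B). \tag{12.2.68}
\end{align}

Now, by taking a supremum over all probability measures 𝜇 on the set of all states
$\rho_A$, we find that

\[ \sup_{\rho_A} \widetilde{Q}_\alpha(\mathcal{N}(\rho_{A})\|\sigma_B) \leq \sup_{\mu}\int \widetilde{Q}_\alpha(\mathcal{N}(\rho_{A})\|\sigma_B)\,\mathrm{d}\mu(\rho_A). \tag{12.2.69} \]

So we have that

\[ \widetilde{K}_\alpha(\mathcal{N}) \leq \frac{1}{\alpha-1}\log_2 \inf_{\sigma_B}\sup_{\mu}\int \widetilde{Q}_\alpha(\mathcal{N}(\rho_{A})\|\sigma_B)\,\mathrm{d}\mu(\rho_A). \tag{12.2.70} \]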

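We now apply the Sion minimax theorem (Theorem 2.24) to exchange $\inf_{\sigma_B}$ and
$\sup_{\mu}$. This theorem is applicable because the function

\[ (\mu, \sigma_B) \mapsto \int \widetilde{Q}_\alpha(\mathcal{N}(\rho_{A})\|\sigma_B)\,\mathrm{d}\mu(\rho_A) \tag{12.2.71} \]

is linear in the measure 𝜇 and convex in the states $\sigma_B$. The latter is indeed true
because

\[ \widetilde{Q}_\alpha(\mathcal{N}(\rho_{A})\|\sigma_B) = \left\| \mathcal{N}(\rho_A)^{\frac{1}{2}}\, \sigma_B^{\frac{1-\alpha}{\alpha}}\, \mathcal{N}(\rho_A)^{\frac{1}{2}} \right\|_{\alpha}^{\alpha}, \tag{12.2.72} \]

for all 𝛼 > 1 the function $\sigma_B \mapsto \sigma_B^{\frac{1-\alpha}{\alpha}}$ is operator convex, the Schatten norm $\|\cdot\|_\alpha$
is convex, and the function $x \mapsto x^{\alpha}$ is convex for all 𝑥 ≥ 0. So we find that

\[ \widetilde{K}_\alpha(\mathcal{N}) \leq \frac{1}{\alpha-1}\log_2 \sup_{\mu}\inf_{\sigma_B}\int \widetilde{Q}_\alpha(\mathcal{N}(\rho_{A})\|\sigma_B)\,\mathrm{d}\mu(\rho_A). \tag{12.2.73} \]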

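Now, by Carathéodory’s theorem (Theorem 2.23), if 𝜌 is a density operator acting
on a 𝑑-dimensional space, then there exists an alphabet $\mathcal{X}$ of size no more than $d^2$,
a probability distribution $p : \mathcal{X} \to [0, 1]$ on $\mathcal{X}$, and an ensemble $\{(p(x), \rho^{x}_{A})\}_{x\in\mathcal{X}}$
of states such that

\[ \int \widetilde{Q}_\alpha(\mathcal{N}(\rho_{A})\|\sigma_B)\,\mathrm{d}\mu(\rho_A) = \sum_{x\in\mathcal{X}} p(x)\,\widetilde{Q}_\alpha(\mathcal{N}(\rho^{x}_{A})\|\sigma_B). \tag{12.2.74} \]

Therefore,

\begin{align}
\widetilde{K}_\alpha(\mathcal{N}) &\leq \frac{1}{\alpha-1}\log_2 \sup_{\{(p(x),\rho^{x}_{A})\}_x}\inf_{\sigma_B} \sum_{x\in\mathcal{X}} p(x)\,\widetilde{Q}_\alpha(\mathcal{N}(\rho^{x}_{A})\|\sigma_B) \tag{12.2.75} \\
&= \frac{1}{\alpha-1}\log_2 \sup_{\{(p(x),\rho^{x}_{A})\}_x}\inf_{\sigma_B} \widetilde{Q}_\alpha(\mathcal{N}_{A\to B}(\rho_{XA}) \,\|\, \rho_X \otimes \sigma_B) \tag{12.2.76} \\
&= \sup_{\rho_{XA}}\inf_{\sigma_B} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\rho_{XA}) \,\|\, \rho_X \otimes \sigma_B) \tag{12.2.77} \\
&= \sup_{\rho_{XA}} \widetilde{I}_\alpha(X;B)_\omega \tag{12.2.78} \\
&= \widetilde{\chi}_\alpha(\mathcal{N}), \tag{12.2.79}
\end{align}

where $\omega_{XB} = \mathcal{N}_{A\to B}(\rho_{XA})$, and to obtain the first equality we used the direct-sum
property of $\widetilde{Q}_\alpha$ in (7.5.41). So we have $\widetilde{K}_\alpha(\mathcal{N}) \leq \widetilde{\chi}_\alpha(\mathcal{N})$ in addition to
$\widetilde{K}_\alpha(\mathcal{N}) \geq \widetilde{\chi}_\alpha(\mathcal{N})$, which means that $\widetilde{K}_\alpha(\mathcal{N}) = \widetilde{\chi}_\alpha(\mathcal{N})$, as required. ■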

Using the fact that $\lim_{\alpha\to 1^+} \widetilde{\chi}_\alpha(\mathcal{N}) = \chi(\mathcal{N})$ (the proof of this is analogous to the
one presented in Appendix 11.B), we obtain the following alternate formula for the
Holevo information of a quantum channel N:

\[ \chi(\mathcal{N}) = \inf_{\sigma}\sup_{\rho} D(\mathcal{N}(\rho)\|\sigma), \tag{12.2.80} \]

where the optimizations are over states 𝜌 and 𝜎.
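
The information-radius formula (12.2.80) can be checked by brute force in a simple classical special case. The sketch below assumes a binary symmetric channel with flip probability `flip`, treated as a channel between probability distributions, so that states are probability vectors and the relative entropy is the classical one. The supremum over inputs is taken over the two deterministic inputs only, which suffices here because the relative entropy is convex in its first argument. All names are illustrative assumptions.

```python
import math

def relative_entropy(p, q):
    """Classical relative entropy D(p||q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def binary_entropy(x):
    """Binary entropy h2(x) in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def information_radius_bsc(flip, grid=1001):
    """Grid search for inf_sigma max_x D(N(x) || sigma) over sigma = (s, 1 - s)."""
    outputs = [(1.0 - flip, flip), (flip, 1.0 - flip)]  # outputs for inputs 0 and 1
    best = float("inf")
    for k in range(1, grid - 1):
        s = k / (grid - 1)
        sigma = (s, 1.0 - s)
        worst = max(relative_entropy(out, sigma) for out in outputs)
        best = min(best, worst)
    return best

radius = information_radius_bsc(0.1)
capacity = 1.0 - binary_entropy(0.1)
```

In this example the minimizing 𝜎 is the uniform distribution, at which both outputs are equally far away, and the resulting radius agrees with the familiar capacity 1 − ℎ2 (flip) of the binary symmetric channel.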

Before stating the following lemma, let us define an entanglement-breaking map
N 𝐴→𝐵 to be a completely positive map such that N 𝐴→𝐵 (𝑋 𝑅 𝐴 ) is a separable operator
of systems 𝑅 and 𝐵 for every positive semi-definite input operator 𝑋 𝑅 𝐴 . We can
think of it as a generalization of an entanglement-breaking channel (Definition 4.12)
in which there is no requirement of trace preservation.

Lemma 12.18
Let $\mathcal{M}_{A\to B}$ be a completely positive map, and let $P_{RA}$ be a positive semi-definite
separable operator, i.e., such that it can be written in the following form:

\[ P_{RA} = \sum_{x\in\mathcal{X}} C^{x}_{R} \otimes D^{x}_{A}, \tag{12.2.81} \]

where $\mathcal{X}$ is a finite alphabet and $C^{x}_{R}, D^{x}_{A} \geq 0$ for all $x \in \mathcal{X}$. Let
$P_R = \operatorname{Tr}_A[P_{RA}] = \sum_{x\in\mathcal{X}} \operatorname{Tr}[D^{x}_{A}]\,C^{x}_{R}$. For all 𝛼 ≥ 1, the following inequality holds

\[ \|\mathcal{M}_{A\to B}(P_{RA})\|_\alpha \leq \nu_\alpha(\mathcal{M}_{A\to B}) \cdot \|P_R\|_\alpha, \tag{12.2.82} \]

where

\[ \nu_\alpha(\mathcal{P}) := \sup_{\rho} \|\mathcal{P}(\rho)\|_\alpha, \tag{12.2.83} \]

$\mathcal{P}$ is a completely positive map, and the supremum is taken over every density
operator in the domain of $\mathcal{P}$. As a consequence, if $\mathcal{N}_{A'\to B'}$ is an entanglement-breaking
map, then the following equality holds for all 𝛼 ≥ 1:

\[ \nu_\alpha(\mathcal{M}_{A\to B} \otimes \mathcal{N}_{A'\to B'}) = \nu_\alpha(\mathcal{M}_{A\to B}) \cdot \nu_\alpha(\mathcal{N}_{A'\to B'}). \tag{12.2.84} \]


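Proof: Without loss of generality, we can suppose that each $D^{x}_{A}$ is normalized,
in the sense that $\operatorname{Tr}[D^{x}_{A}] = 1$. If it is not the case, then we can redefine $C^{x}_{R}$ as
$C^{x}_{R}\operatorname{Tr}[D^{x}_{A}]$ and $D^{x}_{A}$ as $D^{x}_{A}/\operatorname{Tr}[D^{x}_{A}]$ without changing the separable operator $P_{RA}$.
Next, we observe that

\begin{align}
\mathcal{M}_{A\to B}(P_{RA}) &= \sum_{x\in\mathcal{X}} C^{x}_{R} \otimes \mathcal{M}_{A\to B}(D^{x}_{A}) \tag{12.2.85} \\
&= V T V^{\dagger}, \tag{12.2.86}
\end{align}

where

\begin{align}
V &:= \sum_{x\in\mathcal{X}} \langle x| \otimes \sqrt{C^{x}_{R}} \otimes \mathbb{1}_B, \tag{12.2.87} \\
T &:= \sum_{x\in\mathcal{X}} |x\rangle\langle x| \otimes \mathbb{1}_R \otimes \mathcal{M}_{A\to B}(D^{x}_{A}). \tag{12.2.88}
\end{align}

This implies that

\[ \operatorname{Tr}[(\mathcal{M}_{A\to B}(P_{RA}))^{\alpha}] = \operatorname{Tr}[(V T V^{\dagger})^{\alpha}]. \tag{12.2.89} \]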
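Now let us apply the Araki–Lieb–Thirring inequality (Lemma 2.15), which states
that for all positive semi-definite operators 𝑋 and 𝑌 ,

\[ \operatorname{Tr}\!\left[\left(Y^{\frac{1}{2}} X Y^{\frac{1}{2}}\right)^{rq}\right] \leq \operatorname{Tr}\!\left[\left(Y^{\frac{r}{2}} X^{r} Y^{\frac{r}{2}}\right)^{q}\right] \tag{12.2.90} \]

for all 𝑞 ≥ 0 and 𝑟 ≥ 1. For 𝑞 = 1, we obtain

\[ \operatorname{Tr}\!\left[\left(Y^{\frac{1}{2}} X Y^{\frac{1}{2}}\right)^{r}\right] \leq \operatorname{Tr}[X^{r} Y^{r}]. \tag{12.2.91} \]

Now, for every operator 𝑍, note that $Z X Z^{\dagger}$ has the same non-zero eigenvalues as
$(Z^{\dagger}Z)^{\frac{1}{2}} X (Z^{\dagger}Z)^{\frac{1}{2}}$ (this follows by considering the polar decomposition of 𝑍). In
addition, since $Z^{\dagger}Z$ is positive semi-definite, applying (12.2.91) with $Y = Z^{\dagger}Z$
gives us

\begin{align}
\operatorname{Tr}\!\left[\left(Z X Z^{\dagger}\right)^{r}\right] &= \operatorname{Tr}\!\left[\left((Z^{\dagger}Z)^{\frac{1}{2}} X (Z^{\dagger}Z)^{\frac{1}{2}}\right)^{r}\right] \tag{12.2.92} \\
&\leq \operatorname{Tr}\!\left[X^{r} (Z^{\dagger}Z)^{r}\right]. \tag{12.2.93}
\end{align}
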

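Substituting 𝑟 = 𝛼, 𝑍 = 𝑉, and 𝑋 = 𝑇 into this inequality gives us

\begin{align}
\operatorname{Tr}[(\mathcal{M}_{A\to B}(P_{RA}))^{\alpha}] &= \operatorname{Tr}[(V T V^{\dagger})^{\alpha}] \tag{12.2.94} \\
&\leq \operatorname{Tr}[(V^{\dagger}V)^{\alpha} T^{\alpha}]. \tag{12.2.95}
\end{align}

Letting

\[ S := \sum_{x\in\mathcal{X}} \langle x| \otimes \sqrt{C^{x}_{R}}, \tag{12.2.96} \]

observe that

\[ V^{\dagger}V = S^{\dagger}S \otimes \mathbb{1}_B, \tag{12.2.97} \]

which implies that

\[ (V^{\dagger}V)^{\alpha} = (S^{\dagger}S)^{\alpha} \otimes \mathbb{1}_B. \tag{12.2.98} \]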
Therefore, since 𝑇, and thus 𝑇 𝛼 , is block diagonal, we find that

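\begin{align}
\operatorname{Tr}[(V^\dagger V)^{\alpha} T^{\alpha}]
&= \operatorname{Tr}\!\left[\left((S^\dagger S)^{\alpha} \otimes \mathbb{1}_B\right)\left(\sum_{x\in\mathcal{X}} |x\rangle\langle x| \otimes \mathbb{1}_R \otimes (\mathcal{M}_{A\to B}(D_A^x))^{\alpha}\right)\right] \tag{12.2.99}\\
&= \sum_{x\in\mathcal{X}} \operatorname{Tr}\!\left[(S^\dagger S)^{\alpha}_{x}\right] \operatorname{Tr}[(\mathcal{M}_{A\to B}(D_A^x))^{\alpha}], \tag{12.2.100}
\end{align}

where
\begin{equation}
(S^\dagger S)^{\alpha}_{x} := (\langle x| \otimes \mathbb{1}_R)(S^\dagger S)^{\alpha}(|x\rangle \otimes \mathbb{1}_R). \tag{12.2.101}
\end{equation}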
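Now,
\begin{align}
& \left(\operatorname{Tr}[(\mathcal{M}_{A\to B}(D_A^x))^{\alpha}]\right)^{\frac{1}{\alpha}} = \left\|\mathcal{M}_{A\to B}(D_A^x)\right\|_{\alpha} \le \nu_\alpha(\mathcal{M}_{A\to B}) \tag{12.2.102}\\
& \Rightarrow\ \operatorname{Tr}[(\mathcal{M}_{A\to B}(D_A^x))^{\alpha}] \le \nu_\alpha(\mathcal{M}_{A\to B})^{\alpha}. \tag{12.2.103}
\end{align}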

By taking the partial trace over system $A$ of $P_{RA}$, we find that

\begin{equation}
P_R = \operatorname{Tr}_A[P_{RA}] = \sum_{x\in\mathcal{X}} C_R^x = S S^\dagger. \tag{12.2.104}
\end{equation}

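Using this, we find that

\begin{align}
\sum_{x\in\mathcal{X}} \operatorname{Tr}\!\left[(S^\dagger S)^{\alpha}_{x}\right] &= \operatorname{Tr}[(S^\dagger S)^{\alpha}] \tag{12.2.105}\\
&= \operatorname{Tr}[(S S^\dagger)^{\alpha}] \tag{12.2.106}\\
&= \operatorname{Tr}[P_R^{\alpha}] \tag{12.2.107}\\
&= \|P_R\|_{\alpha}^{\alpha}. \tag{12.2.108}
\end{align}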


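Putting everything together, we conclude that

\begin{align}
\|\mathcal{M}_{A\to B}(P_{RA})\|_{\alpha} &= \left(\operatorname{Tr}[(\mathcal{M}_{A\to B}(P_{RA}))^{\alpha}]\right)^{\frac{1}{\alpha}} \tag{12.2.109}\\
&= \left(\operatorname{Tr}[(V T V^\dagger)^{\alpha}]\right)^{\frac{1}{\alpha}} \tag{12.2.110}\\
&\le \left(\operatorname{Tr}[(V^\dagger V)^{\alpha} T^{\alpha}]\right)^{\frac{1}{\alpha}} \tag{12.2.111}\\
&\le \nu_\alpha(\mathcal{M}_{A\to B}) \cdot \|P_R\|_{\alpha}. \tag{12.2.112}
\end{align}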

To see the equality in (12.2.84), we prove it in two steps. First, consider that the
following inequality holds for all completely positive maps M 𝐴→𝐵 and N 𝐴′ →𝐵′ :
𝜈𝛼 (M 𝐴→𝐵 ⊗ N 𝐴′ →𝐵′ ) ≥ 𝜈𝛼 (M 𝐴→𝐵 ) · 𝜈𝛼 (N 𝐴′ →𝐵′ ). (12.2.113)
This follows simply by restricting the optimization in the definition of 𝜈𝛼 (M 𝐴→𝐵 ⊗
N 𝐴′ →𝐵′ ) to tensor-product states. Specifically,
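\begin{align}
\nu_\alpha(\mathcal{M}_{A\to B} \otimes \mathcal{N}_{A'\to B'}) &= \sup_{\rho_{AA'}} \left\|(\mathcal{M}_{A\to B} \otimes \mathcal{N}_{A'\to B'})(\rho_{AA'})\right\|_{\alpha} \tag{12.2.114}\\
&\ge \sup_{\sigma_A,\,\omega_{A'}} \left\|(\mathcal{M}_{A\to B} \otimes \mathcal{N}_{A'\to B'})(\sigma_A \otimes \omega_{A'})\right\|_{\alpha} \tag{12.2.115}\\
&= \sup_{\sigma_A,\,\omega_{A'}} \left\|\mathcal{M}_{A\to B}(\sigma_A) \otimes \mathcal{N}_{A'\to B'}(\omega_{A'})\right\|_{\alpha} \tag{12.2.116}\\
&= \sup_{\sigma_A} \left\|\mathcal{M}_{A\to B}(\sigma_A)\right\|_{\alpha} \cdot \sup_{\omega_{A'}} \left\|\mathcal{N}_{A'\to B'}(\omega_{A'})\right\|_{\alpha} \tag{12.2.117}\\
&= \nu_\alpha(\mathcal{M}_{A\to B}) \cdot \nu_\alpha(\mathcal{N}_{A'\to B'}). \tag{12.2.118}
\end{align}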
The following reverse inequality
𝜈𝛼 (M 𝐴→𝐵 ⊗ N 𝐴′ →𝐵′ ) ≤ 𝜈𝛼 (M 𝐴→𝐵 ) · 𝜈𝛼 (N 𝐴′ →𝐵′ ) (12.2.119)
holds when N 𝐴′ →𝐵′ is an entanglement-breaking map. Indeed, considering an
arbitrary input state 𝜌 𝐴𝐴′ , the output state 𝜔 𝐴𝐵′ := N 𝐴′ →𝐵′ (𝜌 𝐴𝐴′ ) is a separable
operator. Applying (12.2.82) to the separable operator 𝜔 𝐴𝐵′ and identifying system
𝐵′ with 𝑅 in (12.2.82), we conclude that
∥ (M 𝐴→𝐵 ⊗ N 𝐴′ →𝐵′ )(𝜌 𝐴𝐴′ )∥ 𝛼 = ∥M 𝐴→𝐵 (𝜔 𝐴𝐵′ ) ∥ 𝛼 (12.2.120)
≤ 𝜈𝛼 (M 𝐴→𝐵 ) · ∥𝜔 𝐵′ ∥ 𝛼 (12.2.121)
= 𝜈𝛼 (M 𝐴→𝐵 ) · ∥N 𝐴′ →𝐵′ (𝜌 𝐴′ )∥ 𝛼 (12.2.122)
≤ 𝜈𝛼 (M 𝐴→𝐵 ) · 𝜈𝛼 (N 𝐴′ →𝐵′ ). (12.2.123)
Since the inequality holds for every input state 𝜌 𝐴𝐴′ , we conclude the inequality in
(12.2.119). ■
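The step from (12.2.116) to (12.2.117) relies on the fact that the $\alpha$-norm of a tensor product of positive semi-definite operators factorizes, because the eigenvalues of $\sigma \otimes \omega$ are the pairwise products of the eigenvalues of $\sigma$ and $\omega$. The following minimal sketch checks this numerically for operators specified by their eigenvalues; the spectra and the function name `schatten_norm` are hypothetical and chosen only for illustration.

```python
from itertools import product

def schatten_norm(eigs, alpha):
    """alpha-norm of a positive semi-definite operator, given its eigenvalues."""
    return sum(e ** alpha for e in eigs) ** (1.0 / alpha)

# Illustrative spectra of two density operators (assumed data).
sigma_eigs = [0.7, 0.3]
omega_eigs = [0.5, 0.25, 0.25]
alpha = 2.0

# Eigenvalues of the tensor product are all pairwise products.
tensor_eigs = [s * w for s, w in product(sigma_eigs, omega_eigs)]

lhs = schatten_norm(tensor_eigs, alpha)
rhs = schatten_norm(sigma_eigs, alpha) * schatten_norm(omega_eigs, alpha)
print(lhs, rhs)
```

The same factorization holds for every $\alpha \ge 1$; changing `alpha` in the sketch is an easy way to see this.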

With Lemmas 12.17 and 12.18 in hand, we can now prove Theorem 12.16.

Proof of Theorem 12.16

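We start by using (7.5.3) to write the definition of $\widetilde{K}_\alpha(\mathcal{N})$ as
\begin{align}
\widetilde{K}_\alpha(\mathcal{N}) &= \inf_{\sigma_B} \sup_{\rho_A} \widetilde{D}_\alpha(\mathcal{N}(\rho_A)\|\sigma_B) \tag{12.2.124}\\
&= \inf_{\sigma_B} \sup_{\rho_A} \frac{\alpha}{\alpha-1} \log_2 \left\|\sigma_B^{\frac{1-\alpha}{2\alpha}} \mathcal{N}(\rho_A)\, \sigma_B^{\frac{1-\alpha}{2\alpha}}\right\|_{\alpha}. \tag{12.2.125}
\end{align}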

Then, for every channel M,

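\begin{align}
& \widetilde{K}_\alpha(\mathcal{N} \otimes \mathcal{M}) \notag\\
&= \inf_{\sigma_{A'B'}} \sup_{\rho_{AB}} \widetilde{D}_\alpha((\mathcal{N} \otimes \mathcal{M})(\rho_{AB})\|\sigma_{A'B'}) \tag{12.2.126}\\
&= \frac{\alpha}{\alpha-1} \inf_{\sigma_{A'B'}} \sup_{\rho_{AB}} \log_2 \left\|\sigma_{A'B'}^{\frac{1-\alpha}{2\alpha}} \left((\mathcal{N} \otimes \mathcal{M})(\rho_{AB})\right) \sigma_{A'B'}^{\frac{1-\alpha}{2\alpha}}\right\|_{\alpha} \tag{12.2.127}\\
&= \frac{\alpha}{\alpha-1} \inf_{\sigma_{A'B'}} \log_2 \sup_{\rho_{AB}} \left\|\sigma_{A'B'}^{\frac{1-\alpha}{2\alpha}} \left((\mathcal{N} \otimes \mathcal{M})(\rho_{AB})\right) \sigma_{A'B'}^{\frac{1-\alpha}{2\alpha}}\right\|_{\alpha} \tag{12.2.128}\\
&\le \frac{\alpha}{\alpha-1} \inf_{\sigma_{A'},\tau_{B'}} \log_2 \sup_{\rho_{AB}} \left\|\left(\sigma_{A'}^{\frac{1-\alpha}{2\alpha}} \otimes \tau_{B'}^{\frac{1-\alpha}{2\alpha}}\right) \left((\mathcal{N} \otimes \mathcal{M})(\rho_{AB})\right) \left(\sigma_{A'}^{\frac{1-\alpha}{2\alpha}} \otimes \tau_{B'}^{\frac{1-\alpha}{2\alpha}}\right)\right\|_{\alpha}, \tag{12.2.129}
\end{align}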
where to obtain the inequality we have restricted the infimum to tensor-product
states. Now, observe that since $\mathcal{N}$ is entanglement-breaking, sandwiching
the output of the channel by the positive semi-definite operator $\sigma_{A'}^{\frac{1-\alpha}{2\alpha}}$ leads to a
new map $\mathcal{N}'$ that is a completely positive entanglement-breaking map (though
not necessarily trace preserving). Similarly, sandwiching the output of $\mathcal{M}$ by the
positive semi-definite operator $\tau_{B'}^{\frac{1-\alpha}{2\alpha}}$ leads to a new completely positive map $\mathcal{M}'$.
Therefore, using Lemma 12.18, we obtain
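\begin{align}
& \widetilde{K}_\alpha(\mathcal{N} \otimes \mathcal{M}) \notag\\
&\le \frac{\alpha}{\alpha-1} \inf_{\sigma_{A'},\tau_{B'}} \log_2 \sup_{\rho_{AB}} \left\|(\mathcal{N}' \otimes \mathcal{M}')(\rho_{AB})\right\|_{\alpha} \tag{12.2.130}\\
&= \frac{\alpha}{\alpha-1} \inf_{\sigma_{A'},\tau_{B'}} \log_2 \nu_\alpha(\mathcal{N}' \otimes \mathcal{M}') \tag{12.2.131}\\
&= \frac{\alpha}{\alpha-1} \inf_{\sigma_{A'},\tau_{B'}} \log_2 \left(\nu_\alpha(\mathcal{N}')\, \nu_\alpha(\mathcal{M}')\right) \tag{12.2.132}\\
&= \frac{\alpha}{\alpha-1} \inf_{\sigma_{A'},\tau_{B'}} \left(\log_2 \nu_\alpha(\mathcal{N}') + \log_2 \nu_\alpha(\mathcal{M}')\right) \tag{12.2.133}\\
&= \inf_{\sigma_{A'}} \frac{\alpha}{\alpha-1} \log_2 \sup_{\rho_A} \left\|\mathcal{N}'(\rho_A)\right\|_{\alpha} + \inf_{\tau_{B'}} \frac{\alpha}{\alpha-1} \log_2 \sup_{\omega_B} \left\|\mathcal{M}'(\omega_B)\right\|_{\alpha} \tag{12.2.134}\\
&= \inf_{\sigma_{A'}} \sup_{\rho_A} \frac{\alpha}{\alpha-1} \log_2 \left\|\mathcal{N}'(\rho_A)\right\|_{\alpha} + \inf_{\tau_{B'}} \sup_{\omega_B} \frac{\alpha}{\alpha-1} \log_2 \left\|\mathcal{M}'(\omega_B)\right\|_{\alpha} \tag{12.2.135}\\
&= \widetilde{K}_\alpha(\mathcal{N}) + \widetilde{K}_\alpha(\mathcal{M}). \tag{12.2.136}
\end{align}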

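So we have that $\widetilde{K}_\alpha(\mathcal{N} \otimes \mathcal{M}) \le \widetilde{K}_\alpha(\mathcal{N}) + \widetilde{K}_\alpha(\mathcal{M})$ for every channel $\mathcal{M}$. Using
Lemma 12.17, we obtain the desired result.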
Note that the additivity of the Holevo information of every entanglement-
breaking channel follows from the additivity of the sandwiched Rényi Holevo
information of such channels by taking the limit 𝛼 → 1+ (the proof is analogous to
the one presented in Appendix 11.B).

12.2.4 Proof of the Strong Converse for Entanglement-Breaking Channels

Having shown that the sandwiched Rényi Holevo information is additive for all
entanglement-breaking channels, we can now proceed further from (12.2.36) to
prove a strong converse theorem for all entanglement-breaking channels. Moreover,
since the sandwiched Rényi Holevo information $\widetilde{\chi}_\alpha(\mathcal{N})$ satisfies
$\lim_{\alpha\to 1^+} \widetilde{\chi}_\alpha(\mathcal{N}) = \chi(\mathcal{N})$ (the proof of this is analogous to the one presented in
Appendix 11.B), we can go beyond the statement of Theorem 12.13 and say that
$C(\mathcal{N}) = \chi(\mathcal{N})$ for all entanglement-breaking channels $\mathcal{N}$.

Theorem 12.19 Classical Capacity of Entanglement-Breaking Channels

For every entanglement-breaking channel N,

\begin{equation}
C(\mathcal{N}) = \widetilde{C}(\mathcal{N}) = \chi(\mathcal{N}). \tag{12.2.137}
\end{equation}


Remark: Note that this theorem holds more generally for every channel $\mathcal{N}$ for which the
sandwiched Rényi Holevo information $\widetilde{\chi}_\alpha(\mathcal{N})$ is additive.

Proof: Since $\lim_{\alpha\to 1^+} \widetilde{\chi}_\alpha(\mathcal{N}) = \chi(\mathcal{N})$ (the proof of this is analogous to the one
presented in Appendix 11.B), we find that the Holevo information is additive for all
entanglement-breaking channels. The equality $C(\mathcal{N}) = \chi(\mathcal{N})$ then follows from
Theorem 12.13.
The remainder of the proof is devoted to establishing that $\chi(\mathcal{N})$ is a strong
converse rate for classical communication over $\mathcal{N}$, from which it follows that
$\widetilde{C}(\mathcal{N}) \le \chi(\mathcal{N})$, which in turn implies, via (12.2.12), that $\widetilde{C}(\mathcal{N}) = \chi(\mathcal{N})$.

Fix 𝜀 ∈ [0, 1) and 𝛿 > 0. Let 𝛿1 , 𝛿2 > 0 be such that

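\begin{equation}
\delta > \delta_1 + \delta_2 =: \delta'. \tag{12.2.138}
\end{equation}

Set $\alpha \in (1, \infty)$ such that

\begin{equation}
\delta_1 \ge \widetilde{\chi}_\alpha(\mathcal{N}) - \chi(\mathcal{N}), \tag{12.2.139}
\end{equation}
which is possible since $\widetilde{\chi}_\alpha(\mathcal{N})$ is monotonically increasing with $\alpha$ (this follows
from Proposition 7.31), and since $\lim_{\alpha\to 1^+} \widetilde{\chi}_\alpha(\mathcal{N}) = \chi(\mathcal{N})$. With this value of $\alpha$,
take $n$ large enough so that

\begin{equation}
\delta_2 \ge \frac{\alpha}{n(\alpha-1)} \log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{12.2.140}
\end{equation}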

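Now, with $\alpha$ and $n$ chosen as above, every $(n, |\mathcal{M}|, \varepsilon)$ classical
communication protocol satisfies (12.2.36) in Corollary 12.15. In particular, using
the additivity of the sandwiched Rényi Holevo information for all $\alpha > 1$, we can
write (12.2.36) as

\begin{equation}
\frac{1}{n} \log_2 |\mathcal{M}| \le \widetilde{\chi}_\alpha(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)} \log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{12.2.141}
\end{equation}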

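Rearranging the right-hand side of this inequality, and using the assumptions in
(12.2.138)–(12.2.140), we obtain

\begin{align}
\frac{1}{n} \log_2 |\mathcal{M}| &\le \chi(\mathcal{N}) + \widetilde{\chi}_\alpha(\mathcal{N}) - \chi(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)} \log_2\!\left(\frac{1}{1-\varepsilon}\right) \tag{12.2.142}\\
&\le \chi(\mathcal{N}) + \delta_1 + \delta_2 \tag{12.2.143}\\
&= \chi(\mathcal{N}) + \delta' \tag{12.2.144}\\
&< \chi(\mathcal{N}) + \delta. \tag{12.2.145}
\end{align}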

So we have that $\chi(\mathcal{N}) + \delta > \frac{1}{n} \log_2 |\mathcal{M}|$ for all $(n, |\mathcal{M}|, \varepsilon)$ classical communication
protocols with $n$ sufficiently large. Due to this strict inequality, it follows that
there cannot exist an $(n, 2^{n(\chi(\mathcal{N})+\delta)}, \varepsilon)$ classical communication protocol for all
sufficiently large $n$ such that (12.2.140) holds, for if it did there would exist some
message set $\mathcal{M}$ such that $\frac{1}{n} \log_2 |\mathcal{M}| = \chi(\mathcal{N}) + \delta$, which we have just seen is
not possible. Since $\varepsilon$ and $\delta$ are arbitrary, we conclude that for all $\varepsilon \in [0, 1)$,
$\delta > 0$, and sufficiently large $n$, there does not exist an $(n, 2^{n(\chi(\mathcal{N})+\delta)}, \varepsilon)$ classical
communication protocol. This means that $\chi(\mathcal{N})$ is a strong converse rate, which
completes the proof. ■
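To make the phrase "take $n$ large enough" in (12.2.140) concrete, the following sketch computes the smallest block length $n$ for which the penalty term $\frac{\alpha}{n(\alpha-1)}\log_2\frac{1}{1-\varepsilon}$ drops below a target $\delta_2$. The function names and the sample values of $\alpha$, $\varepsilon$, and $\delta_2$ are hypothetical; the code only illustrates the arithmetic behind the condition.

```python
import math

def penalty_term(n, alpha, eps):
    """The term alpha / (n (alpha - 1)) * log2(1 / (1 - eps)) from (12.2.140)."""
    return alpha / (n * (alpha - 1)) * math.log2(1 / (1 - eps))

def min_block_length(alpha, eps, delta2):
    """Smallest integer n >= 1 with penalty_term(n, alpha, eps) <= delta2."""
    if not (alpha > 1 and 0 <= eps < 1 and delta2 > 0):
        raise ValueError("need alpha > 1, 0 <= eps < 1, and delta2 > 0")
    n = max(1, math.ceil(penalty_term(1, alpha, eps) / delta2))
    # Guard against floating-point rounding in the ceiling above.
    while penalty_term(n, alpha, eps) > delta2:
        n += 1
    return n

# Illustrative values (assumed): alpha = 2, eps = 0.75, delta2 = 0.125.
n_min = min_block_length(2.0, 0.75, 0.125)
print(n_min, penalty_term(n_min, 2.0, 0.75))
```

Note the trade-off visible in the formula: taking $\alpha$ closer to 1 makes $\delta_1$ smaller in (12.2.139) but inflates the factor $\alpha/(\alpha-1)$, so a larger $n$ is needed.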

12.2.4.1 The Strong Converse from a Different Point of View

Just as we did in the case of entanglement-assisted classical communication in
Appendix 11.G, we can use the alternative definitions of classical capacity and strong
converse classical capacity (stated in Appendix A) to see that $C(\mathcal{N}) = \widetilde{C}(\mathcal{N}) = \chi(\mathcal{N})$
for all channels $\mathcal{N}$ for which the sandwiched Rényi Holevo information is additive and
that, as shown in Figure 12.5, the quantity $\chi(\mathcal{N})$ is a sharp dividing point between
reliable, error-free communication and communication with error approaching one
exponentially fast. Specifically, by following the arguments in Appendix 11.G, we
obtain the following: for every sequence $\{(n, 2^{nR}, \varepsilon_n)\}_{n\in\mathbb{N}}$ of $(n, |\mathcal{M}|, \varepsilon)$ protocols,
with each element of the sequence having an arbitrary (but fixed) rate $R > \chi(\mathcal{N})$,
the sequence $\{\varepsilon_n\}_{n\in\mathbb{N}}$ of error probabilities approaches one at an exponential rate.

12.2.5 General Upper Bounds on the Strong Converse Classical Capacity

The difficulty in proving the additivity of the Holevo information for a general
channel, and thus obtaining an upper bound on its classical capacity, has motivated
the study of other, more tractable upper bounds on the classical capacity of a
quantum channel. In this section, we present two upper bounds on the strong
converse classical capacity of a quantum channel.

[Figure 12.5 (plot): error probability $\varepsilon_n$ on the vertical axis versus rate $R_n$ on the
horizontal axis, in the limit $n \to \infty$, with the threshold $\chi(\mathcal{N})$ marked on the rate axis.]

Figure 12.5: The error probability $\varepsilon_n$ as a function of the rate $R_n$ for classical
communication over a quantum channel $\mathcal{N}$ for which the sandwiched Rényi
Holevo information $\widetilde{\chi}_\alpha(\mathcal{N})$ is additive. As $n \to \infty$, for every rate below the
Holevo information $\chi(\mathcal{N})$, there exists a sequence of protocols with error
probability converging to zero. For every rate above the Holevo information
$\chi(\mathcal{N})$, the error probability converges to one for all possible protocols.

12.2.5.1 𝚼-Information Upper Bound

Recall from Proposition 12.3 that the following upper bound on the number of
transmitted bits holds for every (|M|, 𝜀) classical communication protocol:

log2 |M| ≤ 𝜒𝐻𝜀 (N), (12.2.146)

where the 𝜀-hypothesis testing Holevo information 𝜒𝐻𝜀 (N) of the quantum channel
N is defined in (7.11.93) as

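\begin{align}
\chi_H^{\varepsilon}(\mathcal{N}) &= \sup_{\rho_{XA}} I_H^{\varepsilon}(X; B)_{\omega} \tag{12.2.147}\\
&= \sup_{\rho_{XA}} \inf_{\sigma_B} D_H^{\varepsilon}(\mathcal{N}_{A\to B}(\rho_{XA})\|\rho_X \otimes \sigma_B). \tag{12.2.148}
\end{align}

Here, $\omega_{XB} = \mathcal{N}_{A\to B}(\rho_{XA})$, and the optimization is over classical–quantum states
$\rho_{XA}$ of the form $\rho_{XA} = \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes \rho_A^x$, where $\mathcal{X}$ is a finite alphabet,
$p : \mathcal{X} \to [0, 1]$ is a probability distribution, and $\{(p(x), \rho_A^x)\}_{x\in\mathcal{X}}$ is an ensemble
of states.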
We derived the inequality in (12.2.146) by comparing the actual classical
communication protocol over the channel N with a classical communication
protocol over the replacement channel R (see (12.1.23) in Section 12.1.1). The
replacement channel is useless for classical communication because it discards the
state encoded with the message and replaces it with a fixed state. We can make the
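comparison between the channel $\mathcal{N}$ and the channel $\mathcal{R}$ for classical communication
more explicit by writing the quantity $\chi_H^{\varepsilon}(\mathcal{N})$ as
\begin{equation}
\chi_H^{\varepsilon}(\mathcal{N}) = \sup_{\rho_{XA}} \inf_{\mathcal{R}_{A\to B}} D_H^{\varepsilon}(\mathcal{N}_{A\to B}(\rho_{XA})\|\mathcal{R}_{A\to B}(\rho_{XA})), \tag{12.2.149}
\end{equation}

where $\mathcal{R}_{A\to B}$ has the action $\mathcal{R}_{A\to B}(\rho_{XA}) = \rho_X \otimes \sigma_B$.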

Intuitively, an approach for obtaining an alternative upper bound on the strong
converse classical capacity is to expand the set of useless channels from the
replacement channels to the following set of completely positive trace non-increasing
maps:
\begin{equation}
\mathfrak{F} := \{\mathcal{F}_{A\to B} : \exists\, \sigma_B \ge 0,\ \operatorname{Tr}[\sigma_B] \le 1,\ \mathcal{F}_{A\to B}(\rho_A) \le \sigma_B\ \ \forall \rho_A \in \mathrm{D}(\mathcal{H}_A)\}. \tag{12.2.150}
\end{equation}
Although this set contains completely positive maps that are not trace preserving,
its elements can be thought of intuitively as also being useless for classical
communication, and it contains the set of replacement channels. Using this set, we
define the generalized Υ-information as follows:

Definition 12.20 Generalized 𝚼-Information

Let 𝑫 be a generalized divergence (see Definition 7.15). For every quantum
channel N 𝐴→𝐵 , we define the generalized Υ-information of N as

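\begin{equation}
\boldsymbol{\Upsilon}(\mathcal{N}) := \sup_{\psi_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}} \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{RA})\|\mathcal{F}_{A\to B}(\psi_{RA})), \tag{12.2.151}
\end{equation}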

where the supremum is over pure states 𝜓 𝑅 𝐴 , with the dimension of 𝑅 the same
as the dimension of 𝐴.

Remark: Note that it suffices to optimize over pure states 𝜓 𝑅 𝐴, with the dimension of 𝑅 equal
to the dimension of 𝐴, when calculating the generalized Υ-information of a channel, i.e., for
general states 𝜌 𝑅 𝐴 (with the dimension of 𝑅 not necessarily equal to the dimension of 𝐴),

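\begin{equation}
\sup_{\rho_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}} \boldsymbol{D}(\mathcal{N}_{A\to B}(\rho_{RA})\|\mathcal{F}_{A\to B}(\rho_{RA})) = \boldsymbol{\Upsilon}(\mathcal{N}). \tag{12.2.152}
\end{equation}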

The proof of this proceeds analogously to the steps in (7.11.4)–(7.11.2) for proving that it suffices
to optimize over pure states when calculating the generalized channel divergence.

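In this section, we are interested in the following generalized Υ-information
channel quantities:

1. The Υ-information of $\mathcal{N}$,

\begin{equation}
\Upsilon(\mathcal{N}) := \sup_{\psi_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}} D(\mathcal{N}_{A\to B}(\psi_{RA})\|\mathcal{F}_{A\to B}(\psi_{RA})). \tag{12.2.153}
\end{equation}

2. The $\varepsilon$-hypothesis testing Υ-information of $\mathcal{N}$,

\begin{equation}
\Upsilon_H^{\varepsilon}(\mathcal{N}) := \sup_{\psi_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}} D_H^{\varepsilon}(\mathcal{N}_{A\to B}(\psi_{RA})\|\mathcal{F}_{A\to B}(\psi_{RA})). \tag{12.2.154}
\end{equation}

3. The sandwiched Rényi Υ-information of $\mathcal{N}$,

\begin{equation}
\widetilde{\Upsilon}_\alpha(\mathcal{N}) := \sup_{\psi_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA})\|\mathcal{F}_{A\to B}(\psi_{RA})), \tag{12.2.155}
\end{equation}

where $\alpha \in [1/2, 1) \cup (1, \infty)$.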

Proposition 12.21 Holevo Information and 𝚼-Information

The Υ-information Υ(N) of a quantum channel N is greater than or equal to its
Holevo information 𝜒(N):

Υ(N) ≥ 𝜒(N). (12.2.156)

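Proof: To see this, we apply Proposition 7.83 to obtain

\begin{align}
\Upsilon(\mathcal{N}) &= \sup_{\rho_A} \inf_{\mathcal{F}\in\mathfrak{F}} D\!\left(\sqrt{\rho_A}\, \Gamma^{\mathcal{N}}_{AB} \sqrt{\rho_A}\, \Big\|\, \sqrt{\rho_A}\, \Gamma^{\mathcal{F}}_{AB} \sqrt{\rho_A}\right) \tag{12.2.157}\\
&= \inf_{\mathcal{F}\in\mathfrak{F}} \sup_{\rho_A} D\!\left(\sqrt{\rho_A}\, \Gamma^{\mathcal{N}}_{AB} \sqrt{\rho_A}\, \Big\|\, \sqrt{\rho_A}\, \Gamma^{\mathcal{F}}_{AB} \sqrt{\rho_A}\right). \tag{12.2.158}
\end{align}

Then, by employing (12.2.158), we find that

\begin{align}
\Upsilon(\mathcal{N}) &= \inf_{\mathcal{F}\in\mathfrak{F}} \sup_{\rho_A} D\!\left(\sqrt{\rho_A}\, \Gamma^{\mathcal{N}}_{AB} \sqrt{\rho_A}\, \Big\|\, \sqrt{\rho_A}\, \Gamma^{\mathcal{F}}_{AB} \sqrt{\rho_A}\right) \tag{12.2.159}\\
&\ge \inf_{\mathcal{F}\in\mathfrak{F}} \sup_{\rho_A} D(\mathcal{N}_{A\to B}(\rho_A)\|\mathcal{F}_{A\to B}(\rho_A)), \tag{12.2.160}
\end{align}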

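where, to obtain the inequality, we used the fact that

\begin{align}
\sqrt{\rho_A}\, \Gamma^{\mathcal{N}}_{AB} \sqrt{\rho_A} &= (\sqrt{\rho_{A'}} \otimes \mathbb{1}_B)\, \mathcal{N}_{A\to B}(\Gamma_{A'A})\, (\sqrt{\rho_{A'}} \otimes \mathbb{1}_B) \tag{12.2.161}\\
&= \mathcal{N}_{A\to B}\!\left((\sqrt{\rho_{A'}} \otimes \mathbb{1}_A)\, \Gamma_{A'A}\, (\sqrt{\rho_{A'}} \otimes \mathbb{1}_A)\right) \tag{12.2.162}\\
&= \mathcal{N}_{A\to B}\!\left(\left(\mathbb{1}_{A'} \otimes \sqrt{\rho_A^{\mathsf{T}}}\right) \Gamma_{A'A} \left(\mathbb{1}_{A'} \otimes \sqrt{\rho_A^{\mathsf{T}}}\right)\right). \tag{12.2.163}
\end{align}

In the last line we used the transpose trick (see (2.2.40)). Then, we applied the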
monotonicity of the quantum relative entropy with respect to the partial trace Tr 𝐴 .
Finally, we used the fact that 𝜌 T𝐴 is a state for every 𝜌 𝐴 , so that the optimization
over states remains unchanged.
Continuing, we have that

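\begin{align}
\Upsilon(\mathcal{N}) &\ge \inf_{\mathcal{F}\in\mathfrak{F}} \sup_{\rho_A} D(\mathcal{N}_{A\to B}(\rho_A)\|\sigma_{\mathcal{F}}) \tag{12.2.164}\\
&\ge \inf_{\sigma_B} \sup_{\rho_A} D(\mathcal{N}_{A\to B}(\rho_A)\|\sigma_B) \tag{12.2.165}\\
&= \chi(\mathcal{N}). \tag{12.2.166}
\end{align}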

To obtain the first inequality, we first used the fact that for every map F ∈ 𝔉 there
exists a state, which we call 𝜎F , such that F 𝐴→𝐵 (𝜌 𝐴 ) ≤ 𝜎F for all input states 𝜌 𝐴 .
We then used 2.(d) in Proposition 7.3. To obtain the last inequality, we simply
enlarged the set over which the infimum is performed to include all states. Then,
to obtain the equality on the last line, we used the expression in (12.2.80) for the
Holevo information. ■

We now prove an analogue of Proposition 12.3 involving the $\varepsilon$-hypothesis testing
Υ-information, $\Upsilon_H^{\varepsilon}(\mathcal{N})$, in place of the $\varepsilon$-hypothesis testing Holevo information
$\chi_H^{\varepsilon}(\mathcal{N})$.

Proposition 12.22
Let N be a quantum channel. For every (|M|, 𝜀) classical communication
protocol over N, the number of bits transmitted over N is bounded from above
by the 𝜀-hypothesis testing Υ-information of N, i.e.,

log2 |M| ≤ Υ𝐻𝜀 (N). (12.2.167)

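Proof: For every $(|\mathcal{M}|, \varepsilon)$ classical communication protocol, with encoding and
decoding channels given by $\mathcal{E}$ and $\mathcal{D}$, respectively, the maximal error probability
criterion $p^*_{\text{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \le \varepsilon$ holds. This implies $p_{\text{err}}((\mathcal{E}, \mathcal{D}); p, \mathcal{N}) \le \varepsilon$ for the
average error probability, where $p : \mathcal{M} \to [0, 1]$ is the uniform prior probability
distribution over the messages in $\mathcal{M}$. If the encoding channel $\mathcal{E}$ is defined such
that we obtain the set $\{\rho_A^m\}_{m\in\mathcal{M}}$ of states associated to each message $m \in \mathcal{M}$ (see
(12.1.2)), and the decoding channel $\mathcal{D}$ is defined by the POVM $\{\Lambda_B^{\widehat{m}}\}_{\widehat{m}\in\mathcal{M}}$, then we
can write the average success probability $p_{\text{succ}}((\mathcal{E}, \mathcal{D}); p, \mathcal{N})$ of the code $(\mathcal{E}, \mathcal{D})$ as

\begin{align}
p_{\text{succ}}((\mathcal{E}, \mathcal{D}); p, \mathcal{N}) &= 1 - p_{\text{err}}((\mathcal{E}, \mathcal{D}); p, \mathcal{N}) \tag{12.2.168}\\
&= \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} \operatorname{Tr}[\Lambda_B^m\, \mathcal{N}_{A\to B}(\rho_A^m)], \tag{12.2.169}
\end{align}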

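and we have that $p_{\text{succ}}((\mathcal{E}, \mathcal{D}); p, \mathcal{N}) \ge 1 - \varepsilon$. Now, recall from (4.2.5) that we can
write the action of $\mathcal{N}_{A\to B}$ in terms of its Choi representation $\Gamma^{\mathcal{N}}_{AB}$ as

\begin{equation}
\mathcal{N}_{A\to B}(\rho_A^m) = \operatorname{Tr}_A\!\left[\left((\rho_A^m)^{\mathsf{T}} \otimes \mathbb{1}_B\right) \Gamma^{\mathcal{N}}_{AB}\right] \tag{12.2.170}
\end{equation}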

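for all $m \in \mathcal{M}$. Also, let us define the average state

\begin{equation}
\rho_A := \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} \rho_A^m, \tag{12.2.171}
\end{equation}

and a purification of it

\begin{equation}
|\phi\rangle_{AA'} := (\mathbb{1}_A \otimes \sqrt{\rho_{A'}})|\Gamma\rangle_{AA'} = \left(\sqrt{\rho_A^{\mathsf{T}}} \otimes \mathbb{1}_{A'}\right)|\Gamma\rangle_{AA'}, \tag{12.2.172}
\end{equation}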

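where we used the transpose trick in (2.2.40) to obtain the last equality. Then,
observe that
\begin{align}
\mathcal{N}_{A'\to B}(\phi_{AA'}) &= \sqrt{\rho_A^{\mathsf{T}}}\, \mathcal{N}_{A'\to B}(\Gamma_{AA'}) \sqrt{\rho_A^{\mathsf{T}}} \tag{12.2.173}\\
&= \sqrt{\rho_A^{\mathsf{T}}}\, \Gamma^{\mathcal{N}}_{AB} \sqrt{\rho_A^{\mathsf{T}}}, \tag{12.2.174}
\end{align}

which implies that the Choi operator $\Gamma^{\mathcal{N}}_{AB}$ can be written as$^{2}$

\begin{equation}
\Gamma^{\mathcal{N}}_{AB} = (\rho_A^{\mathsf{T}})^{-\frac{1}{2}}\, \mathcal{N}_{A'\to B}(\phi_{AA'})\, (\rho_A^{\mathsf{T}})^{-\frac{1}{2}}. \tag{12.2.175}
\end{equation}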

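2 Note that if $\rho_A^{\mathsf{T}}$ is not invertible, then the inverse is understood to be on the support of $\rho_A^{\mathsf{T}}$.

Therefore,

\begin{align}
& p_{\text{succ}}((\mathcal{E}, \mathcal{D}); p, \mathcal{N}) \notag\\
&= \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} \operatorname{Tr}[\Lambda_B^m\, \mathcal{N}_{A\to B}(\rho_A^m)] \tag{12.2.176}\\
&= \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} \operatorname{Tr}\!\left[\left((\rho_A^m)^{\mathsf{T}} \otimes \Lambda_B^m\right) \Gamma^{\mathcal{N}}_{AB}\right] \tag{12.2.177}\\
&= \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} \operatorname{Tr}\!\left[\left((\rho_A^m)^{\mathsf{T}} \otimes \Lambda_B^m\right) (\rho_A^{\mathsf{T}})^{-\frac{1}{2}}\, \mathcal{N}_{A'\to B}(\phi_{AA'})\, (\rho_A^{\mathsf{T}})^{-\frac{1}{2}}\right] \tag{12.2.178}\\
&= \operatorname{Tr}\!\left[(\rho_A^{\mathsf{T}})^{-\frac{1}{2}} \left(\frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} (\rho_A^m)^{\mathsf{T}} \otimes \Lambda_B^m\right) (\rho_A^{\mathsf{T}})^{-\frac{1}{2}}\, \mathcal{N}_{A'\to B}(\phi_{AA'})\right] \tag{12.2.179}\\
&= \operatorname{Tr}[\Omega_{AB}\, \mathcal{N}_{A'\to B}(\phi_{AA'})], \tag{12.2.180}
\end{align}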

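where
\begin{equation}
\Omega_{AB} := (\rho_A^{\mathsf{T}})^{-\frac{1}{2}} \left(\frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} (\rho_A^m)^{\mathsf{T}} \otimes \Lambda_B^m\right) (\rho_A^{\mathsf{T}})^{-\frac{1}{2}}. \tag{12.2.181}
\end{equation}
Note that $\Omega_{AB}$ is positive semi-definite, i.e., $\Omega_{AB} \ge 0$. Also, observe that since
$\Lambda_B^m \le \mathbb{1}_B$ for all $m \in \mathcal{M}$, we have that
\begin{equation}
\Omega_{AB} \le (\rho_A^{\mathsf{T}})^{-\frac{1}{2}} \left(\frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} (\rho_A^m)^{\mathsf{T}} \otimes \mathbb{1}_B\right) (\rho_A^{\mathsf{T}})^{-\frac{1}{2}} = \mathbb{1}_{AB}. \tag{12.2.182}
\end{equation}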
𝑚∈M

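Together with $\Omega_{AB} \ge 0$, this means that $\Omega_{AB}$ is a measurement operator. So we
have that

\begin{equation}
p_{\text{succ}}((\mathcal{E}, \mathcal{D}); p, \mathcal{N}) = \operatorname{Tr}[\Omega_{AB}\, \mathcal{N}_{A'\to B}(\phi_{AA'})] \ge 1 - \varepsilon. \tag{12.2.183}
\end{equation}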

Now, let F ∈ 𝔉. This means that there exists a state, call it 𝜎𝐵 , such that
F(𝜌 𝐴 ) ≤ 𝜎𝐵 for all states 𝜌 𝐴 . We find that

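\begin{align}
& \operatorname{Tr}[\Omega_{AB}\, \mathcal{F}_{A'\to B}(\phi_{AA'})] \notag\\
&= \operatorname{Tr}\!\left[(\rho_A^{\mathsf{T}})^{-\frac{1}{2}} \left(\frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} (\rho_A^m)^{\mathsf{T}} \otimes \Lambda_B^m\right) (\rho_A^{\mathsf{T}})^{-\frac{1}{2}}\, \mathcal{F}_{A'\to B}(\phi_{AA'})\right] \tag{12.2.184}\\
&= \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} \operatorname{Tr}\!\left[\left((\rho_A^m)^{\mathsf{T}} \otimes \Lambda_B^m\right) \Gamma^{\mathcal{F}}_{AB}\right] \tag{12.2.185}\\
&= \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} \operatorname{Tr}[\Lambda_B^m\, \mathcal{F}_{A\to B}(\rho_A^m)] \tag{12.2.186}\\
&\le \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} \operatorname{Tr}[\Lambda_B^m\, \sigma_B] \tag{12.2.187}\\
&\le \frac{1}{|\mathcal{M}|}, \tag{12.2.188}
\end{align}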
where we used (12.2.175) to obtain the second equality and we used the fact that
F(𝜌 𝐴 ) ≤ 𝜎𝐵 for every input state 𝜌 𝐴 to obtain the second-to-last inequality.
Now, by optimizing the quantity Tr[Ω 𝐴𝐵 F 𝐴′ →𝐵 (𝜙 𝐴𝐴′ )] over all measurement
operators, subject to the constraint Tr[Ω 𝐴𝐵 N 𝐴′ →𝐵 (𝜙 𝐴𝐴′ )] ≥ 1 − 𝜀, we get that
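\begin{equation}
\log_2 |\mathcal{M}| \le D_H^{\varepsilon}(\mathcal{N}_{A'\to B}(\phi_{AA'})\|\mathcal{F}_{A'\to B}(\phi_{AA'})). \tag{12.2.189}
\end{equation}
Since this holds for every $\mathcal{F} \in \mathfrak{F}$, we have that
\begin{equation}
\log_2 |\mathcal{M}| \le \inf_{\mathcal{F}\in\mathfrak{F}} D_H^{\varepsilon}(\mathcal{N}_{A'\to B}(\phi_{AA'})\|\mathcal{F}_{A'\to B}(\phi_{AA'})). \tag{12.2.190}
\end{equation}

Finally, optimizing over all pure states $\phi_{AA'}$, we conclude that

\begin{equation}
\log_2 |\mathcal{M}| \le \sup_{\psi_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}} D_H^{\varepsilon}(\mathcal{N}_{A\to B}(\psi_{RA})\|\mathcal{F}_{A\to B}(\psi_{RA})) = \Upsilon_H^{\varepsilon}(\mathcal{N}), \tag{12.2.191}
\end{equation}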

as required. ■
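The passage from the bound $\operatorname{Tr}[\Omega_{AB}\mathcal{F}_{A'\to B}(\phi_{AA'})] \le 1/|\mathcal{M}|$ to (12.2.189) can be spelled out as follows. This is a sketch that assumes the standard definition of the hypothesis testing relative entropy from Chapter 7, namely $D_H^{\varepsilon}(\rho\|\sigma) := -\log_2 \inf\{\operatorname{Tr}[\Lambda\sigma] : 0 \le \Lambda \le \mathbb{1},\ \operatorname{Tr}[\Lambda\rho] \ge 1-\varepsilon\}$. Since $\Omega_{AB}$ is a measurement operator satisfying the constraint, it is a feasible point of the infimum:

```latex
\begin{align}
D_H^{\varepsilon}(\mathcal{N}_{A'\to B}(\phi_{AA'})\|\mathcal{F}_{A'\to B}(\phi_{AA'}))
&= -\log_2 \inf_{\substack{0 \le \Lambda_{AB} \le \mathbb{1}_{AB} \\ \operatorname{Tr}[\Lambda_{AB}\mathcal{N}_{A'\to B}(\phi_{AA'})] \ge 1-\varepsilon}} \operatorname{Tr}[\Lambda_{AB}\, \mathcal{F}_{A'\to B}(\phi_{AA'})] \\
&\ge -\log_2 \operatorname{Tr}[\Omega_{AB}\, \mathcal{F}_{A'\to B}(\phi_{AA'})] \\
&\ge -\log_2 \frac{1}{|\mathcal{M}|} = \log_2 |\mathcal{M}|.
\end{align}
```

The first inequality uses feasibility of $\Omega_{AB}$ together with the fact that $-\log_2$ is decreasing, and the second uses the chain of inequalities ending in $1/|\mathcal{M}|$.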

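As an immediate consequence of Propositions 12.22, 7.70, and 7.71, we have
the following two bounds:

Proposition 12.23
Let $\mathcal{N}$ be a quantum channel, let $\varepsilon \in [0, 1)$, and let $\alpha > 1$. For every $(|\mathcal{M}|, \varepsilon)$
classical communication protocol over $\mathcal{N}$, the following bounds hold:
\begin{align}
\log_2 |\mathcal{M}| &\le \frac{1}{1-\varepsilon} \left(\Upsilon(\mathcal{N}) + h_2(\varepsilon)\right), \tag{12.2.192}\\
\log_2 |\mathcal{M}| &\le \widetilde{\Upsilon}_\alpha(\mathcal{N}) + \frac{\alpha}{\alpha-1} \log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{12.2.193}
\end{align}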

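In the asymptotic setting, the bounds above become the following:

\begin{align}
\frac{1}{n} \log_2 |\mathcal{M}| &\le \frac{1}{1-\varepsilon} \left(\frac{1}{n} \Upsilon(\mathcal{N}^{\otimes n}) + \frac{1}{n} h_2(\varepsilon)\right), \tag{12.2.194}\\
\frac{1}{n} \log_2 |\mathcal{M}| &\le \frac{1}{n} \widetilde{\Upsilon}_\alpha(\mathcal{N}^{\otimes n}) + \frac{\alpha}{n(\alpha-1)} \log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\, \alpha > 1. \tag{12.2.195}
\end{align}

These upper bounds hold for every $(n, |\mathcal{M}|, \varepsilon)$ classical communication protocol
over a quantum channel $\mathcal{N}$, where $n \in \mathbb{N}$ and $\varepsilon \in [0, 1)$.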
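To get a feel for how the two one-shot bounds (12.2.192) and (12.2.193) behave, the following sketch evaluates their right-hand sides for given numerical inputs. The values passed for $\Upsilon(\mathcal{N})$ and $\widetilde{\Upsilon}_\alpha(\mathcal{N})$ are placeholders, not computed channel quantities, and the function names are hypothetical.

```python
import math

def binary_entropy(eps):
    """Binary entropy h_2(eps) in bits, with h_2(0) = h_2(1) = 0."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)

def weak_converse_bound(upsilon, eps):
    """Right-hand side of (12.2.192): (Upsilon + h_2(eps)) / (1 - eps)."""
    return (upsilon + binary_entropy(eps)) / (1 - eps)

def renyi_bound(upsilon_alpha, alpha, eps):
    """Right-hand side of (12.2.193), valid for alpha > 1."""
    return upsilon_alpha + alpha / (alpha - 1) * math.log2(1 / (1 - eps))

# Placeholder inputs (assumed): Upsilon = 1 bit, eps = 0.5, alpha = 2.
print(weak_converse_bound(1.0, 0.5))  # grows like 1 / (1 - eps)
print(renyi_bound(1.0, 2.0, 0.5))     # additive penalty only
```

The structural difference is the point of the example: in (12.2.192) the error $\varepsilon$ multiplies the information quantity, whereas in (12.2.193) it only contributes an additive term, which vanishes after dividing by $n$ in (12.2.195). This is why the Rényi bound is the one used for the strong converse.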
Now, as with the Holevo information and the sandwiched Rényi Holevo information,
we are faced with the question of the additivity of the Υ-information and the
sandwiched Rényi Υ-information. Our primary focus is on the latter, since we would
like to make a statement about the strong converse for channels more general than
entanglement-breaking channels. It turns out that the sandwiched Rényi
Υ-information is additive for irreducibly-covariant channels.
Recall from Definition 4.18 that a channel $\mathcal{N}_{A\to B}$ is covariant with respect to
a group $G$ if there exist projective unitary representations $\{U_A^g\}_{g\in G}$ and $\{V_B^g\}_{g\in G}$
such that
\begin{equation}
\mathcal{N}(U_A^g \rho_A (U_A^g)^\dagger) = V_B^g\, \mathcal{N}(\rho_A)\, (V_B^g)^\dagger \tag{12.2.196}
\end{equation}
for all states $\rho_A$ and all $g \in G$. The channel $\mathcal{N}$ is called irreducibly covariant if
the representation $\{U_A^g\}_{g\in G}$ acting on the input space of the channel is irreducible,
which means that it satisfies
\begin{equation}
\frac{1}{|G|} \sum_{g\in G} U_A^g \rho_A (U_A^g)^\dagger = \frac{\mathbb{1}_A}{d_A} \tag{12.2.197}
\end{equation}

for every state $\rho_A$.
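The irreducibility condition (12.2.197) can be checked numerically in the simplest case. The sketch below uses the four single-qubit Pauli operators, a standard example of an irreducible projective unitary representation (of the group $\mathbb{Z}_2 \times \mathbb{Z}_2$), and verifies that averaging over them maps a qubit state to the maximally mixed state. The choice of group, the input state `rho`, and the helper names are assumptions made for illustration and are not taken from the text.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose of a square matrix given as nested lists."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

# Single-qubit Pauli operators I, X, Y, Z.
PAULIS = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def twirl(rho):
    """Average of U rho U^dagger over the Pauli operators."""
    n = len(rho)
    total = [[0j] * n for _ in range(n)]
    for u in PAULIS:
        term = matmul(matmul(u, rho), dagger(u))
        for i in range(n):
            for j in range(n):
                total[i][j] += term[i][j] / len(PAULIS)
    return total

# An illustrative qubit density matrix (assumed data).
rho = [[0.75, 0.25 - 0.1j], [0.25 + 0.1j, 0.25]]
twirled = twirl(rho)
print(twirled)
```

For any unit-trace input the output should be the $2 \times 2$ identity divided by $d_A = 2$, independent of the off-diagonal entries of `rho`.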

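Proposition 12.24 Generalized 𝚼-Information for Irreducibly-Covariant Channels
Let $\mathcal{N}_{A\to B}$ be an irreducibly-covariant quantum channel. Then the generalized
Υ-information of $\mathcal{N}$ can be calculated using the maximally entangled state $\Phi_{RA}$,
i.e.,
\begin{align}
\boldsymbol{\Upsilon}(\mathcal{N}) &= \inf_{\mathcal{F}\in\mathfrak{F}} \boldsymbol{D}(\mathcal{N}_{A\to B}(\Phi_{RA})\|\mathcal{F}_{A\to B}(\Phi_{RA})) \notag\\
&= \inf_{\mathcal{F}\in\mathfrak{F}} \boldsymbol{D}(\rho^{\mathcal{N}}_{RB}\|\rho^{\mathcal{F}}_{RB}). \tag{12.2.198}
\end{align}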

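Proof: By simply restricting the optimization over states $\psi_{RA}$ in the definition of
the generalized Υ-information to the maximally entangled state $\Phi_{RA}$, we obtain

\begin{align}
\boldsymbol{\Upsilon}(\mathcal{N}) &= \sup_{\psi_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}} \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{RA})\|\mathcal{F}_{A\to B}(\psi_{RA})) \tag{12.2.199}\\
&\ge \inf_{\mathcal{F}\in\mathfrak{F}} \boldsymbol{D}(\mathcal{N}_{A\to B}(\Phi_{RA})\|\mathcal{F}_{A\to B}(\Phi_{RA})). \tag{12.2.200}
\end{align}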


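To prove the reverse inequality, let us recall Proposition 7.84, specifically its proof.
Let $G$ be the group with respect to which $\mathcal{N}$ is irreducibly covariant, let $\{U_A^g\}_{g\in G}$
be the irreducible representation of $G$ acting on the input space of $\mathcal{N}_{A\to B}$, and let
$\{V_B^g\}_{g\in G}$ be the representation of $G$ acting on the output space of $\mathcal{N}$. Since the maps
$\mathcal{F}_{A\to B}$ in $\mathfrak{F}$, in particular the map achieving the infimum in the definition of $\boldsymbol{\Upsilon}(\mathcal{N})$,
need not be irreducibly covariant, we cannot use Proposition 7.84 directly. Instead,
we consider (7.11.41) in its proof. For every $\mathcal{F}_{A\to B} \in \mathfrak{F}$, by using (7.11.52), the
inequality in (7.11.41) becomes

\begin{align}
& \boldsymbol{D}(\mathcal{N}_{A\to B}(\Phi_{RA})\|\mathcal{F}_{A\to B}(\Phi_{RA})) \notag\\
&\ge \boldsymbol{D}\Bigg(\frac{1}{|G|} \sum_{g\in G} |g\rangle\langle g|_{R'} \otimes \left((\mathcal{V}_B^g)^\dagger \circ \mathcal{N}_{A\to B} \circ \mathcal{U}_A^g\right)(\psi_{RA}) \,\Bigg\| \tag{12.2.201}\\
&\qquad\quad \frac{1}{|G|} \sum_{g\in G} |g\rangle\langle g|_{R'} \otimes \left((\mathcal{V}_B^g)^\dagger \circ \mathcal{F}_{A\to B} \circ \mathcal{U}_A^g\right)(\psi_{RA})\Bigg) \tag{12.2.202}
\end{align}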
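for every pure state $\psi_{RA}$, with the dimension of $R$ equal to the dimension of $A$.
Using the data-processing inequality for the generalized divergence with respect to
the partial trace $\operatorname{Tr}_{R'}$, and using the fact that $\mathcal{N}$ is covariant, we find that

\begin{align}
& \boldsymbol{D}(\mathcal{N}_{A\to B}(\Phi_{RA})\|\mathcal{F}_{A\to B}(\Phi_{RA})) \notag\\
&\ge \boldsymbol{D}\Bigg(\mathcal{N}_{A\to B}(\psi_{RA}) \,\Bigg\|\, \frac{1}{|G|} \sum_{g\in G} \left((\mathcal{V}_B^g)^\dagger \circ \mathcal{F}_{A\to B} \circ \mathcal{U}_A^g\right)(\psi_{RA})\Bigg). \tag{12.2.203}
\end{align}
Now, observe that the map $\mathcal{F}'_{A\to B} := \frac{1}{|G|} \sum_{g\in G} (\mathcal{V}_B^g)^\dagger \circ \mathcal{F}_{A\to B} \circ \mathcal{U}_A^g$ is in the set $\mathfrak{F}$
since $\mathcal{F}$ is. Therefore, optimizing over all $\mathcal{F}' \in \mathfrak{F}$, we find that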

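\begin{align}
& \boldsymbol{D}(\mathcal{N}_{A\to B}(\Phi_{RA})\|\mathcal{F}_{A\to B}(\Phi_{RA})) \notag\\
&\ge \inf_{\mathcal{F}'\in\mathfrak{F}} \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{RA})\|\mathcal{F}'_{A\to B}(\psi_{RA})). \tag{12.2.204}
\end{align}

Since the map $\mathcal{F} \in \mathfrak{F}$ is arbitrary, we conclude that

\begin{align}
& \inf_{\mathcal{F}\in\mathfrak{F}} \boldsymbol{D}(\mathcal{N}_{A\to B}(\Phi_{RA})\|\mathcal{F}_{A\to B}(\Phi_{RA})) \notag\\
&\ge \inf_{\mathcal{F}'\in\mathfrak{F}} \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{RA})\|\mathcal{F}'_{A\to B}(\psi_{RA})). \tag{12.2.205}
\end{align}

Finally, since the state $\psi_{RA}$ is arbitrary, we obtain

\begin{align}
& \inf_{\mathcal{F}\in\mathfrak{F}} \boldsymbol{D}(\mathcal{N}_{A\to B}(\Phi_{RA})\|\mathcal{F}_{A\to B}(\Phi_{RA})) \notag\\
&\ge \sup_{\psi_{RA}} \inf_{\mathcal{F}'\in\mathfrak{F}} \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{RA})\|\mathcal{F}'_{A\to B}(\psi_{RA})) \tag{12.2.206}\\
&= \boldsymbol{\Upsilon}(\mathcal{N}). \tag{12.2.207}
\end{align}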

We thus conclude (12.2.198), as required. ■

By Proposition 12.24, we have that for irreducibly-covariant channels the
sandwiched Rényi Υ-information can be calculated without an optimization over
all pure states $\psi_{RA}$; we simply set $\psi_{RA} = \Phi_{RA}$. Using this fact, we obtain the
following:

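Proposition 12.25 Subadditivity of Sandwiched Rényi 𝚼-Information for Irreducibly-Covariant Channels
Let $\mathcal{N}_{A\to B}$ be an irreducibly covariant quantum channel. Then, for all $\alpha \in
[1/2, 1) \cup (1, \infty)$, the sandwiched Rényi Υ-information $\widetilde{\Upsilon}_\alpha(\mathcal{N})$ is subadditive,
i.e.,
\begin{equation}
\widetilde{\Upsilon}_\alpha(\mathcal{N}^{\otimes n}) \le n\, \widetilde{\Upsilon}_\alpha(\mathcal{N}) \tag{12.2.208}
\end{equation}
for all $n \ge 1$.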

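Proof: Since $\mathcal{N}$ is irreducibly covariant, we use Proposition 12.24 to conclude that

\begin{equation}
\widetilde{\Upsilon}_\alpha(\mathcal{N}^{\otimes n}) = \inf_{\mathcal{F}\in\mathfrak{F}_n} \widetilde{D}_\alpha(\mathcal{N}^{\otimes n}_{A\to B}(\Phi_{R^n A^n})\|\mathcal{F}_{A^n\to B^n}(\Phi_{R^n A^n})), \tag{12.2.209}
\end{equation}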

where 𝔉𝑛 is the set of completely positive maps in (12.2.150) acting on the space
of the system 𝐴𝑛 . Now, the maximally entangled state Φ 𝑅 𝑛 𝐴𝑛 on 𝑛 identical copies
𝑅1 · · · 𝑅𝑛 and 𝐴1 · · · 𝐴𝑛 of the systems 𝑅 and 𝐴 splits into a tensor product in the
following way:
Φ 𝑅 𝑛 𝐴 𝑛 = Φ 𝑅 1 𝐴1 ⊗ · · · ⊗ Φ 𝑅 𝑛 𝐴 𝑛 . (12.2.210)
Furthermore, if we restrict the optimization over maps F 𝐴𝑛 →𝐵𝑛 ∈ 𝔉𝑛 to a tensor
product of identical maps G in the set 𝔉 such that

F 𝐴𝑛 →𝐵𝑛 = G 𝐴1 →𝐵1 ⊗ · · · ⊗ G 𝐴𝑛 →𝐵𝑛 , (12.2.211)

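then we arrive at the following bound:

\begin{align}
& \widetilde{\Upsilon}_\alpha(\mathcal{N}^{\otimes n}) \notag\\
&\le \inf_{\mathcal{G}\in\mathfrak{F}} \widetilde{D}_\alpha\big(\mathcal{N}_{A_1\to B_1}(\Phi_{R_1 A_1}) \otimes \cdots \otimes \mathcal{N}_{A_n\to B_n}(\Phi_{R_n A_n}) \,\big\| \notag\\
&\qquad\qquad\quad \mathcal{G}_{A_1\to B_1}(\Phi_{R_1 A_1}) \otimes \cdots \otimes \mathcal{G}_{A_n\to B_n}(\Phi_{R_n A_n})\big) \tag{12.2.212}\\
&= \inf_{\mathcal{G}\in\mathfrak{F}} n\, \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\Phi_{RA})\|\mathcal{G}_{A\to B}(\Phi_{RA})) \tag{12.2.213}\\
&= n\, \widetilde{\Upsilon}_\alpha(\mathcal{N}), \tag{12.2.214}
\end{align}
as required, where the first equality follows from the additivity of the sandwiched
Rényi relative entropy for tensor-product states (see (7.5.40) in Proposition 7.31). ■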

With the subadditivity of the sandwiched Rényi Υ-information for irreducibly
covariant channels, we can now state the following strong converse theorem.

Theorem 12.26 𝚼-Information Upper Bound on the Strong Converse Classical Capacity of Irreducibly-Covariant Channels
The Υ-information $\Upsilon(\mathcal{N})$ of an irreducibly-covariant quantum channel $\mathcal{N}$ is a
strong converse rate for classical communication over $\mathcal{N}$; i.e.,
\begin{equation}
\widetilde{C}(\mathcal{N}) \le \Upsilon(\mathcal{N}). \tag{12.2.215}
\end{equation}

Proof: Fix 𝜀 ∈ [0, 1) and 𝛿 > 0. Let 𝛿1 , 𝛿2 > 0 be such that

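\begin{equation}
\delta > \delta_1 + \delta_2 =: \delta'. \tag{12.2.216}
\end{equation}
Set $\alpha \in (1, \infty)$ such that
\begin{equation}
\delta_1 \ge \widetilde{\Upsilon}_\alpha(\mathcal{N}) - \Upsilon(\mathcal{N}), \tag{12.2.217}
\end{equation}
which is possible because $\widetilde{\Upsilon}_\alpha(\mathcal{N})$ is monotonically increasing with $\alpha$ (this follows
from Proposition 7.31) and $\lim_{\alpha\to 1^+} \widetilde{\Upsilon}_\alpha(\mathcal{N}) = \Upsilon(\mathcal{N})$ (see Appendix 12.A for a
proof). With this value of $\alpha$, take $n$ large enough so that

\begin{equation}
\delta_2 \ge \frac{\alpha}{n(\alpha-1)} \log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{12.2.218}
\end{equation}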

Now, with the values of 𝑛 and 𝜀 chosen as above, every (𝑛, |M|, 𝜀) classical
communication protocol satisfies (12.2.195). In particular, using the subadditivity
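of the sandwiched Rényi Υ-information for all $\alpha > 1$, we can write (12.2.195) as

\begin{equation}
\frac{1}{n} \log_2 |\mathcal{M}| \le \widetilde{\Upsilon}_\alpha(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)} \log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{12.2.219}
\end{equation}
Rearranging the right-hand side of this inequality, and using the assumptions in
(12.2.216)–(12.2.218), we obtain

\begin{align}
\frac{1}{n} \log_2 |\mathcal{M}| &\le \Upsilon(\mathcal{N}) + \widetilde{\Upsilon}_\alpha(\mathcal{N}) - \Upsilon(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)} \log_2\!\left(\frac{1}{1-\varepsilon}\right) \tag{12.2.220}\\
&\le \Upsilon(\mathcal{N}) + \delta_1 + \delta_2 \tag{12.2.221}\\
&= \Upsilon(\mathcal{N}) + \delta' \tag{12.2.222}\\
&< \Upsilon(\mathcal{N}) + \delta. \tag{12.2.223}
\end{align}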

So we have that $\Upsilon(\mathcal{N}) + \delta > \frac{1}{n} \log_2 |\mathcal{M}|$ for all $(n, |\mathcal{M}|, \varepsilon)$ classical communication
protocols with $n$ sufficiently large. Due to this strict inequality, it follows that
there cannot exist an $(n, 2^{n(\Upsilon(\mathcal{N})+\delta)}, \varepsilon)$ classical communication protocol for all
sufficiently large $n$ such that (12.2.218) holds, for if it did there would exist some
message set $\mathcal{M}$ such that $\frac{1}{n} \log_2 |\mathcal{M}| = \Upsilon(\mathcal{N}) + \delta$, which we have just seen is not
possible. Since $\varepsilon$ and $\delta$ are arbitrary, we have that for all $\varepsilon \in [0, 1)$, $\delta > 0$, and
sufficiently large $n$, there does not exist an $(n, 2^{n(\Upsilon(\mathcal{N})+\delta)}, \varepsilon)$ classical communication
protocol. This means that $\Upsilon(\mathcal{N})$ is a strong converse rate, which completes the
proof. ■

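Theorem 12.26 thus gives us an upper bound on the strong converse classical
capacity $\widetilde{C}(\mathcal{N})$ of any irreducibly-covariant channel $\mathcal{N}$, namely,
\begin{equation}
\widetilde{C}(\mathcal{N}) \le \Upsilon(\mathcal{N}), \tag{12.2.224}
\end{equation}

which in turn implies, via (12.2.12), that

\begin{equation}
C(\mathcal{N}) \le \Upsilon(\mathcal{N}) \tag{12.2.225}
\end{equation}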

for every irreducibly-covariant channel $\mathcal{N}$. Recall that the Holevo information
$\chi(\mathcal{N})$ is an achievable rate for classical communication over any quantum channel,
which implies that $C(\mathcal{N}) \ge \chi(\mathcal{N})$. (This in fact gives another way for concluding
that $\Upsilon(\mathcal{N}) \ge \chi(\mathcal{N})$.)
It turns out that the Υ-information $\Upsilon(\mathcal{N})$ is equal to the Holevo information in
the case of the erasure channel $\mathcal{E}_p^{(d)}$. Recall from Section 11.3.1.2 that the erasure
channel is irreducibly covariant. This fact, along with other reasoning, allows us to
conclude that

\begin{equation}
C(\mathcal{E}_p^{(d)}) = \widetilde{C}(\mathcal{E}_p^{(d)}) = \chi(\mathcal{E}_p^{(d)}) = (1 - p) \log_2 d \tag{12.2.226}
\end{equation}

for all dimensions $d \ge 2$ and all $p \in [0, 1]$. We provide a proof of this chain of
equalities in Section 12.3.1.2 below.

12.2.5.2 SDP Upper Bound

While the Υ-information gives us an upper bound on the strong converse classical
capacity of any irreducibly-covariant channel, computing it is relatively challenging
due to the minimization over the set 𝔉. In this section, we define a subset of 𝔉,
denoted by 𝔉𝛽 , that allows us to obtain a quantity that can be computed using a
semi-definite program (SDP). Furthermore, this quantity turns out to be additive for
all channels, which means that it is an upper bound on the strong converse classical
capacity for all channels.
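The set $\mathfrak{F}_\beta$ is defined as the following set of completely positive maps:

\begin{equation}
\mathfrak{F}_\beta := \{\mathcal{F} \text{ completely positive} : \beta(\mathcal{F}) \le 1\}, \tag{12.2.227}
\end{equation}

where $\beta(\mathcal{F})$ is defined as the solution to the following optimization problem:

\begin{equation}
\beta(\mathcal{F}) := \left\{\begin{array}{ll}
\text{infimum} & \operatorname{Tr}[S_B] \\
\text{subject to} & -R_{AB} \le (\Gamma^{\mathcal{F}}_{AB})^{\mathsf{T}_B} \le R_{AB}, \\
 & -\mathbb{1}_A \otimes S_B \le R_{AB}^{\mathsf{T}_B} \le \mathbb{1}_A \otimes S_B.
\end{array}\right. \tag{12.2.228}
\end{equation}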

Note that the optimization occurs over the operators 𝑆 𝐵 and 𝑅 𝐴𝐵 , and that the
optimization problem as a whole is an SDP. Indeed, recalling the general form of
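an SDP from Section 2.4, we can write it as

\begin{equation}
\beta(\mathcal{F}) = \left\{\begin{array}{ll}
\text{infimum} & \operatorname{Tr}[C X] \\
\text{subject to} & \Phi(X) \ge D, \\
 & X \ge 0,
\end{array}\right. \tag{12.2.229}
\end{equation}

where
\begin{equation}
X = \begin{pmatrix} S_B & 0 \\ 0 & R_{AB} \end{pmatrix}, \qquad C = \begin{pmatrix} \mathbb{1}_B & 0 \\ 0 & 0_{AB} \end{pmatrix}, \tag{12.2.230}
\end{equation}
\begin{equation}
\Phi(X) = \begin{pmatrix}
R_{AB} & 0 & 0 & 0 \\
0 & R_{AB} & 0 & 0 \\
0 & 0 & \mathbb{1}_A \otimes S_B - R_{AB}^{\mathsf{T}_B} & 0 \\
0 & 0 & 0 & \mathbb{1}_A \otimes S_B + R_{AB}^{\mathsf{T}_B}
\end{pmatrix}, \tag{12.2.231}
\end{equation}
\begin{equation}
D = \begin{pmatrix}
(\Gamma^{\mathcal{F}}_{AB})^{\mathsf{T}_B} & 0 & 0 & 0 \\
0 & -(\Gamma^{\mathcal{F}}_{AB})^{\mathsf{T}_B} & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}. \tag{12.2.232}
\end{equation}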

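Note that the constraints in (12.2.228) imply that $S_B$ and $R_{AB}$ are positive
semi-definite, since the constraint $-R_{AB} \le (\Gamma^{\mathcal{F}}_{AB})^{\mathsf{T}_B} \le R_{AB}$ implies that

\begin{align}
R_{AB} - (\Gamma^{\mathcal{F}}_{AB})^{\mathsf{T}_B} &\ge 0, \tag{12.2.233}\\
R_{AB} + (\Gamma^{\mathcal{F}}_{AB})^{\mathsf{T}_B} &\ge 0. \tag{12.2.234}
\end{align}

Adding the two inequalities leads to $R_{AB} \ge 0$. Similarly, the constraint
$-\mathbb{1}_A \otimes S_B \le R_{AB}^{\mathsf{T}_B} \le \mathbb{1}_A \otimes S_B$ implies that $S_B \ge 0$.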
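As an illustration of the constraints in (12.2.228), the following sketch exhibits a feasible point for the qubit identity channel, an example that is not treated in the text. Its Choi operator is $\Gamma_{AB} = \sum_{i,j} |i\rangle\langle j| \otimes |i\rangle\langle j|$, whose partial transpose on $B$ is the swap operator; since the swap operator is Hermitian and squares to the identity, its eigenvalues are $\pm 1$, so $R_{AB} = \mathbb{1}_{AB}$ and $S_B = \mathbb{1}_B$ satisfy both constraints, giving $\beta(\mathrm{id}_d) \le d$. All names in the code are hypothetical, and the check is exact only because every matrix entry is 0, ±1/2, or 1.

```python
d = 2            # qubit identity channel (illustrative choice)
dim = d * d

def idx(a, b):
    """Flatten the pair (a, b) of A and B basis labels into one index."""
    return a * d + b

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def partial_transpose_b(m):
    """Transpose the B indices of an operator on A (x) B."""
    out = [[0.0] * dim for _ in range(dim)]
    for a in range(d):
        for b in range(d):
            for a2 in range(d):
                for b2 in range(d):
                    out[idx(a, b)][idx(a2, b2)] = m[idx(a, b2)][idx(a2, b)]
    return out

identity = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]

# Choi operator of the identity channel: sum_{i,j} |i><j| (x) |i><j|.
choi = [[0.0] * dim for _ in range(dim)]
for i in range(d):
    for j in range(d):
        choi[idx(i, i)][idx(j, j)] = 1.0

swap = partial_transpose_b(choi)

def is_projector(p):
    return matmul(p, p) == p

# (1 +/- swap) / 2 are projectors, hence -1 <= swap <= 1, so R = 1 is feasible.
p_plus = [[(identity[i][j] + swap[i][j]) / 2 for j in range(dim)] for i in range(dim)]
p_minus = [[(identity[i][j] - swap[i][j]) / 2 for j in range(dim)] for i in range(dim)]
feasible = (matmul(swap, swap) == identity
            and is_projector(p_plus) and is_projector(p_minus))

# With R = 1_AB we have R^{T_B} = 1_AB <= 1_A (x) S_B for S_B = 1_B, so Tr[S_B] = d.
upper_bound_on_beta = d
print(feasible, upper_bound_on_beta)
```

The sketch only certifies feasibility, and hence an upper bound on $\beta$; it does not solve the SDP.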
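It is straightforward to show that $\mathfrak{F}_\beta$ is a subset of $\mathfrak{F}$, i.e., $\mathfrak{F}_\beta \subseteq \mathfrak{F}$. Indeed,
suppose that for a given completely positive map $\mathcal{F} \in \mathfrak{F}_\beta$, the quantity $\beta(\mathcal{F})$ in
(12.2.228) is achieved by the operators $(R^*_{AB}, S^*_B)$. Note that since $\mathcal{F} \in \mathfrak{F}_\beta$, by
definition we have that $\operatorname{Tr}[S^*_B] \le 1$, and we also have that $(\Gamma^{\mathcal{F}}_{AB})^{\mathsf{T}_B} \le R^*_{AB}$ and
$(R^*_{AB})^{\mathsf{T}_B} \le \mathbb{1}_A \otimes S^*_B$. Letting $\sigma_B \equiv S^*_B$, for every state $\rho_A$ we find that

\begin{align}
\mathcal{F}_{A\to B}(\rho_A) &= \operatorname{Tr}_A\!\left[(\rho_A^{\mathsf{T}} \otimes \mathbb{1}_B)\, \Gamma^{\mathcal{F}}_{AB}\right] \tag{12.2.235}\\
&= \left(\operatorname{Tr}_A\!\left[(\rho_A^{\mathsf{T}} \otimes \mathbb{1}_B)\, (\Gamma^{\mathcal{F}}_{AB})^{\mathsf{T}_B}\right]\right)^{\mathsf{T}} \tag{12.2.236}\\
&= \left(\operatorname{Tr}_A\!\left[\left(\sqrt{\rho_A^{\mathsf{T}}} \otimes \mathbb{1}_B\right) (\Gamma^{\mathcal{F}}_{AB})^{\mathsf{T}_B} \left(\sqrt{\rho_A^{\mathsf{T}}} \otimes \mathbb{1}_B\right)\right]\right)^{\mathsf{T}} \tag{12.2.237}\\
&\le \left(\operatorname{Tr}_A\!\left[\left(\sqrt{\rho_A^{\mathsf{T}}} \otimes \mathbb{1}_B\right) R^*_{AB} \left(\sqrt{\rho_A^{\mathsf{T}}} \otimes \mathbb{1}_B\right)\right]\right)^{\mathsf{T}} \tag{12.2.238}\\
&= \operatorname{Tr}_A\!\left[\left(\sqrt{\rho_A^{\mathsf{T}}} \otimes \mathbb{1}_B\right) (R^*_{AB})^{\mathsf{T}_B} \left(\sqrt{\rho_A^{\mathsf{T}}} \otimes \mathbb{1}_B\right)\right] \tag{12.2.239}\\
&\le \operatorname{Tr}_A\!\left[(\rho_A^{\mathsf{T}} \otimes \mathbb{1}_B)(\mathbb{1}_A \otimes \sigma_B)\right] \tag{12.2.240}\\
&= \sigma_B, \tag{12.2.241}
\end{align}

where to obtain the first inequality we used $(\Gamma^{\mathcal{F}}_{AB})^{\mathsf{T}_B} \le R^*_{AB}$ and to obtain the second
inequality we used $(R^*_{AB})^{\mathsf{T}_B} \le \mathbb{1}_A \otimes \sigma_B$. Therefore, $\mathcal{F}_{A\to B}(\rho_A) \le \sigma_B$ for all $\rho_A$,
which means that $\mathcal{F}_{A\to B} \in \mathfrak{F}$. Since $\mathcal{F} \in \mathfrak{F}_\beta$ is arbitrary, we conclude that $\mathfrak{F}_\beta \subseteq \mathfrak{F}$.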

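By replacing the set $\mathfrak{F}$ in the definition of the generalized Υ-information of a
channel $\mathcal{N}$ with the set $\mathfrak{F}_\beta$, we obtain the following quantity:

\begin{equation}
\boldsymbol{\Upsilon}^{\beta}(\mathcal{N}) := \sup_{\psi_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}_\beta} \boldsymbol{D}(\mathcal{N}_{A\to B}(\psi_{RA})\|\mathcal{F}_{A\to B}(\psi_{RA})), \tag{12.2.242}
\end{equation}

which we call the $\Upsilon^{\beta}$-information of $\mathcal{N}$. When we take the generalized divergence
$\boldsymbol{D}$ to be the quantum relative entropy, the hypothesis testing relative entropy, and
the sandwiched Rényi relative entropy, we have

\begin{align}
\Upsilon^{\beta}(\mathcal{N}) &:= \sup_{\psi_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}_\beta} D(\mathcal{N}_{A\to B}(\psi_{RA})\|\mathcal{F}_{A\to B}(\psi_{RA})), \tag{12.2.243}\\
\Upsilon_H^{\beta,\varepsilon}(\mathcal{N}) &:= \sup_{\psi_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}_\beta} D_H^{\varepsilon}(\mathcal{N}_{A\to B}(\psi_{RA})\|\mathcal{F}_{A\to B}(\psi_{RA})), \tag{12.2.244}\\
\widetilde{\Upsilon}_\alpha^{\beta}(\mathcal{N}) &:= \sup_{\psi_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}_\beta} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA})\|\mathcal{F}_{A\to B}(\psi_{RA})). \tag{12.2.245}
\end{align}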

Since the set 𝔉𝛽 is a subset of 𝔉, minimizing over 𝔉𝛽 can never lead to a smaller
value compared to minimizing over 𝔉, which means that

𝚼(N) ≤ 𝚼 𝛽 (N). (12.2.246)

Therefore, using the $\Upsilon^{\beta}$-information, the bound in Proposition 12.22 on every
$(|\mathcal{M}|, \varepsilon)$ classical communication protocol becomes
\begin{equation}
\log_2 |\mathcal{M}| \le \Upsilon_H^{\beta,\varepsilon}(\mathcal{N}). \tag{12.2.247}
\end{equation}

This bound is looser than the one in Proposition 12.22, but it has the advantage
that it can be computed using an SDP. This is due to the fact that the hypothesis
testing relative entropy can itself be computed via an SDP.
Although we get an efficiently computable upper bound in the one-shot setting
via the Υ 𝛽 -information, in the asymptotic setting this bound is not known to
be additive, making its evaluation computationally prohibitive as the number 𝑛
of channel uses increases. Instead, for the purpose of obtaining an efficiently
computable upper bound in the asymptotic setting, we define the following quantity
for every quantum channel $\mathcal{N}$:

\begin{equation}
C_\beta(\mathcal{N}) = \log_2 \beta(\mathcal{N}). \tag{12.2.248}
\end{equation}

Since 𝛽(N) can be computed using an SDP (in particular, via the optimization
problem in (12.2.228)), we have that 𝐶 𝛽 (N) can also be computed using an SDP.

A useful property of the quantity 𝐶 𝛽 (N) is that it is additive, i.e.,

𝐶 𝛽 (N1 ⊗ N2 ) = 𝐶 𝛽 (N1 ) + 𝐶 𝛽 (N2 ) (12.2.249)

for all channels N1 and N2 , as proved in Appendix 12.B by employing semi-definite

programming duality. We use this fact to prove that 𝐶 𝛽 (N) is a strong converse rate
for classical communication over a channel N in Theorem 12.28 below. However,
first we establish the following proposition:

Proposition 12.27
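For a quantum channel $\mathcal{N}$, the following inequalities hold for all $\alpha > 1$:

\begin{equation}
\Upsilon(\mathcal{N}) \leq C_\beta(\mathcal{N}), \qquad \widetilde{\Upsilon}_\alpha(\mathcal{N}) \leq C_\beta(\mathcal{N}). \tag{12.2.250}
\end{equation}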

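Proof: Let $\mathcal{F} = \frac{1}{\beta(\mathcal{N})}\mathcal{N}$. Then, $\beta(\mathcal{F}) = \frac{1}{\beta(\mathcal{N})}\beta(\mathcal{N}) = 1$, which means that $\mathcal{F} \in \mathfrak{F}_\beta$.
Then, since $\mathfrak{F}_\beta \subseteq \mathfrak{F}$, we can choose $\mathcal{F}$ as above when performing the infimum in
the definition of $\Upsilon(\mathcal{N})$. This leads to

\begin{align}
\Upsilon(\mathcal{N}) &= \sup_{\psi_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}} D(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \mathcal{F}_{A\to B}(\psi_{RA})) \tag{12.2.251} \\
&\leq \sup_{\psi_{RA}} D\!\left(\mathcal{N}_{A\to B}(\psi_{RA}) \,\middle\Vert\, \frac{1}{\beta(\mathcal{N})}\mathcal{N}_{A\to B}(\psi_{RA})\right) \tag{12.2.252} \\
&= \sup_{\psi_{RA}} D(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \mathcal{N}_{A\to B}(\psi_{RA})) + \log_2 \beta(\mathcal{N}) \tag{12.2.253} \\
&= C_\beta(\mathcal{N}), \tag{12.2.254}
\end{align}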

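as required, where the second equality follows because $D(\rho \Vert \frac{1}{x}\sigma) = D(\rho \Vert \sigma) + \log_2 x$
for every state $\rho$, positive semi-definite operator $\sigma$, and $x > 0$ (see (7.2.26)). The
last equality follows because $D(\rho \Vert \rho) = 0$.
Similarly, using the fact that $\widetilde{D}_\alpha(\rho \Vert \frac{1}{x}\sigma) = \widetilde{D}_\alpha(\rho \Vert \sigma) + \log_2 x$, and using the
same choice for the map $\mathcal{F}$ as above, we find that $\widetilde{\Upsilon}_\alpha(\mathcal{N}) \leq C_\beta(\mathcal{N})$. ■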

The additivity of 𝐶 𝛽 , along with Proposition 12.27, leads us to the following:


Theorem 12.28 SDP Upper Bound on Strong Converse Classical Capacity
For a quantum channel N, the quantity 𝐶 𝛽 (N) is a strong converse rate for
classical communication over N.

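Proof: We start by observing that, using Proposition 12.27, the inequality in
(12.2.195) can be written as

\begin{equation}
\frac{1}{n}\log_2 |\mathcal{M}| \leq \frac{1}{n} C_\beta(\mathcal{N}^{\otimes n}) + \frac{\alpha}{n(\alpha-1)} \log_2\!\left(\frac{1}{1-\varepsilon}\right) \tag{12.2.255}
\end{equation}

for all $\alpha > 1$. This inequality holds for all $(n, |\mathcal{M}|, \varepsilon)$ classical communication
protocols. Using the additivity of $C_\beta$, we find that

\begin{equation}
\frac{1}{n}\log_2 |\mathcal{M}| \leq C_\beta(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)} \log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{12.2.256}
\end{equation}

Now, let us fix $\varepsilon \in [0, 1)$ and $\delta > 0$. Let $\delta' > 0$ be such that $\delta > \delta'$. Since $C_\beta(\mathcal{N})$
does not depend on $\alpha$, let us choose $\alpha$ such that the right-hand side of the above
inequality is as small as possible, which occurs as $\alpha \to \infty$. With this choice of $\alpha$,
take $n$ large enough so that

\begin{equation}
\delta' \geq \frac{1}{n}\log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{12.2.257}
\end{equation}

Then, we obtain

\begin{align}
\frac{1}{n}\log_2 |\mathcal{M}| &\leq C_\beta(\mathcal{N}) + \frac{1}{n}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \tag{12.2.258} \\
&\leq C_\beta(\mathcal{N}) + \delta' \tag{12.2.259} \\
&< C_\beta(\mathcal{N}) + \delta. \tag{12.2.260}
\end{align}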
So we have that $C_\beta(\mathcal{N}) + \delta > \frac{1}{n}\log_2 |\mathcal{M}|$ for all $(n, |\mathcal{M}|, \varepsilon)$ classical communication
protocols with $n$ sufficiently large. Due to this strict inequality, it follows that
there cannot exist an $(n, 2^{n(C_\beta(\mathcal{N})+\delta)}, \varepsilon)$ classical communication protocol for all
sufficiently large $n$ such that (12.2.257) holds, for if it did there would exist
some message set $\mathcal{M}$ such that $\frac{1}{n}\log_2 |\mathcal{M}| = C_\beta(\mathcal{N}) + \delta$, which we have just seen
is not possible. Since $\varepsilon$ and $\delta$ are arbitrary, we have that for all $\varepsilon \in [0, 1)$,
$\delta > 0$, and sufficiently large $n$, there does not exist an $(n, 2^{n(C_\beta(\mathcal{N})+\delta)}, \varepsilon)$ classical
communication protocol. This means that $C_\beta(\mathcal{N})$ is a strong converse rate, which
completes the proof. ■

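By examining (12.2.258) in the above proof, we see that the following bound
holds for an arbitrary $(n, |\mathcal{M}|, \varepsilon)$ classical communication protocol:

\begin{equation}
\frac{1}{n}\log_2 |\mathcal{M}| \leq C_\beta(\mathcal{N}) + \frac{1}{n}\log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{12.2.261}
\end{equation}

If we fix the rate $R = \frac{1}{n}\log_2 |\mathcal{M}|$, then this bound can be rewritten as follows:

\begin{equation}
1 - \varepsilon \leq 2^{-n(R - C_\beta(\mathcal{N}))}, \tag{12.2.262}
\end{equation}

which indicates that communicating at a rate $R > C_\beta(\mathcal{N})$ implies that the success
probability $1 - \varepsilon$ of every sequence of such protocols decays exponentially fast to
zero.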
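
To get a feel for how quickly the bound in (12.2.262) forces the success probability down, the following minimal Python sketch evaluates its right-hand side. The function name and the numerical values chosen for the rate and for the SDP bound are hypothetical placeholders for illustration only; they are not computed from any particular channel.

```python
def success_probability_bound(rate, c_beta, n):
    """Right-hand side of (12.2.262): an upper bound on 1 - eps.

    rate   : communication rate R, in bits per channel use
    c_beta : value of the SDP bound C_beta(N) (assumed to be given)
    n      : number of channel uses
    The bound is capped at 1 because it is trivial when R <= C_beta.
    """
    return min(1.0, 2.0 ** (-n * (rate - c_beta)))


if __name__ == "__main__":
    c_beta = 1.0  # hypothetical value of C_beta(N)
    rate = 1.1    # hypothetical rate exceeding C_beta(N)
    for n in (10, 100, 1000):
        print(n, success_probability_bound(rate, c_beta, n))
```

Even a rate only slightly above the bound leads to a success probability that halves every 1/(R - C_beta) channel uses, which is the quantitative content of the strong converse statement.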

12.3 Examples
In this section, we present various examples of channels with known formulas for the
Holevo information and/or known results on additivity of the Holevo information.
Let us start by making some observations about the Holevo information 𝜒(N)
of a channel N. First, by expanding the definition of the Holevo information using
the expression for the mutual information in terms of the relative entropy, we arrive
at the following:

Proposition 12.29 Alternate Forms for Channel Holevo Information

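For a channel $\mathcal{N}$, the following equalities hold:

\begin{align}
\chi(\mathcal{N}) &= \sup_{\{(p(x),\psi_A^x)\}_{x\in\mathcal{X}}} \left[ H(\mathcal{N}(\rho_A)) - \sum_{x\in\mathcal{X}} p(x) H(\mathcal{N}(\psi_A^x)) \right] \tag{12.3.1} \\
&= \sup_{\{(p(x),\psi_A^x)\}_{x\in\mathcal{X}}} \left[ H(\mathcal{N}(\rho_A)) - \sum_{x\in\mathcal{X}} p(x) H(\mathcal{N}^c(\psi_A^x)) \right], \tag{12.3.2}
\end{align}

where $\{(p(x), \psi_A^x)\}_{x\in\mathcal{X}}$ is an ensemble of pure states, $\rho_A := \sum_{x\in\mathcal{X}} p(x)\psi_A^x$, and
we recall from (7.1.1) and (7.2.88) that $H(\rho) = -\operatorname{Tr}[\rho \log \rho]$ is the quantum
entropy of $\rho$.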


Proof: We start by recalling the definition of the Holevo information $\chi(\mathcal{N})$ of $\mathcal{N}$
from (7.11.106):

\begin{equation}
\chi(\mathcal{N}) := \sup_{\rho_{XA}} I(X;B)_\omega, \tag{12.3.3}
\end{equation}

where $\omega_{XB} = \mathcal{N}_{A\to B}(\rho_{XA})$, and the supremum is over all classical–quantum states of
the form $\rho_{XA} = \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes \rho_A^x$, with $\mathcal{X}$ a finite alphabet with associated $|\mathcal{X}|$-dimensional
system $X$, $\{\rho_A^x\}_{x\in\mathcal{X}}$ a set of states, and $p : \mathcal{X} \to [0, 1]$ a probability
distribution on $\mathcal{X}$. Recall from Proposition 7.87 that to compute the Holevo
information it suffices to take ensembles consisting only of pure states.
Defining the classical–quantum state $\rho_{XA}$ as

\begin{equation}
\rho_{XA} = \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes \psi_A^x, \tag{12.3.4}
\end{equation}

where $\{\psi_A^x\}_{x\in\mathcal{X}}$ is a set of pure states, and noting that $\rho'_{XB} := \mathcal{N}_{A\to B}(\rho_{XA})$ is another
classical–quantum state, it follows from Proposition 7.14 that

\begin{equation}
I(X;B)_{\rho'} = H\!\left(\sum_{x\in\mathcal{X}} p(x)\mathcal{N}(\psi_A^x)\right) - \sum_{x\in\mathcal{X}} p(x) H(\mathcal{N}(\psi_A^x)) \tag{12.3.5}
\end{equation}

for all such states $\rho_{XA}$. Since optimizing over classical–quantum states is equivalent to
optimizing over ensembles $\{(p(x), \psi_A^x)\}_{x\in\mathcal{X}}$, we obtain (12.3.1).
To prove (12.3.2), let 𝑉𝐴→𝐵𝐸 be an isometric extension of N. Then, for every
pure state 𝜓 on the channel input system 𝐴, we can write
N(𝜓) = Tr𝐸 [𝑉𝜓𝑉 † ]. (12.3.6)
Since 𝑉 |𝜓⟩ is a pure state, it follows that Tr𝐸 [𝑉𝜓𝑉 † ] and Tr 𝐵 [𝑉𝜓𝑉 † ] have the
same (non-zero) eigenvalues. The latter state is equal to N𝑐 (𝜓 𝐴 ) by definition,
which means that
𝐻 (N(𝜓 𝐴 )) = 𝐻 (N𝑐 (𝜓 𝐴 )). (12.3.7)
Therefore, (12.3.2) follows. ■

12.3.1 Covariant Channels

For irreducibly-covariant channels (see Definition 4.18), the Holevo information

takes a particularly simple form.


Theorem 12.30 Holevo Information of Irreducibly-Covariant Channels

Suppose $\mathcal{N}_{A\to B}$ is a covariant channel with respect to a finite group $G$, with
an irreducible representation $\{U_A^g\}_{g\in G}$ of $G$ acting on the input space of the
channel and another representation $\{V_B^g\}_{g\in G}$ of $G$ acting on the output space of
the channel. Then,

\begin{equation}
\chi(\mathcal{N}) = H(\mathcal{N}(\pi_A)) - H_{\min}(\mathcal{N}), \tag{12.3.8}
\end{equation}

where $\pi_A = \frac{\mathbb{1}_A}{d_A}$ and

\begin{equation}
H_{\min}(\mathcal{N}) := \min_{\rho_A} H(\mathcal{N}(\rho_A)) \tag{12.3.9}
\end{equation}

is called the minimum output entropy of $\mathcal{N}$. The minimization is with respect
to all input states $\rho_A$ in the domain of $\mathcal{N}$.

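Proof: We have

\begin{align}
\chi(\mathcal{N}) &= \sup_{\{(p(x),\rho_A^x)\}_{x\in\mathcal{X}}} \left[ H\!\left(\sum_{x\in\mathcal{X}} p(x)\mathcal{N}(\rho_A^x)\right) - \sum_{x\in\mathcal{X}} p(x) H(\mathcal{N}(\rho_A^x)) \right] \tag{12.3.10} \\
&\leq \sup_{\{(p(x),\rho_A^x)\}_{x\in\mathcal{X}}} \left\{ H\!\left(\sum_{x\in\mathcal{X}} p(x)\mathcal{N}(\rho_A^x)\right) \right\} + \sup_{\{(p(x),\rho_A^x)\}_{x\in\mathcal{X}}} \left\{ -\sum_{x\in\mathcal{X}} p(x) H(\mathcal{N}(\rho_A^x)) \right\} \tag{12.3.11} \\
&\leq \sup_{\rho_A} H(\mathcal{N}(\rho_A)) + \sup_{\rho_A} \{-H(\mathcal{N}(\rho_A))\} \tag{12.3.12} \\
&= \sup_{\rho_A} H(\mathcal{N}(\rho_A)) - \inf_{\rho_A} H(\mathcal{N}(\rho_A)). \tag{12.3.13}
\end{align}

Now, by the unitary invariance of the quantum entropy and the covariance of $\mathcal{N}$, for every state $\rho_A$ we obtain

\begin{equation}
H(\mathcal{N}(\rho_A)) = H(V_B^g \mathcal{N}(\rho_A)(V_B^g)^\dagger) = H(\mathcal{N}(U_A^g \rho_A (U_A^g)^\dagger)) \tag{12.3.14}
\end{equation}

for all $g \in G$. This implies that

\begin{align}
H(\mathcal{N}(\rho_A)) &= \frac{1}{|G|}\sum_{g\in G} H(\mathcal{N}(U_A^g \rho_A (U_A^g)^\dagger)) \tag{12.3.15} \\
&\leq H\!\left(\frac{1}{|G|}\sum_{g\in G} \mathcal{N}(U_A^g \rho_A (U_A^g)^\dagger)\right) \tag{12.3.16} \\
&= H\!\left(\mathcal{N}\!\left(\frac{1}{|G|}\sum_{g\in G} U_A^g \rho_A (U_A^g)^\dagger\right)\right) \tag{12.3.17} \\
&= H(\mathcal{N}(\pi_A)), \tag{12.3.18}
\end{align}

where the inequality follows from concavity of the quantum entropy and the last
equality follows because $\{U_A^g\}_{g\in G}$ is an irreducible representation, which implies
that

\begin{equation}
\frac{1}{|G|}\sum_{g\in G} U_A^g \rho_A (U_A^g)^\dagger = \frac{\mathbb{1}_A}{d_A} \tag{12.3.19}
\end{equation}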

for every state 𝜌 𝐴 . Then, since we are optimizing a continuous function over a
compact and convex set, the infimum in (12.3.13) can be achieved, meaning that
we can replace the infimum in (12.3.13) with a minimum, which means that

𝜒(N) ≤ 𝐻 (N(𝜋 𝐴 )) − 𝐻min (N). (12.3.20)

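To show the reverse inequality, let $\rho_A^*$ be a state for which $H(\mathcal{N}(\rho_A^*)) = H_{\min}(\mathcal{N})$.
Then, we consider the ensemble $\left\{\left(\frac{1}{|G|}, U_A^g \rho_A^* (U_A^g)^\dagger\right)\right\}_{g\in G}$
and obtain

\begin{align}
\chi(\mathcal{N}) &\geq H\!\left(\sum_{g\in G}\frac{1}{|G|}\mathcal{N}(U_A^g \rho_A^* (U_A^g)^\dagger)\right) - \sum_{g\in G}\frac{1}{|G|} H(\mathcal{N}(U_A^g \rho_A^* (U_A^g)^\dagger)) \tag{12.3.21} \\
&= H(\mathcal{N}(\pi_A)) - \sum_{g\in G}\frac{1}{|G|} H(V_B^g \mathcal{N}(\rho_A^*)(V_B^g)^\dagger) \tag{12.3.22} \\
&= H(\mathcal{N}(\pi_A)) - \sum_{g\in G}\frac{1}{|G|} H(\mathcal{N}(\rho_A^*)) \tag{12.3.23} \\
&= H(\mathcal{N}(\pi_A)) - H(\mathcal{N}(\rho_A^*)) \tag{12.3.24} \\
&= H(\mathcal{N}(\pi_A)) - H_{\min}(\mathcal{N}). \tag{12.3.25}
\end{align}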

Therefore,
𝜒(N) ≥ 𝐻 (N(𝜋 𝐴 )) − 𝐻min (N), (12.3.26)
and the proof is complete. ■

We note that to compute the minimum output entropy of any channel N, it

suffices to optimize over pure states. Indeed, by restricting the optimization to pure

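states $\psi$ in the definition of $H_{\min}(\mathcal{N})$, we find that

\begin{equation}
H_{\min}(\mathcal{N}) = \min_{\rho} H(\mathcal{N}(\rho)) \leq \min_{\psi} H(\mathcal{N}(\psi)). \tag{12.3.27}
\end{equation}

On the other hand, since every state $\rho$ can be written as a convex combination of
pure states, so that $\rho = \sum_{x\in\mathcal{X}} p(x)\psi^x$, we see that

\begin{align}
H(\mathcal{N}(\rho)) &= H\!\left(\mathcal{N}\!\left(\sum_{x\in\mathcal{X}} p(x)\psi^x\right)\right) \tag{12.3.28} \\
&= H\!\left(\sum_{x\in\mathcal{X}} p(x)\mathcal{N}(\psi^x)\right) \tag{12.3.29} \\
&\geq \sum_{x\in\mathcal{X}} p(x) H(\mathcal{N}(\psi^x)) \tag{12.3.30} \\
&\geq \min_{x\in\mathcal{X}} H(\mathcal{N}(\psi^x)) \tag{12.3.31} \\
&\geq \min_{\psi} H(\mathcal{N}(\psi)), \tag{12.3.32}
\end{align}

where the first inequality follows from concavity of the quantum entropy. So we
have

\begin{equation}
H_{\min}(\mathcal{N}) = \min_{\psi} H(\mathcal{N}(\psi)). \tag{12.3.33}
\end{equation}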

We now look at two irreducibly covariant channels, the depolarizing channel

and the erasure channel. A plot of the Holevo information for these channels is
given in Figure 12.6.

12.3.1.1 Depolarizing Channel

In Section 4.5, specifically in (4.5.31), we defined the qubit depolarizing channel as

\begin{equation}
\mathcal{D}_p(\rho) := (1 - p)\rho + \frac{p}{3}(X\rho X + Y\rho Y + Z\rho Z) \tag{12.3.34}
\end{equation}

for all $p \in [0, 1]$. From the arguments in Section 11.3.1.1, it follows that this
channel is covariant with respect to the Pauli operators on both the input and
output spaces. Furthermore, the Pauli operators $\{\mathbb{1}, X, Y, Z\}$ satisfy the property in
(12.3.19). We thus conclude that

\begin{equation}
\chi(\mathcal{D}_p) = H(\mathcal{D}_p(\pi)) - H_{\min}(\mathcal{D}_p) = 1 - H_{\min}(\mathcal{D}_p), \tag{12.3.35}
\end{equation}

Figure 12.6: The Holevo information of the depolarizing channel $\mathcal{D}_p$ (expressed
in (12.3.37)) and the erasure channel $\mathcal{E}_p$ (expressed in (12.3.57)), both of which
are defined for the parameter $p \in [0, 1]$. [Plot: $\chi(\mathcal{N})$ on the vertical axis versus $p$ on the horizontal axis, with one curve for each of $\mathcal{D}_p$ and $\mathcal{E}_p$.]

where we used the fact that the depolarizing channel is unital, i.e., $\mathcal{D}_p(\mathbb{1}) = \mathbb{1}$, and
that $H(\pi) = \log_2 2 = 1$. To compute the minimum output entropy, we use the fact
that it suffices to minimize over pure states. It is straightforward to show that the
two eigenvalues of $\mathcal{D}_p(\psi)$ are $(2p/3, 1 - 2p/3)$ for every pure state $\psi$. Therefore,

\begin{equation}
H_{\min}(\mathcal{D}_p) = h_2\!\left(\frac{2p}{3}\right), \tag{12.3.36}
\end{equation}

so that

\begin{equation}
\chi(\mathcal{D}_p) = 1 - h_2\!\left(\frac{2p}{3}\right) \tag{12.3.37}
\end{equation}

for $p \in [0, 1]$. Since the Holevo information is known to be additive for the
depolarizing channel (please consult the Bibliographic Notes in Section 12.5), it
follows that $\chi(\mathcal{D}_p)$ is equal to the classical capacity of the depolarizing channel.
See Figure 12.6 for a plot of the Holevo information $\chi(\mathcal{D}_p)$ of the depolarizing
channel.
For the qudit depolarizing channel $\mathcal{D}_p^{(d)}$, recall from the discussion around
(11.3.26) that it is irreducibly covariant. Therefore, by Theorem 12.30, we obtain

\begin{equation}
\chi(\mathcal{D}_p^{(d)}) = \log_2 d - H_{\min}(\mathcal{D}_p^{(d)}). \tag{12.3.38}
\end{equation}

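The calculation of the minimum output entropy for the qudit depolarizing channel is
analogous to the calculation of the minimum output entropy of the qubit depolarizing
channel. In particular, for every pure state $\psi$, the eigenvalues of $\mathcal{D}_p^{(d)}(\psi)$ are $1 - \frac{d}{d+1}p$
(with multiplicity one) and $\frac{d}{d^2-1}p$ (with multiplicity $d - 1$). Indeed, by using the
parameterization in (4.5.37) with $q = \frac{p d^2}{d^2-1}$, consider that

\begin{align}
\mathcal{D}_p^{(d)}(\psi) &= (1 - q)\psi + q\frac{I}{d} \tag{12.3.39} \\
&= \left(1 - q + \frac{q}{d}\right)\psi + \frac{q}{d}(I - \psi) \tag{12.3.40} \\
&= \left(1 - \frac{pd}{d+1}\right)\psi + \frac{pd}{d^2-1}(I - \psi). \tag{12.3.41}
\end{align}

Therefore,

\begin{equation}
H_{\min}(\mathcal{D}_p^{(d)}) = -\left(1 - \frac{dp}{d+1}\right)\log_2\!\left(1 - \frac{dp}{d+1}\right) - \frac{dp}{d+1}\log_2\!\left(\frac{dp}{d^2-1}\right), \tag{12.3.42}
\end{equation}

so that

\begin{equation}
\chi(\mathcal{D}_p^{(d)}) = \log_2 d + \left(1 - \frac{dp}{d+1}\right)\log_2\!\left(1 - \frac{dp}{d+1}\right) + \frac{dp}{d+1}\log_2\!\left(\frac{dp}{d^2-1}\right) \tag{12.3.43}
\end{equation}

for $d \geq 2$ and $p \in [0, 1]$.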
The Holevo information is also known to be additive for the qudit depolarizing
channel, which means that the expression in (12.3.43) is equal to its classical
capacity.
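
As a quick numerical sanity check of (12.3.43), the following Python sketch evaluates the Holevo information of the qudit depolarizing channel directly from the two eigenvalues of the output state and compares the case $d = 2$ with the qubit formula in (12.3.37). The function names are illustrative choices made here, and the sketch assumes only the eigenvalues and multiplicities stated above.

```python
import math


def xlog2x(x):
    """Return x * log2(x), with the convention 0 * log2(0) = 0."""
    return 0.0 if x <= 0.0 else x * math.log2(x)


def binary_entropy(x):
    """Binary entropy h_2(x) in bits."""
    return -(xlog2x(x) + xlog2x(1.0 - x))


def depolarizing_holevo(p, d=2):
    """Holevo information of the qudit depolarizing channel, as in (12.3.43).

    Uses the output eigenvalues for a pure input state:
    1 - d*p/(d+1) with multiplicity 1, and d*p/(d^2-1) with multiplicity d-1.
    """
    lam = 1.0 - d * p / (d + 1)   # eigenvalue along the input state
    mu = d * p / (d * d - 1)      # eigenvalue on the orthogonal complement
    h_min = -(xlog2x(lam) + (d - 1) * xlog2x(mu))
    return math.log2(d) - h_min


if __name__ == "__main__":
    p = 0.3
    print(depolarizing_holevo(p, d=2), 1.0 - binary_entropy(2 * p / 3))
    # At p = (d^2 - 1)/d^2 the output is maximally mixed for every input.
    print(depolarizing_holevo(8 / 9, d=3))
```

The second printed line corresponds to $q = 1$ in the parameterization of (4.5.37), where both eigenvalues equal $1/d$ and the Holevo information vanishes up to floating-point error.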

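Theorem 12.31 Additivity of the Holevo Information for the Depolarizing Channel

For every channel $\mathcal{M}$,

\begin{equation}
\chi(\mathcal{D}_p^{(d)} \otimes \mathcal{M}) = \chi(\mathcal{D}_p^{(d)}) + \chi(\mathcal{M}) \tag{12.3.44}
\end{equation}

for all $d \geq 2$ and all $p \in [0, 1]$. Consequently,

\begin{equation}
C(\mathcal{D}_p^{(d)}) = \chi(\mathcal{D}_p^{(d)}). \tag{12.3.45}
\end{equation}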

Proof: Please consult the Bibliographic Notes in Section 12.5. ■


It also holds that the Holevo information is the strong converse classical capacity
of the qudit depolarizing channel, i.e.,

\begin{equation}
\widetilde{C}(\mathcal{D}_p^{(d)}) = \chi(\mathcal{D}_p^{(d)}) \tag{12.3.46}
\end{equation}

for all $d \geq 2$ and all $p \in [0, 1]$. Please consult the Bibliographic Notes in
Section 12.5 for a reference to the proof.

12.3.1.2 Erasure Channel

Let us now consider the erasure channel. Recall from (4.5.18) that the erasure
channel E 𝑝 , with 𝑝 ∈ [0, 1], is defined as

E 𝑝 (𝜌) = (1 − 𝑝) 𝜌 + 𝑝Tr[𝜌]|𝑒⟩⟨𝑒|, (12.3.47)

where |𝑒⟩ is called the erasure state and is not in the Hilbert space of the input
system 𝐴. In other words, the state |𝑒⟩⟨𝑒| is supported on the space orthogonal to
the input space. As argued in Section 11.3.1.2, we can consider the output space of
the channel to be a qutrit system with the orthonormal basis {|0⟩, |1⟩, |2⟩}, and we
can let the state |2⟩ be the erasure state. Then,

E 𝑝 (𝜌) = (1 − 𝑝) 𝜌 + 𝑝|2⟩⟨2| (12.3.48)

for every state 𝜌.

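We also argued in Section 11.3.1.2 that the erasure channel is irreducibly
covariant. Therefore, by Theorem 12.30, we have that

\begin{equation}
\chi(\mathcal{E}_p) = H(\mathcal{E}_p(\pi)) - H_{\min}(\mathcal{E}_p). \tag{12.3.49}
\end{equation}

Now, on the input qubit space, we have $\pi = \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1|$. Therefore,

\begin{equation}
\mathcal{E}_p(\pi) = \frac{1-p}{2}|0\rangle\langle 0| + \frac{1-p}{2}|1\rangle\langle 1| + p|2\rangle\langle 2|, \tag{12.3.50}
\end{equation}

which means that

\begin{align}
H(\mathcal{E}_p(\pi)) &= -(1 - p)\log_2\!\left(\frac{1-p}{2}\right) - p\log_2 p \tag{12.3.51} \\
&= 1 - p + h_2(p). \tag{12.3.52}
\end{align}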


In addition, for every pure state 𝜓, we have

𝐻 (E 𝑝 (𝜓)) = 𝐻 ((1 − 𝑝)𝜓 + 𝑝|2⟩⟨2|) (12.3.53)

= −(1 − 𝑝) log2 (1 − 𝑝) − 𝑝 log2 𝑝, (12.3.54)
= ℎ2 ( 𝑝), (12.3.55)

where the second equality follows because the state |2⟩⟨2| is orthogonal to 𝜓.
Therefore,
𝐻min (E 𝑝 ) = ℎ2 ( 𝑝), (12.3.56)
which means that the Holevo information of the erasure channel is

𝜒(E 𝑝 ) = 1 − 𝑝. (12.3.57)

This is consistent with what one might expect intuitively because communication
over the erasure channel is only possible with probability 1 − 𝑝, when no erasure
occurs, and conditioned on this outcome, the erasure channel is simply the identity
channel.
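
The calculation above can be checked numerically. The following Python sketch computes the two entropies appearing in (12.3.49) from the spectra of the corresponding output states and confirms that their difference is $1 - p$ for a qubit input. The parameter `d` is included as an assumption made here so that the same function also covers the qudit generalization, for which the expected value is $(1 - p)\log_2 d$ as stated in (12.3.59); the function names are illustrative only.

```python
import math


def shannon_entropy(probs):
    """Entropy in bits of a list of eigenvalues (zeros are skipped)."""
    return -sum(x * math.log2(x) for x in probs if x > 0.0)


def erasure_holevo(p, d=2):
    """Holevo information of the erasure channel via Theorem 12.30.

    Spectrum of E_p(pi): (1-p)/d with multiplicity d, together with p.
    Spectrum of E_p(psi) for a pure input psi: (1-p, p).
    """
    h_pi = shannon_entropy([(1.0 - p) / d] * d + [p])
    h_min = shannon_entropy([1.0 - p, p])
    return h_pi - h_min


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 1.0):
        print(p, erasure_holevo(p), 1.0 - p)
```

The binary entropy term $h_2(p)$ appears in both entropies and cancels, which is why only the factor $1 - p$ survives.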
In general, for the qudit erasure channel $\mathcal{E}_p^{(d)}$, whose action can be defined
on the $d$-dimensional space with orthonormal basis $\{|1\rangle, \ldots, |d\rangle\}$ such that the
state $|d + 1\rangle$ is the erasure state, we have that it is irreducibly covariant (see
Section 11.3.1.2). Using this fact, which implies that

\begin{equation}
\chi(\mathcal{E}_p^{(d)}) = H(\mathcal{E}_p^{(d)}(\pi)) - H_{\min}(\mathcal{E}_p^{(d)}), \tag{12.3.58}
\end{equation}

along with arguments analogous to those presented above, we obtain

\begin{equation}
\chi(\mathcal{E}_p^{(d)}) = (1 - p)\log_2 d. \tag{12.3.59}
\end{equation}

Proposition 12.32 $\Upsilon$-Information of the Erasure Channel

The $\Upsilon$-information of the qudit erasure channel $\mathcal{E}_p^{(d)}$ is given by

\begin{equation}
\Upsilon(\mathcal{E}_p^{(d)}) = \chi(\mathcal{E}_p^{(d)}) = (1 - p)\log_2 d. \tag{12.3.60}
\end{equation}

Proof: By combining (12.3.59) and Proposition 12.21, we conclude that $\Upsilon(\mathcal{E}_p^{(d)}) \geq \chi(\mathcal{E}_p^{(d)}) = (1 - p)\log_2 d$. So it remains to establish the opposite inequality.

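Since the erasure channel is irreducibly covariant, Theorem 12.24 implies
that the optimization over states $\psi_{RA}$ in the definition of the $\Upsilon$-information is
unnecessary, and we have that

\begin{equation}
\Upsilon(\mathcal{E}_p^{(d)}) = \inf_{\mathcal{F}\in\mathfrak{F}} D((\mathcal{E}_p^{(d)})_{A\to B}(\Phi_{RA}) \Vert \mathcal{F}_{A\to B}(\Phi_{RA})), \tag{12.3.61}
\end{equation}

where $\Phi_{RA}$ is the maximally entangled state with Schmidt rank $d$. The Choi state
$\rho_{RB}^{\mathcal{E}_p^{(d)}} = (\mathcal{E}_p^{(d)})_{A\to B}(\Phi_{RA})$ of the erasure channel is

\begin{equation}
\rho_{RB}^{\mathcal{E}_p^{(d)}} = (1 - p)\Phi_{RB} + p\frac{\mathbb{1}_R}{d} \otimes |e\rangle\langle e|. \tag{12.3.62}
\end{equation}

Now, let us make a particular choice of the map $\mathcal{F}$ in the minimization over the
completely positive maps in $\mathfrak{F}$. Suppose that $\mathcal{F}_{A\to B}$ is such that

\begin{equation}
\mathcal{F}_{A\to B}(\Phi_{RA}) = \frac{1-p}{d}\Phi_{RB} + p\frac{\mathbb{1}_R}{d} \otimes |e\rangle\langle e| =: \sigma_{RB}^{\mathcal{F}}, \tag{12.3.63}
\end{equation}

which implies via (4.2.14) that its action on a general input state $\rho_A$ is as follows:

\begin{equation}
\mathcal{F}_{A\to B}(\rho_A) = \frac{1-p}{d}\rho_A + p|e\rangle\langle e| \leq \frac{1-p}{d}\mathbb{1}_A + p|e\rangle\langle e|. \tag{12.3.64}
\end{equation}

Note that the map $\mathcal{F}_{A\to B}$ defined in this way is indeed in the set $\mathfrak{F}$, as demonstrated
by the inequality in (12.3.64) and the fact that the operator on the right-hand side
of (12.3.64) is a quantum state. Then,

\begin{equation}
\Upsilon(\mathcal{E}_p^{(d)}) \leq D(\rho_{RB}^{\mathcal{E}_p^{(d)}} \Vert \sigma_{RB}^{\mathcal{F}}). \tag{12.3.65}
\end{equation}

It is straightforward to show that

\begin{equation}
\operatorname{Tr}\!\left[\rho_{RB}^{\mathcal{E}_p^{(d)}} \log_2 \rho_{RB}^{\mathcal{E}_p^{(d)}}\right] = (1 - p)\log_2(1 - p) + p\log_2\!\left(\frac{p}{d}\right), \tag{12.3.66}
\end{equation}

and

\begin{equation}
\operatorname{Tr}\!\left[\rho_{RB}^{\mathcal{E}_p^{(d)}} \log_2 \sigma_{RB}^{\mathcal{F}}\right] = (1 - p)\log_2\!\left(\frac{1-p}{d}\right) + p\log_2\!\left(\frac{p}{d}\right), \tag{12.3.67}
\end{equation}

which means that

\begin{equation}
\Upsilon(\mathcal{E}_p^{(d)}) \leq (1 - p)\log_2 d = \chi(\mathcal{E}_p^{(d)}). \tag{12.3.68}
\end{equation}

This concludes the proof. ■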
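
The relative entropy in (12.3.65) can be evaluated numerically from spectra alone, because the two operators in (12.3.62) and (12.3.63) are diagonal in a common basis (the direction of the maximally entangled state, plus $d$ orthogonal directions in the erased subspace). The following Python sketch, whose function name is an illustrative choice made here, does this and compares the result with $(1 - p)\log_2 d$.

```python
import math


def erasure_upsilon_upper_bound(p, d):
    """Evaluate D(rho || sigma) from (12.3.65) using commuting spectra.

    rho   has eigenvalue 1-p     on Phi and p/d on each of d erased directions.
    sigma has eigenvalue (1-p)/d on Phi and p/d on each of d erased directions.
    Terms with a zero eigenvalue of rho contribute nothing and are skipped.
    """
    rho = [1.0 - p] + [p / d] * d
    sigma = [(1.0 - p) / d] + [p / d] * d
    return sum(r * (math.log2(r) - math.log2(s))
               for r, s in zip(rho, sigma) if r > 0.0)


if __name__ == "__main__":
    for d in (2, 3, 4):
        for p in (0.1, 0.5, 0.9):
            print(d, p, erasure_upsilon_upper_bound(p, d),
                  (1.0 - p) * math.log2(d))
```

Only the eigenvalue along the maximally entangled state differs between the two operators, so the erased directions contribute zero and the relative entropy reduces to $(1 - p)\log_2 d$.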

The classical capacity of the quantum erasure channel and its strong converse
now follow as a direct corollary of (12.3.59), (12.2.15), Proposition 12.32, the
irreducible covariance of the erasure channel, and Theorem 12.26.


Theorem 12.33 Classical Capacity of the Erasure Channel

For $d \geq 2$ and $p \in [0, 1]$, the following equality holds for the classical capacity
and strong converse classical capacity of the quantum erasure channel $\mathcal{E}_p^{(d)}$:

\begin{equation}
C(\mathcal{E}_p^{(d)}) = \widetilde{C}(\mathcal{E}_p^{(d)}) = (1 - p)\log_2 d. \tag{12.3.69}
\end{equation}

12.3.2 Amplitude Damping Channel

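Recall from (4.5.1) that the amplitude damping channel is defined as

\begin{equation}
\mathcal{A}_\gamma(\rho) = A_1 \rho A_1^\dagger + A_2 \rho A_2^\dagger, \tag{12.3.70}
\end{equation}

where

\begin{equation}
A_1 = \sqrt{\gamma}\,|0\rangle\langle 1|, \qquad A_2 = |0\rangle\langle 0| + \sqrt{1-\gamma}\,|1\rangle\langle 1|. \tag{12.3.71}
\end{equation}

It can be shown that

\begin{equation}
\chi(\mathcal{A}_\gamma) = \frac{1}{2}\left[ f(r^*) - \log_2(1 - q^2) - q f'(q) \right], \tag{12.3.72}
\end{equation}

where

\begin{align}
f(x) &:= (1 + x)\log_2(1 + x) + (1 - x)\log_2(1 - x), \tag{12.3.73} \\
f'(x) &= \log_2\!\left(\frac{1+x}{1-x}\right), \tag{12.3.74} \\
r^* &:= \sqrt{1 - \gamma - \frac{(q-\gamma)^2}{1-\gamma} + q^2}, \tag{12.3.75}
\end{align}

and $q$ is determined via

\begin{equation}
(\gamma q - \gamma^2 - \gamma(1 - \gamma)) f'(r^*) = -r^*(1 - \gamma) f'(q). \tag{12.3.76}
\end{equation}

(Please consult the Bibliographic Notes in Section 12.5.) It is worth noting that
neither the additivity of the Holevo information for the amplitude damping channel
nor its classical capacity is known.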


Figure 12.7: The Holevo information $\chi(\mathcal{A}_\gamma)$ in (12.3.72) of the amplitude
damping channel, which represents a lower bound on its classical capacity. Also
shown is the upper bound $C_\beta(\mathcal{A}_\gamma)$ on the classical capacity, which is defined in
(12.2.248) via the SDP in (12.2.228), and for the amplitude damping channel it
is given by the expression in (12.3.77). [Plot: both quantities on the vertical axis versus $\gamma \in [0, 1]$ on the horizontal axis.]

As we have seen, the quantity $C_\beta(\mathcal{A}_\gamma)$ is an upper bound on the classical
capacity of the amplitude damping channel, and it can be shown (please consult the
Bibliographic Notes in Section 12.5) that

\begin{equation}
C_\beta(\mathcal{A}_\gamma) = \log_2(1 + \sqrt{1 - \gamma}). \tag{12.3.77}
\end{equation}

See Figure 12.7 for a plot of both $\chi(\mathcal{A}_\gamma)$ and $C_\beta(\mathcal{A}_\gamma)$.
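
For readers who want concrete numbers for the upper bound, the closed-form expression in (12.3.77) is simple to evaluate. The Python sketch below tabulates it for a few values of $\gamma$; the function name is an illustrative choice made here, and the sketch does not attempt to solve (12.3.76) for the Holevo information.

```python
import math


def c_beta_amplitude_damping(gamma):
    """Closed-form SDP bound C_beta for the amplitude damping channel, (12.3.77)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return math.log2(1.0 + math.sqrt(1.0 - gamma))


if __name__ == "__main__":
    for gamma in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(gamma, c_beta_amplitude_damping(gamma))
```

At $\gamma = 0$ the channel is the identity and the bound equals one bit, while at $\gamma = 1$ every input is mapped to $|0\rangle\langle 0|$ and the bound is zero, matching the endpoints of the curve described for Figure 12.7.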

12.3.3 Hadamard Channels

In this section, we prove that the Holevo information is additive for all Hadamard
channels.


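Theorem 12.34 Additivity of Holevo Information for Hadamard Channels

For a Hadamard channel $\mathcal{N}$ and an arbitrary channel $\mathcal{M}$, the following additivity
relation holds:

\begin{equation}
\chi(\mathcal{N} \otimes \mathcal{M}) = \chi(\mathcal{N}) + \chi(\mathcal{M}). \tag{12.3.78}
\end{equation}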

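Proof: Using the expression in (12.3.2), we have that

\begin{equation}
\chi(\mathcal{N} \otimes \mathcal{M}) = \sup_{\{(p(x),\psi^x)\}_{x\in\mathcal{X}}} \left[ H\!\left(\sum_{x\in\mathcal{X}} p(x)(\mathcal{N} \otimes \mathcal{M})(\psi^x_{A_1A_2})\right) - \sum_{x\in\mathcal{X}} p(x) H((\mathcal{N}^c \otimes \mathcal{M}^c)(\psi^x_{A_1A_2})) \right]. \tag{12.3.79}
\end{equation}

Now, for every bipartite state $\rho_{AB}$, it follows from strong subadditivity that
$H(\rho_{AB}) \leq H(\rho_A) + H(\rho_B)$, a fact known as the subadditivity of the quantum
entropy (consider (10.6.88) with system $C$ trivial). Using this for the first term in
(12.3.79), we find that

\begin{equation}
H\!\left(\sum_{x\in\mathcal{X}} p(x)(\mathcal{N} \otimes \mathcal{M})(\psi^x_{A_1A_2})\right) \leq H\!\left(\sum_{x\in\mathcal{X}} p(x)\mathcal{N}(\psi^x_{A_1})\right) + H\!\left(\sum_{x\in\mathcal{X}} p(x)\mathcal{M}(\psi^x_{A_2})\right). \tag{12.3.80}
\end{equation}

We now make use of the following identity, which is straightforward to verify: for
every finite alphabet $\mathcal{X}$ and ensemble $\{(p(x), \rho^x_{AB})\}_{x\in\mathcal{X}}$,

\begin{equation}
\sum_{x\in\mathcal{X}} p(x) D(\rho^x_{AB} \Vert \rho^x_A \otimes \rho^x_B) = \sum_{x\in\mathcal{X}} p(x) H(\rho^x_A) + \sum_{x\in\mathcal{X}} p(x) H(\rho^x_B) - \sum_{x\in\mathcal{X}} p(x) H(\rho^x_{AB}). \tag{12.3.81}
\end{equation}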


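We use this for the second term in (12.3.79) to conclude that

\begin{align}
\sum_{x\in\mathcal{X}} p(x) H((\mathcal{N}^c \otimes \mathcal{M}^c)(\psi^x_{A_1A_2})) &= \sum_{x\in\mathcal{X}} p(x) H(\mathcal{N}^c(\psi^x_{A_1})) + \sum_{x\in\mathcal{X}} p(x) H(\mathcal{M}^c(\psi^x_{A_2})) \notag \\
&\quad - \sum_{x\in\mathcal{X}} p(x) D((\mathcal{N}^c \otimes \mathcal{M}^c)(\psi^x_{A_1A_2}) \Vert \mathcal{N}^c(\psi^x_{A_1}) \otimes \mathcal{M}^c(\psi^x_{A_2})). \tag{12.3.82}
\end{align}

Now, let us focus on the relative entropy term in the expression above. Since $\mathcal{N}$ is a
Hadamard channel, by Proposition 4.17 we know that the complementary channel
$\mathcal{N}^c$ is entanglement-breaking. Then, from Theorem 4.15, we know that every
entanglement-breaking channel can be written as the composition of a measurement
channel followed by a preparation channel. This means that we can write $\mathcal{N}^c$ as
$\mathcal{N}^c = \mathcal{P} \circ \mathcal{M}_{\text{qc}}$, where $\mathcal{M}_{\text{qc}}$ is the measurement (or quantum–classical) channel, and
$\mathcal{P}$ is the preparation channel. Using the data-processing inequality for the quantum
relative entropy, for all $x \in \mathcal{X}$ we obtain

\begin{align}
&D((\mathcal{N}^c \otimes \mathcal{M}^c)(\psi^x_{A_1A_2}) \Vert \mathcal{N}^c(\psi^x_{A_1}) \otimes \mathcal{M}^c(\psi^x_{A_2})) \notag \\
&\quad = D((\mathcal{P} \circ \mathcal{M}_{\text{qc}} \otimes \mathcal{M}^c)(\psi^x_{A_1A_2}) \Vert (\mathcal{P} \circ \mathcal{M}_{\text{qc}})(\psi^x_{A_1}) \otimes \mathcal{M}^c(\psi^x_{A_2})) \tag{12.3.83} \\
&\quad \leq D((\mathcal{M}_{\text{qc}} \otimes \mathcal{M}^c)(\psi^x_{A_1A_2}) \Vert \mathcal{M}_{\text{qc}}(\psi^x_{A_1}) \otimes \mathcal{M}^c(\psi^x_{A_2})). \tag{12.3.84}
\end{align}

Then, using the identity (12.3.81) once again, we obtain

\begin{align}
&\sum_{x\in\mathcal{X}} p(x) D((\mathcal{M}_{\text{qc}} \otimes \mathcal{M}^c)(\psi^x_{A_1A_2}) \Vert \mathcal{M}_{\text{qc}}(\psi^x_{A_1}) \otimes \mathcal{M}^c(\psi^x_{A_2})) \notag \\
&\quad = \sum_{x\in\mathcal{X}} p(x) H(\mathcal{M}_{\text{qc}}(\psi^x_{A_1})) + \sum_{x\in\mathcal{X}} p(x) H(\mathcal{M}^c(\psi^x_{A_2})) - \sum_{x\in\mathcal{X}} p(x) H((\mathcal{M}_{\text{qc}} \otimes \mathcal{M}^c)(\psi^x_{A_1A_2})). \tag{12.3.85}
\end{align}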
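Now, let the measurement channel $\mathcal{M}_{\text{qc}}$ have the associated POVM $\{M^y_{A_1}\}_{y\in\mathcal{Y}}$ for
some finite alphabet $\mathcal{Y}$. Then, letting $q(y|x) := \operatorname{Tr}[M^y_{A_1}\psi^x_{A_1}]$, we have that

\begin{equation}
H(\mathcal{M}_{\text{qc}}(\psi^x_{A_1})) = H(Y|X = x) \tag{12.3.86}
\end{equation}

for all $x \in \mathcal{X}$. Also, for every $x \in \mathcal{X}$, we find that

\begin{equation}
(\mathcal{M}_{\text{qc}} \otimes \mathcal{M}^c)(\psi^x_{A_1A_2}) = \sum_{y\in\mathcal{Y}} q(y|x)|y\rangle\langle y|_Y \otimes \mathcal{M}^c(\rho^{x,y}_{A_2}), \tag{12.3.87}
\end{equation}

where $\rho^{x,y}_{A_2} := \frac{1}{q(y|x)}\operatorname{Tr}_{A_1}[(M^y_{A_1} \otimes \mathbb{1}_{A_2})\psi^x_{A_1A_2}]$. Note that

\begin{align}
\sum_{y\in\mathcal{Y}} q(y|x)\rho^{x,y}_{A_2} &= \sum_{y\in\mathcal{Y}} \operatorname{Tr}_{A_1}[(M^y_{A_1} \otimes \mathbb{1}_{A_2})\psi^x_{A_1A_2}] \notag \\
&= \operatorname{Tr}_{A_1}[\psi^x_{A_1A_2}] \notag \\
&= \psi^x_{A_2} \tag{12.3.88}
\end{align}

for all $x \in \mathcal{X}$. Therefore, for all $x \in \mathcal{X}$,

\begin{align}
H((\mathcal{M}_{\text{qc}} \otimes \mathcal{M}^c)(\psi^x_{A_1A_2})) &= H\!\left(\sum_{y\in\mathcal{Y}} q(y|x)|y\rangle\langle y|_Y \otimes \mathcal{M}^c(\rho^{x,y}_{A_2})\right) \tag{12.3.89} \\
&= H(Y|X = x) + \sum_{y\in\mathcal{Y}} q(y|x) H(\mathcal{M}^c(\rho^{x,y}_{A_2})), \tag{12.3.90}
\end{align}

where the last equality follows from the direct-sum property of the quantum entropy.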
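Putting everything together, we obtain

\begin{align}
&H\!\left(\sum_{x\in\mathcal{X}} p(x)(\mathcal{N} \otimes \mathcal{M})(\psi^x_{A_1A_2})\right) - \sum_{x\in\mathcal{X}} p(x) H((\mathcal{N}^c \otimes \mathcal{M}^c)(\psi^x_{A_1A_2})) \notag \\
&\quad \leq H\!\left(\sum_{x\in\mathcal{X}} p(x)\mathcal{N}(\psi^x_{A_1})\right) - \sum_{x\in\mathcal{X}} p(x) H(\mathcal{N}^c(\psi^x_{A_1})) + H\!\left(\sum_{x\in\mathcal{X}} p(x)\mathcal{M}(\psi^x_{A_2})\right) \notag \\
&\qquad - \sum_{x\in\mathcal{X}} p(x) H(\mathcal{M}^c(\psi^x_{A_2})) + \sum_{x\in\mathcal{X}} p(x) H(Y|X = x) \notag \\
&\qquad + \sum_{x\in\mathcal{X}} p(x) H(\mathcal{M}^c(\psi^x_{A_2})) - \sum_{x\in\mathcal{X}} p(x) H(Y|X = x) \notag \\
&\qquad - \sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} p(x)q(y|x) H(\mathcal{M}^c(\rho^{x,y}_{A_2})) \tag{12.3.91} \\
&\quad = H\!\left(\sum_{x\in\mathcal{X}} p(x)\mathcal{N}(\psi^x_{A_1})\right) - \sum_{x\in\mathcal{X}} p(x) H(\mathcal{N}^c(\psi^x_{A_1})) \notag \\
&\qquad + H\!\left(\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} p(x)q(y|x)\mathcal{M}(\rho^{x,y}_{A_2})\right) - \sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} p(x)q(y|x) H(\mathcal{M}^c(\rho^{x,y}_{A_2})) \tag{12.3.92} \\
&\quad \leq \chi(\mathcal{N}) + \chi(\mathcal{M}), \tag{12.3.93}
\end{align}

where we have used (12.3.88) and, to obtain the last inequality, the fact that the first
two terms in (12.3.92) are of the form of the objective function in the expression in
(12.3.1) for the Holevo information $\chi(\mathcal{N})$, and similarly for the last two terms, in
which the ensemble is $\{(p(x)q(y|x), \rho^{x,y}_{A_2}) : x \in \mathcal{X}, y \in \mathcal{Y}\}$.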
Since the ensemble {( 𝑝(𝑥), 𝜓 𝑥𝐴1 𝐴2 )}𝑥∈X used to obtain (12.3.93) is arbitrary, we
conclude that
𝜒(N ⊗ M) ≤ 𝜒(M) + 𝜒(N), (12.3.94)
which implies, via the superadditivity in (12.2.44) that 𝜒(N ⊗ M) = 𝜒(N) + 𝜒(M),
as required. ■

Exercise 12.1
Prove that the classical capacity of the 𝑑-dimensional dephasing channel, 𝑑 ≥ 2,
is log2 𝑑.

12.4 Summary
In this chapter, we developed the theory of classical communication over a quantum
channel, adopting a similar structure to that of the previous chapter. We began
with the one-shot setting of classical communication, and we defined the one-shot
classical capacity of a quantum channel in Definition 12.2. We then derived upper
(Proposition 12.3) and lower (Proposition 12.5) bounds on the one-shot classical
capacity in terms of the hypothesis testing Holevo information of a quantum channel.
The approaches to doing so are conceptually similar to those from the previous
chapter. However, there are extra steps involved in deriving the lower bound,
called derandomization and expurgation, that establish the existence of a code with
maximum error probability no larger than a given threshold and number of bits
transmitted roughly equal to the one-shot Holevo information.
With the fundamental information-theoretic arguments established in the one-
shot setting, we then moved on to the asymptotic setting of classical communication.
One of the main results is that the regularized Holevo information of a channel
is equal to its classical capacity (Theorem 12.13). We then considered some
special cases: for entanglement-breaking, Hadamard, depolarizing, and erasure

channels, the Holevo information is not only equal to the classical capacity but also
equal to the strong converse classical capacity (we showed the proofs in full for
entanglement-breaking and erasure channels, but deferred to the literature for the
others). We discussed general upper bounds on the classical capacity, including the
Υ-information and 𝐶 𝛽 semi-definite programming bound.
Going forward from here, the methods of position-based coding and sequential
decoding are useful for the tasks of secret key distillation (Chapter 15) and private
communication (Chapter 16), and the concept of derandomization appears again in
the context of private communication. The Holevo information will also play a role
in achievable rates for these tasks.

12.5 Bibliographic Notes

Classical communication over quantum channels is one of the earliest settings
considered in quantum information theory. A key early work on the topic includes
Holevo (1973), in which the Holevo upper bound on classical capacity was
established. Many years later, after the advent of quantum computing, the Holevo
information lower bound on classical capacity was established by Holevo (1998)
and Schumacher and Westmoreland (1997). Prior to these works, Hausladen et al.
(1996) proved the same lower bound for the special case of a channel that accepts
classical inputs and outputs pure quantum states.
Classical communication in the one-shot setting has been considered by a
number of authors, including Hayashi and Nagaoka (2003); Hayashi (2007);
Mosonyi and Datta (2009); Mosonyi and Hiai (2011); Renes and Renner (2011);
Wang and Renner (2012); Matthews and Wehner (2014); Sharma and Warsi (2013);
Datta et al. (2013); Tomamichel and Hayashi (2013); Wilde (2013); Anshu et al.
(2019); Qi et al. (2018b); Oskouei et al. (2019). Proposition 12.3 is due to Matthews
and Wehner (2014). The second part of Theorem 12.4 is due to Wilde et al. (2014).
A variation of Theorem 12.5 (for average error probability with uniformly random
message) is due to Wang and Renner (2012). The proof given here is due to
Oskouei et al. (2019), however with some variations given in this book to account
for maximal error probability. At the same time, the proof uses the method of
position-based coding (Anshu et al., 2019), with the derandomization argument as
given by Qi et al. (2018b).
Additivity of Holevo information for entanglement-breaking channels was

established by Shor (2002a), for Hadamard channels by King et al. (2007), for
the depolarizing channel by King (2003b), and for the erasure channel by Bennett
et al. (1997). The fact that the Holevo information is the strong converse classical
capacity of the depolarizing channel was proven by Koenig and Wehner (2009).
Additivity of the sandwiched Rényi-Holevo information for entanglement-breaking
channels (Theorem 12.16) was established by Wilde et al. (2014), by building upon
earlier seminal results of King (2003a) subsequently generalized by Holevo (2006).
That is, Lemma 12.18 is due to King (2003a); Holevo (2006). Lemma 12.17 is due
to Wilde et al. (2014).
The Υ-information of a quantum channel and its variants were defined by Wang
et al. (2019c). The same authors established bounds on classical capacity involving
Υ-information. The strong converse for the classical capacity of the quantum
erasure channel is due to Wilde and Winter (2014), but here we have followed the
approach of Wang et al. (2019c). The semi-definite programming upper bound
𝐶 𝛽 (N) for the classical capacity of a quantum channel N was established by Wang
et al. (2018).
The Holevo information of covariant channels was studied by Holevo (2002b).
A proof of the fact that the limit in the definition of the regularized Holevo
information of a channel exists was given by Barnum et al. (1998).
The formula in (12.3.72) for the Holevo information of the amplitude damping
channel was derived by Li-Zhen and Mao-Fa (2007b), using the techniques of
Cortese (2002) and Berry (2005). The formula in (12.3.77) for the quantity 𝐶 𝛽
for the same channel was determined by Wang et al. (2018) (see also Khatri et al.
(2020)).

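Appendix 12.A The $\alpha \to 1$ Limit of the Sandwiched Rényi $\Upsilon$-Information of a Channel

In this section, we show that

\begin{equation}
\lim_{\alpha\to 1^+} \widetilde{\Upsilon}_\alpha(\mathcal{N}) = \Upsilon(\mathcal{N}), \tag{12.A.1}
\end{equation}

where we recall that

\begin{align}
\widetilde{\Upsilon}_\alpha(\mathcal{N}) &= \sup_{\psi_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \mathcal{F}_{A\to B}(\psi_{RA})), \tag{12.A.2} \\
\Upsilon(\mathcal{N}) &= \sup_{\psi_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}} D(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \mathcal{F}_{A\to B}(\psi_{RA})). \tag{12.A.3}
\end{align}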

Here, 𝜓 𝑅 𝐴 is a pure state, with the dimension of 𝑅 equal to the dimension of 𝐴, and
the infimum is over the set 𝔉 of completely positive maps defined as
𝔉 = {F 𝐴→𝐵 : ∃ 𝜎𝐵 ≥ 0, Tr[𝜎𝐵 ] ≤ 1, F 𝐴→𝐵 (𝜌 𝐴 ) ≤ 𝜎𝐵 ∀ 𝜌 𝐴 ∈ D(H 𝐴 )}.
(12.A.4)
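Now, since the sandwiched Rényi relative entropy increases monotonically with
$\alpha$ (see Proposition 7.31), and since $\lim_{\alpha\to 1} \widetilde{D}_\alpha(\rho \Vert \sigma) = D(\rho \Vert \sigma)$ (see Proposition 7.30), we obtain

\begin{align}
\lim_{\alpha\to 1^+} \widetilde{\Upsilon}_\alpha(\mathcal{N}) &= \inf_{\alpha\in(1,\infty)} \sup_{\psi_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \mathcal{F}_{A\to B}(\psi_{RA})) \tag{12.A.5} \\
&= \sup_{\psi_{RA}} \inf_{\alpha\in(1,\infty)} \inf_{\mathcal{F}\in\mathfrak{F}} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \mathcal{F}_{A\to B}(\psi_{RA})) \tag{12.A.6} \\
&= \sup_{\psi_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}} \inf_{\alpha\in(1,\infty)} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \mathcal{F}_{A\to B}(\psi_{RA})) \tag{12.A.7} \\
&= \sup_{\psi_{RA}} \inf_{\mathcal{F}\in\mathfrak{F}} D(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \mathcal{F}_{A\to B}(\psi_{RA})) \tag{12.A.8} \\
&= \Upsilon(\mathcal{N}), \tag{12.A.9}
\end{align}

as required, where to obtain the second equality we made use of the minimax
theorem in Theorem 2.25 to exchange $\inf_{\alpha\in(1,\infty)}$ and $\sup_{\psi_{RA}}$. Specifically, we
applied that theorem to the function

\begin{equation}
(\alpha, \psi_{RA}) \mapsto \inf_{\mathcal{F}\in\mathfrak{F}} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \mathcal{F}_{A\to B}(\psi_{RA})), \tag{12.A.10}
\end{equation}

which is monotonically increasing in the first argument and continuous in the
second argument.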

Appendix 12.B Proof of the Additivity of 𝑪 𝜷 (N)

In this section, we prove that
𝐶 𝛽 (N1 ⊗ N2 ) = 𝐶 𝛽 (N1 ) + 𝐶 𝛽 (N2 ). (12.B.1)
Noting that 𝐶 𝛽 (N) = log2 𝛽(N), with 𝛽(N) defined in (12.2.228), the expression
above for the additivity of 𝐶 𝛽 (N) is equivalent to the multiplicativity of 𝛽(N), i.e.,
𝛽(N1 ⊗ N2 ) = 𝛽(N1 ) · 𝛽(N2 ). (12.B.2)

We now prove that this equality holds. We start with the following lemma.

Lemma 12.35
Let 𝐴 and 𝐵 be Hermitian operators such that −𝐴 ≤ 𝐵 ≤ 𝐴, and let 𝐶 and 𝐷
be Hermitian operators such that −𝐶 ≤ 𝐷 ≤ 𝐶. Then,

−𝐴 ⊗ 𝐶 ≤ 𝐵 ⊗ 𝐷 ≤ 𝐴 ⊗ 𝐶. (12.B.3)

Proof: The condition −𝐴 ≤ 𝐵 ≤ 𝐴 is equivalent to 𝐴 − 𝐵 ≥ 0 and 𝐴 + 𝐵 ≥ 0,

and the condition −𝐶 ≤ 𝐷 ≤ 𝐶 is equivalent to 𝐶 − 𝐷 ≥ 0 and 𝐶 + 𝐷 ≥ 0. These
inequalities imply that

( 𝐴 − 𝐵) ⊗ (𝐶 − 𝐷) ≥ 0, (12.B.4)
( 𝐴 + 𝐵) ⊗ (𝐶 + 𝐷) ≥ 0, (12.B.5)
( 𝐴 − 𝐵) ⊗ (𝐶 + 𝐷) ≥ 0, (12.B.6)
( 𝐴 + 𝐵) ⊗ (𝐶 − 𝐷) ≥ 0. (12.B.7)

Expanding the left-hand side of these inequalities gives

𝐴⊗𝐶−𝐵⊗𝐶−𝐴⊗𝐷+𝐵⊗𝐷 ≥ 0, (12.B.8)
𝐴⊗𝐶+𝐵⊗𝐶+𝐴⊗𝐷+𝐵⊗𝐷 ≥ 0, (12.B.9)
𝐴⊗𝐶−𝐵⊗𝐶+𝐴⊗𝐷−𝐵⊗𝐷 ≥ 0, (12.B.10)
𝐴⊗𝐶+𝐵⊗𝐶−𝐴⊗𝐷−𝐵⊗𝐷 ≥ 0. (12.B.11)

Now, adding the first two of these inequalities implies that 𝐴 ⊗ 𝐶 + 𝐵 ⊗ 𝐷 ≥ 0, which
is equivalent to the left-hand side of (12.B.3). Adding the last two inequalities
implies that 𝐴 ⊗ 𝐶 − 𝐵 ⊗ 𝐷 ≥ 0, which is equivalent to the right-hand side of
(12.B.3). ■

An immediate corollary of the lemma above is the following: for all Hermitian
operators 𝐴, 𝐵, 𝐶, 𝐷 such that 0 ≤ 𝐵 ≤ 𝐴 and 0 ≤ 𝐷 ≤ 𝐶, it holds that

0 ≤ 𝐵 ⊗ 𝐷 ≤ 𝐴 ⊗ 𝐶. (12.B.12)

Indeed, the condition 0 ≤ 𝐵 ≤ 𝐴 implies that 𝐴 ≥ 0, which is equivalent to −𝐴 ≤ 0,

which means that −𝐴 ≤ 𝐵 holds. Similarly, we get that −𝐶 ≤ 𝐷. So we have that
−𝐴 ≤ 𝐵 ≤ 𝐴 and −𝐶 ≤ 𝐷 ≤ 𝐶. The result then follows by applying the lemma
above.

Now, let us start the proof of (12.B.2) by showing

𝛽(N1 ⊗ N2 ) ≤ 𝛽(N1 ) · 𝛽(N2 ). (12.B.13)
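Recall from (12.2.228) that
\begin{equation}
\beta(\mathcal{N}) = \left\{ \begin{array}{ll}
\text{infimum} & \operatorname{Tr}[S_B] \\
\text{subject to} & -R_{AB} \leq \mathrm{T}_B[\Gamma^{\mathcal{N}}_{AB}] \leq R_{AB}, \\
& -\mathbb{1}_A \otimes S_B \leq \mathrm{T}_B[R_{AB}] \leq \mathbb{1}_A \otimes S_B.
\end{array} \right. \tag{12.B.14}
\end{equation}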



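Now, let $(R^1_{AB}, S^1_B)$ be a feasible point in the SDP for $\beta(\mathcal{N}_1)$, and let $(R^2_{AB}, S^2_B)$ be
a feasible point in the SDP for $\beta(\mathcal{N}_2)$. Each pair thus satisfies the constraints in
(12.B.14). Using Lemma 12.35, the first of these constraints implies that
\begin{equation}
-R^1_{A_1B_1} \otimes R^2_{A_2B_2} \leq \mathrm{T}_{B_1}[\Gamma^{\mathcal{N}_1}_{A_1B_1}] \otimes \mathrm{T}_{B_2}[\Gamma^{\mathcal{N}_2}_{A_2B_2}] \leq R^1_{A_1B_1} \otimes R^2_{A_2B_2}. \tag{12.B.15}
\end{equation}
Furthermore, observe that
\begin{align}
\mathrm{T}_{B_1}[\Gamma^{\mathcal{N}_1}_{A_1B_1}] \otimes \mathrm{T}_{B_2}[\Gamma^{\mathcal{N}_2}_{A_2B_2}] &= \mathrm{T}_{B_1B_2}[\Gamma^{\mathcal{N}_1}_{A_1B_1} \otimes \Gamma^{\mathcal{N}_2}_{A_2B_2}] \tag{12.B.16} \\
&= \mathrm{T}_{B_1B_2}[\Gamma^{\mathcal{N}_1\otimes\mathcal{N}_2}_{A_1A_2B_1B_2}]. \tag{12.B.17}
\end{align}
Using this, along with Lemma 12.35, the second constraint in (12.B.14) implies
that
\begin{equation}
-\mathbb{1}_{A_1A_2} \otimes S^1_{B_1} \otimes S^2_{B_2} \leq \mathrm{T}_{B_1B_2}[R^1_{A_1B_1} \otimes R^2_{A_2B_2}] \leq \mathbb{1}_{A_1A_2} \otimes S^1_{B_1} \otimes S^2_{B_2}. \tag{12.B.18}
\end{equation}
Now, the inequalities in (12.B.15) and (12.B.18), together with the equality in (12.B.17), imply that $(R^1_{A_1B_1} \otimes R^2_{A_2B_2}, S^1_{B_1} \otimes S^2_{B_2})$ is a feasible point in the SDP for $\beta(\mathcal{N}_1 \otimes \mathcal{N}_2)$. This means that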

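\begin{equation}
\beta(\mathcal{N}_1 \otimes \mathcal{N}_2) \leq \operatorname{Tr}[S^1_{B_1} \otimes S^2_{B_2}] = \operatorname{Tr}[S^1_{B_1}]\operatorname{Tr}[S^2_{B_2}]. \tag{12.B.19}
\end{equation}

Since $(R^1_{A_1B_1}, S^1_{B_1})$ and $(R^2_{A_2B_2}, S^2_{B_2})$ are arbitrary feasible points in the SDPs for
$\beta(\mathcal{N}_1)$ and $\beta(\mathcal{N}_2)$, respectively, the inequality in (12.B.19) holds for the feasible
points achieving $\beta(\mathcal{N}_1)$ and $\beta(\mathcal{N}_2)$. This means that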
𝛽(N1 ⊗ N2 ) ≤ 𝛽(N1 ) · 𝛽(N2 ), (12.B.20)
as required.
To prove the reverse inequality, i.e.,
𝛽(N1 ⊗ N2 ) ≥ 𝛽(N1 ) 𝛽(N2 ), (12.B.21)
we turn to the SDP dual to the one in (12.B.14).


Lemma 12.36
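For every quantum channel $\mathcal{N}$, the SDP dual to the SDP in (12.B.14) for $\beta(\mathcal{N})$
is given by
\begin{equation}
\widehat{\beta}(\mathcal{N}) := \left\{ \begin{array}{ll}
\text{supremum} & \operatorname{Tr}\!\left[\mathrm{T}_B[\Gamma^{\mathcal{N}}_{AB}](K_{AB} - M_{AB})\right] \\
\text{subject to} & K_{AB} + M_{AB} \leq \mathrm{T}_B[E_{AB} - F_{AB}], \\
& E_B + F_B \leq \mathbb{1}_B, \\
& K_{AB}, M_{AB}, E_{AB}, F_{AB} \geq 0.
\end{array} \right. \tag{12.B.22}
\end{equation}
Furthermore, it holds that $\widehat{\beta}(\mathcal{N}) = \beta(\mathcal{N})$.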

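Proof: Using the formulation of the SDP for $\beta(\mathcal{N})$ as in (12.2.229), the dual to
the SDP for $\beta(\mathcal{N})$ is simply
\begin{equation}
\widehat{\beta}(\mathcal{N}) = \left\{ \begin{array}{ll}
\text{supremum} & \operatorname{Tr}[DY] \\
\text{subject to} & \Phi^{\dagger}(Y) \leq C, \\
& Y \geq 0,
\end{array} \right. \tag{12.B.23}
\end{equation}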

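where
\begin{equation}
C = \begin{pmatrix} \mathbb{1}_B & 0 \\ 0 & 0_{AB} \end{pmatrix}, \qquad
D = \begin{pmatrix}
\mathrm{T}_B[\Gamma^{\mathcal{N}}_{AB}] & 0 & 0 & 0 \\
0 & -\mathrm{T}_B[\Gamma^{\mathcal{N}}_{AB}] & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}, \tag{12.B.24}
\end{equation}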

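and the map $\Phi$ is defined as
\begin{equation}
\Phi(X) = \begin{pmatrix}
R_{AB} & 0 & 0 & 0 \\
0 & R_{AB} & 0 & 0 \\
0 & 0 & \mathbb{1}_A \otimes S_B - \mathrm{T}_B[R_{AB}] & 0 \\
0 & 0 & 0 & \mathbb{1}_A \otimes S_B + \mathrm{T}_B[R_{AB}]
\end{pmatrix}, \tag{12.B.25}
\end{equation}
\begin{equation}
X = \begin{pmatrix} S_B & 0 \\ 0 & R_{AB} \end{pmatrix}. \tag{12.B.26}
\end{equation}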

To determine the adjoint Φ† , we first observe that, since the operators 𝐶 and 𝐷
are block diagonal, the objective function Tr[𝐷𝑌 ] of the dual problem involves
only the diagonal blocks of 𝑌 . Furthermore, the fact that Φ(𝑋) and 𝑋 are block
diagonal means that the condition Tr[Φ(𝑋)𝑌 ] = Tr[𝑋Φ† (𝑌 )] defining the adjoint

map Φ† involves only the diagonal blocks of 𝑌 . Therefore, if the dual problem is
feasible, then there is always a feasible point 𝑌 that is block diagonal. This means
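that, without loss of generality, we can let
\begin{equation}
Y = \begin{pmatrix}
Y^1_{AB} & 0 & 0 & 0 \\
0 & Y^2_{AB} & 0 & 0 \\
0 & 0 & Y^3_{AB} & 0 \\
0 & 0 & 0 & Y^4_{AB}
\end{pmatrix}, \tag{12.B.27}
\end{equation}
with $Y^1_{AB}, Y^2_{AB}, Y^3_{AB}, Y^4_{AB} \geq 0$. Then,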

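\begin{align}
&\operatorname{Tr}[\Phi(X)Y] \notag \\
&= \operatorname{Tr}\!\left[
\begin{pmatrix}
R_{AB} & 0 & 0 & 0 \\
0 & R_{AB} & 0 & 0 \\
0 & 0 & \mathbb{1}_A \otimes S_B - \mathrm{T}_B[R_{AB}] & 0 \\
0 & 0 & 0 & \mathbb{1}_A \otimes S_B + \mathrm{T}_B[R_{AB}]
\end{pmatrix}
\begin{pmatrix}
Y^1_{AB} & 0 & 0 & 0 \\
0 & Y^2_{AB} & 0 & 0 \\
0 & 0 & Y^3_{AB} & 0 \\
0 & 0 & 0 & Y^4_{AB}
\end{pmatrix}
\right] \tag{12.B.28} \\
&= \operatorname{Tr}\!\left[ R_{AB}Y^1_{AB} + R_{AB}Y^2_{AB} + (\mathbb{1}_A \otimes S_B - \mathrm{T}_B[R_{AB}])Y^3_{AB} + (\mathbb{1}_A \otimes S_B + \mathrm{T}_B[R_{AB}])Y^4_{AB} \right] \tag{12.B.29} \\
&= \operatorname{Tr}[S_B(Y^3_B + Y^4_B)] + \operatorname{Tr}[R_{AB}(Y^1_{AB} + Y^2_{AB} + \mathrm{T}_B[Y^4_{AB} - Y^3_{AB}])] \tag{12.B.30} \\
&= \operatorname{Tr}\!\left[
\begin{pmatrix} S_B & 0 \\ 0 & R_{AB} \end{pmatrix}
\begin{pmatrix} Y^3_B + Y^4_B & 0 \\ 0 & Y^1_{AB} + Y^2_{AB} + \mathrm{T}_B[Y^4_{AB} - Y^3_{AB}] \end{pmatrix}
\right]. \tag{12.B.31}
\end{align}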

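For the last line to be equal to $\operatorname{Tr}[X\Phi^{\dagger}(Y)]$, we must have
\begin{equation}
\Phi^{\dagger}(Y) = \begin{pmatrix} Y^3_B + Y^4_B & 0 \\ 0 & Y^1_{AB} + Y^2_{AB} + \mathrm{T}_B[Y^4_{AB} - Y^3_{AB}] \end{pmatrix}. \tag{12.B.32}
\end{equation}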

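Then, the condition $\Phi^{\dagger}(Y) \leq C$ is given by
\begin{equation}
\begin{pmatrix} Y^3_B + Y^4_B & 0 \\ 0 & Y^1_{AB} + Y^2_{AB} + \mathrm{T}_B[Y^4_{AB} - Y^3_{AB}] \end{pmatrix}
\leq \begin{pmatrix} \mathbb{1}_B & 0 \\ 0 & 0_{AB} \end{pmatrix}, \tag{12.B.33}
\end{equation}
which implies that
\begin{align}
Y^3_B + Y^4_B &\leq \mathbb{1}_B, \tag{12.B.34} \\
Y^1_{AB} + Y^2_{AB} &\leq \mathrm{T}_B[Y^3_{AB} - Y^4_{AB}]. \tag{12.B.35}
\end{align}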

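Then,
\begin{align}
\operatorname{Tr}[DY] &= \operatorname{Tr}[\mathrm{T}_B[\Gamma^{\mathcal{N}}_{AB}]Y^1_{AB}] - \operatorname{Tr}[\mathrm{T}_B[\Gamma^{\mathcal{N}}_{AB}]Y^2_{AB}] \tag{12.B.36} \\
&= \operatorname{Tr}[\mathrm{T}_B[\Gamma^{\mathcal{N}}_{AB}](Y^1_{AB} - Y^2_{AB})]. \tag{12.B.37}
\end{align}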

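Therefore, setting $K_{AB} = Y^1_{AB}$, $M_{AB} = Y^2_{AB}$, $E_{AB} = Y^3_{AB}$, and $F_{AB} = Y^4_{AB}$, the dual is given by
\begin{equation}
\widehat{\beta}(\mathcal{N}) = \left\{ \begin{array}{ll}
\text{supremum} & \operatorname{Tr}\!\left[\mathrm{T}_B[\Gamma^{\mathcal{N}}_{AB}](K_{AB} - M_{AB})\right] \\
\text{subject to} & K_{AB} + M_{AB} \leq \mathrm{T}_B[E_{AB} - F_{AB}], \\
& E_B + F_B \leq \mathbb{1}_B, \\
& K_{AB}, M_{AB}, E_{AB}, F_{AB} \geq 0,
\end{array} \right. \tag{12.B.38}
\end{equation}
as required.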
To show that $\widehat{\beta}(\mathcal{N}) = \beta(\mathcal{N})$, we need to check that Slater’s condition holds
(Theorem 2.28). We can pick $E_{AB} = \frac{\mathbb{1}_{AB}}{3d_A}$, $F_{AB} = \frac{\mathbb{1}_{AB}}{6d_A}$, and $K_{AB} = M_{AB} = \frac{\mathbb{1}_{AB}}{24d_A}$,
where $d_A$ is the dimension of the space of the system $A$. Then we have strict
inequalities for all of the constraints of the dual problem, which means that Slater’s
condition holds. The primal $\beta(\mathcal{N})$ and dual $\widehat{\beta}(\mathcal{N})$ are thus equal. ■
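The strictness of the dual constraints for this choice can be verified by a short computation. The sketch below uses only the operators chosen in the proof, together with the facts that the partial transpose leaves the identity unchanged, that $\operatorname{Tr}_A[\mathbb{1}_{AB}] = d_A \mathbb{1}_B$, and that the subscript $B$ denotes the marginal obtained by tracing out $A$:

```latex
\begin{align*}
E_B + F_B &= \operatorname{Tr}_A[E_{AB}] + \operatorname{Tr}_A[F_{AB}]
  = \left(\tfrac{1}{3} + \tfrac{1}{6}\right)\mathbb{1}_B = \tfrac{1}{2}\,\mathbb{1}_B < \mathbb{1}_B, \\
K_{AB} + M_{AB} &= \frac{\mathbb{1}_{AB}}{12 d_A} < \frac{\mathbb{1}_{AB}}{6 d_A}
  = \mathrm{T}_B\!\left[\frac{\mathbb{1}_{AB}}{3 d_A} - \frac{\mathbb{1}_{AB}}{6 d_A}\right]
  = \mathrm{T}_B[E_{AB} - F_{AB}], \\
K_{AB},\ M_{AB},\ E_{AB},\ F_{AB} &> 0 .
\end{align*}
```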

With the dual problem in hand, we can now prove (12.B.21). Let $(K^1_{A_1B_1}, M^1_{A_1B_1}, E^1_{A_1B_1}, F^1_{A_1B_1})$ be a feasible point for the dual SDP for $\mathcal{N}_1$, and let $(K^2_{A_2B_2}, M^2_{A_2B_2}, E^2_{A_2B_2}, F^2_{A_2B_2})$ be a feasible point for the dual SDP for $\mathcal{N}_2$. Then, pick
\begin{align}
K_{A_1B_1A_2B_2} &= K^1_{A_1B_1} \otimes K^2_{A_2B_2} + M^1_{A_1B_1} \otimes M^2_{A_2B_2}, \tag{12.B.39} \\
M_{A_1B_1A_2B_2} &= K^1_{A_1B_1} \otimes M^2_{A_2B_2} + M^1_{A_1B_1} \otimes K^2_{A_2B_2}, \tag{12.B.40} \\
E_{A_1B_1A_2B_2} &= E^1_{A_1B_1} \otimes E^2_{A_2B_2} + F^1_{A_1B_1} \otimes F^2_{A_2B_2}, \tag{12.B.41} \\
F_{A_1B_1A_2B_2} &= E^1_{A_1B_1} \otimes F^2_{A_2B_2} + F^1_{A_1B_1} \otimes E^2_{A_2B_2}. \tag{12.B.42}
\end{align}
Note that $K_{A_1B_1A_2B_2}, M_{A_1B_1A_2B_2}, E_{A_1B_1A_2B_2}, F_{A_1B_1A_2B_2} \geq 0$. Then,
\begin{align}
K_{A_1B_1A_2B_2} - M_{A_1B_1A_2B_2} &= (K^1_{A_1B_1} - M^1_{A_1B_1}) \otimes (K^2_{A_2B_2} - M^2_{A_2B_2}), \tag{12.B.43} \\
K_{A_1B_1A_2B_2} + M_{A_1B_1A_2B_2} &= (K^1_{A_1B_1} + M^1_{A_1B_1}) \otimes (K^2_{A_2B_2} + M^2_{A_2B_2}), \tag{12.B.44} \\
E_{A_1B_1A_2B_2} - F_{A_1B_1A_2B_2} &= (E^1_{A_1B_1} - F^1_{A_1B_1}) \otimes (E^2_{A_2B_2} - F^2_{A_2B_2}), \tag{12.B.45} \\
E_{A_1B_1A_2B_2} + F_{A_1B_1A_2B_2} &= (E^1_{A_1B_1} + F^1_{A_1B_1}) \otimes (E^2_{A_2B_2} + F^2_{A_2B_2}). \tag{12.B.46}
\end{align}
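As a worked instance of these factorizations, here is the expansion behind (12.B.43), written with the system labels suppressed; the other three identities follow from the same regrouping with the appropriate signs. This is a sketch of the routine algebra and uses nothing beyond the definitions in (12.B.39) and (12.B.40).

```latex
\begin{align*}
K - M &= K^1 \otimes K^2 + M^1 \otimes M^2 - K^1 \otimes M^2 - M^1 \otimes K^2 \\
      &= K^1 \otimes (K^2 - M^2) - M^1 \otimes (K^2 - M^2) \\
      &= (K^1 - M^1) \otimes (K^2 - M^2).
\end{align*}
```

The positivity noted above holds because each of the four operators is a sum of tensor products of positive semi-definite operators.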


Consider that
\begin{align}
K_{A_1B_1A_2B_2} + M_{A_1B_1A_2B_2}
&= (K^1_{A_1B_1} + M^1_{A_1B_1}) \otimes (K^2_{A_2B_2} + M^2_{A_2B_2}) \tag{12.B.47} \\
&\leq \mathrm{T}_{B_1}[E^1_{A_1B_1} - F^1_{A_1B_1}] \otimes \mathrm{T}_{B_2}[E^2_{A_2B_2} - F^2_{A_2B_2}] \tag{12.B.48} \\
&= \mathrm{T}_{B_1B_2}[(E^1_{A_1B_1} - F^1_{A_1B_1}) \otimes (E^2_{A_2B_2} - F^2_{A_2B_2})] \tag{12.B.49} \\
&= \mathrm{T}_{B_1B_2}[E_{A_1B_1A_2B_2} - F_{A_1B_1A_2B_2}], \tag{12.B.50}
\end{align}
where the inequality follows from the constraints $K^i_{A_iB_i}, M^i_{A_iB_i} \geq 0$ and $K^i_{A_iB_i} + M^i_{A_iB_i} \leq \mathrm{T}_{B_i}[E^i_{A_iB_i} - F^i_{A_iB_i}]$ for $i \in \{1, 2\}$ and from an application of (12.B.12).
Furthermore, we have that
\begin{align}
E_{B_1B_2} + F_{B_1B_2} &= (E^1_{B_1} + F^1_{B_1}) \otimes (E^2_{B_2} + F^2_{B_2}) \tag{12.B.51} \\
&\leq \mathbb{1}_{B_1} \otimes \mathbb{1}_{B_2} \tag{12.B.52} \\
&= \mathbb{1}_{B_1B_2}, \tag{12.B.53}
\end{align}
where the inequality follows from the constraints $E^i_{B_i}, F^i_{B_i} \geq 0$ and $E^i_{B_i} + F^i_{B_i} \leq \mathbb{1}_{B_i}$
for $i \in \{1, 2\}$ and from an application of (12.B.12). The collection
\begin{equation}
(K_{A_1B_1A_2B_2}, M_{A_1B_1A_2B_2}, E_{A_1B_1A_2B_2}, F_{A_1B_1A_2B_2}) \tag{12.B.54}
\end{equation}
thus constitutes a feasible point for the SDP in (12.B.38). By restricting the
optimization in the SDP to this point, we find that

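\begin{align}
&\beta(\mathcal{N}_1 \otimes \mathcal{N}_2) \tag{12.B.55} \\
&\geq \operatorname{Tr}\!\left[ \mathrm{T}_{B_1B_2}[\Gamma^{\mathcal{N}_1\otimes\mathcal{N}_2}_{A_1A_2B_1B_2}] (K_{A_1B_1A_2B_2} - M_{A_1B_1A_2B_2}) \right] \tag{12.B.56} \\
&= \operatorname{Tr}\!\left[ \left(\mathrm{T}_{B_1}[\Gamma^{\mathcal{N}_1}_{A_1B_1}] \otimes \mathrm{T}_{B_2}[\Gamma^{\mathcal{N}_2}_{A_2B_2}]\right) (K_{A_1B_1A_2B_2} - M_{A_1B_1A_2B_2}) \right] \tag{12.B.57} \\
&= \operatorname{Tr}\!\left[ \left(\mathrm{T}_{B_1}[\Gamma^{\mathcal{N}_1}_{A_1B_1}] \otimes \mathrm{T}_{B_2}[\Gamma^{\mathcal{N}_2}_{A_2B_2}]\right) \left((K^1_{A_1B_1} - M^1_{A_1B_1}) \otimes (K^2_{A_2B_2} - M^2_{A_2B_2})\right) \right] \tag{12.B.58} \\
&= \operatorname{Tr}\!\left[ \mathrm{T}_{B_1}[\Gamma^{\mathcal{N}_1}_{A_1B_1}] (K^1_{A_1B_1} - M^1_{A_1B_1}) \right] \cdot \operatorname{Tr}\!\left[ \mathrm{T}_{B_2}[\Gamma^{\mathcal{N}_2}_{A_2B_2}] (K^2_{A_2B_2} - M^2_{A_2B_2}) \right]. \tag{12.B.59}
\end{align}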

Now, since $(K^1_{A_1B_1}, M^1_{A_1B_1}, E^1_{A_1B_1}, F^1_{A_1B_1})$ and $(K^2_{A_2B_2}, M^2_{A_2B_2}, E^2_{A_2B_2}, F^2_{A_2B_2})$ were arbitrary feasible
points in the SDPs for $\widehat{\beta}(\mathcal{N}_1) = \beta(\mathcal{N}_1)$ and $\widehat{\beta}(\mathcal{N}_2) = \beta(\mathcal{N}_2)$, respectively, the
inequality in (12.B.59) holds for the feasible points achieving $\beta(\mathcal{N}_1)$ and $\beta(\mathcal{N}_2)$.
Therefore,
𝛽(N1 ⊗ N2 ) ≥ 𝛽(N1 ) · 𝛽(N2 ). (12.B.60)
We have thus shown that 𝛽(N1 ⊗ N2 ) = 𝛽(N1 ) · 𝛽(N2 ).

Chapter 13

Entanglement Distillation
In the last two chapters, we explored classical communication over quantum channels,
in which classical information is encoded into a quantum state, transmitted over
a quantum channel, and decoded at the receiving end. In this chapter, we begin
our exploration of quantum communication. The goal here is to send quantum
information between two spatially separated parties. By “quantum information,”
we mean that a particular quantum state is transmitted, which is carried physically
by some quantum system. As was the case in previous chapters, the particular
information carrier is unimportant to us when developing the theoretical results;
however, the most common physical manifestation is a photonic encoding, which is
useful for long-distance quantum communication.
A basic quantum communication protocol is teleportation, which we developed
in Section 5.1. In this protocol, the sender, Alice, initially shares a maximally
entangled state with the receiver, Bob. This shared entanglement, along with
classical communication, can be used to transmit an arbitrary quantum state
perfectly from Alice to Bob. Specifically, if Alice and Bob share a maximally
entangled state of Schmidt rank 𝑑 ≥ 2, then using this entanglement along
with 2 log2 𝑑 bits of classical communication, Alice can perfectly transmit an
arbitrary state of log2 𝑑 qubits to Bob. Thus, the quantum teleportation protocol
realizes a noiseless quantum channel between Alice and Bob without having to
physically transport the particles carrying the quantum information. Of course, this
achievement comes at the cost of having a pre-shared maximally entangled state.
How do we obtain maximally entangled states in the first place? In practice,
due to noise and other device imperfections, physical sources of entanglement

Figure 13.1: Given a bipartite state $\rho_{AB}$ shared by Alice and Bob, the task of
entanglement distillation is to find the largest $d$ for which a maximally entangled
state $|\Phi\rangle_{\hat{A}\hat{B}}$ of Schmidt rank $d$ can be extracted from $n$ copies of $\rho_{AB}$ with the
smallest possible error, given a two-way LOCC channel $\mathcal{L}^{\leftrightarrow}_{A^nB^n\to\hat{A}\hat{B}}$ between
Alice and Bob.

often only produce mixed entangled states, not the pure, maximally entangled
states that are needed for quantum teleportation. The purpose of this chapter is
to show that many copies of a mixed entangled state can be used to extract, or
distill, some smaller number of pure maximally entangled states. These distilled
maximally entangled states can then be used for quantum communication via the
teleportation protocol. This is a basic strategy for quantum communication that
we consider in more detail in Chapter 19, in order to obtain achievable rates for
quantum communication over a quantum channel.
Similar to quantum teleportation, in which the allowed resources are local
operations by Alice and Bob and one-way classical communication from Alice to
Bob, in entanglement distillation we allow Alice and Bob local operations with
two-way classical communication (that is, communication from Alice to Bob and
from Bob to Alice); see Figure 13.1. The goal is to determine, given many copies
of a quantum state 𝜌 𝐴𝐵 , the maximum rate at which maximally entangled states
(i.e., ebits) can be distilled approximately from 𝜌 𝐴𝐵 , where the rate is defined as
the ratio $\frac{1}{n}\log_2 d$ between the number log2 𝑑 of approximate ebits extracted and the
initial number 𝑛 of copies of 𝜌 𝐴𝐵 . In the asymptotic setting, this maximum rate
of entanglement distillation is called the distillable entanglement of 𝜌 𝐴𝐵 , and we
denote it by 𝐸 𝐷 (𝜌 𝐴𝐵 ). We often write 𝐸 𝐷 (𝜌 𝐴𝐵 ) as 𝐸 𝐷 ( 𝐴; 𝐵) 𝜌 in order to explicitly
indicate the bipartition between the subsystems.
The shared resource state 𝜌 𝐴𝐵 for entanglement distillation has to be entangled
to begin with in order for entanglement distillation to be successful. If 𝜌 𝐴𝐵 is
separable to begin with, then it stays separable after the application of an LOCC
channel, and it is not possible to distill high fidelity maximally entangled states

from a separable state. This intuitive reasoning becomes formalized in this chapter:
some of the entanglement measures from Chapter 9 serve as upper bounds on
the distillable entanglement, in both the one-shot (Section 13.1) and asymptotic
(Section 13.2) settings. In particular, the Rains relative entropy and squashed
entanglement are upper bounds on distillable entanglement. These entanglement
measures are currently the best known upper bounds on distillable entanglement,
and so we focus exclusively on them for this purpose in this chapter. It is then
a trivial consequence of Proposition 9.24, (9.1.149), and Proposition 9.35 that
log-negativity, relative entropy of entanglement, and entanglement of formation
are upper bounds on distillable entanglement, and so we do not focus on these
entanglement measures in this chapter.
We also consider lower bounds on distillable entanglement in this chapter: the
lower bound on distillable entanglement in the one-shot setting in Section 13.1.2 is
based on the concept of decoupling, which is an important concept that we discuss
later. This lower bound, when applied in the asymptotic setting, leads to the coherent
information lower bound 𝐸 𝐷 (𝜌 𝐴𝐵 ) ≥ 𝐼 ( 𝐴⟩𝐵) 𝜌 on distillable entanglement.

13.1 One-Shot Setting

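The one-shot setting for entanglement distillation begins with Alice and Bob
sharing the state $\rho_{AB}$. An entanglement distillation protocol for $\rho_{AB}$ is defined
by the pair $(d, \mathcal{L}^{\leftrightarrow}_{AB\to\hat{A}\hat{B}})$, where $d \in \mathbb{N}$, $d \geq 1$, and $\mathcal{L}^{\leftrightarrow}_{AB\to\hat{A}\hat{B}}$ is an LOCC channel
(Definition 4.22), with $d_{\hat{A}} = d_{\hat{B}} = d$. The distillation error $p_{\text{err}}(\mathcal{L}^{\leftrightarrow}; \rho_{AB})$ of the
protocol is given by the infidelity, defined as
\begin{align}
p_{\text{err}}(\mathcal{L}^{\leftrightarrow}; \rho_{AB}) &:= 1 - F(\Phi_{\hat{A}\hat{B}}, \mathcal{L}^{\leftrightarrow}_{AB\to\hat{A}\hat{B}}(\rho_{AB})) \tag{13.1.1} \\
&= 1 - \langle\Phi|_{\hat{A}\hat{B}} \mathcal{L}^{\leftrightarrow}_{AB\to\hat{A}\hat{B}}(\rho_{AB}) |\Phi\rangle_{\hat{A}\hat{B}}, \tag{13.1.2}
\end{align}
where $\Phi_{\hat{A}\hat{B}}$ is the maximally entangled state of Schmidt rank $d$, defined as
\begin{equation}
|\Phi\rangle_{\hat{A}\hat{B}} = \frac{1}{\sqrt{d}} \sum_{i=0}^{d-1} |i\rangle_{\hat{A}} \otimes |i\rangle_{\hat{B}}, \tag{13.1.3}
\end{equation}
and $F$ is the fidelity (see Section 6.2). To obtain (13.1.2), we used the formula in
(6.2.2) for the fidelity between a pure state and a mixed state.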
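To make the figure of merit concrete, the following minimal sketch evaluates the infidelity in (13.1.2) for a given output state. The isotropic state used as the protocol output is a hypothetical example chosen only for illustration, and the function names are not from the book.

```python
from math import sqrt, isclose

def max_entangled_vector(d):
    """Amplitudes of |Phi> = (1/sqrt(d)) sum_i |i>|i>, with |i>|j> stored at index i*d + j."""
    vec = [0.0] * (d * d)
    for i in range(d):
        vec[i * d + i] = 1.0 / sqrt(d)
    return vec

def isotropic_state(d, p):
    """Hypothetical output state: omega = p * Phi + (1 - p) * identity / d^2."""
    phi = max_entangled_vector(d)
    n = d * d
    return [[p * phi[r] * phi[c] + ((1.0 - p) / n if r == c else 0.0)
             for c in range(n)] for r in range(n)]

def distillation_error(omega, d):
    """Infidelity 1 - <Phi| omega |Phi>, as in (13.1.2)."""
    phi = max_entangled_vector(d)
    n = d * d
    overlap = sum(phi[r] * omega[r][c] * phi[c] for r in range(n) for c in range(n))
    return 1.0 - overlap

# For d = 2 and p = 0.9 the overlap is p + (1 - p) / d^2 = 0.925.
error = distillation_error(isotropic_state(2, 0.9), 2)
```

In this example the output state would count as one 𝜀-approximate ebit for every 𝜀 of at least 0.075.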


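The figure of merit in (13.1.2) is sensible: the error probability $p_{\text{err}}(\mathcal{L}^{\leftrightarrow}; \rho_{AB})$ is
equal to the probability that the state $\omega_{\hat{A}\hat{B}} := \mathcal{L}^{\leftrightarrow}_{AB\to\hat{A}\hat{B}}(\rho_{AB})$ fails an “entanglement
test,” which is a measurement defined by the POVM
\begin{equation}
\{\Phi_{\hat{A}\hat{B}}, \mathbb{1}_{\hat{A}\hat{B}} - \Phi_{\hat{A}\hat{B}}\}. \tag{13.1.4}
\end{equation}
Passing the test corresponds to the measurement operator $\Phi_{\hat{A}\hat{B}}$ and failing corresponds to $\mathbb{1}_{\hat{A}\hat{B}} - \Phi_{\hat{A}\hat{B}}$. If $1 - \operatorname{Tr}[\omega_{\hat{A}\hat{B}}\Phi_{\hat{A}\hat{B}}] \leq \varepsilon \in [0, 1]$, and $d_{\hat{A}} = d_{\hat{B}} = d \geq 1$,
then we say that the final state $\omega_{\hat{A}\hat{B}}$ contains $\log_2 d$ $\varepsilon$-approximate ebits.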

Definition 13.1 (𝒅, 𝜺) Entanglement Distillation Protocol

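An entanglement distillation protocol $(d, \mathcal{L}^{\leftrightarrow}_{AB\to\hat{A}\hat{B}})$ for the state $\rho_{AB}$ is called a
$(d, \varepsilon)$ protocol, with $\varepsilon \in [0, 1]$, if $p_{\text{err}}(\mathcal{L}^{\leftrightarrow}; \rho_{AB}) \leq \varepsilon$.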

Given 𝜀 ∈ [0, 1], the largest number log2 𝑑 of 𝜀-approximate ebits that can be
extracted from a state 𝜌 𝐴𝐵 among all (𝑑, 𝜀) entanglement distillation protocols is
called the one-shot 𝜀-distillable entanglement of 𝜌 𝐴𝐵 .

Definition 13.2 One-Shot Distillable Entanglement

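Given a bipartite state $\rho_{AB}$ and $\varepsilon \in [0, 1]$, the one-shot $\varepsilon$-distillable entanglement of $\rho_{AB}$, denoted by $E_D^{\varepsilon}(\rho_{AB}) \equiv E_D^{\varepsilon}(A; B)_{\rho}$, is defined as
\begin{equation}
E_D^{\varepsilon}(A; B)_{\rho} := \sup_{(d, \mathcal{L}^{\leftrightarrow})} \{\log_2 d : p_{\text{err}}(\mathcal{L}^{\leftrightarrow}; \rho_{AB}) \leq \varepsilon\}, \tag{13.1.5}
\end{equation}
where the optimization is over all $d \geq 1$ and every LOCC channel $\mathcal{L}^{\leftrightarrow}_{AB\to\hat{A}\hat{B}}$
with $d_{\hat{A}} = d_{\hat{B}} = d$.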

In addition to finding the largest number $\log_2 d$ of $\varepsilon$-approximate ebits that can
be extracted among all $(d, \varepsilon)$ entanglement distillation protocols for a given $\varepsilon \in [0, 1]$,
we can consider the following complementary question: for a given $d \geq 1$, what is
the lowest value of $\varepsilon$ that can be attained among all $(d, \varepsilon)$ entanglement distillation
protocols? In other words, what is the value of
\begin{equation}
\varepsilon^{*}_{D}(d; \rho_{AB}) := \inf_{\mathcal{L}^{\leftrightarrow}_{AB\to\hat{A}\hat{B}} \in \text{LOCC}} \{p_{\text{err}}(\mathcal{L}^{\leftrightarrow}; \rho_{AB}) : d_{\hat{A}} = d_{\hat{B}} = d\}, \tag{13.1.6}
\end{equation}
where the optimization is over every LOCC channel $\mathcal{L}^{\leftrightarrow}_{AB\to\hat{A}\hat{B}}$, with $d_{\hat{A}} = d_{\hat{B}} = d$?
In this book, we focus primarily on the problem of optimizing the number of
extracted (approximate) ebits rather than the error, and so our primary quantity of
interest is the one-shot distillable entanglement $E_D^{\varepsilon}(\rho_{AB})$.
Calculating the one-shot distillable entanglement is generally a difficult task,
because it involves optimizing over every Schmidt rank $d \geq 1$ of the maximally
entangled state $\Phi_{\hat{A}\hat{B}}$ and over every LOCC channel $\mathcal{L}_{AB\to\hat{A}\hat{B}}$, with $d_{\hat{A}} = d_{\hat{B}} = d$.
We therefore try to estimate the one-shot distillable entanglement by devising upper
and lower bounds. We begin in the next section with upper bounds.

13.1.1 Upper Bounds on the Number of Ebits

In this section, we provide three different upper bounds on one-shot distillable

entanglement, based on coherent information, Rains relative entropy, and squashed
entanglement. Our study of upper bounds on one-shot distillable entanglement
begins with coherent information and the following lemma. This can be understood
as a fully quantum generalization of Lemma 11.4, in which we are performing the
entanglement test in (13.1.4) rather than the comparator test from (11.1.37).

Lemma 13.3
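Let $A$ and $B$ be quantum systems with the same dimension $d \geq 1$. Let $\Phi_{AB}$
be a maximally entangled state of Schmidt rank $d$, and let $\omega_{AB}$ be an arbitrary
bipartite state. If the probability $\operatorname{Tr}[\Phi_{AB}\omega_{AB}]$ that the state $\omega_{AB}$ passes the
entanglement test defined by the POVM $\{\Phi_{AB}, \mathbb{1}_{AB} - \Phi_{AB}\}$ satisfies
\begin{equation}
\operatorname{Tr}[\Phi_{AB}\omega_{AB}] \geq 1 - \varepsilon \tag{13.1.7}
\end{equation}
for some $\varepsilon \in [0, 1]$, then
\begin{equation}
\log_2 d \leq I_H^{\varepsilon}(A\rangle B)_{\omega}, \tag{13.1.8}
\end{equation}
where $I_H^{\varepsilon}(A\rangle B)_{\omega}$ is the $\varepsilon$-hypothesis testing coherent information (see
(7.11.97)).

If, in addition to (13.1.7), we have that $\operatorname{Tr}_B[\omega_{AB}] = \pi_A = \frac{\mathbb{1}_A}{d}$, then
\begin{equation}
2\log_2 d \leq I_H^{\varepsilon}(A; B)_{\omega}. \tag{13.1.9}
\end{equation}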


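Proof: By assumption, we have that
\begin{equation}
F(\Phi_{AB}, \omega_{AB}) = \operatorname{Tr}[\Phi_{AB}\omega_{AB}] \geq 1 - \varepsilon. \tag{13.1.10}
\end{equation}
Now, for every state $\sigma_B$, we have that
\begin{align}
\operatorname{Tr}[\Phi_{AB}(\mathbb{1}_A \otimes \sigma_B)] &= \frac{1}{d}\langle\Gamma|_{AB}(\mathbb{1}_A \otimes \sigma_B)|\Gamma\rangle_{AB} \tag{13.1.11} \\
&= \frac{1}{d}\operatorname{Tr}[\sigma_B] \tag{13.1.12} \\
&= \frac{1}{d}, \tag{13.1.13}
\end{align}
where we used (2.2.41). Next, recall from (7.11.97) that
\begin{equation}
I_H^{\varepsilon}(A\rangle B)_{\omega} = \inf_{\sigma_B} D_H^{\varepsilon}(\omega_{AB} \Vert \mathbb{1}_A \otimes \sigma_B), \tag{13.1.14}
\end{equation}
where
\begin{equation}
D_H^{\varepsilon}(\omega_{AB} \Vert \mathbb{1}_A \otimes \sigma_B) = -\log_2 \inf_{\Lambda_{AB}} \{\operatorname{Tr}[\Lambda_{AB}(\mathbb{1}_A \otimes \sigma_B)] : 0 \leq \Lambda_{AB} \leq \mathbb{1}_{AB},\ \operatorname{Tr}[\Lambda_{AB}\omega_{AB}] \geq 1 - \varepsilon\}. \tag{13.1.15}
\end{equation}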

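Based on (13.1.10), we see that $\Phi_{AB}$ is a measurement operator satisfying the
constraints for the optimization in the definition of $D_H^{\varepsilon}(\omega_{AB} \Vert \mathbb{1}_A \otimes \sigma_B)$. Therefore,
\begin{equation}
\operatorname{Tr}[\Phi_{AB}(\mathbb{1}_A \otimes \sigma_B)] = \frac{1}{d} \geq 2^{-D_H^{\varepsilon}(\omega_{AB} \Vert \mathbb{1}_A \otimes \sigma_B)}, \tag{13.1.16}
\end{equation}
which implies that
\begin{equation}
\log_2 d \leq D_H^{\varepsilon}(\omega_{AB} \Vert \mathbb{1}_A \otimes \sigma_B) \tag{13.1.17}
\end{equation}
for every state $\sigma_B$. Optimizing over $\sigma_B$ leads to
\begin{equation}
\log_2 d \leq \inf_{\sigma_B} D_H^{\varepsilon}(\omega_{AB} \Vert \mathbb{1}_A \otimes \sigma_B) = I_H^{\varepsilon}(A\rangle B)_{\omega}, \tag{13.1.18}
\end{equation}
which is precisely (13.1.8).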

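Similarly,
\begin{align}
\operatorname{Tr}[\Phi_{AB}(\pi_A \otimes \sigma_B)] &= \frac{1}{d}\operatorname{Tr}[\Phi_{AB}(\mathbb{1}_A \otimes \sigma_B)] \tag{13.1.19} \\
&= \frac{1}{d^2}\langle\Gamma|_{AB}(\mathbb{1}_A \otimes \sigma_B)|\Gamma\rangle_{AB} \tag{13.1.20} \\
&= \frac{1}{d^2}, \tag{13.1.21}
\end{align}
where the last line follows from the same reasoning for (13.1.11)–(13.1.13). Next,
recall that
\begin{equation}
I_H^{\varepsilon}(A; B)_{\omega} = \inf_{\sigma_B} D_H^{\varepsilon}(\omega_{AB} \Vert \omega_A \otimes \sigma_B). \tag{13.1.22}
\end{equation}
Therefore, by definition of the hypothesis testing relative entropy,
\begin{equation}
\operatorname{Tr}[\Phi_{AB}(\pi_A \otimes \sigma_B)] = \frac{1}{d^2} \geq 2^{-D_H^{\varepsilon}(\omega_{AB} \Vert \pi_A \otimes \sigma_B)}, \tag{13.1.23}
\end{equation}
which implies that
\begin{equation}
2\log_2 d \leq D_H^{\varepsilon}(\omega_{AB} \Vert \pi_A \otimes \sigma_B). \tag{13.1.24}
\end{equation}
Since the state $\sigma_B$ is arbitrary, we obtain
\begin{equation}
2\log_2 d \leq \inf_{\sigma_B} D_H^{\varepsilon}(\omega_{AB} \Vert \pi_A \otimes \sigma_B) = I_H^{\varepsilon}(A; B)_{\omega}, \tag{13.1.25}
\end{equation}
which is precisely (13.1.9). To obtain the last equality, we made use of the
assumption $\operatorname{Tr}_B[\omega_{AB}] = \pi_A$. ■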
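The two overlap identities used in this proof, (13.1.11)–(13.1.13) and (13.1.19)–(13.1.21), are easy to confirm in exact arithmetic. The sketch below does so for one hypothetical qutrit state; the matrix `sigma` and the helper names are illustrative assumptions, not objects from the book.

```python
from fractions import Fraction

def gamma_expectation(op, d):
    """<Gamma| op |Gamma> for the unnormalized vector |Gamma> = sum_i |i>|i>."""
    idx = [i * d + i for i in range(d)]
    return sum(op[r][c] for r in idx for c in idx)

def identity_tensor(sigma):
    """Matrix of 1_A (x) sigma_B, with A and B of the same dimension."""
    d = len(sigma)
    return [[sigma[r % d][c % d] if r // d == c // d else Fraction(0)
             for c in range(d * d)] for r in range(d * d)]

# A hypothetical qutrit state sigma_B (symmetric, unit trace, positive semi-definite).
sigma = [[Fraction(1, 2), Fraction(1, 4), Fraction(0)],
         [Fraction(1, 4), Fraction(1, 3), Fraction(0)],
         [Fraction(0),    Fraction(0),    Fraction(1, 6)]]
d = len(sigma)

# Tr[Phi (1 (x) sigma)] = (1/d) <Gamma| 1 (x) sigma |Gamma>, as in (13.1.11).
overlap_identity = gamma_expectation(identity_tensor(sigma), d) / d
# Replacing 1_A by pi_A = 1_A / d contributes one more factor of 1/d, as in (13.1.19).
overlap_pi = overlap_identity / d
```

The results do not depend on the off-diagonal entries of `sigma`, which reflects the fact that only the trace of the state enters (13.1.12).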

Note that the result of Lemma 13.3 is general and applies to every bipartite
state that is close in fidelity to a maximally entangled state. Applying it to the state
$\omega_{\hat{A}\hat{B}} = \mathcal{L}_{AB\to\hat{A}\hat{B}}(\rho_{AB})$ at the output of a $(d, \varepsilon)$ entanglement distillation protocol
for a state $\rho_{AB}$, we obtain the following result:

Theorem 13.4 Upper Bound on One-Shot Distillable Entanglement

Let $\rho_{AB}$ be a bipartite state. For every $(d, \varepsilon)$ entanglement distillation protocol
$(d, \mathcal{L}_{AB\to\hat{A}\hat{B}})$ for $\rho_{AB}$, with $\varepsilon \in (0, 1]$ and $d_{\hat{A}} = d_{\hat{B}} = d$, the number of
$\varepsilon$-approximate ebits extracted at the end of the protocol is bounded from above
by the LOCC-optimized $\varepsilon$-hypothesis testing coherent information of $\rho_{AB}$, i.e.,
\begin{equation}
\log_2 d \leq \sup_{\mathcal{L}} I_H^{\varepsilon}(A'\rangle B')_{\mathcal{L}(\rho)}, \tag{13.1.26}
\end{equation}
where the optimization is over every LOCC channel $\mathcal{L}_{AB\to A'B'}$. Consequently,
for the one-shot $\varepsilon$-distillable entanglement, we obtain
\begin{equation}
E_D^{\varepsilon}(A; B)_{\rho} \leq \sup_{\mathcal{L}} I_H^{\varepsilon}(A'\rangle B')_{\mathcal{L}(\rho)}. \tag{13.1.27}
\end{equation}

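Proof: For a $(d, \varepsilon)$ entanglement distillation protocol $(d, \mathcal{L}_{AB\to\hat{A}\hat{B}})$ for $\rho_{AB}$, by
definition the state $\omega_{\hat{A}\hat{B}} = \mathcal{L}_{AB\to\hat{A}\hat{B}}(\rho_{AB})$ satisfies $\operatorname{Tr}[\Phi_{\hat{A}\hat{B}}\omega_{\hat{A}\hat{B}}] \geq 1 - \varepsilon$. Therefore,
using (13.1.8), we conclude that $\log_2 d \leq I_H^{\varepsilon}(\hat{A}\rangle\hat{B})_{\mathcal{L}(\rho)}$. Since $\mathcal{L}_{AB\to\hat{A}\hat{B}}$ is a
particular LOCC channel, we conclude that
\begin{equation}
I_H^{\varepsilon}(\hat{A}\rangle\hat{B})_{\mathcal{L}(\rho)} \leq \sup_{\mathcal{L}} I_H^{\varepsilon}(A'\rangle B')_{\mathcal{L}(\rho)}. \tag{13.1.28}
\end{equation}
We thus conclude (13.1.26). Now using the definition of $E_D^{\varepsilon}(A; B)_{\rho}$ in (13.1.5),
we obtain the inequality in (13.1.27). ■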

We now consider an upper bound based on the Rains relative entropy. In order
to place an upper bound on the one-shot distillable entanglement 𝐸 𝐷𝜀 (𝜌 𝐴𝐵 ) for a
given state 𝜌 𝐴𝐵 and 𝜀 ∈ [0, 1], we consider states that are useless for entanglement
distillation. This is entirely analogous conceptually to what is done for classical
and entanglement-assisted classical communication in the previous two chapters,
in which we used the set of replacement channels (which are useless for both of
these communication tasks) to place an upper bound on the number of transmitted
bits in an (|M|, 𝜀) protocol, such that the upper bound depends only on the channel
N being used for communication.
What states are useless for entanglement distillation? Note that an intuitive
necessary condition for successful entanglement distillation is that the initial state
$\rho_{AB}$ should be entangled: if $\rho_{AB}$ is separable, then the output state $\mathcal{L}_{AB\to\hat{A}\hat{B}}(\rho_{AB})$
of an arbitrary entanglement distillation protocol is still a separable state. This
suggests that separable states are useless for entanglement distillation. To be
more precise, separable states are useless for entanglement distillation because they
have a very small probability of passing the entanglement test. As we show in
Lemma 13.5 below, the following bound holds for every separable state $\sigma_{AB}$:
\begin{equation}
\operatorname{Tr}[\Phi_{AB}\sigma_{AB}] \leq \frac{1}{d}, \tag{13.1.29}
\end{equation}
where $d$ is the Schmidt rank of $\Phi_{AB}$. More generally, operators in the set $\text{PPT}'$,
defined as
\begin{equation}
\text{PPT}'(A : B) = \{\sigma_{AB} : \sigma_{AB} \geq 0,\ \Vert \mathrm{T}_B(\sigma_{AB}) \Vert_1 \leq 1\}, \tag{13.1.30}
\end{equation}
are also useless for entanglement distillation, in the sense that a statement analogous
to (13.1.29) can be made for them. We now prove this statement.

Lemma 13.5
Let $A$ and $B$ be quantum systems with the same dimension $d \geq 1$. Let $\Phi_{AB}$
be a maximally entangled state of Schmidt rank $d$. If $\sigma_{AB} \in \text{PPT}'(A : B)$, then
$\operatorname{Tr}[\Phi_{AB}\sigma_{AB}] \leq \frac{1}{d}$.

Remark: Note that Lemma 13.5 implies that $\operatorname{Tr}[\Phi_{AB}\sigma_{AB}] \leq \frac{1}{d}$ for every separable state $\sigma_{AB}$
because $\text{SEP}(A : B) \subseteq \text{PPT}'(A : B)$ (recall Figure 9.2).

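Proof: Using the fact that the partial transpose is self-inverse and self-adjoint, as
discussed in (3.2.114) and (3.2.115), respectively, we find that
\begin{align}
\operatorname{Tr}[\Phi_{AB}\sigma_{AB}] &= \operatorname{Tr}[\mathrm{T}_B(\Phi_{AB})\mathrm{T}_B(\sigma_{AB})] \tag{13.1.31} \\
&= \frac{1}{d}\operatorname{Tr}[U_B F_{AB} U_B^{\dagger}\, \mathrm{T}_B(\sigma_{AB})], \tag{13.1.32}
\end{align}
where $F_{AB}$ is the unitary swap operator and $U_B$ is a local unitary acting on
system $B$. Here we applied the identity $\mathrm{T}_B(\Phi_{AB}) = \frac{1}{d} U_B F_{AB} U_B^{\dagger}$ from (3.2.126).
Since $U_B F_{AB} U_B^{\dagger}$ is a unitary operator, by the variational characterization of the
trace norm (see (2.2.130)), we obtain
\begin{align}
\operatorname{Tr}[\Phi_{AB}\sigma_{AB}] &\leq \frac{1}{d}\Vert \mathrm{T}_B(\sigma_{AB}) \Vert_1 \tag{13.1.33} \\
&\leq \frac{1}{d}, \tag{13.1.34}
\end{align}
where the last line follows from the definition of the set $\text{PPT}'(A : B)$ in (13.1.30). ■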
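The key identity in this proof can be checked exactly for the standard maximally entangled state of (13.1.3), for which the local unitary can be taken to be the identity (an assumption of this sketch). The code below verifies that the partial transpose of Φ equals the swap operator divided by d, and evaluates the bound on hypothetical product states with rational amplitudes.

```python
from fractions import Fraction

d = 3
n = d * d

# Phi_AB for |Phi> = (1/sqrt(d)) sum_i |i>|i>; row (i, k) -> i*d + k, column (j, l) -> j*d + l.
phi = [[Fraction(1, d) if (r // d == r % d and c // d == c % d) else Fraction(0)
        for c in range(n)] for r in range(n)]

def partial_transpose_B(op, d):
    """T_B: exchange the B indices of row and column, entry (i,k),(j,l) <- (i,l),(j,k)."""
    out = [[None] * (d * d) for _ in range(d * d)]
    for r in range(d * d):
        for c in range(d * d):
            i, k, j, l = r // d, r % d, c // d, c % d
            out[r][c] = op[i * d + l][j * d + k]
    return out

# Swap operator F|j>|l> = |l>|j>, so <i,k|F|j,l> = delta_{il} delta_{kj}.
swap = [[Fraction(1) if (r // d == c % d and r % d == c // d) else Fraction(0)
         for c in range(n)] for r in range(n)]

pt_phi = partial_transpose_B(phi, d)
identity_holds = all(pt_phi[r][c] * d == swap[r][c] for r in range(n) for c in range(n))

def overlap_with_phi(a, b):
    """Tr[Phi (|a><a| (x) |b><b|)] = |<Phi|a (x) b>|^2 for real unit vectors a, b."""
    return Fraction(1, len(a)) * sum(x * y for x, y in zip(a, b)) ** 2

# Hypothetical separable (product) states; the second one saturates the bound 1/d.
generic = overlap_with_phi([Fraction(3, 5), Fraction(4, 5), 0], [0, Fraction(3, 5), Fraction(4, 5)])
aligned = overlap_with_phi([1, 0, 0], [1, 0, 0])
```

The product-state overlap is bounded by 1/d through the Cauchy-Schwarz inequality, which is the special case of the lemma for pure product states.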

Due to the fact that SEP ⊆ PPT ⊆ PPT′ (see Figure 9.2), Lemma 13.5 tells
us that both separable and PPT states are useless for entanglement distillation.
However, because the set of separable states is strictly contained in the set of PPT
states for all bipartite systems other than qubit-qubit and qubit-qutrit systems, it follows
that there are PPT entangled states that are useless for entanglement distillation.
We elaborate upon this point further in Section 13.2.0.1 below, and we show that
the distillable entanglement (in the asymptotic setting) vanishes for all PPT states.


The steps followed in the proof of Lemma 13.5 above are completely analogous
to the steps in (13.1.11)–(13.1.13) and in (13.1.19)–(13.1.21) of the proof of
Lemma 13.3. Therefore, just as Lemma 13.3 was used to establish Theorem 13.4,
we can use Lemma 13.5 to place an upper bound on the number log2 𝑑 of approximate
ebits in a bipartite state 𝜔 𝐴𝐵 .

Proposition 13.6
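Fix $\varepsilon \in [0, 1]$, and let $A$ and $B$ be quantum systems with the same dimension
$d \geq 1$. Fix a maximally entangled state $\Phi_{AB}$ of Schmidt rank $d$. Let $\omega_{AB}$ be an
$\varepsilon$-approximate maximally entangled state, in the sense that
\begin{equation}
F(\Phi_{AB}, \omega_{AB}) = \operatorname{Tr}[\Phi_{AB}\omega_{AB}] \geq 1 - \varepsilon. \tag{13.1.35}
\end{equation}
Then, the number $\log_2 d$ of $\varepsilon$-approximate ebits in $\omega_{AB}$ is bounded from above
as follows:
\begin{equation}
\log_2 d \leq R_H^{\varepsilon}(A; B)_{\omega}, \tag{13.1.36}
\end{equation}
where $R_H^{\varepsilon}(A; B)_{\omega}$ is the $\varepsilon$-hypothesis testing Rains relative entropy of $\omega_{AB}$ (see
(9.3.5)).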

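Proof: Let $\sigma_{AB}$ be an arbitrary operator in $\text{PPT}'(A : B)$. The inequality $\operatorname{Tr}[\Phi_{AB}\omega_{AB}] \geq 1 - \varepsilon$ guarantees that $\omega_{AB}$ passes the entanglement test with probability at least
$1 - \varepsilon$. Thus, we conclude that $\Phi_{AB}$ is a particular measurement operator satisfying
the constraints for $2^{-D_H^{\varepsilon}(\omega_{AB} \Vert \sigma_{AB})}$. Applying Lemma 13.5 and the definition of
$D_H^{\varepsilon}(\omega_{AB} \Vert \sigma_{AB})$, we conclude that
\begin{equation}
2^{-D_H^{\varepsilon}(\omega_{AB} \Vert \sigma_{AB})} \leq \operatorname{Tr}[\Phi_{AB}\sigma_{AB}] \leq \frac{1}{d}. \tag{13.1.37}
\end{equation}
Rearranging this leads to
\begin{equation}
\log_2 d \leq D_H^{\varepsilon}(\omega_{AB} \Vert \sigma_{AB}). \tag{13.1.38}
\end{equation}
Since this inequality holds for every operator $\sigma_{AB} \in \text{PPT}'(A : B)$, we conclude that
\begin{equation}
\log_2 d \leq \inf_{\sigma_{AB} \in \text{PPT}'(A:B)} D_H^{\varepsilon}(\omega_{AB} \Vert \sigma_{AB}) = R_H^{\varepsilon}(A; B)_{\omega}, \tag{13.1.39}
\end{equation}
where we used the definition of $R_H^{\varepsilon}(A; B)_{\omega}$ in (9.3.5) to obtain the last equality. ■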


A consequence of Proposition 13.6 is the following upper bound on the one-shot

distillable entanglement of 𝜌 𝐴𝐵 .

Theorem 13.7 Rains Upper Bound on One-Shot Distillable Entanglement

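Let $\rho_{AB}$ be a bipartite state. For every $(d, \varepsilon)$ entanglement distillation protocol
for $\rho_{AB}$, with $\varepsilon \in [0, 1]$, we have that
\begin{equation}
\log_2 d \leq R_H^{\varepsilon}(A; B)_{\rho}. \tag{13.1.40}
\end{equation}
Consequently, for the one-shot distillable entanglement, we have
\begin{equation}
E_D^{\varepsilon}(A; B)_{\rho} \leq R_H^{\varepsilon}(A; B)_{\rho} \tag{13.1.41}
\end{equation}
for every state $\rho_{AB}$ and all $\varepsilon \in [0, 1]$.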

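Proof: Consider a $(d, \varepsilon)$ entanglement distillation protocol for $\rho_{AB}$ with the
corresponding LOCC channel $\mathcal{L}_{AB\to\hat{A}\hat{B}}$. Then, by definition, we have that
\begin{equation}
p_{\text{err}}(\mathcal{L}; \rho_{AB}) = 1 - \operatorname{Tr}[\Phi_{\hat{A}\hat{B}}\mathcal{L}_{AB\to\hat{A}\hat{B}}(\rho_{AB})] \leq \varepsilon. \tag{13.1.42}
\end{equation}
Letting $\omega_{\hat{A}\hat{B}} = \mathcal{L}_{AB\to\hat{A}\hat{B}}(\rho_{AB})$, we have that $\operatorname{Tr}[\Phi_{\hat{A}\hat{B}}\omega_{\hat{A}\hat{B}}] \geq 1 - \varepsilon$. The output
state $\omega_{\hat{A}\hat{B}}$ of the entanglement distillation protocol therefore satisfies the conditions
of Proposition 13.6, which means that
\begin{equation}
\log_2 d \leq R_H^{\varepsilon}(\hat{A}; \hat{B})_{\omega}. \tag{13.1.43}
\end{equation}
Now, it follows from Proposition 9.25 that $R_H^{\varepsilon}$ is an entanglement measure.
Thus, it satisfies the data-processing inequality under LOCC channels, which means
that $R_H^{\varepsilon}(\hat{A}; \hat{B})_{\omega} \leq R_H^{\varepsilon}(A; B)_{\rho}$. We thus have $\log_2 d \leq R_H^{\varepsilon}(A; B)_{\rho}$. Since this
inequality holds for all $d \geq 1$ and every LOCC channel $\mathcal{L}_{AB\to\hat{A}\hat{B}}$, by definition
of one-shot $\varepsilon$-distillable entanglement, we obtain $E_D^{\varepsilon}(A; B)_{\rho} \leq R_H^{\varepsilon}(A; B)_{\rho}$, as
required. ■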

The main step that allows us to conclude the bound in (13.1.40) in terms of
the state $\rho_{AB}$ alone is the fact that $R_H^{\varepsilon}$ is an entanglement measure, meaning that
it is monotone non-increasing under LOCC channels. In other words, the set of
$\text{PPT}'$ operators is preserved under LOCC channels. This fact is not true for the set
$\{\mathbb{1}_A \otimes \sigma_B : \sigma_B \in \mathrm{D}(\mathcal{H})\}$ appearing in the optimization that defines the $\varepsilon$-hypothesis
testing coherent information, meaning that the operator $\mathcal{L}_{AB\to\hat{A}\hat{B}}(\mathbb{1}_A \otimes \sigma_B)$ (where
$\mathcal{L}_{AB\to\hat{A}\hat{B}}$ is an LOCC channel) is not in general of the form $\mathbb{1}_{\hat{A}} \otimes \tau_{\hat{B}}$ for some state
$\tau_{\hat{B}}$. We therefore cannot use the data-processing inequality for the bound in Theorem 13.4
in order to reduce it to $\log_2 d \leq I_H^{\varepsilon}(A\rangle B)_{\rho}$.
Combining Theorems 13.4 and 13.7 with Propositions 7.70 and 7.71 immediately
leads to the following upper bounds.

Corollary 13.8
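Let $\rho_{AB}$ be a bipartite state, and let $\varepsilon \in [0, 1/2)$. For every $(d, \varepsilon)$ entanglement
distillation protocol $(d, \mathcal{L}_{AB\to\hat{A}\hat{B}})$, with $d_{\hat{A}} = d_{\hat{B}} = d$, we have that
\begin{equation}
\log_2 d \leq \frac{1}{1 - 2\varepsilon}\left(\sup_{\mathcal{L}} I(A'\rangle B')_{\mathcal{L}(\rho)} + h_2(\varepsilon)\right), \tag{13.1.44}
\end{equation}
where $I(A'\rangle B')_{\mathcal{L}(\rho)}$ is the coherent information of $\mathcal{L}_{AB\to A'B'}(\rho_{AB})$ (see
(7.2.92)). For $\varepsilon \in [0, 1)$,
\begin{equation}
\log_2 d \leq \widetilde{R}_{\alpha}(A; B)_{\rho} + \frac{\alpha}{\alpha - 1}\log_2\!\left(\frac{1}{1 - \varepsilon}\right) \quad \forall\, \alpha > 1, \tag{13.1.45}
\end{equation}
where
\begin{equation}
\widetilde{R}_{\alpha}(A; B)_{\rho} = \inf_{\sigma_{AB} \in \text{PPT}'(A:B)} \widetilde{D}_{\alpha}(\rho_{AB} \Vert \sigma_{AB}) \tag{13.1.46}
\end{equation}
is the sandwiched Rényi Rains relative entropy of $\rho_{AB}$ (see (9.3.6)).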

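Proof: Combining the upper bound in (13.1.26) from Theorem 13.4 with the
upper bound in (7.9.52) from Proposition 7.70, we obtain
\begin{align}
\log_2 d &\leq \sup_{\mathcal{L}} I_H^{\varepsilon}(A'\rangle B')_{\mathcal{L}(\rho)} \tag{13.1.47} \\
&\leq \frac{1}{1 - \varepsilon}\left(\sup_{\mathcal{L}} I(A'\rangle B')_{\mathcal{L}(\rho)} + h_2(\varepsilon)\right) + \frac{\varepsilon}{1 - \varepsilon}\log_2 d. \tag{13.1.48}
\end{align}
Rearranging this and simplifying leads to
\begin{equation}
\log_2 d \leq \frac{1}{1 - 2\varepsilon}\left(\sup_{\mathcal{L}} I(A'\rangle B')_{\mathcal{L}(\rho)} + h_2(\varepsilon)\right), \tag{13.1.49}
\end{equation}
which is the inequality in (13.1.44). The inequality in (13.1.45) follows from
Theorem 13.7 and (7.9.59) in Proposition 7.71. ■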
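The rearrangement step from (13.1.48) to (13.1.49) can be spelled out as follows. In this sketch, $S$ is shorthand (introduced here, not in the original) for the LOCC-optimized coherent information, and the final division is valid because $1 - 2\varepsilon > 0$ for $\varepsilon \in [0, 1/2)$.

```latex
\begin{align*}
S &:= \sup_{\mathcal{L}} I(A'\rangle B')_{\mathcal{L}(\rho)}, \\
\log_2 d - \frac{\varepsilon}{1-\varepsilon}\log_2 d
  &= \frac{1-2\varepsilon}{1-\varepsilon}\log_2 d
  \;\leq\; \frac{1}{1-\varepsilon}\bigl(S + h_2(\varepsilon)\bigr), \\
\log_2 d &\leq \frac{1-\varepsilon}{1-2\varepsilon}\cdot\frac{1}{1-\varepsilon}\bigl(S + h_2(\varepsilon)\bigr)
  = \frac{1}{1-2\varepsilon}\bigl(S + h_2(\varepsilon)\bigr).
\end{align*}
```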

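Since the upper bounds in (13.1.44) and (13.1.45) hold for all $(d, \varepsilon)$ entanglement distillation protocols, we conclude the following upper bounds on distillable
entanglement:
\begin{align}
E_D^{\varepsilon}(A; B)_{\rho} &\leq \frac{1}{1 - 2\varepsilon}\left(\sup_{\mathcal{L}} I(A'\rangle B')_{\mathcal{L}(\rho)} + h_2(\varepsilon)\right), \tag{13.1.50} \\
E_D^{\varepsilon}(A; B)_{\rho} &\leq \widetilde{R}_{\alpha}(A; B)_{\rho} + \frac{\alpha}{\alpha - 1}\log_2\!\left(\frac{1}{1 - \varepsilon}\right) \quad \forall\, \alpha > 1. \tag{13.1.51}
\end{align}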

We finally turn to squashed entanglement and establish it as an upper bound on

one-shot distillable entanglement:

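Theorem 13.9 Squashed Entanglement Upper Bound on One-Shot Distillable Entanglement

Let $\rho_{AB}$ be a bipartite state. For every $(d, \varepsilon)$ entanglement distillation protocol
for $\rho_{AB}$, with $\varepsilon \in [0, 1)$, we have that
\begin{equation}
\log_2 d \leq \frac{1}{1 - \sqrt{\varepsilon}}\left(E_{\text{sq}}(A; B)_{\rho} + g_2(\sqrt{\varepsilon})\right), \tag{13.1.52}
\end{equation}
where $E_{\text{sq}}(A; B)_{\rho}$ is the squashed entanglement of $\rho_{AB}$ (see (9.1.162)) and
$g_2(\delta) := (\delta + 1)\log_2(\delta + 1) - \delta\log_2\delta$. Consequently, for the one-shot distillable
entanglement, we have
\begin{equation}
E_D^{\varepsilon}(A; B)_{\rho} \leq \frac{1}{1 - \sqrt{\varepsilon}}\left(E_{\text{sq}}(A; B)_{\rho} + g_2(\sqrt{\varepsilon})\right) \tag{13.1.53}
\end{equation}
for every state $\rho_{AB}$ and $\varepsilon \in [0, 1)$.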

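Proof: Consider a $(d, \varepsilon)$ entanglement distillation protocol for $\rho_{AB}$ with the
corresponding LOCC channel $\mathcal{L}_{AB\to\hat{A}\hat{B}}$. From the LOCC monotonicity of squashed
entanglement (Theorem 9.33), we have that
\begin{equation}
E_{\text{sq}}(\hat{A}; \hat{B})_{\omega} \leq E_{\text{sq}}(A; B)_{\rho}, \tag{13.1.54}
\end{equation}
where $\omega_{\hat{A}\hat{B}} = \mathcal{L}_{AB\to\hat{A}\hat{B}}(\rho_{AB})$. Continuing, by definition, the following inequality
holds
\begin{equation}
p_{\text{err}}(\mathcal{L}; \rho_{AB}) = 1 - \operatorname{Tr}[\Phi_{\hat{A}\hat{B}}\mathcal{L}_{AB\to\hat{A}\hat{B}}(\rho_{AB})] \leq \varepsilon. \tag{13.1.55}
\end{equation}
It follows that $\operatorname{Tr}[\Phi_{\hat{A}\hat{B}}\omega_{\hat{A}\hat{B}}] \geq 1 - \varepsilon$, which is the same as
\begin{equation}
F(\Phi_{\hat{A}\hat{B}}, \omega_{\hat{A}\hat{B}}) \geq 1 - \varepsilon. \tag{13.1.56}
\end{equation}
As a consequence of Proposition 9.38, we find that
\begin{align}
E_{\text{sq}}(\hat{A}; \hat{B})_{\omega} &\geq E_{\text{sq}}(\hat{A}; \hat{B})_{\Phi} - \left(\sqrt{\varepsilon}\log_2\min\{|\hat{A}|, |\hat{B}|\} + g_2(\sqrt{\varepsilon})\right) \tag{13.1.57} \\
&= \log_2 d - \left(\sqrt{\varepsilon}\log_2 d + g_2(\sqrt{\varepsilon})\right) \tag{13.1.58} \\
&= (1 - \sqrt{\varepsilon})\log_2 d - g_2(\sqrt{\varepsilon}). \tag{13.1.59}
\end{align}
The first equality follows from Proposition 9.36. We can finally rearrange the
established inequality $E_{\text{sq}}(A; B)_{\rho} \geq (1 - \sqrt{\varepsilon})\log_2 d - g_2(\sqrt{\varepsilon})$ to be in the form
stated in the theorem. ■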
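To get a feel for how the error parameter loosens the bound in (13.1.52), the right-hand side can be evaluated numerically. This is only an illustrative sketch: the squashed entanglement value passed in is a hypothetical input (computing it for a given state is a separate optimization problem), and the function names are assumptions.

```python
from math import log2, sqrt

def g2(delta):
    """g_2(delta) = (delta + 1) log2(delta + 1) - delta log2(delta), with 0 log2(0) taken as 0."""
    if delta == 0:
        return 0.0
    return (delta + 1) * log2(delta + 1) - delta * log2(delta)

def squashed_upper_bound(e_sq, eps):
    """Right-hand side of (13.1.52) for a squashed entanglement value e_sq and error eps in [0, 1)."""
    root = sqrt(eps)
    return (e_sq + g2(root)) / (1 - root)

# With zero error the bound reduces to the squashed entanglement itself.
exact_case = squashed_upper_bound(1.0, 0.0)
# A small error already loosens the bound noticeably because it enters through sqrt(eps).
noisy_case = squashed_upper_bound(1.0, 0.01)
```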

13.1.2 Lower Bound on the Number of Ebits via Decoupling

Having found upper bounds on one-shot distillable entanglement, we now focus

on lower bounds. In order to find a lower bound on distillable entanglement,
we have to find an explicit entanglement distillation protocol that works for an
arbitrary bipartite state 𝜌 𝐴𝐵 and an arbitrary error 𝜀 ∈ (0, 1). Recall that the goal
of entanglement distillation is for two parties, Alice and Bob, to make use of LOCC
to transform their shared bipartite state 𝜌 𝐴𝐵 to the maximally entangled state Φ 𝐴ˆ 𝐵ˆ ,
for some 𝑑 𝐴ˆ = 𝑑 𝐵ˆ = 𝑑 ≥ 1. Now, the initial state 𝜌 𝐴𝐵 has some purification
|𝜓 𝜌 ⟩ 𝐴𝐵𝐸 , with the purifying system 𝐸 in general correlated with 𝐴 and 𝐵. However,
because the maximally entangled state is pure, every purification of it must be
of the form $\Phi_{\hat{A}\hat{B}} \otimes \phi_{E'}$, with the system $E'$ in tensor product with systems $\hat{A}$ and
$\hat{B}$. Since the goal of entanglement distillation is to distill a maximally entangled
state and the maximally entangled state has this property, we can thus think of
entanglement distillation as the task of decoupling 𝐴 and 𝐵 from their environment
𝐸; see Figure 13.2. Our lower bound on one-shot distillable entanglement tells us
what dimension 𝑑 of 𝐴ˆ and 𝐵ˆ is sufficient in order to achieve this decoupling up to
error 𝜀.
The lower bound on one-shot distillable entanglement that we determine in
this section is expressed in terms of an information measure that is derived from
a smoothed version of the max-relative entropy $D_{\max}$, which we briefly cover in
Section 7.8.1.

[Figure 13.2 diagram: the purification $\psi_{ABE}$, with systems $A$ and $B$ passing through the LOCC channel $\mathcal{L}$ to produce $\Phi^+$ on $\hat{A}\hat{B}$, while system $E$ is left in the state $\rho_E$.]

Figure 13.2: The task of entanglement distillation can be understood from the
perspective of decoupling: given a bipartite state $\rho_{AB}$ with purification $\psi_{ABE}$,
the entanglement distillation protocol given by the LOCC channel $\mathcal{L}$ should
result in the pure maximally entangled state $\Phi_{\hat{A}\hat{B}}$, which by definition is in
tensor product with the environment, so that the joint state is $\Phi_{\hat{A}\hat{B}} \otimes \rho_E$, with
$\rho_E = \operatorname{Tr}_{AB}[\psi_{ABE}]$.

Recall from Definition 7.58 that the max-relative entropy of a state
$\rho$ and a positive semi-definite operator $\sigma$ is defined as
$$D_{\max}(\rho\Vert\sigma) = \log_2\left\Vert \sigma^{-\frac{1}{2}}\rho\,\sigma^{-\frac{1}{2}}\right\Vert_\infty. \tag{13.1.60}$$
Using this, we define the conditional min-entropy as
$$H_{\min}(A|B)_\rho := -\inf_{\sigma_B} D_{\max}(\rho_{AB}\Vert \mathbb{1}_A\otimes\sigma_B), \tag{13.1.61}$$
where the optimization is with respect to every state $\sigma_B$.
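To make (13.1.60) and (13.1.61) concrete, the sketch below treats only the classical special case, in which all operators are diagonal in a common basis. This restriction is an assumption made for illustration: there $D_{\max}$ reduces to the logarithm of the largest probability ratio, and the conditional min-entropy of a joint distribution $p(a,b)$ equals $-\log_2\sum_b \max_a p(a,b)$, which is the standard guessing-probability formula and is not taken from the text. The helper names are hypothetical.

```python
import math

def d_max_classical(p, q):
    """D_max(p||q) = log2 max_i p_i/q_i for non-negative vectors (commuting case)."""
    ratios = []
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # support of p is not contained in support of q
            ratios.append(pi / qi)
    return math.log2(max(ratios))

def h_min_classical(joint):
    """H_min(A|B) for joint[a][b] = p(a, b): -log2 sum_b max_a p(a, b)."""
    num_b = len(joint[0])
    guess = sum(max(row[b] for row in joint) for b in range(num_b))
    return -math.log2(guess)

uniform = [[0.25, 0.25], [0.25, 0.25]]   # two independent uniform bits
correlated = [[0.5, 0.0], [0.0, 0.5]]    # two perfectly correlated bits
```

For the uniform example, choosing $\sigma_B$ uniform in (13.1.61) gives $D_{\max} = \log_2(0.25/0.5) = -1$, which matches the value $H_{\min}(A|B) = 1$ returned by `h_min_classical`.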

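From Definition 7.62, the smooth max-relative entropy is defined as
$$D^{\varepsilon}_{\max}(\rho\Vert\sigma) = \inf_{\widetilde{\rho}\in\mathcal{B}^{\varepsilon}(\rho)} D_{\max}(\widetilde{\rho}\Vert\sigma), \tag{13.1.62}$$
where we recall from (7.8.41) that
$$\mathcal{B}^{\varepsilon}(\rho) = \{\widetilde{\rho} : P(\rho,\widetilde{\rho}) \leq \varepsilon\}, \tag{13.1.63}$$
and the sine distance $P(\rho,\widetilde{\rho})$ is given by (see Definition 6.16)
$$P(\rho,\widetilde{\rho}) = \sqrt{1 - F(\rho,\widetilde{\rho})}. \tag{13.1.64}$$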
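The smoothing ball in (13.1.63) can be explored numerically in the commuting case. The sketch below assumes that both states are diagonal in the same basis, so that the fidelity reduces to the squared Bhattacharyya coefficient of the two probability vectors; the function names are illustrative.

```python
import math

def fidelity_classical(p, q):
    """F(p, q) = (sum_i sqrt(p_i q_i))^2 for probability vectors."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q)) ** 2

def sine_distance(p, q):
    """P(p, q) = sqrt(1 - F(p, q)), clipped to avoid tiny negative rounding errors."""
    return math.sqrt(max(0.0, 1.0 - fidelity_classical(p, q)))

def in_ball(p, q, eps):
    """True if q lies in the eps-ball around p, as in (13.1.63)."""
    return sine_distance(p, q) <= eps
```

For example, `in_ball([0.5, 0.5], [0.6, 0.4], 0.2)` holds because the sine distance between these two distributions is roughly 0.1.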
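Using the smooth max-relative entropy, we define the smooth conditional min-entropy of $\rho_{AB}$ as
\begin{align}
H^{\varepsilon}_{\min}(A|B)_\rho &= -\inf_{\sigma_B} D^{\varepsilon}_{\max}(\rho_{AB}\Vert \mathbb{1}_A\otimes\sigma_B) \tag{13.1.65}\\
&= \sup_{\widetilde{\rho}\in\mathcal{B}^{\varepsilon}(\rho)} H_{\min}(A|B)_{\widetilde{\rho}} \tag{13.1.66}
\end{align}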

for all 𝜀 ∈ (0, 1), where the optimization in the first line is with respect to states 𝜎𝐵 .
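We also need the smooth conditional max-entropy of $\rho_{AB}$, which is defined as
$$H^{\varepsilon}_{\max}(A|B)_\rho := \inf_{\widetilde{\rho}\in\mathcal{B}^{\varepsilon}(\rho)} H_{\max}(A|B)_{\widetilde{\rho}} \tag{13.1.67}$$
for all $\varepsilon \in (0,1)$, where
\begin{align}
H_{\max}(A|B)_\rho &:= \widetilde{H}_{\frac{1}{2}}(A|B)_\rho \tag{13.1.68}\\
&= -\inf_{\sigma_B} \widetilde{D}_{\frac{1}{2}}(\rho_{AB}\Vert \mathbb{1}_A\otimes\sigma_B) \tag{13.1.69}\\
&= \sup_{\sigma_B} \log_2 F(\rho_{AB}, \mathbb{1}_A\otimes\sigma_B). \tag{13.1.70}
\end{align}
To obtain the last equality, we made use of (7.5.8), and the optimization therein is
with respect to states $\sigma_B$.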
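For every state $\rho_{AB}$, the conditional min- and max-entropies are related as
follows:
\begin{align}
H_{\max}(A|B)_\rho &= -H_{\min}(A|E)_\psi, \tag{13.1.71}\\
H^{\varepsilon}_{\max}(A|B)_\rho &= -H^{\varepsilon}_{\min}(A|E)_\psi, \tag{13.1.72}
\end{align}
for all $\varepsilon \in (0,1)$, where $\psi_{ABE}$ is a purification of $\rho_{AB}$.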
Both the conditional min-entropy and the smooth conditional min-entropy can
be formulated as semi-definite programs. The same is true for the conditional max-
entropy and the smooth conditional max-entropy. Please consult the Bibliographic
Notes in Section 13.5 for details.
Finally, we need the quantity
\begin{align}
\widetilde{H}_2(A|B)_\rho &:= -\inf_{\sigma_B} \widetilde{D}_2(\rho_{AB}\Vert \mathbb{1}_A\otimes\sigma_B) \tag{13.1.73}\\
&= -\inf_{\sigma_B} \log_2 \operatorname{Tr}\!\left[\left(\sigma_B^{-\frac{1}{4}}\rho_{AB}\,\sigma_B^{-\frac{1}{4}}\right)^{2}\right], \tag{13.1.74}
\end{align}
which is known as the “conditional collision entropy” of $\rho_{AB}$, where the optimization
is with respect to states $\sigma_B$. Due to monotonicity in $\alpha$ of the sandwiched Rényi
relative entropy $\widetilde{D}_\alpha$ (see Proposition 7.31), we have that
$$H_{\min}(A|B)_\rho \leq \widetilde{H}_2(A|B)_\rho \tag{13.1.75}$$
for every bipartite state $\rho_{AB}$.

We are now ready to state a lower bound on one-shot distillable entanglement.

Theorem 13.10 Lower Bound on One-Shot Distillable Entanglement

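Let $\rho_{AB}$ be a quantum state. For all $\varepsilon \in (0,1]$ and $\eta \in [0,\sqrt{\varepsilon})$, there exists a
$(d,\varepsilon)$ one-way entanglement distillation protocol for $\rho_{AB}$ with
$$\log_2 d = -H^{\sqrt{\varepsilon}-\eta}_{\max}(A|B)_\rho + 4\log_2\eta. \tag{13.1.76}$$
Consequently, for the one-shot distillable entanglement of $\rho_{AB}$, we have
$$E^{\varepsilon}_D(A;B)_\rho \geq \sup_{\mathcal{L}}\left(-H^{\sqrt{\varepsilon}-\eta}_{\max}(A'|B')_{\mathcal{L}(\rho)} + 4\log_2\eta\right) \tag{13.1.77}$$
for all $\varepsilon \in (0,1]$ and $\eta \in [0,\sqrt{\varepsilon})$, where the optimization is over every LOCC
channel $\mathcal{L}_{AB\to A'B'}$.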

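In order to prove Theorem 13.10, we exhibit an entanglement distillation
protocol $(d, \mathcal{L}_{AB\to\hat{A}\hat{B}})$, with $d_{\hat{A}} = d_{\hat{B}} = d = 2^{-H^{\sqrt{\varepsilon}-\eta}_{\max}(A|B)_\rho + 4\log_2\eta}$, such that
$p_{\text{err}}(\mathcal{L};\rho_{AB}) \leq \varepsilon$ for all $\varepsilon \in (0,1]$. To this end, we construct a one-way LOCC
channel $\mathcal{L}^{\to}_{AB\to\hat{A}\hat{B}}$ of the form
$$\mathcal{L}^{\to}_{AB\to\hat{A}\hat{B}} = \sum_{x\in\mathcal{X}} \mathcal{E}^{x}_{A\to\hat{A}} \otimes \mathcal{D}^{x}_{B\to\hat{B}}, \tag{13.1.78}$$
where $\mathcal{X}$ is a finite alphabet, $\{\mathcal{E}^{x}_{A\to\hat{A}}\}_{x\in\mathcal{X}}$ is a set of completely positive maps such
that $\sum_{x\in\mathcal{X}} \mathcal{E}^{x}_{A\to\hat{A}}$ is trace preserving, and $\{\mathcal{D}^{x}_{B\to\hat{B}}\}_{x\in\mathcal{X}}$ is a set of channels. Recall
from Section 4.6.2 that every one-way Alice-to-Bob LOCC channel can be written
as
$$\mathcal{L}^{\to}_{AB\to\hat{A}\hat{B}} = \mathcal{D}_{BX_B\to\hat{B}} \circ \mathcal{C}_{X_A\to X_B} \circ \mathcal{E}_{A\to\hat{A}X_A}, \tag{13.1.79}$$
where $\mathcal{E}_{A\to\hat{A}X_A}$ is a local channel for Alice that corresponds to the quantum
instrument given by the maps $\{\mathcal{E}^{x}_{A\to\hat{A}}\}_{x\in\mathcal{X}}$, i.e., (see also (4.4.53))
$$\mathcal{E}_{A\to\hat{A}X_A}(\rho_A) = \sum_{x\in\mathcal{X}} \mathcal{E}^{x}_{A\to\hat{A}}(\rho_A) \otimes |x\rangle\!\langle x|_{X_A}. \tag{13.1.80}$$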

The map $\mathcal{C}_{X_A\to X_B}$ is a noiseless classical channel that transforms the classical
register $X_A$, held by Alice, to the classical register $X_B$ (which is simply a copy of
$X_A$), held by Bob. The final channel $\mathcal{D}_{BX_B\to\hat{B}}$ is a local channel for Bob defined as
$$\mathcal{D}_{BX_B\to\hat{B}}(\rho_B \otimes |x\rangle\!\langle x|_{X_B}) = \mathcal{D}^{x}_{B\to\hat{B}}(\rho_B) \tag{13.1.81}$$
for all $x \in \mathcal{X}$. In the proof below, we explicitly construct the CP maps $\{\mathcal{E}^{x}_{A\to\hat{A}}\}_{x\in\mathcal{X}}$
and the channels $\{\mathcal{D}^{x}_{B\to\hat{B}}\}_{x\in\mathcal{X}}$.
In addition to providing explicit forms for the channels $\mathcal{E}_{A\to\hat{A}X_A}$ and $\mathcal{D}_{BX_B\to\hat{B}}$
involved in the LOCC channel $\mathcal{L}^{\to}_{AB\to\hat{A}\hat{B}}$ in (13.1.78), we prove that $p_{\text{err}}(\mathcal{L};\rho_{AB}) \leq \varepsilon$.
To do this, we make use of the following general decoupling result, which we
explain and prove in Appendix 13.A.

Theorem 13.11
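Given a subnormalized state $\rho_{AE}$ (i.e., $\operatorname{Tr}[\rho_{AE}] \leq 1$), and a completely positive
map $\mathcal{N}_{A\to A'}$, the following bound holds:
$$\int_{U_A} \left\Vert \mathcal{N}_{A\to A'}(U_A\rho_{AE}U_A^{\dagger}) - \Phi^{\mathcal{N}}_{A'}\otimes\rho_E \right\Vert_1 \mathrm{d}U_A \leq 2^{-\frac{1}{2}\widetilde{H}_2(A|E)_\rho - \frac{1}{2}\widetilde{H}_2(A|A')_{\Phi^{\mathcal{N}}}}, \tag{13.1.82}$$
where $\Phi^{\mathcal{N}}_{A'} := \operatorname{Tr}_A[\Phi^{\mathcal{N}}_{AA'}]$, $\Phi^{\mathcal{N}}_{AA'}$ is given by $\mathcal{N}_{A\to A'}(\Phi_{AA})$, $\rho_E := \operatorname{Tr}_A[\rho_{AE}]$,
and the integral is over unitaries $U_A$ acting on system $A$, taken with respect to
the Haar measure.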

Proof: See Appendix 13.A. ■

Remark: The integral in (13.1.82) with respect to the Haar measure should be thought of as
a uniform average over the continuous set of all unitaries $U_A$ acting on the system $A$. In other
words, the integral is analogous to a uniform average over a discrete set of unitaries. In fact, for
every dimension $d \geq 1$, there exists a set $\{U_x\}_{x\in\mathcal{X}}$ of unitaries, called a unitary one-design, such
that
$$\int_U U X U^{\dagger}\,\mathrm{d}U = \frac{1}{|\mathcal{X}|}\sum_{x\in\mathcal{X}} U_x X U_x^{\dagger} = \operatorname{Tr}[X]\,\frac{\mathbb{1}}{d} \tag{13.1.83}$$
for every operator $X$. An example of a unitary one-design is the Heisenberg-Weyl operators
$\{W_{z,x} : 0 \leq z, x \leq d-1\}$, which are defined in (3.2.48)–(3.2.50). Please consult the Bibliographic
Notes in Section 13.5 for more information about integration over unitaries with respect to the
Haar measure and about unitary designs. A simple argument for the right-most equality in
(13.1.83) goes as follows. First, it follows for a unitary $V$ that
$$V\left(\int_U U X U^{\dagger}\,\mathrm{d}U\right)V^{\dagger} = \int_U VUX(VU)^{\dagger}\,\mathrm{d}U = \int_U U X U^{\dagger}\,\mathrm{d}U, \tag{13.1.84}$$
where the final equality follows because the Haar measure is a unitarily invariant measure. So it
follows that the operator $\int_U U X U^{\dagger}\,\mathrm{d}U$ commutes with all unitaries. The only operators that do
so are scalar multiples of the identity operator, which implies that $\int_U U X U^{\dagger}\,\mathrm{d}U \propto \mathbb{1}$. The normalization factor of
$\operatorname{Tr}[X]/d$ follows by taking a trace of the left-hand side, using its cyclicity, and the fact that $\mathrm{d}U$ is
a probability measure.
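The one-design identity (13.1.83) can be checked by hand for a single qubit. The sketch below averages over the four operators $\mathbb{1}$, $X$, $Z$, and $XZ$, which coincide with the qubit Heisenberg-Weyl operators up to ordering and phase conventions (an assumption, since (3.2.48)–(3.2.50) are not reproduced here; the twirl is insensitive to such phases). The helper names and the test operator are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose of a square matrix given as nested lists."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
PAULIS = [I2, X, Z, matmul(X, Z)]  # qubit one-design, up to phases

def twirl(op):
    """Uniform average of U op U^dagger over the four operators above."""
    n = len(op)
    total = [[0j] * n for _ in range(n)]
    for u in PAULIS:
        term = matmul(matmul(u, op), dagger(u))
        for i in range(n):
            for j in range(n):
                total[i][j] += term[i][j] / len(PAULIS)
    return total

op = [[0.7, 0.2 + 0.1j], [0.3 - 0.4j, 0.3]]  # arbitrary operator with trace 1
averaged = twirl(op)                          # should equal Tr[op] * identity / 2
```

The off-diagonal entries cancel pairwise in the average and the diagonal entries are replaced by their mean, which is the discrete analogue of the commutation argument given in the remark.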

Proof of Theorem 13.10

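Fix $\eta \in (0,\sqrt{\varepsilon})$, and let $\psi_{ABE}$ be a purification of $\rho_{AB}$, with $\rho_{AE} := \operatorname{Tr}_B[\psi_{ABE}]$.
Fix $d$ such that (13.1.76) holds. Then, starting from the expression in (13.1.76)
and using (13.1.72), we find that
$$\log_2 d = H^{\sqrt{\varepsilon}-\eta}_{\min}(A|E)_\rho + 4\log_2\eta. \tag{13.1.85}$$
Now, pick a state $\widetilde{\rho}_{AE} \in \mathcal{B}^{\sqrt{\varepsilon}-\eta}(\rho_{AE})$ such that
$$H_{\min}(A|E)_{\widetilde{\rho}} = H^{\sqrt{\varepsilon}-\eta}_{\min}(A|E)_\rho. \tag{13.1.86}$$
Then, using (13.1.75), we find that
\begin{align}
\log_2 d &= H_{\min}(A|E)_{\widetilde{\rho}} + 4\log_2\eta \tag{13.1.87}\\
&\leq \widetilde{H}_2(A|E)_{\widetilde{\rho}} + 4\log_2\eta \tag{13.1.88}\\
&= \widetilde{H}_2(A|E)_{\widetilde{\rho}} - 2\log_2\frac{1}{\eta^2}, \tag{13.1.89}
\end{align}
where
$$\widetilde{H}_2(A|E)_{\widetilde{\rho}} = -\inf_{\sigma_E}\log_2\operatorname{Tr}\!\left[\left(\sigma_E^{-\frac{1}{4}}\,\widetilde{\rho}_{AE}\,\sigma_E^{-\frac{1}{4}}\right)^{2}\right]. \tag{13.1.90}$$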

We now define a channel $\mathcal{E}_{A\to\hat{A}X_A}$ as follows:
$$\mathcal{E}_{A\to\hat{A}X_A}(\cdot) := \sum_{x\in\mathcal{X}} V^{x}_{A\to\hat{A}}\,\Pi^{x}_{A}(\cdot)\Pi^{x}_{A}\,V^{x\dagger}_{A\to\hat{A}} \otimes |x\rangle\!\langle x|_{X_A}, \tag{13.1.91}$$
where $d_{\hat{A}} = d$, $\mathcal{X}$ is a finite alphabet with¹ $|\mathcal{X}| = d_{X_A} = \frac{d_A}{d}$, $\{\Pi^{x}_{A}\}_{x\in\mathcal{X}}$ is a set of
projectors such that $\sum_{x\in\mathcal{X}}\Pi^{x}_{A} = \mathbb{1}_A$, and $\{V^{x}_{A\to\hat{A}}\}_{x\in\mathcal{X}}$ is a set of isometries. So we
have that
$$\mathcal{E}^{x}_{A\to\hat{A}}(\cdot) := V^{x}_{A\to\hat{A}}\,\Pi^{x}_{A}(\cdot)\Pi^{x}_{A}\,V^{x\dagger}_{A\to\hat{A}} \tag{13.1.92}$$
for all $x \in \mathcal{X}$. Each isometry $V^{x}_{A\to\hat{A}}$ takes the subspace of $\mathcal{H}_A$ onto which $\Pi^{x}_{A}$
projects and embeds it into the fixed $d$-dimensional space $\mathcal{H}_{\hat{A}}$, i.e., $\operatorname{im}(V^{x}_{A\to\hat{A}}) = \mathcal{H}_{\hat{A}}$
for all $x \in \mathcal{X}$. The projectors $\{\Pi^{x}_{A}\}_{x\in\mathcal{X}}$ correspond to a measurement of the input
state, with $\Pi^{x}_{A}(\cdot)\Pi^{x}_{A}$ the (unnormalized) post-measurement state, and the isometries
$\{V^{x}_{A\to\hat{A}}\}_{x\in\mathcal{X}}$ can be thought of as encodings of the initial system $A$ into the system $\hat{A}$
on which one share of the desired maximally entangled state $\Phi_{\hat{A}\hat{B}}$ is to be generated.

¹We assume that $d$ divides $d_A$ without loss of generality. If that is not the case, then we can
repeat the whole analysis with the system $A$ embedded in a larger Hilbert space whose dimension is divisible
by $d$. We would also need to start with a state $\widetilde{\rho}_{AE}$ such that (13.1.86) holds with the definition
$H^{\sqrt{\varepsilon}-\eta}_{\min}(A|E)_\rho := -\inf_{\sigma_E} D^{\sqrt{\varepsilon}-\eta}_{\max}(\rho_{AE}\Vert\Pi_A\otimes\sigma_E)$, where $\Pi_A$ is the projection onto the support of
$\operatorname{Tr}_E[\rho_{AE}]$. Then we would repeat the whole analysis with such a $\widetilde{\rho}_{AE}$. We do not go into further
details here.

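We have
\begin{align}
\mathcal{E}_{A\to\hat{A}X_A}(\rho_{AB}) &= \sum_{x\in\mathcal{X}} \mathcal{E}^{x}_{A\to\hat{A}}(\rho_{AB}) \otimes |x\rangle\!\langle x|_{X_A} \tag{13.1.93}\\
&= \sum_{x\in\mathcal{X}} \mathcal{V}^{x}_{A\to\hat{A}}(\Pi^{x}_{A}\rho_{AB}\Pi^{x}_{A}) \otimes |x\rangle\!\langle x|_{X_A} \tag{13.1.94}\\
&= \sum_{x\in\mathcal{X}} p(x)\,\omega^{x}_{\hat{A}B} \otimes |x\rangle\!\langle x|_{X_A}, \tag{13.1.95}
\end{align}
where $\mathcal{V}^{x}_{A\to\hat{A}}(\cdot) := V^{x}_{A\to\hat{A}}(\cdot)V^{x\dagger}_{A\to\hat{A}}$ and
\begin{align}
p(x) &:= \operatorname{Tr}[\Pi^{x}_{A}\rho_A], \tag{13.1.96}\\
\omega^{x}_{\hat{A}B} &:= \frac{1}{p(x)}\,\mathcal{V}^{x}_{A\to\hat{A}}(\Pi^{x}_{A}\rho_{AB}\Pi^{x}_{A}). \tag{13.1.97}
\end{align}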

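Now, by Theorem 13.11, the following inequality holds:
$$\int_{U_A} \left\Vert \mathcal{E}_{A\to\hat{A}X_A}(U_A\widetilde{\rho}_{AE}U_A^{\dagger}) - \Phi^{\mathcal{E}}_{\hat{A}X_A}\otimes\widetilde{\rho}_E \right\Vert_1 \mathrm{d}U_A \leq 2^{-\frac{1}{2}\widetilde{H}_2(A|E)_{\widetilde{\rho}} - \frac{1}{2}\widetilde{H}_2(A|\hat{A}X_A)_{\Phi^{\mathcal{E}}}}, \tag{13.1.98}$$
where $\Phi^{\mathcal{E}}_{A\hat{A}X_A}$ is the Choi state of $\mathcal{E}_{A\to\hat{A}X_A}$. Given that $\log_2 d \leq \widetilde{H}_2(A|E)_{\widetilde{\rho}} - 2\log_2\frac{1}{\eta^2}$, we obtain
$$2^{-\frac{1}{2}\widetilde{H}_2(A|E)_{\widetilde{\rho}}} \leq \frac{1}{\sqrt{d}}\,\eta^2. \tag{13.1.99}$$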
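We also have that
\begin{align}
\widetilde{H}_2(A|\hat{A}X_A)_{\Phi^{\mathcal{E}}} &= \sup_{\sigma_{\hat{A}X_A}}\left\{-\log_2\operatorname{Tr}\!\left[\left(\sigma_{\hat{A}X_A}^{-\frac{1}{4}}\,\Phi^{\mathcal{E}}_{A\hat{A}X_A}\,\sigma_{\hat{A}X_A}^{-\frac{1}{4}}\right)^{2}\right]\right\} \tag{13.1.100}\\
&\geq -\log_2 d. \tag{13.1.101}
\end{align}
Indeed, in the optimization over $\sigma_{\hat{A}X_A}$, take
\begin{align}
\sigma_{\hat{A}X_A} &= \frac{1}{d_{X_A}}\sum_{x\in\mathcal{X}} \pi_{\hat{A}} \otimes |x\rangle\!\langle x|_{X_A} \tag{13.1.102}\\
&= \pi_{\hat{A}} \otimes \pi_{X_A} \tag{13.1.103}\\
&= \frac{1}{d_A}\,\mathbb{1}_{\hat{A}} \otimes \mathbb{1}_{X_A}. \tag{13.1.104}
\end{align}
With this choice of $\sigma_{\hat{A}X_A}$, we find that
\begin{align}
\operatorname{Tr}\!\left[\left(\sigma_{\hat{A}X_A}^{-\frac{1}{4}}\,\Phi^{\mathcal{E}}_{A\hat{A}X_A}\,\sigma_{\hat{A}X_A}^{-\frac{1}{4}}\right)^{2}\right]
&= d_A \operatorname{Tr}\!\left[\left(\Phi^{\mathcal{E}}_{A\hat{A}X_A}\right)^{2}\right] \tag{13.1.105}\\
&= d_A \operatorname{Tr}\!\left[\left(\sum_{x\in\mathcal{X}} V^{x}_{A\to\hat{A}}\Pi^{x}_{A}\Phi_{AA}\Pi^{x}_{A}(V^{x}_{A\to\hat{A}})^{\dagger} \otimes |x\rangle\!\langle x|_{X_A}\right)^{2}\right] \tag{13.1.106}\\
&= d_A \sum_{x\in\mathcal{X}} \operatorname{Tr}\!\left[\left(V^{x}_{A\to\hat{A}}\Pi^{x}_{A}\Phi_{AA}\Pi^{x}_{A}(V^{x}_{A\to\hat{A}})^{\dagger}\right)^{2}\right] \tag{13.1.107}\\
&= d_A \sum_{x\in\mathcal{X}} \operatorname{Tr}\!\left[\left(V^{x}_{A\to\hat{A}}\Pi^{x}_{A}\,\frac{1}{d_A}\sum_{i,j=0}^{d_A-1}|i\rangle\!\langle j|_A \otimes |i\rangle\!\langle j|_A\,\Pi^{x}_{A}(V^{x}_{A\to\hat{A}})^{\dagger}\right)^{2}\right] \tag{13.1.108}\\
&= d_A \sum_{x\in\mathcal{X}} \operatorname{Tr}\!\left[\left(\frac{1}{d_A}\sum_{i,j=0}^{d-1}|i\rangle\!\langle j|_{\hat{A}} \otimes |i\rangle\!\langle j|_{\hat{A}}\right)^{2}\right] \tag{13.1.109}\\
&= d_A \sum_{x\in\mathcal{X}} \operatorname{Tr}\!\left[\left(\frac{d}{d_A}\,\frac{1}{d}\sum_{i,j=0}^{d-1}|i\rangle\!\langle j|_{\hat{A}} \otimes |i\rangle\!\langle j|_{\hat{A}}\right)^{2}\right] \tag{13.1.110}\\
&= \frac{d^2}{d_A}\sum_{x\in\mathcal{X}} \operatorname{Tr}\!\left[\left(\frac{1}{d}\sum_{i,j=0}^{d-1}|i\rangle\!\langle j|_{\hat{A}} \otimes |i\rangle\!\langle j|_{\hat{A}}\right)^{2}\right] \tag{13.1.111}\\
&= \frac{d^2}{d_A}\,|\mathcal{X}| \tag{13.1.112}\\
&= d, \tag{13.1.113}
\end{align}
where we recall that $d_{X_A} = |\mathcal{X}| = \frac{d_A}{d}$. We thus have
$$2^{-\frac{1}{2}\widetilde{H}_2(A|\hat{A}X_A)_{\Phi^{\mathcal{E}}}} \leq \sqrt{d}, \tag{13.1.114}$$
which means that
$$\int_{U_A} \left\Vert \mathcal{E}_{A\to\hat{A}X_A}(U_A\widetilde{\rho}_{AE}U_A^{\dagger}) - \Phi^{\mathcal{E}}_{\hat{A}X_A}\otimes\widetilde{\rho}_E \right\Vert_1 \mathrm{d}U_A \leq \eta^2. \tag{13.1.115}$$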

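Note that
\begin{align}
\Phi^{\mathcal{E}}_{\hat{A}X_A} &= \frac{1}{d_A}\sum_{x\in\mathcal{X}} V^{x}_{A\to\hat{A}}\Pi^{x}_{A}\mathbb{1}_A\Pi^{x}_{A}V^{x\dagger}_{A\to\hat{A}} \otimes |x\rangle\!\langle x|_{X_A} \tag{13.1.116}\\
&= \frac{1}{d_A}\sum_{x\in\mathcal{X}} V^{x}_{A\to\hat{A}}\Pi^{x}_{A}V^{x\dagger}_{A\to\hat{A}} \otimes |x\rangle\!\langle x|_{X_A} \tag{13.1.117}\\
&= \frac{1}{d_A}\,\mathbb{1}_{\hat{A}} \otimes \mathbb{1}_{X_A} \tag{13.1.118}\\
&= \pi_{\hat{A}} \otimes \pi_{X_A}, \tag{13.1.119}
\end{align}
where the last equality follows because $d_{X_A} = \frac{d_A}{d}$. So we have
$$\int_{U_A} \left\Vert \mathcal{E}_{A\to\hat{A}X_A}(U_A\widetilde{\rho}_{AE}U_A^{\dagger}) - \pi_{\hat{A}}\otimes\pi_{X_A}\otimes\widetilde{\rho}_E \right\Vert_1 \mathrm{d}U_A \leq \eta^2. \tag{13.1.120}$$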

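Now, since the average over a set of elements is never less than the minimum over
the same set, we have that
$$\int_{U_A} \left\Vert \mathcal{E}_{A\to\hat{A}X_A}(U_A\widetilde{\rho}_{AE}U_A^{\dagger}) - \pi_{\hat{A}}\otimes\pi_{X_A}\otimes\widetilde{\rho}_E \right\Vert_1 \mathrm{d}U_A \geq \min_{U_A} \left\Vert \mathcal{E}_{A\to\hat{A}X_A}(U_A\widetilde{\rho}_{AE}U_A^{\dagger}) - \pi_{\hat{A}}\otimes\pi_{X_A}\otimes\widetilde{\rho}_E \right\Vert_1. \tag{13.1.121}$$
This implies that there exists a unitary $U_A$ (in particular, one that achieves the
minimum on the right-hand side of the above inequality) such that
\begin{align}
\eta^2 &\geq \int_{U_A} \left\Vert \mathcal{E}_{A\to\hat{A}X_A}(U_A\widetilde{\rho}_{AE}U_A^{\dagger}) - \pi_{\hat{A}}\otimes\pi_{X_A}\otimes\widetilde{\rho}_E \right\Vert_1 \mathrm{d}U_A \notag\\
&\geq \left\Vert \mathcal{E}_{A\to\hat{A}X_A}(U_A\widetilde{\rho}_{AE}U_A^{\dagger}) - \pi_{\hat{A}}\otimes\pi_{X_A}\otimes\widetilde{\rho}_E \right\Vert_1. \tag{13.1.122}
\end{align}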

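Now, let
\begin{align}
\widetilde{\omega}_{\hat{A}X_AE} &= \mathcal{E}_{A\to\hat{A}X_A}(U_A\widetilde{\rho}_{AE}U_A^{\dagger}), & \widetilde{\tau}_{\hat{A}X_AE} &= \pi_{\hat{A}}\otimes\pi_{X_A}\otimes\widetilde{\rho}_E, \tag{13.1.123}\\
\omega_{\hat{A}X_AE} &= \mathcal{E}_{A\to\hat{A}X_A}(U_A\rho_{AE}U_A^{\dagger}), & \tau_{\hat{A}X_AE} &= \pi_{\hat{A}}\otimes\pi_{X_A}\otimes\rho_E. \tag{13.1.124}
\end{align}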

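Then, by the Fuchs–van de Graaf inequality (see (6.2.88)), and by the definition of
the sine distance (see Definition 6.16), we have that
\begin{align}
\eta^2 &\geq \left\Vert \widetilde{\omega}_{\hat{A}X_AE} - \widetilde{\tau}_{\hat{A}X_AE} \right\Vert_1 \tag{13.1.125}\\
&\geq 2 - 2\sqrt{F(\widetilde{\omega}_{\hat{A}X_AE},\widetilde{\tau}_{\hat{A}X_AE})} \tag{13.1.126}\\
&\geq 2 - 2\sqrt{1 - P(\widetilde{\omega}_{\hat{A}X_AE},\widetilde{\tau}_{\hat{A}X_AE})^2}, \tag{13.1.127}
\end{align}
which implies that
\begin{align}
P(\widetilde{\omega}_{\hat{A}X_AE},\widetilde{\tau}_{\hat{A}X_AE}) &\leq \sqrt{1 - \left(1 - \frac{\eta^2}{2}\right)^{2}} \tag{13.1.128}\\
&= \eta\sqrt{1 - \frac{\eta^2}{4}} \tag{13.1.129}\\
&\leq \eta. \tag{13.1.130}
\end{align}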
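The algebra behind (13.1.128)–(13.1.130) rests on the identity $1 - (1 - \eta^2/2)^2 = \eta^2 - \eta^4/4 = \eta^2(1 - \eta^2/4)$. The short sketch below checks this numerically for a few sample values of $\eta$; the function names and sample values are illustrative only.

```python
import math

def sine_distance_bound(eta: float) -> float:
    """Bound from (13.1.128): sqrt(1 - (1 - eta^2/2)^2)."""
    return math.sqrt(1.0 - (1.0 - eta ** 2 / 2.0) ** 2)

def simplified_bound(eta: float) -> float:
    """Equivalent form from (13.1.129): eta * sqrt(1 - eta^2/4)."""
    return eta * math.sqrt(1.0 - eta ** 2 / 4.0)

# Compare both forms, and the final relaxation to eta, on sample values.
checks = []
for eta in (0.01, 0.1, 0.5, 0.9):
    lhs, rhs = sine_distance_bound(eta), simplified_bound(eta)
    checks.append(abs(lhs - rhs) < 1e-9 and rhs <= eta)
```

The relaxation to $\eta$ in (13.1.130) loses little when $\eta$ is small, since the factor $\sqrt{1 - \eta^2/4}$ is then close to one.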

Then, by the triangle inequality for sine distance (Lemma 6.17), we have
\begin{align}
P(\omega_{\hat{A}X_AE},\widetilde{\tau}_{\hat{A}X_AE}) &\leq P(\omega_{\hat{A}X_AE},\widetilde{\omega}_{\hat{A}X_AE}) + P(\widetilde{\omega}_{\hat{A}X_AE},\widetilde{\tau}_{\hat{A}X_AE}) \tag{13.1.131}\\
&\leq P(\rho_{AE},\widetilde{\rho}_{AE}) + \eta \tag{13.1.132}\\
&\leq \sqrt{\varepsilon} - \eta + \eta \tag{13.1.133}\\
&= \sqrt{\varepsilon}, \tag{13.1.134}
\end{align}
where the second inequality follows from the data-processing inequality for the sine
distance, unitary invariance of the sine distance, and the inequality in (13.1.130).
To obtain the last inequality, we used the definition of the state $\widetilde{\rho}_{AE}$ as one that is
$(\sqrt{\varepsilon}-\eta)$-close to $\rho_{AE}$ in sine distance. We can write the inequality in (13.1.134)
in terms of fidelity as
$$F(\omega_{\hat{A}X_AE},\widetilde{\tau}_{\hat{A}X_AE}) \geq 1 - \varepsilon. \tag{13.1.135}$$

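Note that both $\omega_{\hat{A}X_AE}$ and $\widetilde{\tau}_{\hat{A}X_AE}$ are classical-quantum states because $X_A$ is a
classical register. In particular,
\begin{align}
\omega_{\hat{A}X_AE} &= \sum_{x\in\mathcal{X}} p(x)\,|x\rangle\!\langle x|_{X_A} \otimes \omega^{x}_{\hat{A}E}, \tag{13.1.136}\\
p(x) &= \operatorname{Tr}[\Pi^{x}_{A}U_A\rho_AU_A^{\dagger}], \tag{13.1.137}\\
\omega^{x}_{\hat{A}E} &= \frac{1}{p(x)}\,\mathcal{V}^{x}_{A\to\hat{A}}(\Pi^{x}_{A}U_A\rho_{AE}U_A^{\dagger}\Pi^{x}_{A}). \tag{13.1.138}
\end{align}
Also,
$$\widetilde{\tau}_{\hat{A}X_AE} = \pi_{\hat{A}} \otimes \frac{1}{d_{X_A}}\sum_{x\in\mathcal{X}} |x\rangle\!\langle x|_{X_A} \otimes \widetilde{\rho}_E. \tag{13.1.139}$$
Then, using the direct-sum property of the root fidelity (see (6.2.58)), we obtain
\begin{align}
F(\omega_{\hat{A}X_AE},\widetilde{\tau}_{\hat{A}X_AE}) &= \left(\sqrt{F}(\omega_{\hat{A}X_AE},\widetilde{\tau}_{\hat{A}X_AE})\right)^{2} \tag{13.1.140}\\
&= \left(\sum_{x\in\mathcal{X}} \sqrt{\frac{p(x)}{d_{X_A}}}\,\sqrt{F}(\omega^{x}_{\hat{A}E},\pi_{\hat{A}}\otimes\widetilde{\rho}_E)\right)^{2}. \tag{13.1.141}
\end{align}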

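Now, let
$$\psi^{x}_{\hat{A}BE} := \frac{1}{p(x)}\,\mathcal{V}^{x}_{A\to\hat{A}}(\Pi^{x}_{A}U_A\psi_{ABE}U_A^{\dagger}\Pi^{x}_{A}) \tag{13.1.142}$$
be a purification of $\omega^{x}_{\hat{A}E}$ for all $x \in \mathcal{X}$, and let $\Phi_{\hat{A}\hat{B}} \otimes \widetilde{\phi}_{EB'}$ be a purification of
$\pi_{\hat{A}} \otimes \widetilde{\rho}_E$. Then, by Uhlmann’s theorem (Theorem 6.8), for every $x \in \mathcal{X}$ there exists
an isometric channel $\mathcal{W}^{x}_{B\to\hat{B}B'}$ such that
$$\sqrt{F}(\omega^{x}_{\hat{A}E},\pi_{\hat{A}}\otimes\widetilde{\rho}_E) = \sqrt{F}(\mathcal{W}^{x}_{B\to\hat{B}B'}(\psi^{x}_{\hat{A}BE}),\Phi_{\hat{A}\hat{B}}\otimes\widetilde{\phi}_{EB'}) \tag{13.1.143}$$
for all $x \in \mathcal{X}$. Using the set $\{\mathcal{W}^{x}_{B\to\hat{B}B'}\}_{x\in\mathcal{X}}$, we define the quantum channels
$\{\mathcal{D}^{x}_{B\to\hat{B}}\}_{x\in\mathcal{X}}$ as follows:
$$\mathcal{D}^{x}_{B\to\hat{B}} := \operatorname{Tr}_{B'} \circ\, \mathcal{W}^{x}_{B\to\hat{B}B'}. \tag{13.1.144}$$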

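By the data-processing inequality for fidelity (see Theorem 6.9) under the partial
trace channel $\operatorname{Tr}_{EB'}$, we obtain
\begin{align}
\sqrt{F}(\omega^{x}_{\hat{A}E},\pi_{\hat{A}}\otimes\widetilde{\rho}_E) &= \sqrt{F}(\mathcal{W}^{x}_{B\to\hat{B}B'}(\psi^{x}_{\hat{A}BE}),\Phi_{\hat{A}\hat{B}}\otimes\widetilde{\phi}_{EB'}) \tag{13.1.145}\\
&\leq \sqrt{F}(\operatorname{Tr}_{EB'}[\mathcal{W}^{x}_{B\to\hat{B}B'}(\psi^{x}_{\hat{A}BE})],\operatorname{Tr}_{EB'}[\Phi_{\hat{A}\hat{B}}\otimes\widetilde{\phi}_{EB'}]) \tag{13.1.146}\\
&= \sqrt{F}(\mathcal{D}^{x}_{B\to\hat{B}}(\omega^{x}_{\hat{A}B}),\Phi_{\hat{A}\hat{B}}), \tag{13.1.147}
\end{align}
for all $x \in \mathcal{X}$, where we recall the definition of $\omega^{x}_{\hat{A}B}$ from (13.1.97). Since the
inequality in (13.1.147) holds for all $x \in \mathcal{X}$, we have that
$$\left(\sum_{x\in\mathcal{X}} \sqrt{\frac{p(x)}{d_{X_A}}}\,\sqrt{F}(\omega^{x}_{\hat{A}E},\pi_{\hat{A}}\otimes\widetilde{\rho}_E)\right)^{2} \leq \left(\sum_{x\in\mathcal{X}} \sqrt{\frac{p(x)}{d_{X_A}}}\,\sqrt{F}(\mathcal{D}^{x}_{B\to\hat{B}}(\omega^{x}_{\hat{A}B}),\Phi_{\hat{A}\hat{B}})\right)^{2}. \tag{13.1.148}$$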

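Now, the final state of Alice and Bob after executing the LOCC channel defined
by (13.1.79), with $\mathcal{E}_{A\to\hat{A}X_A}$ defined by (13.1.91) and $\mathcal{D}_{BX_B\to\hat{B}}$ defined by (13.1.81)
and (13.1.144), is
\begin{align}
\omega_{\hat{A}\hat{B}} &= (\mathcal{D}_{BX_B\to\hat{B}} \circ \mathcal{C}_{X_A\to X_B} \circ \mathcal{E}_{A\to\hat{A}X_A})(\rho_{AB}) \tag{13.1.149}\\
&= \sum_{x\in\mathcal{X}} \mathcal{D}_{BX_B\to\hat{B}}(\mathcal{E}^{x}_{A\to\hat{A}}(\rho_{AB}) \otimes |x\rangle\!\langle x|_{X_B}) \tag{13.1.150}\\
&= \sum_{x\in\mathcal{X}} (\mathcal{E}^{x}_{A\to\hat{A}} \otimes \mathcal{D}^{x}_{B\to\hat{B}})(\rho_{AB}) \tag{13.1.151}\\
&= \operatorname{Tr}_{X_B}\!\left[\sum_{x\in\mathcal{X}} p(x)\,|x\rangle\!\langle x|_{X_B} \otimes \mathcal{D}^{x}_{B\to\hat{B}}(\omega^{x}_{\hat{A}B})\right], \tag{13.1.152}
\end{align}
where in the third equality we recognize the required form in (13.1.78) for a one-way
Alice-to-Bob LOCC channel, and in the last equality we made use of (13.1.97).
Using the form of $\omega_{\hat{A}\hat{B}}$ in the last equality, along with all of the developments above,
we finally obtain
\begin{align}
F(\omega_{\hat{A}\hat{B}},\Phi_{\hat{A}\hat{B}}) &= F\!\left(\operatorname{Tr}_{X_B}\!\left[\sum_{x\in\mathcal{X}} p(x)\,|x\rangle\!\langle x|_{X_B} \otimes \mathcal{D}^{x}_{B\to\hat{B}}(\omega^{x}_{\hat{A}B})\right],\operatorname{Tr}_{X_B}[\pi_{X_B}\otimes\Phi_{\hat{A}\hat{B}}]\right) \tag{13.1.153}\\
&\geq F\!\left(\sum_{x\in\mathcal{X}} p(x)\,|x\rangle\!\langle x|_{X_B} \otimes \mathcal{D}^{x}_{B\to\hat{B}}(\omega^{x}_{\hat{A}B}),\frac{1}{d_{X_A}}\sum_{x\in\mathcal{X}} |x\rangle\!\langle x|_{X_B} \otimes \Phi_{\hat{A}\hat{B}}\right) \tag{13.1.154}\\
&= \left(\sum_{x\in\mathcal{X}} \sqrt{\frac{p(x)}{d_{X_A}}}\,\sqrt{F}(\mathcal{D}^{x}_{B\to\hat{B}}(\omega^{x}_{\hat{A}B}),\Phi_{\hat{A}\hat{B}})\right)^{2} \tag{13.1.155}\\
&= \left(\sum_{x\in\mathcal{X}} \sqrt{\frac{p(x)}{d_{X_A}}}\,\sqrt{F}((\operatorname{Tr}_{EB'} \circ\, \mathcal{W}^{x}_{B\to\hat{B}B'})(\psi^{x}_{\hat{A}BE}),\operatorname{Tr}_{EB'}[\Phi_{\hat{A}\hat{B}}\otimes\widetilde{\phi}_{EB'}])\right)^{2} \tag{13.1.156}\\
&\geq \left(\sum_{x\in\mathcal{X}} \sqrt{\frac{p(x)}{d_{X_A}}}\,\sqrt{F}(\mathcal{W}^{x}_{B\to\hat{B}B'}(\psi^{x}_{\hat{A}BE}),\Phi_{\hat{A}\hat{B}}\otimes\widetilde{\phi}_{EB'})\right)^{2} \tag{13.1.157}\\
&= \left(\sum_{x\in\mathcal{X}} \sqrt{\frac{p(x)}{d_{X_A}}}\,\sqrt{F}(\omega^{x}_{\hat{A}E},\pi_{\hat{A}}\otimes\widetilde{\rho}_E)\right)^{2} \tag{13.1.158}\\
&\geq 1 - \varepsilon. \tag{13.1.159}
\end{align}
Therefore,
$$p_{\text{err}}(\mathcal{L};\rho_{AB}) = 1 - F(\omega_{\hat{A}\hat{B}},\Phi_{\hat{A}\hat{B}}) \leq \varepsilon. \tag{13.1.160}$$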

To summarize, we have shown that, given a state $\rho_{AB}$ and $\varepsilon \in (0,1)$, there
exists a $(d,\varepsilon)$ one-way entanglement distillation protocol $\mathcal{L}_{AB\to\hat{A}\hat{B}}$ if the dimension
$d = d_{\hat{A}} = d_{\hat{B}}$ satisfies $\log_2 d = -H^{\sqrt{\varepsilon}-\eta}_{\max}(A|B)_\rho + 4\log_2\eta$, where $\eta \in [0,\sqrt{\varepsilon})$.
Although we explicitly constructed the encoding channels $\{\mathcal{E}^{x}_{A\to\hat{A}}\}_{x\in\mathcal{X}}$ on Alice’s
side, on Bob’s side we relied on Uhlmann’s theorem to guarantee the existence of a
set of decoding channels $\{\mathcal{D}^{x}_{B\to\hat{B}}\}_{x\in\mathcal{X}}$ such that the overall LOCC channel $\mathcal{L}_{AB\to\hat{A}\hat{B}}$
satisfies $p_{\text{err}}(\mathcal{L};\rho_{AB}) \leq \varepsilon$. ■
Combining Theorem 13.10 with (7.8.83), and using (13.1.72), leads to the
following lower bound on the one-shot distillable entanglement:

Corollary 13.12
Let $\rho_{AB}$ be a bipartite quantum state with purification $\psi_{ABE}$. For all $\varepsilon \in (0,1)$,
$\eta \in [0,\sqrt{\varepsilon})$, and $\alpha > 1$, there exists a $(d,\varepsilon)$ one-way entanglement distillation
protocol for $\rho_{AB}$ satisfying
$$\log_2 d \geq \widetilde{H}_\alpha(A|E)_\psi - \frac{1}{\alpha-1}\log_2\frac{1}{(\sqrt{\varepsilon}-\eta)^2} - \log_2\frac{1}{1-(\sqrt{\varepsilon}-\eta)^2} + 4\log_2\eta. \tag{13.1.161}$$

Proof: The inequality follows from taking the results of Theorem 13.10, using
(13.1.72), and applying the inequality in (7.8.83). ■

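Since Corollary 13.12 guarantees the existence of a $(d,\varepsilon)$ entanglement distillation
protocol satisfying (13.1.161), we obtain the following bound for all $\varepsilon \in (0,1)$, $\eta \in [0,\sqrt{\varepsilon})$, and $\alpha > 1$:
$$E^{\varepsilon}_D(A;B)_\rho \geq \sup_{\mathcal{L}} \widetilde{H}_\alpha(A'|E')_\phi - \frac{1}{\alpha-1}\log_2\frac{1}{(\sqrt{\varepsilon}-\eta)^2} - \log_2\frac{1}{1-(\sqrt{\varepsilon}-\eta)^2} + 4\log_2\eta, \tag{13.1.162}$$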
where the optimization is with respect to every LOCC channel L 𝐴𝐵→𝐴′ 𝐵′ , such that
𝜙 𝐴′ 𝐵′ 𝐸 ′ is a purification of L 𝐴𝐵→𝐴′ 𝐵′ (𝜌 𝐴𝐵 ). This comes about by first applying the
LOCC channel L 𝐴𝐵→𝐴′ 𝐵′ to 𝜌 𝐴𝐵 for free, applying Corollary 13.12 to the state
L 𝐴𝐵→𝐴′ 𝐵′ (𝜌 𝐴𝐵 ), and finally optimizing over every LOCC channel L 𝐴𝐵→𝐴′ 𝐵′ .

13.2 Distillable Entanglement of a Quantum State

Having found upper and lower bounds on the one-shot distillable entanglement
𝐸 𝐷𝜀 ( 𝐴; 𝐵) 𝜌 of a bipartite quantum state 𝜌 𝐴𝐵 , let us now move on to the asymptotic
setting. In this setting, we allow Alice and Bob to make use of an arbitrarily large
number 𝑛 of copies of the state 𝜌 𝐴𝐵 in order to obtain a maximally entangled state.
An entanglement distillation protocol for 𝑛 copies of 𝜌 𝐴𝐵 is defined by the triple
(𝑛, 𝑑, L 𝐴𝑛 𝐵𝑛 → 𝐴ˆ 𝐵ˆ ), consisting of the number 𝑛 of copies of 𝜌 𝐴𝐵 , an integer 𝑑 ≥ 1,
and an LOCC channel L 𝐴𝑛 𝐵𝑛 → 𝐴ˆ 𝐵ˆ with 𝑑 𝐴ˆ = 𝑑 𝐵ˆ = 𝑑. Observe that an entanglement
distillation protocol for 𝑛 copies of 𝜌 𝐴𝐵 is equivalent to a (one-shot) entanglement
distillation protocol for the state $\rho_{AB}^{\otimes n}$. All of the results of Section 13.1 thus carry
over to the asymptotic setting simply by replacing $\rho_{AB}$ with $\rho_{AB}^{\otimes n}$. In particular,
the error probability for an entanglement distillation protocol for $\rho_{AB}$ defined by
$(n, d, \mathcal{L}_{A^nB^n\to\hat{A}\hat{B}})$ is equal to
$$p_{\text{err}}(\mathcal{L};\rho_{AB}^{\otimes n}) = 1 - \langle\Phi|_{\hat{A}\hat{B}}\,\mathcal{L}_{A^nB^n\to\hat{A}\hat{B}}(\rho_{AB}^{\otimes n})\,|\Phi\rangle_{\hat{A}\hat{B}}. \tag{13.2.1}$$


Definition 13.13 (𝒏, 𝒅, 𝜺) Entanglement Distillation Protocol

An entanglement distillation protocol $(n, d, \mathcal{L}_{A^nB^n\to\hat{A}\hat{B}})$ for $n$ copies of $\rho_{AB}$, with
$d_{\hat{A}} = d_{\hat{B}} = d$, is called an $(n, d, \varepsilon)$ protocol, with $\varepsilon \in [0,1]$, if $p_{\text{err}}(\mathcal{L};\rho_{AB}^{\otimes n}) \leq \varepsilon$.

Based on the discussion above, we note that an $(n, d, \varepsilon)$ entanglement distillation
protocol for $\rho_{AB}$ is a $(d, \varepsilon)$ entanglement distillation protocol for $\rho_{AB}^{\otimes n}$.
The rate $R(n,d)$ of an $(n, d, \varepsilon)$ entanglement distillation protocol for $n$ copies
of a given state is
$$R(n,d) := \frac{\log_2 d}{n}, \tag{13.2.2}$$
which can be thought of as the number of $\varepsilon$-approximate ebits contained in the
final state of the protocol, per copy of the given initial state. Given a state $\rho_{AB}$
and $\varepsilon \in [0,1]$, the maximum rate of entanglement distillation among all $(n, d, \varepsilon)$
entanglement distillation protocols for $\rho_{AB}$ is given by
\begin{align}
E^{n,\varepsilon}_D(\rho_{AB}) \equiv E^{n,\varepsilon}_D(A;B)_\rho &:= \frac{1}{n}E^{\varepsilon}_D(\rho_{AB}^{\otimes n}) \tag{13.2.3}\\
&= \sup_{(d,\mathcal{L})}\left\{\frac{\log_2 d}{n} : p_{\text{err}}(\mathcal{L};\rho_{AB}^{\otimes n}) \leq \varepsilon\right\}, \tag{13.2.4}
\end{align}
where the optimization is with respect to all $d \geq 1$ and every LOCC channel
$\mathcal{L}_{A^nB^n\to\hat{A}\hat{B}}$ with $d_{\hat{A}} = d_{\hat{B}} = d$.

Definition 13.14 Achievable Rate for Entanglement Distillation

Given a bipartite quantum state $\rho_{AB}$, a rate $R \in \mathbb{R}^+$ is called an achievable rate
for entanglement distillation for $\rho_{AB}$ if for all $\varepsilon \in (0,1]$, $\delta > 0$, and sufficiently
large $n$, there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ entanglement distillation protocol for
$\rho_{AB}$.

As we prove in Appendix A,
$$R \text{ achievable rate} \iff \lim_{n\to\infty} \varepsilon_D(2^{n(R-\delta)};\rho_{AB}^{\otimes n}) = 0 \quad \forall\, \delta > 0. \tag{13.2.5}$$
In other words, a rate $R$ is achievable if the optimal error probability for a sequence
of protocols with rate $R - \delta$, $\delta > 0$, vanishes as the number $n$ of copies of $\rho_{AB}$
increases.

Definition 13.15 Distillable Entanglement of a Quantum State

The distillable entanglement of a bipartite state 𝜌 𝐴𝐵 , denoted by 𝐸 𝐷 ( 𝐴; 𝐵) 𝜌 , is
defined to be the supremum of all achievable rates for entanglement distillation
for 𝜌 𝐴𝐵 , i.e.,

𝐸 𝐷 ( 𝐴; 𝐵) 𝜌 B sup{𝑅 : 𝑅 is an achievable rate for 𝜌 𝐴𝐵 }. (13.2.6)

The distillable entanglement can also be written as
$$E_D(A;B)_\rho = \inf_{\varepsilon\in(0,1]} \liminf_{n\to\infty} \frac{1}{n}E^{\varepsilon}_D(\rho_{AB}^{\otimes n}). \tag{13.2.7}$$
See Appendix A for a proof.

Definition 13.16 Weak Converse Rate for Entanglement Distillation

Given a bipartite state 𝜌 𝐴𝐵 , a rate 𝑅 ∈ R+ is called a weak converse rate for
entanglement distillation for 𝜌 𝐴𝐵 if every 𝑅′ > 𝑅 is not an achievable rate for
𝜌 𝐴𝐵 .

As we show in Appendix A,
$$R \text{ weak converse rate} \iff \lim_{n\to\infty} \varepsilon_D(2^{n(R-\delta)};\rho_{AB}^{\otimes n}) > 0 \quad \forall\, \delta > 0. \tag{13.2.8}$$

Definition 13.17 Strong Converse Rate for Entanglement Distillation

Given a bipartite state $\rho_{AB}$, a rate $R \in \mathbb{R}^+$ is called a strong converse rate for
entanglement distillation for $\rho_{AB}$ if for all $\varepsilon \in [0,1)$, $\delta > 0$, and sufficiently
large $n$, there does not exist an $(n, 2^{n(R+\delta)}, \varepsilon)$ entanglement distillation protocol
for $\rho_{AB}$.

We show in Appendix A that
$$R \text{ strong converse rate} \iff \lim_{n\to\infty} \varepsilon_D(2^{n(R+\delta)};\rho_{AB}^{\otimes n}) = 1 \quad \forall\, \delta > 0. \tag{13.2.9}$$


Definition 13.18 Strong Converse Distillable Entanglement of a Quantum State

The strong converse distillable entanglement of a bipartite state $\rho_{AB}$, denoted
by $\widetilde{E}_D(A;B)_\rho$, is defined as the infimum of all strong converse rates, i.e.,
$$\widetilde{E}_D(A;B)_\rho := \inf\{R : R \text{ is a strong converse rate for } \rho_{AB}\}. \tag{13.2.10}$$

Note that
$$E_D(A;B)_\rho \leq \widetilde{E}_D(A;B)_\rho \tag{13.2.11}$$
for all bipartite states $\rho_{AB}$. We can also write the strong converse distillable
entanglement as
$$\widetilde{E}_D(A;B)_\rho = \sup_{\varepsilon\in[0,1)} \limsup_{n\to\infty} \frac{1}{n}E^{\varepsilon}_D(\rho_{AB}^{\otimes n}). \tag{13.2.12}$$
See Appendix A for a proof.

We are now ready to present a general expression for the distillable entanglement
of a bipartite quantum state, as well as two upper bounds on it.

Theorem 13.19 Distillable Entanglement of a Bipartite State

The distillable entanglement of a bipartite state $\rho_{AB}$ is given by
$$E_D(A;B)_\rho = \lim_{n\to\infty} \frac{1}{n}\sup_{\mathcal{L}^{(n)}} I(A'\rangle B')_{\mathcal{L}^{(n)}(\rho^{\otimes n})}, \tag{13.2.13}$$
where the optimization is with respect to (two-way) LOCC channels $\mathcal{L}^{(n)}_{A^nB^n\to A'B'}$.
Furthermore, the Rains relative entropy $R(A;B)_\rho$ from (9.3.4) is a strong
converse rate for distillable entanglement, in the sense that
$$\widetilde{E}_D(A;B)_\rho \leq R(A;B)_\rho, \tag{13.2.14}$$
and the squashed entanglement from (9.4.1) is a weak converse rate, in the
sense that
$$E_D(A;B)_\rho \leq E_{\text{sq}}(A;B)_\rho. \tag{13.2.15}$$


If we define
$$D^{\leftrightarrow}(\rho_{AB}) \equiv D^{\leftrightarrow}(A;B)_\rho := \sup_{\mathcal{L}} I(A'\rangle B')_{\mathcal{L}(\rho)}, \tag{13.2.16}$$
then we can write (13.2.13) as
$$E_D(A;B)_\rho = \lim_{n\to\infty} \frac{1}{n}D^{\leftrightarrow}(\rho_{AB}^{\otimes n}) =: D^{\leftrightarrow}_{\text{reg}}(\rho_{AB}), \tag{13.2.17}$$
so that the distillable entanglement can be viewed as the regularized version of $D^{\leftrightarrow}$,
and it is reminiscent of the regularized Holevo information that gives the classical
capacity of a quantum channel.
Let us make the following observations about Theorem 13.19.
• The coherent information of a bipartite state 𝜌 𝐴𝐵 is an achievable rate for
entanglement distillation, i.e.,
𝐸 𝐷 ( 𝐴; 𝐵) 𝜌 ≥ 𝐼 ( 𝐴⟩𝐵) 𝜌 = 𝐻 (𝐵) 𝜌 − 𝐻 ( 𝐴𝐵) 𝜌 . (13.2.18)
This follows immediately from (13.2.13) by dropping the optimization over
two-way LOCC channels and due to the fact that the coherent information is
additive for product states, meaning that 𝐼 ( 𝐴𝑛 ⟩𝐵𝑛 ) 𝜌 ⊗𝑛 = 𝑛𝐼 ( 𝐴⟩𝐵) 𝜌 . As we
show in Section 13.2.1 below, the strategy to attain the coherent information
rate is essentially the one-way entanglement distillation protocol considered in
Section 13.1.2 for the one-shot lower bound, and it is sometimes called the
“hashing protocol.” For this reason, the inequality in (13.2.18) is known as
the hashing bound (please consult the Bibliographic Notes in Section 13.5 for
pointers to the research literature).
• In order to obtain a higher entanglement distillation rate than $I(A\rangle B)_\rho$, one
strategy is to use $n \geq 2$ copies of $\rho_{AB}$ along with a two-way LOCC channel
$\mathcal{L}_{A^nB^n\to A'B'}$ in order to obtain a state $\omega_{A'B'} := \mathcal{L}_{A^nB^n\to A'B'}(\rho_{AB}^{\otimes n})$ whose
coherent information is potentially larger than that of $\rho_{AB}$. Then, we can apply
the hashing protocol to the state $\omega_{A'B'}$. The overall rate of this strategy (the
two-way LOCC channel followed by the hashing protocol) is then $\frac{1}{n}I(A'\rangle B')_\omega$,
and Theorem 13.19 tells us that such a strategy is optimal in the large $n$ limit.
With increasingly more copies of $\rho_{AB}$ to start with, it might be possible to
obtain a better rate, which is why we need to regularize in general.
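As a small illustration of the hashing bound (13.2.18), the sketch below evaluates the coherent information for a two-qubit Bell-diagonal state, i.e., a mixture of the four Bell states with weights `probs`. This family is chosen here for illustration and is an assumption, not an example from the text: for such a state the marginal on $B$ is maximally mixed, so $H(B)_\rho = 1$, and $H(AB)_\rho$ is the Shannon entropy of the weights. The function names are hypothetical.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits, with the convention 0 log2 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def hashing_rate_bell_diagonal(probs):
    """I(A>B) = H(B) - H(AB) = 1 - H(probs) for a two-qubit Bell-diagonal state."""
    if len(probs) != 4 or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("need four Bell-state weights summing to one")
    return 1.0 - shannon_entropy(probs)

# Hypothetical example: a noisy Bell state with 90% weight on one Bell state.
noisy_rate = hashing_rate_bell_diagonal([0.9, 0.1 / 3, 0.1 / 3, 0.1 / 3])
```

When the weights are too spread out the value becomes negative, in which case (13.2.18) is trivial and preprocessing with two-way LOCC, as described in the second observation above, is the natural thing to try.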
As with the proof of the entanglement-assisted classical capacity and classical
capacity theorems in Chapters 11 and 12, respectively, we prove Theorem 13.19 in
two steps:

1. Achievability: We show that the right-hand side of (13.2.13) is an achievable
rate for entanglement distillation for $\rho_{AB}$. Doing so involves exhibiting an
explicit entanglement distillation protocol. The protocol we use is based on
the one we used in Section 13.1.2 to obtain a lower bound on the one-shot
distillable entanglement.
The achievability part of the proof establishes that
$$E_D(A;B)_\rho \geq \lim_{n\to\infty} \frac{1}{n}\sup_{\mathcal{L}} I(A'\rangle B')_{\mathcal{L}(\rho^{\otimes n})}. \tag{13.2.19}$$

2. Weak converse: We show that the right-hand side of (13.2.13) is a weak
converse rate for entanglement distillation for $\rho_{AB}$, from which it follows that
$E_D(A;B)_\rho \leq \lim_{n\to\infty} \frac{1}{n}\sup_{\mathcal{L}} I(A'\rangle B')_{\mathcal{L}(\rho^{\otimes n})}$. In order to show this, we use the
one-shot upper bounds from Section 13.1.1 to prove that every achievable rate
$R$ satisfies $R \leq \lim_{n\to\infty} \frac{1}{n}\sup_{\mathcal{L}} I(A'\rangle B')_{\mathcal{L}(\rho^{\otimes n})}$.
We go through the achievability part of the proof of Theorem 13.19 in Sec-
tion 13.2.1. We then proceed with the weak converse part in Section 13.2.2.
The expression in (13.2.13) for the distillable entanglement involves both a limit
over an unbounded number of copies of the state 𝜌 𝐴𝐵 , as well as an optimization
over all two-way LOCC channels. Computing the distillable entanglement is
therefore intractable in general. After establishing a proof of (13.2.13), we proceed
to establish upper bounds on distillable entanglement that depend only on the
given state 𝜌 𝐴𝐵 . Specifically, in Section 13.2.3, we use the one-shot results in
Section 13.1.1 to show that the Rains relative entropy is a strong converse rate for
entanglement distillation. We also show that the squashed entanglement is a weak
converse rate for entanglement distillation.

13.2.0.1 Bound Entanglement

The inequality in (13.2.14) implies that $E_D(A;B)_\rho \leq 0$ for every PPT state $\rho_{AB}$,
because, by definition, the Rains relative entropy vanishes for all PPT states. On
the other hand, we always have $E_D(A;B)_\rho \geq 0$ for every state $\rho_{AB}$. Therefore,
$$E_D(A;B)_\rho = 0 \quad \text{for all PPT states.} \tag{13.2.20}$$
Recall from the discussion after Lemma 13.5 (see also Section 3.2.9) that there exist
PPT entangled states in higher dimensional bipartite systems because $\text{SEP}(A:B) \neq \text{PPT}(A:B)$
except for when $A$ and $B$ are both qubits or when one is a qubit and the
other is a qutrit. All of these PPT entangled states have zero distillable entanglement, and
thus we refer to them as bound entangled. Remarkably, therefore, except for qubit-qubit
and qubit-qutrit states, prior entanglement is only necessary, but not sufficient,
for distilling pure maximally entangled states. Please consult the Bibliographic
Notes in Section 13.5 for more information about bound entanglement.

[Figure 13.3 panels: (a) Qubit-qubit and qubit-qutrit states, with regions SEP = PPT and Entangled; (b) General high-dimensional states, with regions SEP, PPT Entangled (Bound Entangled), and Entangled. Legend: Distillable, Non-Distillable.]

Figure 13.3: The set of all bipartite states can be split into distillable and
non-distillable sets. (a) For qubit-qubit and qubit-qutrit states entanglement
and distillability are in one-to-one correspondence because all PPT states are
separable. (b) In higher dimensions, there are entangled states belonging to the
set PPT, which we call bound entangled states. Distillability and entanglement
are thus not synonymous in general for high-dimensional quantum systems.
As shown in Figure 13.3, we can use entanglement distillation to split up the
set of all bipartite states into distillable and non-distillable states. For two-qubit
states and qubit-qutrit states, non-distillable states are exactly equal to the set of
separable states by the PPT criterion. For higher dimensions, as stated above, this
is not the case. Also in higher dimensions, it might be possible to have states
with negative partial transpose (NPT) that are nonetheless non-distillable. These
hypothetical NPT bound entangled states are shown in Figure 13.3(b) as the region between the
PPT bound entangled states and the distillable entangled states. It is not known
whether NPT bound entangled states exist, but since they have not been ruled out,
we nevertheless depict them in the figure.
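For the qubit-qubit case in Figure 13.3(a), the PPT test can be carried out in closed form on a simple one-parameter family. The sketch below assumes the two-qubit isotropic state $\rho_p = p\,\Phi + (1-p)\,\mathbb{1}/4$, which is chosen here for illustration and does not appear in the text. Since the partial transpose of $\Phi$ is half the swap operator, with eigenvalues $+\frac{1}{2}$ (three times) and $-\frac{1}{2}$ (once), the partial transpose of $\rho_p$ has eigenvalues $(1+p)/4$ (three times) and $(1-3p)/4$ (once). The function names are hypothetical.

```python
def partial_transpose_eigenvalues(p: float):
    """Eigenvalues of the partial transpose of p*Phi + (1-p)*I/4 on two qubits."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return [(1.0 + p) / 4.0] * 3 + [(1.0 - 3.0 * p) / 4.0]

def is_ppt(p: float, tol: float = 1e-12) -> bool:
    """True if the partial transpose is positive semi-definite (up to tol)."""
    return min(partial_transpose_eigenvalues(p)) >= -tol

# For two qubits, PPT coincides with separability and hence non-distillability,
# so this family is non-distillable exactly when p <= 1/3.
threshold_is_ppt = is_ppt(1.0 / 3.0)
```

In higher dimensions the same test only certifies non-distillability when it succeeds; a negative eigenvalue no longer settles whether the state is distillable, which is the open question about NPT bound entanglement mentioned above.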


13.2.1 Proof of Achievability

As the first step in proving the achievability part of Theorem 13.19, let us recall
Corollary 13.12: given a bipartite state $\rho_{AB}$ with purification $\psi_{ABE}$, for all $\varepsilon \in (0,1)$,
$\eta \in [0,\sqrt{\varepsilon})$, and $\alpha > 1$, there exists a $(d,\varepsilon)$ entanglement distillation protocol for
$\rho_{AB}$ such that
$$\log_2 d \geq \widetilde{H}_\alpha(A|E)_\psi - \frac{1}{\alpha-1}\log_2\frac{1}{(\sqrt{\varepsilon}-\eta)^2} - \log_2\frac{1}{1-(\sqrt{\varepsilon}-\eta)^2} + 4\log_2\eta, \tag{13.2.21}$$
where
$$\widetilde{H}_\alpha(A|E)_\psi = -\inf_{\sigma_E} \widetilde{D}_\alpha(\psi_{AE}\Vert \mathbb{1}_A\otimes\sigma_E) \tag{13.2.22}$$
is the sandwiched Rényi conditional entropy. Applying this inequality to the state
$\rho_{AB}^{\otimes n}$ for all $n \geq 1$ leads to the following:

Proposition 13.20
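For every state $\rho_{AB}$ and $\varepsilon \in (0,1)$, there exists an $(n, d, \varepsilon)$ entanglement
distillation protocol for $\rho_{AB}$ such that the rate $\frac{\log_2 d}{n}$ satisfies
$$\frac{\log_2 d}{n} \geq \widetilde{H}_\alpha(A|E)_\psi - \frac{2\alpha-1}{n(\alpha-1)}\log_2\frac{4}{\varepsilon} - \frac{1}{n}\log_2\frac{1}{1-\frac{\varepsilon}{4}}, \tag{13.2.23}$$
for all $n \geq 1$ and $\alpha > 1$, where $\psi_{ABE}$ is a purification of $\rho_{AB}$. In general,
$$E^{n,\varepsilon}_D(A;B)_\rho \geq \sup_{\mathcal{L}} \frac{1}{n}\widetilde{H}_\alpha(A'|E')_\phi - \frac{2\alpha-1}{n(\alpha-1)}\log_2\frac{4}{\varepsilon} - \frac{1}{n}\log_2\frac{1}{1-\frac{\varepsilon}{4}}, \tag{13.2.24}$$
for all $n \geq 1$ and $\alpha > 1$, where the optimization is over every LOCC channel
$\mathcal{L}_{A^nB^n\to A'B'}$, such that $\phi_{A'B'E'}$ is a purification of $\mathcal{L}_{A^nB^n\to A'B'}(\rho_{AB}^{\otimes n})$.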

Proof: Let $\psi_{ABE}$ be a purification of $\rho_{AB}$, and use the tensor-product purification
$\psi_{ABE}^{\otimes n}$ for $\rho_{AB}^{\otimes n}$. Also, let $\eta = \frac{\sqrt{\varepsilon}}{2}$. Substituting all of this into the inequality in
(13.2.21) and simplifying leads to
\[
\frac{\log_2 d}{n} \geq \frac{1}{n}\widetilde{H}_\alpha(A^n|E^n)_{\psi^{\otimes n}} - \frac{1}{n(\alpha-1)}\log_2\!\left(\frac{4}{\varepsilon}\right) - \frac{1}{n}\log_2\!\left(\frac{1}{1-\frac{\varepsilon}{4}}\right) - \frac{2}{n}\log_2\!\left(\frac{4}{\varepsilon}\right). \tag{13.2.25}
\]
Then, optimizing over every LOCC channel $\mathcal{L}_{A^nB^n\to A'B'}$, and using the definition
of $E_D^{n,\varepsilon}(A;B)_\rho$ in (13.2.4), we obtain (13.2.24).
By employing additivity of the sandwiched Rényi conditional entropy for all
$\alpha > 1$, we have that
\[
\widetilde{H}_\alpha(A^n|E^n)_{\psi^{\otimes n}} = n\widetilde{H}_\alpha(A|E)_\psi. \tag{13.2.26}
\]
Note that the proof of additivity follows similarly to the proof of Proposition 11.21.
This leads to (13.2.23). ■
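To get a feel for how quickly the correction terms in (13.2.23) vanish with the number of copies, the following sketch evaluates them numerically. The function name and the sample values of α and ε are illustrative assumptions and are not taken from the text; only the formula itself comes from (13.2.23).

```python
import math

def achievability_penalty(n: int, alpha: float, eps: float) -> float:
    """Total amount subtracted from the Renyi conditional entropy in (13.2.23)."""
    if n < 1 or alpha <= 1 or not (0 < eps < 1):
        raise ValueError("need n >= 1, alpha > 1, and 0 < eps < 1")
    # Term proportional to (2*alpha - 1) / (alpha - 1).
    term_alpha = (2 * alpha - 1) / (n * (alpha - 1)) * math.log2(4 / eps)
    # Term that depends only on eps.
    term_eps = (1 / n) * math.log2(1 / (1 - eps / 4))
    return term_alpha + term_eps

# Illustrative values (assumptions): alpha = 1.5, eps = 0.01.
penalties = {n: achievability_penalty(n, 1.5, 0.01) for n in (10, 100, 1000, 10000)}
for n, p in penalties.items():
    print(f"n = {n:6d}: penalty = {p:.6f} bits per copy")
```

For fixed α and ε the penalty scales as 1/n, which is why it disappears in the asymptotic limit used in the proof of Theorem 13.21.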

With the inequality in (13.2.23), we can prove that the coherent information is
an achievable rate for entanglement distillation.

Theorem 13.21 Achievability of Coherent Information for Entanglement Distillation
The coherent information 𝐼 ( 𝐴⟩𝐵) 𝜌 of a bipartite state 𝜌 𝐴𝐵 is an achievable rate
for entanglement distillation for 𝜌 𝐴𝐵 . In other words, 𝐸 𝐷 ( 𝐴; 𝐵) 𝜌 ≥ 𝐼 ( 𝐴⟩𝐵) 𝜌
for every bipartite state 𝜌 𝐴𝐵 .

Proof: Let 𝜓 𝐴𝐵𝐸 be a purification of 𝜌 𝐴𝐵 . Fix 𝜀 ∈ (0, 1] and 𝛿 > 0. Let 𝛿1 , 𝛿2 > 0
be such that
𝛿 = 𝛿1 + 𝛿2 . (13.2.27)
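Set $\alpha \in (1, \infty)$ such that
\[
\delta_1 \geq I(A\rangle B)_\rho - \widetilde{H}_\alpha(A|E)_\psi. \tag{13.2.28}
\]
Note that this is possible because $\widetilde{H}_\alpha(A|E)_\psi$ increases monotonically with decreasing
$\alpha$ (this follows from Proposition 7.23), so that
\[
\lim_{\alpha\to 1^+} \widetilde{H}_\alpha(A|E)_\psi = \sup_{\alpha\in(1,\infty)} \widetilde{H}_\alpha(A|E)_\psi. \tag{13.2.29}
\]
Also,
\begin{align}
\lim_{\alpha\to 1^+} \widetilde{H}_\alpha(A|E)_\psi &= \sup_{\alpha\in(1,\infty)} \widetilde{H}_\alpha(A|E)_\psi \tag{13.2.30}\\
&= \sup_{\alpha\in(1,\infty)} \left(-\inf_{\sigma_E} \widetilde{D}_\alpha(\psi_{AE} \Vert \mathbb{1}_A \otimes \sigma_E)\right) \tag{13.2.31}\\
&= -\inf_{\alpha\in(1,\infty)} \inf_{\sigma_E} \widetilde{D}_\alpha(\psi_{AE} \Vert \mathbb{1}_A \otimes \sigma_E) \tag{13.2.32}\\
&= -\inf_{\sigma_E} \inf_{\alpha\in(1,\infty)} \widetilde{D}_\alpha(\psi_{AE} \Vert \mathbb{1}_A \otimes \sigma_E) \tag{13.2.33}\\
&= -\inf_{\sigma_E} D(\psi_{AE} \Vert \mathbb{1}_A \otimes \sigma_E) \tag{13.2.34}\\
&= H(A|E)_\psi, \tag{13.2.35}
\end{align}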

where the fifth equality follows from Proposition 7.30. Then, by definition of
conditional entropy, and the fact that 𝜓 𝐴𝐵𝐸 is a pure state, we find that

𝐻 ( 𝐴|𝐸)𝜓 = 𝐻 ( 𝐴𝐸)𝜓 − 𝐻 (𝐸)𝜓 = 𝐻 (𝐵)𝜓 − 𝐻 ( 𝐴𝐵)𝜓 = 𝐼 ( 𝐴⟩𝐵) 𝜌 . (13.2.36)

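With $\alpha \in (1, \infty)$ chosen such that (13.2.28) holds, take $n$ large enough so that
\[
\delta_2 \geq \frac{2\alpha-1}{n(\alpha-1)}\log_2\!\left(\frac{4}{\varepsilon}\right) + \frac{1}{n}\log_2\!\left(\frac{1}{1-\frac{\varepsilon}{4}}\right). \tag{13.2.37}
\]
Now, we use the fact that for the $n$ and $\varepsilon$ chosen above there exists an $(n, d, \varepsilon)$
protocol such that
\[
\frac{\log_2 d}{n} \geq \widetilde{H}_\alpha(A|E)_\psi - \frac{2\alpha-1}{n(\alpha-1)}\log_2\!\left(\frac{4}{\varepsilon}\right) - \frac{1}{n}\log_2\!\left(\frac{1}{1-\frac{\varepsilon}{4}}\right). \tag{13.2.38}
\]
(This follows from Proposition 13.20 above.) Rearranging the right-hand side of
this inequality, and using (13.2.27), (13.2.28), and (13.2.37), we find that
\begin{align}
\frac{\log_2 d}{n} &\geq I(A\rangle B)_\rho - \left(I(A\rangle B)_\rho - \widetilde{H}_\alpha(A|E)_\psi + \frac{2\alpha-1}{n(\alpha-1)}\log_2\!\left(\frac{4}{\varepsilon}\right) + \frac{1}{n}\log_2\!\left(\frac{1}{1-\frac{\varepsilon}{4}}\right)\right) \tag{13.2.39}\\
&\geq I(A\rangle B)_\rho - (\delta_1 + \delta_2) \tag{13.2.40}\\
&= I(A\rangle B)_\rho - \delta. \tag{13.2.41}
\end{align}
We thus have that there exists an $(n, d, \varepsilon)$ entanglement distillation protocol with
rate $\frac{\log_2 d}{n} \geq I(A\rangle B)_\rho - \delta$. Therefore, there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ entanglement
distillation protocol with $R = I(A\rangle B)_\rho$ for all sufficiently large $n$ such that (13.2.37)
holds. Since $\varepsilon$ and $\delta$ are arbitrary, we conclude that for all $\varepsilon \in (0, 1]$, $\delta > 0$, and
sufficiently large $n$, there exists an $(n, 2^{n(I(A\rangle B)_\rho-\delta)}, \varepsilon)$ entanglement distillation
protocol. This means that, by definition, $I(A\rangle B)_\rho$ is an achievable rate. ■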
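The phrase "take n large enough so that (13.2.37) holds" can be made concrete. The sketch below computes the smallest such n for given δ2, α, and ε. The function names and the numerical values are hypothetical choices made for illustration; the only input from the text is the right-hand side of (13.2.37).

```python
import math

def rhs_13_2_37(n: int, alpha: float, eps: float) -> float:
    """Right-hand side of (13.2.37) for a given number of copies n."""
    return ((2 * alpha - 1) / (n * (alpha - 1)) * math.log2(4 / eps)
            + (1 / n) * math.log2(1 / (1 - eps / 4)))

def min_copies(delta2: float, alpha: float, eps: float) -> int:
    """Smallest n >= 1 such that delta2 >= rhs_13_2_37(n, alpha, eps)."""
    if delta2 <= 0 or alpha <= 1 or not (0 < eps <= 1):
        raise ValueError("need delta2 > 0, alpha > 1, and 0 < eps <= 1")
    # The right-hand side equals c / n, where c does not depend on n.
    c = rhs_13_2_37(1, alpha, eps)
    n = max(1, math.ceil(c / delta2))
    # Guard against floating-point rounding at the boundary.
    while rhs_13_2_37(n, alpha, eps) > delta2:
        n += 1
    return n

# Illustrative values (assumptions): delta2 = 0.01, alpha = 1.1, eps = 0.05.
n_needed = min_copies(0.01, 1.1, 0.05)
print("copies needed:", n_needed)
```

Note the trade-off visible in the constant c: choosing α closer to 1 shrinks the gap in (13.2.28) but inflates the factor (2α − 1)/(α − 1), so more copies are needed.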

Proof of the Achievability Part of Theorem 13.19

Let $\mathcal{L}_{A^kB^k\to A'B'}$ be an arbitrary LOCC channel with $k \geq 1$, let
\[
\omega_{A'B'} = \mathcal{L}_{A^kB^k\to A'B'}(\rho_{AB}^{\otimes k}), \tag{13.2.42}
\]
and let 𝜙 𝐴′ 𝐵′ 𝐸 ′ be a purification of 𝜔 𝐴′ 𝐵′ . Fix 𝜀 ∈ (0, 1] and 𝛿 > 0. Let 𝛿1 , 𝛿2 > 0
be such that
𝛿 = 𝛿1 + 𝛿2 . (13.2.43)
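Set $\alpha \in (1, \infty)$ such that
\[
\delta_1 \geq \frac{1}{k}I(A'\rangle B')_\omega - \frac{1}{k}\widetilde{H}_\alpha(A'|E')_\phi, \tag{13.2.44}
\]
which is possible based on the arguments given in the proof of Theorem 13.21
above. Then, with this choice of $\alpha$, take $n$ large enough so that
\[
\delta_2 \geq \frac{2\alpha-1}{kn(\alpha-1)}\log_2\!\left(\frac{4}{\varepsilon}\right) + \frac{1}{kn}\log_2\!\left(\frac{1}{1-\frac{\varepsilon}{4}}\right). \tag{13.2.45}
\]
Now, we use the fact that, for the chosen $n$ and $\varepsilon$, there exists an $(n, d, \varepsilon)$
entanglement distillation protocol such that (13.2.23) holds, i.e.,
\[
\frac{\log_2 d}{n} \geq \widetilde{H}_\alpha(A'|E')_\phi - \frac{2\alpha-1}{n(\alpha-1)}\log_2\!\left(\frac{4}{\varepsilon}\right) - \frac{1}{n}\log_2\!\left(\frac{1}{1-\frac{\varepsilon}{4}}\right). \tag{13.2.46}
\]
Dividing both sides by $k$ gives
\[
\frac{\log_2 d}{kn} \geq \frac{1}{k}\widetilde{H}_\alpha(A'|E')_\phi - \frac{2\alpha-1}{kn(\alpha-1)}\log_2\!\left(\frac{4}{\varepsilon}\right) - \frac{1}{kn}\log_2\!\left(\frac{1}{1-\frac{\varepsilon}{4}}\right). \tag{13.2.47}
\]
Rearranging the right-hand side of this inequality, and using (13.2.43)–(13.2.45),
we find that
\begin{align}
\frac{\log_2 d}{kn} &\geq \frac{1}{k}I(A'\rangle B')_\omega - \left(\frac{1}{k}I(A'\rangle B')_\omega - \frac{1}{k}\widetilde{H}_\alpha(A'|E')_\phi + \frac{2\alpha-1}{kn(\alpha-1)}\log_2\!\left(\frac{4}{\varepsilon}\right) + \frac{1}{kn}\log_2\!\left(\frac{1}{1-\frac{\varepsilon}{4}}\right)\right) \tag{13.2.48}\\
&\geq \frac{1}{k}I(A'\rangle B')_\omega - (\delta_1 + \delta_2) \tag{13.2.49}\\
&= \frac{1}{k}I(A'\rangle B')_\omega - \delta. \tag{13.2.50}
\end{align}
Thus, there exists a $(kn, d, \varepsilon)$ entanglement distillation protocol with rate $\frac{\log_2 d}{kn} \geq
\frac{1}{k}I(A'\rangle B')_\omega - \delta$. Therefore, letting $n' \equiv kn$, there exists an $(n', 2^{n'(R-\delta)}, \varepsilon)$ entanglement
distillation protocol with $R = \frac{1}{k}I(A'\rangle B')_\omega$ for all sufficiently large $n$ such
that (13.2.45) holds. Since $\varepsilon$ and $\delta$ are arbitrary, we conclude that for all $\varepsilon \in (0, 1]$,
$\delta > 0$, and sufficiently large $n$, there exists an $(n, 2^{n(\frac{1}{k}I(A'\rangle B')_\omega-\delta)}, \varepsilon)$ entanglement
distillation protocol. This means that $\frac{1}{k}I(A'\rangle B')_\omega$ is an achievable rate. (Recall that
$\omega_{A'B'} = \mathcal{L}_{A^kB^k\to A'B'}(\rho_{AB}^{\otimes k})$.)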
Now, since in the arguments above the LOCC channel $\mathcal{L}_{A^kB^k\to A'B'}$ is arbitrary,
we conclude that
\[
\frac{1}{k}\sup_{\mathcal{L}} I(A'\rangle B')_{\mathcal{L}(\rho^{\otimes k})} \tag{13.2.51}
\]
is an achievable rate. Finally, since the number $k$ of copies of $\rho_{AB}$ is arbitrary, we
conclude that
\[
\lim_{k\to\infty}\frac{1}{k}\sup_{\mathcal{L}} I(A'\rangle B')_{\mathcal{L}(\rho^{\otimes k})} \tag{13.2.52}
\]
is an achievable rate.

13.2.2 Proof of the Weak Converse

In order to prove the weak converse part of Theorem 13.19, we make use of
Corollary 13.8, specifically (13.1.44): given a bipartite state $\rho_{AB}$, for every $(d, \varepsilon)$
entanglement distillation protocol for $\rho_{AB}$, with $\varepsilon \in [0, 1/2)$, it holds that
\[
\log_2 d \leq \frac{1}{1-2\varepsilon}\left(\sup_{\mathcal{L}} I(A'\rangle B')_{\mathcal{L}(\rho)} + h_2(\varepsilon)\right), \tag{13.2.53}
\]
where the optimization is with respect to every LOCC channel $\mathcal{L}_{AB\to A'B'}$. Applying
this inequality to the state $\rho_{AB}^{\otimes n}$ immediately leads to the following.


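Proposition 13.22
Let $\rho_{AB}$ be a bipartite state, and let $n \geq 1$ and $\varepsilon \in [0, 1/2)$. For an $(n, d, \varepsilon)$
entanglement distillation protocol for $\rho_{AB}$ with corresponding LOCC channel
$\mathcal{L}_{A^nB^n\to A'B'}$, the rate $\frac{\log_2 d}{n}$ satisfies
\[
\frac{\log_2 d}{n} \leq \frac{1}{1-2\varepsilon}\left(\sup_{\mathcal{L}}\frac{1}{n}I(A'\rangle B')_{\mathcal{L}(\rho^{\otimes n})} + \frac{1}{n}h_2(\varepsilon)\right). \tag{13.2.54}
\]
Consequently,
\[
E_D^{n,\varepsilon}(A;B)_\rho \leq \frac{1}{1-2\varepsilon}\left(\frac{1}{n}\sup_{\mathcal{L}}I(A'\rangle B')_{\mathcal{L}(\rho^{\otimes n})} + \frac{1}{n}h_2(\varepsilon)\right), \tag{13.2.55}
\]
where the optimization is over every LOCC channel $\mathcal{L}_{A^nB^n\to A'B'}$.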

Proof: The inequality in (13.2.54) is immediate from (13.1.44) in Corollary 13.8
by applying that inequality to the state $\rho_{AB}^{\otimes n}$ and dividing both sides by $n$. The
inequality in (13.2.55) follows immediately by definition of $E_D^{n,\varepsilon}$ in (13.2.4). ■
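The structure of the bound in (13.2.54) can be explored numerically. In the sketch below, the optimized coherent information per copy is not computed (that optimization is intractable in general) but is passed in as a number; this input, the function names, and the sample values are assumptions made purely for illustration.

```python
import math

def h2(eps: float) -> float:
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if eps <= 0.0 or eps >= 1.0:
        return 0.0
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)

def weak_converse_rate_bound(coh_info_per_copy: float, n: int, eps: float) -> float:
    """Right-hand side of (13.2.54), given (1/n) sup_L I(A'>B') as a number."""
    if n < 1 or not (0 <= eps < 0.5):
        raise ValueError("need n >= 1 and 0 <= eps < 1/2")
    return (coh_info_per_copy + h2(eps) / n) / (1 - 2 * eps)

# Illustrative values (assumption): per-copy coherent information of 0.5 bits.
for n in (1, 10, 1000):
    print(n, weak_converse_rate_bound(0.5, n, 0.1))
```

The factor 1/(1 − 2ε) does not vanish as n grows, which is why this argument yields only a weak converse: the bound becomes tight only after ε is also sent to zero.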

Proof of the Weak Converse Part of Theorem 13.19

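Suppose that $R$ is an achievable rate for entanglement distillation for the bipartite
state $\rho_{AB}$. Then, by definition, for all $\varepsilon \in (0, 1]$, $\delta > 0$, and sufficiently large $n$,
there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ entanglement distillation protocol for $\rho_{AB}$. For all
such protocols for which $\varepsilon \in (0, 1/2)$, the inequality in (13.2.54) holds, so that
\[
R - \delta \leq \frac{1}{1-2\varepsilon}\left(\frac{1}{n}\sup_{\mathcal{L}}I(A'\rangle B')_{\mathcal{L}(\rho^{\otimes n})} + \frac{1}{n}h_2(\varepsilon)\right). \tag{13.2.56}
\]
Since the inequality holds for all sufficiently large $n$, it holds in the limit $n \to \infty$,
so that
\begin{align}
R &\leq \lim_{n\to\infty}\frac{1}{1-2\varepsilon}\left(\frac{1}{n}\sup_{\mathcal{L}}I(A'\rangle B')_{\mathcal{L}(\rho^{\otimes n})} + \frac{1}{n}h_2(\varepsilon)\right) + \delta \tag{13.2.57}\\
&= \frac{1}{1-2\varepsilon}\lim_{n\to\infty}\frac{1}{n}\sup_{\mathcal{L}}I(A'\rangle B')_{\mathcal{L}(\rho^{\otimes n})} + \delta. \tag{13.2.58}
\end{align}
Then, since this inequality holds for all $\varepsilon \in (0, 1/2)$, $\delta > 0$, we conclude that
\begin{align}
R &\leq \lim_{\varepsilon,\delta\to 0}\left(\frac{1}{1-2\varepsilon}\lim_{n\to\infty}\frac{1}{n}\sup_{\mathcal{L}}I(A'\rangle B')_{\mathcal{L}(\rho^{\otimes n})} + \delta\right) \tag{13.2.59}\\
&= \lim_{n\to\infty}\frac{1}{n}\sup_{\mathcal{L}}I(A'\rangle B')_{\mathcal{L}(\rho^{\otimes n})}. \tag{13.2.60}
\end{align}
We have thus shown that the quantity $\lim_{n\to\infty}\frac{1}{n}\sup_{\mathcal{L}}I(A'\rangle B')_{\mathcal{L}(\rho^{\otimes n})}$ is a weak
converse rate for entanglement distillation for $\rho_{AB}$.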

13.2.3 Rains Relative Entropy Strong Converse Upper Bound

As stated previously, the expression in (13.2.13) for distillable entanglement involves
both a limit over an unbounded number of copies of the initial state $\rho_{AB}$, as well
as an optimization over all two-way LOCC channels. Computing the distillable
entanglement is therefore intractable in general. In this section, we use the one-shot
upper bound established in Section 13.1.1 to show that the Rains relative entropy is
a strong converse upper bound on the distillable entanglement of a bipartite state.
We start by recalling the upper bound in (13.1.45), which tells us that
\[
\log_2 d \leq \widetilde{R}_\alpha(A;B)_\rho + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\,\alpha > 1, \tag{13.2.61}
\]
for an arbitrary $(d, \varepsilon)$ entanglement distillation protocol, where $\varepsilon \in (0, 1)$. Recall
that
\[
\widetilde{R}_\alpha(A;B)_\rho = \inf_{\sigma_{AB}\in\mathrm{PPT}'(A:B)} \widetilde{D}_\alpha(\rho_{AB}\Vert\sigma_{AB}). \tag{13.2.62}
\]

The upper bound above is a consequence of the fact that PPT′ operators are useless
for entanglement distillation, in the sense that for every $\sigma_{AB} \in \mathrm{PPT}'(A:B)$, the
bound $\operatorname{Tr}[\Phi_{AB}\sigma_{AB}] \leq \frac{1}{d}$ holds.
Applying the upper bound in (13.2.61) to the state $\rho_{AB}^{\otimes n}$ leads to the following
result:


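Corollary 13.23
Let $\rho_{AB}$ be a bipartite state, let $n \geq 1$, $\varepsilon \in [0, 1)$, and $\alpha > 1$. For an $(n, d, \varepsilon)$
entanglement distillation protocol, the following bound holds:
\[
\frac{\log_2 d}{n} \leq \widetilde{R}_\alpha(A;B)_\rho + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{13.2.63}
\]
Consequently,
\[
E_D^{n,\varepsilon}(A;B)_\rho \leq \widetilde{R}_\alpha(A;B)_\rho + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{13.2.64}
\]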

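Proof: An $(n, d, \varepsilon)$ entanglement distillation protocol for $\rho_{AB}$ is a $(d, \varepsilon)$ entanglement
distillation protocol for $\rho_{AB}^{\otimes n}$. Therefore, applying the inequality in (13.2.61)
to the state $\rho_{AB}^{\otimes n}$ and dividing both sides by $n$ leads to
\[
\frac{\log_2 d}{n} \leq \frac{1}{n}\widetilde{R}_\alpha(A^n;B^n)_{\rho^{\otimes n}} + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{13.2.65}
\]
Now, by subadditivity of the sandwiched Rényi Rains relative entropy (see (9.3.18)),
we have that
\[
\widetilde{R}_\alpha(A^n;B^n)_{\rho^{\otimes n}} \leq n\widetilde{R}_\alpha(A;B)_\rho. \tag{13.2.66}
\]
Therefore,
\[
\frac{\log_2 d}{n} \leq \widetilde{R}_\alpha(A;B)_\rho + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right), \tag{13.2.67}
\]
as required. Since this inequality holds for all $(n, d, \varepsilon)$ protocols, we obtain (13.2.64)
by optimizing over all protocols $(d, \mathcal{L}_{AB\to\hat{A}\hat{B}})$, with $d_{\hat{A}} = d_{\hat{B}} = d \geq 1$. ■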

Given an $\varepsilon \in (0, 1)$, the inequality in (13.2.63) gives us a bound on the rate of
an arbitrary $(n, d, \varepsilon)$ entanglement distillation protocol for a state $\rho_{AB}$. If instead
we fix the rate to be $r$, so that $d = 2^{nr}$, then the inequality in (13.2.63) is as follows:
\[
r \leq \widetilde{R}_\alpha(A;B)_\rho + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \tag{13.2.68}
\]
for all $\alpha > 1$. Rearranging this inequality gives us the following lower bound on $\varepsilon$:
\[
\varepsilon \geq 1 - 2^{-n\left(\frac{\alpha-1}{\alpha}\right)\left(r-\widetilde{R}_\alpha(A;B)_\rho\right)} \tag{13.2.69}
\]
for all $\alpha > 1$.
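The lower bound in (13.2.69) is easy to tabulate. The following sketch does so for a hypothetical value of the sandwiched Rényi Rains relative entropy, which is supplied as a number rather than computed; the function name and all numerical inputs are illustrative assumptions.

```python
def error_lower_bound(n: int, r: float, renyi_rains: float, alpha: float) -> float:
    """Right-hand side of (13.2.69), clipped at zero because eps >= 0 always holds."""
    if n < 1 or alpha <= 1:
        raise ValueError("need n >= 1 and alpha > 1")
    exponent = n * ((alpha - 1) / alpha) * (r - renyi_rains)
    if exponent <= 0:
        # For rates not exceeding the Renyi-Rains quantity the bound is trivial.
        return 0.0
    return 1.0 - 2.0 ** (-exponent)

# Illustrative values (assumptions): rate 1.0, Renyi-Rains value 0.8, alpha = 2.
for n in (10, 100, 1000):
    print(n, error_lower_bound(n, 1.0, 0.8, 2.0))
```

When the rate exceeds the Rényi-Rains value, the bound approaches one exponentially fast in n, which is the quantitative content of the strong converse.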

Theorem 13.24 Strong Converse Upper Bound on Distillable Entanglement
Let $\rho_{AB}$ be a bipartite state. The Rains relative entropy $R(A;B)_\rho$ is a strong
converse rate for entanglement distillation for $\rho_{AB}$, i.e.,
\[
\widetilde{E}_D(A;B)_\rho \leq R(A;B)_\rho, \tag{13.2.70}
\]
where we recall that $R(A;B)_\rho$ is defined as
\[
R(A;B)_\rho = \inf_{\sigma_{AB}\in\mathrm{PPT}'(A:B)} D(\rho_{AB}\Vert\sigma_{AB}). \tag{13.2.71}
\]

Proof: Let $\varepsilon \in [0, 1)$ and $\delta > 0$. Let $\delta_1, \delta_2 > 0$ be such that
\[
\delta > \delta_1 + \delta_2 =: \delta'. \tag{13.2.72}
\]
Set $\alpha \in (1, \infty)$ such that
\[
\delta_1 \geq \widetilde{R}_\alpha(A;B)_\rho - R(A;B)_\rho, \tag{13.2.73}
\]
which is possible because $\widetilde{R}_\alpha(A;B)_\rho$ is monotonically increasing in $\alpha$ (which
follows from Proposition 7.31) and because $\lim_{\alpha\to 1^+}\widetilde{R}_\alpha(A;B)_\rho = R(A;B)_\rho$ (see
Appendix 10.A for a proof). With this value of $\alpha$, take $n$ large enough so that
\[
\delta_2 \geq \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{13.2.74}
\]
Now, with the values of $n$ and $\varepsilon$ as above, an arbitrary $(n, d, \varepsilon)$ entanglement
distillation protocol for $\rho_{AB}$ satisfies (13.2.63), i.e.,
\[
\frac{\log_2 d}{n} \leq \widetilde{R}_\alpha(A;B)_\rho + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \tag{13.2.75}
\]
for all $\alpha \in (1, \infty)$. Rearranging the right-hand side of this inequality, and using
(13.2.72)–(13.2.74), we obtain
\begin{align}
\frac{\log_2 d}{n} &\leq R(A;B)_\rho + \widetilde{R}_\alpha(A;B)_\rho - R(A;B)_\rho + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \tag{13.2.76}\\
&\leq R(A;B)_\rho + \delta_1 + \delta_2 \tag{13.2.77}\\
&= R(A;B)_\rho + \delta' \tag{13.2.78}\\
&< R(A;B)_\rho + \delta. \tag{13.2.79}
\end{align}
So we have that $\frac{\log_2 d}{n} < R(A;B)_\rho + \delta$ for all $(n, d, \varepsilon)$ entanglement distillation
protocols for $\rho_{AB}$ with sufficiently large $n$ such that (13.2.74) holds. Due to this strict
inequality, it follows that there cannot exist an $(n, 2^{n(R(A;B)_\rho+\delta)}, \varepsilon)$ entanglement
distillation protocol for $\rho_{AB}$ for all sufficiently large $n$ such that (13.2.74) holds.
For if it were to exist, there would be a $d \geq 1$ such that $\log_2 d = n(R(A;B)_\rho + \delta)$,
which we have just seen is not possible. Since $\varepsilon$ and $\delta$ are arbitrary, we conclude
that for all $\varepsilon \in [0, 1)$, $\delta > 0$, and sufficiently large $n$, there does not exist an
$(n, 2^{n(R(A;B)_\rho+\delta)}, \varepsilon)$ entanglement distillation protocol for $\rho_{AB}$. This means that
$R(A;B)_\rho$ is a strong converse rate, so that $\widetilde{E}_D(A;B)_\rho \leq R(A;B)_\rho$. ■

Given that the Rains relative entropy is a strong converse rate for distillable
entanglement, by following arguments analogous to those in the proof above, we
can conclude that $\frac{1}{k}R(A^k;B^k)_{\rho^{\otimes k}}$ is a strong converse rate for all $k \geq 2$. Therefore,
the regularized quantity
\[
R^{\mathrm{reg}}(A;B)_\rho := \lim_{n\to\infty}\frac{1}{n}R(A^n;B^n)_{\rho^{\otimes n}}, \tag{13.2.80}
\]
is a strong converse rate for entanglement distillation for $\rho_{AB}$, so that
\[
\widetilde{E}_D(A;B)_\rho \leq R^{\mathrm{reg}}(A;B)_\rho. \tag{13.2.81}
\]
By subadditivity of Rains relative entropy (see (9.3.18)),
\[
R^{\mathrm{reg}}(A;B)_\rho \leq R(A;B)_\rho, \tag{13.2.82}
\]
so that the regularized quantity in general gives a tighter upper bound on distillable
entanglement.

13.2.3.1 The Strong Converse from a Different Point of View

We now show that the Rains relative entropy is a strong converse rate using the
equivalent characterization of a strong converse rate in (13.2.9). In other words,
given a bipartite state $\rho_{AB}$, we show that for an arbitrary sequence $\{(n, 2^{nr}, \varepsilon_n)\}_{n\in\mathbb{N}}$
of $(n, d, \varepsilon)$ protocols with rates $r > R(A;B)_\rho$, it holds that $\lim_{n\to\infty}\varepsilon_n = 1$. Indeed,
for every element of the sequence, the inequality in (13.2.68) applies; namely,
\[
r \leq \widetilde{R}_\alpha(A;B)_\rho + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon_n}\right) \tag{13.2.83}
\]
for all $\alpha > 1$. Rearranging this inequality gives us the following lower bound on $\varepsilon_n$:
\[
\varepsilon_n \geq 1 - 2^{-n\left(\frac{\alpha-1}{\alpha}\right)\left(r-\widetilde{R}_\alpha(A;B)_\rho\right)} \tag{13.2.84}
\]
for all $\alpha > 1$. Now, since $r > R(A;B)_\rho$, $\lim_{\alpha\to 1^+}\widetilde{R}_\alpha(A;B)_\rho = R(A;B)_\rho$ (see
Appendix 10.A), and because $\widetilde{R}_\alpha(A;B)_\rho$ is monotonically increasing in $\alpha$ (this
follows from Proposition 7.31), there exists an $\alpha^* > 1$ such that $r > \widetilde{R}_{\alpha^*}(A;B)_\rho$.
Applying the inequality in (13.2.84) to this value of $\alpha$, we find that
\[
\varepsilon_n \geq 1 - 2^{-n\left(\frac{\alpha^*-1}{\alpha^*}\right)\left(r-\widetilde{R}_{\alpha^*}(A;B)_\rho\right)}. \tag{13.2.85}
\]
Then, taking the limit $n \to \infty$ on both sides of this inequality, we conclude that
$\lim_{n\to\infty}\varepsilon_n \geq 1$. But $\varepsilon_n \leq 1$ for all $n$ because $\varepsilon_n$ is a probability by definition. So
we obtain $\lim_{n\to\infty}\varepsilon_n = 1$. Since the rate $r > R(A;B)_\rho$ is arbitrary, we conclude
that $R(A;B)_\rho$ is a strong converse rate for entanglement distillation for $\rho_{AB}$. We
also see from (13.2.85) that the sequence $\{\varepsilon_n\}_{n\in\mathbb{N}}$ approaches one at an exponential
rate.

13.2.4 Squashed Entanglement Weak Converse Upper Bound

In this section, we establish the squashed entanglement of a bipartite state as a
weak converse upper bound on its distillable entanglement. The main idea is to
apply the one-shot bound from Theorem 13.9 and the additivity of the squashed
entanglement (Proposition 9.32) in order to arrive at this conclusion.

Corollary 13.25
Let $\rho_{AB}$ be a bipartite state, let $n \geq 1$, and let $\varepsilon \in [0, 1)$. For an $(n, d, \varepsilon)$
entanglement distillation protocol, the following bound holds:
\[
\frac{1}{n}\log_2 d \leq \frac{1}{1-\sqrt{\varepsilon}}\left(E_{\mathrm{sq}}(A;B)_\rho + \frac{g_2(\sqrt{\varepsilon})}{n}\right). \tag{13.2.86}
\]


Proof: An $(n, d, \varepsilon)$ entanglement distillation protocol for $\rho_{AB}$ is a $(d, \varepsilon)$ entanglement
distillation protocol for $\rho_{AB}^{\otimes n}$. Therefore, applying the inequality in Theorem 13.9 to
the state $\rho_{AB}^{\otimes n}$ and dividing both sides by $n$ leads to
\[
\frac{1}{n}\log_2 d \leq \frac{1}{1-\sqrt{\varepsilon}}\left(\frac{1}{n}E_{\mathrm{sq}}(A^n;B^n)_{\rho^{\otimes n}} + \frac{g_2(\sqrt{\varepsilon})}{n}\right). \tag{13.2.87}
\]
Now, by additivity of the squashed entanglement (see (9.4.8)), we have that
\[
E_{\mathrm{sq}}(A^n;B^n)_{\rho^{\otimes n}} = nE_{\mathrm{sq}}(A;B)_\rho. \tag{13.2.88}
\]
This concludes the proof. ■
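As a numerical illustration of (13.2.86), the sketch below evaluates its right-hand side. It assumes that $g_2$ denotes the function $g_2(x) = (x+1)\log_2(x+1) - x\log_2 x$, which is not restated in this section; the squashed entanglement is supplied as a number rather than computed, and the function names and sample values are hypothetical.

```python
import math

def g2(x: float) -> float:
    """Assumed definition: g2(x) = (x + 1) log2(x + 1) - x log2(x), with g2(0) = 0."""
    if x < 0:
        raise ValueError("need x >= 0")
    if x == 0:
        return 0.0
    return (x + 1) * math.log2(x + 1) - x * math.log2(x)

def squashed_rate_bound(e_sq: float, n: int, eps: float) -> float:
    """Right-hand side of (13.2.86), given the squashed entanglement as a number."""
    if n < 1 or not (0 <= eps < 1):
        raise ValueError("need n >= 1 and 0 <= eps < 1")
    s = math.sqrt(eps)
    return (e_sq + g2(s) / n) / (1 - s)

# Illustrative values (assumption): squashed entanglement of 1 bit, eps = 0.01.
for n in (10, 1000):
    print(n, squashed_rate_bound(1.0, n, 0.01))
```

As with the coherent information weak converse, the prefactor 1/(1 − √ε) survives the limit of large n, so the bound becomes tight only once ε is also taken to zero.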

We now provide a proof of (13.2.15), the statement that the squashed entanglement
is a weak converse rate for entanglement distillation. Suppose that $R$ is
an achievable rate for entanglement distillation for the bipartite state $\rho_{AB}$. Then,
by definition, for all $\varepsilon \in (0, 1]$, $\delta > 0$, and sufficiently large $n$, there exists an
$(n, 2^{n(R-\delta)}, \varepsilon)$ entanglement distillation protocol for $\rho_{AB}$. For all such protocols
for which $\varepsilon \in (0, 1)$, the inequality in (13.2.86) holds, so that
\[
R - \delta \leq \frac{1}{1-\sqrt{\varepsilon}}\left(E_{\mathrm{sq}}(A;B)_\rho + \frac{1}{n}g_2(\sqrt{\varepsilon})\right). \tag{13.2.89}
\]
Since the inequality holds for all sufficiently large $n$, it holds in the limit $n \to \infty$,
so that
\begin{align}
R &\leq \lim_{n\to\infty}\frac{1}{1-\sqrt{\varepsilon}}\left(E_{\mathrm{sq}}(A;B)_\rho + \frac{1}{n}g_2(\sqrt{\varepsilon})\right) + \delta \tag{13.2.90}\\
&= \frac{1}{1-\sqrt{\varepsilon}}E_{\mathrm{sq}}(A;B)_\rho + \delta. \tag{13.2.91}
\end{align}
Then, since this inequality holds for all $\varepsilon \in (0, 1)$ and $\delta > 0$, we conclude that
\begin{align}
R &\leq \lim_{\varepsilon,\delta\to 0}\left(\frac{1}{1-\sqrt{\varepsilon}}E_{\mathrm{sq}}(A;B)_\rho + \delta\right) \tag{13.2.92}\\
&= E_{\mathrm{sq}}(A;B)_\rho. \tag{13.2.93}
\end{align}
We have thus shown that the squashed entanglement $E_{\mathrm{sq}}(A;B)_\rho$ is a weak converse
rate for entanglement distillation.


13.2.5 One-Way Entanglement Distillation

In Section 13.1.2, we considered a one-way entanglement distillation protocol to
derive a lower bound on the one-shot distillable entanglement of a bipartite state.
In the asymptotic setting, this leads to the coherent information lower bound on
distillable entanglement, i.e.,
𝐸 𝐷 ( 𝐴; 𝐵) 𝜌 ≥ 𝐼 ( 𝐴⟩𝐵) 𝜌 = 𝐻 (𝐵) 𝜌 − 𝐻 ( 𝐴𝐵) 𝜌 , (13.2.94)
which holds for every bipartite state 𝜌 𝐴𝐵 . By simply reversing the roles of Alice
and Bob in the protocol, it follows that
𝐸 𝐷 ( 𝐴; 𝐵) 𝜌 ≥ 𝐼 (𝐵⟩ 𝐴) 𝜌 = 𝐻 ( 𝐴) 𝜌 − 𝐻 ( 𝐴𝐵) 𝜌 . (13.2.95)
The quantity on the right-hand side of the above inequality is sometimes called
reverse coherent information. Thus, in general, we have the following lower bound
on distillable entanglement:
𝐸 𝐷 ( 𝐴; 𝐵) 𝜌 ≥ max{𝐼 ( 𝐴⟩𝐵) 𝜌 , 𝐼 (𝐵⟩ 𝐴) 𝜌 }, (13.2.96)
which holds for every bipartite state 𝜌 𝐴𝐵 .
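The two quantities in (13.2.96) can differ. The sketch below evaluates both for a simple example chosen here for illustration and not taken from the text: the state $(1-p)\,\Phi_{AB} + p\,\pi_A \otimes |e\rangle\!\langle e|_B$, where $\Phi_{AB}$ is a two-qubit maximally entangled state, $\pi_A$ is the maximally mixed qubit state, and $|e\rangle$ is a third level of $B$ orthogonal to the qubit subspace. Its eigenvalues and those of its marginals are known in closed form, so only the Shannon entropy function is needed.

```python
import math

def entropy(probs) -> float:
    """Shannon entropy in bits of a list of eigenvalues (zero entries are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def coherent_information_bounds(p: float):
    """Return (I(A>B), I(B>A)) for the illustrative 'erased' Bell state with parameter p."""
    if not (0 <= p <= 1):
        raise ValueError("need 0 <= p <= 1")
    eig_a = [0.5, 0.5]                      # rho_A is maximally mixed
    eig_b = [(1 - p) / 2, (1 - p) / 2, p]   # qubit block plus the erasure level
    eig_ab = [1 - p, p / 2, p / 2]          # Bell state plus a rank-two erased block
    forward = entropy(eig_b) - entropy(eig_ab)   # I(A>B) = H(B) - H(AB)
    reverse = entropy(eig_a) - entropy(eig_ab)   # I(B>A) = H(A) - H(AB)
    return forward, reverse

fwd, rev = coherent_information_bounds(0.1)
print("I(A>B) =", fwd, " I(B>A) =", rev, " lower bound =", max(fwd, rev))
```

For this example the coherent information equals 1 − 2p, while the reverse coherent information is strictly smaller for 0 < p < 1, so the maximum in (13.2.96) is attained by the first term.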
The coherent information lower bound can be improved by first applying a
two-way LOCC channel to $n$ copies of the given state, and then performing a
one-way entanglement distillation protocol at the coherent information rate. This
leads to
\[
E_D(A;B)_\rho = \lim_{n\to\infty}\frac{1}{n}\sup_{\mathcal{L}}I(A'\rangle B')_{\mathcal{L}(\rho^{\otimes n})} = \lim_{n\to\infty}\frac{1}{n}D^{\leftrightarrow}(\rho_{AB}^{\otimes n}), \tag{13.2.97}
\]
where the optimization is over every two-way LOCC channel $\mathcal{L}_{A^nB^n\to A'B'}$.

If we restrict the optimization in (13.2.97) above to one-way LOCC channels
of the form $\mathcal{L}^{\to}_{A^nB^n\to A'B'}$, then we obtain what is called the one-way distillable
entanglement, denoted by $E_D^{\to}(A;B)_\rho$, and defined operationally in a similar way
to the distillable entanglement $E_D(A;B)_\rho$, but with the free operations allowed
restricted to one-way LOCC. A key result is the following equality:
\[
E_D^{\to}(A;B)_\rho = \lim_{n\to\infty}\frac{1}{n}\sup_{\mathcal{L}^{\to}}I(A'\rangle B')_{\mathcal{L}^{\to}(\rho^{\otimes n})} = \lim_{n\to\infty}\frac{1}{n}D^{\to}(\rho_{AB}^{\otimes n}), \tag{13.2.98}
\]
where
\[
D^{\to}(\rho_{AB}) := \sup_{\mathcal{L}^{\to}}I(A'\rangle B')_{\mathcal{L}^{\to}(\rho)}. \tag{13.2.99}
\]
Like the distillable entanglement, the one-way distillable entanglement is an
operational quantity of interest in entanglement theory. Furthermore, the equality
in (13.2.98) can be proved along similar lines to how we proved (13.2.13).
In what follows, we show that this expression for one-way distillable entanglement
can be simplified.

Theorem 13.26 One-Way Distillable Entanglement of a Bipartite State

The one-way distillable entanglement of a bipartite state $\rho_{AB}$ is given by
\[
E_D^{\to}(A;B)_\rho = \lim_{n\to\infty}\frac{1}{n}\sup_{V}I(A'\rangle XB^n)_{V\rho^{\otimes n}V^\dagger}, \tag{13.2.100}
\]
where $V_{A^n\to A'XE}$ is an isometry of the form
\[
V_{A^n\to A'XE} = \sum_{x\in\mathcal{X}}K^x_{A^n\to A'}\otimes|x\rangle_X\otimes|x\rangle_E, \tag{13.2.101}
\]
with $\sum_{x\in\mathcal{X}}(K^x_{A^n})^\dagger K^x_{A^n} = \mathbb{1}_{A^n}$, $d_{A'} = d_A^n$, $d_X \leq d_A^{2n}$, and $d_E = |\mathcal{X}| = d_X$.
Additionally,
\[
D^{\to}(\rho_{AB}) = \sup_{V}I(A'\rangle XB)_{V\rho V^\dagger}, \tag{13.2.102}
\]
with the optimization over isometries $V$ as in (13.2.101), with $n = 1$.

This theorem tells us that, to determine the one-way distillable entanglement of
a bipartite state, it suffices to optimize over one-way LOCC channels that consist
of only a quantum instrument for Alice, with each of the corresponding maps
containing just one Kraus operator. Furthermore, it suffices to take $A' = A^n$ and
$B' = B^n$.
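To make the structure of the isometry in (13.2.101) concrete, the following sketch assembles it from a set of Kraus operators and checks that it is indeed an isometry. The amplitude-damping Kraus operators, the parameter value, and the helper names are illustrative assumptions that do not come from the text; real-valued matrices stored as nested lists are used so that only the standard library is needed.

```python
import math

def build_isometry(kraus_ops):
    """V = sum_x K^x (x) |x>_X (x) |x>_E as a (d_out * m * m) x d_in real matrix."""
    m = len(kraus_ops)
    d_out, d_in = len(kraus_ops[0]), len(kraus_ops[0][0])
    v = [[0.0] * d_in for _ in range(d_out * m * m)]
    for x, k in enumerate(kraus_ops):
        for a_out in range(d_out):
            row = (a_out * m + x) * m + x  # index of |a_out>_{A'} |x>_X |x>_E
            for a_in in range(d_in):
                v[row][a_in] = k[a_out][a_in]
    return v

def gram(v):
    """V^T V for a real matrix V (equal to V^dagger V because all entries are real)."""
    d_in = len(v[0])
    return [[sum(row[i] * row[j] for row in v) for j in range(d_in)]
            for i in range(d_in)]

# Illustrative instrument (assumption): amplitude damping with gamma = 0.3.
gamma = 0.3
K0 = [[1.0, 0.0], [0.0, math.sqrt(1 - gamma)]]
K1 = [[0.0, math.sqrt(gamma)], [0.0, 0.0]]
V = build_isometry([K0, K1])
G = gram(V)
print(G)
```

The Gram matrix equals the identity exactly when the Kraus operators satisfy the completeness condition stated below (13.2.101), so this check is equivalent to verifying that condition.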

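Proof: We start by recalling from Section 4.6.2 (see also the beginning of
Section 13.1.2) that every one-way LOCC channel $\mathcal{L}^{\to}_{A^nB^n\to A'B'}$ can be expressed as
\begin{align}
\omega_{A'B'} &:= \mathcal{L}^{\to}_{A^nB^n\to A'B'}(\rho_{A^nB^n}) \tag{13.2.103}\\
&= \sum_{x\in\mathcal{X}}(\mathcal{E}^x_{A^n\to A'}\otimes\mathcal{D}^x_{B^n\to B'})(\rho_{A^nB^n}) \tag{13.2.104}\\
&= (\mathcal{D}_{X_BB^n\to B'}\circ\mathcal{C}_{X_A\to X_B}\circ\mathcal{E}_{A^n\to A'X_A})(\rho_{A^nB^n}), \tag{13.2.105}
\end{align}
where $\mathcal{X}$ is some finite alphabet, $d_{X_A} = d_{X_B} = |\mathcal{X}|$, $\{\mathcal{E}^x_{A^n\to A'}\}_{x\in\mathcal{X}}$ is a set of completely
positive maps such that $\sum_{x\in\mathcal{X}}\mathcal{E}^x_{A^n\to A'}$ is trace preserving, and $\{\mathcal{D}^x_{B^n\to B'}\}_{x\in\mathcal{X}}$
is a set of channels. In particular,
\begin{align}
\mathcal{E}_{A^n\to A'X_A}(\rho_{A^nB^n}) &= \sum_{x\in\mathcal{X}}\mathcal{E}^x_{A^n\to A'}(\rho_{A^nB^n})\otimes|x\rangle\!\langle x|_{X_A}, \tag{13.2.106}\\
\mathcal{D}_{X_BB^n\to B'}(|x\rangle\!\langle x|_{X_B}\otimes\rho_{A^nB^n}) &= \mathcal{D}^x_{B^n\to B'}(\rho_{A^nB^n}). \tag{13.2.107}
\end{align}
For every $n \geq 1$, if we restrict the optimization in (13.2.98) such that $|\mathcal{X}| = d_{A^n}^2 =
d_A^{2n}$, $d_{A'} = d_{A^n} = d_A^n$, $\mathcal{D}^x_{B^n\to B'} = \mathrm{id}_{B^n}$ for all $x \in \mathcal{X}$, and $\mathcal{E}^x_{A^n\to A'}(\cdot) = K^x_{A^n}(\cdot)(K^x_{A^n})^\dagger$ for
all $x \in \mathcal{X}$ such that $\sum_{x\in\mathcal{X}}(K^x_{A^n})^\dagger K^x_{A^n} = \mathbb{1}_{A^n}$, then the LOCC channel $\mathcal{L}^{\to}_{A^nB^n\to A'B'}$
reduces to
\[
\mathcal{L}^{\to}_{A^nB^n\to A'B'}(\rho_{A^nB^n}) = \sum_{x\in\mathcal{X}}K^x_{A^n}\rho_{A^nB^n}(K^x_{A^n})^\dagger\otimes|x\rangle\!\langle x|_X \tag{13.2.108}
\]
for every state $\rho_{A^nB^n}$, and it has an isometric extension of the form
\[
V_{A^n\to A^nXE} = \sum_{x\in\mathcal{X}}K^x_{A^n}\otimes|x\rangle_X\otimes|x\rangle_E. \tag{13.2.109}
\]
We thus obtain
\[
E_D^{\to}(A;B)_\rho \geq \lim_{n\to\infty}\frac{1}{n}\sup_{V}I(A^n\rangle XB^n)_\omega. \tag{13.2.110}
\]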

The rest of the proof is devoted to proving the reverse inequality. Let $\mathcal{L}^{\to}_{A^nB^n\to A'B'}$
be an arbitrary one-way LOCC channel of the form in (13.2.103)–(13.2.105). For
every state $\rho_{A^nB^n}$, let
\begin{align}
p_x &:= \operatorname{Tr}[\mathcal{E}^x_{A^n\to A'}(\rho_{A^nB^n})], \tag{13.2.111}\\
\rho^x_{A'B^n} &:= \frac{1}{p_x}\mathcal{E}^x_{A^n\to A'}(\rho_{A^nB^n}), \tag{13.2.112}\\
\rho^x_{B^n} &:= \frac{1}{p_x}\operatorname{Tr}_{A'}[\mathcal{E}^x_{A^n\to A'}(\rho_{A^nB^n})], \tag{13.2.113}
\end{align}
for all $x \in \mathcal{X}$. Then, using the data-processing inequality for coherent information
in (7.3.17) and the direct-sum property for quantum relative entropy, we obtain
\begin{align}
I(A'\rangle B')_\omega &\leq I(A'\rangle B^nX_B)_{\mathcal{E}(\rho)} \tag{13.2.114}\\
&= \sum_{x\in\mathcal{X}}p_xD(\rho^x_{A'B^n}\Vert\mathbb{1}_{A'}\otimes\rho^x_{B^n}) \tag{13.2.115}\\
&= \sum_{x\in\mathcal{X}}p_xI(A'\rangle B^n)_{\rho^x}. \tag{13.2.116}
\end{align}

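Now,
\begin{align}
\mathcal{E}_{A^n\to A'X_A}(\rho_{A^nB^n}) &= \sum_{x\in\mathcal{X}}\mathcal{E}^x_{A^n\to A'}(\rho_{A^nB^n})\otimes|x\rangle\!\langle x|_{X_A} \tag{13.2.117}\\
&= \sum_{x\in\mathcal{X}}p_x\rho^x_{A'B^n}\otimes|x\rangle\!\langle x|_{X_A}. \tag{13.2.118}
\end{align}
Suppose that $\mathcal{E}^x_{A^n\to A'}$ has the following Kraus representation:
\[
\mathcal{E}^x_{A^n\to A'}(\cdot) = \sum_{y\in\mathcal{Y}}K^{x,y}_{A^n\to A'}(\cdot)(K^{x,y}_{A^n\to A'})^\dagger, \tag{13.2.119}
\]
where $\mathcal{Y}$ is some finite alphabet. Let
\[
q_{x,y} := \operatorname{Tr}[K^{x,y}_{A^n\to A'}\rho_{A^nB^n}(K^{x,y}_{A^n\to A'})^\dagger], \tag{13.2.120}
\]
and observe that $p_x = \sum_{y\in\mathcal{Y}}q_{x,y}$. Therefore, for each $x \in \mathcal{X}$, the values $r_{y|x} := \frac{q_{x,y}}{p_x}$
constitute a probability distribution on $\mathcal{Y}$, in the sense that $r_{y|x} \geq 0$ for all $y \in \mathcal{Y}$,
and $\sum_{y\in\mathcal{Y}}r_{y|x} = 1$. Using this, and letting
\[
\rho^{x,y}_{A'B^n} := \frac{1}{q_{x,y}}K^{x,y}_{A^n\to A'}\rho_{A^nB^n}(K^{x,y}_{A^n\to A'})^\dagger, \tag{13.2.121}
\]
so that
\[
p_x\rho^x_{A'B^n} = \sum_{y\in\mathcal{Y}}q_{x,y}\rho^{x,y}_{A'B^n}, \tag{13.2.122}
\]
we find that
\[
p_xI(A'\rangle B^n)_{\rho^x} \leq p_x\sum_{y\in\mathcal{Y}}r_{y|x}I(A'\rangle B^n)_{\rho^{x,y}}, \tag{13.2.123}
\]
for all $x \in \mathcal{X}$, where the inequality follows from convexity of coherent information
(see (7.2.121)). Without loss of generality, we can take $A' \equiv A^n$: if $A'$ has a
dimension smaller than that of $A^n$, we can always first isometrically embed $A'$ into
$A^n$. The coherent information remains unchanged under this isometric embedding.
Combining the last inequality above with the one in (13.2.116), we conclude
that
\[
I(A'\rangle B')_\omega \leq \sum_{x\in\mathcal{X}}p_x\sum_{y\in\mathcal{Y}}r_{y|x}I(A'\rangle B^n)_{\rho^{x,y}} = \sum_{x\in\mathcal{X},y\in\mathcal{Y}}q_{x,y}I(A'\rangle B^n)_{\rho^{x,y}}. \tag{13.2.124}
\]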


Then assigning the superindex $z = (x, y)$ with $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$, we finally have
\[
I(A'\rangle B')_\omega \leq \sup_{\mathcal{E}}I(A'\rangle B^nZ_B)_{\mathcal{E}(\rho^{\otimes n})}, \tag{13.2.125}
\]
where the optimization is over channels $\mathcal{E}_{A^n\to A'Z_B}$ of the form
\[
\mathcal{E}_{A^n\to A'Z_B}(\cdot) := \sum_{z\in\mathcal{Z}}K^z_{A^n\to A'}(\cdot)(K^z_{A^n\to A'})^\dagger\otimes|z\rangle\!\langle z|_{Z_B}, \tag{13.2.126}
\]
such that $\sum_{z\in\mathcal{Z}}(K^z_{A^n\to A'})^\dagger K^z_{A^n\to A'} = \mathbb{1}_{A^n}$. (This is effectively an optimization over
operators $\{K^z_{A^n\to A'}\}_{z\in\mathcal{Z}}$ such that $\sum_{z\in\mathcal{Z}}(K^z_{A^n\to A'})^\dagger K^z_{A^n\to A'} = \mathbb{1}_{A^n}$.) The channel
$\mathcal{E}_{A^n\to A'Z_B}$ has an isometric extension of the form
\[
V_{A^n\to A'Z_BE} = \sum_{z\in\mathcal{Z}}K^z_{A^n\to A'}\otimes|z\rangle_{Z_B}\otimes|z\rangle_E, \tag{13.2.127}
\]
where $d_E = d_{Z_B} = |\mathcal{Z}|$. Since the number of Kraus operators need not exceed
$d_{A^n}^2 = d_A^{2n}$ (see Theorem 4.3), we can take $d_Z = d_A^{2n}$ without loss of generality. We
can thus optimize over all isometries of the form in (13.2.127). Altogether, we have
that
\[
I(A'\rangle B')_\omega \leq \sup_{V}I(A'\rangle ZB^n)_{V\rho^{\otimes n}V^\dagger} \tag{13.2.128}
\]
for every one-way LOCC channel $\mathcal{L}^{\to}_{A^nB^n\to A'B'}$ and all $n \geq 1$. Optimizing over all
one-way LOCC channels on the left-hand side of the inequality above, and taking
the limit $n \to \infty$ leads us to conclude that
\[
E_D^{\to}(A;B)_\rho \leq \lim_{n\to\infty}\frac{1}{n}\sup_{V}I(A'\rangle ZB^n)_{V\rho^{\otimes n}V^\dagger}. \tag{13.2.129}
\]
Combining this with (13.2.110) and reassigning $Z$ as $X$ finishes the proof. ■

Lemma 13.27
For every bipartite state 𝜌 𝐴𝐵 , the optimized coherent information lower bound
on distillable entanglement is non-negative, i.e., 𝐷 → (𝜌 𝐴𝐵 ) ≥ 0.

Proof: Let $\psi_{ABR} = |\psi\rangle\!\langle\psi|_{ABR}$ be a purification of $\rho_{AB}$, and consider the following
Schmidt decomposition of $|\psi\rangle_{ABR}$:
\[
|\psi\rangle_{ABR} = \sum_{k=0}^{r-1}\sqrt{\lambda_k}\,|\phi_k\rangle_A\otimes|\varphi_k\rangle_{BR}. \tag{13.2.130}
\]
Then, let
\[
V_{A\to A'XE} := \sum_{k=0}^{r-1}|k\rangle_{A'}\langle\phi_k|_A\otimes|k\rangle_X\otimes|k\rangle_E. \tag{13.2.131}
\]
It is then straightforward to show that $I(A'\rangle XB)_{V\rho V^\dagger} = 0$. Since $V$ is an example
of an isometry in the optimization for $D^{\to}(\rho_{AB})$, we conclude that $D^{\to}(\rho_{AB}) \geq
I(A'\rangle XB)_{V\rho V^\dagger} = 0$. ■
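The vanishing of the coherent information in the proof above can be verified in a few lines. The following calculation is a sketch; the symbol $\varphi^k_B := \operatorname{Tr}_R[|\varphi_k\rangle\!\langle\varphi_k|_{BR}]$ is notation introduced here for illustration, and $H(\{\lambda_k\}_k)$ denotes the Shannon entropy of the Schmidt coefficients.

```latex
\begin{align}
V|\psi\rangle_{ABR} &= \sum_{k=0}^{r-1}\sqrt{\lambda_k}\,|k\rangle_{A'}\otimes|k\rangle_X\otimes|k\rangle_E\otimes|\varphi_k\rangle_{BR},\\
\omega_{A'XB} &:= \operatorname{Tr}_{ER}[V\psi_{ABR}V^\dagger]
  = \sum_{k=0}^{r-1}\lambda_k\,|k\rangle\!\langle k|_{A'}\otimes|k\rangle\!\langle k|_X\otimes\varphi^k_B,\\
\omega_{XB} &= \sum_{k=0}^{r-1}\lambda_k\,|k\rangle\!\langle k|_X\otimes\varphi^k_B,\\
H(A'XB)_\omega &= H(\{\lambda_k\}_k) + \sum_{k=0}^{r-1}\lambda_k H(\varphi^k_B) = H(XB)_\omega.
\end{align}
```

The partial trace over $E$ removes all cross terms because the vectors $|k\rangle_E$ are orthonormal, and both entropies follow from the standard formula for the entropy of a classical-quantum state. Hence $I(A'\rangle XB)_\omega = H(XB)_\omega - H(A'XB)_\omega = 0$.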

13.3 Examples
We now consider classes of bipartite states and evaluate the upper and lower bounds
on their distillable entanglement that we have established in this chapter. In some
cases, the distillable entanglement can be determined exactly because the upper
and lower bounds coincide.

13.3.1 Pure States

The simplest example for which distillable entanglement can be determined exactly
is the class of pure bipartite states. In this case, the coherent information lower
bound and the Rains relative entropy upper bound coincide and are equal to the
entropy of the reduced state. Indeed, for the coherent information, the joint entropy
𝐻 ( 𝐴𝐵)𝜓 = 0 for every pure state 𝜓 𝐴𝐵 , so that

𝐼 ( 𝐴⟩𝐵)𝜓 = 𝐻 (𝐵)𝜓 − 𝐻 ( 𝐴𝐵)𝜓 = 𝐻 (𝐵)𝜓 = 𝐻 ( 𝐴)𝜓 , (13.3.1)

where the last equality follows from the Schmidt decomposition theorem (Theorem 2.2),
which implies that the reduced states $\psi_A$ and $\psi_B$ have the same non-zero eigenvalues,
and thus the same entropy. On the other hand, Proposition 9.20 states
that the relative entropy of entanglement satisfies $E_R(A;B)_\psi = H(A)_\psi$, and we also know
that $E_R(A;B)_\psi \geq R(A;B)_\psi$ (see (9.1.149)). We thus have the following:


Theorem 13.28 Distillable Entanglement for Pure States

The distillable entanglement of a pure bipartite state 𝜓 𝐴𝐵 is equal to the entropy
of the reduced state on 𝐴, i.e.,

𝐸 𝐷 ( 𝐴; 𝐵)𝜓 = 𝐻 ( 𝐴)𝜓 . (13.3.2)
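Theorem 13.28 reduces the distillable entanglement of a pure state to a Shannon entropy of its squared Schmidt coefficients. A minimal sketch of that computation follows; the function name and the example inputs are assumptions made for illustration.

```python
import math

def pure_state_distillable_entanglement(schmidt_probs) -> float:
    """H(A) in bits for a pure state with squared Schmidt coefficients schmidt_probs."""
    if any(p < 0 for p in schmidt_probs) or abs(sum(schmidt_probs) - 1.0) > 1e-9:
        raise ValueError("squared Schmidt coefficients must be non-negative and sum to 1")
    return -sum(p * math.log2(p) for p in schmidt_probs if p > 0)

# Illustrative inputs (assumptions): an ebit, a product state, and a two-ebit state.
print(pure_state_distillable_entanglement([0.5, 0.5]))
print(pure_state_distillable_entanglement([1.0]))
print(pure_state_distillable_entanglement([0.25] * 4))
```

A maximally entangled state of Schmidt rank d therefore yields log2 d ebits per copy, while a product state yields none.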

13.3.2 Degradable and Anti-Degradable States

In this section, we define two classes of states for which the one-way distillable
entanglement takes on a simple form.

Definition 13.29 Degradable and Anti-Degradable Bipartite States

Given a bipartite state 𝜌 𝐴𝐵 with purification 𝜓 𝐴𝐵𝐸 , we call it degradable if there
exists a quantum channel D𝐵→𝐸 ′ such that the state 𝜏𝐴𝐸 ′ 𝐸 B D𝐵→𝐸 ′ (𝜓 𝐴𝐵𝐸 )
satisfies
𝜏𝐴𝐸 ′ = 𝜏𝐴𝐸 = 𝜓 𝐴𝐸 . (13.3.3)
We call 𝜌 𝐴𝐵 anti-degradable if there exists a quantum channel A𝐸→𝐵′ such that
the state 𝜔 𝐴𝐵𝐵′ B A𝐸→𝐵′ (𝜓 𝐴𝐵𝐸 ) satisfies

𝜔 𝐴𝐵′ = 𝜔 𝐴𝐵 = 𝜌 𝐴𝐵 . (13.3.4)

Remark: Degradable and anti-degradable states are the state counterparts of degradable
and anti-degradable channels; see Definition 4.6. In fact, observe that the Choi state of a
degradable channel is a degradable state, and the Choi state of an anti-degradable channel is an
anti-degradable state.
Anti-degradable states are also sometimes called symmetrically extendible states or two-
extendible states (please consult the Bibliographic Notes in Section 13.5).

Intuitively, a degradable state is one for which the system 𝐵 can be used
to simulate (via a quantum channel D𝐵→𝐸 ′ ) the correlations between 𝐴 and 𝐸.
Analogously, an anti-degradable state is one for which the system 𝐸 can be used to
simulate (via a quantum channel A𝐸→𝐵′ ) the correlations between 𝐴 and 𝐵.
An anti-degradable state 𝜌 𝐴𝐵 is one for which the environment 𝐸 (corresponding
to the purifying system of 𝜌 𝐴𝐵 ) cannot be decoupled from 𝐴 and 𝐵 through LOCC

from 𝐴 to 𝐵 alone. Indeed, recall the task of decoupling from Section 13.1.2 (in
particular, see Figure 13.2). Since a channel can always be applied to 𝐸 in order to
simulate the correlations between 𝐴 and 𝐵, from the point of view of 𝐴, the systems
𝐵 and 𝐸 become indistinguishable, so that 𝐴 and 𝐵 cannot be (perfectly) decoupled
from 𝐸. Given that decoupling is not possible for anti-degradable states, we might
expect that anti-degradable states have zero one-way distillable entanglement. This
is indeed true, as we now show.

Theorem 13.30 One-Way Distillable Entanglement for Anti-Degradable States
For an anti-degradable state $\rho_{AB}$, the one-way distillable entanglement is equal
to zero, i.e., $E_D^{\to}(A;B)_\rho = 0$.

Proof: Let $V_{A\to A'XE}$ be an arbitrary isometry in the optimization for $D^{\to}(\rho_{AB})$.
Also, let $\psi_{ABR}$ be a purification of $\rho_{AB}$. Then, because $\rho_{AB}$ is anti-degradable,
there exists a channel $\mathcal{A}_{R\to B}$ such that
\[
\rho_{AB} = \mathcal{A}_{R\to B}(\psi_{AR}). \tag{13.3.5}
\]
Now, let
\[
\omega_{A'XEBR} = V_{A\to A'XE}\,\psi_{ABR}\,V_{A\to A'XE}^\dagger, \tag{13.3.6}
\]
which is a pure state. Then, using the fact that
\begin{align}
\omega_{A'XB} &= \operatorname{Tr}_E[V_{A\to A'XE}\,\rho_{AB}\,V_{A\to A'XE}^\dagger] \tag{13.3.7}\\
&= (\mathcal{A}_{R\to B}\circ\operatorname{Tr}_E)(V_{A\to A'XE}\,\psi_{AR}\,V_{A\to A'XE}^\dagger) \tag{13.3.8}\\
&= \mathcal{A}_{R\to B}(\omega_{A'XR}), \tag{13.3.9}
\end{align}
and that $\omega_{XB} = \mathcal{A}_{R\to B}(\omega_{XR})$, we find that
\begin{align}
I(A'\rangle XB)_\omega &\leq I(A'\rangle RX)_\omega \tag{13.3.10}\\
&= H(RX)_\omega - H(A'RX)_\omega \tag{13.3.11}\\
&= H(A'EB)_\omega - H(EB)_\omega \tag{13.3.12}\\
&= H(A'XB)_\omega - H(XB)_\omega \tag{13.3.13}\\
&= -I(A'\rangle XB)_\omega, \tag{13.3.14}
\end{align}
where we used the data-processing inequality in (7.3.17), and for the subsequent
equalities we used the fact that $\omega_{A'XEBR}$ is a pure state that is symmetric in $X$ and
$E$. We thus have $I(A'\rangle XB)_{V\rho V^\dagger} \leq 0$ for every isometry $V$ used in the optimization
for $D^{\to}(\rho_{AB})$, implying that $D^{\to}(\rho_{AB}) \leq 0$. However, since $D^{\to}(\rho_{AB}) \geq 0$ by
Lemma 13.27, we obtain $D^{\to}(\rho_{AB}) = 0$. The statement that $E_D^{\to}(A;B)_\rho = 0$
follows by repeating the same argument for $n$ copies of $\rho_{AB}$ and using the fact that
$\rho_{AB}^{\otimes n}$ is an anti-degradable state if $\rho_{AB}$ is. ■

Theorem 13.31 One-Way Distillable Entanglement for Degradable States

For a degradable state $\rho_{AB}$, we have
\[
D^{\to}(\rho_{AB}) = I(A\rangle B)_\rho. \tag{13.3.15}
\]
Consequently, $D^{\to}(\rho_{AB}^{\otimes n}) = nD^{\to}(\rho_{AB})$, and thus the one-way distillable
entanglement of a degradable state $\rho_{AB}$ is equal to its coherent information:
\[
E_D^{\to}(A;B)_\rho = I(A\rangle B)_\rho. \tag{13.3.16}
\]

Proof: First, observe that if we pick the isometry $V$ in (13.2.102) to be $V_{A\to A'XE} =
\mathbb{1}_A \otimes |0, 0\rangle_{XE}$ (so that $X$ and $E$ are one-dimensional systems), then we obtain
$D^{\to}(\rho_{AB}) \geq I(A\rangle B)_\rho$. Then, we conclude that $D^{\to}(\rho_{AB}^{\otimes n}) \geq nI(A\rangle B)_\rho$ because
coherent information is additive for product states; thus, $E_D^{\to}(A;B)_\rho =
\lim_{n\to\infty}\frac{1}{n}D^{\to}(\rho_{AB}^{\otimes n}) \geq I(A\rangle B)_\rho$.
We now prove the reverse inequality. Let 𝑉𝐴→𝐴′ 𝑋 𝐸 be an arbitrary isometry
in the optimization for 𝐷 → (𝜌 𝐴𝐵 ). Also, let 𝜓 𝐴𝐵𝑅 be a purification of 𝜌 𝐴𝐵 .
Then, since 𝜌 𝐴𝐵 is degradable, there exists a channel D𝐵→𝑅′ such that the state
𝜏𝐴𝑅′ 𝑅 B D𝐵→𝑅′ (𝜓 𝐴𝐵𝑅 ) satisfies

𝜏𝐴𝑅′ = 𝜏𝐴𝑅 = 𝜓 𝐴𝑅 . (13.3.17)

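Let $W_{B\to R'F}$ be an isometric extension of $\mathcal{D}_{B\to R'}$, and let

\begin{equation}
|\varphi\rangle_{AR'FR} = W_{B\to R'F}|\psi\rangle_{ABR}. \tag{13.3.18}
\end{equation}

Furthermore, let $\omega_{A'XEB} := V_{A\to A'XE}\,\rho_{AB}\,V_{A\to A'XE}^\dagger$ and $|\phi\rangle_{A'XER'FR} := V_{A\to A'XE}|\varphi\rangle_{AR'FR}$, so that $\phi_{A'XER'FR}$ is a pure state.
Then, by invariance of entropy under the isometry $W_{B\to R'F}$,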

\begin{align}
I(A'\rangle XB)_\omega &= H(XB)_\omega - H(A'XB)_\omega \tag{13.3.21}\\
&= H(XR'F)_\phi - H(A'XR'F)_\phi \tag{13.3.22}\\
&= H(XR'F)_\phi - H(ER)_\phi \tag{13.3.23}\\
&= H(XR'F)_\phi - H(XR)_\phi, \tag{13.3.24}
\end{align}

where the second-to-last line follows because 𝜙 𝐴′ 𝑋 𝐸 𝑅′ 𝐹 𝑅 is a pure state, so that

𝐻 ( 𝐴′ 𝑋 𝑅′ 𝐹) 𝜙 = 𝐻 (𝐸 𝑅) 𝜙 , and then for the last line we used the fact that 𝜙 𝐴′ 𝑋 𝐸 𝑅′ 𝐹 𝑅
is symmetric in 𝑋 and 𝐸 by definition of 𝑉𝐴→𝐴′ 𝑋 𝐸 . Next, due to the fact that
𝜏𝐴𝑅′ = 𝜏𝐴𝑅 , it holds that 𝜙 𝑋 𝑅′ = 𝜙 𝑋 𝑅 . We thus obtain

𝐼 ( 𝐴′⟩𝑋 𝐵)𝜔 = 𝐻 (𝑋 𝑅′ 𝐹) 𝜙 − 𝐻 (𝑋 𝑅′) 𝜙 (13.3.25)

= −𝐼 (𝐹⟩𝑋 𝑅′) 𝜙 (13.3.26)
≤ −𝐼 (𝐹⟩𝑅′) 𝜙 (13.3.27)
= 𝐻 (𝐹 𝑅′) 𝜙 − 𝐻 (𝑅′) 𝜙 (13.3.28)
= 𝐻 (𝐹 𝑅′) 𝜙 − 𝐻 (𝑅) 𝜙 , (13.3.29)

where the inequality follows from the data-processing inequality with the partial
trace channel Tr 𝑋 , and the last equality follows because 𝜙 𝑅 = 𝜙 𝑅′ , due to the
degradability of 𝜌 𝐴𝐵 . Finally, observe that
\begin{equation}
\phi_{R'F} = W_{B\to R'F}\,\rho_B\,W_{B\to R'F}^\dagger, \tag{13.3.30}
\end{equation}

so that 𝐻 (𝑅′ 𝐹) 𝜙 = 𝐻 (𝐵) 𝜌 by isometric invariance of entropy. Also, 𝜙 𝑅 =

Tr 𝐴𝐵 [𝜓 𝐴𝐵𝑅 ], which implies that 𝐻 (𝑅) 𝜙 = 𝐻 ( 𝐴𝐵) 𝜌 . We thus obtain

𝐼 ( 𝐴′⟩𝑋 𝐵)𝜔 ≤ 𝐼 ( 𝐴⟩𝐵) 𝜌 (13.3.31)

for every isometry 𝑉𝐴→𝐴′ 𝑋 𝐸 . This implies that 𝐷 → (𝜌 𝐴𝐵 ) ≤ 𝐼 ( 𝐴⟩𝐵) 𝜌 , i.e.,

𝐷 → (𝜌 𝐴𝐵 ) = 𝐼 ( 𝐴⟩𝐵) 𝜌 . (13.3.32)

In other words, the trivial isometry $V_{A\to A'XE} = \mathbb{1}_A \otimes |0,0\rangle_{XE}$ is optimal for
$D^{\to}(\rho_{AB})$ when $\rho_{AB}$ is a degradable state. Thus, by additivity of coherent
information for product states, we obtain $E_D^{\to}(A;B)_\rho = I(A\rangle B)_\rho$, as required. ■
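The coherent information $I(A\rangle B)_\rho = H(B)_\rho - H(AB)_\rho$ appearing in (13.3.15) is straightforward to evaluate numerically. The following is a minimal NumPy sketch; the helper names and the two example states are assumptions made for illustration, and the code does not check whether a given state is degradable.

```python
import numpy as np

def von_neumann_entropy(rho):
    """Von Neumann entropy H(rho) in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # drop numerically zero eigenvalues
    return float(-np.sum(evals * np.log2(evals)))

def reduced_state_B(rho_AB, dA, dB):
    """Partial trace over A for an operator on A (x) B (A is the first factor)."""
    return np.trace(rho_AB.reshape(dA, dB, dA, dB), axis1=0, axis2=2)

def coherent_information(rho_AB, dA, dB):
    """I(A>B)_rho = H(B)_rho - H(AB)_rho."""
    rho_B = reduced_state_B(rho_AB, dA, dB)
    return von_neumann_entropy(rho_B) - von_neumann_entropy(rho_AB)

# Illustrative inputs: a two-qubit maximally entangled state and the
# two-qubit maximally mixed state.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell_state = np.outer(phi, phi)
maximally_mixed = np.eye(4) / 4
```

For the maximally entangled state the function returns 1 (one ebit), and for the maximally mixed state it returns -1.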

13.4 Summary
In this chapter, we studied the task of entanglement distillation, in which the goal is
for Alice and Bob to convert many copies of a shared entangled state 𝜌 𝐴𝐵 to some
(smaller) number of ebits, i.e., copies of a two-qubit maximally entangled state

using local operations and classical communication. The largest rate at which this
can be done, given arbitrarily many copies of 𝜌 𝐴𝐵 and such that the error vanishes,
is called distillable entanglement, and we denote it by 𝐸 𝐷 ( 𝐴; 𝐵) 𝜌 . We started with
the one-shot setting, in which we allow for some error in the distillation protocol,
and we determined both upper and lower bounds on the number of (approximate)
ebits that can be obtained. Then, in the asymptotic setting, we showed that the
coherent information 𝐼 ( 𝐴⟩𝐵) 𝜌 is a lower bound on distillable entanglement for
every bipartite state 𝜌 𝐴𝐵 . We also found that two different entanglement measures
are upper bounds on distillable entanglement, namely, the Rains relative entropy
and squashed entanglement. These are the best known upper bounds on distillable
entanglement.
By first performing entanglement distillation to transform their mixed entangled
states to approximately pure maximally entangled states and then performing the
quantum teleportation protocol, Alice can transmit any quantum state to Bob. In this
sense, entanglement distillation can be used to realize a near-ideal quantum channel
between Alice and Bob. This fact underlies achievable strategies for quantum
communication, which is the task of reliably transmitting an arbitrary quantum
state from Alice to Bob when the resource is a quantum channel N 𝐴→𝐵 connecting
Alice and Bob rather than a shared bipartite state 𝜌 𝐴𝐵 . Quantum communication is
the subject of the next chapter.

13.5 Bibliographic Notes

Although our focus in this book is on communication, maximally entangled states
are useful resources for many other quantum information processing tasks (see, e.g.,
Horodecki et al. (2009b) for applications of entanglement in quantum computing),
which makes entanglement distillation a relevant topic in its own right.
The concept of entanglement distillation was initially developed by Bennett et al.
(1996b,c), who also provided one-way and two-way protocols for distillation from
two-qubit Bell-diagonal states. The precise mathematical definition of distillable
entanglement was given by Rains (1998, 1999a,b); Horodecki et al. (2000); Plenio
and Virmani (2007). Relaxing the set of allowed operations from LOCC channels
to separable and completely PPT-preserving channels as a means to obtain upper
bounds on distillable entanglement was considered by Rains (1998, 1999a,b).

Berta (2008); Buscemi and Datta (2010b); Brandao and Datta (2011); Wilde
et al. (2017) have considered lower bounds on distillable entanglement in the
one-shot setting. The lower bound that we present in Proposition 13.10 is the one
given by Wilde et al. (2017, Proposition 21), which makes use of the one-shot
decoupling results obtained by Dupuis et al. (2014). In particular, the proof of
Theorem 13.11 provided in Appendix 13.A comes directly from the proof of (Dupuis
et al., 2014, Theorem 3.3). The notion of decoupling has played an important role
in the development of quantum information theory. It was originally proposed by
Schumacher and Westmoreland (2002) in the context of understanding approximate
quantum error correction and quantum communication. It was then developed in
much more detail by Horodecki et al. (2005b, 2007) in the context of state merging
and by Hayden et al. (2008a) for understanding the coherent information lower
bound on quantum capacity. Dupuis (2010) developed the method in more detail in
his PhD thesis for a variety of information-processing tasks, and this culminated in
the general decoupling theorem presented as (Dupuis et al., 2014, Theorem 3.3).
For the SDP formulations of conditional min- and max-entropy, as well as their
smoothed variants, see (Tomamichel, 2015, Chapter 6). For more information
about unitary designs and about Haar measure integration over unitaries, we refer
to (Collins and Śniady, 2006; Roy and Scott, 2009). The one-shot upper bound
that we present in Proposition 13.6 and Theorem 13.7 based on the fact that PPT’
operators are useless for entanglement distillation was determined by Tomamichel
et al. (2016).
In the asymptotic setting, Devetak and Winter (2005) used random coding
arguments to establish the coherent information lower bound (also called the
“hashing inequality”) on the distillable entanglement of a bipartite quantum state.
The corresponding hashing protocol was presented by Bennett et al. (1996c) for
two-qubit Bell-diagonal states. Devetak and Winter (2005) also determined that
the general expression in Theorem 13.19 is an achievable rate for entanglement
distillation from a bipartite state, and they also proved the converse. Horodecki
et al. (2000) conjectured this formula earlier, conditioned on the hashing inequality
being true.
Theorem 13.26 is due to Devetak and Winter (2005). The other results in
Sections 13.2.5 and 13.3.2 on one-way entanglement distillation were obtained by
Leditzky et al. (2018), who in the same work used the concepts of approximate
degradability and approximate anti-degradability of bipartite states to derive upper
bounds on distillable entanglement. We note that anti-degradable quantum states,
as defined by Leditzky et al. (2018), are also known as symmetrically extendible
states, two-extendible states, or two-shareable states (Werner, 1989a; Doherty et al.,
2004; Yang, 2006).

Figure 13.4: Given a bipartite state $\rho_{AE}$ and a quantum channel $\mathcal{N}_{A\to B}$, the
goal of decoupling is to obtain a state $\tau_B \otimes \rho_E$ that is in tensor product with the
environment $E$, where $\rho_E = \operatorname{Tr}_A[\rho_{AE}]$. To assist with the task, Alice is allowed
to apply an arbitrary unitary $U$ to her system $A$.

PPT entangled states (i.e., bound entangled states) were discovered by Horodecki
(1997), and Horodecki et al. (1998) showed that PPT states are useless for
entanglement distillation. A major open and challenging question, which is alluded to
in Section 13.2.0.1 (see also Figure 13.3), is whether there exist NPT (negative
partial transpose) bound entangled states. For discussions concerning NPT bound
entanglement, we refer to (Horodecki and Horodecki, 1999; DiVincenzo et al.,
2000; Dür et al., 2000).

Appendix 13.A One-Shot Decoupling and Proof of Theorem 13.11

The key insight needed to obtain the lower bound on one-shot distillable
entanglement in Proposition 13.10 is that entanglement distillation can be thought of in
terms of decoupling. The general scenario of decoupling is depicted in Figure 13.4.
Given a quantum channel N 𝐴→𝐵 and a bipartite state 𝜌 𝐴𝐸 , the goal of decoupling
is to obtain a state N 𝐴→𝐵 (𝜌 𝐴𝐸 ) that is decoupled from the system 𝐸, i.e., a state
approximately of the form 𝜏𝐵 ⊗ 𝜌 𝐸 , where 𝜌 𝐸 = Tr 𝐴 [𝜌 𝐴𝐸 ] and 𝜏𝐵 is some state.
Note that the reduced state of 𝐸 at the output is the same as the reduced state of 𝐸
at the input because we apply a channel only to the system 𝐴 and such a channel is
trace preserving. We also allow Alice an arbitrary unitary that she can apply to her
system before sending it through the channel $\mathcal{N}_{A\to B}$, so that we require

\begin{equation}
\mathcal{N}_{A\to B}(U_A \rho_{AE} U_A^\dagger) \approx_\varepsilon \tau_B \otimes \rho_E, \tag{13.A.1}
\end{equation}

for some state $\tau_B$, up to some error $\varepsilon$. Of course, exact equality in (13.A.1)
cannot be obtained in general, and we make the goal instead to obtain an output
state $\mathcal{N}_{A\to B}(U_A \rho_{AE} U_A^\dagger)$ that is as close as possible to the state $\Phi_B^{\mathcal{N}} \otimes \rho_E$, where
$\Phi_B^{\mathcal{N}} = \operatorname{Tr}_A[\Phi_{AB}^{\mathcal{N}}]$ and $\Phi_{AB}^{\mathcal{N}}$ is the Choi state of $\mathcal{N}_{A\to B}$. The choice for $\tau_B$ given by the
reduced Choi state might seem arbitrary, but it is taken for analytical considerations
and leads to a good bound for our purposes. By using trace distance as our measure
of closeness, the goal is to determine an upper bound on the following quantity:

\begin{equation}
\min_{U_A} \left\| \mathcal{N}_{A\to B}(U_A \rho_{AE} U_A^\dagger) - \Phi_B^{\mathcal{N}} \otimes \rho_E \right\|_1. \tag{13.A.2}
\end{equation}

Note that the minimum over all unitaries never exceeds the average, meaning that

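\begin{equation}
\min_{U_A} \left\| \mathcal{N}_{A\to B}(U_A \rho_{AE} U_A^\dagger) - \Phi_B^{\mathcal{N}} \otimes \rho_E \right\|_1
\leq \int_{U_A} \left\| \mathcal{N}_{A\to B}(U_A \rho_{AE} U_A^\dagger) - \Phi_B^{\mathcal{N}} \otimes \rho_E \right\|_1 \mathrm{d}U_A, \tag{13.A.3}
\end{equation}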

where the integral over all unitaries 𝑈 𝐴 is with respect to the Haar measure, and it
can be thought of as a uniform average over the continuous set of all unitaries 𝑈 𝐴
acting on the system 𝐴. Theorem 13.11 provides an upper bound on the right-hand
side of the inequality above, and we restate the result here for convenience:
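\begin{equation}
\int_{U_A} \left\| \mathcal{N}_{A\to B}(U_A \rho_{AE} U_A^\dagger) - \Phi_B^{\mathcal{N}} \otimes \rho_E \right\|_1 \mathrm{d}U_A
\leq 2^{-\frac{1}{2}\widetilde{H}_2(A|E)_\rho - \frac{1}{2}\widetilde{H}_2(A|B)_{\Phi^{\mathcal{N}}}}, \tag{13.A.4}
\end{equation}

where we recall that the sandwiched Rényi conditional entropy of order two of a
bipartite state $\omega_{CD}$ is defined as
\begin{align}
\widetilde{H}_2(C|D)_\omega &= -\inf_{\sigma_D} \widetilde{D}_2(\omega_{CD} \| \mathbb{1}_C \otimes \sigma_D) \tag{13.A.5}\\
&= -\inf_{\sigma_D} \log_2 \operatorname{Tr}\!\left[\left(\sigma_D^{-\frac{1}{4}} \omega_{CD}\, \sigma_D^{-\frac{1}{4}}\right)^2\right], \tag{13.A.6}
\end{align}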

and the optimization is over every state 𝜎𝐷 .
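Since (13.A.6) is a negated infimum over $\sigma_D$, evaluating the expression at any single positive definite state $\sigma_D$ gives a lower bound on $\widetilde{H}_2(C|D)_\omega$. The sketch below does exactly that with NumPy; the function names and the choice of $\sigma_D$ as the maximally mixed state are illustrative assumptions, and no optimization over $\sigma_D$ is attempted.

```python
import numpy as np

def frac_power(sigma, p):
    """sigma**p for a positive definite Hermitian matrix, via eigendecomposition."""
    evals, evecs = np.linalg.eigh(sigma)
    return (evecs * evals**p) @ evecs.conj().T

def renyi2_candidate(omega_CD, sigma_D, dC):
    """-log2 Tr[(sigma_D^{-1/4} omega_CD sigma_D^{-1/4})^2] for one fixed sigma_D.

    This is a lower bound on the sandwiched Renyi conditional entropy of
    order two, because the definition takes a supremum of this expression
    over all states sigma_D.
    """
    s = np.kron(np.eye(dC), frac_power(sigma_D, -0.25))
    x = s @ omega_CD @ s
    return float(-np.log2(np.trace(x @ x).real))

# Illustrative inputs (assumptions): two-qubit states, sigma_D maximally mixed.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell_state = np.outer(phi, phi)
maximally_mixed = np.eye(4) / 4
sigma_D = np.eye(2) / 2
```

With these inputs, the candidate value is -1 for the maximally entangled state and +1 for the maximally mixed state.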

Proof of Theorem 13.11

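Let
\begin{align}
M_{BE} &:= \mathcal{N}_{A\to B}(U_A \rho_{AE} U_A^\dagger) - \Phi_B^{\mathcal{N}} \otimes \rho_E, \tag{13.A.7}\\
\sigma_{BE} &:= \tau_B \otimes \zeta_E, \tag{13.A.8}
\end{align}
with $\tau_B$ and $\zeta_E$ arbitrary positive definite states. By the variational characterization
of the trace norm in (2.2.130), we have that
\begin{equation}
\| M_{BE} \|_1 = \max_{U_{BE}} |\operatorname{Tr}[U_{BE} M_{BE}]|, \tag{13.A.9}
\end{equation}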

where the optimization is over every unitary 𝑈𝐵𝐸 . Using the Cauchy–Schwarz
inequality (see (2.2.30)), and suppressing system labels for brevity, we obtain
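\begin{align}
\| M \|_1 &= \max_U |\operatorname{Tr}[U M]| \tag{13.A.10}\\
&= \max_U \left|\operatorname{Tr}\!\left[\sigma^{\frac{1}{4}} U \sigma^{\frac{1}{4}}\, \sigma^{-\frac{1}{4}} M \sigma^{-\frac{1}{4}}\right]\right| \tag{13.A.11}\\
&\leq \max_U \sqrt{\operatorname{Tr}\!\left[\sigma^{\frac{1}{4}} U \sigma^{\frac{1}{4}} \sigma^{\frac{1}{4}} U^\dagger \sigma^{\frac{1}{4}}\right] \operatorname{Tr}\!\left[\sigma^{-\frac{1}{4}} M \sigma^{-\frac{1}{2}} M^\dagger \sigma^{-\frac{1}{4}}\right]} \tag{13.A.12}\\
&= \max_U \sqrt{\operatorname{Tr}\!\left[\sigma^{\frac{1}{2}} U \sigma^{\frac{1}{2}} U^\dagger\right] \operatorname{Tr}\!\left[\sigma^{-\frac{1}{4}} M \sigma^{-\frac{1}{2}} M^\dagger \sigma^{-\frac{1}{4}}\right]}. \tag{13.A.13}
\end{align}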
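Since $\sigma^{\frac{1}{2}}$ and $U \sigma^{\frac{1}{2}} U^\dagger$ are positive definite for every unitary $U$, by the
Cauchy–Schwarz inequality, we conclude that
\begin{align}
\operatorname{Tr}\!\left[\sigma^{\frac{1}{2}} U \sigma^{\frac{1}{2}} U^\dagger\right] &= \left|\operatorname{Tr}\!\left[\sigma^{\frac{1}{2}} U \sigma^{\frac{1}{2}} U^\dagger\right]\right| \tag{13.A.14}\\
&\leq \sqrt{\operatorname{Tr}\!\left[\sigma^{\frac{1}{2}} \sigma^{\frac{1}{2}}\right] \operatorname{Tr}\!\left[U \sigma^{\frac{1}{2}} U^\dagger U \sigma^{\frac{1}{2}} U^\dagger\right]} \tag{13.A.15}\\
&= \operatorname{Tr}[\sigma] \tag{13.A.16}\\
&= 1 \tag{13.A.17}
\end{align}
for every unitary $U$, which implies that
\begin{equation}
\max_U \operatorname{Tr}\!\left[\sigma^{\frac{1}{2}} U \sigma^{\frac{1}{2}} U^\dagger\right] \leq \operatorname{Tr}[\sigma] = 1. \tag{13.A.18}
\end{equation}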

On the other hand, by taking 𝑈 = 1 in the optimization over 𝑈, we obtain

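\begin{equation}
\max_U \operatorname{Tr}\!\left[\sigma^{\frac{1}{2}} U \sigma^{\frac{1}{2}} U^\dagger\right] \geq \operatorname{Tr}[\sigma] = 1, \tag{13.A.19}
\end{equation}

which means that
\begin{equation}
\max_U \operatorname{Tr}\!\left[\sigma^{\frac{1}{2}} U \sigma^{\frac{1}{2}} U^\dagger\right] = \operatorname{Tr}[\sigma] = 1. \tag{13.A.20}
\end{equation}
Therefore,
\begin{align}
\| M \|_1 &\leq \sqrt{\operatorname{Tr}\!\left[\sigma^{-\frac{1}{4}} M \sigma^{-\frac{1}{2}} M^\dagger \sigma^{-\frac{1}{4}}\right]} \tag{13.A.21}\\
&= \sqrt{\operatorname{Tr}\!\left[\left(\sigma^{-\frac{1}{4}} M \sigma^{-\frac{1}{4}}\right)^2\right]}, \tag{13.A.22}
\end{align}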

where the last line follows because 𝑀 is Hermitian. So we have that

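\begin{equation}
\left\| \mathcal{N}_{A\to B}(U_A \rho_{AE} U_A^\dagger) - \Phi_B^{\mathcal{N}} \otimes \rho_E \right\|_1
\leq \sqrt{\operatorname{Tr}\!\left[\left((\tau_B \otimes \zeta_E)^{-\frac{1}{4}} \left(\mathcal{N}_{A\to B}(U_A \rho_{AE} U_A^\dagger) - \Phi_B^{\mathcal{N}} \otimes \rho_E\right) (\tau_B \otimes \zeta_E)^{-\frac{1}{4}}\right)^2\right]}. \tag{13.A.23}
\end{equation}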

Now, define
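\begin{align}
\widetilde{\mathcal{N}}_{A\to B}(\cdot) &:= \tau_B^{-\frac{1}{4}}\, \mathcal{N}_{A\to B}(\cdot)\, \tau_B^{-\frac{1}{4}}, \tag{13.A.24}\\
\widetilde{\rho}_{AE} &:= \zeta_E^{-\frac{1}{4}}\, \rho_{AE}\, \zeta_E^{-\frac{1}{4}}. \tag{13.A.25}
\end{align}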

Using these definitions, we can write the inequality above as

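\begin{equation}
\left\| \mathcal{N}_{A\to B}(U_A \rho_{AE} U_A^\dagger) - \Phi_B^{\mathcal{N}} \otimes \rho_E \right\|_1
\leq \sqrt{\operatorname{Tr}\!\left[\left(\widetilde{\mathcal{N}}_{A\to B}(U_A \widetilde{\rho}_{AE} U_A^\dagger) - \Phi_B^{\widetilde{\mathcal{N}}} \otimes \widetilde{\rho}_E\right)^2\right]}. \tag{13.A.26}
\end{equation}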

Taking the integral over unitaries 𝑈 𝐴 on both sides of this inequality, and using
Jensen’s inequality (see (2.3.21)), which applies because the square root function is
concave, we obtain
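\begin{align}
&\int_{U_A} \left\| \mathcal{N}_{A\to B}(U_A \rho_{AE} U_A^\dagger) - \Phi_B^{\mathcal{N}} \otimes \rho_E \right\|_1 \mathrm{d}U_A \notag\\
&\quad\leq \int_{U_A} \sqrt{\operatorname{Tr}\!\left[\left(\widetilde{\mathcal{N}}_{A\to B}(U_A \widetilde{\rho}_{AE} U_A^\dagger) - \Phi_B^{\widetilde{\mathcal{N}}} \otimes \widetilde{\rho}_E\right)^2\right]}\, \mathrm{d}U_A \tag{13.A.27}\\
&\quad\leq \sqrt{\int_{U_A} \operatorname{Tr}\!\left[\left(\widetilde{\mathcal{N}}_{A\to B}(U_A \widetilde{\rho}_{AE} U_A^\dagger) - \Phi_B^{\widetilde{\mathcal{N}}} \otimes \widetilde{\rho}_E\right)^2\right] \mathrm{d}U_A}. \tag{13.A.28}
\end{align}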
𝑈𝐴

Expanding the integral on the right-hand side of the inequality above leads to
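\begin{align}
&\int_{U_A} \operatorname{Tr}\!\left[\left(\widetilde{\mathcal{N}}_{A\to B}(U_A \widetilde{\rho}_{AE} U_A^\dagger) - \Phi_B^{\widetilde{\mathcal{N}}} \otimes \widetilde{\rho}_E\right)^2\right] \mathrm{d}U_A \notag\\
&\quad= \int_{U_A} \operatorname{Tr}\!\left[\left(\widetilde{\mathcal{N}}_{A\to B}(U_A \widetilde{\rho}_{AE} U_A^\dagger)\right)^2\right] \mathrm{d}U_A \tag{13.A.29}\\
&\qquad - 2 \int_{U_A} \operatorname{Tr}\!\left[\widetilde{\mathcal{N}}_{A\to B}(U_A \widetilde{\rho}_{AE} U_A^\dagger)\,(\Phi_B^{\widetilde{\mathcal{N}}} \otimes \widetilde{\rho}_E)\right] \mathrm{d}U_A + \operatorname{Tr}\!\left[(\Phi_B^{\widetilde{\mathcal{N}}} \otimes \widetilde{\rho}_E)^2\right] \tag{13.A.30}\\
&\quad= \int_{U_A} \operatorname{Tr}\!\left[\left(\widetilde{\mathcal{N}}_{A\to B}(U_A \widetilde{\rho}_{AE} U_A^\dagger)\right)^2\right] \mathrm{d}U_A \notag\\
&\qquad - 2 \operatorname{Tr}\!\left[\widetilde{\mathcal{N}}_{A\to B}\!\left(\int_{U_A} U_A \widetilde{\rho}_{AE} U_A^\dagger\, \mathrm{d}U_A\right) (\Phi_B^{\widetilde{\mathcal{N}}} \otimes \widetilde{\rho}_E)\right] + \operatorname{Tr}\!\left[(\Phi_B^{\widetilde{\mathcal{N}}} \otimes \widetilde{\rho}_E)^2\right]. \tag{13.A.31}
\end{align}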
Now, using (13.1.83), we have that
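\begin{equation}
\widetilde{\mathcal{N}}_{A\to B}\!\left(\int_{U_A} U_A \widetilde{\rho}_{AE} U_A^\dagger\, \mathrm{d}U_A\right) = \widetilde{\mathcal{N}}_{A\to B}(\pi_A \otimes \operatorname{Tr}_A[\widetilde{\rho}_{AE}]) = \Phi_B^{\widetilde{\mathcal{N}}} \otimes \widetilde{\rho}_E. \tag{13.A.32}
\end{equation}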

Therefore,
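\begin{align}
&\int_{U_A} \operatorname{Tr}\!\left[\left(\widetilde{\mathcal{N}}_{A\to B}(U_A \widetilde{\rho}_{AE} U_A^\dagger) - \Phi_B^{\widetilde{\mathcal{N}}} \otimes \widetilde{\rho}_E\right)^2\right] \mathrm{d}U_A \notag\\
&\quad= \int_{U_A} \operatorname{Tr}\!\left[\left(\widetilde{\mathcal{N}}_{A\to B}(U_A \widetilde{\rho}_{AE} U_A^\dagger)\right)^2\right] \mathrm{d}U_A - \operatorname{Tr}\!\left[(\Phi_B^{\widetilde{\mathcal{N}}})^2\right] \operatorname{Tr}\!\left[\widetilde{\rho}_E^{\,2}\right]. \tag{13.A.33}
\end{align}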

We now use the fact that

Tr[𝑋 2 ] = Tr[𝑋 ⊗2 𝐹] (13.A.34)
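for every operator $X$, where $F$ is the swap operator defined in (2.5.10). In (13.A.33)
above, we have $X \equiv X_{BE} = \widetilde{\mathcal{N}}_{A\to B}(U_A \widetilde{\rho}_{AE} U_A^\dagger)$, which is a bipartite operator. The
corresponding swap operator is $F_{BE} = F_B \otimes F_E$, with $F_B$ the swap operator acting on
two copies of $\mathcal{H}_B$ and $F_E$ the swap operator acting on two copies of $\mathcal{H}_E$. Therefore,
\begin{align}
\operatorname{Tr}\!\left[\left(\widetilde{\mathcal{N}}_{A\to B}(U_A \widetilde{\rho}_{AE} U_A^\dagger)\right)^2\right]
&= \operatorname{Tr}\!\left[\left(\widetilde{\mathcal{N}}_{A\to B}(U_A \widetilde{\rho}_{AE} U_A^\dagger)\right)^{\otimes 2} (F_B \otimes F_E)\right] \tag{13.A.35}\\
&= \operatorname{Tr}\!\left[\widetilde{\mathcal{N}}_{A\to B}^{\otimes 2}\!\left(U_A^{\otimes 2}\, \widetilde{\rho}_{AE}^{\,\otimes 2}\, (U_A^\dagger)^{\otimes 2}\right) (F_B \otimes F_E)\right] \tag{13.A.36}\\
&= \operatorname{Tr}\!\left[\widetilde{\rho}_{AE}^{\,\otimes 2} \left((U_A^\dagger)^{\otimes 2}\, (\widetilde{\mathcal{N}}_{A\to B}^{\otimes 2})^\dagger(F_B)\, U_A^{\otimes 2} \otimes F_E\right)\right], \tag{13.A.37}
\end{align}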

where the last line follows from the definition of the adjoint of a channel. We thus
have
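\begin{align}
&\int_{U_A} \operatorname{Tr}\!\left[\left(\widetilde{\mathcal{N}}_{A\to B}(U_A \widetilde{\rho}_{AE} U_A^\dagger)\right)^2\right] \mathrm{d}U_A \notag\\
&\quad= \operatorname{Tr}\!\left[\widetilde{\rho}_{AE}^{\,\otimes 2} \left(\int_{U_A} (U_A^\dagger)^{\otimes 2}\, (\widetilde{\mathcal{N}}_{A\to B}^{\otimes 2})^\dagger(F_B)\, U_A^{\otimes 2}\, \mathrm{d}U_A \otimes F_E\right)\right]. \tag{13.A.38}
\end{align}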
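The swap identity $\operatorname{Tr}[X^2] = \operatorname{Tr}[X^{\otimes 2} F]$ from (13.A.34), which drives the steps above, can be checked numerically. The following NumPy sketch builds the swap operator explicitly; the dimension and the random test operator are arbitrary choices made for illustration.

```python
import numpy as np

def swap_operator(d):
    """Swap operator F on C^d (x) C^d, defined by F|i,j> = |j,i>."""
    F = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            F[j * d + i, i * d + j] = 1.0
    return F

# Arbitrary (not necessarily Hermitian) test operator on C^3.
rng = np.random.default_rng(0)
d = 3
X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

lhs = np.trace(X @ X)                              # Tr[X^2]
rhs = np.trace(np.kron(X, X) @ swap_operator(d))   # Tr[(X (x) X) F]
```

Both sides agree up to floating-point error, for any square matrix `X`.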

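Now, we use the following known fact (a standard result in Schur–Weyl duality):
for every operator $X$ acting on $\mathbb{C}^d \otimes \mathbb{C}^d$, with $d \geq 2$,
\begin{equation}
\int_U (U^\dagger)^{\otimes 2} X U^{\otimes 2}\, \mathrm{d}U = \alpha \mathbb{1} + \beta F, \tag{13.A.39}
\end{equation}

where $F$ is again the swap operator, and

\begin{align}
\alpha &= \frac{\operatorname{Tr}[X]}{d^2 - 1} - \frac{\operatorname{Tr}[XF]}{d(d^2 - 1)}, \tag{13.A.40}\\
\beta &= \frac{\operatorname{Tr}[XF]}{d^2 - 1} - \frac{\operatorname{Tr}[X]}{d(d^2 - 1)}. \tag{13.A.41}
\end{align}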
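As a sanity check on the coefficients in (13.A.40) and (13.A.41): the twirl in (13.A.39) leaves both $\operatorname{Tr}[X]$ and $\operatorname{Tr}[XF]$ unchanged, since $\mathbb{1}$ and $F$ commute with $U^{\otimes 2}$. So the operator $\alpha \mathbb{1} + \beta F$ must reproduce these two traces. The sketch below verifies this with NumPy for a random operator; it does not perform the Haar integral itself, and all names are illustrative assumptions.

```python
import numpy as np

def swap_operator(d):
    """Swap operator F on C^d (x) C^d, defined by F|i,j> = |j,i>."""
    F = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            F[j * d + i, i * d + j] = 1.0
    return F

def twirl_coefficients(X, d):
    """Coefficients (alpha, beta) of (13.A.40)-(13.A.41); requires d >= 2."""
    F = swap_operator(d)
    tr_X, tr_XF = np.trace(X), np.trace(X @ F)
    alpha = tr_X / (d**2 - 1) - tr_XF / (d * (d**2 - 1))
    beta = tr_XF / (d**2 - 1) - tr_X / (d * (d**2 - 1))
    return alpha, beta

# Arbitrary test operator on C^3 (x) C^3.
rng = np.random.default_rng(0)
d = 3
X = rng.normal(size=(d * d, d * d)) + 1j * rng.normal(size=(d * d, d * d))
F = swap_operator(d)
alpha, beta = twirl_coefficients(X, d)
twirled = alpha * np.eye(d * d) + beta * F
```

The two invariants `np.trace(twirled)` and `np.trace(twirled @ F)` match `np.trace(X)` and `np.trace(X @ F)`, respectively.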
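Taking $X \equiv (\widetilde{\mathcal{N}}_{A\to B}^{\otimes 2})^\dagger(F_B)$, which is an operator acting on two copies of $\mathcal{H}_A$, we
obtain

\begin{align}
\operatorname{Tr}\!\left[(\widetilde{\mathcal{N}}_{A\to B}^{\otimes 2})^\dagger(F_B)\right] &= \operatorname{Tr}\!\left[\widetilde{\mathcal{N}}_{A\to B}^{\otimes 2}(\mathbb{1}_A^{\otimes 2})\, F_B\right] \tag{13.A.42}\\
&= d_A^2 \operatorname{Tr}\!\left[(\Phi_B^{\widetilde{\mathcal{N}}})^{\otimes 2} F_B\right] \tag{13.A.43}\\
&= d_A^2 \operatorname{Tr}\!\left[(\Phi_B^{\widetilde{\mathcal{N}}})^2\right], \tag{13.A.44}
\end{align}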

where the last line follows from (13.A.34). By similar reasoning, we obtain
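\begin{align}
\operatorname{Tr}\!\left[F_A\, (\widetilde{\mathcal{N}}_{A\to B}^{\otimes 2})^\dagger(F_B)\right] &= \operatorname{Tr}\!\left[\widetilde{\mathcal{N}}_{A\to B}^{\otimes 2}(F_A)\, F_B\right] \tag{13.A.45}\\
&= d_A^2 \operatorname{Tr}\!\left[(F_A \otimes F_B)\, (\Phi_{AB}^{\widetilde{\mathcal{N}}})^{\otimes 2}\right] \tag{13.A.46}\\
&= d_A^2 \operatorname{Tr}\!\left[(\Phi_{AB}^{\widetilde{\mathcal{N}}})^2\right], \tag{13.A.47}
\end{align}

where the second equality follows by expressing the action of $\widetilde{\mathcal{N}}_{A\to B}^{\otimes 2}$ on $F_A$ with the
Choi state $\Phi_{AB}^{\widetilde{\mathcal{N}}}$, using (4.2.5). To obtain the last equality, we again used (13.A.34).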
We thus have
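\begin{align}
\alpha &= \frac{\operatorname{Tr}[(\Phi_B^{\widetilde{\mathcal{N}}})^2]}{d_A^2 - 1} \left(d_A^2 - d_A \frac{\operatorname{Tr}[(\Phi_{AB}^{\widetilde{\mathcal{N}}})^2]}{\operatorname{Tr}[(\Phi_B^{\widetilde{\mathcal{N}}})^2]}\right), \tag{13.A.48}\\
\beta &= \frac{\operatorname{Tr}[(\Phi_{AB}^{\widetilde{\mathcal{N}}})^2]}{d_A^2 - 1} \left(d_A^2 - d_A \frac{\operatorname{Tr}[(\Phi_B^{\widetilde{\mathcal{N}}})^2]}{\operatorname{Tr}[(\Phi_{AB}^{\widetilde{\mathcal{N}}})^2]}\right). \tag{13.A.49}
\end{align}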

We now make use of the following general fact, whose proof we provide below
in Lemma 13.32: for every non-zero positive semi-definite operator 𝑃 𝐴𝐵 with
𝑃 𝐵 B Tr 𝐴 [𝑃 𝐴𝐵 ], the following inequalities hold

\begin{equation}
\frac{1}{d_A} \leq \frac{\operatorname{Tr}[P_{AB}^2]}{\operatorname{Tr}[P_B^2]} \leq d_A. \tag{13.A.50}
\end{equation}

Applying these inequalities to the expressions for 𝛼 and 𝛽 above, we obtain

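\begin{equation}
\alpha \leq \operatorname{Tr}\!\left[(\Phi_B^{\widetilde{\mathcal{N}}})^2\right], \qquad \beta \leq \operatorname{Tr}\!\left[(\Phi_{AB}^{\widetilde{\mathcal{N}}})^2\right]. \tag{13.A.51}
\end{equation}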

We thus have that

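\begin{align}
\int_{U_A} \operatorname{Tr}\!\left[\left(\widetilde{\mathcal{N}}_{A\to B}(U_A \widetilde{\rho}_{AE} U_A^\dagger)\right)^2\right] \mathrm{d}U_A \tag{13.A.52}
&= \operatorname{Tr}\!\left[\widetilde{\rho}_{AE}^{\,\otimes 2} \left((\alpha \mathbb{1}_A^{\otimes 2} + \beta F_A) \otimes F_E\right)\right] \tag{13.A.53}\\
&= \alpha \operatorname{Tr}[\widetilde{\rho}_E^{\,2}] + \beta \operatorname{Tr}[\widetilde{\rho}_{AE}^{\,2}] \tag{13.A.54}\\
&\leq \operatorname{Tr}\!\left[(\Phi_B^{\widetilde{\mathcal{N}}})^2\right] \operatorname{Tr}[\widetilde{\rho}_E^{\,2}] + \operatorname{Tr}\!\left[(\Phi_{AB}^{\widetilde{\mathcal{N}}})^2\right] \operatorname{Tr}[\widetilde{\rho}_{AE}^{\,2}]. \tag{13.A.55}
\end{align}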

Combining this with (13.A.33), we find that

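\begin{equation}
\int_{U_A} \operatorname{Tr}\!\left[\left(\widetilde{\mathcal{N}}_{A\to B}(U_A \widetilde{\rho}_{AE} U_A^\dagger) - \Phi_B^{\widetilde{\mathcal{N}}} \otimes \widetilde{\rho}_E\right)^2\right] \mathrm{d}U_A
\leq \operatorname{Tr}\!\left[(\Phi_{AB}^{\widetilde{\mathcal{N}}})^2\right] \operatorname{Tr}[\widetilde{\rho}_{AE}^{\,2}]. \tag{13.A.56}
\end{equation}

Then, by (13.A.28), and recalling the definitions in (13.A.24) and (13.A.25), we
obtain
\begin{align}
\int_{U_A} \left\| \mathcal{N}_{A\to B}(U_A \rho_{AE} U_A^\dagger) - \Phi_B^{\mathcal{N}} \otimes \rho_E \right\|_1 \mathrm{d}U_A
&\leq \sqrt{\operatorname{Tr}\!\left[(\Phi_{AB}^{\widetilde{\mathcal{N}}})^2\right] \operatorname{Tr}[\widetilde{\rho}_{AE}^{\,2}]} \tag{13.A.57}\\
&= \left(\operatorname{Tr}\!\left[\left(\tau_B^{-\frac{1}{4}} \Phi_{AB}^{\mathcal{N}} \tau_B^{-\frac{1}{4}}\right)^2\right]\right)^{\frac{1}{2}} \left(\operatorname{Tr}\!\left[\left(\zeta_E^{-\frac{1}{4}} \rho_{AE}\, \zeta_E^{-\frac{1}{4}}\right)^2\right]\right)^{\frac{1}{2}}. \tag{13.A.58}
\end{align}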

This inequality holds for all states 𝜏𝐵 and 𝜁 𝐸 , which means that
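\begin{align}
&\int_{U_A} \left\| \mathcal{N}_{A\to B}(U_A \rho_{AE} U_A^\dagger) - \Phi_B^{\mathcal{N}} \otimes \rho_E \right\|_1 \mathrm{d}U_A \notag\\
&\quad\leq \left(\inf_{\tau_B} \operatorname{Tr}\!\left[\left(\tau_B^{-\frac{1}{4}} \Phi_{AB}^{\mathcal{N}} \tau_B^{-\frac{1}{4}}\right)^2\right]\right)^{\frac{1}{2}} \left(\inf_{\zeta_E} \operatorname{Tr}\!\left[\left(\zeta_E^{-\frac{1}{4}} \rho_{AE}\, \zeta_E^{-\frac{1}{4}}\right)^2\right]\right)^{\frac{1}{2}} \tag{13.A.59}\\
&\quad= 2^{-\frac{1}{2}\widetilde{H}_2(A|B)_{\Phi^{\mathcal{N}}} - \frac{1}{2}\widetilde{H}_2(A|E)_\rho} \tag{13.A.60}\\
&\quad= 2^{-\frac{1}{2}\widetilde{H}_2(A|E)_\rho - \frac{1}{2}\widetilde{H}_2(A|B)_{\Phi^{\mathcal{N}}}}, \tag{13.A.61}
\end{align}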

which completes the proof. ■

Lemma 13.32
For every non-zero positive semi-definite operator 𝑃 𝐴𝐵 , with 𝑃 𝐵 = Tr 𝐴 [𝑃 𝐴𝐵 ],
it holds that
\begin{equation}
\frac{1}{d_A} \leq \frac{\operatorname{Tr}[P_{AB}^2]}{\operatorname{Tr}[P_B^2]} \leq d_A. \tag{13.A.62}
\end{equation}

Proof: Letting $A'$ denote a copy of $A$, and applying the Cauchy–Schwarz inequality
(see (2.2.30)), we find that

\begin{align}
\operatorname{Tr}[P_B^2] &= \operatorname{Tr}[(P_{AB} \otimes \mathbb{1}_{A'})(P_{A'B} \otimes \mathbb{1}_A)] \tag{13.A.63}\\
&\leq \sqrt{\operatorname{Tr}[(P_{AB} \otimes \mathbb{1}_{A'})^2]\, \operatorname{Tr}[(P_{A'B} \otimes \mathbb{1}_A)^2]} \tag{13.A.64}\\
&= \operatorname{Tr}[P_{AB}^2 \otimes \mathbb{1}_{A'}] \tag{13.A.65}\\
&= d_A \operatorname{Tr}[P_{AB}^2]. \tag{13.A.66}
\end{align}

The other inequality follows from the operator inequality $P_{AB} \leq d_A \mathbb{1}_A \otimes P_B$,
after sandwiching it by $P_{AB}^{\frac{1}{2}}$ and taking a full trace. This operator inequality follows
in turn because $\frac{1}{d_A^2} \sum_{i=0}^{d_A^2 - 1} U_A^i P_{AB} (U_A^i)^\dagger = \pi_A \otimes P_B$ (Lemma 3.15), for $\{U_A^i\}_i$ the
set of Heisenberg–Weyl operators, and by noticing that all terms in the sum are
positive semi-definite and one term in the sum is $\frac{1}{d_A^2} P_{AB}$. ■
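The two-sided bound of Lemma 13.32 can be probed numerically. The NumPy sketch below evaluates the ratio $\operatorname{Tr}[P_{AB}^2]/\operatorname{Tr}[P_B^2]$ for a random positive semi-definite operator and for two hand-picked operators that, under the stated dimensions, sit at the two ends of the allowed range. The function name and the test operators are assumptions chosen for illustration.

```python
import numpy as np

def purity_ratio(P_AB, dA, dB):
    """Tr[P_AB^2] / Tr[P_B^2] with P_B = Tr_A[P_AB] (A is the first tensor factor)."""
    P_B = np.trace(P_AB.reshape(dA, dB, dA, dB), axis1=0, axis2=2)
    return float(np.trace(P_AB @ P_AB).real / np.trace(P_B @ P_B).real)

# Random positive semi-definite operator on C^2 (x) C^3.
rng = np.random.default_rng(1)
dA, dB = 2, 3
G = rng.normal(size=(dA * dB, dA * dB)) + 1j * rng.normal(size=(dA * dB, dA * dB))
random_psd = G @ G.conj().T

# Identity on A tensored with an operator on B: attains the lower bound 1/dA.
Q = np.diag([0.5, 0.3, 0.2])
identity_tensor_Q = np.kron(np.eye(dA), Q)

# Two-qubit maximally entangled state: attains the upper bound dA = 2.
phi = np.eye(2).reshape(4) / np.sqrt(2)
bell_state = np.outer(phi, phi)
```

Here `purity_ratio(identity_tensor_Q, 2, 3)` equals 1/2 and `purity_ratio(bell_state, 2, 2)` equals 2, while the random operator gives a value strictly in between.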

Chapter 14

Quantum Communication
In the previous chapter, we considered entanglement distillation, which is the task
of taking many copies of a mixed entangled state 𝜌 𝐴𝐵 shared by Alice and Bob
and transforming them to a maximally entangled state Φ 𝐴ˆ 𝐵ˆ of Schmidt rank 𝑑 ≥ 2.
Using the quantum teleportation protocol, the maximally entangled state resulting
from entanglement distillation can be used for quantum communication, in the
sense that Alice can transfer an arbitrary state of log2 𝑑 qubits to Bob.
Now, if Alice and Bob are distantly separated, then how do they obtain many
copies of the shared entangled state 𝜌 𝐴𝐵 in the first place? Typically, one of the
parties, say Alice, prepares two quantum systems in an entangled state and sends
one of them through a quantum channel N 𝐴→𝐵 to Bob, thereby establishing the
shared entangled state. Rather than use the shared entangled state as the resource
for communication, it is more natural to use the quantum channel itself as the
resource, as it could in principle lead to better strategies and higher rates. This is
the scenario that we consider in this chapter.
Recall that in the case of classical communication from Chapter 12, we
considered messages from a set M, and the goal was to find upper and lower bounds
on the maximum number log2 |M| of transmitted bits over a quantum channel for a
given error 𝜀. Now, in the case of quantum communication, the goal is to transmit
a given number of qubits, rather than bits, for a given error 𝜀. Formally, suppose
that the sender, Alice, holds a quantum system 𝐴′ with dimension 𝑑 ≥ 1 that she
would like to transmit over the channel N to Bob, the receiver. In general, the
state of this system could be entangled with the state of some other system 𝑅 (of
arbitrary dimension) to which Alice does not have access, and so we suppose that
the joint state is a pure state $\Psi_{RA'}$ with Schmidt rank $d$. Note that, by the Schmidt
decomposition theorem (Theorem 2.2), the dimension of $R$ need not exceed the
dimension of $A'$, which is $d$. The goal is to determine the largest value of $\log_2 d$
(which can be thought of as the number of qubits in the system $A'$) for which the
$A'$ part of an arbitrary entangled state $\Psi_{RA'}$ can be transmitted with error at most
$\varepsilon$. This general quantum communication scenario is known as strong subspace
transmission. As usual, Alice and Bob are allowed local encoding and decoding
channels, respectively, to help with this task; see Figure 14.1 for a depiction of a
one-shot protocol for quantum communication. In the asymptotic setting, they are
also allowed as many uses of the channel $\mathcal{N}$ as desired. The quantum capacity of $\mathcal{N}$,
denoted by $Q(\mathcal{N})$, is then the largest value of $\frac{1}{n} \log_2 d$ such that the $A'$ part of an
arbitrary pure state $\Psi_{RA'}$ can be transmitted to Bob with error that vanishes as the
number $n$ of channel uses increases.

Figure 14.1: Depiction of a quantum communication protocol for one use of
the quantum channel $\mathcal{N}$. Alice shares the maximally entangled state $\Psi_{RA'}$ with
an inaccessible reference system $R$. She uses the channel $\mathcal{E}_{A'\to A}$ to encode her
system $A'$ into a system $A$, which is sent through the channel $\mathcal{N}_{A\to B}$. Bob then
applies the decoding channel $\mathcal{D}_{B\to B'}$, such that the final state is $\omega_{RB'}$ shared
between Bob and the reference system.

Note that the notion of quantum communication presented above (strong
subspace transmission) is completely general and includes as special cases the
following information-processing tasks:
1. Entanglement transmission: Here, Alice’s system $A'$ is in the maximally
entangled state $\Phi_{RA'}$ with the reference system $R$, and the goal is to transmit
the system $A'$ to Bob. This is a special case of strong subspace transmission in
which $\Psi_{RA'} = \Phi_{RA'}$.
2. Entanglement generation: Alice prepares a pure entangled state Ψ𝐴′ 𝐴 , with
𝑑 𝐴′ = 𝑑 ≥ 1 and 𝐴 the input system to the channel N. The goal is to transmit
the system 𝐴 to Bob such that the resulting state shared by Alice and Bob is the
maximally entangled state of Schmidt rank 𝑑. We show in Appendix 14.A how
entanglement generation is related to the notion of quantum communication
that we consider here.
Note that entanglement generation is similar to entanglement distillation, and
we elaborate more upon this similarity in Section 14.1.3.
3. Subspace transmission: In this scenario, Alice wishes to send a system 𝐴′,
in an arbitrary pure state 𝜑 𝐴′ , to Bob. This is a special case of the protocol
above in which the system 𝑅 is not entangled with Alice’s system, so that
Ψ𝑅 𝐴′ = 𝜙 𝑅 ⊗ 𝜑 𝐴′ for some pure state 𝜙 𝑅 on 𝑅.
Note that subspace transmission can be accomplished by first performing
an entanglement transmission or entanglement generation protocol and then
performing the quantum teleportation protocol (however, this approach requires
the use of a forward classical channel).
We discuss these alternative notions of quantum communication in more detail, as
well as prove some relationships between them, in Appendix 14.A.
We start our development of quantum communication with the one-shot set-
ting. Recall that mutual-information channel measures appear in the one-shot
upper bounds for entanglement-assisted classical communication and that Holevo-
information channel measures appear in the one-shot upper bounds for classical
communication. In the case of quantum communication, we find that coherent-
information channel measures appear in the one-shot upper bounds. The one-shot
lower bound that we obtain is based on the one-shot, one-way entanglement distilla-
tion protocol in Section 13.1.2 of the previous chapter, along with an argument for
the removal of the classical communication used in that protocol. We then move
on to the asymptotic setting, and we find that the quantum capacity of a quantum
channel N is equal to its regularized coherent information, i.e.,
\begin{equation}
Q(\mathcal{N}) = I^c_{\mathrm{reg}}(\mathcal{N}) := \lim_{n\to\infty} \frac{1}{n} I^c(\mathcal{N}^{\otimes n}), \tag{14.0.1}
\end{equation}
where we recall the definition of the coherent information $I^c(\mathcal{N})$ of a channel from
(7.11.107). Thus, as with the classical capacity, the quantum capacity is difficult to
compute in general. We then find tractable upper bounds on the quantum capacity,
and for this purpose, the channel entanglement measures defined in Chapter 10
play an important role.

14.1 One-Shot Setting

A (strong subspace) quantum communication protocol for a quantum channel $\mathcal{N}$
in the one-shot setting is illustrated in Figure 14.1. It is defined by the three
elements $(d, \mathcal{E}_{A'\to A}, \mathcal{D}_{B\to B'})$, in which $d$ is the dimension of the system $A'$, $\mathcal{E}_{A'\to A}$
is an encoding channel with $d_{A'} = d$, and $\mathcal{D}_{B\to B'}$ is a decoding channel with
$d_{B'} = d_{A'} = d$. We call the pair $(\mathcal{E}, \mathcal{D})$ of encoding and decoding channels a
quantum communication code¹ for $\mathcal{N}$.

Remark: In a strong subspace transmission protocol, the goal is to transmit one share of a
pure state Ψ𝑅 𝐴′ , with corresponding state vector |Ψ⟩𝑅 𝐴′ . Note that the state vector |Ψ⟩𝑅 𝐴′ has a
Schmidt decomposition of the form
\begin{equation}
|\Psi\rangle_{RA'} = \sum_{x=1}^{d} \sqrt{p(x)}\, |\xi_x\rangle_R \otimes |\zeta_x\rangle_{A'}, \tag{14.1.1}
\end{equation}

where $\{p(x)\}_{x=1}^d$ are the Schmidt coefficients and $\{|\xi_x\rangle_R\}_{x=1}^d$, $\{|\zeta_x\rangle_{A'}\}_{x=1}^d$ are orthonormal sets
of vectors for $R$ and $A'$, respectively. When written in this form, the state vector $|\Psi\rangle_{RA'}$ can be
understood as a coherent version of the initial state $\Phi^p_{MM'}$ for classical and entanglement-assisted
classical communication (see, e.g., (11.1.2)). The key difference in the classical-communication
case is that there is a fixed orthonormal basis {|𝑚⟩} 𝑚∈M corresponding to the messages 𝑚 in
the message set M. In quantum communication, the goal is to transmit a state of a quantum
system, which means that there is no particular basis used for communication. The encoding and
decoding channels should thus be defined so that they can reliably transmit states of the system
in an arbitrary basis.
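The Schmidt decomposition in (14.1.1) can be computed with a singular value decomposition: reshaping the coefficient vector of $|\Psi\rangle_{RA'}$ into a $d_R \times d_{A'}$ matrix, the squared singular values are the probabilities $p(x)$. The following is a minimal NumPy sketch; the function name and example vectors are illustrative assumptions.

```python
import numpy as np

def schmidt_probabilities(psi, dR, dA):
    """Return the values p(x) in (14.1.1) for a normalized vector on R (x) A'.

    The vector is reshaped into a dR x dA coefficient matrix; its singular
    values are sqrt(p(x)), sorted in non-increasing order.
    """
    singular_values = np.linalg.svd(psi.reshape(dR, dA), compute_uv=False)
    return singular_values**2

# Illustrative inputs: a maximally entangled state and a product state.
bell_vector = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
product_vector = np.array([0.0, 1.0, 0.0, 0.0])  # |0>_R (x) |1>_A'
```

The maximally entangled vector has the uniform distribution (1/2, 1/2), which is the case singled out later as entanglement transmission, while the product vector has the single nonzero value 1.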

The protocol proceeds as follows: we start with the entangled state $\Psi_{RA'}$, where
the system $A'$ belongs to Alice and the system $R$ is an arbitrary reference system
inaccessible to Alice. Alice then sends the system $A'$ through the encoding channel
$\mathcal{E}_{A'\to A}$ and sends $A$ through the channel $\mathcal{N}_{A\to B}$. Once Bob receives the system
$B$, he applies the decoding channel $\mathcal{D}_{B\to B'}$ to it. The final state of the protocol is
therefore
\begin{equation}
\omega_{RB'} := (\mathcal{D}_{B\to B'} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{A'\to A})(\Psi_{RA'}). \tag{14.1.2}
\end{equation}

¹ The quantum communication codes that we consider in this chapter are essentially equivalent
to codes for performing approximate quantum error correction. Please consult the Bibliographic
Notes in Section 14.5 for more information.

Let us now quantify the reliability of the protocol described above, i.e., how
close the final state 𝜔 𝑅𝐵′ is to the initial state Ψ𝑅 𝐴′ . In Chapter 6, we discussed two
measures of closeness for states:
• Normalized trace distance, using which the distance between the initial and
final states is $\frac{1}{2} \|\Psi_{RA'} - \omega_{RB'}\|_1$. The lower the normalized trace distance, the
more reliable the protocol is.
• Fidelity, in which case we have
\begin{equation}
F(\Psi_{RA'}, \omega_{RB'}) = \left\| \sqrt{\Psi_{RA'}} \sqrt{\omega_{RB'}} \right\|_1^2 = \langle\Psi|_{RA'}\, \omega_{RB'}\, |\Psi\rangle_{RA'} \tag{14.1.3}
\end{equation}
as the closeness measure between the initial and final states of the protocol.
The higher the fidelity, the more reliable the protocol is.
These two measures of closeness are arguably equivalent to each other, in the
sense that one can be used to bound the other via the inequality (6.2.88) shown in
Theorem 6.14, which we restate here: for all states 𝜌 and 𝜎,
\begin{equation}
1 - \sqrt{F(\rho, \sigma)} \leq \frac{1}{2} \|\rho - \sigma\|_1 \leq \sqrt{1 - F(\rho, \sigma)}. \tag{14.1.4}
\end{equation}
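The inequality (14.1.4) can be checked numerically on randomly generated states. The sketch below is a minimal NumPy illustration; the helper names, the dimension, and the way the random states are sampled are assumptions and are not taken from the text.

```python
import numpy as np

def psd_sqrt(rho):
    """Square root of a positive semi-definite matrix via eigendecomposition."""
    evals, evecs = np.linalg.eigh(rho)
    return (evecs * np.sqrt(np.clip(evals, 0.0, None))) @ evecs.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = || sqrt(rho) sqrt(sigma) ||_1^2."""
    s = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False)
    return float(np.sum(s)) ** 2

def normalized_trace_distance(rho, sigma):
    """(1/2) || rho - sigma ||_1 for Hermitian rho, sigma."""
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))

def random_state(d, rng):
    """A random density matrix (sampling scheme chosen only for illustration)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(0)
rho, sigma = random_state(3, rng), random_state(3, rng)
F = fidelity(rho, sigma)
T = normalized_trace_distance(rho, sigma)
lower, upper = 1.0 - np.sqrt(F), np.sqrt(max(0.0, 1.0 - F))
```

For every pair of states, `lower <= T <= upper` should hold up to floating-point error.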
Now, our figure of merit for the quantum communication protocol should not
be based on just one particular initial state, in this case Ψ𝑅 𝐴′ . Recall that the task of
quantum communication is to reliably transmit one share of an arbitrary pure state
through the channel N 𝐴→𝐵 . Intuitively, therefore, the closer the overall channel
D𝐵→𝐵′ ◦ N 𝐴→𝐵 ◦ E 𝐴′ →𝐴 is to the identity channel id 𝐴′ →𝐵′ , the better the code
(E, D) is at the quantum communication task, and so our figure of merit should
quantify this distance. One method to determine this distance is to calculate how
well a given code can transmit one share of a state in the worst case, i.e., by either
the highest value of the trace distance or by the lowest value of the fidelity. If a
code can be designed such that, in the worst case, the fidelity (trace distance) is
high (low), then by definition any other state will do just as well or better. We are
thus led to define the following two figures of merit:
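1. Worst-case trace distance: We define this as
\begin{equation}
\sup_{\Psi_{RA'}} \frac{1}{2} \left\| \Psi_{RA'} - (\mathcal{D}_{B\to B'} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{A'\to A})(\Psi_{RA'}) \right\|_1. \tag{14.1.5}
\end{equation}
Recalling Definition 6.18, we see that worst-case trace distance is equal to
the diamond distance between the identity channel $\mathrm{id}_{A'\to B'}$ and the channel
$\mathcal{D}_{B\to B'} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{A'\to A}$:
\begin{equation}
\frac{1}{2} \left\| \mathrm{id}_{A'\to B'} - \mathcal{D}_{B\to B'} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{A'\to A} \right\|_\diamond. \tag{14.1.6}
\end{equation}
2. Worst-case fidelity: We define this as
\begin{equation}
\inf_{\Psi_{RA'}} \langle\Psi|_{RA'}\, (\mathcal{D}_{B\to B'} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{A'\to A})(\Psi_{RA'})\, |\Psi\rangle_{RA'}, \tag{14.1.7}
\end{equation}
which is the same as the channel fidelity $F(\mathcal{D} \circ \mathcal{N} \circ \mathcal{E})$ from Definition 6.22.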

These two figures of merit are arguably equivalent, as mentioned before, due to the
inequality in (14.1.4) relating the trace distance and the fidelity. For the rest of this
chapter, we exclusively use the worst-case fidelity of the code as the figure of merit,
and we define the error probability of the quantum communication code (E, D) for
N as
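\begin{align}
p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) &:= 1 - F(\mathcal{D} \circ \mathcal{N} \circ \mathcal{E}) \tag{14.1.8}\\
&= \sup_{\Psi_{RA'}} \left\{ 1 - \langle\Psi|_{RA'}\, (\mathcal{D}_{B\to B'} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{A'\to A})(\Psi_{RA'})\, |\Psi\rangle_{RA'} \right\}. \tag{14.1.9}
\end{align}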
One can view this quantity as the quantum analogue of the maximum error
probability for the classical communication tasks of Chapters 11 and 12, which is
why we use the same notation for it as in those chapters.

Definition 14.1 (𝒅, 𝜺) Quantum Communication Protocol

Let (𝑑, E, D) be the elements of a quantum communication protocol for the
quantum channel N. The protocol is called a (𝑑, 𝜀) protocol, with 𝜀 ∈ [0, 1],
if 𝑝 ∗err (E, D; N) ≤ 𝜀.

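As alluded to at the beginning of this chapter, a special case of interest in
quantum communication is entanglement transmission, which is when the state
$\Psi_{RA'}$ is fixed to be the maximally entangled state $\Phi_{RA'} = |\Phi\rangle\!\langle\Phi|_{RA'}$, where we
recall that
\begin{equation}
|\Phi\rangle_{RA'} = \frac{1}{\sqrt{d}} \sum_{i=0}^{d-1} |i, i\rangle_{RA'}. \tag{14.1.10}
\end{equation}
Now, since this state is a particular state in the optimization in (14.1.9) for the error
probability $p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N})$, we conclude that
\begin{align}
p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N})
&\geq 1 - \langle\Phi|_{RA'}\, (\mathcal{D}_{B\to B'} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{A'\to A})(\Phi_{RA'})\, |\Phi\rangle_{RA'} \tag{14.1.11}\\
&= 1 - F_e(\mathcal{D}_{B\to B'} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{A'\to A}) \tag{14.1.12}\\
&=: p_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}), \tag{14.1.13}
\end{align}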
where in the second line we have identified the entanglement fidelity of the channel
D𝐵→𝐵′ ◦ N 𝐴→𝐵 ◦ E 𝐴′ →𝐴 , as stated in Definition 6.21. In the last line, we have
defined the quantity 𝑝 err (E, D; N). As the notation suggests, this quantity is a
quantum analogue of the average error probability for classical and entanglement-
assisted classical communication. In classical and entanglement-assisted classical
communication, the average error probability corresponds to taking a uniform
distribution over the messages being sent. Similarly, in quantum communication,
the average error probability can be thought of as taking a uniform distribution
for the Schmidt coefficients in (14.1.1), which by definition gives a maximally
entangled state.
Another way of writing the average error probability for a quantum communica-
tion code is via what is known as the entanglement test, which we introduced in the
previous chapter. It is analogous to the comparator test that we defined in Chapters
11 and 12 in the context of classical communication. The entanglement test is
defined by the POVM {Φ 𝑅𝐵′ , 1 𝑅𝐵′ − Φ 𝑅𝐵′ }. The outcomes of the entanglement
test tell us whether the state being measured is the maximally entangled state Φ 𝑅𝐵′ .
Since the state Φ 𝑅𝐵′ is pure, using (6.2.2), the probability that the state 𝜔 𝑅𝐵′ at the
end of the protocol is in the maximally entangled state, i.e., the probability that the
state “passes the entanglement test,” is
Tr[Φ 𝑅𝐵′ 𝜔 𝑅𝐵′ ] = ⟨Φ| 𝑅𝐵′ 𝜔 𝑅𝐵′ |Φ⟩ 𝑅𝐵′ (14.1.14)
= 𝐹 (Φ 𝑅𝐵′ , 𝜔 𝑅𝐵′ ) (14.1.15)
= 1 − 𝑝 err (E, D; N). (14.1.16)
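The entanglement test in (14.1.14)–(14.1.16) can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes that the composite channel D ◦ N ◦ E is a qubit depolarizing channel with parameter p (trivial encoding and decoding), and the function names are hypothetical. Under that assumption, the probability of passing the test is 1 − p, so the average error probability equals p.

```python
import numpy as np

def maximally_entangled(d):
    """State vector |Phi> = d^{-1/2} sum_i |i,i> on C^d (x) C^d, as in (14.1.10)."""
    phi = np.zeros(d * d, dtype=complex)
    for i in range(d):
        phi[i * d + i] = 1.0
    return phi / np.sqrt(d)

def entanglement_test_pass_probability(kraus_ops):
    """Tr[Phi omega] with omega = (id_R (x) C)(Phi), where C has the given
    Kraus operators (square d x d matrices acting on the second system)."""
    d = kraus_ops[0].shape[0]
    phi = maximally_entangled(d)
    identity = np.eye(d)
    omega = np.zeros((d * d, d * d), dtype=complex)
    for K in kraus_ops:
        v = np.kron(identity, K) @ phi
        omega += np.outer(v, v.conj())
    return float(np.real(phi.conj() @ omega @ phi))

def depolarizing_kraus(p):
    """Kraus operators of rho -> (1-p) rho + (p/3)(X rho X + Y rho Y + Z rho Z).
    This channel is an illustrative assumption, not taken from the text."""
    I = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.diag([1.0, -1.0]).astype(complex)
    return [np.sqrt(1 - p) * I] + [np.sqrt(p / 3) * P for P in (X, Y, Z)]

p = 0.1
pass_prob = entanglement_test_pass_probability(depolarizing_kraus(p))
p_err = 1 - pass_prob  # average error probability, as in (14.1.16)
print(p_err)
```

Only the identity Kraus operator has nonzero trace here, which is why the entanglement fidelity reduces to 1 − p for this particular channel.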

As stated at the beginning of this chapter, the goal of quantum communication

is to determine the maximum number log2 𝑑 of qubits that can be transmitted over
a quantum channel N, in the sense that the 𝐴′ part of an arbitrary pure state Ψ𝑅 𝐴′ ,
with 𝑑 𝐴′ = 𝑑, can be transmitted over the channel with error at most 𝜀 ∈ (0, 1]. We
call this maximum number of transmitted qubits the one-shot quantum capacity of
N.

Definition 14.2 One-Shot Quantum Capacity of a Quantum Channel

Given a quantum channel N and 𝜀 ∈ (0, 1], the one-shot 𝜀-error quantum
capacity of N, denoted by 𝑄 𝜀 (N), is defined to be the maximum number log2 𝑑
of transmitted qubits among all (𝑑, 𝜀) quantum communication protocols over
N. In other words,

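\begin{equation}
Q^{\varepsilon}(\mathcal{N}) := \sup_{(d,\mathcal{E},\mathcal{D})} \{\log_2 d : p^{*}_{\mathrm{err}}(\mathcal{E},\mathcal{D};\mathcal{N}) \leq \varepsilon\}, \tag{14.1.17}
\end{equation}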

where the optimization is with respect to 𝑑 ∈ N, 𝑑 ≥ 1, encoding channels E

with input system dimension 𝑑, and decoding channels D with output system
dimension 𝑑.

In addition to finding, for a given 𝜀 ∈ (0, 1], the maximum number of transmitted
qubits among all (𝑑, 𝜀) quantum communication protocols over N 𝐴→𝐵 , we can
consider the following complementary problem: for a given dimension 𝑑 ≥ 1, find
the smallest possible error among all (𝑑, 𝜀) quantum communication protocols for
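$\mathcal{N}_{A\to B}$, which we denote by $\varepsilon^{*}_{Q}(d;\mathcal{N})$. In other words, the complementary problem
is to determine
\begin{equation}
\varepsilon^{*}_{Q}(d;\mathcal{N}) := \inf_{\mathcal{E},\mathcal{D}} \{p^{*}_{\mathrm{err}}(\mathcal{E},\mathcal{D};\mathcal{N}) : d_{A'} = d_{B'} = d\}, \tag{14.1.18}
\end{equation}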

where the optimization is with respect to all encoding channels E 𝐴′ →𝐴 and decoding
channels D𝐵→𝐵′ such that 𝑑 𝐴′ = 𝑑 𝐵′ = 𝑑. In this book, we focus primarily on the
problem of optimizing the number of transmitted qubits rather than the error, and
so our primary quantity of interest is the one-shot quantum capacity 𝑄 𝜀 (N).

14.1.1 Protocol for a Useless Channel

Consider an arbitrary (𝑑, 𝜀) quantum communication protocol for a channel N 𝐴→𝐵 ,

with 𝜀 ∈ (0, 1] and with encoding and decoding channels E and D, respectively.
This means that 𝑝 ∗err (E, D; N) ≤ 𝜀. By the arguments in (14.1.11)–(14.1.13), this
protocol realizes a (𝑑, 𝜀) entanglement transmission protocol, in the sense that
Tr[Φ 𝑅𝐵′ 𝜌 𝑅𝐵′ ] ≥ 1 − 𝜀, (14.1.19)
where Φ 𝑅 𝐴′ is the maximally entangled state defined in (14.1.10) and
\begin{equation}
\rho_{RB'} := (\mathcal{D}_{B\to B'}\circ\mathcal{N}_{A\to B}\circ\mathcal{E}_{A'\to A})(\Phi_{RA'}). \tag{14.1.20}
\end{equation}

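[Figure 14.2 (diagram not reproduced): Alice applies the encoding E to the 𝐴′ share of Ψ𝑅 𝐴′ ; the channel output is replaced by 𝜎𝐵 ; Bob applies the decoding D, producing 𝜔 𝑅𝐵′ .]

Figure 14.2: Depiction of a protocol that is useless for entanglement transmission.
The encoded half of Alice’s share of the pure state Ψ𝑅 𝐴′ is discarded and
replaced by an arbitrary (but fixed) state 𝜎𝐵 .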

Consider now the same protocol but over the useless channel depicted in Figure
14.2. This useless channel is exactly the same as the one considered in Chapters 11
and 12; namely, it is the replacement channel for some state 𝜎𝐵 . For the initial state
Φ 𝑅 𝐴′ , the state at the end of the protocol for the replacement channel is

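\begin{equation}
\tau_{RB'} = (\mathcal{D}_{B\to B'}\circ\mathcal{R}^{\sigma_B}_{A\to B}\circ\mathcal{E}_{A'\to A})(\Phi_{RA'}) = \pi_R\otimes\mathcal{D}_{B\to B'}(\sigma_B). \tag{14.1.21}
\end{equation}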
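The product form in (14.1.21) can be verified numerically. The sketch below is illustrative only and makes several assumptions not found in the text: the dimension is 3, the encoding is a randomly drawn unitary channel, the decoding is the identity channel, and 𝜎𝐵 is a fixed diagonal state. All function names are hypothetical. It also evaluates the probability of passing the entanglement test, which for this product state equals 1/𝑑².

```python
import numpy as np

def partial_trace_second(rho, d_first, d_second):
    """Trace out the second tensor factor of a bipartite density matrix."""
    r = rho.reshape(d_first, d_second, d_first, d_second)
    return np.einsum("ijkj->ik", r)

def apply_replacement_on_second(rho, sigma, d_first):
    """Apply id (x) R^sigma: discard the second system and prepare sigma instead."""
    d_second = rho.shape[0] // d_first
    return np.kron(partial_trace_second(rho, d_first, d_second), sigma)

d = 3
rng = np.random.default_rng(7)

# Maximally entangled state Phi_{RA'} from (14.1.10), as a density matrix.
phi = np.eye(d).reshape(d * d) / np.sqrt(d)
Phi = np.outer(phi, phi.conj())

# Assumed encoding: a random unitary channel acting on A'.
g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
U, _ = np.linalg.qr(g)
enc = np.kron(np.eye(d), U)
encoded = enc @ Phi @ enc.conj().T

# Assumed fixed state sigma_B; the decoder is taken to be the identity channel.
sigma = np.diag([0.5, 0.3, 0.2]).astype(complex)
tau = apply_replacement_on_second(encoded, sigma, d)

pi_R = np.eye(d) / d
is_product = np.allclose(tau, np.kron(pi_R, sigma))
overlap = float(np.real(phi.conj() @ tau @ phi))  # Tr[Phi tau]
print(is_product, overlap)
```

The reduced state on 𝑅 is maximally mixed regardless of the encoding, which is exactly the property Tr 𝐵′ [𝜏𝑅𝐵′ ] = 𝜋 𝑅 needed to invoke Lemma 13.3.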

As in classical communication and entanglement-assisted classical communication,

we now use the hypothesis testing relative entropy to compare the state 𝜌 𝑅𝐵′
obtained at the end of the quantum communication protocol over the channel N
with the state 𝜏𝑅𝐵′ obtained at the end of the quantum communication protocol
for the replacement channel $\mathcal{R}^{\sigma_B}_{A\to B}$. In particular, we make use of Lemma 13.3,
which applies because the state $\tau_{RB'}$ satisfies $\operatorname{Tr}_{B'}[\tau_{RB'}] = \pi_R$ and because of (14.1.19).
Therefore, using (13.1.9) in Lemma 13.3, we conclude that
\begin{equation}
\log_2 d \leq \frac{1}{2} I_H^{\varepsilon}(R;B')_{\rho}. \tag{14.1.22}
\end{equation}
Another bound from Lemma 13.3, namely, the one in (13.1.8), is a more general
upper bound that requires only the assumption in (14.1.19) and does not have the
interpretation of being a comparison between a quantum communication protocol
for $\mathcal{N}$ and a quantum communication protocol for $\mathcal{R}^{\sigma_B}_{A\to B}$. Applying this bound
gives
\begin{equation}
\log_2 d \leq I_H^{\varepsilon}(R\rangle B')_{\rho}. \tag{14.1.23}
\end{equation}

This latter bound is the one that we employ in this chapter because it leads to a
formula for the quantum capacity of some channels of interest in applications. This
inequality tells us that, given an arbitrary (𝑑, 𝜀) quantum communication protocol
with corresponding code (E, D), the 𝜀-hypothesis testing coherent information
𝐼 𝐻𝜀 (𝑅⟩𝐵′) 𝜌 , with 𝜌 𝑅𝐵′ given by (14.1.20), is an upper bound on the maximum
number of qubits that can be transmitted over the channel with error at most 𝜀.
Note that a different choice for the encoding and decoding generally produces a
different value for the upper bound. We would like an upper bound that applies
regardless of the specific protocol. In other words, we would like an upper bound
that is a function of the channel N 𝐴→𝐵 only.

14.1.2 Upper Bound on the Number of Transmitted Qubits

We now establish a general upper bound on the number of transmitted qubits in an

arbitrary quantum communication protocol. This bound holds independently of
the encoding and decoding channels used in the protocol and depends only on the
given communication channel N 𝐴→𝐵 and the error 𝜀.

Theorem 14.3 Upper Bound on One-Shot Quantum Capacity

Let N 𝐴→𝐵 be a quantum channel. For a (𝑑, 𝜀) quantum communication protocol
for N 𝐴→𝐵 , with 𝜀 ∈ (0, 1], the number of qubits transmitted over N is bounded
from above by the 𝜀-hypothesis testing coherent information of N defined in
(7.11.96), i.e.,
log2 𝑑 ≤ 𝐼 𝐻𝑐,𝜀 (N). (14.1.24)
Consequently, for the one-shot quantum capacity of N,

𝑄 𝜀 (N) ≤ 𝐼 𝐻𝑐,𝜀 (N). (14.1.25)

Proof: Let E and D be the encoding and decoding channels, respectively, for a
(𝑑, 𝜀) quantum communication protocol for N. Then, by (14.1.23), we have that

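\begin{equation}
\log_2 d \leq I_H^{\varepsilon}(R\rangle B')_{\rho} = \inf_{\sigma_{B'}} D_H^{\varepsilon}(\rho_{RB'} \Vert \mathbb{1}_R\otimes\sigma_{B'}), \tag{14.1.26}
\end{equation}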

where 𝜌 𝑅𝐵′ = (D𝐵→𝐵′ ◦ N 𝐴→𝐵 ◦ E 𝐴′ →𝐴 )(Φ 𝑅 𝐴′ ) is the state defined in (14.1.20).

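By restricting the optimization in the definition of $I_H^{\varepsilon}(R\rangle B')_{\rho}$ over every state $\sigma_{B'}$
to the set $\{\mathcal{D}_{B\to B'}(\tau_B) : \tau_B \in \mathrm{D}(\mathcal{H}_B)\}$, we obtain
\begin{align}
I_H^{\varepsilon}(R\rangle B')_{\rho}
&= \inf_{\sigma_{B'}} D_H^{\varepsilon}(\rho_{RB'} \Vert \mathbb{1}_R\otimes\sigma_{B'}) \tag{14.1.27}\\
&\leq \inf_{\tau_B} D_H^{\varepsilon}((\mathcal{D}_{B\to B'}\circ\mathcal{N}_{A\to B}\circ\mathcal{E}_{A'\to A})(\Phi_{RA'}) \Vert \mathbb{1}_R\otimes\mathcal{D}_{B\to B'}(\tau_B)) \tag{14.1.28}\\
&\leq \inf_{\tau_B} D_H^{\varepsilon}(\mathcal{N}_{A\to B}(\rho_{RA}) \Vert \mathbb{1}_R\otimes\tau_B), \tag{14.1.29}
\end{align}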

where the second inequality follows from the data-processing inequality for hypothesis
testing relative entropy and we let $\rho_{RA} := \mathcal{E}_{A'\to A}(\Phi_{RA'})$. We now take
the supremum over every state 𝜌 𝑅 𝐴 , which effectively corresponds to taking the
supremum over all encoding channels, and since it suffices to consider only pure
states when optimizing the coherent information (see the arguments after Definition
7.85), we conclude that

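\begin{align}
I_H^{\varepsilon}(R\rangle B')_{\rho}
&\leq \inf_{\tau_B} D_H^{\varepsilon}(\mathcal{N}_{A\to B}(\rho_{RA}) \Vert \mathbb{1}_R\otimes\tau_B) \tag{14.1.30}\\
&\leq \sup_{\psi_{RA}} \inf_{\tau_B} D_H^{\varepsilon}(\mathcal{N}_{A\to B}(\psi_{RA}) \Vert \mathbb{1}_R\otimes\tau_B) \tag{14.1.31}\\
&= I_H^{c,\varepsilon}(\mathcal{N}), \tag{14.1.32}
\end{align}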

as required. ■

As an immediate consequence of Theorem 14.3 and Propositions 7.70 and 7.71,

we obtain the following:

Corollary 14.4
Let N 𝐴→𝐵 be a quantum channel, and let 𝜀 ∈ [0, 1). For all (𝑑, 𝜀) quantum
communication protocols for N, the following bounds hold:

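\begin{align}
(1-2\varepsilon)\log_2 d &\leq I^{c}(\mathcal{N}) + h_2(\varepsilon), \tag{14.1.33}\\
\log_2 d &\leq \widetilde{I}^{c}_{\alpha}(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\,\alpha>1, \tag{14.1.34}
\end{align}
where $I^{c}(\mathcal{N})$ is the coherent information of $\mathcal{N}$, as defined in (7.11.107), and
$\widetilde{I}^{c}_{\alpha}(\mathcal{N})$ is the sandwiched Rényi coherent information of $\mathcal{N}$, as defined in
(7.11.100).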


The proof of (14.1.33) is analogous to the proof of (13.1.44). The proof of

(14.1.34) follows by combining Theorem 14.3 with Proposition 7.71.
Since the bounds in (14.1.33) and (14.1.34) hold for an arbitrary (𝑑, 𝜀) quantum
communication protocol for N, we have that

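\begin{align}
(1-2\varepsilon)\,Q^{\varepsilon}(\mathcal{N}) &\leq I^{c}(\mathcal{N}) + h_2(\varepsilon), \tag{14.1.35}\\
Q^{\varepsilon}(\mathcal{N}) &\leq \widetilde{I}^{c}_{\alpha}(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\,\alpha>1, \tag{14.1.36}
\end{align}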

for all 𝜀 ∈ [0, 1).

Let us summarize the steps that we took to arrive at the bounds in (14.1.33) and
(14.1.34):
1. We first compared a quantum communication protocol for N with the same
protocol for a useless channel by using the hypothesis testing relative entropy.
This led us to Lemma 13.3, and the resulting upper bound in (14.1.23).
2. We then used the data-processing inequality for the hypothesis testing relative
entropy to remove the decoding channel from the bound in (14.1.23). This is
done in (14.1.29) in the proof of Theorem 14.3.
3. Finally, we optimized over all encoding channels in (14.1.30)–(14.1.32) to
obtain Theorem 14.3, in which the bound is a function solely of the channel
and the error probability. Using Propositions 7.70 and 7.71, which relate
hypothesis testing relative entropy to quantum relative entropy and sandwiched
Rényi relative entropy, we arrived at Corollary 14.4.

14.1.3 Lower Bound on the Number of Transmitted Qubits via Entanglement Distillation

Having derived upper bounds on the number of transmitted qubits for an arbitrary
quantum communication protocol, let us now determine a lower bound on the
number of transmitted qubits. As with the other communication scenarios that we
have considered so far, in order to obtain a lower bound on the number of qubits that
can be transmitted, we need to devise an explicit (𝑑, 𝜀) quantum communication
protocol for all 𝜀 ∈ (0, 1). The protocol we consider is based on the one-shot,
one-way entanglement distillation protocol from Proposition 13.10 in Chapter 13,
which establishes that, for an arbitrary bipartite state $\rho_{AB}$ and for all $\varepsilon\in(0,1]$ and
$\eta\in[0,\sqrt{\varepsilon})$, there exists a $(d,\varepsilon)$ one-way entanglement distillation protocol for
$\rho_{AB}$ such that
\begin{equation}
\log_2 d = -H_{\max}^{\sqrt{\varepsilon}-\eta}(A|B)_{\rho} + 4\log_2\eta. \tag{14.1.37}
\end{equation}
The goal in this section is to show that entanglement distillation can be used
to develop a quantum communication strategy. Specifically, we show that the
existence of a (𝑑, 𝜀) one-way entanglement distillation protocol for the bipartite state
𝜔 𝐴𝐵 = N 𝐴′ →𝐵 (𝜓 𝐴𝐴′ ) implies the existence of a (𝑑 ′, 𝜀′) quantum communication
protocol, with 𝑑 ′ and 𝜀′ being functions of 𝑑 and 𝜀. The claim is as follows:

Theorem 14.5 Lower Bound on One-Shot Quantum Capacity

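Let $\mathcal{N}_{A\to B}$ be a quantum channel. For all $\varepsilon\in(0,1)$, $\eta\in[0,\varepsilon\sqrt{\delta}/4)$, and
$\delta\in(0,1)$, there exists a $(d,\varepsilon)$ quantum communication protocol for $\mathcal{N}_{A\to B}$
such that
\begin{equation}
\log_2 d = \sup_{\psi_{AA'}} -H_{\max}^{\frac{\varepsilon\sqrt{\delta}}{4}-\eta}(A|B)_{\omega} + \log_2(\eta^4(1-\delta)), \tag{14.1.38}
\end{equation}
where $\omega_{AB} = \mathcal{N}_{A'\to B}(\psi_{AA'})$. Consequently,
\begin{equation}
Q^{\varepsilon}(\mathcal{N}) \geq \sup_{\psi_{AA'}} -H_{\max}^{\frac{\varepsilon\sqrt{\delta}}{4}-\eta}(A|B)_{\omega} + \log_2(\eta^4(1-\delta)) \tag{14.1.39}
\end{equation}
for all $\eta\in[0,\varepsilon\sqrt{\delta}/4)$ and $\delta\in(0,1)$, where $\omega_{AB} = \mathcal{N}_{A'\to B}(\psi_{AA'})$.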

The first step in the proof of Theorem 14.5 is to observe that one-way entan-
glement distillation is an example of entanglement generation, albeit with forward
(i.e., sender to receiver) classical communication, which we introduced at the
beginning of this chapter and formally define below. We then show that forward
classical communication does not help for entanglement generation, even in the
non-asymptotic setting. One-way entanglement distillation thus implies entangle-
ment generation. We then show that entanglement generation implies entanglement
transmission, which we defined at the beginning of this chapter. Finally, we show
that entanglement transmission implies quantum communication.
Before proceeding with the proof of Theorem 14.5, let us formally define entanglement
generation (with and without one-way LOCC assistance) and entanglement
transmission.
• Entanglement generation: An entanglement generation protocol for N 𝐴→𝐵 is
defined by the three elements (𝑑, Ψ𝐴′ 𝐴 , D𝐵→𝐵′ ), where Ψ𝐴′ 𝐴 is a pure state
with 𝑑 𝐴′ = 𝑑, and D𝐵→𝐵′ is a decoding channel with 𝑑 𝐵′ = 𝑑. The goal of the
protocol is to transmit the system 𝐴 such that the final state

\begin{equation}
\sigma_{A'B'} := (\mathcal{D}_{B\to B'}\circ\mathcal{N}_{A\to B})(\Psi_{A'A}) \tag{14.1.40}
\end{equation}

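is close in fidelity to a maximally entangled state of Schmidt rank 𝑑. The
entanglement generation error of the protocol is given by
\begin{align}
p^{(\mathrm{EG})}_{\mathrm{err}}(\Psi_{A'A},\mathcal{D};\mathcal{N}) &:= 1 - \langle\Phi|_{A'B'}\,\sigma_{A'B'}\,|\Phi\rangle_{A'B'} \tag{14.1.41}\\
&= 1 - F(\Phi_{A'B'},\sigma_{A'B'}). \tag{14.1.42}
\end{align}
We call the protocol $(d,\Psi_{A'A},\mathcal{D}_{B\to B'})$ a $(d,\varepsilon)$ protocol, with $\varepsilon\in[0,1]$, if
$p^{(\mathrm{EG})}_{\mathrm{err}}(\Psi_{A'A},\mathcal{D};\mathcal{N}) \leq \varepsilon$.
Note that an entanglement generation protocol $(d,\Psi_{A'A},\mathcal{D}_{B\to B'})$ over $\mathcal{N}_{A\to B}$
is an example of an entanglement distillation protocol $(d,\mathcal{L}_{AB\to\hat{A}\hat{B}})$ for the
state $\rho_{A'B} = \mathcal{N}_{A\to B}(\Psi_{A'A})$, with $\hat{A}\equiv A'$, $\hat{B}\equiv B'$, and $\mathcal{L}_{AB\to\hat{A}\hat{B}} \equiv \mathcal{D}_{B\to B'}$.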
• Entanglement generation assisted by one-way LOCC: An entanglement gener-
ation protocol for N 𝐴→𝐵 assisted by one-way LOCC from 𝐴 to 𝐵 is defined
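by $(d,\Psi_{A'A},\{\mathcal{E}^{x}_{A'A\to A'A}\}_x,\{\mathcal{D}^{x}_{B\to B'}\}_x)$, where $d\geq 1$, $\Psi_{A'A}$ is a pure state with
$d_{A'} = d$, $\{\mathcal{E}^{x}_{A'A\to A'A}\}_{x\in\mathcal{X}}$ is a set of completely positive maps indexed by a finite
alphabet $\mathcal{X}$ such that $\sum_{x\in\mathcal{X}}\mathcal{E}^{x}_{A'A\to A'A}$ is trace preserving, and $\{\mathcal{D}^{x}_{B\to B'}\}_{x\in\mathcal{X}}$ is a
set of quantum channels indexed by $\mathcal{X}$, with $d_{B'} = d$. The goal of the protocol
is to transmit the system 𝐴 such that the final state
\begin{equation}
\sigma^{\to}_{A'B'} := \sum_{x\in\mathcal{X}} (\mathcal{D}^{x}_{B\to B'}\circ\mathcal{N}_{A\to B}\circ\mathcal{E}^{x}_{A'A\to A'A})(\Psi_{A'A}) \tag{14.1.43}
\end{equation}
is close in fidelity to a maximally entangled state of Schmidt rank 𝑑. The error
of the protocol is given by
\begin{equation}
p^{(\mathrm{EG}),\to}_{\mathrm{err}}(\Psi_{A'A},\{\mathcal{E}^{x}\}_x,\{\mathcal{D}^{x}\}_x;\mathcal{N}) = 1 - F(\Phi_{A'B'},\sigma^{\to}_{A'B'}). \tag{14.1.44}
\end{equation}
We call the protocol $(d,\Psi_{A'A},\{\mathcal{E}^{x}_{A'A\to A'A}\}_x,\{\mathcal{D}^{x}_{B\to B'}\}_x)$ a $(d,\varepsilon)$ protocol, with
$\varepsilon\in[0,1]$, if $p^{(\mathrm{EG}),\to}_{\mathrm{err}}(\Psi_{A'A},\{\mathcal{E}^{x}\}_x,\{\mathcal{D}^{x}\}_x;\mathcal{N}) \leq \varepsilon$.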


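• Entanglement transmission: An entanglement transmission protocol for $\mathcal{N}_{A\to B}$
consists of the three elements $(d,\mathcal{E},\mathcal{D})$, where $d\geq 1$, $\mathcal{E}_{A'\to A}$ is an encoding
channel with $d_{A'} = d$, and $\mathcal{D}_{B\to B'}$ is a decoding channel with $d_{B'} = d$. The
goal of the protocol is to transmit the 𝐴′ system of a maximally entangled state
$\Phi_{RA'}$ of Schmidt rank 𝑑 such that the final state
\begin{equation}
\omega_{RB'} := (\mathcal{D}_{B\to B'}\circ\mathcal{N}_{A\to B}\circ\mathcal{E}_{A'\to A})(\Phi_{RA'}) \tag{14.1.45}
\end{equation}
is close to the initial maximally entangled state. The entanglement transmission
error of the protocol is
\begin{align}
p^{(\mathrm{ET})}_{\mathrm{err}}(\mathcal{E},\mathcal{D};\mathcal{N}) &:= 1 - \langle\Phi|_{RB'}\,\omega_{RB'}\,|\Phi\rangle_{RB'} \tag{14.1.46}\\
&= 1 - F_e(\mathcal{D}\circ\mathcal{N}\circ\mathcal{E}), \tag{14.1.47}
\end{align}
where we recall the entanglement fidelity of a channel from Definition 6.21. We
call the protocol $(d,\mathcal{E},\mathcal{D})$ a $(d,\varepsilon)$ protocol, with $\varepsilon\in[0,1]$, if
$p^{(\mathrm{ET})}_{\mathrm{err}}(\mathcal{E},\mathcal{D};\mathcal{N}) \leq \varepsilon$.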
Observe that the error criterion for entanglement transmission is the same as the
average error criterion for quantum communication (see (14.1.11)–(14.1.13)).
This means that the existence of an arbitrary (𝑑, 𝜀) quantum communication
protocol implies the existence of a (𝑑, 𝜀) entanglement transmission protocol.
Also observe that the existence of an arbitrary (𝑑, 𝜀) entanglement transmission
protocol implies the existence of a (𝑑, 𝜀) entanglement generation protocol
with respect to the state Ψ𝑅 𝐴 ≡ E 𝐴′ →𝐴 (Φ 𝑅 𝐴′ ) (with the systems 𝑅, 𝐴, and 𝐴′
belonging to Alice).

14.1.3.1 Proof of Theorem 14.5

We start by showing that an arbitrary entanglement distillation protocol for the state
N 𝐴→𝐵 (𝜓 𝐴′ 𝐴 ), with 𝜓 𝐴′ 𝐴 a pure state, has the same performance parameters as an
entanglement generation protocol with one-way LOCC assistance.
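Consider an arbitrary $(d,\varepsilon)$ entanglement distillation protocol for $\mathcal{N}_{A\to B}(\psi_{A'A})$
given by a one-way LOCC channel $\mathcal{L}_{A'B\to\hat{A}\hat{B}}$, with $d_{\hat{A}} = d_{\hat{B}} = d$. In general,
this LOCC channel has the form $\mathcal{L}_{A'B\to\hat{A}\hat{B}} = \sum_{x\in\mathcal{X}} \mathcal{E}^{x}_{A'\to\hat{A}}\otimes\mathcal{D}^{x}_{B\to\hat{B}}$, where $\mathcal{X}$ is
some finite alphabet, $\{\mathcal{E}^{x}_{A'\to\hat{A}}\}_{x\in\mathcal{X}}$ is a set of completely positive maps such that
$\sum_{x\in\mathcal{X}}\mathcal{E}^{x}_{A'\to\hat{A}}$ is trace preserving, and $\{\mathcal{D}^{x}_{B\to\hat{B}}\}_{x\in\mathcal{X}}$ is a set of channels. The output
state of the entanglement distillation protocol is
\begin{equation}
\sum_{x\in\mathcal{X}} (\mathcal{E}^{x}_{A'\to\hat{A}}\otimes\mathcal{D}^{x}_{B\to\hat{B}})(\mathcal{N}_{A\to B}(\psi_{A'A}))
= \sum_{x\in\mathcal{X}} (\mathcal{D}^{x}_{B\to\hat{B}}\circ\mathcal{N}_{A\to B}\circ\mathcal{E}^{x}_{A'\to\hat{A}})(\psi_{A'A}), \tag{14.1.48}
\end{equation}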

which has the form of a state at the output of an entanglement generation protocol
with one-way LOCC assistance. We thus have that a (𝑑, 𝜀) entanglement distillation
protocol for N 𝐴→𝐵 (𝜓 𝐴′ 𝐴 ) is equivalent to a (𝑑, 𝜀) entanglement generation protocol
for N with one-way LOCC assistance. We now show that one-way LOCC assistance
does not help for entanglement generation.

Lemma 14.6
Given a (𝑑, 𝜀) entanglement generation protocol for a channel N, assisted by
one-way LOCC, with 𝑑 ≥ 1 and 𝜀 ∈ [0, 1], there exists a (𝑑, 𝜀) entanglement
generation protocol for N (without one-way LOCC assistance).

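Proof: Consider an arbitrary $(d,\varepsilon)$ entanglement generation protocol assisted by
one-way LOCC. The output state of such a protocol has the form in (14.1.43), i.e.,
\begin{equation}
\sigma^{\to}_{A'B'} = \sum_{x\in\mathcal{X}} (\mathcal{D}^{x}_{B\to B'}\circ\mathcal{N}_{A\to B}\circ\mathcal{E}^{x}_{A'A\to A'A})(\Psi_{A'A}), \tag{14.1.49}
\end{equation}
and by definition we have
\begin{equation}
F(\Phi_{A'B'},\sigma^{\to}_{A'B'}) = \operatorname{Tr}[\Phi_{A'B'}\,\sigma^{\to}_{A'B'}] \geq 1-\varepsilon. \tag{14.1.50}
\end{equation}
Now, let
\begin{align}
p(x) &:= \operatorname{Tr}[\mathcal{E}^{x}_{A'A\to A'A}(\Psi_{A'A})], \tag{14.1.51}\\
\rho^{x}_{A'A} &:= \frac{1}{p(x)}\,\mathcal{E}^{x}_{A'A\to A'A}(\Psi_{A'A}). \tag{14.1.52}
\end{align}
Using this, we can write $\sigma^{\to}_{A'B'}$ as
\begin{equation}
\sigma^{\to}_{A'B'} = \sum_{x\in\mathcal{X}} p(x)\,(\mathcal{D}^{x}_{B\to B'}\circ\mathcal{N}_{A\to B})(\rho^{x}_{A'A}). \tag{14.1.53}
\end{equation}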

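For every $x\in\mathcal{X}$, let $\rho^{x}_{A'A}$ have the following spectral decomposition:
\begin{equation}
\rho^{x}_{A'A} = \sum_{k=1}^{r_x} q(k|x)\,\phi^{x,k}_{A'A}, \tag{14.1.54}
\end{equation}
where $r_x = \operatorname{rank}(\rho^{x}_{A'A})$. We thus have that
\begin{equation}
\sigma^{\to}_{A'B'} = \sum_{x\in\mathcal{X}}\sum_{k=1}^{r_x} p(x)\,q(k|x)\,(\mathcal{D}^{x}_{B\to B'}\circ\mathcal{N}_{A\to B})(\phi^{x,k}_{A'A}). \tag{14.1.55}
\end{equation}
Then, letting $\sigma^{x,k}_{A'B'} := (\mathcal{D}^{x}_{B\to B'}\circ\mathcal{N}_{A\to B})(\phi^{x,k}_{A'A})$, we conclude that
\begin{align}
\operatorname{Tr}[\Phi_{A'B'}\,\sigma^{\to}_{A'B'}]
&= \sum_{x\in\mathcal{X}}\sum_{k=1}^{r_x} p(x)\,q(k|x)\,\operatorname{Tr}[\Phi_{A'B'}\,\sigma^{x,k}_{A'B'}] \tag{14.1.56}\\
&\leq \max_{x\in\mathcal{X},\,1\leq k\leq r_x} \operatorname{Tr}[\Phi_{A'B'}\,\sigma^{x,k}_{A'B'}] \tag{14.1.57}\\
&= \max_{x\in\mathcal{X},\,1\leq k\leq r_x} \operatorname{Tr}[\Phi_{A'B'}\,(\mathcal{D}^{x}_{B\to B'}\circ\mathcal{N}_{A\to B})(\phi^{x,k}_{A'A})]. \tag{14.1.58}
\end{align}
In other words, there exists a pair $(\phi^{x,k}_{A'A},\mathcal{D}^{x}_{B\to B'})$ (namely, the one that achieves the
maximum on the right-hand side of the inequality above) such that
\begin{equation}
\operatorname{Tr}[\Phi_{A'B'}\,\sigma^{x,k}_{A'B'}] \geq \operatorname{Tr}[\Phi_{A'B'}\,\sigma^{\to}_{A'B'}] \geq 1-\varepsilon. \tag{14.1.59}
\end{equation}
Therefore, the triple $(d,\phi^{x,k}_{A'A},\mathcal{D}^{x}_{B\to B'})$ constitutes a $(d,\varepsilon)$ entanglement generation
protocol (without one-way LOCC assistance). ■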

As we have seen in Chapter 13, forward classical communication certainly helps

in general for entanglement distillation. However, it does not help for entanglement
generation because the resource for entanglement generation is a quantum channel,
whereas for entanglement distillation the resource is a bipartite quantum state.
Having a quantum channel as the resource is more powerful than having a bipartite
quantum state because, when using a quantum channel, there is an extra degree of
freedom in the input state to the channel. The proof of Lemma 14.6 demonstrates
that the forward classical communication in an arbitrary (𝑑, 𝜀) entanglement
generation protocol assisted by one-way LOCC is not needed, and the proof
essentially relies on convexity of the entanglement fidelity performance criterion.
We now show that entanglement generation implies entanglement transmission,
up to a transformation of the performance parameters.


Lemma 14.7 Entanglement Generation to Entanglement Transmission

Given a (𝑑, 𝜀) entanglement generation protocol for a channel N 𝐴→𝐵 , with
𝑑 ≥ 1 and 𝜀 ∈ [0, 1], there exists a (𝑑, 4𝜀) entanglement transmission protocol
for N.

Proof: Let (𝑑, Ψ𝐴′ 𝐴 , D𝐵→𝐵′ ) be the elements of a (𝑑, 𝜀) entanglement generation
protocol for N 𝐴→𝐵 , with 𝑑 𝐴′ = 𝑑 𝐵′ = 𝑑. This implies that the output state

𝜎𝐴′ 𝐵′ = (D𝐵→𝐵′ ◦ N 𝐴→𝐵 )(Ψ𝐴′ 𝐴 ) (14.1.60)

satisfies
𝐹 (Φ 𝐴′ 𝐵′ , 𝜎𝐴′ 𝐵′ ) ≥ 1 − 𝜀. (14.1.61)
We now construct an entanglement transmission protocol. To this end, let 𝐴′ ≡ 𝑅
be a reference system inaccessible to both Alice and Bob. By the data-processing
inequality for fidelity (Theorem 6.9) with respect to the partial trace channel Tr 𝐵′ ,
we have that

𝐹 (Φ 𝑅 , Ψ𝑅 ) = 𝐹 (Tr 𝐵′ [Φ 𝑅𝐵′ ], Tr 𝐵′ [𝜎𝑅𝐵′ ]) ≥ 𝐹 (Φ 𝑅𝐵′ , 𝜎𝑅𝐵′ ) ≥ 1 − 𝜀. (14.1.62)

Next, by Uhlmann’s theorem (Theorem 6.8), there exists an isometric channel
$\mathcal{U}_{A'\to A}$ such that
\begin{equation}
F(\Phi_R,\Psi_R) = F(\mathcal{U}_{A'\to A}(\Phi_{RA'}),\Psi_{RA}) \geq 1-\varepsilon. \tag{14.1.63}
\end{equation}

We let this isometric channel U 𝐴′ →𝐴 be the encoding channel for the entanglement
transmission protocol, and we let

𝜔 𝑅𝐵′ = (D𝐵→𝐵′ ◦ N 𝐴→𝐵 ◦ U 𝐴′ →𝐴 )(Φ 𝑅 𝐴′ ). (14.1.64)

Next, using the sine distance (Definition 6.16), by definition of the $(d,\varepsilon)$ entanglement
generation protocol, we have that $P(\Phi_{RB'},\sigma_{RB'}) \leq \sqrt{\varepsilon}$. Similarly, from
(14.1.63) we have that
\begin{equation}
P(\Psi_{RA},\mathcal{U}_{A'\to A}(\Phi_{RA'})) \leq \sqrt{\varepsilon}. \tag{14.1.65}
\end{equation}
Therefore, by the triangle inequality for the sine distance (Lemma 6.17), we
conclude that
\begin{align}
P(\Phi_{RB'},\omega_{RB'}) &\leq P(\Phi_{RB'},\sigma_{RB'}) + P(\sigma_{RB'},\omega_{RB'}) \tag{14.1.66}\\
&\leq \sqrt{\varepsilon} + \sqrt{\varepsilon} \tag{14.1.67}\\
&\leq 2\sqrt{\varepsilon}, \tag{14.1.68}
\end{align}

where the second inequality follows from the data-processing inequality for sine
distance (see (6.2.114)) and (14.1.65) to see that

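\begin{align}
&P(\sigma_{RB'},\omega_{RB'}) \tag{14.1.69}\\
&\quad = P((\mathcal{D}_{B\to B'}\circ\mathcal{N}_{A\to B})(\Psi_{RA}),\,(\mathcal{D}_{B\to B'}\circ\mathcal{N}_{A\to B}\circ\mathcal{U}_{A'\to A})(\Phi_{RA'})) \tag{14.1.70}\\
&\quad \leq P(\Psi_{RA},\mathcal{U}_{A'\to A}(\Phi_{RA'})) \tag{14.1.71}\\
&\quad \leq \sqrt{\varepsilon}. \tag{14.1.72}
\end{align}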

Therefore, by definition of the sine distance, we conclude that

1 − 𝐹 (Φ 𝑅𝐵′ , 𝜔 𝑅𝐵′ ) ≤ 4𝜀, (14.1.73)

so that (𝑑, U 𝐴′ →𝐴 , D𝐵→𝐵′ ) constitutes a (𝑑, 4𝜀) entanglement transmission protocol,

as required. ■

Finally, we show that entanglement transmission implies quantum communica-

tion, up to a transformation of the performance parameters. We could alternatively
call this statement “quantum expurgation,” because the arguments in the proof are
analogous to the expurgation arguments applied in the proof of the lower bound for
one-shot classical communication in Proposition 12.5.

Lemma 14.8 Entanglement Transmission to Quantum Communication

Given a $(d,\varepsilon)$ entanglement transmission protocol for a channel $\mathcal{N}_{A\to B}$, with
$d\geq 1$ and $\varepsilon\in[0,1]$, for all $\delta\in(0,1)$, there exists a $(\lfloor(1-\delta)d\rfloor,\,2\sqrt{\varepsilon/\delta})$
quantum communication protocol for $\mathcal{N}$.

Proof: Suppose that a (𝑑, 𝜀) entanglement transmission code for N 𝐴→𝐵 exists,
and let E 𝐴′ →𝐴 and D𝐵→𝐵′ be the corresponding encoding and decoding channels,
respectively, with 𝑑 𝐴′ = 𝑑 𝐵′ = 𝑑. The condition 𝑝 err (E, D; N) ≤ 𝜀 then holds,
namely,
1 − Tr[Φ 𝑅𝐵′ 𝜔 𝑅𝐵′ ] ≤ 𝜀, (14.1.74)
where
𝜔 𝑅𝐵′ = (D𝐵→𝐵′ ◦ N 𝐴→𝐵 ◦ E 𝐴′ →𝐴 )(Φ 𝑅 𝐴′ ). (14.1.75)

Let
\begin{equation}
\mathcal{C}_{A'\to B'} := \mathcal{D}_{B\to B'}\circ\mathcal{N}_{A\to B}\circ\mathcal{E}_{A'\to A}. \tag{14.1.76}
\end{equation}
We proceed with the following algorithm:
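1. Set $k = d$ and $\mathcal{H}_d = \mathcal{H}_{A'}$. Suppose for now that $(1-\delta)d$ is a positive integer.
2. Set $|\phi_k\rangle\in\mathcal{H}_k$ to be a state vector that achieves the minimum fidelity of $\mathcal{C}_{A'\to B'}$:
\begin{equation}
|\phi_k\rangle := \arg\min_{|\phi\rangle\in\mathcal{H}_k} \langle\phi|\mathcal{C}_{A'\to B'}(|\phi\rangle\langle\phi|)|\phi\rangle, \tag{14.1.77}
\end{equation}
and set the fidelity $F_k$ of $|\phi_k\rangle$ as follows:
\begin{align}
F_k &:= \min_{|\phi\rangle\in\mathcal{H}_k} \langle\phi|\mathcal{C}_{A'\to B'}(|\phi\rangle\langle\phi|)|\phi\rangle \tag{14.1.78}\\
&= \langle\phi_k|\mathcal{C}_{A'\to B'}(|\phi_k\rangle\langle\phi_k|)|\phi_k\rangle. \tag{14.1.79}
\end{align}
3. Set
\begin{equation}
\mathcal{H}_{k-1} := \operatorname{span}\{|\psi\rangle\in\mathcal{H}_k : |\langle\psi|\phi_k\rangle| = 0\}. \tag{14.1.80}
\end{equation}
That is, $\mathcal{H}_{k-1}$ is set to the orthogonal complement of $|\phi_k\rangle$ in $\mathcal{H}_k$, so that
$\mathcal{H}_k = \mathcal{H}_{k-1}\oplus\operatorname{span}\{|\phi_k\rangle\}$. Set $k \to k-1$.
4. Repeat steps 2-3 until $k = (1-\delta)d$ after step 3.
The idea behind this algorithm is to successively remove minimum-fidelity
states from $\mathcal{H}_{A'}$ until $k = (1-\delta)d$. By the structure of the algorithm and some
analysis given below, we are then guaranteed, for this $k$ and lower, that
\begin{equation}
1 - \min_{|\phi\rangle\in\mathcal{H}_k} \langle\phi|\mathcal{C}(|\phi\rangle\langle\phi|)|\phi\rangle \leq \varepsilon/\delta. \tag{14.1.81}
\end{equation}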

That is, the subspace H𝑘 is good for quantum communication of states at the channel
input with fidelity at least 1 − 𝜀/𝛿 (to be precise, the subspace H𝑘 is good for
subspace transmission as defined in the introduction of this chapter). Furthermore,
the algorithm implies that

𝐹𝑑 ≤ 𝐹𝑑−1 ≤ · · · ≤ 𝐹(1−𝛿)𝑑 , (14.1.82)

H𝑑 ⊇ H𝑑−1 ⊇ · · · ⊇ H (1−𝛿)𝑑 . (14.1.83)

Also, $\{|\phi_k\rangle\}_{k=1}^{\ell}$ is an orthonormal basis for $\mathcal{H}_\ell$, where $\ell\in\{1,\ldots,d\}$. Note that
the unit vectors $|\phi_k\rangle$, $k\in\{(1-\delta)d-1,\ldots,1\}$ can be generated by repeating the
algorithm above exhaustively.

We now analyze the claims above by employing Markov’s inequality and some
other tools. From (14.1.74), we have that

𝐹 (Φ 𝑅𝐵′ , C 𝐴′ →𝐵′ (Φ 𝑅 𝐴′ )) ≥ 1 − 𝜀. (14.1.84)

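Since $\{|\phi_k\rangle\}_{k=1}^{d}$ is an orthonormal basis for $\mathcal{H}_d$, we can write
\begin{equation}
|\Phi\rangle_{RA'} = \frac{1}{\sqrt{d}} \sum_{k=1}^{d} |\overline{\phi_k}\rangle_R\otimes|\phi_k\rangle_{A'}, \tag{14.1.85}
\end{equation}
where complex conjugation is taken with respect to the basis $\{|i\rangle\}_{i=0}^{d-1}$ used in
(14.1.10). A consequence of the data-processing inequality for fidelity under the
dephasing channel $\omega_R \mapsto \sum_{k=1}^{d} |\overline{\phi_k}\rangle\langle\overline{\phi_k}|_R\,\omega_R\,|\overline{\phi_k}\rangle\langle\overline{\phi_k}|_R$ and convexity of the square
function is that
\begin{align}
F(\Phi_{RB'},(\mathrm{id}_R\otimes\mathcal{C}_{A'\to B'})(\Phi_{RA'})) &\leq \frac{1}{d}\sum_{k=1}^{d} \langle\phi_k|\mathcal{C}_{A'\to B'}(|\phi_k\rangle\langle\phi_k|)|\phi_k\rangle \tag{14.1.86}\\
&= \frac{1}{d}\sum_{k=1}^{d} F_k. \tag{14.1.87}
\end{align}
This means that
\begin{equation}
\frac{1}{d}\sum_{k=1}^{d} F_k \geq 1-\varepsilon \quad\Longleftrightarrow\quad \frac{1}{d}\sum_{k=1}^{d} (1-F_k) \leq \varepsilon. \tag{14.1.88}
\end{equation}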

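Now, taking $K$ to be a uniform random variable with realizations $k\in\{1,\ldots,d\}$
and applying Markov’s inequality (see (2.3.20)), we find that
\begin{equation}
\Pr[1-F_K \geq \varepsilon/\delta] \leq \frac{\mathbb{E}[1-F_K]}{\varepsilon/\delta} \leq \frac{\varepsilon}{\varepsilon/\delta} = \delta. \tag{14.1.89}
\end{equation}
So this implies that at least $(1-\delta)d$ of the $F_k$ values are such that $F_k \geq 1-\varepsilon/\delta$. Since they
are ordered as given in (14.1.82), we conclude that $\mathcal{H}_{(1-\delta)d}$, which by definition
has dimension $(1-\delta)d$, is a good subspace for quantum communication in the
following sense (subspace transmission):
\begin{equation}
\min_{|\phi\rangle\in\mathcal{H}_{(1-\delta)d}} \langle\phi|\mathcal{C}_{A'\to B'}(|\phi\rangle\langle\phi|)|\phi\rangle \geq 1-\varepsilon/\delta. \tag{14.1.90}
\end{equation}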
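The counting step behind (14.1.89) can be illustrated with synthetic numbers. In the sketch below, the infidelities 1 − 𝐹𝑘 are randomly generated stand-ins (they are not computed from any channel), 𝜀 is taken to be their average, and the function name is hypothetical. Markov’s inequality then guarantees that at most a fraction 𝛿 of the indices have infidelity at least 𝜀/𝛿.

```python
import numpy as np

def count_bad_indices(fidelities, eps, delta):
    """Number of indices k with 1 - F_k >= eps / delta."""
    f = np.asarray(fidelities, dtype=float)
    return int(np.sum(1.0 - f >= eps / delta))

rng = np.random.default_rng(0)
d = 1000

# Synthetic infidelities in [0, 1]; purely illustrative data.
infidelities = np.clip(rng.exponential(scale=0.01, size=d), 0.0, 1.0)
fidelities = 1.0 - infidelities

eps = float(np.mean(infidelities))  # average infidelity, as in (14.1.88)
delta = 0.25

bad = count_bad_indices(fidelities, eps, delta)
good = d - bad  # indices with F_k > 1 - eps/delta
print(bad, good)
```

Because the bound holds for any nonnegative data with mean 𝜀, the number of good indices is at least (1 − 𝛿)𝑑 no matter how the synthetic values are drawn.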


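Now, applying Proposition 6.25 to (14.1.90), we conclude that
\begin{equation}
\min_{|\psi\rangle\in\mathcal{H}'_{(1-\delta)d}\otimes\mathcal{H}_{(1-\delta)d}} \langle\psi|(\mathrm{id}_{\mathcal{H}'_{(1-\delta)d}}\otimes\mathcal{C}_{A'\to B'})(|\psi\rangle\langle\psi|)|\psi\rangle \geq 1 - 2\sqrt{\varepsilon/\delta}, \tag{14.1.91}
\end{equation}
i.e., $p^{*}_{\mathrm{err}}(\mathcal{E},\mathcal{D};\mathcal{N}) \leq 2\sqrt{\varepsilon/\delta}$, which is the criterion for strong subspace transmission
(the strongest notion of quantum communication).
To finish off the proof, suppose that $(1-\delta)d$ is not an integer. Then there exists a
$\delta' > \delta$ such that $(1-\delta')d = \lfloor(1-\delta)d\rfloor$ is a positive integer. By the above reasoning,
there exists a code satisfying (14.1.91), except with $\delta$ replaced by $\delta'$, and with the
code dimension equal to $\lfloor(1-\delta)d\rfloor$. We also have that $1 - 2\sqrt{\varepsilon/\delta'} > 1 - 2\sqrt{\varepsilon/\delta}$.
This concludes the proof. ■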

We now return to the proof of Theorem 14.5. To finish it off, we combine the
results of Lemmas 14.6, 14.7, and 14.8 to conclude that the existence of a (𝑑, 𝜀)
entanglement distillation protocol for 𝜔 𝐴𝐵 = N 𝐴′ →𝐵 (𝜓 𝐴𝐴′ ) implies the existence
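of a $(d',\varepsilon')$ quantum communication protocol, where
\begin{equation}
d' = (1-\delta)d, \qquad \varepsilon' = 4\sqrt{\frac{\varepsilon}{\delta}}, \qquad \delta\in(0,1). \tag{14.1.92}
\end{equation}
Recalling that $d$ is given by (14.1.37), we conclude that
\begin{equation}
\log_2 d' = -H_{\max}^{\frac{\varepsilon'\sqrt{\delta}}{4}-\eta}(A|B)_{\omega} + \log_2(\eta^4(1-\delta)). \tag{14.1.93}
\end{equation}
Then, since the pure state $\psi_{AA'}$ used in (14.1.93) is arbitrary, we conclude that there
exists a $(d',\varepsilon')$ quantum communication protocol satisfying
\begin{equation}
\log_2 d' = \sup_{\psi_{AA'}} -H_{\max}^{\frac{\varepsilon'\sqrt{\delta}}{4}-\eta}(A|B)_{\omega} + \log_2(\eta^4(1-\delta)) \tag{14.1.94}
\end{equation}
for all $\eta\in[0,\varepsilon'\sqrt{\delta}/4)$ and $\delta\in(0,1)$. This is precisely the statement in (14.1.38),
and so the proof of Theorem 14.5 is complete.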
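The parameter bookkeeping across Lemmas 14.6, 14.7, and 14.8 can be traced with a few lines of arithmetic. The sketch below is only an illustration; the function names and the sample values of 𝑑, 𝜀, and 𝛿 are assumptions chosen so that (1 − 𝛿)𝑑 is an integer. It confirms that chaining the lemmas yields 𝜀′ = 4√(𝜀/𝛿), as in (14.1.92), and that inverting this relation recovers the smoothing parameter √𝜀 = 𝜀′√𝛿/4 appearing in (14.1.93).

```python
import math

def generation_to_transmission(d, eps):
    """Lemma 14.7: a (d, eps) entanglement generation protocol gives a
    (d, 4*eps) entanglement transmission protocol."""
    return d, 4 * eps

def transmission_to_communication(d, eps, delta):
    """Lemma 14.8: a (d, eps) entanglement transmission protocol gives a
    (floor((1-delta)*d), 2*sqrt(eps/delta)) quantum communication protocol."""
    return math.floor((1 - delta) * d), 2 * math.sqrt(eps / delta)

def distillation_to_communication(d, eps, delta):
    """Chain the lemmas; Lemma 14.6 removes the one-way LOCC at no cost."""
    d_et, eps_et = generation_to_transmission(d, eps)
    return transmission_to_communication(d_et, eps_et, delta)

# Sample values (assumptions for illustration only).
d, eps, delta = 64, 1e-4, 0.25
d_prime, eps_prime = distillation_to_communication(d, eps, delta)

# Smoothing parameter of (14.1.37) expressed through eps', as in (14.1.93).
sqrt_eps_from_eps_prime = eps_prime * math.sqrt(delta) / 4
print(d_prime, eps_prime, sqrt_eps_from_eps_prime)
```

The factor 4 in 𝜀′ arises as 2√(4𝜀/𝛿): the factor 4 inside the square root comes from Lemma 14.7 and the outer factor 2 from Lemma 14.8.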
Applying the relation between smooth conditional min- and max-entropy in
(13.1.72) to the result of Theorem 14.5, and combining it with (7.8.83), we obtain
the following.


Corollary 14.9
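Let $\mathcal{N}_{A\to B}$ be a quantum channel, and let $\mathcal{V}^{\mathcal{N}}_{A\to BE}$ be an isometric channel
extending $\mathcal{N}_{A\to B}$. For all $\varepsilon\in(0,1)$, $\delta\in(0,1)$, $\eta\in[0,\varepsilon\sqrt{\delta}/4)$, and $\alpha>1$, there
exists a $(d,\varepsilon)$ quantum communication protocol for $\mathcal{N}$ with
\begin{equation}
\log_2 d \geq \sup_{\psi_{AA'}} \widetilde{H}_{\alpha}(A|E)_{\phi} - \frac{1}{\alpha-1}\log_2\!\left(\frac{1}{f(\varepsilon,\delta,\eta)}\right)
- \log_2\!\left(\frac{1}{1-f(\varepsilon,\delta,\eta)}\right) + \log_2(\eta^4(1-\delta)), \tag{14.1.95}
\end{equation}
where $f(\varepsilon,\delta,\eta) := \left(\frac{\varepsilon\sqrt{\delta}}{4}-\eta\right)^{2}$, $\phi_{AE} = \operatorname{Tr}_B[\mathcal{V}^{\mathcal{N}}_{A'\to BE}(\psi_{AA'})] = \mathcal{N}^{c}_{A'\to E}(\psi_{AA'})$,
and $\psi_{AA'}$ is a pure state with the dimension of $A'$ equal to the dimension of $A$.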

Since the inequality in (14.1.95) holds for all (𝑑, 𝜀) quantum communication
protocols, we have that

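\begin{equation}
Q^{\varepsilon}(\mathcal{N}) \geq \sup_{\psi_{AA'}} \widetilde{H}_{\alpha}(A|E)_{\phi} - \frac{1}{\alpha-1}\log_2\!\left(\frac{1}{f(\varepsilon,\delta,\eta)}\right)
- \log_2\!\left(\frac{1}{1-f(\varepsilon,\delta,\eta)}\right) + \log_2(\eta^4(1-\delta)), \tag{14.1.96}
\end{equation}
where
\begin{equation}
\phi_{AE} = \operatorname{Tr}_B[\mathcal{V}^{\mathcal{N}}_{A'\to BE}(\psi_{AA'})] = \mathcal{N}^{c}_{A'\to E}(\psi_{AA'}), \tag{14.1.97}
\end{equation}
$f(\varepsilon,\delta,\eta)$ is defined just above, $\eta\in[0,\varepsilon\sqrt{\delta}/4)$, $\delta\in(0,1)$, and $\alpha>1$.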

14.1.3.2 Remark on Forward Classical Communication

To summarize what we did in this section, we used the result from Proposition 13.10
on one-shot entanglement distillation to prove the existence of a quantum commu-
nication protocol in the one-shot setting. Note that the entanglement distillation
protocol of Proposition 13.10 involves one-way classical communication, while
quantum communication (as we defined it at the beginning of this chapter) does
not. In other words, in this section we managed to remove the one-way classical
communication from the entanglement distillation protocol and thereby argue for

the existence of a quantum communication protocol. More generally, it holds that
forward classical communication (i.e., from the sender to the receiver) does not
enhance the corresponding quantum capacity of the channel. In other words, if
$Q^{\to}(\mathcal{N})$ denotes the quantum capacity of the channel $\mathcal{N}$ when classical
communication from the sender to the receiver is allowed as part of the protocol, then
$Q^{\to}(\mathcal{N}) = Q(\mathcal{N})$. This is a direct consequence of the chain of reasoning given in
Lemmas 14.6, 14.7, and 14.8, and we return to this point in Chapter 19 when we
consider LOCC-assisted quantum communication.

[Figure 14.3 (diagram not reproduced): Alice applies the encoding E to 𝐴′, producing systems 𝐴1 , . . . , 𝐴𝑛 ; each passes through a copy of N, producing 𝐵1 , . . . , 𝐵𝑛 ; Bob applies the decoding D, producing 𝐵′.]

Figure 14.3: A general quantum communication protocol for 𝑛 ≥ 1 memoryless/unassisted
uses of a quantum channel N. Alice uses the channel E to encode
her share 𝐴′ of the pure state Ψ𝑅 𝐴′ into 𝑛 quantum systems 𝐴1 , 𝐴2 , . . . , 𝐴𝑛 . She
then sends each one of these through the channel N. Bob finally applies a joint
decoding channel D on the systems 𝐵1 , 𝐵2 , . . . , 𝐵𝑛 , resulting in the state 𝜔 𝑅𝐵′
given by (14.2.1).

14.2 Quantum Capacity of a Quantum Channel

We now consider the asymptotic setting. In this scenario, depicted in Figure 14.3,
the quantum system 𝐴′ to be transmitted to Bob is encoded into 𝑛 copies 𝐴1 , . . . , 𝐴𝑛
of a quantum system 𝐴, for 𝑛 ≥ 1. Each of these systems is then sent independently
through the channel N. We call this the asymptotic setting because the number 𝑛
can be arbitrarily large.


Analysis of the asymptotic setting is almost exactly the same as that of the
one-shot setting. This is due to the fact that 𝑛 independent uses of the channel
N can be regarded as a single use of the channel N ⊗𝑛 . So the only change that
needs to be made is to replace N with N ⊗𝑛 and to define the encoding and decoding
channels as acting on 𝑛 systems instead of just one. In particular, the state at the
end of the protocol becomes

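\begin{equation}
\omega_{RB'} = (\mathcal{D}_{B^n\to B'}\circ\mathcal{N}^{\otimes n}_{A\to B}\circ\mathcal{E}_{A'\to A^n})(\Psi_{RA'}). \tag{14.2.1}
\end{equation}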

Then, just as in the one-shot setting, we define the error probability of the code
(E, D) for 𝑛 independent uses of N as

𝑝 ∗err (E, D; N ⊗𝑛 ) = 1 − 𝐹 (D ◦ N ⊗𝑛 ◦ E). (14.2.2)

Definition 14.10 (𝒏, 𝒅, 𝜺) Quantum Communication Protocol

Let (𝑑, E 𝐴′ →𝐴𝑛 , D𝐵𝑛 →𝐵′ ) be the elements of a quantum communication protocol
for 𝑛 independent uses of the channel N 𝐴→𝐵 , where 𝑑 𝐴′ = 𝑑 𝐵′ = 𝑑. The protocol
is called an (𝑛, 𝑑, 𝜀) protocol, with 𝜀 ∈ [0, 1], if 𝑝 ∗err (E, D; N ⊗𝑛 ) ≤ 𝜀.

The rate of an (𝑛, 𝑑, 𝜀) quantum communication protocol is defined as the

number of qubits transmitted per channel use, i.e.,
\begin{equation}
R(n,d) := \frac{\log_2 d}{n}. \tag{14.2.3}
\end{equation}
Observe that the rate depends only on the dimension 𝑑 of the system 𝐴′ of the pure
state Ψ𝑅 𝐴′ to be transmitted and on the number of channel uses. In particular, it
does not directly depend on the communication channel nor on the encoding and
decoding channels. For a given 𝜀 ∈ [0, 1] and 𝑛 ≥ 1, the highest rate among all
(𝑛, 𝑑, 𝜀) protocols is denoted by 𝑄 𝑛,𝜀 (N), and it is defined as

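\begin{equation}
Q^{n,\varepsilon}(\mathcal{N}) := \frac{1}{n}Q^{\varepsilon}(\mathcal{N}^{\otimes n}) = \sup_{(d,\mathcal{E},\mathcal{D})}\left\{\frac{\log_2 d}{n} : p^{*}_{\mathrm{err}}(\mathcal{E},\mathcal{D};\mathcal{N}^{\otimes n}) \leq \varepsilon\right\}, \tag{14.2.4}
\end{equation}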

where in the second equality we use the definition of the one-shot quantum capacity
𝑄 𝜀 given in (14.1.17), and the supremum is over all 𝑑 ≥ 1, encoding channels
E with input system dimension 𝑑, and decoding channels D with output system
dimension 𝑑.


Definition 14.11 Achievable Rate for Quantum Communication

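Given a quantum channel $\mathcal{N}$, a rate $R \in \mathbb{R}^{+}$ is called an achievable rate for
quantum communication over $\mathcal{N}$ if for all $\varepsilon \in (0, 1]$, $\delta > 0$, and sufficiently
large $n$, there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ quantum communication protocol for $\mathcal{N}$.

As we prove in Appendix A,
\[
R \text{ achievable rate} \iff \lim_{n\to\infty} \varepsilon^{*}_{Q}(2^{n(R-\delta)}; \mathcal{N}^{\otimes n}) = 0 \quad \forall\, \delta > 0. \tag{14.2.5}
\]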

In other words, a rate 𝑅 is achievable if for all 𝛿 > 0, the optimal error probability
for a sequence of protocols with rate 𝑅 − 𝛿 vanishes as the number 𝑛 of uses of N
increases.

Definition 14.12 Quantum Capacity of a Quantum Channel

The quantum capacity of a quantum channel N, denoted by 𝑄(N), is defined
to be the supremum of all achievable rates, i.e.,

𝑄(N) B sup{𝑅 : 𝑅 is an achievable rate for N} (14.2.6)

An equivalent definition of quantum capacity is

\[
Q(\mathcal{N}) = \inf_{\varepsilon\in(0,1]} \liminf_{n\to\infty} \frac{1}{n} Q^{\varepsilon}(\mathcal{N}^{\otimes n}). \tag{14.2.7}
\]

We prove this in Appendix A.

Definition 14.13 Weak Converse Rate for Quantum Communication

Given a quantum channel N, a rate 𝑅 ∈ R+ is called a weak converse rate for
quantum communication over N if every 𝑅′ > 𝑅 is not an achievable rate for N.

We show in Appendix A that

\[
R \text{ weak converse rate} \iff \lim_{n\to\infty} \varepsilon^{*}_{Q}(2^{n(R-\delta)}; \mathcal{N}^{\otimes n}) > 0 \quad \forall\, \delta > 0. \tag{14.2.8}
\]

In other words, a weak converse rate is a rate for which the optimal error probability
cannot be made to vanish, even in the limit of a large number of channel uses.


Definition 14.14 Strong Converse Rate for Quantum Communication

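Given a quantum channel $\mathcal{N}$, a rate $R \in \mathbb{R}^{+}$ is called a strong converse rate for
quantum communication over $\mathcal{N}$ if for all $\varepsilon \in [0, 1)$, $\delta > 0$, and sufficiently
large $n$, there does not exist an $(n, 2^{n(R+\delta)}, \varepsilon)$ quantum communication protocol
for $\mathcal{N}$.

We show in Appendix A that
\[
R \text{ strong converse rate} \iff \lim_{n\to\infty} \varepsilon^{*}_{Q}(2^{n(R+\delta)}; \mathcal{N}^{\otimes n}) = 1 \quad \forall\, \delta > 0. \tag{14.2.9}
\]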

Unlike the weak converse, in which the optimal error is required to simply be
bounded away from zero as the number 𝑛 of channel uses increases, in order to
have a strong converse rate, the optimal error has to converge to one as 𝑛 increases.
By comparing (14.2.8) and (14.2.9), we conclude that every strong converse rate is
a weak converse rate.

Definition 14.15 Strong Converse Quantum Capacity of a Quantum Channel

The strong converse quantum capacity of a quantum channel $\mathcal{N}$, denoted by
$\widetilde{Q}(\mathcal{N})$, is defined as the infimum of all strong converse rates, i.e.,
\[
\widetilde{Q}(\mathcal{N}) := \inf\{R : R \text{ is a strong converse rate for } \mathcal{N}\}. \tag{14.2.10}
\]

We can also write the strong converse quantum capacity as
\[
\widetilde{Q}(\mathcal{N}) = \sup_{\varepsilon\in[0,1)} \limsup_{n\to\infty} \frac{1}{n} Q^{\varepsilon}(\mathcal{N}^{\otimes n}). \tag{14.2.11}
\]

See Appendix A for a proof. We also show in Appendix A that
\[
Q(\mathcal{N}) \leq \widetilde{Q}(\mathcal{N}) \tag{14.2.12}
\]

for every quantum channel N.

We now state the main theorem of this chapter, which gives an expression for
the quantum capacity of a quantum channel.


Theorem 14.16 Quantum Capacity

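The quantum capacity of a quantum channel $\mathcal{N}_{A\to B}$ is equal to the regularized
coherent information $I^{c}_{\mathrm{reg}}(\mathcal{N})$ of $\mathcal{N}$, i.e.,
\[
Q(\mathcal{N}) = I^{c}_{\mathrm{reg}}(\mathcal{N}) := \lim_{n\to\infty} \frac{1}{n} I^{c}(\mathcal{N}^{\otimes n}). \tag{14.2.13}
\]

Recall from (7.11.107) that the channel coherent information is defined as
\[
I^{c}(\mathcal{N}) = \sup_{\psi_{RA}} I(R\rangle B)_{\omega}, \tag{14.2.14}
\]
where $\omega_{RB} = \mathcal{N}_{A\to B}(\psi_{RA})$ and $\psi_{RA}$ is a pure state with the dimension of $R$ equal
to the dimension of $A$.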
Observe that the expression in (14.2.13) for the quantum capacity of a quantum
channel is somewhat similar to the expression in (12.2.14) for the classical capacity
of a quantum channel, in the sense that both capacities involve a regularization of a
corresponding channel measure. In the case of quantum communication, we obtain
the regularization of the channel’s coherent information, whereas in the case of
classical communication, we obtain the regularization of Holevo information. Due
to the regularization, which involves a limit of an arbitrarily large number of uses
of the channel, the quantum capacity is in general difficult to compute.
We show below in Section 14.2.3 that the coherent information is always
superadditive, meaning that 𝐼 𝑐 (N ⊗𝑛 ) ≥ 𝑛𝐼 𝑐 (N) for every channel N. This means
that the coherent information is always a lower bound on the quantum capacity of a
channel N:
𝑄(N) ≥ 𝐼 𝑐 (N) for every channel N. (14.2.15)
If the coherent information happens to be additive for a particular channel, then
the regularization in (14.2.13) is not required. The coherent information is known
to be additive for all degradable and anti-degradable channels. (See Definition
4.6.) In fact, for anti-degradable channels, the coherent information is equal to
zero, a fact that we prove in Section 14.3.2 below. Examples of degradable and
anti-degradable channels include the amplitude damping channel (Section 4.5.1)
and the quantum erasure channel (Section 4.5.2). For all such channels, we thus
have 𝑄(N) = 𝐼 𝑐 (N).
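As a rough numerical illustration of the single-letter case, the sketch below evaluates the coherent information for the two example channels just mentioned. The closed-form expressions it uses are assumptions that are not derived in this section: for the qubit amplitude damping channel with decay parameter gamma, the maximum over p in [0, 1] of h2((1 - gamma) p) - h2(gamma p), and for the erasure channel with erasure probability p on a d-dimensional input, max(0, 1 - 2p) log2 d. The function names and the grid search are illustrative choices only.

```python
import math


def h2(x: float) -> float:
    """Binary entropy in bits, with the convention h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def amplitude_damping_coherent_info(gamma: float, grid: int = 1000) -> float:
    """Grid search for max over p in [0, 1] of h2((1 - gamma) p) - h2(gamma p).

    Assumed closed form (not derived in this section) for the coherent
    information of the qubit amplitude damping channel, optimized over
    diagonal input states diag(1 - p, p).
    """
    best = 0.0  # p = 0 always gives 0, so the maximum is never negative
    for i in range(grid + 1):
        p = i / grid
        best = max(best, h2((1.0 - gamma) * p) - h2(gamma * p))
    return best


def erasure_coherent_info(p: float, d: int = 2) -> float:
    """Assumed closed form max(0, 1 - 2p) * log2(d) for the erasure channel."""
    return max(0.0, 1.0 - 2.0 * p) * math.log2(d)


if __name__ == "__main__":
    for gamma in (0.0, 0.1, 0.25, 0.5):
        print(gamma, amplitude_damping_coherent_info(gamma))
```

Under these assumptions, the value drops from one qubit per channel use at gamma = 0 to zero at gamma = 1/2, which is consistent with the statement that the coherent information vanishes for anti-degradable channels.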
Also, just as with classical communication, in this case Theorem 14.16 only
makes a statement about the quantum capacity and not about the strong converse

quantum capacity. One way of attempting to prove that the coherent information of
a channel is equal to its strong converse quantum capacity involves proving that the
sandwiched Rényi coherent information is additive for the channel. Unfortunately,
this quantity has not been shown to be additive for any quantum channel thus far,
which means that this approach is not known to be useful for obtaining strong
converse quantum capacities. We consider another approach to strong converse
quantum capacities in Section 14.2.4, which leads to a strong converse theorem
for dephasing channels (see (4.5.35)). In terms of a general statement about the
converse, the best we can say generally is that the regularized coherent information
is a weak converse rate for all quantum channels.
There are two ingredients to the proof of Theorem 14.16:
1. Achievability: We show that $I^{c}_{\mathrm{reg}}(\mathcal{N})$ is an achievable rate, which involves
explicitly constructing a quantum communication protocol. The developments
in Section 14.1.3 on a lower bound for one-shot quantum capacity can be
used (via the substitution $\mathcal{N} \to \mathcal{N}^{\otimes n}$) to argue for the existence of a quantum
communication protocol for $\mathcal{N}$ in the asymptotic setting at the rate $I^{c}_{\mathrm{reg}}(\mathcal{N})$.
The achievability part of the proof establishes that $Q(\mathcal{N}) \geq I^{c}_{\mathrm{reg}}(\mathcal{N})$.

2. Weak Converse: We show that $I^{c}_{\mathrm{reg}}(\mathcal{N})$ is a weak converse rate, from which it
follows that $Q(\mathcal{N}) \leq I^{c}_{\mathrm{reg}}(\mathcal{N})$. To show that $I^{c}_{\mathrm{reg}}(\mathcal{N})$ is a weak converse rate,
we use the one-shot upper bounds from Section 14.1.2 to conclude that every
achievable rate $R$ satisfies $R \leq I^{c}_{\mathrm{reg}}(\mathcal{N})$.

We first establish in Section 14.2.1 that the quantity $I^{c}_{\mathrm{reg}}(\mathcal{N})$ is an achievable
rate for quantum communication over $\mathcal{N}$. Then, in Section 14.2.2, we prove that
$I^{c}_{\mathrm{reg}}(\mathcal{N})$ is a weak converse rate.

14.2.1 Proof of Achievability

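In this section, we prove that $I^{c}_{\mathrm{reg}}(\mathcal{N})$ is an achievable rate for quantum
communication over a channel $\mathcal{N}$.

First, recall from Corollary 14.9 that for all $\varepsilon \in (0, 1)$, $\delta \in (0, 1)$, $\eta \in [0, \varepsilon\sqrt{\delta}/4)$,
and $\alpha > 1$, there exists a $(d, \varepsilon)$ quantum communication protocol for $\mathcal{N}_{A\to B}$ with
\[
\log_2 d \geq \sup_{\psi_{AA'}} \widetilde{H}_{\alpha}(A|E)_{\phi} - \frac{1}{\alpha-1}\log_2\!\left(\frac{1}{f(\varepsilon,\delta,\eta)}\right) - \log_2\!\left(\frac{1}{1-f(\varepsilon,\delta,\eta)}\right) + \log_2(\eta^{4}(1-\delta)), \tag{14.2.16}
\]
where $f(\varepsilon,\delta,\eta) := \left(\frac{\varepsilon\sqrt{\delta}}{4} - \eta\right)^{2}$, $\phi_{AE} = \mathcal{N}^{c}_{A'\to E}(\psi_{AA'})$, and $\psi_{AA'}$ is a pure state with
the dimension of $A'$ equal to the dimension of $A$. Note that
\[
\widetilde{H}_{\alpha}(A|E)_{\phi} = -\inf_{\sigma_E} \widetilde{D}_{\alpha}(\phi_{AE} \,\|\, \mathbb{1}_A \otimes \sigma_E), \tag{14.2.17}
\]
where the optimization is with respect to states $\sigma_E$. A simple corollary of (14.2.16)
is the following.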

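Corollary 14.17 Lower Bound for Quantum Communication in the Asymptotic Setting

Let $\mathcal{N}_{A\to B}$ be a quantum channel. For all $n \in \mathbb{N}$, $\varepsilon \in (0, 1)$, and $\alpha > 1$, there
exists an $(n, d, \varepsilon)$ quantum communication protocol for $\mathcal{N}$ with
\[
\frac{\log_2 d}{n} \geq \sup_{\psi_{AA'}} \widetilde{H}_{\alpha}(A|E)_{\phi} - \frac{1}{n(\alpha-1)}\log_2\!\left(\frac{128}{\varepsilon^{2}}\right) - \frac{1}{n}\log_2\!\left(\frac{1}{1-\frac{\varepsilon^{2}}{128}}\right) - \frac{4}{n}\log_2\!\left(\frac{1}{\varepsilon}\right) - \frac{15}{n}, \tag{14.2.18}
\]
where $\phi_{AE} = \mathcal{N}^{c}_{A'\to E}(\psi_{AA'})$ and $\psi_{AA'}$ is a pure state with the dimension of $A'$
equal to the dimension of $A$.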

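Proof: Applying the inequality in (14.2.16) to the channel $\mathcal{N}^{\otimes n}$, letting $\delta = 1/2$,
letting $\eta = \frac{\varepsilon}{8\sqrt{2}}$ (so that $f(\varepsilon,\delta,\eta) = \varepsilon^{2}/128$ and $\eta^{4}(1-\delta) = \varepsilon^{4}/2^{15}$), and dividing by $n$ leads to
\[
\frac{\log_2 d}{n} \geq \frac{1}{n}\sup_{\Psi_{A^nA'^n}} \widetilde{H}_{\alpha}(A^n|E^n)_{\Phi} - \frac{1}{n(\alpha-1)}\log_2\!\left(\frac{128}{\varepsilon^{2}}\right) - \frac{1}{n}\log_2\!\left(\frac{1}{1-\frac{\varepsilon^{2}}{128}}\right) - \frac{4}{n}\log_2\!\left(\frac{1}{\varepsilon}\right) - \frac{15}{n}, \tag{14.2.19}
\]
where $\Phi_{A^nE^n} = (\mathcal{N}^{c}_{A'\to E})^{\otimes n}(\Psi_{A^nA'^n})$, and the optimization is with respect to pure
states $\Psi_{A^nA'^n}$ with $d_A = d_{A'}$. Now, let $\psi_{AA'}$ be an arbitrary pure state, and let
$\phi_{AE} = \mathcal{N}^{c}_{A'\to E}(\psi_{AA'})$. Then, restricting the optimization in the definition of
$\widetilde{H}_{\alpha}(A^n|E^n)_{\phi^{\otimes n}}$ to product states (see (14.2.17)) leads to
\[
\widetilde{H}_{\alpha}(A^n|E^n)_{\phi^{\otimes n}} \geq n\,\widetilde{H}_{\alpha}(A|E)_{\phi}. \tag{14.2.20}
\]
In other words,
\[
\sup_{\Psi_{A^nA'^n}} \widetilde{H}_{\alpha}(A^n|E^n)_{\Phi} \geq \widetilde{H}_{\alpha}(A^n|E^n)_{\phi^{\otimes n}} \geq n\,\widetilde{H}_{\alpha}(A|E)_{\phi} \tag{14.2.21}
\]
for every pure state $\psi_{AA'}$, which implies that
\[
\sup_{\Psi_{A^nA'^n}} \widetilde{H}_{\alpha}(A^n|E^n)_{\Phi} \geq n \sup_{\psi_{AA'}} \widetilde{H}_{\alpha}(A|E)_{\phi}. \tag{14.2.22}
\]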

Therefore, the inequality in (14.2.19) simplifies to (14.2.18), as required. ■
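To get a feel for how quickly the correction terms in (14.2.18) disappear, the following minimal sketch evaluates their sum for given n, epsilon, and alpha. It assumes the choice delta = 1/2 and eta = epsilon/(8 sqrt(2)) from the proof, so that f = epsilon^2/128 and eta^4 (1 - delta) = epsilon^4 / 2^15; the function name `finite_n_penalty` is hypothetical.

```python
import math


def finite_n_penalty(n: int, eps: float, alpha: float) -> float:
    """Sum of the terms subtracted from sup H_alpha(A|E) in (14.2.18).

    Assumes f = eps**2 / 128 and eta**4 * (1 - delta) = eps**4 / 2**15,
    i.e. delta = 1/2 and eta = eps / (8 * sqrt(2)).
    """
    if n < 1 or not (0.0 < eps < 1.0) or alpha <= 1.0:
        raise ValueError("need n >= 1, 0 < eps < 1, alpha > 1")
    f = eps ** 2 / 128.0
    renyi_term = math.log2(1.0 / f) / (n * (alpha - 1.0))
    smoothing_term = math.log2(1.0 / (1.0 - f)) / n
    eta_term = (4.0 * math.log2(1.0 / eps) + 15.0) / n
    return renyi_term + smoothing_term + eta_term


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, finite_n_penalty(n, eps=0.01, alpha=1.1))
```

For fixed epsilon and alpha the penalty scales exactly as 1/n, but the constant grows as alpha approaches 1, which is why the achievability proof fixes alpha first and only then takes n large.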

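The inequality in (14.2.18) implies that
\[
Q^{n,\varepsilon}(\mathcal{N}) \geq \sup_{\psi_{AA'}} \widetilde{H}_{\alpha}(A|E)_{\phi} - \frac{1}{n(\alpha-1)}\log_2\!\left(\frac{128}{\varepsilon^{2}}\right) - \frac{1}{n}\log_2\!\left(\frac{1}{1-\frac{\varepsilon^{2}}{128}}\right) - \frac{4}{n}\log_2\!\left(\frac{1}{\varepsilon}\right) - \frac{15}{n} \tag{14.2.23}
\]
for all $n \geq 1$, $\varepsilon \in (0, 1)$, and $\alpha > 1$, where $d_{A'} = d_A$ and $\phi_{AE} = \mathcal{N}^{c}_{A'\to E}(\psi_{AA'})$.
We can now use (14.2.18) to prove that the regularized coherent information
$I^{c}_{\mathrm{reg}}(\mathcal{N})$ is an achievable rate for quantum communication over $\mathcal{N}$.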

Proof of the Achievability Part of Theorem 14.16

Fix 𝜀 ∈ (0, 1] and 𝛿 > 0. Let 𝛿1 , 𝛿2 > 0 be such that

𝛿 = 𝛿1 + 𝛿2 . (14.2.24)

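Set $\alpha \in (1, \infty)$ such that
\[
\delta_1 \geq I^{c}(\mathcal{N}) - \sup_{\psi_{AA'}} \widetilde{H}_{\alpha}(A|E)_{\phi}, \tag{14.2.25}
\]
where $\psi_{AA'}$ is a pure state with the dimension of $A'$ equal to the dimension of $A$
and $\phi_{AE} = \mathcal{N}^{c}_{A'\to E}(\psi_{AA'})$. Note that this is possible because $\widetilde{H}_{\alpha}(A|E)_{\phi}$ increases
monotonically with decreasing $\alpha$ (this follows from Proposition 7.23), so that
\begin{align}
\lim_{\alpha\to 1^{+}} \sup_{\psi_{AA'}} \widetilde{H}_{\alpha}(A|E)_{\phi} &= \sup_{\alpha\in(1,\infty)} \sup_{\psi_{AA'}} \widetilde{H}_{\alpha}(A|E)_{\phi} \tag{14.2.26} \\
&= \sup_{\alpha\in(1,\infty)} \sup_{\psi_{AA'}} \left( -\inf_{\sigma_E} \widetilde{D}_{\alpha}(\phi_{AE} \,\|\, \mathbb{1}_A \otimes \sigma_E) \right) \tag{14.2.27} \\
&= -\inf_{\alpha\in(1,\infty)} \inf_{\psi_{AA'}} \inf_{\sigma_E} \widetilde{D}_{\alpha}(\phi_{AE} \,\|\, \mathbb{1}_A \otimes \sigma_E) \tag{14.2.28} \\
&= -\inf_{\psi_{AA'}} \inf_{\sigma_E} \inf_{\alpha\in(1,\infty)} \widetilde{D}_{\alpha}(\phi_{AE} \,\|\, \mathbb{1}_A \otimes \sigma_E) \tag{14.2.29} \\
&= -\inf_{\psi_{AA'}} \inf_{\sigma_E} D(\phi_{AE} \,\|\, \mathbb{1}_A \otimes \sigma_E) \tag{14.2.30} \\
&= \sup_{\psi_{AA'}} \left( -\inf_{\sigma_E} D(\phi_{AE} \,\|\, \mathbb{1}_A \otimes \sigma_E) \right) \tag{14.2.31} \\
&= \sup_{\psi_{AA'}} H(A|E)_{\phi}, \tag{14.2.32}
\end{align}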

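where the fifth equality follows from Proposition 7.22. Now, let $\mathcal{V}^{\mathcal{N}}_{A'\to BE}$ be an
isometric channel extending $\mathcal{N}_{A'\to B}$ such that
$\phi_{AE} = \mathcal{N}^{c}_{A'\to E}(\psi_{AA'}) = \mathrm{Tr}_B[\mathcal{V}^{\mathcal{N}}_{A'\to BE}(\psi_{AA'})]$.
Since the state $\mathcal{V}^{\mathcal{N}}_{A'\to BE}(\psi_{AA'})$ is pure, we can view it as a purification of
$\omega_{AB} = \mathcal{N}_{A'\to B}(\psi_{AA'})$, so that
\begin{align}
H(A|E)_{\phi} &= H(AE)_{\phi} - H(E)_{\phi} \tag{14.2.33} \\
&= H(B)_{\omega} - H(AB)_{\omega} \tag{14.2.34} \\
&= I(A\rangle B)_{\omega} \tag{14.2.35}
\end{align}
for every pure state $\psi_{AA'}$. Therefore,
\[
\sup_{\psi_{AA'}} H(A|E)_{\phi} = \sup_{\psi_{AA'}} I(A\rangle B)_{\omega} = I^{c}(\mathcal{N}). \tag{14.2.36}
\]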

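With $\alpha \in (1, \infty)$ chosen such that (14.2.25) holds, take $n$ large enough so that
\[
\delta_2 \geq \frac{1}{n(\alpha-1)}\log_2\!\left(\frac{128}{\varepsilon^{2}}\right) + \frac{1}{n}\log_2\!\left(\frac{1}{1-\frac{\varepsilon^{2}}{128}}\right) + \frac{4}{n}\log_2\!\left(\frac{1}{\varepsilon}\right) + \frac{15}{n}. \tag{14.2.37}
\]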
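Now, we use the fact that for the $n$ and $\varepsilon$ chosen above there exists an $(n, d, \varepsilon)$
protocol such that
\[
\frac{\log_2 d}{n} \geq \sup_{\psi_{AA'}} \widetilde{H}_{\alpha}(A|E)_{\phi} - \frac{1}{n(\alpha-1)}\log_2\!\left(\frac{128}{\varepsilon^{2}}\right) - \frac{1}{n}\log_2\!\left(\frac{1}{1-\frac{\varepsilon^{2}}{128}}\right) - \frac{4}{n}\log_2\!\left(\frac{1}{\varepsilon}\right) - \frac{15}{n}, \tag{14.2.38}
\]
which holds due to Corollary 14.17. Rearranging the right-hand side of this
inequality, and using (14.2.24), (14.2.25), and (14.2.37), we find that
\begin{align}
\frac{\log_2 d}{n} &\geq I^{c}(\mathcal{N}) - \left( I^{c}(\mathcal{N}) - \sup_{\psi_{AA'}} \widetilde{H}_{\alpha}(A|E)_{\phi} + \frac{1}{n(\alpha-1)}\log_2\!\left(\frac{128}{\varepsilon^{2}}\right) + \frac{1}{n}\log_2\!\left(\frac{1}{1-\frac{\varepsilon^{2}}{128}}\right) + \frac{4}{n}\log_2\!\left(\frac{1}{\varepsilon}\right) + \frac{15}{n} \right) \tag{14.2.39} \\
&\geq I^{c}(\mathcal{N}) - (\delta_1 + \delta_2) \tag{14.2.40} \\
&= I^{c}(\mathcal{N}) - \delta. \tag{14.2.41}
\end{align}
Thus, there exists an $(n, d, \varepsilon)$ quantum communication protocol with rate $\frac{\log_2 d}{n} \geq I^{c}(\mathcal{N}) - \delta$.
Therefore, there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ quantum communication
protocol with $R = I^{c}(\mathcal{N})$ for all sufficiently large $n$ such that (14.2.37) holds. Since
$\varepsilon$ and $\delta$ are arbitrary, we conclude that for all $\varepsilon \in (0, 1]$, $\delta > 0$, and sufficiently
large $n$, there exists an $(n, 2^{n(I^{c}(\mathcal{N})-\delta)}, \varepsilon)$ quantum communication protocol. This
means, by definition, that $I^{c}(\mathcal{N})$ is an achievable rate for quantum communication
over $\mathcal{N}$.
Now, we can repeat the arguments above for the tensor-power channel $\mathcal{N}^{\otimes k}$ for
all $k \geq 1$, and so we conclude that $\frac{1}{k} I^{c}(\mathcal{N}^{\otimes k})$ is an achievable rate (the arguments are
similar to the arguments in the proof of the achievability part of Proposition 13.19).
Since $k$ is arbitrary, we conclude that $\lim_{k\to\infty} \frac{1}{k} I^{c}(\mathcal{N}^{\otimes k}) = I^{c}_{\mathrm{reg}}(\mathcal{N})$ is an achievable
rate for quantum communication over $\mathcal{N}$. ■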

14.2.2 Proof of the Weak Converse

We now show that the regularized coherent information $I^{c}_{\mathrm{reg}}(\mathcal{N})$ is a weak converse
rate for quantum communication over $\mathcal{N}$. This establishes that $Q(\mathcal{N}) \leq I^{c}_{\mathrm{reg}}(\mathcal{N})$
and therefore that $Q(\mathcal{N}) = I^{c}_{\mathrm{reg}}(\mathcal{N})$, completing the proof of Theorem 14.16.
Figure 14.4: Depiction of a protocol that is useless for quantum communication
in the asymptotic setting. The state encoding Alice’s share of the pure state
$\Psi_{RA'}$ is discarded and replaced by an arbitrary (but fixed) state $\sigma_{B^n}$.

Let us first recall from Theorem 14.4 that for every quantum channel $\mathcal{N}$, we
have the following: for all $\varepsilon \in [0, 1)$ and $(d, \varepsilon)$ quantum communication protocols
for $\mathcal{N}$,
\begin{align}
(1 - 2\varepsilon)\log_2 d &\leq I^{c}(\mathcal{N}) + h_2(\varepsilon), \tag{14.2.42} \\
\log_2 d &\leq \widetilde{I}^{c}_{\alpha}(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\, \alpha > 1. \tag{14.2.43}
\end{align}
To obtain these inequalities, we considered a quantum communication protocol for
a useless channel. The useless channel in the asymptotic setting is analogous to the
one in Figure 14.2 and is shown in Figure 14.4. Applying (14.2.42) and (14.2.43)
to the channel N ⊗𝑛 leads to the following.

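Corollary 14.18 Upper Bounds for Quantum Communication in the Asymptotic Setting

Let $\mathcal{N}$ be a quantum channel. For all $\varepsilon \in [0, 1)$, $n \in \mathbb{N}$, and $(n, d, \varepsilon)$ quantum
communication protocols over $n$ uses of $\mathcal{N}$, the number of transmitted qubits is
bounded from above as follows:
\begin{align}
(1 - 2\varepsilon)\frac{\log_2 d}{n} &\leq \frac{1}{n} I^{c}(\mathcal{N}^{\otimes n}) + \frac{1}{n} h_2(\varepsilon), \tag{14.2.44} \\
\frac{\log_2 d}{n} &\leq \frac{1}{n} \widetilde{I}^{c}_{\alpha}(\mathcal{N}^{\otimes n}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\, \alpha > 1. \tag{14.2.45}
\end{align}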


Proof: Since the inequalities in (14.2.42) and (14.2.43) of Theorem 14.4 hold for
every channel N, they hold for the channel N ⊗𝑛 . Therefore, applying (14.2.42) and
(14.2.43) to N ⊗𝑛 and dividing both sides by 𝑛, we obtain the desired result. ■

The inequalities in the corollary above give us, for all 𝜀 ∈ [0, 1) and 𝑛 ∈ N, an
upper bound on the rate of an arbitrary (𝑛, 𝑑, 𝜀) quantum communication protocol.
If instead we fix a particular rate $R$ by letting $d = 2^{nR}$, then we can obtain a lower
bound on the error probability of an $(n, 2^{nR}, \varepsilon)$ quantum communication protocol.
Specifically, using (14.2.45), we find that
\[
\varepsilon \geq 1 - 2^{-n\left(\frac{\alpha-1}{\alpha}\right)\left(R - \frac{1}{n}\widetilde{I}^{c}_{\alpha}(\mathcal{N}^{\otimes n})\right)} \tag{14.2.46}
\]

for all 𝛼 > 1.
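The lower bound in (14.2.46) is easy to evaluate once a value for the normalized Rényi coherent information is available. The sketch below is illustrative only: the argument `renyi_coh_info_per_use` stands for the quantity (1/n) times the sandwiched Rényi coherent information of the n-fold channel, which is supplied by hand here as a hypothetical number rather than computed, and the function name is an assumption.

```python
def error_lower_bound(n: int, rate: float, alpha: float,
                      renyi_coh_info_per_use: float) -> float:
    """Evaluate 1 - 2^{-n ((alpha-1)/alpha) (R - I)} from (14.2.46).

    `renyi_coh_info_per_use` is a hypothetical, user-supplied value for
    (1/n) * (sandwiched Renyi coherent information of the n-fold channel).
    The result is clipped at 0, since a negative lower bound on an error
    probability carries no information.
    """
    if n < 1 or alpha <= 1.0:
        raise ValueError("need n >= 1 and alpha > 1")
    exponent = -n * ((alpha - 1.0) / alpha) * (rate - renyi_coh_info_per_use)
    if exponent >= 0.0:
        return 0.0  # the rate does not exceed the supplied value: trivial bound
    return 1.0 - 2.0 ** exponent


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, error_lower_bound(n, rate=1.0, alpha=2.0,
                                   renyi_coh_info_per_use=0.8))
```

The bound is only nontrivial when the rate exceeds the supplied Rényi quantity, and in that regime it approaches one exponentially fast in n if that quantity stays bounded away from the rate.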

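The inequalities in (14.2.44) and (14.2.45) imply that
\begin{align}
(1 - 2\varepsilon)\, Q^{n,\varepsilon}(\mathcal{N}) &\leq \frac{1}{n} I^{c}(\mathcal{N}^{\otimes n}) + \frac{1}{n} h_2(\varepsilon), \tag{14.2.47} \\
Q^{n,\varepsilon}(\mathcal{N}) &\leq \frac{1}{n} \widetilde{I}^{c}_{\alpha}(\mathcal{N}^{\otimes n}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\, \alpha > 1, \tag{14.2.48}
\end{align}
with $n \geq 1$ and $\varepsilon \in [0, 1)$.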
Using (14.2.44), we can now prove the weak converse part of Theorem 14.16.

Proof of the Weak Converse Part of Theorem 14.16

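Suppose that $R$ is an achievable rate. Then, by definition, for all $\varepsilon \in (0, 1]$, $\delta > 0$,
and sufficiently large $n$, there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ quantum communication
protocol for $\mathcal{N}$. For all such protocols, the inequality (14.2.44) in Corollary 14.18
holds, so that
\[
(1 - 2\varepsilon)(R - \delta) \leq \frac{1}{n} I^{c}(\mathcal{N}^{\otimes n}) + \frac{1}{n} h_2(\varepsilon). \tag{14.2.49}
\]
Since this bound holds for all sufficiently large $n$, it holds in the limit $n \to \infty$, so
that
\begin{align}
(1 - 2\varepsilon) R &\leq \lim_{n\to\infty} \left( \frac{1}{n} I^{c}(\mathcal{N}^{\otimes n}) + \frac{1}{n} h_2(\varepsilon) \right) + (1 - 2\varepsilon)\delta \tag{14.2.50} \\
&= \lim_{n\to\infty} \frac{1}{n} I^{c}(\mathcal{N}^{\otimes n}) + (1 - 2\varepsilon)\delta. \tag{14.2.51}
\end{align}
Then, since this inequality holds for all $\varepsilon \in (0, 1/2)$ and $\delta > 0$, we obtain
\[
R \leq \lim_{\varepsilon,\delta\to 0} \left\{ \frac{1}{1 - 2\varepsilon} \lim_{n\to\infty} \frac{1}{n} I^{c}(\mathcal{N}^{\otimes n}) + \delta \right\} = \lim_{n\to\infty} \frac{1}{n} I^{c}(\mathcal{N}^{\otimes n}) = I^{c}_{\mathrm{reg}}(\mathcal{N}). \tag{14.2.52}
\]
We have thus shown that if $R$ is an achievable rate, then $R \leq I^{c}_{\mathrm{reg}}(\mathcal{N})$. The
contrapositive of this statement is that if $R > I^{c}_{\mathrm{reg}}(\mathcal{N})$, then $R$ is not an achievable
rate. By definition, therefore, $I^{c}_{\mathrm{reg}}(\mathcal{N})$ is a weak converse rate.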

14.2.3 The Additivity Question

Although we have shown that the quantum capacity $Q(\mathcal{N})$ of a channel $\mathcal{N}$ is given
by its regularized coherent information $I^{c}_{\mathrm{reg}}(\mathcal{N}) = \lim_{n\to\infty} \frac{1}{n} I^{c}(\mathcal{N}^{\otimes n})$, without the
additivity of $I^{c}(\mathcal{N})$, this result is not particularly helpful since it is not clear whether
the regularized coherent information can be computed in general.
The coherent information is always superadditive, meaning that for two arbitrary
quantum channels N1 and N2 ,
𝐼 𝑐 (N1 ⊗ N2 ) ≥ 𝐼 𝑐 (N1 ) + 𝐼 𝑐 (N2 ). (14.2.53)
This follows from the fact that coherent information is additive for product states
𝜏𝐴1 𝐵1 ⊗ 𝜔 𝐴2 𝐵2 :
𝐼 ( 𝐴1 𝐴2 ⟩𝐵1 𝐵2 )𝜏⊗𝜔 = 𝐼 ( 𝐴1 ⟩𝐵1 )𝜏 + 𝐼 ( 𝐴2 ⟩𝐵2 )𝜔 , (14.2.54)
which is a consequence of (7.1.6) and the additivity of entropy for product states
(see (7.2.104)).
Now, let 𝜓 𝑅1 𝑅2 𝐴1 𝐴2 , 𝜙 𝑅1 𝐴1 , 𝜑 𝑅2 𝐴2 be arbitrary pure states, where 𝐴1 and 𝐴2 are
input systems to the channels N1 and N2 , respectively, and 𝑑 𝑅1 = 𝑑 𝐴1 and 𝑑 𝑅2 = 𝑑 𝐴2 .
Then, letting
𝜌 𝑅1 𝑅2 𝐵1 𝐵2 B ((N1 ) 𝐴1 →𝐵1 ⊗ (N2 ) 𝐴2 →𝐵2 )(𝜓 𝑅1 𝑅2 𝐴1 𝐴2 ), (14.2.55)
𝜏𝑅1 𝐵1 B (N1 ) 𝐴1 →𝐵1 (𝜙 𝑅1 𝐴1 ), (14.2.56)
𝜔 𝑅2 𝐵2 B (N2 ) 𝐴2 →𝐵2 (𝜑 𝑅2 𝐴2 ), (14.2.57)
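and restricting the optimization in the definition of coherent information of a
channel to pure product states, we find that
\begin{align}
I^{c}(\mathcal{N}_1 \otimes \mathcal{N}_2) &= \sup_{\psi_{R_1R_2A_1A_2}} I(R_1R_2\rangle B_1B_2)_{\rho} \tag{14.2.58} \\
&\geq \sup_{\phi_{R_1A_1}\otimes\varphi_{R_2A_2}} I(R_1R_2\rangle B_1B_2)_{\tau\otimes\omega} \tag{14.2.59} \\
&= \sup_{\phi_{R_1A_1}\otimes\varphi_{R_2A_2}} \left\{ I(R_1\rangle B_1)_{\tau} + I(R_2\rangle B_2)_{\omega} \right\} \tag{14.2.60} \\
&= \sup_{\phi_{R_1A_1}} I(R_1\rangle B_1)_{\tau} + \sup_{\varphi_{R_2A_2}} I(R_2\rangle B_2)_{\omega} \tag{14.2.61} \\
&= I^{c}(\mathcal{N}_1) + I^{c}(\mathcal{N}_2), \tag{14.2.62}
\end{align}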

which is precisely (14.2.53). The reverse inequality does not hold in general, but it
does for degradable channels (see Section 14.3.1 below).
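For the sandwiched Rényi coherent information of a bipartite state $\rho_{AB}$, which
is defined as
\[
\widetilde{I}^{c}_{\alpha}(A\rangle B)_{\rho} = \inf_{\sigma_B} \widetilde{D}_{\alpha}(\rho_{AB} \,\|\, \mathbb{1}_A \otimes \sigma_B), \tag{14.2.63}
\]
where the optimization is over states $\sigma_B$, the following additivity equality holds for
all product states $\tau_{A_1B_1} \otimes \omega_{A_2B_2}$ and $\alpha \in (1, \infty)$:
\[
\widetilde{I}^{c}_{\alpha}(A_1A_2\rangle B_1B_2)_{\tau\otimes\omega} = \widetilde{I}^{c}_{\alpha}(A_1\rangle B_1)_{\tau} + \widetilde{I}^{c}_{\alpha}(A_2\rangle B_2)_{\omega}. \tag{14.2.64}
\]
This equality follows by reasoning similar to that given for the proof of
Proposition 11.21. By the same reasoning given in (14.2.58)–(14.2.62), we conclude
that
\[
\widetilde{I}^{c}_{\alpha}(\mathcal{N}_1 \otimes \mathcal{N}_2) \geq \widetilde{I}^{c}_{\alpha}(\mathcal{N}_1) + \widetilde{I}^{c}_{\alpha}(\mathcal{N}_2) \tag{14.2.65}
\]
for all $\alpha \in (1, \infty)$, where $\widetilde{I}^{c}_{\alpha}(\mathcal{N})$ is the sandwiched Rényi coherent information of
the channel $\mathcal{N}$. Whether the reverse inequality holds, even for particular classes
of channels, is an open question.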

14.2.4 Rains Information Strong Converse Upper Bound

Except for channels for which the coherent information is known to be additive
(such as the class of degradable channels; see Section 14.3.1 below), the quantum
capacity of a channel is difficult to compute. This prompts us to find tractable upper
bounds on quantum capacity. This search for tractable upper bounds is entirely
analogous to what was done in Section 12.2.5 for classical communication in order
to obtain tractable strong converse upper bounds on classical capacity.
Recall that in the previous chapter on entanglement distillation, our approach to
obtaining strong converse upper bounds on distillable entanglement consisted of

comparing the state at the output of an entanglement distillation protocol with one
that is useless for entanglement distillation. We considered the set of PPT′ operators
as the useless set, and we obtained state entanglement measures as upper bounds in
the one-shot and asymptotic settings. Now, observe that entanglement transmission
is similar to entanglement distillation in the sense that, like entanglement distillation,
the error criterion for entanglement transmission involves comparing the output
state of the protocol to the maximally entangled state. This suggests that the state
entanglement measures defined in Section 9.3, and in particular the results of
Proposition 13.6 and Corollary 13.7, are relevant. However, the main resource
that we are considering in this chapter is a quantum channel and not a quantum
state, and so we have an extra degree of freedom in the input state to the channel,
which we can optimize. This suggests that the channel entanglement measures
from Chapter 10 are relevant, and this is indeed what we find.

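Proposition 14.19
Let $\mathcal{N}_{A\to B}$ be a quantum channel. For an arbitrary $(d, \varepsilon)$ quantum
communication protocol for $\mathcal{N}_{A\to B}$, the number $\log_2 d$ of qubits transmitted over $\mathcal{N}$
satisfies
\[
\log_2 d \leq R^{\varepsilon}_{H}(\mathcal{N}), \tag{14.2.66}
\]
where
\begin{align}
R^{\varepsilon}_{H}(\mathcal{N}) &= \sup_{\psi_{SA}} R^{\varepsilon}_{H}(S; B)_{\omega} \tag{14.2.67} \\
&= \sup_{\psi_{SA}} \inf_{\sigma_{SB}\in\mathrm{PPT}'(S:B)} D^{\varepsilon}_{H}(\mathcal{N}_{A\to B}(\psi_{SA}) \,\|\, \sigma_{SB}) \tag{14.2.68}
\end{align}
is the $\varepsilon$-hypothesis testing Rains information of the channel $\mathcal{N}$, defined in
(10.4.8), with $\omega_{SB} = \mathcal{N}_{A\to B}(\psi_{SA})$. Consequently, the one-shot quantum capacity does not exceed the
$\varepsilon$-hypothesis testing Rains information of the channel $\mathcal{N}$:
\[
Q^{\varepsilon}(\mathcal{N}) \leq R^{\varepsilon}_{H}(\mathcal{N}). \tag{14.2.69}
\]

Remark: Note that in the expression for $R^{\varepsilon}_{H}(\mathcal{N})$ above it suffices to optimize over pure states
$\psi_{SA}$, with the dimension of $S$ equal to the dimension of $A$. We showed this in (10.1.3)–(10.1.6)
immediately after Definition 10.1.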

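Proof: By the arguments in (14.1.11)–(14.1.13), a $(d, \varepsilon)$ quantum communication
protocol is a $(d, \varepsilon)$ entanglement transmission protocol. As such, we conclude
that the state $\rho_{RB'}$ defined in (14.1.19) satisfies $\mathrm{Tr}[\Phi_{RB'}\rho_{RB'}] \geq 1 - \varepsilon$. We can
therefore apply Proposition 13.6 to conclude that
\[
\log_2 d \leq R^{\varepsilon}_{H}(S; B')_{\rho}. \tag{14.2.70}
\]
Note that
\begin{align}
R^{\varepsilon}_{H}(S; B')_{\rho} &= \inf_{\sigma_{SB'}\in\mathrm{PPT}'(S:B')} D^{\varepsilon}_{H}(\rho_{SB'} \,\|\, \sigma_{SB'}) \tag{14.2.71} \\
&= \inf_{\sigma_{SB'}\in\mathrm{PPT}'(S:B')} D^{\varepsilon}_{H}((\mathcal{D}_{B\to B'} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{A'\to A})(\Phi_{SA'}) \,\|\, \sigma_{SB'}). \tag{14.2.72}
\end{align}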

Now, since every local channel is completely PPT preserving (this follows imme-
diately from Proposition 4.29 and Lemma 4.30), we conclude that the channel
D𝐵→𝐵′ ≡ id𝑆 ⊗ D𝐵→𝐵′ is completely PPT preserving, so that the set
{D𝐵→𝐵′ (𝜏𝑆𝐵 ) : 𝜏𝑆𝐵 ∈ PPT′ (𝑆 : 𝐵)} (14.2.73)
is a subset of PPT′ (𝑆 : 𝐵′). Thus, by restricting the optimization over all operators
𝜎𝑆𝐵′ ∈ PPT′ (𝑆 : 𝐵′) to the outputs D𝐵→𝐵′ (𝜏𝑆𝐵 ) of the decoding channel D𝐵→𝐵′
acting on operators 𝜏𝑆𝐵 ∈ PPT′ (𝑆 : 𝐵), we obtain
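\begin{align}
R^{\varepsilon}_{H}(S; B')_{\rho} &\leq \inf_{\tau_{SB}\in\mathrm{PPT}'(S:B)} D^{\varepsilon}_{H}((\mathcal{D}_{B\to B'} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{A'\to A})(\Phi_{SA'}) \,\|\, \mathcal{D}_{B\to B'}(\tau_{SB})) \tag{14.2.74} \\
&\leq \inf_{\tau_{SB}\in\mathrm{PPT}'(S:B)} D^{\varepsilon}_{H}((\mathcal{N}_{A\to B} \circ \mathcal{E}_{A'\to A})(\Phi_{SA'}) \,\|\, \tau_{SB}) \tag{14.2.75} \\
&= \inf_{\tau_{SB}\in\mathrm{PPT}'(S:B)} D^{\varepsilon}_{H}(\mathcal{N}_{A\to B}(\rho_{SA}) \,\|\, \tau_{SB}), \tag{14.2.76}
\end{align}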

where the second inequality follows from the data-processing inequality for hypoth-
esis testing relative entropy, and the equality follows by letting 𝜌 𝑆 𝐴 = E 𝐴′ →𝐴 (Φ𝑆 𝐴′ ).
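Finally, after optimizing over all states $\rho_{SA}$, we obtain
\begin{align}
R^{\varepsilon}_{H}(S; B')_{\rho} &\leq \sup_{\rho_{SA}} \inf_{\tau_{SB}\in\mathrm{PPT}'(S:B)} D^{\varepsilon}_{H}(\mathcal{N}_{A\to B}(\rho_{SA}) \,\|\, \tau_{SB}) \tag{14.2.77} \\
&= R^{\varepsilon}_{H}(\mathcal{N}), \tag{14.2.78}
\end{align}
so that, by (14.2.70), we conclude that
\[
\log_2 d \leq R^{\varepsilon}_{H}(\mathcal{N}), \tag{14.2.79}
\]
as required. ■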

The result of Proposition 14.19 is analogous to the result of Theorem 14.3.

Combining it with Proposition 7.71 leads to the following:

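Corollary 14.20 One-Shot Rains Upper Bound for Quantum Communication

Let $\mathcal{N}_{A\to B}$ be a quantum channel, and let $\varepsilon \in [0, 1)$. For all $(d, \varepsilon)$ quantum
communication protocols over $\mathcal{N}$, we have that
\[
\log_2 d \leq \widetilde{R}_{\alpha}(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\, \alpha > 1, \tag{14.2.80}
\]
where
\begin{align}
\widetilde{R}_{\alpha}(\mathcal{N}) &= \sup_{\psi_{SA}} \widetilde{R}_{\alpha}(S; B)_{\omega} \tag{14.2.81} \\
&= \sup_{\psi_{SA}} \inf_{\sigma_{SB}\in\mathrm{PPT}'(S:B)} \widetilde{D}_{\alpha}(\mathcal{N}_{A\to B}(\psi_{SA}) \,\|\, \sigma_{SB}) \tag{14.2.82}
\end{align}
is the sandwiched Rényi Rains information of $\mathcal{N}$, defined in (10.4.10).

Remark: Note that in the expression for $\widetilde{R}_{\alpha}(\mathcal{N})$ above it suffices to optimize over pure states
$\psi_{SA}$, with the dimension of $S$ equal to the dimension of $A$. We showed this in (10.1.3)–(10.1.6)
immediately after Definition 10.1.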

Since the inequality in (14.2.80) holds for all (𝑑, 𝜀) quantum communication
protocols, we conclude the following upper bound on the one-shot quantum capacity:

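\[
Q^{\varepsilon}(\mathcal{N}) \leq \widetilde{R}_{\alpha}(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \tag{14.2.83}
\]
for all $\alpha > 1$.
For $n$ channel uses, the bound in (14.2.80) becomes
\[
\frac{\log_2 d}{n} \leq \frac{1}{n}\widetilde{R}_{\alpha}(\mathcal{N}^{\otimes n}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall\, \alpha > 1, \tag{14.2.84}
\]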
which holds for an arbitrary (𝑛, 𝑑, 𝜀) quantum communication protocol that employs
𝑛 uses of the channel N, where 𝑛 ≥ 1 and 𝜀 ∈ [0, 1). We can simplify this inequality
by making use of the following fact.


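Proposition 14.21 Weak Subadditivity of Rényi Rains Information of a Channel

Let $\mathcal{N}_{A\to B}$ be a quantum channel, with $d_A$ the dimension of the input system $A$.
For all $\alpha > 1$ and $n \in \mathbb{N}$, we have
\[
\widetilde{R}_{\alpha}(\mathcal{N}^{\otimes n}) \leq n\,\widetilde{R}_{\alpha}(\mathcal{N}) + \frac{\alpha(d_A^{2} - 1)}{\alpha-1}\log_2(n + 1). \tag{14.2.85}
\]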

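Proof: Throughout this proof, for convenience we make use of the alternate
notation
\[
\widetilde{R}_{\alpha}(\rho_{AB}) \equiv \widetilde{R}_{\alpha}(A; B)_{\rho}, \tag{14.2.86}
\]
where $\rho_{AB}$ is a bipartite state.
Let $\psi_{SA^n}$ be an arbitrary pure state, with the dimension of $S$ equal to the
dimension of $A^n$, and let $\rho_{A^n} := \mathrm{Tr}_S[\psi_{SA^n}]$. We start by observing that the channel
$\mathcal{N}^{\otimes n}$ is covariant with respect to the symmetric group $S_n$. In particular, if we let
$\{W^{\pi}_{A^n}\}_{\pi\in S_n}$ and $\{W^{\pi}_{B^n}\}_{\pi\in S_n}$ be the unitary representations of $S_n$, defined in (2.5.1),
acting on $\mathcal{H}_A^{\otimes n}$ and $\mathcal{H}_B^{\otimes n}$, respectively, then for every state $\rho_{A^n}$, we have that
\[
\mathcal{N}^{\otimes n}(W^{\pi}_{A^n}\rho_{A^n}W^{\pi\dagger}_{A^n}) = W^{\pi}_{B^n}\mathcal{N}^{\otimes n}(\rho_{A^n})W^{\pi\dagger}_{B^n} \tag{14.2.87}
\]
for all $\pi \in S_n$. Consequently, by Proposition 10.12, in particular by (10.4.19), we
find that
\[
\widetilde{R}_{\alpha}(\mathcal{N}^{\otimes n}_{A\to B}(\psi_{SA^n})) \leq \widetilde{R}_{\alpha}(\mathcal{N}^{\otimes n}_{A\to B}(\psi^{\overline{\rho}}_{SA^n})), \tag{14.2.88}
\]
where the state $\overline{\rho}_{A^n}$ is defined as
\[
\overline{\rho}_{A^n} = \frac{1}{n!}\sum_{\pi\in S_n} W^{\pi}_{A^n}\rho_{A^n}W^{\pi\dagger}_{A^n}, \tag{14.2.89}
\]
and $\psi^{\overline{\rho}}_{SA^n}$ is a purification of $\overline{\rho}_{A^n}$.
Since the state $\overline{\rho}_{A^n}$ is permutation invariant by definition, by Lemma 3.13, it
has a purification $|\phi^{\overline{\rho}}\rangle_{\hat{A}^nA^n} \in \mathrm{Sym}_n(\mathcal{H}_{\hat{A}A})$, where the dimension of $\hat{A}$ is equal to
the dimension of $A$. Consequently, there exists an isometry $V_{S\to\hat{A}^n}$ such that
\[
V_{S\to\hat{A}^n}|\psi^{\overline{\rho}}\rangle_{SA^n} = |\phi^{\overline{\rho}}\rangle_{\hat{A}^nA^n}. \tag{14.2.90}
\]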


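Therefore, by isometric invariance of the sandwiched Rényi Rains relative entropy,
we obtain
\[
\widetilde{R}_{\alpha}(\mathcal{N}^{\otimes n}_{A\to B}(\psi^{\overline{\rho}}_{SA^n})) = \widetilde{R}_{\alpha}(\mathcal{N}^{\otimes n}_{A\to B}(\phi^{\overline{\rho}}_{\hat{A}^nA^n})). \tag{14.2.91}
\]
Now, since $\phi^{\overline{\rho}}_{\hat{A}^nA^n}$ is a state, the operator inequality $\phi^{\overline{\rho}}_{\hat{A}^nA^n} \leq \mathbb{1}_{\hat{A}^nA^n}$ holds.
Multiplying both sides of this inequality from the left and right by the projection
$\Pi_{\mathrm{Sym}_n(\mathcal{H}_{\hat{A}A})} \equiv \Pi_{\hat{A}^nA^n}$ onto the symmetric subspace of $\mathcal{H}_{\hat{A}A}^{\otimes n}$, we obtain
\[
\Pi_{\hat{A}^nA^n}|\phi^{\overline{\rho}}\rangle\!\langle\phi^{\overline{\rho}}|_{\hat{A}^nA^n}\Pi_{\hat{A}^nA^n} \leq \Pi^{2}_{\hat{A}^nA^n} = \Pi_{\hat{A}^nA^n}. \tag{14.2.92}
\]
But $|\phi^{\overline{\rho}}\rangle_{\hat{A}^nA^n} \in \mathrm{Sym}_n(\mathcal{H}_{\hat{A}A})$, which means that
\[
\Pi_{\hat{A}^nA^n}|\phi^{\overline{\rho}}\rangle\!\langle\phi^{\overline{\rho}}|_{\hat{A}^nA^n}\Pi_{\hat{A}^nA^n} = |\phi^{\overline{\rho}}\rangle\!\langle\phi^{\overline{\rho}}|_{\hat{A}^nA^n}. \tag{14.2.93}
\]
Therefore,
\[
\phi^{\overline{\rho}}_{\hat{A}^nA^n} \leq \Pi_{\hat{A}^nA^n} = \binom{d_A^{2} + n - 1}{n}\int \phi^{\otimes n}_{\hat{A}A}\, \mathrm{d}\phi, \tag{14.2.94}
\]
where the equality follows from (2.5.16). Now, note that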

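\begin{align}
\binom{n + d_A^{2} - 1}{n} &= \frac{(n + d_A^{2} - 1)(n + d_A^{2} - 2)\cdots(n + 2)(n + 1)}{(d_A^{2} - 1)(d_A^{2} - 2)\cdots 2\cdot 1} \tag{14.2.95} \\
&= \frac{n + d_A^{2} - 1}{d_A^{2} - 1}\cdot\frac{n + d_A^{2} - 2}{d_A^{2} - 2}\cdots\frac{n + 2}{2}\cdot\frac{n + 1}{1}. \tag{14.2.96}
\end{align}
Then, using the fact that $\frac{n+k}{k} \leq n + 1$ for all $k \geq 1$, and applying this inequality to
each factor on the right-hand side of the above equation, we obtain
\[
\binom{n + d_A^{2} - 1}{n} \leq (n + 1)^{d_A^{2}-1}. \tag{14.2.97}
\]
Therefore,
\[
\phi^{\overline{\rho}}_{\hat{A}^nA^n} \leq (n + 1)^{d_A^{2}-1}\int \phi^{\otimes n}_{\hat{A}A}\, \mathrm{d}\phi \equiv (n + 1)^{d_A^{2}-1}\,\xi_{\hat{A}^nA^n}. \tag{14.2.98}
\]
Next, we use (7.5.44) to obtain
\[
\widetilde{R}_{\alpha}(\mathcal{N}^{\otimes n}_{A\to B}(\phi^{\overline{\rho}}_{\hat{A}^nA^n})) \leq \widetilde{R}_{\alpha}(\mathcal{N}^{\otimes n}_{A\to B}(\xi_{\hat{A}^nA^n})) + \frac{\alpha}{\alpha-1}\log_2\!\left((n + 1)^{d_A^{2}-1}\right). \tag{14.2.99}
\]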
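Then, by quasi-convexity of the Rényi Rains relative entropy (Proposition 9.26),
we obtain
\[
\widetilde{R}_{\alpha}(\mathcal{N}^{\otimes n}_{A\to B}(\xi_{\hat{A}^nA^n})) \leq \sup_{\phi_{\hat{A}A}} \widetilde{R}_{\alpha}(\mathcal{N}^{\otimes n}_{A\to B}(\phi^{\otimes n}_{\hat{A}A})). \tag{14.2.100}
\]
Then, using subadditivity of the sandwiched Rényi Rains relative entropy for
tensor-product states, as given by (9.3.18), we find that
\[
\widetilde{R}_{\alpha}(\mathcal{N}^{\otimes n}_{A\to B}(\phi^{\otimes n}_{\hat{A}A})) \leq n\,\widetilde{R}_{\alpha}(\mathcal{N}_{A\to B}(\phi_{\hat{A}A})). \tag{14.2.101}
\]
Putting everything together, we finally obtain
\begin{align}
\widetilde{R}_{\alpha}(\mathcal{N}^{\otimes n}_{A\to B}(\psi_{SA^n})) &\leq n \sup_{\phi_{\hat{A}A}} \widetilde{R}_{\alpha}(\mathcal{N}_{A\to B}(\phi_{\hat{A}A})) + \frac{\alpha(d_A^{2} - 1)}{\alpha-1}\log_2(n + 1) \tag{14.2.102} \\
&= n\,\widetilde{R}_{\alpha}(\mathcal{N}) + \frac{\alpha(d_A^{2} - 1)}{\alpha-1}\log_2(n + 1). \tag{14.2.103}
\end{align}
Since the pure state $\psi_{SA^n}$ is arbitrary, we conclude that
\[
\widetilde{R}_{\alpha}(\mathcal{N}^{\otimes n}) = \sup_{\psi_{SA^n}} \widetilde{R}_{\alpha}(\mathcal{N}^{\otimes n}_{A\to B}(\psi_{SA^n})) \leq n\,\widetilde{R}_{\alpha}(\mathcal{N}) + \frac{\alpha(d_A^{2} - 1)}{\alpha-1}\log_2(n + 1), \tag{14.2.104}
\]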

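Combining (14.2.85) with (14.2.84) leads to the following upper bound on
the rate of an arbitrary $(n, d, \varepsilon)$ quantum communication protocol for a quantum
channel $\mathcal{N}_{A\to B}$:
\[
\frac{\log_2 d}{n} \leq \widetilde{R}_{\alpha}(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{(n + 1)^{d_A^{2}-1}}{1-\varepsilon}\right) \tag{14.2.105}
\]
for all $\alpha > 1$. Consequently, the following bound holds for the $n$-shot quantum
capacity:
\[
Q^{n,\varepsilon}(\mathcal{N}) \leq \widetilde{R}_{\alpha}(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{(n + 1)^{d_A^{2}-1}}{1-\varepsilon}\right) \tag{14.2.106}
\]
for all $\alpha > 1$.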


With this bound, we are now ready to state the main result of this section, which
is that the Rains information of a channel is an upper bound on the strong converse
capacity of an arbitrary quantum channel N.

Theorem 14.22 Strong Converse Upper Bound on Quantum Capacity

The Rains information $R(\mathcal{N})$ of a quantum channel $\mathcal{N}_{A\to B}$ is a strong converse
rate for quantum communication over $\mathcal{N}$. In other words, $\widetilde{Q}(\mathcal{N}) \leq R(\mathcal{N})$ for
every quantum channel $\mathcal{N}$.

Recall from (10.4.6) that
\[
R(\mathcal{N}) = \sup_{\psi_{SA}} \inf_{\sigma_{SB}\in\mathrm{PPT}'(S:B)} D(\mathcal{N}_{A\to B}(\psi_{SA}) \,\|\, \sigma_{SB}), \tag{14.2.107}
\]
where the supremum is with respect to pure states $\psi_{SA}$ with $d_S = d_A$.

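Proof: Let $\varepsilon \in [0, 1)$ and $\delta > 0$. Let $\delta_1, \delta_2 > 0$ be such that
\[
\delta > \delta_1 + \delta_2 =: \delta'. \tag{14.2.108}
\]
Set $\alpha \in (1, \infty)$ such that
\[
\delta_1 \geq \widetilde{R}_{\alpha}(\mathcal{N}) - R(\mathcal{N}), \tag{14.2.109}
\]
which is possible because $\widetilde{R}_{\alpha}(\mathcal{N})$ is monotonically increasing in $\alpha$ (which follows
from Proposition 7.31) and because $\lim_{\alpha\to 1^{+}} \widetilde{R}_{\alpha}(\mathcal{N}) = R(\mathcal{N})$ (see Appendix 10.A
for a proof). With this value of $\alpha$, take $n$ large enough so that
\[
\delta_2 \geq \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{(n + 1)^{d_A^{2}-1}}{1-\varepsilon}\right), \tag{14.2.110}
\]
where $d_A$ is the dimension of the input space of the channel $\mathcal{N}$.
Now, with the values of $n$ and $\varepsilon$ as above, an $(n, d, \varepsilon)$ quantum communication
protocol satisfies (14.2.105), i.e.,
\[
\frac{\log_2 d}{n} \leq \widetilde{R}_{\alpha}(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{(n + 1)^{d_A^{2}-1}}{1-\varepsilon}\right), \tag{14.2.111}
\]
for all $\alpha > 1$. Rearranging the right-hand side of this inequality, and using
(14.2.108)–(14.2.110), we obtain
\begin{align}
\frac{\log_2 d}{n} &\leq R(\mathcal{N}) + \widetilde{R}_{\alpha}(\mathcal{N}) - R(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{(n + 1)^{d_A^{2}-1}}{1-\varepsilon}\right) \tag{14.2.112} \\
&\leq R(\mathcal{N}) + \delta_1 + \delta_2 \tag{14.2.113} \\
&= R(\mathcal{N}) + \delta' \tag{14.2.114} \\
&< R(\mathcal{N}) + \delta. \tag{14.2.115}
\end{align}
So we have that $R(\mathcal{N}) + \delta > \frac{\log_2 d}{n}$ for all $(n, d, \varepsilon)$ quantum communication protocols
with sufficiently large $n$. Due to this strict inequality, it follows that there cannot
exist an $(n, 2^{n(R(\mathcal{N})+\delta)}, \varepsilon)$ quantum communication protocol for all sufficiently
large $n$ such that (14.2.110) holds, for if it did there would exist a $d$ such that
$\log_2 d = n(R(\mathcal{N}) + \delta)$, which we have just seen is not possible. Since $\varepsilon$ and $\delta$ are
arbitrary, we conclude that for all $\varepsilon \in [0, 1)$, $\delta > 0$, and sufficiently large $n$, there
does not exist an $(n, 2^{n(R(\mathcal{N})+\delta)}, \varepsilon)$ quantum communication protocol. This means
that $R(\mathcal{N})$ is a strong converse rate, and thus that $\widetilde{Q}(\mathcal{N}) \leq R(\mathcal{N})$. ■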

14.2.4.1 The Strong Converse from a Different Point of View

Let us now show that the Rains relative entropy of a quantum channel N is a strong
converse rate according to the definition of a strong converse rate in Appendix A. To
this end, consider a sequence {(𝑛, 2𝑛𝑟 , 𝜀 𝑛 )}𝑛∈N of (𝑛, 𝑑, 𝜀) quantum communication
protocols, with each element of the sequence having an arbitrary (but fixed) rate
𝑟 > 𝑅(N). For each element of the sequence, the inequality in (14.2.105) holds,
which means that
!
𝑑 2𝐴−1
𝛼 (𝑛 + 1)
𝑟≤𝑅 e𝛼 (N) + log2 (14.2.116)
𝑛(𝛼 − 1) 1 − 𝜀𝑛

for all 𝛼 > 1. Rearranging this inequality leads to the following lower bound on the
error probabilities 𝜀 𝑛 :

𝜀 𝑛 ≥ 1 − (𝑛 + 1) 𝑑 𝐴−1 · 2−𝑛 ( )(𝑟− 𝑅e𝛼 (N) ) .

2 𝛼−1
𝛼 (14.2.117)
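Now, since $r > R(\mathcal{N})$, $\lim_{\alpha\to 1^+} \widetilde{R}_\alpha(\mathcal{N}) = R(\mathcal{N})$ (see Appendix 10.A for a proof),
and since the sandwiched Rényi Rains relative entropy is monotonically increasing
in $\alpha$ (see Proposition 7.31), there exists an $\alpha^* > 1$ such that $r > \widetilde{R}_{\alpha^*}(\mathcal{N})$. Applying
the inequality in (14.2.117) to this value of $\alpha$, we find that
\begin{equation}
\varepsilon_n \geq 1 - (n+1)^{d_A^2-1} \cdot 2^{-n\left(\frac{\alpha^*-1}{\alpha^*}\right)\left(r - \widetilde{R}_{\alpha^*}(\mathcal{N})\right)}. \tag{14.2.118}
\end{equation}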
Then, taking the limit $n \to \infty$ on both sides of this inequality, we conclude that
$\lim_{n\to\infty} \varepsilon_n \geq 1$. However, $\varepsilon_n \leq 1$ for all $n$ since $\varepsilon_n$ is a probability by definition. So
we conclude that $\lim_{n\to\infty} \varepsilon_n = 1$. Since the rate $r > R(\mathcal{N})$ is arbitrary, we conclude
that $R(\mathcal{N})$ is a strong converse rate. We also see from (14.2.118) that the sequence
$\{\varepsilon_n\}_{n\in\mathbb{N}}$ approaches one at an exponential rate.
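
To get a feel for how the lower bound in (14.2.118) behaves, the following minimal sketch evaluates its right-hand side for growing n. The function name and all numerical values (input dimension, the order 𝛼∗, the rate 𝑟, and the value standing in for the Rényi Rains information) are hypothetical and chosen only for illustration; they are not taken from any particular channel.

```python
import math

def strong_converse_error_lower_bound(n, d_A, alpha, r, renyi_rains):
    """Right-hand side of (14.2.118), clipped at zero where the bound is trivial.

    renyi_rains is an assumed numerical stand-in for the Renyi Rains
    information of order alpha; it must satisfy renyi_rains < r.
    """
    # Work with base-2 logarithms: (n+1)^(d_A^2 - 1) is too large to
    # evaluate directly as a float for large n.
    log2_poly = (d_A**2 - 1) * math.log2(n + 1)
    log2_decay = -n * ((alpha - 1) / alpha) * (r - renyi_rains)
    exponent = log2_poly + log2_decay
    if exponent >= 0:
        return 0.0  # the bound says nothing better than eps_n >= 0
    return 1.0 - 2.0 ** exponent

if __name__ == "__main__":
    # Hypothetical parameters: qubit input, alpha* = 1.5, rate 0.6, Renyi Rains value 0.5.
    for n in (100, 1000, 10000):
        print(n, strong_converse_error_lower_bound(n, 2, 1.5, 0.6, 0.5))
```

The polynomial prefactor dominates for small n, so the bound is trivial at first; once the exponential term takes over, the bound approaches one quickly.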

14.2.5 Squashed Entanglement Weak Converse Bound

One of the results of Chapter 19 is that the squashed entanglement of a quantum

channel (see Definition 10.14) is a weak converse rate for quantum communication
assisted by LOCC. Since the LOCC-assisted quantum capacity is an upper bound
on the unassisted quantum capacity considered in this chapter, we conclude that the
squashed entanglement of a channel is a weak converse rate for unassisted quantum
communication.
We present some statements along these lines briefly here, indicating that the
squashed entanglement gives an upper bound on the one-shot quantum capacity, the
𝑛-shot quantum capacity, as well as the asymptotic quantum capacity. Complete
proofs of these statements are available in Chapter 19, and they follow from the fact
that the assistance of LOCC can only increase rates of quantum communication.
Propositions 14.23 and 14.24 below are a direct consequence of Theorem 19.4, and
Theorem 14.25 below is a consequence of Theorem 19.15.

Proposition 14.23 One-Shot Squashed Entanglement Upper Bound

Let N 𝐴→𝐵 be a quantum channel, and let 𝜀 ∈ [0, 1). For all (𝑑, 𝜀) quantum
communication protocols over the channel N 𝐴→𝐵 , the following bound holds
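\begin{equation}
\log_2 d \leq \frac{1}{1-\sqrt{\varepsilon}} \left( E_{\mathrm{sq}}(\mathcal{N}) + g_2(\sqrt{\varepsilon}) \right), \tag{14.2.119}
\end{equation}

where $E_{\mathrm{sq}}(\mathcal{N})$ is the squashed entanglement of the channel $\mathcal{N}$ (see Definition 10.14)
and $g_2(\delta) := (\delta + 1) \log_2(\delta + 1) - \delta \log_2 \delta$. Consequently, for the
one-shot quantum capacity of $\mathcal{N}$,
\begin{equation}
Q^{\varepsilon}(\mathcal{N}) \leq \frac{1}{1-\sqrt{\varepsilon}} \left( E_{\mathrm{sq}}(\mathcal{N}) + g_2(\sqrt{\varepsilon}) \right). \tag{14.2.120}
\end{equation}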


Proposition 14.24 𝒏-Shot Squashed Entanglement Upper Bound

Let N 𝐴→𝐵 be a quantum channel, and let 𝜀 ∈ [0, 1). For all (𝑛, 𝑑, 𝜀) quantum
communication protocols over the channel N 𝐴→𝐵 , the following bound holds
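\begin{equation}
\frac{1}{n} \log_2 d \leq \frac{1}{1-\sqrt{\varepsilon}} \left( E_{\mathrm{sq}}(\mathcal{N}) + \frac{g_2(\sqrt{\varepsilon})}{n} \right). \tag{14.2.121}
\end{equation}

Consequently, for the $n$-shot quantum capacity of $\mathcal{N}$,

\begin{equation}
Q^{n,\varepsilon}(\mathcal{N}) \leq \frac{1}{1-\sqrt{\varepsilon}} \left( E_{\mathrm{sq}}(\mathcal{N}) + \frac{g_2(\sqrt{\varepsilon})}{n} \right). \tag{14.2.122}
\end{equation}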

Theorem 14.25 Weak Converse Upper Bound on Quantum Capacity

The squashed entanglement 𝐸 sq (N) of a quantum channel N 𝐴→𝐵 is a weak
converse rate for quantum communication over N. In other words, 𝑄(N) ≤
𝐸 sq (N) for every quantum channel N.
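
The way the n-shot bound of Proposition 14.24 approaches the asymptotic statement can be sketched numerically as follows. The function names are our own, and the value passed for the squashed entanglement is an assumed input rather than something computed from a channel.

```python
import math

def g2(delta):
    """g_2(delta) = (delta + 1) log2(delta + 1) - delta log2(delta), with g_2(0) = 0."""
    if delta == 0:
        return 0.0
    return (delta + 1) * math.log2(delta + 1) - delta * math.log2(delta)

def n_shot_rate_bound(e_sq, eps, n):
    """Upper bound on (1/n) log2(d) from the n-shot squashed entanglement bound.

    e_sq is an assumed value of the squashed entanglement of the channel;
    eps must lie in [0, 1).
    """
    s = math.sqrt(eps)
    return (e_sq + g2(s) / n) / (1 - s)

if __name__ == "__main__":
    # Hypothetical channel with E_sq = 0.5 and error tolerance eps = 0.01.
    for n in (1, 10, 100, 1000):
        print(n, n_shot_rate_bound(0.5, 0.01, n))
```

As n grows, the correction term vanishes and the bound tends to E_sq/(1 − √ε); the ε-dependent prefactor does not disappear, which is consistent with the bound yielding only a weak converse.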

14.3 Examples
We now consider the quantum capacity for particular classes of quantum channels.
As remarked earlier, computing the quantum capacity of an arbitrary channel is a
difficult task. This task is made more difficult by the fact that, in some cases, the
coherent information is known to be strictly superadditive, meaning that

𝐼 𝑐 (N ⊗𝑛 ) > 𝑛𝐼 𝑐 (N). (14.3.1)

This fact confirms that regularization of the coherent information really is needed
in general in order to compute the quantum capacity, and that additivity of coherent
information is simply not true for all channels. Another interesting phenomenon
related to quantum capacity is superactivation, which is when two channels N1
and N2 , each with zero quantum capacity, i.e., 𝑄(N1 ) = 𝑄(N2 ) = 0, can combine
to have non-zero quantum capacity, i.e., 𝑄(N1 ⊗ N2 ) > 0. Please consult the
Bibliographic Notes in Section 14.5 for more information about strict superadditivity
and superactivation.


In this section, we show that coherent information is additive for all degradable
channels, which means that regularization is not needed in order to compute their
capacities. The same turns out to be true for generalized dephasing channels, and
we prove this by showing that the Rains relative entropy of those channels coincides
with their coherent information. We also show that anti-degradable channels have
zero quantum capacity. Finally, we evaluate the upper and lower bounds established
in this chapter for the generalized amplitude damping channel.
Before starting, let us first recall the definition of coherent information of a
channel:
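\begin{equation}
I^c(\mathcal{N}) = \sup_{\psi_{RA}} I(R\rangle B)_\omega = \sup_{\rho} \{ H(\mathcal{N}(\rho)) - H(\mathcal{N}^c(\rho)) \}, \tag{14.3.2}
\end{equation}

where $\omega_{RB} = \mathcal{N}_{A\to B}(\psi_{RA})$ and the second equality is explained in (7.11.111)–(7.11.113). We let
\begin{equation}
I^c(\rho, \mathcal{N}) := H(\mathcal{N}(\rho)) - H(\mathcal{N}^c(\rho)). \tag{14.3.3}
\end{equation}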

14.3.1 Degradable Channels

Recall from Definition 4.6 that a channel N 𝐴→𝐵 is degradable if there exists a
channel D𝐵→𝐸 such that
N𝑐 = D ◦ N, (14.3.4)
where $\mathcal{N}^c$ is a channel complementary to $\mathcal{N}$ (see Definition 4.5) and $d_E \geq \operatorname{rank}(\Gamma^{\mathcal{N}}_{AB})$.
In particular, if 𝑉𝐴→𝐵𝐸 is an isometric extension of N, so that
N(𝜌) = Tr𝐸 [𝑉 𝜌𝑉 † ] (14.3.5)
for every state 𝜌, then
N𝑐 (𝜌) = Tr 𝐵 [𝑉 𝜌𝑉 † ]. (14.3.6)

We now show that the coherent information is additive for degradable quantum
channels, meaning that
𝐼 𝑐 (N ⊗ M) = 𝐼 𝑐 (N) + 𝐼 𝑐 (M) (14.3.7)
for all degradable quantum channels N and M. Consequently, regularization is
unnecessary, and we conclude that the quantum capacity of a degradable channel is
equal to its coherent information:
𝑄(N) = 𝐼 𝑐 (N) for every degradable channel N. (14.3.8)


Proposition 14.26 Additivity of Coherent Information for Degradable Channels

Let N and M be degradable channels. Then, the coherent information is
additive, i.e.,
𝐼 𝑐 (N ⊗ M) = 𝐼 𝑐 (N) + 𝐼 𝑐 (M). (14.3.9)

Proof: As shown in Section 14.2.3, we always have superadditivity of coherent

information, so that 𝐼 𝑐 (N ⊗ M) ≥ 𝐼 𝑐 (N) + 𝐼 𝑐 (M). So we prove that the reverse
inequality also holds for the case of degradable channels.
Let D1 and D2 be the degrading channels for N and M, respectively, meaning
that
N𝑐 = D1 ◦ N, M𝑐 = D2 ◦ M. (14.3.10)

Now, let 𝜌 𝐴1 𝐴2 be an arbitrary state on the input systems 𝐴1 and 𝐴2 of the

channels N and M, respectively. Using (7.11.103) and (7.11.104), along with the
fact that (N ⊗ M) 𝑐 = N𝑐 ⊗ M𝑐 , we find that

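\begin{align}
& H(\mathcal{N}^c(\rho_{A_1})) + H(\mathcal{M}^c(\rho_{A_2})) - H((\mathcal{N} \otimes \mathcal{M})^c(\rho_{A_1A_2})) \tag{14.3.11}\\
&\quad = D((\mathcal{N} \otimes \mathcal{M})^c(\rho_{A_1A_2}) \,\|\, \mathcal{N}^c(\rho_{A_1}) \otimes \mathcal{M}^c(\rho_{A_2})) \tag{14.3.12}\\
&\quad = D((\mathcal{D}_1 \circ \mathcal{N} \otimes \mathcal{D}_2 \circ \mathcal{M})(\rho_{A_1A_2}) \,\|\, (\mathcal{D}_1 \circ \mathcal{N})(\rho_{A_1}) \otimes (\mathcal{D}_2 \circ \mathcal{M})(\rho_{A_2})) \tag{14.3.13}\\
&\quad \leq D((\mathcal{N} \otimes \mathcal{M})(\rho_{A_1A_2}) \,\|\, \mathcal{N}(\rho_{A_1}) \otimes \mathcal{M}(\rho_{A_2})) \tag{14.3.14}\\
&\quad = H(\mathcal{N}(\rho_{A_1})) + H(\mathcal{M}(\rho_{A_2})) - H((\mathcal{N} \otimes \mathcal{M})(\rho_{A_1A_2})), \tag{14.3.15}
\end{align}

where the second equality follows from (14.3.10), the inequality follows from the
data-processing inequality for quantum relative entropy, and the last equality from
(7.11.103) and (7.11.104). Rearranging this inequality and applying subadditivity
of the entropy $H((\mathcal{N} \otimes \mathcal{M})(\rho_{A_1A_2}))$ gives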

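\begin{align}
& H((\mathcal{N} \otimes \mathcal{M})(\rho_{A_1A_2})) - H((\mathcal{N} \otimes \mathcal{M})^c(\rho_{A_1A_2})) \tag{14.3.16}\\
&\quad \leq H(\mathcal{N}(\rho_{A_1})) - H(\mathcal{N}^c(\rho_{A_1})) + H(\mathcal{M}(\rho_{A_2})) - H(\mathcal{M}^c(\rho_{A_2})) \tag{14.3.17}\\
&\quad \leq \sup_{\rho_{A_1}} \{ H(\mathcal{N}(\rho_{A_1})) - H(\mathcal{N}^c(\rho_{A_1})) \} \tag{14.3.18}\\
&\qquad + \sup_{\rho_{A_2}} \{ H(\mathcal{M}(\rho_{A_2})) - H(\mathcal{M}^c(\rho_{A_2})) \} \tag{14.3.19}\\
&\quad = I^c(\mathcal{N}) + I^c(\mathcal{M}). \tag{14.3.20}
\end{align}

Since the state $\rho_{A_1A_2}$ is arbitrary, we conclude that

\begin{align}
I^c(\mathcal{N} \otimes \mathcal{M}) &= \sup_{\rho_{A_1A_2}} \{ H((\mathcal{N} \otimes \mathcal{M})(\rho_{A_1A_2})) - H((\mathcal{N} \otimes \mathcal{M})^c(\rho_{A_1A_2})) \} \tag{14.3.21}\\
&\leq I^c(\mathcal{N}) + I^c(\mathcal{M}), \tag{14.3.22}
\end{align}

as required. ■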

Another useful fact about a degradable channel $\mathcal{N}$ is that the coherent information
$I^c(\rho, \mathcal{N})$ defined in (14.3.3) is concave in the input state $\rho$.

Lemma 14.27
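For a degradable channel $\mathcal{N}$, the function $\rho \mapsto I^c(\rho, \mathcal{N})$ is concave in the input
state $\rho$. In other words, for every finite alphabet $\mathcal{X}$, probability distribution
$p : \mathcal{X} \to [0, 1]$, and set $\{\rho^x_A\}_{x\in\mathcal{X}}$ of states,
\begin{equation}
I^c\!\left( \sum_{x\in\mathcal{X}} p(x) \rho^x_A, \mathcal{N} \right) \geq \sum_{x\in\mathcal{X}} p(x) I^c(\rho^x_A, \mathcal{N}). \tag{14.3.23}
\end{equation}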

Proof: Let D be a degrading channel corresponding to N, so that N𝑐 = D ◦ N.

Next, define the following states:
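\begin{align}
\omega_{XB} &:= \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \mathcal{N}_{A\to B}(\rho^x_A), \tag{14.3.24}\\
\tau_{XE} &:= \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \mathcal{N}^c(\rho^x_A) = \mathcal{D}_{B\to E}(\omega_{XB}). \tag{14.3.25}
\end{align}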

Then, by noting that 𝜏𝑋 𝐸 = D𝐵→𝐸 (𝜔 𝑋 𝐵 ) and applying the data-processing inequality

for quantum mutual information (see (7.2.202)), we obtain

𝐼 (𝑋; 𝐸)𝜏 ≤ 𝐼 (𝑋; 𝐵)𝜔 . (14.3.26)

Then, using (7.1.10) and rearranging leads to

𝐻 (𝐵)𝜔 − 𝐻 (𝐸)𝜏 ≥ 𝐻 (𝐵|𝑋)𝜔 − 𝐻 (𝐸 |𝑋)𝜏 , (14.3.27)

which is the desired inequality in (14.3.23). Indeed, the left-hand side of the
inequality above is simply $I^c\!\left( \sum_{x\in\mathcal{X}} p(x) \rho^x_A, \mathcal{N} \right)$. For the right-hand side, we find
that
\begin{align}
H(B|X)_\omega &= \sum_{x\in\mathcal{X}} p(x) H(\mathcal{N}(\rho^x_A)), \tag{14.3.28}\\
H(E|X)_\tau &= \sum_{x\in\mathcal{X}} p(x) H(\mathcal{N}^c(\rho^x_A)), \tag{14.3.29}
\end{align}

because $\omega_{XB}$ and $\tau_{XE}$ are classical-quantum states. Therefore, the right-hand side
of (14.3.27) is equal to $\sum_{x\in\mathcal{X}} p(x) I^c(\rho^x_A, \mathcal{N})$. ■
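
As a numerical sanity check of the concavity statement, the sketch below evaluates both sides of (14.3.23) for the amplitude damping channel with a damping parameter of at most 1/2, which is a degradable channel as discussed later in Section 14.3.3. The helper names, the use of Kraus operators to build the complementary channel, and the random test states are our own choices for illustration.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def amplitude_damping_kraus(gamma):
    """Kraus operators of the amplitude damping channel with parameter gamma."""
    return [np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - gamma)]]),
            np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]

def coherent_information(rho, kraus):
    """I^c(rho, N) = H(N(rho)) - H(N^c(rho)) for N given by Kraus operators."""
    out_B = sum(K @ rho @ K.conj().T for K in kraus)
    # With V = sum_i K_i (x) |i>_E, the complementary output has entries Tr[K_i rho K_j^dagger].
    out_E = np.array([[np.trace(Ki @ rho @ Kj.conj().T) for Kj in kraus]
                      for Ki in kraus])
    return entropy(out_B) - entropy(out_E)

def random_state(rng, d=2):
    """A random density matrix (illustrative sampling, not a specific measure)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def concavity_gap(rho1, rho2, p, kraus):
    """Left-hand side minus right-hand side of (14.3.23) for a two-element ensemble."""
    mix = p * rho1 + (1 - p) * rho2
    avg = p * coherent_information(rho1, kraus) + (1 - p) * coherent_information(rho2, kraus)
    return coherent_information(mix, kraus) - avg

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    kraus = amplitude_damping_kraus(0.3)
    gaps = [concavity_gap(random_state(rng), random_state(rng), rng.uniform(), kraus)
            for _ in range(5)]
    print(gaps)  # each entry should be nonnegative up to rounding
```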

14.3.1.1 Generalized Dephasing Channels

While additivity of coherent information for degradable channels allows for a
tractable formula for their quantum capacity, the question about the strong converse
quantum capacity $\widetilde{Q}(\mathcal{N})$ remains. In other words, is it the case that $\widetilde{Q}(\mathcal{N}) = I^c(\mathcal{N})$
for all degradable quantum channels $\mathcal{N}$? We answer this question here for a
particular class of degradable channels.
We consider the class of degradable channels N called generalized dephasing
channels. Such channels are defined by the following isometric extension:
\begin{equation}
V^{\mathcal{N}}_{A\to BE} = \sum_{i=0}^{d-1} |i\rangle_B \langle i|_A \otimes |\psi_i\rangle_E, \tag{14.3.30}
\end{equation}

where $d \geq 1$ and where the state vectors $\{|\psi_i\rangle\}_{i=0}^{d-1}$ are arbitrary (not necessarily
orthonormal). Recalling the discussion in Section 4.4.7 on Hadamard channels,
in particular (4.4.102), we see that generalized dephasing channels are Hadamard
channels, as in (4.4.95), with $V$ therein set to $\mathbb{1}$.
For a state 𝜌 𝐴 , we have that
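\begin{equation}
\mathcal{N}_{A\to B}(\rho_A) = \operatorname{Tr}_E[V^{\mathcal{N}}_{A\to BE} \rho_A V^{\mathcal{N}\dagger}_{A\to BE}] = \sum_{i,j=0}^{d-1} \langle i|\rho_A|j\rangle \langle\psi_j|\psi_i\rangle |i\rangle\langle j|_B, \tag{14.3.31}
\end{equation}

and
\begin{equation}
\mathcal{N}^c_{A\to E}(\rho_A) = \operatorname{Tr}_B[V^{\mathcal{N}}_{A\to BE} \rho_A V^{\mathcal{N}\dagger}_{A\to BE}] = \sum_{i=0}^{d-1} \langle i|\rho_A|i\rangle |\psi_i\rangle\langle\psi_i|_E. \tag{14.3.32}
\end{equation}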


Then, it is straightforward to see that

N𝑐 ◦ N(𝜌) = N𝑐 (𝜌) (14.3.33)

for every state 𝜌. This implies that generalized dephasing channels N are degradable,
with N𝑐 being the degrading channel.
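
The degradability relation in (14.3.33) can be checked numerically by building the isometry in (14.3.30) explicitly and taking partial traces. The sketch below does this for randomly chosen vectors |𝜓𝑖⟩; the function names and the random test data are our own and serve only as an illustration.

```python
import numpy as np

def isometry(psis):
    """V = sum_i |i>_B <i|_A (x) |psi_i>_E, with rows of psis being the vectors |psi_i>."""
    d, d_E = psis.shape
    V = np.zeros((d * d_E, d), dtype=complex)
    for i in range(d):
        e_i = np.zeros(d)
        e_i[i] = 1.0
        V[:, i] = np.kron(e_i, psis[i])  # column i is |i>_B (x) |psi_i>_E
    return V

def channel_and_complement(rho, psis):
    """Return (N(rho), N^c(rho)) by tracing out E or B from V rho V^dagger."""
    d, d_E = psis.shape
    V = isometry(psis)
    out = (V @ rho @ V.conj().T).reshape(d, d_E, d, d_E)
    to_B = np.einsum('aebe->ab', out)  # partial trace over E
    to_E = np.einsum('aeaf->ef', out)  # partial trace over B
    return to_B, to_E

def random_unit_vectors(rng, d, d_E):
    """d random normalized vectors in dimension d_E (not necessarily orthogonal)."""
    psis = rng.normal(size=(d, d_E)) + 1j * rng.normal(size=(d, d_E))
    return psis / np.linalg.norm(psis, axis=1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    psis = random_unit_vectors(rng, 3, 4)
    G = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    rho = G @ G.conj().T
    rho /= np.trace(rho).real
    N_rho, Nc_rho = channel_and_complement(rho, psis)
    # N^c applied to N(rho) should reproduce N^c(rho).
    print(np.allclose(channel_and_complement(N_rho, psis)[1], Nc_rho))
```

The check works because the channel leaves the diagonal of the input untouched, and the complementary channel depends only on that diagonal.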
We now show that $\widetilde{Q}(\mathcal{N}) = I^c(\mathcal{N})$ for every generalized dephasing channel $\mathcal{N}$.
We do this by showing that the Rains information 𝑅(N) of a generalized dephasing
channel is equal to its coherent information.

Theorem 14.28 Quantum Capacity of Generalized Dephasing Channels

For every generalized dephasing channel N (defined by the isometric extension
in (14.3.30)), the following equalities hold

\begin{equation}
Q(\mathcal{N}) = \widetilde{Q}(\mathcal{N}) = R(\mathcal{N}) = I^c(\mathcal{N}), \tag{14.3.34}
\end{equation}

which establish the coherent information as the quantum capacity and strong
converse quantum capacity.

Proof: It suffices to show that 𝑅(N) = 𝐼 𝑐 (N). Note that the inequality 𝐼 𝑐 (N) ≤
𝑅(N) holds for every quantum channel N by combining the result of Theorem 14.16
with the result of Theorem 14.22. We now show that the reverse inequality holds
for all generalized dephasing channels.
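We start by observing that every generalized dephasing channel $\mathcal{N}$ is covariant
with respect to the operators $\{Z(j)\}_{j=0}^{d-1}$ defined in (3.2.49):

\begin{equation}
\mathcal{N}(Z(j) \rho Z(j)^\dagger) = Z(j) \mathcal{N}(\rho) Z(j)^\dagger \tag{14.3.35}
\end{equation}

for every state $\rho$ and for all $0 \leq j \leq d - 1$, where

\begin{equation}
Z(j) = \sum_{k=0}^{d-1} \mathrm{e}^{\frac{2\pi \mathrm{i} k j}{d}} |k\rangle\langle k|. \tag{14.3.36}
\end{equation}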

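Then, for every state $\rho$, the average state

\begin{equation}
\overline{\rho} := \frac{1}{d} \sum_{j=0}^{d-1} Z(j) \rho Z(j)^\dagger = \sum_{k=0}^{d-1} \langle k|\rho|k\rangle |k\rangle\langle k| \tag{14.3.37}
\end{equation}

is diagonal in the basis $\{|i\rangle\}_{i=0}^{d-1}$. Since the quantities $\langle k|\rho|k\rangle$ are probabilities, we
conclude that for every state $\rho$, its corresponding average state $\overline{\rho}$ has a purification
of the following form:
\begin{equation}
|\phi^{\overline{\rho}}\rangle_{RA} = \sum_{i=0}^{d-1} \sqrt{p(i)} |i\rangle_R \otimes |i\rangle_A =: |\psi^p\rangle_{RA}, \tag{14.3.38}
\end{equation}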

where $p : \{0, 1, \ldots, d - 1\} \to [0, 1]$ is a probability distribution. Therefore, by
Proposition 10.12, when calculating the Rains information $R(\mathcal{N})$, it suffices to
optimize over the pure states $\psi^p_{RA}$:
\begin{equation}
R(\mathcal{N}) = \sup_{\psi^p_{RA}} \inf_{\sigma_{RB} \in \mathrm{PPT}'(R:B)} D(\mathcal{N}_{A\to B}(\psi^p_{RA}) \,\|\, \sigma_{RB}). \tag{14.3.39}
\end{equation}

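Now, restricting the optimization in the definition of the coherent information
$I^c(\mathcal{N})$ to the pure states in (14.3.38), we obtain

\begin{align}
I^c(\mathcal{N}) &= \sup_{\psi_{RA}} \inf_{\sigma_B} D(\mathcal{N}_{A\to B}(\psi_{RA}) \,\|\, \mathbb{1}_R \otimes \sigma_B) \tag{14.3.40}\\
&\geq \sup_{\psi^p_{RA}} \inf_{\sigma_B} D(\mathcal{N}_{A\to B}(\psi^p_{RA}) \,\|\, \mathbb{1}_R \otimes \sigma_B) \tag{14.3.41}\\
&\geq \sup_{\psi^p_{RA}} \inf_{\sigma_B} D(\Delta(\mathcal{N}_{A\to B}(\psi^p_{RA})) \,\|\, \Delta(\mathbb{1}_R \otimes \sigma_B)), \tag{14.3.42}
\end{align}

where the last line follows from the data-processing inequality for quantum relative
entropy, and we introduced the following channel:
\begin{equation}
\Delta(\rho) := \Pi \rho \Pi + (\mathbb{1} - \Pi) \rho (\mathbb{1} - \Pi), \qquad \Pi := \sum_{i=0}^{d-1} |i\rangle\langle i|_R \otimes |i\rangle\langle i|_B. \tag{14.3.43}
\end{equation}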

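Now, it is straightforward to check that

\begin{equation}
\Pi \mathcal{N}_{A\to B}(\psi^p_{RA}) \Pi = \mathcal{N}_{A\to B}(\psi^p_{RA}), \tag{14.3.44}
\end{equation}

which implies that

\begin{equation}
\Delta(\mathcal{N}_{A\to B}(\psi^p_{RA})) = \Pi \mathcal{N}_{A\to B}(\psi^p_{RA}) \Pi. \tag{14.3.45}
\end{equation}

Therefore, because $\Pi$ and $\mathbb{1} - \Pi$ project onto orthogonal subspaces, we obtain

\begin{align}
& D(\Delta(\mathcal{N}_{A\to B}(\psi^p_{RA})) \,\|\, \Delta(\mathbb{1}_R \otimes \sigma_B)) \notag\\
&\quad = D(\Pi \mathcal{N}_{A\to B}(\psi^p_{RA}) \Pi \,\|\, \Pi (\mathbb{1}_R \otimes \sigma_B) \Pi) \tag{14.3.46}\\
&\quad = D\!\left( \Pi \mathcal{N}_{A\to B}(\psi^p_{RA}) \Pi \,\Big\|\, \sum_{i=0}^{d-1} q(i) |i\rangle\langle i|_R \otimes |i\rangle\langle i|_B \right), \tag{14.3.47}
\end{align}

where the last line follows because

\begin{equation}
\Pi (\mathbb{1}_R \otimes \sigma_B) \Pi = \sum_{i=0}^{d-1} q(i) |i\rangle\langle i|_R \otimes |i\rangle\langle i|_B, \tag{14.3.48}
\end{equation}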

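with the probability distribution $q(i) := \langle i|\sigma_B|i\rangle$. Note that the right-hand side of
the equation above is a state in $\mathrm{PPT}'(R : B)$. Therefore, we have

\begin{align}
I^c(\mathcal{N}) &\geq \sup_{\psi^p_{RA}} \inf_{\sigma_B} D(\Delta(\mathcal{N}_{A\to B}(\psi^p_{RA})) \,\|\, \Delta(\mathbb{1}_R \otimes \sigma_B)) \tag{14.3.49}\\
&= \sup_{\psi^p_{RA}} \inf_{\sigma_B} D(\Pi \mathcal{N}_{A\to B}(\psi^p_{RA}) \Pi \,\|\, \Pi (\mathbb{1}_R \otimes \sigma_B) \Pi) \tag{14.3.50}\\
&= \sup_{\psi^p_{RA}} \inf_{\sigma_B} D\!\left( \mathcal{N}_{A\to B}(\psi^p_{RA}) \,\Big\|\, \sum_{i=0}^{d-1} \langle i|\sigma_B|i\rangle |i\rangle\langle i|_R \otimes |i\rangle\langle i|_B \right) \tag{14.3.51}\\
&\geq \sup_{\psi^p_{RA}} \inf_{\sigma_{RB} \in \mathrm{PPT}'(R:B)} D(\mathcal{N}_{A\to B}(\psi^p_{RA}) \,\|\, \sigma_{RB}) \tag{14.3.52}\\
&= R(\mathcal{N}), \tag{14.3.53}
\end{align}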

completing the proof. ■

14.3.2 Anti-Degradable Channels

Let us now consider anti-degradable channels. Recall from Definition 4.6 that a
channel N 𝐴→𝐵 is anti-degradable if there exists an anti-degrading channel A𝐸→𝐵
such that
N = A ◦ N𝑐 , (14.3.54)
where $\mathcal{N}^c$ is a channel complementary to $\mathcal{N}$ and $d_E \geq \operatorname{rank}(\Gamma^{\mathcal{N}}_{AB})$.


Proposition 14.29 Coherent Information for Anti-Degradable Channels

The coherent information vanishes for all anti-degradable channels, i.e., 𝐼 𝑐 (N) =
0 for every anti-degradable channel N. Therefore, 𝑄(N) = 0 for all anti-
degradable channels.

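Proof: Let $\mathcal{N}_{A\to B}$ have the following Stinespring representation: $\mathcal{N}(\rho_A) =
\operatorname{Tr}_E[V_{A\to BE} \rho_A V^\dagger_{A\to BE}]$. Then, for every pure state $\psi_{RA}$, the state vector
\begin{equation}
|\phi\rangle_{RBE} := V_{A\to BE} |\psi\rangle_{RA} \tag{14.3.55}
\end{equation}
is such that
\begin{equation}
I(R\rangle B)_\phi = \frac{1}{2} \left[ I(R; B)_\phi - I(R; E)_\phi \right]. \tag{14.3.56}
\end{equation}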
To see this, let us first note that 𝐻 (𝐸) 𝜙 = 𝐻 (𝑅𝐵) 𝜙 and 𝐻 (𝑅𝐸) 𝜙 = 𝐻 (𝐵) 𝜙 . These
identities hold because 𝜙 𝑅𝐵𝐸 is a pure state, implying that the reduced states 𝜙 𝐸
and 𝜙 𝑅𝐵 have the same spectrum and the reduced states 𝜙 𝑅𝐸 and 𝜙 𝐵 have the same
spectrum. This, along with (7.11.103), leads to
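\begin{align}
& \frac{1}{2} \left[ I(R; B)_\phi - I(R; E)_\phi \right] \tag{14.3.57}\\
&\quad = \frac{1}{2} \left[ H(R)_\phi + H(B)_\phi - H(RB)_\phi - H(R)_\phi - H(E)_\phi + H(RE)_\phi \right] \tag{14.3.58}\\
&\quad = H(B)_\phi - H(RB)_\phi \tag{14.3.59}\\
&\quad = I(R\rangle B)_\phi. \tag{14.3.60}
\end{align}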
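Next, using (7.11.104), noting that $\phi_{RB} = \mathcal{N}_{A\to B}(\psi_{RA}) =: \omega_{RB}$, and using the fact
that $\mathcal{N}$ is anti-degradable, so that $\mathcal{N} = \mathcal{A} \circ \mathcal{N}^c$, we have
\begin{equation}
I(R; B)_\phi \leq I(R; E)_\phi, \tag{14.3.61}
\end{equation}
where the inequality follows from the data-processing inequality for mutual
information under local channels (see (7.2.202)) and the facts that $\mathcal{N}_{A\to B}(\psi_{RA}) =
\mathcal{A}_{E\to B} \circ \mathcal{N}^c_{A\to E}(\psi_{RA})$ and the reduced state $\phi_{RE} = \mathcal{N}^c_{A\to E}(\psi_{RA})$. Therefore, from
(14.3.56) we conclude that
\begin{equation}
I(R\rangle B)_\omega \leq 0 \tag{14.3.62}
\end{equation}
for every pure state $\psi_{RA}$. Since the value zero is attained by every product
input state $\psi_{RA} = \phi_R \otimes \varphi_A$, this implies that
\begin{equation}
I^c(\mathcal{N}) = \sup_{\psi_{RA}} I(R\rangle B)_\omega = 0, \tag{14.3.63}
\end{equation}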

as required. ■
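
The identity in (14.3.56) holds for every pure tripartite state, and this is easy to confirm numerically. The sketch below draws a random pure state on systems R, B, and E and compares both sides; the helper names and the chosen dimensions are hypothetical.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def reduced_state(psi, dims, keep):
    """Reduced density matrix of the pure state psi on the subsystems listed in keep."""
    t = psi.reshape(dims)
    drop = [k for k in range(len(dims)) if k not in keep]
    # Contract the dropped subsystems between the ket and the bra.
    rho = np.tensordot(t, t.conj(), axes=(drop, drop))
    dim = int(np.prod([dims[k] for k in keep]))
    return rho.reshape(dim, dim)

def both_sides(psi, dims):
    """Return (I(R>B), (I(R;B) - I(R;E)) / 2) for a pure state on R, B, E (in that order)."""
    H = lambda keep: entropy(reduced_state(psi, dims, keep))
    H_R, H_B, H_E = H((0,)), H((1,)), H((2,))
    H_RB, H_RE = H((0, 1)), H((0, 2))
    coherent = H_B - H_RB
    half_difference = 0.5 * ((H_R + H_B - H_RB) - (H_R + H_E - H_RE))
    return coherent, half_difference

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dims = (2, 3, 4)  # hypothetical dimensions of R, B, E
    psi = rng.normal(size=24) + 1j * rng.normal(size=24)
    psi /= np.linalg.norm(psi)
    print(both_sides(psi, dims))  # the two numbers should agree
```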

14.3.3 Generalized Amplitude Damping Channel

Let us recall the definition of the generalized amplitude damping channel (GADC)
from (4.5.10):

A𝛾,𝑁 (𝜌) = 𝐴1 𝜌 𝐴1† + 𝐴2 𝜌 𝐴2† + 𝐴3 𝜌 𝐴3† + 𝐴4 𝜌 𝐴4† , (14.3.64)

where
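\begin{align}
A_1 &= \sqrt{1-N} \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-\gamma} \end{pmatrix}, &
A_2 &= \sqrt{1-N} \begin{pmatrix} 0 & \sqrt{\gamma} \\ 0 & 0 \end{pmatrix}, \tag{14.3.65}\\
A_3 &= \sqrt{N} \begin{pmatrix} \sqrt{1-\gamma} & 0 \\ 0 & 1 \end{pmatrix}, &
A_4 &= \sqrt{N} \begin{pmatrix} 0 & 0 \\ \sqrt{\gamma} & 0 \end{pmatrix}, \tag{14.3.66}
\end{align}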

and 𝛾, 𝑁 ∈ [0, 1]. It is straightforward to show that

A𝛾,𝑁 (𝜌) = 𝑋A𝛾,1−𝑁 (𝑋 𝜌𝑋) 𝑋 (14.3.67)

for every state 𝜌 and all 𝛾, 𝑁 ∈ [0, 1]. In other words, the GADC A𝛾,𝑁 is related
to the GADC A𝛾,1−𝑁 via a simple pre- and post-processing by the Pauli unitary
𝑋 = |0⟩⟨1| + |1⟩⟨0|. The information-theoretic aspects of the GADC are thus
invariant under the interchange 𝑁 ↔ 1 − 𝑁, which means that we can, without loss
of generality, restrict the parameter 𝑁 to the interval [0, 1/2].
For 𝑁 = 0, the GADC reduces to the amplitude damping channel A𝛾 defined in
(4.5.1), which is degradable. Indeed, we first note that

A𝑐𝛾,0 = A1−𝛾,0 , (14.3.68)

where the complementary channel A𝑐𝛾,0 (recall Definition 4.5) is defined via the
following isometric extension:

\begin{equation}
V^{\gamma,N} := A_1 \otimes |0\rangle + A_2 \otimes |1\rangle + A_3 \otimes |2\rangle + A_4 \otimes |3\rangle. \tag{14.3.69}
\end{equation}

We now use the fact that, for all 𝛾1 , 𝛾2 , 𝑁1 , 𝑁2 ∈ [0, 1],

A𝛾,𝑁 = A𝛾2 ,𝑁2 ◦ A𝛾1 ,𝑁1 , (14.3.70)

where $\gamma = \gamma_1 + \gamma_2 - \gamma_1\gamma_2$ and $N = \frac{\gamma_1(1-\gamma_2)N_1 + \gamma_2 N_2}{\gamma_1 + \gamma_2 - \gamma_1\gamma_2}$. From this fact, it follows that
the defining condition for degradability, namely, $\mathcal{D}_{\gamma,0} \circ \mathcal{A}_{\gamma,0} = \mathcal{A}^c_{\gamma,0} = \mathcal{A}_{1-\gamma,0}$,
is satisfied by the quantum channel $\mathcal{D}_{\gamma,0} := \mathcal{A}_{\frac{1-2\gamma}{1-\gamma},0}$. It can be shown that for
$N > 0$, the GADC $\mathcal{A}_{\gamma,N}$ is not degradable for all $\gamma \in (0, 1]$ (please consult the
Bibliographic Notes in Section 14.5).
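
The composition rule in (14.3.70) and the resulting degrading channel can be verified numerically from the Kraus operators in (14.3.65)–(14.3.66). In the sketch below each channel is represented by its 4 × 4 superoperator matrix acting on row-major vectorized density matrices, so that channel composition becomes matrix multiplication. The function names and the test parameters are our own.

```python
import numpy as np

def gadc_kraus(gamma, N):
    """Kraus operators A_1, ..., A_4 of the GADC, following (14.3.65)-(14.3.66)."""
    A1 = np.sqrt(1 - N) * np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]])
    A2 = np.sqrt(1 - N) * np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    A3 = np.sqrt(N) * np.array([[np.sqrt(1 - gamma), 0.0], [0.0, 1.0]])
    A4 = np.sqrt(N) * np.array([[0.0, 0.0], [np.sqrt(gamma), 0.0]])
    return [A1, A2, A3, A4]

def superoperator(gamma, N):
    """Matrix S with vec(A(rho)) = S vec(rho) for row-major vectorization."""
    # For row-major vec, vec(K rho K^dagger) = (K (x) conj(K)) vec(rho).
    return sum(np.kron(K, K.conj()) for K in gadc_kraus(gamma, N))

def compose_parameters(g1, N1, g2, N2):
    """Parameters (gamma, N) of A_{g2,N2} o A_{g1,N1}; assumes the resulting gamma is positive."""
    g = g1 + g2 - g1 * g2
    N = (g1 * (1 - g2) * N1 + g2 * N2) / g
    return g, N

if __name__ == "__main__":
    g1, N1, g2, N2 = 0.3, 0.2, 0.4, 0.7  # hypothetical test values
    g, N = compose_parameters(g1, N1, g2, N2)
    print(np.allclose(superoperator(g, N), superoperator(g2, N2) @ superoperator(g1, N1)))

    # Degrading channel for the amplitude damping channel with gamma <= 1/2.
    gamma = 0.3
    degrading = superoperator((1 - 2 * gamma) / (1 - gamma), 0.0)
    print(np.allclose(degrading @ superoperator(gamma, 0.0), superoperator(1 - gamma, 0.0)))
```

Note that the degrading parameter (1 − 2γ)/(1 − γ) is a valid damping parameter only when γ ≤ 1/2.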
Since A𝛾,0 is degradable, its coherent information is additive, which means that
its quantum capacity is equal to its coherent information, i.e.,
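\begin{align}
Q(\mathcal{A}_{\gamma,0}) = I^c(\mathcal{A}_{\gamma,0}) &= \sup_{\rho} \left\{ H(\mathcal{A}_{\gamma,0}(\rho)) - H(\mathcal{A}^c_{\gamma,0}(\rho)) \right\} \tag{14.3.71}\\
&= \sup_{\rho} I^c(\rho, \mathcal{A}_{\gamma,0}), \tag{14.3.72}
\end{align}

where we have used the expression in (14.3.2). Now, as explained in Section 11.3.2,
the GADC is covariant with respect to the Pauli operator $Z$. Furthermore, by
Lemma 14.27, the function $\rho \mapsto I^c(\rho, \mathcal{A}_{\gamma,0})$ is concave. Therefore, for every state
$\rho$,
\begin{equation}
I^c\!\left( \frac{1}{2}\rho + \frac{1}{2} Z\rho Z, \mathcal{A}_{\gamma,0} \right) \geq \frac{1}{2} I^c(\rho, \mathcal{A}_{\gamma,0}) + \frac{1}{2} I^c(Z\rho Z, \mathcal{A}_{\gamma,0}). \tag{14.3.73}
\end{equation}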
Now, using the fact that A𝛾,0 is covariant with respect to 𝑍, and the fact that
A𝑐𝛾,0 = A1−𝛾,0 , we obtain

𝐼 𝑐 (𝑍 𝜌𝑍, A𝛾,0 ) = 𝐻 (A𝛾,0 (𝑍 𝜌𝑍)) − 𝐻 (A𝑐𝛾,0 (𝑍 𝜌𝑍)) (14.3.74)

= 𝐻 (𝑍A𝛾,0 (𝜌)𝑍) − 𝐻 (𝑍A1−𝛾,0 (𝜌)𝑍) (14.3.75)
= 𝐻 (A𝛾,0 (𝜌)) − 𝐻 (A1−𝛾,0 (𝜌)) (14.3.76)
= 𝐻 (A𝛾,0 (𝜌)) − 𝐻 (A𝑐𝛾,0 (𝜌)) (14.3.77)
= 𝐼 𝑐 (𝜌, A𝛾,0 ). (14.3.78)
Therefore,
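\begin{equation}
I^c\!\left( \frac{1}{2}\rho + \frac{1}{2} Z\rho Z, \mathcal{A}_{\gamma,0} \right) \geq I^c(\rho, \mathcal{A}_{\gamma,0}) \tag{14.3.79}
\end{equation}
for every state $\rho$. Recalling from (4.5.28) that the state $\frac{1}{2}\rho + \frac{1}{2} Z\rho Z$ results from the
action of the completely dephasing channel on $\rho$, which means that it is diagonal in
the standard basis, we find that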
\begin{equation}
\max_{p\in[0,1]} I^c((1-p)|0\rangle\langle 0| + p|1\rangle\langle 1|, \mathcal{A}_{\gamma,0}) \geq I^c(\rho, \mathcal{A}_{\gamma,0}) \tag{14.3.80}
\end{equation}

for every state $\rho$, which means that

\begin{align}
Q(\mathcal{A}_{\gamma,0}) = I^c(\mathcal{A}_{\gamma,0}) &= \max_{p\in[0,1]} I^c((1-p)|0\rangle\langle 0| + p|1\rangle\langle 1|, \mathcal{A}_{\gamma,0}) \tag{14.3.81}\\
&= \max_{p\in[0,1]} \{ h_2((1-\gamma)p) - h_2(\gamma p) \}, \tag{14.3.82}
\end{align}

where in the last line we have evaluated $I^c((1-p)|0\rangle\langle 0| + p|1\rangle\langle 1|, \mathcal{A}_{\gamma,0})$. See
Figure 14.5 for a plot of the quantum capacity of the amplitude damping channel
$\mathcal{A}_{\gamma,0}$. Note that the capacity vanishes at $\gamma = \frac{1}{2}$, which is due to the fact that for
$\gamma \geq \frac{1}{2}$ the amplitude damping channel $\mathcal{A}_{\gamma,0}$ (and more generally the GADC $\mathcal{A}_{\gamma,N}$
for $N \in [0, 1]$) is anti-degradable. From Proposition 14.29, we thus have that
$Q(\mathcal{A}_{\gamma,N}) = 0$ for all $N \in [0, 1]$ and $\gamma \geq \frac{1}{2}$.

[Figure 14.5 (plot of $Q(\mathcal{A}_{\gamma,0})$ versus $\gamma$, both axes ranging from 0 to 1): Quantum capacity of the amplitude damping channel, as given
by (14.3.82). The capacity is equal to zero for $\gamma \geq \frac{1}{2}$ because in this parameter
range the channel is anti-degradable.]
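
The single-parameter optimization in (14.3.82) is simple enough to evaluate by brute force. The following sketch uses a uniform grid over 𝑝, which is an illustrative choice of ours (any one-dimensional optimizer would do), and the function names are hypothetical.

```python
import math

def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def amplitude_damping_capacity(gamma, grid=10001):
    """Grid-search approximation of max_p { h2((1 - gamma) p) - h2(gamma p) }."""
    best = 0.0  # p = 0 always gives the value 0
    for k in range(grid):
        p = k / (grid - 1)
        best = max(best, h2((1 - gamma) * p) - h2(gamma * p))
    return best

if __name__ == "__main__":
    for gamma in (0.0, 0.1, 0.25, 0.4, 0.5, 0.75):
        print(gamma, amplitude_damping_capacity(gamma))
```

For damping parameters of at least 1/2 the objective is never positive, so the search returns zero, in agreement with the anti-degradability argument.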
Let us now consider the coherent information of the GADC A𝛾,𝑁 for 𝑁 > 0.
In this case, the coherent information 𝐼 𝑐 (A𝛾,𝑁 ) is a lower bound on the quantum
capacity of the GADC. As with the amplitude damping channel, it can be shown
that for the GADC A𝛾,𝑁 with 𝑁 > 0 it suffices to optimize over states diagonal in
the standard basis in order to compute the coherent information:
\begin{equation}
I^c(\mathcal{A}_{\gamma,N}) = \max_{p\in[0,1]} I^c((1-p)|0\rangle\langle 0| + p|1\rangle\langle 1|, \mathcal{A}_{\gamma,N}), \tag{14.3.83}
\end{equation}

for all 𝛾 ∈ (0, 1) and all 𝑁 > 0. The proof of this is more involved, since for 𝑁 > 0
the GADC is not degradable, meaning that we cannot use Lemma 14.27. Please
consult the Bibliographic Notes in Section 14.5 for a source of the proof.
In Figure 14.6, we plot the coherent information lower bound given by (14.3.83).

[Figure 14.6 (six panels, for $N = 0.1, 0.2, 0.3, 0.4, 0.45, 0.5$; horizontal axis $\gamma$ from 0 to 0.5, vertical axis from 0 to 1; curves: $I^c(\mathcal{A}_{\gamma,N})$, $Q^{\mathrm{UB}}_{\mathrm{DP},1}$, $Q^{\mathrm{UB}}_{\mathrm{DP},2}$, $Q^{\mathrm{UB}}_{\mathrm{DP},3}$, $Q^{\mathrm{UB}}_{\mathrm{DP},4}$, $R(\mathcal{A}_{\gamma,N})$): The coherent information lower bound $I^c(\mathcal{A}_{\gamma,N})$ and four upper
bounds on the quantum capacity of the generalized amplitude damping channel
$\mathcal{A}_{\gamma,N}$. The quantum capacity lies within the shaded region.]

We also plot the Rains information upper bound 𝑅(A𝛾,𝑁 ) as well as four other upper
bounds that are based on the following identities, which follow from (14.3.70):
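\begin{align}
\mathcal{A}_{\gamma,N} &= \mathcal{A}_{\gamma N,1} \circ \mathcal{A}_{\frac{\gamma(1-N)}{1-\gamma N},0}, \tag{14.3.84}\\
\mathcal{A}_{\gamma,N} &= \mathcal{A}_{\gamma(1-N),0} \circ \mathcal{A}_{\frac{\gamma N}{1-\gamma(1-N)},1}. \tag{14.3.85}
\end{align}

It then follows that

\begin{align}
Q(\mathcal{A}_{\gamma,N}) &\leq Q(\mathcal{A}_{\frac{\gamma(1-N)}{1-\gamma N},0}) =: Q^{\mathrm{UB}}_{\mathrm{DP},1}(\gamma, N), \tag{14.3.86}\\
Q(\mathcal{A}_{\gamma,N}) &\leq Q(\mathcal{A}_{\gamma(1-N),0}) =: Q^{\mathrm{UB}}_{\mathrm{DP},2}(\gamma, N), \tag{14.3.87}\\
Q(\mathcal{A}_{\gamma,N}) &\leq Q(\mathcal{A}_{\gamma N,0}) =: Q^{\mathrm{UB}}_{\mathrm{DP},3}(\gamma, N), \tag{14.3.88}\\
Q(\mathcal{A}_{\gamma,N}) &\leq Q(\mathcal{A}_{\frac{\gamma N}{1-\gamma(1-N)},0}) =: Q^{\mathrm{UB}}_{\mathrm{DP},4}(\gamma, N). \tag{14.3.89}
\end{align}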

Note that the right-hand side of each inequality can be calculated using (14.3.82).
We have also made use of (14.3.67), which implies that 𝑄(A𝛾,1 ) = 𝑄(A𝛾,0 ). These
inequalities hold due to the fact that, for the composition of two quantum channels
N and M,
𝑄(N ◦ M) ≤ 𝑄(M) and 𝑄(N ◦ M) ≤ 𝑄(N). (14.3.90)
The first inequality holds by the data-processing inequality. The second inequality
can be viewed as a lower bound on the quantum capacity of the channel N that
arises from a coding strategy consisting of some encoding followed by many uses
of the channel M.
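
Since each of the four data-processing bounds in (14.3.86)–(14.3.89) is the capacity of an amplitude damping channel, they can all be evaluated with the formula in (14.3.82). The sketch below does so with a grid search; the function names, the grid size, and the sample parameters are our own, and the parameters are assumed to satisfy 𝛾 < 1 so that no denominator vanishes.

```python
import math

def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def amplitude_damping_capacity(gamma, grid=2001):
    """Grid-search approximation of (14.3.82)."""
    best = 0.0
    for k in range(grid):
        p = k / (grid - 1)
        best = max(best, h2((1 - gamma) * p) - h2(gamma * p))
    return best

def data_processing_upper_bounds(gamma, N):
    """The four bounds Q_DP,1 ... Q_DP,4 on Q(A_{gamma,N}); assumes gamma < 1."""
    Q = amplitude_damping_capacity
    return {
        "DP1": Q(gamma * (1 - N) / (1 - gamma * N)),
        "DP2": Q(gamma * (1 - N)),
        "DP3": Q(gamma * N),
        "DP4": Q(gamma * N / (1 - gamma * (1 - N))),
    }

if __name__ == "__main__":
    bounds = data_processing_upper_bounds(0.3, 0.2)  # hypothetical parameters
    print(bounds, "tightest:", min(bounds.values()))
```

Taking the minimum over the four values gives the tightest of these bounds for a given pair of parameters.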

14.4 Summary
In this chapter, we studied quantum communication. Given a quantum channel
N 𝐴→𝐵 connecting Alice and Bob, the goal in quantum communication is to
determine the highest rate, called the quantum capacity and denoted by 𝑄(N), at
which the 𝐴′ part of an arbitrary pure state Ψ𝑅 𝐴′ can be transmitted to Bob without
error. At the disposal of Alice and Bob are local encoding and decoding channels,
as well as an arbitrary number of (unassisted) uses of the channel N 𝐴→𝐵 . By
unassisted, we mean that Alice and Bob are not allowed to communicate with each
other between channel uses. We found that the coherent information 𝐼 𝑐 (N) of N is
always a lower bound on its quantum capacity, and that, in general, computing the
exact value of the capacity involves a regularization, so that $Q(\mathcal{N}) = I^c_{\mathrm{reg}}(\mathcal{N})$.
Starting with the one-shot setting, in which only one use of the channel is
allowed and there is some tolerable non-zero error, we determined both upper and
lower bounds on the number of qubits that can be transmitted. The one-shot upper
bound involves the hypothesis testing relative entropy in a way similar to how it is
involved in classical communication and entanglement distillation. Specifically,
we establish the hypothesis testing coherent information as an upper bound. This
leads to the coherent information (hence regularized coherent information) weak
converse upper bound in the asymptotic setting. To obtain a lower bound, we
used the results of Chapter 13 on entanglement distillation. We found that we
could take the entanglement distillation protocol developed in that chapter and
convert it to a suitable quantum communication protocol. We proved that this
lower bound is optimal when applied to the asymptotic setting, in the sense that
it leads to the coherent information (hence regularized coherent information) as
an achievable rate, which matches the upper bound. For degradable channels, we
showed that the coherent information is additive, meaning that 𝑄(N) = 𝐼 𝑐 (N) for
all degradable channels. We also showed that anti-degradable channels have zero
quantum capacity.
With the goal of obtaining tractable estimates of quantum capacity for general
channels, we found that the Rains information 𝑅(N) of N is a strong converse
upper bound on the quantum capacity of N. This allowed us to conclude that the
quantum capacity of the generalized dephasing channel is equal to its coherent
information, because its Rains information and coherent information coincide.
We also looked ahead to Chapter 19 and concluded from the results there that
the squashed entanglement of a quantum channel is an upper bound on quantum
capacity.

14.5 Bibliographic Notes

The problem of determining the capacity of a quantum channel for transmitting
quantum information, in a manner analogous to Shannon’s channel capacity
theorem, was proposed by Shor (1995). The notion of quantum communication
that we consider in this chapter, as well as the notion of entanglement transmission,
was defined by Schumacher (1996). The notion of subspace transmission was
defined by Barnum et al. (2000) (see also (Bennett et al., 1997)), and the notion of
entanglement generation was defined by Devetak (2005). These different notions of
quantum communication, and the connections between them, have been examined
by Kretschmann and Werner (2004), where they also proved that the capacities for
these variations are all equal to each other.
Upper and lower bounds on one-shot quantum capacity have been established
by Buscemi and Datta (2010a); Datta and Hsieh (2013); Beigi et al. (2016);
Tomamichel et al. (2016); Anshu et al. (2019); Wang et al. (2019b). The approach
of using hypothesis testing relative entropy for obtaining an upper bound on one-shot
quantum capacity (specifically, Theorem 14.3) comes from work by Matthews and
Wehner (2014). The lower bound on the one-shot quantum capacity in Theorem 14.5
comes from work on one-shot decoupling (Dupuis et al., 2014), which was then
used by Wilde et al. (2017, Proposition 21) to obtain a lower bound on the one-shot
distillable entanglement. The various code conversions in Lemmas 14.6, 14.7,
and 14.8 are available in a number of works, including Barnum et al. (2000);
Kretschmann and Werner (2004); Klesse (2007); Watrous (2018) (see also Wilde
and Qi (2018)). The one-shot Rains upper bound in Corollary 14.20 was obtained

by Tomamichel et al. (2017).

In the asymptotic setting, Schumacher (1996); Schumacher and Nielsen (1996);
Barnum et al. (1998, 2000) established coherent information as an upper bound on
quantum capacity, and Lloyd (1997); Shor (2002b); Devetak (2005) established
the lower bound. (See also the proofs of Klesse (2008); Hayden et al. (2008b).)
Decoupling as a method for understanding quantum capacity was initially studied by
Schumacher and Westmoreland (2002) and developed in further detail by Hayden
et al. (2008a). The Rains information strong converse upper bound (Theorem 14.22)
was established by Tomamichel et al. (2017). Weak subadditivity of Rényi Rains
information of a channel (Proposition 14.21) is also due to Tomamichel et al. (2017).
We also mention that upper bounds on quantum capacity based on approximate
degradability and approximate anti-degradability of channels have been established
by Sutter et al. (2017) (see also (Leditzky et al., 2018)).
Additivity of coherent information for degradable channels was shown by
Devetak and Shor (2005). Smith and Yard (2008); Smith et al. (2011) demonstrated
the phenomenon of superactivation of quantum capacity, and DiVincenzo et al.
(1998); Smith and Smolin (2007); Cubitt et al. (2015); Elkouss and Strelchuk
(2015) demonstrated superadditivity of coherent information. Lemma 14.27 was
presented in (Yard et al., 2008, Lemma 5). The quantum capacity of generalized
dephasing channels was established by Devetak and Shor (2005) and the strong
converse by Tomamichel et al. (2017). See (Morgan and Winter, 2014) for the
pretty-strong converse for the quantum capacity of degradable channels. The
generalized amplitude damping channel (GADC) has been studied in detail by
Khatri et al. (2020). The fact that this channel is not degradable for 𝑁 ∈ (0, 1)
and 𝛾 ∈ (0, 1] follows from (Cubitt et al., 2008, Theorem 4). The expression in
(14.3.82) for the quantum capacity of the amplitude damping channel was given by
Giovannetti and Fazio (2005). For a proof of (14.3.83), see (García-Patrón et al.,
2009, Appendix). A proof of anti-degradability of the GADC $\mathcal{A}_{\gamma,N}$ for all $\gamma \geq \frac{1}{2}$
can be found in (Khatri et al., 2020, Proposition 2).

Appendix 14.A Alternative Notions of Quantum Communication

At the beginning of this chapter, we considered three alternative notions of quantum
communication, and we described how they are implied by the notion of quantum

communication as defined at the beginning of Section 14.1. We now precisely

define these other notions of quantum communication, and we show how the
notion of quantum communication considered in the chapter (strong subspace
transmission) implies all three alternatives.

Definition 14.30 Entanglement Transmission

An entanglement transmission protocol for N 𝐴→𝐵 consists of the three elements
(𝑑, E, D), where 𝑑 ≥ 1, E 𝐴′ →𝐴 is an encoding channel with 𝑑 𝐴′ = 𝑑, and
D𝐵→𝐵′ is a decoding channel with 𝑑 𝐵′ = 𝑑. The goal of the protocol is to
transmit the 𝐴′ system of a maximally entangled state Φ 𝑅 𝐴′ of Schmidt rank 𝑑
such that the final state

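\begin{equation}
\omega_{RB'} := (\mathcal{D}_{B\to B'} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{A'\to A})(\Phi_{RA'}) \tag{14.A.1}
\end{equation}

is close to the initial maximally entangled state. The entanglement transmission
error of the protocol is

\begin{align}
p^{(\mathrm{ET})}_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) &:= 1 - \langle\Phi|_{RB'} \omega_{RB'} |\Phi\rangle_{RB'} \tag{14.A.2}\\
&= 1 - F_e(\mathcal{D} \circ \mathcal{N} \circ \mathcal{E}), \tag{14.A.3}
\end{align}

where we recall the entanglement fidelity of a channel from Definition 6.21. We
call the protocol $(d, \mathcal{E}, \mathcal{D})$ a $(d, \varepsilon)$ protocol, with $\varepsilon \in [0, 1]$, if $p^{(\mathrm{ET})}_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \leq \varepsilon$.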

It is straightforward to see that if there exists a (𝑑, 𝜀) quantum communication

protocol for a quantum channel N (as per Definition 14.1), then there exists a (𝑑, 𝜀)
entanglement transmission protocol. Indeed, for a (𝑑, 𝜀) quantum communication
protocol with encoding and decoding channel E and D, we have that
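\begin{align}
p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) &= \max_{|\Psi\rangle_{RA'}} \{ 1 - \langle\Psi|_{RA'} (\mathcal{D}_{B\to B'} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{A'\to A})(\Psi_{RA'}) |\Psi\rangle_{RA'} \} \tag{14.A.4}\\
&= 1 - F(\mathcal{D} \circ \mathcal{N} \circ \mathcal{E}) \tag{14.A.5}\\
&\leq \varepsilon. \tag{14.A.6}
\end{align}
However, since the maximally entangled state $\Phi_{RA'}$ is a particular pure state in the
optimization for $F(\mathcal{D} \circ \mathcal{N} \circ \mathcal{E})$, we conclude that
\begin{equation}
p^{(\mathrm{ET})}_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) = 1 - F_e(\mathcal{D} \circ \mathcal{N} \circ \mathcal{E}) \leq 1 - F(\mathcal{D} \circ \mathcal{N} \circ \mathcal{E}) \leq \varepsilon. \tag{14.A.7}
\end{equation}

So the elements $(d, \mathcal{E}, \mathcal{D})$ form a $(d, \varepsilon)$ entanglement transmission protocol.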
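
For a concrete feel for the entanglement transmission error in (14.A.2), the sketch below evaluates the entanglement fidelity of a channel given by Kraus operators, under the simplifying assumption that the encoding and decoding channels are trivial (identity channels). The function names are our own, and the amplitude damping channel is used purely as a test case.

```python
import numpy as np

def entanglement_fidelity(kraus):
    """<Phi| (id (x) M)(Phi) |Phi> for the channel M with the given Kraus operators."""
    d = kraus[0].shape[0]
    # |Phi> = (1/sqrt(d)) sum_i |i>|i>, written as a vector of length d^2.
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)
    fidelity = 0.0
    for K in kraus:
        amplitude = phi.conj() @ np.kron(np.eye(d), K) @ phi  # equals Tr[K] / d
        fidelity += abs(amplitude) ** 2
    return float(fidelity)

def entanglement_transmission_error(kraus):
    """Error 1 - F_e, assuming identity encoding and decoding channels."""
    return 1.0 - entanglement_fidelity(kraus)

if __name__ == "__main__":
    gamma = 0.36  # hypothetical damping parameter
    kraus = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
             np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]
    print(entanglement_transmission_error(kraus))
```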

Definition 14.31 Entanglement Generation

An entanglement generation protocol for N 𝐴→𝐵 is defined by the three elements
(𝑑, Ψ𝐴′ 𝐴 , D𝐵→𝐵′ ), where Ψ𝐴′ 𝐴 is a pure state with 𝑑 𝐴′ = 𝑑, and D𝐵→𝐵′ is a
decoding channel with 𝑑 𝐵′ = 𝑑. The goal of the protocol is to transmit the
system 𝐴 such that the final state

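\begin{equation}
\sigma_{A'B'} := (\mathcal{D}_{B\to B'} \circ \mathcal{N}_{A\to B})(\Psi_{A'A}) \tag{14.A.8}
\end{equation}

is close in fidelity to a maximally entangled state of Schmidt rank $d$. The
entanglement generation error of the protocol is given by

\begin{align}
p^{(\mathrm{EG})}_{\mathrm{err}}(\Psi_{A'A}, \mathcal{D}; \mathcal{N}) &:= 1 - \langle\Phi|_{A'B'} \sigma_{A'B'} |\Phi\rangle_{A'B'} \tag{14.A.9}\\
&= 1 - F(\Phi_{A'B'}, \sigma_{A'B'}). \tag{14.A.10}
\end{align}

We call the protocol $(d, \Psi_{A'A}, \mathcal{D}_{B\to B'})$ a $(d, \varepsilon)$ protocol, with $\varepsilon \in [0, 1]$, if
$p^{(\mathrm{EG})}_{\mathrm{err}}(\Psi_{A'A}, \mathcal{D}; \mathcal{N}) \leq \varepsilon$.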

Consider a (𝑑, 𝜀) quantum communication protocol for N 𝐴→𝐵 given by the

elements (𝑑, E 𝐴′ →𝐴 , D𝐵→𝐵′ ), where 𝑑 𝐴′ = 𝑑 𝐵′ = 𝑑. Then, by the arguments above,
the same elements constitute a (𝑑, 𝜀) entanglement transmission protocol, so that

⟨Φ| 𝑅𝐵′ (D𝐵→𝐵′ ◦ N 𝐴→𝐵 ◦ E 𝐴′ →𝐴 )(Φ 𝑅 𝐴′ )|Φ⟩ 𝑅𝐵′ ≥ 1 − 𝜀. (14.A.11)

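Now, let the system $R \equiv \tilde{A}$ belong to Alice, and let $\Psi_{\tilde{A}A} := \mathcal{E}_{A'\to A}(\Phi_{\tilde{A}A'})$. Then,

\begin{equation}
\langle\Phi|_{\tilde{A}B'} (\mathcal{D}_{B\to B'} \circ \mathcal{N}_{A\to B})(\Psi_{\tilde{A}A}) |\Phi\rangle_{\tilde{A}B'} \geq 1 - \varepsilon. \tag{14.A.12}
\end{equation}

Therefore, by definition, the elements $(d, \Psi_{\tilde{A}A}, \mathcal{D}_{B\to B'})$ constitute a $(d, \varepsilon)$
entanglement generation protocol.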

Definition 14.32 Subspace Transmission

A subspace transmission protocol over the quantum channel N 𝐴→𝐵 consists
of the three elements (𝑑, E, D), where 𝑑 ≥ 1 and E and D are encoding and
decoding channels. The goal of the protocol is to transmit an arbitrary pure


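state $\psi_{A'}$ such that the final state

\[
\omega_{B'} := (\mathcal{D}_{B\to B'}\circ\mathcal{N}_{A\to B}\circ\mathcal{E}_{A'\to A})(\psi_{A'}) \tag{14.A.13}
\]

is close in fidelity to the initial state. The state transmission error of the
protocol is

\[
p^{(\text{ST})}_{\text{err}}(\mathcal{E},\mathcal{D};\mathcal{N}) := 1 - \min_{\psi} \langle\psi|(\mathcal{D}\circ\mathcal{N}\circ\mathcal{E})(\psi)|\psi\rangle = 1 - F_{\min}(\mathcal{D}\circ\mathcal{N}\circ\mathcal{E}), \tag{14.A.14}
\]

where we recall the minimum fidelity of a channel defined in (6.4.5). We call
the protocol $(d,\mathcal{E},\mathcal{D})$ a $(d,\varepsilon)$ protocol, with $\varepsilon\in[0,1]$, if
$p^{(\text{ST})}_{\text{err}}(\mathcal{E},\mathcal{D};\mathcal{N}) \le \varepsilon$.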

Remark: An alternative way to define the error criterion for a subspace transmission code
would be to use the average fidelity, defined in (6.4.3); please consult the Bibliographic Notes in
Section 14.5.

Given a (𝑑, 𝜀) quantum communication protocol for the channel N with the
elements (𝑑, E, D), the equality 𝑝 ∗err (E, D; N) = 1 − 𝐹 (D ◦ N ◦ E) holds, where
𝐹 (·) is the channel fidelity defined in (6.4.6). Then, restricting the optimization
in 𝐹 (D ◦ N ◦ E) to pure states Ψ𝑅 𝐴′ = |Ψ⟩⟨Ψ| 𝑅 𝐴′ such that |Ψ⟩ 𝑅 𝐴′ = |𝜙⟩ 𝑅 ⊗ |𝜓⟩ 𝐴′ ,
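we obtain

\begin{align}
p^{(\text{ST})}_{\text{err}}(\mathcal{E},\mathcal{D};\mathcal{N})
&= 1 - F_{\min}(\mathcal{D}\circ\mathcal{N}\circ\mathcal{E}) \tag{14.A.15}\\
&= 1 - \min_{|\psi\rangle} \langle\psi|(\mathcal{D}\circ\mathcal{N}\circ\mathcal{E})(\psi)|\psi\rangle \tag{14.A.16}\\
&= 1 - \min_{|\phi\rangle,|\psi\rangle} (\langle\phi|_R\otimes\langle\psi|_{A'})\big(\phi_R\otimes(\mathcal{D}\circ\mathcal{N}\circ\mathcal{E})(\psi_{A'})\big)(|\phi\rangle_R\otimes|\psi\rangle_{A'}) \tag{14.A.17}\\
&\le 1 - F(\mathcal{D}\circ\mathcal{N}\circ\mathcal{E}) \tag{14.A.18}\\
&\le \varepsilon. \tag{14.A.19}
\end{align}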

So the elements (𝑑, E, D) form a (𝑑, 𝜀) subspace transmission protocol.

Chapter 15

Secret Key Distillation

This chapter considers the task of secret key distillation. The setting of this task is
that Alice and Bob share a bipartite quantum state 𝜌 𝐴𝐵 , and the goal is for them to
perform local operations and public communication in order to transform 𝜌 𝐴𝐵 to a
state that approximates an ideal secret key. Some questions are in order: What is
an ideal secret key and for whom is it secret? How much secret key can they extract
from this state? These are the main questions addressed in this chapter.
The information-theoretic model we assume is that the physical laboratories of
Alice and Bob are secure, so that system 𝐴 of the state 𝜌 𝐴𝐵 is physically secured in
Alice’s laboratory and system 𝐵 is physically secured in Bob’s. We suppose that
an eavesdropper Eve possesses a system 𝐸 that purifies 𝜌 𝐴𝐵 . That is, if 𝜓 𝐴𝐵𝐸 is a
purification of 𝜌 𝐴𝐵 , then we suppose that system 𝐸 of 𝜓 𝐴𝐵𝐸 is in Eve’s possession.
This model gives the eavesdropper a lot of power. Indeed, if 𝜔 𝐴𝐵𝐸 ′ is an arbitrary
extension of the state 𝜌 𝐴𝐵 , then as a consequence of Proposition 4.4, Eve can
transform 𝜓 𝐴𝐵𝐸 to 𝜔 𝐴𝐵𝐸 ′ by means of a channel acting on her system 𝐸. We also
assume that any classical data transmitted between Alice and Bob is public, so that
Eve has access to all of it.
An ideal secret key of log2 𝐾 secret bits is a tripartite state of the following
form:

\[
\Phi_{AB}\otimes\sigma_E, \tag{15.0.1}
\]

where

\[
\Phi_{AB} := \frac{1}{K}\sum_{i=0}^{K-1} |i\rangle\langle i|_A \otimes |i\rangle\langle i|_B. \tag{15.0.2}
\]


There are three salient aspects of such a tripartite key state:

1. The key value is uniformly random and thus hard to guess.

2. The key values in the registers of Alice and Bob are perfectly correlated. That
is, if Alice measures the key value to be 𝑖 ∈ {0, . . . , 𝐾 − 1}, then Bob is
guaranteed to measure the same value.
3. The overall state is a product state between systems 𝐴𝐵 and 𝐸. This means
that Eve’s system 𝐸 is of no use in guessing the key value.

The goal of a secret-key distillation protocol is for Alice and Bob to transform the
initial state 𝜓 𝐴𝐵𝐸 , by means of local operations and public classical communication,
to a state that approximates an ideal key state of the form in (15.0.1).
A secret key is useful in a communication task called the one-time pad protocol
(also known as the Vernam cipher). In this protocol, we suppose that Alice has a
message $m \in \{0,\ldots,K-1\}$ that she would like to send to Bob. By making use of
the key, Alice can calculate $\tilde{m} := m \oplus i$, where $i$ is the key value and the addition is
modulo $K$, and then send the encrypted message $\tilde{m}$ over a public classical channel.
Since the key is ideal, no one besides Alice and Bob knows the key
value $i$, and the encrypted message $\tilde{m}$ is uniformly random, which means that it is
hard to guess (i.e., there is a $1/K$ chance that an eavesdropper could guess it, which
becomes small as $K$ becomes large). When Bob receives the encrypted message
$\tilde{m}$, he can calculate $m = \tilde{m} \ominus i$ and thus decrypt the message, because he knows the
key value $i$. This is one of the main uses of a secret key and, in turn, why we are
interested in secret key distillation.
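
The one-time pad described above can be sketched in a few lines. The function names and the value K = 256 are illustrative assumptions, and the call to `secrets.randbelow` merely stands in for the shared ideal key value i that a distillation protocol would supply.

```python
import secrets

def otp_encrypt(m: int, key: int, K: int) -> int:
    """Encrypt a message m in {0, ..., K-1} by adding the key modulo K."""
    return (m + key) % K

def otp_decrypt(c: int, key: int, K: int) -> int:
    """Decrypt by subtracting the key modulo K."""
    return (c - key) % K

K = 256
key = secrets.randbelow(K)      # stand-in for the shared ideal key value i
message = 42
ciphertext = otp_encrypt(message, key, K)
recovered = otp_decrypt(ciphertext, key, K)
print(recovered == message)
```

For a fixed message, the ciphertext ranges over all K values as the key does, which is the sense in which a uniformly random key makes the ciphertext uniformly random.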
It turns out that there are strong connections between entanglement distillation
from Chapter 13 and secret key distillation. They are not precisely the same tasks
but there are strong links, and the structure of this chapter follows the structure of
Chapter 13 quite closely. The main reason for the strong connection is that the
maximally entangled state $\Phi_{AB} = \frac{1}{K}\sum_{i,j=0}^{K-1} |i\rangle\langle j|_A \otimes |i\rangle\langle j|_B$ can be used to generate
an ideal key state. To see this, consider that the state Φ 𝐴𝐵 is unextendible, so that
the only possible extension of it is a tensor-product extension of the form Φ 𝐴𝐵 ⊗ 𝜎𝐸 .
Then, if Alice and Bob perform local measurement channels on their systems 𝐴 and
𝐵, with respect to the computational basis, they can realize the ideal tripartite key
state of the form in (15.0.1). Thus, if one can generate maximally entangled states,
then one can generate key states. However, the converse is not true in general, and
this is what distinguishes secret key distillation from entanglement distillation.
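
The step from a maximally entangled state to the classically correlated part of a key state can be checked numerically. The sketch below, with assumed helper names and K = 3, applies the local computational-basis measurement channels to the maximally entangled state and compares the result with the state in (15.0.2); Eve's system is omitted, since it is in tensor product with AB.

```python
import numpy as np

def maximally_entangled(K):
    """Density matrix of |Phi> = K^{-1/2} sum_i |i>|i> on A ⊗ B."""
    vec = np.eye(K).reshape(K * K) / np.sqrt(K)
    return np.outer(vec, vec.conj())

def dephase_both(rho):
    """Local measurement channels on A and B: keep only the entries that are
    diagonal in the product computational basis."""
    return np.diag(np.diag(rho))

def classically_correlated(K):
    """The state (1/K) sum_i |i><i| ⊗ |i><i| from (15.0.2)."""
    rho = np.zeros((K * K, K * K))
    for i in range(K):
        rho[i * K + i, i * K + i] = 1.0 / K
    return rho

K = 3
key = dephase_both(maximally_entangled(K))
print(np.allclose(key, classically_correlated(K)))
```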


Similar to what we have done in previous chapters, here we establish lower and
upper bounds on the number of secret key bits that can be distilled from a bipartite
state 𝜌 𝐴𝐵 . The lower bounds are given in terms of the private information of the
state, and the upper bounds are given in terms of not only the private information but
also the squashed entanglement and the relative entropy of entanglement. The fact
that we can use entanglement measures as bounds further highlights the connection
between secret key distillation and entanglement.

15.1 One-Shot Setting

The one-shot setting for secret key distillation begins with Alice and Bob sharing a
state 𝜌 𝐴𝐵 , and we assume that the eavesdropper Eve has access to a system 𝐸 of
a purification of 𝜌 𝐴𝐵 . For concreteness, let 𝜓 𝐴𝐵𝐸 denote the purification of 𝜌 𝐴𝐵 ,
with system 𝐴 of 𝜓 𝐴𝐵𝐸 held by Alice, 𝐵 by Bob, and 𝐸 by Eve. Keep in mind
that all purifications of 𝜌 𝐴𝐵 are related by an isometric channel acting on the 𝐸
system, so that Eve can reach all purifications easily by performing an isometric
channel on her system 𝐸. The model we assume is that the laboratory of Alice is
physically secure and the quantum system 𝐴 is fully contained in it. Similarly, we
assume that the laboratory of Bob is physically secure and contains the system 𝐵.
However, if the state 𝜌 𝐴𝐵 is mixed, then the purifying degrees of freedom in 𝐸 are
available to Eve (if, on the other hand, 𝜌 𝐴𝐵 is pure, then an arbitrary purification
𝜓 𝐴𝐵𝐸 is always a tensor-product state of the systems 𝐴𝐵 and 𝐸 and, in this sense, it
is understood that Eve does not really have access to purifying degrees of freedom).
This approach gives the most power to the eavesdropper for the setting of secret
key distillation.
In a secret-key distillation protocol, Alice and Bob are allowed to use local
operations and public classical communication (abbreviated as LOPC). An LOPC
channel is similar to an LOCC channel (as discussed in Section 4.6.2), but the
critical difference is that Eve gets a copy of all of the classical data exchanged. Recall
that a generic LOCC channel L 𝐴𝐵→𝐴′ 𝐵′ can be written as follows, as discussed in
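Definition 4.22:

\[
\mathcal{L}_{AB\to A'B'} = \sum_{z\in\mathcal{Z}} \mathcal{E}^{z}_{A\to A'} \otimes \mathcal{F}^{z}_{B\to B'}, \tag{15.1.1}
\]

where $\mathcal{Z}$ is a finite alphabet and $\{\mathcal{E}^{z}_{A\to A'}\}_{z\in\mathcal{Z}}$ and $\{\mathcal{F}^{z}_{B\to B'}\}_{z\in\mathcal{Z}}$ are sets of completely
positive maps such that the sum map $\mathcal{L}_{AB\to A'B'}$ is trace preserving. Then an LOPC
channel is the following enlargement of $\mathcal{L}_{AB\to A'B'}$:

\[
\mathcal{L}_{AB\to A'B'Z} = \sum_{z\in\mathcal{Z}} \mathcal{E}^{z}_{A\to A'} \otimes \mathcal{F}^{z}_{B\to B'} \otimes |z\rangle\langle z|_Z, \tag{15.1.2}
\]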
such that Eve has access to the system 𝑍, which contains all of the classical data
exchanged to realize L 𝐴𝐵→𝐴′ 𝐵′ .
The goal of a secret-key distillation protocol is for Alice and Bob to produce an
approximation of an ideal secret-key state, which is defined as follows:

Definition 15.1 Tripartite Key State

A state 𝛾 𝐴𝐵𝐸 is a tripartite key state of size 𝐾, or containing log2 𝐾 bits
of secrecy, if local measurements of the 𝐴 and 𝐵 systems lead to the same
uniformly random outcome and the system 𝐸 is product with the measurement
outcomes. That is, after Alice and Bob send their systems through local
dephasing (measurement) channels

\[
\mathcal{M}_{A,B}(\cdot) := \sum_{i=0}^{K-1} |i\rangle\langle i|_{A,B}\,(\cdot)\,|i\rangle\langle i|_{A,B}, \tag{15.1.3}
\]

the resulting state on 𝐴𝐵 and 𝐸 is as follows:

(M 𝐴 ⊗ M𝐵 )(𝛾 𝐴𝐵𝐸 ) = Φ 𝐴𝐵 ⊗ 𝜎𝐸 , (15.1.4)

for some state $\sigma_E$ and where $\Phi_{AB}$ is the maximally classically correlated state

\[
\Phi_{AB} := \frac{1}{K}\sum_{i=0}^{K-1} |i\rangle\langle i|_A \otimes |i\rangle\langle i|_B. \tag{15.1.5}
\]

As stated in Definition 15.1, the defining aspect of an ideal tripartite key state is
that the systems 𝐴 and 𝐵 of Alice and Bob are perfectly correlated and uniformly
random. This property makes the actual key value, which ends up being observed
by both Alice and Bob, hard to guess if there are many key values. Furthermore,
the fact that the overall state is such that it is tensor product between 𝐴𝐵 and 𝐸
implies that Eve’s system cannot provide any help at all in guessing the key value.
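
The condition in (15.1.4) can be tested numerically for a given density matrix. The following is a minimal sketch under stated assumptions: systems are ordered as A ⊗ B ⊗ E with dimensions (K, K, dE), and the function names and the two example states are hypothetical choices made for illustration.

```python
import numpy as np

def dephase_key_systems(gamma, K, dE):
    """Apply the local measurement channels of (15.1.3) to A and B."""
    t = gamma.reshape(K, K, dE, K, K, dE)
    out = np.zeros_like(t)
    for i in range(K):
        for j in range(K):
            out[i, j, :, i, j, :] = t[i, j, :, i, j, :]
    return out.reshape(K * K * dE, K * K * dE)

def is_tripartite_key_state(gamma, K, dE, tol=1e-9):
    """Check whether the measured state equals Phi_AB ⊗ sigma_E, as in (15.1.4)."""
    measured = dephase_key_systems(gamma, K, dE)
    t = measured.reshape(K, K, dE, K, K, dE)
    sigma_E = np.einsum('abiabj->ij', t)          # partial trace over A and B
    phi_bar = np.zeros((K * K, K * K))
    for i in range(K):
        phi_bar[i * K + i, i * K + i] = 1.0 / K   # classically correlated state
    return bool(np.allclose(measured, np.kron(phi_bar, sigma_E), atol=tol))

K, dE = 2, 2
phi_vec = np.eye(K).reshape(K * K) / np.sqrt(K)
# Maximally entangled AB in tensor product with an arbitrary state of E.
good = np.kron(np.outer(phi_vec, phi_vec), np.diag([0.7, 0.3]))
# Correlated, uniform key values, but Eve holds a copy of the key value.
bad = np.zeros((8, 8))
bad[0, 0] = bad[7, 7] = 0.5
print(is_tripartite_key_state(good, K, dE), is_tripartite_key_state(bad, K, dE))
```

The second example shows why the product structure with E matters: the key values are perfectly correlated and uniform, yet the state fails the test because E is correlated with them.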
With the notions above in place, we can now formally define a secret-key
distillation protocol. Such a protocol for the state $\rho_{AB}$ is defined by the pair
$(K, \mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B Z})$, where $K\in\mathbb{N}$ and $\mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B Z}$ is an LOPC channel as defined
in (15.1.2), with $d_{K_A} = d_{K_B} = K$. The key distillation error $p_{\text{err}}(\mathcal{L}^{\leftrightarrow};\rho_{AB})$ of the
protocol is given by the infidelity, defined as

\[
p_{\text{err}}(\mathcal{L}^{\leftrightarrow};\rho_{AB}) := \inf_{\gamma_{K_A K_B E Z}} \left[ 1 - F\big(\gamma_{K_A K_B E Z},\, \mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B Z}(\psi_{ABE})\big) \right], \tag{15.1.6}
\]

where the optimization is with respect to every tripartite key state 𝛾𝐾 𝐴𝐾 𝐵 𝐸 𝑍 of

size 𝐾, which is of the form in Definition 15.1 under the identifications 𝐾 𝐴 ↔ 𝐴,
𝐾 𝐵 ↔ 𝐵, and 𝐸 𝑍 ↔ 𝐸. Furthermore, 𝜓 𝐴𝐵𝐸 is a purification of 𝜌 𝐴𝐵 . Note that the
key distillation error 𝑝 err (L↔ ; 𝜌 𝐴𝐵 ) is invariant under the choice of a purification
𝜓 𝐴𝐵𝐸 because it involves an optimization over every tripartite key state 𝛾𝐾 𝐴𝐾 𝐵 𝐸 𝑍 ,
purifications are related by isometric channels, the fidelity is invariant under
isometric channels, and V𝐸 (𝛾𝐾 𝐴𝐾 𝐵 𝐸 𝑍 ) is an ideal tripartite key state if 𝛾𝐾 𝐴𝐾 𝐵 𝐸 𝑍
is, where V𝐸 is an isometric channel. The optimization in (15.1.6) guarantees the
existence of at least one state 𝜎𝐸 𝑍 of the eavesdropper such that the actual state
L↔𝐴𝐵→𝐾 𝐴 𝐾 𝐵 𝑍 (𝜓 𝐴𝐵𝐸 ) of the protocol approximates an ideal tripartite key state, in
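the following sense

\begin{align}
(\mathcal{M}_{K_A}\otimes\mathcal{M}_{K_B})(\mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B Z}(\psi_{ABE})) &\approx (\mathcal{M}_{K_A}\otimes\mathcal{M}_{K_B})(\gamma_{K_A K_B E Z}) \tag{15.1.7}\\
&= \Phi_{K_A K_B}\otimes\sigma_{EZ}, \tag{15.1.8}
\end{align}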
if the key distillation error 𝑝 err (L↔ ; 𝜌 𝐴𝐵 ) is small.
At this point, it might not be clear why we employ the infidelity error criterion
in (15.1.6) rather than the normalized trace distance. We did so in Chapter 13 in
the context of entanglement distillation because it corresponded to the operational
notion of an entanglement test (see (13.1.4)). We later show how the infidelity error
criterion corresponds to the operational notion of a “privacy test,” which justifies
its use in the context of secret key distillation.

Definition 15.2 (𝑲, 𝜺) secret-key distillation protocol

A secret key distillation protocol (𝐾, L↔ 𝐴𝐵→𝐾 𝐴 𝐾 𝐵 𝑍 ) for the state 𝜌 𝐴𝐵 is called a
(𝐾, 𝜀) protocol, with 𝜀 ∈ [0, 1], if 𝑝 err (L↔ ; 𝜌 𝐴𝐵 ) ≤ 𝜀.

Given 𝜀 ∈ [0, 1], the largest number log2 𝐾 of 𝜀-approximate secret-key bits
that can be extracted from a state 𝜌 𝐴𝐵 among all (𝐾, 𝜀) secret-key distillation
protocols is called the one-shot 𝜀-distillable key of 𝜌 𝐴𝐵 .


Definition 15.3 One-Shot Distillable Key

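Given a bipartite state $\rho_{AB}$ and $\varepsilon\in[0,1]$, the one-shot distillable key of $\rho_{AB}$,
denoted by $K_D^{\varepsilon}(\rho_{AB}) \equiv K_D^{\varepsilon}(A;B)_{\rho}$, is defined as

\[
K_D^{\varepsilon}(A;B)_{\rho} := \sup_{(K,\mathcal{L}^{\leftrightarrow})} \left\{ \log_2 K : p_{\text{err}}(\mathcal{L}^{\leftrightarrow};\rho_{AB}) \le \varepsilon \right\}, \tag{15.1.9}
\]

where the optimization is over all $K\in\mathbb{N}$ and every LOPC channel $\mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B Z}$
with $d_{K_A} = d_{K_B} = K$.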

Calculating the one-shot distillable key is difficult computationally because it

involves optimizing over the key size 𝐾 and over every LOPC channel L↔ 𝐴𝐵→𝐾 𝐴 𝐾 𝐵 𝑍 ,
with 𝑑 𝐾 𝐴 = 𝑑 𝐾 𝐵 = 𝐾. We thus try to estimate the one-shot distillable key by
determining upper and lower bounds on it. Section 15.1.3 introduces upper bounds
on the one-shot distillable key. Before doing so, we first clarify how secret-key
distillation protocols can be thought of from a different perspective as bipartite
private-state distillation protocols.

15.1.1 Tripartite Key States and Bipartite Private States

An important insight for secret key distillation is that there is a way to describe
the whole theory exclusively in terms of a bipartite scenario. This is related to
the assumption that the eavesdropper Eve possesses a full purification 𝜓 𝐴𝐵𝐸 of the
original state 𝜌 𝐴𝐵 , along with the structure of quantum mechanics.
To motivate this concept, consider that an approximate tripartite state 𝛾 𝐴𝐵𝐸 (as
described in Definition 15.1) is generated at the end of a key distillation protocol,
and it is such that all that the eavesdropper possesses is only available in the system
𝐸 (in this context, let us make the same identifications 𝐾 𝐴 ↔ 𝐴, 𝐾 𝐵 ↔ 𝐵, and
𝐸 𝑍 ↔ 𝐸 discussed around (15.1.6)). As such, we can consider a purification of
the state 𝛾 𝐴𝐵𝐸 of the form 𝛾 𝐴𝐴′ 𝐵𝐵′ 𝐸 , in which the joint system 𝐴′ 𝐵′ constitutes
the purifying system. Since a secret-key distillation protocol involves only three
parties, and we already argued that the system 𝐸 is all that Eve possesses, it follows
that Alice and Bob jointly possess the purifying system, which can be split among
them as 𝐴′ 𝐵′. The reduced state 𝛾 𝐴𝐴′ 𝐵𝐵′ = Tr𝐸 [𝛾 𝐴𝐴′ 𝐵𝐵′ 𝐸 ] is then a bipartite state
because all systems involved are in possession of Alice and Bob. If the original
state 𝛾 𝐴𝐵𝐸 is a tripartite key state according to Definition 15.1, then by constructing

𝛾 𝐴𝐴′ 𝐵𝐵′ according to this procedure, the resulting state is called a bipartite private
state, and it has a particular structure. Conversely, if 𝛾 𝐴𝐴′ 𝐵𝐵′ is a state with the
structure of a bipartite private state, then it follows that by purifying this state
to 𝛾 𝐴𝐴′ 𝐵𝐵′ 𝐸 with an 𝐸 system and tracing over systems 𝐴′ and 𝐵′, we arrive at
a tripartite key state. So there is an equivalence between these two viewpoints
(tripartite picture of key distillation and bipartite picture of private state distillation).
We develop this correspondence in detail in what follows.
Before starting, we briefly mention that the equivalence between the tripartite
and bipartite pictures of key distillation implies that we can bring the tools of
entanglement theory (Chapter 9) to bear on the problem of establishing upper
bounds on the number of approximate secret-key bits that can be generated in a key
distillation protocol. This is one of the main applications of this correspondence,
and we note here that it has led to other insights in quantum information theory.

Definition 15.4 Bipartite Private State

A state 𝛾 𝐴𝐵𝐴′ 𝐵′ is a bipartite private state of size 𝐾, containing log2 𝐾 bits of
secrecy, if after purifying 𝛾 𝐴𝐵𝐴′ 𝐵′ to a pure state 𝛾 𝐴𝐵𝐴′ 𝐵′ 𝐸 with purifying system
𝐸 and tracing over the systems 𝐴′ 𝐵′, the resulting state 𝛾 𝐴𝐵𝐸 is a tripartite key
state of size 𝐾. The systems 𝐴 and 𝐵 are called key systems, and the systems
𝐴′ and 𝐵′ are called shield systems.

Theorem 15.5
A state $\gamma_{ABA'B'}$ is a bipartite private state if and only if it has the following
form:

\[
\gamma_{ABA'B'} = U_{ABA'B'} \left(\Phi_{AB}\otimes\theta_{A'B'}\right) U_{ABA'B'}^{\dagger}, \tag{15.1.10}
\]

where $\Phi_{AB}$ is a maximally entangled state of Schmidt rank $K$:

\[
\Phi_{AB} := \frac{1}{K}\sum_{i,j=0}^{K-1} |i\rangle\langle j|_A \otimes |i\rangle\langle j|_B, \tag{15.1.11}
\]

$\theta_{A'B'}$ is some state, and $U_{ABA'B'}$ is a global twisting unitary of the following
form:

\[
U_{ABA'B'} := \sum_{i,j=0}^{K-1} |i\rangle\langle i|_A \otimes |j\rangle\langle j|_B \otimes U^{ij}_{A'B'}. \tag{15.1.12}
\]

In the above, $U^{ij}_{A'B'}$ is a unitary operator for all $i,j\in\{0,\ldots,K-1\}$.

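Proof: Suppose that $\gamma_{ABA'B'}$ has the form in (15.1.10). A particular purification
of $\gamma_{ABA'B'}$ is

\begin{align}
|\phi^{\gamma}\rangle_{ABA'B'E}
&= U_{ABA'B'} \left( |\Phi\rangle_{AB}\otimes|\psi^{\theta}\rangle_{A'B'E} \right) \tag{15.1.13}\\
&= \left( \sum_{i,j=0}^{K-1} |i\rangle\langle i|_A \otimes |j\rangle\langle j|_B \otimes U^{ij}_{A'B'} \right) \left( \frac{1}{\sqrt{K}}\sum_{k=0}^{K-1} |k\rangle_A |k\rangle_B \otimes |\psi^{\theta}\rangle_{A'B'E} \right) \tag{15.1.14}\\
&= \frac{1}{\sqrt{K}}\sum_{k=0}^{K-1} |k\rangle_A |k\rangle_B \otimes U^{kk}_{A'B'} |\psi^{\theta}\rangle_{A'B'E}, \tag{15.1.15}
\end{align}

where $|\psi^{\theta}\rangle_{A'B'E}$ purifies $\theta_{A'B'}$. The local dephasing channels in (15.1.3) lead to
the following state:

\[
(\mathcal{M}_A\otimes\mathcal{M}_B)(|\phi^{\gamma}\rangle\langle\phi^{\gamma}|_{ABA'B'E})
= \frac{1}{K}\sum_{k=0}^{K-1} |k\rangle\langle k|_A \otimes |k\rangle\langle k|_B \otimes U^{kk}_{A'B'} |\psi^{\theta}\rangle\langle\psi^{\theta}|_{A'B'E} (U^{kk}_{A'B'})^{\dagger}. \tag{15.1.16}
\]

Taking a partial trace over the $A'B'$ systems leads to

\begin{align}
&\operatorname{Tr}_{A'B'}\!\left[ \frac{1}{K}\sum_{k=0}^{K-1} |k\rangle\langle k|_A \otimes |k\rangle\langle k|_B \otimes U^{kk}_{A'B'} |\psi^{\theta}\rangle\langle\psi^{\theta}|_{A'B'E} (U^{kk}_{A'B'})^{\dagger} \right] \notag\\
&\quad= \frac{1}{K}\sum_{k=0}^{K-1} |k\rangle\langle k|_A \otimes |k\rangle\langle k|_B \otimes \operatorname{Tr}_{A'B'}\!\left[ U^{kk}_{A'B'} |\psi^{\theta}\rangle\langle\psi^{\theta}|_{A'B'E} (U^{kk}_{A'B'})^{\dagger} \right] \tag{15.1.17}\\
&\quad= \frac{1}{K}\sum_{k=0}^{K-1} |k\rangle\langle k|_A \otimes |k\rangle\langle k|_B \otimes \operatorname{Tr}_{A'B'}\!\left[ (U^{kk}_{A'B'})^{\dagger} U^{kk}_{A'B'} |\psi^{\theta}\rangle\langle\psi^{\theta}|_{A'B'E} \right] \tag{15.1.18}\\
&\quad= \Phi_{AB}\otimes\rho_E. \tag{15.1.19}
\end{align}

Here, $\Phi_{AB}$ is the maximally classically correlated state in (15.1.5), we set
$\rho_E := \operatorname{Tr}_{A'B'}[|\psi^{\theta}\rangle\langle\psi^{\theta}|_{A'B'E}]$, and the second equality uses the cyclicity of the partial
trace with respect to operators acting only on $A'B'$.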

Thus, the particular purification |𝜙 𝛾 ⟩ 𝐴𝐵𝐴′ 𝐵′ 𝐸 leads to a tripartite key state on systems
𝐴𝐵𝐸. Now, in the development above, we chose a particular purification of 𝛾 𝐴𝐵𝐴′ 𝐵′ .
However, given that all purifications are related by isometries acting on the purifying


system, every purification can be written as 𝑉𝐸→𝐸 ′ |𝜙 𝛾 ⟩ 𝐴𝐵𝐴′ 𝐵′ 𝐸 for some isometry
𝑉𝐸→𝐸 ′ . Then repeating the calculation above gives that the reduced state on 𝐴𝐵𝐸 ′
after local dephasing channels on 𝐴 and 𝐵 is

Φ 𝐴𝐵 ⊗ 𝑉𝐸→𝐸 ′ 𝜌 𝐸 (𝑉𝐸→𝐸 ′ ) † , (15.1.20)

so that there is no correlation between the measurement outcomes of Alice and

Bob and the system 𝐸 ′. Furthermore, the measurement outcomes are perfectly
correlated and uniformly random. So we conclude that a state 𝛾 𝐴𝐵𝐴′ 𝐵′ of the form
in (15.1.10) is a bipartite private state.
Conversely, suppose now that 𝛾 𝐴𝐵𝐴′ 𝐵′ is a bipartite private state held by Alice
and Bob, and let |𝜙 𝛾 ⟩ 𝐴𝐵𝐴′ 𝐵′ 𝐸 be a purification of it, with 𝐸 the purifying system.
Expanding the state in the basis of the local measurements of Alice and Bob gives

\[
|\phi^{\gamma}\rangle_{ABA'B'E} = \sum_{i,j=0}^{K-1} \alpha_{i,j}\, |i\rangle_A |j\rangle_B |\phi_{i,j}\rangle_{A'B'E}, \tag{15.1.21}
\]

for some states $|\phi_{i,j}\rangle_{A'B'E}$ and probability amplitudes $\{\alpha_{i,j}\}_{i,j}$. However, in order
for the measurement outcomes of Alice and Bob to be perfectly correlated and
uniformly random, it is necessary that

\[
|\alpha_{i,j}|^2 = \begin{cases} \frac{1}{K} & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases} \tag{15.1.22}
\]

(Any other values for the amplitudes $\alpha_{i,j}$ would lead to a different distribution
upon measurement of the $A$ and $B$ systems.) So the global state should have the
following form:

\[
|\phi^{\gamma}\rangle_{ABA'B'E} = \sum_{i=0}^{K-1} \frac{1}{\sqrt{K}}\, |i\rangle_A |i\rangle_B\, e^{i\varphi_i} |\phi_{i,i}\rangle_{A'B'E}. \tag{15.1.23}
\]
In order for the reduced density operator on 𝐸 to be independent of the measurement
outcomes of Alice and Bob, it is necessary for it to be a fixed state with no dependence
on 𝑖:
Tr 𝐴′ 𝐵′ [|𝜙𝑖,𝑖 ⟩⟨𝜙𝑖,𝑖 | 𝐴′ 𝐵′ 𝐸 ] = 𝜎𝐸 . (15.1.24)
In such a case, all of the states $|\phi_{i,i}\rangle_{A'B'E}$ are purifications of the same state $\sigma_E$,
so that there exists a unitary $U^{i}_{A'B'}$ relating each $|\phi_{i,i}\rangle_{A'B'E}$ to a fixed purification
$|\phi^{\sigma}\rangle_{A'B'E}$ of $\sigma_E$:

\[
e^{i\varphi_i} |\phi_{i,i}\rangle_{A'B'E} = U^{i}_{A'B'} |\phi^{\sigma}\rangle_{A'B'E}. \tag{15.1.25}
\]

Thus, we can write the global state as

\[
\frac{1}{\sqrt{K}}\sum_{i=0}^{K-1} |i\rangle_A |i\rangle_B\, U^{i,i}_{A'B'} |\phi^{\sigma}\rangle_{A'B'E}, \tag{15.1.26}
\]

which is equivalent to

\[
\left( \sum_{i,j=0}^{K-1} |i\rangle\langle i|_A \otimes |j\rangle\langle j|_B \otimes U^{i,j}_{A'B'} \right) \left( |\Phi\rangle_{AB}\otimes|\phi^{\sigma}\rangle_{A'B'E} \right), \tag{15.1.27}
\]

after setting $U^{i,j}_{A'B'} = U^{i}_{A'B'}$ for all $j\in\{0,\ldots,K-1\}$ (there is in fact full freedom
in how the unitary $U^{i,j}_{A'B'}$ is chosen for $i\neq j$). One can now deduce that the reduced
state on systems $ABA'B'$ has the form in (15.1.10). ■
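
The forward direction of the proof, namely (15.1.15) through (15.1.19), can be checked numerically. The sketch below is only an illustration under assumed parameters: the dimensions, the random choices of the twisting unitaries and of the purification of the shield state, and all variable names are hypothetical. The shield A'B' is treated as a single system of dimension dS.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unitary(d):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(z)
    return q

K, dS, dE = 2, 4, 3   # key size, joint shield dimension (A'B'), Eve's dimension

# |psi^theta> on (A'B') ⊗ E, stored as a dS x dE array of amplitudes.
psi = rng.normal(size=(dS, dE)) + 1j * rng.normal(size=(dS, dE))
psi /= np.linalg.norm(psi)

# Only the diagonal blocks U^{kk} of the twisting unitary act on |Phi>, see (15.1.15).
U = [random_unitary(dS) for _ in range(K)]

# Amplitudes of |phi^gamma>, indexed by (key a, key b, shield s, Eve e).
phi_gamma = np.zeros((K, K, dS, dE), dtype=complex)
for k in range(K):
    phi_gamma[k, k] = (U[k] @ psi) / np.sqrt(K)

# Reduced state on A B E after tracing out the shield systems.
gamma_ABE = np.einsum('abse,cdsf->abecdf', phi_gamma, phi_gamma.conj())

# Local dephasing on A and B keeps only the blocks with a = c and b = d.
measured = np.zeros_like(gamma_ABE)
for a in range(K):
    for b in range(K):
        measured[a, b, :, a, b, :] = gamma_ABE[a, b, :, a, b, :]

rho_E = np.einsum('se,sf->ef', psi, psi.conj())   # Tr_{A'B'} of |psi^theta><psi^theta|
phi_bar = np.zeros((K, K, K, K))
for k in range(K):
    phi_bar[k, k, k, k] = 1.0 / K                 # classically correlated state
ideal = np.einsum('abcd,ef->abecdf', phi_bar, rho_E)

print(np.allclose(measured, ideal))
```

Whatever unitaries are drawn, the measured state on ABE should agree with the product form in (15.1.19), since each U^{kk} cancels inside the partial trace over the shield.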

Definition 15.6 𝜺-Approximate Tripartite Key State

Fix 𝜀 ∈ [0, 1]. A state 𝜌 𝐴𝐵𝐸 is an 𝜀-approximate tripartite key state if there
exists a tripartite key state 𝛾 𝐴𝐵𝐸 , as in Definition 15.1, such that

𝐹 (𝜌 𝐴𝐵𝐸 , 𝛾 𝐴𝐵𝐸 ) ≥ 1 − 𝜀. (15.1.28)

Similarly, a state 𝜌 𝐴𝐵𝐴′ 𝐵′ is an 𝜀-approximate bipartite private state if there

exists a bipartite private state 𝛾 𝐴𝐵𝐴′ 𝐵′ , as in Definition 15.4, such that

𝐹 (𝜌 𝐴𝐵𝐴′ 𝐵′ , 𝛾 𝐴𝐵𝐴′ 𝐵′ ) ≥ 1 − 𝜀. (15.1.29)
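
To make the approximation criterion concrete, the following sketch computes the fidelity between a noisy state and an ideal one. It assumes the squared convention for the fidelity, F(ρ, σ) = (Tr[(√ρ σ √ρ)^{1/2}])², which matches (14.A.9) and (14.A.10). The example takes trivial shield and eavesdropper systems, so that the ideal state is simply a maximally entangled state; the white-noise model, the noise weight, and the helper names are illustrative assumptions.

```python
import numpy as np

def sqrt_psd(m):
    """Square root of a positive semi-definite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)
    return (v * np.sqrt(w)) @ v.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = (Tr sqrt( sqrt(rho) sigma sqrt(rho) ))^2."""
    s = sqrt_psd(rho)
    w = np.linalg.eigvalsh(s @ sigma @ s)
    return float(np.sum(np.sqrt(np.clip(w, 0.0, None))) ** 2)

K = 2
phi_vec = np.eye(K).reshape(K * K) / np.sqrt(K)
gamma = np.outer(phi_vec, phi_vec)                  # ideal state (trivial shield)
p = 0.2                                             # assumed noise weight
rho = (1 - p) * gamma + p * np.eye(K * K) / K**2    # hypothetical noisy state
F = fidelity(rho, gamma)
epsilon = 1 - F
print(F, epsilon)
```

Because the ideal state is pure here, the fidelity reduces to the overlap of the noisy state with the maximally entangled vector, which for this noise model equals 1 − p(1 − 1/K²).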

Approximate tripartite key states are in one-to-one correspondence with approx-

imate bipartite private states, as summarized below:

Proposition 15.7
If $\rho_{ABA'B'}$ is an $\varepsilon$-approximate bipartite private state with $K$ key values, then the
state $\rho_{ABE}$ is an $\varepsilon$-approximate tripartite key state with $K$ key values, where
$\rho_{ABE} = \operatorname{Tr}_{A'B'}[\psi^{\rho}_{ABA'B'E}]$ and $\psi^{\rho}_{ABA'B'E}$ is an arbitrary purification of $\rho_{ABA'B'}$.
The converse statement is true as well.

Proof: Suppose that the inequality in (15.1.28) is satisfied. Let $\psi^{\rho}_{ABA'B'E}$ be a
purification of $\rho_{ABE}$. Then by applying Uhlmann’s theorem (Theorem 6.8), there
exists a purification $\gamma_{ABA'B'E}$ of $\gamma_{ABE}$ such that

\[
F(\rho_{ABE},\gamma_{ABE}) = F(\psi^{\rho}_{ABA'B'E},\gamma_{ABA'B'E}). \tag{15.1.30}
\]

Tracing over the $E$ system and applying the data-processing inequality for fidelity
(Theorem 6.9), we conclude that

\[
F(\psi^{\rho}_{ABA'B'},\gamma_{ABA'B'}) \ge 1 - \varepsilon. \tag{15.1.31}
\]

Since $\gamma_{ABE}$ is an ideal tripartite key state and the state $\gamma_{ABA'B'}$ arises from it via
purification and tracing over system $E$, it follows from Definition 15.4 that $\gamma_{ABA'B'}$
is an ideal bipartite private state. In turn, according to Definition 15.6, it follows
that $\psi^{\rho}_{ABA'B'}$ is an $\varepsilon$-approximate bipartite private state.
For the other implication, suppose that the inequality in (15.1.29) is satisfied.
Let $\psi^{\rho}_{ABA'B'E}$ be a purification of $\rho_{ABA'B'}$. By applying Uhlmann’s theorem
(Theorem 6.8), there exists a purification $\gamma_{ABA'B'E}$ of the ideal bipartite private
state $\gamma_{ABA'B'}$ such that

\[
F(\rho_{ABA'B'},\gamma_{ABA'B'}) = F(\psi^{\rho}_{ABA'B'E},\gamma_{ABA'B'E}). \tag{15.1.32}
\]

Tracing over the $A'B'$ systems and applying the data-processing inequality for
fidelity, we conclude that

\[
F(\psi^{\rho}_{ABE},\gamma_{ABE}) \ge 1 - \varepsilon. \tag{15.1.33}
\]

Since $\gamma_{ABA'B'}$ is an ideal bipartite private state and the state $\gamma_{ABE}$ arises from it
via purification and tracing over systems $A'B'$, it follows from Definition 15.4 that
$\gamma_{ABE}$ is an ideal tripartite key state. In turn, according to Definition 15.6, it follows
that $\psi^{\rho}_{ABE}$ is an $\varepsilon$-approximate tripartite key state. ■

15.1.2 Equivalence of Tripartite Key Distillation and Bipartite Private State Distillation

The equivalence between ideal and approximate tripartite key states and bipartite
private states extends further, and it is a correspondence that allows us to consider
secret key distillation in the bipartite picture. To this end, we define a bipartite
private-state distillation protocol, and then we prove the equivalence.
A bipartite private-state distillation protocol for the state $\rho_{AB}$ is defined by the
pair $(K, \mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B A'B'})$, where $K\in\mathbb{N}$ and $\mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B A'B'}$ is an LOCC channel
with $d_{K_A} = d_{K_B} = K$. The key distillation error $p^{b}_{\text{err}}(\mathcal{L}^{\leftrightarrow};\rho_{AB})$ of the protocol is
given in terms of the infidelity, defined as

\[
p^{b}_{\text{err}}(\mathcal{L}^{\leftrightarrow};\rho_{AB}) := \inf_{\gamma_{K_A K_B A'B'}} \left[ 1 - F\big(\gamma_{K_A K_B A'B'},\, \mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B A'B'}(\rho_{AB})\big) \right], \tag{15.1.34}
\]

where the optimization is with respect to every bipartite private state $\gamma_{K_A K_B A'B'}$
such that $d_{K_A} = d_{K_B} = K$.

Definition 15.8 (𝑲, 𝜺) Private-State Distillation Protocol

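A bipartite private-state distillation protocol $(K, \mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B A'B'})$ for the state
$\rho_{AB}$ is called a $(K,\varepsilon)$ protocol, with $\varepsilon\in[0,1]$, if $p^{b}_{\text{err}}(\mathcal{L}^{\leftrightarrow};\rho_{AB}) \le \varepsilon$.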

We now establish the main result of this section, which is the equivalence of
tripartite key distillation and bipartite private-state distillation:

Theorem 15.9
Let 𝐾 ∈ N and 𝜀 ∈ [0, 1]. Let 𝜌 𝐴𝐵 be a bipartite state. There exists a (𝐾, 𝜀)
tripartite key distillation protocol for 𝜌 𝐴𝐵 if and only if there exists a (𝐾, 𝜀)
bipartite private-state distillation protocol for 𝜌 𝐴𝐵 .

Proof: We start by proving that there exists a (𝐾, 𝜀) bipartite private-state distilla-
tion protocol if there exists a (𝐾, 𝜀) tripartite key distillation protocol. Let 𝜓 𝐴𝐵𝐸
be a purification of 𝜌 𝐴𝐵 , let L↔ 𝐴𝐵→𝐾 𝐴 𝐾 𝐵 𝑍 be the LOPC channel realizing the key
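distillation, and let $\gamma_{K_A K_B E Z}$ be a tripartite key state such that

\[
1 - F\big(\gamma_{K_A K_B E Z},\, \mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B Z}(\psi_{ABE})\big) \le \varepsilon. \tag{15.1.35}
\]

The LOPC channel $\mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B Z}$ has the form in (15.1.2), so that

\[
\mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B Z} = \sum_{z\in\mathcal{Z}} \mathcal{E}^{z}_{A\to K_A} \otimes \mathcal{F}^{z}_{B\to K_B} \otimes |z\rangle\langle z|_Z. \tag{15.1.36}
\]

An isometric extension $U^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z}$ of this LOPC channel is as follows:

\[
U^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z} := \sum_{z\in\mathcal{Z}} V^{\mathcal{E}^{z}}_{A\to K_A A'} \otimes V^{\mathcal{F}^{z}}_{B\to K_B B'} \otimes |z\rangle_Z, \tag{15.1.37}
\]

where $\{V^{\mathcal{E}^{z}}_{A\to K_A A'}\}_{z\in\mathcal{Z}}$ and $\{V^{\mathcal{F}^{z}}_{B\to K_B B'}\}_{z\in\mathcal{Z}}$ are sets of linear operators such that
$U^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z}$ is an isometry and

\[
\operatorname{Tr}_{A'B'} \circ\, \mathcal{U}^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z} = \mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B Z}, \tag{15.1.38}
\]

with

\[
\mathcal{U}^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z}(\cdot) := U^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z}\,(\cdot)\,\big(U^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z}\big)^{\dagger}. \tag{15.1.39}
\]

To meet these requirements, note that it is necessary for each $V^{\mathcal{E}^{z}}_{A\to K_A A'}$ and $V^{\mathcal{F}^{z}}_{B\to K_B B'}$
to be a contraction, i.e., satisfying

\[
\big\| V^{\mathcal{E}^{z}}_{A\to K_A A'} \big\|_{\infty},\ \big\| V^{\mathcal{F}^{z}}_{B\to K_B B'} \big\|_{\infty} \le 1. \tag{15.1.40}
\]

It then follows that the state $\mathcal{U}^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z}(\psi_{ABE})$ purifies $\mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B Z}(\psi_{ABE})$,
and by applying Uhlmann’s theorem (Theorem 6.8), there exists a pure state
$\gamma_{K_A K_B A'B'EZ}$ satisfying

\[
F\big(\gamma_{K_A K_B E Z},\, \mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B Z}(\psi_{ABE})\big)
= F\big(\gamma_{K_A K_B A'B'EZ},\, \mathcal{U}^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z}(\psi_{ABE})\big). \tag{15.1.41}
\]

Now applying the same reasoning given in Proposition 15.7, we conclude that the
following inequality holds

\[
1 - F\big(\gamma_{K_A K_B A'B'},\, (\operatorname{Tr}_Z \circ\, \mathcal{U}^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z})(\rho_{AB})\big) \le \varepsilon, \tag{15.1.42}
\]

where $\gamma_{K_A K_B A'B'}$ is an ideal bipartite private state of size $K$. Note that the channel
$\operatorname{Tr}_Z \circ\, \mathcal{U}^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z}$ is an LOCC channel, because it has the following form:

\[
\operatorname{Tr}_Z \circ\, \mathcal{U}^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z} = \sum_{z\in\mathcal{Z}} \mathcal{V}^{\mathcal{E}^{z}}_{A\to K_A A'} \otimes \mathcal{V}^{\mathcal{F}^{z}}_{B\to K_B B'}. \tag{15.1.43}
\]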
𝑧∈Z

Thus, there exists a (𝐾, 𝜀) bipartite private-state distillation protocol if there exists
a (𝐾, 𝜀) tripartite key distillation protocol.
We now prove the opposite implication. Suppose that there exists a (𝐾, 𝜀)
bipartite private-state distillation protocol. Let L↔𝐴𝐵→𝐾 𝐴 𝐾 𝐵 𝐴′ 𝐵′ be the LOCC channel
realizing the private-state distillation, and let 𝛾𝐾 𝐴𝐾 𝐵 𝐴′ 𝐵′ be an ideal bipartite private
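state satisfying

\[
1 - F\big(\gamma_{K_A K_B A'B'},\, \mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B A'B'}(\rho_{AB})\big) \le \varepsilon. \tag{15.1.44}
\]

Let $\psi_{ABE}$ be a purification of $\rho_{AB}$. Suppose that the LOCC channel $\mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B A'B'}$
has the following form:

\[
\mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B A'B'} = \sum_{z\in\mathcal{Z}} \mathcal{E}^{z}_{A\to K_A A'} \otimes \mathcal{F}^{z}_{B\to K_B B'}, \tag{15.1.45}
\]

where $\mathcal{Z}$ is a finite alphabet and $\{\mathcal{E}^{z}_{A\to K_A A'}\}_{z\in\mathcal{Z}}$ and $\{\mathcal{F}^{z}_{B\to K_B B'}\}_{z\in\mathcal{Z}}$ are sets of
completely positive maps such that $\mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B A'B'}$ is trace preserving. Without
loss of generality, we can suppose that each completely positive map $\mathcal{E}^{z}_{A\to K_A A'}$
consists of a single Kraus operator, and we can suppose the same for $\mathcal{F}^{z}_{B\to K_B B'}$ (the
reasoning here is similar to that given in the remark after Definition 3.5). Let us
denote these as $E^{z}_{A\to K_A A'}$ and $F^{z}_{B\to K_B B'}$, respectively. Then an isometric extension
of this LOCC channel is as follows:

\[
U^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z} := \sum_{z\in\mathcal{Z}} E^{z}_{A\to K_A A'} \otimes F^{z}_{B\to K_B B'} \otimes |z\rangle_Z. \tag{15.1.46}
\]

Thus, the state $\mathcal{U}^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z}(\psi_{ABE})$ is a purification of $\mathcal{L}^{\leftrightarrow}_{AB\to K_A K_B A'B'}(\rho_{AB})$.
Applying Uhlmann’s theorem (Theorem 6.8), it follows that there exists a purification
$\gamma_{K_A K_B A'B'EZ}$ of $\gamma_{K_A K_B A'B'}$ satisfying

\[
F\big(\gamma_{K_A K_B A'B'EZ},\, \mathcal{U}^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z}(\psi_{ABE})\big) \ge 1 - \varepsilon. \tag{15.1.47}
\]

Tracing over systems $A'B'$ and applying the data-processing inequality for fidelity,
we conclude that

\[
F\big(\gamma_{K_A K_B EZ},\, (\operatorname{Tr}_{A'B'} \circ\, \mathcal{U}^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z})(\psi_{ABE})\big) \ge 1 - \varepsilon. \tag{15.1.48}
\]

Now by applying the same reasoning in Proposition 15.7, we conclude that the state
$\gamma_{K_A K_B EZ}$ is an ideal tripartite key state. However, the channel

\[
\operatorname{Tr}_{A'B'} \circ\, \mathcal{U}^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z} \tag{15.1.49}
\]

is not necessarily an LOPC channel due to the coherence of the $Z$ system with
the other systems. We can apply a completely dephasing channel
$\Delta_Z(\cdot) := \sum_{z\in\mathcal{Z}} |z\rangle\langle z|_Z (\cdot) |z\rangle\langle z|_Z$ to the $Z$ system, and the fidelity does not decrease under the
action of this channel, implying that

\[
F\big(\Delta_Z(\gamma_{K_A K_B EZ}),\, (\Delta_Z \circ \operatorname{Tr}_{A'B'} \circ\, \mathcal{U}^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z})(\psi_{ABE})\big) \ge 1 - \varepsilon. \tag{15.1.50}
\]

The state $\Delta_Z(\gamma_{K_A K_B EZ})$ is a tripartite key state, and the channel

\[
\Delta_Z \circ \operatorname{Tr}_{A'B'} \circ\, \mathcal{U}^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z} \tag{15.1.51}
\]

is an LOPC channel, being explicitly written as follows:

\[
\Delta_Z \circ \operatorname{Tr}_{A'B'} \circ\, \mathcal{U}^{\mathcal{L}^{\leftrightarrow}}_{AB\to K_A K_B A'B'Z} = \sum_{z\in\mathcal{Z}} \mathcal{E}^{z}_{A\to K_A} \otimes \mathcal{F}^{z}_{B\to K_B} \otimes |z\rangle\langle z|_Z, \tag{15.1.52}
\]

where

\begin{align}
\mathcal{E}^{z}_{A\to K_A}(\cdot) &:= \operatorname{Tr}_{A'}\!\big[ E^{z}_{A\to K_A A'} (\cdot) (E^{z}_{A\to K_A A'})^{\dagger} \big], \tag{15.1.53}\\
\mathcal{F}^{z}_{B\to K_B}(\cdot) &:= \operatorname{Tr}_{B'}\!\big[ F^{z}_{B\to K_B B'} (\cdot) (F^{z}_{B\to K_B B'})^{\dagger} \big]. \tag{15.1.54}
\end{align}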

Thus, we have proven that the existence of a (𝐾, 𝜀) bipartite private-state distil-
lation protocol for 𝜌 𝐴𝐵 implies the existence of a (𝐾, 𝜀) tripartite key distillation
protocol. ■

15.1.2.1 Entanglement Distillation and Secret Key Distillation

The equivalence between tripartite key distillation and bipartite private-state

distillation allows us to relate secret key distillation to entanglement distillation.
Indeed, a maximally entangled state Φ 𝐴𝐵 is a particular kind of bipartite private
state in which the shield systems 𝐴′ 𝐵′ are trivial and the twisting unitary 𝑈 𝐴𝐵𝐴′ 𝐵′
is the identity. This and Theorem 15.9 imply that a (𝐾, 𝜀) entanglement distillation
protocol is a (𝐾, 𝜀) secret-key distillation protocol. However, the converse is not
necessarily true because it is not generally possible to convert a bipartite private
state of size 𝐾 to a maximally entangled state of Schmidt rank 𝐾.
As a consequence of the discussion above, it follows that the one-shot distillable
entanglement of a bipartite state 𝜌 𝐴𝐵 is a lower bound on the one-shot distillable
key of $\rho_{AB}$:

\[
E_D^{\varepsilon}(A;B)_{\rho} \le K_D^{\varepsilon}(A;B)_{\rho} \tag{15.1.55}
\]

for all $\varepsilon\in[0,1]$. Since this relationship holds at the fundamental one-shot level,
it also holds for the asymptotic quantities:

\begin{align}
E_D(A;B)_{\rho} &\le K_D(A;B)_{\rho}, \tag{15.1.56}\\
\widetilde{E}_D(A;B)_{\rho} &\le \widetilde{K}_D(A;B)_{\rho}, \tag{15.1.57}
\end{align}

where $E_D(A;B)_{\rho}$ is the distillable entanglement of $\rho_{AB}$ (Definition 13.15), $K_D(A;B)_{\rho}$
is the distillable key of $\rho_{AB}$ (given later in Definition 15.28), and $\widetilde{E}_D(A;B)_{\rho}$ and
$\widetilde{K}_D(A;B)_{\rho}$ are the strong converse quantities.

15.1.3 Upper Bounds on the Number of Secret-Key Bits

In this section, we provide three different upper bounds on one-shot distillable

key, based on private information, relative entropy of entanglement, and squashed
entanglement.

15.1.3.1 Private Information Upper Bound

Our study of upper bounds on one-shot distillable key begins with the private
information and the following lemma.

Lemma 15.10
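Let $A$ and $B$ be quantum systems with the same dimension $K\in\mathbb{N}$, let $E$ be
another quantum system of arbitrary dimension, and let $\varepsilon\in(0,1)$. Let $\omega_{ABE}$
be an $\varepsilon$-approximate tripartite key state of size $K$, as specified in Definition 15.6,
and let $\omega^{\mathcal{M}}_{ABE} := (\mathcal{M}_A\otimes\mathcal{M}_B)(\omega_{ABE})$, where $\mathcal{M}_A$ and $\mathcal{M}_B$ are the measurement
channels in Definition 15.1. Then the following inequality holds

\[
\log_2 K \le I_H^{\sqrt{\varepsilon}+\delta}(A;B)_{\omega^{\mathcal{M}}} - I_{\max}^{\sqrt{\varepsilon}}(A;E)_{\omega^{\mathcal{M}}} + \log_2\!\left(\frac{1}{\delta}\right), \tag{15.1.58}
\]

where $\delta\in(0,1-\sqrt{\varepsilon})$ and

\[
I_{\max}^{\sqrt{\varepsilon}}(A;E)_{\omega^{\mathcal{M}}} := \inf_{\widetilde{\omega}_{AE}\,:\,P(\widetilde{\omega}_{AE},\,\omega^{\mathcal{M}}_{AE})\le\sqrt{\varepsilon}}\ \inf_{\tau_E}\ D_{\max}(\widetilde{\omega}_{AE}\,\|\,\widetilde{\omega}_A\otimes\tau_E). \tag{15.1.59}
\]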

Proof: Consider that the following condition holds from Definition 15.6:
𝐹 (𝛾 𝐴𝐵𝐸 , 𝜔 𝐴𝐵𝐸 ) ≥ 1 − 𝜀, (15.1.60)
where 𝛾 𝐴𝐵𝐸 is an ideal tripartite key state. Applying the measurement channels
M 𝐴 and M𝐵 from Definition 15.1 and the data-processing inequality for fidelity,
we conclude that
𝐹 (Φ 𝐴𝐵 ⊗ 𝜎𝐸 , (M 𝐴 ⊗ M𝐵 )(𝜔 𝐴𝐵𝐸 )) ≥ 1 − 𝜀. (15.1.61)
Now tracing over system 𝐸 and again applying the data-processing inequality for
fidelity, we conclude that
F(\Phi_{AB}, \omega^{\mathcal{M}}_{AB}) = F(\Phi_{AB}, (\mathcal{M}_A \otimes \mathcal{M}_B)(\omega_{AB})) \geq 1 - \varepsilon. (15.1.62)

Observe that the state $\omega^{\mathcal{M}}_{AB}$ is a classical state and can be written as

\omega^{\mathcal{M}}_{AB} = \sum_{i,j=0}^{K-1} p(i) q(j|i) |i\rangle\langle i|_A \otimes |j\rangle\langle j|_B, (15.1.63)

for a probability distribution 𝑝(𝑖) and a conditional probability distribution 𝑞( 𝑗 |𝑖).

If we perform the comparator test $\{\Pi_{AB}, I_{AB} - \Pi_{AB}\}$ on systems $AB$ of $\omega^{\mathcal{M}}_{AB}$, where

\Pi_{AB} \coloneqq \sum_{i=0}^{K-1} |i\rangle\langle i|_A \otimes |i\rangle\langle i|_B, (15.1.64)

then the probability of passing it is given by

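\operatorname{Tr}[\Pi_{AB} \omega^{\mathcal{M}}_{AB}]
= \operatorname{Tr}\!\left[ \left( \sum_{i'=0}^{K-1} |i'\rangle\langle i'|_A \otimes |i'\rangle\langle i'|_B \right) \left( \sum_{i,j=0}^{K-1} p(i) q(j|i) |i\rangle\langle i|_A \otimes |j\rangle\langle j|_B \right) \right] (15.1.65)
= \sum_{i=0}^{K-1} p(i) q(i|i). (15.1.66)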
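
The computation in (15.1.65) and (15.1.66) can be checked numerically. The following is a minimal sketch, assuming NumPy and a randomly drawn distribution; the function name `comparator_pass_probability` and the arrays `p` and `q` are illustrative choices, not part of the text.

```python
import numpy as np

def comparator_pass_probability(p, q):
    """Tr[Pi_AB omega^M_AB] for the classical state in (15.1.63).

    p[i]    : marginal distribution on system A
    q[i, j] : conditional distribution q(j|i) (each row sums to one)
    """
    K = len(p)
    # omega^M_AB is diagonal in the product basis; entry (i, j) equals p(i) q(j|i).
    omega = np.diag((p[:, None] * q).reshape(K * K))
    # Comparator projector Pi_AB = sum_i |i><i| (x) |i><i| from (15.1.64).
    Pi = np.zeros((K * K, K * K))
    for i in range(K):
        Pi[i * K + i, i * K + i] = 1.0
    return float(np.trace(Pi @ omega))

rng = np.random.default_rng(0)
K = 4
p = rng.dirichlet(np.ones(K))
q = rng.dirichlet(np.ones(K), size=K)   # row i is q(.|i)
lhs = comparator_pass_probability(p, q)
rhs = float(np.sum(p * np.diag(q)))     # sum_i p(i) q(i|i), as in (15.1.66)
```

Here `lhs` is the trace evaluated with explicit matrices and `rhs` is the closed form, so the two should agree up to floating-point error.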

Now consider the following channel:

T 𝐴𝐵 (𝜏𝐴𝐵 ) B Tr[Π 𝐴𝐵 𝜏𝐴𝐵 ]|1⟩⟨1| + Tr[(𝐼 𝐴𝐵 − Π 𝐴𝐵 ) 𝜏𝐴𝐵 ]|0⟩⟨0|, (15.1.67)

which outputs a classical flag register indicating if the comparator test is successful
or not. Consider that

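\mathcal{T}_{AB}(\Phi_{AB}) = |1\rangle\langle 1|, (15.1.68)
\mathcal{T}_{AB}(\omega^{\mathcal{M}}_{AB}) = \left( \sum_{i=0}^{K-1} p(i) q(i|i) \right) |1\rangle\langle 1| + \left( 1 - \sum_{i=0}^{K-1} p(i) q(i|i) \right) |0\rangle\langle 0|. (15.1.69)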

Employing the data-processing inequality for the fidelity and the findings above,
we conclude that

1 - \varepsilon \leq F(\Phi_{AB}, \omega^{\mathcal{M}}_{AB}) (15.1.70)
\leq F(\mathcal{T}_{AB}(\Phi_{AB}), \mathcal{T}_{AB}(\omega^{\mathcal{M}}_{AB})) (15.1.71)
= \sum_{i=0}^{K-1} p(i) q(i|i). (15.1.72)

Thus, we conclude that the probability of passing the comparator test satisfies

\operatorname{Tr}[\Pi_{AB} \omega^{\mathcal{M}}_{AB}] \geq 1 - \varepsilon. (15.1.73)

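Now let $\Pi^{\delta}_A$ be the projection onto the positive eigenspace of $\frac{1}{\delta}\Phi_A - \omega^{\mathcal{M}}_A$, where
$\delta \in (0, 1)$. Consider that

\Pi^{\delta}_A \left( \frac{1}{\delta}\Phi_A - \omega^{\mathcal{M}}_A \right) \Pi^{\delta}_A \geq 0 \implies \Pi^{\delta}_A \omega^{\mathcal{M}}_A \Pi^{\delta}_A \leq \frac{1}{\delta} \Pi^{\delta}_A \Phi_A \Pi^{\delta}_A, (15.1.74)

and

(I_A - \Pi^{\delta}_A) \left( \frac{1}{\delta}\Phi_A - \omega^{\mathcal{M}}_A \right) (I_A - \Pi^{\delta}_A) \leq 0 (15.1.75)
\implies \operatorname{Tr}[(I_A - \Pi^{\delta}_A)\Phi_A] \leq \delta \operatorname{Tr}[(I_A - \Pi^{\delta}_A)\omega^{\mathcal{M}}_A] \leq \delta. (15.1.76)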

The latter inequality can be rewritten as

Tr[Π 𝛿𝐴 Φ 𝐴 ] ≥ 1 − 𝛿. (15.1.77)

Also, let 𝜎𝐵 be an arbitrary state, and consider that

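\operatorname{Tr}[\Pi^{\delta}_A \Pi_{AB} \Pi^{\delta}_A (\omega^{\mathcal{M}}_A \otimes \sigma_B)] = \operatorname{Tr}[\Pi_{AB} (\Pi^{\delta}_A \omega^{\mathcal{M}}_A \Pi^{\delta}_A \otimes \sigma_B)] (15.1.78)
\leq \frac{1}{\delta} \operatorname{Tr}[\Pi_{AB} (\Pi^{\delta}_A \Phi_A \Pi^{\delta}_A \otimes \sigma_B)] (15.1.79)
\leq \frac{1}{\delta} \operatorname{Tr}[\Pi_{AB} (\Phi_A \otimes \sigma_B)] (15.1.80)
= \frac{1}{\delta K} \operatorname{Tr}[\Pi_{AB} (I_A \otimes \sigma_B)] (15.1.81)
= \frac{1}{\delta K}, (15.1.82)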
where the second inequality follows because Π 𝛿𝐴 and Φ 𝐴 commute. Then consider
that

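\operatorname{Tr}[(I_{AB} - \Pi^{\delta}_A \Pi_{AB} \Pi^{\delta}_A) \omega^{\mathcal{M}}_{AB}]
\leq \operatorname{Tr}[(I_{AB} - \Pi^{\delta}_A \Pi_{AB} \Pi^{\delta}_A) \Phi_{AB}] + \frac{1}{2} \left\| \Phi_{AB} - \omega^{\mathcal{M}}_{AB} \right\|_1 (15.1.83)
\leq \operatorname{Tr}[(I_{AB} - \Pi_{AB}) \Phi_{AB}] + \operatorname{Tr}[(I_{AB} - \Pi^{\delta}_A \otimes I_B) \Phi_{AB}] + \frac{1}{2} \left\| \Phi_{AB} - \omega^{\mathcal{M}}_{AB} \right\|_1 (15.1.84)
= \operatorname{Tr}[(I_A - \Pi^{\delta}_A) \Phi_A] + \frac{1}{2} \left\| \Phi_{AB} - \omega^{\mathcal{M}}_{AB} \right\|_1 (15.1.85)
\leq \delta + \sqrt{1 - F(\Phi_{AB}, \omega^{\mathcal{M}}_{AB})} (15.1.86)
\leq \delta + \sqrt{\varepsilon}. (15.1.87)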
The first inequality is a consequence of the variational characterization of the
normalized trace distance from Theorem 6.1. The second inequality is a consequence
of the following union bound for commuting projectors 𝑃 and 𝑄:
𝐼 − 𝑃𝑄𝑃 ≤ 𝐼 − 𝑃 + 𝐼 − 𝑄, (15.1.88)
which in turn follows from (𝐼 − 𝑃) (𝐼 − 𝑄) ≥ 0. The third inequality follows from
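Theorem 6.14 together with (15.1.76), and the last from (15.1.62). As such, the measurement operator
$\Pi^{\delta}_A \Pi_{AB} \Pi^{\delta}_A$ is a particular measurement operator satisfying the constraints given in
the optimization for the hypothesis testing relative entropy $D_H^{\sqrt{\varepsilon}+\delta}(\omega^{\mathcal{M}}_{AB} \| \omega^{\mathcal{M}}_A \otimes \sigma_B)$,
and we thus conclude that
\log_2 \delta + \log_2 K = \log_2 \delta K (15.1.89)
\leq -\log_2 \operatorname{Tr}[\Pi^{\delta}_A \Pi_{AB} \Pi^{\delta}_A (\omega^{\mathcal{M}}_A \otimes \sigma_B)] (15.1.90)
\leq D_H^{\sqrt{\varepsilon}+\delta}(\omega^{\mathcal{M}}_{AB} \| \omega^{\mathcal{M}}_A \otimes \sigma_B). (15.1.91)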
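
The role of the projection $\Pi^{\delta}_A$ can be illustrated in the classical setting of this proof, where $\Phi_A = I_A/K$ and $\omega^{\mathcal{M}}_A$ is diagonal. The following is a minimal sketch, assuming NumPy; the function name `delta_projector_check` and the random test distributions are illustrative assumptions. It evaluates the traces appearing in (15.1.77) and (15.1.78)–(15.1.82).

```python
import numpy as np

def delta_projector_check(p, sigma_diag, delta):
    """Evaluate Tr[Pi^delta_A Phi_A] and Tr[Pi^delta_A Pi_AB Pi^delta_A (omega_A (x) sigma_B)].

    p          : diagonal of the classical marginal omega^M_A
    sigma_diag : diagonal of an arbitrary state sigma_B in the same basis
    delta      : parameter in (0, 1)
    """
    K = len(p)
    # Phi_A = I/K, so (1/delta) Phi_A - omega^M_A is diagonal with entries
    # 1/(delta K) - p_i; the projector keeps the strictly positive entries.
    keep = (1.0 / (delta * K) - p) > 0
    tr_pi_phi = keep.sum() / K
    # For the comparator projector, Tr[Pi_AB (X_A (x) sigma_B)] = sum_i <i|X|i><i|sigma|i>.
    tr_test = float(np.sum(p[keep] * sigma_diag[keep]))
    return tr_pi_phi, tr_test

rng = np.random.default_rng(0)
K, delta = 8, 0.3
p = rng.dirichlet(0.3 * np.ones(K))      # a fairly peaked distribution
sigma_diag = rng.dirichlet(np.ones(K))
tr_pi_phi, tr_test = delta_projector_check(p, sigma_diag, delta)
```

In this sketch `tr_pi_phi` should be at least $1 - \delta$, matching (15.1.77), and `tr_test` should be at most $1/(\delta K)$, matching (15.1.82).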
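Since the bound holds for every state $\sigma_B$, we conclude that

\log_2 K \leq I_H^{\sqrt{\varepsilon}+\delta}(A;B)_{\omega^{\mathcal{M}}} + \log_2\!\left(\frac{1}{\delta}\right). (15.1.92)

Now we aim to show that

I_{\max}^{\sqrt{\varepsilon}}(A;E)_{\omega^{\mathcal{M}}} \leq 0. (15.1.93)

Consider that (15.1.61) implies that

F(\Phi_A \otimes \sigma_E, \omega^{\mathcal{M}}_{AE}) \geq 1 - \varepsilon. (15.1.94)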

Thus, the state $\Phi_A \otimes \sigma_E$ is such that

P(\Phi_A \otimes \sigma_E, \omega^{\mathcal{M}}_{AE}) \leq \sqrt{\varepsilon}. (15.1.95)

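Then

I_{\max}^{\sqrt{\varepsilon}}(A;E)_{\omega^{\mathcal{M}}} = \inf_{\widetilde{\omega}_{AE} : P(\widetilde{\omega}_{AE}, \omega^{\mathcal{M}}_{AE}) \leq \sqrt{\varepsilon}} \ \inf_{\tau_E} D_{\max}(\widetilde{\omega}_{AE} \| \widetilde{\omega}_A \otimes \tau_E) (15.1.96)
\leq D_{\max}(\Phi_A \otimes \sigma_E \| \Phi_A \otimes \sigma_E) (15.1.97)
= 0, (15.1.98)

where the inequality follows from the choices $\widetilde{\omega}_{AE} = \Phi_A \otimes \sigma_E$ and $\tau_E = \sigma_E$, which are feasible by (15.1.95),
and the fact that $D_{\max}(\rho \| \rho) = 0$ for every state $\rho$. ■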

Note that the result of Lemma 15.10 is general and applies to every tripartite
state that is close in fidelity to an ideal tripartite key state. Applying it to the state
$\omega_{K_A K_B E Z} = \mathcal{L}^{\leftrightarrow}_{AB \to K_A K_B E Z}(\psi_{ABE})$ that is the final output of a $(K, \varepsilon)$ tripartite key
distillation protocol for a state $\rho_{AB}$ with purification $\psi_{ABE}$, we obtain the following
result:

Theorem 15.11 Upper Bound on One-Shot Distillable Key

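Let $\rho_{AB}$ be a bipartite state with purification $\psi_{ABE}$. For every $(K, \varepsilon)$ tripartite
key distillation protocol $(K, \mathcal{L}^{\leftrightarrow}_{AB \to K_A K_B E Z})$ for $\psi_{ABE}$, with $\varepsilon \in (0, 1)$ and
$d_{K_A} = d_{K_B} = K$, the number of $\varepsilon$-approximate secret-key bits extracted at the
end of the protocol is bounded from above by the LOPC-optimized private
information of $\rho_{AB}$, i.e.,

\log_2 K \leq \sup_{\mathcal{L}} \left[ I_H^{\sqrt{\varepsilon}+\delta}(X;B')_{\mathcal{L}(\psi)} - I_{\max}^{\sqrt{\varepsilon}}(X;EZ)_{\mathcal{L}(\psi)} \right] + \log_2\!\left(\frac{1}{\delta}\right), (15.1.99)

where $\delta \in (0, 1 - \sqrt{\varepsilon})$ and the optimization is over every LOPC channel
$\mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}$, where $X$ and $Z$ are classical systems. Consequently, for the
one-shot $\varepsilon$-distillable key, the following bound holds:

K_D^{\varepsilon}(A;B)_\rho \leq \sup_{\mathcal{L}} \left[ I_H^{\sqrt{\varepsilon}+\delta}(X;B')_{\mathcal{L}(\psi)} - I_{\max}^{\sqrt{\varepsilon}}(X;EZ)_{\mathcal{L}(\psi)} \right] + \log_2\!\left(\frac{1}{\delta}\right). (15.1.100)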

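Proof: For a $(K, \varepsilon)$ tripartite key distillation protocol $(K, \mathcal{L}^{\leftrightarrow}_{AB \to K_A K_B Z})$ for $\psi_{ABE}$,
by definition the state $\omega_{K_A K_B E Z} = \mathcal{L}^{\leftrightarrow}_{AB \to K_A K_B Z}(\psi_{ABE})$ satisfies

F(\gamma_{K_A K_B E Z}, \omega_{K_A K_B E Z}) \geq 1 - \varepsilon, (15.1.101)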

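where $\gamma_{K_A K_B E Z}$ is an ideal tripartite key state. Upon performing the local measurements
$\mathcal{M}_{K_A}$ and $\mathcal{M}_{K_B}$ mentioned in Definition 15.1, we conclude that

F(\Phi_{K_A K_B} \otimes \sigma_{EZ}, (\mathcal{M}_{K_A} \otimes \mathcal{M}_{K_B})(\omega_{K_A K_B E Z})) \geq 1 - \varepsilon. (15.1.102)

Set
\omega^{\mathcal{M}}_{K_A K_B E Z} \coloneqq (\mathcal{M}_{K_A} \otimes \mathcal{M}_{K_B})(\omega_{K_A K_B E Z}). (15.1.103)
Therefore, using (15.1.58), we conclude that

\log_2 K \leq I_H^{\sqrt{\varepsilon}+\delta}(K_A;K_B)_{\omega^{\mathcal{M}}} - I_{\max}^{\sqrt{\varepsilon}}(K_A;EZ)_{\omega^{\mathcal{M}}} + \log_2\!\left(\frac{1}{\delta}\right), (15.1.104)

where $\delta \in (0, 1 - \sqrt{\varepsilon})$. Since $(\mathcal{M}_{K_A} \otimes \mathcal{M}_{K_B}) \circ \mathcal{L}^{\leftrightarrow}_{AB \to K_A K_B Z}$ is a particular LOPC
channel of the form $\mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}$, with $X$ and $Z$ classical systems, we conclude that

I_H^{\sqrt{\varepsilon}+\delta}(K_A;K_B)_{\omega^{\mathcal{M}}} - I_{\max}^{\sqrt{\varepsilon}}(K_A;EZ)_{\omega^{\mathcal{M}}}
\leq \sup_{\mathcal{L}} \left[ I_H^{\sqrt{\varepsilon}+\delta}(X;B')_{\mathcal{L}(\psi)} - I_{\max}^{\sqrt{\varepsilon}}(X;EZ)_{\mathcal{L}(\psi)} \right]. (15.1.105)

We thus conclude (15.1.99). Now employing the definition of the one-shot $\varepsilon$-distillable
key in (15.1.9), we conclude (15.1.100). ■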

15.1.3.2 Relative Entropy of Entanglement Upper Bound

We now consider an upper bound based on the relative entropy of entanglement
(Section 9.2). In order to place an upper bound on the one-shot distillable key
$K_D^{\varepsilon}(A;B)_\rho$ for a given state $\rho_{AB}$ and $\varepsilon \in [0, 1]$, we consider states that are useless
for key distillation. This approach is analogous to what we did previously for
entanglement distillation in Section 13.1.1.
Which states are useless for key distillation? Suppose that a state $\sigma_{AB}$ is
separable, so that it can be written as

\sigma_{AB} = \sum_{x \in \mathcal{X}} p(x) \psi^x_A \otimes \varphi^x_B, (15.1.106)

where $\mathcal{X}$ is a finite alphabet, $p : \mathcal{X} \to [0, 1]$ is a probability distribution, and
$\{\psi^x_A\}_{x \in \mathcal{X}}$ and $\{\varphi^x_B\}_{x \in \mathcal{X}}$ are sets of pure states. Consistent with the model of key
distillation that we have discussed so far, the eavesdropper Eve is allowed to have

access to the purifying system 𝐸 of a purification of 𝜎𝐴𝐵 , which in this case can be
chosen as follows:

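\psi_{ABE} = |\psi\rangle\langle\psi|_{ABE}, (15.1.107)
|\psi\rangle_{ABE} \coloneqq \sum_{x \in \mathcal{X}} \sqrt{p(x)} \, |\psi^x\rangle_A \otimes |\varphi^x\rangle_B \otimes |x\rangle_E. (15.1.108)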

Then Eve can measure the system 𝐸 and obtain an outcome 𝑥 ∈ X with probability
𝑝(𝑥), and the resulting state of Alice and Bob is the product state 𝜓 𝑥𝐴 ⊗ 𝜑𝑥𝐵 . Being
a product state, the resulting state 𝜓 𝑥𝐴 ⊗ 𝜑𝑥𝐵 of Alice and Bob has no correlation
whatsoever and so cannot be used to generate a secret key. If Alice and Bob attempt
to process this state using LOCC, the same problem arises. In the model of key
distillation that we assume, Eve gets a copy of all classical data exchanged between
Alice and Bob, and so the resulting state is still a product state and is useless for
generating a secret key. As such, all separable states are useless for key distillation.
The intuition above is useful for reasoning about key distillation, but there
is a way to make it precise by means of a construct called the “privacy test.” In
doing so, we exploit the equivalence between tripartite key distillation and bipartite
private-state distillation discussed in Section 15.1.2 and identified in Theorem 15.9.
The privacy test is analogous to the entanglement test used in Chapter 13, which
we used to establish upper bounds on the number of approximate ebits that can
be generated in an entanglement distillation protocol. Here we define a “privacy
test” as a method for testing whether a given bipartite state is private. It forms an
essential component in Proposition 15.15, which states that the 𝜀-relative entropy of
entanglement is an upper bound on the number of private bits in an 𝜀-approximate
bipartite private state.

Definition 15.12 Privacy Test

Let 𝛾 𝐴𝐵𝐴′ 𝐵′ be a bipartite private state as given in Definition 15.4. A privacy
test corresponding to 𝛾 𝐴𝐵𝐴′ 𝐵′ (a 𝛾-privacy test) is defined as the following
measurement:
{Π 𝐴𝐵𝐴′ 𝐵′ , 𝐼 𝐴𝐵𝐴′ 𝐵′ − Π 𝐴𝐵𝐴′ 𝐵′ } , (15.1.109)
where
\Pi_{ABA'B'} \coloneqq U_{ABA'B'} (\Phi_{AB} \otimes I_{A'B'}) U^{\dagger}_{ABA'B'} (15.1.110)
and $U_{ABA'B'}$ is the twisting unitary specified in (15.1.12).


If one has access to the systems 𝐴𝐵𝐴′ 𝐵′ of a bipartite state 𝜌 𝐴𝐵𝐴′ 𝐵′ and has a
description of 𝛾 𝐴𝐵𝐴′ 𝐵′ satisfying (15.1.29), then the 𝛾-privacy test decides whether
𝜌 𝐴𝐵𝐴′ 𝐵′ is a private state with respect to 𝛾 𝐴𝐵𝐴′ 𝐵′ . The first outcome corresponds to
the decision “yes, it is a 𝛾-private state,” and the second outcome corresponds to
“no.” Physically, this test is just untwisting the purported private state and projecting
onto a maximally entangled state. The following lemma states that the probability
for an 𝜀-approximate bipartite private state to pass the 𝛾-privacy test is not smaller
than 1 − 𝜀:

Lemma 15.13
Let 𝜀 ∈ [0, 1] and let 𝜌 𝐴𝐵𝐴′ 𝐵′ be an 𝜀-approximate private state as given in
Definition 15.6, with 𝛾 𝐴𝐵𝐴′ 𝐵′ satisfying (15.1.29). The probability for 𝜌 𝐴𝐵𝐴′ 𝐵′
to pass the 𝛾-privacy test is never smaller than 1 − 𝜀:

Tr[Π 𝐴𝐵𝐴′ 𝐵′ 𝜌 𝐴𝐵𝐴′ 𝐵′ ] ≥ 1 − 𝜀, (15.1.111)

where Π 𝐴𝐵𝐴′ 𝐵′ is defined in (15.1.110).

Proof: One can see this bound explicitly by inspecting the following steps:
\operatorname{Tr}[\Pi_{ABA'B'} \rho_{ABA'B'}]
= \operatorname{Tr}[U_{ABA'B'} (\Phi_{AB} \otimes I_{A'B'}) U^{\dagger}_{ABA'B'} \rho_{ABA'B'}] (15.1.112)
= \operatorname{Tr}[(\Phi_{AB} \otimes I_{A'B'}) U^{\dagger}_{ABA'B'} \rho_{ABA'B'} U_{ABA'B'}] (15.1.113)
= \langle\Phi|_{AB} \operatorname{Tr}_{A'B'}[U^{\dagger}_{ABA'B'} \rho_{ABA'B'} U_{ABA'B'}] |\Phi\rangle_{AB} (15.1.114)
= F(\Phi_{AB}, \operatorname{Tr}_{A'B'}[U^{\dagger}_{ABA'B'} \rho_{ABA'B'} U_{ABA'B'}]) (15.1.115)
\geq F(\Phi_{AB} \otimes \theta_{A'B'}, U^{\dagger}_{ABA'B'} \rho_{ABA'B'} U_{ABA'B'}) (15.1.116)
= F(U_{ABA'B'} (\Phi_{AB} \otimes \theta_{A'B'}) U^{\dagger}_{ABA'B'}, \rho_{ABA'B'}) (15.1.117)
= F(\gamma_{ABA'B'}, \rho_{ABA'B'}) (15.1.118)
\geq 1 - \varepsilon. (15.1.119)
The first equality follows from the definition of $\Pi_{ABA'B'}$ in (15.1.110), and the second
from cyclicity of the trace. The third equality follows because $\Phi_{AB}$ is pure and by applying
the definition of partial trace (over $A'B'$). The fourth equality follows from the expression
in (6.2.2), for the fidelity between a pure state and a mixed state. The first
inequality follows from the data-processing inequality for fidelity. The second-
to-last equality follows from the unitary invariance of the fidelity, and the last
equality follows because $\gamma_{ABA'B'}$ is an ideal private state, written as
$\gamma_{ABA'B'} = U_{ABA'B'} (\Phi_{AB} \otimes \theta_{A'B'}) U^{\dagger}_{ABA'B'}$. The final inequality holds because
$\rho_{ABA'B'}$ is an $\varepsilon$-approximate private state. ■

On the other hand, a separable state 𝜎𝐴𝐵𝐴′ 𝐵′ ∈ SEP( 𝐴𝐴′ : 𝐵𝐵′) of the key and
shield systems has a small chance of passing an arbitrary 𝛾-privacy test:

Lemma 15.14
For a separable state $\sigma_{ABA'B'} \in \operatorname{SEP}(AA' : BB')$, the probability of passing an
arbitrary $\gamma$-privacy test is not larger than $\frac{1}{K}$:

\operatorname{Tr}[\Pi_{ABA'B'} \sigma_{ABA'B'}] \leq \frac{1}{K}, (15.1.120)

where $K$ is the number of values that the secret key can take (i.e., $K = d_A = d_B$).

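Proof: The idea is to begin by establishing the bound for an arbitrary pure product
state $|\phi\rangle_{AA'} \otimes |\varphi\rangle_{BB'}$, i.e., to show that

\operatorname{Tr}[\Pi_{ABA'B'} (|\phi\rangle\langle\phi|_{AA'} \otimes |\varphi\rangle\langle\varphi|_{BB'})] \leq \frac{1}{K}. (15.1.121)

We can expand these states with respect to the standard bases of $A$ and $B$ as follows:

|\phi\rangle_{AA'} \otimes |\varphi\rangle_{BB'} = \left[ \sum_{i=1}^{K} \alpha_i |i\rangle_A \otimes |\phi_i\rangle_{A'} \right] \otimes \left[ \sum_{j=1}^{K} \beta_j |j\rangle_B \otimes |\varphi_j\rangle_{B'} \right], (15.1.122)

where $\sum_{i=1}^{K} |\alpha_i|^2 = \sum_{j=1}^{K} |\beta_j|^2 = 1$. We then find that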

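\operatorname{Tr}[\Pi_{ABA'B'} (|\phi\rangle\langle\phi|_{AA'} \otimes |\varphi\rangle\langle\varphi|_{BB'})]
= \operatorname{Tr}[U_{ABA'B'} (\Phi_{AB} \otimes I_{A'B'}) U^{\dagger}_{ABA'B'} (|\phi\rangle\langle\phi|_{AA'} \otimes |\varphi\rangle\langle\varphi|_{BB'})] (15.1.123)
= \left\| (\langle\Phi|_{AB} \otimes I_{A'B'}) U^{\dagger}_{ABA'B'} (|\phi\rangle_{AA'} \otimes |\varphi\rangle_{BB'}) \right\|_2^2 (15.1.124)
= \left\| \left( \frac{1}{\sqrt{K}} \sum_{i=1}^{K} \langle i|_A \otimes \langle i|_B \otimes U^{ii\dagger}_{A'B'} \right) \left( \sum_{i',j'=1}^{K} \alpha_{i'} \beta_{j'} |i'\rangle_A \otimes |j'\rangle_B \otimes |\phi_{i'}\rangle_{A'} |\varphi_{j'}\rangle_{B'} \right) \right\|_2^2 (15.1.125)
= \frac{1}{K} \left\| \sum_{i,i',j'=1}^{K} \alpha_{i'} \beta_{j'} \langle i|i'\rangle_A \langle i|j'\rangle_B \, U^{ii\dagger}_{A'B'} |\phi_{i'}\rangle_{A'} |\varphi_{j'}\rangle_{B'} \right\|_2^2 (15.1.126)
= \frac{1}{K} \left\| \sum_{i=1}^{K} \alpha_i \beta_i U^{ii\dagger}_{A'B'} |\phi_i\rangle_{A'} |\varphi_i\rangle_{B'} \right\|_2^2 (15.1.127)
= \frac{1}{K} \left\| \sum_{i=1}^{K} \alpha_i \beta_i |\xi_i\rangle_{A'B'} \right\|_2^2 (15.1.128)
= \frac{1}{K} \sum_{i,j=1}^{K} \alpha_i \beta_i \alpha_j^* \beta_j^* \langle\xi_j|\xi_i\rangle_{A'B'}, (15.1.129)

where $|\xi_i\rangle_{A'B'} \coloneqq (U^{ii}_{A'B'})^{\dagger} |\phi_i\rangle_{A'} |\varphi_i\rangle_{B'}$ is a quantum state. The desired bound in
(15.1.121) is then equivalent to

\sum_{i,j=1}^{K} \alpha_i \beta_i \alpha_j^* \beta_j^* \langle\xi_j|\xi_i\rangle_{A'B'} \leq 1. (15.1.130)

Setting $\alpha_i = \sqrt{p_i} e^{i\theta_i}$ and $\beta_i = \sqrt{q_i} e^{i\eta_i}$, we find that

\sum_{i,j=1}^{K} \alpha_i \beta_i \alpha_j^* \beta_j^* \langle\xi_j|\xi_i\rangle_{A'B'} = \sum_{i,j=1}^{K} \sqrt{p_i q_i p_j q_j} \, e^{i(\theta_i + \eta_i - \theta_j - \eta_j)} \langle\xi_j|\xi_i\rangle_{A'B'} (15.1.131)
\leq \sum_{i,j=1}^{K} \sqrt{p_i q_i p_j q_j} \, \left| \langle\xi_j|\xi_i\rangle_{A'B'} \right| (15.1.132)
\leq \sum_{i,j=1}^{K} \sqrt{p_i q_i p_j q_j} (15.1.133)
= \left[ \sum_{i=1}^{K} \sqrt{p_i q_i} \right]^2 \leq 1, (15.1.134)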
where the last inequality holds for all probability distributions (this is just the
statement that the classical fidelity cannot exceed one). The above reasoning thus
establishes (15.1.120) for pure product states, and the bound for general separable
states follows because every such state can be written as a convex combination of
pure product states. ■
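
Lemmas 15.13 and 15.14 can be illustrated numerically. The following is a minimal sketch, assuming NumPy, small dimensions, and a randomly generated twisting unitary; all function names (`random_unitary`, `random_pure`, `privacy_test_projector`) and the system ordering $A, B, A'B'$ are assumptions made for the illustration. It checks that random pure product states across the $AA' : BB'$ cut pass the privacy test with probability at most $1/K$, while an ideal private state passes with probability one.

```python
import numpy as np

def random_unitary(d, rng):
    # QR decomposition of a complex Gaussian matrix gives a unitary factor.
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    qmat, _ = np.linalg.qr(z)
    return qmat

def random_pure(d, rng):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def privacy_test_projector(K, d_shield, rng):
    """Build Pi = U (Phi_AB (x) I_{A'B'}) U^dagger with ordering A, B, A'B'."""
    dim = K * K * d_shield
    # Twisting unitary: block diagonal, one unitary U^{ij} per key pair (i, j).
    U = np.zeros((dim, dim), dtype=complex)
    for i in range(K):
        for j in range(K):
            s = (i * K + j) * d_shield
            U[s:s + d_shield, s:s + d_shield] = random_unitary(d_shield, rng)
    phi = np.zeros(K * K, dtype=complex)
    for i in range(K):
        phi[i * K + i] = 1.0 / np.sqrt(K)
    Phi = np.outer(phi, phi.conj())          # maximally entangled state on AB
    Pi = U @ np.kron(Phi, np.eye(d_shield)) @ U.conj().T
    return Pi, U, Phi

rng = np.random.default_rng(1)
K, dA1, dB1 = 3, 2, 2                        # key dimension and shield dimensions
Pi, U, Phi = privacy_test_projector(K, dA1 * dB1, rng)

max_pass = 0.0
for _ in range(200):
    phi_AA1 = random_pure(K * dA1, rng).reshape(K, dA1)
    var_BB1 = random_pure(K * dB1, rng).reshape(K, dB1)
    # Reorder the product state from (A A')(B B') to A, B, A', B'.
    psi = np.einsum('ac,bd->abcd', phi_AA1, var_BB1).reshape(-1)
    max_pass = max(max_pass, float(np.real(psi.conj() @ Pi @ psi)))

theta = np.eye(dA1 * dB1) / (dA1 * dB1)      # an arbitrary shield state
gamma = U @ np.kron(Phi, theta) @ U.conj().T # ideal private state
ideal_pass = float(np.real(np.trace(Pi @ gamma)))
```

Random sampling cannot prove the bound, but `max_pass` staying below $1/K$ over many trials is consistent with (15.1.120), and `ideal_pass` equal to one matches Lemma 15.13 with $\varepsilon = 0$.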

Recall from (9.2.3) that the $\varepsilon$-relative entropy of entanglement of a bipartite
state $\rho_{AB}$ is defined as

E_R^{\varepsilon}(A;B)_\rho \coloneqq \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} D_H^{\varepsilon}(\rho_{AB} \| \sigma_{AB}). (15.1.135)


This quantity is an LOCC monotone, meaning that

𝐸 𝑅𝜀 ( 𝐴; 𝐵) 𝜌 ≥ 𝐸 𝑅𝜀 ( 𝐴′; 𝐵′)𝜔 , (15.1.136)

for 𝜔 𝐴′ 𝐵′ B L 𝐴𝐵→𝐴′ 𝐵′ (𝜌 𝐴𝐵 ), with L 𝐴𝐵→𝐴′ 𝐵′ an LOCC channel.

Proposition 15.15
Fix 𝜀 ∈ [0, 1]. Let 𝜌 𝐴𝐵𝐴′ 𝐵′ be an 𝜀-approximate bipartite private state, as given
in Definition 15.6. Then the number log2 𝐾 of private bits in such a state is
bounded from above by the 𝜀-relative entropy of entanglement of 𝜌 𝐴𝐵𝐴′ 𝐵′ :

log2 𝐾 ≤ 𝐸 𝑅𝜀 ( 𝐴𝐴′; 𝐵𝐵′) 𝜌 . (15.1.137)

Proof: Let 𝜎𝐴𝐵𝐴′ 𝐵′ be an arbitrary separable state in SEP( 𝐴𝐴′ : 𝐵𝐵′). From
Definition 15.6 and Lemma 15.13, we conclude that the 𝛾-privacy test Π 𝐴𝐵𝐴′ 𝐵′
from (15.1.110) is a particular measurement operator satisfying the constraint
Tr[Π 𝐴𝐵𝐴′ 𝐵′ 𝜌 𝐴𝐵𝐴′ 𝐵′ ] ≥ 1 − 𝜀 for 𝛽𝜀 (𝜌 𝐴𝐵𝐴′ 𝐵′ ∥𝜎𝐴𝐵𝐴′ 𝐵′ ). Applying Lemma 15.14
and the definition of $\beta_\varepsilon$, we conclude that

\beta_\varepsilon(\rho_{ABA'B'} \| \sigma_{ABA'B'}) \leq \operatorname{Tr}[\Pi_{ABA'B'} \sigma_{ABA'B'}] \leq \frac{1}{K}. (15.1.138)

Since the inequality holds for all separable states $\sigma_{ABA'B'} \in \operatorname{SEP}(AA' : BB')$, we
conclude that

\sup_{\sigma_{ABA'B'} \in \operatorname{SEP}(AA':BB')} \beta_\varepsilon(\rho_{ABA'B'} \| \sigma_{ABA'B'}) \leq \frac{1}{K}. (15.1.139)

Applying a negative logarithm and the definition in (15.1.135), we arrive at the
inequality in (15.1.137). ■

A consequence of Proposition 15.15 is the following upper bound on the
one-shot distillable key of $\rho_{AB}$:

Theorem 15.16 Relative Entropy of Entanglement Upper Bound on One-Shot Distillable Key
Let 𝜌 𝐴𝐵 be a bipartite state. For every (𝐾, 𝜀) secret-key distillation protocol
for 𝜌 𝐴𝐵 , with 𝜀 ∈ [0, 1], we have that

log2 𝐾 ≤ 𝐸 𝑅𝜀 ( 𝐴; 𝐵) 𝜌 . (15.1.140)

Consequently, for the one-shot distillable key, we have

𝐾 𝐷𝜀 ( 𝐴; 𝐵) 𝜌 ≤ 𝐸 𝑅𝜀 ( 𝐴; 𝐵) 𝜌 , (15.1.141)

for every state 𝜌 𝐴𝐵 and 𝜀 ∈ [0, 1].

Proof: By Theorem 15.9, we can work in the picture of bipartite private-state
distillation. Consider a $(K, \varepsilon)$ private-state distillation protocol for $\rho_{AB}$ with the
corresponding LOCC channel $\mathcal{L}_{AB \to K_A K_B A'B'}$. Then, by definition, we have that

1 - F(\gamma_{K_A K_B A'B'}, \mathcal{L}_{AB \to K_A K_B A'B'}(\rho_{AB})) \leq \varepsilon (15.1.142)

for some ideal bipartite private state $\gamma_{K_A K_B A'B'}$ of size $K$. Letting
$\omega_{K_A K_B A'B'} \coloneqq \mathcal{L}_{AB \to K_A K_B A'B'}(\rho_{AB})$, we have that $1 - F(\gamma_{K_A K_B A'B'}, \omega_{K_A K_B A'B'}) \leq \varepsilon$. The output
state $\omega_{K_A K_B A'B'}$ of the private-state distillation protocol therefore satisfies the
conditions of Proposition 15.15, which means that

\log_2 K \leq E_R^{\varepsilon}(K_A A'; K_B B')_\omega. (15.1.143)

Now, as mentioned above and discussed in Section 9.2, $E_R^{\varepsilon}$ is an entanglement
measure. Thus, it satisfies the data-processing inequality under LOCC channels,
which means that $E_R^{\varepsilon}(K_A A'; K_B B')_\omega \leq E_R^{\varepsilon}(A;B)_\rho$. We thus have $\log_2 K \leq
E_R^{\varepsilon}(A;B)_\rho$. Since this inequality holds for all $K \in \mathbb{N}$ and for every LOCC
channel $\mathcal{L}_{AB \to K_A K_B A'B'}$, by definition of the one-shot $\varepsilon$-distillable key, we obtain
$K_D^{\varepsilon}(A;B)_\rho \leq E_R^{\varepsilon}(A;B)_\rho$, as required. ■

We then find the following upper bounds on the distillable key available in
(𝐾, 𝜀) key distillation protocols:


Corollary 15.17
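Let $\rho_{AB}$ be a bipartite state, and let $\varepsilon \in [0, 1)$. For every $(K, \varepsilon)$ secret-key
distillation protocol for $\rho_{AB}$, we have that

(1 - 2\sqrt{\varepsilon} - \delta) \log_2 K \leq \sup_{\mathcal{L}^{\leftrightarrow}} \left[ I(X;B')_{\mathcal{L}^{\leftrightarrow}(\psi)} - I(X;EZ)_{\mathcal{L}^{\leftrightarrow}(\psi)} \right]
+ h_2(\sqrt{\varepsilon} + \delta) + (1 - \sqrt{\varepsilon} - \delta) \log_2\!\left(\frac{1}{\delta}\right) + 2 g_2(\sqrt{\varepsilon}), (15.1.144)

where $\delta \in (0, 1 - \sqrt{\varepsilon})$, $\psi_{ABE}$ is a purification of $\rho_{AB}$, the information quantities
are evaluated on the state $\mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}(\psi_{ABE})$, and the optimization is over every
LOPC channel $\mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}$ with classical systems $X$ and $Z$. The following
bound holds for all $\alpha > 1$:

\log_2 K \leq \widetilde{E}_\alpha(A;B)_\rho + \frac{\alpha}{\alpha - 1} \log_2\!\left(\frac{1}{1 - \varepsilon}\right), (15.1.145)

where

\widetilde{E}_\alpha(A;B)_\rho = \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} \widetilde{D}_\alpha(\rho_{AB} \| \sigma_{AB}) (15.1.146)

is the sandwiched Rényi relative entropy of entanglement (see (9.2.4)).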

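Proof: Employing the same reasoning that led to (15.1.92) and (15.1.98), consider
that the following bounds hold for a given $(K, \varepsilon)$ secret-key distillation protocol:

\log_2 K \leq I_H^{\sqrt{\varepsilon}+\delta}(K_A;K_B)_{\omega^{\mathcal{M}}} + \log_2\!\left(\frac{1}{\delta}\right), (15.1.147)
I_{\max}^{\sqrt{\varepsilon}}(K_A;EZ)_{\omega^{\mathcal{M}}} \leq 0, (15.1.148)

where $\delta \in (0, 1 - \sqrt{\varepsilon})$. Consider from Proposition 7.70 that

I_H^{\sqrt{\varepsilon}+\delta}(K_A;K_B)_{\omega^{\mathcal{M}}} \leq \frac{1}{1 - \sqrt{\varepsilon} - \delta} \left[ I(K_A;K_B)_{\omega^{\mathcal{M}}} + h_2(\sqrt{\varepsilon} + \delta) \right]. (15.1.149)

Combining (15.1.147) and (15.1.149), we obtain

(1 - \sqrt{\varepsilon} - \delta) \log_2 K \leq I(K_A;K_B)_{\omega^{\mathcal{M}}} + h_2(\sqrt{\varepsilon} + \delta) + (1 - \sqrt{\varepsilon} - \delta) \log_2\!\left(\frac{1}{\delta}\right). (15.1.150)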

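Also, we have that

I_{\max}^{\sqrt{\varepsilon}}(K_A;EZ)_{\omega^{\mathcal{M}}}
= \inf_{\widetilde{\omega}_{K_A EZ} : P(\widetilde{\omega}_{K_A EZ}, \omega^{\mathcal{M}}_{K_A EZ}) \leq \sqrt{\varepsilon}} \ \inf_{\tau_{EZ}} D_{\max}(\widetilde{\omega}_{K_A EZ} \| \widetilde{\omega}_{K_A} \otimes \tau_{EZ}) (15.1.151)
\geq \inf_{\widetilde{\omega}_{K_A EZ} : P(\widetilde{\omega}_{K_A EZ}, \omega^{\mathcal{M}}_{K_A EZ}) \leq \sqrt{\varepsilon}} \ \inf_{\tau_{EZ}} D(\widetilde{\omega}_{K_A EZ} \| \widetilde{\omega}_{K_A} \otimes \tau_{EZ}) (15.1.152)
= \inf_{\widetilde{\omega}_{K_A EZ} : P(\widetilde{\omega}_{K_A EZ}, \omega^{\mathcal{M}}_{K_A EZ}) \leq \sqrt{\varepsilon}} I(K_A;EZ)_{\widetilde{\omega}} (15.1.153)
\geq I(K_A;EZ)_{\omega^{\mathcal{M}}} - \sqrt{\varepsilon} \log_2 K - 2 g_2(\sqrt{\varepsilon}). (15.1.154)

The first inequality follows because $D_{\max}(\rho \| \sigma) \geq D(\rho \| \sigma)$, and the second
inequality is a consequence of Theorem 6.14 and (7.2.169). We then find that

I_{\max}^{\sqrt{\varepsilon}}(K_A;EZ)_{\omega^{\mathcal{M}}} \geq I(K_A;EZ)_{\omega^{\mathcal{M}}} - \sqrt{\varepsilon} \log_2 K - 2 g_2(\sqrt{\varepsilon}). (15.1.155)

Combining (15.1.148) and (15.1.155), we conclude that

-\sqrt{\varepsilon} \log_2 K \leq -I(K_A;EZ)_{\omega^{\mathcal{M}}} + 2 g_2(\sqrt{\varepsilon}). (15.1.156)

Adding (15.1.150) and (15.1.156) gives

(1 - 2\sqrt{\varepsilon} - \delta) \log_2 K \leq I(K_A;K_B)_{\omega^{\mathcal{M}}} - I(K_A;EZ)_{\omega^{\mathcal{M}}}
+ h_2(\sqrt{\varepsilon} + \delta) + (1 - \sqrt{\varepsilon} - \delta) \log_2\!\left(\frac{1}{\delta}\right) + 2 g_2(\sqrt{\varepsilon}). (15.1.157)

Now by optimizing over every LOPC channel $\mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}$ with $X$ and $Z$ classical
systems and observing that the state $\omega^{\mathcal{M}}_{K_A K_B E Z}$ results from the action of a particular
LOPC channel on $\psi_{ABE}$, we conclude that

I(K_A;K_B)_{\omega^{\mathcal{M}}} - I(K_A;EZ)_{\omega^{\mathcal{M}}} \leq \sup_{\mathcal{L}} \left[ I(X;B')_{\mathcal{L}(\psi)} - I(X;EZ)_{\mathcal{L}(\psi)} \right], (15.1.158)

thus giving (15.1.144).

The inequality in (15.1.145) follows from Theorem 15.16 and (7.9.59) in
Proposition 7.71. ■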

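Since the upper bounds in (15.1.144) and (15.1.145) hold for all $(K, \varepsilon)$ secret-
key distillation protocols, we conclude the following upper bounds on one-shot
$\varepsilon$-distillable key:

(1 - 2\sqrt{\varepsilon} - \delta) K_D^{\varepsilon}(A;B)_\rho \leq \sup_{\mathcal{L}} \left[ I(X;B')_{\mathcal{L}(\psi)} - I(X;EZ)_{\mathcal{L}(\psi)} \right]
+ h_2(\sqrt{\varepsilon} + \delta) + (1 - \sqrt{\varepsilon} - \delta) \log_2\!\left(\frac{1}{\delta}\right) + 2 g_2(\sqrt{\varepsilon}), (15.1.159)

K_D^{\varepsilon}(A;B)_\rho \leq \widetilde{E}_\alpha(A;B)_\rho + \frac{\alpha}{\alpha - 1} \log_2\!\left(\frac{1}{1 - \varepsilon}\right), \quad \forall \alpha > 1, (15.1.160)

where $\delta \in (0, 1 - \sqrt{\varepsilon})$ and the optimization in (15.1.159) is over every LOPC
channel $\mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}$.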

15.1.3.3 Squashed Entanglement Upper Bound

We now turn to squashed entanglement and establish it as an upper bound on
one-shot distillable key. Before doing so, we establish some preparatory results.
Our first goal is Proposition 15.19, which gives an upper bound on the
logarithm of the dimension $K$ of a key system of an $\varepsilon$-approximate private state,
as given in Definition 15.6, in terms of its squashed entanglement, plus another
term depending only on $\varepsilon$ and $\log_2 K$. In what follows, we suppose that $\gamma_{AA'BB'}$
is a private state with key systems $AB$ and shield systems $A'B'$. Recall from
Theorem 15.5 that a private state of $\log_2 K$ private bits can be written in the
following form:

\gamma_{ABA'B'} = U_{ABA'B'} (\Phi_{AB} \otimes \sigma_{A'B'}) U^{\dagger}_{ABA'B'}, (15.1.161)

where $\Phi_{AB}$ is a maximally entangled state of Schmidt rank $K$,

\Phi_{AB} \coloneqq \frac{1}{K} \sum_{i,j} |i\rangle\langle j|_A \otimes |i\rangle\langle j|_B, (15.1.162)

and

U_{ABA'B'} = \sum_{i,j} |i\rangle\langle i|_A \otimes |j\rangle\langle j|_B \otimes U^{ij}_{A'B'} (15.1.163)

is a controlled unitary known as a “twisting unitary,” with each $U^{ij}_{A'B'}$ a unitary
operator. Due to the fact that the maximally entangled state $\Phi_{AB}$ is unextendible, an
arbitrary extension $\gamma_{AA'BB'E}$ of a private state $\gamma_{AA'BB'}$ necessarily has the following
form:

\gamma_{AA'BB'E} = U_{AA'BB'} (\Phi_{AB} \otimes \sigma_{A'B'E}) U^{\dagger}_{AA'BB'}, (15.1.164)

where 𝜎𝐴′ 𝐵′ 𝐸 is an extension of 𝜎𝐴′ 𝐵′ . We start with the following lemma, which
applies to an arbitrary extension of a bipartite private state:

Lemma 15.18
Let 𝛾 𝐴𝐴′ 𝐵𝐵′ be a bipartite private state, and let 𝛾 𝐴𝐴′ 𝐵𝐵′ 𝐸 be an extension of it,
as given above. Then the following identity holds for every such extension:

2 log2 𝐾 = 𝐼 ( 𝐴; 𝐵𝐵′ |𝐸) 𝛾 + 𝐼 ( 𝐴′; 𝐵| 𝐴𝐵′ 𝐸) 𝛾 . (15.1.165)

Proof: First consider that the following identity holds as a consequence of two
applications of the chain rule for conditional quantum mutual information (see
(7.2.136)):

𝐼 ( 𝐴𝐴′; 𝐵𝐵′ |𝐸) 𝛾 = 𝐼 ( 𝐴; 𝐵𝐵′ |𝐸) 𝛾 + 𝐼 ( 𝐴′; 𝐵𝐵′ | 𝐴𝐸) 𝛾

= 𝐼 ( 𝐴; 𝐵𝐵′ |𝐸) 𝛾 + 𝐼 ( 𝐴′; 𝐵′ | 𝐴𝐸) 𝛾 + 𝐼 ( 𝐴′; 𝐵|𝐵′ 𝐴𝐸) 𝛾 . (15.1.166)

Combined with the following identity, which holds for an arbitrary extension
𝛾 𝐴𝐴′ 𝐵𝐵′ 𝐸 of a private state 𝛾 𝐴𝐴′ 𝐵𝐵′ ,

𝐼 ( 𝐴𝐴′; 𝐵𝐵′ |𝐸) 𝛾 = 2 log2 𝐾 + 𝐼 ( 𝐴′; 𝐵′ | 𝐴𝐸) 𝛾 , (15.1.167)

we recover the statement in (15.1.165). So it remains to prove (15.1.167).

By definition, we have that

𝐼 ( 𝐴𝐴′; 𝐵𝐵′ |𝐸) 𝛾 = 𝐻 ( 𝐴𝐴′ 𝐸) 𝛾 +𝐻 (𝐵𝐵′ 𝐸) 𝛾 −𝐻 (𝐸) 𝛾 −𝐻 ( 𝐴𝐴′ 𝐵𝐵′ 𝐸) 𝛾 . (15.1.168)

By applying (15.1.162)–(15.1.164), we can write $\gamma_{AA'BB'E}$ as follows:

\gamma_{AA'BB'E} = \frac{1}{K} \sum_{i,j} |i\rangle\langle j|_A \otimes |i\rangle\langle j|_B \otimes U^{ii}_{A'B'} \sigma_{A'B'E} (U^{jj}_{A'B'})^{\dagger}. (15.1.169)

Tracing over system $B$ leads to the following state:

\gamma_{AA'B'E} = \frac{1}{K} \sum_{i} |i\rangle\langle i|_A \otimes \gamma^{i}_{A'B'E}, (15.1.170)

where
\gamma^{i}_{A'B'E} \coloneqq U^{ii}_{A'B'} \sigma_{A'B'E} (U^{ii}_{A'B'})^{\dagger}. (15.1.171)

Similarly, tracing over system $A$ of $\gamma_{AA'BB'E}$ leads to

\gamma_{BA'B'E} = \frac{1}{K} \sum_{i} |i\rangle\langle i|_B \otimes \gamma^{i}_{A'B'E}. (15.1.172)
So these and the chain rule for conditional entropy (see (7.2.110)) imply that
𝐻 ( 𝐴𝐴′ 𝐸) 𝛾 = 𝐻 ( 𝐴) 𝛾 + 𝐻 ( 𝐴′ 𝐸 | 𝐴) 𝛾 = log2 𝐾 + 𝐻 ( 𝐴′ 𝐸 | 𝐴) 𝛾 . (15.1.173)
Similarly, we have that
𝐻 (𝐵𝐵′ 𝐸) 𝛾 = log2 𝐾 + 𝐻 (𝐵′ 𝐸 |𝐵) 𝛾 = log2 𝐾 + 𝐻 (𝐵′ 𝐸 | 𝐴) 𝛾 , (15.1.174)
where we have used the symmetries in (15.1.170)–(15.1.172). Since $\gamma_E = \gamma^{i}_E$ for
all $i$ (this is a consequence of $\gamma_{ABA'B'}$ being an ideal private state), we find that

H(E)_\gamma = \frac{1}{K} \sum_{i} H(E)_{\gamma^{i}} = H(E|A)_\gamma. (15.1.175)

Finally, we have that

H(AA'BB'E)_\gamma = H(ABA'B'E)_{\Phi \otimes \sigma} (15.1.176)
= H(AB)_\Phi + H(A'B'E)_\sigma (15.1.177)
= \frac{1}{K} \sum_{i} H(A'B'E)_{\gamma^{i}} (15.1.178)
= H(A'B'E|A)_\gamma. (15.1.179)
The first equality follows from unitary invariance of quantum entropy. The second
equality follows because the entropy is additive for tensor-product states. The
third equality follows because 𝐻 ( 𝐴𝐵)Φ = 0 since Φ 𝐴𝐵 is a pure state, and 𝜎𝐴′ 𝐵′ 𝐸
is related to 𝛾 𝑖𝐴′ 𝐵′ 𝐸 by the unitary 𝑈 𝑖𝑖𝐴′ 𝐵′ . The final equality follows by applying
(15.1.170), and the fact that conditional entropy is a convex combination of entropies
for a classical-quantum state where the conditioning system is classical. Combining
(15.1.168), (15.1.173), (15.1.174), (15.1.175), (15.1.179), and the fact that
𝐼 ( 𝐴′; 𝐵′ | 𝐴𝐸) 𝛾 = 𝐻 ( 𝐴′ 𝐸 | 𝐴) 𝛾 + 𝐻 (𝐵′ 𝐸 | 𝐴) 𝛾 − 𝐻 (𝐸 | 𝐴) 𝛾 − 𝐻 ( 𝐴′ 𝐵′ 𝐸 | 𝐴) 𝛾 ,
(15.1.180)
we recover (15.1.167). ■
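
The identity in (15.1.165) can be checked numerically for small dimensions. The following is a minimal sketch, assuming NumPy, qubit-sized systems, a random twisting unitary, and a random extension $\sigma_{A'B'E}$; the helper names (`reduced`, `entropy`, `cmi`) and the system ordering $A, B, A', B', E$ are assumptions made for the illustration.

```python
import numpy as np

def random_unitary(d, rng):
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    qmat, _ = np.linalg.qr(z)
    return qmat

def reduced(rho, dims, keep):
    """Reduced state on the subsystems listed in `keep` (sorted indices)."""
    n = len(dims)
    t = rho.reshape(tuple(dims) + tuple(dims))
    cur = n
    # Trace out unwanted systems from the highest index down so that the
    # positions of the remaining row/column axes stay easy to track.
    for s in sorted(set(range(n)) - set(keep), reverse=True):
        t = np.trace(t, axis1=s, axis2=s + cur)
        cur -= 1
    d = int(np.prod([dims[k] for k in keep]))
    return t.reshape(d, d)

def entropy(rho):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

def cmi(rho, dims, X, Y, Z):
    """Conditional mutual information I(X;Y|Z) in bits."""
    H = lambda S: entropy(reduced(rho, dims, sorted(S)))
    return H(X + Z) + H(Y + Z) - H(Z) - H(X + Y + Z)

rng = np.random.default_rng(2)
K, d = 2, 2
dims = (K, K, d, d, d)                       # A, B, A', B', E

# Twisting unitary on A, B, A'B' (block diagonal in the key basis).
U = np.zeros((K * K * d * d, K * K * d * d), dtype=complex)
for i in range(K):
    for j in range(K):
        s = (i * K + j) * d * d
        U[s:s + d * d, s:s + d * d] = random_unitary(d * d, rng)

phi = np.zeros(K * K, dtype=complex)
for i in range(K):
    phi[i * K + i] = 1.0 / np.sqrt(K)
Phi = np.outer(phi, phi.conj())

G = rng.normal(size=(d**3, d**3)) + 1j * rng.normal(size=(d**3, d**3))
sigma = G @ G.conj().T
sigma /= np.trace(sigma).real                # random state sigma_{A'B'E}

U_full = np.kron(U, np.eye(d))               # act trivially on E
gamma = U_full @ np.kron(Phi, sigma) @ U_full.conj().T   # as in (15.1.164)

A, B, A1, B1, E = 0, 1, 2, 3, 4
lhs = 2 * np.log2(K)
rhs = cmi(gamma, dims, [A], [B, B1], [E]) + cmi(gamma, dims, [A1], [B], [A, B1, E])
```

Up to numerical precision, `lhs` and `rhs` should coincide, in agreement with Lemma 15.18.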

We now establish the squashed entanglement upper bound for an approximate
bipartite private state:

Proposition 15.19
Let 𝛾 𝐴𝐴′ 𝐵𝐵′ be a private state, with key systems 𝐴𝐵 and shield systems 𝐴′ 𝐵′,
and let 𝜔 𝐴𝐴′ 𝐵𝐵′ be an 𝜀-approximate private state, in the sense that

𝐹 (𝛾 𝐴𝐴′ 𝐵𝐵′ , 𝜔 𝐴𝐴′ 𝐵𝐵′ ) ≥ 1 − 𝜀 (15.1.181)

for 𝜀 ∈ [0, 1]. Suppose that 𝑑 𝐴 = 𝑑 𝐵 = 𝐾. Then

(1 - 2\sqrt{\varepsilon}) \log_2 K \leq E_{\mathrm{sq}}(AA'; BB')_\omega + 2 g_2(\sqrt{\varepsilon}), (15.1.182)

where
𝑔2 (𝛿) B (𝛿 + 1) log2 (𝛿 + 1) − 𝛿 log2 𝛿. (15.1.183)

Proof: By applying Uhlmann’s theorem for fidelity (Theorem 6.8) and the inequalities
relating trace distance and fidelity from Theorem 6.14, for a given extension
$\omega_{AA'BB'E}$ of $\omega_{AA'BB'}$, there exists an extension $\gamma_{AA'BB'E}$ of $\gamma_{AA'BB'}$ such that

\frac{1}{2} \left\| \gamma_{AA'BB'E} - \omega_{AA'BB'E} \right\|_1 \leq \sqrt{\varepsilon}. (15.1.184)

Defining $f_1(\delta, K) \coloneqq 2\delta \log_2 K + 2 g_2(\delta)$, we then find that

2 \log_2 K = I(A;BB'|E)_\gamma + I(A';B|AB'E)_\gamma (15.1.185)
\leq I(A;BB'|E)_\omega + I(A';B|AB'E)_\omega + 2 f_1(\sqrt{\varepsilon}, K) (15.1.186)
\leq I(A;BB'|E)_\omega + I(A';B|AB'E)_\omega + I(A';B'|AE)_\omega + 2 f_1(\sqrt{\varepsilon}, K) (15.1.187)
= I(AA';BB'|E)_\omega + 2 f_1(\sqrt{\varepsilon}, K). (15.1.188)

The first equality follows from Lemma 15.18. The first inequality follows from
two applications of Proposition 7.10 (uniform continuity of conditional mutual
information). The second inequality follows because $I(A';B'|AE)_\omega \geq 0$ (this is
strong subadditivity from Theorem 7.6). The last equality is a consequence of the
chain rule for conditional mutual information, as used in (15.1.166). Since the
inequality

2 \log_2 K \leq I(AA';BB'|E)_\omega + 2 f_1(\sqrt{\varepsilon}, K) (15.1.189)

holds for an arbitrary extension of $\omega_{AA'BB'}$, the statement of the proposition
follows after dividing by two, taking the infimum over all extensions, and recalling
that the squashed entanglement is one half of the infimum of $I(AA';BB'|E)_\omega$ over
such extensions. ■

We now put these statements together and arrive at the following squashed-
entanglement upper bound on one-shot distillable key:

Theorem 15.20 Squashed Entanglement Upper Bound on One-Shot Distillable Key
Let $\rho_{AB}$ be a bipartite state. For every $(K, \varepsilon)$ secret-key distillation protocol
for $\rho_{AB}$, with $\varepsilon \in [0, 1)$, we have that

(1 - 2\sqrt{\varepsilon}) \log_2 K \leq E_{\mathrm{sq}}(A;B)_\rho + 2 g_2(\sqrt{\varepsilon}), (15.1.190)

where $E_{\mathrm{sq}}(A;B)_\rho$ is the squashed entanglement of $\rho_{AB}$ (see Section 9.4) and
$g_2(\delta) \coloneqq (\delta + 1) \log_2(\delta + 1) - \delta \log_2 \delta$. Consequently, for the one-shot distillable
key, we have

(1 - 2\sqrt{\varepsilon}) K_D^{\varepsilon}(A;B)_\rho \leq E_{\mathrm{sq}}(A;B)_\rho + 2 g_2(\sqrt{\varepsilon}), (15.1.191)

for every state $\rho_{AB}$ and $\varepsilon \in [0, 1)$.

Proof: We exploit Theorem 15.9 and work in the bipartite picture of private-
state distillation, instead of the tripartite picture of key distillation. With this in
mind, consider a (𝐾, 𝜀) bipartite private-state distillation protocol for 𝜌 𝐴𝐵 with the
corresponding LOCC channel L 𝐴𝐵→𝐾 𝐴𝐾 𝐵 𝐴′ 𝐵′ . From the LOCC monotonicity of
squashed entanglement (Theorem 9.33), we have that

𝐸 sq (𝐾 𝐴 𝐴′; 𝐾 𝐵 𝐵′)𝜔 ≤ 𝐸 sq ( 𝐴; 𝐵) 𝜌 , (15.1.192)

where $\omega_{K_A A' K_B B'} \coloneqq \mathcal{L}_{AB \to K_A K_B A'B'}(\rho_{AB})$. Continuing, by the definition in (15.1.6)
and applying Theorem 15.9, the following inequality holds

p_{\mathrm{err}}(\mathcal{L}; \rho_{AB}) = 1 - F(\gamma_{K_A A' K_B B'}, \omega_{K_A A' K_B B'}) \leq \varepsilon (15.1.193)

for some ideal bipartite private state $\gamma_{K_A A' K_B B'}$. As a consequence of Proposition 15.19,
we find that

(1 - 2\sqrt{\varepsilon}) \log_2 K \leq E_{\mathrm{sq}}(K_A A'; K_B B')_\omega + 2 g_2(\sqrt{\varepsilon}). (15.1.194)

Combining (15.1.192) and (15.1.194), we conclude (15.1.190). Since this bound
holds for all $(K, \varepsilon)$ key distillation protocols, the bound in (15.1.191) follows after
applying the definition in (15.1.9). ■
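
To get a feel for the bound in (15.1.191), one can evaluate it for given values of the squashed entanglement and the error. The following is a minimal sketch using only the Python standard library; the function names `g2` and `squashed_key_bound` are illustrative, and the rearrangement into an explicit bound on the key assumes $1 - 2\sqrt{\varepsilon} > 0$, i.e., $\varepsilon < 1/4$.

```python
import math

def g2(delta):
    """g_2(delta) = (delta + 1) log2(delta + 1) - delta log2(delta), with g_2(0) = 0."""
    if delta == 0:
        return 0.0
    return (delta + 1) * math.log2(delta + 1) - delta * math.log2(delta)

def squashed_key_bound(e_sq, eps):
    """Upper bound on one-shot distillable key obtained by rearranging (15.1.191).

    Only meaningful when 1 - 2 sqrt(eps) > 0, i.e., eps < 1/4.
    """
    s = math.sqrt(eps)
    if 1 - 2 * s <= 0:
        raise ValueError("the rearranged bound requires eps < 1/4")
    return (e_sq + 2 * g2(s)) / (1 - 2 * s)

bound_exact = squashed_key_bound(1.0, 0.0)    # no error allowed
bound_noisy = squashed_key_bound(1.0, 0.01)   # small error allowed
```

With zero error the bound reduces to the squashed entanglement itself, and allowing a small error loosens it, which reflects the correction terms in (15.1.191).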

15.1.4 Lower Bound on the Number of Secret-Key Bits via Position-Based Coding and Convex Splitting

Having found upper bounds on one-shot distillable key, we now turn to establishing
a lower bound. In order to establish a lower bound on distillable key, we have to
find an explicit secret-key distillation protocol that works for an arbitrary bipartite
state 𝜌 𝐴𝐵 and an arbitrary error 𝜀 ∈ (0, 1). Recall that the goal of secret key
distillation is for two parties, Alice and Bob, to make use of LOPC to transform
a purification 𝜓 𝐴𝐵𝐸 of their shared state 𝜌 𝐴𝐵 to an ideal key state of the form in
Definition 15.1, with the key size 𝐾 as large as possible, subject to the constraint
that the error not exceed 𝜀. Furthermore, we allow them to make use of public
classical communication for free.
Before we get into the details, let us first slightly modify the model of secret key
distillation, and we discuss later how the model we have already discussed can fit
together with this alternative model. The alternative model consists of supposing
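that the state shared by Alice, Bob, and Eve is a classical–quantum–quantum state
$\rho_{XBE}$ of the following form:

\rho_{XBE} \coloneqq \sum_{x \in \mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho^{x}_{BE}, (15.1.195)

where $\mathcal{X}$ is a finite alphabet, $p : \mathcal{X} \to [0, 1]$ is a probability distribution, and
$\{\rho^{x}_{BE}\}_{x \in \mathcal{X}}$ is a set of states. The goal of a secret-key distillation protocol in
this setting is to perform an LOPC channel $\mathcal{L}^{\leftrightarrow}_{XB \to K_A K_B Z}$ such that the final state
$\omega_{K_A K_B E Z} \coloneqq \mathcal{L}^{\leftrightarrow}_{XB \to K_A K_B E Z}(\rho_{XBE})$ satisfies

p_{\mathrm{err}}(\mathcal{L}^{\leftrightarrow}; \rho_{XBE}) \coloneqq \inf_{\sigma_{EZ}} \left[ 1 - F(\Phi_{K_A K_B} \otimes \sigma_{EZ}, \omega_{K_A K_B E Z}) \right] \leq \varepsilon, (15.1.196)

where the error $\varepsilon \in [0, 1]$, the infimum is with respect to every state $\sigma_{EZ}$, and
$\Phi_{K_A K_B}$ is a maximally classically correlated state of size $K$:

\Phi_{K_A K_B} \coloneqq \frac{1}{K} \sum_{i=0}^{K-1} |i\rangle\langle i|_{K_A} \otimes |i\rangle\langle i|_{K_B}. (15.1.197)

We then define the one-shot distillable key of $\rho_{XBE}$ as follows:

K_D^{\varepsilon}(\rho_{XBE}) \coloneqq \sup_{(K, \mathcal{L}^{\leftrightarrow})} \left\{ \log_2 K : p_{\mathrm{err}}(\mathcal{L}^{\leftrightarrow}; \rho_{XBE}) \leq \varepsilon \right\}, (15.1.198)

where the optimization is over all $K \in \mathbb{N}$ and every LOPC channel $\mathcal{L}^{\leftrightarrow}_{XB \to K_A K_B Z}$
with $d_{K_A} = d_{K_B} = K$.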
The main idea behind the lower bound is to exhibit a particular protocol that
accomplishes the task of secret key distillation. The protocol we devise is simple to
describe but more involved to analyze. Additionally, it really does take advantage
of the fact that free public classical communication is allowed, in the sense that a
large amount of public classical communication is employed. The protocol begins
with Alice, Bob, and Eve sharing the state 𝜌 𝑋 𝐵𝐸 , with Alice possessing system
𝑋, Bob 𝐵, and Eve 𝐸. Alice picks a value 𝑘 uniformly at random from the set
[𝐾] B {1, . . . , 𝐾 }. This will end up being the value of the key. She also picks a
value 𝑟 uniformly at random from the set [𝑅] B {1, . . . , 𝑅}, where 𝑅 ∈ N. This
variable plays the role of randomness that is used to confuse Eve about which key
value 𝑘 was chosen. Once 𝑘 and 𝑟 have been selected, Alice labels her system 𝑋 of
the state $\rho_{XBE}$ by the pair $(k, r)$, as $X_{k,r}$. Alice then prepares $KR - 1$ independent
instances of the classical state

\sum_{x \in \mathcal{X}} p(x) |x\rangle\langle x|, (15.1.199)

and labels the resulting systems as $X_{1,1}, \ldots, X_{k,r-1}, X_{k,r+1}, \ldots, X_{K,R}$. Alice then
sends the classical registers $X_{1,1}, \ldots, X_{K,R}$ in lexicographic order over a public
classical communication channel, so that both Bob and Eve receive copies of them.
At this point, for fixed values of $k$ and $r$, the global shared state of Alice, Bob, and
Eve is as follows:

\rho^{k,r}_{X^{KR} X'^{KR} X''^{KR} BE} \coloneqq \rho_{X_{1,1} X'_{1,1} X''_{1,1}} \otimes \cdots \otimes \rho_{X_{k,r-1} X'_{k,r-1} X''_{k,r-1}} \otimes \rho_{X_{k,r} X'_{k,r} X''_{k,r} BE} \otimes \rho_{X_{k,r+1} X'_{k,r+1} X''_{k,r+1}} \otimes \cdots \otimes \rho_{X_{K,R} X'_{K,R} X''_{K,R}}, (15.1.200)

where Bob possesses all systems labeled as $X'$ (in addition to his $B$ system) and Eve
possesses all systems labeled as $X''$ (in addition to her $E$ system). Furthermore,

\rho_{X_{1,1} X'_{1,1} X''_{1,1}} = \cdots = \rho_{X_{k,r-1} X'_{k,r-1} X''_{k,r-1}} (15.1.201)
= \rho_{X_{k,r+1} X'_{k,r+1} X''_{k,r+1}} = \cdots = \rho_{X_{K,R} X'_{K,R} X''_{K,R}} (15.1.202)
= \sum_{x \in \mathcal{X}} p(x) |xxx\rangle\langle xxx|, (15.1.203)

and

\rho_{X_{k,r} X'_{k,r} X''_{k,r} BE} = \sum_{x \in \mathcal{X}} p(x) |xxx\rangle\langle xxx| \otimes \rho^{x}_{BE}. (15.1.204)


Thus, it is only the 𝑋𝑘,𝑟 classical system that has correlation with Bob and Eve’s
systems 𝐵𝐸 and all others have no correlation whatsoever. The objective of the
key distillation protocol is for Bob to identify the 𝑋𝑘,𝑟 system that has correlation
with his (and in this way, identify the key value), while the randomness variable
𝑟 should have sufficient size 𝑅 to severely reduce the chance that Eve can guess
which 𝑋 ′′ system is correlated with hers. The reduced state of Bob, for fixed 𝑘 and
𝑟, is as follows:

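\begin{equation}
\rho^{k,r}_{X^{\prime KR} B} = \rho_{X'_{1,1}} \otimes \cdots \otimes \rho_{X'_{k,r-1}} \otimes \rho_{X'_{k,r}B} \otimes \rho_{X'_{k,r+1}} \otimes \cdots \otimes \rho_{X'_{K,R}}, \tag{15.1.205}
\end{equation}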

while the reduced state of Eve, for a fixed value of 𝑘, is as follows:

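\begin{equation}
\rho^{k}_{X^{\prime\prime KR} E} := \frac{1}{R}\sum_{r=1}^{R} \rho_{X''_{1,1}} \otimes \cdots \otimes \rho_{X''_{k,r-1}} \otimes \rho_{X''_{k,r}E} \otimes \rho_{X''_{k,r+1}} \otimes \cdots \otimes \rho_{X''_{K,R}}. \tag{15.1.206}
\end{equation}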

The idea behind confusing Eve is that if 𝑅 is large enough, then it becomes difficult
for Eve to determine which 𝑋 ′′ system is correlated with her system 𝐸. What we
show later is that if 𝑅 is large enough, then her reduced state, for all key values 𝑘,
is essentially indistinguishable from the following product state:

\begin{equation}
\rho_{X''_{1,1}} \otimes \cdots \otimes \rho_{X''_{K,R}} \otimes \rho_{E}, \tag{15.1.207}
\end{equation}

where 𝜌 𝐸 = Tr 𝑋 𝐵 [𝜌 𝑋 𝐵𝐸 ]. If that is the case, then she can figure out essentially
nothing about the key value 𝑘, leaving her no strategy other than to try and randomly
guess it.
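The classical bookkeeping of the protocol described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the function name `prepare_public_registers`, the dictionary representation of $p(x)$, and the argument `x_shared` (standing in for the value held in Alice's register $X$ of the shared state) are all hypothetical and not part of the text. It shows only which register sits at position $(k, r)$ and that the registers are announced in lexicographic order.

```python
import random


def prepare_public_registers(K, R, p, x_shared, rng):
    """Toy classical sketch of Alice's register preparation.

    p        : dict mapping symbols x to probabilities p(x)
    x_shared : value held in Alice's register X of the shared state (assumed input)
    Returns the secret pair (k, r) and the K*R registers that Alice announces.
    """
    k = rng.randint(1, K)  # key value, uniform on [K]
    r = rng.randint(1, R)  # randomness used to confuse Eve, uniform on [R]
    symbols, weights = zip(*p.items())
    registers = {}
    for kk in range(1, K + 1):  # lexicographic order over (kk, rr)
        for rr in range(1, R + 1):
            if (kk, rr) == (k, r):
                # the single register correlated with Bob's and Eve's systems
                registers[(kk, rr)] = x_shared
            else:
                # an independent sample from p(x), as in (15.1.199)
                registers[(kk, rr)] = rng.choices(symbols, weights=weights)[0]
    return k, r, registers
```

Bob's task is then to locate the position $(k, r)$ from his copies of the registers together with $B$, while Eve should be unable to narrow down $k$ from her copies together with $E$.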
To analyze this protocol in detail, we employ two methods: position-based cod-
ing, as used previously in Section 11.1.3 in the context of classical communication,
and another idea known as convex splitting. Looking at Bob’s state in (15.1.205)
and comparing it with that in (11.1.99), it is natural to employ position-based
coding to figure out the value of 𝑘 and 𝑟. Indeed, invoking Proposition 11.8 (in
particular, (11.1.130)–(11.1.131)), if the following condition holds

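\begin{equation}
\log_2 KR = I_H^{\varepsilon-\eta}(X;B)_\rho - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right), \tag{15.1.208}
\end{equation}
for $\eta \in (0, \varepsilon)$, and where $I_H^{\varepsilon-\eta}(X;B)_\rho$ is the hypothesis testing mutual information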
defined in (7.11.88), then Bob can decode 𝑘 and 𝑟 with error probability no larger
than 𝜀. We would also like to guarantee that Eve’s state in (15.1.206) is close to the


product state in (15.1.207). This is where the convex-split lemma is useful, which
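states the following: If
\begin{equation}
\log_2 R = I_{\max}^{\sqrt{\varepsilon}-\eta}(E;X)_\rho + \log_2\!\left(\frac{2}{\eta^2}\right), \tag{15.1.209}
\end{equation}
then there exists a state $\widetilde{\rho}_E$ satisfying
\begin{equation}
1 - F(\rho^{k}_{X^{\prime\prime KR} E}, \rho_{X''_{1,1}} \otimes \cdots \otimes \rho_{X''_{K,R}} \otimes \widetilde{\rho}_E) \leq \varepsilon, \tag{15.1.210}
\end{equation}
and $P(\widetilde{\rho}_E, \rho_E) \leq \sqrt{\varepsilon} - \eta$. Observe that the inequality above holds for all key values
$k$. In the above, $I^{\delta}_{\max}(E;X)_\rho$ is a smooth max-mutual information quantity defined
for $\delta \in (0, 1)$ as
\begin{equation}
I^{\delta}_{\max}(E;X)_\rho := \inf_{\widetilde{\rho}_{XE} \,:\, P(\widetilde{\rho}_{XE}, \rho_{XE}) \leq \delta} D_{\max}(\widetilde{\rho}_{XE} \Vert \rho_X \otimes \widetilde{\rho}_E). \tag{15.1.211}
\end{equation}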

Observe that $I^{\delta}_{\max}(E;X)_\rho$ is different from the smooth max-mutual information
quantity defined previously in (15.1.59). By suitably combining position-based
coding with convex splitting and subtracting (15.1.209) from (15.1.208), we thus
arrive at the conclusion that Alice and Bob can distill a key of size
\begin{equation}
\log_2 K = I_H^{\varepsilon-\eta}(X;B)_\rho - I_{\max}^{\sqrt{\varepsilon}-\eta}(E;X)_\rho - \log_2\!\left(\frac{4\varepsilon}{\eta^2}\right) - \log_2\!\left(\frac{2}{\eta^2}\right), \tag{15.1.212}
\end{equation}
and be guaranteed that

1. Bob can decode the key value 𝑘 with error probability no larger than 𝜀 and
2. the key value is secure from Eve with security parameter 𝜀 (as given in
(15.1.210)).

Having discussed the protocol for key distillation and some intuition justifying
why the scheme works, we now formally state a lower bound on the one-shot
distillable key of a state 𝜌 𝑋 𝐵𝐸 :

Theorem 15.21
Let $\rho_{XBE}$ be a classical–quantum–quantum state, with system $X$ held by Alice,
$B$ by Bob, and $E$ by Eve. For all $\varepsilon \in (0, 1]$, $\varepsilon' := 1 - \sqrt{1 - \varepsilon}$, $\delta \in (0, \varepsilon')$,
$\eta \in (0, \varepsilon' - \delta)$, and $\zeta \in (0, \delta)$, there exists a $(K, \varepsilon)$ one-way key distillation
protocol for $\rho_{XBE}$ with
\begin{equation}
\log_2 K = I_H^{\varepsilon'-\delta-\eta}(X;B)_\rho - I_{\max}^{\delta-\zeta}(E;X)_\rho - \log_2\!\left(\frac{4(\varepsilon'-\delta)}{\eta^2}\right) - \log_2\!\left(\frac{2}{\zeta^2}\right), \tag{15.1.213}
\end{equation}
where the hypothesis testing mutual information $I_H^{\varepsilon'-\delta-\eta}(X;B)_\rho$ is defined in
(7.11.88) and the smooth max-mutual information $I_{\max}^{\delta-\zeta}(E;X)_\rho$ is defined in
(15.1.211).
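As a rough numerical aid, the right-hand side of (15.1.213) can be evaluated once the two smoothed information quantities are known. The sketch below is not part of the text: the function name `one_shot_key_bits` is hypothetical, and the values `i_h` and `i_max` are assumed to be supplied by the caller, already evaluated at the smoothing parameters $\varepsilon' - \delta - \eta$ and $\delta - \zeta$.

```python
import math


def one_shot_key_bits(i_h, i_max, eps, delta, eta, zeta):
    """Evaluate log2 K from (15.1.213).

    i_h   : hypothesis testing mutual information I_H^{eps'-delta-eta}(X;B), assumed given
    i_max : smooth max-mutual information I_max^{delta-zeta}(E;X), assumed given
    """
    if not 0 < eps <= 1:
        raise ValueError("eps must lie in (0, 1]")
    eps_p = 1 - math.sqrt(1 - eps)  # eps' = 1 - sqrt(1 - eps)
    if not (0 < delta < eps_p and 0 < eta < eps_p - delta and 0 < zeta < delta):
        raise ValueError("parameters violate the ranges stated in the theorem")
    penalty = math.log2(4 * (eps_p - delta) / eta**2) + math.log2(2 / zeta**2)
    return i_h - i_max - penalty
```

The two logarithmic penalty terms do not depend on the state, which is why they become negligible per copy in the asymptotic setting.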

As discussed above, one of the main tools that we employ to prove this theorem
is the smooth convex-split lemma, which we state here and prove in Appendix 15.A.

Lemma 15.22 Smooth convex split

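Let $\rho_{AE}$ be a state, and let $R \in \mathbb{N}$. Let $\tau_{A_1 \cdots A_R E}$ denote the following state:
\begin{equation}
\tau_{A_1 \cdots A_R E} := \frac{1}{R}\sum_{r=1}^{R} \rho_{A_1} \otimes \cdots \otimes \rho_{A_{r-1}} \otimes \rho_{A_r E} \otimes \rho_{A_{r+1}} \otimes \cdots \otimes \rho_{A_R}. \tag{15.1.214}
\end{equation}
Let $\varepsilon \in (0, 1)$ and $\eta \in (0, \varepsilon)$. If
\begin{equation}
\log_2 R \geq I_{\max}^{\varepsilon-\eta}(E;A)_\rho + \log_2\!\left(\frac{2}{\eta^2}\right), \tag{15.1.215}
\end{equation}
then there exists a state $\widetilde{\rho}_E$ satisfying
\begin{equation}
P(\tau_{A_1 \cdots A_R E}, \rho_{A_1} \otimes \cdots \otimes \rho_{A_R} \otimes \widetilde{\rho}_E) \leq \varepsilon, \tag{15.1.216}
\end{equation}
and $P(\widetilde{\rho}_E, \rho_E) \leq \varepsilon - \eta$.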
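The mechanism behind convex splitting can be illustrated numerically in the fully classical case, where $\rho_{AE}$ is a joint probability distribution. The sketch below is an illustration under assumptions, not the lemma itself: it uses the total variation distance rather than the sine distance $P$, takes $\widetilde{\rho}_E = \rho_E$, assumes that the marginal on $A$ has full support, and the function name `convex_split_distance` is hypothetical.

```python
import itertools

import numpy as np


def convex_split_distance(p_ae, R):
    """Total variation distance between the classical analogue of
    tau_{A_1...A_R E} in (15.1.214) and the product p_A^{(x)R} (x) p_E.

    p_ae : 2D array with p_ae[a, e] = p(a, e); the marginal on A must have full support.
    """
    p_ae = np.asarray(p_ae, dtype=float)
    d_a, d_e = p_ae.shape
    p_a = p_ae.sum(axis=1)
    p_e = p_ae.sum(axis=0)
    total = 0.0
    for a in itertools.product(range(d_a), repeat=R):
        prod_a = float(np.prod([p_a[x] for x in a]))
        for e in range(d_e):
            # position r carries the register that is correlated with E
            tau = sum(p_ae[a[r], e] * prod_a / p_a[a[r]] for r in range(R)) / R
            total += abs(tau - prod_a * p_e[e])
    return 0.5 * total
```

For a perfectly correlated pair of bits, the distance is $1/2$ at $R = 1$ and shrinks as $R$ grows, which is the sense in which the mixture over positions hides the correlated register.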

We now prove Theorem 15.21.

Proof (Proof of Theorem 15.21): Fix 𝜀 ∈ (0, 1], 𝛿 ∈ (0, 𝜀), 𝜂 ∈ (0, 𝜀 − 𝛿), and
𝜁 ∈ (0, 𝛿). Alice performs the key distillation protocol discussed in the paragraph


surrounding (15.1.199)–(15.1.207). The global state, for fixed 𝑘 and 𝑟 is as given in

(15.1.200); the reduced state of Bob, for fixed 𝑘 and 𝑟 is as given in (15.1.205); and
the reduced state of Eve, for fixed 𝑘, is given by (15.1.206). The overall global state,
including Alice’s classical registers that hold the key value 𝑘 and the randomness
value 𝑟, is as follows:

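\begin{equation}
\rho_{K_A R_A X^{KR} X^{\prime KR} X^{\prime\prime KR} BE} := \frac{1}{KR}\sum_{k=1}^{K}\sum_{r=1}^{R} |k\rangle\!\langle k|_{K_A} \otimes |r\rangle\!\langle r|_{R_A} \otimes \rho^{k,r}_{X^{KR} X^{\prime KR} X^{\prime\prime KR} BE}, \tag{15.1.217}
\end{equation}
where $\rho^{k,r}_{X^{KR} X^{\prime KR} X^{\prime\prime KR} BE}$ is defined in (15.1.200). Tracing over the $R_A$ and $X^{KR}$
systems, the state becomes
\begin{equation}
\rho_{K_A X^{\prime KR} X^{\prime\prime KR} BE} := \frac{1}{K}\sum_{k=1}^{K} |k\rangle\!\langle k|_{K_A} \otimes \frac{1}{R}\sum_{r=1}^{R} \rho^{k,r}_{X^{\prime KR} X^{\prime\prime KR} BE}. \tag{15.1.218}
\end{equation}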

By Proposition 11.8, we conclude that if

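\begin{equation}
\log_2 KR = I_H^{\varepsilon-\delta-\eta}(X;B)_\rho - \log_2\!\left(\frac{4(\varepsilon-\delta)}{\eta^2}\right), \tag{15.1.219}
\end{equation}
then there exists a POVM $\{\Lambda^{k,r}_{X^{\prime KR} B}\}_{k\in[K], r\in[R]}$ such that
\begin{equation}
\operatorname{Tr}[\Lambda^{k,r}_{X^{\prime KR} B}\, \rho^{k,r}_{X^{\prime KR} B}] \geq 1 - (\varepsilon - \delta) \quad \forall k \in [K],\ r \in [R]. \tag{15.1.220}
\end{equation}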

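Let us define the measurement channel $\mathcal{M}'_{X^{\prime KR} B \to K_B R_B}$ as follows:
\begin{equation}
\mathcal{M}'_{X^{\prime KR} B \to K_B R_B}(\tau_{X^{\prime KR} B}) := \sum_{k=1}^{K}\sum_{r=1}^{R} \operatorname{Tr}[\Lambda^{k,r}_{X^{\prime KR} B}\, \tau_{X^{\prime KR} B}]\, |k\rangle\!\langle k|_{K_B} \otimes |r\rangle\!\langle r|_{R_B}, \tag{15.1.221}
\end{equation}
and the reduced measurement channel $\mathcal{M}_{X^{\prime KR} B \to K_B}$ as
\begin{align}
\mathcal{M}_{X^{\prime KR} B \to K_B}(\tau_{X^{\prime KR} B}) & := (\operatorname{Tr}_{R_B} \circ \mathcal{M}'_{X^{\prime KR} B \to K_B R_B})(\tau_{X^{\prime KR} B}) \tag{15.1.222} \\
& = \sum_{k=1}^{K}\sum_{r=1}^{R} \operatorname{Tr}[\Lambda^{k,r}_{X^{\prime KR} B}\, \tau_{X^{\prime KR} B}]\, |k\rangle\!\langle k|_{K_B}. \tag{15.1.223}
\end{align}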

Observe that


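\begin{align}
& \frac{1}{2}\left\| \mathcal{M}'_{X^{\prime KR} B \to K_B R_B}(\rho^{k,r}_{X^{\prime KR} B}) - |k\rangle\!\langle k|_{K_B} \otimes |r\rangle\!\langle r|_{R_B} \right\|_1 \notag \\
& \qquad = 1 - \operatorname{Tr}[\Lambda^{k,r}_{X^{\prime KR} B}\, \rho^{k,r}_{X^{\prime KR} B}] \leq \varepsilon - \delta, \tag{15.1.224}
\end{align}
which follows from the same calculation given in (11.1.18)–(11.1.24). By the
data-processing inequality for the trace distance, we conclude that
\begin{equation}
\frac{1}{2}\left\| \mathcal{M}_{X^{\prime KR} B \to K_B}(\rho^{k,r}_{X^{\prime KR} B}) - |k\rangle\!\langle k|_{K_B} \right\|_1 \leq \varepsilon - \delta. \tag{15.1.225}
\end{equation}
From convexity of trace distance, we conclude that
\begin{equation}
\frac{1}{2}\left\| \mathcal{M}_{X^{\prime KR} B \to K_B}\!\left(\frac{1}{R}\sum_{r=1}^{R} \rho^{k,r}_{X^{\prime KR} B}\right) - |k\rangle\!\langle k|_{K_B} \right\|_1 \leq \varepsilon - \delta. \tag{15.1.226}
\end{equation}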
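The identity used in (15.1.224), namely that the normalized trace distance between a classical measurement output and the intended outcome equals the probability of decoding incorrectly, can be checked directly for a probability vector. The helper below is a minimal sketch with a hypothetical name and zero-based outcome labels; it is not taken from the text.

```python
def decoding_trace_distance(q, k):
    """Return (1/2) * || sum_j q[j] |j><j| - |k><k| ||_1 for a classical
    outcome distribution q (a list of probabilities) and intended outcome k."""
    return 0.5 * sum(abs(prob - (1.0 if j == k else 0.0)) for j, prob in enumerate(q))
```

For any distribution `q`, the returned value coincides with `1 - q[k]`, since the deficit at outcome `k` is exactly the total weight on the other outcomes.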

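The actual state at the end of the protocol is as follows:
\begin{equation}
\mathcal{M}_{X^{\prime KR} B \to K_B}(\rho_{K_A X^{\prime KR} X^{\prime\prime KR} BE}), \tag{15.1.227}
\end{equation}
and the ideal state to generate is
\begin{equation}
\Phi_{K_A K_B} \otimes \widetilde{\rho}_{X^{\prime\prime KR} E} = \frac{1}{K}\sum_{k=1}^{K} |k\rangle\!\langle k|_{K_A} \otimes |k\rangle\!\langle k|_{K_B} \otimes \widetilde{\rho}_{X^{\prime\prime KR} E}, \tag{15.1.228}
\end{equation}
where $\widetilde{\rho}_{X^{\prime\prime KR} E}$ is some state of the eavesdropper Eve's systems $X^{\prime\prime KR} E$. Thus, our
goal is to find an upper bound on the following quantity
\begin{equation}
\frac{1}{2}\left\| \mathcal{M}_{X^{\prime KR} B \to K_B}(\rho_{K_A X^{\prime KR} X^{\prime\prime KR} BE}) - \Phi_{K_A K_B} \otimes \widetilde{\rho}_{X^{\prime\prime KR} E} \right\|_1, \tag{15.1.229}
\end{equation}
which we will convert at the end to an upper bound on
\begin{equation}
1 - F(\mathcal{M}_{X^{\prime KR} B \to K_B}(\rho_{K_A X^{\prime KR} X^{\prime\prime KR} BE}), \Phi_{K_A K_B} \otimes \widetilde{\rho}_{X^{\prime\prime KR} E}). \tag{15.1.230}
\end{equation}
To this end, let us first consider bounding the following intermediate quantity:
\begin{equation}
\frac{1}{2}\left\| \mathcal{M}_{X^{\prime KR} B \to K_B}(\rho_{K_A X^{\prime KR} X^{\prime\prime KR} BE}) - \frac{1}{K}\sum_{k=1}^{K} |k\rangle\!\langle k|_{K_A} \otimes |k\rangle\!\langle k|_{K_B} \otimes \frac{1}{R}\sum_{r=1}^{R} \rho^{k,r}_{X^{\prime\prime KR} E} \right\|_1. \tag{15.1.231}
\end{equation}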

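We find that
\begin{align}
& \frac{1}{2}\left\| \mathcal{M}_{X^{\prime KR} B \to K_B}(\rho_{K_A X^{\prime KR} X^{\prime\prime KR} BE}) - \frac{1}{K}\sum_{k=1}^{K} |k\rangle\!\langle k|_{K_A} \otimes |k\rangle\!\langle k|_{K_B} \otimes \frac{1}{R}\sum_{r=1}^{R} \rho^{k,r}_{X^{\prime\prime KR} E} \right\|_1 \notag \\
& = \frac{1}{2}\left\| \frac{1}{K}\sum_{k=1}^{K} |k\rangle\!\langle k|_{K_A} \otimes \mathcal{M}_{X^{\prime KR} B \to K_B}\!\left(\frac{1}{R}\sum_{r=1}^{R} \rho^{k,r}_{X^{\prime KR} X^{\prime\prime KR} BE}\right) - \frac{1}{K}\sum_{k=1}^{K} |k\rangle\!\langle k|_{K_A} \otimes |k\rangle\!\langle k|_{K_B} \otimes \frac{1}{R}\sum_{r=1}^{R} \rho^{k,r}_{X^{\prime\prime KR} E} \right\|_1 \tag{15.1.232} \\
& = \frac{1}{K}\sum_{k=1}^{K} \frac{1}{2}\left\| \mathcal{M}_{X^{\prime KR} B \to K_B}\!\left(\frac{1}{R}\sum_{r=1}^{R} \rho^{k,r}_{X^{\prime KR} X^{\prime\prime KR} BE}\right) - |k\rangle\!\langle k|_{K_B} \otimes \frac{1}{R}\sum_{r=1}^{R} \rho^{k,r}_{X^{\prime\prime KR} E} \right\|_1. \tag{15.1.233}
\end{align}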

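Now let us define the state
\begin{align}
\omega^{k',k}_{X^{\prime\prime KR} E} & := \frac{1}{q(k'|k)}\left(\frac{1}{R}\sum_{r,r'=1}^{R} \operatorname{Tr}_{X^{\prime KR} B}[\Lambda^{k',r'}_{X^{\prime KR} B}\, \rho^{k,r}_{X^{\prime KR} X^{\prime\prime KR} BE}]\right), \tag{15.1.234} \\
q(k'|k) & := \frac{1}{R}\sum_{r,r'=1}^{R} \operatorname{Tr}[\Lambda^{k',r'}_{X^{\prime KR} B}\, \rho^{k,r}_{X^{\prime KR} X^{\prime\prime KR} BE}]. \tag{15.1.235}
\end{align}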

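Consider that
\begin{equation}
\sum_{k'=1}^{K} q(k'|k)\, \omega^{k',k}_{X^{\prime\prime KR} E} = \frac{1}{R}\sum_{r=1}^{R} \rho^{k,r}_{X^{\prime\prime KR} E}. \tag{15.1.236}
\end{equation}
Then we can write
\begin{equation}
\mathcal{M}_{X^{\prime KR} B \to K_B}\!\left(\frac{1}{R}\sum_{r=1}^{R} \rho^{k,r}_{X^{\prime KR} X^{\prime\prime KR} BE}\right) = \sum_{k'=1}^{K} q(k'|k)\, |k'\rangle\!\langle k'|_{K_B} \otimes \omega^{k',k}_{X^{\prime\prime KR} E}, \tag{15.1.237}
\end{equation}
so that
\begin{equation}
\mathcal{M}_{X^{\prime KR} B \to K_B}\!\left(\frac{1}{R}\sum_{r=1}^{R} \rho^{k,r}_{X^{\prime KR} B}\right) = \sum_{k'=1}^{K} q(k'|k)\, |k'\rangle\!\langle k'|_{K_B}. \tag{15.1.238}
\end{equation}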

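Using these observations, we can finally write
\begin{align}
& \frac{1}{K}\sum_{k=1}^{K} \frac{1}{2}\left\| \mathcal{M}_{X^{\prime KR} B \to K_B}\!\left(\frac{1}{R}\sum_{r=1}^{R} \rho^{k,r}_{X^{\prime KR} X^{\prime\prime KR} BE}\right) - |k\rangle\!\langle k|_{K_B} \otimes \frac{1}{R}\sum_{r=1}^{R} \rho^{k,r}_{X^{\prime\prime KR} E} \right\|_1 \notag \\
& = \frac{1}{K}\sum_{k=1}^{K} \frac{1}{2}\left\| \sum_{k'=1}^{K} q(k'|k)\, |k'\rangle\!\langle k'|_{K_B} \otimes \omega^{k',k}_{X^{\prime\prime KR} E} - |k\rangle\!\langle k|_{K_B} \otimes \sum_{k'=1}^{K} q(k'|k)\, \omega^{k',k}_{X^{\prime\prime KR} E} \right\|_1 \tag{15.1.239} \\
& \leq \frac{1}{K}\sum_{k=1}^{K}\sum_{k'=1}^{K} q(k'|k)\, \frac{1}{2}\left\| |k'\rangle\!\langle k'|_{K_B} \otimes \omega^{k',k}_{X^{\prime\prime KR} E} - |k\rangle\!\langle k|_{K_B} \otimes \omega^{k',k}_{X^{\prime\prime KR} E} \right\|_1 \tag{15.1.240} \\
& = \frac{1}{K}\sum_{k=1}^{K}\sum_{k'=1}^{K} q(k'|k)\, \frac{1}{2}\left\| |k'\rangle\!\langle k'|_{K_B} - |k\rangle\!\langle k|_{K_B} \right\|_1 \tag{15.1.241} \\
& = \frac{1}{K}\sum_{k=1}^{K}\sum_{\substack{k'=1, \\ k'\neq k}}^{K} q(k'|k) \tag{15.1.242} \\
& = \frac{1}{K}\sum_{k=1}^{K} \frac{1}{2}\left\| \mathcal{M}_{X^{\prime KR} B \to K_B}\!\left(\frac{1}{R}\sum_{r=1}^{R} \rho^{k,r}_{X^{\prime KR} B}\right) - |k\rangle\!\langle k|_{K_B} \right\|_1 \tag{15.1.243} \\
& \leq \varepsilon - \delta. \tag{15.1.244}
\end{align}
We thus conclude that
\begin{equation}
\frac{1}{2}\left\| \mathcal{M}_{X^{\prime KR} B \to K_B}(\rho_{K_A X^{\prime KR} X^{\prime\prime KR} BE}) - \frac{1}{K}\sum_{k=1}^{K} |k\rangle\!\langle k|_{K_A} \otimes |k\rangle\!\langle k|_{K_B} \otimes \frac{1}{R}\sum_{r=1}^{R} \rho^{k,r}_{X^{\prime\prime KR} E} \right\|_1 \leq \varepsilon - \delta. \tag{15.1.245}
\end{equation}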

We now turn to the analysis of privacy. Starting from the overall global state
(15.1.217), and fixing a value of 𝑘, the reduced state of Eve’s systems is as follows:
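\begin{align}
\rho^{k}_{X^{\prime\prime KR} E} & = \frac{1}{R}\sum_{r=1}^{R} \rho^{k,r}_{X^{\prime\prime KR} E} \notag \\
& = \rho_{X''_{1,1}} \otimes \cdots \otimes \rho_{X''_{k-1,R}} \notag \\
& \qquad \otimes \left(\frac{1}{R}\sum_{r=1}^{R} \rho_{X''_{k,1}} \otimes \cdots \otimes \rho_{X''_{k,r-1}} \otimes \rho_{X''_{k,r}E} \otimes \rho_{X''_{k,r+1}} \otimes \cdots \otimes \rho_{X''_{k,R}}\right) \notag \\
& \qquad \otimes \rho_{X''_{k+1,1}} \otimes \cdots \otimes \rho_{X''_{K,R}}. \tag{15.1.246}
\end{align}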
Our goal is to show that
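\begin{equation}
\frac{1}{2}\left\| \rho^{k}_{X^{\prime\prime KR} E} - \rho_{X^{\prime\prime KR}} \otimes \widetilde{\rho}_E \right\|_1 \leq \delta, \tag{15.1.247}
\end{equation}
for some state $\widetilde{\rho}_E$. By the invariance of the trace distance with respect to tensor-product
states, i.e., $\|\sigma \otimes \tau - \omega \otimes \tau\|_1 = \|\sigma - \omega\|_1$, we find that
\begin{align}
& \frac{1}{2}\left\| \rho^{k}_{X^{\prime\prime KR} E} - \rho_{X^{\prime\prime KR}} \otimes \widetilde{\rho}_E \right\|_1 \tag{15.1.248} \\
& = \frac{1}{2}\left\| \rho^{k}_{X''_{k,1} \cdots X''_{k,R} E} - \rho_{X''_{k,1} \cdots X''_{k,R}} \otimes \widetilde{\rho}_E \right\|_1 \tag{15.1.249} \\
& = \frac{1}{2}\left\| \frac{1}{R}\sum_{r=1}^{R} \rho_{X''_{k,1}} \otimes \cdots \otimes \rho_{X''_{k,r-1}} \otimes \left(\rho_{X''_{k,r}E} - \rho_{X''_{k,r}} \otimes \widetilde{\rho}_E\right) \otimes \rho_{X''_{k,r+1}} \otimes \cdots \otimes \rho_{X''_{k,R}} \right\|_1. \tag{15.1.250}
\end{align}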

By invoking the smooth convex-split lemma (Lemma 15.22) and the inequality
relating normalized trace distance and sine distance (see (6.2.88)), we find that if
we pick 𝑅 such that

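\begin{equation}
\log_2 R = I_{\max}^{\delta-\zeta}(E;X)_\rho + \log_2\!\left(\frac{2}{\zeta^2}\right), \tag{15.1.251}
\end{equation}
then we are guaranteed that
\begin{equation}
\frac{1}{2}\left\| \rho^{k}_{X^{\prime\prime KR} E} - \rho_{X^{\prime\prime KR}} \otimes \widetilde{\rho}_E \right\|_1 \leq \delta, \tag{15.1.252}
\end{equation}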
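where $\widetilde{\rho}_E$ is some state such that $P(\widetilde{\rho}_E, \rho_E) \leq \delta - \zeta$. Now combining (15.1.245)
and (15.1.252) with the triangle inequality, we conclude the desired statement:
\begin{equation}
\frac{1}{2}\left\| \mathcal{M}_{X^{\prime KR} B \to K_B}(\rho_{K_A X^{\prime KR} X^{\prime\prime KR} BE}) - \Phi_{K_A K_B} \otimes \rho_{X^{\prime\prime KR}} \otimes \widetilde{\rho}_E \right\|_1 \leq \varepsilon. \tag{15.1.253}
\end{equation}
We finally conclude that
\begin{equation}
1 - F(\mathcal{M}_{X^{\prime KR} B \to K_B}(\rho_{K_A X^{\prime KR} X^{\prime\prime KR} BE}), \Phi_{K_A K_B} \otimes \rho_{X^{\prime\prime KR}} \otimes \widetilde{\rho}_E) \leq \varepsilon(2 - \varepsilon) \tag{15.1.254}
\end{equation}
by exploiting the inequality in (6.2.88) relating fidelity and trace distance. Now
using the fact that the inverse function of $\varepsilon(2 - \varepsilon)$, with domain and range given
by $[0, 1]$, is $1 - \sqrt{1 - \varepsilon}$, and reassigning $\varepsilon(2 - \varepsilon)$ as $\varepsilon$, we conclude the desired
statement in (15.1.213). ■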
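The last step converts a trace-distance bound $T \leq \varepsilon$ into the infidelity bound $1 - F \leq \varepsilon(2 - \varepsilon)$, which follows from $1 - \sqrt{F} \leq T$. A numerical sanity check of this relation on random density matrices might look as follows; the helper names are hypothetical, the fidelity is taken to be $F(\rho, \sigma) = \|\sqrt{\rho}\sqrt{\sigma}\|_1^2$, and the random-state construction is merely one convenient choice.

```python
import numpy as np


def psd_sqrt(m):
    """Square root of a positive semi-definite Hermitian matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T


def fidelity(rho, sigma):
    """F(rho, sigma) = (Tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    s = psd_sqrt(rho)
    w = np.linalg.eigvalsh(s @ sigma @ s)
    return float(np.sum(np.sqrt(np.clip(w, 0, None))) ** 2)


def trace_distance(rho, sigma):
    """Normalized trace distance (1/2) * ||rho - sigma||_1."""
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))


def random_state(d, rng):
    """A random d-dimensional density matrix (illustrative construction)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    m = g @ g.conj().T
    return m / np.trace(m).real
```

For every pair of states, `1 - fidelity(rho, sigma)` should not exceed `t * (2 - t)` with `t = trace_distance(rho, sigma)`, up to floating-point error.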

The result of Theorem 15.21 applies to the model of secret key distillation
outlined in the paragraph containing (15.1.195)–(15.1.198). To extend it to the

main model considered in this chapter (and outlined in Section 15.1), we can allow
Alice and Bob to perform an LOPC channel $\mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}$ to obtain the following
state:
\begin{equation}
\rho_{XB'EZ} := \mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}(\psi_{ABE}), \tag{15.1.255}
\end{equation}
where $\psi_{ABE}$ is a purification of the state $\rho_{AB}$ of interest and $\mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}$ is an LOPC
channel with classical output system $X$ and quantum output system $B'$. Then we
obtain the following by applying Theorem 15.21:

Corollary 15.23
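Let $\rho_{AB}$ be a bipartite state, with system $X$ held by Alice, $B$ by Bob, and $E$
by Eve. For all $\varepsilon \in (0, 1]$, $\varepsilon' := 1 - \sqrt{1 - \varepsilon}$, $\delta \in (0, \varepsilon')$, $\eta \in (0, \varepsilon' - \delta)$, and
$\zeta \in (0, \delta)$, there exists a $(K, \varepsilon)$ one-way key distillation protocol for $\rho_{AB}$ with
\begin{equation}
\log_2 K = I_H^{\varepsilon'-\delta-\eta}(X;B')_\rho - I_{\max}^{\delta-\zeta}(EZ;X)_\rho - \log_2\!\left(\frac{4(\varepsilon'-\delta)}{\eta^2}\right) - \log_2\!\left(\frac{2}{\zeta^2}\right), \tag{15.1.256}
\end{equation}
where the hypothesis testing mutual information $I_H^{\varepsilon'-\delta-\eta}(X;B')_\rho$ is defined
in (7.11.88), the smooth max-mutual information $I_{\max}^{\delta-\zeta}(EZ;X)_\rho$ is defined
in (15.1.211), and these quantities are evaluated with respect to the state in
(15.1.255), with $\mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}$ an LOPC channel with classical output system $X$
and quantum output system $B'$. Consequently, for the one-shot distillable key
of $\rho_{AB}$, we have
\begin{equation}
K_D^{\varepsilon}(A;B)_\rho \geq \sup_{\substack{\mathcal{L}^{\leftrightarrow},\, \delta\in(0,\varepsilon'), \\ \eta\in(0,\varepsilon'-\delta),\, \zeta\in(0,\delta)}} \left[ I_H^{\varepsilon'-\delta-\eta}(X;B')_\rho - I_{\max}^{\delta-\zeta}(EZ;X)_\rho - \log_2\!\left(\frac{4(\varepsilon'-\delta)}{\eta^2}\right) - \log_2\!\left(\frac{2}{\zeta^2}\right) \right], \tag{15.1.257}
\end{equation}
where $\varepsilon' := 1 - \sqrt{1 - \varepsilon}$ and the optimization is over every LOPC channel
$\mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}$.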

Now combining Corollary 15.23 with Propositions 7.64 and 7.72, we conclude
the following lower bound on one-shot distillable key:


Corollary 15.24
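Let $\rho_{AB}$ be a bipartite state with purification $\psi_{ABE}$. For all $\varepsilon \in (0, 1)$,
$\varepsilon' = 1 - \sqrt{1 - \varepsilon}$, $\delta \in (0, \varepsilon')$, $\eta \in (0, \varepsilon' - \delta)$, $\zeta \in (0, \delta)$, $\nu \in (0, \delta - \zeta)$,
$\alpha \in (0, 1)$, and $\beta > 1$, there exists a $(K, \varepsilon)$ one-way key distillation protocol
for $\rho_{AB}$ satisfying
\begin{equation*}
\log_2 K \geq I_{\alpha}(X;B')_\rho - \widetilde{I}'_{\beta}(X;EZ)_\rho - f(\varepsilon', \delta, \eta, \nu, \zeta, \alpha, \beta)
\end{equation*}
where
\begin{equation}
\rho_{XB'EZ} := \mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}(\psi_{ABE}), \tag{15.1.258}
\end{equation}
$\mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}$ is an LOPC channel with classical output system $X$ and quantum
output system $B'$,
\begin{equation}
\widetilde{I}'_{\beta}(X;EZ)_\rho := \widetilde{D}_{\beta}(\rho_{XEZ} \Vert \rho_X \otimes \rho_{EZ}), \tag{15.1.259}
\end{equation}
and
\begin{align}
f(\varepsilon', \delta, \eta, \nu, \zeta, \alpha, \beta) := {} & \frac{\alpha}{1-\alpha}\log_2\!\left(\frac{1}{\varepsilon'-\delta-\eta}\right) + \log_2\!\left(\frac{8}{\nu^2}\right) \notag \\
& + \frac{1}{\beta-1}\log_2\!\left(\frac{1}{(\delta-\zeta-\nu)^2}\right) + \log_2\!\left(\frac{1}{1-(\delta-\zeta-\nu)^2}\right) \notag \\
& + \log_2\!\left(\frac{4(\varepsilon'-\delta)}{\eta^2}\right) + \log_2\!\left(\frac{2}{\zeta^2}\right). \tag{15.1.260}
\end{align}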

Proof: The main idea here is to convert the smooth mutual information quantities
$I_H^{\varepsilon'-\delta-\eta}(X;B')_\rho$ and $I_{\max}^{\delta-\zeta}(EZ;X)_\rho$ from Corollary 15.23 to Rényi mutual information
quantities with correction terms related to the smoothing parameters. Let us
first invoke Proposition 7.72 to conclude the following lower bound:
\begin{equation}
I_H^{\varepsilon'-\delta-\eta}(X;B')_\rho \geq I_{\alpha}(X;B')_\rho - \frac{\alpha}{1-\alpha}\log_2\!\left(\frac{1}{\varepsilon'-\delta-\eta}\right). \tag{15.1.261}
\end{equation}
Next, we invoke Lemma 15.25 below to conclude that
\begin{equation}
I_{\max}^{\delta-\zeta}(EZ;X)_\rho \leq D_{\max}^{\delta-\zeta-\nu}(\rho_{XEZ} \Vert \rho_X \otimes \rho_{EZ}) + \log_2\!\left(\frac{8}{\nu^2}\right), \tag{15.1.262}
\end{equation}
where $\nu \in (0, \delta - \zeta)$. Then we invoke Proposition 7.64 to conclude that


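\begin{align}
D_{\max}^{\delta-\zeta-\nu}(\rho_{XEZ} \Vert \rho_X \otimes \rho_{EZ}) \leq {} & \widetilde{D}_{\beta}(\rho_{XEZ} \Vert \rho_X \otimes \rho_{EZ}) \notag \\
& + \frac{1}{\beta-1}\log_2\!\left(\frac{1}{(\delta-\zeta-\nu)^2}\right) + \log_2\!\left(\frac{1}{1-(\delta-\zeta-\nu)^2}\right). \tag{15.1.263}
\end{align}
Consider that
\begin{equation}
\widetilde{D}_{\beta}(\rho_{XEZ} \Vert \rho_X \otimes \rho_{EZ}) = \widetilde{I}'_{\beta}(X;EZ)_\rho. \tag{15.1.264}
\end{equation}
Putting all of the above together with Corollary 15.23, we conclude the proof. ■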

Lemma 15.25
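Let $\rho_{AE}$ be a bipartite state, and let $\varepsilon, \delta > 0$ be such that $\varepsilon + \delta < 1$. Then
\begin{equation}
I_{\max}^{\varepsilon+\delta}(E;A)_\rho \leq D_{\max}^{\varepsilon}(\rho_{AE} \Vert \rho_A \otimes \rho_E) + \log_2\!\left(\frac{8}{\delta^2}\right), \tag{15.1.265}
\end{equation}
where $I_{\max}^{\varepsilon+\delta}(E;A)_\rho$ is defined in (15.1.211) and $D_{\max}^{\varepsilon}(\rho_{AE} \Vert \rho_A \otimes \rho_E)$
in (7.8.42).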

Proof: See Appendix 15.B. ■

15.2 Distillable Key of a Quantum State

Having found upper and lower bounds on the one-shot distillable key $K_D^{\varepsilon}(A;B)_\rho$ of
a bipartite state $\rho_{AB}$, let us now move on to the asymptotic setting. In this setting,
we allow Alice and Bob to make use of an arbitrarily large number $n$ of copies of
the state $\rho_{AB}$ in order to obtain a secret-key state. A secret key distillation protocol
for $n$ copies of $\rho_{AB}$ is defined by the triple $(n, K, \mathcal{L}^{\leftrightarrow}_{A^n B^n \to K_A K_B Z})$, consisting of the
number $n$ of copies of $\rho_{AB}$, an integer $K \in \mathbb{N}$, and an LOPC channel $\mathcal{L}^{\leftrightarrow}_{A^n B^n \to K_A K_B Z}$
with $d_{K_A} = d_{K_B} = K$. Observe that a secret-key distillation protocol for $n$ copies
of $\rho_{AB}$ is equivalent to a one-shot secret-key distillation protocol for the state $\rho_{AB}^{\otimes n}$.
All of the results of Section 15.1 thus carry over to the asymptotic setting simply
by replacing $\rho_{AB}$ with $\rho_{AB}^{\otimes n}$. In particular, the error probability for a secret-key
distillation protocol for $\rho_{AB}$ defined by $(n, K, \mathcal{L}^{\leftrightarrow}_{A^n B^n \to K_A K_B Z})$ is equal to
\begin{equation}
p_{\text{err}}(\mathcal{L}^{\leftrightarrow}; \rho_{AB}^{\otimes n}) = \inf_{\gamma_{K_A K_B E Z}} \left[ 1 - F(\gamma_{K_A K_B E Z}, \mathcal{L}^{\leftrightarrow}_{A^n B^n \to K_A K_B Z}(\psi_{ABE}^{\otimes n})) \right], \tag{15.2.1}
\end{equation}


where the infimum is with respect to every ideal tripartite key state $\gamma_{K_A K_B E Z}$ and
$\psi_{ABE}$ is a purification of $\rho_{AB}$. The definition in (15.2.1) is thus the same as that in
(15.1.6), but for the tensor-power state $\rho_{AB}^{\otimes n}$.

Definition 15.26 (𝒏, 𝑲, 𝜺) Secret-Key Distillation Protocol

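A secret-key distillation protocol $(n, K, \mathcal{L}^{\leftrightarrow}_{A^n B^n \to K_A K_B Z})$ for $n$ copies of $\rho_{AB}$,
with $d_{K_A} = d_{K_B} = K$, is called an $(n, K, \varepsilon)$ protocol, with $\varepsilon \in [0, 1]$, if
$p_{\text{err}}(\mathcal{L}^{\leftrightarrow}; \rho_{AB}^{\otimes n}) \leq \varepsilon$.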

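Based on the discussion above, we note here that an $(n, K, \varepsilon)$ secret-key
distillation protocol for $\rho_{AB}$ is a $(K, \varepsilon)$ secret-key distillation protocol for $\rho_{AB}^{\otimes n}$.
The rate $R(n, K)$ of an $(n, K, \varepsilon)$ secret-key distillation protocol for $n$ copies of
a given state is
\begin{equation}
R(n, K) := \frac{\log_2 K}{n}, \tag{15.2.2}
\end{equation}
which can be thought of as the number of $\varepsilon$-approximate secret-key bits contained
in the final state of the protocol, per copy of the given initial state. Given a state
$\rho_{AB}$ and $\varepsilon \in [0, 1]$, the maximum rate of secret key distillation among all $(n, K, \varepsilon)$
secret-key distillation protocols for $\rho_{AB}$ is
\begin{align}
K_D^{n,\varepsilon}(\rho_{AB}) \equiv K_D^{n,\varepsilon}(A;B)_\rho & := \frac{1}{n} K_D^{\varepsilon}(\rho_{AB}^{\otimes n}) \tag{15.2.3} \\
& = \sup_{(K, \mathcal{L}^{\leftrightarrow})} \left\{ \frac{\log_2 K}{n} : p_{\text{err}}(\mathcal{L}^{\leftrightarrow}; \rho_{AB}^{\otimes n}) \leq \varepsilon \right\}, \tag{15.2.4}
\end{align}
where the optimization is with respect to all $K \in \mathbb{N}$ and every LOPC channel
$\mathcal{L}^{\leftrightarrow}_{A^n B^n \to K_A K_B Z}$ with $d_{K_A} = d_{K_B} = K$.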

Definition 15.27 Achievable Rate for Secret Key Distillation

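Given a bipartite state $\rho_{AB}$, a rate $R \in \mathbb{R}^{+}$ is called an achievable rate for secret
key distillation for $\rho_{AB}$ if for all $\varepsilon \in (0, 1]$, $\delta > 0$, and sufficiently large $n$, there
exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ secret-key distillation protocol for $\rho_{AB}$.

As we prove in Appendix A,
\begin{equation}
R \text{ achievable rate} \iff \lim_{n\to\infty} \varepsilon_D(2^{n(R-\delta)}; \rho_{AB}^{\otimes n}) = 0 \quad \forall \delta > 0. \tag{15.2.5}
\end{equation}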


In other words, a rate 𝑅 is achievable if the optimal error probability for a sequence
of protocols with rate 𝑅 − 𝛿 vanishes as the number 𝑛 of copies of 𝜌 𝐴𝐵 increases.

Definition 15.28 Distillable Key of a Quantum State

The distillable key of a bipartite state 𝜌 𝐴𝐵 , denoted by 𝐾 𝐷 ( 𝐴; 𝐵) 𝜌 , is defined to
be the supremum of all achievable rates for secret key distillation for 𝜌 𝐴𝐵 , i.e.,

𝐾 𝐷 ( 𝐴; 𝐵) 𝜌 B sup {𝑅 : 𝑅 is an achievable rate for 𝜌 𝐴𝐵 } . (15.2.6)

The distillable key can also be written as

\begin{equation}
K_D(A;B)_\rho = \inf_{\varepsilon\in(0,1]} \liminf_{n\to\infty} \frac{1}{n} K_D^{\varepsilon}(\rho_{AB}^{\otimes n}). \tag{15.2.7}
\end{equation}

See Appendix A for a proof.

Definition 15.29 Weak Converse Rate for Secret Key Distillation

Given a bipartite state 𝜌 𝐴𝐵 , a rate 𝑅 ∈ R+ is called a weak converse rate for
secret key distillation for 𝜌 𝐴𝐵 if every 𝑅′ > 𝑅 is not an achievable rate for 𝜌 𝐴𝐵 .

As we show in Appendix A,

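\begin{equation}
R \text{ weak converse rate} \iff \lim_{n\to\infty} \varepsilon_D(2^{n(R-\delta)}; \rho_{AB}^{\otimes n}) > 0 \quad \forall \delta > 0. \tag{15.2.8}
\end{equation}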

Definition 15.30 Strong Converse Rate for Secret Key Distillation

Given a bipartite state $\rho_{AB}$, a rate $R \in \mathbb{R}^{+}$ is called a strong converse rate for
secret key distillation for $\rho_{AB}$ if for all $\varepsilon \in [0, 1)$, $\delta > 0$, and sufficiently large
$n$, there does not exist an $(n, 2^{n(R+\delta)}, \varepsilon)$ secret key distillation protocol for $\rho_{AB}$.

We show in Appendix A that
\begin{equation}
R \text{ strong converse rate} \iff \lim_{n\to\infty} \varepsilon_D(2^{n(R-\delta)}; \rho_{AB}^{\otimes n}) = 1 \quad \forall \delta > 0. \tag{15.2.9}
\end{equation}


Definition 15.31 Strong Converse Distillable Key of a Quantum State

The strong converse distillable key of a bipartite state $\rho_{AB}$, denoted by
$\widetilde{K}_D(A;B)_\rho$, is defined as the infimum of all strong converse rates, i.e.,
\begin{equation}
\widetilde{K}_D(A;B)_\rho := \inf\{R : R \text{ is a strong converse rate for } \rho_{AB}\}. \tag{15.2.10}
\end{equation}

Note that
\begin{equation}
K_D(A;B)_\rho \leq \widetilde{K}_D(A;B)_\rho \tag{15.2.11}
\end{equation}
for every bipartite state $\rho_{AB}$. We can also write the strong converse distillable key
as
\begin{equation}
\widetilde{K}_D(A;B)_\rho = \sup_{\varepsilon\in[0,1)} \limsup_{n\to\infty} \frac{1}{n} K_D^{\varepsilon}(\rho_{AB}^{\otimes n}). \tag{15.2.12}
\end{equation}
See Appendix A for a proof.
We are now ready to present a general expression for the distillable key of a
bipartite state, as well as two upper bounds on it.

Theorem 15.32 Distillable Key of a Bipartite State

The distillable key of a bipartite state $\rho_{AB}$ is given by
\begin{equation}
K_D(A;B)_\rho = \lim_{n\to\infty} \frac{1}{n} \sup_{\mathcal{L}^{(n)}} \left[ I(X;B')_{\mathcal{L}^{(n)}(\psi^{\otimes n})} - I(X;EZ)_{\mathcal{L}^{(n)}(\psi^{\otimes n})} \right], \tag{15.2.13}
\end{equation}
where the optimization is with respect to every two-way LOPC channel
$\mathcal{L}^{(n)}_{A^n B^n \to XB'Z}$ with classical output system $X$ and quantum output system
$B'$. The information quantities are evaluated with respect to the state
$\mathcal{L}^{(n)}_{A^n B^n \to XB'Z}(\psi_{ABE}^{\otimes n})$, where $\psi_{ABE}$ is a purification of $\rho_{AB}$. Furthermore, the
relative entropy of entanglement $E_R(A;B)_\rho$ from (9.2.2) is a strong converse
rate for distillable key, in the sense that
\begin{equation}
\widetilde{K}_D(A;B)_\rho \leq E_R(A;B)_\rho, \tag{15.2.14}
\end{equation}
and the squashed entanglement from (9.4.1) is a weak converse rate, in the
sense that
\begin{equation}
K_D(A;B)_\rho \leq E_{\text{sq}}(A;B)_\rho. \tag{15.2.15}
\end{equation}


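If we define
\begin{equation}
D_K^{\leftrightarrow}(\rho_{AB}) \equiv D_K^{\leftrightarrow}(A;B)_\rho := \sup_{\mathcal{L}} \left[ I(X;B')_{\mathcal{L}(\psi)} - I(X;EZ)_{\mathcal{L}(\psi)} \right], \tag{15.2.16}
\end{equation}
where the entropic quantities are evaluated with respect to $\mathcal{L}_{AB \to XB'Z}(\psi_{ABE})$,
with $\psi_{ABE}$ a purification of $\rho_{AB}$, then we can write (15.2.13) as
\begin{equation}
K_D(A;B)_\rho = \lim_{n\to\infty} \frac{1}{n} D_K^{\leftrightarrow}(\rho_{AB}^{\otimes n}) =: D_{\text{reg},K}^{\leftrightarrow}(\rho_{AB}). \tag{15.2.17}
\end{equation}
Thus, the distillable key can be viewed as the regularization of $D_K^{\leftrightarrow}$, similar to
what we found in (13.2.17) in Chapter 13 for distillable entanglement.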
Let us make the following observations about Theorem 15.32.

• The private information is an achievable rate for secret key distillation, i.e.,
𝐾 𝐷 ( 𝐴; 𝐵) 𝜌 ≥ max{𝐼 (𝑋; 𝐵)𝜏 − 𝐼 (𝑋; 𝐸)𝜏 , 𝐼 ( 𝐴; 𝑌 )𝜔 − 𝐼 (𝑌 ; 𝐸)𝜔 }, (15.2.18)
where
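\begin{align}
\tau_{XBE} & := \sum_{x} |x\rangle\!\langle x|_X \otimes \operatorname{Tr}_A[\Lambda^{x}_{A}\, \psi_{ABE}], \tag{15.2.19} \\
\omega_{YAE} & := \sum_{y} |y\rangle\!\langle y|_Y \otimes \operatorname{Tr}_B[\Gamma^{y}_{B}\, \psi_{ABE}], \tag{15.2.20}
\end{align}
$\psi_{ABE}$ is a purification of the bipartite state $\rho_{AB}$, and $\{\Lambda^{x}_{A}\}_x$ and $\{\Gamma^{y}_{B}\}_y$ are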
POVMs. The idea behind the first achievable rate 𝐼 (𝑋; 𝐵)𝜏 − 𝐼 (𝑋; 𝐸)𝜏 is that
Alice performs the measurement {Λ𝑥𝐴 }𝑥 on her system 𝐴, and this produces
the classical–quantum–quantum state 𝜏𝑋 𝐵𝐸 . Alice and Bob then execute the
protocol from Theorem 15.21 on many copies of the state 𝜏𝑋 𝐵𝐸 . Alternatively,
the idea behind the second achievable rate 𝐼 ( 𝐴; 𝑌 )𝜔 − 𝐼 (𝑌 ; 𝐸)𝜔 is similar, but
with the roles of Alice and Bob swapped and distilling a key from many copies
of the state 𝜔𝑌 𝐴𝐸 .

We can also consider these conclusions to be immediate consequences of

(15.2.13), which follow from dropping the optimization over two-way LOPC
channels and from the fact that the unoptimized private informations in
(15.2.18) are additive for product states.
• In order to obtain a higher key distillation rate than the private informations
in (15.2.18), one strategy is to use $n \geq 2$ copies of $\psi_{ABE}$ along with a
two-way LOPC channel $\mathcal{L}_{A^n B^n \to XB'Z}$, in order to obtain a state $\omega_{XB'E^nZ} :=
\mathcal{L}_{A^n B^n \to XB'Z}(\psi_{ABE}^{\otimes n})$. The normalized private information of this latter state
is potentially higher than those of $\tau_{XBE}$ and $\omega_{YAE}$ in (15.2.19)–(15.2.20). The
overall rate of this strategy is then
\begin{equation}
\frac{1}{n}\left[ I(X;B')_{\mathcal{L}^{(n)}(\psi^{\otimes n})} - I(X;EZ)_{\mathcal{L}^{(n)}(\psi^{\otimes n})} \right], \tag{15.2.21}
\end{equation}
and Theorem 15.32 tells us that such a strategy is optimal in the large 𝑛 limit.
With increasingly more copies of 𝜓 𝐴𝐵𝐸 to start with, it might be possible to
obtain a better rate, which is why we need to regularize in general.
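To make the first achievable rate in (15.2.18) concrete, the private information $I(X;B)_\tau - I(X;E)_\tau$ of the state $\tau_{XBE}$ in (15.2.19) can be computed numerically for small systems. The following NumPy sketch is an illustration under assumptions: the function names are hypothetical, the pure state $\psi_{ABE}$ is passed as an amplitude array of shape `(dA, dB, dE)`, and Alice's POVM is passed as a list of matrices acting on $A$.

```python
import numpy as np


def entropy(rho):
    """Von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))


def partial_trace(rho, dims, keep):
    """Trace out every subsystem whose index is not in `keep` (sorted ascending)."""
    dims = list(dims)
    t = rho.reshape(dims + dims)
    # remove subsystems from the highest index down so lower axes keep their positions
    for i in sorted(set(range(len(dims))) - set(keep), reverse=True):
        t = np.trace(t, axis1=i, axis2=i + t.ndim // 2)
    d = int(np.prod([dims[i] for i in keep]))
    return t.reshape(d, d)


def private_information(psi, povm):
    """I(X;B) - I(X;E) for tau_XBE = sum_x |x><x| (x) Tr_A[Lambda^x psi_ABE]."""
    _, d_b, d_e = psi.shape
    d_x = len(povm)
    tau = np.zeros((d_x * d_b * d_e, d_x * d_b * d_e), dtype=complex)
    for x, lam in enumerate(povm):
        # unnormalized post-measurement state of BE for outcome x
        m = np.einsum('ij,jbe,icf->becf', lam, psi, psi.conj())
        proj = np.zeros((d_x, d_x))
        proj[x, x] = 1.0
        tau += np.kron(proj, m.reshape(d_b * d_e, d_b * d_e))
    dims = [d_x, d_b, d_e]

    def h(keep):
        return entropy(partial_trace(tau, dims, keep))

    i_xb = h([0]) + h([1]) - h([0, 1])
    i_xe = h([0]) + h([2]) - h([0, 2])
    return i_xb - i_xe
```

For a maximally entangled pair of qubits shared by Alice and Bob with Eve decoupled, a computational-basis measurement gives one bit of private information, whereas for a GHZ state shared among all three parties the same measurement gives zero, since Eve learns the outcome as well as Bob does.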

As with other previous capacity theorem proofs in this book, we prove Theo-
rem 15.32 in two steps:

1. Achievability: We show that the right-hand side of (15.2.13) is an achievable

rate for secret key distillation for 𝜌 𝐴𝐵 . Doing so involves exhibiting an explicit
secret key distillation protocol. The protocol we use is based on the one we
introduced in Section 15.1.4 to obtain a lower bound on the one-shot distillable
secret key.

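The achievability part of the proof establishes that
\begin{equation}
K_D(A;B)_\rho \geq \lim_{n\to\infty} \frac{1}{n} \sup_{\mathcal{L}^{(n)}} \left[ I(X;B')_{\mathcal{L}^{(n)}(\psi^{\otimes n})} - I(X;EZ)_{\mathcal{L}^{(n)}(\psi^{\otimes n})} \right]. \tag{15.2.22}
\end{equation}

2. Weak converse: We show that the right-hand side of (15.2.13) is a weak
converse rate for secret key distillation for $\rho_{AB}$, from which it follows that
\begin{equation}
K_D(A;B)_\rho \leq \lim_{n\to\infty} \frac{1}{n} \sup_{\mathcal{L}^{(n)}} \left[ I(X;B')_{\mathcal{L}^{(n)}(\psi^{\otimes n})} - I(X;EZ)_{\mathcal{L}^{(n)}(\psi^{\otimes n})} \right]. \tag{15.2.23}
\end{equation}

In order to show this, we use the one-shot upper bounds from Section 15.1.3 to
prove that every achievable rate $R$ satisfies
\begin{equation}
R \leq \lim_{n\to\infty} \frac{1}{n} \sup_{\mathcal{L}^{(n)}} \left[ I(X;B')_{\mathcal{L}^{(n)}(\psi^{\otimes n})} - I(X;EZ)_{\mathcal{L}^{(n)}(\psi^{\otimes n})} \right]. \tag{15.2.24}
\end{equation}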

We go through the achievability part of the proof of Theorem 15.32 in Sec-

tion 15.2.1. We then proceed with the weak converse part in Section 15.2.2.

The expression in (15.2.13) for the distillable key involves both a limit over an
unbounded number of copies of the state 𝜌 𝐴𝐵 , as well as an optimization over all
two-way LOPC channels. Computing the distillable key is therefore intractable
in general. After establishing a proof of (15.2.13), we proceed to establish upper
bounds on distillable key that depend only on the given state $\rho_{AB}$.
Specifically, in Section 15.2.3, we use the one-shot results in Section 15.1.3.2 to
show that the relative entropy of entanglement is a strong converse rate for secret
key distillation. We also show that the squashed entanglement is a weak converse
rate for secret key distillation.

15.2.1 Proof of Achievability

As the first step in proving the achievability part of Theorem 15.32, let us recall
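Corollary 15.24: given a bipartite state $\rho_{AB}$ with purification $\psi_{ABE}$, for all $\varepsilon \in (0, 1)$,
$\varepsilon' = 1 - \sqrt{1 - \varepsilon}$, $\delta \in (0, \varepsilon')$, $\eta \in (0, \varepsilon' - \delta)$, $\zeta \in (0, \delta)$, $\nu \in (0, \delta - \zeta)$, $\alpha \in (0, 1)$,
and $\beta > 1$, there exists a $(K, \varepsilon)$ one-way key distillation protocol for $\rho_{AB}$ satisfying
\begin{equation}
\log_2 K \geq I_{\alpha}(X;B')_\rho - \widetilde{I}'_{\beta}(X;EZ)_\rho - f(\varepsilon', \delta, \eta, \nu, \zeta, \alpha, \beta) \tag{15.2.25}
\end{equation}
where
\begin{equation}
\rho_{XB'EZ} := \mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}(\psi_{ABE}), \tag{15.2.26}
\end{equation}
$\mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}$ is an LOPC channel with classical output system $X$ and quantum output
system $B'$, the Rényi mutual information $\widetilde{I}'_{\beta}(X;EZ)_\rho$ is defined in (15.1.259), and
the function $f(\varepsilon', \delta, \eta, \nu, \zeta, \alpha, \beta)$ in (15.1.260). Applying this inequality to the
state $\rho_{AB}^{\otimes n}$ for all $n \in \mathbb{N}$ leads to the following: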

Proposition 15.33
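For every state $\rho_{AB}$ and $\varepsilon \in (0, 1)$, there exists an $(n, K, \varepsilon)$ key distillation
protocol for $\rho_{AB}$ such that the rate $\frac{\log_2 K}{n}$ satisfies
\begin{equation}
\frac{\log_2 K}{n} \geq I_{\alpha}(X;B')_\rho - \widetilde{I}'_{\beta}(X;EZ)_\rho - \frac{1}{n} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right), \tag{15.2.27}
\end{equation}
for all $n \in \mathbb{N}$, $\alpha \in (0, 1)$, $\beta > 1$, where the information quantities are with
respect to the state in (15.2.26). More generally, we have the following lower
bound on the finite-length distillable key:
\begin{equation}
K_D^{n,\varepsilon}(A;B)_\rho \geq \frac{1}{n} \sup_{\mathcal{L}^{\leftrightarrow}} \left( I_{\alpha}(X;B')_\tau - \widetilde{I}'_{\beta}(X;E^nZ)_\tau \right) - \frac{1}{n} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right), \tag{15.2.28}
\end{equation}
for all $n \in \mathbb{N}$, $\alpha \in (0, 1)$, $\beta > 1$, where the optimization is over every LOPC
channel $\mathcal{L}^{\leftrightarrow}_{A^n B^n \to XB'Z}$ and the information quantities are with respect to the
following state:
\begin{equation}
\tau_{XB'E^nZ} := \mathcal{L}^{\leftrightarrow}_{A^n B^n \to XB'Z}(\psi_{ABE}^{\otimes n}). \tag{15.2.29}
\end{equation}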

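Proof: Let $\psi_{ABE}$ be a purification of $\rho_{AB}$, and use the tensor-product purification
$\psi_{ABE}^{\otimes n}$ for $\rho_{AB}^{\otimes n}$. Also, let $\delta = \frac{\varepsilon'}{2}$, $\eta = \frac{\varepsilon'}{4}$, $\nu = \frac{\varepsilon'}{4}$, and $\zeta = \frac{\varepsilon'}{2}$. Substituting all of this
into the inequality in (15.2.25) and simplifying leads to the inequality
\begin{equation}
\frac{\log_2 K}{n} \geq \frac{1}{n}\left( I_{\alpha}(X;B')_\tau - \widetilde{I}'_{\beta}(X;E^nZ)_\tau \right) - \frac{1}{n} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right), \tag{15.2.30}
\end{equation}
where $\tau_{XB'E^nZ} := \mathcal{L}^{\leftrightarrow}_{A^n B^n \to XB'Z}(\psi_{ABE}^{\otimes n})$. Then, optimizing over every LOPC channel
$\mathcal{L}^{\leftrightarrow}_{A^n B^n \to XB'Z}$, and using the definition of $K_D^{n,\varepsilon}(A;B)_\rho$ in (15.2.3), we obtain (15.2.28).
By restricting the state $\tau_{XB'E^nZ}$ to have the form $\tau_{XB'EZ}^{\otimes n} = (\mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}(\psi_{ABE}))^{\otimes n}$
(i.e., using a tensor-power LOPC strategy) and employing additivity of $I_{\alpha}(X^n;B'^n)_{\tau^{\otimes n}}$
and $\widetilde{I}'_{\beta}(X^n;E^nZ^n)_{\tau^{\otimes n}}$, we conclude that
\begin{equation}
I_{\alpha}(X^n;B'^n)_{\tau^{\otimes n}} - \widetilde{I}'_{\beta}(X^n;E^nZ^n)_{\tau^{\otimes n}} = n\left( I_{\alpha}(X;B')_\tau - \widetilde{I}'_{\beta}(X;EZ)_\tau \right). \tag{15.2.31}
\end{equation}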

The lower bound in (15.2.27) then follows. ■

Using the inequality in (15.2.27), we can prove the following lower bound on
distillable key:

Theorem 15.34 Achievability of Private Information for Secret Key Distillation

The private information $I(X;B')_\tau - I(X;EZ)_\tau$ is an achievable rate for secret
key distillation for $\rho_{AB}$, where the private information is evaluated with respect
to the following state:
\begin{equation}
\tau_{XB'EZ} := \mathcal{L}^{\leftrightarrow}_{AB \to XB'Z}(\psi_{ABE}), \tag{15.2.32}
\end{equation}

and 𝜓 𝐴𝐵𝐸 is a purification of 𝜌 𝐴𝐵 . In other words,

𝐾 𝐷 ( 𝐴; 𝐵) 𝜌 ≥ 𝐼 (𝑋; 𝐵′)𝜏 − 𝐼 (𝑋; 𝐸 𝑍)𝜏 (15.2.33)

for every bipartite state 𝜌 𝐴𝐵 .

Proof: Let 𝜓 𝐴𝐵𝐸 be a purification of 𝜌 𝐴𝐵 . Fix 𝜀 ∈ (0, 1] and 𝛿 > 0. Let 𝛿1 , 𝛿2 > 0
be such that 𝛿 = 𝛿1 + 𝛿2 . Set 𝛼 ∈ (0, 1) and 𝛽 > 1 such that

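\begin{equation}
\delta_1 \geq \left( I(X;B')_\tau - I(X;EZ)_\tau \right) - \left( I_{\alpha}(X;B')_\tau - \widetilde{I}'_{\beta}(X;EZ)_\tau \right). \tag{15.2.34}
\end{equation}
Note that this is possible because $I_{\alpha}(X;B')_\tau$ increases monotonically with increasing
$\alpha \in (0, 1)$ (see Proposition 7.23) and $\widetilde{I}'_{\beta}(X;EZ)_\tau$ decreases monotonically with
decreasing $\beta$ (see Proposition 7.31), so that
\begin{align}
\lim_{\alpha\to 1^{-}} I_{\alpha}(X;B')_\tau & = \sup_{\alpha\in(0,1)} I_{\alpha}(X;B')_\tau, \tag{15.2.35} \\
\lim_{\beta\to 1^{+}} \widetilde{I}'_{\beta}(X;EZ)_\tau & = \inf_{\beta\in(1,\infty)} \widetilde{I}'_{\beta}(X;EZ)_\tau. \tag{15.2.36}
\end{align}
Also,
\begin{align}
I(X;B')_\tau & = \lim_{\alpha\to 1^{-}} I_{\alpha}(X;B')_\tau, \tag{15.2.37} \\
I(X;EZ)_\tau & = \lim_{\beta\to 1^{+}} \widetilde{I}'_{\beta}(X;EZ)_\tau. \tag{15.2.38}
\end{align}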

With 𝛼 and 𝛽 chosen such that (15.2.34) holds, take 𝑛 large enough so that
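\begin{equation}
\delta_2 \geq \frac{1}{n} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right). \tag{15.2.39}
\end{equation}
Now, we use the fact that for the $n$ and $\varepsilon$ chosen above, there exists an $(n, K, \varepsilon)$
protocol such that
\begin{equation}
\frac{\log_2 K}{n} \geq I_{\alpha}(X;B')_\rho - \widetilde{I}'_{\beta}(X;EZ)_\rho - \frac{1}{n} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right). \tag{15.2.40}
\end{equation}
(This follows from Proposition 15.33 above.) Rearranging the right-hand side of
this inequality, and using (15.2.34), (15.2.39), and (15.2.40), we find that
\begin{align}
\frac{\log_2 K}{n} & \geq I(X;B')_\tau - I(X;EZ)_\tau \notag \\
& \qquad - \left( \left[ I(X;B')_\tau - I(X;EZ)_\tau \right] - \left[ I_{\alpha}(X;B')_\tau - \widetilde{I}'_{\beta}(X;EZ)_\tau \right] + \frac{1}{n} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right) \right) \tag{15.2.41} \\
& \geq I(X;B')_\tau - I(X;EZ)_\tau - (\delta_1 + \delta_2) \tag{15.2.42} \\
& = I(X;B')_\tau - I(X;EZ)_\tau - \delta. \tag{15.2.43}
\end{align}
We thus have shown that there exists an $(n, K, \varepsilon)$ secret key distillation protocol
with rate $\frac{\log_2 K}{n} \geq I(X;B')_\tau - I(X;EZ)_\tau - \delta$. Therefore, there exists an
$(n, 2^{n(R-\delta)}, \varepsilon)$ secret-key distillation protocol with $R = I(X;B')_\tau - I(X;EZ)_\tau$ for
all sufficiently large $n$ such that (15.2.39) holds. Since $\varepsilon$ and $\delta$ are arbitrary, we
conclude that for all $\varepsilon \in (0, 1]$, $\delta > 0$, and sufficiently large $n$, there exists an
$(n, 2^{n(I(X;B')_\tau - I(X;EZ)_\tau - \delta)}, \varepsilon)$ secret key distillation protocol. This means that, by
definition, $I(X;B')_\tau - I(X;EZ)_\tau$ is an achievable rate. ■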
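The role of condition (15.2.39) is simply to make the per-copy overhead $f/n$ smaller than the slack $\delta_2$. Given a numerical value of the correction term, the smallest admissible block length follows from a one-line computation. The helper below is a hypothetical sketch; `f_value` is assumed to have been computed separately from (15.1.260) for the chosen $\alpha$ and $\beta$.

```python
import math


def min_block_length(f_value, delta2):
    """Smallest n >= 1 such that f_value / n <= delta2, as required in (15.2.39)."""
    if delta2 <= 0:
        raise ValueError("delta2 must be positive")
    return max(1, math.ceil(f_value / delta2))
```

Because the correction term grows as $\alpha$ and $\beta$ approach $1$, a smaller $\delta_1$ in (15.2.34) generally forces a larger block length for the same $\delta_2$.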

Proof of the Achievability Part of Theorem 15.32

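Let $\mathcal{L}^{\leftrightarrow}_{A^k B^k \to XB'Z}$ be an arbitrary LOPC channel with $k \in \mathbb{N}$, and let
\begin{equation}
\tau_{XB'E^kZ} := \mathcal{L}^{\leftrightarrow}_{A^k B^k \to XB'Z}(\psi_{ABE}^{\otimes k}), \tag{15.2.44}
\end{equation}
where $\psi_{ABE}$ is a purification of $\rho_{AB}$. Fix $\varepsilon \in (0, 1]$ and $\delta > 0$. Let $\delta_1, \delta_2 > 0$ be
such that $\delta = \delta_1 + \delta_2$. Set $\alpha \in (0, 1)$ and $\beta \in (1, \infty)$ such that
\begin{equation}
\delta_1 \geq \frac{1}{k}\left( I(X;B')_\tau - I(X;E^kZ)_\tau \right) - \frac{1}{k}\left( I_{\alpha}(X;B')_\tau - \widetilde{I}'_{\beta}(X;E^kZ)_\tau \right), \tag{15.2.45}
\end{equation}
which is possible based on the arguments given in the proof of Theorem 15.34
above. Then, with this choice of $\alpha$ and $\beta$, take $n$ large enough so that
\begin{equation}
\delta_2 \geq \frac{1}{kn} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right). \tag{15.2.46}
\end{equation}
Now, we use the fact that, for the chosen $n$ and $\varepsilon$, there exists an $(n, K, \varepsilon)$ secret-key
distillation protocol such that (15.2.27) holds, i.e.,
\begin{equation}
\frac{\log_2 K}{n} \geq I_{\alpha}(X;B')_\tau - \widetilde{I}'_{\beta}(X;E^kZ)_\tau - \frac{1}{n} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right). \tag{15.2.47}
\end{equation}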
993
Chapter 15: Secret Key Distillation

Dividing both sides by 𝑘 gives

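\begin{equation}
\frac{\log_2 K}{kn} \geq \frac{1}{k} \left( I_{\alpha}(X;B')_{\tau} - \widetilde{I}_{\beta'}(X;E^k Z)_{\tau} \right) - \frac{1}{kn} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right). \tag{15.2.48}
\end{equation}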
Rearranging the right-hand side of this inequality, and using (15.2.45)–(15.2.48),
we find that
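\begin{align}
\frac{\log_2 K}{kn} &\geq \frac{1}{k} \left( I(X;B')_{\tau} - I(X;E^k Z)_{\tau} \right) \notag \\
&\quad - \left[ \frac{1}{k} \left( I(X;B')_{\tau} - I(X;E^k Z)_{\tau} \right) - \frac{1}{k} \left( I_{\alpha}(X;B')_{\tau} - \widetilde{I}_{\beta'}(X;E^k Z)_{\tau} \right) \right] \notag \\
&\quad - \frac{1}{kn} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right) \tag{15.2.49} \\
&\geq \frac{1}{k} \left( I(X;B')_{\tau} - I(X;E^k Z)_{\tau} \right) - (\delta_1 + \delta_2) \tag{15.2.50} \\
&= \frac{1}{k} \left( I(X;B')_{\tau} - I(X;E^k Z)_{\tau} \right) - \delta. \tag{15.2.51}
\end{align}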
Thus, there exists a $(kn, K, \varepsilon)$ secret-key distillation protocol with rate $\frac{\log_2 K}{kn} \geq \frac{1}{k} \left( I(X;B')_{\tau} - I(X;E^k Z)_{\tau} \right) - \delta$. Therefore, letting $n' \equiv kn$, we conclude that there exists an $(n', 2^{n'(R-\delta)}, \varepsilon)$ secret-key distillation protocol with
\begin{equation}
R = \frac{1}{k} \left( I(X;B')_{\tau} - I(X;E^k Z)_{\tau} \right) \tag{15.2.52}
\end{equation}
for all sufficiently large $n$ such that (15.2.46) holds. Since $\varepsilon$ and $\delta$ are arbitrary, we conclude that for all $\varepsilon \in (0, 1]$, $\delta > 0$, and sufficiently large $n$, there exists an $\left(n, 2^{n\left(\frac{1}{k}\left(I(X;B')_{\tau} - I(X;E^k Z)_{\tau}\right) - \delta\right)}, \varepsilon\right)$ secret-key distillation protocol. This means that $\frac{1}{k} \left( I(X;B')_{\tau} - I(X;E^k Z)_{\tau} \right)$ is an achievable rate.

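Now, since in the arguments above the LOPC channel $\mathcal{L}^{\leftrightarrow}_{A^k B^k \to X B' Z}$ is arbitrary, we conclude that
\begin{equation}
\frac{1}{k} \sup_{\mathcal{L}^{\leftrightarrow}} \left( I(X;B')_{\tau} - I(X;E^k Z)_{\tau} \right) \tag{15.2.53}
\end{equation}
is an achievable rate. Finally, since the number $k$ of copies of $\rho_{AB}$ is arbitrary, we conclude that
\begin{equation}
\lim_{k \to \infty} \frac{1}{k} \sup_{\mathcal{L}^{\leftrightarrow}} \left( I(X;B')_{\tau} - I(X;E^k Z)_{\tau} \right) \tag{15.2.54}
\end{equation}
is an achievable rate.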

15.2.2 Proof of the Weak Converse

In order to prove the weak converse part of Theorem 15.32, we make use of
Corollary 15.17, specifically (15.1.144): given a bipartite state 𝜌 𝐴𝐵 , for every
(𝐾, 𝜀) secret key distillation protocol for 𝜌 𝐴𝐵 , with 𝜀 ∈ [0, 1), the following bound
holds
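\begin{align}
\left(1 - 2\sqrt{\varepsilon} - \delta\right) \log_2 K &\leq \sup_{\mathcal{L}} \left( I(X;B')_{\mathcal{L}(\psi)} - I(X;EZ)_{\mathcal{L}(\psi)} \right) \notag \\
&\quad + h_2(\sqrt{\varepsilon} + \delta) + \left(1 - \sqrt{\varepsilon} - \delta\right) \log_2\!\left(\frac{1}{\delta}\right) + 2 g_2(\sqrt{\varepsilon}), \tag{15.2.55}
\end{align}
where $\delta \in (0, 1 - \sqrt{\varepsilon})$, $\psi_{ABE}$ is a purification of $\rho_{AB}$, and the information quantities are evaluated on the state $\mathcal{L}_{AB \to X B' Z}(\psi_{ABE})$. Applying this inequality to the state $\rho_{AB}^{\otimes n}$ leads to the following.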

Proposition 15.35
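Let $\rho_{AB}$ be a bipartite state, let $n \in \mathbb{N}$, $\varepsilon \in [0, 1)$, and $\delta \in (0, 1 - \sqrt{\varepsilon})$. For an $(n, K, \varepsilon)$ secret-key distillation protocol for $\rho_{AB}$ with corresponding LOPC channel $\mathcal{L}_{A^n B^n \to X B' Z}$, with classical systems $X$ and $Z$, the rate $\frac{\log_2 K}{n}$ satisfies
\begin{align}
\left(1 - 2\sqrt{\varepsilon} - \delta\right) \frac{\log_2 K}{n} &\leq \frac{1}{n} \sup_{\mathcal{L}} \left( I(X;B')_{\mathcal{L}(\psi^{\otimes n})} - I(X;EZ)_{\mathcal{L}(\psi^{\otimes n})} \right) \notag \\
&\quad + \frac{1}{n} \left( h_2(\sqrt{\varepsilon} + \delta) + \left(1 - \sqrt{\varepsilon} - \delta\right) \log_2\!\left(\frac{1}{\delta}\right) + 2 g_2(\sqrt{\varepsilon}) \right). \tag{15.2.56}
\end{align}

Consequently,
\begin{align}
\left(1 - 2\sqrt{\varepsilon} - \delta\right) K_D^{n,\varepsilon}(A;B)_{\rho} &\leq \frac{1}{n} \sup_{\mathcal{L}} \left( I(X;B')_{\mathcal{L}(\psi^{\otimes n})} - I(X;EZ)_{\mathcal{L}(\psi^{\otimes n})} \right) \notag \\
&\quad + \frac{1}{n} \left( h_2(\sqrt{\varepsilon} + \delta) + \left(1 - \sqrt{\varepsilon} - \delta\right) \log_2\!\left(\frac{1}{\delta}\right) + 2 g_2(\sqrt{\varepsilon}) \right), \tag{15.2.57}
\end{align}
where the optimization is over every LOCC channel $\mathcal{L}_{A^n B^n \to X B' Z}$.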

Proof of the Weak Converse Part of Theorem 15.32

Suppose that 𝑅 is an achievable rate for secret key distillation for the bipartite
state 𝜌 𝐴𝐵 . Then, by definition, for all 𝜀 ∈ (0, 1], 𝛿 > 0, and sufficiently large 𝑛,
there exists an (𝑛, 2𝑛(𝑅−𝛿) , 𝜀) secret-key distillation protocol for 𝜌 𝐴𝐵 . For all such
protocols, the inequality in (15.2.56) holds, so that
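\begin{align}
\left(1 - 2\sqrt{\varepsilon} - \delta'\right) (R - \delta) &\leq \frac{1}{n} \sup_{\mathcal{L}} \left( I(X;B')_{\mathcal{L}(\psi^{\otimes n})} - I(X;EZ)_{\mathcal{L}(\psi^{\otimes n})} \right) \notag \\
&\quad + \frac{1}{n} \left( h_2(\sqrt{\varepsilon} + \delta') + \left(1 - \sqrt{\varepsilon} - \delta'\right) \log_2\!\left(\frac{1}{\delta'}\right) + 2 g_2(\sqrt{\varepsilon}) \right). \tag{15.2.58}
\end{align}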
Since the inequality holds for all sufficiently large 𝑛, it holds in the limit 𝑛 → ∞,
so that
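\begin{align}
\left(1 - 2\sqrt{\varepsilon} - \delta'\right) (R - \delta)
&\leq \lim_{n \to \infty} \Bigg( \frac{1}{n} \sup_{\mathcal{L}} \left( I(X;B')_{\mathcal{L}(\psi^{\otimes n})} - I(X;EZ)_{\mathcal{L}(\psi^{\otimes n})} \right) \notag \\
&\qquad + \frac{1}{n} \left( h_2(\sqrt{\varepsilon} + \delta') + \left(1 - \sqrt{\varepsilon} - \delta'\right) \log_2\!\left(\frac{1}{\delta'}\right) + 2 g_2(\sqrt{\varepsilon}) \right) \Bigg) \tag{15.2.59} \\
&= \lim_{n \to \infty} \frac{1}{n} \sup_{\mathcal{L}} \left( I(X;B')_{\mathcal{L}(\psi^{\otimes n})} - I(X;EZ)_{\mathcal{L}(\psi^{\otimes n})} \right). \tag{15.2.60}
\end{align}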

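Then, since this inequality holds for all $\varepsilon \in (0, 1)$ and $\delta > 0$, it holds in particular for $\delta' = \sqrt{\varepsilon}$ and $\varepsilon \in (0, \frac{1}{9})$, which gives
\begin{equation}
R \leq \frac{1}{1 - 3\sqrt{\varepsilon}} \lim_{n \to \infty} \frac{1}{n} \sup_{\mathcal{L}} \left( I(X;B')_{\mathcal{L}(\psi^{\otimes n})} - I(X;EZ)_{\mathcal{L}(\psi^{\otimes n})} \right) + \delta, \tag{15.2.61}
\end{equation}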
and we thus conclude that
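\begin{align}
R &\leq \lim_{\varepsilon, \delta \to 0} \left( \frac{1}{1 - 3\sqrt{\varepsilon}} \lim_{n \to \infty} \frac{1}{n} \sup_{\mathcal{L}} \left( I(X;B')_{\mathcal{L}(\psi^{\otimes n})} - I(X;EZ)_{\mathcal{L}(\psi^{\otimes n})} \right) + \delta \right) \tag{15.2.62} \\
&= \lim_{n \to \infty} \frac{1}{n} \sup_{\mathcal{L}} \left( I(X;B')_{\mathcal{L}(\psi^{\otimes n})} - I(X;EZ)_{\mathcal{L}(\psi^{\otimes n})} \right). \tag{15.2.63}
\end{align}

We have thus shown that the quantity $\lim_{n \to \infty} \frac{1}{n} \sup_{\mathcal{L}} \left( I(X;B')_{\mathcal{L}(\psi^{\otimes n})} - I(X;EZ)_{\mathcal{L}(\psi^{\otimes n})} \right)$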
is a weak converse rate for secret key distillation for 𝜌 𝐴𝐵 .
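
To get a feel for how quickly the correction terms in (15.2.56) vanish, the following minimal Python sketch solves (15.2.56) for the rate and evaluates the result for growing $n$. It rests on two assumptions that should be checked against the definitions used earlier in the book: that $h_2$ is the binary entropy and that $g_2(x) = (x+1)\log_2(x+1) - x\log_2 x$. The argument `info_per_copy` is a hypothetical stand-in for the regularized information quantity $\frac{1}{n}\sup_{\mathcal{L}}(\cdot)$, which is not computed here.

```python
import math

def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def g2(x):
    """Assumed definition: g2(x) = (x + 1) log2(x + 1) - x log2(x), g2(0) = 0."""
    if x <= 0.0:
        return 0.0
    return (x + 1.0) * math.log2(x + 1.0) - x * math.log2(x)

def weak_converse_rate_bound(n, eps, delta, info_per_copy):
    """Upper bound on log2(K)/n obtained by solving (15.2.56) for the rate.

    info_per_copy is a hypothetical placeholder for
    (1/n) sup_L [ I(X;B') - I(X;EZ) ]; it is an input, not computed here.
    """
    root = math.sqrt(eps)
    prefactor = 1.0 - 2.0 * root - delta
    if not (0.0 <= eps < 1.0 and 0.0 < delta < 1.0 - root) or prefactor <= 0.0:
        raise ValueError("need 0 < delta < 1 - sqrt(eps) and 1 - 2 sqrt(eps) - delta > 0")
    correction = (h2(root + delta)
                  + (1.0 - root - delta) * math.log2(1.0 / delta)
                  + 2.0 * g2(root)) / n
    return (info_per_copy + correction) / prefactor

# Illustrative values only: the bound decreases towards info_per_copy / prefactor.
for n in (10, 100, 1000, 10000):
    print(n, weak_converse_rate_bound(n, eps=0.01, delta=0.1, info_per_copy=0.5))
```

The additive correction decays as $1/n$, whereas the prefactor $1 - 2\sqrt{\varepsilon} - \delta$ does not depend on $n$, which is why the proof above needs the further limits $\varepsilon, \delta \to 0$.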

15.2.3 Relative Entropy of Entanglement Strong Converse Upper Bound

As indicated previously, the expression in (15.2.13) for distillable key involves both
a limit over an unbounded number of copies of the initial state 𝜌 𝐴𝐵 , as well as an
optimization over all two-way LOPC channels. Computing the distillable key is
therefore intractable in general. In this section, we use the one-shot upper bound
established in Section 15.1.3.2 to show that the relative entropy of entanglement is
a strong converse upper bound on the distillable key of a bipartite state 𝜌 𝐴𝐵 .
We start by recalling the upper bound in (15.1.145), which tells us that

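\begin{equation}
\log_2 K \leq \widetilde{E}_{\alpha}(A;B)_{\rho} + \frac{\alpha}{\alpha - 1} \log_2\!\left(\frac{1}{1 - \varepsilon}\right) \quad \forall \alpha > 1, \tag{15.2.64}
\end{equation}
for an arbitrary $(K, \varepsilon)$ secret-key distillation protocol, where $\varepsilon \in (0, 1)$. Recall that
\begin{equation}
\widetilde{E}_{\alpha}(A;B)_{\rho} = \inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} \widetilde{D}_{\alpha}(\rho_{AB} \Vert \sigma_{AB}). \tag{15.2.65}
\end{equation}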

Recall that the upper bound above is a consequence of the fact that separable states
are useless for secret key distillation.
Applying the upper bound in (15.2.64) to the state $\rho_{AB}^{\otimes n}$ leads to the following result:

Corollary 15.36
Let 𝜌 𝐴𝐵 be a bipartite state, let 𝑛 ∈ N, 𝜀 ∈ [0, 1), and 𝛼 > 1. For an (𝑛, 𝐾, 𝜀)
secret-key distillation protocol, the following bound holds

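\begin{equation}
\frac{\log_2 K}{n} \leq \widetilde{E}_{\alpha}(A;B)_{\rho} + \frac{\alpha}{n(\alpha - 1)} \log_2\!\left(\frac{1}{1 - \varepsilon}\right). \tag{15.2.66}
\end{equation}

Consequently,

\begin{equation}
K_D^{n,\varepsilon}(A;B)_{\rho} \leq \widetilde{E}_{\alpha}(A;B)_{\rho} + \frac{\alpha}{n(\alpha - 1)} \log_2\!\left(\frac{1}{1 - \varepsilon}\right). \tag{15.2.67}
\end{equation}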

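Proof: An $(n, K, \varepsilon)$ secret-key distillation protocol for $\rho_{AB}$ is a $(K, \varepsilon)$ secret-key distillation protocol for $\rho_{AB}^{\otimes n}$. Therefore, applying the inequality in (15.2.64) to the state $\rho_{AB}^{\otimes n}$ and dividing both sides by $n$ leads to
\begin{equation}
\frac{\log_2 K}{n} \leq \frac{1}{n} \widetilde{E}_{\alpha}(A^n;B^n)_{\rho^{\otimes n}} + \frac{\alpha}{n(\alpha - 1)} \log_2\!\left(\frac{1}{1 - \varepsilon}\right). \tag{15.2.68}
\end{equation}
Now, by subadditivity of the sandwiched Rényi relative entropy of entanglement (see (9.2.10)), we have that
\begin{equation}
\widetilde{E}_{\alpha}(A^n;B^n)_{\rho^{\otimes n}} \leq n \widetilde{E}_{\alpha}(A;B)_{\rho}. \tag{15.2.69}
\end{equation}

Therefore,

\begin{equation}
\frac{\log_2 K}{n} \leq \widetilde{E}_{\alpha}(A;B)_{\rho} + \frac{\alpha}{n(\alpha - 1)} \log_2\!\left(\frac{1}{1 - \varepsilon}\right), \tag{15.2.70}
\end{equation}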
as required. Since this inequality holds for all (𝑛, 𝐾, 𝜀) protocols, we obtain
(15.2.67) by optimizing over all key distillation protocols. ■

Given an 𝜀 ∈ (0, 1), the inequality in (15.2.66) gives us a bound on the rate of
an arbitrary (𝑛, 𝐾, 𝜀) secret-key distillation protocol for a state 𝜌 𝐴𝐵 . If we instead
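fix the rate to be $r$, so that $K = 2^{nr}$, then the inequality in (15.2.66) is as follows:
\begin{equation}
r \leq \widetilde{E}_{\alpha}(A;B)_{\rho} + \frac{\alpha}{n(\alpha - 1)} \log_2\!\left(\frac{1}{1 - \varepsilon}\right) \tag{15.2.71}
\end{equation}
for all $\alpha > 1$. Rearranging this inequality gives us the following lower bound on $\varepsilon$:
\begin{equation}
\varepsilon \geq 1 - 2^{-n \left(\frac{\alpha - 1}{\alpha}\right) \left(r - \widetilde{E}_{\alpha}(A;B)_{\rho}\right)} \tag{15.2.72}
\end{equation}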

for all 𝛼 > 1.
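
The exponential behaviour in (15.2.72) is easy to tabulate. The sketch below is only an illustration: the function name and the numerical inputs (a rate `rate`, a value `e_alpha` standing in for $\widetilde{E}_{\alpha}(A;B)_{\rho}$, and a fixed $\alpha$) are hypothetical and are not taken from the text. When the rate does not exceed `e_alpha`, the right-hand side of (15.2.72) is non-positive, so the function returns the trivial bound 0.

```python
import math

def strong_converse_error_lower_bound(n, rate, e_alpha, alpha):
    """Right-hand side of (15.2.72), clipped below at the trivial value 0.

    e_alpha is a hypothetical numerical value for the sandwiched Renyi
    relative entropy of entanglement of the state; it is not computed here.
    """
    if alpha <= 1.0:
        raise ValueError("alpha must be strictly greater than 1")
    exponent = n * ((alpha - 1.0) / alpha) * (rate - e_alpha)
    if exponent <= 0.0:
        return 0.0  # the bound carries no information when rate <= e_alpha
    return 1.0 - 2.0 ** (-exponent)

# Illustrative values only: a rate above e_alpha forces the error towards 1.
for n in (1, 10, 100, 1000):
    print(n, strong_converse_error_lower_bound(n, rate=1.0, e_alpha=0.8, alpha=2.0))
```

For a fixed rate strictly above `e_alpha`, the bound approaches 1 exponentially fast in $n$, which is the behaviour that a strong converse statement captures.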

Theorem 15.37 Strong Converse Upper Bound on Distillable Key

Let 𝜌 𝐴𝐵 be a bipartite state. The relative entropy of entanglement 𝐸 𝑅 ( 𝐴; 𝐵) 𝜌 is
a strong converse rate for secret key distillation for 𝜌 𝐴𝐵 , i.e.,
\begin{equation}
\widetilde{K}_D(A;B)_{\rho} \leq E_R(A;B)_{\rho}, \tag{15.2.73}
\end{equation}
where we recall that $E_R(A;B)_{\rho}$ is defined as
\begin{equation}
\inf_{\sigma_{AB} \in \operatorname{SEP}(A:B)} D(\rho_{AB} \Vert \sigma_{AB}). \tag{15.2.74}
\end{equation}

Proof: The proof is identical to that given for Theorem 13.24, except that we make use of (15.2.66). ■

Given that the relative entropy of entanglement is a strong converse rate for
distillable key, by following arguments analogous to those in the referenced proof,
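we conclude that $\frac{1}{k} E_R(A^k;B^k)_{\rho^{\otimes k}}$ is a strong converse rate for all $k \in \mathbb{N}$. Therefore, the regularized quantity
\begin{equation}
E_R^{\text{reg}}(A;B)_{\rho} := \lim_{n \to \infty} \frac{1}{n} E_R(A^n;B^n)_{\rho^{\otimes n}} \tag{15.2.75}
\end{equation}
is a strong converse rate for secret key distillation for $\rho_{AB}$, so that
\begin{equation}
\widetilde{K}_D(A;B)_{\rho} \leq E_R^{\text{reg}}(A;B)_{\rho}. \tag{15.2.76}
\end{equation}
By the subadditivity of the relative entropy of entanglement (see (9.2.10)),
\begin{equation}
E_R^{\text{reg}}(A;B)_{\rho} \leq E_R(A;B)_{\rho}, \tag{15.2.77}
\end{equation}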

so that the regularized quantity in general gives a tighter upper bound on distillable
key.

15.2.4 Squashed Entanglement Weak Converse Upper Bound

In this section, we establish the squashed entanglement of a bipartite state as a weak converse upper bound on its distillable key. The main idea is to apply the one-shot bound from Theorem 15.20 and the additivity of the squashed entanglement (Proposition 9.32) in order to arrive at this conclusion.

Corollary 15.38
Let 𝜌 𝐴𝐵 be a bipartite state, let 𝑛 ∈ N, and let 𝜀 ∈ [0, 1). For an (𝑛, 𝐾, 𝜀)
secret-key distillation protocol, the following bound holds
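\begin{equation}
\left(1 - 2\sqrt{\varepsilon}\right) \frac{1}{n} \log_2 K \leq E_{\text{sq}}(A;B)_{\rho} + \frac{2}{n} g_2(\sqrt{\varepsilon}). \tag{15.2.78}
\end{equation}

Proof: An $(n, K, \varepsilon)$ secret-key distillation protocol for $\rho_{AB}$ is a $(K, \varepsilon)$ secret-key distillation protocol for $\rho_{AB}^{\otimes n}$. Therefore, applying the inequality in (15.1.190) to the state $\rho_{AB}^{\otimes n}$ and dividing both sides by $n$ leads to
\begin{equation}
\left(1 - 2\sqrt{\varepsilon}\right) \frac{1}{n} \log_2 K \leq \frac{1}{n} E_{\text{sq}}(A^n;B^n)_{\rho^{\otimes n}} + \frac{2}{n} g_2(\sqrt{\varepsilon}). \tag{15.2.79}
\end{equation}
Now, by additivity of the squashed entanglement (Proposition 9.32), we have that
\begin{equation}
E_{\text{sq}}(A^n;B^n)_{\rho^{\otimes n}} = n E_{\text{sq}}(A;B)_{\rho}. \tag{15.2.80}
\end{equation}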

This concludes the proof. ■

We now provide a proof of (15.2.15), the statement that the squashed entan-
glement is a weak converse rate for secret key distillation. Suppose that 𝑅 is
an achievable rate for secret key distillation for the bipartite state 𝜌 𝐴𝐵 . Then,
by definition, for all 𝜀 ∈ (0, 1], 𝛿 > 0, and sufficiently large 𝑛, there exists an
(𝑛, 2𝑛(𝑅−𝛿) , 𝜀) secret-key distillation protocol for 𝜌 𝐴𝐵 . For all such protocols, the
inequality in (15.2.78) holds, so that
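\begin{equation}
\left(1 - 2\sqrt{\varepsilon}\right) (R - \delta) \leq E_{\text{sq}}(A;B)_{\rho} + \frac{2}{n} g_2(\sqrt{\varepsilon}). \tag{15.2.81}
\end{equation}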
Since the inequality holds for all sufficiently large 𝑛, it holds in the limit 𝑛 → ∞,
so that
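\begin{align}
\left(1 - 2\sqrt{\varepsilon}\right) (R - \delta) &\leq \lim_{n \to \infty} \left( E_{\text{sq}}(A;B)_{\rho} + \frac{2}{n} g_2(\sqrt{\varepsilon}) \right) \tag{15.2.82} \\
&= E_{\text{sq}}(A;B)_{\rho}. \tag{15.2.83}
\end{align}

Then, since this inequality holds for all $\varepsilon \in (0, 1]$ and $\delta > 0$, it holds in particular for all $\varepsilon \in (0, \frac{1}{4})$ and $\delta > 0$, implying that
\begin{equation}
R \leq \frac{1}{1 - 2\sqrt{\varepsilon}} E_{\text{sq}}(A;B)_{\rho} + \delta, \tag{15.2.84}
\end{equation}
and furthermore, that
\begin{align}
R &\leq \lim_{\varepsilon, \delta \to 0} \left( \frac{1}{1 - 2\sqrt{\varepsilon}} E_{\text{sq}}(A;B)_{\rho} + \delta \right) \tag{15.2.85} \\
&= E_{\text{sq}}(A;B)_{\rho}. \tag{15.2.86}
\end{align}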

We have thus shown that the squashed entanglement is a weak converse rate for
secret key distillation.
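
The finite-$n$ bound (15.2.78) can be rearranged into an explicit upper bound on the rate whenever $\varepsilon < \frac{1}{4}$. The following sketch evaluates that rearrangement. It assumes $g_2(x) = (x+1)\log_2(x+1) - x\log_2 x$, which should be checked against the definition used earlier in the book, and the input `e_sq` is a hypothetical numerical value for $E_{\text{sq}}(A;B)_{\rho}$ rather than something computed from a state.

```python
import math

def g2(x):
    """Assumed definition: g2(x) = (x + 1) log2(x + 1) - x log2(x), g2(0) = 0."""
    if x <= 0.0:
        return 0.0
    return (x + 1.0) * math.log2(x + 1.0) - x * math.log2(x)

def squashed_rate_upper_bound(n, eps, e_sq):
    """Upper bound on log2(K)/n obtained by dividing (15.2.78) by 1 - 2 sqrt(eps).

    e_sq is a hypothetical value of the squashed entanglement of the state.
    """
    if not 0.0 <= eps < 0.25:
        raise ValueError("the rearrangement requires 0 <= eps < 1/4")
    root = math.sqrt(eps)
    return (e_sq + (2.0 / n) * g2(root)) / (1.0 - 2.0 * root)

# Illustrative values only: the bound tends to e_sq as n grows and eps shrinks.
for n, eps in ((10, 1e-2), (100, 1e-3), (1000, 1e-4), (10**6, 1e-8)):
    print(n, eps, squashed_rate_upper_bound(n, eps, e_sq=1.0))
```

The two limits in the proof above are both visible here: increasing $n$ removes the additive $g_2$ term, and decreasing $\varepsilon$ drives the prefactor $1/(1 - 2\sqrt{\varepsilon})$ to one.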

15.3 One-Way Secret Key Distillation

In Section 15.1.4, we considered a one-way secret-key distillation protocol to derive
a lower bound on the one-shot distillable key of a bipartite state. In the asymptotic
setting, this leads to the private information lower bound on the distillable key of a
bipartite state 𝜌 𝐴𝐵 , i.e.,

𝐾 𝐷 ( 𝐴; 𝐵) 𝜌 ≥ 𝐼 (𝑋; 𝐵)𝜏 − 𝐼 (𝑋; 𝐸)𝜏 , (15.3.1)

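where
\begin{equation}
\tau_{XBE} := \sum_{x} |x\rangle\!\langle x|_X \otimes \operatorname{Tr}_A[\Lambda^{x}_{A} \psi_{ABE}], \tag{15.3.2}
\end{equation}
$\psi_{ABE}$ is a purification of $\rho_{AB}$, and $\{\Lambda^{x}_{A}\}_x$ is a POVM. By reversing the roles of
Alice and Bob in the protocol, we find that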

𝐾 𝐷 ( 𝐴; 𝐵) 𝜌 ≥ 𝐼 ( 𝐴; 𝑌 )𝜔 − 𝐼 (𝑌 ; 𝐸)𝜔 , (15.3.3)

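where
\begin{equation}
\omega_{YAE} := \sum_{y} |y\rangle\!\langle y|_Y \otimes \operatorname{Tr}_B[\Gamma^{y}_{B} \psi_{ABE}], \tag{15.3.4}
\end{equation}
and $\{\Gamma^{y}_{B}\}_y$ is a POVM. Then, in general, we have the following lower bound on
distillable key: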

𝐾 𝐷 ( 𝐴; 𝐵) 𝜌 ≥ max{𝐼 (𝑋; 𝐵)𝜏 − 𝐼 (𝑋; 𝐸)𝜏 , 𝐼 ( 𝐴; 𝑌 )𝜔 − 𝐼 (𝑌 ; 𝐸)𝜔 }. (15.3.5)

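This private information lower bound can be improved by first applying a two-way LOPC channel to $n$ copies of the given state, and then performing a one-way secret-key distillation protocol at the private information rate. This leads to
\begin{equation}
K_D(A;B)_{\rho} = \lim_{n \to \infty} \frac{1}{n} \sup_{\mathcal{L}^{\leftrightarrow}} \left( I(X;B')_{\mathcal{L}^{\leftrightarrow}(\psi^{\otimes n})} - I(X;E^n Z)_{\mathcal{L}^{\leftrightarrow}(\psi^{\otimes n})} \right), \tag{15.3.6}
\end{equation}
where the information quantities are evaluated on the state $\mathcal{L}^{\leftrightarrow}_{A^n B^n \to X B' Z}(\psi_{ABE}^{\otimes n})$, $\mathcal{L}^{\leftrightarrow}_{A^n B^n \to X B' Z}$ is an LOPC channel with classical systems $X$ and $Z$, and $\psi_{ABE}$ is a purification of $\rho_{AB}$.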
If we restrict the optimization in (15.3.6) above to one-way LOPC channels of the form $\mathcal{L}^{\to}_{A^n B^n \to X B' Z}$, then we obtain what is called the one-way distillable key of $\rho_{AB}$, denoted by $K_D^{\to}(A;B)_{\rho}$, and defined operationally in a similar way to the distillable key $K_D(A;B)_{\rho}$, but with the free operations allowed restricted to one-way LOPC. A key result is the following equality:
\begin{align}
K_D^{\to}(A;B)_{\rho} &= \lim_{n \to \infty} \frac{1}{n} \sup_{\mathcal{L}^{\to}} \left( I(X;B')_{\mathcal{L}^{\to}(\psi^{\otimes n})} - I(X;E^n Z)_{\mathcal{L}^{\to}(\psi^{\otimes n})} \right) \tag{15.3.7} \\
&= \lim_{n \to \infty} \frac{1}{n} D_K^{\to}(\rho_{AB}^{\otimes n}), \tag{15.3.8}
\end{align}
where
\begin{equation}
D_K^{\to}(\rho_{AB}) := \sup_{\mathcal{L}^{\to}} \left( I(X;B')_{\mathcal{L}^{\to}(\psi)} - I(X;EZ)_{\mathcal{L}^{\to}(\psi)} \right). \tag{15.3.9}
\end{equation}
Like the distillable key, the one-way distillable key is an operational quantity of
interest. Furthermore, the equality in (15.3.7) can be proved similarly to how we
proved (15.2.13).
In what follows, we show that this expression for one-way distillable key can be
simplified.

Theorem 15.39 One-Way Distillable Key of a Bipartite State

The one-way distillable key of a bipartite state 𝜌 𝐴𝐵 is given by
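\begin{equation}
K_D^{\to}(A;B)_{\rho} = \lim_{n \to \infty} \frac{1}{n} \sup_{\{\Lambda^{x,z}_{A^n}\}_{x \in \mathcal{X}, z \in \mathcal{Z}}} \left( I(X;B^n|Z)_{\tau} - I(X;E^n|Z)_{\tau} \right), \tag{15.3.10}
\end{equation}
where
\begin{equation}
\tau_{X Z B^n E^n} := \sum_{x \in \mathcal{X}, z \in \mathcal{Z}} |x\rangle\!\langle x|_X \otimes |z\rangle\!\langle z|_Z \otimes \operatorname{Tr}_{A^n}[\Lambda^{x,z}_{A^n} \psi_{ABE}^{\otimes n}], \tag{15.3.11}
\end{equation}
and the optimization is over every POVM $\{\Lambda^{x,z}_{A^n}\}_{x \in \mathcal{X}, z \in \mathcal{Z}}$ with output alphabets $\mathcal{X}$ and $\mathcal{Z}$.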

This theorem tells us that, to determine the one-way distillable key of a bipartite
state, it suffices to optimize over one-way LOPC channels that consist of a POVM
conducted on Alice’s systems.

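Proof: Let us start by recalling from Definition 4.22 and the discussion around (15.1.2) that every one-way LOPC channel $\mathcal{L}^{\to}_{A^n B^n \to X B' Z}$ can be expressed as
\begin{align}
\omega_{X B' Z} &:= \mathcal{L}^{\to}_{A^n B^n \to X B' Z}(\xi_{A^n B^n}) \tag{15.3.12} \\
&= \sum_{z \in \mathcal{Z}} (\mathcal{E}^{z}_{A^n \to X} \otimes \mathcal{D}^{z}_{B^n \to B'})(\xi_{A^n B^n}) \otimes |z\rangle\!\langle z|_Z, \tag{15.3.13} \\
&= (\mathcal{D}_{Z_B B^n \to B'} \circ \mathcal{C}_{Z_A \to Z_B Z} \circ \mathcal{E}_{A^n \to X Z_A})(\xi_{A^n B^n}), \tag{15.3.14}
\end{align}
where $\mathcal{Z}$ is some finite alphabet, $\{\mathcal{E}^{z}_{A^n \to X}\}_{z \in \mathcal{Z}}$ is a set of completely positive maps such that $\sum_{z \in \mathcal{Z}} \mathcal{E}^{z}_{A^n \to X}$ is trace preserving, and $\{\mathcal{D}^{z}_{B^n \to B'}\}_{z \in \mathcal{Z}}$ is a set of channels. Furthermore,
\begin{align}
\mathcal{E}_{A^n \to X Z_A}(\xi_{A^n B^n}) &= \sum_{z \in \mathcal{Z}} \mathcal{E}^{z}_{A^n \to X}(\xi_{A^n B^n}) \otimes |z\rangle\!\langle z|_Z, \tag{15.3.15} \\
\mathcal{D}_{Z_B B^n \to B'}(|z\rangle\!\langle z|_{Z_B} \otimes \xi_{A^n B^n}) &= \mathcal{D}^{z}_{B^n \to B'}(\xi_{A^n B^n}), \tag{15.3.16}
\end{align}
and since the map $\mathcal{E}^{z}_{A^n \to X}$ has a classical output $X$, it can be written as
\begin{equation}
\mathcal{E}^{z}_{A^n \to X}(\xi_{A^n B^n}) = \sum_{x \in \mathcal{X}} \operatorname{Tr}_{A^n}[\Lambda^{x,z}_{A^n} \xi_{A^n B^n}] \, |x\rangle\!\langle x|_X, \tag{15.3.17}
\end{equation}
where $\{\Lambda^{x,z}_{A^n}\}_{x \in \mathcal{X}, z \in \mathcal{Z}}$ is a POVM.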
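For every $n \in \mathbb{N}$, if we restrict the optimization in (15.3.7) to $\mathcal{D}^{z}_{B^n \to B'} = \operatorname{id}_{B^n}$ for all $z \in \mathcal{Z}$ and $\mathcal{E}^{z}_{A^n \to X}(\cdot) = \sum_{x \in \mathcal{X}} \operatorname{Tr}_{A^n}[\Lambda^{x,z}_{A^n} (\cdot)] \, |x\rangle\!\langle x|_X$ for all $z \in \mathcal{Z}$, then the LOPC channel $\mathcal{L}^{\to}_{A^n B^n \to X B' Z}$ reduces to
\begin{equation}
\mathcal{L}^{\to}_{A^n B^n \to X B^n Z}(\xi_{A^n B^n}) = \sum_{x \in \mathcal{X}, z \in \mathcal{Z}} \operatorname{Tr}_{A^n}[\Lambda^{x,z}_{A^n} \xi_{A^n B^n}] \otimes |x\rangle\!\langle x|_X \otimes |z\rangle\!\langle z|_Z \tag{15.3.18}
\end{equation}
for every input state $\xi_{A^n B^n}$. We thus conclude that
\begin{equation}
K_D^{\to}(A;B)_{\rho} \geq \lim_{n \to \infty} \frac{1}{n} \sup_{\{\Lambda^{x,z}_{A^n}\}_{x \in \mathcal{X}, z \in \mathcal{Z}}} \left( I(X;B^n|Z)_{\tau} - I(X;E^n|Z)_{\tau} \right). \tag{15.3.19}
\end{equation}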

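The rest of the proof is devoted to proving the reverse inequality. Let $\mathcal{L}^{\to}_{A^n B^n \to X B' Z}$ be an arbitrary LOPC channel of the form in (15.3.12)–(15.3.17). Consider that
\begin{align}
& I(X;B')_{\mathcal{L}^{\to}(\psi^{\otimes n})} - I(X;E^n Z)_{\mathcal{L}^{\to}(\psi^{\otimes n})} \notag \\
&\quad \leq I(X;B^n Z)_{\mathcal{L}'(\psi^{\otimes n})} - I(X;E^n Z)_{\mathcal{L}'(\psi^{\otimes n})} \tag{15.3.20} \\
&\quad = I(X;Z)_{\mathcal{L}'(\psi^{\otimes n})} + I(X;B^n|Z)_{\mathcal{L}'(\psi^{\otimes n})} - I(X;Z)_{\mathcal{L}'(\psi^{\otimes n})} - I(X;E^n|Z)_{\mathcal{L}'(\psi^{\otimes n})} \tag{15.3.21} \\
&\quad = I(X;B^n|Z)_{\mathcal{L}'(\psi^{\otimes n})} - I(X;E^n|Z)_{\mathcal{L}'(\psi^{\otimes n})}, \tag{15.3.22}
\end{align}
where
\begin{equation}
\mathcal{L}'_{A^n B^n \to X B^n Z}(\xi_{A^n B^n}) := \sum_{x \in \mathcal{X}, z \in \mathcal{Z}} \operatorname{Tr}_{A^n}[\Lambda^{x,z}_{A^n} \xi_{A^n B^n}] \otimes |x\rangle\!\langle x|_X \otimes |z\rangle\!\langle z|_Z. \tag{15.3.23}
\end{equation}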

The inequality follows from data-processing with respect to the decoding channel
D𝑍 𝐵 𝐵𝑛 →𝐵′ of Bob. This concludes the proof. ■

Lemma 15.40
For every bipartite state $\rho_{AB}$, the optimized private information lower bound on distillable key is non-negative, i.e., $D_K^{\to}(\rho_{AB}) \geq 0$.

Proof: Let $\psi_{ABE}$ be a purification of $\rho_{AB}$, and consider the following Schmidt decomposition of $\psi_{ABE}$:
\begin{equation}
|\psi\rangle_{ABE} = \sum_{k=0}^{r-1} \sqrt{\lambda_k} \, |\phi_k\rangle_A \otimes |\varphi_k\rangle_{BE}. \tag{15.3.24}
\end{equation}
Then let $\Lambda^{x,z}_{A} = |\phi_x\rangle\!\langle \phi_x| \, \delta_{z,x}$, so that the POVM measures in the local Schmidt basis of $A$ and broadcasts the measurement result through $x$ and $z$. It is then straightforward to show that the private information satisfies $I(X;B|Z) - I(X;E|Z) = 0$. Since the POVM we chose is a particular choice in the optimization for $D_K^{\to}(\rho_{AB})$, we conclude that $D_K^{\to}(\rho_{AB}) \geq I(X;B|Z) - I(X;E|Z) = 0$. ■
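
The step described as straightforward in the proof of Lemma 15.40 can be spelled out as follows. This is a sketch under one assumption that goes beyond the text: if the Schmidt vectors $\{|\phi_k\rangle_A\}_k$ do not span the whole space of $A$, the POVM is completed by an element supported on the orthogonal complement, which occurs with probability zero.

```latex
% Post-measurement state for \Lambda_A^{x,z} = \delta_{z,x} |\phi_x\rangle\langle\phi_x|_A,
% using the Schmidt decomposition (15.3.24):
\begin{align}
\tau_{XZBE} &= \sum_{x,z} |x\rangle\!\langle x|_X \otimes |z\rangle\!\langle z|_Z
  \otimes \operatorname{Tr}_A[\Lambda_A^{x,z} \psi_{ABE}] \notag \\
&= \sum_{x=0}^{r-1} \lambda_x \, |x\rangle\!\langle x|_X \otimes |x\rangle\!\langle x|_Z
  \otimes |\varphi_x\rangle\!\langle \varphi_x|_{BE}.
\end{align}
% Conditioned on Z = z, the register X is in the fixed state |z><z|, so the
% conditional state on X B (and on X E) is a product state:
\begin{align}
I(X;B|Z)_\tau &= \sum_{z} \lambda_z \,
  I(X;B)_{|z\rangle\langle z|_X \otimes \operatorname{Tr}_E[\varphi_z]} = 0, \\
I(X;E|Z)_\tau &= \sum_{z} \lambda_z \,
  I(X;E)_{|z\rangle\langle z|_X \otimes \operatorname{Tr}_B[\varphi_z]} = 0.
\end{align}
% Hence I(X;B|Z)_\tau - I(X;E|Z)_\tau = 0 for this choice of POVM.
```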

Lemma 15.41
For every bipartite state $\rho_{AB}$, the optimized private information lower bound on distillable key is not smaller than the coherent information of $\rho_{AB}$, i.e., $D_K^{\to}(\rho_{AB}) \geq I(A\rangle B)_{\rho}$. Thus, the coherent information is a lower bound for one-way distillable key:
\begin{equation}
K_D^{\to}(\rho_{AB}) \geq I(A\rangle B)_{\rho}. \tag{15.3.25}
\end{equation}

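Proof: Let $\Lambda^{x,z}_{A} = |\varphi_x\rangle\!\langle \varphi_x|_A$ be a rank-one POVM for which there is no output $z$. Let the state after the measurement be as follows:
\begin{align}
\tau_{XBE} &:= \sum_{x \in \mathcal{X}} |x\rangle\!\langle x|_X \otimes \operatorname{Tr}_A[|\varphi_x\rangle\!\langle \varphi_x|_A \psi_{ABE}] \tag{15.3.26} \\
&= \sum_{x \in \mathcal{X}} p(x) |x\rangle\!\langle x|_X \otimes \psi^{x}_{BE}, \tag{15.3.27}
\end{align}
where
\begin{align}
p(x) &:= \operatorname{Tr}[|\varphi_x\rangle\!\langle \varphi_x|_A \psi_{ABE}], \tag{15.3.28} \\
\psi^{x}_{BE} &:= \frac{1}{p(x)} \operatorname{Tr}_A[|\varphi_x\rangle\!\langle \varphi_x|_A \psi_{ABE}]. \tag{15.3.29}
\end{align}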
Note that each 𝜓 𝑥𝐵𝐸 is a pure state. Then it follows that

𝐼 (𝑋; 𝐵|𝑍)𝜏 − 𝐼 (𝑋; 𝐸 |𝑍)𝜏 = 𝐼 (𝑋; 𝐵)𝜏 − 𝐼 (𝑋; 𝐸)𝜏 (15.3.30)

= 𝐻 (𝐵)𝜏 − 𝐻 (𝐵|𝑋)𝜏 − 𝐻 (𝐸)𝜏 + 𝐻 (𝐸 |𝑋)𝜏 (15.3.31)
= 𝐻 (𝐵)𝜏 − 𝐻 (𝐸)𝜏 (15.3.32)
= 𝐻 (𝐵) 𝜌 − 𝐻 (𝐸) 𝜌 (15.3.33)
= 𝐼 ( 𝐴⟩𝐵) 𝜌 . (15.3.34)

The first equality follows because the 𝑍 system is trivial. The third equality follows
because 𝐻 (𝐵|𝑋)𝜏 = 𝐻 (𝐸 |𝑋)𝜏 , which in turn follows because each state 𝜓 𝑥𝐵𝐸 is
pure. ■
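
The chain of equalities (15.3.30)–(15.3.34) can be checked numerically on a small example. The sketch below is an illustration only: it draws a random three-qubit pure state on $ABE$, measures $A$ in the computational basis (one particular rank-one POVM with no output $z$), and compares the resulting private information $I(X;B)_{\tau} - I(X;E)_{\tau}$ with the coherent information $H(B)_{\rho} - H(E)_{\rho}$. All function names are hypothetical, and the restriction to qubits is made only so that entropies follow from the closed-form eigenvalues of $2 \times 2$ matrices.

```python
import math
import random

def shannon(probs):
    """Shannon entropy in bits, ignoring numerically zero entries."""
    return -sum(p * math.log2(p) for p in probs if p > 1e-15)

def qubit_entropy(rho):
    """Von Neumann entropy (bits) of a 2x2 density matrix given as nested lists."""
    tr = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return shannon([(tr + disc) / 2.0, (tr - disc) / 2.0])

def marginals(amps):
    """Unnormalized reduced operators on B and on E of a vector amps[b][e]."""
    rho_b = [[sum(amps[b][e] * amps[b2][e].conjugate() for e in range(2))
              for b2 in range(2)] for b in range(2)]
    rho_e = [[sum(amps[b][e] * amps[b][e2].conjugate() for b in range(2))
              for e2 in range(2)] for e in range(2)]
    return rho_b, rho_e

def private_and_coherent_information(psi):
    """psi[a][b][e]: normalized amplitudes of a three-qubit pure state on ABE."""
    rho_b = [[0j, 0j], [0j, 0j]]
    rho_e = [[0j, 0j], [0j, 0j]]
    h_b_given_x = 0.0
    h_e_given_x = 0.0
    for x in range(2):  # outcome x of the computational-basis measurement on A
        mb, me = marginals(psi[x])
        p_x = (mb[0][0] + mb[1][1]).real
        for i in range(2):
            for j in range(2):
                rho_b[i][j] += mb[i][j]
                rho_e[i][j] += me[i][j]
        if p_x > 1e-15:
            h_b_given_x += p_x * qubit_entropy([[v / p_x for v in row] for row in mb])
            h_e_given_x += p_x * qubit_entropy([[v / p_x for v in row] for row in me])
    h_b, h_e = qubit_entropy(rho_b), qubit_entropy(rho_e)
    private = (h_b - h_b_given_x) - (h_e - h_e_given_x)  # I(X;B) - I(X;E)
    coherent = h_b - h_e                                   # H(B) - H(AB), as H(AB) = H(E)
    return private, coherent

def random_three_qubit_state(seed):
    """Random normalized amplitudes psi[a][b][e] from Gaussian entries."""
    rng = random.Random(seed)
    raw = [[[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
            for _ in range(2)] for _ in range(2)]
    norm = math.sqrt(sum(abs(c) ** 2 for plane in raw for row in plane for c in row))
    return [[[c / norm for c in row] for row in plane] for plane in raw]

private, coherent = private_and_coherent_information(random_three_qubit_state(7))
print(private, coherent)
```

The two printed numbers agree up to rounding because each conditional state on $BE$ is pure, so the conditional entropies of $B$ and $E$ cancel, exactly as in the third equality of the proof.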

15.4 Examples
We now consider classes of bipartite states and evaluate the upper and lower bounds
on their distillable key that we have established in this chapter. In some cases,
the distillable key can be determined exactly because the upper and lower bounds
coincide.

15.4.1 Pure States

The simplest example for which distillable key can be determined exactly is the class of pure bipartite states. In this case, the coherent information lower bound from Lemma 15.41 and the relative entropy of entanglement upper bound from Theorem 15.32 coincide and are equal to the entropy of the reduced state. Thus, applying the same reasoning as in Section 13.3.1, we conclude the following:

Theorem 15.42 Distillable Key for Pure States

The distillable key of a pure bipartite state 𝜓 𝐴𝐵 is equal to the entropy of the
reduced state on 𝐴, i.e.,

𝐾 𝐷 ( 𝐴; 𝐵)𝜓 = 𝐻 ( 𝐴)𝜓 . (15.4.1)

15.4.2 Degradable and Anti-Degradable States

In Section 13.3.2, we defined degradable and anti-degradable states, and we proved that the one-way distillable entanglement of a degradable state is equal to its coherent information. Also, we proved that the one-way distillable entanglement of an anti-degradable state vanishes. It turns out that the same results hold for one-way distillable key.

Theorem 15.43 One-Way Distillable Key for Anti-Degradable States

For an anti-degradable state 𝜌 𝐴𝐵 , the one-way distillable key is equal to zero,
i.e., 𝐾 𝐷→ ( 𝐴; 𝐵) 𝜌 = 0.

Proof: This is a direct consequence of the definition of an anti-degradable state and the result in Theorem 15.39. Indeed, for an anti-degradable state $\rho_{AB}$ with purification $\psi_{ABE}$, there exists an anti-degrading channel $\mathcal{A}_{E \to B}$ such that $\rho_{AB} = \mathcal{A}_{E \to B}(\psi_{AE})$. A similar statement holds for $\rho_{AB}^{\otimes n}$, i.e., $\rho_{AB}^{\otimes n} = [\mathcal{A}_{E \to B}(\psi_{AE})]^{\otimes n}$. Applying this fact and the data-processing inequality to the expression $I(X;B^n|Z)_{\tau} - I(X;E^n|Z)_{\tau}$ from Theorem 15.39, we conclude that
\begin{equation}
I(X;B^n|Z)_{\tau} - I(X;E^n|Z)_{\tau} \leq 0. \tag{15.4.2}
\end{equation}
So we conclude that $K_D^{\to}(A;B)_{\rho} \leq 0$. Combined with the general lower bound from Lemma 15.40, we conclude that $K_D^{\to}(A;B)_{\rho} = 0$ for an anti-degradable state $\rho_{AB}$. ■

Theorem 15.44 One-Way Distillable Key for Degradable States

For a degradable state 𝜌 𝐴𝐵 , we have

\begin{equation}
D_K^{\to}(\rho_{AB}) = I(A\rangle B)_{\rho}. \tag{15.4.3}
\end{equation}
Consequently, $D_K^{\to}(\rho_{AB}^{\otimes n}) = n D_K^{\to}(\rho_{AB})$, and thus the one-way distillable key
of a degradable state $\rho_{AB}$ is equal to its coherent information:

𝐾 𝐷→ ( 𝐴; 𝐵) 𝜌 = 𝐼 ( 𝐴⟩𝐵) 𝜌 . (15.4.4)

Proof: It suffices to prove the upper bound $D_K^{\to}(\rho_{AB}) \leq I(A\rangle B)_{\rho}$ because Lemma 15.41 established the lower bound $D_K^{\to}(\rho_{AB}) \geq I(A\rangle B)_{\rho}$ in general. Recall that the defining property of a degradable state $\rho_{AB}$ with purification $\psi_{ABE}$ is that there exists a degrading channel $\mathcal{D}_{B \to E}$ such that $\psi_{AE} = \mathcal{D}_{B \to E}(\rho_{AB})$. Then the same is true for the tensor-power states, i.e., $\psi_{AE}^{\otimes n} = (\mathcal{D}_{B \to E}(\rho_{AB}))^{\otimes n}$. Then consider that

𝐼 (𝑋; 𝐵𝑛 |𝑍)𝜏 − 𝐼 (𝑋; 𝐸 𝑛 |𝑍)𝜏

= 𝐼 (𝑋 𝑍; 𝐵𝑛 )𝜏 − 𝐼 (𝑍; 𝐵𝑛 )𝜏 − [𝐼 (𝑋 𝑍; 𝐸 𝑛 )𝜏 − 𝐼 (𝑍; 𝐸 𝑛 )𝜏 ] (15.4.5)
= 𝐼 (𝑋 𝑍; 𝐵𝑛 )𝜏 − 𝐼 (𝑋 𝑍; 𝐸 𝑛 )𝜏 − [𝐼 (𝑍; 𝐵𝑛 )𝜏 − 𝐼 (𝑍; 𝐸 𝑛 )𝜏 ] (15.4.6)
≤ 𝐼 (𝑋 𝑍; 𝐵𝑛 )𝜏 − 𝐼 (𝑋 𝑍; 𝐸 𝑛 )𝜏 , (15.4.7)

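where
\begin{equation}
\tau_{X Z B^n E^n} = \sum_{x \in \mathcal{X}, z \in \mathcal{Z}} |x\rangle\!\langle x|_X \otimes |z\rangle\!\langle z|_Z \otimes \operatorname{Tr}_{A^n}[\Lambda^{x,z}_{A^n} \psi_{ABE}^{\otimes n}]. \tag{15.4.8}
\end{equation}
The sole inequality above follows from the data-processing inequality for mutual information and the fact that there is a degrading channel from $B^n$ to $E^n$. Now let $\Lambda^{x,z}_{A^n} = \sum_{y} |\varphi^{x,y,z}\rangle\!\langle \varphi^{x,y,z}|_{A^n}$ be a rank-one decomposition of the POVM $\{\Lambda^{x,z}_{A^n}\}_{x,z}$ and define the following extension of the state $\tau_{X Z B^n E^n}$:
\begin{equation}
\tau_{X Z Y B^n E^n} = \sum_{x \in \mathcal{X}, y, z \in \mathcal{Z}} |x\rangle\!\langle x|_X \otimes |z\rangle\!\langle z|_Z \otimes |y\rangle\!\langle y|_Y \otimes \operatorname{Tr}_{A^n}[|\varphi^{x,y,z}\rangle\!\langle \varphi^{x,y,z}|_{A^n} \psi_{ABE}^{\otimes n}]. \tag{15.4.9}
\end{equation}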
Then consider that

\begin{align}
& I(XZ;B^n)_{\tau} - I(XZ;E^n)_{\tau} \notag \\
&\quad = I(XZY;B^n)_{\tau} - I(Y;B^n|XZ)_{\tau} - [I(XZY;E^n)_{\tau} - I(Y;E^n|XZ)_{\tau}] \tag{15.4.10} \\
&\quad = I(XZY;B^n)_{\tau} - I(XZY;E^n)_{\tau} - [I(Y;B^n|XZ)_{\tau} - I(Y;E^n|XZ)_{\tau}] \tag{15.4.11} \\
&\quad \leq I(XZY;B^n)_{\tau} - I(XZY;E^n)_{\tau} \tag{15.4.12} \\
&\quad = H(B^n)_{\tau} - H(E^n)_{\tau} - [H(B^n|XZY)_{\tau} - H(E^n|XZY)_{\tau}] \tag{15.4.13} \\
&\quad = H(B^n)_{\tau} - H(E^n)_{\tau} \tag{15.4.14} \\
&\quad = H(B^n)_{\psi} - H(E^n)_{\psi} \tag{15.4.15} \\
&\quad = n \left( H(B)_{\psi} - H(E)_{\psi} \right) \tag{15.4.16} \\
&\quad = n I(A\rangle B)_{\rho}. \tag{15.4.17}
\end{align}

The sole inequality above follows from the data-processing inequality for conditional
mutual information and the fact that there is a degrading channel from 𝐵𝑛 to 𝐸 𝑛 .
This concludes the proof. ■

15.5 Summary
In this chapter, we considered the task of secret key distillation, in which the goal is
for Alice and Bob to convert a bipartite state to an approximate tripartite key state
with as many secret key bits as possible. In doing so, they are allowed to perform
local operations and public classical communication, in which an eavesdropper
obtains a copy of all of the classical communication exchanged. The highest rate at
which this can be accomplished is called the distillable key of the state. We began
with the one-shot setting, in which we allow some error in the distillation protocol,
and we determined lower and upper bounds on the number of approximate secret
key bits that can be distilled. In the asymptotic setting, we proved that the private
information of the state is an achievable rate, and we proved that the squashed
entanglement and the relative entropy of entanglement are upper bounds. These
latter quantities are the best known upper bounds on distillable key.
By performing secret key distillation and then the one-time pad protocol
(described in the introduction of this chapter), Alice can transmit a classical
message privately to Bob. This process thus induces an ideal private classical
channel from Alice to Bob. If Alice and Bob are connected by a quantum
channel, then they can use it to share a bipartite state, from which they can induce
a private classical channel in the aforementioned manner. This is one way to
communicate privately over a quantum channel. In the next chapter, we discuss
other, more direct approaches for private communication, which give an optimal
private communication strategy for some quantum channels.
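
The one-time pad step mentioned above is simple enough to sketch directly. In the following Python illustration, the key is drawn from the standard-library `secrets` module purely as a stand-in for a key that Alice and Bob would obtain from a secret key distillation protocol; the function name is hypothetical.

```python
import secrets

def one_time_pad(data: bytes, key: bytes) -> bytes:
    """XOR data with a key of the same length (encryption and decryption coincide)."""
    if len(key) != len(data):
        raise ValueError("the one-time pad needs a key exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"meet at noon"
key = secrets.token_bytes(len(message))   # stand-in for a distilled secret key
ciphertext = one_time_pad(message, key)   # sent over the public classical channel
recovered = one_time_pad(ciphertext, key)
print(recovered)
```

Each key bit is consumed by exactly one message bit, which is why the number of distilled secret key bits determines how many message bits can be sent privately in this way.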

15.6 Bibliographic Notes

The task of secret key distillation, like many tasks in quantum information theory,
has its roots in classical information theory. Maurer (1993) and Ahlswede and
Csiszár (1993) developed the theory of secret key distillation in the classical case.
There, the assumption is that Alice, Bob, and Eve share a tripartite distribution
𝑝 𝑋𝑌 𝑍 , from which they are trying to extract an approximation of an ideal secret key
by means of local operations and public classical communication. The quantum
case considered here is thus a generalization of this scenario, with a tripartite pure
state 𝜓 𝐴𝐵𝐸 replacing the classical distribution 𝑝 𝑋𝑌 𝑍 . Recall that the eavesdropper
sharing the purifying system of a purification of Alice and Bob’s state gives Eve
more power, because she can realize any possible extension of Alice and Bob’s
state by acting on the purifying system.
The one-time pad protocol traces its roots much further back. It was invented by
Vernam (1926), and its security was established by Shannon (1949). As discussed
in the introduction of this chapter, the main application of secret key distillation is
to distill a secret key that can be used in conjunction with the one-time pad protocol
in order to transmit a message privately.
Much of the technical work on secret key distillation was motivated by the
development of quantum key distribution (Bennett and Brassard, 1984; Ekert, 1991).
An early paper on the topic is about privacy amplification (Bennett et al., 1995),
which is a component of a key distillation protocol. Secret key distillation from
a bipartite quantum state was then studied by a number of researchers, including
(Devetak and Winter, 2005; Horodecki et al., 2005a; Christandl, 2006; Horodecki
et al., 2008a, 2009a; Christandl et al., 2007, 2012).
The one-shot setting of secret key distillation was studied by Renes and Renner
(2012) and Khatri et al. (2019). We follow the approach of Khatri et al. (2019)
closely in this chapter.
The connection between the tripartite picture of secret key distillation and the
bipartite picture of private state distillation was identified by Horodecki et al. (2005a,
2009a). This work led to the understanding of the difference between entanglement
and secret key, and it allowed for using the tools of entanglement theory (such as
entanglement measures) in the context of secret key distillation. In the context
of asymptotic secret key distillation, the relative entropy of entanglement upper
bound on distillable key was established by Horodecki et al. (2005a, 2009a) and the
squashed entanglement upper bound by Christandl (2006); Christandl et al. (2007,
2012) (see also (Wilde, 2016) in this context).
The privacy test was defined by Horodecki et al. (2008b,a), and its use in
establishing a one-shot converse bound is due to Wilde et al. (2017).
Lemma 15.13 is due to Wilde et al. (2017). Lemma 15.14 is implicit in the work
of Horodecki et al. (2009a) and was explicitly proved by Wilde et al. (2017).
Proposition 15.15 and Theorem 15.16 were established by Wilde et al. (2017).
Lemma 15.18, Proposition 15.19, and Theorem 15.20 were established by Wilde
(2016).
Theorem 15.21 was established by Khatri et al. (2019). The convex split method
was introduced by Anshu et al. (2017), and the smooth variant in Lemma 15.22
is due to Khatri et al. (2019), making use of methods in the appendix of Liu and
Winter (2019). Lemma 15.25 is a variant of a result in Anshu et al. (2019) and was
established by Khatri et al. (2019) (see also Wilde (2017b)).
The expression for distillable key in (15.2.13) of Theorem 15.32 is due to
Devetak and Winter (2005). Eq. (15.2.66) is due to Wilde et al. (2017), and
Eq. (15.2.78) to Wilde (2016). One-way secret key distillation was also considered
by Devetak and Winter (2005). Theorems 15.43 and 15.44 were established by
Leditzky (2019).

Appendix 15.A Proof of Smooth Convex Split Lemma

In this appendix, we prove Lemma 15.22.
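Let $\widetilde{\rho}_{AE}$ be an arbitrary state satisfying $P(\widetilde{\rho}_{AE}, \rho_{AE}) \leq \sqrt{\varepsilon} - \eta$ and such that
\begin{equation}
\rho_A \otimes \widetilde{\rho}_E = p \widetilde{\rho}_{AE} + (1 - p) \omega_{AE}, \tag{15.A.1}
\end{equation}
for some $p \in (0, 1)$ and $\omega_{AE}$ some state. We define the following state, which we think of as an approximation to $\tau_{A_1 \cdots A_R E}$:
\begin{equation}
\widetilde{\tau}_{A_1 \cdots A_R E} := \frac{1}{R} \sum_{r=1}^{R} \rho_{A_1} \otimes \cdots \otimes \rho_{A_{r-1}} \otimes \widetilde{\rho}_{A_r E} \otimes \rho_{A_{r+1}} \otimes \cdots \otimes \rho_{A_R}. \tag{15.A.2}
\end{equation}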
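It is a good approximation if $\sqrt{\varepsilon} - \eta$ is small, because
\begin{align}
\sqrt{F}(\tau_{A_1 \cdots A_R E}, \widetilde{\tau}_{A_1 \cdots A_R E})
&\geq \frac{1}{R} \sum_{r=1}^{R} \sqrt{F}\!\left(\rho_A^{\otimes r-1} \otimes \rho_{A_r E} \otimes \rho_A^{\otimes R-r}, \; \rho_A^{\otimes r-1} \otimes \widetilde{\rho}_{A_r E} \otimes \rho_A^{\otimes R-r}\right) \tag{15.A.3} \\
&= \frac{1}{R} \sum_{r=1}^{R} \sqrt{F}(\rho_{A_r E}, \widetilde{\rho}_{A_r E}) \tag{15.A.4} \\
&= \sqrt{F}(\rho_{AE}, \widetilde{\rho}_{AE}), \tag{15.A.5}
\end{align}
where the inequality follows from the concavity of the root fidelity (Theorem 6.11). This in turn implies that
\begin{equation}
\sqrt{F}(\tau_{A_1 \cdots A_R E}, \widetilde{\tau}_{A_1 \cdots A_R E}) \geq \sqrt{F}(\rho_{AE}, \widetilde{\rho}_{AE}). \tag{15.A.6}
\end{equation}
So the inequality in (15.A.6), the definition of the sine distance (Definition 6.16), and the fact that $P(\widetilde{\rho}_{AE}, \rho_{AE}) \leq \sqrt{\varepsilon} - \eta$ imply that
\begin{equation}
P(\tau_{A_1 \cdots A_R E}, \widetilde{\tau}_{A_1 \cdots A_R E}) \leq \sqrt{\varepsilon} - \eta. \tag{15.A.7}
\end{equation}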
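Now, let us define the following states:
\begin{align}
\beta_{AE} &:= \rho_A \otimes \widetilde{\rho}_E, \tag{15.A.8} \\
\alpha_{AE} &:= \widetilde{\rho}_{AE}, \tag{15.A.9} \\
\widetilde{\tau}_{A^R E^R} &:= \frac{1}{R} \sum_{r=1}^{R} \beta_{A_1 E_1} \otimes \cdots \otimes \beta_{A_{r-1} E_{r-1}} \otimes \alpha_{A_r E_r} \otimes \beta_{A_{r+1} E_{r+1}} \otimes \cdots \otimes \beta_{A_R E_R}, \tag{15.A.10}
\end{align}
and observe that
\begin{align}
\operatorname{Tr}_{E_2^R}[(\beta_{AE})^{\otimes R}] &= \rho_{A_1} \otimes \cdots \otimes \rho_{A_R} \otimes \widetilde{\rho}_E, \tag{15.A.11} \\
\operatorname{Tr}_{E_2^R}[\widetilde{\tau}_{A^R E^R}] &= \widetilde{\tau}_{A_1 \cdots A_R E}. \tag{15.A.12}
\end{align}
Thus, it follows from the data-processing inequality for the sine distance that
\begin{equation}
P(\widetilde{\tau}_{A_1 \cdots A_R E}, \rho_{A_1} \otimes \cdots \otimes \rho_{A_R} \otimes \widetilde{\rho}_E) \leq P(\widetilde{\tau}_{A^R E^R}, (\beta_{AE})^{\otimes R}). \tag{15.A.13}
\end{equation}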
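Now consider that
\begin{align}
(\beta_{AE})^{\otimes R} &= (p \widetilde{\rho}_{AE} + (1 - p) \omega_{AE})^{\otimes R} \tag{15.A.14} \\
&= \sum_{S \subseteq [R]} p^{|S|} (1 - p)^{R - |S|} \, \widetilde{\rho}_{AE}^{\otimes S} \otimes \omega_{AE}^{\otimes [R] \setminus S} \tag{15.A.15} \\
&= \sum_{k=0}^{R} \binom{R}{k} p^{k} (1 - p)^{R-k} \theta_k, \tag{15.A.16}
\end{align}
where $\theta_k$ is the following state:
\begin{equation}
\theta_k := \frac{1}{\binom{R}{k}} \sum_{|S| = k} \widetilde{\rho}_{AE}^{\otimes S} \otimes \omega_{AE}^{\otimes [R] \setminus S}. \tag{15.A.17}
\end{equation}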

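Also, consider that
\begin{align}
\widetilde{\tau}_{A^R E^R} &= \frac{1}{R} \sum_{r=1}^{R} \beta_{A_1 E_1} \otimes \cdots \otimes \beta_{A_{r-1} E_{r-1}} \otimes \alpha_{A_r E_r} \otimes \beta_{A_{r+1} E_{r+1}} \otimes \cdots \otimes \beta_{A_R E_R} \tag{15.A.18} \\
&= \sum_{\emptyset \neq S \subseteq [R]} \frac{|S|}{R} \, p^{|S|-1} (1 - p)^{R - |S|} \, \widetilde{\rho}_{AE}^{\otimes S} \otimes \omega_{AE}^{\otimes [R] \setminus S} \tag{15.A.19} \\
&= \sum_{k=1}^{R} \binom{R-1}{k-1} p^{k-1} (1 - p)^{R-k} \theta_k \tag{15.A.20} \\
&= \sum_{k=0}^{R} \frac{k}{Rp} \binom{R}{k} p^{k} (1 - p)^{R-k} \theta_k. \tag{15.A.21}
\end{align}
In the second line, each set $S$ is counted once for every position $r \in S$, which gives the factor $\frac{|S|}{R}$. In the last line, we used the identity $\binom{R-1}{k-1} = \frac{k}{R} \binom{R}{k}$ together with $p^{k-1} = p^{k}/p$. Defining the following classical–quantum states:
\begin{align}
\beta_{A^R E^R K} &:= \sum_{k=0}^{R} \binom{R}{k} p^{k} (1 - p)^{R-k} \theta_k \otimes |k\rangle\!\langle k|_K, \tag{15.A.22} \\
\widetilde{\tau}_{A^R E^R K} &:= \sum_{k=0}^{R} \frac{k}{Rp} \binom{R}{k} p^{k} (1 - p)^{R-k} \theta_k \otimes |k\rangle\!\langle k|_K, \tag{15.A.23}
\end{align}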
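consider that
\begin{align}
\sqrt{F}((\beta_{AE})^{\otimes R}, \widetilde{\tau}_{A^R E^R})
&\geq \sqrt{F}(\beta_{A^R E^R K}, \widetilde{\tau}_{A^R E^R K}) \tag{15.A.24} \\
&= \sum_{k=0}^{R} \sqrt{\binom{R}{k} p^{k} (1 - p)^{R-k}} \sqrt{\frac{k}{Rp} \binom{R}{k} p^{k} (1 - p)^{R-k}} \, \sqrt{F}(\theta_k, \theta_k) \tag{15.A.25} \\
&= \sum_{k=0}^{R} \binom{R}{k} p^{k} (1 - p)^{R-k} \sqrt{\frac{k}{Rp}} \tag{15.A.26} \\
&= \sqrt{\frac{1}{Rp}} \sum_{k=0}^{R} \binom{R}{k} p^{k} (1 - p)^{R-k} \sqrt{k} \tag{15.A.27} \\
&= \sqrt{\frac{1}{Rp}} \, \mathbb{E}_K\!\left[\sqrt{K}\right], \tag{15.A.28}
\end{align}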
where E𝐾 denotes the expectation with respect to the binomial random variable 𝐾.
The first inequality follows from the data-processing inequality for fidelity with
respect to partial trace. The other steps follow by direct evaluation. Let 𝜇 = 𝑅 𝑝 (i.e.,
the mean of a binomial random variable). Consider that the following inequality
holds for all 𝑘 ≥ 0 and 𝜇 > 0:
\begin{equation}
\sqrt{k} \geq \sqrt{\mu} + \frac{k - \mu}{2\sqrt{\mu}} - \frac{(k - \mu)^2}{2\mu^{3/2}}. \tag{15.A.29}
\end{equation}
Then we find that
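\begin{align}
\sqrt{\frac{1}{Rp}} \, \mathbb{E}_K\!\left[\sqrt{K}\right]
&\geq \sqrt{\frac{1}{Rp}} \, \mathbb{E}_K\!\left[\sqrt{\mu} + \frac{K - \mu}{2\sqrt{\mu}} - \frac{(K - \mu)^2}{2\mu^{3/2}}\right] \tag{15.A.30} \\
&= \sqrt{\frac{1}{Rp}} \left( \sqrt{\mu} - \frac{\operatorname{Var}(K)}{2\mu^{3/2}} \right) \tag{15.A.31} \\
&= \sqrt{\frac{1}{Rp}} \left( \sqrt{Rp} - \frac{\operatorname{Var}(K)}{2 (Rp)^{3/2}} \right) \tag{15.A.32} \\
&= 1 - \frac{Rp(1 - p)}{2 (Rp)^2} \tag{15.A.33} \\
&= 1 - \frac{1 - p}{2Rp} \tag{15.A.34} \\
&\geq 1 - \frac{1}{Rp}. \tag{15.A.35}
\end{align}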
Thus it follows that
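\begin{equation}
\sqrt{F}((\beta_{AE})^{\otimes R}, \widetilde{\tau}_{A^R E^R}) \geq 1 - \frac{\eta^2}{2} \tag{15.A.36}
\end{equation}
if
\begin{equation}
\log_2 R \geq \log_2(1/p) + \log_2\!\left(\frac{2}{\eta^2}\right). \tag{15.A.37}
\end{equation}
This implies that
\begin{equation}
P((\beta_{AE})^{\otimes R}, \widetilde{\tau}_{A^R E^R}) \leq \eta. \tag{15.A.38}
\end{equation}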
For the same choice of 𝑅, it follows from (15.A.13) that

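\begin{equation}
P(\widetilde{\tau}_{A_1 \cdots A_R E}, \rho_{A_1} \otimes \cdots \otimes \rho_{A_R} \otimes \widetilde{\rho}_E) \leq \eta. \tag{15.A.39}
\end{equation}

Applying the triangle inequality to (15.A.7) and (15.A.39), we find that

\begin{equation}
P(\tau_{A_1 \cdots A_R E}, \rho_{A_1} \otimes \cdots \otimes \rho_{A_R} \otimes \widetilde{\rho}_E) \leq \sqrt{\varepsilon}. \tag{15.A.40}
\end{equation}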

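The whole argument above holds for an arbitrary state $\widetilde{\rho}_{AE}$ satisfying $P(\widetilde{\rho}_{AE}, \rho_{AE}) \leq \sqrt{\varepsilon} - \eta$ and (15.A.1), and so taking an infimum of $\log_2(1/p)$ over $p$ and all states satisfying these conditions, and applying the definition in (15.1.211), as well as Lemma 7.59, we find that
\begin{equation}
P(\tau_{A_1 \cdots A_R E}, \rho_{A_1} \otimes \cdots \otimes \rho_{A_R} \otimes \widetilde{\rho}_E) \leq \sqrt{\varepsilon} \tag{15.A.41}
\end{equation}
if
\begin{equation}
\log_2 R \geq I_{\max}^{\sqrt{\varepsilon} - \eta}(E;A)_{\rho} + \log_2\!\left(\frac{2}{\eta^2}\right). \tag{15.A.42}
\end{equation}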
This concludes the proof.
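
The purely classical part of the argument, namely the estimate of $\sqrt{1/(Rp)}\,\mathbb{E}_K[\sqrt{K}]$ in (15.A.28)–(15.A.37), can be checked numerically. The sketch below computes this quantity exactly for a binomial random variable and compares it with $1 - \frac{1}{Rp}$ and with $1 - \frac{\eta^2}{2}$ for the smallest integer $R$ allowed by (15.A.37). The function names and the values of $p$ and $\eta$ are illustrative and are not taken from the text.

```python
import math

def binomial_pmf(R, k, p):
    """P[K = k] for K ~ Binomial(R, p), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(R + 1) - math.lgamma(k + 1) - math.lgamma(R - k + 1)
               + k * math.log(p) + (R - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def root_fidelity_lower_bound(R, p):
    """The quantity sqrt(1/(R p)) * E[sqrt(K)] appearing in (15.A.28)."""
    mean_sqrt = sum(binomial_pmf(R, k, p) * math.sqrt(k) for k in range(R + 1))
    return mean_sqrt / math.sqrt(R * p)

def copies_needed(p, eta):
    """Smallest integer R with log2(R) >= log2(1/p) + log2(2/eta^2), as in (15.A.37)."""
    return math.ceil(2.0 / (p * eta ** 2))

p, eta = 0.2, 0.5   # illustrative values, with 0 < p < 1
R = copies_needed(p, eta)
value = root_fidelity_lower_bound(R, p)
print(R, value, 1.0 - 1.0 / (R * p), 1.0 - eta ** 2 / 2.0)
```

In this example the computed value lies well above both thresholds, which reflects the slack introduced by the quadratic lower bound (15.A.29) and by the final step (15.A.35).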

Appendix 15.B Relating Two Variants of Smooth-

Max Mutual Information
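In this appendix, we prove Lemma 15.25. The steps consist of constructing a state
$\widehat{\rho}_{AE}$ such that
\begin{equation}
D_{\max}(\widehat{\rho}_{AE} \Vert \rho_{A} \otimes \widehat{\rho}_{E}) \leq D_{\max}(\widetilde{\rho}_{AE} \Vert \rho_{A} \otimes \rho_{E}) + \log_2\!\left(\frac{8}{\delta^{2}}\right) \tag{15.B.1}
\end{equation}
and $P(\widehat{\rho}_{AE}, \rho_{AE}) \leq \varepsilon + \delta$. Using these inequalities, we can apply the definition of
$\widetilde{I}_{\max}^{\varepsilon+\delta}(E;A)_{\rho}$ to conclude the desired inequality in (15.1.265). We begin by showing
the first inequality, and after that, we establish the second one.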
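We begin by establishing some preparatory facts. Let $\widetilde{\rho}_{AE}$ be a state satisfying
$P(\widetilde{\rho}_{AE}, \rho_{AE}) \leq \varepsilon$. Let $\gamma = \delta^{2}/8$, and set $\Pi^{\gamma}_{E}$ to be the projection onto the positive
eigenspace of $\frac{1}{\gamma}\widetilde{\rho}_{E} - \rho_{E}$. Then it follows that
\begin{equation}
\Pi^{\gamma}_{E}\left(\frac{1}{\gamma}\widetilde{\rho}_{E} - \rho_{E}\right)\Pi^{\gamma}_{E} \geq 0
\quad \Rightarrow \quad
\Pi^{\gamma}_{E}\rho_{E}\Pi^{\gamma}_{E} \leq \frac{1}{\gamma}\Pi^{\gamma}_{E}\widetilde{\rho}_{E}\Pi^{\gamma}_{E} = \frac{8}{\delta^{2}}\Pi^{\gamma}_{E}\widetilde{\rho}_{E}\Pi^{\gamma}_{E}, \tag{15.B.2}
\end{equation}
and
\begin{align}
& \left(I - \Pi^{\gamma}_{E}\right)\left(\frac{1}{\gamma}\widetilde{\rho}_{E} - \rho_{E}\right)\left(I - \Pi^{\gamma}_{E}\right) \leq 0 \notag \\
& \quad \Rightarrow \quad \operatorname{Tr}[(I - \Pi^{\gamma}_{E})\widetilde{\rho}_{E}] \leq \gamma \operatorname{Tr}[(I - \Pi^{\gamma}_{E})\rho_{E}] \leq \gamma = \frac{\delta^{2}}{8}, \tag{15.B.3}
\end{align}
where the last inequality follows because $\operatorname{Tr}[(I - \Pi^{\gamma}_{E})\rho_{E}] \leq 1$. The inequality in
(15.B.3) can be rewritten as
\begin{equation}
\operatorname{Tr}[\Pi^{\gamma}_{E}\widetilde{\rho}_{E}] \geq 1 - \frac{\delta^{2}}{8}. \tag{15.B.4}
\end{equation}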
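The projector construction behind (15.B.2)–(15.B.4) can be illustrated numerically. The sketch below is an assumption-laden example rather than part of the proof: the helper names, the dimension, the ranks, and the value of δ are all chosen for illustration, and the two random matrices merely stand in for the marginal states on system E.

```python
import numpy as np

def random_state(dim, rank, rng):
    """Random density matrix of the given rank (illustrative helper)."""
    g = rng.normal(size=(dim, rank)) + 1j * rng.normal(size=(dim, rank))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def positive_eigenspace_projector(op):
    """Projector onto the eigenvectors of a Hermitian operator with eigenvalue > 0."""
    evals, evecs = np.linalg.eigh(op)
    v = evecs[:, evals > 0]
    return v @ v.conj().T

rng = np.random.default_rng(0)
delta = 0.5                           # illustrative smoothing parameter
gamma = delta ** 2 / 8
rho_tilde = random_state(4, 2, rng)   # stands in for the smoothed marginal on E
rho = random_state(4, 4, rng)         # stands in for the original marginal on E
proj = positive_eigenspace_projector(rho_tilde / gamma - rho)

# (15.B.2): Pi rho Pi <= (1/gamma) Pi rho_tilde Pi, i.e., the difference is PSD.
gap = proj @ (rho_tilde / gamma - rho) @ proj
min_eig = np.linalg.eigvalsh((gap + gap.conj().T) / 2).min()

# (15.B.4): Tr[Pi rho_tilde] >= 1 - delta^2 / 8.
weight = np.trace(proj @ rho_tilde).real
print(min_eig >= -1e-9, weight >= 1 - gamma)
```

Choosing a rank-deficient first state makes the projector nontrivial; with two full-rank states and a small γ, the projector would typically be the identity.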

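We now establish (15.B.1). Let us define the following states:
\begin{align}
\rho_{AEX} & \coloneqq \Pi^{\gamma}_{E}\widetilde{\rho}_{AE}\Pi^{\gamma}_{E} \otimes |0\rangle\!\langle 0|_{X} + \left(I - \Pi^{\gamma}_{E}\right)\widetilde{\rho}_{AE}\left(I - \Pi^{\gamma}_{E}\right) \otimes |1\rangle\!\langle 1|_{X}, \tag{15.B.5} \\
\widehat{\rho}_{AEX} & \coloneqq \left(\Pi^{\gamma}_{E}\widetilde{\rho}_{AE}\Pi^{\gamma}_{E} + \rho_{A} \otimes \widetilde{\rho}_{E}^{1/2}\left(I - \Pi^{\gamma}_{E}\right)\widetilde{\rho}_{E}^{1/2}\right) \otimes |0\rangle\!\langle 0|_{X}, \tag{15.B.6}
\end{align}
so that
\begin{align}
\widehat{\rho}_{AE} & = \operatorname{Tr}_{X}[\widehat{\rho}_{AEX}] \tag{15.B.7} \\
& = \Pi^{\gamma}_{E}\widetilde{\rho}_{AE}\Pi^{\gamma}_{E} + \rho_{A} \otimes \widetilde{\rho}_{E}^{1/2}\left(I - \Pi^{\gamma}_{E}\right)\widetilde{\rho}_{E}^{1/2}. \tag{15.B.8}
\end{align}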

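Then, using the inequality $\widetilde{\rho}_{AE} \leq \mu\, \rho_{A} \otimes \rho_{E}$, with
\begin{equation}
\mu \coloneqq 2^{D_{\max}(\widetilde{\rho}_{AE} \Vert \rho_{A} \otimes \rho_{E})}, \tag{15.B.9}
\end{equation}
and the fact that $\mu \frac{8}{\delta^{2}} \geq 1$ (which holds because $D_{\max}(\widetilde{\rho}_{AE} \Vert \rho_{A} \otimes \rho_{E}) \geq 0$ and
$8 \geq \delta^{2}$), we find that
\begin{align}
\widehat{\rho}_{AE}
& \leq \mu\, \rho_{A} \otimes \Pi^{\gamma}_{E}\rho_{E}\Pi^{\gamma}_{E} + \rho_{A} \otimes \widetilde{\rho}_{E}^{1/2}\left(I - \Pi^{\gamma}_{E}\right)\widetilde{\rho}_{E}^{1/2} \tag{15.B.10} \\
& \leq \mu \frac{8}{\delta^{2}}\, \rho_{A} \otimes \Pi^{\gamma}_{E}\widetilde{\rho}_{E}\Pi^{\gamma}_{E} + \rho_{A} \otimes \widetilde{\rho}_{E}^{1/2}\left(I - \Pi^{\gamma}_{E}\right)\widetilde{\rho}_{E}^{1/2} \tag{15.B.11} \\
& \leq \mu \frac{8}{\delta^{2}} \left[\rho_{A} \otimes \Pi^{\gamma}_{E}\widetilde{\rho}_{E}\Pi^{\gamma}_{E} + \rho_{A} \otimes \widetilde{\rho}_{E}^{1/2}\left(I - \Pi^{\gamma}_{E}\right)\widetilde{\rho}_{E}^{1/2}\right] \tag{15.B.12} \\
& = \mu \frac{8}{\delta^{2}}\, \rho_{A} \otimes \left[\Pi^{\gamma}_{E}\widetilde{\rho}_{E}\Pi^{\gamma}_{E} + \widetilde{\rho}_{E}^{1/2}\left(I - \Pi^{\gamma}_{E}\right)\widetilde{\rho}_{E}^{1/2}\right] \tag{15.B.13} \\
& = \mu \frac{8}{\delta^{2}}\, \rho_{A} \otimes \widehat{\rho}_{E}. \tag{15.B.14}
\end{align}
The second inequality above follows from (15.B.2). Applying the definition of
$D_{\max}(\widehat{\rho}_{AE} \Vert \rho_{A} \otimes \widehat{\rho}_{E})$, we conclude that
\begin{equation}
D_{\max}(\widehat{\rho}_{AE} \Vert \rho_{A} \otimes \widehat{\rho}_{E}) \leq D_{\max}(\widetilde{\rho}_{AE} \Vert \rho_{A} \otimes \rho_{E}) + \log_2\!\left(\frac{8}{\delta^{2}}\right). \tag{15.B.15}
\end{equation}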

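We can conclude the statement of the lemma if $P(\widehat{\rho}_{AE}, \rho_{AE}) \leq \varepsilon + \delta$, and so it
is our aim to show this now. Consider that
\begin{equation}
P(\widehat{\rho}_{AEX}, \rho_{AEX}) = \sqrt{1 - F(\widehat{\rho}_{AEX}, \rho_{AEX})}. \tag{15.B.16}
\end{equation}
The following chain of inequalities holds
\begin{align}
\sqrt{F}(\widehat{\rho}_{AEX}, \rho_{AEX})
& = \operatorname{Tr}\!\left[\left(\sqrt{\Pi^{\gamma}_{E}\widetilde{\rho}_{AE}\Pi^{\gamma}_{E}}\ \widehat{\rho}_{AE}\ \sqrt{\Pi^{\gamma}_{E}\widetilde{\rho}_{AE}\Pi^{\gamma}_{E}}\right)^{1/2}\right] \tag{15.B.17} \\
& \geq \operatorname{Tr}\!\left[\left(\sqrt{\Pi^{\gamma}_{E}\widetilde{\rho}_{AE}\Pi^{\gamma}_{E}}\ \Pi^{\gamma}_{E}\widetilde{\rho}_{AE}\Pi^{\gamma}_{E}\ \sqrt{\Pi^{\gamma}_{E}\widetilde{\rho}_{AE}\Pi^{\gamma}_{E}}\right)^{1/2}\right] \tag{15.B.18} \\
& = \operatorname{Tr}\!\left[\Pi^{\gamma}_{E}\widetilde{\rho}_{AE}\Pi^{\gamma}_{E}\right] \tag{15.B.19} \\
& = \operatorname{Tr}[\Pi^{\gamma}_{E}\widetilde{\rho}_{E}] \tag{15.B.20} \\
& \geq 1 - \frac{\delta^{2}}{8}, \tag{15.B.21}
\end{align}
where the inequality follows from operator monotonicity of the square root and the
fact that
\begin{align}
\widehat{\rho}_{AE} & = \Pi^{\gamma}_{E}\widetilde{\rho}_{AE}\Pi^{\gamma}_{E} + \rho_{A} \otimes \widetilde{\rho}_{E}^{1/2}\left(I - \Pi^{\gamma}_{E}\right)\widetilde{\rho}_{E}^{1/2} \tag{15.B.22} \\
& \geq \Pi^{\gamma}_{E}\widetilde{\rho}_{AE}\Pi^{\gamma}_{E}. \tag{15.B.23}
\end{align}
From the above and (15.B.4), we conclude that $F(\widehat{\rho}_{AEX}, \rho_{AEX}) \geq 1 - \frac{\delta^{2}}{4}$, which
implies that
\begin{equation}
P(\widehat{\rho}_{AEX}, \rho_{AEX}) \leq \frac{\delta}{2}. \tag{15.B.24}
\end{equation}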
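Now consider that
\begin{align}
& P(\rho_{AEX}, \rho_{AE} \otimes |0\rangle\!\langle 0|_{X}) \notag \\
& \leq P(\rho_{AEX}, \widetilde{\rho}_{AE} \otimes |0\rangle\!\langle 0|_{X}) + P(\widetilde{\rho}_{AE} \otimes |0\rangle\!\langle 0|_{X}, \rho_{AE} \otimes |0\rangle\!\langle 0|_{X}) \tag{15.B.25} \\
& = \sqrt{1 - F(\rho_{AEX}, \widetilde{\rho}_{AE} \otimes |0\rangle\!\langle 0|_{X})} + P(\widetilde{\rho}_{AE}, \rho_{AE}) \tag{15.B.26} \\
& = \sqrt{1 - \left\Vert \sqrt{\Pi^{\gamma}_{E}\widetilde{\rho}_{AE}\Pi^{\gamma}_{E}}\sqrt{\widetilde{\rho}_{AE}} \right\Vert_{1}^{2}} + P(\widetilde{\rho}_{AE}, \rho_{AE}) \tag{15.B.27} \\
& = \sqrt{1 - \left\Vert \sqrt{\Pi^{\gamma}_{E}\widetilde{\rho}_{AE}\Pi^{\gamma}_{E}}\sqrt{\Pi^{\gamma}_{E}\widetilde{\rho}_{AE}\Pi^{\gamma}_{E}} \right\Vert_{1}^{2}} + P(\widetilde{\rho}_{AE}, \rho_{AE}) \tag{15.B.28} \\
& = \sqrt{1 - \operatorname{Tr}[\Pi^{\gamma}_{E}\widetilde{\rho}_{AE}]^{2}} + P(\widetilde{\rho}_{AE}, \rho_{AE}) \tag{15.B.29} \\
& \leq \frac{\delta}{2} + \varepsilon, \tag{15.B.30}
\end{align}
where we applied the triangle inequality of the sine distance (Lemma 6.17) for the
first inequality and the fact that $\left\Vert \sqrt{\Pi\omega\Pi}\sqrt{\tau} \right\Vert_{1} = \left\Vert \sqrt{\Pi\omega\Pi}\sqrt{\Pi\tau\Pi} \right\Vert_{1}$ for a projector
$\Pi$ and states $\omega$ and $\tau$. The last inequality follows from (15.B.4) and the assumption $P(\widetilde{\rho}_{AE}, \rho_{AE}) \leq \varepsilon$. Combining this with (15.B.24), we find that
\begin{align}
P(\widehat{\rho}_{AE}, \rho_{AE}) & = P(\widehat{\rho}_{AEX}, \rho_{AE} \otimes |0\rangle\!\langle 0|_{X}) \tag{15.B.31} \\
& \leq P(\widehat{\rho}_{AEX}, \rho_{AEX}) + P(\rho_{AEX}, \rho_{AE} \otimes |0\rangle\!\langle 0|_{X}) \tag{15.B.32} \\
& \leq \varepsilon + \delta. \tag{15.B.33}
\end{align}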

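Since we have found a state $\widehat{\rho}_{AE}$ satisfying $P(\widehat{\rho}_{AE}, \rho_{AE}) \leq \varepsilon + \delta$ and (15.B.15), we
conclude that
\begin{equation}
\widetilde{I}_{\max}^{\varepsilon+\delta}(E;A)_{\rho} \leq D_{\max}(\widetilde{\rho}_{AE} \Vert \rho_{A} \otimes \rho_{E}) + \log_2\!\left(\frac{8}{\delta^{2}}\right). \tag{15.B.34}
\end{equation}
Since this inequality has been shown for all states $\widetilde{\rho}_{AE}$ satisfying $P(\widetilde{\rho}_{AE}, \rho_{AE}) \leq \varepsilon$,
we conclude the statement of the lemma.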

Chapter 16

Private Communication
This chapter focuses on the task of private communication, in which the goal is for
a sender to communicate classical information privately over a quantum channel to
a receiver, such that the environment of the channel gains essentially no information
about the message transmitted. There are connections between this task and secret
key distillation from Chapter 15, as well as with quantum communication from
Chapter 14. Private communication can be considered a dynamic version of the
general problem of establishing secret correlations between two parties, whereas
secret key distillation is a static version of the same problem. Indeed, the resource
shared between the two parties in the former task is a quantum channel (a dynamic
resource), whereas the resource shared in the latter is a bipartite quantum state (a
static resource). The cryptographic models are similar as well: in key distillation,
we assumed that an eavesdropper possesses the purifying system of a purification of
the shared state, whereas, in this chapter, we assume that an eavesdropper possesses
the purifying system of a purification of the channel connecting the sender to
receiver (i.e., the eavesdropper possesses the environment of the channel). The
connection of private communication to quantum communication is as follows: if
two parties can communicate some amount of quantum information with some
error, then the amount of private information that they can communicate is related
to this amount by an inequality. This inequality in turn implies that the private
capacity of a quantum channel is not smaller than its quantum capacity.
As with other communication tasks that we have considered in previous chapters,
there are multiple ways to define how communication can be private, based on
various error criteria. In this chapter, we define two such criteria that lead to two
different but related communication tasks, one that we call secret-key transmission

and another that we call private communication. The criterion for the former task
is most similar to an average error criterion, in which the goal is for the sender
to use the channel to transmit one share of a secret key to the receiver, and the
criterion for the latter task is a maximal infidelity criterion, in which all messages
transmitted over the channel are required to meet a particular error criterion, which
captures both the decoding error probability of the receiver, as well as the security
of the message transmitted.
As usual by now, we begin our development in the one-shot setting, with the
goal of establishing lower and upper bounds on the one-shot private capacity.
We find several upper bounds on the one-shot private capacity, in terms of the
one-shot private information of the channel, the hypothesis testing relative entropy
of entanglement, and the squashed entanglement. The lower bound that we establish
is related to a different variation of the one-shot private information of the channel
(not the same quantity as in the upper bound), and we juxtapose the methods
of position-based coding and convex splitting to prove the achievability of this
one-shot private information. Some of the mathematical steps in the proof of the
lower bound are similar to those that we used in the previous chapter, in which
we established a lower bound on the one-shot distillable key. Moving on to the
asymptotic setting, we prove that the private capacity of a quantum channel is
equal to its regularized private information. This quantity is difficult to compute in
general, and so we then establish some upper bounds on it in terms of the relative
entropy of entanglement and squashed entanglement.

16.1 One-Shot Setting

Let N 𝐴→𝐵 be a quantum channel connecting a sender Alice to a receiver Bob, and
let UN 𝐴→𝐵𝐸 be an isometric channel extending N 𝐴→𝐵 , in the sense that N 𝐴→𝐵 =
Tr𝐸 ◦UN 𝐴→𝐵𝐸 . The goal of a private communication protocol is for Alice to
communicate a classical message to Bob reliably, in the sense that Bob can decode
it with high probability, and such that it is secure from anyone who possesses the
environment system 𝐸 (we personify the environment as the eavesdropper Eve). A
private communication protocol in the one-shot setting is illustrated in Figure [REF].
It is defined by the three elements (M, E 𝑀 ′ →𝐴 , D𝐵→ 𝑀ˆ ), in which M is a message
set, E 𝑀 ′ →𝐴 is an encoding channel, and D𝐵→ 𝑀ˆ is a decoding channel. The pair
(E 𝑀 ′ →𝐴 , D𝐵→ 𝑀ˆ ), consisting of the encoding and decoding channels, is called a
private communication code or, more simply, a code. The encoding channel is a

classical–quantum channel, and the decoding channel is a quantum–classical or

measurement channel.
The steps of the protocol proceed similarly to those of the classical communica-
tion protocol discussed in Section 12.1, with the key difference that the message
transmitted should be kept private from Eve. Let us employ notation similar to that
discussed in (12.1.1)–(12.1.9), in which Alice’s probability for selecting message
𝑚 ∈ M is denoted by 𝑝(𝑚), the initial state is denoted by
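\begin{equation}
\Phi^{p}_{MM'} \coloneqq \sum_{m\in\mathcal{M}} p(m)\, |m\rangle\!\langle m|_{M} \otimes |m\rangle\!\langle m|_{M'}, \tag{16.1.1}
\end{equation}
the state after the encoding channel by
\begin{align}
\rho^{p}_{MA} & \coloneqq \mathcal{E}_{M'\to A}(\Phi^{p}_{MM'}) \tag{16.1.2} \\
& = \sum_{m\in\mathcal{M}} p(m)\, |m\rangle\!\langle m|_{M} \otimes \rho^{m}_{A}, \tag{16.1.3}
\end{align}
where we have defined
\begin{equation}
\rho^{m}_{A} \coloneqq \mathcal{E}_{M'\to A}(|m\rangle\!\langle m|_{M'}), \tag{16.1.4}
\end{equation}
the state before the decoding channel by
\begin{equation}
\mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho^{p}_{MA}), \tag{16.1.5}
\end{equation}
and the final state of the protocol by
\begin{align}
\omega^{p}_{M\hat{M}E} & \coloneqq (\mathcal{D}_{B\to\hat{M}} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE} \circ \mathcal{E}_{M'\to A})(\Phi^{p}_{MM'}) \tag{16.1.6} \\
& = \sum_{m,\hat{m}\in\mathcal{M}} p(m)\, |m\rangle\!\langle m|_{M} \otimes |\hat{m}\rangle\!\langle \hat{m}|_{\hat{M}} \otimes \operatorname{Tr}_{B}[\Lambda^{\hat{m}}_{B}\, \mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho^{m}_{A})], \tag{16.1.7}
\end{align}
where we have used the fact that the decoding channel is a measurement channel
and thus can be written in terms of a POVM $\{\Lambda^{m}_{B}\}_{m\in\mathcal{M}}$ as
\begin{equation}
\mathcal{D}_{B\to\hat{M}}(\tau_{B}) \coloneqq \sum_{\hat{m}\in\mathcal{M}} \operatorname{Tr}[\Lambda^{\hat{m}}_{B}\tau_{B}]\, |\hat{m}\rangle\!\langle \hat{m}|_{\hat{M}}. \tag{16.1.8}
\end{equation}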

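If we define the following states
\begin{align}
\omega^{m,\hat{m}}_{E} & \coloneqq \frac{\operatorname{Tr}_{B}[\Lambda^{\hat{m}}_{B}\, \mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho^{m}_{A})]}{q(\hat{m}|m)}, \tag{16.1.9} \\
q(\hat{m}|m) & \coloneqq \operatorname{Tr}[\Lambda^{\hat{m}}_{B}\, \mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho^{m}_{A})] = \operatorname{Tr}[\Lambda^{\hat{m}}_{B}\, \mathcal{N}_{A\to B}(\rho^{m}_{A})], \tag{16.1.10}
\end{align}
then we can write the final state of the protocol alternatively as follows:
\begin{equation}
\omega^{p}_{M\hat{M}E} = \sum_{m,\hat{m}\in\mathcal{M}} p(m)\, q(\hat{m}|m)\, |m\rangle\!\langle m|_{M} \otimes |\hat{m}\rangle\!\langle \hat{m}|_{\hat{M}} \otimes \omega^{m,\hat{m}}_{E}. \tag{16.1.11}
\end{equation}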
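To make the quantities in (16.1.9)–(16.1.11) concrete, the following sketch computes the decoding probabilities and Eve's conditional states for one toy example. Everything specific here is an assumption made for illustration: the channel is taken to be a qubit amplitude damping channel with damping parameter 0.2, the codewords are the two computational basis states, and the decoding POVM is the computational basis measurement.

```python
import numpy as np

def amplitude_damping_isometry(g):
    """Isometry V: A -> B (x) E built from the Kraus operators of amplitude damping."""
    k0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1 - g)]])
    k1 = np.array([[0.0, np.sqrt(g)], [0.0, 0.0]])
    e0, e1 = np.eye(2)[:, [0]], np.eye(2)[:, [1]]  # column vectors |0>_E and |1>_E
    return np.kron(k0, e0) + np.kron(k1, e1)       # shape (4, 2), ordering B (x) E

def eve_state_and_prob(v, rho_m, povm_elem):
    """Return q(mhat|m) and Eve's conditional state for one codeword and POVM element."""
    out = v @ rho_m @ v.conj().T                   # isometric channel applied to rho^m_A
    weighted = np.kron(povm_elem, np.eye(2)) @ out
    unnorm = np.einsum("aiaj->ij", weighted.reshape(2, 2, 2, 2))  # partial trace over B
    q = np.trace(unnorm).real
    return q, (unnorm / q if q > 1e-12 else None)

v = amplitude_damping_isometry(0.2)
proj0, proj1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
q00, w00 = eve_state_and_prob(v, proj0, proj0)     # message 0 decoded as 0
q11, w11 = eve_state_and_prob(v, proj1, proj1)     # message 1 decoded as 1
q01, w01 = eve_state_and_prob(v, proj1, proj0)     # message 1 decoded as 0
print(q00, q11, q01)
```

In this toy example Eve's conditional state depends on the message, so the code is decodable with high probability but not private; the infidelity criteria defined next capture both requirements at once.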

The difference between a protocol for (public) communication, as discussed

in Section 12.1, and one for private communication is the metric used for charac-
terizing performance. Here, we demand that the message remain private from the
eavesdropper in addition to being decodable by the receiver. We combine these
constraints into a single metric, which we define in terms of the infidelity. We
delineate two different cases, similar to how we did in Section 12.1:
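1. The average infidelity of the code is given by
\begin{align}
& p_{\operatorname{err}}(\mathcal{E}, \mathcal{D}; p, \mathcal{N}) \notag \\
& \coloneqq \inf_{\sigma_{E}} \left[1 - F\big(\Phi^{p}_{M\hat{M}} \otimes \sigma_{E},\ \mathcal{P}_{M'\to\hat{M}E}(\Phi^{p}_{MM'})\big)\right] \tag{16.1.12} \\
& = \inf_{\sigma_{E}} \left[1 - \left(\sum_{m\in\mathcal{M}} p(m) \sqrt{F}\big(|m\rangle\!\langle m|_{\hat{M}} \otimes \sigma_{E},\ \mathcal{P}_{M'\to\hat{M}E}(|m\rangle\!\langle m|_{M'})\big)\right)^{2}\right], \tag{16.1.13}
\end{align}
where
\begin{equation}
\mathcal{P}_{M'\to\hat{M}E} \coloneqq \mathcal{D}_{B\to\hat{M}} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE} \circ \mathcal{E}_{M'\to A} \tag{16.1.14}
\end{equation}
and the infimum is taken over every state $\sigma_{E}$ of the eavesdropper’s system $E$.
Also, we employed Proposition 7.31 with $\alpha = \frac{1}{2}$ in the last line above. If the prior
probability distribution $p(m)$ is the uniform distribution (i.e., $p(m) = 1/|\mathcal{M}|$),
then the communication task is called secret-key transmission, because the
goal is for Alice to transmit one share of a secret key to the receiver Bob.
2. An alternative error criterion is the maximal infidelity of the code, defined as
\begin{equation}
p^{*}_{\operatorname{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \coloneqq \inf_{\sigma_{E}} \max_{m\in\mathcal{M}} \left[1 - F\big(|m\rangle\!\langle m|_{\hat{M}} \otimes \sigma_{E},\ \mathcal{P}_{M'\to\hat{M}E}(|m\rangle\!\langle m|_{M'})\big)\right]. \tag{16.1.15}
\end{equation}
When the communication task employs this error criterion, we refer to it as
private communication.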
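As a rough illustration of the two criteria, the sketch below evaluates the objective functions of (16.1.13) and (16.1.15) for one fixed candidate state of Eve's system. Since both definitions take an infimum over that state, the numbers produced are only upper bounds on the true average and maximal infidelities. The helper names and the toy output states are assumptions made for the example.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive semi-definite matrix via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(a)
    return (evecs * np.sqrt(np.clip(evals, 0.0, None))) @ evecs.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = || sqrt(rho) sqrt(sigma) ||_1^2."""
    s = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False)
    return float(np.sum(s) ** 2)

def infidelities(outputs, sigma_e, prior):
    """Objectives of (16.1.13) and (16.1.15) for a fixed sigma_E (no infimum taken)."""
    n = len(outputs)
    root_f = []
    for m, omega in enumerate(outputs):
        ket = np.zeros(n)
        ket[m] = 1.0
        ideal = np.kron(np.outer(ket, ket), sigma_e)   # |m><m| (x) sigma_E
        root_f.append(np.sqrt(fidelity(ideal, omega)))
    root_f = np.array(root_f)
    average = 1 - float(np.dot(prior, root_f)) ** 2
    maximal = float(np.max(1 - root_f ** 2))
    return average, maximal

# Toy outputs on Mhat (x) E: message 0 is perfect, message 1 is decoded wrongly
# with probability 0.1, and Eve's state is maximally mixed in every case.
sigma_e = np.eye(2) / 2
out0 = np.kron(np.diag([1.0, 0.0]), sigma_e)
out1 = np.kron(np.diag([0.1, 0.9]), sigma_e)
avg, mx = infidelities([out0, out1], sigma_e, np.array([0.5, 0.5]))
print(avg, mx)
```

For any fixed state of Eve's system, the average objective never exceeds the maximal one, which is the reason the maximal criterion is the stronger of the two.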


The interpretation of the average infidelity obeying the inequality

𝑝 err (E, D; 𝑝, N) ≤ 𝜀 (16.1.16)

is that there exists a state 𝜎𝐸 of the eavesdropper’s system 𝐸 such that the state of
systems 𝑀ˆ and 𝐸 is close to the product state |𝑚⟩⟨𝑚| 𝑀ˆ ⊗ 𝜎𝐸 , on average. This
means that not only can Bob decode well, but also that the state of Eve’s system
is close to the constant state 𝜎𝐸 , such that her system is not useful for figuring out
the message transmitted (on average). Indeed, by applying the data-processing
inequality with respect to partial trace of system 𝐸 and letting 𝜎𝐸 be the state that
achieves 𝑝 err (E, D; 𝑝, N), we conclude that

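\begin{align}
\varepsilon & \geq p_{\operatorname{err}}(\mathcal{E}, \mathcal{D}; p, \mathcal{N}) \tag{16.1.17} \\
& = 1 - \left(\sum_{m\in\mathcal{M}} p(m) \sqrt{F}\big(|m\rangle\!\langle m|_{\hat{M}} \otimes \sigma_{E},\ \mathcal{P}_{M'\to\hat{M}E}(|m\rangle\!\langle m|_{M'})\big)\right)^{2} \tag{16.1.18} \\
& \geq 1 - \sum_{m\in\mathcal{M}} p(m)\, F\big(|m\rangle\!\langle m|_{\hat{M}} \otimes \sigma_{E},\ \mathcal{P}_{M'\to\hat{M}E}(|m\rangle\!\langle m|_{M'})\big) \tag{16.1.19} \\
& \geq \sum_{m\in\mathcal{M}} p(m) \left[1 - F\big(|m\rangle\!\langle m|_{\hat{M}},\ (\mathcal{D}_{B\to\hat{M}} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{M'\to A})(|m\rangle\!\langle m|_{M'})\big)\right] \tag{16.1.20} \\
& = \sum_{m\in\mathcal{M}} p(m) \left[1 - \langle m|_{\hat{M}} (\mathcal{D}_{B\to\hat{M}} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{M'\to A})(|m\rangle\!\langle m|_{M'}) |m\rangle_{\hat{M}}\right] \tag{16.1.21} \\
& = \sum_{m\in\mathcal{M}} p(m) \left[1 - \operatorname{Tr}[\Lambda^{m}_{B}\, \mathcal{N}_{A\to B}(\rho^{m}_{A})]\right]. \tag{16.1.22}
\end{align}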

The second inequality follows from convexity of the square function and the third
from the data-processing inequality for fidelity. The latter expression is the same as
the average error probability from (12.1.13). Now applying the data-processing
inequality with respect to partial trace over $\hat{M}$, we conclude that

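\begin{align}
\varepsilon & \geq p_{\operatorname{err}}(\mathcal{E}, \mathcal{D}; p, \mathcal{N}) \tag{16.1.23} \\
& = 1 - F\big(\Phi^{p}_{M\hat{M}} \otimes \sigma_{E},\ (\mathcal{D}_{B\to\hat{M}} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE} \circ \mathcal{E}_{M'\to A})(\Phi^{p}_{MM'})\big) \tag{16.1.24} \\
& \geq 1 - F\big(\operatorname{Tr}_{\hat{M}}[\Phi^{p}_{M\hat{M}} \otimes \sigma_{E}],\ \operatorname{Tr}_{\hat{M}}[(\mathcal{D}_{B\to\hat{M}} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE} \circ \mathcal{E}_{M'\to A})(\Phi^{p}_{MM'})]\big) \tag{16.1.25} \\
& = 1 - F\big(\pi^{p}_{M} \otimes \sigma_{E},\ (\mathcal{N}^{c}_{A\to E} \circ \mathcal{E}_{M'\to A})(\Phi^{p}_{MM'})\big) \tag{16.1.26} \\
& = 1 - \left(\sum_{m\in\mathcal{M}} p(m) \sqrt{F}\big(\sigma_{E},\ \mathcal{N}^{c}_{A\to E}(\rho^{m}_{A})\big)\right)^{2}, \tag{16.1.27}
\end{align}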

which indicates that the state of Eve’s system 𝐸 is close to the constant state 𝜎𝐸 on
average. In the above, N𝑐𝐴→𝐸 is a complementary channel of N 𝐴→𝐵 , as defined in
Section 4.3.2, and is given by $\mathcal{N}^{c}_{A\to E} = \operatorname{Tr}_{B} \circ\, \mathcal{U}^{\mathcal{N}}_{A\to BE}$. Also, in the last line above,
we employed Proposition 7.31 with $\alpha = \frac{1}{2}$.
The interpretation of the maximum infidelity obeying the constraint

𝑝 ∗err (E, D; N) ≤ 𝜀 (16.1.28)

is similar. If this condition holds, then there exists a state 𝜎𝐸 of the eavesdropper’s
system 𝐸 such that the state of systems 𝑀ˆ and 𝐸 is close to the product state
|𝑚⟩⟨𝑚| 𝑀ˆ ⊗ 𝜎𝐸 , for every message 𝑚 ∈ M. So this is a much stronger constraint in
general and the one we aim to achieve for private communication. By applying
the data-processing inequality to 𝑝 ∗err (E, D; N) ≤ 𝜀 and letting 𝜎𝐸 be the state that
achieves 𝑝 ∗err (E, D; N), we conclude by similar reasoning as given above that

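\begin{equation}
\varepsilon \geq p^{*}_{\operatorname{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \geq \max_{m\in\mathcal{M}} \left[1 - \operatorname{Tr}[\Lambda^{m}_{B}\, \mathcal{N}_{A\to B}(\rho^{m}_{A})]\right], \tag{16.1.29}
\end{equation}
and
\begin{equation}
\varepsilon \geq p^{*}_{\operatorname{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \geq \max_{m\in\mathcal{M}} \left[1 - F\big(\sigma_{E},\ \mathcal{N}^{c}_{A\to E}(\rho^{m}_{A})\big)\right]. \tag{16.1.30}
\end{equation}
Thus, if $p^{*}_{\operatorname{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \leq \varepsilon$ holds, then Bob can reliably decode every message
$m \in \mathcal{M}$, in the sense that
\begin{equation}
\operatorname{Tr}[\Lambda^{m}_{B}\, \mathcal{N}_{A\to B}(\rho^{m}_{A})] \geq 1 - \varepsilon \quad \forall m \in \mathcal{M}, \tag{16.1.31}
\end{equation}
and Eve’s system $E$ is not useful for determining any of the messages, in the sense
that
\begin{equation}
F\big(\sigma_{E},\ \mathcal{N}^{c}_{A\to E}(\rho^{m}_{A})\big) \geq 1 - \varepsilon \quad \forall m \in \mathcal{M}. \tag{16.1.32}
\end{equation}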

These two different infidelity criteria can be used to assess the performance of a
protocol, i.e., how well Bob can decode the message and how secure it is from Eve.


Definition 16.1 (|M| , 𝜺) Private Communication Protocol

A private communication protocol (M, E 𝑀 ′ →𝐴 , D𝐵→ 𝑀ˆ ) over the channel N 𝐴→𝐵
is called an (|M| , 𝜀) protocol, with 𝜀 ∈ [0, 1], if 𝑝 ∗err (E, D; N) ≤ 𝜀.

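Similar to the case of entanglement-assisted and unassisted classical communication,
the infidelity criterion $p^{*}_{\operatorname{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \leq \varepsilon$ is equivalent to
\begin{equation}
\inf_{\sigma_{E}} \max_{p:\mathcal{M}\to[0,1]} \left[1 - F\big(\Phi^{p}_{M\hat{M}} \otimes \sigma_{E},\ (\mathcal{D}_{B\to\hat{M}} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE} \circ \mathcal{E}_{M'\to A})(\Phi^{p}_{MM'})\big)\right] \leq \varepsilon, \tag{16.1.33}
\end{equation}
where the optimization is over every probability distribution $p(m)$ for the messages
in $\mathcal{M}$. This follows because
\begin{multline}
F\big(\Phi^{p}_{M\hat{M}} \otimes \sigma_{E},\ (\mathcal{D}_{B\to\hat{M}} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE} \circ \mathcal{E}_{M'\to A})(\Phi^{p}_{MM'})\big) = \\
\left[\sum_{m\in\mathcal{M}} p(m) \sqrt{F}\big(|m\rangle\!\langle m|_{\hat{M}} \otimes \sigma_{E},\ (\mathcal{D}_{B\to\hat{M}} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE} \circ \mathcal{E}_{M'\to A})(|m\rangle\!\langle m|_{M'})\big)\right]^{2}, \tag{16.1.34}
\end{multline}
as a consequence of Proposition 7.31 with $\alpha = \frac{1}{2}$, and then one can employ
arguments similar to those in (11.1.25)–(11.1.34) to conclude (16.1.33).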
The one-shot private capacity of the channel N is equal to the maximum number
of private bits that can be transmitted for a fixed infidelity threshold 𝜀:

Definition 16.2 One-Shot Private Capacity of a Quantum Channel

Given a quantum channel N 𝐴→𝐵 and 𝜀 ∈ [0, 1], the one-shot 𝜀-error private
capacity of N, denoted by 𝑃 𝜀 (N), is defined to be the maximum number
log2 |M| of private bits among all (|M| , 𝜀) private communication protocols
over N. In other words,

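\begin{equation}
P^{\varepsilon}(\mathcal{N}) \coloneqq \sup_{(\mathcal{M},\mathcal{E},\mathcal{D})} \left\{\log_2 |\mathcal{M}| : p^{*}_{\operatorname{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \leq \varepsilon\right\}, \tag{16.1.35}
\end{equation}
where the optimization is over all protocols $(\mathcal{M}, \mathcal{E}_{M'\to A}, \mathcal{D}_{B\to\hat{M}})$ satisfying
$d_{M'} = d_{\hat{M}} = |\mathcal{M}|$.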


16.1.1 Private Communication and Quantum Communication

This subsection establishes that a quantum communication protocol can always

be converted to one for private communication, such that there is negligible loss
with respect to code parameters. This result then implies an inequality relating the
one-shot quantum capacity to the one-shot private capacity.

Proposition 16.3
The existence of an (𝑀, 𝜀) quantum communication protocol for a quan-
tum channel N 𝐴→𝐵 implies the existence of an (⌊𝑀/2⌋ , min{1, 2𝜀}) private
communication protocol for N 𝐴→𝐵 .

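Proof: Starting from an $(M, \varepsilon)$ quantum communication protocol, we can use it
to transmit one share of a maximally entangled state
\begin{equation}
\Phi_{RS} \coloneqq \frac{1}{M} \sum_{m,m'=1}^{M} |m\rangle\!\langle m'|_{R} \otimes |m\rangle\!\langle m'|_{S} \tag{16.1.36}
\end{equation}
of Schmidt rank $M$ faithfully, by definition (see Definition 14.1):
\begin{equation}
F\big(\Phi_{RS},\ (\mathcal{D}_{B\to S} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{S'\to A})(\Phi_{RS'})\big) \geq 1 - \varepsilon. \tag{16.1.37}
\end{equation}
Consider that the state
\begin{equation}
\sigma_{RSE} \coloneqq (\mathcal{D}_{B\to S} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE} \circ \mathcal{E}_{S'\to A})(\Phi_{RS'}) \tag{16.1.38}
\end{equation}
extends the state output from the actual protocol. By Uhlmann’s theorem (Theorem 6.8),
there exists an extension of $\Phi_{RS}$ such that the fidelity between this
extension and the state $\sigma_{RSE}$ is equal to the fidelity in (16.1.37). However, the
maximally entangled state $\Phi_{RS}$ is unextendible in the sense that the only possible
extension is a tensor-product state $\Phi_{RS} \otimes \omega_{E}$ for some state $\omega_{E}$. So, putting these
statements together, we find that
\begin{equation}
F\big(\Phi_{RS} \otimes \omega_{E},\ (\mathcal{D}_{B\to S} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE} \circ \mathcal{E}_{S'\to A})(\Phi_{RS'})\big) \geq 1 - \varepsilon. \tag{16.1.39}
\end{equation}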

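Furthermore, measuring the $R$ and $S$ systems locally in the Schmidt basis of $\Phi_{RS}$
only increases the fidelity, so that
\begin{equation}
F\big(\overline{\Phi}_{RS} \otimes \omega_{E},\ (\overline{\mathcal{D}}_{B\to S} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE} \circ \mathcal{E}_{S'\to A})(\overline{\Phi}_{RS'})\big) \geq 1 - \varepsilon, \tag{16.1.40}
\end{equation}
where $\overline{\Phi}_{RS} \coloneqq \frac{1}{M}\sum_{m=1}^{M} |m\rangle\!\langle m|_{R} \otimes |m\rangle\!\langle m|_{S}$ is the state that results from measuring $\Phi_{RS}$ in this way, and $\overline{\mathcal{D}}_{B\to S}$ denotes the concatenation of the original decoder $\mathcal{D}_{B\to S}$ followed by
the local measurement:
\begin{align}
\overline{\mathcal{D}}_{B\to S}(\cdot) & \coloneqq \sum_{m} |m\rangle\!\langle m|\, \mathcal{D}_{B\to S}(\cdot)\, |m\rangle\!\langle m| \tag{16.1.41} \\
& = \sum_{m} \operatorname{Tr}[(\mathcal{D}_{B\to S})^{\dagger}[|m\rangle\!\langle m|](\cdot)]\, |m\rangle\!\langle m|_{S}. \tag{16.1.42}
\end{align}
Observe that $\{(\mathcal{D}_{B\to S})^{\dagger}[|m\rangle\!\langle m|]\}_{m}$ is a valid POVM. Using the direct-sum property
of the fidelity (Proposition 7.31 with $\alpha = \frac{1}{2}$) and defining $\rho^{m}_{A} \coloneqq \mathcal{E}_{S'\to A}(|m\rangle\!\langle m|_{S'})$,
we can then rewrite this as
\begin{equation}
\left(\frac{1}{M} \sum_{m=1}^{M} \sqrt{F}\big(|m\rangle\!\langle m|_{S} \otimes \omega_{E},\ (\overline{\mathcal{D}}_{B\to S} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE})(\rho^{m}_{A})\big)\right)^{2} \geq 1 - \varepsilon. \tag{16.1.43}
\end{equation}
We can in turn rewrite this inequality as
\begin{equation}
\frac{1}{M} \sum_{m=1}^{M} \sqrt{F}\big(|m\rangle\!\langle m|_{S} \otimes \omega_{E},\ (\overline{\mathcal{D}}_{B\to S} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE})(\rho^{m}_{A})\big) \geq \sqrt{1 - \varepsilon} \tag{16.1.44}
\end{equation}
and again as
\begin{equation}
\frac{1}{M} \sum_{m=1}^{M} \left[1 - \sqrt{F}\big(|m\rangle\!\langle m|_{S} \otimes \omega_{E},\ (\overline{\mathcal{D}}_{B\to S} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE})(\rho^{m}_{A})\big)\right] \leq 1 - \sqrt{1 - \varepsilon}. \tag{16.1.45}
\end{equation}
Markov’s inequality then guarantees that there exists a subset $\mathcal{M}'$ of the set
$\{1, \ldots, M\}$ of size $\lfloor M/2 \rfloor$ such that the following condition holds for all $m \in \mathcal{M}'$:
\begin{equation}
1 - \sqrt{F}\big(|m\rangle\!\langle m|_{S} \otimes \omega_{E},\ (\overline{\mathcal{D}}_{B\to S} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE})(\rho^{m}_{A})\big) \leq 2\left(1 - \sqrt{1 - \varepsilon}\right). \tag{16.1.46}
\end{equation}
We can rewrite this condition as
\begin{align}
F\big(|m\rangle\!\langle m|_{S} \otimes \omega_{E},\ (\overline{\mathcal{D}}_{B\to S} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE})(\rho^{m}_{A})\big)
& \geq \left(1 - 2\left(1 - \sqrt{1 - \varepsilon}\right)\right)^{2} \tag{16.1.47} \\
& = \left(1 - 2\sqrt{1 - \varepsilon}\right)^{2} \tag{16.1.48} \\
& \geq 1 - 2\varepsilon. \tag{16.1.49}
\end{align}
The squaring step in (16.1.47) requires $\sqrt{1-\varepsilon} \geq \frac{1}{2}$; for $\varepsilon > \frac{1}{2}$, the final bound is trivial because $1 - 2\varepsilon < 0$.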
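The expurgation step based on Markov's inequality, together with the elementary inequality behind (16.1.47)–(16.1.49), can be sketched as follows. This is an illustrative example only: the function name and the randomly generated per-message values are assumptions, and the list entries play the role of one minus the root fidelity for each message.

```python
import math
import random

def expurgate(root_infidelities):
    """Keep the floor(M/2) messages with the smallest values of 1 - sqrt(F)."""
    num = len(root_infidelities)
    order = sorted(range(num), key=lambda m: root_infidelities[m])
    return order[: num // 2]

random.seed(0)
values = [random.random() * 0.1 for _ in range(11)]  # hypothetical per-message values
kept = expurgate(values)
mean = sum(values) / len(values)

# Markov's inequality: every kept message is within twice the average.
print(all(values[m] <= 2 * mean for m in kept))

# Elementary inequality used in (16.1.47)-(16.1.49), checked on a grid with
# sqrt(1 - eps) >= 1/2 so that squaring is legitimate.
grid = [i / 1000 for i in range(0, 751)]
print(all((1 - 2 * (1 - math.sqrt(1 - e))) ** 2 >= 1 - 2 * e - 1e-12 for e in grid))
```

Sorting and keeping the better half is one concrete way of choosing the subset whose existence Markov's inequality guarantees; any subset of that size with values at most twice the average would serve equally well.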

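We now define the private communication protocol to consist of codewords
$\{\rho^{m}_{A} \coloneqq \mathcal{E}_{S'\to A}(|m\rangle\!\langle m|_{S'})\}_{m\in\mathcal{M}'}$ and the decoding POVM to be
\begin{equation}
\left\{\Lambda^{m}_{B} \equiv (\mathcal{D}_{B\to S})^{\dagger}(|m\rangle\!\langle m|)\right\}_{m\in\mathcal{M}'} \cup \left\{\Lambda^{0}_{B} \coloneqq (\mathcal{D}_{B\to S})^{\dagger}\!\left(\sum_{m\notin\mathcal{M}'} |m\rangle\!\langle m|\right)\right\}. \tag{16.1.50}
\end{equation}
Thus, we have shown that from an $(M, \varepsilon)$ quantum communication protocol, one
can realize an $(\lfloor M/2 \rfloor, 2\varepsilon)$ protocol for private communication. ■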

Proposition 16.3 then implies the following for the one-shot capacities:

Theorem 16.4
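For a quantum channel $\mathcal{N}_{A\to B}$ and $\varepsilon \in (0, 1)$, the following inequality relates
the one-shot quantum capacity $Q^{\frac{\varepsilon}{2}}(\mathcal{N})$ to the one-shot private capacity $P^{\varepsilon}(\mathcal{N})$:
\begin{equation}
Q^{\frac{\varepsilon}{2}}(\mathcal{N}) \leq P^{\varepsilon}(\mathcal{N}) + 1. \tag{16.1.51}
\end{equation}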

Proof: Given an arbitrary $(M, \varepsilon/2)$ quantum communication protocol, by Proposition 16.3,
we can realize an $(M/2, \varepsilon)$ private communication protocol.
Letting the quantum communication protocol be one that achieves the one-shot quantum capacity $Q^{\frac{\varepsilon}{2}}(\mathcal{N})$
(i.e., $\log_2 M = Q^{\frac{\varepsilon}{2}}(\mathcal{N})$), we conclude that there exists an $(M/2, \varepsilon)$ private
communication protocol. Since this is a particular $(M/2, \varepsilon)$ private communication
protocol, we conclude that
\begin{equation}
\log_2(M/2) \leq P^{\varepsilon}(\mathcal{N}), \tag{16.1.52}
\end{equation}
which follows from the definition of the one-shot private capacity $P^{\varepsilon}(\mathcal{N})$. We
finally use the fact that $\log_2(M/2) = Q^{\frac{\varepsilon}{2}}(\mathcal{N}) - 1$. ■

16.1.2 Secret-Key Transmission and Bipartite Private-State Transmission

In this section, we establish a connection between secret-key transmission and

bipartite private-state transmission. Before doing so, we first define what is meant
by a bipartite private-state transmission protocol. To do so, we proceed in the same spirit as
in Section 15.1.1: we purify each step of a secret-key transmission protocol and

trace out the system possessed by the eavesdropper Eve. In this case, it is only the
environment 𝐸 of the isometric channel UN 𝐴→𝐵𝐸 that belongs to the eavesdropper,
and tracing it out leads to the original channel N 𝐴→𝐵 .
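A bipartite private-state transmission protocol is defined by the triple
\begin{equation}
(\mathcal{M},\ \mathcal{U}^{\mathcal{E}}_{M'\to AA'},\ \mathcal{U}^{\mathcal{D}}_{B\to\hat{M}B'}), \tag{16.1.53}
\end{equation}
where $\mathcal{M}$ is a message set, $\mathcal{U}^{\mathcal{E}}_{M'\to AA'}$ is an isometric encoding channel, and $\mathcal{U}^{\mathcal{D}}_{B\to\hat{M}B'}$
is an isometric decoding channel. The protocol begins with Alice preparing a GHZ
state $\Phi_{M''MM'}$ of the following form:
\begin{equation}
\Phi_{M''MM'} \coloneqq |\Phi\rangle\!\langle\Phi|_{M''MM'}, \tag{16.1.54}
\end{equation}
where
\begin{equation}
|\Phi\rangle_{M''MM'} \coloneqq \frac{1}{\sqrt{|\mathcal{M}|}} \sum_{m\in\mathcal{M}} |m\rangle_{M''} |m\rangle_{M} |m\rangle_{M'}. \tag{16.1.55}
\end{equation}
She transmits the $M'$ system through the isometric encoding channel $\mathcal{U}^{\mathcal{E}}_{M'\to AA'}$,
leading to the state $\mathcal{U}^{\mathcal{E}}_{M'\to AA'}(\Phi_{M''MM'})$. She transmits the $A$ system through the
channel $\mathcal{N}_{A\to B}$, leading to the state
\begin{equation}
\mathcal{N}_{A\to B}(\mathcal{U}^{\mathcal{E}}_{M'\to AA'}(\Phi_{M''MM'})). \tag{16.1.56}
\end{equation}
Bob finally performs the isometric decoding channel $\mathcal{U}^{\mathcal{D}}_{B\to\hat{M}B'}$. The final state of
the protocol is then as follows:
\begin{equation}
\omega_{M''MA'\hat{M}B'} \coloneqq (\mathcal{U}^{\mathcal{D}}_{B\to\hat{M}B'} \circ \mathcal{N}_{A\to B} \circ \mathcal{U}^{\mathcal{E}}_{M'\to AA'})(\Phi_{M''MM'}), \tag{16.1.57}
\end{equation}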

where the systems 𝑀 ′′ 𝑀 𝐴′ are in possession of Alice and systems 𝑀ˆ 𝐵′ are in

possession of Bob.
Observe that each step of the protocol involves a purification of the steps in
a secret-key transmission protocol, as outlined in Section 16.1. The initial GHZ
state is a purification of the maximally classically correlated state $\Phi_{MM'}$. The
isometric encoding channel $\mathcal{U}^{\mathcal{E}}_{M'\to AA'}$ purifies the encoding channel $\mathcal{E}_{M'\to A}$, and
the isometric decoding channel $\mathcal{U}^{\mathcal{D}}_{B\to\hat{M}B'}$ purifies the decoding channel $\mathcal{D}_{B\to\hat{M}}$.
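The claim that the GHZ state purifies the maximally classically correlated state can be verified directly for a small message set. The following sketch is an illustration only; the helper names are assumptions, and the message-set size of 3 is arbitrary.

```python
import numpy as np

def ghz_state(d):
    """Density matrix of the GHZ state on three d-dimensional systems M'' M M'."""
    ket = np.zeros(d ** 3)
    for m in range(d):
        ket[m * d * d + m * d + m] = 1 / np.sqrt(d)
    return np.outer(ket, ket)

def trace_out_first(rho, d_first, d_rest):
    """Partial trace over the first tensor factor."""
    return np.einsum("aiaj->ij", rho.reshape(d_first, d_rest, d_first, d_rest))

def classically_correlated_state(d):
    """(1/d) sum_m |m><m| (x) |m><m| on two d-dimensional systems."""
    rho = np.zeros((d * d, d * d))
    for m in range(d):
        rho[m * d + m, m * d + m] = 1 / d
    return rho

d = 3
ghz = ghz_state(d)
reduced = trace_out_first(ghz, d, d * d)   # trace out M''
print(np.allclose(reduced, classically_correlated_state(d)))
print(np.isclose(np.trace(ghz @ ghz), 1.0))  # the GHZ state is pure
```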
The infidelity of a bipartite private-state transmission protocol of the form above
is then defined as follows:
\begin{equation}
p^{b}_{\operatorname{err}}(\mathcal{U}^{\mathcal{E}}, \mathcal{U}^{\mathcal{D}}; \mathcal{N}) \coloneqq \inf_{\gamma_{M''MA'\hat{M}B'}} \left[1 - F(\gamma_{M''MA'\hat{M}B'},\ \omega_{M''MA'\hat{M}B'})\right], \tag{16.1.58}
\end{equation}
where the optimization is with respect to every ideal bipartite private state
$\gamma_{M''MA'\hat{M}B'}$, with key system $M$ held by Alice, shield systems $M''A'$ by Alice, key
system $\hat{M}$ by Bob, and shield system $B'$ by Bob (see Section 15.1.1).

Definition 16.5 (|M| , 𝜺) Private-State Transmission Protocol

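A bipartite private-state transmission protocol $(\mathcal{M}, \mathcal{U}^{\mathcal{E}}_{M'\to AA'}, \mathcal{U}^{\mathcal{D}}_{B\to\hat{M}B'})$ for
the channel $\mathcal{N}_{A\to B}$ is called an $(|\mathcal{M}|, \varepsilon)$ protocol, with $\varepsilon \in [0, 1]$, if
$p^{b}_{\operatorname{err}}(\mathcal{U}^{\mathcal{E}}, \mathcal{U}^{\mathcal{D}}; \mathcal{N}) \leq \varepsilon$.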

We now establish the main result of this section, which is the equivalence of
secret-key transmission and bipartite private-state transmission:

Theorem 16.6
Let M be a message set, and let 𝜀 ∈ [0, 1]. Let N 𝐴→𝐵 be a quantum channel.
There exists an (|M| , 𝜀) secret-key transmission protocol for N 𝐴→𝐵 if and
only if there exists an (|M| , 𝜀) bipartite private-state transmission protocol for
N 𝐴→𝐵 .

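Proof: We start by proving that there exists an $(|\mathcal{M}|, \varepsilon)$ bipartite private-state
transmission protocol if there exists an $(|\mathcal{M}|, \varepsilon)$ secret-key transmission protocol.
Let $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ be an isometric channel extending $\mathcal{N}_{A\to B}$. Let $\mathcal{E}_{M'\to A}$ be the encoding
channel, and let $\mathcal{D}_{B\to\hat{M}}$ be the decoding channel. The final state of the protocol is
as follows:
\begin{equation}
\omega_{M\hat{M}E} \coloneqq (\mathcal{D}_{B\to\hat{M}} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE} \circ \mathcal{E}_{M'\to A})(\Phi_{MM'}) \tag{16.1.59}
\end{equation}
and satisfies the inequality
\begin{equation}
1 - F(\Phi_{M\hat{M}} \otimes \sigma_{E},\ \omega_{M\hat{M}E}) \leq \varepsilon \tag{16.1.60}
\end{equation}
for some state $\sigma_{E}$. Observe that $\Phi_{M\hat{M}} \otimes \sigma_{E}$ is an ideal tripartite key state, according to Definition 15.1.
Now let $\mathcal{U}^{\mathcal{E}}_{M'\to AA'}$ be an isometric channel that extends the encoding channel $\mathcal{E}_{M'\to A}$,
and let $\mathcal{U}^{\mathcal{D}}_{B\to\hat{M}B'}$ be an isometric channel that extends the decoding channel $\mathcal{D}_{B\to\hat{M}}$.
Let $\Phi_{M''MM'}$ be a GHZ state that purifies $\Phi_{MM'}$. Then the following state is a
purification of $\omega_{M\hat{M}E}$:
\begin{equation}
\omega_{MM''A'\hat{M}B'E} \coloneqq (\mathcal{U}^{\mathcal{D}}_{B\to\hat{M}B'} \circ \mathcal{U}^{\mathcal{N}}_{A\to BE} \circ \mathcal{U}^{\mathcal{E}}_{M'\to AA'})(\Phi_{M''MM'}). \tag{16.1.61}
\end{equation}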
Applying Uhlmann’s theorem, we conclude that there is a purification of $\Phi_{M\hat{M}} \otimes \sigma_{E}$,
call it $\gamma_{MM''A'\hat{M}B'E}$, such that
\begin{equation}
F(\Phi_{M\hat{M}} \otimes \sigma_{E},\ \omega_{M\hat{M}E}) = F(\gamma_{MM''A'\hat{M}B'E},\ \omega_{MM''A'\hat{M}B'E}). \tag{16.1.62}
\end{equation}
Tracing over the $E$ system, we conclude from the data-processing inequality for
fidelity that
\begin{equation}
1 - F(\gamma_{MM''A'\hat{M}B'},\ \omega_{MM''A'\hat{M}B'}) \leq \varepsilon, \tag{16.1.63}
\end{equation}
where
\begin{equation}
\omega_{MM''A'\hat{M}B'} = (\mathcal{U}^{\mathcal{D}}_{B\to\hat{M}B'} \circ \mathcal{N}_{A\to B} \circ \mathcal{U}^{\mathcal{E}}_{M'\to AA'})(\Phi_{M''MM'}). \tag{16.1.64}
\end{equation}
Furthermore, by Definition 15.4, the state $\gamma_{MM''A'\hat{M}B'}$ is an ideal bipartite private
state with key system $M$ held by Alice, shield systems $M''A'$ held by Alice, key
system $\hat{M}$ held by Bob, and shield system $B'$ held by Bob. Thus, we have shown
the first claim.
Now we establish the opposite implication (which follows essentially by running
the argument above backwards). To this end, let UE𝑀 ′ →𝐴𝐴′ be an isometric encoding
channel, and let $\mathcal{U}^{\mathcal{D}}_{B\to\hat{M}B'}$ be an isometric decoding channel for a bipartite private-state
transmission protocol. The initial state of the protocol is the GHZ state
Φ 𝑀 ′′ 𝑀 𝑀 ′ , and the final state is 𝜔 𝑀 𝑀 ′′ 𝐴′ 𝑀ˆ 𝐵′ , as given in (16.1.64). For an (|M| , 𝜀)
bipartite private-state transmission protocol, the following inequality holds

1 − 𝐹 (𝛾 𝑀 𝑀 ′′ 𝐴′ 𝑀ˆ 𝐵′ , 𝜔 𝑀 𝑀 ′′ 𝐴′ 𝑀ˆ 𝐵′ ) ≤ 𝜀, (16.1.65)

where 𝛾 𝑀 𝑀 ′′ 𝐴′ 𝑀ˆ 𝐵′ is an ideal bipartite private state. The state 𝜔 𝑀 𝑀 ′′ 𝐴′ 𝑀ˆ 𝐵′ 𝐸 in

(16.1.61) is a purification of 𝜔 𝑀 𝑀 ′′ 𝐴′ 𝑀ˆ 𝐵′ , and by Uhlmann’s theorem, there exists
a purification 𝛾 𝑀 𝑀 ′′ 𝐴′ 𝑀ˆ 𝐵′ 𝐸 of 𝛾 𝑀 𝑀 ′′ 𝐴′ 𝑀ˆ 𝐵′ such that

𝐹 (𝛾 𝑀 𝑀 ′′ 𝐴′ 𝑀ˆ 𝐵′ , 𝜔 𝑀 𝑀 ′′ 𝐴′ 𝑀ˆ 𝐵′ ) = 𝐹 (𝛾 𝑀 𝑀 ′′ 𝐴′ 𝑀ˆ 𝐵′ 𝐸 , 𝜔 𝑀 𝑀 ′′ 𝐴′ 𝑀ˆ 𝐵′ 𝐸 ). (16.1.66)

Tracing over the systems 𝑀 ′′, 𝐴′, and 𝐵′, the following inequality holds

1 − 𝐹 (𝛾 𝑀 𝑀ˆ 𝐸 , 𝜔 𝑀 𝑀ˆ 𝐸 ) ≤ 𝜀. (16.1.67)

By the definition of an ideal private state (see Definition 15.4) and since the state
𝛾 𝑀 𝑀 ′′ 𝐴′ 𝑀ˆ 𝐵′ is an ideal bipartite private state, it follows that 𝛾 𝑀 𝑀ˆ 𝐸 is an ideal
tripartite key state. Thus, we have proven the second claim. ■


16.1.3 Upper Bounds on the Number of Transmitted Private Bits

We now establish some general upper bounds on the number of private bits
that can be communicated in an arbitrary private communication protocol. The
results are stated in Proposition 16.7 and Theorems 16.9 and 16.11, and, like the
upper bounds established in previous chapters, they hold independently of the
encoding and decoding channels used in the protocol and depend only on the
given communication channel N. The first upper bound is in terms of the one-shot
private information of the channel, and the others are in terms of the channel’s
𝜀-relative entropy of entanglement and squashed entanglement.

16.1.3.1 Private Information Upper Bound

Proposition 16.7 Upper Bound on One-Shot Private Capacity

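Let $\mathcal{N}_{A\to B}$ be a quantum channel. For every $(|\mathcal{M}|, \varepsilon)$ private communication
protocol over $\mathcal{N}$, with $\varepsilon \in [0, 1]$, the number of private bits transmitted over $\mathcal{N}$
is bounded from above by the one-shot private information of $\mathcal{N}$:
\begin{equation}
\log_2 |\mathcal{M}| \leq \sup_{\{p(x),\rho^{x}_{A}\}_{x\in\mathcal{X}}} \left[I_{H}^{\varepsilon}(X;B)_{\rho} - I_{\max}^{\sqrt{\varepsilon}}(X;E)_{\rho}\right], \tag{16.1.68}
\end{equation}
where the optimization is over every ensemble $\{p(x), \rho^{x}_{A}\}_{x\in\mathcal{X}}$ and the state
$\rho_{XBE}$ is given by
\begin{equation}
\rho_{XBE} \coloneqq \sum_{x\in\mathcal{X}} p(x)\, |x\rangle\!\langle x|_{X} \otimes \mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho^{x}_{A}), \tag{16.1.69}
\end{equation}
with $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ an isometric channel extending $\mathcal{N}_{A\to B}$. The hypothesis testing
mutual information $I_{H}^{\varepsilon}(X;B)_{\rho}$ is defined in (7.11.88) and the smooth max-mutual
information $I_{\max}^{\sqrt{\varepsilon}}(X;E)_{\rho}$ in (15.1.59). Therefore,
\begin{equation}
P^{\varepsilon}(\mathcal{N}) \leq \sup_{\{p(x),\rho^{x}_{A}\}_{x\in\mathcal{X}}} \left[I_{H}^{\varepsilon}(X;B)_{\rho} - I_{\max}^{\sqrt{\varepsilon}}(X;E)_{\rho}\right]. \tag{16.1.70}
\end{equation}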

Proof: The proof has some similarities with the proof of Lemma 15.10. Since

𝑝 ∗err (E, D; N) ≥ 𝑝 err (E, D; 𝑝, N) for every probability distribution 𝑝(𝑚) over the
messages, it follows by definition that

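\begin{equation}
P^{\varepsilon}(\mathcal{N}) \leq \sup_{(\mathcal{M},\mathcal{E},\mathcal{D})} \left\{\log_2 |\mathcal{M}| : p_{\operatorname{err}}(\mathcal{E}, \mathcal{D}; p, \mathcal{N}) \leq \varepsilon\right\}, \tag{16.1.71}
\end{equation}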

with 𝑝 set to the uniform distribution over messages. So we bound the right-hand
side instead (note that it is equal to the one-shot secret-key transmission capacity).
Let (M, E 𝑀 ′ →𝐴 , D𝐵→ 𝑀ˆ ) be an arbitrary private communication protocol. By the
reasoning in (16.1.17)–(16.1.22), it follows that
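\begin{equation}
\frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} \operatorname{Tr}[\Lambda^{m}_{B}\, \mathcal{N}_{A\to B}(\rho^{m}_{A})] \geq 1 - \varepsilon. \tag{16.1.72}
\end{equation}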

By the same reasoning given in the proof of Proposition 12.3, we conclude that

log2 |M| ≤ 𝐼 𝐻𝜀 (𝑀; 𝐵)𝜏 , (16.1.73)

where the state 𝜏𝑀 𝐵𝐸 is defined as

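\begin{equation}
\tau_{MBE} \coloneqq \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} |m\rangle\!\langle m|_{M} \otimes \mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho^{m}_{A}). \tag{16.1.74}
\end{equation}
Observe that
\begin{align}
\tau_{MBE} & = \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} |m\rangle\!\langle m|_{M} \otimes (\mathcal{U}^{\mathcal{N}}_{A\to BE} \circ \mathcal{E}_{M'\to A})(|m\rangle\!\langle m|_{M'}) \tag{16.1.75} \\
& = (\mathcal{U}^{\mathcal{N}}_{A\to BE} \circ \mathcal{E}_{M'\to A})(\Phi_{MM'}). \tag{16.1.76}
\end{align}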

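From (16.1.23)–(16.1.27), we know that there exists a state $\sigma_{E}$ such that
\begin{align}
\varepsilon & \geq 1 - F\big(\pi_{M} \otimes \sigma_{E},\ (\mathcal{N}^{c}_{A\to E} \circ \mathcal{E}_{M'\to A})(\Phi_{MM'})\big) \notag \\
& = 1 - F(\tau_{M} \otimes \sigma_{E},\ \tau_{ME}), \notag
\end{align}
which, by applying the same reasoning as in (15.1.93)–(15.1.98), allows us to conclude
that
\begin{equation}
I_{\max}^{\sqrt{\varepsilon}}(M;E)_{\tau} \leq 0. \tag{16.1.77}
\end{equation}
Putting together (16.1.73) and (16.1.77) implies that
\begin{align}
\log_2 |\mathcal{M}| & \leq I_{H}^{\varepsilon}(M;B)_{\tau} - I_{\max}^{\sqrt{\varepsilon}}(M;E)_{\tau} \tag{16.1.78} \\
& \leq \sup_{\{p(x),\rho^{x}_{A}\}_{x\in\mathcal{X}}} \left[I_{H}^{\varepsilon}(X;B)_{\rho} - I_{\max}^{\sqrt{\varepsilon}}(X;E)_{\rho}\right], \tag{16.1.79}
\end{align}
where the final inequality follows by noting that $\left\{\frac{1}{|\mathcal{M}|}, \rho^{m}_{A}\right\}_{m\in\mathcal{M}}$ is a particular
input ensemble and the one-shot private information in the last line involves an
optimization over all input ensembles. ■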

As a consequence of the reasoning behind Proposition 16.7, along with (7.2.96),

Proposition 7.70, and (15.1.155), we obtain the following:

Corollary 16.8
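Let $\mathcal{N}_{A\to B}$ be a quantum channel, and let $\varepsilon \in [0, 1)$. For all $(|\mathcal{M}|, \varepsilon)$ private
communication protocols for $\mathcal{N}$, the following bound holds
\begin{equation}
\left(1 - \varepsilon - \sqrt{\varepsilon}\right) \log_2 |\mathcal{M}| \leq \sup_{\{p(x),\rho^{x}_{A}\}_{x\in\mathcal{X}}} \left[I(X;B)_{\rho} - I(X;E)_{\rho}\right] + h_2(\varepsilon) + 2g(\sqrt{\varepsilon}). \tag{16.1.80}
\end{equation}
Consequently, the following bound holds for the one-shot private capacity of a
channel $\mathcal{N}$:
\begin{equation}
\left(1 - \varepsilon - \sqrt{\varepsilon}\right) P^{\varepsilon}(\mathcal{N}) \leq \sup_{\{p(x),\rho^{x}_{A}\}_{x\in\mathcal{X}}} \left[I(X;B)_{\rho} - I(X;E)_{\rho}\right] + h_2(\varepsilon) + 2g(\sqrt{\varepsilon}). \tag{16.1.81}
\end{equation}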

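Proof: Employing the same reasoning that led to (16.1.73) and (16.1.77), consider
that the following bounds hold for a given $(|\mathcal{M}|, \varepsilon)$ private communication protocol:
\begin{align}
\log_2 |\mathcal{M}| & \leq I_{H}^{\varepsilon}(M;B)_{\tau}, \tag{16.1.82} \\
I_{\max}^{\sqrt{\varepsilon}}(M;E)_{\tau} & \leq 0, \tag{16.1.83}
\end{align}
where the state $\tau_{MBE}$ is defined in (16.1.74). Now we apply (7.2.96) and Proposition 7.70
to conclude that
\begin{equation}
I_{H}^{\varepsilon}(M;B)_{\tau} \leq \frac{1}{1-\varepsilon}\left(I(M;B)_{\tau} + h_2(\varepsilon)\right), \tag{16.1.84}
\end{equation}
which implies that
\begin{equation}
(1 - \varepsilon) \log_2 |\mathcal{M}| \leq I(M;B)_{\tau} + h_2(\varepsilon). \tag{16.1.85}
\end{equation}
Similarly, the same reasoning that led to (15.1.155) gives
\begin{equation}
I_{\max}^{\sqrt{\varepsilon}}(M;E)_{\tau} \geq I(M;E)_{\tau} - \sqrt{\varepsilon} \log_2 |\mathcal{M}| - 2g(\sqrt{\varepsilon}). \tag{16.1.86}
\end{equation}
Combining (16.1.83), (16.1.85), and (16.1.86), we conclude that
\begin{align}
\left(1 - \varepsilon - \sqrt{\varepsilon}\right) \log_2 |\mathcal{M}|
& \leq I(M;B)_{\tau} - I(M;E)_{\tau} + h_2(\varepsilon) + 2g(\sqrt{\varepsilon}) \tag{16.1.87} \\
& \leq \sup_{\{p(x),\rho^{x}_{A}\}_{x\in\mathcal{X}}} \left[I(X;B)_{\rho} - I(X;E)_{\rho}\right] + h_2(\varepsilon) + 2g(\sqrt{\varepsilon}), \tag{16.1.88}
\end{align}
where the last inequality follows by optimizing over all input ensembles. ■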
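To get a feel for how the correction terms in (16.1.80) behave, the following sketch solves the bound for the number of private bits, given a value for the information difference. Two things here are assumptions rather than statements from this section: the function g is taken to be the bosonic entropy function g(x) = (x + 1) log2(x + 1) − x log2(x), and the function name and sample inputs are hypothetical.

```python
import math

def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def g(x):
    """Assumed definition: g(x) = (x + 1) log2(x + 1) - x log2(x), with g(0) = 0."""
    if x == 0.0:
        return 0.0
    return (x + 1) * math.log2(x + 1) - x * math.log2(x)

def private_bits_upper_bound(info_gap, eps):
    """Upper bound on log2|M| implied by (16.1.80), given I(X;B) - I(X;E) = info_gap."""
    prefactor = 1 - eps - math.sqrt(eps)
    if prefactor <= 0:
        raise ValueError("the bound is only informative when 1 - eps - sqrt(eps) > 0")
    return (info_gap + h2(eps) + 2 * g(math.sqrt(eps))) / prefactor

for eps in (0.0, 0.001, 0.01, 0.1):
    print(eps, private_bits_upper_bound(1.0, eps))
```

The prefactor is positive only for ε below roughly 0.38, so the bound says nothing for larger error thresholds; as ε tends to zero, the right-hand side approaches the information difference itself.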

16.1.3.2 Relative Entropy of Entanglement Upper Bound

We now consider an upper bound based on the channel’s relative entropy of

entanglement. In order to do so, we exploit the equivalence between secret-key
transmission and bipartite private-state transmission established in Section 16.1.2.
We also make use of Proposition 15.15, which gives an upper bound on the number
log2 |M| of private bits contained in the final state of a bipartite private-state
transmission protocol.
In the previous chapter on secret key distillation, our approach to obtaining
upper bounds on distillable key consisted of 1) establishing a connection between a
tripartite key distillation protocol and a bipartite private state distillation protocol
(see Section 15.1.2) and 2) comparing the state at the output of a bipartite private-
state distillation protocol with one that is useless for this task. We considered
the set of separable states as the useless set, and we proved that certain state
entanglement measures are upper bounds on distillable key in the one-shot and
asymptotic settings.
In Section 16.1.2, we established a similar correspondence between secret-key
transmission, as defined in Section 16.1, and bipartite private-state transmission.
Here, we observe that bipartite private-state transmission is similar to bipartite
private-state distillation in the sense that, like private-state distillation, the error
criterion for private-state transmission involves comparing the output state to an

ideal private state (see Definition 16.5). This suggests that the state entanglement
measures defined in Section 9.2 are relevant (in particular the result of Proposi-
tion 15.15). However, as was the case for quantum communication in Chapter 14,
the main resource that we are considering in this chapter is a quantum channel and
not a quantum state, and so we have an extra degree of freedom in the input state to
the channel, which we can optimize. This suggests that the channel entanglement
measures from Chapter 10 are relevant, and it is indeed what we find.

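Theorem 16.9 Relative Entropy of Entanglement Upper Bound on One-Shot Private Capacity
Let $\mathcal{N}_{A\to B}$ be a quantum channel, and let $\varepsilon \in [0,1)$. For all $(|\mathcal{M}|, \varepsilon)$ private communication protocols for $\mathcal{N}$, the following bound holds
$$
\log_2 |\mathcal{M}| \leq E_R^{\varepsilon}(\mathcal{N}), \tag{16.1.89}
$$
where $E_R^{\varepsilon}(\mathcal{N})$ is the $\varepsilon$-relative entropy of entanglement of $\mathcal{N}$, defined in (10.3.8) as
$$
E_R^{\varepsilon}(\mathcal{N}) := \sup_{\psi_{SA}} \inf_{\sigma_{SB} \in \operatorname{SEP}(S:B)} D_H^{\varepsilon}(\mathcal{N}_{A\to B}(\psi_{SA}) \| \sigma_{SB}). \tag{16.1.90}
$$
Consequently, we have the following bound on the one-shot private capacity:
$$
P^{\varepsilon}(\mathcal{N}) \leq E_R^{\varepsilon}(\mathcal{N}). \tag{16.1.91}
$$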

Proof: By definition, it follows that the one-shot secret-key transmission capacity

is an upper bound on 𝑃 𝜀 (N). Applying Theorem 16.6, we conclude that the
one-shot bipartite private-state transmission capacity is equal to the one-shot
secret-key transmission capacity. So let us bound the one-shot bipartite private-
state transmission capacity. Consider an arbitrary $(M, \varepsilon)$ bipartite private-state transmission protocol. The final state of such a protocol satisfies the condition in Definition 16.5, which means that there is an ideal private state $\gamma_{M'' M A' \hat{M} B'}$ such that
$$
1 - F(\gamma_{M'' M A' \hat{M} B'}, \omega_{M'' M A' \hat{M} B'}) \leq \varepsilon. \tag{16.1.92}
$$
As such, Proposition 15.15 applies, and we conclude that
$$
\begin{aligned}
\log_2 |\mathcal{M}| &\leq E_R^{\varepsilon}(M'' M A'; \hat{M} B')_\omega && (16.1.93) \\
&\leq E_R^{\varepsilon}(M'' M A'; B)_\rho && (16.1.94) \\
&\leq E_R^{\varepsilon}(\mathcal{N}). && (16.1.95)
\end{aligned}
$$
The second inequality follows from the data-processing inequality for $D_H^{\varepsilon}$ under the action of the isometric decoding channel $\mathcal{U}^{\mathcal{D}}_{B \to \hat{M} B'}$, where the state $\rho_{M'' M A' B}$ is defined as
$$
\rho_{M'' M A' B} := (\mathcal{N}_{A\to B} \circ \mathcal{U}^{\mathcal{E}}_{M' \to A A'})(\Phi_{M'' M M'}). \tag{16.1.96}
$$
The systems $M'' M A'$ extend the system $A$ of the state $\mathcal{U}^{\mathcal{E}}_{M' \to A A'}(\Phi_{M'' M M'})$, with $A$ being the input to the channel $\mathcal{N}_{A\to B}$. As such, we can optimize over all such input states, and then conclude the final inequality above (here, we need to apply the remark after Definition 10.1 as well). ■

We then have the following bound as a direct application of Proposition 7.71:

Corollary 16.10
Let N 𝐴→𝐵 be a quantum channel, and let 𝜀 ∈ [0, 1). For all (|M| , 𝜀) private
communication protocols for N, the following bound holds for all 𝛼 > 1:

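$$
\log_2 |\mathcal{M}| \leq \widetilde{E}_\alpha(\mathcal{N}) + \frac{\alpha}{\alpha-1} \log_2\!\left( \frac{1}{1-\varepsilon} \right), \tag{16.1.97}
$$
where $\widetilde{E}_\alpha(\mathcal{N})$ is the sandwiched Rényi relative entropy of entanglement of $\mathcal{N}$, defined in (10.3.10) as
$$
\widetilde{E}_\alpha(\mathcal{N}) := \sup_{\psi_{SA}} \inf_{\sigma_{SB} \in \operatorname{SEP}(S:B)} \widetilde{D}_\alpha(\mathcal{N}_{A\to B}(\psi_{SA}) \| \sigma_{SB}). \tag{16.1.98}
$$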
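The right-hand side of (16.1.97) is simple to evaluate once a value of the sandwiched Rényi relative entropy of entanglement is known. The sketch below only illustrates the bookkeeping: the function name is hypothetical and the value of $\widetilde{E}_\alpha(\mathcal{N})$ must be supplied by the caller, since computing it involves an optimization that is not attempted here.

```python
import math


def renyi_converse_bound(e_alpha, alpha, eps):
    """Right-hand side of (16.1.97): E_alpha + alpha/(alpha-1) * log2(1/(1-eps)).

    e_alpha is the caller-supplied value of the sandwiched Renyi relative
    entropy of entanglement of the channel at order alpha.
    """
    if alpha <= 1:
        raise ValueError("the bound is stated for alpha > 1")
    if not 0 <= eps < 1:
        raise ValueError("eps must lie in [0, 1)")
    return e_alpha + alpha / (alpha - 1) * math.log2(1 / (1 - eps))
```

The penalty term vanishes at $\varepsilon = 0$ and grows without bound as $\alpha \to 1$ for fixed $\varepsilon > 0$, which is why the order $\alpha$ is kept as a free parameter in the bound.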

16.1.3.3 Squashed Entanglement Upper Bound

We now turn to squashed entanglement and establish it as an upper bound on one-shot private capacity. The reasoning behind this result is very similar to that given in the proof of Theorem 16.9, except that we employ Proposition 15.19 instead:


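Theorem 16.11 Squashed Entanglement Upper Bound on One-Shot Private Capacity
Let $\mathcal{N}_{A\to B}$ be a quantum channel, and let $\varepsilon \in [0,1)$. For all $(|\mathcal{M}|, \varepsilon)$ private communication protocols for $\mathcal{N}$, the following bound holds
$$
(1 - 2\sqrt{\varepsilon}) \log_2 |\mathcal{M}| \leq E_{\mathrm{sq}}(\mathcal{N}) + 2 g_2(\sqrt{\varepsilon}), \tag{16.1.99}
$$
where $E_{\mathrm{sq}}(\mathcal{N})$ is the squashed entanglement of the channel $\mathcal{N}$, given in Definition 10.14 as
$$
E_{\mathrm{sq}}(\mathcal{N}) := \sup_{\psi_{SA}} E_{\mathrm{sq}}(S;B)_\tau, \tag{16.1.100}
$$
and $\tau_{SB} := \mathcal{N}_{A\to B}(\psi_{SA})$. Consequently, we have the following bound on the one-shot private capacity:
$$
(1 - 2\sqrt{\varepsilon})\, P^{\varepsilon}(\mathcal{N}) \leq E_{\mathrm{sq}}(\mathcal{N}) + 2 g_2(\sqrt{\varepsilon}). \tag{16.1.101}
$$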

Proof: As indicated above, the argument is precisely the same as in the proof of Theorem 16.9, except that we apply the following bound from Proposition 15.19 instead:
$$
(1 - 2\sqrt{\varepsilon}) \log_2 |\mathcal{M}| \leq E_{\mathrm{sq}}(M'' M A'; \hat{M} B')_\omega + 2 g_2(\sqrt{\varepsilon}). \tag{16.1.102}
$$

After this step, we apply the data-processing inequality for 𝐸 sq and optimize over
channel input states. ■

16.1.4 Lower Bound on the Number of Transmitted Private Bits via Position-Based Coding and Convex Splitting

Having derived upper bounds on the number of private bits that can be transmitted
in an arbitrary private communication protocol, let us now determine a lower bound.
Here we use the methods of position-based coding and convex splitting to derive
an explicit (|M| , 𝜀) protocol for all 𝜀 ∈ (0, 1).
To derive this lower bound, let us consider a slightly different model of communication, in which there is a one-input, two-output classical–quantum channel connecting the sender Alice to the legitimate receiver Bob and the eavesdropper Eve:
$$
x \to \rho_{BE}^x, \tag{16.1.103}
$$
where $x \in \mathcal{X}$ is the classical input symbol and $\rho_{BE}^x$ is the bipartite quantum state that appears at the output when $x$ is input. Bob has access to the system $B$ of the output and Eve to $E$. The channel can alternatively be written as a quantum channel as follows:
$$
\mathcal{N}_{X\to BE}(\omega) := \sum_{x\in\mathcal{X}} \langle x|_X\, \omega\, |x\rangle_X\, \rho_{BE}^x, \tag{16.1.104}
$$
where $\{|x\rangle_X\}_{x\in\mathcal{X}}$ is an orthonormal basis. In this way, a private communication protocol for $\mathcal{N}_{X\to BE}$ is defined exactly as in Section 16.1, with $\mathcal{N}_{X\to BE}$ replacing the isometric channel $\mathcal{U}_{A\to BE}$ therein. Furthermore, the notions of code infidelity, an $(|\mathcal{M}|, \varepsilon)$ private communication protocol, and one-shot private capacity are defined in the same way, but with $\mathcal{N}_{X\to BE}$ replacing the isometric channel $\mathcal{U}_{A\to BE}$.
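The main result of this section is the following lower bound on the one-shot private capacity $P^{\varepsilon}(\mathcal{N})$ of a classical–quantum wiretap channel $\mathcal{N}_{X\to BE}$:
$$
P^{\varepsilon}(\mathcal{N}) \geq I_H^{\varepsilon'-\delta-\eta}(X;B)_\rho - I_{\max}^{\delta-\zeta}(E;X)_\rho - \log_2\!\left( \frac{8(\varepsilon'-\delta)}{\eta^2} \right) - \log_2\!\left( \frac{2}{\zeta^2} \right), \tag{16.1.105}
$$
where $\varepsilon' = 1 - \sqrt{1 - \varepsilon/2}$, $\delta \in (0, \varepsilon')$, $\eta \in (0, \varepsilon' - \delta)$, and $\zeta \in (0, \delta)$, and the information measures are evaluated with respect to the state
$$
\rho_{XBE} := \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho_{BE}^x. \tag{16.1.106}
$$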

In the above, 𝑃 𝜀 (N) represents the maximum number of bits that can be sent from
Alice to Bob, using a classical–quantum wiretap channel once, such that the infidelity
does not exceed 𝜀 ∈ (0, 1). The quantities on the right-hand side of the inequality
in (16.1.105) are particular one-shot generalizations of the Holevo information to
Bob and Eve, which are defined in (7.11.88) and (15.1.211), respectively.
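The parameter bookkeeping in (16.1.105) is easy to get wrong, so the following minimal sketch may help. It is not part of the original text: the function names are hypothetical, and the two one-shot information quantities are supplied by the caller (already evaluated at the smoothing parameters $\varepsilon'-\delta-\eta$ and $\delta-\zeta$, respectively), since computing them requires optimizations that are not attempted here.

```python
import math


def eps_prime(eps):
    """The auxiliary parameter eps' = 1 - sqrt(1 - eps/2) from (16.1.105)."""
    return 1 - math.sqrt(1 - eps / 2)


def one_shot_private_lower_bound(i_h, i_max, eps, delta, eta, zeta):
    """Right-hand side of (16.1.105).

    i_h   : hypothesis testing mutual information at smoothing eps' - delta - eta
    i_max : smooth max-mutual information at smoothing delta - zeta
    Both values are assumed to be supplied by the caller.
    """
    ep = eps_prime(eps)
    if not (0 < delta < ep and 0 < eta < ep - delta and 0 < zeta < delta):
        raise ValueError("need delta in (0, eps'), eta in (0, eps' - delta), zeta in (0, delta)")
    penalty = math.log2(8 * (ep - delta) / eta**2) + math.log2(2 / zeta**2)
    return i_h - i_max - penalty
```

Note that the penalty terms depend only on $\varepsilon$, $\delta$, $\eta$, and $\zeta$, and not on the channel, which is what makes them negligible after normalizing by the number of channel uses in the asymptotic setting.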
To prove the one-shot bound in (16.1.105), we employ position-based coding
(Section 11.1.3) and convex splitting (Section 15.1.4). The main idea of position-
based coding is conceptually simple and we review it briefly here. To communicate
a classical message from Alice to Bob, we allow them to share a quantum state $\rho_{RA}^{\otimes M}$ before communication begins, where $M$ is the number of messages, Bob possesses the $R$ systems, and Alice the $A$ systems. If Alice wishes to communicate message $m$, then she sends the $m$th $A$ system through the channel. The reduced state of Bob's systems is then
$$
\rho_{R_1} \otimes \cdots \otimes \rho_{R_{m-1}} \otimes \rho_{R_m B} \otimes \rho_{R_{m+1}} \otimes \cdots \otimes \rho_{R_M}, \tag{16.1.107}
$$
where $\rho_{R_m B} = \mathcal{N}_{A_m \to B}(\rho_{R_m A_m})$ and $\mathcal{N}_{A_m \to B}$ is the quantum channel. For all $m' \neq m$, the reduced state for systems $R_{m'}$ and $B$ is the product state $\rho_{R_{m'}} \otimes \rho_B$. However, the reduced state of systems $R_m B$ is the (generally) correlated state $\rho_{R_m B}$. So if Bob has a binary measurement that can distinguish the joint state $\rho_{RB}$ from the product state $\rho_R \otimes \rho_B$ sufficiently well, he can base a decoding strategy off of this,
and the scheme is reliable as long as the number of bits log2 𝑀 to be communicated
is chosen to be roughly equal to the hypothesis testing mutual information. This
is exactly what is used in position-based coding, thus forging a transparent and
intuitive link between quantum hypothesis testing and communication for the case
of entanglement-assisted communication.
Convex splitting is rather intuitive as well and can be thought of as dual to
the coding scenario mentioned above. Suppose instead that Alice and Bob have
a means of generating the state in (16.1.107), perhaps by the strategy mentioned
above. But now suppose that Alice chooses the variable 𝑚 uniformly at random, so
that the state, from the perspective of someone ignorant of the choice of 𝑚, is the
following mixture:
$$
\frac{1}{M} \sum_{m=1}^{M} \rho_{R_1} \otimes \cdots \otimes \rho_{R_{m-1}} \otimes \rho_{R_m B} \otimes \rho_{R_{m+1}} \otimes \cdots \otimes \rho_{R_M}. \tag{16.1.108}
$$
The convex-split lemma (Lemma 15.22) guarantees that as long as $\log_2 M$ is roughly equal to the smooth max-mutual information in (15.1.211), then the state above is nearly indistinguishable from the product state $\rho_R^{\otimes M} \otimes \rho_B$.
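A purely classical toy version of convex splitting can be checked by brute force. The sketch below is an illustration under stated assumptions and not the lemma itself: it replaces the quantum state $\rho_{RB}$ with a hypothetical joint probability distribution `p_rb`, builds the classical analogue of the mixture in (16.1.108), and computes its total variation distance from the product distribution. The function name and the example distribution are ours.

```python
import itertools

import numpy as np


def convex_split_tv(p_rb, M):
    """Total variation distance between the classical analogue of the mixture
    in (16.1.108) and the product distribution p_R^{(x)M} (x) p_B.

    p_rb is a joint distribution over (r, b) with strictly positive marginal on r.
    """
    p_r = p_rb.sum(axis=1)
    p_b = p_rb.sum(axis=0)
    nr, nb = p_rb.shape
    tv = 0.0
    for rs in itertools.product(range(nr), repeat=M):
        prod_r = np.prod([p_r[r] for r in rs])
        for b in range(nb):
            # Average over the position m that is correlated with b;
            # all other positions carry the marginal p_R.
            mix = np.mean([p_rb[rs[m], b] * prod_r / p_r[rs[m]] for m in range(M)])
            tv += abs(mix - prod_r * p_b[b])
    return 0.5 * tv
```

For example, with `p_rb = np.array([[0.4, 0.1], [0.1, 0.4]])`, the distance for `M = 1` is just the distance between the joint distribution and the product of its marginals, and it shrinks as `M` grows, in line with the intuition that averaging over more positions hides which position is correlated with $B$.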
Here we use the approaches of position-based coding and convex splitting in
conjunction to construct codes for the classical–quantum wiretap channel. The
main underlying idea is to have a message variable 𝑚 ∈ {1, . . . , 𝑀 } and a local
randomness variable 𝑟 ∈ {1, . . . , 𝑅}, the latter of which is selected uniformly at
random and used to confuse the eavesdropper Eve. Before communication begins,
Alice, Bob, and Eve are allowed to share $MR$ copies of the common randomness state
$$
\rho_{X X' X''} := \sum_{x\in\mathcal{X}} p_X(x) |xxx\rangle\langle xxx|_{X X' X''}. \tag{16.1.109}
$$


We can think of the $MR$ copies of $\rho_{X X' X''}$ as being partitioned into $M$ blocks, each of which contains $R$ copies of the state $\rho_{X X' X''}$. If Alice wishes to send message $m$, then she picks $r$ uniformly at random and sends the $X$ system labeled by $(m, r)$ through the classical–quantum wiretap channel in (16.1.103). As long as $\log_2 MR$ is roughly equal to the hypothesis testing mutual information $I_H^{\varepsilon}(X;B)$, then Bob can use a position-based decoder to figure out both $m$ and $r$. As long as $\log_2 R$ is roughly equal to the smooth max-mutual information $I_{\max}^{\varepsilon}(E;X)$, then the convex-split lemma guarantees that the overall state of Eve's systems, regardless of which message $m$ was chosen, is nearly indistinguishable from the product state $\rho_{X''}^{\otimes MR} \otimes \rho_E$. Thus, in such a scheme, Bob can figure out $m$ while Eve cannot figure out anything about $m$. This is the intuition behind the coding scheme and gives a sense of why $\log_2 M = \log_2 MR - \log_2 R \approx I_H^{\varepsilon}(X;B) - I_{\max}^{\varepsilon}(E;X)$ is an achievable number of bits that can be sent privately from Alice to Bob. The main purpose of this section is to develop the details of this argument and furthermore to show how the scheme can be derandomized, so that the $MR$ copies of the common randomness state $\rho_{X X' X''}$ are in fact not necessary.
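The block structure described above amounts to simple index arithmetic. The following sketch is only an illustration of that bookkeeping, with hypothetical function names and a 0-based flat index of our choosing: Alice maps a message $m$ to a uniformly random position inside block $m$, and Bob, after decoding the position, keeps the block label and discards the local randomness $r$.

```python
import random


def encode_position(m, R, rng):
    """Alice's choice: pick r uniformly from {1, ..., R} and return r together
    with the 0-based flat index of the copy labeled (m, r) among the M*R copies."""
    r = rng.randrange(1, R + 1)
    return r, (m - 1) * R + (r - 1)


def decode_position(flat_index, R):
    """Bob's post-processing: recover (m, r) from the decoded flat index.
    Only m is the message; r is the local randomness used to confuse Eve."""
    block, offset = divmod(flat_index, R)
    return block + 1, offset + 1
```

The position-based decoder itself (the sequence of binary tests on the pairs of systems) is not modeled here; the sketch only shows why decoding one of $MR$ positions conveys $\log_2 M$ message bits.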
There are strong connections between the approach for establishing a lower
bound on one-shot distillable key detailed in Section 15.1.4, and the approach we
have outlined above and detail below. In fact, there is a point in the analysis below
at which it becomes precisely the same, and at that point, we simply invoke the
proof of Theorem 15.21 to complete the analysis.
We now state the main theorem of this section:

Theorem 16.12
Let $\mathcal{N}_{X\to BE} : x \to \rho_{BE}^x$ be a classical–quantum wiretap channel, in which Alice has access to the input, Bob to the output system $B$, and Eve to the output system $E$. For all $\varepsilon \in (0,1]$, $\varepsilon' = 1 - \sqrt{1 - \varepsilon/2}$, $\delta \in (0, \varepsilon')$, $\eta \in (0, \varepsilon' - \delta)$, and $\zeta \in (0, \delta)$, there exists an $(|\mathcal{M}|, \varepsilon)$ private communication protocol for $\mathcal{N}_{X\to BE}$, such that
$$
\log_2 |\mathcal{M}| = I_H^{\varepsilon'-\delta-\eta}(X;B)_\rho - I_{\max}^{\delta-\zeta}(E;X)_\rho - \log_2\!\left( \frac{8(\varepsilon'-\delta)}{\eta^2} \right) - \log_2\!\left( \frac{2}{\zeta^2} \right),
$$
where the hypothesis testing mutual information $I_H^{\varepsilon'-\delta-\eta}(X;B)_\rho$ is defined in (7.11.88) and the smooth max-mutual information $I_{\max}^{\delta-\zeta}(E;X)_\rho$ in (15.1.211), and they are evaluated with respect to the state $\rho_{XBE}$ in (16.1.106).

Proof: We first exhibit a public shared randomness assisted protocol for private communication and then show later how to derandomize it. The protocol proceeds exactly as discussed above. We suppose that Alice, Bob, and Eve share the state $\rho_{X X' X''}^{\otimes MR}$ before communication begins, where $M = |\mathcal{M}|$. If Alice wants to send the message $m$, she picks $r$ uniformly at random from $\{1, \ldots, R\}$ and transmits a classical copy $X'''$ of the $X$ system labeled by $(m, r)$ through the channel $\mathcal{N}_{X\to BE}$. The resulting state of Alice, Bob, and Eve, for fixed $m$ and $r$, is then as follows:
$$
\begin{aligned}
\rho^{m,r}_{X^{MR} X'^{MR} X''^{MR} BE} := {} & \rho_{X_{1,1} X'_{1,1} X''_{1,1}} \otimes \cdots \otimes \rho_{X_{m,r-1} X'_{m,r-1} X''_{m,r-1}} \otimes \rho_{X_{m,r} X'_{m,r} X''_{m,r} BE} \\
& \otimes \rho_{X_{m,r+1} X'_{m,r+1} X''_{m,r+1}} \otimes \cdots \otimes \rho_{X_{M,R} X'_{M,R} X''_{M,R}},
\end{aligned}
\tag{16.1.110}
$$
where
$$
\begin{aligned}
\rho_{X_{1,1} X'_{1,1} X''_{1,1}} = \cdots &= \rho_{X_{m,r-1} X'_{m,r-1} X''_{m,r-1}} && (16.1.111) \\
&= \rho_{X_{m,r+1} X'_{m,r+1} X''_{m,r+1}} = \cdots = \rho_{X_{M,R} X'_{M,R} X''_{M,R}} && (16.1.112) \\
&= \sum_{x\in\mathcal{X}} p_X(x) |xxx\rangle\langle xxx|_{X X' X''}, && (16.1.113)
\end{aligned}
$$
and
$$
\begin{aligned}
\rho_{X_{m,r} X'_{m,r} X''_{m,r} BE} &= \sum_{x\in\mathcal{X}} p_X(x) |xxx\rangle\langle xxx|_{X X' X''} \otimes \mathcal{N}_{X''' \to BE}(|x\rangle\langle x|_{X'''}) && (16.1.114) \\
&= \sum_{x\in\mathcal{X}} p_X(x) |xxx\rangle\langle xxx|_{X X' X''} \otimes \rho_{BE}^x. && (16.1.115)
\end{aligned}
$$
At this point, the state here is precisely the same as that given in (15.1.200),
and the goal from here is the same as well. Thus, we can apply the same reasoning
given there to conclude that the following infidelity condition holds

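$$
1 - F\!\left( \mathcal{M}_{X'^{MR} B \to M_B}(\rho_{M_A X'^{MR} X''^{MR} BE}),\ \Phi_{M_A M_B} \otimes \rho_{X''^{MR}} \otimes \widetilde{\rho}_E \right) \leq \varepsilon \tag{16.1.116}
$$
if
$$
\log_2 |\mathcal{M}| = I_H^{\varepsilon'-\delta-\eta}(X;B)_\tau - I_{\max}^{\delta-\zeta}(E;X)_\tau - \log_2\!\left( \frac{4(\varepsilon'-\delta)}{\eta^2} \right) - \log_2\!\left( \frac{2}{\zeta^2} \right), \tag{16.1.117}
$$
where
$$
\tau_{XBE} := \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \rho_{BE}^x \tag{16.1.118}
$$
and $\rho_{M_A X'^{MR} X''^{MR} BE}$ is the reduction of the following state:
$$
\rho_{M_A R_A X^{MR} X'^{MR} X''^{MR} BE} := \frac{1}{MR} \sum_{m=1}^{M} \sum_{r=1}^{R} |m\rangle\langle m|_{M_A} \otimes |r\rangle\langle r|_{R_A} \otimes \rho^{m,r}_{X^{MR} X'^{MR} X''^{MR} BE}. \tag{16.1.119}
$$
That is,
$$
\rho_{M_A X'^{MR} X''^{MR} BE} = \operatorname{Tr}_{R_A X^{MR}}\!\left[ \rho_{M_A R_A X^{MR} X'^{MR} X''^{MR} BE} \right]. \tag{16.1.120}
$$
Furthermore, $\mathcal{M}_{X'^{MR} B \to M_B}$ is a measurement channel of the form in (15.1.222), and $\widetilde{\rho}_E$ is a state satisfying
$$
P(\rho_E, \widetilde{\rho}_E) \leq \delta - \zeta. \tag{16.1.121}
$$
Thus, Bob's strategy is to decode both $m$ and $r$ (as before), and he can do so as long as $\log_2 MR = I_H^{\varepsilon'-\delta-\eta}(X;B)_\tau - \log_2\!\left( \frac{4(\varepsilon'-\delta)}{\eta^2} \right)$. At the same time, the message $m$ should be private from Eve, and this is possible as long as $\log_2 R = I_{\max}^{\delta-\zeta}(E;X)_\tau + \log_2\!\left( \frac{2}{\zeta^2} \right)$. Calculating $\log_2 M = \log_2 MR - \log_2 R$ gives (16.1.117). Then the analysis in the proof of Theorem 15.21 guarantees that the condition in (16.1.116) holds.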
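We now discuss how to derandomize the protocol. First, let us define the following measurement channels
$$
\mathcal{M}^{x_{1,1},\ldots,x_{M,R}}_{B \to M_B}(\omega_B) := \mathcal{M}_{X'^{MR} B \to M_B}\!\left( |x_{1,1},\ldots,x_{M,R}\rangle\langle x_{1,1},\ldots,x_{M,R}|_{X'^{MR}} \otimes \omega_B \right), \tag{16.1.122}
$$
where $\omega_B$ is an input state. Also, consider that the state $\rho_{M_A X'^{MR} X''^{MR} BE}$ can be written as
$$
\begin{aligned}
\rho_{M_A X'^{MR} X''^{MR} BE} = {} & \frac{1}{M} \sum_{m=1}^{M} |m\rangle\langle m|_{M_A} \otimes \sum_{x_{1,1},\ldots,x_{M,R}} p(x_{1,1}) \cdots p(x_{M,R})\, |x_{1,1},\ldots,x_{M,R}\rangle\langle x_{1,1},\ldots,x_{M,R}|_{X'^{MR}} \\
& \otimes |x_{1,1},\ldots,x_{M,R}\rangle\langle x_{1,1},\ldots,x_{M,R}|_{X''^{MR}} \otimes \frac{1}{R} \sum_{r=1}^{R} \rho_{BE}^{x_{m,r}}.
\end{aligned}
\tag{16.1.123}
$$
With this in mind, the state $\mathcal{M}_{X'^{MR} B \to M_B}(\rho_{M_A X'^{MR} X''^{MR} BE})$ can be written as follows:
$$
\begin{aligned}
\mathcal{M}_{X'^{MR} B \to M_B}(\rho_{M_A X'^{MR} X''^{MR} BE}) = {} & \frac{1}{M} \sum_{m=1}^{M} \sum_{x_{1,1},\ldots,x_{M,R}} p(x_{1,1}) \cdots p(x_{M,R})\, |m\rangle\langle m|_{M_A} \\
& \otimes |x_{1,1},\ldots,x_{M,R}\rangle\langle x_{1,1},\ldots,x_{M,R}|_{X''^{MR}} \otimes \mathcal{M}^{x_{1,1},\ldots,x_{M,R}}_{B \to M_B}\!\left( \frac{1}{R} \sum_{r=1}^{R} \rho_{BE}^{x_{m,r}} \right),
\end{aligned}
\tag{16.1.124}
$$
and the condition in (16.1.116) as
$$
\begin{aligned}
1 - \varepsilon &\leq F\!\left( \mathcal{M}_{X'^{MR} B \to M_B}(\rho_{M_A X'^{MR} X''^{MR} BE}),\ \Phi_{M_A M_B} \otimes \rho_{X''^{MR}} \otimes \widetilde{\rho}_E \right) && (16.1.125) \\
&= \left[ \frac{1}{M} \sum_{m=1}^{M} \sum_{x_{1,1},\ldots,x_{M,R}} p(x_{1,1}) \cdots p(x_{M,R}) \sqrt{F}\!\left( \mathcal{M}^{x_{1,1},\ldots,x_{M,R}}_{B \to M_B}\!\left( \frac{1}{R} \sum_{r=1}^{R} \rho_{BE}^{x_{m,r}} \right),\ |m\rangle\langle m|_{M_B} \otimes \widetilde{\rho}_E \right) \right]^2, && (16.1.126)
\end{aligned}
$$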
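which is the same as
$$
\frac{1}{M} \sum_{m=1}^{M} \sum_{x_{1,1},\ldots,x_{M,R}} p(x_{1,1}) \cdots p(x_{M,R}) \sqrt{F}\!\left( \mathcal{M}^{x_{1,1},\ldots,x_{M,R}}_{B \to M_B}\!\left( \frac{1}{R} \sum_{r=1}^{R} \rho_{BE}^{x_{m,r}} \right),\ |m\rangle\langle m|_{M_B} \otimes \widetilde{\rho}_E \right) \geq \sqrt{1 - \varepsilon}. \tag{16.1.127}
$$
We can now exploit the “Shannon trick” of exchanging the sum over the messages $m$ and the sum over the codewords to rewrite this inequality as
$$
\sum_{x_{1,1},\ldots,x_{M,R}} p(x_{1,1}) \cdots p(x_{M,R}) \left( \frac{1}{M} \sum_{m=1}^{M} \sqrt{F}\!\left( \mathcal{M}^{x_{1,1},\ldots,x_{M,R}}_{B \to M_B}\!\left( \frac{1}{R} \sum_{r=1}^{R} \rho_{BE}^{x_{m,r}} \right),\ |m\rangle\langle m|_{M_B} \otimes \widetilde{\rho}_E \right) \right) \geq \sqrt{1 - \varepsilon}. \tag{16.1.128}
$$
Since the average does not exceed the maximum, we conclude that there exists some choice of codewords $x_{1,1}, \ldots, x_{M,R}$ such that the following inequality holds
$$
\frac{1}{M} \sum_{m=1}^{M} \sqrt{F}\!\left( \mathcal{M}^{x_{1,1},\ldots,x_{M,R}}_{B \to M_B}\!\left( \frac{1}{R} \sum_{r=1}^{R} \rho_{BE}^{x_{m,r}} \right),\ |m\rangle\langle m|_{M_B} \otimes \widetilde{\rho}_E \right) \geq \sqrt{1 - \varepsilon}. \tag{16.1.129}
$$
Let us then use the shorthand $\mathcal{M}_{B \to M_B} \equiv \mathcal{M}^{x_{1,1},\ldots,x_{M,R}}_{B \to M_B}$, so that we can rewrite the above as
$$
\frac{1}{M} \sum_{m=1}^{M} \sqrt{F}\!\left( \mathcal{M}_{B \to M_B}\!\left( \frac{1}{R} \sum_{r=1}^{R} \rho_{BE}^{x_{m,r}} \right),\ |m\rangle\langle m|_{M_B} \otimes \widetilde{\rho}_E \right) \geq \sqrt{1 - \varepsilon}. \tag{16.1.130}
$$
This completes the derandomization part of the proof.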

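Finally, we are interested in a code that satisfies the maximal infidelity criterion $p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}) \leq \varepsilon$. To find such a code, we can apply expurgation to the code found above. Since the square root function is concave, after bringing the average inside the square root and squaring both sides of the inequality, we conclude that
$$
\frac{1}{M} \sum_{m=1}^{M} F\!\left( \mathcal{M}_{B \to M_B}\!\left( \frac{1}{R} \sum_{r=1}^{R} \rho_{BE}^{x_{m,r}} \right),\ |m\rangle\langle m|_{M_B} \otimes \widetilde{\rho}_E \right) \geq 1 - \varepsilon, \tag{16.1.131}
$$
which we can rewrite one more time as
$$
\frac{1}{M} \sum_{m=1}^{M} \left( 1 - F\!\left( \mathcal{M}_{B \to M_B}\!\left( \frac{1}{R} \sum_{r=1}^{R} \rho_{BE}^{x_{m,r}} \right),\ |m\rangle\langle m|_{M_B} \otimes \widetilde{\rho}_E \right) \right) \leq \varepsilon. \tag{16.1.132}
$$
Now applying Markov's inequality, we conclude that at least half of the messages are such that the following inequality holds
$$
1 - F\!\left( \mathcal{M}_{B \to M_B}\!\left( \frac{1}{R} \sum_{r=1}^{R} \rho_{BE}^{x_{m,r}} \right),\ |m\rangle\langle m|_{M_B} \otimes \widetilde{\rho}_E \right) \leq 2\varepsilon. \tag{16.1.133}
$$
Thus, these messages and the corresponding codewords are retained as the final code. To be clear, suppose without loss of generality, that messages $1, \ldots, \lfloor M/2 \rfloor$ are retained and messages $\lfloor M/2 \rfloor + 1, \ldots, M$ are expurgated. Then this means that the corresponding codewords retained are $x_{1,1}, \ldots, x_{1,R}, x_{2,1}, \ldots, x_{2,R}, \ldots, x_{\lfloor M/2 \rfloor,1}, \ldots, x_{\lfloor M/2 \rfloor,R}$, and the ones discarded are $x_{\lfloor M/2 \rfloor+1,1}, \ldots, x_{\lfloor M/2 \rfloor+1,R}, x_{\lfloor M/2 \rfloor+2,1}, \ldots, x_{\lfloor M/2 \rfloor+2,R}, \ldots, x_{M,1}, \ldots, x_{M,R}$. After the expurgation, the rate of the code is given by
$$
\begin{aligned}
\log_2 (|\mathcal{M}|/2) &= \log_2 |\mathcal{M}| - \log_2(2) && (16.1.134) \\
&= I_H^{\varepsilon'-\delta-\eta}(X;B)_\rho - I_{\max}^{\delta-\zeta}(E;X)_\rho - \log_2\!\left( \frac{4(\varepsilon'-\delta)}{\eta^2} \right) - \log_2\!\left( \frac{2}{\zeta^2} \right) - \log_2(2) && (16.1.135) \\
&= I_H^{\varepsilon'-\delta-\eta}(X;B)_\rho - I_{\max}^{\delta-\zeta}(E;X)_\rho - \log_2\!\left( \frac{8(\varepsilon'-\delta)}{\eta^2} \right) - \log_2\!\left( \frac{2}{\zeta^2} \right). && (16.1.136)
\end{aligned}
$$
By a final substitution of $2\varepsilon$ with $\varepsilon$ and rewriting, we arrive at the claim of the theorem. ■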
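The expurgation step is an instance of Markov's inequality and can be checked numerically. The following is a minimal sketch with hypothetical names: given the per-message infidelities of a code whose average infidelity is $\varepsilon$, it keeps the $\lfloor M/2 \rfloor$ messages with the smallest infidelity, each of which is then guaranteed to have infidelity at most $2\varepsilon$.

```python
import numpy as np


def expurgate(infidelities):
    """Return the (sorted) indices of the floor(M/2) messages with the
    smallest infidelity. By Markov's inequality, fewer than M/2 messages can
    have infidelity exceeding twice the average, so every retained message
    has infidelity at most twice the average."""
    infid = np.asarray(infidelities, dtype=float)
    keep = np.argsort(infid)[: infid.size // 2]
    return np.sort(keep)
```

Halving the message set costs exactly one bit, which is the $\log_2(2)$ term in (16.1.134), and the worst-case infidelity of the retained code is controlled by twice the original average.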

We can induce a classical–quantum wiretap channel from an isometric channel $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ extending $\mathcal{N}_{A\to B}$ by the following pre-processing:
$$
x \to \rho_A^x \to \mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho_A^x). \tag{16.1.137}
$$
That is, based on the value of a letter $x$, Alice inputs the state $\rho_A^x$ into the isometric channel $\mathcal{U}^{\mathcal{N}}_{A\to BE}$. Optimizing over all such pre-processings and applying Theorem 16.12, we arrive at the following lower bound on the one-shot private capacity of a quantum channel $\mathcal{N}_{A\to B}$ (according to the definition given in Section 16.1):

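Corollary 16.13
Let $\mathcal{N}_{A\to B}$ be a quantum channel that is extended by the isometric channel $\mathcal{U}^{\mathcal{N}}_{A\to BE}$. For all $\varepsilon \in (0,1]$, $\varepsilon' = 1 - \sqrt{1 - \varepsilon/2}$, $\delta \in (0, \varepsilon')$, $\eta \in (0, \varepsilon' - \delta)$, and $\zeta \in (0, \delta)$, there exists an $(|\mathcal{M}|, \varepsilon)$ private communication protocol for $\mathcal{N}_{A\to B}$, such that
$$
\log_2 |\mathcal{M}| = \sup_{\{p(x),\rho_A^x\}_{x\in\mathcal{X}}} \left[ I_H^{\varepsilon'-\delta-\eta}(X;B)_\rho - I_{\max}^{\delta-\zeta}(E;X)_\rho \right] - \log_2\!\left( \frac{8(\varepsilon'-\delta)}{\eta^2} \right) - \log_2\!\left( \frac{2}{\zeta^2} \right),
$$
where the hypothesis testing mutual information $I_H^{\varepsilon'-\delta-\eta}(X;B)_\rho$ is defined in (7.11.88) and the smooth max-mutual information $I_{\max}^{\delta-\zeta}(E;X)_\rho$ in (15.1.211), and the information quantities are evaluated with respect to the following state:
$$
\rho_{XBE} := \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho_A^x). \tag{16.1.138}
$$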

Now applying Propositions 7.72 and 7.64, we conclude the following bound:

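Corollary 16.14
Let $\mathcal{N}_{A\to B}$ be a quantum channel that is extended by the isometric channel $\mathcal{U}^{\mathcal{N}}_{A\to BE}$. For all $\varepsilon \in (0,1]$, $\varepsilon' = 1 - \sqrt{1 - \varepsilon/2}$, $\delta \in (0, \varepsilon')$, $\eta \in (0, \varepsilon' - \delta)$, $\zeta \in (0, \delta)$, $\nu \in (0, \delta - \zeta)$, $\alpha \in (0,1)$, and $\beta > 1$, there exists an $(|\mathcal{M}|, \varepsilon)$ private communication protocol for $\mathcal{N}_{A\to B}$, such that
$$
\log_2 |\mathcal{M}| \geq \sup_{\{p(x),\rho_A^x\}_{x\in\mathcal{X}}} \left[ I_\alpha(X;B)_\rho - \widetilde{I}'_\beta(X;E)_\rho \right] - f(\varepsilon', \delta, \eta, \nu, \zeta, \alpha, \beta), \tag{16.1.139}
$$
where the Petz–Rényi mutual information $I_\alpha(X;B)_\rho$ is defined in (11.1.136), the sandwiched Rényi mutual information $\widetilde{I}'_\beta(X;E)_\rho$ as
$$
\widetilde{I}'_\beta(X;E)_\rho := \widetilde{D}_\beta(\rho_{XE} \| \rho_X \otimes \rho_E), \tag{16.1.140}
$$
and the information quantities are evaluated with respect to the following state:
$$
\rho_{XBE} := \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho_A^x). \tag{16.1.141}
$$
Furthermore,
$$
\begin{aligned}
f(\varepsilon', \delta, \eta, \nu, \zeta, \alpha, \beta) := {} & \frac{\alpha}{1-\alpha} \log_2\!\left( \frac{1}{\varepsilon' - \delta - \eta} \right) + \log_2\!\left( \frac{8}{\nu^2} \right) \\
& + \frac{1}{\beta-1} \log_2\!\left( \frac{1}{(\delta - \zeta - \nu)^2} \right) + \log_2\!\left( \frac{1}{1 - (\delta - \zeta - \nu)^2} \right) \\
& + \log_2\!\left( \frac{8(\varepsilon' - \delta)}{\eta^2} \right) + \log_2\!\left( \frac{2}{\zeta^2} \right).
\end{aligned}
\tag{16.1.142}
$$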

Proof: The reasoning here is precisely the same as that given in the proof of Corollary 15.24. The only difference is that we optimize over every ensemble $\{p(x), \rho_A^x\}_{x\in\mathcal{X}}$. ■
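The penalty function in (16.1.142) does not depend on the channel, so after dividing by the number $n$ of channel uses it vanishes as $n$ grows. The sketch below transcribes (16.1.142) as we read it, with a hypothetical function name, and enforces the parameter ranges stated in Corollary 16.14; the example parameter values are arbitrary choices of ours.

```python
import math


def f_penalty(eps_p, delta, eta, nu, zeta, alpha, beta):
    """Penalty term f(eps', delta, eta, nu, zeta, alpha, beta) from (16.1.142)."""
    if not (0 < delta < eps_p and 0 < eta < eps_p - delta):
        raise ValueError("need delta in (0, eps') and eta in (0, eps' - delta)")
    if not (0 < zeta < delta and 0 < nu < delta - zeta):
        raise ValueError("need zeta in (0, delta) and nu in (0, delta - zeta)")
    if not (0 < alpha < 1 and beta > 1):
        raise ValueError("need alpha in (0, 1) and beta > 1")
    s = delta - zeta - nu
    return (
        alpha / (1 - alpha) * math.log2(1 / (eps_p - delta - eta))
        + math.log2(8 / nu**2)
        + 1 / (beta - 1) * math.log2(1 / s**2)
        + math.log2(1 / (1 - s**2))
        + math.log2(8 * (eps_p - delta) / eta**2)
        + math.log2(2 / zeta**2)
    )


# Example: the per-channel-use penalty f / n for a few block lengths.
penalty = f_penalty(0.1, 0.05, 0.025, 0.0125, 0.025, 0.5, 2.0)
per_use = [penalty / n for n in (10, 100, 1000)]
```

For fixed $\alpha$ and $\beta$ the penalty is a finite constant, so `per_use` decreases like $1/n$; the price is that the constant blows up as $\alpha \to 1$ or $\beta \to 1$.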

16.2 Private Capacity of a Quantum Channel

We now consider the asymptotic setting. In this scenario, depicted in Figure [REF], the classical message system $M'$ to be transmitted to Bob is encoded into $n$ copies $A_1, \ldots, A_n$ of a quantum system $A$, for $n \in \mathbb{N}$. Each of the systems is then sent through an independent use of the isometric channel $\mathcal{U}^{\mathcal{N}}_{A\to BE}$, which extends the point-to-point channel $\mathcal{N}_{A\to B}$. As before, this is the asymptotic setting because $n$ can be arbitrarily large.
Due to the fact that $n$ independent uses of the channel $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ is no different from one use of the tensor-power channel $(\mathcal{U}^{\mathcal{N}}_{A\to BE})^{\otimes n}$, the information theory underlying the asymptotic setting is no different from that in the one-shot setting, and the main task we accomplish here is to analyze the performance of the protocols in the large $n$ limit. Indeed, the only change that we make here is to replace $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ with $(\mathcal{U}^{\mathcal{N}}_{A\to BE})^{\otimes n}$ and define the encoding and decoding channels as acting on $n$ systems instead of one. If Alice transmits message $m$, then the final state of the protocol is
$$
(\mathcal{D}_{B^n \to \hat{M}} \circ (\mathcal{U}^{\mathcal{N}}_{A\to BE})^{\otimes n} \circ \mathcal{E}_{M' \to A^n})(|m\rangle\langle m|_{M'}), \tag{16.2.1}
$$
where $\mathcal{E}_{M' \to A^n}$ is the encoding channel and $\mathcal{D}_{B^n \to \hat{M}}$ the decoding channel. Just as in the one-shot setting, we define the maximal infidelity of the code as
$$
p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}^{\otimes n}) = \inf_{\sigma_{E^n}} \max_{m\in\mathcal{M}} \left( 1 - F\!\left( |m\rangle\langle m|_{\hat{M}} \otimes \sigma_{E^n},\ (\mathcal{D}_{B^n \to \hat{M}} \circ (\mathcal{U}^{\mathcal{N}}_{A\to BE})^{\otimes n} \circ \mathcal{E}_{M' \to A^n})(|m\rangle\langle m|_{M'}) \right) \right), \tag{16.2.2}
$$
where the infimum is with respect to every state $\sigma_{E^n}$ of the eavesdropper's systems $E^n$.


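Definition 16.15 $(n, |\mathcal{M}|, \varepsilon)$ Private Communication Protocol
Let $(\mathcal{M}, \mathcal{E}_{M' \to A^n}, \mathcal{D}_{B^n \to \hat{M}})$ be the elements of a private communication protocol for $n$ independent uses of the channel $\mathcal{N}_{A\to B}$, where $d_{M'} = d_{\hat{M}} = |\mathcal{M}|$. The protocol is called an $(n, |\mathcal{M}|, \varepsilon)$ protocol, with $\varepsilon \in [0,1]$, if $p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}^{\otimes n}) \leq \varepsilon$.

The rate of an $(n, |\mathcal{M}|, \varepsilon)$ private communication protocol is defined as the number of private bits transmitted per channel use, i.e.,
$$
R(n, |\mathcal{M}|) := \frac{\log_2 |\mathcal{M}|}{n}. \tag{16.2.3}
$$
The rate depends only on the size $|\mathcal{M}|$ of the message set $\mathcal{M}$ and on the number of channel uses. In particular, it does not depend on the communication channel nor on the encoding and decoding channels. For a given $\varepsilon \in [0,1]$ and $n \in \mathbb{N}$, the highest rate among all $(n, |\mathcal{M}|, \varepsilon)$ protocols is denoted by $P^{n,\varepsilon}(\mathcal{N})$, and it is defined as
$$
P^{n,\varepsilon}(\mathcal{N}) := \frac{1}{n} P^{\varepsilon}(\mathcal{N}^{\otimes n}) = \sup_{(\mathcal{M}, \mathcal{E}, \mathcal{D})} \left\{ \frac{\log_2 |\mathcal{M}|}{n} : p^*_{\mathrm{err}}(\mathcal{E}, \mathcal{D}; \mathcal{N}^{\otimes n}) \leq \varepsilon \right\}, \tag{16.2.4}
$$
where, in the second equality, we use the definition of the one-shot private capacity $P^{\varepsilon}$ given in (16.1.35), and the supremum is over every message set $\mathcal{M}$, encoding channel $\mathcal{E}$ with input dimension $|\mathcal{M}|$, and decoding channel $\mathcal{D}$ with output dimension $|\mathcal{M}|$.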
We now provide several definitions related to private capacity and its associated
concepts.

Definition 16.16 Achievable Rate for Private Communication

Given a quantum channel $\mathcal{N}$, a rate $R \in \mathbb{R}_+$ is called an achievable rate for private communication over $\mathcal{N}$ if for all $\varepsilon \in (0,1]$, $\delta > 0$, and sufficiently large $n$, there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ private communication protocol for $\mathcal{N}$.


Definition 16.17 Private Capacity of a Quantum Channel

The private capacity of a quantum channel N, denoted by 𝑃(N), is defined to
be the supremum of all achievable rates, i.e.,

𝑃(N) B sup {𝑅 : 𝑅 is an achievable rate for N} . (16.2.5)

An equivalent definition of private capacity is
$$
P(\mathcal{N}) = \inf_{\varepsilon\in(0,1]} \liminf_{n\to\infty} \frac{1}{n} P^{\varepsilon}(\mathcal{N}^{\otimes n}), \tag{16.2.6}
$$
which we prove in Appendix A.

Definition 16.18 Weak Converse Rate for Private Communication

Given a quantum channel N, a rate 𝑅 ∈ R+ is called a weak converse rate for
private communication over N if every 𝑅′ > 𝑅 is not an achievable rate for N.

Definition 16.19 Strong Converse Rate for Private Communication

Given a quantum channel $\mathcal{N}$, a rate $R \in \mathbb{R}_+$ is called a strong converse rate for private communication over $\mathcal{N}$ if for all $\varepsilon \in [0,1)$, $\delta > 0$, and sufficiently large $n$, there does not exist an $(n, 2^{n(R+\delta)}, \varepsilon)$ private communication protocol for $\mathcal{N}$.

Definition 16.20 Strong Converse Private Capacity of a Quantum Channel
The strong converse private capacity of a quantum channel $\mathcal{N}$, denoted by $\widetilde{P}(\mathcal{N})$, is defined as the infimum of all strong converse rates, i.e.,
$$
\widetilde{P}(\mathcal{N}) := \inf \{ R : R \text{ is a strong converse rate for } \mathcal{N} \}. \tag{16.2.7}
$$
We can also write the strong converse private capacity as
$$
\widetilde{P}(\mathcal{N}) = \sup_{\varepsilon\in[0,1)} \limsup_{n\to\infty} \frac{1}{n} P^{\varepsilon}(\mathcal{N}^{\otimes n}). \tag{16.2.8}
$$


See Appendix A for a proof. We also show in Appendix A that
$$
P(\mathcal{N}) \leq \widetilde{P}(\mathcal{N}) \tag{16.2.9}
$$
for every quantum channel $\mathcal{N}$.
We now state one of the main theorems of this chapter, which gives an expression for the private capacity of a quantum channel.

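Theorem 16.21 Private Capacity
The private capacity of a quantum channel $\mathcal{N}_{A\to B}$ is equal to the regularized private information $I^p_{\mathrm{reg}}(\mathcal{N})$ of $\mathcal{N}$, i.e.,
$$
P(\mathcal{N}) = I^p_{\mathrm{reg}}(\mathcal{N}) := \lim_{n\to\infty} \frac{1}{n} I^p(\mathcal{N}^{\otimes n}), \tag{16.2.10}
$$
where the private information of a channel is defined as
$$
I^p(\mathcal{N}) := \sup_{\{p(x),\rho_A^x\}_{x\in\mathcal{X}}} \left[ I(X;B)_\rho - I(X;E)_\rho \right], \tag{16.2.11}
$$
and the information quantities are evaluated with respect to the state
$$
\rho_{XBE} := \sum_{x\in\mathcal{X}} p(x) |x\rangle\langle x|_X \otimes \mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho_A^x), \tag{16.2.12}
$$
with $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ an isometric channel extending $\mathcal{N}_{A\to B}$.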

Observe that the expression in (16.2.10) for the private capacity involves a
regularization of the private information. Thus, in general, it is difficult to compute
because the optimization is over an arbitrarily large number of channel uses.
By following an argument similar to that given in Section 14.2.3, it follows that
the private information is always superadditive, meaning that 𝐼 𝑝 (N ⊗𝑛 ) ≥ 𝑛𝐼 𝑝 (N)
for every 𝑛 ∈ N and channel N. This means that the private information is always a
lower bound on the private capacity of a channel N:
𝑃(N) ≥ 𝐼 𝑝 (N) for every channel N. (16.2.13)
If the private information happens to be additive for a particular channel, then the
regularization in (16.2.10) is not required. For example, the private information
is known to be additive for all degradable and anti-degradable channels (see

Definition 4.6). Furthermore, for degradable channels, the private information is
equal to the coherent information and so there is no difference between the quantum
capacity and the private capacity for these channels. That is, for degradable
channels, we have that
𝑃(N) = 𝑄(N) = 𝐼 𝑐 (N), (16.2.14)
where the coherent information 𝐼 𝑐 (N) of a channel N is defined in (7.11.107). For
anti-degradable channels, the private information is equal to zero, which is what
we prove in Section 16.3.2.
Theorem 16.21 only makes a statement about the private capacity and not about
the strong converse private capacity. Later on, we prove that a channel’s relative
entropy of entanglement is a strong converse rate for private communication, and
for some channels, it coincides with the private information, thus leading to the
strong converse property holding for these channels. More generally, however, the
best statement we can make is that the regularized private information is a weak
converse rate for all quantum channels.
There are two ingredients to the proof of Theorem 16.21:
1. Achievability: We prove that $I^p_{\mathrm{reg}}(\mathcal{N})$ is an achievable rate, which involves explicitly constructing a private communication protocol. The developments in Section 16.1.4 on a lower bound for one-shot private capacity can be used, via the substitution $\mathcal{N} \to \mathcal{N}^{\otimes n}$, to argue for the existence of a private communication protocol for $\mathcal{N}$ in the asymptotic setting at the rate $I^p_{\mathrm{reg}}(\mathcal{N})$.
The achievability part of the proof establishes that $P(\mathcal{N}) \geq I^p_{\mathrm{reg}}(\mathcal{N})$.
2. Weak Converse: We prove that $I^p_{\mathrm{reg}}(\mathcal{N})$ is a weak converse rate, from which it follows that $P(\mathcal{N}) \leq I^p_{\mathrm{reg}}(\mathcal{N})$. To show that $I^p_{\mathrm{reg}}(\mathcal{N})$ is a weak converse rate, we use the upper bounds on one-shot private capacity from Section 16.1.3 to conclude that every achievable rate $R$ satisfies $R \leq I^p_{\mathrm{reg}}(\mathcal{N})$.
We first establish in Section 16.2.1 that the quantity $I^p_{\mathrm{reg}}(\mathcal{N})$ is an achievable rate for private communication over $\mathcal{N}$. Then, in Section 16.2.2, we prove that $I^p_{\mathrm{reg}}(\mathcal{N})$ is a weak converse rate.
Before proceeding, we establish a relationship between the private information
of a quantum channel and its coherent information, which mirrors the operational
relationship established in Section 16.1.1. This relationship gives another way
to arrive at the conclusion that the private capacity of a quantum channel is not
smaller than its quantum capacity.

Theorem 16.22
For a quantum channel N 𝐴→𝐵 , its private information is not smaller than its
coherent information:
𝐼 𝑐 (N) ≤ 𝐼 𝑝 (N), (16.2.15)
where the coherent information is defined in (7.11.107) and the private infor-
mation in (16.2.11). As a consequence, the private capacity is not smaller than
the quantum capacity:
𝑄(N) ≤ 𝑃(N). (16.2.16)

Proof: Picking a pure-state ensemble in (16.2.11), i.e., {𝑝(𝑥), 𝜓 𝑥𝐴 }𝑥∈X , and setting
∑︁
𝜌 𝑋 𝐵𝐸 B 𝑝(𝑥)|𝑥⟩⟨𝑥| 𝑋 ⊗ UN 𝑥
𝐴→𝐵𝐸 (𝜓 𝐴 ), (16.2.17)
𝑥∈X

with UN
𝐴→𝐵𝐸 an isometric channel extending N 𝐴→𝐵 , we find that
𝐼 𝑝 (N) ≥ 𝐼 (𝑋; 𝐵) 𝜌 − 𝐼 (𝑋; 𝐸) 𝜌 (16.2.18)

= 𝐻 (𝐵) 𝜌 − 𝐻 (𝐵|𝑋) 𝜌 − 𝐻 (𝐸) 𝜌 − 𝐻 (𝐸 |𝑋) 𝜌 (16.2.19)
= 𝐻 (𝐵) 𝜌 − 𝐻 (𝐸) 𝜌 . (16.2.20)
The first equality follows from rewriting the mutual information, and the second
follows because the conditional entropies can be written as
∑︁
𝐻 (𝐵|𝑋) 𝜌 = 𝑝(𝑥)𝐻 (Tr𝐸 [UN 𝑥
𝐴→𝐵𝐸 (𝜓 𝐴 )]), (16.2.21)
𝑥∈X
∑︁
𝐻 (𝐸 |𝑋) 𝜌 = 𝑝(𝑥)𝐻 (Tr 𝐵 [UN 𝑥
𝐴→𝐵𝐸 (𝜓 𝐴 )]). (16.2.22)
𝑥∈X
They are equal because the entropies of the marginal states of a pure bipartite state
are equal. Now consider that the reduced state of the 𝐵𝐸 systems is
!
∑︁ ∑︁
𝜌 𝐵𝐸 = 𝑝(𝑥)UN 𝑥 N
𝐴→𝐵𝐸 (𝜓 𝐴 ) = U 𝐴→𝐵𝐸 𝑝(𝑥)𝜓 𝑥𝐴 . (16.2.23)
𝑥∈X 𝑥∈X
Since we can realize an arbitrary input density operator by taking convex com-
binations of pure states, and by applying (7.11.113), we conclude the claim in
(16.2.15).
1052
Chapter 16: Private Communication

The conclusion in (16.2.16) follows by applying (16.2.15) and Theorems 14.16

and 16.21. ■
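
The key step of the proof above, namely that $I(X;B)_\rho - I(X;E)_\rho = H(B)_\rho - H(E)_\rho$ for a pure-state ensemble, can be checked numerically. The following Python sketch does so for the qubit amplitude damping channel; the choice of channel, the damping parameter 0.3, the ensemble, and all function names are assumptions made purely for illustration and do not come from the text.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def amplitude_damping_isometry(gamma):
    """Isometry V: A -> B (x) E with V|psi> = sum_i K_i|psi> (x) |i>_E."""
    kraus = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
             np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]
    V = np.zeros((4, 2))
    for e, K in enumerate(kraus):
        for b in range(2):
            V[2 * b + e, :] = K[b, :]  # row index is 2*b + e
    return V

def marginals(V, rho_A):
    """Reduced states (rho_B, rho_E) of V rho_A V^dagger."""
    rho_BE = (V @ rho_A @ V.conj().T).reshape(2, 2, 2, 2)  # indices b, e, b', e'
    rho_B = np.einsum('aebe->ab', rho_BE)  # trace out E
    rho_E = np.einsum('bebf->ef', rho_BE)  # trace out B
    return rho_B, rho_E

def private_and_coherent(V, probs, kets):
    """Return (I(X;B) - I(X;E), H(B) - H(E)) for a pure-state ensemble."""
    states = [np.outer(k, np.conj(k)) for k in kets]
    avg = sum(p * s for p, s in zip(probs, states))
    rho_B, rho_E = marginals(V, avg)
    cond_B = sum(p * entropy(marginals(V, s)[0]) for p, s in zip(probs, states))
    cond_E = sum(p * entropy(marginals(V, s)[1]) for p, s in zip(probs, states))
    private_term = (entropy(rho_B) - cond_B) - (entropy(rho_E) - cond_E)
    coherent_term = entropy(rho_B) - entropy(rho_E)
    return private_term, coherent_term

V = amplitude_damping_isometry(0.3)
kets = [np.array([1, 0]),
        np.array([1, 1]) / np.sqrt(2),
        np.array([1, 1j]) / np.sqrt(2)]
private_term, coherent_term = private_and_coherent(V, [0.5, 0.3, 0.2], kets)
print(private_term, coherent_term)
```

The two printed numbers agree up to numerical precision because each conditional output state on $BE$ is pure, so that $H(B|X)_\rho = H(E|X)_\rho$, as in (16.2.21) and (16.2.22).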

16.2.1 Proof of Achievability

In this section, we prove that $I^p_{\mathrm{reg}}(\mathcal{N})$ is an achievable rate for private communication
over a channel $\mathcal{N}$.
First, recall Corollary 16.14. Applying it, we find the following:

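Corollary 16.23 Lower Bound for Private Communication in the Asymptotic Setting

Let $\mathcal{N}_{A\to B}$ be a quantum channel, and let $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ be an isometric channel
extending it. For all $n \in \mathbb{N}$, $\varepsilon \in (0,1)$, $\alpha \in (0,1)$, and $\beta > 1$, there exists an
$(n, |\mathcal{M}|, \varepsilon)$ private communication protocol for $\mathcal{N}_{A\to B}$ such that the rate $\frac{\log_2 |\mathcal{M}|}{n}$
satisfies
$$\frac{\log_2 |\mathcal{M}|}{n} \geq \sup_{\{p(x),\rho_A^x\}_{x\in\mathcal{X}}} \left( I_\alpha(X;B)_\rho - \widetilde{I}'_\beta(X;E)_\rho \right) - \frac{1}{n} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right), \tag{16.2.24}$$
where the information quantities are evaluated with respect to the state
$$\rho_{XBE} := \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes \mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho_A^x), \tag{16.2.25}$$
and the function $f$ is defined in (16.1.142).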

Proof: We evaluate the quantities in Corollary 16.14 with respect to the tensor-power
isometric channel $(\mathcal{U}^{\mathcal{N}}_{A\to BE})^{\otimes n}$ and choose the input ensemble to be a
tensor-power ensemble $\{p(x_1)\cdots p(x_n), \rho_{A_1}^{x_1}\otimes\cdots\otimes\rho_{A_n}^{x_n}\}_{(x_1,\ldots,x_n)\in\mathcal{X}^{\times n}}$. This implies
that the state being evaluated for the Rényi information quantities is the tensor-power
state $\rho_{XBE}^{\otimes n}$, where $\rho_{XBE}$ is defined in (16.2.25). Let $\delta = \frac{\varepsilon'}{2}$, $\eta = \frac{\varepsilon'}{4}$, $\nu = \frac{\varepsilon'}{4}$, and
$\zeta = \frac{\varepsilon'}{2}$. Exploiting the additivity of $I_\alpha$ and $\widetilde{I}'_\beta$, substituting into the inequality in
(16.1.139), and simplifying leads to the inequality
$$\frac{\log_2 |\mathcal{M}|}{n} \geq \sup_{\{p(x),\rho_A^x\}_{x\in\mathcal{X}}} \left( I_\alpha(X;B)_\rho - \widetilde{I}'_\beta(X;E)_\rho \right) - \frac{1}{n} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right). \tag{16.2.26}$$
This concludes the proof. ■

Using the inequality in (16.2.24), we conclude the following lower bound on

the private capacity of a quantum channel:

Theorem 16.24 Achievability of Private Information for Private Communication
The private information 𝐼 𝑝 (N) of a quantum channel N, as defined in (16.2.11),
is an achievable rate for private communication over N. In other words,

𝑃(N) ≥ 𝐼 𝑝 (N), (16.2.27)

for every quantum channel N.

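Proof: Let $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ be an isometric channel extending the channel $\mathcal{N}_{A\to B}$ of interest.
Fix $\varepsilon \in (0,1]$ and $\delta > 0$. Let $\delta_1, \delta_2 > 0$ be such that $\delta = \delta_1 + \delta_2$. Set $\alpha \in (0,1)$
and $\beta > 1$ such that
$$\delta_1 \geq I(X;B)_\rho - I(X;E)_\rho - \left( I_\alpha(X;B)_\rho - \widetilde{I}'_\beta(X;E)_\rho \right), \tag{16.2.28}$$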
where the information quantities are evaluated with respect to the state $\rho_{XBE}$ in
(16.2.25). Note that this is possible because $I_\alpha(X;B)_\rho$ increases monotonically with
increasing $\alpha \in (0,1)$ (see Proposition 7.23) and $\widetilde{I}'_\beta(X;E)_\rho$ decreases monotonically
with decreasing $\beta$ (see Proposition 7.31), so that
\begin{align}
\lim_{\alpha\to 1^-} I_\alpha(X;B)_\rho &= \sup_{\alpha\in(0,1)} I_\alpha(X;B)_\rho, \tag{16.2.29}\\
\lim_{\beta\to 1^+} \widetilde{I}'_\beta(X;E)_\rho &= \inf_{\beta\in(1,\infty)} \widetilde{I}'_\beta(X;E)_\rho. \tag{16.2.30}
\end{align}
Also,
\begin{align}
I(X;B)_\rho &= \lim_{\alpha\to 1^-} I_\alpha(X;B)_\rho, \tag{16.2.31}\\
I(X;E)_\rho &= \lim_{\beta\to 1^+} \widetilde{I}'_\beta(X;E)_\rho. \tag{16.2.32}
\end{align}

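With $\alpha$ and $\beta$ chosen such that (16.2.28) holds, take $n$ large enough so that
$$\delta_2 \geq \frac{1}{n} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right). \tag{16.2.33}$$
Now, we use the fact that for the $n$ and $\varepsilon$ chosen above, there exists an $(n, |\mathcal{M}|, \varepsilon)$
protocol such that
$$\frac{\log_2 |\mathcal{M}|}{n} \geq I_\alpha(X;B)_\rho - \widetilde{I}'_\beta(X;E)_\rho - \frac{1}{n} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right), \tag{16.2.34}$$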
which follows from Corollary 16.23 above. Rearranging the right-hand side of this
inequality, and using (16.2.28), (16.2.33), and (16.2.34), we find that
\begin{align}
\frac{\log_2 |\mathcal{M}|}{n} &\geq I(X;B)_\rho - I(X;E)_\rho \notag\\
&\quad - \left( I(X;B)_\rho - I(X;E)_\rho - \left( I_\alpha(X;B)_\rho - \widetilde{I}'_\beta(X;E)_\rho \right) + \frac{1}{n} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right) \right) \tag{16.2.35}\\
&\geq I(X;B)_\rho - I(X;E)_\rho - (\delta_1 + \delta_2) \tag{16.2.36}\\
&= I(X;B)_\rho - I(X;E)_\rho - \delta. \tag{16.2.37}
\end{align}
We have thus shown that there exists an $(n, |\mathcal{M}|, \varepsilon)$ private communication protocol
with rate $\frac{\log_2 |\mathcal{M}|}{n} \geq I(X;B)_\rho - I(X;E)_\rho - \delta$. Therefore, there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$
private communication protocol with $R = I(X;B)_\rho - I(X;E)_\rho$ for all sufficiently
large $n$ such that (16.2.33) holds. Since $\varepsilon$ and $\delta$ are arbitrary, we conclude that for
all $\varepsilon \in (0,1]$, $\delta > 0$, and sufficiently large $n$, there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ private
communication protocol. This means that, by definition, $I(X;B)_\rho - I(X;E)_\rho$ is
an achievable rate. Since this is true for all input ensembles, we can finally take a
supremum over all input ensembles to arrive at the conclusion in (16.2.27). ■

16.2.1.1 Proof of the Achievability Part of Theorem 16.21

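Let $\{p(x), \rho^x_{A^k}\}_{x\in\mathcal{X}}$ be an arbitrary ensemble over $k$ channel input systems, with
$k \in \mathbb{N}$. Let
$$\tau_{XB^kE^k} := \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes (\mathcal{U}^{\mathcal{N}}_{A\to BE})^{\otimes k}(\rho^x_{A^k}). \tag{16.2.38}$$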

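Fix $\varepsilon \in (0,1]$ and $\delta > 0$. Let $\delta_1, \delta_2 > 0$ be such that $\delta = \delta_1 + \delta_2$. Set $\alpha \in (0,1)$
and $\beta > 1$ such that
$$\delta_1 \geq \frac{1}{k}\left( I(X;B^k)_\tau - I(X;E^k)_\tau \right) - \frac{1}{k}\left( I_\alpha(X;B^k)_\tau - \widetilde{I}'_\beta(X;E^k)_\tau \right), \tag{16.2.39}$$
which is possible based on the arguments given in the proof of Theorem 16.24
above. Then, with this choice of $\alpha$ and $\beta$, take $n$ large enough so that
$$\delta_2 \geq \frac{1}{kn} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right). \tag{16.2.40}$$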
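Now, we use the fact that, for the chosen $n$ and $\varepsilon$, there exists an $(n, |\mathcal{M}|, \varepsilon)$ private
communication protocol such that (16.2.24) holds, i.e.,
$$\frac{\log_2 |\mathcal{M}|}{n} \geq I_\alpha(X;B^k)_\tau - \widetilde{I}'_\beta(X;E^k)_\tau - \frac{1}{n} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right). \tag{16.2.41}$$
Dividing both sides by $k$ gives
$$\frac{\log_2 |\mathcal{M}|}{kn} \geq \frac{1}{k}\left( I_\alpha(X;B^k)_\tau - \widetilde{I}'_\beta(X;E^k)_\tau \right) - \frac{1}{kn} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right). \tag{16.2.42}$$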
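Rearranging the right-hand side of this inequality, and using (16.2.39)–(16.2.42),
we find that
\begin{align}
\frac{\log_2 |\mathcal{M}|}{kn} &\geq \frac{1}{k}\left( I(X;B^k)_\tau - I(X;E^k)_\tau \right) \notag\\
&\quad - \frac{1}{k}\left( I(X;B^k)_\tau - I(X;E^k)_\tau - \left( I_\alpha(X;B^k)_\tau - \widetilde{I}'_\beta(X;E^k)_\tau \right) \right) \notag\\
&\quad - \frac{1}{kn} f\!\left(\varepsilon', \frac{\varepsilon'}{2}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{4}, \frac{\varepsilon'}{2}, \alpha, \beta\right) \tag{16.2.43}\\
&\geq \frac{1}{k}\left( I(X;B^k)_\tau - I(X;E^k)_\tau \right) - (\delta_1 + \delta_2) \tag{16.2.44}\\
&= \frac{1}{k}\left( I(X;B^k)_\tau - I(X;E^k)_\tau \right) - \delta. \tag{16.2.45}
\end{align}
Thus, there exists a $(kn, |\mathcal{M}|, \varepsilon)$ private communication protocol with rate
$\frac{\log_2 |\mathcal{M}|}{kn} \geq \frac{1}{k}\left( I(X;B^k)_\tau - I(X;E^k)_\tau \right) - \delta$. Therefore, letting $n' \equiv kn$, we conclude that there
exists an $(n', 2^{n'(R-\delta)}, \varepsilon)$ private communication protocol with
$$R = \frac{1}{k}\left( I(X;B^k)_\tau - I(X;E^k)_\tau \right) \tag{16.2.46}$$
for all sufficiently large $n$ such that (16.2.40) holds. Since $\varepsilon$ and $\delta$ are arbitrary,
we conclude that for all $\varepsilon \in (0,1]$, $\delta > 0$, and sufficiently large $n$, there exists an
$(n, 2^{n\left(\frac{1}{k}\left(I(X;B^k)_\tau - I(X;E^k)_\tau\right) - \delta\right)}, \varepsilon)$ private communication protocol. This means that
$\frac{1}{k}\left( I(X;B^k)_\tau - I(X;E^k)_\tau \right)$ is an achievable rate.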

Now, since the input ensemble is arbitrary in the arguments above, we conclude
that
$$\frac{1}{k} I^p(\mathcal{N}^{\otimes k}) = \sup_{\{p(x),\rho^x_{A^k}\}_{x\in\mathcal{X}}} \frac{1}{k}\left( I(X;B^k)_\tau - I(X;E^k)_\tau \right) \tag{16.2.47}$$
is an achievable rate. Finally, since the number $k$ of instances of the channel $\mathcal{N}$ is
arbitrary, we conclude that the regularized private information $\lim_{k\to\infty} \frac{1}{k} I^p(\mathcal{N}^{\otimes k})$
is an achievable rate.

16.2.2 Proof of the Weak Converse

In order to prove the weak converse part of Theorem 16.21, we make use of
Corollary 16.8, specifically (16.1.80). Applying this inequality to the tensor-power
channel $\mathcal{N}^{\otimes n}_{A\to B}$ leads to the following:

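Proposition 16.25
Let $\mathcal{N}_{A\to B}$ be a quantum channel, and let $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ be an isometric channel
extending it. Let $n \in \mathbb{N}$ and $\varepsilon \in [0,1)$. For an $(n, |\mathcal{M}|, \varepsilon)$ private communication
protocol for $\mathcal{N}_{A\to B}$, the rate $\frac{\log_2 |\mathcal{M}|}{n}$ satisfies
$$\left(1 - \varepsilon - \sqrt{\varepsilon}\right) \frac{\log_2 |\mathcal{M}|}{n} \leq \sup_{\{p(x),\rho^x_{A^n}\}_{x\in\mathcal{X}}} \frac{1}{n}\left( I(X;B^n)_\rho - I(X;E^n)_\rho \right) + \frac{1}{n}\left( h_2(\varepsilon) + 2g(\sqrt{\varepsilon}) \right), \tag{16.2.48}$$
where the information quantities are evaluated with respect to the state
$$\rho_{XB^nE^n} := \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes (\mathcal{U}^{\mathcal{N}}_{A\to BE})^{\otimes n}(\rho^x_{A^n}), \tag{16.2.49}$$
with $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ an isometric channel extending $\mathcal{N}_{A\to B}$. Consequently,
$$\left(1 - \varepsilon - \sqrt{\varepsilon}\right) P^{n,\varepsilon}(\mathcal{N}) \leq \sup_{\{p(x),\rho^x_{A^n}\}_{x\in\mathcal{X}}} \frac{1}{n}\left( I(X;B^n)_\rho - I(X;E^n)_\rho \right) + \frac{1}{n}\left( h_2(\varepsilon) + 2g(\sqrt{\varepsilon}) \right). \tag{16.2.50}$$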

16.2.2.1 Proof of the Weak Converse Part of Theorem 16.21

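Suppose that $R$ is an achievable rate for private communication over the channel
$\mathcal{N}_{A\to B}$. Then, by definition, for all $\varepsilon \in (0,1]$, $\delta > 0$, and sufficiently large $n$, there
exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ private communication protocol for $\mathcal{N}_{A\to B}$. For all such
protocols, the inequality in (16.2.48) holds, so that
$$\left(1 - \varepsilon - \sqrt{\varepsilon}\right)(R - \delta) \leq \sup_{\{p(x),\rho^x_{A^n}\}_{x\in\mathcal{X}}} \frac{1}{n}\left( I(X;B^n)_\rho - I(X;E^n)_\rho \right) + \frac{1}{n}\left( h_2(\varepsilon) + 2g(\sqrt{\varepsilon}) \right). \tag{16.2.51}$$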
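Since the inequality holds for all sufficiently large $n$, it holds in the limit $n \to \infty$,
so that
\begin{align}
\left(1 - \varepsilon - \sqrt{\varepsilon}\right)(R - \delta) &\leq \lim_{n\to\infty} \left( \sup_{\{p(x),\rho^x_{A^n}\}_{x\in\mathcal{X}}} \frac{1}{n}\left( I(X;B^n)_\rho - I(X;E^n)_\rho \right) + \frac{1}{n}\left( h_2(\varepsilon) + 2g(\sqrt{\varepsilon}) \right) \right) \tag{16.2.52}\\
&= \lim_{n\to\infty} \sup_{\{p(x),\rho^x_{A^n}\}_{x\in\mathcal{X}}} \frac{1}{n}\left( I(X;B^n)_\rho - I(X;E^n)_\rho \right). \tag{16.2.53}
\end{align}
Then, since this inequality holds for all $\varepsilon \in (0,1)$ and $\delta > 0$, it holds in particular for $\varepsilon$
satisfying $\varepsilon + \sqrt{\varepsilon} < 1$, which gives
$$R \leq \frac{1}{1 - \varepsilon - \sqrt{\varepsilon}} \lim_{n\to\infty} \sup_{\{p(x),\rho^x_{A^n}\}_{x\in\mathcal{X}}} \frac{1}{n}\left( I(X;B^n)_\rho - I(X;E^n)_\rho \right) + \delta, \tag{16.2.54}$$
and we thus conclude that
\begin{align}
R &\leq \lim_{\varepsilon,\delta\to 0} \left( \frac{1}{1 - \varepsilon - \sqrt{\varepsilon}} \lim_{n\to\infty} \sup_{\{p(x),\rho^x_{A^n}\}_{x\in\mathcal{X}}} \frac{1}{n}\left( I(X;B^n)_\rho - I(X;E^n)_\rho \right) + \delta \right) \tag{16.2.55}\\
&= \lim_{n\to\infty} \sup_{\{p(x),\rho^x_{A^n}\}_{x\in\mathcal{X}}} \frac{1}{n}\left( I(X;B^n)_\rho - I(X;E^n)_\rho \right) \tag{16.2.56}\\
&= I^p_{\mathrm{reg}}(\mathcal{N}). \tag{16.2.57}
\end{align}
We have thus shown that the quantity $I^p_{\mathrm{reg}}(\mathcal{N})$ is a weak converse rate for private
communication over $\mathcal{N}$.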

16.2.3 Relative Entropy of Entanglement Strong Converse Bound

Except for channels for which the private information is known to be additive (such
as the class of degradable channels; see Section 16.3.1 below), the private capacity
of a channel is difficult to compute. This prompts us to find upper bounds on the
private capacity. In this section, we do so in terms of the channel’s relative entropy
of entanglement, and in terms of the channel’s squashed entanglement in the next
section.
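We begin by recalling the bound from (16.1.97), which holds for all $(|\mathcal{M}|, \varepsilon)$
private communication protocols and for all $\alpha > 1$:
$$P^{\varepsilon}(\mathcal{N}) \leq \widetilde{E}_\alpha(\mathcal{N}) + \frac{\alpha}{\alpha - 1} \log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{16.2.58}$$
For $n$ channel uses, the bound in (16.1.97) becomes
$$\frac{\log_2 |\mathcal{M}|}{n} \leq \frac{1}{n} \widetilde{E}_\alpha(\mathcal{N}^{\otimes n}) + \frac{\alpha}{n(\alpha - 1)} \log_2\!\left(\frac{1}{1-\varepsilon}\right), \tag{16.2.59}$$
which holds for all $\alpha > 1$ and for all $(n, |\mathcal{M}|, \varepsilon)$ private communication protocols,
with $n \in \mathbb{N}$ and $\varepsilon \in [0,1)$. We can simplify this inequality by making use of the
following fact: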

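Proposition 16.26 Weak Subadditivity of a Channel’s Rényi Relative Entropy of Entanglement

Let $\mathcal{N}_{A\to B}$ be a quantum channel, with $d_A$ the dimension of the input system $A$.
For all $\alpha > 1$ and $n \in \mathbb{N}$, we have
$$\widetilde{E}_\alpha(\mathcal{N}^{\otimes n}) \leq n \widetilde{E}_\alpha(\mathcal{N}) + \frac{\alpha (d_A^2 - 1)}{\alpha - 1} \log_2(n + 1). \tag{16.2.60}$$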


Proof: The proof is identical to the proof of Proposition 14.21, but making use of
Proposition 10.9 at the beginning instead of Proposition 10.12. ■

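Combining (16.2.60) with (16.2.59), we conclude the following upper bound
on the rate of an arbitrary $(n, |\mathcal{M}|, \varepsilon)$ private communication protocol:
$$\frac{\log_2 |\mathcal{M}|}{n} \leq \widetilde{E}_\alpha(\mathcal{N}) + \frac{\alpha}{n(\alpha - 1)} \log_2\!\left(\frac{(n+1)^{d_A^2 - 1}}{1-\varepsilon}\right), \tag{16.2.61}$$
which holds for all $\alpha > 1$. Consequently, the following upper bound holds for the
$n$-shot private capacity:
$$P^{n,\varepsilon}(\mathcal{N}) \leq \widetilde{E}_\alpha(\mathcal{N}) + \frac{\alpha}{n(\alpha - 1)} \log_2\!\left(\frac{(n+1)^{d_A^2 - 1}}{1-\varepsilon}\right), \tag{16.2.62}$$
for all $\alpha > 1$.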
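
To get a feel for how quickly the correction term in (16.2.61) vanishes, the following Python sketch evaluates the right-hand side of that inequality for increasing $n$. The function name and the numerical inputs, in particular the placeholder value used for $\widetilde{E}_\alpha(\mathcal{N})$, are hypothetical and chosen only for illustration.

```python
import math

def strong_converse_rate_bound(e_alpha, alpha, n, d_A, eps):
    """Right-hand side of (16.2.61).

    e_alpha is a placeholder for the value of the channel's Renyi relative
    entropy of entanglement, which is not computed here.
    """
    if not (alpha > 1 and n >= 1 and 0 <= eps < 1):
        raise ValueError("need alpha > 1, n >= 1, and 0 <= eps < 1")
    # log2((n+1)^(d_A^2 - 1) / (1 - eps)), expanded to avoid huge powers
    log_term = (d_A**2 - 1) * math.log2(n + 1) + math.log2(1 / (1 - eps))
    return e_alpha + alpha / (n * (alpha - 1)) * log_term

# Hypothetical inputs: a qubit input system and a placeholder value of 0.5.
for n in (10, 100, 1000, 10**6):
    print(n, strong_converse_rate_bound(0.5, alpha=2.0, n=n, d_A=2, eps=0.1))
```

For every fixed $\alpha > 1$ and $\varepsilon \in [0,1)$, the second term decays like $\frac{\log_2 n}{n}$, so the bound approaches $\widetilde{E}_\alpha(\mathcal{N})$ as $n$ grows.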
With this bound, we are now ready to state the main result of this section, which
is that a channel’s relative entropy of entanglement is an upper bound on the strong
converse private capacity of an arbitrary quantum channel N.

Theorem 16.27 Strong Converse Upper Bound on Private Capacity

A channel’s relative entropy of entanglement, denoted by $E_R(\mathcal{N})$, is a strong
converse rate for private communication over $\mathcal{N}$. In other words, $\widetilde{P}(\mathcal{N}) \leq E_R(\mathcal{N})$
for every quantum channel $\mathcal{N}$.

Recall from (10.3.6) that
$$E_R(\mathcal{N}) := \sup_{\psi_{SA}} \inf_{\sigma_{SB}\in\operatorname{SEP}(S:B)} D(\mathcal{N}_{A\to B}(\psi_{SA}) \Vert \sigma_{SB}), \tag{16.2.63}$$
where the supremum is with respect to every pure state $\psi_{SA}$ with $d_S = d_A$.

Proof: The proof here is identical to that given for Theorem 14.22, but using the
relative entropy of entanglement 𝐸 𝑅 (N) instead of the Rains information 𝑅(N). ■

16.2.4 Squashed Entanglement Weak Converse Bound

We showed earlier in Section 16.1.3.3 that the squashed entanglement gives an
upper bound on the one-shot private capacity. Here we extend this result to
the $n$-shot setting, establishing a bound on the $n$-shot private capacity, and we
conclude from it that the squashed entanglement of a quantum channel is a weak
converse rate for private communication over it. Later on in the book, in Chapter 20,
we prove that the squashed entanglement of a channel is an upper bound on its
secret-key-agreement capacity, which generally can be much larger than its private
capacity. Thus, the squashed entanglement is generally a loose upper bound
on the (unassisted) private capacity of a channel.

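Theorem 16.28 Squashed Entanglement Upper Bound on $n$-Shot Private Capacity

Let $\mathcal{N}_{A\to B}$ be a quantum channel, and let $\varepsilon \in [0,1)$. For all $(n, |\mathcal{M}|, \varepsilon)$ private
communication protocols for $\mathcal{N}$, the following bound holds
$$\left(1 - 2\sqrt{\varepsilon}\right) \frac{\log_2 |\mathcal{M}|}{n} \leq E_{\mathrm{sq}}(\mathcal{N}) + \frac{2}{n} g_2(\sqrt{\varepsilon}), \tag{16.2.64}$$
where $E_{\mathrm{sq}}(\mathcal{N})$ is the squashed entanglement of the channel $\mathcal{N}$, defined in
Section 10.5.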

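Proof: Plugging the tensor-power channel $\mathcal{N}^{\otimes n}$ into the bound from Theorem 16.11,
we conclude the following bound
$$\left(1 - 2\sqrt{\varepsilon}\right) \frac{\log_2 |\mathcal{M}|}{n} \leq \frac{1}{n} E_{\mathrm{sq}}(\mathcal{N}^{\otimes n}) + \frac{2}{n} g_2(\sqrt{\varepsilon}). \tag{16.2.65}$$
The desired statement then follows from the additivity of squashed entanglement of
a channel (Corollary 10.21), which implies that $\frac{1}{n} E_{\mathrm{sq}}(\mathcal{N}^{\otimes n}) = E_{\mathrm{sq}}(\mathcal{N})$. ■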
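
The bound in (16.2.64) can be rearranged into an explicit rate bound whenever $1 - 2\sqrt{\varepsilon} > 0$, i.e., for $\varepsilon < \frac{1}{4}$. The Python sketch below evaluates it. It assumes that $g_2(\delta) = (\delta + 1)\log_2(\delta + 1) - \delta \log_2 \delta$, which is our reading of the function used in Theorem 16.11 and should be checked against its definition there; the value passed for $E_{\mathrm{sq}}(\mathcal{N})$ is a hypothetical placeholder.

```python
import math

def g2(delta):
    """Assumed form: g2(d) = (d + 1) log2(d + 1) - d log2(d), with g2(0) = 0."""
    if delta == 0:
        return 0.0
    return (delta + 1) * math.log2(delta + 1) - delta * math.log2(delta)

def squashed_rate_bound(e_sq, n, eps):
    """Upper bound on log2|M|/n obtained by rearranging (16.2.64).

    e_sq is a placeholder for the channel's squashed entanglement.
    """
    if not 0 <= eps < 0.25:
        raise ValueError("the rearrangement needs 1 - 2*sqrt(eps) > 0")
    root = math.sqrt(eps)
    return (e_sq + 2 * g2(root) / n) / (1 - 2 * root)

for n in (10, 1000, 10**9):
    print(n, squashed_rate_bound(1.0, n, eps=0.01))
```

For fixed $\varepsilon > 0$, the bound tends to $E_{\mathrm{sq}}(\mathcal{N}) / (1 - 2\sqrt{\varepsilon})$ rather than to $E_{\mathrm{sq}}(\mathcal{N})$ as $n \to \infty$. This is why the bound yields only a weak converse: the limit $\varepsilon \to 0$ is needed to recover $E_{\mathrm{sq}}(\mathcal{N})$.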

Theorem 16.29 Weak Converse Upper Bound on Private Capacity

The squashed entanglement 𝐸 sq (N) of a quantum channel N 𝐴→𝐵 is a weak
converse rate for private communication over N. In other words, 𝑃(N) ≤
𝐸 sq (N) for every quantum channel N.

Proof: We exploit the bound from Theorem 16.28 and an argument similar to that
from Section 15.2.4 to conclude the desired statement. ■

16.3 Examples
We now consider the private capacity for particular classes of quantum channels.
As we indicated earlier, computing the private capacity of an arbitrary channel is a
difficult task. This task is made more difficult by the fact that, in some cases, the
private information is known to be strictly superadditive in the following sense:

$$I^p(\mathcal{N}^{\otimes n}) > n I^p(\mathcal{N}). \tag{16.3.1}$$

This fact confirms that regularization of the private information is really needed
in general in order to compute the private capacity, and that additivity of private
information does not hold for all channels. Please consult the Bibliographic Notes in
Section 16.5 for more information about strict superadditivity of private information
for certain quantum channels.
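Before starting the development below, recall that the private information of a
channel $\mathcal{N}_{A\to B}$ is defined as
$$I^p(\mathcal{N}) = \sup_{\{p(x),\rho_A^x\}_{x\in\mathcal{X}}} \left( I(X;B)_\rho - I(X;E)_\rho \right), \tag{16.3.2}$$
where the state $\rho_{XBE}$ is defined as
$$\rho_{XBE} := \sum_{x\in\mathcal{X}} p(x)|x\rangle\langle x|_X \otimes \mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho_A^x), \tag{16.3.3}$$
with $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ an isometric channel extending $\mathcal{N}_{A\to B}$ and the optimization over every
ensemble $\{p(x), \rho_A^x\}_{x\in\mathcal{X}}$.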

16.3.1 Degradable Channels

Recall from Definition 4.6 that a channel N 𝐴→𝐵 is degradable if there exists a
degrading channel D𝐵→𝐸 such that

N𝑐 = D ◦ N, (16.3.4)

where $\mathcal{N}^c$ is a channel complementary to $\mathcal{N}$ (see Definition 4.5) and $d_E \geq \operatorname{rank}(\Gamma^{\mathcal{N}}_{AB})$.
In particular, if $V_{A\to BE}$ is an isometric extension of $\mathcal{N}$, so that

N(𝜌) = Tr𝐸 [𝑉 𝜌𝑉 † ] (16.3.5)

for every state 𝜌, then

N𝑐 (𝜌) = Tr 𝐵 [𝑉 𝜌𝑉 † ]. (16.3.6)

We now show that the private information is equal to the coherent information
for every degradable channel, i.e.,

𝐼 𝑝 (N) = 𝐼 𝑐 (N), (16.3.7)

where 𝐼 𝑐 (N) is defined in (7.11.107). As a consequence of this observation and the

fact that a tensor product of degradable channels is also degradable, it follows that
the private capacity of a degradable channel is equal to its coherent information,
and there is no difference between the private capacity and the quantum capacity in
this case, i.e.,

𝑄(N) = 𝑃(N) = 𝐼 𝑝 (N) = 𝐼 𝑐 (N) for every degradable channel N. (16.3.8)

Proposition 16.30 Private Information of Degradable Channels

Let N 𝐴→𝐵 be a degradable channel. Then its private information is equal to its
coherent information:
𝐼 𝑝 (N) = 𝐼 𝑐 (N), (16.3.9)
where the channel’s private information 𝐼 𝑝 (N) is defined in (16.3.2) and its
coherent information in (7.11.107). As a consequence, (16.3.8) holds.

Proof: By Theorem 16.22, we only need to prove the inequality $I^p(\mathcal{N}) \leq I^c(\mathcal{N})$
for the case of a degradable channel. Let $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ be an isometric channel extending
$\mathcal{N}_{A\to B}$. Let $\rho_A^x = \sum_y p(y|x) \psi_A^{x,y}$ be a spectral decomposition of the input state $\rho_A^x$,
and define the following extension of the state $\rho_{XBE}$ in (16.3.3):
$$\rho_{XYBE} = \sum_{x,y} p(x) p(y|x) |x\rangle\langle x|_X \otimes |y\rangle\langle y|_Y \otimes \mathcal{U}^{\mathcal{N}}_{A\to BE}(\psi_A^{x,y}). \tag{16.3.10}$$
Consider that
\begin{align}
I(X;B)_\rho - I(X;E)_\rho
&= I(XY;B)_\rho - I(Y;B|X)_\rho - \left( I(XY;E)_\rho - I(Y;E|X)_\rho \right) \tag{16.3.11}\\
&= I(XY;B)_\rho - I(XY;E)_\rho - \left( I(Y;B|X)_\rho - I(Y;E|X)_\rho \right) \tag{16.3.12}\\
&\leq I(XY;B)_\rho - I(XY;E)_\rho \tag{16.3.13}\\
&= H(B)_\rho - H(B|XY)_\rho - \left( H(E)_\rho - H(E|XY)_\rho \right) \tag{16.3.14}\\
&= H(B)_\rho - H(E)_\rho \tag{16.3.15}\\
&\leq I^c(\mathcal{N}). \tag{16.3.16}
\end{align}
The first equality follows by applying the chain rule for conditional mutual information.
The first inequality follows by applying the data-processing inequality for
conditional mutual information and the fact that there is a degrading channel $\mathcal{D}_{B\to E}$
such that $\rho_{XYE} = \mathcal{D}_{B\to E}(\rho_{XYB})$. The last few steps follow the same reasoning
given in the proof of Theorem 16.22. ■
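
As a concrete illustration, consider the qubit amplitude damping channel with damping parameter $\gamma \in [0, \frac{1}{2}]$, which is a standard example of a degradable channel (this example is ours and is not taken from the development above). For a diagonal input state with excited-state population $p$, the output entropies are $H(B) = h_2((1-\gamma)p)$ and $H(E) = h_2(\gamma p)$. The Python sketch below maximizes $H(B) - H(E)$ over a grid of such inputs; restricting to diagonal inputs and using a grid search are simplifying assumptions made for this illustration.

```python
import numpy as np

def h2(x):
    """Binary entropy in bits, clipped away from 0 and 1 for numerical safety."""
    x = np.clip(x, 1e-15, 1 - 1e-15)
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def max_entropy_difference(gamma, grid=10001):
    """Grid maximum of H(B) - H(E) = h2((1 - gamma) p) - h2(gamma p) over p in [0, 1]."""
    p = np.linspace(0.0, 1.0, grid)
    values = h2((1 - gamma) * p) - h2(gamma * p)
    return float(values.max())

for gamma in (0.0, 0.1, 0.3, 0.5):
    print(gamma, max_entropy_difference(gamma))
```

The value equals one bit at $\gamma = 0$ (the identity channel), decreases as $\gamma$ grows, and vanishes at $\gamma = \frac{1}{2}$, where the outputs to the receiver and the eavesdropper have the same entropy for every diagonal input.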

16.3.1.1 Generalized Dephasing Channels

Recall the definition of a generalized dephasing channel from the discussion

surrounding (14.3.30). Similar to what was found for generalized dephasing
channels in Section 14.3.1.1, we can also consider the question of whether the
strong converse property holds for the private capacity of these channels. Indeed, it
is the case, and the reasoning is essentially the same as that given in the proof of
Theorem 14.28, except that we use the strong converse bound on private capacity
given by the relative entropy of entanglement. The main observation to make while
examining the proof of Theorem 14.28 is that the state in (14.3.48) is a separable
state.

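Theorem 16.31 Private Capacity of Generalized Dephasing Channels

For every generalized dephasing channel $\mathcal{N}$ (defined by the isometric extension
in (14.3.30)), the following equalities hold
\begin{align}
P(\mathcal{N}) &= \widetilde{P}(\mathcal{N}) = E_R(\mathcal{N}) = I^p(\mathcal{N}) \tag{16.3.17}\\
&= Q(\mathcal{N}) = \widetilde{Q}(\mathcal{N}) = R(\mathcal{N}) = I^c(\mathcal{N}). \tag{16.3.18}
\end{align}

Proof: The following inequalities hold in general
\begin{align}
I^c(\mathcal{N}) &\leq Q(\mathcal{N}) \leq P(\mathcal{N}) \leq \widetilde{P}(\mathcal{N}) \leq E_R(\mathcal{N}), \tag{16.3.19}\\
I^c(\mathcal{N}) &\leq I^p(\mathcal{N}) \leq P(\mathcal{N}), \tag{16.3.20}\\
I^c(\mathcal{N}) &\leq Q(\mathcal{N}) \leq \widetilde{Q}(\mathcal{N}) \leq R(\mathcal{N}) \leq E_R(\mathcal{N}), \tag{16.3.21}
\end{align}
and the reasoning given above establishes that $I^c(\mathcal{N}) = E_R(\mathcal{N})$ for generalized
dephasing channels. ■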
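
For a concrete instance, take the qubit dephasing channel $\rho \mapsto (1-p)\rho + pZ\rho Z$, a standard example of a generalized dephasing channel (this example is ours). The Python sketch below evaluates $H(B) - H(E)$ at the maximally mixed input by building the output and environment states from the Kraus operators, and compares the result with the well-known value $1 - h_2(p)$. The function names are hypothetical, and evaluating at a single input state is an assumption of the sketch rather than an optimization.

```python
import math
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def dephasing_entropy_difference(p):
    """H(B) - H(E) at the maximally mixed input of the qubit dephasing channel."""
    Z = np.diag([1.0, -1.0])
    kraus = [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * Z]
    rho_A = np.eye(2) / 2
    # Receiver's state: sum_i K_i rho K_i^dagger
    rho_B = sum(K @ rho_A @ K.conj().T for K in kraus)
    # Environment's state: entries Tr[K_i rho K_j^dagger]
    rho_E = np.array([[np.trace(Ki @ rho_A @ Kj.conj().T) for Kj in kraus]
                      for Ki in kraus])
    return entropy(rho_B) - entropy(rho_E)

for p in (0.05, 0.2, 0.5):
    print(p, dephasing_entropy_difference(p), 1 - h2(p))
```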

16.3.2 Anti-Degradable Channels

Let us consider the private capacity for anti-degradable channels. Recall from
Definition 4.6 that a channel N 𝐴→𝐵 is anti-degradable if there exists an anti-
degrading channel A𝐸→𝐵 such that

N 𝐴→𝐵 = A𝐸→𝐵 ◦ N𝑐𝐴→𝐸 , (16.3.22)

where $\mathcal{N}^c_{A\to E}$ is a channel complementary to $\mathcal{N}_{A\to B}$ and $d_E \geq \operatorname{rank}(\Gamma^{\mathcal{N}}_{AB})$.

Proposition 16.32 Private Information for Anti-Degradable Channels

The private information vanishes for all anti-degradable channels, i.e., 𝐼 𝑝 (N) =
0 for every anti-degradable channel N. Therefore, the private capacity of an anti-
degradable channel is equal to zero, i.e., 𝑃(N) = 0 for every anti-degradable
channel N.

Proof: The first claim is a direct consequence of the definition of the private
information in (16.3.2), the fact that there is an anti-degrading channel A𝐸→𝐵
such that 𝜌 𝑋 𝐵 = A𝐸→𝐵 (𝜌 𝑋 𝐸 ), where the state 𝜌 𝑋 𝐵𝐸 is defined in (16.3.3), and the
data-processing inequality for mutual information. The second claim follows from
the regularized expression for private capacity from Theorem 16.21 and the fact
that a tensor product of anti-degradable channels is anti-degradable. ■
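
The data-processing argument above can be probed numerically. The Python sketch below uses the qubit amplitude damping channel with damping parameter 0.7, which is anti-degradable because its damping parameter is at least $\frac{1}{2}$ (this example is ours). It samples random three-state ensembles and records the largest value of $I(X;B)_\rho - I(X;E)_\rho$ that it finds. The sampling scheme, the ensemble size, and the function names are assumptions made for illustration; a random search gives supporting evidence only and does not replace the proof.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def bob_and_eve(gamma, rho):
    """Output states of the amplitude damping channel and of its complement."""
    kraus = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
             np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]
    rho_B = sum(K @ rho @ K.conj().T for K in kraus)
    rho_E = np.array([[np.trace(Ki @ rho @ Kj.conj().T) for Kj in kraus]
                      for Ki in kraus])
    return rho_B, rho_E

def private_term(gamma, probs, states):
    """I(X;B) - I(X;E) for the ensemble {probs[x], states[x]}."""
    avg = sum(p * s for p, s in zip(probs, states))
    rho_B, rho_E = bob_and_eve(gamma, avg)
    holevo_B = entropy(rho_B) - sum(
        p * entropy(bob_and_eve(gamma, s)[0]) for p, s in zip(probs, states))
    holevo_E = entropy(rho_E) - sum(
        p * entropy(bob_and_eve(gamma, s)[1]) for p, s in zip(probs, states))
    return holevo_B - holevo_E

def random_state(rng):
    """Random qubit density matrix G G^dagger / Tr[G G^dagger]."""
    G = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(0)
worst = max(
    private_term(0.7, rng.dirichlet(np.ones(3)), [random_state(rng) for _ in range(3)])
    for _ in range(200)
)
print(worst)
```

Every sampled ensemble gives a non-positive value, up to numerical precision, which is consistent with $I^p(\mathcal{N}) = 0$ for anti-degradable channels.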

16.4 Summary

In this chapter, we studied private communication over a quantum channel N 𝐴→𝐵 .

The communication model that we employed is that a sender has access to the
input system 𝐴, a legitimate receiver has access to the output system 𝐵, and an
eavesdropper has access to the system 𝐸 of an isometric channel UN 𝐴→𝐵𝐸 extending
N 𝐴→𝐵 . This model gives the most power to the eavesdropper, subject to the
constraints that the systems 𝐴 and 𝐵 are physically secure in the laboratories of the
sender and receiver, respectively. The goal of a private communication protocol is
for the sender to transmit a classical message such that the receiver can decode it
with high probability and the eavesdropper cannot determine which message was
transmitted (i.e., her system $E$ should be essentially useless for figuring out the
transmitted message). The private capacity is defined as the largest rate at which
private communication is possible, such that the decoding error probability tends to
zero and the eavesdropper’s system becomes decoupled from the message system.
In our definitions, we combined these requirements into a single constraint. We
found that the private information $I^p(\mathcal{N})$ of a quantum channel $\mathcal{N}$ is a lower bound
on its private capacity, and that, in general, computing the exact value of the private
capacity involves a regularization, i.e., $P(\mathcal{N}) = I^p_{\mathrm{reg}}(\mathcal{N})$.
Following the same course as in previous chapters, we began with the one-shot
setting for private communication, in which only one use of the channel is allowed,
along with some non-zero error. We then determined upper and lower bounds on
the number of private bits that can be transmitted. We established three upper
bounds on the one-shot private capacity, involving the one-shot private information,
the hypothesis testing relative entropy of entanglement, as well as the squashed
entanglement. These in turn led to upper bounds on the asymptotic private capacity.
To obtain a lower bound on the one-shot private capacity, we employed the methods
of position-based coding and convex splitting, similar to how we did in the previous
chapter on secret key distillation (Chapter 15). This lower bound is optimal when
employed in the asymptotic setting because it leads to the regularized private
information as an achievable rate for private communication, and this matches the
upper bound. For degradable channels, there is no difference between the private
information and the coherent information, and this implies that there is no difference
between the private capacity and quantum capacity for these channels. We also
proved that the private capacity of anti-degradable channels is equal to zero.
Since the regularized private information is difficult to compute, we established
other upper bounds on private capacity, in terms of relative entropy of entanglement
(strong converse upper bound) and squashed entanglement (weak converse upper
bound). We then concluded that the strong converse property holds for all
generalized dephasing channels and their private capacity is equal to their coherent
information.

16.5 Bibliographic Notes

Shannon (1949) studied the information-theoretic security of communication
systems. The private capacity of a classical channel (also known as
secrecy capacity) was later introduced and studied by Wyner (1975), and subsequently
by Csiszár and Körner (1978), who established a general formula for the private
capacity of a classical channel.
Bennett and Brassard (1984) devised the first protocol for sending private
classical information over a quantum channel, which is known as quantum key
distribution. The private capacity of a quantum channel was studied by Devetak
(2005); Cai et al. (2004), who independently established the regularized expression
for it in Theorem 16.21.
Private communication was studied from the one-shot perspective by Renes
and Renner (2011); Wilde et al. (2017); Wilde (2017b); Radhakrishnan et al.
(2017). Proposition 16.3 was established by Wilde and Qi (2018). The connection
between secret-key transmission and bipartite private-state transmission is a direct
consequence of the insights of Horodecki et al. (2005a, 2009a) and was discussed
by Wilde et al. (2017). The upper bound in Proposition 16.7 is similar to that
established by Qi et al. (2018a). The upper bound in Theorem 16.9 is due to Wilde
et al. (2017) and the upper bound in Theorem 16.11 to Takeoka et al. (2014). The
lower bound in Section 16.1.4 is due to Wilde (2017b).
As mentioned above, the asymptotic theory of private communication was
developed by Devetak (2005); Cai et al. (2004). Devetak (2005) proved Theorem 16.22,
relating the coherent information to the private information and the quantum
capacity to the private capacity. Strict superadditivity of the private information of a quantum channel
was established by Smith et al. (2008), and this result was strengthened by Elkouss
and Strelchuk (2015). The relative entropy of entanglement strong converse bound
on private capacity in Theorem 16.27 was proven by Wilde et al. (2017). The
squashed entanglement weak converse bound on private capacity in Theorem 16.29
was proven by Takeoka et al. (2014). The private capacity of degradable channels
(i.e., Proposition 16.30 and (16.3.8)) was established by Smith (2008). The strong
converse property for the private capacity of generalized dephasing channels was
established by Wilde et al. (2017).

Part III

Quantum Communication Protocols With Feedback Assistance

We now delve into interactive quantum communication protocols. Such

protocols involve interaction between the sender and receiver of a quantum
channel, beyond the quantum channel that connects them, and this interac-
tion can potentially increase a given communication capacity because it
represents an additional resource that the sender and receiver have at their
disposal. Such protocols are richer than the non-interactive protocols that
we considered previously, and as such, their analysis is more involved.
One objective of the following chapters is to understand what role this
interaction plays and whether it can increase capacity. For the most part,
what we accomplish is the establishment of limitations on the ability of
feedback to increase capacity. In some cases, such as the case presented
in the first chapter, a surprising conclusion is that interaction does not
increase capacity at all, so that the theory simplifies.
Chapter 17

Quantum-Feedback-Assisted Communication

In this chapter, we begin our foray into interactive quantum communication
by analyzing communication protocols in which the goal is for the sender to
communicate a classical message to the receiver, with the assistance of a free
noiseless quantum feedback channel. By a quantum feedback channel, we mean a
quantum channel from the receiver to the sender that is separate from the channel
from the sender to the receiver being used to communicate the message. We thus
call this communication scenario “quantum-feedback-assisted communication.”
One simple (yet effective) way to make use of this free noiseless quantum
feedback channel is for the receiver to transmit one share of a bipartite quantum
state to the sender. By doing so, they can establish shared entanglement, and the
rates of classical communication that are achievable with such a strategy are given
by the limits on entanglement-assisted communication that we studied previously
in Chapter 11.
Perhaps surprisingly, we show here that the same non-asymptotic converse
bounds established in (11.2.61) and (11.2.92) apply to protocols assisted by noiseless
quantum feedback. These non-asymptotic converse bounds imply that the quantum-
feedback-assisted classical capacity of a channel is no larger than its entanglement-
assisted capacity. Furthermore, the strong converse property holds for the quantum-
feedback-assisted capacity, so that the strong converse capacity is equal to the
mutual information of a quantum channel.


Figure 17.1: A general quantum-feedback-assisted communication protocol for
the channel $\mathcal{N}$, which uses it $n$ times.

This result demonstrates that the entanglement-assisted capacity of a quantum
channel is a rather robust communication capacity. Not only is the mutual
information of a channel equal to the strong converse entanglement-assisted
capacity for all channels, but it is also equal to the strong converse
quantum-feedback-assisted capacity for all channels. Thus, the theory of
entanglement-assisted and quantum-feedback-assisted communication simplifies immensely.
It is worth remarking that Shannon proved that a similar result holds for classical
channels, and the strong converse property was later demonstrated as well. In this
sense, the entanglement-assisted capacity of a quantum channel represents the fully
quantum generalization of the classical capacity of a classical channel. Relatedly, the
quantum mutual information of a quantum channel represents the fully quantum
generalization of the classical mutual information of a classical channel.

17.1 $n$-Shot Quantum Feedback-Assisted Communication Protocols
We begin by defining the most general form for an 𝑛-shot classical communication
protocol assisted by a noiseless quantum feedback channel, where 𝑛 ∈ N. Such a
protocol is depicted in Figure 17.1, and it is defined by the following elements:
$$\left( \mathcal{M},\ \Psi_{F_0 B'_0},\ \mathcal{E}^0_{M' F_0 \to A'_1 A_1},\ \{ \mathcal{E}^i_{A'_i F_i \to A'_{i+1} A_{i+1}},\ \mathcal{D}^i_{B_i B'_{i-1} \to F_i B'_i} \}_{i=1}^{n-1},\ \mathcal{D}^n_{B_n B'_{n-1} \to \hat{M}} \right), \tag{17.1.1}$$
where $\mathcal{M}$ is the message set, $\Psi_{F_0 B'_0}$ denotes a bipartite quantum state, the objects
denoted by $\mathcal{E}$ are encoding channels, and the objects denoted by $\mathcal{D}$ are decoding
channels. Let C denote all of these elements, which together constitute the quantum-
feedback-assisted code. The quantum systems labeled by 𝐹 represent the feedback

systems that Bob sends back to Alice. The primed systems 𝐴𝑖′ and 𝐵𝑖′ represent
local quantum memory or “scratch” registers that Alice and Bob can exploit in the
feedback-assisted protocol.
Such an $n$-round feedback-assisted protocol proceeds as follows.
Let $p : \mathcal{M} \to [0,1]$ be a probability distribution over the message set. Alice starts
by preparing two classical registers $M$ and $M'$ in the following state:
$$\Phi^p_{MM'} := \sum_{m\in\mathcal{M}} p(m)|m\rangle\langle m|_M \otimes |m\rangle\langle m|_{M'}. \tag{17.1.2}$$
Furthermore, Alice and Bob also initially share a quantum state $\Psi_{F_0 B'_0}$ on Alice’s
system $F_0$ and Bob’s system $B'_0$. This state is prepared by Bob locally, and then he
transmits the system $F_0$ to Alice via the noiseless quantum feedback channel. The
initial global state shared between them is
$$\Phi^p_{MM'} \otimes \Psi_{F_0 B'_0}. \tag{17.1.3}$$
Alice then sends the $M'$ and $F_0$ registers through the first encoding channel
$\mathcal{E}^0_{M' F_0 \to A'_1 A_1}$. This encoding channel realizes a set $\{\mathcal{E}^{0,m}_{F_0 \to A'_1 A_1}\}_{m\in\mathcal{M}}$ of quantum
channels as follows:
$$\mathcal{E}^{0,m}_{F_0 \to A'_1 A_1}(\tau_{F_0}) := \mathcal{E}^0_{M' F_0 \to A'_1 A_1}(|m\rangle\langle m|_{M'} \otimes \tau_{F_0}), \tag{17.1.4}$$
for all input states $\tau_{F_0}$. The global state after the first encoding channel is then as
follows:
$$\mathcal{E}^0_{M' F_0 \to A'_1 A_1}(\Phi^p_{MM'} \otimes \Psi_{F_0 B'_0}). \tag{17.1.5}$$

Note that the scratch system 𝐴′1can contain a classical copy of the particular
message 𝑚 that is being communicated, and the same is true for all of the later
scratch systems 𝐴𝑖′, for 𝑖 ∈ {2, . . . , 𝑛}. In fact, this is necessary in order for the
communication protocol to be effective. Alice then transmits the 𝐴1 system through
the channel N 𝐴1 →𝐵1 , leading to the state
𝑝
𝜌 1𝑀 𝐴′ 𝐵1 𝐵′ B (N 𝐴1 →𝐵1 ◦ E0𝑀 ′ 𝐹0 →𝐴′ 𝐴1 )(Φ 𝑀 𝑀 ′ ⊗ Ψ𝐹0 𝐵0′ ). (17.1.6)
1 0 1

After receiving the 𝐵1 system, Bob performs the decoding channel D1𝐵1 𝐵′ →𝐹1 𝐵′ ,
0 1
such that the state is then
D1𝐵1 𝐵′ →𝐹1 𝐵′ (𝜌 1𝑀 𝐴′ 𝐵1 𝐵′ ), (17.1.7)
0 1 1 0

1071
Chapter 17: Quantum-Feedback-Assisted Communication

with it being understood that the system 𝐵′1 is Bob’s new scratch register and the
feedback system 𝐹1 gets sent over the noiseless quantum feedback channel back to
Alice.
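In the next round, Alice processes the $A_1' F_1$ systems with the encoding channel $\mathcal{E}^{1}_{A_1' F_1 \to A_2' A_2}$, and she sends system $A_2$ over the channel $\mathcal{N}_{A_2 \to B_2}$, leading to the state
\begin{equation}
\rho^{2}_{M A_2' B_2 B_1'} \coloneqq (\mathcal{N}_{A_2 \to B_2} \circ \mathcal{E}^{1}_{A_1' F_1 \to A_2' A_2} \circ \mathcal{D}^{1}_{B_1 B_0' \to F_1 B_1'})(\rho^{1}_{M A_1' B_1 B_0'}). \tag{17.1.8}
\end{equation}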

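Bob then applies the second decoding channel $\mathcal{D}^{2}_{B_2 B_1' \to F_2 B_2'}$. This process then iterates $n - 2$ more times, and the state after each use of the channel is as follows:
\begin{equation}
\rho^{i}_{M A_i' B_i B_{i-1}'} \coloneqq (\mathcal{N}_{A_i \to B_i} \circ \mathcal{E}^{i-1}_{A_{i-1}' F_{i-1} \to A_i' A_i} \circ \mathcal{D}^{i-1}_{B_{i-1} B_{i-2}' \to F_{i-1} B_{i-1}'})(\rho^{i-1}_{M A_{i-1}' B_{i-1} B_{i-2}'}), \tag{17.1.9}
\end{equation}
for $i \in \{3, \ldots, n\}$.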

In the final round, Bob performs the decoding channel $\mathcal{D}^{n}_{B_n B_{n-1}' \to \hat{M}}$, which is a quantum-to-classical channel that finally decodes the transmitted message. The final classical–classical state of the protocol is then as follows:
\begin{equation}
\omega^{p}_{M\hat{M}} \coloneqq \mathcal{D}^{n}_{B_n B_{n-1}' \to \hat{M}}(\operatorname{Tr}_{A_n'}[\rho^{n}_{M A_n' B_n B_{n-1}'}]). \tag{17.1.10}
\end{equation}

Now, just as we did in Chapter 11 in the case of entanglement-assisted classical communication, we can define the message error probability, average error probability, and maximal error probability as in (11.1.13), (11.1.14), and (11.1.15), respectively. Using the alternative expression in (11.1.24) for the average error probability, we have that the average error probability for the quantum-feedback-assisted code $\mathcal{C}$ is given by
\begin{equation}
p_{\text{err}}(\mathcal{C}; p) = \frac{1}{2} \left\| \Phi^{p}_{MM'} - \omega^{p}_{M\hat{M}} \right\|_{1}. \tag{17.1.11}
\end{equation}

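Using the alternative expression in (11.1.36) for the maximal error probability, we have that the maximal error probability of the quantum-feedback-assisted code $\mathcal{C}$ is given by
\begin{equation}
p^{*}_{\text{err}}(\mathcal{C}) = \max_{p : \mathcal{M} \to [0,1]} \frac{1}{2} \left\| \Phi^{p}_{MM'} - \omega^{p}_{M\hat{M}} \right\|_{1}. \tag{17.1.12}
\end{equation}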
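Because both states in (17.1.11) are classical–classical, the trace-distance expression can be checked by hand. The following minimal Python sketch does so under illustrative assumptions: the prior `p` and the conditional distribution `q` (the probability that Bob outputs guess `mh` given message `m`) are made-up numbers, not quantities from the text, and the function name is hypothetical. It compares half the trace norm of the difference of the two diagonal states with the direct probability that Bob's guess differs from the message.

```python
from itertools import product

def error_probabilities(p, q):
    """p[m]: prior on messages; q[m][mh]: probability that Bob outputs mh given m."""
    msgs = range(len(p))
    # Direct formula: an error occurs whenever Bob's guess differs from the message.
    direct = sum(p[m] * (1.0 - q[m][m]) for m in msgs)
    # Diagonal entries of the classical-classical states Phi^p and omega^p.
    phi = {(m, mh): (p[m] if m == mh else 0.0) for m, mh in product(msgs, msgs)}
    omega = {(m, mh): p[m] * q[m][mh] for m, mh in product(msgs, msgs)}
    # For commuting (diagonal) states the trace norm is the l1 norm of the entries.
    trace_dist = 0.5 * sum(abs(phi[k] - omega[k]) for k in phi)
    return direct, trace_dist

# Hypothetical three-message example.
p = [0.5, 0.3, 0.2]
q = [[0.90, 0.05, 0.05],
     [0.10, 0.80, 0.10],
     [0.00, 0.25, 0.75]]
direct, trace_dist = error_probabilities(p, q)
print(direct, trace_dist)
```

The two numbers agree, which is the content of the alternative expression for the average error probability in this special (fully classical) situation.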


Figure 17.2: Depiction of a protocol that is useless for quantum-feedback-assisted classical communication. In each round, the encoded state is discarded and replaced with an arbitrary (but fixed) state $\sigma_B$.

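Definition 17.1 $(n, |\mathcal{M}|, \varepsilon)$ Quantum-Feedback-Assisted Classical Communication Protocol
Let
\begin{equation*}
\left(\mathcal{M},\ \Psi_{F_0 B_0'},\ \mathcal{E}^{0}_{M'F_0 \to A_1'A_1},\ \{\mathcal{E}^{i}_{A_i' F_i \to A_{i+1}' A_{i+1}},\ \mathcal{D}^{i}_{B_i B_{i-1}' \to F_i B_i'}\}_{i=1}^{n-1},\ \mathcal{D}^{n}_{B_n B_{n-1}' \to \hat{M}}\right)
\end{equation*}
be the elements of an $n$-shot quantum-feedback-assisted classical communication protocol over the channel $\mathcal{N}_{A \to B}$. The protocol is called an $(n, |\mathcal{M}|, \varepsilon)$ protocol, with $\varepsilon \in [0, 1]$, if $p^{*}_{\text{err}}(\mathcal{C}) \leq \varepsilon$.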

17.1.1 Protocol over a Useless Channel

As before, when determining converse bounds on the rate at which classical messages can be communicated reliably using such feedback-assisted protocols, it is helpful to consider a useless channel. Our plan is again to use relative entropy (or some generalized divergence) to compare the states at each time step of the actual protocol with those resulting from employing a useless channel instead of the actual channel. As before, a useless channel that conveys no information at all is one in which the input state is discarded and replaced with some state at the output:
\begin{equation}
\mathcal{R}_{A \to B} \coloneqq \mathcal{P}_{\sigma_B} \circ \operatorname{Tr}_{A}, \tag{17.1.13}
\end{equation}
where $\mathcal{P}_{\sigma_B}$ denotes a preparation channel that prepares the arbitrary (but fixed) state $\sigma_B$ at the output.
We can modify the 𝑖 th step of the protocol discussed in the previous section,
such that instead of the actual channel N 𝐴𝑖 →𝐵𝑖 being applied, the replacement
channel R 𝐴𝑖 →𝐵𝑖 is applied; see Figure 17.2.

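The state after the first round in this protocol over the useless channel is
\begin{align}
\tau^{1}_{M A_1' B_1 B_0'} &\coloneqq (\mathcal{R}_{A_1 \to B_1} \circ \mathcal{E}^{0}_{M'F_0 \to A_1'A_1})(\Phi^{p}_{MM'} \otimes \Psi_{F_0 B_0'}) \tag{17.1.14} \\
&= \operatorname{Tr}_{A_1}[\mathcal{E}^{0}_{M'F_0 \to A_1'A_1}(\Phi^{p}_{MM'} \otimes \Psi_{F_0 B_0'})] \otimes \sigma_{B_1}, \tag{17.1.15}
\end{align}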

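where we observe that
\begin{equation}
\tau^{1}_{M A_1' B_1 B_0'} = \tau^{1}_{M A_1' B_0'} \otimes \sigma_{B_1}, \tag{17.1.16}
\end{equation}
and furthermore that
\begin{align}
\tau^{1}_{M B_1 B_0'} &= \operatorname{Tr}_{A_1' A_1}[\mathcal{E}^{0}_{M'F_0 \to A_1'A_1}(\Phi^{p}_{MM'} \otimes \Psi_{F_0 B_0'})] \otimes \sigma_{B_1} \tag{17.1.17} \\
&= \operatorname{Tr}_{M' F_0}[\Phi^{p}_{MM'} \otimes \Psi_{F_0 B_0'}] \otimes \sigma_{B_1} \tag{17.1.18} \\
&= \pi^{p}_{M} \otimes \Psi_{B_0'} \otimes \sigma_{B_1}, \tag{17.1.19}
\end{align}
where the second equality holds due to the fact that the first encoding channel $\mathcal{E}^{0}_{M'F_0 \to A_1'A_1}$ is trace preserving, and where
\begin{equation}
\pi^{p}_{M} \coloneqq \sum_{m \in \mathcal{M}} p(m)\, |m\rangle\!\langle m|_{M}. \tag{17.1.20}
\end{equation}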

Thus, there is no correlation whatsoever between the message system $M$ and Bob's systems $B_1 B_0'$ after tracing over all of Alice's systems. Intuitively, this is a consequence of the fact that the “communication line has been cut” when employing the replacement channel.
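The state after the second replacement channel $\mathcal{R}_{A_2 \to B_2}$ is then given by
\begin{align}
\tau^{2}_{M A_2' B_2 B_1'} &\coloneqq (\mathcal{R}_{A_2 \to B_2} \circ \mathcal{E}^{1}_{A_1' F_1 \to A_2' A_2} \circ \mathcal{D}^{1}_{B_1 B_0' \to F_1 B_1'})(\tau^{1}_{M A_1' B_1 B_0'}) \tag{17.1.21} \\
&= \operatorname{Tr}_{A_2}[(\mathcal{E}^{1}_{A_1' F_1 \to A_2' A_2} \circ \mathcal{D}^{1}_{B_1 B_0' \to F_1 B_1'})(\tau^{1}_{M A_1' B_1 B_0'})] \otimes \sigma_{B_2} \tag{17.1.22} \\
&= \operatorname{Tr}_{A_2}[(\mathcal{E}^{1}_{A_1' F_1 \to A_2' A_2} \circ \mathcal{D}^{1}_{B_1 B_0' \to F_1 B_1'})(\tau^{1}_{M A_1' B_0'} \otimes \sigma_{B_1})] \otimes \sigma_{B_2}, \tag{17.1.23}
\end{align}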

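where we used (17.1.16) to obtain the last line. If we take the partial trace over system $A_2'$, then the fact that the encoding channel $\mathcal{E}^{1}_{A_1' F_1 \to A_2' A_2}$ is trace preserving implies that
\begin{align}
\tau^{2}_{M B_2 B_1'} &= \operatorname{Tr}_{A_2' A_2}[(\mathcal{E}^{1}_{A_1' F_1 \to A_2' A_2} \circ \mathcal{D}^{1}_{B_1 B_0' \to F_1 B_1'})(\tau^{1}_{M A_1' B_0'} \otimes \sigma_{B_1})] \otimes \sigma_{B_2} \tag{17.1.24} \\
&= \operatorname{Tr}_{A_1' F_1}[\mathcal{D}^{1}_{B_1 B_0' \to F_1 B_1'}(\tau^{1}_{M A_1' B_0'} \otimes \sigma_{B_1})] \otimes \sigma_{B_2} \tag{17.1.25} \\
&= \operatorname{Tr}_{F_1}[\mathcal{D}^{1}_{B_1 B_0' \to F_1 B_1'}(\tau^{1}_{M B_0'} \otimes \sigma_{B_1})] \otimes \sigma_{B_2}. \tag{17.1.26}
\end{align}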

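Then, using (17.1.19), which implies that $\tau^{1}_{M B_0'} = \pi^{p}_{M} \otimes \Psi_{B_0'}$, we find that
\begin{align}
\tau^{2}_{M B_2 B_1'} &= \operatorname{Tr}_{F_1}[\mathcal{D}^{1}_{B_1 B_0' \to F_1 B_1'}(\pi^{p}_{M} \otimes \Psi_{B_0'} \otimes \sigma_{B_1})] \otimes \sigma_{B_2} \tag{17.1.27} \\
&= \pi^{p}_{M} \otimes \operatorname{Tr}_{F_1}[\mathcal{D}^{1}_{B_1 B_0' \to F_1 B_1'}(\Psi_{B_0'} \otimes \sigma_{B_1})] \otimes \sigma_{B_2} \tag{17.1.28} \\
&= \pi^{p}_{M} \otimes \tau^{2}_{B_1'} \otimes \sigma_{B_2}. \tag{17.1.29}
\end{align}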

Thus, we find again that there is no correlation whatsoever between the message
system 𝑀 and Bob’s systems 𝐵2 𝐵′1 after tracing over all of Alice’s systems.
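The states for the other rounds $i \in \{3, \ldots, n\}$ are given by
\begin{align}
\tau^{i}_{M A_i' B_i B_{i-1}'} &\coloneqq (\mathcal{R}_{A_i \to B_i} \circ \mathcal{E}^{i-1}_{A_{i-1}' F_{i-1} \to A_i' A_i} \circ \mathcal{D}^{i-1}_{B_{i-1} B_{i-2}' \to F_{i-1} B_{i-1}'})(\tau^{i-1}_{M A_{i-1}' B_{i-1} B_{i-2}'}) \tag{17.1.30} \\
&= \operatorname{Tr}_{A_i}[(\mathcal{E}^{i-1}_{A_{i-1}' F_{i-1} \to A_i' A_i} \circ \mathcal{D}^{i-1}_{B_{i-1} B_{i-2}' \to F_{i-1} B_{i-1}'})(\tau^{i-1}_{M A_{i-1}' B_{i-1} B_{i-2}'})] \otimes \sigma_{B_i} \tag{17.1.31} \\
&= \operatorname{Tr}_{A_i}[(\mathcal{E}^{i-1}_{A_{i-1}' F_{i-1} \to A_i' A_i} \circ \mathcal{D}^{i-1}_{B_{i-1} B_{i-2}' \to F_{i-1} B_{i-1}'})(\tau^{i-1}_{M A_{i-1}' B_{i-2}'} \otimes \sigma_{B_{i-1}})] \otimes \sigma_{B_i}. \tag{17.1.32}
\end{align}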
Repeating a calculation similar to the above leads to a similar conclusion as above:
\begin{equation}
\tau^{i}_{M B_i B_{i-1}'} = \pi^{p}_{M} \otimes \tau^{i}_{B_{i-1}'} \otimes \sigma_{B_i}, \tag{17.1.33}
\end{equation}
for all $i \in \{3, \ldots, n\}$. That is, there is no correlation whatsoever between the message system $M$ and Bob's systems $B_i B_{i-1}'$ after tracing over all of Alice's systems. Again, this is intuitively a consequence of the fact that the “communication line has been cut” when employing the replacement channel.
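Bob's final decoding channel $\mathcal{D}^{n}_{B_n B_{n-1}' \to \hat{M}}$ therefore leads to the following classical–classical state:
\begin{equation}
\tau^{p}_{M\hat{M}} \coloneqq \pi^{p}_{M} \otimes \mathcal{D}^{n}_{B_n B_{n-1}' \to \hat{M}}(\tau^{n}_{B_{n-1}'} \otimes \sigma_{B_n}) = \pi^{p}_{M} \otimes \tau_{\hat{M}}, \tag{17.1.34}
\end{equation}
where $\tau_{\hat{M}} \coloneqq \sum_{\hat{m} \in \mathcal{M}} t(\hat{m})\, |\hat{m}\rangle\!\langle \hat{m}|_{\hat{M}}$ for some probability distribution $t : \mathcal{M} \to [0, 1]$, which corresponds to Bob's measurement.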
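The product structure in (17.1.34) can be illustrated with a fully classical toy analogue of the protocol. The sketch below is only an illustration under stated assumptions: messages, channel symbols, feedback, and Bob's memory are all classical, the replacement channel emits symbol `b` with probability `sigma[b]` regardless of Alice's input, and the particular `encoder`, `decoder`, and `final_guess` rules (as well as all function names) are hypothetical choices made for the example. It computes the exact joint distribution of the message and Bob's guess.

```python
from itertools import product

def joint_message_guess(prior, sigma, n, encoder, decoder, final_guess):
    """Exact joint distribution of (message, guess) over n uses of a replacement channel."""
    joint = {}
    for m in range(len(prior)):
        for outputs in product(range(len(sigma)), repeat=n):
            prob = prior[m]
            feedback, memory = 0, 0
            for i, b in enumerate(outputs):
                # Alice's channel input depends on the message and the feedback,
                # but the replacement channel discards it.
                _discarded = encoder(i, m, feedback)
                prob *= sigma[b]
                # Bob updates his memory and sends feedback to Alice.
                feedback, memory = decoder(i, b, memory)
            key = (m, final_guess(memory))
            joint[key] = joint.get(key, 0.0) + prob
    return joint

prior = [0.5, 0.5]   # uniform prior on two messages
sigma = [0.7, 0.3]   # fixed output distribution of the replacement channel
joint = joint_message_guess(
    prior, sigma, n=3,
    encoder=lambda i, m, f: (m + f) % 2,
    decoder=lambda i, b, mem: (b, (2 * mem + b) % 4),
    final_guess=lambda mem: mem % 2,
)
guess_marginal = [sum(joint[(m, g)] for m in range(2)) for g in range(2)]
error = 1.0 - sum(joint[(m, m)] for m in range(2))
print(joint, error)
```

In this toy model the joint distribution factorizes as the prior times the marginal of the guess, and for the uniform prior the error probability equals $1 - 1/|\mathcal{M}|$, no matter how the feedback is used.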

17.1.2 Upper Bound on the Number of Transmitted Bits

We now give a general upper bound on the number of transmitted bits in any quantum-feedback-assisted classical communication protocol. This result is stated in Theorem 17.2, and it holds independently of the encoding and decoding channels used in the protocol and depends only on the given communication channel $\mathcal{N}$. Recall from the previous section that $\log_2 |\mathcal{M}|$ represents the number of bits that are transmitted over the channel $\mathcal{N}$.

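Theorem 17.2 $n$-Shot Upper Bounds for Quantum-Feedback-Assisted Classical Communication
Let $\mathcal{N}_{A \to B}$ be a quantum channel, and let $\varepsilon \in [0, 1)$. For all $(n, |\mathcal{M}|, \varepsilon)$ quantum-feedback-assisted classical communication protocols over the channel $\mathcal{N}_{A \to B}$, the following bounds hold:
\begin{align}
\frac{\log_2 |\mathcal{M}|}{n} &\leq \frac{1}{1 - \varepsilon} \left( I(\mathcal{N}) + \frac{1}{n} h_2(\varepsilon) \right), \tag{17.1.35} \\
\frac{\log_2 |\mathcal{M}|}{n} &\leq \widetilde{I}_{\alpha}(\mathcal{N}) + \frac{\alpha}{n(\alpha - 1)} \log_2\!\left( \frac{1}{1 - \varepsilon} \right) \quad \forall\, \alpha > 1, \tag{17.1.36}
\end{align}
where $I(\mathcal{N})$ is the mutual information of $\mathcal{N}$, as defined in (7.11.102), and $\widetilde{I}_{\alpha}(\mathcal{N})$ is the sandwiched Rényi mutual information of $\mathcal{N}$, as defined in (7.11.91).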
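To get a feel for the two bounds, the following minimal Python sketch evaluates their right-hand sides (multiplied by $n$, so that they bound $\log_2 |\mathcal{M}|$). The function names are hypothetical, and the channel quantities are supplied by the caller rather than computed: in the example we plug in the value $I(\mathcal{N}) = 2$ bits of a noiseless qubit channel, and we use the same number as a placeholder for $\widetilde{I}_{\alpha}(\mathcal{N})$ purely for illustration.

```python
import math

def binary_entropy(eps):
    """Binary entropy h_2(eps) in bits."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)

def weak_converse_bits(n, eps, mutual_info):
    """Upper bound on log2|M| from (17.1.35), for eps in [0, 1)."""
    return (n * mutual_info + binary_entropy(eps)) / (1 - eps)

def renyi_converse_bits(n, eps, alpha, renyi_mutual_info):
    """Upper bound on log2|M| from (17.1.36), for alpha > 1 and eps in [0, 1)."""
    if alpha <= 1:
        raise ValueError("alpha must exceed 1")
    return n * renyi_mutual_info + alpha / (alpha - 1) * math.log2(1 / (1 - eps))

# Illustrative inputs (assumed values, not derived here).
n, eps = 100, 0.1
weak = weak_converse_bits(n, eps, mutual_info=2.0)
renyi = renyi_converse_bits(n, eps, alpha=2.0, renyi_mutual_info=2.0)
print(weak, renyi)
```

With these inputs the first bound carries the multiplicative factor $1/(1-\varepsilon)$ on the leading term, whereas in the second bound the dependence on $\varepsilon$ is confined to an additive term that does not grow with $n$.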

Proof: Let us start with an arbitrary $(n, |\mathcal{M}|, \varepsilon)$ quantum-feedback-assisted classical communication protocol over a channel $\mathcal{N}_{A \to B}$, corresponding to, as described earlier, a message set $\mathcal{M}$, a shared quantum state $\Psi_{F_0 B_0'}$, the encoding channels $\mathcal{E}^{0}_{M'F_0 \to A_1'A_1}$ and $\{\mathcal{E}^{i}_{A_i' F_i \to A_{i+1}' A_{i+1}}\}_{i=1}^{n-1}$, and the decoding channels $\{\mathcal{D}^{i}_{B_i B_{i-1}' \to F_i B_i'}\}_{i=1}^{n-1}$ and $\mathcal{D}^{n}_{B_n B_{n-1}' \to \hat{M}}$. Recall that we refer to all of these objects collectively as the code $\mathcal{C}$. The error criterion $p^{*}_{\text{err}}(\mathcal{C}) \leq \varepsilon$ holds by the definition of an $(n, |\mathcal{M}|, \varepsilon)$ protocol, which implies that for all probability distributions $p : \mathcal{M} \to [0, 1]$ on the message set $\mathcal{M}$:
\begin{equation}
p_{\text{err}}(\mathcal{C}; p) \leq p^{*}_{\text{err}}(\mathcal{C}) \leq \varepsilon. \tag{17.1.37}
\end{equation}
(The reasoning for this is analogous to that in (11.1.63)–(11.1.66).) In particular, the above inequality holds with $p$ being the uniform distribution on $\mathcal{M}$, so that $p(m) = \frac{1}{|\mathcal{M}|}$ for all $m \in \mathcal{M}$.


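Now, let $\Phi_{M\hat{M}}$ be the state defined in (17.1.2) with $p$ the uniform distribution, and similarly let $\omega_{M\hat{M}}$, defined in (17.1.10), be the state at the end of the protocol such that $p$ is the uniform prior probability distribution. Observe that $\operatorname{Tr}_{\hat{M}}[\omega_{M\hat{M}}] = \pi_{M}$. Also, letting
\begin{equation}
\Pi_{M\hat{M}} = \sum_{m \in \mathcal{M}} |m\rangle\!\langle m|_{M} \otimes |m\rangle\!\langle m|_{\hat{M}} \tag{17.1.38}
\end{equation}
be the projection defining the comparator test, as in (11.1.37), observe that
\begin{equation}
1 - \operatorname{Tr}[\Pi_{M\hat{M}}\, \omega_{M\hat{M}}] = \frac{1}{2} \left\| \Phi_{M\hat{M}} - \omega_{M\hat{M}} \right\|_{1} \leq \varepsilon, \tag{17.1.39}
\end{equation}
where the first equality follows by combining (11.1.24) with (11.1.41). This means that
\begin{equation}
\operatorname{Tr}[\Pi_{M\hat{M}}\, \omega_{M\hat{M}}] \geq 1 - \varepsilon. \tag{17.1.40}
\end{equation}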
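We thus have all of the ingredients to apply Lemma 11.4. Doing so gives the following critical first bound:
\begin{equation}
\log_2 |\mathcal{M}| \leq I_{H}^{\varepsilon}(M; \hat{M})_{\omega}. \tag{17.1.41}
\end{equation}
Invoking Proposition 7.70, the definition of $I_{H}^{\varepsilon}(M; \hat{M})$ from (7.11.88), and the expression for mutual information from (7.2.97), we find that
\begin{equation}
I_{H}^{\varepsilon}(M; \hat{M})_{\omega} \leq \frac{1}{1 - \varepsilon} \left( I(M; \hat{M})_{\omega} + h_2(\varepsilon) \right). \tag{17.1.42}
\end{equation}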
Now, using the data-processing inequality for the mutual information (see Proposition 7.19) with respect to the last decoding channel, $\mathcal{D}^{n}_{B_n B_{n-1}' \to \hat{M}}$, we find that
\begin{equation}
I(M; \hat{M})_{\omega} \leq I(M; B_n B_{n-1}')_{\rho^{n}}. \tag{17.1.43}
\end{equation}

Then, using the chain rule for mutual information in (7.2.112), we obtain
\begin{align}
I(M; B_n B_{n-1}')_{\rho^{n}} &= I(M; B_n | B_{n-1}')_{\rho^{n}} + I(M; B_{n-1}')_{\rho^{n}} \tag{17.1.44} \\
&\leq I(M B_{n-1}'; B_n)_{\rho^{n}} + I(M; B_{n-1}')_{\rho^{n}}, \tag{17.1.45}
\end{align}
where the second line is a consequence of the chain rule, as well as non-negativity of mutual information:
\begin{align}
I(M; B_n | B_{n-1}')_{\rho^{n}} &= I(M B_{n-1}'; B_n)_{\rho^{n}} - I(B_{n-1}'; B_n)_{\rho^{n}} \tag{17.1.46} \\
&\leq I(M B_{n-1}'; B_n)_{\rho^{n}}. \tag{17.1.47}
\end{align}
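Finally, observe that the state $\rho^{n}_{M B_n B_{n-1}'}$ has the following form:
\begin{equation}
\rho^{n}_{M B_n B_{n-1}'} = \mathcal{N}_{A_n \to B_n}(\zeta^{n}_{M B_{n-1}' A_n}), \tag{17.1.48}
\end{equation}
where
\begin{equation}
\zeta^{n}_{M B_{n-1}' A_n} \coloneqq \operatorname{Tr}_{A_n'}[(\mathcal{E}^{n-1}_{A_{n-1}' F_{n-1} \to A_n' A_n} \circ \mathcal{D}^{n-1}_{B_{n-1} B_{n-2}' \to F_{n-1} B_{n-1}'})(\rho^{n-1}_{M A_{n-1}' B_{n-1} B_{n-2}'})]. \tag{17.1.49}
\end{equation}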

That is, the state $\zeta^{n}_{M B_{n-1}' A_n}$ is a particular state to consider in the optimization of the mutual information of a channel (with the channel input system being $A_n$ and the external correlated systems being $M B_{n-1}'$), whereas the definition of the mutual information of a channel involves an optimization over all such states. This means that
\begin{equation}
I(M B_{n-1}'; B_n)_{\rho^{n}} \leq I(\mathcal{N}). \tag{17.1.50}
\end{equation}
Putting together (17.1.43), (17.1.45), and (17.1.50), we find that
\begin{equation}
I(M; \hat{M})_{\omega} \leq I(\mathcal{N}) + I(M; B_{n-1}')_{\rho^{n}}. \tag{17.1.51}
\end{equation}

The quantity $I(M; B_{n-1}')_{\rho^{n}}$ can be bounded using steps analogous to the above. In particular, using the data-processing inequality for the mutual information with respect to the second-to-last decoding channel $\mathcal{D}^{n-1}_{B_{n-1} B_{n-2}' \to F_{n-1} B_{n-1}'}$, then employing the same steps as above, we conclude that
\begin{align}
I(M; B_{n-1}')_{\rho^{n}} &\leq I(M; B_{n-1} B_{n-2}')_{\rho^{n-1}} \tag{17.1.52} \\
&= I(M; B_{n-1} | B_{n-2}')_{\rho^{n-1}} + I(M; B_{n-2}')_{\rho^{n-1}} \tag{17.1.53} \\
&\leq I(M B_{n-2}'; B_{n-1})_{\rho^{n-1}} + I(M; B_{n-2}')_{\rho^{n-1}} \tag{17.1.54} \\
&\leq I(\mathcal{N}) + I(M; B_{n-2}')_{\rho^{n-1}}. \tag{17.1.55}
\end{align}
Overall, this leads to
\begin{equation}
I(M; \hat{M})_{\omega} \leq 2 I(\mathcal{N}) + I(M; B_{n-2}')_{\rho^{n-1}}. \tag{17.1.56}
\end{equation}

Then, bounding $I(M; B_{n-2}')_{\rho^{n-1}}$ in the same manner as above, and continuing this process $n - 3$ more times such that we completely “unwind” the protocol, we obtain
\begin{align}
I(M; \hat{M})_{\omega} &\leq 2 I(\mathcal{N}) + I(M; B_{n-2}')_{\rho^{n-1}} \tag{17.1.57} \\
&\leq 3 I(\mathcal{N}) + I(M; B_{n-3}')_{\rho^{n-2}} \tag{17.1.58} \\
&\ \ \vdots \tag{17.1.59} \\
&\leq n I(\mathcal{N}) + I(M; B_0')_{\rho^{1}}. \tag{17.1.60}
\end{align}

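However, from (17.1.6), we have that $\rho^{1}_{M B_0'} = \Phi_{M} \otimes \Psi_{B_0'}$, which means that $I(M; B_0')_{\rho^{1}} = 0$. Therefore, putting together (17.1.41), (17.1.42), and (17.1.57)–(17.1.60), we get
\begin{align}
\log_2 |\mathcal{M}| &\leq I_{H}^{\varepsilon}(M; \hat{M})_{\omega} \tag{17.1.61} \\
&\leq \frac{1}{1 - \varepsilon} \left( I(M; \hat{M})_{\omega} + h_2(\varepsilon) \right) \tag{17.1.62} \\
&\leq \frac{1}{1 - \varepsilon} \left( n I(\mathcal{N}) + h_2(\varepsilon) \right), \tag{17.1.63}
\end{align}
and the last line is equivalent to (17.1.35), as required.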
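We now establish the bound in (17.1.36). Combining (17.1.41) with Proposition 7.71, we conclude that the following bound holds for all $\alpha > 1$:
\begin{equation}
\log_2 |\mathcal{M}| \leq \widetilde{I}_{\alpha}(M; \hat{M})_{\omega} + \frac{\alpha}{\alpha - 1} \log_2\!\left( \frac{1}{1 - \varepsilon} \right). \tag{17.1.64}
\end{equation}
Recall that the sandwiched Rényi mutual information $\widetilde{I}_{\alpha}(M; \hat{M})_{\omega}$ is defined as
\begin{align}
\widetilde{I}_{\alpha}(M; \hat{M})_{\omega} &= \inf_{\xi_{\hat{M}}} \widetilde{D}_{\alpha}(\omega_{M\hat{M}} \| \omega_{M} \otimes \xi_{\hat{M}}) \tag{17.1.65} \\
&= \inf_{\xi_{\hat{M}}} \widetilde{D}_{\alpha}(\omega_{M\hat{M}} \| \pi_{M} \otimes \xi_{\hat{M}}). \tag{17.1.66}
\end{align}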

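Our goal now is to compare the actual protocol with one that results from employing a useless, replacement channel. To this end, let $\mathcal{R}^{\sigma_B}_{A \to B}$ be the replacement channel defined in (17.1.13), with $\sigma_B$ an arbitrary (but fixed) state. Then as discussed in Section 17.1.1 (in particular, in (17.1.34)), the final state of the protocol conducted with the replacement channel is given by $\tau_{M\hat{M}} = \pi_{M} \otimes \tau_{\hat{M}}$. Then, we find that
\begin{align}
\widetilde{I}_{\alpha}(M; \hat{M})_{\omega} &= \inf_{\xi_{\hat{M}}} \widetilde{D}_{\alpha}(\omega_{M\hat{M}} \| \pi_{M} \otimes \xi_{\hat{M}}) \tag{17.1.67} \\
&\leq \widetilde{D}_{\alpha}(\omega_{M\hat{M}} \| \pi_{M} \otimes \tau_{\hat{M}}) \tag{17.1.68} \\
&= \widetilde{D}_{\alpha}(\omega_{M\hat{M}} \| \tau_{M\hat{M}}). \tag{17.1.69}
\end{align}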


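We now proceed with a method similar to that considered in the proof of the bound in (17.1.35), but using the sandwiched Rényi relative entropy as our main tool for analysis. By applying the data-processing inequality for the sandwiched Rényi relative entropy with respect to the last decoding channel, and using (17.1.33), we find that
\begin{align}
\widetilde{D}_{\alpha}(\omega_{M\hat{M}} \| \tau_{M\hat{M}}) &\leq \widetilde{D}_{\alpha}(\rho^{n}_{M B_n B_{n-1}'} \| \tau^{n}_{M B_n B_{n-1}'}) \tag{17.1.70} \\
&= \widetilde{D}_{\alpha}(\rho^{n}_{M B_n B_{n-1}'} \| \pi_{M} \otimes \tau^{n}_{B_{n-1}'} \otimes \sigma_{B_n}) \tag{17.1.71} \\
&= \frac{\alpha}{\alpha - 1} \log_2 \widetilde{Q}_{\alpha}(\rho^{n}_{M B_n B_{n-1}'} \| \pi_{M} \otimes \tau^{n}_{B_{n-1}'} \otimes \sigma_{B_n})^{\frac{1}{\alpha}}, \tag{17.1.72}
\end{align}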

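where in the last line we used the definition in (7.5.2) of the sandwiched Rényi relative entropy. Now, recalling that $\rho^{n}_{M B_n B_{n-1}'} = \mathcal{N}_{A_n \to B_n}(\zeta^{n}_{M B_{n-1}' A_n})$ with the state $\zeta^{n}_{M B_{n-1}' A_n}$ defined in (17.1.49), and defining the positive semi-definite operator
\begin{equation}
X^{(\alpha)}_{M B_{n-1}' A_n} \coloneqq \left( \pi_{M} \otimes \tau^{n}_{B_{n-1}'} \right)^{\frac{1-\alpha}{2\alpha}} \zeta^{n}_{M B_{n-1}' A_n} \left( \pi_{M} \otimes \tau^{n}_{B_{n-1}'} \right)^{\frac{1-\alpha}{2\alpha}}, \tag{17.1.73}
\end{equation}
as well as the completely positive map
\begin{equation}
\mathcal{S}^{(\alpha)}_{\sigma_B}(\cdot) \coloneqq \sigma_{B_n}^{\frac{1-\alpha}{2\alpha}} (\cdot)\, \sigma_{B_n}^{\frac{1-\alpha}{2\alpha}}, \tag{17.1.74}
\end{equation}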

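we use the definition of $\widetilde{Q}_{\alpha}$ in (7.5.3) to obtain
\begin{align}
&\widetilde{Q}_{\alpha}(\rho^{n}_{M B_n B_{n-1}'} \| \pi_{M} \otimes \tau^{n}_{B_{n-1}'} \otimes \sigma_{B_n})^{\frac{1}{\alpha}} \notag \\
&\quad = \left\| \left( \pi_{M} \otimes \tau^{n}_{B_{n-1}'} \otimes \sigma_{B_n} \right)^{\frac{1-\alpha}{2\alpha}} \rho^{n}_{M B_n B_{n-1}'} \left( \pi_{M} \otimes \tau^{n}_{B_{n-1}'} \otimes \sigma_{B_n} \right)^{\frac{1-\alpha}{2\alpha}} \right\|_{\alpha} \tag{17.1.75} \\
&\quad = \left\| \mathcal{S}^{(\alpha)}_{\sigma_B}\!\left( \left( \pi_{M} \otimes \tau^{n}_{B_{n-1}'} \right)^{\frac{1-\alpha}{2\alpha}} \mathcal{N}_{A_n \to B_n}(\zeta^{n}_{M B_{n-1}' A_n}) \left( \pi_{M} \otimes \tau^{n}_{B_{n-1}'} \right)^{\frac{1-\alpha}{2\alpha}} \right) \right\|_{\alpha} \tag{17.1.76} \\
&\quad = \left\| (\mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A_n \to B_n})(X^{(\alpha)}_{M B_{n-1}' A_n}) \right\|_{\alpha}. \tag{17.1.77}
\end{align}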
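Multiplying and dividing by $\| X^{(\alpha)}_{M B_{n-1}'} \|_{\alpha}$ leads to
\begin{align}
&\left\| (\mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A_n \to B_n})(X^{(\alpha)}_{M B_{n-1}' A_n}) \right\|_{\alpha} \notag \\
&\quad = \frac{\left\| (\mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A_n \to B_n})(X^{(\alpha)}_{M B_{n-1}' A_n}) \right\|_{\alpha}}{\left\| X^{(\alpha)}_{M B_{n-1}'} \right\|_{\alpha}} \left\| X^{(\alpha)}_{M B_{n-1}'} \right\|_{\alpha} \tag{17.1.78} \\
&\quad = \frac{\left\| (\mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A_n \to B_n})(X^{(\alpha)}_{M B_{n-1}' A_n}) \right\|_{\alpha}}{\left\| X^{(\alpha)}_{M B_{n-1}'} \right\|_{\alpha}} \times \left\| \left( \pi_{M} \otimes \tau^{n}_{B_{n-1}'} \right)^{\frac{1-\alpha}{2\alpha}} \zeta^{n}_{M B_{n-1}'} \left( \pi_{M} \otimes \tau^{n}_{B_{n-1}'} \right)^{\frac{1-\alpha}{2\alpha}} \right\|_{\alpha} \tag{17.1.79} \\
&\quad = \frac{\left\| (\mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A_n \to B_n})(X^{(\alpha)}_{M B_{n-1}' A_n}) \right\|_{\alpha}}{\left\| X^{(\alpha)}_{M B_{n-1}'} \right\|_{\alpha}} \cdot \widetilde{Q}_{\alpha}(\zeta^{n}_{M B_{n-1}'} \| \pi_{M} \otimes \tau^{n}_{B_{n-1}'})^{\frac{1}{\alpha}} \tag{17.1.80} \\
&\quad \leq \sup_{Y_{M B_{n-1}' A_n} \geq 0} \frac{\left\| (\mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A_n \to B_n})(Y_{M B_{n-1}' A_n}) \right\|_{\alpha}}{\left\| Y_{M B_{n-1}'} \right\|_{\alpha}} \cdot \widetilde{Q}_{\alpha}(\zeta^{n}_{M B_{n-1}'} \| \pi_{M} \otimes \tau^{n}_{B_{n-1}'})^{\frac{1}{\alpha}} \tag{17.1.81} \\
&\quad = \left\| \mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A_n \to B_n} \right\|_{\text{CB},1 \to \alpha} \cdot \widetilde{Q}_{\alpha}(\zeta^{n}_{M B_{n-1}'} \| \pi_{M} \otimes \tau^{n}_{B_{n-1}'})^{\frac{1}{\alpha}}, \tag{17.1.82}
\end{align}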

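where to obtain the inequality we performed an optimization with respect to all positive semi-definite operators $Y_{M B_{n-1}' A_n}$. To obtain the last line, we have used the norm $\|\cdot\|_{\text{CB},1 \to \alpha}$ defined in (11.2.68), which we show in Appendix 11.E can be written as
\begin{equation}
\| \mathcal{M} \|_{\text{CB},1 \to \alpha} = \sup_{Y_{RA} \geq 0} \frac{\| \mathcal{M}_{A \to B}(Y_{RA}) \|_{\alpha}}{\| \operatorname{Tr}_{A}[Y_{RA}] \|_{\alpha}} \tag{17.1.83}
\end{equation}
for any completely positive map $\mathcal{M}$. Plugging (17.1.82) back in to (17.1.72), we conclude the following bound:
\begin{align}
&\widetilde{D}_{\alpha}(\rho^{n}_{M B_n B_{n-1}'} \| \tau^{n}_{M B_n B_{n-1}'}) \notag \\
&\quad \leq \frac{\alpha}{\alpha - 1} \log_2 \left\| \mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A_n \to B_n} \right\|_{\text{CB},1 \to \alpha} + \widetilde{D}_{\alpha}(\zeta^{n}_{M B_{n-1}'} \| \pi_{M} \otimes \tau^{n}_{B_{n-1}'}). \tag{17.1.84}
\end{align}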

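As in the proof of (17.1.35), we now iterate the above by successively bounding the sandwiched Rényi relative entropy terms $\widetilde{D}_{\alpha}(\zeta^{i}_{M B_{i-1}'} \| \pi_{M} \otimes \tau^{i}_{B_{i-1}'})$ for $i \in \{1, \ldots, n\}$. Starting with the term $\widetilde{D}_{\alpha}(\zeta^{n}_{M B_{n-1}'} \| \pi_{M} \otimes \tau^{n}_{B_{n-1}'})$, we use the data-processing inequality for the sandwiched Rényi relative entropy under the second-to-last decoding channel $\mathcal{D}^{n-1}_{B_{n-1} B_{n-2}' \to F_{n-1} B_{n-1}'}$, then apply the same reasoning as in (17.1.75)–(17.1.82) to obtain
\begin{align}
&\widetilde{D}_{\alpha}(\zeta^{n}_{M B_{n-1}'} \| \pi_{M} \otimes \tau^{n}_{B_{n-1}'}) \notag \\
&\quad \leq \widetilde{D}_{\alpha}(\rho^{n-1}_{M B_{n-1} B_{n-2}'} \| \pi_{M} \otimes \tau^{n-1}_{B_{n-1} B_{n-2}'}) \tag{17.1.85} \\
&\quad = \widetilde{D}_{\alpha}(\rho^{n-1}_{M B_{n-1} B_{n-2}'} \| \pi_{M} \otimes \tau^{n-1}_{B_{n-2}'} \otimes \sigma_{B_{n-1}}) \tag{17.1.86} \\
&\quad \leq \frac{\alpha}{\alpha - 1} \log_2 \left\| \mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A_{n-1} \to B_{n-1}} \right\|_{\text{CB},1 \to \alpha} + \widetilde{D}_{\alpha}(\zeta^{n-1}_{M B_{n-2}'} \| \pi_{M} \otimes \tau^{n-1}_{B_{n-2}'}). \tag{17.1.87}
\end{align}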

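Iterating this reasoning $n - 2$ more times, we end up with the following bound:
\begin{align}
\widetilde{D}_{\alpha}(\omega_{M\hat{M}} \| \tau_{M\hat{M}}) &\leq n \frac{\alpha}{\alpha - 1} \log_2 \left\| \mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A \to B} \right\|_{\text{CB},1 \to \alpha} + \widetilde{D}_{\alpha}(\rho^{1}_{M B_0'} \| \pi_{M} \otimes \Psi_{B_0'}) \notag \\
&= n \frac{\alpha}{\alpha - 1} \log_2 \left\| \mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A \to B} \right\|_{\text{CB},1 \to \alpha}, \tag{17.1.88}
\end{align}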

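where the equality holds because $\rho^{1}_{M B_0'} = \pi_{M} \otimes \Psi_{B_0'}$. Putting together (17.1.64), (17.1.69), and (17.1.88), we finally obtain
\begin{equation}
\log_2 |\mathcal{M}| \leq n \frac{\alpha}{\alpha - 1} \log_2 \left\| \mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A \to B} \right\|_{\text{CB},1 \to \alpha} + \frac{\alpha}{\alpha - 1} \log_2\!\left( \frac{1}{1 - \varepsilon} \right). \tag{17.1.89}
\end{equation}
Since we proved that this bound holds for any choice of the state $\sigma_B$, we conclude that
\begin{align}
\log_2 |\mathcal{M}| &\leq n \frac{\alpha}{\alpha - 1} \inf_{\sigma_B} \log_2 \left\| \mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A \to B} \right\|_{\text{CB},1 \to \alpha} + \frac{\alpha}{\alpha - 1} \log_2\!\left( \frac{1}{1 - \varepsilon} \right) \tag{17.1.90} \\
&= n \widetilde{I}_{\alpha}(\mathcal{N}) + \frac{\alpha}{\alpha - 1} \log_2\!\left( \frac{1}{1 - \varepsilon} \right), \tag{17.1.91}
\end{align}
where the last equality follows from Lemma 11.20, and it implies (17.1.36). ■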

17.1.3 The Amortized Perspective

In this section, we revisit the proofs above for Theorem 17.2 that establish bounds
on non-asymptotic quantum feedback-assisted capacity. In particular, we adopt a
different perspective, which we call the amortized perspective and which turns out
to be useful in establishing bounds for all kinds of feedback-assisted protocols other
than the ones considered in this chapter.

For the case of quantum feedback-assisted protocols, the amortized perspective

consists of defining the amortized mutual information of a quantum channel as
the largest net difference of the output and input mutual information that can be
realized by the channel, when Alice and Bob are allowed to share an arbitrary
state before the use of the channel. A key property of the amortized mutual
information of a channel is that it is more readily seen to lead to an upper bound
on the non-asymptotic quantum feedback-assisted capacity. Furthermore, another
key property is that the amortized mutual information of a channel collapses to
the usual mutual information of a channel, and so this leads to an alternative
way of understanding the previous results. Furthermore, as indicated above, this
perspective becomes quite useful in later chapters when we analyze LOCC-assisted
quantum communication protocols and LOPC-assisted private communication
protocols.

17.1.3.1 Quantum Mutual Information

We begin by defining the following key concept, the amortized mutual information
of a quantum channel:

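Definition 17.3 Amortized Mutual Information of a Channel
The amortized mutual information of a quantum channel $\mathcal{N}_{A \to B}$ is defined as
\begin{equation}
I^{\mathcal{A}}(\mathcal{N}) \coloneqq \sup_{\rho_{A'AB'}} \left[ I(A'; BB')_{\omega} - I(A'A; B')_{\rho} \right], \tag{17.1.92}
\end{equation}
where
\begin{equation}
\omega_{A'BB'} \coloneqq \mathcal{N}_{A \to B}(\rho_{A'AB'}) \tag{17.1.93}
\end{equation}
and the optimization is over states $\rho_{A'AB'}$.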

Intuitively, the amortized mutual information is equal to the largest net mutual
information that can be realized by the channel, if we allow Alice and Bob to share
an arbitrary state before communication begins. As mentioned above, this concept
turns out to be useful for understanding the feedback-assisted protocols presented
previously.
We have the following simple relationship between mutual information and
amortized mutual information:


Lemma 17.4
The mutual information of any channel N 𝐴→𝐵 does not exceed its amortized
mutual information:
𝐼 (N) ≤ 𝐼 A (N). (17.1.94)

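Proof: Let us restrict the optimization in the definition of the amortized mutual information to states $\rho_{A'AB'}$ that have a trivial $B'$ system. This means that $\rho_{A'AB'}$ is of the form $\rho_{A'AB'} = \rho_{A'A} \otimes |0\rangle\!\langle 0|_{B'}$. Therefore, $I(A'A; B')_{\rho} = 0$ and $I(A'; BB')_{\omega} = I(A'; B)_{\omega}$, where $\omega_{A'BB'} = \mathcal{N}_{A \to B}(\rho_{A'AB'})$, so that
\begin{align}
I^{\mathcal{A}}(\mathcal{N}) &= \sup_{\rho_{A'AB'}} \left[ I(A'; BB')_{\omega} - I(A'A; B')_{\rho} \right] \tag{17.1.95} \\
&\geq \sup_{\rho_{A'A}} I(A'; B)_{\omega} \tag{17.1.96} \\
&= I(\mathcal{N}), \tag{17.1.97}
\end{align}
i.e., $I^{\mathcal{A}}(\mathcal{N}) \geq I(\mathcal{N})$, as required. ■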

An important question is whether the opposite inequality, i.e., $I^{\mathcal{A}}(\mathcal{N}) \leq I(\mathcal{N})$, holds, which would allow us to conclude that $I^{\mathcal{A}}(\mathcal{N}) = I(\mathcal{N})$. In this case, we find that it does. Specifically, we have the following.

Proposition 17.5
Given an arbitrary quantum channel N, amortization does not increase its
mutual information:
𝐼 (N) = 𝐼 A (N). (17.1.98)

Proof: To see this, consider that for an arbitrary input state $\rho_{A'AB'}$, we can use the chain rule for mutual information in (7.2.112) twice to obtain
\begin{align}
I(A'; BB')_{\omega} - I(A'A; B')_{\rho} &= I(A'; B)_{\omega} + I(A'; B' | B)_{\omega} - I(A'A; B')_{\rho} \tag{17.1.99} \\
&\leq I(A'; B)_{\omega} + I(A'B; B')_{\omega} - I(A'A; B')_{\rho}. \tag{17.1.100}
\end{align}
In particular, to obtain the inequality, note that (7.2.112) implies
\begin{equation}
I(A'B; B')_{\omega} = I(B; B')_{\omega} + I(A'; B' | B)_{\omega} \geq I(A'; B' | B)_{\omega} \tag{17.1.101}
\end{equation}


since $I(B; B')_{\omega} \geq 0$. Continuing, we apply the data-processing inequality for the mutual information under the channel $\mathcal{N}$, which implies that $I(A'A; B')_{\rho} \geq I(A'B; B')_{\omega}$. Therefore,
\begin{align}
I(A'; BB')_{\omega} - I(A'A; B')_{\rho} &\leq I(A'; B)_{\omega} + I(A'B; B')_{\omega} - I(A'B; B')_{\omega} \tag{17.1.103} \\
&= I(A'; B)_{\omega} \tag{17.1.104} \\
&\leq I(\mathcal{N}), \tag{17.1.105}
\end{align}
where the last line follows because the state $\omega_{A'B} = \mathcal{N}_{A \to B}(\rho_{A'A})$ has the form of states that we consider when performing the optimization in the definition of the mutual information of a channel. Since the inequality
\begin{equation}
I(A'; BB')_{\omega} - I(A'A; B')_{\rho} \leq I(\mathcal{N}) \tag{17.1.106}
\end{equation}
holds for an arbitrary input state $\rho_{A'AB'}$, we conclude that $I^{\mathcal{A}}(\mathcal{N}) \leq I(\mathcal{N})$, which, combined with Lemma 17.4, gives the equality in (17.1.98). ■
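As a sanity check of the inequality (17.1.106), the following Python sketch tests it numerically in a restricted, fully classical setting. The assumptions are ours and not part of the proof: all systems $A'$, $A$, $B'$ are bits, the state $\rho_{A'AB'}$ is a classical joint distribution, the channel is a binary symmetric channel with flip probability `flip`, and we compare against its Shannon capacity $1 - h_2(\text{flip})$, which we take as the relevant value of the channel's mutual information for this classical channel. All function names are hypothetical.

```python
import math
import random
from itertools import product

def mutual_information(joint, left, right):
    """I(X;Y) in bits; X and Y group the coordinates listed in left and right."""
    pxy, px, py = {}, {}, {}
    for outcome, prob in joint.items():
        x = tuple(outcome[i] for i in left)
        y = tuple(outcome[i] for i in right)
        pxy[(x, y)] = pxy.get((x, y), 0.0) + prob
        px[x] = px.get(x, 0.0) + prob
        py[y] = py.get(y, 0.0) + prob
    return sum(v * math.log2(v / (px[x] * py[y]))
               for (x, y), v in pxy.items() if v > 0)

def amortized_gain(rho, flip):
    """I(A';BB')_omega - I(A'A;B')_rho for a binary symmetric channel A -> B."""
    omega = {}
    for (a_ref, a, b_mem), prob in rho.items():
        for b in (0, 1):
            weight = flip if b != a else 1 - flip
            key = (a_ref, b, b_mem)   # coordinates ordered as (A', B, B')
            omega[key] = omega.get(key, 0.0) + prob * weight
    return (mutual_information(omega, [0], [1, 2])
            - mutual_information(rho, [0, 1], [2]))

def random_state(rng):
    """Random joint distribution over (A', A, B'), each a bit."""
    weights = [rng.random() for _ in range(8)]
    total = sum(weights)
    return {k: w / total for k, w in zip(product((0, 1), repeat=3), weights)}

flip = 0.1
capacity = 1 + flip * math.log2(flip) + (1 - flip) * math.log2(1 - flip)
rng = random.Random(0)
largest_gain = max(amortized_gain(random_state(rng), flip) for _ in range(2000))
# A state with trivial B' and A' a perfect copy of a uniform A attains the capacity.
copy_state = {(0, 0, 0): 0.5, (1, 1, 0): 0.5}
print(largest_gain, amortized_gain(copy_state, flip), capacity)
```

In this restricted setting no sampled state exceeds the capacity, while the state with a trivial $B'$ system (the choice used in the proof of Lemma 17.4) attains it, consistent with the amortization collapse.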

We note here that the equality in (17.1.98) is stronger than the additivity of mutual
information shown in Chapter 11 (in particular, that shown in Theorem 11.19).
Indeed, the equality in (17.1.98) actually implies the additivity relation discussed
previously. To see this, consider that the equality in (17.1.98) implies that

𝐼 ( 𝐴′; 𝐵𝐵′)𝜔 − 𝐼 ( 𝐴′ 𝐴; 𝐵′) 𝜌 ≤ 𝐼 (N) (17.1.107)

for an arbitrary input state 𝜌 𝐴′ 𝐴𝐵′ , where 𝜔 𝐴′ 𝐵𝐵′ = N 𝐴→𝐵 (𝜌 𝐴′ 𝐴𝐵′ ). Now let
𝜌 𝐴′ 𝐴𝐵′ = M 𝐴′′ →𝐵′ (𝜎𝐴′ 𝐴𝐴′′ ) for some channel M 𝐴′′ →𝐵′ and some state 𝜎𝐴′ 𝐴𝐴′′ .
Then it follows that

𝜔 𝐴′ 𝐵𝐵′ = (N 𝐴→𝐵 ⊗ M 𝐴′′ →𝐵′ )(𝜎𝐴′ 𝐴𝐴′′ ), (17.1.108)

and applying (17.1.107), we have that

𝐼 ( 𝐴′; 𝐵𝐵′)𝜔 ≤ 𝐼 (N) + 𝐼 ( 𝐴′ 𝐴; 𝐵′) 𝜌 (17.1.109)

≤ 𝐼 (N) + sup 𝐼 ( 𝐴′ 𝐴; 𝐵′) 𝜌 (17.1.110)
𝜎𝐴′ 𝐴𝐴′′
= 𝐼 (N) + 𝐼 (M), (17.1.111)

where the inequality follows because the state 𝜎𝐴′ 𝐴𝐴′′ is a particular state to consider
for the optimization in the definition of the mutual information of the channel


M 𝐴′′ →𝐵′ . Since the inequality holds for all input states 𝜎𝐴′ 𝐴𝐴′′ to N 𝐴→𝐵 ⊗ M 𝐴′′ →𝐵′ ,
we conclude that
𝐼 (N ⊗ M) ≤ 𝐼 (N) + 𝐼 (M), (17.1.112)
which is the non-trivial inequality needed in the proof of the additivity of the mutual
information of a channel (see the proof of Theorem 11.19).
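How is the amortized mutual information relevant for analyzing a feedback-assisted protocol? Consider that the bound in (17.1.43) involves the mutual information $I(M; B_n B_{n-1}')_{\rho^{n}}$, so that
\begin{align}
&I(M; B_n B_{n-1}')_{\rho^{n}} \notag \\
&\quad = I(M; B_n B_{n-1}')_{\rho^{n}} - I(M; B_0')_{\rho^{1}} \tag{17.1.113} \\
&\quad = I(M; B_n B_{n-1}')_{\rho^{n}} - I(M; B_0')_{\rho^{1}} + \sum_{i=1}^{n-1} \left[ I(M; B_i')_{\rho^{i+1}} - I(M; B_i')_{\rho^{i+1}} \right] \tag{17.1.114} \\
&\quad \leq I(M; B_n B_{n-1}')_{\rho^{n}} - I(M; B_0')_{\rho^{1}} + \sum_{i=1}^{n-1} \left[ I(M; B_i B_{i-1}')_{\rho^{i}} - I(M; B_i')_{\rho^{i+1}} \right] \tag{17.1.115} \\
&\quad = \sum_{i=1}^{n} \left[ I(M; B_i B_{i-1}')_{\rho^{i}} - I(M; B_{i-1}')_{\rho^{i}} \right] \tag{17.1.116} \\
&\quad \leq n \cdot \sup_{\rho_{A'AB'}} \left[ I(A'; BB')_{\omega} - I(A'A; B')_{\rho} \right] \tag{17.1.117} \\
&\quad = n \cdot I^{\mathcal{A}}(\mathcal{N}) = n \cdot I(\mathcal{N}). \tag{17.1.118}
\end{align}
The first equality follows because the state $\rho^{1}_{M B_0'}$ is a product state. The second equality follows by adding and subtracting the mutual information of the state of the message system $M$ and Bob's memory system $B_i'$, which is a marginal of the state $\rho^{i+1}$. The inequality is a consequence of data processing under the action of the decoding channels: the systems $M B_i'$ of $\rho^{i+1}$ are obtained from the systems $M B_i B_{i-1}'$ of $\rho^{i}$ by applying $\mathcal{D}^{i}_{B_i B_{i-1}' \to F_i B_i'}$ and discarding $F_i$. The third equality follows from collecting terms. The final inequality follows because the state $\rho^{i}_{M B_i B_{i-1}'}$ is a particular state to consider in the optimization of the amortized mutual information, and the final equality follows from the amortization collapse in Proposition 17.5.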
Thus, we observe that the bound in (17.1.35), at a fundamental level, is a
consequence of the amortization collapse from Proposition 17.5.


17.1.3.2 Sandwiched Rényi Mutual Information

We can also consider the concept of amortization for the sandwiched Rényi mutual
information, and in this subsection, we revisit the bound in (17.1.36) to understand
it from this perspective.

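Definition 17.6 Amortized Sandwiched Mutual Information
The amortized sandwiched Rényi mutual information of a quantum channel $\mathcal{N}_{A \to B}$ is defined for $\alpha \in (0, 1) \cup (1, \infty)$ as follows:
\begin{equation}
\widetilde{I}^{\mathcal{A}}_{\alpha}(\mathcal{N}) \coloneqq \sup_{\rho_{A'AB'}} \left[ \widetilde{I}_{\alpha}(A'; BB')_{\omega} - \widetilde{I}_{\alpha}(A'A; B')_{\rho} \right], \tag{17.1.119}
\end{equation}
where $\omega_{A'BB'} \coloneqq \mathcal{N}_{A \to B}(\rho_{A'AB'})$ and the optimization is over states $\rho_{A'AB'}$.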

Just as with the mutual information of a channel, we find that for all $\alpha \in (0, 1) \cup (1, \infty)$,
\begin{equation}
\widetilde{I}_{\alpha}(\mathcal{N}) \leq \widetilde{I}^{\mathcal{A}}_{\alpha}(\mathcal{N}), \tag{17.1.120}
\end{equation}
and the proof of this is analogous to the proof of Lemma 17.4, which establishes the corresponding inequality for the mutual information. So the question is to determine whether the opposite inequality holds. Indeed, we find again that it is the case, at least for $\alpha > 1$.

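Proposition 17.7
Amortization does not increase the sandwiched Rényi mutual information of a quantum channel $\mathcal{N}$ for all $\alpha > 1$:
\begin{equation}
\widetilde{I}_{\alpha}(\mathcal{N}) = \widetilde{I}^{\mathcal{A}}_{\alpha}(\mathcal{N}). \tag{17.1.121}
\end{equation}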

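Proof: Let $\rho_{A'AB'}$ be an arbitrary input state, and let $\sigma_B$ and $\tau_{B'}$ be arbitrary states. Then, letting $\omega_{A'BB'} = \mathcal{N}_{A \to B}(\rho_{A'AB'})$, we find that
\begin{align}
\widetilde{I}_{\alpha}(A'; BB')_{\omega} &= \inf_{\xi_{BB'}} \widetilde{D}_{\alpha}(\omega_{A'BB'} \| \omega_{A'} \otimes \xi_{BB'}) \tag{17.1.122} \\
&\leq \widetilde{D}_{\alpha}(\omega_{A'BB'} \| \omega_{A'} \otimes \sigma_B \otimes \tau_{B'}) \tag{17.1.123} \\
&= \widetilde{D}_{\alpha}(\mathcal{N}_{A \to B}(\rho_{A'AB'}) \| \rho_{A'} \otimes \sigma_B \otimes \tau_{B'}) \tag{17.1.124} \\
&= \frac{\alpha}{\alpha - 1} \log_2 \left\| (\rho_{A'} \otimes \sigma_B \otimes \tau_{B'})^{\frac{1-\alpha}{2\alpha}} \mathcal{N}_{A \to B}(\rho_{A'AB'}) (\rho_{A'} \otimes \sigma_B \otimes \tau_{B'})^{\frac{1-\alpha}{2\alpha}} \right\|_{\alpha}, \tag{17.1.125}
\end{align}
where to obtain the last equality we used the alternate expression in (7.5.3) for the sandwiched Rényi relative entropy. Defining
\begin{equation}
X^{(\alpha)}_{A'AB'} \coloneqq (\rho_{A'} \otimes \tau_{B'})^{\frac{1-\alpha}{2\alpha}} \rho_{A'AB'} (\rho_{A'} \otimes \tau_{B'})^{\frac{1-\alpha}{2\alpha}}, \tag{17.1.126}
\end{equation}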

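and making use of the completely positive map $\mathcal{S}^{(\alpha)}_{\sigma_B}$ from (17.1.74), we find that
\begin{align}
&\left\| (\rho_{A'} \otimes \sigma_B \otimes \tau_{B'})^{\frac{1-\alpha}{2\alpha}} \mathcal{N}_{A \to B}(\rho_{A'AB'}) (\rho_{A'} \otimes \sigma_B \otimes \tau_{B'})^{\frac{1-\alpha}{2\alpha}} \right\|_{\alpha} \notag \\
&\quad = \left\| (\mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A \to B})(X^{(\alpha)}_{A'AB'}) \right\|_{\alpha} \tag{17.1.127} \\
&\quad = \frac{\left\| (\mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A \to B})(X^{(\alpha)}_{A'AB'}) \right\|_{\alpha}}{\left\| X^{(\alpha)}_{A'B'} \right\|_{\alpha}} \left\| X^{(\alpha)}_{A'B'} \right\|_{\alpha} \tag{17.1.128} \\
&\quad \leq \sup_{Y_{A'AB'} \geq 0} \frac{\left\| (\mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A \to B})(Y_{A'AB'}) \right\|_{\alpha}}{\left\| Y_{A'B'} \right\|_{\alpha}} \times \left\| [\rho_{A'} \otimes \tau_{B'}]^{\frac{1-\alpha}{2\alpha}} \rho_{A'B'} [\rho_{A'} \otimes \tau_{B'}]^{\frac{1-\alpha}{2\alpha}} \right\|_{\alpha} \tag{17.1.129} \\
&\quad = \left\| \mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A \to B} \right\|_{\text{CB},1 \to \alpha} \cdot \widetilde{Q}_{\alpha}(\rho_{A'B'} \| \rho_{A'} \otimes \tau_{B'})^{\frac{1}{\alpha}}, \tag{17.1.130}
\end{align}
where in the last line we have used the expression in (11.E.1) for the norm $\|\cdot\|_{\text{CB},1 \to \alpha}$. Plugging (17.1.130) back into (17.1.125), we find that
\begin{equation}
\widetilde{I}_{\alpha}(A'; BB')_{\omega} \leq \frac{\alpha}{\alpha - 1} \log_2 \left\| \mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A \to B} \right\|_{\text{CB},1 \to \alpha} + \widetilde{D}_{\alpha}(\rho_{A'B'} \| \rho_{A'} \otimes \tau_{B'}). \tag{17.1.131}
\end{equation}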

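Since the inequality holds for arbitrary states $\sigma_B$ and $\tau_{B'}$, we conclude that
\begin{align}
\widetilde{I}_{\alpha}(A'; BB')_{\omega} &\leq \frac{\alpha}{\alpha - 1} \inf_{\sigma_B} \log_2 \left\| \mathcal{S}^{(\alpha)}_{\sigma_B} \circ \mathcal{N}_{A \to B} \right\|_{\text{CB},1 \to \alpha} + \inf_{\tau_{B'}} \widetilde{D}_{\alpha}(\rho_{A'B'} \| \rho_{A'} \otimes \tau_{B'}) \tag{17.1.132} \\
&= \widetilde{I}_{\alpha}(\mathcal{N}) + \widetilde{I}_{\alpha}(A'; B')_{\rho} \tag{17.1.133} \\
&\leq \widetilde{I}_{\alpha}(\mathcal{N}) + \widetilde{I}_{\alpha}(A'A; B')_{\rho}, \tag{17.1.134}
\end{align}
where the equality follows from Lemma 11.20 and the final inequality from the data-processing inequality for the sandwiched Rényi mutual information under the partial trace $\operatorname{Tr}_{A}$. Since we have shown that the following inequality holds for an arbitrary input state $\rho_{A'AB'}$:
\begin{equation}
\widetilde{I}_{\alpha}(A'; BB')_{\omega} - \widetilde{I}_{\alpha}(A'A; B')_{\rho} \leq \widetilde{I}_{\alpha}(\mathcal{N}), \tag{17.1.135}
\end{equation}
we conclude that $\widetilde{I}^{\mathcal{A}}_{\alpha}(\mathcal{N}) \leq \widetilde{I}_{\alpha}(\mathcal{N})$, which leads to $\widetilde{I}^{\mathcal{A}}_{\alpha}(\mathcal{N}) = \widetilde{I}_{\alpha}(\mathcal{N})$ after combining with (17.1.120). ■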

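By following exactly the same steps in (17.1.107)–(17.1.112), but replacing $I$ with $\widetilde{I}_{\alpha}$, we can conclude that the amortization collapse in Proposition 17.7 implies the additivity relation (Theorem 11.22) for sandwiched Rényi mutual information of quantum channels for all $\alpha > 1$:
\begin{equation}
\widetilde{I}_{\alpha}(\mathcal{N}) = \widetilde{I}^{\mathcal{A}}_{\alpha}(\mathcal{N}) \quad \forall\, \alpha > 1 \quad \Longrightarrow \quad \widetilde{I}_{\alpha}(\mathcal{N} \otimes \mathcal{M}) \leq \widetilde{I}_{\alpha}(\mathcal{N}) + \widetilde{I}_{\alpha}(\mathcal{M}), \tag{17.1.136}
\end{equation}
where $\mathcal{N}$ and $\mathcal{M}$ are quantum channels. Furthermore, by following exactly the same steps as in (17.1.113)–(17.1.118), but replacing $I$ with $\widetilde{I}_{\alpha}$, and employing Proposition 17.7, we conclude the following bound:
\begin{equation}
\widetilde{I}_{\alpha}(M; B_n B_{n-1}')_{\rho^{n}} \leq n \cdot \widetilde{I}_{\alpha}(\mathcal{N}), \tag{17.1.137}
\end{equation}
which in turn implies the bound in (17.1.36). Thus, we can alternatively analyze feedback-assisted protocols and arrive at the bound in (17.1.36) by utilizing the concept of amortization.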

17.2 Quantum Feedback-Assisted Classical Capacity of a Quantum Channel
In this section, we analyze the asymptotic case, in which we allow for an arbitrarily
large number of rounds of feedback. This task is now rather straightforward, given
the bounds that we have established in the previous section. So we keep this section
brief, only stating some definitions and then some theorems that follow as a direct
consequence of definitions and developments in previous chapters.


Definition 17.8 Achievable Rate for Quantum-Feedback-Assisted Classical Communication
Given a quantum channel $\mathcal{N}$, a rate $R \in \mathbb{R}^{+}$ is called an achievable rate for quantum-feedback-assisted classical communication over $\mathcal{N}$ if for all $\varepsilon \in (0, 1]$, all $\delta > 0$, and all sufficiently large $n$, there exists an $(n, 2^{n(R - \delta)}, \varepsilon)$ quantum-feedback-assisted classical communication protocol.

Definition 17.9 Quantum-Feedback-Assisted Classical Capacity of a Quantum Channel
The quantum-feedback-assisted classical capacity of a quantum channel $\mathcal{N}$, denoted by $C_{\text{QFB}}(\mathcal{N})$, is defined as the supremum of all achievable rates, i.e.,
\begin{equation}
C_{\text{QFB}}(\mathcal{N}) \coloneqq \sup\{R : R \text{ is an achievable rate for } \mathcal{N}\}. \tag{17.2.1}
\end{equation}

Definition 17.10 Strong Converse Rate for Quantum-Feedback-Assisted Classical Communication
Given a quantum channel $\mathcal{N}$, a rate $R \in \mathbb{R}^{+}$ is called a strong converse rate for quantum-feedback-assisted classical communication over $\mathcal{N}$ if for all $\varepsilon \in [0, 1)$, all $\delta > 0$, and all sufficiently large $n$, there does not exist an $(n, 2^{n(R + \delta)}, \varepsilon)$ quantum-feedback-assisted classical communication protocol.

Definition 17.11 Strong Converse Quantum-Feedback-Assisted Classical Capacity of a Quantum Channel
The strong converse quantum-feedback-assisted classical capacity of a quantum channel $\mathcal{N}$, denoted by $\widetilde{C}_{\text{QFB}}(\mathcal{N})$, is defined as the infimum of all strong converse rates, i.e.,
\begin{equation}
\widetilde{C}_{\text{QFB}}(\mathcal{N}) \coloneqq \inf\{R : R \text{ is a strong converse rate for } \mathcal{N}\}. \tag{17.2.2}
\end{equation}

The main result of this section is the following capacity theorem:


Theorem 17.12 Quantum-Feedback-Assisted Classical Capacity
For any quantum channel $\mathcal{N}$, its quantum-feedback-assisted classical capacity $C_{\text{QFB}}(\mathcal{N})$ and its strong converse quantum-feedback-assisted classical capacity are both equal to its mutual information $I(\mathcal{N})$, i.e.,
\begin{equation}
C_{\text{QFB}}(\mathcal{N}) = \widetilde{C}_{\text{QFB}}(\mathcal{N}) = I(\mathcal{N}), \tag{17.2.3}
\end{equation}
where $I(\mathcal{N})$ is defined in (7.11.102).

Proof: By previous reasoning, we have that

𝐶QFB (N) ≤ 𝐶
eQFB (N), (17.2.4)

and by Theorem 11.16 and the fact that any entanglement-assisted classical
communication protocol is a particular kind of quantum-feedback-assisted classical
communication protocol, we have that

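\[ I(\mathcal{N}) \le C_{\mathrm{QFB}}(\mathcal{N}) \le \widetilde{C}_{\mathrm{QFB}}(\mathcal{N}). \tag{17.2.5} \]

The upper bound $\widetilde{C}_{\mathrm{QFB}}(\mathcal{N}) \le I(\mathcal{N})$ follows from (17.1.36) and the same reasoning
given in the proof detailed in Section 11.2.3. ■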

17.3 Bibliographic Notes

Shannon (1956) proved that feedback does not increase the classical capacity of
a classical channel (his result is a weak-converse bound). The strong converse
for the feedback-assisted classical capacity of a classical channel was established
independently by Kemperman and Kesten. Kesten’s proof appeared in (Wolfowitz,
1964, Chapter 4) and Kemperman’s proof appeared later in (Kemperman, 1971).
See (Ulrey, 1976) for a discussion of this history. Polyanskiy and Verdú (2010)
employed a Rényi-entropic method to extend Shannon’s result to a strong converse
statement, i.e., that the mutual information of a classical channel is equal to the
strong converse feedback-assisted classical capacity.
Bowen (2004) proved that a quantum feedback channel does not increase the
entanglement-assisted classical capacity of a quantum channel (his result is a
weak-converse bound). Bennett et al. (2014) proved that the mutual information
of a quantum channel is equal to the strong converse quantum-feedback-assisted
classical capacity. Their approach was to employ the quantum reverse Shannon
theorem to do so. Cooney et al. (2016) used a Rényi-entropic method to prove this
same result, and this is the approach that we have followed in this book. As far as
we are aware, the concept of amortized mutual information of a quantum channel
and the fact that it reduces to the mutual information of a quantum channel are
original to this book.

Chapter 18

Classical-Feedback-Assisted
Communication
In this chapter, we continue with our study of feedback-assisted capacities. The
protocols that we consider in this chapter are very similar to those from
the previous chapter (Chapter 17), with the exception that the feedback channel is
a classical channel instead of a quantum channel. The resulting communication
task is then called classical communication assisted by a classical channel (or
classical-feedback-assisted communication for short).
Interestingly, this slight change has the effect of complicating the theory quite
a bit: a general expression for the capacity is not known. It is only known for
certain channels such as entanglement-breaking channels and erasure channels.
Additionally, there are examples of channels for which classical feedback can
increase the classical capacity significantly, due to the interplay between classical
feedback and entanglement that can be generated by the channel. We do not discuss
this example in this chapter and instead point to the Bibliographic Notes for details
(Section 18.7). All of the above implies that the increase of capacity due to classical
feedback is a truly quantum-mechanical phenomenon that separates the classical
and quantum theories of communication. Indeed, it is necessary for a channel to
have the ability to generate entanglement in order for classical feedback to give a
boost to capacity.
Our main focus in this chapter is on establishing upper bounds on the classical-
feedback-assisted capacity. First, we prove that classical feedback does not increase
the capacity of entanglement-breaking channels. The main tools here are similar
to those employed in Section 12.2.3.1. Next, we establish that the average output
entropy of a channel is an upper bound on the feedback-assisted capacity. Finally,
we establish that the Υ-information of a channel, introduced in Section 12.2.5.1,
is actually an upper bound on the feedback-assisted capacity. We close out the
chapter by discussing some example channels and summarizing the main concepts
presented.

18.1 𝒏-Shot Classical-Feedback-Assisted Communication Protocols
In this section, we briefly summarize what is meant by an 𝑛-shot protocol for
classical communication assisted by a classical feedback channel, where 𝑛 ∈ N.
This section is brief because such a protocol is defined exactly as in Section 17.1,
with the exception that every feedback channel acting on system 𝐹𝑖 , sent from the
receiver to the sender, for all 𝑖 ∈ {0, . . . , 𝑛}, is a classical feedback channel of the
following form:
\[ \Delta_{F_i}(\rho_{F_i}) := \sum_{j=0}^{d_{F_i}-1} |j\rangle\langle j|_{F_i}\, \rho_{F_i}\, |j\rangle\langle j|_{F_i}, \tag{18.1.1} \]
where $\{|j\rangle\}_{j=0}^{d_{F_i}-1}$ is some orthonormal basis known to both the sender and the
receiver.
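As a concrete illustration of (18.1.1), the following minimal sketch applies the completely dephasing channel to a density matrix expressed in the basis {| 𝑗⟩}; in that basis the channel keeps the diagonal entries and sets all others to zero. The function name `dephase` and the list-of-lists matrix representation are assumptions made only for this illustration.

```python
def dephase(rho):
    """Completely dephasing channel in the fixed basis: keep the diagonal, zero the rest."""
    d = len(rho)
    return [[rho[j][k] if j == k else 0.0 for k in range(d)] for j in range(d)]

# The state |+><+| has off-diagonal coherences, which dephasing removes.
plus = [[0.5, 0.5], [0.5, 0.5]]
dephased = dephase(plus)
```

Applying `dephase` a second time leaves the result unchanged, which reflects the fact that the channel is idempotent.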
In short, every such protocol has the form given in Figure 17.1, with the
aforementioned exception that every channel acting on 𝐹𝑖 , for 𝑖 ∈ {0, . . . , 𝑛}, is a
classical feedback channel as in (18.1.1). Every such protocol is defined by the
following elements:

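\[ \left(\mathcal{M},\ \Psi_{F_0B'_0},\ \mathcal{E}^0_{M'F_0\to A'_1A_1},\ \{\mathcal{E}^i_{A'_iF_i\to A'_{i+1}A_{i+1}},\ \mathcal{D}^i_{B_iB'_{i-1}\to F_iB'_i}\}_{i=1}^{n-1},\ \mathcal{D}^n_{B_nB'_{n-1}\to\hat{M}}\right), \tag{18.1.2} \]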
where M is the message set, Ψ𝐹0 𝐵0′ is a bipartite quantum state, the objects denoted
by E are encoding channels, and those denoted by D decoding channels. Let
C denote all of these elements, which together constitute the classical-feedback-
assisted code. The systems labeled by 𝐹 are feedback systems, the 𝑖th of which is
sent by the receiver Bob to the sender Alice through the classical feedback channel
in (18.1.1). The initial state of such a protocol, prepared by Alice, is $\Phi^p_{MM'}$, as
defined in (17.1.2). The states throughout the protocol are the same as defined in
Section 17.1, with the exception that every state with an 𝐹 label is replaced by the
same state succeeded by the completely dephasing channel Δ𝐹𝑖 . That is, the initial
state is
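\[ \Phi^p_{MM'} \otimes \Delta_{F_0}(\Psi_{F_0B'_0}), \tag{18.1.3} \]
and the other states are
\begin{align}
\rho^1_{MA'_1B_1B'_0} &:= (\mathcal{N}_{A_1\to B_1} \circ \mathcal{E}^0_{M'F_0\to A'_1A_1})(\Phi^p_{MM'} \otimes \Delta_{F_0}(\Psi_{F_0B'_0})), \tag{18.1.4} \\
\rho^2_{MA'_2B_2B'_1} &:= (\mathcal{N}_{A_2\to B_2} \circ \mathcal{E}^1_{A'_1F_1\to A'_2A_2} \circ \Delta_{F_1} \circ \mathcal{D}^1_{B_1B'_0\to F_1B'_1})(\rho^1_{MA'_1B_1B'_0}), \tag{18.1.5} \\
\rho^i_{MA'_iB_iB'_{i-1}} &:= (\mathcal{N}_{A_i\to B_i} \circ \mathcal{E}^{i-1}_{A'_{i-1}F_{i-1}\to A'_iA_i} \circ \Delta_{F_{i-1}} \circ \mathcal{D}^{i-1}_{B_{i-1}B'_{i-2}\to F_{i-1}B'_{i-1}})(\rho^{i-1}_{MA'_{i-1}B_{i-1}B'_{i-2}}), \tag{18.1.6}
\end{align}
where 𝑖 ∈ {3, . . . , 𝑛}. The final state of the protocol is then as follows:
\[ \omega^p_{M\hat{M}} := \mathcal{D}^n_{B_nB'_{n-1}\to\hat{M}}(\mathrm{Tr}_{A'_n}[\rho^n_{MA'_nB_nB'_{n-1}}]). \tag{18.1.7} \]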

Consider that the initial state of the protocol, as given in (18.1.3), has the
following form:
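\[ \Phi^p_{MM'} \otimes \Delta_{F_0}(\Psi_{F_0B'_0}) = \sum_{m\in\mathcal{M}} p(m)|m\rangle\langle m|_M \otimes |m\rangle\langle m|_{M'} \otimes \sum_{f_0} p(f_0)|f_0\rangle\langle f_0|_{F_0} \otimes \Psi^{f_0}_{B'_0}, \tag{18.1.8} \]
where $p(f_0)$ is a probability distribution over the possible classical values sent
through the feedback channel and each $\Psi^{f_0}_{B'_0}$ is a state of the system $B'_0$. After the
encoding channel $\mathcal{E}^0_{M'F_0\to A'_1A_1}$ acts, the state becomes as follows:
\[ \mathcal{E}^0_{M'F_0\to A'_1A_1}(\Phi^p_{MM'} \otimes \Delta_{F_0}(\Psi_{F_0B'_0})) = \sum_{m\in\mathcal{M}} \sum_{f_0} p(m)p(f_0)|m\rangle\langle m|_M \otimes \varsigma^{0,m,f_0}_{A'_1A_1} \otimes \Psi^{f_0}_{B'_0}, \tag{18.1.9} \]
where the state $\varsigma^{0,m,f_0}_{A'_1A_1}$ is defined as
\[ \varsigma^{0,m,f_0}_{A'_1A_1} := \mathcal{E}^0_{M'F_0\to A'_1A_1}(|m\rangle\langle m|_{M'} \otimes |f_0\rangle\langle f_0|_{F_0}). \tag{18.1.10} \]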

Then one can proceed from here, defining states of the protocol conditioned on the
value of the message and the classical feedback.
Just as we did in Chapter 11, we define the message error probability, average
error probability, and maximal error probability, as in (11.1.13), (11.1.14), and
(11.1.15), respectively. Using the expression in (11.1.24), the average error
probability for the classical-feedback-assisted code C is given by
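\[ p_{\mathrm{err}}(\mathcal{C}; p) := \frac{1}{2}\left\| \Phi^p_{M\hat{M}} - \omega^p_{M\hat{M}} \right\|_1, \tag{18.1.11} \]
and using the expression in (11.1.36), the maximal error probability for the
classical-feedback-assisted code C is given by
\[ p^*_{\mathrm{err}}(\mathcal{C}) := \max_{p:\mathcal{M}\to[0,1]} \frac{1}{2}\left\| \Phi^p_{M\hat{M}} - \omega^p_{M\hat{M}} \right\|_1, \tag{18.1.12} \]
where the maximization is over every probability distribution 𝑝.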
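Since both states in (18.1.11) are classical–classical, the normalized trace distance reduces to a sum over outcomes. The sketch below checks this numerically for a hypothetical two-message decoder; the names `average_error`, `maximal_error`, and `half_l1`, as well as the transition table `q`, are illustrative assumptions and not objects defined in the text.

```python
def average_error(p, q):
    """p[m]: prior on messages; q[m][m_hat]: probability of decoding m_hat given m."""
    return sum(p[m] * (1.0 - q[m][m]) for m in range(len(p)))

def maximal_error(q):
    """The average error is linear in p, so its maximum is attained at a point mass."""
    return max(1.0 - q[m][m] for m in range(len(q)))

def half_l1(p, q):
    """(1/2) * || Phi^p - omega^p ||_1 evaluated entrywise for diagonal states."""
    total = 0.0
    for m in range(len(p)):
        for m_hat in range(len(q[m])):
            ideal = p[m] if m_hat == m else 0.0
            total += abs(ideal - p[m] * q[m][m_hat])
    return 0.5 * total

q = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical decoder behaviour
p = [0.5, 0.5]                 # uniform prior over two messages
avg = average_error(p, q)
dist = half_l1(p, q)
worst = maximal_error(q)
```

In this example the two expressions for the average error agree, and the maximal error is the larger of the two per-message error probabilities.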

Definition 18.1 (𝒏, |M| , 𝜺) Classical-Feedback-Assisted Classical Communication Protocol
Let $(\mathcal{M},\ \Psi_{F_0B'_0},\ \mathcal{E}^0_{M'F_0\to A'_1A_1},\ \{\mathcal{E}^i_{A'_iF_i\to A'_{i+1}A_{i+1}},\ \mathcal{D}^i_{B_iB'_{i-1}\to F_iB'_i}\}_{i=1}^{n-1},\ \mathcal{D}^n_{B_nB'_{n-1}\to\hat{M}})$ be
the elements of an 𝑛-shot classical-feedback-assisted classical communication
protocol over the channel N 𝐴→𝐵 . The protocol is called an (𝑛, |M| , 𝜀) protocol,
with 𝜀 ∈ [0, 1], if 𝑝 ∗err (C) ≤ 𝜀.

18.2 Protocol over a Useless Channel

A common theme in this book has been that we can derive converse bounds by
using a generalized divergence to compare the output of the actual protocol with
one that is useless for the task. We did exactly this in Section 17.1.1 of the previous
chapter, and the only change that we make here, as in the previous section, is to
replace every quantum feedback channel with a classical one. Figure 17.2 applies
again and we briefly define the steps and states exactly as before, except with this
key difference for both the figure and the states involved. A useless channel is one
that traces out the input and replaces it with some state at the output:
\[ \mathcal{R}_{A\to B} := \mathcal{P}_{\sigma_B} \circ \mathrm{Tr}_A, \tag{18.2.1} \]
where P𝜎𝐵 denotes a preparation channel that prepares the state 𝜎𝐵 at the output.
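To make the replacement channel in (18.2.1) concrete, here is a minimal sketch that traces out the input and prepares a fixed output state. The helper names `trace` and `replacement_channel` and the example matrices are assumptions chosen for illustration.

```python
def trace(rho):
    return sum(rho[i][i] for i in range(len(rho)))

def replacement_channel(rho_A, sigma_B):
    """Trace out the input, then prepare sigma_B (scaled by Tr[rho_A] for linearity)."""
    t = trace(rho_A)
    return [[t * entry for entry in row] for row in sigma_B]

sigma = [[0.75, 0.0], [0.0, 0.25]]   # fixed output state sigma_B
rho0 = [[1.0, 0.0], [0.0, 0.0]]      # input |0><0|
rho1 = [[0.5, 0.5], [0.5, 0.5]]      # input |+><+|
out0 = replacement_channel(rho0, sigma)
out1 = replacement_channel(rho1, sigma)
```

Both inputs produce the same output, so nothing about the input can be inferred from the channel output.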
The initial state of this protocol is
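\[ \Phi^p_{MM'} \otimes \Delta_{F_0}(\Psi_{F_0B'_0}), \tag{18.2.2} \]
and the others are as follows:
\begin{align}
\tau^1_{MA'_1B_1B'_0} &:= (\mathcal{R}_{A_1\to B_1} \circ \mathcal{E}^0_{M'F_0\to A'_1A_1})(\Phi^p_{MM'} \otimes \Delta_{F_0}(\Psi_{F_0B'_0})), \tag{18.2.3} \\
\tau^2_{MA'_2B_2B'_1} &:= (\mathcal{R}_{A_2\to B_2} \circ \mathcal{E}^1_{A'_1F_1\to A'_2A_2} \circ \Delta_{F_1} \circ \mathcal{D}^1_{B_1B'_0\to F_1B'_1})(\tau^1_{MA'_1B_1B'_0}), \tag{18.2.4} \\
\tau^i_{MA'_iB_iB'_{i-1}} &:= (\mathcal{R}_{A_i\to B_i} \circ \mathcal{E}^{i-1}_{A'_{i-1}F_{i-1}\to A'_iA_i} \circ \Delta_{F_{i-1}} \circ \mathcal{D}^{i-1}_{B_{i-1}B'_{i-2}\to F_{i-1}B'_{i-1}})(\tau^{i-1}_{MA'_{i-1}B_{i-1}B'_{i-2}}), \tag{18.2.5}
\end{align}
where 𝑖 ∈ {3, . . . , 𝑛}. The final state of the protocol is then as follows:
\[ \tau_{M\hat{M}} := \mathcal{D}^n_{B_nB'_{n-1}\to\hat{M}}(\mathrm{Tr}_{A'_n}[\tau^n_{MA'_nB_nB'_{n-1}}]). \tag{18.2.6} \]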

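Going through calculations similar to those in (17.1.15)–(17.1.33), we arrive at
the following conclusions:
\begin{align}
\tau^1_{MA'_1B_1B'_0} &= \tau^1_{MA'_1B'_0} \otimes \sigma_{B_1}, \tag{18.2.7} \\
\tau^1_{MB_1B'_0} &= \pi^p_M \otimes \Psi_{B'_0} \otimes \sigma_{B_1}, \tag{18.2.8} \\
\tau^2_{MB_2B'_1} &= \pi^p_M \otimes \tau^2_{B'_1} \otimes \sigma_{B_2}, \tag{18.2.9} \\
\tau^i_{MB_iB'_{i-1}} &= \pi^p_M \otimes \tau^i_{B'_{i-1}} \otimes \sigma_{B_i}, \tag{18.2.10}
\end{align}
where 𝑖 ∈ {3, . . . , 𝑛} and
\[ \pi^p_M := \sum_{m\in\mathcal{M}} p(m)|m\rangle\langle m|_M. \tag{18.2.11} \]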
Thus, there is no correlation whatsoever between the message system 𝑀 and Bob’s
systems $B_iB'_{i-1}$, for each 𝑖 ∈ {1, . . . , 𝑛}, after tracing over Alice’s systems. As
before, this is a consequence of the fact that the “communication line has been cut”
when employing the replacement channel.
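Bob’s final decoding channel $\mathcal{D}^n_{B_nB'_{n-1}\to\hat{M}}$ thus leads to the following classical–
classical state:
\[ \tau_{M\hat{M}} := \pi^p_M \otimes \tau_{\hat{M}}, \tag{18.2.12} \]
where $\tau_{\hat{M}} := \sum_{\hat{m}\in\mathcal{M}} t(\hat{m})|\hat{m}\rangle\langle\hat{m}|_{\hat{M}}$, for some probability distribution $t : \mathcal{M} \to [0, 1]$,
which corresponds to Bob’s measurement.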

18.3 Upper Bounds on the Number of Transmitted Bits
We now provide several upper bounds on the number of transmitted bits for classical
communication protocols assisted by classical feedback. The first bounds that
we discuss apply only to entanglement-breaking channels (recall Definition 4.12),
and they imply that classical feedback increases neither the asymptotic classical
capacity of entanglement-breaking channels nor their strong converse classical
capacity. Then the next two bounds apply to all quantum channels, and they are
known as the entropy bound and the geometric Υ-information bound.

18.3.1 Upper Bounds for Entanglement-Breaking Channels

Before stating the main theorem of this section, we discuss particular aspects
of a classical-feedback-assisted protocol for classical communication over an
entanglement-breaking channel. Indeed, suppose that N 𝐴→𝐵 is an entanglement-
breaking channel. We begin our analysis by inspecting the state in (18.1.9). This
state is fully separable with respect to the cut 𝑀 : 𝐴′1 𝐴1 : 𝐵′0 . That is, it can be
written as follows:
\[ \sum_z q(z)\, \tau^z_M \otimes \sigma^z_{A'_1A_1} \otimes \omega^z_{B'_0}, \tag{18.3.1} \]
for 𝑞 a probability distribution and $\{\tau^z_M\}_z$, $\{\sigma^z_{A'_1A_1}\}_z$, and $\{\omega^z_{B'_0}\}_z$ sets of states.
Since the channel N 𝐴→𝐵 is entanglement breaking, when it acts on system 𝐴1 of
the state $\varsigma^{0,m,f_0}_{A'_1A_1}$ in (18.1.9), the resulting state is a separable state of the following
form:
\[ \mathcal{N}_{A_1\to B_1}(\varsigma^{0,m,f_0}_{A'_1A_1}) = \sum_y p(y|m,f_0)\, \varsigma^{y,m,f_0}_{A'_1} \otimes \varsigma^{y,m,f_0}_{B_1}. \tag{18.3.2} \]

So this implies that the state $\rho^1_{MA'_1B_1B'_0}$, as defined in (18.1.4) and with $\mathcal{N}_{A_1\to B_1}$
entanglement breaking, is fully separable across all systems (i.e., with respect to
the cut 𝑀 : 𝐴′1 : 𝐵1 : 𝐵′0 ).

Bob then applies the decoding channel $\Delta_{F_1} \circ \mathcal{D}^1_{B_1B'_0\to F_1B'_1}$, and the state at this
point is as follows:
\begin{multline}
(\Delta_{F_1} \circ \mathcal{D}^1_{B_1B'_0\to F_1B'_1})(\rho^1_{MA'_1B_1B'_0}) = \sum_{m\in\mathcal{M}} \sum_{f_0,y} p(y|m,f_0)\, p(m)\, p(f_0)\, |m\rangle\langle m|_M \otimes {} \\
\varsigma^{y,m,f_0}_{A'_1} \otimes (\Delta_{F_1} \circ \mathcal{D}^1_{B_1B'_0\to F_1B'_1})(\varsigma^{y,m,f_0}_{B_1} \otimes \Psi^{f_0}_{B'_0}). \tag{18.3.3}
\end{multline}

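Since the 𝐹1 system is classical, the state $(\Delta_{F_1} \circ \mathcal{D}^1_{B_1B'_0\to F_1B'_1})(\varsigma^{y,m,f_0}_{B_1} \otimes \Psi^{f_0}_{B'_0})$ can
be written as
\[ (\Delta_{F_1} \circ \mathcal{D}^1_{B_1B'_0\to F_1B'_1})(\varsigma^{y,m,f_0}_{B_1} \otimes \Psi^{f_0}_{B'_0}) = \sum_{f_1} p(f_1|y,m,f_0)\, |f_1\rangle\langle f_1|_{F_1} \otimes \varsigma^{f_1,y,m,f_0}_{B'_1}. \tag{18.3.4} \]
This means that the state $(\Delta_{F_1} \circ \mathcal{D}^1_{B_1B'_0\to F_1B'_1})(\rho^1_{MA'_1B_1B'_0})$ is fully separable with
respect to the cut $M : A'_1 : F_1 : B'_1$.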
This process continues, and since the channel N 𝐴→𝐵 is entanglement breaking,
by following an analysis similar to that given above, we observe that the joint state of
the message system 𝑀, Alice’s systems, and Bob’s systems is always fully separable
throughout the protocol. This is the key reason that we obtain the bounds given in
the following theorem:

Theorem 18.2 𝒏-Shot Upper Bounds for Classical-Feedback-Assisted Classical Communication over Entanglement-Breaking Channels
Let N 𝐴→𝐵 be an entanglement-breaking channel, and let 𝜀 ∈ [0, 1). For all
(𝑛, |M| , 𝜀) classical-feedback-assisted classical communication protocols over
the channel N 𝐴→𝐵 , the following bounds hold:
\begin{align}
\frac{\log_2|\mathcal{M}|}{n} &\le \frac{1}{1-\varepsilon}\left(\chi(\mathcal{N}) + \frac{1}{n}h_2(\varepsilon)\right), \tag{18.3.5} \\
\frac{\log_2|\mathcal{M}|}{n} &\le \widetilde{\chi}_\alpha(\mathcal{N}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \quad \forall \alpha > 1, \tag{18.3.6}
\end{align}
where 𝜒(N) is the Holevo information of N 𝐴→𝐵 , as defined in (7.11.106), and
$\widetilde{\chi}_\alpha(\mathcal{N})$ is the sandwiched Rényi Holevo information of N 𝐴→𝐵 , as defined in
(7.11.95).
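To get a feel for how the two bounds in Theorem 18.2 behave, the following sketch evaluates their right-hand sides for given numerical inputs. It assumes that values of the Holevo information and of the sandwiched Rényi Holevo information are supplied by the caller (computing them for a specific channel is a separate optimization that is not attempted here); the function names are hypothetical.

```python
import math

def h2(eps):
    """Binary entropy in bits, with the convention 0 * log2(0) = 0."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1.0 - eps) * math.log2(1.0 - eps)

def weak_converse_rate_bound(chi, eps, n):
    """Right-hand side of (18.3.5), given a value chi for the Holevo information."""
    return (chi + h2(eps) / n) / (1.0 - eps)

def strong_converse_rate_bound(chi_alpha, alpha, eps, n):
    """Right-hand side of (18.3.6), given a value chi_alpha and some alpha > 1."""
    return chi_alpha + alpha / (n * (alpha - 1.0)) * math.log2(1.0 / (1.0 - eps))

weak = weak_converse_rate_bound(chi=1.0, eps=0.5, n=10)
strong = strong_converse_rate_bound(chi_alpha=1.0, alpha=2.0, eps=0.5, n=10)
strong_large_n = strong_converse_rate_bound(chi_alpha=1.0, alpha=2.0, eps=0.5, n=10**6)
```

For fixed 𝜀 < 1, the second term of the Rényi bound vanishes as 𝑛 grows, whereas the first bound retains the prefactor 1/(1 − 𝜀); this is the usual distinction between weak-converse and strong-converse statements.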

Proof: Applying precisely the same reasoning as in the beginning of the proof of
Theorem 17.2, we conclude the following bound:
\[ \log_2|\mathcal{M}| \le I^\varepsilon_H(M;\hat{M})_\omega, \tag{18.3.7} \]
where 𝜔 𝑀 𝑀ˆ is the final state of the protocol when the distribution 𝑝 is set to the
uniform distribution.
Invoking Proposition 7.70, the definition of $I^\varepsilon_H(M;\hat{M})$ from (7.11.88), and the
expression for the mutual information from (7.2.97), we find that
\[ I^\varepsilon_H(M;\hat{M})_\omega \le \frac{1}{1-\varepsilon}\left( I(M;\hat{M})_\omega + h_2(\varepsilon) \right). \tag{18.3.8} \]
Now employing the data-processing inequality for the mutual information with
respect to the last decoding channel $\mathcal{D}^n_{B_nB'_{n-1}\to\hat{M}}$, we find that
\[ I(M;\hat{M})_\omega \le I(M;B_nB'_{n-1})_{\rho^n}. \tag{18.3.9} \]
Then using the chain rule for the mutual information in (7.2.112), we obtain
\begin{align}
I(M;B_nB'_{n-1})_{\rho^n} &= I(M;B_n|B'_{n-1})_{\rho^n} + I(M;B'_{n-1})_{\rho^n} \tag{18.3.10} \\
&\le I(MB'_{n-1};B_n)_{\rho^n} + I(M;B'_{n-1})_{\rho^n}. \tag{18.3.11}
\end{align}
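The two steps in (18.3.10)–(18.3.11) can be sanity-checked numerically in the special case where all three systems are classical. The sketch below does this for an arbitrary joint distribution on three bits; the distribution, the index labels `M`, `B`, `BP`, and the helper `entropy` are assumptions introduced only for this check, and the quantum statement is of course more general.

```python
import math
from itertools import product

def entropy(joint, keep):
    """Shannon entropy (bits) of the marginal of `joint` on the positions in `keep`."""
    marginal = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in keep)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(q * math.log2(q) for q in marginal.values() if q > 0)

M, B, BP = 0, 1, 2                      # positions of M, B_n, B'_{n-1} in each outcome
weights = [1, 2, 3, 4, 5, 6, 7, 8]      # arbitrary unnormalized joint distribution
joint = {o: w / sum(weights) for o, w in zip(product((0, 1), repeat=3), weights)}

def H(*keep):
    return entropy(joint, keep)

i_m_bbp = H(M) + H(B, BP) - H(M, B, BP)                      # I(M; B B')
i_m_b_given_bp = H(M, BP) + H(B, BP) - H(M, B, BP) - H(BP)   # I(M; B | B')
i_m_bp = H(M) + H(BP) - H(M, BP)                             # I(M; B')
i_mbp_b = H(M, BP) + H(B) - H(M, B, BP)                      # I(M B'; B)
```

The gap in the inequality equals the mutual information between the two systems held by Bob, which is non-negative.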

As mentioned above, the state shared between Alice and Bob, at any point during
the protocol, is a separable state. Thus, the global state before the 𝑛th channel use
can be written as follows:
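\[ \rho^n_{MA'_nA_nB'_{n-1}} = \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} |m\rangle\langle m|_M \otimes \sum_y p(y|m)\, \varsigma^{m,y}_{A'_nA_n} \otimes \varsigma^{m,y}_{B'_{n-1}}. \tag{18.3.12} \]
Then the state after the 𝑛th channel acts is as follows:
\begin{align}
\rho^n_{MA'_nB_nB'_{n-1}} &= \mathcal{N}_{A_n\to B_n}(\rho^n_{MA'_nA_nB'_{n-1}}) \tag{18.3.13} \\
&= \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} |m\rangle\langle m|_M \otimes \sum_y p(y|m)\, \mathcal{N}_{A_n\to B_n}(\varsigma^{m,y}_{A'_nA_n}) \otimes \varsigma^{m,y}_{B'_{n-1}}. \tag{18.3.14}
\end{align}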

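An extension of the state above is as follows:
\[ \rho^n_{MYA'_nB_nB'_{n-1}} = \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} \sum_y p(y|m)\, |m\rangle\langle m|_M \otimes |y\rangle\langle y|_Y \otimes \mathcal{N}_{A_n\to B_n}(\varsigma^{m,y}_{A'_nA_n}) \otimes \varsigma^{m,y}_{B'_{n-1}}, \tag{18.3.15} \]
and tracing over the system $A'_n$ leads to the following state:
\begin{align}
\rho^n_{MYB_nB'_{n-1}} &= \mathrm{Tr}_{A'_n}[\rho^n_{MYA'_nB_nB'_{n-1}}] \tag{18.3.16} \\
&= \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} \sum_y p(y|m)\, |m\rangle\langle m|_M \otimes |y\rangle\langle y|_Y \otimes \mathcal{N}_{A_n\to B_n}(\varsigma^{m,y}_{A_n}) \otimes \varsigma^{m,y}_{B'_{n-1}}. \tag{18.3.17}
\end{align}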

Then consider that

\begin{align}
I(MB'_{n-1};B_n)_{\rho^n} &\le I(MYB'_{n-1};B_n)_{\rho^n} \tag{18.3.18} \\
&= I(MY;B_n)_{\rho^n} + I(B'_{n-1};B_n|MY)_{\rho^n} \tag{18.3.19} \\
&= I(MY;B_n)_{\rho^n} \tag{18.3.20} \\
&\le \chi(\mathcal{N}). \tag{18.3.21}
\end{align}

The first inequality follows from the data-processing inequality for mutual informa-
tion. The first equality follows from the chain rule. The second equality follows
because the state in (18.3.17) is product when conditioning on the systems 𝑀 and
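𝑌 . The last inequality follows because the state $\rho^n_{MYB_n}$ is a classical–quantum state
of the following form:
\begin{align}
\rho^n_{MYB_n} &= \mathrm{Tr}_{B'_{n-1}}[\rho^n_{MYB_nB'_{n-1}}] \tag{18.3.22} \\
&= \frac{1}{|\mathcal{M}|} \sum_{m\in\mathcal{M}} \sum_y p(y|m)\, |m\rangle\langle m|_M \otimes |y\rangle\langle y|_Y \otimes \mathcal{N}_{A_n\to B_n}(\varsigma^{m,y}_{A_n}). \tag{18.3.23}
\end{align}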

Thus, the definition of the Holevo information in (7.11.106) implies the last in-
equality in (18.3.21). Putting together (18.3.9), (18.3.10)–(18.3.11), and (18.3.18)–
(18.3.21), we conclude that
\begin{align}
I(M;\hat{M})_\omega &\le \chi(\mathcal{N}) + I(M;B'_{n-1})_{\rho^n} \tag{18.3.24} \\
&\le \chi(\mathcal{N}) + I(M;B_{n-1}B'_{n-2})_{\rho^{n-1}}, \tag{18.3.25}
\end{align}

where the last inequality follows from the data-processing inequality for mutual
information.

Now, we recognize the term 𝐼 (𝑀; 𝐵𝑛−1 𝐵′𝑛−2 ) 𝜌 𝑛−1 as being of the same form
as 𝐼 (𝑀; 𝐵𝑛 𝐵′𝑛−1 ) 𝜌 𝑛 in (18.3.10). Thus, we iterate through the same sequence of
arguments to conclude that

𝐼 (𝑀; 𝐵𝑛−1 𝐵′𝑛−2 ) 𝜌 𝑛−1 ≤ 𝜒(N) + 𝐼 (𝑀; 𝐵𝑛−2 𝐵′𝑛−3 ) 𝜌 𝑛−2 , (18.3.26)

which in turn implies that

\[ I(M;\hat{M})_\omega \le 2\chi(\mathcal{N}) + I(M;B_{n-2}B'_{n-3})_{\rho^{n-2}}. \tag{18.3.27} \]

Continuing all the way back to the first channel use, we find that

\[ I(M;\hat{M})_\omega \le n\chi(\mathcal{N}) \tag{18.3.28} \]

because 𝐼 (𝑀; 𝐵′0 ) = 0 (the systems 𝑀 and 𝐵′0 are in a product state at the start of
the protocol). Putting together (18.3.7), (18.3.8), and (18.3.28), we conclude that
\[ \log_2|\mathcal{M}| \le \frac{1}{1-\varepsilon}\left( n\chi(\mathcal{N}) + h_2(\varepsilon) \right), \tag{18.3.29} \]
which implies the claim in (18.3.5).
We now prove the inequality in (18.3.6). Our starting point is again (18.3.7),
but from there, we instead apply Proposition 7.71, the definition of $I^\varepsilon_H(M;\hat{M})_\omega$
from (7.11.88), and the expression for the sandwiched Rényi mutual information
from (7.11.92) to find that the following holds for all 𝛼 > 1:
\[ I^\varepsilon_H(M;\hat{M})_\omega \le \widetilde{I}_\alpha(M;\hat{M})_\omega + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{18.3.30} \]

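Recall that the sandwiched Rényi mutual information $\widetilde{I}_\alpha(M;\hat{M})_\omega$ is defined as
\begin{align}
\widetilde{I}_\alpha(M;\hat{M})_\omega &= \inf_{\xi_{\hat{M}}} \widetilde{D}_\alpha(\omega_{M\hat{M}} \| \omega_M \otimes \xi_{\hat{M}}) \tag{18.3.31} \\
&= \inf_{\xi_{\hat{M}}} \widetilde{D}_\alpha(\omega_{M\hat{M}} \| \pi_M \otimes \xi_{\hat{M}}). \tag{18.3.32}
\end{align}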

We adopt a similar approach to that given for the proof of (17.1.36). Our goal is
thus to compare the actual protocol with one that results from employing a useless,
replacement channel (of the form discussed in Section 18.2). To this end, let $\mathcal{R}^{\sigma_B}_{A\to B}$
be the replacement channel defined in (18.2.1), with 𝜎𝐵 an arbitrary state. Then as
discussed in Section 18.2 (in particular, in (18.2.12)), the final state of the protocol
conducted with the replacement channel is given by $\tau_{M\hat{M}} = \pi_M \otimes \tau_{\hat{M}}$. Then, we
find that

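\begin{align}
\widetilde{I}_\alpha(M;\hat{M})_\omega &= \inf_{\xi_{\hat{M}}} \widetilde{D}_\alpha(\omega_{M\hat{M}} \| \pi_M \otimes \xi_{\hat{M}}) \tag{18.3.33} \\
&\le \widetilde{D}_\alpha(\omega_{M\hat{M}} \| \pi_M \otimes \tau_{\hat{M}}) \tag{18.3.34} \\
&= \widetilde{D}_\alpha(\omega_{M\hat{M}} \| \tau_{M\hat{M}}). \tag{18.3.35}
\end{align}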

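By applying the data-processing inequality for the sandwiched Rényi relative
entropy with respect to the last decoding channel, and using (18.2.10), we find that
\begin{align}
\widetilde{D}_\alpha(\omega_{M\hat{M}} \| \tau_{M\hat{M}}) &\le \widetilde{D}_\alpha(\rho^n_{MB_nB'_{n-1}} \| \tau^n_{MB_nB'_{n-1}}) \tag{18.3.36} \\
&= \widetilde{D}_\alpha(\rho^n_{MB_nB'_{n-1}} \| \tau^n_{MB'_{n-1}} \otimes \sigma_{B_n}). \tag{18.3.37}
\end{align}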

It is our goal to bound this last term. To do so, consider that

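\begin{multline}
\widetilde{D}_\alpha(\rho^n_{MB_nB'_{n-1}} \| \tau^n_{MB'_{n-1}} \otimes \sigma_{B_n}) = \\
\frac{\alpha}{\alpha-1}\log_2 \left\| \left(\Theta_{\sigma_{B_n}^{(1-\alpha)/\alpha}} \circ \mathcal{N}_{A_n\to B_n}\right)\!\left( (\tau^n_{MB'_{n-1}})^{\frac{1-\alpha}{2\alpha}}\, \rho^n_{MA_nB'_{n-1}}\, (\tau^n_{MB'_{n-1}})^{\frac{1-\alpha}{2\alpha}} \right) \right\|_\alpha, \tag{18.3.38}
\end{multline}
where we define the completely positive map $\Theta_X$ by
\[ \Theta_X(\rho) := X^{1/2} \rho X^{1/2}. \tag{18.3.39} \]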
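As a small illustration of the map in (18.3.39), the sketch below evaluates it in the special case where 𝑋 is diagonal in the working basis, so that its square root is taken entrywise. The restriction to diagonal 𝑋 and the name `theta_diag` are simplifying assumptions for this example; a general positive semi-definite 𝑋 would require a matrix square root.

```python
import math

def theta_diag(x_diag, rho):
    """Theta_X(rho) = X^{1/2} rho X^{1/2} for X = diag(x_diag) with non-negative entries."""
    d = len(x_diag)
    roots = [math.sqrt(x) for x in x_diag]
    return [[roots[j] * rho[j][k] * roots[k] for k in range(d)] for j in range(d)]

out = theta_diag([4.0, 9.0], [[1.0, 1.0], [1.0, 1.0]])
```

Because the map is a conjugation by a single operator, it has one Kraus operator and is therefore completely positive, although it is not trace preserving in general.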

We now employ the key observation from before: if the channel N is entanglement
breaking, then Alice and Bob’s systems are always separable throughout the protocol.
Thus, the state $\rho^n_{MA_nB'_{n-1}}$ is fully separable with respect to the cut $M : A_n : B'_{n-1}$. It
is in turn separable with respect to the bipartite cut $A_n : MB'_{n-1}$ and can be written
as
\[ \rho^n_{MA_nB'_{n-1}} = \sum_j p(j)\, \rho^j_{A_n} \otimes \rho^j_{MB'_{n-1}}, \tag{18.3.40} \]

which implies that

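\begin{align}
(\tau^n_{MB'_{n-1}})^{\frac{1-\alpha}{2\alpha}}\, \rho^n_{MA_nB'_{n-1}}\, (\tau^n_{MB'_{n-1}})^{\frac{1-\alpha}{2\alpha}}
&= (\tau^n_{MB'_{n-1}})^{\frac{1-\alpha}{2\alpha}} \left( \sum_j p(j)\, \rho^j_{A_n} \otimes \rho^j_{MB'_{n-1}} \right) (\tau^n_{MB'_{n-1}})^{\frac{1-\alpha}{2\alpha}} \tag{18.3.41} \\
&= \sum_j p(j)\, \rho^j_{A_n} \otimes (\tau^n_{MB'_{n-1}})^{\frac{1-\alpha}{2\alpha}}\, \rho^j_{MB'_{n-1}}\, (\tau^n_{MB'_{n-1}})^{\frac{1-\alpha}{2\alpha}}. \tag{18.3.42}
\end{align}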

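Since conjugation by a positive semi-definite operator is a completely positive map,
we can apply Lemma 12.18 to conclude that
\begin{multline}
\left\| \left(\Theta_{\sigma_{B_n}^{(1-\alpha)/\alpha}} \circ \mathcal{N}_{A_n\to B_n}\right)\!\left( (\tau^n_{MB'_{n-1}})^{\frac{1-\alpha}{2\alpha}}\, \rho^n_{MA_nB'_{n-1}}\, (\tau^n_{MB'_{n-1}})^{\frac{1-\alpha}{2\alpha}} \right) \right\|_\alpha \\
\le \nu_\alpha\!\left(\Theta_{\sigma_{B_n}^{(1-\alpha)/\alpha}} \circ \mathcal{N}_{A_n\to B_n}\right) \cdot \left\| (\tau^n_{MB'_{n-1}})^{\frac{1-\alpha}{2\alpha}}\, \rho^n_{MB'_{n-1}}\, (\tau^n_{MB'_{n-1}})^{\frac{1-\alpha}{2\alpha}} \right\|_\alpha, \tag{18.3.43}
\end{multline}
where $\nu_\alpha(\Theta_{\sigma_{B_n}^{(1-\alpha)/\alpha}} \circ \mathcal{N}_{A_n\to B_n})$ is defined from (12.2.82) and we have identified $MB'_{n-1}$
with system 𝑅 of $P_{RA}$ in (12.2.81) and $A_n$ with system 𝐴 of $P_{RA}$. We then have the
following chain of inequalities: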
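\begin{align}
\widetilde{D}_\alpha(\rho^n_{MB_nB'_{n-1}} \| \tau^n_{MB_nB'_{n-1}})
&\le \frac{\alpha}{\alpha-1}\log_2 \nu_\alpha\!\left(\Theta_{\sigma_{B_n}^{(1-\alpha)/\alpha}} \circ \mathcal{N}_{A_n\to B_n}\right) + \widetilde{D}_\alpha(\rho^n_{MB'_{n-1}} \| \tau^n_{MB'_{n-1}}) \tag{18.3.44} \\
&\le \frac{\alpha}{\alpha-1}\log_2 \nu_\alpha\!\left(\Theta_{\sigma_{B_n}^{(1-\alpha)/\alpha}} \circ \mathcal{N}_{A_n\to B_n}\right) + \widetilde{D}_\alpha(\rho^{n-1}_{MB_{n-1}B'_{n-2}} \| \tau^{n-1}_{MB_{n-1}B'_{n-2}}) \tag{18.3.45} \\
&\le n\,\frac{\alpha}{\alpha-1}\log_2 \nu_\alpha\!\left(\Theta_{\sigma_B^{(1-\alpha)/\alpha}} \circ \mathcal{N}_{A\to B}\right) + \widetilde{D}_\alpha(\rho^1_{MB'_0} \| \tau^1_{MB'_0}) \tag{18.3.46} \\
&= n\,\frac{\alpha}{\alpha-1}\log_2 \nu_\alpha\!\left(\Theta_{\sigma_B^{(1-\alpha)/\alpha}} \circ \mathcal{N}_{A\to B}\right). \tag{18.3.47}
\end{align}
The first inequality follows by combining (18.3.36)–(18.3.37) and (18.3.43). The
second inequality follows from the data-processing inequality for the sandwiched
Rényi relative entropy, with respect to the channel $\mathrm{Tr}_{F_{n-1}} \circ \Delta_{F_{n-1}} \circ \mathcal{D}^{n-1}_{B_{n-1}B'_{n-2}\to F_{n-1}B'_{n-1}}$.
The third inequality follows by recognizing that
\[ \widetilde{D}_\alpha(\rho^{n-1}_{MB_{n-1}B'_{n-2}} \| \tau^{n-1}_{MB_{n-1}B'_{n-2}}) \tag{18.3.48} \]
is the sandwiched Rényi relative entropy at round 𝑛 − 1 of the protocol, which allows
us to apply the argument inductively. The first equality follows because $\rho^1_{MB'_0} = \tau^1_{MB'_0}$,
since no channels have been applied to these systems at this point in the protocol, so that
the relative entropy term vanishes. Putting together
(18.3.7), (18.3.30), (18.3.33)–(18.3.35), (18.3.36), and (18.3.44)–(18.3.47), we
conclude that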

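\[ \log_2|\mathcal{M}| \le n\,\frac{\alpha}{\alpha-1}\log_2 \nu_\alpha\!\left(\Theta_{\sigma_B^{(1-\alpha)/\alpha}} \circ \mathcal{N}_{A\to B}\right) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{18.3.49} \]
Since this upper bound holds for every state 𝜎𝐵 , we can take an infimum over all
such states and conclude that
\begin{align}
\log_2|\mathcal{M}| &\le n\,\frac{\alpha}{\alpha-1}\inf_{\sigma_B}\log_2 \nu_\alpha\!\left(\Theta_{\sigma_B^{(1-\alpha)/\alpha}} \circ \mathcal{N}_{A\to B}\right) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \tag{18.3.50} \\
&= n\widetilde{K}_\alpha(\mathcal{N}_{A\to B}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \tag{18.3.51} \\
&= n\widetilde{\chi}_\alpha(\mathcal{N}_{A\to B}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{18.3.52}
\end{align}
The first equality follows from the definition of $\nu_\alpha$ in (12.2.82) and the definition
of $\widetilde{K}_\alpha$ in (12.2.58). The last equality follows from Lemma 12.17. ■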

18.3.2 Entropy Upper Bound on the Number of Transmitted Bits

We now establish an upper bound that holds for an arbitrary quantum channel. It is
equal to the maximum output entropy of the channel (Theorem 18.5). A refinement
of this upper bound leads to an upper bound equal to the maximum expected output
entropy of the channel (Theorem 18.6), by writing it as a convex combination of
other channels.
We begin by establishing the first upper bound. The main idea for doing so is
to consider a protocol that simulates the general protocol detailed in Section 18.1.
The simulation is a purified protocol, in which every step of the original protocol
is purified. Each state of the purified protocol, when conditioned on the message
being transmitted and the values of the classical feedback, is in a pure state. We
now detail the form of this purified protocol. In order to simplify notation, we let
$\hat{A}$ denote a joint system throughout, referring to both the original system 𝐴′ and
a purifying reference system, and we take the same convention when using the
notation $\hat{B}$. By inspecting (18.1.8), the initial state of Bob in the purified protocol
is as follows:
\[ \sigma_{F_0F'_0\hat{B}_0} := \sum_{f_0} p(f_0)\, |f_0\rangle\langle f_0|_{F_0} \otimes |f_0\rangle\langle f_0|_{F'_0} \otimes \psi^{f_0}_{\hat{B}_0}, \tag{18.3.53} \]
where the state $\psi^{f_0}_{\hat{B}_0}$ purifies Bob’s state $\Psi^{f_0}_{B'_0}$, such that tracing over a subsystem
of $\psi^{f_0}_{\hat{B}_0}$ gives $\Psi^{f_0}_{B'_0}$. Additionally, Bob keeps an extra copy $F'_0$ of the classical data
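transmitted over the classical feedback channel. Let $\mathcal{U}^0_{M'F_0\to\hat{A}_1A_1}$ denote an isometric
channel extending the encoding channel $\mathcal{E}^0_{M'F_0\to A'_1A_1}$. After $\mathcal{U}^0_{M'F_0\to\hat{A}_1A_1}$ acts, the
global state is as follows:
\[ \omega^1_{M\hat{A}_1A_1F'_0\hat{B}_0} := \sum_{m\in\mathcal{M}} \sum_{f_0} p(m)p(f_0)\, |m\rangle\langle m|_M \otimes \varphi^{0,m,f_0}_{\hat{A}_1A_1} \otimes |f_0\rangle\langle f_0|_{F'_0} \otimes \psi^{f_0}_{\hat{B}_0}, \tag{18.3.54} \]
where
\[ \varphi^{0,m,f_0}_{\hat{A}_1A_1} := \mathcal{U}^0_{M'F_0\to\hat{A}_1A_1}(|m\rangle\langle m|_{M'} \otimes |f_0\rangle\langle f_0|_{F_0}). \tag{18.3.55} \]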

We perform this purification for each step of the protocol. Let $\mathcal{U}^i_{A'_iF_i\to\hat{A}_{i+1}A_{i+1}}$
denote an isometric channel extending the encoding channel $\mathcal{E}^i_{A'_iF_i\to A'_{i+1}A_{i+1}}$ for each
𝑖 ∈ {1, . . . , 𝑛 − 1}. Since the system 𝐹𝑖 is classical, for each 𝑖 ∈ {1, . . . , 𝑛 − 1}, the
decoding channel $\Delta_{F_i} \circ \mathcal{D}^i_{B_iB'_{i-1}\to F_iB'_i}$ can be written explicitly as
\[ \Delta_{F_i} \circ \mathcal{D}^i_{B_iB'_{i-1}\to F_iB'_i} = \sum_{f_i} \mathcal{D}^{i,f_i}_{B_iB'_{i-1}\to B'_i} \otimes |f_i\rangle\langle f_i|_{F_i}, \tag{18.3.56} \]
where $\{\mathcal{D}^{i,f_i}_{B_iB'_{i-1}\to B'_i}\}_{f_i}$ is a collection of completely positive maps such that the
sum map $\sum_{f_i} \mathcal{D}^{i,f_i}_{B_iB'_{i-1}\to B'_i}$ is trace preserving. Let $V^{i,f_i}_{B_iB'_{i-1}\to\hat{B}_i}$ be a linear map such
that tracing over a subsystem of $V^{i,f_i}_{B_iB'_{i-1}\to\hat{B}_i}(\cdot)(V^{i,f_i}_{B_iB'_{i-1}\to\hat{B}_i})^\dagger$ gives the original map
$\mathcal{D}^{i,f_i}_{B_iB'_{i-1}\to B'_i}$, and define the map
\[ \mathcal{V}^{i,f_i}_{B_iB'_{i-1}\to\hat{B}_i}(\tau_{B_iB'_{i-1}}) := V^{i,f_i}_{B_iB'_{i-1}\to\hat{B}_i}\, \tau_{B_iB'_{i-1}}\, (V^{i,f_i}_{B_iB'_{i-1}\to\hat{B}_i})^\dagger. \tag{18.3.57} \]
Then we define the extended decoding channel $\mathcal{V}^i_{B_iB'_{i-1}\to F_i\hat{B}_iF'_i}$ for each 𝑖 ∈
{1, . . . , 𝑛 − 1} as
\[ \mathcal{V}^i_{B_iB'_{i-1}\to F_i\hat{B}_iF'_i}(\tau_{B_iB'_{i-1}}) := \sum_{f_i} \mathcal{V}^{i,f_i}_{B_iB'_{i-1}\to\hat{B}_i}(\tau_{B_iB'_{i-1}}) \otimes |f_i\rangle\langle f_i|_{F_i} \otimes |f_i\rangle\langle f_i|_{F'_i}. \tag{18.3.58} \]
This extended decoding channel keeps an extra copy of the classical feedback value
𝑓𝑖 for Bob in the classical register 𝐹𝑖′. The final decoding channel in the original
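protocol is a measurement channel and thus can be written as
\[ \mathcal{D}^n_{B_nB'_{n-1}\to\hat{M}}(\tau_{B_nB'_{n-1}}) = \sum_{m\in\mathcal{M}} \mathrm{Tr}[\Lambda^m_{B_nB'_{n-1}} \tau_{B_nB'_{n-1}}]\, |m\rangle\langle m|_{\hat{M}}, \tag{18.3.59} \]
where $\{\Lambda^m_{B_nB'_{n-1}}\}_{m\in\mathcal{M}}$ is a POVM. We enlarge it as follows in the simulation protocol:
\[ \mathcal{V}^n_{B_nB'_{n-1}\to\hat{B}_n\hat{M}}(\tau_{B_nB'_{n-1}}) := \sum_{m\in\mathcal{M}} \sqrt{\Lambda^m_{B_nB'_{n-1}}}\, \tau_{B_nB'_{n-1}} \sqrt{\Lambda^m_{B_nB'_{n-1}}} \otimes |m\rangle\langle m|_{\hat{M}}, \tag{18.3.60} \]
where the system $\hat{B}_n$ is isomorphic to the systems $B_nB'_{n-1}$. In the simulation
protocol, we also consider an isometric channel $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ that simulates the original
channel N 𝐴→𝐵 as follows: $\mathcal{N}_{A\to B} = \mathrm{Tr}_E \circ\, \mathcal{U}^{\mathcal{N}}_{A\to BE}$.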
Thus, the various states involved in the purified protocol are as follows. The
global initial state is $\Phi^p_{MM'} \otimes \sigma_{F_0F'_0\hat{B}_0}$. Alice performs the extended encoding
channel $\mathcal{U}^0_{M'F_0\to\hat{A}_1A_1}$ and the state becomes as follows:
\[ \omega^1_{M\hat{A}_1A_1F'_0\hat{B}_0} = \mathcal{U}^0_{M'F_0\to\hat{A}_1A_1}(\Phi^p_{MM'} \otimes \sigma_{F_0F'_0\hat{B}_0}). \tag{18.3.61} \]
Alice transmits system 𝐴1 through the first use of the extended channel $\mathcal{U}^{\mathcal{N}}_{A_1\to B_1E_1}$,
resulting in the following state:
\[ \rho^1_{M\hat{A}_1B_1E_1F'_0\hat{B}_0} := \mathcal{U}^{\mathcal{N}}_{A_1\to B_1E_1}(\omega^1_{M\hat{A}_1A_1F'_0\hat{B}_0}). \tag{18.3.62} \]
Bob processes his systems $B_1$ and $B'_0$ with the extended decoding channel
$\mathcal{V}^1_{B_1B'_0\to F_1\hat{B}_1F'_1}$, and Alice acts with the extended encoding channel $\mathcal{U}^1_{A'_1F_1\to\hat{A}_2A_2}$,
resulting in the state
\[ \omega^2_{M\hat{A}_2A_2\hat{B}_1E_1F'_0F'_1} := (\mathcal{U}^1_{A'_1F_1\to\hat{A}_2A_2} \circ \mathcal{V}^1_{B_1B'_0\to F_1\hat{B}_1F'_1})(\rho^1_{M\hat{A}_1B_1E_1F'_0\hat{B}_0}). \tag{18.3.63} \]
This process iterates 𝑛 − 2 more times, resulting in the following states:
\begin{align}
\rho^i_{M\hat{A}_iB_i\hat{B}_{i-1}E_1^i[F_0^{i-1}]'} &:= \mathcal{U}^{\mathcal{N}}_{A_i\to B_iE_i}(\omega^i_{M\hat{A}_iA_i\hat{B}_{i-1}E_1^{i-1}[F_0^{i-1}]'}), \tag{18.3.64} \\
\omega^{i+1}_{M\hat{A}_{i+1}A_{i+1}\hat{B}_iE_1^i[F_0^i]'} &:= (\mathcal{U}^i_{A'_iF_i\to\hat{A}_{i+1}A_{i+1}} \circ \mathcal{V}^i_{B_iB'_{i-1}\to F_i\hat{B}_iF'_i})(\rho^i_{M\hat{A}_iB_i\hat{B}_{i-1}E_1^i[F_0^{i-1}]'}), \tag{18.3.65}
\end{align}
for 𝑖 ∈ {2, . . . , 𝑛 − 1}. The final extended decoding channel results in the following
state:
\[ \rho^n_{M\hat{A}_n\tilde{B}\hat{M}E_1^n[F_0^{n-1}]'} := \mathcal{V}^n_{B_nB'_{n-1}\to\hat{B}_n\hat{M}}(\rho^n_{M\hat{A}_nB_n\hat{B}_{n-1}E_1^n[F_0^{n-1}]'}), \tag{18.3.66} \]
where the $\tilde{B}$ system encompasses all systems in Bob’s possession at the end. Note
that we recover each state of the original protocol described in Section 18.1 by
performing particular partial traces.
Before stating the main theorem of this section, we prove two lemmas that play
an important role in its proof. Both lemmas involve the following information
measure:
\[ I(X;CY)_\tau + H(C|XY)_\tau, \tag{18.3.67} \]
where the information quantities are evaluated with respect to the following
classical–quantum state:
\[ \tau_{XYC} := \sum_{x,y} p(x,y)\, |x\rangle\langle x|_X \otimes |y\rangle\langle y|_Y \otimes \tau^{x,y}_C. \tag{18.3.68} \]
In the above, 𝑝(𝑥, 𝑦) is a probability distribution and $\tau^{x,y}_C$ is a quantum state for all
𝑥 and 𝑦.

Lemma 18.3
Let 𝜏𝑋𝑌 𝐴𝐵 be a classical–quantum state, with classical systems 𝑋𝑌 and quantum
systems 𝐴𝐵 pure when conditioned on 𝑋𝑌 . Let L 𝐴𝐵→𝐴′ 𝐵′ 𝑍 be a one-way
LOCC channel of the following form:
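\[ \mathcal{L}_{AB\to A'B'Z} := \sum_z \mathcal{U}^z_{A\to A'} \otimes \mathcal{V}^z_{B\to B'} \otimes |z\rangle\langle z|_Z, \tag{18.3.69} \]
where $\{\mathcal{V}^z_{B\to B'}\}_z$ is a collection of completely positive trace-non-increasing
maps with $\mathcal{V}^z_{B\to B'}(\cdot) := V^z_{B\to B'}(\cdot)(V^z_{B\to B'})^\dagger$ and $\{\mathcal{U}^z_{A\to A'}\}_z$ is a collection of
isometric channels. Then the following inequality holds:
\[ I(X;BY)_\tau + H(B|XY)_\tau \ge I(X;B'YZ)_\omega + H(B'|XYZ)_\omega, \tag{18.3.70} \]
where $\omega_{XYZA'B'} := \mathcal{L}_{AB\to A'B'Z}(\tau_{XYAB})$.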

Proof: The inequality 𝐼 (𝑋; 𝐵𝑌 )𝜏 ≥ 𝐼 (𝑋; 𝐵′𝑌 𝑍)𝜔 follows from the data-processing
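inequality for mutual information. In more detail, consider that $\omega_{XYZB'}$ is equal to
\begin{align}
\omega_{XYZB'} &= \mathrm{Tr}_{A'}[\omega_{XYZA'B'}] \tag{18.3.71} \\
&= \mathrm{Tr}_{A'}\!\left[ \sum_z (\mathcal{U}^z_{A\to A'} \otimes \mathcal{V}^z_{B\to B'})(\tau_{XYAB}) \otimes |z\rangle\langle z|_Z \right] \tag{18.3.72} \\
&= \sum_z ((\mathrm{Tr}_{A'} \circ\, \mathcal{U}^z_{A\to A'}) \otimes \mathcal{V}^z_{B\to B'})(\tau_{XYAB}) \otimes |z\rangle\langle z|_Z \tag{18.3.73} \\
&= \sum_z \mathcal{V}^z_{B\to B'}(\mathrm{Tr}_A[\tau_{XYAB}]) \otimes |z\rangle\langle z|_Z \tag{18.3.74} \\
&= \sum_z \mathcal{V}^z_{B\to B'}(\tau_{XYB}) \otimes |z\rangle\langle z|_Z. \tag{18.3.75}
\end{align}
The fourth equality follows because $\mathcal{U}^z_{A\to A'}$ is an isometric channel for all 𝑧. Thus,
the state $\omega_{XYZB'}$ can be understood as arising from the action of the quantum
instrument $\sum_z \mathcal{V}^z_{B\to B'} \otimes |z\rangle\langle z|_Z$ on the state $\tau_{XYB}$, and since this is a channel taking
system 𝐵 to 𝐵′ 𝑍, the data-processing inequality for mutual information applies. The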
inequality 𝐻 (𝐵|𝑋𝑌 )𝜏 ≥ 𝐻 (𝐵′ |𝑋𝑌 𝑍)𝜔 is a consequence of the LOCC monotonicity
of the entanglement of formation (see Proposition 9.6). Indeed, consider that
𝐻 (𝐵|𝑋𝑌 )𝜏 = 𝐸 𝐹 ( 𝐴; 𝐵𝑋𝑌 )𝜏 , (18.3.76)
𝐻 (𝐵′ |𝑋𝑌 𝑍)𝜔 = 𝐸 𝐹 ( 𝐴′; 𝐵′ 𝑋𝑌 𝑍)𝜔 , (18.3.77)
which follows from the direct-sum property of the entanglement of formation (see
the proof of Proposition 9.6) and its reduction to entropy of entanglement for pure
states (see (9.1.40)). Thus, we apply these equalities and the LOCC monotonicity
of entanglement of formation (i.e., 𝐸 𝐹 ( 𝐴; 𝐵𝑋𝑌 )𝜏 ≥ 𝐸 𝐹 ( 𝐴′; 𝐵′ 𝑋𝑌 𝑍)𝜔 ) to conclude the claim. ■

The following lemma places an entropic upper bound on the amount by which
the information quantity in (18.3.67) can increase by the action of a channel N 𝐴→𝐵 :

Lemma 18.4
Let N 𝐴→𝐵 be a quantum channel, and let 𝜏𝑋𝑌 𝐴𝐵′ be a classical–quantum state
of the following form:
\[ \tau_{XYAB'} := \sum_{x,y} p(x,y)\, |x\rangle\langle x|_X \otimes |y\rangle\langle y|_Y \otimes \tau^{x,y}_{AB'}. \tag{18.3.78} \]
Then
\[ I(X;BB'Y)_\omega + H(BB'|XY)_\omega - \left[ I(X;B'Y)_\tau + H(B'|XY)_\tau \right] \le H(B)_\omega, \tag{18.3.79} \]
where $\omega_{XYBB'} := \mathcal{N}_{A\to B}(\tau_{XYAB'})$.

Proof: Consider that
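The displayed steps of this proof do not appear in the text at this point. One standard way to verify (18.3.79) is sketched below; it is offered as a plausible argument under the stated form of the state, not as a reproduction of the authors' own derivation. It uses only the definitions of mutual information and conditional entropy, together with the observation that the channel acts on system 𝐴 alone, so that the marginals on 𝑋𝑌 𝐵′ of 𝜔 and 𝜏 coincide.

```latex
% For any state of the form (18.3.68), expanding the definitions gives
\begin{align*}
I(X;CY) + H(C|XY)
  &= H(X) + H(CY) - H(XCY) + H(XCY) - H(XY) \\
  &= H(CY) - H(Y|X).
\end{align*}
% Apply this with C = BB' (state \omega) and with C = B' (state \tau).
% Since \omega_{XYB'} = \tau_{XYB'}, the terms H(Y|X) cancel and
\begin{align*}
I(X;BB'Y)_\omega + H(BB'|XY)_\omega - \left[ I(X;B'Y)_\tau + H(B'|XY)_\tau \right]
  &= H(BB'Y)_\omega - H(B'Y)_\omega \\
  &= H(B|B'Y)_\omega \\
  &\le H(B)_\omega ,
\end{align*}
% where the last step holds because conditioning does not increase entropy.
```

The last inequality is equivalent to the non-negativity of the mutual information between 𝐵 and 𝐵′𝑌 in the state 𝜔.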

The key properties of the information quantity in (18.3.67) are that it does not
increase under the action of a one-way LOCC channel from Bob to Alice (i.e., the
decoding channel of Bob, the classical feedback channel, and the encoding channel
of Alice) and it cannot increase by more than the output entropy of a channel under
its action. We can use these properties to establish the following entropy bound on
the number of bits that can be transmitted by a feedback-assisted communication
protocol:

Theorem 18.5
Let N 𝐴→𝐵 be a quantum channel, and let 𝜀 ∈ [0, 1). For an (𝑛, |M| , 𝜀) protocol
for classical communication over a quantum channel N 𝐴→𝐵 assisted by classical
feedback, as described in Section 18.1, the following bound holds

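\[ \frac{\log_2|\mathcal{M}|}{n} \le \frac{1}{1-\varepsilon}\left( \sup_{\rho_A} H(\mathcal{N}_{A\to B}(\rho_A)) + \frac{h_2(\varepsilon)}{n} \right). \tag{18.3.85} \]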
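The bound in (18.3.85) is straightforward to evaluate once the maximum output entropy is known. Because the entropy of any state on a 𝑑-dimensional system is at most log2 𝑑, the sketch below also offers a cruder, dimension-only cap. The function names and the example numbers are illustrative assumptions.

```python
import math

def h2(eps):
    """Binary entropy in bits, with the convention 0 * log2(0) = 0."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1.0 - eps) * math.log2(1.0 - eps)

def entropy_rate_bound(max_output_entropy, eps, n):
    """Right-hand side of (18.3.85) for a supplied value of sup_rho H(N(rho))."""
    return (max_output_entropy + h2(eps) / n) / (1.0 - eps)

def dimension_cap(d_out, eps, n):
    """Looser bound obtained from sup_rho H(N(rho)) <= log2(d_out)."""
    return entropy_rate_bound(math.log2(d_out), eps, n)

qubit_cap = dimension_cap(d_out=2, eps=0.0, n=5)
example = entropy_rate_bound(max_output_entropy=0.5, eps=0.5, n=4)
```

With zero error and a qubit output, the cap is one bit per channel use, as expected.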

Proof: Our starting point is the general bounds in (18.3.7)–(18.3.8), which imply
that
\[ \log_2|\mathcal{M}| \le \frac{1}{1-\varepsilon}\left( I(M;\hat{M})_\omega + h_2(\varepsilon) \right), \tag{18.3.86} \]
where 𝜔 𝑀 𝑀ˆ is the final state of the protocol, as given in (18.1.7), with 𝑝 therein set
to the uniform distribution over the set M of messages. Continuing, and considering
the purified protocol outlined above, we find that

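\begin{align}
& I(M;\hat{M})_\omega \notag \\
& \leq I(M; B_n \hat{B}_{n-1} [F_0^{n-1}]')_{\rho^n} + H(B_n \hat{B}_{n-1} | [F_0^{n-1}]' M)_{\rho^n} \tag{18.3.87} \\
& = I(M; B_n \hat{B}_{n-1} [F_0^{n-1}]')_{\rho^n} + H(B_n \hat{B}_{n-1} | [F_0^{n-1}]' M)_{\rho^n}
- \left[ I(M; \hat{B}_0 F_0')_{\omega^1} + H(\hat{B}_0 | F_0' M)_{\omega^1} \right] \tag{18.3.88} \\
& = I(M; B_n \hat{B}_{n-1} [F_0^{n-1}]')_{\rho^n} + H(B_n \hat{B}_{n-1} | [F_0^{n-1}]' M)_{\rho^n}
- \left[ I(M; \hat{B}_0 F_0')_{\omega^1} + H(\hat{B}_0 | F_0' M)_{\omega^1} \right] \notag \\
& \quad + \sum_{i=2}^{n} \Big( I(M; \hat{B}_{i-1} [F_0^{i-1}]')_{\omega^i} + H(\hat{B}_{i-1} | [F_0^{i-1}]' M)_{\omega^i}
- \left[ I(M; \hat{B}_{i-1} [F_0^{i-1}]')_{\omega^i} + H(\hat{B}_{i-1} | [F_0^{i-1}]' M)_{\omega^i} \right] \Big). \tag{18.3.89}
\end{align}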

The first inequality follows from non-negativity of quantum entropy and data
processing under the action of the final decoding channel. The first equality follows
because $I(M; \hat{B}_0 F_0')_{\omega^1} + H(\hat{B}_0 | F_0' M)_{\omega^1} = 0$ for the initial state
$\omega^1_{M \hat{A}_1 A_1 F_0' \hat{B}_0}$ (indeed, the systems $M$ and $F_0' \hat{B}_0$ of the reduced state
$\omega^1_{M F_0' \hat{B}_0}$ are product, and the state on system $\hat{B}_0$ is pure when conditioned
on $F_0' M$). The last equality follows by adding and subtracting the same term.
Continuing, we find that the quantity in the last line above is bounded as

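\begin{align}
& \leq I(M; B_n \hat{B}_{n-1} [F_0^{n-1}]')_{\rho^n} + H(B_n \hat{B}_{n-1} | [F_0^{n-1}]' M)_{\rho^n}
- \left[ I(M; \hat{B}_0 F_0')_{\omega^1} + H(\hat{B}_0 | F_0' M)_{\omega^1} \right] \notag \\
& \quad + \sum_{i=2}^{n} \Big( I(M; B_{i-1} \hat{B}_{i-2} [F_0^{i-2}]')_{\rho^{i-1}} + H(B_{i-1} \hat{B}_{i-2} | [F_0^{i-2}]' M)_{\rho^{i-1}}
- \left[ I(M; \hat{B}_{i-1} [F_0^{i-1}]')_{\omega^i} + H(\hat{B}_{i-1} | [F_0^{i-1}]' M)_{\omega^i} \right] \Big) \tag{18.3.90} \\
& = \sum_{i=1}^{n} \Big( I(M; B_i \hat{B}_{i-1} [F_0^{i-1}]')_{\rho^i} + H(B_i \hat{B}_{i-1} | [F_0^{i-1}]' M)_{\rho^i}
- \left[ I(M; \hat{B}_{i-1} [F_0^{i-1}]')_{\omega^i} + H(\hat{B}_{i-1} | [F_0^{i-1}]' M)_{\omega^i} \right] \Big) \tag{18.3.91} \\
& \leq \sum_{i=1}^{n} H(B_i)_{\rho^i} \tag{18.3.92} \\
& \leq n \sup_{\rho_A} H(\mathcal{N}_{A\to B}(\rho_A)). \tag{18.3.93}
\end{align}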

The first inequality follows from Lemma 18.3 and the second from Lemma 18.4.
So we conclude that

\[
I(M;\hat{M})_\omega \leq n \sup_{\rho_A} H(\mathcal{N}_{A\to B}(\rho_A)). \tag{18.3.94}
\]

Putting together this inequality and (18.3.86), we conclude the inequality in (18.3.85). ■
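To make the bound in (18.3.85) concrete, the following minimal sketch evaluates its right-hand side, multiplied by $n$, for given values of $n$, $\varepsilon$, and the maximum output entropy. The function names and the sample numbers (a channel with a qubit output, so that the maximum output entropy is at most one bit) are illustrative assumptions, not values taken from the text.

```python
import math

def binary_entropy(eps):
    """Binary entropy h_2(eps) in bits, with the convention 0 * log2(0) = 0."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)

def max_message_bits(n, eps, max_output_entropy):
    """Upper bound on log2|M| implied by (18.3.85) for an (n, |M|, eps) protocol."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    return (n * max_output_entropy + binary_entropy(eps)) / (1.0 - eps)

# Hypothetical example: 100 channel uses, error 0.1, output entropy at most 1 bit.
bound = max_message_bits(n=100, eps=0.1, max_output_entropy=1.0)
rate_bound = bound / 100
```

For fixed $\varepsilon$, the term $h_2(\varepsilon)/n$ vanishes as $n$ grows, so the rate bound approaches the maximum output entropy divided by $1-\varepsilon$.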

18.3.2.1 Maximum Average Output Entropy Upper Bound for Probabilistic Mixtures of Channels

In this section, we provide a brief proof of the following theorem, which generalizes
Theorem 18.5 to the maximum average output entropy of a quantum channel:

Theorem 18.6
Let $\mathcal{N}_{A\to B} = \sum_x p_X(x)\, \mathcal{N}^x_{A\to B}$, where $p_X$ is a probability distribution and
$\{\mathcal{N}^x_{A\to B}\}_x$ is a set of channels. For an $(n, |\mathcal{M}|, \varepsilon)$ protocol for classical
communication over the channel $\mathcal{N}_{A\to B}$ assisted by classical feedback, of the form
described in Section 18.1, the following bound applies:
\[
(1-\varepsilon) \log_2 |\mathcal{M}| \leq n \cdot \sup_{\rho_A} \sum_x p_X(x)\, H(\mathcal{N}^x_{A\to B}(\rho_A)) + h_2(\varepsilon).
\]

Proof: The main idea behind the proof is to observe that an arbitrary feedback-assisted
protocol of the form discussed in Section 18.1, which is for communication
over a probabilistic mixture channel $\mathcal{N}_{A\to B} = \sum_z p_Z(z)\, \mathcal{N}^z_{A\to B}$, has a simulation of
the following form:

1. Before the 𝑖th use of the channel N 𝐴→𝐵 in the feedback-assisted protocol, Bob
selects a random variable 𝑍𝑖 independently according to the distribution 𝑝 𝑍 .
He transmits 𝑍𝑖 over the classical feedback channel to Alice.


2. Each channel use N 𝐴→𝐵 from the original protocol is replaced by a simulation
in terms of another channel M 𝐴𝑍 ′ →𝐵 , which accepts a quantum input on system
𝐴 and a classical input on system 𝑍 ′. Conditioned on the value 𝑧 in system
𝑍 ′, the channel M 𝐴𝑍 ′ →𝐵 applies N 𝑧𝐴→𝐵 to the quantum system 𝐴. Thus, if the
random variable 𝑍 ∼ 𝑝 𝑍 is fed into the input system 𝑍 ′ of M 𝐴𝑍 ′ →𝐵 , then the
channel M 𝐴𝑍 ′ →𝐵 is indistinguishable from the original channel N 𝐴→𝐵 .
3. Alice feeds a copy of the classical random variable 𝑍𝑖 into the 𝑖th use of the
channel M 𝐴𝑍 ′ →𝐵 .
4. All other aspects of the protocol are executed in the same way as before. Namely,
even though it would be an advantage to Alice to modify her encodings and
Bob to modify later decodings based on the realizations of 𝑍𝑖 , they do not do
so, and they instead blindly operate all other aspects of the simulation protocol
as they are in the original protocol.

Our goal now is to establish the inequality in Theorem 18.6, relating the 𝑛, |M|,
and 𝜀 parameters of the original (𝑛, |M| , 𝜀) protocol by using the above simulation.
The main observation to make from here is that the same proof from Lemma 18.4
gives the following bound:

\[
I(X;BB'YZ)_\omega + H(BB'|XYZ)_\omega - \left[ I(X;B'YZ)_\tau + H(B'|XYZ)_\tau \right] \leq H(B|Z)_\omega, \tag{18.3.95}
\]

where $\omega_{XYZBB'}$ is the following state:

\begin{align}
\omega_{XYZBB'} & := \mathcal{M}_{AZ'\to B}(\tau_{XYZZ'AB'}), \tag{18.3.96} \\
\tau_{XYZZ'AB'} & := \sum_{x,y,z} p(x,y,z)\, |x,y,z,z\rangle\!\langle x,y,z,z|_{X,Y,Z,Z'} \otimes \tau^{x,y,z}_{AB'}. \tag{18.3.97}
\end{align}

This follows by grouping 𝑍 with 𝑌 , but then discarding only 𝑌 and 𝐵′ at the end
of the proof. We then apply this bound, and the same reasoning in the proof
of Theorem 18.5, except that the variables 𝑍0 , . . . , 𝑍𝑖 are grouped together with
the feedback variables [𝐹0𝑖−1 ] ′ and then the same reasoning in (18.3.87)–(18.3.93)
applies. At this point, we invoke (18.3.95) and find that
\[
(1-\varepsilon) \log_2 |\mathcal{M}| \leq \sum_{i=1}^{n} H(B_i|Z_i)_{\rho^{(i)}} + h_2(\varepsilon). \tag{18.3.98}
\]


We can then bound the sum over entropies as follows:

\begin{align}
\sum_{i=1}^{n} H(B_i|Z_i)_{\rho^{(i)}} & \leq n H(B|Z)_\rho \tag{18.3.99} \\
& = n \sum_z p_Z(z)\, H(\mathcal{N}^z(\omega)) \tag{18.3.100} \\
& \leq n \sup_\rho \sum_z p_Z(z)\, H(\mathcal{N}^z(\rho)). \tag{18.3.101}
\end{align}

The first inequality follows from concavity of conditional entropy, where the conditional
entropy $H(B|Z)_\rho$ is evaluated on the channel output state averaged over the $n$ uses, defined as
\begin{align}
\rho_{BZ} & := \sum_z p_Z(z)\, |z\rangle\!\langle z| \otimes \mathcal{N}^z(\omega), \tag{18.3.102} \\
\omega_A & := \frac{1}{n} \sum_{i=1}^{n} \omega^{(i)}_{A_i}. \tag{18.3.103}
\end{align}

The equality follows from the definition of conditional entropy. The final
inequality follows from optimizing over all states. ■
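The gap between the bounds of Theorems 18.5 and 18.6 can be seen in a small classical toy case. The sketch below assumes a hypothetical equal mixture of two channels acting on a single bit (the identity channel and a channel that resets the bit to 0) and restricts the optimization to diagonal inputs on a grid; all names and numbers are illustrative assumptions rather than examples from the text.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def identity_channel(dist):
    return list(dist)

def reset_channel(dist):
    # Discards the input and prepares the symbol 0 with certainty.
    return [1.0, 0.0]

def average_output_entropy(q, p_id):
    """sum_x p(x) H(N^x(rho)) for the diagonal input rho = diag(q, 1 - q)."""
    dist = [q, 1.0 - q]
    return (p_id * shannon_entropy(identity_channel(dist))
            + (1.0 - p_id) * shannon_entropy(reset_channel(dist)))

def mixture_output_entropy(q, p_id):
    """H(N(rho)) for the averaged channel N = p_id * id + (1 - p_id) * reset."""
    dist = [q, 1.0 - q]
    mixed = [p_id * a + (1.0 - p_id) * b
             for a, b in zip(identity_channel(dist), reset_channel(dist))]
    return shannon_entropy(mixed)

grid = [k / 1000 for k in range(1001)]
p_id = 0.5
best_average = max(average_output_entropy(q, p_id) for q in grid)   # Theorem 18.6 quantity
best_mixture = max(mixture_output_entropy(q, p_id) for q in grid)   # Theorem 18.5 quantity
```

In this toy case the maximum average output entropy is half a bit, whereas the maximum output entropy of the averaged channel is a full bit, so the bound of Theorem 18.6 is strictly tighter here.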

18.3.3 Geometric 𝚼-Information Upper Bound on the Number of Transmitted Bits

In this section, we prove that the Υ-information bound from Section 12.2.5.1 is
actually an upper bound on the classical capacity assisted by classical feedback. The
main idea behind the approach detailed in this section is to establish a correlation
measure for bipartite channels, which is non-increasing under the action of one-way
LOCC channels and measures the forward classical communication that can be
generated by the bipartite channel for which it is evaluated. Such a measure is
relevant in the context of a feedback-assisted protocol because, in such a protocol,
Alice and Bob employ a one-way LOCC channel from Bob to Alice. In particular,
local channels are allowed for free, as well as the use of a classical feedback channel.
Both of these actions can be considered as particular kinds of bipartite channels
and both of them fall into the class of bipartite channels that are non-signaling from
Alice to Bob and C-PPT-P (call this class NS 𝐴↛𝐵 ∩ PPT). Recall the definition of
non-signaling channels from Section 4.6.4 and C-PPT-P channels from Section 4.6.3.
As such, if we employ a measure of bipartite channels that involves a comparison
of a bipartite channel of interest with all bipartite channels in NS 𝐴↛𝐵 ∩ PPT,
then the two kinds of free channels would have zero value and the measure would
indicate how different the channel of interest is from this set (i.e., how different it is
from a channel that has no ability to send quantum information and no ability to
signal from Alice to Bob). This is the main idea behind the measure that we define
below in Definition 18.7, but one should keep in mind that the measure below does
not follow this reasoning precisely.
In Definition 18.7, although we motivated the measure for bipartite channels,
we define it more generally for completely positive bipartite maps, as it turns out to
be useful to do so when we define other measures later.

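Definition 18.7 𝜷-Measure of Classical Communication for Bipartite Channels
Let $\mathcal{M}_{AB\to A'B'}$ be a completely positive bipartite map. Then we define
\begin{align}
C_\beta(\mathcal{M}_{AB\to A'B'}) & := \log_2 \beta(\mathcal{M}_{AB\to A'B'}), \tag{18.3.104} \\
\beta(\mathcal{M}_{AB\to A'B'}) & := \inf_{S_{AA'BB'},\, V_{AA'BB'} \in \mathrm{Herm}}
\left\{ \begin{array}{c}
\left\| \operatorname{Tr}_{A'B'}[S_{AA'BB'}] \right\|_\infty : \\
T_{BB'}(V_{AA'BB'} \pm \Gamma^{\mathcal{M}}_{AA'BB'}) \geq 0, \\
S_{AA'BB'} \pm V_{AA'BB'} \geq 0, \\
\operatorname{Tr}_{A'}[S_{AA'BB'}] = \pi_A \otimes \operatorname{Tr}_{AA'}[S_{AA'BB'}]
\end{array} \right\}, \tag{18.3.105}
\end{align}
where Herm denotes the set of Hermitian operators and $\Gamma^{\mathcal{M}}_{AA'BB'}$ is the Choi
operator of $\mathcal{M}_{AB\to A'B'}$:
\[
\Gamma^{\mathcal{M}}_{AA'BB'} := \mathcal{M}_{\hat{A}\hat{B}\to A'B'}(\Gamma_{A\hat{A}} \otimes \Gamma_{B\hat{B}}). \tag{18.3.106}
\]
In the above, system $\hat{A}$ is isomorphic to $A$, system $\hat{B}$ is isomorphic to $B$,
\[
\Gamma_{A\hat{A}} := \sum_{i,j=0}^{d_A-1} |i\rangle\!\langle j|_A \otimes |i\rangle\!\langle j|_{\hat{A}}, \qquad
\Gamma_{B\hat{B}} := \sum_{i,j=0}^{d_B-1} |i\rangle\!\langle j|_B \otimes |i\rangle\!\langle j|_{\hat{B}}, \tag{18.3.107}
\]
and $\pi_A := I_A / d_A$.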

Since $S_{AA'BB'} \pm V_{AA'BB'} \geq 0$ implies that $S_{AA'BB'} \geq 0$, we can also express
$\beta(\mathcal{M}_{AB\to A'B'})$ as follows:
\[
\inf_{\lambda,\, S_{AA'BB'} \geq 0,\, V_{AA'BB'} \in \mathrm{Herm}}
\left\{ \begin{array}{c}
\lambda : \\
\operatorname{Tr}_{A'B'}[S_{AA'BB'}] \leq \lambda I_{AB}, \\
T_{BB'}(V_{AA'BB'} \pm \Gamma^{\mathcal{M}}_{AA'BB'}) \geq 0, \\
S_{AA'BB'} \pm V_{AA'BB'} \geq 0, \\
\operatorname{Tr}_{A'}[S_{AA'BB'}] = \pi_A \otimes \operatorname{Tr}_{AA'}[S_{AA'BB'}]
\end{array} \right\}. \tag{18.3.108}
\]

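By exploiting the equality constraint $\operatorname{Tr}_{A'}[S_{AA'BB'}] = \pi_A \otimes \operatorname{Tr}_{AA'}[S_{AA'BB'}]$, we
find that
\begin{align}
\left\| \operatorname{Tr}_{A'B'}[S_{AA'BB'}] \right\|_\infty
& = \left\| \operatorname{Tr}_{B'}[\operatorname{Tr}_{A'}[S_{AA'BB'}]] \right\|_\infty \tag{18.3.109} \\
& = \left\| \operatorname{Tr}_{B'}[\pi_A \otimes \operatorname{Tr}_{AA'}[S_{AA'BB'}]] \right\|_\infty \tag{18.3.110} \\
& = \left\| \pi_A \otimes \operatorname{Tr}_{AA'B'}[S_{AA'BB'}] \right\|_\infty \tag{18.3.111} \\
& = \frac{1}{d_A} \left\| \operatorname{Tr}_{AA'B'}[S_{AA'BB'}] \right\|_\infty. \tag{18.3.112}
\end{align}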
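Then we find that
\[
\beta(\mathcal{M}_{AB\to A'B'}) = \inf_{S_{AA'BB'},\, V_{AA'BB'} \in \mathrm{Herm}}
\left\{ \begin{array}{c}
\frac{1}{d_A} \left\| \operatorname{Tr}_{AA'B'}[S_{AA'BB'}] \right\|_\infty : \\
T_{BB'}(V_{AA'BB'} \pm \Gamma^{\mathcal{M}}_{AA'BB'}) \geq 0, \\
S_{AA'BB'} \pm V_{AA'BB'} \geq 0, \\
\operatorname{Tr}_{A'}[S_{AA'BB'}] = \pi_A \otimes \operatorname{Tr}_{AA'}[S_{AA'BB'}]
\end{array} \right\}. \tag{18.3.113}
\]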
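Since $S_{AA'BB'} \pm V_{AA'BB'} \geq 0$ implies that $S_{AA'BB'} \geq 0$, we can also rewrite
$\beta(\mathcal{M}_{AB\to A'B'})$ as
\[
\beta(\mathcal{M}_{AB\to A'B'}) = \inf_{\lambda,\, S_{AA'BB'} \geq 0,\, V_{AA'BB'} \in \mathrm{Herm}}
\left\{ \begin{array}{c}
\lambda : \\
\frac{1}{d_A} \operatorname{Tr}_{AA'B'}[S_{AA'BB'}] \leq \lambda I_B, \\
T_{BB'}(V_{AA'BB'} \pm \Gamma^{\mathcal{M}}_{AA'BB'}) \geq 0, \\
S_{AA'BB'} \pm V_{AA'BB'} \geq 0, \\
\operatorname{Tr}_{A'}[S_{AA'BB'}] = \pi_A \otimes \operatorname{Tr}_{AA'}[S_{AA'BB'}]
\end{array} \right\}. \tag{18.3.114}
\]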

18.3.3.1 Properties of the basic measure

We now establish several properties of $C_\beta(\mathcal{N}_{AB\to A'B'})$, which are basic properties
that we might expect of a measure of forward classical communication for a bipartite
channel. These include the following:

1. non-negativity (Proposition 18.8),

2. stability under tensoring with identity channels (Proposition 18.9),
3. zero value for classical feedback channels (Proposition 18.10),
4. zero value for a tensor product of local channels (Proposition 18.11),
5. subadditivity under serial composition (Proposition 18.12),
6. data processing under pre- and post-processing by local channels (Corollary 18.13),
7. invariance under local unitary channels (Corollary 18.14),
8. convexity of 𝛽 (Proposition 18.15).

All of the properties above hold for bipartite channels, while the second and
fifth through eighth hold more generally for completely positive bipartite maps.

Proposition 18.8 Non-Negativity

Let N 𝐴𝐵→𝐴′ 𝐵′ be a bipartite channel. Then

𝐶 𝛽 (N 𝐴𝐵→𝐴′ 𝐵′ ) ≥ 0. (18.3.115)

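Proof: We prove the equivalent statement $\beta(\mathcal{N}_{AB\to A'B'}) \geq 1$. Let the scalar $\lambda$ and the
Hermitian operators $S_{AA'BB'}$ and $V_{AA'BB'}$ be arbitrary, subject to the constraints in (18.3.114).
Then consider that
\begin{align}
\lambda d_B & = \lambda \operatorname{Tr}_B[I_B] \tag{18.3.116} \\
& \geq \frac{1}{d_A} \operatorname{Tr}_{AA'BB'}[S_{AA'BB'}] \tag{18.3.117} \\
& \geq \frac{1}{d_A} \operatorname{Tr}_{AA'BB'}[V_{AA'BB'}] \tag{18.3.118} \\
& = \frac{1}{d_A} \operatorname{Tr}_{AA'BB'}[T_{BB'}(V_{AA'BB'})] \tag{18.3.119} \\
& \geq \frac{1}{d_A} \operatorname{Tr}_{AA'BB'}[\Gamma^{\mathcal{N}}_{AA'BB'}] \tag{18.3.120} \\
& = \frac{1}{d_A} \operatorname{Tr}_{AB}[I_{AB}] \tag{18.3.121} \\
& = d_B. \tag{18.3.122}
\end{align}
The first inequality uses the constraint $\frac{1}{d_A} \operatorname{Tr}_{AA'B'}[S_{AA'BB'}] \leq \lambda I_B$, the second uses
$S_{AA'BB'} - V_{AA'BB'} \geq 0$, the equality in (18.3.119) holds because the partial transpose preserves the
trace, the inequality in (18.3.120) uses $T_{BB'}(V_{AA'BB'} - \Gamma^{\mathcal{N}}_{AA'BB'}) \geq 0$, and (18.3.121) holds
because $\mathcal{N}_{AB\to A'B'}$ is trace preserving. This implies that $\lambda \geq 1$. Since the inequality holds for
all $\lambda$, $S_{AA'BB'}$, and $V_{AA'BB'}$ satisfying the constraints in (18.3.114), we conclude the statement above. ■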

Proposition 18.9 Stability

Let M 𝐴𝐵→𝐴′ 𝐵′ be a completely positive bipartite map. Then

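\[
C_\beta(\mathrm{id}_{\bar{A}\to\tilde{A}} \otimes \mathcal{M}_{AB\to A'B'} \otimes \mathrm{id}_{\bar{B}\to\tilde{B}}) = C_\beta(\mathcal{M}_{AB\to A'B'}). \tag{18.3.123}
\]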

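Proof: Let $S_{AA'BB'}$ and $V_{AA'BB'}$ be arbitrary Hermitian operators satisfying the
constraints in (18.3.105) for $\mathcal{M}_{AB\to A'B'}$. The Choi operator of
$\mathrm{id}_{\bar{A}\to\tilde{A}} \otimes \mathcal{M}_{AB\to A'B'} \otimes \mathrm{id}_{\bar{B}\to\tilde{B}}$ is given by
\[
\Gamma_{\bar{A}\tilde{A}} \otimes \Gamma^{\mathcal{M}}_{AA'BB'} \otimes \Gamma_{\bar{B}\tilde{B}}. \tag{18.3.124}
\]
Let us show that $\Gamma_{\bar{A}\tilde{A}} \otimes S_{AA'BB'} \otimes \Gamma_{\bar{B}\tilde{B}}$ and
$\Gamma_{\bar{A}\tilde{A}} \otimes V_{AA'BB'} \otimes \Gamma_{\bar{B}\tilde{B}}$ satisfy the constraints in (18.3.105) for
$\mathrm{id}_{\bar{A}\to\tilde{A}} \otimes \mathcal{M}_{AB\to A'B'} \otimes \mathrm{id}_{\bar{B}\to\tilde{B}}$. Consider that
\begin{align}
& T_{BB'}(V_{AA'BB'} \pm \Gamma^{\mathcal{M}}_{AA'BB'}) \geq 0 \tag{18.3.125} \\
\iff\ & T_{BB'}(\Gamma_{\bar{A}\tilde{A}} \otimes V_{AA'BB'} \otimes \Gamma_{\bar{B}\tilde{B}} \pm \Gamma_{\bar{A}\tilde{A}} \otimes \Gamma^{\mathcal{M}}_{AA'BB'} \otimes \Gamma_{\bar{B}\tilde{B}}) \geq 0 \tag{18.3.126} \\
\iff\ & T_{BB'\bar{B}\tilde{B}}(\Gamma_{\bar{A}\tilde{A}} \otimes V_{AA'BB'} \otimes \Gamma_{\bar{B}\tilde{B}} \pm \Gamma_{\bar{A}\tilde{A}} \otimes \Gamma^{\mathcal{M}}_{AA'BB'} \otimes \Gamma_{\bar{B}\tilde{B}}) \geq 0, \tag{18.3.127}
\end{align}
\begin{align}
& S_{AA'BB'} \pm V_{AA'BB'} \geq 0 \tag{18.3.128} \\
\iff\ & \Gamma_{\bar{A}\tilde{A}} \otimes S_{AA'BB'} \otimes \Gamma_{\bar{B}\tilde{B}} \pm \Gamma_{\bar{A}\tilde{A}} \otimes V_{AA'BB'} \otimes \Gamma_{\bar{B}\tilde{B}} \geq 0, \tag{18.3.129}
\end{align}
and
\[
\operatorname{Tr}_{A'}[S_{AA'BB'}] = \pi_A \otimes \operatorname{Tr}_{AA'}[S_{AA'BB'}],
\]
the latter equivalent to
\begin{align}
\operatorname{Tr}_{A'\tilde{A}}[\Gamma_{\bar{A}\tilde{A}} \otimes S_{AA'BB'} \otimes \Gamma_{\bar{B}\tilde{B}}]
& = I_{\bar{A}} \otimes \pi_A \otimes \operatorname{Tr}_{AA'}[S_{AA'BB'} \otimes \Gamma_{\bar{B}\tilde{B}}] \tag{18.3.130} \\
& = \pi_{\bar{A}} \otimes \pi_A \otimes \operatorname{Tr}_{AA'\bar{A}\tilde{A}}[\Gamma_{\bar{A}\tilde{A}} \otimes S_{AA'BB'} \otimes \Gamma_{\bar{B}\tilde{B}}]. \tag{18.3.131}
\end{align}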

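Also, consider that
\begin{align}
& \frac{1}{d_A d_{\bar{A}}} \left\| \operatorname{Tr}_{AA'\bar{A}\tilde{A}B'\tilde{B}}[\Gamma_{\bar{A}\tilde{A}} \otimes S_{AA'BB'} \otimes \Gamma_{\bar{B}\tilde{B}}] \right\|_\infty \notag \\
& = \frac{1}{d_A d_{\bar{A}}}\, d_{\bar{A}} \left\| \operatorname{Tr}_{AA'B'}[S_{AA'BB'} \otimes I_{\bar{B}}] \right\|_\infty \tag{18.3.132} \\
& = \frac{1}{d_A} \left\| \operatorname{Tr}_{AA'B'}[S_{AA'BB'}] \otimes I_{\bar{B}} \right\|_\infty \tag{18.3.133} \\
& = \frac{1}{d_A} \left\| \operatorname{Tr}_{AA'B'}[S_{AA'BB'}] \right\|_\infty. \tag{18.3.134}
\end{align}
Thus, it follows that
\[
\beta(\mathcal{M}_{AB\to A'B'}) \geq \beta(\mathrm{id}_{\bar{A}\to\tilde{A}} \otimes \mathcal{M}_{AB\to A'B'} \otimes \mathrm{id}_{\bar{B}\to\tilde{B}}). \tag{18.3.135}
\]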

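Now let us show the opposite inequality. Let $S_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}}$ and $V_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}}$
be arbitrary Hermitian operators satisfying the constraints in (18.3.105) for
$\mathrm{id}_{\bar{A}\to\tilde{A}} \otimes \mathcal{M}_{AB\to A'B'} \otimes \mathrm{id}_{\bar{B}\to\tilde{B}}$. Set
\begin{align}
S'_{AA'BB'} & := \frac{1}{d_{\bar{A}} d_{\bar{B}}} \operatorname{Tr}_{\bar{A}\tilde{A}\bar{B}\tilde{B}}[S_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}}], \tag{18.3.136} \\
V'_{AA'BB'} & := \frac{1}{d_{\bar{A}} d_{\bar{B}}} \operatorname{Tr}_{\bar{A}\tilde{A}\bar{B}\tilde{B}}[V_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}}]. \tag{18.3.137}
\end{align}
Consider that
\[
\Gamma^{\mathrm{id} \otimes \mathcal{M} \otimes \mathrm{id}}_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}} = \Gamma_{\bar{A}\tilde{A}} \otimes \Gamma^{\mathcal{M}}_{AA'BB'} \otimes \Gamma_{\bar{B}\tilde{B}}. \tag{18.3.138}
\]
Then, writing $V_{AA'BB'}$ for the marginal $\operatorname{Tr}_{\bar{A}\tilde{A}\bar{B}\tilde{B}}[V_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}}]$,
\begin{align}
& T_{BB'\bar{B}\tilde{B}}(V_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}} \pm \Gamma_{\bar{A}\tilde{A}} \otimes \Gamma^{\mathcal{M}}_{AA'BB'} \otimes \Gamma_{\bar{B}\tilde{B}}) \geq 0 \tag{18.3.139} \\
\implies\ & \operatorname{Tr}_{\bar{A}\tilde{A}\bar{B}\tilde{B}}[T_{BB'\bar{B}\tilde{B}}(V_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}} \pm \Gamma_{\bar{A}\tilde{A}} \otimes \Gamma^{\mathcal{M}}_{AA'BB'} \otimes \Gamma_{\bar{B}\tilde{B}})] \geq 0 \tag{18.3.140} \\
\iff\ & T_{BB'}(V_{AA'BB'} \pm d_{\bar{A}} d_{\bar{B}}\, \Gamma^{\mathcal{M}}_{AA'BB'}) \geq 0 \tag{18.3.141} \\
\iff\ & T_{BB'}(V'_{AA'BB'} \pm \Gamma^{\mathcal{M}}_{AA'BB'}) \geq 0. \tag{18.3.142}
\end{align}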

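Also
\begin{align}
& S_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}} \pm V_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}} \geq 0 \tag{18.3.143} \\
\implies\ & \operatorname{Tr}_{\bar{A}\tilde{A}\bar{B}\tilde{B}}[S_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}} \pm V_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}}] \geq 0 \tag{18.3.144} \\
\iff\ & S'_{AA'BB'} \pm V'_{AA'BB'} \geq 0, \tag{18.3.145}
\end{align}
and
\begin{align}
& \operatorname{Tr}_{\tilde{A}A'}[S_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}}] = \pi_{A\bar{A}} \otimes \operatorname{Tr}_{\bar{A}\tilde{A}AA'}[S_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}}] \tag{18.3.146} \\
\implies\ & \operatorname{Tr}_{\bar{A}\tilde{A}A'\bar{B}\tilde{B}}[S_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}}] = \operatorname{Tr}_{\bar{A}\bar{B}\tilde{B}}[\pi_{A\bar{A}} \otimes \operatorname{Tr}_{\bar{A}\tilde{A}AA'}[S_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}}]] \tag{18.3.147} \\
& \phantom{\operatorname{Tr}_{\bar{A}\tilde{A}A'\bar{B}\tilde{B}}[S_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}}]} = \pi_A \otimes \operatorname{Tr}_{\bar{A}\tilde{A}AA'\bar{B}\tilde{B}}[S_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}}] \tag{18.3.148} \\
\iff\ & \operatorname{Tr}_{A'}[S'_{AA'BB'}] = \pi_A \otimes \operatorname{Tr}_{AA'}[S'_{AA'BB'}]. \tag{18.3.149}
\end{align}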
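Finally, let $\lambda$ be such that
\[
\frac{1}{d_A d_{\bar{A}}} \operatorname{Tr}_{\bar{A}\tilde{A}AA'B'\tilde{B}}[S_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}}] \leq \lambda I_{B\bar{B}}. \tag{18.3.150}
\]
Then it follows that
\begin{align}
& \operatorname{Tr}_{\bar{B}}\!\left[ \frac{1}{d_A d_{\bar{A}}} \operatorname{Tr}_{\bar{A}\tilde{A}AA'B'\tilde{B}}[S_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}}] \right] \leq \operatorname{Tr}_{\bar{B}}[\lambda I_{B\bar{B}}] \tag{18.3.151} \\
\iff\ & \frac{1}{d_A d_{\bar{A}}} \operatorname{Tr}_{\bar{A}\tilde{A}AA'B'\bar{B}\tilde{B}}[S_{\bar{A}\tilde{A}AA'BB'\bar{B}\tilde{B}}] \leq d_{\bar{B}}\, \lambda I_B \tag{18.3.152} \\
\iff\ & \frac{1}{d_A} \operatorname{Tr}_{AA'B'}[S'_{AA'BB'}] \leq \lambda I_B. \tag{18.3.153}
\end{align}
Thus, we conclude that
\[
\beta(\mathcal{M}_{AB\to A'B'}) \leq \beta(\mathrm{id}_{\bar{A}\to\tilde{A}} \otimes \mathcal{M}_{AB\to A'B'} \otimes \mathrm{id}_{\bar{B}\to\tilde{B}}). \tag{18.3.154}
\]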
This concludes the proof. ■

Proposition 18.10 Zero on Classical Feedback Channels

Let $\Delta_{B\to A'}$ be a classical feedback channel:
\[
\Delta_{B\to A'}(\cdot) := \sum_{i=0}^{d-1} |i\rangle_{A'} \langle i|_B\, (\cdot)\, |i\rangle_B \langle i|_{A'}, \tag{18.3.155}
\]

where system 𝐴′ is isomorphic to 𝐵 and 𝑑 = 𝑑 𝐴′ = 𝑑 𝐵 . Then

𝐶 𝛽 (Δ𝐵→𝐴′ ) = 0. (18.3.156)

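Proof: We prove the equivalent statement that $\beta(\Delta_{B\to A'}) = 1$. In this case, the $A$
and $B'$ systems are trivial, so that $d_A = 1$, and the Choi operator of $\Delta_{B\to A'}$ is given
by
\[
\Gamma^{\Delta}_{BA'} = \overline{\Gamma}_{BA'}, \tag{18.3.157}
\]
where
\[
\overline{\Gamma}_{BA'} := \sum_{i=0}^{d_B-1} |i\rangle\!\langle i|_B \otimes |i\rangle\!\langle i|_{A'}. \tag{18.3.158}
\]

Pick $S_{BA'} = V_{BA'} = \overline{\Gamma}_{BA'}$. Then we need to check that the constraints in (18.3.105)
are satisfied for these choices. Consider that
\begin{align}
& T_B(V_{BA'} \pm \Gamma^{\Delta}_{BA'}) \geq 0 \tag{18.3.159} \\
\iff\ & T_B(\overline{\Gamma}_{BA'} \pm \overline{\Gamma}_{BA'}) \geq 0 \tag{18.3.160} \\
\iff\ & \overline{\Gamma}_{BA'} \pm \overline{\Gamma}_{BA'} \geq 0, \tag{18.3.161}
\end{align}
and the last inequality is trivially satisfied. Also,
\begin{align}
& S_{BA'} \pm V_{BA'} \geq 0 \tag{18.3.162} \\
\iff\ & \overline{\Gamma}_{BA'} \pm \overline{\Gamma}_{BA'} \geq 0, \tag{18.3.163}
\end{align}
and the no-signaling condition $\operatorname{Tr}_{A'}[S_{AA'BB'}] = \pi_A \otimes \operatorname{Tr}_{AA'}[S_{AA'BB'}]$ is trivially
satisfied because the $A$ system is trivial, having dimension equal to one. Finally, let
us evaluate the objective function for these choices:
\begin{align}
\frac{1}{d_A} \left\| \operatorname{Tr}_{AA'B'}[S_{AA'BB'}] \right\|_\infty
& = \left\| \operatorname{Tr}_{A'}[S_{A'B}] \right\|_\infty \tag{18.3.164} \\
& = \left\| \operatorname{Tr}_{A'}[\overline{\Gamma}_{BA'}] \right\|_\infty \tag{18.3.165} \\
& = \left\| I_B \right\|_\infty \tag{18.3.166} \\
& = 1. \tag{18.3.167}
\end{align}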

Combined with the general lower bound from Proposition 18.8, we conclude
(18.3.156). ■
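The feasibility check in the proof of Proposition 18.10 can be reproduced numerically. The sketch below builds the diagonal operator $\sum_i |i\rangle\langle i|_B \otimes |i\rangle\langle i|_{A'}$ for a hypothetical dimension $d = 3$, takes $S = V$ equal to it, and verifies the constraints and the objective value. The helper names and the basis ordering are assumptions made only for this illustration, and the positivity test exploits the fact that all operators involved are diagonal.

```python
import itertools

def gamma_bar(d):
    """Operator sum_i |i><i|_B (x) |i><i|_A' in the basis (i, j) -> i * d + j."""
    G = [[0.0] * (d * d) for _ in range(d * d)]
    for i in range(d):
        G[i * d + i][i * d + i] = 1.0
    return G

def partial_transpose_B(X, d):
    """Partial transpose on the first tensor factor (system B)."""
    Y = [[0.0] * (d * d) for _ in range(d * d)]
    for i, k, j, l in itertools.product(range(d), repeat=4):
        Y[j * d + k][i * d + l] = X[i * d + k][j * d + l]
    return Y

def trace_out_A_prime(X, d):
    """Partial trace over the second tensor factor (system A')."""
    return [[sum(X[i * d + k][j * d + k] for k in range(d)) for j in range(d)]
            for i in range(d)]

def is_diagonal_psd(X):
    """True if X is diagonal with non-negative entries (sufficient for X >= 0)."""
    n = len(X)
    diagonal = all(X[r][c] == 0.0 for r in range(n) for c in range(n) if r != c)
    return diagonal and all(X[r][r] >= 0.0 for r in range(n))

d = 3
S = V = choi = gamma_bar(d)
plus = [[v + g for v, g in zip(rv, rg)] for rv, rg in zip(V, choi)]
minus = [[v - g for v, g in zip(rv, rg)] for rv, rg in zip(V, choi)]

constraints_ok = (
    is_diagonal_psd(partial_transpose_B(plus, d))     # T_B(V + Gamma) >= 0
    and is_diagonal_psd(partial_transpose_B(minus, d))  # T_B(V - Gamma) >= 0
    and is_diagonal_psd(plus)                           # S + V >= 0 (S = V = Gamma)
    and is_diagonal_psd(minus)                          # S - V >= 0
)
reduced = trace_out_A_prime(S, d)
objective = max(abs(reduced[i][i]) for i in range(d))  # infinity norm of a diagonal matrix
```

The reduced operator is the identity on $B$, so the objective equals one, matching (18.3.164)–(18.3.167).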

Proposition 18.11 Zero on Tensor Products of Local Channels

Let E 𝐴→𝐴′ and F𝐵→𝐵′ be quantum channels. Then

𝐶 𝛽 (E 𝐴→𝐴′ ⊗ F𝐵→𝐵′ ) = 0. (18.3.168)

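Proof: We prove the equivalent statement that $\beta(\mathcal{E}_{A\to A'} \otimes \mathcal{F}_{B\to B'}) = 1$. Set
$S_{AA'BB'} = V_{AA'BB'} = \Gamma^{\mathcal{E}}_{AA'} \otimes \Gamma^{\mathcal{F}}_{BB'}$, where $\Gamma^{\mathcal{E}}_{AA'}$ and $\Gamma^{\mathcal{F}}_{BB'}$ are the Choi operators of
$\mathcal{E}_{A\to A'}$ and $\mathcal{F}_{B\to B'}$, respectively. We need to check that the constraints in (18.3.105)
are satisfied for these choices. Consider that
\begin{align}
& T_{BB'}(V_{AA'BB'} \pm \Gamma^{\mathcal{E}}_{AA'} \otimes \Gamma^{\mathcal{F}}_{BB'}) \geq 0 \tag{18.3.169} \\
\iff\ & T_{BB'}(\Gamma^{\mathcal{E}}_{AA'} \otimes \Gamma^{\mathcal{F}}_{BB'} \pm \Gamma^{\mathcal{E}}_{AA'} \otimes \Gamma^{\mathcal{F}}_{BB'}) \geq 0 \tag{18.3.170} \\
\iff\ & \Gamma^{\mathcal{E}}_{AA'} \otimes T_{BB'}(\Gamma^{\mathcal{F}}_{BB'}) \pm \Gamma^{\mathcal{E}}_{AA'} \otimes T_{BB'}(\Gamma^{\mathcal{F}}_{BB'}) \geq 0, \tag{18.3.171}
\end{align}
and the last inequality trivially holds because $T_{BB'}$ acts as a positive map on $\Gamma^{\mathcal{F}}_{BB'}$.
Also,
\begin{align}
& S_{AA'BB'} \pm V_{AA'BB'} \geq 0 \tag{18.3.172} \\
\iff\ & \Gamma^{\mathcal{E}}_{AA'} \otimes \Gamma^{\mathcal{F}}_{BB'} \pm \Gamma^{\mathcal{E}}_{AA'} \otimes \Gamma^{\mathcal{F}}_{BB'} \geq 0, \tag{18.3.173}
\end{align}
and
\begin{align}
\operatorname{Tr}_{A'}[S_{AA'BB'}] & = \operatorname{Tr}_{A'}[\Gamma^{\mathcal{E}}_{AA'} \otimes \Gamma^{\mathcal{F}}_{BB'}] \tag{18.3.174} \\
& = I_A \otimes \Gamma^{\mathcal{F}}_{BB'} \tag{18.3.175} \\
& = \pi_A \otimes \operatorname{Tr}_{AA'}[\Gamma^{\mathcal{E}}_{AA'} \otimes \Gamma^{\mathcal{F}}_{BB'}] \tag{18.3.176} \\
& = \pi_A \otimes \operatorname{Tr}_{AA'}[S_{AA'BB'}]. \tag{18.3.177}
\end{align}
Finally, consider that the objective function evaluates to
\begin{align}
\left\| \operatorname{Tr}_{A'B'}[S_{AA'BB'}] \right\|_\infty & = \left\| \operatorname{Tr}_{A'B'}[\Gamma^{\mathcal{E}}_{AA'} \otimes \Gamma^{\mathcal{F}}_{BB'}] \right\|_\infty \tag{18.3.178} \\
& = \left\| I_{AB} \right\|_\infty \tag{18.3.179} \\
& = 1. \tag{18.3.180}
\end{align}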

Combined with the general lower bound from Proposition 18.8, we conclude
(18.3.168). ■

Proposition 18.12 Subadditivity under Composition

Let $\mathcal{M}^1_{AB\to A'B'}$ and $\mathcal{M}^2_{A'B'\to A''B''}$ be completely positive bipartite maps, and define
\[
\mathcal{M}^3_{AB\to A''B''} := \mathcal{M}^2_{A'B'\to A''B''} \circ \mathcal{M}^1_{AB\to A'B'}. \tag{18.3.181}
\]
Then
\[
C_\beta(\mathcal{M}^3_{AB\to A''B''}) \leq C_\beta(\mathcal{M}^2_{A'B'\to A''B''}) + C_\beta(\mathcal{M}^1_{AB\to A'B'}). \tag{18.3.182}
\]


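Proof: We prove the equivalent statement that
\[
\beta(\mathcal{M}^3_{AB\to A''B''}) \leq \beta(\mathcal{M}^2_{A'B'\to A''B''}) \cdot \beta(\mathcal{M}^1_{AB\to A'B'}). \tag{18.3.183}
\]
Let $S^1_{AA'BB'}$ and $V^1_{AA'BB'}$ satisfy
\begin{align}
T_{BB'}(V^1_{AA'BB'} \pm \Gamma^{\mathcal{M}^1}_{AA'BB'}) & \geq 0, \tag{18.3.184} \\
S^1_{AA'BB'} \pm V^1_{AA'BB'} & \geq 0, \tag{18.3.185} \\
\operatorname{Tr}_{A'}[S^1_{AA'BB'}] & = \pi_A \otimes \operatorname{Tr}_{AA'}[S^1_{AA'BB'}], \tag{18.3.186}
\end{align}
and let $S^2_{A'A''B'B''}$ and $V^2_{A'A''B'B''}$ satisfy
\begin{align}
T_{B'B''}(V^2_{A'A''B'B''} \pm \Gamma^{\mathcal{M}^2}_{A'A''B'B''}) & \geq 0, \tag{18.3.187} \\
S^2_{A'A''B'B''} \pm V^2_{A'A''B'B''} & \geq 0, \tag{18.3.188} \\
\operatorname{Tr}_{A''}[S^2_{A'A''B'B''}] & = \pi_{A'} \otimes \operatorname{Tr}_{A'A''}[S^2_{A'A''B'B''}]. \tag{18.3.189}
\end{align}
Then it follows that
\begin{align}
T_{BB'B'B''}(V^1_{AA'BB'} \otimes V^2_{A'A''B'B''} \pm \Gamma^{\mathcal{M}^1}_{AA'BB'} \otimes \Gamma^{\mathcal{M}^2}_{A'A''B'B''}) & \geq 0, \tag{18.3.190} \\
S^1_{AA'BB'} \otimes S^2_{A'A''B'B''} \pm V^1_{AA'BB'} \otimes V^2_{A'A''B'B''} & \geq 0. \tag{18.3.191}
\end{align}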

This latter statement is a consequence of the general fact that if 𝐴, 𝐵, 𝐶, and 𝐷 are
Hermitian operators satisfying 𝐴 ± 𝐵 ≥ 0 and 𝐶 ± 𝐷 ≥ 0, then 𝐴 ⊗ 𝐶 ± 𝐵 ⊗ 𝐷 ≥ 0.
To see this, consider that the original four operator inequalities imply the four
operator inequalities ( 𝐴 ± 𝐵) ⊗ (𝐶 ± 𝐷) ≥ 0, and then summing these four different
operator inequalities in various ways leads to 𝐴 ⊗ 𝐶 ± 𝐵 ⊗ 𝐷 ≥ 0.
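The general fact used above (if $A \pm B \geq 0$ and $C \pm D \geq 0$, then $A \otimes C \pm B \otimes D \geq 0$) can be checked on random instances. The following is a minimal sketch restricted to real symmetric $2 \times 2$ matrices, with strict positivity tested via a Cholesky factorization; the helper names, the seed, and the small diagonal shift that keeps the random matrices strictly positive definite are all assumptions made for illustration.

```python
import random

def kron(X, Y):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(X), len(Y)
    return [[X[i // m][j // m] * Y[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def combine(X, Y, a, b):
    """Entrywise linear combination a * X + b * Y."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def random_positive_definite(rng, n=2):
    """G G^T + 0.1 I for a random real matrix G, which is symmetric positive definite."""
    G = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    return [[sum(G[i][k] * G[j][k] for k in range(n)) + (0.1 if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def is_positive_definite(X, tol=1e-12):
    """Cholesky test for a real symmetric matrix."""
    n = len(X)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = X[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][j] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True

rng = random.Random(0)
P, Q = random_positive_definite(rng), random_positive_definite(rng)  # P = A + B, Q = A - B
R, T = random_positive_definite(rng), random_positive_definite(rng)  # R = C + D, T = C - D
A, B = combine(P, Q, 0.5, 0.5), combine(P, Q, 0.5, -0.5)
C, D = combine(R, T, 0.5, 0.5), combine(R, T, 0.5, -0.5)

plus = combine(kron(A, C), kron(B, D), 1.0, 1.0)    # A (x) C + B (x) D
minus = combine(kron(A, C), kron(B, D), 1.0, -1.0)  # A (x) C - B (x) D
checks = [is_positive_definite(plus), is_positive_definite(minus)]
```

The construction mirrors the summation argument in the text: $A \otimes C + B \otimes D$ is half the sum of $(A+B) \otimes (C+D)$ and $(A-B) \otimes (C-D)$, while $A \otimes C - B \otimes D$ is half the sum of $(A+B) \otimes (C-D)$ and $(A-B) \otimes (C+D)$.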
Now apply the following positive map to (18.3.190)–(18.3.191):

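\[
(\cdot) \to (\langle\Gamma|_{A'A'} \otimes \langle\Gamma|_{B'B'}) (\cdot) (|\Gamma\rangle_{A'A'} \otimes |\Gamma\rangle_{B'B'}), \tag{18.3.192}
\]
where
\begin{align}
|\Gamma\rangle_{A'A'} & := \sum_i |i\rangle_{A'} |i\rangle_{A'}, \tag{18.3.193} \\
|\Gamma\rangle_{B'B'} & := \sum_i |i\rangle_{B'} |i\rangle_{B'}. \tag{18.3.194}
\end{align}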

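This gives
\begin{align}
T_{BB''}(V^3_{AA''BB''} \pm \Gamma^{\mathcal{M}^2 \circ \mathcal{M}^1}_{AA''BB''}) & \geq 0, \tag{18.3.195} \\
S^3_{AA''BB''} \pm V^3_{AA''BB''} & \geq 0, \tag{18.3.196}
\end{align}
where
\begin{align}
V^3_{AA''BB''} & := (\langle\Gamma|_{A'A'} \otimes \langle\Gamma|_{B'B'}) (V^1_{AA'BB'} \otimes V^2_{A'A''B'B''}) (|\Gamma\rangle_{A'A'} \otimes |\Gamma\rangle_{B'B'}), \tag{18.3.197} \\
\Gamma^{\mathcal{M}^2 \circ \mathcal{M}^1}_{AA''BB''} & := (\langle\Gamma|_{A'A'} \otimes \langle\Gamma|_{B'B'}) (\Gamma^{\mathcal{M}^1}_{AA'BB'} \otimes \Gamma^{\mathcal{M}^2}_{A'A''B'B''}) (|\Gamma\rangle_{A'A'} \otimes |\Gamma\rangle_{B'B'}), \tag{18.3.198} \\
S^3_{AA''BB''} & := (\langle\Gamma|_{A'A'} \otimes \langle\Gamma|_{B'B'}) (S^1_{AA'BB'} \otimes S^2_{A'A''B'B''}) (|\Gamma\rangle_{A'A'} \otimes |\Gamma\rangle_{B'B'}), \tag{18.3.199}
\end{align}
and we applied (4.2.20) to conclude that
\[
(\langle\Gamma|_{A'A'} \otimes \langle\Gamma|_{B'B'}) (\Gamma^{\mathcal{M}^1}_{AA'BB'} \otimes \Gamma^{\mathcal{M}^2}_{A'A''B'B''}) (|\Gamma\rangle_{A'A'} \otimes |\Gamma\rangle_{B'B'}) = \Gamma^{\mathcal{M}^2 \circ \mathcal{M}^1}_{AA''BB''}. \tag{18.3.200}
\]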
Also, consider that

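\begin{align}
& \operatorname{Tr}_{A''}[S^3_{AA''BB''}] \notag \\
& = \operatorname{Tr}_{A''}[(\langle\Gamma|_{A'A'} \otimes \langle\Gamma|_{B'B'}) (S^1_{AA'BB'} \otimes S^2_{A'A''B'B''}) (|\Gamma\rangle_{A'A'} \otimes |\Gamma\rangle_{B'B'})] \tag{18.3.201} \\
& = (\langle\Gamma|_{A'A'} \otimes \langle\Gamma|_{B'B'}) (S^1_{AA'BB'} \otimes \operatorname{Tr}_{A''}[S^2_{A'A''B'B''}]) (|\Gamma\rangle_{A'A'} \otimes |\Gamma\rangle_{B'B'}) \tag{18.3.202} \\
& = (\langle\Gamma|_{A'A'} \otimes \langle\Gamma|_{B'B'}) (S^1_{AA'BB'} \otimes \pi_{A'} \otimes \operatorname{Tr}_{A'A''}[S^2_{A'A''B'B''}]) (|\Gamma\rangle_{A'A'} \otimes |\Gamma\rangle_{B'B'}) \tag{18.3.203} \\
& = \frac{1}{d_{A'}} (\langle\Gamma|_{A'A'} \otimes \langle\Gamma|_{B'B'}) (S^1_{AA'BB'} \otimes I_{A'} \otimes \operatorname{Tr}_{A'A''}[S^2_{A'A''B'B''}]) (|\Gamma\rangle_{A'A'} \otimes |\Gamma\rangle_{B'B'}) \tag{18.3.204} \\
& = \frac{1}{d_{A'}} \langle\Gamma|_{B'B'} (\operatorname{Tr}_{A'}[S^1_{AA'BB'}] \otimes \operatorname{Tr}_{A'A''}[S^2_{A'A''B'B''}]) |\Gamma\rangle_{B'B'} \tag{18.3.205} \\
& = \frac{1}{d_{A'}} \langle\Gamma|_{B'B'} (\pi_A \otimes \operatorname{Tr}_{AA'}[S^1_{AA'BB'}] \otimes \operatorname{Tr}_{A'A''}[S^2_{A'A''B'B''}]) |\Gamma\rangle_{B'B'} \tag{18.3.206} \\
& = \pi_A \otimes \frac{1}{d_{A'}} \langle\Gamma|_{B'B'} (\operatorname{Tr}_{AA'}[S^1_{AA'BB'}] \otimes \operatorname{Tr}_{A'A''}[S^2_{A'A''B'B''}]) |\Gamma\rangle_{B'B'}. \tag{18.3.207}
\end{align}
Now consider that
\[
\operatorname{Tr}_{AA''}[S^3_{AA''BB''}] = \frac{1}{d_{A'}} \langle\Gamma|_{B'B'} (\operatorname{Tr}_{AA'}[S^1_{AA'BB'}] \otimes \operatorname{Tr}_{A'A''}[S^2_{A'A''B'B''}]) |\Gamma\rangle_{B'B'}. \tag{18.3.208}
\]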

So we conclude that

Tr 𝐴′′ [𝑆 3𝐴𝐴′′ 𝐵𝐵′′ ] = 𝜋 𝐴 ⊗ Tr 𝐴𝐴′′ [𝑆 3𝐴𝐴′′ 𝐵𝐵′′ ]. (18.3.209)

Finally, consider that

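\begin{align}
& \left\| \operatorname{Tr}_{A''B''}[S^3_{AA''BB''}] \right\|_\infty \notag \\
& = \left\| \operatorname{Tr}_{A''B''}[(\langle\Gamma|_{A'A'} \otimes \langle\Gamma|_{B'B'}) (S^1_{AA'BB'} \otimes S^2_{A'A''B'B''}) (|\Gamma\rangle_{A'A'} \otimes |\Gamma\rangle_{B'B'})] \right\|_\infty \tag{18.3.210} \\
& = \left\| (\langle\Gamma|_{A'A'} \otimes \langle\Gamma|_{B'B'}) (S^1_{AA'BB'} \otimes \operatorname{Tr}_{A''B''}[S^2_{A'A''B'B''}]) (|\Gamma\rangle_{A'A'} \otimes |\Gamma\rangle_{B'B'}) \right\|_\infty \tag{18.3.211} \\
& \leq \left\| \operatorname{Tr}_{A''B''}[S^2_{A'A''B'B''}] \right\|_\infty \cdot \left\| (\langle\Gamma|_{A'A'} \otimes \langle\Gamma|_{B'B'}) (S^1_{AA'BB'} \otimes I_{A'B'}) (|\Gamma\rangle_{A'A'} \otimes |\Gamma\rangle_{B'B'}) \right\|_\infty \tag{18.3.212} \\
& = \left\| \operatorname{Tr}_{A''B''}[S^2_{A'A''B'B''}] \right\|_\infty \cdot \left\| \operatorname{Tr}_{A'B'}[S^1_{AA'BB'}] \right\|_\infty. \tag{18.3.213}
\end{align}

Since $S^3_{AA''BB''}$ and $V^3_{AA''BB''}$ are particular choices that satisfy the constraints in
(18.3.195)–(18.3.209), we conclude that
\[
\beta(\mathcal{M}^3_{AB\to A''B''}) \leq \left\| \operatorname{Tr}_{A''B''}[S^2_{A'A''B'B''}] \right\|_\infty \cdot \left\| \operatorname{Tr}_{A'B'}[S^1_{AA'BB'}] \right\|_\infty. \tag{18.3.214}
\]

Since $S^1_{AA'BB'}$ and $V^1_{AA'BB'}$ are arbitrary Hermitian operators satisfying the constraints
in (18.3.184)–(18.3.186) and $S^2_{A'A''B'B''}$ and $V^2_{A'A''B'B''}$ are arbitrary Hermitian
operators satisfying the constraints in (18.3.187)–(18.3.189), we conclude
(18.3.182). ■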

Corollary 18.13 Data Processing under Local Channels

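Let $\mathcal{M}_{AB\to A'B'}$ be a completely positive bipartite map. Let $\mathcal{K}_{\hat{A}\to A}$, $\mathcal{L}_{\hat{B}\to B}$,
$\mathcal{N}_{A'\to A''}$, and $\mathcal{P}_{B'\to B''}$ be local quantum channels, and define the bipartite
completely positive map $\mathcal{F}_{\hat{A}\hat{B}\to A''B''}$ as follows:
\[
\mathcal{F}_{\hat{A}\hat{B}\to A''B''} := (\mathcal{N}_{A'\to A''} \otimes \mathcal{P}_{B'\to B''})\, \mathcal{M}_{AB\to A'B'}\, (\mathcal{K}_{\hat{A}\to A} \otimes \mathcal{L}_{\hat{B}\to B}). \tag{18.3.215}
\]
Then
\[
C_\beta(\mathcal{F}_{\hat{A}\hat{B}\to A''B''}) \leq C_\beta(\mathcal{M}_{AB\to A'B'}). \tag{18.3.216}
\]

Proof: Apply Propositions 18.11 and 18.12 to find that
\begin{align}
C_\beta(\mathcal{F}_{\hat{A}\hat{B}\to A''B''})
& \leq C_\beta(\mathcal{N}_{A'\to A''} \otimes \mathcal{P}_{B'\to B''}) + C_\beta(\mathcal{M}_{AB\to A'B'}) + C_\beta(\mathcal{K}_{\hat{A}\to A} \otimes \mathcal{L}_{\hat{B}\to B}) \tag{18.3.217} \\
& = C_\beta(\mathcal{M}_{AB\to A'B'}). \tag{18.3.218}
\end{align}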

This concludes the proof. ■

Corollary 18.14 Invariance under Local Unitary Channels

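Let $\mathcal{M}_{AB\to A'B'}$ be a completely positive bipartite map. Let $\mathcal{U}_A$, $\mathcal{V}_B$, $\mathcal{W}_{A'}$, and
$\mathcal{Y}_{B'}$ be local unitary channels, and define the bipartite completely positive map
$\mathcal{F}_{AB\to A'B'}$ as follows:
\[
\mathcal{F}_{AB\to A'B'} := (\mathcal{W}_{A'} \otimes \mathcal{Y}_{B'})\, \mathcal{M}_{AB\to A'B'}\, (\mathcal{U}_A \otimes \mathcal{V}_B). \tag{18.3.219}
\]
Then
\[
C_\beta(\mathcal{F}_{AB\to A'B'}) = C_\beta(\mathcal{M}_{AB\to A'B'}). \tag{18.3.220}
\]

Proof: Apply Corollary 18.13 twice, once to $\mathcal{F}_{AB\to A'B'}$ as defined and once to
$\mathcal{M}_{AB\to A'B'}$ expressed in terms of $\mathcal{F}_{AB\to A'B'}$ and the inverse unitary channels, to conclude that
$C_\beta(\mathcal{M}_{AB\to A'B'}) \geq C_\beta(\mathcal{F}_{AB\to A'B'})$ and $C_\beta(\mathcal{F}_{AB\to A'B'}) \geq C_\beta(\mathcal{M}_{AB\to A'B'})$. ■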

Proposition 18.15 Convexity

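The measure $\beta$ is convex, in the following sense:
\[
\beta(\mathcal{M}^\lambda_{AB\to A'B'}) \leq \lambda \beta(\mathcal{M}^1_{AB\to A'B'}) + (1-\lambda) \beta(\mathcal{M}^0_{AB\to A'B'}), \tag{18.3.221}
\]
where $\mathcal{M}^0_{AB\to A'B'}$ and $\mathcal{M}^1_{AB\to A'B'}$ are completely positive bipartite maps, $\lambda \in [0,1]$, and
\[
\mathcal{M}^\lambda_{AB\to A'B'} := \lambda \mathcal{M}^1_{AB\to A'B'} + (1-\lambda) \mathcal{M}^0_{AB\to A'B'}. \tag{18.3.222}
\]

Proof: Let $S^x_{AA'BB'}$ and $V^x_{AA'BB'}$ satisfy the constraints in (18.3.105) for $\mathcal{M}^x_{AB\to A'B'}$
for $x \in \{0,1\}$. Then
\begin{align}
S^\lambda_{AA'BB'} & := \lambda S^1_{AA'BB'} + (1-\lambda) S^0_{AA'BB'}, \tag{18.3.223} \\
V^\lambda_{AA'BB'} & := \lambda V^1_{AA'BB'} + (1-\lambda) V^0_{AA'BB'}, \tag{18.3.224}
\end{align}
satisfy the constraints in (18.3.105) for $\mathcal{M}^\lambda_{AB\to A'B'}$. Then it follows that
\begin{align}
\beta(\mathcal{M}^\lambda_{AB\to A'B'}) & \leq \left\| \operatorname{Tr}_{A'B'}[S^\lambda_{AA'BB'}] \right\|_\infty \tag{18.3.225} \\
& \leq \lambda \left\| \operatorname{Tr}_{A'B'}[S^1_{AA'BB'}] \right\|_\infty + (1-\lambda) \left\| \operatorname{Tr}_{A'B'}[S^0_{AA'BB'}] \right\|_\infty, \tag{18.3.226}
\end{align}
where the second inequality follows from convexity of the $\infty$-norm. Since the
inequality holds for all $S^x_{AA'BB'}$ and $V^x_{AA'BB'}$ satisfying the constraints in (18.3.105)
for $\mathcal{M}^x_{AB\to A'B'}$ for $x \in \{0,1\}$, we conclude (18.3.221). ■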

18.3.3.2 Related Measures

We now define variations of the bipartite channel measure from (18.3.105). We

employ generalized divergences to do so, and in doing so, we arrive at a large
number of variations of the basic bipartite channel measure.
Using the generalized channel divergence from Definition 7.81, we define the
following:

Definition 18.16 𝚼-Measure of Classical Communication for Bipartite Channels
For a bipartite channel $\mathcal{N}_{AB\to A'B'}$, we define the following measure of forward
classical communication:
\[
\boldsymbol{\Upsilon}(\mathcal{N}_{AB\to A'B'}) := \inf_{\mathcal{M}_{AB\to A'B'} :\, \beta(\mathcal{M}_{AB\to A'B'}) \leq 1} \boldsymbol{D}(\mathcal{N}_{AB\to A'B'} \| \mathcal{M}_{AB\to A'B'}), \tag{18.3.227}
\]
where the optimization is with respect to completely positive bipartite maps
$\mathcal{M}_{AB\to A'B'}$.

Using the quantum relative entropy, the sandwiched Rényi relative entropy,
the Belavkin–Staszewski relative entropy, and the geometric Rényi relative entropy,
we then obtain the following respective channel measures: $\Upsilon(\mathcal{N}_{AB\to A'B'})$,
$\widetilde{\Upsilon}_\alpha(\mathcal{N}_{AB\to A'B'})$, $\widehat{\Upsilon}(\mathcal{N}_{AB\to A'B'})$, and $\widehat{\Upsilon}_\alpha(\mathcal{N}_{AB\to A'B'})$, defined by substituting
$\boldsymbol{D}$ with $D$, $\widetilde{D}_\alpha$, $\widehat{D}$, and $\widehat{D}_\alpha$.

We now establish some properties of $\boldsymbol{\Upsilon}(\mathcal{N}_{AB\to A'B'})$, analogous to those established
earlier for $C_\beta(\mathcal{N}_{AB\to A'B'})$ in Section 18.3.3.1. We assume that the underlying
generalized divergence satisfies the minimal assumptions in (7.3.34) and (7.3.36);
that is, $\boldsymbol{D}(1\|c) \geq 0$ for $c \in (0,1]$ and $\boldsymbol{D}(\rho\|\rho) = 0$ for every state $\rho$.

Proposition 18.17 Non-Negativity

Let N 𝐴𝐵→𝐴′ 𝐵′ be a bipartite channel. Then

𝚼(N 𝐴𝐵→𝐴′ 𝐵′ ) ≥ 0. (18.3.228)

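Proof: Consider that
\begin{align}
\boldsymbol{\Upsilon}(\mathcal{N}_{AB\to A'B'})
& = \inf_{\mathcal{M}_{AB\to A'B'} :\, \beta(\mathcal{M}_{AB\to A'B'}) \leq 1} \boldsymbol{D}(\mathcal{N}_{AB\to A'B'} \| \mathcal{M}_{AB\to A'B'}) \notag \\
& \geq \inf_{\mathcal{M}_{AB\to A'B'} :\, \beta(\mathcal{M}_{AB\to A'B'}) \leq 1} \boldsymbol{D}(\mathcal{N}_{AB\to A'B'}(\Phi_{RA} \otimes \Phi_{BS}) \| \mathcal{M}_{AB\to A'B'}(\Phi_{RA} \otimes \Phi_{BS})) \notag \\
& \geq \inf_{\mathcal{M}_{AB\to A'B'} :\, \beta(\mathcal{M}_{AB\to A'B'}) \leq 1} \boldsymbol{D}(\operatorname{Tr}[\mathcal{N}_{AB\to A'B'}(\Phi_{RA} \otimes \Phi_{BS})] \| \operatorname{Tr}[\mathcal{M}_{AB\to A'B'}(\Phi_{RA} \otimes \Phi_{BS})]) \notag \\
& = \inf_{\mathcal{M}_{AB\to A'B'} :\, \beta(\mathcal{M}_{AB\to A'B'}) \leq 1} \boldsymbol{D}(1 \| \operatorname{Tr}[\mathcal{M}_{AB\to A'B'}(\Phi_{RA} \otimes \Phi_{BS})]). \tag{18.3.229}
\end{align}
The first inequality follows because $\boldsymbol{D}(\mathcal{N}_{AB\to A'B'} \| \mathcal{M}_{AB\to A'B'})$ involves an optimization
over all possible input states, and we have chosen the product of maximally
entangled states. The second inequality follows from the data-processing inequality
for the generalized divergence. Thus, by the assumption that $\boldsymbol{D}(1\|c) \geq 0$ for $c \in (0,1]$,
the desired inequality follows if we can show that
\[
\operatorname{Tr}[\mathcal{M}_{AB\to A'B'}(\Phi_{RA} \otimes \Phi_{BS})] \leq 1. \tag{18.3.230}
\]
Let the scalar $\lambda$ and the Hermitian operators $S_{AA'BB'}$ and $V_{AA'BB'}$ be arbitrary, subject to the
constraints in (18.3.114) for $\mathcal{M}_{AB\to A'B'}$. Then, we find that
\begin{align}
\lambda d_A d_B & = \lambda \operatorname{Tr}_{AB}[I_{AB}] \tag{18.3.231} \\
& \geq \operatorname{Tr}_{AA'BB'}[S_{AA'BB'}] \tag{18.3.232} \\
& \geq \operatorname{Tr}_{AA'BB'}[V_{AA'BB'}] \tag{18.3.233} \\
& = \operatorname{Tr}_{AA'BB'}[T_{BB'}(V_{AA'BB'})] \tag{18.3.234} \\
& \geq \operatorname{Tr}_{AA'BB'}[T_{BB'}(\Gamma^{\mathcal{M}}_{AA'BB'})] \tag{18.3.235} \\
& = \operatorname{Tr}_{AA'BB'}[\Gamma^{\mathcal{M}}_{AA'BB'}] \tag{18.3.236} \\
& = \operatorname{Tr}[\Gamma^{\mathcal{M}}_{AA'BB'}], \tag{18.3.237}
\end{align}
which is equivalent to
\[
\lambda \geq \operatorname{Tr}[\mathcal{M}_{AB\to A'B'}(\Phi_{RA} \otimes \Phi_{BS})]. \tag{18.3.238}
\]
Taking an infimum over $\lambda$, $S_{AA'BB'}$, and $V_{AA'BB'}$ satisfying the constraints in
(18.3.114) for $\mathcal{M}_{AB\to A'B'}$ and applying the assumption $\beta(\mathcal{M}_{AB\to A'B'}) \leq 1$, we
conclude (18.3.230). ■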

Proposition 18.18 Stability

Let N 𝐴𝐵→𝐴′ 𝐵′ be a bipartite channel. Then

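\[
\boldsymbol{\Upsilon}(\mathcal{N}_{AB\to A'B'}) = \boldsymbol{\Upsilon}(\mathrm{id}_{\bar{A}\to\tilde{A}} \otimes \mathcal{N}_{AB\to A'B'} \otimes \mathrm{id}_{\bar{B}\to\tilde{B}}). \tag{18.3.239}
\]

Proof: The definition of the generalized channel divergence in Definition 7.81
implies that it is stable, in the sense that
\[
\boldsymbol{D}(\mathcal{N}_{AB\to A'B'} \| \mathcal{M}_{AB\to A'B'}) =
\boldsymbol{D}(\mathrm{id}_{\bar{A}\to\tilde{A}} \otimes \mathcal{N}_{AB\to A'B'} \otimes \mathrm{id}_{\bar{B}\to\tilde{B}} \,\|\, \mathrm{id}_{\bar{A}\to\tilde{A}} \otimes \mathcal{M}_{AB\to A'B'} \otimes \mathrm{id}_{\bar{B}\to\tilde{B}}), \tag{18.3.240}
\]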

for every channel N 𝐴𝐵→𝐴′ 𝐵′ and completely positive map M 𝐴𝐵→𝐴′ 𝐵′ . Combining
with Proposition 18.9 and the definition in (18.3.227), we conclude (18.3.239). ■

Proposition 18.19 Zero on Classical Feedback Channels

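Let $\Delta_{B\to A'}$ be a classical feedback channel:
\[
\Delta_{B\to A'}(\cdot) := \sum_{i=0}^{d-1} |i\rangle_{A'} \langle i|_B\, (\cdot)\, |i\rangle_B \langle i|_{A'}, \tag{18.3.241}
\]
where system $A'$ is isomorphic to $B$ and $d = d_{A'} = d_B$. Then
\[
\boldsymbol{\Upsilon}(\Delta_{B\to A'}) = 0. \tag{18.3.242}
\]

Proof: This follows from Proposition 18.10. Since $\beta(\Delta_{B\to A'}) = 1$, we can pick
$\mathcal{M}_{B\to A'} = \Delta_{B\to A'}$, and then
\[
\boldsymbol{D}(\Delta_{B\to A'} \| \mathcal{M}_{B\to A'}) = \boldsymbol{D}(\Delta_{B\to A'} \| \Delta_{B\to A'}) = 0. \tag{18.3.243}
\]
So this establishes that $\boldsymbol{\Upsilon}(\Delta_{B\to A'}) \leq 0$, and the other inequality $\boldsymbol{\Upsilon}(\Delta_{B\to A'}) \geq 0$
follows from Proposition 18.17. ■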

Proposition 18.20 Zero on Tensor Products of Local Channels

Let E 𝐴→𝐴′ and F𝐵→𝐵′ be quantum channels. Then

𝚼(E 𝐴→𝐴′ ⊗ F𝐵→𝐵′ ) = 0. (18.3.244)

Proof: Same argument as given for Proposition 18.19, but use Proposition 18.11
instead. ■

We now establish some properties that are more specific to the Belavkin–
Staszewski and geometric Rényi relative entropies (however the first actually holds
also for the quantum relative entropy and other quantum Rényi relative entropies).

Proposition 18.21
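Let $\mathcal{N}_{AB\to A'B'}$ be a bipartite channel. Then for all $\alpha \in (1,2]$,
\[
\widehat{\Upsilon}(\mathcal{N}_{AB\to A'B'}) \leq \widehat{\Upsilon}_\alpha(\mathcal{N}_{AB\to A'B'}) \leq C_\beta(\mathcal{N}_{AB\to A'B'}). \tag{18.3.245}
\]

Proof: Pick $\mathcal{M}_{AB\to A'B'} = \frac{1}{\beta(\mathcal{N}_{AB\to A'B'})} \mathcal{N}_{AB\to A'B'}$ in the definition of $\widehat{\Upsilon}(\mathcal{N}_{AB\to A'B'})$
and $\widehat{\Upsilon}_\alpha(\mathcal{N}_{AB\to A'B'})$ and use the fact that, for $c > 0$, $\widehat{D}(\rho\|c\sigma) = \widehat{D}(\rho\|\sigma) - \log_2 c$
and $\widehat{D}_\alpha(\rho\|c\sigma) = \widehat{D}_\alpha(\rho\|\sigma) - \log_2 c$ for all $\alpha \in (1,2]$. We also require the
monotonicity in $\alpha$ property from Proposition 7.44. ■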
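The scaling identity used in this proof can be illustrated in the commuting (classical) special case, under the assumption that the divergence then reduces to the classical relative entropy of probability vectors. The sketch below checks $D(p\|cq) = D(p\|q) - \log_2 c$ and the resulting value $\log_2 \beta$ when the second argument is the first one rescaled by $1/\beta$; the vectors and constants are hypothetical.

```python
import math

def relative_entropy(p, q):
    """Classical relative entropy sum_i p_i log2(p_i / q_i); q may be unnormalized."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
c = 0.25

# Scaling the second argument by c shifts the divergence by -log2(c).
shifted = relative_entropy(p, [c * qi for qi in q])
expected = relative_entropy(p, q) - math.log2(c)

# Comparing p with p / beta mirrors the choice M = N / beta(N) in the proof.
beta = 4.0
self_scaled = relative_entropy(p, [pi / beta for pi in p])
```

With the second argument equal to the first divided by $\beta$, the divergence evaluates to $\log_2 \beta$, which is how the upper bound by $C_\beta$ arises.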

Proposition 18.22 Subadditivity

For bipartite channels $\mathcal{N}^1_{AB\to A'B'}$ and $\mathcal{N}^2_{A'B'\to A''B''}$, the following inequality
holds for all $\alpha \in (0,1) \cup (1,2]$:
\[
\widehat{\Upsilon}_\alpha(\mathcal{N}^2_{A'B'\to A''B''} \circ \mathcal{N}^1_{AB\to A'B'}) \leq \widehat{\Upsilon}_\alpha(\mathcal{N}^2_{A'B'\to A''B''}) + \widehat{\Upsilon}_\alpha(\mathcal{N}^1_{AB\to A'B'}). \tag{18.3.246}
\]

Proof: This inequality is a direct consequence of the subadditivity inequality
in [REF - GEOMETRIC CH RENYI SUBADD], and the fact that if $\mathcal{M}^1$ and
$\mathcal{M}^2$ are completely positive bipartite maps satisfying $\beta(\mathcal{M}^1), \beta(\mathcal{M}^2) \leq 1$, then
$\beta(\mathcal{M}^2 \circ \mathcal{M}^1) \leq 1$ (see Proposition 18.12). ■

18.3.3.3 Measure of Classical Communication for a Point-to-Point Channel

Let $\mathcal{M}_{A\to B'}$ be a point-to-point completely positive map, which is a special case
of a completely positive bipartite map with the Bob input $B$ trivial and the Alice
output $A'$ trivial. We first show that $\beta$ in (18.3.105) reduces to the measure from
(12.2.228).

Proposition 18.23
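Let $\mathcal{M}_{A\to B'}$ be a point-to-point completely positive map. Then
\[
\beta(\mathcal{M}_{A\to B'}) = \inf_{S_{B'},\, V_{AB'} \in \mathrm{Herm}}
\left\{ \begin{array}{c}
\operatorname{Tr}[S_{B'}] : \\
T_{BB'}(V_{AB'} \pm \Gamma^{\mathcal{M}}_{AB'}) \geq 0, \\
I_A \otimes S_{B'} \pm V_{AB'} \geq 0
\end{array} \right\}. \tag{18.3.247}
\]

Proof: In this case, the systems $A'$ and $B$ are trivial. So then the definition in
(18.3.105) reduces to
\[
\beta(\mathcal{M}_{A\to B'}) = \inf_{S_{AB'},\, V_{AB'} \in \mathrm{Herm}}
\left\{ \begin{array}{c}
\left\| \operatorname{Tr}_{B'}[S_{AB'}] \right\|_\infty : \\
T_{BB'}(V_{AB'} \pm \Gamma^{\mathcal{M}}_{AB'}) \geq 0, \\
S_{AB'} \pm V_{AB'} \geq 0, \\
S_{AB'} = \pi_A \otimes \operatorname{Tr}_A[S_{AB'}]
\end{array} \right\}. \tag{18.3.248}
\]
The last constraint implies that the optimization simplifies to
\begin{align}
\beta(\mathcal{M}_{A\to B'}) & = \inf_{S_{AB'},\, V_{AB'} \in \mathrm{Herm}}
\left\{ \begin{array}{c}
\left\| \operatorname{Tr}_{B'}[\pi_A \otimes \operatorname{Tr}_A[S_{AB'}]] \right\|_\infty : \\
T_{BB'}(V_{AB'} \pm \Gamma^{\mathcal{M}}_{AB'}) \geq 0, \\
\pi_A \otimes \operatorname{Tr}_A[S_{AB'}] \pm V_{AB'} \geq 0
\end{array} \right\} \tag{18.3.249} \\
& = \inf_{S'_{B'},\, V_{AB'} \in \mathrm{Herm}}
\left\{ \begin{array}{c}
\left\| \operatorname{Tr}_{B'}[\pi_A \otimes S'_{B'}] \right\|_\infty : \\
T_{BB'}(V_{AB'} \pm \Gamma^{\mathcal{M}}_{AB'}) \geq 0, \\
\pi_A \otimes S'_{B'} \pm V_{AB'} \geq 0
\end{array} \right\} \tag{18.3.250} \\
& = \inf_{S'_{B'},\, V_{AB'} \in \mathrm{Herm}}
\left\{ \begin{array}{c}
\operatorname{Tr}[S'_{B'}] \left\| \pi_A \right\|_\infty : \\
T_{BB'}(V_{AB'} \pm \Gamma^{\mathcal{M}}_{AB'}) \geq 0, \\
\pi_A \otimes S'_{B'} \pm V_{AB'} \geq 0
\end{array} \right\} \tag{18.3.251} \\
& = \inf_{S'_{B'},\, V_{AB'} \in \mathrm{Herm}}
\left\{ \begin{array}{c}
\frac{1}{d_A} \operatorname{Tr}[S'_{B'}] : \\
T_{BB'}(V_{AB'} \pm \Gamma^{\mathcal{M}}_{AB'}) \geq 0, \\
\pi_A \otimes S'_{B'} \pm V_{AB'} \geq 0
\end{array} \right\} \tag{18.3.252} \\
& = \inf_{S_{B'},\, V_{AB'} \in \mathrm{Herm}}
\left\{ \begin{array}{c}
\operatorname{Tr}[S_{B'}] : \\
T_{BB'}(V_{AB'} \pm \Gamma^{\mathcal{M}}_{AB'}) \geq 0, \\
I_A \otimes S_{B'} \pm V_{AB'} \geq 0
\end{array} \right\}, \tag{18.3.253}
\end{align}
where the last step follows from the substitution $S_{B'} = S'_{B'} / d_A$, so that $\pi_A \otimes S'_{B'} = I_A \otimes S_{B'}$.
This concludes the proof. ■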

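More generally, consider that the definition in (18.3.227) becomes as follows
for a point-to-point channel $\mathcal{N}_{A\to B'}$:
\[
\boldsymbol{\Upsilon}(\mathcal{N}_{A\to B'}) := \inf_{\mathcal{M}_{A\to B'}:\,\beta(\mathcal{M}_{A\to B'})\leq 1} \boldsymbol{D}(\mathcal{N}_{A\to B'}\|\mathcal{M}_{A\to B'}), \tag{18.3.254}
\]
which leads to the quantities $\widehat{\Upsilon}(\mathcal{N}_{A\to B'})$ and $\widehat{\Upsilon}_\alpha(\mathcal{N}_{A\to B'})$, for which we have the
following bounds for $\alpha \in (1,2]$:
\[
\widehat{\Upsilon}(\mathcal{N}_{A\to B'}) \leq \widehat{\Upsilon}_\alpha(\mathcal{N}_{A\to B'}) \leq C_\beta(\mathcal{N}_{A\to B'}). \tag{18.3.255}
\]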

The next proposition is critical for establishing our upper bound proofs in
Section 18.3.3.4. It states that if one share of a maximally classically correlated
state passes through a completely positive map M 𝐴→𝐵′ for which 𝛽(M 𝐴→𝐵′ ) ≤ 1,
then the resulting operator has a very small chance of passing the comparator test,
as defined in (18.3.258). (Recall that we previously used the comparator test in
(11.1.37) and (12.1.19).)

Proposition 18.24 Bound for Comparator Test Success Probability

Let
\[
\Phi_{\hat{A}A} := \frac{1}{d}\sum_{i=0}^{d-1} |i\rangle\!\langle i|_{\hat{A}} \otimes |i\rangle\!\langle i|_{A} \tag{18.3.256}
\]
denote the maximally classically correlated state, and let $\mathcal{M}_{A\to B'}$ be a completely
positive map for which $\beta(\mathcal{M}_{A\to B'}) \leq 1$. Then
\[
\operatorname{Tr}[\Pi_{\hat{A}B'}\mathcal{M}_{A\to B'}(\Phi_{\hat{A}A})] \leq \frac{1}{d}, \tag{18.3.257}
\]
where $\Pi_{\hat{A}B'}$ is the comparator test:
\[
\Pi_{\hat{A}B'} := \sum_{i=0}^{d-1} |i\rangle\!\langle i|_{\hat{A}} \otimes |i\rangle\!\langle i|_{B'}, \tag{18.3.258}
\]
and the following systems are isomorphic: $\hat{A}$, $A$, and $B'$.

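Proof: Recall the expression for $\beta(\mathcal{M}_{A\to B'})$ in (18.3.247). Let $S_{B'}$ and $V_{AB'}$
be arbitrary Hermitian operators satisfying the constraints for $\beta(\mathcal{M}_{A\to B'})$. An
application of (4.2.6) implies that
\[
\mathcal{M}_{A\to B'}(\Phi_{\hat{A}A}) = \langle\Gamma|_{A\tilde{A}}\, \Phi_{\hat{A}A} \otimes \Gamma^{\mathcal{M}}_{\tilde{A}B'}\, |\Gamma\rangle_{A\tilde{A}}, \tag{18.3.259}
\]
where $\tilde{A} \simeq A$. This means that
\begin{align}
\operatorname{Tr}[\Pi_{\hat{A}B'}\mathcal{M}_{A\to B'}(\Phi_{\hat{A}A})] &= \operatorname{Tr}[\Pi_{\hat{A}B'}\langle\Gamma|_{A\tilde{A}}\, \Phi_{\hat{A}A} \otimes \Gamma^{\mathcal{M}}_{\tilde{A}B'}\, |\Gamma\rangle_{A\tilde{A}}] \tag{18.3.260} \\
&= \operatorname{Tr}[T_{B'}(\Pi_{\hat{A}B'})\langle\Gamma|_{A\tilde{A}}\, \Phi_{\hat{A}A} \otimes \Gamma^{\mathcal{M}}_{\tilde{A}B'}\, |\Gamma\rangle_{A\tilde{A}}] \tag{18.3.261} \\
&= \operatorname{Tr}[\Pi_{\hat{A}B'}\langle\Gamma|_{A\tilde{A}}\, \Phi_{\hat{A}A} \otimes T_{B'}(\Gamma^{\mathcal{M}}_{\tilde{A}B'})\, |\Gamma\rangle_{A\tilde{A}}] \tag{18.3.262} \\
&\leq \operatorname{Tr}[\Pi_{\hat{A}B'}\langle\Gamma|_{A\tilde{A}}\, \Phi_{\hat{A}A} \otimes T_{B'}(V_{\tilde{A}B'})\, |\Gamma\rangle_{A\tilde{A}}] \tag{18.3.263} \\
&= \operatorname{Tr}[T_{B'}(\Pi_{\hat{A}B'})\langle\Gamma|_{A\tilde{A}}\, \Phi_{\hat{A}A} \otimes V_{\tilde{A}B'}\, |\Gamma\rangle_{A\tilde{A}}] \tag{18.3.264} \\
&= \operatorname{Tr}[\Pi_{\hat{A}B'}\langle\Gamma|_{A\tilde{A}}\, \Phi_{\hat{A}A} \otimes V_{\tilde{A}B'}\, |\Gamma\rangle_{A\tilde{A}}] \tag{18.3.265} \\
&\leq \operatorname{Tr}[\Pi_{\hat{A}B'}\langle\Gamma|_{A\tilde{A}}\, \Phi_{\hat{A}A} \otimes I_{\tilde{A}} \otimes S_{B'}\, |\Gamma\rangle_{A\tilde{A}}] \tag{18.3.266} \\
&= \operatorname{Tr}[\Pi_{\hat{A}B'}\langle\Gamma|_{A\tilde{A}}\, \Phi_{\hat{A}A} \otimes I_{\tilde{A}}\, |\Gamma\rangle_{A\tilde{A}} \otimes S_{B'}] \tag{18.3.267} \\
&= \operatorname{Tr}[\Pi_{\hat{A}B'} \operatorname{Tr}_A[\Phi_{\hat{A}A}] \otimes S_{B'}] \tag{18.3.268} \\
&= \frac{1}{d}\operatorname{Tr}[\Pi_{\hat{A}B'}\, I_{\hat{A}} \otimes S_{B'}] \tag{18.3.269} \\
&= \frac{1}{d}\operatorname{Tr}[S_{B'}]. \tag{18.3.270}
\end{align}
Since this holds for all $S_{B'}$ and $V_{AB'}$ satisfying the constraints for $\beta(\mathcal{M}_{A\to B'})$, taking
the infimum over them gives the upper bound $\frac{1}{d}\beta(\mathcal{M}_{A\to B'})$. Because $\beta(\mathcal{M}_{A\to B'}) \leq 1$
by assumption, we conclude that
\[
\operatorname{Tr}[\Pi_{\hat{A}B'}\mathcal{M}_{A\to B'}(\Phi_{\hat{A}A})] \leq \frac{1}{d}. \tag{18.3.271}
\]
This concludes the proof. ■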

We finally state another proposition that plays an essential role in our upper
bound proofs in Section 18.3.3.4.

Proposition 18.25
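Suppose that $\mathcal{N}_{A\to B}$ is a channel with $A$ isomorphic to $B$ that satisfies
\[
\frac{1}{2}\left\| \mathcal{N}_{A\to B}(\Phi_{RA}) - \Phi_{RB} \right\|_1 \leq \varepsilon, \tag{18.3.272}
\]
for $\varepsilon \in [0,1)$ and where $\Phi_{RB} := \frac{1}{d}\sum_i |i\rangle\!\langle i|_R \otimes |i\rangle\!\langle i|_B$ and $d = d_R = d_A = d_B$.
Then
\[
\log_2 d \leq \inf_{\mathcal{M}_{A\to B}:\,\beta(\mathcal{M}_{A\to B})\leq 1} D^{\varepsilon}_H(\mathcal{N}_{A\to B}(\Phi_{RA})\|\mathcal{M}_{A\to B}(\Phi_{RA})), \tag{18.3.273}
\]
and for all $\alpha \in (1,2]$,
\[
\log_2 d \leq \inf_{\mathcal{M}_{A\to B}:\,\beta(\mathcal{M}_{A\to B})\leq 1} \widehat{D}_\alpha(\mathcal{N}_{A\to B}(\Phi_{RA})\|\mathcal{M}_{A\to B}(\Phi_{RA})) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right). \tag{18.3.274}
\]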

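Proof: We begin by proving (18.3.273). The condition
\[
\frac{1}{2}\left\| \mathcal{N}_{A\to B}(\Phi_{RA}) - \Phi_{RB} \right\|_1 \leq \varepsilon \tag{18.3.275}
\]
implies that
\[
\operatorname{Tr}[\Pi_{RB}\mathcal{N}_{A\to B}(\Phi_{RA})] \geq 1 - \varepsilon, \tag{18.3.276}
\]
where $\Pi_{RB} := \sum_i |i\rangle\!\langle i|_R \otimes |i\rangle\!\langle i|_B$ is the comparator test. Indeed, applying a
completely dephasing channel $\Delta_B(\cdot) := \sum_i |i\rangle\!\langle i|(\cdot)|i\rangle\!\langle i|$ to the output of the channel
$\mathcal{N}_{A\to B}$ and applying the data-processing inequality for trace distance, we conclude
that
\begin{align}
\varepsilon &\geq \frac{1}{2}\left\| \mathcal{N}_{A\to B}(\Phi_{RA}) - \Phi_{RB} \right\|_1 \tag{18.3.277} \\
&\geq \frac{1}{2}\left\| (\Delta_B\circ\mathcal{N}_{A\to B})(\Phi_{RA}) - \Delta_B(\Phi_{RB}) \right\|_1 \tag{18.3.278} \\
&= \frac{1}{2}\left\| (\Delta_B\circ\mathcal{N}_{A\to B})(\Phi_{RA}) - \Phi_{RB} \right\|_1. \tag{18.3.279}
\end{align}
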
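Let $\omega_{RB} := (\Delta_B\circ\mathcal{N}_{A\to B})(\Phi_{RA})$ and observe that it can be written as
\[
\omega_{RB} = \frac{1}{d}\sum_{i,j} p(j|i)\,|i\rangle\!\langle i|_R \otimes |j\rangle\!\langle j|_B \tag{18.3.280}
\]
for some conditional probability distribution $p(j|i)$. Then
\begin{align}
&\frac{1}{2}\left\| (\Delta_B\circ\mathcal{N}_{A\to B})(\Phi_{RA}) - \Phi_{RB} \right\|_1 \notag \\
&\quad= \frac{1}{2}\left\| \frac{1}{d}\sum_{i,j} p(j|i)\,|i\rangle\!\langle i|_R \otimes |j\rangle\!\langle j|_B - \frac{1}{d}\sum_{i,j} \delta_{i,j}\,|i\rangle\!\langle i|_R \otimes |j\rangle\!\langle j|_B \right\|_1 \tag{18.3.281} \\
&\quad= \frac{1}{2}\frac{1}{d}\sum_i \left\| \sum_j \left(p(j|i) - \delta_{i,j}\right)|j\rangle\!\langle j|_B \right\|_1 \tag{18.3.282} \\
&\quad= \frac{1}{2}\frac{1}{d}\sum_i \left[ (1 - p(i|i)) + \sum_{j\neq i} p(j|i) \right] \tag{18.3.283} \\
&\quad= \frac{1}{d}\sum_i (1 - p(i|i)) \tag{18.3.284} \\
&\quad= 1 - \sum_i \frac{1}{d}\,p(i|i). \tag{18.3.285}
\end{align}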

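This implies that
\[
\sum_i \frac{1}{d}\,p(i|i) \geq 1 - \varepsilon. \tag{18.3.286}
\]
Now consider that
\begin{align}
\operatorname{Tr}[\Pi_{RB}\mathcal{N}_{A\to B}(\Phi_{RA})] &= \operatorname{Tr}[\Delta_B(\Pi_{RB})\mathcal{N}_{A\to B}(\Phi_{RA})] \tag{18.3.287} \\
&= \operatorname{Tr}[\Pi_{RB}(\Delta_B\circ\mathcal{N}_{A\to B})(\Phi_{RA})] \tag{18.3.288} \\
&= \operatorname{Tr}[\Pi_{RB}\,\omega_{RB}] \tag{18.3.289} \\
&= \sum_i \frac{1}{d}\,p(i|i). \tag{18.3.290}
\end{align}
So we conclude that
\[
\operatorname{Tr}[\Pi_{RB}\mathcal{N}_{A\to B}(\Phi_{RA})] \geq 1 - \varepsilon. \tag{18.3.291}
\]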
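Applying the definition of the hypothesis testing relative entropy from Definition 7.65,
we conclude that
\begin{align}
&\inf_{\mathcal{M}_{A\to B}:\,\beta(\mathcal{M}_{A\to B})\leq 1} D^{\varepsilon}_H(\mathcal{N}_{A\to B}(\Phi_{RA})\|\mathcal{M}_{A\to B}(\Phi_{RA})) \notag \\
&\quad= \inf_{\substack{\mathcal{M}_{A\to B}:\\ \beta(\mathcal{M}_{A\to B})\leq 1}} \left( -\log_2 \inf_{\Lambda_{RB}\geq 0} \left\{ \begin{array}{c} \operatorname{Tr}[\Lambda_{RB}\mathcal{M}_{A\to B}(\Phi_{RA})] : \\ \operatorname{Tr}[\Lambda_{RB}\mathcal{N}_{A\to B}(\Phi_{RA})] \geq 1-\varepsilon,\ \Lambda_{RB} \leq I_{RB} \end{array} \right\} \right) \tag{18.3.292} \\
&\quad= -\log_2 \sup_{\substack{\mathcal{M}_{A\to B}:\\ \beta(\mathcal{M}_{A\to B})\leq 1}} \inf_{\Lambda_{RB}\geq 0} \left\{ \begin{array}{c} \operatorname{Tr}[\Lambda_{RB}\mathcal{M}_{A\to B}(\Phi_{RA})] : \\ \operatorname{Tr}[\Lambda_{RB}\mathcal{N}_{A\to B}(\Phi_{RA})] \geq 1-\varepsilon,\ \Lambda_{RB} \leq I_{RB} \end{array} \right\}. \tag{18.3.293}
\end{align}
Now consider that
\begin{align}
&\sup_{\substack{\mathcal{M}_{A\to B}:\\ \beta(\mathcal{M}_{A\to B})\leq 1}} \inf_{\Lambda_{RB}\geq 0} \left\{ \begin{array}{c} \operatorname{Tr}[\Lambda_{RB}\mathcal{M}_{A\to B}(\Phi_{RA})] : \\ \operatorname{Tr}[\Lambda_{RB}\mathcal{N}_{A\to B}(\Phi_{RA})] \geq 1-\varepsilon,\ \Lambda_{RB} \leq I_{RB} \end{array} \right\} \notag \\
&\quad\leq \sup_{\mathcal{M}_{A\to B}:\,\beta(\mathcal{M}_{A\to B})\leq 1} \operatorname{Tr}[\Pi_{RB}\mathcal{M}_{A\to B}(\Phi_{RA})] \tag{18.3.294} \\
&\quad\leq \frac{1}{d}, \tag{18.3.295}
\end{align}
where the first inequality holds because, by (18.3.291), the comparator test $\Pi_{RB}$ is a
feasible choice for $\Lambda_{RB}$, and the last inequality follows from Proposition 18.24.
Then applying a negative logarithm gives (18.3.273).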
The inequality in (18.3.274) follows as a direct application of the following
relationship between the hypothesis testing relative entropy and the geometric Rényi
relative entropy:
\[
D^{\varepsilon}_H(\rho\|\sigma) \leq \widehat{D}_\alpha(\rho\|\sigma) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right), \tag{18.3.296}
\]
as well as the previous proposition. The proof of (18.3.296) follows the same proof
given for Proposition 7.71. ■
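
The classical computation in (18.3.281)–(18.3.285) can be checked numerically. The following is a minimal sketch, not part of the original text: it draws a random conditional distribution $p(j|i)$, evaluates the normalized trace distance between the two diagonal operators entrywise, and compares it with one minus the comparator success probability. The function names and the row-stochastic list-of-lists representation (`p[i][j]` standing for $p(j|i)$) are illustrative assumptions.

```python
import random


def dephased_trace_distance(p, d):
    """Half the entrywise 1-norm of omega_RB - Phi_RB, as in (18.3.281).

    Both operators are diagonal in the product basis, so the trace norm is
    the sum of absolute values of the differences of the diagonal entries.
    Here p[i][j] stands for p(j|i) (an illustrative convention).
    """
    total = 0.0
    for i in range(d):
        for j in range(d):
            delta = 1.0 if i == j else 0.0
            total += abs(p[i][j] - delta) / d
    return 0.5 * total


def comparator_success(p, d):
    """Tr[Pi_RB omega_RB] = sum_i (1/d) p(i|i), as in (18.3.290)."""
    return sum(p[i][i] for i in range(d)) / d


def random_conditional_distribution(d, rng):
    """Random row-stochastic matrix: each row i is a distribution p(.|i)."""
    rows = []
    for _ in range(d):
        weights = [rng.random() for _ in range(d)]
        norm = sum(weights)
        rows.append([w / norm for w in weights])
    return rows


rng = random.Random(0)  # fixed seed so the check is reproducible
d = 4
p = random_conditional_distribution(d, rng)
lhs = dephased_trace_distance(p, d)
rhs = 1.0 - comparator_success(p, d)
print(abs(lhs - rhs) < 1e-12)
```

The agreement of the two quantities is exactly the statement that a trace-distance error of at most $\varepsilon$ forces the comparator test to succeed with probability at least $1-\varepsilon$.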


18.3.3.4 Proof of Geometric 𝚼-Information Upper Bound

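We now have everything that we need to establish that the geometric Υ-information
is an upper bound on the number of bits that can be transmitted by means of
a quantum channel assisted by a classical feedback channel. By examining the
protocol in Section 18.1, consider that the final state $\omega^p_{M\hat{M}}$ of the protocol can be
written as follows:
\[
\omega^p_{M\hat{M}} = \mathcal{P}_{M'\to\hat{M}}(\Phi^p_{MM'}), \tag{18.3.297}
\]
where
\begin{multline}
\mathcal{P}_{M'\to\hat{M}} := \mathcal{D}^n\circ\mathcal{N}\circ\mathcal{E}^{n-1}\circ\Delta\circ\mathcal{D}^{n-1}\circ\mathcal{N}\circ\mathcal{E}^{n-2}\circ\Delta\circ\mathcal{D}^{n-2}\circ \\
\cdots\circ\mathcal{D}^2\circ\mathcal{N}\circ\mathcal{E}^1\circ\Delta\circ\mathcal{D}^1\circ\mathcal{N}\circ\mathcal{E}^0\circ\mathcal{A}, \tag{18.3.298}
\end{multline}
and $\mathcal{A}$ is an appending channel that appends the state $\Delta_{F_0}(\Psi_{F_0B'_0})$ to the input state
$\Phi^p_{MM'}$. In (18.3.298), we have omitted all system labels for simplicity.
We now state the main result of this section: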
We now state the main result of this section:

Theorem 18.26
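Fix $n \in \mathbb{N}$, $\varepsilon \in [0,1)$, and $\alpha \in (1,2]$, and let $\mathcal{N}_{A\to B}$ be a quantum channel. For
all $(n, |\mathcal{M}|, \varepsilon)$ classical-feedback-assisted classical communication protocols
over the channel $\mathcal{N}_{A\to B}$, the following bound holds:
\[
\frac{\log_2|\mathcal{M}|}{n} \leq \widehat{\Upsilon}_\alpha(\mathcal{N}_{A\to B}) + \frac{\alpha}{n(\alpha-1)}\log_2\!\left(\frac{1}{1-\varepsilon}\right), \tag{18.3.299}
\]
where $\widehat{\Upsilon}_\alpha(\mathcal{N}_{A\to B})$ is the geometric Υ-information of $\mathcal{N}_{A\to B}$, as defined in
(18.3.254), with $\boldsymbol{D}$ set to $\widehat{D}_\alpha$.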

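Proof: Consider an arbitrary $(n, |\mathcal{M}|, \varepsilon)$ protocol of the form described in Section 18.1,
with final state as given in (18.3.297). Let the distribution $p$ over the
messages be the uniform distribution. Since the condition $p^*_{\mathrm{err}}(\mathcal{C}) \leq \varepsilon$ holds, with
$p^*_{\mathrm{err}}$ defined in (18.1.12), we can apply (18.3.274) of Proposition 18.25 to conclude
that
\begin{align}
\log_2|\mathcal{M}| &\leq \inf_{\mathcal{M}_{M'\to\hat{M}}:\,\beta(\mathcal{M}_{M'\to\hat{M}})\leq 1} \widehat{D}_\alpha(\mathcal{P}_{M'\to\hat{M}}(\Phi_{MM'})\|\mathcal{M}_{M'\to\hat{M}}(\Phi_{MM'})) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right) \tag{18.3.300} \\
&\leq \widehat{\Upsilon}_\alpha(\mathcal{P}_{M'\to\hat{M}}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right), \tag{18.3.301}
\end{align}
where the second inequality follows from the definition in (18.3.254) with $\boldsymbol{D}$ set to $\widehat{D}_\alpha$.
Eq. (18.3.298) indicates that the whole protocol is a serial composition of bipartite
channels. Then we find that
\begin{align}
\widehat{\Upsilon}_\alpha(\mathcal{P}_{M'\to\hat{M}}) &= \widehat{\Upsilon}_\alpha(\mathcal{D}^n\circ\mathcal{N}\circ\mathcal{E}^{n-1}\circ\Delta\circ\mathcal{D}^{n-1}\circ\mathcal{N}\circ\mathcal{E}^{n-2}\circ\Delta\circ\mathcal{D}^{n-2}\circ \tag{18.3.302} \\
&\qquad\quad \cdots\circ\mathcal{D}^2\circ\mathcal{N}\circ\mathcal{E}^1\circ\Delta\circ\mathcal{D}^1\circ\mathcal{N}\circ\mathcal{E}^0\circ\mathcal{A}) \tag{18.3.303} \\
&\leq n\widehat{\Upsilon}_\alpha(\mathcal{N}) + n\widehat{\Upsilon}_\alpha(\Delta) + \sum_{i=1}^{n}\widehat{\Upsilon}_\alpha(\mathcal{D}^i) + \sum_{i=0}^{n-1}\widehat{\Upsilon}_\alpha(\mathcal{E}^i) + \widehat{\Upsilon}_\alpha(\mathcal{A}) \tag{18.3.304} \\
&= n\widehat{\Upsilon}_\alpha(\mathcal{N}). \tag{18.3.305}
\end{align}
The inequality follows from Proposition 18.22. The last equality follows from
Propositions 18.18, 18.20, and 18.19 because each encoding channel $\mathcal{E}^i$ and
decoding channel $\mathcal{D}^i$ is a local channel and $\Delta$ is a classical feedback channel. We
also implicitly used the stability property in Proposition 18.18. Putting everything
together, we conclude that
\[
\log_2|\mathcal{M}| \leq n\widehat{\Upsilon}_\alpha(\mathcal{N}) + \frac{\alpha}{\alpha-1}\log_2\!\left(\frac{1}{1-\varepsilon}\right), \tag{18.3.306}
\]
which is equivalent to the desired bound in (18.3.299). ■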
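
To see how the finite-$n$ correction in (18.3.299) behaves, here is a small illustrative sketch that evaluates the right-hand side of the bound. It does not compute $\widehat{\Upsilon}_\alpha$ itself (that requires solving an optimization problem); the value passed as `upsilon_alpha` is a hypothetical placeholder, and the function name is an assumption made for this example.

```python
import math


def feedback_rate_upper_bound(upsilon_alpha, alpha, n, eps):
    """Right-hand side of (18.3.299).

    upsilon_alpha : value of the geometric Upsilon-information of the channel
                    (assumed to be supplied by the caller; hypothetical here)
    alpha         : Renyi parameter, required to lie in (1, 2]
    n             : number of channel uses, n >= 1
    eps           : error probability, required to lie in [0, 1)
    """
    if not (1.0 < alpha <= 2.0):
        raise ValueError("alpha must lie in (1, 2]")
    if not (0.0 <= eps < 1.0):
        raise ValueError("eps must lie in [0, 1)")
    if n < 1:
        raise ValueError("n must be a positive integer")
    correction = alpha / (n * (alpha - 1.0)) * math.log2(1.0 / (1.0 - eps))
    return upsilon_alpha + correction


# Hypothetical channel value, chosen only to illustrate the scaling in n.
for n in (1, 10, 100, 1000):
    print(n, feedback_rate_upper_bound(0.5, alpha=1.5, n=n, eps=0.1))
```

For fixed $\alpha > 1$ and $\varepsilon < 1$, the correction term decays as $1/n$, which is the mechanism behind the strong converse statement in Theorem 18.33.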

18.4 Classical Capacity of a Quantum Channel Assisted by Classical Feedback
In this section, we analyze the asymptotic case of feedback-assisted communication,
in which we allow for an arbitrarily large number of rounds of feedback. The
definitions in this section are similar to those in previous chapters, and so we keep
this section brief.


Definition 18.27 Achievable Rate for Classical-Feedback-Assisted Classical Communication
Given a quantum channel $\mathcal{N}$, a rate $R \in \mathbb{R}^+$ is called an achievable rate for
classical-feedback-assisted classical communication over $\mathcal{N}$ if for all $\varepsilon \in (0,1]$,
all $\delta > 0$, and all sufficiently large $n$, there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$
classical-feedback-assisted classical communication protocol.

Definition 18.28 Classical-Feedback-Assisted Classical Capacity of a Quantum Channel
The classical-feedback-assisted classical capacity of a quantum channel $\mathcal{N}$,
denoted by $C_{\mathrm{CFB}}(\mathcal{N})$, is defined as the supremum of all achievable rates, i.e.,
\[
C_{\mathrm{CFB}}(\mathcal{N}) := \sup\{R : R \text{ is an achievable rate for } \mathcal{N}\}. \tag{18.4.1}
\]

Definition 18.29 Strong Converse Rate for Classical-Feedback-Assisted Classical Communication
Given a quantum channel $\mathcal{N}$, a rate $R \in \mathbb{R}^+$ is called a strong converse rate for
classical-feedback-assisted classical communication over $\mathcal{N}$ if for all $\varepsilon \in [0,1)$,
all $\delta > 0$, and all sufficiently large $n$, there does not exist an $(n, 2^{n(R+\delta)}, \varepsilon)$
classical-feedback-assisted classical communication protocol.

Definition 18.30 Strong Converse Classical-Feedback-Assisted Classical Capacity of a Quantum Channel
The strong converse classical-feedback-assisted classical capacity of a quantum
channel $\mathcal{N}$, denoted by $\widetilde{C}_{\mathrm{CFB}}(\mathcal{N})$, is defined as the infimum of all strong converse
rates, i.e.,
\[
\widetilde{C}_{\mathrm{CFB}}(\mathcal{N}) := \inf\{R : R \text{ is a strong converse rate for } \mathcal{N}\}. \tag{18.4.2}
\]

We conclude several theorems, based on the bounds given in Section 16.32:


Theorem 18.31 Classical-Feedback-Assisted Classical Capacity of Entanglement-Breaking Channels
For an entanglement-breaking channel $\mathcal{N}$, its classical-feedback-assisted classical
capacity $C_{\mathrm{CFB}}(\mathcal{N})$ and its strong converse classical-feedback-assisted
classical capacity $\widetilde{C}_{\mathrm{CFB}}(\mathcal{N})$ are both equal to its Holevo information $\chi(\mathcal{N})$, i.e.,
\[
C_{\mathrm{CFB}}(\mathcal{N}) = \widetilde{C}_{\mathrm{CFB}}(\mathcal{N}) = \chi(\mathcal{N}), \tag{18.4.3}
\]
where $\chi(\mathcal{N})$ is defined in (7.11.106).

Proof: The lower bound $\chi(\mathcal{N}) \leq C_{\mathrm{CFB}}(\mathcal{N})$ follows from Theorem 12.13 (i.e.,
not making use of the classical feedback channel at all). The upper bound
$\widetilde{C}_{\mathrm{CFB}}(\mathcal{N}) \leq \chi(\mathcal{N})$ follows from (18.3.6) of Theorem 18.2, and by reasoning similar
to that given in the proof of Theorem 12.19. ■

Theorem 18.32 Average Entropy Weak Converse Bound for Classical Capacity Assisted by Classical Feedback
Let $\mathcal{N}_{A\to B}$ be a quantum channel that can be written as the following probabilistic
mixture of quantum channels:
\[
\mathcal{N}_{A\to B} = \sum_x p_X(x)\,\mathcal{N}^x_{A\to B}, \tag{18.4.4}
\]
where $p_X$ is a probability distribution and $\{\mathcal{N}^x_{A\to B}\}_x$ is a set of quantum channels.
The following upper bound holds for the classical capacity of a quantum channel
assisted by classical feedback:
\[
C_{\mathrm{CFB}}(\mathcal{N}_{A\to B}) \leq \sup_{\rho_A} \sum_x p_X(x)\,H(\mathcal{N}^x_{A\to B}(\rho_A)). \tag{18.4.5}
\]

Proof: This is a direct consequence of Theorem 18.6 and reasoning similar to that
given for the proof around (12.2.43). ■


Theorem 18.33 Geometric Υ-Information Strong Converse Bound for Classical Capacity Assisted by Classical Feedback
The following upper bound holds for the strong converse classical capacity of a
quantum channel $\mathcal{N}_{A\to B}$ assisted by classical feedback:
\[
\widetilde{C}_{\mathrm{CFB}}(\mathcal{N}_{A\to B}) \leq \widehat{\Upsilon}(\mathcal{N}_{A\to B}), \tag{18.4.6}
\]
where $\widehat{\Upsilon}(\mathcal{N}_{A\to B})$ is the Υ-information defined from the Belavkin–Staszewski
relative entropy (see (18.3.254) and Definition 7.51).

Proof: This is a direct consequence of the upper bound in Theorem 18.26 and
reasoning similar to that given in the proof of Theorem . We also require the fact
that the geometric Rényi relative entropy converges to the Belavkin–Staszewski
relative entropy in the limit as 𝛼 → 1 (see Proposition 7.52). ■

18.5 Examples
In this section, we briefly provide some examples of channels for which we evaluate
the capacity upper bounds in Section 18.4. We begin with the quantum erasure
channel (see Section 4.5.2). Recall that a quantum erasure channel acts as follows
on an input density operator 𝜌:

E 𝑝 (𝜌) B (1 − 𝑝) 𝜌 + 𝑝|𝑒⟩⟨𝑒|, (18.5.1)

where 𝑝 ∈ [0, 1] is the erasure probability and |𝑒⟩⟨𝑒| is an erasure state orthogonal
to every possible input. Let 𝑑 be the dimension of the input to the channel. By
inspection, we see that the erasure channel is a probabilistic mixture of an identity
channel and a channel that traces out the input and replaces it with the erasure state.
Thus, we apply Theorem 18.32 to conclude that
\begin{align}
C_{\mathrm{CFB}}(\mathcal{E}_p) &\leq \sup_{\rho_A}\left[(1-p)\,H(\mathrm{id}(\rho_A)) + p\,H(|e\rangle\!\langle e|)\right] \tag{18.5.2} \\
&= (1-p)\sup_{\rho_A} H(\mathrm{id}(\rho_A)) \tag{18.5.3} \\
&= (1-p)\log_2 d. \tag{18.5.4}
\end{align}


Since this upper bound is achievable for classical communication over the erasure
channel without feedback (see Theorem 12.33), we then conclude that

𝐶 (E 𝑝 ) = 𝐶CFB (E 𝑝 ) = (1 − 𝑝) log2 𝑑. (18.5.5)

That is, classical feedback does not increase the classical capacity of the erasure
channel.
Finally, we evaluate the bound in Theorem 18.33 for the qubit depolarizing
channel. Recall from Section 16.32 that it is defined as

D 𝑝 (𝑋) B (1 − 𝑝) 𝑋 + 𝑝 Tr[𝑋]𝜋, (18.5.6)

𝜋 B 𝐼/𝑑. (18.5.7)

It was already established in Section 16.32 that Υ(D 𝑝 ) is an upper bound on its
(unassisted) classical capacity, and we discussed in Section 16.32 how the Holevo
information is equal to its classical capacity. What we find now is that Υ(D 𝑝 ) is
an upper bound on its classical capacity assisted by a classical feedback channel.
Figure 18.1 plots this upper bound and also plots the Holevo information lower
bound when 𝑑 = 2. The latter is given by 1 − ℎ2 ( 𝑝/2), where ℎ2 is the binary
entropy function. Note that the depolarizing channel is entanglement breaking for
$p \geq \frac{d}{d+1}$. As such, the bounds from Theorem 18.31 apply, so that, for $p \geq \frac{d}{d+1}$,
the Holevo information $1 - h_2(p/2)$ is equal to the classical capacity assisted by
classical feedback.
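
The closed-form expressions in this section are easy to tabulate. The sketch below, offered only as an illustration, evaluates the erasure-channel value $(1-p)\log_2 d$ from (18.5.5), the qubit depolarizing Holevo information $1 - h_2(p/2)$, and the entanglement-breaking threshold $d/(d+1)$. The function names are assumptions for this example; the geometric Υ-information curve of Figure 18.1 is not computed here, since it requires an optimization.

```python
import math


def binary_entropy(x):
    """Binary entropy h_2(x) in bits, with h_2(0) = h_2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def erasure_feedback_capacity(p, d):
    """(1 - p) log2(d): capacity of the erasure channel, with or without
    classical feedback, as in (18.5.5)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("erasure probability must lie in [0, 1]")
    return (1.0 - p) * math.log2(d)


def qubit_depolarizing_holevo(p):
    """Holevo information 1 - h_2(p/2) of the qubit depolarizing channel
    in the parametrization of (18.5.6)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("depolarizing parameter must lie in [0, 1]")
    return 1.0 - binary_entropy(p / 2.0)


def entanglement_breaking_threshold(d):
    """Smallest p for which the depolarizing channel is entanglement breaking."""
    return d / (d + 1.0)


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, erasure_feedback_capacity(p, 2), qubit_depolarizing_holevo(p))
```

For $d = 2$ the threshold evaluates to $2/3$, which matches the dashed vertical line described in the caption of Figure 18.1.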

18.6 Summary
In this chapter, we developed the general theory of classical communication over a
quantum channel assisted by classical feedback from receiver to sender. Our main
focus was on establishing upper bounds on this capacity. The main findings of this
chapter are as follows:

1. We first proved that classical feedback does not enhance the classical capacity
of an entanglement-breaking channel.
2. Next, we established that the average output entropy of a channel is a weak
converse upper bound on the feedback-assisted capacity. The method for
establishing this average entropy bound involves identifying an information
measure that has two key properties: 1) it does not increase under a one-way
local operations and classical communication channel from the receiver to the
sender and 2) a quantum channel from sender to receiver cannot increase the
information measure by more than the maximum average output entropy of the
channel. This information measure can be understood as the sum of two terms,
with one corresponding to classical correlation and the other to entanglement.
3. We finally established a general strong converse upper bound on the feedback-
assisted capacity, in terms of the geometric Υ-information of a quantum
channel. The main method for doing so was to devise an information measure
for bipartite channels that is equal to zero for classical feedback channels and
products of local channels.

Figure 18.1: Lower and upper bounds on the classical-feedback-assisted
classical capacity of the qubit depolarizing channel in (18.5.6), with $d = 2$. The
dashed vertical line indicates that the qubit depolarizing channel is entanglement
breaking for $p \geq 2/3$, so that the Holevo information is equal to the feedback-
assisted capacity for these values, according to Theorem 18.31.
[Plot: rate (vertical axis) versus $p \in [0,1]$ (horizontal axis); curves labeled
"Holevo information" and "Upsilon Information"; vertical line labeled "Entanglement-breaking".]

18.7 Bibliographic Notes

The classical capacity of a quantum channel assisted by a classical feedback channel
was first studied by Bowen and Nagarajan (2005), who proved that classical feedback
does not increase the capacity of entanglement-breaking channels. This result was
strengthened to a strong converse statement by Ding and Wilde (2018). Smith
and Smolin (2009) provided an example of a channel for which classical feedback
can significantly enhance the classical capacity. Bennett et al. (2006) related the
feedback-assisted capacity to other capacities in quantum Shannon theory, and
García-Patrón et al. (2018) related it to other notions of feedback-assisted capacity.
Ding et al. (2019) established the entropy upper bound on the feedback-assisted
capacity, and Ding et al. (2023) established the geometric Υ-information upper
bound on the strong converse feedback-assisted capacity.

Chapter 19

LOCC-Assisted Quantum
Communication
This chapter develops an important variation of quantum communication, in which
we allow the sender and receiver the free use of classical communication. That is,
in between every use of a given quantum communication channel N 𝐴→𝐵 , the sender
and receiver are allowed to perform local operations and classical communication
(LOCC). For this reason, the capacity considered in this chapter is called the
LOCC-assisted quantum capacity.
The practical motivation for this kind of feedback-assisted quantum capacity
comes from the fact that, these days, classical communication is rather cheap
and plentiful. Thus, from a resource-theoretic perspective, it can be sensible to
simply allow classical communication as a free resource (similar to how we did for
entanglement-assisted communication in Chapter 11). Then our goal is to place
informative bounds on the rate at which quantum information can be communicated
from the sender to the receiver in this setting. Furthermore, these bounds are
relevant for understanding and placing limitations on the speed at which distributed
quantum computation can be carried out.
In order to establish upper bounds on LOCC-assisted quantum capacity, we
revisit the concept of amortization introduced in Section 17.1.3. However, in
this context, we proceed somewhat differently, instead employing entanglement
measures to quantify how much entanglement can be generated by multiple uses of a
quantum channel. Then we define the amortized entanglement of a quantum channel
as the largest difference between the output and input entanglement of the channel.

In particular, two entanglement measures on which we focus are the squashed

entanglement and the Rains relative entropy, as well as a variant of the latter
called max-Rains relative entropy. One key property of the squashed entanglement
and the max-Rains relative entropy is that these entanglement measures do not
increase under amortization, similar to how we previously observed that the mutual
information of a channel does not increase under amortization. This key property
leads to the conclusion that these entanglement measures can serve as upper bounds
on the LOCC-assisted quantum capacity of a quantum channel, and arriving at this
conclusion is one of the main goals of this chapter. At the end of the chapter, we
demonstrate the utility of the squashed entanglement and Rains family of bounds
by evaluating them for several example quantum channels of interest.

Combining Entanglement Distillation and Teleportation to Obtain a Quantum Communication Protocol

We can use entanglement distillation along with teleportation in the asymptotic
setting to obtain a lower bound on the number of transmitted qubits in a quantum
communication protocol. The strategy is as follows; see Figure 19.1.
1. Alice prepares several copies, say $n$, of a pure state $\psi_{\tilde{A}A}$, with the dimension of
$\tilde{A}$ equal to the dimension of $A$, and sends each of the $A$ systems through the
channel $\mathcal{N}_{A\to B}$ to Bob.
2. Alice and Bob now share $n$ copies of the state $\omega_{\tilde{A}B} = \mathcal{N}_{A\to B}(\psi_{\tilde{A}A})$. They
perform a one-way entanglement distillation protocol to convert these mixed
entangled states to a perfect, maximally entangled state $\Phi_{\hat{A}\hat{B}}$ of Schmidt rank
$d \geq 2$. Roughly speaking, as shown in Chapter 13, a Schmidt rank of $2^{nI(\tilde{A}\rangle B)_\omega}$
is achievable for $n$ copies of $\omega_{\tilde{A}B}$, as $n \to \infty$.
3. Using the distilled maximally entangled state, along with $2\log_2 d$ bits of
classical communication, Alice and Bob perform the quantum teleportation
protocol to transmit the $A'$ part of an arbitrary pure state $\Psi_{RA'}$ from Alice to
Bob, with $d_{A'} = d$.
Since there are $n$ uses of the channel in this strategy, we see that as $n \to \infty$, the
rate of this strategy (the number of qubits per channel use) is $\frac{\log_2 d}{n} = I(\tilde{A}\rangle B)_\omega$.
By optimizing over all initial pure states $\psi_{\tilde{A}A}$ prepared by Alice, we find that, in
the asymptotic setting, the rate $\sup_{\psi_{\tilde{A}A}} I(\tilde{A}\rangle B)_\omega = I^c(\mathcal{N})$ is achievable, where we
recognize the coherent information $I^c(\mathcal{N})$ of the channel $\mathcal{N}$, which we define in
(7.11.107). Note that the strategy we have outlined here also involves classical
communication between Alice and Bob during the entanglement distillation step
and during the teleportation step. The results of Section 14.1.3 and Section 14.2.1
show that this strategy is nonetheless a valid quantum communication protocol, in
the sense that the rate $I^c(\mathcal{N})$ can be achieved by a strategy that does not make use
of forward classical communication.

Figure 19.1: Sketch of a quantum communication protocol over $n$ uses of the
channel $\mathcal{N}$, which makes use of entanglement distillation and teleportation. The
arrow “→” indicates that the channel contains classical communication from
Alice to Bob only. We show in [] that, in the asymptotic limit, this strategy
achieves the quantum capacity of $\mathcal{N}$.
[Diagram: Alice prepares $\psi_{\tilde{A}^nA^n}$ and sends $A^n$ through $\mathcal{N}^{\otimes n}$ to Bob; a block labeled
"One-way entanglement distillation" ($\mathcal{L}^{\to}$) outputs systems $\hat{A}$ and $\hat{B}$; a block labeled
"Teleportation" ($\mathcal{T}^{\to}$) then transfers Alice's share of the state held with the reference $R$ to Bob.]
Now, Alice can do better than prepare the state $\psi^{\otimes n}_{\tilde{A}A}$ and send each $A$ system
through the channel: she can prepare a state $\psi_{\tilde{A}_1\cdots\tilde{A}_nA_1\cdots A_n} \equiv \psi_{\tilde{A}^nA^n}$ such that the
systems $A_1, \ldots, A_n$ being sent through $\mathcal{N}$ are entangled. One can then achieve a
rate of $\frac{1}{n}\sup_{\psi_{\tilde{A}^nA^n}} I(\tilde{A}^n\rangle B^n)_\omega = \frac{1}{n}I^c(\mathcal{N}^{\otimes n})$. Since we are free to use the channel $\mathcal{N}$
as many times as we want, we can optimize over $n$ to obtain the communication rate
\[
\sup_{n\in\mathbb{N}} \frac{1}{n} I^c(\mathcal{N}^{\otimes n}) =: I^c_{\mathrm{reg}}(\mathcal{N}), \tag{19.0.1}
\]
and it is this regularized coherent information of $\mathcal{N}$ that is optimal for quantum
communication over the channel $\mathcal{N}$. In other words, $Q(\mathcal{N}) = I^c_{\mathrm{reg}}(\mathcal{N})$, and we prove
this in Section 14.2.2.


19.1 n-Shot LOCC-Assisted Quantum Communication Protocol
This section discusses the most general form for an 𝑛-shot LOCC-assisted quantum
communication protocol.
Before starting, we should clarify that the goal of such a protocol is to produce
an approximate maximally entangled state, with the Schmidt rank of the ideal target
state as large as possible. Due to the assumption of free classical communication, as
well as the quantum teleportation protocol discussed in Section 5.1, generating an
approximate maximally entangled state is equivalent to generating an approximate
identity quantum channel, such that the dimension of the ideal target identity channel
is equal to the Schmidt rank of the target maximally entangled state. To make
this statement more quantitative, suppose that 𝜔 𝐴′ 𝐵′ is an approximate maximally
entangled state; i.e., suppose that it is 𝜀-close in normalized trace distance to a
maximally entangled state Φ 𝐴′ 𝐵′ of Schmidt rank 𝑑:
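\[
\frac{1}{2}\left\| \omega_{A'B'} - \Phi_{A'B'} \right\|_1 \leq \varepsilon. \tag{19.1.1}
\]
Let $\mathcal{T}_{AA'B'\to B}$ denote the one-way LOCC channel corresponding to quantum
teleportation (as discussed around (5.1.24)). Then by applying (5.1.25), it follows
that teleportation over the ideal resource state $\Phi_{A'B'}$ realizes an identity channel
$\mathrm{id}_{A\to B}$ of dimension $d$:
\[
\mathcal{T}_{AA'B'\to B}((\cdot)\otimes\Phi_{A'B'}) = \mathrm{id}_{A\to B}(\cdot). \tag{19.1.2}
\]
Let $\mathcal{T}^{\omega}_{A\to B}$ denote the channel realized by teleportation over the non-ideal state $\omega_{A'B'}$:
\[
\mathcal{T}^{\omega}_{A\to B}(\cdot) := \mathcal{T}_{AA'B'\to B}((\cdot)\otimes\omega_{A'B'}). \tag{19.1.3}
\]
We would then like to determine the deviation of the ideal channel from $\mathcal{T}^{\omega}_{A\to B}$, and
to do so, we can employ the normalized diamond distance. Then consider that,
from the data-processing inequality for trace distance,
\begin{align}
\left\| \mathcal{T}^{\omega}_{A\to B} - \mathrm{id}_{A\to B} \right\|_\diamond &= \sup_{\psi_{RA}} \left\| \mathcal{T}^{\omega}_{A\to B}(\psi_{RA}) - \mathrm{id}_{A\to B}(\psi_{RA}) \right\|_1 \tag{19.1.4} \\
&= \sup_{\psi_{RA}} \left\| \mathcal{T}_{AA'B'\to B}(\psi_{RA}\otimes\omega_{A'B'}) - \mathcal{T}_{AA'B'\to B}(\psi_{RA}\otimes\Phi_{A'B'}) \right\|_1 \tag{19.1.5} \\
&\leq \sup_{\psi_{RA}} \left\| \psi_{RA}\otimes\omega_{A'B'} - \psi_{RA}\otimes\Phi_{A'B'} \right\|_1 \tag{19.1.6} \\
&= \left\| \omega_{A'B'} - \Phi_{A'B'} \right\|_1 \leq 2\varepsilon, \tag{19.1.7}
\end{align}
so that we arrive at the desired statement mentioned above:
\[
\frac{1}{2}\left\| \omega_{A'B'} - \Phi_{A'B'} \right\|_1 \leq \varepsilon \quad\Rightarrow\quad \frac{1}{2}\left\| \mathcal{T}^{\omega}_{A\to B} - \mathrm{id}_{A\to B} \right\|_\diamond \leq \varepsilon. \tag{19.1.8}
\]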
Thus, for the above reason, we focus exclusively on LOCC-assisted protocols
whose aim is to generate an approximate maximally entangled state. In what follows,
all bipartite cuts for separable states or LOCC channels should be understood as
being between Alice’s and Bob’s systems.
A protocol for LOCC-assisted quantum communication is depicted in Figure [REF],
and it is defined by the following elements:
\[
\left( \rho^{(1)}_{A'_1A_1B'_1},\ \left\{ \mathcal{L}^{(i)}_{A'_{i-1}B_{i-1}B'_{i-1}\to A'_iA_iB'_i} \right\}_{i=2}^{n},\ \mathcal{L}^{(n+1)}_{A'_nB_nB'_n\to M_AM_B} \right), \tag{19.1.9}
\]
where $\rho^{(1)}_{A'_1A_1B'_1}$ is a separable state, $\mathcal{L}^{(i)}_{A'_{i-1}B_{i-1}B'_{i-1}\to A'_iA_iB'_i}$ is an LOCC channel for
$i \in \{2, \ldots, n\}$, and $\mathcal{L}^{(n+1)}_{A'_nB_nB'_n\to M_AM_B}$ is a final LOCC channel that generates the
approximate maximally entangled state in systems $M_A$ and $M_B$. Let $\mathcal{C}$ denote
all of these elements, which together constitute the LOCC-assisted quantum
communication code. All systems with primed labels should be understood as
local quantum memory or scratch registers that Alice or Bob can employ in this
information-processing task. They are also assumed to be finite-dimensional, but
could be arbitrarily large. The unprimed systems are the ones that are either input
to or output from the quantum communication channel $\mathcal{N}_{A\to B}$.
The LOCC-assisted quantum communication protocol begins with Alice and
Bob performing an LOCC channel $\mathcal{L}^{(1)}_{\emptyset\to A'_1A_1B'_1}$, which leads to the separable state
$\rho^{(1)}_{A'_1A_1B'_1}$ mentioned above, where $A'_1$ and $B'_1$ are systems that are finite-dimensional
but arbitrarily large. The system $A_1$ is such that it can be fed into the first channel
use. Alice sends system $A_1$ through the first channel use, leading to a state
\[
\omega^{(1)}_{A'_1B_1B'_1} := \mathcal{N}_{A_1\to B_1}(\rho^{(1)}_{A'_1A_1B'_1}). \tag{19.1.10}
\]
Alice and Bob then perform the LOCC channel $\mathcal{L}^{(2)}_{A'_1B_1B'_1\to A'_2A_2B'_2}$, which leads to
the state
\[
\rho^{(2)}_{A'_2A_2B'_2} := \mathcal{L}^{(2)}_{A'_1B_1B'_1\to A'_2A_2B'_2}(\omega^{(1)}_{A'_1B_1B'_1}). \tag{19.1.11}
\]
Alice sends system $A_2$ through the second channel use $\mathcal{N}_{A_2\to B_2}$, leading to the state
\[
\omega^{(2)}_{A'_2B_2B'_2} := \mathcal{N}_{A_2\to B_2}(\rho^{(2)}_{A'_2A_2B'_2}). \tag{19.1.12}
\]
This process iterates: the protocol uses the channel $n$ times. In general, we have the
following states for all $i \in \{2, \ldots, n\}$:
\begin{align}
\rho^{(i)}_{A'_iA_iB'_i} &:= \mathcal{L}^{(i)}_{A'_{i-1}B_{i-1}B'_{i-1}\to A'_iA_iB'_i}(\omega^{(i-1)}_{A'_{i-1}B_{i-1}B'_{i-1}}), \tag{19.1.13} \\
\omega^{(i)}_{A'_iB_iB'_i} &:= \mathcal{N}_{A_i\to B_i}(\rho^{(i)}_{A'_iA_iB'_i}), \tag{19.1.14}
\end{align}
where $\mathcal{L}^{(i)}_{A'_{i-1}B_{i-1}B'_{i-1}\to A'_iA_iB'_i}$ is an LOCC channel. The final step of the protocol
consists of an LOCC channel $\mathcal{L}^{(n+1)}_{A'_nB_nB'_n\to M_AM_B}$, which generates the systems $M_A$
and $M_B$ for Alice and Bob, respectively. The protocol’s final state is as follows:
\[
\omega_{M_AM_B} := \mathcal{L}^{(n+1)}_{A'_nB_nB'_n\to M_AM_B}(\omega^{(n)}_{A'_nB_nB'_n}). \tag{19.1.15}
\]

The goal of the protocol is for the final state 𝜔 𝑀 𝐴 𝑀𝐵 to be close to a maximally
entangled state, and we define the quantum error probability of the code as follows:

𝑞 err (C) B 1 − 𝐹 (𝜔 𝑀 𝐴 𝑀𝐵 , Φ 𝑀 𝐴 𝑀𝐵 ) (19.1.16)

= 1 − ⟨Φ| 𝑀 𝐴 𝑀𝐵 𝜔 𝑀 𝐴 𝑀𝐵 |Φ⟩ 𝑀 𝐴 𝑀𝐵 , (19.1.17)

where $F$ denotes the quantum fidelity (Definition 6.5) and the maximally entangled
state $\Phi_{M_AM_B} = |\Phi\rangle\!\langle\Phi|_{M_AM_B}$ is defined from
\[
|\Phi\rangle_{M_AM_B} := \frac{1}{\sqrt{M}}\sum_{m=1}^{M} |m\rangle_{M_A}\otimes|m\rangle_{M_B}, \tag{19.1.18}
\]

such that it has Schmidt rank 𝑀. Intuitively, the quantum error probability 𝑞 err (C)
is equal to the probability that one obtains the outcome “not maximally entangled
state Φ 𝑀 𝐴 𝑀𝐵 ” when performing the test or measurement {Φ 𝑀 𝐴 𝑀𝐵 , 𝐼 𝑀 𝐴 𝑀𝐵 −Φ 𝑀 𝐴 𝑀𝐵 }
on the final state 𝜔 𝑀 𝐴 𝑀𝐵 of the protocol.
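
As a concrete illustration of (19.1.16)–(19.1.17), the following sketch evaluates $1 - \langle\Phi|\omega|\Phi\rangle$ for a simple test state. It assumes, purely for illustration, that the final state is a mixture $(1-q)\Phi + q\,I/M^2$ of the target state with the maximally mixed state; this family and all function names are hypothetical and are not part of the protocol description above. Matrices are real and stored as nested lists, and the basis is indexed from 0 rather than from 1.

```python
import math


def maximally_entangled_ket(M):
    """Amplitudes of |Phi> = (1/sqrt(M)) sum_m |m>|m> in the product basis,
    with the pair (a, b) stored at index a * M + b."""
    ket = [0.0] * (M * M)
    for m in range(M):
        ket[m * M + m] = 1.0 / math.sqrt(M)
    return ket


def quantum_error_probability(omega, M):
    """q_err = 1 - <Phi| omega |Phi> for a real matrix omega (list of rows)."""
    phi = maximally_entangled_ket(M)
    dim = M * M
    overlap = sum(phi[r] * omega[r][c] * phi[c]
                  for r in range(dim) for c in range(dim))
    return 1.0 - overlap


def noisy_target_state(M, q):
    """Illustrative test state (1 - q) * Phi + q * I / M^2 (an assumption)."""
    phi = maximally_entangled_ket(M)
    dim = M * M
    return [[(1.0 - q) * phi[r] * phi[c] + (q / dim if r == c else 0.0)
             for c in range(dim)]
            for r in range(dim)]


M, q = 2, 0.2
omega = noisy_target_state(M, q)
print(quantum_error_probability(omega, M))
```

For this family the overlap is $(1-q) + q/M^2$, so the error probability equals $q(1 - 1/M^2)$, and a protocol producing such a state would be an $(n, M, \varepsilon)$ protocol for any $\varepsilon \geq q(1 - 1/M^2)$.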


Definition 19.1 (n, M, ε) LOCC-Assisted Quantum Communication Protocol
Let $\mathcal{C} := \left( \rho^{(1)}_{A'_1A_1B'_1},\ \{ \mathcal{L}^{(i)}_{A'_{i-1}B_{i-1}B'_{i-1}\to A'_iA_iB'_i} \}_{i=2}^{n},\ \mathcal{L}^{(n+1)}_{A'_nB_nB'_n\to M_AM_B} \right)$ be the elements
of an $n$-round LOCC-assisted quantum communication protocol over the
channel $\mathcal{N}_{A\to B}$. The protocol is called an $(n, M, \varepsilon)$ protocol, with $\varepsilon \in [0,1]$, if
$q_{\mathrm{err}}(\mathcal{C}) \leq \varepsilon$.

19.1.1 Lower Bound on the Number of Transmitted Qubits

[IN PROGRESS]
one-shot lower bound in terms of coherent information of a state. This achieves
coherent information of a channel, as well as reverse coherent information of a
channel.

19.1.2 Amortized Entanglement as a General Upper Bound for LOCC-Assisted Quantum Communication Protocols

It is an interesting question to determine whether the inequality opposite to that in

Lemma 10.4 holds, and as the following proposition demonstrates, this question is
intimately related to finding useful upper bounds on the rate at which maximal entan-
glement can be distilled by employing an LOCC-assisted quantum communication
protocol. The following proposition represents our fundamental starting point when
analyzing limitations on LOCC-assisted quantum communication protocols.

Proposition 19.2
Let N 𝐴→𝐵 be a quantum channel, let 𝜀 ∈ [0, 1], and let 𝐸 be an entanglement
measure that is equal to zero for all separable states. For an (𝑛, 𝑀, 𝜀) LOCC-
assisted quantum communication protocol with final state 𝜔 𝑀 𝐴 𝑀𝐵 , the following
bound holds
𝐸 (𝑀 𝐴 ; 𝑀𝐵 )𝜔 ≤ 𝑛 · 𝐸 A (N). (19.1.19)

Proof: Consider an LOCC-assisted quantum communication protocol as presented
in Section 19.1. From the monotonicity of the entanglement measure $E$ with
respect to LOCC channels, we find that

\begin{align}
E(M_A; M_B)_{\omega}
&\leq E(A'_n; B_n B'_n)_{\omega^{(n)}} \tag{19.1.20} \\
&= E(A'_n; B_n B'_n)_{\omega^{(n)}} - E(A'_1 A_1; B'_1)_{\rho^{(1)}} \tag{19.1.21} \\
&= E(A'_n; B_n B'_n)_{\omega^{(n)}} + \left[ \sum_{i=2}^{n} E(A'_i A_i; B'_i)_{\rho^{(i)}} - E(A'_i A_i; B'_i)_{\rho^{(i)}} \right] \notag \\
&\qquad - E(A'_1 A_1; B'_1)_{\rho^{(1)}} \tag{19.1.22} \\
&\leq \sum_{i=1}^{n} \left[ E(A'_i; B_i B'_i)_{\omega^{(i)}} - E(A'_i A_i; B'_i)_{\rho^{(i)}} \right] \tag{19.1.23} \\
&\leq n \cdot E^{\mathcal{A}}(\mathcal{N}). \tag{19.1.24}
\end{align}

The first equality follows because the state $\rho^{(1)}_{A'_1 A_1 B'_1}$ is a separable state, and by
assumption, the entanglement measure $E$ vanishes for all such states. The second
equality follows by adding and subtracting equal terms. The second inequality
follows because $E(A'_i A_i; B'_i)_{\rho^{(i)}} \leq E(A'_{i-1}; B_{i-1} B'_{i-1})_{\omega^{(i-1)}}$ for all $i \in \{2, \ldots, n\}$,
due to monotonicity of the entanglement measure $E$ with respect to LOCC channels.
The final inequality follows from the definition of amortized entanglement and
the fact that the states $\omega^{(i)}_{A'_i B_i B'_i}$ and $\rho^{(i)}_{A'_i A_i B'_i}$ are particular states to consider in its
optimization. ■
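
The regrouping between (19.1.22) and (19.1.23) can be written out as follows. This is a sketch under an assumption: we take the amortized entanglement from Chapter 10 to have the form restated in the first line below (the restatement is ours, not a quotation of that chapter). We also use the shorthands $E(\omega^{(i)}) \equiv E(A'_i; B_i B'_i)_{\omega^{(i)}}$ and $E(\rho^{(i)}) \equiv E(A'_i A_i; B'_i)_{\rho^{(i)}}$.

```latex
% Assumed form of the amortized entanglement of a channel (restated from Chapter 10):
E^{\mathcal{A}}(\mathcal{N}) \coloneqq \sup_{\rho_{A'AB'}}
  \left[ E(A'; BB')_{\omega} - E(A'A; B')_{\rho} \right],
\qquad \omega_{A'BB'} \coloneqq \mathcal{N}_{A\to B}(\rho_{A'AB'}).

% Regrouping: apply E(\rho^{(i)}) \le E(\omega^{(i-1)}) to the positive terms only,
% then shift the summation index of the \omega terms.
\begin{align*}
E(\omega^{(n)}) + \sum_{i=2}^{n} \left[ E(\rho^{(i)}) - E(\rho^{(i)}) \right] - E(\rho^{(1)})
&\leq E(\omega^{(n)}) + \sum_{i=2}^{n} \left[ E(\omega^{(i-1)}) - E(\rho^{(i)}) \right] - E(\rho^{(1)}) \\
&= \sum_{i=1}^{n} \left[ E(\omega^{(i)}) - E(\rho^{(i)}) \right].
\end{align*}
```

Each summand on the last line is the entanglement difference across a single channel use, so under the assumed definition it is at most $E^{\mathcal{A}}(\mathcal{N})$, which gives the factor of $n$.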

The inequality in (19.1.19) states that the entanglement of the final output state
𝜔 𝑀 𝐴 𝑀𝐵 , as quantified by 𝐸, cannot exceed 𝑛 times the amortized entanglement of
the channel N 𝐴→𝐵 . Intuitively, the only resource allowed in the protocol, which
has the potential to generate entanglement, is the quantum communication channel
N 𝐴→𝐵 . All of the LOCC channels allowed for free have no ability to generate
entanglement on their own. Thus, the entanglement of the final state should not
exceed the largest possible amount of entanglement that could ever be generated
with 𝑛 calls to the channel, and this largest entanglement is exactly the amortized
entanglement of the channel.
Observe that the bound in Proposition 19.2 depends on the final state 𝜔 𝑀 𝐴 𝑀𝐵 ,
which in turn depends on the entire protocol. Thus, it is not a universal bound, that is,
a bound depending only on the parameters 𝑛, 𝑀, and 𝜀. Similar to the upper
bounds established in previous chapters, it is desirable to refine this bound such that
it depends only on 𝑛, 𝑀, and 𝜀, which are the parameters characterizing any generic
LOCC-assisted quantum communication protocol. In the forthcoming sections, we
consider particular entanglement measures, such as squashed entanglement and
Rains relative entropy, which allow us to relate the parameters 𝑀 and 𝜀 to the final
state 𝜔 𝑀 𝐴 𝑀𝐵 .
To end this section, we note here that the bound in Proposition 19.2 simplifies for
teleportation-simulable channels and for entanglement measures that are subadditive
with respect to states and equal to zero for all separable states. This conclusion is a
consequence of Propositions 10.6 and 19.2:

Corollary 19.3 Reduction by Teleportation

Let 𝐸 𝑆 be an entanglement measure that is subadditive with respect to states
and equal to zero for all separable states. Let N 𝐴→𝐵 be a channel that is
teleportation-simulable with associated resource state 𝜃 𝑅𝐵′ . Let 𝜀 ∈ [0, 1]. For
an (𝑛, 𝑀, 𝜀) LOCC-assisted quantum communication protocol with final state
𝜔 𝑀 𝐴 𝑀𝐵 , the following bound holds

\[ E_S(M_A; M_B)_{\omega} \leq n \cdot E_S(R; B')_{\theta}. \tag{19.1.25} \]

19.1.3 Squashed Entanglement Upper Bound on the Number of Transmitted Qubits

We now establish the squashed entanglement upper bound on the number of qubits
that a sender can transmit to a receiver by employing an LOCC-assisted quantum
communication protocol:

Theorem 19.4 𝒏-Shot Squashed Entanglement Upper Bound

Let N 𝐴→𝐵 be a quantum channel, and let 𝜀 ∈ [0, 1). For all (𝑛, 𝑀, 𝜀)
LOCC-assisted quantum communication protocols over the channel N 𝐴→𝐵 , the
following bound holds
\[ \log_2 M \leq \frac{1}{1 - \sqrt{\varepsilon}} \left[ n \cdot E_{\mathrm{sq}}(\mathcal{N}) + g_2(\sqrt{\varepsilon}) \right]. \tag{19.1.26} \]


Proof: Consider an arbitrary $(n, M, \varepsilon)$ LOCC-assisted quantum communication
protocol over the channel $\mathcal{N}_{A\to B}$, as defined in Section 19.1. The squashed
entanglement is an entanglement measure (monotone under LOCC as shown in
Theorem 9.33) and it is equal to zero for separable states (Proposition 9.4.5). Thus,
Proposition 19.2 applies, and we find that

\[ E_{\mathrm{sq}}(M_A; M_B)_{\omega} \leq n \cdot E^{\mathcal{A}}_{\mathrm{sq}}(\mathcal{N}) = n \cdot E_{\mathrm{sq}}(\mathcal{N}), \tag{19.1.27} \]

where the equality follows from Theorem 10.20. Applying Definition 19.1 leads to

\[ F(\Phi_{M_A M_B}, \omega_{M_A M_B}) \geq 1 - \varepsilon. \tag{19.1.28} \]

As a consequence of Proposition 9.38, we find that

\begin{align}
E_{\mathrm{sq}}(M_A; M_B)_{\omega}
&\geq E_{\mathrm{sq}}(M_A; M_B)_{\Phi} - \left( \sqrt{\varepsilon} \log_2 \min\{|M_A|, |M_B|\} + g_2(\sqrt{\varepsilon}) \right) \tag{19.1.29} \\
&= \log_2 M - \left( \sqrt{\varepsilon} \log_2 M + g_2(\sqrt{\varepsilon}) \right) \tag{19.1.30} \\
&= (1 - \sqrt{\varepsilon}) \log_2 M - g_2(\sqrt{\varepsilon}). \tag{19.1.31}
\end{align}

The first equality follows from Proposition 9.36. We can finally rearrange the
established inequality $n \cdot E_{\mathrm{sq}}(\mathcal{N}) \geq (1 - \sqrt{\varepsilon}) \log_2 M - g_2(\sqrt{\varepsilon})$ to be in the form
stated in the theorem. ■
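
To get a feel for the bound in (19.1.26), the following minimal sketch evaluates its right-hand side numerically. It rests on two assumptions that are not taken from the text above: that $g_2(\delta) = (\delta+1)\log_2(\delta+1) - \delta\log_2\delta$, and that a value of $E_{\mathrm{sq}}(\mathcal{N})$ is supplied by the caller (the number used in the demo is hypothetical). The function names are our own.

```python
import math


def g2(delta: float) -> float:
    """Assumed form of g_2: (d + 1) log2(d + 1) - d log2(d), with g2(0) = 0."""
    if delta < 0.0:
        raise ValueError("delta must be non-negative")
    if delta == 0.0:
        return 0.0
    return (delta + 1.0) * math.log2(delta + 1.0) - delta * math.log2(delta)


def squashed_entanglement_bound(n: int, e_sq: float, eps: float) -> float:
    """Right-hand side of (19.1.26): upper bound on log2(M)."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    root = math.sqrt(eps)
    return (n * e_sq + g2(root)) / (1.0 - root)


if __name__ == "__main__":
    e_sq = 1.0  # hypothetical squashed entanglement of the channel, in ebits
    for eps in (0.0, 0.01, 0.25):
        print(eps, squashed_entanglement_bound(10, e_sq, eps))
```

The prefactor $1/(1-\sqrt{\varepsilon})$ multiplies the term $n \cdot E_{\mathrm{sq}}(\mathcal{N})$, so the bound on the rate loosens as $\varepsilon$ grows, no matter how large $n$ is. This is the signature of a weak-converse bound.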

19.2 𝒏-Shot PPT-Assisted Quantum Communication Protocol
Recalling the completely PPT-preserving channels of Definition 4.27 as forming
a superset of LOCC channels, we can also consider quantum communication
protocols assisted by completely PPT-preserving channels (abbreviated as C-PPT-P
channels). Such a PPT-assisted protocol has exactly the same form as given in
Section 19.1, but it is instead defined by the following elements:

\[
\left( \rho^{(1)}_{A'_1 A_1 B'_1},\ \{\mathcal{P}^{(i)}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i B'_i}\}_{i=2}^{n},\ \mathcal{P}^{(n+1)}_{A'_n B_n B'_n \to M_A M_B} \right), \tag{19.2.1}
\]

where $\rho^{(1)}_{A'_1 A_1 B'_1}$ is a PPT state, $\mathcal{P}^{(i)}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i B'_i}$ is a C-PPT-P channel for
$i \in \{2, \ldots, n\}$, and $\mathcal{P}^{(n+1)}_{A'_n B_n B'_n \to M_A M_B}$ is a final C-PPT-P channel that generates an
approximate maximally entangled state in systems $M_A$ and $M_B$. Denoting the final
state of the protocol again by $\omega_{M_A M_B}$, the criterion for such a protocol is again
given by (19.1.16). We then arrive at the following definition:

Definition 19.5 (𝒏, 𝑴, 𝜺) PPT-Assisted Quantum Communication Protocol

Let $\mathcal{C} \coloneqq \left( \rho^{(1)}_{A'_1 A_1 B'_1},\ \{\mathcal{P}^{(i)}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i B'_i}\}_{i=2}^{n},\ \mathcal{P}^{(n+1)}_{A'_n B_n B'_n \to M_A M_B} \right)$ be the elements
of an $n$-round PPT-assisted quantum communication protocol over the channel
$\mathcal{N}_{A\to B}$. The protocol is called an $(n, M, \varepsilon)$ protocol, with $\varepsilon \in [0, 1]$, if
$q_{\mathrm{err}}(\mathcal{C}) \leq \varepsilon$.

Since every LOCC channel is a C-PPT-P channel, we can make the following
observation immediately:

Remark: Every (𝑛, 𝑀, 𝜀) LOCC-assisted quantum communication protocol is also an (𝑛, 𝑀, 𝜀)
PPT-assisted quantum communication protocol.

As a consequence of the above observation, any converse bound or limitation
that we establish for an arbitrary (𝑛, 𝑀, 𝜀) PPT-assisted protocol is also a converse
bound for an (𝑛, 𝑀, 𝜀) LOCC-assisted protocol.
By the exact same reasoning as in the proof of Proposition 19.2, but replacing
LOCC channels with C-PPT-P channels, we arrive at the following proposition:

Proposition 19.6
Let N 𝐴→𝐵 be a quantum channel, let 𝜀 ∈ [0, 1], and let 𝐸 be an entanglement
measure that is monotone under completely PPT-preserving channels and
is equal to zero for all PPT states. For an (𝑛, 𝑀, 𝜀) PPT-assisted quantum
communication protocol with final state 𝜔 𝑀 𝐴 𝑀𝐵 , the following bound holds

\[ E(M_A; M_B)_{\omega} \leq n \cdot E^{\mathcal{A}}(\mathcal{N}). \tag{19.2.2} \]

Recalling Definition 4.32, a channel N 𝐴→𝐵 with input system 𝐴 and output
system 𝐵 is defined to be PPT-simulable with associated resource state 𝜔 𝑅𝐵′ if the
following equality holds for all input states 𝜌 𝐴 :
\[ \mathcal{N}_{A\to B}(\rho_A) = \mathcal{P}_{ARB'\to B}(\rho_A \otimes \omega_{RB'}), \tag{19.2.3} \]
where $\mathcal{P}_{ARB'\to B}$ is a completely PPT-preserving channel between the sender, who
has systems $A$ and $R$, and the receiver, who has system $B'$.
By the same reasoning used to arrive at Corollary 19.3, but replacing LOCC
channels with completely PPT-preserving ones, we find the following:

Corollary 19.7
Let 𝐸 𝑆 denote an entanglement measure that is monotone non-increasing with
respect to completely PPT-preserving channels, subadditive with respect to
states, and equal to zero for all PPT states. Let N 𝐴→𝐵 be a channel that
is PPT-simulable with associated resource state 𝜃 𝑅𝐵′ . Let 𝜀 ∈ [0, 1]. For
an (𝑛, 𝑀, 𝜀) PPT-assisted quantum communication protocol with final state
𝜔 𝑀 𝐴 𝑀𝐵 , the following bound holds

\[ E_S(M_A; M_B)_{\omega} \leq n \cdot E_S(R; B')_{\theta}. \tag{19.2.4} \]

19.2.1 Rényi–Rains Information Upper Bounds on the Number of Transmitted Qubits

We now establish the max-Rains information upper bound on the number of qubits
that a sender can transmit to a receiver by employing a PPT-assisted quantum
communication protocol:

Theorem 19.8 𝒏-Shot Max-Rains Upper Bound

Let N 𝐴→𝐵 be a quantum channel, and let 𝜀 ∈ [0, 1). For all (𝑛, 𝑀, 𝜀)
PPT-assisted quantum communication protocols over the channel N 𝐴→𝐵 , the
following bound holds

\[ \log_2 M \leq n \cdot R_{\max}(\mathcal{N}) + \log_2\!\left( \frac{1}{1 - \varepsilon} \right). \tag{19.2.5} \]

Proof: Consider an arbitrary $(n, M, \varepsilon)$ PPT-assisted quantum communication
protocol over the channel $\mathcal{N}_{A\to B}$, as defined in Section 19.2. The max-Rains relative
entropy is an entanglement measure (monotone under completely PPT-preserving
channels as shown in Proposition 9.28), and it is equal to zero for PPT states. Thus,
Proposition 19.6 applies, and we find that

\[ R_{\max}(M_A; M_B)_{\omega} \leq n \cdot R^{\mathcal{A}}_{\max}(\mathcal{N}) = n \cdot R_{\max}(\mathcal{N}), \tag{19.2.6} \]

where the equality follows from Theorem 10.18. Applying Definition 19.5 leads to

\[ F(\Phi_{M_A M_B}, \omega_{M_A M_B}) \geq 1 - \varepsilon. \tag{19.2.7} \]

As a consequence of Propositions 13.6 and 7.71, we find that

\begin{align}
\log_2 M &\leq R^{\varepsilon}(M_A; M_B)_{\omega} \tag{19.2.8} \\
&\leq R_{\max}(M_A; M_B)_{\omega} + \log_2\!\left( \frac{1}{1 - \varepsilon} \right). \tag{19.2.9}
\end{align}

Combining (19.2.6) and (19.2.9), we conclude the proof. ■
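
As an illustration of how the bound in (19.2.5) behaves, the sketch below evaluates its right-hand side and the corresponding bound on the rate $\frac{1}{n}\log_2 M$. The value of $R_{\max}(\mathcal{N})$ is a hypothetical input chosen for the demo, and the function name is ours. In practice, the quantity would come from solving the associated semi-definite program, which is not attempted here.

```python
import math


def max_rains_upper_bound(n: int, r_max: float, eps: float) -> float:
    """Right-hand side of (19.2.5): n * R_max(N) + log2(1 / (1 - eps))."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    return n * r_max + math.log2(1.0 / (1.0 - eps))


if __name__ == "__main__":
    r_max = 0.5  # hypothetical max-Rains information of the channel
    for n in (1, 10, 100, 1000):
        bound = max_rains_upper_bound(n, r_max, eps=0.5)
        # The error-dependent term is additive, so its share of the rate is O(1/n).
        print(n, bound / n)
```

Because the $\varepsilon$-dependent term is additive and does not scale with $n$, the bound on the rate approaches $R_{\max}(\mathcal{N})$ for every fixed $\varepsilon \in [0, 1)$.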

At this point, it is worthwhile to compare the squashed-entanglement bound in
Theorem 19.4 with the max-Rains information bound in Theorem 19.8. First, both
bounds hold for all quantum channels, and so this is an advantage that they both
possess. The squashed entanglement and max-Rains information are rather different
quantities, and so the quantities on their own can vary based on the channel for
which they are evaluated. The squashed-entanglement bound in Theorem 19.4 is a
weak-converse bound, whereas the max-Rains information bound in Theorem 19.8
is a strong-converse bound. The max-Rains information bound has the advantage
that it is efficiently computable by a semi-definite program, whereas it is not
known how to compute the squashed entanglement. However, one can apply
Proposition 9.37 to see that the squashed entanglement bound gives a whole host of
upper bounds related to the choice of a squashing channel, and one can potentially
obtain tight bounds by making a clever choice of a squashing channel.
For channels that are PPT-simulable with associated resource states, as recalled
in (19.2.3), we obtain upper bounds that can be even stronger:

Theorem 19.9 𝒏-Shot Rényi–Rains Upper Bounds for PPT-Simulable Channels

Let $\mathcal{N}_{A\to B}$ be a quantum channel that is PPT-simulable with associated resource
state $\theta_{SB'}$, and let $\varepsilon \in [0, 1)$. For all $(n, M, \varepsilon)$ PPT-assisted quantum
communication protocols over the channel $\mathcal{N}_{A\to B}$, the following bounds hold for all
$\alpha > 1$:

\begin{align}
\log_2 M &\leq n \cdot \widetilde{R}_{\alpha}(S; B')_{\theta} + \frac{\alpha}{\alpha - 1} \log_2\!\left( \frac{1}{1 - \varepsilon} \right), \tag{19.2.10} \\
\log_2 M &\leq \frac{1}{1 - \varepsilon} \left[ n \cdot R(S; B')_{\theta} + h_2(\varepsilon) \right]. \tag{19.2.11}
\end{align}

Proof: Consider an arbitrary $(n, M, \varepsilon)$ PPT-assisted quantum communication
protocol over the channel $\mathcal{N}_{A\to B}$, as defined in Section 19.2. The Rényi–Rains
relative entropy and Rains relative entropy are monotone non-increasing under
completely PPT-preserving channels (Proposition 9.25), equal to zero for PPT states,
and subadditive with respect to states (Proposition 9.25). As such, Corollary 19.7
applies, and we find for $\alpha > 1$ that
\begin{align}
\widetilde{R}_{\alpha}(M_A; M_B)_{\omega} &\leq n \cdot \widetilde{R}_{\alpha}(S; B')_{\theta}, \tag{19.2.12} \\
R(M_A; M_B)_{\omega} &\leq n \cdot R(S; B')_{\theta}. \tag{19.2.13}
\end{align}
Applying Definition 19.5 leads to
\[ F(\Phi_{M_A M_B}, \omega_{M_A M_B}) \geq 1 - \varepsilon. \tag{19.2.14} \]
As a consequence of Proposition 13.6, we have that
\[ \log_2 M \leq R^{\varepsilon}(M_A; M_B)_{\omega}. \tag{19.2.15} \]
Applying Propositions 7.70 and 7.71, we find that
\begin{align}
\log_2 M &\leq \widetilde{R}_{\alpha}(M_A; M_B)_{\omega} + \frac{\alpha}{\alpha - 1} \log_2\!\left( \frac{1}{1 - \varepsilon} \right), \tag{19.2.16} \\
\log_2 M &\leq \frac{1}{1 - \varepsilon} \left[ R(M_A; M_B)_{\omega} + h_2(\varepsilon) \right]. \tag{19.2.17}
\end{align}
Putting together (19.2.12), (19.2.13), (19.2.16), and (19.2.17) concludes the
proof. ■
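
The two bounds of Theorem 19.9 trade off differently in $\alpha$ and $\varepsilon$, which a short numerical sketch makes visible. The resource-state quantities $\widetilde{R}_{\alpha}(S; B')_{\theta}$ and $R(S; B')_{\theta}$ are hypothetical inputs here, $h_2$ is taken to be the binary entropy, and the function names are ours.

```python
import math


def h2(p: float) -> float:
    """Binary entropy in bits, with the convention 0 * log2(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def renyi_rains_bound(n: int, r_alpha: float, alpha: float, eps: float) -> float:
    """Right-hand side of (19.2.10) for a given alpha > 1."""
    if alpha <= 1.0 or not 0.0 <= eps < 1.0:
        raise ValueError("need alpha > 1 and eps in [0, 1)")
    return n * r_alpha + alpha / (alpha - 1.0) * math.log2(1.0 / (1.0 - eps))


def rains_bound(n: int, r: float, eps: float) -> float:
    """Right-hand side of (19.2.11)."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    return (n * r + h2(eps)) / (1.0 - eps)


if __name__ == "__main__":
    # Hypothetical resource-state values, chosen only for illustration.
    print(renyi_rains_bound(n=4, r_alpha=1.0, alpha=2.0, eps=0.5))
    print(rains_bound(n=4, r=1.0, eps=0.5))
```

In (19.2.10) the penalty $\frac{\alpha}{\alpha-1}\log_2\frac{1}{1-\varepsilon}$ is additive but grows as $\alpha \to 1$, whereas in (19.2.11) the factor $\frac{1}{1-\varepsilon}$ multiplies the term that scales with $n$.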

19.3 LOCC- and PPT-Assisted Quantum Capacities of Quantum Channels
In this section, we analyze the asymptotic capacities, and as before, the upper
bounds for the asymptotic capacities are straightforward consequences of the
non-asymptotic bounds given in Sections 19.1.3 and 19.2.1. The definitions of
these capacities are similar to what we have given previously, and so we only state
them here briefly.

Definition 19.10 Achievable Rate for LOCC-Assisted Quantum Communication

Given a quantum channel $\mathcal{N}$, a rate $R \in \mathbb{R}^{+}$ is called an achievable rate
for LOCC-assisted quantum communication over $\mathcal{N}$ if for all $\varepsilon \in (0, 1]$, all
$\delta > 0$, and all sufficiently large $n$, there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ LOCC-assisted
quantum communication protocol.

Definition 19.11 LOCC-Assisted Quantum Capacity of a Quantum Channel

The LOCC-assisted quantum capacity of a quantum channel $\mathcal{N}$, denoted by
$Q^{\leftrightarrow}(\mathcal{N})$, is defined as the supremum of all achievable rates, i.e.,

\[ Q^{\leftrightarrow}(\mathcal{N}) \coloneqq \sup\{R : R \text{ is an achievable rate for } \mathcal{N}\}. \tag{19.3.1} \]

Definition 19.12 Weak Converse Rate for LOCC-Assisted Quantum Communication

Given a quantum channel $\mathcal{N}$, a rate $R \in \mathbb{R}^{+}$ is called a weak converse rate
for LOCC-assisted quantum communication over $\mathcal{N}$ if every $R' > R$ is not an
achievable rate for $\mathcal{N}$.

Definition 19.13 Strong Converse Rate for LOCC-Assisted Quantum Communication

Given a quantum channel $\mathcal{N}$, a rate $R \in \mathbb{R}^{+}$ is called a strong converse rate for
LOCC-assisted quantum communication over $\mathcal{N}$ if for all $\varepsilon \in [0, 1)$, all $\delta > 0$,
and all sufficiently large $n$, there does not exist an $(n, 2^{n(R+\delta)}, \varepsilon)$ LOCC-assisted
quantum communication protocol.

Definition 19.14 Strong Converse LOCC-Assisted Quantum Capacity of a Quantum Channel

The strong converse LOCC-assisted quantum capacity of a quantum channel $\mathcal{N}$,
denoted by $\widetilde{Q}^{\leftrightarrow}(\mathcal{N})$, is defined as the infimum of all strong converse rates, i.e.,

\[ \widetilde{Q}^{\leftrightarrow}(\mathcal{N}) \coloneqq \inf\{R : R \text{ is a strong converse rate for } \mathcal{N}\}. \tag{19.3.2} \]

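We have the exact same definitions for PPT-assisted quantum communication,
and we use the notation $Q^{\leftrightarrow}_{\mathrm{PPT}}$ to refer to the PPT-assisted quantum capacity and
$\widetilde{Q}^{\leftrightarrow}_{\mathrm{PPT}}$ for the strong converse PPT-assisted quantum capacity.

Recall that, by definition, the following bounds hold

\begin{align}
Q^{\leftrightarrow}(\mathcal{N}) &\leq \widetilde{Q}^{\leftrightarrow}(\mathcal{N}) \leq \widetilde{Q}^{\leftrightarrow}_{\mathrm{PPT}}(\mathcal{N}), \tag{19.3.3} \\
Q^{\leftrightarrow}(\mathcal{N}) &\leq Q^{\leftrightarrow}_{\mathrm{PPT}}(\mathcal{N}) \leq \widetilde{Q}^{\leftrightarrow}_{\mathrm{PPT}}(\mathcal{N}). \tag{19.3.4}
\end{align}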

As a direct consequence of the bound in Theorem 19.4 and methods similar to
those given in the proof of Theorem 11.23, we find the following:

Theorem 19.15 Squashed-Entanglement Weak-Converse Bound

The squashed entanglement of a channel N is a weak converse rate for LOCC-
assisted quantum communication:

𝑄 ↔ (N) ≤ 𝐸 sq (N). (19.3.5)
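
The step from the $n$-shot bound to this asymptotic statement can be sketched as follows. The derivation uses only the bound in (19.1.26) and Definition 19.10. The one assumption we add is that $g_2(\sqrt{\varepsilon})$ is finite for every fixed $\varepsilon \in (0, 1)$.

```latex
% Suppose R is an achievable rate. Then, for all \varepsilon \in (0,1), \delta > 0,
% and sufficiently large n, there is an (n, 2^{n(R-\delta)}, \varepsilon) protocol,
% so that (19.1.26) with \log_2 M = n(R - \delta) gives
\begin{align*}
n(R - \delta) &\leq \frac{1}{1 - \sqrt{\varepsilon}}
  \left[ n \cdot E_{\mathrm{sq}}(\mathcal{N}) + g_2(\sqrt{\varepsilon}) \right] \\
\Longrightarrow \quad
R - \delta &\leq \frac{1}{1 - \sqrt{\varepsilon}}
  \left[ E_{\mathrm{sq}}(\mathcal{N}) + \frac{g_2(\sqrt{\varepsilon})}{n} \right].
\end{align*}
% Taking n \to \infty removes the g_2 term; taking \varepsilon \to 0 and \delta \to 0
% afterwards removes the prefactor and the slack:
\[
R \leq \lim_{\varepsilon \to 0} \frac{1}{1 - \sqrt{\varepsilon}}\, E_{\mathrm{sq}}(\mathcal{N})
  = E_{\mathrm{sq}}(\mathcal{N}).
\]
```

The limit $\varepsilon \to 0$ is essential here: for a fixed $\varepsilon > 0$ the prefactor $1/(1-\sqrt{\varepsilon})$ remains strictly larger than one, which is why this argument yields only a weak-converse rate.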

As a direct consequence of the bound in Theorem 19.8 and methods similar to
those given in Section 11.2.3, we find the following:

Theorem 19.16 Max-Rains Strong-Converse Bound

The max-Rains information of a channel $\mathcal{N}$ is a strong converse rate for
PPT-assisted quantum communication:
\[ \widetilde{Q}^{\leftrightarrow}_{\mathrm{PPT}}(\mathcal{N}) \leq R_{\max}(\mathcal{N}). \tag{19.3.6} \]
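
To see why the bound in (19.2.5) gives a strong converse rather than only a weak converse, one can solve it for the error. The following sketch uses only (19.2.5) and the form of Definition 19.13 adapted to PPT-assisted protocols.

```latex
% Rearranging (19.2.5) for an arbitrary (n, M, \varepsilon) PPT-assisted protocol:
\[
\log_2\!\left( \frac{1}{1 - \varepsilon} \right) \geq \log_2 M - n \cdot R_{\max}(\mathcal{N})
\quad \Longrightarrow \quad
\varepsilon \geq 1 - 2^{-\left( \log_2 M - n \cdot R_{\max}(\mathcal{N}) \right)}.
\]
% For a protocol with \log_2 M = n (R_{\max}(\mathcal{N}) + \delta) and \delta > 0:
\[
\varepsilon \geq 1 - 2^{-n\delta} \xrightarrow[n \to \infty]{} 1.
\]
% Hence, for every fixed \varepsilon \in [0,1) and every \delta > 0, no
% (n, 2^{n(R_{\max}(\mathcal{N}) + \delta)}, \varepsilon) protocol exists once
% n > \frac{1}{\delta} \log_2\!\left( \frac{1}{1 - \varepsilon} \right).
```

In words, any attempt to communicate at a rate exceeding $R_{\max}(\mathcal{N})$ forces the error to approach one exponentially fast in the number of channel uses.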

As a direct consequence of the bound in Theorem 19.9 and methods similar to
those given in Section 11.2.3, we find the following:

Theorem 19.17 Rains Strong-Converse Bound for PPT-Simulable Channels

Let $\mathcal{N}$ be a quantum channel that is PPT-simulable with associated resource state
$\theta_{SB'}$. Then the Rains information of $\mathcal{N}$ is a strong converse rate for PPT-assisted
quantum communication:
\[ \widetilde{Q}^{\leftrightarrow}_{\mathrm{PPT}}(\mathcal{N}) \leq R(\mathcal{N}). \tag{19.3.7} \]

19.4 Examples
[IN PROGRESS]
erasure channel - get Rains information as a strong converse rate - will match
lower bound in terms of reverse coherent information (Hayashi called this pseudo-
coherent information)
covariant dephasing channels - get Rains information as a strong converse rate
and then coherent information matches this (will evaluate Rains information bound
in unassisted quantum capacity chapter)
depolarizing channel - evaluate Rains information
use squashed entanglement to give upper bound for amplitude damping channel

19.5 Bibliographic Notes

The concept of LOCC-assisted quantum communication over a quantum channel
was presented in (Bennett et al., 1996c, Section V). The same Section V of (Bennett
et al., 1996c) also showed how to use the notion of teleportation simulation of a
quantum channel and entanglement measures in order to bound the LOCC-assisted
quantum capacity from above by a resource state that can realize the channel
by teleportation simulation. Müller-Hermes (2012) presented a more detailed
analysis of this bounding technique. Other papers that make use of the teleportation-simulation
technique in this and other contexts include those by Horodecki et al.
(1999); Gottesman and Chuang (1999); Zhou et al. (2000); Bowen and Bose (2001);
Takeoka et al. (2002); Giedke and Ignacio Cirac (2002); Wolf et al. (2007); Niset
et al. (2009); Chiribella et al. (2009); Soeda et al. (2011); Leung and Matthews
(2015); Pirandola et al. (2017); Takeoka et al. (2016); Wilde et al. (2017); Takeoka
et al. (2017); Kaur and Wilde (2017).
A precise mathematical definition of an LOCC-assisted quantum communication
protocol conducted over a quantum channel was presented in (Müller-Hermes,
2012, Definition 12) and (Takeoka et al., 2014, Section IV).
That the entanglement of the final state of an 𝑛-round LOCC-assisted quantum
communication protocol is bounded from above by 𝑛 times the channel’s amortized
entanglement (Proposition 19.2) was anticipated by Bennett et al. (2003) and proven by
Kaur and Wilde (2017). Corollary 19.3 was anticipated by Bennett et al. (1996c)
and presented in more detail in (Müller-Hermes, 2012, Chapter 4), while the form
in which we have presented it here is closely related to the presentation by Kaur
and Wilde (2017).
The 𝑛-round PPT-assisted quantum communication protocols presented in
Section 19.2 were considered by Kaur and Wilde (2017), with PPT-assisted quantum
communication over a single or parallel use of a quantum channel considered by
Leung and Matthews (2015); Wang and Duan (2016b); Wang et al. (2019b). The
bound in Proposition 19.6 was established by Kaur and Wilde (2017).
Theorem 19.4 is due to Takeoka et al. (2014).
Wang and Duan (2016a) defined a semi-definite programming upper bound on
distillable entanglement of a bipartite state, and Wang and Duan (2016b) defined a
semi-definite programming upper bound on the quantum capacity of a quantum
channel. Wang et al. (2019b) observed that the quantity defined by Wang and
Duan (2016a) is equal to the max-Rains relative entropy, while also observing
that the quantity defined by Wang and Duan (2016b) is equal to the max-Rains
information of a quantum channel. Berta and Wilde (2018) established the max-
Rains information as an upper bound on the 𝑛-round non-asymptotic PPT-assisted
quantum capacity (Theorem 19.8). The upper bounds in Theorem 19.9 are due to
Kaur and Wilde (2017).

Chapter 20

Secret Key Agreement

This chapter continues with the theme of feedback-assisted communication. Here,
we consider secret-key-agreement protocols, where the goal is for the sender and
receiver of a quantum channel N 𝐴→𝐵 to establish a secret key by using the quantum
channel N 𝐴→𝐵 along with the free use of public classical communication. That
is, between every channel use in a secret-key-agreement protocol, the sender and
receiver are allowed to perform local operations and public classical communication
(LOPC). The notion of capacity developed in this chapter is known as secret-key-
agreement capacity.
The secret key distilled in such a secret-key-agreement protocol should be
protected from an eavesdropper. The model we assume here is that the eavesdropper
is quite powerful, having access to the full environment of every use of the quantum
channel N 𝐴→𝐵 , as well as a copy of all of the classical data exchanged by the sender
and receiver when they conduct a round of LOPC. To understand this model in
a physical context, suppose that the quantum channel connecting the sender and
receiver is a fiber-optic cable, which we can model as a bosonic loss channel, and
suppose that the sender employs quantum states of light to distill a secret key with
the receiver. Then, in this eavesdropper model, we are assuming that all of the light
that does not make it to the receiver is collected by the eavesdropper in a quantum
memory. In this way, the secret key distilled by such a protocol is guaranteed to be
secure against a quantum-enabled eavesdropper.
The practical motivation for secret-key-agreement protocols is related to the
motivation that we considered in the previous chapter on LOCC-assisted quantum
communication. Classical communication is cheap and plentiful these days, and
so from a resource-theoretic perspective, it seems sensible to allow it for free in a
theoretical model of communication. Once a secret key has been established, it
can be used in conjunction with the well-known one-time pad protocol as a scheme
for private communication of an arbitrary message that has the same size as the
key. Thus, as a consequence of the one-time pad protocol, it follows that secret key
agreement and private communication are equivalent information-processing tasks
when public classical communication is available for free. Furthermore, the model
of LOPC-assisted secret key agreement is essentially the same model considered in
quantum key distribution, which is one of the most famous applications in quantum
information science. One of the main goals of this chapter is to place bounds on
the rate at which secret key agreement is possible. Due to the strong connection
between the model of secret key agreement and quantum key distribution, these
bounds then place limitations on the rates at which it is possible to generate a secret
key in a quantum key distribution protocol.
The main method for placing limitations on the rates of secret-key-agreement
protocols is similar to the approach that we took in the last chapter. In fact, there
are many parallels. We again use the concept of amortization and entanglement
measures, such as squashed entanglement and relative entropy of entanglement
(and several variants of the latter).
However, the main difference between this chapter and the previous one is
that the communication model is different. As discussed above, a secret-key-
agreement protocol is a three-party protocol, consisting of the legitimate sender
and receiver, as well as the eavesdropper. Thus, a priori, it is not obvious
how to connect entanglement measures, which are used in two-party protocols,
to secret-key-agreement protocols. To overcome this problem, we exploit the
purification principle to establish a powerful equivalence between tripartite secret-
key-agreement protocols and bipartite private-state distillation protocols. After
doing so, we can apply entanglement measures to bound the rate at which it is
possible to distill bipartite private states in a bipartite private-state distillation
protocol, and then by appealing to the aforementioned equivalence, we can bound
the rate at which it is possible to distill secret key in a tripartite secret-key-agreement
protocol.
The main conclusion of this chapter is that entanglement measures such as
squashed entanglement and relative entropy of entanglement (and the latter’s
variations) are upper bounds on the secret-key-agreement capacity of quantum
channels. At the end of the chapter, we evaluate these bounds for various channels
of interest in order to determine the fundamental limitations on secret key agreement
for these channels.

20.1 𝒏-Shot Secret-Key-Agreement Protocol

We begin by discussing the most general form for a secret-key-agreement protocol
conducted over a quantum channel. The most important point to clarify before
starting is the communication model, in particular, to address the question of who
has access to what. First, we suppose that there is a quantum channel N 𝐴→𝐵
connecting the legitimate sender Alice to the legitimate receiver Bob. Alice has
exclusive access to the input system 𝐴, and Bob has exclusive access to the output
system 𝐵. As we know from Chapter 4, every quantum channel has an isometric
channel U 𝐴→𝐵𝐸 extending it, such that the original channel N 𝐴→𝐵 is recovered by
tracing over the purifying or environment system 𝐸:

\[ \mathcal{N}_{A\to B} = \mathrm{Tr}_E \circ\, \mathcal{U}_{A\to BE}. \tag{20.1.1} \]
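
The relation in (20.1.1) can be checked numerically for a concrete channel. The sketch below builds one isometric extension from a set of Kraus operators by attaching an orthonormal environment state to each of them, and then verifies that tracing over the environment recovers the channel. The amplitude damping channel, its parameter value, and the helper names are our choices for illustration. NumPy is assumed to be available.

```python
import numpy as np


def isometric_extension(kraus):
    """Return V = sum_k K_k (x) |k>_E, an isometry from A to B (x) E."""
    d_env = len(kraus)
    d_out, d_in = kraus[0].shape
    v = np.zeros((d_out * d_env, d_in), dtype=complex)
    for k, op in enumerate(kraus):
        e_k = np.zeros((d_env, 1))
        e_k[k, 0] = 1.0
        v += np.kron(op, e_k)  # output index ordering is (B, E)
    return v


def trace_out_env(rho_be, d_out, d_env):
    """Partial trace over the environment system E of an operator on B (x) E."""
    r = rho_be.reshape(d_out, d_env, d_out, d_env)
    return np.einsum("iaja->ij", r)


def amplitude_damping_kraus(gamma):
    """Kraus operators of the qubit amplitude damping channel."""
    k0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - gamma)]])
    k1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    return [k0, k1]


if __name__ == "__main__":
    kraus = amplitude_damping_kraus(0.3)
    v = isometric_extension(kraus)
    rho = np.array([[0.5, 0.5], [0.5, 0.5]])  # the state |+><+|
    via_kraus = sum(k @ rho @ k.conj().T for k in kraus)
    via_isometry = trace_out_env(v @ rho @ v.conj().T, 2, 2)
    print(np.allclose(v.conj().T @ v, np.eye(2)))
    print(np.allclose(via_kraus, via_isometry))
```

In the eavesdropper model of this chapter, the system traced out in `trace_out_env` is the one handed to the eavesdropper.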

Taking the same perspective as that in Chapter 16, with the idea that a powerful, fully
quantum eavesdropper could have access to every system to which the legitimate
parties do not have access, we suppose that the quantum eavesdropper has access
to the environment system 𝐸. Furthermore, in a secret-key-agreement protocol, the
legitimate parties are allowed to use a public, classical communication channel, in
addition to the quantum channel N 𝐴→𝐵 , in order to generate a secret key. Since
this channel is public, we suppose that the eavesdropper has access to all of the
classical data exchanged between the legitimate parties.
In more detail, an 𝑛-shot protocol for secret key agreement consists of 𝑛 calls
to the quantum channel N 𝐴→𝐵 , interleaved by LOPC channels. Since all of the
classical data exchanged between Alice and Bob is assumed to be public and
available to the eavesdropper as well, we call these channels “LOPC” channels,
which is an abbreviation of “local operations and public communication.” In fact,
a protocol for secret key agreement has essentially the same structure as a protocol
for LOCC-assisted quantum communication, as discussed in Section 19.1, with the
exception that the systems at the end should hold a secret key instead of a maximally
entangled state.
A protocol for secret key agreement is depicted in Figure [REF], and it consists
of the following elements:

\[
\left( \rho^{(1)}_{A'_1 A_1 B'_1 Y_1},\ \{\mathcal{L}^{(i)}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i B'_i Y_i}\}_{i=2}^{n},\ \mathcal{L}^{(n+1)}_{A'_n B_n B'_n \to K_A K_B Y_{n+1}} \right). \tag{20.1.2}
\]

All systems labeled by $A$ belong to Alice, those labeled by $B$ belong to Bob, and
those labeled by $Y$ are classical systems belonging to Eve, representing a copy
of the classical data exchanged by Alice and Bob in a round of LOPC. In the
above, $\rho^{(1)}_{A'_1 A_1 B'_1 Y_1}$ is a separable state, $\mathcal{L}^{(i)}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i B'_i Y_i}$ is an LOPC channel
for $i \in \{2, \ldots, n\}$, and $\mathcal{L}^{(n+1)}_{A'_n B_n B'_n \to K_A K_B Y_{n+1}}$ is a final LOPC channel that generates
the approximate secret key in systems $K_A$ and $K_B$. Let $\mathcal{C}$ denote all of these
elements, which together constitute the secret-key-agreement protocol. As with
LOCC-assisted quantum communication, all systems with primed labels should
be understood as local quantum memory or scratch registers that Alice and Bob
can employ in this information-processing task. We also assume that they are
finite-dimensional, yet arbitrarily large. The unprimed systems are the ones that are
either input to or output from the quantum communication channel $\mathcal{N}_{A\to B}$.
The secret-key-agreement protocol begins with Alice and Bob performing an
LOPC channel $\mathcal{L}^{(1)}_{\emptyset \to A'_1 A_1 B'_1 Y_1}$, which leads to the separable state $\rho^{(1)}_{A'_1 A_1 B'_1 Y_1}$ mentioned
above, where $A'_1$ and $B'_1$ are systems that are finite-dimensional yet arbitrarily large.
In particular, the state $\rho^{(1)}_{A'_1 A_1 B'_1 Y_1}$ has the following form:

\[
\rho^{(1)}_{A'_1 A_1 B'_1 Y_1} \coloneqq \sum_{y_1} p_{Y_1}(y_1)\, \tau^{y_1}_{A'_1 A_1} \otimes \zeta^{y_1}_{B'_1} \otimes |y_1\rangle\!\langle y_1|_{Y_1}, \tag{20.1.3}
\]

where $Y_1$ is a classical random variable corresponding to the message exchanged
between Alice and Bob, which is needed to establish this state. The classical system
$Y_1$ belongs to the eavesdropper. Also, $\{\tau^{y_1}_{A'_1 A_1}\}_{y_1}$ and $\{\zeta^{y_1}_{B'_1}\}_{y_1}$ are sets of quantum
states and $p_{Y_1}$ is a probability distribution. Note that the reduced state for Alice and
Bob is a generic separable state of the following form:

\[
\rho^{(1)}_{A'_1 A_1 B'_1} = \sum_{y_1} p_{Y_1}(y_1)\, \tau^{y_1}_{A'_1 A_1} \otimes \zeta^{y_1}_{B'_1}. \tag{20.1.4}
\]

The system $A_1$ of $\rho^{(1)}_{A'_1 A_1 B'_1 Y_1}$ is such that it can be fed into the first channel use. Alice
then sends system $A_1$ through the first channel use, leading to a state

\[
\omega^{(1)}_{A'_1 B_1 B'_1 E_1 Y_1} \coloneqq \mathcal{U}^{\mathcal{N}}_{A_1 \to B_1 E_1}(\rho^{(1)}_{A'_1 A_1 B'_1 Y_1}). \tag{20.1.5}
\]

Note that we write the channel use as the isometric channel $\mathcal{U}^{\mathcal{N}}_{A_1 \to B_1 E_1}$ that extends
$\mathcal{N}_{A_1 \to B_1}$, since we would like to incorporate the eavesdropper’s system $E_1$ explicitly
into the description of the protocol. Alice and Bob then perform the LOPC channel
$\mathcal{L}^{(2)}_{A'_1 B_1 B'_1 \to A'_2 A_2 B'_2 Y_2}$, which leads to the state

\[
\rho^{(2)}_{A'_2 A_2 B'_2 E_1 Y_1 Y_2} \coloneqq \mathcal{L}^{(2)}_{A'_1 B_1 B'_1 \to A'_2 A_2 B'_2 Y_2}(\omega^{(1)}_{A'_1 B_1 B'_1 E_1 Y_1}). \tag{20.1.6}
\]

The LOPC channel $\mathcal{L}^{(2)}_{A'_1 B_1 B'_1 \to A'_2 A_2 B'_2 Y_2}$ can be written as

\[
\mathcal{L}^{(2)}_{A'_1 B_1 B'_1 \to A'_2 A_2 B'_2 Y_2} \coloneqq \sum_{y_2} \mathcal{E}^{y_2}_{A'_1 \to A'_2 A_2} \otimes \mathcal{F}^{y_2}_{B_1 B'_1 \to B'_2} \otimes |y_2\rangle\!\langle y_2|_{Y_2}. \tag{20.1.7}
\]

In the above, $\{\mathcal{E}^{y_2}_{A'_1 \to A'_2 A_2}\}_{y_2}$ and $\{\mathcal{F}^{y_2}_{B_1 B'_1 \to B'_2}\}_{y_2}$ are sets of completely positive maps
such that the sum map $\sum_{y_2} \mathcal{E}^{y_2}_{A'_1 \to A'_2 A_2} \otimes \mathcal{F}^{y_2}_{B_1 B'_1 \to B'_2}$ is trace preserving. The classical
system $Y_2$ represents the eavesdropper’s copy of the classical data exchanged by
Alice and Bob in this round of LOPC. Note that the reduced channel acting on
Alice and Bob’s systems is as follows:

\[
\mathcal{L}^{(2)}_{A'_1 B_1 B'_1 \to A'_2 A_2 B'_2} = \sum_{y_2} \mathcal{E}^{y_2}_{A'_1 \to A'_2 A_2} \otimes \mathcal{F}^{y_2}_{B_1 B'_1 \to B'_2}. \tag{20.1.8}
\]

Alice sends system $A_2$ through the second channel use $\mathcal{U}^{\mathcal{N}}_{A_2 \to B_2 E_2}$, leading to the
state

\[
\omega^{(2)}_{A'_2 B_2 B'_2 E_1 E_2 Y_1 Y_2} \coloneqq \mathcal{U}^{\mathcal{N}}_{A_2 \to B_2 E_2}(\rho^{(2)}_{A'_2 A_2 B'_2 E_1 Y_1 Y_2}). \tag{20.1.9}
\]

This process iterates: the protocol uses the channel $n$ times. In general, we have the
following states for all $i \in \{2, \ldots, n\}$:

\begin{align}
\rho^{(i)}_{A'_i A_i B'_i E_1^{i-1} Y_1^{i}} &\coloneqq \mathcal{L}^{(i)}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i B'_i Y_i}(\omega^{(i-1)}_{A'_{i-1} B_{i-1} B'_{i-1} E_1^{i-1} Y_1^{i-1}}), \tag{20.1.10} \\
\omega^{(i)}_{A'_i B_i B'_i E_1^{i} Y_1^{i}} &\coloneqq \mathcal{U}^{\mathcal{N}}_{A_i \to B_i E_i}(\rho^{(i)}_{A'_i A_i B'_i E_1^{i-1} Y_1^{i}}), \tag{20.1.11}
\end{align}

where $\mathcal{L}^{(i)}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i B'_i Y_i}$ is an LOPC channel that can be written as

\[
\mathcal{L}^{(i)}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i B'_i Y_i} \coloneqq \sum_{y_i} \mathcal{E}^{y_i}_{A'_{i-1} \to A'_i A_i} \otimes \mathcal{F}^{y_i}_{B_{i-1} B'_{i-1} \to B'_i} \otimes |y_i\rangle\!\langle y_i|_{Y_i}. \tag{20.1.12}
\]

In the above, $\{\mathcal{E}^{y_i}_{A'_{i-1} \to A'_i A_i}\}_{y_i}$ and $\{\mathcal{F}^{y_i}_{B_{i-1} B'_{i-1} \to B'_i}\}_{y_i}$ are sets of completely positive
maps such that the sum map $\sum_{y_i} \mathcal{E}^{y_i}_{A'_{i-1} \to A'_i A_i} \otimes \mathcal{F}^{y_i}_{B_{i-1} B'_{i-1} \to B'_i}$ is trace preserving.
The classical system $Y_i$ represents the eavesdropper’s copy of the classical data
exchanged by Alice and Bob in this round of LOPC. Note that the reduced channel
acting on Alice and Bob’s systems is as follows:

\[
\mathcal{L}^{(i)}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i B'_i} = \sum_{y_i} \mathcal{E}^{y_i}_{A'_{i-1} \to A'_i A_i} \otimes \mathcal{F}^{y_i}_{B_{i-1} B'_{i-1} \to B'_i}. \tag{20.1.13}
\]

In (20.1.10)–(20.1.11), we have employed the following shorthands: $E_1^{i} \equiv E_1 \cdots E_i$
and $Y_1^{i} \equiv Y_1 \cdots Y_i$. The final step of the protocol consists of an LOPC channel
$\mathcal{L}^{(n+1)}_{A'_n B_n B'_n \to K_A K_B Y_{n+1}}$, which generates the key systems $K_A$ and $K_B$ for Alice and Bob,
respectively. The protocol’s final state is as follows:

\[
\omega_{K_A K_B E_1^{n} Y_1^{n+1}} \coloneqq \mathcal{L}^{(n+1)}_{A'_n B_n B'_n \to K_A K_B Y_{n+1}}(\omega^{(n)}_{A'_n B_n B'_n E_1^{n} Y_1^{n}}). \tag{20.1.14}
\]

The reduced final channel acting on Alice and Bob’s systems is as follows:

\[
\mathcal{L}^{(n+1)}_{A'_n B_n B'_n \to K_A K_B} = \sum_{y_{n+1}} \mathcal{E}^{y_{n+1}}_{A'_n \to K_A} \otimes \mathcal{F}^{y_{n+1}}_{B_n B'_n \to K_B}. \tag{20.1.16}
\]

The goal of the protocol is for the final state $\omega_{K_A K_B E_1^{n} Y_1^{n+1}}$ to be nearly
indistinguishable from a tripartite secret-key state, and we define the privacy error of the
protocol to be as follows:

\[
p_{\mathrm{err}}(\mathcal{C}) \coloneqq 1 - F(\omega_{K_A K_B E_1^{n} Y_1^{n+1}}, \Phi_{K_A K_B} \otimes \sigma_{E_1^{n} Y_1^{n+1}}), \tag{20.1.17}
\]

where $\sigma_{E_1^{n} Y_1^{n+1}}$ is some state of the eavesdropper’s systems, $F$ denotes the quantum
fidelity (Definition 6.5), and the maximally classically correlated state $\Phi_{K_A K_B}$ is
defined as

\[
\Phi_{K_A K_B} \coloneqq \frac{1}{K} \sum_{k=1}^{K} |k\rangle\!\langle k|_{K_A} \otimes |k\rangle\!\langle k|_{K_B}. \tag{20.1.18}
\]

Intuitively, the privacy error $p_{\mathrm{err}}(\mathcal{C})$ quantifies how distinguishable the final state
$\omega_{K_A K_B E_1^{n} Y_1^{n+1}}$ is from an ideal tripartite secret-key state $\Phi_{K_A K_B} \otimes \sigma_{E_1^{n} Y_1^{n+1}}$, in which
the key values in $K_A$ and $K_B$ are perfectly correlated and uniformly random, and the
key systems are in tensor product with the eavesdropper’s systems $E_1^{n} Y_1^{n+1}$. For an ideal tripartite
secret-key state, it is difficult for an eavesdropper to guess the value of the key by
observing the content of her systems $E_1^{n} Y_1^{n+1}$. In fact, the chance for an
eavesdropper to guess the key value of an ideal secret-key state is equal to $1/K$,
which is no better than random guessing.
Due to the isometric invariance of the fidelity and the fact that all isometric
extensions of a channel are related by an isometry acting on the environment system,
the privacy error in (20.1.17) is invariant under any choice of an isometric channel
$\mathcal{U}^{\mathcal{N}}_{A\to BE}$ that extends the original channel $\mathcal{N}_{A\to B}$. Thus, the relevant performance
parameters for a secret-key-agreement protocol do not change with the particular
isometric extension chosen. This is to be expected since the actual information that
the eavesdropper gains in the protocol does not depend on the particular isometric
extension chosen.

Definition 20.1 (𝒏, 𝑲, 𝜺) Secret-Key-Agreement Protocol

Let $(\rho^{(1)}_{A'_1 A_1 B'_1 Y_1}, \{\mathcal{L}^{(i)}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i B'_i Y_i}\}_{i=2}^{n}, \mathcal{L}^{(n+1)}_{A'_n B_n B'_n \to K_A K_B Y_{n+1}})$ be the elements
of an $n$-round LOPC-assisted secret-key-agreement protocol over the channel
$\mathcal{N}_{A \to B}$. The protocol is called an $(n, K, \varepsilon)$ protocol, with $\varepsilon \in [0, 1]$, if the
privacy error $p_{\text{err}}(\mathcal{C}) \leq \varepsilon$.

20.1.1 Equivalence between Secret Key Agreement and LOPC-Assisted Private Communication

The goal of a secret-key-agreement protocol is for Alice and Bob to establish an
approximation of an ideal secret key, the latter being uniformly distributed, perfectly
correlated, and independent of Eve’s quantum systems. What is the use of this
secret key? As it turns out, it can be used for private communication by means
of the one-time pad protocol. This in turn means that secret key agreement and
private classical communication are equivalent information processing tasks when
public classical communication is available for free, and the goal of this section is
to clarify this point.
An LOPC-assisted private communication protocol uses a quantum channel $n$
times along with public classical communication to transmit an arbitrary message
of size $K$ privately from Alice to Bob in such a way that the fidelity of the actual
state at the end of the protocol and the ideal state is no smaller than $1 - \varepsilon$. In
more detail, let $\Phi^p_{M_A M_B}$ denote the following state in which there is an arbitrary
distribution $p$ over the message:

$$\Phi^p_{M_A M_B} := \sum_{m=1}^{K} p(m) |m\rangle\!\langle m|_{M_A} \otimes |m\rangle\!\langle m|_{M_B}. \tag{20.1.19}$$
Let $\omega^p_{M_A M_B E_1^n Y_1^{n+1}}$ denote the final state of the protocol, which is defined in the
same way as (20.1.14), with the exception that the message distribution $p$ is no
longer uniform. Then an $(n, K, \varepsilon)$ LOPC-assisted private communication protocol
is defined similarly to an $(n, K, \varepsilon)$ secret-key-agreement protocol as given above,
except that the following inequality holds:

$$\max_{p : \mathcal{M} \to [0,1]} 1 - F(\omega^p_{M_A M_B E_1^n Y_1^{n+1}}, \Phi^p_{M_A M_B} \otimes \sigma_{E_1^n Y_1^{n+1}}) \leq \varepsilon, \tag{20.1.20}$$

where the maximization is over all message distributions $p$ and $\sigma_{E_1^n Y_1^{n+1}}$ is some fixed
state of the eavesdropper's systems that is independent of the message transmitted.
By the use of the one-time pad protocol, it follows that an (𝑛, 𝐾, 𝜀) secret-key-
agreement protocol leads to an (𝑛, 𝐾, 𝜀) LOPC-assisted private communication
protocol. To see how the one-time pad protocol works in conjunction with secret
key agreement, suppose that Alice and Bob have completed an $(n, K, \varepsilon)$
secret-key-agreement protocol as described in the previous section, with the final state
$\omega_{K_A K_B E_1^n Y_1^{n+1}}$ satisfying $p_{\text{err}}(\mathcal{C}) \leq \varepsilon$. Alice then brings in her local message registers
$M_A$ and $M_{A'}$, so that the overall quantum state is

$$\Phi^p_{M_A M_{A'}} \otimes \omega_{K_A K_B E_1^n Y_1^{n+1}}. \tag{20.1.21}$$

The one-time pad protocol is an LOPC protocol in which Alice then performs the
following classical computation, represented as a quantum channel, on her message
register $M_{A'}$ and her key register $K_A$:

$$\sum_{k,m} |m \oplus k\rangle_{C_A} \langle m|_{M_{A'}} \langle k|_{K_A} (\cdot) |m\rangle_{M_{A'}} |k\rangle_{K_A} \langle m \oplus k|_{C_A}, \tag{20.1.22}$$

where the addition ⊕ is modulo 𝐾. She then transmits the classical register 𝐶 𝐴 over
a public classical channel to Bob. Eve can make a copy 𝐶 𝐴′ of this classical register
containing the value 𝑚 ⊕ 𝑘, but since Bob’s key register 𝐾 𝐵 is not available to her,
the register $C_{A'}$ is nearly independent of Alice's message register $M_A$ (depending on
how small $\varepsilon$ is). Bob then performs the following classical computation, represented
as a quantum channel, on his received register $C_A$ and his key register $K_B$:

$$\sum_{c,k} |c \ominus k\rangle_{M_B} \langle c|_{C_A} \langle k|_{K_B} (\cdot) |c\rangle_{C_A} |k\rangle_{K_B} \langle c \ominus k|_{M_B}, \tag{20.1.23}$$

where the subtraction $\ominus$ is modulo $K$. Let $\omega^p_{M_A M_B E_1^n Y_1^{n+1} C_{A'}}$ denote the final state of
the protocol. By applying the data-processing inequality to (20.1.20), as well as
the fact mentioned above that $C_{A'}$ is independent of $M_A$ and $M_B$ in the ideal case,
the following inequality holds:

$$\max_{p : \mathcal{M} \to [0,1]} 1 - F(\omega^p_{M_A M_B E_1^n Y_1^{n+1} C_{A'}}, \Phi^p_{M_A M_B} \otimes \sigma_{E_1^n Y_1^{n+1} C_{A'}}) \leq \varepsilon, \tag{20.1.24}$$

where $\sigma_{E_1^n Y_1^{n+1} C_{A'}}$ is a fixed state of the eavesdropper's systems. Thus, an $(n, K, \varepsilon)$
secret-key-agreement protocol leads to an $(n, K, \varepsilon)$ LOPC-assisted private
communication protocol, as claimed.
The other implication is trivial: while employing an (𝑛, 𝐾, 𝜀) LOPC-assisted
private communication protocol, Alice can choose the distribution of the message to
be uniform, and then an (𝑛, 𝐾, 𝜀) LOPC-assisted private communication protocol
leads to an (𝑛, 𝐾, 𝜀) secret-key-agreement protocol. Thus, it follows that LOPC-
assisted private communication and secret key agreement are equivalent whenever
public classical communication is available for free.
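The classical computations in (20.1.22) and (20.1.23) can be sketched directly on integers. The following is a minimal illustration, assuming messages and keys are labeled $0, \ldots, K-1$ (the text sums from $1$ to $K$; the relabeling is only for convenience) and that the key is ideal, that is, uniform and shared perfectly. The function names are hypothetical.

```python
def otp_encrypt(m, k, K):
    """Alice's step in (20.1.22): compute c = m (+) k with addition modulo K."""
    return (m + k) % K

def otp_decrypt(c, k, K):
    """Bob's step in (20.1.23): compute m = c (-) k with subtraction modulo K."""
    return (c - k) % K

def ciphertext_distribution(m, K):
    """Distribution of the public register value c for a fixed message m when the
    key k is uniform on {0, ..., K-1}."""
    counts = [0] * K
    for k in range(K):
        counts[otp_encrypt(m, k, K)] += 1
    return [n / K for n in counts]
```

With an ideal key, `ciphertext_distribution(m, K)` is the same uniform distribution for every message `m`, which is the sense in which Eve's copy $C_{A'}$ is independent of the message register. With an $\varepsilon$-approximate key this independence holds only approximately, as quantified in (20.1.24).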

20.2 Equivalence between Secret Key Agreement and LOCC-Assisted Private-State Distillation
There is a deep and powerful equivalence between a secret-key-agreement protocol
as described above and a protocol that uses LOCC assistance to distill a bipartite
private state (recall Definition 15.4). This equivalence is helpful in the analysis
of secret-key-agreement protocols, in the sense that one can use tools from
entanglement theory in order to establish bounds on the rate at which secret key
agreement is possible.
The main idea behind this equivalence is to apply the purification principle to a
secret-key-agreement protocol and then examine the consequences. That is, we
can purify each step of the secret-key-agreement protocol discussed in the previous
section, and then we can examine various reduced states at each step. In what
follows, we detail such a purified protocol.

20.2.1 The Purified Protocol

To begin with, recall that the initial state $\rho^{(1)}_{A'_1 A_1 B'_1}$ of a secret-key-agreement protocol
is a separable state of the form in (20.1.3). The state $\rho^{(1)}_{A'_1 A_1 B'_1}$ can be purified as
follows:

$$|\rho^{(1)}\rangle_{A'_1 A_1 S_{A_1} B'_1 S_{B_1} Y_1} := \sum_{y_1} \sqrt{p_{Y_1}(y_1)}\, |\tau^{y_1}\rangle_{A'_1 A_1 S_{A_1}} \otimes |\zeta^{y_1}\rangle_{B'_1 S_{B_1}} \otimes |y_1\rangle_{Y_1}, \tag{20.2.1}$$

where the systems $S_{A_1}$ and $S_{B_1}$ are known as local “shield” systems. In principle,
the shield systems $S_{A_1}$ and $S_{B_1}$ could be held by Alice and Bob, respectively, and
the states $|\tau^{y_1}\rangle_{A'_1 A_1 S_{A_1}}$ and $|\zeta^{y_1}\rangle_{B'_1 S_{B_1}}$ purify $\tau^{y_1}_{A'_1 A_1}$ and $\zeta^{y_1}_{B'_1}$ in (20.1.3), respectively.
We assume without loss of generality that the shield systems contain a coherent
classical copy of the classical random variable 𝑌1 , such that tracing over systems
𝑆 𝐴1 and 𝑆 𝐵1 recovers the original state in (20.1.3). As before, Eve possesses system
𝑌1 , which contains a coherent classical copy of the classical data exchanged.
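Each LOPC channel $\mathcal{L}^{(i)}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i B'_i}$ for $i \in \{2, \ldots, n\}$ is of the form in
(20.1.12) and can be purified to an isometry in the following way:

$$U^{\mathcal{L}^{(i)}}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i S_{A_i} B'_i S_{B_i} Y_i} := \sum_{y_i} U^{\mathcal{E}^{y_i}}_{A'_{i-1} \to A'_i A_i S_{A_i}} \otimes U^{\mathcal{F}^{y_i}}_{B_{i-1} B'_{i-1} \to B'_i S_{B_i}} \otimes |y_i\rangle_{Y_i}, \tag{20.2.2}$$

where $\{U^{\mathcal{E}^{y_i}}_{A'_{i-1} \to A'_i A_i S_{A_i}}\}_{y_i}$ and $\{U^{\mathcal{F}^{y_i}}_{B_{i-1} B'_{i-1} \to B'_i S_{B_i}}\}_{y_i}$ are collections of linear operators,
each of which is a contraction, that is,

$$\|U^{\mathcal{E}^{y_i}}_{A'_{i-1} \to A'_i A_i S_{A_i}}\|_\infty,\ \|U^{\mathcal{F}^{y_i}}_{B_{i-1} B'_{i-1} \to B'_i S_{B_i}}\|_\infty \leq 1, \tag{20.2.3}$$

such that the linear operator in (20.2.2) is an isometry.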
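The isometry requirement on (20.2.2) can be spelled out as a condition on the individual operators. The short derivation below is a sketch that uses only the orthonormality of the vectors $|y_i\rangle_{Y_i}$; system labels are suppressed for readability, and the final identity operator is understood to act on $A'_{i-1} B_{i-1} B'_{i-1}$.

```latex
\begin{align}
\big[U^{\mathcal{L}^{(i)}}\big]^\dagger U^{\mathcal{L}^{(i)}}
&= \sum_{y_i, y_i'}
   \big[U^{\mathcal{E}^{y_i}}\big]^\dagger U^{\mathcal{E}^{y_i'}}
   \otimes \big[U^{\mathcal{F}^{y_i}}\big]^\dagger U^{\mathcal{F}^{y_i'}}
   \, \langle y_i | y_i' \rangle \\
&= \sum_{y_i}
   \big[U^{\mathcal{E}^{y_i}}\big]^\dagger U^{\mathcal{E}^{y_i}}
   \otimes \big[U^{\mathcal{F}^{y_i}}\big]^\dagger U^{\mathcal{F}^{y_i}} .
\end{align}
% The operator in (20.2.2) is therefore an isometry exactly when this sum
% equals the identity operator on the input systems.
```

This is the operator-level counterpart of the statement that the sum map $\sum_{y_i} \mathcal{E}^{y_i} \otimes \mathcal{F}^{y_i}$ is trace preserving.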

It is important to note here that the isometry $U^{\mathcal{L}^{(i)}}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i S_{A_i} B'_i S_{B_i} Y_i}$ results
from purifying each step of an LOPC channel. That is, an LOPC channel is
implemented as a sequence of one-way LOPC channels, which each consist of a
generalized measurement by one party, classical communication of the measurement
outcome to the other, and a channel by the other party, conditioned on the outcome
of the measurement. So when purifying the LOPC channel, we purify each of these
steps, and the resulting purified channel is what is represented in (20.2.2).

The systems 𝑆 𝐴𝑖 and 𝑆 𝐵𝑖 in (20.2.2) are shield systems belonging to Alice and
Bob, respectively, and we assume without loss of generality that they contain a
coherent classical copy of the classical random variable 𝑌𝑖 , such that tracing over
the systems 𝑆 𝐴𝑖 and 𝑆 𝐵𝑖 recovers the original LOPC channel in (20.1.12). As before,
𝑌𝑖 is a system held by Eve, containing a coherent classical copy of the classical data
exchanged in this round.
Thus, a purification of the state $\rho^{(i)}_{A'_i A_i B'_i}$ after each LOPC channel is as follows:

$$|\rho^{(i)}\rangle_{A'_i A_i S_{A_1^i} B'_i S_{B_1^i} E_1^{i-1} Y_1^i} := U^{\mathcal{L}^{(i)}}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i S_{A_i} B'_i S_{B_i} Y_i} |\omega^{(i-1)}\rangle_{A'_{i-1} B_{i-1} B'_{i-1} S_{A_1^{i-1}} S_{B_1^{i-1}} E_1^{i-1} Y_1^{i-1}}, \tag{20.2.4}$$

where we have employed the shorthands $S_{A_1^i} \equiv S_{A_1} \cdots S_{A_i}$ and $S_{B_1^i} \equiv S_{B_1} \cdots S_{B_i}$,
with a similar shorthand for $E_1^{i-1}$ and $Y_1^i$ as before. A purification of the state
$\omega^{(i)}_{A'_i B_i B'_i}$ after each use of the channel $\mathcal{N}_{A \to B}$ is

$$|\omega^{(i)}\rangle_{A'_i B_i S_{A_1^i} B'_i S_{B_1^i} E_1^i Y_1^i} := U^{\mathcal{N}}_{A_i \to B_i E_i} |\rho^{(i)}\rangle_{A'_i A_i S_{A_1^i} B'_i S_{B_1^i} E_1^{i-1} Y_1^i}, \tag{20.2.5}$$

where $U^{\mathcal{N}}_{A_i \to B_i E_i}$ is an isometric extension of the $i$th channel use $\mathcal{N}_{A_i \to B_i}$.
The final LOPC channel takes the form in (20.1.15), and it can be purified to an
isometry similarly as

$$U^{\mathcal{L}^{(n+1)}}_{A'_n B_n B'_n \to K_A S_{A_{n+1}} K_B S_{B_{n+1}} Y_{n+1}} := \sum_{y_{n+1}} U^{\mathcal{E}^{y_{n+1}}}_{A'_n \to K_A S_{A_{n+1}}} \otimes U^{\mathcal{F}^{y_{n+1}}}_{B_n B'_n \to K_B S_{B_{n+1}}} \otimes |y_{n+1}\rangle_{Y_{n+1}}. \tag{20.2.6}$$

The systems 𝑆 𝐴𝑛+1 and 𝑆 𝐵𝑛+1 are again shield systems belonging to Alice and Bob,
respectively, and we assume again that they contain a coherent classical copy of
the classical random variable 𝑌𝑛+1 , such that tracing over 𝑆 𝐴𝑛+1 and 𝑆 𝐵𝑛+1 recovers
the original LOPC channel in (20.1.15). As before, 𝑌𝑛+1 is a system held by Eve,
containing a coherent classical copy of the classical data exchanged in this round.
The final state at the end of the purified protocol is a pure state $|\omega\rangle_{K_A S_A K_B S_B E^n Y^{n+1}}$,
given by

$$|\omega\rangle_{K_A S_A K_B S_B E^n Y^{n+1}} := U^{\mathcal{L}^{(n+1)}}_{A'_n B_n B'_n \to K_A S_{A_{n+1}} K_B S_{B_{n+1}} Y_{n+1}} |\omega^{(n)}\rangle_{A'_n B_n S_{A_1^n} B'_n S_{B_1^n} E_1^n Y_1^n}. \tag{20.2.7}$$

Alice is in possession of the key system 𝐾 𝐴 and the shield systems 𝑆 𝐴 ≡ 𝑆 𝐴1 · · · 𝑆 𝐴𝑛+1 ,
Bob possesses the key system 𝐾 𝐵 and the shield systems 𝑆 𝐵 ≡ 𝑆 𝐵1 · · · 𝑆 𝐵𝑛+1 , and
Eve holds the environment systems 𝐸 𝑛 ≡ 𝐸 1 · · · 𝐸 𝑛 . Additionally, Eve has coherent
copies 𝑌 𝑛+1 ≡ 𝑌1 · · · 𝑌𝑛+1 of all the classical data exchanged.

20.2.2 LOCC-Assisted Bipartite Private-State Distillation

As a consequence of the purification principle, on the one hand, if we trace over

the shield systems at every step, then we simply recover the original tripartite
secret-key-agreement protocol detailed in Section 20.1. On the other hand, suppose
that we instead trace over all of Eve’s systems at each step. Due to the fact that
each state of the 𝑌 systems is a coherent classical copy, the resulting reduced states
consist of a classical mixture of various states of Alice and Bob’s systems, as would
arise in an LOCC-assisted quantum communication protocol.
It is worthwhile to examine how each step changes after tracing over Eve’s
systems of the purified protocol. For the first step, the reduced state of Alice and
Bob’s systems is a separable state of the following form:
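$$\rho^{(1)}_{A'_1 A_1 S_{A_1} B'_1 S_{B_1}} = \sum_{y_1} p_{Y_1}(y_1)\, \tau^{y_1}_{A'_1 A_1 S_{A_1}} \otimes \zeta^{y_1}_{B'_1 S_{B_1}}, \tag{20.2.8}$$

where $\tau^{y_1}_{A'_1 A_1 S_{A_1}} = |\tau^{y_1}\rangle\!\langle \tau^{y_1}|_{A'_1 A_1 S_{A_1}}$ and $\zeta^{y_1}_{B'_1 S_{B_1}} = |\zeta^{y_1}\rangle\!\langle \zeta^{y_1}|_{B'_1 S_{B_1}}$. Tracing over Eve's
system $Y_i$ of each isometry $U^{\mathcal{L}^{(i)}}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i S_{A_i} B'_i S_{B_i} Y_i}$ leads to the following LOCC
channel:

$$\mathcal{L}^{(i)}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i S_{A_i} B'_i S_{B_i}} = \sum_{y_i} \mathcal{U}^{\mathcal{E}^{y_i}}_{A'_{i-1} \to A'_i A_i S_{A_i}} \otimes \mathcal{U}^{\mathcal{F}^{y_i}}_{B_{i-1} B'_{i-1} \to B'_i S_{B_i}}, \tag{20.2.9}$$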

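where

$$\mathcal{U}^{\mathcal{E}^{y_i}}_{A'_{i-1} \to A'_i A_i S_{A_i}}(\cdot) := U^{\mathcal{E}^{y_i}}_{A'_{i-1} \to A'_i A_i S_{A_i}} (\cdot) \big[U^{\mathcal{E}^{y_i}}_{A'_{i-1} \to A'_i A_i S_{A_i}}\big]^\dagger, \tag{20.2.10}$$

$$\mathcal{U}^{\mathcal{F}^{y_i}}_{B_{i-1} B'_{i-1} \to B'_i S_{B_i}}(\cdot) := U^{\mathcal{F}^{y_i}}_{B_{i-1} B'_{i-1} \to B'_i S_{B_i}} (\cdot) \big[U^{\mathcal{F}^{y_i}}_{B_{i-1} B'_{i-1} \to B'_i S_{B_i}}\big]^\dagger. \tag{20.2.11}$$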
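The step from the isometry in (20.2.2) to the channel in (20.2.9) can be checked with a short calculation. The following sketch acts on an arbitrary input operator $X$ and suppresses system labels; it relies only on $\operatorname{Tr}[|y_i\rangle\!\langle y_i'|] = \delta_{y_i, y_i'}$.

```latex
\begin{align}
\operatorname{Tr}_{Y_i}\!\Big[ U^{\mathcal{L}^{(i)}} X \big[U^{\mathcal{L}^{(i)}}\big]^\dagger \Big]
&= \sum_{y_i, y_i'}
   \big(U^{\mathcal{E}^{y_i}} \otimes U^{\mathcal{F}^{y_i}}\big) X
   \big(U^{\mathcal{E}^{y_i'}} \otimes U^{\mathcal{F}^{y_i'}}\big)^\dagger
   \operatorname{Tr}\!\big[ |y_i\rangle\!\langle y_i'| \big] \\
&= \sum_{y_i}
   \big(\mathcal{U}^{\mathcal{E}^{y_i}} \otimes \mathcal{U}^{\mathcal{F}^{y_i}}\big)(X).
\end{align}
% Only the terms with y_i = y_i' survive the partial trace, which leaves a sum
% of product maps, that is, a channel of the form in (20.2.9).
```

In other words, discarding Eve's coherent copy of the classical data removes the coherence between different values of $y_i$ and leaves a classical mixture of local operations.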
Tracing over Eve’s system 𝑌𝑛+1 of the final isometry leads to the following LOCC
channel:
$$\mathcal{L}^{(n+1)}_{A'_n B_n B'_n \to K_A S_{A_{n+1}} K_B S_{B_{n+1}}} = \sum_{y_{n+1}} \mathcal{U}^{\mathcal{E}^{y_{n+1}}}_{A'_n \to K_A S_{A_{n+1}}} \otimes \mathcal{U}^{\mathcal{F}^{y_{n+1}}}_{B_n B'_n \to K_B S_{B_{n+1}}}, \tag{20.2.12}$$

with a similar convention as in (20.2.10)–(20.2.11) for the maps $\mathcal{U}^{\mathcal{E}^{y_{n+1}}}_{A'_n \to K_A S_{A_{n+1}}}$ and
$\mathcal{U}^{\mathcal{F}^{y_{n+1}}}_{B_n B'_n \to K_B S_{B_{n+1}}}$.

The states at every step of the protocol are then given by the following for all
$i \in \{2, \ldots, n\}$:

$$\rho^{(i)}_{A'_i A_i S_{A_1^i} B'_i S_{B_1^i}} := \mathcal{L}^{(i)}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i S_{A_i} B'_i S_{B_i}}(\omega^{(i-1)}_{A'_{i-1} S_{A_1^{i-1}} B_{i-1} B'_{i-1} S_{B_1^{i-1}}}), \tag{20.2.13}$$

$$\omega^{(i)}_{A'_i S_{A_1^i} B_i B'_i S_{B_1^i}} := \mathcal{N}_{A_i \to B_i}(\rho^{(i)}_{A'_i A_i S_{A_1^i} B'_i S_{B_1^i}}), \tag{20.2.14}$$

and the final state of the protocol is given by

$$\omega_{K_A S_A K_B S_B} := \mathcal{L}^{(n+1)}_{A'_n B_n B'_n \to K_A S_{A_{n+1}} K_B S_{B_{n+1}}}(\omega^{(n)}_{A'_n S_{A_1^n} B_n B'_n S_{B_1^n}}). \tag{20.2.15}$$

Finally, by employing (20.1.17) and Proposition 15.7, the following condition

holds
𝑝 err (C) = 1 − 𝐹 (𝜔 𝐾 𝐴𝑆 𝐴𝐾 𝐵 𝑆 𝐵 , 𝛾𝐾 𝐴𝑆 𝐴𝐾 𝐵 𝑆 𝐵 ), (20.2.16)
where 𝛾𝐾 𝐴𝑆 𝐴𝐾 𝐵 𝑆 𝐵 is a bipartite private state of the form in Theorem 15.5. Thus,
applying Definition 20.1, it follows that
𝐹 (𝜔 𝐾 𝐴𝑆 𝐴𝐾 𝐵 𝑆 𝐵 , 𝛾𝐾 𝐴𝑆 𝐴𝐾 𝐵 𝑆 𝐵 ) ≥ 1 − 𝜀. (20.2.17)

We can now make a critical observation. By tracing over Eve’s systems 𝐸 𝑛

and 𝑌 𝑛+1 at every step of the purified protocol as we did above, it is clear that
the resulting protocol is an LOCC-assisted protocol that distills an approximate
bipartite private state on the systems 𝐾 𝐴 𝑆 𝐴 𝐾 𝐵 𝑆 𝐵 , with performance parameter
given by (20.2.17). Indeed, the initial state in (20.2.8) is a separable state, and
the channels in (20.2.9) and (20.2.12) are LOCC channels, and so the protocol
has the same form as an LOCC-assisted quantum communication protocol, as we
studied in the previous chapter. However, the goal of this LOCC-assisted bipartite
private-state distillation is not as stringent as it was in the previous chapter, for
LOCC-assisted quantum communication. Namely, it is only necessary to distill an
approximate bipartite private state and not necessarily an approximate maximally
entangled state; but keep in mind that a maximally entangled state is a particular
kind of bipartite private state. Thus, starting with a tripartite secret-key-agreement
protocol, we can apply the purification principle, then trace over the systems of the
eavesdropper, and the result is a bipartite private-state distillation protocol assisted
by LOCC.

Alternatively, this reasoning can go in the opposite direction. Suppose instead

that we had started with an LOCC-assisted bipartite private-state distillation protocol
of the above form. Then we could purify it as we did in the previous subsection, and
after doing so, we could trace over Alice and Bob’s shield systems. If the protocol
satisfies the condition in (20.2.17), then the resulting protocol would be an (𝑛, 𝐾, 𝜀)
tripartite secret-key-agreement protocol, which follows as a consequence of the
equivalence between approximate bipartite private states and tripartite secret-key
states, as given in Proposition 15.7.
As a consequence of this equivalence between tripartite secret-key-agreement
protocols and bipartite private-state distillation protocols, we can employ the
tools of entanglement theory in order to analyze bipartite private-state distillation
protocols, in a way similar to how we did in the last chapter. For example, if our
goal is to determine upper bounds on the rate at which it is possible to generate
secret key in a secret-key-agreement protocol, then we can employ an entanglement
measure to analyze the equivalent bipartite private-state distillation protocol in
order to accomplish the goal. In fact, this is exactly what we accomplish in this
chapter, demonstrating that the squashed entanglement of a channel serves as an
upper bound on secret-key rates, and that variations of the relative entropy of
entanglement, similar in spirit to the Rains relative entropy, serve as upper bounds
on secret-key rates as well.

20.2.2.1 Unboundedness of Shield Systems in a Bipartite Private-State Distillation Protocol

One observation that we make here is that the shield systems in a bipartite private-
state distillation protocol are finite-dimensional, yet arbitrarily large. That is, there
is no bound that we can establish on their dimension for a generic private-state
distillation protocol, and this unboundedness is a consequence of the fact that the
shield systems result from purifying the local memory or scratch registers of Alice
and Bob, which in turn have no bound on their dimension. This unboundedness
poses a challenge when trying to establish upper bounds on the rate at which
secret key agreement, or equivalently, bipartite private-state distillation is possible.
However, there are methods for handling this unboundedness that we detail later.


20.2.3 Relation between Secret Key Agreement and LOCC-Assisted Quantum Communication

Due to the fact that a maximally entangled state is a particular kind of bipartite
private state and due to the equivalence between secret key agreement and LOCC-
assisted bipartite private-state distillation, we arrive at the following conclusion,
which relates LOCC-assisted quantum communication to secret key agreement:

Proposition 20.2
Let N 𝐴→𝐵 be a quantum channel, let 𝑛, 𝐾 ∈ N, and let 𝜀 ∈ [0, 1]. Then an
(𝑛, 𝐾, 𝜀) LOCC-assisted quantum communication protocol is also an (𝑛, 𝐾, 𝜀)
protocol for secret key agreement.

This statement is a rather simple observation, but it has consequences for

capacities related to these tasks. That is, from this statement, we can conclude that
the LOCC-assisted quantum capacity of a given quantum channel is bounded from
above by its secret-key-agreement capacity. Thus, any upper bound established on
the secret-key-agreement capacity of a quantum channel is also an upper bound on
its LOCC-assisted quantum capacity. Furthermore, if a given quantity is a lower
bound on the LOCC-assisted quantum capacity of a quantum channel, then it is
also a lower bound on its secret-key-agreement capacity.

20.2.4 𝒏-Shot Secret-Key-Agreement Protocol Assisted by Public Separable Channels

In Section 19.2, we generalized the notion of an LOCC-assisted quantum com-

munication protocol to one that is assisted by PPT-preserving channels. We note
here that we can consider a similar kind of generalization for secret-key-agreement
protocols.
Recall from Section 4.6.2 that any LOCC channel $\mathcal{L}_{AB}$ can be written as a
separable channel of the following form:

$$\mathcal{L}_{AB \to A'B'} = \sum_{y} \mathcal{E}^{y}_{A \to A'} \otimes \mathcal{F}^{y}_{B \to B'}, \tag{20.2.18}$$

where $\{\mathcal{E}^{y}_{A \to A'}\}_y$ and $\{\mathcal{F}^{y}_{B \to B'}\}_y$ are sets of completely positive maps such that the
sum map $\sum_y \mathcal{E}^{y}_{A} \otimes \mathcal{F}^{y}_{B}$ is trace preserving. However, the converse statement is not
true. That is, it is not possible in general to implement an arbitrary separable
channel of the form above as an LOCC channel.
Thus, we can allow for a slight generalization of a secret-key-agreement protocol
to one that is assisted by public separable channels. Indeed, we define a public
separable channel to be the following generalization of an LOPC channel:
$$\mathcal{L}_{AB \to A'B'Y} = \sum_{y} \mathcal{E}^{y}_{A \to A'} \otimes \mathcal{F}^{y}_{B \to B'} \otimes |y\rangle\!\langle y|_{Y}, \tag{20.2.19}$$

where Alice and Bob have access to the $A$ and $B$ systems, respectively, and the
eavesdropper has access to the system $Y$. The only requirement for a public
separable channel is that $\{\mathcal{E}^{y}_{A \to A'}\}_y$ and $\{\mathcal{F}^{y}_{B \to B'}\}_y$ are sets of completely positive
maps such that the sum map $\sum_y \mathcal{E}^{y}_{A} \otimes \mathcal{F}^{y}_{B}$ is trace preserving. Similar to the
distinction between LOCC and separable channels, it is not possible in general
to implement a public separable channel via local operations and public classical
communication.
The main point that we make in this section is that we can generalize a secret-
key-agreement protocol to be assisted by public separable channels rather than
just LOPC channels. For fixed privacy error, the resulting protocol achieves a
rate of communication that is either the same or higher than that achieved by
an LOPC-assisted protocol, due to the fact that every LOPC channel is a public
separable channel. Such a protocol is defined in the same way as we did in
Section 20.1, and then we arrive at the following definition:

Definition 20.3 (𝒏, 𝑲, 𝜺) Secret-Key-Agreement Protocol Assisted by Public Separable Channels

Let $\mathcal{C} := (\rho^{(1)}_{A'_1 A_1 B'_1 Y_1}, \{\mathcal{L}^{(i)}_{A'_{i-1} B_{i-1} B'_{i-1} \to A'_i A_i B'_i Y_i}\}_{i=2}^{n}, \mathcal{L}^{(n+1)}_{A'_n B_n B'_n \to K_A K_B Y_{n+1}})$ be the
elements of an $n$-round public-separable-assisted secret-key-agreement protocol
over the channel $\mathcal{N}_{A \to B}$. The protocol is called an $(n, K, \varepsilon)$ protocol, with
$\varepsilon \in [0, 1]$, if the privacy error $p_{\text{err}}(\mathcal{C}) \leq \varepsilon$.

Furthermore, the equivalence between secret key agreement and bipartite

private-state distillation, as outlined in Sections 20.2.1 and 20.2.2, still holds under
this generalization (one can check that all of the steps given in Sections 20.2.1 and
20.2.2 still hold). However, the correspondence changes as follows: for every tripartite
(𝑛, 𝐾, 𝜀) secret-key-agreement protocol assisted by public separable channels, there
exists an (𝑛, 𝐾, 𝜀) bipartite private-state distillation protocol assisted by separable
channels, and vice versa. Thus, we can again employ the tools of entanglement
theory to analyze secret-key-agreement protocols assisted by public separable
channels.

20.2.5 Amortized Entanglement Bound for Secret-Key-Agreement Protocols

Due to the equivalence between tripartite secret-key-agreement protocols and

bipartite private-state distillation protocols, we can use the tools of entanglement
theory to establish upper bounds on the rate at which secret key agreement is
possible. Namely, we can apply the idea of amortized entanglement from the
previous chapter in order to establish a generic upper bound in terms of an amortized
entanglement measure. In fact, by the same steps used to arrive at Proposition 19.2,
we find the following bound:

Proposition 20.4
Let N 𝐴→𝐵 be a quantum channel, let 𝜀 ∈ [0, 1], and let 𝐸 be an entanglement
measure that is equal to zero for all separable states. For an (𝑛, 𝐾, 𝜀) secret-
key-agreement protocol, the following bound holds

𝐸 (𝐾 𝐴 𝑆 𝐴 ; 𝐾 𝐵 𝑆 𝐵 )𝜔 ≤ 𝑛 · 𝐸 A (N), (20.2.20)

where 𝜔 𝐾 𝐴𝑆 𝐴𝐾 𝐵 𝑆 𝐵 is the final state resulting from the equivalent (𝑛, 𝐾, 𝜀)

bipartite private-state distillation protocol (see (20.2.15)) and 𝐸 A (N) is the
amortized entanglement of the channel N 𝐴→𝐵 , as given in Definition 10.3.

Just as the bound from Proposition 19.2 depends on the final state of the
LOCC-assisted quantum communication protocol, the same is true for the bound
in (20.2.20). The bound is thus not a universal bound (a universal bound would
depend only on the protocol parameters 𝑛, 𝐾, and 𝜀). Thus, one of the main goals
of the forthcoming sections is to employ particular entanglement measures in order
to arrive at universal bounds for secret-key-agreement protocols.
We should also observe that the quantity 𝐸 (𝐾 𝐴 𝑆 𝐴 ; 𝐾 𝐵 𝑆 𝐵 )𝜔 in the bound in
(20.2.20) can be understood as quantifying the amount of entanglement between the
systems 𝐾 𝐴 𝑆 𝐴 and 𝐾 𝐵 𝑆 𝐵 . As such, the shield systems 𝑆 𝐴 and 𝑆 𝐵 are involved,
and they can have arbitrarily large dimension. Thus, one must account for this in
the analysis of the entanglement 𝐸 (𝐾 𝐴 𝑆 𝐴 ; 𝐾 𝐵 𝑆 𝐵 )𝜔 . For example, in the previous
chapter, we analyzed the analogous quantity 𝐸 (𝑀 𝐴 ; 𝑀𝐵 )𝜔 by employing squashed
entanglement. In particular, since the state 𝜔 𝑀 𝐴 𝑀𝐵 there was an approximate maxi-
mally entangled state, we applied the uniform continuity of squashed entanglement
from Proposition 9.38 in order to evaluate 𝐸 (𝑀 𝐴 ; 𝑀𝐵 )𝜔 . When we did so, the
dimension of the maximally entangled state appeared in the continuity bound,
and this was acceptable there because the dimension of the maximally entangled
state is directly related to the rate of entanglement distillation. However, it is not
clear whether we can take such an approach, via uniform continuity of squashed
entanglement, when analyzing bipartite private-state distillation, due to the fact
that the shield systems do not necessarily have a bounded dimension. As such, we
employ another method to analyze such protocols.
Just as the bound in Proposition 19.2 simplifies for teleportation-simulable
channels and particular entanglement measures, the same is true for the bound
given in Proposition 20.4, by employing the same reasoning:

Corollary 20.5 Reduction by Teleportation

Let 𝐸 𝑆 denote an entanglement measure that is subadditive with respect to
states (Definition 9.1.9) and equal to zero for all separable states. Let N 𝐴→𝐵
be a channel that is LOCC-simulable with associated resource state 𝜃 𝑅𝐵′
(Definition 4.25). Let 𝜀 ∈ [0, 1]. For an (𝑛, 𝐾, 𝜀) secret-key-agreement
protocol, the following bound holds

𝐸 𝑆 (𝐾 𝐴 𝑆 𝐴 ; 𝐾 𝐵 𝑆 𝐵 )𝜔 ≤ 𝑛 · 𝐸 𝑆 (𝑅; 𝐵′)𝜃 , (20.2.21)

where 𝜔 𝐾 𝐴𝑆 𝐴𝐾 𝐵 𝑆 𝐵 is the final state resulting from the equivalent (𝑛, 𝐾, 𝜀)

bipartite private-state distillation protocol.

Just as we found bounds that apply to PPT-assisted quantum communication in

terms of entanglement measures that are monotone with respect to PPT-preserving
channels, we can also find bounds that apply to secret-key-agreement protocols that
are assisted by public separable channels:


Proposition 20.6
Let N 𝐴→𝐵 be a quantum channel, let 𝜀 ∈ [0, 1], and let 𝐸 be an entanglement
measure that is monotone non-increasing with respect to separable channels
and equal to zero for all separable states. For an (𝑛, 𝐾, 𝜀) secret-key-agreement
protocol assisted by public separable channels, the following bound holds

𝐸 (𝐾 𝐴 𝑆 𝐴 ; 𝐾 𝐵 𝑆 𝐵 )𝜔 ≤ 𝑛 · 𝐸 A (N), (20.2.22)

where 𝜔 𝐾 𝐴𝑆 𝐴𝐾 𝐵 𝑆 𝐵 is the final state resulting from the equivalent (𝑛, 𝐾, 𝜀)

bipartite private-state distillation protocol assisted by separable channels and
𝐸 A (N) is the amortized entanglement of the channel N 𝐴→𝐵 , as given in
Definition 10.3.

Finally, this bound again simplifies for channels that are simulable by the action
of a separable channel on a resource state 𝜃 𝑅𝐵′ (separable-simulable channels):

Corollary 20.7
Let 𝐸 𝑆 denote an entanglement measure that is monotone non-increasing
with respect to separable channels, subadditive with respect to states
(Definition 9.1.9), and equal to zero for separable states. Let N 𝐴→𝐵 be a channel that is
separable-simulable with associated resource state 𝜃 𝑅𝐵′ (Definition 4.26). Let
𝜀 ∈ [0, 1]. For an (𝑛, 𝐾, 𝜀) secret-key-agreement protocol assisted by public
separable channels, the following bound holds

𝐸 𝑆 (𝐾 𝐴 𝑆 𝐴 ; 𝐾 𝐵 𝑆 𝐵 )𝜔 ≤ 𝑛 · 𝐸 𝑆 (𝑅; 𝐵′)𝜃 , (20.2.23)

where 𝜔 𝐾 𝐴𝑆 𝐴𝐾 𝐵 𝑆 𝐵 is the final state resulting from the equivalent (𝑛, 𝐾, 𝜀)

bipartite private-state distillation protocol assisted by separable channels.

20.3 Squashed Entanglement Upper Bound on the Number of Transmitted Private Bits
We now employ the squashed entanglement in order to bound the number of private
bits that an 𝑛-shot secret-key-agreement protocol can generate. We have already


shown in Section 9.4 that the squashed entanglement satisfies all of the requirements
needed to apply it in Proposition 20.4. Namely, it is equal to zero for separable
states, it is an entanglement measure (non-increasing under the action of an LOCC
channel), and the squashed entanglement of a channel does not increase under
amortization (Theorem 10.20). Putting all of these items together, we can already
conclude the following bound for an (𝑛, 𝐾, 𝜀) secret-key-agreement protocol:

𝐸 sq (𝐾 𝐴 𝑆 𝐴 ; 𝐾 𝐵 𝑆 𝐵 )𝜔 ≤ 𝑛 · 𝐸 sq (N), (20.3.1)

where 𝐸 sq (N) is the squashed entanglement of a channel (Definition 10.1) and

𝜔 𝐾 𝐴𝑆 𝐴𝐾 𝐵 𝑆 𝐵 is the final state resulting from the equivalent (𝑛, 𝐾, 𝜀) bipartite private-
state distillation protocol. Thus, what remains is to evaluate the squashed entangle-
ment of an approximate bipartite private state, and this is the main technical problem
that we consider in this section before concluding that squashed entanglement is an
upper bound on secret-key rates.

20.3.1 Squashed Entanglement and Approximate Private States

This subsection establishes Proposition 20.9, which is an upper bound on the

logarithm of the dimension 𝐾 of a key system of an 𝜀-approximate private state, as
given in Definition 15.6, in terms of its squashed entanglement, plus another term
depending only on 𝜀 and log2 𝐾.
In what follows, we suppose that 𝛾 𝐴𝐴′ 𝐵𝐵′ is a private state with key systems 𝐴𝐵
and shield systems 𝐴′ 𝐵′. Recall from Theorem 15.5 that a private state of log2 𝐾
private bits can be written in the following form:
$$\gamma_{ABA'B'} = U_{ABA'B'} (\Phi_{AB} \otimes \sigma_{A'B'}) U^\dagger_{ABA'B'}, \tag{20.3.2}$$

where $\Phi_{AB}$ is a maximally entangled state of Schmidt rank $K$,

$$\Phi_{AB} := \frac{1}{K} \sum_{i,j} |i\rangle\!\langle j|_{A} \otimes |i\rangle\!\langle j|_{B}, \tag{20.3.3}$$

and

$$U_{ABA'B'} = \sum_{i,j} |i\rangle\!\langle i|_{A} \otimes |j\rangle\!\langle j|_{B} \otimes U^{ij}_{A'B'} \tag{20.3.4}$$

is a controlled unitary known as a “twisting unitary,” with each $U^{ij}_{A'B'}$ a unitary
operator. Due to the fact that the maximally entangled state $\Phi_{AB}$ is unextendible,
any extension $\gamma_{AA'BB'E}$ of a private state $\gamma_{AA'BB'}$ necessarily has the following
form:

$$\gamma_{AA'BB'E} = U_{AA'BB'} (\Phi_{AB} \otimes \sigma_{A'B'E}) U^\dagger_{AA'BB'}, \tag{20.3.5}$$

where $\sigma_{A'B'E}$ is an extension of $\sigma_{A'B'}$.
We start with the following lemma, which applies to any extension of a bipartite
private state:

Lemma 20.8
Let 𝛾 𝐴𝐴′ 𝐵𝐵′ be a bipartite private state, and let 𝛾 𝐴𝐴′ 𝐵𝐵′ 𝐸 be an extension of it,
as given above. Then the following identity holds for any such extension:

2 log2 𝐾 = 𝐼 ( 𝐴; 𝐵𝐵′ |𝐸) 𝛾 + 𝐼 ( 𝐴′; 𝐵| 𝐴𝐵′ 𝐸) 𝛾 . (20.3.6)

Proof: First consider that the following identity holds as a consequence of two
applications of the chain rule for conditional quantum mutual information:

𝐼 ( 𝐴𝐴′; 𝐵𝐵′ |𝐸) 𝛾 = 𝐼 ( 𝐴; 𝐵𝐵′ |𝐸) 𝛾 + 𝐼 ( 𝐴′; 𝐵𝐵′ | 𝐴𝐸) 𝛾

= 𝐼 ( 𝐴; 𝐵𝐵′ |𝐸) 𝛾 + 𝐼 ( 𝐴′; 𝐵′ | 𝐴𝐸) 𝛾 + 𝐼 ( 𝐴′; 𝐵|𝐵′ 𝐴𝐸) 𝛾 . (20.3.7)

Combined with the following identity, which holds for an extension 𝛾 𝐴𝐴′ 𝐵𝐵′ 𝐸 of a
private state 𝛾 𝐴𝐴′ 𝐵𝐵′ ,

𝐼 ( 𝐴𝐴′; 𝐵𝐵′ |𝐸) 𝛾 = 2 log2 𝐾 + 𝐼 ( 𝐴′; 𝐵′ | 𝐴𝐸) 𝛾 , (20.3.8)

we recover the statement in (20.3.6). So it remains to prove (20.3.8). By definition,

we have that

𝐼 ( 𝐴𝐴′; 𝐵𝐵′ |𝐸) 𝛾 = 𝐻 ( 𝐴𝐴′ 𝐸) 𝛾 + 𝐻 (𝐵𝐵′ 𝐸) 𝛾 − 𝐻 (𝐸) 𝛾 − 𝐻 ( 𝐴𝐴′ 𝐵𝐵′ 𝐸) 𝛾 . (20.3.9)

By applying (20.3.3)–(20.3.5), we can write 𝛾 𝐴𝐴′ 𝐵𝐵′ 𝐸 as follows:

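$$\gamma_{AA'BB'E} = \frac{1}{K} \sum_{i,j} |i\rangle\!\langle j|_{A} \otimes |i\rangle\!\langle j|_{B} \otimes U^{ii}_{A'B'} \sigma_{A'B'E} (U^{jj}_{A'B'})^\dagger. \tag{20.3.10}$$

Tracing over system $B$ leads to the following state:

$$\gamma_{AA'B'E} = \frac{1}{K} \sum_{i} |i\rangle\!\langle i|_{A} \otimes \gamma^{i}_{A'B'E}, \tag{20.3.11}$$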


where

$$\gamma^{i}_{A'B'E} := U^{ii}_{A'B'} \sigma_{A'B'E} (U^{ii}_{A'B'})^\dagger. \tag{20.3.12}$$

Similarly, tracing over system $A$ of $\gamma_{AA'BB'E}$ leads to

$$\gamma_{BA'B'E} = \frac{1}{K} \sum_{i} |i\rangle\!\langle i|_{B} \otimes \gamma^{i}_{A'B'E}. \tag{20.3.13}$$
So these and the chain rule for conditional entropy imply that
𝐻 ( 𝐴𝐴′ 𝐸) 𝛾 = 𝐻 ( 𝐴) 𝛾 + 𝐻 ( 𝐴′ 𝐸 | 𝐴) 𝛾 = log2 𝐾 + 𝐻 ( 𝐴′ 𝐸 | 𝐴) 𝛾 . (20.3.14)
Similarly, we have that
𝐻 (𝐵𝐵′ 𝐸) 𝛾 = log2 𝐾 + 𝐻 (𝐵′ 𝐸 |𝐵) 𝛾 = log2 𝐾 + 𝐻 (𝐵′ 𝐸 | 𝐴) 𝛾 , (20.3.15)
where we have used the symmetries in (20.3.11)–(20.3.13). Since $\gamma_E = \gamma^i_E$ for all $i$
(this is a consequence of $\gamma_{ABA'B'}$ being an ideal private state), we find that

$$H(E)_\gamma = \frac{1}{K} \sum_{i} H(E)_{\gamma^i} = H(E|A)_\gamma. \tag{20.3.16}$$

Finally, we have that

$$H(AA'BB'E)_\gamma = H(ABA'B'E)_{\Phi \otimes \sigma} \tag{20.3.17}$$
$$= H(AB)_\Phi + H(A'B'E)_\sigma \tag{20.3.18}$$
$$= \frac{1}{K} \sum_{i} H(A'B'E)_{\gamma^i} \tag{20.3.19}$$
$$= H(A'B'E|A)_\gamma. \tag{20.3.20}$$
The first equality follows from unitary invariance of quantum entropy. The second
equality follows because the entropy is additive for tensor-product states. The
third equality follows because 𝐻 ( 𝐴𝐵)Φ = 0 since Φ 𝐴𝐵 is a pure state, and 𝜎𝐴′ 𝐵′ 𝐸
is related to 𝛾 𝑖𝐴′ 𝐵′ 𝐸 by the unitary 𝑈 𝑖𝑖𝐴′ 𝐵′ . The final equality follows by applying
(20.3.11), and the fact that conditional entropy is a convex combination of entropies
for a classical-quantum state where the conditioning system is classical.
Combining (20.3.9), (20.3.14), (20.3.15), (20.3.16), (20.3.20), and the fact that
𝐼 ( 𝐴′; 𝐵′ | 𝐴𝐸) 𝛾 = 𝐻 ( 𝐴′ 𝐸 | 𝐴) 𝛾 +𝐻 (𝐵′ 𝐸 | 𝐴) 𝛾 −𝐻 (𝐸 | 𝐴) 𝛾 −𝐻 ( 𝐴′ 𝐵′ 𝐸 | 𝐴) 𝛾 , (20.3.21)
we recover (20.3.8). ■
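
Spelled out, the final combination step reads as follows. This is only a restatement of the substitution described above, using the identities (20.3.9), (20.3.14), (20.3.15), (20.3.16), (20.3.20), and (20.3.21); no new assumptions are introduced.

```latex
\begin{align}
I(AA';BB'|E)_\gamma
&= H(AA'E)_\gamma + H(BB'E)_\gamma - H(E)_\gamma - H(AA'BB'E)_\gamma \\
&= \left[\log_2 K + H(A'E|A)_\gamma\right] + \left[\log_2 K + H(B'E|A)_\gamma\right]
   - H(E|A)_\gamma - H(A'B'E|A)_\gamma \\
&= 2\log_2 K + \left[H(A'E|A)_\gamma + H(B'E|A)_\gamma - H(E|A)_\gamma - H(A'B'E|A)_\gamma\right] \\
&= 2\log_2 K + I(A';B'|AE)_\gamma .
\end{align}
```

The two $\log_2 K$ terms come from the maximally mixed marginals on the key systems $A$ and $B$, and the remaining four conditional entropies are exactly the terms of (20.3.21).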

We can now establish the squashed entanglement bound for an approximate
bipartite private state:


Proposition 20.9
Let 𝛾 𝐴𝐴′ 𝐵𝐵′ be a private state, with key systems 𝐴𝐵 and shield systems 𝐴′ 𝐵′,
and let 𝜔 𝐴𝐴′ 𝐵𝐵′ be an 𝜀-approximate private state, in the sense that

𝐹 (𝛾 𝐴𝐴′ 𝐵𝐵′ , 𝜔 𝐴𝐴′ 𝐵𝐵′ ) ≥ 1 − 𝜀 (20.3.22)

for 𝜀 ∈ [0, 1]. Suppose that | 𝐴| = |𝐵| = 𝐾. Then

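\begin{equation}
(1 - 2\sqrt{\varepsilon}) \log_2 K \leq E_{\mathrm{sq}}(AA';BB')_\omega + 2 g_2(\sqrt{\varepsilon}), \tag{20.3.23}
\end{equation}

where
\begin{equation}
g_2(\delta) := (\delta + 1) \log_2(\delta + 1) - \delta \log_2 \delta . \tag{20.3.24}
\end{equation}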

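\begin{align}
2 \log_2 K &= I(A;BB'|E)_\gamma + I(A';B|AB'E)_\gamma \tag{20.3.26} \\
&\leq I(A;BB'|E)_\omega + I(A';B|AB'E)_\omega + 2 f_1(\sqrt{\varepsilon}, K) \tag{20.3.27} \\
&\leq I(A;BB'|E)_\omega + I(A';B|AB'E)_\omega + I(A';B'|AE)_\omega + 2 f_1(\sqrt{\varepsilon}, K) \tag{20.3.28} \\
&= I(AA';BB'|E)_\omega + 2 f_1(\sqrt{\varepsilon}, K). \tag{20.3.29}
\end{align}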

The first equality follows from Lemma 20.8. The first inequality follows from
two applications of Proposition 7.10 (uniform continuity of conditional mutual
information). The second inequality follows because 𝐼 ( 𝐴′; 𝐵′ | 𝐴𝐸)𝜔 ≥ 0 (this is
strong subadditivity from Theorem 7.6). The last equality is a consequence of
the chain rule for conditional mutual information, as used in (20.3.7). Since the
inequality
\begin{equation}
2 \log_2 K \leq I(AA';BB'|E)_\omega + 2 f_1(\sqrt{\varepsilon}, K) \tag{20.3.30}
\end{equation}
holds for any extension of 𝜔 𝐴𝐴′ 𝐵𝐵′ , the statement of the proposition follows. ■


20.3.2 Squashed Entanglement Upper Bound

We now establish the squashed entanglement upper bound on the number of private
bits that a sender can transmit to a receiver by employing a secret-key-agreement
protocol. The proof is similar to that of Theorem 19.4, but it instead invokes
Proposition 20.9.

Theorem 20.10 𝒏-Shot Squashed Entanglement Upper Bound

Let N 𝐴→𝐵 be a quantum channel, and let 𝜀 ∈ [0, 1/4). For all (𝑛, 𝐾, 𝜀)
secret-key-agreement protocols over the channel N 𝐴→𝐵 , the following bound
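holds:
\begin{equation}
\log_2 K \leq \frac{1}{1 - 2\sqrt{\varepsilon}} \left( n \cdot E_{\mathrm{sq}}(\mathcal{N}) + 2 g_2(\sqrt{\varepsilon}) \right). \tag{20.3.31}
\end{equation}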

Proof: Given an arbitrary (𝑛, 𝐾, 𝜀) secret-key-agreement protocol as outlined in
Section 20.1, we consider its equivalent (𝑛, 𝐾, 𝜀) LOCC-assisted bipartite private-
state distillation protocol, as outlined in Section 20.2.2. The squashed entanglement
is an entanglement measure (monotone under LOCC as shown in Theorem 9.33)
and it is equal to zero for separable states (Proposition 9.4.5). Thus, Proposition 20.4
applies, and we find that
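\begin{equation}
E_{\mathrm{sq}}(K_A S_A; K_B S_B)_\omega \leq n \cdot E^{\mathcal{A}}_{\mathrm{sq}}(\mathcal{N}) = n \cdot E_{\mathrm{sq}}(\mathcal{N}), \tag{20.3.32}
\end{equation}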

where the equality follows from Theorem 10.20. Applying Definition 20.1 and
(20.2.17) leads to
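\begin{equation}
F(\gamma_{K_A S_A K_B S_B}, \omega_{K_A S_A K_B S_B}) \geq 1 - \varepsilon, \tag{20.3.33}
\end{equation}
where $\gamma_{K_A S_A K_B S_B}$ is an exact private state of $\log_2 K$ private bits. As a consequence
of Proposition 20.9, we find that
\begin{equation}
E_{\mathrm{sq}}(K_A S_A; K_B S_B)_\omega \geq (1 - 2\sqrt{\varepsilon}) \log_2 K - 2 g_2(\sqrt{\varepsilon}). \tag{20.3.34}
\end{equation}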

Putting together (20.3.32) and (20.3.34), we arrive at the statement of the theo-
rem. ■
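
As a rough numerical illustration of (20.3.31), the following Python sketch evaluates the right-hand side of the bound for a given number $n$ of channel uses, an error $\varepsilon \in [0, 1/4)$, and a supplied value of the squashed entanglement of the channel. The function names and the sample value 0.5 for $E_{\mathrm{sq}}(\mathcal{N})$ are assumptions made only for this illustration; they are not taken from the text, and computing $E_{\mathrm{sq}}(\mathcal{N})$ for a concrete channel is a separate problem.

```python
import math


def g2(delta: float) -> float:
    """g2(delta) = (delta + 1) log2(delta + 1) - delta log2(delta), as in (20.3.24)."""
    if delta < 0:
        raise ValueError("delta must be non-negative")
    if delta == 0:
        return 0.0  # limiting value, since x * log2(x) -> 0 as x -> 0
    return (delta + 1) * math.log2(delta + 1) - delta * math.log2(delta)


def squashed_key_bound(n: int, e_sq: float, eps: float) -> float:
    """Right-hand side of (20.3.31): an upper bound on log2 K."""
    if not 0 <= eps < 0.25:
        raise ValueError("the bound requires eps in [0, 1/4)")
    root = math.sqrt(eps)
    return (n * e_sq + 2 * g2(root)) / (1 - 2 * root)


if __name__ == "__main__":
    e_sq = 0.5  # hypothetical value of E_sq(N), chosen only for illustration
    n = 100
    for eps in (0.0, 0.01, 0.1):
        # Print the per-channel-use bound on the key rate for each error level.
        print(eps, squashed_key_bound(n, e_sq, eps) / n)
```

For fixed $\varepsilon > 0$ the prefactor $1/(1 - 2\sqrt{\varepsilon})$ multiplies the term $n \cdot E_{\mathrm{sq}}(\mathcal{N})$, so the per-use bound stays strictly above $E_{\mathrm{sq}}(\mathcal{N})$ even as $n$ grows.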


20.4 Relative Entropy of Entanglement Upper Bounds on the Number of Transmitted Private Bits

We now establish the max-relative entropy of entanglement bound on the number
of private bits that a sender can transmit to a receiver by employing a secret-key-
agreement protocol assisted by public separable channels:

Theorem 20.11 𝒏-Shot Max-Relative Entropy of Entanglement Upper Bound

Let N 𝐴→𝐵 be a quantum channel, and let 𝜀 ∈ [0, 1). For all (𝑛, 𝐾, 𝜀) secret-key-
agreement protocols assisted by public separable channels, over the channel
N 𝐴→𝐵 , the following bound holds

\begin{equation}
\log_2 K \leq n \cdot E_{\max}(\mathcal{N}) + \log_2\!\left(\frac{1}{1 - \varepsilon}\right). \tag{20.4.1}
\end{equation}

Proof: Given an arbitrary (𝑛, 𝐾, 𝜀) secret-key-agreement protocol assisted by
public separable channels as outlined in Section 20.2.4, we consider its equivalent
(𝑛, 𝐾, 𝜀) separable-assisted bipartite private-state distillation protocol. The max-
relative entropy of entanglement is an entanglement measure (monotone under
separable channels as shown in Proposition 9.16) and it is equal to zero
for separable states. Thus, Proposition 20.4 applies, and we find that
\begin{equation}
E_{\max}(K_A S_A; K_B S_B)_\omega \leq n \cdot E^{\mathcal{A}}_{\max}(\mathcal{N}) = n \cdot E_{\max}(\mathcal{N}), \tag{20.4.2}
\end{equation}

where the equality follows from Theorem 10.16. Applying Definition 20.1 and
(20.2.17) leads to
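\begin{equation}
F(\gamma_{K_A S_A K_B S_B}, \omega_{K_A S_A K_B S_B}) \geq 1 - \varepsilon, \tag{20.4.3}
\end{equation}
where $\gamma_{K_A S_A K_B S_B}$ is an exact private state of $\log_2 K$ private bits. As a consequence
of Propositions 15.15 and 7.71, we find that
\begin{align}
\log_2 K &\leq E_R^{\varepsilon}(S_A K_A; S_B K_B)_\omega \tag{20.4.4} \\
&\leq E_{\max}(S_A K_A; S_B K_B)_\omega + \log_2\!\left(\frac{1}{1 - \varepsilon}\right). \tag{20.4.5}
\end{align}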

Combining (20.4.2) and (20.4.5), we conclude the proof. ■
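
To get a feel for (20.4.1), the Python sketch below evaluates the right-hand side of the bound and also rearranges it into a lower bound on the error, $\varepsilon \geq 1 - 2^{-(\log_2 K - n E_{\max}(\mathcal{N}))}$, which is non-trivial whenever the target number of private bits exceeds $n \cdot E_{\max}(\mathcal{N})$. The function names and the numerical inputs are hypothetical and serve only as an illustration; the value of $E_{\max}(\mathcal{N})$ must be supplied by the user.

```python
import math


def emax_key_bound(n: int, e_max: float, eps: float) -> float:
    """Right-hand side of (20.4.1): an upper bound on log2 K."""
    if not 0 <= eps < 1:
        raise ValueError("eps must lie in [0, 1)")
    return n * e_max + math.log2(1 / (1 - eps))


def min_error(n: int, e_max: float, log2_k: float) -> float:
    """Smallest eps consistent with (20.4.1) when log2_k private bits are targeted."""
    excess = log2_k - n * e_max
    if excess <= 0:
        return 0.0  # the bound places no constraint on the error in this regime
    return 1 - 2 ** (-excess)


if __name__ == "__main__":
    n, e_max = 50, 1.0  # hypothetical channel with E_max(N) = 1
    for extra_bits in (0, 1, 5, 20):
        # Error floor when asking for extra_bits more than n * E_max private bits.
        print(extra_bits, min_error(n, e_max, n * e_max + extra_bits))
```

The error floor depends only on the excess $\log_2 K - n \cdot E_{\max}(\mathcal{N})$ and approaches one exponentially fast in that excess.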


For channels that are separable-simulable with associated resource states, as
given in Definition 4.26, we obtain upper bounds that can be even stronger:

Theorem 20.12 𝒏-Shot Rényi–REE Upper Bounds for Separable-Simulable Channels

Let N 𝐴→𝐵 be a quantum channel that is separable-simulable with associated
resource state 𝜃 𝑆𝐵′ , and let 𝜀 ∈ [0, 1). For all (𝑛, 𝐾, 𝜀) secret-key-agreement
protocols assisted by public separable channels, over the channel N 𝐴→𝐵 , the
following bounds hold for all 𝛼 > 1:

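\begin{align}
\log_2 K &\leq n \cdot \widetilde{E}_\alpha(S;B')_\theta + \frac{\alpha}{\alpha - 1} \log_2\!\left(\frac{1}{1 - \varepsilon}\right), \tag{20.4.6} \\
\log_2 K &\leq \frac{1}{1 - \varepsilon} \left[ n \cdot E_R(S;B')_\theta + h_2(\varepsilon) \right]. \tag{20.4.7}
\end{align}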

Proof: Given an arbitrary (𝑛, 𝐾, 𝜀) secret-key-agreement protocol assisted by
public separable channels as outlined in Section 20.2.4, we consider its equivalent
(𝑛, 𝐾, 𝜀) separable-assisted bipartite private-state distillation protocol. The Rényi
relative entropy of entanglement and relative entropy of entanglement are monotone
non-increasing under separable channels (Proposition 9.16), equal to zero for
separable states, and subadditive with respect to states (Proposition 9.16). As such,
Corollary 20.7 applies, and we find for 𝛼 > 1 that
\begin{align}
\widetilde{E}_\alpha(S_A K_A; S_B K_B)_\omega &\leq n \cdot \widetilde{E}_\alpha(S;B')_\theta, \tag{20.4.8} \\
E_R(S_A K_A; S_B K_B)_\omega &\leq n \cdot E_R(S;B')_\theta . \tag{20.4.9}
\end{align}

Applying Definition 20.1 and (20.2.17) leads to

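\begin{equation}
F(\gamma_{K_A S_A K_B S_B}, \omega_{K_A S_A K_B S_B}) \geq 1 - \varepsilon, \tag{20.4.10}
\end{equation}

where $\gamma_{K_A S_A K_B S_B}$ is an exact private state of $\log_2 K$ private bits. As a consequence
of Proposition 15.15, we have that

\begin{equation}
\log_2 K \leq E_R^{\varepsilon}(S_A K_A; S_B K_B)_\omega . \tag{20.4.11}
\end{equation}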

Applying Propositions 7.70 and 7.71, we find that

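\begin{align}
\log_2 K &\leq \widetilde{E}_\alpha(S_A K_A; S_B K_B)_\omega + \frac{\alpha}{\alpha - 1} \log_2\!\left(\frac{1}{1 - \varepsilon}\right), \tag{20.4.12} \\
\log_2 K &\leq \frac{1}{1 - \varepsilon} \left[ E_R(S_A K_A; S_B K_B)_\omega + h_2(\varepsilon) \right]. \tag{20.4.13}
\end{align}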
Putting together (20.4.8), (20.4.9), (20.4.12), and (20.4.13) concludes the proof. ■
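
The two bounds (20.4.6) and (20.4.7) behave differently in $n$ and $\varepsilon$, which the following Python sketch makes concrete. It is only an illustration under stated assumptions: the function names are hypothetical, and the values passed for $\widetilde{E}_\alpha(S;B')_\theta$ and $E_R(S;B')_\theta$ are placeholders that a user would have to compute for an actual resource state $\theta_{SB'}$.

```python
import math


def h2(p: float) -> float:
    """Binary entropy h2(p) = -p log2(p) - (1 - p) log2(1 - p)."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def renyi_ree_bound(n: int, e_alpha: float, alpha: float, eps: float) -> float:
    """Right-hand side of (20.4.6), for alpha > 1 and eps in [0, 1)."""
    if alpha <= 1 or not 0 <= eps < 1:
        raise ValueError("need alpha > 1 and eps in [0, 1)")
    return n * e_alpha + (alpha / (alpha - 1)) * math.log2(1 / (1 - eps))


def ree_bound(n: int, e_r: float, eps: float) -> float:
    """Right-hand side of (20.4.7), for eps in [0, 1)."""
    if not 0 <= eps < 1:
        raise ValueError("eps must lie in [0, 1)")
    return (n * e_r + h2(eps)) / (1 - eps)


if __name__ == "__main__":
    # Placeholder entanglement values, chosen only for illustration.
    e_alpha, alpha, e_r, eps = 0.35, 2.0, 0.30, 0.1
    for n in (1, 10, 100, 1000):
        print(n, renyi_ree_bound(n, e_alpha, alpha, eps) / n, ree_bound(n, e_r, eps) / n)
```

In (20.4.6) the $\varepsilon$-dependent term is additive and independent of $n$, so the per-use bound tends to $\widetilde{E}_\alpha(S;B')_\theta$ for every fixed $\varepsilon$. In (20.4.7) the factor $1/(1 - \varepsilon)$ multiplies $n \cdot E_R(S;B')_\theta$, so the per-use bound tends to $E_R(S;B')_\theta / (1 - \varepsilon)$ instead.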

20.5 Secret-Key-Agreement Capacities of Quantum Channels

In this section, we analyze the asymptotic capacities, and as before, the upper
bounds for the asymptotic capacities are straightforward consequences of the
non-asymptotic bounds given in Sections 20.3.2 and 20.4. The definitions of these
capacities are similar to what we have given previously, and so we only state them
here briefly.

Definition 20.13 Achievable Rate for Secret Key Agreement

Given a quantum channel N, a rate 𝑅 ∈ R+ is called an achievable rate for
secret key agreement over N if for all 𝜀 ∈ (0, 1], all 𝛿 > 0, and all sufficiently
large 𝑛, there exists an (𝑛, 2^{𝑛(𝑅−𝛿)}, 𝜀) secret-key-agreement protocol.

Definition 20.14 Secret-Key-Agreement Capacity of a Quantum Channel

The secret-key-agreement capacity of a quantum channel N, denoted by 𝑃↔ (N),
is defined as the supremum of all achievable rates, i.e.,

\begin{equation}
P^{\leftrightarrow}(\mathcal{N}) := \sup\{R : R \text{ is an achievable rate for } \mathcal{N}\}. \tag{20.5.1}
\end{equation}

Definition 20.15 Weak Converse Rate for Secret Key Agreement

Given a quantum channel N, a rate 𝑅 ∈ R+ is called a weak converse rate for
secret key agreement over N if every 𝑅′ > 𝑅 is not an achievable rate for N.


Definition 20.16 Strong Converse Rate for Secret Key Agreement

Given a quantum channel N, a rate 𝑅 ∈ R+ is called a strong converse rate for
secret key agreement over N if for all 𝜀 ∈ [0, 1), all 𝛿 > 0, and all sufficiently
large 𝑛, there does not exist an (𝑛, 2^{𝑛(𝑅+𝛿)}, 𝜀) secret-key-agreement protocol.

Definition 20.17 Strong Converse Secret-Key-Agreement Capacity of a Quantum Channel

The strong converse secret-key-agreement capacity of a quantum channel N,
denoted by $\widetilde{P}^{\leftrightarrow}(\mathcal{N})$, is defined as the infimum of all strong converse rates, i.e.,
\begin{equation}
\widetilde{P}^{\leftrightarrow}(\mathcal{N}) := \inf\{R : R \text{ is a strong converse rate for } \mathcal{N}\}. \tag{20.5.2}
\end{equation}

We have the exact same definitions for secret key agreement assisted by
public separable channels, and we use the notation $P^{\leftrightarrow}_{\mathrm{SEP}}$ to refer to the
public-separable-assisted secret-key-agreement capacity and $\widetilde{P}^{\leftrightarrow}_{\mathrm{SEP}}$ for the strong converse
secret-key-agreement capacity assisted by public separable channels.
Recall that, by definition, the following bounds hold

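\begin{align}
P^{\leftrightarrow}(\mathcal{N}) &\leq \widetilde{P}^{\leftrightarrow}(\mathcal{N}) \leq \widetilde{P}^{\leftrightarrow}_{\mathrm{SEP}}(\mathcal{N}), \tag{20.5.3} \\
P^{\leftrightarrow}(\mathcal{N}) &\leq P^{\leftrightarrow}_{\mathrm{SEP}}(\mathcal{N}) \leq \widetilde{P}^{\leftrightarrow}_{\mathrm{SEP}}(\mathcal{N}). \tag{20.5.4}
\end{align}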

As a direct consequence of the bound in Theorem 20.10 and methods similar to
those given in the proof of Theorem 11.23, we find the following:

Theorem 20.18 Squashed-Entanglement Weak-Converse Bound

The squashed entanglement of a channel N is a weak converse rate for secret
key agreement:
𝑃↔ (N) ≤ 𝐸 sq (N). (20.5.5)
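
A brief sketch of how the $n$-shot bound leads to this statement, under the assumption that the argument parallels the one the text points to for Theorem 11.23: suppose that $R$ is an achievable rate. By Definition 20.13, for every $\varepsilon \in (0, 1/4)$, every $\delta > 0$, and all sufficiently large $n$, there exists an $(n, 2^{n(R-\delta)}, \varepsilon)$ protocol. Setting $\log_2 K = n(R - \delta)$ in the bound of Theorem 20.10 and dividing by $n$ gives the first line below.

```latex
\begin{align}
R - \delta
&\leq \frac{1}{1 - 2\sqrt{\varepsilon}}
      \left( E_{\mathrm{sq}}(\mathcal{N}) + \frac{2}{n}\, g_2(\sqrt{\varepsilon}) \right)
      && \text{for all sufficiently large } n, \\
\Longrightarrow \quad R - \delta
&\leq \frac{E_{\mathrm{sq}}(\mathcal{N})}{1 - 2\sqrt{\varepsilon}}
      && \text{(limit } n \to \infty\text{)}, \\
\Longrightarrow \quad R
&\leq E_{\mathrm{sq}}(\mathcal{N})
      && \text{(limits } \varepsilon \to 0,\ \delta \to 0\text{)}.
\end{align}
```

Taking the supremum over all achievable rates $R$ then gives (20.5.5). The limit $\varepsilon \to 0$ is needed to remove the prefactor $1/(1 - 2\sqrt{\varepsilon})$, which is why this argument yields only a weak converse.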

As a direct consequence of the bound in Theorem 20.11 and methods similar to
those given in Section 11.2.3, we find that


Theorem 20.19 Max-Relative Entropy of Entanglement Strong-Converse Bound

The max-relative entropy of entanglement of a channel N is a strong converse
rate for secret key agreement assisted by public separable channels:
\begin{equation}
\widetilde{P}^{\leftrightarrow}_{\mathrm{SEP}}(\mathcal{N}) \leq E_{\max}(\mathcal{N}). \tag{20.5.6}
\end{equation}
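
As a sketch of why the bound (20.4.1) gives a strong converse (assuming the argument follows the pattern the text attributes to Section 11.2.3), fix $\varepsilon \in [0, 1)$ and $\delta > 0$, and suppose that an $(n, 2^{n(R+\delta)}, \varepsilon)$ protocol exists with $R = E_{\max}(\mathcal{N})$. Substituting $\log_2 K = n(E_{\max}(\mathcal{N}) + \delta)$ into (20.4.1) gives the following.

```latex
\begin{align}
n \left( E_{\max}(\mathcal{N}) + \delta \right)
&\leq n \cdot E_{\max}(\mathcal{N}) + \log_2\!\left(\frac{1}{1 - \varepsilon}\right) \\
\Longleftrightarrow \quad n
&\leq \frac{1}{\delta} \log_2\!\left(\frac{1}{1 - \varepsilon}\right) \\
\Longleftrightarrow \quad \varepsilon
&\geq 1 - 2^{-n\delta} .
\end{align}
```

Hence no such protocol exists once $n > \frac{1}{\delta} \log_2\!\left(\frac{1}{1 - \varepsilon}\right)$, which is the condition in Definition 20.16. Equivalently, the error of any protocol operating at a rate $\delta$ above $E_{\max}(\mathcal{N})$ tends to one exponentially fast in $n$.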

As a direct consequence of the bound in Theorem 20.12 and methods similar to
those given in Section 11.2.3, we find that

Theorem 20.20 Relative Entropy of Entanglement Strong-Converse Bound for Separable-Simulable Channels

Let N be a quantum channel that is separable-simulable with associated resource
state 𝜃 𝑆𝐵′ . Then the relative entropy of entanglement of 𝜃 𝑆𝐵′ is a strong converse
rate for secret key agreement assisted by public separable channels:
\begin{equation}
\widetilde{P}^{\leftrightarrow}_{\mathrm{SEP}}(\mathcal{N}) \leq E_R(S;B')_\theta . \tag{20.5.7}
\end{equation}

20.6 Examples
[IN PROGRESS]

20.7 Bibliographic Notes

Quantum key distribution is one of the first examples of a secret-key-agreement
protocol conducted over a quantum channel (Bennett and Brassard, 1984; Ekert,
1991). Secret key agreement was considered in classical information theory by
Maurer (1993); Ahlswede and Csiszár (1993). Secret key distillation from a
bipartite quantum state was studied by a number of researchers, including Devetak
and Winter (2005); Horodecki et al. (2005a); Christandl (2006); Horodecki et al.
(2008a, 2009a); Christandl et al. (2007, 2012), well before secret key agreement over
quantum channels was considered. One of the seminal insights in this domain, that
a tripartite secret-key-distillation protocol is equivalent to a bipartite private-state
distillation protocol, was made by Horodecki et al. (2005a, 2009a). These authors
also established the relative entropy of entanglement (Vedral and Plenio, 1998) as
an upper bound on the rate of a secret-key-distillation protocol. In earlier work,
Curty et al. (2004) observed that a separable state has no distillable secret key.
Secret key agreement over a quantum channel was formally defined by Takeoka
et al. (2014), who also observed that the aforementioned insight extends to this
setting, with more details being given by Kaur and Wilde (2017). The issue
of unbounded shield systems resulting from secret-key-distillation or secret-key-
agreement protocols was somewhat implicit in (Horodecki et al., 2005a, 2009a)
and discussed in more detail by Christandl et al. (2012); Wilde (2016); Wilde et al.
(2017). The relation between LOCC-assisted quantum communication and secret
key agreement was discussed by Wilde et al. (2017). The notion of secret key
agreement assisted by public separable channels is original to this book (including
the observation that various previously known bounds apply to these more general
protocols).
The use of teleportation simulation and relative entropy of entanglement as a
method for bounding rates of secret-key-agreement protocols was presented by
Pirandola et al. (2017).
The amortized entanglement bound for secret-key-agreement protocols (Propo-
sition 20.4) was contributed by Kaur and Wilde (2017). Corollary 20.5, as presented
here, is due to Kaur and Wilde (2017).
The squashed entanglement upper bound on the rate of a secret-key-agreement
protocol was established by Takeoka et al. (2014); Wilde (2016). Lemma 20.8 and
Proposition 20.9 are due to Wilde (2016).
The use of sandwiched relative entropy of entanglement for bounding key rates
was contributed by Wilde et al. (2017). The max-relative entropy of entanglement
of a state was introduced by Datta (2009b,a), and the generalization to channels by
Christandl and Müller-Hermes (2017). Lemmas 9.21 and 10.10 were established
by Berta and Wilde (2018). Proposition 10.16 was proven by Christandl and
Müller-Hermes (2017). The proof that we follow here was given by Berta and
Wilde (2018). The interpretation of Proposition 10.16 in terms of “amortization
collapse” was given by Berta and Wilde (2018). Theorem 20.11 was proven by
Christandl and Müller-Hermes (2017). The precise statement of Theorem 20.12
is due to Wilde et al. (2017); Kaur and Wilde (2017) (although it was stated for
LOCC-simulable channels in these papers).

Summary
[IN PROGRESS]

Appendix A

Analyzing General
Communication Scenarios
[IN PROGRESS]

Bibliography
A. Acín, E. Bagan, M. Baig, Ll. Masanes, and R. Muñoz Tapia. Multiple-copy two-state
discrimination with individual measurements. Physical Review A, 71:032338, March 2005. doi:
10.1103/PhysRevA.71.032338. URL https://link.aps.org/doi/10.1103/PhysRevA.71.
032338.

C. Adami and N. J. Cerf. von Neumann capacity of noisy quantum channels. Physical Review A,
56:3470–3483, November 1997. doi: 10.1103/PhysRevA.56.3470. URL https://link.aps.
org/doi/10.1103/PhysRevA.56.3470.

Dorit Aharonov, Alexei Kitaev, and Noam Nisan. Quantum Circuits with Mixed States. In
Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC ’98, page
20–30, New York, NY, USA, 1998. Association for Computing Machinery. ISBN 0897919629.
doi: 10.1145/276698.276708. URL https://doi.org/10.1145/276698.276708.

Rudolf Ahlswede and Imre Csiszár. Common randomness in information theory and cryptography.
I. Secret sharing. IEEE Transactions on Information Theory, 39:1121–1132, July 1993. URL
https://ieeexplore.ieee.org/document/243431.

Robert Alicki and Mark Fannes. Continuity of quantum conditional information. Journal of Physics
A: Mathematical and General, 37:L55–L57, January 2004. URL https://doi.org/10.1088/
0305-4470/37/5/l01.

Anurag Anshu, Vamsi Krishna Devabathini, and Rahul Jain. Quantum communication using coherent
rejection sampling. Physical Review Letters, 119:120506, September 2017. doi: 10.1103/
PhysRevLett.119.120506. URL https://link.aps.org/doi/10.1103/PhysRevLett.119.
120506.

Anurag Anshu, Mario Berta, Rahul Jain, and Marco Tomamichel. A minimax approach to one-shot
entropy inequalities. Journal of Mathematical Physics, 60:122201, December 2019. doi:
10.1063/1.5126723. URL https://doi.org/10.1063/1.5126723.

Anurag Anshu, Rahul Jain, and Naqueeb A. Warsi. Building blocks for communication over noisy
quantum networks. IEEE Transactions on Information Theory, 65:1287–1306, February 2019. doi:
10.1109/TIT.2018.2851297. URL https://ieeexplore.ieee.org/document/8399830.

Anurag Anshu, Rahul Jain, and Naqueeb A. Warsi. On the near-optimality of one-shot classical
communication over quantum channels. Journal of Mathematical Physics, 60:012204, January
2019. doi: 10.1063/1.5039796. URL https://doi.org/10.1063/1.5039796.

Huzihiro Araki. On an inequality of Lieb and Thirring. Letters in Mathematical Physics,
19:167–170, February 1990. ISSN 1573-0530. doi: 10.1007/BF01045887. URL
https://doi.org/10.1007/BF01045887.

S. Arora, E. Hazan, and S. Kale. Fast algorithms for approximate semidefinite programming using
the multiplicative weights update method. In 46th Annual IEEE Symposium on Foundations
of Computer Science (FOCS’05), pages 339–348, 2005. doi: 10.1109/SFCS.2005.35. URL
https://ieeexplore.ieee.org/document/1530726.

Sanjeev Arora and Satyen Kale. A combinatorial, primal-dual approach to semidefinite programs. In
Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, pages 227–236,
New York, NY, USA, June 2007. Association for Computing Machinery. ISBN 9781595936318.
doi: 10.1145/1250790.1250823. URL https://doi.org/10.1145/1250790.1250823.

Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a
meta-algorithm and applications. Theory of Computing, 8:121–164, 2012. doi: 10.4086/toc.
2012.v008a006. URL http://www.theoryofcomputing.org/articles/v008a006.

Koenraad Audenaert, Bart De Moor, Karl Gerd H. Vollbrecht, and Reinhard F. Werner. Asymptotic
relative entropy of entanglement for orthogonally invariant states. Physical Review A, 66:032310,
September 2002. doi: 10.1103/PhysRevA.66.032310. URL http://link.aps.org/doi/10.
1103/PhysRevA.66.032310.

Koenraad M. R. Audenaert. Comparisons between quantum state distinguishability measures.
Quantum Information and Computation, 14:31–38, January 2014. ISSN 1533-7146. URL
http://dl.acm.org/citation.cfm?id=2600498.2600500.

Koenraad M. R. Audenaert and Jens Eisert. Continuity bounds on the quantum relative entropy.
Journal of Mathematical Physics, 46:102104, October 2005. doi: 10.1063/1.2044667. URL
https://doi.org/10.1063/1.2044667.

Koenraad M. R. Audenaert, John Calsamiglia, Ramon Muñoz Tapia, Emilio Bagan, Lluis Masanes,
Antonio Acin, and Frank Verstraete. Discriminating states: The quantum Chernoff bound.
Physical Review Letters, 98:160501, April 2007. doi: 10.1103/PhysRevLett.98.160501. URL
http://link.aps.org/doi/10.1103/PhysRevLett.98.160501.

Koenraad M. R. Audenaert, Milàn Mosonyi, and Frank Verstraete. Quantum state discrimination
bounds for finite sample size. Journal of Mathematical Physics, 53:122205, December 2012.
doi: 10.1063/1.4768252. URL https://doi.org/10.1063/1.4768252.

Masashi Ban, Kouichi Yamazaki, and Osamu Hirota. Accessible information in combined and
sequential quantum measurements on a binary-state signal. Physical Review A, 55:22–26,
January 1997. doi: 10.1103/PhysRevA.55.22. URL https://link.aps.org/doi/10.1103/
PhysRevA.55.22.

Howard Barnum, Michael A. Nielsen, and Benjamin Schumacher. Information transmission through
a noisy quantum channel. Physical Review A, 57:4153–4175, June 1998. doi: 10.1103/PhysRevA.
57.4153. URL https://link.aps.org/doi/10.1103/PhysRevA.57.4153.

Howard Barnum, Emanuel Knill, and Michael A. Nielsen. On quantum fidelities and channel
capacities. IEEE Transactions on Information Theory, 46:1317–1329, July 2000. URL
https://ieeexplore.ieee.org/document/850671.

David Beckman, Daniel Gottesman, Michael A. Nielsen, and John Preskill. Causal and localizable
quantum operations. Physical Review A, 64:052309, October 2001. doi: 10.1103/PhysRevA.64.
052309. URL https://link.aps.org/doi/10.1103/PhysRevA.64.052309.

Salman Beigi. Sandwiched Rényi divergence satisfies data processing inequality. Journal of
Mathematical Physics, 54:122202, December 2013. URL https://aip.scitation.org/
doi/10.1063/1.4838855.

Salman Beigi, Nilanjana Datta, and Felix Leditzky. Decoding quantum information via the Petz
recovery map. Journal of Mathematical Physics, 57:082203, August 2016. URL https:
//doi.org/10.1063/1.4961515.

V. P. Belavkin and P. Staszewski. C*-algebraic generalization of relative entropy and entropy. Annales
de l’I.H.P. Physique théorique, 37:51–58, 1982. URL http://eudml.org/doc/76163.

Ingemar Bengtsson and Karol Zyczkowski. Geometry of Quantum States: An Introduction to
Quantum Entanglement. Cambridge University Press, second edition, 2017. doi:
10.1017/9781139207010.

Charles H. Bennett and Gilles Brassard. Quantum cryptography: Public key distribution and coin
tossing. In International Conference on Computer System and Signal Processing, IEEE, 1984,
pages 175–179, 1984.

Charles H. Bennett and Stephen J. Wiesner. Communication via one- and two-particle operators on
Einstein-Podolsky-Rosen states. Physical Review Letters, 69:2881–2884, November 1992. doi:
10.1103/PhysRevLett.69.2881. URL https://link.aps.org/doi/10.1103/PhysRevLett.
69.2881.

Charles H. Bennett, Gilles Brassard, Claude Crépeau, Richard Jozsa, Asher Peres, and William K.
Wootters. Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen
channels. Physical Review Letters, 70:1895–1899, March 1993. URL https://link.aps.
org/doi/10.1103/PhysRevLett.70.1895.

Charles H. Bennett, Gilles Brassard, Claude Crepeau, and Ueli M. Maurer. Generalized privacy
amplification. IEEE Transactions on Information Theory, 41:1915–1923, November 1995. doi:
10.1109/18.476316. URL https://ieeexplore.ieee.org/document/476316.

Charles H. Bennett, Herbert J. Bernstein, Sandu Popescu, and Benjamin Schumacher. Concentrating
partial entanglement by local operations. Physical Review A, 53:2046–2052, April 1996a. doi:
10.1103/PhysRevA.53.2046. URL https://link.aps.org/doi/10.1103/PhysRevA.53.
2046.

Charles H. Bennett, Gilles Brassard, Sandu Popescu, Benjamin Schumacher, John A. Smolin, and
William K. Wootters. Purification of noisy entanglement and faithful teleportation via noisy
channels. Physical Review Letters, 76:722–725, January 1996b. doi: 10.1103/PhysRevLett.76.
722. URL https://link.aps.org/doi/10.1103/PhysRevLett.76.722.

Charles H. Bennett, David P. DiVincenzo, John A. Smolin, and William K. Wootters. Mixed-state
entanglement and quantum error correction. Physical Review A, 54:3824–3851, November 1996c.
doi: 10.1103/PhysRevA.54.3824. URL https://link.aps.org/doi/10.1103/PhysRevA.
54.3824.

Charles H. Bennett, David P. DiVincenzo, and John A. Smolin. Capacities of quantum erasure
channels. Physical Review Letters, 78:3217–3220, April 1997. doi: 10.1103/PhysRevLett.78.
3217. URL https://link.aps.org/doi/10.1103/PhysRevLett.78.3217.

Charles H. Bennett, David P. DiVincenzo, Christopher A. Fuchs, Tal Mor, Eric Rains, Peter W.
Shor, John A. Smolin, and William K. Wootters. Quantum nonlocality without entanglement.
Physical Review A, 59:1070–1091, February 1999a. doi: 10.1103/PhysRevA.59.1070. URL
http://link.aps.org/doi/10.1103/PhysRevA.59.1070.

Charles H. Bennett, Peter W. Shor, John A. Smolin, and Ashish V. Thapliyal. Entanglement-assisted
classical capacity of noisy quantum channels. Physical Review Letters, 83:3081–3084, October
1999b. doi: 10.1103/PhysRevLett.83.3081. URL https://link.aps.org/doi/10.1103/
PhysRevLett.83.3081.

Charles H. Bennett, Peter W. Shor, John A. Smolin, and Ashish V. Thapliyal. Entanglement-
assisted capacity of a quantum channel and the reverse Shannon theorem. IEEE Transactions on
Information Theory, 48:2637–2655, October 2002. URL https://ieeexplore.ieee.org/
document/1035117.

Charles H. Bennett, Aram W. Harrow, Debbie W. Leung, and John A. Smolin. On the capacities
of bipartite Hamiltonians and unitary gates. IEEE Transactions on Information Theory, 49:
1895–1911, August 2003. ISSN 0018-9448. doi: 10.1109/TIT.2003.814935. URL https:
//ieeexplore.ieee.org/document/1214070.

Charles H. Bennett, Igor Devetak, Peter W. Shor, and John A. Smolin. Inequalities and separations
among assisted capacities of quantum channels. Physical Review Letters, 96:150502, April 2006.
URL https://link.aps.org/doi/10.1103/PhysRevLett.96.150502.

Charles H. Bennett, Igor Devetak, Aram W. Harrow, Peter W. Shor, and Andreas Winter. The
quantum reverse Shannon theorem and resource tradeoffs for simulating quantum channels. IEEE
Transactions on Information Theory, 60:2926–2959, May 2014. URL https://ieeexplore.
ieee.org/document/6757002.

Dominic W. Berry. Qubit channels that achieve capacity with two states. Physical Review A, 71:
032334, March 2005. URL https://link.aps.org/doi/10.1103/PhysRevA.71.032334.

Mario Berta. Single-shot quantum state merging. Diploma thesis, ETH Zurich, February 2008.

Mario Berta and Mark M. Wilde. Amortization does not enhance the max-Rains information of a
quantum channel. New Journal of Physics, 20:053044, May 2018. doi: 10.1088/1367-2630/
aac153. URL https://doi.org/10.1088/1367-2630/aac153.

Mario Berta, Omar Fawzi, and Marco Tomamichel. On variational expressions for quantum
relative entropies. Letters in Mathematical Physics, 107:2239–2265, December 2017. URL
https://doi.org/10.1007/s11005-017-0990-7.

Reinhold A. Bertlmann and Philipp Krammer. Bloch vectors for qudits. Journal of Physics A:
Mathematical and Theoretical, 41:235303, May 2008. doi: 10.1088/1751-8113/41/23/235303.
URL https://doi.org/10.1088/1751-8113/41/23/235303.

Rajendra Bhatia. Matrix Analysis. Springer New York, 1997. doi: 10.1007/978-1-4612-0653-8.

Igor Bjelakovic and Rainer Siegmund-Schultze. Quantum Stein’s lemma revisited, inequalities for
quantum entropies, and a concavity theorem of Lieb. July 2003.

Garry Bowen. Quantum feedback channels. IEEE Transactions on Information Theory, 50:
2429–2434, October 2004. URL https://ieeexplore.ieee.org/document/1337116.

Garry Bowen and Sougato Bose. Teleportation as a depolarizing quantum channel, relative entropy,
and classical capacity. Physical Review Letters, 87:267901, December 2001. doi: 10.1103/
PhysRevLett.87.267901. URL https://link.aps.org/doi/10.1103/PhysRevLett.87.
267901.

Garry Bowen and Rajagopal Nagarajan. On feedback and the classical capacity of a noisy
quantum channel. IEEE Transactions on Information Theory, 51:320–324, January 2005. URL
https://ieeexplore.ieee.org/document/1365361.

Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

Fernando G. S. L. Brandao and Nilanjana Datta. One-shot rates for entanglement manipulation
under non-entangling maps. IEEE Transactions on Information Theory, 57:1754–1760, March
2011. ISSN 0018-9448. doi: 10.1109/TIT.2011.2104531. URL https://ieeexplore.ieee.
org/document/5714245.

Fernando G.S.L. Brandao, Matthias Christandl, and Jon Yard. Faithful squashed entanglement.
Communications in Mathematical Physics, 306:805–830, September 2011. ISSN 0010-3616. doi:
10.1007/s00220-011-1302-1. URL http://dx.doi.org/10.1007/s00220-011-1302-1.

Sarah Brandsen, Mengke Lian, Kevin D. Stubbs, Narayanan Rengaswamy, and Henry D. Pfister.
Adaptive Procedures for Discriminating Between Arbitrary Tensor-Product Quantum States. In
2020 IEEE International Symposium on Information Theory (ISIT), pages 1933–1938, 2020.
doi: 10.1109/ISIT44484.2020.9174234. URL https://ieeexplore.ieee.org/abstract/
document/9174234.

Samuel L. Braunstein and H. J. Kimble. Teleportation of continuous quantum variables. Physical
Review Letters, 80:869–872, January 1998. URL
https://link.aps.org/doi/10.1103/PhysRevLett.80.869.

Samuel L. Braunstein, Giacomo M. D’Ariano, Gerard J. Milburn, and Massimiliano F. Sacchi.
Universal teleportation with a twist. Physical Review Letters, 84:3486–3489, April 2000. doi:
10.1103/PhysRevLett.84.3486. URL https://link.aps.org/doi/10.1103/PhysRevLett.
84.3486.

Heinz-Peter Breuer and Francesco Petruccione. The Theory of Open Quantum Systems. Oxford
University Press, 2002.

Dorje Brody and Bernhard Meister. Minimum decision cost for quantum ensembles. Physical
Review Letters, 76:1–5, January 1996. doi: 10.1103/PhysRevLett.76.1. URL https://link.
aps.org/doi/10.1103/PhysRevLett.76.1.

Francesco Buscemi and Nilanjana Datta. The quantum capacity of channels with arbitrarily
correlated noise. IEEE Transactions on Information Theory, 56:1447–1460, March 2010a.
ISSN 0018-9448. doi: 10.1109/TIT.2009.2039166. URL https://ieeexplore.ieee.org/
document/5429118.

Francesco Buscemi and Nilanjana Datta. Distilling entanglement from arbitrary resources. Jour-
nal of Mathematical Physics, 51:102201, October 2010b. doi: http://dx.doi.org/10.1063/
1.3483717. URL http://scitation.aip.org/content/aip/journal/jmp/51/10/10.
1063/1.3483717.

Mark S. Byrd and Navin Khaneja. Characterization of the positivity of the density matrix in terms
of the coherence vector representation. Physical Review A, 68:062322, December 2003. doi:
10.1103/PhysRevA.68.062322. URL https://link.aps.org/doi/10.1103/PhysRevA.68.
062322.

Ning Cai, Andreas Winter, and Raymond W. Yeung. Quantum privacy and quantum wiretap
channels. Problems of Information Transmission, 40:318–336, October 2004. ISSN 0032-9460.
URL http://dx.doi.org/10.1007/s11122-005-0002-x.

Gianfranco Cariolaro and Tomaso Erseghe. Pulse Position Modulation. John Wiley & Sons, Inc.,
2003. ISBN 9780471219286. doi: 10.1002/0471219282.eot394. URL http://dx.doi.org/
10.1002/0471219282.eot394.

Eric A. Carlen. Trace inequalities and quantum entropy: An introductory course. Contempo-
rary Mathematics, 529:73–140, 2010. URL http://www.ueltschi.org/AZschool/notes/
EricCarlen.pdf.

Filippo Caruso and Vittorio Giovannetti. Degradability of bosonic Gaussian channels. Physical
Review A, 74:062307, December 2006. URL https://journals.aps.org/pra/abstract/
10.1103/PhysRevA.74.062307.

Nicholas J. Cerf and Christoph Adami. Negative entropy and information in quantum mechanics.
Physical Review Letters, 79:5194–5197, December 1997. doi: 10.1103/PhysRevLett.79.5194.
URL https://link.aps.org/doi/10.1103/PhysRevLett.79.5194.

Nicolas J. Cerf and Chris Adami. Information theory of quantum entanglement and measurement.
Physica D: Nonlinear Phenomena, 120:62–81, September 1998. ISSN 0167-2789. doi:
https://doi.org/10.1016/S0167-2789(98)00045-1. URL http://www.sciencedirect.com/
science/article/pii/S0167278998000451.

Herman Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the
sum of observations. The Annals of Mathematical Statistics, 23:493–507, 12 1952. doi:
10.1214/aoms/1177729330. URL https://doi.org/10.1214/aoms/1177729330.

Giulio Chiribella, Giacomo Mauro D’Ariano, and Paolo Perinotti. Realization schemes for quantum
instruments in finite dimensions. Journal of Mathematical Physics, 50:042101, April 2009. doi:
10.1063/1.3105923. URL http://dx.doi.org/10.1063/1.3105923.

Eric Chitambar, Debbie Leung, Laura Mančinska, Maris Ozols, and Andreas Winter. Everything you
always wanted to know about LOCC (but were afraid to ask). Communications in Mathematical
Physics, 328:303–326, May 2014. ISSN 1432-0916. doi: 10.1007/s00220-014-1953-9. URL
http://dx.doi.org/10.1007/s00220-014-1953-9.

Eric Chitambar, Julio I. de Vicente, Mark W. Girard, and Gilad Gour. Entanglement manipulation
and distillability beyond LOCC. Journal of Mathematical Physics, 61:042201, April 2020. URL
https://doi.org/10.1063/1.5124109.

Man-Duen Choi. Completely positive linear maps on complex matrices. Linear Algebra and
Its Applications, 10:285–290, 1975. URL https://www.sciencedirect.com/science/
article/pii/0024379575900750.

Matthias Christandl. The Structure of Bipartite Quantum States: Insights from Group Theory and
Cryptography. PhD thesis, University of Cambridge, April 2006.

Matthias Christandl and Alexander Müller-Hermes. Relative entropy bounds on quantum, private
and repeater capacities. Communications in Mathematical Physics, 353:821–852, July 2017. doi:
10.1007/s00220-017-2885-y. URL https://doi.org/10.1007/s00220-017-2885-y.

Matthias Christandl and Andreas Winter. ‘Squashed entanglement’: An additive entanglement
measure. Journal of Mathematical Physics, 45:829–840, March 2004. URL
https://doi.org/10.1063/1.1643788.

Matthias Christandl, Artur Ekert, Michal Horodecki, Pawel Horodecki, Jonathan Oppenheim, and
Renato Renner. Unifying classical and quantum key distillation. Proceedings of the 4th Theory
of Cryptography Conference, Lecture Notes in Computer Science, 4392:456–478, February 2007.
URL https://doi.org/10.1007/978-3-540-70936-7_25.

Matthias Christandl, Norbert Schuch, and Andreas Winter. Entanglement of the antisymmetric
state. Communications in Mathematical Physics, 311:397–422, March 2012. URL https:
//doi.org/10.1007/s00220-012-1446-7.

Benoît Collins. Moments and cumulants of polynomial random variables on unitary groups, the
Itzykson-Zuber integral, and free probability. International Mathematics Research Notices,
2003:953–982, 01 2003. ISSN 1073-7928. doi: 10.1155/S107379280320917X. URL https:
//doi.org/10.1155/S107379280320917X.

Benoît Collins and Piotr Śniady. Integration with Respect to the Haar Measure on Unitary,
Orthogonal and Symplectic Group. Communications in Mathematical Physics, 264:773–795,
June 2006. ISSN 1432-0916. doi: 10.1007/s00220-006-1554-3. URL https://doi.org/10.
1007/s00220-006-1554-3.

Tom Cooney, Milan Mosonyi, and Mark M. Wilde. Strong converse exponents for a quantum
channel discrimination problem and quantum-feedback-assisted communication. Communications
in Mathematical Physics, 344:797–829, June 2016. URL
https://doi.org/10.1007/s00220-016-2645-4.

John Cortese. Relative entropy and single qubit Holevo-Schumacher-Westmoreland channel capacity.
July 2002.

Imre Csiszár and Janos Körner. Broadcast channels with confidential messages. IEEE Transactions
on Information Theory, 24:339–348, May 1978. URL https://ieeexplore.ieee.org/
document/1055892.

Toby Cubitt, David Elkouss, William Matthews, Maris Ozols, David Pérez-García, and Sergii
Strelchuk. Unbounded number of channel uses may be required to detect quantum capacity.
Nature Communications, 6:6739, 2015. URL https://doi.org/10.1038/ncomms7739.

Toby S. Cubitt, Mary Beth Ruskai, and Graeme Smith. The structure of degradable quantum
channels. Journal of Mathematical Physics, 49:102104, 2008. URL https://doi.org/10.
1063/1.2953685.

Marcos Curty, Maciej Lewenstein, and Norbert Lütkenhaus. Entanglement as a precondition for
secure quantum key distribution. Physical Review Letters, 92:217903, May 2004. doi: 10.
1103/PhysRevLett.92.217903. URL https://link.aps.org/doi/10.1103/PhysRevLett.
92.217903.

Nilanjana Datta. Max-relative entropy of entanglement, alias log robustness. International Journal
of Quantum Information, 7:475–491, January 2009a. URL https://www.worldscientific.
com/doi/abs/10.1142/S0219749909005298.

Nilanjana Datta. Min- and max-relative entropies and a new entanglement monotone. IEEE
Transactions on Information Theory, 55:2816–2826, June 2009b. URL https://ieeexplore.
ieee.org/document/4957651.

Nilanjana Datta and Min-Hsiu Hsieh. One-shot entanglement-assisted quantum and classical
communication. IEEE Transactions on Information Theory, 59:1929–1939, March 2013. URL
https://ieeexplore.ieee.org/document/6359930.

Nilanjana Datta and Felix Leditzky. A limit of the quantum Rényi divergence. Journal of Physics
A: Mathematical and Theoretical, 47:045304, January 2014. URL http://stacks.iop.org/
1751-8121/47/i=4/a=045304.

Nilanjana Datta, Milan Mosonyi, Min-Hsiu Hsieh, and Fernando G. S. L. Brandão. A smooth entropy
approach to quantum hypothesis testing and the classical capacity of quantum channels. IEEE
Transactions on Information Theory, 59:8014–8026, December 2013. ISSN 0018-9448. doi:
10.1109/TIT.2013.2282160. URL https://ieeexplore.ieee.org/document/6670246.

Nilanjana Datta, Marco Tomamichel, and Mark M. Wilde. On the second-order asymptotics for
entanglement-assisted communication. Quantum Information Processing, 15:2569–2591, June
2016. URL https://doi.org/10.1007/s11128-016-1272-5.

Edward B. Davies and J. T. Lewis. An operational approach to quantum probability. Communications
in Mathematical Physics, 17:239–260, 1970. URL https://doi.org/10.1007/BF01647093.

John de Pillis. Linear transformations which preserve Hermitian and positive semidefinite
operators. Pacific Journal of Mathematics, 23:129–137, 1967. doi: pjm/1102991990. URL
https://projecteuclid.org/journals/pacific-journal-of-mathematics/volume-23/issue-1/Linear-transformations-which-preserve-hermitian-and-positive-semidefinite-operators/pjm/1102991990.full.

Igor Devetak. The private classical capacity and quantum capacity of a quantum channel. IEEE
Transactions on Information Theory, 51:44–55, January 2005. doi: 10.1109/TIT.2004.839515.
URL https://ieeexplore.ieee.org/document/1377491.

Igor Devetak and Peter W. Shor. The Capacity of a Quantum Channel for Simultaneous Transmission
of Classical and Quantum Information. Communications in Mathematical Physics, 256:287–303,
June 2005. URL https://doi.org/10.1007/s00220-005-1317-6.

Igor Devetak and Andreas Winter. Distillation of secret key and entanglement from quantum
states. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering
Sciences, 461:207–235, January 2005. doi: 10.1098/rspa.2004.1372. URL https://doi.org/
10.1098/rspa.2004.1372.

Igor Devetak, Marius Junge, Christopher King, and Mary Beth Ruskai. Multiplicativity of completely
bounded 𝑝-norms implies a new additivity result. Communications in Mathematical Physics,
266:37–63, August 2006. URL https://doi.org/10.1007/s00220-006-0034-0.

Dawei Ding and Mark M. Wilde. Strong converse for the feedback-assisted classical capacity of
entanglement-breaking channels. Problems of Information Transmission, 54:1–19, 2018. URL
https://doi.org/10.1134/S0032946018010015.

Dawei Ding, Yihui Quek, Peter W. Shor, and Mark M. Wilde. Entropy bound for the classical
capacity of a quantum channel assisted by classical feedback. In Proceedings of the 2019 IEEE
International Symposium on Information Theory, pages 250–254, Paris, France, July 2019. URL
https://ieeexplore.ieee.org/document/8849604.

Dawei Ding, Sumeet Khatri, Yihui Quek, Peter W. Shor, Xin Wang, and Mark M. Wilde.
Bounding the forward classical capacity of bipartite quantum channels. IEEE Transactions
on Information Theory, 69(5):3034–3061, May 2023. doi: 10.1109/TIT.2022.3233924. URL
https://ieeexplore.ieee.org/document/10005080.

David P. DiVincenzo, Peter W. Shor, and John A. Smolin. Quantum-channel capacity of very noisy
channels. Physical Review A, 57:830–839, February 1998. doi: 10.1103/PhysRevA.57.830.
URL https://link.aps.org/doi/10.1103/PhysRevA.57.830.

David P. DiVincenzo, Peter W. Shor, John A. Smolin, Barbara M. Terhal, and Ashish V. Thapliyal.
Evidence for bound entangled states with negative partial transpose. Physical Review A, 61,
May 2000. doi: 10.1103/physreva.61.062312. URL https://link.aps.org/doi/10.1103/
PhysRevA.61.062312.

Andrew C. Doherty, Pablo A. Parrilo, and Federico M. Spedalieri. Complete family of separability
criteria. Physical Review A, 69:022308, February 2004. doi: 10.1103/PhysRevA.69.022308.
URL https://link.aps.org/doi/10.1103/PhysRevA.69.022308.

Frederic Dupuis. The decoupling approach to quantum information theory. PhD thesis, University
of Montreal, April 2010.

Frédéric Dupuis and Mark M. Wilde. Swiveled Rényi entropies. Quantum Information Processing,
15:1309–1345, March 2016. ISSN 1573-1332. doi: 10.1007/s11128-015-1211-x. URL
http://dx.doi.org/10.1007/s11128-015-1211-x.

Frederic Dupuis, Lea Kraemer, Philippe Faist, Joseph M. Renes, and Renato Renner. Generalized
entropies. XVIIth International Congress on Mathematical Physics, pages 134–153, 2013. URL
https://doi.org/10.1142/9789814449243_0008.

Frédéric Dupuis, Mario Berta, Jürg Wullschleger, and Renato Renner. One-shot decoupling.
Communications in Mathematical Physics, 328:251–284, May 2014. ISSN 1432-0916. doi:
10.1007/s00220-014-1990-4. URL http://dx.doi.org/10.1007/s00220-014-1990-4.

Wolfgang Dür, J. Ignacio Cirac, Maciej Lewenstein, and Dagmar Bruß. Distillability and partial
transposition in bipartite systems. Physical Review A, 61:062313, May 2000. doi: 10.
1103/PhysRevA.61.062313. URL https://link.aps.org/doi/10.1103/PhysRevA.61.
062313.

T. Eggeling, D. Schlingemann, and Reinhard F. Werner. Semicausal operations are semilocalizable.
Europhysics Letters, 57:782–788, March 2002. doi: 10.1209/epl/i2002-00579-4. URL
https://doi.org/10.1209%2Fepl%2Fi2002-00579-4.

Harold G. Eggleston. Convexity. Cambridge University Press, 1958.

Artur K. Ekert. Quantum cryptography based on Bell’s theorem. Physical Review Letters, 67:661–
663, August 1991. URL https://link.aps.org/doi/10.1103/PhysRevLett.67.661.

David Elkouss and Sergii Strelchuk. Superadditivity of private information for any number of uses
of the channel. Physical Review Letters, 115:040501, July 2015. doi: 10.1103/PhysRevLett.115.
040501. URL http://link.aps.org/doi/10.1103/PhysRevLett.115.040501.

Kun Fang and Hamza Fawzi. Geometric Rényi divergence and its applications in quantum
channel capacities. Communications in Mathematical Physics, 384:1615–1677, June 2021. doi:
10.1007/s00220-021-04064-4. URL https://doi.org/10.1007/s00220-021-04064-4.

William Feller. An Introduction to Probability Theory and Its Applications, volume 1. Wiley, third
edition, 1968.

Rupert L. Frank and Elliott H. Lieb. Monotonicity of a relative Rényi entropy. Journal of
Mathematical Physics, 54:122201, December 2013. URL https://doi.org/10.1063/1.
4838835.

Bert Fristedt and Lawrence Gray. A Modern Approach to Probability Theory. Birkhäuser, Boston,
1997. ISBN 978-1-4899-2839-9.

Christopher A. Fuchs and Carlton M. Caves. Mathematical techniques for quantum communication
theory. Open Systems & Information Dynamics, 3:345–356, 1995. doi: 10.1007/BF02228997.
URL https://doi.org/10.1007/BF02228997.

Christopher A. Fuchs and Jeroen van de Graaf. Cryptographic distinguishability measures for
quantum mechanical states. IEEE Transactions on Information Theory, 45:1216–1227, May
1998. URL https://ieeexplore.ieee.org/document/761271.

Jun-Ichi Fujii, Masatoshi Fujii, and Ritsuo Nakamoto. Jensen’s operator inequality and its
application. Sûrikaisekikenkyûsho Kôkyûroku, 1396:85–93, March 2004. URL http://www.
kurims.kyoto-u.ac.jp/~kyodo/kokyuroku/contents/pdf/1396-10.pdf.

Jingliang Gao. Quantum union bounds for sequential projective measurements. Physical Review A,
92:052331, November 2015. doi: 10.1103/PhysRevA.92.052331. URL https://link.aps.
org/doi/10.1103/PhysRevA.92.052331.

Raúl García-Patrón, Stefano Pirandola, Seth Lloyd, and Jeffrey H. Shapiro. Reverse coherent
information. Physical Review Letters, 102:210501, May 2009. doi: 10.1103/PhysRevLett.102.
210501. URL https://link.aps.org/doi/10.1103/PhysRevLett.102.210501.

Raul García-Patrón, William Matthews, and Andreas Winter. Quantum enhancement of randomness
distribution. IEEE Transactions on Information Theory, 64:4664–4673, June 2018. URL
https://ieeexplore.ieee.org/document/8328871.

I. Gelfand and Mark Aronovich Naimark. On the imbedding of normed rings into the
ring of operators in Hilbert space. Rec. Math. [Mat. Sbornik] N.S., 12(54):197–217,
1943. URL http://www.mathnet.ru/php/archive.phtml?wshow=paper&jrnid=sm&
paperid=6155&option_lang=eng.

Sevag Gharibian. Strong NP-hardness of the quantum separability problem. Quantum Information
and Computation, 10:343–360, March 2010. ISSN 1533-7146. URL https://doi.org/10.
26421/QIC10.3-4-11.

Géza Giedke and J. Ignacio Cirac. Characterization of Gaussian operations and distillation of
Gaussian states. Physical Review A, 66:032316, September 2002. doi: 10.1103/PhysRevA.66.
032316. URL https://link.aps.org/doi/10.1103/PhysRevA.66.032316.

Alexei Gilchrist, Nathan K. Langford, and Michael A. Nielsen. Distance measures to compare
real and ideal quantum processes. Physical Review A, 71:062310, June 2005. URL https:
//link.aps.org/doi/10.1103/PhysRevA.71.062310.

Vittorio Giovannetti and Rosario Fazio. Information-capacity description of spin-chain correlations.
Physical Review A, 71:032314, March 2005. doi: 10.1103/PhysRevA.71.032314. URL
https://link.aps.org/doi/10.1103/PhysRevA.71.032314.

Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone. Achieving the Holevo bound via sequential
measurements. Physical Review A, 85:012302, January 2012. URL https://link.aps.org/
doi/10.1103/PhysRevA.85.012302.

Daniel Gottesman and Isaac L. Chuang. Demonstrating the viability of universal quantum
computation using teleportation and single-qubit operations. Nature, 402:390–393, November
1999. doi: 10.1038/46503. URL https://doi.org/10.1038/46503.

Markus Grassl, Thomas Beth, and Thomas Pellizzari. Codes for the quantum erasure channel.
Physical Review A, 56:33–38, July 1997. doi: 10.1103/PhysRevA.56.33. URL https://link.
aps.org/doi/10.1103/PhysRevA.56.33.

Manish Gupta and Mark M. Wilde. Multiplicativity of completely bounded 𝑝-norms implies a
strong converse for entanglement-assisted capacity. Communications in Mathematical Physics,
334:867–887, March 2015. URL https://doi.org/10.1007/s00220-014-2212-9.

Leonid Gurvits. Classical complexity and quantum entanglement. Journal of Computer and System
Sciences, 69:448–484, 2004. ISSN 0022-0000. doi: 10.1016/j.jcss.2004.06.003.
URL http://www.sciencedirect.com/science/article/pii/S0022000004000893.

Brian C. Hall. Quantum Theory for Mathematicians. Graduate Texts in Mathematics. Springer New
York, 2013. ISBN 9781461471165.

Frank Hansen and Gert K. Pedersen. Jensen’s operator inequality. Bulletin of the London
Mathematical Society, 35:553–564, July 2003. ISSN 1469-2120. doi: 10.1112/S0024609303002200.
URL http://dx.doi.org/10.1112/S0024609303002200.

Aram W. Harrow. The church of the symmetric subspace. 2013.

Matthew B. Hastings. Superadditivity of communication capacity using entangled inputs. Nature
Physics, 5:255–257, April 2009. URL https://doi.org/10.1038/nphys1224.

Paul Hausladen, Richard Jozsa, Benjamin Schumacher, Michael Westmoreland, and William K.
Wootters. Classical information capacity of a quantum channel. Physical Review A, 54:1869–
1876, September 1996. doi: 10.1103/PhysRevA.54.1869. URL https://link.aps.org/doi/
10.1103/PhysRevA.54.1869.

Masahito Hayashi. Error exponent in asymmetric quantum hypothesis testing and its application
to classical-quantum channel coding. Physical Review A, 76:062301, December 2007. doi:
10.1103/PhysRevA.76.062301. URL https://link.aps.org/doi/10.1103/PhysRevA.76.
062301.

Masahito Hayashi. Quantum Information Theory: Mathematical Foundation. Springer, second
edition, 2017.

Masahito Hayashi and Hiroshi Nagaoka. General formulas for capacity of classical-quantum
channels. IEEE Transactions on Information Theory, 49:1753–1768, July 2003. URL https:
//ieeexplore.ieee.org/document/1023343.

Patrick Hayden, Richard Jozsa, Dénes Petz, and Andreas Winter. Structure of states which satisfy
strong subadditivity of quantum entropy with equality. Communications in Mathematical Physics,
246:359–374, April 2004. URL https://doi.org/10.1007/s00220-004-1049-z.

Patrick Hayden, Michał Horodecki, Andreas Winter, and Jon Yard. A decoupling approach
to the quantum capacity. Open Systems & Information Dynamics, 15:7–19, 2008a. doi:
10.1142/S1230161208000043. URL https://doi.org/10.1142/S1230161208000043.

Patrick Hayden, Peter W. Shor, and Andreas Winter. Random quantum codes from Gaussian
ensembles and an uncertainty relation. Open Systems & Information Dynamics, 15:71–89, March
2008b. URL https://doi.org/10.1142/S1230161208000079.

Teiko Heinosaari and Mário Ziman. The Mathematical Language of Quantum Theory: From
Uncertainty to Entanglement. Cambridge University Press, 2012.

Carl W. Helstrom. Detection theory and quantum mechanics. Information and Control, 10:254–291,
1967. ISSN 0019-9958. URL https://doi.org/10.1016/S0019-9958(67)90302-6.

Carl W. Helstrom. Quantum detection and estimation theory. Journal of Statistical Physics, 1:
231–252, 1969. ISSN 0022-4715. doi: 10.1007/BF01007479. URL https://doi.org/10.
1007/BF01007479.

Carl W. Helstrom. Quantum Detection and Estimation Theory. Academic Press, 1976.

Fumio Hiai and Milán Mosonyi. Different quantum 𝑓-divergences and the reversibility of quantum
operations. Reviews in Mathematical Physics, 29:1750023, August 2017. URL https:
//doi.org/10.1142/S0129055X17500234.

Fumio Hiai and Dénes Petz. The proper formula for relative entropy and its asymptotics in quantum
probability. Communications in Mathematical Physics, 143:99–114, December 1991. URL
https://doi.org/10.1007/BF02100287.

B. L. Higgins, A. C. Doherty, S. D. Bartlett, G. J. Pryde, and H. M. Wiseman. Multiple-copy state
discrimination: Thinking globally, acting locally. Physical Review A, 83:052314, May 2011. doi:
10.1103/PhysRevA.83.052314. URL https://link.aps.org/doi/10.1103/PhysRevA.83.
052314.

Foek T. Hioe and Joseph H. Eberly. 𝑛-level coherence vector and higher conservation laws in quantum
optics and quantum mechanics. Physical Review Letters, 47:838–841, September 1981. doi:
10.1103/PhysRevLett.47.838. URL https://link.aps.org/doi/10.1103/PhysRevLett.
47.838.

Alexander S. Holevo. An analogue of statistical decision theory and noncommutative
probability theory. Trudy Moskovskogo Matematicheskogo Obshchestva, 26:133–149, 1972a. URL
http://www.mathnet.ru/php/archive.phtml?wshow=paper&jrnid=mmo&paperid=260&option_lang=eng.

Alexander S. Holevo. On quasiequivalence of locally normal states. Theoretical and Mathematical
Physics, 13:1071–1082, November 1972b. ISSN 1573-9333. URL
https://doi.org/10.1007/BF01035528.

Alexander S. Holevo. Bounds for the quantity of information transmitted by a quantum
communication channel. Problems of Information Transmission, 9:177–183, 1973. URL
http://www.mathnet.ru/php/archive.phtml?wshow=paper&jrnid=ppi&paperid=903&option_lang=eng.

Alexander S. Holevo. The capacity of the quantum channel with general signal states. IEEE
Transactions on Information Theory, 44:269–273, January 1998. URL https://ieeexplore.
ieee.org/document/651037.

Alexander S. Holevo. On entanglement assisted classical capacity. Journal of Mathematical Physics,
43:4326–4333, September 2002a. URL https://doi.org/10.1063/1.1495877.

Alexander S. Holevo. Remarks on the classical capacity of quantum channel. December 2002b.

Alexander S. Holevo. Multiplicativity of 𝑝-norms of completely positive maps and the additivity
problem in quantum information theory. Russian Mathematical Surveys, 61:301–339, 2006.
URL https://doi.org/10.1070/rm2006v061n02abeh004313.

Alexander S. Holevo. Quantum Systems, Channels, Information: A Mathematical Introduction,
volume 16. Walter de Gruyter, 2013.

Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University
Press, 2013. ISBN 9780521839402.

Karol Horodecki, Michał Horodecki, Paweł Horodecki, and Jonathan Oppenheim. Secure key
from bound entanglement. Physical Review Letters, 94:160502, April 2005a. doi: 10.
1103/PhysRevLett.94.160502. URL http://link.aps.org/doi/10.1103/PhysRevLett.
94.160502.

Karol Horodecki, Michał Horodecki, Paweł Horodecki, Debbie Leung, and Jonathan Oppenheim.
Quantum key distribution based on private states: Unconditional security over untrusted channels
with zero quantum capacity. IEEE Transactions on Information Theory, 54:2604–2620, June
2008a. ISSN 0018-9448. doi: 10.1109/TIT.2008.921870. URL https://ieeexplore.ieee.
org/document/4529275.

Karol Horodecki, Michał Horodecki, Paweł Horodecki, Debbie Leung, and Jonathan Oppenheim.
Unconditional privacy over channels which cannot convey quantum information. Physical
Review Letters, 100:110502, March 2008b. doi: 10.1103/PhysRevLett.100.110502. URL
http://link.aps.org/doi/10.1103/PhysRevLett.100.110502.

Karol Horodecki, Michal Horodecki, Pawel Horodecki, and Jonathan Oppenheim. General paradigm
for distilling classical key from quantum states. IEEE Transactions on Information Theory, 55:
1898–1929, April 2009a. URL https://ieeexplore.ieee.org/document/4802308.

Michał Horodecki. Simplifying monotonicity conditions for entanglement measures. Open Systems
& Information Dynamics, 12:231–237, September 2005. URL https://doi.org/10.1007/
s11080-005-0920-5.

Michał Horodecki and Paweł Horodecki. Reduction criterion of separability and limits for a class of
distillation protocols. Physical Review A, 59:4206–4216, June 1999. doi: 10.1103/PhysRevA.59.
4206. URL http://link.aps.org/doi/10.1103/PhysRevA.59.4206.

Michał Horodecki, Paweł Horodecki, and Ryszard Horodecki. Separability of mixed states:
necessary and sufficient conditions. Physics Letters A, 223:1–8, November 1996. doi:
10.1016/s0375-9601(96)00706-2. URL
https://www.sciencedirect.com/science/article/pii/S0375960196007062.

Michał Horodecki, Paweł Horodecki, and Ryszard Horodecki. Mixed-state entanglement and
distillation: Is there a ‘bound’ entanglement in nature? Physical Review Letters, 80:5239–5242,
June 1998. doi: 10.1103/physrevlett.80.5239. URL https://link.aps.org/doi/10.1103/
PhysRevLett.80.5239.

Michał Horodecki, Paweł Horodecki, and Ryszard Horodecki. General teleportation channel, singlet
fraction, and quasidistillation. Physical Review A, 60:1888–1898, September 1999. doi: 10.1103/
PhysRevA.60.1888. URL https://link.aps.org/doi/10.1103/PhysRevA.60.1888.

Michał Horodecki, Paweł Horodecki, and Ryszard Horodecki. Unified approach to quantum
capacities: Towards quantum noisy coding theorem. Physical Review Letters, 85:433–436,
July 2000. doi: 10.1103/PhysRevLett.85.433. URL https://link.aps.org/doi/10.1103/
PhysRevLett.85.433.

Michal Horodecki, Peter W. Shor, and Mary Beth Ruskai. Entanglement breaking channels.
Reviews in Mathematical Physics, 15:629–641, 2003. URL https://doi.org/10.1142/
S0129055X03001709.

Michal Horodecki, Jonathan Oppenheim, and Andreas Winter. Partial quantum information. Nature,
436:673–676, August 2005b. URL https://doi.org/10.1038/nature03909.

Michal Horodecki, Jonathan Oppenheim, and Andreas Winter. Quantum state merging and negative
information. Communications in Mathematical Physics, 269:107–136, January 2007. URL
https://doi.org/10.1007/s00220-006-0118-x.

Pawel Horodecki. Separability criterion and inseparable mixed states with positive partial
transposition. Physics Letters A, 232:333–339, August 1997. ISSN 0375-9601. doi:
10.1016/S0375-9601(97)00416-7. URL
http://www.sciencedirect.com/science/article/pii/S0375960197004167.

Ryszard Horodecki, Paweł Horodecki, Michał Horodecki, and Karol Horodecki. Quantum
entanglement. Reviews of Modern Physics, 81:865–942, June 2009b. doi: 10.1103/RevModPhys.81.865.
URL https://link.aps.org/doi/10.1103/RevModPhys.81.865.

Lawrence M. Ioannou. Computational complexity of the quantum separability problem. Quantum
Information and Computation, 7:335–370, May 2007. ISSN 1533-7146. URL
https://doi.org/10.26421/QIC7.4-5.

Raban Iten, Joseph M. Renes, and David Sutter. Pretty good measures in quantum information
theory. IEEE Transactions on Information Theory, 63:1270–1279, February 2017. doi:
10.1109/TIT.2016.2639521. URL https://ieeexplore.ieee.org/document/7782776.

Vojkan Jaksic, Yoshiko Ogata, Claude-Alain Pillet, and Robert Seiringer. Quantum hypothesis
testing and non-equilibrium statistical mechanics. Reviews in Mathematical Physics, 24:1230002,
2012. URL https://doi.org/10.1142/S0129055X12300026.

A. Jamiołkowski. Linear transformations which preserve trace and positive semidefiniteness
of operators. Reports on Mathematical Physics, 3:275–278, 1972. ISSN 0034-4877. doi:
10.1016/0034-4877(72)90011-0. URL
https://www.sciencedirect.com/science/article/pii/0034487772900110.

Anna Jencova. A relation between completely bounded norms and conjugate channels. Communications
in Mathematical Physics, 266:65–70, August 2006. URL https://doi.org/10.1007/s00220-006-0035-z.

Anna Jencova. Quantum hypothesis testing and sufficient subalgebras. Letters in Mathematical
Physics, 93:15–27, 2010. URL https://doi.org/10.1007/s11005-010-0398-0.

Vishal Katariya and Mark M. Wilde. Geometric distinguishability measures limit quantum channel
estimation and discrimination. Quantum Information Processing, 20:78, April 2021. URL
https://doi.org/10.1007/s11128-021-02992-7.

Eneet Kaur and Mark M. Wilde. Amortized entanglement of a quantum channel and approximately
teleportation-simulable channels. Journal of Physics A: Mathematical and Theoretical, July
2017. URL http://iopscience.iop.org/10.1088/1751-8121/aa9da7.

Johannes Henricus Bernardus Kemperman. Strong converses for a general memoryless channel
with feedback. In Transactions of the Sixth Prague Conference on Information Theory, Statistical
Decision Functions, and Random Processes, 1971.

Leonid G Khachiyan. Polynomial algorithms in linear programming. USSR Computational
Mathematics and Mathematical Physics, 20:53–72, 1980. ISSN 0041-5553. URL https:
//www.sciencedirect.com/science/article/pii/0041555380900610.

Sumeet Khatri, Eneet Kaur, Saikat Guha, and Mark M. Wilde. Second-order coding rates for key
distillation in quantum key distribution. October 2019.

Sumeet Khatri, Kunal Sharma, and Mark M. Wilde. Information-theoretic aspects of the generalized
amplitude-damping channel. Physical Review A, 102:012401, July 2020. URL https://link.
aps.org/doi/10.1103/PhysRevA.102.012401.

Gen Kimura. The Bloch vector for 𝑁-level systems. Physics Letters A, 314:339–349, 2003.
ISSN 0375-9601. doi: 10.1016/S0375-9601(03)00941-1. URL
https://www.sciencedirect.com/science/article/pii/S0375960103009411.

Christopher King. Maximal 𝑝-norms of entanglement breaking channels. Quantum Information
and Computation, 3:186–190, 2003a. URL https://doi.org/10.26421/QIC3.2-9.

Christopher King. The capacity of the quantum depolarizing channel. IEEE Transactions
on Information Theory, 49:221–229, January 2003b. ISSN 0018-9448. URL https://
ieeexplore.ieee.org/document/1159773.

Christopher King, Keiji Matsumoto, Michael Nathanson, and Mary Beth Ruskai. Properties
of conjugate channels with applications to additivity and multiplicativity. Markov Processes
and Related Fields, 13:391–423, 2007. URL http://math-mprf.org/journal/articles/
id1123/. J. T. Lewis memorial issue.

Alexei Kitaev. Quantum computations: algorithms and error correction. Russian Mathematical
Surveys, 52:1191–1249, 1997. URL https://doi.org/10.1070/rm1997v052n06abeh002155.

Oskar Klein. Zur Quantenmechanischen Begründung des zweiten Hauptsatzes der Wärmelehre. Z.
Physik, 72:767–775, 1931.

Rochus Klesse. Approximate quantum error correction, random codes, and quantum channel
capacity. Physical Review A, 75:062315, June 2007. doi: 10.1103/PhysRevA.75.062315. URL
https://link.aps.org/doi/10.1103/PhysRevA.75.062315.

Rochus Klesse. A random coding based proof for the quantum coding theorem. Open
Systems & Information Dynamics, 15:21–45, March 2008. URL
https://doi.org/10.1142/S1230161208000055.

Robert Koenig and Stephanie Wehner. A strong converse for classical channel coding using
entangled inputs. Physical Review Letters, 103:070504, August 2009. URL https://link.
aps.org/doi/10.1103/PhysRevLett.103.070504.

Robert Koenig, Renato Renner, and Christian Schaffner. The Operational Meaning of Min- and
Max-Entropy. IEEE Transactions on Information Theory, 55:4337–4347, September 2009. URL
https://ieeexplore.ieee.org/document/5208530.

Pieter Kok, W. J. Munro, Kae Nemoto, T. C. Ralph, Jonathan P. Dowling, and G. J. Milburn. Linear
optical quantum computing with photonic qubits. Reviews of Modern Physics, 79:135–174,
January 2007. doi: 10.1103/RevModPhys.79.135. URL https://link.aps.org/doi/10.
1103/RevModPhys.79.135.

Hidetoshi Komiya. Elementary proof for Sion’s minimax theorem. Kodai Mathematical Journal,
11:5–7, 1988. URL https://doi.org/10.2996/kmj/1138038812.

Karl Kraus. States, Effects and Operations: Fundamental Notions of Quantum Theory. Springer
Verlag, 1983.

Dennis Kretschmann and Reinhard F. Werner. Tema con variazioni: quantum channel capacity. New
Journal of Physics, 6:26, 2004. URL http://stacks.iop.org/1367-2630/6/i=1/a=026.

Erwin Kreyszig. Introductory Functional Analysis with Applications. Wiley Classics Library. Wiley,
1989. ISBN 9780471504597.

Lev Landau. Das Dämpfungsproblem in der Wellenmechanik. Zeitschrift für Physik, 45:430–441,
May 1927. ISSN 0044-3328. URL https://doi.org/10.1007/BF01343064.

Oscar E. Lanford, III and Derek W. Robinson. Mean entropy of states in quantum-statistical
mechanics. Journal of Mathematical Physics, 9:1120–1125, July 1968. doi: 10.1063/1.1664685.
URL https://doi.org/10.1063/1.1664685.

Jimmie D. Lawson and Yongdo Lim. The geometric mean, matrices, metrics, and more. The American
Mathematical Monthly, 108:797–812, November 2001. doi: 10.1080/00029890.2001.11919815.
URL https://doi.org/10.1080/00029890.2001.11919815.

Felix Leditzky. Relative entropies and their use in quantum information theory. PhD thesis,
University of Cambridge, November 2016.

Felix Leditzky. Distillable key of degradable states. unpublished, August 2019. private email
communication.

Felix Leditzky, Nilanjana Datta, and Graeme Smith. Useful states and entanglement distillation. IEEE
Transactions on Information Theory, 64:4689–4708, July 2018. doi: 10.1109/TIT.2017.2776907.
URL https://ieeexplore.ieee.org/document/8119865.

Felix Leditzky, Eneet Kaur, Nilanjana Datta, and Mark M. Wilde. Approaches for approximate
additivity of the Holevo information of quantum channels. Physical Review A, 97:012332,
January 2018. doi: 10.1103/PhysRevA.97.012332. URL https://link.aps.org/doi/10.
1103/PhysRevA.97.012332.

Yin Tat Lee, Aaron Sidford, and Sam Chiu Wai Wong. A faster cutting plane method and its
implications for combinatorial and convex optimization. In IEEE 56th Annual Symposium
on the Foundations of Computer Science, pages 1049–1065, October 2015. URL https:
//ieeexplore.ieee.org/document/7354442.

Matthew S. Leifer. Conditional density operators and the subjectivity of quantum operations.
AIP Conference Proceedings, 889:172–186, February 2007. doi: 10.1063/1.2713456. URL
https://aip.scitation.org/doi/abs/10.1063/1.2713456.

Matthew S. Leifer and Robert W. Spekkens. Towards a formulation of quantum theory as a causally
neutral theory of Bayesian inference. Physical Review A, 88:052130, November 2013. doi:
10.1103/PhysRevA.88.052130. URL http://link.aps.org/doi/10.1103/PhysRevA.88.
052130.

Matthew S. Leifer, Leah Henderson, and Noah Linden. Optimal entanglement generation from
quantum operations. Physical Review A, 67:012306, January 2003. doi: 10.1103/physreva.67.
012306. URL https://link.aps.org/doi/10.1103/PhysRevA.67.012306.

Debbie Leung and William Matthews. On the power of PPT-preserving and non-signalling codes.
IEEE Transactions on Information Theory, 61:4486–4499, August 2015. ISSN 0018-9448. doi:
10.1109/TIT.2015.2439953. URL https://ieeexplore.ieee.org/document/7115934.

Ke Li and Andreas Winter. Squashed entanglement, 𝑘-extendibility, quantum Markov chains,
and recovery maps. Foundations of Physics, 48:910–924, February 2018. URL
https://doi.org/10.1007/s10701-018-0143-6.

Hou Li-Zhen and Fang Mao-Fa. Entanglement-assisted classical capacity of a generalized amplitude
damping channel. Chinese Physics Letters, 24:2482, 2007a. URL http://stacks.iop.org/
0256-307X/24/i=9/a=006.

Hou Li-Zhen and Fang Mao-Fa. The Holevo capacity of a generalized amplitude-damping channel.
Chinese Physics, 16:1843, 2007b. URL http://stacks.iop.org/1009-1963/16/i=7/a=
006.

Elliott H. Lieb. Convex trace functions and the Wigner-Yanase-Dyson conjecture. Advances in
Mathematics, 11:267–288, December 1973. URL https://doi.org/10.1016/0001-8708(73)90011-X.

Elliott H. Lieb and Mary Beth Ruskai. Proof of the strong subadditivity of quantum-mechanical
entropy. Journal of Mathematical Physics, 14:1938–1941, 1973a. URL https://doi.org/10.
1063/1.1666274.

Elliott H. Lieb and Mary Beth Ruskai. A fundamental property of quantum-mechanical entropy.
Physical Review Letters, 30:434–436, March 1973b. doi: 10.1103/PhysRevLett.30.434. URL
https://link.aps.org/doi/10.1103/PhysRevLett.30.434.

Elliott H. Lieb and Walter E. Thirring. Inequalities for the Moments of the Eigenvalues of the
Schrödinger Hamiltonian and Their Relation to Sobolev Inequalities, pages 269–304. Princeton
University Press, 1976. doi: 10.1515/9781400868940-014. URL
https://doi.org/10.1515/9781400868940-014.

Göran Lindblad. Completely positive maps and entropy inequalities. Communications in
Mathematical Physics, 40:147–151, June 1975. ISSN 0010-3616. doi: 10.1007/BF01609396. URL
http://dx.doi.org/10.1007/BF01609396.

Zi-Wen Liu and Andreas Winter. Resource theories of quantum channels and the universal role of
resource erasure. April 2019.

Seth Lloyd. Capacity of the noisy quantum channel. Physical Review A, 55:1613–1622, March 1997.
doi: 10.1103/PhysRevA.55.1613. URL https://link.aps.org/doi/10.1103/PhysRevA.
55.1613.

Per-Olov Löwdin. On the nonorthogonality problem. 5:185–199, 1970. ISSN 0065-3276. doi:
10.1016/S0065-3276(08)60339-1. URL
https://www.sciencedirect.com/science/article/pii/S0065327608603391.

Per-Olov Löwdin. On the non-orthogonality problem connected with the use of atomic wave
functions in the theory of molecules and crystals. The Journal of Chemical Physics, 18:365–375,
1950. doi: 10.1063/1.1747632. URL https://doi.org/10.1063/1.1747632.

Keiji Matsumoto. A new quantum version of 𝑓-divergence. 2013.

Keiji Matsumoto. Quantum fidelities, their duals, and convex analysis. August 2014a.

Keiji Matsumoto. On the condition of conversion of classical probability distribution families into
quantum families. December 2014b.

Keiji Matsumoto. A new quantum version of 𝑓-divergence. In Masanao Ozawa, Jeremy Butterfield,
Hans Halvorson, Miklós Rédei, Yuichiro Kitajima, and Francesco Buscemi, editors, Reality
and Measurement in Algebraic Quantum Theory, volume 261, pages 229–273, Singapore, 2018.
Springer Singapore. ISBN 9789811324864 9789811324871. doi: 10.1007/978-981-13-2487-1_10.
URL http://link.springer.com/10.1007/978-981-13-2487-1_10. Series Title:
Springer Proceedings in Mathematics & Statistics.

William Matthews and Stephanie Wehner. Finite blocklength converse bounds for quantum
channels. IEEE Transactions on Information Theory, 60:7317–7329, November 2014. URL
https://ieeexplore.ieee.org/document/6891222.

Ueli M. Maurer. Secret key agreement by public discussion from common information. IEEE
Transactions on Information Theory, 39:733–742, May 1993. URL https://ieeexplore.
ieee.org/document/256484.

I. Mayer. On Löwdin’s method of symmetric orthogonalization. International Journal of
Quantum Chemistry, 90:63–65, 2002. doi: 10.1002/qua.981. URL
https://onlinelibrary.wiley.com/doi/abs/10.1002/qua.981.

Simon Milz and Kavan Modi. Quantum Stochastic Processes and Quantum non-Markovian
Phenomena. PRX Quantum, 2:030201, July 2021. doi: 10.1103/PRXQuantum.2.030201. URL
https://link.aps.org/doi/10.1103/PRXQuantum.2.030201.

Adam Miranowicz and Satoshi Ishizaka. Closed formula for the relative entropy of entanglement.
Physical Review A, 78:032310, September 2008. doi: 10.1103/PhysRevA.78.032310. URL
https://link.aps.org/doi/10.1103/PhysRevA.78.032310.

Gert Molière and Max Delbrück. Statistische Quantenmechanik und Thermodynamik. Berlin:
Verlag der Akademie der Wissenschaften, 1935.

Ciara Morgan and Andreas Winter. ‘Pretty strong’ converse for the quantum capacity of degradable
channels. IEEE Transactions on Information Theory, 60:317–333, January 2014. URL
https://ieeexplore.ieee.org/document/6663606.

Milan Mosonyi and Nilanjana Datta. Generalized relative entropies and the capacity of classical-
quantum channels. Journal of Mathematical Physics, 50:072104, July 2009. doi: 10.1063/1.
3167288. URL http://dx.doi.org/10.1063/1.3167288.

Milán Mosonyi and Fumio Hiai. On the quantum Rényi relative entropies and related capacity
formulas. IEEE Transactions on Information Theory, 57:2474–2487, April 2011. URL
https://ieeexplore.ieee.org/document/5730573.

Milán Mosonyi and Tomohiro Ogawa. Quantum hypothesis testing and the operational interpretation
of the quantum Rényi relative entropies. Communications in Mathematical Physics, 334:
1617–1648, March 2015. URL https://doi.org/10.1007/s00220-014-2248-x.

Milán Mosonyi and Dénes Petz. Structure of sufficient quantum coarse-grainings. Letters in
Mathematical Physics, 68:19–30, April 2004. ISSN 1573-0530. URL https://doi.org/10.
1007/s11005-004-4072-2.

Alexander Müller-Hermes. Transposition in quantum information theory. Master’s thesis, Technical
University of Munich, September 2012.

Martin Müller-Lennert, Frédéric Dupuis, Oleg Szehr, Serge Fehr, and Marco Tomamichel. On
quantum Rényi entropies: a new generalization and some properties. Journal of Mathematical
Physics, 54:122203, December 2013. URL https://doi.org/10.1063/1.4838856.

Hiroshi Nagaoka. The converse part of the theorem for quantum Hoeffding bound. November 2006.

Mark Aronovich Naimark. Spectral functions of a symmetric operator. Izv. Akad. Nauk SSSR
Ser. Mat., 4:277–318, 1940. URL http://www.mathnet.ru/php/archive.phtml?wshow=
paper&jrnid=im&paperid=3745&option_lang=eng.

Michael A. Nielsen. Continuity bounds for entanglement. Physical Review A, 61:064301, April
2000. URL https://link.aps.org/doi/10.1103/PhysRevA.61.064301.

Michael A. Nielsen. A simple formula for the average gate fidelity of a quantum dynamical
operation. Physics Letters A, 303:249–252, 2002. ISSN 0375-9601. doi:
10.1016/S0375-9601(02)01272-0. URL
https://www.sciencedirect.com/science/article/pii/S0375960102012720.

Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Information.
Cambridge University Press, 2000.

Julien Niset, Jaromír Fiurasek, and Nicolas J. Cerf. No-go theorem for Gaussian quantum error
correction. Physical Review Letters, 102:120501, March 2009. doi: 10.1103/PhysRevLett.102.
120501. URL http://link.aps.org/doi/10.1103/PhysRevLett.102.120501.

Michael Nussbaum and Arleta Szkoła. The Chernoff lower bound for symmetric quantum hypothesis
testing. The Annals of Statistics, 37:1040–1057, 2009. doi: 10.1214/08-AOS593. URL
https://doi.org/10.1214/08-AOS593.

Tomohiro Ogawa and Hiroshi Nagaoka. Strong Converse and Stein’s Lemma in Quantum Hypothesis
Testing, pages 28–42. 2005. doi: 10.1142/9789812563071_0003. URL https://www.
worldscientific.com/doi/abs/10.1142/9789812563071_0003.

Samad Khabbazi Oskouei, Stefano Mancini, and Mark M. Wilde. Union bound for quantum
information processing. Proceedings of the Royal Society A, 475:20180612, January 2019.
doi: 10.1098/rspa.2018.0612. URL https://royalsocietypublishing.org/doi/abs/10.
1098/rspa.2018.0612.

Masanao Ozawa. Quantum measuring processes of continuous observables. Journal of Mathematical
Physics, 25:79–87, 1984. URL https://doi.org/10.1063/1.526000.

Vern Paulsen. Completely Bounded Maps and Operator Algebras. Cambridge Studies in Advanced
Mathematics. Cambridge University Press, 2003. doi: 10.1017/CBO9780511546631.

Asher Peres. Separability criterion for density matrices. Physical Review Letters, 77:1413–1415,
August 1996. doi: 10.1103/PhysRevLett.77.1413. URL http://link.aps.org/doi/10.
1103/PhysRevLett.77.1413.

Dénes Petz. Quasi-entropies for States of a von Neumann Algebra. Publications of the Research
Institute for Mathematical Sciences, 21:787–800, 1985. doi: 10.2977/prims/1195178929. URL
https://doi.org/10.2977/prims/1195178929.

Dénes Petz. Quasi-entropies for finite quantum systems. Reports in Mathematical Physics, 23:
57–65, 1986a. URL https://doi.org/10.1016/0034-4877(86)90067-4.

Dénes Petz. Sufficient subalgebras and the relative entropy of states of a von Neumann algebra.
Communications in Mathematical Physics, 105:123–131, March 1986b. ISSN 1432-0916. URL
https://doi.org/10.1007/BF01212345.

Dénes Petz. Sufficiency of channels over von Neumann algebras. Quarterly Journal of Mathematics,
39:97–108, 1988. ISSN 1464-3847. URL https://doi.org/10.1093/qmath/39.1.97.

Dénes Petz. Monotonicity of quantum relative entropy revisited. Reviews in Mathematical Physics,
15:79, March 2003. URL https://doi.org/10.1142/S0129055X03001576.

Dénes Petz and Mary Beth Ruskai. Contraction of generalized relative entropy under stochastic
mappings on matrices. Infinite Dimensional Analysis, Quantum Probability and Related Topics,
1:83–89, January 1998. URL https://doi.org/10.1142/S0219025798000077.

Marco Piani, Michal Horodecki, Pawel Horodecki, and Ryszard Horodecki. Properties of quantum
nonsignaling boxes. Physical Review A, 74:012305, July 2006. doi: 10.1103/PhysRevA.74.
012305. URL https://link.aps.org/doi/10.1103/PhysRevA.74.012305.

Stefano Pirandola, Riccardo Laurenza, Carlo Ottaviani, and Leonardo Banchi. Fundamental limits
of repeaterless quantum communications. Nature Communications, 8:15043, 2017. URL
https://doi.org/10.1038/ncomms15043.

Martin B. Plenio. Logarithmic negativity: A full entanglement monotone that is not convex.
Physical Review Letters, 95:090503, August 2005. doi: 10.1103/PhysRevLett.95.090503. URL
https://link.aps.org/doi/10.1103/PhysRevLett.95.090503.

Martin B. Plenio and Shashank Virmani. An introduction to entanglement measures. Quantum
Information & Computation, 7:1–51, January 2007. ISSN 1533-7146. URL
https://doi.org/10.26421/QIC7.1-2-1.

Martin B. Plenio, Shashank Virmani, and P. Papadopoulos. Operator monotones, the reduction
criterion and the relative entropy. Journal of Physics A: Mathematical and General, 33:L193,
June 2000. URL http://stacks.iop.org/0305-4470/33/i=22/a=101.

Yury Polyanskiy and Sergio Verdú. Arimoto channel coding converse and Rényi divergence.
In Proceedings of the 48th Annual Allerton Conference on Communication, Control, and
Computation, pages 1327–1333, September 2010. URL https://ieeexplore.ieee.org/
abstract/document/5707067.

Haoyu Qi, Kunal Sharma, and Mark M. Wilde. Entanglement-assisted private communication
over quantum broadcast channels. Journal of Physics A: Mathematical and Theoretical, 51:
374001, August 2018a. doi: 10.1088/1751-8121/aad5f3. URL https://doi.org/10.1088/
1751-8121/aad5f3.

Haoyu Qi, Qing-Le Wang, and Mark M. Wilde. Applications of position-based coding to classical
communication over quantum channels. Journal of Physics A, 51:444002, November 2018b.
URL https://doi.org/10.1088/1751-8121/aae290.

Lu-Feng Qiao, Alexander Streltsov, Jun Gao, Swapan Rana, Ruo-Jing Ren, Zhi-Qiang Jiao, Cheng-
Qiu Hu, Xiao-Yun Xu, Ci-Yu Wang, Hao Tang, Ai-Lin Yang, Zhi-Hao Ma, Maciej Lewenstein,
and Xian-Min Jin. Entanglement activation from quantum coherence and superposition.
Physical Review A, 98:052351, November 2018. doi: 10.1103/PhysRevA.98.052351. URL
https://link.aps.org/doi/10.1103/PhysRevA.98.052351.

Jaikumar Radhakrishnan, Pranab Sen, and Naqueeb Ahmad Warsi. One-shot private classical
capacity of quantum wiretap channel: Based on one-shot quantum covering lemma. March 2017.

Eric M. Rains. Entanglement purification via separable superoperators. 1998.

Eric M. Rains. Bound on distillable entanglement. Physical Review A, 60:179–184, July 1999a.
doi: 10.1103/PhysRevA.60.179. URL http://link.aps.org/doi/10.1103/PhysRevA.60.
179.

Eric M. Rains. Rigorous treatment of distillable entanglement. Physical Review A, 60:173–178,
July 1999b. doi: 10.1103/PhysRevA.60.173. URL https://link.aps.org/doi/10.1103/
PhysRevA.60.173.

Eric M. Rains. A semidefinite program for distillable entanglement. IEEE Transactions on
Information Theory, 47:2921–2933, November 2001. URL
https://ieeexplore.ieee.org/document/959270.

Alexey E. Rastegin. Relative error of state-dependent cloning. Physical Review A, 66:042304,
October 2002. doi: 10.1103/PhysRevA.66.042304. URL
http://link.aps.org/doi/10.1103/PhysRevA.66.042304.

Alexey E. Rastegin. A lower bound on the relative error of mixed-state cloning and related
operations. Journal of Optics B: Quantum and Semiclassical Optics, 5:S647, December 2003.
URL http://stacks.iop.org/1464-4266/5/i=6/a=017.

Alexey E. Rastegin. Sine distance for quantum states. February 2006.

Michael Reed and Barry Simon. Methods of Modern Mathematical Physics, volume I: Functional
Analysis. Academic Press, 1981. ISBN 9780080570488.

Joseph M. Renes and Renato Renner. Noisy channel coding via privacy amplification and information
reconciliation. IEEE Transactions on Information Theory, 57:7377–7385, November 2011.
ISSN 0018-9448. doi: 10.1109/TIT.2011.2162226. URL https://ieeexplore.ieee.org/
document/5967913.

Joseph M. Renes and Renato Renner. One-shot classical data compression with quantum side
information and the distillation of common randomness or secret keys. IEEE Transactions
on Information Theory, 58:1985–1991, March 2012. doi: 10.1109/TIT.2011.2177589. URL
https://ieeexplore.ieee.org/document/6157080.

Renato Renner. Security of Quantum Key Distribution. PhD thesis, ETH Zürich, December 2005.

Luca Rigovacca, Go Kato, Stefan Baeuml, Myungshik S. Kim, William J. Munro, and Koji Azuma.
Versatile relative entropy bounds for quantum networks. New Journal of Physics, 20:013033,
January 2018. URL https://doi.org/10.1088/1367-2630/aa9fcf.

Ralph Tyrrell Rockafellar. Convex Analysis. Princeton Landmarks in Mathematics and Physics.
Princeton University Press, 1970. ISBN 9780691015866.

B. Rosgen and J. Watrous. On the hardness of distinguishing mixed-state quantum computations. In
20th Annual IEEE Conference on Computational Complexity (CCC’05), pages 344–354, 2005.
doi: 10.1109/CCC.2005.21. URL https://ieeexplore.ieee.org/document/1443098.

Sheldon Ross. Introduction to Probability Models. Academic Press, 12th edition, 2019. ISBN
978-0-12-814346-9.

Aidan Roy and A. J. Scott. Unitary designs and codes. Designs, Codes and Cryptography, 53:
13–31, October 2009. doi: 10.1007/s10623-009-9290-2. URL https://doi.org/10.1007/
s10623-009-9290-2.

Walter Rudin. Principles of Mathematical Analysis. International Series in Pure and Applied
Mathematics. McGraw-Hill, 1976. ISBN 9780070856134.

Mary Beth Ruskai. Inequalities for quantum entropy: A review with conditions for equality. Journal
of Mathematical Physics, 43:4358–4375, September 2002. doi: 10.1063/1.1497701. URL
https://doi.org/10.1063/1.1497701.

Massimiliano F. Sacchi. Optimal discrimination of quantum operations. Physical Review A, 71:
062340, June 2005. doi: 10.1103/PhysRevA.71.062340. URL
https://link.aps.org/doi/10.1103/PhysRevA.71.062340.

J. J. Sakurai. Modern Quantum Mechanics. Addison–Wesley Publishing Company, Inc., revised
edition, 1994.

Benjamin Schumacher. Sending entanglement through noisy quantum channels. Physical Review
A, 54:2614–2628, October 1996. doi: 10.1103/PhysRevA.54.2614. URL https://link.aps.
org/doi/10.1103/PhysRevA.54.2614.

Benjamin Schumacher and Michael A. Nielsen. Quantum data processing and error correction.
Physical Review A, 54:2629–2635, October 1996. doi: 10.1103/PhysRevA.54.2629. URL
https://link.aps.org/doi/10.1103/PhysRevA.54.2629.

Benjamin Schumacher and Michael D. Westmoreland. Sending classical information via noisy
quantum channels. Physical Review A, 56:131–138, July 1997. URL https://link.aps.org/
doi/10.1103/PhysRevA.56.131.

Benjamin Schumacher and Michael D. Westmoreland. Approximate quantum error correction.
Quantum Information Processing, 1:5–12, April 2002. ISSN 1573-1332. doi:
10.1023/A:1019653202562. URL https://doi.org/10.1023/A:1019653202562.

Pranab Sen. Achieving the Han–Kobayashi inner bound for the quantum interference channel
by sequential decoding. In 2012 IEEE International Symposium on Information Theory
Proceedings, pages 736–740, September 2012. doi: 10.1109/ISIT.2012.6284656. URL
https://ieeexplore.ieee.org/document/6284656.

Claude Shannon. The zero error capacity of a noisy channel. IRE Transactions on Information Theory,
IT-2:S8–S19, September 1956. URL https://ieeexplore.ieee.org/document/1056798.

Claude E. Shannon. Communication theory of secrecy systems. The Bell System Technical
Journal, 28:656–715, October 1949. doi: 10.1002/j.1538-7305.1949.tb00928.x. URL https:
//ieeexplore.ieee.org/document/6769090.

Naresh Sharma. Equality conditions for the quantum 𝑓-relative entropy and generalized data
processing inequalities. Quantum Information Processing, 11:137–160, 2012. ISSN 2157-8095.
URL https://doi.org/10.1007/s11128-011-0238-x.

Naresh Sharma and Naqueeb Ahmad Warsi. Fundamental bound on the reliability of quantum
information transmission. Physical Review Letters, 110:080501, February 2013. doi: 10.1103/
PhysRevLett.110.080501. URL https://link.aps.org/doi/10.1103/PhysRevLett.110.
080501.

Yaoyun Shi and Xiaodi Wu. Epsilon-net method for optimizations over separable states. In Artur
Czumaj, Kurt Mehlhorn, Andrew Pitts, and Roger Wattenhofer, editors, Automata, Languages,
and Programming, pages 798–809, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg. ISBN
978-3-642-31594-7. URL https://doi.org/10.1007/978-3-642-31594-7_67.

Maksim E. Shirokov. Tight uniform continuity bounds for the quantum conditional mutual
information, for the Holevo quantity, and for capacities of quantum channels. Journal of
Mathematical Physics, 58:102202, October 2017. doi: 10.1063/1.4987135. URL https:
//doi.org/10.1063/1.4987135.

Peter W. Shor. Scheme for reducing decoherence in quantum computer memory. Physical
Review A, 52:R2493–R2496, October 1995. doi: 10.1103/PhysRevA.52.R2493. URL https:
//link.aps.org/doi/10.1103/PhysRevA.52.R2493.

Peter W. Shor. Additivity of the classical capacity of entanglement-breaking quantum channels.
Journal of Mathematical Physics, 43:4334–4340, 2002a. doi: 10.1063/1.1498000. URL
https://doi.org/10.1063/1.1498000.

Peter W. Shor. The quantum channel capacity and coherent information. In Lecture Notes, MSRI
Workshop on Quantum Computation, 2002b.

Peter W. Shor. Equivalence of additivity questions in quantum information theory.
Communications in Mathematical Physics, 246:453–472, 2004. URL
https://doi.org/10.1007/s00220-004-1071-1.

Maurice Sion. On general minimax theorems. Pacific Journal of Mathematics, 8:171–176, March
1958. URL https://msp.org/pjm/1958/8-1/p14.xhtml.

Graeme Smith. Private classical capacity with a symmetric side channel and its application to quantum
cryptography. Physical Review A, 78:022306, August 2008. doi: 10.1103/PhysRevA.78.022306.
URL https://link.aps.org/doi/10.1103/PhysRevA.78.022306.

Graeme Smith and John A. Smolin. Degenerate quantum codes for Pauli channels. Physical
Review Letters, 98:030501, January 2007. doi: 10.1103/PhysRevLett.98.030501. URL https:
//link.aps.org/doi/10.1103/PhysRevLett.98.030501.

Graeme Smith and John A. Smolin. Extensive nonadditivity of privacy. Physical Review Letters,
103:120503, September 2009. URL https://link.aps.org/doi/10.1103/PhysRevLett.
103.120503.

Graeme Smith and Jon Yard. Quantum communication with zero-capacity channels. Science, 321:
1812–1815, September 2008. URL https://science.sciencemag.org/content/321/
5897/1812.

Graeme Smith, Joseph M. Renes, and John A. Smolin. Structured codes improve the Bennett-
Brassard-84 quantum key rate. Physical Review Letters, 100:170502, April 2008. doi: 10.1103/
PhysRevLett.100.170502. URL https://link.aps.org/doi/10.1103/PhysRevLett.100.
170502.

Graeme Smith, John A. Smolin, and Jon Yard. Quantum communication with Gaussian channels of
zero quantum capacity. Nature Photonics, 5:624–627, August 2011. URL https://doi.org/
10.1038/nphoton.2011.203.

R. R. Smith. Completely Bounded Maps between C*-Algebras. Journal of the London Mathematical
Society, s2-27:157–166, February 1983. ISSN 0024-6107. doi: 10.1112/jlms/s2-27.1.157. URL
https://doi.org/10.1112/jlms/s2-27.1.157.

Akihito Soeda, Peter S. Turner, and Mio Murao. Entanglement cost of implementing controlled-
unitary operations. Physical Review Letters, 107:180501, October 2011. doi: 10.1103/physrevlett.
107.180501. URL https://link.aps.org/doi/10.1103/PhysRevLett.107.180501.

Benjamin Steinberg. Representation Theory of Finite Groups: An Introductory Approach. Springer
New York, 2011. ISBN 9781461407768.

W. Forrest Stinespring. Positive Functions on C*-Algebras. Proceedings of the American
Mathematical Society, 6:211–216, 1955. ISSN 00029939, 10886826. URL
http://www.jstor.org/stable/2032342.

Gilbert Strang. Introduction to Linear Algebra. Wellesley-Cambridge Press and SIAM, fifth edition,
May 2016.

Ruslan L. Stratonovich. Information capacity of a quantum communications channel. I. Soviet
Radiophysics, 8:82–91, January 1965. ISSN 1573-9120. doi: 10.1007/BF01038470. URL
https://doi.org/10.1007/BF01038470.

David Sutter, Volkher B. Scholz, Andreas Winter, and Renato Renner. Approximate degradable
quantum channels. IEEE Transactions on Information Theory, 63:7832–7844, December 2017.
URL https://ieeexplore.ieee.org/document/8046086.

Masahiro Takeoka, Masashi Ban, and Masahide Sasaki. Quantum channel of continuous variable
teleportation and nonclassicality of quantum states. Journal of Optics B: Quantum and
Semiclassical Optics, 4:114, April 2002. URL http://stacks.iop.org/1464-4266/4/i=
2/a=306.

Masahiro Takeoka, Saikat Guha, and Mark M. Wilde. The squashed entanglement of a quantum
channel. IEEE Transactions on Information Theory, 60:4987–4998, August 2014. ISSN
0018-9448. URL https://ieeexplore.ieee.org/document/6832533.

Masahiro Takeoka, Kaushik P. Seshadreesan, and Mark M. Wilde. Unconstrained distillation
capacities of a pure-loss bosonic broadcast channel. In 2016 IEEE International Symposium
on Information Theory (ISIT), pages 2484–2488, July 2016. doi: 10.1109/ISIT.2016.7541746.
URL https://ieeexplore.ieee.org/document/7541746.

Masahiro Takeoka, Kaushik P. Seshadreesan, and Mark M. Wilde. Unconstrained capacities of
quantum key distribution and entanglement distillation for pure-loss bosonic broadcast channels.
Physical Review Letters, 119:150501, October 2017. URL https://link.aps.org/doi/10.
1103/PhysRevLett.119.150501.

Marco Tomamichel. Quantum Information Processing with Finite Resources: Mathematical
Foundations. Springer, 2015.

Marco Tomamichel and Masahito Hayashi. A hierarchy of information quantities for finite
block length analysis of quantum tasks. IEEE Transactions on Information Theory, 59:7693–
7710, November 2013. ISSN 0018-9448. doi: 10.1109/TIT.2013.2276628. URL https:
//ieeexplore.ieee.org/document/6574274.

Marco Tomamichel, Roger Colbeck, and Renato Renner. A fully quantum asymptotic equipartition
property. IEEE Transactions on Information Theory, 55:5840–5847, December 2009. URL
https://ieeexplore.ieee.org/document/5319753.

Marco Tomamichel, Roger Colbeck, and Renato Renner. Duality Between Smooth Min- and
Max-Entropies. IEEE Transactions on Information Theory, 56:4674–4681, September 2010.
URL https://ieeexplore.ieee.org/document/5550419.

Marco Tomamichel, Mario Berta, and Joseph M. Renes. Quantum coding with finite resources.
Nature Communications, 7:11419, May 2016. URL https://doi.org/10.1038/ncomms11419.

Marco Tomamichel, Mark M. Wilde, and Andreas Winter. Strong converse rates for quantum
communication. IEEE Transactions on Information Theory, 63:715–727, January 2017. doi:
10.1109/tit.2016.2615847. URL https://ieeexplore.ieee.org/document/7586115.

Robert R. Tucci. Quantum entanglement and conditional information transmission. September
1999.

Robert R. Tucci. Entanglement of distillation and conditional mutual information. February 2002.

Armin Uhlmann. The ‘Transition Probability’ in the State Space of a *-Algebra. Reports on
Mathematical Physics, 9:273–279, April 1976. URL https://www.sciencedirect.com/
science/article/pii/0034487776900604.

Michael L. Ulrey. Sequential coding for channels with feedback. Information and Control, 32:
93–100, October 1976. URL https://doi.org/10.1016/S0019-9958(76)90129-7.

Hisaharu Umegaki. Conditional expectations in an operator algebra IV (entropy and information).
Kodai Mathematical Seminar Reports, 14:59–85, 1962. URL
https://doi.org/10.2996/kmj/1138844604.

Lieven Vandenberghe and Stephen Boyd. Semidefinite programming. SIAM Review, 38:49–95,
1996. doi: 10.1137/1038003. URL https://doi.org/10.1137/1038003.

Gonzalo Vazquez-Vilar. Multiple quantum hypothesis testing expressions and classical-quantum
channel converse bounds. In 2016 IEEE International Symposium on Information Theory, pages
2854–2857, Barcelona, Spain, 2016. URL https://ieeexplore.ieee.org/document/
7541820.

Vlatko Vedral and Martin B. Plenio. Entanglement measures and purification procedures. Physical
Review A, 57:1619–1633, March 1998. doi: 10.1103/PhysRevA.57.1619. URL http://link.
aps.org/doi/10.1103/PhysRevA.57.1619.

Vlatko Vedral, Martin B. Plenio, M. A. Rippin, and Peter L. Knight. Quantifying entanglement.
Physical Review Letters, 78:2275–2279, March 1997. doi: 10.1103/PhysRevLett.78.2275. URL
https://link.aps.org/doi/10.1103/PhysRevLett.78.2275.

Sergio Verdú. On channel capacity per unit cost. IEEE Transactions on Information Theory,
36:1019–1030, 1990. doi: 10.1109/18.57201. URL
https://ieeexplore.ieee.org/document/57201.

Gilbert S. Vernam. Cipher printing telegraph systems for secret wire and radio telegraphic
communications. Transactions of the American Institute of Electrical Engineers, 45:295–301,
1926. URL https://ieeexplore.ieee.org/document/5061224.

Guifré Vidal. Entanglement monotones. Journal of Modern Optics, 47:355–376, 2000. doi:
10.1080/09500340008244048. URL https://www.tandfonline.com/doi/abs/10.1080/
09500340008244048.

Guifré Vidal and Reinhard F. Werner. Computable measure of entanglement. Physical Review A,
65:032314, February 2002. doi: 10.1103/PhysRevA.65.032314. URL https://link.aps.
org/doi/10.1103/PhysRevA.65.032314.

Johann von Neumann. Wahrscheinlichkeitstheoretischer Aufbau der Quantenmechanik. Nachrichten
von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, 1:
245–272, 1927a. URL http://eudml.org/doc/59230.

Johann von Neumann. Thermodynamik quantenmechanischer Gesamtheiten. Nachrichten von
der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, 102:
273–291, 1927b.

Johann von Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100:295–320,
December 1928. ISSN 1432-1807. URL https://doi.org/10.1007/BF01448847.

Johann von Neumann. Mathematische Grundlagen der Quantenmechanik. Verlag von Julius Springer
Berlin, 1932.

Michael Walter, David Gross, and Jens Eisert. Multi-partite entanglement. 2016.

Kun Wang, Xin Wang, and Mark M. Wilde. Quantifying the unextendibility of entanglement.
November 2019a.

Ligong Wang and Renato Renner. One-shot classical-quantum capacity and hypothesis testing.
Physical Review Letters, 108:200501, 2012. doi: 10.1103/PhysRevLett.108.200501. URL
https://link.aps.org/doi/10.1103/PhysRevLett.108.200501.

Xin Wang and Runyao Duan. Improved semidefinite programming upper bound on distillable
entanglement. Physical Review A, 94:050301, November 2016a. doi: 10.1103/physreva.94.
050301. URL https://link.aps.org/doi/10.1103/PhysRevA.94.050301.

Xin Wang and Runyao Duan. A semidefinite programming upper bound of quantum capacity. In
2016 IEEE International Symposium on Information Theory (ISIT). IEEE, July 2016b. doi:
10.1109/isit.2016.7541587. URL https://ieeexplore.ieee.org/document/7541587.

Xin Wang and Mark M. Wilde. Resource theory of asymmetric distinguishability. Physical
Review Research, 1:033170, December 2019. doi: 10.1103/PhysRevResearch.1.033170. URL
http://arxiv.org/abs/1905.11629.

Xin Wang and Mark M. Wilde. 𝛼-logarithmic negativity. Physical Review A, 102:032416, September
2020. doi: 10.1103/PhysRevA.102.032416. URL https://link.aps.org/doi/10.1103/
PhysRevA.102.032416.

Xin Wang, Wei Xie, and Runyao Duan. Semidefinite programming strong converse bounds for
classical capacity. IEEE Transactions on Information Theory, 64:640–653, January 2018.
ISSN 0018-9448. doi: 10.1109/TIT.2017.2741101. URL https://ieeexplore.ieee.org/
document/8012535.

Xin Wang, Kun Fang, and Runyao Duan. Semidefinite programming converse bounds for quantum
communication. IEEE Transactions on Information Theory, 65:2583–2592, April 2019b. URL
https://ieeexplore.ieee.org/document/8482492.

Xin Wang, Kun Fang, and Marco Tomamichel. On converse bounds for classical communication over
quantum channels. IEEE Transactions on Information Theory, 65:4609–4619, July 2019c. doi:
10.1109/TIT.2019.2898656. URL https://ieeexplore.ieee.org/document/8638816.

John Watrous. Semidefinite programs for completely bounded norms. Theory of Computing,
5:217–238, November 2009. doi: 10.4086/toc.2009.v005a011. URL
http://www.theoryofcomputing.org/articles/v005a011.

John Watrous. Simpler semidefinite programs for completely bounded norms. Chicago Journal
of Theoretical Computer Science, July 2013. URL
http://cjtcs.cs.uchicago.edu/articles/2013/8/contents.html.

John Watrous. The Theory of Quantum Information. Cambridge University Press, 2018. doi:
10.1017/9781316848142.

R. F. Werner and A. S. Holevo. Counterexample to an additivity conjecture for output purity of
quantum channels. Journal of Mathematical Physics, 43:4353–4357, 2002. doi:
10.1063/1.1498491. URL https://doi.org/10.1063/1.1498491.

Reinhard F. Werner. An application of Bell’s inequalities to a quantum state extension problem.
Letters in Mathematical Physics, 17:359–363, May 1989a. doi: 10.1007/BF00399761. URL
https://doi.org/10.1007/BF00399761.

Reinhard F. Werner. Quantum states with Einstein-Podolsky-Rosen correlations admitting a
hidden-variable model. Physical Review A, 40:4277–4281, October 1989b. doi:
10.1103/PhysRevA.40.4277. URL http://link.aps.org/doi/10.1103/PhysRevA.40.4277.

Reinhard F. Werner. All teleportation and dense coding schemes. Journal of Physics A: Mathematical
and General, 34:7081, September 2001. URL http://stacks.iop.org/0305-4470/34/i=
35/a=332.

Mark M. Wilde. Sequential decoding of a general classical-quantum channel. Proceedings of the
Royal Society of London A: Mathematical, Physical and Engineering Sciences, 469, September
2013. ISSN 1364-5021. doi: 10.1098/rspa.2013.0259. URL
https://doi.org/10.1098/rspa.2013.0259.

Mark M. Wilde. Squashed entanglement and approximate private states. Quantum Information
Processing, 15:4563–4580, November 2016. ISSN 1573-1332. doi: 10.1007/s11128-016-1432-7.
URL http://dx.doi.org/10.1007/s11128-016-1432-7.

Mark M. Wilde. Quantum Information Theory. Cambridge University Press, second edition, 2017a.
URL https://doi.org/10.1017/CBO9781139525343.

Mark M. Wilde. Position-based coding and convex splitting for private communication over
quantum channels. Quantum Information Processing, 16:264, October 2017b. URL https:
//doi.org/10.1007/s11128-017-1718-4.

Mark M. Wilde. Strong and uniform convergence in the teleportation simulation of bosonic Gaussian
channels. Physical Review A, 97:062305, June 2018a. doi: 10.1103/PhysRevA.97.062305. URL
https://link.aps.org/doi/10.1103/PhysRevA.97.062305.

Mark M. Wilde. Optimized quantum 𝑓-divergences and data processing. Journal of Physics A, 51:
374002, September 2018b. URL https://doi.org/10.1088/1751-8121/aad5a1.

Mark M. Wilde and Haoyu Qi. Energy-constrained private and quantum capacities of quantum
channels. IEEE Transactions on Information Theory, 64:7802–7827, December 2018. URL
https://ieeexplore.ieee.org/document/8541091.

Mark M. Wilde and Andreas Winter. Strong Converse for the Quantum Capacity of the Erasure
Channel for Almost All Codes. 27:52–66, 2014. ISSN 1868-8969. doi: 10.4230/LIPIcs.TQC.
2014.52. URL http://drops.dagstuhl.de/opus/volltexte/2014/4806.

Mark M. Wilde, Andreas Winter, and Dong Yang. Strong converse for the classical capacity
of entanglement-breaking and Hadamard channels via a sandwiched Rényi relative entropy.
Communications in Mathematical Physics, 331:593–622, October 2014. URL https://doi.
org/10.1007/s00220-014-2122-x.

Mark M. Wilde, Marco Tomamichel, and Mario Berta. Converse bounds for private communication
over quantum channels. IEEE Transactions on Information Theory, 63:1792–1817, March 2017.
URL https://ieeexplore.ieee.org/document/7807212.

Andreas Winter. Tight uniform continuity bounds for quantum entropies: conditional entropy,
relative entropy distance and energy constraints. Communications in Mathematical Physics, 347:
291–313, October 2016. URL https://doi.org/10.1007/s00220-016-2609-8.

Michael M. Wolf, David Pérez-García, and Geza Giedke. Quantum capacities of bosonic channels.
Physical Review Letters, 98:130501, March 2007. doi: 10.1103/PhysRevLett.98.130501. URL
https://link.aps.org/doi/10.1103/PhysRevLett.98.130501.

Jacob Wolfowitz. Coding Theorems of Information Theory, volume 31 of Ergebnisse der Mathematik
und Ihrer Grenzgebiete. Springer, 1964.

William K. Wootters. Entanglement of formation of an arbitrary state of two qubits. Physical
Review Letters, 80:2245–2248, March 1998. doi: 10.1103/PhysRevLett.80.2245.
URL https://link.aps.org/doi/10.1103/PhysRevLett.80.2245.

Aaron D. Wyner. The wire-tap channel. Bell System Technical Journal, 54:1355–1387, October
1975. URL https://ieeexplore.ieee.org/document/6772207.

Dong Yang. A simple proof of monogamy of entanglement. Physics Letters A, 360:249–250,
2006. ISSN 0375-9601. doi: 10.1016/j.physleta.2006.08.027.
URL http://www.sciencedirect.com/science/article/pii/S0375960106012801.

Jon Yard, Patrick Hayden, and Igor Devetak. Capacity theorems for quantum multiple-access
channels: Classical-quantum and quantum-quantum capacity regions. IEEE Transactions
on Information Theory, 54:3091–3113, July 2008.
URL https://ieeexplore.ieee.org/document/4545000.

Haidong Yuan and Chi-Hang Fred Fung. Fidelity and Fisher Information on Quantum Channels.
New Journal of Physics, 19:113039, November 2017. doi: 10.1088/1367-2630/aa874c.
URL http://stacks.iop.org/1367-2630/19/i=11/a=113039?key=crossref.c8abb94f653e6d572133885d9e0b86b0.

Horace Yuen, Robert Kennedy, and Melvin Lax. Optimum testing of multiple hypotheses in
quantum detection theory. IEEE Transactions on Information Theory, 21:125–134, March 1975.
URL https://ieeexplore.ieee.org/document/1055351.

Sisi Zhou and Liang Jiang. An Exact Correspondence between the Quantum Fisher Information
and the Bures Metric. October 2019. URL http://arxiv.org/abs/1910.08473.

Xinlan Zhou, Debbie W. Leung, and Isaac L. Chuang. Methodology for quantum logic gate
construction. Physical Review A, 62:052316, October 2000. doi: 10.1103/PhysRevA.62.052316.
URL https://link.aps.org/doi/10.1103/PhysRevA.62.052316.

Karol Zyczkowski, Paweł Horodecki, Anna Sanpera, and Maciej Lewenstein. Volume of the set of
separable states. Physical Review A, 58:883–892, August 1998. doi: 10.1103/PhysRevA.58.883.
URL https://link.aps.org/doi/10.1103/PhysRevA.58.883.
