
BOAZ BARAK

AN INTENSIVE INTRODUCTION TO CRYPTOGRAPHY

LECTURE NOTES.
AVAILABLE ON HTTPS://INTENSECRYPTO.ORG
Text available on  https://github.com/boazbk/crypto - please post any issues there - thank you!

This version was compiled on Wednesday 17th November, 2021 22:35

Copyright © 2021 Boaz Barak

This work is licensed under a Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” license.

If you can just get your mind together

Then come on across to me

We’ll hold hands, and then we’ll watch the sun rise

From the bottom of the sea

Jimi Hendrix, Are You Experienced?


Contents

Foreword and Syllabus

I Preliminaries
0 Mathematical Background
1 Introduction

II Private key cryptography
2 Computational Security
3 Pseudorandomness
4 Pseudorandom functions
5 Pseudorandom functions from pseudorandom generators and CPA security
6 Chosen Ciphertext Security
7 Hash Functions, Random Oracles, and Bitcoin
8 Key derivation, protecting passwords, slow hashes, Merkle trees

III Public key cryptography
9 Public key cryptography
10 Concrete candidates for public key crypto
11 Lattice based cryptography
12 Establishing secure connections over insecure channels

IV Advanced topics
13 Zero knowledge proofs
14 Fully homomorphic encryption: Introduction and bootstrapping
15 Fully homomorphic encryption: Construction
16 Multiparty secure computation I: Definition and Honest-But-Curious to Malicious compiler
17 Multiparty secure computation II: Construction using Fully Homomorphic Encryption
18 Quantum computing and cryptography I
19 Quantum computing and cryptography II
20 Software Obfuscation
21 More obfuscation, exotic encryptions
22 Anonymous communication

V Conclusions
23 Ethical, moral, and policy dimensions to cryptography
24 Course recap


Contents (detailed)

Foreword and Syllabus
0.1 Syllabus
0.1.1 Prerequisites
0.2 Why is cryptography hard?

I Preliminaries

0 Mathematical Background
0.1 A quick overview of mathematical prerequisites
0.2 Mathematical Proofs
0.2.1 Example: The existence of infinitely many primes
0.3 Probability and Sample spaces
0.3.1 Random variables
0.3.2 Distributions over strings
0.3.3 More general sample spaces
0.4 Correlations and independence
0.4.1 Independent random variables
0.4.2 Collections of independent random variables
0.5 Concentration and tail bounds
0.5.1 Chebyshev’s Inequality
0.5.2 The Chernoff bound
0.6 Exercises
0.7 Exercises

1 Introduction
1.1 Some history
1.2 Defining encryptions
1.3 Defining security of encryption
1.3.1 Generating randomness in actual cryptographic systems
1.4 Defining the secrecy requirement
1.5 Perfect Secrecy
1.5.1 Achieving perfect secrecy
1.6 Necessity of long keys
1.6.1 Amplifying success probability
1.7 Bibliographical notes

II Private key cryptography

2 Computational Security
2.0.1 Proof by reduction
2.1 The asymptotic approach
2.1.1 Counting number of operations
2.2 Our first conjecture
2.3 Why care about the cipher conjecture?
2.4 Prelude: Computational Indistinguishability
2.5 The Length Extension Theorem or Stream Ciphers
2.5.1 Appendix: The computational model

3 Pseudorandomness
3.0.1 Unpredictability: an alternative approach for proving the length extension theorem
3.1 Stream ciphers
3.2 What do pseudorandom generators actually look like?
3.2.1 Attempt 0: The counter generator
3.2.2 Attempt 1: The linear checksum / linear feedback shift register (LFSR)
3.2.3 From insecurity to security
3.2.4 Attempt 2: Linear Congruential Generators with dropped bits
3.3 Successful examples
3.3.1 Case Study 1: Subset Sum Generator
3.3.2 Case Study 2: RC4
3.3.3 Case Study 3: Blum, Blum and Shub
3.4 Non-constructive existence of pseudorandom generators

4 Pseudorandom functions
4.1 One time passwords (e.g. Google Authenticator, RSA ID, etc.)
4.1.1 How do pseudorandom functions help in the login problem?
4.1.2 Modifying input and output lengths of PRFs
4.2 Message Authentication Codes
4.3 MACs from PRFs
4.4 Arbitrary input length extension for MACs and PRFs
4.5 Aside: natural proofs

5 Pseudorandom functions from pseudorandom generators and CPA security
5.1 Securely encrypting many messages - chosen plaintext security
5.2 Pseudorandom permutations / block ciphers
5.3 Encryption modes
5.4 Optional, Aside: Broadcast Encryption
5.5 Reading comprehension exercises

6 Chosen Ciphertext Security
6.1 Short recap
6.2 Going beyond CPA
6.2.1 Example: The Wired Equivalent Privacy (WEP)
6.2.2 Chosen ciphertext security
6.3 Constructing CCA secure encryption
6.4 (Simplified) GCM encryption
6.5 Padding, chopping, and their pitfalls: the “buffer overflow” of cryptography
6.6 Chosen ciphertext attack as implementing metaphors
6.7 Reading comprehension exercises

7 Hash Functions, Random Oracles, and Bitcoin
7.1 The “Bitcoin” Problem
7.1.1 The Currency Problem
7.1.2 Bitcoin Architecture
7.2 The Bitcoin Ledger
7.2.1 From Proof of Work to Consensus on Ledger
7.3 Collision Resistant Hash Functions and Creating Short “Unique” Identifiers
7.4 Practical Constructions of Cryptographic Hash Functions
7.4.1 Practical Random-ish Functions
7.4.2 Some History
7.4.3 The NSA and Hash Functions
7.4.4 Cryptographic vs Non-Cryptographic Hash Functions
7.5 Reading comprehension exercises

8 Key derivation, protecting passwords, slow hashes, Merkle trees
8.1 Keys from passwords
8.2 Merkle trees and verifying storage
8.3 Proofs of Retrievability
8.4 Entropy extraction
8.4.1 Forward and backward secrecy

III Public key cryptography

9 Public key cryptography
9.1 Private key crypto recap
9.2 Public Key Encryptions: Definition
9.2.1 The obfuscation paradigm
9.3 Some concrete candidates
9.3.1 Diffie-Hellman Encryption (aka El-Gamal)
9.3.2 Sampling random primes
9.3.3 A little bit of group theory
9.3.4 Digital Signatures
9.3.5 The Digital Signature Algorithm (DSA)
9.4 Putting everything together - security in practice
9.5 Appendix: An alternative proof of the density of primes
9.6 Additional Group Theory Exercises and Proofs
9.6.1 Solved exercises

10 Concrete candidates for public key crypto
10.1 Some number theory
10.1.1 Primality testing
10.1.2 Fields
10.1.3 Chinese remainder theorem
10.1.4 The RSA and Rabin functions
10.1.5 Abstraction: trapdoor permutations
10.1.6 Public key encryption from trapdoor permutations
10.1.7 Digital signatures from trapdoor permutations
10.2 Hardcore bits and security without random oracles
10.2.1 Extending to more than one hardcore bit

11 Lattice based cryptography
11.0.1 Quick linear algebra recap
11.1 A world without Gaussian elimination
11.2 Security in the real world
11.3 Search to decision
11.4 An LWE based encryption scheme
11.5 But what are lattices?
11.6 Ring based lattices

12 Establishing secure connections over insecure channels
12.1 Cryptography’s obsession with adjectives
12.2 Basic Key Exchange protocol
12.3 Authenticated key exchange
12.3.1 Bleichenbacher’s attack on RSA PKCS V1.5 and SSL V3.0
12.4 Chosen ciphertext attack security for public key cryptography
12.5 CCA secure public key encryption in the Random Oracle Model
12.5.1 Defining secure authenticated key exchange
12.5.2 The compiler approach for authenticated key exchange
12.6 Password authenticated key exchange
12.7 Client to client key exchange for secure text messaging - ZRTP, OTR, TextSecure
12.8 Heartbleed and logjam attacks

IV Advanced topics

13 Zero knowledge proofs
13.1 Applications for zero knowledge proofs
13.1.1 Nuclear disarmament
13.1.2 Voting
13.1.3 More applications
13.2 Defining and constructing zero knowledge proofs
13.3 Defining zero knowledge
13.4 Zero knowledge proof for Hamiltonicity
13.4.1 Why is this interesting?
13.5 Parallel repetition and turning zero knowledge proofs to signatures
13.5.1 “Bonus features” of zero knowledge

14 Fully homomorphic encryption: Introduction and bootstrapping
14.1 Defining fully homomorphic encryption
14.1.1 Another application: fully homomorphic encryption for verifying computation
14.2 Example: An XOR homomorphic encryption
14.2.1 Abstraction: A trapdoor pseudorandom generator
14.3 From linear homomorphism to full homomorphism
14.4 Bootstrapping: Fully Homomorphic “escape velocity”
14.4.1 Radioactive legos analogy
14.4.2 Proving the bootstrapping theorem

15 Fully homomorphic encryption: Construction
15.1 Prelude: from vectors to matrices
15.2 Real world partially homomorphic encryption
15.3 Noise management via encoding
15.4 Putting it all together
15.5 Analysis of our scheme
15.5.1 Correctness
15.5.2 CPA Security
15.5.3 Homomorphism
15.5.4 Shallow decryption circuit
15.6 Advanced topics
15.6.1 Fully homomorphic encryption for approximate computation over the real numbers: CKKS
15.6.2 Bandwidth efficient fully homomorphic encryption GH
15.6.3 Using fully homomorphic encryption to achieve private information retrieval

16 Multiparty secure computation I: Definition and Honest-But-Curious to Malicious compiler
16.1 Ideal vs. Real Model Security
16.2 Formally defining secure multiparty computation
16.2.1 First attempt: a slightly “too ideal” definition
16.2.2 Allowing for aborts
16.2.3 Some comments
16.3 Example: Second price auction using bitcoin
16.3.1 Another example: distributed and threshold cryptography
16.4 Proving the fundamental theorem
16.5 Malicious to honest but curious reduction
16.5.1 Handling probabilistic strategies

17 Multiparty secure computation II: Construction using Fully Homomorphic Encryption
17.1 Constructing 2 party honest but curious computation from fully homomorphic encryption
17.2 Achieving circuit privacy in a fully homomorphic encryption
17.2.1 Bottom line: A two party secure computation protocol
17.3 Beyond two parties

18 Quantum computing and cryptography I
18.1 The double slit experiment
18.2 Quantum amplitudes
18.2.1 Quantum computing and computation - an executive summary
18.3 Quantum 101
18.3.1 Physically realizing quantum computation
18.3.2 Bra-ket notation
18.4 Bell’s Inequality
18.5 Analysis of Bell’s Inequality
18.6 Grover’s Algorithm

19 Quantum computing and cryptography II
19.1 From order finding to factoring and discrete log
19.2 Finding periods of a function: Simon’s Algorithm
19.3 From Simon to Shor
19.3.1 The Fourier transform over ℤ𝑚
19.3.2 Fast Fourier Transform
19.3.3 Quantum Fourier Transform over ℤ𝑚
19.4 Shor’s Order-Finding Algorithm
19.4.1 Analysis: the case that 𝑟|𝑚
19.4.2 The general case
19.5 Rational approximation of real numbers
19.5.1 Quantum cryptography

20 Software Obfuscation
20.1 Witness encryption
20.2 Deniable encryption
20.3 Functional encryption
20.4 The software patch problem
20.5 Software obfuscation
20.6 Applications of obfuscation
20.7 Impossibility of obfuscation
20.7.1 Proof of impossibility of VBB obfuscation
20.8 Indistinguishability obfuscation

21 More obfuscation, exotic encryptions
21.1 Slower, weaker, less securer
21.2 How to get IBE from pairing based assumptions
21.3 Beyond pairing based cryptography

22 Anonymous communication
22.1 Steganography
22.2 Anonymous routing
22.3 Tor
22.4 Telex
22.5 Riposte

V Conclusions

23 Ethical, moral, and policy dimensions to cryptography
23.1 Reading prior to lecture
23.2 Case studies
23.2.1 The Snowden revelations
23.2.2 FBI vs Apple case
23.2.3 Juniper backdoor case and the OPM break-in

24 Course recap
24.1 Some things we did not cover
24.2 What I hope you learned
Foreword and Syllabus

“Human ingenuity cannot concoct a cipher which human ingenuity cannot resolve.”

Edgar Allan Poe, 1841

Cryptography - the art or science of “secret writing” - has been around for several millennia. For almost all that time, Edgar Allan Poe’s
quote above held true. Indeed, the history of cryptography is littered
with the figurative corpses of cryptosystems believed secure and then
broken, and sometimes with the actual corpses of those who have
mistakenly placed their faith in these cryptosystems. Yet, something
changed in the last few decades. New cryptosystems have been found
that have not been broken despite being subjected to immense efforts
involving both human ingenuity and computational power on a scale
that completely dwarfs the “crypto breakers” of Poe’s time. Even
more amazingly, these cryptosystems are not only seemingly unbreak-
able, but they also achieve this under much harsher conditions. Not
only do today’s attackers have more computational power but they
also have more data to work with. In Poe’s age, an attacker would be
lucky if they got access to more than a few ciphertexts with known
plaintexts. These days attackers might have massive amounts of data
- terabytes or more - at their disposal. In fact, with public key encryp-
tion, an attacker can generate as many ciphertexts as they wish.
These new types of cryptosystems, both more secure and more ver-
satile, have enabled many applications that in the past were not only
impossible but in fact unimaginable. These include secure communi-
cation without sharing a secret, electronic voting without a trusted
authority, anonymous digital cash, and many more. Cryptography
now supplies crucial infrastructure without which much of the mod-
ern “communication economy” could not function.
This course is about the story of this cryptographic revolution.
However, beyond the cool applications and the crucial importance
of cryptography to our society, it contains also intellectual and math-
ematical beauty. To understand these often paradoxical notions of
cryptography, you need to think differently, adapting the point of
view of an attacker, and (as we will see) sometimes adapting the


points of view of other hypothetical entities. More than anything, this course is about this cryptographic way of thinking. It may not be im-
mediately applicable to protecting your credit card information or to
building a secure system, but learning a new way of thinking is its
own reward.

0.1 SYLLABUS
In this fast-paced course, I plan to start from the very basic notions of
cryptography and by the end of the term reach some of the exciting
advances that happened in the last few years such as the construction
of fully homomorphic encryption, a notion that Brian Hayes called “one
of the most amazing magic tricks in all of computer science”, and in-
distinguishability obfuscators which are even more amazing. To achieve
this, our focus will be on ideas rather than implementations and so we
will present cryptographic notions in their pedagogically simplest
form – the one that best illustrates the underlying concepts – rather
than the one that is most efficient, widely deployed, or conforms to In-
ternet standards. We will discuss some examples of practical systems
and attacks, but only when these serve to illustrate a conceptual point.
Depending on time, I plan to cover the following notions:

• Part I: Introduction

1. How do we define security for encryption? Arguably the most important step in breaking out of the “build-break-tweak” cycle that Poe’s quote described has been the idea that we can have a mathematically precise definition of security, rather than relying on fuzzy notions that allow us only to determine with certainty that a system is broken, but never give us a chance of proving that a system is secure.
2. Perfect security and its limitations: Showing the possibility
(and the limitations) of encryptions that are perfectly secure
regardless of the attacker’s computational resources.
3. Computational security: Bypassing the above limitations by re-
stricting to computationally efficient attackers. Proofs of security
by reductions.

• Part II: Private Key Cryptography

1. Pseudorandom generators: The basic building block of cryptography, which also provided a new twist on the age-old philosophical and scientific question of the nature of randomness.
2. Pseudorandom functions, permutations, block ciphers: Block ciphers are the workhorse of crypto.

3. Authentication and active attacks: Authentication turns out to be as crucial, if not more so, to security than secrecy, and often
a precondition to the latter. We’ll talk about notions such as
Message Authentication Codes and Chosen-Ciphertext-Attack
secure encryption, as well as real-world examples why these
notions are necessary.
4. Hash functions and the “Random Oracle Model”: Hash func-
tions are used everywhere in crypto, including for verifying in-
tegrity, entropy distillation, and many other cases.
5. Building pseudorandom generators from one-way permu-
tations (optional): Justifying our “axiom” of pseudo-random
generators by deriving it from a weaker assumption.

• Part III: Public key encryption

1. Public key cryptography and the obfuscation paradigm: How did Diffie, Hellman, Merkle, and Ellis even dare to imagine the possibility of public key encryption?
2. Constructing public key encryption: Factoring, discrete log,
and lattice based systems: We’ll discuss several variants for
constructing public key systems, including those that are widely
deployed such as RSA, Diffie-Hellman, and the elliptic curve
variants. We’ll also discuss some variants of lattice based cryp-
tosystems that have the advantage of not being broken by quan-
tum computers and being more versatile. The former’s weakness
to quantum computers is the reason why the NSA has advised
people to transition to lattice-based cryptosystems in the not too
far future.
3. Signature schemes: These are the public key versions of authen-
tication, though interestingly they are easier to construct in some
sense than the latter.
4. Active attacks for encryption: Chosen ciphertext attacks for
public key encryption.

• Part IV: Advanced notions

1. Fully homomorphic encryption: Computing on encrypted data.


2. Multiparty secure computation: An amazing construction that
enables applications such as playing poker over the net without
trusting the server, privacy preserving data mining, electronic
auctions without a trusted auctioneer, and electronic elections
without a trusted central authority.
3. Zero knowledge proofs: Prove a statement without revealing the reason why it is true.

4. Quantum computing and cryptography: Shor’s algorithm to


break RSA and friends. Quantum key distribution. On “quan-
tum resistant” cryptography.
5. Indistinguishability obfuscation: Construction of indistin-
guishability obfuscators, the potential “master tool” for crypto.
6. Practical protocols: Techniques for constructing practical proto-
cols for particular tasks as opposed to general (and often ineffi-
cient) feasibility proofs.
7. Cryptocurrencies: Hash chains and Merkle trees, proofs of
work, achieving consensus on a ledger via “majority of cycles”,
smart contracts, achieving anonymity via zero knowledge proofs.

0.1.1 Prerequisites
The main prerequisite is the ability to read, write (and even enjoy!)
mathematical proofs. In addition, familiarity with algorithms, ba-
sic probability theory and basic linear algebra will be helpful. We’ll
only use fairly basic concepts from all these areas: O-notation (e.g., 𝑂(𝑛) running time) from algorithms; notions such as events, random variables, and expectation from probability theory; and notions such as matrices, vectors, and eigenvectors from linear algebra. Mathematically mature stu-
dents should be able to pick up the needed notions on their own. See
the “mathematical background” handout for more details.
No programming knowledge is needed. If you’re interested in the
course but are not sure if you have sufficient background, or you have
any other questions, please don’t hesitate to contact me.

0.2 WHY IS CRYPTOGRAPHY HARD?


Cryptography is a hard topic. Over the course of history, many bril-
liant people have stumbled in it, failing to notice subtle attacks on
their ciphers. Even today it is frustratingly easy to get crypto wrong,
and often system security is compromised because developers used
crypto schemes in the wrong, or at least suboptimal, way. Why is this
topic (and this course) so hard? Some of the reasons include:

• To argue about the security of a cryptographic scheme, you have to think like an attacker. This requires a very different way of thinking
than what we are used to when developing algorithms or systems,
and arguing that they perform well.

• To get robust assurances of security you need to argue about all possible attacks. The only way I know to analyze this infinite set is via mathematical proofs. Moreover, these types of mathematical
proofs tend to be rather different than the ones most mathemati-
cians typically work with. Because the proof itself needs to take the

viewpoint of the attacker, these often tend to be proofs by contra-


diction and involve several twists of logic that take some getting
used to.

• As we’ll see in this course, even defining security is a highly nontrivial task. Security definitions often get subtle and require quite
a lot of creativity. For example, the way we model in general a
statement such as “An attacker Eve does not get more information
from observing a system above what she knew a-priori” is that
we posit a “hypothetical alter ego” of Eve called Lilith who knows
everything Eve knew a-priori but does not get to observe the actual
interaction in the system. We then want to prove that anything that
Eve learned could also have been learned by Lilith. If this sounds
confusing, it is. But it is also fascinating, and leads to ways to argue
mathematically about knowledge as well as beautiful generalizations
of the notion of encryption and protecting communication into
schemes for protecting computation .

If cryptography is so hard, is it really worth studying? After all, given this subtlety, a single course in cryptography is no guarantee of
using (let alone inventing) crypto correctly. In my view, regardless of
its immense and growing practical importance, cryptography is worth
studying for its intellectual content. There are many areas of science
where we achieve goals once considered to be science fiction. But
cryptography is an area where current achievements are so fantastic
that in the thousands of years of secret writing people did not even
dare imagine them. Moreover, cryptography may be hard because
it forces you to think differently, but it is also rewarding because it
teaches you to think differently. And once you pass this initial hurdle,
and develop a “cryptographer’s mind”, you might find that this point
of view is useful in areas that seem to have nothing to do with crypto.
I PRELIMINARIES

0 Mathematical Background

This is a brief review of some mathematical tools, and especially probability theory, that we will use in this course. See also the math-
ematical background and probability lectures in my Notes on Intro-
duction to Theoretical Computer Science, which share much of the
following text.
At Harvard, much of this material (and more) is taught in Stat
110 “Introduction to Probability”, CS20 “Discrete Mathematics”, and
AM107 “Graph Theory and Combinatorics”. Some good sources for
this material are the lecture notes by Papadimitriou and Vazirani (see
home page of Umesh Vazirani), Lehman, Leighton and Meyer from
MIT Course 6.042 “Mathematics For Computer Science” (Chapters
1-2 and 14 to 19 are particularly relevant), and the Berkeley course CS
70. The mathematical tool we use most often is discrete probability.
The “Probabilistic Method” book by Alon and Spencer is a great re-
source in this area. Also, the books of Mitzenmacher and Upfal and
Motwani and Raghavan cover probability from a more algorithmic
perspective. For an excellent popular discussion of some of the math-
ematical concepts we’ll talk about see the book “How Not to Be Wrong”
by Jordan Ellenberg.
Although knowledge of algorithms is not strictly necessary, it
would be quite useful. Students who did not take an algorithms class
such as CS 124 might want to look at (1) Cormen, Leiserson, Rivest and Stein, (2) Dasgupta, Papadimitriou and Vazirani, or (3) Klein-
berg and Tardos. We do not require prior knowledge of complexity
or computability but some basic familiarity could be useful. Students
who did not take a theory of computation class such as CS 121 might
want to look at my lecture notes or the first 2 chapters of my book with
Arora.

0.1 A QUICK OVERVIEW OF MATHEMATICAL PREREQUISITES


The main notions we will use in this course are the following:


• Proofs: First and foremost, this course will involve a heavy dose
of formal mathematical reasoning, which includes mathematical
definitions, statements, and proofs.

• Sets and functions: We will assume familiarity with basic notions of sets and operations on sets such as union (denoted ∪), intersec-
tion (denoted ∩), and set subtraction (denoted ⧵). We denote by |𝐴|
the size of the set 𝐴. We also assume familiarity with functions, and
notions such as one-to-one (injective) functions and onto (surjec-
tive) functions. If 𝑓 is a function from a set 𝐴 to a set 𝐵, we denote
this by 𝑓 ∶ 𝐴 → 𝐵. If 𝑓 is one-to-one then this implies that |𝐴| ≤ |𝐵|.
If 𝑓 is onto then |𝐴| ≥ |𝐵|. If 𝑓 is a permutation/bijection (i.e.,
one-to-one and onto) then this implies that |𝐴| = |𝐵|.

• Big Oh notation: If 𝑓, 𝑔 are two functions from ℕ to ℕ, then (1) 𝑓 = 𝑂(𝑔) if there exists a constant 𝑐 such that 𝑓(𝑛) ≤ 𝑐 ⋅ 𝑔(𝑛) for every sufficiently large 𝑛, (2) 𝑓 = Ω(𝑔) if 𝑔 = 𝑂(𝑓), (3) 𝑓 = Θ(𝑔) if 𝑓 = 𝑂(𝑔) and 𝑔 = 𝑂(𝑓), (4) 𝑓 = 𝑜(𝑔) if for every 𝜖 > 0, 𝑓(𝑛) ≤ 𝜖 ⋅ 𝑔(𝑛)
for every sufficiently large 𝑛, and (5) 𝑓 = 𝜔(𝑔) if 𝑔 = 𝑜(𝑓). To
emphasize the input parameter, we often write 𝑓(𝑛) = 𝑂(𝑔(𝑛))
instead of 𝑓 = 𝑂(𝑔), and use similar notation for 𝑜, Ω, 𝜔, Θ. While
this is only an imprecise heuristic, when you see a statement of the
form 𝑓(𝑛) = 𝑂(𝑔(𝑛)) you can often replace it in your mind by the
statement 𝑓(𝑛) ≤ 1000𝑔(𝑛) while the statement 𝑓(𝑛) = Ω(𝑔(𝑛)) can
often be thought of as 𝑓(𝑛) ≥ 0.001𝑔(𝑛) .

• Logical operations: The operations AND, OR, and NOT (∧, ∨, ¬) and the quantifiers “exists” and “forall” (∃, ∀).

• Tuples and strings: The notation Σᵏ and Σ∗ where Σ is some finite set which is called the alphabet (quite often Σ = {0, 1}).

• Graphs: Undirected and directed graphs, connectivity, paths, and cycles.

• Basic combinatorics: Notions such as the binomial coefficient (𝑛 choose 𝑘), the number of 𝑘-sized subsets of a set of size 𝑛.

• Discrete probability: We will extensively use probability theory, and specifically probability over finite sample spaces such as tossing 𝑛
coins, including notions such as random variables, expectation, and
concentration.

• Modular arithmetic: We will use modular arithmetic (i.e., addition and multiplication modulo some number 𝑚), and in particular
talk about operations on vectors and matrices whose elements are
taken modulo 𝑚. If 𝑛 is an integer, then we denote by 𝑎 (mod 𝑛)
the remainder of 𝑎 when divided by 𝑛. 𝑎 (mod 𝑛) is the number
𝑟 ∈ {0, … , 𝑛 − 1} such that 𝑎 = 𝑘𝑛 + 𝑟 for some integer 𝑘. It will
be very useful that 𝑎 (mod 𝑛) + 𝑏 (mod 𝑛) = (𝑎 + 𝑏) (mod 𝑛)
and 𝑎 (mod 𝑛) ⋅ 𝑏 (mod 𝑛) = (𝑎 ⋅ 𝑏) (mod 𝑛) and so modular
arithmetic inherits all the rules (associativity, commutativity etc..)
of integer arithmetic. If 𝑎, 𝑏 are positive integers then 𝑔𝑐𝑑(𝑎, 𝑏) is the
largest integer that divides both 𝑎 and 𝑏. It is known that for every
𝑎, 𝑏 there exist (not necessarily positive) integers 𝑥, 𝑦 such that
𝑎𝑥 + 𝑏𝑦 = 𝑔𝑐𝑑(𝑎, 𝑏) (it’s a good exercise to prove this on your own).
In particular, if 𝑔𝑐𝑑(𝑎, 𝑛) = 1 then there exists a modular inverse for 𝑎, which is a number 𝑏 such that 𝑎𝑏 = 1 (mod 𝑛). We sometimes write 𝑏 as 𝑎⁻¹ (mod 𝑛). (See the short sketch after this list for one way to compute 𝑔𝑐𝑑(𝑎, 𝑏) and 𝑎⁻¹ (mod 𝑛).)

• Group theory, linear algebra: In later parts of the course we will need the notions of matrices, vectors, matrix multiplication and
inverse, determinant, eigenvalues, and eigenvectors. These can
be picked up in any basic text on linear algebra. In some parts we
might also use some basic facts of group theory (finite groups only,
and mostly only commutative ones). These also can be picked up as
we go along, and a prior course on group theory is not necessary.

• Discrete probability: Probability theory, and specifically probability over finite sample spaces such as tossing 𝑛 coins, is a crucial part
of cryptography, since (as we’ll see) there is no secrecy without
randomness.

0.2 MATHEMATICAL PROOFS


Arguably the mathematical prerequisite needed for this course is a
certain level of comfort with mathematical proofs. Many students
tend to think of mathematical proofs as a very formal object, like
the proofs studied in school in geometry, consisting of a sequence of
axioms and statements derived from them by very specific rules. In
fact,
a proof is a piece of writing meant to convince human
readers that a particular statement is true.

(In this class, the particular humans you are trying to convince are
me and the teaching fellows.)
To write a proof of some statement X you need to follow three steps:

1. Make sure that you completely understand the statement X.

2. Think about X until you are able to convince yourself that X is true.

3. Think how to present the argument in the clearest possible way so you can convince the reader as well.

Like any good piece of writing, a proof should be concise and not
be overly formal or cumbersome. In fact, overuse of formalism can of-
ten be detrimental to the argument since it can mask weaknesses in the
argument from both the writer and the reader. Sometimes students
try to “throw the kitchen sink” at an answer trying to list all possi-
bly relevant facts in the hope of getting partial credit. But a proof is a
piece of writing, and a badly written proof will not get credit even if
it contains some correct elements. It is better to write a clear proof of
a partial statement. In particular, if you haven’t been able to convince
yourself that the statement is true, you should be honest about it and
explain which parts of the statement you have been able to verify and
which parts you haven’t.

0.2.1 Example: The existence of infinitely many primes.


In the spirit of “do what I say and not what I do”, I will now demon-
strate the importance of conciseness by belaboring the point and
spending several paragraphs on a simple proof, written by Euclid
around 300 BC. Recall that a prime number is an integer 𝑝 > 1 whose
only divisors are 𝑝 and 1. Euclid’s Theorem is the following:

Theorem 0.1 — Infinitude of primes. There exist infinitely many primes.

Instead of simply writing down the proof, let us try to understand how we might figure this proof out. (If you haven’t seen this proof
before, or you don’t remember it, you might want to stop reading at
this point and try to come up with it on your own before continuing.)
The first (and often most important) step is to understand what the
statement means. Saying that the number of primes is infinite means
that it is not finite. More precisely, this means that for every natural
number 𝑘, there are more than 𝑘 primes.
Now that we understand what we need to prove, let us try to con-
vince ourselves of this fact. At first, it might seem obvious— since
there are infinitely many natural numbers, and every one of them can
be factored into primes, there must be infinitely many primes, right?
Wrong. Since we can multiply a prime many times with itself, a
finite number of primes can generate infinitely many numbers. In-
deed the single prime 3 generates the infinite set of all numbers of the
form 3ⁿ. So, what we really need to show is that for every finite set
of primes {𝑝1 , … , 𝑝𝑘 }, there exists a number 𝑛 that has a prime factor
outside this set.
Now we need to start playing around. Suppose that we had just
two primes 𝑝 and 𝑞. How would we find a number 𝑛 that is not gen-
erated by 𝑝 and 𝑞? If you try to draw things on the number line, you
will see that there is always some gap between multiples of 𝑝 and 𝑞 in

the sense that they are never consecutive. It is possible to prove that
(in fact, it’s not a bad exercise) but this observation already suggests a
guess for what would be a number that is divisible by neither 𝑝 nor 𝑞,
namely 𝑝𝑞 + 1. Indeed, the remainder of 𝑛 = 𝑝𝑞 + 1 when dividing by
either 𝑝 or 𝑞 would be 1 (which in particular is not zero). This obser-
vation generalizes and we can set 𝑛 = 𝑝𝑞𝑟 + 1 to be a number that is
divisible neither by 𝑝, 𝑞 nor 𝑟, and more generally 𝑛 = 𝑝1 ⋯ 𝑝𝑘 + 1 is not divisible by 𝑝1 , … , 𝑝𝑘 .
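
Before writing this down formally, it can be reassuring to check the reasoning numerically. Here is a small Python sketch (an illustrative addition, not part of Euclid's argument):

# n = p_1 * ... * p_k + 1 leaves remainder 1 when divided by each p_i,
# so its smallest prime factor must lie outside the given set.
from math import prod

def smallest_prime_factor(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # n itself is prime

primes = [2, 3, 5, 7, 11, 13]
n = prod(primes) + 1             # 30031 = 59 * 509
print([n % p for p in primes])   # [1, 1, 1, 1, 1, 1]
print(smallest_prime_factor(n))  # 59, a prime outside our set
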
Now we have convinced ourselves of the statement and it is time
to think of how to write this down in the clearest way. One issue that
arises is that we want to prove things truly from the definition of
primes and first principles, and so not assume properties of division
and remainders or even the existence of a prime factorization, without
proving it. Here is what a proof could look like. We will prove the
following two lemmas:
Lemma 0.2 — Existence of prime divisor. For every integer 𝑛 > 1, there exists a prime 𝑝 > 1 that divides 𝑛.

Lemma 0.3 — Existence of co-prime. For every set of integers 𝑝1 , … , 𝑝𝑘 > 1, there exists a number 𝑛 such that none of 𝑝1 , … , 𝑝𝑘 divide 𝑛.
From these two lemmas it follows that there exist infinitely many
primes, since otherwise if we let 𝑝1 , … , 𝑝𝑘 be the set of all primes,
then we would get a contradiction as by combining Lemma 0.2 and
Lemma 0.3 we would get a number 𝑛 with a prime factor outside this
set. We now prove the lemmas:

Proof of Lemma 0.2. Let 𝑛 > 1 be a number, and let 𝑝 be the smallest
divisor of 𝑛 that is larger than 1 (there exists such a number 𝑝 since 𝑛
divides itself). We claim that 𝑝 is a prime. Indeed suppose otherwise
there was some 1 < 𝑞 < 𝑝 that divides 𝑝. Then since 𝑛 = 𝑝𝑐 for some
integer 𝑐 and 𝑝 = 𝑞𝑐′ for some integer 𝑐′ we’ll get that 𝑛 = 𝑞𝑐𝑐′ and
hence 𝑞 divides 𝑛 in contradiction to the choice of 𝑝 as the smallest
divisor of 𝑛.

Proof of Lemma 0.3. Let 𝑛 = 𝑝1 ⋯ 𝑝𝑘 + 1 and suppose for the sake of contradiction that there exists some 𝑖 such that 𝑛 = 𝑝𝑖 ⋅ 𝑐 for some
integer 𝑐. Then if we divide the equation 𝑛 − 𝑝1 ⋯ 𝑝𝑘 = 1 by 𝑝𝑖 then we
get 𝑐 minus an integer on the lefthand side, and the fraction 1/𝑝𝑖 on
the righthand side.

This completes the proof of Theorem 0.1.

0.3 PROBABILITY AND SAMPLE SPACES


Perhaps the main mathematical background needed in cryptography
is probability theory since, as we will see, there is no secrecy without
randomness. Luckily, we only need fairly basic notions of probability
theory and in particular only probability over finite sample spaces.
If you have a good understanding of what happens when we toss 𝑘
random coins, then you know most of the probability you’ll need.
The discussion below is not meant to replace a course on probability theory, and if you have not seen this material before, I highly recommend you look at additional resources to get up to speed.¹

¹ Harvard’s STAT 110 class (whose lectures are available on youtube) is a highly recommended introduction to probability. See also these lecture notes from MIT’s “Mathematics for Computer Science” course, as well as notes 12-17 of Berkeley’s CS 70.

The nature of randomness and probability is a topic of great philosophical, scientific and mathematical depth. Is there actual random-
ness in the world, or does it proceed in a deterministic clockwork fash-
ion from some initial conditions set at the beginning of time? Does
probability refer to our uncertainty of beliefs, or to the frequency of
occurrences in repeated experiments? How can we define probability
over infinite sets?
These are all important questions that have been studied and de-
bated by scientists, mathematicians, statisticians, and philosophers.
Fortunately, we will not need to deal directly with these questions
here. We will be mostly interested in the setting of tossing 𝑛 random,
unbiased and independent coins. Below we define the basic proba-
bilistic objects of events and random variables when restricted to this
setting. These can be defined for much more general probabilistic ex-
periments or sample spaces, and later on we will briefly discuss how
this can be done. However, the 𝑛-coin case is sufficient for almost
everything we’ll need in this course.
If instead of “heads” and “tails” we encode the sides of each coin by “zero” and “one”, we can encode the result of tossing 𝑛 coins as a string in {0, 1}ⁿ. Each particular outcome 𝑥 ∈ {0, 1}ⁿ is obtained with probability 2⁻ⁿ. For example, if we toss three coins, then we obtain each of the 8 outcomes 000, 001, 010, 011, 100, 101, 110, 111 with probability 2⁻³ = 1/8 (see also Fig. 1). We can describe the experiment of tossing 𝑛 coins as choosing a string 𝑥 uniformly at random from {0, 1}ⁿ, and hence we’ll use the shorthand 𝑥 ∼ {0, 1}ⁿ for 𝑥 that is chosen according to this experiment.

Figure 1: The probabilistic experiment of tossing three coins corresponds to making 2 × 2 × 2 = 8 choices, each with equal probability. In this example, the blue set corresponds to the event 𝐴 = {𝑥 ∈ {0, 1}³ | 𝑥0 = 0} where the first coin toss is equal to 0, and the pink set corresponds to the event 𝐵 = {𝑥 ∈ {0, 1}³ | 𝑥1 = 1} where the second coin toss is equal to 1 (with their intersection having a purplish color). As we can see, each of these events contains 4 elements (out of 8 total) and so has probability 1/2. The intersection of 𝐴 and 𝐵 contains two elements, and so the probability that both of these events occur is 2/8 = 1/4.

An event is simply a subset 𝐴 of {0, 1}ⁿ. The probability of 𝐴, denoted by Pr_{𝑥∼{0,1}ⁿ}[𝐴] (or Pr[𝐴] for short, when the sample space is understood from the context), is the probability that an 𝑥 chosen uniformly at random will be contained in 𝐴. Note that this is the same as |𝐴|/2ⁿ (where |𝐴| as usual denotes the number of elements in the set 𝐴). For example, the probability that 𝑥 has an even number of ones is Pr[𝐴] where 𝐴 = {𝑥 ∶ ∑_{𝑖=0}^{𝑛−1} 𝑥𝑖 = 0 mod 2}. In the case 𝑛 = 3, 𝐴 = {000, 011, 101, 110}, and hence Pr[𝐴] = 4/8 = 1/2 (see Fig. 2). It turns out this is true for every 𝑛:
Lemma 0.4 For every 𝑛 > 0,

Pr_{𝑥∼{0,1}ⁿ} [∑_{𝑖=0}^{𝑛−1} 𝑥𝑖 is even] = 1/2

To test your intuition on probability, try to stop here and prove the lemma on your own.

Figure 2: The event that if we toss three coins 𝑥0 , 𝑥1 , 𝑥2 ∈ {0, 1} then the sum of the 𝑥𝑖 ’s is even has probability 1/2, since it corresponds to exactly 4 out of the 8 possible strings of length 3.

Proof of Lemma 0.4. We prove the lemma by induction on 𝑛. For the case 𝑛 = 1 it is clear since 𝑥 = 0 is even and 𝑥 = 1 is odd, and hence the probability that 𝑥 ∈ {0, 1} is even is 1/2. Let 𝑛 > 1. We assume
by induction that the lemma is true for 𝑛 − 1 and we will prove it
for 𝑛. We split the set {0, 1}𝑛 into four disjoint sets 𝐸0 , 𝐸1 , 𝑂0 , 𝑂1 ,
where for 𝑏 ∈ {0, 1}, 𝐸𝑏 is defined as the set of 𝑥 ∈ {0, 1}𝑛 such that
𝑥0 ⋯ 𝑥𝑛−2 has even number of ones and 𝑥𝑛−1 = 𝑏 and similarly 𝑂𝑏 is
the set of 𝑥 ∈ {0, 1}𝑛 such that 𝑥0 ⋯ 𝑥𝑛−2 has odd number of ones and
𝑥𝑛−1 = 𝑏. Since 𝐸0 is obtained by simply extending an (𝑛 − 1)-length string with an even number of ones by the digit 0, the size of 𝐸0 is simply the number of such (𝑛 − 1)-length strings, which by the induction hypothesis is 2ⁿ⁻¹/2 = 2ⁿ⁻². The same reasoning applies for 𝐸1 , 𝑂0 , and 𝑂1 . Hence each one of the four sets 𝐸0 , 𝐸1 , 𝑂0 , 𝑂1 is of size 2ⁿ⁻². Since
𝑥 ∈ {0, 1}𝑛 has an even number of ones if and only if 𝑥 ∈ 𝐸0 ∪ 𝑂1
(i.e., either the first 𝑛 − 1 coordinates sum up to an even number and
the final coordinate is 0 or the first 𝑛 − 1 coordinates sum up to an odd
number and the final coordinate is 1), we get that the probability that
𝑥 satisfies this property is

|𝐸0 ∪ 𝑂1|/2ⁿ = (2ⁿ⁻² + 2ⁿ⁻²)/2ⁿ = 1/2 ,
using the fact that 𝐸0 and 𝑂1 are disjoint and hence |𝐸0 ∪ 𝑂1 | =
|𝐸0 | + |𝑂1 |.
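
As a sanity check of Lemma 0.4, one can verify the 1/2 probability by brute-force enumeration for small 𝑛; here is a quick Python sketch (added for illustration, not from the notes):

# Enumerate all of {0,1}^n and count the strings with an even number of ones.
from itertools import product

for n in range(1, 8):
    outcomes = list(product([0, 1], repeat=n))
    even = sum(1 for x in outcomes if sum(x) % 2 == 0)
    print(n, even / len(outcomes))  # prints 0.5 for every n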

We can also use the intersection (∩) and union (∪) operators to
talk about the probability of both event 𝐴 and event 𝐵 happening, or
the probability of event 𝐴 or event 𝐵 happening. For example, the
probability 𝑝 that 𝑥 has an even number of ones and 𝑥0 = 1 is the same
as Pr[𝐴 ∩ 𝐵] where 𝐴 = {𝑥 ∈ {0, 1}ⁿ ∶ ∑_{𝑖=0}^{𝑛−1} 𝑥𝑖 = 0 mod 2} and
𝐵 = {𝑥 ∈ {0, 1}𝑛 ∶ 𝑥0 = 1}. This probability is equal to 1/4 for
𝑛 > 1. (It is a great exercise for you to pause here and verify that you
understand why this is the case.)

Because intersection corresponds to considering the logical AND of the conditions that two events happen, while union corresponds
to considering the logical OR, we will sometimes use the ∧ and ∨
operators instead of ∩ and ∪, and so write this probability 𝑝 = Pr[𝐴 ∩
𝐵] defined above also as

Pr_{𝑥∼{0,1}ⁿ} [∑ᵢ 𝑥𝑖 = 0 mod 2 ∧ 𝑥0 = 1] .

If 𝐴 ⊆ {0, 1}ⁿ is an event, then the complement Ā = {0, 1}ⁿ ⧵ 𝐴 corresponds to the event that 𝐴 does not happen. Since |Ā| = 2ⁿ − |𝐴|, we get that

Pr[Ā] = |Ā|/2ⁿ = (2ⁿ − |𝐴|)/2ⁿ = 1 − |𝐴|/2ⁿ = 1 − Pr[𝐴] .

This makes sense: since Ā happens if and only if 𝐴 does not happen, the probability of Ā should be one minus the probability of 𝐴.

Remark 0.5 — Remember the sample space. While the
above definition might seem very simple and almost
trivial, the human mind seems not to have evolved for
probabilistic reasoning, and it is surprising how often
people can get even the simplest settings of probability
wrong. One way to make sure you don’t get confused
when trying to calculate probability statements is
to always ask yourself the following two questions:
(1) Do I understand what is the sample space that
this probability is taken over?, and (2) Do I under-
stand what is the definition of the event that we are
analyzing?.
For example, suppose that I were to randomize seating
in my course, and then it turned out that students
sitting in row 7 performed better on the final: how
surprising should we find this? If we started out with
the hypothesis that there is something special about
the number 7 and chose it ahead of time, then the
event that we are discussing is the event 𝐴 that stu-
dents sitting in row 7 had better performance on
the final, and we might find it surprising. However, if
we first looked at the results and then chose the row
whose average performance is best, then the event
we are discussing is the event 𝐵 that there exists some
row where the performance is higher than the over-
all average. 𝐵 is a superset of 𝐴, and its probability
(even if there is no correlation between sitting and
performance) can be quite significant.

0.3.1 Random variables


Events correspond to Yes/No questions, but often we want to analyze
finer questions. For example, if we make a bet at the roulette wheel,

we don’t want to just analyze whether we won or lost, but also how
much we’ve gained. A (real valued) random variable is simply a way
to associate a number with the result of a probabilistic experiment.
Formally, a random variable is a function 𝑋 ∶ {0, 1}𝑛 → ℝ that maps
every outcome 𝑥 ∈ {0, 1}𝑛 to an element 𝑋(𝑥) ∈ ℝ. For example, the
function 𝑠𝑢𝑚 ∶ {0, 1}𝑛 → ℝ that maps 𝑥 to the sum of its coordinates
(i.e., to ∑_{𝑖=0}^{𝑛−1} 𝑥𝑖) is a random variable.
The expectation of a random variable 𝑋, denoted by 𝔼[𝑋], is the
average value that this number takes, taken over all draws from
the probabilistic experiment. In other words, the expectation of 𝑋 is
defined as follows:

𝔼[𝑋] = ∑_{𝑥∈{0,1}ⁿ} 2⁻ⁿ 𝑋(𝑥) .

If 𝑋 and 𝑌 are random variables, then we can define 𝑋 + 𝑌 as simply the random variable that maps a point 𝑥 ∈ {0, 1}ⁿ to 𝑋(𝑥) +
𝑌 (𝑥). One basic and very useful property of the expectation is that it
is linear:
Lemma 0.6 — Linearity of expectation.

𝔼[𝑋 + 𝑌 ] = 𝔼[𝑋] + 𝔼[𝑌 ]

Proof.
𝔼[𝑋 + 𝑌] = ∑_{𝑥∈{0,1}ⁿ} 2⁻ⁿ (𝑋(𝑥) + 𝑌(𝑥)) = ∑_{𝑥∈{0,1}ⁿ} 2⁻ⁿ 𝑋(𝑥) + ∑_{𝑥∈{0,1}ⁿ} 2⁻ⁿ 𝑌(𝑥) = 𝔼[𝑋] + 𝔼[𝑌] .

Similarly, 𝔼[𝑘𝑋] = 𝑘 𝔼[𝑋] for every 𝑘 ∈ ℝ. For example, using the linearity of expectation, it is very easy to show that the expectation of the sum of the 𝑥𝑖 ’s for 𝑥 ∼ {0, 1}ⁿ is equal to 𝑛/2. Indeed, if we write 𝑋 = ∑_{𝑖=0}^{𝑛−1} 𝑥𝑖 then 𝑋 = 𝑋0 + ⋯ + 𝑋𝑛−1 where 𝑋𝑖 is the random variable 𝑥𝑖 . Since for every 𝑖, Pr[𝑋𝑖 = 0] = 1/2 and Pr[𝑋𝑖 = 1] = 1/2, we get that 𝔼[𝑋𝑖] = (1/2) ⋅ 0 + (1/2) ⋅ 1 = 1/2 and hence 𝔼[𝑋] = ∑_{𝑖=0}^{𝑛−1} 𝔼[𝑋𝑖] = 𝑛 ⋅ (1/2) = 𝑛/2.

If you have not seen discrete probability before, please
go over this argument again until you are sure you
follow it; it is a prototypical simple example of the
type of reasoning we will employ again and again in
this course.
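
If you like to experiment, the 𝑛/2 expectation above is also easy to confirm by direct enumeration; a short illustrative Python sketch (not from the notes):

# Expectation of the number of ones in a uniform x ~ {0,1}^n, by enumeration.
from itertools import product

n = 6
outcomes = list(product([0, 1], repeat=n))
expectation = sum(sum(x) for x in outcomes) / len(outcomes)
print(expectation)  # 3.0, which equals n/2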

If 𝐴 is an event, then 1𝐴 is the random variable such that 1𝐴(𝑥) equals 1 if 𝑥 ∈ 𝐴, and 1𝐴(𝑥) = 0 otherwise. Note that Pr[𝐴] = 𝔼[1𝐴]
(can you see why?). Using this and the linearity of expectation, we
can show one of the most useful bounds in probability theory:
Lemma 0.7 — Union bound. For every two events 𝐴, 𝐵, Pr[𝐴 ∪ 𝐵] ≤ Pr[𝐴] +
Pr[𝐵]

Before looking at the proof, try to see why the union
bound makes intuitive sense. We can also prove
it directly from the definition of probabilities and
the cardinality of sets, together with the equation
|𝐴 ∪ 𝐵| ≤ |𝐴| + |𝐵|. Can you see why the latter
equation is true? (See also Fig. 3.)

Proof of Lemma 0.7. For every 𝑥, the variable 1𝐴∪𝐵 (𝑥) ≤ 1𝐴 (𝑥) + 1𝐵 (𝑥).
Hence, Pr[𝐴∪𝐵] = 𝔼[1𝐴∪𝐵 ] ≤ 𝔼[1𝐴 +1𝐵 ] = 𝔼[1𝐴 ]+𝔼[1𝐵 ] = Pr[𝐴]+Pr[𝐵].

The way we often use this in theoretical computer science is to argue that, for example, if there is a list of 100 bad events that can hap-
pen, and each one of them happens with probability at most 1/10000,
then with probability at least 1 − 100/10000 = 0.99, no bad event
happens.

0.3.2 Distributions over strings


While most of the time we think of random variables as having
as output a real number, we sometimes consider random vari-
ables whose output is a string. That is, we can think of a map
𝑌 ∶ {0, 1}𝑛 → {0, 1}∗ and consider the “random variable” 𝑌 such
that for every 𝑦 ∈ {0, 1}∗ , the probability that 𝑌 outputs 𝑦 is equal
to (1/2ⁿ)|{𝑥 ∈ {0, 1}ⁿ | 𝑌(𝑥) = 𝑦}|. To avoid confusion, we will typically refer to such string-valued random variables as distributions over strings. So, a distribution 𝑌 over strings {0, 1}∗ can be thought of as a finite collection of strings 𝑦0 , … , 𝑦𝑀−1 ∈ {0, 1}∗ and probabilities 𝑝0 , … , 𝑝𝑀−1 (which are non-negative numbers summing up to one), so that Pr[𝑌 = 𝑦𝑖] = 𝑝𝑖 .

Figure 3: The union bound tells us that the probability of 𝐴 or 𝐵 happening is at most the sum of the individual probabilities. We can see it by noting that for every two sets |𝐴 ∪ 𝐵| ≤ |𝐴| + |𝐵| (with equality only if 𝐴 and 𝐵 have no intersection).
Two distributions 𝑌 and 𝑌 ′ are identical if they assign the same
probability to every string. For example, consider the following two
functions 𝑌 , 𝑌 ′ ∶ {0, 1}2 → {0, 1}2 . For every 𝑥 ∈ {0, 1}2 , we define
𝑌(𝑥) = 𝑥 and 𝑌′(𝑥) = 𝑥0(𝑥0 ⊕ 𝑥1) where ⊕ is the XOR operation. Al-
though these are two different functions, they induce the same distri-
bution over {0, 1}2 when invoked on a uniform input. The distribution
𝑌 (𝑥) for 𝑥 ∼ {0, 1}2 is of course the uniform distribution over {0, 1}2 .

On the other hand 𝑌′ is simply the map 00 ↦ 00, 01 ↦ 01, 10 ↦ 11, 11 ↦ 10, which is a permutation over {0, 1}². (Here 𝑌 corresponds to the map 𝐹 ∶ {0, 1}² → {0, 1}² defined as 𝐹(𝑥0𝑥1) = 𝑥0𝑥1, and 𝑌′ to the map 𝐺 ∶ {0, 1}² → {0, 1}² defined as 𝐺(𝑥0𝑥1) = 𝑥0(𝑥0 ⊕ 𝑥1).) Hence 𝑌′(𝑥) for 𝑥 ∼ {0, 1}² also induces the uniform distribution over {0, 1}².
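
One can check that 𝑌 and 𝑌′ indeed induce identical distributions by enumerating all four inputs; a minimal Python sketch (my addition, not from the notes):

# Two different maps on {0,1}^2 that induce the same output distribution
# when applied to a uniformly random input.
from collections import Counter
from itertools import product

def Y(x0, x1):
    return (x0, x1)

def Y_prime(x0, x1):
    return (x0, x0 ^ x1)

inputs = list(product([0, 1], repeat=2))
print(Counter(Y(*x) for x in inputs))        # each of the 4 outputs appears once
print(Counter(Y_prime(*x) for x in inputs))  # same: the uniform distribution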

0.3.3 More general sample spaces.


While in this chapter we assume that the underlying probabilistic
experiment corresponds to tossing 𝑛 independent coins, everything
we say easily generalizes to sampling 𝑥 from a more general finite or
countable set 𝑆 (and not-so-easily generalizes to uncountable sets 𝑆 as
well). A probability distribution over a finite set 𝑆 is simply a function
𝜇 ∶ 𝑆 → [0, 1] such that ∑_{𝑥∈𝑆} 𝜇(𝑥) = 1. We think of this as the experiment where we obtain every 𝑥 ∈ 𝑆 with probability 𝜇(𝑥), and
sometimes denote this as 𝑥 ∼ 𝜇. An event 𝐴 is a subset of 𝑆, and the
probability of 𝐴, which we denote by Pr𝜇 [𝐴], is ∑𝑥∈𝐴 𝜇(𝑥). A random
variable is a function 𝑋 ∶ 𝑆 → ℝ, where the probability that 𝑋 = 𝑦 is
equal to ∑_{𝑥∈𝑆 s.t. 𝑋(𝑥)=𝑦} 𝜇(𝑥).

0.4 CORRELATIONS AND INDEPENDENCE


One of the most delicate but important concepts in probability is the
notion of independence (and the opposing notion of correlations). Subtle
correlations are often behind surprises and errors in probability and
statistical analysis, and several mistaken predictions have been blamed
on miscalculating the correlations between, say, housing prices in
Florida and Arizona, or voter preferences in Ohio and Michigan. See
also Joe Blitzstein’s aptly named talk “Conditioning is the Soul of
Statistics”. (Another thorny issue is of course the difference between
correlation and causation. Luckily, this is another point we don’t need to
worry about in our clean setting of tossing 𝑛 coins.)
Two events 𝐴 and 𝐵 are independent if the fact that 𝐴 happens
makes 𝐵 neither more nor less likely to happen. For example, if we
think of the experiment of tossing 3 random coins 𝑥 ∈ {0, 1}^3, and we
let 𝐴 be the event that 𝑥_0 = 1 and 𝐵 the event that 𝑥_0 + 𝑥_1 + 𝑥_2 ≥ 2,
then if 𝐴 happens it is more likely that 𝐵 happens, and hence these
events are not independent. On the other hand, if we let 𝐶 be the event
that 𝑥_1 = 1, then because the second coin toss is not affected by the
result of the first one, the events 𝐴 and 𝐶 are independent.
The formal definition is that events 𝐴 and 𝐵 are independent if
Pr[𝐴 ∩ 𝐵] = Pr[𝐴] ⋅ Pr[𝐵]. If Pr[𝐴 ∩ 𝐵] > Pr[𝐴] ⋅ Pr[𝐵] then we say
that 𝐴 and 𝐵 are positively correlated, while if Pr[𝐴 ∩ 𝐵] < Pr[𝐴] ⋅ Pr[𝐵]
then we say that 𝐴 and 𝐵 are negatively correlated (see Fig. 4).

Figure 4: Two events 𝐴 and 𝐵 are independent if Pr[𝐴 ∩ 𝐵] = Pr[𝐴] ⋅ Pr[𝐵]. In the two figures above, the empty 𝑥 × 𝑥 square is the sample space, and 𝐴 and 𝐵 are two events in this sample space. In the left figure, 𝐴 and 𝐵 are independent, while in the right figure they are negatively correlated, since 𝐵 is less likely to occur if we condition on 𝐴 (and vice versa). Mathematically, one can see this by noticing that in the left figure the areas of 𝐴 and 𝐵 respectively are 𝑎 ⋅ 𝑥 and 𝑏 ⋅ 𝑥, and so their probabilities are (𝑎 ⋅ 𝑥)/𝑥^2 = 𝑎/𝑥 and (𝑏 ⋅ 𝑥)/𝑥^2 = 𝑏/𝑥 respectively, while the area of 𝐴 ∩ 𝐵 is 𝑎 ⋅ 𝑏, which corresponds to the probability (𝑎 ⋅ 𝑏)/𝑥^2. In the right figure, the area of the triangle 𝐵 is (𝑏 ⋅ 𝑥)/2, which corresponds to a probability of 𝑏/(2𝑥), but the area of 𝐴 ∩ 𝐵 is (𝑏′ ⋅ 𝑎)/2 for some 𝑏′ < 𝑏. This means that the probability of 𝐴 ∩ 𝐵 is (𝑏′ ⋅ 𝑎)/(2𝑥^2) < (𝑏/(2𝑥)) ⋅ (𝑎/𝑥), or in other words Pr[𝐴 ∩ 𝐵] < Pr[𝐴] ⋅ Pr[𝐵].

If we consider the above examples on the experiment of choosing
𝑥 ∈ {0, 1}^3 then we can see that

Pr[𝑥_0 = 1] = 1/2

Pr[𝑥_0 + 𝑥_1 + 𝑥_2 ≥ 2] = Pr[{011, 101, 110, 111}] = 4/8 = 1/2

but

Pr[𝑥_0 = 1 ∧ 𝑥_0 + 𝑥_1 + 𝑥_2 ≥ 2] = Pr[{101, 110, 111}] = 3/8 > 1/2 ⋅ 1/2

and hence, as we already observed, the events {𝑥_0 = 1} and {𝑥_0 +
𝑥_1 + 𝑥_2 ≥ 2} are not independent and in fact are positively correlated.
On the other hand, Pr[𝑥_0 = 1 ∧ 𝑥_1 = 1] = Pr[{110, 111}] = 2/8 = 1/2 ⋅ 1/2,
and hence the events {𝑥_0 = 1} and {𝑥_1 = 1} are indeed independent.
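Since the sample space has only eight elements, these calculations are easy to verify by brute force. The following Python sketch (ours, not part of the original notes) does exactly that:

```python
from fractions import Fraction
from itertools import product

space = list(product([0, 1], repeat=3))   # the 8 outcomes of 3 coin tosses

def prob(event):
    return Fraction(sum(1 for x in space if event(x)), len(space))

A = lambda x: x[0] == 1                   # the event x0 = 1
B = lambda x: x[0] + x[1] + x[2] >= 2     # the event x0 + x1 + x2 >= 2
C = lambda x: x[1] == 1                   # the event x1 = 1

# A and B are positively correlated: Pr[A and B] = 3/8 > 1/4 = Pr[A]*Pr[B]
assert prob(lambda x: A(x) and B(x)) > prob(A) * prob(B)
# A and C are independent: Pr[A and C] = 1/4 = Pr[A]*Pr[C]
assert prob(lambda x: A(x) and C(x)) == prob(A) * prob(C)
```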

R
Remark 0.8 — Disjointness vs independence. People
sometimes confuse the notion of disjointness and in-
dependence, but these are actually quite different. Two
events 𝐴 and 𝐵 are disjoint if 𝐴 ∩ 𝐵 = ∅, which means
that if 𝐴 happens then 𝐵 definitely does not happen.
They are independent if Pr[𝐴 ∩ 𝐵] = Pr[𝐴] Pr[𝐵] which
means that knowing that 𝐴 happens gives us no infor-
mation about whether 𝐵 happened or not. If 𝐴 and 𝐵
have nonzero probability, then being disjoint implies
that they are not independent, since in particular it
means that they are negatively correlated.

Conditional probability: If 𝐴 and 𝐵 are events, and 𝐴 happens with
nonzero probability, then we define the probability that 𝐵 happens
conditioned on 𝐴 to be Pr[𝐵|𝐴] = Pr[𝐴 ∩ 𝐵]/ Pr[𝐴]. This corresponds
to calculating the probability that 𝐵 happens if we already know
that 𝐴 happened. Note that 𝐴 and 𝐵 are independent if and only if
Pr[𝐵|𝐴] = Pr[𝐵].
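For example, for the events 𝐴 = {𝑥_0 = 1} and 𝐵 = {𝑥_0 + 𝑥_1 + 𝑥_2 ≥ 2} above, we get Pr[𝐵|𝐴] = (3/8)/(1/2) = 3/4, which is larger than Pr[𝐵] = 1/2, as we would expect given that 𝐴 and 𝐵 are positively correlated.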

More than two events: We can generalize this definition to more than
two events. We say that events 𝐴_1, … , 𝐴_𝑘 are mutually independent
if knowing that any set of them occurred or didn't occur does not
change the probability that an event outside the set occurs. Formally,
the condition is that for every subset 𝐼 ⊆ [𝑘],

Pr[∧_{𝑖∈𝐼} 𝐴_𝑖] = ∏_{𝑖∈𝐼} Pr[𝐴_𝑖].

For example, if 𝑥 ∼ {0, 1}^3, then the events {𝑥_0 = 1}, {𝑥_1 = 1} and
{𝑥_2 = 1} are mutually independent. On the other hand, the events
{𝑥_0 = 1}, {𝑥_1 = 1} and {𝑥_0 + 𝑥_1 = 0 mod 2} are not mutually
independent, even though every pair of these events is independent
(can you see why? see also Fig. 5).

Figure 5: Consider the sample space {0, 1}^𝑛 and the events 𝐴, 𝐵, 𝐶, 𝐷, 𝐸 corresponding to 𝐴: 𝑥_0 = 1, 𝐵: 𝑥_1 = 1, 𝐶: 𝑥_0 + 𝑥_1 + 𝑥_2 ≥ 2, 𝐷: 𝑥_0 + 𝑥_1 + 𝑥_2 = 0 mod 2, and 𝐸: 𝑥_0 + 𝑥_1 = 0 mod 2. We can see that 𝐴 and 𝐵 are independent, 𝐶 is positively correlated with 𝐴 and positively correlated with 𝐵, the three events 𝐴, 𝐵, 𝐷 are mutually independent, and while every pair out of 𝐴, 𝐵, 𝐸 is independent, the three events 𝐴, 𝐵, 𝐸 are not mutually independent since their intersection has probability 2/8 = 1/4 instead of 1/2 ⋅ 1/2 ⋅ 1/2 = 1/8.
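The difference between pairwise and mutual independence can also be checked mechanically. This sketch (ours, in the same spirit as the check above) confirms that the events 𝐴, 𝐵, 𝐸 of Fig. 5 are pairwise independent but not mutually independent:

```python
from fractions import Fraction
from itertools import combinations, product

space = list(product([0, 1], repeat=3))

def prob(event):
    return Fraction(sum(1 for x in space if event(x)), len(space))

events = [
    lambda x: x[0] == 1,                  # A: x0 = 1
    lambda x: x[1] == 1,                  # B: x1 = 1
    lambda x: (x[0] + x[1]) % 2 == 0,     # E: x0 + x1 = 0 mod 2
]

# Every pair is independent...
for e1, e2 in combinations(events, 2):
    assert prob(lambda x: e1(x) and e2(x)) == prob(e1) * prob(e2)

# ...but the triple is not: Pr[A and B and E] = 1/4 rather than 1/8.
assert prob(lambda x: all(e(x) for e in events)) == Fraction(1, 4)
```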

0.4.1 Independent random variables


We say that two random variables 𝑋 ∶ {0, 1}𝑛 → ℝ and 𝑌 ∶ {0, 1}𝑛 → ℝ
are independent if for every 𝑢, 𝑣 ∈ ℝ, the events {𝑋 = 𝑢} and {𝑌 = 𝑣}
are independent. (We use {𝑋 = 𝑢} as shorthand for {𝑥 | 𝑋(𝑥) = 𝑢}.)
In other words, 𝑋 and 𝑌 are independent if Pr[𝑋 = 𝑢 ∧ 𝑌 = 𝑣] =
Pr[𝑋 = 𝑢] Pr[𝑌 = 𝑣] for every 𝑢, 𝑣 ∈ ℝ. For example, if two random
variables depend on the result of tossing different coins then they are
independent:
Lemma 0.9 Suppose that 𝑆 = {𝑠0 , … , 𝑠𝑘−1 } and 𝑇 = {𝑡0 , … , 𝑡𝑚−1 } are
disjoint subsets of {0, … , 𝑛 − 1} and let 𝑋, 𝑌 ∶ {0, 1}𝑛 → ℝ be random
variables such that 𝑋 = 𝐹 (𝑥𝑠0 , … , 𝑥𝑠𝑘−1 ) and 𝑌 = 𝐺(𝑥𝑡0 , … , 𝑥𝑡𝑚−1 ) for
some functions 𝐹 ∶ {0, 1}𝑘 → ℝ and 𝐺 ∶ {0, 1}𝑚 → ℝ. Then 𝑋 and 𝑌
are independent.

P
The notation in the lemma’s statement is a bit cum-
bersome, but at the end of the day, it simply says that
if 𝑋 and 𝑌 are random variables that depend on two
disjoint sets 𝑆 and 𝑇 of coins (for example, 𝑋 might
be the sum of the first 𝑛/2 coins, and 𝑌 might be the
largest consecutive stretch of zeroes in the second 𝑛/2
coins), then they are independent.

Proof of Lemma 0.9. Let 𝑎, 𝑏 ∈ ℝ, and let 𝐴 = {𝑥 ∈ {0, 1}^𝑘 ∶ 𝐹(𝑥) = 𝑎}
and 𝐵 = {𝑥 ∈ {0, 1}^𝑚 ∶ 𝐺(𝑥) = 𝑏}. Since 𝑆 and 𝑇 are disjoint, we can
reorder the indices so that 𝑆 = {0, … , 𝑘 − 1} and 𝑇 = {𝑘, … , 𝑘 + 𝑚 − 1}
without affecting any of the probabilities. Hence we can write Pr[𝑋 =
𝑎 ∧ 𝑌 = 𝑏] = |𝐶|/2^𝑛 where 𝐶 = {𝑥_0, … , 𝑥_{𝑛−1} ∶ (𝑥_0, … , 𝑥_{𝑘−1}) ∈
𝐴 ∧ (𝑥_𝑘, … , 𝑥_{𝑘+𝑚−1}) ∈ 𝐵}. Another way to write this using string
concatenation is that 𝐶 = {𝑥𝑦𝑧 ∶ 𝑥 ∈ 𝐴, 𝑦 ∈ 𝐵, 𝑧 ∈ {0, 1}^{𝑛−𝑘−𝑚}}, and
hence |𝐶| = |𝐴||𝐵|2^{𝑛−𝑘−𝑚}, which means that

|𝐶|/2^𝑛 = (|𝐴|/2^𝑘) ⋅ (|𝐵|/2^𝑚) ⋅ (2^{𝑛−𝑘−𝑚}/2^{𝑛−𝑘−𝑚}) = Pr[𝑋 = 𝑎] Pr[𝑌 = 𝑏].

Note that if 𝑋 and 𝑌 are independent random variables then (if
we let 𝑆_𝑋, 𝑆_𝑌 denote all the numbers that have positive probability of
being the output of 𝑋 and 𝑌, respectively) it holds that:

𝔼[𝑋𝑌] = ∑_{𝑎∈𝑆_𝑋, 𝑏∈𝑆_𝑌} Pr[𝑋 = 𝑎 ∧ 𝑌 = 𝑏] ⋅ 𝑎𝑏 =^{(1)} ∑_{𝑎∈𝑆_𝑋, 𝑏∈𝑆_𝑌} Pr[𝑋 = 𝑎] Pr[𝑌 = 𝑏] ⋅ 𝑎𝑏 =^{(2)}

(∑_{𝑎∈𝑆_𝑋} Pr[𝑋 = 𝑎] ⋅ 𝑎) (∑_{𝑏∈𝑆_𝑌} Pr[𝑌 = 𝑏] ⋅ 𝑏) =^{(3)}

𝔼[𝑋] 𝔼[𝑌]

where the first equality (=^{(1)}) follows from the independence of 𝑋
and 𝑌, the second equality (=^{(2)}) follows by “opening the parentheses”
of the right-hand side, and the third equality (=^{(3)}) follows
from the definition of expectation. (This is not an “if and only if”; see
Exercise 0.8.)
Another useful fact is that if 𝑋 and 𝑌 are independent random
variables, then so are 𝐹(𝑋) and 𝐺(𝑌) for all functions 𝐹, 𝐺 ∶ ℝ → ℝ.
This is intuitively true since learning 𝐹(𝑋) can only provide us with
less information than does learning 𝑋 itself. Hence, if learning 𝑋
does not teach us anything about 𝑌 (and so also about 𝐺(𝑌)) then
neither will learning 𝐹(𝑋). Indeed, to prove this we can write for
every 𝑎, 𝑏 ∈ ℝ:

Pr[𝐹(𝑋) = 𝑎 ∧ 𝐺(𝑌) = 𝑏] = ∑_{𝑥 s.t. 𝐹(𝑥)=𝑎, 𝑦 s.t. 𝐺(𝑦)=𝑏} Pr[𝑋 = 𝑥 ∧ 𝑌 = 𝑦]

= ∑_{𝑥 s.t. 𝐹(𝑥)=𝑎, 𝑦 s.t. 𝐺(𝑦)=𝑏} Pr[𝑋 = 𝑥] Pr[𝑌 = 𝑦]

= (∑_{𝑥 s.t. 𝐹(𝑥)=𝑎} Pr[𝑋 = 𝑥]) ⋅ (∑_{𝑦 s.t. 𝐺(𝑦)=𝑏} Pr[𝑌 = 𝑦])

= Pr[𝐹(𝑋) = 𝑎] Pr[𝐺(𝑌) = 𝑏].

0.4.2 Collections of independent random variables.


We can extend the notions of independence to more than two random
variables: we say that the random variables 𝑋0 , … , 𝑋𝑛−1 are mutually
independent if for every 𝑎0 , … , 𝑎𝑛−1 ∈ ℝ,

Pr [𝑋0 = 𝑎0 ∧ ⋯ ∧ 𝑋𝑛−1 = 𝑎𝑛−1 ] = Pr[𝑋0 = 𝑎0 ] ⋯ Pr[𝑋𝑛−1 = 𝑎𝑛−1 ].

And similarly, we have that


Lemma 0.10 — Expectation of product of independent random variables. If
𝑋0 , … , 𝑋𝑛−1 are mutually independent then
𝔼[∏_{𝑖=0}^{𝑛−1} 𝑋_𝑖] = ∏_{𝑖=0}^{𝑛−1} 𝔼[𝑋_𝑖].

Lemma 0.11 — Functions preserve independence. If 𝑋_0, … , 𝑋_{𝑛−1} are mutually
independent, and 𝑌_0, … , 𝑌_{𝑛−1} are defined as 𝑌_𝑖 = 𝐹_𝑖(𝑋_𝑖) for
some functions 𝐹_0, … , 𝐹_{𝑛−1} ∶ ℝ → ℝ, then 𝑌_0, … , 𝑌_{𝑛−1} are mutually
independent as well.

P
We leave proving Lemma 0.10 and Lemma 0.11 as
Exercise 0.9 and Exercise 0.10. It is a good idea for you
to stop now and do these exercises to make sure you are
comfortable with the notion of independence, as we will
use it heavily later on in this course.

0.5 CONCENTRATION AND TAIL BOUNDS


The name “expectation” is somewhat misleading. For example, sup-
pose that you and I place a bet on the outcome of 10 coin tosses, where
if they all come out to be 1's then I pay you 100,000 dollars and other-
wise you pay me 10 dollars. If we let 𝑋 ∶ {0, 1}^{10} → ℝ be the random
variable denoting your gain, then we see that

𝔼[𝑋] = 2^{−10} ⋅ 100000 − (1 − 2^{−10}) ⋅ 10 ∼ 90.

But we don't really “expect” the result of this experiment to be for
you to gain 90 dollars. Rather, about 99.9% of the time you will pay me 10
dollars, and you will hit the jackpot only about 0.1% of the time.
However, if we repeat this experiment again and again (with fresh
and hence independent coins), then in the long run we do expect your
average earning to be close to 90 dollars, which is the reason why
casinos can make money in a predictable way even though every
individual bet is random. For example, if we toss 𝑛 independent and
unbiased coins, then as 𝑛 grows, the number of coins that come up
ones will be more and more concentrated around 𝑛/2 according to the
famous “bell curve” (see Fig. 6).
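A quick simulation makes the contrast vivid. The following sketch (ours, not part of the original notes) shows that a single bet is nowhere near the expectation, while the average of many independent bets concentrates around it:

```python
import random

def one_bet():
    # I pay you 100,000 dollars if all 10 tosses are 1; otherwise you pay 10.
    return 100_000 if all(random.randint(0, 1) for _ in range(10)) else -10

# The exact expectation: 2^-10 * 100000 - (1 - 2^-10) * 10, about 87.7.
print(2**-10 * 100_000 - (1 - 2**-10) * 10)

for trials in (100, 10_000, 1_000_000):
    avg = sum(one_bet() for _ in range(trials)) / trials
    print(trials, avg)   # approaches the expectation as trials grow
```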
Much of probability theory is concerned with so called concentration
or tail bounds, which are upper bounds on the probability that a ran-
dom variable 𝑋 deviates too much from its expectation. The first and
simplest one of them is Markov’s inequality:

Theorem 0.12 — Markov's inequality. If 𝑋 is a non-negative random
variable then Pr[𝑋 ≥ 𝑘 𝔼[𝑋]] ≤ 1/𝑘.

Figure 6: The probabilities that we obtain a particular sum when we toss 𝑛 = 10, 20, 100, 1000 coins converge quickly to the Gaussian/normal distribution.

P
Markov's Inequality is actually a very natural statement
(see also Fig. 7). For example, if you know that
the average (not the median!) household income in
the US is 70,000 dollars, then in particular you can
deduce that at most 25 percent of households make
more than 280,000 dollars, since otherwise, even if
the remaining 75 percent had zero income, the top
25 percent alone would cause the average income to
be larger than 70,000 dollars. From this example you
can already see that in many situations, Markov's
inequality will not be tight and the probability of deviating
from expectation will be much smaller: see the
Chebyshev and Chernoff inequalities below.

Proof of Theorem 0.12. Let 𝜇 = 𝔼[𝑋] and define 𝑌 = 1_{𝑋≥𝑘𝜇}. That
is, 𝑌(𝑥) = 1 if 𝑋(𝑥) ≥ 𝑘𝜇 and 𝑌(𝑥) = 0 otherwise. Note that by
definition, for every 𝑥, 𝑌(𝑥) ≤ 𝑋(𝑥)/(𝑘𝜇). Since Pr[𝑋 ≥ 𝑘𝜇] = 𝔼[𝑌], we
need to show 𝔼[𝑌] ≤ 1/𝑘. But this follows since 𝔼[𝑌] ≤ 𝔼[𝑋/(𝑘𝜇)] =
𝔼[𝑋]/(𝑘𝜇) = 𝜇/(𝑘𝜇) = 1/𝑘.

Going beyond Markov's Inequality: Markov's inequality says that a (non-
negative) random variable 𝑋 can't go too crazy and be, say, a million
times its expectation, with significant probability. But ideally we
would like to say that with high probability, 𝑋 should be very close to
its expectation, e.g., in the range [0.99𝜇, 1.01𝜇] where 𝜇 = 𝔼[𝑋]. This
is not generally true, but does turn out to hold when 𝑋 is obtained
by combining (e.g., adding) many independent random variables.
This phenomenon, variants of which are known as the “law of large num-
bers”, “central limit theorem”, “invariance principles” and “Chernoff
bounds”, is one of the most fundamental in probability and statistics,
and is one that we heavily use in computer science as well.

Figure 7: Markov's Inequality tells us that a non-negative random variable 𝑋 cannot be much larger than its expectation, with high probability. For example, if the expectation of 𝑋 is 𝜇, then the probability that 𝑋 > 4𝜇 must be at most 1/4, as otherwise just the contribution from this part of the sample space will be too large.
0.5.1 Chebyshev’s Inequality
A standard way to measure the deviation of a random variable from
its expectation is by using its standard deviation. For a random variable
𝑋, we define the variance of 𝑋 as Var[𝑋] = 𝔼[(𝑋 − 𝜇)^2] where 𝜇 =
𝔼[𝑋]; i.e., the variance is the average squared distance of 𝑋 from its
expectation. The standard deviation of 𝑋 is defined as 𝜎[𝑋] = √Var[𝑋].
(This is well-defined since the variance, being an average of a square,
is always a non-negative number.)
Using Chebyshev’s inequality, we can control the probability that
a random variable is too many standard deviations away from its
expectation.

Theorem 0.13 — Chebyshev's inequality. Suppose that 𝜇 = 𝔼[𝑋] and
𝜎^2 = Var[𝑋]. Then for every 𝑘 > 0, Pr[|𝑋 − 𝜇| ≥ 𝑘𝜎] ≤ 1/𝑘^2.

Proof. The proof follows from Markov's inequality. We define the
random variable 𝑌 = (𝑋 − 𝜇)^2. Then 𝔼[𝑌] = Var[𝑋] = 𝜎^2, and hence
by Markov the probability that 𝑌 > 𝑘^2𝜎^2 is at most 1/𝑘^2. But clearly
(𝑋 − 𝜇)^2 ≥ 𝑘^2𝜎^2 if and only if |𝑋 − 𝜇| ≥ 𝑘𝜎.

One example of how to use Chebyshev’s inequality is the setting


when 𝑋 = 𝑋1 + ⋯ + 𝑋𝑛 where 𝑋𝑖 ’s are independent and identically
distributed (i.i.d for short) variables with values in [0, 1] where each
has expectation 1/2. Since 𝔼[𝑋] = ∑𝑖 𝔼[𝑋𝑖 ] = 𝑛/2, we would like to
say that 𝑋 is very likely to be in, say, the interval [0.499𝑛, 0.501𝑛]. Us-
ing Markov’s inequality directly will not help us, since it will only tell

us that 𝑋 is very likely to be at most 100𝑛 (which we already knew,


since it always lies between 0 and 𝑛). However, since 𝑋1 , … , 𝑋𝑛 are
independent,
Var[𝑋1 + ⋯ + 𝑋𝑛 ] = Var[𝑋1 ] + ⋯ + Var[𝑋𝑛 ] . (1)
(We leave showing this to the reader as Exercise 0.11.)
For every random variable 𝑋_𝑖 in [0, 1], Var[𝑋_𝑖] ≤ 1 (if the variable
is always in [0, 1], it can't be more than 1 away from its expectation),
and hence (1) implies that Var[𝑋] ≤ 𝑛 and hence 𝜎[𝑋] ≤ √𝑛. For
large 𝑛, √𝑛 ≪ 0.001𝑛, and in particular if √𝑛 ≤ 0.001𝑛/𝑘, we can
use Chebyshev's inequality to bound the probability that 𝑋 is not in
[0.499𝑛, 0.501𝑛] by 1/𝑘^2.
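To see this concentration empirically, here is a small experiment (our own sketch; it needs Python 3.10+ for int.bit_count) that tosses 𝑛 = 1,000,000 coins many times and records how often the sum leaves the interval [0.499𝑛, 0.501𝑛]:

```python
import random

def count_heads(n):
    # Sum of n fair coin tosses, drawn 32 bits at a time for speed.
    return sum(random.getrandbits(32).bit_count() for _ in range(n // 32))

n, trials = 1_000_000, 200
outside = sum(1 for _ in range(trials)
              if abs(count_heads(n) - n / 2) > 0.001 * n)
# Chebyshev guarantees this fraction is at most 1/4 (here 0.001n equals
# two standard deviations); in practice it comes out around 0.05.
print(outside / trials)
```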

0.5.2 The Chernoff bound


Chebyshev’s inequality already shows a connection between inde-
pendence and concentration, but in many cases we can hope for
a quantitatively much stronger result. If, as in the example above,
𝑋 = 𝑋1 + … + 𝑋𝑛 where the 𝑋𝑖 ’s are bounded i.i.d random variables
of mean 1/2, then as 𝑛 grows, the distribution of 𝑋 would be roughly
the normal or Gaussian distribution, that is, distributed according to
the bell curve (see Fig. 6 and Fig. 8). This distribution has the property
of being very concentrated in the sense that the probability of devi-
ating 𝑘 standard deviations from the mean is not merely 1/𝑘^2 as is
guaranteed by Chebyshev, but rather is roughly 𝑒^{−𝑘^2}. Specifically, for
a normal random variable 𝑋 of expectation 𝜇 and standard deviation
𝜎, the probability that |𝑋 − 𝜇| ≥ 𝑘𝜎 is at most 2𝑒^{−𝑘^2/2}. That is, we have
an exponential decay of the probability of deviation.


The following extremely useful theorem shows that such expo-
nential decay occurs every time we have a sum of independent and
bounded variables. This theorem is known under many names in dif-
ferent communities, though it is mostly called the Chernoff bound in
the computer science literature:

Theorem 0.14 — Chernoff/Hoeffding bound. If 𝑋_0, … , 𝑋_{𝑛−1} are i.i.d random
variables such that 𝑋_𝑖 ∈ [0, 1] and 𝔼[𝑋_𝑖] = 𝑝 for every 𝑖, then for
every 𝜖 > 0,

Pr[|∑_{𝑖=0}^{𝑛−1} 𝑋_𝑖 − 𝑝𝑛| > 𝜖𝑛] ≤ 2 ⋅ 𝑒^{−2𝜖^2𝑛}.

Figure 8: In the normal distribution or the Bell curve, the probability of deviating 𝑘 standard deviations from the expectation shrinks exponentially in 𝑘^2, and specifically with probability at least 1 − 2𝑒^{−𝑘^2/2}, a random variable 𝑋 of expectation 𝜇 and standard deviation 𝜎 satisfies 𝜇 − 𝑘𝜎 ≤ 𝑋 ≤ 𝜇 + 𝑘𝜎. This figure gives more precise bounds for 𝑘 = 1, 2, 3, 4, 5, 6. (Image credit: Imran Baghirov)

We omit the proof, which appears in many texts, and uses Markov's
inequality on i.i.d random variables 𝑌_0, … , 𝑌_𝑛 that are of the form
𝑌_𝑖 = 𝑒^{𝜆𝑋_𝑖} for some carefully chosen parameter 𝜆. See Exercise 0.14
for a proof of the simple (but highly useful and representative) case
where each 𝑋_𝑖 is {0, 1} valued and 𝑝 = 1/2. (See also Exercise 0.15 for
a generalization.)
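Even without the proof, it is easy to see the bound in action. This sketch (ours, not from the notes) compares the empirical deviation probability with the guarantee 2𝑒^{−2𝜖^2𝑛} for 𝑝 = 1/2:

```python
import math
import random

n, eps, trials = 1000, 0.05, 10_000
bound = 2 * math.exp(-2 * eps**2 * n)    # the Chernoff/Hoeffding guarantee

deviations = sum(
    1 for _ in range(trials)
    if abs(sum(random.getrandbits(1) for _ in range(n)) - n / 2) > eps * n
)
# The empirical frequency (around 0.002) stays below the bound (about 0.013).
print(deviations / trials, bound)
```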

0.6 EXERCISES
Exercise 0.1 Prove that for every finite 𝑆, 𝑇, there are (|𝑇| + 1)^{|𝑆|} partial
functions from 𝑆 to 𝑇.

Exercise 0.2 — 𝑂-notation. For every pair of functions 𝐹, 𝐺 below, determine
which of the following relations holds: 𝐹 = 𝑂(𝐺), 𝐹 = Ω(𝐺),
𝐹 = 𝑜(𝐺) or 𝐹 = 𝜔(𝐺).

a. 𝐹(𝑛) = 𝑛, 𝐺(𝑛) = 100𝑛.

b. 𝐹(𝑛) = 𝑛, 𝐺(𝑛) = 𝑛.

c. 𝐹(𝑛) = 𝑛 log 𝑛, 𝐺(𝑛) = 2^{(log(𝑛))^2}.

d. 𝐹(𝑛) = 𝑛, 𝐺(𝑛) = 2^{√(log 𝑛)}.

e. 𝐹(𝑛) = (𝑛 choose ⌈0.2𝑛⌉), 𝐺(𝑛) = 2^{0.1𝑛} (where (𝑛 choose 𝑘) is the number of 𝑘-sized
subsets of a set of size 𝑛). See footnote for hint.²

² One way to do this is to use Stirling's approximation for the factorial function.

Exercise 0.3 Give an example of a pair of functions 𝐹, 𝐺 ∶ ℕ → ℕ such
that neither 𝐹 = 𝑂(𝐺) nor 𝐺 = 𝑂(𝐹) holds.

Exercise 0.4 — Properties of expectation and variance. In the following
exercise 𝑋, 𝑌 denote random variables over some sample
space 𝑆. You can assume that the probability on 𝑆 is the uniform
distribution; every point 𝑠 is output with probability 1/|𝑆|. Thus
𝔼[𝑋] = (1/|𝑆|) ∑_{𝑠∈𝑆} 𝑋(𝑠). We define the variance and standard
deviation of 𝑋 and 𝑌 as above (e.g., Var[𝑋] = 𝔼[(𝑋 − 𝔼[𝑋])^2] and the
standard deviation is the square root of the variance). You can reuse
your answers to prior questions in the later ones.

1. Prove that Var[𝑋] is always non-negative.

2. Prove that Var[𝑋] = 𝔼[𝑋^2] − 𝔼[𝑋]^2.

3. Prove that always 𝔼[𝑋^2] ≥ 𝔼[𝑋]^2.

4. Give an example for a random variable 𝑋 such that 𝔼[𝑋^2] > 𝔼[𝑋]^2.

5. Give an example for a random variable 𝑋 such that its standard
deviation is not equal to 𝔼[|𝑋 − 𝔼[𝑋]|].

6. Give an example for a random variable 𝑋 such that its standard
deviation is equal to 𝔼[|𝑋 − 𝔼[𝑋]|].

7. Give an example for two random variables 𝑋, 𝑌 such that 𝔼[𝑋𝑌] =
𝔼[𝑋]𝔼[𝑌].

8. Give an example for two random variables 𝑋, 𝑌 such that 𝔼[𝑋𝑌] ≠
𝔼[𝑋]𝔼[𝑌].

9. Prove that if 𝑋 and 𝑌 are independent random variables (i.e., for
every 𝑥, 𝑦, Pr[𝑋 = 𝑥 ∧ 𝑌 = 𝑦] = Pr[𝑋 = 𝑥] Pr[𝑌 = 𝑦]) then
𝔼[𝑋𝑌] = 𝔼[𝑋]𝔼[𝑌] and Var[𝑋 + 𝑌] = Var[𝑋] + Var[𝑌].

Exercise 0.5 — Random hash function. Suppose that 𝐻 is chosen to be a
random function mapping the numbers {1, … , 𝑛} to the numbers
{1, … , 𝑚}. That is, for every 𝑖 ∈ {1, … , 𝑛}, 𝐻(𝑖) is chosen to be a random
number in {1, … , 𝑚} and that choice is done independently for
every 𝑖. For every 𝑖 < 𝑗 ∈ {1, … , 𝑛}, define the random variable 𝑋_{𝑖,𝑗} to
equal 1 if there was a collision between 𝐻(𝑖) and 𝐻(𝑗) in the sense that
𝐻(𝑖) = 𝐻(𝑗) and to equal 0 otherwise.

1. For every 𝑖 < 𝑗, compute 𝔼[𝑋_{𝑖,𝑗}].

2. Define 𝑌 = ∑_{𝑖<𝑗} 𝑋_{𝑖,𝑗} to be the total number of collisions. Compute
𝔼[𝑌] as a function of 𝑛 and 𝑚. In particular your answer should
imply that if 𝑚 < 𝑛^2/1000 then 𝔼[𝑌] > 1 and hence in expectation
there should be at least one collision and so the function 𝐻 will not
be one to one.

3. Prove that if 𝑚 > 1000 ⋅ 𝑛^2 then the probability that 𝐻 is one to one
is at least 0.9.

4. Give an example of a random variable 𝑍 (unrelated to the function
𝐻) that is always equal to a non-negative integer, and such that
𝔼[𝑍] ≥ 1000 but Pr[𝑍 > 0] < 0.001.

5. Prove that if 𝑚 < 𝑛^2/1000 then the probability that 𝐻 is one to one
is at most 0.1.

0.7 EXERCISES
Exercise 0.6 Suppose that we toss three independent fair coins 𝑎, 𝑏, 𝑐 ∈
{0, 1}. What is the probability that the XOR of 𝑎, 𝑏, and 𝑐 is equal to 1?
What is the probability that the AND of these three values is equal to
1? Are these two events independent?

Exercise 0.7 Give an example of random variables 𝑋, 𝑌 ∶ {0, 1}^3 → ℝ
such that 𝔼[𝑋𝑌] ≠ 𝔼[𝑋] 𝔼[𝑌].


Exercise 0.8 Give an example of random variables 𝑋, 𝑌 ∶ {0, 1}^3 → ℝ
such that 𝑋 and 𝑌 are not independent but 𝔼[𝑋𝑌] = 𝔼[𝑋] 𝔼[𝑌].

Exercise 0.9 — Product of expectations. Prove Lemma 0.10.

Exercise 0.10 — Transformations preserve independence. Prove Lemma 0.11.


Exercise 0.11 — Variance of independent random variables. Prove that if
𝑋_0, … , 𝑋_{𝑛−1} are independent random variables then Var[𝑋_0 + ⋯ +
𝑋_{𝑛−1}] = ∑_{𝑖=0}^{𝑛−1} Var[𝑋_𝑖].

Exercise 0.12 — Entropy (challenge). Recall the definition of a distribution
𝜇 over some finite set 𝑆. Shannon defined the entropy of a distribution
𝜇, denoted by 𝐻(𝜇), to be ∑_{𝑥∈𝑆} 𝜇(𝑥) log(1/𝜇(𝑥)). The idea is that if 𝜇
is a distribution of entropy 𝑘, then encoding members of 𝜇 will require
𝑘 bits, in an amortized sense. In this exercise we justify this definition.
Let 𝜇 be such that 𝐻(𝜇) = 𝑘.

1. Prove that for every one to one function 𝐹 ∶ 𝑆 → {0, 1}^∗,
𝔼_{𝑥∼𝜇} |𝐹(𝑥)| ≥ 𝑘.

2. Prove that for every 𝜖, there is some 𝑛 and a one-to-one function
𝐹 ∶ 𝑆^𝑛 → {0, 1}^∗, such that 𝔼_{𝑥∼𝜇^𝑛} |𝐹(𝑥)| ≤ 𝑛(𝑘 + 𝜖), where 𝑥 ∼ 𝜇^𝑛
denotes the experiment of choosing 𝑥_0, … , 𝑥_{𝑛−1} each independently
from 𝑆 using the distribution 𝜇.

Exercise 0.13 — Entropy approximation to binomial. Let 𝐻(𝑝) = 𝑝 log(1/𝑝) +
(1 − 𝑝) log(1/(1 − 𝑝)).³
Prove that for every 𝑝 ∈ (0, 1) and 𝜖 > 0, if 𝑛 is large enough then⁴

2^{(𝐻(𝑝)−𝜖)𝑛} ≤ (𝑛 choose 𝑝𝑛) ≤ 2^{(𝐻(𝑝)+𝜖)𝑛}

where (𝑛 choose 𝑘) is the binomial coefficient 𝑛!/(𝑘!(𝑛−𝑘)!), which is equal to the
number of 𝑘-size subsets of {0, … , 𝑛 − 1}.

³ While you don't need this to solve this exercise, this is the function that maps 𝑝 to the entropy (as defined in Exercise 0.12) of the 𝑝-biased coin distribution over {0, 1}, which is the function 𝜇 ∶ {0, 1} → [0, 1] s.t. 𝜇(0) = 1 − 𝑝 and 𝜇(1) = 𝑝.
⁴ Hint: Use Stirling's formula for approximating the factorial function.

Exercise 0.14 — Chernoff using Stirling.

1. Prove that Pr_{𝑥∼{0,1}^𝑛}[∑ 𝑥_𝑖 = 𝑘] = (𝑛 choose 𝑘)2^{−𝑛}.

2. Use this and Exercise 0.13 to prove (an approximate version of)
the Chernoff bound for the case that 𝑋_0, … , 𝑋_{𝑛−1} are i.i.d. random
variables over {0, 1} each equaling 0 and 1 with probability 1/2.
That is, prove that for every 𝜖 > 0, and 𝑋_0, … , 𝑋_{𝑛−1} as above,
Pr[|∑_{𝑖=0}^{𝑛−1} 𝑋_𝑖 − 𝑛/2| > 𝜖𝑛] < 2^{−0.1⋅𝜖^2𝑛}.


Exercise 0.15 — Poor man's Chernoff. Exercise 0.14 establishes the Chernoff
bound for the case that 𝑋_0, … , 𝑋_{𝑛−1} are i.i.d variables over {0, 1} with
expectation 1/2. In this exercise we use a slightly different method
(bounding the moments of the random variables) to establish a version
of Chernoff where the random variables range over [0, 1] and their
expectation is some number 𝑝 ∈ [0, 1] that may be different than
1/2. Let 𝑋_0, … , 𝑋_{𝑛−1} be i.i.d random variables with 𝔼[𝑋_𝑖] = 𝑝 and
Pr[0 ≤ 𝑋_𝑖 ≤ 1] = 1. Define 𝑌_𝑖 = 𝑋_𝑖 − 𝑝.

1. Prove that for every 𝑗_0, … , 𝑗_{𝑛−1} ∈ ℕ, if there exists one 𝑖 such that 𝑗_𝑖
is odd then 𝔼[∏_{𝑖=0}^{𝑛−1} 𝑌_𝑖^{𝑗_𝑖}] = 0.

2. Prove that for every 𝑘, 𝔼[(∑_{𝑖=0}^{𝑛−1} 𝑌_𝑖)^𝑘] ≤ (10𝑘𝑛)^{𝑘/2}.⁵

3. Prove that for every 𝜖 > 0, Pr[|∑_𝑖 𝑌_𝑖| ≥ 𝜖𝑛] ≤ 2^{−𝜖^2𝑛/(10000 log(1/𝜖))}.⁶

⁵ Hint: Bound the number of tuples 𝑗_0, … , 𝑗_{𝑛−1} such that every 𝑗_𝑖 is even and ∑ 𝑗_𝑖 = 𝑘 using the Binomial coefficient and the fact that in any such tuple there are at most 𝑘/2 distinct indices.
⁶ Hint: Set 𝑘 = 2⌈𝜖^2𝑛/1000⌉ and then show that if the event |∑ 𝑌_𝑖| ≥ 𝜖𝑛 happens then the random variable (∑ 𝑌_𝑖)^𝑘 is a factor of 𝜖^{−𝑘} larger than its expectation.

■

Exercise 0.16 — Lower bound for distinguishing coins. The Chernoff bound
can be used to show that if you were given a coin of bias at least 𝜖,
you should only need 𝑂(1/𝜖^2) samples to be able to reject the “null
hypothesis” that the coin is completely unbiased with extremely high
confidence. In the following somewhat more challenging question, we
try to show a converse to this, proving that distinguishing between a
fair coin and a coin that outputs “heads” with probability 1/2 + 𝜖
requires at least Ω(1/𝜖^2) samples.
Let 𝑃 be the uniform distribution over {0, 1}^𝑛 and 𝑄 be the 1/2 + 𝜖-
biased distribution corresponding to tossing 𝑛 coins in which each one
has a probability of 1/2 + 𝜖 of equaling 1 and probability 1/2 − 𝜖 of
equaling 0. Namely the probability of 𝑥 ∈ {0, 1}^𝑛 according to 𝑄 is
equal to ∏_{𝑖=1}^{𝑛}(1/2 − 𝜖 + 2𝜖𝑥_𝑖).

1. Prove that for every threshold 𝜃 between 0 and 𝑛, if 𝑛 < 1/(100𝜖)^2
then the probabilities that ∑ 𝑥_𝑖 ≤ 𝜃 under 𝑃 and 𝑄 respectively
differ by at most 0.1. Therefore, one cannot use the test whether
the number of heads is above or below some threshold to reliably
distinguish between these two possibilities unless the number of
samples 𝑛 of the coins is at least some constant times 1/𝜖^2.

2. Prove that for every function 𝐹 mapping {0, 1}^𝑛 to {0, 1}, if 𝑛 <
1/(100𝜖)^2 then the probabilities that 𝐹(𝑥) = 1 under 𝑃 and 𝑄 respectively
differ by at most 0.1. Therefore, if the number of samples
is smaller than a constant times 1/𝜖^2 then there is simply no test that
can reliably distinguish between these two possibilities.



Exercise 0.17 — Simulating distributions using coins. Our model for probability
involves tossing 𝑛 coins, but sometimes algorithms require sampling
from other distributions, such as selecting a uniform number in
{0, … , 𝑀 − 1} for some 𝑀. Fortunately, we can simulate this with an
exponentially small probability of error: prove that for every 𝑀, if 𝑛 >
𝑘⌈log 𝑀⌉, then there is a function 𝐹 ∶ {0, 1}^𝑛 → {0, … , 𝑀 − 1} ∪ {⊥}
such that (1) the probability that 𝐹(𝑥) = ⊥ is at most 2^{−𝑘} and (2) the
distribution of 𝐹(𝑥) conditioned on 𝐹(𝑥) ≠ ⊥ is equal to the uniform
distribution over {0, … , 𝑀 − 1}.⁷

⁷ Hint: Think of 𝑥 ∈ {0, 1}^𝑛 as choosing 𝑘 numbers 𝑦_1, … , 𝑦_𝑘 ∈ {0, … , 2^{⌈log 𝑀⌉} − 1}. Output the first such number that is in {0, … , 𝑀 − 1}.

■

Exercise 0.18 — Sampling. Suppose that a country has 300,000,000 citizens,
52 percent of which prefer the color “green” and 48 percent of
which prefer the color “orange”. Suppose we sample 𝑛 random citizens
and ask them their favorite color (assume they will answer truthfully).
What is the smallest value 𝑛 among the following choices so
that the probability that the majority of the sample answers “green” is
at most 0.05?

a. 1,000

b. 10,000

c. 100,000

d. 1,000,000

Exercise 0.19 Would the answer to Exercise 0.18 change if the country
had 300,000,000,000 citizens?

Exercise 0.20 — Sampling (2). Under the same assumptions as Exercise 0.18,
what is the smallest value 𝑛 among the following choices so
that the probability that the majority of the sample answers “green” is
at most 2^{−100}?

a. 1,000

b. 10,000

c. 100,000

d. 1,000,000

e. It is impossible to get such low probability since there are fewer
than 2^{100} citizens.


1
Introduction

Additional reading: Chapters 1 and 2 of the Katz-Lindell book. Sections 2.1
(Introduction) and 2.2 (Shannon ciphers and perfect security) in the
Boneh-Shoup book.¹

¹ Referring to a book such as Katz-Lindell or Boneh-Shoup can be useful during this course to supplement these notes with additional discussions, extensions, details, practical applications, or references. In particular, in the current state of these lecture notes, almost all references and credits are omitted unless the name has become standard in the literature, or I believe that the story of some discovery can serve a pedagogical point. See the Katz-Lindell book for historical notes and references. This lecture shares a lot of text with (though is not identical to) my lecture on cryptography in the introduction to theoretical computer science lecture notes.

Ever since people started to communicate, there were some messages
that they wanted kept secret. Thus cryptography has an old
though arguably undistinguished history. For a long time cryptography
shared similar features with Alchemy as a domain in which many
otherwise smart people would be drawn into making fatal mistakes.
Indeed, the history of cryptography is littered with the figurative
corpses of cryptosystems believed secure and then broken, and sometimes
with the actual corpses of those who have mistakenly placed
their faith in these cryptosystems. The definitive text on the history
of cryptography is David Kahn's “The Codebreakers”, whose title already
hints at the ultimate fate of most cryptosystems.² (See also “The
Code Book” by Simon Singh.)

² Traditionally, cryptography was the name for the activity of making codes, while cryptoanalysis is the name for the activity of breaking them, and cryptology is the name for the union of the two. These days cryptography is often used as the name for the broad science of constructing and analyzing the security of not just encryptions but many schemes and protocols for protecting the confidentiality and integrity of communication and computation.

We recount below just a few stories to get a feel for this field. But
before we do so, we should introduce the cast of characters. The basic
setting of “encryption” or “secret writing” is the following: one person,
whom we will call Alice, wishes to send another person, whom
we will call Bob, a secret message. Since Alice and Bob are not in the
same room (perhaps because Alice is imprisoned in a castle by her
cousin the queen of England), they cannot communicate directly and
need to send their message in writing. Alas, there is a third person,
whom we will call Eve, that can see their message. Therefore Alice
needs to find a way to encode or encrypt the message so that only Bob
(and not Eve) will be able to understand it.

1.1 SOME HISTORY


In 1587, Mary the queen of Scots, and heir to the throne of England,
wanted to arrange the assassination of her cousin, queen Elisabeth I
of England, so that she could ascend to the throne and finally escape




the house arrest under which she had been for the last 18 years. As
part of this complicated plot, she sent a coded letter to Sir Anthony
Babington.
Mary used what’s known as a substitution cipher where each letter
is transformed into a different obscure symbol (see Fig. 1.1). At a first
look, such a letter might seem rather inscrutable- a meaningless se-
quence of strange symbols. However, after some thought, one might
recognize that these symbols repeat several times and moreover that
different symbols repeat with different frequencies. Now it doesn't
take a large leap of faith to assume that perhaps each symbol corresponds
to a different letter and the more frequent symbols correspond
to letters that occur in the alphabet with higher frequency. From this
observation, there is a short gap to completely breaking the cipher,
which was in fact done by queen Elisabeth's spies who used the decoded
letters to learn of all the co-conspirators and to convict queen
Mary of treason, a crime for which she was executed. Trusting in superficial
security measures (such as using “inscrutable” symbols) is a
trap that users of cryptography have been falling into again and again
over the years. (As in many things, this is the subject of a great XKCD
cartoon, see Fig. 1.2.)

Figure 1.1: Snippet from encrypted communication between queen Mary and Sir Babington

Figure 1.2: XKCD's take on the added security of using uncommon symbols

The Vigenère cipher is named after Blaise de Vigenère who described
it in a book in 1586 (though it was invented earlier by Bellaso).
The idea is to use a collection of substitution ciphers - if there are 𝑛
different ciphers then the first letter of the plaintext is encoded with
the first cipher, the second with the second cipher, the 𝑛𝑡ℎ with the
𝑛𝑡ℎ cipher, and then the 𝑛 + 1𝑠𝑡 letter is again encoded with the first
cipher. The key is usually a word or a phrase of 𝑛 letters, and the
𝑖𝑡ℎ substitution cipher is obtained by shifting each letter 𝑘𝑖 positions
in the alphabet. This “flattens” the frequencies and makes it much
harder to do frequency analysis, which is why this cipher was considered
“unbreakable” for 300+ years and got the nickname “le chiffre
indéchiffrable” (“the unbreakable cipher”). Nevertheless, Charles
Babbage cracked the Vigenère cipher in 1854 (though he did not publish
it). In 1863 Friedrich Kasiski broke the cipher and published the
result. The idea is that once you guess the length of the cipher, you
can reduce the task to breaking a simple substitution cipher which can
be done via frequency analysis (can you see why?). Confederate generals
used Vigenère regularly during the civil war, and their messages
were routinely cryptanalyzed by Union officers.

Figure 1.3: Confederate Cipher Disk for implementing the Vigenère cipher

Figure 1.4: Confederate encryption of the message “Gen'l Pemberton: You can expect no help from this side of the river. Let Gen'l Johnston know, if possible, when you can attack the same point on the enemy's lines. Inform me also and I will endeavor to make a diversion. I have sent some caps. I subjoin a despatch from General Johnston.”

The Enigma cipher was a mechanical cipher (looking like a typewriter,
see Fig. 1.5) where each letter typed would get mapped into a
different letter depending on the (rather complicated) key and current
state of the machine which had several rotors that rotated at different
paces. An identically wired machine at the other end could be used

to decrypt. Just as many ciphers in history, this has also been believed
by the Germans to be “impossible to break” and even quite late in the
war they refused to believe it was broken despite mounting evidence
to that effect. (In fact, some German generals refused to believe it
was broken even after the war.) Breaking Enigma was an heroic effort
which was initiated by the Poles and then completed by the British at
Bletchley Park, with Alan Turing (of the Turing machines) playing a
key role. As part of this effort the Brits built arguably the world’s first
large scale mechanical computation devices (though they looked more
similar to washing machines than to iPhones). They were also helped
along the way by some quirks and errors of the German operators. For
example, the fact that their messages ended with “Heil Hitler” turned
out to be quite useful.
Here is one entertaining anecdote: the Enigma machine would
never map a letter to itself. In March 1941, Mavis Batey, a cryptanalyst
at Bletchley Park received a very long message that she tried to
decrypt. She then noticed a curious property: the message did not
contain the letter “L”.³ She realized that the probability that no “L”'s
appeared in the message is too small for this to happen by chance.
Hence she surmised that the original message must have been composed
only of L's. That is, it must have been the case that the operator,
perhaps to test the machine, had simply sent out a message where he
repeatedly pressed the letter “L”. This observation helped her decode
the next message, which helped inform of a planned Italian attack and
secure a resounding British victory in what became known as “the
Battle of Cape Matapan”. Mavis also helped break another Enigma
machine. Using the information she provided, the Brits were able
to feed the Germans with the false information that the main allied
invasion would take place in Pas de Calais rather than on Normandy.
In the words of General Eisenhower, the intelligence from Bletchley
park was of “priceless value”. It made a huge difference for the Allied
war effort, thereby shortening World War II and saving millions of
lives. See also this interview with Sir Harry Hinsley.

Figure 1.5: In the Enigma mechanical cipher the secret key would be the settings of the rotors and internal wires. As the operator types up their message, the encrypted version appeared in the display area above, and the internal state of the cipher was updated (and so typing the same letter twice would generally result in two different letters output). Decrypting follows the same process: if the sender and receiver are using the same key then typing the ciphertext would result in the plaintext appearing in the display.

³ Here is a nice exercise: compute (up to an order of magnitude) the probability that a 50-letter long message composed of random letters will end up not containing the letter “L”.

1.2 DEFINING ENCRYPTIONS


Many of the troubles that cryptosystem designers faced over history
(and still face!) can be attributed to not properly defining or under-
standing what the goals they want to achieve are in the first place. We
now turn to actually defining what is an encryption scheme. Clearly
we can encode every message as a string of bits, i.e., an element of
{0, 1}ℓ for some ℓ. Similarly, we can encode the key as a string of bits
as well, i.e., an element of {0, 1}𝑛 for some 𝑛. Thus, we can think of
an encryption scheme as composed of two functions. The encryption
function 𝐸 maps a secret key 𝑘 ∈ {0, 1}𝑛 and a message (known also

as plaintext) 𝑚 ∈ {0, 1}ℓ into a ciphertext 𝑐 ∈ {0, 1}𝐿 for some 𝐿. We


write this as 𝑐 = 𝐸𝑘 (𝑚). The decryption function 𝐷 does the reverse
operation, mapping the secret key 𝑘 and the ciphertext 𝑐 back into the
plaintext message 𝑚, which we write as 𝑚 = 𝐷𝑘 (𝑐). The basic equa-
tion is that if we use the same key for encryption and decryption, then
we should get the same message back. That is, for every 𝑘 ∈ {0, 1}𝑛
and 𝑚 ∈ {0, 1}ℓ ,

𝑚 = 𝐷𝑘 (𝐸𝑘 (𝑚)) .

This motivates the following definition which attempts to capture


what it means for an encryption scheme to be valid or “make sense”,
regardless of whether or not it is secure:

Definition 1.1 — Valid encryption scheme. Let ℓ ∶ ℕ → ℕ and 𝐶 ∶ ℕ → ℕ
be two functions mapping natural numbers to natural numbers.
A pair of polynomial-time computable functions (𝐸, 𝐷) mapping
strings to strings is a valid private key encryption scheme (or
encryption scheme for short) with plaintext length function ℓ(⋅) and
ciphertext length function 𝐶(⋅) if for every 𝑛 ∈ ℕ, 𝑘 ∈ {0, 1}^𝑛 and
𝑚 ∈ {0, 1}^{ℓ(𝑛)}, |𝐸_𝑘(𝑚)| = 𝐶(𝑛) and

𝐷(𝑘, 𝐸(𝑘, 𝑚)) = 𝑚 . (1.1)

We will often write the first input (i.e., the key) to the encryp-
tion and decryption as a subscript and so can write (1.1) also as
𝐷𝑘 (𝐸𝑘 (𝑚)) = 𝑚.

Figure 1.6: A private-key encryption scheme is a
pair of algorithms 𝐸, 𝐷 such that for every key
𝑘 ∈ {0, 1}^𝑛 and plaintext 𝑥 ∈ {0, 1}^{ℓ(𝑛)}, 𝑦 = 𝐸_𝑘(𝑥)
is a ciphertext of length 𝐶(𝑛). The encryption scheme
is valid if for every such 𝑦, 𝐷_𝑘(𝑦) = 𝑥. That is, the
decryption of an encryption of 𝑥 is 𝑥, as long as both
encryption and decryption use the same key.

The validity condition implies that for any fixed 𝑘, the map 𝑚 ↦
𝐸𝑘 (𝑚) is one to one (can you see why?) and hence the ciphertext
length is always at least the plaintext length. Thus we typically focus
on the plaintext length as the quantity to optimize in an encryption

scheme. The larger ℓ(𝑛) is, the better the scheme, since it means we
need a shorter secret key to protect messages of the same length.
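To make Definition 1.1 concrete, here is a minimal Python sketch (our own illustration, not a scheme from these notes, and with no claim of security whatsoever) of a valid encryption scheme with ℓ(𝑛) = 𝐶(𝑛) = 𝑛, where we work with bytes rather than bits for convenience and encryption simply XORs the key into the message:

```python
import os

def E(k: bytes, m: bytes) -> bytes:
    assert len(m) == len(k)        # here ell(n) = C(n) = n
    return bytes(ki ^ mi for ki, mi in zip(k, m))

def D(k: bytes, c: bytes) -> bytes:
    return bytes(ki ^ ci for ki, ci in zip(k, c))

k = os.urandom(16)                 # the key is chosen at random
m = b"attack at dawn!!"
assert D(k, E(k, m)) == m          # the validity condition D_k(E_k(m)) = m
```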

R
Remark 1.2 — A note on notation, and comparison with
Katz-Lindell, Boneh-Shoup, and other texts. A note on
notation: We will always use 𝑖, 𝑗, ℓ, 𝑛 to denote natural
numbers.
The number 𝑛 will often denote the length of our
secret key. The length of the key (or another closely
related number) is often known as the security parame-
ter in the literature. Katz-Lindell also uses 𝑛 to denote
this parameter, while Boneh-Shoup and Rosulek use
𝜆 for it. (Some texts also use the Greek letter 𝜅 for the
same parameter.) We chose to denote the security
parameter by 𝑛 so as to correspond with the standard
algorithmic notation for input length (as in 𝑂(𝑛) or
𝑂(𝑛^2) time algorithms).
We often use ℓ to denote the length of the message,
sometimes also known as “block length” since longer
messages are simply chopped into “blocks” of length ℓ
and also appropriately padded.
We will use 𝑘 to denote the secret key, 𝑚 to denote
the secret plaintext message, and 𝑐 to denote the en-
crypted ciphertext. Note that 𝑘, 𝑚, 𝑐 are not numbers
but rather bit strings of lengths 𝑛, ℓ(𝑛), 𝐶(𝑛) respec-
tively. We will also sometimes use 𝑥 and 𝑦 to denote
strings, and so sometimes use 𝑥 as the plaintext and
𝑦 as the ciphertext. In general, while we try to reserve
variable names for particular purposes, cryptography
uses so many concepts that it would sometimes need
to “reuse” the same letter for different purposes.
For simplicity, we denote the space of possible keys as
{0, 1}𝑛 and the space of possible messages as {0, 1}ℓ
for ℓ = ℓ(𝑛). Boneh-Shoup uses a more general no-
tation of 𝒦 for the space of all possible keys and ℳ
for the space of all possible messages. This does not
make much difference since we can represent every
discrete object such as a key or message as a binary
string. (One difference is that in principle the space
of all possible messages could include messages of
unbounded length, though in such a case what is done
in both theory and practice is to break these up into
finite-size blocks and encrypt one block at a time.)

1.3 DEFINING SECURITY OF ENCRYPTION


Definition 1.1 says nothing about security and does not rule out trivial
“encryption” schemes such as the scheme 𝐸𝑘 (𝑚) = 𝑚 that simply
outputs the plaintext as is. Defining security is tricky, and we’ll take it
one step at a time, but let’s start by pondering what is secret and what

is not. A priori we are thinking of an attacker Eve that simply sees


the ciphertext 𝑐 = 𝐸𝑘 (𝑚) and does not know anything on how it was
generated. So, it does not know the details of 𝐸 and 𝐷, and certainly
does not know the secret key 𝑘. However, many of the troubles past
cryptosystems went through were caused by them relying on “secu-
rity through obscurity”— trusting that the fact their methods are not
known to their enemy will protect them from being broken. This is a
faulty assumption - if you reuse a method again and again (even with
a different key each time) then eventually your adversaries will figure
out what you are doing. And if Alice and Bob meet frequently in a
secure location to decide on a new method, they might as well take the
opportunity to exchange their secret messages…
These considerations led Auguste Kerckhoffs in 1883 to state the
following principle:

A cryptosystem should be secure even if everything about
the system, except the key, is public knowledge.⁴

⁴ The actual quote is “Il faut qu'il n'exige pas le secret, et qu'il puisse sans inconvénient tomber entre les mains de l'ennemi”, loosely translated as “The system must not require secrecy and can be stolen by the enemy without causing trouble”. According to Steve Bellovin the NSA version is “assume that the first copy of any device we make is shipped to the Kremlin”.

Why is it OK to assume the key is secret and not the algorithm?
Because we can always choose a fresh key. But of course that won't
help us much if our key is “1234” or “passw0rd!”. In fact, if you use
any deterministic algorithm to choose the key then eventually your
adversary will figure this out. Therefore for security we must choose
the key at random and can restate Kerckhoffs's principle as follows:

There is no secrecy without randomness
There is no secrecy without randomness

This is such a crucial point that it is worth repeating:


There is no secrecy without randomness

At the heart of every cryptographic scheme there is a secret key,


and the secret key is always chosen at random. A corollary of that is
that to understand cryptography, you need to know some probability
theory. Fortunately, we don’t need much of probability- only probabil-
ity over finite spaces, and basic notions such as expectation, variance,
concentration and the union bound suffice for most of we need. In
fact, understanding the following two statements will already get you
much of what you need for cryptography:

• For every fixed string 𝑥 ∈ {0, 1}^𝑛, if you toss a coin 𝑛 times, the
probability that the heads/tails pattern will be exactly 𝑥 is 2^{−𝑛}.

• A probability of 2^{−128} is really really small.

1.3.1 Generating randomness in actual cryptographic systems


How do we actually get random bits in actual systems? The main idea
is to use a two stage approach. First we need to get some data that

is unpredictable from the point of view of an attacker on our system.


Some sources for this could be measuring latency on the network or
hard drives (getting harder with solid state disks), user keyboard and
mouse movement patterns (problematic when you need fresh randomness
at boot time), and clock drift; there are some other
sources including audio, video, and network. All of these can be prob-
lematic, especially for servers or virtual machines, and so hardware
based random number generators based on phenomena such as ther-
mal noise or nuclear decay are becoming more popular. Once we
have some data 𝑋 that is unpredictable, we need to estimate the en-
tropy in it. You can roughly imagine that 𝑋 has 𝑘 bits of entropy if the
probability that an attacker can guess 𝑋 is at most 2^{−𝑘}. People then
use a hash function (an object we’ll talk about more later) to map 𝑋
into a string of length 𝑘 which is then hopefully distributed (close
to) uniformly at random. All of this process, and especially under-
standing the amount of information an attacker may have on the en-
tropy sources, is a bit of a dark art and indeed a number of attacks on
cryptographic systems were actually enabled by weak generation of
randomness. Here are a few examples.
One of the first attacks was on the SSL implementation of Netscape
(the browser at the time). Netscape used the following “unpre-
dictable” information— the time of day and a process ID both of
which turned out to be quite predictable (who knew attackers have
clocks too?). Netscape tried to protect its security through “security
through obscurity” by not releasing the source code for their pseu-
dorandom generator, but it was reverse engineered by Ian Goldberg
and David Wagner (Ph.D students at the time) who demonstrated this
attack.
In 2006 a programmer removed a line of code from the procedure
to generate entropy in OpenSSL package distributed by Debian since it
caused a warning in some automatic verification code. As a result for
two years (until this was discovered) all the randomness generated by
this procedure used only the process ID as an “unpredictable” source.
This means that all communication done by users in that period is
fairly easily breakable (and in particular, if some entities recorded that
communication they could break it also retroactively). This caused
a huge headache and a worldwide regeneration of keys, though it is
believed that many of the weak keys are still used. See XKCD’s take on
that incident.
In 2012 two separate teams of researchers scanned a large num-
ber of RSA keys on the web and found out that about 4 percent of
them are easy to break. The main issue were devices such as routers,
internet-connected printers and such. These devices sometimes run
variants of Linux (a desktop operating system) but without a hard
drive, mouse or keyboard, they don't have access to many of the entropy
sources that desktops have. Coupled with some good old fashioned
ignorance of cryptography and software bugs, this led to many
keys that are downright trivial to break, see this blog post and this
web page for more details.

Figure 1.7: XKCD Cartoon: Random number generator
After the entropy is collected and then “purified” or “extracted” to
a uniformly random string that is, say, a few hundred bits long, we of-
ten need to “expand” it into a longer string that is also uniform (or at
least looks like that for all practical purposes). We will discuss how to
go about that in the next lecture. This step has its weaknesses too, and
in particular the Snowden documents, combined with observations of
Shumow and Ferguson, strongly suggest that the NSA has deliberately
inserted a trapdoor in one of the pseudorandom generators published
by the National Institute of Standards and Technologies (NIST). Fortu-
nately, this generator wasn’t widely adopted, but apparently the NSA
did pay 10 million dollars to RSA security so the latter would make
this generator their default option in their products.
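As a toy illustration of the “collect entropy, then extract” pipeline described above (our own sketch; it is emphatically not a substitute for the operating system's vetted generator, such as os.urandom), one can hash noisy timing measurements down to a short seed:

```python
import hashlib
import time

# Toy example only: gather (hopefully) unpredictable timing jitter.
samples = []
for _ in range(10_000):
    t0 = time.perf_counter_ns()
    sum(range(100))                       # some work whose timing jitters
    samples.append(time.perf_counter_ns() - t0)

# "Extract" the collected data into a 256-bit seed with a hash function.
# Estimating how much entropy the samples actually contain is exactly
# the "dark art" mentioned above.
seed = hashlib.sha256(repr(samples).encode()).digest()
print(seed.hex())
```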

1.4 DEFINING THE SECRECY REQUIREMENT.


Defining the secrecy requirement for an encryption is not simple.
Over the course of history, many smart people got it wrong and con-
vinced themselves that ciphers were impossible to break. The first per-
son to truly ask the question in a rigorous way was Claude Shannon in
1945 (though a partial version of his manuscript was only declassified
in 1949). Simply by asking this question, he made an enormous contri-
bution to the science of cryptography and practical security. We now
will try to examine how one might answer it.
Let me warn you ahead of time that we are going to insist on a
mathematically precise definition of security. That means that the defi-
nition must capture security in all cases, and the existence of a single
counterexample, no matter how “silly”, would make us rule out a
candidate definition. This exercise of coming up with “silly” coun-
terexamples might seem, well, silly. But in fact it is this method that
has led Shannon to formulate his theory of secrecy, which (after much
followup work) eventually revolutionized cryptography, and brought
this science to a new age where Edgar Allan Poe’s maxim no longer
holds, and we are able to design ciphers which human (or even non-
human) ingenuity cannot break.
The most natural way to attack an encryption is for Eve to guess all
possible keys. In many encryption schemes this number is enormous
and this attack is completely infeasible. For example, the theoretical
number of possibilities in the Enigma cipher was about 10^113, which
roughly means that even if we filled the milky way galaxy with computers
operating at light speed, the sun would still die out before it
finished examining all the possibilities.⁵ One can understand why the
Germans thought it was impossible to break. (Note that despite the
number of possibilities being so enormous, such a key can still be easily
specified and shared between Alice and Bob by writing down 113
digits on a piece of paper.) Ray Miller of the NSA had calculated that,
in the way the Germans used the machine, the number of possibilities
was “only” 10^23, but this is still extremely difficult to pull off even today,
and many orders of magnitude above the computational powers
during the WW-II era. Thus clearly, it is sometimes possible to break
an encryption without trying all possibilities. A corollary is that having
a huge number of key combinations does not guarantee security,
as an attacker might find a shortcut (as the allies did for Enigma) and
recover the key without trying all options.

⁵ There are about 10^68 atoms in the galaxy, so even if we assumed that each one of those atoms was a computer that can process say 10^21 decryption attempts per second (as the speed of light is 10^9 meters per second and the diameter of an atom is about 10^{−12} meters), then it would still take 10^{113−89} = 10^{24} seconds, which is about 10^{17} years to exhaust all possibilities, while the sun is estimated to burn out in about 5 billion years.
recover the key without trying all options.
Since it is possible to recover the key with some tiny probability
(e.g. by guessing it at random), perhaps one way to define security
of an encryption scheme is that an attacker can never recover the key
with probability significantly higher than that. Here is an attempt at
such a definition:

Definition 1.3 — Security of encryption: first attempt. An encryption
scheme (𝐸, 𝐷) is 𝑛-secure if no matter what method Eve employs,
the probability that she can recover the true key 𝑘 from the ciphertext
𝑐 is at most 2^{−𝑛}.

P
When you see a mathematical definition that attempts
to model some real-life phenomenon such as security,
you should pause and ask yourself:

1. Do I understand mathematically what the defini-


tion is stating?
2. Is it a reasonable way to capture the real life phe-
nomenon we are discussing?

One way to answer question 2 is to try to think of


both examples of objects that satisfy the definition
and examples of objects that violate it, and see if this
conforms to your intuition about whether these objects
display the phenomenon we are trying to capture. Try
to do this for Definition 1.3.

You might wonder if Definition 1.3 is not too strong. After all how
are we going to ever prove that Eve cannot recover the secret key no
matter what she does? Edgar Allan Poe would say that there can al-
ways be a method that we overlooked. However, in fact this definition
is too weak! Consider the following encryption: the secret key 𝑘 is cho-
sen at random in {0, 1}𝑛 but our encryption scheme simply ignores it

and lets 𝐸𝑘 (𝑚) = 𝑚 and 𝐷𝑘 (𝑐) = 𝑐. This is a valid encryption since


𝐷𝑘 (𝐸𝑘 (𝑚)) = 𝑚, but is of course completely insecure as we are simply
outputting the plaintext in the clear. Yet, no matter what Eve does, if
she only sees 𝑐 and not 𝑘, there is no way she can guess the true value
of 𝑘 with probability better than 2^{−𝑛}, since it was chosen completely at
random and she gets no information about it. Formally, one can prove
the following result:
Lemma 1.4 Let (𝐸, 𝐷) be the encryption scheme above. For every function
𝐸𝑣𝑒 ∶ {0, 1}^ℓ → {0, 1}^𝑛 and for every 𝑚 ∈ {0, 1}^ℓ, the probability
that 𝐸𝑣𝑒(𝐸_𝑘(𝑚)) = 𝑘 is exactly 2^{−𝑛}.

Proof. This follows because 𝐸_𝑘(𝑚) = 𝑚 and hence 𝐸𝑣𝑒(𝐸_𝑘(𝑚)) =
𝐸𝑣𝑒(𝑚), which is some fixed value 𝑘′ ∈ {0, 1}^𝑛 that is independent of
𝑘. Hence the probability that 𝑘 = 𝑘′ is 2^{−𝑛}. QED
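You can even verify Lemma 1.4 exhaustively for a small key length. This sketch (ours) fixes one arbitrary strategy for Eve and computes her success probability exactly by enumerating all keys:

```python
from itertools import product

n = 4                         # key length, small enough to enumerate
m = (0, 1, 0)                 # an arbitrary fixed plaintext

def E(k, m):
    return m                  # the scheme above: ignore the key entirely

def Eve(c):
    return (0,) * n           # one fixed strategy: always guess 0^n

keys = list(product([0, 1], repeat=n))
wins = sum(1 for k in keys if Eve(E(k, m)) == k)
print(wins / len(keys))       # exactly 2**-n = 0.0625
```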

The math behind the above argument is very simple, yet I urge
you to read and re-read the last two paragraphs until you are sure
that you completely understand why this encryption is in fact secure
according to the above definition. This is a “toy example” of the kind
of reasoning that we will be employing constantly throughout this
course, and you want to make sure that you follow it.
So, Lemma 1.4 is true, but one might question its meaning. Clearly
this silly example was not what we meant when stating this defini-
tion. However, as mentioned above, we are not willing to ignore even
silly examples and must amend the definition to rule them out. One
obvious objection is that we don’t care about hiding the key- it is the
message that we are trying to keep secret. This suggests the next at-
tempt:

Definition 1.5 — Security of encryption: second attempt. An encryption
scheme (𝐸, 𝐷) is 𝑛-secure if for every message 𝑚 no matter what
method Eve employs, the probability that she can recover 𝑚 from
the ciphertext 𝑐 = 𝐸_𝑘(𝑚) is at most 2^{−𝑛}.

Now this seems like it captures our intended meaning. But remem-
ber that we are being anal, and truly insist that the definition holds
as stated, namely that for every plaintext message 𝑚 and every function
𝐸𝑣𝑒 ∶ {0, 1}^𝐶 → {0, 1}^ℓ, the probability over the choice of 𝑘 that
𝐸𝑣𝑒(𝐸_𝑘(𝑚)) = 𝑚 is at most 2^{−𝑛}. But now we see that this is clearly
impossible. After all, this is supposed to work for every message 𝑚
and every function 𝐸𝑣𝑒, but clearly if 𝑚 is the all-zeroes message 0^ℓ
and 𝐸𝑣𝑒 is the function that ignores its input and simply outputs 0^ℓ,
then it will hold that 𝐸𝑣𝑒(𝐸_𝑘(𝑚)) = 𝑚 with probability one.

So, if before the definition was too weak, the new definition is too
strong and is impossible to achieve. The problem is that of course
we could guess a fixed message with probability one, so perhaps we
could try to consider a definition with a random message. That is:

Definition 1.6 — Security of encryption: third attempt. An encryption
scheme (𝐸, 𝐷) is 𝑛-secure if no matter what method Eve employs,
if 𝑚 is chosen at random from {0, 1}^ℓ, the probability that she can
recover 𝑚 from the ciphertext 𝑐 = 𝐸_𝑘(𝑚) is at most 2^{−𝑛}.

This weakened definition can in fact be achieved, but we have again


weakened it too much. Consider an encryption that hides the last ℓ/2
bits of the message, but completely reveals the first ℓ/2 bits. The prob-
ability of guessing a random message is 2^{−ℓ/2}, and so such a scheme
would be “ℓ/2 secure” per Definition 1.6 but this is still a scheme that
you would not want to use. The point is that in practice we don’t en-
crypt random messages— our messages might be in English, might
have common headers, and might have even more structures based on
the context. In fact, it may be that the message is either “Yes” or “No”
(or perhaps either “Attack today” or “Attack tomorrow”) but we want
to make sure Eve doesn’t learn which one it is. So, using an encryption
scheme that reveals the first half of the message (or frankly even only
the first bit) is unacceptable.

1.5 PERFECT SECRECY


So far all of our attempts at definitions oscillated between being too
strong (and hence impossible) or too weak (and hence not guarantee-
ing actual security). The key insight of Shannon was that in a secure
encryption scheme the ciphertext should not reveal any additional in-
formation about the plaintext. So, if for example it was a priori possible
for Eve to guess the plaintext with some probability 1/𝑘 (e.g., because
there were only 𝑘 possibilities for it) then she should not be able to
guess it with higher probability after seeing the ciphertext. This can be
formalized as follows:

Definition 1.7 — Perfect secrecy. An encryption scheme (𝐸, 𝐷) is perfectly secret if for every set 𝑀 ⊆ {0, 1}ℓ of plaintexts, and for every strategy used by Eve, if we choose at random 𝑚 ∈ 𝑀 and a random key 𝑘 ∈ {0, 1}𝑛 , then the probability that Eve guesses 𝑚 after seeing 𝐸𝑘 (𝑚) is at most 1/|𝑀 |.

In particular, if we encrypt either “Yes” or “No” with probability


1/2, then Eve won’t be able to guess which one it is with probability
better than half. In fact, that turns out to be the heart of the matter:

Theorem 1.8 — Two to many theorem. An encryption scheme (𝐸, 𝐷) is perfectly secret if and only if for every two distinct plaintexts {𝑚0 , 𝑚1 } ⊆ {0, 1}ℓ and every strategy used by Eve, if we choose at random 𝑏 ∈ {0, 1} and a random key 𝑘 ∈ {0, 1}𝑛 , then the probability that Eve guesses 𝑚𝑏 after seeing 𝐸𝑘 (𝑚𝑏 ) is at most 1/2.

Proof. The "only if" direction is obvious: this condition is a special case of the perfect secrecy condition for a set 𝑀 of size 2.
The “if” direction is trickier. We will use a proof by contradiction.
We need to show that if there is some set 𝑀 (of size possibly much
larger than 2) and some strategy for Eve to guess (based on the ci-
phertext) a plaintext chosen from 𝑀 with probability larger than
1/|𝑀 |, then there is also some set 𝑀 ′ of size two and a strategy 𝐸𝑣𝑒′
for Eve to guess a plaintext chosen from 𝑀 ′ with probability larger
than 1/2.
Let’s fix the message 𝑚0 to be the all zeroes message and pick 𝑚1 at
random in 𝑀 . Under our assumption, it holds that for random key 𝑘
and message 𝑚1 ∈ 𝑀 ,

Pr𝑘←𝑅 {0,1}𝑛 ,𝑚1 ←𝑅 𝑀 [𝐸𝑣𝑒(𝐸𝑘 (𝑚1 )) = 𝑚1 ] > 1/|𝑀 | .   (1.2)

On the other hand, for every choice of 𝑘, 𝑚′ = 𝐸𝑣𝑒(𝐸𝑘 (𝑚0 )) is a fixed string independent of the choice of 𝑚1 , and so if we pick 𝑚1 at random in 𝑀 , then the probability that 𝑚1 = 𝑚′ is at most 1/|𝑀 |, or in other words

Pr𝑘←𝑅 {0,1}𝑛 ,𝑚1 ←𝑅 𝑀 [𝐸𝑣𝑒(𝐸𝑘 (𝑚0 )) = 𝑚1 ] ≤ 1/|𝑀 | .   (1.3)

We can also write (1.2) and (1.3) as

𝔼𝑚1 ←𝑅 𝑀 Pr[𝐸𝑣𝑒(𝐸𝑘 (𝑚1 )) = 𝑚1 ] > 1/|𝑀 |

and

𝔼𝑚1 ←𝑅 𝑀 Pr[𝐸𝑣𝑒(𝐸𝑘 (𝑚0 )) = 𝑚1 ] ≤ 1/|𝑀 |

where these expectations are taken over the choice of 𝑚1 . Hence by linearity of expectation

𝔼𝑚1 ←𝑅 𝑀 (Pr[𝐸𝑣𝑒(𝐸𝑘 (𝑚1 )) = 𝑚1 ] − Pr[𝐸𝑣𝑒(𝐸𝑘 (𝑚0 )) = 𝑚1 ]) > 0 .   (1.4)

(In words, for random 𝑚1 , the probability that Eve outputs 𝑚1 given
an encryption of 𝑚1 is higher than the probability that Eve outputs 𝑚1
given an encryption of 𝑚0 .)
In particular, by the averaging argument (the argument that if the
average of numbers is larger than 𝛼 then one of the numbers is larger
than 𝛼) there must exist 𝑚1 ∈ 𝑀 satisfying

Pr[𝐸𝑣𝑒(𝐸𝑘 (𝑚1 )) = 𝑚1 ] > Pr[𝐸𝑣𝑒(𝐸𝑘 (𝑚0 )) = 𝑚1 ] .



(Can you see why? It is worth stopping and reading this again.)
But this can be turned into an attacker 𝐸𝑣𝑒′ such that for 𝑏 ←𝑅 {0, 1}, the probability that 𝐸𝑣𝑒′ (𝐸𝑘 (𝑚𝑏 )) = 𝑚𝑏 is larger than 1/2.
Indeed, we can define 𝐸𝑣𝑒′ (𝑐) to output 𝑚1 if 𝐸𝑣𝑒(𝑐) = 𝑚1 and
otherwise output a random message in {𝑚0 , 𝑚1 }. The probability
that 𝐸𝑣𝑒′ (𝑦) equals 𝑚1 is higher when 𝑐 = 𝐸𝑘 (𝑚1 ) than when 𝑐 =
𝐸𝑘 (𝑚0 ), and since 𝐸𝑣𝑒′ outputs either 𝑚0 or 𝑚1 , this means that the
probability that 𝐸𝑣𝑒′ (𝐸𝑘 (𝑚𝑏 )) = 𝑚𝑏 is larger than 1/2. (Can you see
why?)
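In code, the transformation from 𝐸𝑣𝑒 to 𝐸𝑣𝑒′ in this proof is just a few lines. Here is a minimal Python sketch (with the original adversary passed in as a black-box function eve; the helper name make_eve_prime is ours):

import random

def make_eve_prime(eve, m0, m1):
    def eve_prime(c):
        # Output m1 whenever the original adversary does; otherwise
        # toss a fair coin between the two candidate plaintexts.
        if eve(c) == m1:
            return m1
        return random.choice([m0, m1])
    return eve_prime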

The proof of Theorem 1.8 is not trivial, and is worth
reading again and making sure you understand it.
An excellent exercise, which I urge you to pause and
do now is to prove the following: (𝐸, 𝐷) is perfectly
secret if for every plaintexts 𝑚, 𝑚′ ∈ {0, 1}ℓ , the two
random variables {𝐸𝑘 (𝑚)} and {𝐸𝑘′ (𝑚′ )} (for ran-
domly chosen keys 𝑘 and 𝑘′ ) have precisely the same
distribution.

Solved Exercise 1.1 — Perfect secrecy, equivalent definition. Prove that a valid encryption scheme (𝐸, 𝐷) with plaintext length ℓ(⋅) is perfectly secret if and only if for every 𝑛 ∈ ℕ and plaintexts 𝑚, 𝑚′ ∈ {0, 1}ℓ(𝑛) , the following two distributions 𝑌 and 𝑌 ′ over {0, 1}∗ are identical:

• 𝑌 is obtained by sampling 𝑘 ←𝑅 {0, 1}𝑛 and outputting 𝐸𝑘 (𝑚).

• 𝑌 ′ is obtained by sampling 𝑘 ←𝑅 {0, 1}𝑛 and outputting 𝐸𝑘 (𝑚′ ).

Solution:
We only sketch the proof. The condition in the exercise is equiv-
alent to perfect secrecy with |𝑀 | = 2. For every 𝑀 = {𝑚, 𝑚′ },
if 𝑌 and 𝑌 ′ are identical then clearly for every 𝐸𝑣𝑒 and possible
output 𝑦, Pr[𝐸𝑣𝑒(𝐸𝑘 (𝑚)) = 𝑦] = Pr[𝐸𝑣𝑒(𝐸𝑘 (𝑚′ )) = 𝑦] since these correspond to applying 𝐸𝑣𝑒 on the same distribution 𝑌 = 𝑌 ′ . On the
other hand, if 𝑌 and 𝑌 ′ are not identical then there must exist some
ciphertext 𝑐∗ such that Pr[𝑌 = 𝑐∗ ] > Pr[𝑌 ′ = 𝑐∗ ] (or vice versa).
The adversary that on input 𝑐 guesses that 𝑐 is an encryption of 𝑚
if 𝑐 = 𝑐∗ and otherwise tosses a coin will have some advantage over
1/2 in distinguishing an encryption of 𝑚 from an encryption of 𝑚′ .


We summarize the equivalent definitions of perfect secrecy in the


following theorem, whose (omitted) proof follows from Theorem 1.8
and Solved Exercise 1.1 as well as similar proof ideas.

Theorem 1.9 — Perfect secrecy equivalent conditions. Let (𝐸, 𝐷) be a valid encryption scheme with message length ℓ(𝑛). Then the following conditions are equivalent:

1. (𝐸, 𝐷) is perfectly secret as per Definition 1.7.

2. For every pair of messages 𝑚0 , 𝑚1 ∈ {0, 1}ℓ(𝑛) , the distributions {𝐸𝑘 (𝑚0 )}𝑘←𝑅 {0,1}𝑛 and {𝐸𝑘 (𝑚1 )}𝑘←𝑅 {0,1}𝑛 are identical.

3. (Two-message security: Eve can't guess which one of two messages was encrypted with success better than half.) For every function 𝐸𝑣𝑒 ∶ {0, 1}𝐶(𝑛) → {0, 1}ℓ(𝑛) and pair of messages 𝑚0 , 𝑚1 ∈ {0, 1}ℓ(𝑛) ,

Pr𝑏←𝑅 {0,1},𝑘←𝑅 {0,1}𝑛 [𝐸𝑣𝑒(𝐸𝑘 (𝑚𝑏 )) = 𝑚𝑏 ] ≤ 1/2 .

4. (Arbitrary prior security: Eve can't guess which message was encrypted with success better than her prior information.) For every distribution 𝒟 over {0, 1}ℓ(𝑛) and 𝐸𝑣𝑒 ∶ {0, 1}𝐶(𝑛) → {0, 1}ℓ(𝑛) ,

Pr𝑚←𝑅 𝒟,𝑘←𝑅 {0,1}𝑛 [𝐸𝑣𝑒(𝐸𝑘 (𝑚)) = 𝑚] ≤ max(𝒟) ,

where we denote max(𝒟) = max𝑚∗ ∈{0,1}ℓ(𝑛) Pr𝑚←𝑅 𝒟 [𝑚 = 𝑚∗ ], the largest probability of any element under 𝒟.

1.5.1 Achieving perfect secrecy


So, perfect secrecy is a natural condition, and does not seem to be too
weak for applications, but can it actually be achieved? After all, the
condition that two different plaintexts are mapped to the same distri-
bution seems somewhat at odds with the condition that Bob would
succeed in decrypting the ciphertexts and find out if the plaintext was
in fact 𝑚 or 𝑚′ . It turns out the answer is yes! For example, Fig. 1.8
details a perfectly secret encryption for two bits. In fact, this can be generalized to any number of bits:6

6 The one-time pad is typically credited to Gilbert Vernam of Bell and Joseph Mauborgne of the U.S. Army Signal Corps, but Steve Bellovin discovered an earlier inventor, Frank Miller, who published a description of the one-time pad in 1882. However, it is unclear if Miller realized the fact that the security of this system can be mathematically proven, and so the theorem below should probably still be credited to Vernam and Mauborgne.

Theorem 1.10 — One Time Pad (Vernam 1917, Shannon 1949). There is a perfectly secret valid encryption scheme (𝐸, 𝐷) with ℓ(𝑛) = 𝑛.

Proof Idea:

Figure 1.8: A perfectly secret encryption scheme


for two-bit keys and messages. The blue vertices
represent plaintexts and the red vertices represent
ciphertexts, each edge mapping a plaintext 𝑚 to a
ciphertext 𝑐 = 𝐸𝑘 (𝑚) is labeled with the correspond-
ing key 𝑘. Since there are four possible keys, the
degree of the graph is four and it is in fact a complete
bipartite graph. The encryption scheme is valid in the
sense that for every 𝑘 ∈ {0, 1}2 , the map 𝑚 ↦ 𝐸𝑘 (𝑚)
is one-to-one, which in other words means that the set
of edges labeled with 𝑘 is a matching.

Our scheme is the one-time pad also known as the “Vernam Ci-
pher”, see Fig. 1.9. The encryption is exceedingly simple: to encrypt
a message 𝑚 ∈ {0, 1}𝑛 with a key 𝑘 ∈ {0, 1}𝑛 we simply output
𝑚 ⊕ 𝑘 where ⊕ is the bitwise XOR operation that outputs the string
corresponding to XORing each coordinate of 𝑚 and 𝑘.

Figure 1.9: In the one time pad encryption scheme


we encrypt a plaintext 𝑚 ∈ {0, 1}𝑛 with a key
𝑘 ∈ {0, 1}𝑛 by the ciphertext 𝑚 ⊕ 𝑘 where ⊕ denotes
the bitwise XOR operation.

Proof of Theorem 1.10. For two binary strings 𝑎 and 𝑏 of the same
length 𝑛, we define 𝑎 ⊕ 𝑏 to be the string 𝑐 ∈ {0, 1}𝑛 such that
𝑐𝑖 = 𝑎𝑖 + 𝑏𝑖 mod 2 for every 𝑖 ∈ [𝑛]. The encryption scheme
(𝐸, 𝐷) is defined as follows: 𝐸𝑘 (𝑚) = 𝑚 ⊕ 𝑘 and 𝐷𝑘 (𝑐) = 𝑐 ⊕ 𝑘.
By the associative law of addition (which works also modulo two),
𝐷𝑘 (𝐸𝑘 (𝑚)) = (𝑚 ⊕ 𝑘) ⊕ 𝑘 = 𝑚 ⊕ (𝑘 ⊕ 𝑘) = 𝑚 ⊕ 0𝑛 = 𝑚, using the fact
that for every bit 𝜎 ∈ {0, 1}, 𝜎 + 𝜎 mod 2 = 0 and 𝜎 + 0 = 𝜎 mod 2.
Hence (𝐸, 𝐷) form a valid encryption.
To analyze the perfect secrecy property, we claim that for every
𝑚 ∈ {0, 1}𝑛 , the distribution 𝑌𝑚 = 𝐸𝑘 (𝑚) where 𝑘 ←𝑅 {0, 1}𝑛 is
simply the uniform distribution over {0, 1}𝑛 , and hence in particular
the distributions 𝑌𝑚 and 𝑌𝑚′ are identical for every 𝑚, 𝑚′ ∈ {0, 1}𝑛 .
Indeed, for every particular 𝑦 ∈ {0, 1}𝑛 , the value 𝑦 is output by 𝑌𝑚 if
and only if 𝑦 = 𝑚 ⊕ 𝑘 which holds if and only if 𝑘 = 𝑚 ⊕ 𝑦. Since 𝑘 is
chosen uniformly at random in {0, 1}𝑛 , the probability that 𝑘 happens
to equal 𝑚 ⊕ 𝑦 is exactly 2−𝑛 , which means that every string 𝑦 is output
by 𝑌𝑚 with probability 2−𝑛 .
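Concretely, the one-time pad fits in a few lines of Python. The following sketch (our own code, operating on bytes rather than bit-strings; the function names are hypothetical, not from any library) also checks the validity condition 𝐷𝑘 (𝐸𝑘 (𝑚)) = 𝑚:

import secrets

def otp_encrypt(key: bytes, plaintext: bytes) -> bytes:
    assert len(key) == len(plaintext)  # the pad must be as long as the message
    return bytes(k ^ m for k, m in zip(key, plaintext))

# Decryption is the same XOR, since (m XOR k) XOR k = m.
otp_decrypt = otp_encrypt

message = b"attack at dawn"
key = secrets.token_bytes(len(message))  # fresh uniformly random key
assert otp_decrypt(key, otp_encrypt(key, message)) == message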

Figure 1.10: For any key length 𝑛, we can visualize an


encryption scheme (𝐸, 𝐷) as a graph with a vertex
for every one of the 2ℓ(𝑛) possible plaintexts and for
every one of the ciphertexts in {0, 1}∗ of the form
𝐸𝑘 (𝑥) for 𝑘 ∈ {0, 1}𝑛 and 𝑥 ∈ {0, 1}ℓ(𝑛) . For every
plaintext 𝑥 and key 𝑘, we add an edge labeled 𝑘
between 𝑥 and 𝐸𝑘 (𝑥). By the validity condition, if we
pick any fixed key 𝑘, the map 𝑥 ↦ 𝐸𝑘 (𝑥) must be
one-to-one. The condition of perfect secrecy simply
corresponds to requiring that every two plaintexts
𝑥 and 𝑥′ have exactly the same set of neighbors (or
multi-set, if there are parallel edges).

The argument above is quite simple but is worth read-
ing again. To understand why the one-time pad is
perfectly secret, it is useful to envision it as a bipartite
graph as we’ve done in Fig. 1.8. (In fact the encryp-
tion scheme of Fig. 1.8 is precisely the one-time pad
for 𝑛 = 2.) For every 𝑛, the one-time pad encryp-
tion scheme corresponds to a bipartite graph with
2𝑛 vertices on the “left side” corresponding to the
plaintexts in {0, 1}𝑛 and 2𝑛 vertices on the “right side”
corresponding to the ciphertexts {0, 1}𝑛 . For every
𝑥 ∈ {0, 1}𝑛 and 𝑘 ∈ {0, 1}𝑛 , we connect 𝑥 to the vertex
𝑦 = 𝐸𝑘 (𝑥) with an edge that we label with 𝑘. One can
see that this is the complete bipartite graph, where
every vertex on the left is connected to all vertices on
the right. In particular this means that for every left
vertex 𝑥, the distribution on the ciphertexts obtained
by taking a random 𝑘 ∈ {0, 1}𝑛 and going to the
neighbor of 𝑥 on the edge labeled 𝑘 is the uniform dis-
tribution over {0, 1}𝑛 . This ensures the perfect secrecy
condition.

1.6 NECESSITY OF LONG KEYS


So, does Theorem 1.10 give the final word on cryptography, meaning that we can all communicate with perfect secrecy and live happily ever after? No, it doesn't. While the one-time pad is efficient,
and gives perfect secrecy, it has one glaring disadvantage: to commu-
nicate 𝑛 bits you need to store a key of length 𝑛. In contrast, practically

used cryptosystems such as AES-128 have a short key of 128 bits (i.e.,
16 bytes) that can be used to protect terabytes or more of communica-
tion! Imagine that we all needed to use the one time pad. If that was
the case, then if you had to communicate with 𝑚 people, you would
have to maintain (securely!) 𝑚 huge files that are each as long as the
length of the maximum total communication you expect with that per-
son. Imagine that every time you opened an account with Amazon,
Google, or any other service, they would need to send you in the mail
(ideally with a secure courier) a DVD full of random numbers, and
every time you suspected a virus, you’d need to ask all these services
for a fresh DVD. This doesn’t sound so appealing.
This is not just a theoretical issue. The Soviets have used the one-
time pad for their confidential communication since before the 1940’s.
In fact, even before Shannon’s work, the U.S. intelligence already
knew in 1941 that the one-time pad is in principle “unbreakable” (see
page 32 in the Venona document). However, it turned out that the
hassle of manufacturing so many keys for all the communication took
its toll on the Soviets and they ended up reusing the same keys for
more than one message. They did try to use them for completely dif-
ferent receivers in the (false) hope that this wouldn’t be detected. The
Venona Project of the U.S. Army was founded in February 1943 by
Gene Grabeel (see Fig. 1.11), a former home economics teacher from
Madison Heights, Virginia and Lt. Leonard Zubko. In October 1943,
they had their breakthrough when it was discovered that the Russians
were reusing their keys. In the 37 years of its existence, the project has
resulted in a treasure chest of intelligence, exposing hundreds of KGB
agents and Russian spies in the U.S. and other countries, including
Julius Rosenberg, Harry Gold, Klaus Fuchs, Alger Hiss, Harry Dexter White and many others.

Figure 1.11: Gene Grabeel, who founded the U.S. Russian SigInt program on 1 Feb 1943. Photo taken in 1942, see Page 7 in the Venona historical study.

Unfortunately it turns out that such long keys are necessary for perfect secrecy:

Theorem 1.11 — Perfect secrecy requires long keys. For every perfectly secret encryption scheme (𝐸, 𝐷) the length function ℓ satisfies ℓ(𝑛) ≤ 𝑛.

Proof Idea:
The idea behind the proof is illustrated in Fig. 1.12. We define a graph between the plaintexts and ciphertexts, where we put an edge between plaintext 𝑥 and ciphertext 𝑦 if there is some key 𝑘 such that 𝑦 = 𝐸𝑘 (𝑥). The degree of this graph is at most the number of potential keys. The fact that the degree is smaller than the number of plaintexts (and hence of ciphertexts) implies that there would be two plaintexts 𝑥 and 𝑥′ with different sets of neighbors, and hence the distribution of a ciphertext corresponding to 𝑥 (with a random key) will not be identical to the distribution of a ciphertext corresponding to 𝑥′ .

Figure 1.12: An encryption scheme where the number of keys is smaller than the number of plaintexts corresponds to a bipartite graph where the degree is smaller than the number of vertices on the left side. Together with the validity condition this implies that there will be two left vertices 𝑥, 𝑥′ with non-identical neighborhoods, and hence the scheme does not satisfy perfect secrecy.

Proof of Theorem 1.11. Let (𝐸, 𝐷) be a valid encryption scheme with messages of length ℓ and key of length 𝑛 < ℓ. We will show that
(𝐸, 𝐷) is not perfectly secret by providing two plaintexts 𝑥0 , 𝑥1 ∈
{0, 1}ℓ such that the distributions 𝑌𝑥0 and 𝑌𝑥1 are not identical, where
𝑌𝑥 is the distribution obtained by picking 𝑘 ←𝑅 {0, 1}𝑛 and outputting
𝐸𝑘 (𝑥).
We choose 𝑥0 = 0ℓ . Let 𝑆0 ⊆ {0, 1}∗ be the set of all ciphertexts
that have nonzero probability of being output in 𝑌𝑥0 . That is, 𝑆0 =
{𝑦 | ∃𝑘∈{0,1}𝑛 𝑦 = 𝐸𝑘 (𝑥0 )}. Since there are only 2𝑛 keys, we know that
|𝑆0 | ≤ 2𝑛 .
We will show the following claim:
Claim I: There exists some 𝑥1 ∈ {0, 1}ℓ and 𝑘 ∈ {0, 1}𝑛 such that
𝐸𝑘 (𝑥1 ) ∉ 𝑆0 .
Claim I implies that the string 𝐸𝑘 (𝑥1 ) has positive probability of
being output by 𝑌𝑥1 and zero probability of being output by 𝑌𝑥0 and
hence in particular 𝑌𝑥0 and 𝑌𝑥1 are not identical. To prove Claim I, just
choose a fixed 𝑘 ∈ {0, 1}𝑛 . By the validity condition, the map 𝑥 ↦
𝐸𝑘 (𝑥) is a one to one map of {0, 1}ℓ to {0, 1}∗ and hence in particular
the image of this map which is the set 𝐼𝑘 = {𝑦 | ∃𝑥∈{0,1}ℓ 𝑦 = 𝐸𝑘 (𝑥)} has
size at least (in fact exactly) 2ℓ . Since |𝑆0 | ≤ 2𝑛 < 2ℓ , this means that
|𝐼𝑘 | > |𝑆0 | and so in particular there exists some string 𝑦 in 𝐼𝑘 ⧵ 𝑆0 . But
by the definition of 𝐼𝑘 this means that there is some 𝑥 ∈ {0, 1}ℓ such
that 𝐸𝑘 (𝑥) ∉ 𝑆0 which concludes the proof of Claim I and hence of
Theorem 1.11.
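The bipartite-graph view also suggests a brute-force test: for tiny parameters we can compute, for every plaintext, the multiset of its ciphertext-neighbors and check whether they all coincide. Here is a minimal Python sketch of this check (our own helper, with keys and plaintexts encoded as tuples of bits; it only makes sense for very small 𝑛 and ℓ since it enumerates all keys and plaintexts):

from collections import Counter
from itertools import product

def is_perfectly_secret(encrypt, n, ell):
    # Perfect secrecy holds iff for every plaintext x the multiset
    # {E_k(x) : k in {0,1}^n} is the same (see Solved Exercise 1.1).
    keys = list(product([0, 1], repeat=n))
    dists = [Counter(encrypt(k, x) for k in keys)
             for x in product([0, 1], repeat=ell)]
    return all(d == dists[0] for d in dists)

otp = lambda k, x: tuple(a ^ b for a, b in zip(k, x))
print(is_perfectly_secret(otp, 2, 2))     # True
leaky = lambda k, x: (k[0] ^ x[0], x[1])  # n = 1 < ell = 2
print(is_perfectly_secret(leaky, 1, 2))   # False, as Theorem 1.11 predicts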

Remark 1.12 — Adding probability into the picture. There
is a sense in which both our secrecy and our impossi-
bility results might not be fully convincing, and that
is that we did not explicitly consider algorithms that
use randomness . For example, maybe Eve can break
a perfectly secret encryption if she is not modeled
as a deterministic function 𝐸𝑣𝑒 ∶ {0, 1}𝑜 → {0, 1}ℓ
but rather a probabilistic process. Similarly, maybe the
encryption and decryption functions could be prob-
abilistic processes as well. It turns out that none of
those matter.
For the former, note that a probabilistic process can
be thought of as a distribution over functions, in the
sense that we have a collection of functions 𝑓1 , ..., 𝑓𝑁
mapping {0, 1}𝑜 to {0, 1}ℓ , and some probabilities
𝑝1 , … , 𝑝𝑁 (non-negative numbers summing to 1), so


we now think of Eve as selecting the function 𝑓𝑖 with
probability 𝑝𝑖 . But if none of those functions can give
an advantage better than 1/2, then neither can this
collection (this is related to the averaging principle in
probability).
A similar (though more involved) argument shows
that the impossibility result showing that the key must
be at least as long as the message still holds even if the
encryption and decryption algorithms are allowed to
be probabilistic processes as well (working this out is
a great exercise).

1.6.1 Amplifying success probability


Theorem 1.11 implies that for every encryption scheme (𝐸, 𝐷) with
ℓ(𝑛) > 𝑛, there is a pair of messages 𝑥0 , 𝑥1 and an attacker 𝐸𝑣𝑒 that
can distinguish between an encryption of 𝑥0 and an encryption of
𝑥1 with success better than 1/2. But perhaps Eve’s success is only
marginally better than half, say 0.50001? It turns out that’s not the
case. If the message is even somewhat larger than the key, the success
of Eve can be very close to 1:

Theorem 1.13 — Short keys imply high probability attack. Let (𝐸, 𝐷) be an encryption scheme with ℓ(𝑛) = 𝑛 + 𝑡. Then there is a function 𝐸𝑣𝑒 and a pair of messages 𝑥0 , 𝑥1 such that

Pr𝑘←𝑅 {0,1}𝑛 ,𝑏←𝑅 {0,1} [𝐸𝑣𝑒(𝐸𝑘 (𝑥𝑏 )) = 𝑥𝑏 ] ≥ 1 − 2−𝑡−1 .

Proof. As in the proof of Theorem 1.11, let ℓ = ℓ(𝑛), and let 𝑥0 = 0ℓ and 𝑆0 = {𝐸𝑘 (𝑥0 ) ∶ 𝑘 ∈ {0, 1}𝑛 } be the set of size at most 2𝑛 of all ciphertexts corresponding to 𝑥0 . We claim that

Pr𝑘←𝑅 {0,1}𝑛 ,𝑥←𝑅 {0,1}ℓ [𝐸𝑘 (𝑥) ∈ 𝑆0 ] ≤ 2−𝑡 .   (1.5)

We show this by arguing that this bound holds for every fixed 𝑘,
when we take the probability over 𝑥, and so in particular it holds also
for random 𝑘. Indeed, for every fixed 𝑘, the map 𝑥 ↦ 𝐸𝑘 (𝑥) is a one-
to-one map, and so the distribution of 𝐸𝑘 (𝑥) for random 𝑥 ∈ {0, 1}ℓ is
uniform over some set 𝑇𝑘 of size 2𝑛+𝑡 . For every 𝑘, the probability over
𝑥 that 𝐸𝑘 (𝑥) ∈ 𝑆0 is equal to
|𝑇𝑘 ∩ 𝑆0 |/|𝑇𝑘 | ≤ |𝑆0 |/|𝑇𝑘 | ≤ 2𝑛 /2𝑛+𝑡 = 2−𝑡 ,

thus proving (1.5).


Now, for every 𝑥, define 𝑝𝑥 to be Pr𝑘←𝑅 {0,1}𝑛 [𝐸𝑘 (𝑥) ∈ 𝑆0 ]. By (1.5), the expectation of 𝑝𝑥 over random 𝑥 ←𝑅 {0, 1}ℓ is at most 2−𝑡 and so

in particular by the averaging argument there exists some 𝑥1 such that


𝑝𝑥1 ≤ 2−𝑡 . Yet that means that the following adversary 𝐸𝑣𝑒 will be
able to distinguish between an encryption of 𝑥0 and an encryption of
𝑥1 with probability at least 1 − 2−𝑡−1 :

• Input: A ciphertext 𝑦 ∈ {0, 1}∗

• Operation: If 𝑦 ∈ 𝑆0 , output 𝑥0 , otherwise output 𝑥1 .

The probability that 𝐸𝑣𝑒(𝐸𝑘 (𝑥0 )) = 𝑥0 is equal to 1, while the


probability that 𝐸𝑣𝑒(𝐸𝑘 (𝑥1 )) = 𝑥1 is equal to 1 − 𝑝𝑥1 ≥ 1 − 2−𝑡 . Hence
the overall probability of 𝐸𝑣𝑒 guessing correctly is

(1/2) ⋅ 1 + (1/2) ⋅ (1 − 2−𝑡 ) = 1 − 2−𝑡−1 .
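The adversary constructed in this proof is just a set-membership test, and is easy to express in code. A minimal Python sketch (with the scheme's encryption given as a black-box function encrypt, and keys encoded as integers; the helper name make_eve is ours):

def make_eve(encrypt, n, x0, x1):
    # Precompute S0, the set of all ciphertexts that x0 can map to.
    S0 = {encrypt(k, x0) for k in range(2 ** n)}
    def eve(c):
        # Any ciphertext outside S0 cannot be an encryption of x0.
        return x0 if c in S0 else x1
    return eve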

1.7 BIBLIOGRAPHICAL NOTES


Much of this text is shared with my Introduction to Theoretical Com-
puter Science textbook.
Shannon’s manuscript was written in 1945 but was classified, and a
partial version was only published in 1949. Still it has revolutionized
cryptography, and is the forerunner to much of what followed.
The Venona project’s history is described in this document. Aside
from Grabeel and Zubko, credit to the discovery that the Soviets were
reusing keys is shared by Lt. Richard Hallock, Carrie Berry, Frank
Lewis, and Lt. Karl Elmquist, and there are others that have made
important contributions to this project. See pages 27 and 28 in the
document.
In a 1955 letter to the NSA that only recently came forward, John
Nash proposed an “unbreakable” encryption scheme. He wrote “I
hope my handwriting, etc. do not give the impression I am just a crank or
circle-squarer… The significance of this conjecture [that certain encryption
schemes are exponentially secure against key recovery attacks] .. is that it is
quite feasible to design ciphers that are effectively unbreakable.” John Nash
made seminal contributions in mathematics and game theory, and was
awarded both the Abel Prize in mathematics and the Nobel Memorial
Prize in Economic Sciences. However, he has struggled with mental
illness throughout his life. His biography, A Beautiful Mind was made
into a popular movie. It is natural to compare Nash’s 1955 letter to
the NSA to the 1956 letter by Kurt Gödel to John von Neumann. From
the theoretical computer science point of view, the crucial difference
is that while Nash informally talks about exponential vs polynomial
computation time, he does not mention the word “Turing Machine”
or other models of computation, and it is not clear if he is aware or not

that his conjecture can be made mathematically precise (assuming a


formalization of “sufficiently complex types of enciphering”).
II
PRIVATE KEY CRYPTOGRAPHY
2
Computational Security

Additional reading: Sections 2.2 and 2.3 in Boneh-Shoup book. Chapter


3 up to and including Section 3.3 in Katz-Lindell book.
Recall our cast of characters: Alice and Bob want to communicate
securely over a channel that is monitored by the nosy Eve. In the last
lecture, we have seen the definition of perfect secrecy that guarantees
that Eve cannot learn anything about their communication beyond
what she already knew. However, this security came at a price. For
every bit of communication, Alice and Bob have to exchange in ad-
vance a bit of a secret key. In fact, the proof of this result gives rise to
the following simple Python program that can break every encryption
scheme that uses, say, a 128 bit key, with a 129 bit message:
from itertools import product  # iterator for cartesian products
from random import choice      # choose a random element of a list

# Gets a ciphertext and two potential plaintexts as input
# and returns the more likely plaintext.
# We assume we have access to the function Encrypt(key, plaintext).
def Distinguish(ciphertext, plaintext1, plaintext2):
    for key in product([0, 1], repeat=128):  # iterate over all 2**128 keys
        if Encrypt(key, plaintext1) == ciphertext:
            return plaintext1
        if Encrypt(key, plaintext2) == ciphertext:
            return plaintext2
    return choice([plaintext1, plaintext2])

The program Distinguish will break any 128-bit key and 129-bit
message encryption Encrypt, in the sense that there exists a pair of
messages 𝑚0 , 𝑚1 such that Distinguish(Encrypt(𝑘, 𝑚𝑏 ), 𝑚0 , 𝑚1 ) =
𝑚𝑏 with probability at least 0.75 over 𝑘 ←𝑅 {0, 1}𝑛 and 𝑏 ←𝑅 {0, 1}.


Now, generating, distributing, and protecting huge keys causes


immense logistical problems, which is why almost all encryption
schemes used in practice do in fact utilize short keys (e.g., 128 bits
long) with messages that can be much longer (sometimes even ter-
abytes or more of data).
So, why can’t we use the above Python program to break all en-
cryptions in the Internet and win infamy and fortune? We can in fact,
but we’ll have to wait a really long time, since the loop in Distinguish
will run 2128 times, which will take much more than the lifetime of the
universe to complete, even if we used all the computers on the planet.
However, the fact that this particular program is not a feasible at-
tack, does not mean there does not exist a different attack. But this
still suggests a tantalizing possibility: if we consider a relaxed version
of perfect secrecy that restricts Eve to performing computations that
can be done in this universe (e.g., less than 2256 steps should be safe
not just for human but for all potential alien civilizations) then can we
bypass the impossibility result and allow the key to be much shorter
than the message?
This in fact does seem to be the case, but as we’ve seen, defining
security is a subtle task, and will take some care. As before, the way
we avoid (at least some of) the pitfalls of so many cryptosystems in
history is that we insist on very precisely defining what it means for a
scheme to be secure.
Let us defer the discussion how one defines a function being com-
putable in “less than 𝑇 operations” and just say that there is a way
to formally do so. We will want to say that a scheme has “256 bits of
security” if it is not possible to break it using less than 2256 operations,
and more generally that it has 𝑡 bits of security if it can’t be broken
using less than 2𝑡 operations. Given the perfect secrecy definition we
saw last time, a natural attempt for defining computational secrecy
would be the following:

Definition 2.1 — Computational secrecy (first attempt). An encryption scheme (𝐸, 𝐷) has 𝑡 bits of computational secrecy if for every two distinct plaintexts {𝑚0 , 𝑚1 } ⊆ {0, 1}ℓ and every strategy of Eve using at most 2𝑡 computational steps, if we choose at random 𝑏 ∈ {0, 1} and a random key 𝑘 ∈ {0, 1}𝑛 , then the probability that Eve guesses 𝑚𝑏 after seeing 𝐸𝑘 (𝑚𝑏 ) is at most 1/2.

Note: It is important to keep track of what is known and unknown to


the adversary Eve. The adversary knows the set {𝑚0 , 𝑚1 } of potential
messages, and the ciphertext 𝑦 = 𝐸𝑘 (𝑚𝑏 ). The only things she doesn’t
know are whether 𝑏 = 0 or 𝑏 = 1, and the value of the secret key 𝑘.
In particular, because 𝑚0 and 𝑚1 are known to Eve, it does not matter
whether we define Eve’s goal in this “security game” as outputting 𝑚𝑏


or as outputting 𝑏.
Definition 2.1 seems very natural, but is in fact impossible to achieve
if the key is shorter than the message.

Before reading further, you might want to stop and
think if you can prove that there is no, say, encryption

scheme with 𝑛 bits of computational security satisfy-
ing Definition 2.1 with ℓ = 𝑛 + 1 and where the time to
compute the encryption is polynomial.

The reason Definition 2.1 can’t be achieved is that if the message is


even one bit longer than the key, we can always have a very efficient
procedure that achieves success probability of about 1/2 + 2−𝑛−1
by guessing the key. This is because we can replace the loop in the
Python program Distinguish by choosing the key at random. Since
we have some small chance of guessing correctly, we will get a small
advantage over half.
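For concreteness, here is a sketch of this randomized variant of Distinguish (as in the original program, black-box access to Encrypt(key, plaintext) is assumed):

from random import choice, getrandbits

def DistinguishFast(ciphertext, plaintext1, plaintext2):
    key = tuple(getrandbits(1) for _ in range(128))  # one random key guess
    if Encrypt(key, plaintext1) == ciphertext:
        return plaintext1
    if Encrypt(key, plaintext2) == ciphertext:
        return plaintext2
    return choice([plaintext1, plaintext2])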
Of course an advantage of 2−256 in guessing the message is not
really something we would worry about. For example, since the earth
is about 5 billion years old, we can estimate the chance that an asteroid
of the magnitude that caused the dinosaurs’ extinction will hit us
this very second to be about 2−60 . Hence we want to relax the notion
of computational security so it would not consider guessing with
such a tiny advantage as a “true break” of the scheme. The resulting
definition is the following:

Definition 2.2 — Computational secrecy (concrete). An encryption scheme (𝐸, 𝐷) has 𝑡 bits of computational secrecy1 if for every two distinct plaintexts {𝑚0 , 𝑚1 } ⊆ {0, 1}ℓ and every strategy of Eve using at most 2𝑡 computational steps, if we choose at random 𝑏 ∈ {0, 1} and a random key 𝑘 ∈ {0, 1}𝑛 , then the probability that Eve guesses 𝑚𝑏 after seeing 𝐸𝑘 (𝑚𝑏 ) is at most 1/2 + 2−𝑡 .

1 Another version of "𝑡 bits of security" is that a scheme has 𝑡 bits of security if for every 𝑡1 + 𝑡2 ≤ 𝑡, an attacker running in 2𝑡1 time can't get success probability advantage more than 2−𝑡2 . However these two definitions only differ from one another by at most a factor of two. This may be important for practical applications (where the difference between 64 and 32 bits of security could be crucial) but won't matter for our concerns.

Having learned our lesson, let's try to see that this definition does give us the kind of conditions we desired. In particular, let's verify that this definition implies the analogous condition to perfect secrecy.

Theorem 2.3 — Guessing game for computational secrecy. If (𝐸, 𝐷) has 𝑡 bits of computational secrecy as per Definition 2.2 then for every subset 𝑀 ⊆ {0, 1}ℓ and every strategy of Eve using at most 2𝑡 − (100ℓ + 100) computational steps, if we choose at random 𝑚 ∈ 𝑀 and a random key 𝑘 ∈ {0, 1}𝑛 , then the probability that Eve guesses 𝑚 after seeing 𝐸𝑘 (𝑚) is at most 1/|𝑀 | + 2−𝑡+1 .

Before proving this theorem note that it gives us a pretty strong guarantee. In the exercises we will strengthen it even further, showing that no matter what prior information Eve had on the message before, she will never get any non-negligible new information on it.2 One way to phrase it is that if the sender used a 256-bit secure encryption to encrypt a message, then your chances of getting to learn any additional information about it before the universe collapses are more or less the same as the chances that a fairy will materialize and whisper it in your ear.

2 The latter property is known as "semantic security"; see also Section 3.2.2 of Katz-Lindell on "semantic security" and Section 2 of Boneh-Shoup on "computational ciphers and semantic security".

Before reading the proof, try to again review the
proof of Theorem 1.8, and see if you can generalize it
yourself to the computational setting.

Proof of Theorem 2.3. The proof is rather similar to the equivalence of guessing one of two messages vs. one of many messages for perfect secrecy (i.e., Theorem 1.8). However, in the computational context we need to be careful in keeping track of Eve's running time. In the proof of Theorem 1.8 we showed that if there exists:

• A subset 𝑀 ⊆ {0, 1}ℓ of messages, and

• An adversary 𝐸𝑣𝑒 ∶ {0, 1}𝑜 → {0, 1}ℓ such that

Pr𝑚←𝑅 𝑀,𝑘←𝑅 {0,1}𝑛 [𝐸𝑣𝑒(𝐸𝑘 (𝑚)) = 𝑚] > 1/|𝑀 |

Then there exist two messages 𝑚0 , 𝑚1 and an adversary 𝐸𝑣𝑒′ ∶ {0, 1}𝑜 → {0, 1}ℓ such that Pr𝑏←𝑅 {0,1},𝑘←𝑅 {0,1}𝑛 [𝐸𝑣𝑒′ (𝐸𝑘 (𝑚𝑏 )) = 𝑚𝑏 ] > 1/2.
To adapt this proof to the computational setting and complete the
proof of the current theorem it suffices to show that:

• If the probability of 𝐸𝑣𝑒 succeeding was 1/|𝑀 | + 𝜖 then the probability of 𝐸𝑣𝑒′ succeeding is at least 1/2 + 𝜖/2.

• If 𝐸𝑣𝑒 can be computed in 𝑇 operations, then 𝐸𝑣𝑒′ can be computed in 𝑇 + 100ℓ + 100 operations.

This will imply that if 𝐸𝑣𝑒 ran in polynomial time and had poly-
nomial advantage over 1/|𝑀 | in guessing a plaintext chosen from 𝑀 ,
then 𝐸𝑣𝑒′ would run in polynomial time and have polynomial advan-
tage over 1/2 in guessing a plaintext chosen from {𝑚0 , 𝑚1 }.

The first item can be shown by simply doing the same proof more carefully, keeping track of how the advantage over 1/|𝑀 | for 𝐸𝑣𝑒 translates into an advantage over 1/2 for 𝐸𝑣𝑒′ . As the world's most annoying

saying goes, doing this is an excellent exercise for the reader.


The second item is obtained by looking at the definition of 𝐸𝑣𝑒′
from that proof. On input 𝑐, 𝐸𝑣𝑒′ computed 𝑚 = 𝐸𝑣𝑒(𝑐) (which
costs 𝑇 operations), checked if 𝑚 = 𝑚0 (which costs, say, at most 5ℓ
operations), and then outputted either 1 or a random bit (which is a
constant, say at most 100 operations).

2.0.1 Proof by reduction


The proof of Theorem 2.3 is a model to how a great many of the re-
sults in this course will look like. Generally we will have many theo-
rems of the form:
“If there is a scheme 𝑆 ′ satisfying security defini-
tion 𝑋 ′ then there is a scheme 𝑆 satisfying security
definition 𝑋”

In the context of Theorem 2.3, 𝑋 ′ was "having 𝑡 bits of security" (in the context of distinguishing between encryptions of two plaintexts)
and 𝑋 was the more general notion of hardness of getting a non-trivial
advantage over guessing for an encryption of a random 𝑚 ∈ 𝑀 .
While in Theorem 2.3 the encryption scheme 𝑆 was the same as 𝑆 ′ ,
this need not always be the case. However, all of the proofs of such
statements will have the same global structure— we will assume
towards a contradiction, that there is an efficient adversary strategy
𝐸𝑣𝑒 demonstrating that the scheme 𝑆 violates the security notion 𝑋,
and build from 𝐸𝑣𝑒 a strategy 𝐸𝑣𝑒′ demonstrating that 𝑆 ′ violates 𝑋 ′ .
This is such an important point that it deserves repeating:
The way you show that if 𝑆 ′ is secure then 𝑆 is secure is
by giving a transformation from an adversary that breaks
𝑆 into an adversary that breaks 𝑆 ′

For computational secrecy, we will always want that 𝐸𝑣𝑒′ will


be efficient if 𝐸𝑣𝑒 is, and that will usually be the case because 𝐸𝑣𝑒′
will simply use 𝐸𝑣𝑒 as a black box, which it will not invoke too many
times, and in addition will use some polynomial time preprocessing
and postprocessing. The more challenging parts of such proofs are
typically:

• Coming up with the strategy 𝐸𝑣𝑒′ .

• Analyzing the probability of success and in particular showing that


if 𝐸𝑣𝑒 had non-negligible advantage then so will 𝐸𝑣𝑒′ .

Note that, just like in the context of NP completeness or uncom-


putability reductions, security reductions work backwards. That is, we
construct the scheme 𝑆 based on the scheme 𝑆 ′ , but then prove that
we can transform an algorithm breaking 𝑆 into an algorithm breaking
𝑆 ′ . Just like in computational complexity, it can sometimes be hard to
keep track of the direction of the reduction. In fact, cryptographic re-
ductions can be even subtler, since they involve an interplay of several
entities (for example, sender, receiver, and adversary) and probabilis-
tic choices (e.g., over the message to be sent and the key).

2.1 THE ASYMPTOTIC APPROACH


For practical security, often every bit of security matters. We want
our keys to be as short as possible and our schemes to be as fast as
possible while satisfying a particular level of security. In practice we
would usually like to ensure that when we use a smallish security
parameter such as 𝑛 in the few hundreds or thousands then:

• The honest parties (the parties running the encryption and decryp-
tion algorithms) are extremely efficient, something like 100-1000
cycles per byte of data processed. In theory terms we would want
them to use 𝑂(𝑛) or at worst 𝑂(𝑛2 ) time algorithms with not-
too-big hidden constants.

• We want to protect against adversaries (the parties trying to break


the encryption) that have much vaster computational capabilities.
A typical modern encryption is built so that using standard key
sizes it can withstand the combined computational powers of all
computers on earth for several decades. In theory terms we would
want the time to break the scheme to be 2^(Ω(𝑛)) (or if not, at least 2^(Ω(√𝑛)) or 2^(Ω(𝑛^(1/3)))) with not too small hidden constants.

For implementing cryptography in practice, the tradeoff between


security and efficiency can be crucial. However, for understanding the
principles behind cryptography, keeping track of concrete security can
be a distraction, and so just like we do in algorithms courses, we will
use asymptotic analysis (also known as big Oh notation) to sweep many
of those details under the carpet.
To a first approximation, there will be only two types of running
times we will encounter in this course:

• Polynomial running time of the form 𝑑 ⋅ 𝑛𝑐 for some constants 𝑑, 𝑐 > 0 (or 𝑝𝑜𝑙𝑦(𝑛) = 𝑛𝑂(1) for short), which we will consider as efficient.

• Exponential running time of the form 2^(𝑑⋅𝑛^𝜖) for some constants 𝑑, 𝜖 > 0 (or 2^(𝑛^Ω(1)) for short), which we will consider as infeasible.3

3 Some texts reserve the term exponential to functions of the form 2^(𝜖𝑛) for some 𝜖 > 0 and call a function such as, say, 2^(√𝑛) subexponential. However, we will generally not make this distinction in this course.

Another way to say it is that in this course, if a scheme has any


security at all, it will have at least 𝑛𝜖 bits of security where 𝑛 is the
length of the key and 𝜖 > 0 is some absolute constant such as 𝜖 = 1/3.
Hence in this course, whenever you hear the term “super polyno-
mial”, you can equate it in your mind with “exponential” and you
won’t be far off the truth.
These are not all the theoretically possible running times. One can
have intermediate functions such as 𝑛log 𝑛 though we will generally
not encounter those. To make things clean (and to correspond to
standard terminology), we will generally associate “efficient computa-
tion” with polynomial time in 𝑛 where 𝑛 is either its input length or the
key size (the key size and input length will always be polynomially
related, and so this choice won’t matter). We want our algorithms (en-
cryption, decryption, etc.) to be computable in polynomial time, but
to require super polynomial time to break.

In cryptography, we care not just about the


Negligible probabilities.
running time of the adversary but also about their probability of suc-
cess (which should be as small as possible). If 𝜇 ∶ ℕ → [0, ∞) is a
function (which we’ll often think of as corresponding to the adver-
sary’s probability of success or advantage over the trivial probability,
as a function of the key size 𝑛) then we say that 𝜇(𝑛) is negligible if it’s
smaller than the inverse of every (positive) polynomial. Our security
definitions will have the following form:
”Scheme 𝑆 is secure if for every polynomial 𝑝(⋅) and 𝑝(𝑛)
time adversary 𝐸𝑣𝑒, there is some negligible function
𝜇 such that the probability that 𝐸𝑣𝑒 succeeds in the
security game for 𝑆 is at most 𝑡𝑟𝑖𝑣𝑖𝑎𝑙 + 𝜇(𝑛)”

We now make these notions more formal.

Definition 2.4 — Negligible function. A function 𝜇 ∶ ℕ → [0, ∞) is negligible if for every polynomial 𝑝 ∶ ℕ → ℕ there exists 𝑁 ∈ ℕ such that 𝜇(𝑛) < 1/𝑝(𝑛) for every 𝑛 > 𝑁 .4

4 Negligible functions are sometimes defined with image equalling [0, 1] as opposed to the set [0, ∞) of non-negative real numbers, since they are typically used to bound probabilities. However, this does not make much difference since if 𝜇 is negligible then for large enough 𝑛, 𝜇(𝑛) will be smaller than one.

The following exercise provides a good way to get some comfort with this definition:

Exercise 2.1 — Negligible functions properties. 1. Let 𝜇 ∶ ℕ → [0, ∞) be a negligible function. Prove that for all polynomials 𝑝, 𝑞 ∶ ℝ → ℝ with non-negative coefficients such that 𝑝(0) = 0, the function 𝜇′ ∶ ℕ → [0, ∞) defined as 𝜇′ (𝑛) = 𝑝(𝜇(𝑞(𝑛))) is negligible.

2. Let 𝜇 ∶ ℕ → [0, ∞). Prove that 𝜇 is negligible if and only if for every
constant 𝑐, lim𝑛→∞ 𝑛𝑐 𝜇(𝑛) = 0.
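To build intuition for the second item, it can help to tabulate 𝑛𝑐 𝜇(𝑛) numerically. Here is a small Python sketch of ours (using 𝜇(𝑛) = 2^(−√𝑛) as the candidate negligible function):

import math

def mu(n):
    return 2 ** (-math.sqrt(n))  # a negligible function

# For any fixed c, n**c * mu(n) can start out huge (e.g. for c = 10)
# but eventually tends to 0 as n grows.
for c in (2, 5, 10):
    print(c, ["%.1e" % (n ** c * mu(n)) for n in (10**3, 10**4, 10**5)])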


Remark 2.5 — Asymptotic analysis. The above definitions could be confusing if you haven't encountered asymptotic analysis before. Reading the beginning of Chapter 3 (pages 43-51) in the KL book, as well as the mathematical background lecture in my intro to TCS notes, can be extremely useful. As a rule of thumb, if every time you see the word "polynomial" you imagine the function 𝑛^10 and every time you see the word "negligible" you imagine the function 2^(−√𝑛), then you will get the right intuition.
What you need to remember is that negligible is much
smaller than any inverse polynomial, while polynomi-
als are closed under multiplication, and so we have the
“equations”

𝑛𝑒𝑔𝑙𝑖𝑔𝑖𝑏𝑙𝑒 × 𝑝𝑜𝑙𝑦𝑛𝑜𝑚𝑖𝑎𝑙 = 𝑛𝑒𝑔𝑙𝑖𝑔𝑖𝑏𝑙𝑒


and

𝑝𝑜𝑙𝑦𝑛𝑜𝑚𝑖𝑎𝑙 × 𝑝𝑜𝑙𝑦𝑛𝑜𝑚𝑖𝑎𝑙 = 𝑝𝑜𝑙𝑦𝑛𝑜𝑚𝑖𝑎𝑙


As mentioned, in practice people really want to get as
close as possible to 𝑛 bits of security with an 𝑛 bit key,
but we would be happy as long as the security grows
with the key, so when we say a scheme is "secure" you can think of it as having √𝑛 bits of security (though any function growing faster than log 𝑛 would be fine as well).

From now on, we will require all of our encryption schemes to be efficient, which means that the encryption and decryption algorithms should run in polynomial time. Security will mean that any efficient adversary can make at most a negligible gain in the probability of guessing the message over its a priori probability.5 We can now formally define computational secrecy in asymptotic terms:

5 Note that there is a subtle issue here with the order of quantifiers. For a scheme to be efficient, the algorithms such as encryption and decryption need to run in some fixed polynomial time such as 𝑛^2 or 𝑛^3 . In contrast we allow the adversary to run in any polynomial time. That is, for every 𝑐, if 𝑛 is large enough, then the scheme should be secure against an adversary that runs in time 𝑛𝑐 . This is in line with the general principle in cryptography that we always allow the adversary potentially much more resources than those used by the honest users. In practical security we often assume that the gap between the honest use and the adversary resources can be exponential. For example, a low power embedded device can encrypt messages that, as far as we know, are undecipherable even by a nation-state using super-computers and massive data centers.

Definition 2.6 — Computational secrecy (asymptotic). An encryption scheme (𝐸, 𝐷) is computationally secret if for every two distinct plaintexts {𝑚0 , 𝑚1 } ⊆ {0, 1}ℓ and every efficient (i.e., polynomial time) strategy of Eve, if we choose at random 𝑏 ∈ {0, 1} and a random key 𝑘 ∈ {0, 1}𝑛 , then the probability that Eve guesses 𝑚𝑏 after seeing 𝐸𝑘 (𝑚𝑏 ) is at most 1/2 + 𝜇(𝑛) for some negligible function 𝜇(⋅).
2.1.1 Counting number of operations.
One more detail that we’ve so far ignored is what does it mean exactly
for a function to be computable using at most 𝑇 operations. Fortu-
nately, when we don’t really care about the difference between 𝑇 and,
say, 𝑇 ^2 , then essentially every reasonable definition gives the same answer.6 Formally, we can use the notions of Turing machines, Boolean circuits, or straightline programs to define complexity. For concreteness, let's define that a function 𝐹 ∶ {0, 1}𝑛 → {0, 1}𝑚 has complexity at most 𝑇 if there is a Boolean circuit that computes 𝐹 using at most 𝑇 Boolean gates (say AND/OR/NOT or NAND; alternatively you can choose your favorite universal gate set). We will often also consider probabilistic functions, in which case we allow the circuit a RAND gate that outputs a single random bit (though this in general does not give extra power). The fact that we only care about asymptotics means you don't really need to think of gates when arguing in cryptography. However, it is comforting to know that this notion has a precise mathematical formulation.

6 With some caveats that need to be added due to quantum computers: we'll get to those later in the course, though they won't change most of our theory. See also this discussion in my intro TCS textbook and this presentation of Aaronson on the "extended Church Turing thesis".

Uniform vs non-uniform models. While many computational texts fo-


cus on models such as Turing machines, in cryptography it is more
convenient to use Boolean circuits which are a non uniform model of
computation in the sense that we allow a different circuit for every
given input length. The reasons are the following:

1. Circuits can express finite computation, while Turing machines only


make sense for computing on arbitrarily large input lengths, and
so we can make sense of notions such as “𝑡 bits of computational
security”.

2. Circuits allow the notion of “hardwiring” whereby if we can com-


pute a certain function 𝐹 ∶ {0, 1}𝑛+𝑠 → {0, 1}𝑚 using a circuit
of 𝑇 gates and have a string 𝑤 ∈ {0, 1}𝑠 then we can compute the
function 𝑥 ↦ 𝐹 (𝑥𝑤) using 𝑇 gates as well. This is useful in many
cryptographic proofs.

One can build the theory of cryptography using Turing machines as


well, but it is more cumbersome.

Remark 2.7 — Computing beyond functions. Later on
in the course, both our cryptographic schemes and
the adversaries will extend beyond simple functions
that map an input to an output, and we will consider
interactive algorithms that exchange messages with one
another. Such an algorithm can be implemented us-
ing circuits or Turing machines that take as input the
prior state and the history of messages up to a certain
point in the interaction, and output the next message
in the interaction. The number of operations used in
such a strategy is the total number of gates used in
computing all the messages.

2.2 OUR FIRST CONJECTURE


We are now ready to make our first conjecture:

The Cipher Conjecture:7 There exists a computationally secret encryption scheme (𝐸, 𝐷) (where 𝐸, 𝐷 are efficient) with length function ℓ(𝑛) = 𝑛 + 1.

7 As will be the case for other conjectures we talk about, the name "The Cipher Conjecture" is not a standard name, but rather one we'll use in this course. In the literature this conjecture is mostly referred to as the conjecture of existence of one way functions, a notion we will learn about later. These two conjectures a priori seem quite different but have been shown to be equivalent.

A conjecture is a well defined mathematical statement which (1) we believe is true but (2) don't know yet how to prove. Proving the cipher conjecture will be a great achievement and would in particular settle the P vs NP question, which is arguably the fundamental question of computer science. That is, the following theorem is known:
Theorem 2.8 — Breaking crypto if P=NP. If 𝑃 = NP then there does not exist a computationally secret encryption with efficient 𝐸 and 𝐷 and where the message is longer than the key.

Proof. We just sketch the proof, as this is not the focus of this course.
If 𝑃 = NP then whenever we have a loop that searches through some
domain to find some string that satisfies a particular property (like the
loop in the Distinguish subroutine above that searches over all keys)
then this loop can be sped up exponentially.

While it is very widely believed that 𝑃 ≠ NP, at the moment we


do not know how to prove this, and so have to settle for accepting the
cipher conjecture as essentially an axiom, though we will see later in
this course that we can show it follows from some seemingly weaker
conjectures.
There are several reasons to believe the cipher conjecture. We now
briefly mention some of them:

• Intuition: If the cipher conjecture is false then it means that for every
possible cipher we can make the exponential time attack described
above become efficient. It seems “too good to be true” in a similar
way that the assumption that P=NP seems too good to be true.

• Concrete candidates: As we will see in the next lecture, there are sev-
eral concrete candidate ciphers using keys shorter than messages
for which despite tons of effort, no one knows how to break them.
Some of them are widely used and hence governments and other
benign or not so benign organizations have every reason to invest
huge resources in trying to break them. Despite that as far as we
know (and we know a little more after Edward Snowden’s reve-
lations) there is no significant break known for the most popular
ciphers. Moreover, there are other ciphers that can be based on
canonical mathematical problems such as factoring large integers


or decoding random linear codes that are immensely interesting in
their own right, independently of their cryptographic applications.

• Minimalism: Clearly if the cipher conjecture is false then we also


don’t have a secure encryption with a message, say, twice as long
as the key. But it turns out the cipher conjecture is in fact necessary
for essentially every cryptographic primitive, including not just
private key and public key encryptions but also digital signatures,
hash functions, pseudorandom generators, and more. That is, if the
cipher conjecture is false then to a large extent cryptography does
not exist, and so we essentially have to assume this conjecture if we
want to do any kind of cryptography.

2.3 WHY CARE ABOUT THE CIPHER CONJECTURE?

“Give me a place to stand, and I shall move the world”


Archimedes, circa 250 BC

Every perfectly secure encryption scheme is clearly also compu-


tationally secret, and so if we required a message of size 𝑛 instead of 𝑛 + 1, then the conjecture would have been trivially satisfied by the
one-time pad. However, having a message longer than the key by just
a single bit does not seem that impressive. Sure, if we used such a
scheme with 128-bit long keys, our communication will be smaller by
a factor of 128/129 (or a saving of about 0.8%) over the one-time pad,
but this doesn’t seem worth the risk of using an unproven conjecture.
However, it turns out that if we assume this rather weak condition,
we can actually get a computationally secret encryption scheme with
a message of size 𝑝(𝑛) for every polynomial 𝑝(⋅). In essence, we can fix
a single 𝑛-bit long key and communicate securely as many bits as we
want!
Moreover, this is just the beginning. There is a huge range of other
useful cryptographic tools that we can obtain from this seemingly
innocent conjecture: (We will see what all these names and some of
these reductions mean later in the course.)
We will soon see the first of the many reductions we’ll learn in this
course. Together this “web of reductions” forms the scientific core of
cryptography, connecting many of the core concepts and enabling us
to construct increasingly sophisticated tools based on relatively simple
“axioms” such as the cipher conjecture.

2.4 PRELUDE: COMPUTATIONAL INDISTINGUISHABILITY


Figure 2.2: Web of reductions between notions equivalent to ciphers with larger-than-key messages.

The task of Eve in breaking an encryption scheme is to distinguish between an encryption of 𝑚0 and an encryption of 𝑚1 . It turns out to be useful to consider this question of when two distributions are computationally indistinguishable more broadly:

Definition 2.9 — Computational Indistinguishability (concrete definition). Let 𝑋 and 𝑌 be two distributions over {0, 1}𝑚 . We say that 𝑋 and 𝑌 are (𝑇 , 𝜖)-computationally indistinguishable, denoted by 𝑋 ≈𝑇 ,𝜖 𝑌 , if for every function 𝐷 ∶ {0, 1}𝑚 → {0, 1} computable with at most 𝑇
operations,

| Pr[𝐷(𝑋) = 1] − Pr[𝐷(𝑌 ) = 1]| ≤ 𝜖 .

Solved Exercise 2.1 — Computational Indistinguishability game. Prove that for every 𝑋, 𝑌 and 𝑇 , 𝜖 as above, 𝑋 ≈𝑇 ,𝜖 𝑌 if and only if for every ≤ 𝑇 -operation computable 𝐸𝑣𝑒, the probability that 𝐸𝑣𝑒 wins the following game is at most 1/2 + 𝜖/2:

1. We pick 𝑏 ←𝑅 {0, 1}.

2. If 𝑏 = 0, we let 𝑤 ←𝑅 𝑋. If 𝑏 = 1, we let 𝑤 ←𝑅 𝑌 .

3. We give 𝐸𝑣𝑒 the input 𝑤, and 𝐸𝑣𝑒 outputs 𝑏′ ∈ {0, 1}.

4. 𝐸𝑣𝑒 wins if 𝑏 = 𝑏′ .

Working out this exercise on your own is a great way
to get comfortable with computational indistinguisha-
bility, which is a fundamental notion.
Solution:
For every function 𝐸𝑣𝑒 ∶ {0, 1}𝑚 → {0, 1}, let 𝑝𝑋 = Pr[𝐸𝑣𝑒(𝑋) =
1] and 𝑝𝑌 = Pr[𝐸𝑣𝑒(𝑌 ) = 1].
Then the probability that 𝐸𝑣𝑒 wins the game is:

Pr[𝑏 = 0](1 − 𝑝𝑋 ) + Pr[𝑏 = 1]𝑝𝑌


and since Pr[𝑏 = 0] = Pr[𝑏 = 1] = 1/2 this is

1/2 − (1/2)𝑝𝑋 + (1/2)𝑝𝑌 = 1/2 + (1/2)(𝑝𝑌 − 𝑝𝑋 )

We see that 𝐸𝑣𝑒 wins the game with success 1/2+𝜖/2 if and only
if
Pr[𝐸𝑣𝑒(𝑌 ) = 1] − Pr[𝐸𝑣𝑒(𝑋) = 1] = 𝜖 .
Since Pr[𝐸𝑣𝑒(𝑌 ) = 1]−Pr[𝐸𝑣𝑒(𝑋) = 1] ≤ |Pr[𝐸𝑣𝑒(𝑋) = 1] − Pr[𝐸𝑣𝑒(𝑌 ) = 1]|,
this already shows that if 𝑋 and 𝑌 are (𝑇 , 𝜖)-indistinguishable then 𝐸𝑣𝑒 will win the game with probability at most 1/2 + 𝜖/2.
For the other direction, assume that 𝑋 and 𝑌 are not compu-
tationally indistinguishable, and let 𝐸𝑣𝑒 be a function computable with at most 𝑇 operations such that

|Pr[𝐸𝑣𝑒(𝑋) = 1] − Pr[𝐸𝑣𝑒(𝑌 ) = 1]| ≥ 𝜖 .

Then by definition of absolute value, there are two options.


Either Pr[𝐸𝑣𝑒(𝑋) = 1] − Pr[𝐸𝑣𝑒(𝑌 ) = 1] ≥ 𝜖 in which case
𝐸𝑣𝑒 wins the game with probability at least 1/2 + 𝜖/2. Otherwise
Pr[𝐸𝑣𝑒(𝑋) = 1] − Pr[𝐸𝑣𝑒(𝑌 ) = 1] ≤ −𝜖, in which case the function
𝐸𝑣𝑒′ (𝑤) = 1 − 𝐸𝑣𝑒(𝑤) (which is just as easy to compute) wins the
game with probability at least 1/2 + 𝜖/2.
Note that above we assume that the class of “functions com-
putable in at most 𝑇 operations” is closed under negation, in the
sense that if 𝐹 is in this class, then 1 − 𝐹 is also. For standard
Boolean circuits, this can be done if we don’t count negation gates
(which can change the total circuit size by at most a factor of two),
or we can allow for 𝐸𝑣𝑒′ to require a constant additional number of
operations, in which case the exercise is still essentially true but is
slightly more cumbersome to state.
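The game of Solved Exercise 2.1 can also be simulated empirically. Here is a minimal Python sketch of ours (with the adversary eve and the two samplers passed in as black-box functions; none of these names come from any library):

import random

def empirical_win_probability(eve, sample_X, sample_Y, trials=100_000):
    wins = 0
    for _ in range(trials):
        b = random.randrange(2)                    # step 1: pick b
        w = sample_X() if b == 0 else sample_Y()   # step 2: sample w
        if eve(w) == b:                            # steps 3-4: Eve guesses b
            wins += 1
    return wins / trials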

As we did with computational secrecy, we can also define an


asymptotic definition of computational indistinguishability.

Definition 2.10 — Computational indistinguishability (asymptotic). Let 𝑚 ∶ ℕ → ℕ be some function and let {𝑋𝑛 }𝑛∈ℕ and {𝑌𝑛 }𝑛∈ℕ be two sequences of distributions such that 𝑋𝑛 and 𝑌𝑛 are distributions over {0, 1}𝑚(𝑛) .

We say that {𝑋𝑛 }𝑛∈ℕ and {𝑌𝑛 }𝑛∈ℕ are computationally indistin-
guishable, denoted by {𝑋𝑛 }𝑛∈ℕ ≈ {𝑌𝑛 }𝑛∈ℕ , if for every polynomial
𝑝 ∶ ℕ → ℕ and sufficiently large 𝑛, 𝑋𝑛 ≈𝑝(𝑛),1/𝑝(𝑛) 𝑌𝑛 .

Solving the following asymptotic analog of Solved Exercise 2.1


is a good way to get comfortable with the asymptotic definition of
computational indistinguishability:
Exercise 2.2 — Computational Indistinguishability game (asymptotic). Let {𝑋𝑛 }𝑛∈ℕ , {𝑌𝑛 }𝑛∈ℕ and 𝑚 ∶ ℕ → ℕ be as above. Then {𝑋𝑛 }𝑛∈ℕ ≈
{𝑌𝑛 }𝑛∈ℕ if and only if for every polynomial-time 𝐸𝑣𝑒, there is some
negligible function 𝜇 such that 𝐸𝑣𝑒 wins the following game with
probability at most 1/2 + 𝜇(𝑛):

1. We pick 𝑏 ←𝑅 {0, 1}.

2. If 𝑏 = 0, we let 𝑤 ←𝑅 𝑋𝑛 . If 𝑏 = 1, we let 𝑤 ←𝑅 𝑌𝑛 .

3. We give 𝐸𝑣𝑒 the input 𝑤, and 𝐸𝑣𝑒 outputs 𝑏′ ∈ {0, 1}.

4. 𝐸𝑣𝑒 wins if 𝑏 = 𝑏′ .

Dropping the index n. Since the index 𝑛 of our distributions would


often be clear from context (indeed in most cases it will be the length
of the key), we will sometimes drop it from our notation. So if 𝑋 and
𝑌 are two random variables that depend on some index 𝑛, we will say
that 𝑋 is computationally indistinguishable from 𝑌 (denoted as 𝑋 ≈
𝑌 ) when the sequences {𝑋𝑛 }𝑛∈ℕ and {𝑌𝑛 }𝑛∈ℕ are computationally
indistinguishable.
We can use computational indistinguishability to phrase the defini-
tion of computational secrecy more succinctly:

Theorem 2.11 — Computational Indistinguishability phrasing of security. Let (𝐸, 𝐷) be a valid encryption scheme. Then (𝐸, 𝐷) is computation-
ally secret if and only if for every two messages 𝑚0 , 𝑚1 ∈ {0, 1}ℓ ,

{𝐸𝑘 (𝑚0 )}𝑛∈ℕ ≈ {𝐸𝑘 (𝑚1 )}𝑛∈ℕ

where each of these two distributions is obtained by sampling a random 𝑘 ←𝑅 {0, 1}𝑛 .

Working out the proof is an excellent way to make sure you under-
stand both the definition of computational secrecy and computational
indistinguishability, and hence we leave it as an exercise.
One intuition for computational indistinguishability is that it is
related to some notion of distance. If two distributions are computa-
tionally indistinguishable, then we can think of them as “very close”
to one another, at least as far as efficient observers are concerned. In-


tuitively, if 𝑋 is close to 𝑌 and 𝑌 is close to 𝑍 then 𝑋 should be close to 𝑍.⁸ Similarly if four distributions 𝑋, 𝑋′, 𝑌, 𝑌′ satisfy that 𝑋 is close to 𝑌 and 𝑋′ is close to 𝑌′, then you might expect that the distribution (𝑋, 𝑋′) where we take two independent samples from 𝑋 and 𝑋′ respectively, is close to the distribution (𝑌, 𝑌′) where we take two independent samples from 𝑌 and 𝑌′ respectively. We will now verify that these intuitions are in fact correct:

⁸ Results of this form are known as "triangle inequalities" since they can be viewed as generalizations of the statement that for every three points on the plane 𝑥, 𝑦, 𝑧, the distance from 𝑥 to 𝑧 is not larger than the distance from 𝑥 to 𝑦 plus the distance from 𝑦 to 𝑧. In other words, the edge 𝑥, 𝑧 of the triangle (𝑥, 𝑦, 𝑧) is not longer than the sum of the lengths of the other two edges 𝑥, 𝑦 and 𝑦, 𝑧.

Theorem 2.12 — Triangle Inequality for Computational Indistinguishability.


Suppose 𝑋1 ≈𝑇 ,𝜖 𝑋2 ≈𝑇 ,𝜖 ⋯ ≈𝑇 ,𝜖 𝑋𝑚 . Then 𝑋1 ≈𝑇 ,(𝑚−1)𝜖 𝑋𝑚 .

Proof. Suppose that there exists a 𝑇 time 𝐸𝑣𝑒 such that

| Pr[𝐸𝑣𝑒(𝑋1 ) = 1] − Pr[𝐸𝑣𝑒(𝑋𝑚 ) = 1]| > (𝑚 − 1)𝜖 .

Write the left-hand side as a telescoping sum:

Pr[𝐸𝑣𝑒(𝑋1) = 1] − Pr[𝐸𝑣𝑒(𝑋𝑚) = 1] = ∑_{𝑖=1}^{𝑚−1} (Pr[𝐸𝑣𝑒(𝑋𝑖) = 1] − Pr[𝐸𝑣𝑒(𝑋𝑖+1) = 1]) .

Thus, since the absolute value of a sum is at most the sum of the absolute values,

∑_{𝑖=1}^{𝑚−1} |Pr[𝐸𝑣𝑒(𝑋𝑖) = 1] − Pr[𝐸𝑣𝑒(𝑋𝑖+1) = 1]| > (𝑚 − 1)𝜖

and hence in particular there must exist some 𝑖 ∈ {1, … , 𝑚 − 1} such


that
|Pr[𝐸𝑣𝑒(𝑋𝑖 ) = 1] − Pr[𝐸𝑣𝑒(𝑋𝑖+1 ) = 1]| > 𝜖

contradicting the assumption that {𝑋𝑖 } ≈𝑇 ,𝜖 {𝑋𝑖+1 } for all 𝑖 ∈


{1, … , 𝑚 − 1}.

Theorem 2.13 — Computational Indistinguishability is preserved under repetition. Suppose that 𝑋1, … , 𝑋ℓ, 𝑌1, … , 𝑌ℓ are distributions over {0, 1}^𝑛 such that 𝑋𝑖 ≈𝑇,𝜖 𝑌𝑖. Then (𝑋1, … , 𝑋ℓ) ≈_{𝑇−10ℓ𝑛, ℓ𝜖} (𝑌1, … , 𝑌ℓ).

Proof. For every 𝑖 ∈ {0, … , ℓ} we define 𝐻𝑖 to be the distribu-


tion (𝑋1 , … , 𝑋𝑖 , 𝑌𝑖+1 , … , 𝑌ℓ ). Clearly 𝐻ℓ = (𝑋1 , … , 𝑋ℓ ) and
𝐻0 = (𝑌1 , … , 𝑌ℓ ). We will prove that for every 𝑖, 𝐻𝑖−1 ≈𝑇 −10ℓ𝑛,𝜖 𝐻𝑖 ,
and the proof will then follow from the triangle inequality (can
you see why?). Indeed, suppose towards the sake of contradic-
tion that there was some 𝑖 ∈ {1, … , ℓ} and some (𝑇 − 10ℓ𝑛)-time 𝐸𝑣𝑒′ ∶ {0, 1}^{𝑛ℓ} → {0, 1} such that

|𝔼[𝐸𝑣𝑒′ (𝐻𝑖−1 )] − 𝔼[𝐸𝑣𝑒′ (𝐻𝑖 )]| > 𝜖 .


84 a n i n te n si ve i n trod u c ti on to c ry p tog ra p hy

In other words

∣𝔼𝑋1 ,…,𝑋𝑖−1 ,𝑌𝑖 ,…,𝑌ℓ [𝐸𝑣𝑒′ (𝑋1 , … , 𝑋𝑖−1 , 𝑌𝑖 , … , 𝑌ℓ )] − 𝔼𝑋1 ,…,𝑋𝑖 ,𝑌𝑖+1 ,…,𝑌ℓ [𝐸𝑣𝑒′ (𝑋1 , … , 𝑋𝑖 , 𝑌𝑖+1 , … , 𝑌ℓ )]∣ > 𝜖 .

By linearity of expectation we can write the difference of these two


expectations as

𝔼𝑋1 ,…,𝑋𝑖−1 ,𝑋𝑖 ,𝑌𝑖 ,𝑌𝑖+1 ,…,𝑌ℓ [𝐸𝑣𝑒′ (𝑋1 , … , 𝑋𝑖−1 , 𝑌𝑖 , 𝑌𝑖+1 , … , 𝑌ℓ ) − 𝐸𝑣𝑒′ (𝑋1 , … , 𝑋𝑖−1 , 𝑋𝑖 , 𝑌𝑖+1 , … , 𝑌ℓ )]

By the averaging principle⁹ this means that there exist some values 𝑥1, … , 𝑥𝑖−1, 𝑦𝑖+1, … , 𝑦ℓ such that

∣𝔼_{𝑋𝑖,𝑌𝑖}[𝐸𝑣𝑒′(𝑥1, … , 𝑥𝑖−1, 𝑌𝑖, 𝑦𝑖+1, … , 𝑦ℓ) − 𝐸𝑣𝑒′(𝑥1, … , 𝑥𝑖−1, 𝑋𝑖, 𝑦𝑖+1, … , 𝑦ℓ)]∣ > 𝜖

⁹ This is the principle that if the average grade in an exam was at least 𝛼 then someone must have gotten at least 𝛼, or in other words that if a real-valued random variable 𝑍 satisfies 𝔼[𝑍] ≥ 𝛼 then Pr[𝑍 ≥ 𝛼] > 0.

Now 𝑋𝑖 and 𝑌𝑖 are simply independent draws from the distributions 𝑋 and 𝑌 respectively, and so if we define 𝐸𝑣𝑒(𝑧) = 𝐸𝑣𝑒′(𝑥1, … , 𝑥𝑖−1, 𝑧, 𝑦𝑖+1, … , 𝑦ℓ) then 𝐸𝑣𝑒 runs in time at most the running time of 𝐸𝑣𝑒′ plus 10ℓ𝑛¹⁰ and it satisfies

∣𝔼_{𝑋𝑖}[𝐸𝑣𝑒(𝑋𝑖)] − 𝔼_{𝑌𝑖}[𝐸𝑣𝑒(𝑌𝑖)]∣ > 𝜖

contradicting the assumption that 𝑋𝑖 ≈𝑇,𝜖 𝑌𝑖.

■

¹⁰ The cost 10ℓ𝑛 is for the operations of feeding the "hardwired" strings 𝑥1, … , 𝑥𝑖−1, 𝑦𝑖+1, … , 𝑦ℓ into 𝐸𝑣𝑒′. These take up at most ℓ𝑛 bits, and depending on the computational model, storing and feeding them into 𝐸𝑣𝑒′ may take 𝑐ℓ𝑛 steps for some small constant 𝑐 < 10. In the future, we will usually ignore such minor details and simply say that if 𝐸𝑣𝑒′ runs in polynomial time then so will 𝐸𝑣𝑒.
R
Remark 2.14 — The hybrid argument. The above proof
illustrates a powerful technique known as the hybrid
argument whereby we show that two distributions
𝐶⁰ and 𝐶¹ are close to each other by coming up
with a sequence of distributions 𝐻0, … , 𝐻𝑡 such that
𝐻𝑡 = 𝐶¹, 𝐻0 = 𝐶⁰, and we can argue that 𝐻𝑖 is close to
𝐻𝑖+1 for all 𝑖. This type of argument repeats itself time
and again in cryptography, and so it is important to
get comfortable with it.

2.5 THE LENGTH EXTENSION THEOREM OR STREAM CIPHERS


We now turn to show the length extension theorem, stating that if we
have an encryption for 𝑛 + 1-length messages with 𝑛-length keys,
then we can obtain an encryption with 𝑝(𝑛)-length messages for every
polynomial 𝑝(𝑛). For a warm-up, let’s show the easier fact that we
can transform an encryption such as above, into one that has keys of
length 𝑡𝑛 and messages of length 𝑡(𝑛 + 1) for every integer 𝑡:

Theorem 2.15 — Security of repetition. Suppose that (𝐸 ′ , 𝐷′ ) is a com-


putationally secret encryption scheme with 𝑛 bit keys and 𝑛 + 1 bit
messages. Then the scheme (𝐸, 𝐷) where 𝐸𝑘1 ,…,𝑘𝑡 (𝑚1 , … , 𝑚𝑡 ) =
(𝐸𝑘′ 1 (𝑚1 ), … , 𝐸𝑘′ 𝑡 (𝑚𝑡 )) and 𝐷𝑘1 ,…,𝑘𝑡 (𝑐1 , … , 𝑐𝑡 ) = (𝐷𝑘′ 1 (𝑐1 ), … , 𝐷𝑘′ 𝑡 (𝑐𝑡 ))

is a computationally secret scheme with 𝑡𝑛 bit keys and 𝑡(𝑛 + 1) bit


messages.

Proof. This might seem “obvious” but in cryptography, even obvious


facts are sometimes wrong, so it’s important to prove this formally.
Luckily, this is a fairly straightforward implication of the fact that
computational indistinguishability is preserved under many samples.
That is, by the security of (𝐸 ′ , 𝐷′ ) we know that for every two mes-
𝑛+1
sages 𝑚, 𝑚′ ∈ {0, 1} , 𝐸𝑘′ (𝑚) ≈ 𝐸𝑘′ (𝑚′ ) where 𝑘 is chosen from
the distribution 𝑈𝑛 . Therefore by the indistinguishability of many
𝑛+1
samples lemma, for every two tuples 𝑚1 , … , 𝑚𝑡 ∈ {0, 1} and
𝑛+1
𝑚′1 , … , 𝑚′𝑡 ∈ {0, 1} ,

(𝐸𝑘′ 1 (𝑚1 ), … , 𝐸𝑘′ 𝑡 (𝑚𝑡 )) ≈ (𝐸𝑘′ 1 (𝑚′1 ), … , 𝐸𝑘′ 𝑡 (𝑚′𝑡 ))


for random 𝑘1 , … , 𝑘𝑡 chosen independently from 𝑈𝑛 which is ex-
actly the condition that (𝐸, 𝐷) is computationally secret.

Randomized encryption scheme. We can now prove the full length exten-
sion theorem. Before doing so, we will need to generalize the notion
of an encryption scheme to allow a randomized encryption scheme. That
is, we will consider encryption schemes where the encryption algo-
rithm can “toss coins” in its computation. There is a crucial difference
between key material and such "ad hoc" (sometimes also known as
“ephemeral”) randomness. Keys need to be not only chosen at ran-
dom, but also shared in advance between the sender and receiver, and
stored securely throughout their lifetime. The “coin tosses” used by
a randomized encryption scheme are generated “on the fly” and are
not known to the receiver, nor do they need to be stored long term by
the sender. So, allowing such randomized encryption does not make
a difference for most applications of encryption schemes. In fact, as
we will see later in this course, randomized encryption is necessary for
security against more sophisticated attacks such as chosen plaintext
and chosen ciphertext attacks, as well as for obtaining secure public key
encryptions. We will use the notation 𝐸𝑘 (𝑚; 𝑟) to denote the output of
the encryption algorithm on key 𝑘, message 𝑚 and using internal ran-
domness 𝑟. We often suppress the notation for the randomness, and
hence use 𝐸𝑘 (𝑚) to denote the random variable obtained by sampling
a random 𝑟 and outputting 𝐸𝑘 (𝑚; 𝑟).
We can now show that given an encryption scheme with messages
one bit longer than the key, we can obtain a (randomized) encryption
scheme with arbitrarily long messages:

Theorem 2.16 — Length extension of ciphers. Suppose that there exists a computationally secret encryption scheme (𝐸′, 𝐷′) with key length
𝑛 and message length 𝑛 + 1. Then for every polynomial 𝑡(𝑛) there
exists a (randomized) computationally secret encryption scheme
(𝐸, 𝐷) with key length 𝑛 and message length 𝑡(𝑛).

Figure 2.3: Constructing a cipher with 𝑡 bit long messages from one with 𝑛 + 1 long messages

P
This is perhaps our first example of a non trivial cryp-
tographic theorem, and the blueprint for this proof
will be one that we will follow time and again during
this course. Please make sure you read this proof
carefully and follow the argument.

Proof of Theorem 2.16. The construction, depicted in Fig. 2.3, is actually


quite natural and variants of it are used in practice for stream ciphers,
which are ways to encrypt arbitrarily long messages using a fixed size
key. The idea is that we use a key 𝑘0 of size 𝑛 to encrypt (1) a fresh
key 𝑘1 of size 𝑛 and (2) one bit of the message. Now we can encrypt
𝑘2 using 𝑘1 and so on and so forth. We now describe the construction
and analysis in detail.
Let 𝑡 = 𝑡(𝑛). We are given a cipher 𝐸 ′ which can encrypt 𝑛 + 1-
bit long messages with an 𝑛-bit long key and we need to encrypt a
𝑡-bit long message 𝑚 = (𝑚1, … , 𝑚𝑡) ∈ {0, 1}^𝑡. Our idea is simple (at least in hindsight). Let 𝑘0 ←𝑅 {0, 1}^𝑛 be our key (which is chosen at random). To encrypt 𝑚 using 𝑘0, the encryption function will choose 𝑡 random strings 𝑘1, … , 𝑘𝑡 ←𝑅 {0, 1}^𝑛. We will then encrypt the 𝑛 + 1-bit long message (𝑘1, 𝑚1) with the key 𝑘0 to obtain the ciphertext 𝑐1, then encrypt the 𝑛 + 1-bit long message (𝑘2, 𝑚2) with the key 𝑘1 to obtain the ciphertext 𝑐2, and so on and so forth until we encrypt the message (𝑘𝑡, 𝑚𝑡) with the key 𝑘𝑡−1.¹¹ We output (𝑐1, … , 𝑐𝑡) as the final ciphertext.¹²

¹¹ The keys 𝑘1, … , 𝑘𝑡 are sometimes known as ephemeral keys in the crypto literature, since they are created only for the purposes of this particular interaction.

¹² The astute reader might note that the key 𝑘𝑡 is actually not used anywhere in the encryption nor decryption and hence we could encrypt 𝑛 more bits of the message instead in this final round. We used the current description for the sake of symmetry and simplicity of exposition.

To decrypt (𝑐1 , … , 𝑐𝑡 ) using the key 𝑘0 , first decrypt 𝑐1 to learn


(𝑘1 , 𝑚1 ), then use 𝑘1 to decrypt 𝑐2 to learn (𝑘2 , 𝑚2 ), and so on until we
use 𝑘𝑡−1 to decrypt 𝑐𝑡 and learn (𝑘𝑡 , 𝑚𝑡 ). Finally we can simply output
(𝑚1 , … , 𝑚𝑡 ).
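To make this concrete, here is a minimal Python sketch of the construction above (not the book's code: Eprime and Dprime are hypothetical implementations of (𝐸′, 𝐷′) handling (𝑛 + 1)-bit blocks with 𝑛-bit keys, all bit strings are Python strings of '0'/'1' characters, and random.choice is only a stand-in for proper cryptographic sampling of the ephemeral keys):

import random

def length_extend_encrypt(Eprime, k0, m, n):
    # Choose ephemeral keys k_1, ..., k_t; the i-th ciphertext block
    # encrypts the (n+1)-bit string (k_i, m_i) under the key k_{i-1}.
    ks = [k0] + [''.join(random.choice('01') for _ in range(n))
                 for _ in range(len(m))]
    return [Eprime(ks[i], ks[i + 1] + m[i]) for i in range(len(m))]

def length_extend_decrypt(Dprime, k0, c):
    # Chain the decryptions: each block reveals the next key and one bit.
    k, out = k0, []
    for ci in c:
        block = Dprime(k, ci)
        k, out = block[:-1], out + [block[-1]]
    return ''.join(out)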
The above are clearly valid encryption and decryption algorithms,
and hence the real question becomes: is it secure? The intuition is that
𝑐1 hides all information about (𝑘1 , 𝑚1 ) and so in particular the first bit
of the message is encrypted securely, and 𝑘1 still can be treated as an
unknown random string even to an adversary that saw 𝑐1 . Thus, we
can think of 𝑘1 as a random secret key for the encryption 𝑐2 , and hence
the second bit of the message is encrypted securely, and so on and so
forth.
Our discussion above looks like a reasonable intuitive argument,
but to make sure it’s true we need to give an actual proof. Let 𝑚, 𝑚′ ∈
𝑡
{0, 1} be two messages. We need to show that 𝐸𝑈𝑛 (𝑚) ≈ 𝐸𝑈𝑛 (𝑚′ ).
The heart of the proof will be the following claim:
Claim: Let 𝐸̂ be the algorithm that on input a message 𝑚 and key
𝑘0 works like 𝐸 except that its 𝑖-th block contains 𝐸′_{𝑘_{𝑖−1}}(𝑘′_𝑖, 𝑚_𝑖) where 𝑘′_𝑖 is a random string in {0, 1}^𝑛, that is chosen independently of everything else including the key 𝑘𝑖. Then, for every message 𝑚 ∈ {0, 1}^𝑡

𝐸_{𝑈𝑛}(𝑚) ≈ 𝐸̂_{𝑈𝑛}(𝑚) .   (2.1)

Note that 𝐸̂ is not a valid encryption scheme since it’s not at all
clear there is a decryption algorithm for it. It is just a hypothetical
tool we use for the proof. Since both 𝐸 and 𝐸̂ are randomized en-
cryption schemes (with 𝐸 using (𝑡 − 1)𝑛 bits of randomness for the
ephemeral keys 𝑘1 , … , 𝑘𝑡−1 and 𝐸̂ using (2𝑡 − 1)𝑛 bits of randomness
for the ephemeral keys 𝑘1 , … , 𝑘𝑡 , 𝑘2′ , … , 𝑘𝑡′ ), we can also write (2.1) as


𝐸_{𝑈𝑛}(𝑚; 𝑈_{𝑡𝑛}) ≈ 𝐸̂_{𝑈𝑛}(𝑚; 𝑈′_{(2𝑡−1)𝑛})

where we use 𝑈ℓ′ to denote a random variable that is chosen uni-


formly at random from {0, 1}ℓ and independently from the choice
of 𝑈𝑛 (which is chosen uniformly at random from {0, 1}𝑛 ).
Once we prove the claim then we are done since we know that for
every pair of messages 𝑚, 𝑚′, 𝐸_{𝑈𝑛}(𝑚) ≈ 𝐸̂_{𝑈𝑛}(𝑚) and 𝐸_{𝑈𝑛}(𝑚′) ≈ 𝐸̂_{𝑈𝑛}(𝑚′), but 𝐸̂_{𝑈𝑛}(𝑚) ≈ 𝐸̂_{𝑈𝑛}(𝑚′) since 𝐸̂ is essentially the same as
the 𝑡-times repetition scheme we analyzed above. Thus by the triangle
inequality we can conclude that 𝐸𝑈𝑛 (𝑚) ≈ 𝐸𝑈𝑛 (𝑚′ ) as we desired.
Proof of claim: We prove the claim by the hybrid method. For
𝑗 ∈ {0, … , 𝑡}, let 𝐻𝑗 be the distribution of ciphertexts where in the first
𝑗 blocks we act like 𝐸̂ and in the last 𝑡 − 𝑗 blocks we act like 𝐸. That
is, we choose 𝑘0 , … , 𝑘𝑡 , 𝑘1′ , … , 𝑘𝑡′ independently at random from 𝑈𝑛
and the 𝑖𝑡ℎ block of 𝐻𝑗 is equal to 𝐸𝑘′ 𝑖−1 (𝑘𝑖 , 𝑚𝑖 ) if 𝑖 > 𝑗 and is equal to

𝐸′_{𝑘_{𝑖−1}}(𝑘′_𝑖, 𝑚_𝑖) if 𝑖 ≤ 𝑗. Clearly, 𝐻𝑡 = 𝐸̂_{𝑈𝑛}(𝑚) and 𝐻0 = 𝐸_{𝑈𝑛}(𝑚) and so
it suffices to prove that for every 𝑗, 𝐻𝑗−1 ≈ 𝐻𝑗 . Indeed, let 𝑗 ∈ {1, … , 𝑡}
and suppose towards the sake of contradiction that there exists an
efficient 𝐸𝑣𝑒′ such that

∣𝔼[𝐸𝑣𝑒′ (𝐻𝑗−1 )] − 𝔼[𝐸𝑣𝑒′ (𝐻𝑗 )]∣ ≥ 𝜖 (∗)

where 𝜖 = 𝜖(𝑛) is noticeable. By the averaging principle, there


exists some fixed choice for 𝑘1′ , … , 𝑘𝑡′ , 𝑘0 , … , 𝑘𝑗−2 , 𝑘𝑗 , … , 𝑘𝑡 such that (∗)
still holds. Note that in this case the only randomness is the choice of
𝑘𝑗−1 ←𝑅 𝑈𝑛 and moreover the first 𝑗 − 1 blocks and the last 𝑡 − 𝑗 blocks
of 𝐻𝑗−1 and 𝐻𝑗 would be identical and we can denote them by 𝛼 and 𝛽
respectively and hence write (∗) as

∣𝔼𝑘𝑗−1 [𝐸𝑣𝑒′ (𝛼, 𝐸𝑘′ 𝑗−1 (𝑘𝑗 , 𝑚𝑗 ), 𝛽) − 𝐸𝑣𝑒′ (𝛼, 𝐸𝑘′ 𝑗−1 (𝑘𝑗′ , 𝑚𝑗 ), 𝛽)]∣ ≥ 𝜖 (∗∗)

But now consider the adversary 𝐸𝑣𝑒 that is defined as 𝐸𝑣𝑒(𝑐) =


𝐸𝑣𝑒′ (𝛼, 𝑐, 𝛽). Then 𝐸𝑣𝑒 is also efficient and by (∗∗) it can distinguish
between 𝐸𝑈′ 𝑛 (𝑘𝑗 , 𝑚𝑗 ) and 𝐸𝑈′ 𝑛 (𝑘𝑗′ , 𝑚𝑗 ) thus contradicting the secu-
rity of (𝐸 ′ , 𝐷′ ). This concludes the proof of the claim and hence the
theorem.

2.5.1 Appendix: The computational model


For concreteness sake let us give a precise definition of what it means
for a function or probabilistic process 𝑓 mapping {0, 1}𝑛 to {0, 1}𝑚 to
be computable using 𝑇 operations.

• If you have taken any course on computational complexity (such as


Harvard CS 121), then this is the model of Boolean circuits, except
that we also allow randomization.

• If you have not taken such a course, you might simply take it on
faith that it is possible to model what it means for an algorithm to
be able to map an input 𝑥 into an output 𝑓(𝑥) using 𝑇 “elementary
operations”.

In both cases you might want to skip this appendix and only return
to it if you find something confusing.
The model we use is a Boolean circuit that also has a RAND gate
that outputs a random bit. We could use as the basic set of gates
the standard AND, OR and NOT but for simplicity we use the one-
element set NAND. We represent the circuit as a straightline program,
but this is of course just a matter of convenience. As shown (for exam-
ple) in the CS 121 textbook, these two representations are identical.

Definition 2.17 — Probabilistic straightline program. A probabilistic straightline program consists of a sequence of lines, each one of them one of
the following forms:

• foo = NAND(bar, baz) where foo,bar,baz are variable identi-


fiers.

• foo = RAND() where foo is a variable identifier.

Given a program 𝜋, we say that its size is the number of lines it


contains. Variables of the form X[𝑖] or Y[𝑗] are considered input and
output variables respectively. If the input variables range from 0 to
𝑛 − 1 and the output variables range from 0 to 𝑚 − 1 then the program
computes the probabilistic process that maps {0, 1}𝑛 to {0, 1}𝑚 in the
natural way. If 𝐹 is a (probabilistic or deterministic) map of {0, 1}𝑛 to
{0, 1}𝑚 , the complexity of 𝐹 is the size of the smallest program 𝑃 that
computes it.
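For example, here is a small (hypothetical) five-line probabilistic straightline program that encrypts its single input bit with a fresh random bit - a one-bit one-time pad - using the standard construction of XOR out of four NAND gates:

key  = RAND()
u    = NAND(X[0], key)
v    = NAND(X[0], u)
w    = NAND(key, u)
Y[0] = NAND(v, w)

Its size is 5, and it computes the probabilistic process that on input 𝑥 ∈ {0, 1} outputs 𝑥 ⊕ 𝑏 for a uniformly random bit 𝑏.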
If you haven’t taken a class such as CS121 before, you might won-
der how such a simple model captures complicated programs that use
loops, conditionals, and more complex data types than simply a bit in
{0, 1}, not to mention some special purpose crypto-breaking devices
that might involve tailor-made hardware. It turns out that it does (for
the same reason we can compile complicated programming languages
to run on silicon chips with a very limited instruction set). In fact, as
far as we know, this model can capture even computations that hap-
pen in nature, whether it's in a bee colony or the human brain (which contains about 10^10 neurons, so should in principle be simulatable by a program that has up to a few orders of magnitude of the same number of lines). Crucially, for cryptography, we care about such programs not because we want to actually run them, but because we want to argue about their non-existence. If we have a process that cannot be computed by a straightline program of length shorter than 2^128 > 10^38
then it seems safe to say that a computer the size of the human brain
(or even all the human and nonhuman brains on this planet) will not
be able to perform it either.

Advanced note: non uniformity. The computational model we use in this


class is non uniform (corresponding to Boolean circuits) as opposed
to uniform (corresponding to Turing machines). If this distinction
doesn’t mean anything to you, you can ignore it as it won’t play a sig-
nificant role in what we do next. It basically means that we do allow
our programs to have hardwired constants of 𝑝𝑜𝑙𝑦(𝑛) bits where 𝑛 is
the input/key length. In fact, to be precise, we will hold ourselves to a
higher standard than our adversary, in the sense that we require our
90 a n i n te n si ve i n trod u c ti on to c ry p tog ra p hy

algorithms to be efficient in the stronger sense of being computable


in uniform probabilistic polynomial time (for some fixed polynomial,
often 𝑂(𝑛) or 𝑂(𝑛2 )), while the adversary is allowed to use non uni-
formity.

Quantum computing. An interesting potential exception to this princi-


ple that every natural process should be simulatable by a straightline
program of comparable complexity are processes where the quantum
mechanical notions of interference and entanglement play a significant
role. We will talk about this notion of quantum computing towards the
end of the course, though note that much of what we say does not
really change when we add quantum into the picture. As discussed
in the CS 121 text, we can still capture these processes by straightline
programs (that now have somewhat more complex form), and so
most of what we’ll do just carries over in the same way to the quantum
realm as long as we are fine with conjecturing the strong form of the
cipher conjecture, namely that the cipher is infeasible to break even for
quantum computers. All current evidence points toward this strong
form being true as well. The field of constructing encryption schemes
that are potentially secure against quantum computers is known as
post quantum cryptography and we will return to this later in the
course.
3 Pseudorandomness

Reading: Katz-Lindell Section 3.3, Boneh-Shoup Chapter 3¹

¹ Edited and expanded by Richard Xu in Spring 2020.
The nature of randomness has troubled philosophers, scientists, statisticians and laypeople for many years.² Over the years people have given different answers to the question of what does it mean for data to be random, and what is the nature of probability. The movements of the planets initially looked random and arbitrary, but then early astronomers managed to find order and make some predictions on them. Similarly, we have made great advances in predicting the weather and will probably continue to do so.

² Even lawyers grapple with this question, with a recent example being the debate of whether fantasy football is a game of chance or of skill.
So, while these days it seems as if the event of whether or not it will
rain a week from today is random, we could imagine that in the future
we will be able to predict the weather accurately. Even the canonical
notion of a random experiment - tossing a coin - might not be as ran-
dom as you’d think: the second toss will have the same result as the
first one with about a 51% chance. (Though see also this experiment.)
It is conceivable that at some point someone would discover some function 𝐹 that, given the first 100 coin tosses by any given person, can predict the value of the 101st.³

³ In fact such a function must exist in some sense since in the entire history of the world, presumably no sequence of 100 fair coin tosses has ever repeated.

In all these examples, the physics underlying the event, whether it's
the planets’ movement, the weather, or coin tosses, did not change but
only our powers to predict them. So to a large extent, randomness is a
function of the observer, or in other words
If a quantity is hard to compute, it might as well be
random.

Much of cryptography is about trying to make this intuition more


formal, and harnessing it to build secure systems. The basic object we
want is the following:




Definition 3.1 — Pseudorandom generator (concrete). A function 𝐺 ∶ {0, 1}^𝑛 → {0, 1}^ℓ is a (𝑇, 𝜖) pseudorandom generator if 𝐺(𝑈𝑛) ≈𝑇,𝜖 𝑈ℓ where 𝑈𝑡 denotes the uniform distribution on {0, 1}^𝑡.

That is, 𝐺 is a (𝑇 , 𝜖) pseudorandom generator if no circuit of at most


𝑇 gates can distinguish with bias better than 𝜖 between the output of
𝐺 (on a random input) and a uniformly random string of the same
length. Spelling this out fully, this means that for every function 𝐷 ∶
{0, 1}ℓ → {0, 1} computable using at most 𝑇 operations,

∣ Pr_{𝑥←𝑅{0,1}^𝑛}[𝐷(𝐺(𝑥)) = 1] − Pr_{𝑦←𝑅{0,1}^ℓ}[𝐷(𝑦) = 1] ∣ < 𝜖 .

As we did for the case of encryption, we will typically use asymp-


totic terms to describe cryptographic pseudorandom generator. We
say that 𝐺 is simply a pseudorandom generator if it is efficiently com-
putable and it is (𝑝(𝑛), 1/𝑝(𝑛))-pseudorandom generator for every
polynomial 𝑝(⋅). In other words, we define pseudorandom generators
as follows:

Definition 3.2 — Pseudorandom generator. Let 𝐺 ∶ {0, 1}∗ → {0, 1}∗ be some function computable in polynomial time. We say that 𝐺 is
a pseudorandom generator with length function ℓ ∶ ℕ → ℕ (where
ℓ(𝑛) > 𝑛) if

• For every 𝑥 ∈ {0, 1}∗ , |𝐺(𝑥)| = ℓ(|𝑥|).

• For every polynomial 𝑝(⋅) and sufficiently large 𝑛, if 𝐷 ∶


{0, 1}^{ℓ(𝑛)} → {0, 1} is computable by at most 𝑝(𝑛) operations, then

|Pr[𝐷(𝐺(𝑈𝑛)) = 1] − Pr[𝐷(𝑈ℓ) = 1]| < 1/𝑝(𝑛)   (3.1)

Another way to say it is that a polynomial-time computable func-


tion 𝐺 mapping 𝑛 bits strings to ℓ(𝑛) > 𝑛 bit strings is a pseudo-
random generator if the two distributions 𝐺(𝑈𝑛 ) and 𝑈ℓ(𝑛) are compu-
tationally indistinguishable.

P
This definition (as is often the case in cryptography)
is a bit long, but the concept of a pseudorandom gen-
erator is central to cryptography, and so you should
take your time and make sure you understand it. In-
tuitively, a function 𝐺 is a pseudorandom generator
if (1) it expands its input (mapping 𝑛 bits to 𝑛 + 1 or
more) and (2) we cannot distinguish between the out-

put 𝐺(𝑥) for 𝑥 a short (i.e., 𝑛 bit long) random string,


often known as the seed of the pseudorandom gen-
erator, and a truly random long (i.e., of length ℓ(𝑛))
string chosen uniformly at random from {0, 1}ℓ(𝑛) .

Figure 3.1: A function 𝐺 ∶ {0, 1}𝑛 → {0, 1}ℓ(𝑛) is


a pseudorandom generator if 𝐺(𝑥) for a random short
𝑥 ←𝑅 {0, 1}𝑛 is computationally indistinguishable
from a long truly random 𝑦 ←𝑅 {0, 1}ℓ(𝑛) .

Note that the requirement that ℓ > 𝑛 is crucial to make this notion
non-trivial, as for ℓ = 𝑛 the function 𝐺(𝑥) = 𝑥 clearly satisfies that
𝐺(𝑈𝑛 ) is identical to (and hence in particular indistinguishable from)
the distribution 𝑈𝑛 . (Make sure that you understand this last state-
ment!) However, for ℓ > 𝑛 this is no longer trivial at all. In particular,
if we didn’t restrict the running time of 𝐸𝑣𝑒 then no such pseudo-
random generator would exist:
Lemma 3.3 Suppose that 𝐺 ∶ {0, 1}^𝑛 → {0, 1}^{𝑛+1}. Then there exists an (inefficient) algorithm 𝐸𝑣𝑒 ∶ {0, 1}^{𝑛+1} → {0, 1} such that 𝔼[𝐸𝑣𝑒(𝐺(𝑈𝑛))] = 1 but 𝔼[𝐸𝑣𝑒(𝑈𝑛+1)] ≤ 1/2.
Proof. On input 𝑦 ∈ {0, 1}^{𝑛+1}, consider the algorithm 𝐸𝑣𝑒 that goes over all possible 𝑥 ∈ {0, 1}^𝑛 and will output 1 if and only if 𝑦 = 𝐺(𝑥) for some 𝑥. Clearly 𝔼[𝐸𝑣𝑒(𝐺(𝑈𝑛))] = 1. However, the set 𝑆 = {𝐺(𝑥) ∶ 𝑥 ∈ {0, 1}^𝑛} on which Eve outputs 1 has size at most 2^𝑛, and hence a random 𝑦 ←𝑅 𝑈𝑛+1 will fall in 𝑆 with probability at most 1/2.
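The proof translates directly into (deliberately exponential-time) code. A sketch, assuming 𝐺 is given as a Python function mapping 𝑛-character '0'/'1' strings to (𝑛 + 1)-character ones:

from itertools import product

def make_brute_force_eve(G, n):
    # Enumerate all 2^n seeds (exponential time!) and record the image of G.
    image = {G(''.join(bits)) for bits in product('01', repeat=n)}
    # Eve outputs 1 iff the input string could have been produced by G.
    return lambda y: 1 if y in image else 0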

It is not hard to show that if 𝑃 = NP then the above algorithm Eve


can be made efficient. In particular, at the moment we do not know
how to prove the existence of pseudorandom generators. Nevertheless
we believe that pseudorandom generators exist and hence we make
the following conjecture:
Conjecture (The PRG conjecture): For every 𝑛,
there exists a pseudorandom generator 𝐺 mapping
𝑛 bits to 𝑛 + 1 bits.

As was the case for the cipher conjecture, and any other conjecture,
there are two natural questions regarding the PRG conjecture: why
should we believe it and why should we care. Fortunately, the answer
to the first question is simple: it is known that the cipher conjecture
implies the PRG conjecture, and hence if we believe the former we
should believe the latter. (The proof is highly non-trivial and we may
not get to see it in this course.) As for the second question, we will
see that the PRG conjecture implies a great number of useful crypto-
graphic tools, including the cipher conjecture (i.e., the two conjectures
are in fact equivalent). We start by showing that once we can get to an
output that is one bit longer than the input, we can in fact obtain any
polynomial number of bits.

Theorem 3.4 — Length Extension for PRG's. Suppose that the PRG conjecture is true. Then for every polynomial 𝑡(𝑛), there exists a pseu-
dorandom generator mapping 𝑛 bits to 𝑡(𝑛) bits.

Figure 3.2: Length extension for pseudorandom


generators

Proof. The proof of this theorem is very similar to the length extension
theorem for ciphers, and in fact this theorem can be used to give an
alternative proof for the former theorem.
The construction is illustrated in Fig. 3.2. We are given a pseu-
dorandom generator 𝐺′ mapping 𝑛 bits into 𝑛 + 1 bits and need to
construct a pseudorandom generator 𝐺 mapping 𝑛 bits to 𝑡 = 𝑡(𝑛) bits
for some polynomial 𝑡(⋅). The idea is that we maintain a state of 𝑛 bits, which are originally our input seed⁵ 𝑠0, and at the 𝑖-th step we use 𝐺′ to map 𝑠𝑖−1 to the 𝑛 + 1-long bit string (𝑠𝑖, 𝑦𝑖), output 𝑦𝑖 and keep 𝑠𝑖 as our new state.

⁵ Because we use a small input to grow a large pseudorandom string, the input to a pseudorandom generator is often known as its seed.

To prove the security of this construction we need


to show that the distribution 𝐺(𝑈𝑛 ) = (𝑦1 , … , 𝑦𝑡 ) is computationally
indistinguishable from the uniform distribution 𝑈𝑡 . As usual, we will
use the hybrid argument. For 𝑖 ∈ {0, … , 𝑡} we define 𝐻𝑖 to be the dis-
tribution where the first 𝑖 bits chosen uniformly at random, whereas
the last 𝑡 − 𝑖 bits are computed as above. Namely, we choose 𝑠𝑖 at ran-

dom in {0, 1}𝑛 and continue the computation of 𝑦𝑖+1 , … , 𝑦𝑡 from the
state 𝑠𝑖 . Clearly 𝐻0 = 𝐺(𝑈𝑛 ) and 𝐻𝑡 = 𝑈𝑡 and hence by the triangle
inequality it suffices to prove that 𝐻𝑖 ≈ 𝐻𝑖+1 for all 𝑖 ∈ {0, … , 𝑡 − 1}.
We illustrate these two hybrids in Fig. 3.3.

Figure 3.3: Hybrids 𝐻𝑖 and 𝐻𝑖+1 — dotted boxes


refer to values that are chosen independently and
uniformly at random

Now suppose otherwise that there exists some adversary 𝐸𝑣𝑒 such
that |𝔼[𝐸𝑣𝑒(𝐻𝑖 )] − 𝔼[𝐸𝑣𝑒(𝐻𝑖+1 )]| ≥ 𝜖 for some non-negligible 𝜖. From
𝐸𝑣𝑒, we will design an adversary 𝐸𝑣𝑒′ breaking the security of the
pseudorandom generator 𝐺′ (see Fig. 3.4).

Figure 3.4: Building an adversary 𝐸𝑣𝑒′ for 𝐺′ from


an adversary 𝐸𝑣𝑒 distinguishing 𝐻𝑖 and 𝐻𝑖+1 . The
boxes marked with questions marks are those that
are random or pseudorandom depending on whether
we are in 𝐻𝑖 or 𝐻𝑖+1 . Everything inside the dashed
red lines is simulated by 𝐸𝑣𝑒′ that gets as input the
𝑛 + 1-bit string (𝑠𝑖+1 , 𝑦𝑖+1 ).

On input a string 𝑦 of length 𝑛 + 1, 𝐸𝑣𝑒′ will interpret 𝑦 as


(𝑠𝑖+1 , 𝑦𝑖+1 ) where 𝑠𝑖+1 ∈ {0, 1}𝑛 . She then chooses 𝑦1 , … , 𝑦𝑖 randomly
and computes 𝑦𝑖+2, … , 𝑦𝑡 as in our pseudorandom generator's construc-
tion. 𝐸𝑣𝑒′ will then feed (𝑦1 , … , 𝑦𝑡 ) to 𝐸𝑣𝑒 and output whatever 𝐸𝑣𝑒
96 a n i n te n si ve i n trod u c ti on to c ry p tog ra p hy

does. Clearly, 𝐸𝑣𝑒′ is efficient if 𝐸𝑣𝑒 is. Moreover, one can see that
if 𝑦 was random then 𝐸𝑣𝑒′ is feeding 𝐸𝑣𝑒 with an input distributed
according to 𝐻𝑖+1 while if 𝑦 was of the form 𝐺(𝑠) for a random 𝑠 then
𝐸𝑣𝑒′ will feed 𝐸𝑣𝑒 with an input distributed according to 𝐻𝑖 . Hence
we get that | 𝔼[𝐸𝑣𝑒′ (𝐺′ (𝑈𝑛 ))] − 𝔼[𝐸𝑣𝑒′ (𝑈𝑛+1 )]| ≥ 𝜖 contradicting the
security of 𝐺′ .
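In code, the construction from this proof is just a short loop. A sketch, assuming Gprime maps an 𝑛-bit string to an (𝑛 + 1)-bit string, with bit strings represented as Python strings of '0'/'1' characters:

def extend_prg(Gprime, s, t):
    # At step i, map the state s_{i-1} to (s_i, y_i): keep the first n
    # bits as the new state and output the last bit.
    out = []
    for _ in range(t):
        block = Gprime(s)
        s, y = block[:-1], block[-1]
        out.append(y)
    return ''.join(out)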

R
Remark 3.5 — Pseudorandom generators in practice. The
proof of Theorem 3.4 is indicative of many practical
constructions of pseudorandom generators. In many
operating systems and programming environments,
pseudorandom generators work as follows:

1. Upon initialization, the system obtains an initial


seed of randomness 𝑥0 ∈ {0, 1}𝑛 (where often 𝑛 is
something like 128 or 256).
2. At the 𝑡-th call to a function such as ‘rand’ to obtain
new randomness, the system uses some underlying
pseudorandom generator 𝐺′ ∶ {0, 1}𝑛 → {0, 1}𝑛+𝑚
to let 𝑥′ ‖𝑦 = 𝐺′ (𝑥𝑡−1 ), updates 𝑥𝑡 = 𝑥′ and outputs
𝑦.

There are often some additional complications on


how to obtain this seed from some “unpredictable”
or “high entropy” observations (which can some-
times include network latency, user typing and mouse
patterns, and more), and whether the state of the
system is periodically “refreshed” using additional
observations.

3.0.1 Unpredictability: an alternative approach for proving the length extension theorem
The notion that being random is the same as being “unpredictable”,
as discussed at the beginning of this chapter, can be formalized as
follows.

Definition 3.6 — Unpredictable function. An efficiently computable function 𝐺 ∶ {0, 1}∗ → {0, 1}∗ is unpredictable if, for any 𝑛, 1 ≤ 𝑖 < ℓ(𝑛) and polynomially-sized circuit 𝑃,

Pr_{𝑦←𝐺(𝑈𝑛)}[𝑃(𝑦1, … , 𝑦𝑖−1) = 𝑦𝑖] ≤ 1/2 + 𝑛𝑒𝑔𝑙(𝑛).

Here, ℓ(𝑛) is the length function of 𝐺 and 𝑦 ← 𝐺(𝑈𝑛 ) denotes that


𝑦 is a random output of 𝐺. In other words, no polynomial-sized cir-

cuit can predict the next bit of the output of 𝐺 given the previous
bits significantly better than guessing.

We now show that the condition for a function 𝐺 to be unpre-


dictable is equivalent to the condition for it to be a secure PRG. Please
make sure you follow the proof, because it is an important theorem,
and because it is another example of a canonical cryptographic proof.
Lemma 3.7 Let 𝐺 ∶ {0, 1}∗ → {0, 1}∗ be a function with length function
ℓ(𝑛), then 𝐺 is a secure PRG iff it is unpredictable.

Proof. For the forward direction, suppose for contradiction that there
exists some 𝑖 and some circuit 𝑃 that can predict 𝑦𝑖 given 𝑦1, … , 𝑦𝑖−1 with probability 𝑝 ≥ 1/2 + 𝜖(𝑛) for non-negligible 𝜖. Consider the adversary
𝐸𝑣𝑒 that, given a string 𝑦, runs the circuit 𝑃 on 𝑦1 , … , 𝑦𝑖−1 , checks if
the output is equal to 𝑦𝑖 and if so output 1.
If 𝑦 = 𝐺(𝑥) for a uniform 𝑥, then 𝑃 succeeds with probability
𝑝. If 𝑦 is uniformly random, then we can imagine that the bit 𝑦𝑖 is
generated after 𝑃 finished its calculation. The bit 𝑦𝑖 is 0 or 1 with equal
probability, so 𝑃 succeeds with probability 1/2. Since 𝐸𝑣𝑒 outputs 1 when 𝑃 succeeds,

|Pr[𝐸𝑣𝑒(𝐺(𝑈𝑛)) = 1] − Pr[𝐸𝑣𝑒(𝑈ℓ) = 1]| = |𝑝 − 1/2| ≥ 𝜖(𝑛),
a contradiction.
For the backward direction, let 𝐺 be an unpredictable function. Let
𝐻𝑖 be the distribution where the first 𝑖 bits come from 𝐺(𝑈𝑛 ) while the
last ℓ − 𝑖 bits are all random. Notice that 𝐻0 = 𝑈ℓ and 𝐻ℓ = 𝐺(𝑈𝑛 ), so
it suffices to show that 𝐻𝑖−1 ≈ 𝐻𝑖 for all 𝑖.
Suppose 𝐻𝑖−1 ≉ 𝐻𝑖 for some 𝑖, i.e. there exists some 𝐸𝑣𝑒 and non-
negligible 𝜖 such that

Pr[𝐸𝑣𝑒(𝐻𝑖 ) = 1] − Pr[𝐸𝑣𝑒(𝐻𝑖−1 ) = 1] > 𝜖(𝑛).

Consider the program 𝑃 that, on input (𝑦1 , … , 𝑦𝑖−1 ), picks the bits
𝑦𝑖̂ , … , 𝑦ℓ̂ uniformly at random. Then, 𝑃 calls 𝐸𝑣𝑒 on the generated
input. If 𝐸𝑣𝑒 outputs 1 then 𝑃 outputs 𝑦𝑖̂ , and otherwise it outputs
1 − 𝑦𝑖̂ .
The string (𝑦1 , … , 𝑦𝑖−1 , 𝑦𝑖̂ , … , 𝑦ℓ̂ ) has the same distribution as 𝐻𝑖−1 .
However, conditioned on 𝑦𝑖̂ = 𝑦𝑖 , the string has distribution equal to
𝐻𝑖 . Let 𝑝 be the probability that 𝐸𝑣𝑒 outputs 1 if 𝑦𝑖̂ = 𝑦𝑖 and 𝑞 be the
same probability when 𝑦𝑖̂ ≠ 𝑦𝑖 , then we get
𝑝 − (1/2)(𝑝 + 𝑞) = Pr[𝐸𝑣𝑒(𝐻𝑖) = 1] − Pr[𝐸𝑣𝑒(𝐻𝑖−1) = 1] > 𝜖(𝑛).

Therefore, the probability that 𝑃 outputs the correct value is equal to (1/2)𝑝 + (1/2)(1 − 𝑞) = 1/2 + (1/2)(𝑝 − 𝑞) > 1/2 + 𝜖(𝑛), a contradiction.
1 1



The definition of unpredictability is useful because many of our


candidates for pseudorandom generators appeal to the unpredictabil-
ity definition in their proofs. For example, the Blum-Blum-Shub gen-
erator we will see later in the chapter is proved to be unpredictable
if the “quadratic residuosity problem” is hard. It is also nice to know
that our intuition at the beginning of the chapter can be formalized.

3.1 STREAM CIPHERS


We now show a connection between pseudorandom generators and
encryption schemes:

Theorem 3.8 — PRG conjecture implies Cipher conjectures. If the PRG


conjecture is true then so is the cipher conjecture.

It turns out that the converse direction is also true, and hence these
two conjectures are equivalent. We will probably not show the (quite
non-trivial) proof of this fact in this course. (We might show a weaker
version though.)

Proof. Recall that the one time pad is a perfectly secure cipher but its
only problem was that to encrypt an 𝑛 + 1 long message it needed
an 𝑛 + 1 long bit key. Now using a pseudorandom generator, we can
map an 𝑛-bit long key into an 𝑛 + 1-bit long string that looks random
enough that we could use it as a key for the one-time pad. That is, our
cipher will look as follows:

𝐸𝑘 (𝑚) = 𝐺(𝑘) ⊕ 𝑚

and

𝐷𝑘 (𝑐) = 𝐺(𝑘) ⊕ 𝑐

Just like in the one time pad, 𝐷𝑘 (𝐸𝑘 (𝑚)) = 𝐺(𝑘) ⊕ 𝐺(𝑘) ⊕ 𝑚 =
𝑚. Moreover, the encryption and decryption algorithms are clearly
efficient. We will prove security of this encryption by showing the
stronger claim that 𝐸𝑈𝑛 (𝑚) ≈ 𝑈𝑛+1 for any 𝑚.
Notice that 𝑈𝑛+1 = 𝑈𝑛+1 ⊕ 𝑚, as we showed in the security of the
one-time pad. Suppose that for some non-negligible 𝜖 = 𝜖(𝑛) > 0 there
is an efficient adversary 𝐸𝑣𝑒′ such that

|𝔼[𝐸𝑣𝑒′ (𝐺(𝑈𝑛 ) ⊕ 𝑚)] − 𝔼[𝐸𝑣𝑒′ (𝑈𝑛+1 ⊕ 𝑚)]| ≥ 𝜖.

Then the adversary 𝐸𝑣𝑒 defined as 𝐸𝑣𝑒(𝑦) = 𝐸𝑣𝑒′ (𝑦 ⊕ 𝑚) would


be also efficient. Furthermore, if 𝑦 is pseudorandom then 𝐸𝑣𝑒(𝑦) =
𝐸𝑣𝑒′ (𝐺(𝑈𝑛 ) ⊕ 𝑚) and if 𝑦 is uniformly random then 𝐸𝑣𝑒(𝑦) =

𝐸𝑣𝑒′ (𝑈𝑛+1 ⊕ 𝑚). Then, 𝐸𝑣𝑒 can distinguish the two distributions
with advantage 𝜖, a contradiction.

If the PRG outputs 𝑡(𝑛) bits instead of 𝑛 + 1 then we automatically


get an encryption scheme with 𝑡(𝑛) long message length. In fact, in
practice if we use the length extension for PRG’s, we don’t need to
decide on the length of messages in advance. Every time we need to
encrypt another bit (or another block) 𝑚𝑖 of the message, we run the
basic PRG to update our state and obtain some new randomness 𝑦𝑖
that we can XOR with the message and output. Such constructions
are known as stream ciphers in the literature. In much of the practical
literature, the name stream cipher is used both for the pseudorandom
generator itself as well as for the encryption scheme that is obtained
by combining it with the one-time pad.
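Given the length-extension loop (such as the extend_prg sketch earlier in this chapter), a stream cipher is only a few more lines; this is a toy illustration rather than a production implementation:

def stream_encrypt(Gprime, key, m):
    # XOR the message with a pseudorandom pad derived from the key.
    pad = extend_prg(Gprime, key, len(m))
    return ''.join(str(int(a) ^ int(b)) for a, b in zip(m, pad))

Decryption is the very same function: XORing with the same pad a second time recovers the message.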

R
Remark 3.9 — Using pseudorandom generators for coin
tossing over the phone. The following is a cute appli-
cation of pseudorandom generators. Alice and Bob
want to toss a fair coin over the phone. They use a
pseudorandom generator 𝐺 ∶ {0, 1}𝑛 → {0, 1}3𝑛 .
1. Alice will send 𝑧 ←𝑅 {0, 1}3𝑛 to Bob
2. Bob picks 𝑠 ←𝑅 {0, 1}𝑛 and 𝑏 ←𝑅 {0, 1}. If 𝑏 = 0
then Bob sends 𝑦 = 𝐺(𝑠) and if 𝑏 = 1 he sends 𝑦 =
𝐺(𝑠) ⊕ 𝑧. In other words, 𝑦 = 𝐺(𝑠) ⊕ 𝑏 ⋅ 𝑧 where 𝑏 ⋅ 𝑧
is the vector (𝑏 ⋅ 𝑧1 , … , 𝑏 ⋅ 𝑧3𝑛 ).
3. Alice then picks a random 𝑏′ ←𝑅 {0, 1} and sends
it to Bob.
4. Bob sends to Alice the string 𝑠 and 𝑏. Alice verifies
that indeed 𝑦 = 𝐺(𝑠) ⊕ 𝑏 ⋅ 𝑧. Otherwise Alice aborts.
5. The output of the protocol is 𝑏 ⊕ 𝑏′ .
It can be shown that (assuming the protocol is com-
pleted) the output is a random coin, which neither
Alice or Bob can control or predict with more than
negligible advantage over half. Trying to formalize
this and prove it is an excellent exercise. Two main
components in the proofs are:
• With probability 1 − 𝑛𝑒𝑔𝑙(𝑛) over 𝑧 ←𝑅 {0, 1}3𝑛 ,
the sets 𝑆0 = {𝐺(𝑥)|𝑥 ∈ {0, 1}𝑛 } and
𝑆1 = {𝐺(𝑥) ⊕ 𝑧|𝑥 ∈ {0, 1}𝑛 } will be disjoint.
Hence by choosing 𝑧 at random, Alice can ensure
that Bob is committed to the choice of 𝑏 after sending
𝑦.
• For every 𝑧, both the distribution 𝐺(𝑈𝑛 ) and
𝐺(𝑈𝑛 ) ⊕ 𝑧 are pseudorandom. This can be shown
to imply that no matter what string 𝑧 Alice chooses,
she cannot predict 𝑏 from the string 𝑦 sent by Bob
with probability better than 1/2 + 𝑛𝑒𝑔𝑙(𝑛). Hence
her choice of 𝑏′ will be essentially independent of 𝑏.

3.2 WHAT DO PSEUDORANDOM GENERATORS ACTUALLY LOOK LIKE?
So far we have made the conjectures that objects such as ciphers and
pseudorandom generators exist, without giving any hint as to how
they would actually look like. (Though we have examples such as
the Caesar cipher, Vigenere, and Enigma of what secure ciphers don’t
look like.) As mentioned above, we do not know how to prove that
any particular function is a pseudorandom generator. However, there
are quite simple candidates (i.e., functions that are conjectured to be
secure pseudorandom generators), though care must be taken in
constructing them. We now consider candidates for functions that
maps 𝑛 bits to 𝑛 + 1 bits (or more generally 𝑛 + 𝑐 for some constant
𝑐 ) and look at least somewhat “randomish”. As these constructions
are typically used as a basic component for obtaining a longer length
PRG via the length extension theorem (Theorem 3.4), we will think
of these pseudorandom generators as mapping a string 𝑠 ∈ {0, 1}𝑛
representing the current state into a string 𝑠′ ∈ {0, 1}𝑛 representing
the new state as well as a string 𝑏 ∈ {0, 1}𝑐 representing the current
output. See also Section 6.1 in Katz-Lindell and (for greater depth)
Sections 3.6-3.9 in the Boneh-Shoup book.

3.2.1 Attempt 0: The counter generator


To get started, let’s look at an example of an obviously bogus pseudo-
random generator. We define the “counter pseudorandom generator”
𝐺 ∶ {0, 1}𝑛 → {0, 1}𝑛+1 as follows. 𝐺(𝑠) = (𝑠′, 𝑏) where 𝑠′ = 𝑠 + 1 mod 2^𝑛 (treating 𝑠 and 𝑠′ as numbers in {0, … , 2^𝑛 − 1}) and 𝑏 is the
least significant digit of 𝑠′ . It’s a great exercise to work out why this is
not a secure pseudorandom generator.
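As a toy sketch in code, with the state represented as an integer in {0, … , 2^𝑛 − 1}:

def counter_gen(s, n):
    # New state: increment modulo 2^n; output: the least significant
    # bit of the new state.
    s_new = (s + 1) % (2 ** n)
    return s_new, s_new & 1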

P
You should really pause here and make sure you see
why the “counter pseudorandom generator” is not
a secure pseudorandom generator. Show that this is
true even if we replace the least significant digit by the
𝑘-th digit for every 0 ≤ 𝑘 < 𝑛.

3.2.2 Attempt 1: The linear checksum / linear feedback shift register (LFSR)
LFSR can be thought of as the “mother” (or maybe more like the sick
great-uncle) of all pseudorandom generators. One of the simplest
ways to generate a "randomish" extra digit given an 𝑛 digit number is to use a checksum - some linear combination of the digits, with a canonical example being the cyclic redundancy check or CRC.⁶ This motivates the notion of a linear feedback shift register generator (LFSR): if the current state is 𝑠 ∈ {0, 1}𝑛 then the output is 𝑓(𝑠) where 𝑓 is a linear function (modulo 2) and the new state is obtained by right shifting the previous state and putting 𝑓(𝑠) at the leftmost location. That is, 𝑠′_1 = 𝑓(𝑠) and 𝑠′_𝑖 = 𝑠_{𝑖−1} for 𝑖 ∈ {2, … , 𝑛}.

⁶ CRC are often used to generate a "control digit" to detect mistypes of credit card or social security card numbers. This has very different goals than its use for pseudorandom generators, though there are some common intuitions behind the two usages.
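In code, a single LFSR step can be sketched as follows, where taps is the set of state positions entering the linear feedback function 𝑓 (XOR is addition modulo 2):

def lfsr_step(s, taps):
    # s is the current state, a list of n bits.
    f = 0
    for i in taps:
        f ^= s[i]              # f(s): a linear function modulo 2
    return [f] + s[:-1], f     # shift right and put f(s) at the leftmost spot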
LFSR's have several good properties: if the function 𝑓(⋅) is chosen
properly then they can have very long periods (i.e., it can take an ex-
ponential number of steps until the state repeats itself), though that
also holds for the simple “counter” generator we saw above. They
also have the property that every individual bit is equal to 0 or 1 with
probability exactly half (the counter generator also shares this prop-
erty).
A more interesting property is that (if the function is selected prop-
erly) every two coordinates are independent from one another. That
is, there is some super-polynomial function 𝑡(𝑛) (in fact 𝑡(𝑛) can be
exponential in 𝑛) such that if ℓ ≠ ℓ′ ∈ {0, … , 𝑡(𝑛)}, then if we look at
the two random variables corresponding to the ℓ-th and ℓ′ -th output
of the generator (where randomness is the initial state) then they are
distributed like two independent random coins. (This is non-trivial to
show, and depends on the choice of 𝑓 - it is a challenging but useful
exercise to work this out.) The counter generator fails badly at this
condition: the least significant bits between two consecutive states
always flip.
There is a more general notion of a linear generator where the new
state can be any invertible linear transformation of the previous state.
That is, we interpret the state 𝑠 as an element of ℤ𝑞^𝑡 for some integers 𝑞, 𝑡,⁷ and let 𝑠′ = 𝐹(𝑠) and the output 𝑏 = 𝐺(𝑠) where 𝐹 ∶ ℤ𝑞^𝑡 → ℤ𝑞^𝑡 and 𝐺 ∶ ℤ𝑞^𝑡 → ℤ𝑞 are invertible linear transformations (modulo 𝑞). This includes as a special case the linear congruential generator where 𝑡 = 1 and the map 𝐹(𝑠) corresponds to taking 𝑎𝑠 (mod 𝑞) where 𝑎 is a number co-prime to 𝑞.

⁷ A ring is a set of elements where addition and multiplication are defined and obey the natural rules of associativity and commutativity (though without necessarily having a multiplicative inverse for every element). For every integer 𝑞 we define ℤ𝑞 (known as the ring of integers modulo 𝑞) to be the set {0, … , 𝑞 − 1} where addition and multiplication is done modulo 𝑞.
All these generators are unfortunately insecure due to the great
bane of cryptography: the Gaussian elimination algorithm which students typically encounter in any linear algebra class.⁸

⁸ Despite the name, the algorithm goes at least as far back as the Chinese Jiuzhang Suanshu manuscript, circa 150 B.C.
Theorem 3.10 — The unfortunate theorem for cryptography. There is a polynomial time algorithm to solve 𝑚 linear equations in 𝑛 variables
(or to certify no solution exists) over any ring.

Despite its seeming simplicity and ubiquity, Gaussian elimination


(and some generalizations and related algorithms such as Euclid’s
extended g.c.d algorithm and the LLL lattice reduction algorithm)
has been used time and again to break candidate cryptographic con-
structions. In particular, if we look at the first 𝑛 outputs of a linear
generator 𝑏1 , … , 𝑏𝑛 then we can write linear equations in the unknown
102 a n i n te n si ve i n trod u c ti on to c ry p tog ra p hy

initial state of the form 𝑓1 (𝑠) = 𝑏1 , … , 𝑓𝑛 (𝑠) = 𝑏𝑛 where the 𝑓𝑖 ’s are


known linear functions. Either those functions are linearly independent,
in which case we can solve the equations to get the unique solution for
the original state 𝑠 and from which point we can predict all outputs of
the generator, or they are dependent, which means that we can predict
some of the outputs even without recovering the original state. Either
way, the generator is ∗♯!’ed (where ∗♯! refers to whatever verb you
prefer to use when your system is broken). See also this 1977 paper of
James Reed.
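To make the attack concrete, here is a sketch of it as Gaussian elimination over GF(2). Each known linear function 𝑓𝑖 is represented as an 𝑛-bit mask of coefficients, and for simplicity the sketch assumes the 𝑛 equations are linearly independent (in the dependent case, as noted above, some outputs can be predicted without recovering the state):

def recover_state(F, b, n):
    # Solve the system F[i] . s = b[i] (mod 2) for the unknown seed s.
    rows = [(F[i] << 1) | b[i] for i in range(n)]  # augmented rows; bit 0 is b[i]
    for col in range(n):
        piv = next(r for r in range(col, n) if (rows[r] >> (col + 1)) & 1)
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(n):
            if r != col and (rows[r] >> (col + 1)) & 1:
                rows[r] ^= rows[col]               # row operation = XOR mod 2
    return [row & 1 for row in rows]               # seed bits s_0, ..., s_{n-1}

From the recovered state, every future output of the generator can be predicted.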

R
Remark 3.11 — Non-cryptographic PRGs. The above
means that it is a bad idea to use a linear checksum as
a pseudorandom generator in a cryptographic appli-
cation, and in fact in any adversarial setting (e.g., one
shouldn’t hope that an attacker would not be able to
reverse engineer the algorithm⁹ that computes the
control digit of a credit card number). However, that
does not mean that there are no legitimate cases where
linear generators can be used. In a setting where the
application is not adversarial and you have an ability
to test if the generator is actually successful, it might
be reasonable to use such insecure non-cryptographic
generators. They tend to be more efficient (though
often not by much) and hence are often the default
option in many programming environments such as
the C rand() command. (In fact, the real bottleneck
in using cryptographic pseudorandom generators
is often the generation of entropy for their seed, as
discussed in the previous lecture, and not their actual
running time.)
⁹ That number is obtained by applying an algorithm of Hans Peter Luhn which applies a simple map to each digit of the card and then sums them up modulo 10.

3.2.3 From insecurity to security
It is often the case that we want to “fix” a broken cryptographic prim-
itive, such as a pseudorandom generator, to make it secure. At the
moment this is still more of an art than a science, but there are some
principles that cryptographers have used to try to make this more
principled. The main intuition is that there are certain properties of
computational problems that make them more amenable to algo-
rithms (i.e., “easier”) and when we want to make the problems useful
for cryptography (i.e., “hard”) we often seek variants that don’t pos-
sess these properties. The following table illustrates some examples
of such properties. (These are not formal statements, but rather are intended to give some intuition.)

Easy          Hard
Continuous    Discrete
Convex        Non-convex
Linear        Non-linear (degree ≥ 2)
Noiseless     Noisy
Local         Global
Shallow       Deep
Low degree    High degree

Many cryptographic constructions can be thought of as trying to


transform an easy problem into a hard one by moving from the left to
the right column of this table.
The discrete logarithm problem is the discrete version of the con-
tinuous real logarithm problem. The learning with errors problem
can be thought of as the noisy version of the linear equations problem
(or the discrete version of least squares minimization). When con-
structing block ciphers we often have mixing transformation to ensure
that the dependency structure between different bits is global, S-boxes
to ensure non-linearity, and many rounds to ensure deep structure and
large algebraic degree.
This also works in the other direction. Many algorithmic and ma-
chine learning advances work by embedding a discrete problem in a
continuous convex one. Some attacks on cryptographic objects can be
thought of as trying to recover some of the structure (e.g., by embed-
ding modular arithmetic in the real line or “linearizing” non linear
equations).

3.2.4 Attempt 2: Linear Congruential Generators with dropped bits


One approach that is widely used in implementations of pseudoran-
dom generators is to take a linear generator such as the linear congru-
ential generators described above, and use for the output a “chopped”
version of the linear function and drop some of the least significant
bits. The operation of dropping these bits is non-linear and hence the
attack above does not immediately apply. Nevertheless, it turns out
this attack can be generalized to handle this case, and hence even with
dropped bits Linear Congruential Generators are completely insecure
and should be used (if at all) only in applications such as simulations
where there is no adversary. Section 3.7.1 in the Boneh-Shoup book
describes one attack against such generators that uses the notion of
lattice algorithms that we will encounter later in this course in very
different contexts.

3.3 SUCCESSFUL EXAMPLES


Let’s now describe some successful (at least per current knowledge)
pseudorandom generators:

3.3.1 Case Study 1: Subset Sum Generator

Here is an extremely simple generator that is yet still secure¹⁰ as far as we know.

¹⁰ Actually modern computers will be able to break this generator via brute force, but if the length and number of the constants were doubled (or perhaps quadrupled) this should be sufficiently secure, though longer to write down.
from functools import reduce  # needed in Python 3, where reduce is no longer a builtin

# seed is a list of 40 zero/one values
# output is a 48 bit integer
def subset_sum_gen(seed):
    modulo = 0x1000000
    constants = [
        0x3D6EA1, 0x1E2795, 0xC802C6, 0xBF742A, 0x45FF31,
        0x53A9D4, 0x927F9F, 0x70E09D, 0x56F00A, 0x78B494,
        0x9122E7, 0xAFB10C, 0x18C2C8, 0x8FF050, 0x0239A3,
        0x02E4E0, 0x779B76, 0x1C4FC2, 0x7C5150, 0x81E05E,
        0x154647, 0xB80E68, 0xA042E5, 0xE20269, 0xD3B7F3,
        0xCC5FB9, 0x0BFC55, 0x847AE0, 0x8CFDF8, 0xE304B7,
        0x869ACE, 0xB4CDAB, 0xC8E31F, 0x00EDC7, 0xC50541,
        0x0D6DDD, 0x695A2F, 0xA81062, 0x0123CA, 0xC6C5C3 ]

    # return the modular sum of the constants
    # corresponding to ones in the seed
    return reduce(lambda x,y: (x+y) % modulo,
                  map(lambda a,b: a*b, constants, seed))

The seed to this generator is an array seed of 40 bits, with 40 hard-


wired constants each 48 bits long (these constants were generated at
random, but are fixed once and for all, and are not kept secret and
hence are not considered part of the secret random seed). The output
is simply
∑_{𝑖=1}^{40} seed[𝑖] ⋅ constants[𝑖] (mod 2^48)

and hence expands the 40 bit input into a 48 bit output.


This generator is loosely motivated by the “subset sum” computa-
tional problem, which is NP hard. However, since NP hardness is a
worst case notion of complexity, it does not imply security for pseudo-
random generators, which requires hardness of an average case variant.
To get some intuition for its security, we can work out why (given that
it seems to be linear) we cannot break it by simply using Gaussian
elimination.

P
This is an excellent point for you to stop and try to
answer this question on your own.

Given the known constants and known output, figuring out the set
of potential seeds can be thought of as solving a single equation in 40
variables. However, this equation is clearly overdetermined, and will
have a solution regardless of whether the observed value is indeed an
output of the generator, or it is chosen uniformly at random.
More concretely, we can use linear-equation solving to compute
(given the known constants 𝑐1, … , 𝑐40 ∈ ℤ_{2^48} and the output 𝑦 ∈ ℤ_{2^48}) the linear subspace 𝑉 of all vectors (𝑠1, … , 𝑠40) ∈ (ℤ_{2^48})^40 such that ∑ 𝑠𝑖𝑐𝑖 = 𝑦 (mod 2^48). But, regardless of whether 𝑦 was generated at random from ℤ_{2^48}, or 𝑦 was generated as an output of the generator,
the subspace 𝑉 will always have the same dimension (specifically,
since it is formed by a single linear equation over 40 variables, the
dimension will be 39.) To break the generator we seem to need to be
able to decide whether this linear subspace 𝑉 ⊆ (ℤ_{2^48})^40 contains
a Boolean vector (i.e., a vector 𝑠 ∈ {0, 1}𝑛 ). Since the condition that
a vector is Boolean is not defined by linear equations, we cannot use
Gaussian elimination to break the generator. Generally, the task of
finding a vector with small coefficients inside a discrete linear sub-
space is closely related to a classical problem known as finding the
shortest vector in a lattice. (See also the short integer solution (SIS)
problem.)

3.3.2 Case Study 2: RC4


The following is another example of an extremely simple generator
known as RC4 (this stands for Rivest Cipher 4, as Ron Rivest invented
this in 1987) and is still fairly widely used today.

def RC4(P,i,j):
i = (i + 1) % 256
j = (j + P[i]) % 256
P[i], P[j] = P[j], P[i]
return (P,i,j,P[(P[i]+P[j]) % 256])

The function RC4 takes as input the current state P,i,j of the gen-
erator and returns the new state together with a single output byte.
The state of the generator consists of an array P of 256 bytes, which can
be thought of as a permutation of the numbers 0, … , 255 in the sense
that we maintain the invariant that P[𝑖] ≠ P[𝑗] for every 𝑖 ≠ 𝑗, and two
indices 𝑖, 𝑗 ∈ {0, … , 255}. We can consider the initial state as the case
where P is a completely random permutation and 𝑖 and 𝑗 are initial-
106 a n i n te n si ve i n trod u c ti on to c ry p tog ra p hy

ized to zero, although to save on initial seed size, typically RC4 uses
some “pseudorandom” way to generate P from a shorter seed as well.
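For instance, here is a hypothetical driver that draws a few keystream bytes, with random.shuffle standing in for RC4's actual key schedule (which, as noted, derives P from a shorter seed):

import random

P = list(range(256))
random.shuffle(P)      # stand-in for the real key schedule
i = j = 0
keystream = []
for _ in range(8):
    P, i, j, out = RC4(P, i, j)
    keystream.append(out)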
RC4 has extremely efficient software implementations and hence
has been widely implemented. However, it has several issues with its security. In particular it was shown by Mantin¹¹ and Shamir that the second bit of RC4 is not random, even if the initialization vector was random. This and other issues led to a practical attack on the 802.11b WiFi protocol, see Section 9.9 in Boneh-Shoup.

¹¹ I typically do not include references in these lecture notes, and leave them to the texts, but I make here an exception because Itsik Mantin was a close friend of mine in grad school.

The initial response to
those attacks was to suggest to drop the first 1024 bytes of the output,
but by now the attacks have been sufficiently extended that RC4 is
simply not considered a secure cipher anymore. The ciphers Salsa and
ChaCha, designed by Dan Bernstein, have a similar design to RC4, and
are considered secure and deployed in several standard protocols such
as TLS, SSH and QUIC, see Section 3.6 in Boneh-Shoup.

3.3.3 Case Study 3: Blum, Blum and Shub


B.B.S., which stands for the authors Blum, Blum and Shub, is a simple
generator constructed from a potentially hard problem in number
theory.
Let 𝑁 = 𝑃 ⋅ 𝑄, where 𝑃 , 𝑄 are primes. (We will generally use
𝑃 , 𝑄 of size roughly 𝑛, where 𝑛 is our security parameter, and so use
capital letters to emphasize that the magnitude of these numbers is
exponential in the security parameter.)
We define QR𝑁 to be the set of quadratic residues modulo 𝑁 , which
are the numbers that have a modular square root. Formally,

QR𝑁 = {𝑋 2 mod 𝑁 ∣ gcd(𝑋, 𝑁 ) = 1}.


This definition extends the concept of “perfect squares” when
we are working with standard integers. Notice that each number 𝑌 ∈ QR𝑁 has at least one square root (a number 𝑋 such that 𝑌 = 𝑋² mod 𝑁). We will see later in the course that if 𝑁 = 𝑃 ⋅ 𝑄 for primes 𝑃, 𝑄 then each 𝑌 ∈ QR𝑁 has exactly 4 square roots. The B.B.S. generator chooses 𝑁 = 𝑃 ⋅ 𝑄, where 𝑃, 𝑄 are prime and 𝑃, 𝑄 ≡ 3 (mod 4). The second condition guarantees that for each 𝑌 ∈ QR𝑁, exactly one of its square roots falls in QR𝑁, and hence the map 𝑋 ↦ 𝑋² mod 𝑁 is a one-to-one and onto map from QR𝑁 to itself.
It is defined as follows:

def BBS(X):
    # N = P * Q is the fixed modulus chosen above.
    return (X * X % N, X % 2)

In other words, on input 𝑋, BBS(𝑋) outputs 𝑋² mod 𝑁 and the least significant bit of 𝑋. We can think of BBS as a map BBS ∶ QR𝑁 →
QR𝑁 × {0, 1} and so it maps a domain into a larger domain. We can
also extend it to output 𝑡 additional bits, by repeatedly squaring the
pse u d ora n d omn e ss 107

input, letting 𝑋0 = 𝑋, 𝑋𝑖+1 = 𝑋𝑖² mod 𝑁, for 𝑖 = 0, … , 𝑡 − 1, and


outputting 𝑋𝑡 together with the least significant bits of 𝑋0 , … , 𝑋𝑡−1 .
It turns out that assuming that there is no polynomial-time algorithm
(where “polynomial-time” means polynomial in the number of bits
to represent 𝑁 , i.e., polynomial in log 𝑁 ) to factor randomly chosen
integers 𝑁 = 𝑃 ⋅ 𝑄, for every 𝑡 that is polynomial in the number of bits
in 𝑁 , the output of the 𝑡-step BBS generator will be computationally
indistinguishable from 𝑈𝑄𝑅𝑁 × 𝑈𝑡 where 𝑈𝑄𝑅𝑁 denotes the uniform
distribution over QR𝑁 .
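In code, the 𝑡-step version is a straightforward loop (a sketch, with N the modulus 𝑁 = 𝑃 ⋅ 𝑄 chosen as above):

def BBS_gen(X, N, t):
    # Output the least significant bits of X_0, ..., X_{t-1},
    # together with the final state X_t.
    bits = []
    for _ in range(t):
        bits.append(X % 2)
        X = X * X % N
    return X, bits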
The number theory required to show this takes a while to develop.
However, it is interesting and I recommend the reader to search up
this particular generator, see for example this survey by Junod.

3.4 NON-CONSTRUCTIVE EXISTENCE OF PSEUDORANDOM GENERATORS
We now show that, if we don’t insist on constructivity of pseudoran-
dom generators, then there exists pseudorandom generators with
output that are exponentially larger than the input length.
Lemma 3.12 — Existence of inefficient pseudorandom generators. There is some absolute constant 𝐶 such that for every 𝜖, 𝑇, if ℓ > 𝐶(log 𝑇 + log(1/𝜖)) and 𝑚 ≤ 𝑇, then there is a (𝑇, 𝜖) pseudorandom generator 𝐺 ∶
{0, 1}ℓ → {0, 1}𝑚 .

Proof Idea:
The proof uses an extremely useful technique known as the "probabilistic method" which is not too hard mathematically but can be confusing at first.¹²

¹² There is a whole (highly recommended) book by Alon and Spencer devoted to this method.

The idea is to give a "non constructive" proof of
confusing at first.12 The idea is to give a “non constructive” proof of Alon and Spencer devoted to this method.
existence of the pseudorandom generator 𝐺 by showing that if 𝐺 was
chosen at random, then the probability that it would be a valid (𝑇 , 𝜖)
pseudorandom generator is positive. In particular this means that
there exists a single 𝐺 that is a valid (𝑇 , 𝜖) pseudorandom generator.
The probabilistic method is just a proof technique to demonstrate the
existence of such a function. Ultimately, our goal is to show the ex-
istence of a deterministic function 𝐺 that satisfies the conditions of a
(𝑇 , 𝜖) PRG.

The above discussion might be rather abstract at this point, but


would become clearer after seeing the proof.

Proof of Lemma 3.12. Let 𝜖, 𝑇 , ℓ, 𝑚 be as in the lemma’s statement. We


need to show that there exists a function 𝐺 ∶ {0, 1}ℓ → {0, 1}𝑚 that
“fools” every 𝑇 line program 𝑃 in the sense of (3.1). We will show
that this follows from the following claim:

Claim I: For every fixed NAND program / Boolean circuit 𝑃 , if we
pick 𝐺 ∶ {0, 1}^ℓ → {0, 1}^𝑚 at random then the probability that (3.1) is
violated is at most 2^{−𝑇²} .

Before proving Claim I, let us see why it implies Lemma 3.12. We
can identify a function 𝐺 ∶ {0, 1}^ℓ → {0, 1}^𝑚 with its "truth table",
or simply the list of its evaluations on all of its 2^ℓ possible inputs. Since
each output is an 𝑚 bit string, we can also think of 𝐺 as a string in
{0, 1}^{𝑚⋅2^ℓ} . We define ℱ^𝑚_ℓ to be the set of all functions from {0, 1}^ℓ
to {0, 1}^𝑚 . As discussed above we can identify ℱ^𝑚_ℓ with {0, 1}^{𝑚⋅2^ℓ} ,
and choosing a random function 𝐺 ∼ ℱ^𝑚_ℓ corresponds to choosing a
random 𝑚 ⋅ 2^ℓ -long bit string.


For every NAND program / Boolean circuit 𝑃 let 𝐵_𝑃 be the event
that, if we choose 𝐺 at random from ℱ^𝑚_ℓ , then (3.1) is violated with
respect to the program 𝑃 . It is important to understand what the
sample space is that the event 𝐵_𝑃 is defined over: this event
depends on the choice of 𝐺, and so 𝐵_𝑃 is a subset of ℱ^𝑚_ℓ . An equiva-
lent way to define the event 𝐵_𝑃 is that it is the subset of all functions
mapping {0, 1}^ℓ to {0, 1}^𝑚 that violate (3.1), or in other words:


𝐵_𝑃 = { 𝐺 ∈ ℱ^𝑚_ℓ ∣ ∣ (1/2^ℓ) ∑_{𝑠∈{0,1}^ℓ} 𝑃 (𝐺(𝑠)) − (1/2^𝑚) ∑_{𝑟∈{0,1}^𝑚} 𝑃 (𝑟) ∣ > 𝜖 } .     (3.2)
(We’ve replaced here the probability statements in (3.1) with the
equivalent sums so as to reduce confusion as to what is the sample
space that 𝐵𝑃 is defined over.)
To understand this proof it is crucial that you pause here and see
how the definition of 𝐵𝑃 above corresponds to (3.2). This may well
take re-reading the above text once or twice, but it is a good exercise
at parsing probabilistic statements and learning how to identify the
sample space that these statements correspond to.
Now, the number of programs of size 𝑇 (or circuits of size 𝑇 ) is
at most 2^{𝑂(𝑇 log 𝑇)} . Since 𝑇 log 𝑇 = 𝑜(𝑇²), this means that if Claim I
is true, then by the union bound it holds that the probability of the
union of 𝐵_𝑃 over all NAND programs of at most 𝑇 lines is at most
2^{𝑂(𝑇 log 𝑇)} ⋅ 2^{−𝑇²} < 0.1 for sufficiently large 𝑇 . What is important for
us about the number 0.1 is that it is smaller than 1. In particular this
means that there exists a single 𝐺* ∈ ℱ^𝑚_ℓ such that 𝐺* does not violate
(3.1) with respect to any NAND program of at most 𝑇 lines, but that
precisely means that 𝐺* is a (𝑇 , 𝜖) pseudorandom generator.
Hence, it suffices to prove Claim I to conclude the proof of
Lemma 3.12. Choosing a random 𝐺 ∶ {0, 1}^ℓ → {0, 1}^𝑚 amounts to
choosing 𝐿 = 2^ℓ random strings 𝑦_0 , … , 𝑦_{𝐿−1} ∈ {0, 1}^𝑚 and letting
𝐺(𝑥) = 𝑦_𝑥 (identifying {0, 1}^ℓ and [𝐿] via the binary representation).
Hence the claim amounts to showing that for every fixed function

𝑃 ∶ {0, 1}^𝑚 → {0, 1}, if 𝐿 > 2^{𝐶(log 𝑇 + log(1/𝜖))} (which by setting 𝐶 > 4,
we can ensure is larger than 10𝑇²/𝜖²) then the probability that

∣ (1/𝐿) ∑_{𝑖=0}^{𝐿−1} 𝑃 (𝑦_𝑖) − Pr_{𝑠←𝑅 {0,1}^𝑚} [𝑃 (𝑠) = 1] ∣ > 𝜖     (3.3)

is at most 2^{−𝑇²} . This bound follows directly from the Chernoff bound: if
we let for every 𝑖 ∈ [𝐿] the random variable 𝑋_𝑖 denote 𝑃 (𝑦_𝑖), then
since 𝑦_0 , … , 𝑦_{𝐿−1} are chosen independently at random, these are inde-
pendent and identically distributed random variables with mean
𝔼_{𝑦←𝑅 {0,1}^𝑚} [𝑃 (𝑦)] = Pr_{𝑦←𝑅 {0,1}^𝑚} [𝑃 (𝑦) = 1], and hence the probability
that their average deviates from this expectation by more than 𝜖 is at most
2 ⋅ 2^{−𝜖²𝐿/2} , which for our choice of 𝐿 is smaller than 2^{−𝑇²} .


4
Pseudorandom functions

Reading: Rosulek Chapter 6 has a good description of pseudorandom


functions. Katz-Lindell cover pseudorandom functions in a different
order than us. The topics of this lecture and the next ones are covered
in KL sections 3.4-3.5 (PRFs and CPA security), 4.1-4.3 (MACs), and
8.5 (construction of PRFs from PRG).
In the last lecture we saw the notion of pseudorandom generators, and
introduced the PRG conjecture, which stated that there exists a pseu-
dorandom generator mapping 𝑛 bits to 𝑛 + 1 bits. We have seen the
length extension theorem, which states that given such a pseudoran-
dom generator, there exists a generator mapping 𝑛 bits to 𝑚 bits for an
arbitrarily large polynomial 𝑚(𝑛). But can we extend it even further?
Say, to 2𝑛 bits? Does this question even make sense? And why would
we want to do that? This is the topic of this lecture.
At a first look, the notion of extending the output length of a pseu-
dorandom generator to 2^𝑛 bits seems nonsensical. After all, we want
our generator to be efficient and just writing down the output will take
exponential time. However, there is a way around this conundrum.
While we can't efficiently write down the full output, we can require
that it would be possible, given an index 𝑖 ∈ {0, … , 2^𝑛 − 1}, to compute
the 𝑖-th bit of the output in polynomial time.1 That is, we require that
the function 𝑖 ↦ 𝐺(𝑠)_𝑖 is efficiently computable and (by security of
the pseudorandom generator) indistinguishable from a function that
maps each index 𝑖 to an independent random bit in {0, 1}. This is the
notion of a pseudorandom function generator, which is a bit subtle to de-
fine and construct, but turns out to have a great many applications in
cryptography.
1 We will often identify the strings of length 𝑛 with the numbers between 0 and 2^𝑛 − 1, and switch freely between the two representations, and hence can think of 𝑖 also as a string in {0, 1}^𝑛 . We will also switch between indexing strings starting from 0 and starting from 1 based on convenience.

Definition 4.1 — Pseudorandom Function Generator. An efficiently com-
putable function 𝐹 taking two inputs 𝑠 ∈ {0, 1}^∗ and 𝑖 ∈ {0, … , 2^{|𝑠|} −
1} and outputting a single bit 𝐹 (𝑠, 𝑖) is a pseudorandom function
(PRF) generator if for every polynomial time adversary 𝐴 out-
putting a single bit and polynomial 𝑝(𝑛), if 𝑛 is large enough then:

∣ 𝔼_{𝑠←𝑅 {0,1}^𝑛} [𝐴^{𝐹(𝑠,⋅)} (1^𝑛)] − 𝔼_{𝐻←𝑅 ([2^𝑛]→{0,1})} [𝐴^𝐻 (1^𝑛)] ∣ < 1/𝑝(𝑛) .

Some notes on notation are in order. The input 1^𝑛 is simply a string
of 𝑛 ones, and it is a typical cryptography convention to assume that
such an input is always given to the adversary. This is simply be-
cause by "polynomial time adversary" we really mean polynomial in
𝑛 (which is our key size or security parameter).2
2 This also allows us to be consistent with the notion of "polynomial in the size of the input."
The notation 𝐴^{𝐹(𝑠,⋅)}
means that 𝐴 has black box (also known as oracle) access to the func-
tion that maps 𝑖 to 𝐹 (𝑠, 𝑖). That is, 𝐴 can choose an index 𝑖, query the
box and get 𝐹 (𝑠, 𝑖), then choose a new index 𝑖′ , query the box to get
𝐹 (𝑠, 𝑖′ ), and so on for a polynomial number of queries. The notation
𝐻 ←𝑅 [2^𝑛] → {0, 1} means that 𝐻 is a completely random function
that maps every index 𝑖 to an independent and random bit.

R
Remark 4.2 — Completely Random Functions. This no-
tion of a randomly chosen function can be difficult to
wrap your mind around. Try to imagine a table of all
of the strings in {0, 1}𝑛 . We now go to each possible
input, randomly generate a bit to be its output, and
write down the result in the table. When we’re done,
we have a length 2𝑛 lookup table that maps each input
to an output that was generated uniformly at random
and independently of all other outputs. This lookup
table is now our random function 𝐻.
In practice it’s too cumbersome to actually generate
all 2𝑛 bits, and sometimes in theory it’s convenient to
think of each output as generated only after a query is
made. This leads to adopting the lazy evaluation model.
In the lazy evaluation model, we imagine that a lazy
person is sitting in a room with the same lookup table
as before, but with all entries blank. If someone makes
some query 𝐻(𝑠), the lazy person checks if the entry
for 𝑠 in the lookup table is blank. If so, the lazy evalu-
ator generates a random bit, writes down the result for
𝑠, and returns it. Otherwise, if an output has already
been generated for 𝑠 previously (because 𝑠 has been
queried before), the lazy evaluator simply returns this
value. Can you see why this model is more convenient
in some ways?
One last way to think about how a completely random
function is determined is to first observe that there
exist a total of 2^{2^𝑛} functions from {0, 1}^𝑛 to {0, 1} (can
you see why? It may be easier to think of them as
functions from [2^𝑛] to {0, 1}). We choose one of them
uniformly at random to be 𝐻, and it's still the case
that for any given input 𝑠 the result 𝐻(𝑠) is 0 or 1 with
equal probability, independent of any other input.
Regardless of which model we use to think about gen-
erating 𝐻, after we’ve chosen 𝐻 and put it in a black
box, the behavior of 𝐻 is in some sense “determinis-
tic” because given the same query it will always return
the same result. However, before we ever make any
given query 𝑠 we can only guess 𝐻(𝑠) correctly with
probability 1/2, because without previously observing
𝐻(𝑠) it is effectively random and undecided to us (just
like in the lazy evaluator model).

P
Now would be a fantastic time to stop and think
deeply about the three constructions in the remark
above, and in particular why they are all equivalent. If
you don’t feel like thinking then at the very least you
should make a mental note to come back later if you’re
confused, because this idea will be very useful down
the road.
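The lazy evaluation model is also easy to express in code. Here is a
minimal Python sketch (not part of the text's formal development) of
such a lazily evaluated random function:

    import random

    class LazyRandomFunction:
        """H: {0,1}^n -> {0,1}, with outputs tossed only when first queried."""
        def __init__(self):
            self.table = {}   # the partially filled lookup table

        def __call__(self, s: str) -> int:
            if s not in self.table:
                self.table[s] = random.randrange(2)  # fresh coin for a new query
            return self.table[s]                     # old queries get the same bit

    H = LazyRandomFunction()
    print(H("0101"), H("1111"), H("0101"))  # the third answer equals the first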

Thus, the notation 𝐴𝐻 in the PRF definition means 𝐴 has access


to a completely random black box that returns a random bit for any
new query made, and on previously seen queries returns the same bit
as before. Finally one last note: below we will identify the set [2𝑛 ] =
{0, … , 2𝑛 − 1} with the set {0, 1}𝑛 (there is a one to one mapping
between those sets using the binary representation), and so we will
treat 𝑖 interchangeably as a number in [2𝑛 ] or a string in {0, 1}𝑛 .

Ensembles of PRFs. If 𝐹 is a pseudorandom function generator, then


if we choose a random string 𝑠 and consider the function 𝑓𝑠 defined
by 𝑓𝑠 (𝑖) = 𝐹 (𝑠, 𝑖), no efficient algorithm can distinguish between
black box access to 𝑓𝑠 (⋅) and black box access to a completely random
function (see Fig. 4.1). Notably, black box access implies that a priori
the adversary does not know which function it’s querying. From the
adversary’s point of view, they query some oracle 𝑂 (which behind
the scenes is either 𝑓𝑠 (⋅) or 𝐻), and must decide if 𝑂 = 𝑓𝑠 (⋅) or 𝑂 =
𝐻. Thus often instead of talking about a pseudorandom function
generator we will refer to a pseudorandom function ensemble {𝑓𝑠 }𝑠∈{0,1}∗ .
Formally, this is defined as follows:

Definition 4.3 — PRF ensembles. Let {𝑓𝑠 }_{𝑠∈{0,1}^∗} be an ensemble of
functions such that for every 𝑠 ∈ {0, 1}^∗ , 𝑓𝑠 ∶ {0, 1}^{|𝑠|} → {0, 1}. We
say that {𝑓𝑠 } is a pseudorandom function ensemble if the function 𝐹
that on input 𝑠 ∈ {0, 1}^∗ and 𝑖 ∈ {0, … , 2^{|𝑠|} − 1} outputs 𝑓𝑠 (𝑖) is a
PRF generator.

Note that the condition of Definition 4.3 corresponds to requiring


that for every polynomial 𝑝 and 𝑝(𝑛)-time adversary 𝐴, if 𝑛 is large
enough then

∣ 𝔼_{𝑠←𝑅 {0,1}^𝑛} [𝐴^{𝑓𝑠(⋅)} (1^𝑛)] − 𝔼_{ℎ←𝑅 ℱ_{𝑛,1}} [𝐴^ℎ (1^𝑛)] ∣ < 1/𝑝(𝑛)

where ℱ𝑛,1 is the set of all functions mapping {0, 1}𝑛 to {0, 1} (i.e.,
the set {0, 1}𝑛 → {0, 1}).

P
It is worthwhile to pause and make sure you un-
derstand why Definition 4.3 and Definition 4.1 give
different ways to talk about the same object.

Figure 4.1: In a pseudorandom function, an adversary


cannot tell whether they are given a black box that
computes the function 𝑖 ↦ 𝐹 (𝑠, 𝑖) for some secret 𝑠
that was chosen at random and fixed, or whether the
black box computes a completely random function
that tosses a fresh random coin whenever it’s given a
new input 𝑖.

In the next lecture we will see the proof of the following theorem (due
to Goldreich, Goldwasser, and Micali):

Theorem 4.4 — PRFs from PRGs. Assuming the PRG conjecture, there
exists a secure pseudorandom function generator.

But before we see the proof of Theorem 4.4, let us see why pseudo-
random functions could be useful.

4.1 ONE TIME PASSWORDS (E.G. GOOGLE AUTHENTICATOR, RSA ID, ETC.)
Until now we have talked about the task of encryption, or protecting
the secrecy of messages. But the task of authentication, or protecting
the integrity of messages is no less important. For example, consider
the case that you receive a software update for your PC, phone, car,
pacemaker, etc. over an open channel such as an unencrypted Wi-
Fi connection. The contents of that update are not secret, but it is of
crucial importance that it was unchanged from the message sent out
by the company and that no malicious attacker had modified the
code. Similarly, when you log into your bank, you might be much
more concerned about the possibility of someone impersonating you
and cleaning out your account than you are about the secrecy of your
information.
Let’s start with a very simple scenario which we’ll call the login
problem. Alice and Bob share a key as before, but now Alice wants to
simply prove her identity to Bob. What makes this challenging is that
this time they need to contend with not the passive eavesdropping
Eve but the active adversary Mallory, who completely controls the
communication channel between them and can modify (or mall) any
message that they send. Specifically for the identity proving case, we
think of the following scenario. Each instance of such an identifica-
tion protocol consists of some interaction between Alice and Bob that
ends with Bob deciding whether to accept it as authentic or reject as
an impersonation attempt. Mallory’s goal is to fool Bob into accepting
her as Alice.
The most basic way to try to solve the login problem is by simply
using a password. That is, if we assume that Alice and Bob can share
a key, we can treat this key as some secret password 𝑝 that was se-
lected at random from {0, 1}𝑛 (and hence can only be guessed with
probability 2^{−𝑛} ). Why doesn't Alice simply send 𝑝 to Bob to prove
to him her identity? A moment’s thought shows that this would be a
very bad idea. Since Mallory is controlling the communication line,
she would learn 𝑝 after the first identification attempt and could then
easily impersonate Alice in future interactions. However, we seem to
have just the tool to protect the secrecy of 𝑝— encryption. Suppose that
Alice and Bob share a secret key 𝑘 and an additional secret password
𝑝. Wouldn’t a simple way to solve the login problem be for Alice to
send Bob an encryption of the password 𝑝? After all, the security of
the encryption should guarantee that Mallory can’t learn 𝑝, right?

P

This would be a good time to stop reading and try to


think for yourself whether using a secure encryption
to encrypt 𝑝 would guarantee security for the login
problem. (No really, stop and think about it.)

The problem is that Mallory does not have to learn the password
𝑝 in order to impersonate Alice. For example, she can simply record
the message 𝑐1 that Alice sends to Bob in the first session and then replay
it to Bob in the next session. Since the message is a valid encryption
of 𝑝, then Bob would accept it from Mallory! (This is known as a
replay attack and is a common attack one needs to protect against in
cryptographic protocols.) One can try to put in countermeasures to
defend against this particular attack, but its existence demonstrates
that secrecy of the password does not guarantee security of the login
protocol.

4.1.1 How do pseudorandom functions help in the login problem?


The idea is that they create what’s known as a one time password. Alice
and Bob will share an index 𝑠 ∈ {0, 1}𝑛 for the pseudorandom func-
tion generator {𝑓𝑠 }. When Alice wants to prove her identity to Bob,
Bob will choose a random 𝑖 ←𝑅 {0, 1}𝑛 , send 𝑖 to Alice, and then Alice
will send 𝑓𝑠 (𝑖), 𝑓𝑠 (𝑖 + 1), … , 𝑓𝑠 (𝑖 + ℓ − 1) to Bob where ℓ is some param-
eter (you can think of ℓ = 𝑛 for simplicity). Bob will check that the
values he received are indeed 𝑓𝑠 (𝑖), 𝑓𝑠 (𝑖 + 1), … , 𝑓𝑠 (𝑖 + ℓ − 1), and if so
accept the session as authentic.
The formal protocol is as follows:

Protocol PRF-Login:

• Shared input: 𝑠 ∈ {0, 1}𝑛 . Alice and Bob treat it as a seed for a
pseudorandom function generator {𝑓𝑠 }.
• In every session Alice and Bob do the following:
1. Bob chooses a random 𝑖 ←𝑅 [2𝑛 ] and sends 𝑖 to Alice.
2. Alice sends 𝑦1 , … , 𝑦ℓ to Bob where 𝑦𝑗 = 𝑓𝑠 (𝑖 + 𝑗 − 1).
3. Bob checks that for every 𝑗 ∈ {1, … , ℓ}, 𝑦𝑗 = 𝑓𝑠 (𝑖 + 𝑗 − 1) and if
so accepts the session; otherwise he rejects it.
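Here is a short Python sketch of one session of PRF-Login. The
function prf below is a hypothetical stand-in for 𝑓𝑠 (here instantiated
with HMAC-SHA256 truncated to one bit, purely for illustration):

    import hmac, hashlib, secrets

    def prf(s: bytes, i: int) -> int:
        """Stand-in for f_s(i): one pseudorandom bit per index."""
        return hmac.new(s, i.to_bytes(16, "big"), hashlib.sha256).digest()[0] & 1

    def alice_respond(s: bytes, i: int, ell: int) -> list:
        return [prf(s, i + j) for j in range(ell)]          # y_1, ..., y_ell

    def bob_verify(s: bytes, i: int, ys: list) -> bool:
        return ys == [prf(s, i + j) for j in range(len(ys))]

    s = secrets.token_bytes(16)      # the shared seed
    i = secrets.randbelow(2**64)     # Bob's random nonce
    print(bob_verify(s, i, alice_respond(s, i, ell=20)))    # True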

As we will see it’s not really crucial that the input 𝑖 (which is
known in crypto parlance as a nonce) is random. What is crucial is that
it never repeats itself, to foil a replay attack. For this reason in many
applications Alice and Bob compute 𝑖 as a function of the current time
(for example, the index of the current minute based on some agreed-
upon starting point), and hence we can make it into a one message
protocol. Also the parameter ℓ is sometimes chosen to be deliberately
short so that it will be easy for people to type the values 𝑦1 , … , 𝑦ℓ .
Figure 4.2: The Google Authenticator app is one popular example of a one-time password scheme using pseudorandom functions. Another example is RSA's SecurID token.
Why is this secure? The key to understanding schemes using pseu-
dorandom functions is to imagine what would happen if 𝑓𝑠 were

an actual random function instead of a pseudo random function. In a


truly random function, every one of the values 𝑓𝑠 (0), … , 𝑓𝑠 (2𝑛 − 1)
is chosen independently and uniformly at random from {0, 1}. One
useful way to imagine this is using the concept of “lazy evaluation”.
We can think of 𝑓𝑠 as determined by tossing 2^𝑛 different coins for
the values 𝑓𝑠 (0), … , 𝑓𝑠 (2^𝑛 − 1). Now consider the case where we don't
actually toss the 𝑖𝑡ℎ coin until we need it. The crucial point is that
if we have queried the function in 𝑇 ≪ 2𝑛 places, then when Bob
chooses a random 𝑖 ∈ [2𝑛 ] it is extremely unlikely that any one of the
set {𝑖, 𝑖 + 1, … , 𝑖 + ℓ − 1} will be one of those locations that we pre-
viously queried. Thus, if the function was truly random, Mallory has
no information on the value of the function in these coordinates, and
would be able to predict (or rather, guess) it in all these locations with
probability at most 2^{−ℓ} .

P
Please make sure you understand the informal rea-
soning above, since we will now translate this into a
formal theorem and proof.

Theorem 4.5 — Login protocol via PRF. Suppose that {𝑓𝑠 } is a secure
pseudorandom function generator and Alice and Bob interact us-
ing Protocol PRF-Login for some polynomial number 𝑇 of sessions
(over a channel controlled by Mallory). After observing these in-
teractions, Mallory then interacts with Bob, where Bob follows the
protocol’s instructions but Mallory has access to arbitrary efficient
computation. Then, the probability that Bob accepts the interaction
is at most 2^{−ℓ} + 𝜇(𝑛) where 𝜇(⋅) is some negligible function.

Proof. This proof, as so many others in this course, uses an argument


via contradiction. We assume, towards the sake of contradiction, that
there exists an adversary 𝑀 (for Mallory) that can break the identifi-
cation scheme PRF-Login with probability 2−ℓ + 𝜖 after 𝑇 interactions.
We then construct an attacker 𝐴 that can distinguish access to {𝑓𝑠 }
from access to a random function in 𝑝𝑜𝑙𝑦(𝑇 ) time and with bias at
least 𝜖/2.
How do we construct this adversary 𝐴? The idea is as follows. First,
we prove that if we ran the protocol PRF-Login using an actual random
function, then 𝑀 would not be able to succeed in impersonating with
probability better than 2−ℓ + 𝑛𝑒𝑔𝑙𝑖𝑔𝑖𝑏𝑙𝑒. Therefore, if 𝑀 does do better,
then we can use that to distinguish 𝑓𝑠 from a random function. The
adversary 𝐴 gets some black box 𝑂(⋅) (for oracle) and will use it while
internally simulating all the parties— Alice, Bob and Mallory (using
𝑀 ) in the 𝑇 + 1 interactions of the PRF-Login protocol. Whenever any

of the parties needs to evaluate 𝑓𝑠 (𝑖), 𝐴 will forward 𝑖 to its black box
𝑂(⋅) and return the value 𝑂(𝑖). It will then output 1 if and only if 𝑀
succeeds in impersonation in this internal simulation. The argument
above showed that if 𝑂(⋅) is a truly random function, then the proba-
bility that 𝐴 outputs 1 is at most 2−ℓ + 𝑛𝑒𝑔𝑙𝑖𝑔𝑖𝑏𝑙𝑒 (and so in particular
less than 2−ℓ + 𝜖/2). On the other hand, if 𝑂(⋅) is the function 𝑖 ↦ 𝑓𝑠 (𝑖)
for some fixed and random 𝑠, then this probability is at least 2−ℓ + 𝜖.
Thus 𝐴 will distinguish between the two cases with bias at least 𝜖/2.
We now turn to the formal proof:
Claim 1: Let PRF-Login* be the hypothetical variant of the protocol
PRF-Login where Alice and Bob share a completely random function
𝐻 ∶ [2𝑛 ] → {0, 1}. Then, no matter what Mallory does, the probability
she can impersonate Alice after observing 𝑇 interactions is at most
2−ℓ + (8ℓ𝑇 )/2𝑛 .
(If PRF-Login* is easier to prove secure than PRF-Login, you might
wonder why we bother with PRF-Login in the first place and not sim-
ply use PRF-Login*. The reason is that specifying a random function
𝐻 requires specifying 2𝑛 bits, and so that would be a huge shared key.
So PRF-Login* is not a protocol we can actually run but rather a hy-
pothetical “mental experiment” that helps us in arguing about the
security of PRF-Login.)
Proof of Claim 1: Let 𝑖1 , … , 𝑖2𝑇 be the nonces chosen by Bob and
received by Alice in the first 𝑇 iterations. That is, 𝑖1 is the nonce cho-
sen by Bob in the first iteration while 𝑖2 is the nonce that Alice re-
ceived in the first iteration (if Mallory doesn’t modify it then 𝑖1 = 𝑖2 ).
Similarly, 𝑖3 is the nonce chosen by Bob in the second iteration while
𝑖4 is the nonce received by Alice and so on and so forth. Let 𝑖 be the
nonce chosen in the 𝑇 + 1𝑠𝑡 iteration in which Mallory tries to im-
personate Alice. We claim that the probability that there exists some
𝑗 ∈ {1, … , 2𝑇 } such that |𝑖 − 𝑖𝑗 | < 2ℓ is at most 8ℓ𝑇 /2𝑛 . Indeed, let 𝑆
be the union of all the intervals of the form {𝑖𝑗 − 2ℓ + 1, … , 𝑖𝑗 + 2ℓ − 1}
for 1 ≤ 𝑗 ≤ 2𝑇 . Since it’s a union of 2𝑇 intervals each of length
less than 4ℓ, 𝑆 contains at most 8𝑇 ℓ elements, so the probability that
𝑖 ∈ 𝑆 is |𝑆|/2^𝑛 ≤ (8𝑇 ℓ)/2^𝑛 . Now, if there does not exist a 𝑗 such that
|𝑖−𝑖𝑗 | < 2ℓ then it means in particular that all the queries to 𝐻(⋅) made
by either Alice or Bob during the first 𝑇 iterations are disjoint from the
interval {𝑖, 𝑖 + 1, … , 𝑖 + ℓ − 1}. Since 𝐻(⋅) is a completely random
function, the values 𝐻(𝑖), … , 𝐻(𝑖 + ℓ − 1) are chosen uniformly and
independently from all the rest of the values of this function. Since
Mallory’s message 𝑦 to Bob in the 𝑇 + 1𝑠𝑡 iteration depends only on
what she observed in the past, the values 𝐻(𝑖), … , 𝐻(𝑖 + ℓ − 1) are inde-
pendent from 𝑦, and hence under the condition that there is no overlap
between this interval and prior queries, the probability that they equal
𝑦 is 2−ℓ . QED (Claim 1).

The proof of Claim 1 is not hard but it is somewhat subtle, so it’s


good to go over it again and make sure you understand it.
Now that we have Claim 1, the proof of the theorem follows as
outlined above. We build an adversary 𝐴 to the pseudorandom func-
tion generator from 𝑀 by having 𝐴 simulate “inside its belly” all the
parties Alice, Bob and Mallory and output 1 if Mallory succeeds in
impersonating. Since we assumed 𝜖 is non-negligible and 𝑇 is polyno-
mial, we can assume that (8ℓ𝑇 )/2𝑛 < 𝜖/2 and hence by Claim 1, if the
black box is a random function, then we are in the PRF-Login* setting
and Mallory’s success will be at most 2−ℓ + 𝜖/2. If the black box is
𝑓𝑠 (⋅), then we get exactly the PRF-Login setting and hence under our
assumption the success will be at least 2−ℓ + 𝜖. We conclude that the
difference in probability of 𝐴 outputting 1 between the random and
pseudorandom case is at least 𝜖/2 thus contradicting the security of
the pseudorandom function generator.

4.1.2 Modifying input and output lengths of PRFs


In the course of constructing this one-time-password scheme from a
PRF, we have actually proven a general statement that is useful on its
own: that we can transform a standard PRF, which is a collection {𝑓𝑠 }
of functions mapping {0, 1}^𝑛 to {0, 1}, into a PRF where the functions
have a longer output length ℓ. Specifically, we can make the following defini-
tion:

Definition 4.6 — PRF ensemble (varying inputs and outputs). Let ℓin , ℓout ∶
ℕ → ℕ. An ensemble of functions {𝑓𝑠 }_{𝑠∈{0,1}^∗} is a PRF ensemble with
input length ℓin and output length ℓout if:

1. For every 𝑛 ∈ ℕ and 𝑠 ∈ {0, 1}^𝑛 , 𝑓𝑠 ∶ {0, 1}^{ℓin(𝑛)} → {0, 1}^{ℓout(𝑛)} .

2. For every polynomial 𝑝 and 𝑝(𝑛)-time adversary 𝐴, if 𝑛 is large
enough then

∣ 𝔼_{𝑠←𝑅 {0,1}^𝑛} [𝐴^{𝑓𝑠(⋅)} (1^𝑛)] − 𝔼_{ℎ←𝑅 ({0,1}^{ℓin(𝑛)}→{0,1}^{ℓout(𝑛)})} [𝐴^ℎ (1^𝑛)] ∣ < 1/𝑝(𝑛) .

Standard PRFs as we defined in Definition 4.3 correspond to gener-


alized PRFs where ℓin (𝑛) = 𝑛 and ℓout (𝑛) = 1 for all 𝑛 ∈ ℕ. It is a good
exercise (which we will leave to the reader) to prove the following
theorem:

Theorem 4.7 — PRF length extension. Suppose that PRFs exist. Then
for every constant 𝑐 and polynomial-time computable functions
ℓin , ℓout ∶ ℕ → ℕ with ℓin (𝑛), ℓout (𝑛) ≤ 𝑛^𝑐 , there exists a PRF ensem-
ble with input length ℓin and output length ℓout .

Thus from now on whenever we are given a PRF, we will allow


ourselves to assume that it has any polynomial output size that is
convenient for us.

4.2 MESSAGE AUTHENTICATION CODES


One time passwords are a tool allowing you to prove your identity to,
say, your email server. But even after you did so, how can the server
trust that future communication comes from you and not from some
attacker that can interfere with the communication channel between
you and the server (so called “man in the middle” attack)? Similarly,
one time passwords may allow a software company to prove their
identity before they send you a software update, but how do you
know that an attacker does not change some bits of this software
update on route between their servers and your device?
This is where Message Authentication Codes (MACs) come into play:
their role is to authenticate not only the identity of the parties but
also their communication. Once again we have Alice and Bob, and the
adversary Mallory who can actively modify messages (in contrast
to the passive eavesdropper Eve). Similar to the case of encryption,
Alice has a message 𝑚 she wants to send to Bob, but now we are not
concerned with Mallory learning the contents of the message. Rather,
we want to make sure that Bob gets precisely the message 𝑚 sent by
Alice. Actually this is too much to ask for, since Mallory can always
decide to block all communication, but we can ask that either Bob gets
precisely 𝑚 or he detects failure and accepts no message at all. Since
we are in the private key setting, we assume that Alice and Bob share a
key 𝑘 that is unknown to Mallory.
What kind of security would we want? We clearly want Mallory
not to be able to cause Bob to accept a message 𝑚′ ≠ 𝑚. But, like
in the encryption setting, we want more than that. We would like
Alice and Bob to be able to use the same key for many messages. So,
Mallory might observe the interactions of Alice and Bob on messages
𝑚1 , … , 𝑚𝑇 before trying to cause Bob to accept a message 𝑚′𝑇 +1 ≠
𝑚𝑇 +1 . In fact, to make our notion of security more robust, we will
even allow Mallory to choose the messages 𝑚1 , … , 𝑚𝑇 (this is known
as a chosen message or chosen plaintext attack). The resulting formal
definition is below:

Definition 4.8 — Message Authentication Codes (MAC). Let (𝑆, 𝑉 ) (for sign
and verify) be a pair of efficiently computable algorithms where
𝑆 takes as input a key 𝑘 and a message 𝑚, and produces a tag
𝜏 ∈ {0, 1}^∗ , while 𝑉 takes as input a key 𝑘, a message 𝑚, and a tag
𝜏 , and produces a bit 𝑏 ∈ {0, 1}. We say that (𝑆, 𝑉 ) is a Message
Authentication Code (MAC) if:

• For every key 𝑘 and message 𝑚, 𝑉𝑘 (𝑚, 𝑆𝑘 (𝑚)) = 1.

• For every polynomial-time adversary 𝐴 and polynomial 𝑝(𝑛),
it holds with less than 1/𝑝(𝑛) probability over the choice of 𝑘 ←𝑅
{0, 1}^𝑛 that 𝐴^{𝑆𝑘(⋅)}(1^𝑛) = (𝑚′ , 𝜏′ ) such that 𝑚′ is not one of the
messages 𝐴 queried and 𝑉𝑘 (𝑚′ , 𝜏′ ) = 1.3

3 Clearly if the adversary outputs a pair (𝑚, 𝜏) that it did query from its oracle then that pair will pass verification. This suggests the possibility of a replay attack whereby Mallory resends to Bob a message that Alice sent him in the past. As above, one can thwart this by insisting that every message 𝑚 begin with a fresh nonce or a value derived from the current time.

If Alice and Bob share the key 𝑘, then to send a message 𝑚 to Bob,
Alice will simply send over the pair (𝑚, 𝜏 ) where 𝜏 = 𝑆𝑘 (𝑚). If
Bob receives a message (𝑚′ , 𝜏′ ), then he will accept 𝑚′ if and only
if 𝑉𝑘 (𝑚′ , 𝜏′ ) = 1. Mallory now observes 𝑡 rounds of communication
of the form (𝑚𝑖 , 𝑆𝑘 (𝑚𝑖 )) for messages 𝑚1 , … , 𝑚𝑡 of her choice, and her
goal is to try to create a new message 𝑚′ that was not sent by Alice,
but for which she can forge a valid tag 𝜏′ that will pass verification.
Our notion of security guarantees that she'll only be able to do so with
negligible probability, in which case the MAC is CMA-secure.4

4 A priori you might ask if we should not also give Mallory an oracle to 𝑉𝑘 (⋅) as well. After all, in the course of those many interactions, Mallory could also send Bob many messages (𝑚′ , 𝜏′ ) of her choice, and observe from his behavior whether or not these passed verification. It is a good exercise to show that adding such an oracle does not change the power of the definition, though we note that this is decidedly not the case in the analogous question for encryption.

R
Remark 4.9 — Why can Mallory choose the messages?.
Remark 4.9 — Why can Mallory choose the messages?. the definition, though we note that this is decidedly
The notion of a “chosen message attack” might seem not the case in the analogous question for encryption.
a little “over the top”. After all, Alice is going to send
to Bob the messages of her choice, rather than those
chosen by her adversary Mallory. However, as cryp-
tographers have learned time and again the hard way,
it is better to be conservative in our security defini-
tions and think of an attacker that has as much power
as possible. First of all, we want a message authentica-
tion code that will work for any sequence of messages,
and so it’s better to consider this “worst case” setting
of allowing Mallory to choose them. Second, in many
realistic settings an adversary could have some effect
on the messages that are being sent by the parties.
This has occurred time and again in cases ranging
from web servers to German submarines in World
War II, and we’ll return to this point when we talk
about chosen plaintext and chosen ciphertext attacks on
encryption schemes.

R
Remark 4.10 — Strong unforgability. Some texts (such as
Boneh Shoup) define a stronger notion of unforgabil-
ity where the adversary cannot even produce new sig-
natures for messages it has queried in the attack. That
is, the adversary cannot produce a valid message-
signature pair that it has not seen before. This stronger
definition can be useful for some applications. It is
fairly easy to transform MACs satisfying Definition 4.8
into MACs satisfying strong unforgability. In partic-
ular, if the signing function is deterministic, and we
use a canonical verifier algorithm where 𝑉𝑘 (𝑚, 𝜎) = 1
iff 𝑆𝑘 (𝑚) = 𝜎 then weak unforgability automatically
implies strong unforgability since every message has a
single signature that would pass verification (can you
see why?).

4.3 MACS FROM PRFS


We now show how pseudorandom function generators yield mes-
sage authentication codes. In fact, the construction is so immediate
that much of the more applied cryptographic literature does not dis-
tinguish between these two concepts, and uses the name “Message
Authentication Codes” to refer to both MAC’s and PRF’s. However,
since this is not applied cryptographic literature, the distinction is
rather important.

Theorem 4.11 — MAC Theorem. Under the PRF Conjecture, there exists
a secure MAC.

Proof. Let 𝐹 (⋅, ⋅) be a secure pseudorandom function generator with


𝑛/2 bits output (which we can obtain using Theorem 4.7). We define
𝑆𝑘 (𝑚) = 𝐹 (𝑘, 𝑚) and 𝑉𝑘 (𝑚, 𝜏 ) to output 1 iff 𝐹 (𝑘, 𝑚) = 𝜏 . Suppose to-
wards the sake of contradiction that there exists an adversary 𝐴 breaks
the security of this construction of a MAC. That is, 𝐴 queries 𝑆𝑘 (⋅)
𝑝𝑜𝑙𝑦(𝑛) many times and with probability 1/𝑝(𝑛) for some polynomial
𝑝 outputs (𝑚′ , 𝜏 ′ ) that she did not ask for such that 𝐹 (𝑘, 𝑚′ ) = 𝜏 ′ .
We use 𝐴 to construct an adversary 𝐴′ that can distinguish between
oracle access to a PRF and a random function by simulating the MAC
security game inside 𝐴′ . Every time 𝐴 requests the signature of some
message 𝑚, 𝐴′ returns 𝑂(𝑚). When 𝐴 returns (𝑚′ , 𝜏 ′ ) at the end
of the MAC game, 𝐴′ returns 1 if 𝑂(𝑚′ ) = 𝜏 ′ , and 0 otherwise. If
𝑂(⋅) = 𝐻(⋅) for some completely random function 𝐻(⋅), then the value
𝐻(𝑚′ ) would be completely random in {0, 1}𝑛/2 and independent of
all prior queries. Hence the probability that this value would equal 𝜏 ′
is at most 2−𝑛/2 . If instead 𝑂(⋅) = 𝐹 (𝑘, ⋅), then by the fact that 𝐴 wins
the MAC security game with probability 1/𝑝(𝑛), the adversary 𝐴′ will

output 1 with probability 1/𝑝(𝑛). That means that such an adversary


𝐴′ can distinguish between an oracle to 𝐹 (𝑘, ⋅) and an oracle to a
random function 𝐻, which gives us a contradiction.
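In code, the construction is as immediate as the proof suggests. Here
is a minimal Python sketch, where truncated HMAC-SHA256 serves
as an illustrative stand-in for the PRF 𝐹 (𝑘, ⋅) (this is not a claim that
HMAC is the PRF of Theorem 4.11):

    import hmac, hashlib

    def F(k: bytes, m: bytes) -> bytes:
        """Stand-in PRF with short (16-byte) output."""
        return hmac.new(k, m, hashlib.sha256).digest()[:16]

    def S(k: bytes, m: bytes) -> bytes:             # sign: tag = F(k, m)
        return F(k, m)

    def V(k: bytes, m: bytes, tau: bytes) -> bool:  # canonical verifier
        return hmac.compare_digest(F(k, m), tau)

    k = bytes(32)
    tag = S(k, b"attack at dawn")
    print(V(k, b"attack at dawn", tag))   # True
    print(V(k, b"attack at dusk", tag))   # False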

4.4 ARBITRARY INPUT LENGTH EXTENSION FOR MACS AND PRFS


So far we required the message to be signed 𝑚 to be no longer than
the key 𝑘 (i.e., both 𝑛 bits long). However, it is not hard to see that
this requirement is not really needed. If our message is longer, we
can divide it into blocks 𝑚1 , … , 𝑚𝑡 and sign each message (𝑖, 𝑚𝑖 )
individually. The disadvantage here is that the size of the tag (i.e.,
MAC output) will grow with the size of the message. However, even
this is not really needed. Because the tag has length 𝑛/2 for length 𝑛
messages, we can sign the tags 𝜏1 , … , 𝜏𝑡 and only output those. The
verifier can repeat this computation to verify this. We can continue
this way and so get tags of 𝑂(𝑛) length for arbitrarily long messages.
Hence in the future, whenever we need to, we can assume that our
PRFs and MACs can get inputs in {0, 1}∗ — i.e., strings of arbitrary
length.
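The following Python fragment sketches this idea (a deliberately
naive illustration, usable for instance with the S from the previous
sketch; as emphasized below, real schemes must handle block
boundaries, padding, and length binding much more carefully):

    def sign_long(S, k: bytes, m: bytes, block: int = 32) -> bytes:
        """Naive extension of a MAC S (with 16-byte tags) to long messages:
        sign (i, m_i) for each block, then recursively sign the list of tags.
        Since tags are half a block long, each round halves the data."""
        if len(m) <= block:
            return S(k, m)
        nblocks = (len(m) + block - 1) // block
        tags = b"".join(S(k, i.to_bytes(4, "big") + m[i*block:(i+1)*block])
                        for i in range(nblocks))
        return sign_long(S, k, tags, block)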
We note that this issue of length extension is actually quite a thorny
and important one in practice. The above approach is not the most
efficient way to achieve this, and there are several more practical vari-
ants in the literature (see Boneh-Shoup Sections 6.4-6.8). Also, one
needs to be very careful on the exact way one chops the message into
blocks and pads it to an integer multiple of the block size. Several at-
tacks have been mounted on schemes that performed this incorrectly.

4.5 ASIDE: NATURAL PROOFS


Pseudorandom functions play an important role in computational
complexity, where they have been used as a way to give "barrier re-
sults" for proving results such as P ≠ NP.5
5 This discussion has more to do with computational complexity than cryptography, and so can be safely skipped without harming understanding of future material in this course.
Specifically, the Natural
Proofs barrier for proving circuit lower bounds says that if strong
enough pseudorandom functions exist, then certain types of argu-
ments are bound to fail. These are arguments which come up with a
property EASY of a Boolean function 𝑓 ∶ {0, 1}^𝑛 → {0, 1} such that:

• If 𝑓 can be computed by a polynomial sized circuit, then it has the


property EASY.

• The property EASY fails to hold for a random function with high
probability.

• Checking whether EASY holds can be done in time polynomial in


the truth table size of 𝑓. That is, in 2^{𝑂(𝑛)} time.

A priori these technical conditions might not seem very “natu-


ral” but it turns out that many approaches for proving circuit lower
bounds (for restricted families of circuits) have this form. The idea
is that such approaches find a "non generic" property of easily com-
putable functions, such as finding some interesting correlations be-
tween some of the input bits and the output. These are correlations that
are unlikely to occur in random functions. The lower bound typically
follows by exhibiting a function 𝑓0 that does not have this property,
and then using that to derive that 𝑓0 cannot be efficiently computed by
this particular restricted family of circuits.
The existence of strong enough pseudorandom functions can be
shown to contradict the existence of such a property EASY, since a
pseudorandom function can be computed by a polynomial sized cir-
cuit, but it cannot be distinguished from a random function. While a
priori a pseudorandom function is only secure for polynomial time
distinguishers, under certain assumptions it might be possible to cre-
ate a pseudorandom function with a seed of size, say, 𝑛^5 , that would
be secure with respect to adversaries running in time 2^{𝑂(𝑛²)} .
5
Pseudorandom functions from pseudorandom generators and
CPA security

In this lecture we will see that the PRG conjecture implies the PRF
conjecture. We will also see how PRFs imply an encryption scheme
that is secure even when we encrypt multiple messages with the same
key.
We have seen that PRF’s (pseudorandom functions) are extremely
useful, and we’ll see some more applications of them later on. But
are they perhaps too amazing to exist? Why would someone imagine
that such a wonderful object is feasible? The answer is the following
theorem:

Theorem 5.1 — The PRF Theorem. Suppose that the PRG Conjecture is
true. Then there exists a secure PRF collection {𝑓𝑠 }𝑠∈{0,1}∗ such that
for every 𝑠 ∈ {0, 1}𝑛 , 𝑓𝑠 maps {0, 1}𝑛 to {0, 1}𝑛 .

Figure 5.1: The construction of a pseudorandom


function from a pseudorandom generator can be
illustrated by a depth 𝑛 binary tree. The root is
labeled by the seed 𝑠 and for every internal node 𝑣
labeled by a string 𝑥 ∈ {0, 1}𝑛 , we use that label 𝑥
as a seed into the PRG 𝐺 to label 𝑣’s two children. In
particular, the children of 𝑣 are labeled with 𝐺0 (𝑥)
and 𝐺1 (𝑥) respectively. The output of the function 𝑓𝑠
on input 𝑖 is the label of the 𝑖𝑡ℎ leaf counting from left
to right. Note that the numbering of leaf 𝑖 is related to
the bitstring representation of 𝑖 and the path leaf 𝑖 in
the following way: we traverse to leaf 𝑖 from the root
by reading off the 𝑛 bits of 𝑖 left to right and descend
into the left child of the current node for every 0 we
encounter and traverse right for every 1.




Proof. We describe the proof, see also Chapter 6 of Rosulek or Section


8.5 of Katz-Lindell (section 7.5 in 2nd edition) for alternative exposi-
tions.
If the PRG Conjecture is true then in particular by the length exten-
sion theorem there exists a PRG 𝐺 ∶ {0, 1}𝑛 → {0, 1}2𝑛 that maps 𝑛
bits into 2𝑛 bits. Let’s denote 𝐺(𝑠) = 𝐺0 (𝑠) ∘ 𝐺1 (𝑠) where ∘ denotes
concatenation. That is, 𝐺0 (𝑠) denotes the first 𝑛 bits and 𝐺1 (𝑠) denotes
the last 𝑛 bits of 𝐺(𝑠).
For 𝑖 ∈ {0, 1}𝑛 , we define 𝑓𝑠 (𝑖) as

𝐺_{𝑖_𝑛}(𝐺_{𝑖_{𝑛−1}}(⋯ 𝐺_{𝑖_1}(𝑠))) .

This corresponds to 𝑛 composed applications of 𝐺𝑏 for 𝑏 ∈ {0, 1}. If


the 𝑗𝑡ℎ bit of 𝑖’s binary string is 0 then the 𝑗𝑡ℎ application of the PRG is
𝐺0 otherwise it is 𝐺1 . This series of successive applications starts with
the initial seed 𝑠.
This definition directly corresponds to the depiction in Fig. 5.1,
where the successive applications of 𝐺𝑏 correspond to the recursive
labeling procedure.
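In code, the construction looks as follows (a Python sketch in which
SHA-256 is used purely as a stand-in for a length-doubling PRG; the
theorem of course assumes a generator whose security follows from
the PRG conjecture):

    import hashlib

    def G(s: bytes) -> bytes:
        """Stand-in length-doubling PRG: 32-byte seed -> 64 bytes."""
        return hashlib.sha256(s + b"L").digest() + hashlib.sha256(s + b"R").digest()

    def f(s: bytes, i: str) -> bytes:
        """GGM evaluation: walk down the tree along the bits of i."""
        x = s
        for bit in i:                              # i_1, i_2, ..., i_n, left to right
            y = G(x)
            x = y[:32] if bit == "0" else y[32:]   # G_0: first half, G_1: second half
        return x

    print(f(bytes(32), "0110").hex())  # the label of the leaf indexed by 0110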
By the definition above we can see that to evaluate 𝑓𝑠 (𝑖) we need to
evaluate the pseudorandom generator 𝑛 times on inputs of length 𝑛,
and so if the pseudorandom generator is efficiently computable then
so is the pseudorandom function. Thus, “all” that’s left is to prove that
the construction is secure and this is the heart of this proof.
I’ve mentioned before that the first step of writing a proof is con-
vincing yourself that the statement is true, but there is actually an
often more important zeroth step which is understanding what the
statement actually means. In this case what we need to prove is the
following:
We need to show that the security of the PRG 𝐺 implies the security
of the PRF ensemble {𝑓𝑠 }. Via the contrapositive, this means that
we assume that there is an adversary 𝐴 that can distinguish in time
𝑇 a black box for 𝑓𝑠 (⋅) from a black-box for a random function with
advantage 𝜖. We need to use 𝐴 to come up with an adversary 𝐷 that
can distinguish in time 𝑝𝑜𝑙𝑦(𝑇 ) an input of the form 𝐺(𝑠) (where 𝑠 is
random in {0, 1}𝑛 ) from an input of the form 𝑦 where 𝑦 is random in
{0, 1}2𝑛 with bias at least 𝜖/𝑝𝑜𝑙𝑦(𝑇 ).
Figure 5.2: In the "lazy evaluation" implementation of the black box to the adversary, we label every node in the tree only when we need it. Subsequent traversals do not reevaluate the PRG, leading to reuse of the intermediate seeds. Thus for example, two sibling leaves will correspond to a single call to 𝐺(𝑥), where 𝑥 is their parent's label, but with the left child receiving the first 𝑛 bits and the right child receiving the second 𝑛 bits of 𝐺(𝑥). In this figure check marks correspond to nodes that have been labeled and question marks to nodes that are still unlabeled.

Assume that 𝐴 as above is a 𝑇 -time adversary that wins in the "PRF
game" with advantage 𝜖. Let us consider the "lazy evaluation" imple-
mentation of the black box for 𝐴 illustrated in Fig. 5.2. That is, at every
point in time there are nodes in the full binary tree that are labeled
and nodes which we haven't yet labeled. When 𝐴 makes a query 𝑖,
this query corresponds to the path 𝑖_1 … 𝑖_𝑛 in the tree. We look at the
lowest (furthest away from the root) node 𝑣 on this path which has
been labeled by some value 𝑦, and then we continue labelling the path

from 𝑣 downwards until we reach 𝑖. In other words, we label the two


children of 𝑣 by 𝐺0 (𝑦) and 𝐺1 (𝑦), and then if the path 𝑖 involves the
first child then we label its children by 𝐺0 (𝐺0 (𝑦)) and 𝐺1 (𝐺0 (𝑦)), and
so on and so forth (see Fig. 5.3). Note that because 𝐺0 (𝑦) and 𝐺1 (𝑦)
correspond to a single call to 𝐺, regardless of whether the traversals
continues left or right (i.e. whether the current level corresponds to a
value 0 or 1 in 𝑖) we label both children at the same time.

Figure 5.3: When the adversary queries 𝑖, the oracle


takes the path from 𝑖 to the root and computes the
generator on the minimum number of internal nodes
that is needed to obtain the label of the 𝑖𝑡ℎ leaf.

A moment’s thought shows that this is just another (arguably cum-


bersome) way to describe the oracle that simply computes the map
𝑖 ↦ 𝑓𝑠 (𝑖). And so the experiment of running 𝐴 with this oracle pro-
duces precisely the same result as running 𝐴 with access to 𝑓𝑠 (⋅). Note
that since 𝐴 has running time at most 𝑇 , the number of times our or-
acle will need to label an internal node is at most 𝑇 ′ ≤ 2𝑛𝑇 (since we
label at most 2𝑛 nodes for every query 𝑖).
We now define the following 𝑇 ′ hybrids: in the 𝑗𝑡ℎ hybrid, we
run this experiment, but for the first 𝑗 times the oracle needs to label
internal nodes, it uses independent random labels. That is, for the
first 𝑗 times we label a node 𝑣, instead of letting the label of 𝑣 be 𝐺𝑏 (𝑢)
(where 𝑢 is the parent of 𝑣, and 𝑏 ∈ {0, 1} corresponds to whether 𝑣 is
the left or right child of 𝑢), we label 𝑣 by a random string in {0, 1}𝑛 .
Note that the 0𝑡ℎ hybrid corresponds to the case where the oracle
implements the function 𝑖 ↦ 𝑓𝑠 (𝑖), while in the 𝑇 ′𝑡ℎ hybrid all labels
are random and hence implements a random function. By the hybrid
argument, if 𝐴 can distinguish between the 0𝑡ℎ hybrid and the 𝑇 ′𝑡ℎ
hybrid with bias 𝜖 then there must exists some 𝑗 such that it distin-
guishes between the 𝑗𝑡ℎ hybrid (pictured in Fig. 5.4) and the 𝑗 + 1𝑠𝑡
hybrid (pictured in Fig. 5.5) with bias at least 𝜖/𝑇 ′ . We will use this 𝑗
and 𝐴 to break the pseudorandom generator.
We can now describe our distinguisher 𝐷 (see Fig. 5.6) for the
pseudorandom generator. On input a string 𝑦 ∈ {0, 1}^{2𝑛} , 𝐷 will run

Figure 5.4: In the 𝑗-th hybrid the first 𝑗 internal labels are drawn uniformly at random from 𝑈_𝑛 . All subsequent children's labels are produced in the usual way by seeding 𝐺 with the label 𝑧 of the parent and assigning the first 𝑛 bits (𝐺_0(𝑧)) to the left child and the last 𝑛 bits (𝐺_1(𝑧)) to the right child. For example, for some node 𝑣^𝐿_{𝑗−1} at the 𝑗-th level, we generate the pseudorandom string 𝐺(𝑣^𝐿_{𝑗−1}) and label the left child 𝑣^𝐿_𝑗 = 𝐺_0(𝑣^𝐿_{𝑗−1}) and the right child 𝑣^𝑅_𝑗 = 𝐺_1(𝑣^𝐿_{𝑗−1}). Note that the labeling scheme for this diagram is different from that in the previous figures. This is simply for ease of exposition; we could still index our nodes via the path reaching them from the root.

Figure 5.5: The 𝑗 + 1𝑠𝑡 hybrid differs from the 𝑗𝑡ℎ


in that the process of assigning random labels con-
tinues until the 𝑗 + 1𝑠𝑡 step as opposed to the 𝑗𝑡ℎ .
The hybrids are otherwise constructed identically.

Figure 5.6: Distinguisher 𝐷 is similar to hybrid 𝑗, in that the nodes in the first 𝑗 layers are assigned completely random labels. When evaluating along a particular path through 𝑣^𝐿_{𝑗−1}, rather than labeling the two children by applying 𝐺 to its label, it simply splits the input 𝑦 into two strings 𝑦_{0…𝑛} , 𝑦_{𝑛+1…2𝑛} . If 𝑦 is truly random, 𝐷 is identical to hybrid 𝑗 + 1. If 𝑦 = 𝐺(𝑠) for some random seed 𝑠, then 𝐷 simulates hybrid 𝑗.

𝐴 and the 𝑗-th oracle inside its belly with one difference: when the
time comes to label the 𝑗𝑡ℎ node, instead of doing this by applying the
pseudorandom generator to the label of its parent 𝑣 (which is what
should happen in the 𝑗𝑡ℎ oracle) it uses its input 𝑦 to label the two
children of 𝑣.
Now, if 𝑦 was completely random then we get exactly the distribu-
tion of the 𝑗 + 1𝑠𝑡 oracle, and hence in this case 𝐷 simulates internally
the 𝑗 + 1𝑠𝑡 hybrid. However, if 𝑦 = 𝐺(𝑠) for some randomly sampled
𝑠 ∈ {0, 1}𝑛 , though it may not be obvious at first, we actually get the
distribution of the 𝑗𝑡ℎ oracle.
The equivalence between hybrid 𝑗 and distinguisher 𝐷 under the
condition that 𝑦 = 𝐺(𝑠) is non obvious, because in hybrid 𝑗, the label
for the children of 𝑣^𝐿_{𝑗−1} was supposed to be the result of applying the
pseudorandom generator to the label of 𝑣^𝐿_{𝑗−1} and not to some other
random string (see Fig. 5.6). However, because 𝑣 was labeled before the
𝑗-th step, we know that it was actually labeled by a random string.
Moreover, since we use lazy evaluation we know that step 𝑗 is the first
time where we actually use the value of the label of 𝑣. Hence, if at
this point we resampled this label and used a completely independent
random string 𝑠 then the distributions of 𝑣^𝐿_{𝑗−1} and 𝑠 would be identical.
The key observations here are:

1. The output of 𝐴 does not directly depend on the internal labels,


but only on the labels of the leaves (since those are the only values
returned by the oracle).

2. The label for an internal vertex 𝑣 is only used once, and that is for
generating the labels for its children.

Hence the distribution of 𝑦 = 𝐺(𝑠), for 𝑠 drawn from 𝑈_𝑛 , is iden-
tical to the distribution 𝐺(𝑣^𝐿_{𝑗−1}) of the 𝑗-th hybrid, and thus if 𝐴 had
advantage 𝜖 in breaking the PRF {𝑓𝑠 } then 𝐷 will have advantage 𝜖/𝑇′
in breaking the PRG 𝐺, thus obtaining a contradiction.

R
Remark 5.2 — PRF’s in practice. While this construc-
tion reassures us that we can rely on the existence of
pseudorandom functions even on days where we re-
member to take our meds, this is not the construction
people use when they need a PRF in practice because
it is still somewhat inefficient, making 𝑛 calls to the
underlying pseudorandom generators. There are
constructions (e.g., HMAC) based on hash functions
that require stronger assumptions but can use as few
as two calls to the underlying function. We will cover
these constructions when we talk about hash functions
and the random oracle model. One can also obtain
practical constructions of PRFs from block ciphers,
which we’ll see later in this lecture.

5.1 SECURELY ENCRYPTING MANY MESSAGES - CHOSEN PLAINTEXT SECURITY
Let’s get back to our favorite task of encryption. We seemed to have
nailed down the definition of secure encryption, or did we?

P
Try to think what kind of security guarantees are not
provided by the notion of computational secrecy we
saw in Definition 2.6.

Definition 2.6 talks about encrypting a single message, but this is


not how we use encryption in the real world. Typically, Alice and Bob
(or Amazon and Boaz) setup a shared key and then engage in many
back and forth messages between one another. At first, we might
think that this issue of a single long message vs. many short ones
is merely a technicality. After all, if Alice wants to send a sequence
of messages (𝑚1 , 𝑚2 , … , 𝑚𝑡 ) to Bob, she can simply treat them as a
single long message. Moreover, the way that stream ciphers work, Alice
can compute the encryption for the first few bits of the message before
she decides what the next bits will be, and so she can send the encryption
of 𝑚1 to Bob and later the encryption of 𝑚2 . There is some truth to
this sentiment, but there are issues with using stream ciphers for
multiple messages. For Alice and Bob to encrypt messages in this

way, they must maintain a synchronized shared state. If the message 𝑚1


was dropped by the network, then Bob would not be able to decrypt
correctly the encryption of 𝑚2 .
There is another way in which treating many messages as a single
tuple is unsatisfactory. In real life, Eve might be able to have some im-
pact on what messages Alice encrypts. For example, the Katz-Lindell
book describes several instances in World War II where Allied forces
made particular military maneuvers for the sole purpose of causing the
Axis forces to send encryptions of messages of the Allies’ choosing.
To consider a more modern example, today Google uses encryption
for all of its search traffic including (for the most part) the ads that
are displayed on the page. But this means that an attacker, by pay-
ing Google, can cause it to encrypt arbitrary text of their choosing.
This kind of attack, where Eve chooses the message she wants to be
encrypted is called a chosen plaintext attack. You might think that we
are already covering this with our current definition that requires se-
curity for every pair of messages and so in particular this pair could be
chosen by Eve. However, in the case of multiple messages, we would
want to allow Eve to be able to choose 𝑚2 after she saw the encryption
of 𝑚1 .
All that leads us to the following definition, which is a strengthen-
ing of our definition of computational security:

Definition 5.3 — Chosen Plaintext Attack (CPA) secure encryption. An en-
cryption scheme (𝐸, 𝐷) is secure against chosen plaintext attack (CPA
secure) if for every polynomial time 𝐸𝑣𝑒, Eve wins with probability
at most 1/2 + 𝑛𝑒𝑔𝑙(𝑛) in the game defined below:

1. The key 𝑘 is chosen at random in {0, 1}^𝑛 and fixed.

2. Eve gets the length of the key 1^𝑛 as input.1

3. Eve interacts with 𝐸 for 𝑡 = 𝑝𝑜𝑙𝑦(𝑛) rounds as follows: in the 𝑖-th
round, Eve chooses a message 𝑚𝑖 and obtains 𝑐𝑖 = 𝐸𝑘 (𝑚𝑖 ).

4. Then Eve chooses two messages 𝑚0 , 𝑚1 , and gets 𝑐∗ = 𝐸𝑘 (𝑚𝑏 )
for 𝑏 ←𝑅 {0, 1}.

5. Eve continues to interact with 𝐸 for another 𝑝𝑜𝑙𝑦(𝑛) rounds, as
in Step 3.

6. Eve outputs 𝑏′ ∈ {0, 1} and wins if 𝑏′ = 𝑏.

1 Giving Eve the key length as a sequence of 𝑛 1's as opposed to in binary representation is a common notational convention in cryptography. It makes no difference except that it makes the input length for Eve of length 𝑛, which makes sense since we want to allow Eve to run in 𝑝𝑜𝑙𝑦(𝑛) time.

Figure 5.7: In the CPA game, Eve interacts with the encryption oracle and at the end chooses 𝑚0 , 𝑚1 , gets an encryption 𝑐∗ = 𝐸𝑘 (𝑚𝑏 ) and outputs 𝑏′ . She wins if 𝑏′ = 𝑏.

Definition 5.3 is illustrated in Fig. 5.7. Our previous notion of com-
putational secrecy (i.e., Definition 2.6) corresponds to the case that
we skip Steps 3 and 5 above. Since Steps 3 and 5 only give the ad-
versary more power (and hence she is only more likely to win), CPA
security (Definition 5.3) is stronger than computational secrecy (Def-
inition 2.6), in the sense that every CPA secure encryption (𝐸, 𝐷) is

also computationally secure. It turns out that CPA security is strictly


stronger, in the sense that without modification, our stream ciphers
cannot be CPA secure. In fact, we have a stronger, and initially some-
what surprising theorem:

Theorem 5.4 — CPA security requires randomization. There is no CPA


secure (𝐸, 𝐷) where 𝐸 is deterministic.

Proof. The proof is very simple: Eve will only use a single round of
interacting with 𝐸 where she will ask for the encryption 𝑐1 of 0ℓ . In
the second round, Eve will choose 𝑚0 = 0ℓ and 𝑚1 = 1ℓ , and get
𝑐∗ = 𝐸𝑘 (𝑚𝑏 ). She will then output 0 if and only if 𝑐∗ = 𝑐1 .

This proof is so simple that you might think it shows a problem


with the definition, but it is actually a real problem with security. If
you encrypt many messages and some of them repeat themselves, it is
possible to get significant information by seeing the repetition pattern
(cue the XKCD cartoon again, see Fig. 5.8). To avoid this issue we
need to use a randomized (or probabilistic) encryption, such that if we
encrypt the same message twice we won’t see two copies of the same
ciphertext.2 But how do we do that? Here pseudorandom functions
come to the rescue:

Figure 5.8: Insecurity of deterministic encryption

2 If the messages are guaranteed to have high entropy, which roughly means that the probability that a message repeats itself is negligible, then it is possible to have a secure deterministic private-key encryption, and this is sometimes used in practice. (Though often some sort of randomization or padding is added to ensure this property, hence in effect creating a randomized encryption.) Deterministic encryptions can sometimes be useful for applications such as efficient queries on encrypted databases. See this lecture in Dan Boneh's coursera course.

Theorem 5.5 — CPA security from PRFs. Suppose that {𝑓𝑠 } is a PRF
collection where 𝑓𝑠 ∶ {0, 1}^𝑛 → {0, 1}^ℓ . Then the following is a
CPA secure encryption scheme: 𝐸𝑠 (𝑚) = (𝑟, 𝑓𝑠 (𝑟) ⊕ 𝑚) where
𝑟 ←𝑅 {0, 1}^𝑛 , and 𝐷𝑠 (𝑟, 𝑧) = 𝑓𝑠 (𝑟) ⊕ 𝑧.

Proof. I leave it to you to verify that 𝐷𝑠 (𝐸𝑠 (𝑚)) = 𝑚. We need to show
the CPA security property. As is usual in PRF-based constructions, we
first show that this scheme would be secure if 𝑓𝑠 were an actual random
function, and then use that to derive security.
Consider the game above when played with a completely random
function and let 𝑟𝑖 be the random string chosen by 𝐸 in the 𝑖𝑡ℎ round
and 𝑟∗ the string chosen in the last round. We start with the following
simple but crucial claim:
Claim: The probability that 𝑟∗ = 𝑟𝑖 for some 𝑖 is at most 𝑇 /2𝑛 .
Proof of claim: For any particular 𝑖, since 𝑟∗ is chosen indepen-
dently of 𝑟𝑖 , the probability that 𝑟∗ = 𝑟𝑖 is 2−𝑛 . Hence the claim fol-
lows from the union bound. QED
Given this claim we know that with probability 1 − 𝑇 /2𝑛 (which is
1 − 𝑛𝑒𝑔𝑙(𝑛)), the string 𝑟∗ is distinct from any string that was chosen
before. This means that by the lazy evaluation principle, if 𝑓𝑠 (⋅) is
a completely random function then the value 𝑓𝑠 (𝑟∗ ) can be thought

of as being chosen at random in the final round independently of


anything that happened before. But then 𝑓𝑠 (𝑟∗ ) ⊕ 𝑚𝑏 amounts to
simply using the one-time pad to encrypt 𝑚𝑏 . That is, the distributions
𝑓𝑠 (𝑟∗ ) ⊕ 𝑚0 and 𝑓𝑠 (𝑟∗ ) ⊕ 𝑚1 (where we think of 𝑟∗ , 𝑚0 , 𝑚1 as fixed and
the randomness comes from the choice of the random function 𝑓𝑠 (⋅))
are both equal to the uniform distribution 𝑈ℓ over {0, 1}ℓ , and hence
Eve gets absolutely no information about 𝑏.
This shows that if 𝑓𝑠 (⋅) was a random function then Eve would win
the game with probability at most 1/2. Now if we have some efficient
Eve that wins the game with probability at least 1/2 + 𝜖 then we can
build an adversary 𝐴 for the PRF that will run this entire game with
black box access to 𝑓𝑠 (⋅) and will output 1 if and only if Eve wins. By
the argument above, there would be a difference of at least 𝜖 in the
probability it outputs 1 when 𝑓𝑠 (⋅) is random vs when it is pseudoran-
dom, hence contradicting the security property of the PRF.

5.2 PSEUDORANDOM PERMUTATIONS / BLOCK CIPHERS


Now that we have pseudorandom functions, we might get greedy and
want such functions with even more magical properties. This is where
the notion of pseudorandom permutations comes in.
Definition (Pseudorandom permutations). Let ℓ ∶ ℕ → ℕ be some
function that is polynomially bounded (i.e., there are some 0 < 𝑐 < 𝐶
such that 𝑛^𝑐 < ℓ(𝑛) < 𝑛^𝐶 for every 𝑛). A collection of functions
{𝑓𝑠 } where 𝑓𝑠 ∶ {0, 1}ℓ → {0, 1}ℓ for ℓ = ℓ(|𝑠|) is called a
pseudorandom permutation (PRP) collection if:

1. It is a pseudorandom function collection (i.e., the map 𝑠, 𝑥 ↦ 𝑓𝑠 (𝑥)
is efficiently computable and there is no efficient distinguisher
between 𝑓𝑠 (⋅) with a random 𝑠 and a random function).

2. Every function 𝑓𝑠 is a permutation of {0, 1}ℓ (i.e., a one-to-one and
onto map).

3. There is an efficient algorithm that on input 𝑠, 𝑦 returns 𝑓𝑠^−1 (𝑦).

The parameter 𝑛 = |𝑠| is known as the key length of the pseudorandom
permutation collection and the parameter ℓ = ℓ(𝑛) is known as the
input length or block length. Often, ℓ = 𝑛, and so in most cases you
can safely ignore this distinction.

P
At first look, the definition of a PRP collection might
seem not to make sense, since on one hand it requires
the map 𝑥 ↦ 𝑓𝑠 (𝑥) to be a permutation, but on the
other hand it can be shown that with high probability
a random map 𝐻 ∶ {0, 1}ℓ → {0, 1}ℓ will not be a
permutation. How can then such a collection be
pseudorandom? The key insight is that while a random
map might not be a permutation, it is not possible
to distinguish with a polynomial number of queries
between a black box that computes a random function
and a black box that computes a random permutation.
Understanding why this is the case, and why this
means that the definition is reasonable, is crucial to
getting intuition for this notion, and so I suggest you
pause now and make sure you understand these points.

As usual with a new concept, we want to know whether it is possi-


ble to achieve it and whether it is useful. The former is established by
the following theorem:

Theorem 5.6 — PRPs from PRFs. If the PRF conjecture holds (and
hence by Theorem 5.1 also if the PRG conjecture holds) then there
exists a pseudorandom permutation collection.

Figure 5.9: We build a PRP 𝑝 on 2𝑛 bits from three PRFs 𝑓𝑠1 , 𝑓𝑠2 , 𝑓𝑠3
on 𝑛 bits by letting 𝑝𝑠1,𝑠2,𝑠3 (𝑥1 , 𝑥2 ) = (𝑧1 , 𝑦2 ) where 𝑦1 = 𝑥1 ⊕ 𝑓𝑠1 (𝑥2 ),
𝑦2 = 𝑥2 ⊕ 𝑓𝑠2 (𝑦1 ) and 𝑧1 = 𝑓𝑠3 (𝑦2 ) ⊕ 𝑦1 .

Proof. Fig. 5.9 illustrates the construction of a pseudorandom permu-


tation from a pseudorandom function. The construction (known as
the Luby-Rackoff construction) uses several rounds of what is known
as the Feistel Transformation that maps a function 𝑓 ∶ {0, 1}𝑛 →
{0, 1}𝑛 into a permutation 𝑔 ∶ {0, 1}2𝑛 → {0, 1}2𝑛 using the map
(𝑥, 𝑦) ↦ (𝑥, 𝑓(𝑥) ⊕ 𝑦).
Specifically, given a PRF family {𝑓𝑠 } with 𝑛-bit keys, inputs, and
outputs, our candidate PRP family will be called {𝑝𝑠1 ,𝑠2 ,𝑠3 }. Here,
𝑝𝑠1 ,𝑠2 ,𝑠3 ∶ {0, 1}2𝑛 → {0, 1}2𝑛 is calculated on input (𝑥1 , 𝑥2 ) ∈ {0, 1}2𝑛
as follows (see Fig. 5.9):

• First, map (𝑥1 , 𝑥2 ) ↦ (𝑦1 , 𝑥2 ), where 𝑦1 = 𝑥1 ⊕ 𝑓𝑠1 (𝑥2 ).


• Next, map (𝑦1 , 𝑥2 ) ↦ (𝑦1 , 𝑦2 ), where 𝑦2 = 𝑥2 ⊕ 𝑓𝑠2 (𝑦1 ).

• Next, map (𝑦1 , 𝑦2 ) ↦ (𝑧1 , 𝑦2 ), where 𝑧1 = 𝑦1 ⊕ 𝑓𝑠3 (𝑦2 ).

• Finally, output 𝑝𝑠1,𝑠2,𝑠3 (𝑥1 , 𝑥2 ) = (𝑧1 , 𝑦2 ).

Each of the first three steps above corresponds to a single round of


the Feistel transformation, which is easily seen to be both efficiently
computable and efficiently invertible. In fact, we can efficiently
calculate 𝑝𝑠1,𝑠2,𝑠3^−1 (𝑧1 , 𝑦2 ) for an arbitrary string (𝑧1 , 𝑦2 ) ∈ {0, 1}2𝑛
by running the above three rounds of Feistel transformations in
reverse order.
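The three rounds and their inversion are short enough to write out. In
this Python sketch, HMAC-SHA256 stands in for the PRFs 𝑓𝑠1 , 𝑓𝑠2 , 𝑓𝑠3
(an assumption made only for illustration); inversion just runs the
rounds in reverse:

```python
import os, hmac, hashlib

def F(key: bytes, x: bytes) -> bytes:
    # Stand-in for an n-bit PRF (assumption: HMAC-SHA256 truncated to |x|).
    return hmac.new(key, x, hashlib.sha256).digest()[:len(x)]

def xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))

def p(s1, s2, s3, x1, x2):
    y1 = xor(x1, F(s1, x2))   # round 1: (x1, x2) -> (y1, x2)
    y2 = xor(x2, F(s2, y1))   # round 2: (y1, x2) -> (y1, y2)
    z1 = xor(y1, F(s3, y2))   # round 3: (y1, y2) -> (z1, y2)
    return z1, y2

def p_inv(s1, s2, s3, z1, y2):
    y1 = xor(z1, F(s3, y2))   # undo round 3
    x2 = xor(y2, F(s2, y1))   # undo round 2
    x1 = xor(y1, F(s1, x2))   # undo round 1
    return x1, x2

s1, s2, s3 = (os.urandom(16) for _ in range(3))
x1, x2 = os.urandom(16), os.urandom(16)
assert p_inv(s1, s2, s3, *p(s1, s2, s3, x1, x2)) == (x1, x2)
```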
Thus, the real challenge in proving Theorem 5.6 is not showing
that {𝑝𝑠1 ,𝑠2 ,𝑠3 } is a valid permutation, but rather showing that it is
pseudorandom. The details of the remainder of this proof are a bit technical,
and can be safely skipped on a first reading.
Intuitively, the argument goes like this. Consider an oracle 𝒪 for
𝑝𝑠1 ,𝑠2 ,𝑠3 that answers an adversary’s query (𝑥1 , 𝑥2 ) by carrying out the
three Feistel transformations outlined above and outputting (𝑧1 , 𝑦2 ).
First, we’ll show that with high probability, 𝒪 will never encounter
the same intermediate string 𝑦1 twice, over the course of all queries
(unless the adversary makes a duplicate query). Since the string 𝑦1 ,
calculated in Step 1, determines the input on which 𝑓𝑠2 is evaluated
in Step 2, it follows that the strings 𝑦2 calculated in Step 2 will appear
to be chosen independently and at random. In particular, they too
will be pairwise distinct with high probability. Since the string 𝑦2
is in turn passed as input to 𝑓𝑠3 in Step 3, it follows that the strings
𝑧1 encountered over the course of all queries will also appear to be
chosen independently and at random. Ultimately, this means that the
oracle’s outputs (𝑧1 , 𝑦2 ) will look like freshly independent, random
strings.
To make this reasoning precise, notice first that it suffices to estab-
lish the security of a variant of 𝑝𝑠1 ,𝑠2 ,𝑠3 in which the pseudorandom
functions 𝑓𝑠1 , 𝑓𝑠2 , and 𝑓𝑠3 used in the construction are replaced by
truly random functions ℎ1 , ℎ2 , ℎ3 ∶ {0, 1}𝑛 → {0, 1}𝑛 . Call this vari-
ant 𝑝ℎ1 ,ℎ2 ,ℎ3 . Indeed, the assumption that {𝑓𝑠 } is a PRF collection
tells us that making this change has only a negligible effect on the
output of an adversary with oracle access to 𝑝. With this in mind,
our job is to show that for every efficient adversary 𝐴, the difference
| Pr[𝐴^{𝑝ℎ1,ℎ2,ℎ3(⋅)} (1^𝑛 ) = 1] − Pr[𝐴^{𝐻(⋅)} (1^𝑛 ) = 1]| is negligible. In this
expression, the first probability is taken over the choice of the random
functions ℎ1 , ℎ2 , ℎ3 ∶ {0, 1}𝑛 → {0, 1}𝑛 used in the Feistel transfor-
mation, and the second probability is taken over the random function
𝐻 ∶ {0, 1}2𝑛 → {0, 1}2𝑛 . To simplify matters, suppose without loss of
generality that 𝐴 always makes 𝑞(𝑛) distinct queries to its oracle,
denoted (𝑥1^(1), 𝑥2^(1)), … , (𝑥1^(𝑞(𝑛)), 𝑥2^(𝑞(𝑛))) in order. Similarly, let
𝑦1^(𝑖), 𝑦2^(𝑖), 𝑧1^(𝑖) denote the intermediate strings calculated in the
three rounds of the Feistel transformation while answering the 𝑖th
query. Here, 𝑞 is a polynomial in 𝑛.
Consider the case in which the adversary 𝐴 is interacting with the
oracle for 𝑝ℎ1,ℎ2,ℎ3 , as opposed to the random oracle. Let us say that
a collision occurs at 𝑦1 if for some 1 ≤ 𝑖 < 𝑗 ≤ 𝑞(𝑛), the string 𝑦1^(𝑖)
computed while answering 𝐴's 𝑖th query coincides with the string
𝑦1^(𝑗) computed while answering 𝐴's 𝑗th query. We claim the
probability that a collision occurs at 𝑦1 is negligibly small. Indeed, if
a collision occurs at 𝑦1 , then 𝑦1^(𝑖) = 𝑦1^(𝑗) for some 𝑖 ≠ 𝑗. By the
construction of 𝑝ℎ1,ℎ2,ℎ3 , this means that 𝑥1^(𝑖) ⊕ ℎ1 (𝑥2^(𝑖)) =
𝑥1^(𝑗) ⊕ ℎ1 (𝑥2^(𝑗)). In particular, it cannot be the case that
𝑥1^(𝑖) ≠ 𝑥1^(𝑗) and 𝑥2^(𝑖) = 𝑥2^(𝑗). Since we assumed that 𝐴 makes
distinct queries to its oracle, it follows that 𝑥2^(𝑖) ≠ 𝑥2^(𝑗) and hence
that ℎ1 (𝑥2^(𝑖)) and ℎ1 (𝑥2^(𝑗)) are uniform and independent. In other
words, Pr[𝑦1^(𝑖) = 𝑦1^(𝑗)] = Pr[𝑥1^(𝑖) ⊕ ℎ1 (𝑥2^(𝑖)) = 𝑥1^(𝑗) ⊕
ℎ1 (𝑥2^(𝑗))] = 2^−𝑛 . Taking a union bound over all choices of 𝑖 and 𝑗,
we see that the probability of a collision at 𝑦1 is at most 𝑞(𝑛)^2 /2^𝑛 ,
which is negligible.
Next, define a collision at 𝑦2 by a pair of queries 1 ≤ 𝑖 < 𝑗 ≤ 𝑞(𝑛)
such that 𝑦2^(𝑖) = 𝑦2^(𝑗). We argue that the probability of a collision
at 𝑦2 is also negligible, provided that we condition on the
overwhelmingly likely event that no collision occurs at 𝑦1 . Indeed, if
𝑦1^(𝑖) ≠ 𝑦1^(𝑗) for all 𝑖 ≠ 𝑗, then ℎ2 (𝑦1^(1)), … , ℎ2 (𝑦1^(𝑞(𝑛))) are
distributed independently and uniformly at random. In particular, we
have Pr[𝑦2^(𝑖) = 𝑦2^(𝑗) ∣ no collision at 𝑦1 ] = 2^−𝑛 , which is
negligible even after taking a union bound over all 𝑖 and 𝑗. The same
argument applied to the third round of the Feistel transformation
similarly shows that, conditioned on the overwhelmingly likely event
that no collision occurs at 𝑦1 or 𝑦2 , the strings 𝑧1^(1), … , 𝑧1^(𝑞(𝑛))
are also distributed as fresh, independent, random strings. At this
point, we've shown that the adversary cannot distinguish the outputs
(𝑧1^(1), 𝑦2^(1)), … , (𝑧1^(𝑞(𝑛)), 𝑦2^(𝑞(𝑛))) of the oracle for 𝑝ℎ1,ℎ2,ℎ3
from the outputs of a random oracle unless an event with negligibly
small probability occurs. We conclude that the collection {𝑝ℎ1,ℎ2,ℎ3 },
and hence our original collection {𝑝𝑠1,𝑠2,𝑠3 }, is a secure PRP
collection.
For more details regarding this proof, see Section 4.5 in Boneh
Shoup or Section 8.6 (7.6 in 2nd ed) in Katz-Lindell, whose proof was
used as a model for ours.

R
Remark 5.7 — How many Feistel rounds?. The construc-
tion in the proof of Theorem 5.6 constructed a PRP 𝑝
by performing 3 rounds of the Feistel transformation
with a known PRF 𝑓. It is an interesting exercise to
try to show that doing just 1 or 2 rounds of the Feis-
tel transformation does not suffice to achieve a PRP.


Hint: consider an adversary that makes queries of the form
(𝑥1 , 𝑥2 ) where 𝑥2 is held fixed and 𝑥1 is varied.

The more common name for a pseudorandom permutation is block


cipher (though typically block ciphers are expected to meet additional
security properties on top of being PRPs). The constructions for block
ciphers used in practice don’t follow the construction of Theorem 5.6
(though they use some of the ideas) but have a more ad-hoc nature.
One of the first modern block ciphers was the Data Encryption
Standard (DES) constructed by IBM in the 1970’s. It is a fairly good
cipher: to this day, as far as we know, it provides a pretty good num-
ber of security bits compared to its key length. The trouble is that its
key is only 56 bits long, which is no longer outside the reach of mod-
ern computing power. (It turns out that subtle variants of DES are far
less secure and fall prey to a technique known as differential crypt-
analysis; the IBM designers of DES were aware of this technique but
kept it secret at the behest of the NSA.)
Between 1997 and 2001, the U.S. National Institute of Standards and
Technology (NIST) ran a competition to replace DES which resulted
in the adoption of the block cipher Rijndael as the new advanced
encryption standard (AES). It has a block size (i.e., input length) of
128 bits and a key size (i.e., seed length) of 128, 192, or 256 bits.
The actual construction of AES (or DES for that matter) is not ex-
tremely illuminating, but let us say a few words about the general
principle behind many block ciphers. They are typically constructed
by repeating one after the other a number of very simple permuta-
tions (see Fig. 5.10). Each such iteration is called a round. If there are
𝑡 rounds, then the key 𝑘 is typically expanded into a longer string,
which we think of as a 𝑡 tuple of strings (𝑘1 , … , 𝑘𝑡 ) via some pseu-
dorandom generator known as the key scheduling algorithm. The 𝑖-th
string in the tuple is known as the round key and is used in the 𝑖𝑡ℎ
round. Each round is typically composed of several components:
there is a “key mixing component” that performs some simple permu-
tation based on the key (often as simply as XOR’ing the key), there is
a “mixing component” that mixes the bits of the block so that bits that
were initially nearby don’t stay close to one another, and then there is
some non-linear component (often obtained by applying some simple
non-linear functions known as “S boxes” to each small block of the
input) that ensures that the overall cipher will not be an affine func-
tion. Each one of these operations is an easily reversible operation,
and hence decrypting the cipher simply involves running the rounds
backwards.

Figure 5.10: A typical round of a block cipher; 𝑘𝑖 is the 𝑖th round key,
𝑥𝑖 is the block before the 𝑖th round and 𝑥𝑖+1 is the block at the end of
this round.
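To illustrate the round structure (and only that), here is a toy 16-bit
substitution-permutation cipher in Python. The S-box values, the bit
transposition, and the hash-based key schedule are all made-up
stand-ins chosen for brevity; this is emphatically not a secure cipher,
just a sketch of the key mixing / mixing / non-linear pattern described
above:

```python
import hashlib

SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]   # arbitrary 4-bit S-box
INV_SBOX = [SBOX.index(i) for i in range(16)]      # invertible, so decryptable

def sbox_layer(x: int) -> int:
    # non-linear component: apply the S-box to each 4-bit nibble
    return sum(SBOX[(x >> (4 * i)) & 0xF] << (4 * i) for i in range(4))

def mix(x: int) -> int:
    # mixing component: transpose bit positions in a 4x4 grid (an involution)
    y = 0
    for i in range(16):
        y |= ((x >> i) & 1) << ((i % 4) * 4 + i // 4)
    return y

def key_schedule(key: int, rounds: int) -> list[int]:
    # toy key schedule: expand a 16-bit key into round keys via hashing
    ks, state = [], key.to_bytes(2, "big")
    for _ in range(rounds + 1):
        state = hashlib.sha256(state).digest()
        ks.append(int.from_bytes(state[:2], "big"))
    return ks

def encrypt_block(key: int, block: int, rounds: int = 4) -> int:
    ks = key_schedule(key, rounds)
    x = block
    for r in range(rounds):
        x ^= ks[r]          # key mixing (XOR the round key)
        x = sbox_layer(x)   # non-linearity
        x = mix(x)          # diffusion
    return x ^ ks[rounds]   # final key whitening
```

Since every step is invertible (mix is its own inverse and INV_SBOX
undoes the S-box), decryption is just the rounds run backwards.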

5.3 ENCRYPTION MODES


How do we use a block cipher to actually encrypt traffic? Well, we
could use it as a PRF in the construction above, but in practice people
use other ways.³

³ Partially this is because in the above construction we had to encode
a plaintext of length 𝑛 with a ciphertext of length 2𝑛, meaning an
overhead of 100 percent in the communication.

The most natural approach would be that to encrypt a message
𝑚, we simply use 𝑝𝑠 (𝑚) where {𝑝𝑠 } is the PRP/block cipher. This is
𝑚, we simply use 𝑝𝑠 (𝑚) where {𝑝𝑠 } is the PRP/block cipher. This is
known as the electronic code book (ECB) mode of a block cipher (see
Fig. 5.11). Note that we can easily decrypt since we can compute
𝑝𝑠−1 (𝑚). If the PRP {𝑝𝑠 } only accepts inputs of a fixed length ℓ, we can
use ECB mode to encrypt a message 𝑚 whose length is a multiple of
ℓ by writing 𝑚 = (𝑚1 , 𝑚2 , … , 𝑚𝑡 ), where each block 𝑚𝑖 has length ℓ,
and then encrypting each block 𝑚𝑖 separately. The ciphertext output
by this encryption scheme is (𝑝𝑠 (𝑚1 ), … , 𝑝𝑠 (𝑚𝑡 )). A major drawback
of ECB mode is that it is a deterministic encryption scheme and hence
cannot be CPA secure. Moreover, this is actually a real problem of se-
curity on realistic inputs (see Fig. 5.12), so ECB mode should never be
used.
Figure 5.11: In the Electronic Codebook (ECB) mode, every message is
encrypted deterministically and independently.

A more secure way to use a block cipher to encrypt is the cipher
block chaining (CBC) mode. The idea of cipher block chaining is to
encrypt the blocks of a message 𝑚 = (𝑚1 , … , 𝑚𝑡 ) sequentially. To
encrypt the first block 𝑚1 , we XOR 𝑚1 with a random string known
as the initialization vector, or IV, before applying the block cipher 𝑝𝑠 .
To encrypt one of the later blocks 𝑚𝑖 , where 𝑖 > 1, we XOR 𝑚𝑖 with
the encryption of 𝑚𝑖−1 before applying the block cipher 𝑝𝑠 . Formally,
the ciphertext consists of the tuple (IV, 𝑐1 , … , 𝑐𝑡 ), where IV is chosen
uniformly at random and 𝑐𝑖 = 𝑝𝑠 (𝑐𝑖−1 ⊕ 𝑚𝑖 ) for 1 ≤ 𝑖 ≤ 𝑡 (we use
the convention that 𝑐0 = IV). This encryption process is depicted
in Fig. 5.13. In order to decrypt (IV, 𝑐1 , … , 𝑐𝑡 ), we simply calculate
𝑚𝑖 = 𝑝𝑠−1 (𝑐𝑖 ) ⊕ 𝑐𝑖−1 for 1 ≤ 𝑖 ≤ 𝑡. Note that if we lose the block 𝑐𝑖 to
traffic in the CBC mode, then we are unable to decrypt the next block
𝑐𝑖+1 , but we can recover from that point onwards.

Figure 5.12: An encryption of the Linux penguin (left image) using
ECB mode (middle image) vs CBC mode (right image). The ECB
encryption is insecure as it reveals much structure about the original
image. Image taken from Wikipedia.

On the one hand, CBC mode is vastly superior to a simple
electronic codebook since CBC mode with a random IV is CPA secure
(proving this is an excellent exercise). On the other hand, CBC mode
suffers from the drawback that the encryption process cannot be
parallelized: the ciphertext block 𝑐𝑖 must be computed before 𝑐𝑖+1 .
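In code, CBC encryption and decryption look as follows. This is a
Python sketch in which the block cipher 𝑝𝑠 is abstracted as a pair of
callables and the message is assumed to be already split into full
blocks (the toy "permutation" in the demo is byte reversal, used only
so the example runs):

```python
import os

def cbc_encrypt(p, blocks, block_len=16):
    # p: the block cipher p_s, taking and returning block_len bytes
    iv = os.urandom(block_len)
    cs, prev = [], iv
    for m in blocks:
        prev = p(bytes(x ^ y for x, y in zip(prev, m)))  # c_i = p_s(c_{i-1} XOR m_i)
        cs.append(prev)
    return iv, cs

def cbc_decrypt(p_inv, iv, cs):
    ms, prev = [], iv
    for c in cs:
        ms.append(bytes(x ^ y for x, y in zip(p_inv(c), prev)))  # m_i = p_s^{-1}(c_i) XOR c_{i-1}
        prev = c
    return ms

# demo with a completely insecure placeholder permutation (an involution):
p = lambda b: bytes(reversed(b)); p_inv = p
iv, cs = cbc_encrypt(p, [b"A" * 16, b"B" * 16])
assert cbc_decrypt(p_inv, iv, cs) == [b"A" * 16, b"B" * 16]
```

Note how decrypting block 𝑖 needs only 𝑐𝑖−1 and 𝑐𝑖 , which is why
losing one ciphertext block lets you recover from the next one
onwards.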
In the output feedback (OFB) mode we first encrypt the all zero string
using CBC mode to get a sequence (𝑦1 , 𝑦2 , …) of pseudorandom
outputs that we can use as a stream cipher. To transmit a message
𝑚 ∈ {0, 1}∗ , we send the XOR of 𝑚 with the bits output by this stream
cipher, along with the IV used to generate the sequence. The receiver
can decrypt a ciphertext (IV, 𝑐) by first using IV to recover (𝑦1 , 𝑦2 , …),
and then taking the XOR of 𝑐 with the appropriate number of bits
from this sequence.

Figure 5.13: In the Cipher-Block-Chaining (CBC) mode, the encryption
of the previous message is XOR'ed into the current message prior to
encrypting. The first message is XOR'ed with an initialization vector
(IV) that, if chosen randomly, ensures CPA security.

Like CBC mode, OFB mode is CPA secure when
IV is chosen at random. Some advantages of OFB mode over CBC
mode include the ability for the sender to precompute the sequence
(𝑦1 , 𝑦2 , …) well before the message to be encrypted is known, as well
as the fact that the underlying function 𝑝𝑠 used to generate (𝑦1 , 𝑦2 , …)
only needs to be a PRF (not necessarily a PRP).
Perhaps the simplest mode of operation is counter (CTR) mode
where we convert a block cipher to a stream cipher by using the
stream 𝑝𝑠 (IV), 𝑝𝑠 (IV + 1), 𝑝𝑠 (IV + 2), … where IV is a random string
in {0, 1}𝑛 which we identify with [2𝑛 ] (and perform addition modulo
2𝑛 ). That is, to encrypt a message 𝑚 = (𝑚1 , … , 𝑚𝑡 ), we choose IV at
random, and output (IV, 𝑐1 , … , 𝑐𝑡 ), where 𝑐𝑖 = 𝑝𝑠 (IV + 𝑖) ⊕ 𝑚𝑖 for
1 ≤ 𝑖 ≤ 𝑡. Decryption is performed similarly. For a modern block
cipher, CTR mode is no less secure than CBC or OFB, and in fact of-
fers several advantages. For example, CTR mode can easily encrypt
and decrypt blocks in parallel, unlike CBC mode. In addition, CTR
mode only needs to evaluate 𝑝𝑠 once to decrypt any single block of the
ciphertext, unlike OFB mode.
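A Python sketch of CTR mode, again abstracting the block cipher as a
callable (the SHA256-based stand-in in the demo is only there so the
example runs). Note that each block is processed independently, and
decryption is the identical computation as encryption:

```python
import os, hashlib

def ctr_xor(p, iv, blocks, block_len=16):
    # core of CTR mode: XOR the i-th block with p_s(IV + i);
    # encryption and decryption are the same operation
    out = []
    for i, b in enumerate(blocks, start=1):
        ctr = ((iv + i) % 2**(8 * block_len)).to_bytes(block_len, "big")
        out.append(bytes(x ^ y for x, y in zip(p(ctr), b)))
    return out

def ctr_encrypt(p, blocks, block_len=16):
    iv = int.from_bytes(os.urandom(block_len), "big")
    return iv, ctr_xor(p, iv, blocks, block_len)

def ctr_decrypt(p, iv, cblocks, block_len=16):
    return ctr_xor(p, iv, cblocks, block_len)

p = lambda b: hashlib.sha256(b).digest()[:16]   # stand-in; a PRF suffices here
iv, cs = ctr_encrypt(p, [b"A" * 16, b"B" * 16])
assert ctr_decrypt(p, iv, cs) == [b"A" * 16, b"B" * 16]
```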
A fairly comprehensive study of the different modes of block ci-
phers is in this document by Rogaway. His conclusion is that if we
simply consider CPA security (as opposed to the stronger notions
of chosen ciphertext security we’ll see in the next lecture) then counter
mode is the best choice, but CBC, OFB and CFB are widely imple-
mented due to legacy reasons. ECB should not be used (except as a
building block as part of a construction achieving stronger security).

5.4 OPTIONAL, ASIDE: BROADCAST ENCRYPTION


At the beginning of this chapter, we saw the proof of Theorem 5.1,
which states that the PRG Conjecture implies the existence of a secure
PRF collection. At the heart of this proof was a rather clever construc-
tion based on a binary tree. As it turns out, similar tree constructions
have been used time and again to solve many other problems in cryp-
tography. In this section, we will discuss just one such application of
these tree constructions, namely broadcast encryption.
Let’s put ourselves in the shoes of Hollywood executives facing
the following problem: we’ve just released a new movie for sale (in
the form of a download or a Blu-ray disc), and we’d like to prevent it
from being pirated. On the one hand, consumers who’ve purchased
a copy of the movie should be able to watch it on certain approved,
standalone devices such as TVs and Blu-ray players without needing
an external internet connection. On the other hand, to minimize the
risk of piracy, these consumers should not have access to the movie
data itself.

One way to protect the movie data, which we model as a string 𝑥,


is to provide consumers with a secure encryption 𝐸𝑘 (𝑥) of the data.
Although the secret key 𝑘 used to encrypt the data is hidden from
consumers, it is provided to device manufacturers so that they can
embed it in their TVs and Blu-ray players in some secure, tamper-
resistant manner. As long as the key 𝑘 is never leaked to the public,
this system ensures that only approved devices can decrypt and play
a consumer’s copy of the movie. For this reason, we will sometimes
refer to 𝑘 as the device key. This setup is depicted in Fig. 5.14.

Figure 5.14: The problem setup for broadcast encryption.

Unfortunately, if we were to implement this scheme exactly as


written, it would almost certainly be broken in a matter of days. After
all, as soon as even a single device is hacked, the device key 𝑘 would
be revealed. This would allow the public to access our movie’s data, as
well as the data for all future movies we release for these devices! This
latter consequence is one that we would certainly want to avoid, and
doing so requires the notion of distinct, revocable keys:

Definition 5.8 — Broadcast Encryption Scheme. For our purposes, a broad-
cast encryption scheme consists of:

• A set of 𝑚 distinct devices (or device manufacturers), each of


which has access to one of the 𝑛-bit device keys 𝑘1 , … , 𝑘𝑚 .

• A decryption algorithm 𝐷 that receives as input a ciphertext 𝑦


and a key 𝑘𝑖 .

• An encryption algorithm 𝐸 that receives as input a plaintext 𝑥,


a key 𝑘𝑚𝑎𝑠𝑡𝑒𝑟 , and a revocation set 𝑅 ⊆ [𝑚] of devices (or device
manufacturers) that are no longer to be trusted.

Intuitively, a broadcast encryption scheme is secure if 𝐷𝑘𝑖 can suc-


cessfully recover 𝑥 from 𝐸𝑘𝑚𝑎𝑠𝑡𝑒𝑟 ,𝑅 (𝑥) whenever 𝑖 ∉ 𝑅, but fails to
do so whenever 𝑖 ∈ 𝑅. In our example of movie piracy, such an en-
cryption scheme would allow us to revoke certain device keys 𝑘𝑖 when
we find out that they have been leaked. To revoke a key 𝑘𝑖 , we would
simply include 𝑖 ∈ 𝑅 when encrypting all future movies. Doing so
prevents 𝑘𝑖 from being used to decrypt these movies. Crucially, revok-
ing the key 𝑘𝑖 of the hacked device 𝑖 doesn’t prevent a secure device
𝑗 ≠ 𝑖 from continuing to perform decryption on future movie releases;
this is exactly what we want in our system.
For the sake of brevity, we will not provide a formal definition of
security for broadcast encryption schemes, although this can and has
been done. Instead, in the remainder of this section, we will describe
a couple examples of broadcast encryption schemes, one of which
makes clever use of a tree construction, as promised.
The simplest construction of a broadcast encryption scheme in-
volves letting 𝑘𝑚𝑎𝑠𝑡𝑒𝑟 = (𝑘1 , … , 𝑘𝑚 ) be the collection of all device keys
and letting 𝐸𝑘𝑚𝑎𝑠𝑡𝑒𝑟 ,𝑅 (𝑥) be the concatenation over all 𝑖 ∉ 𝑅 of a secure
encryption 𝐸𝑘𝑖 (𝑥). Device 𝑖 performs decryption by looking up the
relevant substring 𝐸𝑘𝑖 (𝑥) of the ciphertext and decrypting it with 𝑘𝑖 .
Intuitively, with this scheme, if 𝑥 represents our movie data and there
are 𝑚 ≈ one million devices, then 𝐸𝑘𝑚𝑎𝑠𝑡𝑒𝑟 ,𝑅 (𝑥) is just an encryption
of one million copies of the movie (one for each device key). Revok-
ing the key 𝑘𝑖 amounts to only encrypting 999, 999 copies of all future
movies, so that device 𝑖 can no longer perform decryption.
Clearly, this simple solution to the broadcast encryption prob-
lem has two serious inefficiencies: the length of the master key is
𝑂(𝑛𝑚), and the length of each encryption is 𝑂(|𝑥|𝑚). One way to
address the former problem is to use a key derivation function. That
is, we can shorten the master key by choosing a fixed PRF collection
{𝑓𝑘 }, and calculating each device key 𝑘𝑖 by the rule 𝑘𝑖 = 𝑓𝑘𝑚𝑎𝑠𝑡𝑒𝑟 (𝑖).
The latter problem can be addressed using a technique known as
hybrid encryption. In hybrid encryption, we encrypt 𝑥 by first choos-
ing an ephemeral key 𝑘̂ ←𝑅 {0, 1}𝑛 , encrypting 𝑘̂ using each device
key 𝑘𝑖 where 𝑖 ∉ 𝑅, and then outputting the concatenation of these
strings 𝐸𝑘𝑖 (𝑘)̂ , along with a single encryption 𝐸𝑘̂ (𝑥) of the movie
using the ephemeral key. Incorporating these two optimizations reduces
the length of 𝑘𝑚𝑎𝑠𝑡𝑒𝑟 to 𝑂(𝑛) and the length of each encryption to
𝑂(𝑛𝑚 + |𝑥|).
Figure 5.15: A tree based construction of broadcast encryption with
revocable keys.

It turns out that we can construct a broadcast encryption scheme


with even shorter ciphertexts by considering a tree of keys (see
Fig. 5.15). The root of this tree is labeled 𝑘∅ , its children are 𝑘0 and 𝑘1 ,
their children are 𝑘00 , 𝑘01 , 𝑘10 , 𝑘11 , and so on. The depth of the tree is
log2 𝑚, and the value of each key in the tree is decided uniformly at
random, or by applying a key derivation function to a string 𝑘𝑚𝑎𝑠𝑡𝑒𝑟 .
Each device 𝑖 receives all the keys on the path from the root to the
𝑖th leaf. For example, if 𝑚 = 8, then device 011 receives the keys
𝑘∅ , 𝑘0 , 𝑘01 , 𝑘011 .
To encrypt a message 𝑥, we carry out the following procedure:
initially, when no keys have been revoked, we encrypt 𝑥 using an
ephermal key 𝑘̂ (as described above) and encrypt 𝑘̂ with a single
device key 𝑘∅ . This is sufficient since all devices have access to 𝑘∅ . In
order to add a hacked device 𝑖 to the revocation set, we discard all
log2 𝑚 keys belonging to device 𝑖, which comprise a root-to-leaf path
in the tree. Instead of using these keys, we will make sure to encrypt
all future 𝑘̂ 's using the siblings of the vertices along this path. Doing
so ensures that (1) device 𝑖 can no longer decrypt secure content and
(2) every device 𝑗 ≠ 𝑖 can continue to decrypt content using at least
one of the keys along the path from the root to the 𝑗th leaf. With this
scheme, the total length of a ciphertext is only 𝑂(𝑛|𝑅| log2 𝑚 + |𝑥|) bits,
where |𝑅| is the number of devices revoked so far. When |𝑅| is small,
this bound is much better than what we previously achieved without a
tree-based construction.
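The bookkeeping for this tree scheme is simple enough to spell out in
code. In the following Python sketch (the function names and the
PRF-based key derivation are illustrative assumptions), a node is
identified with the bit string labeling it, a device holds the keys on
its root-to-leaf path, and cover returns the nodes under whose keys
future ephemeral keys are encrypted:

```python
import hmac, hashlib

def node_key(k_master: bytes, node: str) -> bytes:
    # derive a tree node's key from the master key via a PRF (assumption)
    return hmac.new(k_master, node.encode(), hashlib.sha256).digest()

def device_keys(k_master: bytes, leaf: str):
    # device `leaf` (e.g. "011") gets every key on its path from the root
    return [node_key(k_master, leaf[:i]) for i in range(len(leaf) + 1)]

def cover(revoked: set[str], depth: int) -> list[str]:
    # nodes whose keys encrypt the ephemeral key after revoking `revoked`:
    # the siblings hanging off the revoked root-to-leaf paths
    if not revoked:
        return [""]                      # everyone still shares the root key
    on_path = {leaf[:i] for leaf in revoked for i in range(len(leaf) + 1)}
    cov = []
    for node in on_path:
        for bit in "01":
            child = node + bit
            if len(child) <= depth and child not in on_path:
                cov.append(child)
    return sorted(cov)

# With m = 8 devices and device 011 revoked, the cover is three nodes:
assert cover({"011"}, 3) == ["00", "010", "1"]
```

Every non-revoked leaf has exactly one ancestor in the cover, and the
cover has at most |𝑅| log2 𝑚 nodes, matching the ciphertext length
bound above.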

5.5 READING COMPREHENSION EXERCISES


I recommend students do the following exercises after reading the
lecture. They do not cover all material, but can be a good way to check
your understanding.
Exercise 5.1 Let (𝐸, 𝐷) be the encryption scheme that we saw in Lec-
ture 2 where 𝐸𝑘 (𝑚) = 𝐺(𝑘) ⊕ 𝑚 where 𝐺(⋅) is a pseudorandom
generator. Is this scheme CPA secure?

a. No it is never CPA secure.


b. It is always CPA secure.
c. It is sometimes CPA secure and sometimes not, depending on the
properties of the PRG 𝐺

Exercise 5.2 Consider the proof constructing PRFs from PRGs. Up


to an order of magnitude, how many invocations of the underlying
pseudorandom generator does the pseudorandom function collection
make when queried on an input 𝑖 ∈ {0, 1}𝑛 ?

a. 𝑛
b. 𝑛2
c. 1
d. 2𝑛

Exercise 5.3 In the following we identify a block cipher with a pseudo-
random permutation (PRP) collection. Which of these statements is
true:

a. Every PRP collection is also a PRF collection


b. Every PRF collection is also a PRP collection
c. If {𝑓𝑠 } is a PRP collection then the encryption scheme 𝐸𝑠 (𝑚) =
𝑓𝑠 (𝑚) is a CPA secure encryption scheme.
d. If {𝑓𝑠 } is a PRF collection then the encryption scheme 𝐸𝑠 (𝑚) =
𝑓𝑠 (𝑚) is a CPA secure encryption scheme.


6
Chosen Ciphertext Security

6.1 SHORT RECAP


Let’s start by reviewing what we have learned so far:

• We can mathematically define security for encryption schemes.


A natural definition is perfect secrecy: no matter what Eve does,
she can’t learn anything about the plaintext that she didn’t know
before. Unfortunately this requires the key to be as long as the
message, thus placing a severe limitation on the usability of it.

• To get around this, we need to consider computational consid-


erations. A basic object is a pseudorandom generator and we con-
sidered the PRG Conjecture which stipulates the existence of an
efficiently computable function 𝐺 ∶ {0, 1}𝑛 → {0, 1}𝑛+1 such that
𝐺(𝑈𝑛 ) ≈ 𝑈𝑛+1 (where 𝑈𝑚 denotes the uniform distribution on
{0, 1}𝑚 and ≈ denotes computational indistinguishability).¹

¹ The PRG conjecture is the name we use in this course. In the
literature this is known as the conjecture of the existence of
pseudorandom generators, and through the work of Håstad,
Impagliazzo, Levin and Luby (HILL) it is known to be equivalent to
the existence of one way functions; see Vadhan, Chapter 7.

• We showed that the PRG conjecture implies a pseudorandom gen-
erator of any polynomial output length which in particular via the
stream cipher construction implies a computationally secure en-
cryption with plaintext arbitrarily larger than the key. (The only re-
striction is that the plaintext is of polynomial size which is needed
anyway if we want to actually be able to read and write it.)

• We then showed that the PRG conjecture actually implies a stronger


object known as a pseudorandom function (PRF) function collection:
this is a collection {𝑓𝑠 } of functions such that if we choose 𝑠 at
random and fix it, and give an adversary a black box computing
𝑖 ↦ 𝑓𝑠 (𝑖) then she can't tell the difference between this and a black-
box computing a random function.²

² This was done by Goldreich, Goldwasser and Micali.

• Pseudorandom functions turn out to be useful for identification pro-


tocols, message authentication codes and this strong notion of security


of encryption known as chosen plaintext attack (CPA) security where


we are allowed to encrypt many messages of Eve’s choice and still re-
quire that the next message hides all information except for what
Eve already knew before.

6.2 GOING BEYOND CPA


It may seem that we have finally nailed down the security definition
for encryption. After all, what could be stronger than allowing Eve
unfettered access to the encryption function? Clearly an encryption
satisfying this property will hide the contents of the message in all
practical circumstances. Or will it?

P
Please stop and play an ominous sound track at this
point.

6.2.1 Example: The Wired Equivalence Privacy (WEP)


The Wired Equivalence Privacy (WEP) protocol is perhaps one of the
most inaccurately named protocols of all time. It was invented in
1999 for the purpose of securing Wi-Fi networks so that they would
have virtually the same level of security as wired networks, but al-
ready early on several security flaws were pointed out. In particular
in 2001, Fluhrer, Mantin, and Shamir showed how the RC4 flaws we
mentioned in prior lecture can be used to completely break WEP in
less than one minute. Yet, the protocol lingered on and for many years
after was still the most widely used WiFi encryption protocol as many
routers had it as the default option. In 2007 the WEP was blamed for
a hack stealing 45 million credit card numbers from T.J. Maxx. In 2012
(after 11 years of attacks!) it was estimated that it is still used in about
a quarter of encrypted wireless networks, and in 2014 it was still the
default option on many Verizon home routers. It is still (!) used in
some routers, see Fig. 6.1. This is a great example of how hard it is to
remove insecure protocols from practical usage (and so how impor-
tant it is to get these protocols right).
Here we will talk about a different flaw of WEP that is in fact
shared by many other protocols, including the first versions of the
secure socket layer (SSL) protocol that is used to protect all encrypted
web traffic.
To avoid superfluous details we will consider a highly abstract (and
somewhat inaccurate) version of WEP that still demonstrates our
main point. In this protocol Alice (the user) sends to Bob (the access
point) an IP packet that she wants routed somewhere on the internet.
Figure 6.1: WEP usage over time according to Wigle.net. Despite
having documented security issues since 2001 and being officially
deprecated since 2004, WEP continued to be the most popular WiFi
encryption protocol up to 2012, and at the time of writing, it is still
used by a non-trivial number of devices (though see this stackoverflow
answer for more).

Thus we can think of the message Alice sends to Bob as a string


𝑚 ∈ {0, 1}ℓ of the form 𝑚 = 𝑚1 ‖𝑚2 where 𝑚1 is the IP address this
packet needs to be routed to and 𝑚2 is the actual message that needs
to be delivered. In the WEP protocol, the message that Alice sends
to Bob has the form 𝐸𝑘 (𝑚‖CRC(𝑚)) (where ‖ denotes concatenation
and CRC(𝑚) is some cyclic redundancy check). A CRC is some func-
tion mapping {0, 1}𝑛 to {0, 1}𝑡 which is meant to enable detection of
errors in typing or communication. The idea is that if a message 𝑚 is
mistyped into 𝑚′ , then it is very likely that CRC(𝑚) ≠ CRC(𝑚′ ). It is
similar to the checksum digits used in credit card numbers and many
other cases. Unlike a message authentication code, a CRC does not
have a secret key and is not secure against adversarial perturbations.
The actual encryption WEP used was RC4, but for us it doesn’t
really matter. What does matter is that the encryption has the form
𝐸𝑘 (𝑚′ ) = 𝑝𝑎𝑑 ⊕ 𝑚′ where 𝑝𝑎𝑑 is computed as some function of the
key. In particular the attack we will describe works even if we use our
stronger CPA secure PRF-based scheme where 𝑝𝑎𝑑 = 𝑓𝑘 (𝑟) for some
random (or counter) 𝑟 that is sent out separately.
Now the security of the encryption means that an adversary seeing
the ciphertext 𝑐 = 𝐸𝑘 (𝑚‖CRC(𝑚)) will not be able to know 𝑚, but
since this is traveling over the air, the adversary could “spoof” the
signal and send a different ciphertext 𝑐′ to Bob. In particular, if the
adversary knows the IP address 𝑚1 that Alice was using (for
example, the adversary can guess that Alice is probably one of the
billions of people that visit the website boazbarak.org on a regular
basis) then she can XOR the ciphertext with a string of her choosing
and hence convert the ciphertext 𝑐 = 𝑝𝑎𝑑 ⊕ (𝑚1 ‖𝑚2 ‖CRC(𝑚1 , 𝑚2 ))
into the ciphertext 𝑐′ = 𝑐 ⊕ 𝑥 where 𝑥 = 𝑥1 ‖𝑥2 ‖𝑥3 is computed so that
𝑥1 ⊕ 𝑚1 is equal to the adversary’s own IP address!
So, the adversary doesn't need to decrypt the message: by spoofing
the ciphertext she can ensure that Bob (who is an access point,
and whose job is to decrypt and then deliver packets) simply delivers
it unencrypted straight into her hands. One issue is that if Eve
modifies 𝑚1 then it is unlikely that the CRC code will still check out, and
hence Bob would reject the packet. However, CRC 32 - the CRC al-
gorithm used by WEP - is linear modulo 2, that is CRC(𝑥 ⊕ 𝑥′ ) =
CRC(𝑥) ⊕ CRC(𝑥′ ). This means that if the original ciphertext 𝑐
was an encryption of the message 𝑚 = 𝑚1 ‖𝑚2 ‖CRC(𝑚1 , 𝑚2 ) then
𝑐′ = 𝑐 ⊕ (𝑥1 ‖0𝑡 ‖CRC(𝑥1 ‖0𝑡 )) will be an encryption of the message
𝑚′ = (𝑚1 ⊕ 𝑥1 )‖𝑚2 ‖CRC((𝑥1 ⊕ 𝑚1 )‖𝑚2 ) (where 0𝑡 denotes a string of
zeroes of the same length 𝑡 as 𝑚2 , and hence 𝑚2 ⊕ 0𝑡 = 𝑚2 ). There-
fore by XOR’ing 𝑐 with 𝑥1 ‖0𝑡 ‖CRC(𝑥1 ‖0𝑡 ), the adversary Mallory can
ensure that Bob will deliver the message 𝑚2 to the IP address 𝑚1 ⊕ 𝑥1
of her choice (see Fig. 6.2).

Figure 6.2: The attack on the WEP protocol allowing the adversary
Mallory to read encrypted messages even when Alice uses a CPA
secure encryption.
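The attack is concrete enough to run. The Python sketch below uses
zlib's CRC-32 as a stand-in for WEP's checksum and a random pad as
a stand-in for the RC4 keystream (both assumptions for illustration).
One wrinkle: zlib.crc32 is affine rather than strictly linear over
GF(2), so the identity picks up a constant term,
CRC(𝑚 ⊕ Δ) = CRC(𝑚) ⊕ CRC(Δ) ⊕ CRC(0…0):

```python
import os, zlib

def crc(b: bytes) -> bytes:
    return zlib.crc32(b).to_bytes(4, "little")

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Alice sends E_k(m ‖ CRC(m)), where encryption XORs in a keystream pad.
m = b"to:10.0.0.1|launch at noon"
pad = os.urandom(len(m) + 4)                      # RC4/PRF keystream stand-in
c = xor(pad, m + crc(m))

# Mallory, without the key, flips chosen plaintext bits and patches the CRC.
delta = bytearray(len(m)); delta[3] ^= 0x07; delta = bytes(delta)
crc_patch = xor(crc(delta), crc(bytes(len(m))))   # the affine correction term
c_forged = xor(c, delta + crc_patch)

# Bob decrypts: the checksum verifies, yet the plaintext was changed.
pt = xor(pad, c_forged)
assert pt[-4:] == crc(pt[:-4])
assert pt[:-4] == xor(m, delta)
```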

6.2.2 Chosen ciphertext security


This is not an isolated example but in fact an instance of a general
pattern of many breaks in practical protocols. Some examples of pro-
tocols broken through similar means include XML encryption, IPSec
(see also here) as well as JavaServer Faces, Ruby on Rails, ASP.NET,
and the Steam gaming client (see the Wikipedia page on Padding
Oracle Attacks).
The point is that often our adversaries can be active and modify the
communication between sender and receiver, which in effect gives
them access not just to choose plaintexts of their choice to encrypt but
even to have some impact on the ciphertexts that are decrypted. This
motivates the following notion of security (see also Fig. 6.3):
Definition 6.1 — CCA security. An encryption scheme (𝐸, 𝐷) is chosen
ciphertext attack (CCA) secure if every efficient adversary Mallory
wins in the following game with probability at most 1/2 + 𝑛𝑒𝑔𝑙(𝑛):

• Mallory gets 1𝑛 where 𝑛 is the length of the key

• For 𝑝𝑜𝑙𝑦(𝑛) rounds, Mallory gets access to the functions 𝑚 ↦


𝐸𝑘 (𝑚) and 𝑐 ↦ 𝐷𝑘 (𝑐).

• Mallory chooses a pair of messages {𝑚0 , 𝑚1 }, a secret 𝑏 is cho-


sen at random in {0, 1}, and Mallory gets 𝑐∗ = 𝐸𝑘 (𝑚𝑏 ).

• Mallory now gets another 𝑝𝑜𝑙𝑦(𝑛) rounds of access to the func-


tions 𝑚 ↦ 𝐸𝑘 (𝑚) and 𝑐 ↦ 𝐷𝑘 (𝑐) except that she is not allowed
to query 𝑐∗ to her second oracle.

• Mallory outputs 𝑏′ and wins if 𝑏′ = 𝑏.

Figure 6.3: The CCA security game.

This might seem a rather strange definition, so let's try to digest it


slowly. Most people, once they understand what the definition says,
don’t like it that much. There are two natural objections to it:

• This definition seems to be too strong: There is no way we would


let Mallory play with a decryption box - that basically amounts to
letting her break the encryption scheme. Sure, she could have some
impact on the ciphertexts that Bob decrypts and observe some
resulting side effects, but there is a long way from that to giving her
oracle access to the decryption algorithm.

The response to this is that it is very hard to model what is the


“realistic” information Mallory might get about the ciphertexts she
might cause Bob to decrypt. The goal of a security definition is not to
capture exactly the attack scenarios that occur in real life but rather
to be sufficiently conservative so that these real life attacks could be
modeled in our game. Therefore, having a too strong definition is
not a bad thing (as long as it can be achieved!). The WEP example
shows that the definition does capture a practical issue in security and
similar attacks on practical protocols have been shown time and again
(see for example the discussion of “padding attacks” in Section 3.7.2
of the Katz Lindell book.)

• This definition seems to be too weak: What justification do we


have for not allowing Mallory to make the query 𝑐∗ to the decryp-
tion box? After all she is an adversary so she could do whatever she
wants. The answer is that the definition would be clearly impossi-
ble to achieve if Mallory could simply get the decryption of 𝑐∗ and
learn whether it was an encryption of 𝑚0 or 𝑚1 . So this restriction
is the absolutely minimal one we could make without causing the
notion to be obviously impossible. Perhaps surprisingly, it turns
out that once we make this minimal restriction, we can in fact con-
struct CCA-secure encryptions.

What does CCA have to do with WEP? The CCA security game is some-
what strange, and it might not be immediately clear whether it has
anything to do with the attack we described on the WEP protocol.
However, it turns out that using a CCA secure encryption would have
prevented that attack. The key is the following claim:
Lemma 6.2 Suppose that (𝐸, 𝐷) is a CCA secure encryption. Then,
there is no efficient algorithm that given an encryption 𝑐 of the plain-
text (𝑚1 , 𝑚2 ) outputs a ciphertext 𝑐′ that decrypts to (𝑚′1 , 𝑚2 ) where
𝑚′1 ≠ 𝑚1 .
In particular Lemma 6.2 rules out the attack of transforming 𝑐 that
encrypts a message starting with some address IP to a ciphertext
that starts with a different address IP. Let us now sketch its proof.

Proof. We'll show that if we had an adversary 𝑀 ′ that violates


the conclusion of the claim, then there is an adversary 𝑀 that can win
in the CCA game.
The proof is simple and relies on the crucial fact that the CCA game
allows 𝑀 to query the decryption box on any ciphertext of her choice,
as long as it’s not exactly identical to the challenge cipertext 𝑐∗ . In par-
ticular, if 𝑀 ′ is able to morph an encryption 𝑐 of 𝑚 to some encryption
𝑐′ of some different 𝑚′ that agrees with 𝑚 on some set of bits, then 𝑀
can do the following: in the security game, use 𝑚0 to be some random

message and 𝑚1 to be this plaintext 𝑚. Then, when receiving 𝑐∗ , apply


𝑀 ′ to it to obtain a ciphertext 𝑐′ (note that if the plaintext differs then
the ciphertext must differ also; can you see why?) ask the decryption
box to decrypt it and output 1 if the resulting message agrees with 𝑚
in the corresponding set of bits (otherwise output a random bit). If
𝑀 ′ was successful with probability 𝜖, then 𝑀 would win in the CCA
game with probability at least 1/2 + 𝜖/10 or so.

P
The proof above is rather sketchy. However it is not
very difficult and proving Lemma 6.2 on your own
is an excellent way to ensure familiarity with the
definition of CCA security.

6.3 CONSTRUCTING CCA SECURE ENCRYPTION


The definition of CCA seems extremely strong, so perhaps it is not
surprising that it is useful, but can we actually construct it? The WEP
attack shows that the CPA secure encryption we saw before (i.e.,
𝐸𝑘 (𝑚) = (𝑟, 𝑓𝑘 (𝑟) ⊕ 𝑚)) is not CCA secure. We will see other ex-
amples of non CCA secure encryptions in the exercises. So, how do
we construct such a scheme? The WEP attack actually already hints
of the crux of CCA security. We want to ensure that Mallory is not
able to modify the challenge ciphertext 𝑐∗ to some related 𝑐′ . Another
way to say it is that we need to ensure the integrity of messages to
achieve their confidentiality if we want to handle active adversaries that
might modify messages on the channel. Since in a great many practi-
cal scenarios, an adversary might be able to do so, this is an important
message that deserves to be repeated:
To ensure confidentiality, you need integrity.

This is a lesson that has been shown time and again, and many
protocols have been broken due to the mistaken belief that if we only
care about secrecy, it is enough to use only encryption (and one that
is only CPA secure) and there is no need for authentication. Matthew
Green writes this more provocatively as

Nearly all of the symmetric encryption modes you learned
about in school, textbooks, and Wikipedia are (potentially)
insecure.³

exactly because these basic modes only ensure security for passive
eavesdropping adversaries and do not ensure chosen ciphertext
security, which is the "gold standard" for online applications. (For
symmetric encryption people often use the name "authenticated
encryption" in practice rather than CCA security; those are not
identical but are extremely related notions.)

³ I also like the part where Green says about a block cipher mode that
"if OCB was your kid, he'd play three sports and be on his way to
Harvard." We will have an exercise about a simplified version of the
GCM mode (which perhaps only plays a single sport and is on its way
to …). You can read about OCB in Exercise 9.14 in the Boneh-Shoup
book; it uses the notion of a "tweakable block cipher", which simply
means that given a single key 𝑘, you actually get a set {𝑝𝑘,1 , … , 𝑝𝑘,𝑡 }
of permutations that are indistinguishable from 𝑡 independent random
permutations (the set {1, … , 𝑡} is called the set of "tweaks" and we
sometimes index it using strings instead of numbers).
All of this suggests that Message Authentication Codes might help
us get CCA security. This turns out to be the case. But one needs to
take some care exactly how to use MAC’s to get CCA security. At this
point, you might want to stop and think how you would do this…

P
You should stop here and try to think how you would
implement a CCA secure encryption by combining
MAC’s with a CPA secure encryption.

P
If you didn’t stop before, then you should really stop
and think now.

OK, so now that you had a chance to think about this on your own,
we will describe one way that works to achieve CCA security from
MACs. We will explore other approaches that may or may not work in
the exercises.

Theorem 6.3 — CCA from CPA and MAC (encrypt-then-sign). Let (𝐸, 𝐷) be a
CPA-secure encryption scheme and (𝑆, 𝑉 ) be a CMA-secure MAC
with 𝑛 bit keys and a canonical verification algorithm.⁴ Then the
following encryption (𝐸′, 𝐷′) with keys of 2𝑛 bits is CCA secure:

• 𝐸′𝑘1,𝑘2 (𝑚) is obtained by computing 𝑐 = 𝐸𝑘1 (𝑚), 𝜎 = 𝑆𝑘2 (𝑐) and
outputting (𝑐, 𝜎).

• 𝐷′𝑘1,𝑘2 (𝑐, 𝜎) outputs nothing (e.g., an error message) if 𝑉𝑘2 (𝑐, 𝜎) ≠ 1,
and otherwise outputs 𝐷𝑘1 (𝑐).

⁴ By a canonical verification algorithm we mean that 𝑉𝑘 (𝑚, 𝜎) = 1 iff
𝑆𝑘 (𝑚) = 𝜎.

Proof. Suppose, for the sake of contradiction, that there exists an ad-
versary 𝑀 ′ that wins the CCA game for the scheme (𝐸 ′ , 𝐷′ ) with
probability at least 1/2 + 𝜖. We consider the following two cases:
Case I: With probability at least 𝜖/10, at some point during the
CCA game, 𝑀 ′ sends to its decryption box a ciphertext (𝑐, 𝜎) that is
not identical to one of the ciphertexts it previously obtained from its
encryption box, and obtains from it a non-error response.
Case II: The event above happens with probability smaller than
𝜖/10.
We will derive a contradiction in either case. In the first case, we
will use 𝑀 ′ to obtain an adversary that breaks the MAC (𝑆, 𝑉 ), while
in the second case, we will use 𝑀 ′ to obtain an adversary that breaks
the CPA-security of (𝐸, 𝐷).
Let's start with Case I: When this case holds, we will build an
adversary 𝐹 (for "forger") for the MAC (𝑆, 𝑉 ). We can assume the
adversary 𝐹 has access to both the signing and verification algorithms
as black boxes for some unknown key 𝑘2 that is chosen at random
and fixed.⁵ 𝐹 will choose 𝑘1 on its own, and will also choose at
random a number 𝑖0 from 1 to 𝑇 , where 𝑇 is the total number of
queries that 𝑀 ′ makes to the decryption box. 𝐹 will run the entire
CCA game with 𝑀 ′ , using 𝑘1 and its access to the black boxes to
execute the encryption and decryption boxes, all the way until just
before 𝑀 ′ makes the 𝑖0 th query (𝑐, 𝜎) to its decryption box. At that
point, 𝐹 will output (𝑐, 𝜎). We claim that with probability at least
𝜖/(10𝑇 ), our forger will succeed in the CMA game in the sense that
(i) the query (𝑐, 𝜎) will pass verification, and (ii) the message 𝑐 was
not previously queried to the signing oracle.

⁵ Since we use a MAC with canonical verification, access to the
signature algorithm implies access to the verification algorithm.

Indeed, because we are in Case I, with probability 𝜖/10, in this


game some query that 𝑀 ′ makes will be one that was not asked before
and hence was not queried by 𝐹 to its signing oracle, and moreover,
the returned message is not an error message, and hence the signature
passes verification. Since 𝑖0 is random, with probability 𝜖/(10𝑇 ) this
query will be at the 𝑖0 th round. Let us assume that the above event
GOOD happened, in which the 𝑖0 th query to the decryption box is
a pair (𝑐, 𝜎) that both passes verification and the pair (𝑐, 𝜎) was not
returned before by the encryption oracle. Since we pass (canonical)
verification, we know that 𝜎 = 𝑆𝑘2 (𝑐), and because all encryption
queries return pairs of the form (𝑐′ , 𝑆𝑘2 (𝑐′ )), this means that no such
query returned 𝑐 as its first element either. In other words, when the
event GOOD happens the 𝑖𝑡ℎ 0 query contains a pair (𝑐, 𝜎) such that 𝑐
was not queried before to the signature box, but (𝑐, 𝜎) passes verifi-
cation. This is the definition of breaking (𝑆, 𝑉 ) in a chosen message
attack, and hence we obtain a contradiction to the CMA security of
(𝑆, 𝑉 ).
Now for Case II: In this case, we will build an adversary 𝐸𝑣𝑒 for
CPA-game in the original scheme (𝐸, 𝐷). As you might expect, the
adversary 𝐸𝑣𝑒 will choose by herself the key 𝑘2 for the MAC scheme,
and attempt to play the CCA security game with 𝑀 ′ . When 𝑀 ′ makes
encryption queries, this should not be a problem: 𝐸𝑣𝑒 can forward the
plaintext 𝑚 to its encryption oracle to get 𝑐 = 𝐸𝑘1 (𝑚) and then com-
pute 𝜎 = 𝑆𝑘2 (𝑐) since she knows the signing key 𝑘2 .
However, what does 𝐸𝑣𝑒 do when 𝑀 ′ makes decryption queries?
That is, suppose that 𝑀 ′ sends a query of the form (𝑐, 𝜎) to its de-
cryption box. To simulate the algorithm 𝐷′ , 𝐸𝑣𝑒 will need access to a
decryption box for 𝐷, but she doesn’t get such a box in the CPA game
(This is a subtle point- please pause here and reflect on it until you are
sure you understand it!)
To handle this issue 𝐸𝑣𝑒 will follow the common approach of
“winging it and hoping for the best”. When 𝑀 ′ sends a query of the
form (𝑐, 𝜎), 𝐸𝑣𝑒 will first check if it happens to be the case that (𝑐, 𝜎)
was returned before as an answer to an encryption query 𝑚. In this
case 𝐸𝑣𝑒 will breathe a sigh of relief and simply return 𝑚 to 𝑀 ′ as
the answer. (This is obviously correct: if (𝑐, 𝜎) is the encryption of 𝑚
then 𝑚 is the decryption of (𝑐, 𝜎).) However, if the query (𝑐, 𝜎) has not
been returned before as an answer, then 𝐸𝑣𝑒 is in a bit of a pickle. The
way out of it is for her to simply return “error” and hope that every-
thing will work out. The crucial observation is that because we are in
case II things will work out. After all, the only way 𝐸𝑣𝑒 makes a mis-
take is if she returns an error message where the original decryption
box would not have done so, but this happens with probability at most
𝜖/10. Hence, if 𝑀 ′ has success 1/2 + 𝜖 in the CCA game, then even

if it’s the case that 𝑀 ′ always outputs the wrong answer when 𝐸𝑣𝑒
makes this mistake, we will still get success at least 1/2 + 0.9𝜖. Since
𝜖 is non negligible, this would contradict the CPA security of (𝐸, 𝐷)
thereby concluding the proof of the theorem.

P
This proof is emblematic of a general principle for
proving CCA security. The idea is to show that the de-
cryption box is completely “useless” for the adversary,
since the only way to get a non error response from it
is to feed it with a ciphertext that was received from
the encryption box.
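As a concrete sketch of the encrypt-then-sign construction in Python
(the CPA-secure scheme is the PRF-based one from before and
HMAC-SHA256 stands in for both the PRF and the MAC; these are
illustrative assumptions, and the names are hypothetical):

```python
import os, hmac, hashlib

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def prf(k, r):
    return hmac.new(k, r, hashlib.sha256).digest()

def E(k1, m):                        # CPA-secure encryption (PRF-based)
    r = os.urandom(16)
    return r + xor(prf(k1, r)[:len(m)], m)

def D(k1, c):
    r, z = c[:16], c[16:]
    return xor(prf(k1, r)[:len(z)], z)

def S(k2, c):                        # MAC with canonical verification
    return hmac.new(k2, c, hashlib.sha256).digest()

def E_prime(k1, k2, m):
    c = E(k1, m)
    return (c, S(k2, c))             # encrypt, then sign the ciphertext

def D_prime(k1, k2, c, sigma):
    if not hmac.compare_digest(S(k2, c), sigma):
        return None                  # reject: signature does not verify
    return D(k1, c)
```

The decryption algorithm refuses to decrypt anything it did not itself
produce, which is exactly the "useless decryption box" principle from
the proof above.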

6.4 (SIMPLIFIED) GCM ENCRYPTION


The construction above works as a generic construction, but it is some-
what costly in the sense that we need to evaluate both the block cipher
and the MAC. In particular, if messages have 𝑡 blocks, then we would
need to invoke two cryptographic operations (a block cipher encryp-
tion and a MAC computation) per block. The GCM (Galois Counter
Mode) is a way around this. We are going to describe a simplified ver-
sion of this mode. For simplicity, assume that the number of blocks 𝑡
is fixed and known (though many of the annoying but important de-
tails in block cipher modes of operations involve dealing with padding
to multiple of blocks and dealing with variable block size).
A universal hash function collection is a family of functions {ℎ ∶
{0, 1}ℓ → {0, 1}𝑛 } such that for every 𝑥 ≠ 𝑥′ ∈ {0, 1}ℓ , the random
variables ℎ(𝑥) and ℎ(𝑥′ ) (taken over the choice of a random ℎ from
this family) are pairwise independent in {0, 1}2𝑛 . That is, for every
two potential outputs 𝑦, 𝑦′ ∈ {0, 1}𝑛 ,

Pr[ℎ(𝑥) = 𝑦 ∧ ℎ(𝑥′ ) = 𝑦′ ] = 2−2𝑛 (6.1)


Universal hash functions have rather efficient constructions, and in


particular if we relax the definition to allow almost universal hash func-
tions (where we replace the 2−2𝑛 factor in the righthand side of (6.1)
by a slightly bigger, though still negligible quantity) then the con-
structions become extremely efficient and the size of the description of
ℎ is only related to 𝑛, no matter how big ℓ is.⁶

⁶ In 𝜖-almost universal hash functions we require that for every
𝑦, 𝑦′ ∈ {0, 1}𝑛 , and 𝑥 ≠ 𝑥′ ∈ {0, 1}ℓ , the probability that ℎ(𝑥) = ℎ(𝑥′ )
is at most 𝜖. It can be easily shown that the analysis below extends to
𝜖-almost universal hash functions as long as 𝜖 is negligible, but we
will leave verifying this to the reader.

Our encryption scheme is defined as follows. The key is (𝑘, ℎ) where
𝑘 is an index to a pseudorandom permutation {𝑝𝑘 } and ℎ is the key for
a universal hash function.⁷ To encrypt a message 𝑚 = (𝑚1 , … , 𝑚𝑡 ) ∈
{0, 1}𝑛𝑡 do the following:

⁷ In practice the key ℎ is derived from the key 𝑘 by applying the PRP
to some particular input.

• Choose IV at random in [2𝑛 ].
• Let 𝑧𝑖 = 𝑝𝑘 (IV + 𝑖) for 𝑖 = 1, … , 𝑡 + 1.

• Let 𝑐𝑖 = 𝑧𝑖 ⊕ 𝑚𝑖 .

• Let 𝑐𝑡+1 = ℎ(𝑐1 , … , 𝑐𝑡 ) ⊕ 𝑧𝑡+1 .

• Output (IV, 𝑐1 , … , 𝑐𝑡+1 ).

The communication overhead includes one additional output block


plus the IV (whose transmission can often be avoided or reduced, de-
pending on the settings; see the notion of “nonce based encryption”).
This is fairly minimal. The additional computational cost on top of 𝑡
block-cipher evaluation is the application of ℎ(⋅). For the particular
choice of ℎ used in Galois Counter Mode, this function ℎ can be eval-
uated very efficiently- at a cost of a single multiplication in the Galois
field of size 2128 per block (one can think of it as some very particu-
lar operation that maps two 128 bit strings to a single one, and can be
carried out quite efficiently). We leave it as an (excellent!) exercise to
prove that the resulting scheme is CCA secure.
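The following Python sketch implements the simplified scheme above,
with two illustrative substitutions: HMAC-SHA256 stands in for the
PRP 𝑝𝑘 (we only ever evaluate it in the forward direction anyway),
and the universal hash is a polynomial hash modulo the Mersenne
prime 2^127 − 1 rather than multiplication in GF(2^128):

```python
import os, hmac, hashlib

BLOCK, P = 16, 2**127 - 1        # block size in bytes; poly-hash modulus

def p_k(k: bytes, x: int) -> bytes:
    # stand-in for the block cipher p_k evaluated on counter x (assumption)
    return hmac.new(k, x.to_bytes(BLOCK, "big"), hashlib.sha256).digest()[:BLOCK]

def xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))

def uhash(h: int, blocks) -> bytes:
    # polynomial hash: an (almost) universal hash keyed by evaluation point h
    acc = 0
    for b in blocks:
        acc = (acc + int.from_bytes(b, "big")) * h % P
    return acc.to_bytes(BLOCK, "big")

def encrypt(k: bytes, h: int, msg_blocks):
    iv = int.from_bytes(os.urandom(BLOCK), "big")
    z = [p_k(k, (iv + i) % 2**128) for i in range(1, len(msg_blocks) + 2)]
    cs = [xor(zi, mi) for zi, mi in zip(z, msg_blocks)]
    tag = xor(uhash(h, cs), z[-1])   # c_{t+1} = h(c_1..c_t) XOR z_{t+1}
    return (iv, cs, tag)

def decrypt(k: bytes, h: int, iv, cs, tag):
    z = [p_k(k, (iv + i) % 2**128) for i in range(1, len(cs) + 2)]
    if xor(uhash(h, cs), z[-1]) != tag:
        return None                  # reject modified ciphertexts
    return [xor(zi, ci) for zi, ci in zip(z, cs)]

k, h = os.urandom(16), int.from_bytes(os.urandom(16), "big") % P
iv, cs, tag = encrypt(k, h, [b"A" * 16, b"B" * 16])
assert decrypt(k, h, iv, cs, tag) == [b"A" * 16, b"B" * 16]
```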

6.5 PADDING, CHOPPING, AND THEIR PITFALLS: THE "BUFFER
OVERFLOW" OF CRYPTOGRAPHY
In this course we typically focus on the simplest case where messages
have a fixed size. But in fact, in real life we often need to chop long
messages into blocks, or pad messages so that their length becomes an
integral multiple of the block size. Moreover, there are several subtle
ways to get this wrong, and these have been used in several practical
attacks.

Chopping into blocks: A block cipher a-priori provides a way to en-


crypt a message of length 𝑛, but we often have much longer messages
and need to “chop” them into blocks. This is where the block cipher
modes discussed in the previous lecture come in. However, the basic
popular modes such as CBC and OFB do not provide security against
chosen ciphertext attack, and in fact typically make it easy to extend
a ciphertext with an additional block or to remove the last block from
a ciphertext, both being operations which should not be feasible in a
CCA secure encryption.

Padding: Oftentimes messages are not an integer multiple of the


block size and hence need to be padded. The padding is typically a
map that takes the last partial block of the message (i.e., a string 𝑚
of length in {0, … , 𝑛 − 1}) and maps it into a full block (i.e., a string
𝑚 ∈ {0, 1}𝑛 ). The map needs to be invertible which in particular
means that if the message is already an integer multiple of the block
size we will need to add an extra block. (Since we have to map all the
1 + 2 + … + 2𝑛−1 messages of length 1, … , 𝑛 − 1 into the 2𝑛 messages of


length 𝑛 in a one-to-one fashion.) One approach for doing so is to pad
an 𝑛′ < 𝑛 length message with the string 10^(𝑛−𝑛′−1) . Sometimes people
use a different padding which involves encoding the length of the pad.
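Here is the byte-level analogue of the 10^(𝑛−𝑛′−1) padding rule in
Python (a sketch only; real schemes must also treat padding failures
carefully, as padding-oracle attacks show):

```python
def pad(m: bytes, block: int = 16) -> bytes:
    # append 0x80 then zeros up to a multiple of the block size;
    # a full extra block is added when the length is already a multiple
    padlen = block - (len(m) % block)
    return m + b"\x80" + b"\x00" * (padlen - 1)

def unpad(p: bytes) -> bytes:
    i = len(p) - 1
    while i >= 0 and p[i] == 0:      # skip the trailing zeros of the pad
        i -= 1
    if i < 0 or p[i] != 0x80:
        raise ValueError("invalid padding")
    return p[:i]

assert unpad(pad(b"hello")) == b"hello"
assert len(pad(b"0123456789abcdef")) == 32   # full block of padding added
```

The map is invertible because the pad always contains exactly one
0x80 marker appended after the message, even when the message itself
ends in zero bytes.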

6.6 CHOSEN CIPHERTEXT ATTACK AS IMPLEMENTING METAPHORS


The classical “metaphor” for an encryption is a sealed envelope, but
as we have seen in the WEP, this metaphor can lead you astray. If you
placed a message 𝑚 in a sealed envelope, you should not be able to
modify it to the message 𝑚 ⊕ 𝑚′ without opening the envelope, and
yet this is exactly what happens in the canonical CPA secure encryp-
tion 𝐸𝑘 (𝑚) = (𝑟, 𝑓𝑘 (𝑟) ⊕ 𝑚). CCA security comes much closer to
realizing the metaphor, and hence is considered as the “gold stan-
dard” of secure encryption. This is important even if you do not in-
tend to write poetry about encryption. Formal verification of computer
programs is an area that is growing in importance given that com-
puter programs become both more complex and more mission critical.
Cryptographic protocols can fail in subtle ways, and even published
proofs of security can turn out to have bugs in them. Hence there is
a line of research dedicated to finding ways to automatically prove se-
curity of cryptographic protocols. Much of this line of research is
based on simple models to describe protocols that are known as Dolev-
Yao models, based on the first paper that proposed such models. These
models define an algebraic form of security, where rather than thinking
of messages, keys, and ciphertexts as binary string, we think of them
as abstract entities. There are certain rules for manipulating these
symbols. For example, given a key 𝑘 and a message 𝑚 you can create
the ciphertext {𝑚}𝑘 , which you can decrypt back to 𝑚 using the same
key. However the assumption is that any information that cannot be
obtained by such manipulation is unknown.
Translating a proof of security in this algebra to a proof for real
world adversaries is highly non trivial. However, to have even a fight-
ing chance, the encryption scheme needs to be as strong as possible,
and in particular it turns out that security notions such as CCA play a
crucial role.

6.7 READING COMPREHENSION EXERCISES


I recommend students do the following exercises after reading the
lecture. They do not cover all material, but can be a good way to check
your understanding.
Exercise 6.1 Let (𝐸, 𝐷) be the "canonical" PRF-based CPA secure en-
cryption, where 𝐸𝑘 (𝑚) = (𝑟, 𝑓𝑘 (𝑟) ⊕ 𝑚) and {𝑓𝑘 } is a PRF collection
and 𝑟 is chosen at random. Is this scheme CCA secure?
Figure 6.4: The Dolev-Yao Algebra of what an adversary or "intruder"
knows. Figure taken from here.

a. No it is never CCA secure.

b. It is always CCA secure.

c. It is sometimes CCA secure and sometimes not, depending on the


properties of the PRF {𝑓𝑘 }.

Exercise 6.2 Suppose that we allow a key to be as long as the message,
and so we can use the one-time pad. Would the one-time pad be:

a. CPA secure

b. CCA secure

c. Neither CPA nor CCA secure.

Exercise 6.3 Which of the following statements is true about the proof
of Theorem 6.3:

a. Case I corresponds to breaking the MAC and Case II corresponds


to breaking the CPA security of the underlying encryption scheme.

b. Case I corresponds to breaking the CPA security of the underlying


encryption scheme and Case II corresponds to breaking the MAC.

c. Both cases correspond to both breaking the MAC and encryption


scheme

d. If neither Case I nor Case II happens then we obtain an adversary


breaking the security of the underlying encryption scheme.


7
Hash Functions, Random Oracles, and Bitcoin

We have seen pseudorandom generators, functions and permuta-


tions, as well as Message Authentication codes, CPA and CCA secure
encryptions. This week we will talk about cryptographic hash func-
tions and some of their magical properties. We motivate this by the
Bitcoin cryptocurrency. As usual our discussion will be highly abstract
and idealized, and any resemblance to real cryptocurrencies, living or
dead, is purely coincidental.

7.1 THE “BITCOIN” PROBLEM


Using cryptography to create a centralized digital-currency is fairly
straightforward, and indeed this is what is done by Visa, Mastercard,
and so on. The main challenge with Bitcoin is that it is decentralized.
There is no trusted server, there are no “user accounts”, no central
authority to adjudicate claims. Rather we have a collection of anony-
mous and autonomous parties that somehow need to agree on what is
a valid payment.

7.1.1 The Currency Problem


Before talking about cryptocurrencies, let's talk about currencies in
general.1 At an abstract level, a currency requires two components:

• A scarce resource.

• A mechanism for determining and transferring ownership of certain
quantities of this resource.

1 I am not an economist by any stretch of the imagination, so please
take the discussion below with a huge grain of salt. I would
appreciate any comments on it.

Some currencies are/were based on commodity money. The scarce


resource was some commodity having intrinsic value, such as gold
or silver, or even salt or tea, and ownership based on physical pos-
session. However, for various financial and political reasons, some
societies shifted to representative money, where the currency is not
the commodity itself but rather a certificate that provides the right to
the commodity. Representative money requires trust in some central


authority that would respect the certificate. The next step in the evo-
lution of currencies was fiat money, which is a currency (like today’s
dollar, ever since the U.S. moved off the gold standard) that does not
correspond to any commodity, but rather only relies on trust in a cen-
tral authority. (Another example is the Roman coins, which though
originally made of silver, underwent a continuous process of debase-
ment until they contained less than two percent of it.) One advantage
(sometimes disadvantage) of a fiat currency is that it allows for more
flexible monetary policy on the part of the central authority.

7.1.2 Bitcoin Architecture


Bitcoin is a fiat currency without a central authority. A priori this
seems like a contradiction in terms. If there is no trusted central au-
thority, how can we ensure a scarce resource? Who settles claims of
ownership? And who sets monetary policy?
For instance, one problem we are particularly concerned with is the
double-spend problem. The following scenario is a double-spend:

1. Adversary 𝐴 orders a pizza from Pinocchio’s.


2. 𝐴 gives Pinocchio’s a particular “set” of money 𝑚.
3. 𝐴 eats the pizza.
4. 𝐴 gives that same set of money 𝑚 to another pizzeria, Domino's,
such that Pinocchio's no longer has that money.
5. 𝐴 eats the second pizza.

With cash, this situation is unfathomable. But think about a credit


card: if you can “revoke” (or dispute) the first payment, you could
take money away from Pinocchio’s after you’ve received some goods or
services. Also consider that rather than giving 𝑚 to Domino’s in step
4, 𝐴 could just give 𝑚 back to itself.
We want to make it difficult or impossible for anyone to perform
a double-spend like this.
Bitcoin (and other cryptocurrencies) aims to provide cryptographic
solutions to this problem and more.
The basic unit in the Bitcoin system is a coin. Each coin has a
unique identifier, and a current owner.2 Transactions in the system
have either the form of "mint coin with identifier ID and owner 𝑃 "
or "transfer the coin ID from 𝑃 to 𝑄". All of these transactions are
recorded in a public ledger.

2 This is one of the places where we simplify and deviate from the
actual Bitcoin system. In the actual Bitcoin system, the atomic unit
is known as a Satoshi and one Bitcoin (abbreviated BTC) is 10^8
Satoshis. For reasons of efficiency, there is no individual identifier
per Satoshi and transactions can involve transfer and creation of
multiple Satoshis. However, conceptually we can think of atomic coins
each of which has a unique identifier.

Since there are no user accounts in Bitcoin, the "entities" 𝑃 and 𝑄
are not identifiers of any physical person. Rather 𝑃 and 𝑄 are
"computational puzzles". A computational puzzle can be thought of as
a string 𝛼 that specifies some "problem" such that it's easy to verify
whether some other string 𝛽 is a "solution" for 𝛼, but it is hard to find
such a solution on your own. (Students with complexity background
will recognize here the class NP.) So when we say “transfer the coin
ID from 𝑃 to 𝑄” we mean that whomever holds a solution for the
puzzle 𝑄 is now the owner of the coin ID (and to verify the authen-
ticity of this transfer, you provide a solution to the puzzle 𝑃 .) More
accurately, a transaction involving the coin ID is self-validating if it
contains a solution to the puzzle that is associated with ID according
to the latest transaction in the ledger.

P
Please re-read the previous paragraph, to make sure
you follow the logic.

One theoretical example of a puzzle is the following: if 𝑁 is the


puzzle, an entity can "prove" that they own coins assigned to 𝑁 if
they can produce numbers 𝐴, 𝐵 > 1 such that 𝑁 = 𝐴 ⋅ 𝐵.
Another more generic example (that you can keep in mind as a
potential implementation for the puzzles we use here) is: 𝛼 is some
string in {0, 1}2𝑛 and 𝛽 will be a string in {0, 1}𝑛 such that 𝛼 = 𝐺(𝛽)
where 𝐺 ∶ {0, 1}𝑛 → {0, 1}2𝑛 is some pseudorandom generator.
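
For concreteness, here is a toy Python sketch of this PRG-based
puzzle, where (purely as an illustrative assumption of mine) SHA-256
stands in for the generator 𝐺; SHA-256 is of course not a proven PRG:

    # A toy version of the PRG-based puzzle above: alpha = G(beta).
    import hashlib, os

    n = 16  # seed length in bytes, so outputs have 2n = 32 bytes

    def G(beta: bytes) -> bytes:
        return hashlib.sha256(beta).digest()  # 32 bytes = 2n

    def verify(alpha: bytes, beta: bytes) -> bool:
        # Checking a claimed solution beta for the puzzle alpha is easy...
        return len(beta) == n and G(beta) == alpha

    beta = os.urandom(n)  # the secret "solution"
    alpha = G(beta)       # the public puzzle that owns the coin
    assert verify(alpha, beta)
    # ...but recovering beta from alpha alone should take about 2^(8n) work.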
The real Bitcoin system typically uses puzzles based on digital
signatures, a concept we will learn about later in this course, but
you can simply think of 𝑃 as specifying some abstract puzzle, where
every person that can solve 𝑃 can construct transactions with the
coins owned by 𝑃 .3 Unfortunately, this means that if you lose the
solution to the puzzle then you have no access to the coin. More
alarmingly, if someone steals the solution from you, then you have no
recourse or way to get your coin back. People have managed to lose
millions of dollars in this way.

3 There are reasons why Bitcoin uses digital signatures and not these
puzzles. The main issue is that we want to bind the puzzle not just
to the coin but also to the particular transaction, so that if you
know the solution to the puzzle 𝑃 corresponding to the coin ID and
want to use that to transfer it to 𝑄, it won't be possible for someone
to take your solution and use that to transfer the coin to 𝑄′ before
your transaction is added to the public ledger. We will come back to
this issue after we learn about digital signatures. As a quick
preview, in Bitcoin the puzzle is as follows: whoever can produce a
digital signature with the private key corresponding to the public
key 𝑃 can claim these coins.

7.2 THE BITCOIN LEDGER

The main idea behind Bitcoin is that there is a public ledger that con-
tains an ordered list of all the transactions that were ever performed
and are considered as valid in the system. Given such a ledger, it is
easy to answer the question of who owns any particular coin. The
main problem is how does a collection of anonymous parties with-
out any central authority agree on this ledger? This is an instance of
the consensus problem in distributed computing. This seems quite
scary, as there are very strong negative results known for this prob-
lem; for example the famous Fischer, Lynch, Paterson (FLP) result
showed that if there is even one party that has a benign failure (i.e.,
it halts and stops responding) then it is impossible to guarantee con-
sensus in a completely asynchronous network. Things are better if
we assume some degree of partial synchrony (i.e., a global clock and
some bounds on the latency of messages) as well as that a majority or


supermajority of the parties behave correctly.
The partial synchrony assumption is typically approximately main-
tained on the Internet, but the honest majority assumption seems
quite suspicious. What does it mean a “majority of parties” in an
anonymous network where a single person can create multiple “en-
tities” and cause them to behave arbitrarily maliciously (known as
“byzantine” faults in distributed parlance)? Also, why would we
assume that even one party would behave honestly? If there is no
central authority and it is profitable to cheat, then everyone would
cheat, wouldn't they?

Figure 7.1: The Bitcoin ledger consists of an ordered


list of transactions. At any given point in time there
might be several “forks” that continue the ledger, and
different parties do not necessarily have to agree on
them. However, the Bitcoin architecture is designed to
ensure that the parties corresponding to a majority of
the computing power will reach consensus on a single
ledger.

Perhaps the main idea behind Bitcoin is that “majority” will corre-
spond to a “majority of computing power”, or as the original Bitcoin
paper says, “one CPU one vote” (or perhaps more accurately, “one
cycle one vote”). It might not be immediately clear how to imple-
ment this, but at least it means that creating fictitious new entities
(sometimes known as a Sybil attack after the movie about multiple-
personality disorder) cannot help. To implement it we turn to a cryp-
tographic concept known as “proof of work” which was originally
suggested by Dwork and Naor in 1991 as a way to combat mass
marketing email.4

4 This was a rather visionary paper in that it foresaw this issue
before the term "spam" was introduced, and indeed when email itself,
let alone spam email, was hardly widespread.

Consider a pseudorandom function {𝑓𝑘 } mapping 𝑛 bits to ℓ bits. On
average, it will take a party Alice 2ℓ queries to obtain an input 𝑥
such that 𝑓𝑘 (𝑥) = 0ℓ . So, if we're not too careful, we might think of
such an input 𝑥 as a proof that Alice spent 2ℓ time.

P
Stop here and try to think if indeed it is the case that
one cannot find an input 𝑥 such that 𝑓𝑘 (𝑥) = 0ℓ using
much fewer than 2ℓ steps.

The main question in using PRF’s for proofs of work is who is hold-
ing the key 𝑘 for the pseudorandom function. If there is a trusted
server holding the key, then sure, finding such an input 𝑥 would take
on average 2ℓ queries, but the whole point of Bitcoin is to not have a
trusted server. If we give 𝑘 to a party Alice, then can we guarantee


that she can’t find a “shortcut” to find such an input without running
2ℓ queries? The answer, in general, is no.

P
Indeed, it is an excellent exercise to prove (under the
PRF conjecture) that there exists a PRF {𝑓𝑘 } mapping
𝑛 bits to ℓ bits and an efficient algorithm 𝐴 such that
𝐴(𝑘) outputs 𝑥 satisfying 𝑓𝑘 (𝑥) = 0ℓ .

However, suppose that {𝑓𝑘 } was somehow a “super-strong PRF”


that would behave like a random function even to a party that holds
the key. In this case, we can imagine that making a query to 𝑓𝑘 corre-
sponds to tossing ℓ independent random coins, and it would not be
feasible to obtain 𝑥 such that 𝑓𝑘 (𝑥) = 0ℓ using much less than 2ℓ cy-
cles. Thus presenting such an input 𝑥 can serve as a “proof of work”
that you’ve spent 2ℓ cycles or so. By adjusting ℓ we can obtain a proof
of spending 𝑇 cycles for a value 𝑇 of our choice. Now if things would
go as usual in this course then I would state a result like the following:

Theorem: Under the PRG conjecture, there exist
super-strong PRFs.

Where again, the “super strong PRF” behaves like a truly random
function even to a party that holds the key. Unfortunately such a result
is not known to be true, and for a very good reason. Most natural
ways to define “super strong PRF” will result in properties that can be
shown to be impossible to achieve. Nevertheless, the intuition behind it
still seems useful and so we have the following heuristic:

The random oracle heuristic (aka “Random oracle


model”, Bellare-Rogaway 1993): If a “natural”
protocol is secure when all parties have access to
a random function 𝐻 ∶ {0, 1}𝑛 → {0, 1}ℓ , then it
remains secure even when we give the parties the
description of a cryptographic hash function with the
same input and output lengths.

We don’t have a good characterization as to what makes a proto-


col “natural” and we do have fairly strong counterexamples to this
heuristic (though they are arguably “unnatural”). That said, it still
seems useful as a way to get intuition for security, and in particular to
analyze Bitcoin (and many other practical protocols) we do need to
assume it, at least given current knowledge.

R

Remark 7.1 — Important caveat on the random oracle


model. The random oracle heuristic is very different
from all the conjectures we considered before. It is
not a formal conjecture since we don’t have any good
way to define “natural” and we do have examples
of protocols that are secure when all parties have ac-
cess to a random function but are insecure whenever
we replace this random function by any efficiently
computable function (see the homework exercises).

Under the random oracle model, we can now specify the “proof of
work” protocol for Bitcoin. Given some identifier ID ∈ {0, 1}𝑛 , an
integer 𝑇 ≪ 2𝑛 , and a hash function 𝐻 ∶ {0, 1}2𝑛 → {0, 1}𝑛 , the proof
of work corresponding to ID and 𝑇 will be some 𝑥 ∈ {0, 1}∗ such that
the first ⌈log 𝑇 ⌉ bits of 𝐻(ID‖𝑥) are zero.5

5 The actual Bitcoin protocol is slightly more general, where the
proof is some 𝑥 such that 𝐻(ID‖𝑥), when interpreted as a number in
[2𝑛 ], is at most 𝑇 . There are also other issues about how exactly 𝑥
is placed and ID is computed from past history that we ignore here.

7.2.1 From Proof of Work to Consensus on Ledger
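
To make this concrete, here is a minimal Python sketch of this proof
of work, instantiating 𝐻 with SHA-256 under the random oracle
heuristic (the 8-byte encoding of 𝑥 and the identifier are
illustrative choices):

    # Find x such that the first ceil(log2 T) bits of H(ID || x) are zero.
    import hashlib
    from math import ceil, log2

    def zero_prefix_bits(digest: bytes) -> int:
        bits = bin(int.from_bytes(digest, "big"))[2:].zfill(8 * len(digest))
        return len(bits) - len(bits.lstrip("0"))

    def prove_work(ID: bytes, T: int) -> int:
        target, x = ceil(log2(T)), 0
        while zero_prefix_bits(hashlib.sha256(ID + x.to_bytes(8, "big")).digest()) < target:
            x += 1  # we expect about T iterations before success
        return x

    def verify_work(ID: bytes, T: int, x: int) -> bool:
        digest = hashlib.sha256(ID + x.to_bytes(8, "big")).digest()
        return zero_prefix_bits(digest) >= ceil(log2(T))

    x = prove_work(b"some-transaction-id", T=2**12)
    assert verify_work(b"some-transaction-id", 2**12, x)

Note that verifying a proof takes a single hash evaluation, while
producing one takes about 𝑇 evaluations; this asymmetry is the whole
point.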
How does proof of work help us in achieving consensus?
We want every transaction 𝑡𝑖 in the Bitcoin system to have a
corresponding proof of work: specifically, a proof of some "amount"
𝑇𝑖 of work with respect to some identifier that is unique to 𝑡𝑖 .
The length of a ledger (𝑡1 , … , 𝑡𝑛 ) is the sum of the corresponding
𝑇𝑖 ’s. In other words, the length corresponds to the total number of
cycles invested in creating this ledger. A ledger is valid if every trans-
action in the ledger of the form “transfer the coin ID from 𝑃 to 𝑄” is
self-certified by a solution to 𝑃 .
Critically, participants (specifically miners) in the Bitcoin network
are rewarded for adding valid entries to the ledger. In other words,
they are given Bitcoins (which are newly minted for them) for per-
forming the “work” required to add an entry to the ledger. However,
honest participants (including non-miners, people who just read the
ledger) will accept the longest known ledger as the ground truth. In
addition, Bitcoin miners are rewarded for adding entry 𝑖 after entry
𝑖 + 100 is added to the ledger. This gives miners an incentive to choose
the longest ledger to contribute their work towards. To see why, con-
sider the following rough approximation of the incentive structure:
Remember that Bitcoin miners are rewarded for adding entry 𝑖 after
entry 𝑖 + 100 is added to the ledger. Thus, by spending “work” (which
directly corresponds to CPU cycles, which directly corresponds to
monetary value), miners are “betting” on whether a particular ledger
will “win”. Think of yourself as a miner, and consider a scenario in
which there are two competing ledgers. Ledger 1 has length 3 and
Ledger 2 has length 6. That means miners have put roughly 2x the
amount of work (= CPU cycles = money) into Ledger 2. In order for
Ledger 1 to “win” (from your perspective that means reach length
104 to claim your prize and to become longer than Ledger 2), you
would have to perform 3 entries worth of work just to get Ledger 1 to
length 6. But in the meantime, other miners will already be working on
Ledger 2, further increasing its length! Thus you want to add entries
to Ledger 2.
If a ledger 𝐿 corresponds to the majority of the cycles that were
available in this network then every honest party would accept it, as
any alternative ledger would be necessarily shorter. (See Fig. 7.1.)
Thus one can hope that the consensus ledger will continue to grow.
(This is a rather hand-wavy and imprecise argument, see this paper
for a more in-depth analysis; this is also related to the phenomenon
known as preferential attachment.)

Cost to mine, mining pools: Generally, if you know that completing a


𝑇 -cycle proof will get you a single coin, then making a single query
(which will succeed with probability 1/𝑇 ) is akin to buying a lottery
ticket that costs you a single cycle and has probability 1/𝑇 to win
a single coin. One difference over the actual lottery is that there is
also some probability that you’re working on the wrong fork of the
ledger, but this incentivizes people to avoid this as much as possible.
Another, perhaps even more major difference, is that things are set up
so that this is a profitable enterprise and the cost of a cycle is smaller
than the value of 1/𝑇 coins. Just like in the lottery, people can and
do gather in groups (known as “mining pools”) where they pool
together all their computing resources, and then split the award if they
win it. Joining a pool doesn’t change your expectation of winning but
reduces the variance. In the extreme case, if everyone is in the same
pool, then for every cycle you spend you get exactly 1/𝑇 coins. The
way these pools work in practice is that someone that spent 𝐶 cycles
looking for an output with all zeroes, only has probability 𝐶/𝑇 of
getting it, but is very likely to get an output that begins with log 𝐶
zeroes. This output can serve as their own “proof of work” that they
spent 𝐶 cycles and they can send it to the pool management so they
get an appropriate share of the reward.

The real Bitcoin: There are several aspects in


which the protocol described above differs from the
real Bitcoin protocol. Some of them were already
discussed above: Bitcoin typically uses digital sig-
natures for puzzles (though it has a more general
scripting language to specify them), and transac-
tions involve a number of Satoshis (and the user
interface typically displays currency in units of
BTC, which are 10^8 Satoshis). The Bitcoin protocol
also has a formula designed to factor in the decrease
in dollar cost per cycle so that Bitcoins become more
expensive to mine with time. There is also a fee


mechanism apart from the mining to incentivize
parties to add to the ledger. (The issue of incentives
in Bitcoin is quite subtle and not fully resolved,
and it is possible that parties’ behavior will change
with time.) The ledger does not grow by a single
transaction at a time but rather by a block of transac-
tions, and there is also some timing synchronization
mechanism (which is needed, as per the consensus
impossibility results). There are other differences
as well; see the Bonneau et al paper as well as the
Tschorsch and Scheuermann survey for more.

7.3 COLLISION RESISTANCE HASH FUNCTIONS AND CREATING


SHORT “UNIQUE” IDENTIFIERS
Another issue we “swept under the carpet” is how do we come up
with these unique identifiers per transaction. We want each transac-
tion 𝑡𝑖 to be bound to the ledger state (𝑡1 , … , 𝑡𝑖−1 ), and so the ID of 𝑡𝑖
should also contain the IDs of all the prior transactions. Yet we want
this ID to be only 𝑛 bits long. Ideally, we could solve this if we had
a one to one mapping 𝐻 from {0, 1}𝑁 to {0, 1}𝑛 for some very large
𝑁 ≫ 𝑛. Then the ID corresponding to the task of appending 𝑡𝑖 to
(𝑡1 , … , 𝑡𝑖−1 ) would simply be 𝐻(𝑡1 ‖ ⋯ ‖𝑡𝑖 ). The only problem is that
this is of course clearly impossible: 2𝑁 is much bigger than 2𝑛 and
there is no one to one map from a large set to a smaller set. Luckily we
are in the magical world of crypto where the impossible is routine and
the unimaginable is occasional. So, we can actually find a function 𝐻
that is “essentially” one to one.
The main idea is the following simple result, which can be thought
of as one side of the so called “birthday paradox”:
Lemma 7.2 If 𝐻 is a random function from some domain 𝑆 to {0, 1}𝑛 ,
then the probability that after 𝑇 queries an attacker finds 𝑥 ≠ 𝑥′ such
that 𝐻(𝑥) = 𝐻(𝑥′ ) is at most 𝑇 2 /2𝑛 .

Figure 7.2: A collision-resistant hash function is a map from a large
universe to a small one that is "practically one to one" in the sense
that collisions for the function do exist but are hard to find.

Proof. Let us think of 𝐻 in the "lazy evaluation" mode where for
every query the adversary makes, we choose a random answer in {0, 1}𝑛
at the time it is made. (We can assume the adversary never makes the
same query twice since a repeat query can be simulated by repeating
the same answer.) For 𝑖 < 𝑗 in [𝑇 ] let 𝐸𝑖,𝑗 be the event that
𝐻(𝑥𝑖 ) = 𝐻(𝑥𝑗 ). Since 𝐻(𝑥𝑗 ) is chosen at random and independently
from the prior choice of 𝐻(𝑥𝑖 ), the probability of 𝐸𝑖,𝑗 is 2−𝑛 . Thus,
by the union bound, the probability of the union of the 𝐸𝑖,𝑗 over all
𝑖 < 𝑗 is less than 𝑇 2 /2𝑛 , and this probability is exactly what we
needed to calculate.
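
The bound is easy to check empirically. The following small Python
experiment samples a random function lazily, exactly as in the proof
(the parameter choices are only for illustration):

    # Estimate the probability of a collision among T queries to a
    # random function into n bits, answering each query lazily at random.
    import random

    def collision_probability(n: int, T: int, trials: int = 1000) -> float:
        hits = 0
        for _ in range(trials):
            seen = set()
            for _ in range(T):
                y = random.getrandbits(n)
                if y in seen:
                    hits += 1
                    break
                seen.add(y)
        return hits / trials

    # With n = 20 and T = 2^10 (so T^2/2^n = 1) collisions are common:
    print(collision_probability(20, 1024))  # about 0.4; the lemma gives <= 1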


This means that a random function 𝐻 is collision resistant in the


sense that it is hard for an efficient adversary to find two inputs that
collide. Thus the random oracle heuristic would suggest that a crypto-
graphic hash function can be used to obtain the following object:

Definition 7.3 — Collision resistant hash functions. A collection {ℎ𝑘 } of
functions where ℎ𝑘 ∶ {0, 1}∗ → {0, 1}𝑛 for 𝑘 ∈ {0, 1}𝑛 is a collision
resistant hash function (CRH) collection if the map (𝑘, 𝑥) ↦ ℎ𝑘 (𝑥)
is efficiently computable and for every efficient adversary 𝐴,
the probability over 𝑘 that 𝐴(𝑘) = (𝑥, 𝑥′ ) such that 𝑥 ≠ 𝑥′ and
ℎ𝑘 (𝑥) = ℎ𝑘 (𝑥′ ) is negligible.6

6 Note that the other side of the birthday bound shows that you can
always find a collision in ℎ𝑘 using roughly 2^{𝑛/2} queries. For this
reason we typically need to double the output length of hash functions
compared to the key size of other cryptographic primitives (e.g., 256
bits as opposed to 128 bits).

Once more we do not know a theorem saying that under the PRG
conjecture there exists a collision resistant hash function collection,
even though this property is considered as one of the desiderata for
cryptographic hash functions. However, we do know how to obtain
collections satisfying this condition under various assumptions that
we will see later in the course, such as the learning with errors
problem and the factoring and discrete logarithm problems.
Furthermore, if we consider the weaker notion of security under a
second preimage attack (also known as being a "universal one way hash
function" or UOWHF) then it is known how to derive such a function
from the PRG assumption.

R
Remark 7.4 — CRH vs PRF. A collection {ℎ𝑘 } of colli-
sion resistant hash functions is an incomparable object
to a collection {𝑓𝑠 } of pseudorandom functions with
the same input and output lengths. On one hand,
the condition of being collision-resistant does not
imply that ℎ𝑘 is indistinguishable from random. For
example, it is possible to construct a valid collision
resistant hash function where the first output bit al-
ways equals zero (and hence is easily distinguishable
from a random function). On the other hand, unlike
Definition 4.1, the adversary of Definition 7.3 is not
merely given a “black box” to compute the hash func-
tion, but rather the key to the hash function. This is a
much stronger attack model, and so a PRF does not
have to be collision resistant. (Constructing a PRF that
is not collision resistant is a nice and recommended
exercise.)

7.4 PRACTICAL CONSTRUCTIONS OF CRYPTOGRAPHIC HASH


FUNCTIONS
While we discussed hash functions as keyed collections, in practice
people often think of a hash function as being a fixed keyless function.
However, this is because most practical constructions involve some
hardwired standardized constants (often known as IV) that can be
thought of as a choice of the key.
Practical constructions of cryptographic hash functions start
with a basic block which is known as a compression function
ℎ ∶ {0, 1}2𝑛 → {0, 1}𝑛 . The function 𝐻 ∶ {0, 1}∗ → {0, 1}𝑛 is defined
as 𝐻(𝑚1 , … , 𝑚𝑡 ) = ℎ(ℎ(ℎ(𝑚1 , IV), 𝑚2 ), ⋯ , 𝑚𝑡 ) when the message is
composed of 𝑡 blocks (and we can pad it otherwise). See Fig. 7.3. This
construction is known as the Merkle-Damgard construction and we
know that it does preserve collision resistance:

Figure 7.3: The Merkle-Damgard construction converts


a compression function ℎ ∶ {0, 1}2𝑛 → {0, 1}𝑛
into a hash function that maps strings of arbitrary
length into {0, 1}𝑛 . The transformation preserves
collision resistance but does not yield a PRF even if ℎ
was pseudorandom. Hence for many applications it
should not be used directly but rather composed with
a transformation such as HMAC.

Theorem 7.5 — Merkle-Damgard preserves collision resistance. Let 𝐻 be
constructed from ℎ as above. Then given two messages 𝑚 ≠ 𝑚′ ∈
{0, 1}𝑡𝑛 such that 𝐻(𝑚) = 𝐻(𝑚′ ) we can efficiently find two mes-
sages 𝑥 ≠ 𝑥′ ∈ {0, 1}2𝑛 such that ℎ(𝑥) = ℎ(𝑥′ ).

Proof. The intuition behind the proof is that if ℎ was invertible then
we could invert 𝐻 by simply going backwards. Thus in principle if
a collision for 𝐻 exists then so does a collision for ℎ. Now of course
this is a vacuous statement since both ℎ and 𝐻 shrink their inputs and
hence clearly have collisions. But we want to show a constructive proof
for this statement that will allow us to transform a collision in 𝐻 to
a collision in ℎ. This is very simple. We look at the computation of
𝐻(𝑚) and 𝐻(𝑚′ ) and at the first block in which the inputs differ but
the output is the same (there must be such a block). This block will
yield a collision for ℎ.
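
Here is a minimal Python sketch of the Merkle-Damgard construction,
where (as an illustrative assumption only) the compression function ℎ
is truncated SHA-256 on 16-byte blocks:

    # Merkle-Damgard: state_0 = IV, state_i = h(state_{i-1} || m_i).
    import hashlib

    N = 16         # block size n in bytes
    IV = bytes(N)  # the hardwired constant

    def h(block2n: bytes) -> bytes:
        assert len(block2n) == 2 * N
        return hashlib.sha256(block2n).digest()[:N]  # stand-in compression

    def H(blocks: list) -> bytes:
        state = IV
        for m in blocks:
            assert len(m) == N
            state = h(state + m)
        return state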


7.4.1 Practical Random-ish Functions


In practice we want much more than collision resistance from our
hash functions. In particular we often would like them to be PRF’s as
well. Unfortunately, the Merkle-Damgard construction is not a PRF
even when IV is random and secret. This is because we can perform
a length extension attack on it. Even if we don’t know IV, given 𝑦 =
𝐻𝐼𝑉 (𝑚1 , … , 𝑚𝑡 ) and a block 𝑚𝑡+1 we can compute 𝑦′ = ℎ(𝑦, 𝑚𝑡+1 )
which equals 𝐻𝐼𝑉 (𝑚1 , … , 𝑚𝑡+1 ).
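
In the sketch above, the attack is a one-liner: given only the digest
𝑦 (and neither IV nor the original blocks), anyone can compute the
hash of an extended message:

    # Length extension: from y = H(m_1, m_2) alone, compute H(m_1, m_2, m_3).
    y = H([b"block-one.......", b"block-two......."])
    m3 = b"attacker-block.."
    assert h(y + m3) == H([b"block-one.......", b"block-two.......", m3])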
One fix for this is to use a different initialization vector IV′ at
the end of the computation. That is, we define:

𝐻IV,IV′ (𝑚1 , … , 𝑚𝑡 ) = ℎ(IV′ , 𝐻IV (𝑚1 , … , 𝑚𝑡 ))

A variant of this construction (where IV′ is obtained as some sim-
ple function of IV) is known as HMAC and it can be shown to be a


pseudorandom function under some pseudorandomness assump-
tions on the compression function ℎ. It is very widely implemented.
In many cases where I say “use a cryptographic hash function” in this
course I actually mean to use an HMAC like construction that can be
conjectured to give at least a PRF if not stronger “random oracle”-like
properties.
The simplest implementation for a compression function is to take
a block cipher with an 𝑛 bit key and an 𝑛 bit message and then simply
define ℎ(𝑥1 , … , 𝑥2𝑛 ) = 𝐸𝑥𝑛+1 ,…,𝑥2𝑛 (𝑥1 , … , 𝑥𝑛 ). A more common vari-
ant is known as Davies-Meyer where we also XOR the output with
𝑥𝑛+1 , … 𝑥2𝑛 . In practice people often use tailor made block ciphers that
are designed for some efficiency or security concerns.
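
As an illustration of these block-cipher-based compression functions,
here is how they might look with AES-128 playing the role of 𝐸 (this
assumes the third-party pycryptodome package; it is a sketch, not a
recommended construction):

    # Compression functions from a block cipher E with 16-byte keys/blocks.
    from Crypto.Cipher import AES

    def h_basic(x: bytes) -> bytes:
        # h(x_1..x_2n) = E_{x_{n+1}..x_2n}(x_1..x_n): second half is the key.
        msg, key = x[:16], x[16:32]
        return AES.new(key, AES.MODE_ECB).encrypt(msg)

    def h_davies_meyer(x: bytes) -> bytes:
        # As above, but also XOR the output with x_{n+1},...,x_2n.
        msg, key = x[:16], x[16:32]
        out = AES.new(key, AES.MODE_ECB).encrypt(msg)
        return bytes(a ^ b for a, b in zip(out, key))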

7.4.2 Some History


Almost all practically used hash functions are based on the Merkle-
Damgard paradigm. Hash functions are designed to be extremely
efficient7 which also means that they are often at the "edge of
insecurity" and indeed have fallen over the edge.

7 For example, the Boneh-Shoup book quotes processing times of up to
255MB/sec on a 1.83 Ghz Intel Core 2 processor, which is more than
enough to handle not just Harvard's network but even Lamar College's.
In 1990 Ron Rivest proposed MD4, which was already showing
weaknesses in 1991, and a full collision was found in 1995. Even faster
attacks have been since found and MD4 is considered completely
insecure.
In response to these weaknesses, Rivest designed MD5 in 1991. A
weakness was shown for it in 1996 and a full collision was shown in
2004. Hence it is now also considered insecure.
In 1993 the U.S. National Institute of Standards and Technology (NIST)
proposed a standard for a hash function known as the Secure Hash
Algorithm (SHA), which
had quite a few similarities with the MD4 and MD5 functions. This
function was known as SHA-0, and the standard was replaced in
1995 with SHA-1, which includes an extra “mixing” (i.e., bit rotation)
operation. At the time no explanation was given for this change, but
SHA-0 was later found to be insecure. In 2002 a variant with longer


output, known as SHA-256, was added (as well as some others). In
2005, following the MD5 collision, significant weaknesses were shown
in SHA-1. In 2017, a full SHA-1 collision was found. Today SHA-1 is
considered insecure and SHA-256 is recommended.
Given the weaknesses in MD-5 and SHA-1, NIST started a competi-
tion in 2006 for a new hashing standard, based on functions that seem
sufficiently different from the MD5/SHA-0/SHA-1 family. (SHA-256
is unbroken but it seems too close for comfort to those other systems.)
The hash function Keccak was selected as the new standard SHA-3 in
August of 2015.

7.4.3 The NSA and Hash Functions


The NSA is the world’s largest employer of mathematicians, and is
very heavily invested in cryptographic research. It seems quite pos-
sible that they devote far more resources to analyzing symmetric
primitives such as block ciphers and hash functions than the open re-
search community. Indeed, the history above suggests that the NSA
has consistently discovered attacks on hash functions before the cryp-
tographic community (and the same holds for the differential crypt-
analysis technique for block ciphers). That said, despite the “mythic”
powers that are sometimes ascribed to the NSA, this history suggests
that they are ahead of the open community, but not so much ahead,
discovering attacks on hash functions about 5 years or so before they
appear in the open literature.
There are a few ways we can get “insider views” to the NSA’s
thinking. Some such insights can be obtained from the Snowden
documents. The Flame malware was discovered in Iran in 2012 after
operating since at least 2010. It used an MD5 collision to achieve its
goals. Such a collision was known in the open literature since 2008,
but Flame used a different variant that was unknown in the litera-
ture. For this reason it is suspected that it was designed by a western
intelligence agency.
Another insight into NSA’s thoughts can be found in pages 12-19 of
NSA’s internal Cryptolog newsletter which was recently declassified;
one can find there a rather entertaining and opinionated (or obnox-
ious, depending on your point of view) review of the CRYPTO 1992
conference. In page 14 the author remarks that certain weaknesses of
MD5 demonstrated in the conference are unlikely to be extended to
the full version, which suggests that the NSA (or at least the author)
was not aware of the MD5 collisions at the time. (The full archive of
the cryptolog newsletter makes for some interesting reading!)

7.4.4 Cryptographic vs Non-Cryptographic Hash Functions


Hash functions are of course also widely used for non-cryptographic ap-
plications such as building hash tables and load balancing. For these
applications people often use linear hash functions known as cyclic
redundancy codes (CRC). Note however that even in those seemingly
non-cryptographic applications, an adversary might cause signifi-
cant slowdown to the system if he can generate many collisions. This
can and has been used to obtain denial of service attacks. As a rule of
thumb, if the inputs to your system might be generated by someone
who does not have your best interests at heart, you’re better off using a
cryptographic hash function.

7.5 READING COMPREHENSION EXERCISES


I recommend students do the following exercises after reading the
lecture. They do not cover all material, but can be a good way to check
your understanding.
Exercise 7.1Choose the strongest true statement from the following op-
tions. (That is, choose the mathematical statement from these options
that is both true, and one can derive the other true statements as direct
corollaries.)

a. For every function ℎ ∶ {0, 1}1024 → {0, 1}128 there exist two strings
𝑥 ≠ 𝑥′ in {0, 1}1024 such that ℎ(𝑥) = ℎ(𝑥′ ).

b. There is a randomized algorithm 𝐴 that makes at most 2^128 queries


to a given black box computing a function ℎ ∶ {0, 1}1024 → {0, 1}128
that with probability at least 0.9, 𝐴 outputs a pair 𝑥 ≠ 𝑥′ in
{0, 1}1024 such that ℎ(𝑥) = ℎ(𝑥′ ).

c. There is a randomized algorithm 𝐴 that makes at most 100 ⋅ 2^64


queries to a given black box computing a function ℎ ∶ {0, 1}1024 →
{0, 1}128 that with probability at least 0.9, 𝐴 outputs a pair 𝑥 ≠ 𝑥′ in
{0, 1}1024 such that ℎ(𝑥) = ℎ(𝑥′ ).

d. There is a randomized algorithm 𝐴 that makes at most 0.01 ⋅ 2^64


queries to a given black box computing a function ℎ ∶ {0, 1}1024 →
{0, 1}128 that with probability at least 0.9, 𝐴 outputs a pair 𝑥 ≠ 𝑥′ in
{0, 1}1024 such that ℎ(𝑥) = ℎ(𝑥′ ).

Exercise 7.2 Suppose that ℎ ∶ {0, 1}1024 → {0, 1}128 is chosen at random.
If 𝑦 is chosen at random in {0, 1}128 and we pick 𝑥1 , … , 𝑥𝑡 indepen-
dently at random in {0, 1}1024 , how large does 𝑡 need to be so that the
probability that there is some 𝑥𝑖 such that ℎ(𝑥𝑖 ) = 𝑦 is at least 1/2?
(Pick the answer with the closest estimate):

a. 2^1024

b. 2^256

c. 2^128

d. 2^64

Exercise 7.3 Suppose that a message authentication code (𝑆, 𝑉 ), which
uses a function ℎ as a black-box component, is secure when ℎ is a
random function. Is it still secure when Alice and Bob use a hash
function ℎ that is chosen from some PRF collection and whose key is
given to the adversary?

a. It can sometimes be secure and sometimes insecure.

b. It is always secure.

c. It is always insecure.


8
Key derivation, protecting passwords, slow hashes, Merkle
trees

Last lecture we saw the notion of cryptographic hash functions


which are functions that behave like a random function, even in set-
tings (unlike that of standard PRFs) where the adversary has access to
the key that allows them to evaluate the hash function. Hash functions
have found a variety of uses in cryptography, and in this lecture we
survey some of their other applications. In some of these cases, we
only need the relatively mild and well-defined property of collision
resistance while in others we only know how to analyze security under
the stronger (and not precisely well defined) random oracle heuristic.

8.1 KEYS FROM PASSWORDS


We have seen great cryptographic tools, including PRFs, MACs, and
CCA secure encryptions, that Alice and Bob can use when they share
a cryptographic key of 128 bits or so. But unfortunately, many of the
current users of cryptography are humans which, generally speaking,
have extremely faulty memory capacity for storing large numbers.
There are 62^8 ≈ 2^48 ways to select a password of 8 upper and lower
case letters + numbers, but some letter/number combinations end up
being chosen much more frequently than others. Due to several large
scale hacks, very large databases of passwords have been made public,
and one estimate is that 91 percent of the passwords chosen by users
are contained in a list of about 1, 000 ≈ 2^10 strings.
If we choose a password at random from some set 𝐷 then the en-
tropy of the password is simply log2 |𝐷|. However, estimating the
entropy of real life passwords is rather difficult. For example, suppose
that I use the winning Massachusetts Mega-Lottery numbers as my
password. A priori, my password consists of 5 numbers between 1 and
75 and so its entropy is log2 (75^5 ) ≈ 31. However, if an attacker knew
that I did this, the entropy might be something like log(520) ≈ 9 (since
there were only 520 such numbers selected in the last 10 years). More-
over, if they knew exactly what draw I based my password on, then
they would know it exactly and hence the entropy (from their point of
view) would be zero. This is worthwhile to emphasize:
The entropy of a secret is always measured with respect to
the attacker’s point of view.

The exact security of passwords is of course a matter of intense


practical interest, but we will simply model the password as being
chosen at random from some set 𝐷 ⊆ {0, 1}𝑛 (which is sometimes
called the “dictionary”). The set 𝐷 is known to the attacker, but she
has no information on the particular choice of the password.
Much of the challenge for using passwords securely relies on the
distinction between offline and online attacks. If each guess for a pass-
word requires interacting online with a server, as is the case when
typing a PIN number in the ATM, then even a weak password (such
as a 4 digit PIN that at best provides 13 bits of entropy) can yield
meaningful security guarantees, as typically an alarm would be raised
after five or so failed attempts.
However, if the adversary has the ability to check offline whether
a password is correct then the number of guesses they can try can be
as high as the number of computing cycles at their disposal, which
can easily run into the billions and so break passwords of 30 or more
bits of entropy. (This is an issue we’ll return to after we learn about
public key cryptography when we’ll talk about password authenticated key
exchange.)
Consider a password manager application. In such an application,
a user typically chooses a master password 𝑝𝑚𝑎𝑠𝑡𝑒𝑟 which she can then
use to access all her other passwords 𝑝1 , … , 𝑝𝑡 . To enable her to do
so without requiring online access to a server, the master password
𝑝𝑚𝑎𝑠𝑡𝑒𝑟 is used to encrypt the other passwords. However to do that, we
need to derive a key 𝑘𝑚𝑎𝑠𝑡𝑒𝑟 from the password.

P
A natural approach is to simply let the key be the
password. For example, if the password 𝑝 is a string
of at most 16 bytes, then we can simply treat it as a
128 bit key and use it for encryption. Stop and think
why this would not be a good idea. In particular think
of an example of a secure encryption (𝐸, 𝐷) and a
distribution 𝑃 over {0, 1}𝑛 of entropy at least 𝑛/2 such
that if the key 𝑘 is chosen at random from 𝑃 then the
encryption will be completely insecure.

A classical approach is to simply use a cryptographic hash function


𝐻 ∶ {0, 1}∗ → {0, 1}𝑛 , and let 𝑘𝑚𝑎𝑠𝑡𝑒𝑟 = 𝐻(𝑝𝑚𝑎𝑠𝑡𝑒𝑟 ). If we think of 𝐻 as
a random oracle and 𝑝𝑚𝑎𝑠𝑡𝑒𝑟 as chosen randomly from 𝐷, then as long


as an attacker makes ≪ |𝐷| queries to the oracle, they are unlikely to
make the query 𝑝𝑚𝑎𝑠𝑡𝑒𝑟 and hence the value 𝑘𝑚𝑎𝑠𝑡𝑒𝑟 will be completely
random from their point of view.
However, since |𝐷| is not too large, it might not be so hard to per-
form such |𝐷| queries. For this reason, people typically use a deliber-
ately slow hash function as a key derivation function. The rationale is that
the honest user only needs to evaluate 𝐻 once, and so could afford
for it to take a while, while the adversary would need to evaluate it
|𝐷| times. For example, if |𝐷| is about 100, 000 and the honest user is
willing to spend 1 cent of computation resources every time they need
to derive 𝑘𝑚𝑎𝑠𝑡𝑒𝑟 from 𝑝𝑚𝑎𝑠𝑡𝑒𝑟 , then we could set 𝐻(⋅) so that it costs
1 cent to evaluate it and hence on average it will cost the adversary
1, 000 dollars to recover it.
There are several approaches for trying to make 𝐻 deliberately
“slow” or “costly” to evaluate but the most popular and simplest one
is to simply let 𝐻 be obtained by iterating many times a basic hash
function such as SHA-256. That is, 𝐻(𝑥) = ℎ(ℎ(ℎ(⋯ ℎ(𝑥)))) where ℎ is
some standard (“fast”) cryptographic hash function and the number
of iterations is tailored to be the largest one that the honest users
can tolerate.1

1 Since CPU speeds can vary quite radically and attackers might even
use special-purpose hardware to evaluate iterated hash functions
quickly, Abadi, Burrows, Manasse, and Wobber suggested in 2003 to use
memory bound functions as an alternative approach, where these are
functions 𝐻(⋅) designed so that evaluating them will consume at least
𝑇 bits of memory for some large 𝑇 . See also the followup paper of
Dwork, Goldberg and Naor. This approach has also been used in some
practical key derivation functions such as scrypt and Argon2.

In fact, typically we will set 𝑘𝑚𝑎𝑠𝑡𝑒𝑟 = 𝐻(𝑝𝑚𝑎𝑠𝑡𝑒𝑟 ‖𝑟) where 𝑟 is a
long random but public string known as a "salt" (see Fig. 8.1).
Including such a "salt" can be important to foiling an adversary's
attempts to amortize the computation costs, see the exercises.

Figure 8.1: To obtain a key from a password we will typically use a
"slow" hash function to map the password and a unique-to-user public
"salt" value to a cryptographic key. Even with such a procedure, the
resulting key cannot be considered as secure and unpredictable as a
key that was chosen truly at random, especially if we are in a setting
where an adversary can launch an offline attack to guess all
possibilities.
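
A minimal Python sketch of such a salted, deliberately slow key
derivation (the iteration count is illustrative; in practice one
should use a vetted function such as PBKDF2, scrypt, or Argon2):

    # Derive k_master by iterating a fast hash many times over the
    # password and a public random salt.
    import hashlib, os

    def derive_key(password: str, salt: bytes, iterations: int = 2**20) -> bytes:
        state = hashlib.sha256(password.encode() + salt).digest()
        for _ in range(iterations):  # the deliberate slowdown
            state = hashlib.sha256(state).digest()
        return state

    salt = os.urandom(16)  # public; stored alongside the encrypted data
    k_master = derive_key("correct horse battery staple", salt)
    # The standard library's hashlib.pbkdf2_hmac implements a similar idea.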

Even when we don’t use one password to encrypt others, it is gen-


erally considered the best practice to never store a password in the
clear but always in this "slow hashed and salted" form, so if the
passwords file falls into the hands of an adversary it will be expensive to
recover them.

8.2 MERKLE TREES AND VERIFYING STORAGE.


Suppose that you outsource to the cloud storing your huge data file
𝑥 ∈ {0, 1}𝑁 . You now need the 𝑖𝑡ℎ bit of that file and ask the cloud for
𝑥𝑖 . How can you tell that you actually received the correct bit?
Ralph Merkle came up in 1979 with a clever solution, which is
known as “Merkle hash trees”. The idea is the following (see The-
orem 8.1): suppose we have a collision-resistant hash function ℎ ∶
{0, 1}2𝑛 → {0, 1}𝑛 , and think of the string 𝑥 as composed of 𝑡 blocks
of size 𝑛. We then hash every pair of consecutive blocks to transform
𝑥 into a string 𝑥1 of 𝑡/2 blocks, and continue in this way for log 𝑡 steps
until we get a single block 𝑦 ∈ {0, 1}𝑛 . (Assume here 𝑡 is a power of
two for simplicity, though it doesn’t make much difference.)

Figure 8.2: In the Merkle Tree construction we map a


long string 𝑥 into a block 𝑦 ∈ {0, 1}𝑛 that is a “digest”
of the long string 𝑥. As in a collision resistant hash
we can imagine that this map is “one to one” in the
sense that it won’t be possible to find 𝑥′ ≠ 𝑥 with
the same digest. Moreover, we can efficiently certify
that a certain bit of 𝑥 is equal to some value without
sending out all of 𝑥 but rather the log 𝑡 blocks that are
on the path between 𝑖 to the root together with their
“siblings” used in the hash function, for a total of at
most 2 log 𝑡 blocks.

Alice, who sends 𝑥 to the cloud Bob, will keep the short block 𝑦.
Whenever Alice queries the value 𝑖 she will ask for a certificate that 𝑥𝑖
is indeed the right value. This certificate will consist of the block that
contains 𝑖, as well as all of the 2 log 𝑡 blocks that were used in the hash
from this block to the root. The security of this scheme follows from
the following simple theorem:

Theorem 8.1 — Merkle Tree security. Suppose that 𝜋 is a valid certificate


that 𝑥𝑖 = 𝑏, then either this statement is true, or one can effi-
ciently extract from 𝜋 and 𝑥 two inputs 𝑧 ≠ 𝑧 ′ in {0, 1}2𝑛 such that
ℎ(𝑧) = ℎ(𝑧 ′ ).
Proof. The certificate 𝜋 consists of a sequence of log 𝑡 pairs of size-𝑛


blocks that are obtained by following the path on the tree from the
𝑖𝑡ℎ coordinate of 𝑥 to the final root 𝑦. The last pair of blocks is
a preimage of 𝑦 under ℎ, while each pair on this list is a preimage
of one of the blocks in the next pair. If 𝑥𝑖 ≠ 𝑏, then the first pair of
blocks cannot be identical to the pair of blocks of 𝑥 that contains the
𝑖𝑡ℎ coordinate. However, since we know the final root 𝑦 is identical, if
we compare the corresponding path in 𝑥 to 𝜋, we will see that at some
point there must be an input 𝑧 in the path from 𝑥 and a distinct input
𝑧′ in 𝜋 that hash to the same output.
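
A minimal Python sketch of the tree, the certificate, and the
verification procedure, with ℎ instantiated by SHA-256 for
illustration (the helper names are my own):

    # Merkle tree over t blocks (t a power of two), with certificates.
    import hashlib

    def h(left: bytes, right: bytes) -> bytes:
        return hashlib.sha256(left + right).digest()

    def build_tree(blocks):
        # All levels of the tree, from the leaves up to the root.
        levels = [list(blocks)]
        while len(levels[-1]) > 1:
            prev = levels[-1]
            levels.append([h(prev[j], prev[j + 1]) for j in range(0, len(prev), 2)])
        return levels

    def certify(levels, i):
        # At each level, record the sibling of the node on the path from
        # block i to the root (2 log t blocks in total).
        path = []
        for level in levels[:-1]:
            path.append((i & 1, level[i ^ 1]))
            i //= 2
        return path

    def verify(root, block, path):
        # Recompute the root from the block and the certificate.
        cur = block
        for node_is_right, sibling in path:
            cur = h(sibling, cur) if node_is_right else h(cur, sibling)
        return cur == root

    blocks = [bytes([j]) * 32 for j in range(8)]  # t = 8 blocks
    levels = build_tree(blocks)
    root = levels[-1][0]                          # the digest y Alice keeps
    assert verify(root, blocks[5], certify(levels, 5))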

8.3 PROOFS OF RETRIEVABILITY


The above provides a way to ensure Alice that the value retrieved
from a cloud storage is correct, but how can Alice be sure that the
cloud server still stores the values that she did not ask about?
A priori, you might think that she obviously can’t. If Bob is lazy, or
short on storage, he could decide to store only some small fraction of 𝑥
that he thinks Alice is more likely to query for. As long as Bob wasn’t
unlucky and Alice doesn’t ask these queries, then it seems Bob could
get away with this. In a proof of retrievability, first proposed by Juels
and Kaliski in 2007, Alice would be able to get convinced that Bob
does in fact store her data.
First, note that Alice can guarantee that Bob stores at least 99 per-
cent of her data, by periodically asking him to provide answers (with
proofs!) of the value of 𝑥 at 100 or so random locations. The idea is
that if Bob dropped more than 1 percent of the bits, then he’d be very
likely to be caught “red handed” and get a question from Alice about
a location he did not retain.
Now, if we used some redundancy to store 𝑥 such as the RAID
format, where it is composed of some small number 𝑐 of parts and we
can recover any bit of the original data as long as at most one of the
parts were lost, then we might hope that even if 1% of 𝑥 was in fact lost
by Bob, we could still recover the whole string. This is not a fool-proof
guarantee since it could possibly be that the data lost by Bob was not
confined to a single part. To handle this case one needs to consider
generalizations of RAID known as “local reconstruction codes” or
“locally decodable codes”. The paper by Dodis, Vadhan and Wichs is
a good source for this; see also these slides by Seny Kamara for a more
recent overview of the theory and implementations.

8.4 ENTROPY EXTRACTION


As we’ve seen time and again, randomness is crucial to cryptography.
But how do we get these random bits we need? If we only have a small
number 𝑛 of random bits (e.g., 𝑛 = 128 or so) then we can expand
them to as large a number as we want using a pseudorandom genera-
tor, but where do we get those initial 𝑛 bits from?
The approach used in practice is known as “harvesting entropy”.
The idea is that we make great many measurements 𝑥1 , … , 𝑥𝑚 of
events that are considered “unpredictable” to some extent, including
mouse movements, hard-disk and network latency, sources of noise
etc… and accumulate them in an entropy “pool” which would simply
be some memory array. When we estimate that we have accumulated
more than 128 bits of randomness, then we hash this array into a 128
bit string which we’ll use as a seed for a pseudorandom generator 2
The reason that people use entropy “pools” rather
(see Fig. 8.3).2 Because entropy needs to be measured from the point of than simply adding the entropy to the generator’s
view of the attacker, this “entropy estimation” routine is a bit of a “black state as it comes along is that the latter alternative
might be insecure. Suppose that initial state of the
art” and there isn’t a very principled way to perform it. In practice
generator was known to the adversary and now the
people try to be very conservative (e.g., assume that there is only one entropy is “trickling in” one bit at a time while we
bit of entropy for 64 bits of measurements or so) and hope for the best, continuously use the generator to produce outputs
that can be observed by the adversary. Every time
which often works but sometimes also spectacularly fails, especially in a new bit of entropy is added, the adversary now
embedded systems that do not have access to many of these sources. has uncertainty between two potential states of
the generator, but once an output is produced this
eliminates this uncertainty. In contrast, if we wait
until we accumulate, say, 128 bits of entropy, then
now the
Figure : To obtainwill
8.3adversary pseudorandom bits forstate
have 2128 possible crypto-
graphic to
options applications we hash
consider, and down
it could measurements
be computationally
which contain
infeasible some
to cull entropy
them usingin them observation.
further to a shorter
string that is hopefully truly uniformly random or at
least statistically close to it, and then expand this to
get as many pseudorandom bits as we need using a
pseudorandom generator.

How do hash functions figure into this? The idea is that if an input
𝑥 has 𝑛 bits of entropy then ℎ(𝑥) would still have the same bits of
entropy, as long as its output is larger than 𝑛. In practice people use
the notion of “entropy” in a rather loose sense, but we will try to be
more precise below.

The entropy of a distribution 𝐷 is meant to capture the amount of


“uncertainty” you have over the distribution. The canonical example
is when 𝐷 is the uniform distribution over {0, 1}𝑛 , in which case it
has 𝑛 bits of entropy. If you learn a single bit of 𝐷 then you reduce
the entropy by one bit. For example, if you learn that the 17𝑡ℎ bit is
equal to 0, then the new conditional distribution 𝐷′ is the uniform
distribution over all strings in 𝑥 ∈ {0, 1}𝑛 such that 𝑥17 = 0 and
has 𝑛 − 1 bits of entropy. Entropy is invariant under permutations
of the sample space, and only depends on the vector of probabilities,
and thus for every set 𝑆 all notions of entropy will give log2 |𝑆| bits
of entropy for the uniform distribution over 𝑆. A distribution that is
uniform over some set 𝑆 is known as a flat distribution.
Where different notions of entropy begin to differ is when the dis-
tributions are not flat. The Shannon entropy follows the principle that
“original uncertainty = knowledge learned + new uncertainty”. That
is, it obeys the chain rule which is that if a random variable (𝑋, 𝑌 ) has
𝑛 bits of entropy, and 𝑋 has 𝑘 bits of entropy, then after learning 𝑋 on
average 𝑌 will have 𝑛 − 𝑘 bits of entropy. That is,
𝐻𝑆ℎ𝑎𝑛𝑛𝑜𝑛 (𝑋) + 𝐻𝑆ℎ𝑎𝑛𝑛𝑜𝑛 (𝑌 |𝑋) = 𝐻𝑆ℎ𝑎𝑛𝑛𝑜𝑛 (𝑋, 𝑌 )
Where the entropy of a conditional distribution 𝑌 |𝑋 is simply
𝔼𝑥←𝑋 𝐻𝑆ℎ𝑎𝑛𝑛𝑜𝑛 (𝑌 |𝑋 = 𝑥) where 𝑌 |𝑋 = 𝑥 is the distribution on 𝑌
obtained by conditioning on the event that 𝑋 = 𝑥.
If (𝑝1 , … , 𝑝𝑚 ) is a vector of probabilities summing up to 1 and let us
assume they are rounded so that for every 𝑖, 𝑝𝑖 = 𝑘𝑖 /2𝑛 for some inte-
ger 𝑘𝑖 . We can then split the set {0, 1}𝑛 into 𝑚 disjoint sets 𝑆1 , … , 𝑆𝑚
where |𝑆𝑖 | = 𝑘𝑖 , and consider the probability distribution (𝑋, 𝑌 )
where 𝑌 is uniform over {0, 1}𝑛 , and 𝑋 is equal to 𝑖 whenever 𝑌 ∈ 𝑆𝑖 .
Therefore, by the principles above we know that 𝐻𝑆ℎ𝑎𝑛𝑛𝑜𝑛 (𝑋, 𝑌 ) = 𝑛
(since 𝑋 is completely determined by 𝑌 and hence (𝑋, 𝑌 ) is uniform
over a set of 2𝑛 elements), and 𝐻(𝑌 |𝑋) = 𝔼 log 𝑘𝑖 . Thus the chain rule
tells us that 𝐻𝑆ℎ𝑎𝑛𝑛𝑜𝑛 (𝑋) = 𝐻(𝑋, 𝑌 ) − 𝐻(𝑌 |𝑋) = 𝑛 − ∑_{𝑖=1}^{𝑚} 𝑝𝑖 log(𝑘𝑖 ) =
𝑛 − ∑_{𝑖=1}^{𝑚} 𝑝𝑖 log(2𝑛 𝑝𝑖 ) since 𝑝𝑖 = 𝑘𝑖 /2𝑛 . Since log(2𝑛 𝑝𝑖 ) = 𝑛 + log(𝑝𝑖 )
we see that this means that

𝐻𝑆ℎ𝑎𝑛𝑛𝑜𝑛 (𝑋) = 𝑛 − ∑_𝑖 𝑝𝑖 ⋅ 𝑛 − ∑_𝑖 𝑝𝑖 log(𝑝𝑖 ) = − ∑_𝑖 𝑝𝑖 log(𝑝𝑖 )

using the fact that ∑_𝑖 𝑝𝑖 = 1.
The Shannon entropy has many attractive properties, but it turns out
that for cryptographic applications, the notion of min entropy is more
appropriate. For a distribution 𝑋 the min-entropy is simply defined
as 𝐻∞ (𝑋) = min_𝑥 log(1/ Pr[𝑋 = 𝑥]).3 Note that if 𝑋 is flat then
𝐻∞ (𝑋) = 𝐻𝑆ℎ𝑎𝑛𝑛𝑜𝑛 (𝑋) and that 𝐻∞ (𝑋) ≤ 𝐻𝑆ℎ𝑎𝑛𝑛𝑜𝑛 (𝑋) for all 𝑋.

3 The notation 𝐻∞ (⋅) for min entropy comes from the fact that one can
define a family of entropy-like functions, containing a function for
every non-negative number 𝑝, based on the 𝑝-norm of the probability
distribution. That is, the Rényi entropy of order 𝑝 is defined as
𝐻𝑝 (𝑋) = (1 − 𝑝)^{−1} log(∑_𝑥 Pr[𝑋 = 𝑥]^𝑝 ). The min entropy can be
thought of as the limit of 𝐻𝑝 when 𝑝 tends to infinity while the
Shannon entropy is the limit as 𝑝 tends to 1. The entropy 𝐻2 (⋅) is
related to the collision probability of 𝑋 and is often used as well.
The min entropy is the smallest among all the entropies and hence it
is the most conservative (and so appropriate for usage in
cryptography). For flat sources, which are uniform over a certain
subset, all entropies coincide.
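
To see how the two notions can differ, consider (as a small
illustrative computation in Python) the distribution over bytes that
outputs 0 with probability 1/2 and otherwise a uniform value in
{1, … , 255}:

    from math import log2

    probs = [1/2] + [1/510] * 255
    shannon = -sum(p * log2(p) for p in probs)   # about 5.0 bits
    min_entropy = min(log2(1/p) for p in probs)  # exactly 1 bit

An extractor has to be conservative and work with the min entropy:
this source has about 5 bits of Shannon entropy, yet an adversary
guesses its value with probability 1/2.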

We can now formally define the notion of an extractor:

Definition 8.2 — Randomness extractor. A function ℎ ∶ {0, 1}ℓ+𝑛 →
{0, 1}𝑛 is a randomness extractor ("extractor" for short) if for every
distribution 𝑋 over {0, 1}ℓ with min entropy at least 2𝑛, if we pick
𝑠 to be a random "salt", the distribution ℎ𝑠 (𝑋) is computationally
indistinguishable from the uniform distribution.4

4 The pseudorandomness literature studies the notion of extractors
much more generally and considers all possible variations for
parameters such as the entropy requirement, the salt (more commonly
known as seed) size, the distance from uniformity, and more. The type
of notion we consider here is known in that literature as a "strong
seeded extractor". See Vadhan's monograph for an in-depth treatment
of this topic.

The idea is that we apply the hash function to our measurements in
{0, 1}ℓ ; then, if those measurements had at least 𝑘 bits of entropy
(with some extra "security margin"), the output ℎ𝑠 (𝑋) will be as good
as random. Since the "salt" value 𝑠 is not secret, it can be chosen
once at random and hardwired into the description of the function.
(Indeed in practice people often do not explicitly use such a "salt",
but the hash function description contains some parameters IV that
play a similar role.)

Theorem 8.3 — Random function is an extractor. Suppose that
ℎ ∶ {0, 1}ℓ+𝑛 → {0, 1}𝑛 is chosen at random, and ℓ < 𝑛^100 . Then with
high probability ℎ is an extractor.

Proof. Let ℎ be chosen as above, and let 𝑋 be some distribution over {0, 1}^ℓ with max_𝑥{Pr[𝑋 = 𝑥]} ≤ 2^(−2𝑛). Now, for every 𝑠 ∈ {0, 1}^𝑛 let ℎ_𝑠 be the function that maps 𝑥 ∈ {0, 1}^ℓ to ℎ(𝑠‖𝑥), and let 𝑌_𝑠 = ℎ_𝑠(𝑋). We want to prove that 𝑌_𝑠 is pseudorandom. We will use the following claim:

Claim: Let 𝐶𝑜𝑙(𝑌_𝑠) be the probability that two independent samples from 𝑌_𝑠 are identical. Then with probability at least 0.99, 𝐶𝑜𝑙(𝑌_𝑠) < 2^(−𝑛) + 100 ⋅ 2^(−2𝑛).

Proof of claim: 𝔼_𝑠 𝐶𝑜𝑙(𝑌_𝑠) = ∑_𝑠 2^(−𝑛) ∑_{𝑥,𝑥′} Pr[𝑋 = 𝑥] Pr[𝑋 = 𝑥′] ∑_{𝑦∈{0,1}^𝑛} Pr[ℎ(𝑠, 𝑥) = 𝑦] Pr[ℎ(𝑠, 𝑥′) = 𝑦]. Let’s separate this into the contribution when 𝑥 = 𝑥′ and when they differ. The contribution from the first case is ∑_𝑠 2^(−𝑛) ∑_𝑥 Pr[𝑋 = 𝑥]^2, which is simply 𝐶𝑜𝑙(𝑋) = ∑_𝑥 Pr[𝑋 = 𝑥]^2 ≤ 2^(−2𝑛) since Pr[𝑋 = 𝑥] ≤ 2^(−2𝑛). In the second case, the events that ℎ(𝑠, 𝑥) = 𝑦 and ℎ(𝑠, 𝑥′) = 𝑦 are independent, and hence the contribution here is at most ∑_{𝑥,𝑥′} Pr[𝑋 = 𝑥] Pr[𝑋 = 𝑥′] 2^(−𝑛) ≤ 2^(−𝑛). Overall 𝔼_𝑠[𝐶𝑜𝑙(𝑌_𝑠)] ≤ 2^(−𝑛) + 2^(−2𝑛), and since 𝐶𝑜𝑙(𝑌_𝑠) is always at least 2^(−𝑛) (as 𝑌_𝑠 ranges over {0, 1}^𝑛), the claim follows by applying Markov’s inequality to the non-negative random variable 𝐶𝑜𝑙(𝑌_𝑠) − 2^(−𝑛), whose expectation is at most 2^(−2𝑛).

Now suppose that 𝑇 is some efficiently computable function from {0, 1}^𝑛 to {0, 1}. Then by Cauchy-Schwarz, |𝔼[𝑇(𝑈_𝑛)] − 𝔼[𝑇(𝑌_𝑠)]| = |∑_{𝑦∈{0,1}^𝑛} 𝑇(𝑦)[2^(−𝑛) − Pr[𝑌_𝑠 = 𝑦]]| ≤ √(∑_𝑦 𝑇(𝑦)^2 ⋅ ∑_𝑦 (2^(−𝑛) − Pr[𝑌_𝑠 = 𝑦])^2); but opening up ∑_𝑦 (2^(−𝑛) − Pr[𝑌_𝑠 = 𝑦])^2 we get 2^(−𝑛) − 2 ⋅ 2^(−𝑛) ∑_𝑦 Pr[𝑌_𝑠 = 𝑦] + ∑_𝑦 Pr[𝑌_𝑠 = 𝑦]^2, i.e., 𝐶𝑜𝑙(𝑌_𝑠) − 2^(−𝑛), which is at most the negligible quantity 100 ⋅ 2^(−2𝑛).

R
Remark 8.4 — Statistical randomness. This proof actually proves a much stronger statement. First, note that we did not at all use the fact that 𝑇 is efficiently computable, and hence the distribution ℎ_𝑠(𝑋) will not be merely pseudorandom but actually statistically indistinguishable from the truly random distribution. Second, we didn’t use the fact that ℎ is completely random, but rather what we needed was merely pairwise independence: that for every 𝑥 ≠ 𝑥′ and 𝑦, Pr_𝑠[ℎ_𝑠(𝑥) = ℎ_𝑠(𝑥′) = 𝑦] = 2^(−2𝑛). There are efficient constructions of functions ℎ(⋅) with this property, though in practice people still often use cryptographic hash functions for this purpose.

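To make the pairwise independence condition concrete, here is a minimal sketch in Python (an illustration, not from these notes) of the classical pairwise independent family ℎ_{𝑎,𝑏}(𝑥) = 𝑎𝑥 + 𝑏 mod 𝑞 for a prime 𝑞, where the “salt” is the pair (𝑎, 𝑏):

    # Sketch (for illustration): a pairwise independent hash family
    # h_{a,b}(x) = a*x + b mod q over a prime q.  For every fixed x != x'
    # and targets y, y', the probability over a random salt (a,b) that
    # h(x) = y and h(x') = y' is exactly 1/q^2.
    import secrets

    Q = 2**127 - 1   # a Mersenne prime; outputs live in Z_Q rather than bit strings

    def sample_salt():
        return secrets.randbelow(Q), secrets.randbelow(Q)   # uniform (a, b)

    def h(salt, x):
        a, b = salt
        return (a * x + b) % Q

    salt = sample_salt()
    print(h(salt, 42), h(salt, 43))
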
8.4.1 Forward and backward secrecy


A cryptographic tool such as encryption is clearly insecure if the ad-
versary learns the private key, and similarly the output of a pseudo-
random generator is insecure if the adversary learns the seed. So, it
might seem as if it’s “game over” once this happens. However, there is still some hope. For example, if the adversary learns the key at time 𝑡 but didn’t know it before then, one could hope that she does not learn the information that was exchanged up to time 𝑡 − 1. This property is known as “forward secrecy”. It has recently gained interest as a means to protect against powerful “attackers” such as the NSA that may record communication transcripts in the hope of deciphering them at some point in the future, after having learned the secret key. In the context of pseudorandom generators, one could hope for both forward and backward secrecy. Forward secrecy means that the state of the generator is updated at every point in time in a way that learning the state at time 𝑡 does not help in recovering past states, and “backward secrecy” means that we can recover from the adversary knowing our internal state by updating the generator with fresh entropy. See this paper of mine and Halevi for some discussion of this issue, as well as this later work by Dodis et al.
III
PUBLIC KEY CRYPTOGRAPHY
9
Public key cryptography

People have been dreaming about heavier-than-air flight since at


least the days of Leonardo Da Vinci (not to mention Icarus from Greek
mythology). Jules Verne wrote with rather insightful details about go-
ing to the moon in 1865. But, as far as I know, no one had considered
the possibility of communicating securely without first exchanging a
shared secret key until about 50 years ago. This is surprising given the
thousands of years people have been using secret writing! However,
in the late 1960’s and early 1970’s, several people started to question
this “common wisdom”.
Perhaps the most surprising of these visionaries was an undergraduate student at Berkeley named Ralph Merkle. In the fall of 1974, he wrote a project proposal for his computer security course saying that while

“it might seem intuitively obvious that if two people have never had the opportunity to prearrange an encryption method, then they will be unable to communicate securely over an insecure channel… I believe it is false”.

Merkle also felt it was important to add “No. I am not joking.”. The project proposal was rejected by his professor as “not good enough”.
Merkle later submitted a paper to the Communications of the ACM, where he apologized for the lack of references since he was unable to find any mention of the problem in the scientific literature, and the only source where he saw the problem even raised was in a science fiction story. The paper was rejected with the comment that “Experience shows that it is extremely dangerous to transmit key information in the clear.” Merkle showed that one can design a protocol where Alice and Bob can use 𝑇 invocations of a hash function to exchange a key, but an adversary (in the random oracle model, though he of course didn’t use this name) would need roughly 𝑇^2 invocations to break it. He conjectured that it may be possible to obtain such protocols where breaking is exponentially harder than using them, but could not think of any concrete way of doing so.




Figure 9.1: Ralph Merkle’s Berkeley CS 244 project proposal for developing public key cryptography
We only found out much later that in the late 1960’s, a few years be-
fore Merkle, James Ellis of the British Intelligence agency GCHQ was
having similar thoughts. His curiosity was spurred by an old World
War II manuscript from Bell labs that suggested the following way
that two people could communicate securely over a phone line. Alice
would inject noise to the line, Bob would relay his messages, and then
Alice would subtract the noise to get the signal. The idea is that an
adversary over the line sees only the sum of Alice’s and Bob’s signals
and doesn’t know what came from what. This got James Ellis thinking
whether it would be possible to achieve something like that digitally.
As he later recollected, in 1970 he realized that in principle this should
be possible. He could think of a hypothetical black box 𝐵 that on
input a “handle” 𝛼 and plaintext 𝑝 would give a “ciphertext” 𝑐. There
would be a secret key 𝛽 corresponding to 𝛼 such that feeding 𝛽 and 𝑐
to the box would recover 𝑝. However, Ellis had no idea how to actu-
ally instantiate this box. He and others kept giving this question as a
puzzle to bright new recruits until one of them, Clifford Cocks, came
up in 1973 with a candidate solution loosely based on the factoring
problem; in 1974 another GCHQ recruit, Malcolm Williamson, came
up with a solution using modular exponentiation.
But among all those thinking of public key cryptography, probably
the people who saw the furthest were two researchers at Stanford,
Whit Diffie and Martin Hellman. They realized that with the advent
of electronic communication, cryptography would find new applica-
tions beyond the military domain of spies and submarines. And they
understood that in this new world of many users and point to point
communication, cryptography would need to scale up. They envi-
sioned an object which we now call “trapdoor permutation” though
they called it “one way trapdoor function” or sometimes simply “pub-
lic key encryption”. This is a collection of permutations {𝑝𝑘 } where

𝑝_𝑘 is a permutation over (say) {0, 1}^|𝑘|, and the map (𝑥, 𝑘) ↦ 𝑝_𝑘(𝑥) is efficiently computable, but the reverse map (𝑘, 𝑦) ↦ 𝑝_𝑘^(−1)(𝑦) is computationally hard. Yet, there is also some secret key 𝑠(𝑘) (i.e., the “trapdoor”) such that using 𝑠(𝑘) it is possible to efficiently compute 𝑝_𝑘^(−1). Their idea was that using such a trapdoor permutation, Alice
who knows 𝑠(𝑘) would be able to publish 𝑘 on some public file such
that everyone who wants to send her a message 𝑥 could do so by com-
puting 𝑝𝑘 (𝑥). (While today we know, due to the work of Goldwasser
and Micali, that such a deterministic encryption is not a good idea,
at the time Diffie and Hellman had amazing intuitions but didn’t re-
ally have proper definitions of security.) But they didn’t stop there.
They realized that protecting the integrity of communication is no
less important than protecting its secrecy. Thus, they imagined that
Alice could “run encryption in reverse” in order to certify or sign mes-
sages. That is, given some message 𝑚, Alice would send the value
𝑥 = 𝑝_𝑘^(−1)(ℎ(𝑚)) (for a hash function ℎ) as a way to certify that she endorses 𝑚, and every person who knows 𝑘 could verify this by checking that 𝑝_𝑘(𝑥) = ℎ(𝑚).
At this point, Diffie and Hellman were in a position similar to past
physicists, who predicted that a certain particle should exist but had
no experimental verification. Luckily they met Ralph Merkle. His
ideas about a probabilistic key exchange protocol, together with a sug-
gestion from their Stanford colleague John Gill, inspired them to come
up with what today is known as the Diffie-Hellman Key Exchange (un-
beknownst to them, a similar protocol was found two years earlier at
GCHQ by Malcolm Williamson). They published their paper “New
Directions in Cryptography” in 1976, and it is considered to have
brought about the birth of modern cryptography. However, they still
didn’t find their elusive trapdoor function. This was done the next
year by Rivest, Shamir and Adleman who came up with the RSA trap-
door function, which through the framework of Diffie and Hellman
yielded not just encryption but also signatures (this was essentially
the same function discovered earlier by Clifford Cocks at GCHQ,
though as far as I can tell Cocks, Ellis and Williamson did not real-
ize the application to digital signatures). From this point on began a
flurry of advances in cryptography which hasn’t really died down till
this day.

9.1 PRIVATE KEY CRYPTO RECAP


Before we embark on the wonderful journey to public key cryptography, let’s briefly look back and see what we learned about private key cryptography. This material is mostly covered in Chapters 1 to 9 of the Katz Lindell (KL) book and Part I (Chapters 1-9) of the Boneh Shoup (BS) book.

Figure 9.2: John T. Gill III. Gill proposed to Diffie and Hellman to use modular exponentiation as a one-way function, which (together with Merkle’s ideas) enabled what’s known today as the Diffie-Hellman Key Exchange protocol.

Now would be a good time for you to read the corresponding proofs in one or both of these books. It is often helpful to see the
same proof presented in a slightly different way. Below is a review of
some of the various reductions we saw in class, with pointers to the
corresponding sections in the Katz-Lindell (2nd ed) and Boneh-Shoup
books. These are also covered in Rosulek’s book.

• Pseudorandom generators (PRG) length extension (from 𝑛 + 1 output PRG to 𝑝𝑜𝑙𝑦(𝑛) output PRG): KL 7.4.2, BS 3.4.2
• PRG’s to pseudorandom functions (PRF’s): KL 7.5, BS 4.6
• PRF’s to Chosen Plaintext Attack (CPA) secure encryption: KL
3.5.2, BS 5.5
• PRF’s to secure Message Authentication Codes (MAC’s): KL 4.3, BS
6.3
• MAC’s + CPA secure encryption to chosen ciphertext attack (CCA)
secure encryption: BS 4.5.4, BS 9.4
• Pseudorandom permutation (PRP’s) to CPA secure encryption /
block cipher modes: KL 3.5.2, KL 3.6.2, BS 4.1, 4.4, 5.4
• Hash function applications: fingerprinting, Merkle trees, pass-
words: KL 5.6, BS Chapter 8
• Coin tossing over the phone: we saw a construction in class that
used a commitment scheme built out of a pseudorandom generator.
This is shown in BS 3.12; KL 5.6.5 shows an alternative construction
using random oracles.
• PRP’s from PRF’s: we only sketched the construction which can be
found in KL 7.6 or BS 4.5

One major point we did not talk about in this course was one way
functions. The definition of a one way function is quite simple:

Definition 9.1 — One Way Functions. A function 𝑓 ∶ {0, 1}∗ → {0, 1}∗ is
a one way function if it is efficiently computable and for every 𝑛 and
a 𝑝𝑜𝑙𝑦(𝑛) time adversary 𝐴, the probability over 𝑥 ←𝑅 {0, 1}𝑛 that
𝐴(𝑓(𝑥)) outputs 𝑥′ such that 𝑓(𝑥′ ) = 𝑓(𝑥) is negligible.

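Concretely, the attack in this definition can be rendered as the following short sketch (an illustration, not from these notes), where 𝑓 and the adversary 𝐴 are passed in as Python functions:

    # Sketch of the inversion game of Definition 9.1: the adversary gets
    # f(x) for a random x and wins by finding ANY preimage of f(x).
    import secrets

    def inversion_game(f, A, n):
        x = secrets.randbits(n)          # x chosen at random from {0,1}^n
        x_prime = A(f(x))                # adversary sees only f(x)
        return f(x_prime) == f(x)        # win condition

    # f is one way if every efficient A wins with only negligible probability.
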
The “OWF conjecture” is the conjecture that one way functions


exist. It turns out to be a necessary and sufficient condition for much
of private key cryptography. That is, the following theorem is known
(by combining works of many people):

Theorem 9.2 — One way functions and private key cryptography. The follow-
ing are equivalent:

• One way functions exist

• Pseudorandom generators (with non-trivial stretch) exist



• Pseudorandom functions exist

• CPA secure private key encryptions exist

• CCA secure private key encryptions exist

• Message Authentication Codes exist

• Commitment schemes exist


The key result in the proof of this theorem is the result of Hastad,
Impagliazzo, Levin and Luby that if one way functions exist then
pseudorandom generators exist. If you are interested in finding out
more, see Chapter 7 in Vadhan’s pseudorandomness monograph. Sec-
tions 7.2-7.4 in the KL book also cover a special case of this theorem
for the case that the one way function is a permutation on {0, 1}𝑛 for
every 𝑛. This proof has been considerably simplified and quantita-
tively improved in works of Haitner, Holenstein, Reingold, Vadhan,
Wee and Zheng. See this talk of Salil Vadhan for more on this. See
also these lecture notes from a Princeton seminar I gave on this topic
(though the proof has been simplified since then by the above works).

R
Remark 9.3 — Cryptanalytic attacks on private key cryp-
tosystems. Another topic we did not discuss in depth
is attacks on private key cryptosystems. These attacks
often work by “opening the black box” and looking at
the internal operation of block ciphers or hash func-
tions. We then assign variables to various internal
registers, and look to find collections of inputs that
would satisfy some non-trivial relation between those
variables. This is a rather vague description, but you
can read KL Section 6.2.6 on linear and differential
cryptanalysis and BS Sections 3.7-3.9 and 4.3 for more
information. See also this course of Adi Shamir, and
the courses of Dunkelman on analyzing block ciphers
and hash functions. There is also the fascinating area
of side channel attacks on both public and private key
crypto, see this course of Tromer.

R
Remark 9.4 — Digital Signatures. We will discuss in
this lecture Digital signatures, which are the public key
analog of message authentication codes. Surprisingly,
despite being a “public key” object, it is possible to
base digital signatures on one-way functions (this is
obtained using ideas of Lamport, Merkle, Goldwasser-
Goldreich-Micali, Naor-Yung, and Rompel). However
these constructions are not very efficient (and this
may be inherent), and so in practice people use digital

signatures that are built using similar techniques to


those used for public key encryption.

9.2 PUBLIC KEY ENCRYPTIONS: DEFINITION


We now discuss how we define security for public key encryption. As
mentioned above, it took quite a while for cryptographers to arrive at
the “right” definition, but in the interest of time we will skip ahead to
what by now is the standard basic notion (see also Fig. 9.3):

Definition 9.5 — Valid public key encryption. A triple of efficient algorithms (𝐺, 𝐸, 𝐷) is a public key encryption scheme of length function ℓ ∶ ℕ → ℕ if it satisfies the following:

• 𝐺 is a probabilistic algorithm known as the key generation algorithm that on input 1^𝑛 outputs a distribution over pairs of keys (𝑒, 𝑑).

• 𝐸 is the encryption algorithm that takes a pair of inputs 𝑒, 𝑚 with 𝑚 ∈ {0, 1}^ℓ(𝑛) and outputs 𝑐 = 𝐸_𝑒(𝑚).

• 𝐷 is the decryption algorithm that takes a pair of inputs 𝑑, 𝑐 and outputs 𝑚′ = 𝐷_𝑑(𝑐).

• For every 𝑚 ∈ {0, 1}^ℓ(𝑛), with probability 1 − 𝑛𝑒𝑔𝑙(𝑛) over the choice of (𝑒, 𝑑) output from 𝐺(1^𝑛) and the coins of 𝐸, 𝐷, we have 𝐷_𝑑(𝐸_𝑒(𝑚)) = 𝑚.

Figure 9.3: In a public key encryption, the receiver Bob generates a pair of keys (𝑒, 𝑑). The encryption key 𝑒 is used for encryption, and the decryption key is used for decryption. We call it a public key system since the security of the scheme does not rely on the adversary Eve not knowing the encryption key. Hence, Bob can publicize the key 𝑒 to a great many potential receivers and still ensure confidentiality of the messages he receives.

Definition 9.5 just refers to the validity of a public-key encryption


scheme, namely the condition that we can encrypt and decrypt using
the keys 𝑒 and 𝑑 respectively, but not to its security. The standard
definition of security for public-key encryption is CPA security:

Definition 9.6 — CPA security for public-key encryption. We say that (𝐺, 𝐸, 𝐷) is CPA secure if every efficient adversary 𝐴 wins the
following game with probability at most 1/2 + 𝑛𝑒𝑔𝑙(𝑛):

• (𝑒, 𝑑) ←𝑅 𝐺(1𝑛 )

• 𝐴 is given 𝑒 and outputs a pair of messages 𝑚0 , 𝑚1 ∈ {0, 1}𝑛 .

• 𝐴 is given 𝑐 = 𝐸𝑒 (𝑚𝑏 ) for 𝑏 ←𝑅 {0, 1}.

• 𝐴 outputs 𝑏′ ∈ {0, 1} and wins if 𝑏′ = 𝑏.


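Concretely, here is a minimal sketch of this game (an illustration, not from these notes), with the scheme’s algorithms 𝐺, 𝐸 and the adversary’s two stages passed in as Python functions:

    # Sketch of the CPA game of Definition 9.6.  The adversary consists of
    # choose(e), returning a pair of messages, and guess(e, c), returning a bit.
    import secrets

    def cpa_game(G, E, choose, guess, n):
        e, d = G(n)                      # (e,d) sampled from G(1^n)
        m0, m1 = choose(e)               # adversary picks m0, m1
        b = secrets.randbelow(2)         # hidden bit b
        c = E(e, (m0, m1)[b])            # challenge ciphertext
        return guess(e, c) == b          # adversary wins if it recovers b

    # CPA security: no efficient adversary wins with probability noticeably
    # better than 1/2.
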

P
Despite it being a “chosen plaintext attack”, we don’t
explicitly give 𝐴 access to the encryption oracle in the
public key setting. Make sure you understand why
giving it such access would not give it more power.

One metaphor for a public key encryption is a “self-locking lock”


where you don’t need the key to lock it (but rather you simply push
the shackle until it clicks and locks), but you do need the key to unlock
it. So, if Alice generates (𝑒, 𝑑) = 𝐺(1𝑛 ), then 𝑒 serves as the “lock”
that can be used to encrypt messages for Alice while only 𝑑 can be used
to decrypt the messages. Another way to think about it is that 𝑒 is a
“hobbled key” that can be used for only some of the functions of 𝑑.

9.2.1 The obfuscation paradigm


Why would someone imagine that such a magical object could exist?
The writing of both James Ellis as well as Diffie and Hellman sug-
gests that their thought process was roughly as follows. You imagine
a “magic black box” 𝐵 such that if all parties have access to 𝐵 then we
could get a public key encryption scheme. Now if public key encryp-
tion was impossible it would mean that for every possible program
𝑃 that computes the functionality of 𝐵, if we distribute the code of
𝑃 to all parties, then we don’t get a secure encryption scheme. That
means that no matter what program 𝑃 the adversary gets, she will always
be able to get some information out of that code that helps break the
encryption, even though she wouldn’t have been able to break it if 𝑃
was a black box. Now, intuitively understanding arbitrary code is a
very hard problem, so Diffie and Hellman imagined that it might be
possible to take this ideal 𝐵 and compile it to some sufficiently low
level assembly language so that it would behave as a “virtual black
box”.
In particular, if you took, say, the encoding procedure 𝑚 ↦ 𝑝𝑘 (𝑚)
of a block cipher with a particular key 𝑘 and ran it through an opti-
mizing compiler, you might hope that while it would be possible to
perform this map using the resulting executable, it will be hard to ex-
tract 𝑘 from it. Hence, you could treat this code as a “public key”. This
suggests the following approach for getting an encryption scheme:

“Obfuscation based public key encryption”:


(Thought experiment - not an actual construction)
Ingredients:
(i) A pseudorandom permutation collec-
tion {𝑝𝑘 }𝑘∈{0,1}∗ where for every 𝑘 ∈ {0, 1}𝑛 ,
𝑝𝑘 ∶ {0, 1}𝑛 → {0, 1}𝑛

(ii) An “obfuscating compiler” polynomial-time


computable 𝑂 ∶ {0, 1}∗ → {0, 1}∗ such that for every
circuit 𝐶, 𝑂(𝐶) is a circuit that computes the same
function as 𝐶.
Operation:

• Key Generation: The private key is 𝑘 ←𝑅 {0, 1}𝑛 ,


the public key is 𝐸 = 𝑂(𝐶𝑘 ) where 𝐶𝑘 is the cir-
cuit that maps 𝑥 ∈ {0, 1}𝑛 to 𝑝𝑘 (𝑥).
• Encryption: To encrypt 𝑚 ∈ {0, 1}𝑛 with pub-
lic key 𝐸, choose IV ←𝑅 {0, 1}𝑛 and output
(IV, 𝐸(𝑚 ⊕ IV)).
• Decryption: To decrypt (IV, 𝑦) with key 𝑘, output
IV ⊕ 𝑝𝑘−1 (𝑦).

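Purely to illustrate the data flow of this thought experiment, here is a sketch in which the “obfuscating compiler” is the identity function and XOR with the key stands in for the pseudorandom permutation; it therefore provides no security whatsoever:

    # Data-flow sketch ONLY: O is the identity, so the "public key" trivially
    # leaks k.  A real instantiation would need a genuine PRP (e.g., AES) and
    # a genuine obfuscator, which is exactly what Diffie and Hellman lacked.
    import secrets

    N = 128

    def p(k, x):      return x ^ k       # toy stand-in for the PRP p_k
    def p_inv(k, y):  return y ^ k

    def O(circuit):                       # stand-in "obfuscating compiler"
        return circuit

    def keygen():
        k = secrets.randbits(N)
        E = O(lambda x: p(k, x))          # public key: "obfuscated" circuit for p_k
        return E, k

    def encrypt(E, m):
        iv = secrets.randbits(N)
        return iv, E(m ^ iv)

    def decrypt(k, ctxt):
        iv, y = ctxt
        return iv ^ p_inv(k, y)

    E, k = keygen()
    assert decrypt(k, encrypt(E, 12345)) == 12345
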
Diffie and Hellman couldn’t really find a way to make this work,
but it convinced them this notion of public key is not inherently im-
possible. This concept of compiling a program into a functionally
equivalent but “inscrutable” form is known as software obfuscation. It
had turned out to be quite a tricky object to both define formally and
achieve, but it serves as very good intuition for what can be achieved,
even if, as with the random oracle, this intuition can sometimes be
too optimistic. (Indeed, if software obfuscation was possible then we
could obtain a “random oracle like” hash function by taking the code
of a function 𝑓𝑘 chosen from a PRF family and compiling it through an
obfuscating compiler.)
We will not formally define obfuscators yet, but on an intuitive
level it would be a compiler that takes a program 𝑃 and maps into a
program 𝑃 ′ such that:

• 𝑃 ′ is not much slower/bigger than 𝑃 (e.g., as a Boolean circuit it


would be at most polynomially larger).
• 𝑃′ is functionally equivalent to 𝑃, i.e., 𝑃′(𝑥) = 𝑃(𝑥) for every input 𝑥.1

• 𝑃′ is “inscrutable” in the sense that seeing the code of 𝑃′ is not more informative than getting black box access to 𝑃.

Footnote 1: For simplicity, assume that the program 𝑃 is side effect free and hence it simply computes some function, say from {0, 1}^𝑛 to {0, 1}^ℓ for some 𝑛, ℓ.

Let me stress again that there is no known construction of obfus-


cators achieving something similar to this definition. In fact, the most
natural formalization of this definition is impossible to achieve (as we
might see later in this course). Only very recently (exciting!) was progress finally made towards obfuscator-like notions strong enough to achieve these and other applications, and there are some significant caveats; see my survey on this topic and a more recent Quanta article.

However, when trying to stretch your imagination to consider the


amazing possibilities that could be achieved in cryptography, it is
not a bad heuristic to first ask yourself what could be possible if only
everyone involved had access to a magic black box. It certainly worked
well for Diffie and Hellman.

9.3 SOME CONCRETE CANDIDATES:


We would have loved to prove a theorem of the form:

“Theorem”: If the PRG conjecture is true, then there


exists a CPA-secure public key encryption.

This would have meant that we do not need to assume anything


more than the already minimal notion of pseudorandom generators
(or equivalently, one way functions) to obtain public key cryptogra-
phy. Unfortunately, no such result is known (and this may be inher-
ent). The kind of results we know have the following form:

Theorem: If problem 𝑋 is hard, then there exists a


CPA-secure public key encryption.

Here, 𝑋 is some problem that people have tried to solve and


couldn’t. Thus, we have various candidates for public key encryp-
tion, and we fervently hope that at least one of them is actually secure.
The dirty little secret of cryptography is that we actually don’t have that many candidates. We really have only two well studied families.2

Footnote 2: There have been some other more exotic suggestions for public key encryption (including some by yours truly, as well as suggestions such as the isogeny star problem, though see also this), but they have not yet received wide scrutiny.

One is the “group theoretic” family that relies on the difficulty of the
discrete logarithm (over modular arithmetic or elliptic curves) or the
integer factoring problem. The other is the “coding/lattice theoretic”
family that relies on the difficulty of solving noisy linear equations
or related problems such as finding short vectors in a lattice and solv-
ing instances of the “knapsack” problem. Moreover, problems from
the first family are known to be efficiently solvable in a computational
model known as “quantum computing”. If large scale physical de-
vices that simulate this model, known as quantum computers, exist,
then they could break all cryptosystems relying on these problems,
and we’ll be down to only having a single family of candidate public
key encryption schemes.
We will start by describing cryptosystems based on the first family
(which was discovered before the other and was more widely imple-
mented), and talk about the second family in future lectures.

9.3.1 Diffie-Hellman Encryption (aka El-Gamal)


The Diffie-Hellman public key system is built on the presumed diffi-
culty of the discrete logarithm problem:

For any number 𝑝, let ℤ𝑝 be the set of numbers {0, … , 𝑝 − 1} where


addition and multiplication are done modulo 𝑝. We will think of
numbers 𝑝 that are of magnitude roughly 2𝑛 , so they can be described
with about 𝑛 bits. We can clearly multiply and add such numbers
modulo 𝑝 in 𝑝𝑜𝑙𝑦(𝑛) time. If 𝑔 ∈ ℤ𝑝 and 𝑎 is any natural number,
we can define 𝑔𝑎 to be simply 𝑔 ⋅ 𝑔 ⋯ 𝑔 (𝑎 times). A priori one might
think that it would take 𝑎 ⋅ 𝑝𝑜𝑙𝑦(𝑛) time to compute 𝑔𝑎 , which might
be exponential if 𝑎 itself is roughly 2𝑛 . However, we can compute this
in 𝑝𝑜𝑙𝑦((log 𝑎) ⋅ 𝑛) time using the repeated squaring trick. The idea is that if 𝑎 = 2^ℓ, then we can compute 𝑔^𝑎 by squaring 𝑔 ℓ times, and
a general 𝑎 can be decomposed into powers of two using the binary
representation.
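For instance, here is a minimal sketch of the repeated squaring trick (Python’s built-in pow(g, a, p) implements the same idea):

    # Repeated squaring: compute g^a mod p with O(log a) multiplications by
    # walking over the binary representation of a.
    def mod_exp(g, a, p):
        result, base = 1, g % p
        while a > 0:
            if a & 1:                          # current bit of a is 1
                result = (result * base) % p
            base = (base * base) % p           # square for the next power of two
            a >>= 1
        return result

    assert mod_exp(3, 2**100 + 5, 1000003) == pow(3, 2**100 + 5, 1000003)
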
The discrete logarithm problem is the problem of computing, given 𝑔, ℎ ∈ ℤ_𝑝, a number 𝑎 such that 𝑔^𝑎 = ℎ. If such a solution 𝑎 exists then there is always also a solution of size at most 𝑝 (can you see why?) and so the solution can be represented using 𝑛 bits. However, currently the best-known algorithm for computing the discrete logarithm runs in time roughly 2^(𝑛^(1/3)), which is currently prohibitively expensive when 𝑝 is a prime of length about 2048 bits.3

Footnote 3: The running time of the best known algorithms for computing the discrete logarithm modulo 𝑛 bit primes is 2^(𝑓(𝑛)⋅𝑛^(1/3)), where 𝑓(𝑛) is a function that depends polylogarithmically on 𝑛. If 𝑓(𝑛) were equal to 1, then we’d need numbers of 128^3 ≈ 2⋅10^6 bits to get 128 bits of security, but because 𝑓(𝑛) is larger than one, the current estimates are that we need an 𝑛 = 3072 bit key to get 128 bits of security. Still, the existence of such a non-trivial algorithm means that we need much larger keys than those used for private key systems to get the same level of security. In particular, to double the estimated security to 256 bits, NIST recommends that we multiply the RSA keysize five-fold to 15,360. (The same document also says that SHA-256 gives 256 bits of security as a pseudorandom generator but only 128 bits when used to hash documents for digital signatures; can you see why?)

John Gill suggested to Diffie and Hellman that modular exponentiation can be a good source for the kind of “easy-to-compute but hard-to-invert” functions they were looking for. Diffie and Hellman based a public key encryption scheme on it as follows:

• The key generation algorithm, on input 𝑛, samples a random prime number 𝑝 with an 𝑛-bit description (i.e., between 2^(𝑛−1) and 2^𝑛), a number 𝑔 ←_R ℤ_𝑝, and 𝑎 ←_R {0, … , 𝑝 − 1}. We also sample a hash function 𝐻 ∶ {0, 1}^𝑛 → {0, 1}^ℓ. The public key 𝑒 is (𝑝, 𝑔, 𝑔^𝑎, 𝐻), while the secret key 𝑑 is 𝑎.4

• The encryption algorithm, on input a message 𝑚 ∈ {0, 1}^ℓ and a public key 𝑒 = (𝑝, 𝑔, ℎ, 𝐻), will choose a random 𝑏 ←_R {0, … , 𝑝 − 1} and output (𝑔^𝑏, 𝐻(ℎ^𝑏) ⊕ 𝑚).

• The decryption algorithm, on input a ciphertext (𝑓, 𝑦) and the secret key, will output 𝐻(𝑓^𝑎) ⊕ 𝑦.

Footnote 4: Formally, the secret key should contain all the information in the public key plus the extra secret information, but we omit the public information for simplicity of notation.

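To make the scheme concrete, here is a minimal sketch of it in Python (an illustration, not production code: SHA-256 stands in for the random-oracle hash 𝐻, the prime and generator are toy choices rather than 2048-bit parameters, and messages are encoded as integers of at most 256 bits):

    # Sketch of the Diffie-Hellman public key encryption described above.
    import hashlib, secrets

    P = 2**127 - 1        # toy prime; real deployments use much larger primes
    G = 3                 # toy choice of g

    def H(x: int) -> int:                 # random-oracle stand-in, ell = 256
        return int.from_bytes(hashlib.sha256(x.to_bytes(16, 'big')).digest(), 'big')

    def keygen():
        a = secrets.randbelow(P - 1)
        return pow(G, a, P), a            # public h = g^a, secret a

    def encrypt(h, m):
        b = secrets.randbelow(P - 1)
        return pow(G, b, P), H(pow(h, b, P)) ^ m      # (g^b, H(h^b) XOR m)

    def decrypt(a, ctxt):
        f, y = ctxt
        return H(pow(f, a, P)) ^ y        # H(f^a) XOR y

    h, a = keygen()
    assert decrypt(a, encrypt(h, 12345)) == 12345
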
The correctness of the decryption algorithm follows from the fact that (𝑔^𝑎)^𝑏 = (𝑔^𝑏)^𝑎 = 𝑔^(𝑎𝑏), and hence the value 𝐻(ℎ^𝑏) computed by the encryption algorithm is the same as the value 𝐻(𝑓^𝑎) computed by the decryption algorithm. A simple relation between the discrete logarithm and the Diffie-Hellman system is the following:
Lemma 9.7 If there is a polynomial time algorithm for the discrete
logarithm problem, then the Diffie-Hellman system is insecure.

Proof. Using a discrete logarithm algorithm, we can compute the


private key 𝑎 from the parameters 𝑝, 𝑔, 𝑔𝑎 present in the public key,
and clearly once we know the private key we can decrypt any message
of our choice.

Unfortunately, no such result is known in the other direction. How-


ever, we can prove that this protocol is secure in the random-oracle
model, under the assumption that the task of computing 𝑔𝑎𝑏 from 𝑔𝑎
and 𝑔𝑏 (which is now known as the Diffie-Hellman problem) is hard.
Computational Diffie-Hellman Assumption: Let
𝔾 be a group whose elements can be described in
𝑛 bits, with an associative and commutative multi-
plication operation that can be computed in 𝑝𝑜𝑙𝑦(𝑛)
time. The Computational Diffie-Hellman (CDH) as-
sumption holds with respect to the group 𝔾 if for
every generator (see below) 𝑔 of 𝔾 and efficient
algorithm 𝐴, the probability that on input 𝑔, 𝑔𝑎 , 𝑔𝑏 ,
𝐴 outputs the element 𝑔𝑎𝑏 is negligible as a function
of 𝑛. 5
Footnote 5: Formally, since it is an asymptotic statement, the CDH assumption needs to be defined with a sequence of groups. However, to make notation simpler we will ignore this issue, and use it only for groups (such as the numbers modulo some 𝑛 bit primes) where we can easily increase the “security parameter” 𝑛.

In particular we can make the following conjecture:

Computational Diffie-Hellman Conjecture for mod prime groups: For a random 𝑛-bit prime 𝑝 and random 𝑔 ∈ ℤ_𝑝, the CDH holds with respect to the group 𝔾 = {𝑔^𝑎 mod 𝑝 | 𝑎 ∈ ℤ}.

That is, for every polynomial 𝑞 ∶ ℕ → ℕ, if 𝑛 is large enough,


then with probability at least 1 − 1/𝑞(𝑛) over the choice of a uniform
prime 𝑝 ∈ [2𝑛 ] and 𝑔 ∈ ℤ𝑝 , for every circuit 𝐴 of size at most 𝑞(𝑛), the
probability that 𝐴(𝑔, 𝑝, 𝑔𝑎 , 𝑔𝑏 ) outputs ℎ such that 𝑔𝑎𝑏 = ℎ mod 𝑝 is at
most 1/𝑞(𝑛) where the probability is taken over 𝑎, 𝑏 chosen at random
in ℤ𝑝 . (In practice people often take 𝑔 to be a generator of a group
significantly smaller in size than 𝑝, which enables 𝑎, 𝑏 to be smaller
numbers and hence multiplication to be more efficient; we ignore this
optimization in our discussions.)

P
Please take your time to re-read the following conjec-
ture until you are sure you understand what it means.
Victor Shoup’s excellent and online available book A
Computational Introduction to Number Theory and
Algebra has an in depth treatment of groups, genera-
tors, and the discrete log and Diffie-Hellman problem.
See also Chapters 10.4 and 10.5 in the Boneh-Shoup
book, and Chapters 8.3 and 11.4 in the Katz-Lindell
book. There are also solved group theory exercises at
the end of this chapter.

Theorem 9.8 — Diffie-Hellman security in Random Oracle Model. Suppose that the Computational Diffie-Hellman Conjecture for mod prime
groups is true. Then, the Diffie-Hellman public key encryption is
CPA secure in the random oracle model.

Proof. For CPA security we need to prove that (for fixed 𝔾 of size 𝑝
and random oracle 𝐻) the following two distributions are computa-
tionally indistinguishable for every two strings 𝑚, 𝑚′ ∈ {0, 1}ℓ :

• (𝑔𝑎 , 𝑔𝑏 , 𝐻(𝑔𝑎𝑏 ) ⊕ 𝑚) for 𝑎, 𝑏 chosen uniformly and independently in


ℤ𝑝 .

• (𝑔𝑎 , 𝑔𝑏 , 𝐻(𝑔𝑎𝑏 ) ⊕ 𝑚′ ) for 𝑎, 𝑏 chosen uniformly and independently in


ℤ𝑝 .

(can you see why this implies CPA security? you should pause here
and verify this!)
We make the following claim:
CLAIM: For a fixed 𝔾 of size 𝑝, generator 𝑔 for 𝔾, and given random
oracle 𝐻, if there is a size 𝑇 distinguisher 𝐴 with 𝜖 advantage between
the distribution (𝑔𝑎 , 𝑔𝑏 , 𝐻(𝑔𝑎𝑏 )) and the distribution (𝑔𝑎 , 𝑔𝑏 , 𝑈ℓ ) (where
𝑎, 𝑏 are chosen uniformly and independently in ℤ𝑝 ), then there is a
size 𝑝𝑜𝑙𝑦(𝑇 ) algorithm 𝐴′ to solve the Diffie-Hellman problem with
respect to 𝔾, 𝑔 with success at least 𝜖/(2𝑇). That is, for random 𝑎, 𝑏 ∈ ℤ_𝑝, 𝐴′(𝑔, 𝑔^𝑎, 𝑔^𝑏) = 𝑔^(𝑎𝑏) with probability at least 𝜖/(2𝑇).
Proof of claim: The proof is simple. We claim that under the as-
sumptions above, 𝐴 makes the query 𝑔𝑎𝑏 to its oracle 𝐻 with probabil-
ity at least 𝜖/2 since otherwise, by the “lazy evaluation” paradigm, we
can assume that 𝐻(𝑔𝑎𝑏 ) is chosen independently at random after 𝐴’s
attack is completed and hence (conditioned on the adversary not mak-
ing that query), the value 𝐻(𝑔𝑎𝑏 ) is indistinguishable from a uniform
output. Therefore, on input 𝑔, 𝑔𝑎 , 𝑔𝑏 , 𝐴′ can simulate 𝐴 and simply
output one of the at most 𝑇 queries that 𝐴 makes to 𝐻 at random and
will be successful with probability at least 𝜖/(2𝑇 ).
Now given the claim, we can complete the proof of security via the
following hybrids. Define the following “hybrid” distributions (where
in all cases 𝑎, 𝑏 are chosen uniformly and independently in ℤ𝑝 ):

• 𝐻0 : (𝑔𝑎 , 𝑔𝑏 , 𝐻(𝑔𝑎𝑏 ) ⊕ 𝑚)

• 𝐻1 : (𝑔𝑎 , 𝑔𝑏 , 𝑈ℓ ⊕ 𝑚)

• 𝐻2 : (𝑔𝑎 , 𝑔𝑏 , 𝑈ℓ ⊕ 𝑚′ )

• 𝐻3 : (𝑔𝑎 , 𝑔𝑏 , 𝐻(𝑔𝑎𝑏 ) ⊕ 𝑚′ )

The claim implies that 𝐻0 ≈ 𝐻1 . Indeed otherwise we could trans-


form a distinguisher 𝑇 between 𝐻0 and 𝐻1 to a distinguisher 𝑇 ′ ,
violating the claim by letting 𝑇 ′ (ℎ, ℎ′ , 𝑧) = 𝑇 (ℎ, ℎ′ , 𝑧 ⊕ 𝑚).
The distributions 𝐻1 and 𝐻2 are identical by the same argument as
the security of the one time pad (since 𝑈ℓ ⊕ 𝑚 is identical to 𝑈ℓ ).
The distributions 𝐻2 and 𝐻3 are computationally indistinguishable
by the same argument that 𝐻0 ≈ 𝐻1 .
Together these imply that 𝐻0 ≈ 𝐻3 which yields the CPA security
of the scheme.

R
Remark 9.9 — Decisional Diffie Hellman. One can get
security results for this protocol without a random
oracle if we assume a stronger variant known as
the Decisional Diffie-Hellman (DDH) assumption:
for a random 𝑎, 𝑏, 𝑢 ∈ ℤ𝑝 (prime 𝑝), the triple
(𝑔𝑎 , 𝑔𝑏 , 𝑔𝑎𝑏 ) ≈ (𝑔𝑎 , 𝑔𝑏 , 𝑔𝑢 ). This implies CDH (can
you see why?). DDH also restricts our focus to groups
of prime order. In particular, DDH does not hold in
even-order groups. For example, DDH does not hold
in ℤ_𝑝^* = {1, 2, … , 𝑝 − 1} (with group operation multipli-
cation mod 𝑝) since half of its elements are quadratic
residues and it is efficient to test if an element is a
quadratic residue using Fermat’s little theorem (can
you see why? See Exercise 10.7). However, DDH
holds in subgroups of ℤ𝑝 of prime order. If 𝑝 is a safe
prime (i.e. 𝑝 = 2𝑞 + 1 for a prime 𝑞), then we can
instead use the subgroup of quadratic residues, which
has prime order 𝑞. See Boneh-Shoup 10.4.1 for more
details on the underlying groups for CDH and DDH.

R
Remark 9.10 — Elliptic curve cryptography. As men-
tioned, the Diffie-Hellman systems can be run with
many variants of Abelian groups. Of course, for some
of those groups the discrete logarithm problem might
be easy, and so they would be inappropriate to use
for this system. One variant that has been proposed
is elliptic curve cryptography. This is a group consist-
ing of points of the form (𝑥, 𝑦, 𝑧) ∈ ℤ3𝑝 that satisfy a
certain equation, where multiplication can be defined
in a certain way. The main advantage of elliptic curve
cryptography is that the best known algorithms run in
time 2^(≈𝑛) as opposed to 2^(≈𝑛^(1/3)), which allows for much shorter keys. Unfortunately, elliptic curve cryptography is just as susceptible to quantum algorithms as the discrete logarithm problem over ℤ_𝑝.

R
Remark 9.11 — Encryption vs Key Exchange and ElGamal.
In most of the cryptography literature the protocol
above is called the Diffie-Hellman Key Exchange pro-
tocol, and when considered as a public key system
it is sometimes known as ElGamal encryption. 6 The
reason for this mostly stems from the early confusion
on what the right security definitions are. Diffie and
Hellman thought of encryption as a deterministic pro-
cess and so they called their scheme a “key exchange
protocol”. The work of Goldwasser and Micali showed
that encryption must be probabilistic for security.
Also, because of efficiency considerations, these days
public key encryption is mostly used as a mechanism
to exchange a key for a private key encryption that is
then used for the bulk of the communication. Together
this means that there is not much point in distinguish-
ing between a two-message key exchange algorithm
and a public key encryption.
Footnote 6: ElGamal’s actual contribution was to design a signature scheme based on the Diffie-Hellman problem, a variant of which is the Digital Signature Algorithm (DSA) described below.

9.3.2 Sampling random primes

To sample a random 𝑛 bit prime, one can sample a random number


0 ≤ 𝑝 < 2𝑛 and then test if 𝑝 is prime. If it is not prime, then we can
sample a new random number again. To make this work we need to
show two properties:
Efficient testing: That there is a 𝑝𝑜𝑙𝑦(𝑛) time algorithm to test
whether an 𝑛 bit number is a prime. It turns out that there are such
known algorithms. Randomized algorithms have been known since the
1970’s. Moreover in a 2002 breakthrough, Manindra Agrawal, Neeraj
Kayal, and Nitin Saxena (a professor and two undergraduate students
from the Indian Institute of Technology Kanpur) came up with the
first deterministic polynomial time algorithm for testing primality.
Prime density: That the probability that a random 𝑛 bit number
is prime is at least 1/𝑝𝑜𝑙𝑦(𝑛). This probability is in fact 1/ ln(2𝑛 ) =
Ω(1/𝑛) by the Prime Number Theorem. However, for the sake of
completeness, we sketch below a simple argument showing the proba-
bility is at least Ω(1/𝑛2 ).
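Here is a minimal sketch of this rejection-sampling procedure (an illustration, not from these notes), using the Miller-Rabin randomized primality test for the efficient-testing step:

    # Sketch: sample a random n-bit prime by rejection sampling, testing
    # candidates with the Miller-Rabin randomized primality test.
    import secrets

    def is_probable_prime(p, trials=40):
        if p in (2, 3):
            return True
        if p < 2 or p % 2 == 0:
            return False
        r, d = 0, p - 1
        while d % 2 == 0:                  # write p - 1 = 2^r * d with d odd
            r, d = r + 1, d // 2
        for _ in range(trials):
            a = secrets.randbelow(p - 3) + 2       # witness in {2, ..., p-2}
            x = pow(a, d, p)
            if x in (1, p - 1):
                continue
            for _ in range(r - 1):
                x = (x * x) % p
                if x == p - 1:
                    break
            else:
                return False               # a witnesses that p is composite
        return True

    def random_prime(n):
        while True:                        # succeeds in expected O(n) attempts
            p = secrets.randbits(n) | (1 << (n - 1))   # force exactly n bits
            if is_probable_prime(p):
                return p

    print(random_prime(256))
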
Lemma 9.12 The number of primes between 1 and 𝑁 is Ω(𝑁 / log 𝑁 ).

Proof. Recall that the least common multiple (LCM) of two or more numbers 𝑎_1, … , 𝑎_𝑡 is the smallest number that is a multiple of all of the 𝑎_𝑖’s. One way to compute the LCM of 𝑎_1, … , 𝑎_𝑡 is to take the prime factorizations of all the 𝑎_𝑖’s, and then the LCM is the product of all the primes that appear in these factorizations, each taken to the corresponding highest power that appears in the factorization. Let 𝑘 be the number of primes between 1 and 𝑁. The lemma will follow from the following two claims:

CLAIM 1: LCM(1, … , 𝑁) ≤ 𝑁^𝑘.

CLAIM 2: If 𝑁 is odd, then LCM(1, … , 𝑁) ≥ 2^(𝑁−1).

The two claims immediately imply the result, since they imply that 2^(𝑁−1) ≤ 𝑁^𝑘, and taking logs we get that 𝑁 − 1 ≤ 𝑘 log 𝑁, or 𝑘 ≥ (𝑁 − 1)/ log 𝑁. (We can assume that 𝑁 is odd without loss of generality, since changing from 𝑁 to 𝑁 + 1 can change the number of primes by at most one.) Thus, all that is left is to prove the two claims.

Proof of CLAIM 1: Let 𝑝_1, … , 𝑝_𝑘 be all the prime numbers between 1 and 𝑁, let 𝑒_𝑖 be the largest integer such that 𝑝_𝑖^(𝑒_𝑖) ≤ 𝑁, and let 𝐿 = 𝑝_1^(𝑒_1) ⋯ 𝑝_𝑘^(𝑒_𝑘). Since 𝐿 is the product of 𝑘 terms, each of size at most 𝑁, 𝐿 ≤ 𝑁^𝑘. But we claim that every number 1 ≤ 𝑎 ≤ 𝑁 divides 𝐿. Indeed, every prime 𝑝 in the prime factorization of 𝑎 is one of the 𝑝_𝑖’s, and since 𝑎 ≤ 𝑁, the power in which 𝑝 appears in 𝑎 is at most 𝑒_𝑖. By the definition of the least common multiple, this means that LCM(1, … , 𝑁) ≤ 𝐿 ≤ 𝑁^𝑘. QED (CLAIM 1)

Proof of CLAIM 2: Consider the integral 𝐼 = ∫_0^1 𝑥^((𝑁−1)/2) (1 − 𝑥)^((𝑁−1)/2) 𝑑𝑥. This is clearly some positive number and so 𝐼 > 0. On one hand, for every 𝑥 between zero and one, 𝑥(1 − 𝑥) ≤ 1/4, and hence 𝐼 is at most 4^(−(𝑁−1)/2) = 2^(−𝑁+1). On the other hand, the polynomial 𝑥^((𝑁−1)/2) (1 − 𝑥)^((𝑁−1)/2) is some polynomial of degree at most 𝑁 − 1 with integer coefficients, and so 𝐼 = ∑_{𝑘=0}^{𝑁−1} 𝐶_𝑘 ∫_0^1 𝑥^𝑘 𝑑𝑥 for some integer coefficients 𝐶_0, … , 𝐶_{𝑁−1}. Since ∫_0^1 𝑥^𝑘 𝑑𝑥 = 1/(𝑘+1), we see that 𝐼 is a sum of fractions with integer numerators and with denominators that are at most 𝑁. Since all the denominators are at most 𝑁 and 𝐼 > 0, it follows that 𝐼 ≥ 1/LCM(1, … , 𝑁), and so

2^(−𝑁+1) ≥ 𝐼 ≥ 1/LCM(1, … , 𝑁),

which implies LCM(1, … , 𝑁) ≥ 2^(𝑁−1). QED (CLAIM 2 and hence the


lemma)

9.3.3 A little bit of group theory.


If you haven’t seen group theory, it might be useful for you to do a
quick review. We will not use much group theory and mostly use
the theory of finite commutative (also known as Abelian) groups (in
fact often cyclic) which are such a baby version that it might not be
considered true “group theory” by many group theorists. Shoup’s
excellent book contains everything we need to know (and much more
than that). What you need to remember is the following:

• A finite commutative group 𝔾 is a finite set together with a multiplica-


tion operation that satisfies 𝑎 ⋅ 𝑏 = 𝑏 ⋅ 𝑎 and (𝑎 ⋅ 𝑏) ⋅ 𝑐 = 𝑎 ⋅ (𝑏 ⋅ 𝑐).

• 𝔾 has a special element known as 1, where 𝑔1 = 1𝑔 = 𝑔 for every


𝑔 ∈ 𝔾 and for every 𝑔 ∈ 𝔾 there exists an element 𝑔−1 ∈ 𝔾 such that
𝑔𝑔−1 = 1.

• For every 𝑔 ∈ 𝔾, the order of 𝑔, denoted 𝑜𝑟𝑑𝑒𝑟(𝑔), is the smallest


positive integer 𝑎 such that 𝑔𝑎 = 1.

The following basic facts are all not too hard to prove and would be
useful exercises:

• For every 𝑔 ∈ 𝔾, the map 𝑎 ↦ 𝑔^𝑎 is a 𝑘 to 1 map from {0, … , |𝔾| − 1} to 𝔾, where 𝑘 = |𝔾|/𝑜𝑟𝑑𝑒𝑟(𝑔). See footnote for hint.7

Footnote 7: For every 𝑓 ∈ 𝔾, you can show a one to one and onto mapping between the set {𝑎 ∶ 𝑔^𝑎 = 1} and the set {𝑏 ∶ 𝑔^𝑏 = 𝑓} by choosing some element 𝑏 from the latter set and looking at the map 𝑎 ↦ 𝑎 + 𝑏 mod |𝔾|.
• As a corollary, the order of 𝑔 is always a divisor of |𝔾|. This is a
special case of a more general phenomenon: the set {𝑔𝑎 | 𝑎 ∈ ℤ} is a
subset of the group 𝔾 that is closed under multiplication, and such
subsets are known as subgroups of 𝔾. It is not hard to show (using
the same approach as above) that for every group 𝔾 and subgroup
ℍ, the size of ℍ divides the size of 𝔾. This is known as Lagrange’s
Theorem in group theory.

• An element 𝑔 of 𝔾 is called a generator if 𝑜𝑟𝑑𝑒𝑟(𝑔) = |𝔾|. A group


is called cyclic if it has a generator. If 𝔾 is cyclic then there is a (not
necessarily efficiently computable) isomorphism 𝜙 ∶ 𝔾 → ℤ|𝔾| which
is a one-to-one and onto map satisfying 𝜙(𝑔 ⋅ ℎ) = 𝜙(𝑔) + 𝜙(ℎ) for
every 𝑔, ℎ ∈ 𝔾.

When using a group 𝔾 for the Diffie-Hellman protocol, we want


the property that 𝑔 is a generator of the group, which also means that
the map 𝑎 ↦ 𝑔𝑎 is a one-to-one mapping from {0, … , |𝔾| − 1} to 𝔾.
This can be efficiently tested if we know the order of the group and its factorization, since it will occur if and only if 𝑔^𝑎 ≠ 1 for every 𝑎 < |𝔾| (can you see why this holds?), and we know that if 𝑔^𝑎 = 1 then 𝑎 must divide |𝔾| (and this?).

It is not hard to show that a random element 𝑔 ∈ 𝔾 will be a generator with non-trivial probability (for similar reasons that a random number is prime with non-trivial probability). Hence, an approach to getting such a generator is to simply choose 𝑔 at random and test that 𝑔^𝑎 ≠ 1 for all of the fewer than log |𝔾| numbers 𝑎 that are obtained by taking |𝔾|/𝑞 where 𝑞 is a prime factor of |𝔾|, as in the sketch below.
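Here is a minimal sketch of this procedure (an illustration, not from these notes) for the group ℤ*_𝑝 of order 𝑝 − 1, assuming the prime factorization of the order is known:

    # Sketch: find a generator of Z_p^* (order p-1), given the prime
    # factors of p-1.  g is a generator iff g^((p-1)/q) != 1 for every
    # prime factor q of p-1.
    import secrets

    def is_generator(g, p, prime_factors):
        return all(pow(g, (p - 1) // q, p) != 1 for q in prime_factors)

    def find_generator(p, prime_factors):
        while True:     # a random element is a generator with good probability
            g = secrets.randbelow(p - 2) + 2       # g in {2, ..., p-1}
            if is_generator(g, p, prime_factors):
                return g

    # Example: p = 23, so p - 1 = 22 = 2 * 11
    print(find_generator(23, [2, 11]))
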

P
Try to stop here and verify all the facts on groups
mentioned above. There are additional group theory
exercises at the end of the chapter as well.

9.3.4 Digital Signatures


Public key encryption solves the confidentiality problem, but we still
need to solve the authenticity or integrity problem, which might be
even more important in practice. That is, suppose Alice wants to en-
dorse a message 𝑚 that everyone can verify, but only she can sign.
This of course is extremely widely used in many settings, including
software updates, web pages, financial transactions, and more.

Definition 9.13 — Digital Signatures and CMA security. A triple of algorithms (𝐺, 𝑆, 𝑉) is a chosen-message-attack secure digital signature
scheme if it satisfies the following:

• On input 1𝑛 , the probabilistic key generation algorithm 𝐺 outputs


a pair (𝑠, 𝑣) of keys, where 𝑠 is the private signing key and 𝑣 is the
public verification key.

• On input a message 𝑚 and the signing key 𝑠, the signing algo-


rithm 𝑆 outputs a string 𝜎 = 𝑆𝑠 (𝑚) such that with probability
1 − 𝑛𝑒𝑔𝑙(𝑛), 𝑉𝑣 (𝑚, 𝑆𝑠 (𝑚)) = 1.

• Every efficient adversary 𝐴 wins the following game with at


most negligible probability:

1. The keys (𝑠, 𝑣) are chosen by the key generation algorithm.


2. The adversary gets the inputs 1𝑛 , 𝑣, and black box access to
the signing algorithm 𝑆𝑠 (⋅).
3. The adversary wins if they output a pair (𝑚∗ , 𝜎∗ ) such that
𝑚∗ was not queried before to the signing algorithm and
𝑉𝑣 (𝑚∗ , 𝜎∗ ) = 1.

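As with the CPA game, this attack can be rendered as a short sketch (an illustration, not from these notes):

    # Sketch of the chosen-message-attack game of Definition 9.13.  The
    # adversary A(v, sign_oracle) returns an attempted forgery (m*, sigma*).
    def cma_game(G, S, V, A, n):
        s, v = G(n)                       # (s,v) sampled from G(1^n)
        queried = set()
        def sign_oracle(m):               # black box access to S_s
            queried.add(m)
            return S(s, m)
        m_star, sigma_star = A(v, sign_oracle)
        # the forgery must verify AND be on a message never queried before
        return m_star not in queried and V(v, m_star, sigma_star) == 1
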
R
Remark 9.14 — Strong unforgeability. Just like for MACs
(see Definition 4.8), our definition of security for
digital signatures with respect to a chosen message
attack does not preclude the ability of the adversary
to produce a new signature for the same message that
it has seen a signature of. Just like in MACs, people
sometimes consider the notion of strong unforgeability
which requires that it would not be possible for the
adversary to produce a new message-signature pair
(even if the message itself was queried before). Some
signature schemes (such as the full domain hash and
the DSA scheme) satisfy this stronger notion while
others do not. However, just like MACs, it is possible
to transform any signature with standard security into

a signature that satisfies this stronger unforgeability


condition.

9.3.5 The Digital Signature Algorithm (DSA)


The Diffie-Hellman protocol can be turned into a signature scheme.
This was first done by ElGamal, and a variant of his scheme was de-
veloped by the NSA and standardized by NIST as the Digital Signa-
ture Algorithm (DSA) standard. When based on an elliptic curve this
is known as ECDSA. The starting point is the following generic idea of
how to turn an encryption scheme into an identification protocol.
If Alice published a public encryption key 𝑒, then one natural ap-
proach for Alice to prove her identity to Bob is as follows. Bob will
send an encryption 𝑐 = 𝐸𝑒 (𝑥) of some random message 𝑥 ←𝑅 {0, 1}𝑛
to Alice, and Alice will send 𝑥′ = 𝐷𝑑 (𝑐) back. If 𝑥 = 𝑥′ , then she has
proven that she can decrypt ciphertexts encrypted with 𝑒, and so Bob
can be assured that she is the rightful owner of the public key 𝑒.
However, this falls short of a signature scheme in two aspects:

• This is only an identification protocol and does not allow Alice to


endorse a particular message 𝑚.

• This is an interactive protocol, and so Alice cannot generate a static


signature based on 𝑚 that can be verified by any party without
further interaction.

The first issue is not so significant, since we can always have the
ciphertext be an encryption of 𝑥 = 𝐻(𝑚) where 𝐻 is some hash
function presumed to behave as a random oracle. (We do not want to
simply run this protocol with 𝑥 = 𝑚. Can you see why?)
The second issue is more serious. We could imagine Alice trying
to run this protocol on her own by generating the ciphertext and then
decrypting it, and then sending over the transcript to Bob. But this
does not really prove that she knows the corresponding private key.
After all, even without knowing 𝑑, any party can generate a ciphertext
𝑐 and its corresponding decryption. The idea behind the DSA protocol
is that we require Alice to generate a ciphertext 𝑐 and its decryption
satisfying some additional extra conditions, which would prove that
Alice truly knew the secret key.

DSA Signatures: The DSA signature algorithm works as follows: (See also Section 12.5.2 in the KL book)

• Key generation: Pick a generator 𝑔 for 𝔾 and 𝑎 ∈ {0, … , |𝔾| − 1} and let ℎ = 𝑔^𝑎. Pick 𝐻 ∶ {0, 1}^ℓ → 𝔾 and 𝐹 ∶ 𝔾 → 𝔾 to be some functions that can be thought of as “hash functions”.8 The public key is (𝑔, ℎ) (as well as the functions 𝐻, 𝐹) and the secret key is 𝑎.

Footnote 8: It is a bit cumbersome, but not so hard, to transform functions that map strings to strings to functions whose domain or range are group elements. As noted in the KL book, in the actual DSA protocol 𝐹 is not a cryptographic hash function but rather some very simple function that is still assumed to be “good enough” for security.

• Signature: To sign a message 𝑚 with the key 𝑎, pick 𝑏 at random, and let 𝑓 = 𝑔^𝑏, and then let 𝜎 = 𝑏^(−1)[𝐻(𝑚) + 𝑎 ⋅ 𝐹(𝑓)], where all computation is done modulo |𝔾|. The signature is (𝑓, 𝜎).

• Verification: To verify a signature (𝑓, 𝜎) on a message 𝑚, check that 𝜎 ≠ 0 and 𝑓^𝜎 = 𝑔^(𝐻(𝑚)) ℎ^(𝐹(𝑓)).
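To make this concrete, here is a minimal sketch of the scheme in Python (an illustration, not the standardized DSA): it works in the order-11 subgroup of ℤ*_23 as a toy group 𝔾, and SHA-256 reduced modulo |𝔾| stands in for both “hash functions” 𝐻 and 𝐹, mapping into the exponent space:

    # Sketch of the DSA-style scheme above over a toy group (NOT real DSA).
    # Requires Python 3.8+ for pow(b, -1, Q) (modular inverse).
    import hashlib, secrets

    Q = 11                    # group order; p = 2*Q + 1 = 23 is a safe prime
    P = 2 * Q + 1
    G = 4                     # 4 generates the order-11 subgroup of Z_23^*

    def Hq(data: bytes) -> int:             # stand-in for H and F, into Z_Q
        return int.from_bytes(hashlib.sha256(data).digest(), 'big') % Q

    def keygen():
        a = secrets.randbelow(Q - 1) + 1
        return pow(G, a, P), a              # public h = g^a, secret a

    def sign(a, m: bytes):
        while True:
            b = secrets.randbelow(Q - 1) + 1
            f = pow(G, b, P)                # f = g^b
            s = (pow(b, -1, Q) * (Hq(m) + a * Hq(f.to_bytes(1, 'big')))) % Q
            if s != 0:                      # sigma must be nonzero
                return f, s

    def verify(h, m: bytes, sig):
        f, s = sig
        lhs = pow(f, s, P)                  # f^sigma
        rhs = (pow(G, Hq(m), P) * pow(h, Hq(f.to_bytes(1, 'big')), P)) % P
        return s != 0 and lhs == rhs        # f^sigma == g^H(m) * h^F(f)

    h, a = keygen()
    assert verify(h, b"hello", sign(a, b"hello"))
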

P
You should pause here and verify that this is indeed a
valid signature scheme, in the sense that for every 𝑚,
𝑉𝑠 (𝑚, 𝑆𝑠 (𝑚)) = 1.

Very roughly speaking, the idea behind security is that on one hand
𝜎 does not reveal information about 𝑏 and 𝑎 because this is “masked”
by the “random” value 𝐻(𝑚). On the other hand, if an adversary is
able to come up with valid signatures, then at least if we treated 𝐻
and 𝐹 as oracles, if the signature passes verification then (by taking
log to the base of 𝑔) the answers 𝑥, 𝑦 of these oracles will satisfy 𝑏𝜎 =
𝑥 + 𝑎𝑦, which means that sufficiently many such equations should be
enough to recover the discrete log 𝑎.

P
Before seeing the actual proof, it is a very good exer-
cise to try to see how to convert the intuition above
into a formal proof.

Theorem 9.15 — Random-Oracle Model Security of DSA signatures. Suppose that the discrete logarithm assumption holds for the group 𝔾. Then
the DSA signature with 𝔾 is secure when 𝐻, 𝐹 are modeled as
random oracles.

Proof. Suppose, for the sake of contradiction, that there was a 𝑇 -time
adversary 𝐴 that succeeds with probability 𝜖 in a chosen message
attack against the DSA scheme. We will show that there is an adver-
sary that can compute the discrete logarithm with running time and
probability polynomially related to 𝑇 and 𝜖 respectively.
Recall that in a chosen message attack in the random oracle model,
the adversary interacts with a signature oracle and oracles that com-
pute the functions 𝐹 and 𝐻. For starters, we consider the following
experiment CMA′, where in the chosen message attack we replace the

signature box with the following “fake signature oracle” and “fake
function 𝐹 oracle”:

On input a message 𝑚, the fake box will choose 𝜎, 𝑟 at random in {0, … , 𝑝 − 1} (where 𝑝 = |𝔾|), and compute

𝑓 = (𝑔^(𝐻(𝑚)) ℎ^𝑟)^(𝜎^(−1)) mod 𝑝    (9.1)

and output (𝑓, 𝜎). We will then record the value 𝐹 (𝑓) = 𝑟 and answer
𝑟 on future queries to 𝐹 . If we’ve already answered 𝐹 (𝑓) before with a
different value, then we halt the experiment and output an error. We
claim that the adversary’s chance of succeeding in CMA′ is computationally indistinguishable from its chance of succeeding in the original CMA experiment. Indeed, since we choose the value 𝑟 = 𝐹(𝑓) at ran-
dom, as long as we don’t repeat a value 𝑓 that was queried before, the
function 𝐹 is completely random. But since the adversary makes at
most 𝑇 queries, and each 𝑓 is chosen according to (9.1), which yields
a random element of the group 𝔾 (which has size roughly 2^𝑛), the prob-
ability that 𝑓 is repeated is at most 𝑇 /|𝔾|, which is negligible. Now we
computed 𝜎 in the fake box as a random value, but we can also com-
pute 𝜎 as equaling 𝑏−1 (𝐻(𝑚) + 𝑎𝑟) mod 𝑝, where 𝑏 = log𝑔 𝑓 mod 𝑝
is uniform as well, and so the distribution of the signature (𝑓, 𝜎) is
identical to the distribution produced by a real box.
Note that we can simulate the result of the experiment CMA′ without access to the value 𝑎 such that ℎ = 𝑔^𝑎. We now transform an algorithm 𝐴′ that manages to forge a signature in the CMA′ experiment into an algorithm that, given 𝔾, 𝑔, 𝑔^𝑎, manages to recover 𝑎.


We let (𝑚∗ , 𝑓 ∗ , 𝜎∗ ) be the message and signature that the adversary
𝐴′ outputs at the end of a successful attack. We can assume without
loss of generality that 𝑓 ∗ is queried to the 𝐹 oracle at some point dur-
ing the attack. (For example, by modifying 𝐴′ to make this query just
before she outputs the final signature.) So, we split into two cases:
Case I: The value 𝐹 (𝑓 ∗ ) is first queried by the signature box.
Case II: The value 𝐹 (𝑓 ∗ ) is first queried by the adversary.
If Case I happens with non-negligible probability, then we know
that the value 𝑓 ∗ is queried when producing the signature (𝑓 ∗ , 𝜎) for
some message 𝑚 ≠ 𝑚∗ , and so we know the following two equations
hold:

𝑔^(𝐻(𝑚)) ℎ^(𝐹(𝑓*)) = (𝑓*)^𝜎

and

𝑔^(𝐻(𝑚*)) ℎ^(𝐹(𝑓*)) = (𝑓*)^(𝜎*)

Taking logs we get the following equations on 𝑎 = log𝑔 ℎ and 𝑏 =


log𝑔 𝑓 ∗ :
𝐻(𝑚) + 𝑎𝐹 (𝑓 ∗ ) = 𝑏𝜎

and
𝐻(𝑚∗ ) + 𝑎𝐹 (𝑓 ∗ ) = 𝑏𝜎∗

or
𝑏 = (𝐻(𝑚) − 𝐻(𝑚*))(𝜎 − 𝜎*)^(−1) mod 𝑝.

Since all of the values 𝐻(𝑚*), 𝐻(𝑚), 𝜎, 𝜎* are known, this means we
can compute 𝑏, and hence also recover the unknown value 𝑎.
If Case II happens, then we split it into two cases as well.
Case IIa is the subcase of Case II where 𝐹 (𝑓 ∗ ) is queried before
𝐻(𝑚*) is queried, and Case IIb is the subcase of Case II when 𝐹(𝑓*)
is queried after 𝐻(𝑚∗ ) is queried.
We start by considering the setting that Case IIa happens with
non-negligible probability 𝜖. By the averaging argument there are
some 𝑡′ < 𝑡 ∈ {1, … , 𝑇 } such that with probability at least 𝜖/𝑇 2 , 𝑓 ∗ is
queried by the adversary at the 𝑡′ -th query and 𝑚∗ is queried by the
adversary at its 𝑡-th query. We run the CMA′ experiment twice, using

the same randomness up until the 𝑡 − 1-th query and independent


randomness from then onwards. With probability at least (𝜖/𝑇 2 )2 ,
both experiments will result in a successful forge, and since 𝑓 ∗ was
queried before at stage 𝑡′ < 𝑡, we get the following equations

𝐻1 (𝑚∗ ) + 𝑎𝐹 (𝑓 ∗ ) = 𝑏𝜎

and
𝐻2 (𝑚∗ ) + 𝑎𝐹 (𝑓 ∗ ) = 𝑏𝜎∗
where 𝐻1 (𝑚∗ ) and 𝐻2 (𝑚∗ ) are the answers of 𝐻 to the query 𝑚∗ in
the first and second time we run the experiment. (The answers of 𝐹 to
𝑓 ∗ are the same since this happens before the 𝑡-th step). As before, we
can use this to recover 𝑎 = log𝑔 ℎ.
If Case IIb happens with non-negligible probability 𝜖 > 0, then
again by the averaging argument there are some 𝑡 < 𝑡′ ∈ {1, … , 𝑇 }
such that with probability at least 𝜖/𝑇 2 , 𝑚∗ is queried by the adversary
at the 𝑡-th query, and 𝑓 ∗ is queried by the adversary at its 𝑡′ -th query.
We run the CMA′ experiment twice, using the same randomness up

until the 𝑡′ − 1-th query and independent randomness from then


onwards. This time we will get the two equations

𝐻(𝑚∗ ) + 𝑎𝐹1 (𝑓 ∗ ) = 𝑏𝜎

and
𝐻(𝑚∗ ) + 𝑎𝐹2 (𝑓 ∗ ) = 𝑏𝜎∗
where 𝐹1 (𝑓 ∗ ) and 𝐹2 (𝑓 ∗ ) are our two answers in the first and second
experiment, and now we can use this to learn 𝑎 = 𝑏(𝜎 − 𝜎∗ )(𝐹1 (𝑓 ∗ ) −
𝐹2 (𝑓 ∗ ))−1 .
The bottom line is that we obtain a probabilistic polynomial time
algorithm that on input 𝔾, 𝑔, 𝑔𝑎 recovers 𝑎 with non-negligible proba-
bility, hence violating the assumption that the discrete log problem is
hard for the group 𝔾.

R
Remark 9.16 — Non-random oracle model security. In
this lecture both our encryption scheme and digital
signature schemes were not proven secure under a
well-stated computational assumption but rather used
the random oracle model heuristic. However, it is
known how to obtain schemes that do not rely on this
heuristic, and we will see such schemes later on in this
course.

9.4 PUTTING EVERYTHING TOGETHER - SECURITY IN PRACTICE.


Let us discuss briefly how public key cryptography is used to secure
web traffic through the SSL/TLS protocol that we all use when we
use https:// URLs. The security this achieves is quite amazing. No
matter what wired or wireless network you are using, no matter what
country you are in, as long as your device (e.g., phone/laptop/etc..)
and the server you are talking to (e.g., Google, Amazon, Microsoft
etc.) is functioning properly, you can communicate securely without
any party in the middle being able to either learn or modify the contents of your interaction.9

Footnote 9: They are able to know that such an interaction took place and the amount of bits exchanged. Preventing these kinds of attacks is more subtle, and approaches for solutions are known as steganography and anonymous routing.

In the web setting, there are servers who have public keys, and users who generally don’t have such keys. Ideally, as a user, you should
already know the public keys of all the entities you communicate
with e.g., amazon.com, google.com, etc. However, how are you going
to learn those public keys? The traditional answer was that because
they are public these keys are much easier to communicate and the
servers could even post them as ads on the New York Times. Of course
these days everyone reads the Times through nytimes.com and so this
seems like a chicken-and-egg type of problem.
The solution goes back again to the quote of Archimedes of “Give
me a fulcrum, and I shall move the world”. The idea is that trust can
be transitive. Suppose you have a Mac. Then you have already trusted
Apple with quite a bit of your personal information, and so you might
be fine if this Mac came pre-installed with the Apple public key which
you trust to be authentic. Now, suppose that you want to communi-
cate with Amazon.com. Now, you might not know the correct public
key for Amazon, but Apple surely does. So Apple can supply Amazon
with a signed message to the effect of
“I Apple certify that the public key of Amazon.com is 30
82 01 0a 02 82 01 01 00 94 9f 2e fd 07 63
33 53 b1 be e5 d4 21 9d 86 43 70 0e b5 7c
45 bb ab d1 ff 1f b1 48 7b a3 4f be c7 9d
0f 5c 0b f1 dc 13 15 b0 10 e3 e3 b6 21 0b
40 b0 a3 ca af cc bf 69 fb 99 b8 7b 22 32
bc 1b 17 72 5b e5 e5 77 2b bd 65 d0 03 00
10 e7 09 04 e5 f2 f5 36 e3 1b 0a 09 fd 4e
1b 5a 1e d7 da 3c 20 18 93 92 e3 a1 bd 0d
03 7c b6 4f 3a a4 e5 e5 ed 19 97 f1 dc ec
9e 9f 0a 5e 2c ae f1 3a e5 5a d4 ca f6 06
cf 24 37 34 d6 fa c4 4c 7e 0e 12 08 a5 c9
dc cd a0 84 89 35 1b ca c6 9e 3c 65 04 32
36 c7 21 07 f4 55 32 75 62 a6 b3 d6 ba e4
63 dc 01 3a 09 18 f5 c7 49 bc 36 37 52 60
23 c2 10 82 7a 60 ec 9d 21 a6 b4 da 44 d7
52 ac c4 2e 3d fe 89 93 d1 ba 7e dc 25 55
46 50 56 3e e0 f0 8e c3 0a aa 68 70 af ec
90 25 2b 56 f6 fb f7 49 15 60 50 c8 b4 c4
78 7a 6b 97 ec cd 27 2e 88 98 92 db 02 03
01 00 01”

Such a message is known as a certificate, and it allows you to extend your trust in Apple to a trust in Amazon. Now when your browser communicates with Amazon, it can request this message, and if it is not present, terminate the interaction or at least display some warning.
Clearly a person in the middle can stop this message from travelling
and hence not allow the interaction to continue, but they cannot spoof
the message and send a certificate for their own public key, unless
they know Apple’s secret key. (In today’s actual implementation, for
various business and other reasons, the trusted keys that come pre-
installed in browsers and devices do not belong to Apple or Microsoft
but rather to particular companies such as Verisign known as certificate
authorities. The security of these certificate authorities’ private key is
crucial to the security of the whole protocol, and it has been attacked
before.)
Using certificates, we can assume that Bob the user has the public verification key 𝑣 of Alice the server. Now Alice can send Bob also a public encryption key 𝑒, which is authenticated by 𝑣 and hence guaranteed to be correct. (If this key is ephemeral, generated on the spot for this interaction and deleted afterward, then this has the benefit of ensuring the forward secrecy property: even if some entity that is in the habit of recording all communication later finds out Alice's private verification key, it still will not be able to decrypt the information. In applied crypto circles this property is somewhat misnamed as "perfect forward secrecy" and associated with the Diffie-Hellman key exchange (or its elliptic curve variants), since in those protocols there is not much additional overhead for implementing it (see this blog post). The importance of forward security was emphasized by the discovery of the Heartbleed vulnerability (see this paper) that allowed, via a buffer-overflow attack on OpenSSL, learning the private key of the server.) Once Bob knows Alice's public key they are in business: he can use that to send an encryption of some private key 𝑘, which they can then use for all the rest of their communication.

This is, at a very high level, the SSL/TLS protocol, but there are many details inside it, including the exact security notions needed from the encryption, how the two parties negotiate which cryptographic algorithm to use, and more. All these issues can and have been used for attacks on this protocol. For two recent discussions see this blog post and this website.

Figure 9.4: When you connect to a webpage protected by SSL/TLS, the browser displays information on the certificate's authenticity.

Figure 9.5: The cipher and certificate used by "Google.com". Note that Google has a 2048 bit RSA signature key which it then uses to authenticate an elliptic curve based Diffie-Hellman key exchange protocol to create session keys for the block cipher AES with 128 bit key in Galois Counter Mode.

Figure 9.6: Digital signatures and other forms of electronic signatures are legally binding in many jurisdictions. This is some material from the website of the electronic signing company DocuSign.

Example: Here is the list of certificate authorities that were trusted by default (as of spring 2016) by Mozilla products: Actalis, Amazon, AS Sertifitseerimiskeskuse (SK), Atos, Autoridad de Certificacion Firmaprofesional, Buypass, CA Disig a.s., Camerfirma, Certicámara S.A., Certigna, Certinomis, certSIGN, China Financial Certification Authority (CFCA), China Internet Network Information Center (CNNIC), Chunghwa Telecom Corporation, Comodo, ComSign, Consorci Administració Oberta de Catalunya (Consorci AOC, CATCert), Cybertrust Japan / JCSI, D-TRUST, Deutscher Sparkassen Verlag GmbH (S-TRUST, DSV-Gruppe), DigiCert, DocuSign (OpenTrust/Keynectis), e-tugra, EDICOM, Entrust, GlobalSign, GoDaddy, Government of France (ANSSI, DCSSI), Government of Hong Kong (SAR), Hongkong Post, Certizen, Government of Japan, Ministry of Internal Affairs and Communications, Government of Spain, Autoritat de Certificación de la Comunitat Valenciana (ACCV), Government of Taiwan, Government Root Certification Authority (GRCA), Government of The Netherlands, PKIoverheid, Government of Turkey, Kamu Sertifikasyon Merkezi (Kamu SM), HARICA, IdenTrust, Izenpe S.A., Microsec e-Szignó CA, NetLock Ltd., PROCERT, QuoVadis, RSA the Security Division of EMC, SECOM Trust Systems Co. Ltd., Start Commercial (StartCom) Ltd., Swisscom (Switzerland) Ltd, SwissSign AG, Symantec / GeoTrust, Symantec / Thawte, Symantec / VeriSign, T-Systems International GmbH (Deutsche Telekom), Taiwan-CA Inc. (TWCA), TeliaSonera, Trend Micro, Trustis, Trustwave, TurkTrust, Unizeto Certum, Visa, Web.com, Wells Fargo Bank N.A., WISeKey, WoSign CA Limited.

9.5 APPENDIX: AN ALTERNATIVE PROOF OF THE DENSITY OF PRIMES

I record here an alternative way to show that the fraction of primes in [2^𝑛] is Ω(1/𝑛). (It might be that the two ways are more or less the same, since if we open up the polynomial (1 − 𝑥)^𝑘 𝑥^𝑘 we get the binomial coefficients.)
Lemma 9.17 The probability that a random 𝑛 bit number is prime is at
least Ω(1/𝑛).

Proof. Let 𝑁 = 2^𝑛. We need to show that the number of primes between 1 and 𝑁 is at least Ω(𝑁/ log 𝑁). Consider the number $\binom{2N}{N} = \frac{(2N)!}{N!N!}$. By Stirling's formula we know that $\log \binom{2N}{N} = (1-o(1))2N$, and in particular $N \leq \log \binom{2N}{N} \leq 2N$. Also, by the formula using factorials, all the prime factors of $\binom{2N}{N}$ are between 0 and $2N$, and each factor $P$ cannot appear more than $k = \lfloor \frac{\log 2N}{\log P} \rfloor$ times. Indeed, for every $N$, the number of times $P$ appears in the factorization of $N!$ is $\sum_i \lfloor \frac{N}{P^i} \rfloor$, since we get $\lfloor \frac{N}{P} \rfloor$ times a factor $P$ in the factorizations of $\{1,\ldots,N\}$, $\lfloor \frac{N}{P^2} \rfloor$ times a factor of the form $P^2$, etc. Thus, the number of times $P$ appears in the factorization of $\binom{2N}{N} = \frac{(2N)!}{N!N!}$ is equal to $\sum_i \lfloor \frac{2N}{P^i} \rfloor - 2\lfloor \frac{N}{P^i} \rfloor$: a sum of at most $k$ elements (since $P^{k+1} > 2N$), each of which is either 0 or 1.

Thus, $\binom{2N}{N} \leq \prod_{1 \leq P \leq 2N,\ P \text{ prime}} P^{\lfloor \log 2N / \log P \rfloor}$. Taking logs we get that

$$N \leq \log \binom{2N}{N} \leq \sum_{P \text{ prime} \in [2N]} \left\lfloor \frac{\log 2N}{\log P} \right\rfloor \log P \leq \sum_{P \text{ prime} \in [2N]} \log 2N ,$$

establishing that the number of primes in $[1, N]$ is $\Omega(\frac{N}{\log N})$. ■
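As a quick empirical sanity check (our own illustration, not part of the proof), one can sample random 𝑛-bit numbers and compare the observed fraction of primes to the 1/(𝑛 ln 2) prediction of the Prime Number Theorem:

```python
import random

def is_prime(m: int) -> bool:
    """Naive trial division; fine for the small bit-lengths used here."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

for n in (8, 12, 16, 20):
    trials = 20_000
    hits = sum(is_prime(random.randrange(2 ** (n - 1), 2 ** n)) for _ in range(trials))
    print(f"n={n:2d}: empirical {hits / trials:.4f} vs 1/(n ln 2) = {1 / (n * 0.6931):.4f}")
```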


9.6 ADDITIONAL GROUP THEORY EXERCISES AND PROOFS


Below are optional group theory related exercises and proofs meant
to help gain an intuition with group theory. Note that in this class,
we tend only to talk about finite commutative groups 𝔾, but there are
more general groups:

• For example, the integers (i.e. infinitely many elements) where the
operation is addition is a commutative group: if 𝑎, 𝑏, 𝑐 are integers,
then 𝑎 + 𝑏 = 𝑏 + 𝑎 (commutativity), (𝑎 + 𝑏) + 𝑐 = 𝑎 + (𝑏 +
𝑐) (associativity), 𝑎 + 0 = 𝑎 (so 0 is the identity element here;
we typically think of the identity as 1, especially when the group
operation is multiplication), and 𝑎 + (−𝑎) = 0 (i.e. for any integer,
we are allowed to think of its additive inverse, which is also an
integer).

• A non-commutative group (or a non-abelian group) is a group such that there exist 𝑎, 𝑏 ∈ 𝔾 with 𝑎 ∗ 𝑏 ≠ 𝑏 ∗ 𝑎 (where ∗ is the group operation). One example (of an infinite, non-commutative group) is the set of 2 × 2 matrices (over the real numbers) which are invertible, with the operation of matrix multiplication. The identity element is the traditional identity matrix, each matrix has an inverse (and the product of two invertible matrices is still invertible), and matrix multiplication satisfies associativity. However, matrix multiplication need not satisfy commutativity.

In this class, we restrict ourselves to finite commutative groups to


avoid complications with infinite group orders and annoyances with
non-commutative operations. For the problems below, assume that a
“group” is really a “finite commutative group”.
Here are five more important groups used in cryptography other
than ℤ𝑝 . Recall that groups are given by a set and a binary operation.

• For some prime 𝑝, ℤ∗𝑝 = {1, … , 𝑝 − 1}, with operation multiplication mod 𝑝. (Note: the ∗ is to distinguish this group from ℤ𝑝 with an additive operation and from GF(𝑝).)
• The quadratic residues of ℤ∗𝑝: 𝑄𝑝 = {𝑎² ∶ 𝑎 ∈ ℤ∗𝑝}, with operation multiplication mod 𝑝.
• ℤ∗𝑛, where 𝑛 = 𝑝 ⋅ 𝑞 (product of two primes).
• The quadratic residues of ℤ∗𝑛: 𝑄𝑛 = {𝑎² ∶ 𝑎 ∈ ℤ∗𝑛}, where 𝑛 = 𝑝 ⋅ 𝑞.
• Elliptic curve groups.

For more familiarity with group definitions, you could verify that the first four groups satisfy the group axioms. For cryptography, two operations need to be efficient for elements 𝑎, 𝑏 in a group 𝔾:

• Exponentiation: 𝑎, 𝑏 ↦ 𝑎^𝑏. This is done efficiently using repeated squaring: generate the squares 𝑎, 𝑎², 𝑎⁴, … , 𝑎^{2^𝑘} and multiply together those corresponding to the binary representation of 𝑏 (see the sketch after this list).
• Inverse: 𝑎 ↦ 𝑎⁻¹. This is done efficiently in ℤ∗𝑝 by Fermat's little theorem: 𝑎⁻¹ = 𝑎^{𝑝−2} mod 𝑝.
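A minimal sketch of both operations in Python (power_mod mirrors what Python's built-in pow(a, b, m) already does):

```python
def power_mod(a: int, b: int, m: int) -> int:
    """Compute a^b mod m by repeated squaring: O(log b) multiplications."""
    result = 1
    a %= m
    while b > 0:
        if b & 1:                 # this bit of b is set: multiply in a^(2^i)
            result = (result * a) % m
        a = (a * a) % m           # square to get a^(2^(i+1))
        b >>= 1
    return result

def inverse_mod_prime(a: int, p: int) -> int:
    """Inverse in Z_p^* via Fermat's little theorem: a^(p-2) = a^(-1) (mod p)."""
    return power_mod(a, p - 2, p)

assert power_mod(3, 100, 7) == pow(3, 100, 7)
assert (3 * inverse_mod_prime(3, 7)) % 7 == 1
```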

9.6.1 Solved exercises:

Solved Exercise 9.1 Is the set 𝑆 = {1, 2, 3, 4, 5, 6} a group if the operation is multiplication mod 7? What if the operation is addition mod 7?

Solution:
Yes (if multiplication) and no (if addition). To prove that something is a group, we run through the definition of a group. This set is finite, and multiplication (even multiplication mod some number) will satisfy commutativity and associativity. The identity element is 1 because any number times 1, even mod 7, is still itself. To find inverses, we can in this case literally find the inverses: 1 ∗ 1 mod 7 = 1 (so the inverse of 1 is 1); 2 ∗ 4 mod 7 = 8 mod 7 = 1 (so the inverse of 2 is 4, and by commutativity the inverse of 4 is 2); 3 ∗ 5 mod 7 = 15 mod 7 = 1 (so the inverse of 3 is 5, and the inverse of 5 is 3); 6 ∗ 6 mod 7 = 36 mod 7 = 1 (so 6 is its own inverse; notice that an element can be its own inverse, even if it is not the identity 1). The set 𝑆 is not a group if the operation is addition for many reasons: one way to see this is that 1 + 6 mod 7 = 0, but 0 is not an element of 𝑆, so this set is not closed under its operation (implicit in the definition of a group is the idea that a group's operation must send two group elements to another element within the same set of group elements).

Solved Exercise 9.2 What are the generators of the group {1, 2, 3, 4, 5, 6},
where the operation is multiplication mod 7?

Solution:
3 and 5. Recall that a generator of a group is an element 𝑔 such that {𝑔, 𝑔², 𝑔³, ⋯} is the entire group. We can directly check the elements here: {1, 1², 1³, ⋯} = {1}, so 1 is not a generator. 2 is not a generator because 2³ mod 7 = 8 mod 7 = 1, so the set {2, 2², 2³, 2⁴, ⋯} is really the set {2, 4, 1}, which is not the entire group. 3 is a generator because 3² mod 7 = 9 mod 7 = 2, 3³ mod 7 = 2 ∗ 3 mod 7 = 6, 3⁴ mod 7 = 6 ∗ 3 mod 7 = 18 mod 7 = 4, 3⁵ mod 7 = 4 ∗ 3 mod 7 = 12 mod 7 = 5, 3⁶ mod 7 = 5 ∗ 3 mod 7 = 15 mod 7 = 1, so {3, 3², 3³, 3⁴, 3⁵, 3⁶} = {3, 2, 6, 4, 5, 1}, which is all of the elements. 4 is not a generator because 4³ mod 7 = 64 mod 7 = 1, so just like 2, we won't get every element. 5 is a generator because 5² mod 7 = 25 mod 7 = 4, 5³ mod 7 = 20 mod 7 = 6, 5⁴ mod 7 = 30 mod 7 = 2, 5⁵ mod 7 = 10 mod 7 = 3, 5⁶ mod 7 = 15 mod 7 = 1, so just like 3, 5 is a generator. 6 is not a generator because 6² mod 7 = 36 mod 7 = 1, so just like 2, the set {6, 6², 6³, ⋯} cannot contain all elements (it will just have 1 and 6).
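A brute-force check of this computation (the helper below is our own, not from the text):

```python
def generators(p: int) -> list:
    """Return all generators of Z_p^* (p prime), by brute force."""
    gens = []
    for g in range(1, p):
        x, powers = 1, set()
        for _ in range(p - 1):
            x = (x * g) % p
            powers.add(x)
        if len(powers) == p - 1:    # the powers of g cover the whole group
            gens.append(g)
    return gens

print(generators(7))  # [3, 5], matching the solution above
```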

Solved Exercise 9.3 What is the order of every element in the group
{1, 2, 3, 4, 5, 6}, where the operation is multiplication mod 7?

Solution:
The orders (of 1, 2, 3, 4, 5, 6) are 1, 3, 6, 3, 6, 2, respectively. This
can be seen from the work of the previous problem, where we test
out powers of elements. Notice that all of these orders divide the
number of elements in our group. This is not a coincidence, and it
is an example of Lagrange’s Theorem, which states that the size of
every subgroup of a group will divide the order of a group. Recall
that a subgroup is simply a subset of the group which is a group in
its own right and is closed under the operation of the group.

Solved Exercise 9.4 Suppose we have some (finite, commutative) group


𝔾. Prove that the inverse of any element is unique (i.e. prove that if
𝑎 ∈ 𝔾, then if 𝑏, 𝑐 ∈ 𝔾 such that 𝑎𝑏 = 1 and 𝑎𝑐 = 1, then 𝑏 = 𝑐).

Solution:
Suppose that 𝑎 ∈ 𝔾 and that 𝑏, 𝑐 ∈ 𝔾 such that 𝑎𝑏 = 1 and 𝑎𝑐 = 1. Then we know that 𝑎𝑏 = 𝑎𝑐, and then we can apply 𝑎⁻¹ to both sides (we are guaranteed that 𝑎 has SOME inverse 𝑎⁻¹ in the group), and so we have 𝑎⁻¹𝑎𝑏 = 𝑎⁻¹𝑎𝑐, but we know that 𝑎⁻¹𝑎 = 1 (and we can use associativity of a group), so (1)𝑏 = (1)𝑐, so 𝑏 = 𝑐. QED.

Solved Exercise 9.5Suppose we have some (finite, commutative) group


𝔾. Prove that the identity element is unique (i.e. if 𝑐𝑎 = 𝑐 for all 𝑐 ∈ 𝔾
and if 𝑐𝑏 = 𝑐 for all 𝑐 ∈ 𝔾, then 𝑎 = 𝑏).

Solution:
Suppose that 𝑐𝑎 = 𝑐 for all 𝑐 ∈ 𝔾 and that 𝑐𝑏 = 𝑐 for all 𝑐 ∈ 𝔾. Then we can say that 𝑐𝑎 = 𝑐 = 𝑐𝑏 (for any 𝑐, but we can choose some 𝑐 in particular; we could have picked 𝑐 = 1). And then 𝑐 has some inverse element 𝑐⁻¹ in the group, so 𝑐⁻¹𝑐𝑎 = 𝑐⁻¹𝑐𝑏, but 𝑐⁻¹𝑐 = 1, so 𝑎 = 𝑏 as desired. QED




The next few problems are related to quadratic residues, but these
problems are a bit more general (in particular, we are considering
some group, and a subgroup which are all of the elements of the first
group which are squares).
Solved Exercise 9.6 Suppose that 𝔾 is some (finite, commutative) group, and ℍ is the set defined by ℍ ∶= {ℎ ∈ 𝔾 ∶ ∃𝑔 ∈ 𝔾, 𝑔² = ℎ}. Verify that ℍ is a subgroup of 𝔾.

Solution:
To be a subgroup, we need to make sure that ℍ is a group in its own right (in particular, that it contains the identity, that it contains inverses, and that it is closed under multiplication; associativity and commutativity follow because we are within a larger set 𝔾 which satisfies associativity and commutativity).

Identity: Well, 1² = 1, so 1 ∈ ℍ, so ℍ has the identity element.

Inverses: If ℎ ∈ ℍ, then 𝑔² = ℎ for some 𝑔 ∈ 𝔾, but 𝑔 has an inverse in 𝔾, and we can look at 𝑔²(𝑔⁻¹)² = (𝑔𝑔⁻¹)² = 1² = 1 (where we used commutativity and associativity, as well as the definition of the inverse). It is clear that (𝑔⁻¹)² ∈ ℍ because there exists an element in 𝔾 (specifically, 𝑔⁻¹) whose square is (𝑔⁻¹)². Therefore ℎ has an inverse in ℍ: if ℎ = 𝑔², then ℎ⁻¹ = (𝑔⁻¹)².

Closure under operation: If ℎ₁, ℎ₂ ∈ ℍ, then there exist 𝑔₁, 𝑔₂ ∈ 𝔾 where ℎ₁ = (𝑔₁)², ℎ₂ = (𝑔₂)². So ℎ₁ℎ₂ = (𝑔₁)²(𝑔₂)² = (𝑔₁𝑔₂)², so ℎ₁ℎ₂ ∈ ℍ.

Therefore, ℍ is a subgroup of 𝔾.

Solved Exercise 9.7 Assume that |𝔾| is an even number and is known, and that 𝑔^{|𝔾|} = 1 for any 𝑔 ∈ 𝔾. Also assume that 𝔾 is a cyclic group, i.e. there is some 𝑔 ∈ 𝔾 such that any element 𝑓 ∈ 𝔾 can be written as 𝑔^𝑘 for some integer 𝑘. Also assume that exponentiation is efficient in this context (i.e. we can compute 𝑔^𝑟 for any 0 ≤ 𝑟 ≤ |𝔾| in an efficient time for any 𝑔 ∈ 𝔾).

Under the assumptions stated above, prove that there is an efficient way to check if some element of 𝔾 is also an element of ℍ, where ℍ is still the subgroup of squares of elements of 𝔾 (note: running through all possible elements of 𝔾 may not be efficient, so this cannot be your strategy).

Solution:
Suppose that we receive some element 𝑔 ∈ 𝔾. We want to know if there exists some 𝑔′ ∈ 𝔾 such that 𝑔 = (𝑔′)² (this is equivalent to 𝑔 being in ℍ). To do so, compute 𝑔^{|𝔾|/2}. We claim that 𝑔^{|𝔾|/2} = 1 if and only if 𝑔 ∈ ℍ.

(Proving the if): If 𝑔 ∈ ℍ, then 𝑔 = (𝑔′)² for some 𝑔′ ∈ 𝔾. We then have that 𝑔^{|𝔾|/2} = ((𝑔′)²)^{|𝔾|/2} = (𝑔′)^{|𝔾|}. But from our assumption, an element raised to the order of the group is 1, so (𝑔′)^{|𝔾|} = 1, so 𝑔^{|𝔾|/2} = 1. As a result, if 𝑔 ∈ ℍ, then 𝑔^{|𝔾|/2} = 1.

(Proving the only if): Now suppose that 𝑔^{|𝔾|/2} = 1. At this point, we use the fact that 𝔾 is cyclic, so let 𝑓 ∈ 𝔾 be the generator of 𝔾. We know that 𝑔 is some power of 𝑓, and this power is either even or odd. If the power is even, we are done. If the power is odd, then 𝑔 = 𝑓^{2𝑘+1} for some natural number 𝑘. And then we see 𝑔^{|𝔾|/2} = (𝑓^{2𝑘+1})^{|𝔾|/2} = 𝑓^{𝑘|𝔾|+|𝔾|/2} = (𝑓^{|𝔾|})^𝑘 𝑓^{|𝔾|/2}. We can use the assumption that any element raised to its group's order is 1, so 1 = 𝑔^{|𝔾|/2} = (𝑓^{|𝔾|})^𝑘 𝑓^{|𝔾|/2} = 𝑓^{|𝔾|/2}. This tells us that the order of 𝑓 is at most |𝔾|/2, but this is a contradiction because 𝑓 is a generator of 𝔾, so its order cannot be less than |𝔾| (if it were, then looking at {𝑓, 𝑓², 𝑓³, ⋯}, we would only count at most half of the elements before cycling back to 1, 𝑓, 𝑓², ⋯, so this set wouldn't contain all of 𝔾). As a result, we have reached a contradiction, so 𝑔^{|𝔾|/2} = 1 means that 𝑔 = 𝑓^{2𝑘} = (𝑓^𝑘)², so 𝑔 ∈ ℍ.

We are given that this exponentiation is efficient, so checking whether 𝑔^{|𝔾|/2} = 1 is an efficient and correct way to test if 𝑔 ∈ ℍ. QED.

This proof idea came from here as well as from the 2/25/20 lecture at Harvard given by MIT professor Yael Kalai.

Commentary on assumptions and proof: Proving that 𝑔^{|𝔾|} = 1 is a useful exercise in its own right, but it overlaps somewhat with our problem sets of 2020, so we will not prove it here; observe that if 𝔾 is the set {1, 2, 3, ⋯ , 𝑝 − 1} for some prime 𝑝, then this is a special case of Fermat's Little Theorem, which states that 𝑎^{𝑝−1} mod 𝑝 = 1 for 𝑎 ∈ {1, 2, 3, ⋯ , 𝑝 − 1}. Also, one can prove that ℤ∗𝑝 (the set of numbers 1, 2, ⋯ , 𝑝 − 1, with operation multiplication mod 𝑝) for 𝑝 a prime is cyclic, where one method can be found here; the proof comes down to factorizing certain polynomials and decomposing numbers in terms of prime powers. We can then see that the proof above gives an efficient way to test whether an element of ℤ∗𝑝 is a square or not.
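For ℤ∗𝑝 this test is known as Euler's criterion; here is a quick check for 𝑝 = 7:

```python
def is_quadratic_residue(g: int, p: int) -> bool:
    """Euler's criterion: g is a square in Z_p^* iff g^((p-1)/2) = 1 (mod p)."""
    return pow(g, (p - 1) // 2, p) == 1

p = 7
squares = {(a * a) % p for a in range(1, p)}   # {1, 2, 4}
assert all(is_quadratic_residue(g, p) == (g in squares) for g in range(1, p))
```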

10
Concrete candidates for public key crypto

In the previous lecture we talked about public key cryptography and


saw the Diffie Hellman system and the DSA signature scheme. In this
lecture, we will see the RSA trapdoor function and how to use it for
both encryptions and signatures.

10.1 SOME NUMBER THEORY.


(See Shoup’s excellent and freely available book for extensive coverage
of these and many other topics.)
For every number 𝑚, we define ℤ𝑚 to be the set {0, … , 𝑚 − 1} with the addition and multiplication operations modulo 𝑚. When two elements are in ℤ𝑚 then we will always assume that all operations are done modulo 𝑚 unless stated otherwise. We let ℤ∗𝑚 = {𝑎 ∈ ℤ𝑚 ∶ 𝑔𝑐𝑑(𝑎, 𝑚) = 1}. Note that 𝑚 is prime if and only if |ℤ∗𝑚| = 𝑚 − 1. For every 𝑎 ∈ ℤ∗𝑚 we can find using the extended gcd algorithm an element 𝑏 (typically denoted as 𝑎⁻¹) such that 𝑎𝑏 = 1 (can you see why?). The set ℤ∗𝑚 is an abelian group with the multiplication operation, and hence by the observations of the previous lecture, 𝑎^{|ℤ∗𝑚|} = 1 for every 𝑎 ∈ ℤ∗𝑚. In the case that 𝑚 is prime, this result is known as "Fermat's Little Theorem" and is typically stated as 𝑎^{𝑝−1} = 1 (mod 𝑝) for every 𝑎 ≠ 0.

Remark 10.1 — Note on 𝑛 bits vs a number 𝑛. One aspect that is often confusing in number-theoretic based cryptography is that one needs to always keep track whether we are talking about "big" numbers or "small" numbers. In many cases in crypto, we use 𝑛 to talk about our key size or security parameter, in which case we think of 𝑛 as a "small" number of size 100-1000 or so. However, when we work with ℤ∗𝑚 we often think of 𝑚 as a "big" number having about 100-1000 digits; that is, 𝑚 would be roughly 2^{100} to 2^{1000} or so. I will try to reserve the notation 𝑛 for "small" numbers, but may sometimes forget to do so, and other descriptions of RSA etc. often use 𝑛 for "big" numbers. It is important that whenever you see a number 𝑥, you make sure you have a sense whether it is a "small" number (in which case 𝑝𝑜𝑙𝑦(𝑥) time is considered efficient) or whether it is a "large" number (in which case only 𝑝𝑜𝑙𝑦(log(𝑥)) time would be considered efficient).

Remark 10.2 — The number 𝑚 vs the message 𝑚. In much of this course we use 𝑚 to denote a string which is our plaintext message to be encrypted or authenticated. In the context of integer factoring, it is convenient to use 𝑚 = 𝑝𝑞 as the composite number that is to be factored. To keep things interesting (or more honestly, because I keep running out of letters), in this lecture we will have both usages of 𝑚 (though hopefully not in the same theorem or definition!). When we talk about factoring, RSA, and Rabin, then we will use 𝑚 as the composite number, while in the context of the abstract trapdoor-permutation based encryption and signatures we will use 𝑚 for the message. When you see an instance of 𝑚, make sure you understand what is its usage.

10.1.1 Primality testing

One procedure we often need is to find a prime of 𝑛 bits. The typical way people do it is by choosing a random 𝑛-bit number 𝑝, and testing whether it is prime. We showed in the previous lecture that a random 𝑛 bit number is prime with probability at least Ω(1/𝑛) (in fact, by the Prime Number Theorem the probability is (1 ± 𝑜(1))/(𝑛 ln 2)). We now discuss how we can test for primality.

Theorem 10.3 — Primality Testing. There is a 𝑝𝑜𝑙𝑦(𝑛)-time algorithm to test whether a given 𝑛-bit number is prime or composite.

Theorem 10.3 was first shown in the 1970s by Solovay, Strassen, Miller and Rabin via a probabilistic algorithm (that can make a mistake with probability exponentially small in the number of coins it uses), and in a 2002 breakthrough, Agrawal, Kayal, and Saxena gave a deterministic polynomial time algorithm for the same problem.
Lemma 10.4 There is a probabilistic polynomial time algorithm 𝐴 that on input a number 𝑚, if 𝑚 is prime 𝐴 outputs YES with probability 1, and if 𝑚 is not even a "pseudoprime" it outputs NO with probability at least 1/2. (The definition of "pseudoprime" will be clarified in the proof below.)

Proof. The algorithm is very simple and is based on Fermat's Little Theorem: on input 𝑚, pick a random 𝑎 ∈ {2, … , 𝑚 − 1}, and if 𝑔𝑐𝑑(𝑎, 𝑚) ≠ 1 or 𝑎^{𝑚−1} ≠ 1 (mod 𝑚) return NO and otherwise return YES.

By Fermat's little theorem, the algorithm will always return YES on a prime 𝑚. We define a "pseudoprime" to be a non-prime number 𝑚 such that 𝑎^{𝑚−1} = 1 (mod 𝑚) for all 𝑎 such that 𝑔𝑐𝑑(𝑎, 𝑚) = 1. If 𝑚 is not a pseudoprime then the set 𝑆 = {𝑎 ∈ ℤ∗𝑚 ∶ 𝑎^{𝑚−1} = 1} is a strict subset of ℤ∗𝑚. But it is easy to see that 𝑆 is a group, and hence |𝑆| must divide |ℤ∗𝑚|; hence in particular it must be the case that |𝑆| ≤ |ℤ∗𝑚|/2, and so with probability at least 1/2 the algorithm will output NO. ■
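A minimal sketch of this test in Python, repeated to drive the error probability down (the round count is an arbitrary choice of ours):

```python
import math
import random

def fermat_test(m: int, rounds: int = 20) -> bool:
    """Returns True if m (>= 3) is prime or a pseudoprime; False (whp) otherwise."""
    for _ in range(rounds):
        a = random.randrange(2, m)
        if math.gcd(a, m) != 1 or pow(a, m - 1, m) != 1:
            return False
    return True

print([m for m in range(3, 60) if fermat_test(m)])  # the primes below 60
# 561 = 3*11*17 is the smallest Carmichael number: it satisfies the Fermat
# condition for every base coprime to it, despite being composite.
assert all(pow(a, 560, 561) == 1 for a in range(2, 561) if math.gcd(a, 561) == 1)
```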

Lemma 10.4 on its own might not seem very meaningful since it's not clear how many pseudoprimes there are. However, it turns out these pseudoprimes, also known as "Carmichael numbers", are much less prevalent than the primes: specifically, there are about 𝑁 ⋅ 2^{−Θ(log 𝑁/ log log 𝑁)} pseudoprimes between 1 and 𝑁. If we choose a random number 𝑚 ∈ [2^𝑛] and output it if and only if the algorithm of Lemma 10.4 outputs YES (otherwise resampling), then the probability we make a mistake and output a pseudoprime is equal to the ratio of the number of pseudoprimes in [2^𝑛] to the number of primes in [2^𝑛]. Since there are Ω(2^𝑛/𝑛) primes in [2^𝑛], this ratio is 2^{−Ω(𝑛/ log 𝑛)}, which is a negligible quantity. Moreover, as mentioned above, there are better algorithms that succeed for all numbers.

In contrast to testing if a number is prime or composite, there is no known efficient algorithm to actually find the factorization of a composite number. The best known algorithms run in time roughly 2^{Õ(𝑛^{1/3})} where 𝑛 is the number of bits.

10.1.2 Fields

If 𝑝 is a prime then ℤ𝑝 is a field, which means it is closed under addition and multiplication and has 0 and 1 elements. One property of a field is the following:

Theorem 10.5 — Fundamental Theorem of Algebra, mod 𝑝 version. If 𝑓 is a nonzero polynomial of degree 𝑑 over ℤ𝑝 then there are at most 𝑑 distinct inputs 𝑥 such that 𝑓(𝑥) = 0.

(If you're curious why, you can see that the task of, given 𝑥₁, … , 𝑥_{𝑑+1}, finding the coefficients for a polynomial vanishing on the 𝑥ᵢ's amounts to solving a linear system in 𝑑 + 1 variables with 𝑑 + 1 equations that are independent due to the non-singularity of the Vandermonde matrix.)

In particular every 𝑥 ∈ ℤ𝑝 has at most two square roots (numbers 𝑠 such that 𝑠² = 𝑥 mod 𝑝). In fact, just like over the reals, every 𝑥 ∈ ℤ𝑝 either has no square roots or exactly two square roots of the form ±𝑠.

We can efficiently find square roots modulo a prime. In fact, the following result is known:

Theorem 10.6 — Finding roots. There is a probabilistic 𝑝𝑜𝑙𝑦(log 𝑝, 𝑑) time algorithm to find the roots of a degree 𝑑 polynomial over ℤ𝑝.

(This is a special case of the problem of factoring polynomials over finite fields, shown in 1967 by Berlekamp and on which much other work has been done; see Chapter 20 in Shoup.)

10.1.3 Chinese remainder theorem

Suppose that 𝑚 = 𝑝𝑞 is a product of two primes. In this case ℤ∗𝑚 does not contain all the numbers from 1 to 𝑚 − 1. Indeed, all the numbers of the form 𝑝, 2𝑝, 3𝑝, … , (𝑞 − 1)𝑝 and 𝑞, 2𝑞, … , (𝑝 − 1)𝑞 will have non-trivial g.c.d. with 𝑚. There are exactly 𝑞 − 1 + 𝑝 − 1 such numbers (because 𝑝 and 𝑞 are prime, all the numbers of the forms above are distinct). Hence |ℤ∗𝑚| = 𝑚 − 1 − (𝑝 − 1) − (𝑞 − 1) = 𝑝𝑞 − 𝑝 − 𝑞 + 1 = (𝑝 − 1)(𝑞 − 1).

Note that |ℤ∗𝑚| = |ℤ∗𝑝| ⋅ |ℤ∗𝑞|. It turns out this is no accident:

Theorem 10.7 — Chinese Remainder Theorem (CRT). If 𝑚 = 𝑝𝑞 then there is an isomorphism 𝜑 ∶ ℤ∗𝑚 → ℤ∗𝑝 × ℤ∗𝑞. That is, 𝜑 is one to one and onto and maps 𝑥 ∈ ℤ∗𝑚 into a pair (𝜑₁(𝑥), 𝜑₂(𝑥)) ∈ ℤ∗𝑝 × ℤ∗𝑞 such that for every 𝑥, 𝑦 ∈ ℤ∗𝑚:
• 𝜑₁(𝑥 + 𝑦) = 𝜑₁(𝑥) + 𝜑₁(𝑦) (mod 𝑝)
• 𝜑₂(𝑥 + 𝑦) = 𝜑₂(𝑥) + 𝜑₂(𝑦) (mod 𝑞)
• 𝜑₁(𝑥 ⋅ 𝑦) = 𝜑₁(𝑥) ⋅ 𝜑₁(𝑦) (mod 𝑝)
• 𝜑₂(𝑥 ⋅ 𝑦) = 𝜑₂(𝑥) ⋅ 𝜑₂(𝑦) (mod 𝑞)

Proof. 𝜑 simply maps 𝑥 ∈ ℤ∗𝑚 to the pair (𝑥 mod 𝑝, 𝑥 mod 𝑞). Verify-
ing that it satisfies all desired properties is a good exercise. QED
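A tiny concrete check of this isomorphism for the toy modulus 𝑚 = 15 = 3 ⋅ 5:

```python
import math

p, q = 3, 5
m = p * q

def phi(x):                          # the CRT map Z_m^* -> Z_p^* x Z_q^*
    return (x % p, x % q)

units = [x for x in range(1, m) if math.gcd(x, m) == 1]
assert len(units) == (p - 1) * (q - 1)              # |Z_m^*| = (p-1)(q-1)
assert len({phi(x) for x in units}) == len(units)   # phi is one to one
for x in units:
    for y in units:                  # phi respects multiplication coordinate-wise
        a1, a2 = phi(x)
        b1, b2 = phi(y)
        assert phi((x * y) % m) == ((a1 * b1) % p, (a2 * b2) % q)
```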

In particular, for every polynomial 𝑓() and 𝑥 ∈ ℤ∗𝑚 , 𝑓(𝑥) = 0


(mod 𝑚) iff 𝑓(𝑥) = 0 (mod 𝑝) and 𝑓(𝑥) = 0 (mod 𝑞). Therefore find-
ing the roots of a polynomial 𝑓() modulo a composite 𝑚 is easy if you
know 𝑚’s factorization. However, if you don’t know the factorization
then this is hard. In particular, extracting square roots is as hard as
finding out the factors:

Theorem 10.8 — Square root extraction implies factoring. Suppose that there is an efficient algorithm 𝐴 such that for every 𝑚 ∈ ℕ and 𝑎 ∈ ℤ∗𝑚, 𝐴(𝑚, 𝑎² (mod 𝑚)) = 𝑏 such that 𝑎² = 𝑏² (mod 𝑚). Then, there is an efficient algorithm to recover 𝑝, 𝑞 from 𝑚.

Proof. Suppose that there is such an algorithm 𝐴. Using the CRT we can define 𝑓 ∶ ℤ∗𝑝 × ℤ∗𝑞 → ℤ∗𝑝 × ℤ∗𝑞 as 𝑓(𝑥, 𝑦) = 𝜑(𝐴(𝜑⁻¹(𝑥², 𝑦²))) for all 𝑥 ∈ ℤ∗𝑝 and 𝑦 ∈ ℤ∗𝑞. Now, for any 𝑥, 𝑦 let (𝑥′, 𝑦′) = 𝑓(𝑥, 𝑦). Since 𝑥² = 𝑥′² (mod 𝑝) and 𝑦² = 𝑦′² (mod 𝑞) we know that 𝑥′ ∈ {±𝑥} and 𝑦′ ∈ {±𝑦}. Since flipping signs doesn't change the value of (𝑥′, 𝑦′) = 𝑓(𝑥, 𝑦), by flipping one or both of the signs of 𝑥 or 𝑦 we can ensure that 𝑥′ = 𝑥 and 𝑦′ = −𝑦. Hence (𝑥, 𝑦) − (𝑥′, 𝑦′) = (0, 2𝑦). In other words, if 𝑐 = 𝜑⁻¹(𝑥 − 𝑥′, 𝑦 − 𝑦′) then 𝑐 = 0 (mod 𝑝) but 𝑐 ≠ 0 (mod 𝑞), which in particular means that the greatest common divisor of 𝑐 and 𝑚 is 𝑝. So, by computing 𝑔𝑐𝑑(𝑐, 𝑚) we will find 𝑝, from which we can find 𝑞 = 𝑚/𝑝.

This almost works, but there is a question of how we can find 𝜑⁻¹(𝑥, 𝑦), given that we don't know 𝑝 and 𝑞. The crucial observation is that we don't need to. We can simply pick a value 𝑎 at random in {1, … , 𝑚}. With very high probability (namely 1 − (𝑝 + 𝑞 − 1)/𝑝𝑞) 𝑎 will be in ℤ∗𝑚, and so we can imagine this process as equivalent to the process of taking a random 𝑥 ∈ ℤ∗𝑝, a random 𝑦 ∈ ℤ∗𝑞 and then flipping the signs of 𝑥 and 𝑦 randomly and taking 𝑎 = 𝜑⁻¹(𝑥, 𝑦). By the arguments above, with probability at least 1/4 it will hold that 𝑔𝑐𝑑(𝑎 − 𝐴(𝑎²), 𝑚) equals 𝑝. ■

Note that this argument generalizes to work even if the algorithm 𝐴


is an average case algorithm that only succeeds in finding a square root
for a significant fraction of the inputs. This observation is crucial for
cryptographic applications.
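A sketch of this reduction in Python. The brute-force "oracle" below is a stand-in of our own (it plays the role of 𝐴); the point is that the reduction uses it purely as a black box:

```python
import math
import random

def factor_with_sqrt_oracle(m: int, sqrt_oracle, attempts: int = 100):
    """Recover a nontrivial factor of m = pq given any square-root oracle."""
    for _ in range(attempts):
        a = random.randrange(1, m)
        if math.gcd(a, m) != 1:
            return math.gcd(a, m)        # lucky: a already shares a factor with m
        b = sqrt_oracle((a * a) % m)     # some square root of a^2 mod m
        d = math.gcd(abs(a - b), m)
        if 1 < d < m:                    # happens with constant probability
            return d
    return None

m = 3 * 11                               # toy modulus
oracle = lambda y: next(b for b in range(m) if (b * b) % m == y)
d = factor_with_sqrt_oracle(m, oracle)
print(d, m // d)                         # recovers 3 and 11 (in some order)
```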

10.1.4 The RSA and Rabin functions

We are now ready to describe the RSA and Rabin trapdoor functions:

Definition 10.9 — RSA function. Given a number 𝑚 = 𝑝𝑞 and 𝑒 such that 𝑔𝑐𝑑((𝑝 − 1)(𝑞 − 1), 𝑒) = 1, the RSA function w.r.t. 𝑚 and 𝑒 is the map 𝑓_{𝑚,𝑒} ∶ ℤ∗𝑚 → ℤ∗𝑚 such that RSA_{𝑚,𝑒}(𝑥) = 𝑥^𝑒 (mod 𝑚).

Definition 10.10 — Rabin function. Given a number 𝑚 = 𝑝𝑞, the Rabin function w.r.t. 𝑚 is the map 𝑅𝑎𝑏𝑖𝑛_𝑚 ∶ ℤ∗𝑚 → ℤ∗𝑚 such that 𝑅𝑎𝑏𝑖𝑛_𝑚(𝑥) = 𝑥² (mod 𝑚).
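In code, both maps are one-liners (the modulus and exponent below are toy parameters of our choosing):

```python
def rsa(x: int, m: int, e: int) -> int:
    return pow(x, e, m)        # x^e mod m

def rabin(x: int, m: int) -> int:
    return pow(x, 2, m)        # x^2 mod m

m, e = 3 * 11, 3               # gcd((3-1)*(11-1), 3) = gcd(20, 3) = 1, as required
print(rsa(2, m, e), rabin(2, m))   # 8 4
```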



Note that both maps can be computed in polynomial time. Using the Chinese Remainder Theorem and Theorem 10.6, we know that both functions can be inverted efficiently if we know the factorization. (Using Theorem 10.6 to invert the function requires 𝑒 to be not too large. However, as we will see below, it turns out that using the factorization we can invert the RSA function for every 𝑒. Also, in practice people often use a small value for 𝑒, sometimes as small as 𝑒 = 3, for reasons of efficiency.) However, Theorem 10.6 is much too big a hammer to invert the RSA and Rabin functions, and there are direct and simple inversion algorithms (see homework exercises). By Theorem 10.8, inverting the Rabin function amounts to factoring 𝑚. No such result is known for the RSA function, but there is no better algorithm known to attack it than proceeding via factorization of 𝑚. The RSA function has the advantage that it is a permutation over ℤ∗𝑚:
Lemma 10.11 RSA_{𝑚,𝑒} is one to one over ℤ∗𝑚.

Proof. Suppose that RSA_{𝑚,𝑒}(𝑎) = RSA_{𝑚,𝑒}(𝑎′). By the CRT, it means that there is (𝑥, 𝑦) ≠ (𝑥′, 𝑦′) ∈ ℤ∗𝑝 × ℤ∗𝑞 such that 𝑥^𝑒 = 𝑥′^𝑒 (mod 𝑝) and 𝑦^𝑒 = 𝑦′^𝑒 (mod 𝑞). But if that's the case we get that (𝑥𝑥′⁻¹)^𝑒 = 1 (mod 𝑝) and (𝑦𝑦′⁻¹)^𝑒 = 1 (mod 𝑞). But this means that 𝑒 has to be a multiple of the order of 𝑥𝑥′⁻¹ and 𝑦𝑦′⁻¹ (at least one of which is not 1 and hence has order > 1). But since the order always divides the group size, this implies that 𝑒 has to have non-trivial gcd with either |ℤ∗𝑝| or |ℤ∗𝑞| and hence with (𝑝 − 1)(𝑞 − 1). ■

Remark 10.12 — Plain/Textbook RSA. The RSA trapdoor function is known also as "plain" or "textbook" RSA encryption. This is because initially Diffie and Hellman (and following them, RSA) thought of an encryption scheme as a deterministic procedure and so considered simply encrypting a message 𝑥 by applying RSA_{𝑚,𝑒}(𝑥). Today however we know that it is insecure to use a trapdoor function directly as an encryption scheme without adding some randomization.

10.1.5 Abstraction: trapdoor permutations

We can abstract away the particular construction of the RSA and Rabin functions to talk about a general trapdoor permutation family. We make the following definition:

Definition 10.13 — Trapdoor permutation. A trapdoor permutation family (TDP) is a family of functions {𝑝_𝑘} such that for every 𝑘 ∈ {0, 1}^𝑛, the function 𝑝_𝑘 is a permutation on {0, 1}^𝑛 and:
• There is a key generation algorithm 𝐺 such that on input 1^𝑛 it outputs a pair (𝑘, 𝜏) such that the maps 𝑘, 𝑥 ↦ 𝑝_𝑘(𝑥) and 𝜏, 𝑦 ↦ 𝑝_𝑘⁻¹(𝑦) are efficiently computable.
• For every efficient adversary 𝐴, Pr_{(𝑘,𝜏)←_𝑅 𝐺(1^𝑛), 𝑦 ∈ {0,1}^𝑛}[𝐴(𝑘, 𝑦) = 𝑝_𝑘⁻¹(𝑦)] < 𝑛𝑒𝑔𝑙(𝑛).

Remark 10.14 — Domain of permutations. The RSA function is not a permutation over the set of strings but rather over ℤ∗𝑚 for some 𝑚 = 𝑝𝑞. However, if we find primes 𝑝, 𝑞 in the interval [2^{𝑛/2}(1 − 𝑛𝑒𝑔𝑙(𝑛)), 2^{𝑛/2}], then 𝑚 will be in the interval [2^𝑛(1 − 𝑛𝑒𝑔𝑙(𝑛)), 2^𝑛] and hence ℤ∗𝑚 (which has size 𝑝𝑞 − 𝑝 − 𝑞 + 1 = 2^𝑛(1 − 𝑛𝑒𝑔𝑙(𝑛))) can be thought of as essentially identical to {0, 1}^𝑛, since we will always pick elements from {0, 1}^𝑛 at random and hence they will be in ℤ∗𝑚 with probability 1 − 𝑛𝑒𝑔𝑙(𝑛). It is widely believed that for every sufficiently large 𝑛 there is a prime in the interval [2^𝑛 − 𝑝𝑜𝑙𝑦(𝑛), 2^𝑛] (this follows from the Extended Riemann Hypothesis), and Baker, Harman and Pintz proved that there is a prime in the interval [2^𝑛 − 2^{0.6𝑛}, 2^𝑛]. (Another, more minor issue is that the description of the key might not have the same length as log 𝑚; I defined them to be the same for simplicity of notation, and this can be ensured via some padding and concatenation tricks.)

10.1.6 Public key encryption from trapdoor permutations

Here is how we can get a public key encryption from a trapdoor permutation scheme {𝑝_𝑘}.

TDP-based public key encryption (TDPENC):

• Key generation: Run the key generation algorithm of the TDP to get (𝑘, 𝜏). 𝑘 is the public encryption key and 𝜏 is the secret decryption key.
• Encryption: To encrypt a message 𝑚 with key 𝑘 ∈ {0, 1}^𝑛, choose 𝑥 ←_𝑅 {0, 1}^𝑛 and output (𝑝_𝑘(𝑥), 𝐻(𝑥) ⊕ 𝑚), where 𝐻 ∶ {0, 1}^𝑛 → {0, 1}^ℓ is a hash function we model as a random oracle.
• Decryption: To decrypt the ciphertext (𝑦, 𝑧) with key 𝜏, output 𝑚 = 𝐻(𝑝_𝑘⁻¹(𝑦)) ⊕ 𝑧.

P
Please verify that you understand why TDPENC is a
valid encryption scheme, in the sense that decryption
of an encryption of 𝑚 yields 𝑚.
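Here is a minimal sketch of TDPENC, instantiating the TDP with the textbook RSA permutation on a toy modulus and the random oracle 𝐻 with SHA-256; both instantiations are illustrative choices of ours, not part of the scheme's definition (pow(e, -1, ...) needs Python 3.8+):

```python
import hashlib

p, q, e = 1009, 1013, 5                  # toy parameters; gcd((p-1)(q-1), e) = 1
m_mod = p * q
d = pow(e, -1, (p - 1) * (q - 1))        # the trapdoor tau

def H(x: int) -> bytes:                  # random-oracle stand-in
    return hashlib.sha256(str(x).encode()).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(s ^ t for s, t in zip(a, b))

def encrypt(msg: bytes, x: int):         # msg of at most 32 bytes here
    return pow(x, e, m_mod), xor(H(x), msg)      # (p_k(x), H(x) XOR m)

def decrypt(y: int, z: bytes) -> bytes:
    return xor(H(pow(y, d, m_mod)), z)           # H(p_k^{-1}(y)) XOR z

y, z = encrypt(b"attack at dawn", x=123456)      # x should be random in practice
assert decrypt(y, z) == b"attack at dawn"
```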

If {𝑝𝑘 }
Theorem 10.15 — Public key encryption from trapdoor permutations.
is a secure TDP and 𝐻 is a random oracle then TDPENC is a CPA
secure public key encryption scheme.

Proof. Suppose, towards the sake of contradiction, that there is a


polynomial-size adversary 𝐴 that succeeds in the CPA game of TD-
PENC (with access to a random oracle 𝐻) with non-negligible advan-
tage 𝜖 over half. We will use 𝐴 to design an algorithm 𝐼 that inverts
the trapdoor permutation.
Recall that the CPA game works as follows:

• The adversary 𝐴 gets as input a key 𝑘 ∈ {0, 1}𝑛 .

• The algorithm 𝐴 makes some polynomial amount of computation


and 𝑇1 = 𝑝𝑜𝑙𝑦(𝑛) queries to the random oracle 𝐻 and produces a
pair of messages 𝑚0 , 𝑚1 ∈ {0, 1}ℓ .

• The “challenger” chooses 𝑏∗ ←𝑅 {0, 1}, chooses 𝑥∗ ←𝑅 {0, 1}𝑛 and


computes the ciphertext (𝑦∗ = 𝑝𝑘 (𝑥∗ ), 𝑧 ∗ = 𝐻(𝑥∗ ) ⊕ 𝑚𝑏∗ ) which is an
encryption of 𝑚𝑏∗ .

• The adversary 𝐴 gets (𝑦∗ , 𝑧 ∗ ) as input, makes some additional poly-


nomial amount of computation and 𝑇2 = 𝑝𝑜𝑙𝑦(𝑛) queries to 𝐻, and
then outputs 𝑏.

• The adversary wins if 𝑏 = 𝑏∗ .

We make the following claim:


CLAIM: With probability at least 𝜖, the adversary 𝐴 will make the
query 𝑥∗ to the random oracle.
PROOF: Suppose otherwise. We will prove the claim using the
“forgetful gnome” technique as used in the Boneh Shoup book. By
the “lazy evaluation” paradigm, we can imagine that queries to 𝐻 are
answered by a “faithful gnome” that whenever presented with a new
query 𝑥, chooses a uniform and independent value 𝑤 ←𝑅 {0, 1}ℓ as a
response, and then records that 𝐻(𝑥) = 𝑤 to use that as answers for
future queries.
Now consider the experiment where in the challenge part we use
a “forgetful gnome” that answers 𝐻(𝑥∗ ) by a uniform and indepen-
dent string 𝑤∗ ←𝑅 {0, 1}ℓ and does not record the answer for future
queries. In the “forgetful experiment”, the second component of the
ciphertext 𝑧∗ = 𝑤∗ ⊕ 𝑚𝑏∗ is distributed uniformly in {0, 1}ℓ and inde-
pendently from all other random choices, regardless of whether 𝑏∗ = 0
or 𝑏∗ = 1. Hence in this “forgetful experiment” the adversary gets
no information about 𝑏∗ and its probability of winning is at most 1/2.
But the forgetful experiment is identical to the actual experiment if the

value 𝑥∗ is only queried to 𝐻 once. Apart from the query of 𝑥∗ by the


challenger, all other queries to 𝐻 are made by the adversary. Under
our assumption, the adversary makes the query 𝑥∗ with probability less than 𝜖, and conditioned on this not happening the two experiments are identical. Since the probability of winning in the forgetful experiment is at most 1/2, the probability of winning in the overall experiment is less than 1/2 + 𝜖, thus yielding a contradiction and establishing the claim. (These kinds of analyses on sample spaces can be confusing; see Fig. 10.1 for a graphical illustration of this argument.)
Given the claim, we can now construct our inverter algorithm 𝐼 as follows:

• The input to 𝐼 is the key 𝑘 to the trapdoor permutation and 𝑦∗ = 𝑝_𝑘(𝑥∗). The goal of 𝐼 is to output 𝑥∗.

• The inverter simulates the adversary in a CPA attack, answering all its queries to the oracle 𝐻 by random values if they are new, or the previously supplied answers if they were asked before. Whenever the adversary makes a query 𝑥 to 𝐻, 𝐼 checks if 𝑝_𝑘(𝑥) = 𝑦∗ and if so halts and outputs 𝑥.

• When the time comes to produce the challenge, the inverter 𝐼 chooses 𝑤∗ at random and provides the adversary with (𝑦∗, 𝑧∗) where 𝑧∗ = 𝑤∗ ⊕ 𝑚_{𝑏∗}. (It would have been equivalent to answer the adversary with a uniformly chosen 𝑧∗ in {0, 1}^ℓ; can you see why?)

• The inverter continues the simulation, again halting and outputting 𝑥 if the adversary makes a query 𝑥 such that 𝑝_𝑘(𝑥) = 𝑦∗ to 𝐻.

We claim that up to the point we halt, the experiment is identical


to the actual attack. Indeed, since 𝑝𝑘 is a permutation, we know that
if the time came to produce the challenge and we have not halted,
then the query 𝑥∗ has not been made yet to 𝐻. Therefore we are free
to choose an independent random value 𝑤∗ as the value 𝐻(𝑥∗ ). (Our
inverter does not know what the value 𝑥∗ is, but this does not matter
for this argument: can you see why?) Therefore, since by the claim the
adversary will make the query 𝑥∗ to 𝐻 with probability at least 𝜖, our
inverter will succeed with the same probability.

P
This proof of Theorem 10.15 is not very long but it is somewhat subtle. Please re-read it and make sure you understand it. I also recommend you look at the version of the same proof in Boneh Shoup: Theorem 11.2 in Section 11.4 ("Encryption based on a trapdoor function scheme").

Figure 10.1: In the proof of security of TDPENC, we show that if the assumption of the claim is violated, the "forgetful experiment" is identical to the real experiment with probability larger than 1 − 𝜖. In such a case, even if all that probability mass was on the points in the sample space where the adversary in the forgetful experiment will lose and the adversary of the real experiment will win, the probability of winning in the latter experiment would still be less than 1/2 + 𝜖.

Remark 10.16 — Security without random oracles. We do not need to use a random oracle to get security in this scheme, especially if ℓ is sufficiently short. We can replace 𝐻() with a hash function of specific properties known as a hard core construction; this was first shown by Goldreich and Levin.

10.1.7 Digital signatures from trapdoor permutations

Here is how we can get digital signatures from trapdoor permutations {𝑝_𝑘}. This is known as the "full domain hash" signatures.

Full domain hash signatures (FDHSIG):

• Key generation: Run the key generation algorithm of the TDP to get (𝑘, 𝜏). 𝑘 is the public verification key and 𝜏 is the secret signing key.
• Signing: To sign a message 𝑚 with key 𝜏, we output 𝑝_𝑘⁻¹(𝐻(𝑚)) where 𝐻 ∶ {0, 1}∗ → {0, 1}^𝑛 is a hash function modeled as a random oracle.
• Verification: To verify a message-signature pair (𝑚, 𝑥) we check that 𝑝_𝑘(𝑥) = 𝐻(𝑚).
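A minimal sketch of full domain hash, again instantiating the TDP with a toy textbook-RSA permutation and 𝐻 with SHA-256 (illustrative choices of ours; pow(e, -1, ...) needs Python 3.8+):

```python
import hashlib

p, q, e = 1009, 1013, 5                    # toy parameters, as in the TDPENC sketch
m_mod = p * q
d = pow(e, -1, (p - 1) * (q - 1))          # the trapdoor tau

def H(msg: bytes) -> int:                  # hash into Z_m (random-oracle stand-in)
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % m_mod

def sign(msg: bytes) -> int:
    return pow(H(msg), d, m_mod)           # p_k^{-1}(H(m)); requires the trapdoor

def verify(msg: bytes, sig: int) -> bool:
    return pow(sig, e, m_mod) == H(msg)    # check p_k(sig) = H(m)

assert verify(b"hello", sign(b"hello"))
# Forging a signature on a new message would require computing p_k^{-1}(H(m))
# without knowing the trapdoor d.
```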

We now prove the security of full domain hash:

Theorem 10.17 — Full domain hash security. If {𝑝_𝑘} is a secure TDP and 𝐻 is a random oracle then FDHSIG is a chosen message attack secure digital signature scheme.

Proof. Suppose towards the sake of contradiction that there is a


polynomial-sized adversary 𝐴 that succeeds in a chosen message
attack with non-negligible probability 𝜖 > 0. We will construct an
inverter 𝐼 for the trapdoor permutation collection that succeeds with
non-negligible probability as well.
Recall that in a chosen message attack the adversary makes 𝑇
queries 𝑚1 , … , 𝑚𝑇 to its signing box which are interspersed with 𝑇 ′
queries 𝑚′1 , … , 𝑚′𝑇 ′ to the random oracle 𝐻. We can assume without
loss of generality (by modifying the adversary and at most doubling
the number of queries) that the adversary always queries the message
𝑚𝑖 to the random oracle before it queries it to the signing box, though
it can also make additional queries to the random oracle (and hence
in particular 𝑇 ′ ≥ 𝑇 ). At the end of the attack the adversary outputs
with probability 𝜖 a pair (𝑥∗ , 𝑚∗ ) such that 𝑚∗ was not queried to the
signing box and 𝑝𝑘 (𝑥∗ ) = 𝐻(𝑚∗ ).
Our inverter 𝐼 works as follows:

• Input: 𝑘 and 𝑦∗ = 𝑝_𝑘(𝑥∗). The goal is to output 𝑥∗.

• 𝐼 will guess at random 𝑡∗, the step in which the adversary will query to 𝐻 the message 𝑚∗ on which it is eventually going to forge. With probability 1/𝑇′ the guess will be correct.

• 𝐼 simulates the execution of 𝐴. Except for step 𝑡∗, whenever 𝐴 makes a new query 𝑚 to the random oracle, 𝐼 will choose a random 𝑥 ←_𝑅 {0, 1}^𝑛, compute 𝑦 = 𝑝_𝑘(𝑥) and designate 𝐻(𝑚) = 𝑦. In step 𝑡∗, when the adversary makes the query 𝑚∗, the inverter 𝐼 will return 𝐻(𝑚∗) = 𝑦∗. 𝐼 will record the values (𝑥, 𝑦), and so in particular will always know 𝑝_𝑘⁻¹(𝐻(𝑚)) for every 𝐻(𝑚) ≠ 𝑦∗ that it returned as answer from its oracle on query 𝑚.

• When 𝐴 makes the query 𝑚 to the signature box, then since 𝑚 was queried before to 𝐻, if 𝑚 ≠ 𝑚∗ then 𝐼 returns 𝑥 = 𝑝_𝑘⁻¹(𝐻(𝑚)) using its records. If 𝑚 = 𝑚∗ then 𝐼 halts and outputs "failure".

• At the end of the game, the adversary outputs (𝑚∗, 𝑥∗). If 𝑝_𝑘(𝑥∗) = 𝑦∗ then 𝐼 outputs 𝑥∗.

We claim that, conditioned on the event (which has probability at least 𝜖/𝑇′) that the adversary is successful and the final message 𝑚∗ is the one queried in step 𝑡∗, we provide a perfect simulation of the actual game. Indeed, while in an actual game the value 𝑦 = 𝐻(𝑚) will be chosen independently at random in {0, 1}^𝑛, this is equivalent to choosing 𝑥 ←_𝑅 {0, 1}^𝑛 and letting 𝑦 = 𝑝_𝑘(𝑥). After all, a permutation applied to the uniform distribution is uniform.

Therefore with probability at least 𝜖/𝑇′ the inverter 𝐼 will output 𝑥∗ such that 𝑝_𝑘(𝑥∗) = 𝑦∗, hence succeeding in the inversion. ■

P
Once again, this proof is somewhat subtle. I recom-
mend you also read the version of this proof in Section
13.4 of Boneh-Shoup.

Remark 10.18 — Hash and sign. There is another reason to use hash functions with signatures. By combining a collision-resistant hash function ℎ ∶ {0, 1}∗ → {0, 1}^ℓ with a signature scheme (𝑆, 𝑉) for ℓ-length messages, we can obtain a signature for arbitrary length messages by defining 𝑆′_𝑠(𝑚) = 𝑆_𝑠(ℎ(𝑚)) and 𝑉′_𝑣(𝑚, 𝜎) = 𝑉_𝑣(ℎ(𝑚), 𝜎).

10.2 HARDCORE BITS AND SECURITY WITHOUT RANDOM ORACLES

The main problem with using trapdoor functions as the basis of public key encryption is twofold:

• The fact that 𝑓 is a trapdoor function does not rule out the possibility of computing 𝑥 from 𝑓(𝑥) when 𝑥 is of some special form. Recall that the security of a one-way function is given over a uniformly random input. Usually messages to be sent are not drawn from a uniform distribution, and it's possible that for some certain values of 𝑥 it is easy to invert 𝑓(𝑥), and those values of 𝑥 also happen to be commonly sent messages.

• The fact that 𝑓 is a trapdoor function does not rule out the possibility of easily computing some partial information about 𝑥 from 𝑓(𝑥). Suppose we wished to play poker over a channel of bits. If even the suit or color of a card can be revealed from the encryption of that card, then it doesn't matter if the entire encryption cannot be inverted; being able to compute even a single bit of the plaintext makes the entire game invalid. The RSA and Rabin functions have not been successfully reversed, but nobody has been able to prove that they give semantic security.

The solution to these issues is to use a hardcore predicate of a one-way function 𝑓. We first define the security of a hardcore predicate, then show how it can be used to construct semantically secure encryption.

Definition 10.19 — Hardcore predicate. Let 𝑓 ∶ {0, 1}^𝑛 → {0, 1}^𝑛 be a one-way function (we assume 𝑓 is length preserving for simplicity), ℓ(𝑛) be a length function, and ℎ ∶ {0, 1}^𝑛 → {0, 1}^{ℓ(𝑛)} be polynomial time computable. We say ℎ is a hardcore predicate of 𝑓 if for every efficient adversary 𝐴, every polynomial 𝑝, and all sufficiently large 𝑛,

$$\left| \Pr[A(f(X_n), h(X_n)) = 1] - \Pr[A(f(X_n), R_{\ell(n)}) = 1] \right| < \frac{1}{p(n)}$$

where $X_n$ and $R_{\ell(n)}$ are independently and uniformly distributed over {0, 1}^𝑛 and {0, 1}^{ℓ(𝑛)}, respectively.

That is, given an input 𝑥 ←_𝑅 {0, 1}^𝑛 chosen uniformly at random, no efficient adversary can distinguish between a random string 𝑟 and ℎ(𝑥) given 𝑓(𝑥) with non-negligible advantage. This allows us to construct semantically secure public key encryption:

Hardcore predicate-based public key encryption:

• Key generation: Run the standard key generation algorithm for the one-way function 𝑓 to get (𝑒, 𝑑), where 𝑒 is a public key used to compute the function 𝑓 and 𝑑 is a corresponding secret trapdoor key that makes it easy to invert 𝑓.
• Encryption: To encrypt a message 𝑚 of length ℓ(𝑛) with public key 𝑒, pick 𝑥 ←_𝑅 {0, 1}^𝑛 uniformly at random and compute (𝑓_𝑒(𝑥), 𝑏(𝑥) ⊕ 𝑚).
• Decryption: To decrypt the ciphertext (𝑐, 𝑐′), we first use the secret trapdoor key 𝑑 to compute 𝐷_𝑑(𝑐) = 𝐷_𝑑(𝑓_𝑒(𝑥)) = 𝑥, then compute 𝑏(𝑥) and 𝑏(𝑥) ⊕ 𝑐′ = 𝑚.

P
Please stop to verify that this is a valid public key encryption scheme. Note that in this construction of public key encryption, the input to 𝑓 is 𝑥 drawn uniformly at random from {0, 1}^𝑛, so the definition of the one-wayness of 𝑓 can be applied directly. Furthermore, since 𝑏(𝑥) is indistinguishable from a random string 𝑟 even given 𝑓(𝑥), the output 𝑏(𝑥) ⊕ 𝑚 is essentially a one-time pad encryption of 𝑚, where the key can only be retrieved by someone who can invert 𝑓. Proving the security formally is left as an exercise.

This is all fine and good, but how do we actually


construct a hardcore predicate? Blum and Micali
were the first to construct a hardcore predicate
based on the discrete logarithm problem, but the
first construction for general one-way functions was
given by Goldreich and Levin. Their idea is that if 𝑓
is one-way, then it’s hard to guess the exclusive or of
a random subset of the input to 𝑓 when given 𝑓(𝑥)
and the subset itself.

Theorem 10.20 — A hardcore predicate for arbitrary one-way functions. Let 𝑓 be a one-way function, and let 𝑔 be defined as 𝑔(𝑥, 𝑟) = (𝑓(𝑥), 𝑟), where |𝑥| = |𝑟|. Let 𝑏(𝑥, 𝑟) = ⊕_{𝑖∈[𝑛]} 𝑥ᵢ𝑟ᵢ be the inner product mod 2 of 𝑥 and 𝑟. Then 𝑏 is a hardcore predicate of the function 𝑔.
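The predicate itself is two lines of code (on bit strings; the sample inputs are arbitrary):

```python
def gl_predicate(x: str, r: str) -> int:
    """Goldreich-Levin predicate: inner product of x and r, mod 2."""
    assert len(x) == len(r)
    return sum(int(xi) & int(ri) for xi, ri in zip(x, r)) % 2

print(gl_predicate("1011", "0110"))  # 1*0 + 0*1 + 1*1 + 1*0 = 1 (mod 2)
```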

The proof of this theorem follows the classic proof-by-reduction method, where we assume the existence of an adversary that can predict 𝑏(𝑥, 𝑟) given 𝑔(𝑥, 𝑟) with non-negligible advantage and construct an adversary that inverts 𝑓 with non-negligible probability. Let 𝐴 be a (possibly randomized) program and 𝜖_𝐴(𝑛) > 1/𝑝(𝑛) for some polynomial 𝑝 such that

$$\Pr[A(g(X_n, R_n)) = b(X_n, R_n)] = \frac{1}{2} + \epsilon_A(n),$$

where $X_n$ and $R_n$ are uniform and independent distributions over {0, 1}^𝑛. We observe that 𝑏 being insecure and having an output of a single bit implies that such a program 𝐴 exists. First, we show that on at least an 𝜖_𝐴(𝑛) fraction of the possible inputs, program 𝐴 has an 𝜖_𝐴(𝑛)/2 advantage in predicting the output of 𝑏.
Lemma 10.21 There exists a set 𝑆 ⊆ {0, 1}^𝑛 with |𝑆| > 𝜖_𝐴(𝑛) ⋅ 2^𝑛 such that for all 𝑥 ∈ 𝑆,

$$s(x) = \Pr[A(g(x, R_n)) = b(x, R_n)] \geq \frac{1}{2} + \frac{\epsilon_A(n)}{2}.$$
Proof. The result follows from an averaging argument. For notational convenience set 𝜖 = 𝜖_𝐴(𝑛). Let 𝑘 = |𝑆|/2^𝑛, and let 𝛼 and 𝛽 be the averages of 𝑠(𝑥) over values in and not in 𝑆, respectively, so 𝑘𝛼 + (1 − 𝑘)𝛽 = 𝔼[𝑠(𝑋_𝑛)] = 1/2 + 𝜖. The fact that 𝛼 ≤ 1 and 𝛽 < 1/2 + 𝜖/2 gives 𝑘 + (1 − 𝑘)(1/2 + 𝜖/2) > 1/2 + 𝜖, and solving finds that 𝑘 > 𝜖. ■

Now we observe that for any 𝑟 ∈ {0, 1}^𝑛, we have

𝑥ᵢ = 𝑏(𝑥, 𝑟) ⊕ 𝑏(𝑥, 𝑟 ⊕ 𝑒ᵢ),

where 𝑒ᵢ is the vector with all 0s except a 1 in the 𝑖th location. This observation follows from the definition of 𝑏, and it motivates the main idea of the reduction: guess 𝑏(𝑥, 𝑟) and use 𝐴 to compute 𝑏(𝑥, 𝑟 ⊕ 𝑒ᵢ), then put them together to find 𝑥ᵢ for all 𝑖. The reason guessing works will become clear later, but intuitively the reason we cannot simply use 𝐴 to compute both 𝑏(𝑥, 𝑟) and 𝑏(𝑥, 𝑟 ⊕ 𝑒ᵢ) is that the probability 𝐴 guesses both correctly is only (by a standard union bound) bounded below by 1 − 2(1/2 − 𝜖_𝐴(𝑛)) = 2𝜖_𝐴(𝑛). However, if we can guess 𝑏(𝑥, 𝑟) correctly, then we only need to invoke 𝐴 one time to get a better-than-half probability of correctly determining 𝑥ᵢ. It is then a simple matter of taking a majority vote over several such 𝑟 to determine each 𝑥ᵢ.
Now the natural question is how can we possibly guess (and here we literally mean randomly guess) each value of 𝑏(𝑥, 𝑟)? The key is that the values of 𝑟 only need to be pairwise independent, since down the line we plan to use Chebyshev's inequality on the accuracy of our guesses. (This has to do with the fact that Chebyshev's inequality is based on the variances of random variables. If we had to use the Chernoff bound we would be in trouble, since that requires full independence. For more on these and other concentration bounds, we recommend referring to the text Probability and Computing, by Mitzenmacher and Upfal.) This means that while we need 𝑝𝑜𝑙𝑦(𝑛) many values of 𝑟, we can get away with guessing log(𝑛) values of 𝑏(𝑥, 𝑟) and combining them with some trickery to get more while preserving pairwise independence. Since 2^{−log 𝑛} = 1/𝑛, with non-negligible probability we can correctly guess all of our 𝑏(𝑥, 𝑟) for polynomially many 𝑟. We then use 𝐴 to compute 𝑏(𝑥, 𝑟 ⊕ 𝑒ᵢ) for all 𝑟 and 𝑖, and since 𝐴 has a non-negligible advantage, by majority vote we can retrieve each value of 𝑥ᵢ to invert 𝑓, thus contradicting the one-wayness of 𝑓.

P
It is important that you understand why we cannot rely on invoking 𝐴 twice, on both 𝑏(𝑥, 𝑟) and 𝑏(𝑥, 𝑟 ⊕ 𝑒ᵢ). It is also important that you understand why, with non-negligible probability, we can correctly guess 𝑏(𝑥, 𝑟₁), … , 𝑏(𝑥, 𝑟ℓ) for 𝑟₁, … , 𝑟ℓ chosen independently and uniformly at random and ℓ = 𝑂(log 𝑛). At the moment, it is not important what trickery is used to combine our guesses, but it will reduce confusion down the line if you understand why we can get away with pairwise independence in our inputs instead of complete mutual independence.

Before moving on to the formal proof of our theorem, please stop to convince yourself that, given that some trickery exists, this strategy works for inverting 𝑓.

Proof of Theorem 10.20.

We use the assumed existence of 𝐴 to construct 𝐵, a program that inverts 𝑓 (which we assume is length preserving for notational convenience). Pick 𝑛 = |𝑥| and 𝑙 = ⌈log(2𝑛 ⋅ 𝑝(𝑛)² + 1)⌉, where 𝜖_𝐴(𝑛) > 1/𝑝(𝑛). Next, choose 𝑠₁, … , 𝑠_𝑙 ∈ {0, 1}^𝑛 and 𝜎₁, … , 𝜎_𝑙 ∈ {0, 1} all independently and uniformly at random. Here we set 𝜎ᵢ to be the guess for the value of 𝑏(𝑥, 𝑠ᵢ). For each non-empty subset 𝐽 of {1, 2, … , 𝑙} let 𝑟_𝐽 = ⊕_{𝑗∈𝐽} 𝑠_𝑗. We can observe that

𝑏(𝑥, 𝑟_𝐽) = 𝑏(𝑥, ⊕_{𝑗∈𝐽} 𝑠_𝑗) = ⊕_{𝑗∈𝐽} 𝑏(𝑥, 𝑠_𝑗)

by the properties of addition modulo 2, so we can say 𝜌_𝐽 = ⊕_{𝑗∈𝐽} 𝜎_𝑗 is the correct guess for 𝑏(𝑥, 𝑟_𝐽) as long as each of the 𝜎_𝑗 for 𝑗 ∈ 𝐽 is correct. We can easily verify that the values 𝑟_𝐽 are pairwise independent and uniform, so this construction gives us 𝑝𝑜𝑙𝑦(𝑛) many correct pairs (𝑏(𝑥, 𝑟_𝐽), 𝜌_𝐽) with probability 1/𝑝𝑜𝑙𝑦(𝑛), exactly as needed.
Define 𝐺(𝐽, 𝑖) = 𝜌_𝐽 ⊕ 𝐴(𝑓(𝑥), 𝑟_𝐽 ⊕ 𝑒ᵢ) to be the guess for 𝑥ᵢ computed using input 𝑟_𝐽. From here, 𝐵 simply needs to set 𝑥ᵢ to the majority value of our guesses 𝐺(𝐽, 𝑖) over the possible choices of 𝐽 and output 𝑥.
Now we prove that, given that our guesses 𝜌_𝐽 are all correct, for all 𝑥 ∈ 𝑆 and for every 1 ≤ 𝑖 ≤ 𝑛, we have

$$\Pr\left[ |\{J : G(J, i) = x_i\}| > \frac{1}{2}(2^l - 1) \right] > 1 - \frac{1}{2n}.$$

That is, with probability at least 1 − 𝑂(1/𝑛), more than half of our 2^𝑙 − 1 guesses for 𝑥ᵢ are correct, where 2^𝑙 − 1 is the number of non-empty subsets 𝐽 of {1, 2, … , 𝑙}.


For every 𝐽, define 𝐼_𝐽 to be the indicator that 𝐺(𝐽, 𝑖) = 𝑥ᵢ, and we can observe that 𝐼_𝐽 is Bernoulli with expected value 𝑠(𝑥) (again, given that our guess for 𝑏(𝑥, 𝑟_𝐽) is correct). Pairwise independence of the 𝐼_𝐽 is given by the pairwise independence of the 𝑟_𝐽. Setting 𝑚 = 2^𝑙 − 1, writing 𝑠(𝑥) = 1/2 + 1/𝑞(𝑛), and using Chebyshev's inequality, we get

$$\Pr\left[\sum_J I_J \leq \frac{1}{2}m\right] \leq \Pr\left[\left|\sum_J I_J - \left(\frac{1}{2} + \frac{1}{q(n)}\right)m\right| \geq \frac{m}{q(n)}\right] = \Pr\left[\left|\sum_J I_J - \mathbb{E}\left[\sum_J I_J\right]\right| \geq \frac{m}{q(n)}\right] \leq \frac{m\,\mathrm{Var}(I_J)}{(m/q(n))^2} \leq \frac{1/4}{(1/q(n))^2\, m}.$$

Since 𝑥 ∈ 𝑆 we know 1/𝑞(𝑛) ≥ 𝜖_𝐴(𝑛)/2 ≥ 1/(2𝑝(𝑛)), so

$$\frac{1/4}{(1/q(n))^2\, m} \leq \frac{1/4}{(1/(2p(n)))^2 \cdot 2n \cdot p(n)^2} = \frac{1}{2n}.$$

Putting it all together, 𝐵 must first pick an 𝑥 ∈ 𝑆, then correctly guess 𝜎ᵢ for all 𝑖 ∈ {1, 2, … , 𝑙}, and then 𝐴 must correctly compute 𝑏(𝑥, 𝑟_𝐽 ⊕ 𝑒ᵢ) on more than half of the 𝑟_𝐽. Since each of these events happens independently, we get that 𝐵's success probability is

$$\epsilon_A(n) \cdot \frac{1}{2^l} \cdot \left(1 - \frac{1}{2n}\right) = \epsilon_A(n) \cdot \frac{1}{2np(n)^2} \cdot \left(1 - \frac{1}{2n}\right) > \frac{1}{p(n)} \cdot \frac{1}{2np(n)^2} \cdot \frac{1}{2} = \frac{1}{4np(n)^3},$$

which is non-negligible in 𝑛. This contradicts the assumption that 𝑓 is a one-way function, so no adversary 𝐴 can predict 𝑏(𝑥, 𝑟) given (𝑓(𝑥), 𝑟) with a non-negligible advantage, and 𝑏 is a hardcore predicate of 𝑔. ■

10.2.1 Extending to more than one hardcore bit

By definition, 𝑏 as constructed above is only a hardcore predicate of length 1. While it's great that this method works for any arbitrary one-way function, in the real world messages are sometimes longer than a single bit. Fortunately, there is hope: Goldreich and Levin's hardcore bit construction can be used repeatedly to get a hardcore predicate of logarithmic length.

Theorem 10.22 — Logarithmically many hardcore bits for arbitrary one-way functions. Let 𝑓 be a one-way function, and define 𝑔₂(𝑥, 𝑠) = (𝑓(𝑥), 𝑠), where |𝑥| = 𝑛 and |𝑠| = 2𝑛. Let 𝑐 > 0 be a constant, and 𝑙(𝑛) = ⌈𝑐 log 𝑛⌉. Let 𝑏ᵢ(𝑥, 𝑠) denote the inner product mod 2 of the binary vectors 𝑥 and (𝑠_{𝑖+1}, … , 𝑠_{𝑖+𝑛}), where 𝑠 = (𝑠₁, … , 𝑠_{2𝑛}). Then the function ℎ(𝑥, 𝑠) = 𝑏₁(𝑥, 𝑠) ⋯ 𝑏_{𝑙(𝑛)}(𝑥, 𝑠) is a hardcore function of 𝑔₂.

It's clear that this is an important improvement on a single hardcore bit, but still nowhere near usable in general; imagine encrypting a text document with a key exponentially long in the size of the document. A completely different approach is needed to obtain a hardcore predicate with length polynomial in the key size. Bellare, Stepanovs, and Tessaro manage to pull it off using indistinguishability obfuscation of circuits, a cryptographic primitive which, like the existence of PRGs, is assumed to exist.

Theorem 10.23 — Polynomially many hardcore bits for arbitrary one-way functions. Let 𝐹 be a one-way function family and 𝐺 be a punctured PRF with the same input length as 𝐹. Then, under the assumed existence of indistinguishability obfuscators, there exists a function family 𝐻 that is hardcore for 𝐹. Furthermore, the output length of 𝐻 is the same as the output length of 𝐺.

Since the output length of 𝐺 can be polynomial in the length of


its input, it follows that 𝐻 outputs polynomially many hardcore bits
in the length of its input. The proofs of Theorem 10.22 and Theo-
rem 10.23 require the usage of results and concepts not yet covered in
this course, but we refer interested readers to their original papers:
Goldreich, O., 1995. Three XOR-lemmas-an exposition. In Elec-
tronic Colloquium on Computational Complexity (ECCC).
Bellare, M., Stepanovs, I. and Tessaro, S., 2014, December. Poly-
many hardcore bits for any one-way function and a framework for
differing-inputs obfuscation. In International Conference on the The-
ory and Application of Cryptology and Information Security (pp. 102-
121). Springer, Berlin, Heidelberg.
11 Lattice based cryptography

Lattice based public key encryption (and its cousins known as


knapsack and coding based encryption) have almost as long a his-
tory as discrete logarithm and factoring based schemes. Already in
1976, right after the Diffie-Hellman key exchange was discovered
(and before RSA), Ralph Merkle was working on building public key
encryption from the NP hard knapsack problem (see Diffie’s recollec-
tion). This can be thought of as the task of solving a linear equation
of the form 𝐴𝑥 = 𝑦 (where 𝐴 is a given matrix, 𝑦 is a given vector,
and the unknown is 𝑥) over the real numbers but with the additional
constraint that the entries of 𝑥 must be either 0 or 1. His proposal evolved into
the Merkle-Hellman system proposed in 1978 (which was broken in
1984).
McEliece proposed in 1978 a system based on the difficulty of the
decoding problem for general linear codes. This is the task of solving
noisy linear equations where one is given 𝐴 and 𝑦 such that 𝑦 = 𝐴𝑥 + 𝑒
for a “small” error vector 𝑒, and needs to recover 𝑥. Crucially, here
we work in a finite field, such as working modulo 𝑞 for some prime
𝑞 (that can even be 2) rather than over the reals or rationals. There
are special matrices 𝐴∗ for which we know how to solve this problem
efficiently: these are known as efficiently decodable error correcting
codes. McEliece suggested a scheme where the key generator lets 𝐴 be
a “scrambled” version of a special 𝐴∗ (based on the Goppa algebraic
geometric code). So, someone that knows the scrambling could solve
the problem, but (hopefully) someone that doesn’t know it wouldn’t.
McEliece’s system has so far not been broken.
In a 1996 breakthrough, Ajtai showed a private key scheme based
on integer lattices that had a very curious property: its security could
be based on the assumption that certain problems were only hard in
the worst case, and moreover variants of these problems were known
to be NP hard. This re-ignited the hope that we could perhaps realize
the old dream of basing crypto on the mere assumption that 𝑃 ≠ NP.




Alas, we now understand that there are fundamental barriers to this


approach.
Nevertheless, Ajtai’s work attracted significant interest, and within
a year both Ajtai and Dwork, as well as Goldreich, Goldwasser and
Halevi came up with lattice based constructions for public key encryp-
tion (the former based also on worst case assumptions). At about the
same time, Hoffstein, Pipher, and Silverman came up with their NTRU
public key system which is based on stronger assumptions but offers
better performance, and they started a company around it together
with Daniel Lieman.
You may note that I haven’t yet said what lattices are; we will do
so later, but for now if you simply think of questions involving linear
equations modulo some prime 𝑞, you will get enough of the intuition
that you need. (The lattice viewpoint is more geometric, and we’ll
discuss it more below; it was first used to attack cryptosystems and in
particular break the Merkle-Hellman knapsack scheme and many of
its variants.)
Lattice based cryptography has captured a lot of attention recently
from both theory and practice. On the theory side, many cool new
constructions are now based on lattice based cryptography, and chief
among them fully homomorphic encryption, as well as indistinguisha-
bility obfuscation (though the latter's security foundations are still
far less solid). On the applied side, the steady advances in the technol-
ogy of quantum computers have finally gotten practitioners worried
about RSA, Diffie Hellman and Elliptic Curves. While current con-
structions for quantum computers are nowhere near being able to,
say, factor larger numbers than can be done classically (or even than
can be done by hand), given that it takes many years to develop new
standards and get them deployed, many believe the effort to transition
away from these factoring/dlog based schemes should start today (or
perhaps should have started several years ago). Based on this, the Na-
tional Institute of Standards and Technology has started a process to
identify “post quantum” public key encryption schemes. All the finalists
for public-key encryption are based on lattices/codes.
Cryptography has the peculiar/unfortunate feature that if a ma-
chine is built that can factor large integers in 20 years, it can still be
used to break the communication we transmit today, provided this
communication was recorded. So, if you have some data that you
expect you’d want still kept secret in 20 years (as many government
and commercial entities do), you might have reasons to worry. Cur-
rently lattice based cryptography is the only real “game in town” for
potentially quantum-resistant public key encryption schemes.

Lattice based cryptography is a huge area, and in this lecture and


this course we only touch on a few aspects of it. I highly recommend
Chris Peikert’s Survey for a much more in depth treatment of this area.

11.0.1 Quick linear algebra recap


A field 𝔽 is a set that supports the operations +, ⋅ and contains the
numbers 0 and 1 (more formally the additive identity and multiplica-
tive identity) with the usual properties that the real numbers have.
(That is, the associative, commutative, and distributive laws, the fact that
for every 𝑥 ∈ 𝔽 there is an element −𝑥 such that 𝑥 + (−𝑥) = 0 and
that if 𝑥 ≠ 0 there is an element 𝑥−1 such that 𝑥 ⋅ 𝑥−1 = 1.) Apart
from the real numbers, the main field we will be interested in this sec-
tion is the field ℤ𝑞 of the numbers {0, 1, … , 𝑞 − 1} with addition and
multiplication done modulo 𝑞, where 𝑞 is a prime number.¹

¹ While this won't be of interest for us in this chapter, one can also define finite fields whose size is a prime power of the form 𝑞^𝑘 where 𝑞 is a prime and 𝑘 is an integer; this is sometimes useful and in particular fields of size 2^𝑘 are sometimes used in practice. In such fields we usually think of the elements as vectors 𝑣 ∈ (ℤ𝑞)^𝑘 with addition done component-wise, but multiplication is not defined component-wise (since otherwise a vector with a single zero coordinate would not have an inverse) but in a different way, via interpreting these vectors as coefficients of a degree 𝑘 − 1 polynomial.

You should be comfortable with the following notions (these are
covered in a number of sources, including the appendix of Katz-
Lindell and Shoup's online-available book):

• A vector 𝑣 ∈ 𝔽𝑛 and a matrix 𝑀 ∈ 𝔽𝑚×𝑛 . An 𝑚 × 𝑛 matrix has 𝑚
rows and 𝑛 columns. We think of vectors as column vectors and so
we can think of a vector 𝑣 ∈ 𝔽𝑛 as an 𝑛 × 1 matrix. We write the
𝑖-th coordinate of 𝑣 as 𝑣𝑖 and the (𝑖, 𝑗)-th coordinate of 𝑀 as 𝑀𝑖,𝑗
(i.e. the coordinate in the 𝑖-th row and the 𝑗-th column.) We often
write a vector 𝑣 as (𝑣1 , … , 𝑣𝑛 ) but we still mean that it’s a column
vector unless we say otherwise.

• If 𝛼 ∈ 𝔽 is a scalar (i.e., a number) and 𝑣 ∈ 𝔽𝑛 is a vector then 𝛼𝑣 is


the vector (𝛼𝑣1 , … , 𝛼𝑣𝑛 ). If 𝑢, 𝑣 are 𝑛 dimensional vectors then 𝑢 + 𝑣
is the vector (𝑢1 + 𝑣1 , … , 𝑢𝑛 + 𝑣𝑛 ).

• A linear subspace 𝑉 ⊆ 𝔽𝑛 is a non-empty set of vectors such that


for every vectors 𝑢, 𝑣 ∈ 𝑉 and 𝛼, 𝛽 ∈ 𝔽, 𝛼𝑢 + 𝛽𝑣 ∈ 𝑉 . In partic-
ular this means that 𝑉 contains the all zero vector 0𝑛 (can you see
why?). A subset 𝐴 ⊆ 𝑉 is linearly independent if there is no collec-
tion 𝑎1 , … , 𝑎𝑘 ∈ 𝐴 and scalars 𝛼1 , … , 𝛼𝑘 such that ∑ 𝛼𝑖 𝑎𝑖 = 0𝑛 . It
is known (and not hard to prove) that if 𝐴 is linearly independent
then |𝐴| ≤ 𝑛. It is known that for every such linear subspace there
is a linearly independent set 𝐵 = {𝑏1 , … , 𝑏𝑑 } of vectors, with 𝑑 ≤ 𝑛,
such that for every 𝑢 ∈ 𝑉 there exist 𝛼1 , … , 𝛼𝑑 such that 𝑣 = ∑ 𝛼𝑖 𝑏𝑖 .
Such a set is known as a basis for 𝑉 . A subspace 𝑉 has many bases,
but all of them have the same size 𝑑 which is known as the dimen-
sion of 𝑉 . An affine subspace is a set 𝑈 of the form {𝑢0 + 𝑣 ∶ 𝑣 ∈ 𝑉 }
where 𝑉 is a linear subspace. We can also write 𝑈 as 𝑢0 + 𝑉 . We
denote the dimension of 𝑈 as the dimension of 𝑉 in such a case.
240 a n i n te n si ve i n trod u c ti on to c ry p tog ra p hy

• The inner product (also known as “dot product”) ⟨𝑢, 𝑣⟩ between
two vectors of the same dimension 𝑛 is defined as ∑ 𝑢𝑖 𝑣𝑖 (addition
done in the field 𝔽).²

² Inner products can be defined more generally, and in particular over fields such as the complex numbers we would define the inner product as ∑ 𝑢𝑖 𝑣̄𝑖 where for 𝑎 ∈ ℂ, 𝑎̄ denotes the complex conjugate of 𝑎. However, we stick to this simple case for this chapter.

• The matrix product AB of an 𝑚 × 𝑘 and a 𝑘 × 𝑛 matrix results
in an 𝑚 × 𝑛 matrix. If we think of the rows of 𝐴 as the vectors

𝐴1 , … , 𝐴𝑚 ∈ 𝔽𝑘 and the columns of 𝐵 as 𝐵1 , … , 𝐵𝑛 ∈ 𝔽𝑘 , then the


(𝑖, 𝑗)-th coordinate of AB is ⟨𝐴𝑖 , 𝐵𝑗 ⟩. Matrix product is associative
and satisfies the distributive law but is not commutative: there are
pairs of square matrices 𝐴, 𝐵 such that AB ≠ BA.

• The transpose of an 𝑛 × 𝑚 matrix 𝐴 is the 𝑚 × 𝑛 matrix 𝐴⊤ such that


(𝐴⊤ )𝑖,𝑗 = 𝐴𝑗,𝑖 .

• The inverse of a square 𝑛 × 𝑛 matrix 𝐴 is the matrix 𝐴−1 such that
𝐴𝐴−1 = 𝐼 where 𝐼 is the 𝑛 × 𝑛 identity matrix such that 𝐼𝑖,𝑗 = 1 if
𝑖 = 𝑗 and 𝐼𝑖,𝑗 = 0 otherwise.

• The rank of an 𝑚 × 𝑛 matrix 𝐴 is the minimum number 𝑟 such that
we can write 𝐴 as ∑_{𝑖=1}^{𝑟} 𝑢𝑖 (𝑣𝑖)⊤ where 𝑢𝑖 ∈ 𝔽𝑚 and 𝑣𝑖 ∈ 𝔽𝑛 . We
can think of the 𝑢𝑖 ’s as the columns of an 𝑚 × 𝑟 matrix 𝑈 and the
𝑣𝑖 ’s as the rows of an 𝑟 × 𝑛 matrix 𝑉 , and hence the rank of 𝐴 is the
minimum 𝑟 such that 𝐴 = UV where 𝑈 is 𝑚 × 𝑟 and 𝑉 is 𝑟 × 𝑛. It
can be shown that an 𝑛 × 𝑛 matrix is full rank if and only if it has an
inverse.

• Solving linear equations can be thought of as the task of given an 𝑚×


𝑛 matrix 𝐴 and 𝑚-dimensional vector 𝑦, finding the 𝑛-dimensional
vector 𝑥 such that 𝐴𝑥 = 𝑦. If the rank of 𝐴 is at least 𝑛 (which in
particular means that 𝑚 ≥ 𝑛) then by dropping 𝑚 − 𝑛 rows of 𝐴
and coordinates of 𝑦 we can obtain the equation 𝐴′ 𝑥 = 𝑦′ where
𝐴′ is an 𝑛 × 𝑛 matrix that has an inverse. In this case a solution
(if it exists) will be equal to (𝐴′ )−1 𝑦. If for a set of equations we
have 𝑚 > 𝑛 and we can find two such matrices 𝐴′ , 𝐴″ such that
(𝐴′)−1 𝑦 ≠ (𝐴″)−1 𝑦 then we say it is over-determined and in such
a case it has no solutions. If a set of equations has more variables
𝑛 than equations 𝑚 we say it’s under-determined. In such a case it
either has no solutions or the solutions form an affine subspace of
dimension at least 𝑛 − 𝑚.

• The Gaussian elimination algorithm can be used to obtain, given a set
of equations 𝐴𝑥 = 𝑦, a solution 𝑥 if one exists or a certification
that no solution exists. It can be executed in time polynomial in the
dimensions and the bit complexity of the numbers involved. This
algorithm can also be used to obtain an inverse of a given matrix 𝐴,
if such an inverse exists.
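To make the recap concrete, here is a bare-bones Python sketch (ours, for illustration only) of Gaussian elimination over ℤ𝑞 for a prime 𝑞; it returns one solution of 𝐴𝑥 = 𝑦 with free variables set to zero, or None if the system is inconsistent (it relies on pow(a, -1, q), available in Python 3.8+):

```python
def solve_mod_q(A, y, q):
    """Solve Ax = y over Z_q (q prime) by Gaussian elimination.
    A is a list of m rows of length n with entries in [0, q)."""
    m, n = len(A), len(A[0])
    M = [row[:] + [yi] for row, yi in zip(A, y)]   # augmented matrix [A | y]
    row = 0
    for col in range(n):
        pivot = next((r for r in range(row, m) if M[r][col] % q != 0), None)
        if pivot is None:
            continue
        M[row], M[pivot] = M[pivot], M[row]
        inv = pow(M[row][col], -1, q)              # inverse exists since q is prime
        M[row] = [v * inv % q for v in M[row]]     # scale so the pivot equals 1
        for r in range(m):
            if r != row and M[r][col] % q != 0:    # zero out the rest of the column
                f = M[r][col]
                M[r] = [(vr - f * vp) % q for vr, vp in zip(M[r], M[row])]
        row += 1
    x = [0] * n
    for r in range(m):                             # read off a solution, detect 0 = c
        lead = next((c for c in range(n) if M[r][c] % q != 0), None)
        if lead is None:
            if M[r][n] % q != 0:
                return None                        # inconsistent system
        else:
            x[lead] = M[r][n]
    return x
```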

Remark 11.1 — Keep track of dimensions!. Through-
out this chapter, and while working in lattice based
cryptography in general, it is crucial to keep track of
the dimensions. Whenever you see a symbol such as
𝑣, 𝐴, 𝑥, 𝑦 ask yourself:

• Is it a scalar, a vector or a matrix?

• If it is a vector or a matrix, what are its dimensions?

• If it's a matrix, is it “square” (i.e., 𝑚 = 𝑛), “short
and fat” (i.e., 𝑚 ≪ 𝑛) or “tall and skinny” (i.e.,
𝑚 ≫ 𝑛)?

11.1 A WORLD WITHOUT GAUSSIAN ELIMINATION


The general approach people use to get a public key encryption is
to obtain a hard computational problem with some mathematical
structure. We've seen this in the discrete logarithm problem, where the
task is to invert the map 𝑎 ↦ 𝑔𝑎 (mod 𝑝), and the integer factoring
problem, where the task is to invert the map 𝑎, 𝑏 ↦ 𝑎 ⋅ 𝑏. Perhaps the
simplest structure to consider is the task of solving linear equations.

Pretend that we didn't know of Gaussian elimination,³ and that
if we picked a “generic” matrix 𝐴 then the map 𝑥 ↦ 𝐴𝑥 would be
hard to invert. (Here and elsewhere, our default interpretation of
a vector 𝑥 is as a column vector, and hence if 𝑥 is 𝑛 dimensional and
𝐴 is 𝑚 × 𝑛 then 𝐴𝑥 is 𝑚 dimensional. We use 𝑥⊤ to denote the row
vector obtained by transposing 𝑥.) Could we use that to get a public
key encryption scheme?

³ Despite the name, Gaussian elimination has been known to Chinese mathematicians since 150BC or so, and was popularized in the West through the 1670 notes of Isaac Newton, more than 100 years before Gauss was born.
Here is a concrete approach. Let us fix some prime 𝑞 (think of it as
polynomial size, e.g., 𝑞 is smaller than 1024 or so, though people can
and sometimes do consider 𝑞 of exponential size), and all computation
below will be done modulo 𝑞. The secret key is a vector 𝑥 ∈ ℤ𝑛𝑞 , and
the public key is (𝐴, 𝑦) where 𝐴 is a random 𝑚 × 𝑛 matrix with entries
in ℤ𝑞 and 𝑦 = 𝐴𝑥. Under our assumption, it is hard to recover the
secret key from the public key, but how do we use the public key to
encrypt?
The crucial observation is that even if we don’t know how to solve
linear equations, we can still combine several equations to get new
ones. To keep things simple, let’s consider the case of encrypting a
single bit.

P
If you have a CPA secure public key encryption
scheme for single bit messages then you can extend

it to a CPA secure encryption scheme for messages of


any length. Can you see why?

If 𝑎1 , … , 𝑎𝑚 are the rows of 𝐴, we can think of the public key as the


set of equations ⟨𝑎1 , 𝑥⟩ = 𝑦1 , … , ⟨𝑎𝑚 , 𝑥⟩ = 𝑦𝑚 in the unknown vari-
ables 𝑥. The idea is that to encrypt the value 0 we will generate a new
correct equation on 𝑥, while to encrypt the value 1 we will generate an
incorrect equation. To decrypt a ciphertext (𝑎, 𝜎) ∈ ℤ𝑛+1𝑞 , we think of
it as an equation of the form ⟨𝑎, 𝑥⟩ = 𝜎 and output 1 if and only if the
equation is incorrect.
How does the encrypting algorithm, that does not know 𝑥, get
a correct or incorrect equation on demand? One way would be to
simply take two equations ⟨𝑎𝑖 , 𝑥⟩ = 𝑦𝑖 and ⟨𝑎𝑗 , 𝑥⟩ = 𝑦𝑗 and add them
together to get the equation ⟨𝑎𝑖 + 𝑎𝑗 , 𝑥⟩ = 𝑦𝑖 + 𝑦𝑗 . This equation is
correct and so one can use it to encrypt 0, while to encrypt 1 we simply
add some fixed nonzero number 𝛼 ∈ ℤ𝑞 to the right hand side to get
the incorrect equation ⟨𝑎𝑖 + 𝑎𝑗 , 𝑥⟩ = 𝑦𝑖 + 𝑦𝑗 + 𝛼. However, even if it’s
hard to solve for 𝑥 given the equations, an attacker (who also knows
the public key (𝐴, 𝑦)) can try itself all pairs of equations and do the
same thing.
Our solution for this is simple: just add more equations! If the en-
cryptor adds a random subset of equations then there are 2𝑚 possibili-
ties for that, and an attacker can’t guess them all. That is, if the rows of
𝐴 are 𝑎1 , … , 𝑎𝑚 , then we can pick a vector 𝑤 ∈ {0, 1}𝑚 at random, and
consider the equation ⟨𝑎, 𝑥⟩ = 𝑦 where 𝑎 = ∑ 𝑤𝑖 𝑎𝑖 and 𝑦 = ∑ 𝑤𝑖 𝑦𝑖 . In
other words, we can think of this as the equation 𝑤⊤ 𝐴𝑥 = ⟨𝑤, 𝑦⟩ (note
that ⟨𝑤, 𝑦⟩ = 𝑤⊤ 𝑦 and so we can think of this as the equation that we
obtain from 𝐴𝑥 = 𝑦 by multiplying both sides on the left by the row
vector 𝑤⊤ ).
Thus, at least intuitively, the following encryption scheme would
be “secure” in the Gaussian elimination-free world of attackers that
haven’t taken freshman linear algebra:

Scheme “LwoE-ENC”: Public key encryption under


the hardness of “learning linear equations without
errors”.

• Key generation: Pick random 𝑚 × 𝑛 matrix 𝐴 over


ℤ𝑞 , and 𝑥 ←𝑅 ℤ𝑛𝑞 , the secret key is 𝑥 and the pub-
lic key is (𝐴, 𝑦) where 𝑦 = 𝐴𝑥.
• Encryption: To encrypt a message 𝑏 ∈ {0, 1}, pick
𝑤 ∈ {0, 1}𝑚 and output 𝑤⊤ 𝐴, ⟨𝑤, 𝑦⟩ + 𝛼𝑏 for some
fixed nonzero 𝛼 ∈ ℤ𝑞 .
• Decryption: To decrypt a ciphertext (𝑎, 𝜎), output
0 iff ⟨𝑎, 𝑥⟩ = 𝜎.

P
Please stop here and make sure that you see why
this is a valid encryption (not in the sense that it is
secure - it’s not - but in the sense that decryption of
an encryption of 𝑏 returns the bit 𝑏), and this descrip-
tion corresponds to the previous one; as usual all
calculations are done modulo 𝑞.
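To see the scheme in action (and to check the validity property the exercise above asks about), here is a toy Python implementation; the concrete parameters are arbitrary illustrative choices, and of course the scheme itself is completely insecure:

```python
import random

# Toy implementation of the (insecure!) LwoE-ENC scheme above.
q, n, m, alpha = 101, 8, 24, 1   # illustrative parameters, alpha != 0

def keygen():
    A = [[random.randrange(q) for _ in range(n)] for _ in range(m)]
    x = [random.randrange(q) for _ in range(n)]
    y = [sum(aij * xj for aij, xj in zip(row, x)) % q for row in A]   # y = Ax
    return x, (A, y)

def encrypt(pub, b):
    A, y = pub
    w = [random.randrange(2) for _ in range(m)]
    a = [sum(w[i] * A[i][j] for i in range(m)) % q for j in range(n)]  # w^T A
    sigma = (sum(wi * yi for wi, yi in zip(w, y)) + alpha * b) % q     # <w,y> + alpha*b
    return a, sigma

def decrypt(x, ct):
    a, sigma = ct
    return 0 if sum(ai * xi for ai, xi in zip(a, x)) % q == sigma else 1

x, pub = keygen()
assert all(decrypt(x, encrypt(pub, b)) == b for b in (0, 1) for _ in range(20))
```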

11.2 SECURITY IN THE REAL WORLD.


Like it or not (and cryptographers typically don’t) Gaussian elimina-
tion is possible in the real world and the scheme above is completely
insecure. However, the Gaussian elimination algorithm is extremely
brittle.
Errors tend to be amplified when you combine equations. This is
usually thought of as a bad thing, and numerical analysis is much
about dealing with this issue. However, from the cryptographic point
of view, these errors can be our saving grace and enable us to salvage
the security of the ridiculous scheme above.
To see why Gaussian elimination is brittle, let us recall how it
works. Think of 𝑚 = 𝑛 for simplicity. Given equations 𝐴𝑥 = 𝑦 in
the unknown variables 𝑥, the goal of Gaussian elimination is to trans-
form them into the equations 𝐼𝑥 = 𝑦′ where 𝐼 is the identity matrix
(and hence the solution is simply 𝑥 = 𝑦′ ). Recall how we do it: by
rearranging and scaling, we can assume that the top left corner of 𝐴
is equal to 1, and then we add the first equation to the other equa-
tions (scaled appropriately) to zero out the first entry in all the other
rows of 𝐴 (i.e., make the first column of 𝐴 equal to (1, 0, … , 0)) and
continue onwards to the second column and so on and so forth.
Now, suppose that the equations were noisy, in the sense that we
added to 𝑦 a vector 𝑒 ∈ ℤ𝑚𝑞 such that |𝑒𝑖| < 𝛿𝑞 for every 𝑖.⁴ Even ignor-
ing the effect of the scaling step, simply adding the first equation to
the rest of the equations would typically tend to increase the relative
error of equations 2, … , 𝑚 from ≈ 𝛿 to ≈ 2𝛿. Now, when we repeat
the process, we increase the error of equations 3, … , 𝑚 from ≈ 2𝛿 to
≈ 4𝛿, and we see that by the time we're done dealing with about 𝑛/2
variables, the remaining equations have error level roughly 2^{𝑛/2}𝛿. So,
unless 𝛿 was truly tiny (and 𝑞 truly big, in which case the difference
between working in ℤ𝑞 and simply working with integers or rationals
disappears), the resulting equations have the form 𝐼𝑥 = 𝑦′ + 𝑒′ where
𝑒′ is so big that we get no information on 𝑥.

⁴ Over ℤ𝑞, we can think of 𝑞 − 1 also as the number −1, and so on. Thus if 𝑎 ∈ ℤ𝑞, we define |𝑎| to be the minimum of 𝑎 and 𝑞 − 𝑎. This ensures the absolute value satisfies the natural property of |𝑎| = |−𝑎|.
The Learning With Errors (LWE) conjecture is that this is inherent:

Conjecture (Learning with Errors, Regev 2005):
Let 𝑞 = 𝑞(𝑛) and 𝛿 = 𝛿(𝑛) be some functions. The
Learning with Error (LWE) conjecture with respect to
𝑞, 𝛿, denoted as LWE𝑞,𝛿, is the following conjecture:
for every polynomial 𝑚(𝑛) and polynomial-time
adversary 𝑅,

Pr[𝑅(𝐴, 𝐴𝑥 + 𝑒) = 𝑥] < 𝑛𝑒𝑔𝑙(𝑛)

where for 𝑞 = 𝑞(𝑛) and 𝛿 = 𝛿(𝑛), this probability
is taken over 𝐴 a random 𝑚 × 𝑛 matrix over ℤ𝑞, 𝑥 a
random vector in ℤ𝑛𝑞, and 𝑒 a random “noise vector”
in ℤ𝑚𝑞 where |𝑒𝑖| < 𝛿𝑞 for every 𝑖 ∈ [𝑚].⁵

The LWE conjecture (without any parameters) is that
there is some absolute constant 𝑐 such that for every
polynomial 𝑝(𝑛), if 𝑞(𝑛) > 𝑝(𝑛)^𝑐 then LWE
holds with respect to 𝑞(𝑛) and 𝛿(𝑛) = 1/𝑝(𝑛).⁶

⁵ One can think of 𝑒 as chosen by simply letting every coordinate be chosen at random in {−𝛿𝑞, −𝛿𝑞 + 1, … , +𝛿𝑞}. For technical reasons, we sometimes consider other distributions and in particular the discrete Gaussian distribution which is obtained by letting every coordinate of 𝑒 be an independent Gaussian random variable with standard deviation 𝛿𝑞, conditioned on it being an integer. (A closely related distribution is obtained by picking such a Gaussian random variable and then rounding it to the nearest integer.)

⁶ People sometimes also consider variants where both 𝑝(𝑛) and 𝑞(𝑛) can be as large as exponential.

It is important to note the order of quantifiers in the learning with
errors conjecture. If we want to handle a noise of low enough mag-
nitude (say 𝛿(𝑛) = 1/𝑛²) then we need to choose the modulus 𝑞 to
be large enough (for example it is believed that 𝑞 > 𝑛⁴ will be good
enough for this case), and then the adversary can choose 𝑚(𝑛) to be as
big a polynomial as they like, and of course run in time which is an ar-
bitrary polynomial in 𝑛. Therefore we can think of such an adversary
𝑅 as getting access to a “magic box” that they can use 𝑚 = 𝑝𝑜𝑙𝑦(𝑛)
number of times to get “noisy equations on 𝑥” of the form (𝑎𝑖, 𝑦𝑖) with
𝑎𝑖 ∈ ℤ𝑛𝑞, 𝑦𝑖 ∈ ℤ𝑞 where 𝑦𝑖 = ⟨𝑎𝑖, 𝑥⟩ + 𝑒𝑖.
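In code, one call to this “magic box” might look as follows (a sketch of ours with illustrative parameters; a real construction would typically use the discrete Gaussian noise distribution mentioned in the footnote rather than uniform noise):

```python
import random

def lwe_sample(x, q, delta):
    """One "noisy equation" (a, y) on the secret x, with
    y = <a, x> + e (mod q) and |e| <= delta*q."""
    a = [random.randrange(q) for _ in range(len(x))]
    bound = int(delta * q)
    e = random.randint(-bound, bound)      # uniform noise, for illustration
    y = (sum(ai * xi for ai, xi in zip(a, x)) + e) % q
    return a, y
```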

P
The LWE conjecture posits that no efficient algorithm
can recover 𝑥 given 𝐴 and 𝐴𝑥 + 𝑒. But you might
wonder whether it’s possible to do this inefficiently.
The answer is yes. Intuitively the reason is that if we
have more equations than unknowns (i.e., if 𝑚 > 𝑛)
then these equations contain enough information to
determine the unknown variables even if they are
noisy. It can be shown that if 𝑚 is sufficiently large
(𝑚 > 10𝑛 will do) then with high probability over
𝐴, 𝑥, 𝑒, given 𝐴 and 𝑦 = 𝐴𝑥 + 𝑒, if we enumerate over
all 𝑥̃ ∈ ℤ𝑛𝑞 and output the string minimizing |𝐴𝑥̃ − 𝑦|
(where we define |𝑣| = ∑ |𝑣𝑖 | for a vector 𝑣), then 𝑥̃
will equal 𝑥.
It is a good exercise to work out the details, but a hint
is that this can be proven by showing that for every 𝑥̃ ≠ 𝑥,
with high probability over 𝐴, |𝐴𝑥̃ − 𝐴𝑥| > 𝛿𝑞𝑚. The
latter fact holds because 𝑣 = 𝐴(𝑥 − 𝑥̃) is a random
vector in ℤ𝑚𝑞, and the probability that |𝑣| < 𝛿𝑞𝑚 is

much smaller than 𝑞 −0.1𝑚 < 𝑞 −𝑛 . Hence we can take a


union bound over all possible 𝑥̃ ∈ ℤ𝑛𝑞 .

11.3 SEARCH TO DECISION


It turns out that if the LWE is hard, then it is even hard to distinguish
between random equations and nearly correct ones:

Figure 11.1: The search to decision reduction (Theorem 11.2) implies that under the LWE conjecture, for every 𝑚 = 𝑝𝑜𝑙𝑦(𝑛), if we choose and fix a random 𝑚 × 𝑛 matrix 𝐴 over ℤ𝑞, the distribution 𝐴𝑥 + 𝑒 is indistinguishable from a random vector in ℤ𝑚𝑞, where 𝑥 is a random vector in ℤ𝑛𝑞 and 𝑒 is a random “short” vector in ℤ𝑚𝑞. The two distributions are indistinguishable even to an adversary that knows 𝐴.

Theorem 11.2 — Search to decision reduction for LWE. If the LWE con-
jecture is true then for every 𝑞 = 𝑝𝑜𝑙𝑦(𝑛), 𝛿 = 1/𝑝𝑜𝑙𝑦(𝑛) and
𝑚 = 𝑝𝑜𝑙𝑦(𝑛), the following two distributions are computationally
indistinguishable:

• {(𝐴, 𝐴𝑥 + 𝑒)} where 𝐴 is a random 𝑚 × 𝑛 matrix in ℤ𝑞, 𝑥 is random
in ℤ𝑛𝑞, and 𝑒 ∈ ℤ𝑚𝑞 is a random noise vector of magnitude 𝛿.

• {(𝐴, 𝑦)} where 𝐴 is a random 𝑚 × 𝑛 matrix in ℤ𝑞 and 𝑦 is random
in ℤ𝑚𝑞.

Proof. Suppose that we had a decisional adversary 𝐷 that succeeds in


distinguishing the two distributions above with bias 𝜖. For example,
suppose that 𝐷 outputs 1 with probability 𝑝 + 𝜖 on inputs from the
first distribution, and outputs 1 with probability 𝑝 on inputs from the
second distribution.
We will show how we can use this to obtain a polynomial-time
algorithm 𝑆 that on input 𝑚 noisy equations on 𝑥 and a value 𝑎 ∈ ℤ𝑞,
will learn with high probability whether or not the first coordinate of
𝑥 equals 𝑎. Clearly, we can repeat this for all the possible 𝑞 values of 𝑎
to learn the first coordinate exactly, and then continue in this way to
learn all coordinates.

Our algorithm 𝑆 gets as input the pair (𝐴, 𝑦) where 𝑦 = 𝐴𝑥 + 𝑒 and


we need to decide whether 𝑥1 = 𝑎. Now consider the instance (𝐴 +
(𝑟‖0𝑚 ‖ ⋯ ‖0𝑚 ), 𝑦+𝑎𝑟), where 𝑟 is a random vector in ℤ𝑚 𝑞 and the matrix
(𝑟‖0𝑚 ‖ ⋯ ‖0𝑚 ) is simply the matrix with first column equal to 𝑟 and all
other columns equal to 0. If 𝐴 is random then 𝐴 + (𝑟‖0𝑚 ‖ ⋯ ‖0𝑚 ) is
random as well. Now note that 𝐴𝑥 + (𝑟‖0𝑚 ⋯ ‖0𝑚 )𝑥 = 𝐴𝑥 + 𝑥1 𝑟
and hence if 𝑥1 = 𝑎 then we still have an input of the same form
(𝐴′ , 𝐴′ 𝑥 + 𝑒).
In contrast, we claim that if 𝑥1 ≠ 𝑎 then the distribution (𝐴′, 𝑦′)
where 𝐴′ = 𝐴 + (𝑟‖0𝑚 ‖ ⋯ ‖0𝑚 ) and 𝑦′ = 𝐴𝑥 + 𝑒 + 𝑎𝑟 is identical to
the uniform distribution over a random uniformly chosen matrix 𝐴′
and a random and independent uniformly chosen vector 𝑦′ . Indeed,
we can write this distribution as (𝐴′ , 𝑦′ ) where 𝐴′ is chosen uniformly
at random, and 𝑦′ = 𝐴′ 𝑥 + 𝑒 + (𝑎 − 𝑥1 )𝑟 where 𝑟 is a random and in-
dependent vector. (Can you see why?) Since 𝑎 − 𝑥1 ≠ 0, this amounts
to adding a random and independent vector 𝑟 to 𝑦′ , which means that
the distribution (𝐴′ , 𝑦′ ) is uniform and independent.
Hence if we send the input (𝐴′, 𝑦′) to the decision algorithm 𝐷,
then we would get 1 with probability 𝑝 + 𝜖 if 𝑥1 = 𝑎 and an output of 1
with probability 𝑝 otherwise.
Now the crucial observation is that if our decision algorithm 𝐷
requires 𝑚 equations to succeed with bias 𝜖, we can use 100𝑚𝑛/𝜖2
equations (which is still polynomial) to invoke it 100𝑛/𝜖2 times. This
allows us to distinguish with probability 1 − 2−𝑛 between the case
that 𝐷 outputs 1 with probability 𝑝 + 𝜖 and the case that it outputs 1
with probability 𝑝 (this follows from the Chernoff bound; can you see
why?). Hence by using polynomially more samples than the decision
algorithm 𝐷, we get a search algorithm 𝑆 that can actually recover 𝑥.
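The heart of the proof is the re-randomization of the first column. Here is a Python sketch of that single transformation (a hypothetical helper of ours; in the full reduction, each invocation of 𝐷 would be fed a fresh block of 𝑚 equations, and the outputs would be aggregated as in the Chernoff argument above):

```python
import random

def shift_instance(A, y, a_guess, q):
    """One invocation of the reduction: add a random r to the first column
    of A and a_guess * r to y. If x_1 == a_guess the result is again an LWE
    instance (A', A'x + e); otherwise it is uniformly random."""
    r = [random.randrange(q) for _ in range(len(A))]
    A2 = [[(row[0] + ri) % q] + row[1:] for row, ri in zip(A, r)]
    y2 = [(yi + a_guess * ri) % q for yi, ri in zip(y, r)]
    return A2, y2
```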

11.4 AN LWE BASED ENCRYPTION SCHEME


We can now show the secure variant of our original encryption
scheme:

LWE-based encryption LWEENC:

• Parameters: Let 𝛿(𝑛) = 1/𝑛4 and let 𝑞 = 𝑝𝑜𝑙𝑦(𝑛)


be a prime such that LWE holds w.r.t. 𝑞, 𝛿. We let
𝑚 = 𝑛2 log 𝑞.
• Key generation: Pick 𝑥 ∈ ℤ𝑛𝑞 . The private key is 𝑥
and the public key is (𝐴, 𝑦) with 𝑦 = 𝐴𝑥 + 𝑒 with
𝑒 a 𝛿-noise vector and 𝐴 a random 𝑚 × 𝑛 matrix.
• Encrypt: To encrypt 𝑏 ∈ {0, 1} given the
key (𝐴, 𝑦), pick 𝑤 ∈ {0, 1}𝑚 and output

𝑤⊤ 𝐴, ⟨𝑤, 𝑦⟩ + 𝑏⌊𝑞/2⌋ (all computations are


done in ℤ𝑞 ).
• Decrypt: To decrypt (𝑎, 𝜎), output 0 iff
|⟨𝑎, 𝑥⟩ − 𝜎| < 𝑞/10.

P
The scheme LWEENC is also described in Fig. 11.2
with slightly different notation. I highly recommend
you stop and verify you understand why the two
descriptions are equivalent.

Figure 11.2: In the encryption scheme LWEENC, the public key is a matrix 𝐴′ = (𝐴|𝑦), where 𝑦 = 𝐴𝑠 + 𝑒 and 𝑠 is the secret key. To encrypt a bit 𝑏 we choose a random 𝑤 ←𝑅 {0, 1}𝑚, and output 𝑤⊤𝐴′ + (0, … , 0, 𝑏⌊𝑞/2⌋). We decrypt 𝑐 ∈ ℤ𝑛+1𝑞 to zero with key 𝑠 iff |⟨𝑐, (𝑠, −1)⟩| ≤ 𝑞/10 where the inner product is done modulo 𝑞.

Unlike our typical schemes, here it is not immediately clear that this
encryption is valid, in the sense that decrypting an encryption of 𝑏
returns the value 𝑏. But this is the case:
Lemma 11.3 With high probability, the decryption of the encryption of 𝑏
equals 𝑏.

Proof. ⟨𝑤⊤𝐴, 𝑥⟩ = ⟨𝑤, 𝐴𝑥⟩. Hence, if 𝑦 = 𝐴𝑥 + 𝑒 then ⟨𝑤, 𝑦⟩ =
⟨𝑤⊤𝐴, 𝑥⟩ + ⟨𝑤, 𝑒⟩. But since every coordinate of 𝑤 is either 0 or 1,
|⟨𝑤, 𝑒⟩| < 𝛿𝑚𝑞 < 𝑞/10 for our choice of parameters.⁷ So, we get that
if 𝑎 = 𝑤⊤𝐴 and 𝜎 = ⟨𝑤, 𝑦⟩ + 𝑏⌊𝑞/2⌋ then 𝜎 − ⟨𝑎, 𝑥⟩ = ⟨𝑤, 𝑒⟩ + 𝑏⌊𝑞/2⌋,
which will be smaller than 𝑞/10 iff 𝑏 = 0.

⁷ In fact, due to the fact that the signs of the error vector's entries are different, we expect the errors to have significant cancellations and hence we would expect |⟨𝑤, 𝑒⟩| to only be roughly of magnitude √𝑚 𝛿𝑞, but this is not crucial for our discussions.
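Here is a toy Python version of LWEENC that checks Lemma 11.3 empirically. The parameters below are tiny illustrative stand-ins (in particular the noise bound plays the role of 𝛿𝑞) and are nowhere near secure:

```python
import random

q, n, m = 4099, 10, 200   # toy parameters: q prime, m * bound << q/10
bound = 1                 # each |e_i| <= bound, standing in for delta*q

def keygen():
    A = [[random.randrange(q) for _ in range(n)] for _ in range(m)]
    x = [random.randrange(q) for _ in range(n)]
    e = [random.randint(-bound, bound) for _ in range(m)]
    y = [(sum(aij * xj for aij, xj in zip(row, x)) + ei) % q
         for row, ei in zip(A, e)]                       # y = Ax + e
    return x, (A, y)

def encrypt(pub, b):
    A, y = pub
    w = [random.randrange(2) for _ in range(m)]
    a = [sum(w[i] * A[i][j] for i in range(m)) % q for j in range(n)]  # w^T A
    sigma = (sum(wi * yi for wi, yi in zip(w, y)) + b * (q // 2)) % q
    return a, sigma

def decrypt(x, ct):
    a, sigma = ct
    d = (sigma - sum(ai * xi for ai, xi in zip(a, x))) % q
    d = min(d, q - d)        # |.| in the sense of footnote 4
    return 0 if d < q / 10 else 1

x, pub = keygen()
assert all(decrypt(x, encrypt(pub, b)) == b for b in (0, 1) for _ in range(50))
```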

We now prove security of the LWE based encryption:

Theorem 11.4 — CPA security of LWEENC. If the LWE conjecture is true
then LWEENC is CPA secure.

For a public key encryption scheme with messages that are just bits,
CPA security means that an encryption of 0 is indistinguishable from

an encryption of 1, even given the public key. Thus Theorem 11.4 will
follow from the following lemma:
Lemma 11.5 Let 𝑞, 𝑚, 𝛿 be set as in LWEENC. Then, assuming the LWE
conjecture, the following distributions are computationally indistin-
guishable:

• 𝐷: The distribution over four-tuples of the form (𝐴, 𝑦, 𝑤⊤𝐴, ⟨𝑤, 𝑦⟩)
where 𝐴 is uniform in ℤ𝑚×𝑛𝑞, 𝑥 is uniform in ℤ𝑛𝑞, 𝑒 ∈ ℤ𝑚𝑞 is chosen
with 𝑒𝑖 ∈ {−𝛿𝑞, … , +𝛿𝑞}, 𝑦 = 𝐴𝑥 + 𝑒, and 𝑤 is uniform in {0, 1}𝑚.

• 𝐷̄: The distribution over four-tuples (𝐴, 𝑦′, 𝑎, 𝜎) where all entries
are uniform: 𝐴 is uniform in ℤ𝑚×𝑛𝑞, 𝑦′ is uniform in ℤ𝑚𝑞, 𝑎 is uni-
form in ℤ𝑛𝑞 and 𝜎 is uniform in ℤ𝑞.

P
You should stop here and verify that (i) You under-
stand the statement of Lemma 11.5 and (ii) you un-
derstand why this lemma implies Theorem 11.4. The
idea is that Lemma 11.5 shows that the concatenation
of the public key and encryption of 0 is indistinguish-
able from something that is completely random. You
can then use it to show that the concatenation of the
public key and encryption of 1 is indistinguishable
from the same thing, and then finish using the hybrid
argument.

We now prove Lemma 11.5, which will complete the proof of Theo-
rem 11.4.

Proof of Lemma 11.5. Define 𝐷 to be the distribution (𝐴, 𝑦, 𝑤⊤ 𝐴, ⟨𝑤, 𝑦⟩)


as in the lemma’s statement (i.e., 𝑦 = 𝐴𝑥 + 𝑒 for some 𝑥, 𝑒 chosen as
above). Define 𝐷′ to be the distribution (𝐴, 𝑦′ , 𝑤⊤ 𝐴, ⟨𝑤, 𝑦′ ⟩) where 𝑦′
is chosen uniformly in ℤ𝑚 𝑞 .
We claim that 𝐷 is computationally indistinguishable from 𝐷′

under the LWE conjecture. Indeed by Theorem 11.2 (search to deci-


sion reduction) this conjecture implies that the distribution 𝑋 over
pairs (𝐴, 𝑦) with 𝑦 = 𝐴𝑥 + 𝑒 is indistinguishable from the distri-
bution 𝑋 ′ over pairs (𝐴, 𝑦′ ) where 𝑦′ is uniform. But if there was
some polynomial-time algorithm 𝑇 distinguishing 𝐷 from 𝐷′ then
we can design a randomized polynomial-time algorithm 𝑇 ′ distin-
guishing 𝑋 from 𝑋 ′ with the same advantage by setting 𝑇 ′ (𝐴, 𝑦) =
𝑇 (𝐴, 𝑦, 𝑤⊤ 𝐴, ⟨𝑤, 𝑦⟩) for random 𝑤 ←𝑅 {0, 1}𝑚 .
We will finish the proof by showing that the distribution 𝐷′ is
statistically indistinguishable (i.e., has negligible total variation distance)
from 𝐷̄. This follows from the following claim:

CLAIM: Suppose that 𝑚 > 100𝑛 log 𝑞. If 𝐴′ is a random 𝑚 × (𝑛 + 1)


matrix over ℤ𝑞 , then with probability at least 1 − 2−𝑛 over the choice
of 𝐴′ , the distribution 𝑍𝐴′ over ℤ𝑛+1 𝑞 which is obtained by choosing 𝑤
at random in {0, 1}𝑚 and outputting 𝑤⊤ 𝐴′ has at most 2−𝑛 statistical
distance from the uniform distribution over ℤ𝑛+1 𝑞 .
Note that the randomness used for the distribution 𝑍𝐴′ is only
obtained by the choice of 𝑤, and not by the choice of 𝐴′ that is fixed.
(This passes a basic “sanity check” since 𝑤 has 𝑚 random bits, while
the uniform distribution over ℤ𝑛𝑞 requires 𝑛 log 𝑞 ≪ 𝑚 random
bits, and hence 𝑍𝐴′ at least has a “fighting chance” in being statisti-
cally close to it.) Another way to state the same claim is that the pair
(𝐴′ , 𝑤⊤ 𝐴′ ) is statistically indistinguishable from the uniform distribu-
tion (𝐴′ , 𝑧) where 𝑧 is a vector chosen independently at random from
ℤ𝑛+1
𝑞 .
The claim completes the proof of the lemma, since letting 𝐴′ be the
matrix (𝐴|𝑦′) and 𝑧 = (𝑎, 𝜎), we see that the distribution 𝐷′ has the
form (𝐴′, 𝑧) where 𝐴′ is a uniformly random 𝑚 × (𝑛 + 1) matrix and
𝑧 is sampled from 𝑍𝐴′ (i.e., 𝑧 = 𝑤⊤𝐴′ where 𝑤 is uniformly chosen
in {0, 1}𝑚). Hence this means that the statistical distance of 𝐷′ from
𝐷̄ (where all elements are uniform) is 𝑂(2−𝑛). (Please make sure you
understand this reasoning!)
Proof of claim: The proof of this claim relies on the leftover hash
lemma.
First, the basic idea of the proof: For every 𝑚 × (𝑛 + 1) matrix 𝐴′
over ℤ𝑞, define ℎ𝐴′ ∶ ℤ𝑚𝑞 → ℤ𝑛+1𝑞 to be the map ℎ𝐴′(𝑤) = 𝑤⊤𝐴′.
This collection can be shown to be a “good” hash function collection
in some specific technical sense, which in particular implies that for
every distribution 𝐷 with much more than 𝑛 log 𝑞 bits of min-entropy,
with all but negligible probability over the choice of 𝐴′ , ℎ𝐴′ (𝐷) is sta-
tistically indistinguishable from the uniform distribution. Now when
we choose 𝑤 at random in {0, 1}𝑚 , it is coming from a distribution
with 𝑚 bits of entropy. If 𝑚 ≫ (𝑛 + 1) log 𝑞, then because the output of
this function is so much smaller than 𝑚, we expect it to be completely
uniform, and this is what’s shown by the leftover hash lemma.
Now we’ll formalize this blueprint. First we need the leftover hash
lemma.
Lemma 11.6 Fix 𝜖 > 0. Let ℋ be a universal hash family with functions
ℎ ∶ 𝒲 → 𝒱. Let 𝑊 be a random variable with output in 𝒲 with
𝐻∞ (𝑊 ) ≥ log |𝒱| + 2 log(1/𝜖) − 2. Then (𝐻(𝑊 ), 𝐻) where 𝐻 follows a
uniform distribution over ℋ has statistical difference less than 𝜖 from
(𝑉 , 𝐻) where 𝑉 is uniform over 𝒱.

To explain what a universal hash family is: a family ℋ of functions
ℎ ∶ 𝒲 → 𝒱 is a universal hash family if Prℎ←𝑅ℋ[ℎ(𝑥) = ℎ(𝑥′)] ≤ 1/|𝒱| for
all 𝑥 ≠ 𝑥′.

First, let's see why Lemma 11.6 implies the claim. Consider the
hash family ℋ = {ℎ𝐴′}, where ℎ𝐴′ ∶ ℤ𝑚𝑞 → ℤ𝑛+1𝑞 is defined by
ℎ𝐴′(𝑤) = 𝑤⊤𝐴′. For this hash family, the probability over 𝐴′ of 𝑤 ≠ 𝑤′
colliding is Pr𝐴′[𝑤⊤𝐴′ = 𝑤′⊤𝐴′] = Pr𝐴′[(𝑤 − 𝑤′)⊤𝐴′ = 0]. Since 𝐴′ is
random, this is 1/𝑞𝑛+1. So ℋ is a universal hash family.
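As a quick sanity check of universality, the following Python snippet (ours, with tiny illustrative parameters) estimates the collision probability of ℎ𝐴′ for a fixed pair 𝑤 ≠ 𝑤′ over a random 𝐴′, and compares it to 1/𝑞𝑛+1:

```python
import random

q, n, m, trials = 5, 2, 6, 100000

def hA(A, w):
    # h_{A'}(w) = w^T A' over Z_q, with A' an m x (n+1) matrix
    return tuple(sum(w[i] * A[i][j] for i in range(m)) % q
                 for j in range(n + 1))

w1 = [1, 0, 1, 1, 0, 0]          # any fixed pair w1 != w2 in {0,1}^m
w2 = [0, 1, 1, 0, 1, 0]
coll = 0
for _ in range(trials):
    A = [[random.randrange(q) for _ in range(n + 1)] for _ in range(m)]
    coll += hA(A, w1) == hA(A, w2)
print(coll / trials, "vs", q ** -(n + 1))   # both should be about 1/125
```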
The min entropy of 𝑤 ←𝑅 {0, 1}𝑚 is the same as the entropy (be-
cause it is uniform), which is 𝑚. The output of the hash family is in
ℤ𝑛+1𝑞, and log |ℤ𝑛+1𝑞| = (𝑛 + 1) log 𝑞. Since 𝑚 ≥ (𝑛 + 1) log 𝑞 + 20𝑛 − 2
by assumption, Lemma 11.6 implies that (𝑤⊤𝐴′, 𝐴′) is 2−10𝑛-close in
terms of statistical distance to (𝑧, 𝐴′) where 𝑧 is chosen uniformly in
ℤ𝑛+1𝑞.
Now, we’ll show this implies that with probability ≥ 1 − 2−𝑛 over the
selection of 𝐴′ , the statistical distance between 𝑤⊤ 𝐴′ and 𝑧 is less than
2−𝑛 . If not, the distance between (𝑤⊤ 𝐴′ , 𝐴′ ) and (𝑧, 𝐴′ ) would be at
least 2−𝑛 ⋅ 2−𝑛 > 2−10𝑛.

Proof of Lemma 11.6:⁸

⁸ This is based on notes from Daniel Wichs's class.
Let 𝑍 be the random variable (𝐻(𝑊 ), 𝐻), where the probability is
over 𝐻 and 𝑊 . Let 𝑍 ′ be an independent copy of 𝑍.
Step 1: Pr[𝑍 = 𝑍′] ≤ (1/(|ℋ| ⋅ |𝒱|))(1 + 4𝜖²). Indeed,

Pr[𝑍 = 𝑍′] = Pr[(𝐻(𝑊), 𝐻) = (𝐻′(𝑊′), 𝐻′)]
           = Pr[𝐻 = 𝐻′] ⋅ Pr[𝐻(𝑊) = 𝐻(𝑊′)]
           = (1/|ℋ|) (Pr[𝑊 = 𝑊′] + Pr[𝐻(𝑊) = 𝐻(𝑊′) ∧ 𝑊 ≠ 𝑊′])
           ≤ (1/|ℋ|) ((4𝜖²/|𝒱|) + (1/|𝒱|))
           = (1/(|ℋ| ⋅ |𝒱|))(1 + 4𝜖²).

Step 2: The statistical difference between (𝐻(𝑊), 𝐻) and (𝑉, 𝐻) is
less than 𝜖. Denote the statistical difference by Δ((𝐻(𝑊), 𝐻), (𝑉, 𝐻));
then

Δ((𝐻(𝑊), 𝐻), (𝑉, 𝐻)) = (1/2) ∑ℎ,𝑤 |Pr[𝑍 = (ℎ(𝑤), ℎ)] − 1/(|ℋ| ⋅ |𝒱|)|.

Define 𝑥ℎ,𝑤 = Pr[𝑍 = (ℎ(𝑤), ℎ)] − 1/(|ℋ| ⋅ |𝒱|) and 𝑠ℎ,𝑤 = sign(𝑥ℎ,𝑤). Write
𝑥 for the vector of all the 𝑥ℎ,𝑤 and 𝑠 for the vector of all the 𝑠ℎ,𝑤. Then

Δ((𝐻(𝑊), 𝐻), (𝑉, 𝐻)) = (1/2) ⟨𝑥, 𝑠⟩
                      ≤ (1/2) ‖𝑥‖2 ⋅ ‖𝑠‖2        (Cauchy-Schwarz)
                      = (√(|ℋ| ⋅ |𝒱|)/2) ‖𝑥‖2.

Let's expand ‖𝑥‖2²:

‖𝑥‖2² = ∑ℎ,𝑤 (Pr[𝑍 = (ℎ(𝑤), ℎ)] − 1/(|ℋ| ⋅ |𝒱|))²
      = ∑ℎ,𝑤 (Pr[𝑍 = (ℎ(𝑤), ℎ)]² − 2 Pr[𝑍 = (ℎ(𝑤), ℎ)]/(|ℋ| ⋅ |𝒱|) + 1/(|ℋ| ⋅ |𝒱|)²)
      ≤ (1 + 4𝜖²)/(|ℋ| ⋅ |𝒱|) − 2/(|ℋ| ⋅ |𝒱|) + (|ℋ| ⋅ |𝒱|)/(|ℋ| ⋅ |𝒱|)²
      = 4𝜖²/(|ℋ| ⋅ |𝒱|).

When we plug this in to our expression for the statistical distance,
we get

Δ((𝐻(𝑊), 𝐻), (𝑉, 𝐻)) ≤ (√(|ℋ| ⋅ |𝒱|)/2) ‖𝑥‖2 ≤ 𝜖.
This completes the proof of Lemma 11.6 and hence the theorem.

P
The proof of Theorem 11.4 is quite subtle and requires
some re-reading and thought. To read more about
this, you can look at the survey of Oded Regev, “On
the Learning with Error Problem” Sections 3 and 4.

11.5 BUT WHAT ARE LATTICES?


You can think of a lattice as a discrete version of a subspace. A lattice
𝐿 is simply a discrete subset of ℝ𝑛 such that if 𝑢, 𝑣 ∈ 𝐿 and 𝑎, 𝑏 are
integers then 𝑎𝑢 + 𝑏𝑣 ∈ 𝐿.⁹ A lattice is given by a basis, which is simply
a matrix 𝐵 such that every vector 𝑢 ∈ 𝐿 is obtained as 𝑢 = 𝐵𝑥 for
some vector of integers 𝑥. It can be shown that we can assume without
loss of generality that 𝐵 is full dimensional and hence it's an 𝑛 by 𝑛
invertible matrix. Note that given a basis 𝐵 we can generate vectors
in 𝐿, as well as test whether a vector 𝑣 is in 𝐿 by testing if 𝐵−1𝑣 is an
integer vector. There can be many different bases for the same lattice,
and some of them are easier to work with than others (see Fig. 11.3).

⁹ By discrete we mean that points in 𝐿 are isolated. One formal way to define it is that there is some 𝜖 > 0 such that every distinct 𝑢, 𝑣 ∈ 𝐿 are of distance at least 𝜖 from one another.

Figure 11.3: A lattice is a discrete subspace 𝐿 ⊆ ℝ𝑛 that is closed under integer combinations. A basis for the lattice is a minimal set 𝑏1, … , 𝑏𝑚 (typically 𝑚 = 𝑛) such that every 𝑢 ∈ 𝐿 is an integer combination of 𝑏1, … , 𝑏𝑚. The same lattice can have different bases. In this figure the lattice is a set of points in ℝ2, and the black vectors 𝑣1, 𝑣2 and the red vectors 𝑢1, 𝑢2 are two alternative bases for it. Generally we consider the basis 𝑢1, 𝑢2 “better” since the vectors are shorter and it is less “skewed”.

Some classical computational questions on lattices are:

• Shortest vector problem: Given a basis 𝐵 for 𝐿, find the nonzero vec-
tor 𝑣 with smallest norm in 𝐿.

• Closest vector problem: Given a basis 𝐵 for 𝐿 and a vector 𝑢 that is not
in 𝐿, find the closest vector to 𝑢 in 𝐿.

• Bounded distance decoding: Given a basis 𝐵 for 𝐿 and a vector 𝑢 of


the form 𝑢 = 𝑣 + 𝑒 where 𝑣 is in 𝐿, and 𝑒 is a particularly short “er-
ror” vector (so in particular no other vector in the lattice is within
distance ‖𝑒‖ to 𝑢), recover 𝑣. Note that this is a special case of the
closest vector problem.

In particular, if 𝑉 is a linear subspace of ℤ𝑛𝑞 , we can think of it also


as a lattice 𝑉̂ of ℝ𝑛 where we simply say that a vector 𝑢̂ is in 𝑉̂ if
all of 𝑢’s
̂ coordinates are integers and if we let 𝑢𝑖 = 𝑢̂𝑖 (mod 𝑞) then
𝑢 ∈ 𝑉 . The learning with error task of recovering 𝑥 from 𝐴𝑥 + 𝑒 can
then be thought of as an instance of the bounded distance decoding
problem for 𝑉 ̂ .
A natural algorithm to try to solve the closest vector and bounded
distance decoding problems is to take the vector 𝑢, express it in the
basis 𝐵 by computing 𝑤 = 𝐵−1𝑢, and then round all the coordinates
of 𝑤 to obtain an integer vector 𝑤̃ and let 𝑣 = 𝐵𝑤̃ be a vector in the
lattice. If we have an extremely good basis 𝐵 for the lattice then 𝑣 may
indeed be the closest vector in the lattice, but in other more “skewed”
bases it can be extremely far from it.
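Here is a sketch of this rounding heuristic in Python (ours, for illustration; exact rational arithmetic via fractions, with a small helper that solves 𝐵𝑤 = 𝑢 and assumes 𝐵 is invertible):

```python
from fractions import Fraction

def solve_exact(B, u):
    # Gaussian elimination over the rationals; B is assumed invertible.
    n = len(B)
    M = [[Fraction(B[i][j]) for j in range(n)] + [Fraction(u[i])]
         for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def babai_round(B, u):
    """Express u in the basis B (columns of B are the basis vectors),
    round the coordinates to integers, and map back into the lattice."""
    n = len(B)
    w = solve_exact(B, u)                  # w = B^{-1} u
    w_int = [round(wi) for wi in w]        # the rounded vector w~
    return [sum(B[i][j] * w_int[j] for j in range(n)) for i in range(n)]
```

With a “good” (short, nearly orthogonal) basis this returns the closest lattice vector for small errors; with a “skewed” basis of the same lattice it typically does not.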

11.6 RING BASED LATTICES


One of the biggest issues with lattice based cryptosystems is the key
size. In particular, the scheme above uses an 𝑚 × 𝑛 matrix where each
entry takes log 𝑞 bits to describe. (It also encrypts a single bit using
a whole vector, but more efficient “multi-bit” variants are known.)
Schemes using ideal lattices are an attempt to get more practical vari-
ants. These have very similar structure except that the matrix 𝐴 cho-
sen is not completely random but rather can be described by a single
vector. One common variant is the following: we fix some polynomial
𝑝 over ℤ𝑞 with degree 𝑛 and then treat vectors in ℤ𝑛𝑞 as the coefficients
of degree 𝑛 − 1 polynomials, and always work modulo this polynomial
𝑝(). (By this I mean that for every polynomial 𝑡 of degree at least 𝑛
we write 𝑡 as 𝑝𝑠 + 𝑟 where 𝑝 is the polynomial above, 𝑠 is some poly-
nomial and 𝑟 is the “remainder” polynomial of degree < 𝑛; then 𝑡
(mod 𝑝) = 𝑟.) Now for every fixed polynomial 𝑡, the operation 𝐴𝑡
which is defined as 𝑠 ↦ 𝑡𝑠 (mod 𝑝) is a linear operation mapping
polynomials of degree at most 𝑛 − 1 to polynomials of degree at most
𝑛 − 1, or put another way, it is a linear map over ℤ𝑛𝑞. However, the
map 𝐴𝑡 can be described using the 𝑛 coefficients of 𝑡, as opposed to
the 𝑛² entries needed to describe a matrix. It also turns out that by using the Fast
Fourier Transform we can evaluate this operation in roughly 𝑛 log 𝑛 steps
as opposed to 𝑛². The ideal lattice based cryptosystems use matrices of
this form to save on key size and computation time. It is still unclear if
this structure can be used for attacks; recent papers attacking principal
ideal lattices have shown that one needs to be careful about this.
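A schoolbook sketch of the map 𝐴𝑡 (multiplication by 𝑡 modulo 𝑞 and a monic degree-𝑛 polynomial 𝑝) might look as follows; this is our own 𝑂(𝑛²) illustration, whereas an actual implementation would use FFT-based multiplication:

```python
def ring_mul(t, s, q, p):
    """Multiply two elements t, s of Z_q[X]/(p(X)), given as coefficient
    lists of length n = deg p; p is given by its n+1 coefficients (monic,
    constant term first). Returns the coefficient list of t*s mod (q, p)."""
    n = len(p) - 1
    prod = [0] * (2 * n - 1)
    for i, ti in enumerate(t):              # schoolbook polynomial product
        for j, sj in enumerate(s):
            prod[i + j] = (prod[i + j] + ti * sj) % q
    for d in range(2 * n - 2, n - 1, -1):   # reduce mod p, top degree first
        c = prod[d]
        if c:
            for k in range(n + 1):
                prod[d - n + k] = (prod[d - n + k] - c * p[k]) % q
    return prod[:n]

# e.g. with p(X) = X^2 + 1 over Z_7:  X * X = -1, so
assert ring_mul([0, 1], [0, 1], 7, [1, 0, 1]) == [6, 0]
```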
One ideal-lattice based system is the “New Hope” cryptosystem
(see also paper) that has been experimented with by Google. People
have also made highly optimized general (non ideal) lattice based
constructions, see in particular the “Frodo” system (paper here, can
you guess what’s behind the name?). Both New Hope and Frodo have
been submitted to the NIST competition to select a “post quantum”
public key encryption standard.
12 Establishing secure connections over insecure channels

We've now compiled all the tools that are needed for the basic goal
of cryptography (which is still being subverted quite often): allow-
ing Alice and Bob to exchange messages assuring their integrity and
confidentiality over a channel that is observed or controlled by an
adversary. Our tools for achieving this goal are:

• Public key (aka asymmetric) encryption schemes.

• Public key (aka asymmetric) digital signatures schemes.

• Private key (aka symmetric) encryption schemes - block ciphers


and stream ciphers.

• Private key (aka symmetric) message authentication codes and


pseudorandom functions.

• Hash functions that are used both as ways to compress messages


for authentication as well as key derivation and other tasks.

The notions of security we require from these building blocks can


vary as well. For encryption schemes we talk about CPA (chosen
plaintext attack) and CCA (chosen ciphertext attacks), for hash func-
tions we talk about collision-resistance, being used (combined with
keys) as pseudorandom functions, and then sometimes we simply
model those as random oracles. Also, all of those tools require access
to a source of randomness, and here we use hash functions as well for
entropy extraction.

12.1 CRYPTOGRAPHY’S OBSESSION WITH ADJECTIVES.


As we learn more and more cryptography we see more and more
adjectives, every notion seems to have modifiers such as “non-
malleable”, “leakage-resilient”, “identity based”, “concurrently
secure”, “adaptive”, “non-interactive”, etc. etc. Indeed, this motivated
a parody web page of an automatic crypto paper title generator.




Unlike algorithms, where typically there are straightforward quanti-


tative tradeoffs (e.g., faster is better), in cryptography there are many
qualitative ways protocols can vary based on the assumptions they
operate under and the notions of security they provide.
In particular, the following issues arise when considering the task
of securely transmitting information between two parties Alice and
Bob:

• Infrastructure/setup assumptions: What kind of setup can Alice


and Bob rely upon? For example in the TLS protocol, typically Alice
is a website and Bob is a user; using the infrastructure of certificate
authorities, Bob has a trusted way to obtain Alice’s public signature
key, while Alice doesn’t know anything about Bob. But there are
many other variants as well. Alice and Bob could share a (low en-
tropy) password. One of them might have some hardware token, or
they might have a secure out of band channel (e.g., text messages)
to transmit a short amount of information. There are even variants
where the parties authenticate by something they know, with one
recent example being the notion of witness encryption (Garg, Gen-
try, Sahai, and Waters) where one can encrypt information in a
“digital time capsule” to be opened by anyone who, for example,
finds a proof of the Riemann hypothesis.

• Adversary access: What kind of attacks do we need to protect


against? The simplest setting is a passive eavesdropping adversary
(often called “Eve”) but we sometimes consider active person-
in-the-middle attacks (the attacker sometimes called “Mallory”). We sometimes
consider notions of graceful recovery. For example, if the adversary
manages to hack into one of the parties then it can clearly read their
communications from that time onwards, but we would want their
past communication to be protected (a notion known as forward
secrecy). If we rely on trusted infrastructure such as certificate au-
thorities, we could ask what happens if the adversary breaks into
those. Sometimes we rely on the security of several entities or se-
crets, and we want to consider adversaries that control some but not
all of them, a notion known as threshold cryptography. While we typ-
ically assume that information is either fully secret or fully public,
we sometimes want to model side channel attacks where the adver-
sary can learn partial information about the secret, this is known as
leakage-resistant cryptography.

• Interaction: Do Alice and Bob get to interact and relay several


messages back and forth or is it a “one shot” protocol? You may
think that this is merely a question about efficiency but it turns
out to be crucial for some applications. Sometimes Alice and Bob

might not be two parties separated in space but the same party
separated in time. That is, Alice wishes to send a message to her
future self by storing an encrypted and authenticated version of it
on some media. In this case, absent a time machine, back and forth
interaction between the two parties is obviously impossible.

• Security goal: The security goals of a protocol are usually stated in


the negative: what does it mean for an adversary to win the secu-
rity game. We typically want the adversary to learn absolutely no
information about the secret beyond what she obviously can. For
example, if we use a shared password chosen out of 𝑡 possibilities,
then we might need to allow the adversary 1/𝑡 success probability,
but we wouldn’t want her to get anything beyond 1/𝑡 + 𝑛𝑒𝑔𝑙(𝑛). In
some settings, the adversary can obviously completely disconnect
the communication channel between Alice and Bob, but we want
her to be essentially limited to either dropping communication
completely or letting it go by unmolested, and not have the ability
to modify communication without detection. Then in some set-
tings, such as in the case of steganography and anonymous routing,
we would want the adversary not to find out even the fact that a
conversation had taken place.

12.2 BASIC KEY EXCHANGE PROTOCOL


The basic primitive for secure communication is a key exchange pro-
tocol, whose goal is to have Alice and Bob share a common random
secret key 𝑘 ∈ {0, 1}𝑛 . Once this is done, they can use a CCA secure /
authenticated private-key encryption to communicate with confiden-
tiality and integrity.
The canonical example of a basic key exchange protocol is the Diffie
Hellman protocol. It uses as public parameters a group 𝔾 with genera-
tor 𝑔, and then follows the following steps:

1. Alice picks random 𝑎 ←𝑅 {0, … , |𝔾| − 1} and sends 𝐴 = 𝑔𝑎 .

2. Bob picks random 𝑏 ←𝑅 {0, … , |𝔾| − 1} and sends 𝐵 = 𝑔𝑏 .

3. They both set their key as 𝑘 = 𝐻(𝑔𝑎𝑏 ) (which Alice computes as 𝐵𝑎


and Bob computes as 𝐴𝑏 ), where 𝐻 is some hash function.
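A few lines of Python capture the protocol above; the group below (integers modulo a Mersenne prime, with SHA-256 standing in for 𝐻) is a toy stand-in for illustration, not a recommended parameter choice:

```python
import hashlib, random

p = 2 ** 127 - 1   # toy prime modulus; real deployments use standardized
g = 3              # 2048+ bit groups or elliptic curves

def H(z: int) -> bytes:
    return hashlib.sha256(str(z).encode()).digest()

a = random.randrange(p - 1)       # Alice's secret exponent
b = random.randrange(p - 1)       # Bob's secret exponent
A = pow(g, a, p)                  # Alice sends A = g^a
B = pow(g, b, p)                  # Bob sends B = g^b
k_alice = H(pow(B, a, p))         # both compute k = H(g^{ab})
k_bob   = H(pow(A, b, p))
assert k_alice == k_bob
```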

Another variant is using an arbitrary public key encryption scheme


such as RSA:

1. Alice generates keys (𝑑, 𝑒) and sends 𝑒 to Bob.

2. Bob picks random 𝑘 ←𝑅 {0, 1}𝑚 and sends 𝐸𝑒 (𝑘) to Alice.

3. They both set their key to 𝑘 (which Alice computes by decrypting


Bob’s ciphertext)

Under plausible assumptions, it can be shown that these protocols


are secure against a passive eavesdropping adversary Eve. The notion
of security here means that, similar to encryption, if after observing
the transcript Eve receives with probability 1/2 the value of 𝑘 and with
probability 1/2 a random string 𝑘′ ← {0, 1}𝑛 , then her probability
of guessing which is the case would be at most 1/2 + 𝑛𝑒𝑔𝑙(𝑛) (where
𝑛 can be thought of as log |𝔾| or some other parameter related to the
length of bit representation of members in the group).

12.3 AUTHENTICATED KEY EXCHANGE


The main issue with this key exchange protocol is of course that ad-
versaries often are not passive. In particular, an active Eve could agree
on her own key with Alice and Bob separately and then be able to see
and modify all future communication. She might also be able to create
weird (with some potential security implications) correlations by, say,
modifying the message 𝐴 to be 𝐴2 etc..
For this reason, in actual applications we typically use authenticated
key exchange. The notion of authentication used depends on what
we can assume on the setup assumptions. A standard assumption
is that Alice has some public keys but Bob doesn’t. The justification
for this assumption is that Alice might be a server, which has the
capabilities to generate a private/public key pair, disseminate the
public key (e.g., using a certificate authority) and maintain the private
key in a secure storage. In contrast, if Bob is an individual user, then
he might not have access to a secure storage to maintain a private key
(since personal devices can often be hacked). Moreover, Alice might
not care about Bob’s identity. For example, if Alice is nytimes.com and
Bob is a reader, then Bob wants to know that the news he reads really
came from the New York Times, but Alice is equally happy to engage
in communication with any reader. In other cases, such as gmail.com,
after an initial secure connection is setup, Bob can authenticate himself
to Alice as a registered user (by sending his login information or
sending a “cookie” stored from a past interaction).
It is possible to obtain a secure channel under these assumptions,
but one needs to be careful. Indeed, the standard protocol for securing
the web: the transport Layer Security (TLS) protocol (and its prede-
cessor SSL) has gone through six revisions (including a name change
from SSL to TLS) largely because of security concerns. We now illus-
trate one of those attacks.

12.3.1 Bleichenbacher’s attack on RSA PKCS V1.5 and SSL V3.0


If you have a public key, a natural approach is to take the encryption-
based protocol and simply skip the first step since Bob already knows
the public key 𝑒 of Alice. This is basically what happened in the SSL

V3.0 protocol. However, as was shown by Bleichenbacher in 1998, it


turns out this is susceptible to the following attack:

• The adversary listens in on a conversation, and in particular ob-


serves 𝑐 = 𝐸𝑒(𝑘) where 𝑘 is the secret session key.

• The adversary then starts many connections with the server with
ciphertexts related to 𝑐, and observes whether they succeed or fail
(and in what way they fail, if they do). It turns out that based on
this information, the adversary would be able to recover the key 𝑘.

Specifically, the version of RSA (known as PKCS #1 V1.5) used in
the SSL V3.0 protocol requires the value 𝑥 to have a particular for-
mat, with the top two bytes having a certain form. If in the course of
the protocol, a server decrypts 𝑦 and gets a value 𝑥 not of this form
then it would send an error message and halt the connection. While
the designers of SSL V3.0 might not have thought of it that way, this
amounts to saying that an SSL V3.0 server supplies to any party an
oracle that on input 𝑦 outputs 1 iff 𝑦𝑑 (mod 𝑚) has this form, where
𝑑 = 𝑒−1 (mod |ℤ∗𝑚|) is the secret decryption key. It turned out that
one can use such an oracle to invert the RSA function. For a result of
a similar flavor, see the (1/2 page) proof of Theorem 11.31 (page 418)
in KL, where they show that an oracle that given 𝑦 outputs the least
significant bit of 𝑦𝑑 (mod 𝑚) allows one to invert the RSA function.¹

¹ The first attack of this flavor was given in the 1982 paper of Goldwasser, Micali, and Tong. Interestingly, this notion of “hardcore bits” has been used for both practical attacks against cryptosystems as well as theoretical (and sometimes practical) constructions of other cryptosystems.

For this reason, new versions of the SSL used a different variant
of RSA known as PKCS #1 V2.0 which satisfies (under assumptions)
chosen ciphertext security (CCA) and in particular such oracles cannot
be used to break the encryption. Nonetheless, there are still some im-
plementation issues that allowed adversaries to perform some attacks;
specifically Manger showed that depending on how PKCS #1 V2.0 is
implemented, it might be possible to still launch an attack. The main
reason is that the specification states several conditions under which
the decryption box is supposed to return “error”. The proof of CCA secu-
rity crucially relies on the attacker not being able to distinguish which
condition caused the error message. However, some implementations
could still leak this information, for example by checking these con-
ditions one by one, and so returning “error” quicker when the earlier
conditions hold. See discussion in Katz-Lindell (3rd ed) 12.5.4.

12.4 CHOSEN CIPHERTEXT ATTACK SECURITY FOR PUBLIC KEY


CRYPTOGRAPHY
The concept of chosen ciphertext attack security makes perfect sense
for public key encryption as well. It is defined in the same way as it was
in the private key setting:

Definition 12.1 — CCA secure public key encryption. A public key encryp-
tion scheme (𝐺, 𝐸, 𝐷) is chosen ciphertext attack (CCA) secure if
every efficient Mallory wins in the following game with probability
at most 1/2 + 𝑛𝑒𝑔𝑙(𝑛):

• The keys (𝑒, 𝑑) are generated via 𝐺(1𝑛 ), and Mallory gets the
public encryption key 𝑒 and 1𝑛 .

• For 𝑝𝑜𝑙𝑦(𝑛) rounds, Mallory gets access to the function 𝑐 ↦


𝐷𝑑 (𝑐). (She doesn’t need access to 𝑚 ↦ 𝐸𝑒 (𝑚) since she already
knows 𝑒.)

• Mallory chooses a pair of messages {𝑚0 , 𝑚1 }, a secret 𝑏 is cho-


sen at random in {0, 1}, and Mallory gets 𝑐∗ = 𝐸𝑒 (𝑚𝑏 ). (Note
that she of course does not get the randomness used to generate
this challenge encryption.)

• Mallory now gets another 𝑝𝑜𝑙𝑦(𝑛) rounds of access to the func-


tion 𝑐 ↦ 𝐷𝑑 (𝑐) except that she is not allowed to query 𝑐∗ .

• Mallory outputs 𝑏′ and wins if 𝑏′ = 𝑏.

In the private key setting, we achieved CCA security by combining


a CPA-secure private key encryption scheme with a message authenti-
cating code (MAC), where to CCA-encrypt a message 𝑚, we first used
the CPA-secure scheme on 𝑚 to obtain a ciphertext 𝑐, and then added
an authentication tag 𝜏 by signing 𝑐 with the MAC. The decryption
algorithm first verified the MAC before decrypting the ciphertext. In
the public key setting, one might hope that we could repeat the same
construction using a CPA-secure public key encryption and replacing
the MAC with digital signatures.

P
Try to think what would be such a construction, and
whether there is a fundamental obstacle to combin-
ing digital signatures and public key encryption in
the same way we combined MACs and private key
encryption.

Alas, as you may have realized, there is a fly in this ointment. In a


signature scheme (necessarily) it is the signing key that is secret, and
the verification key that is public. But in a public key encryption, the
encryption key is public, and hence it makes no sense for it to use a
secret signing key. (It’s not hard to see that if you reveal the secret
signing key then there is no point in using a signature scheme in the
first place.)

Why CCA security matters. For the reasons above, constructing CCA
secure public key encryption is very challenging. But is it worth the
trouble? Do we really need this “ultra conservative” notion of secu-
rity? The answer is yes. Just as we argued for private key encryption,
chosen ciphertext security is the notion that gets us as close as possible
to designing encryptions that fit the metaphor of secure sealed envelopes.
Digital analogies will never be a perfect imitation of physical ones, but
such metaphors are what people have in mind when designing cryp-
tographic protocols, which is a hard enough task even when we don’t
have to worry about the ability of an adversary to reach inside a sealed
envelope and XOR the contents of the note written there with some
arbitrary string. Indeed, several practical attacks, including Bleichen-
bacher’s attack above, exploited exactly this gap between the physical
metaphor and the digital realization. For more on this, please see
Victor Shoup’s survey where he also describes the Cramer-Shoup en-
cryption scheme which was the first practical public key system to be
shown CCA secure without resorting to the random oracle heuristic.
(The first definition of CCA security, as well as the first polynomial-
time construction, was given in a seminal 1991 work of Dolev, Dwork
and Naor.)

12.5 CCA SECURE PUBLIC KEY ENCRYPTION IN THE RANDOM ORACLE MODEL
We now show how to convert any CPA-secure public key encryption
scheme to a CCA-secure scheme in the random oracle model (this
construction is taken from Fujisaki and Okamoto, CRYPTO 99). In the
homework, you will see a somewhat simpler direct construction of a CCA secure scheme from a trapdoor permutation; a variant of this construction, known as OAEP (which has better ciphertext expansion), has been standardized as PKCS #1 v2.0 and is used in several protocols. The
advantage of a generic construction is that it can be instantiated not
just with the RSA and Rabin schemes, but also directly with Diffie-
Hellman and Lattice based schemes (though there are direct and more
efficient variants for these as well).

CCA-ROM-ENC Scheme:

• Ingredients: A CPA-secure public key encryp-


tion scheme (𝐺′ , 𝐸 ′ , 𝐷′ ) and three hash functions
𝐻, 𝐻 ′ , 𝐻 ″ ∶ {0, 1}∗ → {0, 1}𝑛 (which we model
as independent random oracles 2 ).
• Notes: We assume that 𝐸′ takes 𝑛 bit messages (since CPA security is preserved under concatenation, a one-bit scheme can be transformed into such a scheme). Since 𝐸′ is (necessarily) randomized, we denote by 𝐸′(𝑥; 𝑠) the encryption of the message 𝑥 using the randomness 𝑠. We assume that the number of bits of randomness 𝐸′ uses is 𝑛. (Otherwise we can modify the scheme to use 𝑛 bits using a pseudorandom generator, or modify the co-domain of 𝐻 to be the space of random choices for 𝐸′.)
• Key generation: We generate keys (𝑒, 𝑑) =
𝐺′ (1𝑛 ) for the underlying encryption scheme.
• Encryption: To encrypt a message 𝑚 ∈ {0, 1}𝑛, we select 𝑟 ←𝑅 {0, 1}𝑛, and output

𝐸𝑒(𝑚) = 𝐸𝑒′(𝑟; 𝐻(𝑚‖𝑟)) ‖ 𝐻″(𝑟) ⊕ 𝑚 ‖ 𝐻′(𝑚‖𝑟)

where recall that 𝐸𝑒′(𝑟; 𝑠) denotes the encryption of the message 𝑟 using randomness 𝑠.

• Decryption: To decrypt a ciphertext 𝑐‖𝑦‖ℎ, first let 𝑟 = 𝐷𝑑′(𝑐), then compute 𝑚 = 𝐻″(𝑟) ⊕ 𝑦. Finally, check that 𝑐 = 𝐸𝑒′(𝑟; 𝐻(𝑚‖𝑟)) and ℎ = 𝐻′(𝑚‖𝑟). If either check fails we output error; otherwise we output 𝑚.
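The following Python sketch (not from the book) shows the encryption and decryption logic, deriving the three oracles from SHA-256 by domain separation and using a toy ElGamal scheme in the role of the CPA-secure (𝐸′, 𝐷′). All parameters are illustrative; nothing here is a secure instantiation.

```python
import hashlib, secrets

n = 128                               # toy security parameter
p, g = 2**521 - 1, 3                  # toy ElGamal group (illustrative only)

def oracle(tag, data):
    # Three "independent" n-bit oracles H, H', H'' derived from SHA-256
    # by domain separation (cf. the footnote below).
    return hashlib.sha256(bytes([tag]) + data).digest()[: n // 8]

H   = lambda x: oracle(0, x)          # derives E''s randomness
Hp  = lambda x: oracle(1, x)          # H', the tag
Hpp = lambda x: oracle(2, x)          # H'', the mask

def keygen():
    d = secrets.randbelow(p - 2) + 1
    return pow(g, d, p), d            # public e = g^d, secret d

def Eprime(e, r_int, s_int):          # E'_e(r; s): ElGamal with coins s
    return pow(g, s_int, p), (pow(e, s_int, p) * (r_int + 1)) % p

def Dprime(d, c):
    a, b = c
    return (b * pow(a, p - 1 - d, p)) % p - 1

def xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))

def encrypt(e, m):                    # m is an n-bit (16-byte) message
    r = secrets.token_bytes(n // 8)
    s = int.from_bytes(H(m + r), "big")
    c = Eprime(e, int.from_bytes(r, "big"), s)
    return c, xor(Hpp(r), m), Hp(m + r)   # E'_e(r;H(m||r)) || H''(r)+m || H'(m||r)

def decrypt(e, d, ct):
    c, y, h = ct
    try:
        r = Dprime(d, c).to_bytes(n // 8, "big")
    except OverflowError:
        return None                   # malformed ciphertext: "error"
    m = xor(Hpp(r), y)
    s = int.from_bytes(H(m + r), "big")
    if c != Eprime(e, int.from_bytes(r, "big"), s) or h != Hp(m + r):
        return None                   # re-encryption or tag check failed
    return m

e, d = keygen()
assert decrypt(e, d, encrypt(e, b"attack at dawn!!")) == b"attack at dawn!!"
```

Note how decryption re-encrypts with the recovered randomness and rejects on any mismatch; this re-encryption check is what the security proof below exploits.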

² Recall that it's easy to obtain two independent random oracles 𝐻, 𝐻′ from a single oracle 𝐻″, for example by letting 𝐻(𝑥) = 𝐻″(0‖𝑥) and 𝐻′(𝑥) = 𝐻″(1‖𝑥). Similarly we can extend this to three, four or any number of oracles.

Theorem 12.2 — CCA security from random oracles. The above CCA-ROM-ENC scheme is CCA secure.

Proof. Let 𝐴 be a polynomial-time adversary that wins the “CCA Game” with respect to the scheme (𝐺, 𝐸, 𝐷) with probability 1/2 + 𝜖. We will show (Claim 1) that there is an adversary 𝐴̃ that can win in this game with probability 1/2 + 𝜖 − 𝑛𝑒𝑔𝑙(𝑛) without using the decryption box. We will then show (Claim 2) that this implies that there is an adversary 𝐴′ that can win the CPA game with respect to the scheme (𝐺′, 𝐸′, 𝐷′) with probability 1/2 + Ω(𝜖). We start by establishing the first claim:
Claim 1: Under the above assumptions, there exists a polynomial-time adversary 𝐴̃ that wins the CCA game with respect to the scheme (𝐺, 𝐸, 𝐷) with probability at least 1/2 + 𝜖 − 𝑛𝑒𝑔𝑙(𝑛) without making any queries to the decryption box.
Proof of Claim 1: The adversary 𝐴̃ will simulate 𝐴, keeping track of all of 𝐴's queries to its decryption and random oracles. Whenever 𝐴 makes a query 𝑐‖𝑦‖ℎ to the decryption oracle, 𝐴̃ will respond to it using the following “fake” decryption box 𝐷̃: check whether ℎ was returned before from the random oracle 𝐻′ as a response to a query 𝑚‖𝑟 by 𝐴. If this is the case, then 𝐴̃ will check if 𝑐 = 𝐸𝑒′(𝑟; 𝐻(𝑚‖𝑟)) and 𝑦 = 𝐻″(𝑟) ⊕ 𝑚. If so, then it will return 𝑚, otherwise it will return error. Note that 𝐷̃(𝑐‖𝑦‖ℎ) is computed without any knowledge of the secret key 𝑑.
We claim that the probability that 𝐴̃ will return an answer that differs from the true decryption box is negligible. Indeed, for each particular query 𝑐‖𝑦‖ℎ, first observe that if 𝐷̃(𝑐‖𝑦‖ℎ) is not error then 𝐷̃(𝑐‖𝑦‖ℎ) = 𝐷𝑑(𝑐‖𝑦‖ℎ). Indeed, in this case it holds that 𝑐 = 𝐸𝑒′(𝑟; 𝐻(𝑚‖𝑟)), 𝑦 = 𝐻″(𝑟) ⊕ 𝑚 and ℎ = 𝐻′(𝑚‖𝑟). Hence this is a properly formatted encryption of 𝑚, on which the true decryption box will return 𝑚 as well.
So the only way that 𝐷 and 𝐷̃ differ is if 𝐷𝑑(𝑐‖𝑦‖ℎ) = 𝑚 but 𝐷̃(𝑐‖𝑦‖ℎ) returns error. For this it must be the case that for 𝑟 = 𝐷𝑑′(𝑐), ℎ = 𝐻′(𝑚‖𝑟) but 𝑚‖𝑟 was not queried before by 𝐴. There are two options: either 𝑚‖𝑟 was not queried at all, but then by the “lazy evaluation” paradigm, the value 𝐻′(𝑚‖𝑟) is chosen uniformly in {0, 1}𝑛 independently of ℎ, and the probability that it equals ℎ is 2−𝑛. The other option is that 𝑚‖𝑟 was queried but not by the adversary. The only other party that can make queries to the oracle in the CCA game is the challenger, and it only makes a single query to 𝐻′ when producing the challenge ciphertext 𝐶∗ = 𝑐∗‖𝑦∗‖ℎ∗ with ℎ∗ = 𝐻′(𝑚∗‖𝑟∗). Now the adversary is not allowed to make the query 𝐶∗, so in this case the query must have the form 𝑐‖𝑦‖ℎ∗ where 𝑐‖𝑦 ≠ 𝑐∗‖𝑦∗. But the only way that 𝐷𝑑(𝑐‖𝑦‖ℎ∗) returns a value other than error is if for 𝑟 = 𝐷𝑑′(𝑐) and 𝑚 = 𝑦 ⊕ 𝐻″(𝑟), 𝑐 = 𝐸𝑒′(𝑟; 𝐻(𝑚‖𝑟)) and ℎ∗ = 𝐻′(𝑚‖𝑟). Since the probability of a collision in 𝐻′ is negligible, this can only happen if 𝑚‖𝑟 = 𝑚∗‖𝑟∗, but in this case it will hold that 𝑐 = 𝑐∗ and 𝑦 = 𝑦∗, contradicting the fact that the ciphertext must differ from 𝐶∗. QED (Claim 1)
Claim 2: Under the above assumptions, there exists a polynomial-time adversary 𝐴′ that wins the CPA game with respect to the scheme (𝐺′, 𝐸′, 𝐷′) with probability at least 1/2 + 𝜖/10.
Proof of Claim 2: 𝐴′ runs the full CCA experiment with the adversary 𝐴̃ obtained from Claim 1, simulating the random oracles 𝐻, 𝐻′, 𝐻″ using “lazy evaluation”. When the time comes and the adversary 𝐴̃ chooses two messages 𝑚0, 𝑚1, then 𝐴′ does the following:

1. The adversary 𝐴′ will choose 𝑟0, 𝑟1 ←𝑅 {0, 1}𝑛, give them to its own challenger, and get 𝑐∗ which is an encryption of 𝑟𝑏∗ under 𝐸𝑒′ for 𝑏∗ ←𝑅 {0, 1}. If the adversary 𝐴̃ made in the past a query of the form 𝑟𝑏 or 𝑚𝑏‖𝑟𝑏′ for 𝑏, 𝑏′ ∈ {0, 1} to one of the random oracles then we stop the experiment and declare failure. (Since 𝑟0, 𝑟1 are random in {0, 1}𝑛 and 𝐴̃ made only polynomially many queries, the probability of this happening is negligible.)

2. The adversary 𝐴′ will now give 𝑐∗‖𝑦∗‖ℎ∗ with 𝑦∗, ℎ∗ ←𝑅 {0, 1}𝑛 to 𝐴̃ as the response to the challenge. (Note that this ciphertext involves neither 𝑚0 nor 𝑚1 in any way.)

3. Now if the adversary 𝐴̃ makes a query of the form 𝑟𝑏 or 𝑚‖𝑟𝑏 for 𝑏 ∈ {0, 1} to one of its oracles, then 𝐴′ will output 𝑏. Otherwise, it outputs a random output.

Note that the adversary 𝐴′ ignores the output of 𝐴̃. It only cares about the queries that 𝐴̃ makes. Let's say that an “𝑟𝑏 query” is one that has 𝑟𝑏 as a postfix. To finish the proof we make the following two claims:
Claim 2.1: The probability that 𝐴̃ makes an 𝑟1−𝑏∗ query is negligible. Proof: This is because the only value that 𝐴̃ receives that depends on one of 𝑟0, 𝑟1 is 𝑐∗, which is an encryption of 𝑟𝑏∗. Hence 𝐴̃ never sees any value that depends on 𝑟1−𝑏∗, and since it is uniform in {0, 1}𝑛, the probability that 𝐴̃ makes a query with this postfix is negligible. QED (Claim 2.1)
Claim 2.2: 𝐴̃ will make an 𝑟𝑏∗ query with probability at least 𝜖/2. Proof: Let 𝑐∗ = 𝐸𝑒′(𝑟𝑏∗; 𝑠∗) where 𝑠∗ is the randomness used in producing it. By the lazy evaluation paradigm, since no 𝑟𝑏∗ query was made up to that point, the distribution would be identical if we defined 𝐻(𝑚𝑏‖𝑟𝑏∗) = 𝑠∗, defined 𝐻″(𝑟𝑏∗) = 𝑦∗ ⊕ 𝑚𝑏, and defined ℎ∗ = 𝐻′(𝑚𝑏‖𝑟𝑏∗). Hence the distribution of the ciphertext is identical to how it is distributed in the actual CCA game. Now, since 𝐴̃ wins the CCA game with probability 1/2 + 𝜖 − 𝑛𝑒𝑔𝑙(𝑛), in this game it must query 𝐻″ at 𝑟𝑏∗ with probability at least 𝜖/2. Indeed, conditioned on not querying 𝐻″ at this value, the string 𝑦∗ is independent of the message 𝑚𝑏, and the adversary cannot win the game with probability more than 1/2. QED (Claim 2.2)
(Claim 2.2)
Together Claims 2.1 and 2.2 imply that the adversary 𝐴̃ makes an 𝑟𝑏∗ query with probability at least 𝜖/2, and makes an 𝑟1−𝑏∗ query with negligible probability. Hence our adversary 𝐴′ will output 𝑏∗ with probability at least 𝜖/2, and with all but a negligible part of the remaining probability will guess randomly, leading to an overall success in the CPA game of at least 1/2 + 𝜖/4 − 𝑛𝑒𝑔𝑙(𝑛) ≥ 1/2 + 𝜖/10. QED (Claim 2 and hence theorem)

12.5.1 Defining secure authenticated key exchange


The basic goal of secure communication is to set up a secure channel
between two parties Alice and Bob. We want to do so over an open
network, where messages between Alice and Bob might be read, mod-
ified, deleted, or added by the adversary. Moreover, we want Alice
and Bob to be sure that they are talking to one another rather than
e sta bl i shi ng se c u re con n e c ti on s ove r i n se c u re cha n n e l s 265

other parties. This raises the question of what is identity and how is
it verified. Ultimately, if we want to use identities, then we need to
trust some authority that decides which party has which identity. This
is typically done via a certificate authority (CA). This is some trusted
authority, whose verification key 𝑣𝐶𝐴 is public and known to all par-
ties. Alice proves in some way to the CA that she is indeed Alice, and
then generates a pair (𝑠𝐴𝑙𝑖𝑐𝑒, 𝑣𝐴𝑙𝑖𝑐𝑒), and gets from the CA the message 𝜎𝐴𝑙𝑖𝑐𝑒 = “The key 𝑣𝐴𝑙𝑖𝑐𝑒 belongs to Alice” signed with 𝑠𝐶𝐴.³ Now Alice can send (𝑣𝐴𝑙𝑖𝑐𝑒, 𝜎𝐴𝑙𝑖𝑐𝑒) to Bob to certify that the owner of this public key is indeed Alice.

³ The registration process could be more subtle than that, and for example Alice might need to prove to the CA that she does indeed know the corresponding secret key.
For example, in the web setting, certain certificate authorities can certify that a certain public key is associated with a certain website. If you go to a website using the https protocol, you should see a “lock” symbol on your browser which will give you details on the certificate. Often the certificate is a chain of certificates. If I click on this lock symbol in my Chrome browser, I see that the certificate that amazon.com's public key is some particular string (corresponding to a 2048-bit RSA modulus and exponent) is signed by the Symantec Certificate authority, whose own key is certified by Verisign. My communication with Amazon is an example of a setting of one sided authentication. It is important for me to know that I am truly talking to amazon.com, while Amazon is willing to talk to any client. (Though of course once we establish a secure channel, I could use it to login to my Amazon account.) Chapter 21 of Boneh-Shoup contains an in depth discussion of authenticated key exchange protocols, see for example the definitions of protocols AEK1 - AEK4. Because the definition is so involved, we will not go over the full formal definitions in this book, but I recommend Boneh-Shoup for an in-depth treatment.

12.5.2 The compiler approach for authenticated key exchange


There is a generic “compiler” approach to obtaining authenticated key
exchange protocols:

• Start with a protocol such as the basic Diffie-Hellman protocol that is only secure with respect to a passive eavesdropping adversary.

• Then compile it into a protocol that is secure with respect to an active adversary using authentication tools such as digital signatures, message authentication codes, etc., depending on what kind of setup you can assume and what properties you want to achieve.

This approach has the advantage of being modular in both the con-
struction and the analysis. However, direct constructions might be
more efficient. There are a great many potentially desirable properties
of key exchange protocols, and different protocols achieve different
subsets of these properties at different costs. The most common vari-
ant of authenticated key exchange protocols is to use some version of the Diffie-Hellman key exchange. If both parties have public signature keys, then they can simply sign their messages, and that effectively rules out an active attack, reducing active security to passive security (though one needs to include identities in the signatures to ensure non-repetition of messages, see here). A toy sketch of this “sign the flows” approach appears below.
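The following Python sketch (not from the book) illustrates the compiled protocol's message flow; the group and the textbook RSA signatures are illustrative stand-ins (and, for brevity, both parties here reuse a single toy signing key), so this shows structure only, not a secure instantiation.

```python
import hashlib, secrets

P, G = 2**127 - 1, 3                  # toy DH group (illustrative only)

def dh_keypair():
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)

# Textbook "hash then exponentiate" RSA signatures with tiny primes;
# real use requires proper padding and much larger moduli.
rp, rq = 99991, 99989
N, E = rp * rq, 65537
Dk = pow(E, -1, (rp - 1) * (rq - 1))

def sign(msg):
    return pow(int.from_bytes(hashlib.sha256(msg).digest(), "big") % N, Dk, N)

def verify(msg, sig):
    return pow(sig, E, N) == int.from_bytes(hashlib.sha256(msg).digest(), "big") % N

# One run of the compiled protocol. The verification keys are assumed
# to be certified (e.g., by the CA discussed above).
x, X = dh_keypair()                   # Alice's ephemeral key
y, Y = dh_keypair()                   # Bob's ephemeral key
sigA = sign(b"Alice" + X.to_bytes(16, "big"))
sigB = sign(b"Bob" + Y.to_bytes(16, "big"))
# Including identities in the signed flows prevents replaying a flow
# in a different role or session.
assert verify(b"Alice" + X.to_bytes(16, "big"), sigA)
assert verify(b"Bob" + Y.to_bytes(16, "big"), sigB)
k_alice = hashlib.sha256(pow(Y, x, P).to_bytes(16, "big")).digest()
k_bob   = hashlib.sha256(pow(X, y, P).to_bytes(16, "big")).digest()
assert k_alice == k_bob               # both derive the same session key
```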
The most efficient variants of Diffie-Hellman achieve authentication implicitly, where the basic protocol remains the same (sending 𝑋 = 𝑔𝑥 and 𝑌 = 𝑔𝑦) but the computation of the secret shared key involves some authentication information. Of these protocols a particularly efficient variant is the MQV protocol of Law, Menezes, Qu, Solinas and Vanstone (which is based on similar principles as DSA signatures), and its variant HMQV by Krawczyk that has some improved security properties and analysis.

12.6 PASSWORD AUTHENTICATED KEY EXCHANGE.


NOTE: The following three parts are not yet written - we will discuss
them in class, but please at least skim the resources pointed out below
PAKE is covered in Boneh-Shoup Chapter 21.11

12.7 CLIENT TO CLIENT KEY EXCHANGE FOR SECURE TEXT MESSAGING - ZRTP, OTR, TEXTSECURE
To be completed. See Matthew Green's blog, text secure, OTR.
Security requirements: forward secrecy, deniability.

12.8 HEARTBLEED AND LOGJAM ATTACKS


• Vestiges of past crypto policies.

• Importance of “perfect forward secrecy”

Figure 12.1: How the NSA feels about breaking encrypted communication
IV
ADVANCED TOPICS
13
Zero knowledge proofs

The notion of proof is central to so many fields. In mathematics, we want to prove that a certain assertion is correct. In other sciences, we
often want to accumulate a preponderance of evidence (or statistical
significance) to reject certain hypotheses. In criminal law the prose-
cution famously needs to prove its case “beyond a reasonable doubt”.
Cryptography turns out to give some new twists on this ancient no-
tion.
Typically a proof that some assertion X is true, also reveals
some information about why X is true. When Hercule Poirot
proves that Norman Gale killed Madame Giselle he does so by
showing how Gale committed the murder by dressing up as a flight
attendant and stabbing Madame Gisselle with a poisoned dart.
Could Hercule convince us beyond a reasonable doubt that Gale
did the crime without giving any information on how the crime
was committed? Can the Russians prove to the U.S. that a sealed
box contains an authentic nuclear warhead without revealing
anything about its design? Can I prove to you that the number 𝑚 =
1
In 473,
385, 608, 108, 395, 369, 363, 400, 501, 273, 594, 475, 104, 405, 448, 848, 047,062, 278, case you
983are curious, the factors of 𝑚 are
1, 172, 192, 558, 529, 627, 184, 841, 954, 822, 099
has a prime factor whose last digit is 7 without giving you any infor- and 328, 963, 108, 995, 562, 790, 517, 498, 071, 717.
mation about 𝑚’s prime factors? We won’t answer the first question,
but will show some insights on the latter two.1
Zero knowledge proofs are proofs that fully convince that a statement
is true without yielding any additional knowledge. So, after seeing a zero
knowledge proof that 𝑚 has a factor ending with 7, you’ll be no closer
to knowing 𝑚’s factorization than you were before. Zero knowledge
proofs were invented by Goldwasser, Micali and Rackoff in 1982 and
have since been used in a great many settings. How would you achieve
such a thing, or even define it? And why on earth would it be useful?
This is the topic of this lecture.




P
This chapter will rely on the notion of NP complete-
ness, as well as the view of NP as proof systems. For
a review of this notion, please see this chapter of my
introduction to TCS text.

13.1 APPLICATIONS FOR ZERO KNOWLEDGE PROOFS.


Before we talk about how to achieve zero knowledge, let us discuss
some of its potential applications:

13.1.1 Nuclear disarmament


The United States and Russia have reached a dangerous and expensive
equilibrium where each has about 7000 nuclear warheads, much more
than is needed to decimate each others' population (and the population of much of the rest of the world).² Having so many weapons increases the chance of “leakage” of weapons, or of an accidental launch (which can result in an all out war) through faults in communications or rogue commanders. This also threatens the delicate balance of the Non-Proliferation Treaty which at its core is a bargain where non-weapons states agree not to pursue nuclear weapons and the five nuclear weapon states agree to make progress on nuclear disarmament. These huge quantities of nuclear weapons are not only dangerous, as they increase the chance of a leak or of an individual failure or rogue commander causing a world catastrophe, but also extremely expensive to maintain.

² To be fair, “only” about 170 million Americans live in the 50 largest metropolitan areas and so arguably many people will survive at least the initial impact of a nuclear war, though it has been estimated that even a “small” nuclear war involving detonation of 100 not too large warheads could have devastating global consequences.
For all of these reasons, in 2009, U.S. President Obama called to set
as a long term goal a “world without nuclear weapons” and in 2012
spoke concretely about talking to Russia about reducing “not only
our strategic nuclear warheads, but also tactical weapons and war-
heads in reserve”. On the other side, Russian President Putin has said
already in 2000 that he sees “no obstacles that could hamper future
deep cuts of strategic offensive armaments”. (Though as of 2018, po-
litical winds on both sides have shifted away from disarmament and
more toward armament.)
There are many reasons why progress on nuclear disarmament has
been so slow, and most of them have nothing to do with zero knowl-
edge or any other piece of technology. But there are some technical
hurdles as well. One of those hurdles is that for the U.S. and Russia to
go beyond restricting the number of deployed weapons to significantly
reducing the stockpiles, they need to find a way for one country to ver-
ifiably prove that it has dismantled warheads. As mentioned in my work with Glaser and Goldston (see also this page), a key stumbling block is that the design of a nuclear warhead is of course highly classified and about the last thing in the world that the U.S. would like to share with Russia and vice versa. So, how can the U.S. convince the Russians that it has destroyed a warhead, when it cannot let Russian experts anywhere near it?

13.1.2 Voting
Electronic voting has been of great interest for many reasons. One
potential advantage is that it could allow completely transparent vote
counting, where every citizen could verify that the votes were counted
correctly. For example, Chaum suggested an approach to do so by
publishing an encryption of every vote and then having the central
authority prove that the final outcome corresponds to the counts of
all the plaintexts. But of course to maintain voter privacy, we need to
prove this without actually revealing those plaintexts. Can we do so?

13.1.3 More applications


I chose these two examples above precisely because they are hardly
the first that come to mind when thinking about zero knowledge.
Zero knowledge has been used for many cryptographic applications.
One such application (originating from work of Fiat and Shamir) is
the use for identification protocols. Here Alice knows a solution 𝑥 to a
puzzle 𝑃 , and proves her identity to Bob by, for example, providing an
encryption 𝑐 of 𝑥 and proving in zero knowledge that 𝑐 is indeed an encryption of a solution for 𝑃.³ Bob can verify the proof, but because it is zero knowledge, learns nothing about the solution of the puzzle and will not be able to impersonate Alice. An alternative approach to such identification protocols is through using digital signatures; this connection goes both ways and zero knowledge proofs have been used by Schnorr and others as a basis for signature schemes.

³ As we'll see, technically what Alice needs to do in such a scenario is use a zero knowledge proof of knowledge of a solution for 𝑃.
Another very generic application is for “compiling protocols”. As
we’ve seen time and again, it is often much easier to handle passive
adversaries than active ones. (For example, it’s much easier to get CPA
security against the eavesdropping Eve than CCA security against
the person-in-the-middle Mallory.) Thus it would be wonderful if
we could “compile” a protocol that is secure with respect to passive
attacks into one that is secure with respect to active ones. As was
first shown by Goldreich, Micali, and Wigderson, zero knowledge
proofs yield a very general such compiler. The idea is that all parties
prove in zero knowledge that they follow the protocol’s specifications.
Normally, such proofs might require the parties to reveal their secret
inputs, hence violating security, but zero knowledge precisely guar-
antees that we can verify correct behaviour without access to these
inputs.

13.2 DEFINING AND CONSTRUCTING ZERO KNOWLEDGE PROOFS


So, zero knowledge proofs are wonderful objects, but how do we get
them? In fact, we haven’t answered the even more basic question of
how do we define zero knowledge? We have to start with the most basic
task of defining what we mean by a proof.
A proof system can be thought of as an algorithm 𝑉 (for “verifier”)
that takes as input a statement which is some string 𝑥 and another
string 𝜋 known as the proof and outputs 1 if and only if 𝜋 is a valid
proof that the statement 𝑥 is correct. For example:

• In Euclidean geometry, statements are geometric facts such as “in any triangle the angles sum to 180 degrees” and the proofs are step by step derivations of the statements from the five basic postulates.

• In Zermelo-Fraenkel + Axiom of Choice (ZFC) a statement is some purported fact about sets (e.g., the Riemann Hypothesis⁴), and a proof is a step by step derivation of it from the axioms.

• We can define many other “theories”. For example, a theory where the statements are pairs (𝑥, 𝑚) such that 𝑥 is a quadratic residue modulo 𝑚 and a proof for 𝑥 is the number 𝑠 such that 𝑥 = 𝑠² (mod 𝑚), or a theory where the theorems are Hamiltonian graphs 𝐺 (graphs on 𝑛 vertices that contain an 𝑛-long cycle) and the proofs are the description of the cycle.

⁴ Integers can be coded as sets in various ways. For example, one can encode 0 as ∅ and if 𝑁 is the set encoding 𝑛, we can encode 𝑛 + 1 using the (𝑛 + 1)-element set {𝑁} ∪ 𝑁.

All these proof systems have the property that the verifying algorithm 𝑉 is efficient. Indeed, that's the whole point of a proof 𝜋: it's a sequence of symbols that makes it easy to verify that the statement is true.
To achieve the notion of zero knowledge proofs, Goldwasser and
Micali had to consider a generalization of proofs from static sequences
of symbols to interactive probabilistic protocols between a prover and
a verifier. Let’s start with an informal example. The vast majority of
humans have three types of cone cells in their eyes. The reason why
we perceive the sky as blue (see also this), despite its color being quite
a different spectrum than the blue of the rainbow, is that the projection
of the sky’s color to our cones is closest to the projection of blue. It has
been suggested that a tiny fraction of the human population might
have four functioning cones (in fact, only women, as it would require
two X chromosomes and a certain mutation). How would a person prove to another that she is in fact such a tetrachromat?
Proof of tetrachromacy:
Suppose that Alice is a tetrachromat and can dis-
tinguish between the colors of two pieces of plastic
that would be identical to a trichromat. She wants to
ze ro know l e d g e p roofs 275

prove to a trichromat Bob that the two pieces are not


identical. She can do this as follows:
Alice and Bob will repeat the following experi-
ment 𝑛 times: Alice turns her back and Bob tosses
a coin and with probability 1/2 leaves the pieces
as they are, and with probability 1/2 switches the
right piece with the left piece. Alice needs to guess
whether Bob switched the pieces or not.
If Alice is successful in all of the 𝑛 repetitions then
Bob will have 1 − 2−𝑛 confidence that the pieces are
truly different.

A similar “proof” inspired the influential notion of hypothesis test-


ing in statistics. Dr. Muriel Bristol said that she prefers the taste of
tea when the milk is put first into the cup and tea later, rather than
vice versa. The statistician Ronald Fisher did not believe her. William
Roach (like Bristol, a chemist, and her future husband) proposed a
probabilistic test, whereby eight cups would be poured for Bristol,
each randomly chosen to either be “milk first” or “tea first”. Bristol
correctly identified all 8 cups. Pondering about this experiment, and
the level of confidence that it enabled to reject the “null hypothesis”
that Bristol simply guessed randomly led to Fisher’s development of
hypothesis testing and the now ubiquitous “𝑝 values”.
We now consider a more “mathematical” example along simi-
lar lines. Recall that if 𝑥 and 𝑚 are numbers then we say that 𝑥 is
a quadratic residue modulo 𝑚 if there is some 𝑠 such that 𝑥 = 𝑠2
(mod 𝑚). Let us define the function NQR(𝑚, 𝑥) to output 1 if and only
if 𝑥 ≠ 𝑠2 (mod 𝑚) for every 𝑠 ∈ {0, … , 𝑚 − 1}. There is a very simple
way to prove statements of the form “NQR(𝑚, 𝑥) = 0”: just give out
𝑠. However, here is an interactive proof system to prove statements of
the form “NQR(𝑚, 𝑥) = 1”:

• We have two parties: Alice and Bob. The common input is (𝑚, 𝑥)
and Alice wants to convince Bob that NQR(𝑚, 𝑥) = 1. (That is, that
𝑥 is not a quadratic residue modulo 𝑚).

• We assume that Alice can compute NQR(𝑚, 𝑤) for every 𝑤 ∈ {0, … , 𝑚 − 1} but Bob is polynomial time.

• The protocol will work as follows:

1. Bob will pick some random 𝑠 ∈ ℤ∗𝑚 (e.g., by picking a random number in {1, … , 𝑚 − 1} and discarding it if it has nontrivial g.c.d. with 𝑚) and toss a coin 𝑏 ∈ {0, 1}. If 𝑏 = 0 then Bob will send 𝑠² (mod 𝑚) to Alice and otherwise he will send 𝑥𝑠² (mod 𝑚) to Alice.

2. Alice will use her ability to compute NQR(𝑚, ⋅) to respond with 𝑏′ = 0 if Bob sent a quadratic residue and with 𝑏′ = 1 otherwise.

3. Bob accepts the proof if 𝑏 = 𝑏′.
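A minimal Python sketch of one round of this protocol follows (not from the book): the toy modulus is small enough that Alice's unbounded ability to decide quadratic residuosity can be simulated by brute force.

```python
import math, random

def is_qr(m, x):
    # Alice's "unbounded" ability: decide quadratic residuosity by
    # brute force (feasible only because the toy modulus is tiny).
    return any(pow(s, 2, m) == x % m for s in range(m))

def nqr_round(m, x):
    # Bob's move: a random square, optionally multiplied by x.
    while True:
        s = random.randrange(1, m)
        if math.gcd(s, m) == 1:
            break
    b = random.randrange(2)
    msg = pow(s, 2, m) if b == 0 else (x * pow(s, 2, m)) % m
    # Alice's move: answer 0 iff Bob's message is a residue.
    b_prime = 0 if is_qr(m, msg) else 1
    return b == b_prime                      # Bob accepts iff b' = b

m, x = 21, 5                                 # 5 is a non-residue mod 21
assert not is_qr(m, x)
assert all(nqr_round(m, x) for _ in range(20))   # honest Alice always wins
```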

To see that Bob will indeed accept the proof, note that if 𝑥 is a non-residue then 𝑥𝑠² will have to be a non-residue as well (since if 𝑥𝑠² had a root 𝑠′ then 𝑠′𝑠⁻¹ would be a root of 𝑥). Hence it will always be the case that 𝑏′ = 𝑏.
Moreover, if 𝑥 was a quadratic residue of the form 𝑥 = 𝑠′² (mod 𝑚) for some 𝑠′, then 𝑥𝑠² = (𝑠′𝑠)² is simply a random quadratic residue, which means that in this case Bob's message is distributed the same regardless of whether 𝑏 = 0 or 𝑏 = 1, and no matter what she does, Alice has probability at most 1/2 of guessing 𝑏. Hence if Alice is always successful then after 𝑛 repetitions Bob would have 1 − 2−𝑛 confidence that 𝑥 is indeed a non-residue modulo 𝑚.

P
Please stop and make sure you see the similarities be-
tween this protocol and the one for demonstrating that
the two pieces of plastic do not have identical colors.

Let us now make the formal definition:

Definition 13.1 — Proof systems. Let 𝑓 ∶ {0, 1}∗ → {0, 1} be some function. A probabilistic proof for 𝑓 (i.e., a proof for statements of the form “𝑓(𝑥) = 1”) is a pair of interactive algorithms (𝑃, 𝑉) such that 𝑉 runs in polynomial time and they satisfy:

• Completeness: If 𝑓(𝑥) = 1, then if 𝑃 and 𝑉 are given input 𝑥 and interact, at the end of the interaction 𝑉 will output Accept with probability at least 0.9.

• Soundness: If 𝑓(𝑥) = 0 then for any arbitrary (efficient or non-efficient) algorithm 𝑃∗, if 𝑃∗ and 𝑉 are given input 𝑥 and interact then at the end 𝑉 will output Accept with probability at most 0.1.

R
Remark 13.2 — Functions vs languages. In many texts
proof systems are defined with respect to languages
as opposed to functions. That is, instead of talking
about a function 𝑓 ∶ {0, 1}∗ → {0, 1} we talk about
a language 𝐿 ⊆ {0, 1}∗ . These two viewpoints are
completely equivalent via the mapping 𝑓 ⟷ 𝐿 where
𝐿 = {𝑥 |𝑓(𝑥) = 1}.

Note that we don’t necessarily require the prover to be efficient


(and indeed, in some cases it might not be). On the other hand, our
ze ro know l e d g e p roofs 277

soundness condition holds even if the prover uses a non efficient 5


People have considered the notion of zero knowl-
strategy.5 We say that a proof system has an efficient prover if there edge systems where soundness holds only with re-
is an NP-type proof system Π for 𝐿 (that is some efficient algorithm spect to efficient provers; these are known as argument
systems.
Π such that there exists 𝜋 with Π(𝑥, 𝜋) = 1 iff 𝑥 ∈ 𝐿 and such that
Π(𝑥, 𝜋) = 1 implies that |𝜋| ≤ 𝑝𝑜𝑙𝑦(|𝑥|), such that the strategy for 𝑃
can be implemented efficiently given any static proof 𝜋 for 𝑥 in this
system.

R
Remark 13.3 — Notation for strategies. Up until now,
we always considered cryptographic protocols where
Alice and Bob trusted one another, but were worried
about some adversary controlling the channel between
them. Now we are in a somewhat more “suspicious”
setting where the parties do not fully trust one an-
other. In such protocols there is always a “prescribed”
or honest strategy that a particular party should fol-
low, but we generally don’t want the other parties’
security to rely on someone else’s good intention, and
hence analyze also the case where a party uses an arbi-
trary malicious strategy. We sometimes also consider
the honest but curious case where the adversary is
passive and only collects information, but does not
deviate from the prescribed strategy.
Protocols typically only guarantee security for party A when it behaves honestly - a party can always choose to violate its own security and there is not much we can (or should?) do about it.

13.3 DEFINING ZERO KNOWLEDGE


So far we merely defined the notion of an interactive proof system,
but we need to define what it means for a proof to be zero knowledge.
Before we attempt a definition, let us consider an example. Going back
to the notion of quadratic residuosity, suppose that 𝑥 and 𝑚 are public
and Alice knows 𝑠 such that 𝑥 = 𝑠2 (mod 𝑚). She wants to convince
Bob that this is the case. However she prefers not to reveal 𝑠. Can she
convince Bob that such an 𝑠 exists without revealing any information
about it? Here is a way to do so:

Protocol ZK-QR: Public input for Alice and Bob: 𝑥, 𝑚; Alice’s private
input is 𝑠 such that 𝑥 = 𝑠2 (mod 𝑚).

1. Alice will pick a random 𝑠′ and send to Bob 𝑥′ = 𝑥𝑠′2 (mod 𝑚).

2. Bob will pick a random bit 𝑏 ∈ {0, 1} and send 𝑏 to Alice.

3. If 𝑏 = 0 then Alice reveals 𝑠𝑠′, hence giving out a root for 𝑥′; if 𝑏 = 1 then Alice reveals 𝑠′, hence showing a root for 𝑥′𝑥−1.
278 a n i n te n si ve i n trod u c ti on to c ry p tog ra p hy

4. Bob checks that the value 𝑠″ revealed by Alice is indeed a root of 𝑥′𝑥−𝑏; if so then he “accepts” the proof.
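Here is a minimal Python sketch of one round of ZK-QR with a toy modulus (an illustration, not part of the original protocol description):

```python
import random

def zkqr_round(m, x, s):
    # Alice's secret: s with s^2 = x (mod m); everything else is public.
    s1 = random.randrange(1, m)              # Alice's fresh randomness s'
    x1 = (x * pow(s1, 2, m)) % m             # step 1: x' = x * s'^2
    b = random.randrange(2)                  # step 2: Bob's challenge
    s2 = (s * s1) % m if b == 0 else s1      # step 3: root of x' or x'/x
    # step 4: Bob checks s''^2 = x' * x^{-b} (mod m)
    target = x1 if b == 0 else (x1 * pow(x, -1, m)) % m
    return pow(s2, 2, m) == target

m, s = 77, 15
x = pow(s, 2, m)                             # x = 71 is a residue mod 77
assert all(zkqr_round(m, x, s) for _ in range(20))
```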

If 𝑥 was not a quadratic residue then no matter how 𝑥′ was chosen, either 𝑥′ or 𝑥′𝑥−1 is not a residue and hence Bob will reject the proof with probability at least 1/2. By repeating this 𝑛 times, we can reduce the probability of Bob accepting the proof of a non residue to 2−𝑛.
On the other hand, we claim that we didn’t really reveal anything
about 𝑠. Indeed, if Bob chooses 𝑏 = 0, then the two messages (𝑥′ , 𝑠𝑠′ )
he sees can be thought of as a random quadratic residue 𝑥′ and its
root. If Bob chooses 𝑏 = 1 then after dividing by 𝑥 (which he could
have done by himself) he still gets a random residue 𝑥″ and its root 𝑠′ .
In both cases, the distribution of these two messages is completely in-
dependent of 𝑠, and hence intuitively yields no additional information
about it beyond whatever Bob knew before.
To define zero knowledge mathematically we follow the following
intuition:

A proof system is zero knowledge if the verifier did not learn anything after the interaction that he could not have learned on his own.

Despite the name “zero knowledge”, we do not claim that the verifier does not know anything about the private input 𝑠. For example, if 𝑚 = 𝑝 ⋅ 𝑞 for two primes 𝑝, 𝑞, then each 𝑥 ∈ ℤ∗𝑚 has at most four square roots, and if the verifier could compute square roots then they could narrow 𝑠 down to these four possibilities. However, the point is that this is knowledge the verifier already had even before the interaction with the prover, and so participating in the proof resulted in zero additional knowledge.
Here is how we formally define zero knowledge:

Definition 13.4 — Zero knowledge proofs. A proof system (𝑃, 𝑉) for 𝑓 is zero knowledge if for every efficient verifier strategy 𝑉∗ there exists an efficient probabilistic algorithm 𝑆∗ (known as the simulator) such that for every 𝑥 s.t. 𝑓(𝑥) = 1, the following random variables are computationally indistinguishable:

• The output of 𝑉 ∗ after interacting with 𝑃 on input 𝑥.

• The output of 𝑆 ∗ on input 𝑥.

That is, we can show the verifier does not gain anything from the interaction, because no matter what algorithm 𝑉∗ he uses, whatever he learned as a result of interacting with the prover, he could have just as equally learned by simply running the standalone algorithm 𝑆∗ on the same input.

R
Remark 13.5 — The simulation paradigm. The natural
way to define security is to say that a system is secure
if some “laundry list” of bad outcomes X,Y,Z can’t
happen. The definition of zero knowledge is differ-
ent. Rather than giving a list of the events that are
not allowed to occur, it gives a maximalist simulation
condition.
At its heart the definition of zero knowledge says the
following: clearly, we cannot prevent the verifier from
running an efficient algorithm 𝑆 ∗ on the public input,
but we want to ensure that this is the most he can
learn from the interaction.
This simulation paradigm has become the standard
way to define security of a great many cryptographic
applications. That is, we bound what an adversary
Eve can learn by postulating some hypothetical ad-
versary Lilith that is under much harsher conditions
(e.g., does not get to interact with the prover) and
ensuring that Eve cannot learn anything that Lilith
couldn’t have learned either. This has an advantage of
being the most conservative definition possible, and also phrasing security in positive terms (there exists a simulation) as opposed to the typical negative terms (events X,Y,Z can't happen). Since it's often easier for
us to think of positive terms, paradoxically sometimes
this stronger security condition is easier to prove. Zero
knowledge is in some sense the simplest setting of the
simulation paradigm and we’ll see it time and again in
dealing with more advanced notions.

The definition of zero knowledge is confusing since intuitively if the verifier gained confidence that the statement is true then surely he must have learned something. This is another one of those cases where
cryptography is counterintuitive. To understand it better, it is worth-
while to see the formal proof that the protocol above for quadratic
residuosity is zero knowledge:

Theorem 13.6 — Zero knowledge for quadratic residuosity. Protocol ZK-QR above is a zero knowledge protocol.

Proof. Let 𝑉 ∗ be an arbitrary efficient strategy for Bob. Since Bob only
sends a single bit, we can think of this strategy as composed of two
functions:

• 𝑉1(𝑥, 𝑚, 𝑥′) outputs the bit 𝑏 that Bob chooses on input 𝑥, 𝑚 and after Alice's first message is 𝑥′.

• 𝑉2(𝑥, 𝑚, 𝑥′, 𝑠″) is whatever Bob outputs after seeing Alice's response 𝑠″ to the bit 𝑏.

Both 𝑉1 and 𝑉2 are efficiently computable. We now need to come up with an efficient simulator 𝑆∗ that is a standalone algorithm that on input 𝑥, 𝑚 will output a distribution indistinguishable from the output of 𝑉∗.
The simulator 𝑆 ∗ will work as follows:

1. Pick 𝑏′ ←𝑅 {0, 1}.

2. Pick 𝑠″ at random in ℤ∗𝑚. If 𝑏′ = 0 then let 𝑥′ = 𝑠″² (mod 𝑚). Otherwise let 𝑥′ = 𝑥𝑠″² (mod 𝑚).

3. Let 𝑏 = 𝑉1(𝑥, 𝑚, 𝑥′). If 𝑏 ≠ 𝑏′ then go back to step 1.

4. Output 𝑉2(𝑥, 𝑚, 𝑥′, 𝑠″).

The correctness of the simulator follows from the following claims (all of which assume that 𝑥 is actually a quadratic residue, since otherwise we don't need to make any guarantees and in any case Alice's behaviour is not well defined):
Claim 1: The distribution of 𝑥′ computed by 𝑆 ∗ is identical to the
distribution of 𝑥′ chosen by Alice.
Claim 2: With probability at least 1/2, 𝑏′ = 𝑏.
Claim 3: Conditioned on 𝑏 = 𝑏′ and the value 𝑥′ computed in
step 2, the value 𝑠″ computed by 𝑆 ∗ is identical to the value that Alice
sends when her first message is 𝑥′ and Bob’s response is 𝑏.
Together these three claims imply that in expectation 𝑆 ∗ only in-
vokes 𝑉1 and 𝑉2 a constant number of times (since every time it goes
back to step 1 with probability at most 1/2). They also imply that the
output of 𝑆 ∗ is in fact identical to the output of 𝑉 ∗ in a true interaction
with Alice. Thus, we only need to prove the claims, which is actually
quite easy:
Proof of Claim 1: In both cases, 𝑥′ is a random quadratic residue.
QED (Claim 1)
Proof of Claim 2: This is a corollary of Claim 1; since the distribu-
tion of 𝑥′ is identical to the distribution chosen by Alice, in particular
𝑥′ gives out no information about the choice of 𝑏′ . QED (Claim 2)
Proof of Claim 3: This follows from a direct calculation. The value 𝑠″ sent by Alice is a square root of 𝑥′ if 𝑏 = 0 and of 𝑥′𝑥−1 if 𝑏 = 1. But this is identical to what happens for 𝑆∗ if 𝑏 = 𝑏′. QED (Claim 3)


Together these complete the proof of the theorem.
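The simulator translates almost directly into code. In this sketch (assuming the two callbacks 𝑉1 and 𝑉2 from the proof above), note that the simulator never uses a square root of 𝑥:

```python
import random

def zkqr_simulator(x, m, V1, V2):
    # S*: given only the public input, produce an output distributed
    # like V*'s output in a real interaction (when x is a residue).
    while True:
        b1 = random.randrange(2)                 # guess Bob's bit b'
        s2 = random.randrange(1, m)              # the value to reveal
        x1 = pow(s2, 2, m) if b1 == 0 else (x * pow(s2, 2, m)) % m
        if V1(x, m, x1) == b1:                   # guessed right: done
            return V2(x, m, x1, s2)
        # otherwise rewind and try again (about 2 tries in expectation)
```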

Theorem 13.6 is interesting but not yet good enough to guarantee security in practice. After all, the protocol that we really need to show is zero knowledge is the one where we repeat this procedure 𝑛 times. There is a general theorem that if a protocol is zero knowledge then repeating it polynomially many times one after the other (so called “sequential repetition”) preserves zero knowledge.
repeating it polynomially many times one after the other (so called
“sequential repetition”) preserves zero knowledge. You can think of
this as cryptography’s version of the equality “0 + 0 = 0”, but as usual,
intuitive things are not always correct and so this theorem does re-
quire (a not super trivial) proof. It is a good exercise to try to prove it
on your own. There are known ways to achieve zero knowledge with
negligible soundness error and a constant number of communication
rounds, see Goldreich’s book (Vol 1, Sec 4.9).

13.4 ZERO KNOWLEDGE PROOF FOR HAMILTONICITY.
We now show a proof for another language.
Suppose that Alice and Bob know an 𝑛-vertex graph 𝐻 and Alice knows a Hamiltonian cycle 𝐶 in this graph (i.e., a length 𝑛 simple cycle: one that traverses all vertices exactly once). Here is how Alice can prove that such a cycle exists without revealing any information about it.

Protocol ZK-Ham:

0. Common input: graph 𝐻 (in the form of an 𝑛 × 𝑛 adjacency matrix). Alice's private input: a Hamiltonian cycle 𝐶 = (𝐶1, … , 𝐶𝑛), which are distinct vertices such that (𝐶ℓ, 𝐶ℓ+1) is an edge in 𝐻 for all ℓ ∈ {1, … , 𝑛 − 1} and (𝐶𝑛, 𝐶1) is an edge as well. Below we assume that 𝐺 ∶ {0, 1}𝑛 → {0, 1}3𝑛 is a pseudorandom generator.

1. Bob chooses a random string 𝑧 ∈ {0, 1}3𝑛

2. Alice chooses a random permutation 𝜋 on {1, … , 𝑛} and lets 𝑀 be the 𝜋-permuted adjacency matrix of 𝐻 (i.e., 𝑀𝜋(𝑖),𝜋(𝑗) = 1 iff (𝑖, 𝑗) is an edge in 𝐻). For every 𝑖, 𝑗, Alice chooses a random string 𝑥𝑖,𝑗 ∈ {0, 1}𝑛 and lets 𝑦𝑖,𝑗 = 𝐺(𝑥𝑖,𝑗) ⊕ 𝑀𝑖,𝑗𝑧. She sends {𝑦𝑖,𝑗}𝑖,𝑗∈[𝑛] to Bob.

3. Bob chooses a bit 𝑏 ∈ {0, 1}.

4. If 𝑏 = 0 then Alice sends out 𝜋 and the strings {𝑥𝑖,𝑗 } for all 𝑖, 𝑗; if
𝑏 = 1 then Alice sends out the 𝑛 strings 𝑥𝜋(𝐶1 ),𝜋(𝐶2 ) , … , 𝑥𝜋(𝐶𝑛 ),𝜋(𝐶1 )
together with their indices.

5. If 𝑏 = 0 then Bob computes 𝑀 to be the 𝜋-permuted adjacency matrix of 𝐻 and verifies that all the 𝑦𝑖,𝑗's were computed from the 𝑥𝑖,𝑗's appropriately. If so then Bob accepts the proof, and otherwise he rejects it. If 𝑏 = 1 then Bob verifies that the indices of the strings {𝑥𝑖,𝑗} sent by Alice form a cycle and that indeed 𝑦𝑖,𝑗 = 𝐺(𝑥𝑖,𝑗) ⊕ 𝑧 for every string 𝑥𝑖,𝑗 that was sent by Alice. If so then Bob accepts the proof and otherwise he rejects it.
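The first message of step 2 is, in effect, a commitment: 𝑦𝑖,𝑗 = 𝐺(𝑥𝑖,𝑗) ⊕ 𝑀𝑖,𝑗𝑧 commits Alice to the bit 𝑀𝑖,𝑗 relative to Bob's random 𝑧 (this is Naor's commitment scheme; see the footnote to Theorem 13.7 below). Here is a Python sketch, with the length-tripling pseudorandom generator instantiated heuristically from SHA-256 purely for illustration:

```python
import hashlib, secrets

n = 128

def G(x):
    # A length-tripling "PRG" {0,1}^n -> {0,1}^{3n}; here a heuristic
    # SHA-256-based stand-in for a real pseudorandom generator.
    out = hashlib.sha256(b"0" + x).digest() + hashlib.sha256(b"1" + x).digest()
    return out[: 3 * n // 8]

def commit(bit, z):
    # Commit to `bit` relative to the receiver's random z in {0,1}^{3n}.
    x = secrets.token_bytes(n // 8)
    mask = z if bit else bytes(len(z))
    return bytes(a ^ b for a, b in zip(G(x), mask)), x

def verify_opening(y, z, bit, x):
    mask = z if bit else bytes(len(z))
    return y == bytes(a ^ b for a, b in zip(G(x), mask))

z = secrets.token_bytes(3 * n // 8)   # the receiver's first message (Bob's z)
y, x = commit(1, z)
assert verify_opening(y, z, 1, x)
```

Hiding follows from the pseudorandomness of 𝐺, and binding holds because with overwhelming probability over 𝑧 no string can be opened both as 𝐺(𝑥) and as 𝐺(𝑥′) ⊕ 𝑧; this disjointness is exactly the fact used in the soundness argument below.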

Theorem 13.7 — Zero Knowledge proof for Hamiltonian Cycle. Protocol ZK-Ham is a zero knowledge proof system for the language of Hamiltonian graphs.⁶

⁶ Goldreich, Micali and Wigderson were the first to come up with a zero knowledge proof for an NP complete problem, though the Hamiltonicity protocol here is from a later work by Blum. We use Naor's commitment scheme.

Proof. We need to prove completeness, soundness, and zero knowledge.
Completeness can be easily verified, and so we leave this to the reader.
For soundness, we recall that (as we've seen before) with extremely high probability the sets 𝑆0 = {𝐺(𝑥) ∶ 𝑥 ∈ {0, 1}𝑛} and 𝑆1 = {𝐺(𝑥) ⊕ 𝑧 ∶ 𝑥 ∈ {0, 1}𝑛} will be disjoint (this probability is over the choice of 𝑧 that is done by the verifier). Now, assuming this is the case, given the messages {𝑦𝑖,𝑗} sent by the prover in the first step, define an 𝑛 × 𝑛 matrix 𝑀′ with entries in {0, 1, ?} as follows: 𝑀′𝑖,𝑗 = 0 if 𝑦𝑖,𝑗 ∈ 𝑆0, 𝑀′𝑖,𝑗 = 1 if 𝑦𝑖,𝑗 ∈ 𝑆1, and 𝑀′𝑖,𝑗 = ? otherwise.

We split into two cases. The first case is that there exists some per-
mutation 𝜋 such that (i) 𝑀 ′ is a 𝜋-permuted version of the input
graph 𝐻 and (ii) 𝑀 ′ contains a Hamiltonian cycle. Clearly in this case
𝐻 contains a Hamiltonian cycle as well, and hence we don’t need to
consider it when analyzing soundness. In the other case we claim that
with probability at least 1/2 the verifier will reject the proof. Indeed, if
(i) is violated then the proof will be rejected if Bob chooses 𝑏 = 0 and
if (ii) is violated then the proof will be rejected if Bob chooses 𝑏 = 1.
We now turn to showing zero knowledge. For this we need to build
a simulator 𝑆 ∗ for an arbitrary efficient strategy 𝑉 ∗ of Bob. Recall that
𝑆 ∗ gets as input the graph 𝐻 (but not the Hamiltonian cycle 𝐶) and
needs to produce an output that is indistinguishable from the output
of 𝑉 ∗ . It will do so as follows:
0. Pick 𝑏′ ∈ {0, 1}.

1. Let 𝑧 ∈ {0, 1}3𝑛 be the first message computed by 𝑉 ∗ on input 𝐻.

2. If 𝑏′ = 0 then 𝑆 ∗ computes the second message as Alice does:


chooses a random permutation 𝜋 on {1, … , 𝑛} and let 𝑀 be the
𝜋-permuted adjacency matrix of 𝐻 (i.e., 𝑀𝜋(𝑖),𝜋(𝑗) = 1 iff (𝑖, 𝑗) is
an edge in 𝐻). In contrast, if 𝑏′ = 1 then 𝑆 ∗ lets 𝑀 be the all 1s
matrix. For every 𝑖, 𝑗, 𝑆 ∗ chooses a random string 𝑥𝑖,𝑗 ∈ {0, 1}𝑛
and let 𝑦𝑖,𝑗 = 𝐺(𝑥𝑖,𝑗 ) ⊕ 𝑀𝑖,𝑗 𝑧, where 𝐺 ∶ {0, 1}𝑛 → {0, 1}3𝑛 is a
pseudorandom generator.

3. Let 𝑏 be the output of 𝑉 ∗ when given the input 𝐻 and the first
message {𝑦𝑖,𝑗 } computed as above. If 𝑏 ≠ 𝑏′ then go back to step 0.
ze ro know l e d g e p roofs 283

4. We compute the fourth message of the protocol similarly to how Alice does it: if 𝑏 = 0 then it consists of 𝜋 and the strings {𝑥𝑖,𝑗} for all 𝑖, 𝑗; if 𝑏 = 1 then we pick a random length-𝑛 cycle 𝐶′ and the message consists of the 𝑛 strings 𝑥𝐶′1,𝐶′2, … , 𝑥𝐶′𝑛,𝐶′1 together with their indices.

5. Output whatever 𝑉 ∗ outputs when given the prior message.

We prove the output of the simulator is indistinguishable from the output of 𝑉∗ in an actual interaction by the following claims:
Claim 1: The message {𝑦𝑖,𝑗 } computed by 𝑆 ∗ is computationally
indistinguishable from the first message computed by Alice.
Claim 2: The probability that 𝑏 = 𝑏′ is at least 1/3.
Claim 3: The fourth message computed by 𝑆 ∗ is computationally
indistinguishable from the fourth message computed by Alice.
We will simply sketch here the proofs (see Goldreich’s book for
example for full proofs):
For Claim 1, note that if 𝑏′ = 0 then the message is identical to the way Alice computes it. If 𝑏′ = 1 then the difference is that 𝑆∗ computes some strings 𝑦𝑖,𝑗 of the form 𝐺(𝑥𝑖,𝑗) ⊕ 𝑧 where Alice would compute the corresponding strings as 𝐺(𝑥𝑖,𝑗). This is indistinguishable because 𝐺 is a pseudorandom generator (and the distribution 𝑈3𝑛 ⊕ 𝑧 is the same as 𝑈3𝑛).
Claim 2 is a corollary of Claim 1. If 𝑉∗ managed to pick a message 𝑏 such that Pr[𝑏 = 𝑏′] < 1/2 − 𝑛𝑒𝑔𝑙(𝑛) then in particular it could distinguish the first message of Alice (that is computed independently of 𝑏′ and hence contains no information about it) from the first message of 𝑆∗.
For Claim 3, note that again if 𝑏 = 0 then the message is computed
in a way identical to what Alice does. If 𝑏 = 1 then this message is also
computed in a way identical to Alice, since it does not matter if instead
of picking 𝐶 ′ at random, we picked a random permutation 𝜋 and let
𝐶 ′ be the image of the Hamiltonian cycle under this permutation.
This completes the proof of the theorem.

13.4.1 Why is this interesting?


The reason that a protocol for Hamiltonicity is more interesting than
a protocol for quadratic residuosity is that Hamiltonicity is an NP-
complete problem. Specifically recall the following:

• A function 𝐹 ∶ {0, 1}∗ → {0, 1} is in NP if there exists a polynomial-time algorithm 𝑉𝐹 and some integer 𝑐 such that for every 𝑥 ∈ {0, 1}∗, 𝐹(𝑥) = 1 iff there exists 𝑦 ∈ {0, 1}^{|𝑥|^𝑐} such that 𝑉𝐹(𝑥, 𝑦) = 1. Many functions of interest in all areas of math, science, engineering, and more are in the class NP.

• Let HAM ∶ {0, 1}∗ → {0, 1} be the function that maps a graph 𝐺 to 1 if and only if 𝐺 contains a Hamiltonian cycle. Then HAM ∈ NP. Indeed, this is demonstrated by the function 𝑉𝐻𝐴𝑀 such that 𝑉𝐻𝐴𝑀(𝐺, 𝐶) = 1 iff 𝐶 is a Hamiltonian cycle in the graph 𝐺 (a concrete sketch of such a verifier appears after this list).

• The function HAM is NP-complete. Specifically, for every 𝐹, 𝑉𝐹 as above, there are efficiently computable functions 𝑟, 𝑟𝐸𝑛𝑐𝑜𝑑𝑒, 𝑟𝐷𝑒𝑐𝑜𝑑𝑒 that satisfy the following:

a. (Completeness of reduction.) For every 𝑥, 𝑦 such that 𝑉𝐹 (𝑥, 𝑦) =


1, 𝑉𝐻𝐴𝑀 (𝑟(𝑥), 𝑟𝐸𝑛𝑐𝑜𝑑𝑒 (𝑥, 𝑦)) = 1. In particular this means that for
every 𝑥 such that 𝐹 (𝑥) = 1, HAM(𝑟(𝑥)) = 1. (Can you see why?)
b. (Soundness of reduction.) For every 𝑥 ∈ {0, 1}∗ , if there exists
𝐶 such that 𝑉𝐻𝐴𝑀 (𝑟(𝑥), 𝐶) = 1 then 𝑉𝐹 (𝑥, 𝑟𝐷𝑒𝑐𝑜𝑑𝑒 (𝑥, 𝐶)) = 1. In
particular this means that for every 𝑥 such that HAM(𝑟(𝑥)) = 1,
𝐹 (𝑥) = 1. (Can you see why?)
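For concreteness, here is a Python sketch of a verifier 𝑉𝐻𝐴𝑀 as referenced in the list above:

```python
def V_HAM(G_adj, C):
    # Checks that C is a Hamiltonian cycle in the graph given by the
    # n x n adjacency matrix G_adj: C must visit every vertex exactly
    # once, and consecutive vertices (cyclically) must be edges.
    n = len(G_adj)
    if sorted(C) != list(range(n)):
        return 0
    return int(all(G_adj[C[i]][C[(i + 1) % n]] for i in range(n)))

triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
assert V_HAM(triangle, [0, 1, 2]) == 1       # a triangle is Hamiltonian
assert V_HAM(triangle, [0, 0, 2]) == 0       # repeated vertex: rejected
```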

Using the reduction above, we can transform the zero-knowledge proof for Hamiltonicity into a zero knowledge proof for every 𝐹 ∈ NP.
Specifically, to prove that 𝐹 (𝑥) = 1, the verifier and prover will use the
following system (see also Fig. 13.1).

1. Public input: 𝑥. Prover’s private input: 𝑦 such that 𝑉𝐹 (𝑥, 𝑦) = 1.

2. Verifier and prover will compute 𝐺 = 𝑟(𝑥). Prover will compute 𝐶 = 𝑟𝐸𝑛𝑐𝑜𝑑𝑒(𝑥, 𝑦).

3. Verifier and prover run the Hamiltonicity zero knowledge protocol, with public input 𝐺 and prover's private input 𝐶. The verifier's output is the output in this protocol.

Figure 13.1: Using a zero knowledge protocol for Hamiltonicity we can obtain a zero knowledge protocol for any language 𝐿 in NP. For example, if the public input is a SAT formula 𝜑 and the Prover's secret input is a satisfying assignment 𝑥 for 𝜑 then the verifier can run the reduction on 𝜑 to obtain a graph 𝐻 and the prover can run the same reduction to obtain from 𝑥 a Hamiltonian cycle 𝐶 in 𝐻. They can then run the ZK-Ham protocol to prove that indeed 𝐻 is Hamiltonian (and hence the original formula was satisfiable) without revealing any information the verifier could not have obtained on his own.

P
Please make sure that you understand why this
will give a zero knowledge proof for 𝐹 , and in par-
ticular satisfy the completeness, soundness, and
zero-knowledge properties.
Note that while the NP completeness of Hamiltonicity
(and the Cook-Levin Theorem in general) is usually
perceived as a negative result (showing evidence for
the non-existence of an algorithm), in this context we
use it to obtain a positive result (zero knowledge proof
systems for many interesting functions).

This means that for every other NP language 𝐿, we can use the
reduction from 𝐿 to Hamiltonicity combined with protocol ZK-Ham
to give a zero knowledge proof system for 𝐿. In particular this means
that we can have zero knowledge proofs for the following languages:

• The language of numbers 𝑚 such that there exists a prime 𝑝 dividing 𝑚 whose remainder modulo 10 is 7.

• The language of tuples 𝑋, 𝑒, 𝑐1, … , 𝑐𝑛 such that 𝑐𝑖 is an encryption of a number 𝑥𝑖 with ∑ 𝑥𝑖 = 𝑋. (This is essentially what we needed in the voting example above.)

• For every efficient function 𝐺, the language of pairs 𝑥, 𝑦 such that there exists some input 𝑟 satisfying 𝑦 = 𝐺(𝑥‖𝑟). (This is what we often need in the “protocol compiling” applications to show that a particular output was produced by the correct program 𝐺 on public input 𝑥 and private input 𝑟.)

13.5 PARALLEL REPETITION AND TURNING ZERO KNOWLEDGE PROOFS TO SIGNATURES.
While we talked about amplifying zero knowledge proofs by running
them 𝑛 times one after the other, one could also imagine running the
𝑛 copies in parallel. It is not trivial that we get the same benefit of re-
ducing the error to 2−𝑛 but it turns out that we do in the cases we are
interested in here. Unfortunately, zero knowledge is not necessarily
preserved. It’s an important open problem whether zero knowledge is
preserved for the ZK-Ham protocol mentioned above.
However, Fiat and Shamir showed that in protocols (such as the ones we showed here) where the verifier only sends random bits, if we replace this verifier by a random function, then both soundness and zero knowledge are preserved.
version of these protocols in the random oracle model, and this is
indeed widely used. Schnorr designed signatures based on this non
interactive version.
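To illustrate, here is a Python sketch of Schnorr-style signatures obtained via the Fiat-Shamir transform: the verifier's random challenge is replaced by a hash of the prover's first message together with the message being signed. The group parameters are toy stand-ins (in particular, the group order here is composite), so this shows only the structure, not a secure instantiation.

```python
import hashlib, secrets

p, g = 2**127 - 1, 3                  # toy group parameters (not secure)
q = p - 1                             # exponent modulus

def H(R, msg):
    # Fiat-Shamir: the "verifier" is a hash of the transcript so far.
    data = R.to_bytes(16, "big") + msg
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def keygen():
    x = secrets.randbelow(q - 1) + 1
    return x, pow(g, x, p)            # secret x, public pk = g^x

def sign(x, msg):
    k = secrets.randbelow(q - 1) + 1  # prover's first message: R = g^k
    R = pow(g, k, p)
    c = H(R, msg)                     # hash replaces the random challenge
    return R, (k + c * x) % q         # Sigma-protocol response

def verify(pk, msg, sig):
    R, z = sig
    return pow(g, z, p) == (R * pow(pk, H(R, msg), p)) % p

sk, pk = keygen()
sig = sign(sk, b"hello")
assert verify(pk, b"hello", sig)
assert not verify(pk, b"goodbye", sig)
```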

13.5.1 “Bonus features” of zero knowledge


The following properties of zero knowledge systems are used in the
literature. We might cover some in class, but mention them here.
These are covered in Chapter 20 of Boneh-Shoup.

• Proof of knowledge - it can be shown that the proof above of Hamiltonicity yields more than soundness. We can “extract” from a prover strategy that succeeds in convincing the verifier that 𝐺 is Hamiltonian with probability larger than 1/2 an actual Hamiltonian cycle. This means that the prover didn't just convince the verifier that there exists a Hamiltonian cycle in the graph 𝐺 but also that the prover “knows” it. This notion is known as a “proof of knowledge”.

• Arguments - if a proof system only satisfies the soundness condition with respect to polynomial-time provers, then it is called an argument system.

• Succinct proofs - proofs that 𝐹(𝑥) = 1 where total communication is a fixed polynomial in 𝑛 independently of the time to verify 𝐹.

Combining succinct zero-knowledge proofs with the Fiat-Shamir heuristic for non-interactivity leads to the notion of zero-knowledge succinct arguments or ZK-SNARGs. If these also satisfy a “proof of knowledge” property then they are called ZK-SNARKs. These have recently been of great interest for crypto-currencies. See lectures 16-18 in Stanford CS 251, as well as this blog post.
14
Fully homomorphic encryption: Introduction and bootstrapping

In today's era of “cloud computing”, much of individuals' and businesses' data is stored and computed on by third parties such as Google, Microsoft, Apple, Amazon, Facebook, Dropbox and many others. Classically, cryptography provided solutions to protecting data in motion from point A to point B. But these are not always sufficient to protect data at rest and particularly data in use. For example, suppose that Alice has some data 𝑥 ∈ {0, 1}𝑛 (in modern applications 𝑥 could well be terabytes in length or larger) that she wishes to store with the cloud service Bob, but she is afraid that Bob will be hacked or subpoenaed, or she simply does not completely trust Bob.
Encryption does not seem to immediately solve the problem. Alice
could store at Bob an encrypted version of the data and keep the secret
key for herself. But then she would be at a loss if she wanted to do
anything more with the data other than retrieving particular blocks of
it. If she wanted to outsource computation to Bob as well, and com-
pute 𝑓(𝑥) for some function 𝑓, then she would need to share the secret
key with Bob, thus defeating the purpose of encrypting the data in the
first place.
For example, after the computing systems of the Office of Personnel Management (OPM) were discovered to be hacked in June of 2015,
revealing sensitive information, including fingerprints and all data
gathered during security clearance checks of up to 18 million people,
DHS assistant secretary for cybersecurity and communications Andy
Ozment said that encryption wouldn’t have helped preventing it since
“if an adversary has the credentials of a user on the network, then they
can access data even if it’s encrypted, just as the users on the network
have to access data”. So, can we encrypt data in a way that still allows
some access and computing on it?
Already in 1978, Rivest, Adleman and Dertouzos considered this
problem of a business that wishes to use a “commercial time-sharing


service” to store some sensitive data. They envisioned a potential solution for this task which they called a privacy homomorphism. This no-
tion later became known as fully homomorphic encryption (FHE) which
is an encryption that allows a party (such as the cloud provider) that
does not know the secret key to modify a ciphertext 𝑐 encrypting 𝑥 to a
ciphertext 𝑐′ encrypting 𝑓(𝑥) for every efficiently computable 𝑓(). In
particular in our scenario above (see Fig. 14.1), such a scheme will
allow Bob, given an encryption of 𝑥, to compute the encryption of
𝑓(𝑥) and send this ciphertext to Alice without ever getting the secret
key and so without ever learning anything about 𝑥 (or 𝑓(𝑥) for that
matter).
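Fully homomorphic encryption itself is too involved to sketch here, but the interface can be illustrated with a much weaker “privacy homomorphism”: in the toy ElGamal sketch below (illustrative parameters, not secure), anyone can multiply two ciphertexts, without the secret key, and obtain an encryption of the product of the plaintexts. FHE demands the analogous property for every efficiently computable function.

```python
import secrets

p, g = 2**127 - 1, 3                     # toy parameters (not secure)

def keygen():
    d = secrets.randbelow(p - 2) + 1
    return pow(g, d, p), d

def enc(e, m):
    s = secrets.randbelow(p - 2) + 1
    return pow(g, s, p), (pow(e, s, p) * m) % p

def dec(d, ct):
    a, b = ct
    return (b * pow(a, p - 1 - d, p)) % p

def eval_mult(c1, c2):
    # The "cloud's" operation: componentwise multiplication turns
    # encryptions of m1, m2 into an encryption of m1 * m2, all
    # without ever seeing the secret key.
    return (c1[0] * c2[0]) % p, (c1[1] * c2[1]) % p

e, d = keygen()
assert dec(d, eval_mult(enc(e, 3), enc(e, 5))) == 15
```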

Figure 14.1: A fully homomorphic encryption can be used to store data on the cloud in encrypted form, but still have the cloud provider be able to evaluate functions on the data in encrypted form (without ever learning either the inputs or the outputs of the function they evaluate).

Unlike the case of a trapdoor function, where it only took a year for
Diffie and Hellman’s challenge to be answered by RSA, in the case of
fully homomorphic encryption for more than 30 years cryptographers
had no constructions achieving this goal. In fact, some people sus-
pected that there is something inherently incompatible between the
security of an encryption scheme and the ability of a user to perform
all these operations on ciphertexts. Stanford cryptographer Dan Boneh
used to joke to incoming graduate students that he will immediately
sign the thesis of anyone who came up with a fully homomorphic en-
cryption. But he never expected that he would actually encounter such
a thesis, until in 2009, Boneh’s student Craig Gentry released a paper
doing just that. Gentry’s paper shook the world of cryptography, and
instigated a flurry of research results making his scheme more effi-
cient, reducing the assumptions it relied on, extending and applying
it, and much more. In particular, Brakerski and Vaikuntanathan man-
aged to obtain a fully homomorphic encryption scheme based only on
the Learning with Error (LWE) assumption we have seen before.
Although there are a number of implementations for (partially and)
fully homomorphic encryption (see this list), there is still much work
to be done in order to realize the full practical potential of FHE. For
a comparable level of security, the encryption and decryption oper-
fu l ly homomorp hi c e nc ry p ti on : i n trod u c ti on a n d bootstra pp i ng 289

ations of a fully homomorphic encryption scheme are several orders


of magnitude slower than a conventional public key system, and (de-
pending on its complexity) homomorphically evaluating a circuit can
be significantly more taxing. However, this is a fast evolving field,
and already since 2009 significant optimizations have been discovered
that reduced the computational and storage overhead by many or-
ders of magnitude. As in public key encryption, one would imagine
that for larger data one would use a “hybrid” approach of combining
FHE with symmetric encryption, though one might need to come up
with tailor-made symmetric encryption schemes that can be efficiently
evaluated.¹ Homomorphic evaluation of approximate computations,
which can be useful for machine learning, can be done more efficiently.

¹ In 2015 the state of the art on homomorphically evaluating AES was
about 6 seconds of computation per block, using about 4GB of memory
in total for 180 blocks. See also this paper. In contrast, modern
processors can evaluate tens to hundreds of millions of AES blocks per
second.

In this lecture and the next one we will focus on the fully homo-
morphic encryption schemes that are easiest to describe, rather than the
ones that are most efficient (though the efficient schemes share many
similarities with the ones we will talk about). As is generally the case
for lattice based encryption, the current most efficient schemes are
based on ideal lattices and on assumptions such as ring LWE or the
security of the NTRU cryptosystem.²

² As we mentioned before, as a general rule of thumb, the difference
between the ideal schemes and the one that we describe is that in the
ideal setting one deals with structured matrices that have a compact
representation as a single vector and also enable fast FFT-like
matrix-vector multiplication. This saves a factor of about 𝑛 in the
storage and computation requirements (where 𝑛 is the dimension of the
subspace/lattice). However, there can be some subtle security
implications for ideal lattices as well, see e.g., here, here, here,
and here.

R
Remark 14.1 — Lesson from verifying computation. To take the distance
between theory and practice in perspective, it might be useful to
consider the case of verifying computation. In the early 1990’s
researchers (motivated initially by zero knowledge proofs) came up
with the notion of probabilistically checkable proofs (PCPs), which
could yield in principle extremely succinct ways to check correctness
of computation.
Probabilistically checkable proofs can be thought of as “souped up”
versions of NP completeness reductions and, like these reductions,
have been mostly used for negative results, especially since the
initial proofs were extremely complicated and also included enormous
hidden constants. However, with time people have slowly understood
these better and made them more efficient (e.g., see this survey) and
it has now reached the point where these results are practical (see
also this), and in fact these ideas underlie at least two startups.
Overall, constructions for verifying computation have improved by at
least 20 orders of magnitude over the last two decades. (We will talk
about some of these constructions later in this course.) If progress
on fully homomorphic encryption follows a similar trajectory, then we
can expect the road to practical utility to be very long, but there is
hope that it’s not a “bridge to nowhere”.

R
Remark 14.2 — Poor man’s FHE via hardware. Since large
scale fully homomorphic encryption is still impracti-
cal, people have been trying to achieve at least weaker
security goals using certain assumptions. In particular
Intel chips have so called “Secure enclaves” which one
can think of as a somewhat tamper-protected region
of the processor that is supposed to be out of reach for
the outside world. The idea is that a client of the cloud provider
would treat this enclave as a trusted party that
it can communicate with through the cloud provider.
The client can store their data on the cloud encrypted
with some key 𝑘, and then set up a secure channel
with the enclave using an authenticated key exchange
protocol, and send 𝑘 over. Then, when the client sends
over a function 𝑓 to the cloud provider, the latter party
can simulate FHE by asking the enclave to compute
the encryption of 𝑓(𝑥) given the encryption of 𝑥. In
this solution ultimately the private key does reside on
the cloud provider’s computers, and the client has to
trust the security of the enclave. In practice, this could
provide reasonable security against remote hackers,
but (unlike FHE) probably not against sophisticated
attackers (e.g., governments) that have physical access
to the server.

14.1 DEFINING FULLY HOMOMORPHIC ENCRYPTION


We start by defining partially homomorphic encryption. We focus on en-
cryption for single bits. This is without loss of generality for CPA secu-
rity (CCA security is anyway ruled out for homomorphic encryption;
can you see why?), though there are more efficient constructions that
encrypt several bits at a time.

Definition 14.3 — Partially Homomorphic Encryption. Let ℱ = ∪ℱℓ be a
class of functions where every 𝑓 ∈ ℱℓ maps {0, 1}ℓ to {0, 1}.
An ℱ-homomorphic public key encryption scheme is a CPA secure
public key encryption scheme (𝐺, 𝐸, 𝐷) such that there exists a
polynomial-time algorithm EVAL ∶ {0, 1}∗ → {0, 1}∗ such that for
every (𝑒, 𝑑) = 𝐺(1𝑛 ), ℓ = 𝑝𝑜𝑙𝑦(𝑛), 𝑥1 , … , 𝑥ℓ ∈ {0, 1}, and 𝑓 ∈ ℱℓ of
description size |𝑓| at most 𝑝𝑜𝑙𝑦(ℓ) it holds that:

• 𝑐 = EVAL𝑒 (𝑓, 𝐸𝑒 (𝑥1 ), … , 𝐸𝑒 (𝑥ℓ )) has length at most 𝑛.

• 𝐷𝑑 (𝑐) = 𝑓(𝑥1 , … , 𝑥ℓ ).

P
Please stop and verify you understand the defini-
tion. In particular you should understand why some
bound on the length of the output of EVAL is needed
to rule out trivial constructions that are the analog
of the cloud provider sending over to Alice the
entire encrypted database every time she wants to
evaluate a function of it. By artificially increasing the
randomness for the key generation algorithm, this
is equivalent to requiring that |𝑐| ≤ 𝑝(𝑛) for some
fixed polynomial 𝑝(⋅) that does not grow with ℓ or |𝑓|.
You should also understand the distinction between
ciphertexts that are the output of the encryption algo-
rithm on the plaintext 𝑏, and ciphertexts that decrypt
to 𝑏, see Fig. 14.2.

Figure 14.2: In a valid encryption scheme 𝐸, the set


of ciphertexts 𝑐 such that 𝐷𝑑 (𝑐) = 𝑏 is a superset
of the set of ciphertexts 𝑐 such that 𝑐 = 𝐸𝑒 (𝑏; 𝑟)
for some 𝑟 ∈ {0, 1}𝑡 where 𝑡 is the number of
random bits used by the encryption algorithm. Our
definition of partially homomorphic encryption
scheme requires that for every 𝑓 ∶ {0, 1}ℓ → {0, 1}
in our family and 𝑥 ∈ {0, 1}ℓ , if 𝑐𝑖 ∈ 𝐸𝑒 (𝑥𝑖 ; {0, 1}𝑡 )
for 𝑖 = 1...ℓ then EVAL(𝑓, 𝑐1 , … , 𝑐ℓ ) is in the superset
{𝑐 | 𝐷𝑑 (𝑐) = 𝑓(𝑥)} of 𝐸𝑒 (𝑓(𝑥); {0, 1}𝑡 ). For example
if we apply EVAL to the OR function and ciphertexts
𝑐, 𝑐′ that were obtained as encryptions of 1 and 0
respectively, then the output is a ciphertext 𝑐″ that
would be decrypted to OR(1, 0) = 1, even if 𝑐″
is not in the smaller set of possible outputs of the
encryption algorithm on 1. This distinction between
the smaller and larger set is the reason why we cannot
automatically apply the EVAL function to ciphertexts
that are obtained from the outputs of previous EVAL
operations.

A fully homomorphic encryption is simply a partially homomor-


phic encryption scheme for the family ℱ of all functions, where the
description of a function is as a circuit (say composed of NAND gates,
which are known to be a universal basis).
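As a quick reminder of why supporting just a NAND operation suffices, here is a tiny illustrative Python check (not from the original text) that XOR, for instance, can be built from four NANDs:

```python
# A tiny sanity check that NAND is universal: XOR written with four
# NANDs. Purely illustrative of why homomorphically evaluating a single
# NAND gate suffices for evaluating arbitrary circuits.
def nand(a, b):
    return 1 - (a & b)

def xor(a, b):
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

assert all(xor(a, b) == (a ^ b) for a in (0, 1) for b in (0, 1))
```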

14.1.1 Another application: fully homomorphic encryption for verifying computation
The canonical application of fully homomorphic encryption is for a
client to store encrypted data 𝐸(𝑥) on a server, send a function 𝑓 to the
server, and get back the encryption 𝐸(𝑓(𝑥)) of 𝑓(𝑥). This ensures that
the server does not learn any information about 𝑥, but does not ensure
that it actually computes the correct function!
Here is a cute protocol to achieve the latter goal (due to Chung,
Kalai, and Vadhan). Curiously, the protocol involves “doubly encrypt-
ing” the input, and homomorphically evaluating the EVAL function


itself.

• Assumptions: We assume that all functions 𝑓 that the client will be


interested in can be described by a string of length 𝑛.

• Preprocessing: The client generates a pair of keys (𝑒, 𝑑). In the


initial stage the client computes the encrypted database 𝑐 = 𝐸𝑒 (𝑥)
and sends 𝑐, 𝑒 to the server. It also computes 𝑐∗ = 𝐸𝑒 (𝑓 ∗ ) for some
function 𝑓 ∗ as well as 𝑐∗∗ = EVAL𝑒 (𝑒𝑣𝑎𝑙, 𝑐∗ ‖𝑐) for that 𝑓 ∗ and keeps
𝑐∗ , 𝑐∗∗ for herself, where 𝑒𝑣𝑎𝑙(𝑓, 𝑥) = 𝑓(𝑥) is the circuit evaluation
function.

• Client query: To ask for an evaluation of 𝑓, the client generates a


new random FHE keypair (𝑒′ , 𝑑′ ), chooses 𝑏 ←𝑅 {0, 1} and lets
𝑐𝑏 = 𝐸𝑒′ (𝐸𝑒 (𝑓)) and 𝑐1−𝑏 = 𝐸𝑒′ (𝑐∗ ). It sends the triple 𝑒′ , 𝑐0 , 𝑐1 to
the server.

• Server response: Given the queries 𝑐0 , 𝑐1 , the server defines the


function 𝑔 ∶ {0, 1}∗ → {0, 1}∗ where 𝑔(𝑧) = EVAL𝑒 (𝑒𝑣𝑎𝑙, 𝑧‖𝑐) (for
the fixed 𝑐 received in preprocessing) and computes 𝑐′0 , 𝑐′1 where 𝑐′𝑏 = EVAL𝑒′ (𝑔, 𝑐𝑏 ).
(Please pause here and make sure you understand what this step is
doing! Note that we use here crucially the fact that EVAL itself is a
polynomial time computation.)

• Client check: The client checks whether 𝐷𝑑′ (𝑐′1−𝑏 ) = 𝑐∗∗ and if so
accepts 𝐷𝑑 (𝐷𝑑′ (𝑐′𝑏 )) as the answer.

We claim that if the server cheats then the client will detect this
with probability 1/2 − 𝑛𝑒𝑔𝑙(𝑛). Working this out is a great exercise.
(The intuition is that by CPA security the server cannot tell which of
𝑐0 , 𝑐1 corresponds to the “dummy” query 𝑐∗ , so any cheating strategy
risks corrupting the dummy answer, which the client checks against 𝑐∗∗ .)
The probability of detection can be amplified to 1 − 𝑛𝑒𝑔𝑙(𝑛) using
appropriate repetition, see the paper for details.

14.2 EXAMPLE: AN XOR HOMOMORPHIC ENCRYPTION


It turns out that Regev’s LWE-based encryption LWEENC we saw be-
fore is homomorphic with respect to the class of linear (mod 2) func-
tions. Let us recall the LWE assumption and the encryption scheme
based on it.

Definition 14.4 — DLWE (simplified variant). Let 𝑞 = 𝑞(𝑛) be some
function mapping the natural numbers to primes. The 𝑞(𝑛)-decision
learning with error (𝑞(𝑛)-dLWE) conjecture is the following: for every
𝑚 = 𝑝𝑜𝑙𝑦(𝑛) there is a distribution LWE𝑞 over pairs (𝐴, 𝑠) such that:

• 𝐴 is an 𝑚 × 𝑛 matrix over ℤ𝑞 and 𝑠 ∈ ℤ𝑛𝑞 satisfies 𝑠1 = ⌊𝑞/2⌋ and
|(𝐴𝑠)𝑖 | ≤ √𝑞 for every 𝑖 ∈ {1, … , 𝑚}.

• The distribution 𝐴 where (𝐴, 𝑠) is sampled from LWE𝑞 is com-


putationally indistinguishable from the uniform distribution of
𝑚 × 𝑛 matrices over ℤ𝑞 .

The dLWE conjecture is that 𝑞(𝑛)-dLWE holds for every 𝑞(𝑛) that is
at most 𝑝𝑜𝑙𝑦(𝑛). This is not exactly the same phrasing we used before,
but as we sketch below, it is essentially equivalent to it. One can also
make the stronger conjecture that 𝑞(𝑛)-dLWE holds even for 𝑞(𝑛)
that is super polynomial in 𝑛 (e.g., 𝑞(𝑛) of magnitude roughly 2^√𝑛 - note
that such a number can still be described in 𝑛 bits and we can still
efficiently perform operations such as addition and multiplication
modulo 𝑞). This stronger conjecture also seems well supported by
evidence and we will use it in future lectures.

P
It is a good idea for you to pause here and try to show
the equivalence on your own.

Equivalence between LWE and DLWE: The reason the two conjectures
are equivalent is the following. Before, we phrased the conjecture as
recovering 𝑠′ from a pair (𝐴′ , 𝑦) where 𝑦 = 𝐴′ 𝑠′ + 𝑒 and |𝑒𝑖 | ≤ 𝛿𝑞 for
every 𝑖. We then showed a search to decision reduction (Theorem 11.2)
demonstrating that this is equivalent to the task of distinguishing
between this case and the case that 𝑦 is a random vector. If we now let
𝛼 = ⌊𝑞/2⌋ and 𝛽 = 𝛼−1 (mod 𝑞), and consider the matrix 𝐴 = (−𝛽𝑦 ∣ 𝐴′ )
and the column vector 𝑠 = (𝛼, 𝑠′ ), we see that 𝐴𝑠 = −𝑒, which satisfies
the same magnitude bound as 𝑒. Note that if 𝑦 is a random vector in ℤ𝑚𝑞
then so is −𝛽𝑦, and so the current form of the conjecture follows from
the previous one. (To reduce the number of free parameters, we fixed 𝛿
to equal 1/√𝑞; in this form the conjecture becomes stronger as 𝑞 grows.)

A linearly-homomorphic encryption scheme: The following variant of the
LWE-ENC scheme described in Section 11.4 turns out to be linearly
homomorphic:

LWE-ENC’ encryption:

• Key generation: Choose (𝐴, 𝑠) from LWE𝑞 where 𝑚 satisfies
𝑞^{1/4} ≫ 𝑚 log 𝑞 ≫ 𝑛.
• To encrypt 𝑏 ∈ {0, 1}, choose 𝑤 ∈ {0, 1}𝑚 and out-
put 𝑤⊤ 𝐴 + (𝑏, 0, … , 0).
• To decrypt 𝑐 ∈ ℤ𝑛𝑞 , output 0 iff |⟨𝑐, 𝑠⟩| ≤ 𝑞/10,
where for 𝑥 ∈ ℤ𝑞 we defined |𝑥| = min{𝑥, 𝑞 − 𝑥}.
(Recall that the first coordinate of 𝑠 is ⌊𝑞/2⌋.)

The decryption algorithm recovers the original plaintext since
⟨𝑐, 𝑠⟩ = 𝑤⊤ 𝐴𝑠 + 𝑠1 𝑏 and |𝑤⊤ 𝐴𝑠| ≤ 𝑚√𝑞 ≪ 𝑞. It turns out that this
scheme is homomorphic with respect to the class of linear functions
modulo 2. Specifically we make the following claim:
Lemma 14.5 For every ℓ ≪ 𝑞 1/4 , there is an algorithm EVALℓ that on
input 𝑐1 , … , 𝑐ℓ which are LWEENC-encryptions of the bits 𝑏1 , … , 𝑏ℓ ∈
{0, 1}, outputs a ciphertext 𝑐 whose decryption is 𝑏1 ⊕ ⋯ ⊕ 𝑏ℓ .

P
This claim is not hard to prove, but working it out for
yourself can be a good way to get more familiarity
with LWE-ENC’ and the kind of manipulations we’ll
be making time and again in the constructions of
many lattice based cryptographic primitives. Recall
that a ciphertext 𝑐 of LWE-ENC’ is a vector in ℤ𝑛𝑞 . Try
to show that 𝑐 = 𝑐1 + ⋯ + 𝑐ℓ (where addition is done
as vectors in ℤ𝑞 ) will be the encryption of 𝑏1 ⊕ ⋯ ⊕ 𝑏ℓ .
Note that if 𝑞 is super polynomial in 𝑛 then ℓ can be an
arbitrarily large polynomial in 𝑛.

Proof of Lemma 14.5. The proof is quite simple. EVAL will simply add
the ciphertexts as vectors in ℤ𝑞 . If 𝑐 = ∑ 𝑐𝑖 then

⟨𝑐, 𝑠⟩ = ∑ 𝑏𝑖 ⌊𝑞/2⌋ + 𝜉 mod 𝑞

where 𝜉 ∈ ℤ𝑞 is a “noise term” such that |𝜉| ≤ ℓ𝑚√𝑞 ≪ 𝑞.
Since |⌊𝑞/2⌋ − 𝑞/2| < 1, adding at most ℓ terms of this difference adds
at most ℓ, and so we can also write

⟨𝑐, 𝑠⟩ = ⌊∑ 𝑏𝑖 𝑞/2⌋ + 𝜉 ′ mod 𝑞

for |𝜉 ′ | ≤ ℓ𝑚√𝑞 + ℓ ≪ 𝑞.
If ∑ 𝑏𝑖 is even then ∑ 𝑏𝑖 𝑞/2 is an integer multiple of 𝑞 and hence in
this case |⟨𝑐, 𝑠⟩| ≪ 𝑞. If ∑ 𝑏𝑖 is odd then ⌊∑ 𝑏𝑖 𝑞/2⌋ = ⌊𝑞/2⌋ mod 𝑞 and
so in this case |⟨𝑐, 𝑠⟩| = 𝑞/2 ± 𝑜(𝑞) > 𝑞/10.

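To make the XOR homomorphism concrete, here is a small numpy sanity check of Lemma 14.5. The parameters are assumed toy values (far too small to be secure), and we cheat in key generation by constructing 𝐴 directly so that 𝐴𝑠 = 𝑒 rather than sampling a real LWE instance; this suffices to check correctness:

```python
# Toy correctness demo (NOT a secure instantiation) of LWE-ENC' and its
# XOR homomorphism. We force A s = e (mod q) for a small e by solving
# for the first column of A, using that s_1 = q//2 is invertible mod an
# odd q.
import numpy as np

rng = np.random.default_rng(0)
n, m, q = 20, 40, 2**20 + 7          # assumed toy parameters

s = rng.integers(0, q, size=n)
s[0] = q // 2                        # first coordinate is floor(q/2)
s1_inv = pow(int(s[0]), -1, q)       # q odd => gcd(q//2, q) = 1

A = rng.integers(0, q, size=(m, n))
e = rng.integers(-3, 4, size=m)      # small noise, |e_i| << sqrt(q)
A[:, 0] = ((e - A[:, 1:] @ s[1:]) % q) * s1_inv % q   # force A s = e

def enc(b):                          # w^T A + (b, 0, ..., 0)
    w = rng.integers(0, 2, size=m)
    c = w @ A % q
    c[0] = (c[0] + b) % q
    return c

def dec(c):
    x = int(np.dot(c, s) % q)
    return 0 if min(x, q - x) <= q // 10 else 1

bits = [1, 0, 1, 1]
c_sum = np.sum([enc(b) for b in bits], axis=0) % q    # EVAL: add mod q
assert dec(c_sum) == sum(bits) % 2                    # decrypts to the XOR
```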
Several other encryption schemes are also homomorphic with


respect to linear functions. Even before Gentry’s construction there
were constructions of encryption schemes that are homomorphic with
respect to somewhat larger classes (e.g., quadratic functions by Boneh,
Goh and Nissim) but not significantly so.

14.2.1 Abstraction: A trapdoor pseudorandom generator.


It is instructive to consider the following abstraction (which we’ll
use in the next lecture) of the above encryption scheme as a trapdoor
generator (see Fig. 14.3). On input 1𝑛 the key generation algorithm
outputs a vector 𝑠 ∈ ℤ𝑛𝑞 with 𝑠1 = ⌊𝑞/2⌋ and a probabilistic algorithm
𝐺𝑠 such that the following holds:

• Any polynomial number of samples from the distribution 𝐺𝑠 (1𝑛 ) is


computationally indistinguishable from independent samples from
the uniform distribution over ℤ𝑛𝑞 .

• If 𝑐 is output by 𝐺𝑠 (1𝑛 ) then |⟨𝑐, 𝑠⟩| ≤ 𝑛√𝑞.

The generator 𝐺𝑠 picks 𝑤 ←𝑅 {0, 1}𝑚 and outputs 𝑤⊤ 𝐴. Its output
will look pseudorandom but will satisfy the condition
|⟨𝐺𝑠 (1𝑛 ), 𝑠⟩| ≤ 𝑛√𝑞 with probability 1 over the choice of 𝑤. Thus 𝑠
can be thought of as a “trapdoor” for the generator that allows us to
distinguish between a random vector 𝑐 ∈ ℤ𝑛𝑞 (which with high
probability would satisfy |⟨𝑐, 𝑠⟩| ≫ 𝑛√𝑞, assuming 𝑞 ≫ 𝑛2 ) and an
output of the generator.
We use 𝐺𝑠 to encrypt a bit 𝑏 by letting 𝑐 ←𝑅 𝐺𝑠 (1𝑛 ) and outputting
𝑐 + (𝑏, 0, … , 0)⊤ . While our particular implementation mapped
𝐺𝑠 (𝑤) = 𝑤⊤ 𝐴, we can ignore these implementation details in the
foregoing.

Figure 14.3: In a trapdoor generator, we have two ways to generate
randomized algorithms. That is, we have some algorithms GEN and GEN′
such that GEN outputs a pair (𝐺𝑠 , 𝑠) and GEN′ outputs 𝐺′ , with
𝐺𝑠 , 𝐺′ being themselves algorithms (e.g., randomized circuits). The
conditions we require are that (1) the descriptions of the circuits 𝐺𝑠
and 𝐺′ (considering them as distributions over strings) are
computationally indistinguishable, (2) the distribution 𝐺′ (1𝑛 ) is
statistically indistinguishable from the uniform distribution, and (3)
there is an efficient algorithm that given the secret “trapdoor” 𝑠 can
distinguish the output of 𝐺𝑠 from the uniform distribution. In
particular (1), (2), and (3) together imply that it is not feasible to
extract 𝑠 from the description of 𝐺𝑠 .

Our LWE-based trapdoor generator satisfies the following stronger


property: we can generate an alternative generator 𝐺′ such that the
description of 𝐺′ is indistinguishable from the description of 𝐺𝑠 but
such that 𝐺′ actually does produce (up to exponentially small statisti-
cal error) the uniform distribution over ℤ𝑛𝑞 . We can do so by sampling
𝐴 completely at random instead of from the LWE𝑞 distribution. We
can define trapdoor generators formally as follows

Definition 14.6 — Trapdoor generators. A trapdoor generator is a pair
of randomized algorithms GEN, GEN′ that satisfy the following:


• On input 1𝑛 , GEN outputs a pair (𝐺𝑠 , 𝑠) where 𝐺𝑠 is a string
describing a randomized circuit. The circuit 𝐺𝑠 takes 1𝑛 as input
and outputs a (randomly chosen) string of length 𝑡, where 𝑡 = 𝑡(𝑛) is
some polynomial.

• On input 1𝑛 , GEN′ outputs 𝐺′ where 𝐺′ is a string describing a
randomized circuit with the same inputs and outputs.

• The distributions GEN(1𝑛 )1 (i.e., the first output of GEN(1𝑛 )) and
GEN′ (1𝑛 )1 are computationally indistinguishable. (These are both
distributions over circuits.)

• With probability 1 − 𝑛𝑒𝑔𝑙(𝑛) over the choice of 𝐺′ output by GEN′ ,
the distribution 𝐺′ (1𝑛 ) is statistically indistinguishable (i.e.,
within 𝑛𝑒𝑔𝑙(𝑛) total variation distance) from 𝑈𝑡 (i.e., the uniform
distribution over {0, 1}𝑡 ).

• There is an efficient algorithm 𝑇 such that for every pair (𝐺𝑠 , 𝑠)
output by GEN, Pr[𝑇 (𝑠, 𝐺𝑠 (1𝑛 )) = 1] ≥ 1 − 𝑛𝑒𝑔𝑙(𝑛) (where this
probability is over the internal randomness used by 𝐺𝑠 on the input
1𝑛 ), but Pr[𝑇 (𝑠, 𝑈𝑡 ) = 1] ≤ 1/3.³

³ The choice of 1/3 is arbitrary, and can be amplified as needed.

P
This is not an easy definition to parse, but looking at
Fig. 14.3 can help. Make sure you understand why
LWEENC gives rise to a trapdoor generator satisfying
all the conditions of Definition 14.6.
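As a concrete (toy, insecure) sanity check of the last condition, the sketch below implements the distinguisher 𝑇 for the LWE-based generator. All parameters are assumed illustrative values, and key generation again cheats by constructing 𝐴 with 𝐴𝑠 = 𝑒 directly:

```python
# Given the trapdoor s, T accepts outputs of G_s essentially always,
# but accepts uniform vectors only rarely (here with probability ~2%,
# well below the 1/3 threshold of Definition 14.6).
import numpy as np

rng = np.random.default_rng(4)
n, m, q = 16, 32, 2**20 + 7
s = rng.integers(0, q, size=n)
s[0] = q // 2
A = rng.integers(0, q, size=(m, n))
e = rng.integers(-3, 4, size=m)
A[:, 0] = ((e - A[:, 1:] @ s[1:]) % q) * pow(int(s[0]), -1, q) % q  # A s = e

def G_s():                     # the generator: random 0/1 combination of rows
    return rng.integers(0, 2, size=m) @ A % q

def T(s, c):                   # accept iff |<c, s>| is small mod q
    x = int(np.dot(c, s) % q)
    return min(x, q - x) < q // 100

hits_gen = sum(T(s, G_s()) for _ in range(200))
hits_uni = sum(T(s, rng.integers(0, q, size=n)) for _ in range(200))
print(hits_gen, hits_uni)      # ~200 vs just a handful
```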

R
Remark 14.7 — Trapdoor generators in real life. In the
above we use the notion of a “trapdoor” in the pseu-
dorandom generator as a mathematical abstraction,
but generators with actual trapdoors have arisen in
practice. In 2007 the National Institute of Standards
(NIST) released standards for pseudorandom genera-
tors. Pseudorandom generators are the quintessential
private key primitive, typically built out of hash func-
tions, block ciphers, and such and so it was surprising
that NIST included in the list a pseudorandom gen-
erator based on public key tools - the Dual EC DRBG
generator based on elliptic curve cryptography. This
was already strange but became even more worrying
when Microsoft researchers Dan Shumow and Niels
Ferguson showed that this generator could have a trap-
door in the sense that it contained some hardwired
constants that if generated in a particular way, there
would be some information that (just like in 𝐺𝑠 above)
fu l ly homomorp hi c e nc ry p ti on : i n trod u c ti on a n d bootstra pp i ng 297

allows to distinguish the generator from random (see


here for a 2007 blog post on this issue). We learned
more about this when leaks from the Snowden doc-
ument showed that the NSA secretly paid 10 million
dollars to RSA to make this generator the default
option in their Bsafe software.
You’d think that this generator is long dead but it
turns out to be the “gift that keeps on giving”. In De-
cember of 2015, Juniper systems announced that they
have discovered a malicious code in their system, dat-
ing back to at least 2012 (possibly 2008), that would
allow an attacker to surreptitiously decrypt all VPN
traffic through their firewalls. The issue is that Juniper
has been using the Dual EC DRBG and someone has
managed to replace the constant they were using with
another one, one that they presumably knew the trap-
door for (see here and here for more; of course unless
you know to check for this, it’s very hard by looking
at the code to see that one arbitrary looking constant
has been replaced by another). Apparently, even
though this is very surprising to many people in law
enforcement and government, inserting back doors
into cryptographic primitives might end up making
them less secure. Some more details emerged in
this case in 2021, see this story and this Tweet thread.

14.3 FROM LINEAR HOMOMORPHISM TO FULL HOMOMORPHISM


Gentry’s breakthrough had two components:
• First, he gave a scheme that is homomorphic with respect to arith-
metic circuits (involving not just addition but also multiplications)
of logarithmic depth.

• Second, he showed the amazing “bootstrapping theorem” that if


a scheme is homomorphic enough to evaluate its own decryption
circuit, then it can be turned into a fully homomorphic encryption
that can evaluate any function.
Combining these two insights led to his fully homomorphic encryption.⁴

⁴ The story is a bit more complex than that. Frustratingly, the
decryption circuit of Gentry’s basic scheme was just a little bit too
deep for the bootstrapping theorem to apply. A lesser man, such as
yours truly, would at this point surmise that fully homomorphic
encryption was just not meant to be, and perhaps take up knitting or
playing bridge as an alternative hobby. However, Craig persevered and
managed to come up with a way to “squash” the decryption circuit so it
can fit the bootstrapping parameters. Follow up works, and in
particular the paper of Brakerski and Vaikuntanathan, managed to get a
much better relation between the homomorphism depth and decryption
circuit, and hence avoid the need for squashing, and also improved the
security assumptions.

In this lecture we will focus on the second component - the boot-
strapping theorem. We will show a “partially homomorphic encryption”
(based on a later work of Gentry, Sahai and Waters) that can fit
that theorem in the next lecture.

14.4 BOOTSTRAPPING: FULLY HOMOMORPHIC “ESCAPE VELOCITY”

The bootstrapping theorem is quite surprising. A priori you might
expect that given that a homomorphic encryption for linear functions

Figure 14.4: The “Bootstrapping Theorem” shows that


once a partially homomorphic encryption scheme is
homomorphic with respect to a rich enough family of
functions, and specifically a family that contains its
own decryption algorithm, then it can be converted to
a fully homomorphic encryption scheme that can be
used to evaluate any function.

was not trivial to do, a homomorphic encryption for quadratics would


be harder, cubics even harder and so on and so forth. But it turns out
that there is some special degree 𝑡∗ such that if we obtain homomor-
phic encryption for degree 𝑡∗ polynomials then we can obtain fully
homomorphic encryption that works for all functions. (Specifically,
if the decryption algorithm 𝑐 ↦ 𝐷𝑑 (𝑐) is a degree 𝑡 polynomial, then
homomorphically evaluating polynomials of degree 𝑡∗ = 2𝑡 will
be sufficient.) That is, it turns out that once an encryption scheme
is strong enough to homomorphically evaluate its own decryption algo-
rithm then we can use it to obtain a fully homomorphic encryption by
“pulling itself up by its own bootstraps”. One analogy is that at this
point the encryption reaches “escape velocity” and we can continue
onwards evaluating gates in perpetuity.
We now show the bootstrapping theorem:

Theorem 14.8 — Bootstrapping Theorem, Gentry 2009. Suppose that
(𝐺, 𝐸, 𝐷) is a CPA circular-secure⁵ partially homomorphic encryption
scheme for the family ℱ and suppose that for every pair of ciphertexts
𝑐, 𝑐′ the map 𝑑 ↦ 𝐷𝑑 (𝑐) NAND 𝐷𝑑 (𝑐′ ) is in ℱ. Then (𝐺, 𝐸, 𝐷) can be
turned into a fully homomorphic encryption scheme.

⁵ You can ignore the condition of circular security on a first read -
we will discuss it later.

14.4.1 Radioactive legos analogy


Here is one analogy for bootstrapping, inspired by Gentry’s survey.
Suppose that you need to construct some complicated object from a
highly toxic material (see Fig. 14.5). For example you want to build a
castle out of radio-active legos.

You are given a supply of sealed bags that are flexible enough so
you can manipulate the object from outside the bag. However, each
bag can only withstand 10 seconds of such manipulations before it leaks.
The idea is that if you can open one bag inside another within 9 sec-
onds then you can use the extra second to perform one step. By re-
peating this, you can carry out arbitrarily long manipulations.
Specifically, suppose that you have completed 𝑖 steps out of the total
of 𝑇 , and now have the partially constructed castle inside a sealed bag
𝐵𝑖 . You now put the bag 𝐵𝑖 inside a fresh bag 𝐵𝑖+1 . You now spend
9 seconds on opening the bag 𝐵𝑖 inside the bag 𝐵𝑖+1 , and an extra
second on performing the 𝑖 + 1 step in the construction. At this point
we have completed 𝑖 + 1 steps and have the object in the bag 𝐵𝑖+1 , we
can now continue by putting in the bag 𝐵𝑖+2 and so on and so forth.

Figure 14.5: To build a castle from radioactive Lego


bricks, which can be kept safe in a special ziploc bag
for 10 seconds, we can: 1) Place the bricks in a bag,
and place the bag inside an outer bag. 2) Manipulate
the inner bag through the outer bag to remove the
bricks from it in 9 seconds, and spend 1 second
putting one brick in place. Now, just before the outer
bag “leaks” we put it inside a fresh new bag and
repeat the process.

14.4.2 Proving the bootstrapping theorem


We now turn to the formal proof of Theorem 14.8

Proof. The idea behind the proof is simple but ingenious. Recall that
the NAND gate 𝑏, 𝑏′ ↦ ¬(𝑏 ∧ 𝑏′ ) is a universal gate that allows us
to compute any function 𝑓 ∶ {0, 1}𝑛 → {0, 1} that can be efficiently
computed. Thus, to obtain a fully homomorphic encryption it suffices
to obtain a function NANDEVAL such that 𝐷𝑑 (NANDEVAL(𝑐, 𝑐′ )) =
𝐷𝑑 (𝑐) NAND 𝐷𝑑 (𝑐′ ). (Note that this is stronger than the typical no-
tion of homomorphic evaluation since we require that NANDEVAL
outputs an encryption of 𝑏 NAND 𝑏′ when given any pair of cipher-
texts that decrypt to 𝑏 and 𝑏′ respectively, regardless whether these
ciphertexts were produced by the encryption algorithm or by some
other method, including the NANDEVAL procedure itself.)

Thus to prove the theorem, we need to modify (𝐺, 𝐸, 𝐷) into an


encryption scheme supporting the NANDEVAL operation. Our new
scheme will use the same encryption algorithms 𝐸 and 𝐷 but the
following modification 𝐺′ of the key generation algorithm: after run-
ning (𝑑, 𝑒) = 𝐺(1𝑛 ), we will append to the public key an encryption
𝑐∗ = 𝐸𝑒 (𝑑) of the secret key. We have now defined the key generation,
encryption and decryption. CPA security follows from the security of
the original scheme, where by circular security we refer exactly to the
condition that the scheme is secure even if the adversary gets a single
encryption of the secret key.⁶ This latter condition is not known to be
implied by standard CPA security but as far as we know is satisfied by
all natural public key encryptions, including the LWE-based ones we
will plug into this theorem later on.

⁶ Without this assumption we can still obtain a form of FHE known as a
leveled FHE, where the size of the public key grows with the depth of
the circuit to be evaluated. We can do this by having ℓ public keys
where ℓ is the depth we want to evaluate, and encrypting the private
key of the 𝑖-th key pair with the (𝑖 + 1)-st public key. However, since
circular security seems quite likely to hold, we ignore this extra
complication in the rest of the discussion.

So, now all that is left is to define the NANDEVAL operation. On
input two ciphertexts 𝑐 and 𝑐′ , we will construct the function 𝑓𝑐,𝑐′ ∶
{0, 1}𝑛 → {0, 1} (where 𝑛 is the length of the secret key) such that
𝑓𝑐,𝑐′ (𝑑) = 𝐷𝑑 (𝑐) NAND 𝐷𝑑 (𝑐′ ). It would be useful to pause at this
point and make sure you understand what are the inputs to 𝑓𝑐,𝑐′ , what
are “hardwired constants” and what is its output. The ciphertexts 𝑐
and 𝑐′ are simply treated as fixed strings and are not part of the input
to 𝑓𝑐,𝑐′ . Rather 𝑓𝑐,𝑐′ is a function (depending on the strings 𝑐, 𝑐′ ) that
maps the secret key into a bit. When running NANDEVAL we of
course do not know the secret key 𝑑, but we can still design a circuit
that computes this function 𝑓𝑐,𝑐′ . Now NANDEVAL(𝑐, 𝑐′ ) will simply
be defined as EVAL(𝑓𝑐,𝑐′ , 𝑐∗ ). Since 𝑐∗ = 𝐸𝑒 (𝑑), we get that

𝐷𝑑 (NANDEVAL(𝑐, 𝑐′ )) = 𝐷𝑑 (EVAL(𝑓𝑐,𝑐′ , 𝑐∗ )) = 𝑓𝑐,𝑐′ (𝑑) = 𝐷𝑑 (𝑐) NAND 𝐷𝑑 (𝑐′ ) .

Thus indeed we map any pair of ciphertexts 𝑐, 𝑐′ that decrypt to 𝑏, 𝑏′


into a ciphertext 𝑐″ that decrypts to 𝑏 NAND 𝑏′ . This is all that we
needed to prove.

P
Don’t let the short proof fool you. This theorem is
quite deep and subtle, and requires some reading and
re-reading to truly “get” it.
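Since it is easy to get tangled in the data flow of this proof, here is a minimal Python sketch of the plumbing. The "scheme" below is a deliberately trivial mock (ciphertext equals plaintext) standing in for (𝐺, 𝐸, 𝐷, EVAL); every name and interface here is illustrative only. The point is simply to show how 𝑐 and 𝑐′ are hardwired into 𝑓𝑐,𝑐′ while EVAL is applied to 𝑐∗:

```python
# Data-flow sketch of the bootstrapped NANDEVAL. The mock scheme below
# has no security whatsoever; it only exercises the plumbing of the
# proof of Theorem 14.8.
def E(e, x):
    return x                          # mock encryption: identity map

def D(d, c):
    return c                          # mock decryption: identity map

def EVAL(e, f, c_star):
    return E(e, f(D(None, c_star)))   # mock homomorphic evaluation

def keygen():
    d, e = "sk", "pk"                 # mock key pair
    return d, (e, E(e, d))            # public key carries c* = E_e(d)

def nandeval(pk, c, c2):
    e, c_star = pk
    f = lambda key: 1 - (D(key, c) & D(key, c2))  # f_{c,c'}: c, c2 hardwired
    return EVAL(e, f, c_star)         # homomorphically decrypt-and-NAND

d, pk = keygen()
c, c2 = E(pk[0], 1), E(pk[0], 0)
assert D(d, nandeval(pk, c, c2)) == 1             # NAND(1, 0) = 1
```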
15
Fully homomorphic encryption: Construction

In the last lecture we defined fully homomorphic encryption, and


showed the “bootstrapping theorem” that transforms a partially ho-
momorphic encryption scheme into a fully homomorphic encryption,
as long as the original scheme can homomorphically evaluate its own
decryption circuit. In this lecture we will show an encryption scheme
(due to Gentry, Sahai and Waters, henceforth GSW) meeting the latter
property. That is, this lecture is devoted to proving¹ the following
theorem:

¹ This theorem as stated was proven by Brakerski and Vaikuntanathan
(ITCS 2014), building on a line of work initiated by Gentry’s original
STOC 2009 work. We will actually prove a weaker version of this
theorem, due to Brakerski and Vaikuntanathan (FOCS 2011), which
assumes a quantitative strengthening of LWE. However, we will not
follow the proof of Brakerski and Vaikuntanathan but rather a scheme
of Gentry, Sahai and Waters (CRYPTO 2013). Also note that, as noted in
the previous lecture, all of these results require the extra
assumption of circular security on top of LWE to achieve a non-leveled
fully homomorphic encryption scheme.

Theorem 15.1 — FHE from LWE. Assuming the LWE conjecture, there
exists a partially homomorphic public key encryption (𝐺, 𝐸, 𝐷, EVAL)
that fits the conditions of the bootstrapping theorem (Theorem 14.8).
That is, for every two ciphertexts 𝑐 and 𝑐′ , the function
𝑑 ↦ 𝐷𝑑 (𝑐) NAND 𝐷𝑑 (𝑐′ ) can be homomorphically evaluated by EVAL.

Before the detailed description and analysis, let us first outline our
strategy. The following notion of “noisy homomorphic encryption”
will be of essential importance (see also Fig. 15.1).

Definition 15.2 — Noisy Homomorphic Encryption. A noisy homomorphic


encryption scheme is a four-tuple (𝐺, 𝐸, 𝐷, ENAND) of algorithms
such that (𝐺, 𝐸, 𝐷) is a CPA secure public key scheme and such
that for every keypair (𝑒, 𝑑), there exists a function 𝜂 = 𝜂𝑒,𝑑 which
maps any ciphertext 𝑐 to a number 𝜂(𝑐) ∈ [0, ∞) (which we call the
“noise level” of 𝑐) satisfying the following.
For every keypair (𝑒, 𝑑), if we denote

𝒞𝑏^𝜃 = {𝑐 ∶ 𝐷𝑑 (𝑐) = 𝑏, 𝜂(𝑐) ≤ 𝜃}.

then

• 𝐸𝑒 (𝑏) ∈ 𝒞𝑏^√𝑞 for any plaintext 𝑏. That is, “fresh encryptions”
have noise at most √𝑞.

• If 𝑐 ∈ 𝒞𝑏^𝜂 with 𝜂 ≤ 𝑞/4, then 𝐷𝑑 (𝑐) = 𝑏. That is, as long as
the noise is at most 𝑞/4 (which is ≫ √𝑞), decryption will still
succeed.

• For any 𝑐 ∈ 𝒞𝑏^𝜂 and 𝑐′ ∈ 𝒞𝑏′^𝜂′ , it holds that

ENAND(𝑐, 𝑐′ ) ∈ 𝒞_{𝑏 NAND 𝑏′}^{𝑛3⋅max{𝜂,𝜂′}}

as long as 𝑛3 ⋅ max{𝜂, 𝜂′ } < 𝑞/4. That is, as long as the noise is not
too large, applying ENAND to 𝑐 and 𝑐′ will yield an encryption of
NAND(𝐷𝑑 (𝑐), 𝐷𝑑 (𝑐′ )) with noise level that is not “too much higher”
than the maximum noise of 𝑐 and 𝑐′ .

The definition of noisy homomorphic encryption states that if 𝑐 and 𝑐′
encrypt 𝑏 and 𝑏′ up to error 𝜂 and 𝜂′ respectively, then ENAND(𝑐, 𝑐′ )
encrypts NAND(𝑏, 𝑏′ ) up to some error which can be controlled in terms
of 𝜂, 𝜂′ . The coefficient 𝑛3 is not essential here; we just need it to
be of order 𝑝𝑜𝑙𝑦(𝑛). This property allows us to apply the ENAND
operation repeatedly, as long as we can guarantee that the accumulated
error stays smaller than 𝑞/4, which means that decryption can still be
done correctly. The next theorem tells us to what depth a circuit can
be computed homomorphically.

Figure 15.1: In a noisy homomorphic encryption, every ciphertext 𝑐 has
a “noise” parameter 𝜂(𝑐) associated with it. When we encrypt 0 or 1,
we get a ciphertext with noise at most √𝑞, while we are guaranteed to
successfully decrypt. Applying the ENAND operation to two ciphertexts
𝑐 and 𝑐′ yields a ciphertext with noise level at most 𝑛3 times the
maximum noise of 𝑐 and 𝑐′ . Hence we can compose ENAND operations to
apply any NAND circuit of depth at most ℓ to fresh encryptions, and
succeed in obtaining a ciphertext decrypting to the circuit output as
long as 𝑛^{3ℓ} √𝑞 ≪ 𝑞/4.

Theorem 15.3 If there exists a noisy homomorphic encryption scheme
with 𝑞 = 2^√𝑛 , then it can be extended to a homomorphic encryption
scheme for any circuit with depth smaller than 𝑝𝑜𝑙𝑦𝑙𝑜𝑔(𝑛).

Proof. For any function 𝑓 ∶ {0, 1}𝑚 → {0, 1} which can be described by
a circuit of depth ℓ, we can compute EVAL(𝑓, 𝐸𝑒 (𝑥1 ), ⋯ , 𝐸𝑒 (𝑥𝑚 )) with
error up to √𝑞(𝑛3 )^ℓ . (The initial error for 𝐸𝑒 (𝑥𝑖 ) is smaller than
√𝑞 and the error accumulates at a rate of up to 𝑛3 per level.) Thus,
to guarantee that EVAL(𝑓, 𝐸𝑒 (𝑥1 ), ⋯ , 𝐸𝑒 (𝑥𝑚 )) can be decrypted to
𝑓(𝑥1 , ⋯ , 𝑥𝑚 ) correctly, we only need √𝑞(𝑛3 )^ℓ ≪ 𝑞, i.e.,
𝑛^{3ℓ} ≪ √𝑞 = 2^{√𝑛/2} . This is equivalent to 3ℓ log(𝑛) ≪ √𝑛/2, which
is guaranteed when ℓ = 𝑛^{𝑜(1)} , and in particular when ℓ = 𝑝𝑜𝑙𝑦𝑙𝑜𝑔(𝑛).
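As a sanity check on these parameters, the following short script (with the constants assumed as in the proof: fresh noise √𝑞, growth factor 𝑛3 per level, threshold 𝑞/4) computes the largest depth ℓ the noise budget allows for a few values of 𝑛. Note how enormous 𝑛 must be before even modest depths fit:

```python
# Toy depth-budget calculator: largest ell with sqrt(q) * n^{3 ell} < q/4,
# where q = 2^{sqrt(n)}. Equivalently 3*ell*log2(n) < log2(q)/2 - 2.
import math

def max_depth(n):
    logq = math.isqrt(n)                  # q = 2^{sqrt(n)}
    return int((logq / 2 - 2) // (3 * math.log2(n)))

print([(n, max_depth(n)) for n in (2**10, 2**16, 2**20, 2**30)])
# e.g. [(1024, 0), (65536, 2), (1048576, 8), (1073741824, 182)]
```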


We will assume the LWE conjecture with 𝑞(𝑛) ≈ 2^√𝑛 in the remainder of
this chapter. With Theorem 15.3 in hand, our goal is to construct a
noisy homomorphic encryption scheme such that the decryption map
(specifically the map 𝑑 ↦ 𝐷𝑑 (𝑐) for any fixed ciphertext 𝑐) can be
computed by a circuit of depth at most 𝑝𝑜𝑙𝑦𝑙𝑜𝑔(𝑛). (Theorem 14.8
refers to the map 𝑑 ↦ ¬(𝐷𝑑 (𝑐) ∧ 𝐷𝑑 (𝑐′ )), but this latter map is
obtained by applying one more NAND gate to two parallel executions of
𝑑 ↦ 𝐷𝑑 (𝑐), and hence if the map 𝑑 ↦ 𝐷𝑑 (𝑐) has depth at most
𝑝𝑜𝑙𝑦𝑙𝑜𝑔(𝑛) then so does the map 𝑑 ↦ ¬(𝐷𝑑 (𝑐) ∧ 𝐷𝑑 (𝑐′ )).) Once we do
this, we can obtain a fully homomorphic encryption scheme. We will go
into the details of the construction in the rest of this chapter. The
most technical and interesting part is how to upper bound the
noise/error.

15.1 PRELUDE: FROM VECTORS TO MATRICES


In the linear homomorphic scheme we saw in the last lecture, every
ciphertext was a vector 𝑐 ∈ ℤ𝑛𝑞 such that ⟨𝑐, 𝑠⟩ equals (up to scaling by
⌊𝑞/2⌋) the plaintext bit. We saw that adding two ciphertexts modulo 𝑞
corresponded to XOR’ing (i.e., adding modulo 2) the corresponding
two plaintexts. That is, if we define 𝑐 ⊕ 𝑐′ as 𝑐 + 𝑐′ (mod 𝑞) then
performing the ⊕ operation on the ciphertexts corresponds to adding
modulo 2 the plaintexts.
However, to get to a fully, or even partially, homomorphic scheme,
we need to find a way to perform the NAND operation on the two
plaintexts. The challenge is that it seems that to do that we need to
find a way to evaluate multiplications: find a way to define some oper-
ation ⊗ on ciphertexts that corresponds to multiplying the plaintexts.
Alas, a priori, there doesn’t seem to be a natural way to multiply two
vectors.
The GSW approach to handle this is to move from vectors to ma-
trices. As usual, it is instructive to first consider the cryptographer’s
dream world where Gaussian elimination doesn’t exist. In this case,
the GSW ciphertext encrypting 𝑏 ∈ {0, 1} would be an 𝑛 × 𝑛 matrix 𝐶
over ℤ𝑞 such that 𝐶𝑠 = 𝑏𝑠 where 𝑠 ∈ ℤ𝑛𝑞 is the secret key. That is, the

encryption of a bit 𝑏 is a matrix 𝐶 such that the secret key is an eigen-


vector (modulo 𝑞) of 𝐶 with corresponding eigenvalue 𝑏. (We defer
discussion of how the encrypting party generates such a ciphertext,
since this is in any case only a “dream” toy example.)

P
You should make sure you understand the types of
all the identifiers we refer to. In particular, above 𝐶
is an 𝑛 × 𝑛 matrix with entries in ℤ𝑞 , 𝑠 is a vector in
ℤ𝑞𝑛 , and 𝑏 is a scalar (i.e., just a number) in {0, 1}. See
Fig. 15.2 for a visual representation of the ciphertexts
in this “naive” encryption scheme. Keeping track of
the dimensions of all objects will become only more
important in the rest of this lecture.

Given 𝐶 and 𝑠 we can recover 𝑏 by just checking if 𝐶𝑠 = 𝑠 or 𝐶𝑠 =


0𝑛 . The scheme allows homomorphic evaluation of both addition
(modulo 𝑞) and multiplication, since if 𝐶𝑠 = 𝑏𝑠 and 𝐶 ′ 𝑠 = 𝑏′ 𝑠 then we
can define 𝐶 ⊕ 𝐶 ′ = 𝐶 + 𝐶 ′ (where on the righthand side, addition
is simply done in ℤ𝑞 ) and 𝐶 ⊗ 𝐶 ′ = 𝐶𝐶 ′ (where again this refers to
matrix multiplication in ℤ𝑞 ).
Figure 15.2: In the “naive” version of the GSW encryption, to encrypt a
bit 𝑏 we output an 𝑛 × 𝑛 matrix 𝐶 such that 𝐶𝑠 = 𝑏𝑠 where 𝑠 ∈ ℤ𝑛𝑞 is
the secret key. In this scheme we can transform encryptions 𝐶, 𝐶 ′ of
𝑏, 𝑏′ respectively to an encryption 𝐶 ″ of NAND(𝑏, 𝑏′ ) by letting
𝐶 ″ = 𝐼 − 𝐶𝐶 ′ .

Indeed, one can verify that both addition and multiplication succeed
since

(𝐶 + 𝐶 ′ )𝑠 = (𝑏 + 𝑏′ )𝑠

and

𝐶𝐶 ′ 𝑠 = 𝐶(𝑏′ 𝑠) = 𝑏𝑏′ 𝑠

where all these equalities are in ℤ𝑞 .


Addition modulo 𝑞 is not the same as XOR, but given these multi-
plication and addition operations, we can implement the NAND oper-
ation as well. Specifically, for every 𝑏, 𝑏′ ∈ {0, 1}, 𝑏 NAND 𝑏′ = 1 − 𝑏𝑏′ .
Hence we can take a ciphertext 𝐶 encrypting 𝑏 and a ciphertext 𝐶 ′
encrypting 𝑏′ and transform these two ciphertexts to the ciphertext
𝐶 ″ = (𝐼 − 𝐶𝐶 ′ ) that encrypts 𝑏 NAND 𝑏′ (where 𝐼 is the identity
matrix). Thus in a world without Gaussian elimination it is not hard
to get a fully homomorphic encryption.
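Here is a small numpy sketch of this "dream world" scheme, just to check the eigenvector arithmetic. It is deliberately insecure by design: with no noise, Gaussian elimination recovers 𝑠 from a few ciphertexts. All parameters are assumed toy values:

```python
# Toy "no Gaussian elimination" GSW: a ciphertext for b is a matrix C
# with C s = b s (mod q). We build C = b*I + M where M s = 0 (mod q).
import numpy as np

rng = np.random.default_rng(1)
n, q = 8, 101                        # assumed toy parameters, q prime
s = rng.integers(1, q, size=n)       # secret with nonzero entries
s_inv = pow(int(s[-1]), -1, q)       # invert the last coordinate mod q

def enc(b):
    M = rng.integers(0, q, size=(n, n))
    M[:, -1] = ((-(M[:, :-1] @ s[:-1])) % q) * s_inv % q   # force M s = 0
    return (b * np.eye(n, dtype=np.int64) + M) % q

def dec(C):                          # is s an eigenvector with eigenvalue 0?
    return 0 if np.all(C @ s % q == 0) else 1

C1, C0 = enc(1), enc(0)
C_nand = (np.eye(n, dtype=np.int64) - C1 @ C0) % q   # I - C C'
print(dec(C1), dec(C0), dec(C_nand))                 # prints: 1 0 1
```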

R
Remark 15.4 — Private key FHE. We have not shown
how to generate a ciphertext without knowledge of 𝑠,
and hence strictly speaking we only showed in this
world how to get a private key fully homomorphic
encryption. Our “real world” scheme will be a full
fledged public key FHE. However we note that private
key homomorphic encryption is already very inter-
esting and in fact sufficient for many of the “cloud
computing” applications. Moreover, Rothblum gave

a generic transformation from a private key homo-


morphic encryption to a public key homomorphic
encryption.

15.2 REAL WORLD PARTIALLY HOMOMORPHIC ENCRYPTION


We now discuss how we can obtain an encryption in the real world
where, as much as we’d like to ignore it, there are people who walk
among us (not to mention some computer programs) that actually
know how to invert matrices. As usual, the idea is to “fool Gaussian
elimination with noise” but we will see that we have to be much more
careful about “noise management”, otherwise even for the party holding
the secret key the noise will overwhelm the signal.²

² For this reason, Craig Gentry called his highly recommended survey
on fully homomorphic encryption and other advanced constructions
computing on the edge of chaos.

The main idea is that we can expect the following problem to be hard
for a random secret 𝑠 ∈ ℤ𝑛𝑞 : distinguish between samples of random
matrices 𝐶 and matrices where 𝐶𝑠 = 𝑏𝑠 + 𝑒 for some 𝑏 ∈ {0, 1} and
“short” 𝑒 satisfying |𝑒𝑖 | ≤ √𝑞 for all 𝑖. This yields a natural
candidate for an encryption scheme where we encrypt 𝑏 by a matrix 𝐶
satisfying 𝐶𝑠 = 𝑏𝑠 + 𝑒 where 𝑒 is a “short” vector.³

³ We deliberately leave some flexibility in the definition of “short”.
While initially “short” might mean that |𝑒𝑖 | < √𝑞 for every 𝑖,
decryption will succeed as long as |𝑒𝑖 | is, say, at most 𝑞/100𝑛.

We can now try to check what adding and multiplying two matrices does
to the noise. If 𝐶𝑠 = 𝑏𝑠 + 𝑒 and 𝐶 ′ 𝑠 = 𝑏′ 𝑠 + 𝑒′ then

(𝐶 + 𝐶 ′ )𝑠 = (𝑏 + 𝑏′ )𝑠 + (𝑒 + 𝑒′ ) (15.1)

and
𝐶𝐶 ′ 𝑠 = 𝐶(𝑏′ 𝑠 + 𝑒′ ) = 𝑏′ 𝐶𝑠 + 𝐶𝑒′ = 𝑏𝑏′ 𝑠 + (𝑏′ 𝑒 + 𝐶𝑒′ ) . (15.2)

P
I recommend you pause here and check for yourself
whether it will be the case that 𝐶 + 𝐶 ′ encrypts 𝑏 + 𝑏′
and 𝐶𝐶 ′ encrypts 𝑏𝑏′ up to small noise or not.

We would have loved to say that we can define as above 𝐶 ⊕ 𝐶 ′ =
𝐶 + 𝐶 ′ (mod 𝑞) and 𝐶 ⊗ 𝐶 ′ = 𝐶𝐶 ′ (mod 𝑞). For this we would need
that the vector (𝐶 + 𝐶 ′ )𝑠 equals (𝑏 + 𝑏′ )𝑠 plus a “short” vector and
the vector 𝐶𝐶 ′ 𝑠 equals 𝑏𝑏′ 𝑠 plus a “short” vector. The former
statement indeed holds. Looking at (15.1) we see that (𝐶 + 𝐶 ′ )𝑠 equals
(𝑏 + 𝑏′ )𝑠 up to the “noise” vector 𝑒 + 𝑒′ , and if 𝑒, 𝑒′ are “short”
then 𝑒 + 𝑒′ is not too long either. That is, if |𝑒𝑖 | < 𝜂 and |𝑒′𝑖 | < 𝜂′
for every 𝑖 then |𝑒𝑖 + 𝑒′𝑖 | < 𝜂 + 𝜂′ . So we can at least handle a
significant number of additions before the noise gets out of hand.
However, if we consider (15.2), we see that 𝐶𝐶 ′ 𝑠 will be equal to 𝑏𝑏′ 𝑠

plus the “noise vector” 𝑏′ 𝑒 + 𝐶𝑒′ . The first component 𝑏′ 𝑒 of this noise
vector is “short” (after all 𝑏′ ∈ {0, 1} and 𝑒 is “short”). However, the

second component 𝐶𝑒′ could be a very large vector. Indeed, since 𝐶


looks like a random matrix in ℤ𝑞 , no matter how small the entries of
𝑒′ , many of the entries of 𝐶𝑒′ will be large. Hence multiplying 𝑒′ by 𝐶
takes us “beyond the edge of chaos” and makes the noise too large for
decryption to be successful.
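A quick numpy experiment (toy sizes assumed) makes this concrete: even when 𝑒′ is very short, the entries of 𝐶𝑒′ for a random-looking 𝐶 are typically of magnitude comparable to 𝑞:

```python
# Why naive multiplication ruins the noise: C e' for a random C mod q
# has entries of magnitude ~q even when e' is tiny.
import numpy as np

rng = np.random.default_rng(5)
n, q = 16, 2**20 + 7                 # assumed toy sizes
C = rng.integers(0, q, size=(n, n))  # a "random looking" ciphertext matrix
e = rng.integers(-3, 4, size=n)      # a very short noise vector
x = C @ e % q
print(int(np.median(np.minimum(x, q - x))), q)   # typical |(Ce')_i| ~ q/4
```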

15.3 NOISE MANAGEMENT VIA ENCODING


The problem we had above is that the entries of 𝐶 are elements in ℤ𝑞
that can be very large, while we would have loved them to be small
numbers such as 0 or 1. At this point one could say
“If only there was some way to encode numbers between
0 and 𝑞 − 1 using only 0’s and 1’s”

If you think about it hard enough, it turns out that there is some-
thing known as the “binary basis” that allows us to encode a number
𝑥 ∈ ℤ𝑞 as a vector 𝑥̂ ∈ {0, 1}^{log 𝑞} .⁴ What’s even more surprising is
that this seemingly trivial trick turns out to be immensely useful. We
will define the binary encoding of a vector or matrix 𝑥 over ℤ𝑞 by 𝑥.̂
That is, 𝑥̂ is obtained by replacing every coordinate 𝑥𝑖 with log 𝑞
coordinates 𝑥𝑖,0 , … , 𝑥𝑖,log 𝑞−1 such that

𝑥𝑖 = ∑_{𝑗=0}^{log 𝑞−1} 2^𝑗 𝑥𝑖,𝑗 . (15.3)

⁴ If we were being pedantic, the length of the vector (and other
constants below) should be the integer ⌈log 𝑞⌉, but we omit the ceiling
symbols for simplicity of notation.

Specifically, if 𝑠 ∈ ℤ𝑛𝑞 , then we denote by 𝑠 ̂ the 𝑛 log 𝑞-dimensional
vector with entries in {0, 1}, such that each log 𝑞-sized block of 𝑠 ̂
encodes a coordinate of 𝑠. Similarly, if 𝐶 is an 𝑚 × 𝑛 matrix, then we
denote by 𝐶 ̂ the 𝑚 × 𝑛 log 𝑞 matrix with entries in {0, 1} that
corresponds to encoding every 𝑛-dimensional row of 𝐶 by an
𝑛 log 𝑞-dimensional row where each log 𝑞-sized block corresponds to a
single entry. (We still think of the entries of these vectors and
matrices as elements of ℤ𝑞 and so all calculations are still done
modulo 𝑞.)

Figure 15.3: We can encode a vector 𝑠 ∈ ℤ𝑛𝑞 as a vector
𝑠 ̂ ∈ ℤ^{𝑛 log 𝑞}_𝑞 that has only entries in {0, 1} by using the binary
encoding, replacing every coordinate of 𝑠 with a log 𝑞-sized block in
𝑠.̂ The decoding operation is linear and so we can write 𝑠 = 𝑄𝑠 ̂ for a
specific (simple) 𝑛 × (𝑛 log 𝑞) matrix 𝑄.

While encoding in the binary basis is not a linear operation, the
decoding operation is linear as one can see in (15.3). We let 𝑄 be the
𝑛 × (𝑛 log 𝑞) “decoding” matrix that maps an encoding vector 𝑠 ̂ back
to the original vector 𝑠. Specifically, every row of 𝑄 is composed of
𝑛 blocks each of size log 𝑞, where the 𝑖-th row has only the 𝑖-th block
nonzero, and equal to the values (1, 2, 4, … , 2^{log 𝑞−1} ). It’s a good
exercise to verify that for every vector 𝑠 ∈ ℤ𝑛𝑞 and matrix
𝐶 ∈ ℤ^{𝑛×𝑛}_𝑞 , 𝑄𝑠 ̂ = 𝑠 and 𝐶𝑄̂ ⊤ = 𝐶. (See Fig. 15.3 and Fig. 15.4.)
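Here is a small numpy sanity check of exactly these two identities, with assumed toy parameters:

```python
# Verify Q \hat{s} = s and \hat{C} Q^T = C for the binary basis encoding.
import numpy as np

n, logq = 4, 5                       # toy sizes (assumed): q = 2^5
q = 2**logq
rng = np.random.default_rng(2)

def binarize(s):                     # s in Z_q^n -> \hat{s} in {0,1}^{n log q}
    return np.array([(int(x) >> j) & 1 for x in s for j in range(logq)])

# Q is n x (n log q): the i-th row has (1, 2, 4, ..., 2^{log q - 1})
# in its i-th block and zeros elsewhere.
Q = np.kron(np.eye(n, dtype=np.int64), 2 ** np.arange(logq))

s = rng.integers(0, q, size=n)
assert np.all(Q @ binarize(s) % q == s)          # Q \hat{s} = s

def binarize_mat(C):                 # encode every row of C, block by block
    r, c = C.shape
    return ((C[:, :, None] >> np.arange(logq)) & 1).reshape(r, c * logq)

C = rng.integers(0, q, size=(n, n))
assert np.all(binarize_mat(C) @ Q.T % q == C)    # \hat{C} Q^T = C
```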
Figure 15.4: We can encode an 𝑛 × 𝑛 matrix 𝐶 over ℤ𝑞 by an
𝑛 × (𝑛 log 𝑞) matrix 𝐶 ̂ using the binary basis. We have the equation
𝐶 = 𝐶𝑄̂ ⊤ where 𝑄 is the same matrix we use to decode a vector.

Our final encryption scheme: We describe below the key generation,
encryption and decryption algorithms of our final homomorphic
encryption scheme (FHEENC). It will satisfy the following properties:

1. Ciphertexts are (𝑛 log 𝑞) × (𝑛 log 𝑞) matrices 𝐶 with all coefficients


in {0, 1}.
2. The secret key is a vector 𝑠 ∈ ℤ𝑛𝑞 . We let 𝑣 ∈ ℤ^{𝑛 log 𝑞}_𝑞 be the
vector 𝑣 = 𝑄⊤ 𝑠.

3. An encryption of 𝑏 ∈ {0, 1} is a matrix 𝐶 satisfying the following


“ciphertext equation”
𝐶𝑣 = 𝑏𝑣 + 𝑒 (15.4)

for a “short” 𝑒.

Given the conditions 1,2, and 3, we can now define the addition and
multiplication operations for two ciphertexts 𝐶, 𝐶 ′ as follows:

• 𝐶 ⊕ 𝐶 ′ = 𝐶 + 𝐶 ′ (mod 𝑞)
• 𝐶 ⊗ 𝐶 ′ = (𝐶𝑄⊤ )̂ 𝐶 ′ (i.e., we take the binary encoding of the matrix
𝐶𝑄⊤ and multiply it by 𝐶 ′ )

P
Please try to verify that if 𝐶, 𝐶 ′ are encryptions of 𝑏, 𝑏′
then 𝐶 ⊕ 𝐶 ′ and 𝐶 ⊗ 𝐶 ′ will be encryptions of 𝑏 + 𝑏′
and 𝑏𝑏′ respectively.

Correctness of operations. Suppose that 𝐶𝑣 = 𝑏𝑣 + 𝑒 and 𝐶 ′ 𝑣 = 𝑏′ 𝑣 + 𝑒′ .


Then
(𝐶 ⊕ 𝐶 ′ )𝑣 = (𝐶 + 𝐶 ′ )𝑣 = (𝑏 + 𝑏′ )𝑣 + (𝑒 + 𝑒′ ) (15.5)

which means that 𝐶 ⊕ 𝐶 ′ satisfies the ciphertext equation (15.4)


with respect to the plaintext 𝑏 + 𝑏′ , with the short vector 𝑒 + 𝑒′ .
Let’s now analyze the more challenging case of 𝐶 ⊗ 𝐶 ′ .

(𝐶 ⊗ 𝐶 ′ )𝑣 = (𝐶𝑄⊤ )̂ 𝐶 ′ 𝑣 = (𝐶𝑄⊤ )̂ (𝑏′ 𝑣 + 𝑒′ ) . (15.6)

But since 𝑣 = 𝑄⊤ 𝑠 and 𝐴̂𝑄⊤ = 𝐴 for every matrix 𝐴, the righthand
side of (15.6) equals

(𝐶𝑄⊤ )̂ (𝑏′ 𝑄⊤ 𝑠 + 𝑒′ ) = 𝑏′ 𝐶𝑄⊤ 𝑠 + (𝐶𝑄⊤ )̂ 𝑒′ = 𝑏′ 𝐶𝑣 + (𝐶𝑄⊤ )̂ 𝑒′ (15.7)

but since 𝐵
̂ is a matrix with small coefficients for every 𝐵 and 𝑒′
is short, the righthand side of (15.7) equals 𝑏′ 𝐶𝑣 up to a short vector,
and since 𝐶𝑣 = 𝑏𝑣 + 𝑒 and 𝑏′ 𝑒 is short, we get that (𝐶 ⊗ 𝐶 ′ )𝑣 equals
𝑏′ 𝑏𝑣 plus a short vector as desired.
We can now define

ENAND(𝐶, 𝐶 ′ ) = 𝐼 − 𝐶 ⊗ 𝐶 ′ .

Keeping track of parameters. For 𝐶 that encrypts a plaintext 𝑏, let
𝜂(𝐶) = max𝑖 |(𝐶𝑣 − 𝑏𝑣)𝑖 |. One can see that if 𝐶 encrypts 𝑏 with noise
𝜂(𝐶) and 𝐶 ′ encrypts 𝑏′ with noise 𝜂(𝐶 ′ ), then ENAND(𝐶, 𝐶 ′ ) will
encrypt 1 − 𝑏𝑏′ = NAND(𝑏, 𝑏′ ) with noise of magnitude at most
𝑂(𝜂(𝐶) + 𝑛 log 𝑞 ⋅ 𝜂(𝐶 ′ )), which is smaller than
𝑛3 ⋅ max{𝜂(𝐶), 𝜂(𝐶 ′ )} for 𝑞 ≈ 2^√𝑛 .

15.4 PUTTING IT ALL TOGETHER


We now describe the full scheme. We are going to use a quantitatively
stronger version of LWE: namely, the 𝑞(𝑛)-dLWE assumption for
𝑞(𝑛) = 2^√𝑛 . It is not hard to show that we can relax our assumption
to 𝑞(𝑛)-dLWE with 𝑞(𝑛) = 2^{𝑝𝑜𝑙𝑦𝑙𝑜𝑔(𝑛)} , and Brakerski and
Vaikuntanathan showed how to relax the assumption to standard (i.e.,
𝑞(𝑛) = 𝑝𝑜𝑙𝑦(𝑛)) LWE, though we will not present this here.

FHEENC:

• Key generation: As in the scheme of the last lecture, the secret
key is 𝑠 ∈ ℤ𝑛𝑞 and the public key is a generator 𝐺𝑠 such that
samples from 𝐺𝑠 (1𝑛 ) are indistinguishable from independent random
samples from ℤ𝑛𝑞 , but if 𝑐 is output by 𝐺𝑠 then |⟨𝑐, 𝑠⟩| < √𝑞, where
the inner product (as all other computations) is done modulo 𝑞 and
for every 𝑥 ∈ ℤ𝑞 = {0, … , 𝑞 − 1} we define |𝑥| = min{𝑥, 𝑞 − 𝑥}. As
before, we can assume that 𝑠1 = ⌊𝑞/2⌋, which implies that (𝑄⊤ 𝑠)1 is
also ⌊𝑞/2⌋ since (as can be verified by direct inspection) the first
row of 𝑄⊤ is (1, 0, … , 0).

• Encryption: To encrypt 𝑏 ∈ {0, 1}, let 𝑑1 , … , 𝑑𝑛 log 𝑞 ←𝑅 𝐺𝑠 (1𝑛 )
and output 𝐶 = (𝑏𝑄⊤ + 𝐷)̂ , where 𝐷 is the matrix whose rows are
𝑑1 , … , 𝑑𝑛 log 𝑞 generated from 𝐺𝑠 . (See Fig. 15.5.)

• Decryption: To decrypt the ciphertext 𝐶, we output 0 if
|(𝐶𝑄⊤ 𝑠)1 | < 0.1𝑞 and output 1 if 0.4𝑞 < |(𝐶𝑄⊤ 𝑠)1 | < 0.6𝑞, see
Fig. 15.6. (It doesn’t matter what we output in other cases.)

• NAND evaluation: Given ciphertexts 𝐶, 𝐶 ′ , we define 𝐶∧𝐶 ′
(sometimes also denoted as NANDEVAL(𝐶, 𝐶 ′ )) to equal
𝐼 − (𝐶𝑄⊤ )̂ 𝐶 ′ , where 𝐼 is the (𝑛 log 𝑞) × (𝑛 log 𝑞) identity matrix.

P

Please take your time to read the definition of the


scheme, and go over Fig. 15.5 and Fig. 15.6 to make
sure you understand it.

Figure 15.5: In our fully homomorphic encryption, the public key is a
trapdoor generator 𝐺𝑠 . To encrypt a bit 𝑏, we output 𝐶 = (𝑏𝑄⊤ + 𝐷)̂
where 𝐷 is an (𝑛 log 𝑞) × 𝑛 matrix whose rows are generated using 𝐺𝑠 .

Figure 15.6: We decrypt a ciphertext 𝐶 = (𝑏𝑄⊤ + 𝐷)̂ by looking at the
first coordinate of 𝐶𝑄⊤ 𝑠 (or equivalently, 𝐶𝑄⊤ 𝑄𝑠 ̂). If 𝑏 = 0 then
this equals the first coordinate of 𝐷𝑠, which is at most √𝑞 in
magnitude. If 𝑏 = 1 then we get an extra term of (𝑄⊤ 𝑠)1 , which we set
to be in the interval (0.499𝑞, 0.51𝑞). We can think of either 𝑠 or 𝑠 ̂
as our secret key.
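To see all the pieces of FHEENC working together, here is a minimal numpy sketch. The parameters are assumed toy values (far too small to be secure), and we cheat in key generation: instead of a real trapdoor generator 𝐺𝑠 we directly construct rows 𝑑𝑖 with small ⟨𝑑𝑖 , 𝑠⟩ mod 𝑞, which suffices to sanity-check correctness of one level of NAND evaluation:

```python
# Toy FHEENC correctness check (NOT a secure instantiation).
import numpy as np

rng = np.random.default_rng(3)
n, logq = 4, 21                     # logq = ceil(log2 q)
q = 2**20 + 7                       # odd, so s_1 = q//2 is invertible mod q
N = n * logq                        # ciphertexts are N x N

s = rng.integers(0, q, size=n)
s[0] = q // 2
s1_inv = pow(int(s[0]), -1, q)

Q = np.kron(np.eye(n, dtype=np.int64), 2 ** np.arange(logq))   # n x N
v = Q.T @ s % q                     # v = Q^T s; note v_1 = s_1 = q//2

def bin_mat(M):                     # row-wise binary encoding, LSB first
    r, c = M.shape
    return ((M[:, :, None] >> np.arange(logq)) & 1).reshape(r, c * logq)

def noisy_rows(rows):               # matrix D with D s = e (mod q), |e_i|<=3
    D = rng.integers(0, q, size=(rows, n))
    e = rng.integers(-3, 4, size=rows)
    D[:, 0] = ((e - D[:, 1:] @ s[1:]) % q) * s1_inv % q
    return D

def enc(b):                         # C = bin(b Q^T + D), so C v = b v + D s
    return bin_mat((b * Q.T + noisy_rows(N)) % q)

def dec(C):                         # look at the first coordinate of C v
    x = int((C @ v % q)[0])
    return 0 if min(x, q - x) < q // 10 else 1

def enand(C1, C2):                  # I - bin(C1 Q^T) C2  (mod q)
    return (np.eye(N, dtype=np.int64) - bin_mat(C1 @ Q.T % q) @ C2) % q

for b1 in (0, 1):
    for b2 in (0, 1):
        assert dec(enand(enc(b1), enc(b2))) == 1 - b1 * b2   # NAND
```

Note that the output of enand is no longer a 0/1 matrix, but the formula rebinarizes its left operand via bin(𝐶𝑄⊤ ); this is exactly what lets us keep composing NAND evaluations as long as the noise budget allows.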

15.5 ANALYSIS OF OUR SCHEME


To show that this scheme is a valid partially homomorphic
scheme we need to show the following properties:

1. Correctness: The decryption of an encryption of 𝑏 ∈ {0, 1} equals 𝑏.

2. CPA security: An encryption of 0 is computationally indistinguish-


able from an encryption of 1 to someone that got the public key.

3. Homomorphism: If 𝐶 encrypts 𝑏 and 𝐶 ′ encrypts 𝑏′ then 𝐶∧𝐶 ′


encrypts 𝑏 NAND 𝑏′ (with a higher amount of noise). The growth
of the noise will be the reason that we will not get immediately a
fully homomorphic encryption.

4. Shallow decryption circuit: To plug this scheme into the boot-


strapping theorem we will need to show that its decryption al-
gorithm (or more accurately, the function in the statement of the
bootstrapping theorem) can be evaluated in depth 𝑝𝑜𝑙𝑦𝑙𝑜𝑔(𝑛) (inde-
pendently of 𝑞), and that moreover, the noise grows slowly enough
that our scheme is homomorphic with respect to such circuits.

Once we obtain 1-4 above, we can plug FHEENC into the Bootstrap-
ping Theorem (Theorem 14.8) and thus complete the proof of exis-
tence of a fully homomorphic encryption scheme. We now address
those points one by one.

15.5.1 Correctness
Correctness of the scheme will follow from the following stronger
condition:
Lemma 15.5 For every 𝑏 ∈ {0, 1}, if 𝐶 is the encryption of 𝑏 then it is an
(𝑛 log 𝑞) × (𝑛 log 𝑞) matrix satisfying

𝐶𝑄⊤ 𝑠 = 𝑏𝑄⊤ 𝑠 + 𝑒

where max𝑖 |𝑒𝑖 | ≤ √𝑞 ≪ 𝑞.

Proof. For starters, let us see that the dimensions make sense: the
encryption of 𝑏 is computed by 𝐶 = (𝑏𝑄⊤ + 𝐷)̂ where 𝐷 is an
(𝑛 log 𝑞) × 𝑛 matrix satisfying |(𝐷𝑠)𝑖 | ≤ √𝑞 for every 𝑖.
Since 𝑄⊤ is also an (𝑛 log 𝑞) × 𝑛 matrix, adding 𝑏𝑄⊤ (i.e. either 𝑄⊤
or the all-zeroes matrix, depending on whether or not 𝑏 = 1) to 𝐷 makes
sense, and applying the ⋅ ̂ operation will transform every row to
length 𝑛 log 𝑞; hence 𝐶 is indeed a square (𝑛 log 𝑞) × (𝑛 log 𝑞) matrix.
Let us now see what this matrix 𝐶 does to the vector 𝑣 = 𝑄⊤ 𝑠.
Using the fact that 𝑀̂ 𝑄⊤ = 𝑀 for every matrix 𝑀 , we get that

𝐶𝑣 = (𝑏𝑄⊤ + 𝐷)𝑠 = 𝑏𝑣 + 𝐷𝑠

but by construction |(𝐷𝑠)𝑖 | ≤ √𝑞 for every 𝑖.

Lemma 15.5 implies correctness of decryption since by construction


we ensured that (𝑄⊤ 𝑠)1 ∈ (0.499𝑞, 0.501𝑞) and hence we get that if
𝑏 = 0 then |(𝐶𝑣)1 | = 𝑜(𝑞) and if 𝑏 = 1 then 0.499𝑞 − 𝑜(𝑞) ≤ |(𝐶𝑣)1 | ≤
0.501𝑞 + 𝑜(𝑞).

15.5.2 CPA Security


To show CPA security we need to show that an encryption of 0 is
indistinguishable from an encryption of 1. However, by the security of
the trapdoor generator, an encryption of 𝑏 computed according to our
algorithm will be indistinguishable from an encryption of 𝑏 obtained
when the matrix 𝐷 is a random (𝑛 log 𝑞) × 𝑛 matrix. Now in this case
the encryption is obtained by applying the ⋅ ̂ operation to 𝑏𝑄⊤ + 𝐷
but if 𝐷 is uniformly random then for every choice of 𝑏, 𝑏𝑄⊤ + 𝐷 is
uniformly random (since a fixed matrix plus a random matrix yields a
random matrix) and hence the matrix 𝑏𝑄⊤ + 𝐷 (and so also the matrix
𝑏𝑄̂⊤ + 𝐷) contains no information about 𝑏. This completes the proof

of CPA security (can you see why?).


If we want to plug this scheme into the bootstrapping theorem, then we
will also assume that it is circular secure. This seems a reasonable
assumption, though unfortunately at the moment we do not know how to
derive it from LWE. (If we don’t want to make this assumption we can
still obtain a leveled fully homomorphic encryption, as discussed in
the previous lecture.)

15.5.3 Homomorphism
Let 𝑣 = 𝑄⊤ 𝑠, 𝑏 ∈ {0, 1} and 𝐶 be a ciphertext such that 𝐶𝑣 = 𝑏𝑣 + 𝑒.
We define the noise of 𝐶, denoted as 𝜇(𝐶) to be the maximum of |𝑒𝑖 |
over all 𝑖 ∈ [𝑛 log 𝑞]. We now prove the following lemma, which we’ll call
the “noisy homomorphism lemma”:
Lemma 15.6 Let 𝐶, 𝐶 ′ be ciphertexts encrypting 𝑏, 𝑏′ respectively with
𝜇(𝐶), 𝜇(𝐶 ′ ) ≤ 𝑞/4. Then 𝐶 ″ = 𝐶∧𝐶 ′ encrypts 𝑏 NAND 𝑏′ and satisfies

𝜇(𝐶 ″ ) ≤ (2𝑛 log 𝑞) max{𝜇(𝐶), 𝜇(𝐶 ′ )} (15.8)

Proof. This follows from the calculations we have done before. As
we’ve seen,

(𝐶𝑄⊤ )̂ 𝐶 ′ 𝑣 = (𝐶𝑄⊤ )̂ (𝑏′ 𝑣 + 𝑒′ ) = 𝑏′ (𝐶𝑄⊤ )̂ 𝑄⊤ 𝑠 + (𝐶𝑄⊤ )̂ 𝑒′ = 𝑏′ (𝐶𝑣) + (𝐶𝑄⊤ )̂ 𝑒′ = 𝑏𝑏′ 𝑣 + 𝑏′ 𝑒 + (𝐶𝑄⊤ )̂ 𝑒′ .

But since (𝐶𝑄⊤ )̂ is a 0/1 matrix with every row of length 𝑛 log 𝑞, for
every 𝑖, |((𝐶𝑄⊤ )̂ 𝑒′ )𝑖 | ≤ (𝑛 log 𝑞) max𝑗 |𝑒′𝑗 |. We see that the noise
vector in the product has magnitude at most 𝜇(𝐶) + 𝑛 log 𝑞 ⋅ 𝜇(𝐶 ′ ).
Adding the identity for the NAND operation adds at most 𝜇(𝐶) + 𝜇(𝐶 ′ )
to the noise, and so the total noise magnitude is bounded by the
righthand side of (15.8).

15.5.4 Shallow decryption circuit


Recall that to plug in our homomorphic encryption scheme into
the bootstrapping theorem, we needed to show that for every ci-
phertext 𝐶 (generated by the encryption algorithm) the function

𝑓𝐶 ∶ {0, 1}^{𝑛 log 𝑞} → {0, 1} can be computed by a sufficiently shallow
circuit, where 𝑓𝐶 is defined as

𝑓𝐶 (𝑑) = 𝐷𝑑 (𝐶)

where 𝑑 is the secret key and 𝐷𝑑 (𝐶) denotes the decryption algorithm
applied to 𝐶.
In our case a circuit of depth 𝑝𝑜𝑙𝑦𝑙𝑜𝑔(𝑛) ≪ 𝑛^{0.5} will be
“sufficiently shallow”. Specifically, by repeatedly applying the noisy
homomorphism lemma (Lemma 15.6), we can show that we can
homomorphically evaluate every circuit of NAND gates whose depth ℓ
satisfies the condition (2𝑛 log 𝑞)^ℓ ≪ √𝑞. If 𝑞 = 2^√𝑛 then (assuming
𝑛 is sufficiently large) this will be satisfied as long as ℓ < 𝑛^{0.49} .
We will encode the secret key of the encryption scheme as the binary
string 𝑠 ̂ which describes our vector 𝑠 ∈ ℤ𝑛𝑞 as a bit string of length
𝑛 log 𝑞. Given a ciphertext 𝐶, the decryption algorithm takes the dot
product modulo 𝑞 of 𝑠 with the first row of 𝐶𝑄⊤ . This can be
equivalently described as taking the dot product of 𝑠 ̂ with the first
row of 𝐶𝑄⊤ 𝑄. Decryption outputs 0 (respectively 1) if the resulting
number is small (respectively large).


In particular, to show that 𝑓𝐶 (⋅) can be homomorphically evaluated it
will suffice to show that for every fixed vector 𝑐 ∈ ℤ^{𝑛 log 𝑞}_𝑞
there is a circuit 𝐹 of depth 𝑝𝑜𝑙𝑦𝑙𝑜𝑔(𝑛) ≪ 𝑛^{0.49} that on input a
string 𝑠 ̂ ∈ {0, 1}^{𝑛 log 𝑞} will output 0 if |⟨𝑐, 𝑠⟩̂ | < 0.1𝑞 and
output 1 if 0.4𝑞 < |⟨𝑐, 𝑠⟩̂ | < 0.6𝑞. (We don’t care what 𝐹 does
otherwise.) The above suffices since given a ciphertext 𝐶 we can use 𝐹
with the vector 𝑐 being the top row of 𝐶𝑄⊤ 𝑄, and then ⟨𝑐, 𝑠⟩̂ would
correspond to the first entry of 𝐶𝑄⊤ 𝑠.

P
Please make sure you understand the above argument.

If 𝑐 = (𝑐1 , … , 𝑐𝑛 log 𝑞 ) is a vector then to compute its inner product


with a 0/1 vector 𝑠 ̂ we simply need to sum up the numbers 𝑐𝑖 where
𝑠𝑖̂ = 1. Summing up 𝑚 numbers can be done via the obvious recur-
sion in depth that is log 𝑚 times the depth for a single addition of two
numbers. However, the naive way to add two numbers in ℤ𝑞 (each
represented by log 𝑞 bits) will have depth 𝑂(log 𝑞) which is too much
for us. The issue is that while 𝑚 = 𝑛 log 𝑞 is polynomial in 𝑛, 𝑞 itself

has exponential magnitude. In particular log 𝑞 ≈ √𝑛, and we cannot
afford to use a circuit of that depth.

P
Please stop here and see if you understand why the
natural circuit to compute the addition of two num-
bers modulo 𝑞 (represented as log 𝑞-length binary
strings) will require depth 𝑂(log 𝑞). As a hint, one


needs to keep track of the “carry”.

Fortunately, because we only care about accuracy up to 𝑞/10, we


can make the calculation “shallower”. Specifically, if we add 𝑚 num-
bers in ℤ𝑞 (each represented by log 𝑞 bits), we can drop all but the first
100 log 𝑚 most significant digits of our numbers. The reason is that
dropping this can change each number by at most (𝑞/𝑚100 ), and so if
we ignore these digits, then it would change the sum of the 𝑚 num-
bers by at most 𝑚(𝑞/𝑚100 ) ≪ 𝑞. Hence we can easily do this work in
𝑝𝑜𝑙𝑦(log 𝑚) depth, which is 𝑝𝑜𝑙𝑦(log 𝑛) since 𝑚 = 𝑝𝑜𝑙𝑦(𝑛).
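Before the formal statement, here is a quick numeric sanity check (with assumed toy sizes) that truncating each summand to its top 10 log 𝑚 bits changes the total sum by much less than 𝑞:

```python
# Keeping only the most significant bits of each summand barely moves
# the sum: each term changes by < q / 2^{kept bits}, so the total error
# is < m * q / 2^{kept bits} << q.
import random

m, logq = 1000, 200
q = 2**logq
random.seed(0)
xs = [random.randrange(q) for _ in range(m)]
keep = 10 * m.bit_length()               # ~10 log m most significant bits
drop = logq - keep
trunc = [(x >> drop) << drop for x in xs]
err = abs(sum(xs) - sum(trunc))
print(err < m * q // 2**keep)            # True
print(err / q)                           # a tiny fraction of q
```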
Let us now show this more formally:
Lemma 15.7 For every 𝑐 ∈ ℤ^𝑚_𝑞 there exists some function
𝑓 ∶ {0, 1}^𝑚 → {0, 1} such that:

1. For every 𝑠 ̂ ∈ {0, 1}^𝑚 such that |⟨𝑠,̂ 𝑐⟩| < 0.1𝑞, 𝑓(𝑠)̂ = 0

2. For every 𝑠 ̂ ∈ {0, 1}^𝑚 such that 0.4𝑞 < |⟨𝑠,̂ 𝑐⟩| < 0.6𝑞, 𝑓(𝑠)̂ = 1

3. There is a circuit computing 𝑓 of depth at most 100(log 𝑚)^3 .

Proof. For every number 𝑥 ∈ ℤ𝑞, define 𝑥̃ to be the number that is obtained by writing 𝑥 in the binary basis and setting all digits except the 10 log 𝑚 most significant ones to zero. Note that 𝑥̃ ≤ 𝑥 ≤ 𝑥̃ + 𝑞/𝑚^{10}. The idea is that we will do the calculation by changing every number 𝑐𝑖 and the modulus 𝑞 into their corresponding numbers 𝑐̃𝑖 and 𝑞̃.

We define 𝑓(𝑠̂) to equal 1 if |∑ 𝑠̂𝑖𝑐̃𝑖 (mod 𝑞̃)| ≥ 0.3𝑞̃ and to equal 0 otherwise (where as usual the absolute value of 𝑥 modulo 𝑞̃ is the minimum of 𝑥 and 𝑞̃ − 𝑥). All numbers involved have zeroes in all but the 10 log 𝑚 most significant digits and so these less significant digits can be ignored. Hence we can add any pair of such numbers modulo 𝑞̃ in depth 𝑂((log 𝑚)^2) using the standard elementary school algorithm to add two ℓ-digit numbers in 𝑂(ℓ^2) steps.

Now we can add the 𝑚 numbers in a binary tree of depth log 𝑚, adding them in pairs and then adding up the results, to get a total depth of 𝑂((log 𝑚)^3). So, all that is left to prove is that this function 𝑓 satisfies the conditions (1) and (2).

If we look at the non-modular sum then |∑ 𝑠̂𝑖𝑐̃𝑖 − ∑ 𝑠̂𝑖𝑐𝑖| < 𝑚𝑞/𝑚^{10} = 𝑞/𝑚^9, so now we want to show that the effect of taking the sum modulo 𝑞̃ is not much different from taking it modulo 𝑞. Indeed, note that this sum (before a modular reduction) is an integer between 0 and 𝑞𝑚. If 𝑥 is such an integer and we divide 𝑥 by 𝑞 to write 𝑥 = 𝑘𝑞 + 𝑟 for 𝑟 < 𝑞, then since 𝑥 < 𝑞𝑚, 𝑘 < 𝑚, and so we can write 𝑥 = 𝑘𝑞̃ + 𝑘(𝑞 − 𝑞̃) + 𝑟, so the difference between 𝑥 mod 𝑞 and 𝑥 mod 𝑞̃ will be (in our standard modular metric) at most 𝑚𝑞/𝑚^{10} = 𝑞/𝑚^9.

Overall we get that if ∑ 𝑠̂𝑖𝑐𝑖 mod 𝑞 is in the interval [0.4𝑞, 0.6𝑞] then ∑ 𝑠̂𝑖𝑐̃𝑖 (mod 𝑞̃) will be in the interval [0.4𝑞 − 100𝑞/𝑚^9, 0.6𝑞 + 100𝑞/𝑚^9], which is contained in [0.3𝑞̃, 0.7𝑞̃].
This completes the proof that our scheme can fit into the bootstrap-
ping theorem (i.e., of Theorem 15.1), hence completing the descrip-
tion of the fully homomorphic encryption scheme.

P
Now would be a good point to go back and check that you understand how all the pieces fit together to obtain the complete construction of the fully homomorphic encryption scheme.

15.6 ADVANCED TOPICS:

15.6.1 Fully homomorphic encryption for approximate computation over


the real numbers: CKKS
We have seen how a fully homomorphic encryption for a plaintext bit 𝑏 can be constructed, and we are able to evaluate addition and multiplication of ciphertexts as well as a NAND gate in the ciphertext space. One can also extend the FHEENC scheme to encrypt a plaintext message 𝜇 ∈ ℤ𝑞 and evaluate multi-bit integer additions and multiplications more efficiently. The natural next question is floating/fixed point operations. These are similar to integer operations, but we need to be able to evaluate a rounding operation after every computation. Unfortunately, it has been considered difficult to evaluate the rounding operation while ensuring the correctness property. An easier solution is to assume approximate computation from the beginning and embrace the errors it causes.
The CKKS scheme, one of the more recent schemes, addressed this challenge by allowing small errors in the decrypted results. Its correctness property is more relaxed than what we’ve seen before: decryption need not return precisely the original message, and indeed this resolves the rounding operation problem, supporting approximate computation over the real numbers. To get more of a sense of its construction, recall that when we decrypt a ciphertext in the FHEENC scheme, we have 𝐶𝑄^⊤𝑠̂ = 𝑏𝑄^⊤𝑠̂ + 𝑒 where max_𝑖 |𝑒𝑖| ≪ 𝑞. Since (𝑄^⊤𝑠̂)_1 ∈ (0.499𝑞, 0.5001𝑞), multiplying by this term places the plaintext bit near the most significant bits of the ciphertext, where the plaintext cannot be polluted by the encryption noise. Therefore, we are able to precisely remove the noise 𝑒 we added for security. However, this kind of separated placement actually makes an evaluation of the rounding operation difficult. On the other hand, the CKKS scheme doesn’t clearly separate the plaintext message and noise in its decryption structure. Specifically, we have the form 𝑐^⊤𝑠 = 𝑚 + 𝑒, where the noise lies in the least significant bits of the message and does pollute the lowest bits. Note that this is acceptable as long as it preserves enough precision. Now we can evaluate rounding (i.e., rescaling in the paper) homomorphically, by dividing both a ciphertext 𝑐 and the parameter 𝑞 by some factor 𝑝. The concept of handling ciphertexts with a different encryption parameter 𝑞′ = 𝑞/𝑝 is already known to be possible; you can find more details on this modulus switching technique in this paper if you are interested. It has also been proved that the precision loss of the decrypted evaluation result is at most one bit more than that of the plaintext computation, which means the scheme’s precision guarantee is nearly optimal. This scheme offers an efficient homomorphic encryption setting for many practical data science and machine learning applications which do not require precise values, but only approximate ones. You may check existing open source libraries implementing this scheme, such as MS SEAL and HEAAN, as well as many practical applications including logistic regression in the literature.
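As a toy numeric illustration of rescaling (this involves no encryption at all and is not the actual CKKS scheme; the scale Δ, the inputs, and the noise value are made up), note how a homomorphic product doubles the encoding scale and adds small noise in the low bits, and how dividing by 𝑝 = Δ restores the scale while truncating most of the noise:

```python
Delta = 2 ** 20                     # encoding scale (illustrative)
x, y = 3.14159, 2.71828

m1, m2 = round(x * Delta), round(y * Delta)   # encoded plaintexts
prod = m1 * m2 + 12345              # after a product: scale Delta**2 plus small noise

rescaled = round(prod / Delta)      # "rescaling" by p = Delta
print(rescaled / Delta, "vs", x * y)   # agree up to roughly the last bit of precision
```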

15.6.2 Bandwidth efficient fully homomorphic encryption GH


When we defined homomorphic encryption in Definition 14.3, we only considered a class of single-output functions ℱ. Now we want to extend the definition to multiple-output functions and consider how bandwidth-efficient fully homomorphic encryption can be. More specifically, if we want to guarantee that the result of decryption is (or contains) 𝑓(𝑥1, … , 𝑥ℓ), what is the minimal possible length of the ciphertext? Let us first define the compressible fully homomorphic encryption scheme.

Definition 15.8 — Compressible Fully Homomorphic Encryption. A compressible fully homomorphic public key encryption scheme is a CPA secure public key encryption scheme (𝐺, 𝐸, 𝐷) such that there exist polynomial-time algorithms EVAL, COMP ∶ {0, 1}∗ → {0, 1}∗ such that for every (𝑒, 𝑑) = 𝐺(1𝑛), ℓ = 𝑝𝑜𝑙𝑦(𝑛), 𝑥1, … , 𝑥ℓ ∈ {0, 1}, and 𝑓 ∶ {0, 1}ℓ → {0, 1}∗ which can be described by a circuit, it holds that:

• 𝑐 = EVAL𝑒(𝑓, 𝐸𝑒(𝑥1), … , 𝐸𝑒(𝑥ℓ)).

• 𝑐∗ = COMP(𝑐).

• 𝑓(𝑥1, … , 𝑥ℓ) is a prefix of 𝐷𝑑(𝑐∗).



This definition is similar to standard fully homomorphic encryption except for an additional compression step. The bandwidth efficiency of a compressible fully homomorphic encryption scheme is often described by its rate, which is defined as follows:

Definition 15.9 — Rate of Compressible Fully Homomorphic Encryption. A compressible fully homomorphic public key encryption scheme has rate 𝛼 = 𝛼(𝑛) if for every (𝑒, 𝑑) = 𝐺(1𝑛), ℓ = 𝑝𝑜𝑙𝑦(𝑛), 𝑥1, … , 𝑥ℓ ∈ {0, 1}, and 𝑓 ∶ {0, 1}ℓ → {0, 1}∗ with sufficiently long output, it holds that

𝛼|𝑐∗| ≤ |𝑓(𝑥1, … , 𝑥ℓ)|.

The following theorem by Gentry and Halevi 2019 answers the earlier question, stating that a nearly optimal rate, i.e., a rate arbitrarily close to 1, can be achieved.

Theorem 15.10 — Nearly Optimal Rate. For any 𝜖 > 0, there exists a compressible fully homomorphic encryption scheme with rate 1 − 𝜖 under the LWE assumption.

15.6.3 Using fully homomorphic encryption to achieve private information


retrieval.
Private information retrieval (PIR) allows a client to retrieve the 𝑖-th entry of a database which has 𝑛 entries in total, without letting the server know 𝑖. We only consider the single-server case here. Obviously, a trivial solution is that the server sends the entire database to the client.

One simple case of PIR is that each entry is a bit, for which the trivial solution above has communication complexity 𝑛. Kushilevitz and Ostrovsky 1997 reduced the complexity to be smaller than 𝑂(𝑛^𝜖) for any 𝜖 > 0. After that, another work (Cachin et al. 1999) further reduced the complexity to 𝑝𝑜𝑙𝑦𝑙𝑜𝑔(𝑛). More discussion about PIR and related FHE techniques can be found in Ostrovsky and Skeith 2007, Yi et al. 2013 and references therein.
One interesting observation is that fully homomorphic encryption can be applied to single-server PIR via the following procedure (a toy code sketch follows the list):

• The client computes 𝐸𝑒(𝑖) and sends it to the server.

• The server evaluates 𝑐 = EVAL(𝑓, 𝐸𝑒(𝑖)), where 𝑓(𝑖) returns the 𝑖-th entry of the database, and sends it (or its compressed version 𝑐∗) back to the client.

• The client decrypts 𝐷𝑑(𝑐) or 𝐷𝑑(𝑐∗) and obtains the 𝑖-th entry of the database.
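Here is a minimal sketch of this flow in Python. Lacking an FHE library, it substitutes the (merely additively homomorphic) Paillier scheme with toy parameters, and the client sends an encrypted one-hot vector instead of 𝐸𝑒(𝑖) directly; this toy gives no bandwidth savings at all and only illustrates how the server can compute an encryption of the 𝑖-th entry without learning 𝑖:

```python
import secrets
from math import gcd

# Toy Paillier (additively homomorphic; parameters are FAR too small to be secure).
p, q = 10007, 10009
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
mu = pow(lam, -1, n)                           # valid since we use g = n + 1

def enc(m):
    r = secrets.randbelow(n - 2) + 1
    while gcd(r, n) != 1:
        r = secrets.randbelow(n - 2) + 1
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

db = [42, 7, 13, 99, 5]        # the server's database
i = 3                          # the index the client wants

# Client -> Server: encryptions of the one-hot indicator vector of i.
query = [enc(1 if j == i else 0) for j in range(len(db))]

# Server: computes Enc(sum_j db[j]*e_j) = Enc(db[i]); adding plaintexts is
# multiplying ciphertexts, and multiplying by a known constant is exponentiation.
c = 1
for cj, dj in zip(query, db):
    c = (c * pow(cj, dj, n2)) % n2

assert dec(c) == db[i]         # the client decrypts and obtains entry i
```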


Since there exist compressible fully homomorphic encryption schemes with nearly optimal rate, i.e., rate arbitrarily close to 1 (see Theorem 15.10), we can immediately get rate-(1 − 𝜖) PIR for any 𝜖 > 0. (Note that this result holds only for databases whose entries are quite large, since the rate is defined for circuits with sufficiently long output.) Prior to the theorem by Gentry and Halevi 2019, Kiayias et al. 2015 also constructed a PIR scheme with a nearly optimal rate/bandwidth efficiency. The application of fully homomorphic encryption to PIR is a fascinating field; beyond bandwidth efficiency, you may also be interested in the computational cost. We refer to Gentry and Halevi 2019 for more details.
16
Multiparty secure computation I: Definition and Honest-But-Curious to Malicious compiler

Wikipedia defines cryptography as “the practice and study of


techniques for secure communication in the presence of third par-
ties called adversaries”. However, I think a better definition would
be:
Cryptography is about replacing trust with mathematics.

After all, the reason we work so hard in cryptography is because of


a lack of trust. We wouldn’t need encryption if Alice and Bob could
be guaranteed that their communication, despite going through wire-
less and wired networks controlled and snooped upon by a plethora
of entities, would be as reliable as if it had been hand delivered by a
letter-carrier as reliable as Patti Whitcomb, as opposed to the nosy
Eve who might look in the messages, or the malicious Mallory, who
might tamper with them. We wouldn’t need zero knowledge proofs
if Vladimir could simply say “trust me Barack, this is an authentic
nuke”. We wouldn’t need electronic signatures if we could trust that
all software updates are designed to make our devices safer and not,
to pick a random example, to turn our phones into surveillance de-
vices.
Unfortunately, the world we live in is not as ideal, and we need
these cryptographic tools. But what is the limit of what we can
achieve? Are these examples of encryption, authentication, zero
knowledge etc. isolated cases of good fortune, or are they special
cases of a more general theory of what is possible in cryptography?
It turns out that the latter is the case and there is in fact an extremely
general formulation that (in some sense) captures all of the above
and much more. This notion is called multiparty secure computation or
sometimes secure function evaluation and is the topic of this lecture. We
will show (a relaxed version of) what I like to call “the fundamental
theorem of cryptography,” namely that under natural computational




conjectures (in particular the LWE conjecture, as well as the RSA


or Factoring assumptions) essentially every cryptographic task can
be achieved. This theorem emerged from the 1980’s works of Yao,
Goldreich-Micali-Wigderson, and many others. As we’ll see, like the
“fundamental theorems” of other fields, this is not a result that closes
off the field but rather opens up many other questions. But before
we can even state the result, we need to talk about how we can even
define security in a general setting.

16.1 IDEAL VS. REAL MODEL SECURITY.


The key notion is that cryptography aims to replace trust. Therefore,
we imagine an ideal world where there is some universally trusted
party (which cryptographer Silvio Micali likes to denote by Jimmy
Carter, but feel free to swap in your own favorite trustworthy per-
sonality) that communicates with all participants of the protocol or
interaction, including potentially the adversary. We define security
by stating that whatever the adversary can achieve in our real world,
could have also been achieved in the ideal world.
For example, for obtaining secure communication, Alice will send
her message to the trusted party, who will then convey it to Bob. The
adversary learns nothing about the message’s contents, nor can she
change them. In the zero knowledge application, to prove that there
exists some secret 𝑥 such that 𝑓(𝑥) = 1 where 𝑓(⋅) is a public function,
the prover Alice sends to the trusted party her secret input 𝑥, the
trusted party then verifies that 𝑓(𝑥) = 1 and simply sends to Bob the
message “the statement is true”. It does not reveal to Bob anything
about the secret 𝑥 beyond that.
But the paradigm goes well beyond this. For example, second price
(or Vickrey) auctions are known as a way to incentivize bidders to
bid their true value. In these auctions, every potential buyer sends a
sealed bid, and the item goes to the highest bidder, who only needs
to pay the price of the second-highest bid. We could imagine a digital
version, where buyers send encrypted versions of their bids. The auc-
tioneer could announce who the winner is and what was the second
largest bid, but could we really trust him to do so faithfully? Perhaps
we would want an auction where even the auctioneer doesn’t learn
anything about the bids beyond the identity of the winner and the
value of the second highest bid? Wouldn’t it be great if there was a
trusted party with whom all bidders could share their private values,
and it would announce the results of the auction but nothing more
than that? This could be useful not just in second price auctions but
to implement many other mechanisms, especially if you are a Danish
sugar beet farmer.

There are other examples as well. Perhaps two hospitals might


want to figure out if the same patient visited both, but do not want (or
are legally not allowed) to share with one another the list of people
that visited each one. A trusted party could get both lists and output
only their intersection.
The list goes on and on. Maybe we want to securely aggregate information about the performance of Estonian IT firms or the financial health of Wall Street banks.
become trivial if we just had access to a universally trusted party. But
of course in the real world, we don’t. This is what makes the notion of
secure multiparty computation so exciting.

16.2 FORMALLY DEFINING SECURE MULTIPARTY COMPUTATION


We now turn to formal definitions. As we discuss below, there are
many variants of secure multiparty computation, and we pick one
simple version below. A 𝑘-party protocol is a set of 𝑘 efficiently computable prescribed interactive strategies for all 𝑘 parties.1 We assume the existence of an authenticated and private point to point channel between every pair of parties (this can be implemented using signatures and encryptions).2 A 𝑘-party functionality is a probabilistic process 𝐹 mapping 𝑘 inputs in {0, 1}𝑛 into 𝑘 outputs in {0, 1}𝑛.3

1 Unlike much of this text, in the context of multiparty secure computation we use 𝑘 to denote the number of parties in the protocol, as opposed to the secret key of some encryption scheme.

2 Protocols for 𝑘 > 2 parties require also a broadcast channel, but this can be implemented using the combination of authenticated channels and digital signatures.

3 Fixing the input and output sizes to 𝑛 is done for notational simplicity and is without loss of generality. More generally, the inputs and outputs could have sizes up to polynomial in 𝑛 and some inputs or outputs can also be empty. Also, note that one can define a more general notion of stateful functionalities, though it is not hard to reduce the task of building a protocol for stateful functionalities to building protocols for stateless ones.

16.2.1 First attempt: a slightly “too ideal” definition

Here is one attempt of a definition that is clean but a bit too strong. Nevertheless it captures much of the spirit of secure multiparty computation:

Definition 16.1 — MPC without aborts. Let 𝐹 be a 𝑘-party functionality. A secure protocol for 𝐹 is a protocol for 𝑘 parties satisfying that
for every 𝑇 ⊆ [𝑘] and every efficient adversary 𝐴, there exists an
efficient “ideal adversary” (i.e., efficient interactive algorithm)
𝑆 such that for every set of inputs {𝑥𝑖 }𝑖∈[𝑘]⧵𝑇 the following two
distributions are computationally indistinguishable:

• The tuple (𝑦1 , … , 𝑦𝑘 ) of outputs of all the parties (both con-


trolled and not controlled by the adversary) in an execution of
the protocol where 𝐴 controls the parties in 𝑇 and the inputs of
the parties not in 𝑇 are given by {𝑥𝑖 }𝑖∈[𝑘]⧵𝑇 .

• The tuple (𝑦1 , … , 𝑦𝑘 ) that is computed using the following pro-


cess:

a. We let {𝑥𝑖 }𝑖∈𝑇 be chosen by 𝑆, and compute (𝑦1′ , … , 𝑦𝑘′ ) =


𝐹 (𝑥1 , … , 𝑥𝑘 ).

b. For every 𝑖 ∈ [𝑘], if 𝑖 ∉ 𝑇 (i.e., party 𝑖 is “honest”) then 𝑦𝑖 =


𝑦𝑖′ and otherwise, we let 𝑆 choose 𝑦𝑖 .

That is, the protocol is secure if whatever an adversary can gain


by taking complete control over the set of parties in 𝑇 , could have
been gained by simply using this control to choose particular inputs
{𝑥𝑖 }𝑖∈𝑇 , run the protocol honestly, and observe the outputs of the
functionality.
Note that in particular if 𝑇 = ∅ (and hence there is no adversary)
then if the parties’ inputs are (𝑥1 , … , 𝑥𝑘 ) then their outputs will equal
𝐹 (𝑥1 , … , 𝑥𝑘 ).

16.2.2 Allowing for aborts


Definition 16.1 above is a little too strong, in the following sense. Con-
sider the case that 𝑘 = 2 where there are two parties Alice (Party 1)
and Bob (Party 2) that wish to compute some output 𝐹 (𝑥1 , 𝑥2 ). If Bob
is controlled by the adversary then he clearly can simply abort the
protocol and prevent Alice from computing 𝑦1 . Thus, in this case in
the actual execution of the protocol the output 𝑦1 will be some error
message (which we denote by ⊥). But we did not allow this possibility
for the idealized adversary 𝑆: if 1 ∉ 𝑇 then it must be the case that the
output 𝑦1 is equal to 𝑦1′ for some (𝑦1′ , 𝑦2′ ) = 𝐹 (𝑥1 , 𝑥2 ).
This means that we would be able to distinguish between the output in the real and ideal setting.4 This motivates the following, slightly more messy definition, that allows for the ability of the adversary to abort the execution at any point in time:

4 As a side note, we can avoid this issue if we have an honest majority of players, i.e., if |𝑇| < 𝑘/2, but this condition does not make sense in the two party setting, where you can only have an honest majority if there is no cheating party.

Definition 16.2 — MPC with aborts.Let 𝐹 be a 𝑘-party functionality. A


secure protocol for 𝐹 is a protocol for 𝑘 parties satisfying that for ev-
ery 𝑇 ⊆ [𝑘] and every efficient adversary 𝐴, there exists an efficient
“ideal adversary” (i.e., efficient interactive algorithm) 𝑆 such that
for every set of inputs {𝑥𝑖 }𝑖∈[𝑘]⧵𝑇 the following two distributions
are computationally indistinguishable:

• The tuple (𝑦1 , … , 𝑦𝑘 ) of outputs of all the parties (both con-


trolled and not controlled by the adversary) in an execution of
the protocol where 𝐴 controls the parties in 𝑇 and the inputs of
the parties not in 𝑇 are given by {𝑥𝑖}𝑖∈[𝑘]⧵𝑇. We denote 𝑦𝑖 = ⊥ if the 𝑖𝑡ℎ party aborted the protocol.

• The tuple (𝑦1 , … , 𝑦𝑘 ) that is computed using the following pro-


cess:

a. We let {𝑥𝑖 }𝑖∈𝑇 be chosen by 𝑆, and compute (𝑦1′ , … , 𝑦𝑘′ ) =


𝐹 (𝑥1 , … , 𝑥𝑘 ).
b. For 𝑖 = 1, … , 𝑘 do the following: ask 𝑆 if it wishes to abort at
this stage, and if it doesn’t then the 𝑖𝑡ℎ party learns 𝑦𝑖′ . If the
adversary did abort then we exit the loop at this stage and the
parties 𝑖 + 1, … , 𝑘 (regardless of whether they are honest or malicious)
do not learn the corresponding outputs.
c. Let 𝑘′ be the last non-abort stage we reached above. For every
𝑖 ∉ 𝑇, if 𝑖 ≤ 𝑘′ then 𝑦𝑖 = 𝑦𝑖′ and if 𝑖 > 𝑘′ then 𝑦𝑖 = ⊥. We let
the adversary 𝑆 choose {𝑦𝑖 }𝑖∈𝑇 .

Figure 16.1: We define security of a protocol implementing a functionality 𝐹 by stipulating that for every adversary 𝐴 that controls a subset of the parties, 𝐴’s view in an actual execution of the protocol would be indistinguishable from its view in an ideal setting where all the parties send their inputs to an idealized and perfectly trusted party, who then computes the outputs and sends it to each party.

Here are some good exercises to make sure you follow the definition:

• Let 𝐹 be the two party functionality such that 𝐹(𝐻‖𝐶, 𝐻′) outputs (1, 1) if the graph 𝐻 equals the graph 𝐻′ and 𝐶 is a Hamiltonian cycle and otherwise outputs (0, 0). Prove that a protocol for computing 𝐹 is a zero knowledge proof5 system for the language of Hamiltonicity.6

• Let 𝐹 be the 𝑘-party functionality that on inputs 𝑥1, … , 𝑥𝑘 ∈ {0, 1} outputs to all parties the majority value of the 𝑥𝑖’s. Then, in any protocol that securely computes 𝐹, for any adversary that controls less than half of the parties, if at least 𝑘/2 + 1 of the other parties’ inputs equal 0, then the adversary will not be able to cause an honest party to output 1.

5 Actually, if we want to be pedantic, this is what’s known as a zero knowledge argument system since soundness is only guaranteed against efficient provers. However, this distinction is not important in almost all applications.

6 Our treatment of the input graph 𝐻 is an instance of a general case. While the definition of a functionality only talks about private inputs, it’s very easy to include public inputs as well. If we want to include some public input 𝑍 we can simply have 𝑍 concatenated to all the private inputs (and have the functionality check that they are all the same, otherwise outputting an error or some similar result).

P
It is an excellent idea for you to pause here and try to
work out at least informally these exercises.

Amazingly, we can obtain such a protocol for every functionality:

Theorem 16.3 — Fundamental theorem of cryptography. Under reasonable assumptions,7 for every polynomial-time computable 𝑘-party functionality 𝐹 there is a polynomial-time protocol that computes it securely.

7 Originally this was shown under the assumption of trapdoor permutations (which can be derived from the Factoring or RSA conjectures) but it is known today under a variety of other assumptions, including in particular the LWE conjecture.

Theorem 16.3 was originally proven by Yao in 1982 for the special case of two party functionalities, and then proved for the general case by Goldreich, Micali, and Wigderson in 1987. As discussed below, many variants of this theorem have been shown, and this line of research is still ongoing.

16.2.3 Some comments:


There is in fact not a single theorem but rather many variants of this
fundamental theorem obtained by great many people, depending
on the different security properties desired, as well as the different
cryptographic and setup assumptions. Some of the issues studied in
the literature include the following:

• Fairness, guaranteed output delivery: The definition above does


not attempt to protect against “denial of service” attacks, in the
sense that the adversary is allowed, even in the ideal case, to pre-
vent the honest parties from receiving their outputs.
As mentioned above, without an honest majority this is essential, for reasons similar to the ones we discussed in our lecture on bitcoin for why achieving consensus is hard if there isn’t an honest majority.
When there is an honest majority, we can achieve the property of
guaranteed output delivery, which offers protection against such “de-
nial of service” attacks. Even when there is no guaranteed output
delivery, we might want the property of fairness, whereby we guar-
antee that if the honest parties don’t get the output then neither
does the adversary. There has been extensive study of fairness and
there are protocols achieving variants on it under various computa-
tional and setup assumptions.

• Network models: The current definition assumes we have a set of


𝑘 parties with known identities with pairwise secure (confidential
and authenticated) channels between them. Other network models studied include broadcast channels, non-private networks, and even networks with no authentication.

• Setup assumptions: The definition does not assume a trusted third


party, but people have studied different setup assumptions includ-
ing a public key infrastructure, common reference string, and more.

• Adversarial power: It turns out that under certain conditions, it can


be possible to obtain secure multiparty computation with respect to
adversaries that have unbounded computational power (so called
“information theoretic security”). People have also studied different
variants of adversaries including “honest but curious” or “passive
adversaries”, as well as “covert” adversaries that only deviate from
the protocol if they won’t be caught. Other settings studied limit
the adversary’s ability to control parties (e.g., honest majority,
smaller fraction of parties or particular patterns of control, adaptive
vs static corruption).

• Concurrent compositions: The definitions displayed above are for standalone execution, which is known not to automatically imply security with respect to concurrent composition, where many copies of the same protocol (or different protocols) could be executed simultaneously. This opens up all sorts of new attacks.8 See Yehuda Lindell’s thesis (or this updated version) for more. A very general notion known as “UC security” (which stands for “Universally Composable” or maybe “Ultimate Chuck”) has been proposed to achieve security in these settings, though at a price of additional setup assumptions, see here and here.

8 One example of the kind of issues that can arise is the “grandmasters attack” whereby someone with no knowledge of chess could play two grandmasters simultaneously, relaying their moves to one another and thereby guaranteeing a win in at least one of the games (or a draw in both).

• Communication: The communication cost for Theorem 16.3 can be


proportional to the size of the circuit that computes 𝐹 . This can be
a very steep cost, especially when computing over large amounts of
data. It turns out that we can sometimes avoid this cost using fully
homomorphic encryption or other techniques.

• Efficiency vs. generality: While Theorem 16.3 tells us that essen-


tially every protocol problem can be solved in principle, its proof
will almost never yield a protocol you actually want to run since
it has enormous efficiency overhead. The issue of efficiency is the
biggest reason why secure multiparty computation has so far not
had a great many practical applications. However, researchers have
been showing more efficient tailor-made protocols for particular
problems of interest, and there has been steady progress in making
those results more practical. See the slides and videos from this
workshop for more.

Is multiparty secure computation the end of crypto? The notion of secure


multiparty computation seems so strong that you might think that
once it is achieved, aside from efficiency issues, there is nothing else to

be done in cryptography. This is very far from the truth. Multiparty


secure computation does give a way to solve a great many problems
in the setting where we have arbitrary rounds of interactions and un-
bounded communication, but this is far from being always the case.
As we mentioned before, interaction can sometimes make a quali-
tative difference (when Alice and Bob are separated by time rather
than space). As we’ve seen in the discussion of fully homomorphic
encryption, there are also other properties, such as compact commu-
nication, which are not implied by multiparty secure computation but
can make all the difference in contexts such as cloud computing. That
said, multiparty secure computation is an extremely general paradigm
that does apply to many cryptographic problems.

Further reading: The survey of Lindell and Pinkas gives a good


overview of the different variants and security properties considered
in the literature, see also Section 7 in this survey of Goldreich. Chapter
6 in Pass and Shelat’s notes is also a good source.

16.3 EXAMPLE: SECOND PRICE AUCTION USING BITCOIN


Suppose we have the following setup: an auctioneer wants to sell
some item and run a second-price auction, where each party submits
a sealed bid, and the highest bidder gets the item for the price of the
second highest bid. However, as mentioned above, the bidders do not
want the auctioneer to learn what their bid was, and in general noth-
ing else other than the identity of the highest bidder and the value
of the second highest bid. Moreover, we might want the payment to
be via an electronic currency such as bitcoin, so the auctioneer not
only gets the information about the winning bid but an actual self-
certifying transaction they can use to get the payment.
Here is how we could obtain such a protocol using secure multi-
party computation:

• We have 𝑘 parties where the first party is the auctioneer and parties 2, … , 𝑘 are bidders. Let’s assume for simplicity that each party 𝑖 has a public key 𝑣𝑖 that is associated with some bitcoin account.9 We treat all these keys as the public input.

• The private input of bidder 𝑖 is the value 𝑥𝑖 that it wants to bid as well as the secret key 𝑠𝑖 that corresponds to their public key.

• The functionality only provides an output to the auctioneer, which would be the identity 𝑖 of the winning bidder as well as a valid signature on this bidder transferring 𝑥 bitcoins to the key 𝑣1 of the auctioneer, where 𝑥 is the value of the second largest valid bid (i.e., 𝑥 equals the second largest 𝑥𝑗 such that 𝑠𝑗 is indeed the private key corresponding to 𝑣𝑗).

9 As we discussed before, bitcoin doesn’t have the notion of accounts. Rather, what we mean by that is that for each one of the public keys, the public ledger contains a sufficiently large amount of bitcoins that have been transferred to these keys (in the sense that whomever can sign w.r.t. these keys can transfer the corresponding coins).
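As a tiny sanity check of the functionality being computed, here is its non-cryptographic core in Python (ignoring the keys and signatures, which are of course the cryptographically interesting part; bidders are indexed from 0 for simplicity):

```python
def second_price_auction(bids):
    """The ideal functionality's logic: given all (valid) bids, return the
    winner's index and the second-highest bid, for the auctioneer only."""
    order = sorted(range(len(bids)), key=lambda j: bids[j], reverse=True)
    return order[0], bids[order[1]]

assert second_price_auction([10, 30, 20]) == (1, 20)
```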

It’s worthwhile to think about what a secure protocol for this func-
tionality accomplishes. For example:

• The fact that in the ideal model the adversary needs to choose its
queries independently means that the adversary cannot get any
information about the honest parties’ bids before deciding on its
bid.

• Despite all parties using their signing keys as inputs to the protocol,
we are guaranteed that no one will learn anything about another
party’s signing key except the single signature that will be pro-
duced.

• Note that if 𝑖 is the highest bidder and 𝑗 is the second highest, then
at the end of the protocol we get a valid signature using 𝑠𝑖 on a
transaction transferring 𝑥𝑗 bitcoins to 𝑣1 , despite 𝑖 not knowing the
value 𝑥𝑗 (and in fact never learning the identity of 𝑗.) Nonetheless,
𝑖 is guaranteed that the signature produced will be on an amount
not larger than its own bid and an amount that one of the other
bidders actually bid for.

I find the ability to obtain such strong notions of security pretty


remarkable. This demonstrates the tremendous power of obtaining
protocols for general functionalities.

16.3.1 Another example: distributed and threshold cryptography


It sometimes makes sense to use multiparty secure computation for
cryptographic computations as well. For example, there might be several
reasons why we would want to “split” a secret key between several
parties, so no party knows it completely.

• Some proposals for key escrow (giving government or other entity


an option for decrypting communication) suggested splitting a
cryptographic key between several agencies or institutions (say
the FBI, the courts, etc.) so that they must collaborate in order
to decrypt communication, thus hopefully preventing unlawful
access.

• On the other side, a company might wish to split its own key be-
tween several servers residing in different countries, to ensure not
one of them is completely under one jurisdiction. Or it might do
such splitting for technical reasons, so that if there is a break in at a
single site, the key is not compromised.

There are several other such examples. One problem with this ap-
proach is that splitting a cryptographic key is not the same as cutting a
100 dollar bill in half. If you simply give half of the bits to each party,

you could significantly harm security. (For example, it is possible to


recover the full RSA key from only 27% of its bits).
Here is a better approach, known as secret sharing: To securely
share a string 𝑠 ∈ {0, 1}𝑛 among 𝑘 parties so that any 𝑘 − 1 of them
have no information about it, we choose 𝑠1 , … , 𝑠𝑘−1 at random in
{0, 1}𝑛 and let 𝑠𝑘 = 𝑠 ⊕ 𝑠1 ⊕ ⋯ ⊕ 𝑠𝑘−1 (⊕ as usual denotes the XOR
operation), and give party 𝑖 the string 𝑠𝑖 , which is known as the 𝑖𝑡ℎ
share of 𝑠. Note that 𝑠 = 𝑠1 ⊕ ⋯ ⊕ 𝑠𝑘 and so given all 𝑘 pieces we can
reconstruct the key. Clearly the first 𝑘 − 1 parties did not receive any
information about 𝑠 (since their shares were generated independent of
𝑠), but the following not-too-hard claim shows that this holds for every
set of 𝑘 − 1 parties:
Lemma 16.4 For every 𝑠 ∈ {0, 1}𝑛 , and set 𝑇 ⊆ [𝑘] of size 𝑘 − 1, we get
exactly the same distribution over (𝑠1 , … , 𝑠𝑘 ) as above if we choose 𝑠𝑖
for 𝑖 ∈ 𝑇 at random and set 𝑠𝑡 = 𝑠 ⊕ (⨁𝑖∈𝑇 𝑠𝑖) where {𝑡} = [𝑘] ⧵ 𝑇.
We leave the proof of Lemma 16.4 as an exercise.
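For concreteness, here is this 𝑘-out-of-𝑘 XOR secret sharing in a few lines of Python (a direct transcription of the construction above, applied to byte strings):

```python
import secrets
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def share(s: bytes, k: int) -> list:
    """s_1, ..., s_{k-1} are uniform; s_k = s XOR s_1 XOR ... XOR s_{k-1}."""
    shares = [secrets.token_bytes(len(s)) for _ in range(k - 1)]
    shares.append(reduce(xor, shares, s))
    return shares

def reconstruct(shares) -> bytes:
    return reduce(xor, shares)

s = b"attack at dawn!!"
sh = share(s, 5)
assert reconstruct(sh) == s
# Any k-1 of the shares are independent uniform strings (Lemma 16.4),
# so they reveal nothing about s.
```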
Secret sharing solves the problem of protecting the key “at rest”
but if we actually want to use the secret key in order to sign or decrypt
some message, then it seems we need to collect all the pieces together
into one place, which is exactly what we wanted to avoid doing. This
is where multiparty secure computation comes into play; we can de-
fine a functionality 𝐹 taking public input 𝑚 and secret inputs 𝑠1 , … , 𝑠𝑘
and producing a signature or decryption of 𝑚. In fact, we can go be-
yond that and even have the parties sign or decrypt a message without
them knowing what this message is, except that it satisfies some con-
ditions.
Moreover, secret sharing can be generalized so that a threshold
other than 𝑘 is necessary and sufficient to reconstruct the secret (and
people have also studied more complicated access patterns). Similarly
multiparty secure computation can be used to achieve distributed
cryptography with finer access control mechanisms.

16.4 PROVING THE FUNDAMENTAL THEOREM:


We will complete the proof of (a relaxed version of) the fundamental
theorem over this lecture and the next one. The proof consists of two
phases:

1. A protocol for the “honest but curious” case using fully homomor-
phic encryption.

2. A reduction of the general case into the “honest but curious” case
where the adversary follows the protocol precisely but merely
attempts to learn some information on top of the output that it is

“entitled to” learn. (This reduction is based on zero knowledge


proofs and is due to Goldreich, Micali and Wigderson)
We note that while fully homomorphic encryption yields a con-
ceptually simple approach for the first step, it is not currently the
most efficient approach, and rather most practical implementations
are based on the technique known as “Yao’s Garbled Circuits” (see this book or this paper or this survey) which in turn is based on a notion known as oblivious transfer which can be thought of as a “baby private information retrieval” (though it preceded the latter notion).
We will focus on the case of two parties. The same ideas extend to
𝑘 > 2 parties but with some additional complications.

16.5 MALICIOUS TO HONEST BUT CURIOUS REDUCTION


We start from the second stage: giving a reduction transforming a protocol in the “honest but curious” setting into a protocol secure in the malicious setting. That is, we will prove the following theorem:

Theorem 16.5 — Honest-but-curious to malicious security compiler. There is a polynomial-time “compiler” 𝐶 such that for every 𝑘 party protocol (𝑃1, … , 𝑃𝑘) (where all 𝑃𝑖’s are polynomial-time computable, potentially randomized, strategies), (𝑃1̃, … , 𝑃𝑘̃) = 𝐶(𝑃1, … , 𝑃𝑘) is a 𝑘-tuple of polynomial-time computable strategies, and moreover if (𝑃1, … , 𝑃𝑘) was a protocol for computing some (potentially randomized) functionality 𝐹 secure with respect to honest-but-curious adversaries, then (𝑃1̃, … , 𝑃𝑘̃) is a protocol for computing the same 𝐹 secure with respect to malicious adversaries.

The remainder of this section is devoted to the proof of Theo-


rem 16.5. For ease of notation we will focus on the 𝑘 = 2 case, where
there are only two parties (“Alice” and “Bob”) although these tech-
niques generalize to an arbitrary number of parties 𝑘. Note that a priori,
it is not obvious at all that such a compiler should exist. In the “honest
but curious” setting we assume the adversary follows the protocol to
the letter. Thus a protocol where Alice gives away all her secrets to
Bob if he merely asks her to do so politely can be secure in the “honest
but curious” setting if Bob’s instructions are not to ask. More seri-
ously, it could very well be that Bob has an ability to deviate from the
protocol in subtle ways that would be completely undetectable but
allow him to learn Alice’s secrets. Any transformation of the protocol
to obtain security in the malicious setting will need to rule out such
deviations.
The main idea is the following: we do the compilation one party
at a time - we first transform the protocol so that it will remain se-
cure even if Alice tries to cheat, and then transform it so it will remain

secure even if Bob tries to cheat. Let’s focus on Alice. Let’s imagine
(without loss of generality) that Alice and Bob alternate sending mes-
sages in the protocol with Alice going first, and so Alice sends the
odd messages and Bob sends the even ones. Lets denote by 𝑚𝑖 the
message sent in the 𝑖𝑡ℎ round of the protocol. Alice’s instructions can
be thought of as a sequence of functions 𝑓1 , 𝑓3 , ⋯ , 𝑓𝑡 (where 𝑡 is the
last round in which Alice speaks) where each 𝑓𝑖 is an efficiently com-
putable function mapping Alice’s secret input 𝑥1 , (possibly) her ran-
dom coins 𝑟1 , and the transcript of the previous messages 𝑚1 , … , 𝑚𝑖−1
to the next message 𝑚𝑖 . The functions {𝑓𝑖 } are publicly known and
part of the protocol’s instructions. The only thing that Bob doesn’t
know is 𝑥1 and 𝑟1 . So, our idea would be to change the protocol so
that after Alice sends the message 𝑖, she proves to Bob that it was in-
deed computed correctly using 𝑓𝑖 . If 𝑥1 and 𝑟1 weren’t secret, Alice
could simply send those to Bob so he can verify the computation on
his own. But because they are (and the security of the protocol could
depend on that) we instead use a zero knowledge proof.
Let’s assume for starters that Alice’s strategy is deterministic (and
so there is no random tape 𝑟1 ). A first attempt to ensure she can’t
use a malicious strategy would be for Alice to follow the message
𝑚𝑖 with a zero knowledge proof that there exists some 𝑥1 such that
𝑚𝑖 = 𝑓𝑖 (𝑥1 , 𝑚1 , … , 𝑚𝑖−1 ). However, this will actually not be secure
- it is worthwhile at this point for you to pause and think if you can
understand the problem with this solution.

P
Really, please stop and think why this will not be
secure.

P
Did you stop and think?

The problem is that at every step Alice proves that there exists
some input 𝑥1 that can explain her message but she doesn’t prove that
it’s the same input for all messages. If Alice was being truly honest, she should have picked her input once and used it throughout the protocol; she should not compute the first message according to the input 𝑥1 and then the third message according to some input 𝑥′1 ≠ 𝑥1.
Of course we can’t have Alice reveal the input, as this would violate
security. The solution is for Alice to commit in advance to the input.
We have seen commitments before, but let us now formally define the
notion:

Definition 16.6 — Commitment scheme. A commitment scheme for strings of length ℓ is a two party protocol between the sender and receiver satisfying the following:

• Hiding (sender’s security): For every two sender inputs 𝑥, 𝑥′ ∈ {0, 1}ℓ, and no matter what efficient strategy the receiver uses, it cannot distinguish between the interaction with the sender when the latter uses 𝑥 as opposed to when it uses 𝑥′.

• Binding (receiver’s security): No matter what (efficient or non efficient) strategy the sender uses, if the receiver follows the protocol then with probability 1 − 𝑛𝑒𝑔𝑙(𝑛), there will exist at most a single string 𝑥 ∈ {0, 1}ℓ such that the transcript is consistent with the input 𝑥 and some sender randomness 𝑟.

That is, a commitment is the digital analog to placing a message in a sealed envelope to be opened at a later time. To commit to a message 𝑥 the sender and receiver interact according to the protocol, and to open the commitment the sender simply sends 𝑥 as well as the random coins it used during the commitment phase. The variant we defined above is known as computationally hiding and statistically binding, since the sender’s security is only guaranteed against efficient receivers while the binding property is guaranteed against all senders. There are also statistically hiding and computationally binding commitments, though it can be shown that we need to restrict to efficient strategies for at least one of the parties.

We have already seen a commitment scheme before (due to Naor): the receiver sends a random 𝑧 ←𝑅 {0, 1}^{3𝑛} and the sender commits to a bit 𝑏 by choosing a random 𝑠 ∈ {0, 1}^𝑛 and sending 𝑦 = PRG(𝑠) ⊕ 𝑏𝑧 where PRG ∶ {0, 1}^𝑛 → {0, 1}^{3𝑛} is a pseudorandom generator. It’s

a good exercise to verify that it satisfies the above definitions. By


running this protocol ℓ times in parallel we can commit to a string of
any polynomial length.
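Here is a sketch of this commitment scheme in Python. Since we need some concrete PRG, the sketch heuristically instantiates one with SHA-256 in counter mode, and the seed length of 16 bytes is illustrative:

```python
import hashlib
import secrets

N = 16          # seed length in bytes (the "n" of the scheme, up to units)
OUT = 3 * N     # the PRG stretches n bits to 3n bits

def prg(seed: bytes) -> bytes:
    """A length-tripling "PRG", instantiated heuristically with SHA-256."""
    out = b""
    counter = 0
    while len(out) < OUT:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:OUT]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

z = secrets.token_bytes(OUT)      # receiver's first message: z <-R {0,1}^{3n}

b = 1                             # the bit the sender commits to
s = secrets.token_bytes(N)        # sender's randomness
y = prg(s) if b == 0 else xor(prg(s), z)   # y = PRG(s) XOR b*z, sent to receiver

# Opening: the sender reveals (b, s) and the receiver recomputes y.
assert y == (prg(s) if b == 0 else xor(prg(s), z))
```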
We can now describe the transformation ensuring the protocol is secure against a malicious Alice in full, for the case that the original strategy of Alice is deterministic (and hence uses no random coins):

• Initially Alice and Bob engage in a commitment scheme where Alice commits to her input 𝑥1. Let 𝜏 be the transcript of this commitment phase and 𝑟𝑐𝑜𝑚 be the randomness Alice used during it.10

• For 𝑖 = 1, 2, …:

  – If 𝑖 is even then Bob sends 𝑚𝑖 to Alice.

  – If 𝑖 is odd then Alice sends 𝑚𝑖 to Bob and then they engage in a zero knowledge proof that there exist 𝑥1, 𝑟𝑐𝑜𝑚 such that (1) 𝑥1, 𝑟𝑐𝑜𝑚 are consistent with 𝜏, and (2) 𝑚𝑖 = 𝑓𝑖(𝑥1, 𝑚1, … , 𝑚𝑖−1). The proof is repeated a sufficient number of times to ensure that if the statement is false then Bob rejects with 1 − 𝑛𝑒𝑔𝑙(𝑛) probability.

  – If the proof is rejected then Bob aborts the protocol.

10 Note that even though we assumed that in the original honest-but-curious protocol Alice used a deterministic strategy, we will transform the protocol into one in which Alice uses a randomized strategy in both the commitment and zero knowledge phases.

We will not prove security but will only sketch it here, see Section
7.3.2 in Goldreich’s survey for a more detailed proof:

• To argue that we maintain security for Alice we use the zero knowl-
edge property: we claim that Bob could not learn anything from
the zero knowledge proofs precisely because he could have sim-
ulated them by himself. We also use the hiding property of the
commitment scheme. To prove security formally we need to show
that whatever Bob learns in the modified protocol, he could have
learned in the original protocol as well. We do this by simulating
Bob by replacing the commitment scheme with commitment to
some random junk instead of 𝑥1 and the zero knowledge proofs
with their simulated version. The proof of security requires a hy-
brid argument, and is again a good exercise to try to do it on your
own.

• To argue that we maintain security for Bob we use the binding prop-
erty of the commitment scheme as well as the soundness property
of the zero knowledge system. Once again for the formal proof
we need to show that we could transform any potentially mali-
cious strategy for Alice in the modified protocol into an “honest
but curious” strategy in the original protocol (also allowing Alice
the ability to abort the protocol). It turns out that to do so, it is not

enough that the zero knowledge system is sound but we need a


stronger property known as a proof of knowledge. We will not define
it formally, but roughly speaking it means we can transform any
prover strategy that convinces the verifier that a statement is true
with non-negligible probability into an algorithm that outputs the
underlying secret (i.e., 𝑥1 and 𝑟𝑐𝑜𝑚 in our case). This is crucial in
order to transform Alice’s potentially malicious strategy into an
honest but curious strategy.

We can repeat this transformation for Bob (or Charlie, David, etc.
in the 𝑘 > 2 party case) to transform a protocol secure in the honest
but curious setting into a protocol secure (allowing for aborts) in the
malicious setting.

16.5.1 Handling probabilistic strategies:


So far we assumed that the original strategy of Alice in the honest but
curious setting is deterministic, but of course we need to consider probabilistic
strategies as well. One approach could be to simply think of Alice’s
random tape 𝑟 as part of her secret input 𝑥1 . However, while in the
honest but curious setting Alice is still entitled to freely choose her
own input 𝑥1 , she is not entitled to choose the random tape as she
wishes but is supposed to follow the instructions of the protocol and
choose it uniformly at random. Hence we need to use a coin tossing
protocol to choose the randomness, or more accurately what’s known
as a “coin tossing in the well” protocol where Alice and Bob engage
in a coin tossing protocol at the end of which they generate some ran-
dom coins 𝑟 that only Alice knows but Bob is still guaranteed that they
are random. Such a protocol can actually be achieved very simply.
Suppose we want to generate 𝑚 coins:

• Alice selects 𝑟′ ←𝑅 {0, 1}𝑚 at random and engages in a commitment


protocol to commit to 𝑟′ .
• Bob selects 𝑟″ ←𝑅 {0, 1}𝑚 and sends it to Alice in the clear.
• The result of the coin tossing protocol will be the string 𝑟 = 𝑟′ ⊕ 𝑟″ .

Note that Alice knows 𝑟. Bob doesn’t know 𝑟 but because he chose
𝑟″ after Alice committed to 𝑟′ he knows that it must be fully random
regardless of Alice’s choice of 𝑟′ . It can be shown that if we use this
coin tossing protocol at the beginning and then modify the zero
knowledge proofs to show that 𝑚𝑖 = 𝑓𝑖 (𝑥1 , 𝑟1 , 𝑚1 , … , 𝑚𝑖−1 ) where
𝑟1 is the string that is consistent with the transcript of the coin toss-
ing protocol, then we get a general transformation of an honest but
curious adversary into the malicious setting.
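A minimal Python sketch of this “coin tossing in the well” (using, for brevity, a hash-based commitment in the random-oracle style rather than the PRG-based scheme above; the sizes are illustrative):

```python
import hashlib
import secrets

M = 32  # number of random bytes to generate

# Step 1: Alice picks r' and commits to it (here: hash of a salted value).
r1 = secrets.token_bytes(M)
salt = secrets.token_bytes(16)
com = hashlib.sha256(salt + r1).digest()        # sent to Bob

# Step 2: Bob sends r'' in the clear.
r2 = secrets.token_bytes(M)

# Step 3: the result is r = r' XOR r''. Only Alice knows r, but since r''
# was chosen after Alice was bound to r', Bob knows r is uniform.
r = bytes(a ^ b for a, b in zip(r1, r2))

# Later, when Alice opens the commitment (e.g., inside the zero knowledge
# consistency proofs), Bob verifies:
assert hashlib.sha256(salt + r1).digest() == com
```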
The notion of multiparty secure computation - defining it and
achieving it - is quite subtle and I do urge you to read some of the

other references listed above as well. In particular, the slides and


videos from the Bar Ilan winter school on secure computation and
efficiency, as well as the ones from the winter school on advances in
practical multiparty computation are great sources for this and related
materials.
17
Multiparty secure computation II: Construction using Fully
Homomorphic Encryption

In the last lecture we saw the definition of secure multiparty computation, as well as the compiler reducing the task of achieving security in the general (malicious) setting to the passive (honest-but-curious) setting. In this lecture we will see how using fully homomorphic encryption we can achieve security in the honest-but-curious setting.1 We focus on the two party case, and so prove the following theorem:

1 This is by no means the only way to get multiparty secure computation. In fact, multiparty secure computation was known well before FHE was discovered. One common construction for achieving this uses a technique known as Yao’s Garbled Circuit.

Theorem 17.1 — Two party honest-but-curious MPC. Assuming the LWE conjecture, for every two party functionality 𝐹 there is a protocol computing 𝐹 in the honest but curious model.

Before proving the theorem it might be worthwhile to recall the actual definition of secure multiparty computation, specialized to the 𝑘 = 2 honest-but-curious case. The definition significantly simplifies here since we don’t have to deal with the possibility of aborts.

Definition 17.2 — Two party honest-but-curious secure computation. Let 𝐹 be a (possibly probabilistic) map of {0, 1}𝑛 × {0, 1}𝑛 to {0, 1}𝑛 × {0, 1}𝑛. A secure protocol for 𝐹 is a two party protocol such that for every party 𝑡 ∈ {1, 2}, there exists an efficient “ideal adversary” (i.e., efficient interactive algorithm) 𝑆 such that for every pair of inputs (𝑥1, 𝑥2) the following two distributions are computationally indistinguishable:

• The tuple (𝑦1 , 𝑦2 , 𝑣) obtained by running the protocol on inputs


𝑥1 , 𝑥2 , and letting 𝑦1 , 𝑦2 be the outputs of the two parties and
𝑣 be the view (all internal randomness, inputs, and messages
received) of party 𝑡.




• The tuple (𝑦1 , 𝑦2 , 𝑣) that is computed by letting (𝑦1 , 𝑦2 ) =


𝐹 (𝑥1 , 𝑥2 ) and 𝑣 = 𝑆(𝑥𝑡 , 𝑦𝑡 ).

That is, 𝑆, which only gets the input 𝑥𝑡 and output 𝑦𝑡 , can sim-
ulate all the information that an honest-but-curious adversary
controlling party 𝑡 will view.

17.1 CONSTRUCTING 2 PARTY HONEST BUT CURIOUS COMPUTA-


TION FROM FULLY HOMOMORPHIC ENCRYPTION
Let 𝐹 be a two party functionality. Let’s start with the case that 𝐹 is de-
terministic and that only Alice receives an output. We’ll later show an
easy reduction from the general case to this one. Here is a suggested
protocol for Alice and Bob to run on inputs 𝑥, 𝑦 respectively so that
Alice will learn 𝐹 (𝑥, 𝑦) but nothing more about 𝑦, and Bob will learn
nothing about 𝑥 that he didn’t know before.

Protocol 2PC: (See Fig. 17.1)

• Assumptions: (𝐺, 𝐸, 𝐷, EVAL) is a fully homomorphic encryption scheme.

• Inputs: Alice’s input is 𝑥 ∈ {0, 1}𝑛 and Bob’s input is 𝑦 ∈ {0, 1}𝑛. The goal is for Alice to learn only 𝐹(𝑥, 𝑦) and Bob to learn nothing.

• Alice->Bob: Alice generates (𝑒, 𝑑) ←𝑅 𝐺(1𝑛) and sends 𝑒 and 𝑐 = 𝐸𝑒(𝑥).

• Bob->Alice: Bob defines 𝑓 to be the function 𝑓(𝑥) = 𝐹(𝑥, 𝑦) and sends 𝑐′ = EVAL(𝑓, 𝑐) to Alice.

• Alice’s output: Alice computes 𝑧 = 𝐷𝑑(𝑐′).

Figure 17.1: An honest but curious protocol for two party computation using a fully homomorphic encryption scheme with circuit privacy.
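To pin down the message flow, here is a minimal Python sketch in which the “FHE” is a trivially insecure placeholder (encryption is the identity map); it only illustrates who computes and sends what, and a real instantiation would of course plug in an actual FHE scheme, with the circuit privacy issue discussed next:

```python
# Placeholder "FHE": identity encryption, with NO security whatsoever.
def G(n):          return ("pk", "sk")       # key generation
def E(e, x):       return x                  # "encryption"
def D(d, c):       return c                  # "decryption"
def EVAL(e, f, c): return f(c)               # "homomorphic" evaluation

def F(x, y):                                 # some two-party functionality
    return x & y

# Alice's side
x = 0b1100
e, d = G(16)
msg1 = (e, E(e, x))                          # Alice -> Bob: (e, c)

# Bob's side
y = 0b1010
e_recv, c = msg1
c_prime = EVAL(e_recv, lambda xx: F(xx, y), c)   # Bob -> Alice: c' = EVAL(f, c)

# Alice's output
z = D(d, c_prime)
assert z == F(x, y)
```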

First, note that if Alice and Bob both follow the protocol, then in-
deed at the end of the protocol Alice will compute 𝐹 (𝑥, 𝑦). We now
claim that Bob does not learn anything about Alice’s input:

Claim B: For every 𝑥, 𝑦, there exists a standalone algorithm 𝑆 such


that 𝑆(𝑦) is indistinguishable from Bob’s view when interacting with
Alice and their corresponding inputs are (𝑥, 𝑦).

Proof: Bob only receives a single message in this protocol of the form
(𝑒, 𝑐) where 𝑒 is a public key and 𝑐 = 𝐸𝑒 (𝑥). The simulator 𝑆 will
generate (𝑒, 𝑑) ←𝑅 𝐺(1𝑛 ) and compute (𝑒, 𝑐) where 𝑐 = 𝐸𝑒 (0𝑛 ).
(As usual 0𝑛 denotes the length 𝑛 string consisting of all zeroes.)
No matter what 𝑥 is, the output of 𝑆 is indistinguishable from the
message Bob receives by the security of the encryption scheme. QED

(In fact, Claim B holds even against a malicious strategy of Bob; can
you see why?)
We would now hope that we can prove the same regarding Alice’s
security. That is, prove the following:

Claim A: For every 𝑥, 𝑦, there exists a standalone algorithm 𝑆 such


that 𝑆(𝑦) is indistinguishable from Alice’s view when interacting with
Bob and their corresponding inputs are (𝑥, 𝑦).

P
At this point, you might want to try to see if you can
prove Claim A on your own. If you’re having difficul-
ties proving it, try to think whether it’s even true.

So, it turns out that Claim A is not generically true. The reason is the following: the definition of fully homomorphic encryption only requires that EVAL(𝑓, 𝐸(𝑥)) decrypts to 𝑓(𝑥) but it does not require that it hides the contents of 𝑓. For example, for every FHE, if we modify EVAL(𝑓, 𝑐) to append to the ciphertext the first 100 bits of the description of 𝑓 (and have the decryption algorithm ignore this extra information) then this would still be a secure FHE.2 Now we didn’t exactly specify how we describe the function 𝑓(𝑥) defined as 𝑥 ↦ 𝐹(𝑥, 𝑦), but there are clearly representations in which the first 100 bits of the description would reveal the first few bits of the hardwired constant 𝑦, hence meaning that Alice will learn those bits from Bob’s message.

2 It’s true that strictly speaking, we allowed EVAL’s output to have length at most 𝑛, while this would make the output be 𝑛 + 100, but this is just a technicality that can be easily bypassed, for example by having a new scheme that on security parameter 𝑛 runs the original scheme with parameter 𝑛/2 (and hence will have a lot of “room” to pad the output of EVAL with extra bits).
Thus we need to get a stronger property, known as circuit privacy.
This is a property that’s useful in other contexts where we use FHE.
Let us now define it:
Definition (Perfect circuit privacy). Let ℰ = (𝐺, 𝐸, 𝐷, EVAL) be an FHE. We say that ℰ satisfies perfect circuit privacy if for every (𝑒, 𝑑) output by 𝐺(1𝑛), every function 𝑓 ∶ {0, 1}ℓ → {0, 1} of 𝑝𝑜𝑙𝑦(𝑛) description size, and all ciphertexts 𝑐1, … , 𝑐ℓ and 𝑥1, … , 𝑥ℓ ∈ {0, 1} such that 𝑐𝑖 is output by 𝐸𝑒(𝑥𝑖), the distribution of EVAL𝑒(𝑓, 𝑐1, … , 𝑐ℓ) is identical to the distribution of 𝐸𝑒(𝑓(𝑥)). That is, for every 𝑧 ∈ {0, 1}∗, the probability that EVAL𝑒(𝑓, 𝑐1, … , 𝑐ℓ) = 𝑧 is the same as the probability that 𝐸𝑒(𝑓(𝑥)) = 𝑧. We stress that these probabilities are taken only over the coins of the algorithms EVAL and 𝐸.
Perfect circuit privacy is a strong property, that also automatically
implies that 𝐷𝑑 (EVAL(𝑓, 𝐸𝑒 (𝑥1 ), … , 𝐸𝑒 (𝑥ℓ ))) = 𝑓(𝑥) (can you see
why?). In particular, once you understand the definition, the follow-
ing lemma is a fairly straightforward exercise.
Lemma 17.3 If (𝐺, 𝐸, 𝐷, EVAL) satisfies perfect circuit privacy then if
(𝑒, 𝑑) = 𝐺(1𝑛 ) then for every two functions 𝑓, 𝑓 ′ ∶ {0, 1}ℓ → {0, 1} of
𝑝𝑜𝑙𝑦(𝑛) description size and every 𝑥 ∈ {0, 1}ℓ such that 𝑓(𝑥) = 𝑓 ′ (𝑥),
and every algorithm 𝐴,

| Pr[𝐴(𝑑, EVAL(𝑓, 𝐸𝑒(𝑥1), … , 𝐸𝑒(𝑥ℓ))) = 1] − Pr[𝐴(𝑑, EVAL(𝑓′, 𝐸𝑒(𝑥1), … , 𝐸𝑒(𝑥ℓ))) = 1]| < 𝑛𝑒𝑔𝑙(𝑛).   (17.1)

P
Please stop here and try to prove Lemma 17.3

The algorithm 𝐴 above gets the secret key as input, but still cannot
distinguish whether the EVAL algorithm used 𝑓 or 𝑓 ′ . In fact, the
expression on the lefthand side of (17.1) is equal to zero when the
scheme satisfies perfect circuit privacy.

However, for our applications bounding it by a negligible function is enough. Hence, we can use the relaxed notion of "imperfect" circuit privacy, defined as follows:

Definition 17.4 — Statistical circuit privacy. Let ℰ = (𝐺, 𝐸, 𝐷, EVAL) be an FHE. We say that ℰ satisfies statistical circuit privacy if for every (𝑒, 𝑑) output by 𝐺(1^𝑛) and every function 𝑓 ∶ {0, 1}^ℓ → {0, 1} of 𝑝𝑜𝑙𝑦(𝑛) description size, and all ciphertexts 𝑐1, … , 𝑐ℓ and 𝑥1, … , 𝑥ℓ ∈ {0, 1} such that 𝑐𝑖 is output by 𝐸𝑒(𝑥𝑖), the distribution of EVAL𝑒(𝑓, 𝑐1, … , 𝑐ℓ) is equal up to 𝑛𝑒𝑔𝑙(𝑛) total variation distance to the distribution of 𝐸𝑒(𝑓(𝑥)). That is,

∑_{𝑧∈{0,1}^∗} |Pr[EVAL𝑒(𝑓, 𝑐1, … , 𝑐ℓ) = 𝑧] − Pr[𝐸𝑒(𝑓(𝑥)) = 𝑧]| < 𝑛𝑒𝑔𝑙(𝑛),

where once again these probabilities are taken only over the coins of the algorithms EVAL and 𝐸.

If you find Definition 17.4 hard to parse, the most important points you need to remember about it are the following:

• Statistical circuit privacy is as good as perfect circuit privacy for all applications, and so you can imagine the latter notion when using it.

• Statistical circuit privacy can be easier to achieve in constructions.

(The third point, which goes without saying, is that you can always ask clarifying questions in class, Piazza, sections, or office hours…)
Intuitively, circuit privacy corresponds to what we need in the
above protocol to protect Bob’s security and ensure that Alice doesn’t
get any information about his input that she shouldn’t have from
the output of EVAL, but before working this out, let us see how we
can construct fully homomorphic encryption schemes satisfying this
property.

17.2 ACHIEVING CIRCUIT PRIVACY IN A FULLY HOMOMORPHIC ENCRYPTION
We now discuss how we can modify our fully homomorphic encryption schemes to achieve the notion of circuit privacy. In the scheme we saw, the encryption of a bit 𝑏, whether obtained through the encryption algorithm or EVAL, always had the form of a matrix 𝐶 over ℤ𝑞 (for 𝑞 = 2^{√𝑛}) where 𝐶𝑣 = 𝑏𝑣 + 𝑒 for some vector 𝑒 that is "small" (e.g., for every 𝑖, |𝑒𝑖| < 𝑛^{𝑝𝑜𝑙𝑦𝑙𝑜𝑔(𝑛)} ≪ 𝑞 = 2^{√𝑛}). However, the EVAL algorithm was deterministic, and hence this vector 𝑒 is a function of whatever function 𝑓 we are evaluating, and someone who knows the secret key 𝑣 could recover 𝑒 and then obtain from it some information about 𝑓. We want to make EVAL probabilistic and lose that information, and we use the following approach:
To kill a signal, drown it in lots of noise

That is, if we manage to add some additional random noise 𝑒′ that has magnitude much larger than 𝑒, then it would essentially "erase" any structure 𝑒 had. More formally, we will use the following lemma:
Lemma 17.5 Let 𝑎 ∈ ℤ𝑞 and 𝑇 ∈ ℕ be such that 𝑎𝑇 < 𝑞/2. If we let 𝑋 be the distribution obtained by taking 𝑥 (mod 𝑞) for an integer 𝑥 chosen at random in [−𝑇, +𝑇] and let 𝑋′ be the distribution obtained by taking 𝑎 + 𝑥 (mod 𝑞) for 𝑥 chosen in the same way, then

∑_{𝑦∈ℤ𝑞} |Pr[𝑋 = 𝑦] − Pr[𝑋′ = 𝑦]| < |𝑎|/𝑇 .

Proof. This has a simple "proof by picture": consider the intervals [−𝑇, +𝑇] and [−𝑇 + 𝑎, +𝑇 + 𝑎] on the number line (see Fig. 17.2). Note that the symmetric difference of these two intervals is only about an 𝑎/𝑇 fraction of their union. More formally, 𝑋 is the uniform distribution over the 2𝑇 + 1 numbers in the interval [−𝑇, +𝑇], while 𝑋′ is the uniform distribution over the shifted version of this interval [−𝑇 + 𝑎, +𝑇 + 𝑎]. There are exactly 2|𝑎| numbers which get probability zero under one of those distributions and probability (2𝑇 + 1)^{−1} < (2𝑇)^{−1} under the other.

Figure 17.2: If 𝑎 ≪ 𝑇 then the uniform distribution over the interval [−𝑇, +𝑇] is statistically close to the uniform distribution over the interval [−𝑇 + 𝑎, +𝑇 + 𝑎], since the statistical distance is proportional to the event (which happens with probability 𝑎/𝑇) that a random sample from one distribution falls inside the symmetric difference of the two intervals.
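To make the bound concrete, here is a small Python sanity check (our own illustration, not from the text; the parameters are arbitrary) that computes the statistical distance between the uniform distribution on [−𝑇, +𝑇] and its shift by 𝑎, both reduced mod 𝑞:

```python
# Sanity check for Lemma 17.5 with illustrative parameters.
q, T, a = 10**6, 1000, 7          # we need a*T < q/2

def shifted_uniform(shift):
    """Distribution of (shift + x) mod q for x uniform in [-T, +T]."""
    probs = {}
    for x in range(-T, T + 1):
        y = (shift + x) % q
        probs[y] = probs.get(y, 0) + 1 / (2 * T + 1)
    return probs

X, Xp = shifted_uniform(0), shifted_uniform(a)
support = set(X) | set(Xp)
dist = sum(abs(X.get(y, 0) - Xp.get(y, 0)) for y in support)
print(dist, a / T)                 # dist = 2a/(2T+1), which is < a/T
assert dist < a / T
```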

We will also use the following lemma:

Lemma 17.6 If two distributions over numbers 𝑋 and 𝑋′ satisfy Δ(𝑋, 𝑋′) = ∑_{𝑦∈ℤ} |Pr[𝑋 = 𝑦] − Pr[𝑋′ = 𝑦]| < 𝛿, then the distributions 𝑋^𝑚 and 𝑋′^𝑚 over 𝑚 dimensional vectors, where every entry is sampled independently from 𝑋 or 𝑋′ respectively, satisfy Δ(𝑋^𝑚, 𝑋′^𝑚) ≤ 𝑚𝛿.

P
We omit the proof of Lemma 17.6 and leave it as an exercise to prove it using the hybrid argument. We will actually only use Lemma 17.6 for the distributions above; you can obtain intuition for it by considering the 𝑚 = 2 case where we compare the rectangles of the forms [−𝑇, +𝑇] × [−𝑇, +𝑇] and [−𝑇 + 𝑎, +𝑇 + 𝑎] × [−𝑇 + 𝑏, +𝑇 + 𝑏]. You can see that their union has size roughly 4𝑇² while their symmetric difference has size roughly 2𝑇 ⋅ 2𝑎 + 2𝑇 ⋅ 2𝑏, and so if |𝑎|, |𝑏| ≤ 𝛿𝑇 then the symmetric difference is roughly a 2𝛿 fraction of the union.
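For extra intuition, the 𝑚 = 2 case of Lemma 17.6 can also be checked numerically; the short Python snippet below (our own, with two arbitrary small distributions) verifies that the distance between pairs of independent samples is at most twice the distance between single samples:

```python
# Numeric check of Lemma 17.6 for m = 2, on two toy distributions over {0,1,2}.
from itertools import product

X  = {0: 0.50, 1: 0.30, 2: 0.20}
Xp = {0: 0.45, 1: 0.35, 2: 0.20}
delta = sum(abs(X[y] - Xp[y]) for y in X)        # distance of single samples
d2 = sum(abs(X[y1] * X[y2] - Xp[y1] * Xp[y2])
         for y1, y2 in product(X, repeat=2))     # distance of independent pairs
assert d2 <= 2 * delta
```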

We will not provide the full details, but together these lemmas show that EVAL can use bootstrapping to reduce the magnitude of the noise to roughly 2^{𝑛^{0.1}} and then add an additional random noise of roughly, say, 2^{𝑛^{0.2}}, which would make it statistically indistinguishable from the actual encryption. Here are some hints on how to make this work: the idea is that in order to "re-randomize" a ciphertext 𝐶 we need a very noisy encryption of zero and add it to 𝐶. The normal encryption will use noise of magnitude 2^{𝑛^{0.2}}, but we will provide an encryption of the secret key with smaller magnitude 2^{𝑛^{0.1}}/𝑝𝑜𝑙𝑦𝑙𝑜𝑔(𝑛), so we can use bootstrapping to reduce the noise. The main idea that allows us to add noise is that at the end of the day, our scheme boils down to LWE instances that have the form (𝑐, 𝜎) where 𝑐 is a random vector in ℤ𝑞^{𝑛−1} and 𝜎 = ⟨𝑐, 𝑠⟩ + 𝑎, where 𝑎 ∈ [−𝜂, +𝜂] is a small noise addition. If we take any such input and add to 𝜎 some 𝑎′ ∈ [−𝜂′, +𝜂′] (for 𝜂′ much larger than 𝜂), then we create the effect of completely re-randomizing the noise. However, completely analyzing this requires a non-trivial amount of care and work.
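The following toy computation (our own sketch, with arbitrary parameters and no claim of being a real scheme) illustrates the noise-flooding step on a single LWE-style sample: adding fresh noise 𝑎′ of magnitude 𝜂′ ≫ 𝜂 to 𝜎 makes the result, by Lemma 17.5, statistically close to a freshly-noised sample:

```python
# Toy noise-flooding on one LWE-style sample (illustrative parameters only).
import random

q = 2**20
eta, eta_prime = 2**5, 2**15      # |a| <= eta << eta', and eta' < q/2
n = 8
s = [random.randrange(q) for _ in range(n)]    # "secret"
c = [random.randrange(q) for _ in range(n)]    # random vector
a = random.randint(-eta, eta)                  # structured "old" noise
sigma = (sum(ci * si for ci, si in zip(c, s)) + a) % q

a_new = random.randint(-eta_prime, eta_prime)  # flooding noise
sigma_rerand = (sigma + a_new) % q
# By Lemma 17.5, the total noise a + a_new is within ~eta/eta' statistical
# distance of a fresh sample from [-eta', +eta'], hiding the structure of a.
```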

17.2.1 Bottom line: A two party secure computation protocol


Using the above we can obtain the following theorem:

Theorem 17.7 — Re-randomizable FHE. If the LWE conjecture is true then there exists a tuple of polynomial-time randomized algorithms (𝐺, 𝐸, 𝐷, EVAL, RERAND) such that:

• (𝐺, 𝐸, 𝐷, EVAL) is a CPA secure fully-homomorphic encryption for one bit messages. That is, if (𝑑, 𝑒) = 𝐺(1^𝑛) then for every Boolean circuit 𝐶 with ℓ inputs and one output, and 𝑥 ∈ {0, 1}^ℓ, the ciphertext 𝑐 = EVAL𝑒(𝐶, 𝐸𝑒(𝑥1), … , 𝐸𝑒(𝑥ℓ)) has length 𝑛 and 𝐷𝑑(𝑐) = 𝐶(𝑥) with probability one over the random choices of the algorithms 𝐸 and EVAL.

• For every pair of keys (𝑒, 𝑑) = 𝐺(1^𝑛) there are two distributions 𝒞0, 𝒞1 over {0, 1}^𝑛 such that:

  – For 𝑏 ∈ {0, 1}, Pr_{𝑐∼𝒞𝑏}[𝐷𝑑(𝑐) = 𝑏] = 1. That is, 𝒞𝑏 is distributed over ciphertexts that decrypt to 𝑏.

  – For every ciphertext 𝑐 ∈ {0, 1}^𝑛 in the image of either 𝐸𝑒(⋅) or EVAL𝑒(⋅), if 𝐷𝑑(𝑐) = 𝑏 then RERAND𝑒(𝑐) is statistically indistinguishable from 𝒞𝑏. That is, the output of RERAND𝑒(𝑐) is a ciphertext that decrypts to the same plaintext as 𝑐, but whose distribution is essentially independent of 𝑐.

Proof Idea:
We do not include the full proof, but the idea is that we use our standard LWE-based FHE, and to rerandomize a ciphertext 𝑐 we add to it an encryption of 0 (which will not change the corresponding plaintext) and an additional noise vector of much larger magnitude than the original noise vector of 𝑐, but still small enough that decryption succeeds.

Using the above re-randomizable encryption scheme, we can redefine EVAL to add a RERAND step at the end and achieve statistical circuit privacy. If we use Protocol 2PC with such a scheme then we get a two party computation protocol secure with respect to honest but curious adversaries. Using the compiler of Theorem 16.5 we obtain a proof of Theorem 16.3 for the two party setting:

Theorem 17.8 — Two party secure computation. If the LWE conjecture is true then for every (potentially randomized) functionality 𝐹 ∶ {0, 1}^{𝑛1} × {0, 1}^{𝑛2} → {0, 1}^{𝑚1} × {0, 1}^{𝑚2} there exists a polynomial-time protocol for computing the functionality 𝐹 secure with respect to potentially malicious adversaries.

17.3 BEYOND TWO PARTIES


We now sketch how to go beyond two parties. It turns out that the
compiler of honest-but-curious to malicious security works just as well
in the many party setting, and so the crux of the matter is to obtain an
honest but curious secure protocol for 𝑘 > 2 parties.
We start with the case of three parties - Alice, Bob, and Charlie. First, let us introduce some convenient notation (which is used in other settings as well).³ We will assume that each party initially generates private/public key pairs with respect to some fully homomorphic encryption (satisfying statistical circuit privacy) and sends them to the other parties. We will use {𝑥}𝐴 to denote the encryption of 𝑥 ∈ {0, 1}^ℓ using Alice's public key (similarly {𝑥}𝐵 and {𝑥}𝐶 will denote the encryptions of 𝑥 with respect to Bob's and Charlie's public keys). We can also compose these, and so denote by {{𝑥}𝐴}𝐵 the encryption under Bob's key of the encryption under Alice's key of 𝑥.

³ I believe this notation originates with Burrows–Abadi–Needham (BAN) logic, though I would be happy to get corrections/references.
With the notation above, Protocol 2PC can be described as follows:

Protocol 2PC: (Using BAN notation)

• Inputs: Alice's input is 𝑥 ∈ {0, 1}^𝑛 and Bob's input is 𝑦 ∈ {0, 1}^𝑛. The goal is for Alice to learn only 𝐹(𝑥, 𝑦) and for Bob to learn nothing.
• Alice->Bob: Alice sends {𝑥}𝐴 to Bob. (We omit from this description the public key of Alice, which can be thought of as being concatenated to the ciphertext.)
• Bob->Alice: Bob sends {𝐹(𝑥, 𝑦)}𝐴 to Alice by running EVAL𝐴 on the ciphertext {𝑥}𝐴 and the map 𝑥 ↦ 𝐹(𝑥, 𝑦).
• Alice's output: Alice computes 𝐹(𝑥, 𝑦) by decrypting the ciphertext sent from Bob.

We can now describe the protocol for three parties. We will focus
on the case where the goal is for Alice to learn 𝐹 (𝑥, 𝑦, 𝑧) (where 𝑥, 𝑦, 𝑧
are the private inputs of Alice, Bob, and Charlie, respectively) and for
Bob and Charlie to learn nothing. As usual we can reduce the gen-
eral case to this by running the protocol multiple times with parties
switching the roles of Alice, Bob, and Charlie.

Protocol 3PC: (Using BAN notation)

• Inputs: Alice's input is 𝑥 ∈ {0, 1}^𝑛, Bob's input is 𝑦 ∈ {0, 1}^𝑛, and Charlie's input is 𝑧 ∈ {0, 1}^𝑚. The goal is for Alice to learn only 𝐹(𝑥, 𝑦, 𝑧) and for Bob and Charlie to learn nothing.
• Alice->Bob: Alice sends {𝑥}𝐴 to Bob.
• Bob->Charlie: Bob sends {{𝑥}𝐴, 𝑦}𝐵 to Charlie.
• Charlie->Bob: Charlie sends {{𝐹(𝑥, 𝑦, 𝑧)}𝐴}𝐵 to Bob. Charlie can do this by running EVAL𝐵 on the ciphertext and on the (efficiently computable) map 𝑐, 𝑦 ↦ EVAL𝐴(𝑓𝑦, 𝑐) where 𝑓𝑦 is the circuit 𝑥 ↦ 𝐹(𝑥, 𝑦, 𝑧). (Please read this line several times!)
• Bob->Alice: Bob sends {𝐹(𝑥, 𝑦, 𝑧)}𝐴 to Alice by decrypting the ciphertext sent from Charlie.
• Alice's output: Alice computes 𝐹(𝑥, 𝑦, 𝑧) by decrypting the ciphertext sent from Bob.
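Reusing the toy scheme from the 2PC sketch above, Charlie's step (the one worth reading several times) looks as follows; again this is only a data-flow illustration with placeholder names, not a secure implementation:

```python
# Charlie's move in Protocol 3PC, using the toy enc/dec/eval_fhe above.
x, y, z = 1, 0, 1
F3 = lambda x, y, z: (x ^ y) & z          # an arbitrary example functionality

c_bob = enc("Bob", (enc("Alice", x), y))  # Bob -> Charlie: {{x}_A, y}_B
# Under Bob's key, Charlie evaluates the map (c, y) -> EVAL_A(f_y, c),
# where f_y has y (and Charlie's own z) hardwired:
c_out = eval_fhe(lambda p: eval_fhe(lambda xv: F3(xv, p[1], z), p[0]), c_bob)
# c_out is {{F(x,y,z)}_A}_B; Bob strips his layer, then Alice decrypts:
assert dec("Alice", dec("Bob", c_out)) == F3(x, y, z)
```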

Theorem 17.9 — Three party honest-but-curious secure computation. If the underlying encryption is a fully homomorphic statistically circuit private encryption, then Protocol 3PC is a secure protocol for the functionality (𝑥, 𝑦, 𝑧) ↦ (𝐹(𝑥, 𝑦, 𝑧), ⊥, ⊥) with respect to honest-but-curious adversaries.

Proof. Left to the reader :)



18
Quantum computing and cryptography I

"I think I can safely say that nobody understands quantum mechanics.", Richard Feynman, 1965

"The only difference between a probabilistic classical world and the equations of the quantum world is that somehow or other it appears as if the probabilities would have to go negative", Richard Feynman, 1982

There were two schools of natural philosophy in ancient Greece.


Aristotle believed that objects have an essence that explains their behav-
ior, and a theory of the natural world has to refer to the reasons (or “fi-
nal cause” to use Aristotle’s language) as to why they exhibit certain
phenomena. Democritus believed in a purely mechanistic explanation
of the world. In his view, the universe was ultimately composed of
elementary particles (or Atoms) and our observed phenomena arise
from the interactions between these particles according to some local
rules. Modern science (arguably starting with Newton) has embraced
Democritus’ point of view, of a mechanistic or “clockwork” universe
of particles and forces acting upon them.
While the classification of particles and forces evolved with time,
to a large extent the “big picture” has not changed from Newton till
Einstein. In particular it was held as an axiom that if we knew fully
the current state of the universe (i.e., the particles and their properties
such as location and velocity) then we could predict its future state at
any point in time. In computational language, in all these theories the
state of a system with 𝑛 particles could be stored in an array of 𝑂(𝑛)
numbers, and predicting the evolution of the system can be done by
running some efficient (e.g., 𝑝𝑜𝑙𝑦(𝑛) time) deterministic computation
on this array.


18.1 THE DOUBLE SLIT EXPERIMENT


Alas, in the beginning of the 20th century, several experimental re-
sults were calling into question this “clockwork” or “billiard ball”
theory of the world. One such experiment is the famous double slit ex-
periment. Here is one way to describe it. Suppose that we buy one of
those baseball pitching machines, and aim it at a soft plastic wall, but
put a metal barrier with a single slit between the machine and the plastic
wall (see Fig. 18.1). If we shoot baseballs at the plastic wall, then some
of the baseballs would bounce off the metal barrier, while some would
make it through the slit and dent the wall. If we now carve out an ad-
ditional slit in the metal barrier then more balls would get through,
and so the plastic wall would be even more dented.
So far this is pure common sense, and it is indeed (to my knowledge) an accurate description of what happens when we shoot baseballs at a plastic wall. However, this is not the same when we shoot photons. Amazingly, if we shoot with a "photon gun" (i.e., a laser) at a wall equipped with photon detectors through some barrier, then (as shown in Fig. 18.2) in some positions of the wall we will see fewer hits when the two slits are open than when only one of them is!¹ In particular there are positions in the wall that are hit when the first slit is open, hit when the second slit is open, but are not hit at all when both slits are open!

It seems as if each photon coming out of the gun is aware of the global setup of the experiment, and behaves differently if two slits are open than if only one is. If we try to "catch the photon in the act" and place a detector right next to each slit so we can see exactly the path each photon takes, then something even more bizarre happens. The mere fact that we measure the path changes the photon's behavior, and now this "destructive interference" pattern is gone and the number of times a position is hit when two slits are open is the sum of the number of times it is hit when each slit is open.

Figure 18.1: In the "double baseball experiment" we shoot baseballs from a gun at a soft wall through a hard barrier that has one or two slits open in it. There is only "constructive interference" in the sense that the dent in each position in the wall when both slits are open is the sum of the dents when each slit is open on its own.

¹ A nice illustrated description of the double slit experiment appears in this video.

P
You should read the paragraphs above more than once and make sure you appreciate how truly mind boggling these results are.

Figure 18.2: The setup of the double slit experiment in the case of photon or electron guns. We see also destructive interference in the sense that there are some positions on the wall that get fewer hits when both slits are open than they get when only one of the slits is open. Image credit: Wikipedia.
18.2 QUANTUM AMPLITUDES
The double slit and other experiments ultimately forced scientists to
accept a very counterintuitive picture of the world. It is not merely
about nature being randomized, but rather it is about the probabilities
in some sense “going negative” and cancelling each other!

To see what we mean by this, let us go back to the baseball experiment. Suppose that the probability a ball passes through the left slit
is 𝑝𝐿 and the probability that it passes through the right slit is 𝑝𝑅 .
Then, if we shoot 𝑁 balls out of each gun, we expect the wall will be
hit (𝑝𝐿 + 𝑝𝑅 )𝑁 times. In contrast, in the quantum world of photons
instead of baseballs, it can sometimes be the case that in both the first
and second case the wall is hit with positive probabilities 𝑝𝐿 and 𝑝𝑅
respectively but somehow when both slits are open the wall (or a par-
ticular position in it) is not hit at all. It’s almost as if the probabilities
can “cancel each other out”.
To understand the way we model this in quantum mechanics, it is
helpful to think of a “lazy evaluation” approach to probability. We
can think of a probabilistic experiment such as shooting a baseball
through two slits in two different ways:
• When a ball is shot, “nature” tosses a coin and decides if it will go
through the left slit (which happens with probability 𝑝𝐿 ), right slit
(which happens with probability 𝑝𝑅 ), or bounce back. If it passes
through one of the slits then it will hit the wall. Later we can look at
the wall and find out whether or not this event happened, but the
fact that the event happened or not is determined independently of
whether or not we look at the wall.

• The other viewpoint is that when a ball is shot, "nature" computes the probabilities 𝑝𝐿 and 𝑝𝑅 as before, but does not yet "toss the coin" and determine what happened. Only when we actually
look at the wall, nature tosses a coin and with probability 𝑝𝐿 + 𝑝𝑅
ensures we see a dent. That is, nature uses “lazy evaluation”, and
only determines the result of a probabilistic experiment when we
decide to measure it.
While the first scenario seems much more natural, the end result
in both is the same (the wall is hit with probability 𝑝𝐿 + 𝑝𝑅 ) and so
the question of whether we should model nature as following the first
scenario or second one seems like asking about the proverbial tree that
falls in the forest with no one hearing about it.
However, when we want to describe the double slit experiment
with photons rather than baseballs, it is the second scenario that lends
itself better to a quantum generalization. Quantum mechanics as-
sociates a number 𝛼 known as an amplitude with each probabilistic
experiment. This number 𝛼 can be negative, and in fact even complex.
We never observe the amplitudes directly, since whenever we mea-
sure an event with amplitude 𝛼, nature tosses a coin and determines
that the event happens with probability |𝛼|2 . However, the sign (or
in the complex case, phase) of the amplitudes can affect whether two
different events have constructive or destructive interference.

Specifically, consider an event that can either occur or not (e.g. “de-
tector number 17 was hit by a photon”). In classical probability, we
model this by a probability distribution over the two outcomes: a pair
of non-negative numbers 𝑝 and 𝑞 such that 𝑝 + 𝑞 = 1, where 𝑝 corre-
sponds to the probability that the event occurs and 𝑞 corresponds to
the probability that the event does not occur. In quantum mechanics,
we model this also by a pair of numbers, which we call amplitudes. This
is a pair of (potentially negative or even complex) numbers 𝛼 and 𝛽
such that |𝛼|2 + |𝛽|2 = 1. The probability that the event occurs is |𝛼|2
and the probability that it does not occur is |𝛽|2 . In isolation, these
negative or complex numbers don’t matter much, since we anyway
square them to obtain probabilities. But the interaction of positive and
negative amplitudes can result in surprising cancellations where some-
how combining two scenarios where an event happens with positive
probability results in a scenario where it never does.
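As a tiny numeric illustration of this cancellation (our own, not from the text): two events with amplitudes 𝛼 = 1/√2 and 𝛽 = −1/√2 each occur with probability 1/2 on their own, yet combined the amplitudes sum to zero:

```python
# Interference of amplitudes: each alone gives probability 1/2,
# together they cancel exactly.
alpha, beta = 1 / 2**0.5, -1 / 2**0.5
print(abs(alpha)**2, abs(beta)**2)   # 0.5 each, separately
print(abs(alpha + beta)**2)          # 0.0: destructive interference
```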

P
If you don’t find the above description confusing and
unintuitive, you probably didn’t get it. Please make
sure to re-read the above paragraphs until you are
thoroughly confused.

Quantum mechanics is a mathematical theory that allows us to calculate and predict the results of the double-slit and many other experiments. If you think of quantum mechanics as an explanation as to
periments. If you think of quantum mechanics as an explanation as to
what “really” goes on in the world, it can be rather confusing. How-
ever, if you simply “shut up and calculate” then it works amazingly
well at predicting experimental results. In particular, in the double
slit experiment, for any position in the wall, we can compute num-
bers 𝛼 and 𝛽 such that photons from the first and second slit hit that
position with probabilities |𝛼|2 and |𝛽|2 respectively. When we open
both slits, the probability that the position will be hit is proportional
to |𝛼 + 𝛽|2 , and so in particular, if 𝛼 = −𝛽 then it will be the case that,
despite being hit when either slit one or slit two are open, the position
is not hit at all when they both are. If you are confused by quantum
mechanics, you are not alone: for decades people have been trying to
come up with explanations for “the underlying reality” behind quan-
tum mechanics, including Bohmian Mechanics, Many Worlds and
others. However, none of these interpretations have gained universal
acceptance and all of those (by design) yield the same experimental
predictions. Thus at this point many scientists prefer to just ignore the
question of what is the “true reality” and go back to simply “shutting
up and calculating”.

Some of the counterintuitive properties that arise from amplitudes or "negative probabilities" include:

• Interference - As we see here, probabilities can "cancel each other out".
• Measurement - The idea that probabilities are negative as long as
“no one is looking” and “collapse” to positive probabilities when
they are measured is deeply disturbing. Indeed, people have shown
that it can yield various strange outcomes such as "spooky ac-
tions at a distance”, where we can measure an object at one place
and instantaneously (faster than the speed of light) cause a dif-
ference in the results of a measurement in a place far removed.
Unfortunately (or fortunately?) these strange outcomes have been
confirmed experimentally.
• Entanglement - The notion that two parts of the system could be
connected in this weird way where measuring one will affect the
other is known as quantum entanglement.

Again, as counter-intuitive as these concepts are, they have been experimentally confirmed, so we just have to live with them.

R
Remark 18.1 — Complex vs real, other simplifications. If
(like the author) you are a bit intimidated by complex
numbers, don’t worry: you can think of all ampli-
tudes as real (though potentially negative) numbers
without loss of understanding. All the “magic” of
quantum computing already arises in this case, and
so we will often restrict attention to real amplitudes in
this chapter.
We will also only discuss so-called pure quantum
states, and not the more general notion of mixed states.
Pure states turn out to be sufficient for understanding
the algorithmic aspects of quantum computing.
More generally, this chapter is not meant to be a com-
plete description of quantum mechanics, quantum
information theory, or quantum computing, but rather
illustrate the main points where these differ from
classical computing.

18.2.1 Quantum computing and computation - an executive summary.


One of the strange aspects of the quantum-mechanical picture of the
world is that unlike in the billiard ball example, there is no obvious
algorithm to simulate the evolution of 𝑛 particles over 𝑡 time periods
in 𝑝𝑜𝑙𝑦(𝑛, 𝑡) steps. In fact, the natural way to simulate 𝑛 quantum par-
ticles will require a number of steps that is exponential in 𝑛. This is a

huge headache for scientists that actually need to do these calculations in practice.
In 1981, physicist Richard Feynman proposed to "turn this lemon to lemonade" by making the following almost tautological observation:

If a physical system cannot be simulated by a computer


in 𝑇 steps, the system can be considered as performing a
computation that would take more than 𝑇 steps

So, he asked whether one could design a quantum system such that its outcome 𝑦 based on the initial condition 𝑥 would be some function 𝑦 = 𝑓(𝑥) such that (a) we don't know how to efficiently compute 𝑓 in any other way, and (b) 𝑓 is actually useful for something.² In 1985, David Deutsch formally suggested the notion of a quantum Turing machine, and the model has been since refined in works of Deutsch and Josza and of Bernstein and Vazirani. Such a system is now known as a quantum computer.

² As its title suggests, Feynman's lecture was actually focused on the other side of simulating physics with a computer, but he mentioned that as a "side remark" one could wonder if it's possible to simulate physics with a new kind of computer - a "quantum computer" which would "not [be] a Turing machine, but a machine of a different kind". As far as I know, Feynman did not suggest that such a computer could be useful for computations completely outside the domain of quantum simulation, and in fact he found the question of whether quantum mechanics could be simulated by a classical computer to be more interesting.

For a while these hypothetical quantum computers seemed useful for one of two things. First, to provide a general-purpose mechanism to simulate a variety of the real quantum systems that people care about. Second, as a challenge to the theory of computation's approach to model efficient computation by Turing machines, though a challenge that has little bearing on practice, given that this theoretical "extra power" of quantum computers seemed to offer little advantage in the problems people actually want to solve, such as combinatorial optimization, machine learning, data structures, etc.
optimization, machine learning, data structures, etc..
To a significant extent, this is still true today. We have no real ev-
idence that quantum computers, when built, will offer truly signif- 3
I am using the theorist’ definition of conflating
“significant” with “super-polynomial”. As we’ll
icant3 advantage in 99 percent of the applications of computing.4 see, Grover’s algorithm does offer a very generic
However, there is one cryptography-sized exception: In 1994 Peter quadratic advantage in computation. Whether that
Shor showed that quantum computers can solve the integer factoring quadratic advantage will ever be good enough to
offset in practice the significant overhead in building
and discrete logarithm in polynomial time. This result has captured a quantum computer remains an open question.
the imagination of a great many people, and completely energized We also don’t have evidence that super-polynomial
speedups can’t be achieved for some problems outside
research into quantum computing. the Factoring/Dlog or quantum simulation domains,
This is both because the hardness of these particular problems and there is at least one company banking on such
provides the foundations for securing such a huge part of our commu- speedups actually being feasible.
4
This “99 percent” is a figure of speech, but not
nications (and these days, our economy), as well as it was a powerful completely so. It seems that for many web servers,
demonstration that quantum computers could turn out to be useful the TLS protocol (which based on the current non-
for problems that a-priori seemed to have nothing to do with quantum lattice based systems would be completely broken
by quantum computing) is responsible for about 1
physics. percent of the CPU usage.
At the moment there are several intensive efforts to construct large
scale quantum computers. It seems safe to say that, in the next five
years or so there will not be a quantum computer large enough to fac-
tor, say, a 1024 bit number. However, some quantum computers have

been built that achieved tasks that are either not known to be achievable classically, or at least seem to require more resources classically than they do for these quantum computers. When and if such a computer is built that can break reasonable parameters of Diffie Hellman, RSA and elliptic curve cryptography is anybody's guess. It could also be a "self destroying prophecy", whereby the existence of a small-scale quantum computer would cause everyone to shift to lattice-based crypto, which in turn would diminish the motivation to invest the huge resources needed to build a large scale quantum computer.⁵

⁵ Of course, given that "export grade" cryptography that was supposed to disappear with the 1990's took a long time to die, I imagine that we'll still have products running 1024 bit RSA when everyone has a quantum laptop.

The above summary might be all that you need to know as a cryptographer, and enough motivation to study lattice-based cryptography as we do in this course. However, because quantum computing is such a beautiful and (like cryptography) counter-intuitive concept, we will try to give at least a hint of what it is about and how Shor's algorithm works.

18.3 QUANTUM 101


We now present some of the basic notions in quantum information.
It is very useful to contrast these notions to the setting of probabilistic
systems and see how “negative probabilities” make a difference. This
discussion is somewhat brief. The chapter on quantum computation
in my book with Arora (see draft here) is one relatively short resource
that contains essentially everything we discuss here. See also this
blog post of Aaronson for a high level explanation of Shor’s algorithm
which ends with links to several more detailed expositions. See also
this lecture of Aaronson for a great discussion of the feasibility of
quantum computing (Aaronson’s course lecture notes and the book
that they spawned are fantastic reads as well).

States: We will consider a simple quantum system that includes 𝑛 objects (e.g., electrons/photons/transistors/etc.) each of which can be in either an "on" or "off" state - i.e., each of them can encode a single bit of information, but to emphasize the "quantumness" we will call it a qubit. A probability distribution over such a system can be described as a 2^𝑛 dimensional vector 𝑣 with non-negative entries summing up to 1, where for every 𝑥 ∈ {0, 1}^𝑛, 𝑣𝑥 denotes the probability that the system is in state 𝑥. As we mentioned, quantum mechanics allows negative (in fact even complex) probabilities, and so a quantum state of the system can be described as a 2^𝑛 dimensional vector 𝑣 such that ‖𝑣‖² = ∑𝑥 |𝑣𝑥|² = 1.

Measurement: Suppose that we were in the classical probabilistic setting, and that the 𝑛 bits are simply random coins. Thus we can describe the state of the system by the 2^𝑛-dimensional vector 𝑣 such that 𝑣𝑥 = 2^{−𝑛} for all 𝑥. If we measure the system and see how the coins came out, we will get the value 𝑥 with probability 𝑣𝑥. Naturally, if we measure the system twice we will get the same result. Thus, after we see that the coin is 𝑥, the new state of the system collapses to a vector 𝑣 such that 𝑣𝑦 = 1 if 𝑦 = 𝑥 and 𝑣𝑦 = 0 if 𝑦 ≠ 𝑥. In a quantum state, we do the same thing: measuring a vector 𝑣 corresponds to turning it, with probability |𝑣𝑥|², into a vector that has 1 in coordinate 𝑥 and zero in all the other coordinates.

Operations: In the classical probabilistic setting, if we have a system in state 𝑣 and we apply some function 𝑓 ∶ {0, 1}^𝑛 → {0, 1}^𝑛 then this transforms 𝑣 to the state 𝑤 such that 𝑤𝑦 = ∑_{𝑥∶𝑓(𝑥)=𝑦} 𝑣𝑥. Another way to state this is that 𝑤 = 𝑀𝑓 𝑣, where 𝑀𝑓 is the matrix such that 𝑀_{𝑓(𝑥),𝑥} = 1 for all 𝑥 and all other entries are 0. If we toss a coin and decide with probability 1/2 to apply 𝑓 and with probability 1/2 to apply 𝑔, this corresponds to the matrix (1/2)𝑀𝑓 + (1/2)𝑀𝑔. More generally, the set of operations that we can apply can be captured as the set of convex combinations of all such matrices - this is simply the set of non-negative matrices whose columns all sum up to 1, i.e., the stochastic matrices. In the quantum case, the operations we can apply to a quantum state are encoded as a unitary matrix, which is a matrix 𝑀 such that ‖𝑀𝑣‖ = ‖𝑣‖ for all vectors 𝑣.
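A quick numerical contrast between the two kinds of operations (our own numpy sketch, not from the text): stochastic matrices preserve the sum of entries of a probability vector, unitaries preserve its Euclidean norm, and the sign structure of a unitary is what lets amplitudes cancel:

```python
import numpy as np

# A probabilistic operation: a stochastic matrix (columns are distributions).
M_coin = np.array([[0.5, 0.5],
                   [0.5, 0.5]])          # replace the bit by a fresh coin
p = np.array([1.0, 0.0])                 # the bit is 0 with certainty
assert np.isclose((M_coin @ p).sum(), 1.0)

# A quantum operation: a unitary matrix. The Hadamard gate H maps
# |0> to (|0>+|1>)/sqrt(2) and |1> to (|0>-|1>)/sqrt(2).
H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)
v = np.array([1.0, 0.0])                 # the state |0>
assert np.isclose(np.linalg.norm(H @ v), 1.0)

# Interference: H applied twice returns |0> exactly, because the two
# "paths" to |1> carry amplitudes +1/2 and -1/2 and cancel.
assert np.allclose(H @ (H @ v), v)
```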

Elementary operations: Of course, even in the probabilistic setting, not every function 𝑓 ∶ {0, 1}^𝑛 → {0, 1}^𝑛 is efficiently computable. We think of a function as efficiently computable if it is composed of polynomially many elementary operations that involve at most 2 or 3 bits or so (i.e., Boolean gates). That is, we say that a matrix 𝑀 is elementary if it only modifies three bits. That is, 𝑀 is obtained by "lifting" some 8 × 8 matrix 𝑀′ that operates on three bits 𝑖, 𝑗, 𝑘, leaving all the rest of the bits intact. Formally, given an 8 × 8 matrix 𝑀′ (indexed by strings in {0, 1}³) and three distinct indices 𝑖 < 𝑗 < 𝑘 ∈ {1, … , 𝑛}, we define the 𝑛-lift of 𝑀′ with indices 𝑖, 𝑗, 𝑘 to be the 2^𝑛 × 2^𝑛 matrix 𝑀 such that for every strings 𝑥 and 𝑦 that agree with each other on all coordinates except possibly 𝑖, 𝑗, 𝑘, 𝑀_{𝑥,𝑦} = 𝑀′_{𝑥_𝑖𝑥_𝑗𝑥_𝑘, 𝑦_𝑖𝑦_𝑗𝑦_𝑘}, and otherwise 𝑀_{𝑥,𝑦} = 0. Note that if 𝑀′ is of the form 𝑀′𝑓 for some function 𝑓 ∶ {0, 1}³ → {0, 1}³ then 𝑀 = 𝑀𝑔 where 𝑔 ∶ {0, 1}^𝑛 → {0, 1}^𝑛 is defined as 𝑔(𝑥) = 𝑓(𝑥_𝑖𝑥_𝑗𝑥_𝑘). We define 𝑀 as an elementary stochastic matrix or a probabilistic gate if 𝑀 is equal to an 𝑛-lift of some stochastic 8 × 8 matrix 𝑀′. The quantum case is similar: a quantum gate is a 2^𝑛 × 2^𝑛 matrix that is an 𝑛-lift of some unitary 8 × 8 matrix 𝑀′. It is an exercise to prove that lifting preserves stochasticity and unitarity. That is, every probabilistic gate is a stochastic matrix and every quantum gate is a unitary matrix.
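That exercise can be spot-checked mechanically. Below is a short numpy sketch (our own illustration; the bit positions and 𝑛 = 5 are arbitrary) that builds the 𝑛-lift of an 8 × 8 matrix and verifies that lifting the Toffoli gate preserves unitarity:

```python
import numpy as np
from itertools import product

def lift(Mp, idx, n):
    """n-lift of an 8x8 matrix Mp acting on (0-indexed) bit positions idx."""
    N = 2 ** n
    M = np.zeros((N, N))
    for x, y in product(range(N), repeat=2):
        xb = [(x >> (n - 1 - t)) & 1 for t in range(n)]
        yb = [(y >> (n - 1 - t)) & 1 for t in range(n)]
        if all(xb[t] == yb[t] for t in range(n) if t not in idx):
            r = xb[idx[0]] * 4 + xb[idx[1]] * 2 + xb[idx[2]]
            c = yb[idx[0]] * 4 + yb[idx[1]] * 2 + yb[idx[2]]
            M[x, y] = Mp[r, c]
    return M

# The Toffoli gate is the permutation swapping |110> and |111>:
toff = np.eye(8)
toff[[6, 7]] = toff[[7, 6]]
M = lift(toff, (0, 2, 4), 5)
assert np.allclose(M @ M.T, np.eye(2 ** 5))   # still unitary after lifting
```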

Complexity: For every stochastic matrix 𝑀 we can define its randomized complexity, denoted as 𝑅(𝑀), to be the minimum number 𝑇 such that 𝑀 can be (approximately) obtained by combining 𝑇 elementary probabilistic gates. To be concrete, we can define 𝑅(𝑀) to be the minimum 𝑇 such that there exist 𝑇 elementary matrices 𝑀1, … , 𝑀𝑇 such that for every 𝑥, ∑𝑦 |𝑀_{𝑦,𝑥} − (𝑀𝑇 ⋯ 𝑀1)_{𝑦,𝑥}| < 0.1. (It can be shown that 𝑅(𝑀) is finite and in fact at most 10^𝑛 for every 𝑀; we can do so by writing 𝑀 as a convex combination of functions and writing every function as a composition of functions that map a single string 𝑥 to 𝑦, keeping all other inputs intact.) We will say that a probabilistic process 𝑀 mapping distributions on {0, 1}^𝑛 to distributions on {0, 1}^𝑛 is efficiently classically computable if 𝑅(𝑀) ≤ 𝑝𝑜𝑙𝑦(𝑛). If 𝑀 is a unitary matrix, then we define the quantum complexity of 𝑀, denoted as 𝑄(𝑀), to be the minimum number 𝑇 such that there are quantum gates 𝑀1, … , 𝑀𝑇 satisfying that for every 𝑥, ∑𝑦 |𝑀_{𝑦,𝑥} − (𝑀𝑇 ⋯ 𝑀1)_{𝑦,𝑥}|² < 0.1.
We say that 𝑀 is efficiently quantumly computable if 𝑄(𝑀) ≤ 𝑝𝑜𝑙𝑦(𝑛).

Computing functions: We have defined what it means for an operator to be probabilistically or quantumly efficiently computable, but we typically are interested in computing some function 𝑓 ∶ {0, 1}^𝑚 → {0, 1}^ℓ. The idea is that we say that 𝑓 is efficiently computable if the corresponding operator is efficiently computable, except that we also allow the use of extra memory, and so we embed 𝑓 in some 𝑛 = 𝑝𝑜𝑙𝑦(𝑚). We define 𝑓 to be efficiently classically computable if there is some 𝑛 = 𝑝𝑜𝑙𝑦(𝑚) such that the operator 𝑀𝑔 is efficiently classically computable, where 𝑔 ∶ {0, 1}^𝑛 → {0, 1}^𝑛 is defined such that 𝑔(𝑥1, … , 𝑥𝑛) = 𝑓(𝑥1, … , 𝑥𝑚). In the quantum case we have a slight twist, since the operator 𝑀𝑔 is not necessarily a unitary matrix.⁶ Therefore we say that 𝑓 is efficiently quantumly computable if there is 𝑛 = 𝑝𝑜𝑙𝑦(𝑚) such that the operator 𝑀𝑔 is efficiently quantumly computable, where 𝑔 ∶ {0, 1}^𝑛 → {0, 1}^𝑛 is defined as 𝑔(𝑥1, … , 𝑥𝑛) = 𝑥1 ⋯ 𝑥𝑚 ‖ (𝑓(𝑥1 ⋯ 𝑥𝑚)0^{𝑛−𝑚−ℓ} ⊕ 𝑥_{𝑚+1} ⋯ 𝑥𝑛).

⁶ It is a good exercise to verify that for every 𝑔 ∶ {0, 1}^𝑛 → {0, 1}^𝑛, 𝑀𝑔 is unitary if and only if 𝑔 is a permutation.
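The footnote's exercise can also be spot-checked in code. The sketch below (our own, for 𝑚 = 2, ℓ = 1, 𝑛 = 4 and 𝑓 = AND) verifies that the XOR-embedding 𝑔 above is one-to-one, so 𝑀𝑔 is a permutation matrix and in particular unitary:

```python
from itertools import product

m, ell, n = 2, 1, 4
f = lambda xs: (xs[0] & xs[1],)           # f: {0,1}^2 -> {0,1}, here AND

def g(x):
    head, tail = x[:m], x[m:]
    pad = f(head) + (0,) * (n - m - ell)  # f(x_1..x_m) followed by 0^{n-m-ell}
    return head + tuple(t ^ p for t, p in zip(tail, pad))

images = {g(x) for x in product((0, 1), repeat=n)}
assert len(images) == 2 ** n              # g is one-to-one: M_g is a permutation
```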

Quantum and classical computation: The way we defined what it means for a function to be efficiently quantumly computable, it might not be clear that if 𝑓 ∶ {0, 1}^𝑚 → {0, 1}^ℓ is a function that we can compute by a polynomial size Boolean circuit (e.g., combining polynomially many AND, OR and NOT gates) then it is also quantumly efficiently computable. The idea is that for every gate 𝑔 ∶ {0, 1}² → {0, 1} we can define an 8 × 8 unitary matrix 𝑀ℎ where ℎ ∶ {0, 1}³ → {0, 1}³ has the form ℎ(𝑎, 𝑏, 𝑐) = 𝑎, 𝑏, 𝑐 ⊕ 𝑔(𝑎, 𝑏). So, if 𝑓 has a circuit of 𝑠 gates, then we can dedicate an extra bit for every one of these gates and then run the corresponding 𝑠 unitary operations one by one, at the end of which we will get an operator that computes the mapping 𝑥1, … , 𝑥_{𝑚+ℓ+𝑠} ↦ 𝑥1 ⋯ 𝑥𝑚 ‖ (𝑥_{𝑚+1} ⋯ 𝑥_{𝑚+ℓ} ⊕ 𝑓(𝑥1, … , 𝑥𝑚)) ‖ (𝑥_{𝑚+ℓ+1} ⋯ 𝑥_{𝑚+ℓ+𝑠} ⊕ 𝑔(𝑥1 … 𝑥𝑚)), where the 𝑖-th bit of 𝑔(𝑥1, … , 𝑥𝑚) is the result of applying the 𝑖-th gate in the calculation of 𝑓(𝑥1, … , 𝑥𝑚). So this is "almost" what we wanted, except that we have this "extra junk" that we need to get rid of. The idea is that we now simply run the same computation again, which will basically mean that we XOR another copy of 𝑔(𝑥1, … , 𝑥𝑚) into the last 𝑠 bits, but since 𝑔(𝑥) ⊕ 𝑔(𝑥) = 0^𝑠, we get that we compute the map 𝑥 ↦ 𝑥1 ⋯ 𝑥𝑚 ‖ (𝑓(𝑥1, … , 𝑥𝑚)0^𝑠 ⊕ 𝑥_{𝑚+1} ⋯ 𝑥_{𝑚+ℓ+𝑠}) as desired.

The "obviously exponential" fallacy: A priori it might seem "obvious" that quantum computing is exponentially powerful, since to compute a quantum computation on 𝑛 bits we need to maintain the 2^𝑛 dimensional state vector and apply 2^𝑛 × 2^𝑛 matrices to it. Indeed popular descriptions of quantum computing (too) often say something along the lines that the difference between quantum and classical computers is that a classical bit can either be zero or one while a qubit can be in both states at once, and so in many qubits a quantum computer can perform exponentially many computations at once. Depending on how you interpret this, this description is either false or would apply equally well to probabilistic computation. However, for probabilistic computation it is a not too hard exercise to show that if 𝑓 ∶ {0, 1}^𝑚 → {0, 1}^𝑛 is an efficiently computable function then it has a polynomial size circuit of AND, OR and NOT gates.⁷ Moreover, this "obvious" approach for simulating a quantum computation will take not just exponential time but exponential space as well, while it is not hard to show that using a simple recursive formula one can calculate the final quantum state using polynomial space (in physics parlance this is known as "Feynman path integrals"). So, the exponentially long vector description by itself does not imply that quantum computers are exponentially powerful. Indeed, we cannot prove that they are (since in particular we can't prove that every polynomial space calculation can be done in polynomial time; in complexity parlance we don't know how to rule out that 𝑃 = PSPACE), but we do have some problems (integer factoring most prominently) for which they do provide exponential speedup over the currently best known classical (deterministic or probabilistic) algorithms.

⁷ It is a good exercise to show that if 𝑀 is a probabilistic process with 𝑅(𝑀) ≤ 𝑇 then there exists a probabilistic circuit of size, say, 100𝑇𝑛² that approximately computes 𝑀 in the sense that for every input 𝑥, ∑_{𝑦∈{0,1}^𝑛} |Pr[𝐶(𝑥) = 𝑦] − 𝑀_{𝑥,𝑦}| < 1/3.

18.3.1 Physically realizing quantum computation


To realize quantum computation one needs to create a system with 𝑛
independent binary states (i.e., “qubits”), and be able to manipulate
small subsets of two or three of these qubits to change their state.
While by the way we defined operations above it might seem that
one needs to be able to perform arbitrary unitary operations on these
two or three qubits, it turns out that there are several choices for universal
sets - a small constant number of gates that generate all others. The

biggest challenge is how to keep the system from being measured and
collapsing to a single classical combination of states. This is sometimes
known as the coherence time of the system. The threshold theorem says
that there is some absolute constant level of errors 𝜏 so that if errors
are created at every gate at rate smaller than 𝜏 then we can recover
from those and perform arbitrarily long computations. (Of course there
are different ways to model the errors and so there are actually several
threshold theorems corresponding to various noise models).
There have been several proposals to build quantum computers:

• Superconducting quantum computers use super-conducting electric circuits to do quantum computation. These are currently the devices with the largest number of fully controllable qubits.

• At Harvard, Lukin's group is using cold atoms to implement quantum computers.

• Trapped ion quantum computers use the states of an ion to simulate a qubit. People have made some recent advances on these computers too. For example, an ion-trap computer was used to implement Shor's algorithm to factor 15. (It turns out that 15 = 3 × 5 :) )

• Topological quantum computers use a different technology, which is more stable by design but arguably harder to manipulate to create quantum computers.

These approaches are not mutually exclusive and it could be that ultimately quantum computers are built by combining all of them together. At the moment, we have devices with about 100 qubits, and about 1% error per gate. Such restricted machines are sometimes called "Noisy Intermediate-Scale Quantum Computers" or "NISQ". See this article by John Preskill for some of the progress and applications of such more restricted devices. If the number of qubits is increased and the error is decreased by one or two orders of magnitude, we could start seeing more applications.

18.3.2 Bra-ket notation
Quantum computing is very confusing and counterintuitive for many reasons. But there is also a "cultural" reason why people sometimes find quantum arguments hard to follow. Quantum folks follow their own special notation for vectors. Many non-quantum people find it ugly and confusing, while quantum folks secretly wish that people used it all the time, not just for non-quantum linear algebra, but also for restaurant bills and elementary school math classes.

Figure 18.3: Superconducting quantum computer prototype at Google. Image credit: Google / MIT Technology Review.

The notation is actually not so confusing. If 𝑥 ∈ {0, 1}^𝑛 then |𝑥⟩ denotes the 𝑥-th standard basis vector in 2^𝑛 dimensions. That is, |𝑥⟩ is the 2^𝑛-dimensional column vector that has 1 in the 𝑥-th coordinate and zero everywhere else. So, we can describe the column vector that has 𝛼𝑥 in the 𝑥-th entry as ∑_{𝑥∈{0,1}^𝑛} 𝛼𝑥|𝑥⟩. One more piece of notation that is useful is that if 𝑥 ∈ {0, 1}^𝑛 and 𝑦 ∈ {0, 1}^𝑚 then we identify |𝑥⟩|𝑦⟩ with |𝑥𝑦⟩ (that is, the 2^{𝑛+𝑚} dimensional vector that has 1 in the coordinate corresponding to the concatenation of 𝑥 and 𝑦, and zero everywhere else). This is more or less all you need to know about this notation to follow this lecture.⁸
A quantum gate is an operation on at most three bits, and so it can be completely specified by what it does to the 8 vectors |000⟩, … , |111⟩. Quantum states are always unit vectors and so we sometimes omit the normalization for convenience; for example we will identify the state |0⟩ + |1⟩ with its normalized version (1/√2)|0⟩ + (1/√2)|1⟩.

⁸ If you are curious, there is an analogous notation for row vectors as ⟨𝑥|. Generally if 𝑢 is a vector then |𝑢⟩ would be its form as a column vector and ⟨𝑢| would be its form as a row vector. Hence since 𝑢^⊤𝑣 = ⟨𝑢, 𝑣⟩, the inner product of 𝑢 and 𝑣 can be thought of as ⟨𝑢||𝑣⟩. The outer product (the matrix whose 𝑖, 𝑗 entry is 𝑢𝑖𝑣𝑗) is denoted as |𝑢⟩⟨𝑣|.
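For readers who like to see notation executed, here is a small numpy rendering (our own, not from the text) of the conventions above: |𝑥⟩ as a standard basis vector, and |𝑥⟩|𝑦⟩ = |𝑥𝑦⟩ via the Kronecker product:

```python
import numpy as np

def ket(bits):
    """|x> as a 2^n-dimensional standard basis vector, for a bit string x."""
    v = np.zeros(2 ** len(bits))
    v[int(bits, 2)] = 1.0
    return v

# |x>|y> is identified with |xy>: the Kronecker product of the vectors.
assert np.array_equal(np.kron(ket("10"), ket("1")), ket("101"))

# The (normalized) state |0> + |1>:
plus = (ket("0") + ket("1")) / np.sqrt(2)
```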

18.4 BELL’S INEQUALITY


There is something weird about quantum mechanics. In 1935 Einstein,
Podolsky and Rosen (EPR) tried to pinpoint this issue by highlighting
a previously unrealized corollary of this theory. They showed that
the idea that nature does not determine the results of an experiment
until it is measured results in so called “spooky action at a distance”.
Namely, making a measurement of one object may instantaneously
affect the state (i.e., the vector of amplitudes) of another object at the
other end of the universe.
Since the vector of amplitudes is just a mathematical abstraction, the EPR paper was considered to be merely a thought experiment for philosophers to be concerned about, without bearing on experiments. This changed when in 1965 John Bell showed an actual experiment to test the predictions of EPR and hence pit intuitive common sense against quantum mechanics. Quantum mechanics won: it turns out that it is in fact possible to use measurements to create correlations between the states of objects far removed from one another that cannot be explained by any prior theory. Nonetheless, since the results of these experiments are so obviously wrong to anyone that has ever sat in an armchair, there are still a number of Bell denialists arguing that this can't be true and quantum mechanics is wrong.

So, what is this Bell's Inequality? Suppose that Alice and Bob try to convince you they have telepathic ability, and they aim to prove it via the following experiment. Alice and Bob will be in separate closed rooms.⁹ You will interrogate Alice and your associate will interrogate Bob. You choose a random bit 𝑥 ∈ {0, 1} and your associate chooses a random 𝑦 ∈ {0, 1}. We let 𝑎 be Alice's response and 𝑏 be Bob's response. We say that Alice and Bob win this experiment if 𝑎 ⊕ 𝑏 = 𝑥 ∧ 𝑦. In other words, Alice and Bob need to output two bits that disagree if 𝑥 = 𝑦 = 1 and agree otherwise.¹⁰

⁹ If you are extremely paranoid about Alice and Bob communicating with one another, you can coordinate with your assistant to perform the experiment exactly at the same time, and make sure that the rooms are sufficiently far apart (e.g., are on two different continents, or maybe even one is on the moon and the other is on earth), so that Alice and Bob couldn't communicate to each other the results of their respective coins in time even if they do so at the speed of light.

¹⁰ This form of Bell's game was shown by Clauser, Horne, Shimony, and Holt.
Now if Alice and Bob are not telepathic, then they need to agree in advance on some strategy. It's not hard for Alice and Bob to succeed with probability 3/4: just always output the same bit. Moreover, by doing some case analysis, we can show that no matter what strategy they use, Alice and Bob cannot succeed with higher probability than that:¹¹

¹¹ Theorem 18.2 below assumes that Alice and Bob use deterministic strategies 𝑓 and 𝑔 respectively. More generally, Alice and Bob could use a randomized strategy, or equivalently, each could choose 𝑓 and 𝑔 from some distributions ℱ and 𝒢 respectively. However the averaging principle (??) implies that if all possible deterministic strategies succeed with probability at most 3/4, then the same is true for all randomized strategies.

Theorem 18.2 — Bell's Inequality. For every two functions 𝑓, 𝑔 ∶ {0, 1} → {0, 1}, Pr_{𝑥,𝑦∈{0,1}}[𝑓(𝑥) ⊕ 𝑔(𝑦) = 𝑥 ∧ 𝑦] ≤ 3/4.

Proof. Since the probability is taken over all four choices of 𝑥, 𝑦 ∈ {0, 1}, the only way the theorem can be violated is if there exist two functions 𝑓, 𝑔 that satisfy

𝑓(𝑥) ⊕ 𝑔(𝑦) = 𝑥 ∧ 𝑦

for all four choices of 𝑥, 𝑦 ∈ {0, 1}². Let's plug in all these four choices and see what we get (below we use the equalities 𝑧 ⊕ 0 = 𝑧, 𝑧 ∧ 0 = 0 and 𝑧 ∧ 1 = 𝑧):

𝑓(0) ⊕ 𝑔(0) = 0   (plugging in 𝑥 = 0, 𝑦 = 0)
𝑓(0) ⊕ 𝑔(1) = 0   (plugging in 𝑥 = 0, 𝑦 = 1)
𝑓(1) ⊕ 𝑔(0) = 0   (plugging in 𝑥 = 1, 𝑦 = 0)
𝑓(1) ⊕ 𝑔(1) = 1   (plugging in 𝑥 = 1, 𝑦 = 1)

If we XOR together the first and second equalities we get 𝑔(0) ⊕ 𝑔(1) = 0, while if we XOR together the third and fourth equalities we get 𝑔(0) ⊕ 𝑔(1) = 1, thus obtaining a contradiction.
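Since there are only 16 pairs of deterministic strategies, Theorem 18.2 can also be verified by brute force; the little Python loop below (our own check) confirms that no pair 𝑓, 𝑔 wins more than 3 of the 4 input pairs:

```python
from itertools import product

best = 0
for f0, f1, g0, g1 in product((0, 1), repeat=4):   # all strategies f, g
    f, g = (f0, f1), (g0, g1)
    wins = sum((f[x] ^ g[y]) == (x & y) for x in (0, 1) for y in (0, 1))
    best = max(best, wins)
print(best / 4)   # 0.75: no deterministic strategy beats 3/4
```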

An amazing experimentally verified fact is that quantum mechanics allows for "telepathy".¹² Specifically, it has been shown that using the weirdness of quantum mechanics, there is in fact a strategy for Alice and Bob to succeed in this game with probability larger than 3/4 (in fact, they can succeed with probability about 0.85, see Lemma 18.3).

¹² More accurately, one either has to give up on a "billiard ball type" theory of the universe or believe in telepathy (believe it or not, some scientists went for the latter option).

18.5 ANALYSIS OF BELL'S INEQUALITY

Now that we have the notation in place, we can show a strategy for Alice and Bob to display "quantum telepathy" in Bell's Game. Recall that in the classical case, Alice and Bob can succeed in the "Bell Game" with probability at most 3/4 = 0.75. We now show that quantum mechanics allows them to succeed with probability at least 0.8.¹³

¹³ The strategy we show is not the best one. Alice and Bob can in fact succeed with probability cos²(𝜋/8) ∼ 0.854.

Lemma 18.3 There is a 2-qubit quantum state 𝜓 ∈ ℂ⁴ so that if Alice has access to the first qubit of 𝜓 and can manipulate and measure it and output 𝑎 ∈ {0, 1}, and Bob has access to the second qubit of 𝜓 and can manipulate and measure it and output 𝑏 ∈ {0, 1}, then Pr[𝑎 ⊕ 𝑏 = 𝑥 ∧ 𝑦] ≥ 0.8.

Proof. Alice and Bob will start by preparing a 2-qubit quantum system in the state

𝜓 = (1/√2)|00⟩ + (1/√2)|11⟩

(this state is known as an EPR pair). Alice takes the first qubit of the system to her room, and Bob takes the second qubit to his room. Now, when Alice receives 𝑥, if 𝑥 = 0 she does nothing and if 𝑥 = 1 she applies the unitary map 𝑅_{−𝜋/8} to her qubit, where

𝑅_𝜃 = ( cos 𝜃   −sin 𝜃
        sin 𝜃    cos 𝜃 )

is the unitary operation corresponding to rotation in the plane by angle 𝜃. When Bob receives 𝑦, if 𝑦 = 0 he does nothing and if 𝑦 = 1 he applies the unitary map 𝑅_{𝜋/8} to his qubit. Then each one of them measures their qubit and sends this as their response.
Recall that to win the game Bob and Alice want their outputs to be more likely to differ if 𝑥 = 𝑦 = 1 and to be more likely to agree otherwise. We will split the analysis into one case for each of the four possible values of 𝑥 and 𝑦.

Case 1: 𝑥 = 0 and 𝑦 = 0. If 𝑥 = 𝑦 = 0 then the state does not change. Because the state 𝜓 is proportional to |00⟩ + |11⟩, the measurements of Bob and Alice will always agree (if Alice measures 0 then the state collapses to |00⟩ and so Bob measures 0 as well, and similarly for 1). Hence in the case 𝑥 = 𝑦 = 0, Alice and Bob always win.

Case 2: 𝑥 = 0 and 𝑦 = 1. If 𝑥 = 0 and 𝑦 = 1 then after Alice measures her bit, if she gets 0 then the system collapses to the state |00⟩, in which case after Bob performs his rotation, his qubit is in the state cos(𝜋/8)|0⟩ + sin(𝜋/8)|1⟩. Thus, when Bob measures his qubit, he will get 0 (and hence agree with Alice) with probability cos²(𝜋/8) ≥ 0.85. Similarly, if Alice gets 1 then the system collapses to |11⟩, in which case after rotation Bob's qubit will be in the state −sin(𝜋/8)|0⟩ + cos(𝜋/8)|1⟩ and so once again he will agree with Alice with probability cos²(𝜋/8).

The analysis for Case 3, where 𝑥 = 1 and 𝑦 = 0, is completely analogous to Case 2. Hence Alice and Bob will agree with probability cos²(𝜋/8) in this case as well.¹⁴

¹⁴ We are using the (not too hard) observation that the result of this experiment is the same regardless of the order in which Alice and Bob apply their rotations and measurements.
qua n tu m comp u ti ng a n d c ry p tog r a p hy i 359

Case 4: 𝑥 = 1 and 𝑦 = 1. For the case that 𝑥 = 1 and 𝑦 = 1, after both Alice and Bob perform their rotations, the state will be proportional to

𝑅_{−𝜋/8}|0⟩𝑅_{𝜋/8}|0⟩ + 𝑅_{−𝜋/8}|1⟩𝑅_{𝜋/8}|1⟩ .   (18.1)

Intuitively, since we rotate one state by 45 degrees and the other state by −45 degrees, they will become orthogonal to each other, and the measurements will behave like independent coin tosses that agree with probability 1/2. However, for the sake of completeness, we now show the full calculation.

Opening up the coefficients and using cos(−𝑥) = cos(𝑥) and sin(−𝑥) = −sin(𝑥), we can see that (18.1) is proportional to

cos²(𝜋/8)|00⟩ + cos(𝜋/8) sin(𝜋/8)|01⟩
− sin(𝜋/8) cos(𝜋/8)|10⟩ + sin²(𝜋/8)|11⟩
− sin²(𝜋/8)|00⟩ + sin(𝜋/8) cos(𝜋/8)|01⟩
− cos(𝜋/8) sin(𝜋/8)|10⟩ + cos²(𝜋/8)|11⟩ .

Using the trigonometric identities 2 sin(𝛼) cos(𝛼) = sin(2𝛼) and cos²(𝛼) − sin²(𝛼) = cos(2𝛼), we see that the probability of getting any one of |00⟩, |10⟩, |01⟩, |11⟩ is proportional to cos(𝜋/4) = sin(𝜋/4) = 1/√2. Hence all four options for (𝑎, 𝑏) are equally likely, which means that in this case 𝑎 = 𝑏 with probability 0.5.

Taking all four cases together, the overall probability of winning the game is at least (1/4) ⋅ 1 + (1/2) ⋅ 0.85 + (1/4) ⋅ 0.5 = 0.8.
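The case analysis above can be confirmed numerically; the numpy sketch below (our own check of Lemma 18.3) prepares the EPR pair, applies the conditional rotations, and averages the winning probability over the four inputs:

```python
import numpy as np

def R(theta):                       # rotation by angle theta in the plane
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

I2 = np.eye(2)
# The EPR pair (|00> + |11>)/sqrt(2) as a vector in C^4:
psi = (np.kron([1.0, 0.0], [1.0, 0.0]) +
       np.kron([0.0, 1.0], [0.0, 1.0])) / np.sqrt(2)

win = 0.0
for x in (0, 1):
    for y in (0, 1):
        A = R(-np.pi / 8) if x else I2        # Alice's conditional rotation
        B = R(np.pi / 8) if y else I2         # Bob's conditional rotation
        probs = np.abs(np.kron(A, B) @ psi) ** 2   # outcomes 00, 01, 10, 11
        for idx, pr in enumerate(probs):
            a, b = idx >> 1, idx & 1
            if a ^ b == (x & y):              # winning condition
                win += pr / 4                 # x, y are uniform
print(win)   # ~0.8018, matching 1/4 + (1/2)cos^2(pi/8) + 1/8
```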

R
Remark 18.4 — Quantum vs probabilistic strategies. It is instructive to understand what it is about quantum mechanics that enabled this gain in Bell's Inequality. For this, consider the following analogous probabilistic strategy for Alice and Bob. They agree that each one of them outputs 0 if he or she gets 0 as input, and outputs 1 with probability 𝑝 if they get 1 as input. In this case one can see that their success probability would be (1/4) ⋅ 1 + (1/2)(1 − 𝑝) + (1/4)[2𝑝(1 − 𝑝)] = 0.75 − 0.5𝑝² ≤ 0.75. The quantum strategy we described above can be thought of as a variant of the probabilistic strategy for parameter 𝑝 set to sin²(𝜋/8) ≈ 0.15. But in the case 𝑥 = 𝑦 = 1, instead of disagreeing only with probability 2𝑝(1 − 𝑝) = 1/4, because we can use these negative probabilities in the quantum world and rotate the states in opposite directions, the probability of disagreement ends up being sin²(𝜋/4) = 0.5.

18.6 GROVER'S ALGORITHM

Shor's Algorithm, which we'll see in the next lecture, is an amazing achievement, but it only applies to very particular problems. It does not seem to be relevant to breaking AES, lattice based cryptography, or problems not related to quantum computing at all such as scheduling, constraint satisfaction, traveling salesperson, etc. Indeed, for the most general form of these search problems, classically we don't know how to do anything much better than brute force search, which takes 2^𝑛 time over an 𝑛-bit domain. Lov Grover showed that quantum computers can obtain a quadratic improvement over this brute force search, solving SAT in 2^{𝑛/2} time. The effect of Grover's algorithm on cryptography is fairly mild: one essentially needs to double the key lengths of symmetric primitives. But beyond cryptography, if large scale quantum computers end up being built, Grover search and its variants might end up being some of the most useful computational problems they will tackle. Grover's theorem is the following:

Theorem (Grover search, 1996): There is a quantum 𝑂(2^{𝑛/2}𝑝𝑜𝑙𝑦(𝑛))-time algorithm that given a 𝑝𝑜𝑙𝑦(𝑛)-sized circuit computing a function 𝑓 ∶ {0, 1}^𝑛 → {0, 1} outputs a string 𝑥^∗ ∈ {0, 1}^𝑛 such that 𝑓(𝑥^∗) = 1.

Proof sketch: The proof is not hard but we only sketch it here. The general idea can be illustrated in the case that there exists a single x* satisfying f(x*) = 1. (There is a classical reduction from the general case to this problem.) As in Simon’s algorithm, we can efficiently initialize an n-qubit system to the uniform state u = 2^{−n/2} ∑_{x∈{0,1}^n} |x⟩, which has 2^{−n/2} dot product with |x*⟩. Of course if we measure u, we only have probability (2^{−n/2})² = 2^{−n} of obtaining the value x*. Our goal would be to use O(2^{n/2}) calls to the oracle to transform the system to a state v with dot product at least some constant ε > 0 with the state |x*⟩.

It is an exercise to show that using Hadamard gates we can efficiently compute the unitary operator U such that Uu = u and Uv = −v for every v orthogonal to u. Also, using the circuit for f, we can efficiently compute the unitary operator U* such that U*|x⟩ = |x⟩ for all x ≠ x* and U*|x*⟩ = −|x*⟩. It turns out that O(2^{n/2}) applications of UU* to u yield a vector v with Ω(1) inner product with |x*⟩. To see why, consider what these operators do in the two dimensional linear subspace spanned by u and |x*⟩. (Note that the initial state u is in this subspace and all our operators preserve this property.) Let u⊥ be the unit vector orthogonal to u in this subspace and let x*⊥ be the unit vector orthogonal to |x*⟩ in this subspace. Restricted to this subspace, U* is a reflection along the axis x*⊥ and U is a reflection along the axis u.

Now, let θ be the angle between u and x*⊥. These vectors are very close to each other and so θ is very small but not zero: it is equal to sin^{−1}(2^{−n/2}), which is roughly 2^{−n/2}. Now if our state v has angle α ≥ 0 with u, then as long as α is not too large (say α < π/8) this means that v has angle α + θ with x*⊥. That means that U*v will have angle −α − θ with x*⊥, or −α − 2θ with u, and hence UU*v will have angle α + 2θ with u. Hence in one application of UU* we move 2θ radians away from u, and in O(2^{n/2}) steps the angle between u and our state will be at least some constant ε > 0. Since we live in the two dimensional space spanned by u and |x*⟩, this means that the dot product of our state and |x*⟩ will be at least some constant as well. QED
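Here is a short numpy simulation (ours, not from the text) of the two reflections above, illustrating the quadratic speedup: after roughly (π/4)·2^{n/2} iterations the state is essentially |x*⟩.

```python
import numpy as np

n = 10                                   # number of input bits
N = 2 ** n
target = 123                             # the unique x* with f(x*) = 1

u = np.full(N, 1 / np.sqrt(N))           # the uniform state u
state = u.copy()

iters = int(round(np.pi / 4 * np.sqrt(N)))
for _ in range(iters):
    state[target] *= -1                  # U*: flip the sign of |x*>
    state = 2 * u * (u @ state) - state  # U = 2uu^T - I: reflect about u
print(iters, float(state[target] ** 2))  # ~25 iterations, success prob ~1
```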
19
Quantum computing and cryptography II

Bell’s Inequality is a powerful demonstration that there is something very strange going on with quantum mechanics.
“strangeness” be of any use to solve computational problems not
directly related to quantum systems? A priori, one could guess the
answer is no. In 1994 Peter Shor showed that one would be wrong:

Theorem 19.1 — Shor’s Theorem. The map that takes an integer m into its prime factorization is efficiently quantumly computable. Specifically, it can be computed using O(log³ m) quantum gates.

This is an exponential improvement over the best known classical algorithms, which, as we mentioned before, take roughly 2^{Õ(log^{1/3} m)} time.
We will now sketch the ideas behind Shor’s algorithm. In fact, Shor
proved the following more general theorem:

Theorem 19.2 — Order Finding Algorithm. There is a quantum polynomial-time algorithm that given a multiplicative Abelian group 𝔾 and an element g ∈ 𝔾 computes the order of g in the group.

Recall that the order of g in 𝔾 is the smallest positive integer a such that g^a = 1. By “given a group” we mean that we can represent the elements of the group as strings of length O(log |𝔾|) and there is a poly(log |𝔾|) algorithm to perform multiplication in the group.

19.1 FROM ORDER FINDING TO FACTORING AND DISCRETE LOG


The order finding problem allows us not just to factor integers in polynomial time, but also to solve the discrete logarithm over arbitrary Abelian groups, thereby showing that quantum computers will break not just RSA but also Diffie-Hellman and Elliptic Curve Cryptography. We merely sketch how one reduces the factoring and discrete logarithm problems to order finding (see some of the sources above for the full details):

• For factoring, let us restrict to the case m = pq for distinct primes p, q. Recall that we showed that finding the size (p − 1)(q − 1) = m − p − q + 1 of the group ℤ*_m is sufficient to recover p and q. One can show that if we pick a few random x’s in ℤ*_m and compute their orders, the least common multiple of these orders gives (with good probability) enough information to recover the group size, and hence p and q; a closely related reduction is sketched in the code following this list.

• For discrete log in a group 𝔾, if we get X = g^x and need to recover x, we can compute the order of various elements of the form X^a g^b. The order of such an element is a number c satisfying c(xa + b) = 0 (mod |𝔾|). Again, with a few random examples we will get a nontrivial example (where c ≠ 0 (mod |𝔾|)) and be able to recover the unknown x.
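To make the factoring reduction concrete, here is a sketch of the standard classical reduction from factoring m = pq to order finding (ours, for illustration): for a random x, if r = ord(x) is even and x^{r/2} ≠ −1 (mod m), then gcd(x^{r/2} − 1, m) is a nontrivial factor. A brute-force loop stands in for the quantum order oracle.

```python
import random
from math import gcd

def order(x, m):
    # Brute-force stand-in for the quantum order-finding oracle
    # (exponential time classically; only for tiny toy numbers).
    r, y = 1, x % m
    while y != 1:
        y = y * x % m
        r += 1
    return r

def factor(m):
    # Reduce factoring m = p*q to order finding.
    while True:
        x = random.randrange(2, m)
        if gcd(x, m) > 1:                        # lucky: shared factor
            return gcd(x, m), m // gcd(x, m)
        r = order(x, m)
        if r % 2 == 0 and pow(x, r // 2, m) != m - 1:
            g = gcd(pow(x, r // 2, m) - 1, m)
            if 1 < g < m:
                return g, m // g

print(factor(101 * 103))                         # (101, 103) in some order
```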

19.2 FINDING PERIODS OF A FUNCTION: SIMON’S ALGORITHM


Let ℍ be some Abelian group with a group operation that we’ll denote
by ⊕, and 𝑓 be some function mapping ℍ to an arbitrary set (which
we can encode as {0, 1}∗ ). We say that 𝑓 has period ℎ∗ for some ℎ∗ ∈ ℍ
if for every 𝑥, 𝑦 ∈ ℍ, 𝑓(𝑥) = 𝑓(𝑦) if and only if 𝑦 = 𝑥 ⊕ 𝑘ℎ∗ for
some integer 𝑘. Note that if 𝔾 is some Abelian group, then if we define
ℍ = ℤ_{|𝔾|}, for every element g ∈ 𝔾, the map f(a) = g^a is a periodic map over ℍ with period the order of g. So, finding the order of an element reduces to the question of finding the period of a function.
How do we generally find the period of a function? Let us consider
the simplest case, where f is a function from ℝ to ℝ that is h*-periodic for some number h*, in the sense that f repeats itself on the intervals [0, h*], [h*, 2h*], [2h*, 3h*], etc. How do we find this number h*? The
key idea would be to transform 𝑓 from the time to the frequency do-
main. That is, we use the Fourier transform to represent 𝑓 as a sum of
wave functions. In this representation wavelengths that divide the
period ℎ∗ would get significant mass, while wavelengths that don’t
would likely “cancel out”.
Similarly, the main idea behind Shor’s algorithm is to use a tool known as the quantum Fourier transform that, given a circuit computing the function f : ℍ → ℝ, creates a quantum state over roughly log |ℍ| qubits (and hence dimension |ℍ|) that corresponds to the Fourier transform of f. Hence when we measure this state, we get a group element h with probability proportional to the square of the corresponding Fourier coefficient. One can show that if f is h*-periodic then we can recover h* from this distribution.

Figure 19.1: If f is a periodic function then when we represent it in the Fourier transform, we expect the coefficients corresponding to wavelengths that do not evenly divide the period to be very small, as they would tend to “cancel out”.

Shor carried out this approach for the group ℍ = ℤ*_q for some q, but we will start by seeing this for the group ℍ = {0,1}^n with the XOR operation. This case is known as Simon’s algorithm (given by Dan Simon in 1994) and actually preceded (and inspired) Shor’s algorithm:

Theorem 19.3 — Simon’s Algorithm. If f : {0,1}^n → {0,1}* is polynomial-time computable and satisfies the property that f(x) = f(y) iff x ⊕ y ∈ {0^n, h*}, then there exists a quantum polynomial-time algorithm that outputs a random h ∈ {0,1}^n such that ⟨h, h*⟩ = 0 (mod 2).

Note that given O(n) such samples, we can recover h* with high probability by solving the corresponding linear equations over GF(2); a small sketch of this classical post-processing follows below.
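Here is a small sketch of that post-processing (ours) for tiny n; at this scale we can find the null space of the samples by brute force, though a real implementation would use Gaussian elimination over GF(2).

```python
import random

def recover_hstar(ys, n):
    # Return all nonzero h with <y, h> = 0 (mod 2) for every sample y.
    return [h for h in range(1, 2 ** n)
            if all(bin(y & h).count("1") % 2 == 0 for y in ys)]

hstar, n = 0b1011, 4
ys = []
while len(ys) < 8:                        # collect samples orthogonal to h*
    y = random.randrange(2 ** n)
    if bin(y & hstar).count("1") % 2 == 0:
        ys.append(y)
print(recover_hstar(ys, n))               # typically [11], i.e. h* = 0b1011
```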

Proof. Let HAD be the 2 × 2 unitary matrix corresponding to the one qubit operation |0⟩ ↦ (1/√2)(|0⟩ + |1⟩) and |1⟩ ↦ (1/√2)(|0⟩ − |1⟩); that is, |a⟩ ↦ (1/√2)(|0⟩ + (−1)^a |1⟩). Given the state |0^{n+m}⟩ we can apply this map to each one of the first n qubits to get the state 2^{−n/2} ∑_{x∈{0,1}^n} |x⟩|0^m⟩, and then we can apply the gates of f to map this to the state 2^{−n/2} ∑_{x∈{0,1}^n} |x⟩|f(x)⟩. Now suppose that we apply the HAD operation again to each of the first n qubits. We get the state

2^{−n} ∑_{x∈{0,1}^n} ∏_{i=1}^{n} (|0⟩ + (−1)^{x_i}|1⟩) |f(x)⟩,

which, if we open up each one of these products and look at all 2^n choices y ∈ {0,1}^n (with y_i = 0 corresponding to picking |0⟩ and y_i = 1 corresponding to picking |1⟩ in the i-th product), equals

2^{−n} ∑_{x∈{0,1}^n} ∑_{y∈{0,1}^n} (−1)^{⟨x,y⟩} |y⟩|f(x)⟩.

Now under our assumptions, for every particular z in the image of f there exist exactly two preimages x and x ⊕ h* such that f(x) = f(x ⊕ h*) = z. So, if ⟨y, h*⟩ = 0 (mod 2), we get that (−1)^{⟨x,y⟩} + (−1)^{⟨x⊕h*,y⟩} = ±2, and otherwise we get (−1)^{⟨x,y⟩} + (−1)^{⟨x⊕h*,y⟩} = 0. Therefore, if we measure the state, we will get a pair (y, z) such that ⟨y, h*⟩ = 0 (mod 2). QED


Simon’s algorithm seems to really use the special bit-wise structure


of the group {0, 1}𝑛 , so one could wonder if it has any relevance for
the group ℤ∗𝑚 for some exponentially large 𝑚. It turns out that the
same insights that underlie the well-known Fast Fourier Transform
(FFT) algorithm can be used to essentially follow the same strategy for
this group as well.

19.3 FROM SIMON TO SHOR


(Note: The presentation here is adapted from the quantum computing
chapter in my textbook with Arora.)
We now describe how to achieve Shor’s algorithm for order finding.
We will not do this for a general group but rather focus our attention
on the group ℤ∗ℓ for some number ℓ which is the case of interest for
integer factoring and the discrete logarithm modulo primes problems.
That is, we prove the following theorem:

Theorem 19.4 — Shor’s Algorithm, restated. For every ℓ and a ∈ ℤ*_ℓ, there is a quantum poly(log ℓ) algorithm to find the order of a in ℤ*_ℓ.

The idea is similar to Simon’s algorithm. We consider the map x ↦ a^x (mod ℓ), which is a periodic map over ℤ_m, where m = |ℤ*_ℓ|, with period being the order of a.

To find the period of this map we will now need to perform a Quantum Fourier Transform (QFT) over the group ℤ_m instead of {0,1}^n. This is a quantum algorithm that takes a register from some arbitrary state f ∈ ℂ^m into a state whose vector is the Fourier transform f̂ of f. The QFT takes only O(log² m) elementary steps and is thus very efficient.
Note that we cannot say that this algorithm “computes” the Fourier
transform, since the transform is stored in the amplitudes of the state,
and as mentioned earlier, quantum mechanics give no way to “read
out” the amplitudes per se. The only way to get information from
a quantum state is by measuring it, which yields a single basis state
with probability that is related to its amplitude. This is hardly repre-
sentative of the entire Fourier transform vector, but sometimes (as is
the case in Shor’s algorithm) this is enough to get highly non-trivial
information, which we do not know how to obtain using classical
(non-quantum) computers.

19.3.1 The Fourier transform over ℤ𝑚


We now define the Fourier transform over ℤ_m (the group of integers in {0, …, m − 1} with addition modulo m). We give a definition that is specialized to the current context. For every vector f ∈ ℂ^m, the Fourier transform of f is the vector f̂ whose x-th coordinate is defined as (in the context of the Fourier transform it is customary and convenient to denote the x-th coordinate of a vector f by f(x) rather than f_x):

f̂(x) = (1/√m) ∑_{y∈ℤ_m} f(y) ω^{xy},

where ω = e^{2πi/m}.
The Fourier transform is simply a representation of f in the Fourier basis {χ_x}_{x∈ℤ_m}, where χ_x is the vector/function whose y-th coordinate is (1/√m) ω^{xy}. Now the inner product of any two vectors χ_x, χ_z in this basis is equal to

⟨χ_x, χ_z⟩ = (1/m) ∑_{y∈ℤ_m} ω^{xy} ω^{−zy} = (1/m) ∑_{y∈ℤ_m} ω^{(x−z)y}.

But if x = z then ω^{x−z} = 1 and hence this sum is equal to 1. On the other hand, if x ≠ z, then this sum is equal to (1/m) · (1 − ω^{(x−z)m})/(1 − ω^{x−z}) = (1/m) · (1 − 1)/(1 − ω^{x−z}) = 0, using the formula for the sum of a geometric series. In other words, this is an orthonormal basis, which means that the Fourier transform map f ↦ f̂ is a unitary operation.
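As a quick numerical sanity check (ours, not from the text), one can verify this orthonormality for a small m:

```python
import numpy as np

m = 8
w = np.exp(2j * np.pi / m)
# F[y, x] = chi_x(y) = w^{xy} / sqrt(m); the columns are the Fourier basis.
F = np.array([[w ** (x * y) for x in range(m)] for y in range(m)]) / np.sqrt(m)
print(np.allclose(F.conj().T @ F, np.eye(m)))    # True: F is unitary
```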
What is so special about the Fourier basis? For one thing, if we
identify vectors in ℂ𝑚 with functions mapping ℤ𝑚 to ℂ, then it’s easy
to see that every function 𝜒 in the Fourier basis is a homomorphism
from ℤ𝑚 to ℂ in the sense that 𝜒(𝑦 + 𝑧) = 𝜒(𝑦)𝜒(𝑧) for every 𝑦, 𝑧 ∈ ℤ𝑚 .
Also, every function χ is periodic in the sense that there exists r ∈ ℤ_m such that χ(y + r) = χ(y) for every y ∈ ℤ_m (indeed, if χ(y) = ω^{xy} then we can take r to be ℓ/x where ℓ is the least common multiple of x and m). Thus, intuitively, if a function f : ℤ_m → ℂ is itself periodic (or
roughly periodic) then when representing 𝑓 in the Fourier basis, the
coefficients of basis vectors with periods agreeing with the period of 𝑓
should be large, and so we might be able to discover 𝑓’s period from
this representation. This does turn out to be the case, and is a crucial
point in Shor’s algorithm.

19.3.2 Fast Fourier Transform.


Denote by FT_m the operation that maps every vector f ∈ ℂ^m to its Fourier transform f̂. The operation FT_m is represented by an m × m matrix whose (x, y)-th entry is ω^{xy} (up to normalization). The trivial algorithm to compute it takes m² operations. The famous Fast Fourier Transform (FFT) algorithm computes the Fourier transform in O(m log m) operations. We now sketch the idea behind the FFT algorithm, as the same idea is used in the quantum Fourier transform algorithm.

Note that

f̂(x) = (1/√m) ∑_{y∈ℤ_m} f(y) ω^{xy} = (1/√m) ∑_{y∈ℤ_m, y even} f(y) ω^{2x(y/2)} + ω^x (1/√m) ∑_{y∈ℤ_m, y odd} f(y) ω^{2x(y−1)/2}.

Now since ω² is an (m/2)-th root of unity and ω^{m/2} = −1, letting W be the m/2 × m/2 diagonal matrix with diagonal entries ω^0, …, ω^{m/2−1}, we get that

FT_m(f)_low = FT_{m/2}(f_even) + W FT_{m/2}(f_odd)
FT_m(f)_high = FT_{m/2}(f_even) − W FT_{m/2}(f_odd)

where for an m-dimensional vector v⃗, we denote by v⃗_even (resp. v⃗_odd) the m/2-dimensional vector obtained by restricting v⃗ to the coordinates whose indices have least significant bit equal to 0 (resp. 1), and by v⃗_low (resp. v⃗_high) the restriction of v⃗ to coordinates with most significant bit 0 (resp. 1).

The equations above are the crux of the divide-and-conquer idea of the FFT algorithm, since they allow us to replace a size-m problem with two size-m/2 subproblems, leading to a recursive time bound of the form T(m) = 2T(m/2) + O(m), which solves to T(m) = O(m log m).
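Here is a minimal Python sketch (ours) of these FFT equations, checked against the naive m²-time transform; we omit the 1/√m normalization, which does not affect the recursion.

```python
import numpy as np

def ft(f):
    # Naive O(m^2) transform: fhat(x) = sum_y f(y) w^{xy} (unnormalized).
    m = len(f)
    w = np.exp(2j * np.pi / m)
    return np.array([sum(f[y] * w ** (x * y) for y in range(m))
                     for x in range(m)])

def fft(f):
    # Recursive FFT via the equations above: O(m log m) operations.
    m = len(f)
    if m == 1:
        return f.astype(complex)
    even, odd = fft(f[0::2]), fft(f[1::2])           # FT_{m/2} of f_even, f_odd
    W = np.exp(2j * np.pi / m) ** np.arange(m // 2)  # the diagonal of W
    return np.concatenate([even + W * odd,           # FT_m(f)_low
                           even - W * odd])          # FT_m(f)_high

f = np.random.randn(8)
print(np.allclose(fft(f), ft(f)))                    # True
```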

19.3.3 Quantum Fourier Transform over ℤ𝑚


The quantum Fourier transform is an algorithm to change the state of a quantum register from f ∈ ℂ^m to its Fourier transform f̂.

Theorem 19.5 — Quantum Fourier Transform (Bernstein-Vazirani). For every ℓ and m = 2^ℓ there is a quantum algorithm that uses O(ℓ²) = O(log² m) elementary quantum operations and transforms a quantum register in state f = ∑_{x∈ℤ_m} f(x)|x⟩ into the state f̂ = ∑_{x∈ℤ_m} f̂(x)|x⟩, where f̂(x) = (1/√m) ∑_{y∈ℤ_m} ω^{xy} f(y).

The crux of the algorithm is the FFT equations, which allow the
problem of computing FT𝑚 , the problem of size 𝑚, to be split into two
identical subproblems of size 𝑚/2 involving computation of FT𝑚/2 ,
which can be carried out recursively using the same elementary oper-
ations. (Aside: Not every divide-and-conquer classical algorithm can
be implemented as a fast quantum algorithm; we are really using the
structure of the problem here.)
We now describe the algorithm and the state, neglecting normalizing factors.

1. Initial state: f = ∑_{x∈ℤ_m} f(x)|x⟩

2. Recursively run FT_{m/2} on the ℓ − 1 most significant qubits. (state: (FT_{m/2} f_even)|0⟩ + (FT_{m/2} f_odd)|1⟩)

3. If the least significant bit (LSB) is 1 then compute W on the ℓ − 1 most significant qubits (see below). (state: (FT_{m/2} f_even)|0⟩ + (W FT_{m/2} f_odd)|1⟩)

4. Apply the Hadamard gate H to the least significant qubit. (state: (FT_{m/2} f_even)(|0⟩ + |1⟩) + (W FT_{m/2} f_odd)(|0⟩ − |1⟩) = (FT_{m/2} f_even + W FT_{m/2} f_odd)|0⟩ + (FT_{m/2} f_even − W FT_{m/2} f_odd)|1⟩)

5. Move the LSB to the most significant position. (state: |0⟩(FT_{m/2} f_even + W FT_{m/2} f_odd) + |1⟩(FT_{m/2} f_even − W FT_{m/2} f_odd) = f̂)

The transformation W on ℓ − 1 qubits can be defined by |x⟩ ↦ ω^x = ω^{∑_{i=0}^{ℓ−2} 2^i x_i} (where x_i is the i-th qubit of x). It can be easily seen to be the result of applying, for every i ∈ {0, …, ℓ − 2}, the following elementary operation on the i-th qubit of the register: |0⟩ ↦ |0⟩ and |1⟩ ↦ ω^{2^i}|1⟩.

The final state is equal to f̂ by the FFT equations (we leave this as an exercise).

19.4 SHOR’S ORDER-FINDING ALGORITHM.


We now present the central step in Shor’s factoring algorithm: a quan-
tum polynomial-time algorithm to find the order of an integer 𝑎 mod-
ulo an integer ℓ.

Theorem 19.6 — Order finding algorithm, restated. There is a polynomial-time quantum algorithm that on input A, N (represented in binary) finds the smallest r such that A^r = 1 (mod N).

Let t = ⌈5 log(A + N)⌉ and let m = 2^t. Our register will consist of t + polylog(N) qubits. Note that the function x ↦ A^x (mod N) can be computed in polylog(N) time, and so we will assume that we can compute the map |x⟩|y⟩ ↦ |x⟩|y ⊕ (A^x (mod N))⟩ (where we identify a number X ∈ {0, …, N − 1} with its representation as a binary string of length log N). (To compute this map we may need to extend the register by some additional polylog(N) many qubits, but we can ignore them as they will always be equal to zero except in intermediate computations.) Now we describe the order-finding algorithm. It uses a tool of elementary number theory called continued fractions, which allows us to approximate (using a classical algorithm) an arbitrary real number α with a rational number p/q where there is a prescribed upper bound on q (see Section 19.5 below).
We now describe the algorithm and the state, this time including
normalizing factors.

1. Apply the Fourier transform to the first t qubits. (state: (1/√m) ∑_{x∈ℤ_m} |x⟩|0^n⟩)

2. Compute the transformation |x⟩|y⟩ ↦ |x⟩|y ⊕ (A^x (mod N))⟩. (state: (1/√m) ∑_{x∈ℤ_m} |x⟩|A^x (mod N)⟩)

3. Measure the second register to get a value y₀. (state: (1/√K) ∑_{ℓ=0}^{K−1} |x₀ + ℓr⟩|y₀⟩, where x₀ is the smallest number such that A^{x₀} = y₀ (mod N) and K = ⌊(m − 1 − x₀)/r⌋.)

4. Apply the Fourier transform to the first register. (state: (1/(√m√K)) (∑_{x∈ℤ_m} ∑_{ℓ=0}^{K−1} ω^{(x₀+ℓr)x} |x⟩)|y₀⟩)

In the analysis, it will suffice to show that this algorithm outputs


the order 𝑟 with probability at least Ω(1/ log 𝑁 ) (we can always am-
plify the algorithm’s success by running it several times and taking
the smallest output).
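For small numbers we can simulate the whole state vector classically and watch the algorithm work. The following sketch (ours; the parameters A = 7, N = 15 are an arbitrary toy choice) runs steps 1-4 above and then applies the rational-approximation step of Section 19.5 via Python's Fraction.limit_denominator:

```python
import numpy as np
from fractions import Fraction

A, N, t = 7, 15, 8                       # toy instance; ord(7 mod 15) = 4
m = 2 ** t
xs = np.array([pow(A, x, N) for x in range(m)])
y0 = xs[3]                               # suppose step 3 measured this value
support = np.where(xs == y0)[0]          # the x's of the form x0 + l*r
state = np.zeros(m, dtype=complex)
state[support] = 1 / np.sqrt(len(support))
probs = np.abs(np.fft.fft(state) / np.sqrt(m)) ** 2  # step 4
x = np.random.choice(m, p=probs / probs.sum())       # the measured value
r = Fraction(x, m).limit_denominator(N).denominator  # continued fractions
print(x, r)  # x lands on a multiple of m/4; r is 4 (or a divisor of it)
```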

19.4.1 Analysis: the case that 𝑟|𝑚


We start by analyzing the algorithm in the case that 𝑚 = 𝑟𝑐 for some
integer 𝑐. Though very unrealistic (remember that 𝑚 is a power of 2!)
this gives the intuition why Fourier transforms are useful for detecting
periods.

Claim: In this case the value 𝑥 measured will be equal to 𝑎𝑐 for a


random 𝑎 ∈ {0, … , 𝑟 − 1}.
The claim concludes the proof since it implies that 𝑥/𝑚 = 𝑎/𝑟
where a is a random integer less than r. Now for every r, at least Ω(r/log r) of the numbers in [r − 1] are coprime to r. Indeed, the prime number theorem says that there are at least this many primes in this interval, and since r has at most log r prime factors, all but log r of these primes are coprime to r. Thus, when the algorithm computes
a rational approximation for 𝑥/𝑚, the denominator it will find will
indeed be 𝑟.
To prove the claim, we compute for every 𝑥 ∈ ℤ𝑚 the absolute value
of |𝑥⟩’s coefficient before the measurement. Up to some normalization
factor this is

|∑_{ℓ=0}^{c−1} ω^{(x₀+ℓr)x}| = |ω^{x₀x}| · |∑_{ℓ=0}^{c−1} ω^{rℓx}| = 1 · |∑_{ℓ=0}^{c−1} ω^{rℓx}|.

If c does not divide x then ω^{rx} is a c-th root of unity different from 1 (since (ω^{rx})^c = ω^{mx} = 1), so ∑_{ℓ=0}^{c−1} ω^{rℓx} = 0 by the formula for sums of geometric progressions. Thus, such a number x would be measured with zero probability. But if x = cj then ω^{rℓx} = ω^{rcjℓ} = ω^{mjℓ} = 1, and hence the amplitudes of all such x’s are equal for all j ∈ {0, 1, …, r − 1}.

19.4.2 The general case


In the general case, where r does not necessarily divide m, we will not be able to show that the measured value x satisfies m | xr. However, we will show that with Ω(1/log r) probability, (1) xr will be “almost divisible” by m in the sense that 0 ≤ xr (mod m) < r/10, and (2) ⌊xr/m⌋ is coprime to r.

Condition (1) implies that |xr − cm| < r/10 for c = ⌊xr/m⌋. Dividing by rm gives |x/m − c/r| < 1/(10m). Therefore, c/r is a rational number with denominator at most N that approximates x/m to within 1/(10m) < 1/(4N⁴). It is not hard to see that such an approximation is unique (again left as an exercise), and hence in this case the algorithm will come up with c/r and output the denominator r.
Thus all that is left is to prove the next two lemmas. The first shows
that there are Ω(𝑟/ log 𝑟) values of 𝑥 that satisfy the above two con-
ditions, and the second shows that each is measured with probability Ω((1/√r)²) = Ω(1/r).

Lemma 1: There exist Ω(r/log r) values x ∈ ℤ_m such that:

1. 0 < xr (mod m) < r/10

2. ⌊xr/m⌋ and r are coprime

Lemma 2: If x satisfies 0 < xr (mod m) < r/10 then, before the measurement in the final step of the order-finding algorithm, the coefficient of |x⟩ is at least Ω(1/√r).

Proof of Lemma 1: We prove the lemma for the case that r is coprime to m, leaving the general case to the reader. In this case, the map x ↦ rx (mod m) is a permutation of ℤ_m. There are at least Ω(r/log r) numbers in [1..r/10] that are coprime to r (take primes in this range that are not one of r’s at most log r prime factors), and hence there are Ω(r/log r) numbers x such that rx (mod m) = xr − ⌊xr/m⌋m is in [1..r/10] and coprime to r. But this means that ⌊rx/m⌋ cannot have a nontrivial shared factor with r, as otherwise this factor would be shared with rx (mod m) as well.

Proof of Lemma 2: Let x be such that 0 < xr (mod m) < r/10. The absolute value of |x⟩’s coefficient in the state before the measurement is

(1/(√K√m)) · |∑_{ℓ=0}^{K−1} ω^{ℓrx}|,

where K = ⌊(m − x₀ − 1)/r⌋. Note that m/(2r) < K < m/r, since x₀ < N ≪ m. Setting β = ω^{rx} (note that since m ∤ rx, β ≠ 1) and using the formula for the sum of a geometric series, this equals

(1/(√K√m)) · |(1 − β^K)/(1 − β)| = (1/(√K√m)) · sin(Kθ/2)/sin(θ/2),

where θ = 2π(rx (mod m))/m is the angle such that β = e^{iθ} (see Figure [quantum:fig:theta] for a proof by picture of the last equality). Under our assumptions Kθ < 2π/10, and hence, using the fact that sin α ∼ α for small angles α, the ratio sin(Kθ/2)/sin(θ/2) is Ω(K), so the coefficient of |x⟩ is at least Ω(√K/√m) = Ω(1/√r).

This completes the proof of Theorem 19.6.

19.5 RATIONAL APPROXIMATION OF REAL NUMBERS


In many settings, including Shor’s algorithm, we are given a real num-
ber in the form of a program that can compute its first 𝑡 bits in 𝑝𝑜𝑙𝑦(𝑡)
time. We are interested in finding a close approximation to this real
number of the form 𝑎/𝑏, where there is a prescribed upper bound on
𝑏. Continued fractions is a tool in number theory that is useful for this.
A continued fraction is a number of the following form:

a₀ + 1/(a₁ + 1/(a₂ + 1/(a₃ + ⋯)))

for a₀ a non-negative integer and a₁, a₂, … positive integers.
Given a real number α > 0, we can find its representation as an infinite fraction as follows: split α into the integer part ⌊α⌋ and fractional part α − ⌊α⌋, find recursively the representation R of 1/(α − ⌊α⌋), and then write

α = ⌊α⌋ + 1/R.

If we continue this process for n steps, we get a rational number, denoted by [a₀, a₁, …, aₙ], which can be represented as pₙ/qₙ with pₙ, qₙ coprime. The following facts can be proven using induction:

• p₀ = a₀, q₀ = 1 and for every n > 1, pₙ = aₙpₙ₋₁ + pₙ₋₂ and qₙ = aₙqₙ₋₁ + qₙ₋₂.

• pₙ/qₙ − pₙ₋₁/qₙ₋₁ = (−1)^{n−1}/(qₙqₙ₋₁)

Furthermore, it is known that |pₙ/qₙ − α| < 1/(qₙqₙ₊₁) (∗), which implies that pₙ/qₙ is the closest rational number to α with denominator at most qₙ. It also means that if α is extremely close to a rational number, say |α − a/b| < 1/(4b⁴) for some coprime a, b, then we can find a, b by iterating the continued fraction algorithm for polylog(b) steps. Indeed, let qₙ be the first denominator such that qₙ₊₁ ≥ b. If qₙ₊₁ > 2b² then (∗) implies that |pₙ/qₙ − α| < 1/(2b²), but this means that pₙ/qₙ = a/b, since there is at most one rational number with denominator at most b that is so close to α. On the other hand, if qₙ₊₁ ≤ 2b² then, since pₙ₊₁/qₙ₊₁ is at least as close to α as a/b, we get |pₙ₊₁/qₙ₊₁ − α| < 1/(4b⁴), again meaning that pₙ₊₁/qₙ₊₁ = a/b. It’s not hard to verify that qₙ ≥ 2^{n/2}, implying that pₙ and qₙ can be computed in polylog(qₙ) time.
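The following sketch (ours) implements this procedure using the recurrences above; Python's Fraction.limit_denominator performs the same rational-approximation computation internally.

```python
from fractions import Fraction

def convergents(alpha, steps):
    # Convergents p_n/q_n of alpha via p_n = a_n p_{n-1} + p_{n-2},
    # q_n = a_n q_{n-1} + q_{n-2}.
    p_prev, q_prev = 1, 0                # p_{-1}, q_{-1}
    p, q = int(alpha), 1                 # p_0 = a_0, q_0 = 1
    out = [(p, q)]
    for _ in range(steps):
        frac = alpha - int(alpha)
        if frac < 1e-12:                 # alpha is (numerically) rational
            break
        alpha = 1 / frac
        a = int(alpha)
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        out.append((p, q))
    return out

alpha = 3 / 7 + 1e-9                     # a close approximation to 3/7
print(convergents(alpha, 6))             # the convergents include (3, 7)
print(Fraction(alpha).limit_denominator(100))   # Fraction(3, 7)
```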

19.5.1 Quantum cryptography


There is another way in which quantum mechanics interacts with
cryptography. These “spooky actions at a distance” have been suggested by Wiesner and Bennett-Brassard as a way in which parties can create a secret shared key over an insecure channel. On one hand, this concept does not require as much control as general-purpose quantum computing, and so it has in fact been demonstrated physically. On the other hand, unlike transmitting standard digital information, this “insecure channel” cannot be an arbitrary medium such as WiFi, but rather requires fiber optics, lasers, etc. Unlike quantum computers, where we only need one of those to break RSA, to actually use key exchange at scale we need to set up these types of networks, and so it is unclear if this approach will ever dominate the solution of Alice sending Bob a Brink’s truck with the shared secret key. People have
proposed some other ways to use the interesting properties of quan-
tum mechanics for cryptographic purposes including quantum money
and quantum software protection.
20
Software Obfuscation

Let us stop and think of the notions we have seen in cryptography.


We have seen that under reasonable computational assumptions (such
as LWE) we can achieve the following:
• CPA secure private key encryption and Message Authentication codes (which can be combined to get CCA security or authenticated encryption). This means that two parties that share a key can have a virtually secure channel between them. An adversary cannot get any additional information beyond whatever is her prior knowledge given an encryption of a message sent from Alice to Bob. Moreover, she cannot modify this message by even a single bit. It’s lucky we only discovered these results from the 1970’s onwards: if the Germans had used such an encryption instead of ENIGMA in World War II, there’s no telling how many more lives would have been lost.

• Public key encryption and digital signatures that enable Alice and Bob
to set up such a virtually secure channel without sharing a prior key.
This enables our “information economy” and protects virtually
every financial transaction over the web. Moreover, it is the crucial
mechanism for supplying “over the air” software updates to smart
devices, whether they be phones, cars, thermostats or anything else.
Some had predicted that this invention would change the nature
of our form of government to crypto anarchy and while this may
be hyperbole, governments everywhere are worried about this
invention.

• Hash functions and pseudorandom functions enable us to create au-


thentication tokens for deriving one-time passwords out of shared
keys, or deriving long keys from short passwords. They are also
useful as a tool in a password based key exchange, which enables two
parties to communicate securely (with fairly good but not overwhelming probability) when they share a 6-digit PIN, even if the adversary can easily afford much, much more than 10^6 computational cycles.


• Fully homomorphic encryption allows computing over encrypted data.


Bob could prepare Alice’s taxes without knowing what her income
is, and more generally store all her data and perform computations
on it, without knowing what the data is.

• Zero knowledge proofs can be used to prove a statement is true without revealing why it’s true. In particular, since you can use zero knowledge proofs to prove that you possess X bitcoins without giving any information about their identity, they have been used to obtain fully anonymous electronic currency.

• Multiparty secure computation is a fully general tool that enables Alice and Bob (and Charlie, David, Elana, Fran, etc.) to perform
any computation on their private inputs, whether it is to compute
the result of a vote, a second-price auction, privacy-preserving data
mining, perform a cryptographic operation in a distributed manner
(without any party ever learning the secret key) or simply play
poker online without needing to trust any central server.

(By the way, all of the above are notions that you should be familiar with, and you should be able to explain their security guarantees if you ever need to use them, for example, in the unlikely event that you ever find yourself needing to take a cryptography final exam…)
While clearly there are issues of efficiency, is there anything more in
terms of functionality we could ask for? Given all these riches, can we
be even more greedy?
It turns out that the answer is yes. Here are some scenarios that are
still not covered by the above tools:

20.1 WITNESS ENCRYPTION


Suppose that you have uncovered a conspiracy that involves very
powerful people, and you are afraid that something bad might hap-
pen to you. You would like an “insurance policy” in the form of writ-
ing down everything you know and making sure it is published in
the case of your untimely death, but are afraid these powerful peo-
ple could find and attack any trusted agent. Ideally you would want
to publish an encrypted form of your manuscript far and wide, and
make sure the decryption key is automatically revealed if anything
happens to you, but how could you do that? A UA-secure encryption
(which stands for secure against an Underwood attack) gives the ability to create an encryption c of a message m that is CPA secure, but such that there is an algorithm D that on input c and any string
𝑤 which is a (digitally signed) New York Times obituary for Janine
Skorsky will output 𝑚.

The technical term for this notion is witness encryption, by which


we mean that for every circuit 𝐹 we have an algorithm 𝐸 that on in-
put 𝐹 and a message 𝑚 creates a ciphertext 𝑐 that is CPA secure, and
there is an algorithm 𝐷 that on input 𝑐 and some string 𝑤, outputs
𝑚 if 𝐹 (𝑤) = 1. In other words, instead of the key being a unique
string, the key is any string 𝑤 that satisfies a certain condition. Wit-
ness encryption can be used for other applications. For example, you
could encrypt a message to future members of humanity, that can be
decrypted only using a valid proof of the Riemann Hypothesis.

20.2 DENIABLE ENCRYPTION


Here is another scenario that is seemingly not covered by our current
tools. Suppose that Alice uses a public key system (𝐺, 𝐸, 𝐷) to encrypt
a message 𝑚 by computing 𝑐 = 𝐸𝑒 (𝑚, 𝑟) and sending 𝑐 to Bob that
will compute 𝑚 = 𝐷𝑑 (𝑐). The ciphertext is intercepted by Bob’s
archenemy Freddie Baskerville Ignatius (or FBI for short) who has
the means to force Alice to reveal the message and as proof reveal
the randomness used in encryption as well. Could Alice find, for any
choice of 𝑚′ , some string 𝑟′ that is pseudorandom and for which 𝑐
equals 𝐸𝑒 (𝑚′ , 𝑟′ )? An encryption scheme with this property is called
deniable, since Alice can deny she sent m and claim she sent m′ instead. (One could also think of a deniable witness encryption, and so if Janine in the scenario above is forced to open the ciphertexts she sent by revealing the randomness used to create them, she can credibly claim that she didn’t encrypt her knowledge of the conspiracy, but merely wanted to make sure that her family secret recipe for pumpkin pie is not lost when she passes away.)

20.3 FUNCTIONAL ENCRYPTION
It’s not just individuals that don’t have all their needs met by our
current tools. Think of a large enterprise that uses a public key en-
cryption (𝐺, 𝐸, 𝐷). When a ciphertext 𝑐 = 𝐸𝑒 (𝑚) is received by the
enterprise’s servers, it needs to be decrypted using the secret key 𝑑.
But this creates a single point of failure. It would be much better if we
could create a “weakened key” 𝑑1 that, for example, can only decrypt
messages related to sales that were sent in the date range X-Y, a key
𝑑2 that can only decrypt messages that contain certain keywords, or
maybe a key 𝑑3 that only allows to detect whether the message en-
coded by a particular ciphertext satisfies a certain regular expression.
This will allow us to give the key 𝑑1 to the manager of the sales
department (and not worry about her taking the key with her if she
leaves the company), or more generally give every employee a key
that corresponds to his or her role. Furthermore, if the company re-
ceives a subpoena for all emails relating to a particular topic, it could
give out a cryptographic key that reveals precisely these emails and
nothing else. It could also run a spam filter on encrypted messages
without needing to give the server performing this filter access to the

full contents of the messages (and so perhaps even outsource spam


filtering to a different company).
The general form of this is called a functional encryption. The idea is
that for every function 𝑓 ∶ {0, 1}∗ → {0, 1}∗ we can create a decryption
key 𝑑𝑓 such that on input 𝑐 = 𝐸𝑒 (𝑚), 𝐷𝑑𝑓 (𝑐) = 𝑓(𝑚) but 𝑑𝑓 cannot
be used to gain any other information on the message except for 𝑓(𝑚),
and even if several parties holding 𝑑𝑓1 , … , 𝑑𝑓𝑘 collude together, they
can’t learn more than simply 𝑓1 (𝑚), … , 𝑓𝑘 (𝑚). Note that using fully
homomorphic encryption we can easily transform an encryption of
𝑚 to an encryption of 𝑓(𝑚) but what we want here is the ability to
selectively decrypt only some information about the message.
The formal definition of functional encryption is the following:

Definition 20.1 — Functional Encryption. A tuple (𝐺, 𝐸, 𝐷, 𝐾𝑒𝑦𝐷𝑖𝑠𝑡) is a


functional encryption scheme if:

• For every function 𝑓 ∶ {0, 1}ℓ → {0, 1}, if (𝑑, 𝑒) = 𝐺(1𝑛 ) and 𝑑𝑓 =
𝐾𝑒𝑦𝐷𝑖𝑠𝑡(𝑑, 𝑓), then for every message 𝑚, 𝐷𝑑𝑓 (𝐸𝑒 (𝑚)) = 𝑓(𝑚).

• Every efficient adversary Eve wins the following game with


probability at most 1/2 + 𝑛𝑒𝑔𝑙(𝑛):

1. We generate (𝑑, 𝑒) ←𝑅 𝐺(1𝑛 ).


2. Eve is given 𝑒 and for 𝑖 = 1, … , 𝑇 = 𝑝𝑜𝑙𝑦(𝑛) repeatedly chooses
𝑓𝑖 and receives 𝑑𝑓𝑖 .
3. Eve chooses two messages 𝑚0 , 𝑚1 such that 𝑓𝑖 (𝑚0 ) = 𝑓𝑖 (𝑚1 ) for
all 𝑖 = 1, … , 𝑇 .
4. For 𝑏 ←𝑅 {0, 1}, Eve receives 𝑐∗ = 𝐸𝑒 (𝑚𝑏 ) and outputs 𝑏′ .
5. Eve wins if 𝑏′ = 𝑏.

20.4 THE SOFTWARE PATCH PROBLEM


It’s not only exotic forms of encryption that we’re missing. Here is an-
other application that is not yet solved by the above tools. From time
to time software companies discover a vulnerability in their products.
For example, they might discover that if fed an input 𝑥 of some partic-
ular form (e.g., satisfying a regular expression 𝑅) to a server running
their software could give an adversary unlimited access to it. In such
a case, you might want to release a patch that modifies the software to
check if 𝑅(𝑥) = 1 and if so rejects the input. However the fear is that
hackers who didn’t know about the vulnerability before could dis-
cover it by examining the patch and then use it to attack the customers

who are slow to update their software. Could we come up, for a regular expression R, with a program P such that P(x) = 1 if and only if R(x) = 1, but examining the code of P doesn’t make it any easier to find some x satisfying R?

20.5 SOFTWARE OBFUSCATION


All these applications and more could in principle be solved by a
single general tool known as virtual black-box (VBB) secure software
obfuscation. In fact, such an obfuscation is a general tool that can also
be directly used to yield public key encryption, fully homomorphic
encryption, zero knowledge proofs, secure function evaluation, and
many more applications.
We will now give the definition of VBB secure obfuscation and
prove the central result about it, which is unfortunately that secure
VBB obfuscators do not exist. We will then talk about the relaxed
notion of indistinguishability obfuscators (IO); this object turns out to
be good enough for many of the above applications and whether it
exists is one of the most exciting open questions in cryptography at the
moment. We will survey some of the research on this front.
Let’s define a compiler to be an efficient (i.e., polynomial time) pos-
sibly probabilistic map 𝒪 that takes a Boolean circuit 𝐶 on 𝑛 bits of
input and outputs a Boolean circuit 𝐶 ′ that also takes 𝑛 input bits and
computes the same function; i.e., 𝐶(𝑥) = 𝐶 ′ (𝑥) for every 𝑥 ∈ {0, 1}𝑛 .
(If 𝒪 is probabilistic then this should happen for every choice of its
coins.) This might seem a strange definition, since it even allows the
trivial compiler 𝒪(𝐶) = 𝐶. That is OK, since later we will require
additional properties such as the following:

Definition 20.2 — VBB secure obfuscation.A compiler 𝒪 is a virtual black


box (VBB) secure obfuscator if it satisfies the following property: for
every efficient adversary 𝐴 mapping {0, 1}∗ to {0, 1}, there exists
an efficient simulator 𝑆 such that for every circuit 𝐶 the following
random variables are computationally indistinguishable:

• 𝐴(𝒪(𝐶))

• 𝑆 𝐶 (1|𝐶| ) where by this we mean the output of 𝑆 when it is given


the length of 𝐶 and access to the function 𝑥 ↦ 𝐶(𝑥) as a black
box (aka oracle access).

(Note that the distributions above are of a single bit, and so being
indistinguishable simply means that the probability of outputting 1 is
equal in both cases up to a negligible additive factor.)

20.6 APPLICATIONS OF OBFUSCATION


The writings of Diffie and Hellman, James Ellis, and others who thought of public key encryption show that one of the first approaches they considered was to use obfuscation to transform a private-key encryption scheme into a public key one.
a private key encryption scheme (𝐸, 𝐷) we can transform it to a pub-
lic key encryption scheme (𝐺, 𝐸 ′ , 𝐷) by having the key generation
algorithm select a private key 𝑘 ←𝑅 {0, 1}𝑛 that will serve as the de-
cryption key, and let the encryption key 𝑒 be the circuit 𝒪(𝐶) where
𝒪 is an obfuscator and 𝐶 is a circuit mapping 𝑐 to 𝐸𝑘 (𝑐). The new
encryption algorithm 𝐸 ′ takes 𝑒 and 𝑐 and simply outputs 𝑒(𝑐).
These days we know other approaches for obtaining public key
encryption, but the obfuscation-based approach has significant addi-
tional flexibility. To turn this into a fully homomorphic encryption, we
simply publish the obfuscation of c, c′ ↦ E_k(D_k(c) NAND D_k(c′)). To turn
this into a functional encryption, for every function 𝑓 we can define 𝑑𝑓
as the obfuscation of 𝑐 ↦ 𝑓(𝐷𝑘 (𝑐)).
We can also use obfuscation to get a witness encryption: to encrypt
a message 𝑚 to be opened using any 𝑤 such that 𝐹 (𝑤) = 1, we can ob-
fuscate the function that maps 𝑤 to 𝑚 if 𝐹 (𝑤) = 1 and outputs error
otherwise. To solve the patch problem, for a given regular expression
we can obfuscate the function that maps 𝑥 to 𝑅(𝑥).

20.7 IMPOSSIBILITY OF OBFUSCATION


So far, we’ve learned that in cryptography no concept is too fantastic to
be realized. Unfortunately, VBB secure obfuscation is an exception:

Under the PRG assump-


Theorem 20.3 — impossibility of Obfuscation.
tion, there does not exist a VBB secure obfuscating compiler.

20.7.1 Proof of impossibility of VBB obfuscation


We will now show the proof of Theorem 20.3. For starters, note that
obfuscation is trivial for learnable functions. That is, if 𝐹 is a function
such that given black-box access to 𝐹 we can recover a circuit that
computes it, then we can obfuscate it. Given a circuit 𝐶, the obfuscator
𝒪 will simply use it as a black box to learn a circuit 𝐶 ′ that computes
the same function and output it. Since 𝒪 itself only uses black box
access to 𝐶, it can be trivially simulated perfectly. (Verifying that
this is indeed the case is a good way to make sure you followed the
definition.)
However, this is not so useful, since it’s not hard to see that all the
examples above where we wanted to use obfuscation involved func-
tions that were unlearnable. But it already suggests that we should

use an unlearnable function for our negative result. Here is an ex-


tremely simple unlearnable function. For every 𝛼, 𝛽 ∈ {0, 1}𝑛 , we
define 𝐹𝛼,𝛽 ∶ {0, 1}𝑛 → {0, 1}𝑛 to be the function that on input 𝑥
outputs 𝛽 if 𝑥 = 𝛼 and otherwise outputs 0𝑛 .
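In code, this family is just a point function; a tiny sketch (ours):

```python
import secrets

n = 16                                   # length of alpha, beta in bytes
alpha, beta = secrets.token_bytes(n), secrets.token_bytes(n)

def F(x):
    # F_{alpha,beta}: outputs beta on input alpha, and 0^n otherwise.
    return beta if x == alpha else bytes(n)

# A black-box learner sees 0^n on every query except with probability
# (number of queries) * 2^{-8n}, so it learns essentially nothing.
print(F(secrets.token_bytes(n)) == bytes(n))   # True, w. overwhelming prob.
```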
Given black box access to this function for random α, β, it’s extremely unlikely that we would hit α with a polynomial number of queries, and hence we will not be able to recover β, and so in particular will not be able to learn a circuit that computes F_{α,β}. (Pseudorandom functions can be used to construct examples of functions that are unlearnable in the much stronger sense that we cannot achieve the machine learning goal of outputting some circuit that approximately predicts the function.)

This function already yields a counterexample for a stronger version of the VBB definition. We define a strong VBB obfuscator to be a compiler 𝒪 that satisfies the above definition for adversaries that can output not just one bit but an arbitrarily long string. We can now prove the following:
Lemma 20.4 There does not exist a strong VBB obfuscator.

Proof. Suppose towards a contradiction that there exists a strong VBB


obfuscator 𝒪. Let F_{α,β} be defined as above, and let A be the adversary that on input a circuit C′ simply outputs C′. We claim that for every S there exist some α, β and an efficient algorithm D_{α,β} such that

|Pr[D_{α,β}(A(𝒪(F_{α,β}))) = 1] − Pr[D_{α,β}(S^{F_{α,β}}(1^{10n})) = 1]| > 0.9   (∗)

where these probabilities are over the coins of 𝒪 and the simulator S. Note that we identify the function F_{α,β} with the obvious circuit of size at most 10n that computes it.

Clearly (∗) implies that these two distributions are not indistinguishable, and so proving (∗) will finish the proof. The algorithm D_{α,β} on input a circuit C′ will simply output 1 iff C′(α) = β. By the definition of a compiler and the algorithm A, for every α, β, Pr[D_{α,β}(A(𝒪(F_{α,β}))) = 1] = 1.
On the other hand, for D_{α,β} to output 1 on C′ = S^{F_{α,β}}(1^{10n}), it must be the case that C′(α) = β. We claim that there exist some α, β such
that this will happen with negligible probability. Indeed, assume 𝑆
makes 𝑇 = 𝑝𝑜𝑙𝑦(𝑛) queries and pick 𝛼, 𝛽 independently and uniformly
at random from {0, 1}𝑛 . For every 𝑖 = 1, … , 𝑇 , let 𝐸𝑖 be the event that
the 𝑖𝑡ℎ query of 𝑆 is the first in which it gets a response other than
0𝑛 . The probability of 𝐸𝑖 is at most 2−𝑛 because as long as 𝑆 got all
responses to be 0𝑛 , it got no information about 𝛼 and so the choice of
𝑆’s 𝑖𝑡ℎ query is independent of 𝛼 which is chosen at random in {0, 1}𝑛 .
By a union bound, the probability that 𝑆 got any response other than
0𝑛 is negligible. In which case if we let 𝐶 ′ be the output of 𝑆 and let
𝛽 ′ = 𝐶 ′ (𝛼), then 𝛽 ′ is independent of 𝛽 and so the probability that
they are equal is at most 2−𝑛 .


The adversary in the proof of Lemma 20.4 does not seem very
impressive. After all, it merely printed out its input. Indeed, the
definition of strong VBB security might simply be overkill, and
“plain” VBB is enough for almost all applications. However, as men-
tioned above, plain VBB is impossible to achieve as well. We’ll prove a
slightly weaker version of Theorem 20.3:

Theorem 20.5 — Impossibility of Obfuscation from FHE. If fully homomorphic encryption exists then there is no VBB secure obfuscating compiler.

(To get the original theorem from this, note that if VBB obfuscation
exists then we can transform any private key encryption into a fully
homomorphic public key encryption.)

Proof. Let (𝐺, 𝐸, 𝐷, EVAL) be a fully homomorphic encryption


scheme. For strings 𝑑, 𝑒, 𝑐, 𝛼, 𝛽, 𝛾, we will define the function
𝐹𝑑,𝑒,𝑐,𝛼,𝛽,𝛾 as follows: for inputs of the form 00𝑥, it will output 𝛽 if and
only if 𝑥 = 𝛼, and otherwise output 0𝑛 . For inputs of the form 01𝑐′ ,
it will output 𝛾 iff 𝐷𝑑 (𝑐′ ) = 𝛽 and otherwise output 0𝑛 . And for the
input 1𝑛 , it will output 𝑐. For all other inputs it will output 0𝑛 .
We will use this function family where 𝑑, 𝑒 are the keys of the FHE,
and 𝑐 = 𝐸𝑒 (𝛼). We now define our adversary 𝐴. On input some circuit
𝐶 ′ , 𝐴 will compute 𝑐′ = 𝐶 ′ (1𝑛 ) and let 𝐶 ″ be the circuit that on input
𝑥 outputs 𝐶 ′ (00𝑥). It will then let 𝑐″ = EVAL𝑒 (𝐶 ″ , 𝑐′ ). Note that if
𝑐′ is an encryption of 𝛼 and 𝐶 ′ computes 𝐹 = 𝐹𝑑,𝑒,𝑐,𝛼,𝛽,𝛾 then 𝑐″ will
be an encryption of 𝐹 (00𝛼) = 𝛽. The adversary 𝐴 will then compute
𝛾 ′ = 𝐶 ′ (01𝑐″ ) and output 𝛾1′ .
We claim that for every simulator 𝑆, there exist some tuple
(𝑑, 𝑒, 𝑐, 𝛼, 𝛽, 𝛾) and a distinguisher 𝐷 such that

∣Pr[𝐷(𝐴(𝒪(𝐹𝑑,𝑒,𝑐,𝛼,𝛽,𝛾 ))) = 1] − Pr[𝐷(𝑆 𝐹𝑑,𝑒,𝑐,𝛼,𝛽,𝛾 (1|𝐹𝑑,𝑒,𝑐,𝛼,𝛽,𝛾 | )) = 1]∣ ≥ 0.1

Indeed, the distinguisher 𝐷 will depend on 𝛾 and on input a bit 𝑏


will simply output 1 iff 𝑏 = 𝛾1 . Clearly, if (𝑑, 𝑒) are keys of the FHE
and 𝑐 = 𝐸𝑒 (𝛼) then no matter what circuit 𝐶 ′ the obfuscator 𝒪 outputs
on input 𝐹𝑑,𝑒,𝑐,𝛼,𝛽,𝛾 , the adversary 𝐴 will output 𝛾1 on 𝐶 ′ and hence
𝐷 will output 1 with probability one on 𝐴’s output. In contrast,
if we let 𝑆 be a simulator and generate (𝑑, 𝑒) = 𝐺(1𝑛 ), pick 𝛼, 𝛽, 𝛾
independently at random in {0, 1}𝑛 and let 𝑐 = 𝐸𝑒 (𝛼), we claim that
the probability that 𝑆 will output 𝛾1 will be equal to 1/2 ± 𝑛𝑒𝑔𝑙(𝑛).
Indeed, suppose otherwise, and define the event 𝐸𝑖 to be that the 𝑖𝑡ℎ
query is the first query (apart from the query 1𝑛 whose answer is
𝑐) on which 𝑆 receives an answer other than 0𝑛 . Now there are two
cases:

Case 1: The query is equal to 00𝛼.


Case 2: The query is equal to 01𝑐′ for some 𝑐′ such that 𝐷𝑑 (𝑐′ ) = 𝛽.
Case 2 only happens with negligible probability because if 𝑆 only
received the value 𝑒 (which is independent of 𝛽) and did not receive
any other non 0𝑛 response up to the 𝑖𝑡ℎ point then it did not learn any
information about 𝛽. Therefore the value 𝛽 is independent of the 𝑖𝑡ℎ
query and the probability that it decrypts to 𝛽 is at most 2−𝑛 .
Case 1 only happens with negligible probability because otherwise
𝑆 is an algorithm that on input an encryption of 𝛼 (and a bunch of
answers of the form 0𝑛 , which are of course not helpful) manages to
output 𝛼 with non-negligible probability, hence violating the CPA
security of the encryption scheme.
Now if neither case happens, then 𝑆 does not receive any informa-
tion about 𝛾, and hence the probability that its output is 𝛾1 is at most
1/2.

P
This proof is simple but deserves a second read. A
crucial point here is to use FHE to allow the adversary
to essentially “feed 𝐶 ′ to itself” so it can obtain from
an encryption of 𝛼 an encryption of 𝛽, even though
that would not be possible using black box access
only.

20.8 INDISTINGUISHABILITY OBFUSCATION


The proof can be generalized to give private key encryption for which
the transformation to public key encryption would be insecure, and
many other such constructions. So, this result might (and indeed to
a large extent did) seem like a death blow to general-purpose obfus-
cation. However, already in that paper we noticed that there was a
variant of obfuscation that we could not rule out, and this is the fol-
lowing:

Definition 20.6 — Indistinguishability Obfuscation. We say a compiler 𝒪 is an indistinguishability obfuscator (IO) if for every two circuits C, C′ that have the same size and compute the same function, the random variables 𝒪(C) and 𝒪(C′) are computationally indistinguishable.

It is a good exercise to understand why the proof of the impos-


sibility result above does not apply to rule out IO. Nevertheless, a
reasonable guess would be that:

1. IO is impossible to achieve.

2. Even if it was possible to achieve, it is not good enough for most of


the interesting applications of obfuscation.

However, it turns out that this guess is (most likely) wrong. New
results have shown that IO is extremely useful for many applications,
including those outlined above. They also gave some evidence that it
might be possible to achieve. We’ll talk about those works in the next
lecture.
21
More obfuscation, exotic encryptions

Fully homomorphic encryption is an extremely powerful notion,


but it does not allow us to obtain fine control over the access to infor-
mation. With the public key you can do all sorts of computation on
the encrypted data, but you still do not learn it, while with the private
key you learn everything. But in many situations we want fine grained
access control: some people should get access to some of the informa-
tion for some of the time. This makes the “all or nothing” nature of
traditional encryptions problematic. While one could still implement
such access control by interacting with the holder(s) of the secret key,
this is not always possible.
The most general notion of an encryption scheme allowing fine con-
trol is known as functional encryption, as was described in the previous
lecture. This can be viewed as an object dual to Fully Homomorphic
Encryption, and incomparable to it. For every function 𝑓, we can con-
struct an 𝑓-restricted decryption key 𝑑𝑓 that allows recovery of 𝑓(𝑚)
from an encryption of 𝑚 but not anything else.
In this lecture we will focus on a weaker notion known as iden-
tity based encryption (IBE). Unlike the case of full fledged functional
encryption, there are fairly efficient constructions known for IBE.

21.1 SLOWER, WEAKER, LESS SECURER


In a sense, functional encryption or IBE is all about selective leaking of
information. That is, in some sense we want to modify an encryption
scheme so that it actually is “less secure” in some very precise sense,
so that it would be possible to learn something about the plaintext
even without knowing the (full) decryption key.
There is actually a history of cryptographic techniques meant to support such operations. Perhaps the “mother” of all such “quasi encryption” schemes is the modular exponentiation operation x ↦ g^x for some discrete group 𝔾. The map x ↦ g^x is not exactly an encryption of x: for one thing, we don’t know how to decrypt it. Also, as a deterministic map, it cannot be semantically secure. Nevertheless, if x


is random, or even of high entropy, in groups such as a cyclic subgroup of a multiplicative group modulo some prime, we don’t know how to recover x from g^x. However, given g^{x₁}, …, g^{x_k} and a₁, …, a_k we can find out if ∑ aᵢxᵢ = 0, and this can be quite useful in many applications.
More generally, even in the private key setting, people have studied
encryption schemes such as

• Deterministic encryption : an encryption scheme that maps 𝑥 to


𝐸(𝑥) in a deterministic way. This cannot be semantically secure in
general but can be good enough if the message 𝑥 has high enough
entropy or doesn’t repeat and allows to check if two encryptions
encrypt the same object. (We can also do this by publishing a hash
of 𝑥 under some secret salt.)

• Order preserving encryption: an encryption scheme mapping numbers in some range {1, …, N} to ciphertexts so that given E(x) and E(y) one can efficiently compare whether x < y. This is quite problematic for security. For example, given poly(t) random such encryptions, you can more or less know where they lie in the interval up to a (1 ± 1/t) multiplicative factor.

• Searchable encryption: a generalization of deterministic encryption that allows some more sophisticated searches (such as not only exact match).

Some of these constructions can be quite efficient. In particular the


system CryptDB developed by Popa et al uses these kinds of encryp-
tions to automatically turn a SQL database into one that works on
encrypted data and still supports the required queries. However, the
issue of how dangerous the “leakage” can be is somewhat subtle. See
this paper and blog post claiming weaknesses in practical use cases for
CryptDB, as well as this response by the CryptDB authors.
While the constructions of IBE and functional encryption often
use maps such as 𝑥 ↦ 𝑔𝑥 as subroutines, they offer a stronger con-
trol over the leakage in the sense that, in the absence of publishing a
(restricted) decryption key, we always get at least CPA security.

21.2 HOW TO GET IBE FROM PAIRING BASED ASSUMPTIONS.


The standard exponentiation mapping x ↦ g^x allows us to compute linear functions in the exponent. That is, given any linear map L of the form L(x₁, …, x_k) = ∑ aᵢxᵢ, we can efficiently compute the map g^{x₁}, …, g^{x_k} ↦ g^{L(x₁,…,x_k)}. But can we do more? In particular, can we compute quadratic functions? This is an issue, as even computing the map g^x, g^y ↦ g^{xy} is exactly the Diffie-Hellman problem that is considered hard in many of the groups we are interested in.

Pairing based cryptography begins with the observation that in some


elliptic curve groups we can use a map based on the so-called Weil or Tate pairings. The idea is that we have an efficiently computable isomorphism from a group 𝔾₁ to a group 𝔾₂ mapping g to ĝ, such that we can efficiently map the elements g^x and g^y to the element φ(g^x, g^y) = ĝ^{xy}. This in particular means that given g^{x₁}, …, g^{x_k} we can compute ĝ^{Q(x₁,…,x_k)} for every quadratic Q. Note that we cannot repeat this to compute, say, degree 4 functions in the exponent, since we don’t know how to invert the map φ.

The Pairing Diffie-Hellman Assumption is that we can find two such groups 𝔾₁, 𝔾₂ and a generator g for 𝔾₁ such that there is no efficient algorithm A that on input g^a, g^b, g^c (for random a, b, c ∈ {0, …, |𝔾₁| − 1}) computes ĝ^{abc}.
That is, while we can compute a quadratic in the exponent, we can’t compute a cubic.

We now show an IBE construction due to Boneh and Franklin, showing how we can obtain an identity based encryption from the pairing Diffie-Hellman assumption. (The construction we show was first published in the CRYPTO 2001 conference. The Weil and Tate pairings were used before for cryptographic attacks, but were first used for a positive cryptographic result by Antoine Joux in his 2000 paper giving a three-party Diffie-Hellman protocol; Boneh and Franklin then used pairings to obtain an identity based encryption scheme, answering an open question of Shamir. At approximately the same time as these papers, Sakai, Ohgishi and Kasahara presented a paper in the SCIS 2000 conference in Japan showing an identity-based key exchange protocol from pairings. Also Clifford Cocks, who as we mentioned above invented the RSA scheme at GCHQ in the 1970’s before R, S, and A did, came up in 2001 with a different identity-based encryption scheme using the quadratic residuosity assumption.)

• Master key generation: We generate 𝔾₁, 𝔾₂, g as above and choose a at random in {0, …, |𝔾| − 1}. The master private key is a and the master public key is (𝔾₁, 𝔾₂, g, h = g^a). We let H : {0,1}* → 𝔾₁ and H′ : 𝔾₂ → {0,1}^ℓ be two hash functions modeled as random oracles.

• Key distribution: Given an arbitrary string id ∈ {0,1}*, we generate the decryption key corresponding to id as d_id = H(id)^a.

• Encryption: To encrypt a message m ∈ {0,1}^ℓ given the public parameters and some identity id, we choose c ∈ {0, …, |𝔾| − 1} at random, and output g^c, H′(id‖φ(h, H(id))^c) ⊕ m.

• Decryption: Given the secret key d_id and a ciphertext (h′, y), we output H′(id‖φ(d_id, h′)) ⊕ y.

Correctness: We claim that D_{d_id}(E_id(m)) = m. Indeed, write h_id = H(id) and let b = log_g h_id. Then an encryption of m has the form (h′ = g^c, H′(id‖φ(g^a, h_id)^c) ⊕ m), and so the second term is equal to H′(id‖ĝ^{abc}) ⊕ m. However, since d_id = h_id^a = g^{ab}, we get that φ(d_id, h′) = ĝ^{abc}, and hence decryption will recover the message. QED
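To see the structure of the scheme in code, here is a toy sketch (ours, and emphatically insecure): we fake the pairing φ by brute-force discrete log in a tiny group mod 1019, which lets us check the correctness equations but has none of the security properties of a real pairing; all parameters are hypothetical toy choices.

```python
import hashlib, secrets

P, G1, G2 = 1019, 2, 3                   # toy group Z_P^*; 2 generates it
ORDER = P - 1

def dlog(h):                             # brute force: fine only for the toy
    x, y = 0, 1
    while y != h:
        y = y * G1 % P
        x += 1
    return x

def e(x, y):                             # "pairing": e(g^a, g^b) = G2^{ab}
    return pow(G2, dlog(x) * dlog(y) % ORDER, P)

def H1(identity):                        # hash into the group (oracle H)
    k = int.from_bytes(hashlib.sha256(identity).digest(), "big") % ORDER
    return pow(G1, k, P)

def H2(identity, z, ell):                # random oracle H'
    return hashlib.shake_256(identity + z.to_bytes(4, "big")).digest(ell)

a = secrets.randbelow(ORDER)             # master secret key
h = pow(G1, a, P)                        # master public key h = g^a

def keydist(identity):
    return pow(H1(identity), a, P)       # d_id = H(id)^a

def encrypt(identity, m):
    c = secrets.randbelow(ORDER)
    mask = H2(identity, e(h, pow(H1(identity), c, P)), len(m))
    return pow(G1, c, P), bytes(u ^ v for u, v in zip(m, mask))

def decrypt(identity, d_id, ct):
    hp, y = ct
    mask = H2(identity, e(d_id, hp), len(y))
    return bytes(u ^ v for u, v in zip(y, mask))

ident = b"bob@example.com"
print(decrypt(ident, keydist(ident), encrypt(ident, b"hi bob")))  # b'hi bob'
```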

Security: To prove security we need to first present a definition of IBE


security. The definition allows the adversary to request keys corre-
sponding to arbitrary identities, as long as it does not ask for keys
corresponding to the target identity it wants to attack. There are sev-
eral variants, including CCA type of security definitions, but we stick
to a simple one here:

Definition:An IBE scheme is said to be CPA secure if every efficient


adversary Eve wins the following game with probability at most 1/2 +
𝑛𝑒𝑔𝑙(𝑛):

• The keys are generated and Eve gets the master public key.
• For 𝑖 = 1, … , 𝑇 = 𝑝𝑜𝑙𝑦(𝑛), Eve chooses an identity 𝑖𝑑𝑖 ∈ {0, 1}∗ and
gets the key d_{id_i}.
• Eve chooses an identity 𝑖𝑑∗ ∉ {𝑖𝑑1 , … , 𝑖𝑑𝑇 } and two messages
𝑚0 , 𝑚 1 .
• We choose 𝑏 ←𝑅 {0, 1} and Eve gets the encryption of 𝑚𝑏 with
respect to the identity 𝑖𝑑∗ .
• Eve outputs 𝑏′ and wins if 𝑏′ = 𝑏.

Theorem: If the pairing Diffie Hellman assumption holds and 𝐻, 𝐻 ′


are random oracles, then the scheme above is CPA secure.

Proof: Suppose for the sake of contradiction that there exists some 𝑇 = 𝑝𝑜𝑙𝑦(𝑛)-time adversary 𝐴 that succeeds in the IBE-CPA game with probability at least 1/2 + 𝜖 for some non-negligible 𝜖. We assume without loss of generality that whenever 𝐴 makes a query to the key distribution function with id 𝑖𝑑, or a query to 𝐻′ with prefix 𝑖𝑑, it has already previously made the query 𝑖𝑑 to 𝐻. (𝐴 can be easily modified to have this behavior.)
We will build an algorithm 𝐵 that on input 𝔾_1, 𝔾_2, 𝑔, 𝑔^𝑎, 𝑔^𝑏, 𝑔^𝑐 will output ĝ^𝑎𝑏𝑐 with probability 𝑝𝑜𝑙𝑦(𝜖, 1/𝑇). The algorithm 𝐵 will guess 𝑖_0, 𝑗_0 ←_𝑅 {1,…,𝑇} and simulate 𝐴 “in its belly”, giving it the public key 𝑔^𝑎, and act as follows:
• When 𝐴 makes a query to 𝐻 with 𝑖𝑑, then for all but the 𝑖_0-th query, 𝐵 will choose a random 𝑏_𝑖𝑑 ∈ {0,…,|𝔾|−1} (as usual we’ll assume |𝔾| is prime), set 𝑒_𝑖𝑑 = 𝑔^𝑏_𝑖𝑑, and define 𝐻(𝑖𝑑) = 𝑒_𝑖𝑑. Let 𝑖𝑑_0 be the 𝑖_0-th query 𝐴 made to the oracle. We define 𝐻(𝑖𝑑_0) = 𝑔^𝑏 (where 𝑔^𝑏 is the input to 𝐵; recall that 𝐵 does not know 𝑏).

• When 𝐴 makes a query to the key distribution oracle with 𝑖𝑑, then if 𝑖𝑑 ≠ 𝑖𝑑_0, 𝐵 will respond with 𝑑_𝑖𝑑 = (𝑔^𝑎)^𝑏_𝑖𝑑. If 𝑖𝑑 = 𝑖𝑑_0 then 𝐵 aborts and fails.

• When 𝐴 makes a query to the 𝐻′ oracle with input 𝑖𝑑′‖ĥ, then for all but the 𝑗_0-th query 𝐵 answers with a random string in {0,1}^ℓ. In the 𝑗_0-th query, if 𝑖𝑑′ ≠ 𝑖𝑑_0 then 𝐵 stops and fails. Otherwise, it outputs ĥ.

• 𝐵 stops the simulation and fails if the simulation reaches the challenge part.

It might seem weird that we stop the simulation before we reach the
challenge part, but the correctness of this reduction follows from the
following claim:
Claim: In the actual attack game, with probability at least 𝜖/10, 𝐴 will make the query 𝑖𝑑∗‖ĝ^𝑎𝑏𝑐 to the 𝐻′ oracle, where 𝐻(𝑖𝑑∗) = 𝑔^𝑏 and the public key is 𝑔^𝑎.
Proof: If 𝐴 does not make this query, then the message in the challenge is XORed with a completely random string, and 𝐴 cannot distinguish between 𝑚_0 and 𝑚_1 in this case with probability better than 1/2. QED
Given this claim, to prove the theorem we just need to observe that, assuming it does not fail, 𝐵 provides answers to 𝐴 that are identically distributed to the answers 𝐴 receives in an actual execution of the CPA game. Hence with probability at least 𝜖/(10𝑇^2), 𝐵 will guess the query 𝑖_0 when 𝐴 queries 𝐻(𝑖𝑑∗) and set the answer to be 𝑔^𝑏, and then guess the query 𝑗_0 when 𝐴 queries 𝑖𝑑∗‖ĝ^𝑎𝑏𝑐, in which case 𝐵’s output will be correct. QED

21.3 BEYOND PAIRING BASED CRYPTOGRAPHY


Boneh and Silverberg asked the question of whether we could go beyond quadratic polynomials and get schemes that allow us to compute higher degrees. The idea is to get a multilinear map, which would be a set of isomorphic groups 𝔾_1,…,𝔾_𝑑 with generators 𝑔_1,…,𝑔_𝑑 such that we can map 𝑔_𝑖^𝑎 and 𝑔_𝑗^𝑏 to 𝑔_{𝑖+𝑗}^{𝑎𝑏}. This way we would be able to compute any degree-𝑑 polynomial in the exponent given 𝑔_1^{𝑥_1},…,𝑔_1^{𝑥_𝑘}.
We will now show how using such a multilinear map we can get
a construction for a witness encryption scheme. We will only show
the construction, without talking about the security definition, the
assumption, or security reductions.
Given some circuit 𝐶 ∶ {0,1}^𝑛 → {0,1} and some message 𝑥, we want to “encrypt” 𝑥 in a way that given 𝑤 such that 𝐶(𝑤) = 1 it would be possible to decrypt 𝑥, and otherwise it should be hard. It should be noted that the encrypting party itself does not know any such 𝑤, and indeed (as in the case of a proof of the Riemann hypothesis) might not even know if such a 𝑤 exists. The idea is the following. We use the fact that the Exact Cover problem is NP complete to map 𝐶 into a collection of subsets 𝑆_1,…,𝑆_𝑚 of the universe 𝑈 (where 𝑚, |𝑈| = 𝑝𝑜𝑙𝑦(|𝐶|, 𝑛)) such that there exists 𝑤 with 𝐶(𝑤) = 1 if and only if there exist 𝑑 sets 𝑆_{𝑖_1},…,𝑆_{𝑖_𝑑} that are a partition of 𝑈 (i.e., every element in 𝑈 is covered by exactly one of these sets), and moreover there is an efficient way to map 𝑤 to such a partition and vice versa. Now, to encrypt the message 𝑥 we take a degree-𝑑 instance of multilinear maps (𝔾_1,…,𝔾_𝑑, 𝑔_1,…,𝑔_𝑑) (with all groups of size 𝑝) and choose random 𝑎_1,…,𝑎_{|𝑈|} ←_𝑅 {0,…,𝑝−1}. We then output the ciphertext 𝑔_1^{∏_{𝑗∈𝑆_1} 𝑎_𝑗},…,𝑔_1^{∏_{𝑗∈𝑆_𝑚} 𝑎_𝑗}, 𝐻(𝑔_𝑑^{∏_{𝑗∈𝑈} 𝑎_𝑗}) ⊕ 𝑥. Now, given a partition
𝑆_{𝑖_1},…,𝑆_{𝑖_𝑑} of the universe 𝑈, we can use the multilinear operations to compute 𝑔_𝑑^{∏_{𝑗∈𝑈} 𝑎_𝑗} and recover the message. Intuitively, since the numbers are random, this would be the only way to compute this value, but showing that requires formulating precise security definitions for both multilinear maps and witness encryption, and of course a proof.
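Here is a toy sketch of this witness encryption construction, with the same insecure exponents-as-elements convention standing in for a degree-𝑑 multilinear map (combining level-1 elements just multiplies their exponents mod 𝑝), a small hard-coded Exact Cover instance, and SHA-256 as the hash 𝐻. Everything below is illustrative only and provides no security.

    # Toy witness encryption from Exact Cover (INSECURE: the "multilinear
    # map" is plain multiplication of exponents mod p).
    import hashlib, secrets
    from functools import reduce

    p = 2**61 - 1
    U = range(6)                                         # the universe
    sets = [{0, 1}, {2, 3}, {4, 5}, {1, 2}, {0, 3, 4}]   # exact cover instance
    witness = [0, 1, 2]                                  # indices of a partition of U

    def prod(xs):
        return reduce(lambda u, v: (u * v) % p, xs, 1)

    def mask(g_d_elem: int, ell: int) -> bytes:
        return hashlib.sha256(g_d_elem.to_bytes(16, "big")).digest()[:ell]

    def encrypt(x: bytes):
        a = [secrets.randbelow(p - 1) + 1 for _ in U]     # random a_1..a_|U|
        headers = [prod(a[j] for j in S) for S in sets]   # "=" g_1^(prod_{j in S_i} a_j)
        total = prod(a)                                   # "=" g_d^(prod_{j in U} a_j)
        return headers, bytes(u ^ v for u, v in zip(x, mask(total, len(x))))

    def decrypt(headers, ct, cover):
        # Multiply the headers of a partition of U: since every j in U is
        # covered exactly once, the exponents combine to prod_{j in U} a_j.
        total = prod(headers[i] for i in cover)
        return bytes(u ^ v for u, v in zip(ct, mask(total, len(ct))))

    headers, ct = encrypt(b"secret messages!")
    assert decrypt(headers, ct, witness) == b"secret messages!"
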
The first candidate construction for a multilinear map was given
by Garg, Gentry and Halevi. It is based on computational questions
on lattices and so (perhaps not surprisingly) it involves significant
complications due to noise. At a very high level, the idea is to use a
fully homomorphic encryption scheme that can evaluate polynomi-
als up to some degree 𝑑, but release a “hobbled decryption key” that
contains just enough information to provide what’s known as a zero
test: check if an encryption is equal to zero. Because of the homomorphic properties, that means that given encryptions of 𝑥_1,…,𝑥_𝑛 and some degree-𝑑 polynomial 𝑃, we can check whether 𝑃(𝑥_1,…,𝑥_𝑛) = 0. Moreover, the notion of security that this and similar constructions satisfy is rather subtle and indeed not fully understood. Constructions of
indistinguishability obfuscators are built based on this idea, but are
significantly more involved than the construction of a witness encryp-
tion. One central tool they use is the observation that FHE reduces
the task of obfuscation to essentially obfuscating a decryption circuit,
which can often be rather shallow. But beyond that there is significant
work to be done to actually carry out the obfuscation.
22
Anonymous communication

Encryption is meant to protect the contents of communication, but


sometimes the bigger secret is that the communication existed in the
first place. If a whistleblower wants to leak some information to the
New York Times, the mere fact that she sent an email would reveal her
identity. There are two main concepts aimed at achieving anonymity:

• Anonymous routing is about ensuring that Alice and Bob can com-
municate without that fact being revealed.

• Steganography is about having Alice and Bob hide an encrypted communication in the context of a seemingly innocuous conversation.

22.1 STEGANOGRAPHY
The goal in a steganographic communication is to hide cryptographic (or non-cryptographic) content without being detected. The idea is simple: let’s start with the symmetric case and assume Alice and Bob share a key 𝑘 and Alice wants to transmit a bit 𝑏 to Bob. We assume that Alice has a choice of 𝑡 words 𝑤_1,…,𝑤_𝑡 that would be reasonable for her to send at this point in the conversation. Alice will choose a word 𝑤_𝑖 such that 𝑓_𝑘(𝑤_𝑖) = 𝑏, where {𝑓_𝑘} is a pseudorandom function collection. With probability 1 − 2^−𝑡 there will be such a word. Bob will decode the message using 𝑓_𝑘(𝑤_𝑖). Alice and Bob can use an error correcting code to compensate for the probability 2^−𝑡 that Alice is forced to send the wrong bit.
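A minimal sketch of this symmetric-key idea, with HMAC-SHA256 playing the role of the pseudorandom function 𝑓_𝑘 and a made-up list of candidate words (the error-correcting layer is omitted):

    # Embed one hidden bit per message by choosing among innocuous words.
    import hmac, hashlib

    def f(k: bytes, w: str) -> int:
        # pseudorandom function f_k, instantiated here with HMAC-SHA256
        return hmac.new(k, w.encode(), hashlib.sha256).digest()[0] & 1

    def embed_bit(k: bytes, b: int, candidates: list) -> str:
        # Alice picks any plausible word w with f_k(w) = b; with t candidates
        # this fails with probability 2^-t (an error-correcting code over
        # many messages absorbs those failures).
        for w in candidates:
            if f(k, w) == b:
                return w
        return candidates[0]        # no suitable word: forced to send a wrong bit

    k = b"a shared secret key"
    words = ["sure", "okay", "sounds good", "fine by me", "alright"]
    sent = embed_bit(k, 1, words)
    recovered = f(k, sent)          # Bob decodes the hidden bit
    assert recovered == 1 or all(f(k, w) == 0 for w in words)
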
In the public key setting, suppose that Bob publishes a public key 𝑒 for an encryption scheme that has pseudorandom ciphertexts. That is, to a party that does not know the key, an encryption is indistinguishable from a random string. To send some message 𝑚 to Bob, Alice computes 𝑐 = 𝐸_𝑒(𝑚) and transmits it to Bob one bit at a time. Given the 𝑡 words 𝑤_1,…,𝑤_𝑡, to transmit the bit 𝑐_𝑗 Alice chooses a word 𝑤_𝑖 such that 𝐻(𝑤_𝑖) = 𝑐_𝑗, where 𝐻 ∶ {0,1}∗ → {0,1} is a hash function modeled as a random oracle. The distribution of words 𝑤_1,…,𝑤_ℓ output by Alice is uniform conditioned on (𝐻(𝑤_1),…,𝐻(𝑤_ℓ)) = 𝑐. But note that if 𝐻 is a random oracle, then for innocuously chosen words 𝐻(𝑤_1),…,𝐻(𝑤_ℓ) is going to be uniform, and hence indistinguishable from 𝑐.

22.2 ANONYMOUS ROUTING


• Low latency communication: Aqua, Crowds, LAP, ShadowWalker,
Tarzan, Tor

• Message at a time, protection against timing / traffic analysis:


Mix-nets, e-voting, Dining Cryptographer network (DC net), Dis-
sent, Herbivore, Riposte

22.3 TOR
Basic architecture. Attacks.

22.4 TELEX

22.5 RIPOSTE
V
CONCLUSIONS
23
Ethical, moral, and policy dimensions to cryptography

This will not be a lecture but rather a discussion on some of the


questions that arise from cryptography. I would like you to read some
of the sources below (and maybe others) and reflect on the following
questions:
The discussion is often framed as weighing privacy against security, but I encourage you to look critically at both issues. It is often instructive to try to compare the current situation with both the historical past as well as some ideal desired world. It is also worthwhile to consider cryptography in its broader contexts. Some people in both the pro-regulation and anti-regulation camps exaggerate the role of cryptography.
On one hand, cryptography is likely not to bring about the “crypto
anarchy” regime hoped for in the crypto anarchist manifesto. For
example, more than the growth of bitcoin, we are seeing a turn away
from cash into credit cards and other forms of much more traceable
and less anonymous forms of payments (interestingly, these forms of
payments are often enabled by cryptography). On the other hand,
despite the fears raised by government agencies of “going dark” there
are powerful commercial incentives to collect vast amounts of data
and store them at search-warrant friendly servers. Clearly technology
is shifting the landscape of relationships among individuals, as well
as between individuals and large organizations and governments.
Cryptography is an important component in these technologies but not the only one, and more than that, the way technologies end up being used often has more to do with social and commercial factors than with the technologies themselves. All that said, significant changes often pose non-trivial dangers, and it is important to have an informed and reasoned discussion of the ways cryptography can help or harm the general and private good.
Some questions that are worth considering are:

• Is communicating privately a basic human right? Should it extend


to communicating at a distance? Should this be absolute privacy
that cannot be violated even with a legal warrant? If there was a
secure way to implement wiretapping only with a legal warrant,
would it be morally just?

• Is privacy a basic good in its own right? Or a necessary condition


for the freedom of expression, and peaceful assembly and associa-
tion?

• Are we less or more secure today than in the past? In what ways
did the balance between government and individuals shift in the
last few decades? Do governments have more or less data and tools
for monitoring individuals at their disposal? Do individuals and
non-governmental groups have more or less ability to inflict harm
(and hence need to be protected against)?

• Do we have more or less privacy today than in the past? Does cryptography regulation play a big part in that?

• What would be the balance between security and privacy in an


ideal world?

• Is the focus on encryption misguided in that the main issue affecting privacy and security is the so-called metadata? Can cryptographic techniques protect such metadata? Even if they could, is there a commercial interest in doing so?

• One argument against the regulation of cryptography is that, given that the mathematics of cryptography is not secret, the “bad guys” will always be able to access it. Is this a valid argument? Note that similar arguments are made in the context of gun control. Also, perhaps the “true dissidents” will be able to access cryptography as well, and so regulation will mainly affect the masses of “run of the mill” good and not-so-good private citizens?

• What would be the practical impact of regulations forbidding the


use of end-to-end crypto without access by governments?

• Rogaway argues that cryptography is inherently political, and


research should acknowledge this and be directed at achieving ben-
eficial political goals. Has cryptography research failed the public?
What more could be done?

• Are some cryptographic (or crypto-related) tools inherently morally problematic? Rogaway suggests that this may be true for fully homomorphic encryption and differential privacy. Do you agree?
• What are the most significant scenarios where cryptography can impact positively or negatively? Large scale terror attacks? “Ordinary” crimes (that still claim the lives of many more people than terror attacks)? Attacks against cyber infrastructure or personal data? Political dissidents in oppressive regimes? Mass government or corporate surveillance?

• How are these issues different in the U.S. as opposed to other coun-
tries? Is the debate too U.S. centric?

23.1 READING PRIOR TO LECTURE:


• Moral Character of Cryptographic Work - please read at least parts 1-3 (pages 1-30 in the footnoted version). It’s long and should not be taken uncritically, but is a very good and thought-provoking read.
• “Going Dark” Berkman report - this is a report written by a committee, and as such less exciting (though arguably more sober) than Rogaway’s paper. Please read at least the introduction, and you might also find the personal statements in Appendix A interesting.
• Digital Equilibrium project - optional reading - this is a group of
very senior current and former officials, in particular in govern-
ment, and as such would tend to fall on the more “establishment”
or “pro regulation” side. Their “foundational paper” has even more
of a “written by committee” feel but is still worthwhile reading.
• Crypto anarchist manifesto - optional reading - very much not “written by committee”, it can be an interesting read even if it sounds more like science fiction than a description of actual current or near-future reality.

23.2 CASE STUDIES.


Since such a discussion might sometimes be hard to hold in the abstract, let us consider some actual cases:

23.2.1 The Snowden revelations


The impetus for the current iteration of the security vs. privacy debate was the Snowden revelations on the massive scale of surveillance by the NSA on citizens in the U.S. and around the globe. Concurrently, in plain sight, companies such as Apple, Google, Facebook, and others are also collecting massive amounts of information on their users. Some of the backlash to the Snowden revelations was increased pressure on companies to support stronger “end-to-end” encryption, so that some data does not reside on companies’ servers, which have become suspect. We’re now seeing some “backlash to the backlash”, with law enforcement and government officials around the globe trying to ban such encryption technology or mandate government backdoors.

23.2.2 FBI vs Apple case


We’ve mentioned this case in the past. (I also blogged about it.) The short summary is that an iPhone belonging to one of the San Bernardino terrorists was found by the FBI. The iPhone’s memory was encrypted by a key 𝑘 that is obtained as 𝐻(𝑢𝑖𝑑‖𝑝𝑎𝑠𝑠𝑐𝑜𝑑𝑒), where 𝑝𝑎𝑠𝑠𝑐𝑜𝑑𝑒 is the six digit passcode of the user and 𝑢𝑖𝑑 is a secret 128 bit key that is hardwired into the processor. The processor will only allow ten attempts at guessing the passcode before erasing all memory. The FBI wanted Apple’s help in creating a digitally signed software update that essentially runs a brute force search over the 10^6 passcodes and outputs the key 𝑘. The software update could be restricted to run only on that particular iPhone. Eventually, the FBI managed to extract the information out of the iPhone without Apple’s help. The method they used is unknown, but it may be possible to physically extract the 𝑢𝑖𝑑 from the processor. It might also be possible to prevent erasure of the memory by disconnecting it from the processor, or rewriting it after erasure. Would such cases change your position on this question?
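To get a feel for why the ten-attempt limit (rather than the passcode itself) is the crucial protection here, the following sketch brute-forces the entire six-digit passcode space in seconds once that limit is gone. SHA-256 and the all-zeros 𝑢𝑖𝑑 are stand-ins for the unspecified hardware details.

    # Back-of-the-envelope brute force over the 10^6 six-digit passcodes.
    import hashlib

    uid = bytes(16)   # hypothetical 128-bit hardware-bound secret (zeros here)
    # The key the phone derives from the true passcode; a real attack would
    # instead test each candidate key by trying to decrypt the stored data.
    true_key = hashlib.sha256(uid + b"531416").digest()

    def brute_force(uid: bytes, target: bytes) -> str:
        for pc in range(10**6):                       # every six-digit passcode
            guess = format(pc, "06d").encode()
            if hashlib.sha256(uid + guess).digest() == target:
                return guess.decode()

    assert brute_force(uid, true_key) == "531416"
    # Without the uid, the search space is 2^128 and the same attack is
    # hopeless; with the uid in hand, only the ten-attempt limit remains.
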
Some questions that one could ask:

• Given that the FBI had a legal warrant for the information on the
iPhone, was it wrong of Apple to refuse to provide the help re-
quired?

• Was it wrong for Apple to have designed their iPhone so that they
are unable to easily extract information out of it? Should they be
required to make sure that such devices can be searched as a result
of a legal warrant?

• If the only way for the FBI to get the information was to get Apple’s master signature key (which would allow them to completely break into any iPhone, and even turn it into a recording/surveillance device), would it have been OK for them to do it? Should Apple design their devices in a way that even their master signature key cannot break them? Is that even possible, given that software updates are crucial for the proper functioning of such devices? (It was recently claimed that the Canadian police have had access to the master decryption key of Blackberry since 2010.)

In the San Bernardino case, the utility of breaking into the phone was questioned, given that both perpetrators were killed and there was no evidence of them receiving any assistance. But there are cases where things are more complicated. Brittney Mills was 29 years old and 8 months pregnant when she was shot and killed in April 2015 in Baton Rouge, Louisiana. Her baby was delivered via emergency C-section but also died a week later. There was no sign of forced entry and so it is quite likely she knew her assailant. Her family believes that the clues to her murderer’s identity could be found in her iPhone, but since it is locked they have no way of extracting this information. One can imagine other cases as well. Recently a mother found her kidnapped daughter using the Find My iPhone feature. It is not hard to conceive of a case where unlocking a phone is the key to saving someone’s life. Would such cases change your view of the above questions?

23.2.3 Juniper backdoor case and the OPM break-in


We’ve also mentioned the Juniper backdoor case. This was a break-in to the firewalls of Juniper Networks by an unknown party that was crucially enabled by a backdoor allegedly inserted by the NSA into the Dual EC pseudorandom generator (see also here and here for more).
Because of the nature of this break-in, whoever is responsible for it could have decrypted much of the traffic without leaving any traces, and so we don’t know the damage caused, but such hacks can have much more significant consequences than forcing people to change their credit card numbers. When the federal Office of Personnel Management was hacked, sensitive information about millions of people who had gone through security clearance was extracted. This includes fingerprints, extensive personal information from interviews and polygraph sessions, and much more. Such information can then help gain access to more information, whether it’s using a fingerprint to unlock a phone or using the extensive knowledge of social connections, habits and interests to launch very targeted attacks to extract information from particular individuals.
Here one could ask if stronger cryptography, and in particular
cryptographic tools that would have enabled an individual to control
access to his or her own data, would have helped prevent such attacks.
24
Course recap

It might be worthwhile to recall what we learned in this course:

• Perhaps first and foremost, that it is possible to mathematically de-


fine what it means for a cryptographic scheme to be secure. In the
cases we studied, such a definition could always be described as a
“security game”. That is, we first define what it means for a scheme
to be insecure. Then, a scheme is secure if it is not insecure. The no-
tion of “insecurity” is that there exists some adversarial strategy
that succeeds with higher probability than what it should have. We
normally don’t limit the strategy of the adversary but only its capabilities: its computational power and the type of access it has to the system (e.g., chosen plaintext, chosen ciphertext, etc.). We also talked about how the notion of secrecy requires randomness, and how many real-life failures of cryptosystems amount to faulty assumptions about the sources of randomness.

• We saw the importance of being conservative in security definitions.


For example, how despite the fact that the notion of chosen ciphertext attack (CCA) security seems too strong to capture any realistic scenario (e.g., when do we let an adversary play with a decryption box?), there are many natural cases where using a CPA-secure instead of a CCA-secure encryption would lead to an attack on the overall protocol.

• We saw how we can prove security by reductions. Suppose we have


a scheme 𝑆 that achieves some security notion 𝑋 (for example,
𝑆 might be a function that achieves the security notion of being a
pseudorandom generator) and we use it to build a scheme 𝑇 that
we want to achieve a security notion 𝑌 (for example, we want 𝑇 to
be a message authentication code). To prove 𝑇 is secure, we show
how we can transform an adversary 𝐵 that wins against 𝑇 in the
security game of 𝑌 into an adversary 𝐴 that wins against 𝑆 in the
security game of 𝑋. Typically, the adversary 𝐴 will run 𝐵 “in its belly”, simulating for 𝐵 the security game 𝑌 with respect to 𝑇. This can be somewhat confusing, so please re-read the last three sentences and make sure you understand this crucial notion.

• We also saw some of the concrete wonderful things we can do in


cryptography:

• In the world of private key cryptography, we saw that based on the


PRG conjecture we can get a CPA secure private key encryption
(which in particular has key shorter than message), pseudorandom
functions, message authentication codes, CCA secure encryption,
commitment schemes, and even zero knowledge proofs for NP
complete languages.

• We saw that assuming the existence of collision resistant hash func-


tions, we can get message authentication codes (and digital signa-
tures) where the key is shorter than the message. We talked about
the heuristic of how we can model hash functions as a random oracle, and use that for “proofs of work” in the context of bitcoin and password derivation, as well as many other settings.

• We also discussed practical constructions of private key primitives such as the AES block cipher, and how such block ciphers are modeled as pseudorandom permutations, and how we can use them to get CPA or CCA secure encryption via various modes such as CBC or GCM. We also discussed the Merkle-Damgård length extension construction and the Davies-Meyer construction for hash functions, and how the Merkle tree construction can be used for secure storage.

• We saw the revolutionary notion of public key encryption, that two


people can talk without having coordinated in advance. We saw
constructions for this based on discrete log (e.g., the Diffie-Hellman
protocol), factoring (e.g., the Rabin and RSA trapdoor permuta-
tions), and the learning with errors (LWE) problem. We saw the
notion of digital signatures, and gave several different construc-
tions. We saw how we can use digital signatures to create a “chain
of trust” via certificates, and how the TLS protocol, which protects
web traffic, works.

• We talked about some advanced notions, and in particular saw the construction of the surprising concept of a fully homomorphic encryption (FHE) scheme, which has been rightly called by Brian Hayes “one of the most amazing magic tricks in all of computer science”. Using FHE and zero knowledge proofs, we can get multiparty secure computation, which basically means that in the setting of interactive protocols between several parties, we can establish a “virtual trusted third party” (or, as I prefer to call it, a “virtual Chuck Norris”).

• We also saw other variants of encryption such as functional encryp-


tion, witness encryption and identity based encryption, which allow for
“selective leaking” of information. For functional encryption and
witness encryption we don’t yet have clean constructions under
standard assumptions but only under obfuscation, but we saw how
we could get identity based encryption using the random oracle
heuristic and the assumption of the difficulty of the discrete loga-
rithm problem in a group that admits an efficient pairing operation.

• We talked about the notion of obfuscation, which can be thought as


the one tool that, if exists, would imply all the others. We saw that
virtual black box obfuscation does not exist, but there might exist a
weaker notion known as “indistinguishability obfuscation” and we
saw how it can be useful via the example of a witness encryption
and a digital signature scheme. We mentioned (without proof) that
it can also be used to obtain a functional encryption scheme.

• We talked about how quantum computing can change the land-


scape of cryptography, making lattice based constructions our main
candidate for public key schemes.

• Finally, we discussed some of the ethical and policy issues that


arise in the applications of cryptography, and what is the impact
cryptography has now, or can have in the future, on society.

24.1 SOME THINGS WE DID NOT COVER


• We did not cover what is arguably the other “fundamental theorem
of cryptography”, namely the equivalence of one-way functions
and pseudorandom generators. A one-way function is an efficient
map 𝐹 ∶ {0, 1}∗ → {0, 1}∗ that is hard to invert on a random in-
put. That is, for any efficient algorithm 𝐴 if 𝐴 is given 𝑦 = 𝐹 (𝑥) for
uniformly chosen 𝑥 ←𝑅 {0, 1}𝑛 , then the probability that 𝐴 out-
puts 𝑥′ with 𝐹 (𝑥′ ) = 𝑦 is negligible. It can be shown that one-way
functions are minimal in the sense that they are necessary for a great
many cryptographic applications including pseudorandom gener-
ators and functions, encryption with key shorter than the message,
hash functions, message authentication codes, and many more.
(Most of these results are obtained via the work of Impagliazzo
and Luby who showed that if one-way functions do not exist then
there is a universal posterior sampler in the sense that for every prob-
abilistic process 𝐹 that maps 𝑥 to 𝑦, there is an efficient algorithm that given 𝑦 can sample 𝑥′ from a distribution close to the posterior distribution of 𝑥 conditioned on 𝐹(𝑥) = 𝑦. This result is typically known as the equivalence of standard one-way functions and distributional one-way functions.) The fundamental result of Hastad, Impagliazzo, Levin and Luby is that one-way functions are also sufficient for much of private key cryptography, since they imply the existence of pseudorandom generators.

• Related to this, although we mentioned this briefly, we did not


go in depth into “Impagliazzo’s Worlds” of algorithmica, heuris-
tica, pessiland, minicrypt, cryptomania (and the new one of “ob-
fustopia”). If this piques your curiosity, please read this 1995 sur-
vey.

• We did not go in detail into the design of private key cryptosystems


such as the AES. Though we discussed modes of operation of block
ciphers, we did not go into a full description of all modes that are
used in practice. We also did not discuss cryptanalytic techniques
such as linear and differential cryptanalysis. We also did not discuss all the technical issues that arise with length extension and padding of encryptions in practice.

• While we talked about bitcoin, the TLS protocol, two factor au-
thentication systems, and some aspects of pretty good privacy, we
restricted ourselves to abstractions of these systems and did not
attempt a full “end to end” analysis of a complete system. I do hope you have learned the tools with which you’d be able to understand the full operation of such a system if you need to.

• While we talked about Shor’s algorithm, the algorithm people


actually use today to factor numbers is the number field sieve. It and
its predecessor, the quadratic sieve, are well worth studying. The
(freely available online) book of Shoup is an excellent source not
just for these algorithms but general algorithmic group/number
theory.

• We talked about some attacks on practical systems, but there are many other attacks that teach us important lessons, not just about these particular systems, but also about security and cryptography in general (as well as some human tendencies to repeat certain types of mistakes).

24.2 WHAT I HOPE YOU LEARNED


I hope you got an appreciation for cryptography, and an understanding of how it can surprise you both in the amazing security properties it can deliver, as well as in the subtle, but often devastating, ways that it can fail. Beyond cryptography, I hope you got out of this course the ability to think a little differently - to be paranoid enough to see the world from the point of view of an adversary, but also the lesson that sometimes if something sounds crazy but is not downright impossible, it might just be feasible.
But if these philosophical ramblings don’t speak to you, as long as
you know the difference between CPA and CCA and I won’t catch you
reusing a one-time pad, you should be in good shape :)
I did not intend this course to teach you how to implement cryptographic algorithms, but I do hope that if you need to use cryptography at any point, you now have the skills to read up on what’s needed and be able to argue intelligently about the security of real-world systems. I also hope that you now have sufficient background to not be scared by the technical jargon and the abundance of adjectives in cryptography research papers, and to be able to read up on what you need to follow any paper that is interesting to you.
Mostly, I just hope you enjoyed this last term and felt like this
course was a good use of your time. I certainly did.
