An Intensive Introduction to Cryptography

Lecture notes. Available on https://intensecrypto.org
Text available on https://github.com/boazbk/crypto - please post any issues there - thank you!
We’ll hold hands, and then we’ll watch the sun rise
I Preliminaries

0 Mathematical Background
0.1 A quick overview of mathematical prerequisites
0.2 Mathematical Proofs
0.2.1 Example: The existence of infinitely many primes
0.3 Probability and Sample spaces
0.3.1 Random variables
0.3.2 Distributions over strings
0.3.3 More general sample spaces
0.4 Correlations and independence
0.4.1 Independent random variables
0.4.2 Collections of independent random variables
0.5 Concentration and tail bounds
0.5.1 Chebyshev's Inequality
0.5.2 The Chernoff bound
0.6 Exercises
0.7 Exercises

1 Introduction
1.1 Some history
1.2 Defining encryptions
1.3 Defining security of encryption
1.3.1 Generating randomness in actual cryptographic systems
1.4 Defining the secrecy requirement
1.5 Perfect Secrecy
1.5.1 Achieving perfect secrecy
1.6 Necessity of long keys

2 Computational Security
2.0.1 Proof by reduction
2.1 The asymptotic approach
2.1.1 Counting number of operations
2.2 Our first conjecture
2.3 Why care about the cipher conjecture?
2.4 Prelude: Computational Indistinguishability
2.5 The Length Extension Theorem or Stream Ciphers
2.5.1 Appendix: The computational model

3 Pseudorandomness
3.0.1 Unpredictability: an alternative approach for proving the length extension theorem
3.1 Stream ciphers
3.2 What do pseudorandom generators actually look like?
3.2.1 Attempt 0: The counter generator
3.2.2 Attempt 1: The linear checksum / linear feedback shift register (LFSR)
3.2.3 From insecurity to security
3.2.4 Attempt 2: Linear Congruential Generators with dropped bits
3.3 Successful examples
3.3.1 Case Study 1: Subset Sum Generator
3.3.2 Case Study 2: RC4
3.3.3 Case Study 3: Blum, Blum and Shub
3.4 Non-constructive existence of pseudorandom generators

V Conclusions
0.1 SYLLABUS
In this fast-paced course, I plan to start from the very basic notions of
cryptography and by the end of the term reach some of the exciting
advances that happened in the last few years such as the construction
of fully homomorphic encryption, a notion that Brian Hayes called “one
of the most amazing magic tricks in all of computer science”, and in-
distinguishability obfuscators which are even more amazing. To achieve
this, our focus will be on ideas rather than implementations and so we
will present cryptographic notions in their pedagogically simplest
form – the one that best illustrates the underlying concepts – rather
than the one that is most efficient, widely deployed, or conforms to In-
ternet standards. We will discuss some examples of practical systems
and attacks, but only when these serve to illustrate a conceptual point.
Depending on time, I plan to cover the following notions:
• Part I: Introduction
0.1.1 Prerequisites
The main prerequisite is the ability to read, write (and even enjoy!) mathematical proofs. In addition, familiarity with algorithms, basic probability theory and basic linear algebra will be helpful. We'll only use fairly basic concepts from all these areas: e.g., O-notation (such as $O(n)$ running time) from algorithms; notions such as events, random variables, and expectation from probability theory; and notions such as matrices, vectors, and eigenvectors from linear algebra. Mathematically mature students should be able to pick up the needed notions on their own. See the "mathematical background" handout for more details.
No programming knowledge is needed. If you’re interested in the
course but are not sure if you have sufficient background, or you have
any other questions, please don’t hesitate to contact me.
• Proofs: First and foremost, this course will involve a heavy dose of formal mathematical reasoning, which includes mathematical definitions, statements, and proofs. Ultimately, a proof is a piece of writing whose goal is to convince human readers that a statement is true. (In this class, the particular humans you are trying to convince are me and the teaching fellows.)
To write a proof of some statement X you need to follow three steps:

1. Make sure that you completely understand what the statement X means.

2. Think about X until you are able to convince yourself that X is true.

3. Write your reasoning down in a way that will convince a reader of X's truth.
Like any good piece of writing, a proof should be concise and not
be overly formal or cumbersome. In fact, overuse of formalism can of-
ten be detrimental to the argument since it can mask weaknesses in the
argument from both the writer and the reader. Sometimes students
try to “throw the kitchen sink” at an answer trying to list all possi-
bly relevant facts in the hope of getting partial credit. But a proof is a
piece of writing, and a badly written proof will not get credit even if
it contains some correct elements. It is better to write a clear proof of
a partial statement. In particular, if you haven’t been able to convince
yourself that the statement is true, you should be honest about it and
explain which parts of the statement you have been able to verify and
which parts you haven’t.
the sense that they are never consecutive. It is possible to prove that
(in fact, it’s not a bad exercise) but this observation already suggests a
guess for what would be a number that is divisible by neither 𝑝 nor 𝑞,
namely 𝑝𝑞 + 1. Indeed, the remainder of 𝑛 = 𝑝𝑞 + 1 when dividing by
either $p$ or $q$ would be 1 (which in particular is not zero). This observation generalizes: we can set $n = pqr + 1$ to be a number that is divisible by neither $p$, $q$, nor $r$, and more generally $n = p_1 \cdots p_k + 1$ is not divisible by any of $p_1, \ldots, p_k$.
Now we have convinced ourselves of the statement and it is time
to think of how to write this down in the clearest way. One issue that
arises is that we want to prove things truly from the definition of
primes and first principles, and so not assume properties of division
and remainders or even the existence of a prime factorization, without
proving it. Here is what a proof could look like. We will prove the
following two lemmas:
Lemma 0.2 — Existence of prime divisor. For every integer 𝑛 > 1, there
exists a prime 𝑝 > 1 that divides 𝑛.
Lemma 0.3 — Existence of co-prime. For every set of integers $p_1, \ldots, p_k > 1$, there exists a number $n$ such that none of $p_1, \ldots, p_k$ divides $n$.
From these two lemmas it follows that there exist infinitely many
primes, since otherwise if we let 𝑝1 , … , 𝑝𝑘 be the set of all primes,
then we would get a contradiction as by combining Lemma 0.2 and
Lemma 0.3 we would get a number 𝑛 with a prime factor outside this
set. We now prove the lemmas:
Proof of Lemma 0.2. Let 𝑛 > 1 be a number, and let 𝑝 be the smallest
divisor of 𝑛 that is larger than 1 (there exists such a number 𝑝 since 𝑛
divides itself). We claim that $p$ is a prime. Indeed, suppose otherwise that there were some $1 < q < p$ that divides $p$. Then since $n = pc$ for some integer $c$ and $p = qc'$ for some integer $c'$, we get that $n = qc'c$, and hence $q$ divides $n$, in contradiction to the choice of $p$ as the smallest divisor of $n$.
■
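As a quick sanity check of the argument above, here is a minimal Python sketch (an illustration added here, not part of the original notes) that, given a finite list of primes, constructs $n = p_1 \cdots p_k + 1$ and verifies that none of them divides it:

```python
from math import prod

def coprime_witness(primes):
    # n = p_1 * ... * p_k + 1 leaves remainder 1 when divided by each p_i,
    # so no prime on the list can divide it. (Note: n itself need not be
    # prime; it merely has some prime factor outside the list.)
    n = prod(primes) + 1
    assert all(n % p == 1 for p in primes)
    return n

print(coprime_witness([2, 3, 5, 7]))  # 211: divisible by none of 2, 3, 5, 7
```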
P
To test your intuition on probability, try to stop here
and prove the lemma on your own.
Figure 2: The event that if we toss three coins $x_0, x_1, x_2 \in \{0,1\}$ then the sum of the $x_i$'s is even has probability $1/2$, since it corresponds to exactly 4 out of the 8 possible strings of length 3.

Proof of Lemma 0.4. We prove the lemma by induction on $n$.
We can also use the intersection (∩) and union (∪) operators to
talk about the probability of both event 𝐴 and event 𝐵 happening, or
the probability of event 𝐴 or event 𝐵 happening. For example, the
probability $p$ that $x$ has an even number of ones and $x_0 = 1$ is the same as $\Pr[A \cap B]$ where $A = \{x \in \{0,1\}^n : \sum_{i=0}^{n-1} x_i = 0 \bmod 2\}$ and $B = \{x \in \{0,1\}^n : x_0 = 1\}$. This probability is equal to $1/4$ for
𝑛 > 1. (It is a great exercise for you to pause here and verify that you
understand why this is the case.)
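For readers who like to check such claims mechanically, here is a small Python enumeration (an illustration added here, not from the original text) that computes $\Pr[A \cap B]$ exactly for small $n$:

```python
from itertools import product

def prob_A_and_B(n):
    # Enumerate all 2^n strings; A = even number of ones, B = x_0 = 1.
    hits = sum(1 for x in product([0, 1], repeat=n)
               if sum(x) % 2 == 0 and x[0] == 1)
    return hits / 2**n

for n in range(2, 6):
    print(n, prob_A_and_B(n))  # prints 0.25 for every n > 1
```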
$$\Pr_{x \sim \{0,1\}^n}\left[\sum_i x_i = 0 \bmod 2 \;\wedge\; x_0 = 1\right].$$
$$\Pr[\overline{A}] = \frac{|\overline{A}|}{2^n} = \frac{2^n - |A|}{2^n} = 1 - \frac{|A|}{2^n} = 1 - \Pr[A].$$

This makes sense: since $\overline{A}$ happens if and only if $A$ does not happen, the probability of $\overline{A}$ should be one minus the probability of $A$.
R
Remark 0.5 — Remember the sample space. While the
above definition might seem very simple and almost
trivial, the human mind seems not to have evolved for
probabilistic reasoning, and it is surprising how often
people can get even the simplest settings of probability
wrong. One way to make sure you don’t get confused
when trying to calculate probability statements is
to always ask yourself the following two questions:
(1) Do I understand what is the sample space that
this probability is taken over?, and (2) Do I under-
stand what is the definition of the event that we are
analyzing?.
For example, suppose that I were to randomize seating
in my course, and then it turned out that students
sitting in row 7 performed better on the final: how
surprising should we find this? If we started out with
the hypothesis that there is something special about
the number 7 and chose it ahead of time, then the
event that we are discussing is the event 𝐴 that stu-
dents sitting in row 7 had better performance on
the final, and we might find it surprising. However, if
we first looked at the results and then chose the row
whose average performance is best, then the event
we are discussing is the event 𝐵 that there exists some
row where the performance is higher than the over-
all average. 𝐵 is a superset of 𝐴, and its probability
(even if there is no correlation between sitting and
performance) can be quite significant.
we don’t want to just analyze whether we won or lost, but also how
much we’ve gained. A (real valued) random variable is simply a way
to associate a number with the result of a probabilistic experiment.
Formally, a random variable is a function 𝑋 ∶ {0, 1}𝑛 → ℝ that maps
every outcome 𝑥 ∈ {0, 1}𝑛 to an element 𝑋(𝑥) ∈ ℝ. For example, the
function $sum : \{0,1\}^n \to \mathbb{R}$ that maps $x$ to the sum of its coordinates (i.e., to $\sum_{i=0}^{n-1} x_i$) is a random variable.
The expectation of a random variable $X$, denoted by $\mathbb{E}[X]$, is the average value that this number takes over all draws from the probabilistic experiment. In other words, the expectation of $X$ is defined as

$$\mathbb{E}[X] = \sum_{x \in \{0,1\}^n} 2^{-n} X(x).$$
Proof.

$$\mathbb{E}[X + Y] = \sum_{x \in \{0,1\}^n} 2^{-n}\left(X(x) + Y(x)\right) = \sum_{x \in \{0,1\}^n} 2^{-n} X(x) + \sum_{x \in \{0,1\}^n} 2^{-n} Y(x) = \mathbb{E}[X] + \mathbb{E}[Y].$$

■
P
If you have not seen discrete probability before, please
go over this argument again until you are sure you
follow it; it is a prototypical simple example of the
type of reasoning we will employ again and again in
this course.
P
Before looking at the proof, try to see why the union
bound makes intuitive sense. We can also prove
it directly from the definition of probabilities and
the cardinality of sets, together with the equation
|𝐴 ∪ 𝐵| ≤ |𝐴| + |𝐵|. Can you see why the latter
equation is true? (See also Fig. 3.)
Proof of Lemma 0.7. For every $x$, $1_{A \cup B}(x) \le 1_A(x) + 1_B(x)$. Hence, $\Pr[A \cup B] = \mathbb{E}[1_{A \cup B}] \le \mathbb{E}[1_A + 1_B] = \mathbb{E}[1_A] + \mathbb{E}[1_B] = \Pr[A] + \Pr[B]$.
■
$\Pr[A \cap B] = \Pr[A] \cdot \Pr[B]$. If $\Pr[A \cap B] > \Pr[A] \cdot \Pr[B]$ then we say that $A$ and $B$ are positively correlated, while if $\Pr[A \cap B] < \Pr[A] \cdot \Pr[B]$ then we say that $A$ and $B$ are negatively correlated (see Fig. 1).

Figure 1: The area of the rectangle is $a \cdot b$, which corresponds to the probability $\frac{a \cdot b}{x^2}$. In the right figure, the area of the triangle $B$ is $\frac{b \cdot x}{2}$, which corresponds to a probability of $\frac{b}{2x}$, but the area of $A \cap B$ is $\frac{b' \cdot a}{2}$ for some $b' < b$. This means that the probability of $A \cap B$ is $\frac{b' \cdot a}{2x^2} < \frac{b}{2x} \cdot \frac{a}{x}$, or in other words $\Pr[A \cap B] < \Pr[A] \cdot \Pr[B]$.

If we consider the above examples on the experiment of choosing $x \in \{0,1\}^3$ then we can see that
$$\Pr[x_0 = 1] = \frac{1}{2}$$

$$\Pr[x_0 + x_1 + x_2 \ge 2] = \Pr[\{011, 101, 110, 111\}] = \frac{4}{8} = \frac{1}{2}$$
but
R
Remark 0.8 — Disjointness vs independence. People sometimes confuse the notions of disjointness and independence, but these are actually quite different. Two
events 𝐴 and 𝐵 are disjoint if 𝐴 ∩ 𝐵 = ∅, which means
that if 𝐴 happens then 𝐵 definitely does not happen.
They are independent if Pr[𝐴 ∩ 𝐵] = Pr[𝐴] Pr[𝐵] which
means that knowing that 𝐴 happens gives us no infor-
mation about whether 𝐵 happened or not. If 𝐴 and 𝐵
have nonzero probability, then being disjoint implies
that they are not independent, since in particular it
means that they are negatively correlated.
More than two events: We can generalize this definition to more than two events. We say that events $A_1, \ldots, A_k$ are mutually independent if knowing that any set of them occurred or didn't occur does not change the probability that an event outside the set occurs. Formally, the condition is that for every subset $I \subseteq [k]$,

$$\Pr[\wedge_{i \in I} A_i] = \prod_{i \in I} \Pr[A_i].$$

For example, if $x \sim \{0,1\}^3$, then the events $\{x_0 = 1\}$, $\{x_1 = 1\}$ and $\{x_2 = 1\}$ are mutually independent. On the other hand, the events $\{x_0 = 1\}$, $\{x_1 = 1\}$ and $\{x_0 + x_1 = 0 \bmod 2\}$ are not mutually independent, even though every pair of these events is independent (can you see why? see also Fig. 5).

Figure 5: Consider the sample space $\{0,1\}^n$ and the events $A, B, C, D, E$ corresponding to $A$: $x_0 = 1$, $B$: $x_1 = 1$, $C$: $x_0 + x_1 + x_2 \ge 2$, $D$: $x_0 + x_1 + x_2 = 0 \bmod 2$ and $E$: $x_0 + x_1 = 0 \bmod 2$. We can see that $A$ and $B$ are independent, $C$ is positively correlated with $A$ and positively correlated with $B$, the three events $A, B, D$ are mutually independent, and while every pair out of $A, B, E$ is independent, the three events $A, B, E$ are not mutually independent, since their intersection has probability $\frac{2}{8} = \frac{1}{4}$ instead of $\frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{8}$.
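To make the last example concrete, here is a short Python check (added for illustration, not part of the original text) that the events $\{x_0=1\}$, $\{x_1=1\}$, $\{x_0+x_1 = 0 \bmod 2\}$ are pairwise independent but not mutually independent:

```python
from itertools import product

space = list(product([0, 1], repeat=3))  # uniform over {0,1}^3

def pr(event):
    return sum(1 for x in space if event(x)) / len(space)

A = lambda x: x[0] == 1
B = lambda x: x[1] == 1
E = lambda x: (x[0] + x[1]) % 2 == 0

# Every pair of events multiplies correctly...
for (P, Q) in [(A, B), (A, E), (B, E)]:
    assert pr(lambda x: P(x) and Q(x)) == pr(P) * pr(Q)

# ...but the triple intersection has probability 1/4, not (1/2)^3 = 1/8.
print(pr(lambda x: A(x) and B(x) and E(x)))  # 0.25
```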
P
The notation in the lemma’s statement is a bit cum-
bersome, but at the end of the day, it simply says that
if 𝑋 and 𝑌 are random variables that depend on two
disjoint sets 𝑆 and 𝑇 of coins (for example, 𝑋 might
be the sum of the first 𝑛/2 coins, and 𝑌 might be the
largest consecutive stretch of zeroes in the second 𝑛/2
coins), then they are independent.
$$\frac{|C|}{2^n} = \frac{|A|\,|B|\,2^{n-k-m}}{2^k\,2^m\,2^{n-k-m}} = \Pr[X = a]\,\Pr[Y = b].$$
$$\sum_{x \text{ s.t. } F(x)=a,\; y \text{ s.t. } G(y)=b} \Pr[X = x]\,\Pr[Y = y] = \left(\sum_{x \text{ s.t. } F(x)=a} \Pr[X = x]\right) \cdot \left(\sum_{y \text{ s.t. } G(y)=b} \Pr[Y = y]\right) = \Pr[F(X) = a]\,\Pr[G(Y) = b].$$
P
We leave proving Lemma 0.10 and Lemma 0.11 as Exercise 0.9 and Exercise 0.10. It is a good idea for you to stop now and do these exercises to make sure you are comfortable with these notions.
Theorem 0.12 — Markov's inequality. If $X$ is a non-negative random variable, then $\Pr[X \ge k\,\mathbb{E}[X]] \le 1/k$.
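As a quick illustration (added here, not from the original notes), the following Python snippet checks Markov's inequality by exact enumeration for the "sum of coins" random variable from before:

```python
from itertools import product

n, k = 8, 2
space = list(product([0, 1], repeat=n))
X = lambda x: sum(x)                         # non-negative random variable
EX = sum(X(x) for x in space) / len(space)   # expectation = n/2
tail = sum(1 for x in space if X(x) >= k * EX) / len(space)
print(tail, "<=", 1 / k)                     # Markov: tail probability <= 1/k
assert tail <= 1 / k
```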
0.6 EXERCISES
Exercise 0.1 Prove that for every finite $S, T$, there are $(|T| + 1)^{|S|}$ partial functions from $S$ to $T$.
■
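One way to build intuition for this count (a hedged illustration, not a full solution): each element of $S$ independently either maps to one of the $|T|$ elements or is left undefined, giving $|T| + 1$ choices. The enumeration below confirms the formula for tiny sets:

```python
from itertools import product

def count_partial_functions(S, T):
    # A partial function assigns to each s in S either an element of T
    # or "undefined" (None): that is |T|+1 choices per element of S.
    return sum(1 for _ in product(list(T) + [None], repeat=len(S)))

S, T = {1, 2, 3}, {'a', 'b'}
print(count_partial_functions(S, T), (len(T) + 1) ** len(S))  # 27 27
```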
d. $F(n) = \sqrt{n}$, $G(n) = 2^{\sqrt{\log n}}$

e. $F(n) = \binom{n}{\lceil 0.2n \rceil}$, $G(n) = 2^{0.1n}$ (where $\binom{n}{k}$ is the number of $k$-sized subsets of a set of size $n$). See footnote for hint.²

² Hint: One way to do this is to use Stirling's approximation for the factorial function.
Exercise 0.4 — Properties of expectation and variance. In the following exercise $X, Y$ denote random variables over some sample space $S$. You can assume that the probability on $S$ is the uniform distribution; every point $s$ is output with probability $1/|S|$. Thus $\mathbb{E}[X] = (1/|S|) \sum_{s \in S} X(s)$. We define the variance and standard deviation of $X$ and $Y$ as above (e.g., $\mathrm{Var}[X] = \mathbb{E}[(X - \mathbb{E}[X])^2]$ and the standard deviation is the square root of the variance). You can reuse your answers to prior questions in the later ones.
4. Give an example of a random variable $X$ such that $\mathbb{E}[X^2] > \mathbb{E}[X]^2$.
3. Prove that if 𝑚 > 1000 ⋅ 𝑛2 then the probability that 𝐻 is one to one
is at least 0.9.
5. Prove that if 𝑚 < 𝑛2 /1000 then the probability that 𝐻 is one to one
is at most 0.1.
0.7 EXERCISES
Exercise 0.6 Suppose that we toss three independent fair coins $a, b, c \in \{0,1\}$. What is the probability that the XOR of $a$, $b$, and $c$ is equal to 1? What is the probability that the AND of these three values is equal to 1? Are these two events independent?
■
Exercise 0.11 — Variance of independent random variables. Prove that if $X_0, \ldots, X_{n-1}$ are independent random variables then $\mathrm{Var}[X_0 + \cdots + X_{n-1}] = \sum_{i=0}^{n-1} \mathrm{Var}[X_i]$.
■
2. Use this and Exercise 0.13 to prove (an approximate version of) the Chernoff bound for the case that $X_0, \ldots, X_{n-1}$ are i.i.d. random variables over $\{0,1\}$ each equaling 0 and 1 with probability $1/2$. That is, prove that for every $\epsilon > 0$, and $X_0, \ldots, X_{n-1}$ as above,

$$\Pr\left[\left|\sum_{i=0}^{n-1} X_i - n/2\right| > \epsilon n\right] < 2^{-0.1 \epsilon^2 n}.$$
■
1. Prove that for every $j_0, \ldots, j_{n-1} \in \mathbb{N}$, if there exists one $i$ such that $j_i$ is odd, then $\mathbb{E}[\prod_{i=0}^{n-1} Y_i^{j_i}] = 0$.

2. Prove that for every $k$, $\mathbb{E}[(\sum_{i=0}^{n-1} Y_i)^k] \le (10kn)^{k/2}$.⁵

⁵ Hint: Bound the number of tuples $j_0, \ldots, j_{n-1}$ such that every $j_i$ is even and $\sum j_i = k$ using the binomial coefficient and the fact that in any such tuple there are at most $k/2$ distinct indices.

3. Prove that for every $\epsilon > 0$, $\Pr[|\sum_i Y_i| \ge \epsilon n] \le 2^{-\epsilon^2 n / (10000 \log 1/\epsilon)}$.⁶

⁶ Hint: Set $k = 2\lceil \epsilon^2 n / 1000 \rceil$ and then show that if the event $|\sum Y_i| \ge \epsilon n$ happens then the random variable $(\sum Y_i)^k$ is a factor of $\epsilon^{-k}$ larger than its expectation.

■
a. 1,000
b. 10,000
c. 100,000
d. 1,000,000
Exercise 0.19 Would the answer to Exercise 0.18 change if the country
had 300,000,000,000 citizens?
■
a. 1,000
b. 10,000
c. 100,000
d. 1,000,000
■
1 Introduction
the house arrest under which she had been for the last 18 years. As
part of this complicated plot, she sent a coded letter to Sir Anthony
Babington.
Mary used what's known as a substitution cipher where each letter is transformed into a different obscure symbol (see Fig. 1.1). At a first look, such a letter might seem rather inscrutable: a meaningless sequence of strange symbols. However, after some thought, one might recognize that these symbols repeat several times and moreover that different symbols repeat with different frequencies. Now it doesn't take a large leap of faith to assume that perhaps each symbol corresponds to a letter of the alphabet.

Figure 1.1: Snippet from encrypted communication between Queen Mary and Sir Babington.
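This frequency heuristic is easy to mechanize. Below is a small Python sketch (an illustration added here; the ciphertext string is a made-up toy) that counts symbol frequencies, which is the first step of breaking any simple substitution cipher:

```python
from collections import Counter

def frequency_table(ciphertext):
    # Count how often each symbol occurs; in a substitution cipher these
    # counts mirror the letter frequencies of the underlying language.
    counts = Counter(c for c in ciphertext if c.isalpha())
    return counts.most_common()

# Toy example: since 'e' is the most common English letter, the most common
# ciphertext symbol is a good first guess for the plaintext 'e'.
print(frequency_table("XJXU XJXUGXUX QZX UJG"))
```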
to decrypt. Like many ciphers in history, this one was also believed by the Germans to be "impossible to break", and even quite late in the war they refused to believe it was broken, despite mounting evidence to that effect. (In fact, some German generals refused to believe it was broken even after the war.) Breaking Enigma was a heroic effort
which was initiated by the Poles and then completed by the British at
Bletchley Park, with Alan Turing (of the Turing machines) playing a
key role. As part of this effort the Brits built arguably the world’s first
large scale mechanical computation devices (though they looked more
similar to washing machines than to iPhones). They were also helped
along the way by some quirks and errors of the German operators. For
example, the fact that their messages ended with “Heil Hitler” turned
out to be quite useful.
Here is one entertaining anecdote: the Enigma machine would never map a letter to itself. In March 1941, Mavis Batey, a cryptanalyst at Bletchley Park, received a very long message that she tried to decrypt. She then noticed a curious property: the message did not contain the letter "L".³ She realized that the probability that no "L"s appeared in the message was too small for this to happen by chance. Hence she surmised that the original message must have been composed only of L's. That is, it must have been the case that the operator, perhaps to test the machine, had simply sent out a message where he repeatedly pressed the letter "L". This observation helped her decode the next message, which helped inform of a planned Italian attack and secure a resounding British victory in what became known as "the Battle of Cape Matapan". Mavis also helped break another Enigma machine. Using the information she provided, the Brits were able to feed the Germans the false information that the main allied invasion would take place in Pas de Calais rather than in Normandy. In the words of General Eisenhower, the intelligence from Bletchley Park was of "priceless value". It made a huge difference for the Allied war effort, thereby shortening World War II and saving millions of lives. See also this interview with Sir Harry Hinsley.

Figure 1.5: In the Enigma mechanical cipher the secret key would be the settings of the rotors and internal wires. As the operator types up their message, the encrypted version appeared in the display area above, and the internal state of the cipher was updated (so typing the same letter twice would generally result in two different letters output). Decrypting follows the same process: if the sender and receiver are using the same key then typing the ciphertext would result in the plaintext appearing in the display.

³ Here is a nice exercise: compute (up to an order of magnitude) the probability that a 50-letter long message composed of random letters will end up not containing the letter "L".
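As a sanity check on the footnote's exercise: if each of the 50 letters is uniform over a 26-letter alphabet, the probability of avoiding "L" entirely is $(25/26)^{50}$. A one-liner (added for illustration) evaluates it:

```python
# Probability that 50 independent uniform letters all avoid one fixed letter.
print((25 / 26) ** 50)  # about 0.14, i.e. on the order of 10^-1
```

So for a 50-letter message the event is not that unlikely; it is only because Batey's message was much longer that the absence of "L" was so suspicious.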
$$m = D_k(E_k(m)).$$
Definition 1.1 — Valid encryption scheme. Let $\ell : \mathbb{N} \to \mathbb{N}$ and $C : \mathbb{N} \to \mathbb{N}$ be two functions mapping natural numbers to natural numbers. A pair of polynomial-time computable functions $(E, D)$ mapping strings to strings is a valid private key encryption scheme (or encryption scheme for short) with plaintext length function $\ell(\cdot)$ and ciphertext length function $C(\cdot)$ if for every $n \in \mathbb{N}$, $k \in \{0,1\}^n$ and $m \in \{0,1\}^{\ell(n)}$, $|E_k(m)| = C(n)$ and

$$D(k, E(k, m)) = m. \qquad (1.1)$$
We will often write the first input (i.e., the key) to the encryp-
tion and decryption as a subscript and so can write (1.1) also as
𝐷𝑘 (𝐸𝑘 (𝑚)) = 𝑚.
The validity condition implies that for any fixed 𝑘, the map 𝑚 ↦
𝐸𝑘 (𝑚) is one to one (can you see why?) and hence the ciphertext
length is always at least the plaintext length. Thus we typically focus
on the plaintext length as the quantity to optimize in an encryption
scheme. The larger ℓ(𝑛) is, the better the scheme, since it means we
need a shorter secret key to protect messages of the same length.
R
Remark 1.2 — A note on notation, and comparison with Katz-Lindell, Boneh-Shoup, and other texts. We will always use $i, j, \ell, n$ to denote natural numbers.
The number 𝑛 will often denote the length of our
secret key. The length of the key (or another closely
related number) is often known as the security parame-
ter in the literature. Katz-Lindell also uses 𝑛 to denote
this parameter, while Boneh-Shoup and Rosulek use
𝜆 for it. (Some texts also use the Greek letter 𝜅 for the
same parameter.) We chose to denote the security
parameter by $n$ so as to correspond with the standard
algorithmic notation for input length (as in 𝑂(𝑛) or
𝑂(𝑛2 ) time algorithms).
We often use ℓ to denote the length of the message,
sometimes also known as “block length” since longer
messages are simply chopped into “blocks” of length ℓ
and also appropriately padded.
We will use 𝑘 to denote the secret key, 𝑚 to denote
the secret plaintext message, and 𝑐 to denote the en-
crypted ciphertext. Note that 𝑘, 𝑚, 𝑐 are not numbers
but rather bit strings of lengths 𝑛, ℓ(𝑛), 𝐶(𝑛) respec-
tively. We will also sometimes use 𝑥 and 𝑦 to denote
strings, and so sometimes use 𝑥 as the plaintext and
𝑦 as the ciphertext. In general, while we try to reserve
variable names for particular purposes, cryptography
uses so many concepts that it would sometimes need
to “reuse” the same letter for different purposes.
For simplicity, we denote the space of possible keys as
{0, 1}𝑛 and the space of possible messages as {0, 1}ℓ
for ℓ = ℓ(𝑛). Boneh-Shoup uses a more general no-
tation of 𝒦 for the space of all possible keys and ℳ
for the space of all possible messages. This does not
make much difference since we can represent every
discrete object such as a key or message as a binary
string. (One difference is that in principle the space
of all possible messages could include messages of
unbounded length, though in such a case what is done
in both theory and practice is to break these up into
finite-size blocks and encrypt one block at a time.)
• For every fixed string 𝑥 ∈ {0, 1}𝑛 , if you toss a coin 𝑛 times, the
probability that the heads/tails pattern will be exactly 𝑥 is 2−𝑛 .
drive, mouse or keyboard, they don’t have access to many of the en-
tropy sources that desktops have. Coupled with some good old fash-
ioned ignorance of cryptography and software bugs, this led to many
keys that are downright trivial to break, see this blog post and this
web page for more details.
After the entropy is collected and then “purified” or “extracted” to
a uniformly random string that is, say, a few hundred bits long, we of-
ten need to “expand” it into a longer string that is also uniform (or at
least looks like that for all practical purposes). We will discuss how to
go about that in the next lecture. This step has its weaknesses too, and
in particular the Snowden documents, combined with observations of
Shumow and Ferguson, strongly suggest that the NSA has deliberately
inserted a trapdoor in one of the pseudorandom generators published
by the National Institute of Standards and Technologies (NIST). Fortu-
nately, this generator wasn’t widely adopted, but apparently the NSA
did pay 10 million dollars to RSA security so the latter would make
this generator their default option in their products.
finished examining all the possibilities.⁵ One can understand why the Germans thought it was impossible to break. (Note that despite the number of possibilities being so enormous, such a key can still be easily specified and shared between Alice and Bob by writing down 113 digits on a piece of paper.) Ray Miller of the NSA had calculated that, in the way the Germans used the machine, the number of possibilities was "only" $10^{23}$, but this is still extremely difficult to pull off even today, and many orders of magnitude above the computational powers of the WW-II era. Thus clearly, it is sometimes possible to break an encryption without trying all possibilities. A corollary is that having a huge number of key combinations does not guarantee security, as an attacker might find a shortcut (as the allies did for Enigma) and recover the key without trying all options.

⁵ There are about $10^{68}$ atoms in the galaxy, so even if we assumed that each one of those atoms was a computer that can process, say, $10^{21}$ decryption attempts per second (as the speed of light is $10^9$ meters per second and the diameter of an atom is about $10^{-12}$ meters), then it would still take $10^{113-89} = 10^{24}$ seconds, which is about $10^{17}$ years, to exhaust all possibilities, while the sun is estimated to burn out in about 5 billion years.
Since it is possible to recover the key with some tiny probability
(e.g. by guessing it at random), perhaps one way to define security
of an encryption scheme is that an attacker can never recover the key
with probability significantly higher than that. Here is an attempt at
such a definition:
Definition 1.3 — Security of encryption: first attempt. An encryption scheme $(E, D)$ is $n$-secure if no matter what method Eve employs, the probability that she can recover the true key $k$ from the ciphertext $c$ is at most $2^{-n}$.
P
When you see a mathematical definition that attempts
to model some real-life phenomenon such as security,
you should pause and ask yourself:
You might wonder if Definition 1.3 is not too strong. After all how
are we going to ever prove that Eve cannot recover the secret key no
matter what she does? Edgar Allan Poe would say that there can al-
ways be a method that we overlooked. However, in fact this definition
is too weak! Consider the following encryption: the secret key 𝑘 is cho-
sen at random in {0, 1}𝑛 but our encryption scheme simply ignores it
The math behind the above argument is very simple, yet I urge
you to read and re-read the last two paragraphs until you are sure
that you completely understand why this encryption is in fact secure
according to the above definition. This is a “toy example” of the kind
of reasoning that we will be employing constantly throughout this
course, and you want to make sure that you follow it.
So, Lemma 1.4 is true, but one might question its meaning. Clearly
this silly example was not what we meant when stating this defini-
tion. However, as mentioned above, we are not willing to ignore even
silly examples and must amend the definition to rule them out. One
obvious objection is that we don't care about hiding the key; it is the
message that we are trying to keep secret. This suggests the next at-
tempt:
Definition 1.5 — Security of encryption: second attempt. An encryption scheme $(E, D)$ is $n$-secure if for every message $m$, no matter what method Eve employs, the probability that she can recover $m$ from the ciphertext $c = E_k(m)$ is at most $2^{-n}$.
Now this seems like it captures our intended meaning. But remem-
ber that we are being anal, and truly insist that the definition holds
as stated, namely that for every plaintext message $m$ and every function $Eve : \{0,1\}^C \to \{0,1\}^\ell$, the probability over the choice of $k$ that $Eve(E_k(m)) = m$ is at most $2^{-n}$. But now we see that this is clearly
impossible. After all, this is supposed to work for every message 𝑚
and every function 𝐸𝑣𝑒, but clearly if 𝑚 is the all-zeroes message 0ℓ
and 𝐸𝑣𝑒 is the function that ignores its input and simply outputs 0ℓ ,
then it will hold that 𝐸𝑣𝑒(𝐸𝑘 (𝑚)) = 𝑚 with probability one.
So, if before the definition was too weak, the new definition is too
strong and is impossible to achieve. The problem is that of course
we could guess a fixed message with probability one, so perhaps we
could try to consider a definition with a random message. That is:
Definition 1.6 — Security of encryption: third attempt. An encryption scheme $(E, D)$ is $n$-secure if no matter what method Eve employs, if $m$ is chosen at random from $\{0,1\}^\ell$, the probability that she can recover $m$ from the ciphertext $c = E_k(m)$ is at most $2^{-n}$.
and

$$\mathbb{E}_{m_1 \leftarrow_R M} \Pr[Eve(E_k(m_0)) = m_1] \le 1/|M|$$

(In words, for random $m_1$, the probability that Eve outputs $m_1$ given an encryption of $m_1$ is higher than the probability that Eve outputs $m_1$ given an encryption of $m_0$.)
In particular, by the averaging argument (the argument that if the
average of numbers is larger than 𝛼 then one of the numbers is larger
than 𝛼) there must exist 𝑚1 ∈ 𝑀 satisfying
(Can you see why? This is worthwhile stopping and reading again.)
But this can be turned into an attacker $Eve'$ such that for $b \leftarrow_R \{0,1\}$, the probability that $Eve'(E_k(m_b)) = m_b$ is larger than $1/2$.
Indeed, we can define 𝐸𝑣𝑒′ (𝑐) to output 𝑚1 if 𝐸𝑣𝑒(𝑐) = 𝑚1 and
otherwise output a random message in {𝑚0 , 𝑚1 }. The probability
that 𝐸𝑣𝑒′ (𝑦) equals 𝑚1 is higher when 𝑐 = 𝐸𝑘 (𝑚1 ) than when 𝑐 =
𝐸𝑘 (𝑚0 ), and since 𝐸𝑣𝑒′ outputs either 𝑚0 or 𝑚1 , this means that the
probability that 𝐸𝑣𝑒′ (𝐸𝑘 (𝑚𝑏 )) = 𝑚𝑏 is larger than 1/2. (Can you see
why?)
■
P
The proof of Theorem 1.8 is not trivial, and is worth
reading again and making sure you understand it.
An excellent exercise, which I urge you to pause and
do now is to prove the following: (𝐸, 𝐷) is perfectly
secret if for every pair of plaintexts $m, m' \in \{0,1\}^\ell$, the two
random variables {𝐸𝑘 (𝑚)} and {𝐸𝑘′ (𝑚′ )} (for ran-
domly chosen keys 𝑘 and 𝑘′ ) have precisely the same
distribution.
Solution:
We only sketch the proof. The condition in the exercise is equiv-
alent to perfect secrecy with |𝑀 | = 2. For every 𝑀 = {𝑚, 𝑚′ },
if 𝑌 and 𝑌 ′ are identical then clearly for every 𝐸𝑣𝑒 and possible
output 𝑦, Pr[𝐸𝑣𝑒(𝐸𝑘 (𝑚)) = 𝑦] = Pr[𝐸𝑣𝑒(𝐸𝑘 (𝑚′ )) = 𝑦] since these
correspond to applying $Eve$ to the same distribution $Y = Y'$. On the
other hand, if 𝑌 and 𝑌 ′ are not identical then there must exist some
ciphertext 𝑐∗ such that Pr[𝑌 = 𝑐∗ ] > Pr[𝑌 ′ = 𝑐∗ ] (or vice versa).
The adversary that on input 𝑐 guesses that 𝑐 is an encryption of 𝑚
if 𝑐 = 𝑐∗ and otherwise tosses a coin will have some advantage over
1/2 in distinguishing an encryption of 𝑚 from an encryption of 𝑚′ .
■
Our scheme is the one-time pad also known as the “Vernam Ci-
pher”, see Fig. 1.9. The encryption is exceedingly simple: to encrypt
a message 𝑚 ∈ {0, 1}𝑛 with a key 𝑘 ∈ {0, 1}𝑛 we simply output
𝑚 ⊕ 𝑘 where ⊕ is the bitwise XOR operation that outputs the string
corresponding to XORing each coordinate of 𝑚 and 𝑘.
Proof of Theorem 1.10. For two binary strings 𝑎 and 𝑏 of the same
length 𝑛, we define 𝑎 ⊕ 𝑏 to be the string 𝑐 ∈ {0, 1}𝑛 such that
𝑐𝑖 = 𝑎𝑖 + 𝑏𝑖 mod 2 for every 𝑖 ∈ [𝑛]. The encryption scheme
(𝐸, 𝐷) is defined as follows: 𝐸𝑘 (𝑚) = 𝑚 ⊕ 𝑘 and 𝐷𝑘 (𝑐) = 𝑐 ⊕ 𝑘.
By the associative law of addition (which works also modulo two),
𝐷𝑘 (𝐸𝑘 (𝑚)) = (𝑚 ⊕ 𝑘) ⊕ 𝑘 = 𝑚 ⊕ (𝑘 ⊕ 𝑘) = 𝑚 ⊕ 0𝑛 = 𝑚, using the fact
that for every bit 𝜎 ∈ {0, 1}, 𝜎 + 𝜎 mod 2 = 0 and 𝜎 + 0 = 𝜎 mod 2.
Hence $(E, D)$ forms a valid encryption scheme.
To analyze the perfect secrecy property, we claim that for every
𝑚 ∈ {0, 1}𝑛 , the distribution 𝑌𝑚 = 𝐸𝑘 (𝑚) where 𝑘 ←𝑅 {0, 1}𝑛 is
simply the uniform distribution over {0, 1}𝑛 , and hence in particular
the distributions 𝑌𝑚 and 𝑌𝑚′ are identical for every 𝑚, 𝑚′ ∈ {0, 1}𝑛 .
Indeed, for every particular 𝑦 ∈ {0, 1}𝑛 , the value 𝑦 is output by 𝑌𝑚 if
and only if 𝑦 = 𝑚 ⊕ 𝑘 which holds if and only if 𝑘 = 𝑚 ⊕ 𝑦. Since 𝑘 is
chosen uniformly at random in {0, 1}𝑛 , the probability that 𝑘 happens
to equal 𝑚 ⊕ 𝑦 is exactly 2−𝑛 , which means that every string 𝑦 is output
by 𝑌𝑚 with probability 2−𝑛 .
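The one-time pad is short enough to implement in a few lines. Here is a minimal Python sketch (added for illustration) of $E_k(m) = m \oplus k$ and $D_k(c) = c \oplus k$, together with a brute-force check that for a fixed $m$ the ciphertext is uniform over $\{0,1\}^n$ when $k$ is:

```python
import itertools, secrets
from collections import Counter

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

E = D = xor  # one-time pad: encryption and decryption are both XOR with k

n = 3
m = [1, 0, 1]
k = [secrets.randbits(1) for _ in range(n)]
assert D(k, E(k, m)) == m  # validity: D_k(E_k(m)) = m

# Perfect secrecy check: ranging over all 2^n keys, E_k(m) hits every
# ciphertext exactly once, i.e. the ciphertext is uniform for any fixed m.
counts = Counter(tuple(E(list(key), m))
                 for key in itertools.product([0, 1], repeat=n))
assert all(c == 1 for c in counts.values())
```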
P
The argument above is quite simple but is worth read-
ing again. To understand why the one-time pad is
perfectly secret, it is useful to envision it as a bipartite
graph as we’ve done in Fig. 1.8. (In fact the encryp-
tion scheme of Fig. 1.8 is precisely the one-time pad
for 𝑛 = 2.) For every 𝑛, the one-time pad encryp-
tion scheme corresponds to a bipartite graph with
2𝑛 vertices on the “left side” corresponding to the
plaintexts in {0, 1}𝑛 and 2𝑛 vertices on the “right side”
corresponding to the ciphertexts {0, 1}𝑛 . For every
𝑥 ∈ {0, 1}𝑛 and 𝑘 ∈ {0, 1}𝑛 , we connect 𝑥 to the vertex
𝑦 = 𝐸𝑘 (𝑥) with an edge that we label with 𝑘. One can
see that this is the complete bipartite graph, where
every vertex on the left is connected to all vertices on
the right. In particular this means that for every left
vertex 𝑥, the distribution on the ciphertexts obtained
by taking a random 𝑘 ∈ {0, 1}𝑛 and going to the
neighbor of 𝑥 on the edge labeled 𝑘 is the uniform dis-
tribution over {0, 1}𝑛 . This ensures the perfect secrecy
condition.
used cryptosystems such as AES-128 have a short key of 128 bits (i.e.,
16 bytes) that can be used to protect terabytes or more of communica-
tion! Imagine that we all needed to use the one time pad. If that was
the case, then if you had to communicate with 𝑚 people, you would
have to maintain (securely!) 𝑚 huge files that are each as long as the
length of the maximum total communication you expect with that per-
son. Imagine that every time you opened an account with Amazon,
Google, or any other service, they would need to send you in the mail
(ideally with a secure courier) a DVD full of random numbers, and
every time you suspected a virus, you’d need to ask all these services
for a fresh DVD. This doesn’t sound so appealing.
This is not just a theoretical issue. The Soviets used the one-time pad for their confidential communication from before the 1940s.
In fact, even before Shannon’s work, the U.S. intelligence already
knew in 1941 that the one-time pad is in principle “unbreakable” (see
page 32 in the Venona document). However, it turned out that the
hassle of manufacturing so many keys for all the communication took
its toll on the Soviets and they ended up reusing the same keys for
more than one message. They did try to use them for completely dif-
ferent receivers in the (false) hope that this wouldn’t be detected. The
Venona Project of the U.S. Army was founded in February 1943 by
Gene Grabeel (see Fig. 1.11), a former home economics teacher from
Madison Heights, Virginia and Lt. Leonard Zubko. In October 1943,
they had their breakthrough when it was discovered that the Russians
were reusing their keys. In the 37 years of its existence, the project resulted in a treasure chest of intelligence, exposing hundreds of KGB agents and Russian spies in the U.S. and other countries, including Julius Rosenberg, Harry Gold, Klaus Fuchs, Alger Hiss, Harry Dexter White and many others.

Figure 1.11: Gene Grabeel, who founded the U.S. Russian SigInt program on 1 Feb 1943. Photo taken in 1942; see Page 7 in the Venona historical study.

Unfortunately it turns out that such long keys are necessary for perfect secrecy:
Proof Idea: The idea behind the proof is illustrated in Fig. 1.12. We define a graph between the plaintexts and ciphertexts, where we put an edge between plaintext $x$ and ciphertext $y$ if there is some key $k$ such that $y = E_k(x)$. The degree of this graph is at most the number of potential keys. The fact that the degree is smaller than the number of plaintexts (and hence of ciphertexts) implies that there would be two plaintexts $x$ and $x'$ with different sets of neighbors, and hence the distribution

Figure 1.12: An encryption scheme where the number of keys is smaller than the number of plaintexts corresponds to a bipartite graph where the degree is smaller than the number of vertices on the left side. Together with the validity condition this implies that there will be two left vertices $x, x'$ with non-identical neighborhoods, and hence the scheme does not satisfy perfect secrecy.
R
Remark 1.12 — Adding probability into the picture. There
is a sense in which both our secrecy and our impossi-
bility results might not be fully convincing, and that
is that we did not explicitly consider algorithms that use randomness. For example, maybe Eve can break
a perfectly secret encryption if she is not modeled
as a deterministic function 𝐸𝑣𝑒 ∶ {0, 1}𝑜 → {0, 1}ℓ
but rather a probabilistic process. Similarly, maybe the
encryption and decryption functions could be prob-
abilistic processes as well. It turns out that none of
those matter.
For the former, note that a probabilistic process can
be thought of as a distribution over functions, in the
sense that we have a collection of functions 𝑓1 , ..., 𝑓𝑁
mapping {0, 1}𝑜 to {0, 1}ℓ , and some probabilities
Theorem 1.13 — Short keys imply high probability attack. Let $(E, D)$ be an encryption scheme with $\ell(n) = n + t$. Then there is a function $Eve$ and a pair of messages $x_0, x_1$ such that
We show this by arguing that this bound holds for every fixed 𝑘,
when we take the probability over 𝑥, and so in particular it holds also
for random 𝑘. Indeed, for every fixed 𝑘, the map 𝑥 ↦ 𝐸𝑘 (𝑥) is a one-
to-one map, and so the distribution of 𝐸𝑘 (𝑥) for random 𝑥 ∈ {0, 1}ℓ is
uniform over some set 𝑇𝑘 of size 2𝑛+𝑡 . For every 𝑘, the probability over
𝑥 that 𝐸𝑘 (𝑥) ∈ 𝑆0 is equal to
$$\frac{|T_k \cap S_0|}{|T_k|} \le \frac{|S_0|}{|T_k|} \le \frac{2^n}{2^{n+t}} = 2^{-t}$$

$$\frac{1}{2} \cdot 1 + \frac{1}{2} \cdot (1 - 2^{-t}) = 1 - 2^{-t-1}.$$
The program Distinguish will break any 128-bit key and 129-bit
message encryption Encrypt, in the sense that there exist a pair of
messages 𝑚0 , 𝑚1 such that Distinguish(Encrypt(𝑘, 𝑚𝑏 ), 𝑚0 , 𝑚1 ) =
𝑚𝑏 with probability at least 0.75 over 𝑘 ←𝑅 {0, 1}𝑛 and 𝑏 ←𝑅 {0, 1}.
Definition 2.1 — Computational secrecy (first attempt). An encryption scheme $(E, D)$ has $t$ bits of computational secrecy if for every two distinct plaintexts $\{m_0, m_1\} \subseteq \{0,1\}^\ell$ and every strategy of Eve using at most $2^t$ computational steps, if we choose at random $b \in \{0,1\}$ and a random key $k \in \{0,1\}^n$, then the probability that Eve guesses $m_b$ after seeing $E_k(m_b)$ is at most $1/2$.
P
Before reading further, you might want to stop and think if you can prove that there is no, say, encryption scheme with $\sqrt{n}$ bits of computational security satisfying Definition 2.1 with $\ell = n + 1$ and where the time to compute the encryption is polynomial.
P
Before reading the proof, try to again review the
proof of Theorem 1.8, and see if you can generalize it
yourself to the computational setting.
and

• An adversary $Eve : \{0,1\}^o \to \{0,1\}^\ell$ such that

$$\Pr_{m \leftarrow_R M,\; k \leftarrow_R \{0,1\}^n}[Eve(E_k(m)) = m] > 1/|M|.$$
This will imply that if 𝐸𝑣𝑒 ran in polynomial time and had poly-
nomial advantage over 1/|𝑀 | in guessing a plaintext chosen from 𝑀 ,
then 𝐸𝑣𝑒′ would run in polynomial time and have polynomial advan-
tage over 1/2 in guessing a plaintext chosen from {𝑚0 , 𝑚1 }.
The first item can be shown by simply doing the same proof more carefully, keeping track of how the advantage over $\frac{1}{|M|}$ for $Eve$ translates into an advantage over $\frac{1}{2}$ for $Eve'$. As the world's most annoying
• The honest parties (the parties running the encryption and decryp-
tion algorithms) are extremely efficient, something like 100-1000
cycles per byte of data processed. In theory terms we would want
them to be using $O(n)$ or at worst $O(n^2)$ time algorithms with not-
too-big hidden constants.
• Polynomial running time of the form $d \cdot n^c$ for some constants $d, c > 0$ (or $poly(n) = n^{O(1)}$ for short), which we will consider as efficient.

• Exponential running time of the form $2^{d \cdot n^\epsilon}$ for some constants $d, \epsilon > 0$ (or $2^{n^{\Omega(1)}}$ for short), which we will consider as infeasible.³

³ Some texts reserve the term exponential to functions of the form $2^{\epsilon n}$ for some $\epsilon > 0$ and call a function such as, say, $2^{\sqrt{n}}$ subexponential. However, we will generally not make this distinction in this course.
2. Let 𝜇 ∶ ℕ → [0, ∞). Prove that 𝜇 is negligible if and only if for every
constant 𝑐, lim𝑛→∞ 𝑛𝑐 𝜇(𝑛) = 0.
■
R
Remark 2.5 — Asymptotic analysis. The above defini-
tions could be confusing if you haven’t encountered
asymptotic analysis before. Reading the beginning of
Chapter 3 (pages 43-51) in the KL book, as well as the
mathematical background lecture in my intro to TCS
notes can be extremely useful. As a rule of thumb, if every time you see the word "polynomial" you imagine the function $n^{10}$ and every time you see the word "negligible" you imagine the function $2^{-\sqrt{n}}$, then you will get the right intuition.
What you need to remember is that negligible is much
smaller than any inverse polynomial, while polynomi-
als are closed under multiplication, and so we have the
“equations”
say, $T^2$, then essentially every reasonable definition gives the same answer.⁶ Formally, we can use the notions of Turing machines, Boolean circuits, or straightline programs to define complexity. For concreteness, let's define that a function $F : \{0,1\}^n \to \{0,1\}^m$ has complexity at most $T$ if there is a Boolean circuit that computes $F$ using at most $T$ Boolean gates (say AND/OR/NOT or NAND; alternatively you can choose your favorite universal gate set). We will often also consider probabilistic functions, in which case we allow the circuit a RAND gate that outputs a single random bit (though this in general does not give extra power). The fact that we only care about asymptotics means you don't really need to think of gates when arguing in cryptography. However, it is comforting to know that this notion has a precise mathematical formulation.

⁶ With some caveats that need to be added due to quantum computers: we'll get to those later in the course, though they won't change most of our theory. See also this discussion in my intro TCS textbook and this presentation of Aaronson on the "extended Church-Turing thesis".
R
Remark 2.7 — Computing beyond functions. Later on
in the course, both our cryptographic schemes and
the adversaries will extend beyond simple functions
that map an input to an output, and we will consider
interactive algorithms that exchange messages with one
another. Such an algorithm can be implemented us-
ing circuits or Turing machines that take as input the
prior state and the history of messages up to a certain
point in the interaction, and output the next message
in the interaction. The number of operations used in
such a strategy is the total number of gates used in
computing all the messages.
Proof. We just sketch the proof, as this is not the focus of this course.
If 𝑃 = NP then whenever we have a loop that searches through some
domain to find some string that satisfies a particular property (like the
loop in the Distinguish subroutine above that searches over all keys)
then this loop can be sped up exponentially.
■
• Intuition: If the cipher conjecture is false then it means that for every
possible cipher we can make the exponential time attack described
above become efficient. It seems “too good to be true” in a similar
way that the assumption that P=NP seems too good to be true.
• Concrete candidates: As we will see in the next lecture, there are sev-
eral concrete candidate ciphers using keys shorter than messages
for which despite tons of effort, no one knows how to break them.
Some of them are widely used and hence governments and other
benign or not so benign organizations have every reason to invest
huge resources in trying to break them. Despite that as far as we
know (and we know a little more after Edward Snowden’s reve-
lations) there is no significant break known for the most popular
ciphers. Moreover, there are other ciphers that can be based on
Definition 2.9 — Computational Indistinguishability (concrete definition). Let $X$ and $Y$ be two distributions over $\{0,1\}^m$. We say that $X$ and $Y$ are $(T, \epsilon)$-computationally indistinguishable, denoted by $X \approx_{T,\epsilon} Y$, if for every function $D : \{0,1\}^m \to \{0,1\}$ computable with at most $T$ operations,

$$|\Pr[D(X) = 1] - \Pr[D(Y) = 1]| \le \epsilon.$$
Solved Exercise 2.1 — Computational Indistinguishability game. Prove that for every $X, Y$ and $T, \epsilon$ as above, $X \approx_{T,\epsilon} Y$ if and only if for every $\le T$-operation computable $Eve$, the probability that $Eve$ wins in the following game is at most $1/2 + \epsilon/2$:

1. We pick $b \leftarrow_R \{0,1\}$.

2. If $b = 0$, we let $w \leftarrow_R X$. If $b = 1$, we let $w \leftarrow_R Y$.

3. $Eve$ is given $w$ and outputs $b' \in \{0,1\}$.

4. $Eve$ wins if $b = b'$.
P
Working out this exercise on your own is a great way
to get comfortable with computational indistinguisha-
bility, which is a fundamental notion.
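To see the game in action, here is a small Python simulation (an illustration with a toy distinguisher, not part of the original text): $X$ is uniform on $\{0,1\}^m$, $Y$ is biased, and Eve guesses based on the majority bit. Its empirical win rate sits near $1/2 + (p_Y - p_X)/2$, matching the solution below:

```python
import random

m, trials = 16, 100_000

def X():  # uniform bits
    return [random.randint(0, 1) for _ in range(m)]

def Y():  # biased bits: 1 with probability 0.6
    return [1 if random.random() < 0.6 else 0 for _ in range(m)]

def eve(w):  # toy distinguisher: guess "Y" iff the majority of bits are 1
    return 1 if sum(w) > m // 2 else 0

wins = 0
for _ in range(trials):
    b = random.randint(0, 1)
    w = Y() if b else X()
    wins += (eve(w) == b)
print(wins / trials)  # noticeably above 1/2: X and Y are distinguishable
```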
Solution:
For every function 𝐸𝑣𝑒 ∶ {0, 1}𝑚 → {0, 1}, let 𝑝𝑋 = Pr[𝐸𝑣𝑒(𝑋) =
1] and 𝑝𝑌 = Pr[𝐸𝑣𝑒(𝑌 ) = 1].
Then the probability that 𝐸𝑣𝑒 wins the game is:
$$\frac{1}{2} - \frac{1}{2} p_X + \frac{1}{2} p_Y = \frac{1}{2} + \frac{1}{2}(p_Y - p_X)$$
We see that $Eve$ wins the game with probability $1/2 + \epsilon/2$ if and only if

$$\Pr[Eve(Y) = 1] - \Pr[Eve(X) = 1] = \epsilon.$$

Since $\Pr[Eve(Y) = 1] - \Pr[Eve(X) = 1] \le |\Pr[Eve(X) = 1] - \Pr[Eve(Y) = 1]|$, this already shows that if $X$ and $Y$ are $(T, \epsilon)$-indistinguishable then $Eve$ will win the game with probability at most $1/2 + \epsilon/2$.
For the other direction, assume that $X$ and $Y$ are not computationally indistinguishable, and let $Eve$ be a $\le T$-operation function such that
We say that {𝑋𝑛 }𝑛∈ℕ and {𝑌𝑛 }𝑛∈ℕ are computationally indistin-
guishable, denoted by {𝑋𝑛 }𝑛∈ℕ ≈ {𝑌𝑛 }𝑛∈ℕ , if for every polynomial
𝑝 ∶ ℕ → ℕ and sufficiently large 𝑛, 𝑋𝑛 ≈𝑝(𝑛),1/𝑝(𝑛) 𝑌𝑛 .
1. We pick $b \leftarrow_R \{0,1\}$.

2. If $b = 0$, we let $w \leftarrow_R X_n$. If $b = 1$, we let $w \leftarrow_R Y_n$.

3. $Eve$ is given $w$ and outputs $b' \in \{0,1\}$.

4. $Eve$ wins if $b = b'$.
Theorem 2.11 — Computational Indistinguishability phrasing of security. Let $(E, D)$ be a valid encryption scheme. Then $(E, D)$ is computationally secret if and only if for every two messages $m_0, m_1 \in \{0,1\}^\ell$,

$$\{E_k(m_0)\} \approx \{E_k(m_1)\}.$$
Working out the proof is an excellent way to make sure you under-
stand both the definition of computational secrecy and computational
indistinguishability, and hence we leave it as an exercise.
One intuition for computational indistinguishability is that it is
related to some notion of distance. If two distributions are computa-
tionally indistinguishable, then we can think of them as “very close”
Write

$$\Pr[Eve(X_1) = 1] - \Pr[Eve(X_m) = 1] = \sum_{i=1}^{m-1} \left(\Pr[Eve(X_i) = 1] - \Pr[Eve(X_{i+1}) = 1]\right).$$

Thus,

$$\sum_{i=1}^{m-1} \left|\Pr[Eve(X_i) = 1] - \Pr[Eve(X_{i+1}) = 1]\right| > (m-1)\epsilon$$

In other words
$$\left|\mathbb{E}_{X_1,\ldots,X_{i-1},Y_i,\ldots,Y_\ell}\left[Eve'(X_1, \ldots, X_{i-1}, Y_i, \ldots, Y_\ell)\right] - \mathbb{E}_{X_1,\ldots,X_i,Y_{i+1},\ldots,Y_\ell}\left[Eve'(X_1, \ldots, X_i, Y_{i+1}, \ldots, Y_\ell)\right]\right| > \epsilon.$$

$$\mathbb{E}_{X_1,\ldots,X_{i-1},X_i,Y_i,Y_{i+1},\ldots,Y_\ell}\left[Eve'(X_1, \ldots, X_{i-1}, Y_i, Y_{i+1}, \ldots, Y_\ell) - Eve'(X_1, \ldots, X_{i-1}, X_i, Y_{i+1}, \ldots, Y_\ell)\right]$$

By the averaging principle⁹ this means that there exist some values $x_1, \ldots, x_{i-1}, y_{i+1}, \ldots, y_\ell$ such that

$$\left|\mathbb{E}_{X_i,Y_i}\left[Eve'(x_1, \ldots, x_{i-1}, Y_i, y_{i+1}, \ldots, y_\ell) - Eve'(x_1, \ldots, x_{i-1}, X_i, y_{i+1}, \ldots, y_\ell)\right]\right| > \epsilon$$

⁹ This is the principle that if the average grade in an exam was at least $\alpha$ then someone must have gotten at least $\alpha$, or in other words that if a real-valued random variable $Z$ satisfies $\mathbb{E}[Z] \ge \alpha$ then $\Pr[Z \ge \alpha] > 0$.
Randomized encryption scheme. We can now prove the full length exten-
sion theorem. Before doing so, we will need to generalize the notion
of an encryption scheme to allow a randomized encryption scheme. That
is, we will consider encryption schemes where the encryption algo-
rithm can “toss coins” in its computation. There is a crucial difference
between key material and such "ad hoc" (sometimes also known as
“ephemeral”) randomness. Keys need to be not only chosen at ran-
dom, but also shared in advance between the sender and receiver, and
stored securely throughout their lifetime. The “coin tosses” used by
a randomized encryption scheme are generated “on the fly” and are
not known to the receiver, nor do they need to be stored long term by
the sender. So, allowing such randomized encryption does not make
a difference for most applications of encryption schemes. In fact, as
we will see later in this course, randomized encryption is necessary for
security against more sophisticated attacks such as chosen plaintext
and chosen ciphertext attacks, as well as for obtaining secure public key
encryptions. We will use the notation 𝐸𝑘 (𝑚; 𝑟) to denote the output of
the encryption algorithm on key 𝑘, message 𝑚 and using internal ran-
domness 𝑟. We often suppress the notation for the randomness, and
hence use 𝐸𝑘 (𝑚) to denote the random variable obtained by sampling
a random 𝑟 and outputting 𝐸𝑘 (𝑚; 𝑟).
We can now show that given an encryption scheme with messages
one bit longer than the key, we can obtain a (randomized) encryption
scheme with arbitrarily long messages:
P
This is perhaps our first example of a non trivial cryp-
tographic theorem, and the blueprint for this proof
will be one that we will follow time and again during
this course. Please make sure you read this proof
carefully and follow the argument.
$$E_{U_n}(m) \approx \hat{E}_{U_n}(m). \qquad (2.1)$$

Note that $\hat{E}$ is not a valid encryption scheme since it's not at all clear there is a decryption algorithm for it. It is just a hypothetical tool we use for the proof. Since both $E$ and $\hat{E}$ are randomized encryption schemes (with $E$ using $(t-1)n$ bits of randomness for the ephemeral keys $k_1, \ldots, k_{t-1}$ and $\hat{E}$ using $(2t-1)n$ bits of randomness for the ephemeral keys $k_1, \ldots, k_t, k'_2, \ldots, k'_t$), we can also write (2.1) as

$$E_{U_n}(m; U'_{(t-1)n}) \approx \hat{E}_{U_n}(m; U'_{(2t-1)n})$$
$$\left|\mathbb{E}_{k_{j-1}}\left[Eve'(\alpha, E'_{k_{j-1}}(k_j, m_j), \beta) - Eve'(\alpha, E'_{k_{j-1}}(k'_j, m_j), \beta)\right]\right| \ge \epsilon \qquad (**)$$
• If you have not taken such a course, you might simply take it on
faith that it is possible to model what it means for an algorithm to
be able to map an input 𝑥 into an output 𝑓(𝑥) using 𝑇 “elementary
operations”.
In both cases you might want to skip this appendix and only return
to it if you find something confusing.
The model we use is a Boolean circuit that also has a RAND gate
that outputs a random bit. We could use as the basic set of gates
the standard AND, OR and NOT but for simplicity we use the one-
element set NAND. We represent the circuit as a straightline program,
but this is of course just a matter of convenience. As shown (for exam-
ple) in the CS 121 textbook, these two representations are identical.
Definition 2.17 — Probabilistic straightline program. A probabilistic straightline program consists of a sequence of lines, each one of them one of the following forms:
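The list of allowed line forms is truncated in these notes, but based on the NAND + RAND model described above, a program is a sequence of assignments of the form `foo = NAND(bar, blah)` or `foo = RAND()` (the exact syntax here is our assumption). Here is a hedged Python sketch of such a program: one that outputs the XOR of an input bit with a fresh random bit, i.e. a one-time-pad encryption of a single bit:

```python
import random

def NAND(a, b):
    return 1 - (a & b)

def RAND():
    return random.randint(0, 1)

def encrypt_bit(x0):
    # Each line below is one of the two allowed forms:
    # foo = NAND(bar, blah)  or  foo = RAND().
    k = RAND()          # sample a one-bit key
    u = NAND(x0, k)
    v = NAND(x0, u)
    w = NAND(k, u)
    y0 = NAND(v, w)     # y0 equals x0 XOR k (XOR built from four NANDs)
    return y0

print(encrypt_bit(1))  # a uniformly random bit, whatever the input was
```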
Reading: Katz-Lindell Section 3.3, Boneh-Shoup Chapter 3.¹

¹ Edited and expanded by Richard Xu in Spring 2020.
The nature of randomness has troubled philosophers, scientists, statisticians and laypeople for many years.² Over the years people have given different answers to the question of what it means for data to be random, and what is the nature of probability. The movements of the planets initially looked random and arbitrary, but then early astronomers managed to find order and make some predictions on them. Similarly, we have made great advances in predicting the weather and will probably continue to do so.

² Even lawyers grapple with this question, with a recent example being the debate of whether fantasy football is a game of chance or of skill.

So, while these days it seems as if the event of whether or not it will rain a week from today is random, we could imagine that in the future we will be able to predict the weather accurately. Even the canonical notion of a random experiment, tossing a coin, might not be as random as you'd think: the second toss will have the same result as the first one with about a 51% chance. (Though see also this experiment.) It is conceivable that at some point someone would discover some function $F$ that, given the first 100 coin tosses by any given person, can predict the value of the 101st.³

³ In fact such a function must exist in some sense, since in the entire history of the world, presumably no sequence of 100 fair coin tosses has ever repeated.

In all these examples, the physics underlying the event, whether it's the planets' movement, the weather, or coin tosses, did not change; only our powers to predict them did. So to a large extent, randomness is a function of the observer, or in other words:

If a quantity is hard to compute, it might as well be random.
Definition 3.1 — Pseudorandom generator (concrete). A function $G : \{0,1\}^n \to \{0,1\}^\ell$ is a $(T, \epsilon)$ pseudorandom generator if $G(U_n) \approx_{T,\epsilon} U_\ell$, where $U_t$ denotes the uniform distribution on $\{0,1\}^t$.
P
This definition (as is often the case in cryptography)
is a bit long, but the concept of a pseudorandom gen-
erator is central to cryptography, and so you should
take your time and make sure you understand it. In-
tuitively, a function 𝐺 is a pseudorandom generator
if (1) it expands its input (mapping 𝑛 bits to 𝑛 + 1 or
more) and (2) we cannot distinguish between the output of $G$ on a random seed and a truly random string.
Note that the requirement that ℓ > 𝑛 is crucial to make this notion
non-trivial, as for ℓ = 𝑛 the function 𝐺(𝑥) = 𝑥 clearly satisfies that
𝐺(𝑈𝑛 ) is identical to (and hence in particular indistinguishable from)
the distribution 𝑈𝑛 . (Make sure that you understand this last state-
ment!) However, for ℓ > 𝑛 this is no longer trivial at all. In particular,
if we didn’t restrict the running time of 𝐸𝑣𝑒 then no such pseudo-
random generator would exist:
𝑛 𝑛+1
Lemma 3.3 Suppose that 𝐺 ∶ {0, 1} → {0, 1} . Then there ex-
𝑛+1
ists an (inefficient) algorithm 𝐸𝑣𝑒 ∶ {0, 1} → {0, 1} such that
𝔼[𝐸𝑣𝑒(𝐺(𝑈𝑛 ))] = 1 but 𝔼[𝐸𝑣𝑒(𝑈𝑛+1 )] ≤ 1/2.
Proof. On input 𝑦 ∈ {0, 1}^{𝑛+1}, consider the algorithm 𝐸𝑣𝑒 that goes over all possible 𝑥 ∈ {0, 1}^𝑛 and will output 1 if and only if 𝑦 = 𝐺(𝑥) for some 𝑥. Clearly 𝔼[𝐸𝑣𝑒(𝐺(𝑈𝑛))] = 1. However, the set 𝑆 = {𝐺(𝑥) ∶ 𝑥 ∈ {0, 1}^𝑛} on which 𝐸𝑣𝑒 outputs 1 has size at most 2^𝑛, and hence a random 𝑦 ←𝑅 𝑈𝑛+1 will fall in 𝑆 with probability at most 1/2.
■
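To make the lemma concrete, here is a minimal Python sketch of this brute-force attack. The generator toy_prg below is a hypothetical (and completely insecure) stand-in for 𝐺, used only so the code runs end to end:

import itertools

def toy_prg(x):
    # A hypothetical stand-in for G mapping n bits to n+1 bits: append
    # the parity of the seed. (Completely insecure; illustration only.)
    return x + str(sum(map(int, x)) % 2)

def eve(y, n):
    # The inefficient adversary of Lemma 3.3: output 1 iff y is in the
    # image of G, by trying all 2^n seeds.
    return any(toy_prg(''.join(s)) == y
               for s in itertools.product('01', repeat=n))

n = 8
# Eve accepts every output of G...
assert all(eve(toy_prg(''.join(s)), n)
           for s in itertools.product('01', repeat=n))
# ...but accepts at most half of all (n+1)-bit strings.
accepted = sum(eve(''.join(y), n)
               for y in itertools.product('01', repeat=n + 1))
print(accepted / 2 ** (n + 1))  # at most 1/2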
As was the case for the cipher conjecture, and any other conjecture,
there are two natural questions regarding the PRG conjecture: why
should we believe it and why should we care. Fortunately, the answer
to the first question is simple: it is known that the cipher conjecture
implies the PRG conjecture, and hence if we believe the former we
should believe the latter. (The proof is highly non-trivial and we may
not get to see it in this course.) As for the second question, we will
see that the PRG conjecture implies a great number of useful crypto-
graphic tools, including the cipher conjecture (i.e., the two conjectures
are in fact equivalent). We start by showing that once we can get to an output that is one bit longer than the input, we can in fact obtain any polynomial number of bits:

Theorem 3.4 — Length extension of PRGs. Suppose that there exists a pseudorandom generator 𝐺′ mapping 𝑛 bits to 𝑛 + 1 bits. Then for every polynomial 𝑡(𝑛) there exists a pseudorandom generator 𝐺 mapping 𝑛 bits to 𝑡(𝑛) bits.
Proof. The proof of this theorem is very similar to the length extension
theorem for ciphers, and in fact this theorem can be used to give an
alternative proof for the former theorem.
The construction is illustrated in Fig. 3.2. We are given a pseu-
dorandom generator 𝐺′ mapping 𝑛 bits into 𝑛 + 1 bits and need to
construct a pseudorandom generator 𝐺 mapping 𝑛 bits to 𝑡 = 𝑡(𝑛) bits
for some polynomial 𝑡(⋅). The idea is that we maintain a state of 𝑛 bits, which are originally our input seed⁵ 𝑠0, and at the 𝑖-th step we use 𝐺′ to map 𝑠𝑖−1 to the (𝑛 + 1)-long bit string (𝑠𝑖, 𝑦𝑖), output 𝑦𝑖 and keep 𝑠𝑖 as our new state. For 𝑖 ∈ {0, … , 𝑡} we define the hybrid distribution 𝐻𝑖 as follows: the first 𝑖 bits 𝑦1, … , 𝑦𝑖 are chosen uniformly at random, then a state 𝑠𝑖 is chosen uniformly at random in {0, 1}^𝑛 and we continue the computation of 𝑦𝑖+1, … , 𝑦𝑡 from the state 𝑠𝑖. Clearly 𝐻0 = 𝐺(𝑈𝑛) and 𝐻𝑡 = 𝑈𝑡, and hence by the triangle inequality it suffices to prove that 𝐻𝑖 ≈ 𝐻𝑖+1 for all 𝑖 ∈ {0, … , 𝑡 − 1}. We illustrate these two hybrids in Fig. 3.3.

⁵ Because we use a small input to grow a large pseudorandom string, the input to a pseudorandom generator is often known as its seed.
Now suppose otherwise that there exists some adversary 𝐸𝑣𝑒 such
that |𝔼[𝐸𝑣𝑒(𝐻𝑖 )] − 𝔼[𝐸𝑣𝑒(𝐻𝑖+1 )]| ≥ 𝜖 for some non-negligible 𝜖. From
𝐸𝑣𝑒, we will design an adversary 𝐸𝑣𝑒′ breaking the security of the
pseudorandom generator 𝐺′ (see Fig. 3.4). On input 𝑦 ∈ {0, 1}^{𝑛+1}, the adversary 𝐸𝑣𝑒′ interprets 𝑦 as a pair (𝑠, 𝜎) with 𝑠 ∈ {0, 1}^𝑛 and 𝜎 ∈ {0, 1}, chooses 𝑦1, … , 𝑦𝑖 uniformly at random, sets 𝑦𝑖+1 = 𝜎, computes 𝑦𝑖+2, … , 𝑦𝑡 from the state 𝑠, feeds (𝑦1, … , 𝑦𝑡) to 𝐸𝑣𝑒, and outputs whatever 𝐸𝑣𝑒 does. Clearly, 𝐸𝑣𝑒′ is efficient if 𝐸𝑣𝑒 is. Moreover, one can see that
if 𝑦 was random then 𝐸𝑣𝑒′ is feeding 𝐸𝑣𝑒 with an input distributed
according to 𝐻𝑖+1 while if 𝑦 was of the form 𝐺(𝑠) for a random 𝑠 then
𝐸𝑣𝑒′ will feed 𝐸𝑣𝑒 with an input distributed according to 𝐻𝑖 . Hence
we get that | 𝔼[𝐸𝑣𝑒′ (𝐺′ (𝑈𝑛 ))] − 𝔼[𝐸𝑣𝑒′ (𝑈𝑛+1 )]| ≥ 𝜖 contradicting the
security of 𝐺′ .
■
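Here is a minimal Python sketch of this construction, with a hypothetical one-bit-stretching function prg1 standing in for 𝐺′ (any real instantiation must of course be a genuine pseudorandom generator):

def prg1(s):
    # Hypothetical stand-in for G' mapping n bits to n+1 bits: rotate
    # the state and append its parity. (Insecure; illustration only.)
    return s[1:] + s[:1] + [sum(s) % 2]

def stretch(s0, t):
    # The construction of Theorem 3.4: keep an n-bit state, and at each
    # step use G' to map s_{i-1} to (s_i, y_i), outputting y_i.
    state, out = list(s0), []
    for _ in range(t):
        expanded = prg1(state)
        state, y = expanded[:-1], expanded[-1]
        out.append(y)
    return out

print(stretch([1, 0, 1, 1], 10))  # 10 output bits from a 4-bit seed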
R
Remark 3.5 — Pseudorandom generators in practice. The
proof of Theorem 3.4 is indicative of many practical
constructions of pseudorandom generators. In many
operating systems and programming environments,
pseudorandom generators work as follows:
We say that 𝐺 ∶ {0, 1}^𝑛 → {0, 1}^ℓ is unpredictable if for every efficient 𝑃 and every 𝑖,

Pr_{𝑦←𝐺(𝑈𝑛)} [𝑃(𝑦1, … , 𝑦𝑖−1) = 𝑦𝑖] ≤ 1/2 + 𝑛𝑒𝑔𝑙(𝑛).

That is, 𝐺 is unpredictable if no efficient circuit can predict the next bit of the output of 𝐺 given the previous bits significantly better than guessing. It turns out that a generator is pseudorandom if and only if it is unpredictable:
Proof. For the forward direction, suppose for contradiction that there exists some 𝑖 and some circuit 𝑃 that can predict 𝑦𝑖 given 𝑦1, … , 𝑦𝑖−1 with probability 𝑝 ≥ 1/2 + 𝜖(𝑛) for non-negligible 𝜖. Consider the adversary 𝐸𝑣𝑒 that, given a string 𝑦, runs the circuit 𝑃 on 𝑦1, … , 𝑦𝑖−1, checks if the output is equal to 𝑦𝑖, and if so outputs 1.

If 𝑦 = 𝐺(𝑥) for a uniform 𝑥, then 𝑃 succeeds with probability 𝑝. If 𝑦 is uniformly random, then we can imagine that the bit 𝑦𝑖 is generated after 𝑃 finished its calculation. The bit 𝑦𝑖 is 0 or 1 with equal probability, so 𝑃 succeeds with probability 1/2. Since 𝐸𝑣𝑒 outputs 1 when 𝑃 succeeds,

|Pr[𝐸𝑣𝑒(𝐺(𝑈𝑛)) = 1] − Pr[𝐸𝑣𝑒(𝑈ℓ) = 1]| = |𝑝 − 1/2| ≥ 𝜖(𝑛),
a contradiction.
For the backward direction, let 𝐺 be an unpredictable function. Let
𝐻𝑖 be the distribution where the first 𝑖 bits come from 𝐺(𝑈𝑛 ) while the
last ℓ − 𝑖 bits are all random. Notice that 𝐻0 = 𝑈ℓ and 𝐻ℓ = 𝐺(𝑈𝑛 ), so
it suffices to show that 𝐻𝑖−1 ≈ 𝐻𝑖 for all 𝑖.
Suppose 𝐻𝑖−1 ≉ 𝐻𝑖 for some 𝑖, i.e., there exists some 𝐸𝑣𝑒 and non-negligible 𝜖 such that Pr[𝐸𝑣𝑒(𝐻𝑖) = 1] − Pr[𝐸𝑣𝑒(𝐻𝑖−1) = 1] > 𝜖(𝑛) (replacing 𝐸𝑣𝑒 with its negation if necessary to drop the absolute value).
Consider the program 𝑃 that, on input (𝑦1, … , 𝑦𝑖−1), picks the bits 𝑦̂𝑖, … , 𝑦̂ℓ uniformly at random. Then, 𝑃 calls 𝐸𝑣𝑒 on the generated input. If 𝐸𝑣𝑒 outputs 1 then 𝑃 outputs 𝑦̂𝑖, and otherwise it outputs 1 − 𝑦̂𝑖.

The string (𝑦1, … , 𝑦𝑖−1, 𝑦̂𝑖, … , 𝑦̂ℓ) has the same distribution as 𝐻𝑖−1. However, conditioned on 𝑦̂𝑖 = 𝑦𝑖, the string has distribution equal to 𝐻𝑖. Let 𝑝 be the probability that 𝐸𝑣𝑒 outputs 1 if 𝑦̂𝑖 = 𝑦𝑖 and 𝑞 be the same probability when 𝑦̂𝑖 ≠ 𝑦𝑖; then we get
𝑝 − (1/2)(𝑝 + 𝑞) = Pr[𝐸𝑣𝑒(𝐻𝑖) = 1] − Pr[𝐸𝑣𝑒(𝐻𝑖−1) = 1] > 𝜖(𝑛).

Therefore, the probability that 𝑃 outputs the correct value equals (1/2)𝑝 + (1/2)(1 − 𝑞) > 1/2 + 𝜖(𝑛), a contradiction.
■
It turns out that the converse direction is also true, and hence these
two conjectures are equivalent. We will probably not show the (quite
non-trivial) proof of this fact in this course. (We might show a weaker
version though.)
Theorem — The PRG conjecture implies the cipher conjecture. If the PRG conjecture is true, then there exists a computationally secret encryption scheme (𝐸, 𝐷) whose messages are one bit longer than its keys.

Proof. Recall that the one time pad is a perfectly secure cipher but its
only problem was that to encrypt an 𝑛 + 1 long message it needed
an 𝑛 + 1 long bit key. Now using a pseudorandom generator, we can
map an 𝑛-bit long key into an 𝑛 + 1-bit long string that looks random
enough that we could use it as a key for the one-time pad. That is, our
cipher will look as follows:
𝐸𝑘 (𝑚) = 𝐺(𝑘) ⊕ 𝑚
and
𝐷𝑘 (𝑐) = 𝐺(𝑘) ⊕ 𝑐
Just like in the one time pad, 𝐷𝑘 (𝐸𝑘 (𝑚)) = 𝐺(𝑘) ⊕ 𝐺(𝑘) ⊕ 𝑚 =
𝑚. Moreover, the encryption and decryption algorithms are clearly
efficient. We will prove security of this encryption by showing the
stronger claim that 𝐸𝑈𝑛 (𝑚) ≈ 𝑈𝑛+1 for any 𝑚.
Notice that 𝑈𝑛+1 = 𝑈𝑛+1 ⊕ 𝑚, as we showed in the security of the
one-time pad. Suppose that for some non-negligible 𝜖 = 𝜖(𝑛) > 0 there is an efficient adversary 𝐸𝑣𝑒′ that distinguishes 𝐸𝑈𝑛(𝑚) = 𝐺(𝑈𝑛) ⊕ 𝑚 from 𝑈𝑛+1 = 𝑈𝑛+1 ⊕ 𝑚 with advantage 𝜖. Define 𝐸𝑣𝑒(𝑦) = 𝐸𝑣𝑒′(𝑦 ⊕ 𝑚). Then, 𝐸𝑣𝑒 can distinguish the two distributions 𝐺(𝑈𝑛) and 𝑈𝑛+1 with advantage 𝜖, a contradiction.
■
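As a sketch, here is what this cipher looks like in Python, with SHA-256 in counter mode serving as a heuristic stand-in for the generator 𝐺 (not a construction analyzed in the text); note that, exactly as with the one-time pad, a key must never be reused across messages:

import hashlib

def G(key: bytes, n: int) -> bytes:
    # Stretch `key` to n pseudorandom bytes using SHA-256 in counter
    # mode. A heuristic stand-in for the PRG G; illustration only.
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def E(key: bytes, m: bytes) -> bytes:
    # E_k(m) = G(k) xor m
    return bytes(a ^ b for a, b in zip(G(key, len(m)), m))

D = E  # decryption is identical, since xor is its own inverse

c = E(b"secret key", b"attack at dawn")
assert D(b"secret key", c) == b"attack at dawn"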
R
Remark 3.9 — Using pseudorandom generators for coin
tossing over the phone. The following is a cute appli-
cation of pseudorandom generators. Alice and Bob
want to toss a fair coin over the phone. They use a
pseudorandom generator 𝐺 ∶ {0, 1}^𝑛 → {0, 1}^{3𝑛}.
1. Alice will send 𝑧 ←𝑅 {0, 1}^{3𝑛} to Bob.
2. Bob picks 𝑠 ←𝑅 {0, 1}^𝑛 and 𝑏 ←𝑅 {0, 1}. If 𝑏 = 0 then Bob sends 𝑦 = 𝐺(𝑠) and if 𝑏 = 1 he sends 𝑦 = 𝐺(𝑠) ⊕ 𝑧. In other words, 𝑦 = 𝐺(𝑠) ⊕ 𝑏 ⋅ 𝑧 where 𝑏 ⋅ 𝑧 is the vector (𝑏 ⋅ 𝑧1, … , 𝑏 ⋅ 𝑧3𝑛).
3. Alice then picks a random 𝑏′ ←𝑅 {0, 1} and sends
it to Bob.
4. Bob sends to Alice the string 𝑠 and 𝑏. Alice verifies
that indeed 𝑦 = 𝐺(𝑠) ⊕ 𝑏 ⋅ 𝑧. Otherwise Alice aborts.
5. The output of the protocol is 𝑏 ⊕ 𝑏′ .
It can be shown that (assuming the protocol is com-
pleted) the output is a random coin, which neither
Alice or Bob can control or predict with more than
negligible advantage over half. Trying to formalize
this and prove it is an excellent exercise. Two main
components in the proofs are:
• With probability 1 − 𝑛𝑒𝑔𝑙(𝑛) over 𝑧 ←𝑅 {0, 1}^{3𝑛}, the sets 𝑆0 = {𝐺(𝑥) | 𝑥 ∈ {0, 1}^𝑛} and 𝑆1 = {𝐺(𝑥) ⊕ 𝑧 | 𝑥 ∈ {0, 1}^𝑛} will be disjoint.
Hence by choosing 𝑧 at random, Alice can ensure
that Bob is committed to the choice of 𝑏 after sending
𝑦.
• For every 𝑧, both the distribution 𝐺(𝑈𝑛 ) and
𝐺(𝑈𝑛 ) ⊕ 𝑧 are pseudorandom. This can be shown
to imply that no matter what string 𝑧 Alice chooses,
she cannot predict 𝑏 from the string 𝑦 sent by Bob
with probability better than 1/2 + 𝑛𝑒𝑔𝑙(𝑛). Hence
her choice of 𝑏′ will be essentially independent of 𝑏.
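Here is a minimal Python simulation of the protocol's message flow. The function G below is a toy stand-in for the pseudorandom generator (seeding Python's random module is emphatically not a PRG); it only illustrates the mechanics:

import random

n = 16

def G(s: int) -> int:
    # Toy stand-in for G: {0,1}^n -> {0,1}^{3n}. NOT a real PRG.
    random.seed(s)
    return random.getrandbits(3 * n)

rng = random.SystemRandom()
# Step 1: Alice sends a random z.
z = rng.getrandbits(3 * n)
# Step 2: Bob picks s and b, sends y = G(s) xor (b * z).
s, b = rng.getrandbits(n), rng.getrandbits(1)
y = G(s) ^ (z if b else 0)
# Step 3: Alice sends a random b'.
b_prime = rng.getrandbits(1)
# Step 4: Bob reveals (s, b); Alice verifies.
assert y == G(s) ^ (b * z), "Alice aborts"
# Step 5: the output of the protocol.
print("coin:", b ^ b_prime)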
P
You should really pause here and make sure you see
why the “counter pseudorandom generator” is not
a secure pseudorandom generator. Show that this is
true even if we replace the least significant digit by the
𝑘-th digit for every 0 ≤ 𝑘 < 𝑛.
In an LFSR, if the current state is 𝑠 ∈ {0, 1}^𝑛 then the output is 𝑓(𝑠), where 𝑓 is a linear function (modulo 2), and the new state is obtained by right-shifting the previous state and putting 𝑓(𝑠) at the leftmost location. That is, 𝑠′1 = 𝑓(𝑠) and 𝑠′𝑖 = 𝑠𝑖−1 for 𝑖 ∈ {2, … , 𝑛}.
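A minimal Python sketch of this update rule (the tap positions below are an arbitrary illustrative choice; achieving a maximal period requires choosing 𝑓 much more carefully):

def lfsr_step(state, taps):
    # One LFSR step: output f(s) = xor of the tapped bits, then shift
    # right and feed the output back in as the new leftmost bit.
    out = sum(b for b, t in zip(state, taps) if t) % 2
    return [out] + state[:-1], out

def lfsr_stream(state, taps, k):
    bits = []
    for _ in range(k):
        state, b = lfsr_step(state, taps)
        bits.append(b)
    return bits

# Example: a 4-bit register with taps at positions 1 and 4.
print(lfsr_stream([1, 0, 0, 1], [1, 0, 0, 1], 15))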
LFSR’s have several good properties: if the function 𝑓(⋅) is chosen
properly then they can have very long periods (i.e., it can take an ex-
ponential number of steps until the state repeats itself), though that
also holds for the simple “counter” generator we saw above. They
also have the property that every individual bit is equal to 0 or 1 with
probability exactly half (the counter generator also shares this prop-
erty).
A more interesting property is that (if the function is selected prop-
erly) every two coordinates are independent from one another. That
is, there is some super-polynomial function 𝑡(𝑛) (in fact 𝑡(𝑛) can be
exponential in 𝑛) such that if ℓ ≠ ℓ′ ∈ {0, … , 𝑡(𝑛)}, then if we look at
the two random variables corresponding to the ℓ-th and ℓ′ -th output
of the generator (where randomness is the initial state) then they are
distributed like two independent random coins. (This is non-trivial to
show, and depends on the choice of 𝑓 - it is a challenging but useful
exercise to work this out.) The counter generator fails badly at this
condition: the least significant bits between two consecutive states
always flip.
There is a more general notion of a linear generator where the new state can be any invertible linear transformation of the previous state. That is, we interpret the state 𝑠 as an element of ℤ𝑞^𝑡 for some integers 𝑞, 𝑡,⁷ and let 𝑠′ = 𝐹(𝑠) and the output 𝑏 = 𝐺(𝑠), where 𝐹 ∶ ℤ𝑞^𝑡 → ℤ𝑞^𝑡 and 𝐺 ∶ ℤ𝑞^𝑡 → ℤ𝑞 are invertible linear transformations (modulo 𝑞). This includes as a special case the linear congruential generator where 𝑡 = 1 and the map 𝐹(𝑠) corresponds to taking 𝑎𝑠 (mod 𝑞) where 𝑎 is a number co-prime to 𝑞.

⁷ A ring is a set of elements where addition and multiplication are defined and obey the natural rules of associativity and commutativity (though without necessarily having a multiplicative inverse for every element). For every integer 𝑞 we define ℤ𝑞 (known as the ring of integers modulo 𝑞) to be the set {0, … , 𝑞 − 1} where addition and multiplication is done modulo 𝑞.
All these generators are unfortunately insecure due to the great bane of cryptography: the Gaussian elimination algorithm, which students typically encounter in any linear algebra class.⁸

⁸ Despite the name, the algorithm goes at least as far back as the Chinese Jiuzhang Suanshu manuscript, circa 150 B.C.
Theorem 3.10 — The unfortunate theorem for cryptography. There is a polynomial time algorithm to solve 𝑚 linear equations in 𝑛 variables (or to certify no solution exists) over any ring.
R
Remark 3.11 — Non-cryptographic PRGs. The above
means that it is a bad idea to use a linear checksum as
a pseudorandom generator in a cryptographic appli-
cation, and in fact in any adversarial setting (e.g., one
shouldn’t hope that an attacker would not be able to
reverse engineer the algorithm⁹ that computes the
control digit of a credit card number). However, that
does not mean that there are no legitimate cases where
linear generators can be used. In a setting where the
application is not adversarial and you have an ability
to test if the generator is actually successful, it might
be reasonable to use such insecure non-cryptographic
generators. They tend to be more efficient (though
often not by much) and hence are often the default
option in many programming environments such as
the C rand() command. (In fact, the real bottleneck
in using cryptographic pseudorandom generators
is often the generation of entropy for their seed, as
discussed in the previous lecture, and not their actual
running time.)
⁹ That number is obtained by applying an algorithm of Hans Peter Luhn which applies a simple map to each digit of the card and then sums them up modulo 10.

3.2.3 From insecurity to security
It is often the case that we want to “fix” a broken cryptographic prim-
itive, such as a pseudorandom generator, to make it secure. At the
moment this is still more of an art than a science, but there are some
principles that cryptographers have used to try to make this more
principled. The main intuition is that there are certain properties of
computational problems that make them more amenable to algo-
rithms (i.e., “easier”) and when we want to make the problems useful
for cryptography (i.e., “hard”) we often seek variants that don’t pos-
sess these properties. The following table illustrates some examples
of such properties. (These are not formal statements, but rather are intended to give some intuition.)
Easy         Hard
Continuous   Discrete
Convex       Non-convex
Linear       Non-linear (degree ≥ 2)
Noiseless    Noisy
Local        Global
Shallow      Deep
Low degree   High degree
P
This is an excellent point for you to stop and try to
answer this question on your own.
Given the known constants and known output, figuring out the set
of potential seeds can be thought of as solving a single equation in 40
variables. However, this equation is clearly underdetermined, and will
have a solution regardless of whether the observed value is indeed an
output of the generator, or it is chosen uniformly at random.
More concretely, we can use linear-equation solving to compute (given the known constants 𝑐1, … , 𝑐40 ∈ ℤ_{2^48} and the output 𝑦 ∈ ℤ_{2^48}) the linear subspace 𝑉 of all vectors (𝑠1, … , 𝑠40) ∈ (ℤ_{2^48})^40 such that ∑ 𝑠𝑖𝑐𝑖 = 𝑦 (mod 2^48). But, regardless of whether 𝑦 was generated at random from ℤ_{2^48}, or 𝑦 was generated as an output of the generator, the subspace 𝑉 will always have the same dimension (specifically, since it is formed by a single linear equation over 40 variables, the dimension will be 39). To break the generator we seem to need to be able to decide whether this linear subspace 𝑉 ⊆ (ℤ_{2^48})^40 contains a Boolean vector (i.e., a vector 𝑠 ∈ {0, 1}^𝑛). Since the condition that
a vector is Boolean is not defined by linear equations, we cannot use
Gaussian elimination to break the generator. Generally, the task of
finding a vector with small coefficients inside a discrete linear sub-
space is closely related to a classical problem known as finding the
shortest vector in a lattice. (See also the short integer solution (SIS)
problem.)
def RC4(P,i,j):
    # One step of RC4. The state is a permutation P of {0,...,255}
    # together with two indices i and j.
    i = (i + 1) % 256
    j = (j + P[i]) % 256
    P[i], P[j] = P[j], P[i]              # swap two entries of P
    return (P,i,j,P[(P[i]+P[j]) % 256])  # new state plus one output byte
The function RC4 takes as input the current state P,i,j of the gen-
erator and returns the new state together with a single output byte.
The state of the generator consists of an array P of 256 bytes, which can
be thought of as a permutation of the numbers 0, … , 255 in the sense
that we maintain the invariant that P[𝑖] ≠ P[𝑗] for every 𝑖 ≠ 𝑗, and two
indices 𝑖, 𝑗 ∈ {0, … , 255}. We can consider the initial state as the case
where P is a completely random permutation and 𝑖 and 𝑗 are initialized to zero, although to save on initial seed size, typically RC4 uses
some “pseudorandom” way to generate P from a shorter seed as well.
RC4 has extremely efficient software implementations and hence
has been widely implemented. However, it has several issues with its security. In particular it was shown by Mantin¹¹ and Shamir that the second bit of RC4 is not random, even if the initialization vector was random. This and other issues led to a practical attack on the 802.11b WiFi protocol; see Section 9.9 in Boneh-Shoup. The initial response to those attacks was to suggest to drop the first 1024 bytes of the output, but by now the attacks have been sufficiently extended that RC4 is simply not considered a secure cipher anymore. The ciphers Salsa and ChaCha, designed by Dan Bernstein, have a similar design to RC4, and are considered secure and deployed in several standard protocols such as TLS, SSH and QUIC; see Section 3.6 in Boneh-Shoup.

¹¹ I typically do not include references in these lecture notes, and leave them to the texts, but I make here an exception because Itsik Mantin was a close friend of mine in grad school.
def BBS(X):
    # One step of Blum-Blum-Shub: square the state modulo N (where
    # N = p*q is a product of two large primes) and output the least
    # significant bit of the new state.
    X = X * X % N
    return (X, X % 2)
Proof Idea:
The proof uses an extremely useful technique known as the "probabilistic method" which is not too hard mathematically but can be confusing at first.¹² The idea is to give a "non-constructive" proof of existence of the pseudorandom generator 𝐺 by showing that if 𝐺 was chosen at random, then the probability that it would be a valid (𝑇, 𝜖) pseudorandom generator is positive. In particular this means that there exists a single 𝐺 that is a valid (𝑇, 𝜖) pseudorandom generator. The probabilistic method is just a proof technique to demonstrate the existence of such a function. Ultimately, our goal is to show the existence of a deterministic function 𝐺 that satisfies the conditions of a (𝑇, 𝜖) PRG.

¹² There is a whole (highly recommended) book by Alon and Spencer devoted to this method.
⋆
𝐵𝑃 = { 𝐺 ∈ ℱℓ^𝑚 ∶ ∣ (1/2^ℓ) ∑_{𝑠∈{0,1}^ℓ} 𝑃(𝐺(𝑠)) − (1/2^𝑚) ∑_{𝑟∈{0,1}^𝑚} 𝑃(𝑟) ∣ > 𝜖 } .   (3.2)
(We’ve replaced here the probability statements in (3.1) with the
equivalent sums so as to reduce confusion as to what is the sample
space that 𝐵𝑃 is defined over.)
To understand this proof it is crucial that you pause here and see
how the definition of 𝐵𝑃 above corresponds to (3.1). This may well
take re-reading the above text once or twice, but it is a good exercise
at parsing probabilistic statements and learning how to identify the
sample space that these statements correspond to.
Now, the number of programs of size 𝑇 (or circuits of size 𝑇) is at most 2^{𝑂(𝑇 log 𝑇)}. Since 𝑇 log 𝑇 = 𝑜(𝑇²), this means that if Claim I is true, then by the union bound it holds that the probability of the union of 𝐵𝑃 over all NAND programs of at most 𝑇 lines is at most 2^{𝑂(𝑇 log 𝑇)}2^{−𝑇²} < 0.1 for sufficiently large 𝑇. What is important for us is that this probability is smaller than one, which means that there exists some 𝐺∗ that satisfies (3.1) with respect to any NAND program of at most 𝑇 lines, but that precisely means that 𝐺∗ is a (𝑇, 𝜖) pseudorandom generator.
Hence, it suffices to prove Claim I to conclude the proof of
Lemma 3.12. Choosing a random 𝐺 ∶ {0, 1}^ℓ → {0, 1}^𝑚 amounts to choosing 𝐿 = 2^ℓ random strings 𝑦0, … , 𝑦𝐿−1 ∈ {0, 1}^𝑚 and letting 𝐺(𝑥) = 𝑦𝑥 (identifying {0, 1}^ℓ and [𝐿] via the binary representation). Hence the claim amounts to showing that for every fixed function 𝑃 ∶ {0, 1}^𝑚 → {0, 1}, if 𝐿 > 2^{𝐶(log 𝑇 + log(1/𝜖))} (which by setting 𝐶 > 4, we can ensure is larger than 10𝑇²/𝜖²) then the probability that

∣ (1/𝐿) ∑_{𝑖=0}^{𝐿−1} 𝑃(𝑦𝑖) − Pr_{𝑠←𝑅{0,1}^𝑚} [𝑃(𝑠) = 1] ∣ > 𝜖   (3.3)

is at most 2^{−𝑇²}. If we let for every 𝑖 ∈ [𝐿] the random variable 𝑋𝑖 denote 𝑃(𝑦𝑖), then since 𝑦0, … , 𝑦𝐿−1 are chosen independently at random, these are independently and identically distributed random variables with mean 𝔼_{𝑦←𝑅{0,1}^𝑚}[𝑃(𝑦)] = Pr_{𝑦←𝑅{0,1}^𝑚}[𝑃(𝑦) = 1], and hence by the Chernoff bound the probability that they deviate from their expectation by 𝜖 is at most 2 ⋅ 2^{−𝜖²𝐿/2}.
■
4 Pseudorandom functions
R
Remark 4.2 — Completely Random Functions. This no-
tion of a randomly chosen function can be difficult to
wrap your mind around. Try to imagine a table of all
of the strings in {0, 1}𝑛 . We now go to each possible
input, randomly generate a bit to be its output, and
write down the result in the table. When we’re done,
we have a length-2^𝑛 lookup table that maps each input
to an output that was generated uniformly at random
and independently of all other outputs. This lookup
table is now our random function 𝐻.
In practice it’s too cumbersome to actually generate
all 2^𝑛 bits, and sometimes in theory it’s convenient to
think of each output as generated only after a query is
made. This leads to adopting the lazy evaluation model.
In the lazy evaluation model, we imagine that a lazy
person is sitting in a room with the same lookup table
as before, but with all entries blank. If someone makes
some query 𝐻(𝑠), the lazy person checks if the entry
for 𝑠 in the lookup table is blank. If so, the lazy evalu-
ator generates a random bit, writes down the result for
𝑠, and returns it. Otherwise, if an output has already
been generated for 𝑠 previously (because 𝑠 has been
queried before), the lazy evaluator simply returns this
value. Can you see why this model is more convenient
in some ways?
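Here is a minimal Python sketch of the lazy evaluation model, where a dictionary plays the role of the lazy person's lookup table:

import random

class LazyRandomFunction:
    # Lazy evaluation of a random function H: {0,1}^n -> {0,1}.
    # Each output is generated only when its input is first queried,
    # then remembered for all later queries.
    def __init__(self):
        self.table = {}

    def __call__(self, s):
        if s not in self.table:                 # first query: flip a coin
            self.table[s] = random.getrandbits(1)
        return self.table[s]                    # repeat query: same answer

H = LazyRandomFunction()
assert H("0110") == H("0110")   # consistent on repeated queries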
One last way to think about how a completely random function is determined is to first observe that there exist a total of 2^{2^𝑛} functions from {0, 1}^𝑛 to {0, 1} (can you see why?).
P
Now would be a fantastic time to stop and think
deeply about the three constructions in the remark
above, and in particular why they are all equivalent. If
you don’t feel like thinking then at the very least you
should make a mental note to come back later if you’re
confused, because this idea will be very useful down
the road.
where ℱ𝑛,1 is the set of all functions mapping {0, 1}^𝑛 to {0, 1} (i.e., the set {0, 1}^𝑛 → {0, 1}).
P
It is worth while to pause and make sure you un-
derstand why Definition 4.3 and Definition 4.1 give
different ways to talk about the same object.
In the next lecture we will see the proof of the following theorem (due to Goldreich, Goldwasser, and Micali).
But before we see the proof of Theorem 4.4, let us see why pseudo-
random functions could be useful.
The problem is that Mallory does not have to learn the password
𝑝 in order to impersonate Alice. For example, she can simply record the message 𝑐1 that Alice sends to Bob in the first session and then replay it to Bob in the next session. Since the message is a valid encryption
of 𝑝, then Bob would accept it from Mallory! (This is known as a
replay attack and is a common attack one needs to protect against in
cryptographic protocols.) One can try to put in countermeasures to
defend against this particular attack, but its existence demonstrates
that secrecy of the password does not guarantee security of the login
protocol.
Protocol PRF-Login:
• Shared input: 𝑠 ∈ {0, 1}𝑛 . Alice and Bob treat it as a seed for a
pseudorandom function generator {𝑓𝑠 }.
• In every session Alice and Bob do the following:
1. Bob chooses a random 𝑖 ←𝑅 [2^𝑛] and sends 𝑖 to Alice.
2. Alice sends 𝑦1 , … , 𝑦ℓ to Bob where 𝑦𝑗 = 𝑓𝑠 (𝑖 + 𝑗 − 1).
3. Bob checks that for every 𝑗 ∈ {1, … , ℓ}, 𝑦𝑗 = 𝑓𝑠 (𝑖 + 𝑗 − 1) and if
so accepts the session; otherwise he rejects it.
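A minimal Python sketch of one session of PRF-Login, with HMAC-SHA256 as a heuristic stand-in for the abstract PRF collection {𝑓𝑠} (all names and parameters below are illustrative):

import hmac, hashlib, secrets

ELL = 6  # the parameter ℓ of the protocol

def f(s: bytes, i: int) -> bytes:
    # PRF stand-in: f_s(i), instantiated with HMAC-SHA256.
    return hmac.new(s, i.to_bytes(16, "big"), hashlib.sha256).digest()

s = secrets.token_bytes(32)    # the shared seed
# 1. Bob chooses a random nonce i and sends it to Alice.
i = secrets.randbits(64)
# 2. Alice replies with y_1, ..., y_ell where y_j = f_s(i + j - 1).
ys = [f(s, i + j) for j in range(ELL)]
# 3. Bob recomputes the values and checks them.
accept = all(hmac.compare_digest(y, f(s, i + j)) for j, y in enumerate(ys))
print("session accepted:", accept)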
As we will see it’s not really crucial that the input 𝑖 (which is
known in crypto parlance as a nonce) is random. What is crucial is that
it never repeats itself, to foil a replay attack. For this reason in many
applications Alice and Bob compute 𝑖 as a function of the current time
(for example, the index of the current minute based on some agreed-
upon starting point), and hence we can make it into a one message
protocol. Also the parameter ℓ is sometimes chosen to be deliberately
short so that it will be easy for people to type the values 𝑦1, … , 𝑦ℓ.

Figure 4.2: The Google Authenticator app is one popular example of a one-time password scheme using pseudorandom functions. Another example is RSA's SecurID token.

Why is this secure? The key to understanding schemes using pseudorandom functions is to imagine what would happen if 𝑓𝑠 was a completely random function.
P
Please make sure you understand the informal rea-
soning above, since we will now translate this into a
formal theorem and proof.
Theorem 4.5 — Login protocol via PRF. Suppose that {𝑓𝑠 } is a secure
pseudorandom function generator and Alice and Bob interact us-
ing Protocol PRF-Login for some polynomial number 𝑇 of sessions
(over a channel controlled by Mallory). After observing these in-
teractions, Mallory then interacts with Bob, where Bob follows the
protocol’s instructions but Mallory has access to arbitrary efficient
computation. Then, the probability that Bob accepts the interaction
is at most 2^{−ℓ} + 𝜇(𝑛) where 𝜇(⋅) is some negligible function.
Whenever one of the parties needs to evaluate 𝑓𝑠(𝑖), 𝐴 will forward 𝑖 to its black box 𝑂(⋅) and return the value 𝑂(𝑖). It will then output 1 if and only if 𝑀
succeeds in impersonation in this internal simulation. The argument
above showed that if 𝑂(⋅) is a truly random function, then the probability that 𝐴 outputs 1 is at most 2^{−ℓ} + 𝑛𝑒𝑔𝑙𝑖𝑔𝑖𝑏𝑙𝑒 (and so in particular less than 2^{−ℓ} + 𝜖/2). On the other hand, if 𝑂(⋅) is the function 𝑖 ↦ 𝑓𝑠(𝑖) for some fixed and random 𝑠, then this probability is at least 2^{−ℓ} + 𝜖. Thus 𝐴 will distinguish between the two cases with bias at least 𝜖/2.
We now turn to the formal proof:
Claim 1: Let PRF-Login* be the hypothetical variant of the protocol
PRF-Login where Alice and Bob share a completely random function
𝐻 ∶ [2𝑛 ] → {0, 1}. Then, no matter what Mallory does, the probability
she can impersonate Alice after observing 𝑇 interactions is at most
2−ℓ + (8ℓ𝑇 )/2𝑛 .
(If PRF-Login* is easier to prove secure than PRF-Login, you might
wonder why we bother with PRF-Login in the first place and not sim-
ply use PRF-Login*. The reason is that specifying a random function
𝐻 requires specifying 2^𝑛 bits, and so that would be a huge shared key.
So PRF-Login* is not a protocol we can actually run but rather a hy-
pothetical “mental experiment” that helps us in arguing about the
security of PRF-Login.)
Proof of Claim 1: Let 𝑖1, … , 𝑖2𝑇 be the nonces chosen by Bob and received by Alice in the first 𝑇 iterations. That is, 𝑖1 is the nonce cho-
sen by Bob in the first iteration while 𝑖2 is the nonce that Alice re-
ceived in the first iteration (if Mallory doesn’t modify it then 𝑖1 = 𝑖2 ).
Similarly, 𝑖3 is the nonce chosen by Bob in the second iteration while
𝑖4 is the nonce received by Alice and so on and so forth. Let 𝑖 be the nonce chosen in the (𝑇 + 1)-st iteration in which Mallory tries to im-
personate Alice. We claim that the probability that there exists some
𝑗 ∈ {1, … , 2𝑇} such that |𝑖 − 𝑖𝑗| < 2ℓ is at most 8ℓ𝑇/2^𝑛. Indeed, let 𝑆
be the union of all the intervals of the form {𝑖𝑗 − 2ℓ + 1, … , 𝑖𝑗 + 2ℓ − 1}
for 1 ≤ 𝑗 ≤ 2𝑇 . Since it’s a union of 2𝑇 intervals each of length
less than 4ℓ, 𝑆 contains at most 8𝑇 ℓ elements, so the probability that
𝑖 ∈ 𝑆 is |𝑆|/2^𝑛 ≤ (8𝑇ℓ)/2^𝑛. Now, if there does not exist a 𝑗 such that
|𝑖−𝑖𝑗 | < 2ℓ then it means in particular that all the queries to 𝐻(⋅) made
by either Alice or Bob during the first 𝑇 iterations are disjoint from the
interval {𝑖, 𝑖 + 1, … , 𝑖 + ℓ − 1}. Since 𝐻(⋅) is a completely random
function, the values 𝐻(𝑖), … , 𝐻(𝑖 + ℓ − 1) are chosen uniformly and
independently from all the rest of the values of this function. Since
Mallory’s message 𝑦 to Bob in the (𝑇 + 1)-st iteration depends only on
what she observed in the past, the values 𝐻(𝑖), … , 𝐻(𝑖 + ℓ − 1) are inde-
pendent from 𝑦, and hence under the condition that there is no overlap
between this interval and prior queries, the probability that they equal
𝑦 is 2^{−ℓ}. QED (Claim 1).
Theorem 4.7 — PRF length extension. Suppose that PRFs exist. Then
for every constant 𝑐 and polynomial-time computable functions
ℓin, ℓout ∶ ℕ → ℕ with ℓin(𝑛), ℓout(𝑛) ≤ 𝑛^𝑐, there exists a PRF ensemble with input length ℓin and output length ℓout.
If Alice and Bob share the key 𝑘, then to send a message 𝑚 to Bob, Alice will simply send over the pair (𝑚, 𝜏) where 𝜏 = 𝑆𝑘(𝑚). If Bob receives a message (𝑚′, 𝜏′), then he will accept 𝑚′ if and only if 𝑉𝑘(𝑚′, 𝜏′) = 1. Mallory now observes 𝑡 rounds of communication of the form (𝑚𝑖, 𝑆𝑘(𝑚𝑖)) for messages 𝑚1, … , 𝑚𝑡 of her choice, and her goal is to try to create a new message 𝑚′ that was not sent by Alice, but for which she can forge a valid tag 𝜏′ that will pass verification.³ Our notion of security guarantees that she'll only be able to do so with negligible probability, in which case the MAC is CMA-secure.⁴

³ Clearly if the adversary outputs a pair (𝑚, 𝜏) that it did query from its oracle then that pair will pass verification. This suggests the possibility of a replay attack whereby Mallory resends to Bob a message that Alice sent him in the past. As above, one can thwart this by insisting that every message 𝑚 begins with a fresh nonce or a value derived from the current time.

⁴ A priori you might ask if we should not also give Mallory an oracle to 𝑉𝑘(⋅) as well. After all, in the course of those many interactions, Mallory could also send Bob many messages (𝑚′, 𝜏′) of her choice, and observe from his behavior whether or not these passed verification. It is a good exercise to show that adding such an oracle does not change the power of the definition, though we note that this is decidedly not the case in the analogous question for encryption.

R
Remark 4.9 — Why can Mallory choose the messages?. The notion of a "chosen message attack" might seem a little "over the top". After all, Alice is going to send to Bob the messages of her choice, rather than those chosen by her adversary Mallory. However, as cryptographers have learned time and again the hard way, it is better to be conservative in our security definitions and think of an attacker that has as much power as possible. First of all, we want a message authentication code that will work for any sequence of messages, and so it's better to consider this "worst case" setting of allowing Mallory to choose them. Second, in many realistic settings an adversary could have some effect on the messages that are being sent by the parties. This has occurred time and again in cases ranging from web servers to German submarines in World War II, and we'll return to this point when we talk about chosen plaintext and chosen ciphertext attacks on encryption schemes.
R
Remark 4.10 — Strong unforgeability. Some texts (such as Boneh-Shoup) define a stronger notion of unforgeability where the adversary cannot even produce new signatures for messages it has queried in the attack. That is, the adversary cannot produce a valid message-signature pair that it has not seen before. This stronger definition can be useful for some applications. It is fairly easy to transform MACs satisfying Definition 4.8 into MACs satisfying strong unforgeability. In particular, if the signing function is deterministic, and we use a canonical verifier algorithm where 𝑉𝑘(𝑚, 𝜎) = 1 iff 𝑆𝑘(𝑚) = 𝜎, then weak unforgeability automatically implies strong unforgeability since every message has a single signature that would pass verification (can you see why?).
Theorem 4.11 — MAC Theorem. Under the PRF Conjecture, there exists
a secure MAC.
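One standard way to realize such a MAC is to simply let the tag of a message 𝑚 be 𝑓𝑠(𝑚), with canonical verification. Here is a minimal sketch, with HMAC-SHA256 again standing in heuristically for the PRF:

import hmac, hashlib

def S(k: bytes, m: bytes) -> bytes:
    # Sign: tag = f_k(m), with HMAC-SHA256 standing in for the PRF.
    return hmac.new(k, m, hashlib.sha256).digest()

def V(k: bytes, m: bytes, tag: bytes) -> bool:
    # Canonical verification: accept iff S_k(m) == tag.
    return hmac.compare_digest(S(k, m), tag)

k = b"0" * 32
tag = S(k, b"pay Bob $100")
assert V(k, b"pay Bob $100", tag)
assert not V(k, b"pay Bob $1000", tag)   # a forged message is rejected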
• The property EASY fails to hold for a random function with high
probability.
In this lecture we will see that the PRG conjecture implies the PRF
conjecture. We will also see how PRFs imply an encryption scheme
that is secure even when we encrypt multiple messages with the same
key.
We have seen that PRF’s (pseudorandom functions) are extremely
useful, and we’ll see some more applications of them later on. But
are they perhaps too amazing to exist? Why would someone imagine
that such a wonderful object is feasible? The answer is the following
theorem:
Theorem 5.1 — The PRF Theorem. Suppose that the PRG Conjecture is
true, then there exists a secure PRF collection {𝑓𝑠}_{𝑠∈{0,1}^∗} such that for every 𝑠 ∈ {0, 1}^𝑛, 𝑓𝑠 maps {0, 1}^𝑛 to {0, 1}^𝑛.
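The construction behind this theorem (due to Goldreich, Goldwasser, and Micali, parts of whose analysis appear below) labels a binary tree of depth 𝑛 using a length-doubling generator. Here is a minimal Python sketch, with SHA-256 as a heuristic stand-in for that generator:

import hashlib

def G(s: bytes) -> tuple:
    # Length-doubling PRG stand-in G(s) = (G_0(s), G_1(s)), built
    # heuristically from SHA-256; the theorem assumes a real PRG.
    return (hashlib.sha256(b"0" + s).digest(),
            hashlib.sha256(b"1" + s).digest())

def f(s: bytes, x: str) -> bytes:
    # The tree construction: walk down the tree, taking the left or
    # right half of the PRG output according to each bit of x.
    for bit in x:
        s = G(s)[int(bit)]
    return s

print(f(b"seed", "0110").hex())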
𝐴 and the 𝑗-th oracle inside its belly, with one difference: when the time comes to label the 𝑗-th node, instead of doing this by applying the pseudorandom generator to the label of its parent 𝑣 (which is what should happen in the 𝑗-th oracle) it uses its input 𝑦 to label the two children of 𝑣.

Now, if 𝑦 was completely random then we get exactly the distribution of the (𝑗 + 1)-st oracle, and hence in this case 𝐷 simulates internally the (𝑗 + 1)-st hybrid. However, if 𝑦 = 𝐺(𝑠) for some randomly sampled 𝑠 ∈ {0, 1}^𝑛, though it may not be obvious at first, we actually get the distribution of the 𝑗-th oracle.
The equivalence between hybrid 𝑗 and distinguisher 𝐷 under the condition that 𝑦 = 𝐺(𝑠) is non-obvious, because in hybrid 𝑗, the label for the children of 𝑣^𝐿_{𝑗−1} was supposed to be the result of applying the pseudorandom generator to the label of 𝑣^𝐿_{𝑗−1}, and not to some other random string (see Fig. 5.6). However, because 𝑣 was labeled before the 𝑗-th step, we know that it was actually labeled by a random string. Moreover, since we use lazy evaluation we know that step 𝑗 is the first time where we actually use the value of the label of 𝑣. Hence, if at this point we resampled this label and used a completely independent random string 𝑠, then the distributions of the label of 𝑣^𝐿_{𝑗−1} and of 𝑠 would be identical.
The key observations here are:
2. The label for an internal vertex 𝑣 is only used once, and that is for
generating the labels for its children.
R
Remark 5.2 — PRF’s in practice. While this construc-
tion reassures us that we can rely on the existence of
pseudorandom functions even on days where we re-
member to take our meds, this is not the construction
people use when they need a PRF in practice because
it is still somewhat inefficient, making 𝑛 calls to the
underlying pseudorandom generators. There are
constructions (e.g., HMAC) based on hash functions
that require stronger assumptions but can use as few
as two calls to the underlying function. We will cover
these constructions when we talk about hash functions
and the random oracle model. One can also obtain
practical constructions of PRFs from block ciphers,
which we’ll see later in this lecture.
P
Try to think what kind of security guarantees are not
provided by the notion of computational secrecy we
saw in Definition 2.6
Definition 5.3 — Chosen Plaintext Attack (CPA) secure encryption. An encryption scheme (𝐸, 𝐷) is secure against chosen plaintext attack (CPA secure) if for every polynomial time 𝐸𝑣𝑒, Eve wins with probability at most 1/2 + 𝑛𝑒𝑔𝑙(𝑛) in the game defined below:
Proof. The proof is very simple: 𝐸𝑣𝑒 will only use a single round of interacting with 𝐸 where she will ask for the encryption 𝑐1 of 0^ℓ. In the second round, 𝐸𝑣𝑒 will choose 𝑚0 = 0^ℓ and 𝑚1 = 1^ℓ, and get 𝑐∗ = 𝐸𝑘(𝑚𝑏); she will then output 0 if and only if 𝑐∗ = 𝑐1.
■
P
At first look, the definition of pseudorandom permutations might seem not to make sense, since on one hand it requires the map 𝑥 ↦ 𝑓𝑠(𝑥) to be a permutation, but on the other hand it can be shown that with high probability a random map 𝐻 ∶ {0, 1}^ℓ → {0, 1}^ℓ will not be a permutation. How can then such a collection be pseudorandom? The key insight is that while
computed while answering 𝐴's 𝑖-th query coincides with the string 𝑦1^{(𝑗)}

hence that ℎ1(𝑥2^{(𝑖)}) and ℎ1(𝑥2^{(𝑗)}) are uniform and independent. In other

a union bound over all choices of 𝑖 and 𝑗, we see that the probability of a collision at 𝑦1 is at most 𝑞(𝑛)²/2^𝑛, which is negligible.

Next, define a collision at 𝑦2 by a pair of queries 1 ≤ 𝑖 < 𝑗 ≤ 𝑞(𝑛) such that 𝑦2^{(𝑖)} = 𝑦2^{(𝑗)}. We argue that the probability of a collision at

independent, random strings. At this point, we've shown that the adversary cannot distinguish the outputs (𝑧1^{(1)}, 𝑦2^{(1)}), … , (𝑧1^{(𝑞(𝑛))}, 𝑦2^{(𝑞(𝑛))}) of the oracle for 𝑝ℎ1,ℎ2,ℎ3 from the outputs of a random oracle unless an event with negligibly small probability occurs. We conclude that the collection {𝑝ℎ1,ℎ2,ℎ3}, and hence our original collection {𝑝𝑠1,𝑠2,𝑠3}, is a secure PRP collection.
For more details regarding this proof, see Section 4.5 in Boneh
Shoup or Section 8.6 (7.6 in 2nd ed) in Katz-Lindell, whose proof was
used as a model for ours.
■
R
Remark 5.7 — How many Feistel rounds?. The construc-
tion in the proof of Theorem 5.6 constructed a PRP 𝑝
by performing 3 rounds of the Feistel transformation
with a known PRF 𝑓. It is an interesting exercise to try to show that doing just 1 or 2 rounds of the Feistel transformation does not suffice to get a pseudorandom permutation.
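For concreteness, here is a minimal Python sketch of the Feistel transformation and its inverse; the round functions below are arbitrary toy functions, not PRFs:

def feistel_round(left, right, f):
    # One Feistel round: (L, R) -> (R, L xor f(R)). Inverting it only
    # requires evaluating f in the forward direction.
    return right, left ^ f(right)

def feistel(x, fs):
    l, r = x
    for f in fs:
        l, r = feistel_round(l, r, f)
    return l, r

def feistel_inverse(y, fs):
    l, r = y
    for f in reversed(fs):
        r, l = l, r ^ f(l)          # undo (L, R) -> (R, L xor f(R))
    return l, r

# Toy round functions on 8-bit halves (hypothetical, not PRFs):
fs = [lambda v, k=k: (v * 131 + k) % 256 for k in (17, 42, 99)]
assert feistel_inverse(feistel((170, 85), fs), fs) == (170, 85)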
from this sequence. Like CBC mode, OFB mode is CPA secure when
IV is chosen at random. Some advantages of OFB mode over CBC
mode include the ability for the sender to precompute the sequence
(𝑦1 , 𝑦2 , …) well before the message to be encrypted is known, as well
as the fact that the underlying function 𝑝𝑠 used to generate (𝑦1 , 𝑦2 , …)
only needs to be a PRF (not necessarily a PRP).
Perhaps the simplest mode of operation is counter (CTR) mode
where we convert a block cipher to a stream cipher by using the
stream 𝑝𝑠 (IV), 𝑝𝑠 (IV + 1), 𝑝𝑠 (IV + 2), … where IV is a random string
in {0, 1}𝑛 which we identify with [2𝑛 ] (and perform addition modulo
2𝑛 ). That is, to encrypt a message 𝑚 = (𝑚1 , … , 𝑚𝑡 ), we choose IV at
random, and output (IV, 𝑐1 , … , 𝑐𝑡 ), where 𝑐𝑖 = 𝑝𝑠 (IV + 𝑖) ⊕ 𝑚𝑖 for
1 ≤ 𝑖 ≤ 𝑡. Decryption is performed similarly. For a modern block
cipher, CTR mode is no less secure than CBC or OFB, and in fact of-
fers several advantages. For example, CTR mode can easily encrypt
and decrypt blocks in parallel, unlike CBC mode. In addition, CTR
mode only needs to evaluate 𝑝𝑠 once to decrypt any single block of the
ciphertext, unlike OFB mode.
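A minimal Python sketch of CTR mode as just described, with SHA-256 used as a heuristic stand-in for the PRF 𝑝𝑠:

import hashlib, secrets

BLOCK = 32  # bytes per block of the stand-in PRF

def p(s: bytes, x: int) -> bytes:
    # PRF stand-in p_s(x); in CTR mode a PRF suffices (no permutation
    # structure of p_s is used).
    return hashlib.sha256(s + x.to_bytes(16, "big")).digest()

def ctr_encrypt(s: bytes, m: bytes) -> tuple:
    iv = secrets.randbits(64)
    blocks = [m[i:i + BLOCK] for i in range(0, len(m), BLOCK)]
    c = b"".join(bytes(a ^ b for a, b in zip(p(s, iv + i + 1), blk))
                 for i, blk in enumerate(blocks))
    return iv, c

def ctr_decrypt(s: bytes, iv: int, c: bytes) -> bytes:
    blocks = [c[i:i + BLOCK] for i in range(0, len(c), BLOCK)]
    return b"".join(bytes(a ^ b for a, b in zip(p(s, iv + i + 1), blk))
                    for i, blk in enumerate(blocks))

msg = b"a message longer than one block " * 3
iv, c = ctr_encrypt(b"key", msg)
assert ctr_decrypt(b"key", iv, c) == msg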
A fairly comprehensive study of the different modes of block ci-
phers is in this document by Rogaway. His conclusion is that if we
simply consider CPA security (as opposed to the stronger notions
of chosen ciphertext security we’ll see in the next lecture) then counter
mode is the best choice, but CBC, OFB and CFB are widely imple-
mented due to legacy reasons. ECB should not be used (except as a
building block as part of a construction achieving stronger security).
a. 𝑛
b. 𝑛²
c. 1
d. 2^𝑛
■
6 Chosen Ciphertext Security
P
Please stop and play an ominous sound track at this
point.
and whose job is to decrypt and then deliver packets) simply delivers it unencrypted straight into her hands. One issue is that if Eve modifies 𝑚1 then it is unlikely that the CRC code will still check out, and hence Bob would reject the packet. However, CRC-32, the CRC algorithm used by WEP, is linear modulo 2, that is, CRC(𝑥 ⊕ 𝑥′) = CRC(𝑥) ⊕ CRC(𝑥′). This means that if the original ciphertext 𝑐 was an encryption of the message 𝑚 = 𝑚1‖𝑚2‖CRC(𝑚1, 𝑚2) then 𝑐′ = 𝑐 ⊕ (𝑥1‖0^𝑡‖CRC(𝑥1‖0^𝑡)) will be an encryption of the message 𝑚′ = (𝑚1 ⊕ 𝑥1)‖𝑚2‖CRC((𝑥1 ⊕ 𝑚1)‖𝑚2) (where 0^𝑡 denotes a string of zeroes of the same length 𝑡 as 𝑚2, and hence 𝑚2 ⊕ 0^𝑡 = 𝑚2). Therefore by XOR'ing 𝑐 with 𝑥1‖0^𝑡‖CRC(𝑥1‖0^𝑡), the adversary Mallory can ensure that Bob will deliver the message 𝑚2 to the IP address 𝑚1 ⊕ 𝑥1 of her choice (see Fig. 6.2).
What does CCA have to do with WEP? The CCA security game is some-
what strange, and it might not be immediately clear whether it has
anything to do with the attack we described on the WEP protocol.
However, it turns out that using a CCA secure encryption would have
prevented that attack. The key is the following claim:
Lemma 6.2 Suppose that (𝐸, 𝐷) is a CCA secure encryption. Then,
there is no efficient algorithm that given an encryption 𝑐 of the plain-
text (𝑚1 , 𝑚2 ) outputs a ciphertext 𝑐′ that decrypts to (𝑚′1 , 𝑚2 ) where
𝑚′1 ≠ 𝑚1 .
In particular Lemma 6.2 rules out the attack of transforming 𝑐 that encrypts a message starting with some address IP to a ciphertext that starts with a different address IP′. Let us now sketch its proof.
P
The proof above is rather sketchy. However it is not
very difficult and proving Lemma 6.2 on your own
is an excellent way to ensure familiarity with the
definition of CCA security.
This is a lesson that has time and again been shown, and many protocols have been broken due to the mistaken belief that if we only care about secrecy, it is enough to use only encryption (and one that is only CPA secure) and there is no need for authentication. Matthew Green writes this more provocatively as³

Nearly all of the symmetric encryption modes you learned about in school, textbooks, and Wikipedia are (potentially) insecure.

exactly because these basic modes only ensure security for passive eavesdropping adversaries and do not ensure chosen ciphertext security which is the "gold standard" for online applications. (For

³ I also like the part where Green says about a block cipher mode that "if OCB was your kid, he'd play three sports and be on his way to Harvard." We will have an exercise about a simplified version of the GCM mode (which perhaps only plays a single sport and is on its way to …). You can read about OCB in Exercise 9.14 in the Boneh-Shoup book; it uses the notion of a "tweakable block cipher" which simply means that given a single key 𝑘, you actually get a set {𝑝𝑘,1, … , 𝑝𝑘,𝑡} of permutations that are indistinguishable from 𝑡 independent random permutations (the set {1, … , 𝑡} is called the set of "tweaks" and we sometimes index it using strings instead of numbers).
P
You should stop here and try to think how you would
implement a CCA secure encryption by combining
MAC’s with a CPA secure encryption.
P
If you didn’t stop before, then you should really stop
and think now.
OK, so now that you had a chance to think about this on your own,
we will describe one way that works to achieve CCA security from
MACs. We will explore other approaches that may or may not work in
the exercises.
Theorem 6.3 — CCA from CPA and MAC (encrypt-then-sign). Let (𝐸, 𝐷) be a CPA-secure encryption scheme and (𝑆, 𝑉) be a CMA-secure MAC with 𝑛 bit keys and a canonical verification algorithm.⁴ Then the following encryption (𝐸′, 𝐷′) with 2𝑛 bit keys is CCA secure:

• 𝐸′𝑘1,𝑘2(𝑚) is obtained by computing 𝑐 = 𝐸𝑘1(𝑚), 𝜎 = 𝑆𝑘2(𝑐) and outputting (𝑐, 𝜎).

• 𝐷′𝑘1,𝑘2(𝑐, 𝜎) outputs nothing (e.g., an error message) if 𝑉𝑘2(𝑐, 𝜎) ≠ 1, and otherwise outputs 𝐷𝑘1(𝑐).

⁴ By a canonical verification algorithm we mean that 𝑉𝑘(𝑚, 𝜎) = 1 iff 𝑆𝑘(𝑚) = 𝜎.
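Here is a minimal Python sketch of this encrypt-then-sign composition. Both building blocks are heuristic stand-ins (a CTR-style cipher and HMAC-SHA256), chosen only to make the sketch self-contained:

import hmac, hashlib, secrets

def E(k1, m):
    # CPA-secure encryption stand-in: xor with a pad derived from a
    # fresh random nonce. (Assumes len(m) <= 32 for this sketch.)
    nonce = secrets.token_bytes(16)
    pad = hashlib.sha256(k1 + nonce).digest()
    return nonce + bytes(a ^ b for a, b in zip(pad, m))

def D(k1, c):
    nonce, body = c[:16], c[16:]
    pad = hashlib.sha256(k1 + nonce).digest()
    return bytes(a ^ b for a, b in zip(pad, body))

def S(k2, c):
    return hmac.new(k2, c, hashlib.sha256).digest()

def E_prime(k1, k2, m):
    # Encrypt-then-sign: c = E_{k1}(m), sigma = S_{k2}(c).
    c = E(k1, m)
    return c, S(k2, c)

def D_prime(k1, k2, c, sigma):
    # Reject (return None) unless the tag verifies, then decrypt.
    if not hmac.compare_digest(S(k2, c), sigma):
        return None
    return D(k1, c)

k1, k2 = secrets.token_bytes(32), secrets.token_bytes(32)
c, sigma = E_prime(k1, k2, b"attack at dawn")
assert D_prime(k1, k2, c, sigma) == b"attack at dawn"
assert D_prime(k1, k2, c + b"x", sigma) is None   # tampering is rejected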
Proof. Suppose, for the sake of contradiction, that there exists an ad-
versary 𝑀 ′ that wins the CCA game for the scheme (𝐸 ′ , 𝐷′ ) with
probability at least 1/2 + 𝜖. We consider the following two cases:
Case I: With probability at least 𝜖/10, at some point during the
CCA game, 𝑀 ′ sends to its decryption box a ciphertext (𝑐, 𝜎) that is
not identical to one of the ciphertexts it previously obtained from its
encryption box, and obtains from it a non-error response.
Case II: The event above happens with probability smaller than
𝜖/10.
We will derive a contradiction in either case. In the first case, we
will use 𝑀 ′ to obtain an adversary that breaks the MAC (𝑆, 𝑉 ), while
in the second case, we will use 𝑀 ′ to obtain an adversary that breaks
the CPA-security of (𝐸, 𝐷).
Let’s start with Case I: when this case holds, we will build an adversary 𝐹 (for "forger") for the MAC (𝑆, 𝑉). We can assume the adversary 𝐹 has access to both the signing and verification algorithms as black boxes for some unknown key 𝑘2 that is chosen at random and fixed.⁵ 𝐹 will choose 𝑘1 on its own, and will also choose at random a number 𝑖0 from 1 to 𝑇, where 𝑇 is the total number of queries that 𝑀′ makes to the decryption box. 𝐹 will run the entire CCA game with 𝑀′, using 𝑘1 and its access to the black boxes to execute the encryption and decryption boxes, all the way until just before 𝑀′ makes the 𝑖0-th query (𝑐, 𝜎) to its decryption box. At that point, 𝐹 will output (𝑐, 𝜎). We claim that with probability at least 𝜖/(10𝑇), our forger will succeed in the CMA game in the sense that (i) the query (𝑐, 𝜎) will pass verification, and (ii) the message 𝑐 was not previously queried before to the signing oracle.

⁵ Since we use a MAC with canonical verification, access to the signature algorithm implies access to the verification algorithm.
if it’s the case that 𝑀′ always outputs the wrong answer when 𝐸𝑣𝑒 makes this mistake, we will still succeed with probability at least 1/2 + 0.9𝜖. Since
𝜖 is non negligible, this would contradict the CPA security of (𝐸, 𝐷)
thereby concluding the proof of the theorem.
■
P
This proof is emblematic of a general principle for
proving CCA security. The idea is to show that the de-
cryption box is completely “useless” for the adversary,
since the only way to get a non error response from it
is to feed it with a ciphertext that was received from
the encryption box.
• Let 𝑐𝑖 = 𝑧𝑖 ⊕ 𝑚𝑖 .
use a different padding which involves encoding the length of the pad.
a. CPA secure
b. CCA secure
■
7 Hash Functions, Random Oracles, and Bitcoin
authority that would respect the certificate. The next step in the evo-
lution of currencies was fiat money, which is a currency (like today’s
dollar, ever since the U.S. moved off the gold standard) that does not
correspond to any commodity, but rather only relies on trust in a cen-
tral authority. (Another example is the Roman coins, which though
originally made of silver, underwent a continous process of debase-
ment until they contained less than two percent of it.) One advantage
(sometimes disadvantage) of a fiat currency is that it allows for more
flexible monetary policy on parts of the central authority.
will recognize here the class NP.) So when we say “transfer the coin
ID from 𝑃 to 𝑄” we mean that whoever holds a solution for the
puzzle 𝑄 is now the owner of the coin ID (and to verify the authen-
ticity of this transfer, you provide a solution to the puzzle 𝑃 .) More
accurately, a transaction involving the coin ID is self-validating if it
contains a solution to the puzzle that is associated with ID according
to the latest transaction in the ledger.
P
Please re-read the previous paragraph, to make sure
you follow the logic.
Perhaps the main idea behind Bitcoin is that “majority” will corre-
spond to a “majority of computing power”, or as the original Bitcoin
paper says, “one CPU one vote” (or perhaps more accurately, “one
cycle one vote”). It might not be immediately clear how to imple-
ment this, but at least it means that creating fictitious new entities
(sometimes known as a Sybil attack after the movie about multiple-
personality disorder) cannot help. To implement it we turn to a cryp-
tographic concept known as “proof of work” which was originally
suggested by Dwork and Naor in 1991 as a way to combat mass marketing email.⁴

⁴ This was a rather visionary paper in that it foresaw
such that 𝑓𝑘(𝑥) = 0^ℓ. So, if we're not too careful, we might think of such an input 𝑥 as a proof that Alice spent 2^ℓ time.

P
Stop here and try to think if indeed it is the case that one cannot find an input 𝑥 such that 𝑓𝑘(𝑥) = 0^ℓ using much fewer than 2^ℓ steps.
The main question in using PRF’s for proofs of work is who is hold-
ing the key 𝑘 for the pseudorandom function. If there is a trusted
server holding the key, then sure, finding such an input 𝑥 would take on average 2^ℓ queries, but the whole point of Bitcoin is to not have a
P
Indeed, it is an excellent exercise to prove that (under the PRF conjecture) there exists a PRF {𝑓𝑘} mapping 𝑛 bits to 𝑛 bits and an efficient algorithm 𝐴 such that 𝐴(𝑘) outputs 𝑥 with 𝑓𝑘(𝑥) = 0^ℓ.
Where again, the “super strong PRF” behaves like a truly random
function even to a party that holds the key. Unfortunately such a result
is not known to be true, and for a very good reason. Most natural
ways to define “super strong PRF” will result in properties that can be
shown to be impossible to achieve. Nevertheless, the intuition behind it
still seems useful and so we have the following heuristic:
Under the random oracle model, we can now specify the "proof of work" protocol for Bitcoin. Given some identifier ID ∈ {0, 1}^𝑛, an integer 𝑇 ≪ 2^𝑛, and a hash function 𝐻 ∶ {0, 1}^{2𝑛} → {0, 1}^𝑛, the proof of work corresponding to ID and 𝑇 will be some 𝑥 ∈ {0, 1}^∗ such that the first ⌈log 𝑇⌉ bits of 𝐻(ID‖𝑥) are zero.⁵

⁵ The actual Bitcoin protocol is slightly more general, where the proof is some 𝑥 such that 𝐻(ID‖𝑥), when interpreted as a number in [2^𝑛], is at most 𝑇. There are also other issues about how exactly 𝑥 is placed and ID is computed from past history that we ignore here.
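A minimal Python sketch of this proof of work, with SHA-256 modeling the random oracle 𝐻 (and ignoring the exact domain of 𝐻):

import hashlib, itertools

def proof_of_work(ID: bytes, T: int) -> bytes:
    # Find x such that H(ID || x) starts with ceil(log2 T) zero bits.
    zero_bits = max(T - 1, 0).bit_length()
    for n in itertools.count():
        x = n.to_bytes(8, "big")
        h = int.from_bytes(hashlib.sha256(ID + x).digest(), "big")
        if h >> (256 - zero_bits) == 0:
            return x

def verify(ID: bytes, x: bytes, T: int) -> bool:
    zero_bits = max(T - 1, 0).bit_length()
    h = int.from_bytes(hashlib.sha256(ID + x).digest(), "big")
    return h >> (256 - zero_bits) == 0

x = proof_of_work(b"some-transaction-id", T=2 ** 12)
assert verify(b"some-transaction-id", x, 2 ** 12)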
7.2.1 From Proof of Work to Consensus on Ledger
How does proof of work help us in achieving consensus?
We want every transaction 𝑡𝑖 in the Bitcoin system to have a
corresponding proof of work; in particular, a proof of some 𝑇𝑖 "amount" of work with respect to some identifier that is unique to 𝑡𝑖.
The length of a ledger (𝑡1 , … , 𝑡𝑛 ) is the sum of the corresponding
𝑇𝑖 ’s. In other words, the length corresponds to the total number of
cycles invested in creating this ledger. A ledger is valid if every trans-
action in the ledger of the form “transfer the coin ID from 𝑃 to 𝑄” is
self-certified by a solution to 𝑃 .
Critically, participants (specifically miners) in the Bitcoin network
are rewarded for adding valid entries to the ledger. In other words,
they are given Bitcoins (which are newly minted for them) for per-
forming the “work” required to add an entry to the ledger. However,
honest participants (including non-miners, people who just read the
ledger) will accept the longest known ledger as the ground truth. In
addition, Bitcoin miners are rewarded for adding entry 𝑖 after entry
𝑖 + 100 is added to the ledger. This gives miners an incentive to choose
the longest ledger to contribute their work towards. To see why, con-
sider the following rough approximation of the incentive structure:
Remember that Bitcoin miners are rewarded for adding entry 𝑖 after
entry 𝑖 + 100 is added to the ledger. Thus, by spending “work” (which
directly corresponds to CPU cycles, which directly corresponds to
monetary value), miners are “betting” on whether a particular ledger
will “win”. Think of yourself as a miner, and consider a scenario in
which there are two competing ledgers. Ledger 1 has length 3 and
Ledger 2 has length 6. That means miners have put roughly 2x the
amount of work (= CPU cycles = money) into Ledger 2. In order for
Ledger 1 to “win” (from your perspective that means reach length
104 to claim your prize and to become longer than Ledger 2), you
would have to perform 3 entries worth of work just to get Ledger 1 to
length 6. But in the meantime, other miners will already be working on
Ledger 2, further increasing its length! Thus you want to add entries
to Ledger 2.
If a ledger 𝐿 corresponds to the majority of the cycles that were
available in this network then every honest party would accept it, as
any alternative ledger would be necessarily shorter. (See Fig. 7.1.)
Thus one can hope that the consensus ledger will continue to grow.
(This is a rather hand-wavy and imprecise argument, see this paper
for a more in depth analysis; this is also related to the phenomenon
known as preferential attachment.)
Definition 7.3 — Collision resistant hash functions. A collection {ℎ𝑘} of functions where ℎ𝑘 ∶ {0, 1}^∗ → {0, 1}^𝑛 for 𝑘 ∈ {0, 1}^𝑛 is a collision resistant hash function (CRH) collection if the map (𝑘, 𝑥) ↦ ℎ𝑘(𝑥) is efficiently computable and for every efficient adversary 𝐴, the probability over 𝑘 that 𝐴(𝑘) = (𝑥, 𝑥′) such that 𝑥 ≠ 𝑥′ and ℎ𝑘(𝑥) = ℎ𝑘(𝑥′) is negligible.⁶
⁶ Note that the other side of the birthday bound shows that you can always find a collision in ℎ𝑘 using roughly 2^{𝑛/2} queries. For this reason we typically need to double the output length of hash functions compared to the key size of other cryptographic primitives (e.g., 256 bits as opposed to 128 bits).

Once more we do not know a theorem saying that under the PRG conjecture there exists a collision resistant hash function collection, even though this property is considered as one of the desiderata for cryptographic hash functions. However, we do know how to obtain collections satisfying this condition under various assumptions that we will see later in the course, such as the learning with error problem and the factoring and discrete logarithm problems. Furthermore, if we consider the weaker notion of security under a second preimage attack (also known as being a "universal one way hash function" or UOWHF), then it is known how to derive such a function from the PRG assumption.
R
Remark 7.4 — CRH vs PRF. A collection {ℎ𝑘 } of colli-
sion resistant hash functions is an incomparable object
to a collection {𝑓𝑠 } of pseudorandom functions with
the same input and output lengths. On one hand,
the condition of being collision-resistant does not
imply that ℎ𝑘 is indistinguishable from random. For
example, it is possible to construct a valid collision
resistant hash function where the first output bit al-
ways equals zero (and hence is easily distinguishable
from a random function). On the other hand, unlike
Definition 4.1, the adversary of Definition 7.3 is not
merely given a “black box” to compute the hash func-
tion, but rather the key to the hash function. This is a
much stronger attack model, and so a PRF does not
have to be collision resistant. (Constructing a PRF that
is not collision resistant is a nice and recommended
exercise.)
Theorem 7.5 — Merkle-Damgard preserves collision resistance. Let 𝐻 be constructed from ℎ as above. Then given two messages 𝑚 ≠ 𝑚′ ∈ {0, 1}^{𝑡𝑛} such that 𝐻(𝑚) = 𝐻(𝑚′) we can efficiently find two messages 𝑥 ≠ 𝑥′ ∈ {0, 1}^{2𝑛} such that ℎ(𝑥) = ℎ(𝑥′).
Proof. The intuition behind the proof is that if ℎ was invertible then
we could invert 𝐻 by simply going backwards. Thus in principle if
a collision for 𝐻 exists then so does a collision for ℎ. Now of course
this is a vacuous statement since both ℎ and 𝐻 shrink their inputs and
hence clearly have collisions. But we want to show a constructive proof
for this statement that will allow us to transform a collision in 𝐻 to
a collision in ℎ. This is very simple. We look at the computation of
𝐻(𝑚) and 𝐻(𝑚′ ) and at the first block in which the inputs differ but
the output is the same (there must be such a block). This block will
yield a collision for ℎ.
■
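A minimal Python sketch of the Merkle-Damgard iteration, under one common convention (start from an all-zero initial value and absorb one block at a time); SHA-256 stands in for the compression function ℎ:

import hashlib

N = 32  # block/digest size in bytes for this sketch

def h(x: bytes) -> bytes:
    # Compression function stand-in h: {0,1}^{2n} -> {0,1}^n, modeled
    # with SHA-256 (the theorem assumes h is collision resistant).
    assert len(x) == 2 * N
    return hashlib.sha256(x).digest()

def H(m: bytes) -> bytes:
    # Merkle-Damgard: feed each message block, together with the
    # running digest, into h.
    assert len(m) % N == 0        # the theorem treats m in {0,1}^{tn}
    state = bytes(N)              # the initial value: n zero bits
    for i in range(0, len(m), N):
        state = h(state + m[i:i + N])
    return state

print(H(b"a" * 64).hex())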
a. For every function ℎ ∶ {0, 1}^{1024} → {0, 1}^{128} there exist two strings 𝑥 ≠ 𝑥′ in {0, 1}^{1024} such that ℎ(𝑥) = ℎ(𝑥′).
Exercise 7.2 Suppose that ℎ ∶ {0, 1}^{1024} → {0, 1}^{128} is chosen at random. If 𝑦 is chosen at random in {0, 1}^{128} and we pick 𝑥1, … , 𝑥𝑡 independently at random in {0, 1}^{1024}, how large does 𝑡 need to be so that the probability that there is some 𝑥𝑖 such that ℎ(𝑥𝑖) = 𝑦 is at least 1/2? (Pick the answer with the closest estimate):

a. 2^{1024}
b. 2^{256}
c. 2^{128}
d. 2^{64}
b. It is always secure.
c. It is always insecure.
■
8 Key derivation, protecting passwords, slow hashes, Merkle trees
there were only 520 such numbers selected in the last 10 years). More-
over, if they knew exactly what draw I based my password on, then
they would know it exactly and hence the entropy (from their point of
view) would be zero. This is worthwhile to emphasize:
The entropy of a secret is always measured with respect to
the attacker’s point of view.
P
A natural approach is to simply let the key be the
password. For example, if the password 𝑝 is a string
of at most 16 bytes, then we can simply treat it as a
128 bit key and use it for encryption. Stop and think
why this would not be a good idea. In particular think
of an example of a secure encryption (𝐸, 𝐷) and a
distribution 𝑃 over {0, 1}𝑛 of entropy at least 𝑛/2 such
that if the key 𝑘 is chosen at random from 𝑃 then the
encryption will be completely insecure.
clear but always in this "slow hashed and salted" form, so if the passwords file falls into the hands of an adversary it will be expensive to recover them.
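For instance, here is a minimal sketch of such salted, deliberately slow hashing using PBKDF2 from Python's standard library (the iteration count below is an illustrative choice):

import hashlib, os

ITERATIONS = 200_000  # makes each password guess expensive

def hash_password(password: bytes, salt: bytes = None):
    # Salted, deliberately slow password hash via PBKDF2-HMAC-SHA256.
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS)
    return salt, digest

def check_password(password: bytes, salt: bytes, digest: bytes) -> bool:
    return hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS) == digest

salt, digest = hash_password(b"correct horse battery staple")
assert check_password(b"correct horse battery staple", salt, digest)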
Alice, who sends 𝑥 to the cloud Bob, will keep the short block 𝑦.
Whenever Alice queries the value 𝑖 she will ask for a certificate that 𝑥𝑖
is indeed the right value. This certificate will consist of the block that
contains 𝑖, as well as all of the 2 log 𝑡 blocks that were used in the hash
from this block to the root. The security of this scheme follows from
the following simple theorem:
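Concretely, the certificate mechanism just described can be sketched as follows, with SHA-256 standing in for the hash function and the number of blocks assumed to be a power of two:

import hashlib

def node_hash(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()

def merkle_root(blocks):
    # Build the hash tree bottom-up; Alice keeps only the root y.
    level = [hashlib.sha256(b).digest() for b in blocks]
    while len(level) > 1:
        level = [node_hash(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

def merkle_path(blocks, i):
    # The certificate for block i: the sibling hash at every level.
    level = [hashlib.sha256(b).digest() for b in blocks]
    path = []
    while len(level) > 1:
        path.append(level[i ^ 1])
        level = [node_hash(level[j], level[j + 1])
                 for j in range(0, len(level), 2)]
        i //= 2
    return path

def verify(root, block, i, path):
    # Alice's check: hash the claimed block up the path to the root.
    cur = hashlib.sha256(block).digest()
    for sib in path:
        cur = node_hash(cur, sib) if i % 2 == 0 else node_hash(sib, cur)
        i //= 2
    return cur == root

blocks = [bytes([j]) * 8 for j in range(8)]
root = merkle_root(blocks)
assert verify(root, blocks[5], 5, merkle_path(blocks, 5))
assert not verify(root, b"forged!!", 5, merkle_path(blocks, 5))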
How do hash functions figure into this? The idea is that if an input
𝑥 has 𝑛 bits of entropy then ℎ(𝑥) would still have the same bits of
entropy, as long as its output is larger than 𝑛. In practice people use
the notion of “entropy” in a rather loose sense, but we will try to be
more precise below.
probability ℎ is an extractor.
R
Remark 8.4 — Statistical randomness. This proof ac-
tually proves a much stronger statement. First, note
that we did not at all use the fact that 𝑇 is efficiently
computable and hence the distribution ℎ𝑠 (𝑋) will
not be merely pseudorandom but actually statistically indistinguishable from a truly random distribution. Second, we didn't use the fact that ℎ is completely random but rather what we needed was merely pairwise independence: that for every 𝑥 ≠ 𝑥′ and 𝑦, Pr𝑠[ℎ𝑠(𝑥) = ℎ𝑠(𝑥′) = 𝑦] = 2^{−2𝑛}. There are efficient
constructions of functions ℎ(⋅) with this property,
though in practice people still often use cryptographic
hash functions for this purpose.
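For instance, here is one standard efficient pairwise independent family (a sketch with an illustrative Mersenne prime; the 2^{−2𝑛} collision property above corresponds to the 1/𝑝² probability here):

```python
import random

P = (1 << 61) - 1   # a Mersenne prime, so arithmetic mod P is cheap

def sample_hash():
    """Sample h_{a,b}(x) = a*x + b mod P. For fixed x != x' the pair
    (h(x), h(x')) is uniform over pairs, so
    Pr_{a,b}[h(x) = h(x') = y] = 1/P**2 for every y."""
    a, b = random.randrange(P), random.randrange(P)
    return lambda x: (a * x + b) % P
```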
We only found out much later that in the late 1960’s, a few years be-
fore Merkle, James Ellis of the British Intelligence agency GCHQ was
having similar thoughts. His curiosity was spurred by an old World
War II manuscript from Bell labs that suggested the following way
that two people could communicate securely over a phone line. Alice
would inject noise to the line, Bob would relay his messages, and then
Alice would subtract the noise to get the signal. The idea is that an
adversary over the line sees only the sum of Alice’s and Bob’s signals
and doesn’t know what came from what. This got James Ellis thinking
whether it would be possible to achieve something like that digitally.
As he later recollected, in 1970 he realized that in principle this should
be possible. He could think of a hypothetical black box 𝐵 that on
input a “handle” 𝛼 and plaintext 𝑝 would give a “ciphertext” 𝑐. There
would be a secret key 𝛽 corresponding to 𝛼 such that feeding 𝛽 and 𝑐
to the box would recover 𝑝. However, Ellis had no idea how to actu-
ally instantiate this box. He and others kept giving this question as a
puzzle to bright new recruits until one of them, Clifford Cocks, came
up in 1973 with a candidate solution loosely based on the factoring
problem; in 1974 another GCHQ recruit, Malcolm Williamson, came
up with a solution using modular exponentiation.
But among all those thinking of public key cryptography, probably
the people who saw the furthest were two researchers at Stanford,
Whit Diffie and Martin Hellman. They realized that with the advent
of electronic communication, cryptography would find new applica-
tions beyond the military domain of spies and submarines. And they
understood that in this new world of many users and point to point
communication, cryptography would need to scale up. They envi-
sioned an object which we now call “trapdoor permutation” though
they called it “one way trapdoor function” or sometimes simply “pub-
lic key encryption”. This is a collection of permutations {𝑝𝑘 } where
𝑝𝑘 is a permutation over (say) {0, 1}|𝑘| , and the map (𝑥, 𝑘) ↦ 𝑝𝑘 (𝑥)
is efficiently computable but the reverse map (𝑘, 𝑦) ↦ 𝑝𝑘−1 (𝑦) is com-
putationally hard. Yet, there is also some secret key 𝑠(𝑘) (i.e., the
“trapdoor”) such that using 𝑠(𝑘) it is possible to efficiently compute
𝑝𝑘−1 . Their idea was that using such a trapdoor permutation, Alice
who knows 𝑠(𝑘) would be able to publish 𝑘 on some public file such
that everyone who wants to send her a message 𝑥 could do so by com-
puting 𝑝𝑘 (𝑥). (While today we know, due to the work of Goldwasser
and Micali, that such a deterministic encryption is not a good idea,
at the time Diffie and Hellman had amazing intuitions but didn’t re-
ally have proper definitions of security.) But they didn’t stop there.
They realized that protecting the integrity of communication is no
less important than protecting its secrecy. Thus, they imagined that
Alice could “run encryption in reverse” in order to certify or sign mes-
sages. That is, given some message 𝑚, Alice would send the value
𝑥 = 𝑝𝑘−1 (ℎ(𝑚)) (for a hash function ℎ) as a way to certify that she en-
dorses 𝑚, and every person who knows 𝑘 could verify this by check-
ing that 𝑝𝑘 (𝑥) = ℎ(𝑚).
At this point, Diffie and Hellman were in a position similar to past
physicists, who predicted that a certain particle should exist but had
no experimental verification. Luckily they met Ralph Merkle. His
ideas about a probabilistic key exchange protocol, together with a sug-
gestion from their Stanford colleague John Gill, inspired them to come
up with what today is known as the Diffie-Hellman Key Exchange (un-
beknownst to them, a similar protocol was found two years earlier at
GCHQ by Malcolm Williamson). They published their paper “New
Directions in Cryptography” in 1976, and it is considered to have
brought about the birth of modern cryptography. However, they still
didn’t find their elusive trapdoor function. This was done the next
year by Rivest, Shamir and Adleman who came up with the RSA trap-
door function, which through the framework of Diffie and Hellman
yielded not just encryption but also signatures (this was essentially
the same function discovered earlier by Clifford Cocks at GCHQ,
though as far as I can tell Cocks, Ellis and Williamson did not real-
ize the application to digital signatures). From this point on began a
flurry of advances in cryptography which hasn't really died down to
this day.
ing proofs in one or both of these books. It is often helpful to see the
same proof presented in a slightly different way. Below is a review of
some of the various reductions we saw in class, with pointers to the
corresponding sections in the Katz-Lindell (2nd ed) and Boneh-Shoup
books. These are also covered in Rosulek’s book.
One major point we did not talk about in this course was one way
functions. The definition of a one way function is quite simple:
Definition 9.1 — One Way Functions. A function 𝑓 ∶ {0, 1}∗ → {0, 1}∗ is
a one way function if it is efficiently computable and for every 𝑛 and
a 𝑝𝑜𝑙𝑦(𝑛) time adversary 𝐴, the probability over 𝑥 ←𝑅 {0, 1}𝑛 that
𝐴(𝑓(𝑥)) outputs 𝑥′ such that 𝑓(𝑥′ ) = 𝑓(𝑥) is negligible.
Theorem 9.2 — One way functions and private key cryptography. The follow-
ing are equivalent:
R
Remark 9.3 — Cryptanalytic attacks on private key cryp-
tosystems. Another topic we did not discuss in depth
is attacks on private key cryptosystems. These attacks
often work by “opening the black box” and looking at
the internal operation of block ciphers or hash func-
tions. We then assign variables to various internal
registers, and look to find collections of inputs that
would satisfy some non-trivial relation between those
variables. This is a rather vague description, but you
can read KL Section 6.2.6 on linear and differential
cryptanalysis and BS Sections 3.7-3.9 and 4.3 for more
information. See also this course of Adi Shamir, and
the courses of Dunkelman on analyzing block ciphers
and hash functions. There is also the fascinating area
of side channel attacks on both public and private key
crypto, see this course of Tromer.
R
Remark 9.4 — Digital Signatures. We will discuss in
this lecture Digital signatures, which are the public key
analog of message authentication codes. Surprisingly,
despite being a “public key” object, it is possible to
base digital signatures on one-way functions (this is
obtained using ideas of Lamport, Merkle, Goldwasser-
Goldreich-Micali, Naor-Yung, and Rompel). However
these constructions are not very efficient (and this
may be inherent), and so in practice people use digital
Definition 9.6 — CPA security for public-key encryption. We say that
(𝐺, 𝐸, 𝐷) is CPA secure if every efficient adversary 𝐴 wins the
following game with probability at most 1/2 + 𝑛𝑒𝑔𝑙(𝑛):
• (𝑒, 𝑑) ←𝑅 𝐺(1𝑛 )
P
Despite it being a “chosen plaintext attack”, we don’t
explicitly give 𝐴 access to the encryption oracle in the
public key setting. Make sure you understand why
giving it such access would not give it more power.
Diffie and Hellman couldn’t really find a way to make this work,
but it convinced them this notion of public key is not inherently im-
possible. This concept of compiling a program into a functionally
equivalent but "inscrutable" form is known as software obfuscation. It
has turned out to be quite a tricky object to both define formally and
achieve, but it serves as very good intuition for what can be achieved,
even if, as with the random oracle, this intuition can sometimes be
too optimistic. (Indeed, if software obfuscation was possible then we
could obtain a “random oracle like” hash function by taking the code
of a function 𝑓𝑘 chosen from a PRF family and compiling it through an
obfuscating compiler.)
We will not formally define obfuscators yet, but on an intuitive
level it would be a compiler that takes a program 𝑃 and maps into a
program 𝑃 ′ such that:
when 𝑝 is a prime of length about 2048 bits.³

John Gill suggested to Diffie and Hellman that modular exponentiation can be a good source for the kind of "easy-to-compute but hard-to-invert" functions they were looking for. Diffie and Hellman based a public key encryption scheme as follows:

• The key generation algorithm, on input 𝑛, samples an 𝑛-bit prime number 𝑝 (i.e., between 2^{𝑛−1} and 2^𝑛), a number 𝑔 ←_R ℤ𝑝 and 𝑎 ←_R {0, … , 𝑝 − 1}. We also sample a hash function 𝐻 ∶ {0, 1}^𝑛 → {0, 1}^ℓ. The public key 𝑒 is (𝑝, 𝑔, 𝑔^𝑎, 𝐻), while the secret key 𝑑 is 𝑎.⁴

• The encryption algorithm, on input a message 𝑚 ∈ {0, 1}^ℓ and a public key 𝑒 = (𝑝, 𝑔, ℎ, 𝐻), will choose a random 𝑏 ←_R {0, … , 𝑝 − 1} and output (𝑔^𝑏, 𝐻(ℎ^𝑏) ⊕ 𝑚).

• The decryption algorithm, on input a ciphertext (𝑓, 𝑦) and the secret key, will output 𝐻(𝑓^𝑎) ⊕ 𝑦.

³ The running time of the best known algorithms for computing the discrete logarithm modulo 𝑛-bit primes is 2^{𝑓(𝑛)⋅𝑛^{1/3}}, where 𝑓(𝑛) is a function that depends polylogarithmically on 𝑛. If 𝑓(𝑛) were equal to 1, then we'd need numbers of 128³ ≈ 2⋅10⁶ bits to get 128 bits of security, but because 𝑓(𝑛) is larger than one, the current estimates are that we need an 𝑛 = 3072 bit key to get 128 bits of security. Still, the existence of such a non-trivial algorithm means that we need much larger keys than those used for private key systems to get the same level of security. In particular, to double the estimated security to 256 bits, NIST recommends that we multiply the RSA keysize five-fold to 15,360. (The same document also says that SHA-256 gives 256 bits of security as a pseudorandom generator but only 128 bits when used to hash documents for digital signatures; can you see why?)

⁴ Formally, the secret key should contain all the information in the public key plus the extra secret information, but we omit the public information for simplicity of notation.
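Here is a toy Python rendering of this scheme. The prime, the base element, and SHA-256 standing in for the oracle 𝐻 are all illustrative stand-ins; real deployments use standardized groups of 2048 or more bits:

```python
import hashlib, secrets

p, g = 2**127 - 1, 3    # toy prime and base element (illustrative only)

def H(z: int, ell: int) -> bytes:
    """SHA-256 in the role of the random oracle (so ell <= 32 bytes)."""
    return hashlib.sha256(str(z).encode()).digest()[:ell]

def keygen():
    a = secrets.randbelow(p - 1)
    return pow(g, a, p), a                    # public h = g^a, secret a

def encrypt(h: int, m: bytes):
    b = secrets.randbelow(p - 1)
    pad = H(pow(h, b, p), len(m))             # H(h^b) masks the message
    return pow(g, b, p), bytes(u ^ v for u, v in zip(pad, m))

def decrypt(a: int, ct):
    f, y = ct
    pad = H(pow(f, a, p), len(y))             # f^a = g^{ab} = h^b
    return bytes(u ^ v for u, v in zip(pad, y))

pub, sec = keygen()
assert decrypt(sec, encrypt(pub, b"attack at dawn")) == b"attack at dawn"
```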
P
Please take your time to re-read the following conjec-
ture until you are sure you understand what it means.
Victor Shoup's excellent book A Computational
Introduction to Number Theory and Algebra, which is
available online, has an in-depth treatment of groups,
generators, and the discrete log and Diffie-Hellman problem.
See also Chapters 10.4 and 10.5 in the Boneh-Shoup
book, and Chapters 8.3 and 11.4 in the Katz-Lindell
book. There are also solved group theory exercises at
the end of this chapter.
Theorem 9.8 — Diffie-Hellman security in Random Oracle Model. Suppose
that the Computational Diffie-Hellman Conjecture for mod prime
groups is true. Then, the Diffie-Hellman public key encryption is
CPA secure in the random oracle model.
Proof. For CPA security we need to prove that (for fixed 𝔾 of size 𝑝
and random oracle 𝐻) the following two distributions are computa-
tionally indistinguishable for every two strings 𝑚, 𝑚′ ∈ {0, 1}ℓ :
(can you see why this implies CPA security? you should pause here
and verify this!)
We make the following claim:
CLAIM: For a fixed 𝔾 of size 𝑝, generator 𝑔 for 𝔾, and given random
oracle 𝐻, if there is a size 𝑇 distinguisher 𝐴 with 𝜖 advantage between
the distribution (𝑔𝑎 , 𝑔𝑏 , 𝐻(𝑔𝑎𝑏 )) and the distribution (𝑔𝑎 , 𝑔𝑏 , 𝑈ℓ ) (where
𝑎, 𝑏 are chosen uniformly and independently in ℤ𝑝 ), then there is a
size 𝑝𝑜𝑙𝑦(𝑇) algorithm 𝐴′ to solve the Diffie-Hellman problem with
respect to 𝔾, 𝑔 with success at least 𝜖/(2𝑇). That is, for random 𝑎, 𝑏 ∈ ℤ𝑝,
𝐴′(𝑔, 𝑔^𝑎, 𝑔^𝑏) = 𝑔^{𝑎𝑏} with probability at least 𝜖/(2𝑇).
Proof of claim: The proof is simple. We claim that under the as-
sumptions above, 𝐴 makes the query 𝑔𝑎𝑏 to its oracle 𝐻 with probabil-
ity at least 𝜖/2 since otherwise, by the “lazy evaluation” paradigm, we
can assume that 𝐻(𝑔𝑎𝑏 ) is chosen independently at random after 𝐴’s
attack is completed and hence (conditioned on the adversary not mak-
ing that query), the value 𝐻(𝑔𝑎𝑏 ) is indistinguishable from a uniform
output. Therefore, on input 𝑔, 𝑔𝑎 , 𝑔𝑏 , 𝐴′ can simulate 𝐴 and simply
output one of the at most 𝑇 queries that 𝐴 makes to 𝐻 at random and
will be successful with probability at least 𝜖/(2𝑇 ).
Now given the claim, we can complete the proof of security via the
following hybrids. Define the following “hybrid” distributions (where
in all cases 𝑎, 𝑏 are chosen uniformly and independently in ℤ𝑝 ):
• 𝐻0 : (𝑔𝑎 , 𝑔𝑏 , 𝐻(𝑔𝑎𝑏 ) ⊕ 𝑚)
• 𝐻1 : (𝑔𝑎 , 𝑔𝑏 , 𝑈ℓ ⊕ 𝑚)
• 𝐻2 : (𝑔𝑎 , 𝑔𝑏 , 𝑈ℓ ⊕ 𝑚′ )
• 𝐻3 : (𝑔𝑎 , 𝑔𝑏 , 𝐻(𝑔𝑎𝑏 ) ⊕ 𝑚′ )
R
Remark 9.9 — Decisional Diffie Hellman. One can get
security results for this protocol without a random
oracle if we assume a stronger variant known as
the Decisional Diffie-Hellman (DDH) assumption:
for a random 𝑎, 𝑏, 𝑢 ∈ ℤ𝑝 (prime 𝑝), the triple
(𝑔𝑎 , 𝑔𝑏 , 𝑔𝑎𝑏 ) ≈ (𝑔𝑎 , 𝑔𝑏 , 𝑔𝑢 ). This implies CDH (can
you see why?). DDH also restricts our focus to groups
of prime order. In particular, DDH does not hold in
even-order groups. For example, DDH does not hold
in ℤ∗𝑝 = {1, 2, … , 𝑝 − 1} (with group operation multipli-
cation mod 𝑝) since half of its elements are quadratic
residues and it is efficient to test if an element is a
quadratic residue using Fermat’s little theorem (can
you see why? See Exercise 10.7). However, DDH
holds in subgroups of ℤ𝑝 of prime order. If 𝑝 is a safe
prime (i.e. 𝑝 = 2𝑞 + 1 for a prime 𝑞), then we can
instead use the subgroup of quadratic residues, which
has prime order 𝑞. See Boneh-Shoup 10.4.1 for more
details on the underlying groups for CDH and DDH.
R
Remark 9.10 — Elliptic curve cryptography. As men-
tioned, the Diffie-Hellman systems can be run with
many variants of Abelian groups. Of course, for some
of those groups the discrete logarithm problem might
be easy, and so they would be inappropriate to use
for this system. One variant that has been proposed
is elliptic curve cryptography. This is a group consist-
ing of points of the form (𝑥, 𝑦, 𝑧) ∈ ℤ_𝑝³ that satisfy a
certain equation, where multiplication can be defined
in a certain way. The main advantage of elliptic curve
cryptography is that the best known algorithms run in
time 2^{≈𝑛/2} as opposed to 2^{≈𝑛^{1/3}}, which allows
for much smaller key sizes.
R
Remark 9.11 — Encryption vs Key Exchange and ElGamal.
In most of the cryptography literature the protocol
above is called the Diffie-Hellman Key Exchange pro-
tocol, and when considered as a public key system
it is sometimes known as ElGamal encryption. 6 The
reason for this mostly stems from the early confusion
on what the right security definitions are. Diffie and
Hellman thought of encryption as a deterministic pro-
cess and so they called their scheme a “key exchange
protocol”. The work of Goldwasser and Micali showed
that encryption must be probabilistic for security.
Also, because of efficiency considerations, these days
public key encryption is mostly used as a mechanism
to exchange a key for a private key encryption that is
then used for the bulk of the communication. Together
this means that there is not much point in distinguish-
ing between a two-message key exchange algorithm
and a public key encryption.
⁶ ElGamal's actual contribution was to design a signature scheme based on the Diffie-Hellman problem, a variant of which is the Digital Signature Algorithm (DSA) described below.

9.3.2 Sampling random primes
Proof. Recall that the least common multiple (LCM) of two or more
numbers 𝑎1, … , 𝑎𝑡 is the smallest number that is a multiple of all of the 𝑎𝑖's. One
way to compute the LCM of 𝑎1 , … , 𝑎𝑡 is to take the prime factoriza-
tions of all the 𝑎𝑖 ’s, and then the LCM is the product of all the primes
that appear in these factorizations, each taken to the corresponding
highest power that appears in the factorization. Let 𝑘 be the number
of primes between 1 and 𝑁 . The lemma will follow from the following
two claims:
CLAIM 1: LCM(1, … , 𝑁 ) ≤ 𝑁 𝑘 .
CLAIM 2: If 𝑁 is odd, then LCM(1, … , 𝑁 ) ≥ 2𝑁−1 .
The two claims immediately imply the result, since they imply
that 2𝑁−1 ≤ 𝑁 𝑘 , and taking logs we get that 𝑁 − 1 ≤ 𝑘 log 𝑁 or
𝑘 ≥ (𝑁 − 1)/ log 𝑁. (We can assume that 𝑁 is odd without loss of
generality, since changing from 𝑁 to 𝑁 + 1 can change the number of
primes by at most one.) Thus, all that is left is to prove the two claims.
Proof of CLAIM 1: Let 𝑝1, … , 𝑝𝑘 be all the prime numbers between
1 and 𝑁, let 𝑒𝑖 be the largest integer such that 𝑝𝑖^{𝑒𝑖} ≤ 𝑁, and let
𝐿 = 𝑝1^{𝑒1} ⋯ 𝑝𝑘^{𝑒𝑘}. Then 𝐿 = LCM(1, … , 𝑁), and since each of the
𝑘 factors 𝑝𝑖^{𝑒𝑖} is at most 𝑁, we get 𝐿 ≤ 𝑁^𝑘.

Proof of CLAIM 2: One considers the integral 𝐼 = ∫₀¹ 𝑥^{(𝑁−1)/2}(1 − 𝑥)^{(𝑁−1)/2} 𝑑𝑥, which satisfies

2^{−𝑁+1} ≥ 𝐼 ≥ 1/LCM(1, … , 𝑁)
The following basic facts are all not too hard to prove and would be
useful exercises:
P
Try to stop here and verify all the facts on groups
mentioned above. There are additional group theory
exercises at the end of the chapter as well.
Definition 9.13 — Digital Signatures and CMA security. A triple of algo-
rithms (𝐺, 𝑆, 𝑉) is a chosen-message-attack secure digital signature
scheme if it satisfies the following:
R
Remark 9.14 — Strong unforgeability. Just like for MACs
(see Definition 4.8), our definition of security for
digital signatures with respect to a chosen message
attack does not preclude the ability of the adversary
to produce a new signature for the same message that
it has seen a signature of. Just like in MACs, people
sometimes consider the notion of strong unforgeability
which requires that it would not be possible for the
adversary to produce a new message-signature pair
(even if the message itself was queried before). Some
signature schemes (such as the full domain hash and
the DSA scheme) satisfy this stronger notion while
others do not. However, just like MACs, it is possible
to transform any signature with standard security into
The first issue is not so significant, since we can always have the
ciphertext be an encryption of 𝑥 = 𝐻(𝑚) where 𝐻 is some hash
function presumed to behave as a random oracle. (We do not want to
simply run this protocol with 𝑥 = 𝑚. Can you see why?)
The second issue is more serious. We could imagine Alice trying
to run this protocol on her own by generating the ciphertext and then
decrypting it, and then sending over the transcript to Bob. But this
does not really prove that she knows the corresponding private key.
After all, even without knowing 𝑑, any party can generate a ciphertext
𝑐 and its corresponding decryption. The idea behind the DSA protocol
is that we require Alice to generate a ciphertext 𝑐 and its decryption
satisfying some additional conditions, which would prove that
Alice truly knew the secret key.
P
You should pause here and verify that this is indeed a
valid signature scheme, in the sense that for every 𝑚,
𝑉𝑠 (𝑚, 𝑆𝑠 (𝑚)) = 1.
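As a concrete (and deliberately tiny) illustration, here is a sketch built around the verification equation 𝑔^{𝐻(𝑚)}ℎ^{𝐹(𝑓)} = 𝑓^𝜎 that the analysis below uses; the safe prime 𝑝 = 2𝑞 + 1 = 467 and the SHA-256-based 𝐻 and 𝐹 are my own illustrative choices, not the book's parameters:

```python
import hashlib, secrets

p, q, g = 467, 233, 4    # toy safe prime p = 2q+1; g = 4 generates the
                         # order-q subgroup of quadratic residues mod p

def H(m: bytes) -> int:
    return int.from_bytes(hashlib.sha256(b"H" + m).digest(), "big") % q

def F(f: int) -> int:
    return int.from_bytes(hashlib.sha256(b"F" + f.to_bytes(2, "big")).digest(), "big") % q

def keygen():
    a = secrets.randbelow(q - 1) + 1
    return pow(g, a, p), a                    # public h = g^a, secret a

def sign(a: int, m: bytes):
    while True:
        b = secrets.randbelow(q - 1) + 1      # fresh nonce per signature
        f = pow(g, b, p)
        sigma = (H(m) + a * F(f)) * pow(b, -1, q) % q
        if sigma:                             # b*sigma = H(m) + a*F(f) mod q
            return f, sigma

def verify(h: int, m: bytes, sig) -> bool:
    f, sigma = sig
    return pow(g, H(m), p) * pow(h, F(f), p) % p == pow(f, sigma, p)

h_pub, a_sec = keygen()
assert verify(h_pub, b"msg", sign(a_sec, b"msg"))   # validity check
```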
Very roughly speaking, the idea behind security is that on one hand
𝜎 does not reveal information about 𝑏 and 𝑎 because this is “masked”
by the “random” value 𝐻(𝑚). On the other hand, if an adversary is
able to come up with valid signatures, then at least if we treated 𝐻
and 𝐹 as oracles, if the signature passes verification then (by taking
log to the base of 𝑔) the answers 𝑥, 𝑦 of these oracles will satisfy 𝑏𝜎 =
𝑥 + 𝑎𝑦, which means that sufficiently many such equations should be
enough to recover the discrete log 𝑎.
P
Before seeing the actual proof, it is a very good exer-
cise to try to see how to convert the intuition above
into a formal proof.
Theorem 9.15 — Random-Oracle Model Security of DSA signatures. Suppose
that the discrete logarithm assumption holds for the group 𝔾. Then
the DSA signature with 𝔾 is secure when 𝐻, 𝐹 are modeled as
random oracles.
Proof. Suppose, for the sake of contradiction, that there was a 𝑇 -time
adversary 𝐴 that succeeds with probability 𝜖 in a chosen message
attack against the DSA scheme. We will show that there is an adver-
sary that can compute the discrete logarithm with running time and
probability polynomially related to 𝑇 and 𝜖 respectively.
Recall that in a chosen message attack in the random oracle model,
the adversary interacts with a signature oracle and oracles that com-
pute the functions 𝐹 and 𝐻. For starters, we consider the following
experiment CMA′, where in the chosen message attack we replace the
signature box with the following "fake signature oracle" and "fake
function 𝐹 oracle":
𝑓 = (𝑔^{𝐻(𝑚)} ℎ^𝑟)^{𝜎^{−1}} mod 𝑝     (9.1)

and output (𝑓, 𝜎). We will then record the value 𝐹(𝑓) = 𝑟 and answer
𝑟 on future queries to 𝐹. If we've already answered 𝐹(𝑓) before with a
different value, then we halt the experiment and output an error. We
claim that the adversary's chance of succeeding in CMA′ is only negligibly
different from its chance of succeeding in the original experiment.

and

𝑔^{𝐻(𝑚^∗)} ℎ^{𝐹(𝑓^∗)} = (𝑓^∗)^{𝜎^∗}

and

𝐻(𝑚^∗) + 𝑎𝐹(𝑓^∗) = 𝑏𝜎^∗
or

𝑏 = (𝐻(𝑚) − 𝐻(𝑚^∗))(𝜎 − 𝜎^∗)^{−1} mod 𝑝
since all of the values 𝐻(𝑚∗ ), 𝐻(𝑚), 𝜎, 𝜎∗ are known, this means we
can compute 𝑏, and hence also recover the unknown value 𝑎.
If Case II happens, then we split it into two cases as well.
Case IIa is the subcase of Case II where 𝐹 (𝑓 ∗ ) is queried before
𝐻(𝑚^∗) is queried, and Case IIb is the subcase of Case II when 𝐹(𝑓^∗)
is queried after 𝐻(𝑚^∗) is queried.
We start by considering the setting where Case IIa happens with
some non-negligible probability 𝜖. By the averaging argument there are
some 𝑡′ < 𝑡 ∈ {1, … , 𝑇 } such that with probability at least 𝜖/𝑇 2 , 𝑓 ∗ is
queried by the adversary at the 𝑡′ -th query and 𝑚∗ is queried by the
adversary at its 𝑡-th query. We run the CMA′ experiment twice, using
the same randomness up to the 𝑡-th query and answering the query 𝑚^∗
to 𝐻 with fresh independent values in the two runs. We obtain the equations

𝐻1(𝑚^∗) + 𝑎𝐹(𝑓^∗) = 𝑏𝜎

and

𝐻2(𝑚^∗) + 𝑎𝐹(𝑓^∗) = 𝑏𝜎^∗
where 𝐻1 (𝑚∗ ) and 𝐻2 (𝑚∗ ) are the answers of 𝐻 to the query 𝑚∗ in
the first and second time we run the experiment. (The answers of 𝐹 to
𝑓 ∗ are the same since this happens before the 𝑡-th step). As before, we
can use this to recover 𝑎 = log𝑔 ℎ.
If Case IIb happens with some non-negligible probability 𝜖 > 0, then
again by the averaging argument there are some 𝑡 < 𝑡′ ∈ {1, … , 𝑇 }
such that with probability at least 𝜖/𝑇 2 , 𝑚∗ is queried by the adversary
at the 𝑡-th query, and 𝑓 ∗ is queried by the adversary at its 𝑡′ -th query.
We run the CMA′ experiment twice, using the same randomness up
to the 𝑡′-th query and answering the query 𝑓^∗ to 𝐹 with fresh
independent values in the two runs. We obtain the equations

𝐻(𝑚^∗) + 𝑎𝐹1(𝑓^∗) = 𝑏𝜎

and

𝐻(𝑚^∗) + 𝑎𝐹2(𝑓^∗) = 𝑏𝜎^∗
where 𝐹1 (𝑓 ∗ ) and 𝐹2 (𝑓 ∗ ) are our two answers in the first and second
experiment, and now we can use this to learn 𝑎 = 𝑏(𝜎 − 𝜎∗ )(𝐹1 (𝑓 ∗ ) −
𝐹2 (𝑓 ∗ ))−1 .
The bottom line is that we obtain a probabilistic polynomial time
algorithm that on input 𝔾, 𝑔, 𝑔𝑎 recovers 𝑎 with non-negligible proba-
bility, hence violating the assumption that the discrete log problem is
hard for the group 𝔾.
R
Remark 9.16 — Non-random oracle model security. In
this lecture both our encryption scheme and digital
signature schemes were not proven secure under a
well-stated computational assumption but rather used
the random oracle model heuristic. However, it is
known how to obtain schemes that do not rely on this
heuristic, and we will see such schemes later on in this
course.
45 bb ab d1 … 02 03 01 00 01" (a long hexadecimal dump of an RSA public key; the closing bytes 02 03 01 00 01 encode the common public exponent 𝑒 = 65537).
all the prime factors of (2𝑁 choose 𝑁) are between 0 and 2𝑁, and each factor
𝑃 cannot appear more than 𝑘 = ⌊log 2𝑁 / log 𝑃⌋ times. Indeed, for every 𝑁,

𝑁 ≤ log (2𝑁 choose 𝑁) ≤ ∑_{𝑃 prime ∈ [2𝑁]} ⌊log 2𝑁 / log 𝑃⌋ ⋅ log 𝑃 ≤ ∑_{𝑃 prime ∈ [2𝑁]} log 2𝑁
• For example, the integers (i.e. infinitely many elements) where the
operation is addition is a commutative group: if 𝑎, 𝑏, 𝑐 are integers,
then 𝑎 + 𝑏 = 𝑏 + 𝑎 (commutativity), (𝑎 + 𝑏) + 𝑐 = 𝑎 + (𝑏 +
𝑐) (associativity), 𝑎 + 0 = 𝑎 (so 0 is the identity element here;
we typically think of the identity as 1, especially when the group
operation is multiplication), and 𝑎 + (−𝑎) = 0 (i.e. for any integer,
we are allowed to think of its additive inverse, which is also an
integer).
For more familiarity with group definitions, you could verify that
the first 4 groups satisfy the group axioms. For cryptography, two
operations need to be efficient for elements 𝑎, 𝑏 in a group 𝔾:
Solution:
Yes (if multiplication) and no (if addition). To prove that some-
thing is a group, we run through the definition of a group. This
set is finite, and multiplication (even multiplication mod some
number) will satisfy commutativity and associativity. The identity
element is 1 because any number times 1, even mod 7, is still itself.
To find inverses, we can in this case literally find the inverses. 1 ∗ 1
mod 7 = 1 mod 7 (so the inverse of 1 is 1). 2 ∗ 4 mod 7 = 8
mod 7 = 1 mod 7 (so the inverse of 2 is 4, and from commutativ-
ity, the inverse of 4 is 2). 3 ∗ 5 mod 7 = 15 mod 7 = 1 mod 7 (so
the inverse of 3 is 5, and the inverse of 5 is 3). 6 ∗ 6 mod 7 = 36
mod 7 = 1 mod 7 (so 6 is its own inverse; notice that an element
can be its own inverse, even if it is not the identity 1). The set 𝑆
is not a group if the operation is addition, for many reasons: one
way to see this is that 1 + 6 mod 7 = 0 mod 7, but 0 is not an element
of 𝑆, so this set is not closed under its operation (implicit in the
definition of a group is the idea that a group's operation must send
two group elements to another element within the same set of group
elements).
■
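These claims are small enough to check by brute force (a quick sanity check, not part of the original solution):

```python
S = {1, 2, 3, 4, 5, 6}
assert all(a * b % 7 in S for a in S for b in S)        # closed under mult
assert all(any(a * b % 7 == 1 for b in S) for a in S)   # every a has inverse
assert not all((a + b) % 7 in S for a in S for b in S)  # addition escapes S
```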
Solved Exercise 9.2 What are the generators of the group {1, 2, 3, 4, 5, 6},
where the operation is multiplication mod 7?
■
Solution:
3 and 5. Recall that a generator of a group is an element 𝑔 such
that {𝑔, 𝑔², 𝑔³, ⋯} is the entire group. We can directly check the
elements here: {1, 1², 1³, ⋯} = {1}, so 1 is not a generator. 2 is
not a generator because 2³ mod 7 = 8 mod 7 = 1, so the set
{2, 2², 2³, 2⁴, ⋯} is really the set {2, 4, 1}, which is not the entire
group. 3 will be a generator because 3² mod 7 = 9 mod 7 = 2
mod 7, 3³ mod 7 = 2 ∗ 3 mod 7 = 6 mod 7, 3⁴ mod 7 = 18
mod 7 = 4 mod 7, 3⁵ mod 7 = 12 mod 7 = 5, 3⁶ mod 7 = 15
mod 7 = 1, so {3, 3², 3³, 3⁴, 3⁵, 3⁶} = {3, 2, 6, 4, 5, 1}, which are
all of the elements. 4 is not a generator because 4³ mod 7 = 64
mod 7 = 1 mod 7, so just like 2, we won't get every element. 5 is a
generator because 5² mod 7 = 4, 5³ mod 7 = 20 mod 7 = 6, 5⁴
mod 7 = 30 mod 7 = 2, 5⁵ mod 7 = 10 mod 7 = 3, and 5⁶
mod 7 = 15 mod 7 = 1. Finally, 6 is not a generator because 6²
mod 7 = 1, so {6, 6², ⋯} = {6, 1}.
Solved Exercise 9.3 What is the order of every element in the group
{1, 2, 3, 4, 5, 6}, where the operation is multiplication mod 7?
■
Solution:
The orders (of 1, 2, 3, 4, 5, 6) are 1, 3, 6, 3, 6, 2, respectively. This
can be seen from the work of the previous problem, where we test
out powers of elements. Notice that all of these orders divide the
number of elements in our group. This is not a coincidence, and it
is an example of Lagrange's Theorem, which states that the size of
every subgroup of a group divides the size (order) of the group. Recall
that a subgroup is simply a subset of the group which is a group in
its own right and is closed under the operation of the group.
■
Solution:
Suppose that 𝑎 ∈ 𝔾 and that 𝑏, 𝑐 ∈ 𝔾 such that 𝑎𝑏 = 1 and
𝑎𝑐 = 1. Then we know that 𝑎𝑏 = 𝑎𝑐, and then we can apply 𝑎−1 to
both sides (we are guaranteed that 𝑎 has SOME inverse 𝑎−1 in the
group), and so we have 𝑎−1 𝑎𝑏 = 𝑎−1 𝑎𝑐, but we know that 𝑎−1 𝑎 = 1
(and we can use associativity of a group), so (1)𝑏 = (1)𝑐 so 𝑏 = 𝑐.
QED.
■
Solution:
Suppose that 𝑐𝑎 = 𝑐 for all 𝑐 ∈ 𝔾 and that 𝑐𝑏 = 𝑐 for all 𝑐 ∈ 𝔾.
Then we can say that 𝑐𝑎 = 𝑐 = 𝑐𝑏 (for any 𝑐, but we can choose
some 𝑐 in particular; we could have picked 𝑐 = 1). And then 𝑐
has some inverse element 𝑐^{−1} in the group, so 𝑐^{−1}𝑐𝑎 = 𝑐^{−1}𝑐𝑏,
but 𝑐^{−1}𝑐 = 1, so 𝑎 = 𝑏.
The next few problems are related to quadratic residues, but these
problems are a bit more general (in particular, we are considering
some group, and the subgroup consisting of all elements of the
group that are squares).
Solved Exercise 9.6 Suppose that 𝔾 is some (finite, commutative) group,
and ℍ is the set defined by ℍ ∶= {ℎ ∈ 𝔾 ∶ ∃𝑔 ∈ 𝔾, 𝑔² = ℎ}. Verify that ℍ
is a subgroup of 𝔾.
■
Solution:
To be a subgroup, we need to make sure that ℍ is a group in
its own right (in particular, that it contains the identity, that it
contains inverses, and that it is closed under multiplication; asso-
ciativity and commutativity follow because we are within a larger
set 𝔾 which satisfies associativity and commutativity).
Identity: Well, 1² = 1, so 1 ∈ ℍ, so ℍ has the identity element.

Inverses: If ℎ ∈ ℍ, then 𝑔² = ℎ for some 𝑔 ∈ 𝔾, but 𝑔 has an inverse
in 𝔾, and we can look at 𝑔²(𝑔^{−1})² = (𝑔𝑔^{−1})² = 1² = 1 (where
I used commutativity and associativity, as well as the definition
of the inverse). It is clear that (𝑔^{−1})² ∈ ℍ because there exists an
element in 𝔾 (specifically, 𝑔^{−1}) whose square is (𝑔^{−1})². Therefore
ℎ has an inverse in ℍ: if ℎ = 𝑔², then ℎ^{−1} = (𝑔^{−1})².

Closure under operation: If ℎ1, ℎ2 ∈ ℍ, then there exist 𝑔1, 𝑔2 ∈ 𝔾 where
ℎ1 = (𝑔1)², ℎ2 = (𝑔2)². So ℎ1ℎ2 = (𝑔1)²(𝑔2)² = (𝑔1𝑔2)², so ℎ1ℎ2 ∈ ℍ.

Therefore, ℍ is a subgroup of 𝔾.
■
Solved Exercise 9.7 Assume that |𝔾| is an even number and is known,
and that 𝑔^{|𝔾|} = 1 for any 𝑔 ∈ 𝔾. Also assume that 𝔾 is a cyclic group,
Solution:
Suppose that we receive some element 𝑔 ∈ 𝔾. We want to know
if there exists some 𝑔′ ∈ 𝔾 such that 𝑔 = (𝑔′ )2 (this is equivalent to
R
Remark 10.1 — Note on 𝑛 bits vs a number 𝑛. One
aspect that is often confusing in number-theoretic
cryptography is that one always needs to keep track of
whether we are talking about "big" numbers
or "small" numbers. In many cases in crypto, we use
𝑛 to talk about our key size or security parameter,
in which case we think of 𝑛 as a "small" number of
size 100–1000 or so. However, when we work with
ℤ∗𝑚 we often think of 𝑚 as a "big" number having
about 100–1000 binary digits; that is, 𝑚 would be roughly
2^{100} to 2^{1000} or so. I will try to reserve the notation
R
Remark 10.2 — The number 𝑚 vs the message 𝑚. In
much of this course we use 𝑚 to denote a string
which is our plaintext message to be encrypted or
authenticated. In the context of integer factoring, it is
convenient to use 𝑚 = 𝑝𝑞 as the composite number
that is to be factored. To keep things interesting (or
more honestly, because I keep running out of letters)
in this lecture we will have both usages of 𝑚 (though
hopefully not in the same theorem or definition!).
When we talk about factoring, RSA, and Rabin, then
we will use 𝑚 as the composite number, while in the
context of the abstract trapdoor-permutation based
encryption and signatures we will use 𝑚 for the mes-
sage. When you see an instance of 𝑚, make sure you
understand what is its usage.
Lemma 10.4 on its own might not seem very meaningful since it's
not clear how many pseudoprimes there are. However, it turns out
these pseudoprimes, also known as "Carmichael numbers", are
much less prevalent than the primes: specifically, there are about
𝑁 ⋅ 2^{−Θ(log 𝑁/ log log 𝑁)} pseudoprimes between 1 and 𝑁. If we choose a
random number 𝑚 ∈ [2^𝑛] and output it if and only if the algorithm of
Lemma 10.4 outputs YES (otherwise resampling), then the
probability we make a mistake and output a pseudoprime is equal to
the ratio of the set of pseudoprimes in [2^𝑛] to the set of primes in [2^𝑛].
Since there are Ω(2^𝑛/𝑛) primes in [2^𝑛], this ratio is 2^{−Ω(𝑛/ log 𝑛)},
which is negligible.
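In code, the sampling loop looks as follows. I use the Miller-Rabin test rather than the Fermat-style test of Lemma 10.4; it is a strengthening for which there is no analog of Carmichael numbers, and its per-trial error bound is a known fact rather than something from this chapter:

```python
import secrets

def probably_prime(m: int, trials: int = 40) -> bool:
    """Miller-Rabin: each trial errs on a composite with probability <= 1/4."""
    if m < 4:
        return m in (2, 3)
    if m % 2 == 0:
        return False
    d, r = m - 1, 0
    while d % 2 == 0:
        d, r = d // 2, r + 1                  # m - 1 = d * 2^r with d odd
    for _ in range(trials):
        a = secrets.randbelow(m - 3) + 2
        x = pow(a, d, m)
        if x in (1, m - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, m)
            if x == m - 1:
                break
        else:
            return False                      # witness of compositeness
    return True

def random_prime(n: int) -> int:
    """Resample random n-bit odd numbers until the test says YES;
    by the density of primes this takes O(n) attempts in expectation."""
    while True:
        m = secrets.randbits(n) | (1 << (n - 1)) | 1
        if probably_prime(m):
            return m
```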
10.1.2 Fields
If 𝑝 is a prime then ℤ𝑝 is a field, which means it is closed under addition
and multiplication, every nonzero element in it has a multiplicative inverse,
and it has 0 and 1 elements. One property of a field is the following:
Theorem 10.5 — Fundamental Theorem of Algebra, mod 𝑝 version. If 𝑓 is a
nonzero polynomial of degree 𝑑 over ℤ𝑝 then there are at most 𝑑
distinct inputs 𝑥 such that 𝑓(𝑥) = 0.
(If you’re curious why, you can see that the task of, given
𝑥1 , … , 𝑥𝑑+1 finding the coefficients for a polynomial vanishing on
Proof. 𝜑 simply maps 𝑥 ∈ ℤ∗𝑚 to the pair (𝑥 mod 𝑝, 𝑥 mod 𝑞). Verify-
ing that it satisfies all desired properties is a good exercise. QED
■
Theorem 10.8 — Square root extraction implies factoring. Suppose that
there is an efficient algorithm 𝐴 such that for every 𝑚 ∈ ℕ and
𝑎 ∈ ℤ∗𝑚, 𝐴(𝑚, 𝑎² (mod 𝑚)) = 𝑏 such that 𝑎² = 𝑏² (mod 𝑚). Then,
there is an efficient algorithm to recover 𝑝, 𝑞 from 𝑚.
Definition 10.10 — Rabin function. Given a number 𝑚 = 𝑝𝑞, the Rabin
function w.r.t. 𝑚 is the map 𝑅𝑎𝑏𝑖𝑛𝑚 ∶ ℤ∗𝑚 → ℤ∗𝑚 such that
Proof. Suppose that RSA𝑚,𝑒 (𝑎) = RSA𝑚,𝑒 (𝑎′ ). By the CRT, it means
that there is (𝑥, 𝑦) ≠ (𝑥′ , 𝑦′ ) ∈ ℤ∗𝑝 × ℤ∗𝑞 such that 𝑥𝑒 = 𝑥′𝑒 (mod 𝑝)
and 𝑦𝑒 = 𝑦′𝑒 (mod 𝑞). But if that’s the case we get that (𝑥𝑥′−1 )𝑒 = 1
(mod 𝑝) and (𝑦𝑦′−1 )𝑒 = 1 (mod 𝑞). But this means that 𝑒 has to be
a multiple of the order of 𝑥𝑥′^{−1} and of 𝑦𝑦′^{−1} (at least one of which is not
1 and hence has order > 1). But since the order of an element always divides
the group size, this implies that 𝑒 has a non-trivial gcd with either
|ℤ∗𝑝| or |ℤ∗𝑞| and hence with (𝑝 − 1)(𝑞 − 1).
■
R
Remark 10.12 — Plain/Textbook RSA. The RSA trap-
door function is known also as “plain” or “textbook”
RSA encryption. This is because initially Diffie and
Hellman (and following them, RSA) thought of an
encryption scheme as a deterministic procedure and
so considered simply encrypting a message 𝑥 by ap-
plying RSA𝑚,𝑒(𝑥). Today however we know that it is
insecure to use a trapdoor function directly as an en-
cryption scheme without adding some randomization.
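For concreteness, here is the textbook RSA permutation with toy parameters (illustrative primes only; and, per the remark, this must never be used directly as an encryption scheme):

```python
p, q, e = 1009, 2003, 5                 # toy primes; m below is the modulus
m = p * q
assert (p - 1) * (q - 1) % e != 0       # e prime, so this gives gcd = 1
d = pow(e, -1, (p - 1) * (q - 1))       # the trapdoor: extracting e-th roots

def rsa(x: int) -> int:                 # RSA_{m,e}(x) = x^e mod m
    return pow(x, e, m)

def rsa_inv(y: int) -> int:
    return pow(y, d, m)

assert rsa_inv(rsa(123456)) == 123456   # a permutation, invertible with d
```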
R
Remark 10.14 — Domain of permutations. The RSA func-
tion is not a permutation over the set of strings but
rather over ℤ∗𝑚 for some 𝑚 = 𝑝𝑞. However, if we find
primes 𝑝, 𝑞 in the interval [2𝑛/2 (1 − 𝑛𝑒𝑔𝑙(𝑛)), 2𝑛/2 ], then
𝑚 will be in the interval [2𝑛 (1 − 𝑛𝑒𝑔𝑙(𝑛)), 2𝑛 ] and hence
ℤ∗𝑚 (which has size 𝑝𝑞 − 𝑝 − 𝑞 + 1 = 2𝑛 (1 − 𝑛𝑒𝑔𝑙(𝑛)))
can be thought of as essentially identical to {0, 1}𝑛 ,
since we will always pick elements from {0, 1}𝑛 at
random and hence they will be in ℤ∗𝑚 with prob-
ability 1 − 𝑛𝑒𝑔𝑙(𝑛). It is widely believed that for
every sufficiently large 𝑛 there is a prime in the
interval [2𝑛 − 𝑝𝑜𝑙𝑦(𝑛), 2𝑛 ] (this follows from the
Extended Reimann Hypothesis) and Baker, Harman
and Pintz proved that there is a prime in the interval
[2𝑛 − 20.6𝑛 , 2𝑛 ]. 2
2
Another, more minor issue is that the description
of the key might not have the same length as log 𝑚; I
defined them to be the same for simplicity of notation,
and this can be ensured via some padding and
10.1.6 Public key encryption from trapdoor permutations
concatenation tricks.
Here is how we can get a public key encryption from a trapdoor per-
mutation scheme {𝑝𝑘 }.
P
Please verify that you understand why TDPENC is a
valid encryption scheme, in the sense that decryption
of an encryption of 𝑚 yields 𝑚.
Theorem 10.15 — Public key encryption from trapdoor permutations. If {𝑝𝑘}
is a secure TDP and 𝐻 is a random oracle then TDPENC is a CPA
secure public key encryption scheme.
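The TDPENC box is not reproduced here, so take the following as a sketch of the natural construction this theorem is about (an assumption on my part): encrypt 𝑚 by picking a random 𝑥 in the domain and outputting (𝑝𝑘(𝑥), 𝐻(𝑥) ⊕ 𝑚), and decrypt using the trapdoor to recover 𝑥.

```python
import hashlib, secrets

def H(x: int, ell: int) -> bytes:
    """SHA-256 playing the random oracle (so ell <= 32 bytes)."""
    return hashlib.sha256(str(x).encode()).digest()[:ell]

def tdp_encrypt(tdp, domain_size: int, msg: bytes):
    """tdp models the forward map p_k (a function passed in by the caller)."""
    x = secrets.randbelow(domain_size)
    return tdp(x), bytes(u ^ v for u, v in zip(H(x, len(msg)), msg))

def tdp_decrypt(tdp_inv, ct):
    """tdp_inv models inverting p_k with the trapdoor s(k)."""
    y, masked = ct
    x = tdp_inv(y)
    return bytes(u ^ v for u, v in zip(H(x, len(masked)), masked))

# Demo with the toy RSA permutation from the previous snippet's parameters:
p_, q_, e_ = 1009, 2003, 5
N = p_ * q_
d_ = pow(e_, -1, (p_ - 1) * (q_ - 1))
ct = tdp_encrypt(lambda x: pow(x, e_, N), N, b"hi")
assert tdp_decrypt(lambda y: pow(y, d_, N), ct) == b"hi"
```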
R
Remark 10.16 — Security without random oracles. We
do not need to use a random oracle to get security in
this scheme, especially if ℓ is sufficiently short. We can
replace 𝐻() with a function with specific properties
known as a hard-core construction; this was first shown
by Goldreich and Levin.
• When 𝐴 makes the query 𝑚 to the signature box, then since 𝑚 was
queried before to 𝐻, if 𝑚 ≠ 𝑚∗ then 𝐼 returns 𝑥 = 𝑝𝑘−1 (𝐻(𝑚)) using
its records. If 𝑚 = 𝑚∗ then 𝐼 halts and outputs “failure”.
P
Once again, this proof is somewhat subtle. I recom-
mend you also read the version of this proof in Section
13.4 of Boneh-Shoup.
R
Remark 10.18 — Hash and sign. There is another reason
to use hash functions with signatures. By combining a
collision-resistant hash function ℎ ∶ {0, 1}∗ → {0, 1}ℓ
with a signature scheme (𝑆, 𝑉 ) for ℓ-length mes-
sages, we can obtain a signature for arbitrary length
messages by defining 𝑆𝑠′ (𝑚) = 𝑆𝑠 (ℎ(𝑚)) and
𝑉𝑣′ (𝑚, 𝜎) = 𝑉𝑣 (ℎ(𝑚), 𝜎).
P
Please stop to verify that this is a valid public key
encryption scheme.
Theorem 10.20 — A hardcore predicate for arbitrary one-way functions. Let 𝑓
be a one-way function, and let 𝑔 be defined as 𝑔(𝑥, 𝑟) = (𝑓(𝑥), 𝑟),
where |𝑥| = |𝑟|. Let 𝑏(𝑥, 𝑟) = ⊕_{𝑖∈[𝑛]} 𝑥𝑖𝑟𝑖 be the inner product
mod 2 of 𝑥 and 𝑟. Then 𝑏 is a hard core predicate of the function 𝑔.
Pr[𝐴(𝑔(𝑋𝑛, 𝑅𝑛)) = 𝑏(𝑋𝑛, 𝑅𝑛)] = 1/2 + 𝜖𝐴(𝑛)

where 𝑋𝑛 and 𝑅𝑛 are uniform and independent distributions over
{0, 1}^𝑛. We observe that 𝑏 being insecure and having an output of a
single bit implies that such a program 𝐴 exists. First, we show that on
at least an 𝜖𝐴(𝑛) fraction of the possible inputs, program 𝐴 has an 𝜖𝐴(𝑛)/2
advantage in predicting the output of 𝑏.
Lemma 10.21 There exists a set 𝑆 ⊆ {0, 1}^𝑛 where |𝑆| > 𝜖𝐴(𝑛) ⋅ 2^𝑛 such
that for all 𝑥 ∈ 𝑆,

𝑠(𝑥) = Pr[𝐴(𝑔(𝑥, 𝑅𝑛)) = 𝑏(𝑥, 𝑅𝑛)] ≥ 1/2 + 𝜖𝐴(𝑛)/2
Proof. The result follows from an averaging argument. For notational
convenience set 𝜖 = 𝜖𝐴(𝑛). Let 𝑘 = |𝑆|/2^𝑛, and let
𝛼 = (1/|𝑆|) ∑_{𝑥∈𝑆} 𝑠(𝑥) and 𝛽 = (1/(2^𝑛 − |𝑆|)) ∑_{𝑥∉𝑆} 𝑠(𝑥) be the
averages of 𝑠(𝑥) over values in and not in 𝑆, respectively, so
𝑘𝛼 + (1 − 𝑘)𝛽 = 1/2 + 𝜖. By definition 𝔼[𝑠(𝑋𝑛)] = 1/2 + 𝜖, so the
facts that 𝛼 ≤ 1 and 𝛽 < 1/2 + 𝜖/2 give 𝑘 + (1 − 𝑘)(1/2 + 𝜖/2) > 1/2 + 𝜖,
and solving finds that 𝑘 > 𝜖.
■
𝑥𝑖 = 𝑏(𝑥, 𝑟) ⊕ 𝑏(𝑥, 𝑟 ⊕ 𝑒𝑖 )
where 𝑒𝑖 is the vector with all 0s except a 1 in the 𝑖th location. This
observation follows from the definition of 𝑏, and it motivates the main
idea of the reduction: Guess 𝑏(𝑥, 𝑟) and use 𝐴 to compute 𝑏(𝑥, 𝑟 ⊕ 𝑒𝑖 ),
then put it together to find 𝑥𝑖 for all 𝑖. The reason guessing works
will become clear later, but intuitively the reason we cannot simply
use 𝐴 to compute both 𝑏(𝑥, 𝑟) and 𝑏(𝑥, 𝑟 ⊕ 𝑒𝑖) is that the probability
𝐴 guesses both correctly is only bounded below (by a standard union
bound) by 1 − 2(1/2 − 𝜖𝐴(𝑛)) = 2𝜖𝐴(𝑛).
correctly, then we only need to invoke 𝐴 one time to get a better than
half probability of correctly determining 𝑥𝑖 . It is then a simple matter
of taking a majority vote over several such 𝑟 to determine each 𝑥𝑖 .
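The identity itself is easy to sanity-check in code (bit strings are represented as integers here):

```python
import secrets

n = 16

def b(x: int, r: int) -> int:
    """Inner product mod 2 of the bit strings x and r."""
    return bin(x & r).count("1") % 2

x, r = secrets.randbits(n), secrets.randbits(n)
for i in range(n):
    e_i = 1 << i            # the vector with a single 1 in position i
    assert (x >> i) & 1 == b(x, r) ^ b(x, r ^ e_i)
```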
Now the natural question is how can we possibly
guess (and here we literally mean randomly guess)
each value of 𝑏(𝑥, 𝑟)? The key is that the values of
𝑟 only need to be pairwise independent, since down
the line we plan to use Chebyshev’s inequality on
jority value of our guesses 𝐺(𝐽 , 𝑖) over the possible choices of 𝐽 and
output 𝑥.
Now we prove that given that our guesses 𝜌𝐽 are all correct, for all
𝑥 ∈ 𝑆 and for every 1 ≤ 𝑖 ≤ 𝑛, we have
Pr [ |{𝐽 ∶ 𝐺(𝐽, 𝑖) = 𝑥𝑖}| > (1/2)(2^𝑙 − 1) ] > 1 − 1/(2𝑛)

That is, with probability at least 1 − 𝑂(1/𝑛), more than half of our
(2^𝑙 − 1) guesses for 𝑥𝑖 are correct, where 2^𝑙 − 1 is the number of
nonempty subsets 𝐽.
Pr [∑_𝐽 𝐼𝐽 ≤ (1/2)𝑚] ≤ Pr [ |∑_𝐽 𝐼𝐽 − (1/2 + 1/𝑞(𝑛))𝑚| ≥ 𝑚/𝑞(𝑛) ]

= Pr [ |∑_𝐽 𝐼𝐽 − 𝔼[∑_𝐽 𝐼𝐽]| ≥ 𝑚/𝑞(𝑛) ]

≤ 𝑚 Var(𝐼𝐽)/(𝑚/𝑞(𝑛))²

≤ (1/4)/((1/𝑞(𝑛))² 𝑚)

Since 𝑥 ∈ 𝑆 we know 1/𝑞(𝑛) ≥ 𝜖𝐴(𝑛)/2 ≥ 1/(2𝑝(𝑛)), so

(1/4)/((1/𝑞(𝑛))² 𝑚) ≤ (1/4)/((1/(2𝑝(𝑛)))² ⋅ 2𝑛 ⋅ 𝑝(𝑛)²) = 1/(2𝑛)
way function, in the real world messages are sometimes longer than a
single bit. Fortunately, there is hope: Goldreich and Levin’s hardcore
bit construction can be used repeatedly to get a hardcore predicate of
logarithmic length.
Theorem 10.23 — Polynomially many hardcore bits for arbitrary one-way func-
tions. Let F be a one-way function family and G be a punctured
PRF with the same input length as F. Then under the assumed
existence of indistinguishability obfuscators, there exists a function
family H that is hardcore for F. Furthermore, the output length of
H is the same as the output length of G.
R
Remark 11.1 — Keep track of dimensions!. Through-
out this chapter, and while working in lattice based
cryptography in general, it is crucial to keep track of
the dimensions. Whenever you see a symbol such as
𝑣, 𝐴, 𝑥, 𝑦 ask yourself:
P
If you have a CPA secure public key encryption
scheme for single bit messages then you can extend
P
Please stop here and make sure that you see why
this is a valid encryption (not in the sense that it is
secure - it’s not - but in the sense that decryption of
an encryption of 𝑏 returns the bit 𝑏), and this descrip-
tion corresponds to the previous one; as usual all
calculations are done modulo 𝑞.
It is important to note the order of quantifiers in the learning with
error conjecture. If we want to handle a noise of low enough magnitude
(say 𝛿(𝑛) = 1/𝑛²) then we need to choose the modulus 𝑞 to be large
enough (for example it is believed that 𝑞 > 𝑛⁴ will be good enough for
this case), and then the adversary can choose 𝑚(𝑛) to be as big a
polynomial as they like, and of course run in time which is an arbitrary
polynomial in 𝑛. Therefore we can think of such an adversary 𝑅 as
getting access to a "magic box" that they can use 𝑚 = 𝑝𝑜𝑙𝑦(𝑛) number
of times to get "noisy equations on 𝑥" of the form (𝑎𝑖, 𝑦𝑖) with
𝑎𝑖 ∈ ℤ^𝑛_𝑞, 𝑦𝑖 ∈ ℤ𝑞 where 𝑦𝑖 = ⟨𝑎𝑖, 𝑥⟩ + 𝑒𝑖.

⁶ One can think of 𝑒 as chosen by simply letting every coordinate be chosen at random in {−𝛿𝑞, −𝛿𝑞 + 1, … , +𝛿𝑞}. For technical reasons, we sometimes consider other distributions, and in particular the discrete Gaussian distribution, which is obtained by letting every coordinate of 𝑒 be an independent Gaussian random variable with standard deviation 𝛿𝑞, conditioned on it being an integer. (A closely related distribution is obtained by picking such a Gaussian random variable and then rounding it to the nearest integer.) People sometimes also consider variants where both 𝑝(𝑛) and 𝑞(𝑛) can be as large as exponential.
P
The LWE conjecture posits that no efficient algorithm
can recover 𝑥 given 𝐴 and 𝐴𝑥 + 𝑒. But you might
wonder whether it's possible to do this inefficiently.
The answer is yes. Intuitively the reason is that if we
have more equations than unknowns (i.e., if 𝑚 > 𝑛)
then these equations contain enough information to
determine the unknown variables even if they are
noisy. It can be shown that if 𝑚 is sufficiently large
(𝑚 > 10𝑛 will do) then with high probability over
𝐴, 𝑥, 𝑒, given 𝐴 and 𝑦 = 𝐴𝑥 + 𝑒, if we enumerate over
all 𝑥̃ ∈ ℤ^𝑛_𝑞 and output the string minimizing |𝐴𝑥̃ − 𝑦|
(where we define |𝑣| = ∑ |𝑣𝑖| for a vector 𝑣), then 𝑥̃
will equal 𝑥.
It is a good exercise to work out the details, but a hint
is that this can be proven by showing that for every 𝑥̃ ≠ 𝑥,
with high probability over 𝐴, |𝐴𝑥̃ − 𝐴𝑥| > 𝛿𝑞𝑚. The
latter fact holds because 𝑣 = 𝐴(𝑥 − 𝑥̃) is a random
vector in ℤ^𝑚_𝑞, and the probability that |𝑣| < 𝛿𝑞𝑚 is
P
The scheme LWEENC is also described in Fig. 11.2
with slightly different notation. I highly recommend
you stop and verify you understand why the two
descriptions are equivalent.
Unlike our typical schemes, here it is not immediately clear that this
encryption is valid, in the sense that decrypting an encryption of 𝑏
returns the value 𝑏. But this is the case:
Lemma 11.3 With high probability, the decryption of the encryption of 𝑏
equals 𝑏.
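Here is a toy Regev-style rendering consistent with the discussion above (my own illustrative parameters, far too small for security): the public key is 𝐴′ = [𝐴 | 𝑦] with 𝑦 = 𝐴𝑥 + 𝑒, and a bit is encrypted by taking a random 0/1 combination 𝑤 of the rows and adding 𝑏⌊𝑞/2⌋ to the last coordinate.

```python
import secrets

n, m, q = 8, 300, 3329
delta_q = 2                                  # noise coordinates in [-2, 2]

def rand_vec(k, bound):
    return [secrets.randbelow(bound) for _ in range(k)]

def keygen():
    x = rand_vec(n, q)                                       # secret
    A = [rand_vec(n, q) for _ in range(m)]                   # m x n matrix
    e = [secrets.randbelow(2 * delta_q + 1) - delta_q for _ in range(m)]
    y = [(sum(A[i][j] * x[j] for j in range(n)) + e[i]) % q for i in range(m)]
    return (A, y), x

def encrypt(pk, bit):
    A, y = pk
    w = [secrets.randbelow(2) for _ in range(m)]             # w in {0,1}^m
    c = [sum(w[i] * A[i][j] for i in range(m)) % q for j in range(n)]
    sigma = (sum(w[i] * y[i] for i in range(m)) + bit * (q // 2)) % q
    return c, sigma

def decrypt(x, ct):
    c, sigma = ct
    v = (sigma - sum(c[j] * x[j] for j in range(n))) % q     # = w.e + b*q/2
    return 0 if min(v, q - v) < q // 4 else 1

pk, sk = keygen()
assert decrypt(sk, encrypt(pk, 0)) == 0 and decrypt(sk, encrypt(pk, 1)) == 1
```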
For a public key encryption scheme with messages that are just bits,
CPA security means that an encryption of 0 is indistinguishable from
an encryption of 1, even given the public key. Thus Theorem 11.4 will
follow from the following lemma:
Lemma 11.5 Let 𝑞, 𝑚, 𝛿 be set as in LWEENC. Then, assuming the LWE
conjecture, the following distributions are computationally indistin-
guishable:
P
You should stop here and verify that (i) You under-
stand the statement of Lemma 11.5 and (ii) you un-
derstand why this lemma implies Theorem 11.4. The
idea is that Lemma 11.5 shows that the concatenation
of the public key and encryption of 0 is indistinguish-
able from something that is completely random. You
can then use it to show that the concatenation of the
public key and encryption of 1 is indistinguishable
from the same thing, and then finish using the hybrid
argument.
We now prove Lemma 11.5, which will complete the proof of Theo-
rem 11.4.
First, let's see why Lemma 11.6 implies the claim. Consider the
hash family ℋ = {ℎ_{𝐴′}}, where ℎ_{𝐴′} ∶ ℤ^𝑚_𝑞 → ℤ^{𝑛+1}_𝑞 is defined by
ℎ_{𝐴′}(𝑤) = 𝑤^⊤𝐴′. For this hash family, the probability over 𝐴′ of 𝑤 ≠ 𝑤′
colliding is Pr_{𝐴′}[𝑤^⊤𝐴′ = 𝑤′^⊤𝐴′] = Pr_{𝐴′}[(𝑤 − 𝑤′)^⊤𝐴′ = 0]. Since 𝐴′ is
random, this is 1/𝑞^{𝑛+1}. So ℋ is a universal hash family.
The min entropy of 𝑤 ←_R {0, 1}^𝑚 is the same as the entropy (because
it is uniform), which is 𝑚. The output of the hash family is in
ℤ^{𝑛+1}_𝑞, and log |ℤ^{𝑛+1}_𝑞| = (𝑛 + 1) log 𝑞. Since 𝑚 ≥ (𝑛 + 1) log 𝑞 + 20𝑛 − 2
by assumption, Lemma 11.6 implies that (𝑤^⊤𝐴′, 𝐴′) is 2^{−10𝑛}-close in
terms of statistical distance to (𝑧, 𝐴′) where 𝑧 is chosen uniformly in
ℤ^{𝑛+1}_𝑞.
Now, we'll show this implies that with probability ≥ 1 − 2^{−𝑛} over the
selection of 𝐴′, the statistical distance between 𝑤^⊤𝐴′ and 𝑧 is less than
2^{−𝑛}. If not, the distance between (𝑤^⊤𝐴′, 𝐴′) and (𝑧, 𝐴′) would be at
least 2^{−𝑛} ⋅ 2^{−𝑛} > 2^{−10𝑛}.

Proof of Lemma 11.6:⁸

⁸ This is based on notes from Daniel Wichs's class.
Let 𝑍 be the random variable (𝐻(𝑊 ), 𝐻), where the probability is
over 𝐻 and 𝑊 . Let 𝑍 ′ be an independent copy of 𝑍.
Step 1: Pr[𝑍 = 𝑍′] ≤ (1 + 4𝜖²)/(|ℋ| ⋅ |𝒱|). Indeed,

Pr[𝑍 = 𝑍′] = Pr[(𝐻(𝑊), 𝐻) = (𝐻′(𝑊′), 𝐻′)]
= Pr[𝐻 = 𝐻′] ⋅ Pr[𝐻(𝑊) = 𝐻(𝑊′)]
= (1/|ℋ|) (Pr[𝑊 = 𝑊′] + Pr[𝐻(𝑊) = 𝐻(𝑊′) ∧ 𝑊 ≠ 𝑊′])
≤ (1/|ℋ|) (4𝜖² ⋅ (1/|𝒱|) + 1/|𝒱|)
= (1 + 4𝜖²)/(|ℋ| ⋅ |𝒱|).
Δ((𝐻(𝑊), 𝐻), (𝑉, 𝐻)) = (1/2) ∑_{ℎ,𝑣} |Pr[𝑍 = (𝑣, ℎ)] − 1/(|ℋ| ⋅ |𝒱|)|.
Δ((𝐻(𝑊), 𝐻), (𝑉, 𝐻)) = (1/2) ⟨𝑥, 𝑠⟩
≤ (1/2) ‖𝑥‖₂ ⋅ ‖𝑠‖₂   (Cauchy-Schwarz)
= (√(|ℋ| ⋅ |𝒱|)/2) ⋅ ‖𝑥‖₂.
Let's expand ‖𝑥‖₂²:

‖𝑥‖₂² = ∑_{ℎ,𝑣} (Pr[𝑍 = (𝑣, ℎ)] − 1/(|ℋ| ⋅ |𝒱|))²
= ∑_{ℎ,𝑣} (Pr[𝑍 = (𝑣, ℎ)]² − 2 Pr[𝑍 = (𝑣, ℎ)]/(|ℋ| ⋅ |𝒱|) + 1/(|ℋ| ⋅ |𝒱|)²)
≤ (1 + 4𝜖²)/(|ℋ| ⋅ |𝒱|) − 2/(|ℋ| ⋅ |𝒱|) + (|ℋ| ⋅ |𝒱|)/(|ℋ| ⋅ |𝒱|)²
= 4𝜖²/(|ℋ| ⋅ |𝒱|).
When we plug this in to our expression for the statistical distance,
we get

Δ((𝐻(𝑊), 𝐻), (𝑉, 𝐻)) ≤ (√(|ℋ| ⋅ |𝒱|)/2) ⋅ ‖𝑥‖₂ ≤ (√(|ℋ| ⋅ |𝒱|)/2) ⋅ (2𝜖/√(|ℋ| ⋅ |𝒱|)) = 𝜖.
This completes the proof of Lemma 11.6 and hence the theorem.
■
P
The proof of Theorem 11.4 is quite subtle and requires
some re-reading and thought. To read more about
this, you can look at Sections 3 and 4 of Oded Regev's survey
"The Learning with Errors Problem".
integer vector. There can be many different bases for the same lattice,
and some of them are easier to work with than others (see Fig. 11.3).
• Shortest vector problem: Given a basis 𝐵 for 𝐿, find the nonzero vec-
tor 𝑣 with smallest norm in 𝐿.
• Closest vector problem: Given a basis 𝐵 for 𝐿 and a vector 𝑢 that is not
in 𝐿, find the closest vector to 𝑢 in 𝐿.
We’ve now compiled all the tools that are needed for the basic goal
of cryptography (which is still being subverted quite often): allowing
Alice and Bob to exchange messages while assuring their integrity and
confidentiality over a channel that is observed or controlled by an
adversary. Our tools for achieving this goal are:
might not be two parties separated in space but the same party
separated in time. That is, Alice wishes to send a message to her
future self by storing an encrypted and authenticated version of it
on some media. In this case, absent a time machine, back and forth
interaction between the two parties is obviously impossible.
• The adversary then starts many connections with the server with
ciphertexts related to 𝑐, and observes whether they succeed or fail
(and in what way they fail, if they do). It turns out that based on
this information, the adversary would be able to recover the key 𝑘.
• The keys (𝑒, 𝑑) are generated via 𝐺(1𝑛 ), and Mallory gets the
public encryption key 𝑒 and 1𝑛 .
P
Try to think what would be such a construction, and
whether there is a fundamental obstacle to combin-
ing digital signatures and public key encryption in
the same way we combined MACs and private key
encryption.
Why CCA security matters. For the reasons above, constructing CCA
secure public key encryption is very challenging. But is it worth the
trouble? Do we really need this “ultra conservative” notion of secu-
rity? The answer is yes. Just as we argued for private key encryption,
chosen ciphertext security is the notion that gets us as close as possible
to designing encryptions that fit the metaphor of secure sealed envelopes.
Digital analogies will never be a perfect imitation of physical ones, but
such metaphors are what people have in mind when designing cryp-
tographic protocols, which is a hard enough task even when we don’t
have to worry about the ability of an adversary to reach inside a sealed
envelope and XOR the contents of the note written there with some
arbitrary string. Indeed, several practical attacks, including Bleichen-
bacher’s attack above, exploited exactly this gap between the physical
metaphor and the digital realization. For more on this, please see
Victor Shoup’s survey where he also describes the Cramer-Shoup en-
cryption scheme which was the first practical public key system to be
shown CCA secure without resorting to the random oracle heuristic.
(The first definition of CCA security, as well as the first polynomial-
time construction, was given in a seminal 1991 work of Dolev, Dwork
and Naor.)
CCA-ROM-ENC Scheme:²

² Recall that it's easy to obtain two independent random oracles 𝐻, 𝐻′ from a single oracle 𝐻″, for example by letting 𝐻(𝑥) = 𝐻″(0‖𝑥) and 𝐻′(𝑥) = 𝐻″(1‖𝑥). Similarly we can extend this to three, four or any number of oracles.

Theorem 12.2 — CCA security from random oracles. The above CCA-ROM-ENC scheme is CCA secure.
2. The adversary 𝐴′ will now give 𝑐∗ ‖𝑦∗ ‖ℎ∗ with 𝑦∗ , ℎ∗ ←𝑅 {0, 1}𝑛 to
𝐴̃ as the response to the challenge. (Note that this ciphertext does
not involve either 𝑚0 or 𝑚1 in any way.)
Note that the adversary 𝐴′ ignores the output of 𝐴̃. It only cares
about the queries that 𝐴̃ makes. Let's say that an "𝑟𝑏 query" is one that
has 𝑟𝑏 as a postfix. To finish the proof we make the following two
claims:
Claim 2.1: The probability that 𝐴 ̃ makes an 𝑟1−𝑏∗ query is negligi-
ble. Proof: This is because the only value that 𝐴 ̃ receives that depends
on one of 𝑟0 , 𝑟1 is 𝑐∗ which is an encryption of 𝑟𝑏∗ . Hence 𝐴 ̃ never sees
any value that depends on 𝑟1−𝑏∗ and since it is uniform in {0, 1}𝑛 , the
probability that 𝐴 ̃ makes a query with this postfix is negligible. QED
(Claim 2.1)
Claim 2.2: 𝐴 ̃ will make an 𝑟𝑏∗ query with probability at least 𝜖/2.
Proof: Let 𝑐∗ = 𝐸𝑒′ (𝑟𝑏∗ ; 𝑠∗ ) where 𝑠∗ is the randomness used in produc-
ing it. By the lazy evaluation paradigm, since no 𝑟𝑏∗ query was made
up to that point, the distribution would be identical if we defined
𝐻(𝑚𝑏‖𝑟𝑏∗) = 𝑠∗, defined 𝐻″(𝑟𝑏∗) = 𝑦∗ ⊕ 𝑚𝑏, and defined ℎ∗ = 𝐻′(𝑚𝑏‖𝑟𝑏∗).
Hence the distribution of the ciphertext is identical to how it is dis-
tributed in the actual CCA game. Now, since 𝐴 ̃ wins the CCA game
with probability 1/2 + 𝜖 − 𝑛𝑒𝑔𝑙(𝑛), in this game it must query 𝐻 ″ at 𝑟𝑏∗
with probability at least 𝜖/2. Indeed, conditioned on not querying 𝐻 ″
at this value, the string 𝑦∗ is independent of the message 𝑚0 , and the
adversary cannot win the game with probability more than 1/2. QED
(Claim 2.2)
Together Claims 2.1 and 2.2 imply that the adversary 𝐴 ̃ makes
an 𝑟𝑏∗ query with probability at least 𝜖/2, and makes an 𝑟1−𝑏∗ query
with negligible probability, hence our adversary 𝐴′ will output 𝑏∗
with probability at least 𝜖/2, and with all but a negligible part of the
remaining probability will guess randomly, leading to an overall suc-
cess in the CPA game of at least 1/2 + 𝜖/2. QED (Claim 2 and hence
theorem)
■
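For reference, the ciphertext shape this proof manipulates can be sketched as follows; the CPA scheme (𝐸′, 𝐷′) is passed in as functions, the three oracles are derived from SHA-256 by domain separation as in the footnote, and the exact correspondence to the scheme box above is an assumption of this sketch:

```python
import hashlib, secrets

def oracle(tag: bytes, data: bytes, ell: int = 32) -> bytes:
    return hashlib.sha256(tag + data).digest()[:ell]

def enc(E, e, m: bytes):
    r = secrets.token_bytes(32)
    s = oracle(b"H", m + r)                    # randomness fed into E'
    c = E(e, r, s)                             # c = E'_e(r; H(m||r))
    y = bytes(u ^ v for u, v in zip(oracle(b"H2", r, len(m)), m))
    h = oracle(b"H1", m + r)                   # h = H'(m||r)
    return c, y, h

def dec(E, D, e, d, ct):
    c, y, h = ct
    r = D(d, c)
    m = bytes(u ^ v for u, v in zip(oracle(b"H2", r, len(y)), y))
    # Re-encrypt and re-hash; reject anything malformed (the CCA check).
    if E(e, r, oracle(b"H", m + r)) != c or oracle(b"H1", m + r) != h:
        return None
    return m
```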
other parties. This raises the question of what is identity and how is
it verified. Ultimately, if we want to use identities, then we need to
trust some authority that decides which party has which identity. This
is typically done via a certificate authority (CA). This is some trusted
authority, whose verification key 𝑣𝐶𝐴 is public and known to all par-
ties. Alice proves in some way to the CA that she is indeed Alice, and
then generates a pair (𝑠𝐴𝑙𝑖𝑐𝑒, 𝑣𝐴𝑙𝑖𝑐𝑒), and gets from the CA the message
𝜎𝐴𝑙𝑖𝑐𝑒 = "The key 𝑣𝐴𝑙𝑖𝑐𝑒 belongs to Alice" signed with 𝑠𝐶𝐴.³ Now Alice
can send (𝑣𝐴𝑙𝑖𝑐𝑒, 𝜎𝐴𝑙𝑖𝑐𝑒) to Bob to certify that the owner of this public
key is indeed Alice.

³ The registration process could be more subtle than that; for example, Alice might need to prove to the CA that she does indeed know the corresponding secret key.
For example, in the web setting, certain certificate authorities can
certify that a certain public key is associated with a certain website. If
you go to a website using the https protocol, you should see a “lock”
symbol on your browser which will give you details on the certificate.
Often the certificate is a chain of certificates. If I click on this lock sym-
bol in my Chrome browser, I see that the certificate that amazon.com's
public key is some particular string (corresponding to a 2048-bit RSA
modulus and exponent) is signed by the Symantec Certificate au-
thority, whose own key is certified by Verisign.
with Amazon is an example of a setting of one sided authentication. It
is important for me to know that I am truly talking to amazon.com,
while Amazon is willing to talk to any client. (Though of course once
we establish a secure channel, I could use it to login to my Amazon
account.) Chapter 21 of Boneh-Shoup contains an in-depth discussion
of authenticated key exchange protocols. Because the definition is so
involved, we will not go over the full formal definitions in this book,
but I recommend Boneh-Shoup for an in-depth treatment.
This approach has the advantage of being modular in both the con-
struction and the analysis. However, direct constructions might be
more efficient. There are a great many potentially desirable properties
of key exchange protocols, and different protocols achieve different
subsets of these properties at different costs. The most common vari-
ant of authenticated key exchange protocols is to use some version of
the Diffie-Hellman key exchange. If both parties have public signature
keys, then they can simply sign their messages and then that effec-
tively rules out an active attack, reducing active security to passive
security (though one needs to include identities in the signatures to
ensure non-repetition of messages; see here).
The most efficient variants of Diffie Hellman achieve authentication
implicitly, where the basic protocol remains the same (sending 𝑋 = 𝑔𝑥
P
This chapter will rely on the notion of NP complete-
ness, as well as the view of NP as proof systems. For
a review of this notion, please see this chapter of my
introduction to TCS text.
sified and about the last thing in the world that the U.S. would like to
share with Russia, and vice versa. So, how can the U.S. convince the
Russians that it has destroyed a warhead, when it cannot let Russian
experts anywhere near it?
13.1.2 Voting
Electronic voting has been of great interest for many reasons. One
potential advantage is that it could allow completely transparent vote
counting, where every citizen could verify that the votes were counted
correctly. For example, Chaum suggested an approach to do so by
publishing an encryption of every vote and then having the central
authority prove that the final outcome corresponds to the counts of
all the plaintexts. But of course to maintain voter privacy, we need to
prove this without actually revealing those plaintexts. Can we do so?
All these proof systems have the property that the verifying algo-
rithm 𝑉 is efficient. Indeed, that's the whole point of a proof 𝜋: it's a
sequence of symbols that makes it easy to verify that the statement is
true.
To achieve the notion of zero knowledge proofs, Goldwasser and
Micali had to consider a generalization of proofs from static sequences
of symbols to interactive probabilistic protocols between a prover and
a verifier. Let’s start with an informal example. The vast majority of
humans have three types of cone cells in their eyes. The reason why
we perceive the sky as blue (see also this), despite its color being quite
a different spectrum than the blue of the rainbow, is that the projection
of the sky’s color to our cones is closest to the projection of blue. It has
been suggested that a tiny fraction of the human population might
have four functioning cones (in fact, only women, as it would require
two X chromosomes and a certain mutation). How would a person
prove to another that she is in fact such a tetrachromat?
Proof of tetrachromacy:
Suppose that Alice is a tetrachromat and can dis-
tinguish between the colors of two pieces of plastic
that would be identical to a trichromat. She wants to
• We have two parties: Alice and Bob. The common input is (𝑚, 𝑥)
and Alice wants to convince Bob that NQR(𝑚, 𝑥) = 1. (That is, that
𝑥 is not a quadratic residue modulo 𝑚).
To see that Bob will indeed accept the proof, note that if 𝑥 is a non-residue then 𝑥𝑠² will have to be a non-residue as well (since if 𝑥𝑠² had a root 𝑠′ then 𝑠′𝑠⁻¹ would be a root of 𝑥). Hence it will always be the case that 𝑏′ = 𝑏.
Moreover, if 𝑥 was a quadratic residue of the form 𝑥 = 𝑠′² (mod 𝑚) for some 𝑠′, then 𝑥𝑠² = (𝑠′𝑠)² is simply a random quadratic residue, which means that in this case Bob's message is distributed the same regardless of whether 𝑏 = 0 or 𝑏 = 1, and no matter what she does, Alice has probability at most 1/2 of guessing 𝑏. Hence if Alice is always successful then after 𝑛 repetitions Bob would have 1 − 2⁻ⁿ confidence that 𝑥 is indeed a non-residue modulo 𝑚.
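To make the counting argument above concrete, here is a toy simulation of this interactive proof in Python. The exact message format Bob uses is not spelled out in this excerpt, so the sketch below assumes the standard form (Bob sends 𝑠² or 𝑥𝑠² depending on his secret bit 𝑏), and models the computationally unbounded prover Alice by brute-force residuosity testing; the tiny modulus and round count are illustrative only.

```python
import math, random

m = 7 * 11          # toy modulus (in the real protocol m is much larger)

def is_qr(w):
    """Brute-force quadratic-residuosity test; models unbounded Alice."""
    return any(pow(s, 2, m) == w for s in range(1, m) if math.gcd(s, m) == 1)

def rand_unit():
    while True:
        s = random.randrange(2, m)
        if math.gcd(s, m) == 1:
            return s

def run_protocol(x, rounds=20):
    """Returns True iff Bob accepts all rounds of the NQR proof for x."""
    for _ in range(rounds):
        s, b = rand_unit(), random.randrange(2)
        w = pow(s, 2, m) if b == 0 else (x * pow(s, 2, m)) % m  # Bob's message
        b_prime = 0 if is_qr(w) else 1   # honest (unbounded) Alice's answer
        if b_prime != b:                 # Bob checks that b' == b
            return False
    return True

x_nqr = next(x for x in range(2, m) if math.gcd(x, m) == 1 and not is_qr(x))
x_qr = pow(3, 2, m)
print(run_protocol(x_nqr))  # True: honest Alice always convinces Bob
print(run_protocol(x_qr))   # almost surely False: each round fails w.p. 1/2
```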
P
Please stop and make sure you see the similarities be-
tween this protocol and the one for demonstrating that
the two pieces of plastic do not have identical colors.
R
Remark 13.2 — Functions vs languages. In many texts
proof systems are defined with respect to languages
as opposed to functions. That is, instead of talking
about a function 𝑓 ∶ {0, 1}∗ → {0, 1} we talk about
a language 𝐿 ⊆ {0, 1}∗ . These two viewpoints are
completely equivalent via the mapping 𝑓 ⟷ 𝐿 where
𝐿 = {𝑥 |𝑓(𝑥) = 1}.
R
Remark 13.3 — Notation for strategies. Up until now,
we always considered cryptographic protocols where
Alice and Bob trusted one another, but were worried
about some adversary controlling the channel between
them. Now we are in a somewhat more “suspicious”
setting where the parties do not fully trust one an-
other. In such protocols there is always a “prescribed”
or honest strategy that a particular party should fol-
low, but we generally don’t want the other parties’
security to rely on someone else’s good intention, and
hence analyze also the case where a party uses an arbi-
trary malicious strategy. We sometimes also consider
the honest but curious case where the adversary is
passive and only collects information, but does not
deviate from the prescribed strategy.
Protocols typically only guarantee security for party A when it behaves honestly - a party can always choose to violate its own security and there is not much we can (or should?) do about it.
Protocol ZK-QR: Public input for Alice and Bob: 𝑥, 𝑚; Alice’s private
input is 𝑠 such that 𝑥 = 𝑠2 (mod 𝑚).
1. Alice will pick a random 𝑠′ and send to Bob 𝑥′ = 𝑥𝑠′2 (mod 𝑚).
Despite the name "zero knowledge", we do not claim that the verifier does not know anything about the private input 𝑠. For example, if 𝑚 = 𝑝 ⋅ 𝑞 for two primes 𝑝, 𝑞, then each 𝑥 ∈ ℤ∗𝑚 has at most four square roots, and if the verifier could compute square roots then they could narrow 𝑠 down to these four possibilities. However, the point is that this is knowledge that the verifier had already even before the interaction with the prover, and so participating in the proof resulted in zero additional knowledge.
Here is how we formally define zero knowledge:
That is, we can show the verifier does not gain anything from the
interaction, because no matter what algorithm 𝑉 ∗ he uses, whatever
he learned as a result of interacting with the prover, he could have just
ze ro know l e d g e p roofs 279
R
Remark 13.5 — The simulation paradigm. The natural
way to define security is to say that a system is secure
if some “laundry list” of bad outcomes X,Y,Z can’t
happen. The definition of zero knowledge is differ-
ent. Rather than giving a list of the events that are
not allowed to occur, it gives a maximalist simulation
condition.
At its heart the definition of zero knowledge says the
following: clearly, we cannot prevent the verifier from
running an efficient algorithm 𝑆 ∗ on the public input,
but we want to ensure that this is the most he can
learn from the interaction.
This simulation paradigm has become the standard
way to define security of a great many cryptographic
applications. That is, we bound what an adversary
Eve can learn by postulating some hypothetical ad-
versary Lilith that is under much harsher conditions
(e.g., does not get to interact with the prover) and
ensuring that Eve cannot learn anything that Lilith
couldn’t have learned either. This has an advantage of
being the most conservative definition possible, and
also phrasing security in positive terms- there exists a
simulation - as opposed to the typical negative terms
- events X,Y,Z can’t happen. Since it’s often easier for
us to think of positive terms, paradoxically sometimes
this stronger security condition is easier to prove. Zero
knowledge is in some sense the simplest setting of the
simulation paradigm and we’ll see it time and again in
dealing with more advanced notions.
Proof. Let 𝑉 ∗ be an arbitrary efficient strategy for Bob. Since Bob only
sends a single bit, we can think of this strategy as composed of two
functions:
4. Output 𝑉2 (𝑥, 𝑚, 𝑥′ , 𝑠″ ).
Protocol ZK-Ham:
4. If 𝑏 = 0 then Alice sends out 𝜋 and the strings {𝑥𝑖,𝑗 } for all 𝑖, 𝑗; if
𝑏 = 1 then Alice sends out the 𝑛 strings 𝑥𝜋(𝐶1 ),𝜋(𝐶2 ) , … , 𝑥𝜋(𝐶𝑛 ),𝜋(𝐶1 )
together with their indices.
for every string 𝑥𝑖,𝑗 that was sent by Alice. If so then Bob accepts
the proof and otherwise he rejects it.
Theorem 13.7 — Zero Knowledge proof for Hamiltonian Cycle. Protocol ZK-Ham is a zero knowledge proof system for the language of Hamiltonian graphs.⁶

⁶ Goldreich, Micali and Wigderson were the first to come up with a zero knowledge proof for an NP-complete problem, though the Hamiltonicity protocol here is from a later work by Blum. We use Naor's commitment scheme.

Proof. We need to prove completeness, soundness, and zero knowledge.
Completeness can be easily verified, and so we leave this to the reader.
For soundness, we recall that (as we’ve seen before) with ex-
tremely high probability the sets 𝑆0 = {𝐺(𝑥) ∶ 𝑥 ∈ {0, 1}𝑛 } and
𝑆1 = {𝐺(𝑥) ⊕ 𝑧 ∶ 𝑥 ∈ {0, 1}𝑛 } will be disjoint (this probability is
over the choice of 𝑧 that is done by the verifier). Now, assuming this is
the case, given the messages {𝑦𝑖,𝑗 } sent by the prover in the first step,
define an 𝑛 × 𝑛 matrix 𝑀′ with entries in {0, 1, ?} as follows: $M'_{i,j} = 0$ if $y_{i,j} \in S_0$, $M'_{i,j} = 1$ if $y_{i,j} \in S_1$, and $M'_{i,j} = {?}$ otherwise.
We split into two cases. The first case is that there exists some per-
mutation 𝜋 such that (i) 𝑀 ′ is a 𝜋-permuted version of the input
graph 𝐻 and (ii) 𝑀 ′ contains a Hamiltonian cycle. Clearly in this case
𝐻 contains a Hamiltonian cycle as well, and hence we don’t need to
consider it when analyzing soundness. In the other case we claim that
with probability at least 1/2 the verifier will reject the proof. Indeed, if
(i) is violated then the proof will be rejected if Bob chooses 𝑏 = 0 and
if (ii) is violated then the proof will be rejected if Bob chooses 𝑏 = 1.
We now turn to showing zero knowledge. For this we need to build
a simulator 𝑆 ∗ for an arbitrary efficient strategy 𝑉 ∗ of Bob. Recall that
𝑆 ∗ gets as input the graph 𝐻 (but not the Hamiltonian cycle 𝐶) and
needs to produce an output that is indistinguishable from the output
of 𝑉 ∗ . It will do so as follows:
0. Pick 𝑏′ ∈ {0, 1}.
3. Let 𝑏 be the output of 𝑉 ∗ when given the input 𝐻 and the first
message {𝑦𝑖,𝑗 } computed as above. If 𝑏 ≠ 𝑏′ then go back to step 0.
ze ro know l e d g e p roofs 283
• Let HAM ∶ {0, 1}∗ → {0, 1} be the function that maps a graph 𝐺
to 1 if and only if 𝐺 contains a Hamiltonian cycle. Then HAM ∈
NP. Indeed, this is demonstrated by the function 𝑉𝐻𝐴𝑀 such that
𝑉𝐻𝐴𝑀 (𝐺, 𝐶) = 1 iff 𝐶 is a Hamiltonian cycle in the graph 𝐺.
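As a quick illustration, here is a minimal sketch of such a verifier 𝑉𝐻𝐴𝑀 in Python; the graph representation (a set of directed edge pairs) and all names are my own choices, not anything fixed by the text.

```python
# A polynomial-time verifier for Hamiltonicity: accept iff the certificate
# `cycle` visits every vertex exactly once and uses only edges of the graph.
def v_ham(n, edges, cycle):
    if sorted(cycle) != list(range(n)):   # each vertex appears exactly once
        return False
    return all((cycle[i], cycle[(i + 1) % n]) in edges for i in range(n))

square = {(0, 1), (1, 2), (2, 3), (3, 0)}
square |= {(b, a) for (a, b) in square}   # undirected: include both directions
print(v_ham(4, square, [0, 1, 2, 3]))     # True
print(v_ham(4, square, [0, 2, 1, 3]))     # False: (0,2) is not an edge
```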
P
Please make sure that you understand why this
will give a zero knowledge proof for 𝐹 , and in par-
ticular satisfy the completeness, soundness, and
zero-knowledge properties.
Note that while the NP-completeness of Hamiltonicity (and the Cook-Levin Theorem in general) is usually
perceived as a negative result (showing evidence for
the non-existence of an algorithm), in this context we
use it to obtain a positive result (zero knowledge proof
systems for many interesting functions).
This means that for every other NP language 𝐿, we can use the
reduction from 𝐿 to Hamiltonicity combined with protocol ZK-Ham
to give a zero knowledge proof system for 𝐿. In particular this means
that we can have zero knowledge proofs for the following languages:
Unlike the case of a trapdoor function, where it only took a year for Diffie and Hellman's challenge to be answered by RSA, in the case of fully homomorphic encryption cryptographers had no constructions achieving this goal for more than 30 years. In fact, some people suspected that there is something inherently incompatible between the security of an encryption scheme and the ability of a user to perform all these operations on ciphertexts. Stanford cryptographer Dan Boneh used to joke to incoming graduate students that he would immediately sign the thesis of anyone who came up with a fully homomorphic encryption. But he never expected that he would actually encounter such a thesis, until in 2009, Boneh's student Craig Gentry released a paper doing just that. Gentry's paper shook the world of cryptography, and instigated a flurry of research results making his scheme more efficient, reducing the assumptions it relied on, extending and applying it, and much more. In particular, Brakerski and Vaikuntanathan managed to obtain a fully homomorphic encryption scheme based only on the Learning with Errors (LWE) assumption we have seen before.
Although there are a number of implementations for (partially and) fully homomorphic encryption (see this list), there is still much work to be done in order to realize the full practical potential of FHE. For a comparable level of security, the encryption and decryption oper-
R
Remark 14.2 — Poor man's FHE via hardware. Since large scale fully homomorphic encryption is still impractical, people have been trying to achieve at least weaker security goals using certain assumptions. In particular Intel chips have so-called "secure enclaves" which one can think of as a somewhat tamper-protected region of the processor that is supposed to be out of reach for the outside world. The idea is that a client of a cloud provider would treat this enclave as a trusted party that it can communicate with through the cloud provider. The client can store their data on the cloud encrypted with some key 𝑘, and then set up a secure channel with the enclave using an authenticated key exchange protocol, and send 𝑘 over. Then, when the client sends over a function 𝑓 to the cloud provider, the latter party can simulate FHE by asking the enclave to compute the encryption of 𝑓(𝑥) given the encryption of 𝑥. In this solution ultimately the private key does reside on the cloud provider's computers, and the client has to trust the security of the enclave. In practice, this could provide reasonable security against remote hackers, but (unlike FHE) probably not against sophisticated attackers (e.g., governments) that have physical access to the server.
Definition 14.3 — Partially Homomorphic Encryption. Let ℱ = ∪ℱℓ be a class of functions where every 𝑓 ∈ ℱℓ maps {0, 1}ℓ to {0, 1}. An ℱ-homomorphic public key encryption scheme is a CPA secure public key encryption scheme (𝐺, 𝐸, 𝐷) such that there exists a polynomial-time algorithm EVAL ∶ {0, 1}∗ → {0, 1}∗ such that for every (𝑒, 𝑑) = 𝐺(1𝑛), ℓ = 𝑝𝑜𝑙𝑦(𝑛), 𝑥1, … , 𝑥ℓ ∈ {0, 1}, and 𝑓 ∈ ℱℓ of description size |𝑓| at most 𝑝𝑜𝑙𝑦(ℓ) it holds that:
• 𝐷𝑑(𝑐) = 𝑓(𝑥1, … , 𝑥ℓ).
P
Please stop and verify you understand the defini-
tion. In particular you should understand why some
bound on the length of the output of EVAL is needed to rule out trivial constructions that are the analogue of the cloud provider sending over to Alice the entire encrypted database every time she wants to evaluate a function of it. By artificially increasing the
randomness for the key generation algorithm, this
is equivalent to requiring that |𝑐| ≤ 𝑝(𝑛) for some
fixed polynomial 𝑝(⋅) that does not grow with ℓ or |𝑓|.
You should also understand the distinction between
ciphertexts that are the output of the encryption algo-
rithm on the plaintext 𝑏, and ciphertexts that decrypt
to 𝑏, see Fig. 14.2.
We claim that if the server cheats then the client will detect this
with probability 1/2 − 𝑛𝑒𝑔𝑙(𝑛). Working this out is a great exercise.
The probability of detection can be amplified to 1 − 𝑛𝑒𝑔𝑙(𝑛) using
appropriate repetition, see the paper for details.
The dLWE conjecture is that 𝑞(𝑛)-dLWE holds for every 𝑞(𝑛) that is at most 𝑝𝑜𝑙𝑦(𝑛). This is not exactly the same phrasing we used before, but as we sketch below, it is essentially equivalent to it. One can also make the stronger conjecture that 𝑞(𝑛)-dLWE holds even for 𝑞(𝑛) that is super polynomial in 𝑛 (e.g., 𝑞(𝑛) of magnitude roughly $2^n$ - note that such a number can still be described in 𝑛 bits and we can still efficiently perform operations such as addition and multiplication modulo 𝑞). This stronger conjecture also seems well supported by evidence and we will use it in future lectures.
P
It is a good idea for you to pause here and try to show
the equivalence on your own.
Equivalence between LWE and DLWE: The reason the two conjectures are equivalent is the following. Before, we phrased the conjecture as recovering 𝑠′ from a pair (𝐴′, 𝑦) where 𝑦 = 𝐴′𝑠′ + 𝑒 and $|e_i| \leq \delta q$ for every 𝑖. We then showed a search to decision reduction (Theorem 11.2) demonstrating that this is equivalent to the task of distinguishing between this case and the case that 𝑦 is a random vector. If we now let $\alpha = \lfloor q/2 \rfloor$ and $\beta = \alpha^{-1} \pmod{q}$, and consider the matrix $A = (-\beta y | A')$ and the column vector $s = \binom{\alpha}{s'}$, we see that $As = e$. Note that if 𝑦 is a random vector in $\mathbb{Z}_q^m$ then so is $-\beta y$, and so the current form of the conjecture follows from the previous one. (To reduce the number of free parameters, we fixed 𝛿 to equal $1/\sqrt{q}$; in this form the conjecture becomes stronger as 𝑞 grows.)
LWE-ENC’ encryption:
P
This claim is not hard to prove, but working it out for yourself can be a good way to get more familiarity with LWE-ENC' and the kind of manipulations we'll be making time and again in the constructions of many lattice based cryptographic primitives. Recall that a ciphertext 𝑐 of LWE-ENC' is a vector in $\mathbb{Z}_q^n$. Try to show that 𝑐 = 𝑐1 + ⋯ + 𝑐ℓ (where addition is done as vectors in ℤ𝑞) will be the encryption of 𝑏1 ⊕ ⋯ ⊕ 𝑏ℓ. Note that if 𝑞 is super polynomial in 𝑛 then ℓ can be an arbitrarily large polynomial in 𝑛.
Proof of Lemma 14.5. The proof is quite simple. EVAL will simply add the ciphertexts as vectors in ℤ𝑞. If $c = \sum c_i$ then
$$\langle c, s \rangle = \sum b_i \lfloor q/2 \rfloor + \xi \pmod{q}$$
where $\xi \in \mathbb{Z}_q$ is a "noise term" such that $|\xi| \leq \ell m \sqrt{q} \ll q$.
Since $|\lfloor q/2 \rfloor - q/2| < 1$, adding at most ℓ terms of this difference adds at most ℓ, and so we can also write
$$\langle c, s \rangle = \lfloor \sum b_i \tfrac{q}{2} \rfloor + \xi' \pmod{q}$$
for $|\xi'| \leq \ell m \sqrt{q} + \ell \ll q$.
If $\sum b_i$ is even then $\sum b_i \tfrac{q}{2}$ is an integer multiple of 𝑞 and hence in this case $|\langle c, s \rangle| \ll q$. If $\sum b_i$ is odd then $\lfloor \sum b_i \tfrac{q}{2} \rfloor = \lfloor q/2 \rfloor \pmod{q}$ and so in this case $|\langle c, s \rangle| = q/2 \pm o(q) > q/10$.
■
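To see Lemma 14.5 in action, here is a small Python sketch. It is not the actual LWE-ENC' scheme (whose description is not included above); it is a minimal private-key stand-in with the same decryption structure $\langle c, s \rangle \approx b \lfloor q/2 \rfloor$, chosen so that adding ciphertexts as vectors in ℤ𝑞 decrypts to the XOR of the bits. All parameters are illustrative.

```python
import random

n, q, noise = 20, 2**20, 10

t = [random.randrange(q) for _ in range(n - 1)]
s = t + [1]                                  # secret key; last coordinate is 1

def enc(b):
    a = [random.randrange(q) for _ in range(n - 1)]
    e = random.randrange(-noise, noise + 1)
    last = (b * (q // 2) + e - sum(x * y for x, y in zip(a, t))) % q
    return a + [last]                        # <c, s> = b*(q//2) + e (mod q)

def dec(c):
    v = sum(x * y for x, y in zip(c, s)) % q
    return 1 if q // 4 < v < 3 * q // 4 else 0   # near q/2 means the bit is 1

bits = [random.randrange(2) for _ in range(8)]
csum = [sum(col) % q for col in zip(*map(enc, bits))]  # add ciphertexts in Z_q
assert dec(csum) == sum(bits) % 2            # decrypts to b_1 xor ... xor b_8
print(sum(bits) % 2, dec(csum))
```

Note how the noise grows additively (here at most 8 · 10 ≪ 𝑞/4), matching the proof's requirement that the accumulated noise stays far below 𝑞.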
³ The choice of 1/3 is arbitrary, and can be amplified as needed.
P
This is not an easy definition to parse, but looking at Fig. 14.3 can help. Make sure you understand why LWE-ENC gives rise to a trapdoor generator satisfying all the conditions of Definition 14.6.
R
Remark 14.7 — Trapdoor generators in real life. In the above we use the notion of a "trapdoor" in the pseudorandom generator as a mathematical abstraction, but generators with actual trapdoors have arisen in practice. In 2007 the National Institute of Standards and Technology (NIST) released standards for pseudorandom generators. Pseudorandom generators are the quintessential private key primitive, typically built out of hash functions, block ciphers, and such, and so it was surprising that NIST included in the list a pseudorandom generator based on public key tools - the Dual EC DRBG generator based on elliptic curve cryptography. This was already strange but became even more worrying when Microsoft researchers Dan Shumow and Niels Ferguson showed that this generator could have a trapdoor in the sense that it contained some hardwired constants that, if generated in a particular way, would yield some information that (just like in 𝐺𝑠 above)
You are given a supply of sealed bags that are flexible enough so you can manipulate the object from outside the bag. However, each bag can only hold up for 10 seconds of such manipulations before it leaks. The idea is that if you can open one bag inside another within 9 seconds then you can use the extra second to perform one step. By repeating this, you can perform the manipulations for an arbitrary number of steps. Specifically, suppose that you have completed 𝑖 steps out of the total of 𝑇, and now have the partially constructed castle inside a sealed bag 𝐵𝑖. You now put the bag 𝐵𝑖 inside a fresh bag 𝐵𝑖+1. You now spend 9 seconds on opening the bag 𝐵𝑖 inside the bag 𝐵𝑖+1, and an extra second on performing the (𝑖+1)-st step in the construction. At this point we have completed 𝑖 + 1 steps and have the object in the bag 𝐵𝑖+1, and we can now continue by putting it in the bag 𝐵𝑖+2 and so on and so forth.
Proof. The idea behind the proof is simple but ingenious. Recall that
the NAND gate 𝑏, 𝑏′ ↦ ¬(𝑏 ∧ 𝑏′ ) is a universal gate that allows us
to compute any function 𝑓 ∶ {0, 1}𝑛 → {0, 1} that can be efficiently
computed. Thus, to obtain a fully homomorphic encryption it suffices
to obtain a function NANDEVAL such that 𝐷𝑑 (NANDEVAL(𝑐, 𝑐′ )) =
𝐷𝑑(𝑐) NAND 𝐷𝑑(𝑐′). (Note that this is stronger than the typical notion of homomorphic evaluation since we require that NANDEVAL outputs an encryption of 𝑏 NAND 𝑏′ when given any pair of ciphertexts that decrypt to 𝑏 and 𝑏′ respectively, regardless of whether these ciphertexts were produced by the encryption algorithm or by some other method, including the NANDEVAL procedure itself.)
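The reduction above only uses NANDEVAL as a black box. The sketch below shows just that composition step: evaluating an arbitrary NAND circuit gate by gate, given any nandeval procedure. The "encryption" here is a transparent placeholder with no security whatsoever; its only purpose is to make the composition runnable, and any real scheme's NANDEVAL can be swapped in.

```python
# Placeholder "encryption" -- completely insecure; it only lets us run the
# gate-by-gate composition that the bootstrapping theorem relies on.
enc = lambda b: {"ct": b}
dec = lambda c: c["ct"]
nandeval = lambda c1, c2: enc(1 - (dec(c1) & dec(c2)))  # swap in a real one

def eval_circuit(gates, cts):
    """gates: list of (i, j) pairs; each appends a wire NAND(wire i, wire j).
    cts: ciphertexts of the input wires. Returns the output wire's ciphertext."""
    wires = list(cts)
    for i, j in gates:
        wires.append(nandeval(wires[i], wires[j]))
    return wires[-1]

# XOR(b0, b1) as a circuit of four NAND gates over wires 0 (b0) and 1 (b1):
xor_gates = [(0, 1), (0, 2), (1, 2), (3, 4)]
for b0 in (0, 1):
    for b1 in (0, 1):
        print(b0, b1, dec(eval_circuit(xor_gates, [enc(b0), enc(b1)])))
```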
P
Don’t let the short proof fool you. This theorem is
quite deep and subtle, and requires some reading and
re-reading to truly “get” it.
15
Fully homomorphic encryption: Construction
Before the detailed description and analysis, let us first outline our
strategy. The following notion of “noisy homomorphic encryption”
will be of essential importance (see also Fig. 15.1).
• $E_e(b) \in \mathcal{C}_b^{\sqrt{q}}$ for any plaintext 𝑏. That is, "fresh encryptions" have noise at most $\sqrt{q}$.
• If $c \in \mathcal{C}_b^{\eta}$ and $c' \in \mathcal{C}_{b'}^{\eta'}$ then $ENAND(c, c') \in \mathcal{C}_{\overline{b \wedge b'}}^{\,n^3 \cdot \max\{\eta, \eta'\}}$.
Proof. For any function $f : \{0,1\}^m \rightarrow \{0,1\}$ which can be described by a circuit with depth ℓ, we can compute $EVAL(f, E_e(x_1), \cdots, E_e(x_m))$ with error up to $\sqrt{q}\,(n^3)^{\ell}$. (The initial error for $E_e(x_i)$ is smaller than $\sqrt{q}$ and the error will be accumulated with rate up to $n^3$.) Thus, to guarantee that $EVAL(f, E_e(x_1), \cdots, E_e(x_m))$ can be decrypted to $f(x_1, \cdots, x_m)$ correctly, we only need $\sqrt{q}\,(n^3)^{\ell} \ll q$, i.e., $n^{3\ell} \ll \sqrt{q} = 2^{\sqrt{n}/2}$. This is equivalent to $3\ell \log(n) \ll \sqrt{n}/2$, which can be guaranteed when $\ell = n^{o(1)}$ or $\ell = polylog(n)$.
■
We will assume the LWE conjecture with $q(n) \approx 2^{\sqrt{n}}$ in the remainder of this chapter. With Theorem 15.3 in hand, our goal is to construct a noisy FHE such that the decryption map (specifically the map 𝑑 ↦ 𝐷𝑑(𝑐) for any fixed ciphertext 𝑐) can be computed by a circuit with depth at most 𝑝𝑜𝑙𝑦𝑙𝑜𝑔(𝑛). (Theorem 14.8 refers to the map 𝑑 ↦ ¬(𝐷𝑑(𝑐) ∧ 𝐷𝑑(𝑐′)), but this latter map is obtained by applying one more NAND gate to two parallel executions of 𝑑 ↦ 𝐷𝑑(𝑐), and hence if the map 𝑑 ↦ 𝐷𝑑(𝑐) has depth at most 𝑝𝑜𝑙𝑦𝑙𝑜𝑔(𝑛) then so does the map 𝑑 ↦ ¬(𝐷𝑑(𝑐) ∧ 𝐷𝑑(𝑐′)).) Once we do this then we can obtain a fully homomorphic encryption scheme. We will now go into the details of how to construct what we want in the rest of this chapter. The most technical and interesting part is how to upper bound the noise/error.
P
You should make sure you understand the types of all the identifiers we refer to. In particular, above 𝐶 is an 𝑛 × 𝑛 matrix with entries in ℤ𝑞, 𝑠 is a vector in $\mathbb{Z}_q^n$, and 𝑏 is a scalar (i.e., just a number) in {0, 1}. See Fig. 15.2 for a visual representation of the ciphertexts in this "naive" encryption scheme. Keeping track of the dimensions of all objects will only become more important in the rest of this lecture.
matrix multiplication in ℤ𝑞).
Indeed, one can verify that both addition and multiplication succeed since
$$(C + C')s = (b + b')s$$
and
$$CC's = C(b's) = bb's .$$

Figure 15.2: In the "naive" version of the GSW encryption, to encrypt a bit 𝑏 we output an 𝑛 × 𝑛 matrix 𝐶 such that 𝐶𝑠 = 𝑏𝑠 where $s \in \mathbb{Z}_q^n$ is the secret key. In this scheme we can transform encryptions 𝐶, 𝐶′ of 𝑏, 𝑏′ respectively to an encryption 𝐶″ of NAND(𝑏, 𝑏′) by letting 𝐶″ = 𝐼 − 𝐶𝐶′.
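Here is a runnable sketch of this naive scheme. Since there is no noise it is of course completely insecure (an attacker can recover 𝑠 by solving linear equations); the parameters and the way we sample a random matrix 𝑅 with 𝑅𝑠 = 0 are my own choices.

```python
import numpy as np

q, n = 97, 6                                 # q prime so we can invert mod q
rng = np.random.default_rng(0)
s = rng.integers(1, q, n)                    # secret key; entries are nonzero
s0_inv = pow(int(s[0]), q - 2, q)            # inverse of s[0] mod q (Fermat)

def enc(b):
    # C = b*I + R with R s = 0: rows random subject to <row, s> = 0 (mod q).
    R = rng.integers(0, q, (n, n))
    R[:, 0] = (-R[:, 1:] @ s[1:]) * s0_inv % q   # fix column 0 so that R s = 0
    return (b * np.eye(n, dtype=int) + R) % q    # hence C s = b s (mod q)

def dec(C):
    return int((C @ s % q)[0] * s0_inv % q)      # read off the eigenvalue b

for b, b2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    Cnand = (np.eye(n, dtype=int) - enc(b) @ enc(b2)) % q  # I - C C'
    print(b, b2, dec(Cnand))                     # prints NAND(b, b2)
```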
R
Remark 15.4 — Private key FHE. We have not shown
how to generate a ciphertext without knowledge of 𝑠,
and hence strictly speaking we only showed in this
world how to get a private key fully homomorphic
encryption. Our "real world" scheme will be a full-fledged public key FHE. However we note that private key homomorphic encryption is already very interesting and in fact sufficient for many of the "cloud computing" applications. Moreover, Rothblum gave
$$(C + C')s = (b + b')s + (e + e') \qquad (15.1)$$
and
$$CC's = C(b's + e') = b'Cs + Ce' = bb's + (b'e + Ce') . \qquad (15.2)$$
P
I recommend you pause here and check for yourself whether it will be the case that 𝐶 + 𝐶′ encrypts 𝑏 + 𝑏′ and 𝐶𝐶′ encrypts 𝑏𝑏′ up to small noise or not.

plus the "noise vector" 𝑏′𝑒 + 𝐶𝑒′. The first component 𝑏′𝑒 of this noise vector is "short" (after all 𝑏′ ∈ {0, 1} and 𝑒 is "short"). However, the
If you think about it hard enough, it turns out that there is something known as the "binary basis" that allows us to encode a number 𝑥 ∈ ℤ𝑞 as a vector $\hat{x} \in \{0,1\}^{\log q}$.⁴ What's even more surprising is that this seemingly trivial trick turns out to be immensely useful. We will define the binary encoding of a vector or matrix 𝑥 over ℤ𝑞 by $\hat{x}$. That is, $\hat{x}$ is obtained by replacing every coordinate $x_i$ with $\log q$ coordinates $x_{i,0}, \ldots, x_{i,\log q - 1}$ such that
$$x_i = \sum_{j=0}^{\log q - 1} 2^j x_{i,j} . \qquad (15.3)$$

⁴ If we were being pedantic the length of the vector (and other constants below) should be the integer ⌈log 𝑞⌉ but I omit the ceiling symbols for simplicity of notation.
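A minimal sketch of this encoding and of eq. (15.3), with log 𝑞 standing for ⌈log 𝑞⌉ as in the footnote; the function names are my own:

```python
import math

q = 1000
logq = math.ceil(math.log2(q))

def hat(x):
    """Binary encoding of a vector over Z_q: each entry becomes logq bits."""
    return [(xi >> j) & 1 for xi in x for j in range(logq)]

def unhat(xh):
    """Inverse of hat via x_i = sum_j 2^j x_{i,j} (eq. 15.3)."""
    return [sum(2**j * xh[i * logq + j] for j in range(logq))
            for i in range(len(xh) // logq)]

x = [5, 999, 123]
assert unhat(hat(x)) == x
print(hat([5])[:logq])   # the bits of 5, least significant first
```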
for a “short” 𝑒.
Given the conditions 1, 2, and 3, we can now define the addition and multiplication operations for two ciphertexts 𝐶, 𝐶′ as follows:
• $C \oplus C' = C + C' \pmod{q}$
• $C \otimes C' = \widehat{CQ^{\top}} \, C'$
P
Please try to verify that if 𝐶, 𝐶′ are encryptions of 𝑏, 𝑏′ then 𝐶 ⊕ 𝐶′ and 𝐶 ⊗ 𝐶′ will be encryptions of 𝑏 + 𝑏′ and 𝑏𝑏′ respectively.

$$(C \otimes C')v = \widehat{CQ^{\top}}\, C'v = \widehat{CQ^{\top}}(b'v + e') . \qquad (15.6)$$
But since $v = Q^{\top}s$ and $\widehat{A}\,Q^{\top} = A$ for every matrix 𝐴, the righthand side of (15.6) equals
$$\widehat{CQ^{\top}}(b'Q^{\top}s + e') = b'CQ^{\top}s + \widehat{CQ^{\top}}\,e' = b'Cv + \widehat{CQ^{\top}}\,e' \qquad (15.7)$$
but since $\widehat{B}$ is a matrix with small coefficients for every 𝐵 and 𝑒′ is short, the righthand side of (15.7) equals 𝑏′𝐶𝑣 up to a short vector, and since 𝐶𝑣 = 𝑏𝑣 + 𝑒 and 𝑏′𝑒 is short, we get that (𝐶 ⊗ 𝐶′)𝑣 equals 𝑏′𝑏𝑣 plus a short vector as desired.
We can now define
ENAND(𝐶, 𝐶 ′ ) = 𝐼 − 𝐶 ⊗ 𝐶 ′ .
FHEENC:
Once we obtain 1-4 above, we can plug FHEENC into the Bootstrap-
ping Theorem (Theorem 14.8) and thus complete the proof of exis-
tence of a fully homomorphic encryption scheme. We now address
those points one by one.
15.5.1 Correctness
Correctness of the scheme will follow from the following stronger
condition:
Lemma 15.5 For every 𝑏 ∈ {0, 1}, if 𝐶 is the encryption of 𝑏 then it is an (𝑛 log 𝑞) × (𝑛 log 𝑞) matrix satisfying
$$CQ^{\top}s = bQ^{\top}s + e$$
where $\max_i |e_i| \ll \sqrt{q}$.
Proof. For starters, let us see that the dimensions make sense: the encryption of 𝑏 is computed by $C = \widehat{(bQ^{\top} + D)}$ where 𝐷 is an (𝑛 log 𝑞) × 𝑛 matrix satisfying $|(Ds)_i| \leq \sqrt{q}$ for every 𝑖.
Since $Q^{\top}$ is also an (𝑛 log 𝑞) × 𝑛 matrix, adding $bQ^{\top}$ (i.e. either $Q^{\top}$ or the all-zeroes matrix, depending on whether or not 𝑏 = 1) to 𝐷 makes sense, and applying the $\hat{\cdot}$ operation will transform every row to length 𝑛 log 𝑞, hence 𝐶 is indeed a square (𝑛 log 𝑞) × (𝑛 log 𝑞) matrix.
Let us now see what this matrix 𝐶 does to the vector $v = Q^{\top}s$. Using the fact that $\widehat{M}\,Q^{\top} = M$ for every matrix 𝑀, we get that
$$Cv = (bQ^{\top} + D)s = bv + Ds$$
but by construction $|(Ds)_i| \leq \sqrt{q}$ for every 𝑖.
■
15.5.3 Homomorphism
Let $v = Q^{\top}s$, 𝑏 ∈ {0, 1} and 𝐶 be a ciphertext such that 𝐶𝑣 = 𝑏𝑣 + 𝑒. We define the noise of 𝐶, denoted by 𝜇(𝐶), to be the maximum of $|e_i|$ over all 𝑖 ∈ [𝑛 log 𝑞]. We state the following lemma, which we'll call the "noisy homomorphism lemma":
Lemma 15.6 Let 𝐶, 𝐶′ be ciphertexts encrypting 𝑏, 𝑏′ respectively with 𝜇(𝐶), 𝜇(𝐶′) ≤ 𝑞/4. Then $C'' = C \mathbin{\overline{\wedge}} C'$ encrypts 𝑏 NAND 𝑏′ and satisfies $\mu(C'') \leq (2n \log q) \max\{\mu(C), \mu(C')\}$.
𝑓𝐶 (𝑑) = 𝐷𝑑 (𝐶)
where 𝑑 is the secret key and 𝐷𝑑 (𝐶) denotes the decryption algorithm
applied to 𝐶.
In our case a circuit of depth $polylog(n) \ll n^{0.5}$ will be "sufficiently shallow". Specifically, by repeatedly applying the noisy homomorphism lemma (Lemma 15.6), we can show that we can homomorphically evaluate every circuit of NAND gates whose depth ℓ satisfies the condition $(2n \log q)^{\ell} \ll \sqrt{q}$. If $q = 2^{\sqrt{n}}$ then (assuming 𝑛 is sufficiently large) as long as $\ell < n^{0.49}$ this will be satisfied.
We will encode the secret key of the encryption scheme as the binary string $\hat{s}$ which describes our vector $s \in \mathbb{Z}_q^n$ as a bit string of length 𝑛 log 𝑞. Given a ciphertext 𝐶, the decryption algorithm takes the dot product modulo 𝑞 of 𝑠 with the first row of $CQ^{\top}$. This can be
P
Please make sure you understand the above argument.
P
Please stop here and see if you understand why the
natural circuit to compute the addition of two num-
bers modulo 𝑞 (represented as log 𝑞-length binary
1. For every $\hat{s} \in \{0,1\}^n$ such that $|\langle \hat{s}, c \rangle| < 0.1q$, $f(\hat{s}) = 0$
2. For every $\hat{s} \in \{0,1\}^n$ such that $0.4q < |\langle \hat{s}, c \rangle| < 0.6q$, $f(\hat{s}) = 1$
P
Now would be a good point to go back and see you
understand how all the pieces fit together to obtain
the complete construction of the fully homomorphic
encryption scheme.
doesn’t clearly separate the plaintext message and noise in its de-
cryption structure. Specifically, we have the form of 𝑐⊤ 𝑠 = 𝑚 + 𝑒
and the noise lies with the LSB part of the message and does pollute
the lowest bits of the message. Note that this is acceptable as long as
it preserves enough precision. Now we can evaluate rounding(i.e.,
rescaling in the paper) homomorphically, by dividing both a cipher-
text 𝑐 and the parameter 𝑞 by some factor 𝑝. The concept of handling
ciphertexts with a different encryption parameter 𝑞 ′ = 𝑞/𝑝 is already
known to be possible. You can find more details on this modulus
switching technique in this paper if you are interested. Besides, it is
also proved that the precision loss of the decrypted evaluation result
is at most one more bit loss compared to the plaintext computation
result, which means the scheme’s precision guarantee is nearly opti-
mal. This scheme offers an efficient homomorphic encryption setting
for many practical data science and machine learning applications
which does not require precise values, but approximate ones. You may
check existing open source libraries, such as MS SEAL and HEAAN,
of this scheme as well as many practical applications including logistic
regression in the literature.
• 𝑐∗ = COMP(𝑐).
Definition 15.9 — Rate of Compressible Fully Homomorphic Encryption. A compressible fully homomorphic public key encryption scheme has rate 𝛼 = 𝛼(𝑛) if for every (𝑒, 𝑑) = 𝐺(1𝑛), ℓ = 𝑝𝑜𝑙𝑦(𝑛), 𝑥1, … , 𝑥ℓ ∈ {0, 1}, and 𝑓 ∶ {0, 1}ℓ → {0, 1}∗ with sufficiently long output, it holds that
• The server evaluates 𝑐 = EVAL(𝑓, 𝐸𝑒 (𝑖)), where 𝑓(𝑖) returns the 𝑖-th
entry of the database, and sends it (or its compressed version 𝑐∗ )
back to the client.
• The client decrypts 𝐷𝑑 (𝑐) or 𝐷𝑑 (𝑐∗ ) and obtains the 𝑖-th entry of the
database.
Here are some good exercises to make sure you follow the defini-
tion:
P
It is an excellent idea for you to pause here and try to
work out at least informally these exercises.
• We have 𝑘 parties where the first party is the auctioneer and parties 2, … , 𝑘 are bidders. Let's assume for simplicity that each party 𝑖 has a public key 𝑣𝑖 that is associated with some bitcoin account.⁹ We treat all these keys as the public input.
• The private input of bidder 𝑖 is the value 𝑥𝑖 that it wants to bid as well as the secret key 𝑠𝑖 that corresponds to their public key.
• The functionality only provides an output to the auctioneer, which would be the identity 𝑖 of the winning bidder as well as a valid signature by this bidder transferring 𝑥 bitcoins to the key 𝑣1 of the auctioneer, where 𝑥 is the value of the second largest valid bid (i.e., 𝑥 equals the second largest 𝑥𝑗 such that 𝑠𝑗 is indeed the private key corresponding to 𝑣𝑗).

⁹ As we discussed before, bitcoin doesn't have the notion of accounts; rather, what we mean by that is that for each one of the public keys, the public ledger contains a sufficiently large amount of bitcoins that have been transferred to these keys (in the sense that whoever can sign w.r.t. these keys can transfer the corresponding coins).
It’s worthwhile to think about what a secure protocol for this func-
tionality accomplishes. For example:
• The fact that in the ideal model the adversary needs to choose its
queries independently means that the adversary cannot get any
information about the honest parties’ bids before deciding on its
bid.
• Despite all parties using their signing keys as inputs to the protocol,
we are guaranteed that no one will learn anything about another
party’s signing key except the single signature that will be pro-
duced.
• Note that if 𝑖 is the highest bidder and 𝑗 is the second highest, then
at the end of the protocol we get a valid signature using 𝑠𝑖 on a
transaction transferring 𝑥𝑗 bitcoins to 𝑣1 , despite 𝑖 not knowing the
value 𝑥𝑗 (and in fact never learning the identity of 𝑗). Nonetheless,
𝑖 is guaranteed that the signature produced will be on an amount
not larger than its own bid and an amount that one of the other
bidders actually bid for.
• On the other side, a company might wish to split its own key between several servers residing in different countries, to ensure that no single one of them is completely under one jurisdiction. Or it might do such splitting for technical reasons, so that if there is a break-in at a single site, the key is not compromised.
There are several other such examples. One problem with this approach is that splitting a cryptographic key is not the same as cutting a 100 dollar bill in half. If you simply give half of the bits to each party,
1. A protocol for the “honest but curious” case using fully homomor-
phic encryption.
2. A reduction of the general case into the “honest but curious” case
where the adversary follows the protocol precisely but merely
attempts to learn some information on top of the output that it is
Theorem 16.5 — Honest-but-curious to malicious security compiler. There is a polynomial-time "compiler" 𝐶 such that for every 𝑘-party protocol (𝑃1, … , 𝑃𝑘) (where all 𝑃𝑖's are polynomial-time computable, potentially randomized strategies), (𝑃1̃, … , 𝑃𝑘̃) = 𝐶(𝑃1, … , 𝑃𝑘) is a 𝑘-tuple of polynomial-time computable strategies, and moreover if (𝑃1, … , 𝑃𝑘) was a protocol for computing some (potentially randomized) functionality 𝐹 secure with respect to honest-but-curious adversaries, then (𝑃1̃, … , 𝑃𝑘̃) is a protocol for computing the same 𝐹 secure with respect to malicious adversaries.
secure even if Bob tries to cheat. Let's focus on Alice. Let's imagine (without loss of generality) that Alice and Bob alternate sending messages in the protocol with Alice going first, so that Alice sends the odd messages and Bob sends the even ones. Let's denote by 𝑚𝑖 the message sent in the 𝑖-th round of the protocol. Alice's instructions can be thought of as a sequence of functions 𝑓1, 𝑓3, ⋯ , 𝑓𝑡 (where 𝑡 is the last round in which Alice speaks), where each 𝑓𝑖 is an efficiently computable function mapping Alice's secret input 𝑥1, (possibly) her random coins 𝑟1, and the transcript of the previous messages 𝑚1, … , 𝑚𝑖−1 to the next message 𝑚𝑖. The functions {𝑓𝑖} are publicly known and part of the protocol's instructions. The only thing that Bob doesn't
know is 𝑥1 and 𝑟1 . So, our idea would be to change the protocol so
that after Alice sends the message 𝑖, she proves to Bob that it was in-
deed computed correctly using 𝑓𝑖 . If 𝑥1 and 𝑟1 weren’t secret, Alice
could simply send those to Bob so he can verify the computation on
his own. But because they are (and the security of the protocol could
depend on that) we instead use a zero knowledge proof.
Let’s assume for starters that Alice’s strategy is deterministic (and
so there is no random tape 𝑟1 ). A first attempt to ensure she can’t
use a malicious strategy would be for Alice to follow the message
𝑚𝑖 with a zero knowledge proof that there exists some 𝑥1 such that
𝑚𝑖 = 𝑓𝑖 (𝑥1 , 𝑚1 , … , 𝑚𝑖−1 ). However, this will actually not be secure
- it is worth while at this point for you to pause and think if you can
understand the problem with this solution.
P
Really, please stop and think why this will not be
secure.
P
Did you stop and think?
The problem is that at every step Alice proves that there exists some input 𝑥1 that can explain her message, but she doesn't prove that it's the same input for all messages. If Alice was being truly honest, she should have picked her input once and used it throughout the protocol; she should not be able to compute the first message according to the input 𝑥1 and then the third message according to some input 𝑥′1 ≠ 𝑥1.
Of course we can’t have Alice reveal the input, as this would violate
security. The solution is for Alice to commit in advance to the input.
We have seen commitments before, but let us now formally define the
notion:
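(The formal definition is not included in this excerpt.) As a concrete, heuristic illustration of the syntax of a commitment scheme, here is the folklore hash-based construction; its hiding and binding properties are typically argued in the random oracle model, and it is only a stand-in for this illustration, not the Naor scheme used elsewhere in these notes.

```python
import hashlib, os

def commit(x: bytes):
    r = os.urandom(32)                         # fresh randomness per commitment
    com = hashlib.sha256(r + x).hexdigest()    # hides x, binds the committer
    return com, r

def verify(com, x: bytes, r: bytes):
    return hashlib.sha256(r + x).hexdigest() == com

com, r = commit(b"Alice's input x_1")
# ... the protocol runs; each ZK proof now also asserts consistency with com ...
print(verify(com, b"Alice's input x_1", r))   # True
print(verify(com, b"a different input", r))   # False: binding in practice
```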
We will not prove security but will only sketch it here, see Section
7.3.2 in Goldreich’s survey for a more detailed proof:
• To argue that we maintain security for Alice we use the zero knowl-
edge property: we claim that Bob could not learn anything from
the zero knowledge proofs precisely because he could have sim-
ulated them by himself. We also use the hiding property of the
commitment scheme. To prove security formally we need to show
that whatever Bob learns in the modified protocol, he could have
learned in the original protocol as well. We do this by simulating
Bob by replacing the commitment scheme with commitment to
some random junk instead of 𝑥1 and the zero knowledge proofs
with their simulated version. The proof of security requires a hy-
brid argument, and is again a good exercise to try to do on your own.
• To argue that we maintain security for Bob we use the binding prop-
erty of the commitment scheme as well as the soundness property
of the zero knowledge system. Once again for the formal proof
we need to show that we could transform any potentially mali-
cious strategy for Alice in the modified protocol into an “honest
but curious” strategy in the original protocol (also allowing Alice
the ability to abort the protocol). It turns out that to do so, it is not
We can repeat this transformation for Bob (or Charlie, David, etc., in the 𝑘 > 2 party case) to transform a protocol secure in the honest
but curious setting into a protocol secure (allowing for aborts) in the
malicious setting.
Note that Alice knows 𝑟. Bob doesn’t know 𝑟 but because he chose
𝑟″ after Alice committed to 𝑟′ he knows that it must be fully random
regardless of Alice’s choice of 𝑟′ . It can be shown that if we use this
coin tossing protocol at the beginning and then modify the zero
knowledge proofs to show that 𝑚𝑖 = 𝑓𝑖 (𝑥1 , 𝑟1 , 𝑚1 , … , 𝑚𝑖−1 ) where
𝑟1 is the string that is consistent with the transcript of the coin toss-
ing protocol, then we get a general transformation of an honest but
curious adversary into the malicious setting.
The notion of multiparty secure computation - defining it and
achieving it - is quite subtle and I do urge you to read some of the
Definition 17.2 — Two party honest-but-curious secure computation. Let 𝐹 be a (possibly probabilistic) map of {0, 1}𝑛 × {0, 1}𝑛 to {0, 1}𝑛 × {0, 1}𝑛. A secure protocol for 𝐹 is a two party protocol such that for every party 𝑡 ∈ {1, 2}, there exists an efficient "ideal adversary" (i.e., efficient interactive algorithm) 𝑆 such that for every pair of inputs (𝑥1, 𝑥2) the following two distributions are computationally indistinguishable:
That is, 𝑆, which only gets the input 𝑥𝑡 and output 𝑦𝑡 , can sim-
ulate all the information that an honest-but-curious adversary
controlling party 𝑡 will view.
First, note that if Alice and Bob both follow the protocol, then in-
deed at the end of the protocol Alice will compute 𝐹 (𝑥, 𝑦). We now
claim that Bob does not learn anything about Alice’s input:
Proof: Bob only receives a single message in this protocol of the form
(𝑒, 𝑐) where 𝑒 is a public key and 𝑐 = 𝐸𝑒 (𝑥). The simulator 𝑆 will
generate (𝑒, 𝑑) ←𝑅 𝐺(1𝑛 ) and compute (𝑒, 𝑐) where 𝑐 = 𝐸𝑒 (0𝑛 ).
(As usual 0𝑛 denotes the length 𝑛 string consisting of all zeroes.)
No matter what 𝑥 is, the output of 𝑆 is indistinguishable from the
message Bob receives by the security of the encryption scheme. QED
(In fact, Claim B holds even against a malicious strategy of Bob - can you see why?)
We would now hope that we can prove the same regarding Alice's security. That is, prove the following:
P
At this point, you might want to try to see if you can
prove Claim A on your own. If you’re having difficul-
ties proving it, try to think whether it’s even true.
So, it turns out that Claim A is not generically true. The reason is the following: the definition of fully homomorphic encryption only requires that EVAL(𝑓, 𝐸(𝑥)) decrypts to 𝑓(𝑥), but it does not require that it hides the contents of 𝑓. For example, for every FHE, if we modify EVAL(𝑓, 𝑐) to append to the ciphertext the first 100 bits of the description of 𝑓 (and have the decryption algorithm ignore this extra information) then this would still be a secure FHE.² Now we didn't exactly specify how we describe the function 𝑓(𝑥) defined as 𝑥 ↦ 𝐹(𝑥, 𝑦), but there are clearly representations in which the first 100 bits of the description would reveal the first few bits of the hardwired constant 𝑦, hence meaning that Alice will learn those bits from Bob's message.

² It's true that strictly speaking, we allowed EVAL's output to have length at most 𝑛, while this would make the output be 𝑛 + 100, but this is just a technicality that can be easily bypassed, for example by having a new scheme that on security parameter 𝑛 runs the original scheme with parameter 𝑛/2 (and hence will have a lot of "room" to pad the output of EVAL with extra bits).
Thus we need to get a stronger property, known as circuit privacy.
This is a property that’s useful in other contexts where we use FHE.
Let us now define it:
Definition — Perfect circuit privacy. Let ℰ = (𝐺, 𝐸, 𝐷, EVAL) be an FHE. We say that ℰ satisfies perfect circuit privacy if for every (𝑒, 𝑑) output by 𝐺(1𝑛) and every function 𝑓 ∶ {0, 1}ℓ → {0, 1} of 𝑝𝑜𝑙𝑦(𝑛) description size, and every ciphertexts 𝑐1, … , 𝑐ℓ and plaintexts 𝑥1, … , 𝑥ℓ ∈ {0, 1} such that 𝑐𝑖 is output by 𝐸𝑒(𝑥𝑖), the distribution of EVAL𝑒(𝑓, 𝑐1, … , 𝑐ℓ) is identical to the distribution of 𝐸𝑒(𝑓(𝑥)). That is, for every 𝑧 ∈ {0, 1}∗, the probability that EVAL𝑒(𝑓, 𝑐1, … , 𝑐ℓ) = 𝑧 is the same as the probability that 𝐸𝑒(𝑓(𝑥)) = 𝑧. We stress that these probabilities are taken only over the coins of the algorithms EVAL and 𝐸.
Perfect circuit privacy is a strong property, that also automatically
implies that 𝐷𝑑 (EVAL(𝑓, 𝐸𝑒 (𝑥1 ), … , 𝐸𝑒 (𝑥ℓ ))) = 𝑓(𝑥) (can you see
why?). In particular, once you understand the definition, the follow-
ing lemma is a fairly straightforward exercise.
Lemma 17.3 If (𝐺, 𝐸, 𝐷, EVAL) satisfies perfect circuit privacy then if
(𝑒, 𝑑) = 𝐺(1𝑛 ) then for every two functions 𝑓, 𝑓 ′ ∶ {0, 1}ℓ → {0, 1} of
𝑝𝑜𝑙𝑦(𝑛) description size and every 𝑥 ∈ {0, 1}ℓ such that 𝑓(𝑥) = 𝑓 ′ (𝑥),
and every algorithm 𝐴,
| Pr[𝐴(𝑑, EVAL(𝑓, 𝐸𝑒 (𝑥1 ), … , 𝐸𝑒 (𝑥ℓ ))) = 1]−Pr[𝐴(𝑑, EVAL(𝑓 ′ , 𝐸𝑒 (𝑥1 ), … , 𝐸𝑒 (𝑥ℓ ))) = 1]| < 𝑛𝑒𝑔𝑙(𝑛).
(17.1)
P
Please stop here and try to prove Lemma 17.3
The algorithm 𝐴 above gets the secret key as input, but still cannot
distinguish whether the EVAL algorithm used 𝑓 or 𝑓 ′ . In fact, the
expression on the lefthand side of (17.1) is equal to zero when the
scheme satisfies perfect circuit privacy.
where once again, these probabilities are taken only over the
coins of the algorithms EVAL and 𝐸.
If you find Definition 17.4 hard to parse, the most important points
you need to remember about it are the following:
(The third point, which goes without saying, is that you can always
ask clarifying questions in class, Piazza, sections, or office hours…)
Intuitively, circuit privacy corresponds to what we need in the
above protocol to protect Bob’s security and ensure that Alice doesn’t
get any information about his input that she shouldn’t have from
the output of EVAL, but before working this out, let us see how we
can construct fully homomorphic encryption schemes satisfying this
property.
P
We omit the proof of Lemma 17.6 and leave it as an exercise to prove it using the hybrid argument. We will actually only use Lemma 17.6 for the distributions above; you can obtain intuition for it by considering the 𝑚 = 2 case where we compare the rectangles of the forms [−𝑇, +𝑇] × [−𝑇, +𝑇] and [−𝑇 + 𝑎, +𝑇 + 𝑎] × [−𝑇 + 𝑏, +𝑇 + 𝑏]. You can see that their union has size roughly $4T^2$ while their symmetric difference has size roughly $2T \cdot 2a + 2T \cdot 2b$, and so
We will not provide the full details, but together these lemmas show that EVAL can use bootstrapping to reduce the magnitude of the noise to roughly $2^{n^{0.1}}$ and then add an additional random noise that will make the result statistically indistinguishable from the actual encryption. Here are some hints on how to make this work: the idea is that in order to "re-randomize" a ciphertext 𝐶 we need a very noisy encryption of zero and add it to 𝐶. The normal encryption will use noise of magnitude $2^{n^{0.2}}$ but we will provide an
so we can use bootstrapping to reduce the noise. The main idea that allows us to add noise is that at the end of the day, our scheme boils down to LWE instances that have the form (𝑐, 𝜎) where 𝑐 is a random vector in $\mathbb{Z}_q^{n-1}$ and $\sigma = \langle c, s \rangle + a$ where 𝑎 ∈ [−𝜂, +𝜂] is a small noise addition. If we take any such input and add to 𝜎 some 𝑎′ ∈ [−𝜂′, +𝜂′] then we create the effect of completely re-randomizing the noise. However, completely analyzing this requires a non-trivial amount of care and work.
• For every pair of keys (𝑒, 𝑑) = 𝐺(1𝑛 ) there are two distributions
𝒞0 , 𝒞1 over {0, 1}𝑛 such that:
Proof Idea:
We do not include the full proof but the idea is we use our standard
LWE-based FHE and to rerandomize a ciphertext 𝑐 we will add to it an
encryption of 0 (which will not change the corresponding plaintext)
and an additional noise vector that would be of much larger mag-
nitude than the original noise vector of 𝑐, but still small enough so
decryption succeeds.
⋆
We can now describe the protocol for three parties. We will focus
on the case where the goal is for Alice to learn 𝐹 (𝑥, 𝑦, 𝑧) (where 𝑥, 𝑦, 𝑧
are the private inputs of Alice, Bob, and Charlie, respectively) and for
Bob and Charlie to learn nothing. As usual we can reduce the gen-
eral case to this by running the protocol multiple times with parties
switching the roles of Alice, Bob, and Charlie.
Theorem 17.9 — Three party honest-but-curious secure computation. If the underlying encryption is a fully homomorphic statistically circuit private encryption then Protocol 3PC is a secure protocol for the
P
You should read the paragraphs above more than once and make sure you appreciate how truly mind boggling these results are.

Figure 18.2: The setup of the double slit experiment in the case of photon or electron guns. We see also destructive interference in the sense that there are some positions on the wall that get fewer hits when both slits are open than they get when only one of the slits is open. Image credit: Wikipedia.
18.2 QUANTUM AMPLITUDES
The double slit and other experiments ultimately forced scientists to
accept a very counterintuitive picture of the world. It is not merely
about nature being randomized, but rather it is about the probabilities
in some sense “going negative” and cancelling each other!
Specifically, consider an event that can either occur or not (e.g. “de-
tector number 17 was hit by a photon”). In classical probability, we
model this by a probability distribution over the two outcomes: a pair
of non-negative numbers 𝑝 and 𝑞 such that 𝑝 + 𝑞 = 1, where 𝑝 corre-
sponds to the probability that the event occurs and 𝑞 corresponds to
the probability that the event does not occur. In quantum mechanics, we model this also by a pair of numbers, which we call amplitudes. This
is a pair of (potentially negative or even complex) numbers 𝛼 and 𝛽
such that |𝛼|2 + |𝛽|2 = 1. The probability that the event occurs is |𝛼|2
and the probability that it does not occur is |𝛽|2 . In isolation, these
negative or complex numbers don’t matter much, since we anyway
square them to obtain probabilities. But the interaction of positive and
negative amplitudes can result in surprising cancellations where some-
how combining two scenarios where an event happens with positive
probability results in a scenario where it never does.
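A two-line numeric illustration of such cancellations: the Hadamard map sends |0⟩ to an equal superposition, but applying it twice returns to |0⟩ with certainty, because the two "paths" to |1⟩ have amplitudes +1/2 and −1/2 that cancel. Probabilities, which can't be negative, can never cancel this way.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # the Hadamard map (a unitary)
zero = np.array([1.0, 0.0])                    # the state |0>

once = H @ zero
print(np.abs(once) ** 2)       # [0.5 0.5]: after one step, a fair coin
print(np.abs(H @ once) ** 2)   # [1. 0.]: the amplitudes for |1> cancelled

P = np.abs(H) ** 2             # the "classical" analogue: a stochastic matrix
print(P @ (P @ zero))          # [0.5 0.5]: probabilities cannot cancel
```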
P
If you don’t find the above description confusing and
unintuitive, you probably didn’t get it. Please make
sure to re-read the above paragraphs until you are
thoroughly confused.
R
Remark 18.1 — Complex vs real, other simplifications. If
(like the author) you are a bit intimidated by complex
numbers, don’t worry: you can think of all ampli-
tudes as real (though potentially negative) numbers
without loss of understanding. All the “magic” of
quantum computing already arises in this case, and
so we will often restrict attention to real amplitudes in
this chapter.
We will also only discuss so-called pure quantum
states, and not the more general notion of mixed states.
Pure states turn out to be sufficient for understanding
the algorithmic aspects of quantum computing.
More generally, this chapter is not meant to be a complete description of quantum mechanics, quantum information theory, or quantum computing, but rather to illustrate the main points where these differ from classical computing.
So, he asked whether one could design a quantum system such that its outcome 𝑦 based on the initial condition 𝑥 would be some function 𝑦 = 𝑓(𝑥) such that (a) we don't know how to efficiently compute it in any other way, and (b) is actually useful for something.² In 1985, David Deutsch formally suggested the notion of a quantum Turing machine, and the model has since been refined in works of Deutsch and Josza and Bernstein and Vazirani. Such a system is now known as a quantum computer.
For a while these hypothetical quantum computers seemed useful for one of two things. First, to provide a general-purpose mechanism to simulate a variety of the real quantum systems that people care about. Second, as a challenge to the theory of computation's approach to model efficient computation by Turing machines, though a challenge that has little bearing on practice, given that this theoretical "extra power" of quantum computers seemed to offer little advantage in the problems people actually want to solve, such as combinatorial optimization, machine learning, data structures, etc.
To a significant extent, this is still true today. We have no real evidence that quantum computers, when built, will offer truly significant³ advantage in 99 percent of the applications of computing.⁴ However, there is one cryptography-sized exception: In 1994 Peter Shor showed that quantum computers can solve the integer factoring and discrete logarithm problems in polynomial time. This result has captured the imagination of a great many people, and completely energized research into quantum computing.
This is both because the hardness of these particular problems provides the foundations for securing such a huge part of our communications (and these days, our economy), as well as because it was a powerful demonstration that quantum computers could turn out to be useful for problems that a-priori seemed to have nothing to do with quantum physics.

² As its title suggests, Feynman's lecture was actually focused on the other side of simulating physics with a computer, but he mentioned as a "side remark" that one could wonder if it's possible to simulate physics with a new kind of computer - a "quantum computer" which would "not [be] a Turing machine, but a machine of a different kind". As far as I know, Feynman did not suggest that such a computer could be useful for computations completely outside the domain of quantum simulation, and in fact he found the question of whether quantum mechanics could be simulated by a classical computer to be more interesting.
³ I am using the theorist's definition of conflating "significant" with "super-polynomial". As we'll see, Grover's algorithm does offer a very generic quadratic advantage in computation. Whether that quadratic advantage will ever be good enough to offset in practice the significant overhead in building a quantum computer remains an open question. We also don't have evidence that super-polynomial speedups can't be achieved for some problems outside the Factoring/Dlog or quantum simulation domains, and there is at least one company banking on such speedups actually being feasible.
⁴ This "99 percent" is a figure of speech, but not completely so. It seems that for many web servers, the TLS protocol (which based on the current non-lattice based systems would be completely broken by quantum computing) is responsible for about 1 percent of the CPU usage.
At the moment there are several intensive efforts to construct large
scale quantum computers. It seems safe to say that, in the next five
years or so there will not be a quantum computer large enough to fac-
tor, say, a 1024 bit number. However, some quantum computers have
been built that achieved tasks that are either not known to be achievable classically, or at least seem to require more resources classically than they do for these quantum computers. When and if such a computer is built that can break reasonable parameters of Diffie Hellman, RSA and elliptic curve cryptography is anybody's guess. It could also be a "self destroying prophecy" whereby the existence of a small-scale quantum computer would cause everyone to shift to lattice-based crypto, which in turn would diminish the motivation to invest the huge resources needed to build a large scale quantum computer.⁵
The above summary might be all that you need to know as a cryptographer, and enough motivation to study lattice-based cryptography as we do in this course. However, because quantum computing is such a beautiful and (like cryptography) counter-intuitive concept, we will try to give at least a hint of what it is about and how Shor's algorithm works.

⁵ Of course, given that "export grade" cryptography that was supposed to disappear in the 1990's took a long time to die, I imagine that we'll still have products running 1024 bit RSA when everyone has a quantum laptop.
biggest challenge is how to keep the system from being measured and
collapsing to a single classical combination of states. This is sometimes
known as the coherence time of the system. The threshold theorem says
that there is some absolute constant level of errors 𝜏 so that if errors
are created at every gate at rate smaller than 𝜏 then we can recover from those and perform arbitrarily long computations. (Of course there are different ways to model the errors and so there are actually several threshold theorems corresponding to various noise models).
There have been several proposals to build quantum computers:
𝑓(𝑥) ⊕ 𝑔(𝑦) = 𝑥 ∧ 𝑦
for all the four choices of (𝑥, 𝑦) ∈ {0, 1}². Let's plug in all these four choices and see what we get (below we use the equalities 𝑧 ⊕ 0 = 𝑧, 𝑧 ∧ 0 = 0 and 𝑧 ∧ 1 = 𝑧):
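(The four-case analysis itself is not included in this excerpt, but an exhaustive check makes the same point: no deterministic 𝑓, 𝑔 satisfy all four constraints, and the best satisfy only three, hence deterministic classical strategies win with probability at most 3/4.)

```python
# Enumerate all deterministic strategies f, g : {0,1} -> {0,1} and count how
# many of the four constraints f(x) xor g(y) = x and y each pair satisfies.
from itertools import product

best = 0
for f in product((0, 1), repeat=2):        # f[0] = f(0), f[1] = f(1)
    for g in product((0, 1), repeat=2):
        wins = sum((f[x] ^ g[y]) == (x & y) for x in (0, 1) for y in (0, 1))
        best = max(best, wins)
print(best)   # 3: no strategy wins all four cases
```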
Proof. Alice and Bob will start by preparing a 2-qubit quantum system in the state
$$\psi = \tfrac{1}{\sqrt{2}}|00\rangle + \tfrac{1}{\sqrt{2}}|11\rangle$$
(this state is known as an EPR pair). Alice takes the first qubit of the system to her room, and Bob takes the second qubit to his room. Now, when Alice receives 𝑥, if 𝑥 = 0 she does nothing and if 𝑥 = 1 she applies the unitary map $R_{-\pi/8}$ to her qubit, where
$$R_{\theta} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
is the unitary operation corresponding to rotation in the plane by angle 𝜃. When Bob receives 𝑦, if 𝑦 = 0 he does nothing and if 𝑦 = 1 he applies the unitary map $R_{\pi/8}$ to his qubit. Then each one of them measures their qubit and sends this as their response.
Recall that to win the game Bob and Alice want their outputs to
be more likely to differ if 𝑥 = 𝑦 = 1 and to be more likely to agree
otherwise. We will split the analysis in one case for each of the four
possible values of 𝑥 and 𝑦.
Case 1: 𝑥 = 0 and 𝑦 = 0. If 𝑥 = 𝑦 = 0 then the state does not change. Because the state 𝜓 is proportional to |00⟩ + |11⟩, the measurements of Bob and Alice will always agree (if Alice measures 0 then the state collapses to |00⟩ and so Bob measures 0 as well, and similarly for 1). Hence in the case 𝑥 = 𝑦 = 0, Alice and Bob always win.
Case 2: 𝑥 = 0 and 𝑦 = 1. If 𝑥 = 0 and 𝑦 = 1 then after Alice measures her bit, if she gets 0 then the system collapses to the state |00⟩, in which case after Bob performs his rotation, his qubit is in the state $\cos(\pi/8)|0\rangle + \sin(\pi/8)|1\rangle$. Thus, when Bob measures his qubit, he will get 0 (and hence agree with Alice) with probability $\cos^2(\pi/8) \geq 0.85$. Similarly, if Alice gets 1 then the system collapses to |11⟩, in which case after rotation Bob's qubit will be in the state $-\sin(\pi/8)|0\rangle + \cos(\pi/8)|1\rangle$ and so once again he will agree with Alice with probability $\cos^2(\pi/8)$.
The analysis for Case 3, where 𝑥 = 1 and 𝑦 = 0, is completely analogous to Case 2. Hence Alice and Bob will agree with probability $\cos^2(\pi/8)$ in this case as well.¹⁴

¹⁴ We are using the (not too hard) observation that the result of this experiment is the same regardless of the order in which Alice and Bob apply their rotations and measurements.
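The case analysis can also be checked numerically. The short simulation below prepares the EPR pair, applies the rotations prescribed above, and computes the winning probability in each of the four cases; the overall success probability comes out to about 0.8, beating the classical 0.75.

```python
import numpy as np

def R(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

epr = np.array([1, 0, 0, 1]) / np.sqrt(2)      # (|00> + |11>)/sqrt(2)
I = np.eye(2)

total = 0.0
for x in (0, 1):
    for y in (0, 1):
        A = R(-np.pi / 8) if x else I          # Alice's local rotation
        B = R(np.pi / 8) if y else I           # Bob's local rotation
        probs = np.abs(np.kron(A, B) @ epr) ** 2   # outcomes 00, 01, 10, 11
        agree = probs[0] + probs[3]
        win = 1 - agree if (x and y) else agree    # want a xor b = x and y
        print(x, y, round(win, 3))             # 1.0, 0.854, 0.854, 0.5
        total += win / 4
print(round(total, 3))                         # ~0.802 > 0.75
```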
R
Remark 18.4 — Quantum vs probabilistic strategies. It is instructive to understand what it is about quantum mechanics that enabled this gain in Bell's Inequality. For this, consider the following analogous probabilistic strategy for Alice and Bob. They agree that each of them outputs 0 if they get 0 as input and outputs 1 with probability 𝑝 if they get 1 as input. In this case one can see that their success probability would be
$$\tfrac{1}{4} \cdot 1 + \tfrac{1}{2}(1-p) + \tfrac{1}{4}\left[2p(1-p)\right] = 0.75 - 0.5p^2 \leq 0.75 .$$
The quantum strategy we described above can be thought of as a variant of the probabilistic strategy for parameter 𝑝 set to $\sin^2(\pi/8) \approx 0.15$. But in the case 𝑥 = 𝑦 = 1, instead of disagreeing only with probability $2p(1-p) = 1/4$, because we can use these "negative probabilities" in the quantum world and rotate the state in opposite directions, the probability of disagreement ends up being $\sin^2(\pi/4) = 0.5$.
Proof sketch: The proof is not hard but we only sketch it here. The general idea can be illustrated in the case that there exists a single $x^*$ satisfying $f(x^*) = 1$. (There is a classical reduction from the general case to this problem.) As in Simon's algorithm, we can efficiently initialize an $n$-qubit system to the uniform state $u = 2^{-n/2} \sum_{x\in\{0,1\}^n} |x\rangle$, which has $2^{-n/2}$ dot product with $|x^*\rangle$. Of course if we measure $u$, we only have probability $(2^{-n/2})^2 = 2^{-n}$ of obtaining the value $x^*$. Our goal would be to use $O(2^{n/2})$ calls to the oracle to transform the system to a state $v$ with dot product at least some constant $\epsilon > 0$ with the state $|x^*\rangle$.

It is an exercise to show that using $Had$ gates we can efficiently compute the unitary operator $U$ such that $Uu = u$ and $Uv = -v$ for every $v$ orthogonal to $u$. Also, using the circuit for $f$, we can efficiently compute the unitary operator $U^*$ such that $U^*|x\rangle = |x\rangle$ for all $x \neq x^*$ and $U^*|x^*\rangle = -|x^*\rangle$. It turns out that $O(2^{n/2})$ applications of $UU^*$ to $u$ yield a vector $v$ with $\Omega(1)$ inner product with $|x^*\rangle$. To see why, consider what these operators do in the two-dimensional linear subspace spanned by $u$ and $|x^*\rangle$. (Note that the initial state $u$ is in this subspace and all our operators preserve this property.) Let $u^\perp$ be the unit vector orthogonal to $u$ in this subspace and let $x^{*\perp}$ be the unit vector orthogonal to $|x^*\rangle$ in this subspace. Restricted to this subspace, $U^*$ is a reflection along the axis $x^{*\perp}$ and $U$ is a reflection along the axis $u$.
Now, let $\theta$ be the angle between $u$ and $x^{*\perp}$. These vectors are very close to each other and so $\theta$ is very small but not zero: it is equal to $\sin^{-1}(2^{-n/2})$, which is roughly $2^{-n/2}$. Now if our state $v$ has angle $\alpha \geq 0$ with $u$, then as long as $\alpha$ is not too large (say $\alpha < \pi/8$), $v$ has angle $\alpha + \theta$ with $x^{*\perp}$. That means that $U^*v$ will have angle $-\alpha - \theta$ with $x^{*\perp}$, or $-\alpha - 2\theta$ with $u$, and hence $UU^*v$ will have angle $\alpha + 2\theta$ with $u$. In other words, each application of $UU^*$ rotates the state by $2\theta$ towards $|x^*\rangle$, and so after $O(1/\theta) = O(2^{n/2})$ applications we obtain a vector with $\Omega(1)$ inner product with $|x^*\rangle$.
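This two-reflection dance is easy to simulate classically for small $n$. The sketch below (ours, assuming numpy; the values of $n$ and $x^*$ are arbitrary illustrative choices) applies $U^*$ and then $U$ for roughly $(\pi/4) \cdot 2^{n/2}$ iterations and prints the probability of measuring $x^*$:

```python
import numpy as np

n = 10                       # number of qubits
N = 2 ** n
x_star = 123                 # the unique marked element (arbitrary choice)

u = np.full(N, 1 / np.sqrt(N))    # uniform superposition
v = u.copy()

iterations = int(np.floor(np.pi / 4 * np.sqrt(N)))
for _ in range(iterations):
    v[x_star] = -v[x_star]        # U*: flip the sign of the |x*> coordinate
    v = 2 * u * (u @ v) - v       # U : U u = u, U w = -w for every w orthogonal to u

print(f"after {iterations} iterations, Pr[measure x*] = {v[x_star]**2:.4f}")  # close to 1
```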
One can reduce the factoring and discrete logarithm problems to order finding (see some of the sources above for the full details).

Measuring this state, we will get a pair $(y, z)$ such that $\langle y, h^* \rangle = 0 \pmod{2}$. Note that given $O(n)$ such samples, we can recover $h^*$ with high probability by solving the corresponding linear equations. QED ■
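The "corresponding linear equations" step is entirely classical: collect the vectors $y$ and solve for a nonzero $h$ orthogonal to all of them over GF(2). A self-contained Python sketch of this post-processing (our illustration, not code from the text):

```python
import random

def nullspace_vector_gf2(rows, n):
    """Given bit-vectors y with <y, h> = 0 (mod 2), find a nonzero candidate h
    by Gaussian elimination over GF(2)."""
    rows = [r[:] for r in rows]
    pivots = {}                                # column -> row holding its pivot
    r = 0
    for c in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):             # eliminate column c everywhere else
            if i != r and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots[c] = r
        r += 1
    free = [c for c in range(n) if c not in pivots]
    if not free:
        return None                            # trivial nullspace
    h = [0] * n
    h[free[0]] = 1                             # set one free variable to 1
    for c, i in pivots.items():                # back-substitute pivot variables
        h[c] = rows[i][free[0]]
    return h

# Toy check: random samples y orthogonal to a secret h_star.
n = 8
h_star = [1, 0, 1, 1, 0, 0, 1, 0]
samples = []
while len(samples) < 3 * n:
    y = [random.randint(0, 1) for _ in range(n)]
    if sum(a & b for a, b in zip(y, h_star)) % 2 == 0:
        samples.append(y)
print(nullspace_vector_gf2(samples, n))        # recovers h_star w.h.p.
```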
$$\hat{f}(x) = \tfrac{1}{\sqrt{m}} \sum_{y\in\mathbb{Z}_m} f(y)\,\omega^{xy}$$

where $\omega = e^{2\pi i/m}$. The Fourier transform is simply a representation of $f$ in the Fourier basis $\{\chi_x\}_{x\in\mathbb{Z}_m}$, where $\chi_x$ is the vector/function whose $y$-th coordinate is $\tfrac{1}{\sqrt{m}}\omega^{xy}$. Now the inner product of any two vectors $\chi_x, \chi_z$ in this basis is equal to

$$\langle \chi_x, \chi_z \rangle = \tfrac{1}{m} \sum_{y\in\mathbb{Z}_m} \omega^{xy}\overline{\omega^{zy}} = \tfrac{1}{m} \sum_{y\in\mathbb{Z}_m} \omega^{(x-z)y}.$$

This sum equals $1$ if $x = z$, and equals $0$ if $x \neq z$ (it is then a sum over all $m$-th roots of unity), so the $\chi_x$'s indeed form an orthonormal basis.
Theorem 19.5 — Quantum Fourier Transform (Bernstein-Vazirani). For every $n$ and $m = 2^n$ there is a quantum algorithm that uses $O(n^2)$ elementary quantum operations and transforms a quantum register in state $f = \sum_{x\in\mathbb{Z}_m} f(x)|x\rangle$ into the state $\hat{f} = \sum_{x\in\mathbb{Z}_m} \hat{f}(x)|x\rangle$, where

$$\hat{f}(x) = \tfrac{1}{\sqrt{m}} \sum_{y\in\mathbb{Z}_m} \omega^{xy} f(y).$$
The crux of the algorithm is the FFT equations, which allow the
problem of computing FT𝑚 , the problem of size 𝑚, to be split into two
identical subproblems of size 𝑚/2 involving computation of FT𝑚/2 ,
which can be carried out recursively using the same elementary oper-
ations. (Aside: Not every divide-and-conquer classical algorithm can
be implemented as a fast quantum algorithm; we are really using the
structure of the problem here.)
We now describe the algorithm and the state, neglecting normaliz-
ing factors.
The final state is equal to $\hat{f}$ by the FFT equations (we leave this as an exercise).
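For concreteness, here is the classical recursive FFT that these equations yield, as a plain-Python sketch of ours (the $1/\sqrt{m}$ normalization is omitted, matching the "neglecting normalizing factors" convention above); the quantum algorithm carries out the same recursion with unitary operations:

```python
import cmath

def fft(f):
    """Recursive FFT computing f_hat(x) = sum_y f(y) * omega^{x y},
    where omega = e^{2 pi i / m} and m = len(f) is a power of 2."""
    m = len(f)
    if m == 1:
        return f[:]
    even = fft(f[0::2])             # FT_{m/2} of the even-indexed entries
    odd = fft(f[1::2])              # FT_{m/2} of the odd-indexed entries
    out = [0] * m
    for x in range(m // 2):
        w = cmath.exp(2j * cmath.pi * x / m)
        out[x] = even[x] + w * odd[x]            # the FFT "butterfly"
        out[x + m // 2] = even[x] - w * odd[x]   # uses omega^{x + m/2} = -omega^x
    return out

print([round(abs(z), 3) for z in fft([1, 0, 1, 0, 1, 0, 1, 0])])
# [4.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0]
```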
Theorem 19.6 — Order finding algorithm, restated. There is a polynomial-time quantum algorithm that on input $A, N$ (represented in binary) finds the smallest $r$ such that $A^r = 1 \pmod{N}$.
unique (again left as an exercise), and hence in this case the algorithm will come up with $c/r$ and output the denominator $r$.
Thus all that is left is to prove the next two lemmas. The first shows that there are $\Omega(r/\log r)$ values of $x$ that satisfy the above two conditions, and the second shows that each is measured with probability $\Omega\left((1/\sqrt{r})^2\right) = \Omega(1/r)$.
Proof of Lemma 1: We prove the lemma for the case that $r$ is coprime to $m$, leaving the general case to the reader. In this case, the map $x \mapsto rx \pmod{m}$ is a permutation of $\mathbb{Z}^*_m$. There are at least $\Omega(r/\log r)$ numbers in $[1..r/10]$ that are coprime to $r$ (take the primes in this range that are not one of $r$'s at most $\log r$ prime factors), and hence $\Omega(r/\log r)$ numbers $x$ such that $rx \pmod{m} = xr - \lfloor xr/m \rfloor m$ is in $[1..r/10]$ and coprime to $r$. But this means that $\lfloor rx/m \rfloor$ cannot have a nontrivial shared factor with $r$, as otherwise this factor would be shared with $rx \pmod{m}$ as well.
Proof of Lemma 2: Let $x$ be such that $0 < xr \pmod{m} < r/10$. The absolute value of $|x\rangle$'s coefficient in the state before the measurement is

$$\frac{1}{\sqrt{K}\sqrt{m}} \left| \sum_{\ell=0}^{K-1} \omega^{\ell r x} \right| = \frac{\sqrt{r}}{2M} \left| \frac{1-\beta^{\lceil m/r \rceil}}{1-\beta} \right| = \frac{\sqrt{r}}{2M} \cdot \frac{\sin(\theta \lceil m/r \rceil / 2)}{\sin(\theta/2)},$$

where $\theta = 2\pi \frac{xr \bmod m}{m}$ is the angle such that $\beta = \omega^{rx} = e^{i\theta}$ (see Figure [quantum:fig:theta] for a proof by picture of the last equality). Under our assumptions $\lceil m/r \rceil \theta < 1/10$, and hence (using the fact that $\sin\alpha \sim \alpha$ for small angles $\alpha$) the coefficient of $x$ is at least $\frac{\sqrt{r}}{4M} \lceil m/r \rceil \geq \frac{1}{8\sqrt{r}}$.
This completes the proof of Theorem 19.6.
then write

$$\alpha = \lfloor \alpha \rfloor + \frac{1}{R},$$

where $R = \frac{1}{\alpha - \lfloor \alpha \rfloor}$, and continue the process with $R$. If we continue this process for $n$ steps, we get a rational number, denoted by $[a_0, a_1, \ldots, a_n]$, which can be represented as $\frac{p_n}{q_n}$ with $p_n, q_n$ coprime. The following facts can be proven using induction:

• $\frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} = \frac{(-1)^{n-1}}{q_n q_{n-1}}$

• $q_n \geq 2^{n/2}$, implying that $p_n$ and $q_n$ can be computed in $polylog(q_n)$ time.
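These recurrences make the convergents easy to compute. The toy sketch below (using Python's fractions module; the values of $m$, $r$, $c$ are illustrative choices of ours) shows how, in the order-finding post-processing, the fraction $c/r$ surfaces as a convergent of $y/m$ when the measurement $y$ is close to $cm/r$:

```python
from fractions import Fraction

def convergents(alpha: Fraction):
    """Yield the continued fraction convergents p_n/q_n of a rational alpha,
    using p_n = a_n p_{n-1} + p_{n-2} and q_n = a_n q_{n-1} + q_{n-2}."""
    p_prev, q_prev = 0, 1           # p_{-2}, q_{-2}
    p, q = 1, 0                     # p_{-1}, q_{-1}
    while True:
        a = alpha.numerator // alpha.denominator     # a_n = floor(alpha)
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        yield Fraction(p, q)
        rest = alpha - a
        if rest == 0:
            return
        alpha = 1 / rest            # continue with R = 1/(alpha - floor(alpha))

# Toy post-processing: the measurement gives some y close to c*m/r.
m, r, c = 2**11, 12, 5
y = round(c * m / r)                # y = 853
for approx in convergents(Fraction(y, m)):
    print(approx)                   # 0, 1/2, 2/5, 5/12, 212/509, 853/2048
# The convergent 5/12 has denominator r = 12 (since gcd(c, r) = 1); in Shor's
# algorithm one would test each small denominator as a candidate order.
```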
• Public key encryption and digital signatures that enable Alice and Bob
to set up such a virtually secure channel without sharing a prior key.
This enables our “information economy” and protects virtually
every financial transaction over the web. Moreover, it is the crucial
mechanism for supplying “over the air” software updates to smart
devices, whether they be phones, cars, thermostats or anything else.
Some had predicted that this invention would change the nature
of our form of government to crypto anarchy and while this may
be hyperbole, governments everywhere are worried about this
invention.
(By the way, all of the above are notions that you should be familiar with, and whose security guarantees you should be able to explain if you ever need to use them, for example, in the unlikely event that you ever find yourself needing to take a cryptography final exam…)
While clearly there are issues of efficiency, is there anything more in
terms of functionality we could ask for? Given all these riches, can we
be even more greedy?
It turns out that the answer is yes. Here are some scenarios that are
still not covered by the above tools:
• For every function $f: \{0,1\}^\ell \rightarrow \{0,1\}$, if $(d, e) = G(1^n)$ and $d_f = KeyDist(d, f)$, then for every message $m$, $D_{d_f}(E_e(m)) = f(m)$.
who are slow to update their software. Could we come up, for a regular expression $R$, with a program $P$ such that $P(x) = 1$ if and only if $R(x) = 1$, but examining the code of $P$ doesn't make it any easier to find some $x$ satisfying $R$?
• 𝐴(𝒪(𝐶))
(Note that the distributions above are of a single bit, and so being
indistinguishable simply means that the probability of outputting 1 is
equal in both cases up to a negligible additive factor.)
The adversary in the proof of Lemma 20.4 does not seem very impressive. After all, it merely printed out its input. Indeed, the definition of strong VBB security might simply be overkill, and "plain" VBB would be enough for almost all applications. However, as mentioned above, plain VBB is impossible to achieve as well. We'll prove a slightly weaker version of Theorem 20.3:
Theorem 20.5 — Impossibility of Obfuscation from FHE. If fully homomorphic encryption exists then there is no VBB secure obfuscating compiler.
(To get the original theorem from this, note that if VBB obfuscation
exists then we can transform any private key encryption into a fully
homomorphic public key encryption.)
This proof is simple but deserves a second read. A
crucial point here is to use FHE to allow the adversary
to essentially “feed 𝐶 ′ to itself” so it can obtain from
an encryption of 𝛼 an encryption of 𝛽, even though
that would not be possible using black box access
only.
Definition 20.6 — Indistinguishability Obfuscation. We say a compiler $\mathcal{O}$ is an indistinguishability obfuscator (IO) if for every two circuits $C, C'$ that have the same size and compute the same function, the random variables $\mathcal{O}(C)$ and $\mathcal{O}(C')$ are computationally indistinguishable.
1. IO is impossible to achieve.
However, it turns out that this guess is (most likely) wrong. New
results have shown that IO is extremely useful for many applications,
including those outlined above. They also gave some evidence that it
might be possible to achieve. We’ll talk about those works in the next
lecture.
21
More obfuscation, exotic encryptions
• The keys are generated and Eve gets the master public key.

• For $i = 1, \ldots, T = poly(n)$, Eve chooses an identity $id_i \in \{0,1\}^*$ and gets the key $d_{id_i}$.

• Eve chooses an identity $id^* \notin \{id_1, \ldots, id_T\}$ and two messages $m_0, m_1$.

• We choose $b \leftarrow_R \{0,1\}$ and Eve gets the encryption of $m_b$ with respect to the identity $id^*$.

• Eve outputs $b'$ and wins if $b' = b$.
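In pseudocode, this game has the following shape. This is a structural sketch of ours with a hypothetical scheme/adversary interface; none of the method names below come from the text:

```python
import random

def ibe_cpa_game(scheme, eve, n, T):
    """Skeleton of the IBE-CPA game above; `scheme` and `eve` are hypothetical
    objects exposing the operations named below (not a library API)."""
    mpk, msk = scheme.setup(n)                 # keys generated; Eve sees mpk
    eve.see_master_public_key(mpk)
    queried = set()
    for _ in range(T):                         # T = poly(n) key-extraction queries
        identity = eve.choose_identity()
        queried.add(identity)
        eve.see_key(identity, scheme.keydist(msk, identity))
    id_star, m0, m1 = eve.choose_challenge()
    assert id_star not in queried              # the challenge identity is fresh
    b = random.randint(0, 1)
    b_prime = eve.guess(scheme.encrypt(mpk, id_star, (m0, m1)[b]))
    return b_prime == b                        # Eve wins iff b' = b
```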
Proof: Suppose, for the sake of contradiction, that there exists some time-$T = poly(n)$ adversary $A$ that succeeds in the IBE-CPA game with probability at least $1/2 + \epsilon$ for some non-negligible $\epsilon$. We assume without loss of generality that whenever $A$ makes a query to the key distribution function with identity $id$, or a query to $H'$ with prefix $id$, it has already previously made the query $id$ to $H$. ($A$ can be easily modified to have this behavior.)

We will build an algorithm $B$ that on input $\mathbb{G}_1, \mathbb{G}_2, g, g^a, g^b, g^c$ will output $\hat{g}^{abc}$ with probability $poly(\epsilon, 1/T)$.

The algorithm $B$ will guess $i_0, j_0 \leftarrow_R \{1, \ldots, T\}$ and simulate $A$ "in its belly", giving it the public key $g^a$, and act as follows:

• When $A$ makes a query to $H$ with $id$, then for all but the $i_0$-th query, $B$ will choose a random $b_{id} \in \{0, \ldots, |\mathbb{G}|\}$ (as usual we'll assume $|\mathbb{G}|$ is prime), set $e_{id} = g^{b_{id}}$, and define $H(id) = e_{id}$. Let $id_0$ be the $i_0$-th query $A$ made to the oracle. We define $H(id_0) = g^b$ (where $g^b$ comes from $B$'s input).

• When $A$ makes a query to the $H'$ oracle with input $id'\|\hat{h}$, then for all but the $j_0$-th query, $B$ answers with a random string in $\{0,1\}^\ell$. In the $j_0$-th query, if $id' \neq id_0$ then $B$ stops and fails. Otherwise, it outputs $\hat{h}$.

• $B$ stops the simulation and fails if we get to the challenge part.
It might seem weird that we stop the simulation before we reach the
challenge part, but the correctness of this reduction follows from the
following claim:
Claim: In the actual attack game, with probability at least $\epsilon/10$, $A$ will make the query $id^*\|\hat{g}^{abc}$ to the $H'$ oracle, where $H(id^*) = g^b$ and the public key is $g^a$.
Proof: If 𝐴 does not make this query then the message in the chal-
lenge is XOR’ed by a completely random string and 𝐴 cannot distin-
guish between 𝑚0 and 𝑚1 in this case with probability better than 1/2.
QED
Given this claim, to prove the theorem we just need to observe that, assuming it does not fail, $B$ provides answers to $A$ that are identically distributed to the answers $A$ receives in an actual execution of the CPA game. Hence with probability at least $\epsilon/(10T^2)$, $B$ will guess the query $i_0$ at which $A$ queries $H(id^*)$ and set the answer to be $g^b$, and then guess the query $j_0$ at which $A$ queries $id^*\|\hat{g}^{abc}$, in which case $B$'s output will be correct. QED
We will now show how using such a multilinear map we can get
a construction for a witness encryption scheme. We will only show
the construction, without talking about the security definition, the
assumption, or security reductions.
Given some circuit $C: \{0,1\}^n \rightarrow \{0,1\}$ and some message $x$, we want to "encrypt" $x$ in a way that given $w$ such that $C(w) = 1$ it would be possible to decrypt $x$, and otherwise it should be hard. It should be noted that the encrypting party itself does not know any such $w$ and indeed (as in the case of a proof of the Riemann hypothesis) might not even know if such a $w$ exists. The idea is the following. We use the fact that the Exact Cover problem is NP complete to map $C$ into a collection of subsets $S_1, \ldots, S_m$ of the universe $U$ (where $m, |U| = poly(|C|, n)$) such that there exists $w$ with $C(w) = 1$ if and only if there exist $d$ sets $S_{i_1}, \ldots, S_{i_d}$ that are a partition of $U$ (i.e., every element in $U$ is covered by exactly one of these sets), and moreover there is an efficient way to map $w$ to such a partition and vice versa.
Now, to encrypt the message $x$ we take a degree-$d$ instance of multilinear maps $(\mathbb{G}_1, \ldots, \mathbb{G}_d, g_1, \ldots, g_d)$ (with all groups of size $p$) and choose random $a_1, \ldots, a_{|U|} \leftarrow_R \{0, \ldots, p-1\}$. We then output the ciphertext

$$g_1^{\prod_{j \in S_1} a_j}\;,\;\ldots\;,\;g_1^{\prod_{j \in S_m} a_j}\;,\;H\!\left(g_d^{\prod_{j \in U} a_j}\right) \oplus x.$$

Now, given a partition $S_{i_1}, \ldots, S_{i_d}$ of $U$, the decryptor can apply the multilinear map to the $d$ corresponding group elements to obtain $g_d^{\prod_{j \in U} a_j}$, and hence recover $x$.
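Correctness boils down to a combinatorial identity: because the sets of a partition are disjoint and cover $U$, the product of their exponent-products is exactly $\prod_{j \in U} a_j$. The toy Python check below (our illustration; it manipulates the exponents directly and is of course completely insecure, which is precisely why the real construction needs the multilinear map) verifies this identity:

```python
import math, random

# Toy Exact Cover instance: universe U and subsets S_1..S_m.
U = range(6)
subsets = [{0, 1}, {2, 3}, {4, 5}, {0, 2, 4}, {1, 3, 5}]
witness_partition = [0, 1, 2]        # indices of sets that partition U

p = 2**13 - 1                        # small prime modulus (toy only)
a = [random.randrange(1, p) for _ in U]

# Each ciphertext piece carries the product of a_j over its subset (in the
# real scheme this product sits in the exponent of g_1).
pieces = [math.prod(a[j] for j in S) % p for S in subsets]

# Combining the pieces of a partition multiplies the exponents, which by
# disjointness equals the product over all of U -- the mask hiding x.
combined = math.prod(pieces[i] for i in witness_partition) % p
mask_exponent = math.prod(a[j] for j in U) % p
print(combined == mask_exponent)     # True: the partition recovers the mask
```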
• Anonymous routing is about ensuring that Alice and Bob can com-
municate without that fact being revealed.
22.1 STEGANOGRAPHY
The goal in steganographic communication is to hide cryptographic (or non-cryptographic) content without being detected. The idea is simple: let's start with the symmetric case and assume Alice and Bob share a key $k$ and Alice wants to transmit a bit $b$ to Bob. We assume that Alice has a choice of $t$ words $w_1, \ldots, w_t$ that would be reasonable for her to send at this point in the conversation. Alice will choose a word $w_i$ such that $f_k(w_i) = b$, where $\{f_k\}$ is a pseudorandom function collection. With probability $1 - 2^{-t}$ there will be such a word. Bob will decode the message as $f_k(w_i)$. Alice and Bob can use an error correcting code to compensate for the probability $2^{-t}$ that Alice is forced to send the wrong bit.
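Here is a minimal sketch of this symmetric scheme in Python, using HMAC-SHA256 as a stand-in for the pseudorandom function family $\{f_k\}$ (an illustrative instantiation of ours, not one fixed by the text):

```python
import hmac, hashlib

def f(k: bytes, w: str) -> int:
    # Stand-in PRF f_k(w) outputting a single bit.
    return hmac.new(k, w.encode(), hashlib.sha256).digest()[0] & 1

def embed(k: bytes, b: int, candidates) -> str:
    # Alice: pick a plausible word w_i with f_k(w_i) = b; among t candidates
    # this fails only with probability 2^-t.
    for w in candidates:
        if f(k, w) == b:
            return w
    return candidates[0]   # forced to send a wrong bit (the error correcting code absorbs this)

key = b"shared secret key"
word = embed(key, 1, ["hello", "hi", "hey", "greetings"])
print(word, f(key, word))  # Bob decodes the bit as f_k(word)
```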
In the public key setting, suppose that Bob publishes a public key 𝑒
for an encryption scheme that has pseudorandom ciphertexts. That is,
to a party that does not know the key, an encryption is indistinguish-
able from a random string. To send some message 𝑚 to Bob, Alice
computes $c = E_e(m)$ and transmits it to Bob one bit at a time. Given the $t$ words $w_1, \ldots, w_t$, to transmit the bit $c_j$ Alice chooses a word $w_i$ such that $H(w_i) = c_j$, where $H: \{0,1\}^* \rightarrow \{0,1\}$ is a hash function.
22.3 TOR
Basic architecture. Attacks.
22.4 TELEX
22.5 RIPOSTE
V
CONCLUSIONS
23
Ethical, moral, and policy dimensions to cryptography
• Are we less or more secure today than in the past? In what ways
did the balance between government and individuals shift in the
last few decades? Do governments have more or less data and tools
for monitoring individuals at their disposal? Do individuals and
non-governmental groups have more or less ability to inflict harm
(and hence need to be protected against)?
• How are these issues different in the U.S. as opposed to other coun-
tries? Is the debate too U.S. centric?
• Given that the FBI had a legal warrant for the information on the
iPhone, was it wrong of Apple to refuse to provide the help re-
quired?
• Was it wrong for Apple to have designed their iPhone so that they
are unable to easily extract information out of it? Should they be
required to make sure that such devices can be searched as a result
of a legal warrant?
• If the only way for the FBI to get the information was to get Apple's master signature key (which would allow them to completely break into any iPhone, and even turn it into a recording/surveillance device), would it have been OK for them to do it? Should Apple design their devices in a way that even their master signature key cannot break into them? Is that even possible, given that software updates are crucial for the proper functioning of such devices? (It was recently claimed that the Canadian police have had access to the master decryption key of Blackberry since 2010.)
In the San Bernardino case, the utility of breaking into the phone
was questioned, given that both perpetrators were killed and there
was no evidence of them receiving any assistance. But there are cases
where things are more complicated. Brittney Mills was 29 years old
and 8 months pregnant when she was shot and killed in April 2015
• While we talked about Bitcoin, the TLS protocol, two factor authentication systems, and some aspects of Pretty Good Privacy, we restricted ourselves to abstractions of these systems and did not attempt a full "end to end" analysis of a complete system. I do hope you have learned the tools that would enable you to understand the full operation of such a system if you ever need to.