
The Simplest Protocol for Oblivious Transfer

Tung Chou¹ and Claudio Orlandi²

¹ Technische Universiteit Eindhoven
² Aarhus University

Abstract. Oblivious Transfer (OT) is the fundamental building block of cryptographic protocols. In this paper we describe the simplest and most efficient protocol for 1-out-of-2 OT to date, which is obtained by tweaking the Diffie-Hellman key-exchange protocol. The protocol achieves UC-security against active corruptions in the random oracle model. Due to its simplicity, the protocol is extremely efficient and it allows performing n 1-out-of-m OTs using only:
– Computation: (m + 1)n + 2 exponentiations (mn for the sender and n + 2 for the receiver) and
– Communication: 32(n + 1) bytes (for the group elements), and mn ciphertexts.
We also report on an implementation of the protocol using elliptic curves, and on a number of mechanisms we employ to ensure that our software is secure against active attacks too. Experimental results show that our protocol (thanks to both algorithmic and implementation optimizations) is at least one order of magnitude faster than previous work.

1 Introduction

Oblivious Transfer (OT) is a cryptographic primitive defined as follows: in its simplest flavour, 1-out-of-2 OT, a sender has two input messages M0 and M1 and a receiver has a choice bit c. At the end of the protocol the receiver is supposed to learn the message Mc and nothing else, while the sender is supposed to learn nothing. Perhaps surprisingly, this extremely simple primitive is sufficient to implement any cryptographic task [Kil88]. OT is also necessary to implement most advanced cryptographic tasks, such as secure two-party computation (e.g., the millionaire's problem).

Given the importance of OT, and the fact that most OT applications require a very large number of OTs, it is crucial to construct OT protocols which are at the same time efficient and secure against realistic adversaries.

A Novel OT Protocol. In this paper we present an extremely simple, efficient and secure OT protocol which has (to the best of our knowledge) not appeared in the scientific literature before. The protocol is a simple tweak of the celebrated Diffie-Hellman (DH) key exchange protocol. Given a group G and a generator g, the DH protocol allows two players Alice and Bob to agree on a key as follows: Alice samples a random a, computes A = g^a and sends A to Bob. Symmetrically, Bob samples a random b, computes B = g^b and sends B to Alice. Now both parties can compute g^(ab) = A^b = B^a, from which they can derive a key k. The key observation is now that Alice can also derive a different key from the value (B/A)^a = g^(ab−a^2), and that Bob cannot compute this group element (assuming that the computational DH problem is hard).

    Diffie-Hellman Key Exchange
    Alice                            Bob
    a ← Zp                           b ← Zp
    A = g^a          ---A--->
                     <---B---        B = g^b
    k = H(B^a)                       k = H(A^b)
    e ← E_k(M)       ---e--->

    Our OT Protocol
    Sender                           Receiver
    M0, M1                           c
    a ← Zp                           b ← Zp
    A = g^a          ---A--->
                                     if c = 0: B = g^b
                                     if c = 1: B = A·g^b
                     <---B---
    k0 = H(B^a)                      k_c = H(A^b)
    k1 = H((B/A)^a)
    e0 ← E_k0(M0)
    e1 ← E_k1(M1)
                     ---e0, e1--->

    Figure 1. Our protocol in a nutshell
We can now turn this into a random OT protocol by letting Alice play the role of the sender and Bob the role of the receiver (with choice bit c) as shown in Figure 1. The first message (from Alice to Bob) is left unchanged (and can be reused over multiple instances of the protocol), but now Bob computes B as a function of his choice bit c: if c = 0 Bob computes B = g^b and if c = 1 Bob computes B = A·g^b. At this point Alice derives two keys k0, k1 from B^a and (B/A)^a respectively. It is easy to check that Bob can derive the key kc corresponding to his choice bit from A^b, but cannot compute the other one. (The protocol can be easily extended to 1-out-of-m OT.) Finally we prove that if we combine our novel random OT protocol with the right symmetric encryption scheme (e.g., an authenticated encryption scheme), then the overall protocol is secure in a strong, simulation-based sense, and in particular we achieve UC-security against active corruptions in the random oracle model.
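The tweak can be sketched in a few lines of Python over a toy multiplicative group. The modulus, generator and hash below are illustrative stand-ins only; the paper instantiates the protocol over an elliptic curve, and salts the hash with the protocol transcript as discussed in Section 2:

```python
import hashlib
import secrets

# Illustrative toy group: integers modulo a 64-bit prime.
# A real instantiation uses an elliptic-curve group (see Section 3).
P = 2**64 - 59  # largest 64-bit prime (illustrative, NOT a secure choice)
G = 5           # illustrative generator

def H(A: int, B: int, point: int) -> bytes:
    """Random-oracle stand-in, salted with the transcript (A, B)."""
    data = b"".join(v.to_bytes(8, "big") for v in (A, B, point))
    return hashlib.sha256(data).digest()

def sender_msg1():
    """Alice: sample a and publish A = g^a (reusable across instances)."""
    a = secrets.randbelow(P - 1) + 1
    return a, pow(G, a, P)

def receiver_msg(A: int, c: int):
    """Bob: B = g^b if c = 0, B = A*g^b if c = 1; he learns only k_c."""
    b = secrets.randbelow(P - 1) + 1
    B = pow(G, b, P) if c == 0 else (A * pow(G, b, P)) % P
    k_c = H(A, B, pow(A, b, P))
    return B, k_c

def sender_keys(a: int, A: int, B: int):
    """Alice: k0 from B^a and k1 from (B/A)^a."""
    k0 = H(A, B, pow(B, a, P))
    inv_A = pow(A, -1, P)
    k1 = H(A, B, pow(B * inv_A % P, a, P))
    return k0, k1
```

Running the three functions in order, the receiver's k_c matches the sender's key for his choice bit, while the other key stays out of reach under the CDH assumption.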

A Secure and Efficient Implementation for the Random OT Protocol. We report on an efficient and secure implementation of the random OT protocol³: Our choice for the group is a twisted Edwards curve that has been used by Bernstein, Duif, Lange, Schwabe and Yang for building a high-speed high-security signature scheme [BDL+11]. The security of the curve comes from the fact that it is birationally equivalent to Bernstein's Montgomery curve Curve25519 [Ber06], where ECDLP is believed to be hard: Bernstein and Lange's SafeCurves website [BL14] reports a cost of 2^125.8 for solving ECDLP on Curve25519 using the rho method. The speed comes from the complete formulas for twisted Edwards curves proposed by Hisil, Wong, Carter, and Dawson in [HWCD08].

We first modify the code in [BDL+11] and build a fast implementation. In order to make use of the natural parallelism in the protocol, we also build a vectorized implementation that targets the Intel Sandy Bridge and Ivy Bridge microarchitectures. A comparison with the state of the art shows that our implementation is at least an order of magnitude faster than previous work (we compare in particular with the implementation reported by Asharov, Lindell, Schneider and Zohner in [ALSZ13]). Furthermore, we take great care to make sure that our implementation is secure against both passive attacks (our software is immune to timing attacks, since the implementation is constant-time) and active attacks (by designing an appropriate encoding of group elements, which can be efficiently verified and computed on). Our code can be downloaded from http://orlandi.dk/simpleOT.

Organization. The rest of the paper is organized as follows: in Section 1.1 we discuss related
work; in Section 2 we formally describe and analyse our protocol; Section 3 describes the chosen
representation of group elements; Section 4 describes how group operations are performed; Section 5
describes the low level building blocks of the group operations; and Section 6 reports the timings
of our implementation.

³ Here random refers to the fact that at the end of the protocol the sender receives two random messages, which can be used later to encrypt his actual inputs. Our implementation does not include the encryption step, since random OT is enough for several applications.
1.1 Related Work
OT owes its name to Rabin [Rab81], but a similar concept was introduced a few years earlier by Wiesner [Wie83] under the name of "conjugate coding". There are different flavours of OT, and in this paper we focus on the most common and useful flavour, namely 1-out-of-2 OT, which was first introduced in [EGL85]. Many efficient protocols for OT have been proposed over the years. Some of the protocols which are most similar to ours are those of Bellare-Micali [BM89] and Naor-Pinkas [NP01]. However, these protocols are still more complex than ours and, most importantly, are not known to achieve full simulation-based security. More recent OT protocols such as [HL10, DNO08, PVW08] focus on achieving a strong level of security in concurrent settings⁴ without relying on the random oracle model. Unfortunately this makes these protocols more cumbersome for practical applications: even the most efficient of these protocols, i.e., the protocol of Peikert, Vaikuntanathan, and Waters [PVW08], requires 11 exponentiations for each OT and a common random string (which must be generated by some trusted source of randomness at the beginning of the protocol and after the adversary chooses whether to corrupt the sender or the receiver). In comparison our protocol is more practical since it uses only 3n + 2 exponentiations and does not require any (hard to implement in practice) setup assumptions.
OT Extension. While OT provably requires "public-key" types of assumptions [IR89] (such as factoring, discrete log, etc.), OT can be "extended" [Bea96] in the sense that it is enough to generate a few "seed" OTs based on public-key cryptography, which can then be extended to any number of OTs using symmetric-key primitives only (PRG, hash functions, etc.). This can be seen as the OT equivalent of hybrid encryption (where one encrypts a large amount of data using symmetric-key cryptography, and then encapsulates the symmetric key using a public-key cryptosystem). OT extension can be performed very efficiently [IKNP03, ALSZ13] when one only wants security against passive adversaries and relatively efficiently [Nie07, NNOB12, Lar14, ALSZ15] if one wants security against active adversaries. Still, to bootstrap OT extension we need a secure and efficient OT protocol for the seed OTs (as much as we need secure and efficient public-key encryption schemes to bootstrap hybrid encryption): The OT extension of [ALSZ15] reports that it takes time (7·10^5 + 1.3n) µs to perform n OTs, where the fixed term comes from running 190 base OTs. Using our protocol as the base OT in [ALSZ15] reduces the initial cost to approximately 190 · 114 ≈ 2·10^4 µs [Sch15], which leads to a significant overall improvement (e.g., a factor 10 for n up to 4·10^4 OTs and a factor 2 for n up to 5·10^5 OTs).
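These speedup factors follow from the linear cost model just quoted; a quick back-of-the-envelope check, with the constants taken from the text above:

```python
# Cost model (microseconds): fixed base-OT cost plus 1.3 us per extended OT.
OLD_BASE = 7e5        # 190 base OTs in [ALSZ15]
NEW_BASE = 190 * 114  # ~2e4 us with our protocol as the base OT [Sch15]
PER_OT = 1.3

def speedup(n: int) -> float:
    """Ratio of old to new total running time for n extended OTs."""
    return (OLD_BASE + PER_OT * n) / (NEW_BASE + PER_OT * n)
```

At n = 4·10^4 the ratio is about 10, and at n = 5·10^5 it is about 2, matching the factors quoted above.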

2 The Protocol
We want to implement n 1-out-of-m OTs for messages of length ℓ with κ-bit security between a sender S and a receiver R.⁵ The sender S has n vectors of messages {(M^i_0, …, M^i_(m−1))}_(i∈{1,…,n}) where for all i, j: M^i_j ∈ {0,1}^ℓ. The receiver R has n choice values c^i ∈ {0, …, m − 1} for i = 1, …, n. At the end of the protocol S learns nothing and R learns z^i = M^i_(c^i) ∈ {0,1}^ℓ for all i.
We split the presentation of the protocol in two parts: in the first part, we describe and analyze
a protocol for random OT where the sender outputs m random keys and the receiver learns only
one of them. Then we describe how to combine this protocol with an appropriate encryption scheme
to achieve UC security.
⁴ I.e., UC security [Can01], which is impossible to achieve without some kind of trusted setup assumptions [CF01].
⁵ We describe the protocol performing n OTs in parallel since we can do this more efficiently than simply repeating n times the protocol for a single OT.
Notation. If S is a set, s ← S denotes a random element sampled from S. We work over an additive group (G, B, p, +) of prime order p (with log(p) > κ) generated by B (the base point), and we use additive notation for the group since we later implement our protocol using elliptic curves. Given the representation of some group element P, we assume it is possible to efficiently verify whether P ∈ G.
Building Blocks. We use a cryptographic keyed hash function (or a key-derivation function) H : (G × G) × G → {0,1}^κ which is used to extract a κ-bit key from a group element; the first two inputs are used to seed the function.⁶ We will model H as a random oracle when arguing about the security of our protocol.

2.1 Random OT
We are now ready to describe our random OT protocol:

Setup (only once, independent of n):
1. S samples y ← Zp and computes S = yB and T = yS;
2. S sends S to R, who aborts if S ∉ G;
Choose (in parallel for all i ∈ {1, …, n}):
1. R samples x^i ← Zp and computes

    R^i = c^i S + x^i B

2. R sends R^i to S, who aborts if R^i ∉ G;
Key Derivation (in parallel for all i ∈ {1, …, n}):
1. For all j ∈ {0, …, m − 1}, S computes

    k^i_j = H_(S,R^i)(yR^i − jT)

2. R computes

    k^i_R = H_(S,R^i)(x^i S)

Basic Properties. It is easy to see that k^i_j is computed by hashing x^i yB + (c^i − j)T, and therefore at the end of the protocol k^i_R = k^i_(c^i) if both parties are honest. It is also easy to see that:
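The correctness claim (the hash inputs of S and R coincide exactly when j equals the choice value) can be checked with scalar arithmetic, viewing each group element as a scalar multiple of B; all numbers below are illustrative toy values:

```python
# Work with scalars mod a toy prime order p; a value v stands for vB.
p = 1009                 # illustrative group order
y, x, c = 5, 7, 2        # sender secret, receiver secret, choice value
S, T = y, (y * y) % p    # S = yB, T = yS = y^2 B
R = (c * S + x) % p      # R = cS + xB

for j in range(3):
    # Sender's hash input for key k_j: yR - jT ...
    sender_pt = (y * R - j * T) % p
    # ... equals x*yB + (c - j)T, as claimed in the text.
    assert sender_pt == (x * y + (c - j) * T) % p

# For j = c this coincides with the receiver's hash input xS.
assert (y * R - c * T) % p == (x * S) % p
```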

Lemma 1. No (computationally unbounded) S∗ on input R^i can guess c^i w.p. greater than 1/p.

Proof. Since B generates G, for any fixed P = x_0·B the probability that R^i = P when c^i = j is the probability that x^i = x_0 − jy; therefore ∀S, ∀P ∈ G, ∀j ∈ {0, …, m − 1}: Pr[R^i = P | c^i = j] = 1/p.

Lemma 2. No (computationally bounded) R∗ can output any two keys k^i_(j0) and k^i_(j1) with j0 ≠ j1 ∈ {0, …, m − 1} if the computational Diffie-Hellman problem is hard in G.

Proof. In the random oracle model R∗ can only (except with negligible probability) compute k^i_(j0), k^i_(j1) by querying the oracle on points of the form U^i_0 = (yR^i − j0·T) and U^i_1 = (yR^i − j1·T). Assume for the sake of contradiction that there exists a PPT R∗ who outputs (R, j0, j1, U0, U1) ← R∗(B, S) such that (j1 − j0)^(−1)·(U0 − U1) = T = log_B(S)^2·B with probability at least ε. We show an algorithm A which on input (B, X = xB, Y = yB) outputs Z = xyB with probability greater than ε^3. Run (R^X, j^X_0, j^X_1, U^X_0, U^X_1) ← R∗(B, X) and set T^X = (j^X_1 − j^X_0)^(−1)·(U^X_0 − U^X_1); obtain T^Y from R∗(B, Y) and T^+ from R∗(B, X + Y) in the same way, and finally output

    Z = ((p + 1)/2) · (T^+ − T^X − T^Y)

Now Z = xyB with probability at least ε^3, since when all three executions of R∗ are successful, then T^X = x^2·B, T^Y = y^2·B and T^+ = (x + y)^2·B, and therefore Z = ((p + 1)/2)·2xyB = xyB. ⊓⊔

⁶ Standard hash functions do not take group elements as inputs, and in later sections we will give explicit encodings of group elements into bitstrings.
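The scalar identity at the heart of this reduction, namely that (p + 1)/2 times ((x + y)^2 − x^2 − y^2) equals xy modulo p, can be checked directly with toy values (illustrative only):

```python
# In the exponent: the three successful runs of R* contribute
# x^2, y^2 and (x+y)^2 respectively.
p = 1009            # illustrative odd prime group order
x, y = 123, 456     # the two CDH secrets

combined = pow(x + y, 2) - pow(x, 2) - pow(y, 2)   # equals 2xy
Z = ((p + 1) // 2) * combined % p                  # (p+1)/2 inverts 2 mod p

assert Z == (x * y) % p  # the reduction recovers xy, i.e., outputs xyB
```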

2.2 How to use the protocol and UC security


In this subsection we show that if we combine the random OT from the previous subsection with
an appropriate encryption scheme, then the combined protocol achieves UC security.
Motivation. Lemmas 1 and 2 only state that "privacy" holds for both the sender and the receiver. However, since OT is mostly used as a building block in more complex protocols, it is important to understand to which extent our protocol offers security when composed arbitrarily with itself or other protocols: simulation-based security is the minimal requirement which enables one to argue that a given protocol is secure when composed with other protocols. Without simulation-based security, it is not even possible to argue that a protocol is secure if it is executed twice in a sequential way! (See e.g., [DNO08] for a concrete counterexample for OT.) The UC theorem [Can01] allows us to say that if a protocol satisfies the UC definition of security, then that protocol will be secure even when arbitrarily composed with other protocols. Among other things, to show that a protocol is UC-secure one needs to show that a simulator can extract the input of a corrupted party: intuitively, this is a guarantee that the party knows its input, and it is not reusing/modifying messages received in other protocols (a.k.a. a malleability attack).
From Random OT to Standard OT. We start by adding a transfer phase to the protocol, where the sender sends the encryptions of his messages to the receiver:

Transfer (in parallel for all i ∈ {1, …, n}):
1. For all j ∈ {0, …, m − 1}, S computes e^i_j ← E(k^i_j, M^i_j);
2. S sends (e^i_0, …, e^i_(m−1)) to R;
Retrieve (in parallel for all i ∈ {1, …, n}):
1. R computes and outputs z^i = D(k^i_R, e^i_(c^i)).

The Encryption Scheme. We need a symmetric encryption scheme (E, D). We call κ the bitlength of the key, ℓ the bitlength of the message and ℓ′ the bitlength of the ciphertext. We allow the decryption algorithm to output a special symbol ⊥ to indicate an invalid ciphertext. We need the encryption scheme to satisfy the following properties:

Definition 1. We say a symmetric encryption scheme (E, D) is non-committing if there exist PPT algorithms S1, S2 such that ∀m ∈ {0,1}^ℓ, (e′, k′) and (e, k) are computationally indistinguishable, where e′ ← S1(1^κ), k′ ← S2(e′, m), k ← {0,1}^κ and e ← E(k, m).

The definition says that it is possible for a simulator to come up with a ciphertext e which can later be "explained" as an encryption of any message m, in such a way that the joint distribution of the encryption and the key in this simulated experiment is indistinguishable from the normal use of the encryption scheme, where a key is first sampled and then an encryption of m is generated.

Definition 2. (E, D) satisfies ciphertext integrity if Pr[D(k, e) ≠ ⊥ | k ← {0,1}^κ, e ← A(1^κ)] is negligible in κ.

Traditionally ciphertext integrity is defined for an adversary who has access to an encryption oracle, but the above definition suffices for our goal.
A Concrete Example. We give a concrete example of an encryption scheme satisfying Definitions 1 and 2. In this encryption scheme κ = ℓ′ = 2ℓ. The encryption algorithm E(k, m) parses k as (α, β) ∈ GF(2^ℓ) × GF(2^ℓ) and outputs c1 = m + α and c2 = β · c1. The decryption function D(k, e) parses k as (α, β) ∈ GF(2^ℓ) × GF(2^ℓ) and outputs ⊥ if c2 ≠ β · c1, or m = c1 + α otherwise. It is easy to see that this scheme satisfies Definition 1:⁷ S1 outputs two random bitstrings (c1, c2) ← GF(2^ℓ) × GF(2^ℓ) and S2(e, m) outputs α = c1 + m and β = c2 · c1^(−1). Finally the scheme satisfies Definition 2 since (for any fixed key k) only a fraction 2^(−ℓ) of the ciphertexts do not decrypt to ⊥.
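A toy instantiation of this scheme can be sketched with ℓ = 8, using the AES reduction polynomial for the field arithmetic; the field size and polynomial are illustrative choices only (a real instantiation would use ℓ equal to the message length):

```python
import secrets

POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, the AES reduction polynomial

def gmul(a: int, b: int) -> int:
    """Carry-less ('Russian peasant') multiplication in GF(2^8)."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r

def ginv(a: int) -> int:
    """Field inverse via a^(2^8 - 2), for a != 0."""
    r, e = 1, 254
    while e:
        if e & 1:
            r = gmul(r, a)
        a = gmul(a, a)
        e >>= 1
    return r

def E(k, m):
    alpha, beta = k
    c1 = m ^ alpha              # addition in GF(2^l) is XOR
    return (c1, gmul(beta, c1))

def D(k, e):
    alpha, beta = k
    c1, c2 = e
    if c2 != gmul(beta, c1):
        return None             # the invalid-ciphertext symbol (bottom)
    return c1 ^ alpha

def S1():
    # Simulated ciphertext: two random field elements (here c1 != 0 so
    # that S2 can invert it; the proof handles this corner case).
    return (secrets.randbelow(255) + 1, secrets.randbelow(256))

def S2(e, m):
    c1, c2 = e
    # Key (alpha, beta) explaining e as an encryption of m.
    return (c1 ^ m, gmul(c2, ginv(c1)))
```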
Simulation-Based Security (UC).⁸ We can finally argue UC security of our protocol. In particular, we define a functionality F_OT(n, ℓ) as follows: the functionality receives as input a vector of bits (c^1, …, c^n) from the receiver and a vector of pairs of ℓ-bit messages ((M^1_0, M^1_1), …, (M^n_0, M^n_1)) from the sender, and outputs a vector of ℓ-bit strings (z^1, …, z^n) to the receiver, such that for all i, z^i = M^i_(c^i). In addition, we weaken the ideal functionality in the following way: a corrupted receiver can input the choice bits in an adaptive fashion, i.e., the ideal adversary can input a choice bit (for any i), learn the message z^i, then choose which choice bit to input next, and so on.

Theorem 1. If the computational DH problem is hard in G, the protocol above securely implements the functionality F_OT(n, ℓ) in the random oracle model.

The main ideas behind the proof are: 1) it is possible to extract the choice value by checking whether a corrupted receiver queries the random oracle on points yR^i − cT for some c (no adversary can query on points of this form for more than one c without breaking the CDH assumption, and the non-committing property of (E, D) allows us to complete a successful simulation even if the corrupted receiver queries the oracle after he receives the ciphertexts), and 2) it is possible to extract the sender's messages by decrypting the ciphertexts with every key which the receiver got from the random oracle (and the ciphertext-integrity property of (E, D) allows us to conclude that, except with negligible probability, D returns ⊥ for all keys different from the correct one).

Proof. (Corrupted Sender) First we argue that our protocol securely implements the functionality against a corrupted sender in the random oracle model (we will in particular use the property that the simulator can learn on which points the oracle was queried), by constructing a simulator for a corrupted S∗ in the following way:⁹ 1) in the first phase, the simulator answers random oracle queries H_(·,·)(·) at random; 2) at some point S∗ outputs S, and the simulator checks that S ∈ G or aborts otherwise; 3) the simulator now chooses a random x^i for all i and sends R^i = x^i S to S∗. Note that since x^i is chosen at random, the probability that S∗ had queried any oracle H_(S,R^i)(·) before is negligible. At this point, any time S∗ makes a query of the form H_(S,R^i)(P^q), the simulator stores its random answer in k^(i,q); 4) now S∗ outputs (e^i_0, …, e^i_(m−1)) and the simulator computes for all i, j the value M^i_j in the following way: for all q compute D(k^(i,q), e^i_j) and set M^i_j to be the first such value which is ≠ ⊥ (if any), or ⊥ otherwise; 5) finally it inputs all the vectors (M^i_0, …, M^i_(m−1)) to the ideal functionality. We now argue that no distinguisher can tell a real-world view apart from a simulated view. This follows from Lemma 1 (the distribution of R^i does not depend on c^i), and from the fact that the output of the honest receiver can only differ if there exists a pair (i, j) such that the adversary queried the random oracle on a point P′ ≠ yR^i − jT and m′ = D(k′, e^i_j) ≠ ⊥, where k′ = H_(S,R^i)(P′). In this case the simulator will input M^i_j = m′ to the ideal functionality, which could cause the honest party in the ideal world to output a different value than it would in the real world (if c^i = j). But this happens only with negligible probability thanks to the property of the encryption scheme (Definition 2).

⁷ In fact the scheme satisfies Definition 1 in a stronger, information-theoretic sense.
⁸ This paragraph assumes that the reader is familiar with standard security definitions and proofs for two-party computation protocols such as those presented in [HL10].
⁹ The main goal of this argument is to show that a corrupted sender knows the message vectors.
(Corrupted Receiver) We now construct a simulator for a corrupted receiver¹⁰: 1) in the first phase, the simulator answers random oracle queries H_(·,·)(·) truly at random; 2) at some point the simulator samples a random y and outputs S = yB. Afterwards it keeps answering oracle queries at random, but for each query of the form k^q = H_(S,P^q)(Q^q) it saves the triple (k^q, P^q, Q^q) (since y is random, the probability that any query of the form H_(S,·)(·) was performed before is negligible); 3) at some point the simulator receives a vector of elements R^i and aborts if ∃i : R^i ∉ G; 4) the simulator now initializes all c^i = ⊥; for each tuple q in memory such that for some i it holds that P^q = R^i, the simulator checks if Q^q = y(R^i − dS) for some d ∈ {0, …, m − 1}. Now the simulator saves this value d in c^i if c^i had not been defined before, or aborts otherwise. In other words, when the simulator finds a candidate choice value d for some i, it checks whether it had already found a choice value for that i (i.e., c^i ≠ ⊥); if so it aborts and outputs fail, and otherwise (i.e., c^i = ⊥) it sets c^i = d; 5) when the adversary is done querying the random oracle, the simulator has to send all the ciphertext vectors {(e^i_0, …, e^i_(m−1))}_(i∈[n]): for all i ∈ [n], j ∈ {0, …, m − 1} the simulator sets a) if c^i = ⊥: e^i_j = S1(1^κ), b) if j ≠ c^i: e^i_j = S1(1^κ), and c) if j = c^i: e^i_j = E(k^i_(c^i), z^i); 6) at this point the protocol is almost over, but the simulator can still receive random oracle queries. As before, the simulator answers them at random, except if the adversary queries on some point H_(S,R^i)(Q^q) with Q^q = y(R^i − dS). If this happens for any i such that c^i ≠ ⊥, the simulator aborts and outputs fail. Otherwise the simulator sets c^i = d, inputs c^i to the ideal functionality, receives z^i and programs the random oracle to output k′ ← S2(e^i_(c^i), z^i).

Now to conclude our proof, we must argue that a simulated view is indistinguishable from the view of a corrupted party in an execution of the protocol. When the simulator does not output fail, indistinguishability follows immediately from Definition 1. Finally, the simulator only outputs fail if R∗ queries the oracle on two points U0, U1 such that U1 − U0 is a small multiple of y^2·B, and as argued in Lemma 2 such an adversary can be used to break the computational DH assumption. ⊓⊔

Security in Practice. Clearly, a proof that a → b only says that b is true when a is true, and since cryptographic security models (a) are not always a good approximation of the real world, we discuss some of these discrepancies here, and therefore to which extent our protocol is secure (b): when instantiating our protocol we must replace the random oracle with a hash function. The proof crucially relies on the fact that the oracle is local to the protocol, i.e., it only exists while the protocol is running. Clearly, there is no such thing in the real world. We argue here that our choice of using the transcript of the protocol (S, R^i) as salt for the hash function ensures to the best possible extent that the oracle is local to the protocol. Consider the following man-in-the-middle attack, where an adversary A plays two copies of the protocol (for simplicity here m = 2), one as the sender with R and one as the receiver with S. Here is how the attack works: 1) A receives S from S and forwards it to R; 2) then the adversary receives R from R and sends R′ = S − R to S; 3) finally A receives e0, e1 from S and sends e′_0 = e1 and e′_1 = e0 to R. It is easy to see that if the same hash function was used to instantiate the random oracle in the two protocols, then the honest receiver would output z = M_(1−c), which is clearly a breach of security (i.e., this attack could not be run if the OT protocols were replaced with OT functionalities). This motivates our choice of using the transcript of the protocol (S, R^i) to salt the hash function: now, if an adversary changes any message between the sender and the receiver, the keys obtained by the honest parties will be completely independent, and therefore the receiver outputs ⊥ except with negligible probability. This motivates in practice the need for an encryption scheme (E, D) which satisfies ciphertext integrity, and therefore we recommend using our random OT protocol only together with an authenticated encryption scheme. Clearly this does not satisfy the non-committing property, but we conjecture that this does not lead to any concrete vulnerabilities.

¹⁰ The main goal of this argument is to show that a corrupted receiver knows the choice value.

3 The Random OT Protocol in Practice

This section describes how the random OT protocol can be realized in practice. In particular, this
section focuses on describing how group elements are represented as bitstrings, i.e., the encodings.
In the abstract description of the random OT protocol, the sender and the receiver transmit and
compute on “group elements”, but clearly any implementation of the protocol transmits and com-
putes on bitstrings. We describe how the encodings are designed to achieve efficiency (both for
communication and computation) and security (particularly against a malicious party who might
try to send malformed encodings).
The Group. The group G we choose for the protocol is a subset of Ḡ, where Ḡ is defined by the set of points on the twisted Edwards curve

    {(x, y) ∈ F_(2^255−19) × F_(2^255−19) : −x^2 + y^2 = 1 + dx^2y^2}

and the twisted Edwards addition law

    (x1, y1) + (x2, y2) = ( (x1y2 + x2y1) / (1 + dx1x2y1y2) , (y1y2 + x1x2) / (1 − dx1x2y1y2) )

introduced by Bernstein, Birkner, Joye, Lange, and Peters in [BBJ+08]. The constant d and the generator B can be found in [BDL+11]. The two groups Ḡ and G are isomorphic to Zp × Z8 and Zp respectively, with p = 2^252 + 27742317777372353535851937790883648493.
Encoding of Group Elements. An encoding E for a group G′ is a way of representing group elements as fixed-length bitstrings. We write E(P) for a bitstring which represents P ∈ G′. Note that there can be multiple bitstrings that represent P; if there is only one bitstring for each group element, E is said to be deterministic (E is said to be non-deterministic otherwise¹¹). Also note that some bitstrings (of the fixed length) might not represent any group element; we write E(G1) for the set of bitstrings which represent some element in G1 ⊆ G′. E is said to be verifiable if there exists an efficient algorithm that, given a bitstring as input, outputs whether it is in E(G′) or not.
The Encoding EX for Group Operations. The non-deterministic encoding EX for Ḡ, which is based on the extended coordinates in [HWCD08], represents each point as a tuple (X : Y : Z : T) with XY = ZT, representing x = X/Z and y = Y/Z. We use EX whenever we need to perform group operations, since given EX(P), EX(Q) where P, Q ∈ Ḡ, it is efficient to compute EX(P + P), EX(P + Q), and EX(P − Q). In particular, given an integer scalar r it is efficient to compute EX(rB), and given r and EX(P) it is efficient to compute EX(rP). See Sections 4 and 5 for details on EX.

¹¹ We stress that non-deterministic in this context does not mean that the encoding involves any randomness.
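As an illustration of computing on EX, the unified addition formulas for a = −1 twisted Edwards curves from [HWCD08] (the "add-2008-hwcd-3" formulas) can be sketched over a toy field; the prime and curve constant below are illustrative stand-ins, not the real curve parameters:

```python
# Unified point addition on -x^2 + y^2 = 1 + d*x^2*y^2 in extended
# coordinates (X : Y : Z : T) with x = X/Z, y = Y/Z and XY = ZT.
p = 13  # toy field prime (illustrative; the paper uses 2^255 - 19)
d = 3   # toy curve constant (illustrative)

def ext_add(P, Q):
    X1, Y1, Z1, T1 = P
    X2, Y2, Z2, T2 = Q
    A = (Y1 - X1) * (Y2 - X2) % p
    B = (Y1 + X1) * (Y2 + X2) % p
    C = 2 * d * T1 * T2 % p
    D = 2 * Z1 * Z2 % p
    E, F, G, H = B - A, D - C, D + C, B + A
    # (X3, Y3, Z3, T3) = (E*F, G*H, F*G, E*H)
    return (E * F % p, G * H % p, F * G % p, E * H % p)

def to_ext(x, y):
    return (x, y, 1, x * y % p)

def to_affine(P):
    X, Y, Z, T = P
    zi = pow(Z, -1, p)
    return (X * zi % p, Y * zi % p)
```

Because the formulas are unified, the same routine also doubles a point: on this toy curve (1, 5) is a point, and adding it to itself lands back on the curve.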
The Encoding E0 and Related Encodings. The deterministic encoding E0 for Ḡ represents each group element as a 256-bit string: the natural 255-bit encoding of y followed by a sign bit which depends only on x. The way to recover the full value x is described in [BDL+11, Section 5], and group membership can be verified efficiently by checking whether x^2(dy^2 + 1) = y^2 − 1 holds; therefore E0 is verifiable. See [BDL+11] for more details of E0.

For the following discussions, we define deterministic encodings E1 and E2 for G as

    E1(P) = E0(8P),    E2(P) = E0(64P),    P ∈ G.

We also define non-deterministic encodings E^(0) and E^(1) for G as

    E^(0)(P) = E0(P + t),    E^(1)(P) = E0(8P + t′),    P ∈ G,

where t, t′ can be any 8-torsion point. Note that each element in G has exactly 8 representations under E^(0) and E^(1).
Point Compression/Decompression. It is efficient to convert from EX (P ) to E0 (P ) and back;
since E0 represents points as much shorter bitstrings, these operations are called point compression
and point decompression, respectively. Roughly speaking, point compression outputs y = Y /Z
along with the sign bit of x = X/Z, and point decompression first recovers x and then outputs
X = x, Y = y, Z = 1, T = xy. We always check for group membership during point decompression.
We use E0 for data transmission: the parties send bitstrings in E0 (Ḡ) and expect to receive
bitstrings in E0 (Ḡ). This means a computed point encoded by EX has to be compressed before it
is sent, and a received bitstring has to be decompressed for subsequent group operations. Sending
compressed points helps to reduce the communication complexity: the parties only need to transfer
32 + 32n bytes in total.
Secure Data Transmission. At the beginning of the protocol S computes and sends E0(S). In the ideal case, R should receive a bitstring in E0(G) which he interprets as E0(S). However, an attacker (a corrupted S∗ or a man-in-the-middle) can send R 1) a bitstring that is not in E0(Ḡ), or 2) a bitstring in E0(Ḡ \ G). In the first case, R detects that the received bitstring is not valid during point decompression and ignores it. In the second case, R can check group membership by computing the pth multiple of the point, but a more efficient way is to use a new encoding E′ such that each bitstring in E0(Ḡ) represents a point in G under E′. Therefore R considers the received bitstring as E^(0)(S) = E0(S + t), where t can be any 8-torsion point.

The encoding E^(0) (along with point decompression) makes sure that R receives bitstrings representing elements in G. However, an attacker can derive c^i by exploiting the extra information given by a nonzero t: a naive R would compute and send E0(c^i(S + t) + x^iB) = E0(c^i·t + R^i); now by testing whether the result is in E0(G), the attacker learns whether c^i = 0.

To get rid of the 8-torsion point, R can multiply the received point by 8 · (8^(−1) mod p), but a more efficient way is to just multiply by 8 and then operate on EX(8S) and EX(8x^iB) to obtain and send E1(R^i) = E0(8R^i), i.e., the encoding switches to E1 for R^i. After this, S works similarly to R: to ensure that the received bitstring represents an element in G, S interprets the bitstring as
Sender S:

    Output            Input             Operations
    S                 y                 y · B
    E^(0)(S)          S                 C(S)
    8S                S                 8 · S
    E1(S)             8S                C(8S)
    64T               y, 8S             8 · (y · 8S)
    -------------------------------------------------
    64R^i             E^(1)(R^i)        8 · D(E^(1)(R^i))
    E2(R^i)           64R^i             C(64R^i)
    64yR^i            y, 64R^i          y · 64R^i
    E2(yR^i)          64yR^i            C(64yR^i)
    64(yR^i − T)      64T, 64yR^i       64yR^i − 64T
    E2(yR^i − T)      64(yR^i − T)      C(64(yR^i − T))

Receiver R:

    Output            Input             Operations
    8S                E^(0)(S)          8 · D(E^(0)(S))
    E1(S)             8S                C(8S)
    -------------------------------------------------
    8x^iB             8x^i              8x^i · B
    8S + 8x^iB        8S, 8x^iB         8S + 8x^iB
    E^(1)(R^i)        8R^i              C(8R^i)
    E2(R^i)           8R^i              C(8 · 8R^i)
    64x^iS            8x^i, 8S          8x^i · 8S
    E2(x^iS)          64x^iS            C(64x^iS)

Table 1. How the parties compute encodings of group elements: each row shows that the "Output" is computed from the "Input" using the operations in "Operations". The input might come from the output of a previous row, a received string (e.g., E^(1)(R^i)), or a random scalar that the party generates (e.g., 8x^i). The upper part of each table contains the operations that do not depend on i, which means they are performed only once for the whole protocol. EX is suppressed: group elements written without an encoding are actually encoded by EX. C and D stand for point compression and point decompression respectively. Computation of the rth multiple of P is denoted by "r · P". In particular, 8 · P can be carried out with only 3 point doublings.

E (1) (Ri ) = E0 (8Ri + t); to get rid of the 8-torsion point S also multiplies the received point by 8,
and then S operates on EX (64Ri ) and EX (64T ) to obtain EX (64(yRi − jT )).
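To see why multiplying by 8 removes the torsion component, note that 8(P + t) = 8P for any t in the 8-torsion subgroup. The following sketch checks this with the complete twisted Edwards addition law in extended coordinates; it uses textbook formulas and the standard curve constants, not the paper's implementation, and uses the order-4 point (√−1, 0) as a sample element of the 8-torsion subgroup.

```python
# Why multiplying by 8 removes the torsion component: for any point t in the
# 8-torsion subgroup, 8(P + t) = 8P. Sketch using the complete addition law
# for -x^2 + y^2 = 1 + d x^2 y^2 in extended coordinates (X : Y : Z : T).
p = 2**255 - 19
d = (-121665 * pow(121666, p - 2, p)) % p

def add(P, Q):
    # Complete unified addition; T = XY/Z for each point.
    (X1, Y1, Z1, T1), (X2, Y2, Z2, T2) = P, Q
    A = (Y1 - X1) * (Y2 - X2) % p
    B = (Y1 + X1) * (Y2 + X2) % p
    C = 2 * T1 * T2 * d % p
    D = 2 * Z1 * Z2 % p
    E, F, G, H = B - A, D - C, D + C, B + A
    return (E * F % p, G * H % p, F * G % p, E * H % p)

def mul8(P):
    for _ in range(3):          # 8 * P via 3 doublings
        P = add(P, P)
    return P

def affine(P):
    X, Y, Z, _ = P
    zinv = pow(Z, p - 2, p)
    return (X * zinv % p, Y * zinv % p)

def ext(x, y):
    return (x, y, 1, x * y % p)

Bx = 15112221349535400772501151409588531511454012693041857206046113283949847762202
By = 46316835694926478169428394003475163141307993866256225615783033603165251855960
B = ext(Bx, By)

# t = (sqrt(-1), 0) is a point of order 4, hence lies in the 8-torsion subgroup.
t = ext(pow(2, (p - 1) // 4, p), 0)
assert affine(mul8(add(B, t))) == affine(mul8(B))
```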
Key Derivation. The protocol computes H_(S,Ri) (P ) where P can be xi S, yRi , or yRi − jT for
j ∈ {0, . . . , m − 1}. This is implemented by hashing E1 (S) ∥ E2 (Ri ) ∥ E2 (P ) with Keccak [BDPVA09]
with 256-bit output. The choice of encodings is natural: S computes EX (S), and R computes
EX (8S); since multiplication by 8 is much cheaper than multiplication by (8−1 mod p), we use
E1 (S) = E0 (8S) for hashing. For similar reasons we use E2 for Ri and P .
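A minimal sketch of this key derivation, using Python's hashlib. Note that hashlib's sha3_256 applies NIST SHA-3 padding, which differs from the 2009 Keccak submission cited in the paper, so the digests will not match the paper's software; the sketch only shows the structure of the hash input.

```python
import hashlib

def derive_key(enc_S, enc_Ri, enc_P):
    """Hash E1(S) || E2(Ri) || E2(P) into a 32-byte key.

    The paper uses Keccak with 256-bit output; hashlib's sha3_256 uses NIST
    SHA-3 padding, so this illustrates the structure, not the exact digests.
    """
    assert len(enc_S) == len(enc_Ri) == len(enc_P) == 32
    return hashlib.sha3_256(enc_S + enc_Ri + enc_P).digest()

k = derive_key(b'\x01' * 32, b'\x02' * 32, b'\x03' * 32)
assert len(k) == 32
```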
Actual Operations. For completeness, we present in Table 1 a full overview of operations per-
formed during the protocol for the case of 1 out of 2 OT (i.e., m = 2).

4 Group Operations

This section describes how the group operations of Section 3 are implemented, with a focus on the
most computationally intensive part of the protocol, namely the exponentiations.
Scalar Multiplications. Exponentiations on the curve Ḡ are called scalar multiplications. More
precisely, a scalar multiplication stands for computation of EX (r · P ) where P ∈ Ḡ; the point P is
called the base point for the scalar multiplication. We follow [BDL+ 11]: first compute the 253-bit
integer r mod p, then write it as r0 + 16r1 + · · · + 16^63 r63 , where ri ∈ {−8, −7, −6, −5, −4, −3, −2,
−1, 0, 1, 2, 3, 4, 5, 6, 7}.
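The recoding into signed radix-16 digits can be sketched as follows; this is a plain-Python illustration (cf. [BDL+ 11]), not the constant-time code used in the implementation.

```python
def signed_radix16(r):
    """Write a scalar r < 2^253 as r = sum(r_i * 16^i), r_i in {-8, ..., 7}.

    A sketch of the recoding described in the text, not constant-time code.
    """
    out, carry = [], 0
    for i in range(64):
        d = ((r >> (4 * i)) & 15) + carry
        carry = (d + 8) >> 4        # 1 if d > 7 else 0
        out.append(d - 16 * carry)  # maps digits 8..16 into -8..0
    assert carry == 0               # holds since r < 2^253 leaves headroom
    return out

r = 0x1234567890abcdef
ds = signed_radix16(r)
assert all(-8 <= d <= 7 for d in ds)
assert sum(d * 16**i for i, d in enumerate(ds)) == r
```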
There are mainly three types of scalar multiplications in the protocol; each of them accounts
for n of the 3n + 2 scalar multiplications (the constants 8 and 64 are suppressed):

– computation of EX (y · Ri ) by S,
– computation of EX (xi · B) by R,
– computation of EX (xi · S) by R.

Note that two more scalar multiplications are required for computing EX (S) and EX (T ). We show
below how each type is implemented in our software, using point additions/doublings and table
lookups as building blocks.
Point Additions and Doublings. We use the formulas in [HWCD08, Section 3.1] to compute
the addition (X3 : Y3 : Z3 : T3 ) = (X1 : Y1 : Z1 : T1 ) + (X2 : Y2 : Z2 : T2 ) with 9 field
multiplications (including one multiplication by 2d). For precomputed points we follow [BDL+ 11]
and use (X′ : Y′ : Z′) representing x = (Y′ − X′)/2 and y = (Y′ + X′)/2 with Z′ = 2dxy. The
mixed addition (X3 : Y3 : Z3 : T3 ) = (X1 : Y1 : Z1 : T1 ) + (X′2 : Y′2 : Z′2 ) then takes only 7
field multiplications. We also use the formulas in [HWCD08, Section 3.3] to compute the doubling
(X2 : Y2 : Z2 : T2 ) = 2 · (X1 : Y1 : Z1 : T1 ) with 4 field multiplications and 4 field squarings.
See Section 5 for more details about field arithmetic. For the rest of the section points are written
without encodings since it should be clear in the context whether they are precomputed points or
not.
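For reference, the formulas above can be transcribed directly with Python bignums; this is a textbook sketch for checking correctness, not the vectorized floating-point code of Section 5. The doubling follows the generic [HWCD08] formulas specialized to a = −1, and the sanity checks use the standard base point and group order.

```python
# Textbook transcription of the [HWCD08] extended-coordinate formulas for
# a = -1 twisted Edwards curves. A reference sketch with Python bignums.
p = 2**255 - 19
d = (-121665 * pow(121666, p - 2, p)) % p

def point_add(P, Q):
    """Unified addition: 9 field multiplications (one of them by 2d)."""
    (X1, Y1, Z1, T1), (X2, Y2, Z2, T2) = P, Q
    A = (Y1 - X1) * (Y2 - X2) % p
    B = (Y1 + X1) * (Y2 + X2) % p
    C = T1 * (2 * d) % p * T2 % p
    D = 2 * Z1 * Z2 % p
    E, F, G, H = B - A, D - C, D + C, B + A
    return (E * F % p, G * H % p, F * G % p, E * H % p)

def point_dbl(P):
    """Doubling: 4 field multiplications and 4 field squarings."""
    X1, Y1, Z1, _ = P
    A, B, C = X1 * X1 % p, Y1 * Y1 % p, 2 * Z1 * Z1 % p
    D = (-A) % p                               # a = -1
    E = ((X1 + Y1) * (X1 + Y1) - A - B) % p
    G = (D + B) % p
    F = (G - C) % p
    H = (D - B) % p
    return (E * F % p, G * H % p, F * G % p, E * H % p)

def scalar_mul(k, P):
    Q = (0, 1, 1, 0)                           # neutral element
    while k:
        if k & 1:
            Q = point_add(Q, P)
        P = point_dbl(P)
        k >>= 1
    return Q

def affine(P):
    X, Y, Z, _ = P
    zi = pow(Z, p - 2, p)
    return (X * zi % p, Y * zi % p)

Bx = 15112221349535400772501151409588531511454012693041857206046113283949847762202
By = 46316835694926478169428394003475163141307993866256225615783033603165251855960
B = (Bx, By, 1, Bx * By % p)
ell = 2**252 + 27742317777372353535851937790883648493   # order of B

assert affine(point_dbl(B)) == affine(point_add(B, B))
assert affine(scalar_mul(ell, B)) == (0, 1)              # ell*B = identity
```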
Table Lookups. The scalar multiplication algorithms described below compute many intermediate
points before reaching the final result. Some intermediate points are “looked up” from a table
instead of being computed with point additions/doublings. Side-channel-resistant table lookups
require arithmetic operations, and building the table itself also takes some computation (although
sometimes it can be considered precomputation). However, with reasonable parameter choices
(e.g., the size of the table), the benefit can be worth the cost.
More precisely, our scalar multiplication algorithms look up the ri -th multiple of a point P from
a table, where ri ∈ {−8, −7, −6, −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, 6, 7}. We follow [BDL+ 11] to use
a table containing P, 2P, 3P, 4P, 5P, 6P, 7P, 8P to achieve efficient table lookups. See Section 5 for
more details.
Computing yRi ’s. We use the fixed-window method for each i, which is the standard method for
variable-base scalar multiplication: first compute Ri , 2Ri , 3Ri , 4Ri , 5Ri , 6Ri , 7Ri , 8Ri with 4 point
doublings and 3 point additions and store them in a table; then, starting by looking up P63 = y63 Ri
in the table, keep computing Pj−1 = 16Pj + yj−1 Ri with 4 point doublings, 1 point addition, and
1 table lookup, until P0 = yRi . For n scalar multiplications this method takes 256n doublings, 66n
additions, and 64n table lookups.
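The structure of the fixed-window loop can be illustrated without curve arithmetic by running it in a multiplicative group, where "point addition" becomes modular multiplication and "doubling" becomes squaring; negative digits use the inverse of a table entry. The modulus and base below are arbitrary choices made for the illustration.

```python
# The fixed-window loop of the text, run in a multiplicative group so it
# works without curve code: "addition" is modular multiplication, "doubling"
# is squaring. A structural sketch only.
q = 2**255 - 19          # any prime modulus works for the illustration
g = 9

def signed_radix16(r):
    out, carry = [], 0
    for i in range(64):
        d = ((r >> (4 * i)) & 15) + carry
        carry = (d + 8) >> 4
        out.append(d - 16 * carry)
    assert carry == 0
    return out

def fixed_window_pow(y, base):
    digits = signed_radix16(y)
    # table[k] = base^k for k = 0..8; negative digits use modular inverses
    table = [pow(base, k, q) for k in range(9)]
    acc = table[abs(digits[63])]
    if digits[63] < 0:
        acc = pow(acc, q - 2, q)
    for j in range(62, -1, -1):
        acc = pow(acc, 16, q)                 # 4 "doublings"
        t = table[abs(digits[j])]             # 1 "table lookup"
        if digits[j] < 0:
            t = pow(t, q - 2, q)              # negation = inversion here
        acc = acc * t % q                     # 1 "addition"
    return acc

y = 0xdeadbeefcafef00d1234
assert fixed_window_pow(y, g) == pow(g, y, q)
```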
Computing xi B’s. These are fixed-base scalar multiplications; a straightforward method is as
follows: for each i we obtain 16^j x^i_j B from a precomputed table with 1 table lookup, and then add
all the results to obtain x_i B = Σ_{j=0}^{63} 16^j x^i_j B. For n scalar multiplications this method takes 63n
mixed additions and 64n table lookups.
Computing xi S’s. This is similar to fixed-base scalar multiplication since the base point is the
same for all i. The algorithm is as follows: fix a parameter α which divides 64. Starting with computing

    P_{α−1} = Σ_{j=0}^{64/α−1} 16^{αj} (x^i_{jα+α−1} S)

with 64/α − 1 additions and 64/α table lookups, keep computing

    P_{k−1} = 16 P_k + Σ_{j=0}^{64/α−1} 16^{αj} x^i_{jα+k−1} S

with 4 doublings, 64/α additions, and 64/α table lookups, until P_0 = x_i S. We need to generate a
table for looking up all the 16^{αj} (x^i_{jα+k−1} S) in advance, which takes 3 + 4(64 − α) + 64/α doublings and
192/α additions. For n scalar multiplications, this method takes 3 + 4(64 − α) + 64/α + 4(α − 1)n
doublings, 192/α + 63n additions, and 64n table lookups. Note that when α = 64 the algorithm
becomes the fixed-window method; when α = 1 it is similar to the algorithm for fixed-base scalar
multiplications.
The point of this method is to prepare multiples of S that can be reused across all the scalar
multiplications. The more scalar multiplications there are, the more precomputation (i.e., the smaller
an α) is worthwhile. The optimal value of α, which minimizes the total computation, can be estimated
given n and the costs of point addition, point doubling, and table lookup.
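The operation counts above give a simple estimator for the optimal α. The relative costs below are illustrative assumptions (the 2.85 addition-to-lookup ratio is quoted later in this section; the doubling cost is a stand-in, not a measured number).

```python
# Estimating the optimal alpha from the operation counts in the text.
# COST_DBL is an assumed stand-in; COST_ADD/COST_LOOKUP reflect the ~2.85
# ratio reported in this section. Illustrative only.
COST_DBL, COST_ADD, COST_LOOKUP = 2.5, 2.85, 1.0

def cost(alpha, n):
    dbl = 3 + 4 * (64 - alpha) + 64 // alpha + 4 * (alpha - 1) * n
    add = 192 // alpha + 63 * n
    lut = 64 * n
    return dbl * COST_DBL + add * COST_ADD + lut * COST_LOOKUP

def best_alpha(n):
    # alpha must divide 64
    return min((1, 2, 4, 8, 16, 32, 64), key=lambda a: cost(a, n))

# More scalar multiplications justify more precomputation (smaller alpha):
assert best_alpha(1024) <= best_alpha(4)
```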
Relative Cost of the Scalar Multiplications. Assume S and R have the same CPU speed.
Among the three types of scalar multiplications, the computation of the yRi ’s is the most expensive,
since it cannot benefit from a fixed base point. By contrast, the computation of the xi B’s is
much cheaper because of the precomputed table: in our implementation it is around 3 times faster
than the computation of the yRi ’s. The cost of the xi S’s lies in between: when n is small (using a
bigger α) the cost is close to that of the yRi ’s; when n is big enough (using a smaller α) the cost is
close to that of the xi B’s. Consequently, when n is small the latency of the protocol is dominated
by R; when n is big enough it is dominated by S.
Choosing the Radix. While we use radix 16 for the scalars, one might expect some other radix,
such as 32 or 8, to be better. Since the computation of the yRi ’s is the most expensive, consider
whether switching radix would help in this case. Switching radix makes only a small difference in
the number of doublings, so only additions and table lookups have to be considered; their numbers
are roughly the same for any reasonable radix. In our implementation a point addition is roughly
2.85 times slower than a table lookup. Switching to radix 32 would decrease the number of additions
and table lookups by 20% while roughly doubling the cost of each table lookup, so it does not seem
to help; similarly, switching to radix 8 does not seem to help either. Radix 16 thus seems to be the
best choice for our implementation.

5 Field Arithmetic and Table Lookups

This section describes our implementation strategy for arithmetic operations in F_{2^255 −19} and table
lookups, which serve as low-level building blocks for the scalar multiplication algorithms in Section 4.
Our strategy decomposes field operations into double-precision floating-point operations, which
could be implemented directly with double-precision floating-point instructions. A better way to
utilize the 64 × 64 → 128-bit general multiplier would be to decompose field operations into integer
instructions, as [BDL+ 11] does. The reason we nevertheless use floating-point operations is that
they allow us to use 256-bit vector instructions on the target microarchitectures, each functionally
equivalent to 4 double-precision floating-point instructions. This technique, called vectorization,
makes our vectorized implementation run much faster than our non-vectorized implementation
based on [BDL+ 11].
Representation of Field Elements. Each field element x ∈ F_{2^255 −19} is represented as 12 limbs
(x0 , x1 , . . . , x11 ) such that x = Σ_i x_i and x_i / 2^⌈21.25 i⌉ ∈ Z. Each x_i is stored as a double-precision
floating-point number. Field operations are then carried out by limb operations such as floating-
point additions and multiplications.

When a field element gets initialized (e.g., when obtained from a table lookup), each xi uses no
more than 21 bits of the 53-bit mantissa. However, after a series of limb operations, the number
of bits xi occupies can grow. It is thus necessary to reduce the number of bits (in the mantissa) with
carries before any precision is lost; see below for more discussion.
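The limb layout can be sketched as follows: limb i holds a multiple of 2^⌈21.25 i⌉ whose coefficient fits comfortably in the 53-bit mantissa, so each limb is stored exactly in a double. This shows the data layout only, not the implementation's code.

```python
# Splitting a field element into the 12-limb representation of the text:
# limb i is a multiple of 2^ceil(21.25*i) whose coefficient (21 or 22 bits)
# fits exactly in a double's 53-bit mantissa. Data-layout sketch only.
import math

p = 2**255 - 19
SHIFTS = [math.ceil(21.25 * i) for i in range(12)] + [255]
# SHIFTS[:12] == [0, 22, 43, 64, 85, 107, 128, 149, 170, 192, 213, 234]

def to_limbs(x):
    limbs = []
    for i in range(12):
        width = SHIFTS[i + 1] - SHIFTS[i]           # 21 or 22 bits
        coeff = (x >> SHIFTS[i]) & ((1 << width) - 1)
        limbs.append(float(coeff * 2**SHIFTS[i]))   # exact as a double
    return limbs

def from_limbs(limbs):
    return sum(int(v) for v in limbs)               # int(v) is exact here

x = 0x1234567890abcdef * 3**50 % p
assert from_limbs(to_limbs(x)) == x
assert all(int(v) % 2**s == 0 for v, s in zip(to_limbs(x), SHIFTS))
```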
Field Arithmetic. Additions and subtractions of field elements are implemented in a straight-
forward way: simply adding/subtracting the corresponding limbs. This does increase the number
of bits in the mantissa, but in our application it suffices to reduce bits only at the end of the
multiplication function.
A field multiplication is divided into two steps. The first step is a schoolbook multiplication on
the 2 · 12 input limbs, with reduction modulo 2^255 − 19 to bring the result back to 12 limbs. The
schoolbook multiplication takes 132 floating-point additions, 144 floating-point multiplications, and
a few more multiplications by constants to handle the reduction.
Let (c0 , c1 , . . . , c11 ) be the result after the schoolbook multiplication. The second step is to perform
carries to reduce the number of bits in the ci . A carry from ci to ci+1 (indices work modulo 12),
denoted ci → ci+1 , is performed with 4 floating-point operations: c ← ci + αi ; c ← c − αi ;
ci ← ci − c; ci+1 ← ci+1 + c. The idea is to use αi = 3 · 2^ki where ki is big enough that the less
significant part of ci is discarded in ci + αi , forcing c to contain only the more significant part of
ci . For i = 11, one extra multiplication is required to scale c by 19 · 2^−255 before it is added to c0 .
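The 4-operation carry can be observed directly on IEEE doubles: adding and subtracting αi = 3 · 2^ki rounds away the low bits of ci, splitting it into a high part (the carry) and a low part. The value of k below is chosen for the example, not taken from the implementation.

```python
# The 4-operation floating-point carry of the text. Adding alpha = 3 * 2^k
# pushes the low bits of ci out of the 53-bit mantissa; subtracting alpha
# back leaves ci rounded to a multiple of 2^(k-51). Runs on IEEE doubles.
def carry_split(ci, k):
    alpha = float(3 * 2**k)
    c = ci + alpha          # low bits of ci are rounded away here...
    c = c - alpha           # ...so c is ci rounded to a multiple of 2^(k-51)
    low = ci - c            # remaining low part stays in limb i
    return c, low

ci = 123456789.0
high, low = carry_split(ci, 64)      # ulp of (ci + alpha) is 2^13
assert high + low == ci
assert high % 2**13 == 0
assert abs(low) <= 2**12
```

The factor 3 (rather than a power of 2) keeps the sum's exponent fixed whether ci is positive or negative, so the same αi works for signed limbs.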
A straightforward way to reduce the number of bits in all limbs is to use the carry chain c0 → c1 →
c2 → · · · → c11 → c0 → c1 . The problem with this straightforward chain is that there is not
enough instruction-level parallelism to hide the 3-cycle latencies (see the discussion below). To hide
the latencies we thus interleave the following 3 carry chains:

c0 → c1 → c2 → c3 → c4 → c5 ,
c4 → c5 → c6 → c7 → c8 → c9 ,
c8 → c9 → c10 → c11 → c0 → c1 .

In total the multiplication function takes 192 floating-point additions/subtractions and 156 floating-
point multiplications. When the input operands are the same, many limb products will repeat in
the schoolbook multiplication; a field squaring is therefore cheaper than a field multiplication. In
total the squaring function takes 126 floating-point additions/subtractions and 101 floating-point
multiplications.
Table Lookups. The task here is to “look up” ri P where ri ∈ {−8, −7, −6, −5, −4, −3, −2, −1, 0, 1, 2,
3, 4, 5, 6, 7} from a table containing P, 2P, 3P, 4P, 5P, 6P, 7P, 8P . We follow [BDL+ 11] to look up
ri P in 2 steps: load |ri |P from the table and then negate the point if ri is negative.
In order to obtain one limb of one of the 4 coordinates X, Y, Z, T of |ri |P , we first initialize
a limb v to the corresponding limb of 0P if |ri | = 0, and to 0 otherwise. Then for each of the
8 candidate limbs w in the table, we mask w with one AND operation and then OR the result
into v. Loading |ri |P thus takes 4 · 12 · (8 · 2) = 768 64-bit logical operations, and a few more for
initialization and computing the masks.
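The mask-and-OR selection can be transcribed to Python integers as follows. Python integers are not constant-time, so this only illustrates the logic of the 64-bit logical operations; the sample table values are arbitrary.

```python
# The mask-and-OR lookup of the text: the desired entry is selected with
# AND/OR only, with no secret-dependent branch or array index. Python ints
# are not constant-time; this illustrates the logic only.
M64 = (1 << 64) - 1

def lookup(table, idx):
    """Select table[idx-1] (idx in 1..8) using only AND/OR over 64-bit words."""
    out = 0
    for j, w in enumerate(table, start=1):
        eq = ((j ^ idx) - 1) >> 64 & 1      # 1 iff j == idx (for j, idx in 1..8)
        mask = (0 - eq) & M64               # all-ones word iff j == idx
        out |= w & mask
    return out

table = [(i * 0x0123456789abcdef) & M64 for i in range(1, 9)]
assert lookup(table, 1) == table[0]
assert lookup(table, 8) == table[7]
```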
Negating a point in the extended coordinates is simply negating the X and T coordinates.
Negating a field element can be easily done by negating each limb of it. Since each limb is a
floating-point number, we negate each limb by XORing −0.0 into the limb; conditional negation is
done by masking −0.0 before the XOR. A conditional negation thus takes 2 · 12 = 24 XORs and a
few more operations for masking. In total our table lookup function for extended coordinates takes
848 64-bit logical operations.

instruction    latency    throughput    description
vandpd         1          1             bitwise and
vorpd          1          1             bitwise or
vxorpd         1          1 (4)         bitwise xor
vaddpd         3          1             4-way parallel double-precision floating-point additions
vsubpd         3          1             4-way parallel double-precision floating-point subtractions
vmulpd         5          1             4-way parallel double-precision floating-point multiplications

Table 2. 256-bit vector instructions used in our implementation. Note that vxorpd has a throughput of 4 when it
has only one source operand.
When P is a precomputed point we use the 3-coordinate representation instead of the extended
coordinates. This reduces the cost of loading |ri |P , but the conditional negation takes more
operations, since it involves conditionally negating one coordinate and conditionally swapping the
other two. In total our table lookup function for precomputed points takes 680 64-bit logical
operations.
Vectorization. We decompose field arithmetic and table lookups into 64-bit floating-point and
logical operations. The Intel Sandy Bridge and Ivy Bridge microarchitectures, as well as many recent
microarchitectures, offer instructions that operate on 256-bit registers. Some of these instructions
treat the registers as vectors of 4 double-precision floating-point numbers and perform 4 floating-
point operations in parallel; there are also 256-bit logical instructions that can be viewed as 4
64-bit logical instructions. We thus use these instructions to run 4 scalar multiplications in parallel.
Table 2 shows the instructions we use, along with their latencies and throughputs on the Sandy
Bridge and Ivy Bridge, as given in Fog’s well-known survey [Fog14].

6 Implementation Results

This section compares the speed of our implementation of 1 out of 2 OT (i.e., m = 2) with other
similar implementations. Since our protocol is quite similar to the Diffie-Hellman key exchange
protocol, we first compare our DH speeds with existing Curve25519 implementations. The experi-
ments are carried out on two machines from the eBACS site for publicly verifiable benchmarks [BL15]:
h6sandy (Sandy Bridge) and h9ivy (Ivy Bridge). Since our protocol can serve as the base OTs for
an OT extension protocol, we also compare our speed with the base OT implementation presented
in [ALSZ13], which is included in the Scapi multi-party computation library; these experiments are
performed on an Intel Core i7-3537U processor (Ivy Bridge), with each party running on one core.
Comparing with Curve25519 Implementations. Table 3 compares our work with existing
Curve25519 implementations. “Cycles to generate a public key” indicates the time to generate
a public key given a secret key; the existing implementation we measure is by Andrew Moon [MF15].
“Cycles to compute a shared secret” indicates the time to generate the shared secret, given a secret
key and a public key; the numbers for the existing implementation are from the eBACS site. Note
that since our software uses 4-way vectorization, the numbers in the table are the time for generating
4 public keys or 4 shared secrets, divided by 4. All our timings are better than those of existing
Curve25519 implementations.

Machines                                                  h6sandy    h9ivy
existing Curve25519    Cycles to generate a public key      89500    83636
implementations        Cycles to compute a shared secret   194036   182708
this work              Cycles to generate a public key      61458    60853
                       Cycles to compute a shared secret   182169   180343

Table 3. DH speeds of our work and of existing Curve25519 implementations.

n                                4      8     16     32     64    128    256    512   1024

this work    Running time of S    548    381    321    279    265    257    246    237    228
             Running time of R    472    366    279    229    205    200    193    184    177
[ALSZ13]     Running time of S  17976  10235   6132   4358   3348   2877   2650   2528   2473
             Running time of R  16968   9261   5188   3415   3382   2909   2656   2541   2462

Table 4. Timings per OT in kilocycles. Multiplying the number of kilocycles by 0.5 yields the running time (in µs)
on our test architecture.

Comparing with Scapi. Table 4 shows the timings of our implementation of the random OT
protocol, along with the timings of a base-OT implementation presented in [ALSZ13]. That paper
presents several base-OT implementations; the one we compare with is Miracl-based with “long-
term security” using a random oracle (cf. [ALSZ13, Section 6.1]). The implementation uses the NIST
K-283 curve and SHA-1 for hashing, and it is not constant-time. Our work turns out to be an order
of magnitude faster for n ∈ {4, 8, . . . , 1024}.

Acknowledgments. We are very grateful to: Daniel J. Bernstein and Tanja Lange for invaluable
comments and suggestions regarding elliptic-curve cryptography and for editorial feedback on earlier
versions of this paper; Yehuda Lindell for useful comments on our proof of security; and Peter Schwabe
for various kinds of help with the implementation, including providing low-level code for field arithmetic.
Tung Chou is supported by the Netherlands Organisation for Scientific Research (NWO) under
grant 639.073.005. Claudio Orlandi is supported by: the Danish National Research Foundation
and the National Science Foundation of China (grant 61361136003) for the Sino-Danish Center
for the Theory of Interactive Computation; the Center for Research in Foundations of Electronic
Markets (CFEM); and the European Union Seventh Framework Programme (FP7/2007-2013) under
grant agreement number ICT-609611 (PRACTICE).

References
ALSZ13. Gilad Asharov, Yehuda Lindell, Thomas Schneider, and Michael Zohner. More efficient oblivious transfer
and extensions for faster secure computation. In Proceedings of the 2013 ACM SIGSAC Conference on
Computer and Communications Security, pages 535–548. ACM, 2013.
ALSZ15. Gilad Asharov, Yehuda Lindell, Thomas Schneider, and Michael Zohner. More efficient oblivious transfer
extensions with security for malicious adversaries. Cryptology ePrint Archive, Report 2015/061, 2015.
http://eprint.iacr.org/.
BBJ+ 08. Daniel J. Bernstein, Peter Birkner, Marc Joye, Tanja Lange, and Christiane Peters. Twisted Edwards
curves. In Progress in Cryptology – AFRICACRYPT 2008, pages 389–405. Springer, 2008.
BDL+ 11. Daniel J. Bernstein, Niels Duif, Tanja Lange, Peter Schwabe, and Bo-Yin Yang. High-speed high-security
signatures. In Cryptographic Hardware and Embedded Systems – CHES 2011, volume 6917 of Lecture
Notes in Computer Science, pages 124–142. Springer-Verlag Berlin Heidelberg, 2011.
BDPVA09. Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche. Keccak sponge function family
main document. Submission to NIST (Round 2), 3:30, 2009.
Bea96. Donald Beaver. Correlated pseudorandomness and the complexity of private computations. In Proceedings
of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, Philadelphia, Pennsylvania,
USA, May 22-24, 1996, pages 479–488, 1996.
Ber06. Daniel J. Bernstein. Curve25519: new Diffie-Hellman speed records. In Public Key Cryptography – PKC
2006, pages 207–228. Springer, 2006.
BL14. Daniel J. Bernstein and Tanja Lange. Safecurves: choosing safe curves for elliptic-curve cryptography,
accessed 1 December 2014. http://safecurves.cr.yp.to.
BL15. Daniel J. Bernstein and Tanja Lange. eBACS: ECRYPT benchmarking of cryptographic systems, accessed
16 March 2015. http://bench.cr.yp.to.
BM89. Mihir Bellare and Silvio Micali. Non-interactive oblivious transfer and applications. In Advances in
Cryptology - CRYPTO ’89, 9th Annual International Cryptology Conference, Santa Barbara, California,
USA, August 20-24, 1989, Proceedings, pages 547–557, 1989.
Can01. Ran Canetti. Universally composable security: A new paradigm for cryptographic protocols. In 42nd
Annual Symposium on Foundations of Computer Science, FOCS 2001, 14-17 October 2001, Las Vegas,
Nevada, USA, pages 136–145, 2001.
CF01. Ran Canetti and Marc Fischlin. Universally composable commitments. IACR Cryptology ePrint Archive,
2001:55, 2001.
DNO08. Ivan Damgård, Jesper Buus Nielsen, and Claudio Orlandi. Essentially optimal universally composable
oblivious transfer. In Information Security and Cryptology - ICISC 2008, 11th International Conference,
Seoul, Korea, December 3-5, 2008, Revised Selected Papers, pages 318–335, 2008.
EGL85. Shimon Even, Oded Goldreich, and Abraham Lempel. A randomized protocol for signing contracts.
Commun. ACM, 28(6):637–647, 1985.
Fog14. Agner Fog. Instruction tables. 2014. http://www.agner.org/optimize/instruction_tables.pdf.
HL10. Carmit Hazay and Yehuda Lindell. Efficient Secure Two-Party Protocols - Techniques and Constructions.
Information Security and Cryptography. Springer, 2010.
HWCD08. Huseyin Hisil, Kenneth Koon-Ho Wong, Gary Carter, and Ed Dawson. Twisted Edwards curves revisited.
In Advances in Cryptology-ASIACRYPT 2008, pages 326–343. Springer, 2008.
IKNP03. Yuval Ishai, Joe Kilian, Kobbi Nissim, and Erez Petrank. Extending oblivious transfers efficiently.
In Advances in Cryptology - CRYPTO 2003, 23rd Annual International Cryptology Conference, Santa
Barbara, California, USA, August 17-21, 2003, Proceedings, pages 145–161, 2003.
IR89. Russell Impagliazzo and Steven Rudich. Limits on the provable consequences of one-way permutations.
In Proceedings of the 21st Annual ACM Symposium on Theory of Computing, May 14-17, 1989, Seattle,
Washington, USA, pages 44–61, 1989.
Kil88. Joe Kilian. Founding cryptography on oblivious transfer. In Proceedings of the 20th Annual ACM
Symposium on Theory of Computing, May 2-4, 1988, Chicago, Illinois, USA, pages 20–31, 1988.
Lar14. Enrique Larraia. Extending oblivious transfer efficiently, or - how to get active security with constant
cryptographic overhead. IACR Cryptology ePrint Archive, 2014:692, 2014.
MF15. Andrew Moon “Floodyberry”. Implementations of a fast elliptic-curve digital signature algorithm, ac-
cessed 16 March 2015. https://github.com/floodyberry/ed25519-donna.
Nie07. Jesper Buus Nielsen. Extending oblivious transfers efficiently - how to get robustness almost for free.
Cryptology ePrint Archive, Report 2007/215, 2007. http://eprint.iacr.org/.

16
NNOB12. Jesper Buus Nielsen, Peter Sebastian Nordholt, Claudio Orlandi, and Sai Sheshank Burra. A new
approach to practical active-secure two-party computation. In Advances in Cryptology - CRYPTO 2012
- 32nd Annual Cryptology Conference, Santa Barbara, CA, USA, August 19-23, 2012. Proceedings, pages
681–700, 2012.
NP01. Moni Naor and Benny Pinkas. Efficient oblivious transfer protocols. In Proceedings of the Twelfth Annual
Symposium on Discrete Algorithms, January 7-9, 2001, Washington, DC, USA., pages 448–457, 2001.
PVW08. Chris Peikert, Vinod Vaikuntanathan, and Brent Waters. A framework for efficient and composable
oblivious transfer. In Advances in Cryptology - CRYPTO 2008, 28th Annual International Cryptology
Conference, Santa Barbara, CA, USA, August 17-21, 2008. Proceedings, pages 554–571, 2008.
Rab81. Michael O. Rabin. How to exchange secrets with oblivious transfer. Technical Report TR-81, Aiken
Computation Lab, Harvard University, 1981.
Sch15. Thomas Schneider. Personal communication, 2015.
Wie83. Stephen Wiesner. Conjugate coding. SIGACT News, 15(1):78–88, January 1983.

