A Pragmatic Introduction to Secure Multi-Party Computation

David Evans, University of Virginia; evans@virginia.edu
Vladimir Kolesnikov, Georgia Institute of Technology; kolesnikov@gatech.edu
Mike Rosulek, Oregon State University; rosulekm@eecs.oregonstate.edu
ABSTRACT
Secure multi-party computation (MPC) has evolved from a theo-
retical curiosity in the 1980s to a tool for building real systems
today. Over the past decade, MPC has been one of the most active
research areas in both theoretical and applied cryptography. This
book introduces several important MPC protocols, and surveys
methods for improving the efficiency of privacy-preserving ap-
plications built using MPC. Besides giving a broad overview of the field and the insights behind the main constructions, we survey the most active current areas of MPC research, and aim to give readers a sense of what problems are practically solvable using MPC today and how different threat models and assumptions impact the practicality of different approaches.
David Evans, Vladimir Kolesnikov and Mike Rosulek, A Pragmatic Introduction to Secure Multi-
Party Computation. NOW Publishers, 2018. (This version: April 15, 2020)
Contents

1 Introduction
4 Implementation Techniques
8 Conclusion
Acknowledgements
References
1 Introduction
There are two main types of secure and verifiable computation: outsourced
computation and multi-party computation. Our focus is on multi-party compu-
tation, but first we briefly describe outsourced computation to distinguish it
from multi-party computation.
In an outsourced computation, one party owns the data and wants to be able to
obtain the result of computation on that data. The second party receives and
stores the data in an encrypted form, performs computation on the encrypted
data, and provides the encrypted results to the data owner, without learning
anything about the input data, intermediate values, or final result. The data
owner can then decrypt the returned results to obtain the output.
Homomorphic encryption allows operations on encrypted data, and is
a natural primitive to implement outsourced computation. With partially-
homomorphic encryption schemes, only certain operations can be performed.
Several efficient partially-homomorphic encryption schemes are known (Pail-
lier, 1999; Naccache and Stern, 1998; Boneh et al., 2005). Systems built on
them are limited to specialized problems that can be framed in terms of the
supported operations.
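To make the flavor of partially-homomorphic encryption concrete, the following Python sketch implements a toy version of Paillier's additively homomorphic scheme. It is illustrative only: the hard-coded primes are far too small to be secure, and a real deployment would use a vetted library with 2048-bit moduli. Multiplying two ciphertexts yields an encryption of the sum of the plaintexts.

import math, secrets

# Toy Paillier parameters (insecure, illustration only).
p, q = 100003, 100019            # two small primes
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)     # Carmichael function lambda(n)
g = n + 1                        # standard choice of generator

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)   # precomputed decryption constant

def enc(m):
    r = secrets.randbelow(n - 1) + 1  # random r (coprime to n w.h.p.)
    return pow(g, m, n2) * pow(r, n, n2) % n2

def dec(c):
    return L(pow(c, lam, n2)) * mu % n

# Additive homomorphism: multiplying ciphertexts adds plaintexts.
c = enc(1000) * enc(234) % n2
assert dec(c) == 1234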
To provide fully homomorphic encryption (FHE), it is necessary to support
a Turing-complete set of operations (e.g., both addition and multiplication) so
that any function can be computed. Although the goal of FHE was envisioned by
Rivest et al. (1978), it took more than 30 years before the first FHE scheme was
proposed by Gentry (2009), building on lattice-based cryptography. Although there has been much recent interest in implementing FHE schemes (Gentry and Halevi, 2011; Halevi and Shoup, 2015; Chillotti et al., 2016), building secure, deployable, scalable systems using FHE remains an elusive goal.
In their basic forms, FHE and MPC address different aspects of secure computation, and as such should not be directly compared. They do, however, provide similar functionalities, and there are ways to adapt FHE to use multiple keys that enable multi-party computation using FHE (Asharov et al., 2012; López-Alt et al., 2012; Mukherjee and Wichs, 2016). FHE offers an asymptotic communication improvement in comparison with MPC, but at the expense of computational efficiency. State-of-the-art FHE implementations (Chillotti et al., 2017) are thousands of times slower than two-party and multi-party secure computation protocols.

1.2 Multi-Party Computation
Yao’s Millionaires Problem. The toy problem that was used to introduce
secure computation is not meant as a useful application. Yao (1982) introduces
it simply: “Two millionaires wish to know who is richer; however, they do not
want to find out inadvertently any additional information about each other’s
wealth.” That is, the goal is to compute the Boolean result of x1 ≤ x2 where
x1 is the first party’s private input and x2 is the second party’s private input.
Although it is a toy problem, Yao’s Millionaires Problem can still be useful for
illustrating issues in MPC applications.
1.3 MPC Applications

Secure machine learning. MPC can be used to enable privacy in both the inference and training phases of machine learning systems.
Oblivious model inference allows a client to submit a request to a server
holding a pre-trained model, keeping the request private from the server S and
the model private from the client C. In this setting, the inputs to the MPC are
the private model from S, and the private test input from C, and the output
(decoded only for C) is the model's prediction. An example of recent work in this setting is MiniONN (Liu et al., 2017), which provides a mechanism for converting any standard neural network into an oblivious model service, using a combination of MPC and homomorphic encryption techniques.
In the training phase, MPC can be used to enable a group of parties to
train a model based on their combined data without exposing that data. For the
large scale data sets needed for most machine learning applications, it is not
feasible to perform training across private data sets as a generic many-party
computation. Instead, hybrid approaches have been designed that combine
MPC with homomorphic encryption (Nikolaenko et al., 2013b; Gascón et al.,
2017) or develop custom protocols to perform secure arithmetic operations
efficiently (Mohassel and Zhang, 2017). These approaches can scale to data
sets containing many millions of elements.
1.3.1 Deployments
Although MPC has seen much success as a research area and in experimental
use, we are still in the early stages of deploying MPC solutions to real
problems. Successful deployment of an MPC protocol to solve a problem
involving independent and mutually distrusting data owners requires addressing
a number of challenging problems beyond the MPC execution itself. Examples
of these problems include building confidence in the system that will execute
the protocol, understanding what sensitive information might be inferred from
the revealed output of the MPC, and enabling decision makers charged with
protecting sensitive data but without technical cryptography background to
understand the security implications of participating in the MPC.
Despite these challenges, there have been several successful deployments
of MPC and a number of companies now focus on providing MPC-based
solutions. We emphasize that in this early stage of MPC penetration and
awareness, MPC is primarily deployed as an enabler of data sharing. In
other words, organizations are typically not seeking to use MPC to add a
layer of privacy in an otherwise viable application (we believe this is yet
forthcoming). Rather, MPC is used to enable a feature or an entire application,
which otherwise would not be possible (or would require trust in specialized
hardware), due to the value of the shared data, protective privacy legislation,
or mistrust of the participants.
Danish sugar beet auction. An early high-profile MPC deployment was a 2008 auction in which Danish farmers sold contracts for sugar beet production. The farmers felt that their bids reflected their capabilities and costs, which they did not want to reveal to Danisco, the only company in Denmark that processed sugar beets. At the same time, Danisco needed to be involved in the auction, as the contracts were securities directly affecting the company.
The auction was implemented as a three-party MPC among representatives for Danisco, the farmers' association (DKS), and the researchers (the SIMAP project). As explained by Bogetoft et al. (2009), a three-party solution was selected partly because it was natural in the given scenario, but also because it allowed the use of efficient information-theoretic tools such as secret sharing. The project led to the formation of a company, Partisia, that uses MPC to support auctions for industries such as spectrum and energy markets, as well as related applications such as data exchange (Gallagher et al., 2017).
Boston wage equity study. An initiative of the City of Boston and the Boston Women's Workforce Council (BWWC) aims to identify salary inequities across gender and other employee groups. Using MPC, participating employers' compensation data can be aggregated without revealing any individual employer's data.

Key management. Another natural MPC deployment protects long-term secrets, such as an authentication service's cryptographic keys, by splitting the key material among two servers, S1 and S2. Now, an attacker must compromise both
S1 and S2 to gain access to the keys. We can run S1 and S2 on two different
software stacks to minimize the chance that they will both be vulnerable to
the exploit available to the malware, and operate them using two different
sub-organizations to minimize insider threats. Of course, routine operation requires access to the keys to provide the authentication service; at the same time, the key should never be reconstructed, since the reconstructing party would become the target of the APT attack. Instead, the three players, S1, S2, and the authenticating user U, will run the authentication inside MPC, without ever reconstructing any secrets, thus removing the single point of vulnerability and hardening the defense.
1.4 Overview
Because MPC is a vibrant and active research area, it is possible to cover only
a small fraction of the most important work in this book. We mainly discuss
generic MPC techniques, focusing mostly on the two-party scenario, and
emphasizing a setting where all but one of the parties may be corrupted. In the
next chapter, we provide a formal definition of secure multi-party computation
and introduce security models that are widely-used in MPC. Although we
do not include formal security proofs in this book, it is essential to have
clear definitions to understand the specific guarantees that MPC provides.
Chapter 3 describes several fundamental MPC protocols, focusing on the most
widely-used protocols that resist any number of corruptions. Chapter 4 surveys
techniques that have been developed to enable efficient implementations of
MPC protocols, and Chapter 5 describes methods that have been used to
provide sub-linear memory abstractions for MPC.
Chapters 3–5 target the weak semi-honest adversary model for MPC (defined in Chapter 2), in which it is assumed that all parties follow the protocol
as specified. In Chapter 6, we consider how MPC protocols can be hardened
to provide security against active adversaries, and Chapter 7 explores some
alternative threat models that enable trade-offs between security and efficiency.
We conclude in Chapter 8, outlining the trajectory of MPC research and
practice, and suggesting possible directions for the future.
2 Defining Multi-Party Computation
• Perfect Privacy. Any set of shares of size less than t reveals nothing about the secret, in the information-theoretic sense. More formally, for any two secrets a, b ∈ D and any possible vector of shares v = (v1, v2, ..., vk) with k < t,

Pr[v | a] = Pr[v | b],

where the probability is over the randomness of the sharing algorithm.
In many of our discussions we will use (n, n)-secret sharing schemes, where
all n shares are necessary and sufficient to reconstruct the secret.
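As a concrete illustration, the following Python sketch implements the simplest (n, n)-secret sharing scheme over byte strings, using XOR: any n − 1 of the shares are uniformly distributed and reveal nothing, while all n shares together reconstruct the secret.

import secrets

def share(secret, n):
    # First n-1 shares are uniformly random; the last is chosen so that
    # the XOR of all n shares equals the secret.
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    last = secret
    for s in shares:
        last = bytes(a ^ b for a, b in zip(last, s))
    return shares + [last]

def reconstruct(shares):
    out = bytes(len(shares[0]))
    for s in shares:
        out = bytes(a ^ b for a, b in zip(out, s))
    return out

assert reconstruct(share(b"attack at dawn", 5)) == b"attack at dawn"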
Random Oracle. The random oracle (RO) model is a heuristic model for the security of hash functions, introduced by Bellare and Rogaway (1993). The idea is to treat the hash function as a public, idealized random function. In the random oracle model, all parties have access to the public function H : {0,1}* → {0,1}^κ, implemented as a stateful oracle. On input string x ∈ {0,1}*, H looks up its history of calls. If H(x) has never been called, H chooses a random r_x ∈ {0,1}^κ, remembers the pair (x, r_x), and returns r_x. If H(x) has been called before, H returns r_x. In this way, the oracle realizes a randomly-chosen function {0,1}* → {0,1}^κ.
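The stateful-oracle description above translates almost directly into code. The following Python sketch (class and parameter names are our own) implements the lazy-sampling behavior: fresh queries receive fresh uniform answers, and repeated queries are answered consistently.

import secrets

class RandomOracle:
    def __init__(self, kappa_bytes=16):
        self.kappa = kappa_bytes
        self.history = {}                 # remembered (x, r_x) pairs

    def H(self, x):
        if x not in self.history:         # first call on x: sample fresh r_x
            self.history[x] = secrets.token_bytes(self.kappa)
        return self.history[x]            # same r_x on every later call

ro = RandomOracle()
assert ro.H(b"x") == ro.H(b"x")           # consistent
assert ro.H(b"x") != ro.H(b"y")           # distinct w.h.p.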
The random oracle model is a heuristic model, because it captures only
those attacks that treat the hash function H as a black-box. It deviates from
reality in that it models a public function (e.g., a standardized hash function
like SHA-256) as an inherently random object. In fact, it is possible to construct
(extremely contrived) schemes that are secure in the random oracle model,
but which are insecure whenever H is instantiated by any concrete function
(Canetti et al., 1998).
Despite these shortcomings, the random oracle model is often considered
acceptable for practical applications. Assuming a random oracle often leads to
significantly more efficient constructions. In this work we will be careful to
state when a technique relies on the random oracle model.
2.3 Security of Multi-Party Computation
Informally, the goal of MPC is for a group of participants to learn the correct
output of some agreed-upon function applied to their private inputs without
revealing anything else. We now provide a more formal definition to clarify
the security properties MPC aims to provide. First, we present the real-ideal
paradigm which forms the conceptual core of defining security. Then we
discuss two different adversary models commonly used for MPC. Finally, we discuss issues of composition, namely, whether security is preserved in the natural way when a secure protocol invokes another subprotocol.
Ideal World. In the ideal world, the parties securely compute the function F
by privately sending their inputs to a completely trusted party T , referred to as
the functionality. Each party Pi has an associated input xi, which it sends to T; T simply computes F(x1, ..., xn) and returns the result to all parties.
Often we will make a distinction between F as a trusted party (functionality)
and the circuit C that such a party computes on the private inputs.
We can imagine an adversary attempting to attack the ideal-world inter-
action. An adversary can take control over any of the parties Pi , but not T
(that is the sense in which T is described as a trusted party). The simplicity
of the ideal world makes it easy to understand the effect of such an attack.
Considering our previous laundry list: the adversary clearly learns no more
than F (x1, . . . , xn ) since that is the only message it receives; the outputs given
to the honest parties are all consistent and legal; the adversary’s choice of
inputs is independent of the honest parties’.
Although the ideal world is easy to understand, the presence of a fully-
trusted third party makes it imaginary. We use the ideal world as a benchmark
against which to judge the security of an actual protocol.
Real World. In the real world, there is no trusted party. Instead, all parties
communicate with each other using a protocol. The protocol π specifies for
each party Pi a “next-message” function πi . This function takes as input a
security parameter, the party’s private input xi , a random tape, and the list of
messages Pi has received so far. Then, πi outputs either a next message to send
along with its destination, or else instructs the party to terminate with some
specific output.
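To illustrate this formulation, the following Python sketch (all names and the toy message logic are hypothetical) shows the shape of a next-message function: it maps the security parameter, private input, random tape, and messages received so far to either a message to send or a final output.

from dataclasses import dataclass, field

@dataclass
class PartyState:
    kappa: int                    # security parameter
    x_i: bytes                    # party's private input
    random_tape: bytes            # party's randomness
    received: list = field(default_factory=list)  # messages so far

def next_message(state):
    # Returns ('send', destination, message) or ('output', y_i).
    if len(state.received) == 0:
        return ('send', 2, b'round-1 message')    # toy first move
    return ('output', state.received[-1])          # toy termination rule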
In the real world, an adversary can corrupt parties—corruption at the
beginning of the protocol is equivalent to the original party being an adversary.
Depending on the threat model (discussed next), corrupt parties may either
follow the protocol as specified, or deviate arbitrarily in their behavior.
Intuitively speaking, the real world protocol π is considered secure if any
effect that an adversary can achieve in the real world can also be achieved by
a corresponding adversary in the ideal world. Put differently, the goal of a
protocol is to provide security in the real world (given a set of assumptions)
that is equivalent to that in the ideal world.
A semi-honest adversary is one who corrupts parties but follows the protocol as
specified. In other words, the corrupt parties run the protocol honestly but they
may try to learn as much as possible from the messages they receive from other
parties. Note that this may involve several colluding corrupt parties pooling their
views together in order to learn information. Semi-honest adversaries are also
considered passive in that they cannot take any actions other than attempting
to learn private information by observing a view of a protocol execution.
Semi-honest adversaries are also commonly called honest-but-curious.
The view of a party consists of its private input, its random tape, and the
list of all messages received during the protocol. The view of an adversary
consists of the combined views of all corrupt parties. Anything an adversary
learns from running the protocol must be an efficiently computable function of
its view. That is, without loss of generality we need only consider an “attack”
in which the adversary simply outputs its entire view.
Following the real-ideal paradigm, security means that such an “attack”
can also be carried out in the ideal world. That is, for a protocol to be secure,
it must be possible in the ideal world to generate something indistinguishable
from the real world adversary’s view. Note that the adversary’s view in the ideal
world consists of nothing but inputs sent to T and outputs received from T .
So, an ideal-world adversary must be able to use this information to generate
what looks like a real-world view. We refer to such an ideal-world adversary
as a simulator, since it generates a “simulated” real-world view while in the
ideal-world itself. Showing that such a simulator exists proves that there is
nothing an adversary can accomplish in the real world that could not also be
done in the ideal world.
More formally, let π be a protocol and F be a functionality. Let C be the
set of parties that are corrupted, and let Sim denote a simulator algorithm. We
define the following distributions of random variables:
• Realπ (κ, C; x1, . . . , xn ): run the protocol with security parameter κ,
where each party Pi runs the protocol honestly using private input xi .
Let Vi denote the final view of party Pi , and let yi denote the final output
of party Pi .
Output {Vi | i ∈ C}, (y1, . . . , yn ).
• Ideal F, Sim (κ, C; x1, . . . , xn ): Compute (y1, . . . , yn ) ← F (x1, . . . , xn ).
Output Sim (C, {(xi, yi ) | i ∈ C}), (y1, . . . , yn ).
A protocol is secure against semi-honest adversaries if the corrupted parties
in the real world have views that are indistinguishable from their views in the
ideal world:
Definition 2.2. A protocol π securely realizes F in the presence of semi-honest
adversaries if there exists a simulator Sim such that, for every subset of corrupt
parties C and all inputs x1, . . . , xn , the distributions
Realπ (κ, C; x1, . . . , xn )
and
Ideal F, Sim (κ, C; x1, . . . , xn )
are indistinguishable (in κ).
In defining Real and Ideal we have included the outputs of all parties, even
the honest ones. This is a way of incorporating a correctness condition into the
definition. In the case that no parties are corrupt (C = ∅), the outputs of Real and Ideal simply consist of all parties' outputs in the two interactions. Hence, the security definition implies that the protocol gives outputs distributed exactly as the outputs of the ideal functionality (and this is true even when F is randomized). Because the distribution of y1, ..., yn in Real does not depend on the set C of corrupted parties (no matter who is corrupted, the parties all run honestly), it is not strictly necessary to include these values in the case of C ≠ ∅, but we choose to include them to have a unified definition.
The semi-honest adversary model may at first glance seem exceedingly
weak—simply reading and analyzing received messages barely even seems
like an attack at all! It is reasonable to ask why such a restrictive adversary
model is worth considering at all. In fact, achieving semi-honest security is far
from trivial and, importantly, semi-honest protocols often serve as a basis for
protocols in more robust settings with powerful attackers. Additionally, many
realistic scenarios do correspond to semi-honest attack behavior. One such example is computing with players who are trusted to act honestly during the protocol, but cannot guarantee that their storage will not be compromised in the future.
Effect on honest outputs. When the corrupt parties deviate from the protocol,
there is now the possibility that honest parties’ outputs will be affected.
For example, imagine an adversary that causes two honest parties to
output different things while in the ideal world all parties get identical
outputs. This condition is somewhat trivialized in the previous definition: while the definition does compare real-world outputs to ideal-world outputs, these outputs have no dependence on the adversary (only on the set of corrupted parties). Furthermore, we make no guarantees about the final outputs of corrupt parties, only those of the honest parties, since a malicious party can output whatever it likes.
When A denotes the adversary program, we write corrupt(A) to denote the set
of parties that are corrupted, and use corrupt(Sim ) for the set of parties that are
corrupted by the ideal adversary, Sim . As we did for the semi-honest security
definition, we define distributions for the real world and ideal world, and define
a secure protocol as one that makes those distributions indistinguishable:
• Realπ,A(κ; {xi | i ∉ corrupt(A)}): run the protocol on security parameter κ, where each honest party Pi (for i ∉ corrupt(A)) runs the protocol honestly using its given private input xi, and the messages of corrupt parties are chosen according to A (thinking of A as a protocol next-message function for a collection of parties). Let yi denote the output of each honest party Pi, and let V denote the final view of the adversary A. Output (V, {yi | i ∉ corrupt(A)}).
• IdealF,Sim(κ; {xi | i ∉ corrupt(Sim)}): run Sim until it outputs a set of inputs {xi | i ∈ corrupt(Sim)}. Compute (y1, ..., yn) ← F(x1, ..., xn). Then, give {yi | i ∈ corrupt(Sim)} to Sim.2 Let V* denote the final output of Sim (a set of simulated views). Output (V*, {yi | i ∉ corrupt(Sim)}).
A protocol is then secure against malicious adversaries if for every real-world adversary A there exists a simulator Sim with corrupt(Sim) = corrupt(A) such that, for all inputs of the honest parties, the distributions

Realπ,A(κ; {xi | i ∉ corrupt(A)})

and

IdealF,Sim(κ; {xi | i ∉ corrupt(Sim)})

are indistinguishable (in κ).
Note that the definition quantifies only over the inputs of honest parties {xi | i ∉ corrupt(A)}. The interaction Real does not consider the corrupt parties to have any inputs, and the inputs of the corrupt parties in Ideal are only determined indirectly (by the simulator's choice of what to send to F on the corrupt parties' behalf). While it would be possible to also define inputs for corrupt parties in the real world, such inputs would merely be "suggestions," since corrupt parties could choose to run the protocol on any other input (or behave in a way that is inconsistent with all inputs).
Reactive functionalities. In the ideal world, the interaction with the func-
tionality consists of just a single round: inputs followed by outputs. It is possible
to generalize the behavior of F so that it interacts with the parties over many
rounds of interaction, keeping its own private internal state between rounds.
Such functionalities are called reactive.
2 To be more formal, we can write the simulator Sim as a pair of algorithms Sim = (Sim1, Sim2) that capture this two-phase process. Sim1 (on input κ) outputs {xi | i ∈ corrupt(Sim)} and arbitrary internal state Σ. Then Sim2 takes input Σ and {yi | i ∈ corrupt(Sim)}, and gives output V*.
In this standard formulation, the adversary has control over output delivery to honest parties, and output fairness is not expected.
Adaptive corruption. We have defined both the real and ideal worlds so that
the identities of the corrupted parties are fixed throughout the entire interaction.
This provides what is known as security against static corruption. It is also
possible to consider scenarios where an adversary may choose which parties to
corrupt during the protocol execution, possibly based on what it learns during
the interaction. This behavior is known as adaptive corruption.
Security against adaptive corruption can be modeled in the real-ideal
paradigm, by allowing the adversary to issue a command of the form “corrupt
Pi ”. In the real world, this results in the adversary learning the current view
(including private randomness) of Pi and subsequently taking over control
of its protocol messages. In the ideal world, the simulator learns only the
inputs and outputs of the party upon corruption, and must use this information
to generate simulated views. Of course, the views of parties are correlated
(if Pi sends a message to P j , then that message is included in both parties’
views). The challenge of adaptive security is that the simulator must produce
views piece-by-piece. For example, the simulator may be asked to produce a
view of Pi when that party is corrupted. Any messages sent by P j to Pi must
be simulated without knowledge of P j ’s private input. Later, the simulator
might be asked to provide a view of P j (including its private randomness) that
“explains” its protocol messages as somehow consistent with whatever private
input it had.
In this work we consider only static corruption, following the vast majority
of work in this field.
2.4 Oblivious Transfer

Oblivious Transfer (OT) is an essential building block of MPC. In fact, OT is complete for MPC, as shown by Kilian (1988): given OT, one can build MPC without any additional assumptions, and, similarly, one can directly obtain OT from MPC.

Figure 2.1: The 1-out-of-2 Oblivious Transfer functionality F_OT. Parameters: a sender S with input secrets x0, x1, and a receiver R with input choice bit b ∈ {0,1}. Functionality: R receives xb; S receives ⊥ (nothing).
The standard definition of 1-out-of-2 OT involves two parties, a Sender S
holding two secrets x0, x1 and a receiver R holding a choice bit b ∈ {0, 1}. OT
is a protocol allowing R to obtain xb while learning nothing about the “other”
secret x1−b . At the same time, S does not learn anything at all. More formally:
Definition 2.5. A 1-out-of-2 OT is a cryptographic protocol securely imple-
menting the functionality F OT of Figure 2.1.
Many variants of OT may be considered. A natural variant is 1-out-of-k
OT, in which S holds k secrets, and R's input is a selector from {0, ..., k − 1}.
We discuss protocols for implementing OT efficiently in Section 3.7.
Commitment. A commitment scheme allows a sender to commit to a secret value, and reveal it at some later time to a receiver. The receiver should
learn nothing about the committed value before it is revealed by the sender (a
property referred to as hiding), while the sender should not be able to change
its choice of value after committing (the binding property).
Commitment is rather simple and inexpensive in the random oracle model. To commit to x, the sender chooses a random value r ∈R {0,1}^κ and publishes the value y = H(x ‖ r). To later reveal, the sender simply announces x and r.
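The scheme is short enough to state in code. Here is a Python sketch with SHA-256 standing in for the random oracle H (a heuristic instantiation, per the discussion in Section 2.2); appending the fixed-length r after x keeps the input encoding unambiguous.

import hashlib, secrets

KAPPA = 32  # bytes of commitment randomness

def commit(x):
    r = secrets.token_bytes(KAPPA)
    y = hashlib.sha256(x + r).digest()    # y = H(x || r) is published
    return y, r                           # r is kept secret until reveal

def verify(y, x, r):
    return hashlib.sha256(x + r).digest() == y

y, r = commit(b"my bid: 42")
assert verify(y, b"my bid: 42", r)        # reveal: announce x and r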
The real-ideal paradigm was first applied in the setting of MPC by Goldwasser
et al. (1985), for the special case of zero-knowledge. Shortly thereafter the
definition was generalized to arbitrary MPC by Goldreich et al. (1987). These
definitions contain the important features of the real-ideal paradigm, but
resulted in a notion of security (against malicious adversaries) that was not
preserved under composition. In other words, a protocol could be secure
according to these models when executed in isolation, but may be totally
insecure when two protocol instances are run concurrently.
The definition of security that we have sketched in this book follows the Universal Composability (UC) framework of Canetti (2001). Protocols proven secure in
the UC framework have the important composition property described in
Section 2.3.4, which in particular guarantees security of a protocol instance
no matter what other protocols are executing concurrently. While the UC
framework is the most popular model with this property, there are other models
with similar guarantees (Pfitzmann and Waidner, 2000; Hofheinz and Shoup,
2011). The details of all such security models are extensive and subtle. However,
a significantly simpler model is presented by Canetti et al. (2015), which is
equivalent to the full UC model for the vast majority of cases. Some of the
protocols we describe are secure in the random oracle model. Canetti et al.
(2014) describe how to incorporate random oracles into the UC framework.
Our focus in this book is on the most popular security notions — namely,
semi-honest security and malicious security. The literature contains many
variations on these security models, and some are a natural fit for real-world
applications. We discuss some alternative security models in Chapter 7.
3 Fundamental MPC Protocols

3.1 Yao's Garbled Circuits Protocol
Yao’s Garbled Circuits protocol (GC) is the most widely known and celebrated
MPC technique. It is usually seen as best-performing, and many of the protocols
we cover build on Yao’s GC. While not having the best known communication
complexity, it runs in constant rounds and avoids the costly latency associated
with approaches, such as GMW (described in Section 3.2), where the number
of communication rounds scales with the circuit depth.
3.1.1 GC Intuition
The main idea behind Yao’s GC approach is quite natural. Recall, we wish to
evaluate a given function F (x, y) where party P1 holds x ∈ X and P2 holds
y ∈ Y . Here X and Y are the respective domains for the inputs of P1 and P2 .
Function as a look-up table. First, let's consider a function F for which the input domain is small, so that we can efficiently enumerate all possible input pairs (x, y). The function F can then be represented as a look-up table T consisting of |X| · |Y| rows, with T_{x,y} = F(x,y). The output F(x,y) is obtained simply by retrieving the entry T_{x,y} from the corresponding row.
This gives us an alternative (and much simplified!) view of the task at hand: evaluating a look-up table. This can be done as follows. P1 encrypts T by assigning a randomly-chosen strong key to each possible input x and y. That is, for each x ∈ X and each y ∈ Y, P1 chooses k_x ∈R {0,1}^κ and k_y ∈R {0,1}^κ. It then encrypts each element T_{x,y} of T under both keys k_x and k_y, and sends the encrypted (and randomly permuted!) table of entries Enc_{k_x,k_y}(T_{x,y}) to P2.
Now our task is to enable P2 to decrypt (only) the entry T_{x,y} corresponding to the players' inputs. This is done by having P1 send to P2 the keys k_x and k_y. P1 knows its input x, and hence simply sends the key k_x to P2. The key k_y is sent to P2 using a 1-out-of-|Y| Oblivious Transfer (Section 2.4). Once P2 receives k_x and k_y, it can obtain the output F(x,y) by decrypting T_{x,y} using those keys. Importantly, no other information is obtained by P2, because P2 only has a single pair of keys, which can be used to open (decrypt) only a single table entry. We stress that, in particular, it is important that neither partial key, k_x or k_y, by itself can be used to obtain partial decryptions, or even to determine whether that partial key was used in a specific encryption.
For example, for an AND gate with input wires w_i, w_j and output wire w_t, where each wire u has labels k_u^0, k_u^1 encoding the values 0 and 1, the garbled table is

T_G = ( Enc_{k_i^0, k_j^0}(k_t^0),
        Enc_{k_i^0, k_j^1}(k_t^0),
        Enc_{k_i^1, k_j^0}(k_t^0),
        Enc_{k_i^1, k_j^1}(k_t^1) ).
Each cell of the look-up table encrypts the label corresponding to the
output computed by the gate. Crucially, this allows the evaluator P2 to obtain
the intermediate active labels on internal circuit wires and use them in the
evaluation of F under encryption without ever learning their semantic value.
P1 permutes the entries in each of the look-up tables (usually called garbled
tables or garbled gates), and sends all the tables to P2 . Additionally, P1 sends
(only) the active labels of all wires corresponding to the input values to P2 . For
input wires belonging to P1 ’s inputs to F , this is done simply by sending the
wire label keys. For wires belonging to P2 ’s inputs, this is done via 1-out-of-2
Oblivious Transfer.
Upon receiving the input keys and garbled tables, P2 proceeds with the
evaluation. As discussed above, P2 must be able to decrypt the correct row
of each garbled gate. This is achieved by the point-and-permute technique
described above. In our case of a 4-row garbled table, the point-and-permute
technique is particularly simple and efficient — one pointer bit is needed for
each input, so there are two total pointer bits added to each entry in the garbled
table. Ultimately, P2 completes evaluation of the garbled circuit and obtains
the keys corresponding to the output wires of the circuit. These could be sent
to P1 for decryption, thus completing the private evaluation of F .
We note that a round of communication may be saved, avoiding the sending of the output labels by P2 to P1 for decryption. This is done simply by P1 including the decoding tables for the output wires with the garbled circuit it sends. A decoding table simply maps each label on an output wire to its semantics (i.e., the corresponding plaintext value). Now P2, upon obtaining the output labels, looks them up in the decoding tables and obtains the output in plaintext.
At an intuitive level, at least, it is easy to see that this circuit-based
construction is secure in the semi-honest model. Security against a corrupt
P1 is easy, since (other than the OT, which we assume has been separately
shown to satisfy the OT security definition) that party receives no messages
in the protocol! For a corrupt P2 , security boils down to the observation that
the evaluator P2 never sees both labels for the same wire. This is obviously
true for the input wires, and it holds inductively for all intermediate wires
(knowing only one label on each incoming wire of the gate, the evaluator
can only decrypt one ciphertext of the garbled gate). Since P2 does not know
the correspondence between plaintext values and the wire labels, it has no
information about the plaintext values on the wires, except for the output wires
where the association between labels and values is explicitly provided by P1 .
To simulate P2's view, the simulator Sim_P2 chooses random active labels for each wire, simulates the three "inactive" ciphertexts of each garbled gate as
dummy ciphertexts, and produces decoding information that decodes the active
output wires to the function’s output.
Figure 3.1 formalizes Yao’s gate generation, and Figure 3.2 summarizes Yao’s
GC protocol. For simplicity of presentation, we describe the protocol variant
based on Random Oracle (defined in Section 2.2), even though a weaker
assumption (the existence of pseudo-random functions) is sufficient for Yao’s
GC construction. The Random Oracle, denoted by H, is used in implementing
garbled row encryption. We discuss different methods of instantiating H
in Section 4.1.4. The protocol also uses Oblivious Transfer, which requires
public-key cryptography.
For each wire label, a pointer bit, pi , is added to the wire label key
following the point-and-permute technique described in Section 3.1.1. The
pointer bits leak no information since they are selected randomly, but they
allow the evaluator to determine which row in the garbled table to decrypt,
based on the pointer bits for the two active wires it has for the inputs. In
Section 4.1 we discuss several ways for making Yao’s GC protocol more
efficient, including reducing the size of the garbled table to just two ciphertexts
per gate (Section 4.1.3) and enabling XOR gates to be computed without
encryption (Section 4.1.2).
Figure 3.1: GC generation.

Parameters: Boolean circuit C implementing function F; security parameter κ.

GC generation: For each gate, sort the entries e in its garbled table by the input pointers, placing entry e_{v_a,v_b} in position ⟨p_a^{v_a}, p_b^{v_b}⟩. For each output wire i of output gate j, with label keys k_i^v, set the output decoding entries

e_v = H(k_i^v ‖ "out" ‖ j) ⊕ v.

(Because we are xor-ing with a single bit, we just use the lowest bit of the output of H for generating the above e_v.) Sort the entries e in the decoding table by the pointers, placing entry e_v in position p_i^v. (There is no conflict, since p_i^1 = p_i^0 ⊕ 1.)
Figure 3.2: Yao's GC protocol.

1. P1 plays the role of GC generator and runs the algorithm of Figure 3.1. P1 then sends the obtained GC Ĉ (including the output decoding table) to P2.

2. P1 sends to P2 the active wire labels for the wires on which P1 provides input.

3. For each wire carrying a bit of P2's input, P1 and P2 run a 1-out-of-2 OT, with P1 as the sender and P2 as the receiver:

(a) P1's two input secrets are the two labels for the wire, and P2's choice-bit input is its input on that wire.

(b) Upon completion of the OT, P2 receives the active wire label on the wire.

4. P2 evaluates the received Ĉ gate-by-gate, starting with the active labels on the input wires.

(a) For gate G_i with garbled table T = (e_{0,0}, ..., e_{1,1}) and active input labels w_a = (k_a, p_a), w_b = (k_b, p_b), P2 computes the active output label w_c = (k_c, p_c):

w_c = H(k_a ‖ k_b ‖ i) ⊕ e_{p_a,p_b}.

5. Obtaining output using output decoding tables. Once all gates of Ĉ have been evaluated, using "out" for the second key to decode the final output tables, P2 obtains the plaintext output of the computation. P2 sends the obtained output to P1, and they both output it.
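To make the garbling and evaluation steps concrete, here is a toy Python sketch of a single AND gate in the style of Figures 3.1 and 3.2, with SHA-256 standing in for H and output decoding omitted. It is a sketch under these assumptions, not a hardened implementation.

import hashlib, secrets

KAPPA = 16  # key length in bytes (toy)

def H(ka, kb, gate_id):
    return hashlib.sha256(ka + kb + gate_id).digest()[:KAPPA + 1]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def garble_and(gate_id):
    # For each wire, sample two random keys with complementary pointer bits.
    labels = {}
    for w in 'abc':
        p0 = secrets.randbits(1)
        labels[w] = {0: (secrets.token_bytes(KAPPA), p0),
                     1: (secrets.token_bytes(KAPPA), p0 ^ 1)}
    table = [None] * 4
    for va in (0, 1):
        for vb in (0, 1):
            (ka, pa), (kb, pb) = labels['a'][va], labels['b'][vb]
            kc, pc = labels['c'][va & vb]   # AND-gate semantics
            # Point-and-permute: place each row at the slot named by the
            # input pointer bits, encrypted under both input labels.
            table[2 * pa + pb] = xor(H(ka, kb, gate_id), kc + bytes([pc]))
    return labels, table

def eval_gate(table, wa, wb, gate_id):
    (ka, pa), (kb, pb) = wa, wb
    plain = xor(H(ka, kb, gate_id), table[2 * pa + pb])
    return plain[:KAPPA], plain[KAPPA]      # the active output label

labels, table = garble_and(b"G1")
out = eval_gate(table, labels['a'][1], labels['b'][1], b"G1")
assert out == labels['c'][1]                # 1 AND 1 = 1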
3.2 Goldreich-Micali-Wigderson (GMW) Protocol

The GMW protocol can work both on Boolean and arithmetic circuits. We
present the two-party Boolean version first, and then briefly explain how the
protocol can be generalized to more than two parties. As with Yao’s protocol,
we assume players P1 with input x and P2 with input y have agreed on the
Boolean circuit C representing the computed function F (x, y).
The GMW protocol proceeds as follows. For each input bit xi ∈ {0, 1} of
x ∈ {0, 1} n , P1 generates a random bit ri ∈R {0, 1} and sends all ri to P2 . Next,
P1 obtains a secret sharing of each xi among P1 and P2 by setting its share to
be xi ⊕ ri . Symmetrically, P2 generates random bit masks for its inputs yi and
sends the masks to P1 , secret sharing its input similarly.
P1 and P2 proceed to evaluate C gate by gate. Consider a gate G with input wires w_i and w_j and output wire w_k. Each wire value v_x is split into two shares such that s_x^1 ⊕ s_x^2 = v_x. Let P1 hold shares s_i^1 and s_j^1 of wires w_i and w_j, and P2 hold shares s_i^2 and s_j^2 of the two wires. Without loss of generality, assume C consists of NOT, XOR and AND gates.
Both NOT and XOR gates can be evaluated without any interaction. A NOT gate is evaluated by P1 flipping its share of the wire value, which flips the shared wire value. An XOR gate on wires w_i and w_j is evaluated by the players xor-ing the shares they already hold. That is, P1 computes its output share as s_k^1 = s_i^1 ⊕ s_j^1, and P2 correspondingly computes its output share as s_k^2 = s_i^2 ⊕ s_j^2. The computed shares s_k^1, s_k^2 indeed form a sharing of the active output value:

s_k^1 ⊕ s_k^2 = (s_i^1 ⊕ s_j^1) ⊕ (s_i^2 ⊕ s_j^2) = (s_i^1 ⊕ s_i^2) ⊕ (s_j^1 ⊕ s_j^2) = v_i ⊕ v_j.
Evaluating an AND gate requires interaction and uses 1-out-of-4 OT as a basic primitive. From the point of view of P1, its shares s_i^1, s_j^1 are fixed, and P2 has two Boolean input shares, which means there are four possible options for P2. If P1 knew P2's shares, then evaluating the gate under encryption would be trivial: P1 could just reconstruct the active input values, compute the active output value, and secret-share it with P2. While P1 cannot do that, it can do the next best thing: prepare such a secret share for each of P2's possible inputs, and run a 1-out-of-4 OT to transfer the corresponding share. Specifically, let

S = S_{s_i^1, s_j^1}(s_i^2, s_j^2) = (s_i^1 ⊕ s_i^2) ∧ (s_j^1 ⊕ s_j^2)

be the function computing the gate output value from the shared secrets on the two input wires. P1 chooses a random mask bit r ∈R {0,1} and prepares a
table of OT secrets:

T_G = ( r ⊕ S(0,0),
        r ⊕ S(0,1),
        r ⊕ S(1,0),
        r ⊕ S(1,1) )
Then P1 and P2 run a 1-out-of-4 OT protocol, where P1 plays the role of the sender and P2 plays the role of the receiver. P1 uses the four table rows as its input secrets, and P2 uses its two bit shares as the selection index to choose the corresponding row. P1 keeps r as its share of the gate output wire value, and P2 uses the value it receives from the OT execution as its share.
Because of the way the OT inputs are constructed, the players obtain a
secret sharing of the gate output wire. At the same time, it is intuitively clear
that the players haven’t learned anything about the other player’s inputs or the
intermediate values of the computation. This is because effectively only P2
receives messages, and by the OT guarantee, it learns nothing about the three
OT secrets it did not select. The only thing it learns is its OT output, which
is its share of a random sharing of the output value and therefore leaks no
information about the plaintext value on that wire. Likewise, P1 learns nothing
about the selection of P2 .
After evaluating all gates, players reveal to each other the shares of the
output wires to obtain the output of the computation.
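The AND-gate interaction is easy to express in code. The following Python sketch runs both parties locally and uses a trivially insecure stand-in for the 1-out-of-4 OT, so it demonstrates only the share arithmetic, not the security of the protocol.

import secrets

def gmw_and(s1_i, s1_j, s2_i, s2_j, ot_1_of_4):
    # P1 holds shares s1_i, s1_j; P2 holds s2_i, s2_j of the wire values
    # v_i = s1_i ^ s2_i and v_j = s1_j ^ s2_j.
    r = secrets.randbits(1)  # P1's random share of the output wire
    # Row (a, b) is P2's output share in case P2's shares are (a, b):
    # r XOR S(a, b), with S as defined in the text.
    table = [r ^ ((s1_i ^ a) & (s1_j ^ b)) for a in (0, 1) for b in (0, 1)]
    s2_k = ot_1_of_4(table, 2 * s2_i + s2_j)  # P2 selects with its shares
    return r, s2_k

insecure_ot = lambda table, choice: table[choice]  # stand-in for real OT
for v_i in (0, 1):
    for v_j in (0, 1):
        s1_i, s1_j = secrets.randbits(1), secrets.randbits(1)
        s2_i, s2_j = v_i ^ s1_i, v_j ^ s1_j
        s1_k, s2_k = gmw_and(s1_i, s1_j, s2_i, s2_j, insecure_ot)
        assert s1_k ^ s2_k == v_i & v_j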
The generalization of GMW to more than two parties is natural: each wire value is shared among all n players, with the wire value equal to the XOR of all shares. Gates are evaluated as follows:

• For an XOR gate, the players locally add (XOR) their shares. As in the two-party case, no interaction is required, and correctness and security are assured.
• For an AND gate c = a ∧ b, let a1, ..., an, b1, ..., bn denote the shares of a and b held by the players. Observe that

c = a ∧ b = (a1 ⊕ · · · ⊕ an) ∧ (b1 ⊕ · · · ⊕ bn)
          = (⊕_{i=1}^{n} a_i ∧ b_i) ⊕ (⊕_{i≠j} a_i ∧ b_j).

Each player P_j computes a_j ∧ b_j locally to obtain a share of ⊕_{i=1}^{n} a_i ∧ b_i. Further, each pair of players P_i, P_j jointly computes shares of a_i ∧ b_j as described above for two-party GMW. Finally, each player outputs the XOR of all obtained shares as its share of the result a ∧ b.
3.3 BGW Protocol

One of the first multi-party protocols for secure computation is due to Ben-Or, Goldwasser, and Wigderson (Ben-Or et al., 1988), and is known as the "BGW" protocol. A somewhat similar protocol of Chaum, Crépeau, and Damgård was published concurrently (Chaum et al., 1988), and the two protocols are often considered together. For concreteness, we present here the BGW protocol for n parties, which is somewhat simpler.
The BGW protocol can be used to evaluate an arithmetic circuit over a
field F, consisting of addition, multiplication, and multiplication-by-constant
gates. The protocol is heavily based on Shamir secret sharing (Shamir, 1979),
and it uses the fact that Shamir secret shares are homomorphic in a special
way—the underlying shared value can be manipulated obliviously, by suitable
manipulations to the individual shares.
For v ∈ F we write [v] to denote that the parties hold Shamir secret shares
of a value v. More specifically, a dealer chooses a random polynomial p of
degree at most t, such that p(0) = v. Each party Pi then holds value p(i) as
their share. We refer to t as the threshold of the sharing, so that any collection
of t shares reveals no information about v.
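For concreteness, the following toy Python sketch implements Shamir sharing and Lagrange reconstruction over a prime field; the modulus and parameters are illustrative.

import secrets

P = 2**61 - 1  # a Mersenne prime, used as the field modulus

def share(v, t, n):
    # Random polynomial of degree at most t with p(0) = v; party i gets p(i).
    coeffs = [v] + [secrets.randbelow(P) for _ in range(t)]
    def poly(x):
        y = 0
        for c in reversed(coeffs):   # Horner evaluation
            y = (y * x + c) % P
        return y
    return {i: poly(i) for i in range(1, n + 1)}

def reconstruct(shares):
    # Lagrange interpolation of p(0) from at least t+1 points {i: p(i)}.
    v = 0
    for i, yi in shares.items():
        num, den = 1, 1
        for j in shares:
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        v = (v + yi * num * pow(den, -1, P)) % P
    return v

s = share(42, t=2, n=5)
assert reconstruct({i: s[i] for i in (1, 3, 5)}) == 42  # any 3 shares work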
The invariant of the BGW protocol is that for every wire w in the arithmetic
circuit, the parties hold a secret-sharing [vw ] of the value vw on that wire. Next,
we sketch the protocol with a focus on maintaining this invariant.
Input wires. For an input wire belonging to party Pi , that party knows the
value v on that wire in the clear, and distributes shares of [v] to all the parties.
Addition gate. Consider an addition gate with input wires α, β and output wire γ. The parties collectively hold sharings [vα] and [vβ] of the incoming wires, and the goal is to obtain a sharing [vα + vβ]. Suppose the incoming sharings correspond to polynomials pα and pβ, respectively. If each party Pi locally adds its shares pα(i) + pβ(i), then each party holds a point on the polynomial pγ(x) = pα(x) + pβ(x). Since pγ also has degree at most t and satisfies pγ(0) = vα + vβ, these points form a valid sharing of [vα + vβ], and no interaction is needed.

Multiplication gate. Consider a multiplication gate with input wires α, β and output wire γ. If each party Pi locally multiplies its shares pα(i) · pβ(i), it obtains a point on the polynomial q(x) = pα(x) · pβ(x), which satisfies q(0) = vα · vβ but may have degree as high as 2t. To reduce the degree, each party Pi re-shares its value q(i) among the parties with threshold t; the parties then locally compute the appropriate linear (Lagrange) combination of the received shares to obtain a degree-t sharing of q(0) = vα · vβ. Since the values [q(i)] were shared with threshold t, the final sharing of [q(0)] also has threshold t, as desired.
Note that multiplication gates in the BGW protocol require communica-
tion/interaction, in the form of parties sending shares of [q(i)]. Note also that
we require 2t + 1 ≤ n, since otherwise the n parties do not collectively have
enough information to determine the value q(0), as q may have degree 2t. For
that reason, the BGW protocol is secure against t corrupt parties, for 2t < n
(i.e., an honest majority).
Output wires. For an output wire α, the parties will eventually hold shares
of the value [vα ] on that wire. Each party can simply broadcast its share of this
value, so that all parties can learn vα .
3.4 MPC From Preprocessed Multiplication Triples

A Beaver multiplication triple is a one-time-use triple of secret-shared values [a], [b], [c], where a and b are random elements of the field, and c = ab. In an offline phase, such Beaver triples can be generated in a variety of ways, for example by simply running the BGW multiplication subprotocol on random inputs. One Beaver triple is then "consumed" for each multiplication gate in the eventual protocol.
Consider a multiplication gate with input wires α, β. The parties hold secret sharings [vα] and [vβ]. To multiply vα and vβ using a Beaver triple [a], [b], [c], the parties first locally compute sharings of [vα − a] and [vβ − b], and then open (publicly reconstruct) the values d = vα − a and e = vβ − b. Since a and b are uniform and used only once, d and e reveal nothing about vα and vβ.4 Now observe:

vα vβ = (vα − a + a)(vβ − b + b)
      = (d + a)(e + b)
      = de + db + ae + ab
      = de + db + ae + c.

Since d and e are public, and the parties hold sharings [a], [b], [c], they can compute a sharing of [vα vβ] by local computation only:

[vα vβ] = de + d[b] + e[a] + [c],

where the public constant de is incorporated using the standard technique for adding a constant to a shared value.
Using this technique, a multiplication can be performed using only two openings
plus local computation. Overall, each party must broadcast two field elements
per multiplication, compared to n field elements (across private channels) in
the plain BGW protocol. While this comparison ignores the cost of generating
the Beaver triples in the first place, there are methods for generating triples in
a batch where the amortized cost of each triple is a constant number of field
elements per party (Beerliová-Trubíniová and Hirt, 2008).
4 Since a is essentially used as a one-time pad (and similarly b below), this triple [a], [b], [c] cannot be reused in a different multiplication gate.
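The following Python sketch demonstrates triple-based multiplication, using additive shares modulo a prime for simplicity (the same pattern applies to Shamir shares). A dealer function stands in for the offline triple-generation phase.

import secrets

P = 2**61 - 1

def additive_share(v, n):
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    return shares + [(v - sum(shares)) % P]

def open_value(shares):            # broadcast and sum all shares
    return sum(shares) % P

def beaver_multiply(x_sh, y_sh, a_sh, b_sh, c_sh, n):
    # Open d = x - a and e = y - b (these reveal nothing about x, y).
    d = open_value([(x_sh[i] - a_sh[i]) % P for i in range(n)])
    e = open_value([(y_sh[i] - b_sh[i]) % P for i in range(n)])
    # Locally compute [xy] = de + d[b] + e[a] + [c]; the public constant
    # de is added by party 0 alone.
    return [((d * e if i == 0 else 0) + d * b_sh[i] + e * a_sh[i] + c_sh[i]) % P
            for i in range(n)]

n, x, y = 3, 7, 9
a, b = secrets.randbelow(P), secrets.randbelow(P)
triple = (additive_share(a, n), additive_share(b, n), additive_share(a * b % P, n))
z_sh = beaver_multiply(additive_share(x, n), additive_share(y, n), *triple, n)
assert open_value(z_sh) == (x * y) % P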
3.5 Constant-Round Multi-Party Computation: BMR

The BMR protocol (Beaver et al., 1990) adapts the main idea of Yao's GC to a multi-party setting.
GC is chosen as a starting point due to its round-efficiency. However, a naïve
attempt to port the GC protocol from the 2PC into the MPC setting gets stuck
at the stage of sending the generated GC to the evaluators. Indeed, the circuit
generator knows all the secrets (wire label correspondences), and if it colludes
with any of the evaluators, the two colluding parties can learn the intermediate
wire values and violate the security guarantees of the protocol.
The basic BMR idea is to perform a distributed GC generation, so that no
single party (or even a proper subset of all parties) knows the GC generation
secrets – the label assignment and correspondence. This GC generation can be
done in parallel for all gates using MPC. This is possible by first generating
(in parallel) all wire labels independently, and then independently and in
parallel generating garbled gate tables. Because of parallel processing for all
gates/wires, the GC generation is independent of the depth of the computed
circuit C. As a result, the GC generation circuit C_GEN is constant-depth for all computed circuits C (once the security parameter κ is fixed). Even if the parties evaluate C_GEN using an MPC protocol whose round complexity depends on the depth of C_GEN, that depth is constant, so the overall BMR protocol still runs in a constant number of rounds.
The MPC output, the GC produced by securely evaluating C_GEN, may be delivered to a designated player, say P1, who will then evaluate it similarly to Yao's GC. The final technicality here is how to deliver the active input labels to P1; there are several ways this may be achieved.

3.6 Information-Theoretic Garbled Circuits

Yao's GC and the GMW protocol present two different flavors of the use of
secret sharing in MPC. In this section, we discuss a third flavor, where the
secrets are shared not among players, but among wires. This construction
is also interesting because it provides information-theoretic security in the
OT-hybrid setting, meaning that no computational hardness assumptions are
used in the protocol beyond what is used in the underlying OT. An important
practical reason to consider IT GC is that it presents a trade-off between
communication bandwidth and latency: it needs to send less data than Yao
GC at the cost of additional communication rounds. While most research on
practical MPC focuses on low round complexity, we believe some problems
which require very wide circuits, such as those that arise in machine learning,
may benefit from IT GC constructions.
Information-theoretic constructions typically provide stronger security
at a higher cost. Surprisingly, this is not the case here. Intuitively, higher
performance is obtained because information-theoretic encryption allows the
encryption of a bit to be a single bit rather than a ciphertext whose length
scales with the security parameter. Further, information-theoretic encryption
here is done with bitwise XOR and bit shufflings, rather than with standard
primitives such as AES.
We present the Gate Evaluation Secret Sharing (GESS) scheme of
Kolesnikov (2005) (Kolesnikov (2006) provides details), which is the most
efficient information-theoretic analog of GC. The main result of Kolesnikov
(2005) is a two-party protocol for a Boolean formula F with communication complexity ≈ Σ_i d_i², where d_i is the depth of the i-th leaf of F.
At a high level, GESS is a secret-sharing scheme, designed to allow
evaluation under encryption of a Boolean gate G. The output wire labels
of G are the two secrets from which P1 produces four secret shares, one
corresponding to each of the wire labels of the two input wires. GESS
guarantees that a valid combination of shares (one share per wire) can be used
to reconstruct the corresponding label of the output wire. This is similar to
Yao’s GC, but GESS does not require the use of garbled tables, and hence can
be viewed as a generalization of Yao’s GC. Similarly to Yao’s GC approach,
the secret sharing can be applied gate-by-gate without the need to decode or
reconstruct the plaintext values.
We present GESS for the 1-to-1 gate function G : {0,1}² → {00, 01, 10, 11}, where G(0,0) = 00, G(0,1) = 01, G(1,0) = 10, G(1,1) = 11. Clearly, this is a generalization of the Boolean gate functionality G : {0,1}² → {0,1}. Let the secrets domain be D_S = {0,1}^n, and let four (not necessarily distinct) secrets s_00, ..., s_11 ∈ D_S be given, where the secret s_ij corresponds to the value G(i,j) of the output wire.
The intuition for the design of the GESS scheme is as follows (see the illustration in Figure 3.4). We first randomly choose two strings R0, R1 ∈R D_S to be the shares sh_1^0 and sh_1^1 (corresponding to values 0 and 1 of the first input wire). Now consider sh_2^0, the share corresponding to value 0 of the second input wire. We want this share to produce either s_00 (when combined with sh_1^0) or s_10 (when combined with sh_1^1). Thus, the share sh_2^0 will consist of two blocks. One, the block s_00 ⊕ R0, is designed to be combined with R0 to reconstruct s_00. The other, s_10 ⊕ R1, is designed to be combined with R1 to reconstruct s_10. The share sh_2^1 is constructed similarly, setting its blocks to be s_01 ⊕ R0 and s_11 ⊕ R1.
Both leftmost blocks are designed to be combined with the same share R0, and both rightmost blocks are designed to be combined with the same share R1. Therefore, we append a 0 to R0 to tell the reconstruction procedure Rec to use the left block of the second share, and append a 1 to R1 to tell Rec to use the right block of the second share. Finally, to hide the information leaked by the order of blocks in the shares, we randomly choose a bit b; if b = 1, we reverse the order of the blocks in both shares of wire 2 and invert the appended pointer bits of the shares of wire 1. Secret reconstruction proceeds by xor-ing the wire-1 share (excluding the pointer bit) with the half of the wire-2 share indicated by the pointer bit.
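The intuition above fits in a few lines of code. The following Python sketch (a simplification for a single gate, not Kolesnikov's full scheme) produces the four shares and checks that every valid combination reconstructs the correct output secret.

import secrets

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def gess_share(s, n):
    # s[(i, j)] is the n-byte output secret for input values (i, j).
    R0, R1 = secrets.token_bytes(n), secrets.token_bytes(n)
    blocks0 = [xor(s[(0, 0)], R0), xor(s[(1, 0)], R1)]  # wire-2 share for 0
    blocks1 = [xor(s[(0, 1)], R0), xor(s[(1, 1)], R1)]  # wire-2 share for 1
    b = secrets.randbits(1)        # random bit hides the block order
    if b:
        blocks0.reverse()
        blocks1.reverse()
    sh1 = {0: (R0, 0 ^ b), 1: (R1, 1 ^ b)}   # wire-1 shares with pointers
    sh2 = {0: b"".join(blocks0), 1: b"".join(blocks1)}
    return sh1, sh2

def gess_reconstruct(sh1, sh2, n):
    key, ptr = sh1                 # pointer selects the half to combine
    return xor(key, sh2[ptr * n:(ptr + 1) * n])

n = 16
s = {(i, j): secrets.token_bytes(n) for i in (0, 1) for j in (0, 1)}
sh1, sh2 = gess_share(s, n)
for i in (0, 1):
    for j in (0, 1):
        assert gess_reconstruct(sh1[i], sh2[j], n) == s[(i, j)]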
Note the inefficiency of the above construction: the shares corresponding to the second input wire are double the size of the gate's secrets. While in some circuits we can avoid the exponential (in depth) secret growth by directing the greater growth toward the shallower parts of the circuit, a more efficient solution is desirable. We discuss only AND and OR gates, since NOT gates are implemented simply by flipping the wire-label semantics by the generator. GESS also enables XOR gates without any increase in share sizes. We defer discussion of this to Section 4.1.2, because the XOR sharing in GESS led to an important related improvement for Yao's GC.
For OR and AND gates in the above construction, either the left or the right blocks of the two shares are equal (this is because s_00 = s_01 for the AND gate, and s_10 = s_11 for the OR gate). We use this property to reduce the size of the shares when the secrets are of the above form. The key idea is to view the shares of the second wire as being the same, except for one block.
Suppose each of the four secrets consists of n blocks, and the secrets differ only in the j-th block, as follows:

s_00 = ( t_1 ... t_{j−1} t_j^00 t_{j+1} ... t_n ),
...
s_11 = ( t_1 ... t_{j−1} t_j^11 t_{j+1} ... t_n ).

It is convenient to consider the columns of blocks spanning across the secrets. Every column (with the exception of the j-th) consists of four equal blocks, and the value of j is private.
For simplicity, we show the main ideas by considering a special case where the four secrets consist of n = 3 blocks each, and j = 2 is the index of the column of distinct blocks. This intuition is illustrated in Figure 3.5. The scheme naturally generalizes from this intuition; Kolesnikov (2005) provides a formal presentation.
The idea is to share the secrets "column-wise," treating each of the three columns of blocks as a tuple of subsecrets and sharing each tuple separately, producing the corresponding subshares. Consider sharing column 1. All four subsecrets are equal (to t_1), and we share them trivially by setting both subshares of the first wire to a random string R1 ∈R D_S, and both subshares of the second wire to R1 ⊕ t_1. Column 3 is shared similarly. We share column 2 as in the previous construction (highlighted in the diagram), omitting the last step of appending the pointers and applying the permutation. This preliminary assignment of shares (which still leaks information due to the order of blocks) is shown in Figure 3.5.
Note that the reconstruction of secrets is done by xor-ing the corresponding blocks of the shares, and, importantly, the procedure is the same for both types of sharing we use. For example, given shares sh_1^0 and sh_2^1, we reconstruct the secret

s_01 = ( R1 ⊕ (R1 ⊕ t_1), R2 ⊕ (R2 ⊕ t_2^01), R3 ⊕ (R3 ⊕ t_3) ).
The remaining point-and-permute step is to apply (the same) random permutation π to reorder the four columns of both shares of wire 2, and to append (log 4)-bit pointers to each block of the shares of wire 1, telling the reconstructor which block of the second share to use. Note that the pointers appended to both blocks of column 1 of wire 1 are the same, and the same holds for column 3.
3.7 Oblivious Transfer

Figure 3.6: Public-key-based semi-honest OT. Parameters: a sender S with input secrets x0, x1, and a receiver R with input choice bit b. Protocol:

1. R generates a public-private key pair (sk, pk), and samples a random key pk′ from the public-key space (without knowledge of a corresponding secret key). If b = 0, R sends the pair (pk, pk′) to S; otherwise (if b = 1), R sends the pair (pk′, pk) to S.

2. S receives (pk_0, pk_1) and sends back the two encryptions e_0 = Enc_{pk_0}(x0) and e_1 = Enc_{pk_1}(x1).

3. R decrypts e_b using sk and obtains x_b.
We start with a basic public-key-based OT protocol, secure in the semi-honest model. The construction, presented in Figure 3.6, is very simple indeed.
The security of the construction assumes the existence of public-key
encryption with the ability to sample a random public key without obtaining
the corresponding secret key. The scheme is secure in the semi-honest model.
The Sender S only sees the two public keys sent by R, so cannot predict with
probability better than 12 which key was generated without the knowledge of
the secret key. Hence, the view of S can be simulated simply by sending two
randomly-chosen public keys.
The Receiver R sees two encryptions and has a secret key to decrypt only one of them. The view of R is also easily simulated, given R's input and output: the simulator Sim_R generates the public-private key pair and a random public key, and sets the simulated received ciphertexts to be (1) the encryption of the received secret under the generated key pair, and (2) the encryption of zero under the randomly chosen key. The simulation goes through since the difference from the real execution lies only in the second encryption, and a distinguisher cannot tell apart the encryption of zero from the encryption of another value, by the security guarantees of the encryption scheme. Note that this semi-honest protocol provides no security against a malicious receiver: R can simply generate two public-private key pairs (sk_0, pk_0) and (sk_1, pk_1), send (pk_0, pk_1) to S, and decrypt both received ciphertexts to learn both x0 and x1.
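Here is a toy Python sketch of the Figure 3.6 protocol, instantiated with ElGamal over Z*_p, where a random public key can be sampled as a random group element with no known secret key. The parameters are illustrative and do not constitute a secure instantiation.

import secrets

p = 2**127 - 1   # a Mersenne prime (toy group modulus)
g = 3            # toy generator

def keygen():
    sk = secrets.randbelow(p - 1)
    return sk, pow(g, sk, p)

def enc(pk, m):
    k = secrets.randbelow(p - 1)
    return pow(g, k, p), m * pow(pk, k, p) % p

def dec(sk, ct):
    c1, c2 = ct
    return c2 * pow(pow(c1, sk, p), -1, p) % p

# Receiver R, with choice bit b, sends two keys; only one has a known sk.
b = 1
sk, pk = keygen()
pk_rand = secrets.randbelow(p - 2) + 2      # random element, no known sk
keys = (pk, pk_rand) if b == 0 else (pk_rand, pk)

# Sender S encrypts each secret under the corresponding received key.
x0, x1 = 1234, 5678
cts = (enc(keys[0], x0), enc(keys[1], x1))

# R can decrypt only the ciphertext under its real key.
assert dec(sk, cts[b]) == (x0 if b == 0 else x1)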
The simple protocol in Figure 3.6 requires one public-key operation by both the sender and the receiver for each selection bit. In a Boolean circuit-based MPC protocol such as Yao's GC, it is necessary to perform an OT for each input bit of the party evaluating the circuit; for protocols like GMW, evaluating each AND gate requires an OT. Hence, several works have focused on reducing the number of public-key operations needed to perform a large number of OTs.
The functionality F expands G(r) and unmasks the input b ⊕ G(r), obtaining the
selection string b. Then F simply outputs to R the corresponding secrets x_{b_i}.
Only κ input bits are provided by R, the circuit evaluator, so only κ OTs are
needed to perform the m OTs.
q_j = t_j ⊕ [C(r_j) · s]      (3.3)

where “·” now denotes bitwise-AND of two strings of length k. (Note that
when C is a repetition code, this is exactly Equation 3.2.)

For each value r′ ∈ {0, 1}^ℓ, the sender associates the secret value
H(q_j ⊕ [C(r′) · s]), which it can compute for all r′ ∈ {0, 1}^ℓ. At the same time, the
receiver can compute one of these values, H(t_j). Rearranging Equation 3.3,
we have:

H(t_j) = H(q_j ⊕ [C(r_j) · s])
5 As pointed out by Ishai et al. (2003), it is sufficient to assume that H is a correlation-robust
hash function, a weaker assumption than RO. A special assumption is required because the
same s is used for every resulting OT instance.
Hence, the value that the receiver can learn is the secret value that the sender
associates with the receiver’s choice string r′ = r_j.
At this point, OT of random strings is complete. For OT of chosen strings,
the sender will use each H(q_j ⊕ [C(r) · s]) as a key to encrypt the r-th OT
message. The receiver will be able to decrypt only one of these encryptions,
namely the one corresponding to its choice string r_j.
To argue that the receiver learns only one string, suppose the receiver has
choice string r_j but tries to also learn the secret H(q_j ⊕ [C(r̃) · s]) corresponding
to a different choice r̃. We observe:

q_j ⊕ [C(r̃) · s] = t_j ⊕ [C(r_j) · s] ⊕ [C(r̃) · s]
              = t_j ⊕ [(C(r_j) ⊕ C(r̃)) · s]

Importantly, everything in this expression is known to the receiver except
for s. Now suppose the minimum distance of C is κ (the security parameter).
Then C(r_j) ⊕ C(r̃) has Hamming weight at least κ. Intuitively, the adversary
would have to guess at least κ bits of the secret s in order to violate security.
The protocol is secure in the RO model, and can also be proven under the
weaker assumption of correlation robustness, following Ishai et al. (2003) and
Kolesnikov and Kumaresan (2013).
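The correlation of Equation 3.3 is simple enough to check numerically. The
sketch below does so for the repetition code (the IKNP case), with κ-bit strings
represented as Python integers; all names and parameters are illustrative only.

import secrets

KAPPA = 128
MASK = (1 << KAPPA) - 1

def C(bit):                  # repetition code: 0 -> 00...0, 1 -> 11...1
    return MASK if bit else 0

s = secrets.randbits(KAPPA)  # sender's global secret
t = secrets.randbits(KAPPA)  # receiver's row t_j
r = 1                        # receiver's choice bit r_j
q = t ^ (C(r) & s)           # sender's row: q_j = t_j xor [C(r_j) . s]

# The sender can derive a candidate value for every choice r'; the
# receiver knows only the one matching its own choice:
assert q ^ (C(r) & s) == t          # so H(t_j) = H(q_j xor [C(r_j) . s])
assert q ^ (C(1 - r) & s) == t ^ s  # the other candidate is offset by s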
Finally, we remark that the width k of the OT extension matrix is equal to
the length of codewords in C. The parameter k determines the number of base
OTs and the overall cost of the protocol.
The IKNP protocol sets the number of OT matrix columns to be k = κ. To
achieve the same concrete security as IKNP OT, the KK13 protocol (Kolesnikov
and Kumaresan, 2013) requires setting k = 2κ, to account for the larger space
required by the more efficient underlying code C.
All of the secure computation protocols discussed so far in this chapter are
generic circuit-based protocols. Circuit-based protocols suffer from a bandwidth
cost that is linear in the size of the circuit, which can be prohibitive for large
computations. There are significant overheads with circuit-based computation
on large data structures, compared to, say, a RAM (Random Access Machine)
representation. In Chapter 5 we discuss approaches for incorporating sublinear
data structures into generic circuit-based protocols.
More efficient OPRF from 1-out-of-∞ OT. Kolesnikov et al. (2016) devel-
oped an efficient OPRF construction for the PSI protocol, by pushing further on
the coding idea from Section 3.7.2. The main technical observation is
that the code C need not have many of the properties of error-correcting codes.
The resulting pseudorandom codes enable a 1-out-of-∞ OT, which can be
used to produce an efficient PSI.
In particular,
1. it makes no use of decoding, thus the code does not need to be efficiently
decodable, and
2. it requires only that for all possible values r, r′, the value C(r) ⊕ C(r′) has
Hamming weight at least equal to the computational security parameter
κ. In fact, it is sufficient even if the Hamming distance guarantee is only
probabilistic — i.e., it holds with overwhelming probability over the choice
of C (we discuss subtleties below).
As discussed above, the receiver is only able to compute H(t_j) = H(q_j ⊕ [C(r) · s])
— the secret corresponding to its choice string r. The property of the PRC is
that, with overwhelming probability, all other values q_j ⊕ [C(r̃) · s] (that a
polynomial-time player may ever query) differ from t_j in a way that would require the
receiver to guess at least κ bits of s.
Indeed, we can view the functionality achieved by the above 1-out-of-∞
OT as a kind of OPRF. Intuitively, r ↦ H(q ⊕ [C(r) · s]) is a function that
the sender can evaluate on any input, whose outputs are pseudorandom, and
which the receiver can evaluate only on its chosen input r.
The main subtleties in viewing 1-out-of-∞ OT as OPRF are:
1. the fact that the receiver learns slightly more than the output of this
“PRF” — in particular, the receiver learns t = q ⊕ [C(r) · s] rather than
H(t); and,
2. the fact that the protocol realizes many instances of this “PRF” but with
related keys — s and C are shared among all instances.
Kolesnikov et al. (2016) show that this construction can be securely used in
place of the OPRF in the PSSZ protocol, and can scale to support private
intersections of sets of n = 2^20 elements (of any size) over a wide area
network in under 7 seconds.
Set intersection of multiple sets can be computed iteratively by computing
pairwise intersections. However, extending the above 2PC PSI protocol to the
multi-party setting is not immediate. Several obstacles need to be overcome,
such as the fact that in 2PC computation one player learns the set intersection of
the two input sets. In the multi-party setting this information must be protected.
Efficient extension of the above PSI protocol to the multi-party setting was
proposed by Kolesnikov et al. (2017a).
(2004) provides a cleaner and more detailed presentation. The BGW and
CCD protocols were developed concurrently by Ben-Or et al. (1988) and
Chaum et al. (1988). Beaver et al. (1990) considered constant-round multiparty
protocols. A more detailed protocol presentation and discussion can be found
in Phillip Rogaway’s Ph.D. thesis (Rogaway, 1991).
Recently, a visual cryptography scheme for secure computation without
computers was designed based on the GESS scheme (D’Arco and De Prisco,
2014; D’Arco and De Prisco, 2016). The OT extension of Ishai et al. (2003)
is indeed one of the most important advances in MPC, and there are several
extensions. Kolesnikov and Kumaresan (2013) and Kolesnikov et al. (2016)
propose random 1-out-of-n OT and 1-out-of-∞ OT at a cost similar to that of
1-out-of-2 OT. The above schemes are in the semi-honest model; maliciously-
secure OT extensions were proposed by Asharov et al. (2015b) and Keller et al.
(2015) (the latter is usually seen as the simpler and more efficient of the two).
Custom PSI protocols have been explored in many different settings
with different computation vs. communication costs and a variety of trust
assumptions. Hazay and Lindell (2008) presented a simple and efficient private
set intersection protocol that assumes one party would perform computations
using a trusted smartcard. Kamara et al. (2014) present a server-aided private
set intersection protocol which, in the case of a semi-honest server, computes
the private set intersection of billion-element sets in about 580 seconds while
sending about 12.4 GB of data. This is an example of asymmetric trust, which
we discuss further in Section 7.2.
There has been much research on custom protocols beyond PSI, but it is
surprisingly rare to find custom protocols that substantially outperform fast
generic MPC implementations of the same problem.
4
Implementation Techniques
The main costs of executing a garbled circuits protocol are the bandwidth
required to transmit the garbled gates and the computation required to generate
and evaluate the garbled tables. In a typical setting (LAN or WAN, with
moderate computing resources such as a smartphone or a laptop), bandwidth is
the main cost of executing GC protocols. There have been many improvements
to the traditional garbling method introduced in Section 3.1.2; we survey
the most significant ones next. Table 4.1 summarizes the impact of garbling
improvements on the bandwidth and computation required to generate and
evaluate a garbled gate. We described point-and-permute in Section 3.1.1; the
other techniques are described in the next subsections.
Technique                                  size (XOR / AND)   calls to H (XOR / AND)
classical                                   4  /  4            4    /  4
point-and-permute (1990) (§3.1.1)           4  /  4            4, 1 /  4, 1
row reduction (GRR3) (1999) (§4.1.1)        3  /  3            4, 1 /  4, 1
FreeXOR + GRR3 (2008) (§4.1.2)              0  /  3            0    /  4, 1
half gates (2015) (§4.1.3)                  0  /  2            0    /  4, 2

Table 4.1: Garbling techniques (based on Zahur et al. (2015)). Size is the number of “ciphertexts”
(multiples of κ bits) transmitted per gate. Calls to H is the number of evaluations of H needed
to evaluate each gate. When the number is different for the generator and evaluator, the numbers
shown are the generator calls, evaluator calls.
4.1.1 Garbled Row Reduction
Naor et al. (1999) introduced garbled row reduction (GRR) as a way to reduce
the number of ciphertexts transmitted per gate. The key insight is that it is
not necessary for each ciphertext to be an (unpredictable) encryption of a
wire label. Indeed, one of the entries in each garbled table can be fixed to a
predetermined value (say 0^κ), and hence need not be transmitted at all. For
example, consider the garbled table below, where a and b are the input wires,
and c is the output:
H(a_1 ∥ b_0) ⊕ c_0
H(a_0 ∥ b_0) ⊕ c_0
H(a_1 ∥ b_1) ⊕ c_1
H(a_0 ∥ b_1) ⊕ c_0
Since c_0 and c_1 are just arbitrary wire labels, we can select c_0 = H(a_1 ∥ b_0).
Thus, one of the four ciphertexts in each gate (say, the first one when the table is
sorted according to the point-and-permute order) will always be the all-zeroes string
and does not need to be sent. We call this method GRR3 since only three
ciphertexts need to be transmitted for each gate.
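The following minimal sketch, with H instantiated (as an assumption, for
illustration) by truncated SHA-256, shows how fixing c_0 = H(a_1 ∥ b_0) forces
one table entry to be the all-zeroes string; point-and-permute ordering and
pointer bits are omitted for brevity.

import os, hashlib

KAPPA = 16  # label length in bytes (128 bits)

def H(*labels):
    return hashlib.sha256(b"".join(labels)).digest()[:KAPPA]

def xor(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

# Input wire labels (a0/a1 for wire a, b0/b1 for wire b).
a0, a1 = os.urandom(KAPPA), os.urandom(KAPPA)
b0, b1 = os.urandom(KAPPA), os.urandom(KAPPA)

# GRR3: fix the output label c0 so one table entry becomes all zeros.
c0 = H(a1, b0)           # makes the (a1, b0) entry equal to 0^kappa
c1 = os.urandom(KAPPA)   # the other output label

table = {
    (1, 0): xor(H(a1, b0), c0),  # all-zeros entry; need not be sent
    (0, 0): xor(H(a0, b0), c0),
    (1, 1): xor(H(a1, b1), c1),
    (0, 1): xor(H(a0, b1), c0),
}
assert table[(1, 0)] == bytes(KAPPA)  # the omitted ciphertext is 0^kappa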
Pinkas et al. (2009) describe a way to further reduce each gate to two
ciphertexts, applying a polynomial interpolation at each gate. Because this is
not compatible with the FreeXOR technique described next, however, it was
rarely used in practice. The later half-gates technique (Section 4.1.3) achieves
two-ciphertext AND gates and is compatible with FreeXOR, so supersedes the
interpolation technique of Pinkas et al. (2009).
4.1.2 FreeXOR
One of the results of Kolesnikov (2005) was the observation that the GESS
sharing for XOR gates can be done without any growth of the share sizes
(Section 3.6). Kolesnikov (2005) found a lower bound for the minimum share
sizes, explaining the necessity of the exponential growth for independent
secrets. This bound, however, did not apply to XOR gates (or, more generally,
to “even” gates whose truth table had two zeros and two ones).
As introduced in Section 3.6, XOR sharing for GESS can simply be done
as follows. Let s_0, s_1 ∈ D_S be the output wire secrets. Choose R ∈_R D_S and
In FreeXOR garbling, the labels input to H differ by the fixed offset ∆, and the
values masked by H’s output are also correlated by ∆. The standard security definition
of a PRG does not guarantee that the outputs of H are pseudorandom in this case,
but a random oracle does. Kolesnikov and Schneider mention that a variant of
correlation robustness, a notion weaker than RO, is sufficient (Kolesnikov and
Schneider, 2008b). In an important theoretical clarification of the FreeXOR
required assumptions, Choi et al. (2012b) show that the standard notion of
correlation robustness is indeed not sufficient, and pin down the specific
variants of correlation robustness needed to prove the security of FreeXOR.
The full garbling protocol for FreeXOR is given in Figure 4.1. The FreeXOR
GC protocol proceeds identically to the standard Yao GC protocol of Figure 3.2,
except that in Step 4, P_2 processes XOR gates without needing any ciphertexts or
encryption: for an XOR gate G_i with garbled input labels w_a = (k_a, p_a), w_b =
(k_b, p_b), the output label is directly computed as (k_a ⊕ k_b, p_a ⊕ p_b).
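In code, the evaluator’s handling of an XOR gate is just a componentwise XOR
of the two labels, as in this small sketch (labels modeled as (key, pointer-bit)
pairs, with keys as byte strings):

def eval_xor_gate(wa, wb):
    # FreeXOR evaluation: no ciphertexts and no calls to H
    (ka, pa), (kb, pb) = wa, wb
    return bytes(x ^ y for x, y in zip(ka, kb)), pa ^ pb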
Kolesnikov et al. (2014) proposed a generalization of FreeXOR called
fleXOR. In fleXOR, an XOR gate can be garbled using 0, 1, or 2 ciphertexts,
depending on structural and combinatorial properties of the circuit. FleXOR
can be made compatible with GRR2 applied to AND gates, and thus supports
two-ciphertext AND gates. The half gates technique described in the next
section, however, avoids the complexity of fleXOR, and reduces the cost of
AND gates to two ciphertexts with full compatibility with FreeXOR.
4.1.3 Half Gates

Zahur et al. (2015) introduced an efficient garbling technique that requires only
two ciphertexts per AND gate and fully supports FreeXOR. The key idea is to
represent an AND gate as XOR of two half gates, which are AND gates where
one of the inputs is known to one of the parties. Since a half gate requires
a garbled table with two entries, it can be transmitted using the garbled row
reduction (GRR3) technique with a single ciphertext. Implementing an AND
gate using half gates requires constructing a generator half gate (where the
generator knows one of the inputs) and an evaluator half gate (where the
evaluator knows one of the inputs). We describe each half gate construction
next, and then show how they can be combined to implement an AND gate.
Figure 4.1: The FreeXOR garbling protocol. Parameters: a Boolean circuit C
implementing function F; security parameter κ; and a hash function
H : {0,1}* → {0,1}^{κ+1} modeled as a RO.

Generator Half Gate. First, consider the case of an AND gate v_c = v_a ∧ v_b
where the input wires are a and b, the output wire is c, and the generator knows
the value v_a. In this case, the generator can garble the gate with the two
ciphertexts:

H(b_0) ⊕ c_0
H(b_1) ⊕ c_0 ⊕ v_a · ∆

When v_a is false, both rows encrypt c_0 (the gate always outputs false); when
v_a is true, the evaluator obtains c_0 or c_1 = c_0 ⊕ ∆ according to the value on
wire b.
Evaluator Half Gate. For the evaluator half gate, vc = va ∧ vb , the evaluator
knows the value of va when the gate is evaluated, and the generator knows
neither input. Thus, the evaluator can behave differently depending on the
known plaintext value of wire a. The generator provides the two ciphertexts:
H(a_0) ⊕ c_0
H(a_1) ⊕ c_0 ⊕ b_0
The ciphertexts are not permuted here—since the evaluator already knows va ,
it is fine (and necessary) to arrange them deterministically in this order. When
v_a is false, the evaluator knows it has a_0 and can compute H(a_0) to obtain
output wire label c_0. When v_a is true, the evaluator knows it has a_1, so can compute
H(a_1) to obtain c_0 ⊕ b_0. It can then xor this with the wire label it has for b, to
obtain either c_0 (false, when b = b_0) or c_1 = c_0 ⊕ ∆ (true, when b = b_1 = b_0 ⊕ ∆),
without learning the semantic value of b or c. As with the generator half gate,
using garbled row-reduction (Section 4.1.1) reduces the two ciphertexts to a
single ciphertext. In this case, the generator simply sets c_0 = H(a_0) (making
the first ciphertext all zeroes) and sends the second ciphertext.
Combining Half Gates. It remains to show how the two half gates can be
used to evaluate a gate v_c = v_a ∧ v_b in a garbled circuit, where neither party
can know the semantic value of either input. The trick is for the generator to
generate a uniformly random bit r, and to transform the original AND gate into
two half gates involving r:

v_c = (v_a ∧ r) ⊕ (v_a ∧ (r ⊕ v_b))
Zahur et al. (2015) also proved that, in a natural model of “linear” garbling
schemes, no scheme can use fewer than two ciphertexts per gate. Hence, under these
assumptions the half-gates scheme is bandwidth-optimal for circuits composed of
two-input binary gates (see Section 4.5 for progress on alternatives).
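The following self-contained sketch garbles and evaluates a single AND gate as
the XOR of a generator half gate and an evaluator half gate. As assumptions for
illustration: H is modeled by truncated SHA-256, ∆ is chosen with low bit 1 so
that a label’s low bit serves as its select bit, and TG and TE denote the single
ciphertext of each half gate after row reduction.

import os, hashlib

K = 16  # label length in bytes

def xor(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

def H(label, tweak):
    return hashlib.sha256(label + bytes([tweak])).digest()[:K]

def lsb(label):
    return label[-1] & 1

def sel(x, bit):             # bit * x over kappa-bit strings
    return x if bit else bytes(K)

delta = bytearray(os.urandom(K)); delta[-1] |= 1; delta = bytes(delta)
A0, B0 = os.urandom(K), os.urandom(K)
A1, B1 = xor(A0, delta), xor(B0, delta)
pa, pb = lsb(A0), lsb(B0)    # permute bits (r of the text is pb here)

# --- Generator: one ciphertext per half gate ---
TG = xor(xor(H(A0, 0), H(A1, 0)), sel(delta, pb))  # generator half gate
WG0 = xor(H(A0, 0), sel(TG, pa))
TE = xor(xor(H(B0, 1), H(B1, 1)), A0)              # evaluator half gate
WE0 = xor(H(B0, 1), sel(xor(TE, A0), pb))
C0 = xor(WG0, WE0)                                  # output "false" label
C1 = xor(C0, delta)

# --- Evaluator: given labels for a and b, derive the label for c ---
def evaluate(A, B):
    WG = xor(H(A, 0), sel(TG, lsb(A)))
    WE = xor(H(B, 1), sel(xor(TE, A), lsb(B)))
    return xor(WG, WE)

for va, A in ((0, A0), (1, A1)):
    for vb, B in ((0, B0), (1, B1)):
        assert evaluate(A, B) == (C1 if va and vb else C0)

Only the two ciphertexts TG and TE are transmitted per AND gate, and the
evaluator makes exactly two calls to H, matching the counts in Table 4.1.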
Network bandwidth is the main cost for garbled circuits protocols in most
practical scenarios. However, the computation cost of GC is also substantial, and
is dominated by calls to the encryption function implementing the random
oracle H in garbling gates, introduced in Section 3.1.2. Several techniques
have been developed to reduce that cost, in particular by taking advantage of
built-in cryptographic operations in modern processors.
Since 2010, Intel processors have included special-purpose AES-NI instructions
for implementing AES encryption, and most processors from other vendors
include similar instructions. Further, once an AES key is set up (which involves
generating the AES round keys), AES encryption is particularly fast. These
performance characteristics motivated Bellare et al. (2013) to develop fixed-key
AES garbling schemes, where H is implemented using fixed-key AES as a
cryptographic permutation.
Their design is based on a dual-key cipher (Bellare et al., 2012), where
two keys are both needed to decrypt a ciphertext. Bellare et al. (2012) show
how a secure dual-key cipher can be built using a single fixed-key AES
operation under the assumption that fixed-key AES is effectively a random
permutation. Since the permutation is invertible, it is necessary to combine the
permutation with the key using the Davies-Meyer construction (Winternitz,
1984): ρ(K) = π(K) ⊕ K. Bellare et al. (2013) explored the space of secure
garbling functions constructed from a fixed-key permutation, and found the
fastest garbling method to be π(K ∥ T)[1:k] ⊕ K ⊕ X, where K ← 2A ⊕ 4B,
A and B are the wire keys, T is a tweak, and X is the output wire label.
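A sketch of this style of hashing is below, assuming the PyCryptodome package
for AES. It applies a Davies-Meyer-style map to the derived key K = 2A ⊕ 4B
with the tweak folded in by XOR; this is an illustrative variant in the spirit of
the construction, not the exact function of Bellare et al. (2013).

from Crypto.Cipher import AES

PI = AES.new(b"\x00" * 16, AES.MODE_ECB)  # fixed-key public permutation

def xor(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

def double(block):
    # multiply by 2 in GF(2^128), with the standard reduction polynomial
    n = int.from_bytes(block, "big") << 1
    if n >> 128:
        n = (n ^ 0x87) & ((1 << 128) - 1)
    return n.to_bytes(16, "big")

def garble_hash(A, B, tweak):
    # K = 2A xor 4B; output = pi(K xor T) xor (K xor T), Davies-Meyer style
    K = xor(double(A), double(double(B)))
    KT = xor(K, tweak)
    return xor(PI.encrypt(KT), KT)

Since π is public and never re-keyed, garbling avoids the AES key schedule
entirely, which is the source of the speedup discussed in the text.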
Gueron et al. (2015) pointed out that the assumption that fixed-key AES
behaves like a random permutation is non-standard and may be questionable in
practice (Biryukov et al., 2009; Knudsen and Rijmen, 2007). They developed
a fast garbling scheme based only on the more standard assumption that
AES is a pseudorandom function. In particular, they showed that most of
the performance benefits of fixed-key AES can be obtained just by carefully
pipelining the AES key schedule in the processor.
Note that the FreeXOR optimization also requires stronger-than-standard
assumptions (Choi et al., 2012b), and the half-gates method depends
on FreeXOR. Gueron et al. (2015) showed a garbling construction alternative
to FreeXOR that requires only standard assumptions, but requires a single
ciphertext for each XOR gate. Moreover, their construction is compatible with
a scheme for reducing the number of ciphertexts needed for AND gates to two
(without relying on FreeXOR, as is necessary for half gates). The resulting
scheme has higher cost than the half-gates scheme because of the need to
transmit one ciphertext for each XOR, but shows that it is possible to develop
efficient (within about a factor of two of the cost of half gates) garbling schemes
based only on standard assumptions.
4.2 Optimizing Circuits

Since the main cost of executing a circuit-based MPC protocol scales linearly
with the size of the circuit, any reduction in circuit size will have a direct
impact on the cost of the protocol. Many projects have sought ways to reduce
the sizes of circuits for MPC. Here, we discuss a few examples.
A useful building block is a conditional swapper, which takes inputs a_1 and a_2
and produces outputs b_1 and b_2, swapping the inputs exactly when a programming
bit p is set:

b_1 = a_1 ⊕ (p ∧ (a_1 ⊕ a_2))
b_2 = a_2 ⊕ (p ∧ (a_1 ⊕ a_2))      (4.1)
The swapper is illustrated in Figure 4.2. There the block f is set to 0
if no swapping is desired, and to 1 to implement swapping. The f block
is implemented as a conjunction of the input with the programming bit p,
so Figure 4.2 corresponds to Equation 4.1.
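A bit-level sketch of Equation 4.1 follows; the single conjunction corresponds to
the one non-free gate, while the XORs are free under FreeXOR.

def swapper(a1, a2, p):
    # a1, a2, p are single bits; t is the shared conjunction of Eq. 4.1
    t = p & (a1 ^ a2)
    return a1 ^ t, a2 ^ t

assert swapper(0, 1, 0) == (0, 1)   # p = 0: no swap
assert swapper(0, 1, 1) == (1, 0)   # p = 1: swap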
Since wire outputs can be reused, p ∧ (a_1 ⊕ a_2) only needs to be evaluated
once. Referring back to half-gates garbling and the notation of Section 4.1.3,
when p is known to the generator, this conjunction is a generator half gate.
As noted above, applying GRR3 allows this to be implemented with a single
ciphertext.
2 Indeed, the idea for half-gates garbling (Section 4.1.3) came from this X switching block
design from Kolesnikov and Schneider (2008b).
CBMC-GC. Holzer et al. (2012) used a model checking tool as the basis for a
tool that compiles C programs into Boolean circuits for use in a garbled circuits
protocol. CBMC (Clarke et al., 2004) is a bounded model checker designed to
verify properties of programs written in ANSI C. It works by first translating
an input program (with assertions that define the properties to check) into a
Boolean formula, and then using a SAT solver to test the satisfiability of that
formula. CBMC operates at the level of bits in the machine, so the Boolean
formula it generates is consistent with the program semantics at the bit level.
When used as a model checker, CBMC attempts to find a satisfying assignment
of the Boolean formula corresponding to the input program. If a satisfying
assignment is found, it corresponds to a program trace that violates an assertion
in the program. CBMC unrolls loops and inlines recursive function calls up
to the given model-checking bound, removing cycles from the program. For
many programs, CBMC can statically determine the maximum number of
loop iterations; when it cannot, programmers can use annotations to state this
explicitly. When used in bounded model checking, an assertion is inserted that
will be violated if the unrolling was insufficient. Variables are replaced by bit
vectors of the appropriate size, and the program is converted to static single
assignment form, so that fresh variables are introduced instead of assigning to
a given variable more than once.
Normally, CBMC would convert the program to a Boolean formula, but
internally it is represented as a circuit. Hence, CBMC can be used as a
component in a garbled circuits compiler that translates an input program in
C into a Boolean circuit. To build CBMC-GC, Holzer et al. (2012) modified
CBMC to output a Boolean circuit, which can then be executed in a circuit-based
secure computation framework (such as the one from Huang et al. (2011b),
which was used by CBMC-GC). Since CBMC was designed to optimize
circuits for producing Boolean formulas for SAT solvers, modifications were
needed to produce better circuits for garbled circuit execution. In particular, XOR
gates are preferred in GC execution due to the FreeXOR technique (whereas
the corresponding costs in model checking favor AND gates). To minimize the
number of non-free gates, Holzer et al. (2012) replaced the built-in circuits
CBMC would use for operations like addition and comparison, with designs
that minimize costs with free XOR gates.
The main limitation of early garbled circuit execution frameworks, starting with
Fairplay (Malkhi et al., 2004), was that they needed to generate and store the
entire garbled circuit. Early on, researchers focused on the performance of
smaller circuits and developed tools that naïvely generate and store the entire
garbled circuit. This requires a huge amount of memory for all but trivial
circuits, and limited the size of inputs and complexity of functions that could
be computed securely. In this section, we discuss various improvements to the
way MPC protocols are executed that have overcome these scaling issues and
eliminated much of the overhead of circuit execution.
The circuit representation is compact relative to the size of the full garbled circuit,
since it can reuse components and is made of normal gate representations instead
of non-reusable garbled gates.
To execute the protocol, the generator produces garbled gates in an order
that is determined by the topology of the circuit, and transmits the garbled
tables to the evaluator as they are produced. As the evaluator receives them, it
associates each received garbled table with the corresponding gate of the circuit.
Since the order of generating and evaluating the circuit is fixed according
to the circuit (and must not depend on the parties’ private inputs), keeping
the two parties synchronized requires essentially no overhead. As it evaluates
the circuit, the evaluator maintains a set of live wire labels and evaluates the
received gates as soon as all their inputs are ready. This approach allows the
storage for each gate to be reused after it is evaluated, resulting in much smaller
memory footprint and greatly increased performance.
With PCF, gates are garbled as necessary for each loop execution, but the size of
the circuit representation, and the local memory needed, do not grow with the
number of iterations. PCF represents
Boolean circuits in a bytecode language where each input is a single bit, and
the operations are simple Boolean gates. Additional operations are provided
for duplicating wire values, and for making function calls (with a return
stack) and indirect (only forward) jumps. Instructions that do not involve
executing Boolean operators do not require any protocol operations, so can be
implemented locally by each party. To support secure computation, garbled
wire values are represented by unknown values, which cannot be used as the
conditions for conditional branches. The PCF compiler implemented several
optimizations to reduce the cost of the circuits, and was able to scale to circuits
with billions of gates (e.g., over 42 billion gates, of which 15 billion were
non-free, to compute 1024-bit RSA).
To reduce the resources required from the client, the protocol uses an outsourced oblivious transfer
protocol. To provide privacy of the outputs, a blinding circuit is added to the
original circuit that masks the output with a random pad known only to the
client and server. By moving the bulk of the garbled circuit execution cost to
the cloud service, the costs for the mobile device can be dramatically reduced.
Since the truth value of the x > y condition will not be known even at runtime,
there is no way for the execution to know whether the assignment occurs. Instead,
every assignment statement inside an oblivious conditional context must use
a “multiplexer” circuit that selects, based on the semantic value of the comparison
condition within the MPC, whether to perform the update or to have no effect.
Within the encrypted protocol, the correct semantics are implemented to ensure
semantic values are updated only on the branch that would actually be executed
based on the oblivious condition. The program executing the protocol (or an
analyst reviewing its execution) cannot determine which path was actually
executed since all of the values are encrypted within the MPC.
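A sketch of the selection such a multiplexer performs is below, shown on
cleartext values for illustration; inside the protocol the condition bit and both
values are secret.

def mux(cond, new, old):
    # Returns new if cond == 1, else old, without branching on cond.
    # In Python, -1 & x == x, so -cond acts as a full-width mask.
    return old ^ (-cond & (new ^ old))

assert mux(1, 7, 9) == 7
assert mux(0, 7, 9) == 9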
Updating a cleartext value z within an oblivious conditional branch would
not leak any information, but would provide unexpected results since the update
would occur regardless of whether or not the oblivious conditional is true.
Obliv-C’s type system protects programmers from mistakes where non-obliv
values are updated in conditional contexts. Note that the type checking is not
necessary for security, since the security of the obliv values is enforced at
runtime by the MPC protocol. It exists only to help programmers avoid
mistakes, by providing compile-time errors for nonsensical code.
To implement low-level libraries and optimizations, however, it is useful
for programmers to escape that type system. Obliv-C provides an unconditional
block construct that can be used within an oblivious context but contains
code that executes unconditionally. Figure 4.3 shows an example of how
an unconditional block (denoted with ~obliv(var)) can be used to implement
oblivious data structures in Obliv-C. This is an excerpt of an implementation
of a simple resizable array implemented using a struct that contains oblivious
variables representing the content and actual size of the array, and an opaque
variable representing its maximum possible size. While the current length of
the array is unknown (since we might append() while inside an obliv if), we
can still use an unconditional block to track a conservative upper bound of the
length. We use this variable to allocate memory space for an extra element
when it might be needed.
This simple example illustrates how Obliv-C can be used to implement
low-level optimizations for complex oblivious data structures, without needing
to implement them at the level of circuits. Obliv-C has been used to implement
libraries for data-oblivious data structures supporting random access memory
typedef struct {
  obliv int *arr;   // secret array contents
  obliv int sz;     // secret current length
  int maxsz;        // public upper bound on the length
} Resizeable;

void writeArray(Resizeable *r, obliv int index, obliv int val) obliv;
Figure 4.3: Example use of an unconditional block (extracted from Zahur and Evans (2015)).
including Square-Root ORAM (Section 5.4) and Floram (Section 5.5), and to
implement some of the largest generic MPC applications to date including stable
matching at the scale needed for the national medical residency match (Doerner
et al., 2016), an encrypted email spam detector (Gupta et al., 2017), and a
commercial MPC spreadsheet (Calctopia, Inc., 2017).
4.5 Further Reading

Many methods for improving garbling have been proposed beyond the ones
covered in Section 4.1. As mentioned in Section 4.1.3, the half-gates scheme
is bandwidth optimal under certain assumptions. Researchers have explored
several ways to reduce bandwidth by relaxing those assumptions including
garbling schemes that are not strictly “linear” in the sense considered in the
optimality proof (Kempka et al., 2016), using high fan-in gates (Ball et al.,
2016) and larger lookup tables (Dessouky et al., 2017; Kennedy et al., 2017).
MPC protocols are inherently parallelizable, but additional circuit design effort
may be helpful for maximizing the benefits of parallel execution (Buescher
and Katzenbeisser, 2015). GPUs provide further opportunities for speeding up
MPC protocols.
Table 4.2: Selected MPC Programming tools. In this table, we focus on tools that are
recently or actively developed, and that provide state-of-the-art performance. The
DUPLO extension is from (Kolesnikov et al., 2017b). All of the listed tools are avail-
able as open source code: ABY at https://github.com/encryptogroup/ABY; EMP at
https://github.com/emp-toolkit; Frigate at https://bitbucket.org/bmood/frigaterelease;
Obliv-C at https://oblivc.org; PICCO at https://github.com/PICCO-Team/picco.
5
Oblivious Data Structures

Accessing an element of an array a at a secret index requires a circuit that scales
linearly in the size of the array. A natural circuit
consists of N multiplexers, as shown in Figure 5.1. This method, where every
element of a data structure is touched to perform an oblivious read or an update,
is known as linear scan. For practical computations on large data structures, it
is necessary to provide sublinear access operations. However, any access that
only touches a subset of the data potentially leaks information about protected
data in the computation.
In this chapter, we discuss several extensions to circuit-based MPC designed
to enable efficient applications using large data structures. One strategy for
providing sublinear-performance data structures in oblivious computation is to
design data structures that take advantage of predictable access patterns. Indeed,
it is not necessary to touch the entire data structure if the parts that are accessed
do not depend on any private data (Section 5.1). A more general strategy,
however, requires providing support for arbitrary memory access with sublinear
cost. This cannot be achieved within a general-purpose MPC protocol, but can
be achieved by combining MPC with oblivious RAM (Sections 5.2–5.5).
In some programs the access patterns are predictable and known in advance,
even though they may involve private data. As a simple example, consider this
loop that doubles all elements of an array of private data:
for (i = 0; i < N; i++) {
    a[i] = 2 * a[i];
}
Instead of requiring N linear scan array accesses for each iteration (with Θ(N 2 )
total cost), the loop could be unrolled to update each element directly, as shown
in Figure 5.2. Since the access pattern required by the algorithm is completely
predictable, there is no information leakage in just accessing each element
once to perform its update.
Most algorithms access data in a way that is not as fully and obviously
predictable as in the above example, but that is usually also not fully
data-dependent. That is, it might be a priori known (i.e., known
independently of the private inputs) that some access patterns are guaranteed
independently of the private inputs) that some access patterns are guaranteed
to never occur in the execution. If so, an MPC protocol that does not include
the accesses that are known to be impossible regardless of the private data may
still be secure. Next, we describe oblivious data structures designed to take
advantage of predictable array access patterns common in many algorithms.
The starting state in Figure 5.3 depicts a state where none of the t values exceeds
3, and hence there is guaranteed to be sufficient space for two conditional push
operations. A multiplexer is used to push the new value into the correct slot
based on the t_0 value, similar to the naïve stack circuit design described above.
However, in this case the cost is low, since the multiplexer is applied to a fixed
5-element array. After two conditional push operations, however, with the starting
t_0 = 3, the level-0 buffer could be full. Hence, it is necessary to perform a shift,
which either has no impact (if t_0 ≤ 3), or pushes one block from level 0 into level 1
(as shown in Figure 5.3). After the shift, two more conditional push operations
can be performed. This hierarchical design can extend to support any size stack,
with shifts for level i generated after every 2^i condPush operations. A similar
design can support conditional pop operations, where left shifts are required
for level i after every 2^i condPop operations. The library implementing the
conditional stack keeps track of the number of stack operations to know the
minimum and maximum number of possible elements at each level, and inserts
the necessary shifts to prevent overflow and underflows.
For all circuit-based MPC protocols, the primary cost is the bandwidth
required to execute the circuit, which scales linearly in the number of gates.
The cost depends on the maximum possible number of elements at each point
in the execution. For a stack known to have at most N elements, k operations
access level i at most ⌊k/2^i⌋ times, since we need a right shift for level i after
every 2^i conditional push operations (and similarly need a left shift after every
2^i conditional pop operations).
However, the operations at the deeper levels are more expensive, since the
size of each block of elements at level i is 2^i, requiring Θ(2^i) logic gates to
move. So, we need Θ(2^i × k/2^i) = Θ(k)-sized circuits at level i. Thus, with
Θ(log N) levels, the total circuit size for k operations is Θ(k log N), and the
amortized cost for each conditional stack operation is Θ(log N).
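The following short calculation mirrors this counting argument, tallying the
shift work across levels for illustrative parameters:

from math import log2

def shift_cost(k, levels):
    # Level i is shifted every 2^i operations; each shift moves a block
    # of size Theta(2^i), so each level contributes Theta(k) work.
    return sum((k // 2**i) * 2**i for i in range(levels))

k, N = 1024, 256
levels = int(log2(N))
print(shift_cost(k, levels), "=", k * levels)  # Theta(k log N) in total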
Other Oblivious Data Structures. Zahur and Evans (2013) also present a
similar oblivious hierarchical queue data structure, essentially combining two
stacks, one of which only supports push operations and the other that only
supports pop operations. Since these stacks only need to support one of the
conditional operations, instead of using a 5-block buffer at each level they use
a 3-block buffer. Moving data between the two stacks requires an additional
Figure 5.3: Illustration of two conditional push operations for oblivious stack. The shift(0)
operation occurs after every two condPush operations.
multiplexer. Similarly to the oblivious stack, the amortized cost for operations
on the oblivious queue is Θ(log N).
Data structures designed to provide sublinear-cost oblivious operations
can be used for a wide range of memory access patterns, whenever there is
sufficient locality and predictability in the code to avoid the need to provide full
random access. Oblivious data structures may also take advantage of operations
that can be batched to combine multiple updates into a single data structure scan.
For example, Zahur and Evans (2013) present an associative map structure
where a sequence of reads and writes with no internal dependencies can be
batched into one update. This involves constructing the new value of the data
structure by sorting the original values and the updates into an array, and only
keeping the most recent value of each key-value pair. This allows up to N
updates to be performed with a circuit of size Θ(N log² N), with an amortized
cost per update of Θ(log² N).
The main challenge is writing programs to take advantage of predictable
memory access patterns. A sophisticated compiler may be able to identify
predictable access patterns in typical code and perform the necessary transfor-
mations automatically, but this would require a deep analysis of the code and
no suitable compiler currently exists. Alternatively, programmers can manually
rewrite code to use libraries that implement the oblivious data structures
and manage all of the bookkeeping required to carry out the necessary shift
operations.
1 RAM-MPC can also be made to work in the malicious security model (Afshar et al.,
2015), but care must be taken to ensure that data stored outside the MPC is not corrupted.
This is why the number of elements in each node is set to O(log N), to make the
overflow probability negligible. The constant factors matter, however: Gordon
et al. (2012) simulated various configurations and found that a binary search on a
2^16-element ORAM (that is, only 16 ORAM operations) could be implemented with
less than 0.0001 probability of overflow using a bucket size of 32.
Variations on this design that improve the performance of tree-based ORAM
for MPC have focused on using additional storage (called a stash) to hold
overflow elements and reduce the sizes of the buckets needed to provide a
negligible failure probability, as well as on improving the eviction algorithm.
Path ORAM. Path ORAM (Stefanov et al., 2013) added a stash to the design
as a fixed-size auxiliary storage for overflow elements, which would be scanned
on each request. The addition of a small stash enabled a more efficient eviction
strategy than the original binary-tree ORAM. Instead of selecting two random
nodes at each level for eviction and needing to update both child nodes of the
selected nodes to mask the selected element, Path ORAM performed evictions
on the access path from the root to the accessed node, moving elements along
this path from the root towards the leaves as much as possible. Since this path
is already accessed by the request, no additional masking is necessary to hide
which element is evicted. The Path ORAM design was adapted by Wang et al.
(2014a) to provide a more efficient RAM-MPC design, and they presented a
circuit design for a more efficient eviction circuit.
Although the first proposed ORAM designs were hierarchical, early RAM-
MPC designs did not adopt these constructions because their implementation
seemed to require implementing a pseudo-random function (PRF) within the
MPC, and using the outputs of that function to perform an oblivious sort.
Both of these steps would be very expensive to do in a circuit-based secure
computation circuit, so RAM-MPC designs favored ORAMs based on the
binary-tree design which did not require sorting or a private PRF evaluation.
Zahur et al. (2016) observed that the classic square-root ORAM design of
Goldreich and Ostrovsky (1996) could in fact be adapted to work efficiently in
RAM-MPC by implementing an oblivious permutation where the PRF required
for randomizing the permutation would be jointly computed by the two parties
outside of the generic MPC. This led to a simple and efficient ORAM design,
which, unlike tree-based ORAMs, has zero statistical failure probability, since
there is no risk that a block can overflow. The design maintains a public …
5.5 Floram
Doerner and Shelat (2017) observed that even the sublinear-cost requirement,
an essential design aspect of traditional ORAM systems, is not
necessary for an ORAM to be useful in RAM-MPC. Since the cost of secure computation
far exceeds the cost of standard computation, ORAM designs that have linear
cost “outside of MPC”, but reduce the computation performed “inside MPC”,
may be preferred to sublinear ORAM designs. With this insight, Doerner
and Shelat (2017) revisited the Distributed Oblivious RAM approach of Lu
and Ostrovsky (2013) and based a highly scalable and efficient RAM-MPC
scheme on two-server private information retrieval (PIR). The scheme, known
as Floram (Function-secret-sharing Linear ORAM), can provide over 100×
improvements over Square-Root ORAM and Circuit ORAM across a range of
realistic parameters.
Distributed Oblivious RAM relaxes the usual security requirement of
ORAM (the indistinguishability of server traces). Instead, the ORAM server
is split into two non-colluding servers, and the security requirement is that the
memory access patterns are indistinguishable given any single server’s trace
(but are allowed to be distinguishable if the traces of both servers are combined).
We note that it is not immediately obvious how to use this primitive in
constructing two-party MPC, since it requires two non-colluding servers in
addition to the third player—the ORAM client.
Private information retrieval (PIR) enables a client to retrieve a selected
item from a server, without revealing to the server which item was retrieved
(Chor et al., 1995). Traditionally, PIR schemes are different from ORAM in
that they only provide read operations, and that they allow a linear server
access cost whereas ORAM aims for amortized sublinear retrieval cost.
A point function is a function that outputs 0 for all inputs except one:

P_{α,β}(x) = β if x = α, and P_{α,β}(x) = 0 otherwise.
Gilboa and Ishai (2014) introduced the notion of distributed point functions
(DPF), where a point function is secret-shared among two players with shares
that have sizes sublinear in the domain of the function, hiding the values
of both α and β. The output of each party’s evaluation of the secret-shared
function is a share of the output and a bit indicating if the output is valid:
y_p^x = P_p^{α,β}(x) (party p’s share of the function output), and t_p^x (a
share of 1 if x = α, otherwise a share of 0). Gilboa and Ishai (2014) showed
how this could be used to efficiently implement two-server private information
retrieval, and Boyle et al. (2016b) improved the construction.
The Floram design uses secret-shared distributed point functions to imple-
ment a two-party oblivious write-only memory (OWOM) and a two-party
oblivious read-only memory (OROM). The ORAM is constructed by com-
posing the OWOM and OROM, but since it is not possible to read from the
posing the OWOM and OROM, but since it is not possible to read from the
write-only memory, Floram uses a linear-scan stash to store written elements
until it is full, at which point the state of the ORAM is refreshed by convert-
ing the write-only memory into oblivious read-only memory, replacing the
previous OROM and resetting the OWOM stash. In the OWOM, values are
stored using XOR secret sharing. To write to an element, all elements are
updated by xor-ing the current value with the output of a generated distributed
point function—so, the semantic value of the update is 0 for all elements other
than the one to be updated, and the difference between the current value and
updated value for the selected element.
Refreshing. Once the stash becomes full, the ORAM needs to be refreshed
by converting the OWOM into a new OROM and clearing the stash. This is done
by having each party generate a new PRF key (k_1 generated by P_1, k_2 generated
by P_2) and masking all of the values currently stored in its OWOM with the keyed
PRF: W′_p[i] = PRF_{k_p}(i) ⊕ W_p[i], for each party p ∈ {1, 2}. The masked values
are then exchanged between the two parties. The OROM values are computed
by xor-ing the value received from the other party for each cell with the party’s own
masked value, producing R[i] = PRF_{k_1}(i) ⊕ PRF_{k_2}(i) ⊕ W_1[i] ⊕ W_2[i], where
v[i] = W_1[i] ⊕ W_2[i]. Each party passes its PRF key into the MPC, so that
values can be unmasked within MPC reads by computing PRF_{k_1}(i) ⊕ PRF_{k_2}(i)
within the MPC. This enables the read value to be unmasked for use within the
MPC, without disclosing the private index i. Thus, refreshing the stash requires
O(N) local computation and communication, and no secure computation.
Because the refresh cost is relatively low, the optimal access period is O(√N)
(with low constants; the concrete implementation used √N/8).
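The following toy sketch illustrates the OWOM write and the refresh step. It is
deliberately non-sublinear: the point-function shares are full-length vectors
rather than a real DPF, and the PRF is instantiated (as an assumption, for
illustration) with HMAC-SHA256.

import os, hmac, hashlib

N, K = 8, 4                       # 8 cells of 4 bytes each

def xor(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

def prf(key, i):
    return hmac.new(key, i.to_bytes(4, "big"), hashlib.sha256).digest()[:K]

# XOR-shared write-only memory, initially storing v[i] = 0 everywhere.
W1 = [os.urandom(K) for _ in range(N)]
W2 = list(W1)                     # W1[i] xor W2[i] = 0

def point_shares(alpha, beta):
    # S1[i] xor S2[i] = beta if i == alpha, else 0 (a toy, linear-size DPF)
    S1 = [os.urandom(K) for _ in range(N)]
    S2 = [xor(S1[i], beta if i == alpha else bytes(K)) for i in range(N)]
    return S1, S2

# Write: xor point-function shares into each party's OWOM share.
val = (0x01020304).to_bytes(K, "big")
S1, S2 = point_shares(5, val)     # beta = old xor new = new (cell was 0)
W1 = [xor(W1[i], S1[i]) for i in range(N)]
W2 = [xor(W2[i], S2[i]) for i in range(N)]
assert xor(W1[5], W2[5]) == val

# Refresh: mask each share with a keyed PRF and exchange the vectors.
k1, k2 = os.urandom(16), os.urandom(16)
M1 = [xor(prf(k1, i), W1[i]) for i in range(N)]   # P1 -> P2
M2 = [xor(prf(k2, i), W2[i]) for i in range(N)]   # P2 -> P1
R = [xor(M1[i], M2[i]) for i in range(N)]         # both parties hold R
# R[i] = PRF_k1(i) xor PRF_k2(i) xor v[i]; unmasking happens inside MPC:
assert xor(R[5], xor(prf(k1, 5), prf(k2, 5))) == val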
Floram offers substantial performance improvements over Square-Root
ORAM and all other prior ORAM constructions used in RAM-MPC, even
though its asymptotic costs are linear. The linear-cost operations of the OROM
and OWOM are implemented outside the MPC, so even though each access
requires O(N) computation, the concrete cost of this linear work is much
less than the client computation done within the MPC. In the experiments of
Doerner and Shelat (2017), the cost of the secure computation is the dominant cost
up to ORAMs with 2^25 elements, after which the linear local computation cost
becomes dominant. Floram was able to scale to support ORAMs of 2^32
four-byte elements, with an average access time of 6.3 seconds over a LAN.
Floram also enables a simple and efficient initialization method using the
same mechanism as used for refreshing. The Floram design can also support
reads and writes where the index is not private very efficiently—the location
i can be read directly from the OWOM just by passing in the secret-shared
values in location i of each party’s share into the MPC. Another important
advantage of Floram is that instead of storing wire labels as is necessary for
other RAM-MPC designs, which expands the memory each party must store
by factor κ (the computational security parameter), each party only needs to
store a secret share of the data which is the same as the original size of the
data for each the OROM and OWOM.
5.6 Further Reading
Many other data structures have been proposed for efficient MPC, often
incorporating ORAM aspects. Keller and Scholl (2014) proposed efficient
MPC data structures for arrays, built on top of ORAM designs. Wang et al.
(2014b) devised oblivious data structures including priority queues that take
advantage of sparse and predictable access patterns in many applications, and
presented a general pointer-based technique to support efficient tree-based
access patterns.
We only touched on the extensive literature on oblivious RAM, focusing
on designs for MPC. ORAM continues to be an active research area, with many
different design options and tradeoffs to explore. Buescher et al. (2018) studied
various MPC-ORAM designs in application settings and developed a compiler
that selects a suitable ORAM for the array accesses in a high-level program.
Faber et al. (2015) proposed a three-party ORAM based on Circuit ORAM
that offers substantial cost reduction in the three-party, honest majority model.
Another new direction that may be useful for MPC ORAM is to allow some
amount of limited leakage of the data access pattern to gain efficiency (Chan
et al., 2017; Wagh et al., 2018).
6
Malicious Security
6.1 Cut-and-Choose
P1 may send the garbling of a different circuit that P2 had not agreed to
evaluate, but P2 has no way to confirm the circuit is correct. The output of the
maliciously-generated garbled circuit may leak more than P2 has agreed to
reveal (for instance, P2 ’s entire input).
Main idea: check some circuits, evaluate others. The standard way to
address this problem is a technique called cut-and-choose, a general idea that
goes back at least to Chaum (1983), who used it to support blind signatures.
To use cut-and-choose to harden Yao’s GC protocol, P1 generates many
independent garbled versions of a circuit C and sends them to P2 . P2 then
chooses some random subset of these circuits and asks P1 to “open” them
by revealing all of the randomness used to generate the chosen circuits. P2
then verifies that each opened garbled circuit is a correctly garbled version
of the agreed-upon circuit C. If any of the opened circuits are found to be
generated incorrectly, P2 knows P1 has cheated and can abort. If all of the
opened circuits are verified as correct, P2 continues the protocol. Since the
opened circuits have had all of their secrets revealed, they cannot be used for
the secure computation. However, if all of the opened circuits were correct, P2
has some confidence that most of the unopened circuits are also correct. These
remaining circuits can then be evaluated as in the standard Yao protocol.
This leads to a situation where P_2 knows for certain that P_1 is cheating, but must
continue as if there were no problem, to avoid leaking information about its input.
Traditionally, cut-and-choose protocols address this situation by making
P_2 consider only the majority output from among the evaluated circuits. The
cut-and-choose parameters (number of circuits, probability of checking each
circuit) are chosen so that the probability that all check circuits are correct
while a majority of the evaluation circuits are incorrect is negligible. In other
words, if all of the check circuits are correct, P_2 can safely assume that the
majority of the evaluation circuits are correct too. This justifies the choice to
use the majority output.
Selective abort. Another subtle issue is that even if all garbled circuits are correct,
P_1 may still cheat by providing incorrect garbled inputs in the oblivious transfers.
Hence, it does not suffice to check only the garbled circuits for correctness. For
instance, P1 may select its inputs to the OT so that whenever P2 ’s first input
bit is 1, P2 will pick up garbage wire labels (and presumably abort, leaking the
first input bit in the process). This kind of attack is known as a selective abort
attack (sometimes called selective failure). More generally, we care about when
P1 provides selectively incorrect garbled inputs in some OTs (e.g., P1 provides
a correct garbled input for 0 and incorrect garbled input for 1), so that whether
or not P2 receives incorrect garbled inputs depends on P2 ’s input.
An approach proposed by Lindell and Pinkas (2007) (and improved in
shelat and Shen (2013)) uses what are called k-probe-resistant matrices. The
idea behind this technique is to agree on a public matrix M, and for P_2 to
randomly encode its true input y into ỹ so that y = M·ỹ. Then the garbled circuit
will compute (x, ỹ) ↦ F(x, M·ỹ), and P_2 will use the bits of ỹ (rather than y) as
its inputs to OT. The k-probe-resistant property of M is that for any nonempty
subset of rows of M, their XOR has Hamming weight at least k. Lindell and
Pinkas (2007) show that if M is k-probe-resistant, then the joint distribution
of any k bits of ỹ is uniform; in particular, it is independent of P_2’s true input
y. Furthermore, when M is public, the computation of ỹ ↦ M·ỹ consists of
only XOR operations and therefore adds no cost to the garbled circuit using
FreeXOR (though the increased size of ỹ requires more oblivious transfers).
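A brute-force check of the k-probe-resistance property, over a toy matrix
represented as integer bitmasks, is sketched below. Practical matrices are far
too large for this exhaustive check; the cited constructions are analyzed
directly, and this matrix is purely hypothetical.

from itertools import combinations

def is_k_probe_resistant(rows, k):
    # Check that the XOR of every nonempty subset of rows has
    # Hamming weight at least k.
    for size in range(1, len(rows) + 1):
        for subset in combinations(rows, size):
            x = 0
            for row in subset:
                x ^= row
            if bin(x).count("1") < k:
                return False
    return True

M = [0b10110100, 0b01101011]       # toy example: 2 rows of width 8
print(is_k_probe_resistant(M, 3))  # True: all subset-XORs have weight >= 3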
The k-probe-resistant encoding technique thwarts selective abort attacks
in the following way. If P1 provides selectively incorrect garbled inputs in at
most k OTs, then these will be selectively picked up by P2 according to at most
k specific bits of ỹ, which are completely uniform in this case. Hence, P2 ’s
abort condition is input-independent. If on the other hand P1 provides incorrect
garbled inputs in more than k OTs, then P2 will almost surely abort—at least
with probability 1 − 1/2^k. If k is chosen so that 1/2^k is negligible (e.g., if
k is a statistical security parameter), the abort event leaks nothing about P_2’s input.

The choice of cut-and-choose parameters can be analyzed as an abstract game
between a (possibly cheating) player and a checker:

• The player (arbitrarily) prepares ρ balls, each one either red or green.
A red ball represents an incorrectly-garbled circuit while a green ball
represents a correct one.

• The checker chooses a random subset of c balls and inspects them; if any
checked ball is red, the player is caught and loses.

• The player wins the game if it is not caught and the majority of the
unchecked balls are red.
We want to find the smallest ρ and the best c so that no player can win the game
with probability better than 2^−λ. The analysis of shelat and Shen (2011) found
that the minimal replication factor is ρ ≈ 3.12λ, and the best number of items
to check is c = 0.6ρ (surprisingly, not 0.5ρ).
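This game is simple enough to evaluate exactly: the player’s best strategy is to
make just over half of the ρ − c unchecked balls red and hope that none of them
is checked. The sketch below computes the resulting winning probability for
illustrative parameters near ρ ≈ 3.12λ and c = 0.6ρ with λ = 40.

from math import comb, log2

def win_prob(rho, c):
    # Minimum number of red balls for a majority of the rho - c
    # unchecked balls; the player wins iff no red ball is checked.
    r = (rho - c) // 2 + 1
    return comb(rho - r, c) / comb(rho, c)

rho, c = 125, 75   # roughly 3.12 * 40 circuits, 0.6 * rho checked
print(f"security: {-log2(win_prob(rho, c)):.1f} bits")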
Cost-aware cut-and-choose. The results from shelat and Shen (2011) pro-
vide an optimal number of check and evaluation circuits, assuming the cost
of each circuit is the same. However, some circuits are evaluated and others
are checked, and these operations do not have equal cost. In particular, the
computational cost of evaluating a garbled circuit is about 25–50% of the cost
of checking a garbled circuit for correctness, since evaluating involves
executing only one path through the circuit, while checking must verify all entries
in the garbled tables. Also, some variants of cut-and-choose (e.g., Goyal et al.
(2008)) allow P_1 to send only a hash of a garbled circuit up-front, before P_2
chooses which circuits to open. To open a circuit, P1 can simply send a short
seed that was used to derive all the randomness for the circuit. P2 can then
recompute the circuit and compare its hash to the one originally sent by P1 .
In protocols like this, the communication cost of a checked circuit is almost
nothing—only evaluation circuits require significant communication.
Zhu et al. (2016) study various cut-and-choose games and derive parameters
with optimal cost, accounting for the different costs of checking and evaluating
a garbled circuit.
6.2 Input Recovery Technique

In the input-recovery technique, if P_2 detects that P_1 cheated (for example, by
obtaining inconsistent outputs from the evaluation circuits), P_2 obtains a proof
of cheating that allows it, in a second phase, to recover P_1’s input and compute
the correct output itself. However, for the same reasons as mentioned above, P_2
must not reveal whether it obtained such a proof, since that event may be
input-dependent and leak information about P_2’s private input.
There are many subtle details that enable this protocol to work. Some of
the most notable are:
• In order to make the circuits for the second phase small, it is helpful
if all garbled circuits share the same output wire labels. When this is
the case, opening any circuit would reveal all output wire labels for all
evaluation circuits and allows P2 to “forge” a proof of cheating. Hence
the check circuits of phase 1 cannot be opened until the parties’ inputs
to phase 2 have been fixed.
• The overall protocol must enforce that P1 uses the same input to all
circuits in both phases. It is important that if P1 uses input x in phase 1
and cheats, it cannot prevent P2 from learning that same x in phase 2.
Typical mechanisms for input consistency (such as the 2-universal hash
technique described above) can easily be adapted to ensure consistency
across both phases in this protocol.
As a motivating scenario, consider the case where two parties know in advance
that they would like to perform N secure evaluations of the same function f
(on unrelated inputs). In each secure computation, P1 would be required to
generate many garbled circuits of f for each cut-and-choose. The amortized
costs for each evaluation can be reduced by performing a single cut-and-choose
for all N evaluation instances.
Consider the following variant of the cut-and-choose abstract game:
3. [new step] The unchecked balls are randomly assigned into N buckets,
with each bucket containing exactly ρ balls.
4. [modified step] The player wins if any bucket contains only red balls (in
a different variant, one might specify that the player wins if any bucket
contains a majority of red balls).
This game naturally captures the following high-level idea for a cut-and-
choose protocol suitable for a batch of N evaluations of the same function.
First, P1 generates N ρ + c garbled circuits. P2 randomly chooses c of them to
be checked and randomly assigns the rest into N buckets. Each bucket contains
the circuits to be evaluated in a particular instance. Here we are assuming that
each instance will be secure as long as it includes at least one correct circuit
(for example, using the mechanisms from Section 6.2).
Intuitively, it is now harder for the player (adversary) to beat the cut-and-
choose game, since the evaluation circuits are further randomly assigned to
110 Malicious Security
buckets. The player must get lucky not only in avoiding detection during
checking, but also in having many incorrect circuits placed in the same bucket.
Zhu and Huang (2017) give an asymptotic analysis showing that replication
ρ = 2 + Θ(λ/log N) suffices to limit the adversary to success probability 2^−λ.
Compare this to single-instance cut-and-choose, which requires replication
factor O(λ).1 The improvement over single-instance cut-and-choose is not just
asymptotic, but is significant for reasonable values of N. For instance, for
N = 1024 executions, one achieves a security level of 2^−40 if P_1 generates
5593 circuits, of which only 473 are checked. Then only ρ = 5 circuits are
evaluated in each execution.
Lindell and Riva (2014) and concurrently Huang et al. (2014) described
batch cut-and-choose protocols following the high-level approach described
above. The former protocol was later optimized and implemented in Lindell
and Riva (2015). The protocols use the input-recovery technique so that each
instance is secure as long as at least one correct circuit is evaluated.
• The gates within a single bucket are connected so that they collec-
tively act like a fault-tolerant garbled NAND gate, which correctly
computes NAND even if some of the gates in the bucket are garbled
incorrectly.
1 The replication factor in this modified game measures only the number of evaluation
circuits, whereas for a single instance we considered the total number (check and evaluation) of
circuits. In practice, the number of checked circuits in batch cut-and-choose is quite small, and
there is little difference between amortized number of total circuits vs. amortized number of
evaluation circuits.
6.4. Gate-level Cut-and-Choose: LEGO 111
We now describe the soldering process in more detail, using the termi-
nology of Frederiksen et al. (2013). The paradigm requires a homomorphic
commitment, meaning that if P1 commits to values A and B independently,
it can later either decommit as usual, or can generate a decommitment that
reveals only A ⊕ B to P2 .
P1 prepares many individual garbled gates, using the FreeXOR technique.
For each wire i, P1 chooses a random "zero-label" k_i^0; the other label for that
wire is k_i^1 = k_i^0 ⊕ ∆, where ∆ is the FreeXOR offset value common to all gates.
P1 sends each garbled gate, and commits to the zero-label of each wire, as well
as to ∆ (once and for all for all gates). In this way, P1 can decommit to k_i^0 or to
k_i^1 = k_i^0 ⊕ ∆ using the homomorphic properties of the commitment scheme.
If a gate is chosen to be checked, then P1 cannot open all wire labels
corresponding to the gate, since this would reveal the global ∆ value and break
the security of all gates. Instead, P2 chooses one of the four possible input
combinations for the gate at random, and P1 opens the corresponding input and
output labels (one label per wire). Then, P2 can check that the gate evaluates
correctly on this combination. An incorrectly-garbled gate can therefore be
caught only with probability 1/4 (Zhu and Huang (2017) provide a way to
increase this probability to 1/2). This difference affects the cut-and-choose
parameters (e.g., bucket size) by a constant factor.
Soldering corresponds to connecting various wires (attached to individual
gates) together, so that the logical value on a wire can be moved to another
wire. Say that wire u (with zero-label k_u^0) and wire v (with zero-label k_v^0) are to
be connected. Then P1 can decommit to the solder value σ_{u,v} = k_u^0 ⊕ k_v^0. This
value allows P2 to transfer a garbled value from wire u to wire v during circuit
evaluation. For example, if P2 holds wire label k_u^b = k_u^0 ⊕ b · ∆, representing
unknown value b, then xor-ing this wire label with the solder value σ_{u,v} results
in the corresponding label for wire v:

k_u^b ⊕ σ_{u,v} = (k_u^0 ⊕ b · ∆) ⊕ (k_u^0 ⊕ k_v^0) = k_v^0 ⊕ b · ∆ = k_v^b
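The following sketch illustrates this label arithmetic, modeling wire labels as κ-bit integers and omitting the commitments entirely (all names are ours):

```python
import secrets

KAPPA = 128
DELTA = secrets.randbits(KAPPA) | 1   # global FreeXOR offset

def label(k0, b):
    # Label encoding bit b on a wire whose zero-label is k0.
    return k0 ^ (b * DELTA)

# Two wires to be soldered together.
ku0 = secrets.randbits(KAPPA)
kv0 = secrets.randbits(KAPPA)
solder = ku0 ^ kv0            # sigma_{u,v}, decommitted by P1

# P2 holds a label for wire u encoding some unknown bit b ...
b = 1
ku_b = label(ku0, b)
# ... and moves it to wire v by XORing with the solder value:
kv_b = ku_b ^ solder
assert kv_b == label(kv0, b)  # (ku0 ^ b*DELTA) ^ (ku0 ^ kv0) = kv0 ^ b*DELTA
```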
Goldreich, Micali, and Wigderson (GMW) showed a compiler for secure multi-
party computation protocols that uses ZK proofs (Goldreich et al., 1987). The
compiler takes as input any protocol secure against semi-honest adversaries,
and generates a new protocol for the same functionality that is secure against
malicious adversaries.
Let π denote the semi-honest-secure protocol. The main idea of the GMW
compiler is to run π and prove in zero-knowledge that every message is the
result of running π honestly. The honest parties abort if any party fails to
provide a valid ZK proof. Intuitively, the ZK proof ensures that a malicious
party can either run π honestly, or cheat in π and cause the ZK proof to fail. If π
is indeed executed honestly, then the semi-honest security of π ensures the
security of the compiled protocol.
Whether or not a particular message is consistent with honest execution of π
depends on the parties' private inputs. Hence, the ZK property of the proofs
ensures that this consistency can be checked without leaking any information
about these private inputs.
1. Each party must prove that each message of π is consistent with honest
execution of π, on a consistent input. In other words, the ZK proof
must be made with respect to the same (committed) input and randomness
throughout the entire execution of π.
A ZK proof can itself be instantiated using garbled circuits (Jawurek et al.,
2014): the verifier garbles a circuit that outputs 1 exactly on a valid witness,
and the prover evaluates it on its witness (whose wire labels are obtained via
OT). In particular:

3. The prover evaluates the circuit and obtains the output wire label
(corresponding to output 1) and generates a commitment to this wire
label.
4. The verifier opens the garbled circuit and the prover checks that it was
generated correctly. If so, then the prover opens the commitment to the
output wire label.
Recall the approach for secret-sharing based MPC using Beaver triples (Sec-
tion 3.4). This protocol paradigm is malicious-secure given suitable Beaver
triples and any sharing mechanism [x] that provides privacy (the shares held by
any single party leak nothing about x), secure opening (the parties can reveal a
shared value in a way that a corrupt party cannot alter), and additive homomor-
phism (the parties can locally compute shares of linear functions of shared values).
The MAC of x under key (K, ∆) is defined as MAC_{K,∆}(x) = K + ∆ · x,
computed in a finite field F. Seeing only MAC_{K,∆}(x) perfectly hides ∆ from
the adversary. Hence, the probability of computing a MAC forgery is bounded
by 1/|F| ≤ 1/2^κ, the probability of guessing a randomly chosen field element ∆.
In fact, the security of this MAC holds even when an honest party has many
MAC keys that all share the same ∆ value (but with independently random K
values). We refer to ∆ as the global MAC key and K as the local MAC key.
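As a concrete illustration, the following sketch implements this MAC over a prime field rather than the GF(2^κ) used in the text; the algebra is identical, and all names are ours:

```python
import secrets

P = 2**127 - 1  # a prime field stands in for GF(2^kappa)

def mac(K, delta, x):
    # Information-theoretic MAC: MAC_{K,delta}(x) = K + delta * x (mod P).
    return (K + delta * x) % P

delta = secrets.randbelow(P)                         # global MAC key (reused)
K1, K2 = secrets.randbelow(P), secrets.randbelow(P)  # fresh local keys
x1, x2 = 11, 22

m1, m2 = mac(K1, delta, x1), mac(K2, delta, x2)

# Homomorphism: MACs under the same delta add up to a MAC of the sum,
# keyed by the sum of the local keys.
assert (m1 + m2) % P == mac((K1 + K2) % P, delta, (x1 + x2) % P)
```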
The idea of BDOZ is to authenticate each party's shares with these
information-theoretic MACs. We start with the two-party case. Each party Pi
generates a global MAC key ∆i. Then [x] denotes the secret-sharing mechanism
where P1 holds x1, m1 and K1 and P2 holds x2, m2 and K2 such that:

x1 + x2 = x;   m1 = MAC_{K2,∆2}(x1);   m2 = MAC_{K1,∆1}(x2)

That is, each party's MAC on its own share is computed under the other party's
keys, which the other party keeps in order to verify the share when it is opened.
Next, we argue that this sharing mechanism satisfies the properties required by
the Beaver-triple paradigm (Section 3.4):
• Privacy: the individual parties learn nothing about x since they only
hold one additive share, x_p, and m_p reveals nothing about x without
knowing the other party's keys (which are never revealed).
• Homomorphism: The main idea is that when all MACs in the system
use the same ∆ value, the MACs become homomorphic in the necessary
way. That is,

MAC_{K,∆}(x) + MAC_{K′,∆}(x′) = (K + K′) + ∆ · (x + x′) = MAC_{K+K′,∆}(x + x′)

Here we focus on adding shared values [x] + [x′]; the other required
forms of homomorphism work in a similar way. The sharings [x] and
[x′] and the resulting BDOZ sharing of [x + x′] are shown in Figure 6.2.
In the general multi-party version of BDOZ, parties have additive shares of x
and each party's share is authenticated under every other party's MAC key.
Generating triples. The BDOZ sharing method satisfies the security and
homomorphism properties required for use in the abstract Beaver-triples
approach. It remains to be seen how to generate Beaver triples in this format.
Note that BDOZ shares work naturally even when the payloads (i.e., x in
[x]) are restricted to a subfield of F. The sharings [x] are then homomorphic
with respect to that subfield. A particularly useful case is to use BDOZ for
sharings of single bits, interpreting {0, 1} as a subfield of F = GF(2^κ). Note
that F must be exponentially large for security (authenticity) to hold.
The state-of-the-art method for generating BDOZ shares of bits is the
scheme used by Tiny-OT (Nielsen et al., 2012). It uses a variant of traditional
OT extension (Section 3.7.2) to generate BDOZ-authenticated bits [x]. It then
uses a sequence of protocols to securely multiply these authenticated bits, as
needed to generate the required sharings for Beaver triples.
In BDOZ sharing, each party’s local part of [x] contains a MAC for every other
party. In other words, the storage requirement of the protocol scales linearly
with the number of parties. A different approach introduced by Damgård,
Pastro, Smart, and Zakarias (SPDZ, often pronounced “speeds”) (Damgård
et al., 2012b) results in constant-sized shares for each party.
As before, we start with the two-party setting. The main idea is to have a
global MAC key ∆ that is not known to either party. Instead, the parties hold
∆1 and ∆2 which can be thought of as shares of a global ∆ = ∆1 + ∆2 . In a
SPDZ sharing [x], P1 holds (x1, t1 ) and P2 holds (x2, t2 ), where x1 + x2 = x
and t1 + t2 = ∆ · x. Thus, the parties hold additive shares of x and of ∆ · x. One
can think of ∆ · x as a kind of “0-time information-theoretic MAC” of x.
This scheme clearly provides privacy for x. Next, we show that it also
provides the other two properties required for Beaver triples:
• Secure opening: We cannot have the parties simply announce their
shares (xi, ti), since that would reveal ∆. It is important that ∆ remain secret
throughout the entire protocol. To open [x] without revealing ∆, the
protocol proceeds in 3 phases:

1. The parties announce their shares x1, x2 of x and compute the candidate
opened value x = x1 + x2.

2. Each party Pi computes σi = ti − ∆i · x and broadcasts a commitment
to σi. Note that if the opened value is correct, then
σ1 + σ2 = (t1 + t2) − (∆1 + ∆2) · x = 0.

3. The parties open their commitments and abort if σ1 + σ2 ≠ 0.
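A minimal sketch of this opening procedure, over an illustrative prime field, with the commitments elided and both parties' computations simulated in one program (all names are ours):

```python
import secrets

P = 2**61 - 1  # illustrative prime field

def share(x, delta1, delta2):
    """SPDZ sharing [x]: x1 + x2 = x and t1 + t2 = (delta1 + delta2) * x."""
    x1 = secrets.randbelow(P); x2 = (x - x1) % P
    t = ((delta1 + delta2) * x) % P
    t1 = secrets.randbelow(P); t2 = (t - t1) % P
    return (x1, t1), (x2, t2)

def open_and_check(s1, s2, delta1, delta2):
    (x1, t1), (x2, t2) = s1, s2
    # Phase 1: announce shares of x only.
    x = (x1 + x2) % P
    # Phase 2: each party commits to sigma_i = t_i - delta_i * x.
    sig1 = (t1 - delta1 * x) % P
    sig2 = (t2 - delta2 * x) % P
    # Phase 3: open commitments and verify that they sum to zero.
    assert (sig1 + sig2) % P == 0, "MAC check failed: someone cheated"
    return x

d1, d2 = secrets.randbelow(P), secrets.randbelow(P)
s1, s2 = share(42, d1, d2)
print(open_and_check(s1, s2, d1, d2))  # 42
```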
Generating SPDZ shares. Since SPDZ shares satisfy the properties needed
for abstract Beaver-triple-based secure computation, the only remaining question
is how to generate Beaver triples in the SPDZ format. The paper that initially
introduced SPDZ (Damgård et al., 2012b) proposed a method involving
somewhat homomorphic encryption. Followup work suggests alternative
techniques based on efficient OT extension (Keller et al., 2016).
6.7 Authenticated Garbling

A useful side-effect of a BDOZ sharing [x] is that the parties hold additive shares
of x · ∆1, where ∆1 is P1's global MAC key. Using only local computation
(P1 simply adds the appropriate value to its share), the parties can therefore
obtain additive shares of the garbled-table entry e_{0,0} and all other rows in the
garbled table.
In summary, the distributed garbling procedure works by generating BDOZ-
authenticated shares of random permute bits [pi ] for every wire in the circuit,
along with Beaver triples [pa ], [pb ], [pa · pb ] for every AND gate in the circuit.
Then, using only local computation, the parties can obtain additive shares of a
garbled circuit that uses the pi values as its permute bits. P1 sends its shares of
the garbled circuit to P2 , who can open it and evaluate.
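The local-computation step can be illustrated directly: the bit hidden by row (a, b) of an AND gate's garbled table is (p_a ⊕ a)(p_b ⊕ b) ⊕ p_c, and expanding the product shows that every term is linear in shared values, except the public constant a · b, which one party simply folds into its share. A small sketch over XOR shares (names ours) checks this:

```python
import secrets

def bit():
    return secrets.randbits(1)

def share_bit(x):
    x1 = bit()
    return x1, x ^ x1

# Random permute bits p_a, p_b, p_c and shares of the product p_a * p_b
# (in the protocol, the product shares come from a Beaver triple).
pa, pb, pc = bit(), bit(), bit()
pa1, pa2 = share_bit(pa)
pb1, pb2 = share_bit(pb)
pc1, pc2 = share_bit(pc)
pab1, pab2 = share_bit(pa & pb)

for a in (0, 1):
    for b in (0, 1):
        # Row (a, b) hides (p_a^a)(p_b^b)^p_c
        #   = p_a*p_b ^ a*p_b ^ b*p_a ^ a*b ^ p_c.
        # Every term is linear in shared values except the public
        # constant a*b, which P1 XORs into its own share.
        r1 = pab1 ^ (a & pb1) ^ (b & pa1) ^ pc1 ^ (a & b)
        r2 = pab2 ^ (a & pb2) ^ (b & pa2) ^ pc2
        assert r1 ^ r2 == ((pa ^ a) & (pb ^ b)) ^ pc
```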
7 Alternative Threat Models

In this chapter, we consider some different assumptions about threats that lead
to MPC protocols offering appealing security-performance trade-offs. First,
we relax the assumption that any number of participants may be dishonest and
discuss protocols designed to provide security only when a majority of the
participants behave honestly. Assuming an honest majority allows for dramatic
performance improvements. Then, we consider alternatives to the semi-honest
and malicious models that have become standard in MPC literature, while still
assuming that any number of participants may be corrupted. As discussed in
the previous chapter, semi-honest protocols can be elevated into the malicious
model, but this transformation incurs a significant cost overhead which may
not be acceptable in practice. At the same time, real applications present a
far more nuanced set of performance and security constraints. This prompted
research into security models that offer richer trade-offs between security and
performance. Section 7.1 discusses protocols designed to take advantage of the
assumption that the majority of participants are honest. Section 7.2 discusses
scenarios where trust between the participants is asymmetric, and the remaining
sections present protocols designed to provide attractive security-performance
trade-offs in settings motivated by practical scenarios.
7.1 Honest Majority
So far we have considered security against adversaries who may corrupt any
number of the participants. Since the purpose of security is to protect the
honest parties, the worst-case scenario for a protocol is that n − 1 out of n
parties are corrupted.1 In the two-party case, it is indeed the only sensible
choice to consider one out of the two parties to be corrupt.
However, in the multi-party setting it often is reasonable to consider
restricted adversaries that cannot corrupt as many parties as they want. A
natural threshold is honest majority, where the adversary may corrupt strictly
fewer than n/2 of the n parties. One reason this threshold is natural is that, assuming
an honest majority, every function has an information-theoretically secure
protocol (Ben-Or et al., 1988; Chaum et al., 1988), while there exist functions
with no such protocol in the presence of ⌈n/2⌉ corrupt parties.
1We consider only static security, where the adversary’s choice of the corrupted parties
is made once-and-for-all, at the beginning of the interaction. It is also possible to consider
adaptive security, where parties can become corrupted throughout the protocol’s execution.
In the adaptive setting, it does indeed make sense to consider scenarios where all parties are
(eventually) corrupted.
In the three-party protocol of Mohassel et al. (2015), two of the parties act as
garblers, each generating the same garbled circuit from a shared random seed;
the evaluator compares the two copies, ensuring that the garbled circuit is
generated correctly. Other protocol issues relevant for the malicious case (like
obtaining garbled inputs) are handled in a similar way, by checking responses
from the two garbling parties for consistency.
One additional advantage of the 3-party setting is that there is no need for
oblivious transfer (OT) as in the 2-party setting. Instead of using OT to deliver
garbled inputs to the evaluator P3 , we can let P3 secret-share its input and send
one share (in the clear!) to each of P1 and P2 . These garblers can send garbled
inputs for each of these shares, and the circuit can be modified to reconstruct
these shares before running the desired computation. Some care is required to
ensure that the garblers send the correct garbled inputs in this way (Mohassel
et al. (2015) provide details). Overall, the result is a protocol that avoids all
OT and thus uses only inexpensive symmetric-key cryptography.
The basic protocol of Mohassel et al. (2015) has been generalized to
provide additional properties like fairness (if the adversary learns the output,
then the honest parties do) and guaranteed output delivery (all honest parties
will receive output) (Patra and Ravi, 2018). Chandran et al. (2017) extend it to
provide security against roughly √n out of n corrupt parties.
The honest-majority 3-party setting also enables some of the fastest general-
purpose MPC implementations to date. These protocols achieve their high
performance due to their extremely low communication costs — in some cases,
as little as one bit per gate of the circuit!
It is possible to securely realize any functionality information-theoretically
in the presence of an honest majority, using the classical protocols of Ben-Or
et al. (1988) and Chaum et al. (1988). In these protocols, every wire in the
circuit holds a value v, and the invariant of these protocols is that the parties
collectively hold some additive secret sharing of v. As in Section 3.4, let [v]
denote such a sharing of v. For an addition gate z = x + y, the parties can
compute a sharing [x + y] from sharings [x] and [y] by local computation only,
due to the additive homomorphism property of the sharing scheme. However,
interaction and communication are required for multiplication gates to compute
a sharing [xy] from sharings [x] and [y].
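For concreteness, here is a minimal sketch of 3-out-of-3 additive sharing with a local addition gate, over an illustrative prime field (names ours):

```python
import secrets

P = 2**61 - 1  # illustrative prime field
N = 3          # number of parties

def share(v):
    """3-out-of-3 additive sharing of v: the shares sum to v mod P."""
    s = [secrets.randbelow(P) for _ in range(N - 1)]
    return s + [(v - sum(s)) % P]

x, y = share(7), share(35)
# Addition gate: purely local -- each party adds its own two shares.
z = [(xi + yi) % P for xi, yi in zip(x, y)]
assert sum(z) % P == 42
# A multiplication gate has no such local rule; it requires interaction.
```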
In Section 3.4 we discussed how to perform such multiplications when
pre-processed triples of the form [a], [b], [ab] are available. It is also possible
to perform multiplications with all effort taking place during the protocol (i.e.,
not in a pre-processing phase). For example, the protocol of Ben-Or et al.
(1988) uses Shamir secret sharing for its sharings [v], and uses an interactive
multiplication subprotocol in which all parties generate shares-of-shares and
combine them linearly.
The protocols in this section are instances of this general paradigm, highly
specialized for the case of 3 parties and 1 corruption (“1-out-of-3” setting).
Both the method of sharing and the corresponding secure multiplication
subprotocol are the target of considerable optimizations.
The Sharemind protocol of Bogdanov et al. (2008a) was the first to
demonstrate high performance in this setting. Generally speaking, for the
1-out-of-3 case, it is possible to use a secret sharing scheme with threshold 2
(so that any 2 shares determine the secret). The Sharemind protocol instead
uses 3-out-of-3 additive sharing, so that in [v] party Pi holds value vi such that
v = v1 + v2 + v3 (in an appropriate ring, such as Z2 for Boolean circuits). This
choice leads to a simpler multiplication subprotocol in which each party sends
7 ring elements.
Launchbury et al. (2012) describe an alternative approach in which
each party sends only 3 ring elements per multiplication. Furthermore, the
communication is in a round-robin pattern, where the only communication is
in the directions P1 → P2 → P3 → P1 . The idea behind multiplication is as
follows. Suppose two values x and y are additively shared as x = x1 + x2 + x3
and y = y1 + y2 + y3 , where party Pi holds xi, yi . To multiply x and y, it
suffices to compute all terms of the form xa · yb for a, b ∈ {1, 2, 3}. Already Pi
has enough information to compute xi yi , but the other terms are problematic.
However, if each Pi sends its shares around the circle (i.e., P1 sends to P2 , P2
sends to P3 , and P3 to P1 ), then every term of the form xa yb will be computable
by some party. Each party will now hold two of the xi shares and two of the yi
shares. This still perfectly hides the values of x and y from a single corrupt
party since we are using 3-out-of-3 sharing. The only problem is that shares
of xy are correlated to shares of x and y, while it is necessary to have an
independent sharing of xy. So, the parties generate a random additive sharing
of zero and locally add it to their (non-random) sharing of xy.
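The following sketch (names ours, illustrative prime field, all three parties simulated in one program) implements this round-robin multiplication, assigning to Pi the terms xi·yi, xi·y(i−1), and x(i−1)·yi, and re-randomizing the result with a fresh sharing of zero:

```python
import secrets

P = 2**61 - 1

def share3(v):
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return [a, b, (v - a - b) % P]

def zero_sharing():
    # A random additive sharing of 0, e.g. z_i = r_i - r_{i+1}.
    r = [secrets.randbelow(P) for _ in range(3)]
    return [(r[i] - r[(i + 1) % 3]) % P for i in range(3)]

def multiply(xs, ys):
    """Round-robin multiplication: P_i sends (x_i, y_i) to P_{i+1},
    so each P_i ends up holding its own shares plus P_{i-1}'s."""
    z = zero_sharing()
    out = []
    for i in range(3):
        j = (i - 1) % 3  # shares received from the previous party
        # P_i covers the cross terms x_i*y_i, x_i*y_j, x_j*y_i;
        # over all i, every term x_a*y_b is covered exactly once.
        t = (xs[i] * ys[i] + xs[i] * ys[j] + xs[j] * ys[i]) % P
        out.append((t + z[i]) % P)
    return out

xs, ys = share3(6), share3(7)
assert sum(multiply(xs, ys)) % P == 42
```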
Araki et al. (2016) propose a secret sharing scheme that is a variant of
replicated secret sharing, in which each party sends only a single ring element
per multiplication. In the malicious-secure follow-up to this approach, Beaver
triples are generated by running the semi-honest multiplication subprotocol
on random inputs, and an important property of this subprotocol is used: a
single corrupt party cannot cause
the output of multiplication to be an invalid sharing, only a (valid) sharing of a
different value. Hence, the adversary can cause any triple generated in this way
to have the form [a], [b], [ab] or [a], [b], [ab ⊕ 1] (since this idea only applies for
sharings of single bits). Starting with this observation, collections of triples
are used to “cross-check” each other and guarantee their correctness.
Although the standard models assume all parties are equally distrusting, many
realistic scenarios have asymmetric trust. For example, consider the setting
where one of the two participants of a computation is a well-known business,
such as a bank (denoted by P1 ), providing service to another, less trusted
participant, the bank's customer, denoted by P2 . It may be safe to assume that P1 is
unlikely to actively engage in cheating by deviating from the prescribed protocol.
Indeed, banks today enjoy full customer trust and operate on all customer
data in plaintext. Customers are willing to rely on established regulatory and
legal systems, as well as the long-term reputation of the bank, rather than on
cryptographic mechanisms, to protect their funds and transactions. Today, we
not only trust the bank to correctly perform requested transactions, but we also
trust that the bank will not misuse our data and will keep it in confidence.
However, there may be several reasons why a cautious customer who trusts
the bank's intent may want to withhold certain private information and execute
transactions via MPC. One is unintentional data release. As with any
organization, the bank may be a target of cyber attacks, and data stored by
the bank, including customer data, may simply be stolen. Employing
MPC to safeguard the data eliminates this possibility, since the bank never
holds the sensitive data in the first place. Another reason might be legally
mandated audits and subpoenas of the data. As an organization, a bank may
have a presence in several jurisdictions with different rules on data retention,
release, and reporting. Again, MPC serves as protection against unpredictable
future data releases.
Hence, given the existing trust in the bank, it seems reasonable to employ the
semi-honest model to protect the customer. However, having upgraded cus-
tomer privacy by moving from plaintext operation to semi-honest MPC (and
correspondingly placing the customer as a semi-honest player), we now actually
need to account for a customer who may deviate from the protocol.

7.3 Covert Security

The covert model of Aumann and Lindell (2007) targets exactly such settings:
it guarantees that a party who deviates from the protocol is caught with some
fixed probability ε. Their definition can be formalized in several ways:
1. Failed simulation. The idea is to allow the simulator (of the cheating
party) to fail sometimes. "Fail" means that its output distribution is not
indistinguishable from the real one. This corresponds to an event of
successful cheating. The model guarantees that the probability that the
adversary is caught cheating is at least ε times the probability that the
simulator fails.
One serious issue with the above definition is that it only requires that if
cheating occurred in the real execution, the cheater will be caught with
probability ε. The definition does not prevent the cheating player from
deciding when to cheat (implicitly) based on the honest player's input.
In particular, P1 could attempt to cheat only on the more valuable inputs
of P2 (e.g., natural protocols exist which allow P1 to attempt to cheat
only when P2's input satisfies a condition of P1's choosing).

2. Explicit cheat. Here, the ideal-model adversary may explicitly instruct
the ideal functionality that it wishes to cheat. With probability ε the
cheating is detected and announced to the honest parties; otherwise
the adversary learns the honest players' inputs and may force the output
of its choice. Notably, in this formulation the adversary may obtain the
honest players' inputs even when the cheating is detected.
3. Strong explicit cheat. This is the same as the explicit cheat formulation,
with the exception that the cheating ideal-model adversary is not allowed
to obtain the honest players’ inputs in the case where cheating is detected.
The first two (strictly weaker) models of Aumann and Lindell (2007) did
not gain significant popularity mainly because the much stronger third model
admits protocols of the same or very similar efficiency as the weaker ones. The
strong explicit cheat model became standard due to its simplicity, effectiveness
in deterrence, and the discovery of simple and efficient protocols that achieve
it. We present one such simple and efficient 2PC protocol next.
Since the work of Aumann and Lindell (2007), significant progress in efficient
OT has produced several extremely efficient malicious OT protocols (Asharov
et al., 2015b; Keller et al., 2015), with the latter incurring only 5% overhead
over semi-honest OT. As a result, we don't consider covert OT security, and
instead assume a maliciously-secure OT building block. It is important to
remember, however, that a maliciously-secure protocol does not guarantee the
players submit prescribed inputs. In particular, while malicious OT ensures the
correct and private transfer of one of the sender's two secrets, it does not
prevent a cheating sender from choosing those secrets maliciously, as we
discuss below.
Core Protocol. Aumann and Lindell go along the lines of the cut-and-
choose approach and propose that P1 generates and sends to P2 two GCs.
The two garbled circuits Ĉ_0 and Ĉ_1 are generated from random seeds s_0 and
s_1 respectively by expanding them using a pseudo-random generator (PRG).
Upon receipt, P2 flips a coin b ∈ {0, 1} and asks P1 to open the circuit Ĉ_b
by providing s_b. Because the GCs are constructed deterministically from a
seed via a PRG expansion, opening is achieved simply by sending the seed to
the verifier. This allows P2 to check the correctness of the generated garbled
circuit Ĉ_b by constructing the same garbled circuit from the provided seed, and
comparing it to the copy that was sent. This guarantees that a badly constructed
Ĉ will be detected with probability ε = 1/2, which is needed to satisfy the strong
explicit cheat definition.
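A minimal sketch of this seed-commit-and-reopen pattern, with a hash standing in for the full deterministic garbling procedure (all names are ours):

```python
import hashlib
import secrets

def garble(seed: bytes) -> bytes:
    # Stand-in for deterministic garbling: expand the seed with a PRG.
    # (A real implementation derives all wire labels from this stream.)
    return hashlib.sha256(b"GC" + seed).digest()

# P1 garbles twice from fresh seeds and sends both garbled circuits.
s0, s1 = secrets.token_bytes(16), secrets.token_bytes(16)
gc = [garble(s0), garble(s1)]

# P2 flips a coin and asks P1 to open that circuit by revealing its seed.
b = secrets.randbelow(2)
opened_seed = (s0, s1)[b]

# P2 re-garbles from the seed and compares with what was sent.
assert garble(opened_seed) == gc[b], "cheating detected"
# If the check passes, P2 evaluates the unopened circuit gc[1 - b].
```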
However, a malicious P1 can also perform OT-related attacks. For example,
P1 can flip the semantics of the labels on P2 ’s input wires, effectively silently
flipping P2 ’s input. Similarly, P1 can set both P2 ’s input wire labels to the
same value, effectively setting P2 ’s input to a fixed value. Another attack is
the selective abort attack discussed in Section 6.1, where one of the two OT
secrets is set to be a dummy random value, resulting in a selectively aborted
evaluation that allows P1 to learn a bit of P2 's input.
As a result, we must ensure that an OT input substitution by P1 is caught
with probability at least equal to the deterrence factor ε. Note that input
substitution by P2 is allowed, as it simply corresponds to P2 choosing a different
MPC input, a behavior allowed by the security definition.
Next, we discuss defenses to these attacks.
7.4 Publicly Verifiable Covert (PVC) Security

In the covert security model, a party can deviate arbitrarily from the protocol
description but is caught with a fixed probability ε, called the deterrence
factor. In many practical scenarios, this guaranteed risk of being caught (likely
resulting in loss of business or embarrassment) is sufficient to deter would-be
cheaters, and covert protocols are much more efficient and simpler than their
malicious counterparts.
At the same time, the cheating deterrent introduced by the covert model is
relatively weak. Indeed, an honest party catching a cheater certainly knows
what happened and can respond accordingly (e.g., by taking their business
elsewhere). However, the impact is largely limited to this, since the honest
player cannot credibly accuse the cheater publicly. Doing so might require the
honest player to reveal its private inputs (hence, violate its security), or the
protocol may simply not authenticate messages as coming from a specific party.
If, however, credible public accusation (i.e., a publicly-verifiable cryptographic
proof of the cheating) were possible, the deterrent for the cheater would be
much greater: suddenly, all the cheater’s customers and regulators would be
aware of the cheating and thus any cheating may affect the cheater’s global
customer base.
The addition of credible accusation greatly improves the covert model
even in scenarios with a small number of players, such as those involving the
government. Consider, for example, the setting where two agencies are engaged
in secure computation on their respective classified data. The covert model
may often be insufficient here. Indeed, consider the case where one of the
two players deviates from the protocol, perhaps due to an insider attack. The
honest player detects this, but non-participants are now faced with the problem
of determining which of the two parties actually cheated. A publicly verifiable
proof of cheating resolves this. Informally, a PVC protocol must guarantee that:

1. If a party deviates from the protocol, then (with probability at least ε)
the honest party obtains a publicly verifiable proof of this cheating.

2. No party can produce a valid proof of cheating that accuses an honest
party (no defamation).

3. Proof of cheating does not reveal honest party's private data (including
the data used in the execution where cheating occurred).
We now sketch the PVC protocol of Asharov and Orlandi (2012). Covert
security against a cheating evaluator comes for free with natural GC protocols,
and we only need to consider a malicious generator (P1 ).
Recall the selective failure attack on P2 's input wires, where P1 sends P2
(via OT) an invalid wire label for one of P2 's two possible inputs and learns
which input bit P2 selected based on whether or not P2 aborts. To protect
against this attack, the parties construct a new circuit C′ that prepends an input
XOR tree to C, as discussed in Section 7.3. To elevate to the covert model, P1
then constructs λ (the GC replication factor) garblings of C′; P2 randomly
selects λ − 1 of them and checks that they are correctly constructed, and
evaluates the remaining garbling of C′ to derive the output.
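A small sketch of the XOR-tree input encoding (names ours): each input bit is replaced by ν random bits whose XOR equals the bit, and C′ XORs them back together before running C.

```python
import secrets

def encode_input_bit(x: int, nu: int):
    """Replace input bit x with nu random bits XORing to x."""
    r = [secrets.randbits(1) for _ in range(nu - 1)]
    last = x
    for ri in r:
        last ^= ri
    return r + [last]

def xor_tree(bits):
    v = 0
    for b in bits:
        v ^= b
    return v

enc = encode_input_bit(1, nu=5)
assert xor_tree(enc) == 1
# A selective-failure attack on any single encoded wire now reveals only
# a uniformly random share, not the actual input bit x.
```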
We now adapt this protocol to the PVC setting by allowing P2 to not only
detect cheating, but also to obtain a publicly verifiable proof of cheating if
cheating is detected. The basic idea is to require the generator P1 to establish
a public-private keypair, and to sign the messages it sends. The intent is
that signed inconsistent messages (e.g., badly formed GCs) can be published
and will serve as a convincing proof of cheating. The main difficulty of this
approach is ensuring that neither party can improve its odds by selectively
aborting. For example, if P1 could abort whenever P2 ’s challenge would reveal
that P1 is cheating (and hence avoid sending a signed inconsistent transcript),
this would enable P1 to cheat without the risk of generating a proof of cheating.
Asharov and Orlandi address this by preventing P1 from knowing P2 ’s
challenge when producing the response. In their protocol, P1 sends the GCs to
P2 and opens the checked circuits by responding to the challenge through a
1-out-of-λ OT. For this, P1 first sends all (signed) GCs to P2 . Then the players
run OT, where in the i-th input to the OT P1 provides openings (seeds) for all
the GCs except for the i-th, as well as the input wire labels needed to evaluate
Ĉ_i. Party P2 inputs a random γ ∈ [λ], so it receives from the OT the seeds for
all circuits other than Ĉ_γ, as well as the wire labels for its input to Ĉ_γ. Then, P2
checks that all GCs besides Ĉ_γ are constructed correctly; if the check passes,
P2 evaluates Ĉ_γ. Thus, P1 does not know which GC is being evaluated, and
which ones are checked.
However, a more careful examination shows that this does not quite achieve
the PVC goal. Indeed, a malicious P1 can simply include invalid
openings for the OT secrets which correspond to the undesirable choices of
the challenge γ. The Asharov-Orlandi protocol addresses this by having P1
sign all its messages as well as using a signed-OT in place of all standard OTs
(including wire label transfers and GC openings). Informally, the signed-OT
functionality proceeds as follows. Rather than the receiver R getting message
m_b (which might include a signature that S produced) from the sender S
for choice bit b, the signature component of signed-OT is explicit in the OT
definition. Namely, we require that R receives ((b, m_b), Sig), where Sig is S's
valid signature of (b, m_b). This guarantees that R will always receive a valid
signature on the OT output it receives. Thus, if R's challenge detects cheating,
the (inconsistent) transcript will be signed by S, so it can be used as proof
of this cheating. Asharov and Orlandi (2012) show that this construction is
ε-PVC-secure for ε = (1 − 1/λ)(1 − 2^{−ν+1}), where ν is the replication factor
of the employed XOR tree, discussed above. For example, λ = 3 garbled
circuits and an XOR tree with ν = 5 yield ε = (1 − 1/3)(1 − 2^{−4}) = 0.625.
We note that their signed-OT heavily relies on public-key operations, and
cannot use the much more efficient OT extension.
Since the view of R plays the role of the proof of cheating, we must ensure
certain non-malleability of the view of R, to prevent it from defaming the
honest S. For this, we need to commit R to its particular choices throughout
the OT extension protocol. At the same time, we must maintain that those
commitments do not leak any information about R’s choices. Next, we sketch
how this can be done, assuming familiarity with the details of the OT extension
of Ishai et al. (2003) (IKNP from Section 3.7.2).
Recall that in the standard IKNP OT extension protocol, R constructs a
random matrix M, and S obtains a matrix M′ derived from the matrix M,
S's random string s, and R's vector of OT inputs r. The matrix M is the
main component of R's view which, together with S's signed messages, will
constitute a proof of cheating.
To reiterate, we must address two issues. First, because M′ is obtained
by applying R's private input r to M, and M′ is known to S, M is now
sensitive and cannot be revealed. Second, we must prevent R from publishing
a doctored M, which would enable a false cheating accusation. Kolesnikov
and Malozemoff (2015) (KM) resolve both issues by observing that S does in
fact learn some of the elements of M, since in the OT extension construction
some of the columns of M and M′ are the same (i.e., those corresponding to
zero bits of S's string s).
The KM signed-OT construction prevents R from cheating by having S
include in its signature carefully selected information from the columns in M
which S sees. Finally, the protocol requires that R generate each row of M
from a seed, and that R’s proof of cheating includes this seed such that the row
rebuilt from the seed is consistent with the columns included in S’s signature.
Kolesnikov and Malozemoff (2015) show that this makes it infeasible for R to
successfully present an invalid row of the OT matrix in the proof of cheating.
The KM construction is in the random oracle model, a slight strengthening of
the assumptions needed for standard OT extension and FreeXOR, two standard
secure computation tools.
The KM construction is also interesting from a theoretical perspective in
that it shows how to construct signed-OT from any maliciously secure OT
protocol, whereas Asharov and Orlandi (2012) build a specific construction
based on the Decisional Diffie-Hellman problem.
7.5 Reducing Communication in Cut-and-Choose Protocols

Recall that in cut-and-choose protocols, P1 generates and sends a number of
garbled circuits, of which several are opened and checked, and one (or more for
malicious security) is evaluated. The opened garbled circuits have no further
use, since they hold no secrets. They only serve as a commitment for purposes
of the challenge protocol. Can commitments to these GCs be sent and verified
instead, achieving the same utility?
Indeed, as formalized by Goyal et al. (2008), this is possible in covert and
malicious cut-and-choose protocols. One must, of course, be careful with the
exact construction. One suitable construction is provided by Goyal et al. (2008).
Kolesnikov and Malozemoff (2015) formalize a specific variant of hashing,
which works with their PVC protocol, resulting in a PVC protocol with the
same communication cost as the semi-honest Yao's GC protocol.
Another way to trade a small amount of leakage for performance is dual
execution (Mohassel and Franklin, 2006), in which the parties run two instances
of the semi-honest Yao's GC protocol with their roles swapped, and then
securely compare the resulting outputs for equality. Because the equality test
has only one bit of output, it can be shown that the dual-execution protocol
leaks at most one (adversarially-chosen) bit describing the honest party's input.
In a follow-up work, Huang et al. (2012b) formalize the dual-execution
definition of Mohassel and Franklin and propose several optimizations, in-
cluding showing how the two executions can be interleaved to minimize the
latency overhead of dual execution relative to a single semi-honest execution.
In another follow-up work, Kolesnikov et al. (2015) show how the leakage
function of dual-execution 2PC can be greatly restricted and the probability
of leakage occurring reduced.
8 Conclusion

In the past decade or so, MPC has made dramatic strides, developing from
a theoretical curiosity into a versatile tool for building privacy-preserving
applications. For most uses, the key metric is cost, and the cost of deploying
MPC has declined by 3–9 orders of magnitude in the past decade.
The first reported 2PC system, Fairplay (Malkhi et al., 2004), executed a
4383-gate circuit in the semi-honest model, taking over 7 seconds on a local
area network at a rate of about 625 gates per second. Modern 2PC frameworks
can execute about 3 million gates per second on a 1Gbps LAN, and scale to
circuits with hundreds of billions of gates.
Cost improvements for malicious secure MPC have been even more
dramatic. The first substantial attempt to implement malicious secure generic
MPC was Lindell et al. (2008), intriguingly titled “Implementing Two-Party
Computation Efficiently with Security Against Malicious Adversaries”. It
reports malicious evaluation of the 16-bit comparison circuit, consisting of
fifteen 3-to-1 gates and one 2-to-1 gate, in between 135 and 362 seconds
depending on the security parameter settings. This evaluation rate corresponds
to about 0.13 gates per second. An implementation of the authenticated garbling
scheme (Section 6.7) reports malicious security 2PC at over 0.8 million gates
per second on a 10Gbps LAN (Wang et al., 2017b). This corresponds to over a
six-million-fold improvement in under a decade.
Cost. Despite dramatic advances in the 2PC and MPC technology in the past
decade, secure function evaluation still may incur several orders of magnitude
cost penalty over standard (non-private) execution, especially when protecting
against malicious players. The exact overhead varies greatly from virtually
non-existent to unacceptable, and mostly depends on the computed function.
In particular, for generic MPC, the protocols described in this book (which
are, arguably, the ones that are most scalable today in typical settings) all
require bandwidth that scales linearly in the size of the circuit. Bandwidth
within a data center is inexpensive (indeed, many cloud providers do not charge
customers anything for bandwidth between nodes within the same data center),
but it requires a strong trust model to assume all participants in an MPC
would be willing to outsource their computation to the same cloud provider.
In some use cases, this linear bandwidth cost may be prohibitively expensive.
Making bandwidth cost sublinear in circuit size requires a very different
paradigm. Although it has been shown to be possible with threshold-based
FHE schemes (Asharov et al., 2012), such schemes are a long way from being
practical. Recent results have shown that function-sharing schemes can be used
to build lower-bandwidth MPC protocols for certain classes of functions (Boyle
et al., 2016a; Boyle et al., 2018).
The solution to this bandwidth cost seems to require hybrid protocols that
combine MPC with custom protocols or homomorphic encryption to enable
secure computation without linear bandwidth cost. We have covered several
approaches that incorporate these strategies in MPC, including private set
intersection.
Output leakage. The goal of MPC is to protect the privacy of inputs and
intermediate results, but at the end of the protocol the output of the function is
revealed. A separate research field has developed around the complementary
problem where there is no need to protect the input data, but the output must
be controlled to limit what an adversary can infer about the private data from
the output. The dominant model for controlling output leakage is differential
privacy (Dwork and Roth, 2014) which adds privacy-preserving noise to
outputs before they are revealed. A few works have explored combining MPC
with differential privacy to provide end-to-end privacy for computations with
distributed data (Pettai and Laud, 2015; Kairouz et al., 2015; He et al., 2017),
but this area is in its infancy and many challenging problems need to be solved
before the impacts of different types of leakage are well understood.
Acknowledgements

The authors thank Jeanette Wing for instigating this project; our editors at Now
Publishers, James Finlay and Mike Casey, for help and flexibility throughout
the writing process; and Alet Heezemans for help with the final editing. We
thank Patricia Thaine for particularly helpful comments and corrections, and
Weiran Liu and Shengchao Ding for their work on a Chinese translation and
the detailed suggestions and corrections resulting from that effort.
Vladimir Kolesnikov was supported in part by Sandia National Laboratories,
a multimission laboratory managed and operated by National Technology
and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of
Honeywell International, Inc., for the U.S. Department of Energy’s National
Nuclear Security Administration under contract DE-NA-0003525.
Mike Rosulek was supported in part by the National Science Foundation
(award #1617197), a Google Research award, and a Visa Research award.
David Evans was supported in part by National Science Foundation awards
#1717950 and #1111781, and research awards from Google, Intel, and Amazon.
References
Ball, M., T. Malkin, and M. Rosulek. 2016. “Garbling Gadgets for Boolean
and Arithmetic Circuits”. In: ACM CCS 16: 23rd Conference on Computer
and Communications Security. Ed. by E. R. Weippl, S. Katzenbeisser,
C. Kruegel, A. C. Myers, and S. Halevi. ACM Press. 565–577.
Bar-Ilan Center for Research in Applied Cryptography and Cyber Security.
2014. “SCAPI: Secure Computation API”. https://cyber.biu.ac.il/scapi/.
Beaver, D. 1992. “Efficient Multiparty Protocols Using Circuit Randomization”.
In: Advances in Cryptology – CRYPTO’91. Ed. by J. Feigenbaum. Vol. 576.
Lecture Notes in Computer Science. Springer, Heidelberg. 420–432.
Beaver, D. 1995. “Precomputing Oblivious Transfer”. In: Advances in Cryp-
tology – CRYPTO’95. Ed. by D. Coppersmith. Vol. 963. Lecture Notes in
Computer Science. Springer, Heidelberg. 97–109.
Beaver, D. 1996. “Correlated Pseudorandomness and the Complexity of Private
Computations”. In: 28th Annual ACM Symposium on Theory of Computing.
ACM Press. 479–488.
Beaver, D., S. Micali, and P. Rogaway. 1990. “The Round Complexity of
Secure Protocols (Extended Abstract)”. In: 22nd Annual ACM Symposium
on Theory of Computing. ACM Press. 503–513.
Beerliová-Trubíniová, Z. and M. Hirt. 2008. “Perfectly-Secure MPC with Linear
Communication Complexity”. In: TCC 2008: 5th Theory of Cryptography
Conference. Ed. by R. Canetti. Vol. 4948. Lecture Notes in Computer
Science. Springer, Heidelberg. 213–230.
Beimel, A. and B. Chor. 1993. “Universally Ideal Secret Sharing Schemes
(Preliminary Version)”. In: Advances in Cryptology – CRYPTO’92. Ed. by
E. F. Brickell. Vol. 740. Lecture Notes in Computer Science. Springer,
Heidelberg. 183–195.
Bellare, M., V. T. Hoang, S. Keelveedhi, and P. Rogaway. 2013. “Efficient
Garbling from a Fixed-Key Blockcipher”. In: 2013 IEEE Symposium on
Security and Privacy. IEEE Computer Society Press. 478–492.
Bellare, M., V. T. Hoang, and P. Rogaway. 2012. “Foundations of garbled
circuits”. In: ACM CCS 12: 19th Conference on Computer and Communi-
cations Security. Ed. by T. Yu, G. Danezis, and V. D. Gligor. ACM Press.
784–796.
Lindell, Y. and B. Pinkas. 2007. “An Efficient Protocol for Secure Two-Party
Computation in the Presence of Malicious Adversaries”. In: Advances in
Cryptology – EUROCRYPT 2007. Ed. by M. Naor. Vol. 4515. Lecture
Notes in Computer Science. Springer, Heidelberg. 52–78.
Lindell, Y. and B. Pinkas. 2009. “A Proof of Security of Yao’s Protocol for
Two-Party Computation”. Journal of Cryptology. 22(2): 161–188.
Lindell, Y. and B. Pinkas. 2011. “Secure Two-Party Computation via Cut-and-
Choose Oblivious Transfer”. In: TCC 2011: 8th Theory of Cryptography
Conference. Ed. by Y. Ishai. Vol. 6597. Lecture Notes in Computer Science.
Springer, Heidelberg. 329–346.
Lindell, Y., B. Pinkas, and N. P. Smart. 2008. “Implementing Two-Party
Computation Efficiently with Security Against Malicious Adversaries”.
In: SCN 08: 6th International Conference on Security in Communication
Networks. Ed. by R. Ostrovsky, R. D. Prisco, and I. Visconti. Vol. 5229.
Lecture Notes in Computer Science. Springer, Heidelberg. 2–20.
Lindell, Y. and B. Riva. 2014. “Cut-and-Choose Yao-Based Secure Computa-
tion in the Online/Offline and Batch Settings”. In: Advances in Cryptology
– CRYPTO 2014, Part II. Ed. by J. A. Garay and R. Gennaro. Vol. 8617.
Lecture Notes in Computer Science. Springer, Heidelberg. 476–494. doi:
10.1007/978-3-662-44381-1_27.
Lindell, Y. and B. Riva. 2015. “Blazing Fast 2PC in the Offline/Online
Setting with Security for Malicious Adversaries”. In: ACM CCS 15: 22nd
Conference on Computer and Communications Security. Ed. by I. Ray,
N. Li, and C. Kruegel. ACM Press. 579–590.
Liu, J., M. Juuti, Y. Lu, and N. Asokan. 2017. “Oblivious Neural Network
Predictions via MiniONN Transformations”. In: ACM CCS 17: 24th
Conference on Computer and Communications Security. Ed. by B. M.
Thuraisingham, D. Evans, T. Malkin, and D. Xu. ACM Press. 619–631.
López-Alt, A., E. Tromer, and V. Vaikuntanathan. 2012. “On-the-fly multiparty
computation on the cloud via multikey fully homomorphic encryption”. In:
44th Annual ACM Symposium on Theory of Computing. ACM. 1219–1234.
Lu, S. and R. Ostrovsky. 2013. “How to Garble RAM Programs”. In: Advances
in Cryptology – EUROCRYPT 2013. Ed. by T. Johansson and P. Q. Nguyen.
Vol. 7881. Lecture Notes in Computer Science. Springer, Heidelberg. 719–
734. doi: 10.1007/978-3-642-38348-9_42.