Cryptanalytic Extraction of Neural Network Models
1 Introduction
The past decade has seen significant advances in machine learning, and deep
learning in particular. Tasks viewed as being completely infeasible at the begin-
ning of the decade became almost completely solved by the end. AlphaGo
[SHM+16] defeated professional players at Go, a feat in 2014 seen as being
at least ten years away [Lev14]. Accuracy on the ImageNet recognition bench-
mark improved from 73% in 2010 to 98.7% in 2019, a 20× reduction in error
rate [XHLL19]. Neural networks can generate photo-realistic high-resolution
images that humans find indistinguishable from actual photographs [KLA+19].
Neural networks achieve higher accuracy than human doctors in limited settings,
such as early cancer detection [EKN+17].
These advances have brought neural networks into production systems. The
automatic speech recognition systems on Google's Assistant, Apple's Siri, and
Amazon's Alexa are all powered by speech recognition neural networks. In the
threat model we consider, the machine learning model is made available as an
oracle $\mathcal{O}$ that can be queried, but with no timing or other side channels. This
setting captures that of obfuscated models made public, prediction APIs, and
secure inference.
Our attacks would be relatively straightforward given infinitely precise
floating-point math, but the realities of modern machine learning require far
more sophisticated attack techniques.
Table 1. Efficacy of our extraction attack, which is orders of magnitude more precise
than prior work and, for deeper neural networks, orders of magnitude more query-
efficient. Models denoted a-b-c are fully connected neural networks with input dimension
a, one hidden layer with b neurons, and c outputs; for formal definitions see Sect. 2.
For entries denoted with a †, the attack was unable to recover the network after ten attempts.
Model extraction attacks are classified into two categories [JCB+19]: task accuracy
extraction and fidelity extraction. The first paper to study task accuracy
extraction [TZJ+16] introduced techniques to steal similar models that approximately
solve the same underlying decision task on the natural data distribution,
but do not necessarily match the predictions of the oracle precisely. While
further work exists in this space [CCG+18,KTP+19], we instead focus on fidelity
extraction, where the adversary aims to faithfully reproduce the predictions of
the oracle model even when it is incorrect with respect to the ground truth. Again,
[TZJ+16] studied this problem and developed (what we would now call) functionally
equivalent extraction for the case of completely linear models.
This attack was then extended by a theoretical result defining and giving
a method for performing functionally-equivalent extraction for neural networks
with one layer, assuming oracle access to the gradients [MSDH19]. A concrete
implementation of this one-layer attack that works in practice, handling floating-point
imprecision, was subsequently developed by applying finite differences
to estimate the gradient [JCB+19]. Parallel work also extended these
results, focusing on deeper networks, but required tens to hundreds of millions
of queries [RK19]; while those theoretical results extend to deep networks,
the implementation in practice only extracts up to the first two layers. Our work
builds on all four of these results to develop an approach that is $10^6$ times more
accurate, requires $10^3$ times fewer queries, and applies to larger models.
Even without query access, it is possible to steal models with just a cache side-channel
[BBJP19], although with less fidelity than the attack we introduce,
which is $2^{20}\times$ more precise. Other attacks target hyperparameter extraction,
that is, extracting high-level details about the model: through what method it
was trained, whether it contains convolutions, or related questions [WG18]. It is further
possible to steal hyperparameters with cache side channels [HDK+20].
Recent work has studied the learnability of deep neural networks with random
weights in the statistical query (SQ) model [DGKP20], showing that learnability
drops off exponentially with the depth of the network. This line of work does not
address the cryptographic hardness of extraction in the non-SQ model—precisely
the question addressed in this work in the empirical setting.
While not directly related to our problem, it is worth noting that we are
not the first to treat neural networks as just another type of mathemati-
cal function that can be analyzed without any specific knowledge of machine
learning. Shamir et al. [SSRD19] explain the existence of adversarial examples
[SZS+14,BCM+13], which capture evasion attacks on machine learning classi-
fiers, by considering an abstract model of neural networks.
In a number of places, our attack draws inspiration from the cryptanalysis
of keyed block-ciphers, most prominently differential cryptanalysis [BS91]. We
neither assume nor require familiarity with this field, but the informed reader
may enjoy certain parallels.
2 Preliminaries
This paper studies an abstraction of neural networks as functions f : X → Y.
Our results are independent of any methods for selecting the function f (e.g.,
stochastic gradient descent), and are independent of any utility of the function f .
As such, machine learning knowledge is neither expected nor necessary.
A $k$-deep neural network is an alternating composition of affine layers $f_j$ and
the non-linear activation function $\sigma$:
$$f = f_{k+1} \circ \sigma \circ \cdots \circ \sigma \circ f_2 \circ \sigma \circ f_1.$$
We exclusively study neural networks over $\mathcal{X} = \mathbb{R}^{d_0}$ and $\mathcal{Y} = \mathbb{R}^{d_k}$. (Until Sect. 5
we assume floating-point numbers can represent $\mathbb{R}$ exactly.)
Definition 3. The $j$th layer of the neural network $f_j$ is given by the affine
transformation $f_j(x) = A^{(j)}x + b^{(j)}$. The weights $A^{(j)} \in \mathbb{R}^{d_j \times d_{j-1}}$ form a $d_j \times d_{j-1}$
matrix; the biases $b^{(j)} \in \mathbb{R}^{d_j}$ form a $d_j$-dimensional vector.
While representing each layer fj as a full matrix product is the most general defi-
nition of a layer, which is called fully connected, often layers have more structure.
For example, it is common to use (discrete) convolutions in neural networks that
operate on images. Convolutional layers take the input as an $n \times m$ matrix and
convolve it with a kernel, such as a 3 × 3 matrix. Importantly, however, it is
always possible to represent a convolution as a matrix product.
In this paper we exclusively study the ReLU [NH10] activation function, given
by σ(x) = max(x, 0). Our results are a fundamental consequence of the fact that
ReLU neural networks are piecewise linear functions.
We use the shorthand a-b-c neural network to denote the sizes of each dimension;
for example a 10-20-5 neural network has input dimension 10, one layer with 20
neurons, and output dimension 5. This description completely characterizes the
structure of f for fully connected networks. In practice, there are only a few
architectures that represent most of the deployed deep learning models [ZL16],
and developing new architectures is an extremely difficult and active area of
research [HZRS16,SIVA17,TL19].
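To make the abstraction concrete, the following is a minimal sketch of evaluating
such a fully connected ReLU network (in Python with NumPy; the layer sizes and
random weights are purely illustrative):

```python
import numpy as np

def relu(x):
    # sigma(x) = max(x, 0), applied coordinate-wise
    return np.maximum(x, 0.0)

def neural_network(weights, biases, x):
    """Evaluate f = f_{k+1} o sigma o ... o sigma o f_1 at x, where each
    layer is the affine map f_j(x) = A^(j) x + b^(j)."""
    h = x
    for A, b in zip(weights[:-1], biases[:-1]):
        h = relu(A @ h + b)              # hidden layers: affine then ReLU
    return weights[-1] @ h + biases[-1]  # final layer: affine only

# A random 10-20-5 network: input dimension 10, one hidden layer with
# 20 neurons, and 5 outputs.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((20, 10)), rng.standard_normal((5, 20))]
biases = [rng.standard_normal(20), rng.standard_normal(5)]
y = neural_network(weights, biases, rng.standard_normal(10))
```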
Given oracle access to the function $f_\theta$, we can estimate its directional derivatives
through finite differences along arbitrary directions. For a simple linear function
$f(x) = a \cdot x + b$, the directional derivative satisfies $\frac{\partial f}{\partial e_i} \equiv a_i$, where $e_i$ is the $i$th
standard basis vector and $a_i$ is the $i$th entry of the vector $a$, allowing direct recovery of
the weights by querying on these well-chosen inputs.
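As an illustrative sketch of this weight read-off (NumPy; the step size is an
assumption tuned for float64):

```python
import numpy as np

def directional_derivative(f, x, d, eps=1e-6):
    # Estimate the derivative of f at x along direction d by finite differences.
    return (f(x + eps * d) - f(x)) / eps

# For a purely linear f(x) = a . x + b, querying along each standard basis
# vector e_i recovers the weight a_i directly.
a, b = np.array([1.5, -2.0, 0.25]), 0.7
f = lambda x: a @ x + b
recovered = np.array([directional_derivative(f, np.zeros(3), e)
                      for e in np.eye(3)])
assert np.allclose(recovered, a, atol=1e-6)
```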
In the case of deep neural networks, we consider second partial directional
derivatives. ReLU neural networks are piecewise linear functions with $\frac{\partial^2 f}{\partial x^2} \equiv 0$
almost everywhere, except when the function has some neuron $\eta_j$ at the boundary
between its negative and positive region (i.e., at its critical point). We
show that the value of the second partial derivative $\frac{\partial^2 f}{\partial e_i^2}$, evaluated at a point $x$ where
neuron $\eta_j$ is at such a critical point, directly reveals the weight $T(A^{(1)}_{i,j})$
for some transform $T$ that is invertible, and therefore the adversary can learn
$A^{(1)}_{i,j}$. By repeating this attack along all basis vectors $e_i$ and for all neurons $\eta_j$ we
can recover the complete matrix $A^{(1)}$. Once we have extracted the first layer's
weights, we are able to "peel off" that layer and re-mount our attack on the
second layer of the neural network, repeating to the final layer. There are three
core technical difficulties to our attack:
Recovering the neuron signs. For each neuron $\eta$, our attack does not exactly
recover $A^{(l)}_i$, the $i$th row of $A^{(l)}$, but instead a scalar multiple $v = \alpha \cdot A^{(l)}_i$. While
losing a constant $\alpha > 0$ keeps the neural network in the same equivalence class,
the sign of $\alpha$ is important and we must distinguish between the weight vector
$A^{(l)}_i$ and $-A^{(l)}_i$. We construct two approaches that solve this problem, but in the
general case we require exponential work (but a linear number of queries).
Controlling inner-layer hidden state. On the first layer, we can directly compute
the derivative entry-by-entry, measuring $\frac{\partial^2 f}{\partial e_i^2}$ for each standard basis vector $e_i$ in
order to recover $A^{(1)}_{i,j}$. Deeper in the network, we cannot move along standard
basis vectors. Worse, for each input $x$, on average half of the neurons are in
the negative region and thus their output is identically 0; when this happens it is
not possible to learn the weight along edges with value zero. Thus we are required
to develop techniques to elicit behavior from every neuron, and techniques to
cluster together partial recoveries of each row of $A^{(l)}$ to form a complete recovery.
Zero-deep neural networks are linear functions $f(x) \equiv A^{(1)} \cdot x + b^{(1)}$. Querying
$d_0 + 1$ points in general position suffices to extract $f$ by solving the resulting linear system.
However, let us view this problem differently, to illuminate our attack strategy
for deeper networks. Consider the parallel evaluations $f(x)$ and $f(x + \delta)$: their
difference satisfies $f(x + \delta) - f(x) = A^{(1)}\delta$, and choosing $\delta = \varepsilon e_i$ isolates the $i$th column.
This allows us to directly read off the weights of $A^{(1)}$. Put differently, we perform
finite differences to estimate the gradient of $f$, given by $\nabla_x f(x) \equiv A^{(1)}$.
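A minimal sketch of this 0-deep extraction (NumPy; the random test model and
the query count $d_0 + 1$ are illustrative):

```python
import numpy as np

def extract_zero_deep(oracle, d0, seed=0):
    """Recover A^(1), b^(1) of a linear model f(x) = A x + b from queries:
    stack rows [x, 1] and solve the resulting linear system."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((d0 + 1, d0))         # d0 + 1 queries suffice
    Y = np.stack([oracle(x) for x in X])          # one oracle call per row
    design = np.hstack([X, np.ones((d0 + 1, 1))])  # append 1 for the bias
    sol, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return sol[:-1].T, sol[-1]                    # (A_hat, b_hat)

A = np.array([[2.0, -1.0], [0.5, 3.0]])
b = np.array([1.0, -2.0])
A_hat, b_hat = extract_zero_deep(lambda x: A @ x + b, d0=2)
assert np.allclose(A_hat, A) and np.allclose(b_hat, b)
```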
Fig. 2. (left) Geometry of a 1-deep neural network. The three solid lines correspond to
"critical hyperplanes" of neurons. We identify one witness to each neuron with binary
search on the dotted line. (right) For each discovered critical point, we compute the
second partial derivative along axes $e_1$ and $e_2$ to compute the angle of the hyperplane.
For the remainder of this paper it will be useful to have two distinct mental
models of the problem at hand. First is the symbolic view shown previously in
Fig. 1. This view directly studies the flow of information through the neural
networks, represented as an alternating sequence of linear layers and non-linear
transformations. This view helps in understanding the algebraic steps of our attack.
The second is the geometric view. Because neural networks operate over the
real vector space, they can be visualized by plotting two dimensional slices of
the landscape [MSDH19]. Figure 2 (left) contains an example of such a figure.
Each solid black line corresponds to a change in gradient induced in the space
by a neuron changing sign from positive to negative (or vice versa)—ignoring for
now the remaining lines. The problem of neural network extraction corresponds
to recovering the locations and angles of these neuron-induced hyperplanes: in
general with input dimension d0 , the planes have dimension d0 − 1.
Definition 8. The function that computes the first $j$ layers (up to and including
$f_j$ but not including $\sigma$) of $f$ is denoted $f_{1..j}$. In particular, $f = f_{1..k}$.

Definition 9. The hidden state at layer $j$ is the output of the function $f_{1..j}$,
before applying the nonlinear transformation $\sigma$.

Layer $f_j$ is a linear transformation of the $(j-1)$st hidden state after $\sigma$.

Definition 10. $\mathcal{V}(\eta; x)$ denotes the input to neuron $\eta$ (before applying $\sigma$) when
evaluated at $x$. $\mathcal{L}(\eta)$ denotes the layer of neuron $\eta$. The first layer starts at 1.

Definition 11. A neuron $\eta$ is at a critical point when $\mathcal{V}(\eta; x) = 0$. We refer
to this input $x$ as a witness to the fact that $\eta$ is at a critical point, denoted by
$x \in \mathcal{W}(\eta)$. If $\mathcal{V}(\eta; x) > 0$ then $\eta$ is active, and otherwise it is inactive.
In Fig. 2 the locations of these critical points correspond exactly to the solid
black lines drawn through the plane. Observe that because we restrict ourselves
to ReLU neural networks, the function f is piecewise linear and infinitely differ-
entiable almost everywhere. The gradient ∇x f (x) is well defined at all points x
except when there exists a neuron that is at its critical point.
Extracting the rows of $A^{(1)}$ up to sign. Functionally, the attack as presented in
this subsection has appeared previously in the literature [MSDH19,JCB+19].
By framing it differently, our attack will be extensible to deeper networks.

Assume we were given a witness $x^* \in \mathcal{W}(\eta_j)$ that causes neuron $\eta_j$ to be
at its critical point (i.e., its value is identically zero). Because we are using the
ReLU activation function, this is the point at which the neuron is currently
"inactive" (i.e., not contributing to the output of the classifier) but would
become "active" (i.e., contributing to the output) if it became slightly positive.
Further assume that only this neuron $\eta_j$ is at its critical point, and that for all
other neurons $\eta \neq \eta_j$ we have $|\mathcal{V}(\eta; x^*)| > \delta$ for a constant $\delta > 0$.
Consider two parallel executions of the neural network on pairs of examples.
Begin by defining $e_i$ as the standard basis vectors of $\mathcal{X} = \mathbb{R}^{d_0}$. By querying on
pairs of inputs around $x^* + \varepsilon_0 e_1$ and around $x^* - \varepsilon_0 e_1$ we can estimate
$$\alpha_+^i = \left.\frac{\partial f(x)}{\partial e_i}\right|_{x = x^* + \varepsilon_0 e_1} \qquad\text{and}\qquad \alpha_-^i = \left.\frac{\partial f(x)}{\partial e_i}\right|_{x = x^* - \varepsilon_0 e_1}$$
through finite differences.
Consider the quantity $|\alpha_+^i - \alpha_-^i|$. Because $x^*$ induces a critical point of $\eta_j$,
exactly one of $\{\alpha_+^i, \alpha_-^i\}$ will have the neuron $\eta_j$ in its active regime and the
other will have $\eta_j$ in its inactive regime. If no two columns of $A^{(1)}$ are collinear,
then as long as the steps taken are smaller than $\delta / \max_{i,j}|A^{(1)}_{i,j}|$, we are guaranteed that all other neurons in the
neural network will remain in the same state as before: either active or inactive.
Therefore, if we compute the difference $|\alpha_+^i - \alpha_-^i|$, the gradient information
flowing into and out of all other neurons will cancel, and we will be left with just
the gradient information flowing along the edge from input coordinate $i$ to
neuron $\eta_j$ to the output. Concretely, we can write the 1-deep neural network as
$$f(x) = A^{(2)}\,\mathrm{ReLU}(A^{(1)}x + b^{(1)}) + b^{(2)},$$
and so either $\alpha_+^i - \alpha_-^i = A^{(1)}_{j,i} \cdot A^{(2)}$ or $\alpha_-^i - \alpha_+^i = A^{(1)}_{j,i} \cdot A^{(2)}$. However, if we repeat
the above procedure on a new basis vector $e_k$, then either $\alpha_+^k - \alpha_-^k = A^{(1)}_{j,k} \cdot A^{(2)}$
or $\alpha_-^k - \alpha_+^k = A^{(1)}_{j,k} \cdot A^{(2)}$ will hold. Crucially, whichever of the two relations
holds along coordinate $i$ will be the same relation that holds along coordinate $k$.
Therefore we can divide out $A^{(2)}$ to obtain the ratio of pairs of weights
$$\frac{\alpha_+^k - \alpha_-^k}{\alpha_+^i - \alpha_-^i} = \frac{A^{(1)}_{j,k}}{A^{(1)}_{j,i}}.$$
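The following sketch demonstrates this differencing on a toy 1-deep network
(NumPy; the network, the witness $x^* = 0$, the fixed stepping direction $e_1$, and
the two step sizes $\varepsilon \ll \varepsilon_0$ are illustrative assumptions):

```python
import numpy as np

def alpha_diff(f, x_star, e_i, step_dir, eps0=1e-3, eps=1e-7):
    """alpha_+^i - alpha_-^i: the partial derivative along e_i, measured on
    the two sides of the critical hyperplane through x*. The step is taken
    along the fixed direction step_dir, and eps must stay well below eps0
    so the derivative estimate does not cross the hyperplane."""
    xp, xm = x_star + eps0 * step_dir, x_star - eps0 * step_dir
    a_plus = (f(xp + eps * e_i) - f(xp)) / eps
    a_minus = (f(xm + eps * e_i) - f(xm)) / eps
    return a_plus - a_minus

# Toy 1-deep network with a single neuron: f(x) = A2 relu(A1 x + b1) + b2.
A1 = np.array([[1.0, -2.0, 0.5]]); b1 = np.array([0.0])
A2 = np.array([3.0]); b2 = 0.1
f = lambda x: float(A2 @ np.maximum(A1 @ x + b1, 0) + b2)
x_star = np.zeros(3)            # A1 @ x* + b1 = 0: the neuron is critical

diffs = np.array([alpha_diff(f, x_star, e, np.eye(3)[0]) for e in np.eye(3)])
ratios = diffs / diffs[0]       # recovers A1[0, :] / A1[0, 0], signs included
assert np.allclose(ratios, A1[0] / A1[0, 0], atol=1e-4)
```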
Finding witnesses to critical points. It only remains to show how to find witnesses
$x^* \in \mathcal{W}(\eta)$ for each neuron $\eta$ on the first layer. We choose a random line in input
space (the dashed line in Fig. 2, left) and search along it for nonlinearities in the
partial derivative. Any nonlinearity must have resulted from a ReLU changing
signs, and locating the specific location where the ReLU changes signs will give
us a critical point. We do this by binary search.

To begin, we take a random initial point $x_0 \in \mathbb{R}^{d_0}$ and direction $v \in \mathbb{R}^{d_0}$, together
with a large range $T$. We perform a binary search for nonlinearities in $f(x_0 + tv)$ for
$t \in [-T, T]$. That is, for a given interval $[t_0, t_1]$, we know a critical point exists in
the interval if $\frac{\partial f(x_0+tv)}{\partial t}\big|_{t=t_0} \neq \frac{\partial f(x_0+tv)}{\partial t}\big|_{t=t_1}$. If these quantities are equal, we do
not search the interval; otherwise we continue with the binary search.
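A sketch of this search (NumPy; tolerances are illustrative, and the skipping
rule inherits the caveat that sign changes that exactly cancel inside one
interval would be missed):

```python
import numpy as np

def grad_along(f, x, v, eps=1e-6):
    # First derivative of t -> f(x + t v), by forward finite differences.
    return (f(x + eps * v) - f(x)) / eps

def find_critical_points(f, x0, v, T=10.0, tol=1e-6):
    """Binary-search f(x0 + t v) on [-T, T] for witnesses to critical points.
    Intervals whose endpoint derivatives already agree are skipped."""
    witnesses = []
    def search(t0, t1, g0, g1):
        if abs(g0 - g1) < 1e-8:       # same linear piece at both ends: skip
            return
        if t1 - t0 < tol:             # interval collapsed: record a witness
            witnesses.append(x0 + 0.5 * (t0 + t1) * v)
            return
        tm = 0.5 * (t0 + t1)
        gm = grad_along(f, x0 + tm * v, v)
        search(t0, tm, g0, gm)
        search(tm, t1, gm, g1)
    search(-T, T, grad_along(f, x0 - T * v, v), grad_along(f, x0 + T * v, v))
    return witnesses

# Demo on a toy 1-deep network: one witness per neuron whose critical
# hyperplane crosses the search line.
rng = np.random.default_rng(0)
A1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
A2 = rng.standard_normal(4)
f = lambda x: A2 @ np.maximum(A1 @ x + b1, 0)
witnesses = find_critical_points(f, np.zeros(3), rng.standard_normal(3))
```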
Extracting the second layer. Once we have fully recovered the first-layer weights,
we can "peel off" the weight matrix $A^{(1)}$ and bias $b^{(1)}$, and we are left with
extracting the final linear layer, which reduces to 0-deep extraction.
Critical Points Can Occur Due to ReLUs on Different Layers. Because 1-deep
networks have only one layer, all ReLUs occur on that layer. Therefore all critical
points found during search will correspond to a neuron on that layer. For k-deep
networks this is not true, and if we want to begin by extracting the first layer
we will have to remove non-first layer critical points. (And, in general, to extract
layer j, we will have to remove non-layer-j critical points.)
The Weight Recovery Procedure Requires Complete Control of the Input. In order
to be able to directly read off the weights, we query the network on basis vec-
tors ei . Achieving this is not always possible for deep networks, and we must
account for the fact that we may only be able to query on non-orthogonal direc-
tions.
In a neighborhood of an input $x^*$, the network collapses to a linear function, which
we can write as
$$f_{\mathrm{local}}(x) = A^{(k+1)} I^{(k)} A^{(k)} \cdots I^{(2)} A^{(2)} I^{(1)} A^{(1)} x + \beta.$$
Here, the $I^{(j)}$ are 0–1 diagonal matrices with a 0 on the diagonal when the neuron
is inactive and a 1 on the diagonal when the neuron is active; for the first layer,
$I^{(1)}_{n,n} = \mathbb{1}\{\mathcal{V}(\eta_n; x^*) > 0\}$, where $\eta_n$ is the $n$th neuron on the first layer. Importantly, observe that each $I^{(j)}$
is constant as long as $x$ is sufficiently close to $x^*$. While $\beta$ is unknown, as long as
we make only gradient queries $\partial f_{\mathrm{local}}$ its value is unimportant. This observation
so far follows from the definition of piecewise linearity.
Consider now some input that is a witness to exactly one critical point, on
neuron $\eta^*$. Formally, $x^* \in \mathcal{W}(\eta^*)$ but $x^* \notin \bigcup_{\eta_j \neq \eta^*} \mathcal{W}(\eta_j)$. Then
$$f_{\mathrm{local}}(x) = A^{(k+1)} I^{(k)} A^{(k)} \cdots I^{(2)} A^{(2)} I^{(1)}(x) A^{(1)} x + \beta(x),$$
where again the $I^{(j)}$ are 0–1 matrices, except that now $I^{(1)}$ (and only $I^{(1)}$)
is a function of $x$ returning a 0–1 diagonal matrix that has one of two values,
depending on whether $\mathcal{V}(\eta^*; x) > 0$. Therefore we can no longer collapse the
matrix product into one matrix $\Gamma$, but instead can only obtain
$$f_{\mathrm{local}}(x) = \Gamma\, I^{(1)}(x)\, A^{(1)} x + \beta(x).$$
But this is exactly the case we have already solved for 1-deep neural network
weight recovery: it is equivalent to the statement $f_{\mathrm{local}}(x) = \Gamma\,\sigma(A^{(1)}x + b^{(1)}) + \beta_2$,
and so by dividing out $\Gamma$ exactly as before we can recover the ratios of the entries $A^{(1)}_{i,j}$.
Finding first-layer critical points. Assume we are given a set of inputs $S = \{x_i\}$
so that each $x_i$ is a witness to some neuron $\eta_{x_i}$, with $\eta_{x_i}$ unknown. By the coupon
collector's argument (assuming uniformity), for $|S| \gg N \log N$, where $N$ is the
total number of neurons, we will have at least two witnesses to every neuron $\eta$.

Without loss of generality let $x_0, x_1 \in \mathcal{W}(\eta)$ be witnesses to the same neuron
$\eta$ on the first layer, i.e., $\mathcal{V}(\eta; x_0) = \mathcal{V}(\eta; x_1) = 0$. Then performing the
weight recovery procedure beginning from each of these witnesses (through finite
differences) will yield the correct weight vector $A^{(1)}_j$ up to a scalar.
Typically elements of S will not be witnesses to neurons on the first layer.
Without loss of generality let x2 and x3 be witnesses to any neuron on a deeper
layer. We claim that we will be able to detect these error cases: the outputs of
the extraction algorithm will appear to be random and uncorrelated. Informally
speaking, because we are running an attack designed to extract first-layer neu-
rons on a neuron actually on a later layer, it is exceedingly unlikely that the
attack would, by chance, give consistent results when run on x2 and x3 (or any
arbitrary pair of neurons).
Formally, let $h_2 = f_1(x_2)$ and $h_3 = f_1(x_3)$. With high probability, $\mathrm{sign}(h_2) \neq \mathrm{sign}(h_3)$.
Therefore, when executing the extraction procedure on $x_2$ we compute
over the function $\Gamma_1 I^{(1)}(x_2) A^{(1)} x + \beta_1$, whereas extracting on $x_3$ computes over
$\Gamma_2 I^{(1)}(x_3) A^{(1)} x + \beta_2$. Because $\Gamma_1 \neq \Gamma_2$, this will give inconsistent results.
Therefore our first layer weight recovery procedure is as follows. For all inputs
xi ∈ S run the weight recovery procedure to recover the unit-length normal
vector to each critical hyperplane. We should expect to see a large number of
vectors only once (because they were the result of running the extraction of a
layer 2 or greater neuron), and a small number of vectors that appear duplicated
(because they were the result of successful extraction on the first layer). Given
the first layer, we can reduce the neural network from a k-deep neural network
to a (k − 1)-deep neural network and repeat the attack. We must resolve two
difficulties, however, discussed in the following two subsections.
Fig. 3. (left) Geometry of a k-deep neural network, following [RK19]. Critical hyperplanes
induced by neurons $\eta_0, \eta_1, \eta_2$ are on the first layer and are linear. Critical
hyperplanes induced by neurons $\eta_3, \eta_4$ are on the second layer and are "bent" by
neurons on the first layer. The critical hyperplane induced by neuron $\eta_5$ is on the
third layer and is bent by neurons on the prior two layers. (right) Diagram of
the hyperplane-following procedure. Given an initial witness to a critical point $x$, follow
the hyperplane to the double-critical point $x'$. To find where it goes next, perform
binary search along the dashed line and find the witness $y$.
While most small neural networks are contractive, in practice almost all
interesting neural networks are expansive: the number of neurons on some inter-
mediate layer is larger than the number of inputs to that layer. Almost all of
the prior methods still apply in this setting, with one exception: the column sign
recovery procedure. Thus, we are required to develop a new strategy.
Recovering signs of the last layer. Observe that sign information is not lost for
the final layer: there is no ReLU activation, so we can directly solve for the
weights with least squares without losing sign information.
Let $h_i = \sigma(f_{1..k-1}(x_i))$. Enumerate all $2^{d_k}$ assignments of the sign vector $s$ and
compute $g_i = \sigma((s \odot \hat{A}^{(k)})h_i + (s \odot \hat{b}^{(k)}))$. We know that if we guessed the sign
vector $s$ correctly, then there would exist a solution to the system of equations
$v \cdot g_i + b = f(x_i)$. This is the zero-deep extraction problem, and solving it efficiently
requires just a single call to least squares. This allows us, through brute-forcing the
sign bits, to completely recover both the signs of the second-to-last layer as well
as the values (and signs) of the final layer.
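A sketch of this brute-force procedure (NumPy; the toy model in the demo,
where the hidden inputs $h_i$ are simply the raw inputs, is an illustrative
assumption):

```python
import itertools
import numpy as np

def recover_last_hidden_signs(A_hat, b_hat, hidden, outputs):
    """Try all 2^{d_k} sign vectors s: for the correct s, the outputs are an
    exact linear function of g_i = relu((s * A_hat) h_i + s * b_hat), so a
    single least-squares call has (near-)zero residual."""
    n, d = outputs.shape[0], A_hat.shape[0]
    best = None
    for s in itertools.product([-1.0, 1.0], repeat=d):
        s = np.array(s)
        g = np.maximum(hidden @ (s[:, None] * A_hat).T + s * b_hat, 0)
        design = np.hstack([g, np.ones((n, 1))])    # columns for v and bias b
        sol, *_ = np.linalg.lstsq(design, outputs, rcond=None)
        residual = np.linalg.norm(design @ sol - outputs)
        if best is None or residual < best[0]:
            best = (residual, s, sol)
    return best   # (residual, sign vector, final-layer solution)

# Demo: recover the signs of a 3-neuron layer extracted up to sign.
rng = np.random.default_rng(1)
A1, b1 = rng.standard_normal((3, 4)), rng.standard_normal(3)
v, c = rng.standard_normal(3), 0.2
xs = rng.standard_normal((50, 4))
f_out = (np.maximum(xs @ A1.T + b1, 0) @ v + c)[:, None]    # oracle outputs
true_s = np.array([1.0, -1.0, 1.0])
res, s_found, _ = recover_last_hidden_signs(true_s[:, None] * A1,
                                            true_s * b1, xs, f_out)
assert np.allclose(s_found, true_s) and res < 1e-8
```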
Unfortunately, this procedure does not scale to recover the signs of layer k −1
and earlier. It relies on the existence of an efficient testing procedure (namely,
least squares) to solve the final layer. If we attempted this brute-force strategy at
layer k − 3 in order to test if our sign assignment was correct, we would need to
run the complete layer k − 2 extraction procedure, thus incurring an exponential
number of queries to the oracle.
However, we can use this idea in order to still recover signs even at earlier
layers in the network with only a linear number of queries (but still exponential
work in the width of the hidden layers).
Recovering signs of arbitrary hidden layers. Assume that we are given a collection
of examples $\{x_i\} \subset \mathcal{W}(\eta)$ for some neuron $\eta$ on the layer after the ones
extracted so far: $\mathcal{L}(\eta) = j + 1$. Then we would know that there should exist a
single unknown vector $v$ and bias $b$ such that $v \cdot \sigma(f_{1..j}(x_i)) + b = 0$ for all $x_i$.

This gives us an efficient procedure to test whether or not a given sign assignment
on layer $j$ is correct. As before, we enumerate all possible sign assignments
and then check whether we can recover such a vector $v$. If so, the assignment is correct;
if not, it is wrong. It only remains to show how to obtain such a collection of inputs $\{x_i\}$.
Observe that the layer-$j$ polytope around $x$ is an open, convex set, as long as $x$
is not a witness to a critical point. In Fig. 3, each enclosed region is a layer-$k$
polytope, and the triangle formed by $\eta_0$, $\eta_1$, and $\eta_2$ is a layer-$(k-1)$ polytope.

Given an input $x$ and direction $\Delta$, we can compute the distance $\alpha$ so that
the value $x' = x + \alpha\Delta$ is at the boundary of the polytope defined by layers 1
to $j$. That is, starting from $x$ and traveling along direction $\Delta$, we stop the first time
a neuron on layer $j$ or earlier reaches a critical point. Formally, we define
$$\mathrm{Proj}_{1..j}(x, \Delta) = \min\{\alpha > 0 : \exists\, \eta \text{ with } \mathcal{L}(\eta) \leq j \text{ and } \mathcal{V}(\eta;\, x + \alpha\Delta) = 0\}.$$
We only ever compute $\mathrm{Proj}_{1..j}$ when we have extracted the neural network up to
layer $j$. Thus we perform the computation with respect to the extracted function
$\hat{f}$ and neuron-value function $\hat{\mathcal{V}}$, and so computing this function requires no
queries to the oracle. In practice we solve for $\alpha$ via binary search.
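A sketch of this computation (Python; the interface V_hat(eta, x), standing in
for the extracted neuron-value function $\hat{\mathcal{V}}$, is a hypothetical stand-in, and the
search assumes alpha_max overshoots the polytope boundary):

```python
import numpy as np

def proj_1j(V_hat, neurons_leq_j, x, delta, alpha_max=1e6, iters=60):
    """Compute Proj_{1..j}(x, delta): the distance along delta at which the
    first neuron on layers 1..j of the *extracted* model reaches a critical
    point. Runs entirely on V_hat, so it issues no oracle queries. Assumes
    no tracked neuron is critical at x itself."""
    base = [np.sign(V_hat(eta, x)) for eta in neurons_leq_j]

    def crossed(alpha):
        # Has any tracked neuron changed sign between x and x + alpha*delta?
        return any(np.sign(V_hat(eta, x + alpha * delta)) != s0
                   for eta, s0 in zip(neurons_leq_j, base))

    lo, hi = 0.0, alpha_max
    for _ in range(iters):            # binary search for the first crossing
        mid = 0.5 * (lo + hi)
        if crossed(mid):
            hi = mid
        else:
            lo = mid
    return hi
```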
If instead the sign assignment on layer $j$ is incorrect, then with high
probability there will not exist a valid linear transformation $w, b$. Thus we can
recover the signs with a linear number of queries and exponential work.
Reading this section. Each sub-section that follows is independent of the surrounding
sub-sections and modifies algorithms introduced in Sect. 4. For brevity,
we assume complete knowledge of the original algorithm and share the same
notation. Readers may find it helpful to review the original algorithm before
proceeding to each subsection.
Ideally we would solve this resulting system with least squares. However, in
practice, occasionally the conversion from $x$ to $x'$ fails because $x'$ is no longer a
witness to the same neuron $\eta$. This happens when there is some other neuron
$\eta'$ that is closer to $x'$ than the true neuron $\eta$. Because least squares is not
robust to outliers, this procedure can fail to improve the solution.
We take two steps to ensure this does not happen. First, observe that if
$\Delta$ is smaller, the likelihood of capturing incorrect neurons $\eta'$ decreases faster
than the likelihood of capturing the correct neuron $\eta$. Thus, we set $\Delta$ small
enough that roughly half of the attempts at finding a witness $x'$ fail.
Second, we apply a (more) robust method of solving the system of equations [JOB+18].
However, even these two techniques taken together occasionally fail to find
valid solutions that improve the quality. When this happens, we reject the
proposed improvement and keep the original value.
Our attack could be improved with a solution to the following robust statistics
problem: Given a (known) set $S \subset \mathbb{R}^N$ such that for some (unknown) weight
vector $w$ we have $\Pr_{x \in S}[|w \cdot x + 1| \leq \epsilon] > \delta$ for sufficiently small $\epsilon$, sufficiently
large $\delta > 0.5$, and $\delta|S| > N$, efficiently recover the vector $w$ to high precision.
Most of the methods in this paper are built on computing second partial derivatives
of the neural network $f$, and therefore developing a robust method for
estimating the gradient is necessary. Throughout Sect. 4 we compute the partial
derivative of $f$ along direction $\alpha$, evaluated at $x$ with step size $\varepsilon$, as
$$\frac{\partial f(x)}{\partial_\varepsilon \alpha} \;\overset{\mathrm{def}}{=}\; \frac{f(x + \varepsilon \cdot \alpha) - f(x)}{\varepsilon}.$$
To compute the second partial derivative earlier, we computed $\alpha_+^i$ and $\alpha_-^i$
by first taking a step to $x^* + \varepsilon_0 e_1$ for a different step size $\varepsilon_0$ and then
computing the first partial derivative at this location. However, with floating-point
imprecision it is not desirable to have two step sizes ($\varepsilon_0$ controlling the
distance away from $x^*$ to step, and $\varepsilon$ controlling the step size when computing the
partial derivative). Worse, we must have $\varepsilon \ll \varepsilon_0$: if $\varepsilon\,\partial f/\partial e_i$ exceeds $\varepsilon_0\,\partial f/\partial e_1$, then
when computing the partial derivative along $e_i$ we may cross the hyperplane and
estimate the first partial derivative incorrectly. Therefore, instead we compute
$$\alpha_+^i = \left.\frac{\partial f(x)}{\partial e_i}\right|_{x = x^* + \varepsilon e_i} \qquad\text{and}\qquad \alpha_-^i = \left.\frac{\partial f(x)}{\partial (-e_i)}\right|_{x = x^* - \varepsilon e_i},$$
where we both step along $e_i$ and also take the partial derivative along the same
$e_i$ (and similarly for $-e_i$). This removes the requirement for an additional hyperparameter
and allows the step size $\varepsilon$ to be orders of magnitude larger, but
introduces a new error: we now lose the relative signs of the entries in the row
when performing extraction, and can only recover $\left|A^{(1)}_{i,j}/A^{(1)}_{i,k}\right|$.
Extracting column signs. We next recover the value $\mathrm{sign}(A^{(1)}_{i,j}) \cdot \mathrm{sign}(A^{(1)}_{i,k})$. Fortunately,
the same differencing process allows us to learn this information, using
the following observation: if $A^{(1)}_{i,j}$ and $A^{(1)}_{i,k}$ have the same sign, then moving in
the $e_j + e_k$ direction will cause their contributions to add. If they have different
signs, their contributions will cancel each other. That is, if
$$\alpha_+^{j+k} - \alpha_-^{j+k} = \alpha_+^{j} - \alpha_-^{j} + \alpha_+^{k} - \alpha_-^{k},$$
we have that
$$(A^{(1)}_{i,j} + A^{(1)}_{i,k}) \cdot A^{(2)} = A^{(1)}_{i,j} \cdot A^{(2)} + A^{(1)}_{i,k} \cdot A^{(2)},$$
i.e., the two contributions added and the weights share a sign; if instead the two
sides differ, the contributions cancelled and the signs differ.

We can repeat this process to test whether each $A^{(1)}_{i,j}$ has the same sign as
(for example) $A^{(1)}_{i,1}$. However, we still do not know whether any single $A^{(1)}_{i,j}$ is
positive or negative: we still must recover the row signs as done previously.
f (x∗ )
xα x∗ xβ xα x∗ xβ xα x∗ xβ
Fig. 4. Efficient and accurate witness discovery. (left) If $x_\alpha$ and $x_\beta$ differ in only one
ReLU, we can precisely identify the location $x^*$ at which the ReLU reaches its critical
point. (middle) If instead more than one ReLU differs, we can detect that this has
happened: the predicted value of $\hat{f}(\cdot)$ evaluated at $x^*$, as inferred from intersecting
the dotted lines, does not actually equal the true value of $f(x^*)$. (right) This procedure
is not sound and may still incorrectly identify critical points; in practice we find these
cases are rare.
Suppose the two endpoints $x_\alpha$ and $x_\beta$ differ in the sign of exactly one neuron $\eta_i$;
then we can directly compute the location $x^*$ at which $\mathcal{V}(\eta_i; x^*) = 0$ but such
that for all other $\eta_j$ we have
$$\mathrm{sign}(\mathcal{V}(\eta_j; x_\alpha)) = \mathrm{sign}(\mathcal{V}(\eta_j; x^*)) = \mathrm{sign}(\mathcal{V}(\eta_j; x_\beta)).$$
This approach is illustrated in Fig. 4 and relies on the fact that $f$, restricted to this
segment, is a piecewise linear function with two components. By measuring $f(x_\alpha)$ and $\partial f(x_\alpha)$ (resp.,
$f(x_\beta)$ and $\partial f(x_\beta)$), we find the slope and intercept of both the left and right lines
in Fig. 4 (left). This allows us to solve for their expected intersection $(x^*, \hat{f}(x^*))$.
Typically, if there are more than two linear segments, as in the middle of the
figure, we will find that the true function value f (x∗ ) will not agree with the
expected function value fˆ(x∗ ) we obtained by computing the intersection; we
can then perform binary search again and repeat the procedure.
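A one-dimensional sketch of this intersect-and-check refinement (NumPy; the
step size, tolerance, and demo function are illustrative):

```python
import numpy as np

def refine_witness(f, x_a, x_b, eps=1e-6, tol=1e-7):
    """Intersect the two linear pieces through x_a and x_b to predict the
    critical point x* between them; accept only if f agrees at x*."""
    v = x_b - x_a
    g_a = (f(x_a + eps * v) - f(x_a)) / eps   # slope (w.r.t. t) at x_a
    g_b = (f(x_b) - f(x_b - eps * v)) / eps   # slope (w.r.t. t) at x_b
    if abs(g_a - g_b) < 1e-12:
        return None                 # a single linear piece: nothing between
    # Lines y = f(x_a) + g_a t and y = f(x_b) + g_b (t - 1) intersect at t*.
    t_star = (f(x_b) - g_b - f(x_a)) / (g_a - g_b)
    x_star = x_a + t_star * v
    predicted = f(x_a) + g_a * t_star
    if abs(f(x_star) - predicted) > tol:
        return None                 # more than one ReLU flipped: split again
    return x_star

# Demo: a single ReLU kink at x = 2 on the segment [0, 5].
f1 = lambda x: 3.0 * max(x[0] - 2.0, 0.0) + 0.5
x_star = refine_witness(f1, np.array([0.0]), np.array([5.0]))
assert abs(x_star[0] - 2.0) < 1e-6
```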
However, we lose some soundness from this procedure. As we see in Fig. 4
(right), situations may arise where many ReLU units change sign between xα
and xβ , but fˆ(x∗ ) = f (x∗ ). In this case, we would erroneously return x∗ as a
critical point, and miss all of the other critical points in the range. Fortunately,
this error case is pathological and does not occur in practice.
Recall that to extract $\hat{A}^{(l)}$ we extract candidates $\{r_i\}$ and search for
pairs $r_i, r_j$ that agree on multiple coordinates. This allows us to merge $r_i$ and $r_j$
to (eventually) recover full rows of $\hat{A}^{(l)}$. With floating-point error, the unification
algorithm in Sect. 4.3 fails for several reasons.
Our core algorithm computes the normal to a hyperplane, returning pairwise
ratios $\hat{A}^{(1)}_{i,j}/\hat{A}^{(1)}_{i,k}$; throughout Sect. 4 we set $\hat{A}^{(1)}_{i,1} = 1$ without loss of generality.

Unfortunately, in practice there is loss of generality, due to the disparate
impact of numerical instability. Consider the case where $A^{(l)}_{i,1} < 10^{-\alpha}$ for $\alpha \gg 0$,
but $A^{(l)}_{i,k} \geq 1$ for all other $k$. Then there will be substantially more (relative)
floating-point imprecision in the weight $A^{(l)}_{i,1}$ than in the other weights. Before
normalizing there is no cause for concern, since the absolute error is no larger
than for any other weight. However, the described algorithm now normalizes every other
coordinate $A^{(l)}_{i,k}$ by dividing it by $A^{(l)}_{i,1}$, polluting the precision of these values.
Therefore we adjust our solution. At layer $l$, we are given a collection of vectors
$R = \{r_i\}_{i=1}^{n}$ so that each $r_i$ corresponds to the extraction of some (unknown)
neuron $\eta_i$. First, we need an algorithm to cluster the items into sets $\{S_j\}_{j=1}^{d_l}$ so
that $S_j \subset R$ and every vector in $S_j$ corresponds to one neuron on layer $l$.
We then need to unify each set $S_j$ to obtain the final row $\hat{A}^{(l)}_j$.
Creating the Subsets $S$ with Graph Clustering. Let $r_m^{(a)} \in S_n$ denote the $a$th
coordinate of the extracted row $r_m$ from cluster $n$. Begin by constructing a
graph $G = (V, E)$ where each vector $r_i$ corresponds to a vertex. Let $\delta_{ij}^{(k)} =
|r_i^{(k)} - r_j^{(k)}|$ denote the difference between rows $r_i$ and $r_j$ along axis $k$; then
connect an edge from $r_i$ to $r_j$ when the approximate $\ell_0$ norm is sufficiently
large: $\left|\{k : \delta_{ij}^{(k)} < \varepsilon\}\right| > \log d_0$. We compute the connected components of $G$ and
take each connected component as one set $S_j$. Observe that if $\varepsilon = 0$ then this
procedure is exactly what was described earlier, pairing vectors whose entries
agree perfectly; in practice we find a value of $\varepsilon = 10^{-5}$ suffices.
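A sketch of the clustering step (NumPy; eps follows the $10^{-5}$ above, and the
agreement threshold follows the $\log d_0$ rule):

```python
import numpy as np

def cluster_rows(rows, eps=1e-5):
    """Connect extracted rows r_i, r_j when more than log(d0) coordinates
    agree to within eps, then return the connected components as clusters."""
    R = np.asarray(rows)
    n, d0 = R.shape
    threshold = np.log(d0)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if np.sum(np.abs(R[i] - R[j]) < eps) > threshold:
                adj[i].append(j)
                adj[j].append(i)
    seen, clusters = [False] * n, []
    for s in range(n):                 # depth-first search for components
        if seen[s]:
            continue
        stack, comp = [s], []
        while stack:
            u = stack.pop()
            if seen[u]:
                continue
            seen[u] = True
            comp.append(u)
            stack.extend(adj[u])
        clusters.append(comp)
    return clusters
```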
Unifying Each Cluster to Obtain the Row Weights. We construct the three-dimensional
tensor $M_{i,a,b} = r_i^{(a)} / r_i^{(b)}$. Given $M$, a good guess for the scalar $C_{ab}$
so that $r^{(a)} = r^{(b)} \cdot C_{ab}$ along as many extractions $i$ as possible is the assignment
$C_{ab} = \mathrm{median}_i\, M_{i,a,b}$, where the estimated error is $e_{ab} = \mathrm{stdev}_i\, M_{i,a,b}$.

If all $r_a$ were complete and had no imprecision then $C_{ab}$ would have no error,
and so $C_{ab} = C_{ax} \cdot C_{xb}$. However, because it does have error, we can iteratively
improve the guessed $C$ matrix by observing that if the error $e_{ax} + e_{xb} < e_{ab}$ then
the assignment $C_{ax} \cdot C_{xb}$ is a better guess than $C_{ab}$. Thus we replace
$C_{ab} \leftarrow C_{ax} \cdot C_{xb}$ and update $e_{ab} \leftarrow e_{ax} + e_{xb}$. We iterate this process until
there is no further improvement. Then, finally, we choose the optimal dimension
$a^* = \arg\min_a \sum_b e_{ab}$ and return the vector $C_{a^*}$. Observe that this procedure
closely follows constructing the union of two partial entries $r_i$ and $r_j$, except that
we perform it along the best axis possible for each coordinate.
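A sketch of this unification (NumPy; encoding missing coordinates as NaN, and
returning the column $C_{\cdot,a^*}$, i.e. the row scaled so coordinate $a^*$ equals 1, is
our reading of "return the vector $C_{a^*}$"):

```python
import numpy as np

def unify_cluster(rows):
    """Merge partial extractions of one row via pairwise median ratios.
    rows: shape (n_extractions, d0); missing coordinates given as np.nan."""
    R = np.asarray(rows, dtype=float)
    n, d = R.shape
    with np.errstate(divide="ignore", invalid="ignore"):
        M = R[:, :, None] / R[:, None, :]    # M[i, a, b] = r_i[a] / r_i[b]
    C = np.nanmedian(M, axis=0)              # guess for the scalar C[a, b]
    E = np.nanstd(M, axis=0)                 # estimated error e[a, b]
    for _ in range(d):                       # C[a,b] <- C[a,x] * C[x,b] when
        improved = False                     # the combined error is smaller
        for a in range(d):
            for b in range(d):
                for x in range(d):
                    if E[a, x] + E[x, b] < E[a, b]:
                        C[a, b] = C[a, x] * C[x, b]
                        E[a, b] = E[a, x] + E[x, b]
                        improved = True
        if not improved:
            break
    a_best = np.argmin(np.nansum(E, axis=1))  # most reliable base coordinate
    return C[:, a_best]                       # row scaled so entry a_best = 1
```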
The projection gives a point that lies on both critical hyperplanes of the extracted
model, but it has likely drifted off of the original critical plane induced by $\eta^*$
(Fig. 5).

To address this, after computing $\alpha$ we initially take a smaller step and let
$x_1 = x^* + \sqrt{\alpha}\, r$. We then refine the location of this point to a point $x_2$ by
performing binary search on the region $x_1 - \epsilon n$ to $x_1 + \epsilon n$ for a small step $\epsilon$.
If there was no error in computing $n$ then $x_1 = x_2$, because both are already
witnesses to $\eta^*$. If not, any error has been corrected. Given $x^*$ and $x_2$ we
can now compute $\alpha_2 = \mathrm{Proj}_{1..j}(x^*, x_2 - x^*)$ and let $\bar{x} = x^* + (x_2 - x^*)\alpha_2$, which
will actually be a witness to both neurons simultaneously.
Next we give a stable method to compute $y$, a witness to $\eta^*$ on the other
side of $\eta_u$. The previous procedure required a search parallel to $\eta_u$ and
infinitesimally displaced, but this is not numerically stable without yet accurately
knowing the normal to the hyperplane given by $\eta_u$.

Instead we perform the following procedure. Choose two orthogonal vectors
$\beta, \gamma$ of equal length and perform binary search on the line segments that
trace out the perimeter of a square with corners $\bar{x} \pm \beta \pm \gamma$.
When β is small, the number of critical points crossed will be exactly four:
two because of ηu and two because of η ∗ . As long as the number of critical points
remains four, we double the length of β and γ.
Eventually we will discover more than four critical points, when the perimeter
of the square intersects another neuron $\eta_z$. At this point we stop increasing the
size of the box, and we can compute the continuation direction of $\eta^*$ by discarding the
points that fall on $\eta_u$. We then choose $y$ as the point on $\eta^*$ that intersected
the largest square searched.
Given the initial coordinate $x$ and after computing the normal $n$ to the
hyperplane, we have $d_0 - 1$ dimensions to choose between for where to travel
next. Instead of choosing a random $r$ with $r \cdot n = 0$, we choose $r$ such that we
make progress towards obtaining a fully diverse set $W$.

Define $W_i$ as the set of witnesses that have been found so far. We say that this
set is diverse on neuron $\eta$ if there exist $x^+, x^- \in W_i$ such that $\mathcal{V}(\eta; x^+) \geq 0$
and $\mathcal{V}(\eta; x^-) < 0$. Choose an arbitrary neuron $\eta_t$ such that $W_i$ is not diverse
on $\eta_t$. (If there are multiple such options, we should prefer the neuron that would
be easiest to reach, but this is secondary.)

Our goal will be to choose a direction $r$ such that (1) as before, $r \cdot n = 0$,
and (2) $W_i \cup \{x + \alpha r\}$ is closer to being fully diverse. Here, "closer" means
that $d(W) = \min_{x \in W} |\mathcal{V}(\eta_t; x)|$ is smaller. Because the set is not yet diverse on
$\eta_t$, all values are either positive or negative, and it is our objective to switch the
sign, and therefore first become closer to zero. Our procedure therefore chooses
the direction $r$ that most quickly decreases $d(W)$.
6 Evaluation
In practice we set $|\bar{S}| = 10^9$ and compute the maximum over this set, so that
evaluating the function takes under an hour per neural network.
Error bounds propagation. The most direct method to compute $(\varepsilon, 0)$-functional
equivalence of the extracted neural network $\hat{f}$ is to compare the weights $A^{(i)}$
to the weights $\hat{A}^{(i)}$ and analytically derive an upper bound on the error when
performing inference. Observe that (1) permuting the order of the neurons in
the network does not change the output, and (2) any row can be multiplied by a
positive scalar $c > 0$ if the corresponding column in the next layer is divided by $c$.

Thus, before we can compare $\hat{A}^{(i)}$ to $A^{(i)}$ we must "align" them. We identify
the permutation mapping the rows of $\hat{A}^{(l)}$ to the rows of $A^{(l)}$ through a greedy
matching algorithm, and then compute a single scalar per row, $s \in \mathbb{R}_+^{d_l}$. To
ensure that multiplying by a scalar does not change the output of the network,
we multiply the columns of the next layer $\hat{A}^{(l+1)}$ by $1/s$ (with the inverse taken
elementwise). The process to align the bias vectors $b^{(l)}$ is identical, and the process
is repeated for each further layer.
This gives aligned $\tilde{A}^{(i)}$ and $\tilde{b}^{(i)}$ from which we can analytically derive
upper bounds on the error. Let $\Delta_i = \tilde{A}^{(i)} - A^{(i)}$, and let $\delta_i$ be the largest
singular value of $\Delta_i$. If the $\ell_2$-norm of the maximum error going into layer $i$ is
given by $e_i$, then we can bound the maximum error going out of layer $i$ as
$$e_{i+1} \leq \delta_i \cdot e_i + \|\tilde{b}^{(i)} - b^{(i)}\|_2.$$
By propagating bounds layer-by-layer we can obtain an upper bound on the
maximum error of the output of the model.
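A direct sketch of this propagation (NumPy; it implements exactly the
recurrence above, using that ReLU is 1-Lipschitz and so does not increase the
error between layers):

```python
import numpy as np

def equivalence_error_bound(A_tilde, b_tilde, A, b):
    """Propagate e_{i+1} <= delta_i * e_i + ||b~^(i) - b^(i)||_2 layer by
    layer, with delta_i the largest singular value of A~^(i) - A^(i)."""
    e = 0.0                                   # no error going into layer 1
    for At, bt, Ai, bi in zip(A_tilde, b_tilde, A, b):
        delta = np.linalg.svd(At - Ai, compute_uv=False)[0]
        e = delta * e + np.linalg.norm(bt - bi)
    return e

# Demo: a slightly perturbed copy of a random 3-5-2 network.
rng = np.random.default_rng(0)
A = [rng.standard_normal((5, 3)), rng.standard_normal((2, 5))]
b = [rng.standard_normal(5), rng.standard_normal(2)]
A_tilde = [M + 1e-6 * rng.standard_normal(M.shape) for M in A]
b_tilde = [v + 1e-6 * rng.standard_normal(v.shape) for v in b]
bound = equivalence_error_bound(A_tilde, b_tilde, A, b)
```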
This method is able to prove an upper bound on $(\varepsilon, 0)$-functional equivalence
for some networks, when the pairing algorithm succeeds. However, we find that
there are some networks that are $(2^{-45}, 10^{-9})$-functionally equivalent but where
the weight alignment procedure fails. Therefore, we suspect that there are more
equivalence classes of functions than scalar multiples of permuted neurons, and
so we develop further methods for tightly computing $(\varepsilon, 0)$-functional equivalence.
7 Results
We extract a wide range of neural network architectures; key results are given
in Table 1 (Sect. 1). We compute $(\varepsilon, \delta)$-functional equivalence at $\delta = 10^{-9}$ and
$\delta = 0$ on the domain $\mathcal{S} = \{x : \|x\|_2 < d_0 \wedge x \in \mathcal{X}\}$, sufficient to explore both
sides of every neuron.
8 Concluding Remarks
We introduce a cryptanalytic method for extracting the weights of a neural
network by drawing analogies to cryptanalysis of keyed ciphers. Our differential
attack requires multiple orders of magnitude fewer queries per parameter than
prior work and extracts models that are multiple orders of magnitude more
accurate than prior work. In this work, we do not consider defenses; promising
approaches include detecting when an attack is occurring, adding noise at some
stage of the model's computation, or returning only the label corresponding to
the output. Any of these would easily break our presented attack.
The practicality of this attack has implications for many areas of machine
learning and cryptographic research. The field of secure inference relies on the
assumption that observing the output of a neural network does not reveal the
weights. This assumption is false, and therefore the field of secure inference will
need to develop new techniques to protect the secrecy of trained models.
We believe that by casting neural network extraction as a cryptanalytic prob-
lem, even more advanced cryptanalytic techniques will be able to greatly improve
on our results, reducing the computational complexity, reducing the query com-
plexity and reducing the number of assumptions necessary.
References
[BBJP19] Batina, L., Bhasin, S., Jap, D., Picek, S.: CSI NN: reverse engineering of
neural network architectures through electromagnetic side channel. In: 28th
USENIX Security Symposium (2019)
[BCB15] Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly
learning to align and translate. In: 3rd International Conference on Learning
Representations (ICLR) (2015)
[BCM+13] Biggio, B., et al.: Evasion attacks against machine learning at test time.
In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds.) ECML PKDD
2013. LNCS (LNAI), vol. 8190, pp. 387–402. Springer, Heidelberg (2013).
https://doi.org/10.1007/978-3-642-40994-3_25
[BFH+18] Bradbury, J., et al.: JAX: composable transformations of Python+NumPy
programs (2018)
[BS91] Biham, E., Shamir, A.: Differential cryptanalysis of DES-like cryptosys-
tems. J. Cryptol. 4(1), 3–72 (1991). https://doi.org/10.1007/BF00630563
[CCG+18] Chandrasekaran, V., Chaudhuri, K., Giacomelli, I., Jha, S., Yan, S.:
Exploring connections between active learning and model extraction. arXiv
preprint arXiv:1811.02054 (2018)
[CLE+19] Carlini, N., Liu, C., Erlingsson, Ú., Kos, J., Song, D.: The secret sharer:
evaluating and testing unintended memorization in neural networks. In:
USENIX Security Symposium, pp. 267–284 (2019)
[DGKP20] Das, A., Gollapudi, S., Kumar, R., Panigrahy, R.: On the learnability of
random deep networks. In: ACM-SIAM Symposium on Discrete Algorithms,
SODA 2020, pp. 398–410 (2020)
[EKN+17] Esteva, A., et al.: Dermatologist-level classification of skin cancer with deep
neural networks. Nature 542(7639), 115–118 (2017)
[FJR15] Fredrikson, M., Jha, S., Ristenpart, T.: Model inversion attacks that exploit
confidence information and basic countermeasures. In: ACM CCS, pp.
1322–1333 (2015)
[GBDL+16] Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M.,
Wernsing, J.: CryptoNets: applying neural networks to encrypted data with
high throughput and accuracy. In: International Conference on Machine
Learning, pp. 201–210 (2016)
[Gen09] Gentry, C.: A fully homomorphic encryption scheme. Ph.D. thesis, Stanford
University (2009)
[HDK+20] Hong, S., Davinroy, M., Kaya, Y., Dachman-Soled, D., Dumitraş, T.: How
to 0wn the NAS in your spare time. In: International Conference on Learn-
ing Representations (2020)
[HZRS16] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recog-
nition. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 770–778 (2016)
[JCB+19] Jagielski, M., Carlini, N., Berthelot, D., Kurakin, A., Papernot, N.: High-
fidelity extraction of neural network models. arXiv:1909.01838 (2019)
[JOB+18] Jagielski, M., Oprea, A., Biggio, B., Liu, C., Nita-Rotaru, C., Li, B.: Manip-
ulating machine learning: poisoning attacks and countermeasures for regres-
sion learning. In: 2018 IEEE Symposium on Security and Privacy (S&P),
pp. 19–35. IEEE (2018)
[KBD+17] Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex:
an efficient SMT solver for verifying deep neural networks. In: Majumdar,
R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 97–117. Springer,
Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_5
[KLA+19] Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila,
T.: Analyzing and improving the image quality of StyleGAN. CoRR,
abs/1912.04958 (2019)
[KTP+19] Krishna, K., Tomar, G.S., Parikh, A.P., Papernot, N., Iyyer, M.: Thieves
on sesame street! Model extraction of BERT-based APIs. arXiv preprint
arXiv:1910.12366 (2019)
[Lev14] Levinovitz, A.: The mystery of Go, the ancient game that computers still
can’t win. Wired, May 2014
[MLS+20] Mishra, P., Lehmkuhl, R., Srinivasan, A., Zheng, W., Popa, R.A.: DELPHI:
a cryptographic inference service for neural networks. In: 29th USENIX
Security Symposium (2020)
[MSDH19] Milli, S., Schmidt, L., Dragan, A.D., Hardt, M.: Model reconstruction
from model explanations. In: Proceedings of the Conference on Fairness,
Accountability, and Transparency, FAT* 2019, pp. 1–9 (2019)
[NH10] Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann
machines. In: Proceedings of the 27th International Conference on Machine
Learning (ICML), pp. 807–814 (2010)
[RK19] Rolnick, D., Kording, K.P.: Identifying weights and architectures of
unknown ReLU networks. arXiv preprint arXiv:1910.00744 (2019)
[RWT+18] Riazi, M.S., Weinert, C., Tkachenko, O., Songhori, E.M., Schneider, T.,
Koushanfar, F.: Chameleon: a hybrid secure computation framework for
machine learning applications. In: ACM ASIACCS, pp. 707–721 (2018)
[SHM+16] Silver, D., et al.: Mastering the game of Go with deep neural networks and
tree search. Nature 529(7587), 484 (2016)
[SIVA17] Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-
ResNet and the impact of residual connections on learning. In: Proceedings
of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI 2017,
pp. 4278–4284. AAAI Press (2017)
[SSRD19] Shamir, A., Safran, I., Ronen, E., Dunkelman, O.: A simple explanation for
the existence of adversarial examples with small Hamming distance. CoRR,
abs/1901.10861 (2019)
[SZS+14] Szegedy, C., et al.: Intriguing properties of neural networks. In: 2nd
International Conference on Learning Representations (ICLR 2014).
arXiv:1312.6199 (2014)
[TL19] Tan, M., Le, Q.V.: EfficientNet: rethinking model scaling for convolutional
neural networks. arXiv preprint arXiv:1905.11946 (2019)
[TZJ+16] Tramèr, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T.: Stealing
machine learning models via prediction APIs. In: USENIX Security Sym-
posium, pp. 601–618 (2016)
[Wen90] Wenskay, D.L.: Intellectual property protection for neural networks. Neural
Netw. 3(2), 229–236 (1990)
[WG18] Wang, B., Gong, N.Z.: Stealing hyperparameters in machine learning. In:
2018 IEEE Symposium on Security and Privacy (S&P), pp. 36–52. IEEE
(2018)
[WSC+16] Wu, Y., et al.: Google’s neural machine translation system: bridging the gap
between human and machine translation. arXiv preprint arXiv:1609.08144
(2016)
[XHLL19] Xie, Q., Hovy, E., Luong, M.-T., Le, Q.V.: Self-training with noisy student
improves ImageNet classification. arXiv preprint arXiv:1911.04252 (2019)
[Yao86] Yao, A.C.-C.: How to generate and exchange secrets. In: FOCS 1986, pp.
162–167. IEEE (1986)
[ZL16] Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning.
arXiv preprint arXiv:1611.01578 (2016)