Cryptanalytic Extraction of Neural Network Models
1 Introduction
The past decade has seen significant advances in machine learning, and deep
learning in particular. Tasks viewed as being completely infeasible at the begin-
ning of the decade became almost completely solved by the end. AlphaGo
[SHM+16] defeated professional players at Go, a feat in 2014 seen as being
at least ten years away [Lev14]. Accuracy on the ImageNet recognition bench-
mark improved from 73% in 2010 to 98.7% in 2019, a 20× reduction in error
rate [XHLL19]. Neural networks can generate photo-realistic high-resolution
images that humans find indistinguishable from actual photographs [KLA+19].
Neural networks achieve higher accuracy than human doctors in limited settings,
such as early cancer detection [EKN+17].
These advances have brought neural networks into production systems. The
automatic speech recognition systems on Google's Assistant, Apple's Siri, and
Amazon's Alexa are all powered by speech recognition neural networks. In the
threat model we consider, the machine learning model is made available as an
oracle $\mathcal{O}$ that can be queried, but with no timing or other side channels. This
setting captures that of obfuscated models made public, prediction APIs, and
secure inference.
Our attacks would be relatively straightforward given infinitely precise
floating-point math, but the realities of modern machine learning require far
more sophisticated attack techniques.
Table 1. Efficacy of our extraction attack, which is orders of magnitude more precise
than prior work and, for deeper neural networks, orders of magnitude more query-
efficient. Models denoted a-b-c are fully connected neural networks with input dimension
a, one hidden layer with b neurons, and c outputs; for formal definitions see Sect. 2.
For entries denoted with a †, the attack was unable to recover the network after ten attempts.
Model extraction attacks are classified into two categories [JCB+19]: task accuracy
extraction and fidelity extraction. The first paper to study task accuracy
extraction [TZJ+16] introduced techniques to steal similar models that approximately
solve the same underlying decision task on the natural data distribution,
but do not necessarily match the predictions of the oracle precisely. While
further work exists in this space [CCG+18,KTP+19], we instead focus on fidelity
extraction, where the adversary aims to faithfully reproduce the predictions of
the oracle model even when it is incorrect with respect to the ground truth. Again,
[TZJ+16] studied this problem and developed (what we would now call) functionally
equivalent extraction for the case of completely linear models.
This attack was then extended by a theoretical result defining and giving
a method for performing functionally-equivalent extraction for neural networks
with one layer, assuming oracle access to the gradients [MSDH19]. A concrete
implementation of this one-layer attack that works in practice, handling floating-point
imprecision, was subsequently developed by applying finite differences
to estimate the gradient [JCB+19]. Parallel work also extended these
results, focusing on deeper networks, but required tens to hundreds of millions
of queries [RK19]; while those theoretical results extend to deep networks,
the implementation in practice only extracts up to the first two layers. Our work
builds on all four of these results to develop an approach that is $10^6$ times more
accurate, requires $10^3$ times fewer queries, and applies to larger models.
Even without query access, it is possible to steal models with just a cache side-channel
[BBJP19], although with less fidelity than the attack we introduce,
which is $2^{20}\times$ more precise. Other attacks target hyperparameter extraction,
that is, extracting high-level details about the model: through what method it
was trained, whether it contains convolutions, or related questions [WG18]. It is further
possible to steal hyperparameters with cache side channels [HDK+20].
Recent work has studied the learnability of deep neural networks with random
weights in the statistical query (SQ) model [DGKP20], showing that learnability
drops off exponentially with the depth of the network. This line of work does not
address the cryptographic hardness of extraction in the non-SQ model—precisely
the question addressed in this work in the empirical setting.
While not directly related to our problem, it is worth noting that we are
not the first to treat neural networks as just another type of mathemati-
cal function that can be analyzed without any specific knowledge of machine
learning. Shamir et al. [SSRD19] explain the existence of adversarial examples
[SZS+14,BCM+13], which capture evasion attacks on machine learning classi-
fiers, by considering an abstract model of neural networks.
In a number of places, our attack draws inspiration from the cryptanalysis
of keyed block-ciphers, most prominently differential cryptanalysis [BS91]. We
neither assume nor require familiarity with this field, but the informed reader
may enjoy certain parallels.
2 Preliminaries
This paper studies an abstraction of neural networks as functions f : X → Y.
Our results are independent of any methods for selecting the function f (e.g.,
stochastic gradient descent), and are independent of any utility of the function f .
As such, machine learning knowledge is neither expected nor necessary.
A $k$-deep neural network is an alternating composition of affine layers $f_j$ and
the non-linear activation function $\sigma$:
$$f = f_{k+1} \circ \sigma \circ \cdots \circ \sigma \circ f_2 \circ \sigma \circ f_1.$$
We exclusively study neural networks over $\mathcal{X} = \mathbb{R}^{d_0}$ and $\mathcal{Y} = \mathbb{R}^{d_k}$. (Until Sect. 5
we assume floating-point numbers can represent $\mathbb{R}$ exactly.)
Definition 3. The $j$th layer of the neural network $f_j$ is given by the affine
transformation $f_j(x) = A^{(j)}x + b^{(j)}$. The weights $A^{(j)} \in \mathbb{R}^{d_j \times d_{j-1}}$ form a $d_j \times d_{j-1}$
matrix; the biases $b^{(j)} \in \mathbb{R}^{d_j}$ form a $d_j$-dimensional vector.
While representing each layer fj as a full matrix product is the most general defi-
nition of a layer, which is called fully connected, often layers have more structure.
For example, it is common to use (discrete) convolutions in neural networks that
operate on images. Convolutional layers take the input as an $n \times m$ matrix and
convolve it with a kernel, such as a 3 × 3 matrix. Importantly, however, it is
always possible to represent a convolution as a matrix product.
In this paper we exclusively study the ReLU [NH10] activation function, given
by σ(x) = max(x, 0). Our results are a fundamental consequence of the fact that
ReLU neural networks are piecewise linear functions.
We use the shorthand a-b-c neural network to denote the sizes of each dimension;
for example a 10-20-5 neural network has input dimension 10, one layer with 20
neurons, and output dimension 5. This description completely characterizes the
structure of f for fully connected networks. In practice, there are only a few
architectures that represent most of the deployed deep learning models [ZL16],
and developing new architectures is an extremely difficult and active area of
research [HZRS16,SIVA17,TL19].
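To make the abstraction concrete, the following is a minimal sketch of evaluating
such a fully connected ReLU network (in Python with NumPy; the layer sizes and
random weights are purely illustrative):

```python
import numpy as np

def relu(x):
    # sigma(x) = max(x, 0), applied coordinate-wise
    return np.maximum(x, 0.0)

def neural_network(weights, biases, x):
    """Evaluate f = f_{k+1} o sigma o ... o sigma o f_1 at x, where each
    layer is the affine map f_j(x) = A^(j) x + b^(j)."""
    h = x
    for A, b in zip(weights[:-1], biases[:-1]):
        h = relu(A @ h + b)              # hidden layers: affine then ReLU
    return weights[-1] @ h + biases[-1]  # final layer: affine only

# A random 10-20-5 network: input dimension 10, one hidden layer with
# 20 neurons, and 5 outputs.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((20, 10)), rng.standard_normal((5, 20))]
biases = [rng.standard_normal(20), rng.standard_normal(5)]
y = neural_network(weights, biases, rng.standard_normal(10))
```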
Given oracle access to the function $f_\theta$, we can estimate its directional derivatives
through finite differences along arbitrary directions. For a simple linear function
$f(x) = a \cdot x + b$, the directional derivative satisfies $\frac{\partial f}{\partial e_i} \equiv a_i$, where $e_i$ is the $i$th
standard basis vector and $a_i$ is the $i$th entry of the vector $a$, allowing direct recovery of
the weights by querying on these well-chosen inputs.
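As an illustrative sketch of this weight read-off (NumPy; the step size is an
assumption tuned for float64):

```python
import numpy as np

def directional_derivative(f, x, d, eps=1e-6):
    # Estimate the derivative of f at x along direction d by finite differences.
    return (f(x + eps * d) - f(x)) / eps

# For a purely linear f(x) = a . x + b, querying along each standard basis
# vector e_i recovers the weight a_i directly.
a, b = np.array([1.5, -2.0, 0.25]), 0.7
f = lambda x: a @ x + b
recovered = np.array([directional_derivative(f, np.zeros(3), e)
                      for e in np.eye(3)])
assert np.allclose(recovered, a, atol=1e-6)
```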
In the case of deep neural networks, we consider second partial directional
derivatives. ReLU neural networks are piecewise linear functions with $\frac{\partial^2 f}{\partial x^2} \equiv 0$
almost everywhere, except when the function has some neuron $\eta_j$ at the boundary
between its negative and positive region (i.e., at its critical point). We
show that the value of the second partial derivative $\frac{\partial^2 f}{\partial e_i^2}$, evaluated at a point $x$ where
neuron $\eta_j$ is at such a critical point, directly reveals the weight $T(A^{(1)}_{i,j})$
for some transform $T$ that is invertible, and therefore the adversary can learn
$A^{(1)}_{i,j}$. By repeating this attack along all basis vectors $e_i$ and for all neurons $\eta_j$ we
can recover the complete matrix $A^{(1)}$. Once we have extracted the first layer's
weights, we are able to "peel off" that layer and re-mount our attack on the
second layer of the neural network, repeating to the final layer. There are three
core technical difficulties to our attack:
Recovering the neuron signs. For each neuron $\eta$, our attack does not exactly
recover $A^{(l)}_i$, the $i$th row of $A^{(l)}$, but instead a scalar multiple $v = \alpha \cdot A^{(l)}_i$. While
losing a constant $\alpha > 0$ keeps the neural network in the same equivalence class,
the sign of $\alpha$ is important and we must distinguish between the weight vector
$A^{(l)}_i$ and $-A^{(l)}_i$. We construct two approaches that solve this problem, but in the
general case we require exponential work (but a linear number of queries).
Controlling inner-layer hidden state. On the first layer, we can directly compute
the derivative entry-by-entry, measuring $\frac{\partial^2 f}{\partial e_i^2}$ for each standard basis vector $e_i$ in
order to recover $A^{(1)}_{i,j}$. Deeper in the network, we cannot move along standard
basis vectors. Worse, for each input $x$, on average half of the neurons are in
the negative region and thus their output is identically 0; when this happens it is
not possible to learn the weight along edges with value zero. Thus we are required
to develop techniques to elicit behavior from every neuron, and techniques to
cluster together partial recoveries of each row of $A^{(l)}$ to form a complete recovery.
Zero-deep neural networks are linear functions $f(x) \equiv A^{(1)} \cdot x + b^{(1)}$. Querying
$d_0 + 1$ points in general position suffices to extract $f$ by solving the resulting linear system.
However, let us view this problem differently, to illuminate our attack strategy
for deeper networks. Consider the parallel evaluations $f(x)$ and $f(x + \delta)$: their
difference satisfies $f(x + \delta) - f(x) = A^{(1)}\delta$, and choosing $\delta = \varepsilon e_i$ isolates the $i$th column.
This allows us to directly read off the weights of $A^{(1)}$. Put differently, we perform
finite differences to estimate the gradient of $f$, given by $\nabla_x f(x) \equiv A^{(1)}$.
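A minimal sketch of this 0-deep extraction (NumPy; the random test model and
the query count $d_0 + 1$ are illustrative):

```python
import numpy as np

def extract_zero_deep(oracle, d0, seed=0):
    """Recover A^(1), b^(1) of a linear model f(x) = A x + b from queries:
    stack rows [x, 1] and solve the resulting linear system."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((d0 + 1, d0))         # d0 + 1 queries suffice
    Y = np.stack([oracle(x) for x in X])          # one oracle call per row
    design = np.hstack([X, np.ones((d0 + 1, 1))])  # append 1 for the bias
    sol, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return sol[:-1].T, sol[-1]                    # (A_hat, b_hat)

A = np.array([[2.0, -1.0], [0.5, 3.0]])
b = np.array([1.0, -2.0])
A_hat, b_hat = extract_zero_deep(lambda x: A @ x + b, d0=2)
assert np.allclose(A_hat, A) and np.allclose(b_hat, b)
```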
Fig. 2. (left) Geometry of a 1-deep neural network. The three solid lines correspond to
"critical hyperplanes" of neurons. We identify one witness to each neuron with binary
search on the dotted line. (right) For each discovered critical point, we compute the
second partial derivative along axes $e_1$ and $e_2$ to compute the angle of the hyperplane.
For the remainder of this paper it will be useful to have two distinct mental
models of the problem at hand. First is the symbolic view shown previously in
Fig. 1. This view directly studies the flow of information through the neural
networks, represented as an alternating sequence of linear layers and non-linear
transformations. This view helps in understanding the algebraic steps of our attack.
The second is the geometric view. Because neural networks operate over the
real vector space, they can be visualized by plotting two dimensional slices of
the landscape [MSDH19]. Figure 2 (left) contains an example of such a figure.
Each solid black line corresponds to a change in gradient induced in the space
by a neuron changing sign from positive to negative (or vice versa)—ignoring for
now the remaining lines. The problem of neural network extraction corresponds
to recovering the locations and angles of these neuron-induced hyperplanes: in
general with input dimension d0 , the planes have dimension d0 − 1.
Definition 8. The function that computes the first $j$ layers (up to and including
$f_j$ but not including $\sigma$) of $f$ is denoted $f_{1..j}$. In particular, $f = f_{1..k}$.

Definition 9. The hidden state at layer $j$ is the output of the function $f_{1..j}$,
before applying the nonlinear transformation $\sigma$.

Layer $f_j$ is a linear transformation of the $(j-1)$st hidden state after $\sigma$.

Definition 10. $\mathcal{V}(\eta; x)$ denotes the input to neuron $\eta$ (before applying $\sigma$) when
evaluated at $x$. $\mathcal{L}(\eta)$ denotes the layer of neuron $\eta$. The first layer starts at 1.

Definition 11. A neuron $\eta$ is at a critical point when $\mathcal{V}(\eta; x) = 0$. We refer
to this input $x$ as a witness to the fact that $\eta$ is at a critical point, denoted by
$x \in \mathcal{W}(\eta)$. If $\mathcal{V}(\eta; x) > 0$ then $\eta$ is active, and otherwise it is inactive.
In Fig. 2 the locations of these critical points correspond exactly to the solid
black lines drawn through the plane. Observe that because we restrict ourselves
to ReLU neural networks, the function f is piecewise linear and infinitely differ-
entiable almost everywhere. The gradient ∇x f (x) is well defined at all points x
except when there exists a neuron that is at its critical point.
Extracting the rows of $A^{(1)}$ up to sign. Functionally, the attack as presented in
this subsection has appeared previously in the literature [MSDH19,JCB+19].
By framing it differently, our attack will be extensible to deeper networks.

Assume we were given a witness $x^* \in \mathcal{W}(\eta_j)$ that causes neuron $\eta_j$ to be
at its critical point (i.e., its value is identically zero). Because we are using the
ReLU activation function, this is the point at which the neuron is currently
"inactive" (i.e., not contributing to the output of the classifier) but would
become "active" (i.e., contributing to the output) if it became slightly positive.
Further assume that only this neuron $\eta_j$ is at its critical point, and that for all
other neurons $\eta \neq \eta_j$ we have $|\mathcal{V}(\eta; x^*)| > \delta$ for a constant $\delta > 0$.
Consider two parallel executions of the neural network on pairs of examples.
Begin by defining $e_i$ as the standard basis vectors of $\mathcal{X} = \mathbb{R}^{d_0}$. By querying on
pairs of inputs around $x^* + \varepsilon_0 e_1$ and around $x^* - \varepsilon_0 e_1$ we can estimate
$$\alpha_+^i = \left.\frac{\partial f(x)}{\partial e_i}\right|_{x = x^* + \varepsilon_0 e_1} \qquad\text{and}\qquad \alpha_-^i = \left.\frac{\partial f(x)}{\partial e_i}\right|_{x = x^* - \varepsilon_0 e_1}$$
through finite differences.
Consider the quantity $|\alpha_+^i - \alpha_-^i|$. Because $x^*$ induces a critical point of $\eta_j$,
exactly one of $\{\alpha_+^i, \alpha_-^i\}$ will have the neuron $\eta_j$ in its active regime and the
other will have $\eta_j$ in its inactive regime. If no two columns of $A^{(1)}$ are collinear,
then as long as the steps taken are smaller than $\delta / \max_{i,j}|A^{(1)}_{i,j}|$, we are guaranteed that all other neurons in the
neural network will remain in the same state as before: either active or inactive.
Therefore, if we compute the difference $|\alpha_+^i - \alpha_-^i|$, the gradient information
flowing into and out of all other neurons will cancel, and we will be left with just
the gradient information flowing along the edge from input coordinate $i$ to
neuron $\eta_j$ to the output. Concretely, we can write the 1-deep neural network as
$$f(x) = A^{(2)}\,\mathrm{ReLU}(A^{(1)}x + b^{(1)}) + b^{(2)},$$
and so either $\alpha_+^i - \alpha_-^i = A^{(1)}_{j,i} \cdot A^{(2)}$ or $\alpha_-^i - \alpha_+^i = A^{(1)}_{j,i} \cdot A^{(2)}$. However, if we repeat
the above procedure on a new basis vector $e_k$, then either $\alpha_+^k - \alpha_-^k = A^{(1)}_{j,k} \cdot A^{(2)}$
or $\alpha_-^k - \alpha_+^k = A^{(1)}_{j,k} \cdot A^{(2)}$ will hold. Crucially, whichever of the two relations
holds along coordinate $i$ will be the same relation that holds along coordinate $k$.
Therefore we can divide out $A^{(2)}$ to obtain the ratio of pairs of weights
$$\frac{\alpha_+^k - \alpha_-^k}{\alpha_+^i - \alpha_-^i} = \frac{A^{(1)}_{j,k}}{A^{(1)}_{j,i}}.$$
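The following sketch demonstrates this differencing on a toy 1-deep network
(NumPy; the network, the witness $x^* = 0$, the fixed stepping direction $e_1$, and
the two step sizes $\varepsilon \ll \varepsilon_0$ are illustrative assumptions):

```python
import numpy as np

def alpha_diff(f, x_star, e_i, step_dir, eps0=1e-3, eps=1e-7):
    """alpha_+^i - alpha_-^i: the partial derivative along e_i, measured on
    the two sides of the critical hyperplane through x*. The step is taken
    along the fixed direction step_dir, and eps must stay well below eps0
    so the derivative estimate does not cross the hyperplane."""
    xp, xm = x_star + eps0 * step_dir, x_star - eps0 * step_dir
    a_plus = (f(xp + eps * e_i) - f(xp)) / eps
    a_minus = (f(xm + eps * e_i) - f(xm)) / eps
    return a_plus - a_minus

# Toy 1-deep network with a single neuron: f(x) = A2 relu(A1 x + b1) + b2.
A1 = np.array([[1.0, -2.0, 0.5]]); b1 = np.array([0.0])
A2 = np.array([3.0]); b2 = 0.1
f = lambda x: float(A2 @ np.maximum(A1 @ x + b1, 0) + b2)
x_star = np.zeros(3)            # A1 @ x* + b1 = 0: the neuron is critical

diffs = np.array([alpha_diff(f, x_star, e, np.eye(3)[0]) for e in np.eye(3)])
ratios = diffs / diffs[0]       # recovers A1[0, :] / A1[0, 0], signs included
assert np.allclose(ratios, A1[0] / A1[0, 0], atol=1e-4)
```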
Finding witnesses to critical points. It only remains to show how to find witnesses
$x^* \in \mathcal{W}(\eta)$ for each neuron $\eta$ on the first layer. We choose a random line in input
space (the dashed line in Fig. 2, left) and search along it for nonlinearities in the
partial derivative. Any nonlinearity must have resulted from a ReLU changing
signs, and locating the specific location where the ReLU changes signs will give
us a critical point. We do this by binary search.

To begin, we take a random initial point $x_0 \in \mathbb{R}^{d_0}$ and direction $v \in \mathbb{R}^{d_0}$, together
with a large range $T$. We perform a binary search for nonlinearities in $f(x_0 + tv)$ for
$t \in [-T, T]$. That is, for a given interval $[t_0, t_1]$, we know a critical point exists in
the interval if $\frac{\partial f(x_0+tv)}{\partial t}\big|_{t=t_0} \neq \frac{\partial f(x_0+tv)}{\partial t}\big|_{t=t_1}$. If these quantities are equal, we do
not search the interval; otherwise we continue with the binary search.
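A sketch of this search (NumPy; tolerances are illustrative, and the skipping
rule inherits the caveat that sign changes that exactly cancel inside one
interval would be missed):

```python
import numpy as np

def grad_along(f, x, v, eps=1e-6):
    # First derivative of t -> f(x + t v), by forward finite differences.
    return (f(x + eps * v) - f(x)) / eps

def find_critical_points(f, x0, v, T=10.0, tol=1e-6):
    """Binary-search f(x0 + t v) on [-T, T] for witnesses to critical points.
    Intervals whose endpoint derivatives already agree are skipped."""
    witnesses = []
    def search(t0, t1, g0, g1):
        if abs(g0 - g1) < 1e-8:       # same linear piece at both ends: skip
            return
        if t1 - t0 < tol:             # interval collapsed: record a witness
            witnesses.append(x0 + 0.5 * (t0 + t1) * v)
            return
        tm = 0.5 * (t0 + t1)
        gm = grad_along(f, x0 + tm * v, v)
        search(t0, tm, g0, gm)
        search(tm, t1, gm, g1)
    search(-T, T, grad_along(f, x0 - T * v, v), grad_along(f, x0 + T * v, v))
    return witnesses

# Demo on a toy 1-deep network: one witness per neuron whose critical
# hyperplane crosses the search line.
rng = np.random.default_rng(0)
A1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
A2 = rng.standard_normal(4)
f = lambda x: A2 @ np.maximum(A1 @ x + b1, 0)
witnesses = find_critical_points(f, np.zeros(3), rng.standard_normal(3))
```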
Extracting the second layer. Once we have fully recovered the first-layer weights,
we can "peel off" the weight matrix $A^{(1)}$ and bias $b^{(1)}$, and we are left with
extracting the final linear layer, which reduces to 0-deep extraction.
Critical Points Can Occur Due to ReLUs on Different Layers. Because 1-deep
networks have only one layer, all ReLUs occur on that layer. Therefore all critical
points found during search will correspond to a neuron on that layer. For k-deep
networks this is not true, and if we want to begin by extracting the first layer
we will have to remove non-first layer critical points. (And, in general, to extract
layer j, we will have to remove non-layer-j critical points.)
The Weight Recovery Procedure Requires Complete Control of the Input. In order
to be able to directly read off the weights, we query the network on basis vec-
tors ei . Achieving this is not always possible for deep networks, and we must
account for the fact that we may only be able to query on non-orthogonal direc-
tions.
In a neighborhood of an input $x^*$, the network collapses to a linear function, which
we can write as
$$f_{\mathrm{local}}(x) = A^{(k+1)} I^{(k)} A^{(k)} \cdots I^{(2)} A^{(2)} I^{(1)} A^{(1)} x + \beta.$$
Here, the $I^{(j)}$ are 0–1 diagonal matrices with a 0 on the diagonal when the neuron
is inactive and a 1 on the diagonal when the neuron is active; for the first layer,
$I^{(1)}_{n,n} = \mathbb{1}\{\mathcal{V}(\eta_n; x^*) > 0\}$, where $\eta_n$ is the $n$th neuron on the first layer. Importantly, observe that each $I^{(j)}$
is constant as long as $x$ is sufficiently close to $x^*$. While $\beta$ is unknown, as long as
we make only gradient queries $\partial f_{\mathrm{local}}$ its value is unimportant. This observation
so far follows from the definition of piecewise linearity.
Consider now some input that is a witness to exactly one critical point, on
neuron $\eta^*$. Formally, $x^* \in \mathcal{W}(\eta^*)$ but $x^* \notin \bigcup_{\eta_j \neq \eta^*} \mathcal{W}(\eta_j)$. Then
$$f_{\mathrm{local}}(x) = A^{(k+1)} I^{(k)} A^{(k)} \cdots I^{(2)} A^{(2)} I^{(1)}(x) A^{(1)} x + \beta(x),$$
where again the $I^{(j)}$ are 0–1 matrices, except that now $I^{(1)}$ (and only $I^{(1)}$)
is a function of $x$ returning a 0–1 diagonal matrix that has one of two values,
depending on whether $\mathcal{V}(\eta^*; x) > 0$. Therefore we can no longer collapse the
matrix product into one matrix $\Gamma$, but instead can only obtain
$$f_{\mathrm{local}}(x) = \Gamma\, I^{(1)}(x)\, A^{(1)} x + \beta(x).$$
But this is exactly the case we have already solved for 1-deep neural network
weight recovery: it is equivalent to the statement $f_{\mathrm{local}}(x) = \Gamma\,\sigma(A^{(1)}x + b^{(1)}) + \beta_2$,
and so by dividing out $\Gamma$ exactly as before we can recover the ratios of the entries $A^{(1)}_{i,j}$.
Finding first-layer critical points. Assume we are given a set of inputs $S = \{x_i\}$
so that each $x_i$ is a witness to some neuron $\eta_{x_i}$, with $\eta_{x_i}$ unknown. By the coupon
collector's argument (assuming uniformity), for $|S| \gg N \log N$, where $N$ is the
total number of neurons, we will have at least two witnesses to every neuron $\eta$.

Without loss of generality let $x_0, x_1 \in \mathcal{W}(\eta)$ be witnesses to the same neuron
$\eta$ on the first layer, i.e., $\mathcal{V}(\eta; x_0) = \mathcal{V}(\eta; x_1) = 0$. Then performing the
weight recovery procedure beginning from each of these witnesses (through finite
differences) will yield the correct weight vector $A^{(1)}_j$ up to a scalar.
Typically elements of S will not be witnesses to neurons on the first layer.
Without loss of generality let x2 and x3 be witnesses to any neuron on a deeper
layer. We claim that we will be able to detect these error cases: the outputs of
the extraction algorithm will appear to be random and uncorrelated. Informally
speaking, because we are running an attack designed to extract first-layer neu-
rons on a neuron actually on a later layer, it is exceedingly unlikely that the
attack would, by chance, give consistent results when run on x2 and x3 (or any
arbitrary pair of neurons).
Formally, let $h_2 = f_1(x_2)$ and $h_3 = f_1(x_3)$. With high probability, $\mathrm{sign}(h_2) \neq \mathrm{sign}(h_3)$.
Therefore, when executing the extraction procedure on $x_2$ we compute
over the function $\Gamma_1 I^{(1)}(x_2) A^{(1)} x + \beta_1$, whereas extracting on $x_3$ computes over
$\Gamma_2 I^{(1)}(x_3) A^{(1)} x + \beta_2$. Because $\Gamma_1 \neq \Gamma_2$, this will give inconsistent results.
Therefore our first layer weight recovery procedure is as follows. For all inputs
xi ∈ S run the weight recovery procedure to recover the unit-length normal
vector to each critical hyperplane. We should expect to see a large number of
vectors only once (because they were the result of running the extraction of a
layer 2 or greater neuron), and a small number of vectors that appear duplicated
(because they were the result of successful extraction on the first layer). Given
the first layer, we can reduce the neural network from a k-deep neural network
to a (k − 1)-deep neural network and repeat the attack. We must resolve two
difficulties, however, discussed in the following two subsections.
Fig. 3. (left) Geometry of a k-deep neural network, following [RK19]. Critical hyperplanes
induced by neurons $\eta_0, \eta_1, \eta_2$ are on the first layer and are linear. Critical
hyperplanes induced by neurons $\eta_3, \eta_4$ are on the second layer and are "bent" by
neurons on the first layer. The critical hyperplane induced by neuron $\eta_5$ is on the
third layer and is bent by neurons on the prior two layers. (right) Diagram of
the hyperplane-following procedure. Given an initial witness to a critical point $x$, follow
the hyperplane to the double-critical point $x'$. To find where it goes next, perform
binary search along the dashed line and find the witness $y$.
While most small neural networks are contractive, in practice almost all
interesting neural networks are expansive: the number of neurons on some inter-
mediate layer is larger than the number of inputs to that layer. Almost all of
the prior methods still apply in this setting, with one exception: the column sign
recovery procedure. Thus, we are required to develop a new strategy.
Recovering signs of the last layer. Observe that sign information is not lost for
the final layer: there is no ReLU activation, so we can directly solve for the
weights with least squares without losing sign information.
Let $h_i = \sigma(f_{1..k-1}(x_i))$. Enumerate all $2^{d_k}$ assignments of the sign vector $s$ and
compute $g_i = \sigma((s \odot \hat{A}^{(k)})h_i + (s \odot \hat{b}^{(k)}))$. We know that if we guessed the sign
vector $s$ correctly, then there would exist a solution to the system of equations
$v \cdot g_i + b = f(x_i)$. This is the zero-deep extraction problem, and solving it efficiently
requires just a single call to least squares. This allows us, through brute-forcing the
sign bits, to completely recover both the signs of the second-to-last layer as well
as the values (and signs) of the final layer.
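A sketch of this brute-force procedure (NumPy; the toy model in the demo,
where the hidden inputs $h_i$ are simply the raw inputs, is an illustrative
assumption):

```python
import itertools
import numpy as np

def recover_last_hidden_signs(A_hat, b_hat, hidden, outputs):
    """Try all 2^{d_k} sign vectors s: for the correct s, the outputs are an
    exact linear function of g_i = relu((s * A_hat) h_i + s * b_hat), so a
    single least-squares call has (near-)zero residual."""
    n, d = outputs.shape[0], A_hat.shape[0]
    best = None
    for s in itertools.product([-1.0, 1.0], repeat=d):
        s = np.array(s)
        g = np.maximum(hidden @ (s[:, None] * A_hat).T + s * b_hat, 0)
        design = np.hstack([g, np.ones((n, 1))])    # columns for v and bias b
        sol, *_ = np.linalg.lstsq(design, outputs, rcond=None)
        residual = np.linalg.norm(design @ sol - outputs)
        if best is None or residual < best[0]:
            best = (residual, s, sol)
    return best   # (residual, sign vector, final-layer solution)

# Demo: recover the signs of a 3-neuron layer extracted up to sign.
rng = np.random.default_rng(1)
A1, b1 = rng.standard_normal((3, 4)), rng.standard_normal(3)
v, c = rng.standard_normal(3), 0.2
xs = rng.standard_normal((50, 4))
f_out = (np.maximum(xs @ A1.T + b1, 0) @ v + c)[:, None]    # oracle outputs
true_s = np.array([1.0, -1.0, 1.0])
res, s_found, _ = recover_last_hidden_signs(true_s[:, None] * A1,
                                            true_s * b1, xs, f_out)
assert np.allclose(s_found, true_s) and res < 1e-8
```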
Unfortunately, this procedure does not scale to recover the signs of layer k −1
and earlier. It relies on the existence of an efficient testing procedure (namely,
least squares) to solve the final layer. If we attempted this brute-force strategy at
layer k − 3 in order to test if our sign assignment was correct, we would need to
run the complete layer k − 2 extraction procedure, thus incurring an exponential
number of queries to the oracle.
However, we can use this idea in order to still recover signs even at earlier
layers in the network with only a linear number of queries (but still exponential
work in the width of the hidden layers).
Recovering signs of arbitrary hidden layers. Assume that we are given a collection
of examples $\{x_i\} \subset \mathcal{W}(\eta)$ for some neuron $\eta$ on the layer after the ones
extracted so far: $\mathcal{L}(\eta) = j + 1$. Then we would know that there should exist a
single unknown vector $v$ and bias $b$ such that $v \cdot \sigma(f_{1..j}(x_i)) + b = 0$ for all $x_i$.

This gives us an efficient procedure to test whether or not a given sign assignment
on layer $j$ is correct. As before, we enumerate all possible sign assignments
and then check whether we can recover such a vector $v$. If so, the assignment is correct;
if not, it is wrong. It only remains to show how to obtain such a collection of inputs $\{x_i\}$.
Observe that the layer-$j$ polytope around $x$ is an open, convex set, as long as $x$
is not a witness to a critical point. In Fig. 3, each enclosed region is a layer-$k$
polytope, and the triangle formed by $\eta_0$, $\eta_1$, and $\eta_2$ is a layer-$(k-1)$ polytope.

Given an input $x$ and direction $\Delta$, we can compute the distance $\alpha$ so that
the value $x' = x + \alpha\Delta$ is at the boundary of the polytope defined by layers 1
to $j$. That is, starting from $x$ and traveling along direction $\Delta$, we stop the first time
a neuron on layer $j$ or earlier reaches a critical point. Formally, we define
$$\mathrm{Proj}_{1..j}(x, \Delta) = \min\{\alpha > 0 : \exists\, \eta \text{ with } \mathcal{L}(\eta) \leq j \text{ and } \mathcal{V}(\eta;\, x + \alpha\Delta) = 0\}.$$
We only ever compute $\mathrm{Proj}_{1..j}$ when we have extracted the neural network up to
layer $j$. Thus we perform the computation with respect to the extracted function
$\hat{f}$ and neuron-value function $\hat{\mathcal{V}}$, and so computing this function requires no
queries to the oracle. In practice we solve for $\alpha$ via binary search.
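A sketch of this computation (Python; the interface V_hat(eta, x), standing in
for the extracted neuron-value function $\hat{\mathcal{V}}$, is a hypothetical stand-in, and the
search assumes alpha_max overshoots the polytope boundary):

```python
import numpy as np

def proj_1j(V_hat, neurons_leq_j, x, delta, alpha_max=1e6, iters=60):
    """Compute Proj_{1..j}(x, delta): the distance along delta at which the
    first neuron on layers 1..j of the *extracted* model reaches a critical
    point. Runs entirely on V_hat, so it issues no oracle queries. Assumes
    no tracked neuron is critical at x itself."""
    base = [np.sign(V_hat(eta, x)) for eta in neurons_leq_j]

    def crossed(alpha):
        # Has any tracked neuron changed sign between x and x + alpha*delta?
        return any(np.sign(V_hat(eta, x + alpha * delta)) != s0
                   for eta, s0 in zip(neurons_leq_j, base))

    lo, hi = 0.0, alpha_max
    for _ in range(iters):            # binary search for the first crossing
        mid = 0.5 * (lo + hi)
        if crossed(mid):
            hi = mid
        else:
            lo = mid
    return hi
```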
If instead the sign assignment on layer $j$ is incorrect, then with high
probability there will not exist a valid linear transformation $w, b$. Thus we can
recover the signs with a linear number of queries and exponential work.
Reading this section. Each sub-section that follows is independent of the surrounding
sub-sections and modifies algorithms introduced in Sect. 4. For brevity,
we assume complete knowledge of the original algorithm and share the same
notation. Readers may find it helpful to review the original algorithm before
proceeding to each subsection.
Ideally we would solve this resulting system with least squares. However, in
practice, occasionally the conversion from $x$ to $x'$ fails because $x'$ is no longer a
witness to the same neuron $\eta$. This happens when there is some other neuron
$\eta'$ that is closer to $x'$ than the true neuron $\eta$. Because least squares is not
robust to outliers, this procedure can fail to improve the solution.
We take two steps to ensure this does not happen. First, observe that if
$\Delta$ is smaller, the likelihood of capturing incorrect neurons $\eta'$ decreases faster
than the likelihood of capturing the correct neuron $\eta$. Thus, we set $\Delta$ small
enough that roughly half of the attempts at finding a witness $x'$ fail.
Second, we apply a (more) robust method of solving the system of equations [JOB+18].
However, even these two techniques taken together occasionally fail to find
valid solutions that improve the quality. When this happens, we reject the
proposed improvement and keep the original value.
Our attack could be improved with a solution to the following robust statistics
problem: Given a (known) set $S \subset \mathbb{R}^N$ such that for some (unknown) weight
vector $w$ we have $\Pr_{x \in S}[|w \cdot x + 1| \leq \epsilon] > \delta$ for sufficiently small $\epsilon$, sufficiently
large $\delta > 0.5$, and $\delta|S| > N$, efficiently recover the vector $w$ to high precision.
Most of the methods in this paper are built on computing second partial derivatives
of the neural network $f$, and therefore developing a robust method for
estimating the gradient is necessary. Throughout Sect. 4 we compute the partial
derivative of $f$ along direction $\alpha$, evaluated at $x$ with step size $\varepsilon$, as
$$\frac{\partial f(x)}{\partial_\varepsilon \alpha} \;\overset{\mathrm{def}}{=}\; \frac{f(x + \varepsilon \cdot \alpha) - f(x)}{\varepsilon}.$$
To compute the second partial derivative earlier, we computed $\alpha_+^i$ and $\alpha_-^i$
by first taking a step to $x^* + \varepsilon_0 e_1$ for a different step size $\varepsilon_0$ and then
computing the first partial derivative at this location. However, with floating-point
imprecision it is not desirable to have two step sizes ($\varepsilon_0$ controlling the
distance away from $x^*$ to step, and $\varepsilon$ controlling the step size when computing the
partial derivative). Worse, we must have $\varepsilon \ll \varepsilon_0$: if $\varepsilon\,\partial f/\partial e_i$ exceeds $\varepsilon_0\,\partial f/\partial e_1$, then
when computing the partial derivative along $e_i$ we may cross the hyperplane and
estimate the first partial derivative incorrectly. Therefore, instead we compute
$$\alpha_+^i = \left.\frac{\partial f(x)}{\partial e_i}\right|_{x = x^* + \varepsilon e_i} \qquad\text{and}\qquad \alpha_-^i = \left.\frac{\partial f(x)}{\partial (-e_i)}\right|_{x = x^* - \varepsilon e_i},$$
where we both step along $e_i$ and also take the partial derivative along the same
$e_i$ (and similarly for $-e_i$). This removes the requirement for an additional hyperparameter
and allows the step size $\varepsilon$ to be orders of magnitude larger, but
introduces a new error: we now lose the relative signs of the entries in the row
when performing extraction, and can only recover $\left|A^{(1)}_{i,j}/A^{(1)}_{i,k}\right|$.
Extracting column signs. We next recover the value $\mathrm{sign}(A^{(1)}_{i,j}) \cdot \mathrm{sign}(A^{(1)}_{i,k})$. Fortunately,
the same differencing process allows us to learn this information, using
the following observation: if $A^{(1)}_{i,j}$ and $A^{(1)}_{i,k}$ have the same sign, then moving in
the $e_j + e_k$ direction will cause their contributions to add. If they have different
signs, their contributions will cancel each other. That is, if
$$\alpha_+^{j+k} - \alpha_-^{j+k} = \alpha_+^{j} - \alpha_-^{j} + \alpha_+^{k} - \alpha_-^{k},$$
we have that
$$(A^{(1)}_{i,j} + A^{(1)}_{i,k}) \cdot A^{(2)} = A^{(1)}_{i,j} \cdot A^{(2)} + A^{(1)}_{i,k} \cdot A^{(2)},$$
i.e., the two contributions added and the weights share a sign; if instead the two
sides differ, the contributions cancelled and the signs differ.

We can repeat this process to test whether each $A^{(1)}_{i,j}$ has the same sign as
(for example) $A^{(1)}_{i,1}$. However, we still do not know whether any single $A^{(1)}_{i,j}$ is
positive or negative: we still must recover the row signs as done previously.
f (x∗ )
xα x∗ xβ xα x∗ xβ xα x∗ xβ
Fig. 4. Efficient and accurate witness discovery. (left) If $x_\alpha$ and $x_\beta$ differ in only one
ReLU, we can precisely identify the location $x^*$ at which the ReLU reaches its critical
point. (middle) If instead more than one ReLU differs, we can detect that this has
happened: the predicted value of $\hat{f}(\cdot)$ evaluated at $x^*$, as inferred from intersecting
the dotted lines, does not actually equal the true value of $f(x^*)$. (right) This procedure
is not sound and may still incorrectly identify critical points; in practice we find these
cases are rare.
Suppose the two endpoints $x_\alpha$ and $x_\beta$ differ in the sign of exactly one neuron $\eta_i$;
then we can directly compute the location $x^*$ at which $\mathcal{V}(\eta_i; x^*) = 0$ but such
that for all other $\eta_j$ we have
$$\mathrm{sign}(\mathcal{V}(\eta_j; x_\alpha)) = \mathrm{sign}(\mathcal{V}(\eta_j; x^*)) = \mathrm{sign}(\mathcal{V}(\eta_j; x_\beta)).$$
This approach is illustrated in Fig. 4 and relies on the fact that $f$, restricted to this
segment, is a piecewise linear function with two components. By measuring $f(x_\alpha)$ and $\partial f(x_\alpha)$ (resp.,
$f(x_\beta)$ and $\partial f(x_\beta)$), we find the slope and intercept of both the left and right lines
in Fig. 4 (left). This allows us to solve for their expected intersection $(x^*, \hat{f}(x^*))$.
Typically, if there are more than two linear segments, as in the middle of the
figure, we will find that the true function value f (x∗ ) will not agree with the
expected function value fˆ(x∗ ) we obtained by computing the intersection; we
can then perform binary search again and repeat the procedure.
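A one-dimensional sketch of this intersect-and-check refinement (NumPy; the
step size, tolerance, and demo function are illustrative):

```python
import numpy as np

def refine_witness(f, x_a, x_b, eps=1e-6, tol=1e-7):
    """Intersect the two linear pieces through x_a and x_b to predict the
    critical point x* between them; accept only if f agrees at x*."""
    v = x_b - x_a
    g_a = (f(x_a + eps * v) - f(x_a)) / eps   # slope (w.r.t. t) at x_a
    g_b = (f(x_b) - f(x_b - eps * v)) / eps   # slope (w.r.t. t) at x_b
    if abs(g_a - g_b) < 1e-12:
        return None                 # a single linear piece: nothing between
    # Lines y = f(x_a) + g_a t and y = f(x_b) + g_b (t - 1) intersect at t*.
    t_star = (f(x_b) - g_b - f(x_a)) / (g_a - g_b)
    x_star = x_a + t_star * v
    predicted = f(x_a) + g_a * t_star
    if abs(f(x_star) - predicted) > tol:
        return None                 # more than one ReLU flipped: split again
    return x_star

# Demo: a single ReLU kink at x = 2 on the segment [0, 5].
f1 = lambda x: 3.0 * max(x[0] - 2.0, 0.0) + 0.5
x_star = refine_witness(f1, np.array([0.0]), np.array([5.0]))
assert abs(x_star[0] - 2.0) < 1e-6
```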
However, we lose some soundness from this procedure. As we see in Fig. 4
(right), situations may arise where many ReLU units change sign between xα
and xβ , but fˆ(x∗ ) = f (x∗ ). In this case, we would erroneously return x∗ as a
critical point, and miss all of the other critical points in the range. Fortunately,
this error case is pathological and does not occur in practice.
Recall that to extract $\hat{A}^{(l)}$ we extract candidates $\{r_i\}$ and search for
pairs $r_i, r_j$ that agree on multiple coordinates. This allows us to merge $r_i$ and $r_j$
to (eventually) recover full rows of $\hat{A}^{(l)}$. With floating-point error, the unification
algorithm in Sect. 4.3 fails for several reasons.
Our core algorithm computes the normal to a hyperplane, returning pairwise
ratios $\hat{A}^{(1)}_{i,j}/\hat{A}^{(1)}_{i,k}$; throughout Sect. 4 we set $\hat{A}^{(1)}_{i,1} = 1$ without loss of generality.

Unfortunately, in practice there is loss of generality, due to the disparate
impact of numerical instability. Consider the case where $A^{(l)}_{i,1} < 10^{-\alpha}$ for $\alpha \gg 0$,
but $A^{(l)}_{i,k} \geq 1$ for all other $k$. Then there will be substantially more (relative)
floating-point imprecision in the weight $A^{(l)}_{i,1}$ than in the other weights. Before
normalizing there is no cause for concern, since the absolute error is no larger
than for any other weight. However, the described algorithm now normalizes every other
coordinate $A^{(l)}_{i,k}$ by dividing it by $A^{(l)}_{i,1}$, polluting the precision of these values.
Therefore we adjust our solution. At layer $l$, we are given a collection of vectors
$R = \{r_i\}_{i=1}^{n}$ so that each $r_i$ corresponds to the extraction of some (unknown)
neuron $\eta_i$. First, we need an algorithm to cluster the items into sets $\{S_j\}_{j=1}^{d_l}$ so
that $S_j \subset R$ and every vector in $S_j$ corresponds to one neuron on layer $l$.
We then need to unify each set $S_j$ to obtain the final row $\hat{A}^{(l)}_j$.
Creating the Subsets $S$ with Graph Clustering. Let $r_m^{(a)} \in S_n$ denote the $a$th
coordinate of the extracted row $r_m$ from cluster $n$. Begin by constructing a
graph $G = (V, E)$ where each vector $r_i$ corresponds to a vertex. Let $\delta_{ij}^{(k)} =
|r_i^{(k)} - r_j^{(k)}|$ denote the difference between rows $r_i$ and $r_j$ along axis $k$; then
connect an edge from $r_i$ to $r_j$ when the approximate $\ell_0$ norm is sufficiently
large: $\left|\{k : \delta_{ij}^{(k)} < \varepsilon\}\right| > \log d_0$. We compute the connected components of $G$ and
take each connected component as one set $S_j$. Observe that if $\varepsilon = 0$ then this
procedure is exactly what was described earlier, pairing vectors whose entries
agree perfectly; in practice we find a value of $\varepsilon = 10^{-5}$ suffices.
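A sketch of the clustering step (NumPy; eps follows the $10^{-5}$ above, and the
agreement threshold follows the $\log d_0$ rule):

```python
import numpy as np

def cluster_rows(rows, eps=1e-5):
    """Connect extracted rows r_i, r_j when more than log(d0) coordinates
    agree to within eps, then return the connected components as clusters."""
    R = np.asarray(rows)
    n, d0 = R.shape
    threshold = np.log(d0)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if np.sum(np.abs(R[i] - R[j]) < eps) > threshold:
                adj[i].append(j)
                adj[j].append(i)
    seen, clusters = [False] * n, []
    for s in range(n):                 # depth-first search for components
        if seen[s]:
            continue
        stack, comp = [s], []
        while stack:
            u = stack.pop()
            if seen[u]:
                continue
            seen[u] = True
            comp.append(u)
            stack.extend(adj[u])
        clusters.append(comp)
    return clusters
```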
Unifying Each Cluster to Obtain the Row Weights. We construct the three-dimensional
tensor $M_{i,a,b} = r_i^{(a)} / r_i^{(b)}$. Given $M$, a good guess for the scalar $C_{ab}$
so that $r^{(a)} = r^{(b)} \cdot C_{ab}$ along as many extractions $i$ as possible is the assignment
$C_{ab} = \mathrm{median}_i\, M_{i,a,b}$, where the estimated error is $e_{ab} = \mathrm{stdev}_i\, M_{i,a,b}$.

If all $r_a$ were complete and had no imprecision then $C_{ab}$ would have no error,
and so $C_{ab} = C_{ax} \cdot C_{xb}$. However, because it does have error, we can iteratively
improve the guessed $C$ matrix by observing that if the error $e_{ax} + e_{xb} < e_{ab}$ then
the assignment $C_{ax} \cdot C_{xb}$ is a better guess than $C_{ab}$. Thus we replace
$C_{ab} \leftarrow C_{ax} \cdot C_{xb}$ and update $e_{ab} \leftarrow e_{ax} + e_{xb}$. We iterate this process until
there is no further improvement. Then, finally, we choose the optimal dimension
$a^* = \arg\min_a \sum_b e_{ab}$ and return the vector $C_{a^*}$. Observe that this procedure
closely follows constructing the union of two partial entries $r_i$ and $r_j$, except that
we perform it along the best axis possible for each coordinate.
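A sketch of this unification (NumPy; encoding missing coordinates as NaN, and
returning the column $C_{\cdot,a^*}$, i.e. the row scaled so coordinate $a^*$ equals 1, is
our reading of "return the vector $C_{a^*}$"):

```python
import numpy as np

def unify_cluster(rows):
    """Merge partial extractions of one row via pairwise median ratios.
    rows: shape (n_extractions, d0); missing coordinates given as np.nan."""
    R = np.asarray(rows, dtype=float)
    n, d = R.shape
    with np.errstate(divide="ignore", invalid="ignore"):
        M = R[:, :, None] / R[:, None, :]    # M[i, a, b] = r_i[a] / r_i[b]
    C = np.nanmedian(M, axis=0)              # guess for the scalar C[a, b]
    E = np.nanstd(M, axis=0)                 # estimated error e[a, b]
    for _ in range(d):                       # C[a,b] <- C[a,x] * C[x,b] when
        improved = False                     # the combined error is smaller
        for a in range(d):
            for b in range(d):
                for x in range(d):
                    if E[a, x] + E[x, b] < E[a, b]:
                        C[a, b] = C[a, x] * C[x, b]
                        E[a, b] = E[a, x] + E[x, b]
                        improved = True
        if not improved:
            break
    a_best = np.argmin(np.nansum(E, axis=1))  # most reliable base coordinate
    return C[:, a_best]                       # row scaled so entry a_best = 1
```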
The projection gives a point that lies on both critical hyperplanes of the extracted
model, but it has likely drifted off of the original critical plane induced by $\eta^*$
(Fig. 5).

To address this, after computing $\alpha$ we initially take a smaller step and let
$x_1 = x^* + \sqrt{\alpha}\, r$. We then refine the location of this point to a point $x_2$ by
performing binary search on the region $x_1 - \epsilon n$ to $x_1 + \epsilon n$ for a small step $\epsilon$.
If there was no error in computing $n$ then $x_1 = x_2$, because both are already
witnesses to $\eta^*$. If not, any error has been corrected. Given $x^*$ and $x_2$ we
can now compute $\alpha_2 = \mathrm{Proj}_{1..j}(x^*, x_2 - x^*)$ and let $\bar{x} = x^* + (x_2 - x^*)\alpha_2$, which
will actually be a witness to both neurons simultaneously.
Next we give a stable method to compute $y$, a witness to $\eta^*$ on the other
side of $\eta_u$. The previous procedure required a search parallel to $\eta_u$ and
infinitesimally displaced, but this is not numerically stable without yet accurately
knowing the normal to the hyperplane given by $\eta_u$.

Instead we perform the following procedure. Choose two orthogonal vectors
$\beta, \gamma$ of equal length and perform binary search on the line segments that
trace out the perimeter of a square with corners $\bar{x} \pm \beta \pm \gamma$.
When β is small, the number of critical points crossed will be exactly four:
two because of ηu and two because of η ∗ . As long as the number of critical points
remains four, we double the length of β and γ.
Eventually we will discover more than four critical points, when the perimeter
of the square intersects another neuron $\eta_z$. At this point we stop increasing the
size of the box, and we can compute the continuation direction of $\eta^*$ by discarding the
points that fall on $\eta_u$. We then choose $y$ as the point on $\eta^*$ that intersected
the largest square searched.
Given the initial coordinate $x$ and after computing the normal $n$ to the
hyperplane, we have $d_0 - 1$ dimensions to choose between for where to travel
next. Instead of choosing a random $r$ with $r \cdot n = 0$, we choose $r$ such that we
make progress towards obtaining a fully diverse set $W$.

Define $W_i$ as the set of witnesses that have been found so far. We say that this
set is diverse on neuron $\eta$ if there exist $x^+, x^- \in W_i$ such that $\mathcal{V}(\eta; x^+) \geq 0$
and $\mathcal{V}(\eta; x^-) < 0$. Choose an arbitrary neuron $\eta_t$ such that $W_i$ is not diverse
on $\eta_t$. (If there are multiple such options, we should prefer the neuron that would
be easiest to reach, but this is secondary.)

Our goal will be to choose a direction $r$ such that (1) as before, $r \cdot n = 0$,
and (2) $W_i \cup \{x + \alpha r\}$ is closer to being fully diverse. Here, "closer" means
that $d(W) = \min_{x \in W} |\mathcal{V}(\eta_t; x)|$ is smaller. Because the set is not yet diverse on
$\eta_t$, all values are either positive or negative, and it is our objective to switch the
sign, and therefore first become closer to zero. Our procedure therefore chooses
the direction $r$ that most quickly decreases $d(W)$.
6 Evaluation
In practice we set $|\bar{S}| = 10^9$ and compute the maximum over this set, so that
evaluating the function takes under an hour per neural network.
Error bounds propagation. The most direct method to compute $(\varepsilon, 0)$-functional
equivalence of the extracted neural network $\hat{f}$ is to compare the weights $A^{(i)}$
to the weights $\hat{A}^{(i)}$ and analytically derive an upper bound on the error when
performing inference. Observe that (1) permuting the order of the neurons in
the network does not change the output, and (2) any row can be multiplied by a
positive scalar $c > 0$ if the corresponding column in the next layer is divided by $c$.

Thus, before we can compare $\hat{A}^{(i)}$ to $A^{(i)}$ we must "align" them. We identify
the permutation mapping the rows of $\hat{A}^{(l)}$ to the rows of $A^{(l)}$ through a greedy
matching algorithm, and then compute a single scalar per row, $s \in \mathbb{R}_+^{d_l}$. To
ensure that multiplying by a scalar does not change the output of the network,
we multiply the columns of the next layer $\hat{A}^{(l+1)}$ by $1/s$ (with the inverse taken
elementwise). The process to align the bias vectors $b^{(l)}$ is identical, and the process
is repeated for each further layer.
This gives aligned $\tilde{A}^{(i)}$ and $\tilde{b}^{(i)}$ from which we can analytically derive
upper bounds on the error. Let $\Delta_i = \tilde{A}^{(i)} - A^{(i)}$, and let $\delta_i$ be the largest
singular value of $\Delta_i$. If the $\ell_2$-norm of the maximum error going into layer $i$ is
given by $e_i$, then we can bound the maximum error going out of layer $i$ as
$$e_{i+1} \leq \delta_i \cdot e_i + \|\tilde{b}^{(i)} - b^{(i)}\|_2.$$
By propagating bounds layer-by-layer we can obtain an upper bound on the
maximum error of the output of the model.
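A direct sketch of this propagation (NumPy; it implements exactly the
recurrence above, using that ReLU is 1-Lipschitz and so does not increase the
error between layers):

```python
import numpy as np

def equivalence_error_bound(A_tilde, b_tilde, A, b):
    """Propagate e_{i+1} <= delta_i * e_i + ||b~^(i) - b^(i)||_2 layer by
    layer, with delta_i the largest singular value of A~^(i) - A^(i)."""
    e = 0.0                                   # no error going into layer 1
    for At, bt, Ai, bi in zip(A_tilde, b_tilde, A, b):
        delta = np.linalg.svd(At - Ai, compute_uv=False)[0]
        e = delta * e + np.linalg.norm(bt - bi)
    return e

# Demo: a slightly perturbed copy of a random 3-5-2 network.
rng = np.random.default_rng(0)
A = [rng.standard_normal((5, 3)), rng.standard_normal((2, 5))]
b = [rng.standard_normal(5), rng.standard_normal(2)]
A_tilde = [M + 1e-6 * rng.standard_normal(M.shape) for M in A]
b_tilde = [v + 1e-6 * rng.standard_normal(v.shape) for v in b]
bound = equivalence_error_bound(A_tilde, b_tilde, A, b)
```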
This method is able to prove an upper bound on $(\varepsilon, 0)$-functional equivalence
for some networks, when the pairing algorithm succeeds. However, we find that
there are some networks that are $(2^{-45}, 10^{-9})$-functionally equivalent but where
the weight alignment procedure fails. Therefore, we suspect that there are more
equivalence classes of functions than scalar multiples of permuted neurons, and
so we develop further methods for tightly computing $(\varepsilon, 0)$-functional equivalence.
7 Results
We extract a wide range of neural network architectures; key results are given
in Table 1 (Sect. 1). We compute $(\varepsilon, \delta)$-functional equivalence at $\delta = 10^{-9}$ and
$\delta = 0$ on the domain $\mathcal{S} = \{x : \|x\|_2 < d_0 \wedge x \in \mathcal{X}\}$, sufficient to explore both
sides of every neuron.
8 Concluding Remarks
We introduce a cryptanalytic method for extracting the weights of a neural
network by drawing analogies to cryptanalysis of keyed ciphers. Our differential
attack requires multiple orders of magnitude fewer queries per parameter than
prior work and extracts models that are multiple orders of magnitude more
accurate than prior work. In this work, we do not consider defenses; promising
approaches include detecting when an attack is occurring, adding noise at some
stage of the model's computation, or returning only the label corresponding to
the output. Any of these would easily break our presented attack.
The practicality of this attack has implications for many areas of machine
learning and cryptographic research. The field of secure inference relies on the
assumption that observing the output of a neural network does not reveal the
weights. This assumption is false, and therefore the field of secure inference will
need to develop new techniques to protect the secrecy of trained models.
We believe that by casting neural network extraction as a cryptanalytic prob-
lem, even more advanced cryptanalytic techniques will be able to greatly improve
on our results, reducing the computational complexity, reducing the query com-
plexity and reducing the number of assumptions necessary.
References
[BBJP19] Batina, L., Bhasin, S., Jap, D., Picek, S.: CSI NN: reverse engineering of
neural network architectures through electromagnetic side channel. In: 28th
USENIX Security Symposium (2019)
[BCB15] Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly
learning to align and translate. In: 3rd International Conference on Learning
Representations (ICLR) (2015)
[BCM+13] Biggio, B., et al.: Evasion attacks against machine learning at test time.
In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds.) ECML PKDD
2013. LNCS (LNAI), vol. 8190, pp. 387–402. Springer, Heidelberg (2013).
https://doi.org/10.1007/978-3-642-40994-3_25
[BFH+18] Bradbury, J., et al.: JAX: composable transformations of Python+NumPy
programs (2018)
[BS91] Biham, E., Shamir, A.: Differential cryptanalysis of DES-like cryptosys-
tems. J. Cryptol. 4(1), 3–72 (1991). https://doi.org/10.1007/BF00630563
[CCG+18] Chandrasekaran, V., Chaudhuri, K., Giacomelli, I., Jha, S., Yan, S.:
Exploring connections between active learning and model extraction. arXiv
preprint arXiv:1811.02054 (2018)
[CLE+19] Carlini, N., Liu, C., Erlingsson, Ú., Kos, J., Song, D.: The secret sharer:
evaluating and testing unintended memorization in neural networks. In:
USENIX Security Symposium, pp. 267–284 (2019)
[DGKP20] Das, A., Gollapudi, S., Kumar, R., Panigrahy, R.: On the learnability of
random deep networks. In: ACM-SIAM Symposium on Discrete Algorithms,
SODA 2020, pp. 398–410 (2020)
[EKN+17] Esteva, A., et al.: Dermatologist-level classification of skin cancer with deep
neural networks. Nature 542(7639), 115–118 (2017)
[FJR15] Fredrikson, M., Jha, S., Ristenpart, T.: Model inversion attacks that exploit
confidence information and basic countermeasures. In: ACM CCS, pp.
1322–1333 (2015)
[GBDL+16] Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K., Naehrig, M.,
Wernsing, J.: CryptoNets: applying neural networks to encrypted data with
high throughput and accuracy. In: International Conference on Machine
Learning, pp. 201–210 (2016)
[Gen09] Gentry, C.: A fully homomorphic encryption scheme. Ph.D. thesis, Stanford
University (2009)
[HDK+20] Hong, S., Davinroy, M., Kaya, Y., Dachman-Soled, D., Dumitraş, T.: How
to 0wn the NAS in your spare time. In: International Conference on Learn-
ing Representations (2020)
[HZRS16] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recog-
nition. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 770–778 (2016)
[JCB+19] Jagielski, M., Carlini, N., Berthelot, D., Kurakin, A., Papernot, N.: High-
fidelity extraction of neural network models. arXiv:1909.01838 (2019)
[JOB+18] Jagielski, M., Oprea, A., Biggio, B., Liu, C., Nita-Rotaru, C., Li, B.: Manip-
ulating machine learning: poisoning attacks and countermeasures for regres-
sion learning. In: 2018 IEEE Symposium on Security and Privacy (S&P),
pp. 19–35. IEEE (2018)
[KBD+17] Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex:
an efficient SMT solver for verifying deep neural networks. In: Majumdar,
R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 97–117. Springer,
Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_5
[KLA+19] Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila,
T.: Analyzing and improving the image quality of StyleGAN. CoRR,
abs/1912.04958 (2019)
[KTP+19] Krishna, K., Tomar, G.S., Parikh, A.P., Papernot, N., Iyyer, M.: Thieves
on sesame street! Model extraction of BERT-based APIs. arXiv preprint
arXiv:1910.12366 (2019)
[Lev14] Levinovitz, A.: The mystery of Go, the ancient game that computers still
can’t win. Wired, May 2014
[MLS+20] Mishra, P., Lehmkuhl, R., Srinivasan, A., Zheng, W., Popa, R.A.: DELPHI:
a cryptographic inference service for neural networks. In: 29th USENIX
Security Symposium (2020)
[MSDH19] Milli, S., Schmidt, L., Dragan, A.D., Hardt, M.: Model reconstruction
from model explanations. In: Proceedings of the Conference on Fairness,
Accountability, and Transparency, FAT* 2019, pp. 1–9 (2019)
[NH10] Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann
machines. In: Proceedings of the 27th International Conference on Machine
Learning (ICML), pp. 807–814 (2010)
[RK19] Rolnick, D., Kording, K.P.: Identifying weights and architectures of
unknown ReLU networks. arXiv preprint arXiv:1910.00744 (2019)
[RWT+18] Riazi, M.S., Weinert, C., Tkachenko, O., Songhori, E.M., Schneider, T.,
Koushanfar, F.: Chameleon: a hybrid secure computation framework for
machine learning applications. In: ACM ASIACCS, pp. 707–721 (2018)
[SHM+16] Silver, D., et al.: Mastering the game of Go with deep neural networks and
tree search. Nature 529(7587), 484 (2016)
[SIVA17] Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-
ResNet and the impact of residual connections on learning. In: Proceedings
of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI 2017,
pp. 4278–4284. AAAI Press (2017)
[SSRD19] Shamir, A., Safran, I., Ronen, E., Dunkelman, O.: A simple explanation for
the existence of adversarial examples with small Hamming distance. CoRR,
abs/1901.10861 (2019)
[SZS+14] Szegedy, C., et al.: Intriguing properties of neural networks. In: 2nd
International Conference on Learning Representations (ICLR 2014).
arXiv:1312.6199 (2014)
[TL19] Tan, M., Le, Q.V.: EfficientNet: rethinking model scaling for convolutional
neural networks. arXiv preprint arXiv:1905.11946 (2019)
[TZJ+16] Tramèr, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T.: Stealing
machine learning models via prediction APIs. In: USENIX Security Sym-
posium, pp. 601–618 (2016)
[Wen90] Wenskay, D.L.: Intellectual property protection for neural networks. Neural
Netw. 3(2), 229–236 (1990)
[WG18] Wang, B., Gong, N.Z.: Stealing hyperparameters in machine learning. In:
2018 IEEE Symposium on Security and Privacy (S&P), pp. 36–52. IEEE
(2018)
[WSC+16] Wu, Y., et al.: Google’s neural machine translation system: bridging the gap
between human and machine translation. arXiv preprint arXiv:1609.08144
(2016)
[XHLL19] Xie, Q., Hovy, E., Luong, M.-T., Le, Q.V.: Self-training with noisy student
improves ImageNet classification. arXiv preprint arXiv:1911.04252 (2019)
[Yao86] Yao, A.C.-C.: How to generate and exchange secrets. In: FOCS 1986, pp.
162–167. IEEE (1986)
[ZL16] Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning.
arXiv preprint arXiv:1611.01578 (2016)