Security and Privacy Issues in Deep Learning
Abstract—To promote secure and private artificial intelligence (SPAI), we review studies on the model security and data privacy of DNNs. Model security allows a system to behave as intended without being affected by malicious external influences that can compromise its integrity and efficiency. Security attacks can be divided based on when they occur: if an attack occurs during training, it is known as a poisoning attack, and if it occurs during inference (after training), it is termed an evasion attack. Poisoning attacks compromise the training process by corrupting the data with malicious examples, while evasion attacks use adversarial examples to disrupt the entire classification process. Defenses proposed against such attacks include techniques to recognize and remove malicious data, train a model to be insensitive to such data, and mask the model’s structure and parameters to render attacks more challenging to implement. Furthermore, the privacy of the data involved in model training is also threatened by attacks such as the model-inversion attack, or by dishonest service providers of AI applications. To maintain data privacy, several solutions that combine existing data-privacy techniques have been proposed, including differential privacy and modern cryptography techniques. In this paper, we describe the notions of some of these methods, e.g., homomorphic encryption, and review their advantages and challenges when implemented in deep-learning models.

Impact Statement—With advancements in deep learning technologies, AI-based applications have become prevalent in various fields. However, existing deep learning models are vulnerable to various security and privacy threats. These threats can cause grave consequences in real life: for example, if an autonomous vehicle is compromised, the system could fail to recognize a pedestrian owing to an adversary, which can cause a lethal accident. This paper systematically categorizes representative threats that occur in deep learning and the corresponding defense methods, and presents insights for deep learning developers to develop robust AI applications.

Index Terms—Private AI, Secure AI, Machine Learning, Deep Learning, Homomorphic Encryption, Differential Privacy, Adversarial Example, White-box Attack, Black-box Attack

∗ Corresponding author: Sungroh Yoon (e-mail: sryoon@snu.ac.kr).
†: These authors contributed equally to this work.
H. Bae was with Seoul National University, Seoul 08870, Republic of Korea. He is now with Ehwa University, Seoul 03760, Republic of Korea (e-mail: hobae@ehwa.ac.kr).
J. Jang, D. Jung, H. Jang, H. Ha, H. Lee, and S. Yoon are with the Department of Electrical and Computer Engineering, Seoul National University, Seoul 08870, Republic of Korea (e-mail: {hukla, annajung0625, wkdal9512, heonseok.ha, rucy74, sryoon}@snu.ac.kr).

I. INTRODUCTION

THE development of deep learning (DL) algorithms has transformed the approach adopted to address several real-life data-driven problems, such as managing large amounts of patient data for disease prediction [1], performing autonomous security audits from system logs [2], and developing self-driving cars using visual object detection [3]. However, the vulnerabilities of DL-based systems with respect to security and privacy have been extensively studied to prevent cyberattacks.

If the input data is compromised, a DL-based system can produce inaccurate or undesired results. For example, jamming the sensors [4] or occluding the camera lens [5] of an autonomous driving system can have dangerous effects on its performance. Similarly, biometric authentication systems using face recognition [6] can be bypassed by adding noise or digitally editing a pair of glasses onto the image of a face [7] to achieve false positive results.

In this study, we divided such attacks into evasion (inference phase) and poisoning (training phase) attacks. In previous studies of evasion attacks, attacks have typically been categorized as white-box or black-box attacks. Initially, most forms of evasion attacks were white-box attacks—they require prior knowledge of the DL model parameters and structure—that attempt to subvert the learning process or reduce the classification accuracy by injecting adversarial samples using gradient-based techniques [8, 9]. Recently, black-box attacks have become more prevalent; they function by exploiting the classification confidence of the target model to produce incorrect classification information. Poisoning attacks can also be divided into white- and black-box attacks based on the model accessibility. However, in this paper, we categorize poisoning attacks into three subclasses based on the vulnerability of the target model: performance degradation, targeted poisoning, and backdoor attacks.

The methods proposed to defend DL-based systems against evasion attacks include empirical approaches—gradient masking [10, 11, 12], increasing robustness [9, 13, 14], and detection of attacks [15, 16, 17]—that can be implemented against known attacks, as well as model certification approaches [18, 19, 20]. We treat defense techniques that counter poisoning attacks separately; they mainly focus on detecting anomalous data [21, 22, 23, 24, 25, 26] and making the model robust to poisoning attacks by pruning or fine-tuning with reliable clean data [27, 28, 29].
Current DL systems additionally face the threat of privacy breach. Although it has been demonstrated that recovering or identifying some of the training data [30, 31] is possible, a privacy breach can occur in other situations as well. There are considerable risks involved in training a DL model with data owned by multiple parties; for instance, in the case of deploying an application via a third-party cloud system. Various attempts have been made to counter these threats by applying conventional security techniques, such as homomorphic encryption, secure multiparty computation, or differential privacy, to DL systems.

In this paper, we review recent studies on model security and data privacy that have contributed towards building a secure and private artificial intelligence (SPAI). To address the need for robust artificial intelligence (AI) systems, we further compile fragmented findings and techniques with the objective of providing insights relevant to future research.

To summarize, we review recent research on privacy and security issues associated with DL in the following domains.

(Figure: an adversarial example. An image classified as Lesser Panda with 99.99% confidence is classified as Pole Cat with 86.54% confidence after a small perturbation, scaled by 0.02, is added.)
TABLE I
TYPES OF ATTACKS ON SECURE AI AND DEFENSE TECHNIQUES EMPLOYED AGAINST THEM.

TABLE II
ATTACK METHODS AGAINST SECURE AI.
adversarial examples must be generated by solving Equation 1.

Although Carlini-Wagner’s attack (CW attack) [33] is also based on the box L-BFGS attack [32], it uses a modified version of Equation 1:

minimize D(x̃, x) + c · g(x̃),   (2)

where x̃ is the adversarial example, D is a distance metric that includes Lp, L0, L2, and L∞, g(x̃) is an objective function, in which f(x̃) = l̃ if and only if g(x̃) ≤ 0, and c > 0 is a constant. Here, the Adam [34] optimizer—adopted to enhance the effectiveness of this attack—conducts a rapid search for adversarial examples. The authors of [33] used the method of changing the variables or the projected gradient descent to support box constraints as a relaxation process after each optimization step.

Papernot et al. [35] introduced a targeted attack method that optimizes within the L0 distance. A Jacobian-based saliency map attack (JSMA) is used to construct a saliency map based on a gradient derived from a feedforward propagation, and subsequently modifies the input features that maximize the saliency map such that the probability that an image is classified with the target label l̃ increases.

In general, a DL model is described as nonlinear and overfitting; however, the fast gradient sign method (FGSM) [9] is based on the assertion that the main vulnerability of a neural network to adversarial perturbation is its linear nature. FGSM linearizes the cost function around its initial value, and finds the maximum value of the resultant linearized function following the closed-form equation:

x̃ = x + ξ · sign(∇x J(w, x, l̃)),   (3)

where w is the parameter of the model. The parameter ξ determines the strength of the adversarial perturbation applied to the image, and J is the loss function for training. Although this method can generate adversarial examples in a cost-efficient manner, it has a low success rate.

Various compromises have been made to overcome the shortcomings of both the above-mentioned attacks. One such compromise is the iterative FGSM [36], which invokes the FGSM multiple times, taking a small step after each update, followed by a per-pixel clipping of the image. Further, it can be proved that the result of each step will be in the L∞ ε-neighborhood of the original image. The update rule can be expressed as follows:

x̃_0 = x,   x̃_{N+1} = Clip_{x,ξ}{ x̃_N + ξ · sign(∇x J(w, x̃_N, l̃)) },   (4)

where x̃_N is the intermediate result at the Nth iteration. This method processes new generations more quickly and has a higher success rate.

Noise added to the input data naturally promotes misclassification. Universal adversarial perturbations [37] are image-agnostic perturbation vectors that have a high probability of misclassification with respect to natural images. Supposing a perturbation vector n ∈ R^{I^h × J^w × K^c} perturbs the samples in the dataset and that X represents the dataset containing the samples,

f(x + n) ≠ f(x),   for most x ∼ X.   (5)

The noise n should satisfy ‖n‖_p ≤ ξ, and

P_{x∼X}( f(x + n) ≠ f(x) ) ≥ 1 − δ,   (6)

where f is the classifier, ξ restricts the value of the perturbation, and δ is the fooling rate.

The backward pass differential approximation (BPDA) [38] is an attack that has been claimed to overcome gradient-masking defense methods by performing a backward pass with the identity function to approximate the true gradients of samples.
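To make the update rules in Equations 3 and 4 concrete, the following is a minimal PyTorch sketch of FGSM and its iterative variant. The model, the cross-entropy loss, and the values of the step sizes are placeholders chosen for illustration; they do not reproduce the exact settings of [9] or [36].

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, label, eps):
    """One-step FGSM (Eq. 3): x_adv = x + eps * sign(grad_x J(w, x, l))."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()

def iterative_fgsm(model, x, label, eps, step, n_iter):
    """Iterative FGSM (Eq. 4): small steps followed by per-pixel clipping,
    so every intermediate result stays in the L-infinity eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), label)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step * grad.sign()
        # Clip_{x, eps}: project back into the eps-neighborhood of the original image.
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0, 1)
    return x_adv
```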
With the development of adversarial defense methods, more advanced attack methods have been proposed. Brendel et al. [39] developed a Brendel and Bethge attack (B&B attack) in which adversarial examples are generated from incorrectly classified regions. This method uses a combination of a gradient-based attack and a boundary attack [40], and is a black-box attack. It estimates the local boundary between adversarial and clean examples using gradients and moves the adversarial examples close to the clean examples along that boundary. Because this method finds optimal adversarial examples by optimization, it can be applied to different adversarial criteria and any norm bound (L0, L1, L2, and L∞).

Recently, Croce et al. [41] introduced the auto attack (AA), an ensemble attack that consists of several white-box attacks and one black-box attack [42, 43]. Unlike previous methods, AA uses recursive attacks to separate the evaded test data and continues the attack on the remaining data with the successive attack sequentially. Consequently, it is more powerful than previous white-box attacks.

Adversarial examples are typically designed to perturb an existing data point within a small matrix norm at the pixel level, i.e., the samples are norm-bounded. Most researchers have used this characteristic to propose new defense methods. Nonetheless, to overcome this drawback, various methods have been proposed that semantically alter the attributes of the input image instead of employing a norm-bounded pixel-level approach.

Natural generative adversarial network (GAN) [45] generates adversarial examples that appear natural to humans. Zhao et al. used the latent space z of the GAN structure to search for the required perturbation. A matching inverter (MI) is used to search for z*, which satisfies the following:

z* = argmin_{z̃} ‖z̃ − MI(x)‖   s.t.   f(G(z̃)) ≠ f(x),   (7)

where G is a generator. Similarly, Song et al. [46] constructed unrestricted adversarial examples using an auxiliary classifier GAN (ACGAN) [47]. Furthermore, they added norm-bounded noise to the generated images to boost the attack ability.

Xiao et al. [48] introduced a novel spatially transformed attack. They used the pixel value and 2D coordinates of each pixel to estimate a per-pixel flow field and generate adversarial examples. Subsequently, the pixels were moved to adjacent pixel locations along the flow field to produce perceptually realistic adversarial examples. They deployed the L-BFGS solver to optimize the following loss function:

L_flow(f) = Σ_{p}^{∀pixels} Σ_{q∈N(p)} √( ‖Δu^(p) − Δu^(q)‖₂² + ‖Δv^(p) − Δv^(q)‖₂² ),   (8)

where N(p) contains the indices of the pixels adjacent to p, and Δu^(·) and Δv^(·) are the changes in the 2D coordinates of (·). The method results in more realistic adversarial examples than those of previous norm-bounded adversarial attacks.

Laidlaw et al. [49] used a parameterized function f to generate new pixels for producing adversarial examples. This method of functional adversarial attacks was applied to the color space of images to produce perceptually different but realistic adversarial examples. For instance, the method may lighten all the red pixels of an image simultaneously. To ensure that adversarial examples are indistinguishable from the original ones, the method minimizes the adversarial loss function and a smoothness-constraint loss function similar to previous studies [33, 48].

b) Black-box Attack: Practically, it is difficult to access the models or training datasets. Industrial training models are kept confidential, and models in mobile devices are not accessible to attackers. The scenario for a black-box attack is therefore closer to reality: an attacker has no information about the model or the dataset. However, the input format and the labels outputted by a target model running on a mobile device may be accessible, possibly when the target model is hosted by Amazon or Google.

In a black-box attack, the gradients of the target model are inaccessible to the attackers, who must therefore devise a substitute model. The attacks performed by means of substitute models are called transfer attacks. It has been shown [32, 9] that neural networks can attack another model without prior knowledge of the number of layers or hidden nodes; however, the task must be known. This is because a neural network has a largely linear nature, whereas previous studies attributed the transferability to its nonlinearity. Activation functions such as sigmoid and ReLU are known to exhibit nonlinearity. The sigmoid function is challenging to implement in learning, whereas ReLU is widely used; however, unlike sigmoid, it does not produce nonlinearity. Thus, a replica of the target model can learn a similar decision boundary for a given task.

The architecture of the substitute model, which may be a convolutional neural network (CNN), a recurrent neural network (RNN), or a multi-layer perceptron (MLP), is approximated based on the input format, which might be images or sequences. Although the model can be trained by collecting similar data from public sources, the process is highly expensive.

Papernot et al. [44] addressed this issue by introducing practical black-box attacks (Fig. 3), in which an initial synthetic dataset is augmented by a Jacobian-based method. This synthetic dataset can be developed from a subset that is not part of the training data and labeled by inputting it to the target model. Thereafter, the trained substitute model can be used to create input data by sending queries to a service such as Google or Amazon; these queries must be severely limited in number and frequency to prevent detection. Papernot et al. [50] resolved this problem by introducing reservoir sampling, which reduces the amount of data required to train the substitute model.

In addition to being expensive, transfer attacks that employ a substitute model can be blocked by most defense techniques [51]. Recently, several attacks [52, 40, 53, 54, 55, 56] that relied solely on the outputs of the model, together with a few queries and other limited information, were proposed.
Fig. 3. Overview of a practical black-box attack [44]: the attacker (1) collects a training set S0 for an initial substitute model and (2) selects an appropriate
architecture F . Using the oracle model Õ, the attacker (3) labels the training set St and (4) trains the substitute model Ft . Following this, the Jacobian-based
adversarial attack algorithm is implemented, the dataset is augmented by the attacker, and steps (3) through (5) are repeated for t epochs.
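The following is a rough sketch of the substitute-training loop summarized in Fig. 3. It assumes a query_oracle function that returns hard labels from the target model, and the augmentation constant and number of rounds are illustrative rather than the settings of [44].

```python
import torch
import torch.nn.functional as F

def jacobian_augment(substitute, inputs, oracle_labels, lam=0.1):
    """Jacobian-based dataset augmentation: perturb each point in the direction
    that increases the substitute's score for the oracle-assigned label."""
    x = inputs.clone().detach().requires_grad_(True)
    logits = substitute(x)
    score = logits.gather(1, oracle_labels.unsqueeze(1)).sum()
    grad = torch.autograd.grad(score, x)[0]
    new_points = (x + lam * grad.sign()).detach()
    return torch.cat([inputs, new_points], dim=0)

def train_substitute(substitute, optimizer, query_oracle, seed_inputs,
                     rounds=5, epochs=10):
    """Practical black-box attack loop (steps 3-5 of Fig. 3)."""
    data = seed_inputs
    for _ in range(rounds):
        labels = query_oracle(data)                        # step 3: label with the oracle
        for _ in range(epochs):                            # step 4: train the substitute
            optimizer.zero_grad()
            F.cross_entropy(substitute(data), labels).backward()
            optimizer.step()
        data = jacobian_augment(substitute, data, labels)  # step 5: augment the dataset
    return substitute
```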
Fig. 4. Left: a boundary attack performs rejection sampling by traversing the boundary between the adversarial and original images. Middle: in each step, the attack determines a new random direction by (#1) sampling a Gaussian distribution and projecting it on an equidistant sphere, and (#2) making a small move towards the original image. Right: both step sizes are dynamically adjusted to accommodate the boundary [40].
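As an illustration of the decision-based random walk summarized in Fig. 4, the sketch below uses only the target model's predicted labels. The proposal distribution, step sizes, and stopping rule are placeholders, and the dynamic step-size adjustment of [40] is omitted; predict is assumed to return an integer class label for a single image tensor.

```python
import torch

def boundary_attack(predict, x_orig, x_adv_init, steps=1000, delta=0.1, eps=0.1):
    """Simplified decision-based boundary attack. `x_adv_init` must already be
    misclassified (step a of Fig. 4)."""
    y_orig = predict(x_orig)
    x_adv = x_adv_init.clone()
    for _ in range(steps):
        # Step #1: random perturbation scaled onto a small sphere around x_adv.
        noise = torch.randn_like(x_adv)
        noise = delta * noise / noise.norm()
        candidate = x_adv + noise
        # Step #2: small move towards the original image.
        candidate = (candidate + eps * (x_orig - candidate)).clamp(0, 1)
        # Rejection sampling: keep the candidate only if it is still adversarial.
        if predict(candidate) != y_orig:
            x_adv = candidate
    return x_adv
```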
the loss function described in [33]; this process is repeated initial sample in the adversarial region is selected; b) a random
until sufficient pixels are perturbed. The method has been walk is executed to move the samples toward the decision
applied successfully to a target network without a gradient, but boundary between the adversarial and non-adversarial regions
requires as many queries as the number of pixels. However, by reducing the distance to a target example; c) the stages
attack-space dimension reduction, hierarchical attacks, and of the walk in the adversarial region are performed by means
importance sampling can be used to reduce the number of of rejection sampling. Steps b) and c) are then repeated until
queries. the adversarial example is sufficiently close to the original
Su et al. [55] also utilized the score of the network; however, image. Ilyas et al. [53] introduced a technique similar to
it changed only one pixel in the target image, and is hence a model that requires query-limited, partial-information, and
called a one-pixel attack. In this case, a differential evolution label-only settings. Such techniques could implement natural
algorithm was used to select pixels to perturb. These attacks evolutionary strategies (NESs) [57] to generate adversarial
achieved a good success rate. Recently, Guo et al. [56] intro- examples in a query-limited setting. Here, an instance of the
duced a simple black-box attack, which is query-efficient. They target class is selected as an initial sample and repeatedly
developed a method that picks random noise and either adds projected onto the L∞ -boxes to maximize the probability of
or subtracts them from an image, the addition or subtraction the adversarial target class.
of random noise was proved to increase the target score of the 2) Poisoning Attack: A poisoning attack inserts a ma-
attack. The algorithm repeats this procedure until the attack is licious example into the training set to interfere with the
successful. learning process or facilitate an attack during testing time by
Compared to the methods outlined above, a few other changing the decision boundary of the model, as displayed in
practical attacks rely on predicted labels, because the output Fig. 5. Several poisoning-attack methods applicable to ML
scores of the model are usually inaccessible. The process of techniques, such as SVM or least absolute shrinkage and
executing a boundary attack [40]—which assumes the worst selection operator (LASSO), can be described mathematically.
scenario for attackers—consists of three steps (Fig. 4): a) an However, neural networks are difficult to poison owing to
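The score-based attacks discussed above, such as [52], replace true gradients with finite-difference estimates obtained purely from model queries. The sketch below shows coordinate-wise zeroth-order gradient estimation in that spirit; score is an assumed function returning the attack objective from the target model's output probabilities, and the number of sampled coordinates, the step h, and the learning rate are illustrative.

```python
import torch

def zoo_gradient_estimate(score, x, n_coords=128, h=1e-4):
    """Estimate the gradient of `score` at x by symmetric finite differences
    over a random subset of coordinates (the remaining entries stay zero)."""
    grad = torch.zeros_like(x)
    flat = grad.view(-1)
    coords = torch.randperm(flat.numel())[:n_coords]
    for i in coords:
        e = torch.zeros_like(flat)
        e[i] = h
        e = e.view_as(x)
        flat[i] = (score(x + e) - score(x - e)) / (2 * h)
    return grad

def zoo_step(score, x, lr=0.01):
    """One ascent step on the attack objective using the estimated gradient."""
    g = zoo_gradient_estimate(score, x)
    return (x + lr * g).clamp(0, 1)
```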
2) Poisoning Attack: A poisoning attack inserts a malicious example into the training set to interfere with the learning process or to facilitate an attack during testing time by changing the decision boundary of the model, as displayed in Fig. 5. Several poisoning-attack methods applicable to ML techniques, such as SVM or the least absolute shrinkage and selection operator (LASSO), can be described mathematically. However, neural networks are difficult to poison owing to their complexity. Nonetheless, the relatively small number of feasible attack methods can be categorized into three types, based on the attacker’s goal: performance degradation attacks to compromise the learning process, targeted poisoning attacks to provoke target sample misclassification through feature collision, and backdoor attacks to create a backdoor to be exploited when the system is deployed.

Fig. 5. The functionality of poisoning a sample. (a) Decision boundary after training with normal data. (b) Decision boundary after injecting a poisoning sample.

a) Performance degradation attack: It aims to subvert the training process by injecting spurious samples generated from a bi-level optimization problem. Munoz et al. [58] described two performance degradation attack scenarios: perfect-knowledge (PK) and limited-knowledge (LK) attacks. The PK scenario is an unrealistic setting, and is useful only in a worst-case evaluation. In the LK scenario, the attacker typically possesses information, namely θ = (D̂, X, M, ŵ), where X is the feature representation, M is the learning algorithm, D̂ is the surrogate data, and ŵ is the parameter learned from D̂; the ˆ symbol indicates that the information is partial. The bi-level optimization for creating the poisoning samples can be represented as follows:

D*_c ∈ arg max_{D′_c ∈ φ(D_c)} A(D′_c, θ) = J(D̂_val, ŵ)
s.t.   ŵ ∈ arg min_{w′ ∈ W} J(D̂_tr ∪ D′_c, w′),   (9)

where D̂ is divided into training data D̂_tr and validation data D̂_val. The objective function A(D′_c, θ) evaluates the impact of the poisoning samples among the clean examples. This function can be considered a loss function, where J(D̂_val) measures the performance of the surrogate model with D̂_val. The influence of the poisoning sample D_c is propagated indirectly using ŵ, following which the poisoning sample is optimized. The primary objective of the optimization is to find a poisoning sample that can degrade the performance of the target model. The poison is generic if the target label of the poison sample is arbitrary and not specific. If a specific target is required, Equation 9 can be replaced by

A(D′_c, θ) = −J(D̂′_val, ŵ),   (10)

where D̂′_val is the manipulated validation set, which is similar to D̂_val, except for the presence of misclassified labels that can produce a desired output. Munoz et al. [58] proposed the back-gradient method to solve Equations 9 and 10 and generate poisoning examples as an alternative to gradient-based optimization. It requires a convex objective function and a Hessian-vector product, which are not produced by complicated learning algorithms such as those used to develop neural networks. In contrast, Yang et al. [59] were able to apply a gradient-based and GAN-like generative method to deep neural networks (DNNs) using an autoencoder to compute the gradients, which reduced the computation time by a factor of over 200.

The attacks described above can be detected easily by outlier detection. Nonetheless, Munoz et al. [60] recently proposed a GAN-based attack designed to avoid detection. Their pGAN model [60] had a generator, a discriminator, and a target classifier. A min–max game between the generator and discriminator generated spurious yet realistic images with a poisoning ability. A hyperparameter that can adjust the realism and poisoning ability of the spurious images affected the trade-off between effectiveness and detectability. When the influence of the realistic image-generation process was high, the attack success rate was low. Conversely, when the influence of the poisoning ability was high, the generator tended to produce outliers; hence, attacks were more detectable.

b) Targeted poisoning attack: This method was introduced by Koh et al. [22] to cause target test samples selected from the test dataset to be misclassified during the inference phase. The complexity of neural networks renders identifying the source for classification and explaining the classification in terms of training data challenging. Because of the expense of retraining a model after modifying or removing a training sample, the authors formulated the influence of up-weighting or modifying a training sample during training in terms of changes to the parameters and loss functions. The attack was optimized based on the amount of change in the test loss caused by the change in the training sample.

Although only a small number of attacks are performed on the training data, the attack may be unsuccessful if the training data are inspected by domain experts. Shafahi et al. [61] introduced a clean-label attack to circumvent this problem. Here, the feature-collision method was used to ensure that the labels introduced in the attack were appropriate for the images to which they were attached. Subsequently, the attacker would select the target image t and the base image b from the test set, where the target image would be expected to be misclassified as the label of the base image. The attack p is initialized with the base image and created using the following equation:

p = arg min_x ‖f(x) − f(t)‖₂² + β‖x − b‖₂².   (11)

The attack is generated by optimizing a sample similar to the base image in the image space and close to t in the feature space mapped by function f. The attack surrounds the target feature f(t) and changes the decision boundary to have the target image categorized within the base class. For example, if b is a picture of a dog and t is a picture of a bird, the attack changes the decision boundary by adding p, a perturbed version of b, to the training data. As a result, t is erroneously put into class b, and t can be used to deceive the classifier.
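A minimal sketch of the feature-collision objective in Equation 11 follows: the poison starts from the base image b and is optimized to match the target's features while staying close to b in image space. The feature_extractor stands in for the penultimate-layer mapping f, and the optimizer, step count, and β are placeholders rather than the forward-backward procedure used in [61].

```python
import torch

def craft_poison(feature_extractor, target, base, beta=0.1, lr=0.01, steps=500):
    """Clean-label poison (Eq. 11): p = argmin_x ||f(x) - f(t)||^2 + beta * ||x - b||^2."""
    with torch.no_grad():
        f_t = feature_extractor(target)
    p = base.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([p], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((feature_extractor(p) - f_t) ** 2).sum() \
             + beta * ((p - base) ** 2).sum()
        loss.backward()
        opt.step()
        with torch.no_grad():
            p.clamp_(0, 1)  # keep the poison a valid image
    return p.detach()
```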
Shafahi et al. [61] analyzed attacks in two of the retraining situations: end-to-end learning, which fine-tunes the entire model, and transfer learning, which fine-tunes the final layer. They used a one-shot kill attack that generates a poisoning attack from a pair of a base and a target image through the feature collision method. The one-shot kill attack was successfully applied to transfer learning after significant changes were made to the decision boundary; however, it was not applied to end-to-end learning, which retrains the lower layers that extract fundamental features. Shafahi et al. [61] succeeded in an attack on end-to-end learning using a watermarking method in which the target image was projected onto a base image by adjusting its opacity and using several target and base images.

Because all neural networks do not have the same feature mapping function, a feature collision crafted with one model cannot be applied to an unknown neural network. Zhu et al. [62] proposed a feature collision attack (FC attack) using an ensemble model and a convex polytope attack (CP attack). The FC attack adopts the same mechanism as that of [61], except that a greater number of models is used for the feature collisions. The FC attack was unsuccessful because the constraints on the attack increase, and the attack simply approaches the target in the feature space without changing the predicted result of the target image. In contrast, the CP attack, using convex properties, efficiently transforms the target into or near the polytope. However, the attacker would face difficulty poisoning the unknown target model if the model learned new feature mapping functions through end-to-end training. Thus, these authors also proposed a multilayer convex polytope attack that generated poisoning attacks using feature collisions of every activation layer. Moreover, recently, MetaPoison [63] generated clean-label data poisoning, which works in an end-to-end setting with bi-level optimization. Geiping et al. [64] succeeded in training their model from scratch on a full-sized, poisoned ImageNet dataset using gradient matching.

c) Backdoor attack: It is an attack that aims to install a backdoor to be accessed at classification time and was introduced by Gu et al. [28], who inserted patches into an image to cause false classifications, such as replacing a stop sign with a speed limit. Trojaning attacks [65] rely on the fact that neural network developers often download pre-trained weights from ImageNet for training or outsource the entire amount of data to suppliers of machine learning as a service (MLaaS). In a worst-case scenario, an attacker can directly change the user’s model parameters and training data; however, they cannot access the validation set of the user or use the training data to launch attacks. Trojaning attacks insert a trigger, in the form of a patch or watermark, into an image, which causes it to be categorized into the target class. They involve four steps: 1) the trigger and the target class are selected; 2) the attacker selects the node in the target layer with the highest connectivity from the preceding layer of the trained model, and the trigger is updated from the gradient derived from the difference in the activation result of the selected node and the targeted value of the node (the target value is set by the attacker to increase the relatedness of the trigger and the selected node of the target layer); 3) using the mean image of a public dataset, the training data are reverse-engineered to ensure that the images would be classified into the target class; 4) the target model is trained using the reverse-engineered image dataset, including the one containing the trigger. When the retrained model is deployed, the image with the trigger is misclassified with the target label. Such attacks have been successfully applied to face recognition, speech recognition, auto-driving, and age-recognition applications.

Chen et al. [66] introduced two strategies to obtain access to a face recognition system under three constraints, namely 1) no knowledge of the model, 2) access to only a limited amount of training data, and 3) poisoning data not being visually detectable. In the input-instance-key strategy, a key image is prepared and associated with a target label. To model the camera effects, noise is added to the key image. The second, pattern-key strategy, has three variants. The first variant, blended injection, blends a spurious image or random pattern into the key image; the output images of this process usually appear unrealistic. The second variant, accessory injection, applies an accessory, such as glasses or sunglasses, to the key image. This is a simple attack to execute during the inference stage. The third variant, blended accessory injection, combines the first and second strategies. Unlike previous studies, in which poisoning data accounted for 20% of the training data, the authors of [66] only added five poisoned images to 600,000 training images in the input-instance-key strategy, and approximately 50 for the pattern-key strategy. In both cases a backdoor was successfully created.

However, attacks with visible triggers can be detected easily by human inspection; therefore, recent attacks on image classification introduced invisible triggers such as scattered triggers [67], which are distributed across the image, warping-based triggers [68], and reflection triggers [69]. Backdoor attacks performed without label poisoning have also been proposed to increase the stealthiness of a target model [70, 71]. Furthermore, a backdoor can be established by flipping only several vulnerable bits of weights [72, 73, 74, 75].
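As an illustration of trigger-based poisoning, the sketch below stamps a small patch onto a handful of training images and assigns them the attacker's target label. It corresponds to the simple visible-patch setting of [28] rather than the invisible triggers discussed above; the patch size, location, and poisoning ratio are arbitrary choices.

```python
import torch

def add_trigger(images, patch_size=4, value=1.0):
    """Stamp a square trigger in the bottom-right corner of each image
    (images are assumed to be a float tensor of shape [N, C, H, W] in [0, 1])."""
    poisoned = images.clone()
    poisoned[:, :, -patch_size:, -patch_size:] = value
    return poisoned

def poison_dataset(images, labels, target_class, ratio=0.05):
    """Create a backdoored training set: a small fraction of the images gets
    the trigger and is relabeled with the attacker's target class."""
    n_poison = max(1, int(ratio * images.size(0)))
    idx = torch.randperm(images.size(0))[:n_poison]
    images = images.clone()
    labels = labels.clone()
    images[idx] = add_trigger(images[idx])
    labels[idx] = target_class
    return images, labels
```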
B. Defense Techniques against Deep Learning Models

Defense techniques against both poisoning and evasion attacks have been developed; the latter can be further categorized into empirical defenses against known evasion attacks and certified defenses, which are provably effective.

1) Defense techniques against evasion attacks: Various methods have been proposed to defend DL-based systems against evasion attacks (adversarial attacks). For example, Kurakin et al. [76] suggested that adversarial training can be employed when security against adversarial examples is a concern, which increases robustness against evasion attacks. Including adversarial training, defense techniques can be broadly divided into three categories: gradient masking, robustness, and detection.

a) Gradient masking: Gradient masking obfuscates the gradients used in attacks [38]. There are three approaches predominantly adopted in this method: shattered gradients, stochastic gradients, and vanishing/exploding gradients.

Neural networks generally behave in a largely linear manner [77]. As image data is multidimensional in nature, the property of linearity can have adverse effects on the classification, rendering the model vulnerable to adversarial attacks. The shattered gradients approach involves making the model nondifferentiable or numerically unstable, to ensure that accurate gradients cannot be obtained. One version of the shattered gradient defense involves thermometer encoding [10]. This method applies nondifferentiable and non-linear transformations to the input by replacing one-hot encoding with thermometer encoding. The thermometer τ(j) ∈ R^K can be expressed as follows:

τ(j)_l = { 1, if l ≥ j;  0, otherwise }.   (12)

Subsequently, the thermometer discretization function f for a pixel i ∈ {1, · · · , n} can be defined as

f_therm(x)_i = τ(b(x_i)) = C(f_onehot(x_i)),   (13)

where C is the cumulative sum C(c)_l = Σ_{j=0}^{l} c_j, and b is the quantization function. Other defense techniques based on gradient shattering include local intrinsic dimensionality (LID) [78] metrics or input transformations [79] such as image cropping, rescaling [80], bit-depth reduction [81], JPEG compression [34], and total variance minimization [82].

The stochastic gradients approach obfuscates gradients in the inference phase by dropping random neurons in each layer. The network then stochastically prunes a subset of the activations in each layer during the forward pass. Stochastic activation pruning [11] is a variant of this method in which the dropout follows the probability from a weighted (rather than uniform) distribution. The surviving activations are scaled up to normalize the dynamic range of the inputs to the subsequent layer. The probability of sampling the jth activation in the ith layer is given by

p_{ij} = |(h^i)_j| / Σ_{k=1}^{a^i} |(h^i)_k|,   (14)

where h^i ∈ R^{a^i} and (h^i)_j is the value of the jth activation in the ith layer. Xie et al. [83] also used a randomization technique that inserts a layer in front of the input to the neural network, which rescales and zero-pads the input.

The vanishing/exploding gradients method renders the model unusable by deep computation, which restores adversarially perturbed images to clean images. These images are then fed to the unmodified classifier. PixelDefend [12] is a defense algorithm that uses PixelCNN [84] to approximate the training distribution. PixelCNN is a generative model designed to produce images that track the likelihood over all pixels by factorizing it into a product of conditional distributions:

P_CNN(x) = Π_i P_CNN(x_i | x_{1:(i−1)}).   (15)

Defense-GAN [85] is a similar method that uses a GAN instead of PixelCNN. The trained generator projects images onto the manifold of the GAN, and these projected images are then fed into the classifier.

Gradient-based defense algorithms based on the gradient of the initial version are inherently vulnerable to gradient-based attacks. Athalye et al. [38] used projected gradient descent to set a perturbation υ, combined with the l2 Lagrangian relaxation approach [86]. Gradient-masking techniques, which exploit obfuscated gradients, are vulnerable to strong gradient-based attacks [36, 13, 86]. Alternatively, an attacker may simply use a different attack [33, 38] to bypass such a defense, or the defense may be circumvented by an adversary that uses the true adversarial examples [87, 88].

b) Robustness: Gradient obfuscation could prove to be useless in a white-box setting, where increasing robustness may be a better approach. One method to increase the robustness of the model is to enable it to produce similar outputs from clean and adversarial examples, either by penalizing the difference between them or by regularizing the model to reduce the attack surface.

Most studies on robustness involve adversarial training [9], which can be viewed as minimizing the worst error caused by the perturbed data of an adversary. It can also be perceived as learning an adversarial game with a model that requests labels for the input data. Other techniques include the distillation training [89] method, which provides robustness to saliency map attacks [35], and a layerwise regularization method [90], which controls the global Lipschitz constant of a network. However, none of these methods produce fully robust models, and they can be bypassed by a multi-step attack, such as projected gradient descent (PGD).

Most of the optimization problems in ML are solved using first-order methods and variants of stochastic gradient descent; thus, a universal attack can be designed using first-order information. Madry et al. [13] suggested that local maxima for the worst error can be found by PGD, on the basis that a trained network that is robust against PGD adversaries will also be robust against a wide range of attacks that assume first-order optimization.

Adversarial training was originally used to train a small model using the MNIST dataset [9]. Kurakin et al. [76] extended that work to ImageNet [91] using a deeper model with a batch normalization step. The relative weights of the adversarial examples can be independently controlled in each batch using the following loss function:

Loss = (1 / ((m − k) + λk)) ( Σ_{CLEAN} J(x_i | l_i) + λ Σ_{ADV} J(x̃_i | l_i) ),   (16)

where J(x|l) is the loss on a single example x with true class l, m is the total number of training examples in the batch, k is the number of adversarial examples in the batch, and λ is the weight applied to adversarial examples.

Defense techniques that change the target function by introducing regularizers or modifying the architecture of the model help increase the robustness of the model against adversarial attacks. Kannan et al. [92] introduced adversarial logit pairing (ALP), which produces regularization by reducing the distance between the logits of clean examples and those of adversarial examples. The loss function of training then becomes:

J(M, w) + λ (1/m) Σ_{i=1}^{m} L( f(x^(i); w), f(x̃^(i); w) ),   (17)

where J(M, w) is the cost of training a minibatch M, w is the model parameter, and L is the distance function.
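A compact sketch of one adversarial-training step with the mixed clean/adversarial loss of Equation 16 follows. It reuses the fgsm helper sketched earlier, which is an assumption of this illustration, and the batch split, ε, and λ are placeholder values rather than the settings of [76].

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=0.03, lam=0.3, adv_frac=0.5):
    """One step of adversarial training (Eq. 16): part of the batch is replaced
    by adversarial examples and its loss is weighted by lam."""
    k = int(adv_frac * x.size(0))
    x_adv = fgsm(model, x[:k], y[:k], eps)  # adversarial portion of the batch
    optimizer.zero_grad()
    loss_clean = F.cross_entropy(model(x[k:]), y[k:], reduction="sum")
    loss_adv = F.cross_entropy(model(x_adv), y[:k], reduction="sum")
    loss = (loss_clean + lam * loss_adv) / ((x.size(0) - k) + lam * k)
    loss.backward()
    optimizer.step()
    return loss.item()
```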
The results showed that a simple regularizer can improve the robustness of a model that is trained adversarially. Double backpropagation [93], which is a regularizer that penalizes the magnitudes of the input gradients, reduces the sensitivity of the divergence between the predictions and the uniform uncertainty produced by evasive attacks. Miyato et al. [94] introduced a regularizer that reduced the Kullback–Leibler divergence between clean and adversarial examples, to render the distributions of the resulting outputs more similar to each other. Xie et al. [95] denoise feature maps by adding blocks, such as non-local mean blocks, to a network to reduce adversarial perturbations from the inputs. A more recent regularizer [96] makes a model behave linearly in the vicinity of the input data, which reduces the effect of gradient obfuscation and improves robustness to adversarial examples.

There are several variants of adversarial training, such as the augmentation of training data or the introduction of new loss functions. Tramer et al. [51] proposed ensemble adversarial training to defend against black-box attacks by using adversarial examples generated by other networks. Decoupling adversarial example generation from the trained model increases the diversity of the training data. In another study, tradeoff-inspired adversarial defense via surrogate-loss minimization (TRADES) [14] identified a trade-off between adversarial robustness and accuracy. The expected errors in adversarial examples are decomposed into the sum of the expected errors in clean examples and a boundary error that corresponds to the likelihood of the closeness of the input features to the perturbation-extension of the decision boundary. Both these errors are expressed by a surrogate loss function, such as the cross-entropy or 0-1 loss functions, to yield the following minimization:

min_f E{ φ(f(x)l) + max_{x′ ∈ B(x,ξ)} φ(f(x)f(x′)/λ) },   (18)

where φ is the surrogate loss function that represents the expected errors, and B(x, ξ) represents a neighborhood of x: {x′ ∈ X : ‖x′ − x‖ ≤ ξ}; the two terms are the expected error and the boundary error weighted by λ. This method showed state-of-the-art performance under both black-box and white-box attacks. Zhang et al. [97] proposed a feature scattering-based adversarial training approach that utilized the optimal transport distance between the input data and its adversarial examples for training without label leaking [36]. Recently, Zhang et al. [98] attempted to handle the trade-off between standard accuracy and robust accuracy by weakening the adversarial examples during adversarial training. They used early-stopped PGD to prevent the adversarial examples used for training from significantly destroying the generalization ability.

According to Schmidt et al. [99], adversarial robustness requires more data to produce successful results. There are several methods [100, 101] that use data augmentation to increase robustness. Carmon et al. [100] used unlabeled data to improve the robustness; the data were pseudo-labeled by the classifier before being deployed in adversarial training. Lee et al. [101] proposed out-of-distribution data augmented training (OAT). They used out-of-distribution data for training with a uniform distribution label and achieved improved robustness by removing the contributions of undesirable features.

c) Detection: Retaining the ability to detect attacks at the inference phase is considered equally (if not more) valuable to increase the security of a DL-based system and ensure that corrupted input can be rejected. Most detection methods require no change to the classifier; therefore, they are easy to implement and can be combined with other defenses.

Metzen et al. [15] detected adversarial examples using a binary detector network, which was trained to classify inputs into clean and perturbed examples. Using a similar scheme, Meng et al. [16] separated a detector and a reformer network, which were then used to reconstruct clean input. These networks identified adversarial examples from the reconstruction error, which yields the Jensen–Shannon divergence of the original and reconstructed inputs:

JSD(P‖Q) = (1/2) D_KL(P‖M) + (1/2) D_KL(Q‖M),   (19)

where P is the output resulting from the original inputs, Q is the output of the reconstructed input, and M is the mean of P and Q, M = (1/2)(P + Q).

Feature squeezing [81] reduces the search space available to attackers by squeezing the inputs before comparing the prediction results obtained from the squeezed examples with those of the clean examples. If there are substantial differences, the original input is likely to be adversarial. Squeezing was achieved by color-depth reduction and spatial smoothing (both local and non-local smoothing). This method was able to detect adversarial examples in various types of evasion attacks with a low false-positive rate.
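A small sketch of the feature-squeezing detector described above: predictions on the original input are compared with predictions on squeezed copies, and a large L1 gap flags the input as adversarial. The squeezers shown (bit-depth reduction and an average-pooling blur standing in for the median filter of [81]) and the threshold are placeholders.

```python
import torch
import torch.nn.functional as F

def reduce_bit_depth(x, bits=4):
    """Squeeze color depth: quantize pixel values in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def smooth(x, k=3):
    """Rough local spatial smoothing via average pooling."""
    return F.avg_pool2d(x, kernel_size=k, stride=1, padding=k // 2)

def is_adversarial(model, x, threshold=1.0):
    """Flag x as adversarial if squeezing changes the softmax output too much."""
    p = F.softmax(model(x), dim=1)
    for squeeze in (reduce_bit_depth, smooth):
        p_sq = F.softmax(model(squeeze(x)), dim=1)
        if (p - p_sq).abs().sum(dim=1).max() > threshold:
            return True
    return False
```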
Grosse et al. [102] identified adversarial examples by augmenting the model with an additional output class: a classifier is trained to categorize adversarial examples into a new class. This involves a small reduction in classification accuracy but a high detection rate. Feinman et al. [103] also detected adversarial examples by examining statistical metrics such as the density of the feature space and Bayesian uncertainty estimates.

Pang et al. [104] minimized the reverse cross-entropy as the loss function used to train the model, to identify adversarial examples. The reverse cross-entropy loss value of an input x over a label y is expressed as follows:

L^R_CE(x, l) = −R_l^⊤ log σ(x),   (20)

R_y = P^λ_i = { 1/(λ+1), if i = y;  λ/((L−1)(λ+1)), if i ≠ y },   (21)

where σ(·) is the softmax output, R_y represents the reversed form of the label y, and λ is the hyperparameter, with λ = ∞ in the experiment. In a recent study, Hu et al. [17] introduced a detection method with two safety criteria: robustness against random noise and susceptibility to adversarial noise, which are represented as robustness against Gaussian noise and the minimum number of steps required to perturb the input, respectively. They achieved unprecedented accuracy in a white-box setting.

d) Certified approach: The robustness of most defenses can only be established empirically in the context of known types of attacks. An empirically robust classifier may be overcome by new and stronger attacks. However, some classifiers, generally DNNs, can be proven robust if they produce a constant output for some set of variations of the inputs, which is generally expressed as an Lp ball.

DNNs have input and output layers with hidden layers between them. Reluplex [18] verifies the robustness of DNNs by searching for linear combinations of hidden layers. This problem is NP-complete, and thus, the search space is reduced by a simplex algorithm. This algorithm is based on a satisfiability modulo theories (SMT) solver that addresses Boolean satisfiability. Exploiting the properties of the simplex, Reluplex allows inputs to temporarily violate their feasible bounds during certification, verifying the robustness of a neural network.

Sinha et al. [105] introduced a method that is provably robust to perturbations distributed in a Wasserstein ball. They trained a classifier with adversarial training using distributionally robust optimization. Hein et al. [106] showed formal guarantees on the robustness of classifiers using a bound on the local Lipschitz constant in the vicinity of the input. Their Cross-Lipschitz regularizer increased the range of attacks that can be defeated, forcing potential attackers to find better modes of attack.

Accurate bounds on worst-case losses improve the robustness. Raghunathan et al. [107] improved the accuracy of both the lower and upper bounds on the worst-case loss by concentrating on the upper bound. This was performed on the basis that it is safer to minimize the upper bound than to minimize the lower bound. They demonstrated a novel certified approach against adversarial examples on two-layer networks. Wong et al. [19] presented a convex outer-bound approach called an "adversarial polytope", which is the set of all the final activation layers that are produced by applying norm-bounded perturbations to the inputs. They used this bound for a linear relaxation of the ReLU activation and optimized the worst-case loss over the region within the bound, as shown in Fig. 6. However, this method can only be applied to small networks. Wong et al. [108] extended the scope of this method by introducing a provably robust training procedure for general networks, formulated in terms of Fenchel conjugate functions, nonlinear random projections, and model cascade techniques.

Cohen et al. [20] addressed the issue of certified defenses from a different perspective; they proved that classifiers that are robust against Gaussian noise are also robust against adversarial perturbations bounded by the l2 norm. They used randomized smoothing, which had already been proven [109] to maintain robustness. Cohen et al. [20] further proved that smoothing with Gaussian noise can induce certifiable robustness against l2-norm-bounded perturbations. Because the exact evaluation of the robustness of the classifier is not possible, they showed that the method is robust against attacks with high probability using Monte Carlo algorithms.

Recently, Balunovic et al. [110] combined the adversarial training of a classifier with provable defense methods. A verifier aims to prove the robustness of the classifier, while an adversary attempts to find inputs that cause errors within convex bounds, as shown in Fig. 7. They utilized layerwise adversarial training and bridged the gap between adversarial training-based empirical defense methods and existing certified defense methods. The method resulted in state-of-the-art robust accuracy on the CIFAR-10 dataset under 2/255 L∞ and 8/255 L∞ perturbations.

Fig. 7. Illustration of layerwise adversarial training. A latent adversarial example is found in the convex region C1(x) and propagated through the latter layers in a forward pass, which is represented by the blue lines. The red line shows the gradients during a backward pass. In the procedure, the first layer, which corresponds to the former layer of the convex region C1(x), does not receive gradients [110].
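The following is a bare-bones Monte Carlo version of the randomized-smoothing prediction described above [20]: the smoothed classifier returns the class most frequently predicted under Gaussian noise. The noise level and sample count are illustrative, the input is assumed to be a single image of shape [1, C, H, W], and the statistical certification bounds of [20] are omitted.

```python
import torch

def smoothed_predict(model, x, sigma=0.25, n_samples=100, batch_size=50):
    """Predict with the smoothed classifier g(x) = argmax_c P[f(x + N(0, sigma^2 I)) = c],
    estimated by Monte Carlo sampling."""
    with torch.no_grad():
        num_classes = model(x).size(1)
        counts = torch.zeros(num_classes, dtype=torch.long)
        remaining = n_samples
        while remaining > 0:
            n = min(batch_size, remaining)
            remaining -= n
            noisy = x.repeat(n, 1, 1, 1) + sigma * torch.randn(n, *x.shape[1:])
            preds = model(noisy).argmax(dim=1)
            counts += torch.bincount(preds, minlength=num_classes)
        return counts.argmax().item()
```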
HO et al.: SECURITY AND PRIVACY ISSUES IN DEEP LEARNING 11
proved that approximate influence functions can be effective al. [27], except they pruned filters of a neural network that
against poisoning attacks. These functions additionally allow were compromised, thus triggering a backdoor attack.
a defender to focus on data with a high influence score. This
method appears to be a better way of eliminating tainted ex-
amples than simply identifying data points with large training III. P RIVATE AI
losses. Deep learning algorithms, which underpin most current
Paudice et al. [23] also suggested a defense mechanism AI systems, are data-driven, which exposes them to privacy
to mitigate the effects of poisoning attacks based on outlier threats while data collection or pretrained-model distribution
detection. An attacker would attempt to cause the greatest is performed. Many attempts were made to build private AI
possible impact with a limited number of poisoning points. systems to maintain data privacy. In this section, we describe
To mitigate this effect, they divided the trustworthy dataset the ways in which privacy can be breached in current AI
D into two classes, D+ and D− , and trained a distance- systems, review defenses based on homomorphic encryption
based outlier detector for each class. Each detector calculated (HE), secure multi-party computation (SMC), and differential
an outlier score for each sample in the entire clean dataset. privacy (DP).
There are many ways to measure the outlier score, such as an
SVM or local outlier factor (LOF). In this study, the empirical
cumulative distribution function (ECDF) of training instances A. Scenarios for Privacy Attacks
was used to determine a threshold for detecting outliers. Upon
removing all the entities expected to be contaminated, the 1) Service providers: Service providers offer AI-based
remaining data were used to retrain the learning algorithm. applications to the public. These applications are based on
Instead of following outliers, Paudice et al. [24] decided to pretrained DL models, and often use privacy-sensitive data to
relabel data points that considered outliers by a label-flipping improve model performance. A group of studies [112] has
attack, which is a poisoning attack, wherein an attacker changes the labels of a few training points. They considered the points farthest from the decision boundary to be malicious and reclassified them. The algorithm reassigned the label of each malicious example using a k-nearest neighbor (k-NN) algorithm. For each sample of the training data, the k nearest points were first found using the Euclidean distance. If the number of points sharing the most common label among these neighbors was equal to or greater than a given threshold, the corresponding training sample was relabeled with that most common label (a minimal sketch of this rule appears below).

Chen et al. [25] looked for poisoned data by monitoring activations in the latent space of a neural network rather than analyzing its input or output. Each example was analyzed by measuring how far its activations deviated from the activation-value distribution of the class majority. Tran et al. [26] also defended against other backdoor-attack variants by monitoring activation values, which were analyzed using spectral signatures. This method spots poisoned data using the activations of a neural network, similar to the method used by Chen et al. [25]. First, a singular value decomposition was applied to the covariance matrix of the activations. Subsequently, all the training data were compared with the first singular vector. Poisoned examples had a high outlier score and were erased before retraining the neural network (see the sketch below).

The defense proposed by Liu et al. [27] differs from the mechanisms described above, which aim to detect and remove poisoned data. These authors modified the neural network itself, using a technique called fine-pruning (a combination of pruning and fine-tuning). Pruning a neural network removes neurons, including the backdoor neurons [28]. However, because some attacks are made pruning-aware, this method also cleans the neural network through fine-tuning on trusted clean data after pruning. The resultant network was found to be robust against multiple poisoning attacks. Wang et al. [29] presented a similar method to that of Liu et al. [27].

Turning to threats against data privacy, it has been suggested that DL models not only learn latent patterns from the training data but also effectively become a repository of that data, which would be exposed by granting access to a pretrained model. In a membership inference attack [113, 114, 115, 116, 117], an attacking model tries to determine whether a given dataset was used to train the target model (a toy illustration is given below). The more powerful inversion attack aims to obtain the attributes of the unknown data that were used to train the target model. For example, Fredrikson et al. [30] reconstructed an image of a face that was used to train a target classifier using the confidence scores attached to the classification.

2) Information Silos: An information silo is a data management system that is isolated from other similar systems. A deep-learning system is usually more effective if it is trained using a large dataset. In an AI system, information from different silos may be used to train the model without directly sharing data among the silos. Federated learning [118, 119, 120] facilitates this process by sharing gradients and model parameters; however, this makes the data vulnerable to the membership and inversion attacks illustrated in Fig. 8. Hitaj et al. [121] demonstrated that a federated DL approach is essentially broken in terms of privacy, because it is virtually impossible to protect the training data of honest participants from an attack in which a GAN tricks a victim into revealing sensitive data.

3) Users: Many DL-based applications run on third-party servers because they are too large and complicated [122, 123] to run on devices such as mobile phones or smart speakers. Users must therefore transfer sensitive data, such as voice recordings or images of faces, to the server. Consequently, the user loses control of their data: they can neither delete it nor determine the manner in which it is used. As the recent Facebook–Cambridge Analytica data scandal [124] showed, privacy policies may be inadequate in preventing the exploitation of user data.
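To make the label-sanitization rule of Paudice et al. [24] summarized above concrete, the following is a minimal sketch, assuming the training samples are plain feature vectors; the function name, k, and the threshold value are illustrative choices, not the authors' implementation.

import numpy as np
from collections import Counter

def knn_label_sanitization(X, y, k=5, threshold=4):
    """Relabel a training point when at least `threshold` of its k nearest
    neighbours (Euclidean distance, excluding the point itself) agree on a
    most common label, mirroring the relabeling rule described above."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    y_sanitized = y.copy()
    for i in range(len(X)):
        dists = np.linalg.norm(X - X[i], axis=1)   # Euclidean distances to all points
        dists[i] = np.inf                          # exclude the point itself
        neighbours = np.argsort(dists)[:k]         # indices of the k nearest neighbours
        label, count = Counter(y[neighbours].tolist()).most_common(1)[0]
        if count >= threshold:                     # confident neighbourhood majority
            y_sanitized[i] = label                 # relabel to that majority label
    return y_sanitized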
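Similarly, the spectral-signature outlier score of Tran et al. [26] can be sketched as follows, computed per class on hidden-layer activations; this is a simplified illustration of the idea, not the authors' released code.

import numpy as np

def spectral_outlier_scores(activations):
    """activations: array of shape (n_examples, n_features) holding the hidden
    representations of the training examples of one class."""
    A = np.asarray(activations, dtype=float)
    A_centered = A - A.mean(axis=0)                    # center each feature
    _, _, vt = np.linalg.svd(A_centered, full_matrices=False)
    top_direction = vt[0]                              # first (right) singular vector
    return (A_centered @ top_direction) ** 2           # outlier score per example

# Examples with the largest scores are treated as poisoned and removed
# before retraining, e.g.:
# scores = spectral_outlier_scores(acts)
# keep = scores <= np.quantile(scores, 0.85)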
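As a toy illustration of the membership-inference threat described above, the following confidence-thresholding baseline flags likely training members; the cited attacks [113, 114, 115, 116, 117] are considerably more sophisticated (e.g., they train shadow or attack models), so this only sketches the underlying signal, and the threshold is an arbitrary illustrative value.

import numpy as np

def membership_guess(softmax_outputs, threshold=0.95):
    """softmax_outputs: array of shape (n_samples, n_classes) returned by the
    target model. Overconfident predictions are taken as weak evidence that a
    sample was seen during training."""
    confidence = np.max(softmax_outputs, axis=1)   # top-1 confidence per sample
    return confidence >= threshold                 # True -> guessed "member"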
Fig. 8. Privacy attack scenarios from the perspectives of the (a) service provider, (b) information silo, and (c) user.
TABLE III
DEFENSE METHODS FOR PRIVATE AI
B. Defenses against Attacks on Privacy

Several attempts have been made to combine DL with established security techniques, including homomorphic encryption (HE), secure multi-party computation (SMC), and differential privacy (DP). Table III evaluates these techniques against the privacy threats listed in Section III-A1, and in this subsection, we review their effectiveness.

1) Homomorphic encryption on deep learning: HE is a cryptographic scheme that enables computations on encrypted data without decryption. An encryption scheme is homomorphic with respect to an operation ∗ if, without access to the secret key, the following holds:

Enc(x1) ∗ Enc(x2) = Enc(x1 ∗ x2),   (22)

where Enc(·) denotes the encryption function (a toy numerical illustration is given below). HE can protect user data from third-party servers or gradients aggregated among information silos.

Gilad-Bachrach et al. [125] introduced the use of encrypted data in the inference phase. Their CryptoNets system uses a YASHE leveled HE scheme [149] to provide privacy-preserving inference on a pretrained CNN. CryptoNets demonstrated over 99% accuracy in a classification task on handwritten digits in the MNIST dataset [150]. However, because nonlinear activations are approximated by square functions, the extensibility of CryptoNets to large, complicated models is questionable [122, 123]. Hesamifard et al. [126] and Chabanne et al. [127] therefore attempted to improve CryptoNets using higher-degree polynomial approximations of the activation functions. Chabanne et al. [127] employed batch normalization to reduce the difference in accuracy between the original classifier and the classifier evaluated on encrypted data by approximating the activation function during inference. This technique also permitted the design of a deeper model. Inference with the original version of CryptoNets was slow, taking several hundred seconds; its speed was subsequently improved in later studies [128, 129].

TFHE [151] is a recent HE technique that supports operations on binary data. TAPAS [130] and FHE-DiNN [131] improved on this scheme using binary neural networks and achieved higher speed and greater accuracy on the MNIST dataset with only a single hidden layer.

2) Secure multi-party computation on deep learning: To date, two major approaches have been proposed to maintain privacy in DL-based systems involving multiple parties: 1) protection of user-side privacy by secure multi-party computation, and 2) secure sharing of gradients between information silos.

SMC methods are primarily based on secure two-party computation (2PC) techniques, which involve a user, who provides data, and a server that runs a DL system using the data. SecureML [132] was the first privacy-preserving method proposed in which neural networks were computed using 2PC; however, it requires a large amount of communication. In MiniONN [133], a neural network is replaced by an oblivious neural network that is trained using a simplified HE scheme. Garbled circuits (GCs) were also adopted in subsequent 2PC frameworks [134, 135].
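As a toy numerical illustration of the homomorphic property in Eq. (22), unpadded "textbook" RSA is homomorphic when ∗ is multiplication; the parameters below are deliberately tiny and insecure, and practical systems such as CryptoNets rely on leveled HE schemes instead.

# Textbook RSA: Enc(x1) * Enc(x2) mod n equals Enc(x1 * x2), matching Eq. (22).
# Tiny, insecure parameters, for illustration of the property only.
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

def enc(x): return pow(x, e, n)    # Enc(x) = x^e mod n
def dec(c): return pow(c, d, n)    # Dec(c) = c^d mod n

x1, x2 = 7, 9
product_of_ciphertexts = (enc(x1) * enc(x2)) % n
assert product_of_ciphertexts == enc((x1 * x2) % n)   # Enc(x1) * Enc(x2) = Enc(x1 * x2)
assert dec(product_of_ciphertexts) == (x1 * x2) % n   # decryption recovers the product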
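The accuracy/extensibility tension discussed for CryptoNets stems from replacing nonlinear activations with low-degree polynomials, since HE schemes evaluate only additions and multiplications. The small experiment below (illustrative only; the cited works [126, 127] choose their approximations more carefully) shows how the fit to ReLU improves with the polynomial degree, at the cost of a deeper multiplicative circuit.

import numpy as np

x = np.linspace(-4.0, 4.0, 401)        # interval on which the activation is approximated
relu = np.maximum(x, 0.0)

for degree in (2, 4, 8):
    coeffs = np.polyfit(x, relu, degree)        # least-squares polynomial fit
    approx = np.polyval(coeffs, x)
    max_err = np.max(np.abs(approx - relu))
    print(f"degree {degree}: max |ReLU - approx| on [-4, 4] = {max_err:.3f}")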
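For the 2PC setting, the basic building block can be illustrated with additive secret sharing over a prime field. This is a minimal sketch of the idea only, not the protocols of SecureML [132] or MiniONN [133], which additionally require machinery such as multiplication triples or garbled circuits for nonlinear operations; the modulus and values are arbitrary.

import secrets

P = 2**61 - 1                      # a Mersenne prime; all shares live in Z_P

def share(x):
    """Split x into two additive shares such that x = s0 + s1 (mod P)."""
    s0 = secrets.randbelow(P)
    return s0, (x - s0) % P

def reconstruct(s0, s1):
    return (s0 + s1) % P

# Each party holds one share of x and one of y; shares of x + y can be
# computed locally, so neither party ever sees the other's inputs.
x, y = 123_456, 789_012
x0, x1 = share(x)
y0, y1 = share(y)
z0, z1 = (x0 + y0) % P, (x1 + y1) % P      # local additions by the two parties
assert reconstruct(z0, z1) == (x + y) % P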
Differential privacy can also be enforced during the knowledge-transfer phase of the teacher-student mechanism. Papernot et al. [146] introduced a semi-supervised knowledge-transfer technique called private aggregation of teacher ensembles (PATE), which is a type of teacher-student mechanism whose purpose is to train a differentially private classifier (the student) based on an ensemble of non-private classifiers (the teachers) trained on disjoint datasets. The teacher ensemble outputs noisy labels by noisy aggregation of each teacher's predictions, which the student learns (a minimal sketch of this aggregation is given below). Because the student model cannot access the training data directly and the labels that it receives are differentially private, PATE provides DP. PATE utilizes a moments accountant to track the privacy budget spent through the learning process. Later, Papernot et al. [147] extended PATE to operate at a large scale by introducing a new noisy aggregation mechanism, which outperformed the original PATE. Jordon et al. [148] applied PATE to train a discriminator and thereby build a differentially private GAN framework. The discriminator provided DP, and the generator trained with the discriminator was also differentially private by the post-processing property of DP [160].

Moreover, as mentioned in previous studies, poisoned training data can be classified as outliers and detected; if the poisoned data is made less visible by being highly similar to the clean data, it also becomes less effective. This provides an avenue for developing a methodology to balance the trade-off between effectiveness and detectability, which has been achieved by changing the decision boundary of the model. New metrics and evaluation methods are further required to secure the training phase.

In addition to designing DL-based systems to be as robust as current technology permits, it is essential to establish good testing and verification methods. For instance, there exists an automated test for DNNs used in autonomous driving vehicles [175]. Several authors [18, 19, 176, 177] have presented formal analyses of the robustness of DNNs against input perturbations. However, current verification methods predominantly consider only norm-ball perturbations, such as the l2 ball mentioned in Section II; hence, more generalized verification is required.
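As referenced above, the noisy aggregation at the core of PATE can be sketched as follows; the noise scale, the vote counts, and the function name are illustrative, and the cited works [146, 147] pair this mechanism with careful privacy accounting.

import numpy as np

def noisy_aggregate(teacher_votes, num_classes, gamma=0.1, rng=None):
    """teacher_votes: 1-D array with one predicted class index per teacher.
    Returns the label released to the student: the argmax of the vote
    histogram after adding Laplace(1/gamma) noise to each count."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    counts += rng.laplace(loc=0.0, scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(counts))

# Example: ten teachers voting over three classes.
votes = np.array([0, 0, 1, 0, 2, 0, 0, 1, 0, 0])
student_label = noisy_aggregate(votes, num_classes=3)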
[3] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, 2015.
[4] C. Yan, X. Wenyuan, and J. Liu, “Can you trust autonomous vehicles: contactless attacks against sensors of self-driving vehicle,” in DEF CON 24 Hacking Conference, 2016.
[5] J. Li, F. Schmidt, and Z. Kolter, “Adversarial camera stickers: a physical camera-based attack on deep learning systems,” in Proc. International Conference on Machine Learning, 2019.
[6] N. Carlini and D. Wagner, “Audio adversarial examples: targeted attacks on speech-to-text,” in Proc. 39th IEEE Symposium on Security and Privacy, 2018.
[7] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, “Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition,” in Proc. ACM SIGSAC Conference on Computer and Communications Security, 2016.
[8] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli, “Evasion attacks against machine learning at test time,” in Proc. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2013.
[9] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
[10] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow, “Thermometer encoding: one hot way to resist adversarial examples,” in Proc. International Conference on Learning Representations, 2018.
[11] G. S. Dhillon, K. Azizzadenesheli, Z. C. Lipton, J. Bernstein, J. Kossaifi, A. Khanna, and A. Anandkumar, “Stochastic activation pruning for robust adversarial defense,” in Proc. International Conference on Learning Representations, 2018.
[12] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman, “PixelDefend: leveraging generative models to understand and defend against adversarial examples,” in Proc. International Conference on Machine Learning, 2017.
[13] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in Proc. International Conference on Learning Representations, 2018.
[14] H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan, “Theoretically principled trade-off between robustness and accuracy,” arXiv preprint arXiv:1901.08573, 2019.
[15] J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff, “On detecting adversarial perturbations,” in Proc. International Conference on Learning Representations, 2017.
[16] D. Meng and H. Chen, “MagNet: a two-pronged defense against adversarial examples,” in Proc. ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 135–147.
[17] S. Hu, T. Yu, C. Guo, W.-L. Chao, and K. Q. Weinberger, “A new defense against adversarial images: turning a weakness into a strength,” in Advances in Neural Information Processing Systems, 2019.
[18] G. Katz, C. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer, “Reluplex: an efficient SMT solver for verifying deep neural networks,” in Proc. International Conference on Computer Aided Verification (CAV), 2017.
[19] E. Wong and Z. Kolter, “Provable defenses against adversarial examples via the convex outer adversarial polytope,” in Proc. International Conference on Machine Learning, 2018.
[20] J. M. Cohen, E. Rosenfeld, and J. Z. Kolter, “Certified adversarial robustness via randomized smoothing,” in Proc. International Conference on Machine Learning, 2019.
[21] J. Steinhardt, P. W. W. Koh, and P. S. Liang, “Certified defenses for data poisoning attacks,” in Advances in Neural Information Processing Systems, 2017.
[22] P. W. Koh and P. Liang, “Understanding black-box predictions via influence functions,” in Proc. 34th International Conference on Machine Learning, 2017.
[23] A. Paudice, L. Muñoz-González, A. Gyorgy, and E. C. Lupu, “Detection of adversarial training examples in poisoning attacks through anomaly detection,” arXiv preprint arXiv:1802.03041, 2018.
[24] A. Paudice, L. Muñoz-González, and E. C. Lupu, “Label sanitization against label flipping poisoning attacks,” in Proc. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2018.
[25] B. Chen, W. Carvalho, N. Baracaldo, H. Ludwig, B. Edwards, T. Lee, I. Molloy, and B. Srivastava, “Detecting backdoor attacks on deep neural networks by activation clustering,” arXiv preprint arXiv:1811.03728, 2018.
[26] B. Tran, J. Li, and A. Madry, “Spectral signatures in backdoor attacks,” in Advances in Neural Information Processing Systems, 2018.
[27] K. Liu, B. Dolan-Gavitt, and S. Garg, “Fine-pruning: defending against backdooring attacks on deep neural networks,” in Proc. International Symposium on Research in Attacks, Intrusions, and Defenses, 2018.
[28] T. Gu, B. Dolan-Gavitt, and S. Garg, “BadNets: identifying vulnerabilities in the machine learning model supply chain,” arXiv preprint arXiv:1708.06733, 2017.
[29] B. Wang, Y. Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, and B. Y. Zhao, “Neural cleanse: identifying and mitigating backdoor attacks in neural networks,” in Proc. 40th IEEE Symposium on Security and Privacy, 2019.
[30] M. Fredrikson, S. Jha, and T. Ristenpart, “Model inversion attacks that exploit confidence information and basic countermeasures,” in Proc. ACM SIGSAC Conference on Computer and Communications Security, 2015.
[31] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership inference attacks against machine learning models,” in Proc. IEEE Symposium on Security and Privacy, 2017.
[32] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013.
[33] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Proc. IEEE Symposium on Security and Privacy, 2017.
[34] D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” in Proc. International Conference on Learning Representations, 2015.
[35] N. Papernot, P. D. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in Proc. IEEE European Symposium on Security and Privacy, 2016.
[36] A. Kurakin, I. Goodfellow, S. Bengio et al., “Adversarial examples in the physical world,” 2016.
[37] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, “Universal adversarial perturbations,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[38] A. Athalye, N. Carlini, and D. Wagner, “Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples,” in Proc. International Conference on Machine Learning, 2018.
[39] W. Brendel, J. Rauber, M. Kümmerer, I. Ustyuzhaninov, and M. Bethge, “Accurate, reliable and fast robustness evaluation,” in Advances in Neural Information Processing Systems, 2019.
[40] W. Brendel, J. Rauber, and M. Bethge, “Decision-based adversarial attacks: reliable attacks against black-box machine learning models,” in Proc. International Conference on Learning Representations, 2018.
[41] F. Croce and M. Hein, “Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks,” in Proc. International Conference on Machine Learning, 2020.
[42] ——, “Minimally distorted adversarial examples with a fast adaptive boundary attack,” in Proc. International Conference on Machine Learning, 2020.
[43] M. Andriushchenko, F. Croce, N. Flammarion, and M. Hein, “Square attack: a query-efficient black-box adversarial attack via random search,” in Proc. European Conference on Computer Vision, 2020.
[44] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, “Practical black-box attacks against machine learning,” in Proc. ACM ASIA Conference on Computer and Communications Security, 2017.
[45] Z. Zhao, D. Dua, and S. Singh, “Generating natural adversarial examples,” in Proc. International Conference on Learning Representations, 2018.
[46] Y. Song, R. Shu, N. Kushman, and S. Ermon, “Constructing unrestricted adversarial examples with generative models,” in Advances in Neural Information Processing Systems, 2018.
[47] A. Odena, C. Olah, and J. Shlens, “Conditional image synthesis with auxiliary classifier GANs,” in Proc. 34th International Conference on Machine Learning, 2017.
[48] C. Xiao, J.-Y. Zhu, B. Li, W. He, M. Liu, and D. Song, “Spatially transformed adversarial examples,” arXiv preprint arXiv:1801.02612, 2018.
[49] C. Laidlaw and S. Feizi, “Functional adversarial attacks,” in Advances in Neural Information Processing Systems, 2019.
[50] N. Papernot, P. McDaniel, and I. Goodfellow, “Transferability in machine learning: from phenomena to black-box attacks using adversarial samples,” arXiv preprint arXiv:1605.07277, 2016.
[51] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel, “Ensemble adversarial training: attacks and defenses,” in Proc. International Conference on Learning Representations, 2019.
[52] P.-Y. Chen, H. Zhang, Y. Sharma, J. Yi, and C.-J. Hsieh, “ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models,” in Proc. 10th ACM Workshop on Artificial Intelligence and Security, 2017.
[53] A. Ilyas, L. Engstrom, A. Athalye, and J. Lin, “Black-box adversarial attacks with limited queries and information,” in Proc. 35th International Conference on Machine Learning, 2018.
[54] A. Ilyas, L. Engstrom, and A. Madry, “Prior convictions: black-box adversarial attacks with bandits and priors,” in Proc. International Conference on Learning Representations, 2019.
[55] J. Su, D. V. Vargas, and K. Sakurai, “One pixel attack for fooling deep neural networks,” IEEE Transactions on Evolutionary Computation, no. 5, pp. 828–841, 2019.
[56] C. Guo, J. Gardner, Y. You, A. G. Wilson, and K. Weinberger, “Simple black-box adversarial attacks,” in Proc. International Conference on Machine Learning, 2019.
[57] D. Wierstra, T. Schaul, J. Peters, and J. Schmidhuber, “Natural evolution strategies,” in Proc. IEEE Congress on Evolutionary Computation, 2008.
[58] L. Muñoz-González, B. Biggio, A. Demontis, A. Paudice, V. Wongrassamee, E. C. Lupu, and F. Roli, “Towards poisoning of deep learning algorithms with back-gradient optimization,” in Proc. ACM Workshop on Artificial Intelligence and Security, 2017.
[59] C. Yang, Q. Wu, H. Li, and Y. Chen, “Generative poisoning attack method against neural networks,” arXiv preprint arXiv:1703.01340, 2017.
[60] L. Muñoz-González, B. Pfitzner, M. Russo, J. Carnerero-Cano, and E. C. Lupu, “Poisoning attacks with generative adversarial nets,” arXiv preprint arXiv:1906.07773, 2019.
[61] A. Shafahi, W. R. Huang, M. Najibi, O. Suciu, C. Studer, T. Dumitras, and T. Goldstein, “Poison frogs! targeted clean-label poisoning attacks on neural networks,” in Advances in Neural Information Processing Systems, 2018.
[62] C. Zhu, W. R. Huang, H. Li, G. Taylor, C. Studer, and T. Goldstein, “Transferable clean-label poisoning attacks on deep neural nets,” in Proc. International Conference on Machine Learning, 2019.
[63] W. R. Huang, J. Geiping, L. Fowl, G. Taylor, and T. Goldstein, “MetaPoison: practical general-purpose clean-label data poisoning,” arXiv preprint arXiv:2004.00225, 2020.
[64] J. Geiping, L. Fowl, W. R. Huang, W. Czaja, G. Taylor, M. Moeller, and T. Goldstein, “Witches’ brew: industrial scale data poisoning via gradient matching,” in Proc. International Conference on Learning Representations, 2021.
[65] Y. Liu, S. Ma, Y. Aafer, W.-C. Lee, J. Zhai, W. Wang, and X. Zhang, “Trojaning attack on neural networks,” in Proc. 25th Annual Network and Distributed System Security Symposium, 2018.
[66] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted backdoor attacks on deep learning systems using data poisoning,” arXiv preprint arXiv:1712.05526, 2017.
[67] S. Li, B. Z. H. Zhao, J. Yu, M. Xue, D. Kaafar, and H. Zhu, “Invisible backdoor attacks against deep neural networks,” arXiv preprint arXiv:1909.02742, 2019.
[68] T. A. Nguyen and A. T. Tran, “WaNet - imperceptible warping-based backdoor attack,” in Proc. International Conference on Learning Representations, 2021.
[69] Y. Liu, X. Ma, J. Bailey, and F. Lu, “Reflection backdoor: a natural backdoor attack on deep neural networks,” in Proc. European Conference on Computer Vision, 2020.
[70] M. Barni, K. Kallas, and B. Tondi, “A new backdoor attack in CNNs by training set corruption without label poisoning,” in Proc. IEEE International Conference on Image Processing (ICIP), 2019.
[71] A. Turner, D. Tsipras, and A. Madry, “Label-consistent backdoor attacks,” arXiv preprint arXiv:1912.02771, 2019.
[72] Y. Liu, L. Wei, B. Luo, and Q. Xu, “Fault injection attack on deep neural network,” in Proc. IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2017, pp. 131–138.
[73] A. S. Rakin, Z. He, J. Li, F. Yao, C. Chakrabarti, and D. Fan, “T-BFA: targeted bit-flip adversarial weight attack,” arXiv preprint arXiv:2007.12336, 2020.
[74] A. S. Rakin, Z. He, and D. Fan, “TBT: targeted neural network attack with bit trojan,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
[75] P. Zhao, S. Wang, C. Gongye, Y. Wang, Y. Fei, and X. Lin, “Fault sneaking attack: a stealthy framework for misleading deep neural networks,” in Proc. 56th ACM/IEEE Design Automation Conference (DAC), 2019.
[76] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial machine learning at scale,” in Proc. International Conference on Learning Representations, 2017.
[77] A. Athalye and I. Sutskever, “Synthesizing robust adversarial examples,” arXiv preprint arXiv:1707.07397, 2017.
[78] X. Ma, B. Li, Y. Wang, S. M. Erfani, S. Wijewickrema, M. E. Houle, G. Schoenebeck, D. Song, and J. Bailey, “Characterizing adversarial subspaces using local intrinsic dimensionality,” in Proc. International Conference on Learning Representations, 2018.
[79] C. Guo, M. Rana, M. Cisse, and L. van der Maaten, “Countering adversarial images using input transformations,” in Proc. International Conference on Learning Representations, 2018.
[80] A. Graese, A. Rozsa, and T. E. Boult, “Assessing threat of adversarial examples on deep neural networks,” in Proc. IEEE International Conference on Machine Learning and Applications, 2016.
[81] W. Xu, D. Evans, and Y. Qi, “Feature squeezing: detecting adversarial examples in deep neural networks,” in Proc. Network and Distributed System Symposium, 2017.
[82] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, no. 1-4, pp. 259–268, 1992.
[83] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille, “Mitigating adversarial effects through randomization,” in Proc. International Conference on Learning Representations, 2018.
[84] A. V. Oord, N. Kalchbrenner, and K. Kavukcuoglu, “Pixel recurrent neural networks,” in Proc. 33rd International Conference on Machine Learning, 2016.
[85] P. Samangouei, M. Kabkab, and R. Chellappa, “Defense-GAN: protecting classifiers against adversarial attacks using generative models,” in Proc. International Conference on Learning Representations, 2018.
[86] N. Carlini and D. Wagner, “Adversarial examples are not easily detected: bypassing ten detection methods,” in Proc. 10th ACM Workshop on Artificial Intelligence and Security, 2017.
[87] W. He, J. Wei, X. Chen, N. Carlini, and D. Song, “Adversarial example defense: ensembles of weak defenses are not strong,” in Proc. USENIX Workshop on Offensive Technologies, 2017.
[88] J. Uesato, B. O’Donoghue, P. Kohli, and A. Oord, “Adversarial risk and the dangers of evaluating against weak attacks,” in International Conference on Machine Learning, 2018.
[89] N. Papernot, P. D. McDaniel, X. Wu, S. Jha, and A. Swami, “Distillation as a defense to adversarial perturbations against deep neural networks,” in Proc. IEEE Symposium on Security and Privacy, 2016.
[90] M. Cisse, P. Bojanowski, E. Grave, Y. Dauphin, and N. Usunier, “Parseval networks: improving robustness to adversarial examples,” in Proc. International Conference on Machine Learning, 2017.
[91] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: a large-scale hierarchical image database,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[92] H. Kannan, A. Kurakin, and I. Goodfellow, “Adversarial logit pairing,” arXiv preprint arXiv:1803.06373, 2018.
[93] A. S. Ross and F. Doshi-Velez, “Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients,” in Proc. AAAI Conference on Artificial Intelligence, 2018.
[94] T. Miyato, S.-i. Maeda, M. Koyama, and S. Ishii, “Virtual adversarial training: a regularization method for supervised and semi-supervised learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 8, pp. 1979–1993, 2018.
[95] C. Xie, Y. Wu, L. v. d. Maaten, A. L. Yuille, and K. He, “Feature denoising for improving adversarial robustness,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 501–509.
[96] C. Qin, J. Martens, S. Gowal, D. Krishnan, K. Dvijotham, A. Fawzi, S. De, R. Stanforth, and P. Kohli, “Adversarial robustness through local linearization,” in Advances in Neural Information Processing Systems, 2019.
[97] H. Zhang and J. Wang, “Defense against adversarial attacks using feature scattering-based adversarial training,” in Advances in Neural Information Processing Systems, 2019.
[98] J. Zhang, X. Xu, B. Han, G. Niu, L. Cui, M. Sugiyama, and M. Kankanhalli, “Attacks which do not kill training make adversarial learning stronger,” in Proc. International Conference on Machine Learning, 2020.
[99] L. Schmidt, S. Santurkar, D. Tsipras, K. Talwar, and A. Madry, “Adversarially robust generalization requires more data,” in Advances in Neural Information Processing Systems, 2018.
[100] Y. Carmon, A. Raghunathan, L. Schmidt, J. C. Duchi, and P. S. Liang, “Unlabeled data improves adversarial robustness,” in Advances in Neural Information Processing Systems, 2019.
[101] S. Lee, C. Park, H. Lee, J. Yi, J. Lee, and S. Yoon, “Removing undesirable feature contributions using out-of-distribution data,” in Proc. International Conference on Learning Representations, 2021.
[102] K. Grosse, P. Manoharan, N. Papernot, M. Backes, and P. McDaniel, “On the (statistical) detection of adversarial examples,” arXiv preprint arXiv:1702.06280, 2017.
[103] R. Feinman, R. R. Curtin, S. Shintre, and A. B. Gardner, “Detecting adversarial samples from artifacts,” arXiv preprint arXiv:1703.00410, 2017.
[104] T. Pang, C. Du, Y. Dong, and J. Zhu, “Towards robust detection of adversarial examples,” in Advances in Neural Information Processing Systems, 2018.
[105] A. Sinha, H. Namkoong, R. Volpi, and J. Duchi, “Certifying some distributional robustness with principled adversarial training,” in Proc. International Conference on Learning Representations, 2017.
[106] M. Hein and M. Andriushchenko, “Formal guarantees on the robustness of a classifier against adversarial manipulation,” in Advances in Neural Information Processing Systems, 2017.
[107] A. Raghunathan, J. Steinhardt, and P. Liang, “Certified defenses against adversarial examples,” in International Conference on Learning Representations, 2018.
[108] E. Wong, F. Schmidt, J. H. Metzen, and J. Z. Kolter, “Scaling provable adversarial defenses,” in Advances in Neural Information Processing Systems, 2018, pp. 8400–8409.
[109] M. Lecuyer, V. Atlidakis, R. Geambasu, D. Hsu, and S. Jana, “Certified robustness to adversarial examples with differential privacy,” in 2019 IEEE Symposium on Security and Privacy (SP), 2019, pp. 656–672.
[110] M. Balunovic and M. Vechev, “Adversarial training and provable defenses: bridging the gap,” in International Conference on Learning Representations, 2020. [Online]. Available: https://openreview.net/forum?id=SJxSDxrKDr
[111] G. F. Cretu, A. Stavrou, M. E. Locasto, S. J. Stolfo, and A. D. Keromytis, “Casting out demons: sanitizing training data for anomaly sensors,” in 2008 IEEE Symposium on Security and Privacy (SP 2008), 2008, pp. 81–95.
[112] M. Veale, R. Binns, and L. Edwards, “Algorithms that remember: model inversion attacks and data protection law,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 376, no. 2133, p. 20180083, 2018.
[113] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership inference attacks against machine learning models,” in IEEE Symposium on Security and Privacy, 2017.
[114] C. Song and V. Shmatikov, “The natural auditor: how to tell if someone used your words to train their model,” arXiv preprint arXiv:1811.00513, 2018.
[115] J. Hayes, L. Melis, G. Danezis, and E. De Cristofaro, “LOGAN: membership inference attacks against generative models,” Privacy Enhancing Technologies, vol. 2019, no. 1, pp. 133–152, 2019.
[116] C. A. C. Choo, F. Tramer, N. Carlini, and N. Papernot, “Label-only membership inference attacks,” arXiv preprint arXiv:2007.14321, 2020.
[117] M. Nasr, R. Shokri, and A. Houmansadr, “Comprehensive privacy analysis of deep learning: passive and active white-box inference attacks against centralized and federated learning,” in 2019 IEEE Symposium on Security and Privacy (SP), 2019, pp. 739–753.
[118] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in 20th International Conference on Artificial Intelligence and Statistics, 2017.
[119] J. Konečnỳ, H. B. McMahan, D. Ramage, and P. Richtárik, “Federated optimization: distributed machine learning for on-device intelligence,” arXiv preprint arXiv:1610.02527, 2016.
[120] K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. Kiddon, J. Konecny, S. Mazzocchi, H. B. McMahan et al., “Towards federated learning at scale: system design,” in Conference on Systems and Machine Learning, 2019.
[121] B. Hitaj, G. Ateniese, and F. Perez-Cruz, “Deep models under the GAN: information leakage from collaborative deep learning,” in ACM SIGSAC Conference on Computer and Communications Security, 2017.
[122] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[123] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, “Densely connected convolutional networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[124] Wikipedia, “Facebook–Cambridge Analytica data scandal,” 2018. [Online]. Available: https://en.wikipedia.org/wiki/Facebook–Cambridge_Analytica_data_scandal
[125] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, and J. Wernsing, “CryptoNets: applying neural networks to encrypted data with high throughput and accuracy,” in International Conference on Machine Learning, 2016.
[126] E. Hesamifard, H. Takabi, and M. Ghasemi, “CryptoDL: deep neural networks over encrypted data,” arXiv preprint arXiv:1711.05189, 2017.
[127] H. Chabanne, A. de Wargny, J. Milgram, C. Morel, and E. Prouff, “Privacy-preserving classification on deep neural network,” IACR Cryptology ePrint Archive, 2017.
[128] E. Chou, J. Beal, D. Levy, S. Yeung, A. Haque, and L. Fei-Fei, “Faster CryptoNets: leveraging sparsity for real-world encrypted inference,” arXiv preprint arXiv:1811.09953, 2018.
[129] A. Brutzkus, O. Elisha, and R. Gilad-Bachrach, “Low latency privacy preserving inference,” arXiv preprint arXiv:1812.10659, 2018.
[130] A. Sanyal, M. J. Kusner, A. Gascón, and V. Kanade, “TAPAS: tricks to accelerate (encrypted) prediction as a service,” in International Conference on Machine Learning, 2018.
[131] F. Bourse, M. Minelli, M. Minihold, and P. Paillier, “Fast homomorphic evaluation of deep discretized neural networks,” in Annual International Cryptology Conference, 2018.
[132] P. Mohassel and Y. Zhang, “SecureML: a system for scalable privacy-preserving machine learning,” in IEEE Symposium on Security and Privacy, 2017.
[133] J. Liu, M. Juuti, Y. Lu, and N. Asokan, “Oblivious neural network predictions via MiniONN transformations,” in ACM SIGSAC Conference on Computer and Communications Security, 2017.
[134] B. D. Rouhani, M. S. Riazi, and F. Koushanfar, “DeepSecure: scalable provably-secure deep learning,” in 55th ACM/ESDA/IEEE Design Automation Conference, 2018.
[135] C. Juvekar, V. Vaikuntanathan, and A. Chandrakasan, “Gazelle: a low latency framework for secure neural network inference,” in 27th USENIX Security Symposium, 2018.
[136] R. Shokri and V. Shmatikov, “Privacy-preserving deep learning,” in 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015.
[137] Y. Aono, T. Hayashi, L. Wang, S. Moriai et al., “Privacy-preserving deep learning via additively homomorphic encryption,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 5, pp. 1333–1345, 2018.
[138] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016.
[139] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang, “Learning differentially private recurrent language models,” in International Conference on Learning Representations, 2018.
[140] L. Xie, K. Lin, S. Wang, F. Wang, and J. Zhou, “Differentially private generative adversarial network,” arXiv preprint arXiv:1802.06739, 2018.
[141] G. Acs, L. Melis, C. Castelluccia, and E. De Cristofaro, “Differentially private mixture of generative neural networks,” IEEE Transactions on Knowledge and Data Engineering, 2018.
[142] L. Yu, L. Liu, C. Pu, M. E. Gursoy, and S. Truex, “Differentially private model publishing for deep learning,” in IEEE Symposium on Security and Privacy, 2019.
[143] K. Chaudhuri and C. Monteleoni, “Privacy-preserving logistic regression,” in Advances in Neural Information Processing Systems, 2009.
[144] N. Phan, Y. Wang, X. Wu, and D. Dou, “Differential privacy preservation for deep auto-encoders: an application of human behavior prediction,” in AAAI Conference on Artificial Intelligence, 2016.
[145] N. Phan, X. Wu, and D. Dou, “Preserving differential privacy in convolutional deep belief networks,” Machine Learning, vol. 106, no. 9-10, pp. 1681–1704, 2017.
[146] N. Papernot, M. Abadi, Ú. Erlingsson, I. Goodfellow, and K. Talwar, “Semi-supervised knowledge transfer for deep learning from private training data,” in International Conference on Learning Representations, 2017.
[147] N. Papernot, S. Song, I. Mironov, A. Raghunathan, K. Talwar, and Ú. Erlingsson, “Scalable private learning with PATE,” in 6th International Conference on Learning Representations, 2018.
[148] J. Jordon, J. Yoon, and M. Van Der Schaar, “PATE-GAN: generating synthetic data with differential privacy guarantees,” in International Conference on Learning Representations, 2018.
[149] J. W. Bos, K. Lauter, J. Loftus, and M. Naehrig, “Improved security for a ring-based fully homomorphic encryption scheme,” in IMA International Conference on Cryptography and Coding, 2013.
[150] Y. LeCun, C. Cortes, and C. Burges, “MNIST handwritten digit database,” 2010. [Online]. Available: http://yann.lecun.com/exdb/mnist
[151] I. Chillotti, N. Gama, M. Georgieva, and M. Izabachene, “Faster fully homomorphic encryption: bootstrapping in less than 0.1 seconds,” in International Conference on the Theory and Application of Cryptology and Information Security, 2016.
[152] A. C.-C. Yao, “How to generate and exchange secrets,” in 27th Annual Symposium on Foundations of Computer