0% found this document useful (0 votes)

4 views20 pages

Security and Privacy Issues in Deep Learning

The document reviews security and privacy issues in deep learning (DL), focusing on model security and data privacy in artificial intelligence (AI) systems. It categorizes security threats into evasion and poisoning attacks, detailing their mechanisms and potential defenses, including techniques like differential privacy and homomorphic encryption. The paper aims to provide insights for developing robust AI applications by compiling fragmented findings and techniques in the field.

Uploaded by

198junkemail

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

4 views20 pages

Security and Privacy Issues in Deep Learning

Uploaded by

198junkemail

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 20

JOURNAL OF IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, VOL. 00, NO.

0, MONTH 2020 1

Security and Privacy Issues in Deep Learning

Ho Bae† , Jaehee Jang† , Dahuin Jung, Hyemi Jang, Heonseok Ha, Hyungyu Lee, and Sungroh Yoon* , Senior
Member, IEEE

Abstract—To promote secure and private artificial intelligence amounts of patient data for disease prediction [1], preform-
(SPAI), we review studies on the model security and data privacy ing autonomous security audits from system logs [2], and
of DNNs. Model security allows system to behave as intended developing self-driving cars using visual object detection [3].
without being affected by malicious external influences that can
compromise its integrity and efficiency. Security attacks can be However, the vulnerabilities of DL-based systems wht respect
arXiv:1807.11655v4 [cs.CR] 10 Mar 2021

divided based on when they occur: if an attack occurs during to security and privacy have been extensively studied to
training, it is known as a poisoning attack, and if it occurs prevent cyberattacks.
during inference (after training) it is termed an evasion attack. If the input data is compromised, a DL-based system
Poisoning attacks compromise the training process by corrupting can produce in accurate or undesired results. For example,
the data with malicious examples, while evasion attacks use
adversarial examples to disrupt entire classification process. jamming the sensors [4] or occluding the camera lens [5] of
Defenses proposed against such attacks include techniques to an autonomous driving system can have dangerous effects on
recognize and remove malicious data, train a model to be its performance. Similarly, bio-metric authentication systems
insensitive to such data, and mask the model’s structure and using face recognition [6] can be bypassed by adding noise or
parameters to render attacks more challenging to implement. digitally editing a pair of glasses onto the image of a face [7]
Furthermore, the privacy of the data involved in model training
is also threatened by attacks such as the model-inversion attack, to achieve false positive results.
or by dishonest service providers of AI applications. To maintain In this study, we divided such attacks into evasion (inference
data privacy, several solutions that combine existing data-privacy phase) and poisoning (training phase) attacks. In previous
techniques have been proposed, including differential privacy and studies of evasion attack, attacks have typically been cat-
modern cryptography techniques. In this paper, we describe the egorized as white-box or black-box attacks. Initially, most
notions of some of methods, e.g., homomorphic encryption, and
review their advantages and challenges when implemented in forms of evasion attacks were white-box attacks—they require
deep-learning models. prior knowledge of the DL model parameters and structure—
that attempt to subvert the learning process or reduce the
Impact Statement—With advancements in deep learning tech-
nologies, AI-based applications have become prevalent in various classification accuracy by injecting adversarial samples using
fields. However, existing deep learning models are vulnerable to gradient-based techniques [8, 9]. Recently, black-box attacks
various security and privacy threats. These threats can cause have become more prevalent; they function by exploiting the
grave consequences in real life: for example, if an autonomous classification confidence of the target model to produce incor-
vehicle is compromised, the system could fail to recognize a pedes- rect classification information. Poisoning attacks can also be
trian owing to an adversary, which can cause a lethal accident.
This paper systemically categorizes representative threats that divided into white- and black-box attacks based on the model
occur in deep-learning and defense methods and presents insights accessibility. However, in this paper, we categorize poisoning
for deep learning developers to develop robust AI applications. attacks into three subclasses based on the vulnerability of
Index Terms—Private AI, Secure AI, Machine Learning, Deep the target model: performance degradation, targeted poisoning,
Learning, Homomorphic Encryption, Differential Privacy, Ad- and backdoor attacks.
versarial Example, White-box Attack, Black-box Attack The methods proposed to defend DL-based systems against
evasion attacks include empirical approaches—gradient mask-
I. I NTRODUCTION ing [10, 11, 12], increasing robustness [9, 13, 14], and detec-
tion of attacks [15, 16, 17]–that can be implemented against
T HE development of deep learning (DL) algorithms has
transformed the the approach adopted to address sev-
eral real-life-data-driven problems, such as managing large
known attacks and model certification approaches [18, 19, 20].
We employed defense techniques to counter poisoning at-
tacks separately; they mainly focus on detecting anomalous
∗ Corresponding author: Sungroh Yoon (e-mail: sryoon@snu.ac.kr). data [21, 22, 23, 24, 25, 26, 25] and making the model robust
†: These authors contributed equally to this work. to poisoning attacks by pruning or fine-tuning with reliable
This paragraph of the first footnote will contain the date on which you clean data [27, 28, 29].
submitted your paper for review. It will also contain support information,
including sponsor and financial support acknowledgment. For example, “This Current DL systems additionally face the threat of privacy
work was supported in part by the U.S. Department of Commerce under Grant breach. Although it has been demonstrated that recovering or
BS123456.” identifying some of the training data [30, 31] is possible, a
H. Bae was with the Seoul National University, Seoul 08870, Republic of
Korea. He is now with Ehwa University, Seoul 03760 Republiic of Korea. privacy breach can occur in other situations as well. There
(e-mail: hobae@ehwa.ac.kr). are considerable risks involved in training a DL model with
J. Jang, D. Jung, H. Jang, H. Ha, H. Lee and S.Yoon are with the De- data owned by multiple parties; for instance, in the case
partment of Electrical and Computer Engineering, Seoul National University,
Seoul 08870, Republic of Korea. (email: {hukla, annajung0625, wkdal9512, of deploying an application via a third-party cloud system.
heonseok.ha, rucy74, sryoon}@snu.ac.kr). Various attempts have been made to counter these threats, by
2 JOURNAL OF IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, VOL. 00, NO. 0, MONTH 2020

sign
applying conventional security techniques, such as homomor-
phic encryption, secure multiparty computation, or differential
privacy, to DL systems.
In this paper, we review recent studies on model security + .02 =
and data privacy that have contributed towards building a
secure and private artificial intelligence (SPAI). To address the
need for robust artificial intelligence (AI) systems, we further Lesser Panda: 99.99% Pole Cat: 86.54%
compile fragmented findings and techniques with the objective
of providing insights relevant to future research.
To summarize, we review recent research on privacy and
security issues associated with DL in the following domains. + .02 =

1) Attacks on DL models : The two major types of attacks

on DL relating to different phases—evasion and poison-
Puffer: 99.85% Electric Ray: 79.54%
ing attacks—evasion attacks involve the inference phase
whereas poisoning attacks involve the training phase. Fig. 1. An adversarial example generated by the fast gradient sign method
2) Defense of DL models : The various defense tech- [9]. Left: the original image. Middle: adversarial perturbations. Right: the
niques proposed, which can be categorized into two adversarial image containing the adversarial perturbation.
large groups based on the type of attack—evasion and
input input
poisoning; the techniques implemented against evasion
attacks can be further categorized into empirical (e.g.,
gradient masking, robustness, and detection) and certi- white-box
black-box
attack
fied approaches. attack
attacker attacker
3) Privacy attacks on AI systems : The potential pri- ‫׷‬
‫׏׷‬
?
vacy threats to DL-based systems arising from service ‫׷‬

providers, information silos and users.

4) Defense against a privacy breach : The most recent
output output
defense methods based on cryptography, such as homo-
morphic encryption, secure multiparty computation, and Fig. 2. Overview of a white-box (left) and black-box (right) attack.
differential privacy.

II. S ECURE AI labels or limited authority), the attacks would be difficult to

In this section, we suggest the concept of secure AI—an AI execute and alternative methods would be required, such as
system with security guarantees—to encourage studies in this substitute models or data (see Fig. 2 (right)).
field. Additionally, we introduce and taxonomize the groups of There are two approaches to perform an attack: targeted and
studies conducted on security attacks and defense mechanisms, nontargeted. An attack is targeted if its objective is to alter
as described in Table I. the classifier’s output to a specific target label; a nontargeted
attack, simply aims to cause inefficient labeling, i.e., to assign
a label that holds no value. Generally, nontargeted attacks are
A. Security Attacks on Deep Learning Models
more successful than targeted attacks.
Table II briefly describes evasion and the poisoning attacks. 1) Evasion Attacks:
As suggested earlier, a poisoning attack attempts to destroy the a) White-box attack: The first study of evasion at-
model during training. The adversarial example used in this tacks [32] was performed using the limited-memory Broy-
attack is known as an adversarial training example, as depicted den–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm to gen-
in Fig. 1. In an evasion attack, the adversarial (test) examples erate adversarial examples. Szegedy et al. [32] proposed a
are applied during the inference phase, causing the model to targeted attack method called a box L-BFGS adversary, which
misclassify the input. Both types of attacks can be defined involves solving the simple box-constrained optimization prob-
as white or black-box attacks. However, as poisoning attacks lem.
are yet understudied, we categorized them as a) performance minimize knk2
degradation , b) targeted poisoning , and c) backdoor attacks. (1)
s.t. f (x + n) = ˜l,
h w c
In addition to being classified based on the phase of the where f is the classifier, x ∈ RI ×J ×K is the unperturbed
workflow that sustains the attack, they vary based on the image (I h , J w , and K c represent the height, width, and the
amount of information available to the attacker. On one hand, number of channels of the image), ˜l ∈ {1, · · · , k c } is the target
if the attacker gains access to all the information, including label, and n represents the minimum amount of noise needed
the model structure and parameters, their success is highly for a model to disassociate an image from its true label. The
likely; however, this situation is impractical as demonstrated box L-BFGS adversary searches for the minimum perturbation
in Fig. 2 (left). On the other hand, if the adversary has limited required for a successful attack. This form of attack has a high
information about the model (i.e., no access to ground-truth misclassification rate and high computational cost because the
HO et al.: SECURITY AND PRIVACY ISSUES IN DEEP LEARNING 3

TABLE I
T YPES OF ATTACKS ON SECURE AI AND DEFENSES TECHNIQUES EMPLOYED AGAINST THEM .

White-box Section II-A1a

Evasion Black-box Section II-A1b
Attack Performance degradation Section II-A2a
Poisoning Targeted poisoning Section II-A2b
Backdoor Section II-A2c
Gradient masking Section II-B1a
Evasion attacks Robustness Section II-B1b
Defense Detection Section II-B1c
Certification Section II-B1d
Poisoning attacks Section II-B2

TABLE II
ATTACK METHODS AGAINST S ECURE AI

White-box Black-box Training phase Inference phase

Adversarial Attack Types ↓ (Fig. 2a) (Fig. 2b)
Evasion X X X
Poisoning X X

adversarial examples must be generated by solving Equation efficient manner, it has a low success rate.
1. Various compromises have been made to overcome the
Although Carlini-Wagner’s attack (CW attack) [33] is also shortcomings of both the above-mentioned attacks. One such
based on the box L-BFGS attack [32], it uses a modified compromise is the iterative FGSM [36], which invokes the
version of Equation 1: FGSM multiple times, taking a small step after each update,
followed by a per-pixel clipping of the image. Further, it can
minimize D (x̃, x) + c · g (x̃) , (2) be proved that the result of each step will be in the L∞ ε-
where x̃ is the adversarial example, D is a distance metric that neighborhood of the original image. The update rule can be
includes Lp , L0 , L2 , and L∞ , g(x̃) is an objective function, expressed as follows:
in which, f (x̃) = ˜l if and only if g(x̃) ≤ 0, and c > 0 is a n o
x̃0 = x, x̃N +1 = Clipx,ξ x̃N + ξ · sign(∇x J(w, x̃N , ˜l))) ,
constant. Here, the Adam [34] optimizer—adopted to enhance
the effectiveness of this attack—conducts a rapid search for (4)
adversarial adversarial examples. The authors of [33] used the where x̃N is the intermediate result at the Nth iteration. This
method of changing the variables or the projected gradient method processes new generations more quickly and has a
descent to support box constraints as a relaxation process after higher success rate.
each optimization step. Noise added to the input data naturally promotes misclassi-
fication. Universal adversarial perturbations [37] are image-
Papernot et al. [35] introduced a targeted attack method that
agnostic perturbation vectors that have a high probability of
optimizes within the L0 distance. A Jacobian-based saliency
misclassification with respect to natural images. Supposing a
map attack (JSMA) is used to construct a saliency map based h w c

on a gradient derived from a feedforward propagation, and perturbation vector n ∈ RI ×J ×K perturbs the samples in
subsequently modifies the input features that maximize the the dataset and that X represents the dataset containing the
saliency map such that the probability that an image classified samples,
with the target label ˜l increases. f (x + n) 6= f (x) , for most x ∼ X . (5)
In general, a DL model is described as nonlinear and
The noise n should satisfy knkp ≤ ξ, and
overfitting; however, the fast gradient sign method (FGSM) [9]
is based on the assertion that the main vulnerability of a P (f (x + n) 6= f (x)) ≥ 1 − δ, (6)
x∼X
neural network to adversarial perturbation is its linear nature.
FGSM linearizes the cost function around its initial value, and where f is the classifier, ξ restricts the value of the perturba-
finds the maximum value of the resultant linearized function tion, and δ is the fooling rate.
following the closed-form equation: The backward pass differential approximation (BPDA) [38]
is an attack that has been claimed to overcome gradient-
x̃ = x + ξ · sign(∇x J(w, x, ˜l)) (3)
masking defense methods by performing a backward pass
where w is the parameter of the model. The parameter ξ with the identity function to approximate the true gradients
determines the strength of the adversarial perturbation applied of samples.
to the image, and J is the loss function for training. Although With the development of adversarial defense methods, more
this method can generate adversarial examples in a cost- advanced attack methods have been proposed. Brendel et
4 JOURNAL OF IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, VOL. 00, NO. 0, MONTH 2020

al. [39] developed a Brendel and Bethge attack (B&B attack) lighten all the red pixels of an image simultaneously. To
in which adversarial examples are generated from incorrectly ensure that adversarial examples are indistinguishable from
classified regions. This method uses a combination of a the original ones, the method minimizes the adversarial loss
gradient-based attack and boundary attack [40], and is a black- function and smoothness-constraint loss function similar to
box attack. It estimates the local boundary between adversarial previous studies [33, 48].
and clean examples using gradients and moves the adversarial b) Black-box Attack: Practically, it is difficult to access
examples close to the clean examples along that boundary. the models or training datasets. Industrial training models are
Because this method finds optimal adversarial examples by maintained confidential, and models in mobile devices are not
optimization, it can be applied to different adversarial criteria accessible to attackers. The scenario for a black-box attack
and any norm bound (L0 , L1 , L2 , and L∞ ). is therefore closer to reality: an attacker has no information
Recently, Croce et al. [41] introduced the auto attack (AA), about the model or the dataset. However, the input format and
an ensemble attack that consists of several white-box attacks the labels outputted by a target model running on a mobile
and one black-box attack [42, 43]. Unlike previous methods, device may be accessible, possibly when the target model is
AA uses recursive attacks to separate evaded test data and hosted by Amazon or Google.
continues the attack on the remaining data with the successive In a black-box attack, the gradients of the target model
attack sequentially. Consequently, it is more powerful than are inaccessible to the attackers, who must therefore devise a
previous white-box attacks. substitute model. The attacks performed by means of substitute
Adversarial examples are typically designed to perturb an models are called transfer attacks. It has been shown [32, 9]
existing data point within a small matrix norm at the pixel that neural networks can attack another model without prior
level, i.e., the samples are norm-bounded. Most researchers knowledge of the number of layers or hidden nodes; however,
have used this characteristic to propose new defense methods. the task must be known. This is because a neural network has
Nonetheless, to overcome this drawback, various methods have the linear nature, whereas those of previous studies attribute
been proposed that semantically alter the attributes of the the transferability to its nonlinearity. Activation functions such
input image instead of employing a norm-bounded pixel-level as sigmoid and ReLU, are known to exhibit nonlinearity. The
approach. sigmoid function is challenging to implement in learning,
Natural generative adversarial network (GAN) [45] gener- whereas ReLU is widely used; however, unlike sigmoid, it
ates adversarial examples that appear natural to humans. Zhao does not produce nonlinearity. Thus, a replica of the target
et al. used the latent space z of the GAN structure to search model can learn a similar decision boundary for a given task.
for the required perturbation. A matching inverter (MI) is used The architecture of the substitute model, which may be
to search for z ∗ , which satisfies the following: a convolutional neural network (CNN), a recurrent neural
network (RNN), or a multi-layer perceptron (MLP), is approx-
z ∗ = argmin ke
z − MI(x)k s.t. f (G(e
z )) 6= f (x), (7)
z
e
imated based on the input format, which might be images or
sequences. Although the model can be trained by collecting
where G is a generator. Similarly, Song et al.[46] constructed
similar data from public sources, the process is highly expen-
unrestricted adversarial examples using an auxiliary classifier
sive.
GAN (ACGAN) [47]. Furthermore, they added norm-bounded
Papernot et al. [44] addressed this issue by introducing
noise to the generated images to boost the attack ability.
practical black-box attacks (Fig. 3), in which an initial syn-
Xiao et al. [48] introduced a novel spatially transformed
thetic dataset is augmented by a Jacobian-based method. This
attack. They used the pixel value and 2D coordinates of each
synthetic dataset can be developed from a subset that is not
pixel to estimate a per-pixel flow field and generate adversarial
part of the training, data and labeled by inputting it to the
examples. subsequently, the pixels were moved to adjacent
target model. Thereafter, the trained substitute model can be
pixel locations along the flow field to produce perceptually
used to create input data by sending queries to a service
realistic adversarial examples. They deployed the L-BFGS
such as Google or Amazon; these queries must be severely
solver to optimize the following loss function:
limited in number and frequency to prevent detection. Papernot
Lflow (f ) = et al. [50] resolved this problem by introducing reservoir
∀pixels
X X q sampling, which reduces the amount of data required to train
||∆u(p) − ∆u(q) ||22 + ||∆v (p) − ∆v (q) ||22 , the substitute model.
p q∈N (p) In addition to being expensive, transfer attacks that employ
(8) a substitute model can be blocked by most defense tech-
where N (p) contains the indices of the pixels adjacent to p, niques [51]. Recently, several attacks [52, 40, 53, 54, 55, 56]
and ∆u(·) and ∆v (·) are the changes in the 2D coordinates of that relied solely on the outputs of the model together with a
(·). The method results in more realistic adversarial examples few queries and other limited information were proposed.
than those of previous norm-bounded adversarial attacks Chen et al. [52] introduced a method for approximating the
Laidlaw et al. [49] used a parameterized function f to gradient of a target model that only requires the output score of
generate new pixels for producing adversarial examples. This the target network: they suggested a zeroth-order optimization
method of functional adversarial attacks was applied to the to estimate the gradient of the target model. This method
color space of images to produce perceptually different but randomly chooses and changes one pixel to compute the
realistic adversarial examples. For instance, the method may adversarial perturbation using zeroth-order optimization with
HO et al.: SECURITY AND PRIVACY ISSUES IN DEEP LEARNING 5

1 Oracle DNN 𝑂
Substitute Training
𝑆𝑡 ෨ 𝑡)
𝑂(𝑆
Dataset Collection 𝑆0 3 4 5
Substitute Dataset Substitute DNN 𝐹 Jacobian-based
Labeling 𝑆𝑡 Training 𝐹𝑡 Dataset Augmentation
2
Substitute DNN 𝐹 𝑡 ←𝑡+1
Architecture Selection 𝑆𝑡+1 = 𝑥 + 𝑣𝑡+1 ∙ 𝑠𝑖𝑔𝑛 𝐽𝐹 𝑂෨ 𝑥 ∶ 𝑥 ∈ 𝑆𝑡 ⋃ 𝑆𝑡

Fig. 3. Overview of a practical black-box attack [44]: the attacker (1) collects a training set S0 for an initial substitute model and (2) selects an appropriate
architecture F . Using the oracle model Õ, the attacker (3) labels the training set St and (4) trains the substitute model Ft . Following this, the Jacobian-based
adversarial attack algorithm is implemented, the dataset is augmented by the attacker, and steps (3) through (5) are repeated for t epochs.

!"#$%&'()*$)$+( 8$(-0/&8)/1 <=1/,1","./)/,#

769&,"(:+.&+,)3+-+("0&#)/1 >:?*#)$(-&#)/1@#$A/&+2&76
#)",)$(-&$."-/
759&#)/1&)+;",:#&+,$-$("0&$."-/

#)/1#&+2&)3/&"0-+,$)3.
'(1*)&4$./(#$+(&6

76
#13/,/&",+*(:
75
+,$-$("0&$."-/
B/C*$:$#)"()D

+,$-$("0&$."-/ >:?*#)$(-&#)/1@#$A/&+2&75

!"#$$%&%'()!*++'!,"-
#13/,/&",+*(:
!"#$$%&%'()!*++'!,"- +,$-$("0&$."-/
B/C*$:$#)"()D
!"#$$%&%'()%.!*++'!,"-)/#(0'+$#+%#"1

'(1*)&4$./(#$+(&5

Fig. 4. Left: A boundary attack—performs rejection sampling by traversing the boundary between the adversarial and original images. Middle: in each step,
the attack determines a new random direction by (#1) sampling a Gaussian distribution and projecting it on an equidistant sphere, and (#2) making a small
move towards the original image. Right: both two-step sizes are dynamically adjusted to accommodate the boundary [40].

the loss function described in [33]; this process is repeated initial sample in the adversarial region is selected; b) a random
until sufficient pixels are perturbed. The method has been walk is executed to move the samples toward the decision
applied successfully to a target network without a gradient, but boundary between the adversarial and non-adversarial regions
requires as many queries as the number of pixels. However, by reducing the distance to a target example; c) the stages
attack-space dimension reduction, hierarchical attacks, and of the walk in the adversarial region are performed by means
importance sampling can be used to reduce the number of of rejection sampling. Steps b) and c) are then repeated until
queries. the adversarial example is sufficiently close to the original
Su et al. [55] also utilized the score of the network; however, image. Ilyas et al. [53] introduced a technique similar to
it changed only one pixel in the target image, and is hence a model that requires query-limited, partial-information, and
called a one-pixel attack. In this case, a differential evolution label-only settings. Such techniques could implement natural
algorithm was used to select pixels to perturb. These attacks evolutionary strategies (NESs) [57] to generate adversarial
achieved a good success rate. Recently, Guo et al. [56] intro- examples in a query-limited setting. Here, an instance of the
duced a simple black-box attack, which is query-efficient. They target class is selected as an initial sample and repeatedly
developed a method that picks random noise and either adds projected onto the L∞ -boxes to maximize the probability of
or subtracts them from an image, the addition or subtraction the adversarial target class.
of random noise was proved to increase the target score of the 2) Poisoning Attack: A poisoning attack inserts a ma-
attack. The algorithm repeats this procedure until the attack is licious example into the training set to interfere with the
successful. learning process or facilitate an attack during testing time by
Compared to the methods outlined above, a few other changing the decision boundary of the model, as displayed in
practical attacks rely on predicted labels, because the output Fig. 5. Several poisoning-attack methods applicable to ML
scores of the model are usually inaccessible. The process of techniques, such as SVM or least absolute shrinkage and
executing a boundary attack [40]—which assumes the worst selection operator (LASSO), can be described mathematically.
scenario for attackers—consists of three steps (Fig. 4): a) an However, neural networks are difficult to poison owing to
6 JOURNAL OF IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, VOL. 00, NO. 0, MONTH 2020

(a) (b)
their complexity. Nonetheless, the relatively small number of
feasible attack methods can be categorized into three types,
based on the attacker’s goal: performance degradation attacks
to compromise the learning process, targeted poisoning attacks
to provoke target sample misclassification through feature
collision, and backdoor attacks to create a backdoor to be
exploited when the system is deployed.
a) Performance degradation attack: It aims to subvert
the training process by injecting spurious samples generated
Fig. 5. The functionality of poisoning a sample. (a) Decision boundary after
from a bi-level optimization problem. Munoz et al. [58] de- training with normal data. (b) Decision boundary after injecting a poisoning
scribed two performance degradation attack scenarios: perfect- sample.
knowledge (PK) and limited-knowledge (LK) attacks. The PK
scenario is an unrealistic setting, and is useful only in a worst-
case evaluation. In the LK scenario, the attacker typically discriminator generated spurious yet realistic images with a
possesses information, namely θ = (D̂, X , M, ŵ), where X poisoning ability. A hyperparameter that can adjust the realism
is the feature representation, M is the learning algorithm, D̂ and poisoning ability of spurious images affected the trade-off
is the surrogate data, and ŵ is the learned parameter from D̂, between effectiveness and detectability. When the influence
where the ˆ symbol indicates that the information is partial. of the realistic image-generation process was high, the attack
The bi-level optimization for creating the poisoning samples success rate was low. Conversely, when the influence of the
can be represented as follows: poisoning ability was high, the generator tended to produce
outliers; hence, attacks were more detectable.
Dc∗ ∈ arg max A(Dc0 , θ) = J(D̂val , ŵ) b) Targeted poisoning attack: This method was intro-
Dc0 ∈φ(Dc ) duced by Koh et al. [22] to cause target test samples selected
(9)
s.t. ŵ ∈ arg min J(Dˆtr ∪ Dc0 , w0 ), from the test dataset to be misclassified during the inference
w0 ∈W phase. The complexity of neural networks renders identifying
where D̂ is divided into training data D̂tr and validation data the source for classification and explaining the classification
D̂val . The objective function A(Dc0 , θ) evaluates the impact in terms of training data challenging. Because of the expense
of the poisoning samples among the clean examples. This of retraining a model after modifying or removing a training
function can be considered a loss function, where J(D̂val ) sample, the authors formulated the influence of up-weighting
measures the performance of the surrogate model with Dˆval . or modifying a training sample during training in terms of
The influence of the poisoning sample Dc is propagated changes to the parameters and loss functions. The attack was
indirectly using ŵ following which, the poisoning sample is optimized based on the amount of change in the test loss
optimized. The primary objective of the optimization is to find caused by the change in the training sample.
a poisoning sample that can degrade the performance of the Although only a small number of attacks are performed on
target model. The poison is generic if the target label of the the training data, the attack may be unsuccessful if the training
poison sample is arbitrary and not specific. If a specific target data are impeded by domain experts. Shafahi et al. [61]
is required, Equation 9 can be replaced by introduced a clean-label attack to circumvent this problem.
Here, the feature-conflict method was used to ensure that the
A(Dc0 , θ) = −J(D̂val
0
, ŵ), (10) labels introduced in the attack were appropriate for the images
0 to which they were attached. Subsequently, the attacker would
where D̂val is the manipulated validation set, which is similar
select the target image t and base image b from the test set
to D̂val , except for the presence of misclassified labels that
where the target image would be expected to be misclassified
can produce a desired output. Munoz et al. [58] proposed
as the label of the base image. The attack p is initialized with
the back-gradient method to solve Equations 9 and 10 and
the base image and created using the following equation:
generate poisoning examples as an alternative to gradient-
based optimization. It requires a convex objective function p = arg min ||f (x) − f (t)||22 + β||x − b||22 . (11)
and a Hessian-vector product, which are not produced using x
complicated learning algorithms, unlike those used to develop The attack is generated by optimizing a sample similar to the
neural networks. In contrast, Yang et al. [59] were able base image in the image space and close to t in the feature
to apply a gradient-based and GAN-like generative method space mapped by function f . The attack surrounds the target
to deep neural networks (DNNs) using an autoencoder to feature f (t) and changes the decision boundary, to have the
compute the gradients reduced the computation time by a target image categorized within the base class. For example,
factor of over 200. if b is a picture of a dog and t is a picture of a bird, the
The attacks described above can be detected easily by attack changes the decision boundary by adding p, a perturbed
outlier detection. Nonetheless, Munoz et al. [60] recently version of b, to the training data. As a result, t is erroneously
proposed a GAN-based attack designed to avoid detection. put into class b, and t can be used to deceive the classifier.
Their pGAN model [60] had a generator, discriminator and Shafahi et al. [61] analyzed attacks in two of the retraining
target classifier. A min–max game between the generator and situations: end-to-end learning which fine-tunes the entire
HO et al.: SECURITY AND PRIVACY ISSUES IN DEEP LEARNING 7

model, and transfer learning, which fine-tunes the final layer. image dataset, including the one containing the trigger. When
They used a one-shot kill attack that generates a poisoning the retrained model is deployed, the image with the trigger
attack from pair of a base and target image through the feature is misclassified with the target label. Such attacks have been
collision method. The one-shot kill attack was successfully successfully applied to face recognition, speech recognition,
applied to transfer learning after significant changes were auto-driving, and age-recognition applications.
made to the decision boundary; however, it was not applied Chen et al. [66] introduced two strategies to obtain access to
to end-to-end learning, which retrains the lower layers that a face recognition system, under three constraints, namely 1)
extract fundamental features. Shafahi et al. [61] succeeded in no knowledge of the model, 2) access to only a limited amount
an attack on end-to-end learning using a watermarking method of training data, and 3) poisoning data not visually detectable.
in which the target image was projected onto a base image by In the input-instance-key strategy, a key image is prepared
adjusting its opacity and using several target and base images. and associated with a target label. To model the camera
Because all neural networks do not have the same feature effects, noise is added to the key image. The second, pattern-
mapping function, the feature collision using a model cannot key strategy, has three variants. The first strategy, blended
be applied to an unknown neural network. Zhu et al. [62] injection, combines a spurious image or random pattern within
proposed a feature collision attack (FC attack) using an the key image; the output images of this process usually
ensemble model and a convex polytope attack (CP attack). FC appeared unrealistic. The second strategy, accessory injection,
attack adopts the same mechanism of that of [61], except for applies an accessory, such as glasses or sunglasses, to the
the number of models for feature collisions, which is greater. key image. This is a simple attack to execute during the
The FC attack was unsuccessful because the constraints on inference stage. The third method, blended accessory injection,
the attack increase, and the attack simply approaches the combines the first and second strategies. Unlike previous
target in the feature space without changing the predicted studies, in which poisoning data accounted for 20% of the
result of the target image. In contrast, the CP attack using training data, the authors of [66] only added five poisoned
convex properties efficiently transforms the target into or images to 600,000 training images in the input-instance-key
near the polytope. However, the attacker would face difficulty strategy, and approximately 50 for the pattern-key strategy. In
poisoning the unknown target model if the model learned new both cases a backdoor was successfully created.
feature mapping functions through end-to-end training. Thus, However, attacks wth visible triggers can be detected eas-
these authors also proposed a multilayer convex polytope ily by human inspection; therefore, recent attacks on image
attack that generated poisoning attacks using feature collisions classification introduced invisible triggers such as scattered
of every activation layer. Moreover, recently, MetaPoison [63] triggers [67], which are distributed across the image, warping-
generated clean-label data poisoning, which works in an end- based triggers [68], and reflection triggers [69]. Backdoor
to-end setting with bi-level optimization. Geiping and Jonas. attacks performed without label poisoning have also been
et al. [64] succeeded in training their model from scratch on a proposed to increase the stealthiness of a target model [70, 71].
full-sized, poisoned imageNet dataset using gradient matching. Furthermore, a backdoor can be established by flipping only
c) Backdoor attack: It is an attack that aims to install several vulnerable bits of weights [72, 73, 74, 75].
a backdoor to be accessed at classification time and was
introduced by Gu et al. [28], who inserted patches into B. Defense Techniques against Deep Learning Models
an image to cause false classifications, such as replacing a Defense techniques against both poisoning and evasion
stop sign with a speed limit. Trojaning attacks [65] rely on attacks have been developed, and the latter can be further cate-
the fact that neural network developers often download pre- gorized into empirical defenses against known evasion attacks
trained weights from ImageNet for training or outsource the and certified defenses, which are also probably effective.
entire amount of data to suppliers of machine learning as a 1) Defense techniques against evasion attacks: Various
service (MLaaS). In a worst-case scenario, an attacker can methods have been proposed to defend DL-based systems
directly change the user’s model parameters and training data; against evasion attacks (adversarial attacks). For example,
however, they cannot access the validation set of the user Kurakin et al. [76] suggested that adversarial training can
or use the training data to launch attacks. Trojaning attacks be employed when security against adversarial examples is a
insert a trigger, in the form of a patch or watermark into an concern, which increases robustness against evasion attacks.
image, which causes it to be categorized into the target class. By including adversarial training, defense techniques can
They involve four steps: 1) the trigger and the target class are be broadly divided into three categories: gradient masking,
selected; 2) the attacker selects the node in the target layer with robustness, and detection.
the highest connectivity from the preceding layer of the trained a) Gradient masking: Gradient masking obfuscates the
model, and the trigger is updated from the gradient derived gradients used in attacks [38]. There are three approaches
from the difference in the activation result of the selected node predominantly adopted in this method: shattered gradients,
and the targeted value of the node (the target value is set by stochastic gradients, and vanishing/exploding gradients.
the attacker to increase the relatedness of the trigger and the Neural networks generally behave in a largely linear man-
selected node of the target layer); 3) using the mean image ner [77]. As image data is multidimensional in nature, the
of a public dataset, the training data are reverse-engineered to property of linearity can have adverse effects on the clas-
ensure that the images would be classified into the target class; sification, rendering the model vulnerable to adversarial at-
4) the target model is trained using the reverse-engineered tacks. The shattered gradients approach involves making the
8 JOURNAL OF IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, VOL. 00, NO. 0, MONTH 2020

model nondifferentiable or numerically unstable, to ensure exploit obfuscated gradients, are vulnerable to strong gradient-
that accurate gradients cannot be obtained. One version of based attacks [36, 13, 86]. Alternatively, an attacker may
the shattered gradient defense involves thermometer encoding simply use a different attack [33, 38] to bypass such a defense
[10]. This method applies nondifferentiable and non-linear or the way circumvented by any adversary who uses the true
transformations to the input by replacing one-hot encoding adversarial examples [87, 88].
with thermometer encoding. The thermometer τ (j) ∈ RK , b) Robustness: Gradient obfuscation could prove to be
can be expressed as follows: useless in a white-box setting, where increasing robustness
( may be a better approach. One method to increase robustness
1, if l ≥ j of the model is to enable it to produce similar outputs from
τ (j)l = . (12)
0, otherwise clean and adversarial examples and either by penalizing the
difference between them or regularizing the model to reduce
Subsequently, the thermometer discretization function f for a
the attack surface.
pixel i ∈ {i, · · · , n} can be defined as
Most studies on robustness involve adversarial training [9],
ftherm(x)i = τ (b(xi )) = C(fonehot (xi )), (13) which can be viewed as minimizing the worst error caused by
Pl the perturbed data of an adversary. It can also be perceived as
where R is the cumulative sum C(c)l = j=0 cl , and b learning an adversarial game with a model that requests labels
is the quantization function. Other defense techniques based for the input data. Other techniques include the distillation
on gradient shattering include local intrinsic dimensionality training [89] method which provides robustness to saliency
(LID) [78] metrics or input transformations [79] such as map attacks [35], and a layerwise regularization method [90],
image cropping, rescaling [80], bit-depth reduction [81], JPEG which controls the global Lipschitz constant of a network.
compression [34], and total variance minimization [82]. However, none of these methods produce fully robust models
The stochastic gradients approach obfuscates gradients in and could be bypassed by a multi-step attack, such as the
the inference phase by dropping random neurons in each projected gradient descent (PGD).
layer. The network then stochastically prunes a subset of the Most of the optimization problems in ML are solved using
activations in each layer during the forward pass. Stochastic first-order methods and variants of stochastic gradient descent;
activation pruning [11] is a variant of this method in which, the thus, a universal attack can be designed using first-order
dropout follows the probability from a weighted (rather than information. Madry et al. [13] suggested that local maxima
uniform) distribution. The surviving activations are scaled up for the worst error can be found by PGD, on the basis that
to normalize the dynamic range of the inputs to the subsequent a trained network that is robust against PGD adversaries will
layer. The probability of sampling the jth activation in the ith also be robust against a wide range of attacks that assume
layer is given by first-order optimization.
|(hi )j | Adversarial training was originally used to train a small
pi j = Pai , (14) model using the MNIST dataset [9]. Kurakin et al. [76]
i
k=1 |(h )k | extended that work to ImageNet [91] using a deeper model
i
where hi ∈ Ra and (hi )j are the values of the jth activation with a batch normalization step. The relative weights of the
in the ith layer. Xie et al. [83] also used a randomization adversarial examples can be independently controlled in each
technique that inserts a layer in front of the input to the neural batch using the following loss function:
network, which rescales and zero-pads the input. 1 X X
The vanishing/exploding gradients method renders the Loss = ( J(xi |li ) + λ J(x̃i |li )),
(m − k) + λk
model unusable by deep computation, which restores adver- CLEAN ADV
(16)
sarially perturbed images to clean images. These images are where J(x|l) is the loss on a single example x with true class
then fed to the unmodified classifier. PixelDefend [12] is a l, m is the total number of training examples in the batch, k
defense algorithm that uses PixelCNN [84] to approximate the is the number of adversarial examples in the batch, and λ is
training distribution. PixelCNN is a generative model designed the weight applied to adversarial examples.
to produce images that track the likelihood over all pixels by Defense techniques that change the target function by intro-
factorizing it into a product of conditional distributions: ducing regularizers or modifying the architecture of the model
Y
PCNN (x) = PCNN (xi |x1:(i−1) ). help increase the robustness of the model against adversarial
(15)
i attacks. Kannan et al. [92] introduced adversarial logit pairing
Defense-GAN [85] is a similar method that uses a GAN (ALP), which produces regularization by reducing the distance
instead of PixelCNN. The trained generator projects images between the logits of clean examples and those of adversarial
onto the manifold of the GAN, and these projected images examples. The loss function of training then becomes:
m
are then fed into the classifier. 1 X
L f (x(i) ; w), f (x̃(i) ; w) ,

Gradient-based defense algorithms based on the gradient of J(M, w) + λ (17)
m i=1
the initial version are inherently vulnerable to gradient-based
attacks. Athalye et al. [38] used projected gradient descent where J(M, w) is the cost of training a minibatch M, w is the
to set a perturbation υ, combined with the l2 Lagrangian model parameter, and L is the distance function. The results
relaxation approach [86]. Gradient-masking techniques, which showed that a simple regularizer can improve the robustness of
HO et al.: SECURITY AND PRIVACY ISSUES IN DEEP LEARNING 9

Input 𝑥 and Final layer 𝑧ෞ𝑘 and Convex outer bound

Allowable perturbations Adversarial polytope
Deep network

Fig. 6. An adversarial polytope and its outer convex bound [19].

a model thatch is trained adversarially. Double backpropaga- accuracy and robust accuracy by weakening the adversarial
tion [93], which is a regularizer that penalizes the magnitudes examples during adversarial training. They used early-stopped
of the input gradients, reduces the sensitivity of the divergence PGD to prevent the adversarial examples used for training
between the predictions and uniform uncertainty produced by from significantly destroying the generalization ability.
evasive attacks. Miyato et al. [94] introduced a regularizer According to Schmidt et al. [99], the adversarial robustness
that reduced the Kullback–Leibler divergence between clean requires more data to produce successful results. There are sev-
and adversarial examples, to render the distributions of the eral methods [100, 101] that use data augmentation methods to
resulting outputs more similar to each other. Xie et al. [95] increase robustness. Carmon et al.,[100] used unlabeled data
denoise feature maps by adding blocks, such as non-local to improve the robustness; the data were pseudo-labeled by the
mean blocks, to a network to reduce adversarial perturbations classifier before being deployed in adversarial training. Lee et
from the inputs. A more recent regularizer [96] makes a model al.,[101] proposed out-of-distribution data augmented training
behave linearly in the vicinity of the input data, which reduces (OAT). They used out-of-distribution data for training with a
the effect of gradient obfuscation and improves robustness to uniform distribution label and achieved improved robustness
adversarial examples. by removing the contributions of undesirable features.
There are several variants of adversarial training, such as c) Detection: Retaining the ability to detect attacks at the
the augmentation of training data or introduction of loss inference phase is considered equally (if not more) valuable
functions. Tramer et al. [51] proposed ensemble adversarial to increase the security of a DL-based system and ensure
training to defend against black-box attacks by using ad- that corrupted input can be rejected. Most detection methods
versarial examples generated by other networks. Decoupling require no change to the classifier; therefore, they are easy to
adversarial example generation from the trained model in- implement, and can be combined with other defenses.
creases the diversity of the training data. In another study, Metzen et al. [15] detected adversarial examples using a bi-
tradeoff-inspired adversarial defense via surrogate-loss min- nary detector network, which was trained to classify inputs into
imization (TRADES) [14] identified a trade-off between clean and perturbed examples. Using a similar scheme, Meng
adversarial robustness and accuracy. The expected errors in et al. [16] separated a detector and reformer network, which
adversarial examples are decomposed into the sum of the were then used to reconstruct clean input. These networks
expected errors in clean examples and a boundary error that identified adversarial examples from the reconstruction error,
corresponds to the likelihood of the closeness of the inupt which yields the Jensen–Shannon divergence of the original
features to the perturbation-extension of the decision boundary. and reconstructed inputs:
Both these errors are expressed by a surrogate loss function 1 1
such as cross-entropy or 0-1 loss functions, to yield the JSD(P ||Q) = DKL (P ||M ) + DKL (Q||M ), (19)
2 2
following minimization:
where P is the output resulting from the original inputs, Q is
min E{φ(f (x)l) + max φ(f (x)f (x0 )l/λ)}, (18) the output of the reconstructed input, and M is the mean of
f x0 ∈B(x,ξ)
P and Q, M = 12 (P + Q).
where φ is the surrogate loss function that represents the Feature squeezing [81] reduces the search space available
expected errors, and B(x, ξ) represents a neighborhood of to attackers by squeezing the inputs before comparing the
x : {x0 ∈ X : ||x0 − x|| ≤ ξ}, which is the expected error and prediction results obtained from the squeezed examples with
the boundary error weighted by λ. This method showed state- those of the clean examples. If there are substantial differences,
of-the-art performance under both black-box and white-box the original input is likely to be adversarial. Squeezing was
attacks. Zhang et al. [97] proposed a feature scattering-based achieved by color-depth reduction and spatial smoothing (both
adversarial training approach that utilized the optimal transport local and non-local smoothing). This method was able to detect
distance between the input data and its adversarial examples adversarial examples in various types of evasion attacks with
for training without label leaking [36]. Recently, Zhang et a low false-positive rate.
al. [98] attempted to to handle the trade-off between standard Grosse et al. [102] identified adversarial examples by apply-
10 JOURNAL OF IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, VOL. 00, NO. 0, MONTH 2020

ing statistical metrics to the output of a classifier. They also

introduced a method of integrated outlier detection in which

Conv

FC
a classifier is trained to categorize adversarial examples into a
new class. This involves a small reduction in classification
accuracy but a high detection rate. Feinman et al. [103]
also detected adversarial examples by examining statistical
metrics such as the density of the feature space and Bayesian ℂ0 (𝒙) ℂ1 (𝒙) ℂ2 (𝒙) ℂ3 (𝒙)
uncertainty estimates.
Pang et al. [104] minimized reverse cross-entropy as the Fig. 7. Illustration of layerwise adversarial training. A latent adversarial
loss function used to train the model, to identify adversarial example is found in the convex region C1 (x) and propagated through the
examples. The reverse cross-entropy loss value of an input x latter layers in a forward pass, which is represented by the blue lines. The
red line shows the gradients during a backward pass. In the procedure, the
over a label y is expressed as follows: first layer that corresponds to the former layer of convex region C1 (x) does
| not receive gradients [110].
LR
CE (x, l) = −Rl log σ(x), (20)
(
1 certified approach against adversarial examples on two-layer
λ+1 , i=y
Ry = Piλ = λ , (21) networks. Wong et al. [19] presented a convex outer-bound
(L−1)(λ+1) , i 6= y
approach called an ”adversarial polytope”, which is the set
where σ(·) is ;he softmax output, Ry represents the reversed of all the final activation layers that are produced by applying
form of the label y, and λ is the hyperparameter with λ = ∞ norm-bounded perturbation to the inputs. They used this bound
in the experiment. In a recent study, Hu et al., [17] introduced for linear relaxation of the ReLU activation and optimized the
a detection method with safety criteria: robustness against worst-case loss over the region within the bound, as shown
random noise and susceptibility to adversarial noise, which in Fig. 6. However, this method can only be applied to small
is represented as robustness against Gaussian noise and the networks. Wong et al. [108] extended the scope of this method
minimum number of steps required to perturb the input, re- by introducing a provably robust training procedure for general
spectively. They achieved unprecedented accuracy in a white- networks, formulated in terms of Fenchel conjugate functions,
box setting. nonlinear random projections, and model cascade techniques.
d) Certified approach: The robustness of most defenses Cohen et al. [20] addressed the issue of certified defenses
can only be established empirically in the context of known from a different perspective; they proved that classifiers that
types of attacks. An empirically robust classifier may be over- are robust against Gaussian noise are also robust against
come by new and stronger attacks. However, some classifiers, adversarial perturbations bounded by the l2 norm. They used
generally DNNs, can be proven robust if they produce a randomized smoothing, which had already been proven [109]
constant output for some set of variations of the inputs which to maintain robustness. Cohen et al. [20] further proved
is generally expressed as an Lp ball. that smoothing with Gaussian noise can induce certifiable
DNNs have input and output layers with hidden layers robustness against l2 norm bounded perturbations. Because
between them. Reluplex [18] verifies the robustness of DNNs the exact evaluation of the robustness of the classifier is not
by searching for linear combinations of hidden layers. This possible, they showed that the method is robust against attacks
problem is NP-complete, and thus, the search space is reduced with high probability using Monte Carlo algorithms.
by a simplex algorithm. This algorithm is based on a satisfi- Recently, Balunovic et al. [110] combined the adversarial
ability modulo theories (SMT) solver that addresses Boolean training of a classifier with provable defense methods. A
satisfiability. Exploiting the properties from the simplex, Relu- verifier aims to prove the robustness of the classifier, while
plex allows inputs to temporarily violate their feasible bounds an adversary attempts to garner inputs that cause errors within
for certification, verifying the robustness of a neural network. convex bounds, as shown in Fig. 7. They utilized layerwise
Sinha et al. [105] introduced a method that is provably adversarial training and bridged the gap between adversarial
robust to perturbations distributed in a Wasserstein ball. They training-based empirical defense methods and existing certi-
trained a classifier with adversarial training using a distribu- fied defense methods. The method resulted in state-of-the-art
tionally robust optimization. Hein et al. [106] showed formal robust accuracy on the CIFAR-10 dataset under 2/255 L∞
guarantees on the robustness of classifiers using a bound on and 8/255 L∞ perturbations.
the local Lipschitz constant in the vicinity of the input. Their 2) Defense against Poisoning Attacks: Steinhardt et
Cross-Lipschitz regularizer increased the range of attacks that al. [21] proposes a data sanitation defense method [111] which
can be defeated, forcing potential attackers to find better modes aims to remove poisoned data points from the given dataset.
of attack. The proposed online learning algorithm provides candidate
Accurate bounds on worst-case losses improve the robust- attack data instances and the worst-case test loss from any
ness. Raghunathan et al. [107] improved the accuracy of attack.
both the lower and upper bounds on the worst-case loss Koh et al. [22] used influence functions to track model
by concentrating on the upper bound. This was performed predictions and identify training data points that had the most
on the basis that it is safer to minimize the upper bound influence on a given prediction. Although their theory does
than minimizing the lower bound. They demonstrated a novel not extend to nonconvex and nondifferentiable models, they
HO et al.: SECURITY AND PRIVACY ISSUES IN DEEP LEARNING 11

proved that approximate influence functions can be effective al. [27], except they pruned filters of a neural network that
against poisoning attacks. These functions additionally allow were compromised, thus triggering a backdoor attack.
a defender to focus on data with a high influence score. This
method appears to be a better way of eliminating tainted ex-
amples than simply identifying data points with large training III. P RIVATE AI
losses. Deep learning algorithms, which underpin most current
Paudice et al. [23] also suggested a defense mechanism AI systems, are data-driven, which exposes them to privacy
to mitigate the effects of poisoning attacks based on outlier threats while data collection or pretrained-model distribution
detection. An attacker would attempt to cause the greatest is performed. Many attempts were made to build private AI
possible impact with a limited number of poisoning points. systems to maintain data privacy. In this section, we describe
To mitigate this effect, they divided the trustworthy dataset the ways in which privacy can be breached in current AI
D into two classes, D+ and D− , and trained a distance- systems, review defenses based on homomorphic encryption
based outlier detector for each class. Each detector calculated (HE), secure multi-party computation (SMC), and differential
an outlier score for each sample in the entire clean dataset. privacy (DP).
There are many ways to measure the outlier score, such as an
SVM or local outlier factor (LOF). In this study, the empirical
cumulative distribution function (ECDF) of training instances A. Scenarios for Privacy Attacks
was used to determine a threshold for detecting outliers. Upon
removing all the entities expected to be contaminated, the 1) Service providers: Service providers offer AI-based
remaining data were used to retrain the learning algorithm. applications to the public. These applications are based on
Instead of following outliers, Paudice et al. [24] decided to pretrained DL models, and often use privacy-sensitive data to
relabel data points that considered outliers by a label-flipping improve model performance. A group of studies [112] has
attack, which is a poisoning attack, wherein an attacker suggested that not only DL models learn latent patterns from
changes the label of few training points. They considered the training data, but also a trained model becomes actually a
points farthest from the decision boundary to be malicious repository of that data, which would effectively be exposed
and reclassified them. The algorithm reassigned the label of by granting access to a pretrained model. In a membership
each malicious example using a k-nearest neighbor (k-NN) inference attack [113, 114, 115, 116, 117], an attacking model
algorithm. For each sample of the training data, the closest tries to determine whether the given dataset was used to train
k-NN points were first found using the Euclidean distance. the target model. The more powerful inversion attack aims
If the number of data points with the most common label to obtain the attributes of the unknown data that were used
among the k-NN was equal to or greater than a given threshold, to train the target model. For example, Fredrikson et al. [30]
the corresponding training sample was renamed as the most reconstructed an image of a face that was used to train a target
common label in k-NN. classifier using confidence scores attached to the classification.
Chen et al. [25] looked for poisoned data by monitoring 2) Information Silos: An information silo is a data man-
activation in the latent space of a neural network, rather agement system that is isolated from other similar systems.
than analyzing its input or output. Each example was an- A deep-learning system is usually more effective if it is
alyzed by how far the degree of activation deviation from trained using a large dataset. In an AI system, information
the activation-value distributions of a class majority. Tran et from different silos may be used to train the model without
al. [26] also defended against other variation backdoor attacks directly sharing data among the silos. Federated learning [118,
by monitoring activation values, which were analyzed using 119, 120] facilitates this process by sharing gradients and
spectral signatures. This method spotted poisoned data using model parameters; however, this makes the data vulnerable to
the activation of a neural network, similar to the method used membership and inversion attacks illustrated in Fig. 8. Hitaj
by Chen et al. [25]. Firstly, a singular value decomposition et al. [121] demonstrated that a federated DL approach is
was applied to the covariance matrix. Subsequently, all the essentially broken in terms of privacy, because it is virtually
training data were compared with the first singular vector. impossible to protect the training data of honest participants
Poisoned examples had a high outlier score and were erased from an attack in which a GAN tricks a victim into revealing
before retraining the neural network. sensitive data.
The defense proposed by Liu et al. [27] was different from 3) Users: Many DL-based applications run on third-party
the mechanisms described above, which aims to detect and servers, because they are too large and complicated [122, 123]
remove poisoned data. These authors modified the neural net- to run on devices such as mobile phones or smart speakers.
work itself, using a technique called fine-pruning (combination Users must therefore transfer sensitive data, such as voice
of pruning and fine-tuning). Pruning a neural network removes recordings or images of faces, to the server. Therefore, the
neurons, including the backdoor neurons [28]. However, be- user loses control of their data: they can neither delete it nor
cause other attacks are made pruning-aware, this method also determine the manner in which it is used. Similar to the recent
suggested cleaning the neural network through fine-tuning Facebook–Cambridge Analytica data scandal [124], privacy
after pruning using trusted clean data. The resultant network policies may be inadequate in preventing the exploitation of
was found to be robust against multiple poisoning attacks. user data.
Wang et al. [29] presented a similar method to that of Liu et
12 JOURNAL OF IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, VOL. 00, NO. 0, MONTH 2020

(&')*$&+,'-)*.&'

/&0"'#1+(&')&' !"#"
!"#"
!"#"
!"#$%&'(")*++*,-

!"#$%&'(")*++*,-

!"#"

203-'4#"*-0+ (&')*$&+56&'
!""#$%&' (*1-
!"# !$# !%#

Fig. 8. Privacy attack scenarios from the perspectives of the (a) service provider, (b) information silo, and (c) user.

TABLE III
D EFENSE METHODS FOR PRIVATE AI

(Figure 8) Homomorphic Secure Multi-party Differential

Victims Encryption Computation Privacy
Service Providers X
Information Silos X X X
Service Users X X
[125, 126, 127, 128, 129, 130, 131] [132, 133, 134, 135, 136, 137] [138, 139, 140, 141, 142, 143]
References
[144, 145, 146, 147, 148]
Section III-B1 III-B2 III-B3

B. Defenses against Attacks on Privacy functions. Chabanne et al. [127] employed batch normalization
Several methods have been attempted to combine DL with to reduce the difference in accuracy between the original
established security techniques, including homomorphic en- classifier and the classifier evaluated with encrypted data by
cryption (HE), secure multi-party computation (SMC) and approximating the activation function during inference. This
differential privacy (DP). Table III evaluates these techniques technique also permitted the design of a deeper model. The
against the privacy threats listed in Section III-A1, and in this inference on the original version of CryptoNets was slow
subsection, we review their effectiveness. by several hundreds of seconds; its speed was subsequently
1) Homomorphic encryption on deep learning: HE is a improved in later studies [128, 129].
cryptographic scheme that enables computations on encrypted TFHE [151] is a recent HE technique that supports opera-
data without decryption. An encryption scheme is homomor- tions on binary data. TAPAS [130] and FHE-DiNN [131] were
phic for operation ∗; without the access to the secret key, the improvements on this scheme, implemented using with binary
following holds: neural networks, which achieved improved speed and greater
accuracy on the MNIST dataset, with only a single hidden
Enc(x1 ) ∗ Enc(x2 ) = Enc(x1 ∗ x2 ), (22) layer.
where Enc(·) denotes the encryption function. HE can protect 2) Secure multi-party computation on deep learning: Till
user data from third-party servers or gradients aggregated date, there are two major approaches proposed to maintain
among information silos. privacy in DL-based systems involving multiple parties: 1)
Gilad et al. [125] introduced the use of encrypted data in protection of user-side privacy by secure multi-party compu-
the inference phase. Their CryptoNets system uses a YASHE tation, and 2) secure sharing of gradients between information
leveled HE scheme [149] to provide privacy-preserving in- silos.
ference on a pretrained CNN. CryptoNets demonstrated over SMC methods are primarily based on secure two-party
99% accuracy in a classification task on handwritten digits (2PC) techniques, which involve a user, who provides data, and
in the MNIST dataset [150]. However, because nonlinear a server that runs a DL system using the data. SecureML [132]
activations are approximated by square functions, the exten- was the first privacy-preserving method proposed in which
sibility of CryptoNets to large complicated models is ques- neural networks were computed using 2PC as it requires large
tionable. [122, 123]. However, Hesamifard et al. [126] and amounts of communication. In MiniONN [133] a neural net-
Chabanne et al. [127] attempted to improve CryptoNets using work is replaced by an oblivious neural network that is trained
higher-degree polynomial approximations of the activation using a simplified HE scheme. Garbled circuits (GCs) were
HO et al.: SECURITY AND PRIVACY ISSUES IN DEEP LEARNING 13

Label-level where M : D → R is a randomized mechanism that adds the

noise n sampled from a Laplace or Gaussian distribution [158]
to the query response, D is the target database, and f is the
Noise original deterministic query response.
M provides (ε, δ)-DP if all adjacent D and D0 satisfy the
𝑦 𝑙 (label)
following [158]:
𝛻𝑤 𝐽 𝛻𝑤 𝐽
Objective Pr[M (D) ∈ S] ≤ exp(ε)Pr[M (D0 ) ∈ S] + δ, (24)
Noise Noise function 𝐽
Gradient-level Objective-level
where D and D0 are two adjacent databases, S ⊆ range(M )
is a subset of R, and ε and δ are the privacy budget parameters
Fig. 9. Overview of the differential privacy in a DL framework. that determine the level of privacy. Smaller ε and δ indicate
that M (D) and M (D0 ) are more similar.
Differentially private deep learning models can be divided
into three groups: gradient-level [138, 139, 140, 141, 142],
used to approximate nonlinear activation functions. DeepSe- objective-level [143, 144, 145] and label-level [146, 147, 148]
cure [134] performs encrypted inference on a neural network approaches, depending upon where the noise is added. Fig. 9
using Yao’s GCs [152] and suggests some other practical shows an overview of these approaches. In the gradient-
computing structures that are probably secure. Gazelle [135] level approach, noise was inserted into the gradients of the
performed linear operations with HE and computed the acti- parameters during the training phase. In the objective-level
vation functions with GC. The authors observed that HE is the approach, noise was used to perturb the coefficients of the
most promising method in matrix-vector multiplications, while original objective function. In the label-level approach, noise
GCs make them more suited to approximation of nonlinear was inserted into the label in the knowledge-transfer phase of
functions in DNN models. Although 2PC-based algorithms the teacher–student mechanism.
have shorter inference times than those of HE-based methods, Abadi et al. [138] proposed a differential private SGD
they require communication at every operation or layer. Hence, algorithm (DP-SGD) that adds noise to the gradients while
they are impractical because 1) the user must be online during updating parameters. They introduced a moment accountant
the whole inference phase and 2) the communication overhead algorithm to track the cumulative privacy loss to estimate ε
increases as the number of connected users increase. and δ. McMahan et al. [139] introduced differentially private
Methods to protect data privacy in the federated learning of long short-term memory (LSTM) [159], which provides DP for
data silos are based mainly on distributed DL algorithms [153, a language model. Xie et al. [140] proposed a differentially
154, 155]. The distributed selective SGD (DSSGD) [136] private GAN (DPGAN) to provide DP for a differentially pri-
uses collaborative DL protocols that allow different data vate generator. The DPGAN injected noise into the gradients
holders to train joint DL models without sharing their training of the discriminator to obtain a differentially private discrim-
data. Using coordinated learning models and objectives, the inator. The generator is trained with the discriminator, and
participants train their local neural networks and periodically hence becomes differentially private based on post-processing
exchanged the gradients and parameters. As gradients and theory [160]. Acs et al. [141] introduced a differentially
parameters are only partially shown, DSSGD resists model private generative model consisting of a mixture of generative
inversion and membership attacks. However, because DSSGD NNs such as restricted Boltzmann machines (RBMs) [161]
uses a parameter server [156], Aono et al. [137] noted that and variational autoencoders (VAEs) [162]. These authors
it is possible to reconstruct the data used in training from applied a differentially private k-means algorithm to cluster
a small number of gradients. To preserve privacy against an the original datasets and used DP-SGD [138] to train each
honest-but-curious parameter server, the authors applied HE neural network. Yu et al. [142] introduced improved DP-SGD
to the parameter and gradient exchange. Because the size of by applying a different sampling strategy and a concentrated
the encrypted data is much larger than that of plaintext, this DP (CDP) [163]—a variant of DP—to provide a higher level
has a trade-off with communication costs. Hence, Ryffel et of privacy.
al. [157] attempted to build a privacy-preserving federated The objective-level approach introduced by Chaudhuri and
learning framework that combines MPC and DP functionality. Monteleoni [143] disturbs the original objective function by
adding noise to its coefficients, making the model trained on
3) Differential privacy in deep learning: Differential pri- this function differentially private. Noise is injected into the
vacy (DP) is a state-of-the-art privacy-preserving mecha- polynomial objective function by changing the coefficients.
nism [158] that reliably prevents an attacker from deducing A non-polynomial objective function must be approximated
private information from databases or deep learning models. using techniques such as the Taylor or Chebyshev expansions.
DP algorithms prevent an attacker from discovering the ex- Chaudhuri and Monteleoni [143] proposed a differentially
istence of a particular record by adding noise to the query private logistic regression, in which the parameters were
responses, as follows: updated to minimize perturbed objective function. Phan et
al. [144, 145] applied this mechanism to autoencoders [164]
and convolutional deep belief networks [165].
M (D) = f (D) + n, (23) The label-level approach injects noise into the knowledge-
14 JOURNAL OF IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, VOL. 00, NO. 0, MONTH 2020

transfer phase of the teacher-student mechanism. Papernot Moreover, as mentioned in previous studies, poisoned train-
et al. [146] introduced a semi-supervised knowledge-transfer ing data can be classified as outliers and be detected; if the
technique called private aggregation of teacher ensembles poisoned data is less visible as it is highly similar to the clear
(PATE), which is a type of teacher-student mechanism whose data, it would be less effective. This provides an avenue for
purpose is to train a differentially private classifier (the stu- developing an methodology for performing a trade-off between
dent) based on an ensemble of non-private classifiers (the effectiveness and detectability, which has been achieved by
teachers), trained on disjoint datasets. The teacher ensemble changing the decision boundary of the model. New metrics and
outputs noisy labels by noisy aggregation of each teacher’s evaluation methods are further required to secure the training
prediction, which the student learns. Because the student phase.
model cannot access the training data directly and the labels In addition to designing DL-based systems as robust as
that it receives are differentially private, PATE provides DP. current technology permits, it is essential to establish good
PATE utilizes a moment accountant to track the privacy budget testing and verification methods to verify it. For instance,
spent through the learning process. Later, Papernot et al. [147] there exists an automated test for DNNs used in autonomous
extended PATE to operate at a large scale by introducing a driving vehicles [175]. Several authors [18, 19, 176, 177] have
new noisy aggregation mechanism, which outperformed the presented a formal analysis of the robustness of DNNs against
original PATE. Jordon et al. [148] applied the PATE to train a input perturbations. However, current verification methods
discriminator to build a differentially private GAN framework. predominantly consider only norm-ball perturbations, such as
The discriminator provided DP, and the generator trained with the l2 ball mentioned in Section II; hence, a more generalized
the discriminator was also differentially private using the post- verification is required.
processing theory [160].

IV. C ONCLUSION C. Designing privacy-preserving DNNs using HE

DL has become a ubiquitous technology, and the security The biggest problem with existing HE-based methods is
and privacy of DL-based systems are issues of growing impor- that they are incompatible with DNNs. Under current HE
tance. We reviewed methods of attack and defense methods in schemes, the number of arithmetic operations that can be
terms of model security and data privacy. Finally, we discuss performed on encrypted data is limited. Although an operation
the open problems and future research directions. known as bootstrapping [178] enables refreshing the limit, not
all the HE schemes support bootstrapping, and bootstrapping
A. Risks inherent in wide deployment generally takes a significant amount of time. For these reasons,
the existing HE-based DL systems described earlier, make
It requires considerable effort to develop a new DL system;
their network shallow and simple to remove the necessity
therefore, successful systems are often modified for new
to perform bootstrapping; for instance, CryptoNets [125] has
applications, and trained using various datasets. For example,
only two convolutional layers and one fully connected layer,
U-Net [166] is a CNN used for biomedical image segmentation
and FHE-DiNN [131] has only a single fully connected layer.
and has a number of variants [167, 168, 169]. Suppose you
Therefore, designing privacy-preserving deep neural network
are a doctor who developed a new variant of U-Net trained on
models using bootstrapping, which performs within a feasible
your patient data, the data must not be revealed to the public.
time, is a fundamental future research direction.
Subsequently, if you do not upload your model, the ubiquity
Moreover, Classifiers that can use encrypted data are of un-
increases the susceptibility of the CNN to the attacks reviewed
suitable speeds for real-life applications. These systems must
in this paper. The availability of the U-Net and its variants can
be implemented in parallel or distributed processing systems
help in building substitute models for black-box attacks, and
using graphics processing units (GPUs) or cluster processors.
knowledge of similar models makes the model vulnerable to
GPUs have already been used to accelerate DL training. More-
inversion attacks. Hence, it is essential to conclude that the
over, several defenses mentioned in Section III-B1 require the
wide adoption of open-source DL systems is not compatible
user to encrypt their data before transmission and decrypt the
with high levels of security and privacy. There is perhaps some
results. Hence, methods for verifying the practicability on-
scope for research on the internal structuring of DL systems
device encryption and decryption and improving efficiency
to permit some variations around a secure core.
should be devised.
B. Challenges in model security
R EFERENCES
The contributions of adversarial examples to the robust-
ness of neural networks against misclassification attacks have [1] B. Shickel, P. Tighe, A. Bihorac, and P. Rashidi, “Deep
recently been reexamined [170]. We believe that a better EHR: a survey of recent advances on deep learning
understanding of DL-based models is fundamental to improve techniques for electronic health record (EHR) analysis,”
both attack and defense strategies. Evidently, interpretable Journal of Biomedical and Health Informatics, 2017.
AI [171, 172, 173] involves an analysis of the operation of a [2] A. L. Buczak and E. Guven, “A survey of data min-
DL and the manner in which it clarifies data. A deeper under- ing and machine learning methods for cyber security
standing of DL systems from different perspectives [170, 174] intrusion detection,” IEEE Communications Surveys &
would help identify vulnerabilities to unseen attacks easily. Tutorials, vol. 18, no. 2, pp. 1153–1176, 2016.
HO et al.: SECURITY AND PRIVACY ISSUES IN DEEP LEARNING 15

[3] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R- [17] S. Hu, T. Yu, C. Guo, W.-L. Chao, and K. Q. Wein-
CNN: towards real-time object detection with region berger, “A new defense against adversarial images:
proposal networks,” in Advances in Neural Information turning a weakness into a strength,” in Advances in
Processing Systems, 2015. Neural Information Processing Systems, 2019.
[4] C. Yan, X. Wenyuan, and J. Liu, “Can you trust au- [18] G. Katz, C. Barrett, D. L. Dill, K. Julian, and M. J.
tonomous vehicles: contactless attacks against sensors Kochenderfer, “Reluplex: an efficient SMT solver for
of self-driving vehicle,” in DEF CON 24 Hacking verifying deep neural networks,” in Proc. International
Conference, 2016. Conference on Computer Aided Verification (CAV),
[5] J. Li, F. Schmidt, and Z. Kolter, “Adversarial cam- 2017.
era stickers: a physical camera-based attack on deep [19] E. Wong and Z. Kolter, “Provable defenses against
learning systems,” in Proc. International Conference on adversarial examples via the convex outer adversarial
Machine Learning, 2019. polytope,” in Proc. International Conference on Ma-
[6] N. Carlini and D. Wagner, “Audio adversarial examples: chine Learning, 2018.
targeted attacks on speech-to-text,” in Proc. 39th IEEE [20] J. M. Cohen, E. Rosenfeld, and J. Z. Kolter, “Certified
Symposium Security and Privacy, 2018. adversarial robustness via randomized smoothing,” in
[7] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, Proc. International Conference on Machine Learning,
“Accessorize to a crime: real and stealthy attacks on 2019.
state-of-the-art face recognition,” in Proc. ACM SIGSAC [21] J. Steinhardt, P. W. W. Koh, and P. S. Liang, “Certified
Conference on Computer and Communications Security, defenses for data poisoning attacks,” in Advances in
2016. Neural Information Processing Systems, 2017.
[8] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, [22] P. W. Koh and P. Liang, “Understanding black-box
P. Laskov, G. Giacinto, and F. Roli, “Evasion attacks predictions via influence functions,” in Proc. 34th In-
against machine learning at test time,” in Proc. Joint ternational Conference on Machine Learning, 2017.
European Conference on Machine Learning and Knowl- [23] A. Paudice, L. Muñoz-González, A. Gyorgy, and E. C.
edge Discovery in Databases, 2013. Lupu, “Detection of adversarial training examples in
[9] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining poisoning attacks through anomaly detection,” arXiv
and harnessing adversarial examples,” arXiv preprint preprint arXiv:1802.03041, 2018.
arXiv:1412.6572, 2014. [24] A. Paudice, L. Muñoz-González, and E. C. Lupu, “La-
[10] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow, bel sanitization against label flipping poisoning attacks,”
“Thermometer encoding: one hot way to resist adver- in Proc. Joint European Conference on Machine Learn-
sarial examples,” in Proc. International Conference on ing and Knowledge Discovery in Databases, 2018.
Learning Representations, 2018. [25] B. Chen, W. Carvalho, N. Baracaldo, H. Ludwig, B. Ed-
[11] G. S. Dhillon, K. Azizzadenesheli, Z. C. Lipton, wards, T. Lee, I. Molloy, and B. Srivastava, “Detecting
J. Bernstein, J. Kossaifi, A. Khanna, and A. Anand- backdoor attacks on deep neural networks by activation
kumar, “Stochastic activation pruning for robust adver- clustering,” arXiv preprint arXiv:1811.03728, 2018.
sarial defense,” in Proc. International Conference on [26] B. Tran, J. Li, and A. Madry, “Spectral signatures in
Learning Representations, 2018. backdoor attacks,” in Advances in Neural Information
[12] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kush- Processing Systems, 2018.
man, “PixelDefend: leveraging generative models to [27] K. Liu, B. Dolan-Gavitt, and S. Garg, “Fine-pruning:
understand and defend against adversarial examples,” in defending against backdooring attacks on deep neural
Proc. International Conference on Machine Learning, networks,” in Proc. International Symposium on Re-
2017. search in Attacks, Intrusions, and Defenses, 2018.
[13] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and [28] T. Gu, B. Dolan-Gavitt, and S. Garg, “BadNets: iden-
A. Vladu, “Towards deep learning models resistant to tifying vulnerabilities in the machine learning model
adversarial attacks,” Proc. International Conference on supply chain,” arXiv preprint arXiv:1708.06733, 2017.
Learning Representations, 2018. [29] B. Wang, Y. Yao, S. Shan, H. Li, B. Viswanath,
[14] H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, H. Zheng, and B. Y. Zhao, “Neural cleanse: identifying
and M. I. Jordan, “Theoretically principled trade- and mitigating backdoor attacks in neural networks,” in
off between robustness and accuracy,” arXiv preprint Proc. 40th IEEE Symposium on Security and Privacy,
arXiv:1901.08573, 2019. 2019.
[15] J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff, [30] M. Fredrikson, S. Jha, and T. Ristenpart, “Model in-
“On detecting adversarial perturbations,” in Proc. In- version attacks that exploit confidence information and
ternational Conference on Learning Representations, basic countermeasures,” in Proc. ACM SIGSAC Confer-
2017. ence on Computer and Communications Security, 2015.
[16] D. Meng and H. Chen, “Magnet: a two-pronged defense [31] R. Shokri, M. Stronati, C. Song, and V. Shmatikov,
against adversarial examples,” in Proc. ACM SIGSAC “Membership inference attacks against machine learn-
Conference on Computer and Communications Security, ing models,” in Proc. IEEE Symposium on Security and
2017, pp. 135–147. Privacy, 2017.
16 JOURNAL OF IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, VOL. 00, NO. 0, MONTH 2020

[32] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, “Spatially transformed adversarial examples,” arXiv
D. Erhan, I. Goodfellow, and R. Fergus, “Intrigu- preprint arXiv:1801.02612, 2018.
ing properties of neural networks,” arXiv preprint [49] C. Laidlaw and S. Feizi, “Functional adversarial at-
arXiv:1312.6199, 2013. tacks,” in Advances in Neural Information Processing
[33] N. Carlini and D. Wagner, “Towards evaluating the ro- Systems, 2019.
bustness of neural networks,” in Proc. IEEE Symposium [50] N. Papernot, P. McDaniel, and I. Goodfellow, “Transfer-
on Security and Privacy, 2017. ability in machine learning: from phenomena to black-
[34] D. P. Kingma and J. Ba, “Adam: a method for stochastic box attacks using adversarial samples,” arXiv preprint
optimization,” in Proc. International Conference on arXiv:1605.07277, 2016.
Learning Representations, 2015. [51] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow,
[35] N. Papernot, P. D. McDaniel, S. Jha, M. Fredrikson, D. Boneh, and P. McDaniel, “Ensemble adversarial
Z. B. Celik, and A. Swami, “the limitations of deep training: attacks and defenses,” in Proc. International
learning in adversarial settings,” in Proc. IEEE Euro- Conference on Learning Representations, 2019.
pean Symposium on Security and Privacy, 2016. [52] P.-Y. Chen, H. Zhang, Y. Sharma, J. Yi, and C.-J.
[36] A. Kurakin, I. Goodfellow, S. Bengio et al., “Adversar- Hsieh, “ZOO: zeroth order optimization based black-
ial examples in the physical world,” 2016. box attacks to deep neural networks without training
[37] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and substitute models,” in Proc. 10th ACM Workshop on
P. Frossard, “Universal adversarial perturbations,” in Artificial Intelligence and Security, 2017.
Proc. IEEE Conference on Computer Vision and Pattern [53] A. Ilyas, L. Engstrom, A. Athalye, J. Lin, A. Athalye,
Recognition, 2017. L. Engstrom, A. Ilyas, and K. Kwok, “Black-box ad-
[38] A. Athalye, N. Carlini, and D. Wagner, “Obfuscated versarial attacks with limited queries and information,”
gradients give a false sense of security: circumventing in Proc. 35th International Conference on Machine
defenses to adversarial examples,” in Proc. Interna- Learning, 2018.
tional Conference on Machine Learning, 2018. [54] A. Ilyas, L. Engstrom, and A. Madry, “Prior convic-
[39] W. Brendel, J. Rauber, M. Kümmerer, I. Ustyuzhaninov, tions: black-box adversarial attacks with bandits and
and M. Bethge, “Accurate, reliable and fast robustness priors,” in Proc. International Conference on Learning
evaluation,” in Advances in Neural Information Process- Representations, 2019.
ing Systems, 2019. [55] J. Su, D. V. Vargas, and K. Sakurai, “One pixel attack
[40] W. Brendel, J. Rauber, and M. Bethge, “Decision- for fooling deep neural networks,” IEEE Transactions
based adversarial attacks: reliable attacks against black- on Evolutionary Computation, no. 5, pp. 828–841,
box machine learning models,” in Proc. International 2019.
Conference on Learning Representations, 2018. [56] C. Guo, J. Gardner, Y. You, A. G. Wilson, and K. Wein-
[41] F. Croce and M. Hein, “Reliable evaluation of adversar- berger, “Simple black-box adversarial attacks,” in Proc.
ial robustness with an ensemble of diverse parameter- International Conference on Machine Learning, 2019.
free attacks,” in Proc. International Conference on [57] D. Wierstra, T. Schaul, J. Peters, and J. Schmidhuber,
Machine Learning, 2020. “Natural evolution strategies,” in Proc. IEEE Congress
[42] ——, “Minimally distorted adversarial examples with on Evolutionary Computation, 2008.
a fast adaptive boundary attack,” in Proc. International [58] L. Muñoz-González, B. Biggio, A. Demontis, A. Pau-
Conference on Machine Learning, 2020. dice, V. Wongrassamee, E. C. Lupu, and F. Roli,
[43] M. Andriushchenko, F. Croce, N. Flammarion, and “Towards poisoning of deep learning algorithms with
M. Hein, “Square attack: a query-efficient black-box ad- back-gradient optimization,” in Proc. ACM Workshop
versarial attack via random search,” in Proc. European on Artificial Intelligence and Security, 2017.
Conference on Computer Vision, 2020. [59] C. Yang, Q. Wu, H. Li, and Y. Chen, “Generative poi-
[44] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. soning attack method against neural networks,” arXiv
Celik, and A. Swami, “Practical black-box attacks preprint arXiv:1703.01340, 2017.
against machine learning,” in Proc. ACM ASIA Confer- [60] L. Muñoz-González, B. Pfitzner, M. Russo,
ence on Computer and Communications Security, 2017. J. Carnerero-Cano, and E. C. Lupu, “Poisoning
[45] Z. Zhao, D. Dua, and S. Singh, “Generating natural ad- attacks with generative adversarial nets,” arXiv preprint
versarial examples,” in Proc. International Conference arXiv:1906.07773, 2019.
on Learning Representations, 2018. [61] A. Shafahi, W. R. Huang, M. Najibi, O. Suciu,
[46] Y. Song, R. Shu, N. Kushman, and S. Ermon, “Con- C. Studer, T. Dumitras, and T. Goldstein, “Poison frogs!
structing unrestricted adversarial examples with gen- targeted clean-label poisoning attacks on neural net-
erative models,” in Advances in Neural Information works,” in Advances in Neural Information Processing
Processing Systems, 2018. Systems, 2018.
[47] A. Odena, C. Olah, and J. Shlens, “Conditional image [62] C. Zhu, W. R. Huang, H. Li, G. Taylor, C. Studer,
synthesis with auxiliary classifier gans,” in Proc. 34th and T. Goldstein, “Transferable clean-label poisoning
International Conference on Machine Learning, 2017. attacks on deep neural nets,” in Proc. International
[48] C. Xiao, J.-Y. Zhu, B. Li, W. He, M. Liu, and D. Song, Conference on Machine Learning, 2019.
HO et al.: SECURITY AND PRIVACY ISSUES IN DEEP LEARNING 17

[63] W. R. Huang, J. Geiping, L. Fowl, G. Taylor, sic dimensionality,” in Proc. International Conference
and T. Goldstein, “MetaPoison: practical general- on Learning Representations, 2018.
purpose clean-label data poisoning,” arXiv preprint [79] C. Guo, M. Rana, M. Cisse, and L. van der Maaten,
arXiv:2004.00225, 2020. “Countering adversarial images using input transforma-
[64] J. Geiping, L. Fowl, W. R. Huang, W. Czaja, G. Taylor, tions,” in Proc. International Conference on Learning
M. Moeller, and T. Goldstein, “Witches’ brew: indus- Representations, 2018.
trial scale data poisoning via gradient matching,” Proc. [80] A. Graese, A. Rozsa, and T. E. Boult, “Assessing
International Conference on Learning Representations, threat of adversarial examples on deep neural networks,”
2021. in Proc. IEEE International Conference on Machine
[65] Y. Liu, S. Ma, Y. Aafer, W.-C. Lee, J. Zhai, W. Wang, Learning and Applications, 2016.
and X. Zhang, “Trojaning attack on neural networks,” [81] W. Xu, D. Evans, and Y. Qi, “Feature squeezing: de-
in Proc. 25th Annual Network and Distributed System tecting adversarial examples in deep neural networks,”
Security Symposium, 2018. in Proc. Network and Distributed System Symposium,
[66] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted 2017.
backdoor attacks on deep learning systems using data [82] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total
poisoning,” arXiv preprint arXiv:1712.05526, 2017. variation based noise removal algorithms,” Physica D:
[67] S. Li, B. Z. H. Zhao, J. Yu, M. Xue, D. Kaafar, and Nonlinear Phenomena, vol. 60, no. 1-4, pp. 259–268,
H. Zhu, “Invisible backdoor attacks against deep neural 1992.
networks,” arXiv preprint arXiv:1909.02742, 2019. [83] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille,
[68] T. A. Nguyen and A. T. Tran, “Wanet - imperceptible “Mitigating adversarial effects through randomization,”
warping-based backdoor attack,” in Proc. International in Proc. International Conference on Learning Repre-
Conference on Learning Representations, 2021. sentations, 2018.
[69] Y. Liu, X. Ma, J. Bailey, and F. Lu, “Reflection [84] A. V. Oord, N. Kalchbrenner, and K. Kavukcuoglu,
backdoor: a natural backdoor attack on deep neural “Pixel recurrent neural networks,” in Proc. 33rd Inter-
networks,” in Proc. European Conference on Computer national Conference on Machine Learning, 2016.
Vision, 2020. [85] P. Samangouei, M. Kabkab, and R. Chellappa,
[70] M. Barni, K. Kallas, and B. Tondi, “A new backdoor “Defense-GAN: protecting classifiers against adversar-
attack in CNNs by training set corruption without label ial attacks using generative models,” in Proc. Interna-
poisoning,” in Proc. IEEE International Conference on tional Conference on Learning Representations, 2018.
Image Processing (ICIP), 2019. [86] N. Carlini and D. Wagner, “Adversarial examples are
[71] A. Turner, D. Tsipras, and A. Madry, “Label-consistent not easily detected: bypassing ten detection methods,”
backdoor attacks,” arXiv preprint arXiv:1912.02771, in Proc. 10th ACM Workshop on Artificial Intelligence
2019. and Security, 2017.
[72] Y. Liu, L. Wei, B. Luo, and Q. Xu, “Fault injection [87] W. He, J. Wei, X. Chen, N. Carlini, and D. Song,
attack on deep neural network,” in Proc. IEEE/ACM “Adversarial example defense: ensembles of weak de-
International Conference on Computer-Aided Design fenses are not strong,” in Proc. USENIX Workshop on
(ICCAD). IEEE, 2017, pp. 131–138. Offensive Technologies, 2017.
[73] A. S. Rakin, Z. He, J. Li, F. Yao, C. Chakrabarti, and [88] J. Uesato, B. O’Donoghue, P. Kohli, and A. Oord,
D. Fan, “T-BFA: targeted bit-flip adversarial weight “Adversarial risk and the dangers of evaluating against
attack,” arXiv preprint arXiv:2007.12336, 2020. weak attacks,” in International Conference on Machine
[74] A. S. Rakin, Z. He, and D. Fan, “TBT: targeted neural Learning, 2018.
network attack with bit trojan,” in Proc. IEEE/CVF [89] N. Papernot, P. D. McDaniel, X. Wu, S. Jha, and
Conference on Computer Vision and Pattern Recogni- A. Swami, “Distillation as a defense to adversarial
tion, 2020. perturbations against deep neural networks,” in Proc.
[75] P. Zhao, S. Wang, C. Gongye, Y. Wang, Y. Fei, IEEE Symposium on Security and Privacy, 2016.
and X. Lin, “Fault sneaking attack: a stealthy frame- [90] M. Cisse, P. Bojanowski, E. Grave, Y. Dauphin, and
work for misleading deep neural networks,” in Proc. N. Usunier, “Parseval networks: improving robustness
56th ACM/IEEE Design Automation Conference (DAC), to adversarial examples,” in Proc. International Con-
2019. ference on Machine Learning, 2017.
[76] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversar- [91] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and
ial machine learning at scale,” in Proc. International L. Fei-Fei, “ImageNet: a large-scale hierarchical image
Conference on Learning Representations, 2017. database,” in Proc. IEEE Conference on Computer
[77] A. Athalye and I. Sutskever, “Synthesizing robust ad- Vision and Pattern Recognition, 2009.
versarial examples,” arXiv preprint arXiv:1707.07397, [92] H. Kannan, A. Kurakin, and I. Goodfellow, “Adversarial
2017. logit pairing,” arXiv preprint arXiv:1803.06373, 2018.
[78] X. Ma, B. Li, Y. Wang, S. M. Erfani, S. Wijewickrema, [93] A. S. Ross and F. Doshi-Velez, “Improving the ad-
M. E. Houle, G. Schoenebeck, D. Song, and J. Bailey, versarial robustness and interpretability of deep neural
“Characterizing adversarial subspaces using local intrin- networks by regularizing their input gradients,” in Proc.
18 JOURNAL OF IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, VOL. 00, NO. 0, MONTH 2020

AAAI Conference on Artificial Intelligence, 2018. in Neural Information Processing Systems, 2018, pp.
[94] T. Miyato, S.-i. Maeda, M. Koyama, and S. Ishii, 8400–8409.
“Virtual adversarial training: a regularization method for [109] M. Lecuyer, V. Atlidakis, R. Geambasu, D. Hsu, and
supervised and semi-supervised learning,” IEEE Trans- S. Jana, “Certified robustness to adversarial examples
actions on Pattern Analysis and Machine Intelligence, with differential privacy,” in 2019 IEEE Symposium on
vol. 41, no. 8, pp. 1979–1993, 2018. Security and Privacy (SP). IEEE, 2019, pp. 656–672.
[95] C. Xie, Y. Wu, L. v. d. Maaten, A. L. Yuille, and [110] M. Balunovic and M. Vechev, “Adversarial
K. He, “Feature denoising for improving adversarial training and provable defenses: Bridging the
robustness,” in Proceedings of the IEEE Conference on gap,” in International Conference on Learning
Computer Vision and Pattern Recognition, 2019, pp. Representations, 2020. [Online]. Available: https:
501–509. //openreview.net/forum?id=SJxSDxrKDr
[96] C. Qin, J. Martens, S. Gowal, D. Krishnan, K. Dvi- [111] G. F. Cretu, A. Stavrou, M. E. Locasto, S. J. Stolfo,
jotham, A. Fawzi, S. De, R. Stanforth, and P. Kohli, and A. D. Keromytis, “Casting out demons: Sanitiz-
“Adversarial robustness through local linearization,” in ing training data for anomaly sensors,” in 2008 IEEE
Advances in Neural Information Processing Systems, Symposium on Security and Privacy (sp 2008). IEEE,
2019. 2008, pp. 81–95.
[97] H. Zhang and J. Wang, “Defense against adversarial [112] M. Veale, R. Binns, and L. Edwards, “Algorithms that
attacks using feature scattering-based adversarial train- remember: model inversion attacks and data protection
ing,” in Advances in Neural Information Processing law,” Philosophical Transactions of the Royal Society
Systems, 2019. A: Mathematical, Physical and Engineering Sciences,
[98] J. Zhang, X. Xu, B. Han, G. Niu, L. Cui, M. Sugiyama, vol. 376, no. 2133, p. 20180083, 2018.
and M. Kankanhalli, “Attacks which do not kill training [113] R. Shokri, M. Stronati, C. Song, and V. Shmatikov,
make adversarial learning stronger,” in Proc. Interna- “Membership inference attacks against machine learn-
tional Conference on Machine Learning, 2020. ing models,” in IEEE Symposium on Security and
[99] L. Schmidt, S. Santurkar, D. Tsipras, K. Talwar, and Privacy, 2017.
A. Madry, “Adversarially robust generalization requires [114] C. Song and V. Shmatikov, “The natural auditor: How
more data,” in Advances in Neural Information Process- to tell if someone used your words to train their model,”
ing Systems, 2018. arXiv preprint arXiv:1811.00513, 2018.
[100] Y. Carmon, A. Raghunathan, L. Schmidt, J. C. Duchi, [115] J. Hayes, L. Melis, G. Danezis, and E. De Cristofaro,
and P. S. Liang, “Unlabeled data improves adversarial “Logan: Membership inference attacks against gener-
robustness,” in Advances in Neural Information Pro- ative models,” Privacy Enhancing Technologies, vol.
cessing Systems, 2019. 2019, no. 1, pp. 133–152, 2019.
[101] S. Lee, C. Park, H. Lee, J. Yi, J. Lee, and S. Yoon, [116] C. A. C. Choo, F. Tramer, N. Carlini, and N. Paper-
“Removing undesirable feature contributions using out- not, “Label-only membership inference attacks,” arXiv
of-distribution data,” in Proc. International Conference preprint arXiv:2007.14321, 2020.
on Learning Representations, 2021. [117] M. Nasr, R. Shokri, and A. Houmansadr, “Compre-
[102] K. Grosse, P. Manoharan, N. Papernot, M. Backes, and hensive privacy analysis of deep learning: Passive and
P. McDaniel, “On the (statistical) detection of adversar- active white-box inference attacks against centralized
ial examples,” arXiv preprint arXiv:1702.06280, 2017. and federated learning,” in 2019 IEEE symposium on
[103] R. Feinman, R. R. Curtin, S. Shintre, and A. B. Gardner, security and privacy (SP). IEEE, 2019, pp. 739–753.
“Detecting adversarial samples from artifacts,” arXiv [118] B. McMahan, E. Moore, D. Ramage, S. Hampson,
preprint arXiv:1703.00410, 2017. and B. A. y Arcas, “Communication-efficient learning
[104] T. Pang, C. Du, Y. Dong, and J. Zhu, “Towards robust of deep networks from decentralized data,” in 20th
detection of adversarial examples,” in Advances in Neu- International Conference on Artificial Intelligence and
ral Information Processing Systems, 2018. Statistics, 2017.
[105] A. Sinha, H. Namkoong, R. Volpi, and J. Duchi, “Cer- [119] J. Konečnỳ, H. B. McMahan, D. Ramage, and
tifying some distributional robustness with principled P. Richtárik, “Federated optimization: Distributed ma-
adversarial training,” in Proc. International Conference chine learning for on-device intelligence,” arXiv
on Learning Representations, 2017. preprint arXiv:1610.02527, 2016.
[106] M. Hein and M. Andriushchenko, “Formal guarantees [120] K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba,
on the robustness of a classifier against adversarial A. Ingerman, V. Ivanov, C. Kiddon, J. Konecny, S. Maz-
manipulation,” in Advances in Neural Information Pro- zocchi, H. B. McMahan et al., “Towards federated
cessing Systems, 2017. learning at scale: System design,” in Conference on
[107] A. Raghunathan, J. Steinhardt, and P. Liang, “Certified Systems and Machine Learning, 2019.
defenses against adversarial examples,” in International [121] B. Hitaj, G. Ateniese, and F. Perez-Cruz, “Deep models
Conference on Learning Representations, 2018. under the GAN: Information leakage from collaborative
[108] E. Wong, F. Schmidt, J. H. Metzen, and J. Z. Kolter, deep learning,” in In ACM SIGSAC Conference on
“Scaling provable adversarial defenses,” in Advances Computer and Communications Security, 2017.
HO et al.: SECURITY AND PRIVACY ISSUES IN DEEP LEARNING 19

[122] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual Forensics and Security, vol. 13, no. 5, pp. 1333–1345,
learning for image recognition,” in IEEE Conference on 2018.
Computer Vision and Pattern Recognition, 2016. [138] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan,
[123] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der I. Mironov, K. Talwar, and L. Zhang, “Deep learning
Maaten, “Densely connected convolutional networks,” with differential privacy,” in Proceedings of the 2016
in IEEE Conference on Computer Vision and Pattern ACM SIGSAC Conference on Computer and Communi-
Recognition, 2017. cations Security, 2016.
[124] C. Analytica, “Facebook–cambridge an- [139] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang,
alytica data scandal,” 2018. [On- “Learning differentially private recurrent language mod-
line]. Available: https://en.wikipedia.org/wiki/ els,” in International Conference on Learning Represen-
FacebookCambridge\ Analytica data scandal tations, 2018.
[125] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, [140] L. Xie, K. Lin, S. Wang, F. Wang, and J. Zhou,
M. Naehrig, and J. Wernsing, “Cryptonets: Applying “Differentially private generative adversarial network,”
neural networks to encrypted data with high throughput arXiv preprint arXiv:1802.06739, 2018.
and accuracy,” in International Conference on Machine [141] G. Acs, L. Melis, C. Castelluccia, and E. De Cristo-
Learning, 2016. faro, “Differentially private mixture of generative neural
[126] E. Hesamifard, H. Takabi, and M. Ghasemi, “Cryp- networks,” IEEE Transactions on Knowledge and Data
toDL: Deep neural networks over encrypted data,” arXiv Engineering, 2018.
preprint arXiv:1711.05189, 2017. [142] L. Yu, L. Liu, C. Pu, M. E. Gursoy, and S. Truex, “Dif-
[127] H. Chabanne, A. de Wargny, J. Milgram, C. Morel, and ferentially private model publishing for deep learning,”
E. Prouff, “Privacy-preserving classification on deep in Differentially Private Model Publishing for Deep
neural network,” IACR Cryptology ePrint Archive, 2017. Learning, 2019.
[128] E. Chou, J. Beal, D. Levy, S. Yeung, A. Haque, [143] K. Chaudhuri and C. Monteleoni, “Privacy-preserving
and L. Fei-Fei, “Faster cryptonets: Leveraging spar- logistic regression,” in Advances in Neural Information
sity for real-world encrypted inference,” arXiv preprint Processing Systems, 2009.
arXiv:1811.09953, 2018. [144] N. Phan, Y. Wang, X. Wu, and D. Dou, “Differential
[129] A. Brutzkus, O. Elisha, and R. Gilad-Bachrach, “Low privacy preservation for deep auto-encoders: an applica-
latency privacy preserving inference,” arXiv preprint tion of human behavior prediction.” in AAAI Conference
arXiv:1812.10659, 2018. on Artificial Intelligence, 2016.
[130] A. Sanyal, M. J. Kusner, A. Gascón, and V. Kanade, [145] N. Phan, X. Wu, and D. Dou, “Preserving differential
“TAPAS: Tricks to accelerate (encrypted) prediction privacy in convolutional deep belief networks,” Machine
as a service,” in International Conference in Machine Learning, vol. 106, no. 9-10, pp. 1681–1704, 2017.
Learning, 2018. [146] N. Papernot, M. Abadi, Úlfar Erlingsson, I. Goodfellow,
[131] F. Bourse, M. Minelli, M. Minihold, and P. Paillier, and K. Talwar, “Semi-supervised knowledge transfer for
“Fast homomorphic evaluation of deep discretized neu- deep learning from private training data,” in Interna-
ral networks,” in Annual International Cryptology Con- tional Conference on Learning Representations, 2017.
ference, 2018. [147] N. Papernot, S. Song, I. Mironov, A. Raghunathan,
[132] P. Mohassel and Y. Zhang, “SecureML: A system for K. Talwar, and Ú. Erlingsson, “Scalable private learning
scalable privacy-preserving machine learning,” in IEEE with PATE,” in 6th International Conference on Learn-
Symposium on Security and Privacy, 2017. ing Representations, 2018.
[133] J. Liu, M. Juuti, Y. Lu, and N. Asokan, “Oblivious [148] J. Jordon, J. Yoon, and M. Van Der Schaar, “Pate-
neural network predictions via MiniONN transforma- gan: Generating synthetic data with differential privacy
tions,” in ACM SIGSAC Conference on Computer and guarantees,” in International Conference on Learning
Communications Security, 2017. Representations, 2018.
[134] B. D. Rouhani, M. S. Riazi, and F. Koushanfar, [149] J. W. Bos, K. Lauter, J. Loftus, and M. Naehrig,
“Deepsecure: Scalable provably-secure deep learning,” “Improved security for a ring-based fully homomorphic
in 5th ACM/ESDA/IEEE Design Automation Confer- encryption scheme,” in IMA International Conference
ence, 2018. on Cryptography and Coding, 2013.
[135] C. Juvekar, V. Vaikuntanathan, and A. Chandrakasan, [150] Y. LeCun, C. Cortes, and C. Burges, “Mnist handwritten
“Gazelle: A low latency framework for secure neural digit database,” 2010. [Online]. Available: http://yann.
network inference,” in 27th USENIX Security Sympo- lecun.com/exdb/mnist
sium, 2018. [151] I. Chillotti, N. Gama, M. Georgieva, and M. Izabachene,
[136] R. Shokri and V. Shmatikov, “Privacy-preserving deep “Faster fully homomorphic encryption: Bootstrapping in
learning,” in 22nd ACM SIGSAC Conference on Com- less than 0.1 seconds,” in International Conference on
puter and Communications Security, 2015. the Theory and Application of Cryptology and Informa-
[137] Y. Aono, T. Hayashi, L. Wang, S. Moriai et al., tion Security, 2016.
“Privacy-preserving deep learning via additively homo- [152] A. C.-C. Yao, “How to generate and exchange secrets,”
morphic encryption,” IEEE Transactions on Information in 27th Annual Symposium on Foundations of Computer
20 JOURNAL OF IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, VOL. 00, NO. 0, MONTH 2020

Science, 1986. liver and tumor segmentation from ct volumes,” IEEE

[153] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Transactions on Medical Imaging, vol. 37, no. 12, pp.
M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le et al., 2663–2674, 2018.
“Large scale distributed deep networks,” in Advances in [169] Y. Jo, H. Cho, S. Y. Lee, G. Choi, G. Kim, H.-s. Min,
Neural Information Processing Systems, 2012. and Y. Park, “Quantitative phase imaging and artificial
[154] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, intelligence: A review,” IEEE Journal of Selected Topics
J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard in Quantum Electronics, vol. 25, no. 1, pp. 1–14, 2018.
et al., “Tensorflow: a system for large-scale machine [170] A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran,
learning,” in 12th USENIX Symposium on Operating and A. Madry, “Adversarial examples are not bugs,
Systems Design and Implementation, 2016. they are features,” in Advances in Neural Information
[155] S. Lee, H. Kim, J. Park, J. Jang, C.-S. Jeong, and Processing Systems, 2019.
S. Yoon, “TensorLightning: A traffic-efficient dis- [171] K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep
tributed deep learning on commodity spark clusters,” inside convolutional networks: Visualising image clas-
IEEE Access, 2018. sification models and saliency maps,” arXiv preprint
[156] M. Li, D. G. Andersen, J. W. Park, A. J. Smola, arXiv:1312.6034, 2013.
A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and [172] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R.
B.-Y. Su, “Scaling distributed machine learning with the Müller, and W. Samek, “On pixel-wise explanations for
parameter server.” in Symposium on Operating Systems non-linear classifier decisions by layer-wise relevance
Design and Implementation, 2014. propagation,” PLoS One, vol. 10, no. 7, p. e0130140,
[157] T. Ryffel, A. Trask, M. Dahl, B. Wagner, J. Mancuso, 2015.
D. Rueckert, and J. Passerat-Palmbach, “A generic [173] A. Shrikumar, P. Greenside, and A. Kundaje, “Learning
framework for privacy preserving deep learning,” arXiv important features through propagating activation dif-
preprint arXiv:1811.04017, 2018. ferences,” in 34th International Conference on Machine
[158] C. Dwork, “Differential privacy: A survey of results,” Learning, 2017.
in International Conference on Theory and Applications [174] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and
of Models of Computation, 2008. A. Madry, “Robustness may be at odds with accuracy,”
[159] S. Hochreiter and J. Schmidhuber, “Long short-term in International Conference on Learning Representa-
memory,” Neural computation, vol. 9, no. 8, pp. 1735– tions, no. 2019, 2019.
1780, 1997. [175] Y. Tian, K. Pei, S. Jana, and B. Ray, “Deeptest:
[160] C. Dwork, A. Roth et al., “The algorithmic foundations Automated testing of deep-neural-network-driven au-
of differential privacy,” Foundations and Trends® in tonomous cars,” in 40th International Conference on
Theoretical Computer Science, vol. 9, no. 3–4, pp. 211– Software Engineering, 2018.
407, 2014. [176] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana,
[161] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” “Efficient formal safety analysis of neural networks,”
Nature, vol. 521, no. 7553, p. 436, 2015. in Advances in Neural Information Processing Systems,
[162] D. P. Kingma and M. Welling, “Auto-encoding varia- 2018.
tional Bayes,” arXiv preprint arXiv:1312.6114, 2013. [177] G. Singh, T. Gehr, M. Püschel, and M. Vechev, “An
[163] M. Bun and T. Steinke, “Concentrated differential pri- abstract domain for certifying neural networks,” Pro-
vacy: Simplifications, extensions, and lower bounds,” in ceedings of the ACM on Programming Languages,
Theory of Cryptography Conference, 2016. vol. 3, no. POPL, p. 41, 2019.
[164] Y. Bengio et al., “Learning deep architectures for AI,” [178] C. Gentry, “Fully homomorphic encryption using ideal
Foundations and Trends in Machine Learning, vol. 2, lattices,” in Proceedings of the forty-first annual ACM
no. 1, pp. 1–127, 2009. symposium on Theory of computing, 2009, pp. 169–178.
[165] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, “Convo-
lutional deep belief networks for scalable unsupervised
learning of hierarchical representations,” in 26th Annual
International Conference on Machine Learning, 2009.
[166] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convo-
lutional networks for biomedical image segmentation,”
in International Conference on Medical Image Comput-
ing and Computer-Assisted Intervention, 2015.
[167] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and
O. Ronneberger, “3D U-Net: Learning dense volumet-
ric segmentation from sparse annotation,” in Interna-
tional Conference on Medical Image Computing and
Computer-Assisted Intervention, 2016.
[168] X. Li, H. Chen, X. Qi, Q. Dou, C.-W. Fu, and P.-A.
Heng, “H-denseunet: Hybrid densely connected unet for

ADIT TP 2023-06 Questions
100% (1)
ADIT TP 2023-06 Questions
6 pages
Security and Privacy Challenges in Deep Learning
No ratings yet
Security and Privacy Challenges in Deep Learning
4 pages
Privacy and Security Issues in Deep Learning A Survey
No ratings yet
Privacy and Security Issues in Deep Learning A Survey
28 pages
Adversarial ML Survey Paper
No ratings yet
Adversarial ML Survey Paper
23 pages
Machine Learning Security and Privacy A Review of
No ratings yet
Machine Learning Security and Privacy A Review of
24 pages
Backdoor Attacks To Deep Learning Models and Countermeasures A Survey. 2024.13s
No ratings yet
Backdoor Attacks To Deep Learning Models and Countermeasures A Survey. 2024.13s
13 pages
Machine Learning Security and Privacy
No ratings yet
Machine Learning Security and Privacy
3 pages
Adversarial Learning Targeting Deep Neural Network Classification A Comprehensive
No ratings yet
Adversarial Learning Targeting Deep Neural Network Classification A Comprehensive
32 pages
Machine Learning Security Threats
No ratings yet
Machine Learning Security Threats
39 pages
Security-of-AI-systems Fundamentals Adversarial Deep Learning
No ratings yet
Security-of-AI-systems Fundamentals Adversarial Deep Learning
288 pages
Adversarial Attacks and Defenses in Deep Learning
No ratings yet
Adversarial Attacks and Defenses in Deep Learning
15 pages
Adversarial Attacks and Defenses in Deep Learning
No ratings yet
Adversarial Attacks and Defenses in Deep Learning
39 pages
1 s2.0 S209580991930503X Main
No ratings yet
1 s2.0 S209580991930503X Main
15 pages
CAAI Trans On Intel Tech - 2021 - Chakraborty - A Survey On Adversarial Attacks and Defences
No ratings yet
CAAI Trans On Intel Tech - 2021 - Chakraborty - A Survey On Adversarial Attacks and Defences
21 pages
Deepsecure: Scalable Provably-Secure Deep Learning
No ratings yet
Deepsecure: Scalable Provably-Secure Deep Learning
6 pages
CSR 2023 Defense
No ratings yet
CSR 2023 Defense
24 pages
Adversarial Attacks On Deep Learning Models
No ratings yet
Adversarial Attacks On Deep Learning Models
15 pages
Machine Learning Security and Privacy A Review of Threats and Countermeasures
No ratings yet
Machine Learning Security and Privacy A Review of Threats and Countermeasures
23 pages
Sok: Security and Privacy in Machine Learning
No ratings yet
Sok: Security and Privacy in Machine Learning
16 pages
Machine Learning Security Threats Countermeasures and Evaluations
No ratings yet
Machine Learning Security Threats Countermeasures and Evaluations
23 pages
Detecting - Conventional - and - Adversarial - Attacks - Using - Deep - Learning - Techniques - A - Systematic - Review
No ratings yet
Detecting - Conventional - and - Adversarial - Attacks - Using - Deep - Learning - Techniques - A - Systematic - Review
7 pages
An Overview of Backdoor Attacks Against Deep Neural Networks and Possible Defences
No ratings yet
An Overview of Backdoor Attacks Against Deep Neural Networks and Possible Defences
27 pages
Attacks Against Machine Learning - Evasion
No ratings yet
Attacks Against Machine Learning - Evasion
45 pages
Survey of Privacy Attack in ML
No ratings yet
Survey of Privacy Attack in ML
34 pages
A Robust Privacy-Preserving Federated Learning Model Against Model Poisoning Attacks
No ratings yet
A Robust Privacy-Preserving Federated Learning Model Against Model Poisoning Attacks
16 pages
Paper 1
No ratings yet
Paper 1
16 pages
Data Security Tutorial 12 - Solutions
No ratings yet
Data Security Tutorial 12 - Solutions
4 pages
Towards Trustworthy LLMs - Understanding The Security and Privacy
No ratings yet
Towards Trustworthy LLMs - Understanding The Security and Privacy
82 pages
Machine Learning
No ratings yet
Machine Learning
16 pages
Trustworthy Machine Learning in The Context of Security and Privacy
No ratings yet
Trustworthy Machine Learning in The Context of Security and Privacy
28 pages
w11 ML Security
No ratings yet
w11 ML Security
35 pages
Poisoning Attacks Against Machine Learning Can Machine Learning Be Trustworthy
No ratings yet
Poisoning Attacks Against Machine Learning Can Machine Learning Be Trustworthy
6 pages
Adversarial Machine Learning Attack Surfaces, Defence Mechanisms
No ratings yet
Adversarial Machine Learning Attack Surfaces, Defence Mechanisms
314 pages
10.1515 - Jisys 2024 0277
No ratings yet
10.1515 - Jisys 2024 0277
52 pages
IET Image Processing - 2021 - Chen - Boundary Augment A Data Augment Method To Defend Poison Attack
No ratings yet
IET Image Processing - 2021 - Chen - Boundary Augment A Data Augment Method To Defend Poison Attack
12 pages
Privacy-Preserving Machine Learning On Encrypted Data Using Homomorphic Encryption
No ratings yet
Privacy-Preserving Machine Learning On Encrypted Data Using Homomorphic Encryption
6 pages
Book - A State of The Art Review On Adversarial Machine Learning
No ratings yet
Book - A State of The Art Review On Adversarial Machine Learning
66 pages
Robustness Evaluation of Deep Unsupervised Learning Algorithms For Intrusion Detection Systems
No ratings yet
Robustness Evaluation of Deep Unsupervised Learning Algorithms For Intrusion Detection Systems
9 pages
Reflection Backdoor: A Natural Backdoor Attack On Deep Neural Networks
No ratings yet
Reflection Backdoor: A Natural Backdoor Attack On Deep Neural Networks
23 pages
Name: Hồ Viết Vĩnh Class: 2C22CACN Word count (without bibliography or subheadings) : 666 Literature Review Security Challenges in Artificial Intelligence (AI)
No ratings yet
Name: Hồ Viết Vĩnh Class: 2C22CACN Word count (without bibliography or subheadings) : 666 Literature Review Security Challenges in Artificial Intelligence (AI)
3 pages
Adversarial Attacks and Defenses in Machine Learning-Powered Networks: A Contemporary Survey
No ratings yet
Adversarial Attacks and Defenses in Machine Learning-Powered Networks: A Contemporary Survey
46 pages
A Critical Overview of Privacy in Machine Learning
No ratings yet
A Critical Overview of Privacy in Machine Learning
9 pages
My Project
No ratings yet
My Project
30 pages
Defense Against Adversarial Attacks Using Convolutional Auto-Encoders
No ratings yet
Defense Against Adversarial Attacks Using Convolutional Auto-Encoders
9 pages
10 1109@MC 2019 2909955
No ratings yet
10 1109@MC 2019 2909955
4 pages
DP-GSGLD - A Bayesian Optimizer Inspired by Differential Privacy Defending Against Privacy Leakage in Federated Learning
No ratings yet
DP-GSGLD - A Bayesian Optimizer Inspired by Differential Privacy Defending Against Privacy Leakage in Federated Learning
15 pages
Differential Privacy in Deep Learning: An Overview
No ratings yet
Differential Privacy in Deep Learning: An Overview
6 pages
JJ
No ratings yet
JJ
5 pages
Security Engineering For Machine Learning
No ratings yet
Security Engineering For Machine Learning
4 pages
Practical Black-Box Attacks Against Machine Learning: Nicolas Papernot Patrick Mcdaniel Ian Goodfellow
No ratings yet
Practical Black-Box Attacks Against Machine Learning: Nicolas Papernot Patrick Mcdaniel Ian Goodfellow
14 pages
Review-1 PPT (1) .PPTX (Autosaved) - 1
No ratings yet
Review-1 PPT (1) .PPTX (Autosaved) - 1
12 pages
Applsci 09 00909
No ratings yet
Applsci 09 00909
29 pages
Differential Privacy Preservation in Deep Learning Challenges Opportunities and Solutions
No ratings yet
Differential Privacy Preservation in Deep Learning Challenges Opportunities and Solutions
11 pages
AI, Machine Learning and Deep Learning A Security Perspective CRC
No ratings yet
AI, Machine Learning and Deep Learning A Security Perspective CRC
540 pages
Towards Building Safe & Trustworthy AI Agents and A Path For Science-And Evidence-Based AI Policy
No ratings yet
Towards Building Safe & Trustworthy AI Agents and A Path For Science-And Evidence-Based AI Policy
133 pages
Threat of Adversarial Attacks On Deep Learning A Survey
No ratings yet
Threat of Adversarial Attacks On Deep Learning A Survey
21 pages
Safety of Data Security2
No ratings yet
Safety of Data Security2
6 pages
Review Article: A Review of Deep Learning Security and Privacy Defensive Techniques
No ratings yet
Review Article: A Review of Deep Learning Security and Privacy Defensive Techniques
18 pages
Untargeted, Targeted and Universal Adversarial Attacks and Defenses On Time Series
No ratings yet
Untargeted, Targeted and Universal Adversarial Attacks and Defenses On Time Series
8 pages
Chapter - 6 Issue and Redemption of Debentures
No ratings yet
Chapter - 6 Issue and Redemption of Debentures
8 pages
Numerical Similarity Measures Versus Jaccard For Collaborative Filtering
No ratings yet
Numerical Similarity Measures Versus Jaccard For Collaborative Filtering
14 pages
Temporary Structures: Wall Form Design
No ratings yet
Temporary Structures: Wall Form Design
12 pages
Basic Urban Design Principle
No ratings yet
Basic Urban Design Principle
6 pages
Biomedical and Instrumentation Lab File
No ratings yet
Biomedical and Instrumentation Lab File
37 pages
Bowser Document
No ratings yet
Bowser Document
2 pages
Aesthetics of Industrial Architecture in
No ratings yet
Aesthetics of Industrial Architecture in
10 pages
Allegro PCB Si Sigxplorer L Series Tutorial: Product Version 15.7 July 2006
No ratings yet
Allegro PCB Si Sigxplorer L Series Tutorial: Product Version 15.7 July 2006
48 pages
What Is Prompt Engineering v2
No ratings yet
What Is Prompt Engineering v2
6 pages
A Study On Job Satisfaction of Employees at
No ratings yet
A Study On Job Satisfaction of Employees at
6 pages
Course Purpose-MFS - Major III
No ratings yet
Course Purpose-MFS - Major III
2 pages
Earthing Specification
No ratings yet
Earthing Specification
7 pages
Management Reporting System and Its Evaluation
75% (4)
Management Reporting System and Its Evaluation
6 pages
Human Resource Management: Decenzo and Robbins
No ratings yet
Human Resource Management: Decenzo and Robbins
17 pages
A2002D10328392 Benson Gilbert Odo Week 7 - Assessment Point 2
100% (1)
A2002D10328392 Benson Gilbert Odo Week 7 - Assessment Point 2
17 pages
Cooperatives Vs Corporations
No ratings yet
Cooperatives Vs Corporations
2 pages
Case 2 and 3 For Practice of Profession
No ratings yet
Case 2 and 3 For Practice of Profession
3 pages
R-C101C Manual EU Verision
No ratings yet
R-C101C Manual EU Verision
44 pages
Legal Assistant or Human Resources or Director of Human Resource
No ratings yet
Legal Assistant or Human Resources or Director of Human Resource
2 pages
BIOS Manual For System Boards With Intel® 7 Series / C216 Chipset
No ratings yet
BIOS Manual For System Boards With Intel® 7 Series / C216 Chipset
70 pages
Lecture 4.2 Supervised Learning Classification
No ratings yet
Lecture 4.2 Supervised Learning Classification
25 pages
Tonoyan Et Al-2010-Entrepreneurship Theory and Practice
No ratings yet
Tonoyan Et Al-2010-Entrepreneurship Theory and Practice
40 pages
Day 5 IELTS Academic Reading Questions by KenyanNurse-1
No ratings yet
Day 5 IELTS Academic Reading Questions by KenyanNurse-1
12 pages
IBM Power E1050 Level 2 Quiz
No ratings yet
IBM Power E1050 Level 2 Quiz
17 pages
Copy Assessment - WNG
No ratings yet
Copy Assessment - WNG
3 pages
Cost of Rooftop Solar System (On-Grid, Off-Grid, Hybrid) in India - AlienSolar
No ratings yet
Cost of Rooftop Solar System (On-Grid, Off-Grid, Hybrid) in India - AlienSolar
2 pages
01 23 ADCB Fire Pipes Egy Gulf Rev.01
No ratings yet
01 23 ADCB Fire Pipes Egy Gulf Rev.01
3 pages
Introduction For Term Paper Sample
100% (1)
Introduction For Term Paper Sample
4 pages
The Cultural Revolution Extra Reading
No ratings yet
The Cultural Revolution Extra Reading
2 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Security and Privacy Issues in Deep Learning

Uploaded by

Security and Privacy Issues in Deep Learning

Uploaded by

JOURNAL OF IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, VOL. 00, NO.

Security and Privacy Issues in Deep Learning

1) Attacks on DL models : The two major types of attacks

providers, information silos and users.

II. S ECURE AI labels or limited authority), the attacks would be difficult to

White-box Section II-A1a

White-box Black-box Training phase Inference phase

!"#$%&'()*$)$+( 8$(-0/&8)/1 <=1/,1","./)/,#

Input 𝑥 and Final layer 𝑧ෞ𝑘 and Convex outer bound

Fig. 6. An adversarial polytope and its outer convex bound [19].

ing statistical metrics to the output of a classifier. They also

(Figure 8) Homomorphic Secure Multi-party Differential

Label-level where M : D → R is a randomized mechanism that adds the

IV. C ONCLUSION C. Designing privacy-preserving DNNs using HE

Science, 1986. liver and tumor segmentation from ct volumes,” IEEE

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.