w11 ML Security
Lecture 11
Security and Privacy in ML Systems
Elias Athanasopoulos
athanasopoulos.elias@ucy.ac.cy
Outline
• Overview of Machine Learning (ML)
• Security definitions and goals
– Derive a threat model
• Training-time attacks
• Test-time (inference) attacks
• Defences
2
What is Machine Learning?
• We use computers to run algorithms for solving
common problems
– Sorting, searching, graph traversal, etc.
• Some problems cannot be solved by an algorithm
– These problems do have solutions
– They are not necessarily hard problems for which no fast algorithm is known (e.g., factoring a large integer)
– E.g., classifying a face photo as showing long or short hair
• Such problems can be effectively solved by a model
– If we learn the behaviour of a large amount of data, then we can solve new instances of the same problem
3
ML problems
• Not all problems are suitable to be solved by ML
• There is a vast number of problems with efficient ML solutions
– Pattern matching (OCR), image classification, anomaly
detection, language translation
• Candidate problems are those where past data can predict the behaviour of new data
– Essentially, the model learns from past data in order to decide about new data
– If you give the model 1 million photos of long hair and 1 million photos of short hair, it can learn to distinguish long from short hair in new photos
4
ML types
• Supervised learning
– Labeled inputs with corresponding outputs
– Map new (unseen) inputs to known outputs
– Classification (of input to categories) or regression (of a value to
a certain range)
– Object recognition in images, spam filtering, etc.
• Unsupervised learning
– Inputs are unlabeled
– Clustering of inputs according to common properties
• Reinforcement learning
– Data is sequences of actions, observations, and rewards
– The goal is to produce a policy for acting in the environment
– Winning a video game
5
ML training
• A model is a function that takes an input and some parameters and outputs a prediction for some property of interest
– The input is usually a vector of values (features)
– There are many candidate functions; learning means finding the parameters that define the one the model will use (see the sketch below)
• Once the function is learned, we can test it on new inputs to validate the performance of the model
– For supervised learning, we can hold out some samples from the training set and use them as a testing set
6
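As a concrete illustration of "finding the parameters" (a sketch that is not part of the original slides), the code below fits a tiny logistic-regression model with gradient descent. The data, learning rate, and iteration count are made up, and NumPy is assumed to be available.

```python
import numpy as np

# Toy training set: 2-D feature vectors with binary labels (made-up data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(+1.0, 1.0, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The "function" is a logistic regression: f(x) = sigmoid(w . x + b).
# Learning = searching for the parameter values (w, b) that minimize the loss.
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)            # current predictions
    grad_w = X.T @ (p - y) / len(y)   # gradient of the cross-entropy loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

# Validate the learned function (for brevity, on the training data itself).
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print("parameters:", w, b, "accuracy:", accuracy)
```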
ML inference
• The model is deployed to infer predictions on
inputs unseen during training
– The values of parameters are fixed, and the model
can compute outputs based on new inputs
• The model prediction may take different forms
– The most common for classification is a vector assigning a probability to each class of the problem, which characterizes how likely the input is to belong to that class (see the sketch below)
7
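A minimal sketch of the probability-vector output described above, assuming a linear three-class model whose (made-up) parameters are already fixed; only the form of the output is the point here.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)          # for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Fixed, illustrative parameters of a 3-class linear model over 4 features.
W = np.array([[ 0.2, -0.1,  0.5, 0.0],
              [ 0.1,  0.4, -0.3, 0.2],
              [-0.3,  0.2,  0.1, 0.1]])
b = np.array([0.0, 0.1, -0.1])

x = np.array([1.0, 0.5, -0.2, 0.3])   # new, unseen input
probs = softmax(W @ x + b)            # one probability per class
print("class probabilities:", probs, "prediction:", int(np.argmax(probs)))
```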
Threat model
• When we discuss security in a specific context, we define an attacker with certain capabilities and goals
• The threat model also includes an attack surface
– What attacks can be mounted at each stage of the ML pipeline?
8
Attack surface
9
Trust model
• Data owners are the owners or trustees of the data/environment within which the system is deployed
– e.g., an IT organisation deploying a face recognition authentication service
• System providers, who construct the system and algorithms
– e.g., the authentication service software vendors
• Consumers of the service the system provides
– e.g., the enterprise user
• Outsiders, who may have explicit or incidental access to the systems, or may simply be able to influence the system inputs
– e.g., other users or adversaries within the enterprise
10
Adversarial goals
CIA + Privacy
• Confidentiality
– An adversary can extract information about the model
– We assume that the model is confidential or represents
intellectual property
• Privacy
– Models are trained on data that may contain sensitive
information
• Integrity
– Attacks that modify the output of the model
– E.g., cause a false positive in a face recognition system
• Availability
– Make the model inconsistent or unusable
– E.g., make an autonomous vehicle non-operational
11
Adversarial capabilities
12
Adversarial capabilities
Training
• Attempt to learn, influence, or corrupt the
model itself
– Usually facilitated by simply accessing a
summary, a portion, or all of the training data
– Can be done through explicit data breach or by
using collected data to train another model
• Alter the training data
– Insert adversarial inputs (injection), or alter the
training data directly (modification)
• Tamper with the learning algorithm
13
Training in adversarial setting
• During learning, the attacker can pollute the data by inserting, editing, or removing points
– The intent is to modify the decision boundaries of the trained model
– This is commonly called a poisoning attack
– Such attacks very frequently target classification tasks
• In such a scenario the model remains functional, but its predictions favour the attacker's goals
14
Targeting integrity
Label manipulation
• The adversary modifies part of the training sample
– It has been shown that modifying less than 10% can be enough to reduce the model's accuracy to about 90% (see the sketch below)
• Label manipulation
– A limited attack; it has been shown to work against binary classifiers (i.e., by swapping labels)
– It is hard to quantify how many label modifications are needed to reduce the classifier's accuracy, especially for multi-class classifiers
15
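The sketch below illustrates label manipulation on a synthetic binary task: a fraction of the training labels is flipped and the test accuracy of the resulting model is measured. scikit-learn is assumed to be available; the dataset, model, and flip fractions are arbitrary illustrative choices, not the experiments behind the figures on this slide.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification task.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def accuracy_with_flipped_labels(flip_fraction):
    rng = np.random.default_rng(0)
    y_poisoned = y_tr.copy()
    n_flip = int(flip_fraction * len(y_poisoned))
    idx = rng.choice(len(y_poisoned), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]          # swap the binary labels
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_poisoned)
    return model.score(X_te, y_te)

for frac in (0.0, 0.05, 0.10, 0.25):
    acc = accuracy_with_flipped_labels(frac)
    print(f"flipped {frac:.0%} of labels -> test accuracy {acc:.3f}")
```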
Targeting integrity
Input manipulation
• Direct poisoning of the learning inputs
– When training inputs are used with simple metrics (e.g., a centroid model using Euclidean distances), the poisoned data can be derived by solving a linear programming problem (a simplified sketch follows this slide)
– The attacker's main goal is to find inputs that dramatically reduce the accuracy of the classifier
• Indirect poisoning of the learning inputs
– Adversaries have no access to the pre-processed data
– They instead create "data" that misleads the classifier, e.g., a noisy polymorphic worm
16
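To make direct poisoning concrete, here is a simplified sketch (the intuition only, not the linear-programming formulation): a centroid-based anomaly detector is gradually dragged towards an attacker-chosen point by injecting poison points that the detector still accepts. All numbers are made up for illustration.

```python
import numpy as np

# Simplified centroid detector: flag x as anomalous if it lies further than
# a fixed radius from the mean of the training data.
rng = np.random.default_rng(0)
benign = rng.normal(0.0, 1.0, (200, 2))       # legitimate training points
radius = 3.0
attack_point = np.array([5.0, 0.0])           # what the attacker wants accepted

def is_anomalous(x, data):
    return np.linalg.norm(x - data.mean(axis=0)) > radius

data = benign.copy()
print("before poisoning, attack point flagged:", is_anomalous(attack_point, data))

# Poisoning: repeatedly inject points that are just inside the current radius
# but shifted towards the attack point, dragging the centroid along with them.
for _ in range(300):
    centroid = data.mean(axis=0)
    direction = attack_point - centroid
    direction /= np.linalg.norm(direction)
    poison = centroid + 0.95 * radius * direction   # accepted by the detector
    data = np.vstack([data, poison])

print("after poisoning, attack point flagged:", is_anomalous(attack_point, data))
```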
Targeting privacy and
confidentiality
• During training, the confidentiality and privacy of the data and model are impacted not by the fact that ML is used, but by the extent of the adversary's access to the system hosting them
• This is a traditional access control problem,
which falls outside the scope of our discussion
17
Adversarial capabilities
Inference
• Attacks at inference time do not tamper with the
targeted model
– Drive the model to produce adversary-selected outputs (integrity)
– Collect evidence about the model characteristics
(confidentiality and privacy)
• White box attacks
– Adversary knows the model architecture, the model
parameters, training data, or a combination of these
• Black box attacks
– No knowledge about the model
– The adversary can only submit inputs and observe outputs
18
Inferring in adversarial setting
• Adversaries may also attack a deployed ML model at inference time
– An attacker may target an intrusion detection system whose rules were learned and then fixed
– The attacker is interested in evading detection at runtime
• White-box attackers have access to the model internals
– Architecture, parameters
• Black-box adversaries are limited to interacting with
the model as an oracle
– Submitting inputs and observing the model’s predictions
19
Inferring in adversarial setting
White-box adversaries
• Varying degrees of access to the model
– How? An ML model trained in a data centre can
be bundled in a mobile app
• Integrity
– The training process can no longer be modified, so the attacker can only perturb new inputs
20
Direct manipulation of model
inputs
• Create adversarial examples by modifying inputs so that the model misclassifies them
– The challenging part is computing the adversarial examples (see the sketch below)
• Why does it work?
– Models often extrapolate linearly from the limited subspace covered by the training data
– Algorithms can exploit this regularity to direct the search toward prospective adversarial regions
21
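A minimal white-box sketch of computing an adversarial example, using a fast-gradient-sign-style step against a logistic regression whose weights are made up for illustration. This is one standard crafting technique, shown here only to make "perturbing the input along the loss gradient" concrete.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# White-box setting: the attacker knows the (made-up) model parameters.
w = np.array([1.5, -2.0, 0.5, 1.0])
b = 0.1

x = np.array([0.5, -0.3, 0.2, 0.4])   # legitimate input, true class 1
y = 1

# Gradient of the cross-entropy loss with respect to the *input*.
p = sigmoid(w @ x + b)
grad_x = (p - y) * w

# Fast-gradient-sign-style step: a small perturbation that increases the loss.
epsilon = 0.5
x_adv = x + epsilon * np.sign(grad_x)

print("clean prediction      :", sigmoid(w @ x + b))       # ~0.88 -> class 1
print("adversarial prediction:", sigmoid(w @ x_adv + b))   # ~0.37 -> class 0
```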
Indirect manipulation of model
inputs
• So far we have assumed that the attacker can create an input and feed it directly to the model
– E.g., create a malware sample or a spam e-mail that causes the model to misclassify it
• Some ML systems operate in the physical world
– Robots navigate through obstacles
– Can we perturb such inputs (i.e., "physical objects")?
– It has been shown that adversarially perturbed images, printed and then photographed with a smartphone, are still misclassified when fed to an ML model
22
Example
23
Beyond classification
• Although research has focused on
classification, adversarial example algorithms
extend to other settings
– E.g., reinforcement learning: the adversary perturbs a frame in a video game to force the agent to take a wrong action
24
Privacy and confidentiality
• Confidentiality attacks in the white-box threat
model are trivial
– The adversary already has access to the model
parameters
• Targeting the privacy of data used in an ML system usually amounts to recovering information about the training data
– The simplest attack against data is a membership test, i.e., determining whether a particular input was used in the training dataset of a model (see the sketch below)
25
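The sketch below shows a very crude membership test based on prediction confidence: an overfitted model tends to be more confident on its training points than on unseen ones. scikit-learn is assumed; the model, threshold, and data are arbitrary, and published membership-inference attacks are considerably more sophisticated.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Target model that overfits its training set.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5, random_state=0)
target = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_in, y_in)

def membership_guess(x, threshold=0.9):
    # Guess "member" when the model is very confident about its prediction.
    confidence = target.predict_proba(x.reshape(1, -1)).max()
    return confidence >= threshold

in_rate = np.mean([membership_guess(x) for x in X_in])
out_rate = np.mean([membership_guess(x) for x in X_out])
print(f"guessed as members: training points {in_rate:.2f} vs unseen points {out_rate:.2f}")
```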
Inferring in adversarial setting
Black-box adversaries
• Computing adversarial examples is feasible when the internals of the model are known
• What if the attacker can only interact with the model by sending queries and observing the results?
• In this setup, the ML model acts as an oracle
26
Black-box setting
Integrity
• We define a cost function for estimating the number of queries needed to make the ML model misclassify an input
• The cost is associated with the modifications needed to turn a legitimate input x into a malicious input x*
• The goal is to find the least amount of modification, i.e., to minimize the cost (see the sketch below)
27
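A minimal black-box sketch of reaching a misclassified input x* that stays close to x using only oracle queries: take any input the oracle labels differently and binary-search along the segment between the two, reporting the distance moved as the cost. scikit-learn is assumed, and the oracle, data, and search strategy are illustrative choices rather than a specific published attack.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# The "oracle": a trained model the attacker can only query for labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
oracle = LogisticRegression(max_iter=1000).fit(X, y)

def query(v):
    return int(oracle.predict(v.reshape(1, -1))[0])

x = X[0]                    # legitimate input the attacker starts from
label = query(x)

# Find any input the oracle labels differently, then binary-search along the
# segment between the two inputs; each iteration costs one oracle query.
x_other = next(v for v in X if query(v) != label)
lo, hi = 0.0, 1.0
for _ in range(30):
    mid = (lo + hi) / 2
    if query(x + mid * (x_other - x)) == label:
        lo = mid
    else:
        hi = mid

x_star = x + hi * (x_other - x)      # just on the misclassified side
print("oracle label flipped:", query(x_star) != label)
print("cost (L2 distance from x to x*):", np.linalg.norm(x_star - x))
```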
Direct manipulation of model
inputs
• The modifications needed to transform a legitimate input into an adversarial one can be computed with several approaches
– Genetic algorithms, training another ML model, etc.
• Sometimes the ML model that acts as an oracle can give more information per answered query
– E.g., class probabilities instead of just the class label
• Adversarial example transferability
– Adversarial examples that are misclassified by one model are likely to be misclassified by another
– We can therefore train our own substitute ML model and craft adversarial inputs against it (see the sketch below)
28
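A sketch of the transferability idea: label some data by querying the black-box oracle, train a local substitute model on those labels, craft an adversarial example against the substitute (where gradients and weights are available), and check whether it also fools the oracle. scikit-learn is assumed; the models, data, and the deliberately large step size are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Black-box target ("oracle") that the attacker can only query.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
oracle = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                       random_state=0).fit(X, y)

# 1. Label the attacker's own data by querying the oracle, then train a
#    local substitute model on those labels.
X_sub = X[:500] + np.random.default_rng(0).normal(0.0, 0.1, X[:500].shape)
y_sub = oracle.predict(X_sub)
substitute = LogisticRegression(max_iter=1000).fit(X_sub, y_sub)

# 2. Craft an adversarial example against the *substitute*, whose weights we
#    know, with a fast-gradient-sign-style step (large step for illustration).
x = X[0]
w = substitute.coef_[0]
step = 1.5 * np.sign(w)
x_adv = x - step if substitute.predict([x])[0] == 1 else x + step

# 3. Check whether the adversarial example transfers to the black-box oracle.
print("oracle on clean input      :", oracle.predict([x])[0])
print("oracle on adversarial input:", oracle.predict([x_adv])[0])
```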
Privacy and confidentiality
• Membership attacks
– Testing whether a specific point was part of the training dataset
• Training data extraction
– Model inversion enables adversaries to extract training data from model predictions
– E.g., for a medicine dosage prediction task, access to the model plus information about a patient's stable medicine dosage can help recover the patient's genomic information
• Model extraction
– Extract the parameters of a model from observations of its predictions (see the sketch below)
– Some ML models are confidential (proprietary)
29
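A simplified model-extraction sketch for the special case where the confidential model is a logistic regression that returns class probabilities: each query yields one linear equation in the unknown parameters, so d+1 well-chosen queries recover the weights and bias. scikit-learn is assumed; this equation-solving style of extraction is only an illustration, not a general-purpose method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Confidential target model; the attacker only observes predicted probabilities.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
target = LogisticRegression(max_iter=1000).fit(X, y)

def query_proba(v):
    return target.predict_proba(v.reshape(1, -1))[0, 1]   # P(class = 1)

# A logistic regression outputs sigmoid(w . x + b), so every query gives one
# linear equation logit(p) = w . x + b. Probing the origin and the d unit
# vectors yields d + 1 equations that determine w and b exactly.
d = X.shape[1]
probes = np.vstack([np.zeros(d), np.eye(d)])
probs = np.array([query_proba(q) for q in probes])
logits = np.log(probs / (1.0 - probs))

b_hat = logits[0]            # logit at the origin is just the bias
w_hat = logits[1:] - b_hat   # logit at unit vector e_i is w_i + b

print("weights recovered:", np.allclose(w_hat, target.coef_[0], atol=1e-6))
print("bias recovered   :", np.allclose(b_hat, target.intercept_[0], atol=1e-6))
```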
Defenses
• Defending against training-time attacks
– Several algorithms have been proposed to rule out poisoning samples, based on the fact that they typically lie outside the expected input distribution (see the sketch below)
– Some works propose obfuscation, which is not very attractive
• Defending against inference-time attacks
– Such attacks rely on the adversary being able to find small perturbations that lead to significant changes in the model's output
– Any defence that tampers with adversarial example crafting heuristics, but does not mitigate the underlying erroneous model predictions, can be evaded
• Defending against larger perturbations
– Defending against adversarial examples will almost certainly require improving the ability of models to be uncertain when predicting far from their training subspace
30
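The sketch below shows the intuition behind ruling out poisoning samples that lie outside the expected input distribution: points far from a robust estimate of the data's centre are discarded before training. The data, distance measure, and 95% threshold are arbitrary illustrative choices, not a specific published defence.

```python
import numpy as np

# Clean training data plus a small batch of attacker-injected points.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, (500, 2))
poison = rng.normal(6.0, 0.5, (25, 2))
X = np.vstack([clean, poison])

# Sanitisation: drop the points furthest from a robust centre estimate.
centre = np.median(X, axis=0)
dist = np.linalg.norm(X - centre, axis=1)
threshold = np.percentile(dist, 95)          # keep the 95% closest points
X_sanitised = X[dist <= threshold]

print("points removed by sanitisation:", len(X) - len(X_sanitised))
```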
Learning and inferring with
privacy
• The most promising defence so far is differential privacy
– Recall that differential privacy is based on adding noise
– This noise must somehow also be injected into the ML pipeline
• Training
– At training time, random noise may be injected into the data, into the cost minimized by the learning algorithm, or into the learned parameter values
– Noise reduces accuracy, which can be critical in certain applications
• Inference
– The model's behaviour may also be randomized at inference time by introducing noise into its predictions (see the sketch below)
– This degrades the accuracy of predictions, since the amount of noise introduced increases with the number of inference queries answered by the ML model
31
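A crude sketch of the inference-time idea only: Laplace noise is added to the class probabilities before they are returned to the client. The noise scale is arbitrary here, whereas a differentially private deployment would calibrate it to a privacy budget; scikit-learn and NumPy are assumed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
rng = np.random.default_rng(0)

def noisy_predict_proba(x, scale=0.05):
    # Randomise the prediction by adding Laplace noise to the probabilities
    # (a crude output-perturbation scheme; the scale is not calibrated here).
    p = model.predict_proba(x.reshape(1, -1))[0]
    p = p + rng.laplace(0.0, scale, size=p.shape)
    p = np.clip(p, 1e-6, None)
    return p / p.sum()                  # renormalise to a probability vector

print("clean :", model.predict_proba(X[0].reshape(1, -1))[0])
print("noisy :", noisy_predict_proba(X[0]))
```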
Fairness and accountability in ML
• Models predict values based on data and in
some domains (e.g., banking, healthcare)
these predictions may be critical
• The European General Data Protection Regulation (GDPR) requires that companies provide an explanation of their predictions if they are made using sensitive or private data
32
Fairness
• Predictions from ML models should not be
biased or cause discrimination
– Training data can cause bias
– The learning algorithm can also cause bias
• Fairness can be connected with privacy
– Adversarial example algorithms can estimate how representative of a class a particular input is, which has led to the identification of racial biases in popular image datasets
33
Accountability
• Techniques used for accountability and
transparency are likely to yield improved
attack techniques because they increase the
adversary’s understanding of how the model’s
decisions are made
• However, they also contribute to a better understanding of the impact of training data on the model learned by the ML algorithm, which is beneficial to privacy-preserving ML
34
References
• SoK: Security and Privacy in Machine Learning.
Nicolas Papernot, Patrick McDaniel, Arunesh
Sinha, and Michael P. Wellman. In IEEE
European Symposium on Security and Privacy,
2018.
35