Lecture 06
Deep Learning 1
Worst-Case Analysis
▶ The problem of adversarial examples
▶ Adversarial robustness
Adding Predictive Uncertainty
▶ Why predictive uncertainty?
▶ Density networks
▶ Ensemble models
Adding Prior Knowledge
▶ Translation invariance, local smoothness, etc.
▶ Feature reuse (transfer / multitask / self-supervised learning)
Part 1 Worst-Case Analysis
Motivations
Risk aversion
▶ One big error can often be more harmful than many small errors, e.g. a
system being controlled by a neural network may be tolerant to small
errors (which can be corrected subsequently), but not to a big error
from which one cannot recover.
Adversarial components
▶ Even though the neural network may perform well on average, an
adversary may craft inputs that steer the ML system towards the
worst-case decision behavior.
Worst-Case Analysis
[Figure: neural network predictions compared to the ground truth]
Example: Adversarial Examples
▶ Carefully crafted nearly invisible perturbations of an existing data point
can cause the prediction of a neural network to change drastically,
while leaving almost no trace of the attack.
Enhanced Regularization:
▶ Search for high local variations of the decision function and add these
variations as a term of the error function to minimize.
▶ In practice, this can take the form of generating adversarial examples
and forcing them to be predicted in the same way as the original data
(see the sketch after this list).
▶ More generic approaches based on
Lipschitz-continuity (e.g. spectral
norm regularization) can also be
used.
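A minimal sketch of this adversarial-training idea, using a single fast-gradient-sign step to craft the adversarial example; `model`, `loss_fn`, and the perturbation budget `eps` are illustrative placeholders, not part of the lecture material:

```python
import torch

def adversarial_training_loss(model, loss_fn, x, y, eps=0.03):
    """Standard loss plus a loss on adversarially perturbed inputs.

    Sketch only: `eps` (the perturbation budget) and the single
    gradient-sign attack step are illustrative choices.
    """
    # Loss on the clean data.
    clean_loss = loss_fn(model(x), y)

    # Craft an adversarial example with one gradient-sign step (FGSM).
    x_adv = x.detach().clone().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    x_adv = (x_adv + eps * x_adv.grad.sign()).detach()
    model.zero_grad()  # discard gradients accumulated while crafting

    # Force the adversarial example to be predicted like the original.
    adv_loss = loss_fn(model(x_adv), y)
    return clean_loss + adv_loss
```

For the Lipschitz-based alternative, recent PyTorch versions expose spectral normalization of individual layers via `torch.nn.utils.parametrizations.spectral_norm`.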
Data Preprocessing:
▶ In practice, one can also address worst-case behavior by applying
dimensionality reduction (e.g. blurring images) to the input before
feeding it to the neural network.
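A possible implementation of such preprocessing with torchvision; the kernel size and sigma are illustrative values:

```python
import torchvision.transforms as T

# Blur the image (a crude dimensionality reduction) before it reaches
# the network, attenuating high-frequency adversarial perturbations.
preprocess = T.Compose([
    T.GaussianBlur(kernel_size=5, sigma=1.0),
    T.ToTensor(),
])
```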
Part 2 Predictive Uncertainty
Predictive Uncertainty
Practical motivations:
▶ Understand when the model can be trusted, in a more precise way
than just looking at the overall error.
▶ Enables the user to be prompted when the model is unsure, in which
case the user can decide e.g. to collect more data, or to perform the
prediction manually.
[Image source: https://doi.org/10.1103/PhysRevD.98.063511]
Predictive Uncertainty
Approach 1:
▶ Explicitly encode the uncertainty estimate in the neural network, i.e.
have one output neuron predict the actual value of interest, and a
second output neuron predict the uncertainty associated with this
prediction.
▶ For example, one predicts that the output is distributed according to
the random variable y ∼ N(µ, σ²), where µ and σ are the two neural
network outputs. (How to train these models will be presented in
Lecture 7; a sketch of the architecture follows below.)
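A minimal sketch of such a network with two output heads; the layer sizes and the use of a log-standard-deviation head to keep σ positive are illustrative assumptions:

```python
import torch.nn as nn

class DensityNetwork(nn.Module):
    """Network predicting a Gaussian: a mean and a standard deviation.

    Sketch only: layer sizes are illustrative; sigma is produced via
    an exponential so that it stays positive.
    """

    def __init__(self, d_in, d_hidden=64):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.mu_head = nn.Linear(d_hidden, 1)
        self.log_sigma_head = nn.Linear(d_hidden, 1)

    def forward(self, x):
        h = self.features(x)
        return self.mu_head(h), self.log_sigma_head(h).exp()
```

Such a model would typically be fit by minimizing the Gaussian negative log-likelihood (the topic of Lecture 7); PyTorch provides `nn.GaussianNLLLoss` for this purpose (note that it expects the variance σ² rather than σ).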
Predictive Uncertainty
Approach 2:
▶ Train an ensemble of neural networks and measure prediction
uncertainty as the discrepancy between the predictions of the networks
in the ensemble, e.g. for an ensemble of n neural networks with
respective outputs o1, ..., on, we generate the two aggregated outputs
µ = avg(o1, ..., on) and σ = std(o1, ..., on).
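A minimal sketch of this aggregation, assuming `models` is a list of trained networks with identically shaped outputs:

```python
import torch

def ensemble_predict(models, x):
    """Aggregate ensemble outputs: the mean across members is the
    prediction, the standard deviation is the uncertainty estimate."""
    outputs = torch.stack([model(x) for model in models])  # (n, batch, ...)
    mu = outputs.mean(dim=0)
    sigma = outputs.std(dim=0)
    return mu, sigma
```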
Predictive Uncertainty
Approach 2 (cont.):
[Figure: illustration of ensemble-based predictive uncertainty]
Part 3 Beyond Regularization: Prior Knowledge
Prior Knowledge
Recap:
▶ Machine learning is affected by data quality issues (e.g. scarcity of data
or labels, spurious correlations in the dataset, under-representation of
some part of the distribution, shift between data available for training
and data when deployed).
Idea:
▶ There is no point in learning from the data what we already know.
What we already know (our prior knowledge) should ideally be
hard-coded into the model.
Example:
▶ In specific tasks, certain features are known to have no effect on the
quantity to predict. It is better not to give them as input to the neural
network.
[Diagram: a network whose inputs are grouped into symptoms (fever, cough, ...), patient data (male/female, age), and date (month, day); the outputs are disease 1, disease 2, ...; the date features are marked irrelevant]
Physical Invariances
Soft Invariances
Feature Reuse / Transfer Learning
Transfer Learning with Deep Networks
Approach:
▶ When two tasks are related, we can train a big neural network on the
first task with abundant data, and reuse the features in intermediate
layers for the task of interest (see the sketch below).
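A minimal sketch of this feature reuse with torchvision; the choice of an ImageNet-pretrained ResNet-18 and the 10 target classes are illustrative assumptions:

```python
import torch.nn as nn
import torchvision.models as models

# Load a network pretrained on a data-rich task (ImageNet).
backbone = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the reused intermediate features.
for param in backbone.parameters():
    param.requires_grad = False

# Replace only the last layer with a new trainable head for the
# task of interest (10 classes here is an arbitrary example).
backbone.fc = nn.Linear(backbone.fc.in_features, 10)
```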
How to Generate Useful Features
Classical approach:
▶ Learn a model to classify a more general task (e.g. image recognition).
Self-supervised learning approach:
▶ Create an artificial task where labels can be produced automatically,
and whose solution requires features that are needed for the task of
interest (more in DL2; a sketch of one such task follows below).
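A minimal sketch of one well-known pretext task of this kind, rotation prediction: each image is rotated by 0/90/180/270 degrees and the network must predict the rotation index, a label that comes for free. The batch layout below is an illustrative choice:

```python
import torch

def rotation_pretext_batch(x):
    """Build a self-labeled batch for rotation prediction.

    x: image batch of shape (batch, C, H, W). Returns the four rotated
    copies of the batch and the rotation index as a free label.
    """
    rotations = [torch.rot90(x, k, dims=(-2, -1)) for k in range(4)]
    inputs = torch.cat(rotations)                        # (4*batch, C, H, W)
    labels = torch.arange(4).repeat_interleave(x.shape[0])
    return inputs, labels
```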
Summary