
WiSe 2023/24

Deep Learning 1

Lecture 6: Overfitting & Robustness (2)


Outline

Worst-Case Analysis
▶ The problem of adversarial examples
▶ Adversarial robustness
Adding Predictive Uncertainty
▶ Why predictive uncertainty?
▶ Density networks
▶ Ensemble models
Adding Prior Knowledge
▶ Translation invariance, local smoothness, etc.
▶ Feature reuse (transfer / multitask / self-supervised learning)

Part 1 Worst-Case Analysis

Motivations

Risk aversion
▶ One big error can often be more harmful than many small errors, e.g. a
system controlled by a neural network may tolerate small errors (which can
be corrected subsequently), but not a big error from which it cannot
recover.

Adversarial components
▶ Even though the neural network may perform well on average, an
adversary may craft inputs that steer the ML system towards the
worst-case decision behavior.

Worst-Case Analysis

[Figure: plot comparing the neural network predictions with the ground truth]

Typical causes of large errors:


▶ High dimensionality of the input space allows one to finely craft patterns to
which the network responds strongly.
▶ High depth/nonlinearity implies that the function is locally steeper than
necessary.

Example: Adversarial Examples
▶ Carefully crafted, nearly invisible perturbations of an existing data point
can cause the prediction of a neural network to change drastically,
while leaving almost no trace of the attack (a sketch of how such a
perturbation can be computed follows below).

Image source: https://arxiv.org/abs/1312.6199

▶ A serious concern in various applications (e.g. biometric identification,
reading traffic signs).
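
A minimal sketch (not part of the slides) of how such a perturbation can be computed with the fast gradient sign method; model, x, y, and the step size epsilon are assumed placeholders, and pixel values are assumed to lie in [0, 1]:

import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.01):
    """Craft a small perturbation of x that increases the loss of `model`.

    x: input batch, y: true labels, epsilon: maximum per-pixel change.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction of the sign of the loss gradient, then clamp to
    # the valid image range so the change remains nearly invisible.
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()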
Addressing Worst-Case Behavior

Enhanced Regularization:
▶ Search for high local variations of the decision function and add these
variations as a term of the error function to minimize.
▶ In practice, this can take the form of generating adversarial examples and
forcing them to be predicted in the same way as the original data (see the
training sketch below).
▶ More generic approaches based on Lipschitz continuity (e.g. spectral
norm regularization) can also be used.
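
A minimal sketch of the adversarial-training idea referenced above, assuming an FGSM-style perturbation and a classifier model with optimizer opt (all names are illustrative placeholders, not taken from the lecture):

import torch
import torch.nn.functional as F

def adversarial_training_step(model, opt, x, y, epsilon=0.01, alpha=0.5):
    """One training step that mixes the loss on clean inputs with the loss
    on adversarially perturbed inputs, so both are predicted the same way."""
    # Generate adversarial examples with a single FGSM step.
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    with torch.no_grad():
        x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0)

    # Minimize the error on both the original and the perturbed data.
    opt.zero_grad()
    loss = (1 - alpha) * F.cross_entropy(model(x), y) \
           + alpha * F.cross_entropy(model(x_adv), y)
    loss.backward()
    opt.step()
    return loss.item()

Where a Lipschitz-based approach is preferred instead, PyTorch's torch.nn.utils.spectral_norm can be applied to individual layers to constrain the spectral norm of their weights.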
Data Preprocessing:
▶ In practice, one can also address worst-case behavior by applying
dimensionality reduction (e.g. blurring images) before applying the
neural network.

Part 2 Predictive Uncertainty

Predictive Uncertainty

Practical motivations:
▶ Understand when we can
trust the model in a more
precise way than just looking
at the overall error.
▶ Enables the user to be prompted when the model is unsure, in which case
the user can decide e.g. to collect more data, or to perform the
prediction manually.

Image source: https://doi.org/10.1103/PhysRevD.98.063511

Predictive Uncertainty

Approach 1:
▶ Explicitly encode the uncertainty estimate in the neural network, i.e.
have one output neuron for predicting the actual value of interest, and
a second output neuron for predicting the uncertainty associated with
this prediction.
▶ For example, one predicts that the output is distributed according to
the random variable y ∼ N(µ, σ²), where µ and σ are the two neural
network outputs. (How to train these models will be presented in
Lecture 7.)

Image source: https://brendanhasz.github.io/2019/07/23/bayesian-density-net.html
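
A minimal sketch of such a density network, assuming a one-dimensional regression target; the layer sizes are illustrative and the training objective is deferred to Lecture 7:

import torch
import torch.nn as nn

class DensityNetwork(nn.Module):
    """Predicts a mean and a standard deviation for each input,
    i.e. the parameters of y ~ N(mu, sigma^2)."""

    def __init__(self, n_features, n_hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_features, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
        )
        self.mu_head = nn.Linear(n_hidden, 1)      # predicted value
        self.sigma_head = nn.Linear(n_hidden, 1)   # predicted uncertainty

    def forward(self, x):
        h = self.body(x)
        mu = self.mu_head(h)
        # Softplus keeps the predicted standard deviation positive.
        sigma = nn.functional.softplus(self.sigma_head(h)) + 1e-6
        return mu, sigma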

Predictive Uncertainty
Approach 2:
▶ Train an ensemble of neural networks and measure prediction
uncertainty as the discrepancy between the predictions of the individual
networks in the ensemble, e.g. for an ensemble of n neural networks with
respective outputs o₁, ..., oₙ, we generate the two aggregated outputs
µ = avg(o₁, ..., oₙ) and σ = std(o₁, ..., oₙ), which represent the final
prediction and its uncertainty.

Image source: https://doi.org/10.1007/s00521-019-04359-7
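
A minimal sketch of the aggregation step, assuming a list of already trained models that all produce outputs of the same shape:

import torch

def ensemble_predict(models, x):
    """Aggregate the outputs o_1, ..., o_n of an ensemble into a final
    prediction (mean) and an uncertainty estimate (standard deviation)."""
    with torch.no_grad():
        outputs = torch.stack([model(x) for model in models])  # (n, batch, ...)
    mu = outputs.mean(dim=0)
    sigma = outputs.std(dim=0)
    return mu, sigma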

Predictive Uncertainty
Approach 2 (cont.):

Image source: https://doi.org/10.1007/s00521-019-04359-7

▶ Each network in the ensemble may have a different initialization, may
receive different input features, and may be trained on different subsets
of the data.
▶ Uncertainty can then be understood as the effect of random
initialization, feature selection, and data sampling.
▶ The more heterogeneous the ensemble, the higher the estimate of
uncertainty.

Part 3 Beyond Regularization: Prior Knowledge

Prior Knowledge
Recap:
▶ Machine learning is affected by data quality issues (e.g. scarcity of data
or labels, spurious correlations in the dataset, under-representation of
some parts of the distribution, shift between the data available for
training and the data encountered at deployment).
Idea:
▶ There is no point in learning from the data what we already know.
What we already know (our prior knowledge) should ideally be
hard-coded into the model.
Example:
▶ In specific tasks, certain features are known to have no effect on the
quantity to predict. It is better not to give them as input to the neural
network.
[Figure: example network where symptoms (fever, cough, ...) and patient data (male/female, age) are used as inputs to predict diseases (disease 1, disease 2, ...), while the date (month, day) is marked as irrelevant]
Physical Invariances

Example: Rotation/Translation Invariance


▶ Rotating or translating a molecule leaves its atomization energy
unchanged.

▶ Rotation (and translation) invariance can be ensured e.g. by encoding
the molecule by the pairwise distances between its atoms rather than its
3D coordinates, and feeding these distances to a neural network (see the
sketch below).
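
A minimal sketch of the distance-based encoding mentioned above; coords is assumed to be an (n_atoms, 3) tensor of 3D positions:

import torch

def pairwise_distances(coords):
    """Encode a molecule by the distances between all pairs of atoms.

    coords: tensor of shape (n_atoms, 3) with 3D coordinates.
    Returns a flat vector of the upper-triangular distance matrix,
    which is invariant to rotating or translating the molecule.
    """
    diff = coords.unsqueeze(0) - coords.unsqueeze(1)   # (n, n, 3)
    dist = diff.norm(dim=-1)                           # (n, n)
    i, j = torch.triu_indices(len(coords), len(coords), offset=1)
    return dist[i, j]                                  # n * (n - 1) / 2 values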

Physical Invariances

Example: Modeling interaction between two molecules

Image source: https://doi.org/10.1021/acs.jctc.8b01285

▶ Distances are fed as input to a plain neural network.


▶ Works as long as the atoms of the molecules can be indexed consistently
(i.e. the approach stops working when the molecules received as input are
of arbitrary shape and size).

Soft Invariances

Example: Handwritten digit recognition


▶ Rotating digits by a few degrees
usually does not change class
membership.
▶ There are some exceptions, e.g. rotating a
`1' may transform it into a `7'.

Approaches to build invariance:


▶ Use purposely designed neural network architectures, e.g. scattering
networks, pooling networks, etc.
▶ Augment the dataset with elastic distortions, and train the neural
network on the extended dataset (see the sketch below).
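
A minimal sketch of the augmentation approach, using small random rotations of MNIST digits as a stand-in for the elastic distortions mentioned above (dataset, rotation range, and batch size are illustrative):

import torch
from torchvision import datasets, transforms

# Small random rotations preserve class membership for most digits;
# a few degrees keeps a '1' from turning into a '7'.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.ToTensor(),
])

train_set = datasets.MNIST(root="./data", train=True, download=True,
                           transform=augment)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64,
                                           shuffle=True)

Because the transform is re-sampled every epoch, the network effectively sees an extended dataset of slightly distorted digits.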

Feature Reuse / Transfer Learning

▶ Certain tasks have intrinsically little annotated data (e.g. scientific
data), due to the cost of labeling by an expert.
▶ However, they have similarities with other tasks with much more data
(e.g. general-purpose image recognition). In both cases, one needs to
extract features such as edge or color detectors to solve the task.

Transfer Learning with Deep Networks
Approach:
▶ When two tasks are related, we can train a big neural network on the
first task with abundant data, and reuse the features in intermediate
layers for the task of interest.

▶ This type of transfer learning is very common in applied research. For
image recognition tasks, researchers typically start from a
state-of-the-art network for vision (e.g. ResNet) pretrained on
ImageNet, and retrain the top layers on the specific task (see the
sketch below).
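
A minimal sketch of this recipe, assuming a new task with n_classes categories and the torchvision weights API; here only the replaced top layer is retrained, which is one common variant:

import torch.nn as nn
from torchvision import models

n_classes = 10  # assumed number of classes for the task of interest

# Start from a ResNet pretrained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pretrained feature extractor ...
for param in model.parameters():
    param.requires_grad = False

# ... and replace and retrain only the top layer for the new task.
model.fc = nn.Linear(model.fc.in_features, n_classes)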

How to Generate Useful Features
Classical approach:
▶ Learn a model to classify a more general task (e.g. image recognition).
Self-supervised learning approach:
▶ Create an artificial task where labels can be produced automatically,
and whose solution requires features that are needed for the task of
interest (more in DL2).

Image source: https://doi.org/10.3390/e24040551
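
One concrete instance of such an artificial task, given here purely as an illustration (it is not specified on this slide), is predicting which rotation was applied to an image; the labels are produced automatically:

import torch

def rotation_pretext_batch(images):
    """Create a self-supervised batch: each image is rotated by a random
    multiple of 90 degrees, and the rotation index serves as the label.

    images: tensor of shape (batch, channels, height, width).
    """
    labels = torch.randint(0, 4, (images.shape[0],))
    rotated = torch.stack([
        torch.rot90(img, k=int(k), dims=(1, 2))
        for img, k in zip(images, labels)
    ])
    return rotated, labels  # train a classifier to predict `labels`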

Summary

▶ In practice, expected prediction accuracy may not be the most relevant
quantity. The true practical usefulness of a system is often better
characterized by its worst-case performance.
▶ It is often desirable to equip neural network models with some measure
of predictive uncertainty, so that the network can tell the user when to
trust and when not to trust the prediction.
▶ There is no point in learning what we already know. Prior knowledge (e.g.
invariances, shared features) can be introduced into neural networks. As
a result, the model becomes less affected by overfitting and also more
robust.

