DL-Lec 2 - Bias-Variance-Tradeoff
Outline
1. Bias-variance tradeoff
2. Learning curves
3. Overfitting
4. Regularization
Approximation vs. generalization
[Figure: an example where 𝐸out is huge]

The final aim is to have a small 𝐸out: a good approximation of 𝑓 out-of-sample

The ideal case would be to have a hypothesis space ℋ that contains only the target function 𝑓
Bias and variance of a machine learning model

$$ \mathbf{bias}^2 = \big(\bar{g}(\boldsymbol{x}) - f(\boldsymbol{x})\big)^2 \qquad\qquad \mathbf{variance} = \mathbb{E}_{\mathcal{D}}\!\Big[\big(g^{(\mathcal{D})}(\boldsymbol{x}) - \bar{g}(\boldsymbol{x})\big)^2\Big] $$

• $g^{(\mathcal{D})}$: function learnt on the dataset 𝒟
• $\bar{g}$: average function in ℋ

[Figure: the bias is the distance between $\bar{g}$ and 𝑓; the variance is the spread of the learnt hypotheses around $\bar{g}$ within ℋ]

VERY SMALL model set: since there is only one hypothesis, both the average function $\bar{g}$ and the final hypothesis $g^{(\mathcal{D})}$ will be the same, for any dataset. Thus, variance = 0. The bias will depend solely on how well this single hypothesis approximates the target 𝑓, and unless we are extremely lucky, we expect a large bias

VERY LARGE model set: the target function is in ℋ. Different data sets will lead to different hypotheses that agree with 𝑓 on the data set, and are spread around 𝑓 (the red region in the figure). Thus, bias ≈ 0 because $\bar{g}$ is likely to be close to 𝑓. The variance is large (heuristically represented by the size of the red region)
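As a complement, here is a minimal numerical sketch of these definitions (the target 𝑓(𝑥) = sin(π𝑥), the noise level and the polynomial model are illustrative assumptions, not from the slides): bias² and variance are estimated by learning the hypothesis on many independently sampled datasets.

```python
import numpy as np

# Monte Carlo estimate of bias^2 and variance for polynomial regression.
# The target f, the noise level and the model degree are illustrative assumptions.
rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)            # assumed target function
N, degree, n_datasets = 5, 2, 2000         # points per dataset, model complexity, repetitions
x_eval = np.linspace(-1, 1, 200)           # points where bias/variance are evaluated

preds = np.empty((n_datasets, x_eval.size))
for k in range(n_datasets):
    x = rng.uniform(-1, 1, N)
    y = f(x) + rng.normal(0, 0.1, N)       # one noisy dataset D
    coeffs = np.polyfit(x, y, degree)      # g^(D): hypothesis learnt on D
    preds[k] = np.polyval(coeffs, x_eval)

g_bar = preds.mean(axis=0)                 # average function g_bar(x)
bias2 = np.mean((g_bar - f(x_eval)) ** 2)  # (g_bar - f)^2, averaged over x
variance = np.mean((preds - g_bar) ** 2)   # E_D[(g^(D) - g_bar)^2], averaged over x
print(f"bias^2 ~ {bias2:.3f}   variance ~ {variance:.3f}")
```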
Bias and variance of a machine learning model
Heuristic rule
How many points 𝑁 are required to ensure a good chance of generalizing?
[Figure: out-of-sample error vs. model complexity (number of parameters / VC dimension). General principle: the out-of-sample error is minimized at an intermediate complexity 𝑑∗]
Learning curves
Learning curves are a graphical tool to understand if a learning model suffers from bias
or variance problems
The idea is to plot, as the number of data points 𝑁 used to train the model varies, the in-sample error 𝐸in and the (expected) out-of-sample error 𝐸out

In practice, the curves are computed from one dataset, or by dividing it into several parts and taking the «average curve» resulting from the various sub-datasets
Learning curves

[Figure: two learning-curve plots of the expected 𝐸in and 𝐸out vs. the number of data points 𝑁 used to train the model. Left: bias² "small", variance "big". Right: bias² "big", variance "small"]
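A minimal sketch of the procedure described above (the synthetic data and the linear model are illustrative assumptions): the two curves are computed by training on growing subsets and averaging over several random sub-datasets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Learning curves for a linear model on assumed synthetic data.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (500, 1))
y = np.sin(np.pi * X[:, 0]) + rng.normal(0, 0.2, 500)   # assumed noisy target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for N in [5, 10, 20, 50, 100, 200, len(X_train)]:
    e_in, e_out = [], []
    for _ in range(20):                                  # average over random sub-datasets
        idx = rng.choice(len(X_train), N, replace=False)
        model = LinearRegression().fit(X_train[idx], y_train[idx])
        e_in.append(mean_squared_error(y_train[idx], model.predict(X_train[idx])))
        e_out.append(mean_squared_error(y_test, model.predict(X_test)))
    print(f"N={N:3d}  E_in={np.mean(e_in):.3f}  E_out={np.mean(e_out):.3f}")
```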
Overfitting
We encountered the overfitting phenomenon when we talked about the approximation-generalization tradeoff
• 𝑁 = 5 points (no noise): 𝐸in = 0 and 𝐸out = 0
• 𝑁 = 5 noisy points: [figures of the fit on the noisy data]; see the sketch below
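A minimal sketch of this example (the sinusoidal target and the degree-4 polynomial are illustrative assumptions, the slides do not specify them): with 5 noisy points and 5 parameters the fit interpolates the data, so 𝐸in ≈ 0 while 𝐸out can be very large.

```python
import numpy as np

# Fitting a degree-4 polynomial to N = 5 noisy points: E_in goes to ~0, E_out blows up.
rng = np.random.default_rng(1)
f = lambda x: np.sin(np.pi * x)                   # assumed target
x = rng.uniform(-1, 1, 5)
y = f(x) + rng.normal(0, 0.2, 5)                  # 5 noisy points

coeffs = np.polyfit(x, y, 4)                      # 5 parameters -> exact interpolation
x_test = np.linspace(-1, 1, 500)

e_in = np.mean((np.polyval(coeffs, x) - y) ** 2)
e_out = np.mean((np.polyval(coeffs, x_test) - f(x_test)) ** 2)
print(f"E_in ~ {e_in:.2e}   E_out ~ {e_out:.2e}")  # E_in ~ 0, E_out typically large
```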
Example: a student who has to learn some concepts

To understand the phenomenon of overfitting in an intuitive way, let's consider the following analogy

The teacher of a course provides solved exercises in order to teach how to solve a problem. The exam exercises must necessarily be different from those provided in class, otherwise the teacher is not able to understand whether the student has only memorized how to solve the exercises or has really learned the concepts

In the first case (memorizing) the student has not really learned: when faced with a similar (but different) exercise, she/he will not be able to solve it. The student has overfitted the exercises seen in class, without having generalized the concepts and therefore the solution method
Overfitting vs. model complexity

• We talk of overfitting when decreasing 𝐸in leads to increasing 𝐸out
• Overfitting leads to bad generalization
• A model can exhibit bad generalization even if it does not overfit

[Figure: 𝐸out and the in-sample error vs. model complexity. Low complexity: high bias, low variance; high complexity: low bias, high variance. The overfitting region is at high complexity, where 𝐸in keeps decreasing while 𝐸out grows]
Overfitting vs. model complexity
[Figure: three fits of the same data: underfit, OK, overfit]
A cure for overfitting
Regularization is the first line of defense against overfitting
We have seen that more complex models are more prone to overfitting. This is because
they are more «powerful» (expressive) and therefore can also adapt to the noise
Simple models show less variance due to their limited expressiveness. The reduction in the variance of the model is often greater than the increase in its bias, so that, overall, the expected error (bias² + variance + noise variance) decreases

However, if we stick to simple models only, we may not get a satisfactory approximation of the target function 𝑓
Regularization

Idea: in addition to minimizing the «data-fit» cost 𝐸in(𝜽) ≡ 𝐽(𝜽), minimize also a measure Ω(𝜽) of the model complexity, i.e. minimize the augmented error

$$ E_{\text{aug}}(\boldsymbol{\theta}) = E_{\text{in}}(\boldsymbol{\theta}) + \lambda_{\text{reg}}\,\Omega(\boldsymbol{\theta}) $$

The term 𝜆reg (hyperparameter) weights the importance of minimizing 𝐸in ≡ 𝐽(𝜽) with respect to minimizing Ω(𝜽)
Example: 𝐿2 regularization

The 𝐿2 regularization penalizes the sum of the squared model’s coefficients 𝜽 ∈ ℝ^{𝑑×1}

$$ E_{\text{aug}}(\boldsymbol{\theta}) = \frac{1}{N}\sum_{i=1}^{N}\big(y^{(i)} - h(\boldsymbol{x}^{(i)};\boldsymbol{\theta})\big)^2 + \lambda_{\text{reg}}\sum_{j=0}^{d-1}\theta_j^2 $$

• The intercept 𝜃0 sometimes is not penalized. In this case, 𝑗 will start from 1
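For a hypothesis that is linear in the parameters, h(𝒙; 𝜽) = Φ(𝒙)𝜽 (an assumption; the slides keep h generic), the minimizer of this augmented error has a closed form. A minimal sketch:

```python
import numpy as np

# Closed-form minimizer of the L2-regularized (ridge) augmented error,
# assuming a linear-in-the-parameters hypothesis h(x; theta) = Phi(x) @ theta.
def fit_ridge(Phi, y, lam_reg, penalize_intercept=False):
    """Return theta minimizing (1/N)*||y - Phi theta||^2 + lam_reg * sum_j theta_j^2."""
    N, d = Phi.shape
    I = np.eye(d)
    if not penalize_intercept:
        I[0, 0] = 0.0                        # assume column 0 of Phi is the intercept
    # Setting the gradient to zero gives (Phi^T Phi / N + lam_reg I) theta = Phi^T y / N
    return np.linalg.solve(Phi.T @ Phi / N + lam_reg * I, Phi.T @ y / N)

# Usage on assumed synthetic data with a degree-4 polynomial feature map
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 20)
y = np.sin(np.pi * x) + rng.normal(0, 0.2, 20)
Phi = np.vander(x, 5, increasing=True)       # columns: 1, x, x^2, x^3, x^4
theta = fit_ridge(Phi, y, lam_reg=0.1)
print(theta)
```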
Example: 𝐿2 regularization

The same penalty can be interpreted as solving the constrained problem

$$ \begin{aligned} \text{minimize}\quad & E_{\text{in}}(\boldsymbol{\theta}) = \frac{1}{N}\sum_{i=1}^{N}\big(y^{(i)} - h(\boldsymbol{x}^{(i)};\boldsymbol{\theta})\big)^2 \\ \text{subject to}\quad & \boldsymbol{\theta}^{\top}\boldsymbol{\theta} \le c \end{aligned} $$

• With this interpretation, we are explicitly constraining the coefficients 𝜽 not to take big values
Effect of the regularization hyperparameter 𝜆

[Figure: fits obtained for increasing regularization strength, 𝜆reg,1 = 0 < 𝜆reg,2 < 𝜆reg,3 < 𝜆reg,4, going from overfit (𝜆reg = 0) to underfit (𝜆reg large)]

If we regularize too much, we will learn the simplest possible function, which is a horizontal line (constant) with intercept 𝜃0
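A minimal sketch of this sweep (the synthetic data, the degree-9 polynomial and scikit-learn's Ridge are illustrative assumptions), showing how 𝐸in and 𝐸out change as 𝜆reg grows:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Sweeping lambda_reg from ~0 (overfit) to very large (underfit, nearly constant fit).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (15, 1))
y = np.sin(np.pi * x[:, 0]) + rng.normal(0, 0.2, 15)
x_test = np.linspace(-1, 1, 300).reshape(-1, 1)
y_test = np.sin(np.pi * x_test[:, 0])

for lam in [1e-9, 1e-4, 1e-2, 1.0, 1e3]:           # 1e-9 plays the role of lambda_reg = 0
    model = make_pipeline(PolynomialFeatures(degree=9), Ridge(alpha=lam))
    model.fit(x, y)
    print(f"lambda_reg={lam:g}  "
          f"E_in={mean_squared_error(y, model.predict(x)):.3f}  "
          f"E_out={mean_squared_error(y_test, model.predict(x_test)):.3f}")
```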
Intuition about the importance of 𝐸aug w.r.t. 𝐸in

Minimizing 𝐸aug instead of 𝐸in leads to a better model (that is, a model with better generalization capabilities and so with lower 𝐸out)

[Figure: two learning-curve plots (expected 𝐸in and 𝐸out vs. number of data points 𝑁 used to train the model), illustrating the role of the term Ω]
Intuition about the importance of 𝐸aug w.r.t. 𝐸in

From the previous graph, as well as through bias and variance, we can interpret 𝐸out as the sum of two contributions:

$$ E_{\text{out}}(\boldsymbol{\theta}) = E_{\text{in}}(\boldsymbol{\theta}) + \tilde{\Omega}(\boldsymbol{\theta}), \qquad \tilde{\Omega} = E_{\text{out}} - E_{\text{in}} $$

Recalling the definition of 𝐸aug we have

$$ E_{\text{aug}}(\boldsymbol{\theta}) = E_{\text{in}}(\boldsymbol{\theta}) + \lambda\,\Omega(\boldsymbol{\theta}) $$

[Figure: learning curves with the gap between 𝐸out and 𝐸in marked as Ω̃]
Intuition about the importance of 𝐸aug w.r.t. 𝐸in

• In this way, minimizing the augmented error amounts to (approximately) minimizing the out-of-sample error directly, instead of the in-sample one
• The regularization helps to estimate the quantity Ω(𝜽) that, added to 𝐸in, gives 𝐸aug, an estimate of 𝐸out
Choosing the regularization term

There are different types of regularizers. The most used are:

• 𝐿2 regularization, also called Ridge regularization: $\Omega(\boldsymbol{\theta}) = \sum_{j=0}^{d-1}\theta_j^2$
• 𝐿1 regularization, also called Lasso regularization: $\Omega(\boldsymbol{\theta}) = \sum_{j=0}^{d-1}|\theta_j|$
Choosing the regularization term

[Figure: level curves of a convex cost function 𝐽(𝜽) ≡ 𝐸in(𝜽) in the (𝜃1, 𝜃2) plane, together with the constraint regions 𝜃1² + 𝜃2² ≤ 𝑐 (Ridge) and |𝜃1| + |𝜃2| ≤ 𝑐 (Lasso). The estimate of 𝜽 without regularization is the minimum of 𝐽; the estimate of 𝜽 with regularization is where the level curves first touch the constraint region. Notice that with Ridge 𝜃1 and 𝜃2 are «small», while with Lasso 𝜃1 = 0 and 𝜃2 is «small»]
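A minimal sketch of the practical consequence of this geometry (the synthetic data and the regularization strengths are illustrative assumptions): Lasso tends to set some coefficients exactly to zero, while Ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Ridge shrinks all coefficients; Lasso drives some of them exactly to zero (sparsity).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_theta = np.array([2.0, 0.0, -1.0, 0.0, 0.5])       # only 3 features matter
y = X @ true_theta + rng.normal(0, 0.1, 100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("Ridge coefficients:", np.round(ridge.coef_, 3))  # all non-zero, shrunk
print("Lasso coefficients:", np.round(lasso.coef_, 3))  # some exactly 0
```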
Regularization and bias-variance tradeoff
The effects of regularization can be observed in terms of bias and variance:
• Regularization slightly increases the bias (because we obtain a simpler model) in order to considerably reduce the variance of the learning model
• The regularization hyperparameter 𝜆reg must be chosen specifically for each type of regularizer. Usually, a procedure such as validation or cross-validation is used (see the sketch below)
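A minimal sketch of this selection step (the synthetic data and the candidate grid are illustrative assumptions): choosing 𝜆reg for Ridge by 5-fold cross-validation with scikit-learn.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Choosing lambda_reg by 5-fold cross-validation over a logarithmic grid.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, 200)      # assumed synthetic target

search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": np.logspace(-4, 3, 15)},        # candidate lambda_reg values
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print("best lambda_reg:", search.best_params_["alpha"])
```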