
Managing Overfitting

Overfitting
§ Errors in machine learning
§ Bias and Variance
§ Example
§ Methods to avoid overfitting in deep learning
  § Simpler architecture
  § Regularization
  § Dropout layer

Errors in a Machine Learning Model

Errors in a machine learning model fall into two groups:
§ Correctable errors: Bias and Variance
§ Uncorrectable (irreducible) errors

Bias and Variance
Bias:
• Bias is the inability of the model to learn the patterns in the data effectively.
• High-bias models (i.e., simple models) fail to learn the relationship between the inputs and the output.
• Consequently, the model's predictions will be inaccurate, resulting in errors.

Example of a Simple (High Bias) Model (low accuracy):

Y = β0 + β1·X1 + β2·X2

Example of a Complex (Low Bias) Model (high accuracy):

Y = β0 + β1·X1 + β2·X2 + β3·X3 + β4·X4 + β5·X5 + β6·X6
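To make the high-bias case concrete, here is a minimal sketch (assuming NumPy and scikit-learn are available; the quadratic data set is synthetic and purely illustrative). The straight-line model cannot capture the curvature, so even its training error stays high, while adding a squared term drops the error sharply:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 1.0 + 2.0 * X[:, 0] ** 2 + rng.normal(0, 0.5, size=200)  # quadratic truth

# Simple (high-bias) model: Y = b0 + b1*X1 -- cannot capture the curvature.
simple = LinearRegression().fit(X, y)
print("simple   train RMSE:", mean_squared_error(y, simple.predict(X)) ** 0.5)

# More flexible (lower-bias) model: add a squared term.
X2 = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
flexible = LinearRegression().fit(X2, y)
print("flexible train RMSE:", mean_squared_error(y, flexible.predict(X2)) ** 0.5)
```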


Bias and Variance
Variance:
• Variance is the model's inability to repeat its performance consistently across data sets.
• Variance increases with the model's complexity.
• A complex model learns the patterns in the training data so minutely that it cannot reproduce its performance on the test data.
• Typically, complex models tend to have high variance, and simpler models tend to have low variance.

Example of a Low Variance (highly consistent) Model:

Y = β0 + β1·X1 + β2·X2

Example of a High Variance (highly inconsistent) Model:

Y = β0 + β1·X1 + β2·X2 + β3·X3 + β4·X4 + β5·X5 + β6·X6
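Variance can be made visible by refitting the same model on fresh random samples and checking how much its predictions move. A minimal sketch, assuming scikit-learn (the sine data, sample size, and polynomial degrees are illustrative choices): the simple model's prediction at a fixed point barely changes between fits, while the complex model's swings widely:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)

def prediction_spread(degree, n_fits=20):
    """Refit the same model on fresh samples and measure how much its
    prediction at x = 0 varies from fit to fit (higher = more variance)."""
    preds = []
    for _ in range(n_fits):
        X = rng.uniform(-3, 3, size=(30, 1))
        y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 30)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X, y)
        preds.append(model.predict(np.array([[0.0]]))[0])
    return float(np.std(preds))

print("simple  (degree 1)  spread:", round(prediction_spread(1), 3))
print("complex (degree 12) spread:", round(prediction_spread(12), 3))
```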


Bias and Variance
The four combinations of bias and variance:

§ Low Bias, Low Variance: the IDEAL model
§ Low Bias, High Variance: too-complex models that memorize the training patterns; non-generalizable (Overfitting)
§ High Bias, Low Variance: too-simple models (Underfitting)
§ High Bias, High Variance: inconsistent models, typically caused by too few data points or unrepresentative data

Bias and Variance
But finding the ideal model is non-trivial. Why? Because the two errors pull in opposite directions: reducing Bias tends to increase Variance, and reducing Variance tends to increase Bias.

Bias and Variance

[Plot: error vs. model complexity. The Bias curve falls and the Variance curve rises as complexity grows, dividing the range into an Underfitting zone, an Ideal zone, and an Overfitting zone.]

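The three zones can be reproduced empirically. In this minimal sketch (synthetic data, scikit-learn assumed), sweeping the polynomial degree shows validation RMSE falling at first (underfitting zone), bottoming out (ideal zone), and then rising again while training RMSE keeps shrinking (overfitting zone):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(120, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 120)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=2)

# Sweep model complexity and watch where validation error bottoms out.
for degree in (1, 2, 3, 5, 8, 12, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    tr = mean_squared_error(y_tr, model.predict(X_tr)) ** 0.5
    va = mean_squared_error(y_va, model.predict(X_va)) ** 0.5
    print(f"degree {degree:2d}: train RMSE {tr:.2f}, validation RMSE {va:.2f}")
```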
Example

Model     Train RMSE         Test RMSE          Bias   Variance   Inference
Model 1   0.3 (reasonable)   0.3 (reasonable)   Low    Low        Satisfactory
Model 2   0.7 (poor)         0.7 (poor)         High   Low        Underfitting
Model 3   0.2 (reasonable)   0.8 (poor)         Low    High       Overfitting
Model 4   0.9 (poor)         0.9 (poor)         High   High       Inconsistent and Inaccurate

Ideal Model = similar and reasonable performance on Train and Test Data
Underfitting Model = poor performance on both Train and Test Data
Overfitting Model = reasonable performance on Train Data but poor performance on Test Data
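The Inference column boils down to a rule of thumb that can be written as a small helper. The sketch below is hypothetical: the 0.5 "reasonable" cutoff and the 0.3 gap threshold are assumptions chosen to match this table, not standard values, and the helper cannot separate Model 2 from Model 4 because RMSE alone does not distinguish them:

```python
def diagnose(train_rmse: float, test_rmse: float,
             reasonable: float = 0.5, gap: float = 0.3) -> str:
    """Classify a model from its train/test RMSE, mirroring the table above.

    `reasonable` and `gap` are illustrative thresholds, not standard values.
    """
    if train_rmse <= reasonable and test_rmse - train_rmse >= gap:
        return "Overfitting"                 # good on train, much worse on test
    if train_rmse > reasonable and test_rmse > reasonable:
        return "Underfitting / inaccurate"   # poor everywhere
    return "Satisfactory"                    # similar, reasonable performance

for name, tr, te in [("Model 1", 0.3, 0.3), ("Model 2", 0.7, 0.7),
                     ("Model 3", 0.2, 0.8), ("Model 4", 0.9, 0.9)]:
    print(name, "->", diagnose(tr, te))
```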
Remedies to Avoid Overfitting

§ Regularization

Lasso Regression

Minimize:  Σi (yi − ŷi)² + λ Σj |βj|

where ŷi = β0 + β1·x1i + β2·x2i + β3·x3i + … + βp·xpi

The penalty term shrinks the βj and may even force some of them to exactly zero.

λ is called the shrinkage coefficient; it controls the amount of regularization.

As λ ↓ 0, the solution approaches the least-squares solution.
As λ ↑ ∞, the solution tends towards an intercept-only model.

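A minimal scikit-learn sketch of the Lasso (the data set is synthetic, and alpha, scikit-learn's name for λ, is an arbitrary choice): with a modest penalty, several of the uninformative coefficients should be driven exactly to zero:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, only 3 of which actually matter.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0)   # alpha plays the role of λ
lasso.fit(X, y)
print("coefficients:", np.round(lasso.coef_, 2))
print("zeroed out:", int(np.sum(lasso.coef_ == 0)), "of", X.shape[1])
```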
Ridge Regression

Minimize:  Σi (yi − ŷi)² + λ Σj βj²

where ŷi = β0 + β1·x1i + β2·x2i + β3·x3i + … + βp·xpi

The penalty term shrinks the βj (but does not force them to zero).

λ is called the shrinkage coefficient; it controls the amount of regularization.

As λ ↓ 0, the solution approaches the least-squares solution.
As λ ↑ ∞, the solution tends towards an intercept-only model.

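The matching Ridge sketch (same synthetic setup; alpha again plays the role of λ and its value is arbitrary): the coefficients shrink, but none of them should reach exactly zero:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=10.0)  # alpha plays the role of λ
ridge.fit(X, y)
print("coefficients:", np.round(ridge.coef_, 2))
print("zeroed out:", int(np.sum(ridge.coef_ == 0)), "of", X.shape[1])  # expect 0
```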
ELNET Regression

Minimize:  Σi (yi − ŷi)² + λ1 Σj |βj| + λ2 Σj βj²

where ŷi = β0 + β1·x1i + β2·x2i + β3·x3i + … + βp·xpi

λ1 and λ2 are the shrinkage coefficients; they control the amount of regularization applied through the L1 and L2 penalties respectively.

As λ1, λ2 ↓ 0, the solution approaches the least-squares solution.
As λ1, λ2 ↑ ∞, the solution tends towards an intercept-only model.

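A minimal Elastic Net sketch. Note that scikit-learn parameterizes the two penalties as a single alpha plus an l1_ratio mixing weight rather than separate λ1 and λ2; the values here are arbitrary:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# alpha scales the total penalty; l1_ratio mixes L1 vs. L2 (0.5 = equal parts).
enet = ElasticNet(alpha=1.0, l1_ratio=0.5)
enet.fit(X, y)
print("coefficients:", np.round(enet.coef_, 2))
```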
Regularization

LASSO Regression:
• L1 produces sparse results: smaller coefficients, and some betas may be set exactly to zero
• Useful when you want to keep fewer attributes in the model
• So what? The model is efficient to store and efficient to compute.

Ridge Regression:
• Ridge achieves parameter shrinkage only
• Harder to interpret, as it retains all predictors
• So what? The model is useful when overfitting / variance is the main concern.
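To see the contrast concretely, this sketch fits both models on the same synthetic data used above (the alpha values are illustrative): Lasso should zero out most of the uninformative coefficients, while Ridge keeps all of them non-zero, only smaller:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

for name, model in [("LASSO", Lasso(alpha=1.0)), ("Ridge", Ridge(alpha=10.0))]:
    model.fit(X, y)
    nonzero = int(np.sum(model.coef_ != 0))
    print(f"{name}: {nonzero} non-zero coefficients of {X.shape[1]}, "
          f"|coef| sum = {np.abs(model.coef_).sum():.1f}")
```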
Thank You

