Lecture 13 - Regularization

Transfer Functions

Regression - Regularization
Overfitting and underfitting

 Overfitting occurs when the model captures the noise and the outliers in the data along with the underlying pattern. These models usually have high variance and low bias.

 Underfitting occurs when the model is unable to capture the underlying pattern of the data. These models usually have low variance and high bias.

Bias and variance

 Bias error
• How far the predicted values are from the true values
• The systematic error of the model
• It is about the model and the data itself

 Variance error
• The error caused by sensitivity to small variations in the training data set
• The dispersion of the predicted values around the target values across different training sets
• It is about the model's sensitivity
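
One way to make these two components concrete is a small resampling sketch (not from the slides): it fits the same polynomial model on many freshly drawn training sets, then measures the squared gap between the average prediction and the truth (bias) and the spread of the predictions across training sets (variance). The sine ground truth, the noise level, and the polynomial degree are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)       # assumed ground-truth function
x_test = np.linspace(0, 1, 50)
degree = 3                                      # model-complexity knob

# Fit the same model on many independently drawn training sets
preds = []
for _ in range(200):
    x_tr = rng.uniform(0, 1, 30)
    y_tr = true_f(x_tr) + rng.normal(0, 0.3, size=30)   # noisy targets
    coefs = np.polyfit(x_tr, y_tr, deg=degree)
    preds.append(np.polyval(coefs, x_test))
preds = np.array(preds)

# Bias^2: squared gap between the average prediction and the true value
bias2 = np.mean((preds.mean(axis=0) - true_f(x_test)) ** 2)
# Variance: spread of the predictions across the different training sets
variance = np.mean(preds.var(axis=0))
print(f"bias^2 = {bias2:.4f}, variance = {variance:.4f}")
```

Re-running the sketch with degree = 1 versus degree = 9 shows the pattern described on the next slide: bias falls and variance rises as model complexity grows.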

The bias-variance curse
 As the complexity of the model increases, the bias decreases but the variance increases.

 There is a trade-off between bias and variance: you can't get both low bias and low variance at the same time.

Regularization
 A regularizer is an additional criterion added to the loss function to make sure that we don't overfit

 It's called a regularizer since it tries to keep the parameters more normal/regular

 It is a bias on the model that forces the learning to prefer certain types of weights over others
Regularization

• Ridge/Lasso regression is a model tuning method that is used to analyze any data that suffers from multicollinearity.

• This method performs L2/L1 regularization.

• When multicollinearity occurs, the least-squares estimates remain unbiased but their variances are large, which results in predicted values being far away from the actual values.
Regularization
Multicollinearity

• It occurs when the independent variables show moderate to high correlation.


• In a model with correlated variables, it becomes a tough task to figure out the
true relationship of a predictors with response variable. In other words, it
becomes difficult to find out which variable is actually contributing to predict the
response variable.
• Another point, with presence of correlated predictors, the standard errors tend
to increase. And, with large standard errors, the confidence interval becomes
wider leading to less precise estimates of slope parameters.
• Additionally, when predictors are correlated, the estimated regression
coefficient of a correlated variable depends on the presence of other predictors
in the model.

Y = W0+W1*X1+W2*X2

• Coefficient W1 is the increase in Y for a unit increase in X1 while keeping X2


constant. But since X1 and X2 are highly correlated, changes in X1 would also
cause changes in X2, and we would not be able to see their individual effect on
Y.
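
To make the coefficient instability concrete, here is a small simulation sketch (not from the slides): it repeatedly draws data with true coefficients W1 = 2 and W2 = 3 and fits ordinary least squares, once with highly correlated X1 and X2 and once with nearly uncorrelated ones. The correlation level, sample size, and coefficient values are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_ols(n=100, corr=0.98):
    """Draw one dataset with correlated X1, X2 and return the OLS coefficients."""
    x1 = rng.normal(size=n)
    x2 = corr * x1 + np.sqrt(1 - corr**2) * rng.normal(size=n)   # X2 closely tracks X1
    y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)           # true W1 = 2, W2 = 3
    X = np.column_stack([np.ones(n), x1, x2])
    return np.linalg.lstsq(X, y, rcond=None)[0]

coefs_hi = np.array([fit_ols(corr=0.98) for _ in range(500)])
coefs_lo = np.array([fit_ols(corr=0.10) for _ in range(500)])
print("std of W1, W2 with corr=0.98:", coefs_hi[:, 1:].std(axis=0))
print("std of W1, W2 with corr=0.10:", coefs_lo[:, 1:].std(axis=0))
```

With the near-duplicate predictors, the individual coefficient estimates swing widely from sample to sample, even though the overall fit is fine.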
What Causes Multicollinearity?

• Poorly designed experiments, highly observational data, or the inability to manipulate the data.

• Multicollinearity can also occur when new variables are created that are dependent on other variables.
• For example, creating a BMI variable from the height and weight variables would include redundant information in the model, and the new variable would be highly correlated with them.

• Including identical variables in the dataset.
• For example, including variables for temperature in Fahrenheit and temperature in Celsius.

• Insufficient data can, in some cases, also cause multicollinearity problems.
Multicollinearity

• How to check: one can use a scatter plot to visualize the correlation among variables.

• One can also use the VIF (Variance Inflation Factor).

• VIF measures the strength of the correlation between an independent variable and the other independent variables. It is computed by taking a variable and regressing it against every other independent variable, which gives an R^2 value for that regression; VIF = 1 / (1 - R^2).

• The closer the R^2 value is to 1, the higher the VIF and the higher the multicollinearity associated with that particular independent variable.

• VIF starts at 1 and has no upper limit.
• VIF = 1: no correlation between the independent variable and the other variables.
• VIF exceeding 5 or 10 indicates high multicollinearity between this independent variable and the others.
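
As an illustration of this check, the sketch below computes VIFs with statsmodels; the DataFrame and its column names are hypothetical stand-ins for the employee dataset used in the example on the next slide.

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Hypothetical independent variables (column names are assumptions for the demo)
df = pd.DataFrame({
    "Gender": [0, 1, 1, 0, 1, 0, 0, 1],
    "Age": [25, 32, 41, 28, 54, 36, 47, 30],
    "Years_of_service": [2, 8, 18, 4, 30, 12, 22, 6],
    "Education_level": [1, 2, 1, 0, 2, 1, 2, 1],
})

X = add_constant(df)   # include an intercept column before computing VIFs
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif.drop("const"))   # 'Age' and 'Years_of_service' should show large VIFs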
Multicollinearity - Example

Gender (0 – female, 1 – male)
Education level (0 – no formal education, 1 – under-graduation, 2 – post-graduation)

• We can see here that 'Age' and 'Years of service' have high VIF values, meaning they can be predicted by the other independent variables in the dataset.
Fixing Multicollinearity - Example

• We were able to drop the variable ‘Age’ from the dataset because its information was
being captured by the ‘Years of service’ variable.

• This has reduced the redundancy in our dataset.

• Dropping variables should be an iterative process, starting with the variable that has the largest VIF value, because its trend is largely captured by the other variables.

• If you do this, you will notice that the VIF values of the other variables decrease too, although to varying extents.

• In our example, after dropping the 'Age' variable, the VIF values of all remaining variables decreased to varying degrees.
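
A minimal sketch of that iterative procedure is below; it reuses the statsmodels helpers and the hypothetical df from the earlier VIF sketch, and the threshold of 5 is just a common rule of thumb.

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

def drop_high_vif(df, threshold=5.0):
    """Repeatedly drop the variable with the largest VIF until all VIFs <= threshold."""
    cols = list(df.columns)
    while len(cols) > 1:
        X = add_constant(df[cols])
        vifs = pd.Series(
            [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
            index=X.columns,
        ).drop("const")
        if vifs.max() <= threshold:
            break
        cols.remove(vifs.idxmax())   # e.g. 'Age' is dropped first in the example above
    return df[cols]

# reduced = drop_high_vif(df)   # df: the hypothetical DataFrame from the previous sketch
```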
Need to fix Multicollinearity

• Multicollinearity may not be a problem every time. The need to fix multicollinearity depends primarily on the following considerations:

1. When you care about how much each individual feature, rather than a group of features, affects the target variable, then removing multicollinearity may be a good option.

2. If multicollinearity is not present among the features you are interested in, then it may not be a problem.
Regularization methods

 Ridge regression

 Lasso regression

 Elastic regression
Regularization: An Overview

Common regularizers

sum of the weights: Σ_j |w_j|

sum of the squared weights: Σ_j w_j^2

What's the difference between these?

• Squared weights penalize large values more

• The sum of the weights penalizes small values relatively more
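
To see the difference numerically, here is a tiny sketch (illustrative numbers only) comparing the two penalties for a weight vector of many small values versus one with a single large value of the same L1 norm.

```python
import numpy as np

many_small = np.array([0.5] * 8)            # eight small weights
one_large = np.array([4.0] + [0.0] * 7)     # one large weight, same L1 norm

for name, w in [("many small", many_small), ("one large", one_large)]:
    print(f"{name}: sum |w| = {np.abs(w).sum():.1f}, sum w^2 = {(w**2).sum():.2f}")
# Both vectors have sum |w| = 4.0, but the single large weight has a far bigger
# squared penalty, so the L2 regularizer punishes large individual values much more.
```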
Ridge Regression
 The regularized loss function is:

L_ridge(b) = MSE(b) + λ * Σ_j b_j^2

 Note that Σ_j b_j^2 = ||b||_2^2 is the square of the l2 norm of the vector b
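
For reference, here is a minimal scikit-learn sketch of ridge regression on synthetic data; alpha plays the role of λ, and the values are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(size=100)

# Standardize the features so the penalty treats all coefficients on the same scale
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
print(model.named_steps["ridge"].coef_)   # shrunk towards zero, none exactly zero
```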

LASSO Regression
 The regularized loss function is:

L_LASSO(b) = MSE(b) + λ * Σ_j |b_j|

 Note that Σ_j |b_j| = ||b||_1 is the l1 norm of the vector b
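
A matching scikit-learn sketch for the lasso, on the same kind of synthetic data (alpha again stands in for λ and is an arbitrary choice):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(size=100)

model = make_pipeline(StandardScaler(), Lasso(alpha=0.5)).fit(X, y)
print(model.named_steps["lasso"].coef_)   # the weakest features are typically driven exactly to zero
```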

Choosing λ
 In both Ridge and LASSO regression, we see that the larger our choice of the regularization parameter λ, the more heavily we penalize large values in b.
• If λ is close to zero, we recover the MSE, i.e. ridge and LASSO regression are just ordinary regression.

• If λ is sufficiently large, the MSE term in the regularized loss function will be insignificant and the regularization term will force b_ridge and b_LASSO to be close to zero.

 To avoid ad-hoc choices, we should select λ using cross-validation.
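
A minimal sketch of that cross-validated choice with scikit-learn's RidgeCV and LassoCV (the alpha grid and data are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(size=200)

# Ridge: pick lambda (alpha) from an explicit grid via 5-fold cross-validation
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)
# Lasso: LassoCV builds its own alpha path and cross-validates over it
lasso = LassoCV(cv=5, random_state=0).fit(X, y)

print("chosen ridge alpha:", ridge.alpha_)
print("chosen lasso alpha:", lasso.alpha_)
```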

Ridge Regularization

Ridge Regularization - Example
Ridge visualized

The ridge estimator is where the constraint and the loss intersect. The values of the coefficients decrease as lambda increases, but they are not nullified.
Ridge regularization: step by step
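
Since the slide's worked steps are not reproduced here, below is a minimal NumPy sketch of the closed-form ridge solution, assuming standardized features and centred targets so that no intercept needs to be penalized; the data and lambda values are illustrative.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: b = (X^T X + lam * I)^(-1) X^T y."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.5]) + rng.normal(size=100)

for lam in [0.0, 1.0, 10.0, 100.0]:
    print(lam, np.round(ridge_fit(X, y, lam), 3))   # coefficients shrink as lambda grows
```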

Lasso Regression

• The objective function is not differentiable at zero, so there is no closed-form solution; it is typically solved with coordinate descent or subgradient methods rather than standard gradient descent.
LASSO visualized

The Lasso estimator tends to zero out parameters because the OLS loss can easily intersect the constraint on one of the axes. The values of the coefficients decrease as lambda increases, and are nullified fast.
Lasso regularization: step by step
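
The slide's steps are not reproduced here, so below is a minimal coordinate-descent sketch with the soft-thresholding update, for the objective (1/(2n))·||y - Xb||^2 + λ·||b||_1; the data, lambda, and iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Soft-thresholding operator: the solution of the one-dimensional lasso problem."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]        # residual with feature j's contribution removed
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)           # standardize the columns
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(size=200)
y = y - y.mean()

print(np.round(lasso_cd(X, y, lam=0.1), 3))         # weak coefficients end up exactly zero
```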

Lasso vs. Ridge Regression
 The lasso has a major advantage over ridge regression, in that it produces simpler and more interpretable models that involve only a subset of the predictors.

 The lasso leads to qualitatively similar behavior to ridge regression, in that as λ increases, the variance decreases and the bias increases.

 The lasso can generate more accurate predictions than ridge regression when the response depends on only a small subset of the predictors.

 Cross-validation can be used in order to determine which approach is better on a particular data set.
Lasso vs. Ridge Regression

Description
• Ridge: Ridge regression, also known as Tikhonov regularization, is a technique that introduces a penalty term to the linear regression model to shrink the coefficient values.
• Lasso: Lasso regression, or Least Absolute Shrinkage and Selection Operator, is a regularization method that also includes a penalty term but can set some coefficients exactly to zero, effectively selecting relevant features.

Penalty Type
• Ridge: Ridge regression utilizes an L2 penalty, which adds the sum of the squared coefficient values multiplied by a tuning parameter (lambda).
• Lasso: Lasso regression employs an L1 penalty, which sums the absolute values of the coefficients multiplied by lambda.

Coefficient Impact
• Ridge: The L2 penalty discourages large coefficient values, pushing them towards zero but never exactly reaching zero. This shrinks the less important features' impact.
• Lasso: The L1 penalty can drive some coefficients to exactly zero when the lambda value is large enough, performing feature selection and resulting in a sparse model.

Feature Selection
• Ridge: Ridge regression retains all features in the model, reducing the impact of less important features by shrinking their coefficients.
• Lasso: Lasso regression can set some coefficients to zero, effectively selecting the most relevant features and improving model interpretability.

Use Case
• Ridge: Ridge regression is useful when the goal is to minimize the impact of less important features while keeping all variables in the model.
• Lasso: Lasso regression is preferred when the goal is feature selection, resulting in a simpler and more interpretable model with fewer variables.

Model Complexity
• Ridge: Ridge regression tends to favor a model with a higher number of parameters, as it shrinks less important coefficients but keeps them in the model.
• Lasso: Lasso regression can lead to a less complex model by setting some coefficients to zero, reducing the number of effective parameters.

Sparsity
• Ridge: Ridge regression does not yield sparse models, since all coefficients remain non-zero.
• Lasso: Lasso regression can produce sparse models by setting some coefficients to exactly zero.

Sensitivity
• Ridge: More robust and less sensitive to outliers compared to lasso regression.
• Lasso: More sensitive to outliers due to the absolute value in the penalty term.

Interpretability
• Ridge: The results of ridge regression may be less interpretable due to the inclusion of all features, each with a reduced but non-zero coefficient.
• Lasso: Lasso regression can improve interpretability by selecting only the most relevant features, making the model's predictions more explainable.
Elastic Regression
• If there is a group of highly correlated variables, then the LASSO tends to select one variable from the group and ignore the others.

• Elastic net regression addresses this by combining the L1 and L2 penalties, so correlated predictors tend to be kept or dropped together.
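
Assuming scikit-learn's ElasticNet as the implementation, here is a minimal sketch of that behaviour on synthetic data with a group of three nearly identical predictors; alpha and l1_ratio are arbitrary illustrative values, and the exact coefficients depend on the random seed.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
group = z + 0.01 * rng.normal(size=(200, 3))     # three nearly identical predictors
others = rng.normal(size=(200, 2))               # two unrelated predictors
X = np.hstack([group, others])
y = group.sum(axis=1) + rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)   # mixes the L1 and L2 penalties

print("lasso:", np.round(lasso.coef_, 2))   # tends to concentrate weight on one group member
print("enet :", np.round(enet.coef_, 2))    # tends to spread weight across the correlated group
```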
