Lecture 13 - Regularization
Regression - Regularization
Overfitting and underfitting
Bias and variance
Bias error
• How far the predicted value is from the true value
• The systematic error of the model
• It is about the model and the data themselves
Variance error
• The error caused by sensitivity to small fluctuations in the training data set
• The dispersion of predicted values around the target values across different training sets
• It is about the sensitivity of the model
The bias-variance curse
As the complexity of the model increases, the bias decreases but the variance increases.
There is a trade-off between bias and variance: you can't get both low bias and low variance at the same time.
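A minimal sketch of this trade-off, assuming scikit-learn; the synthetic data and the polynomial degrees are illustrative only, not from the lecture. Training error keeps falling as model complexity grows, while test error typically rises again for very complex models.

```python
# Sketch: model complexity (polynomial degree) vs. train/test error.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(80, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, size=80)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    train_mse = mean_squared_error(y_tr, model.predict(X_tr))  # falls as degree grows (lower bias)
    test_mse = mean_squared_error(y_te, model.predict(X_te))   # typically rises again for high degrees (higher variance)
    print(degree, round(train_mse, 3), round(test_mse, 3))
```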
Regularization
A regularizer is an additional criterion added to the loss function to make sure that we don't overfit.
Y = W0 + W1*X1 + W2*X2
Multicollinearity
• Multicollinearity can also occur when new variables are created that depend on other variables.
• For example, creating a BMI variable from the height and weight variables would add redundant information to the model, and the new variable would be highly correlated with them.
• How to check: one can use a scatter plot to visualize the correlation among variables, or compute the variance inflation factor (VIF).
• The VIF of an independent variable is 1 / (1 - R^2), where R^2 is obtained by regressing that variable on all the other independent variables: the closer R^2 is to 1, the higher the VIF and the higher the multicollinearity with that particular independent variable.
• We can see here that 'Age' and 'Years of service' have high VIF values, meaning they can be predicted well by the other independent variables in the dataset.
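A minimal VIF check in Python, assuming the pandas and statsmodels libraries; the data and the column names ('Age', 'Years of service', 'Salary') are hypothetical stand-ins for the lecture's dataset.

```python
# A minimal VIF check (sketch). The data below is synthetic: "Age" and
# "Years of service" are deliberately made almost collinear.
import numpy as np
import pandas as pd
from statsmodels.tools.tools import add_constant
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
age = rng.uniform(25, 60, size=200)
years_of_service = age - 22 + rng.normal(0, 1, size=200)
salary = rng.uniform(30_000, 90_000, size=200)
X = pd.DataFrame({"Age": age, "Years of service": years_of_service, "Salary": salary})

X_const = add_constant(X)  # include an intercept so the VIFs are not artificially inflated
vif = pd.DataFrame({
    "feature": X.columns,
    # VIF_j = 1 / (1 - R_j^2), computed by regressing feature j on the other features
    "VIF": [variance_inflation_factor(X_const.values, i + 1) for i in range(X.shape[1])],
})
print(vif)  # VIF values much larger than ~10 usually signal strong multicollinearity
```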
Fixing Multicollinearity - Example
• We were able to drop the variable 'Age' from the dataset because its information was already captured by the 'Years of service' variable.
• Dropping variables should be an iterative process, starting with the variable that has the largest VIF value, because its trend is largely captured by the other variables (a sketch of this procedure follows below).
• If you do this, you will notice that the VIF values of the other variables decrease too, although to varying extents.
• In our example, after dropping the 'Age' variable, the VIF values of all variables decreased to varying degrees.
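A sketch of that iterative procedure, continuing from the VIF example above (it reuses the DataFrame X defined there); the threshold of 10 is a common rule of thumb, not a value given in the lecture.

```python
from statsmodels.tools.tools import add_constant
from statsmodels.stats.outliers_influence import variance_inflation_factor

def drop_high_vif(X, threshold=10.0):
    """Repeatedly drop the feature with the largest VIF until all VIFs fall below the threshold."""
    X = X.copy()
    while X.shape[1] > 1:
        Xc = add_constant(X)
        vifs = [variance_inflation_factor(Xc.values, i + 1) for i in range(X.shape[1])]
        worst = max(range(len(vifs)), key=lambda i: vifs[i])
        if vifs[worst] < threshold:
            break
        X = X.drop(columns=[X.columns[worst]])  # drop the worst offender and recompute
    return X

X_reduced = drop_high_vif(X)  # with the synthetic data above, one of the collinear pair is dropped
print(X_reduced.columns.tolist())
```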
When do we need to fix multicollinearity?
1. When you care about how much each individual feature, rather than a group of features, affects the target variable, removing multicollinearity may be a good option.
2. If multicollinearity is not present in the features you are interested in, it may not be a problem.
Regularization methods
Ridge regression
Lasso regression
Elastic net regression
Regularization: An Overview
Common regularizers
The two most common penalties added to the loss are the L2 penalty λ·Σj βj^2 (used by ridge regression) and the L1 penalty λ·Σj |βj| (used by LASSO).
LASSO Regression
The regularized loss function is: L_LASSO(β) = MSE(β) + λ·Σj |βj|
Choosing λ
In both ridge and LASSO regression, we see that the larger our choice of the regularization parameter λ, the more heavily we penalize large values in β:
• If λ is close to zero, we recover the MSE, i.e. ridge and LASSO regression are just ordinary regression.
• If λ is sufficiently large, the MSE term in the regularized loss function will be insignificant and the regularization term will force β_ridge and β_LASSO to be close to zero.
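The lecture does not prescribe how to pick λ in practice; a common approach is cross-validation. A minimal sketch assuming scikit-learn (which names this parameter alpha):

```python
# Sketch: picking the regularization strength by cross-validation.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)
lasso = LassoCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)
print("best ridge alpha:", ridge.alpha_)
print("best lasso alpha:", lasso.alpha_)
```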
Ridge Regularization
Ridge Regularization - Example
Ridge visualized
Ridge estimator
Ridge regularization: step by step
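A minimal ridge sketch, assuming scikit-learn (λ is called alpha there): increasing the penalty shrinks the coefficients toward zero, but never exactly to zero.

```python
# Sketch: ridge shrinks coefficients as alpha (λ) grows, but never to exactly zero.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=1)
for alpha in (0.1, 10.0, 1000.0):
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(alpha, np.round(coefs, 2))  # all coefficients shrink, none become exactly 0
```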
Lasso Regression
Lasso estimator
Lasso regularization: step by step
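A matching LASSO sketch under the same scikit-learn assumption: as alpha (λ) grows, more coefficients are driven exactly to zero, which is how LASSO performs feature selection.

```python
# Sketch: LASSO drives more coefficients exactly to zero as alpha (λ) grows.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=1)
for alpha in (0.1, 1.0, 10.0):
    coefs = Lasso(alpha=alpha, max_iter=10_000).fit(X, y).coef_
    print(alpha, "non-zero coefficients:", int(np.sum(coefs != 0)))
```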
Lasso vs. Ridge Regression
The lasso has a major advantage over ridge regression in that it produces simpler and more interpretable models that involve only a subset of the predictors.
Sensitivity to outliers
• Ridge: more robust and less sensitive to outliers compared to lasso regression.
• Lasso: more sensitive to outliers due to the absolute value in the penalty term.
Interpretability
• Ridge: the results may be less interpretable due to the inclusion of all features, each with a reduced but non-zero coefficient.
• Lasso: can improve interpretability by selecting only the most relevant features, making the model's predictions more explainable.
Elastic Net Regression
• If there is a group of highly correlated variables, then LASSO tends to select one variable from the group and ignore the others.
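A minimal Elastic Net sketch, assuming scikit-learn: mixing the L1 and L2 penalties (controlled by l1_ratio) encourages correlated features to share weight rather than one of them being picked arbitrarily, as pure LASSO often does.

```python
# Sketch: Elastic Net tends to spread weight across a correlated pair of features.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)      # x2 is almost identical to x1
X = np.column_stack([x1, x2, rng.normal(size=200)])
y = 3 * x1 + rng.normal(scale=0.5, size=200)

model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(np.round(model.coef_, 2))  # weight tends to be shared between the correlated pair
```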