ML Lec-8
LECTURE-8
BY
Dr. Ramesh Kumar Thakur
Assistant Professor (II)
School Of Computer Engineering
v Regularization is a technique used in machine learning and statistical modelling to prevent overfitting
and improve the generalization ability of models.
v When a model is overfitting, it has learned the training data too well and may not perform well on
new, unseen data.
v Regularization introduces additional constraints or penalties to the model during the training process,
aiming to control the complexity of the model and avoid over-reliance on specific features or patterns in
the training data.
v By doing so, regularization helps strike a balance between fitting the training data well and
generalizing well to new data.
v The most common regularization techniques used are L1 regularization (Lasso), L2 regularization
(Ridge), and Elastic Net regularization.
v L1 regularization adds the sum of the absolute values of the model’s coefficients to the loss function,
encouraging sparsity and feature selection.
v L2 regularization adds the sum of the squared values of the model’s coefficients, which encourages
small but non-zero coefficients.
v Finally, Elastic Net regularization combines both L1 and L2 regularization.
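v To make the three penalty terms concrete, here is a minimal NumPy sketch (the coefficient vector, λ value, and the Elastic Net mixing weight l1_ratio are made-up illustrations, not values from these slides) that computes each penalty for a given set of coefficients:

```python
import numpy as np

# Hypothetical coefficient vector and regularization strength, for illustration only
beta = np.array([0.0, 1.5, -2.0, 0.3])
lam = 0.1        # regularization parameter (lambda)
l1_ratio = 0.5   # assumed mixing weight between the L1 and L2 penalties

l1_penalty = lam * np.sum(np.abs(beta))   # Lasso penalty: lambda * sum(|beta_j|)
l2_penalty = lam * np.sum(beta ** 2)      # Ridge penalty: lambda * sum(beta_j^2)

# Elastic Net blends the two penalties through the mixing weight
elastic_net_penalty = l1_ratio * l1_penalty + (1 - l1_ratio) * l2_penalty

print(l1_penalty, l2_penalty, elastic_net_penalty)
```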
v L1 regularization, also known as Lasso (Least Absolute Shrinkage and Selection Operator)
regularization, adds the sum of the absolute values of the model’s coefficients to the loss function.
v It encourages sparsity in the model by shrinking some coefficients to precisely zero.
v This has the effect of performing feature selection, as the model can effectively ignore irrelevant or
less important features.
v L1 regularization is particularly useful for high-dimensional datasets where feature selection is
desired.
v Mathematically, the L1 regularization term can be written as:
v L1 regularization = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j} |\beta_j|
v Here, λ is the regularization parameter that controls the strength of regularization, β_j represents the
individual model coefficients, and the sum is taken over all coefficients.
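v As an illustration of this objective (not part of the original slides), scikit-learn's Lasso can be used; its alpha parameter plays the role of λ, up to a constant scaling of the squared-error term. The sketch below, on made-up synthetic data, shows coefficients being driven exactly to zero:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data where only 5 of the 20 features are actually informative
X, y = make_regression(n_samples=200, n_features=20,
                       n_informative=5, noise=10.0, random_state=0)

# alpha corresponds to the regularization strength lambda (up to scaling)
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# Many coefficients are driven exactly to zero -> implicit feature selection
print("non-zero coefficients:", np.sum(lasso.coef_ != 0), "of", X.shape[1])
```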
v L2 regularization, also known as Ridge regularization, adds the sum of the squared values of the
model’s coefficients to the loss function.
v Unlike L1 regularization, L2 regularization does not force the coefficients to be exactly zero but instead
encourages them to be small.
v L2 regularization can prevent overfitting by distributing weight across correlated features rather than
letting the model rely heavily on a single one.
v It is particularly advantageous when the input features are correlated.
v Mathematically, the L2 regularization term can be written as:
v L2 regularization = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j} \beta_j^2
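v Similarly, a minimal sketch using scikit-learn's Ridge (again with alpha standing in for λ, on made-up synthetic data whose features are correlated) shows the coefficients being shrunk toward zero without becoming exactly zero:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data with correlated features (low effective rank of the design matrix)
X, y = make_regression(n_samples=200, n_features=20,
                       effective_rank=5, noise=10.0, random_state=0)

ridge = Ridge(alpha=10.0).fit(X, y)   # alpha plays the role of lambda
ols = LinearRegression().fit(X, y)    # unregularized baseline for comparison

# Ridge shrinks coefficients but (unlike Lasso) leaves them non-zero
print("max |coef| OLS  :", np.max(np.abs(ols.coef_)))
print("max |coef| Ridge:", np.max(np.abs(ridge.coef_)))
print("exactly-zero Ridge coefficients:", np.sum(ridge.coef_ == 0))
```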