
COURSE TITLE: MACHINE LEARNING

COURSE CODE: Arln-6041

ASSIGNMENT-2: L2 REGULARIZATION (RIDGE REGRESSION)

PROBLEM TITLE: HOUSING PRICE PREDICTION

GROUP MEMBERS        ID

ABIGYA AYELE         GSR/2683/17

EMANDA HAILU         GSR/5056/17

SUBMISSION DATE: JANUARY 09, 2025


SUBMITTED TO: DR. FANTAHUN BOGALE
Contents

L2 Regularization (Ridge Regression)
  What Does L2 Regularization Do?
  How It Works
  Purpose of L2 Regularization
  Mathematical Implementation of L2 Regularization
  What Happens When We Adjust α?
  Best Use Cases for L2 Regularization (Ridge Regression)
    1. Handling Multicollinearity
    2. Model Interpretability is Not a Primary Concern
    3. High-Dimensional Datasets
  Limitations of L2 Regularization
    1. No Feature Selection
    2. Bias-Variance Tradeoff
    3. Sensitivity to Outliers
  Bias and Variance Tradeoff in detail
Housing price prediction with the Ridge regularization technique
  Problem statement
  Business Objective
  Code Implementations
  Comparing and Contrasting Model Performance Before and After Regularization
    Performance Before Regularization
    Performance After Regularization
    Analysis of Differences
  Conclusions
  References

L2 Regularization (Ridge Regression)

What Does L2 Regularization Do?


L2 regularization, also known as ridge regression, is a technique used to help
prevent overfitting in machine learning models, especially when there are many
features (variables) involved.

How It Works
●​ Adding a Penalty: When we train our model, we want not only to minimize
the loss function (the error) but also to keep the model simple. L2
regularization does this by adding a "penalty" to the loss function based on
the size of the coefficients (the numbers that represent how much each
feature contributes to the prediction).
●​ Squaring the Coefficients: Specifically, L2 regularization adds a term that is
proportional to the square of each coefficient. This means if any coefficient
becomes too large, it will significantly increase the penalty. The formula
looks something like this:

    Loss = Σᵢ (yᵢ − ŷᵢ)² + α Σⱼ βⱼ²

where yᵢ is the actual value, ŷᵢ is the predicted value, and βⱼ are the coefficients. Here, α is a number that controls how much we want to penalize large coefficients.
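To make the penalty concrete, the following is a minimal sketch (not the assignment's code) that computes this penalized loss with NumPy for a given coefficient vector; the variable names are ours and the intercept is ignored for simplicity.

import numpy as np

def ridge_loss(X, y, beta, alpha):
    # Sum of squared errors plus the L2 penalty alpha * sum(beta^2)
    residuals = y - X @ beta
    return np.sum(residuals ** 2) + alpha * np.sum(beta ** 2)

# Larger coefficients inflate the penalty quadratically
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]])
y = np.array([3.0, 3.0, 7.0])
print(ridge_loss(X, y, np.array([1.0, 1.0]), alpha=1.0))
print(ridge_loss(X, y, np.array([5.0, -3.0]), alpha=1.0))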

Purpose of L2 Regularization

The primary purpose of L2 regularization is to reduce overfitting in predictive
models, especially in linear regression. By penalizing large coefficients, it
encourages simpler models that generalize better to unseen data. The penalty term
is the squared sum of the coefficients, which discourages any single coefficient

from becoming excessively large, thus stabilizing the model's predictions across
different datasets [1][2].

Mathematical Implementation of L2 Regularization

In ridge regression, the loss function is modified to include a penalty term:

    Loss(β) = ||y − Xβ||² + α ||β||²

where:
●​ X represents the input data.


●​ y is the target variable.
●​ β denotes the coefficients associated with the features.
●​ α is the regularization strength.

The first term measures the error between predicted and actual values, while the
second term penalizes large coefficients. As α increases, more regularization is
applied, leading to smaller coefficient values but potentially higher bias [3][4].
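For illustration, this loss has a closed-form minimizer, β = (XᵀX + αI)⁻¹Xᵀy. Below is a minimal NumPy sketch of it (our own example, assuming centered data so the intercept can be ignored); note how the estimated coefficients shrink toward zero as α grows.

import numpy as np

def ridge_closed_form(X, y, alpha):
    # beta = (X^T X + alpha * I)^-1 X^T y; the penalty adds alpha to the diagonal
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)   # solve() is more stable than an explicit inverse

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)
for alpha in (0.0, 1.0, 100.0):
    print(alpha, ridge_closed_form(X, y, alpha).round(3))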

What Happens When We Adjust α?

●​ Increasing α: When you increase α, you make the penalty for large
coefficients stronger. This encourages the model to keep coefficients smaller,
which can help prevent overfitting. However, it may also lead to a simpler
model that doesn’t capture all patterns in the data very well—this can
increase bias (the model might miss some important relationships).
●​ Decreasing α: When you decrease α, you reduce the penalty on large
coefficients. This allows your model to fit more closely to your training data,
which can reduce bias but may lead to overfitting if it captures too much
noise from that data.
Understanding the bias-variance tradeoff lets us adjust α to strike a good balance. A good balance between bias and variance helps ensure that your model performs well not just on the training data but also in real-world situations.
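The effect of changing α can be seen directly with scikit-learn's Ridge estimator. This is a small sketch on synthetic data (the dataset and the alpha values are ours, chosen only for illustration): as α grows, the coefficient norm shrinks and the training score drops.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic data, used only to illustrate the effect of alpha
X, y = make_regression(n_samples=200, n_features=20, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for alpha in (0.01, 1.0, 100.0):
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:>6}: "
          f"train R2={model.score(X_train, y_train):.3f}, "
          f"test R2={model.score(X_test, y_test):.3f}, "
          f"coefficient norm={np.linalg.norm(model.coef_):.1f}")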

Best Use Cases for L2 Regularization (Ridge Regression)

L2 regularization, also known as ridge regression, is a powerful technique in
machine learning that helps improve model performance in various scenarios. Here
are some of the best use cases:

1. Handling Multicollinearity
What is Multicollinearity?
●​ Multicollinearity occurs when two or more independent variables (features)
in a dataset are highly correlated. For example, if you have both "height in
centimeters" and "height in inches" as features, they provide similar
information, which can confuse the model.
Why is it a Problem?
●​ When features are highly correlated, it can lead to unstable estimates of the
coefficients in a regression model. This means that small changes in the data
can lead to large changes in the model's predictions.
How L2 Regularization Helps:
●​ L2 regularization helps by distributing the influence of these correlated
features across all of them instead of allowing one feature to dominate the
model. It does this by shrinking the coefficients of all correlated features
towards each other, which stabilizes the model and makes it more reliable.
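A small hypothetical example of this effect (our own synthetic data, not the housing dataset): two nearly identical features destabilize ordinary least squares, while ridge shares the weight between them.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.001, size=200)   # nearly duplicate feature -> multicollinearity
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=200)

print(LinearRegression().fit(X, y).coef_)   # typically large, offsetting coefficients
print(Ridge(alpha=1.0).fit(X, y).coef_)     # stable coefficients of similar size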

2. Model Interpretability is Not a Primary Concern
What Does This Mean?
●​ In some cases, understanding exactly how each feature contributes to the
prediction is not as important as getting accurate predictions overall. For
example, in complex models used for tasks like image recognition, knowing
the exact contribution of each pixel may not be necessary.
Why Choose L2 Regularization?
●​ Unlike L1 regularization (lasso), which can eliminate some features entirely
by setting their coefficients to zero (making them irrelevant), L2
regularization keeps all features in the model. This means you won't lose any
potentially useful information, even if you don’t fully understand how each
feature affects the outcome.

3. High-Dimensional Datasets
What Are High-Dimensional Datasets?
●​ High-dimensional datasets have a large number of features compared to the
number of observations (data points). For instance, if you have thousands of
features but only a few hundred samples, this situation can lead to
overfitting.
Challenges with High Dimensions
●​ In high-dimensional spaces, models can easily become overly complex and
fit noise rather than actual patterns in the data. This makes them perform
poorly on new, unseen data.
Benefits of L2 Regularization:
●​ L2 regularization is particularly effective in these situations because it helps
control the complexity of the model by shrinking all coefficients towards
zero without eliminating any features. This ensures that even with many
predictors, the model remains stable and performs well on new data.

In summary, L2 regularization is beneficial in several scenarios:


●​ It effectively manages multicollinearity by distributing influence among
correlated features.
●​ It retains all features in the model when interpretability is less critical.
●​ It maintains performance in high-dimensional datasets by controlling model
complexity.
●​ By applying L2 regularization appropriately, you can create robust models
that generalize well to new data while still utilizing all available information
from your dataset [1][2][5].

Limitations of L2 Regularization
While L2 regularization is a powerful technique for improving model performance,
it does have some drawbacks. Here are the main limitations:

1. No Feature Selection
What Does This Mean?
●​ In machine learning, feature selection is the process of identifying and
keeping only the most important variables (features) for making predictions.
Some methods, like L1 regularization (lasso), can completely remove less
important features by setting their coefficients to zero.
L2 Regularization's Approach
●​ L2 regularization does not perform feature selection. Instead of eliminating
features, it shrinks all coefficients towards zero but never actually sets any of
them to zero. This means that even if a feature is not useful or relevant to the
prediction, it will still remain in the model with a small coefficient.
Why Is This a Problem?
●​ Keeping irrelevant features can make the model more complex and harder to
interpret. It may also lead to longer training times and less efficient models
because the model still considers all features, even those that don’t contribute
meaningful information [1][4].
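The contrast with L1 regularization can be shown in a few lines (a sketch on synthetic data, with an alpha we picked arbitrarily): lasso sets many coefficients exactly to zero, while ridge keeps every feature with a nonzero weight.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# Only 5 of the 30 features are actually informative in this synthetic problem
X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

print("Ridge coefficients set to zero:", np.sum(ridge.coef_ == 0))   # usually 0
print("Lasso coefficients set to zero:", np.sum(lasso.coef_ == 0))   # usually many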

2. Bias-Variance Tradeoff
What Is Bias and Variance?
●​ Bias refers to the error introduced by approximating a real-world problem
with a simplified model. High bias can cause a model to miss important
patterns (underfitting).
●​ Variance refers to how much the model's predictions change when it is
trained on different datasets. High variance can cause a model to fit noise in
the training data (overfitting).
How Does Regularization Affect Bias and Variance?
When you increase the strength of L2 regularization (the parameter α), you make
the penalty for large coefficients stronger:
●​ This typically leads to higher bias because the model becomes simpler and
may miss some relationships in the data.
●​ It also leads to lower variance because the model becomes less sensitive to
fluctuations in training data.
Why Is This Important?
While increasing regularization can help improve how well the model performs on
new data (generalization), it may also result in poorer performance on training data
because it oversimplifies the model. Striking the right balance between bias and
variance is crucial for creating effective models [2][6].

3. Sensitivity to Outliers
What Are Outliers?
●​ Outliers are data points that are significantly different from others in your
dataset. For example, if you're predicting house prices and one house costs
$10 million while most others cost around $300,000, that $10 million house
is an outlier.
How Does L2 Regularization Respond to Outliers?
●​ L2 regularization squares the coefficients when calculating its penalty term.
This means that larger coefficients (which might be influenced by outliers)
will have an even greater penalty because squaring amplifies their value.
Why Is This a Concern?
●​ If there are outliers in your data, they can disproportionately affect the
model's performance because they can lead to larger coefficient values,
which then receive higher penalties during training. As a result, the model
may become skewed by these outliers, leading to less reliable predictions
[7].

Therefore, while L2 regularization has many benefits, it also has limitations:


●​ It does not eliminate irrelevant features from the model, which can
complicate interpretation and efficiency.
●​ Increasing regularization strength can lead to higher bias (simplifying too
much) and lower variance (making it more stable), which might hurt
performance on training data.

●​ It can be sensitive to outliers, which may disproportionately affect how well
the model performs.

Bias and Variance Tradeoff in detail

What is Bias?
●​ Definition: Bias refers to the error introduced when a model makes
assumptions about the data that simplify the problem too much. This
simplification can lead to missing important patterns in the data.
●​ Example: Imagine you are trying to predict the price of houses based on
their size. If you use a very simple model, like a straight line (linear
regression), it might predict house prices based only on size without
considering other important factors like location or number of bedrooms.
This model might consistently predict prices that are too low or too high because it
ignores these other factors. This is an example of high bias, which can lead to
underfitting—the model is too simple to capture the underlying trends in the data.

What is Variance?
●​ Definition: Variance refers to how much a model's predictions change when
it is trained on different subsets of data. A model with high variance pays too
much attention to the training data, including its noise and outliers.
●​ Example: Now, imagine you use a very complex model, like a polynomial
regression with many curves and bends, to fit your house price data. This
model might fit the training data perfectly, capturing every fluctuation and
detail.
However, when you test it on new data (houses not included in the training set), it
may perform poorly because it has learned patterns that are not actually present in
general (like noise). This is an example of high variance, leading to
overfitting—the model is too complex and captures noise instead of the true signal.

The Tradeoff Between Bias and Variance


In machine learning, there is a tradeoff between bias and variance:
●​ Increasing Model Complexity: As you make your model more complex
(e.g., by adding more features or using more flexible algorithms), bias
typically decreases because the model can capture more patterns in the data.
However, variance increases because the model becomes more sensitive to
fluctuations in the training data.
●​ Decreasing Model Complexity: Conversely, if you simplify your model
(e.g., by using fewer features or a simpler algorithm), bias increases because
the model may miss important patterns. At the same time, variance decreases
since the model is less sensitive to specific details in the training data.
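This tradeoff can be illustrated with a quick sketch (our own synthetic example using polynomial regression, not part of the assignment): a degree-1 model underfits, a very high-degree model typically overfits, and an intermediate degree balances the two.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(80, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(scale=0.3, size=80)   # noisy nonlinear target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(f"degree={degree:>2}: train R2={model.score(X_tr, y_tr):.2f}, "
          f"test R2={model.score(X_te, y_te):.2f}")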

How Regularization Affects Bias and Variance


Regularization techniques like L2 regularization (ridge regression) help control this
tradeoff:
●​ Increasing Regularization Strength (α): When you increase the regularization
strength:
○​ The penalty for large coefficients becomes stronger.
○​ The model simplifies as coefficients are pushed closer to zero.
○​ This typically leads to higher bias because you may miss some
relationships in the data.
○​ It also leads to lower variance, making the model less sensitive to
fluctuations in training data.

Why Is This Important?
Understanding the bias-variance tradeoff is crucial for building effective machine
learning models:
●​ Generalization: The goal of any predictive model is to generalize well to
new, unseen data. A good balance between bias and variance helps ensure
that your model performs well not just on training data but also on
real-world situations.
●​ Model Selection: By recognizing where your current model lies on the
bias-variance spectrum, you can make informed decisions about whether to
simplify or complicate your model, or whether to apply regularization
techniques.
●​ Performance Evaluation: Monitoring both bias and variance helps you
identify issues like underfitting (high bias) or overfitting (high variance)
during training. You can adjust your approach accordingly based on these
insights.

Therefore, understanding bias and variance—and their tradeoff—is essential for
building effective machine learning models:
●​ Bias represents errors due to overly simplistic assumptions about the data.
●​ Variance represents errors due to excessive sensitivity to fluctuations in
training data.
●​ Striking a balance between these two types of errors allows for better
generalization to new data.
●​ Regularization techniques can help manage this balance by controlling
model complexity.

Housing price prediction with the Ridge regularization technique

Problem statement
A US-based real estate firm has decided to enter the Australian market. Using data
analytics, the business buys homes below their true value and then sells them at a
profit. For this purpose, the business has gathered a dataset of Australian home
sales.
The business is searching for prospective properties to purchase in order to enter
the market. To determine whether to invest in these properties, we must use
regularization to construct a regression model that predicts their true value.

The business is interested in knowing:


●​ Which factors are important in determining a home's price, and how
effectively do those factors capture the price of a home?
●​ What is the optimal lambda (regularization strength) value for ridge and lasso regression?

Business Objective
We must use the available independent variables to model the cost of homes. The
management will then utilize this model to determine the precise way in which the
prices change with the variables. As a result, they can influence the company's
strategy and focus on areas that will generate large profits. Additionally,
management will be able to comprehend the price dynamics of a new market with
the help of the model.

Dataset: The dataset has been submitted together with the data dictionary.

Code Implementations
We have uploaded our code to GitHub; below, we describe the steps we took to
implement it.

1. Understanding the Dataset


Step: Load the Dataset
Substeps:

●​ Import necessary libraries like pandas and numpy.


●​ Load the dataset using a function like pd.read_csv().
●​ Perform exploratory data analysis (EDA) to understand its structure.
○​ Use .head() to preview data.
○​ Use .info() to check column types and missing values.
○​ Use .describe() to observe summary statistics.

Why: This step provides an overview of the dataset's features and potential issues,
such as missing values or inconsistent data types, that need resolution.
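A minimal sketch of this step is shown below; the file name housing.csv is a placeholder, since the actual dataset is supplied separately with the assignment.

import pandas as pd

df = pd.read_csv("housing.csv")   # placeholder file name

print(df.head())          # preview the first rows
df.info()                 # column types and non-null counts
print(df.describe())      # summary statistics for numeric columns
print(df.isnull().sum())  # missing values per column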

2. Data Preprocessing
Step: Prepare Data for Modeling
Substeps:
●​ Handle Missing Values:
a.​ Fill in missing values using strategies like mean, median, or mode.
b.​ Alternatively, drop rows or columns with excessive missing data.
●​ Encode Categorical Variables:
a.​ Use one-hot encoding or label encoding to convert text labels into
numeric format.
●​ Normalize/Scale Features:
a.​ Apply standardization using StandardScaler or MinMaxScaler to bring
features to a similar scale.
Why: Proper preprocessing ensures data consistency and improves model
performance by avoiding scale-related biases.
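A sketch of this step using scikit-learn pipelines follows; the target column name "price" and the automatic column selection are assumptions for illustration, since the real column names come from the data dictionary.

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X = df.drop(columns=["price"])   # "price" is the assumed target column
y = df["price"]

numeric_cols = X.select_dtypes(include="number").columns
categorical_cols = X.select_dtypes(exclude="number").columns

preprocessor = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])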

3. Splitting the Data


Step: Divide Dataset
Substeps:
●​ Use the train_test_split() function from sklearn.model_selection.
●​ Allocate 80% of the data for training and 20% for testing.
Why: Splitting ensures that model evaluation is unbiased, simulating performance
on unseen data.
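A short sketch of the split, assuming the X and y built in the previous step:

from sklearn.model_selection import train_test_split

# 80/20 split; random_state fixes the shuffle so results are reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)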

4. Implementing Ridge Regression


Step: Train and Evaluate Ridge Model
Substeps:

●​ Import the Ridge model from sklearn.linear_model.


●​ Initialize the model with a regularization parameter (alpha), which controls
the strength of the penalty.
●​ Fit the model using the training data with .fit().
●​ Make predictions on the training and testing sets using .predict().
●​ Calculate performance metrics:
●​ Mean Squared Error (MSE): Measures prediction error.
●​ R-squared: Indicates the proportion of variance explained by the
model.
Why: Ridge regression ensures all features contribute to predictions but reduces
the impact of less important ones.
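A sketch of this step, reusing the preprocessing pipeline from step 2; alpha=2 here reflects the value found to work best in the analysis below, not a universal default.

from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline

ridge = make_pipeline(preprocessor, Ridge(alpha=2.0))
ridge.fit(X_train, y_train)

for name, X_part, y_part in (("train", X_train, y_train), ("test", X_test, y_test)):
    pred = ridge.predict(X_part)
    print(f"{name}: MSE={mean_squared_error(y_part, pred):.2f}, "
          f"R2={r2_score(y_part, pred):.4f}")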

Comparing and Contrasting Model Performance Before and After Regularization

This section compares the performance of the ML model before and after
regularization using Ridge regression. It evaluates the trade-offs between bias and
variance, as well as the impact of hyperparameter tuning on the model's
performance. The R² score, a metric that indicates how well the model explains the
variance in the data, is used to assess both training and test set performance.

Performance Before Regularization

Before applying Ridge regression, the model achieved an R² score of 0.8485 on the
training set and 0.8374 on the test set. This performance highlights a
well-performing model, explaining approximately 85% of the variance in the data.
Notably, the gap between training and test scores indicates that the model
generalizes well to unseen data, with minimal overfitting.

However, despite this solid performance, the absence of regularization means that
the model likely assigns high importance to all features, including less relevant
ones. This can lead to overfitting in scenarios where the dataset is noisy or has
irrelevant features. Without regularization, the model becomes sensitive to small
changes in the training data, which reduces its robustness.

The high R² score for the training set demonstrates that the model closely fits the
training data, potentially capturing noise or minor fluctuations that do not
generalize well. While the test performance was strong in this case, relying on such
a model without addressing overfitting could lead to issues if the data distribution
changes or new datasets introduce unseen complexities.

Performance After Regularization

After applying Ridge regression, the R² score for the training set decreased to
0.8461, and the test set score dropped to 0.8247. This change reflects the impact of
introducing a regularization penalty, which prevents the model from assigning
excessively high weights to certain features.

As the regularization parameter (alpha) increases, the R² score for the training set
consistently decreases. This behavior is expected because higher alpha values
introduce a greater penalty for large coefficients, making the model simpler and
more generalized. By discouraging the model from closely fitting the training data,
ridge regression reduces the likelihood of overfitting. However, this also means
that the model sacrifices some of its ability to capture intricate patterns, leading to
a slight increase in error and a corresponding drop in the R² score.

On the test set, Ridge regression introduced a more complex dynamic. Initially,
with very low values of alpha, the test error was high, and the R² score was low.
This indicates that the model was overfitting the training data, as evidenced by the
discrepancy between high training performance and low test performance. As
alpha increased, the test error decreased, and the R² score improved, reaching its
peak at alpha = 2, where the test R² score was approximately 81%. Beyond this
point, further increases in alpha caused the R² score to decline, signaling that the
model had become overly simplistic and was underfitting the data.

This behavior underscores the importance of choosing an optimal alpha value. In
this case, alpha = 2 gave the right balance between bias and variance, creating a
simpler yet effective model. The regularized model at this stage was better
generalized and less prone to overfitting while maintaining strong predictive
performance on unseen data.
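One way to search for such an alpha is cross-validated grid search. The sketch below reuses the pipeline from the implementation section; the candidate grid is our own choice, not the exact grid used in the assignment.

from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

param_grid = {"ridge__alpha": [0.01, 0.1, 0.5, 1, 2, 5, 10, 50, 100]}
search = GridSearchCV(make_pipeline(preprocessor, Ridge()),
                      param_grid, scoring="r2", cv=5)
search.fit(X_train, y_train)

print("best alpha:", search.best_params_["ridge__alpha"])
print("test R2:", search.best_estimator_.score(X_test, y_test))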

Analysis of Differences

The observed differences between pre- and post-regularization performance can be
explained by the trade-offs inherent in regularization:

1.​ Bias-Variance Tradeoff:


○​ Before regularization, the model's lower bias allowed it to fit the
training data more precisely, resulting in higher R² scores for both
datasets. However, this precision came at the risk of overfitting, as the
model's complexity made it sensitive to noise and irrelevant patterns
in the training data.
○​ After regularization, introducing alpha increased the model's bias
while reducing its variance. This led to a more generalized model.
2.​ Hyperparameter Tuning:
○​ The choice of alpha is critical to achieving optimal performance. In
this case, alpha = 2 provided the best balance, yielding the highest test
R² score while preventing overfitting. Selecting an alpha value that is
too high or too low disrupts this balance, either by under-regularizing
(allowing overfitting) or over-regularizing (leading to underfitting).

Conclusions

Ridge regression plays a crucial role in addressing overfitting by introducing
constraints on model complexity. In this analysis, Ridge regression helped reduce
the risk of overfitting by penalizing large coefficients, which allowed the model to
focus on the most predictive features rather than capturing noise or irrelevant
patterns in the training data.

Before regularization, the model achieved high R² scores for both training and test
sets, indicating strong performance but with a potential risk of overfitting. After
applying Ridge regression, the training R² score decreased, reflecting the reduction
in overfitting as the model became less tailored to the training data. On the test set,
the R² score dropped as well, as the model became more generalized, balancing its
ability to predict unseen data. Regularization effectively ensured that the model
focused on meaningful patterns, even if it meant sacrificing some predictive power
in exchange for robustness.

Overall, Ridge regression was instrumental in creating a simpler and more robust
model. By introducing regularization and carefully selecting the alpha parameter,
the model achieved a better balance between bias and variance, ensuring reliable
performance on both training and test datasets. This highlights the importance of
using regularization techniques like Ridge regression, particularly when overfitting
is a concern, to build models that generalize effectively to unseen data.

References
[1] https://builtin.com/data-science/l2-regularization
[2] https://www.ibm.com/think/topics/ridge-regression
[3] https://scikit-learn.org/dev/auto_examples/linear_model/plot_ridge_coeffs.html
[4] https://www.dataquest.io/blog/regularization-in-machine-learning/
[5] https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c?gi=5ac6fe73dad4
[6] https://www.linkedin.com/pulse/l1-l2-regularization-why-neededwhat-doeshow-helps-ravi-shankar
[7] https://www.researchgate.net/publication/342725398_Ridge_Regularization_An_Essential_Concept_in_Data_Science

