
Machine Learning

Assignment Unit 3

Name - Yusuf Nathdwarawala


Roll no. - 21/CDOE/BCA/002

Q-1 a) Define dimensionality reduction and subset selection in the context of machine learning. What are the primary goals of these techniques, and how do they impact model performance and computational efficiency?

b) Discuss the main methods used for dimensionality reduction, such as Principal Component Analysis (PCA) and Feature Selection techniques. Provide an example of a situation where dimensionality reduction is beneficial.

Ans. Q-1 a) Dimensionality Reduction and Subset Selection in Machine Learning

Dimensionality Reduction:
Dimensionality reduction refers to the process of reducing the number of input variables
or features in a dataset while retaining as much information as possible. This is
important when dealing with high-dimensional data, where too many features can lead
to overfitting, increased complexity, and computational inefficiency.
● Goal: Reduce the number of features without sacrificing performance by
capturing the essential patterns in the data.
● Impact:
● Improves model performance by eliminating irrelevant or redundant
features.
● Enhances computational efficiency by reducing the amount of data the
model has to process.
● Mitigates the curse of dimensionality, where having too many features
relative to the number of observations makes the model harder to
generalize.

Subset Selection:
Subset selection is a specific type of feature selection, where a subset of the original
features is chosen to build the model. The goal is to select the most important or
relevant features for the task, often based on some criteria like statistical significance or
feature importance.
● Goal: Improve model interpretability and performance by selecting only the most
informative features.
● Impact:
● Can lead to better model accuracy by removing irrelevant or noisy data.
● Increases computational efficiency by reducing the size of the input data.

Q-1 b) Main Methods for Dimensionality Reduction

1. Principal Component Analysis (PCA):


PCA is a popular method used to reduce dimensionality by transforming the original
features into a set of linearly uncorrelated variables called principal components. These
components are ordered by how much variance they capture in the data, and only the
top components are retained.
● How it Works:
● PCA identifies the directions (principal components) that maximize the
variance in the data.
● It projects the data onto these components, reducing the number of
dimensions while retaining the majority of the information.
● Example: In an image recognition problem with thousands of pixels as features,
PCA can reduce the dimensionality by projecting the images into fewer
dimensions that still capture the main differences between images.
● Impact on Model: PCA improves computational efficiency and reduces overfitting
but may make the model harder to interpret because the original features are
transformed.
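
To make the PCA description above concrete, here is a minimal sketch using scikit-learn; the random data and the choice of two components are illustrative assumptions, not values from any particular dataset.

    # Minimal PCA sketch (illustrative data; two components chosen arbitrarily).
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))                 # 200 samples, 50 possibly redundant features

    X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scale
    pca = PCA(n_components=2)                      # keep only the top 2 components
    X_reduced = pca.fit_transform(X_scaled)

    print(X_reduced.shape)                         # (200, 2)
    print(pca.explained_variance_ratio_)           # variance captured by each component
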
2. Feature Selection Techniques:
Feature selection directly selects a subset of the original features based on their
importance or relevance to the target variable.
● Methods:
● Filter Methods: Use statistical techniques to evaluate the importance of
each feature independently of the model (e.g., correlation, mutual
information).
● Wrapper Methods: Evaluate different subsets of features by training and
testing a model on them, such as with forward selection, backward
elimination, or recursive feature elimination (RFE).
● Embedded Methods: Feature selection is performed during the model
training process itself (e.g., LASSO or decision tree-based models).
● Example: In a medical diagnosis problem, some patient features like age,
gender, and medical history may be irrelevant to the disease, and feature
selection helps to focus on the most important ones, such as specific biomarkers.
● Impact on Model: Feature selection improves interpretability, reduces training
time, and prevents overfitting by eliminating irrelevant data.
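
As a rough illustration of a filter-style method, the sketch below scores features with mutual information and keeps the top ten; the synthetic dataset and the choice of k=10 are assumptions made only for demonstration.

    # Filter-method sketch: keep the 10 features with the highest mutual information.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    X, y = make_classification(n_samples=300, n_features=40, n_informative=5, random_state=0)

    selector = SelectKBest(score_func=mutual_info_classif, k=10)
    X_selected = selector.fit_transform(X, y)

    print(X_selected.shape)                        # (300, 10)
    print(selector.get_support(indices=True))      # indices of the retained features
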

Example of Dimensionality Reduction Use Case:


In text classification tasks, you might have thousands or even millions of unique words
(features). Many of these words will be irrelevant, redundant, or rarely used.
Dimensionality reduction techniques like PCA or feature selection (e.g., removing
low-frequency words) can help reduce the number of words used in the model while
retaining the meaningful ones, improving both the accuracy and efficiency of the model.
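
A hedged sketch of that idea: the tiny corpus, the min_df=2 cutoff, and the two-dimensional projection below are all invented purely for illustration.

    # Drop rare words with min_df, then project the sparse term matrix onto a few dimensions.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = ["cheap meds online", "meeting at noon", "cheap offer today", "lunch at noon"]

    vectorizer = TfidfVectorizer(min_df=2)               # keep only words appearing in >= 2 documents
    X_tfidf = vectorizer.fit_transform(docs)

    svd = TruncatedSVD(n_components=2, random_state=0)   # PCA-like reduction for sparse text data
    X_reduced = svd.fit_transform(X_tfidf)

    print(vectorizer.get_feature_names_out())            # the surviving vocabulary
    print(X_reduced.shape)                               # (4, 2)
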

Summary:
● Dimensionality reduction helps simplify high-dimensional data, reducing
overfitting and improving computational efficiency.
● Subset selection chooses the most relevant features, improving model
performance and interpretability.
● PCA and feature selection are key methods, with different approaches depending
on the data and task.
● Dimensionality reduction is especially useful in cases like image processing or
text analysis, where the data has many features but only a subset is informative.

Q-2 a) Explain shrinkage methods in the context of linear regression. What are the primary types of shrinkage techniques, and how do they modify the regression coefficients?

b) Compare and contrast Lasso (L1 regularization) and Ridge (L2 regularization) as shrinkage methods. Discuss their advantages, limitations, and typical use cases in regression problems.

Ans. Q-2 a) Shrinkage Methods in Linear Regression

Shrinkage in Linear Regression:


Shrinkage methods in linear regression are techniques that apply a penalty to the size
of the regression coefficients. This is done to shrink or reduce the magnitude of the
coefficients, which helps prevent overfitting and improves the model’s ability to
generalize to new data.

In standard linear regression, the goal is to minimize the sum of squared errors between
the predicted and actual values. Shrinkage methods modify this by adding a penalty
term that controls the size of the coefficients, helping to keep the model simpler and
less sensitive to fluctuations in the training data.

Primary Types of Shrinkage Techniques:

1. Ridge Regression (L2 Regularization):


● Adds a penalty equal to the sum of the squared values of the coefficients.
● This forces the regression coefficients to become smaller (closer to zero)
but not exactly zero.
2. Lasso Regression (L1 Regularization):
● Adds a penalty equal to the sum of the absolute values of the coefficients.
● This can shrink some coefficients exactly to zero, effectively performing
feature selection.
● The loss function otherwise has the same form as in Ridge regression; only the penalty changes from squared values to absolute values of the coefficients.
How Shrinkage Modifies Regression Coefficients:

● Shrinkage techniques modify the regression coefficients by adding a penalty term to the loss function, which discourages large values for the coefficients.
● The higher the regularization parameter λ, the stronger the penalty, and the
smaller the coefficients become.
● Lasso (L1) may shrink some coefficients exactly to zero, making it useful for
feature selection.
● Ridge (L2) shrinks all coefficients towards zero but does not eliminate any of
them entirely.
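
The effect of the penalty can be seen directly on the fitted coefficients. The sketch below is only illustrative: the synthetic data and the alpha values (scikit-learn's name for λ) are arbitrary choices.

    # How increasing the penalty strength shrinks Ridge coefficients and zeroes Lasso coefficients.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge, Lasso

    X, y = make_regression(n_samples=100, n_features=10, n_informative=3, noise=10.0, random_state=0)

    for alpha in [0.1, 1.0, 10.0]:                 # alpha plays the role of λ
        ridge = Ridge(alpha=alpha).fit(X, y)
        lasso = Lasso(alpha=alpha).fit(X, y)
        print(f"alpha={alpha:5}: ridge |coef| sum = {np.abs(ridge.coef_).sum():.1f}, "
              f"lasso zero coefficients = {(lasso.coef_ == 0).sum()}")
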

Q-2 b) Comparison of Lasso (L1) and Ridge (L2) Regularization

1. Lasso (L1 Regularization):

● Penalty: Adds the sum of the absolute values of the coefficients.


● Effect on Coefficients: Can shrink some coefficients exactly to zero, making it
useful for automatic feature selection.
● Advantages:
● Performs feature selection, which helps reduce the number of variables in
the model.
● Useful when you expect that many of the features are irrelevant.
● Limitations:
● In cases where the number of predictors is larger than the number of
observations, Lasso tends to pick one variable out of a group of highly
correlated variables and ignore the others.
● Typical Use Cases:
● Lasso is ideal when you have many features and expect that some are not
relevant to the target variable.
● It's often used in high-dimensional datasets, such as text classification or
genomic data.

2. Ridge (L2 Regularization):

● Penalty: Adds the sum of the squared values of the coefficients.


● Effect on Coefficients: Shrinks the coefficients towards zero, but none of them will
be exactly zero.
● Advantages:
● Useful when there are many small or moderately large predictors.
● Helps with multicollinearity (when predictor variables are highly
correlated), as it forces them to share the weight.
● Limitations:
● Does not perform feature selection, meaning all variables are retained in
the model.
● Typical Use Cases:
● Ridge is used when you have many predictors and you want to shrink their
effect, especially when predictors are correlated.
● Commonly used in scenarios where you know that all features have some
relevance to the outcome.

Key Differences Between Lasso and Ridge

Feature                    | Lasso (L1)                                | Ridge (L2)
Penalty                    | Sum of absolute values of coefficients    | Sum of squared values of coefficients
Feature Selection          | Yes, shrinks some coefficients to zero    | No, retains all features
Handling Multicollinearity | Selects one predictor from a group        | Shrinks coefficients for all correlated predictors
When to Use                | When you expect some irrelevant features  | When all features are important
Effect on Coefficients     | Some coefficients exactly zero            | Shrinks all coefficients, none exactly zero

Example of When Shrinkage Methods Help:

In high-dimensional datasets, like in genetics or finance, Lasso is useful for
automatically selecting relevant features while reducing the impact of irrelevant ones.
Ridge works well when you want to shrink the effect of all features, especially when they
are correlated, to prevent overfitting.
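
In practice the penalty strength is usually tuned by cross-validation. A minimal sketch, assuming only synthetic high-dimensional data (more features than observations), might look like this:

    # Tuning the penalty by cross-validation on p > n synthetic data.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LassoCV, RidgeCV

    X, y = make_regression(n_samples=80, n_features=200, n_informative=10, noise=5.0, random_state=1)

    lasso = LassoCV(cv=5).fit(X, y)                    # picks alpha by 5-fold CV, zeroes irrelevant features
    ridge = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X, y)

    print("Lasso kept", int((lasso.coef_ != 0).sum()), "of", X.shape[1], "features")
    print("Ridge chose alpha =", ridge.alpha_)
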

Summary:
● Shrinkage methods (Lasso and Ridge) prevent overfitting by penalizing the size
of the regression coefficients.
● Lasso (L1) is best for feature selection, while Ridge (L2) is used when you want
to reduce the impact of all features without eliminating any.
● Both methods are useful for improving model generalization and reducing the
effect of irrelevant or redundant features.

Q-3 a) Describe Principal Components Regression (PCR) and its application in linear classification. How does PCR utilize Principal Component Analysis (PCA) to address issues of multicollinearity in regression models?

b) Provide an example where Principal Components Regression is used. Explain the steps involved in applying PCR, including how to choose the number of principal components and interpret the results.

Ans. Q-3 a) Principal Components Regression (PCR)


What is Principal Components Regression (PCR)?
Principal Components Regression (PCR) is a regression technique that combines
Principal Component Analysis (PCA) and Linear Regression. It is primarily used to
handle situations where the predictor variables (features) are highly correlated (a
problem called multicollinearity). PCR solves this issue by transforming the original
correlated predictors into a new set of uncorrelated components, called principal
components, and then using these components in a linear regression model.

How PCR Uses PCA:

1. Principal Component Analysis (PCA) transforms the original features into new
variables (principal components), which are linear combinations of the original
variables. These principal components are uncorrelated and capture the
maximum variance in the data.
2. Instead of using the original features in the regression model, PCR fits the model
on the top principal components, which reduces multicollinearity and the model's
complexity.

Addressing Multicollinearity:
Multicollinearity occurs when two or more predictor variables are highly correlated,
leading to instability in estimating regression coefficients. In PCR:
● PCA identifies the directions in the data with the most variance and transforms
the original correlated variables into uncorrelated principal components.
● By using only the first few principal components (the ones that explain the most
variance), PCR reduces the dimensionality of the data and eliminates
multicollinearity, leading to more stable regression estimates.

Q-3 b) Example of Principal Components Regression (PCR)


Let’s consider an example where PCR is used in a wine quality prediction dataset,
where several chemical properties (predictor variables) of the wine are measured to
predict the wine quality (target variable). Many of the chemical properties are highly
correlated, leading to multicollinearity, so PCR is applied to address this.

Steps Involved in Applying PCR:

1. Standardize the Data:


● Before applying PCA, it’s important to standardize or normalize the
dataset so that each feature has a mean of 0 and a standard deviation of
1. This ensures that features with larger scales don’t dominate the PCA.
● Example:

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

2. Apply PCA to the Predictor Variables:


● PCA is applied to the standardized data to transform the original
correlated variables into principal components.
● The explained variance ratio of the principal components is analyzed to
determine how much variance each component captures.
● Example:

    from sklearn.decomposition import PCA

    pca = PCA()
    X_pca = pca.fit_transform(X_scaled)

3. Choose the Number of Principal Components:


● The number of principal components to retain is based on how much
variance they explain. Usually, we choose the components that explain
80-90% of the variance to capture most of the data’s information.
● A common approach is to look at a scree plot or cumulative explained
variance plot to decide how many components to keep.
● Example:

    import numpy as np
    import matplotlib.pyplot as plt

    plt.plot(np.cumsum(pca.explained_variance_ratio_))
    plt.xlabel('Number of Components')
    plt.ylabel('Variance Explained')
    plt.show()

4. Fit the Regression Model on the Principal Components:


● Once the top principal components are selected, the regression model is
fitted using these components instead of the original features.
● Example:

    from sklearn.linear_model import LinearRegression

    # Choose the top 'n' principal components
    n_components = 5
    X_pca_selected = X_pca[:, :n_components]
    model = LinearRegression()
    model.fit(X_pca_selected, y)

5. Interpret the Results:


● The model is now trained on the principal components rather than the
original features. The regression coefficients represent the contribution of
each principal component to the prediction of the target variable.
● Interpreting coefficients in PCR is more abstract, as they reflect the
influence of a principal component, which is a combination of the original
features. However, you can transform the coefficients back to the original
feature space to understand the impact of each feature.
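
A hedged sketch of that back-transformation, reusing the pca, model, and n_components objects from the snippets above and assuming a list feature_names holding the original column names:

    # Map coefficients on the principal components back to the (standardized) original features.
    beta_pc = model.coef_                                        # one coefficient per retained component
    beta_original = pca.components_[:n_components].T @ beta_pc   # back-project onto the scaled features

    for name, b in zip(feature_names, beta_original):            # feature_names is an assumed list of column names
        print(f"{name:20s} {b:+.3f}")
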
Choosing the Number of Principal Components:

● The number of principal components to include in the model is typically chosen by examining the explained variance of each component. Ideally, we want to
retain the minimum number of components that explain most of the variance.
● Rule of Thumb: If the first few components explain about 80-90% of the variance,
they are usually enough for an effective regression model.

Example Walkthrough:
Consider a dataset of wine quality prediction with 12 chemical properties as predictors.
Since many chemical properties are correlated (like sugar content and alcohol level),
multicollinearity can cause instability in a linear regression model.
● After applying PCA, we find that the first 4 principal components explain 85% of
the variance in the data.
● We then use these 4 principal components in the regression model to predict
wine quality.
● By doing so, we reduce multicollinearity, improve model stability, and make the
model more computationally efficient.
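
Assuming the 12 chemical properties are loaded in X and the quality scores in y, the whole walkthrough can be written as one pipeline; the 4 components below follow the 85%-variance figure quoted above.

    # End-to-end PCR sketch for the wine-quality walkthrough (X and y assumed to be loaded).
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    pcr = make_pipeline(StandardScaler(), PCA(n_components=4), LinearRegression())
    scores = cross_val_score(pcr, X, y, cv=5, scoring="r2")
    print("Mean cross-validated R^2:", scores.mean())
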

Summary:
● Principal Components Regression (PCR) addresses multicollinearity by
transforming correlated variables into uncorrelated principal components and
then applying linear regression.
● PCR combines PCA with regression, which makes it ideal for datasets with many
correlated predictors.
● The number of principal components used is based on how much variance they
explain, usually around 80-90% of the total variance.
● PCR is commonly used in high-dimensional datasets where multicollinearity is a
concern.
Q-4 a) Discuss Logistic Regression and its role in
classification tasks. How does logistic regression model the
probability of a binary outcome, and what is the
interpretation of its coefficients?

b) Provide an example demonstrating how to use logistic regression for a classification problem. Include the steps for model fitting, evaluating performance, and interpreting the results.

Q-4 a) Logistic Regression in Classification Tasks

What is Logistic Regression?


Logistic Regression is a method used to predict binary outcomes, where there are only
two possible results, such as yes/no, pass/fail, or spam/not spam.

Even though it's called "regression," it is used for classification problems, where you
need to assign data into categories.

How Logistic Regression Works:

● Logistic regression calculates the probability that a given input belongs to a particular class.
● It uses a special mathematical function called the logistic function (or sigmoid
function) that outputs values between 0 and 1.
● This probability is then used to classify the data. For example:
● If the probability is greater than 0.5, the model predicts class 1 (e.g., pass
or yes).
● If the probability is less than 0.5, the model predicts class 0 (e.g., fail or
no).

Interpretation of Coefficients:

● The coefficients in logistic regression show how much a predictor (like hours
studied) affects the probability of the outcome (e.g., passing an exam).
● If a coefficient is positive, it means that an increase in the predictor will increase
the probability of the positive outcome.
● If a coefficient is negative, it decreases the probability of the positive outcome.
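
Although the worked example in part (b) below is deliberately described without code, a minimal scikit-learn sketch of these ideas might look like the following; the hours-studied data points are invented for illustration.

    # Tiny pass/fail sketch: one predictor (hours studied), invented labels.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
    passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

    clf = LogisticRegression().fit(hours, passed)

    print("Coefficient (change in log-odds per extra hour):", clf.coef_[0][0])
    print("P(pass | 6 hours studied):", clf.predict_proba([[6]])[0, 1])
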

Q-4 b) Example of Logistic Regression for Classification (No Coding)

Let’s imagine an example where you are trying to predict if a student will pass or fail an
exam based on the number of hours studied.

Steps to Apply Logistic Regression:

1. Collect Data:
● You have data showing how many hours students studied and whether
they passed or failed.
● For example:
● A student who studied 2 hours failed.
● A student who studied 8 hours passed.
2. Train the Model:
● Logistic regression is used to create a model that learns from this data. It
identifies the relationship between the number of hours studied and the
likelihood of passing.
● The model calculates probabilities, such as:
● A student who studied 5 hours has a 70% probability of passing.
● A student who studied only 2 hours has a 20% probability of
passing.
3. Make Predictions:
● Once trained, the model can be used to predict outcomes for new
students based on how many hours they studied.
● For example, if a student studies 6 hours, the model might predict a pass
because the probability is greater than 0.5.
4. Evaluate the Model:
● You can assess how well the logistic regression model works by
comparing its predictions to actual results. For example, if the model
predicts that a student will pass but they fail, it indicates that the model
might need improvement.

Interpreting the Model:


● If the model shows that hours studied has a positive coefficient, it means that
studying more increases the chances of passing.
● The model creates a decision boundary (usually set at a 50% probability). If the
probability is above 50%, the model predicts a pass, otherwise a fail.

Example in Real Life:


Imagine you're an admissions officer and need to decide if students will succeed in a
particular program. Using logistic regression, you can predict success (pass) or failure
based on factors like high school grades, test scores, and study habits.

Summary:
● Logistic regression is a tool used for binary classification.
● It predicts the probability of an outcome (e.g., pass or fail).
● The coefficients in the model tell you how each factor (e.g., hours studied)
impacts the likelihood of that outcome.
