CSE381 Introduction To Machine Learning - Image Classification and Loss Functions: Theoretical Questions and Answers
(3) What is the main disadvantage of the k-NN algorithm?
Answer: The main disadvantage of the k-NN algorithm is its high computational cost
during the testing phase, especially when the training dataset is large. For each test
example, it must calculate the distance to every training example, which has a time
complexity of O(N), where N is the number of training examples. It can also be sensitive to
the choice of K and to the curse of dimensionality.
(4) Compare the two loss functions: Cross-Entropy versus Multiclass SVM Loss.
Answer:
Cross-Entropy:
Measures the difference between the predicted probability distribution and the
true distribution.
Used with Softmax to produce probabilities.
Aims to maximize the probability of the correct class.
Continues to improve even when the correct class already has the highest probability.
Multiclass SVM Loss:
Encourages the correct class score to be higher than incorrect class scores by a
margin.
Penalizes only misclassifications that violate the margin.
Saturates (stops penalizing) once the margin is met.
Key Differences: SVM loss enforces a margin and stops caring once it is met; Cross-Entropy
works with probabilities and keeps pushing the correct-class probability toward 1. See
question 18 for a fuller comparison.
1. Question: What is the core task of image classification? Answer: The core task of image
classification is to take an input image and assign it to one of a fixed set of predefined
categories or labels.
2. Question: Name three common datasets used for image classification and briefly describe
one key characteristic of each. Answer:
MNIST: A dataset of grayscale images of handwritten digits (0-9), primarily used for
introductory tasks.
CIFAR-10: A dataset of small (32x32) RGB images belonging to 10 different object
classes.
ImageNet: A large-scale dataset with thousands of object categories and millions of
images, commonly used for benchmarking advanced image classification models.
3. Question: In the context of the Nearest Neighbor classifier, what happens during the
"training" phase? Answer: The algorithm memorizes all the training data points and their
corresponding labels. No actual learning or model building occurs.
4. Question: What is the time complexity of the testing phase in a Nearest Neighbor classifier
with N training examples? Answer: O(N), as it requires comparing the test image with every
training image to find the nearest neighbor.
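To make the O(N) cost concrete, here is a minimal Python sketch (array names are
illustrative) in which classifying a single test vector loops over every stored training vector:

    import numpy as np

    def nn_predict(test_x, train_X, train_y):
        """Classify one test vector with a 1-Nearest-Neighbor search.

        train_X is an (N, D) array of stored training vectors and
        train_y an (N,) array of their labels; the distance computation
        touches all N rows, which is why testing is O(N).
        """
        # L2 distance from the test vector to every training vector.
        dists = np.sqrt(np.sum((train_X - test_x) ** 2, axis=1))
        return train_y[np.argmin(dists)]  # label of the closest example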
5. Question: How does increasing the value of K in the K-Nearest Neighbors algorithm affect
the decision boundaries? Answer: Increasing K smooths out the decision boundaries, making
them less sensitive to outliers in the training data.
6. Question: Provide one reason why using raw pixel distances (like L1 or L2) for image
comparison in Nearest Neighbor can be problematic. Answer: Raw pixel distances can be
problematic because they are sensitive to variations in the image that do not change its
semantic content, such as small spatial shifts, color tints, or partial occlusion (boxing) of the
object. Two images with the same semantic content can thus have a large pixel-wise distance.
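A small illustrative Python sketch of this effect: shifting an image by a single pixel leaves
its semantic content intact, yet the raw L1 and L2 distances become large (the random
image is purely illustrative):

    import numpy as np

    # Toy example: a random "image" and a copy shifted right by one pixel.
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
    shifted = np.roll(img, shift=1, axis=1)

    l1 = np.sum(np.abs(img - shifted))          # L1 (Manhattan) distance
    l2 = np.sqrt(np.sum((img - shifted) ** 2))  # L2 (Euclidean) distance
    print(l1, l2)  # both are large, despite identical semantic content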
8. Question: Explain the purpose of a validation set when setting hyperparameters for a
machine learning model. Answer: A validation set is used to evaluate the performance of a
model with different hyperparameter settings. It helps in selecting the hyperparameters that
generalize best to unseen data, preventing overfitting to the training set.
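As an illustrative sketch of this workflow, the following Python snippet scores each
candidate K on a held-out validation split; knn_predict is a hypothetical helper, not a real
library call:

    import numpy as np

    def tune_k(train_X, train_y, val_X, val_y, candidates=(1, 3, 5, 7)):
        """Pick the K with the best accuracy on the validation split.

        knn_predict(x, X, y, k) is a hypothetical KNN helper; the point
        is that the test set is never consulted during tuning.
        """
        best_k, best_acc = None, -1.0
        for k in candidates:
            preds = np.array([knn_predict(x, train_X, train_y, k)
                              for x in val_X])
            acc = np.mean(preds == val_y)
            if acc > best_acc:
                best_k, best_acc = k, acc
        return best_k  # evaluated once on the test set at the very end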
9. Question: Define the "curse of dimensionality" in the context of the K-Nearest Neighbor
algorithm. Answer: The curse of dimensionality refers to the phenomenon where the
number of training samples required to maintain a certain level of performance in K-
Nearest Neighbor grows exponentially with the number of dimensions (features) in the data
space.
10. Question: What is the role of a loss function in machine learning? Answer: A loss function
quantifies how well a classifier or model is performing by measuring the discrepancy
between the predicted outputs and the actual ground truth labels. Lower loss indicates
better performance.
11. Question: In the Multiclass SVM loss, what is the "margin"? Answer: In the Multiclass SVM
loss, the margin is a safety gap: the minimum amount by which the correct class's score
must exceed each incorrect class's score before the loss drops to zero. It is typically set to 1.
12. Question: What is the purpose of regularization in machine learning models? Answer: The
purpose of regularization is to prevent overfitting by adding a penalty to the loss function
based on the complexity of the model. This encourages simpler models that generalize
better to unseen data.
13. Question: Briefly describe the difference between L1 and L2 regularization. Answer: L1
regularization adds a penalty proportional to the absolute value of the weights,
encouraging sparsity (many weights become zero). L2 regularization adds a penalty
proportional to the square of the weights, encouraging smaller weights overall.
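A minimal sketch of the two penalty terms in Python (lam, the regularization strength, and
the function names are illustrative):

    import numpy as np

    def l1_penalty(W, lam=1e-3):
        # Sum of absolute weights: pushes many weights to exactly zero.
        return lam * np.sum(np.abs(W))

    def l2_penalty(W, lam=1e-3):
        # Sum of squared weights: shrinks all weights toward zero.
        return lam * np.sum(W ** 2)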
14. Question: What is the goal of the Softmax function in the context of Cross-Entropy loss?
Answer: The goal of the Softmax function is to transform the raw classifier scores (logits)
into a probability distribution over the classes, where the probabilities are non-negative and
sum to 1.
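A minimal Python sketch of the Softmax transformation (the example scores are arbitrary):

    import numpy as np

    def softmax(logits):
        """Convert raw class scores (logits) into probabilities.

        Subtracting the max is a standard numerical-stability trick and
        does not change the output, since softmax is shift-invariant.
        """
        z = logits - np.max(logits)
        exp_z = np.exp(z)
        return exp_z / np.sum(exp_z)  # non-negative and sums to 1

    print(softmax(np.array([3.2, 5.1, -1.7])))  # arbitrary 3-class scores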
Open-Ended Questions:
15. Question: Describe the K-Nearest Neighbors (KNN) algorithm in detail, including its training
and testing phases, and discuss one advantage and one disadvantage of this approach for
image classification. Answer:
Training Phase: The KNN algorithm's training phase is extremely simple. It involves
storing all the training data points (images represented as feature vectors) and their
corresponding class labels. Essentially, the model "memorizes" the training set.
Testing Phase: When a new test image needs to be classified, the KNN algorithm
calculates the distance (using a chosen distance metric like L1 or L2) between the test
image and every image in the training set. It then identifies the K closest training
examples (the K-nearest neighbors) to the test image. The class label for the test image
is determined by a majority vote among the labels of its K-nearest neighbors.
Advantage: Training is trivial and fast, since the algorithm only stores the data; no model
is actually built.
Disadvantage: Testing is expensive, since each prediction requires computing distances to
all N training examples (O(N) per test image), which scales poorly to large image datasets.
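Putting the two phases together, a minimal illustrative Python sketch of a KNN classifier
(L2 distance and majority voting, as described above):

    import numpy as np
    from collections import Counter

    class KNNClassifier:
        """Minimal KNN sketch: L2 distance, majority vote over K neighbors."""

        def train(self, X, y):
            # "Training" is pure memorization: just store the data.
            self.X, self.y = X, y

        def predict(self, x, k=3):
            # Distance from the test vector to every stored example.
            dists = np.sqrt(np.sum((self.X - x) ** 2, axis=1))
            # Labels of the K nearest neighbors, then a majority vote.
            nearest = self.y[np.argsort(dists)[:k]]
            return Counter(nearest).most_common(1)[0][0]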
16. Question: Explain the different strategies for setting hyperparameters discussed in the
lecture slides, highlighting the advantages and disadvantages of each. Answer:
Idea #1: Choose hyperparameters that work best on the training data.
Disadvantage: This approach is fundamentally flawed because the model will likely
overfit the training data. A model with K=1 in KNN will always achieve perfect
accuracy on the training data but will likely generalize poorly to unseen data.
Idea #2: Split data into train and test, choose hyperparameters that work best on the
test data.
Disadvantage: This is problematic because the test set should be used only once at
the very end to evaluate the final model's performance. Tuning hyperparameters
on the test set leads to information leakage and an overly optimistic estimate of
the model's generalization ability. The model effectively "learns" from the test set.
Idea #3: Split data into train, validation, and test; choose hyperparameters on the
validation set and evaluate on the test set.
Advantage: This is a much better approach. The training set is used for learning
the model parameters, the validation set is used for tuning hyperparameters, and
the test set is reserved for the final, unbiased evaluation.
Disadvantage: The performance can still be sensitive to how the data is split into
these sets, especially with limited data.
Idea #4: Cross-Validation: Split data into folds, try each fold as validation and
average the results.
Advantage: Cross-validation provides a more robust estimate of the model's
performance and is less sensitive to specific train-validation splits, especially with
limited data. It utilizes all the data for evaluation to some extent.
Disadvantage: Cross-validation can be computationally expensive, especially with
large datasets and complex models. It is less frequently used in deep learning
compared to the train/validation/test split due to the computational cost.
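A rough Python sketch of K-fold cross-validation for one candidate hyperparameter value,
reusing the KNNClassifier sketch from question 15 (fold handling is simplified for
illustration):

    import numpy as np

    def cross_validate(X, y, k_candidate, n_folds=5):
        """Average validation accuracy of one K over n_folds splits."""
        folds_X = np.array_split(X, n_folds)
        folds_y = np.array_split(y, n_folds)
        accs = []
        for i in range(n_folds):
            # Fold i serves as validation; the remaining folds train.
            tr_X = np.concatenate([f for j, f in enumerate(folds_X) if j != i])
            tr_y = np.concatenate([f for j, f in enumerate(folds_y) if j != i])
            model = KNNClassifier()  # sketch from question 15
            model.train(tr_X, tr_y)
            preds = np.array([model.predict(x, k=k_candidate)
                              for x in folds_X[i]])
            accs.append(np.mean(preds == folds_y[i]))
        return np.mean(accs)  # averaged over all folds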
17. Question: Explain the concept of the Multiclass Support Vector Machine (SVM) loss
function. Include the formula and describe how it encourages the correct class to have a
higher score than the incorrect classes. Answer: The Multiclass SVM loss aims to ensure that
the score of the correct class for an input image is higher than the scores of all incorrect
classes by a certain margin Δ (typically 1). For a single training example (xi, yi), where s is the
vector of scores for each class, the SVM loss is calculated as:
L_i = Σ_{j ≠ yi} max(0, s_j − s_{yi} + Δ)
Where:
s_j is the score of incorrect class j,
s_{yi} is the score of the correct class, and
Δ is the margin.
Explanation: The formula iterates through all the incorrect classes. For each incorrect class,
it calculates the difference between its score and the score of the correct class, and adds the
margin Δ. If this value is positive, it means the score of the incorrect class is too close to or
higher than the score of the correct class (violating the desired margin), and this difference
contributes to the loss. The max(0, ...) ensures that only these violations contribute to
the loss; if the correct class score is sufficiently higher than the incorrect class score (by at
least the margin), the term becomes zero, and there's no penalty.
By minimizing this loss function during training, the model is encouraged to adjust its
parameters such that the score for the correct class is pushed higher, and the scores for the
incorrect classes are pushed lower, effectively creating a margin of separation.
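A minimal Python sketch of this loss for a single example, following the formula above (the
example scores are arbitrary):

    import numpy as np

    def svm_loss(scores, correct_class, delta=1.0):
        """Multiclass SVM loss for one example.

        Implements L_i = sum over j != y_i of max(0, s_j - s_{y_i} + delta).
        """
        margins = np.maximum(0, scores - scores[correct_class] + delta)
        margins[correct_class] = 0  # skip the j == y_i term
        return np.sum(margins)

    # The correct class (index 0) beats both others by more than the
    # margin, so the loss saturates at exactly 0.
    print(svm_loss(np.array([5.0, 1.0, 2.0]), correct_class=0))  # 0.0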
18. Question: Compare and contrast the Multiclass SVM loss and the Cross-Entropy loss
(Multinomial Logistic Regression), highlighting their differences in how they interpret
classifier scores and respond to different score distributions. Answer:
Feature: Focus
Multiclass SVM Loss: Focuses on achieving a specific margin between the score of the
correct class and the scores of the incorrect classes.
Cross-Entropy Loss (Multinomial Logistic Regression): Focuses on predicting the
probability distribution over classes that is closest to the true distribution (one-hot
encoding of the correct class).
Feature: Use of Softmax
Multiclass SVM Loss: Does not inherently use the Softmax function.
Cross-Entropy Loss: Uses the Softmax function to convert scores to probabilities.
Key Differences:
Margin vs. Probability: SVM loss emphasizes a margin of separation, while Cross-
Entropy loss focuses on predicting probabilities.
Saturation: SVM loss saturates once the margin is met; Cross-Entropy loss continues to
decrease as the probability of the correct class approaches 1.
Sensitivity: Cross-Entropy is generally more sensitive to changes in the score of the
correct class even after it's correctly classified, whereas SVM loss becomes less sensitive
once the margin is satisfied.
In essence, SVM loss is more concerned with getting the classification correct with a
sufficient margin, while Cross-Entropy loss aims to predict the correct class with high
probability and is more informative about the model's confidence.
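To see the saturation difference concretely, here is an illustrative Python sketch of the
cross-entropy loss evaluated on the same scores for which the SVM loss sketch above is
already zero:

    import numpy as np

    def cross_entropy_loss(scores, correct_class):
        """Softmax + cross-entropy for one example: -log P(correct class)."""
        z = scores - np.max(scores)  # numerical stability
        probs = np.exp(z) / np.sum(np.exp(z))
        return -np.log(probs[correct_class])

    # On scores where the SVM loss above is already 0, cross-entropy is
    # still positive, and it keeps shrinking as the correct score grows.
    print(cross_entropy_loss(np.array([5.0, 1.0, 2.0]), 0))   # ~0.066
    print(cross_entropy_loss(np.array([10.0, 1.0, 2.0]), 0))  # ~0.0005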
19. Question: Explain the concept of regularization and its importance in training machine
learning models. Describe two common types of regularization and how they achieve their
goal. Answer: Regularization is a set of techniques used to prevent overfitting in machine
learning models. Overfitting occurs when a model learns the training data too well,
including the noise and specific patterns that don't generalize to new, unseen data.
Regularization achieves this by adding a penalty term to the loss function, which
discourages overly complex models. The overall objective becomes minimizing both the
data loss (how well the model fits the training data) and the regularization loss (how
complex the model is).
Two common types are L1 and L2 regularization. L1 regularization adds a penalty
proportional to the sum of the absolute values of the weights, encouraging sparsity (many
weights become exactly zero). L2 regularization adds a penalty proportional to the sum of
the squared weights, encouraging smaller weights overall.
Both L1 and L2 regularization help to control the complexity of the model and improve its
ability to generalize to unseen data, but they achieve this in slightly different ways, leading
to different characteristics in the learned models.
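A minimal Python sketch of the combined objective described above (data loss plus
regularization loss; data_loss_fn could be either per-example loss sketched earlier, and all
names are illustrative):

    import numpy as np

    def total_loss(W, X, y, data_loss_fn, lam=1e-3):
        """Overall objective: average data loss + L2 regularization loss."""
        scores = X @ W  # linear classifier scores, one row per example
        data_loss = np.mean([data_loss_fn(s, yi) for s, yi in zip(scores, y)])
        reg_loss = lam * np.sum(W ** 2)  # penalizes model complexity
        return data_loss + reg_loss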