
CSE381 Introduction to Machine Learning - Image Classification and Loss Functions: Theoretical Questions and Answers

Fall 22

(2) What is the main disadvantage of the k-NN algorithm?

Answer: The main disadvantage of the k-NN algorithm is its high computational cost during the testing phase, especially when the training dataset is large. For each test example, it must compute the distance to every training example, which has a time complexity of O(N), where N is the number of training examples. It can also be sensitive to the choice of K and suffers from the curse of dimensionality.
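A minimal sketch of that O(N) scan (assuming hypothetical NumPy arrays X_train, y_train and a single test vector x_test; illustrative only, not code from the course):

```python
import numpy as np

def predict_1nn(X_train, y_train, x_test):
    """Classify one test example with a nearest-neighbor scan.

    The comparison against all N training examples (implicit in the
    vectorized distance computation) is what makes testing O(N).
    """
    # L2 distance from the test point to every training point
    dists = np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))
    return y_train[np.argmin(dists)]
```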

(4) Compare the two loss functions: Cross-Entropy versus Multiclass SVM Loss.

Answer:

Cross-Entropy:

Measures the difference between the predicted probability distribution and the
true distribution.
Used with Softmax to produce probabilities.
Aims to maximize the probability of the correct class.
Continues to improve even when the correct class has the highest probability.

Multiclass SVM Loss (Hinge Loss):

Encourages the correct class score to be higher than incorrect class scores by a
margin.
Penalizes only misclassifications that violate the margin.
Saturates (stops penalizing) once the margin is met.

Key Differences:

Cross-Entropy focuses on probabilities, while SVM loss focuses on achieving a margin.
Cross-Entropy is generally more sensitive to changes in scores.
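The following short sketch (with made-up scores for three classes; the numbers are purely illustrative) computes both losses on the same score vector:

```python
import numpy as np

# Hypothetical raw scores for 3 classes; assume class 0 is the correct one.
scores = np.array([3.2, 5.1, -1.7])
correct = 0

# Multiclass SVM (hinge) loss with margin 1: penalize every incorrect class
# whose score is not at least 1 below the correct-class score.
margins = np.maximum(0.0, scores - scores[correct] + 1.0)
margins[correct] = 0.0
svm_loss = margins.sum()            # 2.9 for these scores

# Cross-entropy loss: softmax the scores, then take -log of the probability
# assigned to the correct class.
probs = np.exp(scores - scores.max())
probs /= probs.sum()
ce_loss = -np.log(probs[correct])   # about 2.04 for these scores

print(svm_loss, ce_loss)
```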


Short Answer Questions:

1. Question: What is the core task of image classification? Answer: The core task of image
classification is to take an input image and assign it to one of a fixed set of predefined
categories or labels.

2. Question: Name three common datasets used for image classification and briefly describe
one key characteristic of each. Answer:

MNIST: A dataset of grayscale images of handwritten digits (0-9), primarily used for
introductory tasks.
CIFAR-10: A dataset of small (32x32) RGB images belonging to 10 different object
classes.
ImageNet: A large-scale dataset with thousands of object categories and millions of
images, commonly used for benchmarking advanced image classification models.

3. Question: In the context of the Nearest Neighbor classifier, what happens during the
"training" phase? Answer: The algorithm memorizes all the training data points and their
corresponding labels. No actual learning or model building occurs.

4. Question: What is the time complexity of the testing phase in a Nearest Neighbor classifier
with N training examples? Answer: O(N), as it requires comparing the test image with every
training image to find the nearest neighbor.

5. Question: How does increasing the value of K in the K-Nearest Neighbors algorithm affect the decision boundaries? Answer: Increasing K smooths out the decision boundaries, making them less sensitive to outliers in the training data.

6. Question: Provide one reason why using raw pixel distances (like L1 or L2) for image comparison in Nearest Neighbor can be problematic. Answer: Raw pixel distances can be problematic because they are sensitive to variations in the image that do not change the semantic content, such as small shifts, color tints, or partial occlusion (boxing) of the object. Images with the same semantic content can therefore have a large pixel-wise distance.
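As a rough illustration (using a random 32x32 grayscale array rather than a real dataset image), a one-pixel shift leaves the content essentially unchanged but produces a large L1 distance:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32)).astype(float)

# Shift the image one pixel to the right (a circular shift, for simplicity).
shifted = np.roll(img, shift=1, axis=1)

# Semantically almost identical, yet the pixel-wise L1 distance is large.
l1_distance = np.abs(img - shifted).sum()
print(l1_distance)
```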

7. Question: What are hyperparameters in the context of machine learning algorithms? Answer: Hyperparameters are parameters of a learning algorithm that are not learned from the training data but are set prior to the learning process. They control the algorithm's behavior and need to be tuned.

8. Question: Explain the purpose of a validation set when setting hyperparameters for a
machine learning model. Answer: A validation set is used to evaluate the performance of a
model with different hyperparameter settings. It helps in selecting the hyperparameters that
generalize best to unseen data, preventing overfitting to the training set.
9. Question: Define the "curse of dimensionality" in the context of the K-Nearest Neighbor
algorithm. Answer: The curse of dimensionality refers to the phenomenon where the
number of training samples required to maintain a certain level of performance in K-
Nearest Neighbor grows exponentially with the number of dimensions (features) in the data
space.

10. Question: What is the role of a loss function in machine learning? Answer: A loss function
quantifies how well a classifier or model is performing by measuring the discrepancy
between the predicted outputs and the actual ground truth labels. Lower loss indicates
better performance.

11. Question: In the Multiclass SVM loss, what is the "margin"? Answer: In the Multiclass SVM loss, the margin is a safety gap by which the score of the correct class must exceed the scores of the incorrect classes. It is typically set to 1.

12. Question: What is the purpose of regularization in machine learning models? Answer: The
purpose of regularization is to prevent overfitting by adding a penalty to the loss function
based on the complexity of the model. This encourages simpler models that generalize
better to unseen data.

13. Question: Briefly describe the difference between L1 and L2 regularization. Answer: L1
regularization adds a penalty proportional to the absolute value of the weights,
encouraging sparsity (many weights become zero). L2 regularization adds a penalty
proportional to the square of the weights, encouraging smaller weights overall.

14. Question: What is the goal of the Softmax function in the context of Cross-Entropy loss?
Answer: The goal of the Softmax function is to transform the raw classifier scores (logits)
into a probability distribution over the classes, where the probabilities are non-negative and
sum to 1.
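A minimal Softmax sketch (hypothetical logits; this mirrors the standard numerically stable formulation rather than any specific lecture code):

```python
import numpy as np

def softmax(logits):
    """Turn raw classifier scores (logits) into a probability distribution:
    all values are non-negative and sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # roughly [0.66, 0.24, 0.10]
```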

Open-Ended Questions:

15. Question: Describe the K-Nearest Neighbors (KNN) algorithm in detail, including its training
and testing phases, and discuss one advantage and one disadvantage of this approach for
image classification. Answer:

Training Phase: The KNN algorithm's training phase is extremely simple. It involves
storing all the training data points (images represented as feature vectors) and their
corresponding class labels. Essentially, the model "memorizes" the training set.

Testing Phase: When a new test image needs to be classified, the KNN algorithm
calculates the distance (using a chosen distance metric like L1 or L2) between the test
image and every image in the training set. It then identifies the K closest training
examples (the K-nearest neighbors) to the test image. The class label for the test image
is determined by a majority vote among the labels of its K-nearest neighbors.

Advantage: KNN is a non-parametric method, meaning it makes no assumptions about the underlying data distribution. This makes it flexible and applicable to a wide range of problems where the data distribution is unknown or complex. It is also simple to understand and implement.

Disadvantage: The testing phase of KNN can be computationally expensive, especially with large datasets, as it requires calculating the distance to every training sample for each test sample (O(N) complexity). Furthermore, its performance can degrade in high-dimensional spaces due to the curse of dimensionality. The choice of the distance metric and the value of K can significantly impact performance and require careful tuning.
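A compact sketch of both phases (hypothetical class and method names, assuming NumPy feature vectors; a toy illustration rather than a production classifier):

```python
import numpy as np
from collections import Counter

class KNNClassifier:
    """Minimal K-Nearest Neighbors: 'training' just stores the data;
    prediction scans every stored example and takes a majority vote."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Training phase: memorize the data; no model is built.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def predict_one(self, x):
        # Testing phase: O(N) distance computation to every training example.
        dists = np.sqrt(np.sum((self.X - x) ** 2, axis=1))
        nearest = np.argsort(dists)[: self.k]
        # Majority vote among the labels of the k nearest neighbors.
        return Counter(self.y[nearest]).most_common(1)[0][0]
```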

16. Question: Explain the different strategies for setting hyperparameters discussed in the
lecture slides, highlighting the advantages and disadvantages of each. Answer:

Idea #1: Choose hyperparameters that work best on the training data.
Disadvantage: This approach is fundamentally flawed because the model will likely
overfit the training data. A model with K=1 in KNN will always achieve perfect
accuracy on the training data but will likely generalize poorly to unseen data.
Idea #2: Split data into train and test, choose hyperparameters that work best on the
test data.
Disadvantage: This is problematic because the test set should be used only once at
the very end to evaluate the final model's performance. Tuning hyperparameters
on the test set leads to information leakage and an overly optimistic estimate of
the model's generalization ability. The model effectively "learns" from the test set.
Idea #3: Split data into train, validation, and test; choose hyperparameters on the
validation set and evaluate on the test set.
Advantage: This is a much better approach. The training set is used for learning
the model parameters, the validation set is used for tuning hyperparameters, and
the test set is reserved for the final, unbiased evaluation.
Disadvantage: The performance can still be sensitive to how the data is split into
these sets, especially with limited data.
Idea #4: Cross-Validation: Split data into folds, try each fold as validation and
average the results.
Advantage: Cross-validation provides a more robust estimate of the model's
performance and is less sensitive to specific train-validation splits, especially with
limited data. It utilizes all the data for evaluation to some extent.
Disadvantage: Cross-validation can be computationally expensive, especially with
large datasets and complex models. It is less frequently used in deep learning
compared to the train/validation/test split due to the computational cost.
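A self-contained sketch of Idea #4 for choosing K in KNN (hypothetical helper names; the fold handling assumes the data is already shuffled):

```python
import numpy as np
from collections import Counter

def knn_accuracy(X_tr, y_tr, X_val, y_val, k):
    """Accuracy of a k-NN classifier 'trained' on (X_tr, y_tr), scored on a validation fold."""
    hits = 0
    for x, y in zip(X_val, y_val):
        dists = np.sqrt(np.sum((X_tr - x) ** 2, axis=1))
        nearest = np.argsort(dists)[:k]
        hits += int(Counter(y_tr[nearest]).most_common(1)[0][0] == y)
    return hits / len(y_val)

def cross_validate_k(X, y, candidate_ks, num_folds=5):
    """Pick K by k-fold cross-validation: each fold serves once as the
    validation set and the per-fold accuracies are averaged."""
    folds_X = np.array_split(np.asarray(X, dtype=float), num_folds)
    folds_y = np.array_split(np.asarray(y), num_folds)
    avg_acc = {}
    for k in candidate_ks:
        accs = []
        for i in range(num_folds):
            X_tr = np.concatenate(folds_X[:i] + folds_X[i + 1:])
            y_tr = np.concatenate(folds_y[:i] + folds_y[i + 1:])
            accs.append(knn_accuracy(X_tr, y_tr, folds_X[i], folds_y[i], k))
        avg_acc[k] = float(np.mean(accs))
    # Return the K with the best average validation accuracy.
    return max(avg_acc, key=avg_acc.get)
```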

17. Question: Explain the concept of the Multiclass Support Vector Machine (SVM) loss
function. Include the formula and describe how it encourages the correct class to have a
higher score than the incorrect classes. Answer: The Multiclass SVM loss aims to ensure that
the score of the correct class for an input image is higher than the scores of all incorrect
classes by a certain margin (typically 1). For a single training example (xi, yi), where s is the
vector of scores for each class, the SVM loss is calculated as:

L_i = ∑_{j ≠ y_i} max(0, s_j − s_{y_i} + Δ)

Where:

s_{y_i} is the score for the correct class (y_i).
s_j is the score for an incorrect class (j ≠ y_i).
Δ is the margin (usually set to 1).

Explanation: The formula iterates through all the incorrect classes. For each incorrect class,
it calculates the difference between its score and the score of the correct class, and adds the
margin Δ. If this value is positive, it means the score of the incorrect class is too close to or
higher than the score of the correct class (violating the desired margin), and this difference
contributes to the loss. The max(0, ...) ensures that only these violations contribute to
the loss; if the correct class score is sufficiently higher than the incorrect class score (by at
least the margin), the term becomes zero, and there's no penalty.

By minimizing this loss function during training, the model is encouraged to adjust its
parameters such that the score for the correct class is pushed higher, and the scores for the
incorrect classes are pushed lower, effectively creating a margin of separation.
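A direct transcription of the formula above into code (the function name and arguments are illustrative):

```python
import numpy as np

def multiclass_svm_loss(s, y_i, delta=1.0):
    """L_i = sum over j != y_i of max(0, s_j - s_{y_i} + delta)."""
    margins = np.maximum(0.0, s - s[y_i] + delta)
    margins[y_i] = 0.0          # the j == y_i term is excluded from the sum
    return margins.sum()
```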

18. Question: Compare and contrast the Multiclass SVM loss and the Cross-Entropy loss
(Multinomial Logistic Regression), highlighting their differences in how they interpret
classifier scores and respond to different score distributions. Answer:
Feature-by-feature comparison of the Multiclass SVM Loss and the Cross-Entropy Loss (Multinomial Logistic Regression):

Interpretation of Scores
  Multiclass SVM Loss: Treats scores as direct indicators of class confidence.
  Cross-Entropy Loss: Interprets scores as unnormalized log probabilities (logits).

Focus
  Multiclass SVM Loss: Focuses on achieving a specific margin between the score of the correct class and the scores of the incorrect classes.
  Cross-Entropy Loss: Focuses on predicting the probability distribution over classes that is closest to the true distribution (the one-hot encoding of the correct class).

Behavior with Correct Classification
  Multiclass SVM Loss: Stops penalizing once the correct class score is greater than the incorrect class scores by the margin.
  Cross-Entropy Loss: Continues to penalize and improve even if the correct class has the highest probability, always striving for higher confidence in the correct class.

Sensitivity to Score Changes
  Multiclass SVM Loss: Less sensitive to changes in scores once the margin is satisfied.
  Cross-Entropy Loss: More sensitive to changes in scores, as it is directly tied to the probability distribution.

Output
  Multiclass SVM Loss: Produces a loss value.
  Cross-Entropy Loss: Produces a loss value.

Use of Softmax
  Multiclass SVM Loss: Does not inherently use the Softmax function.
  Cross-Entropy Loss: Uses the Softmax function to convert scores to probabilities.

Analogy
  Multiclass SVM Loss: Tries to push the correct answer "far enough" ahead.
  Cross-Entropy Loss: Tries to make the probability of the correct answer as close to 1 as possible.

Key Differences:

Margin vs. Probability: SVM loss emphasizes a margin of separation, while Cross-
Entropy loss focuses on predicting probabilities.
Saturation: SVM loss saturates once the margin is met; Cross-Entropy loss continues to
decrease as the probability of the correct class approaches 1.
Sensitivity: Cross-Entropy is generally more sensitive to changes in the score of the
correct class even after it's correctly classified, whereas SVM loss becomes less sensitive
once the margin is satisfied.
In essence, SVM loss is more concerned with getting the classification correct with a
sufficient margin, while Cross-Entropy loss aims to predict the correct class with high
probability and is more informative about the model's confidence.
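The saturation difference can be seen in a small standalone sketch (made-up score vectors; helper functions defined inline for readability):

```python
import numpy as np

def svm_loss(s, y_i, delta=1.0):
    m = np.maximum(0.0, s - s[y_i] + delta)
    m[y_i] = 0.0
    return m.sum()

def ce_loss(s, y_i):
    p = np.exp(s - s.max())
    p /= p.sum()
    return -np.log(p[y_i])

# Correct class (index 0) already beats the others by more than the margin.
s1 = np.array([4.0, 1.0, 0.0])
s2 = np.array([9.0, 1.0, 0.0])   # same example, correct-class score pushed higher

print(svm_loss(s1, 0), svm_loss(s2, 0))   # 0.0, 0.0  -> hinge loss has saturated
print(ce_loss(s1, 0), ce_loss(s2, 0))     # ~0.066, ~0.0005 -> keeps decreasing
```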

19. Question: Explain the concept of regularization and its importance in training machine
learning models. Describe two common types of regularization and how they achieve their
goal. Answer: Regularization is a set of techniques used to prevent overfitting in machine
learning models. Overfitting occurs when a model learns the training data too well,
including the noise and specific patterns that don't generalize to new, unseen data.
Regularization achieves this by adding a penalty term to the loss function, which
discourages overly complex models. The overall objective becomes minimizing both the
data loss (how well the model fits the training data) and the regularization loss (how
complex the model is).

Importance: Regularization is crucial because it helps models generalize better to unseen data. By preferring simpler models, regularization reduces the variance of the model, making its predictions more stable and reliable on new inputs.

Two Common Types of Regularization:

L1 Regularization (Lasso): L1 regularization adds a penalty to the loss function proportional to the sum of the absolute values of the model's weights:

Regularization Loss (L1) = λ * ∑ |w_i|

where λ is the regularization strength (a hyperparameter).

How it works: L1 regularization encourages sparsity in the weights, meaning it tends to push some weights to exactly zero. This effectively performs feature selection, as features with zero weights are ignored by the model. This leads to simpler and more interpretable models.

L2 Regularization (Ridge): L2 regularization adds a penalty to the loss function proportional to the sum of the squares of the model's weights:

Regularization Loss (L2) = λ * ∑ w_i^2

where λ is the regularization strength.

How it works: L2 regularization encourages weights to be small but not necessarily zero. It effectively "shrinks" the weights towards zero. This helps to reduce the impact of individual features, making the model less sensitive to noise and outliers in the training data. L2 regularization tends to distribute the weight values more evenly across the features.

Both L1 and L2 regularization help to control the complexity of the model and improve its
ability to generalize to unseen data, but they achieve this in slightly different ways, leading
to different characteristics in the learned models.
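As a small sketch, the two penalties can be written directly from the formulas above (names are illustrative; λ is written as lam):

```python
import numpy as np

def l1_penalty(W, lam):
    """L1 regularization: lam * sum of |w_i| (encourages sparse weights)."""
    return lam * np.sum(np.abs(W))

def l2_penalty(W, lam):
    """L2 regularization: lam * sum of w_i^2 (encourages small weights)."""
    return lam * np.sum(W ** 2)

# The full training objective adds the penalty to the data loss, e.g.:
# total_loss = data_loss + l2_penalty(W, lam)
```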
