CSE381 Introduction To Machine Learning - Image Classification and Loss Functions: Theoretical Questions and Answers
(3) What is the main disadvantage of the k-NN algorithm?
Answer: The main disadvantage of the k-NN algorithm is its high computational cost
during the testing phase, especially when the training dataset is large. For each test
example, it must calculate the distance to every training example, which has a time
complexity of O(N), where N is the number of training examples. It can also be sensitive to
the choice of K and to the curse of dimensionality.
(4) Compare the two loss functions: Cross-Entropy versus Multiclass SVM Loss.
Answer:
Cross-Entropy:
Measures the difference between the predicted probability distribution and the
true distribution.
Used with Softmax to produce probabilities.
Aims to maximize the probability of the correct class.
Continues to improve even when the correct class already has the highest probability.
Multiclass SVM Loss:
Encourages the correct class score to be higher than incorrect class scores by a
margin.
Penalizes only misclassifications that violate the margin.
Saturates (stops penalizing) once the margin is met.
Key Differences: SVM loss enforces a margin and stops caring once it is met; Cross-Entropy
works with probabilities and keeps pushing the correct-class probability toward 1. See
question 18 for a fuller comparison.
1. Question: What is the core task of image classification? Answer: The core task of image
classification is to take an input image and assign it to one of a fixed set of predefined
categories or labels.
2. Question: Name three common datasets used for image classification and briefly describe
one key characteristic of each. Answer:
MNIST: A dataset of grayscale images of handwritten digits (0-9), primarily used for
introductory tasks.
CIFAR-10: A dataset of small (32x32) RGB images belonging to 10 different object
classes.
ImageNet: A large-scale dataset with thousands of object categories and millions of
images, commonly used for benchmarking advanced image classification models.
3. Question: In the context of the Nearest Neighbor classifier, what happens during the
"training" phase? Answer: The algorithm memorizes all the training data points and their
corresponding labels. No actual learning or model building occurs.
4. Question: What is the time complexity of the testing phase in a Nearest Neighbor classifier
with N training examples? Answer: O(N), as it requires comparing the test image with every
training image to find the nearest neighbor.
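To make the O(N) cost concrete, here is a minimal Python sketch (array names are
illustrative) in which classifying a single test vector loops over every stored training vector:

    import numpy as np

    def nn_predict(test_x, train_X, train_y):
        """Classify one test vector with a 1-Nearest-Neighbor search.

        train_X is an (N, D) array of stored training vectors and
        train_y an (N,) array of their labels; the distance computation
        touches all N rows, which is why testing is O(N).
        """
        # L2 distance from the test vector to every training vector.
        dists = np.sqrt(np.sum((train_X - test_x) ** 2, axis=1))
        return train_y[np.argmin(dists)]  # label of the closest example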
5. Question: How does increasing the value of K in the K-Nearest Neighbors algorithm affect
the decision boundaries? Answer: Increasing K smooths out the decision boundaries, making
them less sensitive to outliers in the training data.
6. Question: Provide one reason why using raw pixel distances (like L1 or L2) for image
comparison in Nearest Neighbor can be problematic. Answer: Raw pixel distances can be
problematic because they are sensitive to variations in the image that do not change its
semantic content, such as small spatial shifts, color tints, or partial occlusion (boxing) of the
object. Two images with the same semantic content can thus have a large pixel-wise distance.
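A small illustrative Python sketch of this effect: shifting an image by a single pixel leaves
its semantic content intact, yet the raw L1 and L2 distances become large (the random
image is purely illustrative):

    import numpy as np

    # Toy example: a random "image" and a copy shifted right by one pixel.
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
    shifted = np.roll(img, shift=1, axis=1)

    l1 = np.sum(np.abs(img - shifted))          # L1 (Manhattan) distance
    l2 = np.sqrt(np.sum((img - shifted) ** 2))  # L2 (Euclidean) distance
    print(l1, l2)  # both are large, despite identical semantic content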
8. Question: Explain the purpose of a validation set when setting hyperparameters for a
machine learning model. Answer: A validation set is used to evaluate the performance of a
model with different hyperparameter settings. It helps in selecting the hyperparameters that
generalize best to unseen data, preventing overfitting to the training set.
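As an illustrative sketch of this workflow, the following Python snippet scores each
candidate K on a held-out validation split; knn_predict is a hypothetical helper, not a real
library call:

    import numpy as np

    def tune_k(train_X, train_y, val_X, val_y, candidates=(1, 3, 5, 7)):
        """Pick the K with the best accuracy on the validation split.

        knn_predict(x, X, y, k) is a hypothetical KNN helper; the point
        is that the test set is never consulted during tuning.
        """
        best_k, best_acc = None, -1.0
        for k in candidates:
            preds = np.array([knn_predict(x, train_X, train_y, k)
                              for x in val_X])
            acc = np.mean(preds == val_y)
            if acc > best_acc:
                best_k, best_acc = k, acc
        return best_k  # evaluated once on the test set at the very end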
9. Question: Define the "curse of dimensionality" in the context of the K-Nearest Neighbor
algorithm. Answer: The curse of dimensionality refers to the phenomenon where the
number of training samples required to maintain a certain level of performance in K-
Nearest Neighbor grows exponentially with the number of dimensions (features) in the data
space.
10. Question: What is the role of a loss function in machine learning? Answer: A loss function
quantifies how well a classifier or model is performing by measuring the discrepancy
between the predicted outputs and the actual ground truth labels. Lower loss indicates
better performance.
11. Question: In the Multiclass SVM loss, what is the "margin"? Answer: In the Multiclass SVM
loss, the margin is a safety gap: the minimum amount by which the correct class's score
must exceed each incorrect class's score before the loss drops to zero. It is typically set to 1.
12. Question: What is the purpose of regularization in machine learning models? Answer: The
purpose of regularization is to prevent overfitting by adding a penalty to the loss function
based on the complexity of the model. This encourages simpler models that generalize
better to unseen data.
13. Question: Briefly describe the difference between L1 and L2 regularization. Answer: L1
regularization adds a penalty proportional to the absolute value of the weights,
encouraging sparsity (many weights become zero). L2 regularization adds a penalty
proportional to the square of the weights, encouraging smaller weights overall.
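A minimal sketch of the two penalty terms in Python (lam, the regularization strength, and
the function names are illustrative):

    import numpy as np

    def l1_penalty(W, lam=1e-3):
        # Sum of absolute weights: pushes many weights to exactly zero.
        return lam * np.sum(np.abs(W))

    def l2_penalty(W, lam=1e-3):
        # Sum of squared weights: shrinks all weights toward zero.
        return lam * np.sum(W ** 2)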
14. Question: What is the goal of the Softmax function in the context of Cross-Entropy loss?
Answer: The goal of the Softmax function is to transform the raw classifier scores (logits)
into a probability distribution over the classes, where the probabilities are non-negative and
sum to 1.
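A minimal Python sketch of the Softmax transformation (the example scores are arbitrary):

    import numpy as np

    def softmax(logits):
        """Convert raw class scores (logits) into probabilities.

        Subtracting the max is a standard numerical-stability trick and
        does not change the output, since softmax is shift-invariant.
        """
        z = logits - np.max(logits)
        exp_z = np.exp(z)
        return exp_z / np.sum(exp_z)  # non-negative and sums to 1

    print(softmax(np.array([3.2, 5.1, -1.7])))  # arbitrary 3-class scores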
Open-Ended Questions:
15. Question: Describe the K-Nearest Neighbors (KNN) algorithm in detail, including its training
and testing phases, and discuss one advantage and one disadvantage of this approach for
image classification. Answer:
Training Phase: The KNN algorithm's training phase is extremely simple. It involves
storing all the training data points (images represented as feature vectors) and their
corresponding class labels. Essentially, the model "memorizes" the training set.
Testing Phase: When a new test image needs to be classified, the KNN algorithm
calculates the distance (using a chosen distance metric like L1 or L2) between the test
image and every image in the training set. It then identifies the K closest training
examples (the K-nearest neighbors) to the test image. The class label for the test image
is determined by a majority vote among the labels of its K-nearest neighbors.
Advantage: Training is trivial and fast, since the algorithm only stores the data; no model
is actually built.
Disadvantage: Testing is expensive, since each prediction requires computing distances to
all N training examples (O(N) per test image), which scales poorly to large image datasets.
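Putting the two phases together, a minimal illustrative Python sketch of a KNN classifier
(L2 distance and majority voting, as described above):

    import numpy as np
    from collections import Counter

    class KNNClassifier:
        """Minimal KNN sketch: L2 distance, majority vote over K neighbors."""

        def train(self, X, y):
            # "Training" is pure memorization: just store the data.
            self.X, self.y = X, y

        def predict(self, x, k=3):
            # Distance from the test vector to every stored example.
            dists = np.sqrt(np.sum((self.X - x) ** 2, axis=1))
            # Labels of the K nearest neighbors, then a majority vote.
            nearest = self.y[np.argsort(dists)[:k]]
            return Counter(nearest).most_common(1)[0][0]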
16. Question: Explain the different strategies for setting hyperparameters discussed in the
lecture slides, highlighting the advantages and disadvantages of each. Answer:
Idea #1: Choose hyperparameters that work best on the training data.
Disadvantage: This approach is fundamentally flawed because the model will likely
overfit the training data. A model with K=1 in KNN will always achieve perfect
accuracy on the training data but will likely generalize poorly to unseen data.
Idea #2: Split data into train and test, choose hyperparameters that work best on the
test data.
Disadvantage: This is problematic because the test set should be used only once at
the very end to evaluate the final model's performance. Tuning hyperparameters
on the test set leads to information leakage and an overly optimistic estimate of
the model's generalization ability. The model effectively "learns" from the test set.
Idea #3: Split data into train, validation, and test; choose hyperparameters on the
validation set and evaluate on the test set.
Advantage: This is a much better approach. The training set is used for learning
the model parameters, the validation set is used for tuning hyperparameters, and
the test set is reserved for the final, unbiased evaluation.
Disadvantage: The performance can still be sensitive to how the data is split into
these sets, especially with limited data.
Idea #4: Cross-Validation: Split data into folds, try each fold as validation and
average the results.
Advantage: Cross-validation provides a more robust estimate of the model's
performance and is less sensitive to specific train-validation splits, especially with
limited data. It utilizes all the data for evaluation to some extent.
Disadvantage: Cross-validation can be computationally expensive, especially with
large datasets and complex models. It is less frequently used in deep learning
compared to the train/validation/test split due to the computational cost.
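A rough Python sketch of K-fold cross-validation for one candidate hyperparameter value,
reusing the KNNClassifier sketch from question 15 (fold handling is simplified for
illustration):

    import numpy as np

    def cross_validate(X, y, k_candidate, n_folds=5):
        """Average validation accuracy of one K over n_folds splits."""
        folds_X = np.array_split(X, n_folds)
        folds_y = np.array_split(y, n_folds)
        accs = []
        for i in range(n_folds):
            # Fold i serves as validation; the remaining folds train.
            tr_X = np.concatenate([f for j, f in enumerate(folds_X) if j != i])
            tr_y = np.concatenate([f for j, f in enumerate(folds_y) if j != i])
            model = KNNClassifier()  # sketch from question 15
            model.train(tr_X, tr_y)
            preds = np.array([model.predict(x, k=k_candidate)
                              for x in folds_X[i]])
            accs.append(np.mean(preds == folds_y[i]))
        return np.mean(accs)  # averaged over all folds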
17. Question: Explain the concept of the Multiclass Support Vector Machine (SVM) loss
function. Include the formula and describe how it encourages the correct class to have a
higher score than the incorrect classes. Answer: The Multiclass SVM loss aims to ensure that
the score of the correct class for an input image is higher than the scores of all incorrect
classes by a certain margin Δ (typically 1). For a single training example (xi, yi), where s is the
vector of scores for each class, the SVM loss is calculated as:
L_i = Σ_{j ≠ yi} max(0, s_j − s_{yi} + Δ)
Where:
s_j is the score of incorrect class j,
s_{yi} is the score of the correct class, and
Δ is the margin.
Explanation: The formula iterates through all the incorrect classes. For each incorrect class,
it calculates the difference between its score and the score of the correct class, and adds the
margin Δ. If this value is positive, it means the score of the incorrect class is too close to or
higher than the score of the correct class (violating the desired margin), and this difference
contributes to the loss. The max(0, ...) ensures that only these violations contribute to
the loss; if the correct class score is sufficiently higher than the incorrect class score (by at
least the margin), the term becomes zero, and there's no penalty.
By minimizing this loss function during training, the model is encouraged to adjust its
parameters such that the score for the correct class is pushed higher, and the scores for the
incorrect classes are pushed lower, effectively creating a margin of separation.
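A minimal Python sketch of this loss for a single example, following the formula above (the
example scores are arbitrary):

    import numpy as np

    def svm_loss(scores, correct_class, delta=1.0):
        """Multiclass SVM loss for one example.

        Implements L_i = sum over j != y_i of max(0, s_j - s_{y_i} + delta).
        """
        margins = np.maximum(0, scores - scores[correct_class] + delta)
        margins[correct_class] = 0  # skip the j == y_i term
        return np.sum(margins)

    # The correct class (index 0) beats both others by more than the
    # margin, so the loss saturates at exactly 0.
    print(svm_loss(np.array([5.0, 1.0, 2.0]), correct_class=0))  # 0.0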
18. Question: Compare and contrast the Multiclass SVM loss and the Cross-Entropy loss
(Multinomial Logistic Regression), highlighting their differences in how they interpret
classifier scores and respond to different score distributions. Answer:
Feature: Focus
Multiclass SVM Loss: Focuses on achieving a specific margin between the score of the
correct class and the scores of the incorrect classes.
Cross-Entropy Loss (Multinomial Logistic Regression): Focuses on predicting the
probability distribution over classes that is closest to the true distribution (one-hot
encoding of the correct class).
Feature: Use of Softmax
Multiclass SVM Loss: Does not inherently use the Softmax function.
Cross-Entropy Loss: Uses the Softmax function to convert scores to probabilities.
Key Differences:
Margin vs. Probability: SVM loss emphasizes a margin of separation, while Cross-
Entropy loss focuses on predicting probabilities.
Saturation: SVM loss saturates once the margin is met; Cross-Entropy loss continues to
decrease as the probability of the correct class approaches 1.
Sensitivity: Cross-Entropy is generally more sensitive to changes in the score of the
correct class even after it's correctly classified, whereas SVM loss becomes less sensitive
once the margin is satisfied.
In essence, SVM loss is more concerned with getting the classification correct with a
sufficient margin, while Cross-Entropy loss aims to predict the correct class with high
probability and is more informative about the model's confidence.
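To see the saturation difference concretely, here is an illustrative Python sketch of the
cross-entropy loss evaluated on the same scores for which the SVM loss sketch above is
already zero:

    import numpy as np

    def cross_entropy_loss(scores, correct_class):
        """Softmax + cross-entropy for one example: -log P(correct class)."""
        z = scores - np.max(scores)  # numerical stability
        probs = np.exp(z) / np.sum(np.exp(z))
        return -np.log(probs[correct_class])

    # On scores where the SVM loss above is already 0, cross-entropy is
    # still positive, and it keeps shrinking as the correct score grows.
    print(cross_entropy_loss(np.array([5.0, 1.0, 2.0]), 0))   # ~0.066
    print(cross_entropy_loss(np.array([10.0, 1.0, 2.0]), 0))  # ~0.0005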
19. Question: Explain the concept of regularization and its importance in training machine
learning models. Describe two common types of regularization and how they achieve their
goal. Answer: Regularization is a set of techniques used to prevent overfitting in machine
learning models. Overfitting occurs when a model learns the training data too well,
including the noise and specific patterns that don't generalize to new, unseen data.
Regularization achieves this by adding a penalty term to the loss function, which
discourages overly complex models. The overall objective becomes minimizing both the
data loss (how well the model fits the training data) and the regularization loss (how
complex the model is).
Two common types are L1 and L2 regularization. L1 regularization adds a penalty
proportional to the sum of the absolute values of the weights, encouraging sparsity (many
weights become exactly zero). L2 regularization adds a penalty proportional to the sum of
the squared weights, encouraging smaller weights overall.
Both L1 and L2 regularization help to control the complexity of the model and improve its
ability to generalize to unseen data, but they achieve this in slightly different ways, leading
to different characteristics in the learned models.
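A minimal Python sketch of the combined objective described above (data loss plus
regularization loss; data_loss_fn could be either per-example loss sketched earlier, and all
names are illustrative):

    import numpy as np

    def total_loss(W, X, y, data_loss_fn, lam=1e-3):
        """Overall objective: average data loss + L2 regularization loss."""
        scores = X @ W  # linear classifier scores, one row per example
        data_loss = np.mean([data_loss_fn(s, yi) for s, yi in zip(scores, y)])
        reg_loss = lam * np.sum(W ** 2)  # penalizes model complexity
        return data_loss + reg_loss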