Assignment 1
Prithvi Dhyani
November 5, 2024
1.1 Gaussian Clusters (Linearly Separable) Dataset
The above dataset is generated by two Gaussian clusters (σ = 1), each with a randomly
selected mean. By visual inspection, it is clear that the two clusters are linearly separable. This
information is crucial, as we will later design the hyper-parameter search grids with it in mind.
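As a rough sketch (not the exact generation code from the assignment), such a dataset could be produced with NumPy as follows; the cluster-mean range, the number of points per cluster, and the seed are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)          # illustrative seed
n_per_class = 100                        # assumed number of points per cluster

# Two randomly selected cluster means (the sampling range is an assumption)
mean_0 = rng.uniform(-5, 5, size=2)
mean_1 = rng.uniform(-5, 5, size=2)

# Each cluster is an isotropic Gaussian with sigma = 1 around its mean
X0 = rng.normal(loc=mean_0, scale=1.0, size=(n_per_class, 2))
X1 = rng.normal(loc=mean_1, scale=1.0, size=(n_per_class, 2))

X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])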
1.2 Two Moons (Non-Linearly Separable) Dataset
The above dataset is generated by sampling finitely many points from two semicircles (the upper
semicircle centered at (0, 0) and the lower semicircle centered at (1, 0.5)). Each point is then shifted
by noise drawn from a standard normal distribution and scaled by a factor of 0.1. Clearly, this
dataset is not linearly separable, and hence requires a non-linear decision boundary for classification.
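This construction is consistent with scikit-learn's make_moons generator (which also perturbs the points with Gaussian noise scaled by its noise parameter), so a minimal sketch could be as simple as the following; the sample count and seed are assumptions.

from sklearn.datasets import make_moons

# noise=0.1 matches the scaling described above; n_samples and random_state are assumptions
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)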
1.3 Concentric Circles (Non-Linearly Separable) Dataset
The above dataset is generated by sampling finitely many points from two concentric circles
with radii 1 and 2 respectively, both centered at (0, 0). Each point is shifted by noise sampled
from a standard normal distribution and scaled by a factor of 0.5. Once again, this
dataset is clearly not linearly separable.
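Since scikit-learn's make_circles uses a unit outer circle with a scaled inner circle, radii of 1 and 2 as described here are closer to a direct NumPy construction. A minimal sketch, with the per-circle sample count and seed as assumptions:

import numpy as np

rng = np.random.default_rng(0)          # illustrative seed
n_per_class = 100                        # assumed number of points per circle

def noisy_circle(radius, n):
    # Sample points uniformly on a circle of the given radius, then perturb them
    # with standard-normal noise scaled by 0.5, as described above.
    theta = rng.uniform(0, 2 * np.pi, size=n)
    points = radius * np.column_stack([np.cos(theta), np.sin(theta)])
    return points + 0.5 * rng.standard_normal((n, 2))

X = np.vstack([noisy_circle(1.0, n_per_class), noisy_circle(2.0, n_per_class)])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])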
2 Hyper-parameter Considerations
Recall that our goal is to train, tune, and test 4 supervised learning algorithms, namely Decision
Tree, Random Forest, KNN, and Neural Network. In order to compare the performance of models across
different datasets, we need to make sure that the search space (grid) for hyper-parameters is standardized
for each model across all 3 datasets. Hence, we need only define 4 unique hyper-parameter grids, one for
each of the 4 models.
2.1 Decision Tree
For the Decision Tree learning algorithm, we consider the following hyper-parameters (a sketch of the resulting grid follows this list):
• criterion: This specifies whether the measure we are using to determine the best split is
entropy (information gain) or Gini impurity.
• max depth: This specifies the maximum depth that the decision tree can reach, in terms of
the number of splits. In this case, we consider the values {None, 1, 2, 5, 10}. The reason for
considering such small values is that, by visual examination of the datasets and given that our
data is 2-dimensional, a depth of 10 (i.e., at most 10 nested linear boundaries) is clearly more than enough to separate the two classes.
• min samples split: This specifies the minimum number of data-points that need to be present in
a node in order to split it. We consider the values {2, 4, 6}. Higher values are included to prevent
over-fitting caused by splitting nodes that contain only a few points.
• min samples leaf: This specifies the minimum number of data-points required in a node for it
to be a leaf node. We consider the values {1, 2, 4}.
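As a rough illustration, this grid might be expressed for scikit-learn's DecisionTreeClassifier as follows; the use of GridSearchCV, 5-fold cross-validation, accuracy scoring, and the seed are assumptions rather than details stated in the report.

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Decision Tree search grid, mirroring the values listed above
dt_param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [None, 1, 2, 5, 10],
    "min_samples_split": [2, 4, 6],
    "min_samples_leaf": [1, 2, 4],
}

# Cross-validated grid search (cv=5 and accuracy scoring are assumed choices)
dt_search = GridSearchCV(DecisionTreeClassifier(random_state=0), dt_param_grid,
                         cv=5, scoring="accuracy")
# dt_search.fit(X_train, y_train)  # X_train/y_train come from one of the datasets above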
2.2 Random Forest
For the Random Forest learning algorithm, we consider the following hyper-parameters (a sketch of the resulting grid follows this list):
– n estimators: This specifies the number of trees in the forest. We consider the values
{10, 25, 50}.
– max depth: This specifies the maximum depth of each tree. We consider the values
{None, 1, 2, 5, 10}.
– min samples split: This specifies the minimum number of data-points that need to be
present in a node in order to split it. We consider the values {2, 4, 6}.
– min samples leaf: This specifies the minimum number of data-points required in a node
for it to be a leaf node. We consider the values {1, 2, 4}.
– criterion: This specifies whether the measure we are using to determine the best split is
Gini impurity or entropy (information gain). We consider the values {'gini', 'entropy'}.
– max features: This controls the number of features to consider at each split. We consider
the values {1, None}.
– bootstrap: This controls whether bootstrapping is used in the forest. We consider the value
{True}.
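A corresponding sketch of this grid, assuming scikit-learn's RandomForestClassifier; it could be plugged into the same GridSearchCV pattern shown for the Decision Tree.

from sklearn.ensemble import RandomForestClassifier

# Random Forest search grid, mirroring the values listed above
rf_param_grid = {
    "n_estimators": [10, 25, 50],
    "max_depth": [None, 1, 2, 5, 10],
    "min_samples_split": [2, 4, 6],
    "min_samples_leaf": [1, 2, 4],
    "criterion": ["gini", "entropy"],
    "max_features": [1, None],   # one feature per split, or all features
    "bootstrap": [True],
}

rf_model = RandomForestClassifier(random_state=0)  # estimator the grid would be searched over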
2.3 KNN
For the KNN learning algorithm, we consider the following hyper-parameters (a sketch of the resulting grid follows this list):
– n neighbors: This specifies the number of neighbors to consider. We consider the values
{3, 5, 7, 9}.
– weights: This specifies the weighting scheme used. We consider the options {'uniform', 'distance'}.
– metric: This specifies the distance metric for KNN. We consider the options {'euclidean', 'manhattan'}.
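The same pattern applies here, assuming scikit-learn's KNeighborsClassifier:

from sklearn.neighbors import KNeighborsClassifier

# KNN search grid, mirroring the values listed above
knn_param_grid = {
    "n_neighbors": [3, 5, 7, 9],
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan"],
}

knn_model = KNeighborsClassifier()  # estimator the grid would be searched over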
2.4 Neural Network
In the case of the Neural Network learning algorithm, we consider the following 4 hyper-parameters (a sketch of the resulting grid follows this list):
– hidden sizes: This specifies the number of units in each hidden layer. We consider the
values {2, 5, 10}.
– learning rates: This specifies the learning rates for training. We consider the values
{0.01, 0.001}.
– num hidden layers: This specifies the number of hidden layers in the network. We consider
the values {1, 2, 3}.
– activation functions: This specifies the activation functions to use in the hidden layers.
We consider the options {ReLU, Sigmoid, Tanh}.
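The report does not name the library used for the neural network, so the following sketch maps these hyper-parameters onto scikit-learn's MLPClassifier purely for illustration (ReLU, Sigmoid, and Tanh correspond to 'relu', 'logistic', and 'tanh' there); the training budget and seed are assumptions.

from itertools import product

from sklearn.neural_network import MLPClassifier

# Neural Network search grid, mirroring the values listed above
nn_param_grid = {
    "hidden_sizes": [2, 5, 10],
    "learning_rates": [0.01, 0.001],
    "num_hidden_layers": [1, 2, 3],
    "activation_functions": ["relu", "logistic", "tanh"],  # ReLU, Sigmoid, Tanh
}

# One way to realise each grid point, assuming an MLPClassifier-style network
# (the report does not specify the neural-network implementation actually used).
for size, lr, n_layers, activation in product(*nn_param_grid.values()):
    model = MLPClassifier(
        hidden_layer_sizes=(size,) * n_layers,  # n_layers hidden layers of `size` units each
        learning_rate_init=lr,
        activation=activation,
        max_iter=2000,          # assumed training budget
        random_state=0,         # assumed seed
    )
    # model.fit(X_train, y_train) and evaluate on a validation split here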
3 Results Analysis
In this section, we analyze the results obtained from the different supervised machine learning
models on the datasets. To evaluate the performance of each model, we utilize several key metrics:
accuracy, precision, recall, and F1-score. These metrics provide insights into how well the models
perform, especially in the context of classification tasks.
– Recall (also known as sensitivity or true positive rate) measures the ability of a model to
identify all relevant instances. It is defined as:
\[
\text{Recall} = \frac{TP}{TP + FN}
\]
– F1-score is the harmonic mean of precision and recall, providing a single metric that balances
both concerns. It is computed as:
\[
F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
These metrics serve different purposes in our evaluation. Accuracy gives an overall measure of
performance, but it can be misleading in imbalanced datasets where one class is more prevalent.
Precision and recall provide a more nuanced view of performance, especially in scenarios where
the cost of false positives and false negatives differs significantly. The F1-score combines both
precision and recall into a single metric, making it useful when we need a balance between the
two.
We will use these metrics to evaluate and compare the performance of the Decision Tree, Random
Forest, KNN, and Neural Network models across the various datasets. By analyzing these metrics,
we can determine which model best suits the characteristics of each dataset and the specific
classification task at hand. The goal is to select a model that not only performs well on accuracy
but also maintains high precision and recall, ensuring that our classification outcomes are both
reliable and relevant.
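A minimal sketch of how these metrics could be computed for a fitted model with scikit-learn; the helper name and the binary averaging choice are illustrative assumptions.

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(model, X_test, y_test):
    """Return accuracy, precision, recall, and F1 for a fitted binary classifier."""
    y_pred = model.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="binary"  # all three datasets are two-class problems
    )
    return {"accuracy": accuracy_score(y_test, y_pred),
            "precision": precision, "recall": recall, "f1": f1}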
All four models managed to classify all instances accurately, demonstrating their effectiveness in
handling linearly separable data. The optimal sets of hyperparameters indicate that the models
leverage their respective strengths, such as the flexibility of the Decision Tree and Random Forest
algorithms and the adaptability of the KNN and Neural Network approaches. The simplicity of
the dataset allowed for straightforward parameter tuning, resulting in optimal performances across
the board.
All four models effectively classified all instances, showcasing their proficiency in handling the
intricate structure of the Two Moons dataset. The optimal hyperparameters for each model suggest
an effective learning strategy; for instance, the Decision Tree and Random Forest utilized entropy
for splitting, which is well-suited for such complex distributions. The KNN model maintained a
consistent performance with uniform weights, while the Neural Network adapted effectively with
two hidden layers and a Tanh activation function. This demonstrates that even in non-linearly
separable cases, well-tuned models can achieve remarkable accuracy by capturing the underlying
patterns in the data.
The performance of the models demonstrates their ability to adapt to the unique challenges posed
by the Concentric Circles dataset. While the Decision Tree and Random Forest models maintained
a high accuracy, their precision for class 0 was slightly lower, reflecting a trade-off between pre-
cision and recall. The KNN model also showed robust performance, benefitting from an optimal
number of neighbors that enabled it to balance bias and variance effectively. The Neural Network’s
configuration, with three hidden layers and a Tanh activation function, allowed it to learn and gen-
eralize well, resulting in perfect accuracy. Overall, these results underline the importance of model
selection and hyperparameter tuning in achieving effective classification for complex datasets.