Assignment 1
Prithvi Dhyani
November 5, 2024
1.1 Gaussian Clusters (Linearly Separable) Dataset
The above dataset is generated by two Gaussian clusters (σ = 1), each with a randomly
selected mean. By visual inspection, it is clear that the two clusters are linearly separable. This
information is crucial, as we will later design the hyper-parameter search grids with it in mind.
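As a rough sketch (not the exact generation code from the assignment), such a dataset could be produced with NumPy as follows; the cluster-mean range, the number of points per cluster, and the seed are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)          # illustrative seed
n_per_class = 100                        # assumed number of points per cluster

# Two randomly selected cluster means (the sampling range is an assumption)
mean_0 = rng.uniform(-5, 5, size=2)
mean_1 = rng.uniform(-5, 5, size=2)

# Each cluster is an isotropic Gaussian with sigma = 1 around its mean
X0 = rng.normal(loc=mean_0, scale=1.0, size=(n_per_class, 2))
X1 = rng.normal(loc=mean_1, scale=1.0, size=(n_per_class, 2))

X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])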
1.2 Two Moons (Non-Linearly Separable) Dataset
The above dataset is generated by sampling finitely many points from two semicircles (the upper
semicircle centered at (0, 0) and the lower semicircle centered at (1, 0.5)). Each point is then shifted
by noise drawn from a standard normal distribution and scaled by a factor of 0.1. Clearly, this
dataset is not linearly separable, and hence requires a non-linear decision boundary for classification.
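This construction is consistent with scikit-learn's make_moons generator (which also perturbs the points with Gaussian noise scaled by its noise parameter), so a minimal sketch could be as simple as the following; the sample count and seed are assumptions.

from sklearn.datasets import make_moons

# noise=0.1 matches the scaling described above; n_samples and random_state are assumptions
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)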
1.3 Concentric Circles (Non-Linearly Separable) Dataset
The above dataset is generated by sampling finitely many points from two concentric circles
with radii 1 and 2 respectively, both centered at (0, 0). Each point is shifted by noise sampled
from a standard normal distribution and scaled by a factor of 0.5. Once again, this
dataset is clearly not linearly separable.
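Since scikit-learn's make_circles uses a unit outer circle with a scaled inner circle, radii of 1 and 2 as described here are closer to a direct NumPy construction. A minimal sketch, with the per-circle sample count and seed as assumptions:

import numpy as np

rng = np.random.default_rng(0)          # illustrative seed
n_per_class = 100                        # assumed number of points per circle

def noisy_circle(radius, n):
    # Sample points uniformly on a circle of the given radius, then perturb them
    # with standard-normal noise scaled by 0.5, as described above.
    theta = rng.uniform(0, 2 * np.pi, size=n)
    points = radius * np.column_stack([np.cos(theta), np.sin(theta)])
    return points + 0.5 * rng.standard_normal((n, 2))

X = np.vstack([noisy_circle(1.0, n_per_class), noisy_circle(2.0, n_per_class)])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])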
2 Hyper-parameter Considerations
Recall that our goal is to train, tune, and test 4 supervised learning algorithms, namely Decision
Tree, Random Forest, KNN, and Neural Network. In order to compare the performance of models across
different datasets, we need to make sure that the search space (grid) for hyper-parameters is standardized
for each model across all 3 datasets. Hence, we need only define 4 unique hyper-parameter grids, one for
each of the 4 models.
2.1 Decision Tree
For the Decision Tree learning algorithm, we consider the following hyper-parameters (a sketch of the resulting grid follows this list):
• criterion: This specifies whether the measure we are using to determine the best split is
entropy (information gain) or Gini impurity.
• max depth: This specifies the maximum depth that the decision tree can reach, in terms of
the number of splits. In this case, we consider the values {None, 1, 2, 5, 10}. The reason for
considering such small values is that, by visual examination of the datasets and given that our
data is 2-dimensional, a depth of 10 (i.e., at most 10 nested linear boundaries) is clearly more than enough to separate the two classes.
• min samples split: This specifies the minimum number of data-points that need to be present in
a node in order to split it. We consider the values {2, 4, 6}. Higher values are included to prevent
over-fitting caused by splitting nodes that contain only a few points.
• min samples leaf: This specifies the minimum number of data-points required in a node for it
to be a leaf node. We consider the values {1, 2, 4}.
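As a rough illustration, this grid might be expressed for scikit-learn's DecisionTreeClassifier as follows; the use of GridSearchCV, 5-fold cross-validation, accuracy scoring, and the seed are assumptions rather than details stated in the report.

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Decision Tree search grid, mirroring the values listed above
dt_param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [None, 1, 2, 5, 10],
    "min_samples_split": [2, 4, 6],
    "min_samples_leaf": [1, 2, 4],
}

# Cross-validated grid search (cv=5 and accuracy scoring are assumed choices)
dt_search = GridSearchCV(DecisionTreeClassifier(random_state=0), dt_param_grid,
                         cv=5, scoring="accuracy")
# dt_search.fit(X_train, y_train)  # X_train/y_train come from one of the datasets above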
2.2 Random Forest
For the Random Forest learning algorithm, we consider the following hyper-parameters (a sketch of the resulting grid follows this list):
– n estimators: This specifies the number of trees in the forest. We consider the values
{10, 25, 50}.
– max depth: This specifies the maximum depth of each tree. We consider the values
{None, 1, 2, 5, 10}.
– min samples split: This specifies the minimum number of data-points that need to be
present in a node in order to split it. We consider the values {2, 4, 6}.
– min samples leaf: This specifies the minimum number of data-points required in a node
for it to be a leaf node. We consider the values {1, 2, 4}.
– criterion: This specifies whether the measure we are using to determine the best split is
Gini impurity or entropy (information gain). We consider the values {'gini', 'entropy'}.
– max features: This controls the number of features to consider at each split. We consider
the values {1, None}.
– bootstrap: This controls whether bootstrapping is used in the forest. We consider the value
{True}.
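A corresponding sketch of this grid, assuming scikit-learn's RandomForestClassifier; it could be plugged into the same GridSearchCV pattern shown for the Decision Tree.

from sklearn.ensemble import RandomForestClassifier

# Random Forest search grid, mirroring the values listed above
rf_param_grid = {
    "n_estimators": [10, 25, 50],
    "max_depth": [None, 1, 2, 5, 10],
    "min_samples_split": [2, 4, 6],
    "min_samples_leaf": [1, 2, 4],
    "criterion": ["gini", "entropy"],
    "max_features": [1, None],   # one feature per split, or all features
    "bootstrap": [True],
}

rf_model = RandomForestClassifier(random_state=0)  # estimator the grid would be searched over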
2.3 KNN
For the KNN learning algorithm, we consider the following hyper-parameters (a sketch of the resulting grid follows this list):
– n neighbors: This specifies the number of neighbors to consider. We consider the values
{3, 5, 7, 9}.
– weights: This specifies the weighting scheme used. We consider the options {'uniform', 'distance'}.
– metric: This specifies the distance metric for KNN. We consider the options {'euclidean', 'manhattan'}.
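The same pattern applies here, assuming scikit-learn's KNeighborsClassifier:

from sklearn.neighbors import KNeighborsClassifier

# KNN search grid, mirroring the values listed above
knn_param_grid = {
    "n_neighbors": [3, 5, 7, 9],
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan"],
}

knn_model = KNeighborsClassifier()  # estimator the grid would be searched over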
2.4 Neural Network
In the case of the Neural Network learning algorithm, we consider the following 4 hyper-parameters (a sketch of the resulting grid follows this list):
– hidden sizes: This specifies the number of units in each hidden layer. We consider the
values {2, 5, 10}.
– learning rates: This specifies the learning rates for training. We consider the values
{0.01, 0.001}.
– num hidden layers: This specifies the number of hidden layers in the network. We consider
the values {1, 2, 3}.
– activation functions: This specifies the activation functions to use in the hidden layers.
We consider the options {ReLU, Sigmoid, Tanh}.
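The report does not name the library used for the neural network, so the following sketch maps these hyper-parameters onto scikit-learn's MLPClassifier purely for illustration (ReLU, Sigmoid, and Tanh correspond to 'relu', 'logistic', and 'tanh' there); the training budget and seed are assumptions.

from itertools import product

from sklearn.neural_network import MLPClassifier

# Neural Network search grid, mirroring the values listed above
nn_param_grid = {
    "hidden_sizes": [2, 5, 10],
    "learning_rates": [0.01, 0.001],
    "num_hidden_layers": [1, 2, 3],
    "activation_functions": ["relu", "logistic", "tanh"],  # ReLU, Sigmoid, Tanh
}

# One way to realise each grid point, assuming an MLPClassifier-style network
# (the report does not specify the neural-network implementation actually used).
for size, lr, n_layers, activation in product(*nn_param_grid.values()):
    model = MLPClassifier(
        hidden_layer_sizes=(size,) * n_layers,  # n_layers hidden layers of `size` units each
        learning_rate_init=lr,
        activation=activation,
        max_iter=2000,          # assumed training budget
        random_state=0,         # assumed seed
    )
    # model.fit(X_train, y_train) and evaluate on a validation split here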
3 Results Analysis
In this section, we analyze the results obtained from the different supervised machine learning
models on the datasets. To evaluate the performance of each model, we utilize several key metrics:
accuracy, precision, recall, and F1-score. These metrics provide insights into how well the models
perform, especially in the context of classification tasks.
– Recall (also known as sensitivity or true positive rate) measures the ability of a model to
identify all relevant instances. It is defined as:
\[
\text{Recall} = \frac{TP}{TP + FN}
\]
– F1-score is the harmonic mean of precision and recall, providing a single metric that balances
both concerns. It is computed as:
\[
F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
These metrics serve different purposes in our evaluation. Accuracy gives an overall measure of
performance, but it can be misleading in imbalanced datasets where one class is more prevalent.
Precision and recall provide a more nuanced view of performance, especially in scenarios where
the cost of false positives and false negatives differs significantly. The F1-score combines both
precision and recall into a single metric, making it useful when we need a balance between the
two.
We will use these metrics to evaluate and compare the performance of the Decision Tree, Random
Forest, KNN, and Neural Network models across the various datasets. By analyzing these metrics,
we can determine which model best suits the characteristics of each dataset and the specific
classification task at hand. The goal is to select a model that not only performs well on accuracy
but also maintains high precision and recall, ensuring that our classification outcomes are both
reliable and relevant.
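A minimal sketch of how these metrics could be computed for a fitted model with scikit-learn; the helper name and the binary averaging choice are illustrative assumptions.

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(model, X_test, y_test):
    """Return accuracy, precision, recall, and F1 for a fitted binary classifier."""
    y_pred = model.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="binary"  # all three datasets are two-class problems
    )
    return {"accuracy": accuracy_score(y_test, y_pred),
            "precision": precision, "recall": recall, "f1": f1}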
All four models managed to classify all instances accurately, demonstrating their effectiveness in
handling linearly separable data. The optimal sets of hyperparameters indicate that the models
leverage their respective strengths, such as the flexibility of the Decision Tree and Random Forest
algorithms and the adaptability of the KNN and Neural Network approaches. The simplicity of
the dataset allowed for straightforward parameter tuning, resulting in optimal performances across
the board.
All four models effectively classified all instances, showcasing their proficiency in handling the
intricate structure of the Two Moons dataset. The optimal hyperparameters for each model suggest
an effective learning strategy; for instance, the Decision Tree and Random Forest utilized entropy
for splitting, which is well-suited for such complex distributions. The KNN model maintained a
consistent performance with uniform weights, while the Neural Network adapted effectively with
two hidden layers and a Tanh activation function. This demonstrates that even in non-linearly
separable cases, well-tuned models can achieve remarkable accuracy by capturing the underlying
patterns in the data.
The performance of the models demonstrates their ability to adapt to the unique challenges posed
by the Concentric Circles dataset. While the Decision Tree and Random Forest models maintained
a high accuracy, their precision for class 0 was slightly lower, reflecting a trade-off between pre-
cision and recall. The KNN model also showed robust performance, benefitting from an optimal
number of neighbors that enabled it to balance bias and variance effectively. The Neural Network’s
configuration, with three hidden layers and a Tanh activation function, allowed it to learn and gen-
eralize well, resulting in perfect accuracy. Overall, these results underline the importance of model
selection and hyperparameter tuning in achieving effective classification for complex datasets.