
Last Name: First Name: Student ID:

AIDI 1002: Machine Learning Programming — Final Exam Fall 2023


Due Date: December 15, 2023, 1:00 PM - 3:00 PM

Note: Submit two files in the submission folder: first, your Colab notebook including your code and outputs, and second, the PDF of the Colab notebook. Use the following naming convention for both files.
(File name: Lastname_Firstname_FinalExam.pdf / .ipynb)

1. (30 Points) Increasing Training Set Size Experiment: Consider the iris dataset for multiclass classification and perform
the following steps.

1. Divide the data into 80% training and 20% testing.

2. From the training set, take only 5% of the data, train the supervised learning models (Logistic Regression, Decision Trees, Random Forest, and Naive Bayes), and test them on the test set created in the previous step.

3. Repeat the training, adding 5% more training data each time, until the whole training set is used.

4. In every round of training, test your models on the 20% test set and store the accuracy and F1-score of each model.

5. Plot accuracy and F1-score against the training-set size, following the sample graph provided below (a code sketch of the full experiment follows this list).
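A minimal sketch of this experiment, assuming scikit-learn and matplotlib are available (as in Colab); the random_state values, max_iter setting, and macro-averaged F1 are illustrative assumptions rather than requirements of the question.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, f1_score

X, y = load_iris(return_X_y=True)

# Step 1: 80% training / 20% testing split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
}

fractions = np.linspace(0.05, 1.0, 20)  # 5%, 10%, ..., 100% of the training set
results = {name: {"accuracy": [], "f1": []} for name in models}

for frac in fractions:
    # Steps 2-3: take a stratified subset of the training data so every
    # class is represented even in the smallest 5% subset.
    if frac < 1.0:
        X_sub, _, y_sub, _ = train_test_split(
            X_train, y_train, train_size=float(frac),
            random_state=42, stratify=y_train)
    else:
        X_sub, y_sub = X_train, y_train
    # Step 4: train each model and evaluate on the fixed 20% test set.
    for name, model in models.items():
        model.fit(X_sub, y_sub)
        y_pred = model.predict(X_test)
        results[name]["accuracy"].append(accuracy_score(y_test, y_pred))
        results[name]["f1"].append(f1_score(y_test, y_pred, average="macro"))

# Step 5: plot each metric against the fraction of training data used.
for metric, ylabel in [("accuracy", "Accuracy"), ("f1", "Macro F1-score")]:
    for name in models:
        plt.plot(fractions * 100, results[name][metric], marker="o", label=name)
    plt.xlabel("Training data used (%)")
    plt.ylabel(ylabel)
    plt.legend()
    plt.show()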

2. (30 Points) Binary Classification with Discriminant: Consider the following 15 data points with two features, i.e., X and Y, and their associated classes:

X = [5, 1, 9, 6, 5, 6, 1, 9, 10, 11, 8, 7, 13, 8, 19]

Y = [14, 16, 17, 10, 9, 17, 15, 3, 3, 1, 4, 5, 1, 3, 15]

C = [c1, c1, c1, c1, c1, c1, c1, c2, c2, c2, c2, c2, c2, c2, c2].

Note that these data points are ordered so that (x1, y1, label1) = (5, 14, c1) and (x15, y15, label15) = (19, 15, c2).

A researcher defined a discriminant function for binary classification as g(x, y) = −x + 2y + xy, where x ∈ X and y ∈ Y. Accordingly, the classes are selected as follows:


c1 if g(x, y) ≥ 35,
c2 otherwise.

Report the accuracy of the predicted labels using the researcher’s discriminant function. (In your Jupyter Notebook,
show how you find these numbers and print them.)

Answer:

Number of misclassified in c1 =
Number of misclassified in c2 =
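A minimal sketch of the computation in plain Python; encoding the class labels as the strings "c1" and "c2" is an illustrative choice, not part of the question.

# The data points and labels as given in the question.
X = [5, 1, 9, 6, 5, 6, 1, 9, 10, 11, 8, 7, 13, 8, 19]
Y = [14, 16, 17, 10, 9, 17, 15, 3, 3, 1, 4, 5, 1, 3, 15]
C = ["c1"] * 7 + ["c2"] * 8

def g(x, y):
    # The researcher's discriminant: g(x, y) = -x + 2y + xy.
    return -x + 2 * y + x * y

# Predict c1 when g(x, y) >= 35, and c2 otherwise.
predicted = ["c1" if g(x, y) >= 35 else "c2" for x, y in zip(X, Y)]

accuracy = sum(p == t for p, t in zip(predicted, C)) / len(C)
mis_c1 = sum(1 for p, t in zip(predicted, C) if t == "c1" and p != t)
mis_c2 = sum(1 for p, t in zip(predicted, C) if t == "c2" and p != t)

print("Accuracy:", accuracy)
print("Misclassified in c1:", mis_c1)
print("Misclassified in c2:", mis_c2)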

3. (40 Points) K-Means Clustering: Consider the 30 data points and their corresponding class labels stored in a dictionary named “data_dict”.

data_dict = {(2.0, 3.43, 4.37): 2, (2.49, 4.28, 4.83): 2, (2.58, 4.36, 4.48): 2, (2.66, 4.45, 5.95): 2,
             (2.82, 3.66, 4.51): 2, (3.03, 4.37, 5.07): 2, (3.27, 4.54, 4.57): 2, (3.41, 3.94, 5.35): 2,
             (3.53, 4.32, 5.41): 2, (3.53, 4.6, 6.8): 1, (3.61, 4.25, 5.21): 1, (3.61, 4.78, 5.47): 1,
             (3.72, 5.44, 5.88): 1, (3.87, 4.96, 4.52): 2, (4.13, 5.29, 6.6): 1, (4.25, 5.97, 5.48): 1,
             (4.61, 4.9, 5.11): 1, (4.73, 4.4, 6.78): 1, (4.97, 4.25, 5.0): 1, (4.98, 5.27, 6.79): 1,
             (5.08, 3.51, 4.69): 3, (5.15, 3.58, 4.2): 3, (5.67, 2.27, 4.65): 3, (5.67, 3.81, 5.75): 3,
             (5.94, 2.34, 4.12): 3, (6.06, 3.16, 4.36): 3, (6.09, 3.19, 4.02): 3, (6.43, 3.42, 4.18): 3,
             (6.56, 2.7, 4.03): 3, (6.79, 3.46, 4.81): 3}

For instance, the first point has coordinates (x1, x2, x3) = (2.0, 3.43, 4.37) and belongs to class 2. In total we have three classes: 1, 2, and 3.

As a discriminant function, consider a distance function based on the center coordinates below (encoded as a dictionary of values) for each class label.

centers_dict = {}
centers_dict[(4, 5, 6)] = 1  # center coordinates for class 1, i.e., x1=4, x2=5, x3=6
centers_dict[(3, 4, 5)] = 2  # center coordinates for class 2, i.e., x1=3, x2=4, x3=5
centers_dict[(6, 3, 5)] = 3  # center coordinates for class 3, i.e., x1=6, x2=3, x3=5

Note that a discriminant function based on cosine distance can be written as follows: the cosine distance between points a = [a1, ..., an] and b = [b1, ..., bn] is

d(a, b) = 1 − (a · b) / (∥a∥ ∥b∥), where a · b = Σ_{i=1}^{n} a_i b_i and ∥a∥ = √(Σ_{i=1}^{n} a_i^2).

Based on the above discriminant function, perform a K-Means clustering task over the 30 points in data_dict and then compare the result with the true labels. Print the number of correctly classified instances in your answer (a code sketch follows below).
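One reasonable reading of the task, sketched below: the given centers serve as initial cluster centers, points are assigned to the nearest center under cosine distance, and each center is then updated to the mean of its assigned points (a k-means-style loop). Because each center in centers_dict already carries its class label, the predicted cluster labels can be compared directly with the true labels; data_dict and centers_dict refer to the dictionaries defined above.

import numpy as np

def cosine_distance(a, b):
    # d(a, b) = 1 - (a . b) / (||a|| ||b||)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

points = list(data_dict.keys())
true_labels = list(data_dict.values())

# Initialize each cluster center from centers_dict; the dictionary value
# is the class label, so clusters are already aligned with true classes.
centers = {label: np.array(c, dtype=float) for c, label in centers_dict.items()}

for _ in range(10):  # a handful of iterations suffices for 30 points
    # Assignment step: each point joins the cluster of its nearest center.
    assigned = {label: [] for label in centers}
    for p in points:
        nearest = min(centers, key=lambda lab: cosine_distance(p, centers[lab]))
        assigned[nearest].append(p)
    # Update step: each center moves to the mean of its assigned points.
    for label, pts in assigned.items():
        if pts:  # keep the previous center if a cluster ends up empty
            centers[label] = np.mean(pts, axis=0)

# Final assignment, compared against the true labels.
predictions = [min(centers, key=lambda lab: cosine_distance(p, centers[lab]))
               for p in points]
correct = sum(pred == true for pred, true in zip(predictions, true_labels))
print("Correctly classified instances:", correct, "/", len(points))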
