K-NN and Perceptron
Mini Course:
Machine Learning - Mathematical Foundation and Practical Applications
The PDF file, CreditApproval.pdf, describes the features of the data set. Note that the
values in the data set were changed by the authors to protect the confidentiality of the data. Use the
first 500 data points for the training set and the rest for the testing set.
1. k-NN.
The training data set is used to determine the labels of the data in the testing set.
Use $k = \sqrt{n}$, where n is the number of data points in the training data set, and determine which of
the norms (Euclidean, Manhattan, or Mahalanobis) gives the best accuracy on the
test data set. Accuracy is defined as the number of data points
classified correctly by the algorithm divided by the total number of data points. Do
this for two cases:
– Data is used unchanged.
– Data is normalized using the mean
$$\mu_j = \frac{1}{n} \sum_{i=1}^{n} x_j^{(i)}$$
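As an illustration of the mean formula, the following minimal Python sketch computes the per-feature mean of a training set stored as a list of rows. The function name `feature_means` and the toy data are hypothetical. How the mean then enters the normalization (subtracting, dividing, or combining it with a spread measure) is not stated here, so the sketch stops at computing the means.

```python
def feature_means(X):
    """Return [mu_1, ..., mu_d], where mu_j is the average of feature j
    over the n rows of X (one row per data point)."""
    n = len(X)
    if n == 0:
        raise ValueError("need at least one data point")
    d = len(X[0])
    return [sum(row[j] for row in X) / n for j in range(d)]


# Toy data (hypothetical): 3 data points, 2 features.
X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
print(feature_means(X_train))  # [2.0, 30.0]
```

The means should be computed from the training set only and then reused for the testing set, so that no information from the test data leaks into the preprocessing.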
Note: If you write your code in MATLAB, you can use the MATLAB routines sort (for
sorting; its second output, often named idx, keeps track of the data indices when sorting) and mode (for finding the
majority of the nearest neighbors' labels).
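The k-NN procedure described above can be sketched in plain Python as follows. This is a minimal illustration, not a reference solution: the function names and toy data are hypothetical, k is assumed to be the square root of n rounded to the nearest integer, and only the Euclidean and Manhattan distances are shown (Mahalanobis is omitted because it also needs the inverse covariance matrix of the features).

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(X_train, y_train, x, k, dist):
    # Sort training indices by distance to x
    # (plays the role of the index output of MATLAB's sort).
    order = sorted(range(len(X_train)), key=lambda i: dist(X_train[i], x))
    nearest_labels = [y_train[i] for i in order[:k]]
    # Majority vote (plays the role of MATLAB's mode). On a tie, the label
    # that appears first among the nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


def accuracy(X_train, y_train, X_test, y_test, k, dist):
    correct = sum(
        knn_predict(X_train, y_train, x, k, dist) == y
        for x, y in zip(X_test, y_test)
    )
    return correct / len(y_test)


# Toy data (hypothetical): two well-separated classes.
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[0.5, 0.5], [5.5, 5.5]]
y_test = [0, 1]

k = max(1, round(math.sqrt(len(X_train))))  # assumption: round sqrt(n)
for name, dist in [("Euclidean", euclidean), ("Manhattan", manhattan)]:
    print(name, accuracy(X_train, y_train, X_test, y_test, k, dist))
```

With the real data set, `X_train` and `y_train` would hold the first 500 rows and `X_test` and `y_test` the remainder. Note that the tie-breaking rule here differs from MATLAB's mode, which returns the smallest of the tied values.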
2. Perceptron.
The training set is used to determine the best weights. Once the best weights
are determined, you apply them to the testing data set, which you can think of as future credit
card applications, except that you know the labels, so you can determine how well the
best weights computed from the training data set classify them by calculating the
accuracy. Set the number of epochs to 1000. For the output, report the accuracy for the
training data as well as the testing data. Do this for the following cases: