Homework 1 DSCI 552, Instructor: Mohammad Reza Rajati
(d) Replace the Euclidean metric with the following metrics [5] and test them.
Summarize the test errors (i.e., the test errors at k = k*) in a table. Use all of
your training data and select the best k when k ∈ {1, 6, 11, . . . , 196}. A sketch
in scikit-learn follows the list of metrics below.
i. Minkowski Distance:
A. which becomes Manhattan Distance with p = 1.
B. with log10(p) ∈ {0.1, 0.2, 0.3, . . . , 1}. In this case, use the k*
you found for the Manhattan distance in 1(d)iA. What is the best log10(p)?
C. which becomes Chebyshev Distance with p → ∞.
ii. Mahalanobis Distance [6].
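Below is a minimal sketch, using scikit-learn's KNeighborsClassifier, of how each
metric in this part can be evaluated. The stand-in arrays, the test_error helper,
and the placeholder k_star are assumptions for illustration, not part of the
assignment; substitute your own data and the k* values you actually found.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Stand-in data so the sketch runs; substitute your actual train/test split.
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(300, 4))
    y_train = rng.integers(0, 2, size=300)
    X_test = rng.normal(size=(100, 4))
    y_test = rng.integers(0, 2, size=100)

    def test_error(clf):
        """Fit on the training set and return the test error rate."""
        clf.fit(X_train, y_train)
        return 1.0 - clf.score(X_test, y_test)

    k_star = 6  # placeholder for the best k found with the given metric

    # A. Manhattan distance = Minkowski with p = 1
    err_manhattan = test_error(
        KNeighborsClassifier(n_neighbors=k_star, metric='minkowski', p=1))

    # B. Sweep log10(p) over {0.1, 0.2, ..., 1.0} at the Manhattan k*
    errors = {}
    for log10_p in np.round(np.arange(0.1, 1.01, 0.1), 1):
        errors[log10_p] = test_error(
            KNeighborsClassifier(n_neighbors=k_star, metric='minkowski',
                                 p=10.0 ** log10_p))
    best_log10_p = min(errors, key=errors.get)

    # C. Chebyshev distance = Minkowski with p -> infinity
    err_chebyshev = test_error(
        KNeighborsClassifier(n_neighbors=k_star, metric='chebyshev'))

    # ii. Mahalanobis distance: pass the (pseudo)inverse covariance matrix.
    VI = np.linalg.pinv(np.cov(X_train, rowvar=False))
    err_mahalanobis = test_error(
        KNeighborsClassifier(n_neighbors=k_star, metric='mahalanobis',
                             metric_params={'VI': VI}, algorithm='brute'))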
(e) The majority voting decision can be replaced by a weighted decision, in which
the weight of each point's vote is inversely proportional to its distance from the
query/test data point. In this case, closer neighbors of a query point have a
greater influence than neighbors that are farther away. Use weighted voting with
Euclidean, Manhattan, and Chebyshev distances and report the best test errors
when k ∈ {1, 6, 11, 16, . . . , 196} (see the sketch below).
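A minimal sketch of this weighted scheme, reusing the stand-in data and the
test_error helper from the sketch under part (d); weights='distance' is
scikit-learn's built-in inverse-distance weighting, which matches the rule
described in this part.

    from sklearn.neighbors import KNeighborsClassifier

    # Each neighbor's vote is weighted by 1/distance to the query point.
    for name, kwargs in [('euclidean', {'metric': 'minkowski', 'p': 2}),
                         ('manhattan', {'metric': 'minkowski', 'p': 1}),
                         ('chebyshev', {'metric': 'chebyshev'})]:
        best_err = min(
            test_error(KNeighborsClassifier(n_neighbors=k, weights='distance',
                                            **kwargs))
            for k in range(1, 197, 5))  # k = 1, 6, 11, ..., 196
        print(f'{name}: best test error = {best_err:.3f}')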
(f) What is the lowest training error rate you achieved in this homework?
[5] You can use sklearn.neighbors.DistanceMetric. Research what each distance means.
[6] Mahalanobis Distance requires inverting the covariance matrix of the data. When
the covariance matrix is singular or ill-conditioned, the data live in a linear
subspace of the feature space. In this case, the features have to be transformed
into a reduced feature set in the linear subspace, which is equivalent to using a
pseudoinverse instead of an inverse.
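A minimal illustration of the pseudoinverse workaround footnote [6] describes,
assuming NumPy; the synthetic data and the mahalanobis helper are hypothetical.

    import numpy as np

    def mahalanobis(x, y, VI):
        """Mahalanobis distance between x and y, given (pseudo)inverse covariance VI."""
        d = x - y
        return float(np.sqrt(d @ VI @ d))

    # A linearly dependent column makes the covariance matrix singular, so
    # np.linalg.inv would fail; the Moore-Penrose pseudoinverse effectively
    # works in the linear subspace the data actually occupy.
    X = np.random.default_rng(0).normal(size=(50, 3))
    X = np.column_stack([X, X[:, 0] + X[:, 1]])   # 4th feature = sum of first two
    VI = np.linalg.pinv(np.cov(X, rowvar=False))  # pseudoinverse, not inverse
    print(mahalanobis(X[0], X[1], VI))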