The document contains 10 multiple choice questions about the K-nearest neighbors (KNN) algorithm. KNN is a non-parametric, lazy learning algorithm used for classification and regression. It works by finding the closest training examples in the feature space and predicting the target value from those neighbors: a majority vote for classification, an average for regression.
Assignment 2
Assignment 2
Question 1: What is the KNN algorithm?
(A) The KNN algorithm is non-parametric and does not make assumptions about the underlying distribution of the data.
(B) KNN works by finding the K closest data points (neighbors) to the query point and predicts the output based on the labels of these neighbors.
(C) The KNN algorithm is a lazy machine learning algorithm for classification and regression tasks. It can work well with both binary and multi-class classification problems.
(D) All of the above
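To make the "lazy learning" point in option (C) concrete, here is a minimal sketch using scikit-learn; the iris dataset and n_neighbors=5 are arbitrary illustrative choices, not part of the assignment. fit() essentially stores the training data, and the distance work happens at prediction time.

```python
# Minimal KNN classification sketch (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Lazy" learner: fit() stores the training data; distances to the stored
# points are only computed when predict()/score() is called.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```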
Question 2: Euclidean and Minkowski distances are the most commonly used distance metrics in the KNN algorithm. Which other distance metrics are used in the KNN algorithm?
(A) Cosine distance
(B) Haversine distance
(C) Manhattan distance
(D) All of the above
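For reference, the metrics named in this question can all be computed directly. The sketch below assumes NumPy, SciPy, and scikit-learn are installed; the points and coordinates are arbitrary examples.

```python
# Distance metrics from Question 2, computed for arbitrary example points.
import numpy as np
from scipy.spatial import distance
from sklearn.metrics.pairwise import haversine_distances

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.5])

print("Euclidean:       ", distance.euclidean(a, b))       # sqrt of summed squared differences
print("Manhattan:       ", distance.cityblock(a, b))       # sum of absolute differences
print("Minkowski (p=3): ", distance.minkowski(a, b, p=3))  # generalizes the two above
print("Cosine distance: ", distance.cosine(a, b))          # 1 - cosine similarity

# Haversine operates on (latitude, longitude) pairs in radians, e.g. two cities;
# the result is an angular distance (multiply by Earth's radius for kilometres).
paris, london = np.radians([48.8566, 2.3522]), np.radians([51.5074, -0.1278])
print("Haversine:       ", haversine_distances([paris, london])[0, 1])
```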
Question 3: What are the disadvantages of using the KNN algorithm?
(A) As the number of dimensions increases, the distance between any two points in the space becomes increasingly large, making it difficult to find meaningful nearest neighbors.
(B) Computationally expensive, especially for large datasets, and requires a large amount of memory to store the entire dataset.
(C) Sensitive to the choice of K and distance metric.
(D) All of the above
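Option (A) describes the curse of dimensionality. A small self-contained experiment (my own illustration, assuming only NumPy) shows how the nearest and farthest neighbors of a random query point become nearly equidistant as the dimensionality grows:

```python
# Curse-of-dimensionality illustration: the nearest/farthest distance ratio
# approaches 1 as the number of dimensions increases.
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((1000, d))          # 1000 random "training" points
    q = rng.random(d)                  # one random query point
    dists = np.linalg.norm(X - q, axis=1)
    print(f"d={d:5d}  nearest/farthest distance ratio = {dists.min() / dists.max():.3f}")
```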
Question 4: How do you choose the value of K (the number of neighbors to consider) in the KNN algorithm? (Select two)
(A) A small value of K, for example K=1, will result in a more flexible model but may be prone to overfitting.
(B) A large value of K, for example K=n, where n is the size of the dataset, will result in a more stable model but may not capture the local variations in the data.
(C) A large value of K, for example K=n, where n is the size of the dataset, will result in a more flexible model but may be prone to overfitting.
(D) A small value of K, for example K=1, will result in a more stable model but may not capture the local variations in the data.
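In practice, K is usually chosen between these two extremes by validation. A sketch with scikit-learn (the dataset, the K range, and cv=5 are arbitrary choices) that picks K by cross-validated accuracy:

```python
# Choosing K by 5-fold cross-validation (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in range(1, 31, 2)           # odd K values help avoid voting ties
}
best_k = max(scores, key=scores.get)
print("best K:", best_k, "CV accuracy:", round(scores[best_k], 3))
```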
Question 5: How do you handle imbalanced data in the KNN algorithm?
(A) Weighted voting, where the vote of each neighbor is weighted by its inverse distance to the query point. This gives more weight to the closer neighbors and less weight to the farther neighbors, which can help to reduce the effect of the majority class.
(B) Oversample the minority class.
(C) Undersample the majority class.
(D) All of the above.
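Option (A), distance-weighted voting, maps directly onto scikit-learn's weights="distance" parameter. The sketch below uses a synthetic 90%/10% imbalanced dataset purely for illustration; the resampling options (B) and (C) are typically handled with the separate imbalanced-learn package rather than scikit-learn itself.

```python
# Distance-weighted KNN voting on a synthetic imbalanced dataset (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# weights="distance": each neighbor's vote is scaled by the inverse of its distance,
# so nearby minority-class points are not drowned out by more numerous distant ones.
knn = KNeighborsClassifier(n_neighbors=7, weights="distance")
knn.fit(X, y)
print("training accuracy:", knn.score(X, y))

# Options (B)/(C) are usually done with the imbalanced-learn package, e.g.
# imblearn.over_sampling.RandomOverSampler or imblearn.under_sampling.RandomUnderSampler.
```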
Question 6: How would you choose the distance metric in KNN?
(A) Euclidean distance is a good default choice for continuous data. It works well when the data is dense and the differences between features are important.
(B) Manhattan distance is a good choice when the data has many outliers or when the scale of the features is different. For example, if we are comparing distances between two cities, the distance metric should not be affected by the difference in elevation or terrain between the cities.
(C) Minkowski distance with p=1 is equivalent to Manhattan distance, and Minkowski distance with p=2 is equivalent to Euclidean distance. Minkowski distance allows you to control the order of the distance metric based on the nature of the problem.
(D) All of the above
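As a usage note, all three metrics discussed in options (A)-(C) can be selected through the same scikit-learn estimator via its metric and p parameters; the values below are arbitrary examples.

```python
# Selecting the distance metric in scikit-learn's KNN (illustrative parameter values).
from sklearn.neighbors import KNeighborsClassifier

knn_euclidean = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)  # Euclidean
knn_manhattan = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=1)  # Manhattan
knn_minkowski = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=3)  # higher-order Minkowski
```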
Question 7: What are the ideal use cases for KNN?
(A) KNN is best suited for small to medium-sized datasets with relatively low dimensionality. It can be useful in situations where the decision boundary is linear. It can be effective in cases where the data is clustered or has distinct groups.
(B) KNN is best suited for large datasets with relatively high dimensionality. It can be useful when the decision boundary is highly irregular or nonlinear. It can be effective in cases where the data is clustered or has distinct groups.
(C) KNN is best suited for small to medium-sized datasets with relatively low dimensionality. It can be useful when the decision boundary is highly irregular or nonlinear. It can be effective in cases where the data is clustered or has distinct groups.
(D) KNN is best suited for small to medium-sized datasets with relatively low dimensionality. It can be useful when the decision boundary is highly irregular or nonlinear. It can be effective in cases where the data is not clustered or doesn’t have distinct groups.
Question 8: How does the KNN algorithm work? (Select two)
(A) KNN works by calculating the distance between a data point and all other points in the dataset. Then, KNN selects the k-nearest neighbors. For regression, the most common class among the ‘k’ neighbors is assigned as the predicted class for the new data point.
(B) KNN works by calculating the distance between a data point and all other points in the dataset. Then, KNN selects the k-nearest neighbors. For classification, it averages the values of the ‘k’ nearest neighbors and assigns that average to the new data point.
(C) KNN works by calculating the distance between a data point and all other points in the dataset. Then, KNN selects the k-nearest neighbors. For classification, the most common class among the ‘k’ neighbors is assigned as the predicted class for the new data point.
(D) KNN works by calculating the distance between a data point and all other points in the dataset. Then, KNN selects the k-nearest neighbors. For regression tasks, instead of a majority vote, the algorithm takes the average of the ‘k’ nearest neighbors’ values as the prediction.
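The mechanics described in options (C) and (D) can be written out in a few lines. The following from-scratch sketch (NumPy assumed; the data and k values are made up for illustration) computes all distances, keeps the k nearest, then takes a majority vote for classification or an average for regression:

```python
# From-scratch KNN prediction: distances -> k nearest -> vote or average.
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3, task="classification"):
    dists = np.linalg.norm(X_train - x_query, axis=1)   # distance to every training point
    nearest = np.argsort(dists)[:k]                      # indices of the k nearest neighbors
    if task == "classification":
        return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote
    return y_train[nearest].mean()                       # average for regression

X = np.array([[0, 0], [0, 1], [5, 5], [6, 5]])
y_cls = np.array([0, 0, 1, 1])
y_reg = np.array([1.0, 1.2, 9.5, 10.0])
print(knn_predict(X, y_cls, np.array([0.2, 0.3]), k=3))                      # -> 0
print(knn_predict(X, y_reg, np.array([5.5, 5.0]), k=2, task="regression"))  # -> 9.75
```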
Question 9: What’s the bias and variance trade-off for KNN? (Select two)
(A) A small ‘k’ results in a low bias but high variance (the model is sensitive to noise).
(B) A large ‘k’ results in a low bias but high variance (the model is sensitive to noise).
(C) A large ‘k’ leads to high bias but low variance (smoothing over the data).
(D) A small ‘k’ leads to high bias but low variance (smoothing over the data).
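A quick empirical illustration of the trade-off (my own sketch with scikit-learn; the synthetic dataset and K values are arbitrary): K=1 nearly memorizes the noisy training set (low bias, high variance), while a very large K smooths predictions toward the global majority (high bias, low variance).

```python
# Train vs. test accuracy for very small, moderate, and very large K.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, flip_y=0.1, random_state=0)  # flip_y adds label noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 15, 300):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(f"K={k:3d}  train acc={knn.score(X_tr, y_tr):.2f}  test acc={knn.score(X_te, y_te):.2f}")
```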
Question 10: Which options are correct about instance-based learning, model-based learning, and online learning? (Select two)
(A) KNN is an instance-based learning algorithm, meaning it memorizes the entire training dataset and makes predictions based on similarity to instances. That’s why KNN is not naturally suited for online learning: because it memorizes the entire training dataset, the model needs to be recalculated when new data is added.
(B) Model-based learning involves learning a mapping from inputs to outputs and generalizing to new, unseen data. For example, SVM, Decision Trees, etc.
(C) KNN is a model-based learning algorithm, meaning it memorizes the entire training dataset and makes predictions based on similarity to instances. That’s why KNN is not naturally suited for online learning: because it memorizes the entire training dataset, the model needs to be recalculated when new data is added.
(D) Instance-based learning involves learning a mapping from inputs to outputs and generalizing to new, unseen data. For example, SVM, Decision Trees, etc.