Clustering - KNN

K-nearest neighbors (KNN) is a supervised machine learning algorithm that can be used for classification or regression. It works by finding the K closest training examples to a new data point based on distance, and assigning the new point the most common label of its K neighbors. Choosing the right value for K is important: a value that is too small can be unstable, while one that is too large may over-generalize. KNN is a simple algorithm but becomes slower with large datasets.


Clustering
K-NN
K-nearest neighbors (KNN) is a supervised learning algorithm used for both regression and classification.
• KNN predicts the correct class for test data by calculating the distance between the test point and all the training points.
• It then selects the K points that are closest to the test point.
• For classification, KNN computes the probability of the test point belonging to each class among the K nearest training points (the fraction of neighbors in each class), and the class with the highest probability is selected. A usage sketch follows this list.
• In the case of regression, the predicted value is the mean of the targets of the K selected training points.
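
A minimal sketch of both uses, assuming scikit-learn is available; the toy data below is invented purely for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Toy training data: 2-D points (illustrative values only).
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
                    [6.0, 9.0], [1.2, 0.9], [5.5, 8.5]])
y_class = np.array([0, 0, 1, 1, 0, 1])

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_class)
print(clf.predict([[1.1, 1.5]]))        # most common label among the 3 nearest neighbors
print(clf.predict_proba([[1.1, 1.5]]))  # fraction of the K neighbors in each class

# For regression, the prediction is the mean target of the K nearest neighbors.
y_value = np.array([1.1, 1.3, 7.9, 9.2, 0.8, 8.4])
reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X_train, y_value)
print(reg.predict([[1.1, 1.5]]))
```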
KNN
How does it work?
The working of K-NN can be explained with the following algorithm:
• Step-1: Select the number K of neighbors.
• Step-2: Calculate the Euclidean distance from the new data point to every training point.
• Step-3: Take the K nearest neighbors according to the calculated Euclidean distances.
• Step-4: Among these K neighbors, count the number of data points in each category.
• Step-5: Assign the new data point to the category with the highest count among the K neighbors.
• Step-6: Our model is ready. (A from-scratch sketch of these steps follows below.)
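
The six steps can be written out directly in Python with NumPy. This is a from-scratch sketch; the function name knn_predict and the toy data are illustrative assumptions, not part of the original slides:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # Step-2: Euclidean distance from the query to every training point.
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Step-3: indices of the K nearest neighbors.
    nearest = np.argsort(distances)[:k]
    # Steps 4-5: count the labels among the K neighbors and take the majority.
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.2, 1.5]), k=3))  # -> 0
```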
How to choose the value of K?
• There is no pre-defined statistical method for finding the most favorable value of K.
• Initialize a random K value and start computing.
• Choosing a small value of K leads to unstable decision boundaries.
• A larger value of K is better for classification, as it smooths the decision boundaries.
• Plot the error rate against K over a defined range of values, then choose the K with the minimum error rate (see the sketch below).
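
Assuming scikit-learn and matplotlib are available, the error-rate-versus-K plot described in the last bullet can be sketched as follows; the Iris dataset and the K range of 1-30 are arbitrary illustrative choices:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
k_values = range(1, 31)
error_rates = []
for k in k_values:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    error_rates.append(1 - scores.mean())  # error rate = 1 - accuracy

plt.plot(k_values, error_rates, marker="o")
plt.xlabel("K")
plt.ylabel("Cross-validated error rate")
plt.title("Choosing K: pick the value with the lowest error")
plt.show()
```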
Value of K?
1. As we decrease the value of K to 1, our predictions become less stable. Imagine K=1 and a query point surrounded by several red points and one green point, where the green point happens to be the single nearest neighbor. Reasonably, we would say the query point is most likely red, but because K=1, KNN incorrectly predicts that it is green (see the sketch after this list).
2. Conversely, as we increase the value of K, our predictions become more stable due to majority voting / averaging, and thus more likely to be accurate (up to a certain point). Eventually, we begin to see an increasing number of errors; at that point we know we have pushed the value of K too far.
3. In cases where we take a majority vote among labels (e.g., picking the mode in a classification problem), we usually make K an odd number so there is a tiebreaker.
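Point 1 can be reproduced in a few lines. This is a toy sketch, assuming scikit-learn; the coordinates are invented so that the lone "green" point is the query's single nearest neighbor:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [1.0, 1.0]])  # three reds, one green
y = np.array(["red", "red", "red", "green"])
query = [[0.9, 0.9]]  # the single closest training point is the green one

print(KNeighborsClassifier(n_neighbors=1).fit(X, y).predict(query))  # -> 'green'
print(KNeighborsClassifier(n_neighbors=3).fit(X, y).predict(query))  # -> 'red'
```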
Pros and Cons
• Advantages
1. The algorithm is simple and easy to implement.
2. There's no need to build a model, tune several parameters, or make additional assumptions.
3. The algorithm is versatile: it can be used for classification, regression, and search.
• Disadvantages
1. The algorithm gets significantly slower as the number of examples and/or predictors (independent variables) increases, since each prediction must compute the distance from the query to every stored training point. One common mitigation is sketched below.
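One common way to soften this disadvantage, assuming scikit-learn is in use, is to switch the neighbor search from brute force to a tree-based index; this is a standard KNeighborsClassifier option, not something specific to these slides:

```python
from sklearn.neighbors import KNeighborsClassifier

# 'kd_tree' (or 'ball_tree') builds an index once, so each query no longer
# scans every training example; 'auto' lets scikit-learn choose for you.
clf = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")
```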
Conclusion
• KNN works by finding the distances between a query and all the examples in the data, selecting the specified number of examples (K) closest to the query, and then voting for the most frequent label (in the case of classification) or averaging the labels (in the case of regression).
