Machine Learning

The K-Nearest Neighbour (K-NN) algorithm is a simple supervised learning method used for classification and regression, which categorizes new data points based on the majority class of their nearest neighbors. The algorithm involves selecting a number K for neighbors, calculating Euclidean distances, and assigning categories based on the most frequent class among the nearest neighbors. K-NN is widely applicable in various fields such as banking, politics, and image recognition, but it requires careful selection of K and can be computationally intensive.


K-Nearest Neighbour (K-NN) Algorithm

• What is K-NN
• An Example
• Working Principle
Different Steps in KNN
Step-1: Select the number K of neighbours.
Step-2: Calculate the Euclidean distance between the new data point and every training point.
Step-3: Take the K nearest neighbours as per the calculated Euclidean distances.
Step-4: Among these K neighbours, count the number of data points in each category.
Step-5: Assign the new data point to the category with the largest number of neighbours.
Step-6: Our model is ready.
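The six steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides; the function and variable names are our own.

```python
import math
from collections import Counter

def knn_classify(query, training_points, k=5):
    """Classify `query` by majority vote among its k nearest training points.

    `training_points` is a list of ((x1, x2, ...), label) pairs.
    """
    # Step-2: Euclidean distance from the query to every training point
    distances = [(math.dist(query, point), label)
                 for point, label in training_points]
    # Step-3: take the K nearest neighbours
    nearest = sorted(distances)[:k]
    # Step-4 and Step-5: count labels among the neighbours, pick the majority
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Note that there is no training phase: the function just stores and scans the data, which is exactly what makes K-NN a "lazy learner".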
Illustration

• Firstly, we choose the number of neighbours; here we take K = 5.
• Next, we calculate the Euclidean distance between the new data point and each existing data point. The Euclidean distance is the straight-line distance between two points, familiar from geometry; for points (x1, y1) and (x2, y2) it is d = sqrt((x2 - x1)^2 + (y2 - y1)^2).
• By calculating the Euclidean distances we find the nearest neighbours: three of the five are in category A and two are in category B.
• Since the majority of the 5 nearest neighbours are from category A, the new data point is assigned to category A.
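The distance formula above translates directly into code. This small helper is illustrative, not part of the original slides:

```python
import math

def euclidean(p, q):
    """Euclidean (straight-line) distance between two points of any dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

euclidean((0, 0), (3, 4))  # 5.0, the classic 3-4-5 right triangle
```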
KNN Algorithm
• K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
• The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category most similar to the available categories.
• This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
• The K-NN algorithm can be used for Regression as well as Classification, but it is mostly used for Classification problems.
• K-NN is a non-parametric algorithm, which means it makes no assumptions about the underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and performs an action on it at classification time.
Applications of KNN

Banking System
KNN can be used in a banking system to predict whether an individual is fit for loan approval: does that individual have characteristics similar to those of defaulters?

Calculating Credit Ratings
KNN algorithms can be used to find an individual's credit rating by comparing them with persons having similar traits.

Politics
With the help of KNN algorithms, we can classify a potential voter into classes like "Will Vote", "Will Not Vote", "Will Vote for Party 'Congress'" or "Will Vote for Party 'BJP'".

Other areas in which the KNN algorithm can be used are Speech Recognition, Handwriting Detection, Image Recognition and Video Recognition.
Consider the task of classifying whether a special paper tissue is of good or bad quality, given the following data:

Points   X1 (Acid Durability)   X2 (Strength)   Y (Classification)
P1       7                      7               BAD
P2       7                      4               BAD
P3       3                      4               GOOD
P4       1                      4               GOOD
P5       3                      7               ?
Procedure
• Step-1: Determine K, the number of neighbours. Let us select K = 3.
• Step-2: Calculate the distance between the query instance and all the training samples.
• Step-3: Sort the distances and determine the nearest neighbours based on the K-th minimum distance.
• Step-4: Gather the categories (classes) of those neighbours.
• Step-5: Use the simple majority of the neighbours' categories as the classified value of the query instance.
Euclidean distance of P5 (3, 7) from each training point:

Point                P1 (7, 7)   P2 (7, 4)   P3 (3, 4)   P4 (1, 4)
Euclidean distance   4           5           3           3.61
Class                BAD         BAD         GOOD        GOOD

With K = 3, the three nearest neighbours are P3 (distance 3), P4 (3.61) and P1 (4): two GOOD and one BAD, so by majority vote P5 is classified as GOOD.


Points   X1 (Durability)   X2 (Strength)   Y (Classification)
P1       7                 7               BAD
P2       7                 4               BAD
P3       3                 4               GOOD
P4       1                 4               GOOD
P5       3                 7               GOOD
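The distances behind this classification can be checked with a few lines of Python (math.dist computes the Euclidean distance; this snippet is our own illustration, not from the slides):

```python
import math

# Training points from the tissue-quality table
points = {"P1": (7, 7), "P2": (7, 4), "P3": (3, 4), "P4": (1, 4)}
query = (3, 7)  # P5

# Euclidean distance from P5 to each training point, rounded to 2 decimals
distances = {name: round(math.dist(query, p), 2) for name, p in points.items()}
print(distances)  # {'P1': 4.0, 'P2': 5.0, 'P3': 3.0, 'P4': 3.61}
```

The three smallest distances belong to P3, P4 and P1, giving two GOOD votes against one BAD.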
Example 2: Calculation
Consider the following data concerning credit default. Age and Loan are two numerical variables (predictors) and Default is the target.
We can now use the training set to classify an unknown case (Age=48
and Loan=$142,000) using Euclidean distance. If K=1 then the nearest
neighbor is the last case in the training set with Default=Y.

With K=3, there are two Default=Y and one Default=N out of
three closest neighbors. The prediction for the unknown case is
again Default=Y.
Standardized Distance

One major drawback of calculating distance measures directly from the training set arises when the variables have different measurement scales, or when there is a mixture of numerical and categorical variables.

Using the standardized distance on the same training set, the unknown case returned a different neighbour, which is not a good sign of robustness.
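The slides do not show the standardization formula. One common choice for K-NN is min-max scaling to [0, 1], sketched below; z-scores are another option. This is an assumption about what "standardized" means here, not something stated in the slides.

```python
def min_max_scale(column):
    """Rescale a numeric column to [0, 1] via (x - min) / (max - min).

    After scaling, a variable measured in dollars (e.g. Loan) no longer
    dominates a variable measured in years (e.g. Age) in the distance.
    """
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

min_max_scale([20, 30, 40, 60])  # [0.0, 0.25, 0.5, 1.0]
```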
How to select the value of K in the K-NN Algorithm?
• There is no particular way to determine the best value for K, so we need to try several values and pick the best among them. The most commonly preferred value is K = 5.
• A very low value of K, such as K = 1 or K = 2, can be noisy and makes the model sensitive to outliers.
• Large values of K smooth out the predictions, but the neighbourhood may then include points from other categories, blurring the class boundaries.
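One simple way to compare candidate values of K is leave-one-out validation: classify each point using only the remaining points and count how often the label is recovered. The sketch below (our own illustration) applies this to the four tissue-quality points from the earlier example; with so little data it only demonstrates the mechanics, not a reliable way to pick K.

```python
import math
from collections import Counter

data = [((7, 7), "BAD"), ((7, 4), "BAD"), ((3, 4), "GOOD"), ((1, 4), "GOOD")]

def loo_accuracy(data, k):
    """Leave-one-out accuracy of K-NN on `data` for a given k."""
    correct = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]          # hold out the i-th point
        nearest = sorted((math.dist(point, p), l) for p, l in rest)[:k]
        votes = Counter(l for _, l in nearest)
        if votes.most_common(1)[0][0] == label:
            correct += 1
    return correct / len(data)

for k in (1, 3):
    print(k, loo_accuracy(data, k))
```

On real datasets this loop would run over a proper validation split and a wider range of K values.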
Advantages of the KNN Algorithm:
• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective when the training data is large.
Disadvantages of the KNN Algorithm:
• The value of K always needs to be determined, which can be complex at times.
• The computation cost is high, because the distance to every training sample must be calculated for each prediction.
