
K-Nearest Neighbor (KNN) Algorithm

Last Updated: 14 May, 2025

K-Nearest Neighbors (KNN) is a supervised machine learning algorithm generally used for classification but also applicable to regression tasks. It works by finding the "k" closest data points (neighbors) to a given input and making a prediction based on the majority class (for classification) or the average value (for regression). Since KNN makes no assumptions about the underlying data distribution, it is a non-parametric, instance-based learning method.


K-Nearest Neighbors is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs the computation only at classification time.
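
To make the "lazy" behaviour concrete, here is a minimal sketch (not part of the article's implementation; the class name LazyKNN is made up for illustration) in which fitting only stores the data and all distance work is deferred to prediction time:

import numpy as np

class LazyKNN:
    def fit(self, X, y):
        # No model is built here; the dataset is simply memorised.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def predict_one(self, point, k=3):
        # All the real work happens now: distances to every stored point,
        # then a majority vote over the k nearest labels.
        dists = np.linalg.norm(self.X - np.asarray(point, dtype=float), axis=1)
        nearest_labels = self.y[np.argsort(dists)[:k]]
        values, counts = np.unique(nearest_labels, return_counts=True)
        return values[np.argmax(counts)]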

For example, consider the following set of data points containing two features:

[Figure: KNN algorithm working visualization]

The new point is classified as Category 2 because most of its closest neighbors are blue squares. KNN assigns the category based on the majority of nearby points. The image shows how KNN predicts the category of a new data point based on its closest neighbors.

The red diamonds represent Category 1 and the blue squares represent Category 2.
The new data point checks its closest neighbors (the circled points).
Since the majority of its closest neighbors are blue squares (Category 2), KNN predicts that the new data point belongs to Category 2.

KNN works by using proximity and majority voting to make predictions.

What is 'K' in K Nearest Neighbour?


In the k-Nearest Neighbours algorithm, k is simply a number that tells the algorithm how many nearby points (neighbors) to look at when it makes a decision.

Example: Imagine you are deciding which fruit a new item is based on its shape and size. You compare it to fruits you already know.

If k = 3, the algorithm looks at the 3 closest fruits to the new one.
If 2 of those 3 fruits are apples and 1 is a banana, the algorithm says the new fruit is an apple because most of its neighbors are apples (a small snippet of this vote appears below).
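
The majority vote in this fruit example can be written directly with Python's Counter (the same helper used in the implementation later in the article); the labels here are purely illustrative:

from collections import Counter

# Labels of the 3 closest fruits from the example above.
neighbors = ['apple', 'apple', 'banana']

# most_common(1) returns the most frequent label: here 'apple'.
print(Counter(neighbors).most_common(1)[0][0])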
How to choose the value of k for KNN Algorithm?

The value of k in KNN decides how many neighbors the algorithm looks at when making a prediction, and choosing the right k is important for good results. If the data has a lot of noise or outliers, using a larger k can make the predictions more stable. But if k is too large, the model may become too simple and miss important patterns; this is called underfitting. So k should be picked carefully based on the data.

Statistical Methods for Selecting k

Cross-Validation: A good way to find the best value of k is k-fold cross-validation (here the number of folds is separate from the number of neighbors). This means dividing the dataset into several parts; the model is trained on some of these parts and tested on the remaining ones, and the process is repeated for each part. The value of k that gives the highest average accuracy during these tests is usually the best one to use (a short sketch follows this list).
Elbow Method: In the elbow method we draw a graph showing the error rate or accuracy for different k values. As k increases, the error usually drops at first, but after a certain point it stops decreasing quickly. The point where the curve changes direction and looks like an "elbow" is usually the best choice for k.
Odd Values for k: It's a good idea to use an odd number for k, especially in classification problems, because it helps avoid ties when deciding which class is most common among the neighbors.
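
As a rough illustration of the cross-validation approach, the following sketch (assuming scikit-learn is available; the Iris dataset is used only as a stand-in) scores several odd values of the number of neighbors with 5-fold cross-validation and keeps the best one:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Average 5-fold cross-validation accuracy for each candidate k.
scores = {}
for k in range(1, 22, 2):
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])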

Distance Metrics Used in KNN Algorithm


KNN uses distance metrics to identify the nearest neighbors; these neighbors are then used for the classification or regression task. To identify the nearest neighbors we use the distance metrics below.

1. Euclidean Distance

Euclidean distance is defined as the straight-line distance between two points in a plane or space. You can think of it as the shortest path you would walk if you were to go directly from one point to another.

\text{distance}(x, X_i) = \sqrt{\sum_{j=1}^{d} (x_j - X_{ij})^2}

2. Manhattan Distance

This is the total distance you would travel if you could only move along
horizontal and vertical lines like a grid or city streets. It’s also called
"taxicab distance" because a taxi can only drive along the grid-like streets
of a city.

d(x, y) = \sum_{i=1}^{n} |x_i - y_i|

3. Minkowski Distance

Minkowski distance is like a family of distances, which includes both Euclidean and Manhattan distances as special cases.

d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}

From the formula above, when p=2, it becomes the same as the Euclidean
distance formula and when p=1, it turns into the Manhattan distance
formula. Minkowski distance is essentially a flexible formula that can
represent either Euclidean or Manhattan distance depending on the value
of p.
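
The three metrics can be sketched directly with NumPy (a hand-rolled illustration rather than any particular library's API), which also makes it easy to check the p = 1 and p = 2 special cases:

import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((np.array(x) - np.array(y)) ** 2))

def manhattan(x, y):
    return np.sum(np.abs(np.array(x) - np.array(y)))

def minkowski(x, y, p):
    return np.sum(np.abs(np.array(x) - np.array(y)) ** p) ** (1 / p)

a, b = [1, 2], [4, 6]
print(euclidean(a, b))       # 5.0
print(manhattan(a, b))       # 7
print(minkowski(a, b, 2))    # 5.0, same as Euclidean
print(minkowski(a, b, 1))    # 7.0, same as Manhattan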

Working of KNN algorithm


The K-Nearest Neighbors (KNN) algorithm operates on the principle of similarity: it predicts the label or value of a new data point by considering the labels or values of its K nearest neighbors in the training dataset.

Step 1: Selecting the optimal value of K

K represents the number of nearest neighbors that need to be considered while making the prediction.

Step 2: Calculating distance

To measure the similarity between the target point and the training data points, Euclidean distance is used. The distance is calculated between each data point in the dataset and the target point.

Step 3: Finding Nearest Neighbors

The k data points with the smallest distances to the target point are its nearest neighbors.

Step 4: Voting for Classification or Taking Average for Regression

When you want to classify a data point into a category, like spam or not spam, the KNN algorithm looks at the K closest points in the dataset. These closest points are called neighbors. The algorithm then looks at which category the neighbors belong to and picks the one that appears most often. This is called majority voting.
In regression, the algorithm still looks for the K closest points, but instead of voting for a class it takes the average of the values of those K neighbors. This average is the predicted value for the new point (a short sketch of this follows).
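
The regression case differs from the classification code shown later only in the final step: instead of a majority vote over labels, it averages the neighbours' numeric values. A minimal sketch (knn_regress is a hypothetical helper, assuming NumPy):

import numpy as np

def knn_regress(training_data, training_values, test_point, k):
    # Distance from the test point to every training point.
    dists = np.linalg.norm(np.array(training_data, dtype=float) - np.array(test_point, dtype=float), axis=1)
    # Indices of the k nearest neighbours.
    nearest = np.argsort(dists)[:k]
    # The prediction is the average of their values.
    return float(np.mean(np.array(training_values)[nearest]))

# Example: average the values of the 3 points closest to [4, 5].
print(knn_regress([[1, 2], [2, 3], [3, 4], [6, 7], [7, 8]],
                  [1.0, 2.0, 3.0, 6.0, 7.0], [4, 5], k=3))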

Returning to classification, the visualization shows how a test point is classified based on its nearest neighbors: as the test point moves, the algorithm identifies the closest 'k' data points (5 in this case) and assigns the test point the majority class label, which here is the grey class.

Python Implementation of KNN Algorithm

1. Importing Libraries

Counter is used to count the occurrences of elements in a list or iterable. In KNN, after finding the k nearest neighbor labels, Counter helps count how many times each label appears.

import numpy as np
from collections import Counter

2. Defining the Euclidean Distance Function

euclidean_distance calculates the Euclidean distance between two points.

def euclidean_distance(point1, point2):
    # Straight-line (Euclidean) distance between two points.
    return np.sqrt(np.sum((np.array(point1) - np.array(point2)) ** 2))

3. KNN Prediction Function

distances.append saves how far each training point is from the test point, along with its label.
distances.sort sorts the list so the nearest points come first.
k_nearest_labels picks the labels of the k closest points.
Counter is then used to find which label appears most often among those k labels; that label becomes the prediction.

def knn_predict(training_data, training_labels, test_point, k):
    # Distance from the test point to every training point, paired with its label.
    distances = []
    for i in range(len(training_data)):
        dist = euclidean_distance(test_point, training_data[i])
        distances.append((dist, training_labels[i]))
    # Sort by distance so the nearest points come first.
    distances.sort(key=lambda x: x[0])
    # Labels of the k closest points; the most common one is the prediction.
    k_nearest_labels = [label for _, label in distances[:k]]
    return Counter(k_nearest_labels).most_common(1)[0][0]

4. Training Data, Labels and Test Point

training_data = [[1, 2], [2, 3], [3, 4], [6, 7], [7, 8]]
training_labels = ['A', 'A', 'A', 'B', 'B']
test_point = [4, 5]
k = 3

5. Prediction

prediction = knn_predict(training_data, training_labels, test_point, k)
print(prediction)

Output:

A

The algorithm calculates the distances from the test point [4, 5] to all training points, selects the 3 closest points (since k = 3) and determines their labels. Since the majority of the closest points are labelled 'A', the test point is classified as 'A'.

In machine learning we can also use the Scikit-learn Python library, which has built-in functions for building a KNN model; for that, refer to Implementation of KNN classifier using Sklearn.
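
As a rough sketch of the scikit-learn route (see the linked article for the full treatment), the same toy data from the implementation above can be handled with KNeighborsClassifier:

from sklearn.neighbors import KNeighborsClassifier

training_data = [[1, 2], [2, 3], [3, 4], [6, 7], [7, 8]]
training_labels = ['A', 'A', 'A', 'B', 'B']

# n_neighbors plays the same role as k in the from-scratch version.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(training_data, training_labels)

print(model.predict([[4, 5]]))  # ['A']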

Applications of KNN
Recommendation Systems: Suggests items like movies or products by
finding users with similar preferences.
Spam Detection: Identifies spam emails by comparing new emails to
known spam and non-spam examples.
Customer Segmentation: Groups customers by comparing their
shopping behavior to others.
Speech Recognition: Matches spoken words to known patterns to
convert them into text.

Advantages of KNN
Simple to use: Easy to understand and implement.
No training step: No need to train as it just stores the data and uses it
during prediction.
Few parameters: Only needs to set the number of neighbors (k) and a
distance method.
Versatile: Works for both classification and regression problems.

Disadvantages of KNN
Slow with large data: Needs to compare every point during prediction.
Struggles with many features: Accuracy drops when data has too
many features.
Can Overfit: It can overfit, especially when the data is high-dimensional or not clean.

Also Check for more understanding:

K Nearest Neighbors with Python | ML
Implementation of K-Nearest Neighbors from Scratch using Python
Mathematical explanation of K-Nearest Neighbour
Weighted K-NN
