
Unit 4.8 KNN

K-Nearest Neighbors (KNN) is a supervised learning algorithm that classifies new instances based on their similarity to the k closest examples in the training data, using distance metrics like Euclidean distance. It is a lazy learner that stores training data and computes distances only when a query is received, making it adaptable but sensitive to the choice of k. KNN is particularly effective in applications such as medical diagnosis, spam filtering, and recommendation systems, though it requires careful management of computational costs and k selection for optimal performance.

KNN

Definition:

K-Nearest Neighbors (KNN) is a supervised learning algorithm that classifies or predicts a new instance based on its similarity to the k closest examples in the training data, using a distance metric such as Euclidean distance. It is non-parametric and considered a lazy learner because it stores the training data and delays computation until a query is received.
How KNN Works:
1. Storage of Data (Lazy Learning):
• KNN does not create a general model during the training phase. Instead, it memorizes the training data and defers computation until a new query instance needs to be classified or predicted.
2. Determining Similarity:
• When a new data point arrives, KNN computes the distance between this new instance and every instance in the training dataset. Commonly used metrics include Euclidean, Manhattan, and Hamming distance.
3. Selection of Neighbors:
• After calculating the distances, the algorithm selects the k closest data points (neighbors) to the new instance.
4. Making Predictions:
• For Classification: the new instance is assigned the class held by the majority of its k nearest neighbors (majority vote).
• For Regression: the prediction is the average (or distance-weighted average) of the neighbors' target values.
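The four steps above can be sketched in a few lines of plain Python. This is a minimal illustrative implementation for classification (Euclidean distance, majority vote); the function and variable names are chosen for this sketch, not taken from any library.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Step 2 (Determining Similarity): Euclidean distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train_X, train_y, query, k=3):
    # Step 1 (Lazy Learning): train_X / train_y are stored as-is;
    # all computation happens only when a query arrives.
    distances = [(euclidean(x, query), label)
                 for x, label in zip(train_X, train_y)]
    # Step 3 (Selection of Neighbors): the k closest training points.
    neighbors = sorted(distances, key=lambda d: d[0])[:k]
    # Step 4 (Making Predictions): majority vote among the neighbors' labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters labeled "A" and "B".
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(X, y, (2, 2), k=3))  # "A" — all 3 nearest points are in cluster A
```

Note that there is no separate training step: the "model" is just the stored dataset, which is exactly what makes KNN a lazy learner.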
Choosing the Value of k:
• A small k value (e.g., k = 1 or 2) can be overly sensitive to noise and outliers, leading to
unstable predictions.
• A large k value may smooth out class boundaries too much, causing misclassification.
• Typically, one determines the optimal k by plotting the error rate against different k values on held-out (validation) data and choosing the k that minimizes the error.
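The effect of k can be seen on a toy dataset using leave-one-out validation: each point is predicted from all the others, and the error rate is tallied per k. This is a hedged sketch with made-up data; with eight points split 4/4, a very large k (here k = 7) is guaranteed to be outvoted by the opposite class, illustrating how an oversized k smooths away the class boundary.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def predict(train, query, k):
    # Majority vote among the k training points nearest to the query.
    neighbors = sorted(train, key=lambda t: euclidean(t[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Two well-separated toy clusters, four points each.
data = [((1, 1), "A"), ((1, 2), "A"), ((2, 2), "A"), ((2, 1), "A"),
        ((7, 7), "B"), ((7, 8), "B"), ((8, 8), "B"), ((8, 7), "B")]

def error_rate(k):
    # Leave-one-out: predict each point from all the others.
    errors = sum(
        predict(data[:i] + data[i + 1:], x, k) != label
        for i, (x, label) in enumerate(data)
    )
    return errors / len(data)

for k in (1, 3, 5, 7):
    print(f"k={k}: leave-one-out error rate = {error_rate(k):.2f}")
```

Small and moderate k classify every point correctly here, while k = 7 forces each held-out point to be outvoted by the 4 points of the other class, giving an error rate of 1.0. In practice the same loop is run over a range of k values and the curve of error rate versus k is inspected for its minimum.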
Example – Medical Diagnosis (Diabetes Prediction):
Imagine a healthcare system that needs to predict whether a patient is at risk of diabetes based on features such as blood sugar level, body mass index (BMI), age, and blood pressure. Here is how KNN would work in this scenario:
• Data Collection & Storage:
• A dataset is collected from past patient records, where each record includes the patient's measurements and a label indicating whether they were diagnosed with diabetes.
• The entire dataset is stored as is (no model is built initially).
• Query Processing:
• When a new patient record arrives, the system calculates the Euclidean distance between this new record and all stored patient records.
• Suppose the algorithm is set to k = 5, and the five nearest records include four patients who were diagnosed with diabetes and one who was not.
• Prediction:
• Since the majority of the nearest neighbors (4 out of 5) belong to the "diabetes" class, the KNN classifier predicts that the new patient is at risk of diabetes.
• Advantages in This Context:
• Adaptability: as new patient records are collected, they are simply added to the dataset, allowing the model to adapt without retraining.
• Intuitive Reasoning: the method "learns" by finding similarities, which is analogous to how doctors might compare a new patient's symptoms with previous cases.
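A scenario like this can be sketched with scikit-learn's KNeighborsClassifier. The patient records below are synthetic, illustrative numbers (not real clinical data), and the features are standardized first because blood sugar, BMI, age, and blood pressure live on very different numeric ranges; without scaling, the largest-valued feature would dominate the Euclidean distance.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic records: [blood sugar (mg/dL), BMI, age, blood pressure (mmHg)]
records = [
    [180, 33.5, 55, 145], [165, 31.0, 60, 150], [190, 35.2, 48, 140],
    [170, 29.8, 52, 148], [155, 30.5, 58, 142],   # diagnosed with diabetes
    [95, 22.1, 30, 115], [105, 24.3, 35, 120],
    [90, 21.5, 28, 110], [110, 25.0, 40, 125],    # not diagnosed
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0]  # 1 = diabetes, 0 = no diabetes

# Standardize features, then "fit" — for KNN this just stores the data.
scaler = StandardScaler().fit(records)
model = KNeighborsClassifier(n_neighbors=5).fit(
    scaler.transform(records), labels)

# Query processing: a new patient record, scaled the same way.
new_patient = [[160, 32.0, 50, 138]]
print(model.predict(scaler.transform(new_patient)))  # [1]: flagged as at risk
```

Adding newly collected records only requires refitting on the extended dataset (and re-fitting the scaler), which remains cheap since no model parameters are learned.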
KNN is a straightforward yet powerful algorithm that leverages the similarity between data points to make predictions. It is especially useful in applications like medical diagnosis, spam filtering, and recommendation systems. While its simplicity and adaptability are significant strengths, the computational cost during prediction and the sensitivity to the choice of k must be carefully managed for optimal performance.
