The K-Nearest Neighbors (KNN) algorithm classifies data points based on the proximity of nearby points, using a majority voting system to determine the category of a new data point. KNN is a lazy learner that requires no training phase and utilizes distance metrics like Euclidean, Manhattan, and Minkowski to find the nearest neighbors. While KNN is easy to implement and flexible for both classification and regression tasks, it has drawbacks such as poor scalability with large datasets and susceptibility to overfitting due to the curse of dimensionality.


MISS ERUM MAHOOD

TOPIC:
KNN ALGORITHM

PRESENTED BY:
ZOBIA
MALAIKA
MARYAM
MINAHIL
K-Nearest Neighbors (KNN) is a simple way to classify things by looking at what's nearby. Imagine a streaming service wants to predict whether a new user is likely to cancel their subscription (churn) based on their age. It checks the ages of its existing users and whether they churned or stayed. If most of the "K" users closest in age to the new user canceled their subscriptions, KNN will predict that the new user might churn too. The key idea is that users with similar ages tend to have similar behaviors, and KNN uses this closeness to make decisions.
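A minimal sketch of this churn example in Python, assuming scikit-learn is installed; the ages and churn labels below are invented purely for illustration:

from sklearn.neighbors import KNeighborsClassifier

ages = [[22], [25], [27], [35], [40], [48], [52], [60]]   # feature: user age
churned = [1, 1, 1, 0, 0, 0, 1, 0]                        # 1 = churned, 0 = stayed

model = KNeighborsClassifier(n_neighbors=3)  # K = 3 nearest users by age
model.fit(ages, churned)                     # "lazy": this only stores the data

print(model.predict([[24]]))  # the 3 closest users (22, 25, 27) all churned -> [1]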
Getting Started with K-Nearest Neighbors:
• K-Nearest Neighbors is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset, and at the time of classification it performs its computation on the dataset.
• As an example, consider the following set of data points containing two features:
The figure shows how KNN predicts the category of a new data point based on its closest neighbours:
• The red diamonds represent Category 1 and the blue squares represent Category 2.
• The new data point checks its closest neighbours (the circled points).
• Since the majority of its closest neighbours are blue squares (Category 2), KNN predicts that the new data point belongs to Category 2. KNN assigns the category based on the majority of nearby points.
What is 'K' in K Nearest Neighbour?
In the k-Nearest Neighbours (k-NN) algorithm, k is just a number that tells the algorithm how many nearby points (neighbours) to look at when it makes a decision.
Example:
Imagine you're deciding which fruit a new fruit is based on its shape and size. You compare it to fruits you already know.
• If k = 3, the algorithm looks at the 3 closest fruits to the new one. If 2 of those 3 fruits are apples and 1 is a banana, the algorithm says the new fruit is an apple, because most of its neighbours are apples.
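A toy sketch of that 3-nearest-fruits vote using only the Python standard library; the neighbour labels are hypothetical:

from collections import Counter

# Labels of the 3 fruits closest to the new one in shape and size.
nearest_labels = ["apple", "apple", "banana"]

# Majority vote: the most common label among the k neighbours wins.
prediction = Counter(nearest_labels).most_common(1)[0][0]
print(prediction)  # -> apple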

Distance Metrics Used in KNN Algorithm

KNN uses distance metrics to identify the nearest neighbours; these neighbours are then used for the classification and regression tasks. To identify the nearest neighbours we use the distance metrics below:
1. Euclidean Distance

Euclidean distance is defined as the straight-line distance between two points in a plane or
space. You can think of it like the shortest path you would walk if you were to go directly
from one point to another.
distance(x, X_i) = \sqrt{\sum_{j=1}^{d} (x_j - X_{ij})^2}
2. Manhattan Distance

This is the total distance you would travel if you could only move along horizontal
and vertical lines (like a grid or city streets). It’s also called “taxicab distance”
because a taxi can only drive along the grid-like streets of a city.
d(x, y) = \sum_{i=1}^{n} |x_i - y_i|
3. Minkowski Distance

Minkowski distance is like a family of distances, which includes both Euclidean and Manhattan distances as special cases.
d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}
From the formula above we can see that when p = 2 it is the same as the formula for the Euclidean distance, and when p = 1 we obtain the formula for the Manhattan distance.
So you can think of Minkowski as a flexible distance formula that can look like either Manhattan or Euclidean distance depending on the value of p.
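A short sketch of all three metrics, assuming NumPy is installed; the two example points are arbitrary:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

def minkowski(x, y, p):
    # (sum of |x_i - y_i|^p) raised to the power 1/p
    return np.sum(np.abs(x - y) ** p) ** (1 / p)

print(minkowski(x, y, p=1))   # Manhattan: 3 + 2 + 0 = 5.0
print(minkowski(x, y, p=2))   # Euclidean: sqrt(9 + 4 + 0) ≈ 3.606
print(np.linalg.norm(x - y))  # NumPy's built-in Euclidean norm agrees with p = 2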
Working of the KNN Algorithm
Step 1: Selecting the optimal value of K

• K represents the number of nearest neighbors that need to be considered while making a prediction.
Step 2: Calculating distance

• To measure the similarity between the target and the training data points, Euclidean distance is used. The distance is calculated between each data point in the dataset and the target point.
Step 3: Finding Nearest Neighbors

• The k data points with the smallest distances to the target point are the nearest neighbors.
Step 4: Voting for Classification or Taking Average for Regression
• When you want to classify a data point into a category (like spam or not spam), the K-NN algorithm looks at the K closest points in the dataset. These closest points are called neighbors. The algorithm then looks at which category the neighbors belong to and picks the one that appears the most; this is called majority voting. For regression, the prediction is instead the average of the neighbors' values.
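A from-scratch sketch of these four steps for the classification case, assuming NumPy; the training points and labels are invented for illustration:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, target, k=3):
    # Step 1: k is chosen by the caller.
    # Step 2: Euclidean distance from the target to every training point.
    dists = np.linalg.norm(X_train - target, axis=1)
    # Step 3: the indices of the k smallest distances are the nearest neighbours.
    nearest = np.argsort(dists)[:k]
    # Step 4: majority vote among the neighbours' labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [7, 7], [6, 7]])
y_train = np.array(["red", "red", "red", "blue", "blue", "blue"])
print(knn_predict(X_train, y_train, np.array([2, 2])))  # -> red

For regression, Step 4 would instead return the average of the neighbours' numeric targets, e.g. np.mean(y_train[nearest]).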
Advantages and Disadvantages of the KNN Algorithm
Advantages:
• Easy to implement: The KNN algorithm is easy to implement because its complexity is relatively low compared to other machine learning algorithms.

• No training required: KNN stores all data in memory and doesn't require any training, so when new data points are added it automatically adjusts and uses the new data for future predictions.

• Few hyperparameters: The only parameters required when training a KNN algorithm are the value of k and the choice of distance metric.

• Flexible: It works for classification problems (like "is this email spam or not?") and also for regression tasks (like predicting house prices based on nearby similar houses).
Disadvantages:
• Doesn't scale well: KNN is considered a "lazy" algorithm because all of its computation happens at prediction time, which makes it very slow, especially with large datasets.

• Curse of Dimensionality: When the number of features increases, KNN struggles to classify data accurately, a problem known as the curse of dimensionality.

• Prone to Overfitting: Because the algorithm is affected by the curse of dimensionality, it is prone to the problem of overfitting as well.
EXAMPLE:
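A short end-to-end example, assuming scikit-learn is installed: KNN with K = 5 and the default Minkowski (p = 2, i.e. Euclidean) metric on the classic Iris flowers dataset.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load 150 labeled iris flowers (4 features, 3 species).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = KNeighborsClassifier(n_neighbors=5)  # default metric: Minkowski, p = 2
model.fit(X_train, y_train)

print(accuracy_score(y_test, model.predict(X_test)))  # typically high, around 0.95-1.0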
