AI Lec4

K Nearest Neighbors (KNN) is an algorithm that classifies or predicts new data points based on their similarity to existing data points. It stores all available cases and labels new cases by a similarity measure to their K nearest neighbors: for classification it predicts the most common class among those neighbors, and for regression it averages their values. Standardizing the data is important to account for different measurement scales when calculating distances between points.

K Nearest Neighbors

KNN - Definition

KNN is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure.
KNN – Number of Neighbors
• If K=1, select the nearest neighbor.
• If K>1,
  – For classification, predict the most frequent class among the K neighbors.
  – For regression, calculate the average of the K neighbors' values.
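The rules above can be sketched in a few lines of Python. This is a minimal illustration (names like `knn_predict` are my own, not from the slides): it ranks the training cases by Euclidean distance to the query, then either takes a majority vote (classification) or an average (regression) over the K nearest.

```python
from collections import Counter
import math

def knn_predict(train, query, k, task="classification"):
    # train: list of (features, target) pairs; query: a feature tuple.
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Sort all stored cases by distance to the query and keep the K nearest.
    neighbors = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    targets = [t for _, t in neighbors]
    if task == "classification":
        return Counter(targets).most_common(1)[0][0]  # most frequent class
    return sum(targets) / k                           # average for regression
```

With K=1 this reduces to returning the single nearest neighbor's target, matching the first bullet.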
A simple example to understand the intuition behind KNN
The table lists the height, age, and weight (target) for 10 people. As you can see, the weight value of ID11 is missing. We need to predict the weight of this person based on their height and age.
For k = 3, the closest points are ID1, ID5, and ID6.

The average of these data points is the final prediction for the new point. Here, weight of ID11 = (77+72+60)/3 = 69.67 kg.
For k = 5, the closest points are ID1, ID4, ID5, ID6, and ID10.

Weight of ID11 = (77+59+72+60+58)/5 = 65.2 kg.
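The two predictions above are just averages of the neighbors' weights; the snippet below reproduces the slide's arithmetic (the ID lists are taken from the text):

```python
# Weights of the K nearest neighbors to ID11, from the example.
k3_weights = [77, 72, 60]          # K=3: ID1, ID5, ID6
k5_weights = [77, 59, 72, 60, 58]  # K=5: ID1, ID4, ID5, ID6, ID10

pred_k3 = sum(k3_weights) / 3      # about 69.67 kg
pred_k5 = sum(k5_weights) / 5      # 65.2 kg
```

Note how the prediction changes with K: the choice of K is itself a modeling decision.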
KNN Classification

[Scatter plot: Loan ($) vs. Age, showing the training cases and the unknown case]
KNN Classification – Distance

Age   Loan       Default   Distance
25    $40,000    N         102,000
35    $60,000    N          82,000
45    $80,000    N          62,000
20    $20,000    N         122,000
35    $120,000   N          22,000  (2)
52    $18,000    N         124,000
23    $95,000    Y          47,000
40    $62,000    Y          80,000
60    $100,000   Y          42,000  (3)
48    $220,000   Y          78,000
33    $150,000   Y           8,000  (1)

48    $142,000   ?

Euclidean Distance: D = √((x1 − x2)² + (y1 − y2)²)

The marked rows (1), (2), (3) are the three nearest neighbors of the unknown case (Age 48, Loan $142,000). With K=3, two of the three have Default = Y, so the prediction is Default = Y.
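The distance column and the K=3 vote can be reproduced directly from the table. A small sketch (variable names are mine): note how the raw Loan values dominate the Euclidean distance because of their scale, which is exactly the problem standardization addresses next.

```python
import math

# (age, loan, default) rows from the table.
train = [
    (25, 40000, "N"), (35, 60000, "N"), (45, 80000, "N"),
    (20, 20000, "N"), (35, 120000, "N"), (52, 18000, "N"),
    (23, 95000, "Y"), (40, 62000, "Y"), (60, 100000, "Y"),
    (48, 220000, "Y"), (33, 150000, "Y"),
]
query = (48, 142000)  # the unknown case

# Rank training cases by Euclidean distance to the query.
ranked = sorted(train, key=lambda r: math.hypot(r[0] - query[0], r[1] - query[1]))
top3 = [r[2] for r in ranked[:3]]              # labels of the 3 nearest
prediction = max(set(top3), key=top3.count)    # majority vote
```

The three nearest cases are the rows with distances 8,000, 22,000, and 42,000, giving a Y/N/Y vote and hence Default = Y.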
KNN Classification – Standardized Distance
One major drawback of calculating distance measures directly from the training set arises when variables have different measurement scales. For example, if one variable is annual income in dollars and the other is age in years, then income will have a much higher influence on the calculated distance. One solution is to standardize the training set, as shown below.

Age     Loan    Default   Distance
0.125   0.11    N         0.7652
0.375   0.21    N         0.5200
0.625   0.31    N         0.3160
0       0.01    N         0.9245
0.375   0.50    N         0.3428
0.8     0.00    N         0.6220
0.075   0.38    Y         0.6669
0.5     0.22    Y         0.4437
1       0.41    Y         0.3650
0.7     1.00    Y         0.3861
0.325   0.65    Y         0.3771

0.7     0.61    ?

Standardized Variable: Xs = (X − Min) / (Max − Min)
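The min-max formula above is applied per column, so Age and Loan both end up in [0, 1] and contribute comparably to the distance. A short sketch, checked against the Age column of the table (20 is the minimum age, 60 the maximum):

```python
def standardize(values):
    # Min-max standardization: Xs = (X - min) / (max - min).
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [25, 35, 45, 20, 35, 52, 23, 40, 60, 48, 33]
std_ages = standardize(ages)
# e.g. 25 -> (25 - 20) / (60 - 20) = 0.125, matching the table.
```

An unseen query must be scaled with the *training* min and max, not its own, or the columns drift out of alignment.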
KNN Regression - Distance
Consider the following data concerning the House Price Index (HPI). Age and Loan are two numerical variables (predictors) and HPI is the numerical target.

If K=1 then the nearest neighbor is the last case in the training set with HPI=264.
By having K=3, the prediction for HPI is equal to the average of HPI for the top three neighbors.

HPI = (264+139+139)/3 = 180.7
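As in the weight example, the K=3 regression prediction is just the mean of the neighbors' targets; reproducing the slide's arithmetic:

```python
# HPI values of the three nearest neighbors, from the slide.
hpi_top3 = [264, 139, 139]
prediction = sum(hpi_top3) / 3   # about 180.7
```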
KNN Regression – Standardized Distance
Age     Loan    House Price Index   Distance
0.125   0.11    135                 0.7652
0.375   0.21    256                 0.5200
0.625   0.31    231                 0.3160
0       0.01    267                 0.9245
0.375   0.50    139                 0.3428
0.8     0.00    150                 0.6220
0.075   0.38    127                 0.6669
0.5     0.22    216                 0.4437
1       0.41    139                 0.3650
0.7     1.00    250                 0.3861
0.325   0.65    264                 0.3771

0.7     0.61    ?
X  Min
Xs 
Max  Min
As mentioned in KNN Classification using the standardized distance on the same training set, the unknown
16 cas
returned a different neighbor which is not a good sign of robustness.
Summary
• KNN is conceptually simple, yet able to solve complex problems
• Can work with relatively little information
• Learning is simple (no learning at all!)
• Feature selection is a problem
• Sensitive to representation

www.ismartsoft.com
