AI Lec4
AI Lec4
1
KNN - Definition
2
KNN - Definition
3
KNN – Number of Neighbors
• If K=1, select the nearest neighbor
• If K>1,
– For classification select the most frequent
neighbor.
– For regression calculate the average of K
neighbors.
4
A simple example to understand the intuition behind KNN
Table consists of the height, age and weight (target) value for 10 people. As you can see, the
weight value of ID11 is missing. We need to predict the weight of this person based on their
height and age.
5
A simple example to understand the intuition behind KNN
Table consists of the height, age and weight (target) value for 10 people. As you can see, the
weight value of ID11 is missing. We need to predict the weight of this person based on their
height and age.
For a value k = 3, the closest points are ID1, ID5 and ID6.
The average of these data points is the final prediction for the new point. Here, we have weight
of ID11 = (77+72+60)/3 = 69.66 kg. 6
A simple example to understand the intuition behind KNN
Table consists of the height, age and weight (target) value for 10 people. As you can see, the
weight value of ID11 is missing. We need to predict the weight of this person based on their
height and age.
For the value of k=5, the closest point will be ID1, ID4, ID5, ID6, ID10.
ID 11 = (77+59+72+60+58)/5 = 65.2 kg 7
KNN Classification
Loan$
Age
8
KNN Classification
9
KNN Classification
10
KNN Classification
11
KNN Classification
12
KNN Classification – Distance
Age Loan Default Distance
25 $40,000 N 102000
35 $60,000 N 82000
45 $80,000 N 62000
20 $20,000 N 122000
35 $120,000 N 22000 2
52 $18,000 N 124000
23 $95,000 Y 47000
40 $62,000 Y 80000
60 $100,000 Y 42000 3
48 $220,000 Y 78000
33 $150,000 Y 8000 1
48 $142,000 ?
t a nce
is
Eu cl i dean
D
D ( x1 x2 ) ( y1 y2 )
2 2
13
KNN Classification – Standardized Distance
One major drawback in calculating distance measures directly from the training set is in the case where variables have
different measurement scales For example, if one variable is based on annual income in dollars, and the other is based on
age in years then income will have a much higher influence on the distance calculated. One solution is to standardize the
training set as shown below.
0.7 0.61 ?
i a ble
d Va r
X Min
nda r diz
e
Xs
Sta Max Min 14
KNN Regression - Distance
Consider the following data concerning House Price Index or HPI. Age and Loan are two
numerical variables (predictors) and HPI is the numerical target.
If K=1 then the nearest neighbor is the last case in the training set with HPI=264.
By having K=3, the prediction for HPI is equal to the average of HPI for the top three neighbors.
15
HPI = (264+139+139)/3 = 180.7
KNN Regression – Standardized Distance
Age Loan House Price Index Distance
0.125 0.11 135 0.7652
0.375 0.21 256 0.5200
0.625 0.31 231 0.3160
0 0.01 267 0.9245
0.375 0.50 139 0.3428
0.8 0.00 150 0.6220
0.075 0.38 127 0.6669
0.5 0.22 216 0.4437
1 0.41 139 0.3650
0.7 1.00 250 0.3861
0.325 0.65 264 0.3771
0.7 0.61 ?
X Min
Xs
Max Min
As mentioned in KNN Classification using the standardized distance on the same training set, the unknown
16 cas
returned a different neighbor which is not a good sign of robustness.
Summary
• KNN is conceptually simple, yet able to solve
complex problems
• Can work with relatively little information
• Learning is simple (no learning at all!)
• Feature selection problem
• Sensitive to representation
www.ismartsoft.com 17