K-Nearest Neighbors
Nipun Batra
July 5, 2020
IIT Gandhinagar
Classification
[Hand-drawn figure, several builds: apples and oranges plotted by attributes such as redness; a new fruit is labelled like the training points it is most similar to in attribute space]
Regression
[Hand-drawn figure, several builds: house price (e.g. 100k) plotted against age; the price of a new house is likely to be close to the prices of similar homes of that age]
Voronoi Diagram for 1-NN
[Hand-drawn figures, several builds: points P1–P4 in a 2-D space (Feature 1 vs Feature 2); the perpendicular bisector through the midpoint of the line joining two points separates their regions; each region is labelled with the class (red/blue) of its point, and the resulting 1-NN decision boundary is piecewise linear]
KNN Classification
[Hand-drawn figures, several builds: '+' and '−' training points around a test point; K = 1 classification uses the single nearest neighbour, while K = 3 classification takes a majority vote over the 3 nearest neighbours, and the two can disagree]
Linear Regression vs 1-NN Regression
[Hand-drawn figures, several builds: training points (x1, y1), (x2, y2), (x3, y3); linear regression fits a single straight line, while 1-NN regression predicts y_i wherever x_i is the nearest neighbour (e.g. the NN of queries near x1 is (x1, y1)), giving a piecewise-constant fit]
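The contrast above can be made concrete with a tiny sketch (not from the slides; the training points are hypothetical): 1-NN regression simply returns the y-value of the closest training point, so its prediction is piecewise constant rather than a single line.

```python
# A minimal 1-NN regression sketch (pure Python, hypothetical data):
# the prediction for a query x is the y-value of the closest training
# point, giving a piecewise-constant fit.
train = [(1.0, 2.0), (2.0, 3.5), (4.0, 1.0)]  # hypothetical (x_i, y_i) pairs

def predict_1nn(x):
    # Nearest neighbour by absolute distance on the single feature x
    nearest = min(train, key=lambda pt: abs(pt[0] - x))
    return nearest[1]

print(predict_1nn(1.2))  # nearest training x is 1.0, so prediction is 2.0
print(predict_1nn(3.9))  # nearest training x is 4.0, so prediction is 1.0
```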
KNN is Non-Parametric
[Hand-drawn figures, several builds: a linear model's decision boundary y = mx + c has a fixed number of parameters (2) regardless of the data, while the KNN (K = 1) decision boundary is piecewise; after adding data, the linear model still has 2 parameters, but the KNN boundary grows more complex, i.e. its effective number of parameters grows with the dataset]
Parametric vs Non-Parametric Models

            | Parametric                          | Non-Parametric
Parameters  | Number of parameters is fixed       | Number of parameters grows with
            | w.r.t. dataset size                 | an increase in dataset size
Speed       | Quicker (as the number of           | Slower (as the number of
            | parameters is smaller)              | parameters is larger)
Assumptions | Strong assumptions (like linearity  | Very few (sometimes no)
            | in Linear Regression)               | assumptions
Examples    | Linear Regression                   | KNN, Decision Tree
Lazy vs Eager Strategies

           | Lazy                                | Eager
Train Time | 0                                   | ≠ 0
Test Time  | Long (due to comparison with        | Quick (as only learnt
           | training data)                      | "parameters" are involved)
Memory     | Stores/memorises the entire data    | Stores only learnt parameters
Utility    | Useful for online settings          |
Examples   | KNN                                 | Linear Regression, Decision Tree
Important Considerations
Important Considerations: Distance Metric
• Euclidean Distance
• Hamming Distance
• Manhattan Distance
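The three metrics listed above can be sketched in a few lines of pure Python (the example vectors are hypothetical):

```python
import math

# Sketches of the three distance metrics used with KNN.
def euclidean(a, b):
    # Square root of the sum of squared coordinate differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    # Number of positions where the values disagree (useful for
    # categorical features)
    return sum(x != y for x, y in zip(a, b))

print(euclidean((0, 0), (3, 4)))            # 5.0
print(manhattan((0, 0), (3, 4)))            # 7
print(hamming(("red", "s"), ("red", "m")))  # 1
```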
Important Considerations: Value of K
[Figures: two X–Y scatter plots of the same data, illustrating how predictions change with the value of K]
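The effect of K can be shown with a small sketch (hypothetical 2-D points, not the slide's data): for the same query, K = 1 and K = 3 can give different labels.

```python
from collections import Counter

# Hypothetical '+'/'-' training points chosen so that the value of K
# matters: with K=1 a single nearby '-' point decides; with K=3 the
# majority of the 3 nearest neighbours is '+'.
train = [((0.0, 0.0), "-"), ((1.0, 1.0), "+"), ((1.2, 0.8), "+"),
         ((3.0, 3.0), "+"), ((-2.0, -2.0), "-")]

def knn_label(query, k):
    # Sort training points by squared Euclidean distance to the query
    ranked = sorted(train, key=lambda p: (p[0][0] - query[0]) ** 2
                                         + (p[0][1] - query[1]) ** 2)
    # Majority vote over the K nearest labels
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

query = (0.2, 0.2)
print(knn_label(query, 1))  # '-' : the single nearest point is (0, 0)
print(knn_label(query, 3))  # '+' : 2 of the 3 nearest points are '+'
```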
Aggregating data
• Median
• Mean
• Mode
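The three aggregation choices above map directly onto Python's `statistics` module; a small sketch with hypothetical neighbour targets:

```python
import statistics

# Aggregating the targets of the K nearest neighbours: mean or median
# for regression, mode (majority vote) for classification.
neighbour_prices = [100, 120, 110, 400]          # hypothetical K=4 regression targets
neighbour_labels = ["apple", "orange", "apple"]  # hypothetical K=3 class labels

print(statistics.mean(neighbour_prices))    # 182.5 — pulled up by the outlier 400
print(statistics.median(neighbour_prices))  # 115.0 — robust to the outlier
print(statistics.mode(neighbour_labels))    # 'apple' — the majority vote
```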
KNN Algorithm
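A runnable sketch of the KNN classification algorithm (pure Python, hypothetical data): compute distances from the query to every training point, take the K closest, and aggregate their labels by majority vote. Being lazy, all the work happens at test time.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    # Euclidean distance between two feature vectors
    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Step 1: rank all training points by distance to the query
    # (nothing was learnt at train time; this is the lazy strategy)
    ranked = sorted(train, key=lambda pt: euclidean(pt[0], query))
    # Steps 2-3: take the K nearest and majority-vote over their labels
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set with two clusters
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((5, 5), "B"), ((6, 5), "B")]
print(knn_predict(train, (1.5, 1.5), k=3))  # 'A'
print(knn_predict(train, (5.5, 5.0), k=3))  # 'B'
```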
Curse of Dimensionality
[Figure: a distance-related quantity plotted against Number of dimensions (d), for d up to about 18]
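One way to see the curse of dimensionality (a sketch with hypothetical random data, not the slide's experiment): as d grows, the nearest and farthest neighbours of a query become almost equally far away, so "nearest" carries less and less information.

```python
import math
import random

# For uniformly random points, the ratio of the nearest to the farthest
# distance from a query approaches 1 as the dimension d grows: distances
# concentrate, and nearest-neighbour search loses discriminative power.
random.seed(0)

def min_max_ratio(d, n=100):
    query = [random.random() for _ in range(d)]
    points = [[random.random() for _ in range(d)] for _ in range(n)]
    dists = [math.dist(query, p) for p in points]
    return min(dists) / max(dists)

print(min_max_ratio(2))     # small ratio: neighbours are well separated
print(min_max_ratio(1000))  # ratio close to 1: distances concentrate
```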
Approximate Nearest Neighbors
[Figure: scatter of points in an X–Y plane used to illustrate approximate nearest-neighbour search]
Locality sensitive hashing
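A minimal sketch of one LSH scheme, random-hyperplane hashing (the points and the number of planes are assumptions for illustration): each random hyperplane contributes one bit of the hash, the side of the plane the point falls on, so nearby points tend to land in the same bucket and only that bucket needs to be searched at query time.

```python
import random

# Random-hyperplane LSH sketch: hash a point to a tuple of sign bits,
# one per random hyperplane through the origin. Points with a small
# angle between them are likely to share most (or all) bits.
random.seed(42)
d, n_planes = 3, 8
planes = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_planes)]

def lsh_hash(point):
    # One bit per hyperplane: the sign of the dot product with its normal
    return tuple(int(sum(w * x for w, x in zip(plane, point)) >= 0)
                 for plane in planes)

def bit_diff(h1, h2):
    return sum(x != y for x, y in zip(h1, h2))

a = (1.0, 2.0, 3.0)
b = (1.1, 2.1, 2.9)    # close to a (tiny angle between them)
c = (-5.0, 0.0, -4.0)  # far from a

print(bit_diff(lsh_hash(a), lsh_hash(b)))  # few differing bits for nearby points
print(bit_diff(lsh_hash(a), lsh_hash(c)))  # many differing bits for distant points
```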