
K-Nearest Neighbors

Nipun Batra
July 5, 2020
IIT Gandhinagar
CLASSIFICATION

[Hand-drawn slides: apples and oranges plotted against ATTRIBUTES such as REDNESS. A new, unlabelled point is classified as an ORANGE because it is most SIMILAR to the nearby oranges. The idea: label a point by looking at the labels of the points closest to it in attribute space.]
REGRESSION

[Hand-drawn slides: house PRICE (around 100k) plotted against AGE. To predict the price of a new home, look at the prices of HOMES OF THAT AGE — the prediction is LIKELY to be close to the prices of its neighbours.]
VORONOI DIAGRAM FOR 1-NN

[Hand-drawn slides: points P1, P2, P3, P4 plotted in FEATURE 1 vs FEATURE 2 space. For any two points, the boundary between their regions is the perpendicular bisector through the MIDPOINT of the LINE JOINING the 2 POINTS; one side becomes the REGION LABELLED BLUE, the other the REGION LABELLED RED. Repeating this for every pair of neighbouring points partitions the plane into Voronoi cells, so the 1-NN DECISION BOUNDARY IS PIECEWISE LINEAR.]
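As a rough illustration of the slide above, here is a minimal sketch (with made-up points and labels, not taken from the slides) that assigns every grid cell the label of its nearest training point; printing the grid makes the piecewise-linear Voronoi regions visible.

# 1-NN label assignment over a grid: each training point "wins" its Voronoi cell,
# and the boundaries between cells are pieces of perpendicular bisectors.
import numpy as np

# A few hypothetical labelled points in (feature 1, feature 2) space
points = np.array([[1.0, 1.0], [3.0, 1.5], [2.0, 3.0], [4.0, 3.5]])
labels = np.array(["R", "B", "R", "B"])

# Evaluate the 1-NN label on a coarse grid and print it as a text "map"
xs = np.linspace(0, 5, 26)
ys = np.linspace(0, 4, 17)
for y in ys[::-1]:
    row = ""
    for x in xs:
        d = np.linalg.norm(points - np.array([x, y]), axis=1)  # distances to all points
        row += labels[np.argmin(d)]                            # label of the nearest point
    print(row)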
KNN CLASSIFICATION

[Hand-drawn slides: '+' and '-' points with a TEST POINT. With K = 1, the test point takes the label of its single nearest neighbour; with K = 3, it takes the majority label among its 3 NEIGHBOURS.]
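A small sketch of the same contrast using scikit-learn (assumed available); the points and the test point below are made up for illustration, and the prediction can flip between K = 1 and K = 3.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 1], [2, 1], [1.5, 2], [4, 4], [5, 4], [4.5, 5]])
y = np.array(["+", "+", "+", "-", "-", "-"])
test_point = np.array([[3.0, 2.8]])

for k in (1, 3):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(f"K = {k}: predicted label = {clf.predict(test_point)[0]}")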
LINEAR REGRESSION vs 1-NN REGRESSION

[Hand-drawn slides: a small 1-D dataset (x1, y1), (x2, y2), (x3, y3) fit once with a straight line (linear regression) and once with 1-NN regression. For 1-NN, the prediction at a query x is the y-value of the nearest training x (queries nearest to x1 get y1, queries nearest to x2 get y2, and so on), so the fit is piecewise constant rather than a single line.]
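A minimal sketch of this comparison, using scikit-learn and an illustrative three-point dataset of my own choosing: the linear model returns one straight line, while 1-NN returns the y-value of whichever training x is closest.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[1.0], [2.0], [3.0]])   # x1, x2, x3
y = np.array([1.0, 3.0, 2.0])         # y1, y2, y3

lin = LinearRegression().fit(X, y)
nn1 = KNeighborsRegressor(n_neighbors=1).fit(X, y)

queries = np.array([[0.5], [1.4], [2.6], [3.5]])
print("linear:", lin.predict(queries))   # a single straight-line fit
print("1-NN:  ", nn1.predict(queries))   # piecewise constant: y of the nearest xi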
KNN IS NON-PARAMETRIC

[Hand-drawn slides: a LINEAR MODEL y = mx + c has a fixed number of parameters (2), while the KNN (K = 1) DECISION BOUNDARY is defined by the training points themselves. When we ADD DATA, the linear model still needs just 2 parameters, but the KNN decision boundary becomes more complex — its effective number of parameters grows with the dataset.]
Parametric vs Non-Parametric Models

               Parametric                            Non-Parametric
Parameters     Number of parameters is fixed         Number of parameters grows with
               w.r.t. dataset size                   an increase in dataset size
Speed          Quicker (as the number of             Slower (as the number of
               parameters is smaller)                parameters grows with the data)
Assumptions    Strong assumptions (like              Very few (sometimes no)
               linearity in Linear Regression)       assumptions
Examples       Linear Regression                     KNN, Decision Tree
Lazy vs Eager Strategies

               Lazy                                  Eager
Train Time     0                                     ≠ 0
Test Time      Long (due to comparison with          Quick (as only "parameters"
               train data)                           are involved)
Memory         Store/memorise entire data            Store only learnt parameters
Utility        Useful for online settings
Examples       KNN                                   Linear Regression, Decision Tree
Important Considerations

• What are the features that will be considered for data similarity?
• What is the distance metric that will be used to calculate data similarity?
• What is the aggregation function that is going to be used?
• How many neighbors are you going to take into consideration?
• What is the computational complexity of the algorithm that you are implementing?
Important Considerations: Distance Metric

The distance metric acts as a measure of similarity between the points. Common choices include:

• Euclidean Distance
• Manhattan Distance
• Hamming Distance
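A minimal sketch of the three metrics listed above, written as plain functions on numpy vectors (the example vectors are made up):

import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))   # straight-line distance

def manhattan(a, b):
    return np.sum(np.abs(a - b))            # sum of per-coordinate differences

def hamming(a, b):
    return np.sum(a != b)                    # number of positions that differ (categorical/binary features)

a, b = np.array([1.0, 2.0, 3.0]), np.array([2.0, 0.0, 3.0])
print(euclidean(a, b), manhattan(a, b), hamming(a, b))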
Important Considerations: Value of K

Choosing the correct value of K is difficult.

Low values of K will result in each point having a very high influence on the final output ⇒ noise will influence the result.

High values of K will result in smoother decision boundaries ⇒ lower variance but also higher bias.
Important Considerations: Value of K

[Figures: the same 2-D dataset (X vs Y) fit with different values of K. K = 1 gives a very jagged decision boundary (High Variance); K = 3 is smoother; K = 9 is smoother still (High Bias).]
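One common way to pick K in practice is cross-validation. The sketch below (synthetic data of my own, not the dataset in the figures) scores a few values of K with scikit-learn; very small K tends to overfit the noise, very large K tends to underfit.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=200) > 0).astype(int)  # noisy labels

for k in (1, 3, 9, 25):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"K = {k:2d}: mean CV accuracy = {scores.mean():.3f}")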
Aggregating data

There are different ways to go about aggregating the data from the K nearest neighbors:

• Median
• Mean
• Mode
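A minimal sketch of these aggregation choices applied to the targets of the K nearest neighbours (the neighbour values below are hypothetical):

import numpy as np
from collections import Counter

neighbour_targets = np.array([2.0, 3.0, 3.0, 10.0])          # y-values of the K neighbours (regression)
neighbour_labels = ["orange", "apple", "orange", "orange"]    # class labels of the K neighbours (classification)

print("mean:  ", np.mean(neighbour_targets))    # common choice for regression
print("median:", np.median(neighbour_targets))  # regression, more robust to outliers
print("mode:  ", Counter(neighbour_labels).most_common(1)[0][0])  # majority vote for classification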
KNN Algorithm

• Keep the entire dataset: (x, y)
• For a query vector q:
  1. Find the k-closest data point(s) x∗
  2. Predict y∗
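A from-scratch sketch of this algorithm for classification (class and variable names are my own): "training" just memorises the data, and prediction finds the k closest stored points and takes a majority vote over their labels.

import numpy as np
from collections import Counter

class SimpleKNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Keep the entire dataset: train time is essentially zero.
        self.X, self.y = np.asarray(X, dtype=float), np.asarray(y)
        return self

    def predict_one(self, q):
        d = np.linalg.norm(self.X - q, axis=1)                 # distances to every stored point
        nearest = np.argsort(d)[: self.k]                      # indices of the k closest points x*
        return Counter(self.y[nearest]).most_common(1)[0][0]   # majority vote gives y*

    def predict(self, Q):
        return np.array([self.predict_one(q) for q in np.asarray(Q, dtype=float)])

# Usage with made-up data
X = [[1, 1], [2, 1], [4, 4], [5, 5]]
y = ["+", "+", "-", "-"]
print(SimpleKNNClassifier(k=3).fit(X, y).predict([[1.5, 1.5], [4.5, 4.2]]))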
Curse of Dimensionality

With an increase in the number of dimensions:

1. the distance between points starts to increase
2. the variation in distances between points starts to decrease

[Figures: for a uniformly random dataset, the mean distance between two points grows with the number of dimensions d, while the ratio of the maximum to the minimum pairwise distance shrinks as d grows.]

Due to this, distance metrics lose their efficacy as a similarity metric.
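The effect is easy to reproduce empirically. A small sketch (uniformly random points, parameters of my own choosing): as d grows, the mean pairwise distance increases and the max/min distance ratio shrinks, so "nearest" and "farthest" become hard to tell apart.

import numpy as np

rng = np.random.default_rng(0)
n = 100
for d in (2, 5, 10, 50, 200):
    X = rng.uniform(size=(n, d))
    diffs = X[:, None, :] - X[None, :, :]              # all pairwise differences
    dist = np.sqrt((diffs ** 2).sum(-1))
    upper = dist[np.triu_indices(n, k=1)]              # distances for distinct pairs only
    print(f"d = {d:3d}: mean distance = {upper.mean():.2f}, "
          f"max/min ratio = {upper.max() / upper.min():.1f}")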
Approximate Nearest Neighbors

Doing an exhaustive search over all the points is time consuming, especially if you have a large number of data points.

[Figure: example of a big dataset — a dense 2-D scatter of points.]

If you are willing to sacrifice some accuracy, there are algorithms that can give you speed-ups of orders of magnitude. Such techniques include:

• Locality sensitive hashing
• Vector approximation files
• Greedy search in proximity neighborhood graphs
Locality sensitive hashing

Normal hash functions H(x) try to keep the collision of points across bins uniform. A locality sensitive hash (LSH) function L(x) would instead be designed such that similar values are mapped to similar bins. All elements in a bin can then be given the same label, which again can be decided on the basis of different aggregation methods.

[Figure: example of a big dataset.]
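A minimal sketch of one classic LSH family (not necessarily the construction the slides have in mind): random-projection (SimHash-style) hashing, where each bit of the bucket id records which side of a random hyperplane a point falls on, so nearby points tend to share buckets.

import numpy as np

rng = np.random.default_rng(0)
d, n_bits = 2, 4
planes = rng.normal(size=(n_bits, d))            # random hyperplanes through the origin

def lsh_bucket(x):
    bits = (planes @ x > 0).astype(int)          # one bit per hyperplane
    return "".join(map(str, bits))

p, q, r = np.array([1.0, 1.0]), np.array([1.1, 0.9]), np.array([-2.0, 3.0])
print(lsh_bucket(p), lsh_bucket(q), lsh_bucket(r))  # p and q will often share a bucket; r usually will not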
