3 KNN

The document discusses various supervised learning algorithms, focusing on k-nearest neighbors (KNN). It explains the KNN algorithm, including its decision boundaries, sensitivity to noise, and trade-offs in choosing the parameter k. Additionally, it provides examples of KNN applications in land usage classification and handwritten digit recognition, highlighting the importance of distance metrics in classification tasks.


Supervised Learning Algorithms
• K-nearest neighbors (KNN)

• Decision Trees

• Linear Regression

• Logistic Regression

• Support Vector Machines (SVM)

• Random Forest

• Gradient Boosting
Nearest Neighbors
Suppose we’re given a novel input vector x we’d like to classify.
The idea: find the nearest input vector to x in the training set and copy
its label.
Can formalize “nearest” in terms of Euclidean distance:

$\|x^{(a)} - x^{(b)}\|_2 = \sqrt{\sum_{j=1}^{d} \left(x_j^{(a)} - x_j^{(b)}\right)^2}$

Algorithm:
1. Find example $(x^*, t^*)$ (from the stored training set) closest to x. That is:
   $x^* = \operatorname{arg\,min}_{x^{(i)} \in \text{training set}} \ \text{distance}(x^{(i)}, x)$
2. Output $y = t^*$

Note: we don’t need to compute the square root. Why?
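A minimal sketch of this 1-NN rule in Python/NumPy (the array names and shapes are assumptions for illustration, not from the slides). It works with squared Euclidean distance: the square root is monotonic, so dropping it does not change which point is the argmin.

```python
import numpy as np

def nearest_neighbor_predict(X_train, t_train, x):
    """1-NN rule: copy the label of the stored training point closest to x.

    X_train: (N, d) array of training inputs
    t_train: (N,) array of training labels
    x:       (d,) query vector
    """
    # Squared Euclidean distances to every training point. The square root
    # is monotonic, so skipping it does not change which point is closest.
    sq_dists = np.sum((X_train - x) ** 2, axis=1)
    return t_train[np.argmin(sq_dists)]
```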


Nearest Neighbors: Decision Boundaries
We can visualize the behavior in the classification setting using a Voronoi
diagram.
Nearest Neighbors: Decision Boundaries

Decision boundary: the boundary between regions of input space assigned to different categories.
Nearest Neighbors: Decision Boundaries

Example: 2D decision boundary
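As an illustration (not a figure from the slides), the decision regions of 1-NN in 2D can be visualized by classifying every point of a dense grid; the toy data below is made up for the sketch.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy 2D training set (made up for illustration): two Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1, (20, 2)), rng.normal([3, 3], 1, (20, 2))])
t = np.array([0] * 20 + [1] * 20)

# Classify every point on a dense grid with the 1-NN rule.
xs, ys = np.meshgrid(np.linspace(-3, 6, 200), np.linspace(-3, 6, 200))
grid = np.column_stack([xs.ravel(), ys.ravel()])
# Index of the nearest training point for each grid point.
nearest = np.argmin(((grid[:, None, :] - X[None, :, :]) ** 2).sum(-1), axis=1)
labels = t[nearest].reshape(xs.shape)

plt.contourf(xs, ys, labels, alpha=0.3)   # shaded regions = decision regions
plt.scatter(X[:, 0], X[:, 1], c=t)        # training points
plt.show()
```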


k-Nearest Neighbors

[Pic by Olga Veksler]

Nearest neighbors sensitive to noise or mis-labeled data (“class noise”). Solution?
Smooth by having k nearest neighbors vote

Algorithm (kNN):
1. Find k examples $\{x^{(i)}, t^{(i)}\}$ closest to the test instance x
2. Classification output is majority class:
   $y = \operatorname{arg\,max}_{t^{(z)}} \sum_{i=1}^{k} \mathbb{I}\left(t^{(z)} = t^{(i)}\right)$

$\mathbb{I}\{\text{statement}\}$ is the indicator function: it equals one whenever the statement is true and zero otherwise. We could also write this as $\delta(t^{(z)}, t^{(i)})$, with $\delta(a, b) = 1$ if $a = b$, 0 otherwise.
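A minimal sketch of this kNN classifier, assuming NumPy arrays for the stored training set (names, shapes and the tie-breaking behaviour of the vote are illustrative choices, not from the slides).

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, t_train, x, k=5):
    """k-NN: majority vote among the k training points closest to x."""
    sq_dists = np.sum((X_train - x) ** 2, axis=1)   # squared Euclidean distances
    nearest = np.argsort(sq_dists)[:k]              # indices of the k closest points
    votes = Counter(t_train[nearest].tolist())      # count labels among the neighbors
    return votes.most_common(1)[0][0]               # majority class (ties broken arbitrarily)
```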
k-Nearest Neighbors: k = 1

[Image credit: ”The Elements of Statistical Learning”]


k-Nearest Neighbors: k = 15

[Image credit: ”The Elements of Statistical Learning”]


k-Nearest Neighbors

Tradeoffs in choosing k?

Small k
  • Good at capturing fine-grained patterns
  • May overfit, i.e. be sensitive to random idiosyncrasies in the training data

Large k
  • Makes stable predictions by averaging over lots of examples
  • May underfit, i.e. fail to capture important regularities

Balancing k
  • Optimal choice of k depends on number of data points n
  • Nice theoretical properties if k → ∞ and k/n → 0
  • Rule of thumb: choose k < √n
  • We can choose k using a validation set (next slides; a small sketch follows below)
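A minimal sketch of choosing k on a held-out validation set, reusing the hypothetical knn_predict helper from the earlier sketch (the array names and the candidate grid of k values are assumptions for illustration).

```python
import numpy as np

def choose_k(X_train, t_train, X_val, t_val, candidate_ks=(1, 3, 5, 7, 9, 15)):
    """Pick the k with the highest accuracy on a held-out validation set."""
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        preds = np.array([knn_predict(X_train, t_train, x, k=k) for x in X_val])
        acc = np.mean(preds == t_val)           # validation accuracy for this k
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc
```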
KNN: Computational Cost

Number of computations at training time: 0


Number of computations at test time, per query (naïve algorithm):
  • Calculate D-dimensional Euclidean distances with N data points: O(ND)
  • Sort the distances: O(N log N)
This must be done for each query, which is very expensive by the
standards of a learning algorithm!
Need to store the entire dataset in memory!
Tons of work has gone into algorithms and data structures for efficient
nearest neighbors with high dimensions and/or large datasets.
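One common option is a space-partitioning index such as a k-d tree. A minimal sketch using scikit-learn's KNeighborsClassifier (the library choice and the toy data are assumptions of the sketch, not something the slides prescribe).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10_000, 4))          # toy data, made up for illustration
t_train = (X_train[:, 0] > 0).astype(int)

# Build a k-d tree once up front; queries then avoid scanning all N points.
clf = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")
clf.fit(X_train, t_train)
print(clf.predict(rng.normal(size=(3, 4))))     # labels for three query points
```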
KNN: Example (1)

• STATLOG project
• Four heat-map images: two in the visible spectrum and two in the infrared, for an area of agricultural land in Australia
• Labels = {red soil, cotton, vegetation stubble, mixture, gray soil, damp gray soil, very damp gray soil}
• Classify the land usage at a pixel, based on the information in the four spectral bands
KNN: Example (1)

• For each pixel we extracted an 8-neighbor feature map
• This is done separately in the four spectral bands, giving (1+8)×4 = 36 input features per pixel

[FIGURE 13.6. The first four panels are LANDSAT images for an agricultural area in four spectral bands, depicted by heatmap shading. The remaining two panels give the actual land usage (color coded) and the predicted land usage using a five-nearest-neighbor rule described in the text.]

The STATLOG project (Michie et al., 1994) used part of a LANDSAT image as a benchmark for classification (82 × 100 pixels).
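A minimal sketch of building the (1+8)×4 = 36 features per pixel with NumPy, assuming the four spectral bands are stacked in an array of shape (4, H, W); the array layout and the handling of border pixels are assumptions for illustration.

```python
import numpy as np

def pixel_features(bands):
    """bands: (4, H, W) array of spectral bands.

    Returns an (H-2, W-2, 36) array: for each interior pixel, its own value
    plus its 8 neighbors, in each of the 4 bands -> (1+8)*4 = 36 features.
    """
    n_bands, H, W = bands.shape
    feats = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            # Shifted view of every band: the (di, dj) neighbor of each interior pixel.
            feats.append(bands[:, 1 + di:H - 1 + di, 1 + dj:W - 1 + dj])
    # Stack the 9 offsets x 4 bands along a trailing feature axis.
    return np.concatenate(feats, axis=0).transpose(1, 2, 0)
```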
KNN: Example (1)

STATLOG results

• Of all the methods used in the STATLOG project, including LVQ, CART, neural networks, linear discriminant analysis and many others, k-nearest-neighbors performed best on this task
• DANN is a variant of k-nearest neighbors, using an adaptive metric

[FIGURE 13.8. Test-error performance for a number of classifiers (SMART, Logistic, LDA, QDA, NewID, C4.5, CART, Neural, ALLOC80, RBF, LVQ, K-NN, DANN), as reported by the STATLOG project; the vertical axis is test misclassification error. The entry DANN is a variant of k-nearest neighbors, using an adaptive metric (Section 13.4.2).]

KNN: Example (2)



• Handwritten digit recognition
• We want our nearest-neighbor classifier to consider rotated digits to be close together
• The 256 grayscale pixel values for a rotated “3” will look quite different from those in the original image
• We wish to remove the effect of rotation in measuring distances between two digits of the same class

[FIGURE 13.9. Examples of grayscale images of handwritten digits.]
KNN: Example (2)

• Consider the set of pixel values consisting of the original “3” and its rotated versions. This is a one-dimensional curve in ℝ^256
• The green curve in the middle of the figure depicts this set of rotated “3”s in 256-dimensional space
• The red line is the tangent line to the curve at the original image, with some “3”s on this tangent line, and its equation shown at the bottom of the figure

[Figure: transformations of “3” for rotations of −15°, −7.5°, 0°, 7.5°, 15°; the curve of rotated images and its tangent in pixel space; tangent-line images for α = −0.2, −0.1, 0, 0.1, 0.2.]
KNN: Example (2)

• Rather than using the usual Euclidean distance between the two images, we use the shortest distance between the two curves of transformed images (see the sketch below)
• This distance is called an invariant metric
• There are two problems with it:
  • First, it is very difficult to calculate for real images
  • Second, it allows large transformations that can lead to poor performance

[Figure: tangent distance computation between two images xi and xi′, showing the curves of transformations of xi and of xi′, the Euclidean distance between xi and xi′, the distance between the transformed images, and the tangent distance.]
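As an illustration of the “shortest distance between the two curves” idea, one crude one-sided approximation is to discretize the rotations of one image and take the minimum Euclidean distance. scipy.ndimage.rotate and the angle grid are assumptions of this sketch, and this is not the tangent-distance method described next.

```python
import numpy as np
from scipy.ndimage import rotate

def rotation_invariant_distance(img_a, img_b, angles=np.linspace(-15, 15, 13)):
    """One-sided approximation of an invariant metric:
    minimum Euclidean distance over discretized rotations of img_a."""
    img_a = np.asarray(img_a, dtype=float)
    img_b = np.asarray(img_b, dtype=float)
    best = np.inf
    for angle in angles:
        rotated = rotate(img_a, angle, reshape=False, order=1)  # rotate by `angle` degrees
        best = min(best, np.linalg.norm(rotated - img_b))
    return best
```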
KNN: Example (2)

• The use of tangent distance solves both of these problems
• We can approximate the invariance manifold of the image “3” by its tangent at the original image
• For a query image to be classified, we compute its invariant tangent line, and find the closest line to it among the tangent lines of the training images

[FIGURE 13.11. Tangent distance computation for two images xi and xi′. Rather than using the Euclidean distance between xi and xi′, or the shortest distance between the two curves, we use the shortest distance between the two tangent lines.]

TABLE 13.1. Test error rates for the handwritten ZIP code problem.
  Method                                    Error rate
  Neural-net                                0.049
  1-nearest-neighbor/Euclidean distance     0.055
  1-nearest-neighbor/tangent distance       0.026
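A minimal sketch of a one-sided tangent distance, assuming rotation is the only transformation: the tangent direction is estimated by finite differences with scipy.ndimage.rotate, and the closest point on the tangent line has a closed form via projection. This is a simplified illustration, not the two-sided tangent distance used in the ESL experiments.

```python
import numpy as np
from scipy.ndimage import rotate

def rotation_tangent(img, eps_deg=1.0):
    """Finite-difference estimate of the derivative of the image
    with respect to rotation angle, evaluated at angle 0."""
    img = np.asarray(img, dtype=float)
    plus = rotate(img, eps_deg, reshape=False, order=1)
    minus = rotate(img, -eps_deg, reshape=False, order=1)
    return (plus - minus).ravel() / (2.0 * eps_deg)

def one_sided_tangent_distance(img_a, img_b):
    """min over alpha of ||(a + alpha * t_a) - b||_2, i.e. the distance from b
    to the tangent line of a's rotation curve at a (closed form via projection)."""
    a = np.asarray(img_a, dtype=float).ravel()
    b = np.asarray(img_b, dtype=float).ravel()
    t = rotation_tangent(img_a)
    alpha = np.dot(b - a, t) / (np.dot(t, t) + 1e-12)  # best step along the tangent
    return np.linalg.norm(b - (a + alpha * t))
```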
