3 KNN

The document discusses various supervised learning algorithms, focusing on k-nearest neighbors (KNN). It explains the KNN algorithm, including its decision boundaries, sensitivity to noise, and trade-offs in choosing the parameter k. Additionally, it provides examples of KNN applications in land usage classification and handwritten digit recognition, highlighting the importance of distance metrics in classification tasks.


Supervised Learning Algorithms
• K-nearest neighbors (KNN)

• Decision Trees

• Linear Regression

• Logistic Regression

• Support Vector Machines (SVM)

• Random Forest

• Gradient Boosting
Nearest Neighbors
Suppose we’re given a novel input vector x we’d like to classify.
The idea: find the nearest input vector to x in the training set and copy
its label.
Can formalize “nearest” in terms of Euclidean distance:

$\|x^{(a)} - x^{(b)}\|_2 = \sqrt{\sum_{j=1}^{d} \left(x_j^{(a)} - x_j^{(b)}\right)^2}$

Algorithm:
1. Find example $(x^*, t^*)$ (from the stored training set) closest to x. That is:
   $x^* = \operatorname{arg\,min}_{x^{(i)} \in \text{training set}} \ \text{distance}(x^{(i)}, x)$
2. Output $y = t^*$

Note: we don’t need to compute the square root. Why?
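A minimal sketch of this 1-NN rule in Python/NumPy (the array names and shapes are assumptions for illustration, not from the slides). It works with squared Euclidean distance: the square root is monotonic, so dropping it does not change which point is the argmin.

```python
import numpy as np

def nearest_neighbor_predict(X_train, t_train, x):
    """1-NN rule: copy the label of the stored training point closest to x.

    X_train: (N, d) array of training inputs
    t_train: (N,) array of training labels
    x:       (d,) query vector
    """
    # Squared Euclidean distances to every training point. The square root
    # is monotonic, so skipping it does not change which point is closest.
    sq_dists = np.sum((X_train - x) ** 2, axis=1)
    return t_train[np.argmin(sq_dists)]
```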


Nearest Neighbors: Decision Boundaries
We can visualize the behavior in the classification setting using a Voronoi
diagram.
Nearest Neighbors: Decision Boundaries

Decision boundary: the boundary between regions of input space assigned to different categories.
Nearest Neighbors: Decision Boundaries

Example: 2D decision boundary
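As an illustration (not a figure from the slides), the decision regions of 1-NN in 2D can be visualized by classifying every point of a dense grid; the toy data below is made up for the sketch.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy 2D training set (made up for illustration): two Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1, (20, 2)), rng.normal([3, 3], 1, (20, 2))])
t = np.array([0] * 20 + [1] * 20)

# Classify every point on a dense grid with the 1-NN rule.
xs, ys = np.meshgrid(np.linspace(-3, 6, 200), np.linspace(-3, 6, 200))
grid = np.column_stack([xs.ravel(), ys.ravel()])
# Index of the nearest training point for each grid point.
nearest = np.argmin(((grid[:, None, :] - X[None, :, :]) ** 2).sum(-1), axis=1)
labels = t[nearest].reshape(xs.shape)

plt.contourf(xs, ys, labels, alpha=0.3)   # shaded regions = decision regions
plt.scatter(X[:, 0], X[:, 1], c=t)        # training points
plt.show()
```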


k-Nearest Neighbors

[Pic by Olga Veksler]

Nearest neighbors sensitive to noise or mis-labeled data (“class noise”). Solution?
Smooth by having k nearest neighbors vote

Algorithm (kNN):
1. Find k examples $\{x^{(i)}, t^{(i)}\}$ closest to the test instance x
2. Classification output is majority class:
   $y = \operatorname{arg\,max}_{t^{(z)}} \sum_{i=1}^{k} \mathbb{I}\left(t^{(z)} = t^{(i)}\right)$

$\mathbb{I}\{\text{statement}\}$ is the indicator function: it equals one whenever the statement is true and zero otherwise. We could also write this as $\delta(t^{(z)}, t^{(i)})$, with $\delta(a, b) = 1$ if $a = b$, 0 otherwise.
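A minimal sketch of this kNN classifier, assuming NumPy arrays for the stored training set (names, shapes and the tie-breaking behaviour of the vote are illustrative choices, not from the slides).

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, t_train, x, k=5):
    """k-NN: majority vote among the k training points closest to x."""
    sq_dists = np.sum((X_train - x) ** 2, axis=1)   # squared Euclidean distances
    nearest = np.argsort(sq_dists)[:k]              # indices of the k closest points
    votes = Counter(t_train[nearest].tolist())      # count labels among the neighbors
    return votes.most_common(1)[0][0]               # majority class (ties broken arbitrarily)
```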
k-Nearest Neighbors: k = 1

[Image credit: ”The Elements of Statistical Learning”]


k-Nearest Neighbors: k = 15

[Image credit: ”The Elements of Statistical Learning”]


k-Nearest Neighbors

Tradeoffs in choosing k?

Small k
  • Good at capturing fine-grained patterns
  • May overfit, i.e. be sensitive to random idiosyncrasies in the training data

Large k
  • Makes stable predictions by averaging over lots of examples
  • May underfit, i.e. fail to capture important regularities

Balancing k
  • Optimal choice of k depends on number of data points n
  • Nice theoretical properties if k → ∞ and k/n → 0
  • Rule of thumb: choose k < √n
  • We can choose k using a validation set (next slides; a small sketch follows below)
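A minimal sketch of choosing k on a held-out validation set, reusing the hypothetical knn_predict helper from the earlier sketch (the array names and the candidate grid of k values are assumptions for illustration).

```python
import numpy as np

def choose_k(X_train, t_train, X_val, t_val, candidate_ks=(1, 3, 5, 7, 9, 15)):
    """Pick the k with the highest accuracy on a held-out validation set."""
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        preds = np.array([knn_predict(X_train, t_train, x, k=k) for x in X_val])
        acc = np.mean(preds == t_val)           # validation accuracy for this k
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc
```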
KNN: Computational Cost

Number of computations at training time: 0


Number of computations at test time, per query (naïve algorithm):
  • Calculate D-dimensional Euclidean distances with N data points: O(ND)
  • Sort the distances: O(N log N)
This must be done for each query, which is very expensive by the
standards of a learning algorithm!
Need to store the entire dataset in memory!
Tons of work has gone into algorithms and data structures for efficient
nearest neighbors with high dimensions and/or large datasets.
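One common option is a space-partitioning index such as a k-d tree. A minimal sketch using scikit-learn's KNeighborsClassifier (the library choice and the toy data are assumptions of the sketch, not something the slides prescribe).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10_000, 4))          # toy data, made up for illustration
t_train = (X_train[:, 0] > 0).astype(int)

# Build a k-d tree once up front; queries then avoid scanning all N points.
clf = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")
clf.fit(X_train, t_train)
print(clf.predict(rng.normal(size=(3, 4))))     # labels for three query points
```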
KNN: Example (1)

• STATLOG project
• Four heat-map images: two in the visible spectrum and two in the infrared, for an area of agricultural land in Australia
• Labels = {red soil, cotton, vegetation stubble, mixture, gray soil, damp gray soil, very damp gray soil}
• Classify the land usage at a pixel, based on the information in the four spectral bands
KNN: Example (1)

• For each pixel we extracted an 8-neighbor feature map
• This is done separately in the four spectral bands, giving (1+8)×4 = 36 input features per pixel

[FIGURE 13.6. The first four panels are LANDSAT images for an agricultural area in four spectral bands, depicted by heatmap shading. The remaining two panels give the actual land usage (color coded) and the predicted land usage using a five-nearest-neighbor rule described in the text.]

The STATLOG project (Michie et al., 1994) used part of a LANDSAT image as a benchmark for classification (82 × 100 pixels).
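A minimal sketch of building the (1+8)×4 = 36 features per pixel with NumPy, assuming the four spectral bands are stacked in an array of shape (4, H, W); the array layout and the handling of border pixels are assumptions for illustration.

```python
import numpy as np

def pixel_features(bands):
    """bands: (4, H, W) array of spectral bands.

    Returns an (H-2, W-2, 36) array: for each interior pixel, its own value
    plus its 8 neighbors, in each of the 4 bands -> (1+8)*4 = 36 features.
    """
    n_bands, H, W = bands.shape
    feats = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            # Shifted view of every band: the (di, dj) neighbor of each interior pixel.
            feats.append(bands[:, 1 + di:H - 1 + di, 1 + dj:W - 1 + dj])
    # Stack the 9 offsets x 4 bands along a trailing feature axis.
    return np.concatenate(feats, axis=0).transpose(1, 2, 0)
```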
KNN: Example (1)

STATLOG results

• Of all the methods used in the STATLOG project, including LVQ, CART, neural networks, linear discriminant analysis and many others, k-nearest-neighbors performed best on this task
• DANN is a variant of k-nearest neighbors, using an adaptive metric

[FIGURE 13.8. Test-error performance for a number of classifiers (SMART, Logistic, LDA, QDA, NewID, C4.5, CART, Neural, ALLOC80, RBF, LVQ, K-NN, DANN), as reported by the STATLOG project; the vertical axis is test misclassification error. The entry DANN is a variant of k-nearest neighbors, using an adaptive metric (Section 13.4.2).]

KNN: Example (2)



• Handwritten digit recognition
• We want our nearest-neighbor classifier to consider rotated digits to be close together
• The 256 grayscale pixel values for a rotated “3” will look quite different from those in the original image
• We wish to remove the effect of rotation in measuring distances between two digits of the same class

[FIGURE 13.9. Examples of grayscale images of handwritten digits.]
KNN: Example (2)

• Consider the set of pixel values consisting of the original “3” and its rotated versions. This is a one-dimensional curve in ℝ^256
• The green curve in the middle of the figure depicts this set of rotated “3”s in 256-dimensional space
• The red line is the tangent line to the curve at the original image, with some “3”s on this tangent line, and its equation shown at the bottom of the figure

[Figure: transformations of “3” for rotations of −15°, −7.5°, 0°, 7.5°, 15°; the curve of rotated images and its tangent in pixel space; tangent-line images for α = −0.2, −0.1, 0, 0.1, 0.2.]
KNN: Example (2)

• Rather than using the usual Euclidean distance between the two images, we use the shortest distance between the two curves of transformed images (see the sketch below)
• This distance is called an invariant metric
• There are two problems with it:
  • First, it is very difficult to calculate for real images
  • Second, it allows large transformations that can lead to poor performance

[Figure: tangent distance computation between two images xi and xi′, showing the curves of transformations of xi and of xi′, the Euclidean distance between xi and xi′, the distance between the transformed images, and the tangent distance.]
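As an illustration of the “shortest distance between the two curves” idea, one crude one-sided approximation is to discretize the rotations of one image and take the minimum Euclidean distance. scipy.ndimage.rotate and the angle grid are assumptions of this sketch, and this is not the tangent-distance method described next.

```python
import numpy as np
from scipy.ndimage import rotate

def rotation_invariant_distance(img_a, img_b, angles=np.linspace(-15, 15, 13)):
    """One-sided approximation of an invariant metric:
    minimum Euclidean distance over discretized rotations of img_a."""
    img_a = np.asarray(img_a, dtype=float)
    img_b = np.asarray(img_b, dtype=float)
    best = np.inf
    for angle in angles:
        rotated = rotate(img_a, angle, reshape=False, order=1)  # rotate by `angle` degrees
        best = min(best, np.linalg.norm(rotated - img_b))
    return best
```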
KNN: Example (2)

• The use of tangent distance solves both of these problems
• We can approximate the invariance manifold of the image “3” by its tangent at the original image
• For a query image to be classified, we compute its invariant tangent line, and find the closest line to it among the tangent lines of the training images

[FIGURE 13.11. Tangent distance computation for two images xi and xi′. Rather than using the Euclidean distance between xi and xi′, or the shortest distance between the two curves, we use the shortest distance between the two tangent lines.]

TABLE 13.1. Test error rates for the handwritten ZIP code problem.
  Method                                    Error rate
  Neural-net                                0.049
  1-nearest-neighbor/Euclidean distance     0.055
  1-nearest-neighbor/tangent distance       0.026
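A minimal sketch of a one-sided tangent distance, assuming rotation is the only transformation: the tangent direction is estimated by finite differences with scipy.ndimage.rotate, and the closest point on the tangent line has a closed form via projection. This is a simplified illustration, not the two-sided tangent distance used in the ESL experiments.

```python
import numpy as np
from scipy.ndimage import rotate

def rotation_tangent(img, eps_deg=1.0):
    """Finite-difference estimate of the derivative of the image
    with respect to rotation angle, evaluated at angle 0."""
    img = np.asarray(img, dtype=float)
    plus = rotate(img, eps_deg, reshape=False, order=1)
    minus = rotate(img, -eps_deg, reshape=False, order=1)
    return (plus - minus).ravel() / (2.0 * eps_deg)

def one_sided_tangent_distance(img_a, img_b):
    """min over alpha of ||(a + alpha * t_a) - b||_2, i.e. the distance from b
    to the tangent line of a's rotation curve at a (closed form via projection)."""
    a = np.asarray(img_a, dtype=float).ravel()
    b = np.asarray(img_b, dtype=float).ravel()
    t = rotation_tangent(img_a)
    alpha = np.dot(b - a, t) / (np.dot(t, t) + 1e-12)  # best step along the tangent
    return np.linalg.norm(b - (a + alpha * t))
```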
