k-nearest neighbors algorithm - Wikipedia
The best choice of k depends upon the data; generally, larger values of k reduce the effect of noise on the classification,[7] but make boundaries between classes less distinct. A good k can
be selected by various heuristic techniques (see hyperparameter optimization). The special
case where the class is predicted to be the class of the closest training sample (i.e. when k = 1)
is called the nearest neighbor algorithm.
The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or
irrelevant features, or if the feature scales are not consistent with their importance. Much
research effort has been put into selecting or scaling features to improve classification. A
particularly popular approach is the use of evolutionary algorithms to optimize feature
scaling.[8] Another popular approach is to scale features by the mutual information of the
training data with the training classes.
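As a hedged illustration of the mutual-information approach, the sketch below scales each feature by an estimate of its mutual information with the class labels; the use of scikit-learn's mutual_info_classif estimator and the stand-in dataset are assumptions for illustration, not part of the cited work.

# Sketch: scale each feature by its estimated mutual information with the class labels.
# Assumes scikit-learn is available; the dataset is a stand-in for illustration.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
mi = mutual_info_classif(X, y, random_state=0)   # one MI estimate per feature
X_scaled = X * mi                                # more informative features get a larger scale
knn = KNeighborsClassifier(n_neighbors=5).fit(X_scaled, y)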
In binary (two class) classification problems, it is helpful to choose k to be an odd number as
this avoids tied votes. One popular way of choosing the empirically optimal k in this setting is
via the bootstrap method.[9]
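As a hedged sketch of empirical selection of k, the example below uses a cross-validated grid search over odd values of k; the cited reference selects k with a bootstrap procedure instead, and the dataset here is a stand-in.

# Sketch: choose an empirically good k by cross-validated grid search over odd values
# (illustrative; the cited reference selects k with a bootstrap procedure instead).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)       # a binary (two class) problem

search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 40, 2))},   # odd k avoids tied votes
    cv=5,
)
search.fit(X, y)
print(search.best_params_)                       # the selected k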
The 1-nearest neighbor classifier
The most intuitive nearest neighbour type classifier is the one nearest neighbour classifier
that assigns a point x to the class of its closest neighbour in the feature space, that is, $C^{1\mathrm{nn}}_{n}(x) = Y_{(1)}$, where $Y_{(1)}$ is the label of the training point nearest to x.
As the size of the training data set approaches infinity, the one nearest neighbour classifier
guarantees an error rate of no worse than twice the Bayes error rate (the minimum achievable
error rate given the distribution of the data).
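A minimal sketch of the 1-nearest-neighbour rule follows (plain NumPy with Euclidean distance; the function name and toy data are illustrative).

# Sketch: 1-nearest-neighbour classification with Euclidean distance (illustrative only).
import numpy as np

def one_nn_predict(X_train, y_train, x_query):
    # distance from the query to every training point
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # the predicted class is the class of the single closest training sample
    return y_train[np.argmin(dists)]

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_train = np.array([0, 0, 1])
print(one_nn_predict(X_train, y_train, np.array([4.0, 4.5])))   # -> 1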
The weighted nearest neighbour classifier
The k-nearest neighbour classifier can be viewed as assigning the k nearest neighbours a
weight $1/k$ and all others 0 weight. This can be generalised to weighted nearest neighbour classifiers. That is, the ith nearest neighbour is assigned a weight $w_{ni}$, with $\sum_{i=1}^{n} w_{ni} = 1$. An analogous result on the strong consistency of weighted nearest
neighbour classifiers also holds.[10]
Let $C^{wnn}_{n}$ denote the weighted nearest neighbour classifier with weights $\{w_{ni}\}_{i=1}^{n}$. Subject to regularity conditions on the class distributions, the excess risk has the following asymptotic expansion[11]
$$\mathcal{R}(C^{wnn}_{n}) - \mathcal{R}(C^{\mathrm{Bayes}}) = \left( B_{1} s_{n}^{2} + B_{2} t_{n}^{2} \right) \{ 1 + o(1) \},$$
for constants $B_{1}$ and $B_{2}$, where $s_{n}^{2} = \sum_{i=1}^{n} w_{ni}^{2}$ and $t_{n} = n^{-2/d} \sum_{i=1}^{n} w_{ni} \left\{ i^{1+2/d} - (i-1)^{1+2/d} \right\}$.
With optimal weights the dominant term in the asymptotic expansion of the excess risk is $\mathcal{O}\!\left(n^{-4/(d+4)}\right)$. Similar results are true when using a bagged nearest neighbour classifier.
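The following sketch assigns the ordered neighbours user-supplied weights w_1, ..., w_k summing to 1 and predicts the class with the largest total weight; it is an illustrative implementation, not the optimally weighted scheme of the reference.

# Sketch: weighted nearest-neighbour classification with user-supplied weights
# w_1..w_k that sum to 1 (illustrative; not the optimal weights from the reference).
import numpy as np

def weighted_nn_predict(X_train, y_train, x_query, weights):
    order = np.argsort(np.linalg.norm(X_train - x_query, axis=1))
    votes = {}
    for w, idx in zip(weights, order[:len(weights)]):   # i-th nearest neighbour gets weight w_i
        votes[y_train[idx]] = votes.get(y_train[idx], 0.0) + w
    return max(votes, key=votes.get)                    # class with the largest total weight

# uniform weights 1/k recover the ordinary k-NN classifier
X = np.array([[0., 0.], [1., 0.], [0., 1.], [5., 5.]])
y = np.array([0, 0, 1, 1])
print(weighted_nn_predict(X, y, np.array([0.2, 0.2]), weights=[1/3, 1/3, 1/3]))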
Properties
For multi-class classification, the asymptotic error rate satisfies
$$R^{*} \leq R_{k\mathrm{NN}} \leq R^{*} \left( 2 - \frac{M R^{*}}{M - 1} \right),$$
where $R^{*}$ is the Bayes error rate (which is the minimal error rate possible), $R_{k\mathrm{NN}}$ is the asymptotic k-NN error rate, and M is the number of classes in the problem. This bound is tight in the sense that both the lower and upper bounds are achievable by some distribution.[15] For $M = 2$ and as the Bayesian error rate $R^{*}$ approaches zero, this limit reduces to "not more than twice the Bayesian error rate".
Error rates
There are many results on the error rate of the k nearest neighbour classifiers.[16] The k-nearest neighbour classifier is strongly (that is, for any joint distribution on $(X, Y)$) consistent provided $k := k_{n}$ diverges and $k_{n}/n$ converges to zero as $n \to \infty$.
Let $C^{knn}_{n}$ denote the k nearest neighbour classifier based on a training set of size n. Under certain regularity conditions, the excess risk yields the following asymptotic expansion[11]
$$\mathcal{R}(C^{knn}_{n}) - \mathcal{R}(C^{\mathrm{Bayes}}) = \left\{ B_{1} \frac{1}{k} + B_{2} \left( \frac{k}{n} \right)^{4/d} \right\} \{ 1 + o(1) \},$$
for some constants $B_{1}$ and $B_{2}$.
Feature extraction
When the input data to an algorithm is too large to be processed and is suspected to be redundant (e.g. the same measurement in both feet and meters), the input data are transformed into a reduced representation set of features (also called a feature vector). Transforming the input data into this set of features is called feature extraction. If the extracted features are carefully chosen, the feature set is expected to capture the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input. Feature extraction is performed on raw data prior to applying the k-NN algorithm to the transformed data in feature space.
An example of a typical computer vision pipeline for face recognition using k-NN, including feature extraction and dimension-reduction pre-processing steps (usually implemented with OpenCV), is the following (a code sketch follows the list):
1. Haar face detection
2. Mean-shift tracking analysis
3. PCA or Fisher LDA projection into feature space, followed by k-NN classification
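A hedged sketch of such a pipeline is given below; the cascade file, the image size, and the use of scikit-learn for the PCA and k-NN stages are assumptions for illustration, and the tracking step is omitted.

# Sketch: face detection -> PCA projection -> k-NN classification
# (illustrative; assumes OpenCV and scikit-learn, and omits the tracking step).
import cv2
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_vector(image_bgr, size=(64, 64)):
    """Detect the first face and return it as a flattened grey-level vector."""
    grey = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(grey, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    return cv2.resize(grey[y:y + h, x:x + w], size).ravel()

# train_images / train_labels are assumed to exist: BGR images and person identifiers.
# vectors = np.array([face_vector(img) for img in train_images])
# pca = PCA(n_components=50).fit(vectors)                        # dimension reduction
# knn = KNeighborsClassifier(n_neighbors=3).fit(pca.transform(vectors), train_labels)
# prediction = knn.predict(pca.transform([face_vector(query_image)]))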
Dimension reduction
For high-dimensional data (e.g., with number of dimensions more than 10) dimension
reduction is usually performed prior to applying the k-NN algorithm in order to avoid the
effects of the curse of dimensionality.[17]
The curse of dimensionality in the k-NN context basically means that Euclidean distance is
unhelpful in high dimensions because all vectors are almost equidistant to the search query
vector (imagine multiple points lying more or less on a circle with the query point at the
center; the distance from the query to all data points in the search space is almost the same).
Feature extraction and dimension reduction can be combined in one step using principal
component analysis (PCA), linear discriminant analysis (LDA), or canonical correlation analysis
(CCA) techniques as a pre-processing step, followed by clustering by k-NN on feature vectors
in reduced-dimension space. This process is also called low-dimensional embedding.[18]
For very-high-dimensional datasets (e.g. when performing a similarity search on live video
streams, DNA data or high-dimensional time series) running a fast approximate k-NN search
using locality sensitive hashing, "random projections",[19] "sketches"[20] or other high-
dimensional similarity search techniques from the VLDB toolbox might be the only feasible
option.
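As a hedged sketch of the random-projection idea listed above (the projected dimension and the synthetic data are arbitrary assumptions), neighbours can be searched by brute force in a randomly projected, much lower-dimensional space:

# Sketch: approximate nearest-neighbour search via Gaussian random projection
# (illustrative; the projected dimension 32 and the synthetic data are arbitrary assumptions).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5_000, 1_024))          # a high-dimensional dataset
query = rng.standard_normal(1_024)

d_proj = 32
R = rng.standard_normal((1_024, d_proj)) / np.sqrt(d_proj)   # random projection matrix

X_low = X @ R                                    # project the data once, offline
q_low = query @ R                                # project the query

# brute-force search in the low-dimensional space approximates the true neighbours
approx_idx = np.argsort(np.linalg.norm(X_low - q_low, axis=1))[:5]
print(approx_idx)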
Decision boundary
Nearest neighbor rules in effect implicitly compute the decision boundary. It is also possible to
compute the decision boundary explicitly, and to do so efficiently, so that the computational
complexity is a function of the boundary complexity.[21]
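As an illustrative sketch (the toy data and grid resolution are assumptions), the implicit boundary can be made visible by classifying every point of a dense grid:

# Sketch: render the implicit 1-NN decision boundary by classifying a dense grid
# (toy data and grid resolution are illustrative assumptions).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)

xs, ys = np.meshgrid(np.linspace(-3, 6, 200), np.linspace(-3, 6, 200))
grid = np.column_stack([xs.ravel(), ys.ravel()])
labels = clf.predict(grid).reshape(xs.shape)     # label map; where its value changes
                                                 # is the (implicit) decision boundary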
Data reduction
Data reduction is one of the most important problems when working with huge data sets. Usually, only some of the data points are needed for accurate classification. Those points are called the prototypes and can be found as follows (a sketch of step 2 follows the list):
1. Select the class-outliers, that is, training data that are classified incorrectly by k-NN (for a
given k)
2. Separate the rest of the data into two sets: (i) the prototypes that are used for the
classification decisions and (ii) the absorbed points that can be correctly classified by k-
NN using prototypes. The absorbed points can then be removed from the training set.
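A hedged sketch of step 2, in the spirit of a condensed nearest neighbour (CNN) scan, is given below; the seed point and scan order are illustrative choices.

# Sketch: separate prototypes from absorbed points with a CNN-style scan
# (illustrative; assumes class-outliers have already been removed, scan order is arbitrary).
import numpy as np

def condense(X, y):
    prototypes = [0]                              # seed the prototype set with one point
    changed = True
    while changed:                                # repeat until no point is added
        changed = False
        for i in range(len(X)):
            if i in prototypes:
                continue
            P = np.array(prototypes)
            nearest = P[np.argmin(np.linalg.norm(X[P] - X[i], axis=1))]
            if y[nearest] != y[i]:                # misclassified by 1-NN on the prototypes
                prototypes.append(i)              # -> promote it to a prototype
                changed = True
    absorbed = [i for i in range(len(X)) if i not in prototypes]
    return prototypes, absorbed                   # absorbed points can be dropped

# prototypes, absorbed = condense(X_clean, y_clean)   # X_clean: outlier-free training data (assumed)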
Selection of class-outliers
A training example surrounded by examples of other classes is called a class outlier. Causes of
class outliers include:
random error
insufficient training examples of this class (an isolated example appears instead of a cluster)
missing important features (the classes are separated in other dimensions which we don't
know)
too many training examples of other classes (unbalanced classes) that create a "hostile"
background for the given small class
Class outliers with k-NN produce noise. They can be detected and separated for future
analysis. Given two natural numbers, k>r>0, a training example is called a (k,r)NN class-outlier
if its k nearest neighbors include more than r examples of other classes.
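A hedged sketch of this rule (plain NumPy; the function name is illustrative):

# Sketch: flag (k, r)NN class-outliers -- points whose k nearest neighbours include
# more than r examples of other classes (illustrative implementation).
import numpy as np

def kr_class_outliers(X, y, k, r):
    outliers = []
    for i in range(len(X)):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf                         # exclude the point itself
        neighbours = np.argsort(dists)[:k]
        if np.sum(y[neighbours] != y[i]) > r:     # more than r foreign neighbours
            outliers.append(i)
    return outliers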
Calculation of the border ratio
For a training point x, let y be the closest point to x with a different label (a point of a different label than x is called external to x), and let x' be the closest point to y carrying the same label as x. The border ratio of x is a(x) = ‖x'-y‖ / ‖x-y‖. The border ratio lies in the interval [0,1] because ‖x'-y‖ never exceeds ‖x-y‖. This ordering gives preference to the borders of the classes for inclusion in the set of prototypes U. The calculation is illustrated by the figure on the right: the data points are labeled by colors, the initial point x is red, external points are blue and green, the closest external point to x is y, and the closest red point to y is x'.
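A hedged sketch of the border ratio computation described above (plain NumPy; the function name is illustrative, and at least one point of another class is assumed to exist):

# Sketch: border ratio a(x) = ||x' - y|| / ||x - y||, where y is the nearest point to x
# with a different label and x' is the nearest point to y with the same label as x.
import numpy as np

def border_ratio(X, labels, i):
    d_to_x = np.linalg.norm(X - X[i], axis=1)
    external = np.where(labels != labels[i])[0]
    y_idx = external[np.argmin(d_to_x[external])]          # nearest external point y
    same = np.where(labels == labels[i])[0]
    d_to_y = np.linalg.norm(X - X[y_idx], axis=1)
    x_prime = same[np.argmin(d_to_y[same])]                # nearest same-label point to y
    return d_to_y[x_prime] / d_to_x[y_idx]                 # always in [0, 1]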
Below is an illustration of CNN in a series of figures. There are three classes (red, green and
blue). Fig. 1: initially there are 60 points in each class. Fig. 2 shows the 1NN classification map:
each pixel is classified by 1NN using all the data. Fig. 3 shows the 5NN classification map.
White areas correspond to the unclassified regions, where 5NN voting is tied (for example, if
there are two green, two red and one blue points among 5 nearest neighbors). Fig. 4 shows
the reduced data set. The crosses are the class-outliers selected by the (3,2)NN rule (all the
three nearest neighbors of these instances belong to other classes); the squares are the
prototypes, and the empty circles are the absorbed points. The bottom left corner shows the numbers of class-outliers, prototypes and absorbed points for all three classes. The proportion of prototypes varies from 15% to 20% across the classes in this example. Fig. 5
shows that the 1NN classification map with the prototypes is very similar to that with the
initial data set. The figures were produced using the Mirkes applet.[23]
Figures (CNN model reduction for k-NN classifiers): Fig. 1, the dataset; Fig. 2, the 1NN classification map.
k-NN regression
In k-NN regression, also known as k-NN smoothing, the k-NN algorithm is used for estimating
continuous variables. One such algorithm uses a weighted average of the k nearest neighbors,
weighted by the inverse of their distance. This algorithm works as follows:
1. Compute the Euclidean or Mahalanobis distance from the query example to the labeled
examples.
2. Order the labeled examples by increasing distance.