
Nearest Neighbor Algorithms

CHAPTER 2: MACHINE LEARNING – THEORY & PRACTICE


Proximity Measures
• Used to quantify the degree of similarity or dissimilarity between two
or more patterns.
• Types:
➢Euclidean distance: This is the most popular distance measure as it is intuitively
appealing. It measures the straight-line distance between two points in a
multidimensional space.
➢Cosine similarity: This measures the cosine of the angle between two vectors
and is often used in text analysis to compare documents.
➢Jaccard similarity: This measures the intersection over union of two sets and
is often used in recommendation systems to compare users’ preferences.
➢Hamming distance: This measures the number of positions at which two
binary strings differ and is often used in error correction codes.
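As a minimal sketch of the four proximity measures above (using NumPy; the small example vectors, sets and strings are illustrative, not data from the slides):

import numpy as np

def euclidean(p, q):
    # straight-line distance between two points
    return np.sqrt(np.sum((p - q) ** 2))

def cosine_similarity(p, q):
    # cosine of the angle between two vectors
    return np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))

def jaccard_similarity(a, b):
    # intersection over union of two sets
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def hamming_distance(p, q):
    # number of positions at which two binary strings differ
    return sum(c1 != c2 for c1, c2 in zip(p, q))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(euclidean(x, y), cosine_similarity(x, y))
print(jaccard_similarity({1, 2, 3}, {2, 3, 4}), hamming_distance("10110", "10011"))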
Distance Measures
• Used to find the dissimilarity between pattern representations.
• Key attributes of distance measures:
➢ Positive reflexivity: d(x,x) = 0
➢ Symmetry: d(x,y) = d(y,x)
➢ Triangular inequality: d(x,y) ≤ d(x,z) + d(z,y)
• Minkowski Distance:

d(p, q) = ( Σ_{k=1}^{L} |p_k − q_k|^r )^(1/r)

where r is a parameter that determines the type of metric being used, and p and
q are L-dimensional vectors.
Different Norms based on “r”:
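For instance, r = 1 gives the Manhattan (L1) norm, r = 2 the Euclidean (L2) norm, and r → ∞ the max (L∞) norm. A minimal sketch of the Minkowski distance for these values (the points are illustrative):

def minkowski(p, q, r):
    # Minkowski distance of order r between two equal-length vectors
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)

p, q = [0.0, 0.0], [3.0, 4.0]
print(minkowski(p, q, 1))                            # 7.0 (Manhattan)
print(minkowski(p, q, 2))                            # 5.0 (Euclidean)
print(max(abs(a - b) for a, b in zip(p, q)))         # 4.0 (max norm, the r -> infinity limit)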
Weighted Distance Measure
• To assign greater importance to certain attributes, a weight can be
applied to their values in the weighted distance metric:

d_w(p, q) = ( Σ_{k=1}^{L} w_k |p_k − q_k|^r )^(1/r)

where w_k represents the weight associated with the kth dimension or
feature (see the sketch below).
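A minimal sketch of the weighted variant (the weight values chosen here are only illustrative):

def weighted_minkowski(p, q, w, r=2):
    # weighted Minkowski distance: a larger w_k makes dimension k more influential
    return sum(wk * abs(a - b) ** r for a, b, wk in zip(p, q, w)) ** (1.0 / r)

print(weighted_minkowski([1, 2], [3, 5], w=[1.0, 1.0]))   # plain Euclidean: ~3.61
print(weighted_minkowski([1, 2], [3, 5], w=[4.0, 1.0]))   # first dimension weighted more: 5.0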
Non-Metric Similarity Functions
• This category includes similarity functions that do not follow the
triangular inequality or symmetry.
• They are commonly used for image or string data, and they are
resistant to outliers or extremely noisy data.
• Example: k-median distance between two vectors.
➢Given x = (x1, x2, · · · , xn) and y = (y1, y2, · · · , yn), the formula for the k-median
distance is d(x, y) = k-median {|x1 − y1|, · · · , |xn − yn|}, where the k-median
operator returns the kth value of the ordered difference vector.
• Example: Cosine similarity between two vectors x and y, cos(x, y) = (x · y) / (‖x‖ ‖y‖).
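A minimal sketch of the k-median distance described above (the vectors and the choice k = 2 are illustrative); note how a single very noisy coordinate does not dominate the result:

def k_median_distance(x, y, k):
    # k-th smallest absolute coordinate-wise difference (1-indexed k)
    diffs = sorted(abs(a - b) for a, b in zip(x, y))
    return diffs[k - 1]

x = [1, 5, 2, 9]
y = [2, 3, 2, 100]   # the last coordinate is a large outlier
print(k_median_distance(x, y, k=2))  # 1 -- the outlier does not affect the result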
Levenshtein distance
• Also known as edit distance, is a measure of the distance between
two strings. It is determined by calculating the minimum number of
mutations needed to transform string s1 into string s2, where a
mutation can be one of three operations: changing a letter, inserting
a letter, or deleting a letter.
• The edit distance can be defined using the following recurrence
relation:
➢d(“ ”, “ ”) = 0, (two empty strings match)
➢d(s, “ ”) = d(“ ”, s) = |s| (the length of s, i.e., the distance from an empty string), and
➢d(s1 + ch1, s2 + ch2) = min [d(s1, s2) + {if ch1 = ch2 then 0 else 1}, d(s1 + ch1,
s2) + 1, d(s1, s2 + ch2) + 1]
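A minimal dynamic-programming sketch of this recurrence:

def levenshtein(s1, s2):
    # dp[i][j] = edit distance between s1[:i] and s2[:j]
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all of s1[:i]
    for j in range(n + 1):
        dp[0][j] = j          # insert all of s2[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1    # change a letter (or match)
            dp[i][j] = min(dp[i - 1][j - 1] + cost,      # substitute / match
                           dp[i - 1][j] + 1,             # delete a letter
                           dp[i][j - 1] + 1)             # insert a letter
    return dp[m][n]

print(levenshtein("kitten", "sitting"))  # 3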
Mutual Neighbourhood Distance (MND)
• In this case, the function used to measure the similarity between two
patterns, x and y, is defined as S(x, y) = f(x, y, ε), where ε denotes the
context, i.e., the surrounding points.
• All other data points are labeled in increasing order of some distance
measure, starting from the nearest neighbor as 1 and ending with the
farthest point as N-1.
• Mutual neighborhood distance (MND) is calculated as MND(x, y) =
NN(x, y) + NN(y, x).
• Note: NN(x, y) denotes the neighbour rank of y with respect to x, i.e., the
position of y in the ordering of x's neighbours described above (1 if y is
x's nearest neighbour).
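A minimal sketch of MND using neighbour ranks under the Euclidean distance (the helper name and the data points are illustrative):

import numpy as np

def neighbour_rank(data, i, j):
    # rank of point j in the ordered list of neighbours of point i (1 = nearest)
    dists = np.linalg.norm(data - data[i], axis=1)
    order = np.argsort(dists)
    order = order[order != i]                 # exclude the point itself
    return int(np.where(order == j)[0][0]) + 1

def mnd(data, i, j):
    # mutual neighbourhood distance: symmetric by construction
    return neighbour_rank(data, i, j) + neighbour_rank(data, j, i)

data = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [1.2, 0.1]])
print(mnd(data, 0, 1))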
Proximity Between Binary Patterns
• Let p and q be two binary strings. Some of the popular proximity
measures on such binary patterns are:
➢Hamming Distance (HD): If pi = qi then we say that p and q match on their ith
bit, else (pi ≠ qi) p and q mismatch on the ith bit. Hamming distance is the
number of mismatching bits out of the l bit locations.
➢Simple Matching Coefficient (SMC):

SMC(p, q) = (M11 + M00) / (M00 + M01 + M10 + M11)

where M01 is the number of bit positions where p is 0 and q is 1,
M10 is the number of bit positions where p is 1 and q is 0,
M00 is the number of bit positions where p is 0 and q is 0, and
M11 is the number of bit positions where p is 1 and q is 1.
➢Jaccard Coefficient (JC):

JC(p, q) = M11 / (M01 + M10 + M11)
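A minimal sketch of these measures on equal-length binary strings (the strings are illustrative):

def binary_proximity(p, q):
    # count the four match/mismatch cases over the bit positions
    m00 = sum(a == '0' and b == '0' for a, b in zip(p, q))
    m01 = sum(a == '0' and b == '1' for a, b in zip(p, q))
    m10 = sum(a == '1' and b == '0' for a, b in zip(p, q))
    m11 = sum(a == '1' and b == '1' for a, b in zip(p, q))
    hd  = m01 + m10                                  # Hamming distance
    smc = (m11 + m00) / (m00 + m01 + m10 + m11)      # Simple Matching Coefficient
    jc  = m11 / (m01 + m10 + m11)                    # Jaccard Coefficient
    return hd, smc, jc

print(binary_proximity("101101", "100111"))  # (2, 0.666..., 0.6)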
Different classification algorithms based on the
distance measures
• Nearest Neighbour Classifier (NNC)
• k-Nearest Neighbour Classifier (kNNC)
• Weighted k-Nearest Neighbour (WkNN)
• Radius distance Near Neighbours
• Tree Based Nearest Neighbours
• Branch and Bound Method
• Leader clustering
• KNN Regression
Nearest Neighbour Classifier (NNC)
• Let X = {(x1, l1), (x2, l2), · · · , (xn, ln)}, where each pattern xi is a vector in
some L-dimensional space and li is its class label.
• Now the nearest neighbor of x (i.e., the test pattern) is given by

NN(x) = argmin_{j} d(x, xj), j = 1, · · · , n

where xj is the jth training pattern and d(x, xj) is the distance between
x and xj.
• Test pattern T(2.1, 0.7) is assigned to class 1,
since its Euclidean distance from x3 is
minimum (i.e., 1 unit)
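A minimal sketch of NNC (the training points below are illustrative, not the exact data from the figure):

import numpy as np

def nnc(train_X, train_y, x):
    # label of the single nearest training pattern under Euclidean distance
    dists = np.linalg.norm(train_X - x, axis=1)
    return train_y[int(np.argmin(dists))]

train_X = np.array([[1.0, 1.0], [2.0, 1.5], [6.0, 6.0], [7.0, 5.5]])
train_y = np.array([1, 1, 2, 2])
print(nnc(train_X, train_y, np.array([2.1, 0.7])))  # -> 1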
k-Nearest Neighbour Classifier
• Similar to the Nearest Neighbour Classifier (NNC) algorithm: we find the
k nearest neighbours of a test pattern x from the training data, and then
assign the majority class label among the k neighbours to x.

• By selecting the majority class label among the k nearest neighbours,
the error in classification can be reduced, especially when the training
patterns are noisy.
• T is assigned to class 2, even though x5 is closest, because x5 is an outlier.
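A minimal sketch of kNNC using a majority vote over the k nearest neighbours (the data are illustrative):

import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    # majority class label among the k nearest training patterns
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

train_X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [6.0, 6.0], [6.5, 5.5]])
train_y = np.array([1, 1, 2, 2, 2])
print(knn_classify(train_X, train_y, np.array([1.2, 1.2]), k=3))  # -> 1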
Weighted k-Nearest Neighbour (WkNN)
• Similar to the kNN algorithm, but it takes into account the distance of
each of the k neighbors from the test point by weighting them
accordingly.
• Each neighbor is associated with a weight wj, which is determined by
the following formula:

wj = (dk − dj) / (dk − d1) if dk ≠ d1, and wj = 1 if dk = d1

Here, j represents the neighbor's index in the list of k nearest
neighbors (ordered so that d1 ≤ d2 ≤ · · · ≤ dk), while dk and dj are the
distances between the test point and the k-th neighbor and the j-th
neighbor, respectively.
Weighted k-Nearest Neighbour (WkNN)
• For example the distances from T to its 5 nearest data points are
d(T, x3) = 1.0; d(T, x14) = 1.01; d(T, x13) = 1.08;
d(T, x5) = 1.08; d(T, x16) = 1.30;
• The weight values will be
w(x3) = 1.00, w(x14) = 0.97, w(x13) = 0.73, w(x5) = 0.73, and w(x16) = 0.
• Summing up for each selected class:
Class 1, to which x3 and x5 belong, sums to 1.73, and
Class 3, to which x14, x13 and x16 belong, sums to 1.70.
Therefore, the point T belongs to Class 1.
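A minimal sketch of WkNN using the weighting rule above, reproducing the distances and class labels from this example:

from collections import defaultdict

def wknn_vote(neighbours, k=5):
    # neighbours: list of (distance, class_label), assumed sorted by distance
    d1, dk = neighbours[0][0], neighbours[k - 1][0]
    scores = defaultdict(float)
    for dj, label in neighbours[:k]:
        w = 1.0 if dk == d1 else (dk - dj) / (dk - d1)   # closer neighbours get larger weights
        scores[label] += w
    return max(scores, key=scores.get), dict(scores)

neighbours = [(1.00, 1), (1.01, 3), (1.08, 3), (1.08, 1), (1.30, 3)]
print(wknn_vote(neighbours))  # -> class 1, with scores {1: ~1.73, 3: ~1.70}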
Radius distance Near Neighbours Algorithm
• This algorithm is an alternative to the kNN algorithm that considers all
the neighbors within a specified distance r of the point of interest.
• Steps:
➢ Given a point T, identify the subset of data points that fall within the radius r
centered at T, denoted by Br(T) = {xi : d(T, xi) ≤ r}.
➢If Br(T) is empty, output the majority class of the entire dataset.
➢If Br(T) is not empty, output the majority class of the data points within Br(T).
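A minimal sketch of the radius near neighbours rule above (the data and radius are illustrative):

import numpy as np
from collections import Counter

def radius_nn_classify(train_X, train_y, T, r):
    # collect all training points within distance r of T
    dists = np.linalg.norm(train_X - T, axis=1)
    inside = train_y[dists <= r]
    # fall back to the global majority class if the ball is empty
    votes = Counter(inside if len(inside) > 0 else train_y)
    return votes.most_common(1)[0][0]

train_X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.0], [5.2, 4.8]])
train_y = np.array([1, 1, 3, 3])
print(radius_nn_classify(train_X, train_y, np.array([5.1, 5.0]), r=1.0))  # -> 3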
Radius distance Near Neighbours Algorithm

Class assigned to T is Class 3


Tree Based Nearest Neighbours Algorithm
• Based on the transactional database
• Mainly used for association rule mining, which aims to identify the occurrence
of one item based on the occurrence of other items.
• Frequent Pattern (FP) tree:
• Steps:
➢Construct 1-frequent itemset, sort them in descending order of frequency
➢Arrange each transaction in the same order of items as that of frequent 1-itemset.
➢Add the transaction to the branch of the FP-tree such that, for the common
prefix part, the node counts of the items already in the FP-tree are incremented,
and new nodes are added to the tree for the remaining part of the transaction
(sketched below).
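A minimal sketch of these steps using a simple nested-dict FP-tree with per-node counts (the transactions and minimum support are illustrative, not the slide's database):

from collections import Counter

def build_fp_tree(transactions, min_support):
    # Step 1: 1-frequent itemset, sorted in descending order of frequency
    counts = Counter(item for t in transactions for item in t)
    frequent = [item for item, c in counts.most_common() if c >= min_support]
    order = {item: rank for rank, item in enumerate(frequent)}
    # tree node: {"count": int, "children": {item: node}}
    root = {"count": 0, "children": {}}
    for t in transactions:
        # Step 2: keep only frequent items, arranged in descending frequency order
        ordered = sorted((i for i in t if i in order), key=order.get)
        # Step 3: increment counts along the common prefix, add new nodes for the rest
        node = root
        for item in ordered:
            child = node["children"].setdefault(item, {"count": 0, "children": {}})
            child["count"] += 1
            node = child
    return root, frequent

transactions = [[12, 1, 16, 4], [12, 1, 16, 8], [12, 1, 5], [12, 16, 4, 8], [12, 1, 9]]
root, frequent = build_fp_tree(transactions, min_support=3)
print(frequent)           # items kept after the support threshold
print(root["children"])   # branches of the FP-tree with node counts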
Tree Based Nearest Neighbours Algorithm

1-frequent itemset :
(12: 5), (1: 4), (16: 4), (4: 3), (5: 3), (8:3),
(9: 3), and (15: 3)
(With minimum support = 3)
(Figure: the transaction database, and the same database with each transaction's
items ordered according to the frequency of items.)
Tree Based Nearest Neighbours Algorithm
Let the test pattern be T = {1, 2, 3, 4, 6, 7, 8, 12, 16}.
After removing the non-frequent items (minimum support = 3), T′ = {1, 4, 8, 12, 16}.
By arranging these items in the order they appear in the FP tree, we get 12, 1, 16, 4, 8.
Starting from the root node of the FP tree (12), we
can compare the remaining items in the test
pattern.
It is observed that the test pattern has the
maximum number of items in common with digit 7.
Therefore, it can be classified as belonging to digit
7.
Branch and Bound Method
• By clustering the data into representative groups with the smallest
possible radius, we can search for the nearest neighbor while avoiding
branches that cannot possibly have a closer neighbor than the current
best value found.
• Each group (cluster) j is represented by its cluster centroid and radius, (μj, rj).
• To decide whether the nearest neighbour of a point T can lie in cluster j, a
lower bound bj on the distance from T to any point of that cluster is computed,
and the search recursively branches to the cluster with the smallest bj until the
nearest neighbour is found or the bound exceeds the current best distance.
Note that bj for a cluster j is obtained by

bj = max(0, d(T, μj) − rj)
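A minimal sketch of the pruning idea (the clusters and points are illustrative, and a full implementation would recurse over a hierarchy of clusters rather than a flat list):

import numpy as np

def bnb_nearest(clusters, T):
    # clusters: list of (centroid, radius, points); prune clusters whose lower bound
    # b_j = max(0, d(T, mu_j) - r_j) cannot beat the best distance found so far
    best_dist, best_point = float("inf"), None
    # visit clusters in order of increasing lower bound
    order = sorted(clusters, key=lambda c: max(0.0, np.linalg.norm(T - c[0]) - c[1]))
    for centroid, radius, points in order:
        bound = max(0.0, np.linalg.norm(T - centroid) - radius)
        if bound >= best_dist:
            continue                      # no point in this cluster can be closer
        for x in points:
            d = np.linalg.norm(T - x)
            if d < best_dist:
                best_dist, best_point = d, x
    return best_point, best_dist

c1 = (np.array([0.0, 0.0]), 1.5, [np.array([0.5, 0.5]), np.array([-1.0, 0.2])])
c2 = (np.array([10.0, 10.0]), 2.0, [np.array([9.0, 9.5]), np.array([11.0, 10.0])])
print(bnb_nearest([c1, c2], np.array([0.2, 0.1])))   # only the first cluster is scanned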
Leader clustering
• It is an incremental clustering approach that is commonly used to
cluster large data sets that cannot be accommodated in the main
memory of the machine processing the data.
• It scans the dataset only once.
• Idea: A data point is assigned to an existing nearest cluster if the
point falls within a threshold distance from the representative
(leader) of the cluster; if there is no cluster in the threshold distance
of the point, then a new cluster is initiated with the data point
becoming the leader of the new cluster.
• It is an order-dependent algorithm, i.e., the order in which the data is
presented to the algorithm can affect the resulting clusters.
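A minimal sketch of the single-scan leader algorithm (the data points and threshold are illustrative):

import numpy as np

def leader_clustering(data, threshold):
    # one pass over the data; each cluster is represented by its leader (the first point)
    leaders = []                 # cluster representatives
    assignments = []             # cluster index for each point, in presentation order
    for x in data:
        if leaders:
            dists = [np.linalg.norm(x - L) for L in leaders]
            j = int(np.argmin(dists))
            if dists[j] <= threshold:
                assignments.append(j)     # assign to the nearest existing cluster
                continue
        leaders.append(x)                 # start a new cluster with x as its leader
        assignments.append(len(leaders) - 1)
    return leaders, assignments

data = np.array([[0.0, 0.0], [0.5, 0.4], [3.0, 3.0], [3.2, 2.9], [0.1, 0.2]])
leaders, assignments = leader_clustering(data, threshold=1.5)
print(len(leaders), assignments)   # 2 clusters; presenting the data in another order may change this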
Leader clustering

• The data to be clustered are processed in the order x1, x2, · · · , x18.
• The threshold T is set to 1.5.
• The initial cluster centre (leader) is taken as x1.
• The figure shows the 4 clusters formed, with the leaders marked (L).
KNN Regression

• Let the training data be {(x1, y1), (x2, y2), · · · , (xn, yn)}. The regression model
needs to use this data to find the value of y for a new vector x.
• Idea:
➢Find the k nearest neighbors of x from the n data vectors. Let them be x1, x2, · · · ,
xk.
➢Consider the y values associated with these xi's. Let them be y1, y2, · · · , yk.
➢The predicted value of y is their average, i.e., ŷ = (1/k) Σ_{i=1}^{k} yi
(see the sketch below).
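A minimal sketch of kNN regression (the data are illustrative):

import numpy as np

def knn_regress(train_X, train_y, x, k=3):
    # predict y as the average of the y values of the k nearest neighbours
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return float(np.mean(train_y[nearest]))

train_X = np.array([[1.0], [2.0], [3.0], [10.0]])
train_y = np.array([1.1, 2.1, 2.9, 10.2])
print(knn_regress(train_X, train_y, np.array([2.5]), k=3))  # ~2.03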
Concentration Effect and Fractional Norms
• A major difficulty encountered while using some of the popular
distance measures like the Euclidean distance is that the distance
values, between various pairs of points, may not show much dynamic
range.

• Observe that as the r value in the Minkowski norm keeps decreasing,
the distance between the pair (p, q) keeps increasing.
Concentration Effect and Fractional Norms
• This behaviour prompted researchers to go for fractional norms (r is a
fraction, 0 < r < 1) to increase the dynamic range of the values, or
equivalently to decrease the concentration effect.

• An important observation is that, in the process of improving the
dynamic range of distance values, the fractional norms can also improve
the classification performance.
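A minimal numerical sketch of this behaviour: for the same pair of points, smaller (fractional) r values yield larger Minkowski distances, which tends to widen the spread of pairwise distance values (the vectors are illustrative):

def minkowski(p, q, r):
    # Minkowski distance of order r
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)

p = [0.2, 0.7, 0.1, 0.9]
q = [0.6, 0.1, 0.5, 0.3]
for r in [2.0, 1.0, 0.5, 0.3]:
    print(r, round(minkowski(p, q, r), 3))
# the distance grows as r decreases towards a fraction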
Concentration Effect and Fractional Norms
• Results on the Wisconsin breast-cancer data with different norms show the
increase in accuracy.