
Nearest Neighbor Algorithms

CHAPTER 2: MACHINE LEARNING – THEORY & PRACTICE


Proximity Measures
• Used to quantify the degree of similarity or dissimilarity between two
or more patterns.
• Types:
➢Euclidean distance: This is the most popular distance measure as it is intuitively
appealing. It measures the straight-line distance between two points in a
multidimensional space.
➢Cosine similarity: This measures the cosine of the angle between two vectors
and is often used in text analysis to compare documents.
➢Jaccard similarity: This measures the intersection over union of two sets and
is often used in recommendation systems to compare users’ preferences.
➢Hamming distance: This measures the number of positions at which two
binary strings differ and is often used in error correction codes.
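As a minimal sketch of the four proximity measures above (using NumPy; the small example vectors, sets and strings are illustrative, not data from the slides):

import numpy as np

def euclidean(p, q):
    # straight-line distance between two points
    return np.sqrt(np.sum((p - q) ** 2))

def cosine_similarity(p, q):
    # cosine of the angle between two vectors
    return np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))

def jaccard_similarity(a, b):
    # intersection over union of two sets
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def hamming_distance(p, q):
    # number of positions at which two binary strings differ
    return sum(c1 != c2 for c1, c2 in zip(p, q))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(euclidean(x, y), cosine_similarity(x, y))
print(jaccard_similarity({1, 2, 3}, {2, 3, 4}), hamming_distance("10110", "10011"))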
Distance Measures
• Used to find the dissimilarity between pattern representations.
• Key attributes of distance measures:
➢ Positive reflexivity: d(x,x) = 0
➢ Symmetry: d(x,y) = d(y,x)
➢ Triangular inequality: d(x,y) ≤ d(x,z) + d(z,y)
• Minkowski Distance:

d(p, q) = ( Σ_{k=1}^{L} |p_k − q_k|^r )^(1/r)

where r is a parameter that determines the type of metric being used, and p and
q are L-dimensional vectors.
Different Norms based on “r”:
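For instance, r = 1 gives the Manhattan (L1) norm, r = 2 the Euclidean (L2) norm, and r → ∞ the max (L∞) norm. A minimal sketch of the Minkowski distance for these values (the points are illustrative):

def minkowski(p, q, r):
    # Minkowski distance of order r between two equal-length vectors
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)

p, q = [0.0, 0.0], [3.0, 4.0]
print(minkowski(p, q, 1))                            # 7.0 (Manhattan)
print(minkowski(p, q, 2))                            # 5.0 (Euclidean)
print(max(abs(a - b) for a, b in zip(p, q)))         # 4.0 (max norm, the r -> infinity limit)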
Weighted Distance Measure
• To assign greater importance to certain attributes, a weight can be
applied to their values in the weighted distance metric:

d_w(p, q) = ( Σ_{k=1}^{L} w_k |p_k − q_k|^r )^(1/r)

where w_k represents the weight associated with the kth dimension or
feature (see the sketch below).
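A minimal sketch of the weighted variant (the weight values chosen here are only illustrative):

def weighted_minkowski(p, q, w, r=2):
    # weighted Minkowski distance: a larger w_k makes dimension k more influential
    return sum(wk * abs(a - b) ** r for a, b, wk in zip(p, q, w)) ** (1.0 / r)

print(weighted_minkowski([1, 2], [3, 5], w=[1.0, 1.0]))   # plain Euclidean: ~3.61
print(weighted_minkowski([1, 2], [3, 5], w=[4.0, 1.0]))   # first dimension weighted more: 5.0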
Non-Metric Similarity Functions
• This category includes similarity functions that do not follow the
triangular inequality or symmetry.
• They are commonly used for image or string data, and they are
resistant to outliers or extremely noisy data.
• Example: k-median distance between two vectors.
➢Given x = (x1, x2, · · · , xn) and y = (y1, y2, · · · , yn), the formula for the k-median
distance is d(x, y) = k-median {|x1 − y1|, · · · , |xn − yn|}, where the k-median
operator returns the kth value of the ordered difference vector.
• Example: Cosine similarity between two vectors x and y, cos(x, y) = (x · y) / (‖x‖ ‖y‖).
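A minimal sketch of the k-median distance described above (the vectors and the choice k = 2 are illustrative); note how a single very noisy coordinate does not dominate the result:

def k_median_distance(x, y, k):
    # k-th smallest absolute coordinate-wise difference (1-indexed k)
    diffs = sorted(abs(a - b) for a, b in zip(x, y))
    return diffs[k - 1]

x = [1, 5, 2, 9]
y = [2, 3, 2, 100]   # the last coordinate is a large outlier
print(k_median_distance(x, y, k=2))  # 1 -- the outlier does not affect the result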
Levenshtein distance
• Also known as edit distance, is a measure of the distance between
two strings. It is determined by calculating the minimum number of
mutations needed to transform string s1 into string s2, where a
mutation can be one of three operations: changing a letter, inserting
a letter, or deleting a letter.
• The edit distance can be defined using the following recurrence
relation:
➢d(“ ”, “ ”) = 0, (two empty strings match)
➢d(s, “ ”) = d(“ ”, s) = |s| (the length of s, i.e., the distance from an empty string), and
➢d(s1 + ch1, s2 + ch2) = min [d(s1, s2) + {if ch1 = ch2 then 0 else 1}, d(s1 + ch1,
s2) + 1, d(s1, s2 + ch2) + 1]
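A minimal dynamic-programming sketch of this recurrence:

def levenshtein(s1, s2):
    # dp[i][j] = edit distance between s1[:i] and s2[:j]
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all of s1[:i]
    for j in range(n + 1):
        dp[0][j] = j          # insert all of s2[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1    # change a letter (or match)
            dp[i][j] = min(dp[i - 1][j - 1] + cost,      # substitute / match
                           dp[i - 1][j] + 1,             # delete a letter
                           dp[i][j - 1] + 1)             # insert a letter
    return dp[m][n]

print(levenshtein("kitten", "sitting"))  # 3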
Mutual Neighbourhood Distance (MND)
• In this case, the function used to measure the similarity between two
patterns, x and y, is defined as S(x, y) = f(x, y, ε), where ε denotes the
context, i.e., the surrounding points.
• All other data points are labeled in increasing order of some distance
measure, starting from the nearest neighbor as 1 and ending with the
farthest point as N-1.
• Mutual neighborhood distance (MND) is calculated as MND(x, y) =
NN(x, y) + NN(y, x).
• Note: NN(x, y) denotes the neighbour rank of y with respect to x, i.e., the
position of y in the ordering of x's neighbours described above (1 if y is
x's nearest neighbour).
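A minimal sketch of MND using neighbour ranks under the Euclidean distance (the helper name and the data points are illustrative):

import numpy as np

def neighbour_rank(data, i, j):
    # rank of point j in the ordered list of neighbours of point i (1 = nearest)
    dists = np.linalg.norm(data - data[i], axis=1)
    order = np.argsort(dists)
    order = order[order != i]                 # exclude the point itself
    return int(np.where(order == j)[0][0]) + 1

def mnd(data, i, j):
    # mutual neighbourhood distance: symmetric by construction
    return neighbour_rank(data, i, j) + neighbour_rank(data, j, i)

data = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [1.2, 0.1]])
print(mnd(data, 0, 1))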
Proximity Between Binary Patterns
• Let p and q be two binary strings. Some of the popular proximity
measures on such binary patterns are:
➢Hamming Distance (HD): If pi = qi then we say that p and q match on their ith
bit, else (pi ≠ qi) p and q mismatch on the ith bit. Hamming distance is the
number of mismatching bits out of the l bit locations.
➢Simple Matching Coefficient (SMC):

SMC(p, q) = (M11 + M00) / (M00 + M01 + M10 + M11)

where M01 is the number of bit positions where p is 0 and q is 1,
M10 is the number of bit positions where p is 1 and q is 0,
M00 is the number of bit positions where p is 0 and q is 0, and
M11 is the number of bit positions where p is 1 and q is 1.
➢Jaccard Coefficient (JC):

JC(p, q) = M11 / (M01 + M10 + M11)
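A minimal sketch of these measures on equal-length binary strings (the strings are illustrative):

def binary_proximity(p, q):
    # count the four match/mismatch cases over the bit positions
    m00 = sum(a == '0' and b == '0' for a, b in zip(p, q))
    m01 = sum(a == '0' and b == '1' for a, b in zip(p, q))
    m10 = sum(a == '1' and b == '0' for a, b in zip(p, q))
    m11 = sum(a == '1' and b == '1' for a, b in zip(p, q))
    hd  = m01 + m10                                  # Hamming distance
    smc = (m11 + m00) / (m00 + m01 + m10 + m11)      # Simple Matching Coefficient
    jc  = m11 / (m01 + m10 + m11)                    # Jaccard Coefficient
    return hd, smc, jc

print(binary_proximity("101101", "100111"))  # (2, 0.666..., 0.6)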
Different classification algorithms based on the
distance measures
• Nearest Neighbour Classifier (NNC)
• k-Nearest Neighbour Classifier (kNNC)
• Weighted k-Nearest Neighbour (WkNN)
• Radius distance Near Neighbours
• Tree Based Nearest Neighbours
• Branch and Bound Method
• Leader clustering
• KNN Regression
Nearest Neighbour Classifier (NNC)
• Let X = {(x1, l1), (x2, l2), · · · , (xn, ln)}, where each pattern xi is a vector in
some L-dimensional space and li is its class label.
• Now the nearest neighbor of x (i.e., the test pattern) is given by

NN(x) = argmin_{j} d(x, xj), j = 1, · · · , n

where xj is the jth training pattern and d(x, xj) is the distance between
x and xj.
• Test pattern T(2.1, 0.7) is assigned to class 1,
since its Euclidean distance from x3 is
minimum (i.e., 1 unit)
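A minimal sketch of NNC (the training points below are illustrative, not the exact data from the figure):

import numpy as np

def nnc(train_X, train_y, x):
    # label of the single nearest training pattern under Euclidean distance
    dists = np.linalg.norm(train_X - x, axis=1)
    return train_y[int(np.argmin(dists))]

train_X = np.array([[1.0, 1.0], [2.0, 1.5], [6.0, 6.0], [7.0, 5.5]])
train_y = np.array([1, 1, 2, 2])
print(nnc(train_X, train_y, np.array([2.1, 0.7])))  # -> 1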
k-Nearest Neighbour Classifier
• Similar to the Nearest Neighbour Classifier (NNC) algorithm: we find the
k nearest neighbours of a test pattern x from the training data, and then
assign the majority class label among the k neighbours to x.

• By selecting the majority class label among the k nearest neighbours,
the error in classification can be reduced, especially when the training
patterns are noisy.
• T is assigned to class 2, even though x5 is closest, because x5 is an outlier.
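A minimal sketch of kNNC using a majority vote over the k nearest neighbours (the data are illustrative):

import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    # majority class label among the k nearest training patterns
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

train_X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [6.0, 6.0], [6.5, 5.5]])
train_y = np.array([1, 1, 2, 2, 2])
print(knn_classify(train_X, train_y, np.array([1.2, 1.2]), k=3))  # -> 1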
Weighted k-Nearest Neighbour (WkNN)
• Similar to the kNN algorithm, but it takes into account the distance of
each of the k neighbors from the test point by weighting them
accordingly.
• Each neighbor is associated with a weight wj, which is determined by
the following formula:

wj = (dk − dj) / (dk − d1) if dk ≠ d1, and wj = 1 if dk = d1

Here, j represents the neighbor's index in the list of k nearest
neighbors (ordered so that d1 ≤ d2 ≤ · · · ≤ dk), while dk and dj are the
distances between the test point and the k-th neighbor and the j-th
neighbor, respectively.
Weighted k-Nearest Neighbour (WkNN)
• For example the distances from T to its 5 nearest data points are
d(T, x3) = 1.0; d(T, x14) = 1.01; d(T, x13) = 1.08;
d(T, x5) = 1.08; d(T, x16) = 1.30;
• The weight values will be
w(x3) = 1.00, w(x14) = 0.97, w(x13) = 0.73, w(x5) = 0.73, and w(x16) = 0.
• Summing up for each selected class:
Class 1, to which x3 and x5 belong, sums to 1.73, and
Class 3, to which x14, x13 and x16 belong, sums to 1.70.
Therefore, the point T belongs to Class 1.
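A minimal sketch of WkNN using the weighting rule above, reproducing the distances and class labels from this example:

from collections import defaultdict

def wknn_vote(neighbours, k=5):
    # neighbours: list of (distance, class_label), assumed sorted by distance
    d1, dk = neighbours[0][0], neighbours[k - 1][0]
    scores = defaultdict(float)
    for dj, label in neighbours[:k]:
        w = 1.0 if dk == d1 else (dk - dj) / (dk - d1)   # closer neighbours get larger weights
        scores[label] += w
    return max(scores, key=scores.get), dict(scores)

neighbours = [(1.00, 1), (1.01, 3), (1.08, 3), (1.08, 1), (1.30, 3)]
print(wknn_vote(neighbours))  # -> class 1, with scores {1: ~1.73, 3: ~1.70}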
Radius distance Near Neighbours Algorithm
• This algorithm is an alternative to the kNN algorithm that considers all
the neighbors within a specified distance r of the point of interest.
• Steps:
➢ Given a point T, identify the subset of data points that fall within the radius r
centered at T, denoted by Br(T) = {xi : d(T, xi) ≤ r}.
➢If Br(T) is empty, output the majority class of the entire dataset.
➢If Br(T) is not empty, output the majority class of the data points within Br(T).
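A minimal sketch of the radius near neighbours rule above (the data and radius are illustrative):

import numpy as np
from collections import Counter

def radius_nn_classify(train_X, train_y, T, r):
    # collect all training points within distance r of T
    dists = np.linalg.norm(train_X - T, axis=1)
    inside = train_y[dists <= r]
    # fall back to the global majority class if the ball is empty
    votes = Counter(inside if len(inside) > 0 else train_y)
    return votes.most_common(1)[0][0]

train_X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.0], [5.2, 4.8]])
train_y = np.array([1, 1, 3, 3])
print(radius_nn_classify(train_X, train_y, np.array([5.1, 5.0]), r=1.0))  # -> 3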
Radius distance Near Neighbours Algorithm

Class assigned to T is Class 3


Tree Based Nearest Neighbours Algorithm
• Based on the transactional database
• Mainly used for association rule mining, which aims to identify the occurrence
of one item based on the occurrence of other items.
• Frequent Pattern (FP) tree:
• Steps:
➢Construct 1-frequent itemset, sort them in descending order of frequency
➢Arrange each transaction in the same order of items as that of frequent 1-itemset.
➢Add the transaction to the branch of the FP-tree such that, for the common
prefix part, the node counts of the items already in the FP-tree are incremented,
and new nodes are added to the tree for the remaining part of the transaction
(sketched below).
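A minimal sketch of these steps using a simple nested-dict FP-tree with per-node counts (the transactions and minimum support are illustrative, not the slide's database):

from collections import Counter

def build_fp_tree(transactions, min_support):
    # Step 1: 1-frequent itemset, sorted in descending order of frequency
    counts = Counter(item for t in transactions for item in t)
    frequent = [item for item, c in counts.most_common() if c >= min_support]
    order = {item: rank for rank, item in enumerate(frequent)}
    # tree node: {"count": int, "children": {item: node}}
    root = {"count": 0, "children": {}}
    for t in transactions:
        # Step 2: keep only frequent items, arranged in descending frequency order
        ordered = sorted((i for i in t if i in order), key=order.get)
        # Step 3: increment counts along the common prefix, add new nodes for the rest
        node = root
        for item in ordered:
            child = node["children"].setdefault(item, {"count": 0, "children": {}})
            child["count"] += 1
            node = child
    return root, frequent

transactions = [[12, 1, 16, 4], [12, 1, 16, 8], [12, 1, 5], [12, 16, 4, 8], [12, 1, 9]]
root, frequent = build_fp_tree(transactions, min_support=3)
print(frequent)           # items kept after the support threshold
print(root["children"])   # branches of the FP-tree with node counts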
Tree Based Nearest Neighbours Algorithm

1-frequent itemset :
(12: 5), (1: 4), (16: 4), (4: 3), (5: 3), (8:3),
(9: 3), and (15: 3)
(With minimum support = 3)
(Figure: the transaction database, and the same database with each transaction's
items ordered according to the frequency of items.)
Tree Based Nearest Neighbours Algorithm
Let the test pattern be T = {1, 2, 3, 4, 6, 7, 8, 12, 16}.
After removing the non-frequent items (minimum support = 3), T′ = {1, 4, 8, 12, 16}.
By arranging these items in the order they appear in the FP tree, we get 12, 1, 16, 4, 8.
Starting from the root node of the FP tree (12), we
can compare the remaining items in the test
pattern.
It is observed that the test pattern has the
maximum number of items in common with digit 7.
Therefore, it can be classified as belonging to digit
7.
Branch and Bound Method
• By clustering the data into representative groups with the smallest
possible radius, we can search for the nearest neighbor while avoiding
branches that cannot possibly have a closer neighbor than the current
best value found.
• Each group (cluster) j is represented by its cluster centroid and radius, (μj, rj).
• To decide whether the nearest neighbour of a point T can lie in cluster j, a
lower bound bj on the distance from T to any point of that cluster is computed,
and the search recursively branches to the cluster with the smallest bj until the
nearest neighbour is found or the bound exceeds the current best distance.
Note that bj for a cluster j is obtained by

bj = max(0, d(T, μj) − rj)
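A minimal sketch of the pruning idea (the clusters and points are illustrative, and a full implementation would recurse over a hierarchy of clusters rather than a flat list):

import numpy as np

def bnb_nearest(clusters, T):
    # clusters: list of (centroid, radius, points); prune clusters whose lower bound
    # b_j = max(0, d(T, mu_j) - r_j) cannot beat the best distance found so far
    best_dist, best_point = float("inf"), None
    # visit clusters in order of increasing lower bound
    order = sorted(clusters, key=lambda c: max(0.0, np.linalg.norm(T - c[0]) - c[1]))
    for centroid, radius, points in order:
        bound = max(0.0, np.linalg.norm(T - centroid) - radius)
        if bound >= best_dist:
            continue                      # no point in this cluster can be closer
        for x in points:
            d = np.linalg.norm(T - x)
            if d < best_dist:
                best_dist, best_point = d, x
    return best_point, best_dist

c1 = (np.array([0.0, 0.0]), 1.5, [np.array([0.5, 0.5]), np.array([-1.0, 0.2])])
c2 = (np.array([10.0, 10.0]), 2.0, [np.array([9.0, 9.5]), np.array([11.0, 10.0])])
print(bnb_nearest([c1, c2], np.array([0.2, 0.1])))   # only the first cluster is scanned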
Leader clustering
• It is an incremental clustering approach that is commonly used to
cluster large data sets that cannot be accommodated in the main
memory of the machine processing the data.
• It scans the dataset only once.
• Idea: A data point is assigned to an existing nearest cluster if the
point falls within a threshold distance from the representative
(leader) of the cluster; if there is no cluster in the threshold distance
of the point, then a new cluster is initiated with the data point
becoming the leader of the new cluster.
• It is an order-dependent algorithm, i.e., the order in which the data is
presented to the algorithm can affect the resulting clusters.
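A minimal sketch of the single-scan leader algorithm (the data points and threshold are illustrative):

import numpy as np

def leader_clustering(data, threshold):
    # one pass over the data; each cluster is represented by its leader (the first point)
    leaders = []                 # cluster representatives
    assignments = []             # cluster index for each point, in presentation order
    for x in data:
        if leaders:
            dists = [np.linalg.norm(x - L) for L in leaders]
            j = int(np.argmin(dists))
            if dists[j] <= threshold:
                assignments.append(j)     # assign to the nearest existing cluster
                continue
        leaders.append(x)                 # start a new cluster with x as its leader
        assignments.append(len(leaders) - 1)
    return leaders, assignments

data = np.array([[0.0, 0.0], [0.5, 0.4], [3.0, 3.0], [3.2, 2.9], [0.1, 0.2]])
leaders, assignments = leader_clustering(data, threshold=1.5)
print(len(leaders), assignments)   # 2 clusters; presenting the data in another order may change this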
Leader clustering

• The data to be clustered are processed in the order x1, x2, · · · , x18.
• The threshold T is set to 1.5.
• The initial cluster centre (leader) is taken as x1.
• The figure shows the 4 clusters formed, with the leaders marked (L).
KNN Regression

• Let the training data be {(x1, y1), (x2, y2), · · · , (xn, yn)}. The regression model
needs to use this data to find the value of y for a new vector x.
• Idea:
➢Find the k nearest neighbors of x from the n data vectors. Let them be x1, x2, · · · ,
xk.
➢Consider the y values associated with these xi's. Let them be y1, y2, · · · , yk.
➢The predicted value of y is their average, i.e., ŷ = (1/k) Σ_{i=1}^{k} yi
(see the sketch below).
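A minimal sketch of kNN regression (the data are illustrative):

import numpy as np

def knn_regress(train_X, train_y, x, k=3):
    # predict y as the average of the y values of the k nearest neighbours
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return float(np.mean(train_y[nearest]))

train_X = np.array([[1.0], [2.0], [3.0], [10.0]])
train_y = np.array([1.1, 2.1, 2.9, 10.2])
print(knn_regress(train_X, train_y, np.array([2.5]), k=3))  # ~2.03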
Concentration Effect and Fractional Norms
• A major difficulty encountered while using some of the popular
distance measures like the Euclidean distance is that the distance
values, between various pairs of points, may not show much dynamic
range.

• Observe that as the r value in the Minkowski norm keeps decreasing,
the distance between the pair (p, q) keeps increasing.
Concentration Effect and Fractional Norms
• This behaviour prompted researchers to go for fractional norms (r is a
fraction, 0 < r < 1) to increase the dynamic range of the values, or
equivalently to decrease the concentration effect.

• An important observation is that, in the process of improving the
dynamic range of distance values, the fractional norms can also improve
the classification performance.
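A minimal numerical sketch of this behaviour: for the same pair of points, smaller (fractional) r values yield larger Minkowski distances, which tends to widen the spread of pairwise distance values (the vectors are illustrative):

def minkowski(p, q, r):
    # Minkowski distance of order r
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)

p = [0.2, 0.7, 0.1, 0.9]
q = [0.6, 0.1, 0.5, 0.3]
for r in [2.0, 1.0, 0.5, 0.3]:
    print(r, round(minkowski(p, q, r), 3))
# the distance grows as r decreases towards a fraction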
Concentration Effect and Fractional Norms
• Results on the Wisconsin breast-cancer data with different norms show the
increase in accuracy.