
LU1 - Distance Based Models

The document discusses various distance metrics, primarily focusing on Minkowski distance and its special cases: Manhattan, Euclidean, and Chebyshev distances. It also introduces concepts like Hamming distance, distance metrics properties, and the triangle inequality. Additionally, it explains elliptical distances and the Mahalanobis distance, which uses covariance for shape estimation.

Uploaded by

Abinaya Devi C

Distance based models

• Minkowski distance is a generalized distance metric. Here
generalized means that we can vary the parameter p in its formula to calculate the
distance between two data points in different ways.
• Three values of p give three well-known distances:
• p = 1, Manhattan Distance
• p = 2, Euclidean Distance
• p = ∞, Chebyshev Distance
Minkowski distance
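• If X = R^d, the Minkowski distance of order p > 0 is defined as
$\mathrm{Dis}_p(x, y) = \left( \sum_{j=1}^{d} |x_j - y_j|^p \right)^{1/p} = \|x - y\|_p$
• Here $\|z\|_p = \left( \sum_{j=1}^{d} |z_j|^p \right)^{1/p}$ is the p-norm of the vector z
• The 2-norm refers to the familiar Euclidean distance
• The 1-norm denotes Manhattan distance, also called cityblock distance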
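The three special cases can be computed by one function. The following is a minimal sketch, assuming points are given as equal-length sequences of numbers; the function name minkowski and the sample points are illustrative choices, not part of the original slides.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p between equal-length sequences x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        # Limit p -> infinity: the largest coordinate difference (Chebyshev).
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

x, y = (0, 0), (3, 4)
print(minkowski(x, y, 1))         # Manhattan: 7.0
print(minkowski(x, y, 2))         # Euclidean: 5.0
print(minkowski(x, y, math.inf))  # Chebyshev: 4
```

The p = ∞ case is handled separately because it is a limit rather than a value that can be plugged into the sum.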
Manhattan Distance
• Manhattan distance is used when we need to calculate the distance
between two data points along a grid-like path
• Manhattan distance is also known as Taxicab Geometry or City Block
Distance
• For points x = (x1, …, xn) and y = (y1, …, yn), the distance d is calculated as
• $d = |x_1 - y_1| + |x_2 - y_2| + |x_3 - y_3| + \dots + |x_n - y_n|$
• The L1 norm is the distance you have to travel from the origin (0,0) to
the destination (3,4) when moving only along the grid
• The L1 norm is calculated by $\|x\|_1 = |3| + |4| = 7$
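The grid-path interpretation can be checked numerically. The sketch below assumes 2-D integer grid points; the helper path_length and the two example routes are hypothetical and only illustrate that every route that never moves backwards has the same length as the L1 norm.

```python
def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def path_length(path):
    """Total Manhattan length of a route given as a list of grid points."""
    return sum(manhattan(p, q) for p, q in zip(path, path[1:]))

origin, dest = (0, 0), (3, 4)
right_then_up = [(0, 0), (3, 0), (3, 4)]
staircase = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 3), (3, 3), (3, 4)]

print(manhattan(origin, dest))     # 7
print(path_length(right_then_up))  # 7
print(path_length(staircase))      # 7
```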
Euclidean Distance
• It is calculated using the Minkowski distance formula by setting the
value of p to 2.
• This turns the distance formula into $d = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_n - y_n)^2}$
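As a small worked example, the p = 2 case can be written directly and compared with the standard library. This is a sketch under the assumption of Python 3.8 or later, where math.dist is available; the function name euclidean is illustrative.

```python
import math

def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(math.dist((0, 0), (3, 4)))  # 5.0, the standard-library equivalent
```

For the same pair of points the Manhattan distance is 7, so the straight-line route is shorter than the grid route.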
L0 norm
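• The 0-norm (or L0 norm) counts the number of non-zero elements
in a vector.
• The corresponding distance then counts the number of positions in
which vectors x and y differ. This is not strictly a Minkowski distance;
however, we can define it as
$\mathrm{Dis}_0(x, y) = \sum_{j=1}^{d} (x_j - y_j)^0 = \sum_{j=1}^{d} I[x_j \neq y_j]$,
with the convention that $z^0 = 0$ for $z = 0$ and $z^0 = 1$ otherwise.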
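A minimal sketch of the counting interpretation, assuming equal-length vectors; the names l0_norm and l0_distance and the sample vectors are illustrative.

```python
def l0_norm(v):
    """Number of non-zero elements in v."""
    return sum(1 for a in v if a != 0)

def l0_distance(x, y):
    """Number of positions in which x and y differ."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(1 for a, b in zip(x, y) if a != b)

x = [1, 0, 1, 1, 0]
y = [1, 1, 0, 1, 0]
print(l0_norm(x))         # 3
print(l0_distance(x, y))  # 2 (positions 1 and 2 differ)
```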


Hamming distance
• If x and y are binary strings, the distance that counts differing positions is called the
Hamming distance: the number of bits that need to be flipped to
change x into y. For non-binary strings of unequal length this can be
generalised to the notion of edit distance or Levenshtein distance
• Definition 8.2 (Distance metric). Given an instance space X, a distance
metric is a function Dis : X × X → R such that for any x, y, z ∈ X:
• 1. distances between a point and itself are zero: Dis(x, x) = 0;
• 2. all other distances are larger than zero: if x ≠ y then Dis(x, y) > 0;
• 3. distances are symmetric: Dis(y, x) = Dis(x, y);
• 4. detours cannot shorten the distance: Dis(x, z) ≤ Dis(x, y) + Dis(y, z).
This last condition is called the triangle inequality.
• If the second condition is weakened to a non-strict inequality (i.e., Dis(x, y)
may be zero even if x ≠ y), the function Dis is called a pseudo-metric.
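The four conditions can be tested on a finite sample of points. The sketch below is only a spot check: it can expose a violation but cannot prove that a function is a metric. The name check_metric, the tolerance, the sample points, and the squared Euclidean counter-example are all assumptions made for illustration.

```python
import itertools
import math

def check_metric(dis, points, tol=1e-12):
    """Return the names of the metric conditions that dis violates on points."""
    violated = set()
    for x in points:
        if abs(dis(x, x)) > tol:
            violated.add("identity")
    for x, y in itertools.permutations(points, 2):
        if x != y and dis(x, y) <= tol:
            violated.add("positivity")
        if abs(dis(x, y) - dis(y, x)) > tol:
            violated.add("symmetry")
    for x, y, z in itertools.permutations(points, 3):
        if dis(x, z) > dis(x, y) + dis(y, z) + tol:
            violated.add("triangle inequality")
    return violated

def squared_euclidean(x, y):
    """Squared Euclidean distance, used here as a non-metric example."""
    return math.dist(x, y) ** 2

points = [(0, 0), (1, 0), (2, 0), (1, 1)]
print(check_metric(math.dist, points))          # set()
print(check_metric(squared_euclidean, points))  # {'triangle inequality'}
```

Squared Euclidean distance fails on the collinear points (0,0), (1,0), (2,0): going direct costs 4, while the detour costs 1 + 1 = 2.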
• The green circle connects the points that are at the same Euclidean distance (i.e., Minkowski
distance of order p = 2) from the origin as A.
• The orange circle shows that B and C are equidistant from A.
• The red circle demonstrates that C is closer to the origin than B, which conforms
to the triangle inequality
• The triangle inequality dictates that the distance from the origin to C is no more than the sum of the
distances from the origin to A (Dis(O,A)) and from A to C (Dis(A, C)).
• B is at the same distance from A as C, regardless of the distance measure used; so Dis(O,A)+Dis(A,C)
is equal to the distance from the origin to B.
• So, if we draw a circle around the origin through B, the triangle inequality dictates that C not be
outside that circle.
• B is the only point where the circles around the origin and around A intersect, so everywhere else
the triangle inequality is a strict inequality.
• With Manhattan distance (p = 1), B and C are equally close to the origin and also
equidistant from A.
• Travelling via A to C is then no longer a detour, but just one of the many
shortest routes.
• However, if we now decrease p further, we see that C ends up outside the red shape,
and is thus further away than B when seen from the origin, whereas of course the sum
of the distances from the origin to A and from A to C is still equal to the distance
from the origin to B. At this point, our intuition breaks down:
• Minkowski distances with p < 1 are simply not very useful as distances since they all
violate the triangle inequality.
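The breakdown for p < 1 is easy to reproduce numerically. In this sketch the points o, u and v are chosen for illustration and are not the points A, B and C of the figure; the function name minkowski is likewise an assumption.

```python
def minkowski(x, y, p):
    """Minkowski 'distance' of order p; for p < 1 it is no longer a metric."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

o, u, v = (0, 0), (1, 0), (1, 1)
for p in (2, 1, 0.5):
    direct = minkowski(o, v, p)
    detour = minkowski(o, u, p) + minkowski(u, v, p)
    # The triangle inequality requires direct <= detour.
    print(p, direct, detour, direct <= detour)
```

For p = 2 the direct route is shorter than the detour, for p = 1 the two are equal, and for p = 0.5 the direct "distance" is 4.0 while the detour via u is only 2.0, which violates the triangle inequality.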
Elliptical Distances
• It is often more realistic to use an ellipse rather than a circle to identify points that can be
reached in a fixed amount of time, with the major axis of the ellipse indicating
directions that can be traversed at larger speed
• Mathematically, while hyper-spheres (circles in d ≥ 2 dimensions) of radius r can be
defined by the equation $x^T x = r^2$, hyper-ellipses are defined by $x^T M x = r^2$
• where M is a matrix describing the appropriate rotation and scaling.

• The shape of the ellipse is often estimated from data as the inverse of the covariance matrix:
$M = \Sigma^{-1}$.
• This leads to the definition of the Mahalanobis distance:
$\mathrm{Dis}_M(x, y \mid \Sigma) = \sqrt{(x - y)^T \Sigma^{-1} (x - y)}$
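A minimal sketch of the Mahalanobis distance, restricted to two dimensions so that the covariance matrix can be inverted by hand without external libraries. The function names, the use of the sample covariance (division by n − 1), and the example matrices are assumptions made for illustration.

```python
import math

def covariance_2d(points):
    """Sample covariance matrix [[sxx, sxy], [sxy, syy]] of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def mahalanobis(x, y, cov):
    """sqrt((x - y)^T cov^{-1} (x - y)) for 2-D points x and y."""
    m = inverse_2x2(cov)
    dx, dy = x[0] - y[0], x[1] - y[1]
    q = dx * (m[0][0] * dx + m[0][1] * dy) + dy * (m[1][0] * dx + m[1][1] * dy)
    return math.sqrt(q)

identity = [[1, 0], [0, 1]]
stretched = [[4, 0], [0, 1]]  # variance 4 along the first axis, 1 along the second

print(mahalanobis((0, 0), (3, 4), identity))   # 5.0, same as Euclidean
print(mahalanobis((0, 0), (2, 0), stretched))  # 1.0
print(mahalanobis((0, 0), (0, 2), stretched))  # 2.0
```

With the identity covariance the result reduces to Euclidean distance. With the stretched covariance, two points at the same Euclidean distance from the origin get different Mahalanobis distances, because movement along the high-variance axis counts for less.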
