Distance and Normalization

The document calculates the Euclidean and Manhattan distances between points A(3,4) and B(5,6), resulting in approximately 2.83 and 4, respectively. It also performs normalization using Z-Score, Min-Max, and Decimal Scaling methods, yielding specific normalized values for both points. Additionally, it discusses finding the best-fitted regression line for a set of points and describes Equal-Frequency and Equal-Width binning techniques for another data set.

Uploaded by

Sanya Uppal

Distance and Normalization

Given two points in 2D space:

A = (3,4)
B = (5,6)
Find Euclidean Distance and Manhattan Distance

Perform Normalization:
i) Z-Score
ii) Min-Max
iii) Decimal scaling by a factor of 100

Euclidean Distance:

The Euclidean distance between two points $A(x_1, y_1)$ and $B(x_2, y_2)$ is given by the formula:

$$\text{Euclidean Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

For points A = (3, 4) and B = (5, 6):

$$\text{Euclidean Distance} = \sqrt{(5 - 3)^2 + (6 - 4)^2} = \sqrt{2^2 + 2^2} = \sqrt{4 + 4} = \sqrt{8} \approx 2.83$$

Manhattan Distance:

The Manhattan distance between two points is the sum of the absolute differences of their coordinates:

$$\text{Manhattan Distance} = |x_2 - x_1| + |y_2 - y_1|$$

For points A = (3, 4) and B = (5, 6):

$$\text{Manhattan Distance} = |5 - 3| + |6 - 4| = 2 + 2 = 4$$
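Both distance calculations can be checked with a short Python sketch. The function names `euclidean_distance` and `manhattan_distance` are illustrative choices and do not come from the original worked solution.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))


def manhattan_distance(p, q):
    """Sum of the absolute differences of the coordinates."""
    return sum(abs(b - a) for a, b in zip(p, q))


A = (3, 4)
B = (5, 6)

print(round(euclidean_distance(A, B), 2))  # 2.83
print(manhattan_distance(A, B))            # 4
```

Because both functions iterate over paired coordinates, the same sketch should also work for points with more than two dimensions.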

Normalization:

Let’s now perform the requested normalization methods on the points.

i) Z-Score Normalization:

The Z-score normalization is calculated using:

$$Z = \frac{x - \mu}{\sigma}$$

Where:

x is the data point
μ is the mean
σ is the standard deviation

Let’s calculate for both dimensions (x and y):


For x-coordinates (3 and 5), the mean ($\mu_x$) and standard deviation ($\sigma_x$) are:

$$\mu_x = \frac{3 + 5}{2} = 4$$

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 1/5
$$\sigma_x = \sqrt{\frac{(3 - 4)^2 + (5 - 4)^2}{2}} = \sqrt{\frac{1 + 1}{2}} = \sqrt{1} = 1$$

For y-coordinates (4 and 6), the mean ($\mu_y$) and standard deviation ($\sigma_y$) are:

$$\mu_y = \frac{4 + 6}{2} = 5$$

$$\sigma_y = \sqrt{\frac{(4 - 5)^2 + (6 - 5)^2}{2}} = \sqrt{\frac{1 + 1}{2}} = \sqrt{1} = 1$$

Now applying the Z-score formula for each point:


For point A = (3, 4):

$$Z_x = \frac{3 - 4}{1} = -1, \quad Z_y = \frac{4 - 5}{1} = -1$$

So, $A_{\text{z-score}} = (-1, -1)$

For point B = (5, 6):

$$Z_x = \frac{5 - 4}{1} = 1, \quad Z_y = \frac{6 - 5}{1} = 1$$

So, $B_{\text{z-score}} = (1, 1)$
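The Z-score steps can be reproduced with Python's standard `statistics` module. This is a minimal sketch: `z_scores` is a hypothetical helper name, and it uses the population standard deviation (dividing by n), which is what the worked calculation does. Using the sample standard deviation (dividing by n − 1) would give σ = √2 and therefore different scores.

```python
import statistics


def z_scores(values):
    """Z-score each value using the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # divides by n, matching the hand calculation
    return [(v - mu) / sigma for v in values]


xs = [3, 5]  # x-coordinates of A and B
ys = [4, 6]  # y-coordinates of A and B

zx = z_scores(xs)
zy = z_scores(ys)

A_z = (zx[0], zy[0])
B_z = (zx[1], zy[1])
print(A_z, B_z)  # (-1.0, -1.0) (1.0, 1.0)
```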

ii) Min-Max Normalization:

The Min-Max normalization is calculated as:

$$\text{Min-Max}(x) = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

For the x-coordinates (3 and 5):

$$x_{\min} = 3, \quad x_{\max} = 5$$

For point A = (3, 4):

$$\text{Min-Max}(x) = \frac{3 - 3}{5 - 3} = 0$$

For point B = (5, 6):

$$\text{Min-Max}(x) = \frac{5 - 3}{5 - 3} = 1$$

For the y-coordinates (4 and 6):

$$y_{\min} = 4, \quad y_{\max} = 6$$

For point A = (3, 4):

$$\text{Min-Max}(y) = \frac{4 - 4}{6 - 4} = 0$$

For point B = (5, 6):

$$\text{Min-Max}(y) = \frac{6 - 4}{6 - 4} = 1$$

So the Min-Max normalized points are:

$A_{\text{min-max}} = (0, 0)$

$B_{\text{min-max}} = (1, 1)$
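A small Python sketch of the same Min-Max calculation follows. The helper name `min_max` is illustrative, and the check for identical values is an added assumption, since the formula divides by zero when the maximum equals the minimum.

```python
def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # The formula is undefined when every value is the same.
        raise ValueError("min-max scaling needs at least two distinct values")
    return [(v - lo) / (hi - lo) for v in values]


xs = [3, 5]  # x-coordinates of A and B
ys = [4, 6]  # y-coordinates of A and B

scaled_x = min_max(xs)
scaled_y = min_max(ys)

print((scaled_x[0], scaled_y[0]))  # A -> (0.0, 0.0)
print((scaled_x[1], scaled_y[1]))  # B -> (1.0, 1.0)
```

With only two points per dimension, the result is always 0 for the smaller value and 1 for the larger one, which is why both methods of checking agree so neatly here.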

iii) Decimal Scaling Normalization (by a factor of 100):

Decimal scaling normalization is calculated by:

$$\text{Decimal Scaling}(x) = \frac{x}{10^k}$$

Where k is the smallest integer such that $\max(|x_{\min}|, |x_{\max}|) < 10^k$.
For points A = (3, 4) and B = (5, 6), every value is below 10, so k = 1 would already satisfy this condition; the problem, however, specifies a factor of 100 (k = 2):
For point A = (3, 4):

$$A_{\text{decimal scaling}} = \left(\frac{3}{100}, \frac{4}{100}\right) = (0.03, 0.04)$$

For point B = (5, 6):


$$B_{\text{decimal scaling}} = \left(\frac{5}{100}, \frac{6}{100}\right) = (0.05, 0.06)$$
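The decimal scaling rule can be sketched in Python as follows. Both function names are hypothetical. `smallest_k` applies the general rule for choosing k, while `decimal_scale` takes k as a parameter so that the factor of 100 requested in the problem (k = 2) can be applied directly.

```python
def smallest_k(values):
    """Smallest non-negative integer k such that max(|v|) < 10**k."""
    largest = max(abs(v) for v in values)
    k = 0
    while largest >= 10 ** k:
        k += 1
    return k


def decimal_scale(values, k):
    """Divide every value by 10**k."""
    return [v / 10 ** k for v in values]


values = [3, 4, 5, 6]  # all coordinates of A and B

print(smallest_k(values))        # 1
print(decimal_scale(values, 2))  # [0.03, 0.04, 0.05, 0.06]
```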
​ ​ ​

Summary:

1. Euclidean Distance: ≈ 2.83


2. Manhattan Distance: 4

Normalization:

Z-Score:
$A_{\text{z-score}} = (-1, -1)$
$B_{\text{z-score}} = (1, 1)$

Min-Max:
$A_{\text{min-max}} = (0, 0)$
$B_{\text{min-max}} = (1, 1)$

Decimal Scaling:
$A_{\text{decimal scaling}} = (0.03, 0.04)$
$B_{\text{decimal scaling}} = (0.05, 0.06)$

{1,2,3,4,5,6,7,8}

Find best-fitted regression line:


(1,4), (2,8), (3,12), (4,16), (5,20), (6,24)

To find the best-fitted regression line for the given points (1, 4), (2, 8), (3, 12), (4, 16), (5, 20), (6, 24), we can use the
linear regression formula:

y = mx + b

Where:

m is the slope of the line


b is the y-intercept

We can calculate the slope m and y-intercept b using the formulas:

1. Slope (m):

$$m = \frac{n\left(\sum x_i y_i\right) - \left(\sum x_i\right)\left(\sum y_i\right)}{n\left(\sum x_i^2\right) - \left(\sum x_i\right)^2}$$

2. Intercept (b):

$$b = \frac{\sum y_i - m \sum x_i}{n}$$

Where n is the number of data points, and $x_i$ and $y_i$ are the coordinates of the points.

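For the given points, n = 6, $\sum x_i = 21$, $\sum y_i = 84$, $\sum x_i y_i = 364$, and $\sum x_i^2 = 91$, so:

$$m = \frac{6(364) - (21)(84)}{6(91) - 21^2} = \frac{2184 - 1764}{546 - 441} = \frac{420}{105} = 4, \qquad b = \frac{84 - 4(21)}{6} = 0$$

The best-fitted regression line for the given points is: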

y = 4x + 0

Or simply:

y = 4x

This indicates that the slope of the line is 4, and the y-intercept is 0. ​
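The least-squares slope and intercept can be verified with a short Python sketch. The function name `fit_line` is an illustrative choice; it implements the summation formulas for m and b directly rather than calling a library routine.

```python
def fit_line(points):
    """Least-squares slope m and intercept b for a list of (x, y) pairs."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)

    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


points = [(1, 4), (2, 8), (3, 12), (4, 16), (5, 20), (6, 24)]
m, b = fit_line(points)
print(m, b)  # 4.0 0.0
```

Every point in this data set satisfies y = 4x exactly, so the fitted line passes through all of them with zero residual.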

Data Points: {5,10,12,20,25,32,64,78,81}

Equal-Frequency
Equal-Width

To perform Equal-Frequency and Equal-Width binning on the given data points {5, 10, 12, 20, 25, 32, 64, 78, 81}, let's first
define what these methods mean:

1. Equal-Frequency Binning:
Divide the data into bins such that each bin has the same number of data points.
The number of bins is typically chosen based on the number of data points.
2. Equal-Width Binning:
Divide the range of data values into equal-width intervals.
The number of bins is specified, and the width of each bin is calculated by dividing the total range of the data by
the number of bins.

Let’s determine:

Number of bins for each method (we’ll use 3 bins for illustration).
Perform both binning techniques.

Here are the results of the binning methods:

Equal-Frequency Binning:

Bin 1: [5, 10, 12]


Bin 2: [20, 25, 32]
Bin 3: [64, 78, 81]

Each bin contains 3 data points, and the data is split accordingly.
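Equal-frequency binning can be sketched in Python as below. The function name `equal_frequency_bins` is hypothetical, and the handling of leftover points (giving one extra point to the earliest bins when the count does not divide evenly) is an assumption that this data set does not exercise.

```python
def equal_frequency_bins(values, n_bins):
    """Split the sorted values into n_bins groups of (nearly) equal size."""
    ordered = sorted(values)
    size, remainder = divmod(len(ordered), n_bins)

    bins = []
    start = 0
    for i in range(n_bins):
        # Assumption: the first `remainder` bins each take one extra point.
        end = start + size + (1 if i < remainder else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


data = [5, 10, 12, 20, 25, 32, 64, 78, 81]
print(equal_frequency_bins(data, 3))
# [[5, 10, 12], [20, 25, 32], [64, 78, 81]]
```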

Equal-Width Binning:

Bin 1: [5, 10, 12, 20, 25] (Range: 5 to 30.33)

Bin 2: [32] (Range: 30.33 to 55.67)
Bin 3: [64, 78, 81] (Range: 55.67 to 81)

In this case, the width of each bin is $\frac{81 - 5}{3} \approx 25.33$, and the data points are grouped based on these ranges.

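Equal-width binning over the range 5 to 81 with 3 bins can be sketched in Python as follows. The function name `equal_width_bins` is hypothetical, and placing the maximum value in the last bin (rather than in a bin of its own) is an assumption about how the upper boundary is treated.

```python
def equal_width_bins(values, n_bins):
    """Group values into n_bins intervals of equal width across [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("equal-width binning needs at least two distinct values")
    width = (hi - lo) / n_bins

    bins = [[] for _ in range(n_bins)]
    for v in sorted(values):
        # Assumption: the maximum value is clamped into the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        bins[index].append(v)
    return width, bins


data = [5, 10, 12, 20, 25, 32, 64, 78, 81]
width, bins = equal_width_bins(data, 3)
print(round(width, 2))  # 25.33
print(bins)             # [[5, 10, 12, 20, 25], [32], [64, 78, 81]]
```

Unlike equal-frequency binning, the bin sizes here are uneven because the data is concentrated at the low end of the range.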