Distance and Normalization
A = (3,4)
B = (5,6)
Find Euclidean Distance and Manhattan Distance
Perform Normalization:
i) Z-Score
ii) Min-Max
iii) Decimal scaling by a factor of 100
Euclidean Distance:
The Euclidean distance between two points A(x₁, y₁) and B(x₂, y₂) is given by the formula:
d(A, B) = √((x₂ − x₁)² + (y₂ − y₁)²) = √((5 − 3)² + (6 − 4)²) = √(4 + 4) = √8 ≈ 2.83
Manhattan Distance:
The Manhattan distance between two points is the sum of the absolute differences of their coordinates:
Manhattan Distance = ∣5 − 3∣ + ∣6 − 4∣ = 2 + 2 = 4
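As a quick check, both distances can be computed with a minimal Python sketch (variable names are my own):

```python
import math

A = (3, 4)
B = (5, 6)

# Euclidean: square root of the sum of squared coordinate differences.
euclidean = math.sqrt((B[0] - A[0]) ** 2 + (B[1] - A[1]) ** 2)
# Manhattan: sum of absolute coordinate differences.
manhattan = abs(B[0] - A[0]) + abs(B[1] - A[1])

print(euclidean)  # 2.8284271247461903 (= √8)
print(manhattan)  # 4
```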
Normalization:
i) Z-Score Normalization:
Z = (x − μ) / σ
Where μ is the mean and σ is the standard deviation of the coordinate values.
For the x-coordinates (3 and 5):
μx = (3 + 5) / 2 = 4
Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 1/5
σx = √(((3 − 4)² + (5 − 4)²) / 2) = √((1 + 1) / 2) = √1 = 1
For the y-coordinates (4 and 6), the mean (μy) and standard deviation (σy) are:
μy = (4 + 6) / 2 = 5
σy = √(((4 − 5)² + (6 − 5)²) / 2) = √1 = 1
For point A = (3, 4):
Zx = (3 − 4) / 1 = −1, Zy = (4 − 5) / 1 = −1
So, Az-score = (−1, −1)
For point B = (5, 6):
Zx = (5 − 4) / 1 = 1, Zy = (6 − 5) / 1 = 1
So, Bz-score = (1, 1)
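The z-score steps above can be verified with a short Python sketch (z_scores is a hypothetical helper; it uses the population standard deviation, dividing by n, as in the worked solution):

```python
def z_scores(values):
    """Z-score normalize using the population mean and standard deviation."""
    n = len(values)
    mu = sum(values) / n
    sigma = (sum((v - mu) ** 2 for v in values) / n) ** 0.5
    return [(v - mu) / sigma for v in values]

xs = z_scores([3, 5])  # x-coordinates of A and B
ys = z_scores([4, 6])  # y-coordinates of A and B
print(list(zip(xs, ys)))  # [(-1.0, -1.0), (1.0, 1.0)]
```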
ii) Min-Max Normalization:
Min-Max(x) = (x − xmin) / (xmax − xmin)
Here xmin = 3, xmax = 5, ymin = 4, ymax = 6.
For point A = (3, 4): ((3 − 3) / 2, (4 − 4) / 2), so Amin-max = (0, 0)
For point B = (5, 6): ((5 − 3) / 2, (6 − 4) / 2), so Bmin-max = (1, 1)
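A minimal Python sketch of the same min-max formula (min_max is a hypothetical helper name):

```python
def min_max(values):
    """Rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

xs = min_max([3, 5])  # x-coordinates of A and B
ys = min_max([4, 6])  # y-coordinates of A and B
print(list(zip(xs, ys)))  # [(0.0, 0.0), (1.0, 1.0)]
```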
iii) Decimal Scaling Normalization (by a factor of 100):
In general, Decimal Scaling(x) = x / 10^k, where k is the smallest integer such that max(∣x∣) < 10^k. Here the problem fixes the scaling factor at 100 (k = 2).
For point A = (3, 4):
Adecimal scaling = (3/100, 4/100) = (0.03, 0.04)
For point B = (5, 6):
Bdecimal scaling = (5/100, 6/100) = (0.05, 0.06)
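Decimal scaling is a one-line transformation; a quick Python sketch (decimal_scale is a hypothetical helper, with k = 2 giving the factor of 100 specified in the problem):

```python
def decimal_scale(values, k=2):
    """Divide each value by 10**k (k = 2 means a factor of 100)."""
    return [v / 10 ** k for v in values]

print(decimal_scale([3, 4]))  # A -> [0.03, 0.04]
print(decimal_scale([5, 6]))  # B -> [0.05, 0.06]
```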
Summary:
Euclidean Distance = √8 ≈ 2.83
Manhattan Distance = 4
Normalization:
Z-Score:
Az-score = (−1, −1)
Bz-score = (1, 1)
Min-Max:
Amin-max = (0, 0)
Bmin-max = (1, 1)
Decimal Scaling:
Adecimal scaling = (0.03, 0.04)
Bdecimal scaling = (0.05, 0.06)
To find the best-fitted regression line for the given points (1, 4), (2, 8), (3, 12), (4, 16), (5, 20), (6, 24), we can use the
linear regression formula:
y = mx + b
Where:
1. Slope (m):
m = [n(∑xᵢyᵢ) − (∑xᵢ)(∑yᵢ)] / [n(∑xᵢ²) − (∑xᵢ)²]
2. Intercept (b):
b = (∑yᵢ − m∑xᵢ) / n
Where n is the number of data points, xi and yi are the coordinates of the points.
Let’s calculate the slope and intercept for the given points. Here n = 6, ∑xᵢ = 21, ∑yᵢ = 84, ∑xᵢyᵢ = 364, and ∑xᵢ² = 91, so:
m = (6 · 364 − 21 · 84) / (6 · 91 − 21²) = 420 / 105 = 4
b = (84 − 4 · 21) / 6 = 0
The best-fitted regression line for the given points is:
y = 4x + 0
Or simply:
y = 4x
This indicates that the slope of the line is 4, and the y-intercept is 0.
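The same least-squares formulas can be checked with a short Python sketch (fit_line is a hypothetical helper name):

```python
def fit_line(points):
    """Ordinary least squares for y = m*x + b over (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = (sy - m * sx) / n
    return m, b

points = [(1, 4), (2, 8), (3, 12), (4, 16), (5, 20), (6, 24)]
print(fit_line(points))  # (4.0, 0.0), i.e. y = 4x
```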
Equal-Frequency
Equal-Width
To perform Equal-Frequency and Equal-Width binning on the given data points {5, 10, 12, 20, 25, 32, 64, 78, 81}, let's first
define what these methods mean:
1. Equal-Frequency Binning:
Divide the data into bins such that each bin has the same number of data points.
The number of bins is typically chosen based on the number of data points.
2. Equal-Width Binning:
Divide the range of data values into equal-width intervals.
The number of bins is specified, and the width of each bin is calculated by dividing the total range of the data by
the number of bins.
Let’s determine:
Number of bins for each method (we’ll use 3 bins for illustration).
Perform both binning techniques.
Equal-Frequency Binning:
With 9 sorted data points and 3 bins, each bin contains 3 data points:
Bin 1: {5, 10, 12}
Bin 2: {20, 25, 32}
Bin 3: {64, 78, 81}
Equal-Width Binning:
In this case, the width of each bin is (81 − 5) / 3 ≈ 25.33, giving the intervals [5, 30.33), [30.33, 55.67), and [55.67, 81]:
Bin 1: {5, 10, 12, 20, 25}
Bin 2: {32}
Bin 3: {64, 78, 81}
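Both binning schemes can be sketched in Python (variable names are my own; the last equal-width interval is treated as closed so the maximum value 81 lands in the final bin):

```python
data = [5, 10, 12, 20, 25, 32, 64, 78, 81]
k = 3  # number of bins

# Equal-frequency: sort, then slice into runs of equal size.
sorted_data = sorted(data)
size = len(sorted_data) // k
eq_freq = [sorted_data[i * size:(i + 1) * size] for i in range(k)]
print(eq_freq)  # [[5, 10, 12], [20, 25, 32], [64, 78, 81]]

# Equal-width: split [min, max] into k intervals of equal width.
width = (max(data) - min(data)) / k  # (81 - 5) / 3 ≈ 25.33
eq_width = [[] for _ in range(k)]
for v in data:
    idx = min(int((v - min(data)) / width), k - 1)  # clamp max into last bin
    eq_width[idx].append(v)
print(eq_width)  # [[5, 10, 12, 20, 25], [32], [64, 78, 81]]
```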