
Data Science Tools and Software

Assignment no. 2
Dr. Mohamed Abdelhafeez

Name: Ahmed Hossam El-Din Fawzy Abdel-Aty


ID: 20221449419

1)
Min-Max normalization
The formula for Min-Max normalization is:
X_normalized = (X - X_min) / (X_max - X_min)

For the data [10, 40, 50, 10, 50, 70, 90, 30], X_min = 10 and X_max = 90, so the normalized values are:


[0.0, 0.375, 0.5, 0.0, 0.5, 0.75, 1.0, 0.25]
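As a sanity check, the same computation in plain Python (the data values are taken from the list above):

```python
# Min-Max normalization: rescale each value into [0, 1].
data = [10, 40, 50, 10, 50, 70, 90, 30]
x_min, x_max = min(data), max(data)
normalized = [(x - x_min) / (x_max - x_min) for x in data]
print(normalized)  # [0.0, 0.375, 0.5, 0.0, 0.5, 0.75, 1.0, 0.25]
```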

Z-score normalization
The formula for Z-score normalization is:
X_normalized = (X - mean) / standard_deviation
mean = (10 + 40 + 50 + 10 + 50 + 70 + 90 + 30) / 8 = 43.75
standard_deviation = sqrt(((10 - 43.75)^2 + (40 - 43.75)^2 + (50 - 43.75)^2 + (10 - 43.75)^2 + (50 - 43.75)^2 + (70 - 43.75)^2 + (90 - 43.75)^2 + (30 - 43.75)^2) / 8) ≈ 25.9507

The normalized values are:


[-1.301, -0.145, 0.241, -1.301, 0.241, 1.012, 1.782, -0.530]
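A minimal Python sketch recomputing the mean and population standard deviation (dividing by n = 8, as in the formula above):

```python
import math

data = [10, 40, 50, 10, 50, 70, 90, 30]
mean = sum(data) / len(data)  # 43.75
# Population standard deviation: divide the squared deviations by n.
std = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))
z_scores = [round((x - mean) / std, 3) for x in data]
print(z_scores)  # [-1.301, -0.145, 0.241, -1.301, 0.241, 1.012, 1.782, -0.53]
```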
Decimal scaling normalization
The scaling factor is the smallest power of 10 that exceeds the maximum absolute value (90); in this case, the scaling factor is 100.
the normalized values are:
[0.1, 0.4, 0.5, 0.1, 0.5, 0.7, 0.9, 0.3]
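A short sketch of decimal scaling in Python:

```python
data = [10, 40, 50, 10, 50, 70, 90, 30]
# j is the smallest integer such that max(|x|) / 10**j < 1; here max(|x|) = 90, so j = 2.
scaled = [x / 10 ** 2 for x in data]
print(scaled)  # [0.1, 0.4, 0.5, 0.1, 0.5, 0.7, 0.9, 0.3]
```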

2)
Mean Imputation:
To impute the missing value using mean imputation, you calculate the mean of the
available values in the dataset:
Mean = (10 + 40 + 50 + 10 + 50 + 70 + 90 + 30) / 8 = 43.75
Then, you replace the missing value with the calculated mean:
[10, 40, 50, 10, 50, 70, 90, 30, 43.75]

Linear Interpolation:
Linear interpolation estimates a missing value from the observed values on either side of the gap. Here the missing value is the last entry of the series, so there is no later observation to interpolate toward, and linear interpolation cannot be applied directly. A common fallback (used, for example, by pandas' interpolate for trailing gaps) is to fill the gap with the last observed value, 30, which in this case coincides with LOCF below:
[10, 40, 50, 10, 50, 70, 90, 30, 30]

Last Observation Carried Forward (LOCF):


LOCF involves using the last observed value before the missing one to fill in the gap. In
this case, the last observed value is 30. Therefore, you replace the missing value with the
last observed value:
[10, 40, 50, 10, 50, 70, 90, 30, 30]
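The mean-imputation and LOCF strategies above can be sketched in plain Python (the missing value is represented as None):

```python
data = [10, 40, 50, 10, 50, 70, 90, 30, None]
observed = [x for x in data if x is not None]

# Mean imputation: replace the gap with the mean of the observed values.
mean_value = sum(observed) / len(observed)  # 43.75
mean_imputed = [mean_value if x is None else x for x in data]

# LOCF: carry the last observed value forward into the gap.
locf_imputed = list(data)
for i, x in enumerate(locf_imputed):
    if x is None:
        locf_imputed[i] = locf_imputed[i - 1]

print(mean_imputed)  # [10, 40, 50, 10, 50, 70, 90, 30, 43.75]
print(locf_imputed)  # [10, 40, 50, 10, 50, 70, 90, 30, 30]
```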
3)
Sort the data:
[100, 110, 120, 130, 150, 160, 160, 170, 180, 280, 290]
Range =290 - 100 = 190
Interval width = Range / Number of categories = 190 / 3 = 63.33
For convenience, we can round the interval width up to the whole number 64.
intervals with a width of 64:
Low: [100 - 163.99]
Mid: [164 - 227.99]
High: [228 - 290]
Assign the data points to their respective categories based on the intervals (note that 150 and both 160s fall below 164, so they belong to the Low interval):
Low: [100, 110, 120, 130, 150, 160, 160]
Mid: [170, 180]
High: [280, 290]
Discretization can help in dealing with noise by reducing the impact of small variations in
the data. Noise refers to random fluctuations or errors in the data that may distort the
underlying patterns or relationships. By discretizing the data, we group similar values
into categories, which can help smooth out the effects of noise.
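A small Python sketch of the equal-width binning described above (using the exact width 190/3 rather than the rounded 64; the resulting bins are the same):

```python
data = [100, 110, 120, 130, 150, 160, 160, 170, 180, 280, 290]
low = min(data)
width = (max(data) - low) / 3  # 63.33...
labels = ["Low", "Mid", "High"]
bins = {label: [] for label in labels}
for x in data:
    # Integer bin index; clamp so the maximum value lands in the last bin.
    idx = min(int((x - low) // width), 2)
    bins[labels[idx]].append(x)
print(bins)
```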

4)
1. Jaccard similarity = |{a}| / |{a, b, c, d, e}| = 1/5.

2. Sedit("Samar", "Tamer") is 2 (replace 'S' with 'T' and the second 'a' with 'e').

3. dhamming(x, y) is 4: for x = 0101 and y = 1010, the strings differ in all four bit positions.

4. sqrt((3^2 + 3^2 + 3^2)) = sqrt(27) ≈ 5.196.

5. (1*2 + 2*3) / (sqrt(1^2 + 2^2) * sqrt(2^2 + 3^2)) = 8 / (sqrt(5) * sqrt(13)) = 8 / sqrt(65) ≈ 0.992.

6. Distance between 10:20 and 15:25 is 5 hours and 5 minutes.
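Several of the calculations above can be checked in plain Python (the vectors are the ones assumed in each item):

```python
import math

# Item 4: Euclidean distance for a component-wise difference of (3, 3, 3).
euclid = math.sqrt(3 ** 2 + 3 ** 2 + 3 ** 2)
print(round(euclid, 3))  # 5.196

# Item 5: cosine similarity of x = (1, 2) and y = (2, 3).
x, y = (1, 2), (2, 3)
dot = sum(a * b for a, b in zip(x, y))
cos_sim = dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))
print(round(cos_sim, 3))  # 0.992

# Item 3: Hamming distance between the bit strings 0101 and 1010.
hamming = sum(a != b for a, b in zip("0101", "1010"))
print(hamming)  # 4
```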


5)
Euclidean Distance:
sklearn.metrics.pairwise.euclidean_distances: Calculates the pairwise Euclidean distances
between two sets of points.

Manhattan Distance (City Block Distance):
sklearn.metrics.pairwise.manhattan_distances: Calculates the pairwise Manhattan distances
between two sets of points.

Cosine Similarity:
sklearn.metrics.pairwise.cosine_distances: Calculates the pairwise cosine distances
between two sets of points.
sklearn.metrics.pairwise.cosine_similarity: Calculates the pairwise cosine similarities
between two sets of points.

Minkowski Distance:
sklearn.metrics.pairwise_distances: Calculates the pairwise distances using the
Minkowski distance metric.

Code using scikit-learn (sklearn):

from sklearn.metrics.pairwise import euclidean_distances

# euclidean_distances expects 2-D inputs: one row per point.
point1 = [[1, 2, 3]]
point2 = [[4, 5, 6]]

# The result is a 1x1 distance matrix; take its single entry.
distances = euclidean_distances(point1, point2)
euclidean_distance = distances[0, 0]

print("Euclidean distance using sklearn:", euclidean_distance)

Code using NumPy:

import numpy as np

point1 = np.array([1, 2, 3])
point2 = np.array([4, 5, 6])

# The Euclidean distance is the L2 norm of the difference vector.
euclidean_distance = np.linalg.norm(point1 - point2)

print("Euclidean distance using NumPy:", euclidean_distance)
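For completeness, the other metrics listed above can be called the same way; a short sketch, assuming a working scikit-learn installation:

```python
import numpy as np
from sklearn.metrics.pairwise import (
    manhattan_distances,
    cosine_similarity,
    pairwise_distances,
)

point1 = np.array([[1, 2, 3]])
point2 = np.array([[4, 5, 6]])

# Manhattan (city block) distance: |1-4| + |2-5| + |3-6| = 9.
print(manhattan_distances(point1, point2)[0, 0])

# Cosine similarity of the two vectors.
print(cosine_similarity(point1, point2)[0, 0])

# Minkowski distance with p=3 via the generic pairwise_distances helper.
print(pairwise_distances(point1, point2, metric="minkowski", p=3)[0, 0])
```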
