
Data Science Tools and Software

Assignment no. 2
Dr. Mohamed Abdelhafeez

Name: Ahmed Hossam El-Din Fawzy Abdel-Aty


ID: 20221449419

1)
Min-Max normalization
The formula for Min-Max normalization is:
X_normalized = (X - X_min) / (X_max - X_min)

For the data [10, 40, 50, 10, 50, 70, 90, 30], X_min = 10 and X_max = 90, so the normalized values are:


[0.0, 0.375, 0.5, 0.0, 0.5, 0.75, 1.0, 0.25]
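As a sanity check, the same computation in plain Python (the data values are taken from the list above):

```python
# Min-Max normalization: rescale each value into [0, 1].
data = [10, 40, 50, 10, 50, 70, 90, 30]
x_min, x_max = min(data), max(data)
normalized = [(x - x_min) / (x_max - x_min) for x in data]
print(normalized)  # [0.0, 0.375, 0.5, 0.0, 0.5, 0.75, 1.0, 0.25]
```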

Z-score normalization
The formula for Z-score normalization is:
X_normalized = (X - mean) / standard_deviation
mean = (10 + 40 + 50 + 10 + 50 + 70 + 90 + 30) / 8 = 43.75
standard_deviation = sqrt(((10 - 43.75)^2 + (40 - 43.75)^2 + (50 - 43.75)^2 + (10 - 43.75)^2 + (50 - 43.75)^2 + (70 - 43.75)^2 + (90 - 43.75)^2 + (30 - 43.75)^2) / 8) ≈ 25.9507

The normalized values are:


[-1.301, -0.145, 0.241, -1.301, 0.241, 1.012, 1.782, -0.530]
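A minimal Python sketch recomputing the mean and population standard deviation (dividing by n = 8, as in the formula above):

```python
import math

data = [10, 40, 50, 10, 50, 70, 90, 30]
mean = sum(data) / len(data)  # 43.75
# Population standard deviation: divide the squared deviations by n.
std = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))
z_scores = [round((x - mean) / std, 3) for x in data]
print(z_scores)  # [-1.301, -0.145, 0.241, -1.301, 0.241, 1.012, 1.782, -0.53]
```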
Decimal scaling normalization
The scaling factor is the smallest power of 10 that exceeds the maximum absolute value (90); in this case, the scaling factor is 100.
the normalized values are:
[0.1, 0.4, 0.5, 0.1, 0.5, 0.7, 0.9, 0.3]
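A short sketch of decimal scaling in Python:

```python
data = [10, 40, 50, 10, 50, 70, 90, 30]
# j is the smallest integer such that max(|x|) / 10**j < 1; here max(|x|) = 90, so j = 2.
scaled = [x / 10 ** 2 for x in data]
print(scaled)  # [0.1, 0.4, 0.5, 0.1, 0.5, 0.7, 0.9, 0.3]
```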

2)
Mean Imputation:
To impute the missing value using mean imputation, you calculate the mean of the
available values in the dataset:
Mean = (10 + 40 + 50 + 10 + 50 + 70 + 90 + 30) / 8 = 43.75
Then, you replace the missing value with the calculated mean:
[10, 40, 50, 10, 50, 70, 90, 30, 43.75]

Linear Interpolation:
Linear interpolation estimates a missing value from the observed values on either side of the gap. Here the missing value is the last entry of the series, so there is no later observation to interpolate toward, and linear interpolation cannot be applied directly. A common fallback (used, for example, by pandas' interpolate for trailing gaps) is to fill the gap with the last observed value, 30, which in this case coincides with LOCF below:
[10, 40, 50, 10, 50, 70, 90, 30, 30]

Last Observation Carried Forward (LOCF):


LOCF involves using the last observed value before the missing one to fill in the gap. In
this case, the last observed value is 30. Therefore, you replace the missing value with the
last observed value:
[10, 40, 50, 10, 50, 70, 90, 30, 30]
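The mean-imputation and LOCF strategies above can be sketched in plain Python (the missing value is represented as None):

```python
data = [10, 40, 50, 10, 50, 70, 90, 30, None]
observed = [x for x in data if x is not None]

# Mean imputation: replace the gap with the mean of the observed values.
mean_value = sum(observed) / len(observed)  # 43.75
mean_imputed = [mean_value if x is None else x for x in data]

# LOCF: carry the last observed value forward into the gap.
locf_imputed = list(data)
for i, x in enumerate(locf_imputed):
    if x is None:
        locf_imputed[i] = locf_imputed[i - 1]

print(mean_imputed)  # [10, 40, 50, 10, 50, 70, 90, 30, 43.75]
print(locf_imputed)  # [10, 40, 50, 10, 50, 70, 90, 30, 30]
```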
3)
Sort the data:
[100, 110, 120, 130, 150, 160, 160, 170, 180, 280, 290]
Range =290 - 100 = 190
Interval width = Range / Number of categories = 190 / 3 = 63.33
For convenience, we can round the interval width up to the whole number 64.
intervals with a width of 64:
Low: [100 - 163.99]
Mid: [164 - 227.99]
High: [228 - 290]
Assign the data points to their respective categories based on the intervals (note that 150 and both 160s fall below 164, so they belong to the Low interval):
Low: [100, 110, 120, 130, 150, 160, 160]
Mid: [170, 180]
High: [280, 290]
Discretization can help in dealing with noise by reducing the impact of small variations in
the data. Noise refers to random fluctuations or errors in the data that may distort the
underlying patterns or relationships. By discretizing the data, we group similar values
into categories, which can help smooth out the effects of noise.
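A small Python sketch of the equal-width binning described above (using the exact width 190/3 rather than the rounded 64; the resulting bins are the same):

```python
data = [100, 110, 120, 130, 150, 160, 160, 170, 180, 280, 290]
low = min(data)
width = (max(data) - low) / 3  # 63.33...
labels = ["Low", "Mid", "High"]
bins = {label: [] for label in labels}
for x in data:
    # Integer bin index; clamp so the maximum value lands in the last bin.
    idx = min(int((x - low) // width), 2)
    bins[labels[idx]].append(x)
print(bins)
```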

4)
1. Jaccard similarity = |{a}| / |{a, b, c, d, e}| = 1/5.

2. Sedit("Samar", "Tamer") is 2 (replace 'S' with 'T' and the second 'a' with 'e').

3. dhamming(x, y) is 4: for x = 0101 and y = 1010, the strings differ in all four bit positions.

4. sqrt((3^2 + 3^2 + 3^2)) = sqrt(27) ≈ 5.196.

5. (1*2 + 2*3) / (sqrt(1^2 + 2^2) * sqrt(2^2 + 3^2)) = 8 / (sqrt(5) * sqrt(13)) = 8 / sqrt(65) ≈ 0.992.

6. Distance between 10:20 and 15:25 is 5 hours and 5 minutes.
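Several of the calculations above can be checked in plain Python (the vectors are the ones assumed in each item):

```python
import math

# Item 4: Euclidean distance for a component-wise difference of (3, 3, 3).
euclid = math.sqrt(3 ** 2 + 3 ** 2 + 3 ** 2)
print(round(euclid, 3))  # 5.196

# Item 5: cosine similarity of x = (1, 2) and y = (2, 3).
x, y = (1, 2), (2, 3)
dot = sum(a * b for a, b in zip(x, y))
cos_sim = dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))
print(round(cos_sim, 3))  # 0.992

# Item 3: Hamming distance between the bit strings 0101 and 1010.
hamming = sum(a != b for a, b in zip("0101", "1010"))
print(hamming)  # 4
```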


5)
Euclidean Distance:
sklearn.metrics.pairwise.euclidean_distances: Calculates the pairwise Euclidean distances
between two sets of points.

Manhattan Distance (City Block Distance):
sklearn.metrics.pairwise.manhattan_distances: Calculates the pairwise Manhattan distances
between two sets of points.

Cosine Similarity:
sklearn.metrics.pairwise.cosine_distances: Calculates the pairwise cosine distances
between two sets of points.
sklearn.metrics.pairwise.cosine_similarity: Calculates the pairwise cosine similarities
between two sets of points.

Minkowski Distance:
sklearn.metrics.pairwise_distances: Calculates the pairwise distances using the
Minkowski distance metric.

Code using scikit-learn (sklearn):

from sklearn.metrics.pairwise import euclidean_distances

# euclidean_distances expects 2-D inputs: one row per point.
point1 = [[1, 2, 3]]
point2 = [[4, 5, 6]]

# The result is a 1x1 distance matrix; take its single entry.
distances = euclidean_distances(point1, point2)
euclidean_distance = distances[0, 0]

print("Euclidean distance using sklearn:", euclidean_distance)

Code using NumPy:

import numpy as np

point1 = np.array([1, 2, 3])
point2 = np.array([4, 5, 6])

# The Euclidean distance is the L2 norm of the difference vector.
euclidean_distance = np.linalg.norm(point1 - point2)

print("Euclidean distance using NumPy:", euclidean_distance)
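For completeness, the other metrics listed above can be called the same way; a short sketch, assuming a working scikit-learn installation:

```python
import numpy as np
from sklearn.metrics.pairwise import (
    manhattan_distances,
    cosine_similarity,
    pairwise_distances,
)

point1 = np.array([[1, 2, 3]])
point2 = np.array([[4, 5, 6]])

# Manhattan (city block) distance: |1-4| + |2-5| + |3-6| = 9.
print(manhattan_distances(point1, point2)[0, 0])

# Cosine similarity of the two vectors.
print(cosine_similarity(point1, point2)[0, 0])

# Minkowski distance with p=3 via the generic pairwise_distances helper.
print(pairwise_distances(point1, point2, metric="minkowski", p=3)[0, 0])
```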
