Assignment No. 2 (Midterm Preparation)

The document consists of a series of questions and tasks related to data science, including data preprocessing, feature extraction, MongoDB, and text analysis. It covers calculations such as Euclidean distance, normalization, handling missing values, and constructing distance matrices. Additionally, it includes multiple-choice questions about MongoDB functionalities and text analysis metrics.

Uploaded by

eldinsafe
Data Science Tools and Software

Model Answer
Assignment #2
Dr. Mohamed Abdelhafeez
Question 1: Data Preprocessing
a) Given the following dataset:

      math   physics
x1    85     0.7
x2    65     0.8
x3    80     0.2
x4    75     0.9

i. Compute the Euclidean distances de(x1, x3) and de(x2, x4).

ii. Comment on the computed distances above.

iii. Normalize the given dataset using min-max normalization.
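A minimal Python sketch of parts i and iii, using only the four rows given in the table. The helper names `euclidean` and `min_max_normalize` are illustrative and not part of the assignment. Because math is measured on a much larger scale than physics, the raw distances are dominated by the math column, which is the observation part ii asks for.

```python
import math

# Dataset from the table: (math, physics) for each object.
data = {
    "x1": (85, 0.7),
    "x2": (65, 0.8),
    "x3": (80, 0.2),
    "x4": (75, 0.9),
}

def euclidean(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def min_max_normalize(rows):
    """Rescale every column to [0, 1] using (v - min) / (max - min)."""
    columns = list(zip(*rows.values()))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return {
        name: tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for name, row in rows.items()
    }

d13 = euclidean(data["x1"], data["x3"])  # sqrt(5^2 + 0.5^2)
d24 = euclidean(data["x2"], data["x4"])  # sqrt(10^2 + 0.1^2)
normalized = min_max_normalize(data)

print(f"de(x1, x3) = {d13:.4f}")
print(f"de(x2, x4) = {d24:.4f}")
for name, row in normalized.items():
    print(name, tuple(round(v, 3) for v in row))
```

After normalization both attributes lie in [0, 1], so neither one dominates a distance computed on the rescaled data.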

b) Given the following dataset X with missing values denoted a and b:

x1 = [a?, 60]   x2 = [11, 75]   x3 = [5, 75]   x4 = [5, 80]   x5 = [7, b?]

Show how to replace the missing values a and b with proper values using each of the following methods:

i. The mean value

ii. The most probable value

iii. kNN regression with k=2
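The three imputation methods can be sketched in Python as follows. Two choices here are assumptions, not stated in the assignment: "most probable" is read as the mode of the observed values, and the kNN step measures closeness on the one attribute that is known for the incomplete object, using only the complete objects x2..x4 as neighbours. The helper names `known` and `knn_impute` are illustrative.

```python
from statistics import mean, mode

# None marks the missing entries: a in x1 and b in x5.
X = [[None, 60], [11, 75], [5, 75], [5, 80], [7, None]]

def known(col):
    """All observed values of one attribute."""
    return [row[col] for row in X if row[col] is not None]

def knn_impute(target, missing_col, k=2):
    """Average the missing attribute over the k complete rows that are
    closest to `target` on the other (observed) attribute."""
    other = 1 - missing_col
    complete = [r for r in X if None not in r]
    nearest = sorted(complete, key=lambda r: abs(r[other] - target[other]))[:k]
    return mean(r[missing_col] for r in nearest)

a_mean, b_mean = mean(known(0)), mean(known(1))   # i.   mean value
a_mode, b_mode = mode(known(0)), mode(known(1))   # ii.  mode (assumed reading)
a_knn = knn_impute(X[0], missing_col=0)           # iii. kNN regression, k=2
b_knn = knn_impute(X[4], missing_col=1)

print("mean:", a_mean, b_mean)
print("mode:", a_mode, b_mode)
print("kNN :", a_knn, b_knn)
```

For x1, the two nearest complete objects on the second attribute are x2 and x3 (both at distance 15); for x5, the two nearest on the first attribute are x3 and x4 (both at distance 2).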

c) Calculate a normalized dissimilarity (distance) between the following two symbolic objects x and y, each having 4 attributes: the first attribute is a string of 5 characters, the second is an interval, the third is a set, and the fourth is a binary number of 5 bits:

x = [“abcdg”, 10:15, {a,b,c}, 11100] and y = [“abcef”, 10:30, {d,c,e}, 01001]
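One possible way to carry out this calculation is sketched below. The assignment does not say which per-attribute measure to use, so every choice here is an assumption: a normalized Hamming distance for the string and the bit pattern, one minus overlap over total span for the interval, the Jaccard distance for the set, and an unweighted average to combine the four values. A different course convention would give different numbers.

```python
def hamming_norm(s, t):
    """Fraction of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(s, t)) / len(s)

def interval_dissim(a, b):
    """1 - (length of overlap) / (length of the combined span). Assumed measure."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    span = max(a[1], b[1]) - min(a[0], b[0])
    return 1 - overlap / span

def jaccard_dissim(s, t):
    """Jaccard distance: 1 - |intersection| / |union|."""
    return 1 - len(s & t) / len(s | t)

x = ("abcdg", (10, 15), {"a", "b", "c"}, "11100")
y = ("abcef", (10, 30), {"d", "c", "e"}, "01001")

parts = [
    hamming_norm(x[0], y[0]),      # string: 2 of 5 characters differ
    interval_dissim(x[1], y[1]),   # interval: overlap 5 over span 20
    jaccard_dissim(x[2], y[2]),    # set: 1 shared element out of 5
    hamming_norm(x[3], y[3]),      # bits: 3 of 5 bits differ
]
total = sum(parts) / len(parts)    # unweighted average, stays in [0, 1]

print([round(p, 3) for p in parts], round(total, 4))
```

Each component lies in [0, 1], so their average is a normalized dissimilarity as well.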

Question 2: Feature Extraction

Given the following term frequencies in a corpus D that contains 3 documents D1..D3:

Document 1 (D1)      Document 2 (D2)      Document 3 (D3)
Term    Count        Term    Count        Term    Count
Caw     2            Sudan   3            Egypt   2
Sudan   1            Caw     2            Nile    2
Camel   1            Nile    1            Caw     1

a) Build a dataset matrix of size 3 objects (documents) by 5 attributes (terms) using binary term frequency.

b) Create a distance matrix using the squared Euclidean distance.

c) Identify the first nearest neighbour of document D3 using the Hamming distance.
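The three parts can be sketched in Python as follows. The column order of the matrix (terms in order of first appearance) is an assumption; any fixed ordering of the five terms gives the same distances. All variable and function names are illustrative.

```python
# Term counts per document, copied from the table.
docs = {
    "D1": {"Caw": 2, "Sudan": 1, "Camel": 1},
    "D2": {"Sudan": 3, "Caw": 2, "Nile": 1},
    "D3": {"Egypt": 2, "Nile": 2, "Caw": 1},
}

# Vocabulary in order of first appearance: 5 distinct terms.
vocab = []
for counts in docs.values():
    for term in counts:
        if term not in vocab:
            vocab.append(term)

# a) Binary term frequency: 1 if the term occurs in the document, else 0.
binary = {name: [int(t in counts) for t in vocab] for name, counts in docs.items()}

def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

# b) 3x3 distance matrix stored as a nested dict.
dist = {p: {q: squared_euclidean(binary[p], binary[q]) for q in docs} for p in docs}

# c) First nearest neighbour of D3 under Hamming distance.
nearest_to_d3 = min((d for d in docs if d != "D3"),
                    key=lambda d: hamming(binary["D3"], binary[d]))

print(vocab)
for name, row in binary.items():
    print(name, row)
print(dist)
print("Nearest neighbour of D3:", nearest_to_d3)
```

On 0/1 vectors the squared Euclidean distance and the Hamming distance coincide, since each differing position contributes exactly 1 to both.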

Question 3: MongoDB
1. What is MongoDB?
   A. Relational database
   B. Document-oriented database
   C. NoSQL database
   D. Both B and C
2. In MongoDB, what is a document equivalent to in a SQL database?
   A. Table
   B. Record
   C. Field
   D. Column
3. Which method is used to insert a single document into a MongoDB collection using PyMongo?
   A. add_one()
   B. insert_single()
   C. insert_one()
   D. add_document()
4. What is the purpose of the PyMongo package in Python with respect to MongoDB?
   A. Web development
   B. Data visualization
   C. MongoDB driver for Python
   D. Machine learning
5. In MongoDB, what does CRUD stand for?
   A. Create, Retrieve, Update, Delete
   B. Connect, Read, Update, Delete
   C. Collect, Retrieve, Use, Delete
   D. Create, Read, Upload, Delete
6. How do you update a document in MongoDB using PyMongo?
   A. update_single()
   B. modify_one()
   C. update_one()
   D. change_document()
7. In PyMongo, what does the $set operator do in the context of updating a document?
   A. Sets the document to null
   B. Adds a new field to the document
   C. Updates a specific field in the document
   D. Sorts the document in ascending order
8. Which method is used to delete a single document from a MongoDB collection in PyMongo?
   A. delete_one()
   B. remove_single()
   C. erase_one()
   D. discard_one()
9. What is the purpose of the sort() method in MongoDB when using PyMongo?
   A. Group documents in a collection
   B. Filter documents based on a condition
   C. Order the result in ascending or descending order
   D. Limit the number of documents returned
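The PyMongo operations these questions refer to can be seen together in a short sketch. It assumes the `pymongo` package is installed and a MongoDB server is reachable at the URI shown; the database name `demo_db`, the collection name `students`, and the sample documents are hypothetical and chosen only for illustration.

```python
def crud_walkthrough(collection):
    """Run one create, update, sorted read, and delete on `collection`."""
    # Create: insert_one() adds a single document.
    collection.insert_one({"name": "Ada", "score": 80})
    collection.insert_one({"name": "Bob", "score": 90})

    # Update: update_one() with $set changes only the named field.
    collection.update_one({"name": "Ada"}, {"$set": {"score": 95}})

    # Read: find() returns a cursor; sort() orders it (-1 descending, 1 ascending).
    ranked = [doc["name"] for doc in collection.find().sort("score", -1)]

    # Delete: delete_one() removes the first document matching the filter.
    collection.delete_one({"name": "Bob"})
    remaining = collection.count_documents({})
    return ranked, remaining

def open_collection(uri="mongodb://localhost:27017"):
    """Connect and return the hypothetical demo_db.students collection."""
    from pymongo import MongoClient  # imported here so the sketch loads without a server
    return MongoClient(uri)["demo_db"]["students"]
```

With a server running, `crud_walkthrough(open_collection())` would execute the four operations against the hypothetical collection.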

Question 4: Text Analysis

Given the following term frequencies in a corpus D that contains 3 documents D1..D3, answer questions 1 to 6:

Document 1 (D1)      Document 2 (D2)      Document 3 (D3)
Term    Count        Term    Count        Term    Count
Caw     2            Sudan   3            Egypt   2
Sudan   1            Caw     2            Nile    2
Camel   1            Nile    1            Caw     1

1. The resulting data matrix will be of size
a) 3×5 b) 4×4 c) 5×5 d) 5×4
2. The normalized term frequency tf(“camel”, D1) is
a) 0.20 b) 3 c) 4 d) 0.25
3. The inverse document frequency idf(“Camel”, D) is
a) 3 b) 1 c) 1/3 d) 0
4. What is tflogidf(“caw”, D)?
a) 0 b) 1 c) 3 d) 5
5. The resulting distance matrix will be of size
a) 3×5 b) 4×4 c) 5×5 d) 3×3
6. The corresponding feature vector of document D1 using binary term frequency is
a) [1 1 1 0 0] b) [1 0 0 0 1] c) [1 0 1 1] d) [2 1 1]
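The quantities in questions 2 to 4 can be checked with a short Python sketch. The definitions used are assumptions: normalized tf is the term count divided by the total count in the document, idf is the plain ratio N/df (chosen because it fits the listed answer options; many texts put the logarithm inside idf itself), and tflogidf is tf multiplied by log(idf). Function names are illustrative.

```python
import math

# Term counts per document, copied from the table.
docs = {
    "D1": {"Caw": 2, "Sudan": 1, "Camel": 1},
    "D2": {"Sudan": 3, "Caw": 2, "Nile": 1},
    "D3": {"Egypt": 2, "Nile": 2, "Caw": 1},
}
n_docs = len(docs)

def tf(term, doc):
    """Normalized term frequency: count of term / total term count in doc."""
    counts = docs[doc]
    return counts.get(term, 0) / sum(counts.values())

def idf(term):
    """Assumed definition: number of documents / documents containing term."""
    df = sum(term in counts for counts in docs.values())
    return n_docs / df

def tf_log_idf(term, doc):
    """tf weighted by the logarithm of idf."""
    return tf(term, doc) * math.log(idf(term))

print(tf("Camel", "D1"))        # 1 occurrence out of 4 terms in D1
print(idf("Camel"))             # appears in 1 of 3 documents
print(tf_log_idf("Caw", "D1"))  # Caw is in every document, so log(3/3) = 0
```

A term that occurs in every document, such as Caw here, gets zero weight under this scheme regardless of its count, because log(1) = 0.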
