
DS IMP QB (E-Next - In)

The document provides a sample question bank covering topics related to data science including EDA, data collection, data cleaning, data visualization, data sources, MongoDB, machine learning models and algorithms.

Uploaded by Gaurav bansode
Copyright © All Rights Reserved

Disclaimer: This is just a sample question bank. Not all questions in the exam may come from this question bank.

Paper 6 - Data Science

Unit 1 EDA
1. Compute mean, median and mode for (15, 10, 18, 20, 28, 32).
2. Compute mean, variance and standard deviation for (1, 3, 4, 6, 5).
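A worked check for the two EDA questions above, sketched with Python's standard `statistics` module (the question bank does not prescribe a tool, so Python and the helper name `summarize` are assumptions). No value repeats in (15, 10, 18, 20, 28, 32), so that data has no mode; the second question does not say whether population or sample variance is wanted, so both are shown.

```python
import statistics
from collections import Counter

def summarize(data):
    """Return (mean, median, modes); modes is empty when no value repeats."""
    counts = Counter(data)
    top = max(counts.values())
    # A mode only exists if some value occurs more than once
    modes = sorted(v for v, c in counts.items() if c == top) if top > 1 else []
    return statistics.mean(data), statistics.median(data), modes

# Question 1
mean1, median1, modes1 = summarize([15, 10, 18, 20, 28, 32])
print("Q1 mean:", mean1, "median:", median1, "modes:", modes1)

# Question 2
q2 = [1, 3, 4, 6, 5]
print("Q2 mean:", statistics.mean(q2))
# Population formulas divide the sum of squared deviations by n
print("population variance:", statistics.pvariance(q2), "sd:", statistics.pstdev(q2))
# Sample formulas divide the sum of squared deviations by n - 1
print("sample variance:", statistics.variance(q2), "sd:", statistics.stdev(q2))
```

By hand: for question 1 the mean is 123 / 6 = 20.5 and the median is (18 + 20) / 2 = 19. For question 2 the mean is 19 / 5 = 3.8, the squared deviations sum to 14.8, so the population variance is 14.8 / 5 = 2.96 (SD about 1.72) and the sample variance is 14.8 / 4 = 3.7 (SD about 1.92).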

Data Collection
1. Distinguish between primary and secondary data.
2. Describe the various types of data collection methods.
3. Describe the types of observational methods used in data collection.
4. Explain the process of Web crawling.

Data cleaning
1. Why is data cleaning required?
2. How to handle missing data in a dataset?
3. What is data normalization? Illustrate any one type of data normalization
technique with an example.
4. Write a short note on the following smoothing techniques:
a. Smoothing by bin means
b. Smoothing by bin boundaries
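A minimal Python sketch of question 3 (min-max normalization, one common normalization technique) and question 4 (smoothing by bin means and by bin boundaries). The values are illustrative, not from the question bank, and partitioning them into equal-depth bins of three values is an assumption; other binning schemes are possible.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so that min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def equal_depth_bins(values, depth):
    """Sort the data and split it into bins holding `depth` values each."""
    data = sorted(values)
    return [data[i:i + depth] for i in range(0, len(data), depth)]

def smooth_by_bin_means(bins):
    """Replace every value in a bin by the bin's mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

def smooth_by_bin_boundaries(bins):
    """Replace every value by the closer of its bin's min and max (ties go to the min)."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are sorted, so the boundaries are the ends
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # illustrative data
bins = equal_depth_bins(prices, 3)
print(bins)
print(smooth_by_bin_means(bins))
print(smooth_by_bin_boundaries(bins))
print(min_max_normalize(prices))
```

With these values the bins are [4, 8, 15], [21, 21, 24] and [25, 28, 34]. Smoothing by means replaces them with 9, 22 and 29 respectively; smoothing by boundaries gives [4, 4, 15], [21, 21, 24] and [25, 25, 34]; min-max normalization computes (v - 4) / 30, mapping 4 to 0 and 34 to 1.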

Topic: Data Visualization


1. What is a heatmap? Explain its importance in visualizing the existing patterns in a dataset.
2. Discuss the importance of the scatter plot in data analysis. How can it be viewed in R?
3. Write short notes on the following data visualization techniques:
a. Line chart
b. Dendrograms
4. What is a box plot? Describe the process of identifying an outlier with a box plot.
5. Draw a box plot (five-number summary plot) for the following dataset: 6 6 7 8 9 9 9 10 10 11 13
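For the box plot question, the five-number summary can be checked with Python's `statistics.quantiles` (Python 3.8+). Quartile conventions differ: the default `"exclusive"` method gives Q1 = 7 here, while `"inclusive"`, which uses the same linear interpolation as R's default `quantile()`, gives Q1 = 7.5. The helper name `box_summary` is hypothetical, and the 1.5 × IQR outlier rule is the usual convention, assumed rather than stated in the question.

```python
import statistics

def box_summary(data, method="exclusive"):
    """Five-number summary plus 1.5 * IQR outlier fences for a box plot."""
    xs = sorted(data)
    q1, median, q3 = statistics.quantiles(xs, n=4, method=method)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Points beyond the fences are drawn individually as outliers
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return {"min": xs[0], "q1": q1, "median": median, "q3": q3, "max": xs[-1],
            "iqr": iqr, "fences": (low_fence, high_fence), "outliers": outliers}

data = [6, 6, 7, 8, 9, 9, 9, 10, 10, 11, 13]
print(box_summary(data))
print(box_summary(data, method="inclusive"))
```

With the exclusive method the summary is min 6, Q1 7, median 9, Q3 10, max 13. The IQR is 3 and the fences are 2.5 and 14.5, so no value is an outlier and the whiskers run from 6 to 13.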
Topic: Different types of data sources
1. Distinguish between structured and unstructured data.
2. Discuss some applications of unstructured data.

Unit 1 (Based on actual syllabus titles)

https://E-next.in
1. What is data? State and explain different types of data.
2. Write a note on EDA.
3. Explain any two types of data visualizations in R, each with an example.
4. State and explain different types of data sources.
5. State various tasks done under data management.
6. Explain data collection.
7. What is data cleaning? Why is it done? How is it done?
8. Write a note on data analysis.
9. Explain data modelling in data science.

[As R is a recommended tool in data science, some questions on R are included, as per the topics covered in basics.ppt of the Thakur College workshop]
10. How do you import a CSV file in R? What are the parameters of the function used?
11. What is the use of factor and c in R?
12. What is a data frame in R? How to create and access it?
13. Assume that there is a file called students.csv containing columns - roll, name, X, XII, FY, SY
Write commands to do the following -
(i) Read the file in R
(ii) Give an overview of the file
(iii) Check first few records of the file
(iv) Find average X marks
(v) Display only roll, name and SY columns
14. State any five different ways to get a subset of data from a data frame in R.
15. How can you find the NA values present in a column? How can you still process such columns? Give an example.
16. Assume that there is a file called emp.csv containing columns - id, name, dept, desig, sal
Write commands to do the following -
(i) Display all the records of “SALES” employees
(ii) Display the records of employees having sal greater than 1 lakh but less than 5 lakh
(iii) Display only the names of the employees who are managers
(iv) Display all the employees who are clerks and who are not in IT department
(v) Sort the data based on descending order of salary
17. Explain how you can merge two data frames. State two ways.
18. How can you join data frames on columns? Give examples.
19. How can you append two data frames? Give an example.
20. Explain the aggregate function.
21. What is a quartile? How can you retrieve quartile values?
22. What is a box plot? What type of information does it show in R? Give the command to draw it.
23. What is a histogram? How is it drawn?
24. What is a scatter plot? How is it drawn?

[Questions based on topics in syllabus workshop ppt]

25. What is smoothing?
26. Explain resampling technique with example.
27. Explain discretization technique with example.
28. Write down the difference between qualitative and quantitative data with examples.
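Question 27 asks for a discretization example. One common technique is equal-width binning, sketched below in Python; the ages, the choice of three bins and the labels are hypothetical, and equal-frequency binning is an equally valid alternative.

```python
def equal_width_discretize(values, k, labels=None):
    """Map each numeric value to one of k equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    result = []
    for v in values:
        idx = int((v - lo) // width) if width > 0 else 0
        idx = min(idx, k - 1)  # the maximum itself belongs to the last interval
        result.append(labels[idx] if labels else idx)
    return result

ages = [0, 7, 15, 22, 31, 44, 53, 60]  # hypothetical data
print(equal_width_discretize(ages, 3))
print(equal_width_discretize(ages, 3, labels=["low", "medium", "high"]))
```

Here the width is (60 - 0) / 3 = 20, so the intervals are [0, 20), [20, 40) and [40, 60], and the ages map to bins 0, 0, 0, 1, 1, 2, 2, 2.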

Unit 2 Topic - MongoDB (Questions based on the mongodbpracts.doc file shared in the Thakur College Workshop)
1. What is MongoDB? State its features.
2. What is MongoDB? State its advantages over a traditional DBMS.
3. How do you create, use, show and delete databases in MongoDB? Give an example.
4. What is a collection in MongoDB? Give two different examples of creating collections.
5. How can you see data stored in MongoDB? Explain any two methods with an example.
6. Explain the find function of MongoDB.
7. How can you update information in MongoDB?
8. Give examples of how to delete records in MongoDB.
9. State the use of the limit and skip methods.
10. How do you create indexes in MongoDB? Give an example.
10. How to create indexes in MongoDB? Give example.

General questions based on syllabus titles -


11. Write a note on data curation.
12. Explain large scale data systems.
13. Write a note on AWS
14. Describe the process of data extraction from semi-structured data on the Web.

Questions based on syllabus workshop PPT contents -


15. What is NoSQL? What are its features?
16. What is NoSQL? State its advantages over a DBMS.
17. What is NoSQL? Briefly explain its types.
18. Explain any one type of NoSQL technology.
19. Write a note on data transformation.
20. How can you read a JSON file in R?
21. How can you read an XML file in R?
22. Write a note on XPath.
23. State a few XPath expressions.
24. What is web scraping?
25. Explain various ways to do web scraping.
26. How can you read HTML in R to extract particular information? Give an example.
27. Write a note on MapReduce architecture.
28. Write a note on HBase.
29. Compare HBase and RDBMS.
30. Explain procedural query languages and their operations.
31. Explain non-procedural query languages and their operations.
32. Describe cloud services in detail.
33. Explain homogeneous and heterogeneous distributed databases in detail.
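The XPath questions target R, but Python's standard `xml.etree.ElementTree` supports a limited subset of XPath that is enough to illustrate a few common expressions. The XML document below is hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<library>
  <book lang="en"><title>Data Science Basics</title><year>2019</year></book>
  <book lang="fr"><title>Big Data Handbook</title><year>2021</year></book>
  <book lang="en"><title>NoSQL Primer</title><year>2020</year></book>
</library>"""

root = ET.fromstring(XML)

# .//title : every <title> element at any depth
all_titles = [t.text for t in root.findall(".//title")]
# [@lang='en'] : filter <book> elements by attribute value
english_titles = [t.text for t in root.findall(".//book[@lang='en']/title")]
# [year='2020'] : filter by the text of a child element
titles_2020 = [t.text for t in root.findall(".//book[year='2020']/title")]
# [2] : positional predicate (1-based), here the second <book>
second_year = root.find("./book[2]/year").text

print(all_titles, english_titles, titles_2020, second_year)
```

Full XPath 1.0 (functions such as `contains()`, axes such as `parent::`) is not available in `ElementTree`; libraries such as lxml in Python, or the XML and xml2 packages in R, provide it.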

Unit 3
(As per syllabus contents)
1. Explain the general idea of model selection techniques in Machine learning.
2. Explain the concept of regularization.
3. What is bias? What is variance? Write a note on the bias-variance trade-off.
4. What are AIC and BIC?
5. Write a note on cross validation.
6. What do you mean by - LASSO regression, Ridge Regression?
7. Write a note on dimension reduction.
8. Explain feature extraction.
9. What is supervised learning? Explain any one technique.
10. Explain the general model of regression. Give the idea with respect to R.
11. What are regression trees? Give an idea with respect to R.
12. Write a note on logistic regression.
13. Explain SVM
14. Explain the k-NN technique.
15. Write a note on PCA.
16. Explain k-means clustering.
17. Explain hierarchical clustering
18. Explain ensemble methods.
19. Write down the difference between Lasso and Ridge regression.
20. Write down the difference between classification and regression.
21. Apply the k-means algorithm to the following data:
Sample no   X     Y
1           185   72
2           170   56
3           168   60
4           179   68
5           182   72
6           188   77
22. Write down the difference between logistic and linear regression.
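Question 21 does not state k or the starting centroids. The Python sketch below assumes k = 2 and takes samples 1 and 2 as the initial centroids (a common textbook convention, not given in the question), then runs batch (Lloyd's) k-means with Euclidean distance. The function name `kmeans` is illustrative.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Batch (Lloyd's) k-means with Euclidean distance and caller-chosen start centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its previous centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

samples = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]
labels, centroids = kmeans(samples, initial_centroids=[samples[0], samples[1]])
for i, lab in enumerate(labels, start=1):
    print(f"sample {i} -> cluster {lab + 1}")
print("final centroids:", centroids)
```

Under these assumptions the first pass puts samples 1, 4, 5 and 6 in one cluster and samples 2 and 3 in the other. The centroids then move to (183.5, 72.25) and (169, 58), and a second pass leaves every assignment unchanged, so the algorithm stops.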
