DS IMP QB (E-Next - In)
Unit 1 EDA
1. Compute the mean, median and mode of (15, 10, 18, 20, 28, 32).
2. Compute the mean, variance and standard deviation of (1, 3, 4, 6, 5).
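Answers to questions 1 and 2 can be checked with Python's standard `statistics` module. This is a minimal sketch. For question 1, the mean is 123/6 = 20.5 and the median is (18 + 20)/2 = 19. No value repeats, so the data has no mode. The helper returns an empty list in that case, which is a convention chosen here. For question 2, the mean is 3.8 and the squared deviations sum to 14.8. That gives a population variance of 14.8/5 = 2.96 (standard deviation about 1.72), or a sample variance of 14.8/4 = 3.7 (about 1.92). The helper names are illustrative, not part of any syllabus.

```python
from collections import Counter
import statistics


def summarize_central(data):
    """Return (mean, median, modes); modes is [] when no value repeats."""
    counts = Counter(data)
    top = max(counts.values())
    # A value only counts as a mode if it occurs more than once (convention used here).
    modes = [v for v, c in counts.items() if c == top] if top > 1 else []
    return statistics.mean(data), statistics.median(data), modes


def summarize_spread(data):
    """Mean plus population (divide by n) and sample (divide by n - 1) spread."""
    return {
        "mean": statistics.mean(data),
        "pvariance": statistics.pvariance(data),  # sum of squared deviations / n
        "pstdev": statistics.pstdev(data),
        "variance": statistics.variance(data),    # sum of squared deviations / (n - 1)
        "stdev": statistics.stdev(data),
    }


print(summarize_central([15, 10, 18, 20, 28, 32]))  # (20.5, 19.0, [])
print(summarize_spread([1, 3, 4, 6, 5]))
```

Check which definition of variance your course expects. The population form divides by n and the sample form divides by n - 1.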
Data Collection
1. Distinguish between primary and secondary data.
2. Describe the various types of data collection methods.
3. Describe the types of observational methods used in data collection.
4. Explain the process of Web crawling.
Data Cleaning
1. Why is data cleaning required?
2. How can missing data in a dataset be handled?
3. What is data normalization? Illustrate any one data normalization technique with an example.
4. Write a short note on the following smoothing techniques:
a. Smoothing by bin means
b. Smoothing by bin boundaries
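Questions 3 and 4 can be illustrated with the small sketch below. The sample values are hypothetical and do not come from the question bank. The bin depth of 3 is also an assumption: the sorted values are split into equal-frequency (equal-depth) bins. Smoothing by bin means replaces every value with its bin's mean. Smoothing by bin boundaries replaces every value with the closer of its bin's minimum or maximum, and ties go to the minimum by an arbitrary convention. Min-max normalization maps v to (v - min) / (max - min) * (new_max - new_min) + new_min.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale linearly so min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def equal_frequency_bins(values, depth):
    """Sort the values and cut them into consecutive bins of `depth` items."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def smooth_by_bin_means(bins):
    """Replace every value in a bin by that bin's mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_bin_boundaries(bins):
    """Replace every value by the nearer bin boundary (ties go to the lower one)."""
    smoothed = []
    for b in bins:
        lo, hi = min(b), max(b)
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # hypothetical sample values
bins = equal_frequency_bins(prices, depth=3)  # assumed bin depth of 3
print(bins)                             # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_bin_means(bins))        # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
print(smooth_by_bin_boundaries(bins))   # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
print(min_max_normalize([4, 19, 34]))   # [0.0, 0.5, 1.0]
```

With these values, boundary smoothing moves 8 to 4 because 8 is closer to 4 than to 15. It moves 28 to 25 for the same reason. Means smoothing flattens each bin to a single value.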
Unit 1
https://E-next.in
Disclaimer: This is just a sample question bank. Not all questions in the exam will necessarily be from this question bank.
1. What is data? State and explain different types of data.
2. Write a note on EDA.
3. Explain any two types of data visualization in R, with an example of each.
4. State and explain the different types of data sources.
5. State the various tasks performed under data management.
6. Explain data collection.
7. What is data cleaning? Why is it done? How is it done?
8. Write a note on data analysis.
9. Explain data modelling in data science.
[As R is a recommended tool in data science, some questions on R follow, as per the topics covered in basics.ppt of the Thakur College workshop.]
10. How do you import a CSV file in R? What are the parameters of the function used?
11. What is the use of factor() and c() in R?
12. What is a data frame in R? How do you create and access one?
13. Assume that there is a file called students.csv containing the columns roll, name, X, XII, FY and SY. Write commands to do the following:
(i) Read the file in R
(ii) Give an overview of the file
(iii) Check the first few records of the file
(iv) Find the average X marks
(v) Display only the roll, name and SY columns
14. State any five different ways to get a subset of data from a data frame in R.
15. How can you find the NA values present in a column? How can you still process such columns? Give an example.
16. Assume that there is a file called emp.csv containing the columns id, name, dept, desig and sal. Write commands to do the following:
(i) Display all the records of “SALES” employees
(ii) Display the records of employees having sal greater than 1 lakh but less than 5 lakh
(iii) Display only the names of the employees who are managers
(iv) Display all the employees who are clerks and who are not in the IT department
(v) Sort the data in descending order of salary
17. Explain how you can merge two data frames. State two ways.
18. How can you join data frames on columns? Give examples.
19. How can you append two data frames? Give an example.
20. Explain the aggregate() function.
21. What is a quartile? How can you retrieve quartile values?
22. What is a box plot? What type of information does it show in R? Give the command to draw one.
23. What is a histogram? How do you draw one?
24. What is a scatter plot? How do you draw one?
25. What is smoothing?
26. Explain the resampling technique with an example.
27. Explain the discretization technique with an example.
28. Write down the difference between qualitative and quantitative data with examples.
30. Explain non-procedural query languages and their operations.
31. Describe cloud services in detail.
32. Explain homogeneous and heterogeneous distributed databases in detail.
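Unit 1 questions 21 and 22 (quartiles and box plots) can be illustrated with Python's `statistics.quantiles`. This is a hedged sketch. With `method="inclusive"`, that function interpolates linearly between order statistics. Working R's default `quantile()` rule (type 7) by hand on the Unit 1 EDA data gives the same quartiles: 15.75, 19 and 26. The 1.5 x IQR fence rule is the usual box-plot convention for flagging outliers. The helper names are hypothetical.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using linear interpolation for the quartiles."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)


def iqr_outliers(data, k=1.5):
    """Return values outside the box-plot fences Q1 - k*IQR and Q3 + k*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if v < lower or v > upper]


sample = [15, 10, 18, 20, 28, 32]  # Unit 1 EDA data, reused only as an illustration
print(five_number_summary(sample))  # (10, 15.75, 19.0, 26.0, 32)
print(iqr_outliers(sample))         # []
```

R's `boxplot()` draws its box from Tukey's hinges (via `fivenum()`), not from `quantile()`. On small samples the hinges can differ: for this data they are 15 and 28.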
Unit 3
(As per syllabus contents)
1. Explain the general idea of model selection techniques in machine learning.
2. Explain the concept of regularization.
3. What is bias? What is variance? Write a note on the bias-variance trade-off.
4. What are AIC and BIC?
5. Write a note on cross validation.
6. What do you mean by LASSO regression and ridge regression?
7. Write a note on dimension reduction.
8. Explain feature extraction.
9. What is supervised learning? Explain any one technique.
10. Explain the general model of regression. Give the idea with respect to R.
11. What are regression trees? Give an idea with respect to R.
12. Write a note on logistic regression.
13. Explain SVM.
14. Explain the k-NN technique.
15. Write a note on PCA.
16. Explain k-means clustering.
17. Explain hierarchical clustering.
18. Explain ensemble methods.
19. Write down the difference between Lasso and Ridge regression.
20. Write down the difference between classification and regression.
21. Apply the k-means algorithm to the following data:
Sample no.   X     Y
1            185   72
2            170   56
3            168   60
4            179   68
5            182   72
6            188   77
22. Write down the difference between logistic and linear regression.
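Question 21 in Unit 3 does not state the number of clusters or the starting centroids. This sketch assumes k = 2 and seeds the centroids with samples 1 and 2, a common choice in hand-worked answers. It runs batch (Lloyd's) k-means with Euclidean distance. Under those assumptions, the first pass puts samples 1, 4, 5 and 6 in one cluster and samples 2 and 3 in the other. The centroids then move to (183.5, 72.25) and (169, 58). The second pass changes no assignment, so the algorithm stops. Other seeds or another k can give different clusters.

```python
import math


def kmeans(points, init_centroids, max_iter=100):
    """Batch (Lloyd) k-means with Euclidean distance.

    Returns (labels, centroids); labels[i] is the cluster index of points[i].
    """
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (lowest index wins ties).
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no assignment changed, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


samples = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]
# Assumption: k = 2, initial centroids = samples 1 and 2.
labels, centroids = kmeans(samples, init_centroids=[samples[0], samples[1]])
print(labels)     # [0, 1, 1, 0, 0, 0]
print(centroids)  # [(183.5, 72.25), (169.0, 58.0)]
```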