Ocs353 DSF Question Bank 25-26

This document is a question bank for the Data Science Fundamentals course (OCS353) offered by the Department of Mechanical Engineering for the academic year 2025-2026. It outlines course outcomes, knowledge levels based on Bloom's taxonomy, and provides a comprehensive list of questions categorized by units covering topics such as data science processes, data manipulation, and machine learning. Each section includes questions with varying difficulty levels and marks, aimed at assessing students' understanding and application of data science concepts.

DEPARTMENT OF MECHANICAL ENGINEERING

QUESTION BANK

SUBJECT CODE: OCS353    YEAR / SEM: IV/07

SUBJECT NAME: DATA SCIENCE FUNDAMENTALS    ACADEMIC YEAR: 2025-2026

NAME OF THE FACULTY: Mr. ARULKUMAR.R

Course Outcomes

After successful completion of the course, the students should be able to:

Course Outcome No    Course Outcomes

CO405.1 Gain knowledge on data science process.

CO405.2 Perform data manipulation functions using NumPy and Pandas.

CO405.3 Understand different types of machine learning approaches.

CO405.4 Perform data visualization using tools.

CO405.5 Handle large volumes of data in practical scenarios.

Knowledge Level (Bloom's Taxonomy)

K1 Remembering (Knowledge)    K2 Understanding (Comprehension)    K3 Applying (Application of Knowledge)
K4 Analysing (Analysis)    K5 Evaluating (Evaluation)    K6 Creating (Synthesis)


UNIT 1- INTRODUCTION
Data Science: Benefits and uses – facets of data – Data Science Process: Overview – Defining
research goals – Retrieving data – Data preparation – Exploratory Data analysis – build the
model– presenting findings and building applications – Data Mining – Data Warehousing –
Basic Statistical descriptions of Data
PART - A

Q.No | Questions | Topic | BT Level | Mark
1. Define Data Science and Big Data. (NOV/DEC 2022) | Data Science: Uses and Benefits | K1 | 2 Marks
2. What is the role of data science in business, medical research, healthcare, education, social media, technology and financial institutions? | Data Science: Uses and Benefits | K1 | 2 Marks
3. Write the main types/categories of data. (AU NOV/DEC 2023) | Facets of Data | K1 | 2 Marks
4. How are missing values present in a dataset treated during the data analysis phase? (AU APR/MAY 2024) | Data Science Process | K1 | 2 Marks
5. Define Median with example. (AU NOV/DEC 2023) | Statistical Description of Data | K2 | 2 Marks
6. Identify and write down various data analytic challenges faced in the conventional system. (AU APR/MAY 2024) | Data Science Process | K2 | 2 Marks
7. Define Data Mining. (APRIL/MAY 2023) | Data Mining | K2 | 2 Marks
8. Give an overview of Common Errors. (NOV/DEC 2023) | Data Science Process | K1 | 2 Marks
9. Differentiate between Data Science and Big Data. | Data Science | K2 | 2 Marks
10. List an overview of common errors in retrieving data and which cleansing solutions are to be employed. (NOV/DEC 2022) | Data Science Process | K2 | 2 Marks
11. What is Structured data? (NOV/DEC 2023) | Facets of Data | K2 | 2 Marks
12. Outline the difference between structured and unstructured data. (APRIL/MAY 2023) | Facets of Data | K2 | 2 Marks
13. Define Dummy Variables. | Statistical Description of Data | K2 | 2 Marks
14. What is data preparation and process? | Data Science Process | K2 | 2 Marks
15. What is data modeling and machine generated data? | Data Science Process | K2 | 2 Marks
16. What is machine generated data? | Data Science Process | K2 | 2 Marks
17. Define data cleansing. | Data Science Process | K2 | 2 Marks
18. Define data warehousing, data mart and data lake. | Data Warehousing | K2 | 2 Marks
19. Mention the significance of setting goals in a data science project. | Data Science Process | K2 | 2 Marks
20. What is graph-based or network data? | Data Science Process | K2 | 2 Marks
PART – B
1. Examine the different facets of data with the challenges in their processing. (AU NOV/DEC 2022) | Facets of Data | K1 | 13 Marks
2. Elaborate the steps in the data science process with a diagram (AU NOV/DEC 2022, APR/MAY 2023) (or) How do you set the research goal, retrieve data and prepare data in the data science process? | Data Science Process | K1 | 13 Marks
3. What is a data warehouse? Outline the architecture of a data warehouse with a diagram. (AU APR/MAY 2023) | Data Warehousing | K2 | 13 Marks
4. Briefly explain the architecture of data mining. | Data Mining | K1 | 13 Marks
5. (i) Explain about cleaning, integrating and transforming data in detail. (AU NOV/DEC 2023) (6)
   (ii) Explain the Basic Statistical description of data. | Exploratory data analysis & Statistical Description of Data | K2 | 13 Marks
6. (i) Explain Data Analytic life cycle. Brief about Time-series Analysis. (6)
   (ii) Outline the purpose of data cleansing. How are missing and nullified data attributes handled and modified during the preprocessing stage? (AU APR/MAY 2024) (7) | Data Preparation | K2 | 13 Marks
7. (i) Suppose there is a dataset having variables with missing values of more than 30%. How will you deal with such a dataset?
   (ii) List down the various feature selection methods for selecting the right variables for building efficient predictive models. Explain about any two selection methods. (AU APR/MAY 2024) | Data Preparation | K2 | 13 Marks
PART – C (If applicable)
1. Challenges and implementation of data mining? | Data Mining | K2 | 15 Marks
2. Find the following for the given data set: Mean, Median, Mode, Variance, Standard Deviation and Skewness. (AU NOV/DEC 2023) | Statistical Description of Data | K2 | 15 Marks
   MARKS:           0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
   NO OF STUDENTS:    10     40     20      0     10     40     16     14
3. If the collected dataset is population.csv, what are all the steps for the data preparation? Elaborate in detail. | Data Preparation | K2 | 15 Marks
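
The grouped-frequency question above (Part C, Q2) can be worked through in plain Python. The following is only a sketch under stated assumptions: each class is represented by its midpoint, population (not sample) formulas are used, skewness is taken as the moment coefficient, and all variable and function names are illustrative rather than prescribed by the syllabus. Note that two classes share the highest frequency, so the sketch reports both modal classes instead of a single mode.

```python
import math

# Class intervals (lower, upper) and frequencies copied from the question.
intervals = [(0, 10), (10, 20), (20, 30), (30, 40),
             (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [10, 40, 20, 0, 10, 40, 16, 14]

n = sum(freqs)                                  # total number of students
mids = [(lo + hi) / 2 for lo, hi in intervals]  # class midpoints (assumption)

# Mean, population variance and standard deviation of the grouped data.
mean = sum(f * x for f, x in zip(freqs, mids)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n
std_dev = math.sqrt(variance)

# Moment coefficient of skewness (third central moment / std_dev cubed).
skewness = sum(f * (x - mean) ** 3 for f, x in zip(freqs, mids)) / (n * std_dev ** 3)


def grouped_median(intervals, freqs):
    """Median by linear interpolation inside the median class."""
    half = sum(freqs) / 2
    cumulative = 0
    for (lo, hi), f in zip(intervals, freqs):
        if f > 0 and cumulative + f >= half:
            return lo + (half - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("empty frequency table")


median = grouped_median(intervals, freqs)

# Every class whose frequency equals the maximum is a modal class.
modal_classes = [iv for iv, f in zip(intervals, freqs) if f == max(freqs)]

print(mean, median, modal_classes, variance, std_dev, skewness)
```

If the examiner expects sample variance instead, dividing by `n - 1` rather than `n` is the only change needed.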

UNIT 2- DATA MANIPULATION


Python Shell – Jupyter Notebook – IPython Magic Commands – NumPy Arrays-Universal
Functions – Aggregations – Computation on Arrays – Fancy Indexing – Sorting arrays – Structured
data – Data manipulation with Pandas – Data Indexing and Selection – Handling missing data –
Hierarchical indexing – Combining datasets – Aggregation and Grouping – String operations –
Working with time series – High performance
PART – A
Q.No | Questions | Topic | BT Level | Mark
1. State the advantages of using NumPy arrays. (APRIL/MAY 2023) | NumPy Arrays | K2 | 2 Marks
2. Outline the two types of NumPy's UFuncs. (APRIL/MAY 2023) | Universal functions | K2 | 2 Marks
3. List the attributes of a NumPy array. Give an example for it. (NOV/DEC 2022) | NumPy Arrays | K2 | 2 Marks
4. Create a data frame with key and data pairs as A-10, C-20, C-5, B-10, C-10. Find the sum of each key and display the result as each key group. (NOV/DEC 2022) | Data manipulation with pandas | K2 | 2 Marks
5. Explain Partial sort. (AU NOV/DEC 2023) | Sorting arrays | K2 | 2 Marks
6. Under what circumstances is pivot_table() in pandas used? (AU APR/MAY 2024) | Data manipulation with pandas | K2 | 2 Marks
7. Write the output for the following NumPy code: | NumPy Arrays | K2 | 2 Marks
   (i) np.array([3,14,4,2,3])
   (ii) np.array([1,2,3,4],dtype='float32')
   (iii) np.array([range(i,i+3) for i in [2,4,6]])
   (iv) np.zeros(10,dtype=int)
   (v) np.ones((3,5), dtype=float)
   (vi) np.full((3,5),3.14)
   (vii) np.arange(0,20,20)
   (viii) np.linspace(0,1,50)
   (ix) np.random.random((3,3))
   (x) np.random.normal(0,1,(3,3))
8. Using appropriate data visualization modules, develop a Python code snippet that generates a simple sinusoidal wave in an empty gridded axes. (AU APR/MAY 2024) | Data manipulation with pandas | K2 | 2 Marks
9. Summarize some built-in Pandas aggregations. (NOV/DEC 2023) | Aggregation and Grouping | K2 | 2 Marks
10. What is fancy indexing? | Fancy Indexing | K2 | 2 Marks
11. What are indexers? | Hierarchical indexing | K2 | 2 Marks
12. How can missing data be handled in Python? | Handling missing data | K2 | 2 Marks
13. How can operations be performed on null values in pandas? | Handling missing data | K2 | 2 Marks
14. Define Hierarchical indexing. | Hierarchical indexing | K2 | 2 Marks
15. What is a pivot table? | Data manipulation with pandas | K2 | 2 Marks
16. What is a Data frame? | Data manipulation with pandas | K2 | 2 Marks
17. How do you verify the shape of 1D, 2D and 3D/ND arrays respectively? | Data manipulation with pandas | K2 | 2 Marks
18. Write a short note on the Python array object. | NumPy Arrays | K2 | 2 Marks
19. How to perform slicing to access the elements of NumPy arrays? | NumPy Arrays | K2 | 2 Marks
20. Compare Python lists with arrays. | NumPy Arrays | K2 | 2 Marks
PART - B
1. Imagine you have a series of data that represents the amount of precipitation each day for a year in a given city. Load the daily rainfall statistics for the City of Chennai in 2021, which is given in a csv file Chennairainfall2021.csv, using pandas. Generate a histogram for rainy days, and find out the days that have high rainfall. (NOV/DEC 2022) | Data manipulation with pandas | K3 | 13 Marks
2. Consider that an E-Commerce organization like Amazon has different regions' sales as NorthSales, SouthSales, WestSales, EastSales.csv files. They want to combine North and West region sales and South and East sales so as to find the aggregate sales of these collaborating regions. Help them to do so using Python code. (NOV/DEC 2022) | Data manipulation with pandas | K3 | 13 Marks
3. Explain grouping in Python with example. (AU NOV/DEC 2023) | Aggregation and Grouping | K3 | 13 Marks
4. (i) Describe about fancy indexing with an example.
   (ii) Briefly explain the hierarchical indexing with examples. | Fancy Indexing & Hierarchical indexing | K3 | 13 Marks
5. Explain the following in Python: (i) Data indexing (ii) Operation on missing data (iii) Data objects in pandas. (AU NOV/DEC 2023) | Data Indexing and Selection | K2 | 13 Marks
6. What is a universal function? Explain clearly each function with examples. | Universal Functions | K3 | 13 Marks
7. Define Dictionary in Python. Do the following operations on dictionaries. (AU APR/MAY 2024) | Data Indexing and Selection | K3 | 13 Marks
   (i) Initialize two dictionaries (D1 and D2) with key and value pairs.
   (ii) Compare those two dictionaries with master key list 'M' and print the missing keys.
   (iii) Find keys that are in D1 but NOT in D2.
   (iv) Merge D1 and D2 and create D3 using expressions.
PART – C (If applicable)
1. Describe in detail about pivot table. (AU NOV/DEC 2023) | Data manipulation with pandas | K3 | 15 Marks
2. Given unsorted multi-indexes that represent the distance between two cities, write a Python code snippet using appropriate libraries to find the shortest distance between any two given cities. The following matrix representation can be used to create the data frame that serves as input for the prescribed program. (AU APR/MAY 2024) | Data Indexing and Selection | K3 | 15 Marks
        A    B    C    D    E
   A    0   30   24    6   13
   B   16    0   19    5   10
   C    7   16    0   15   12
   D    9   17   22    0   18
   E   21    8    9   11    0
3. A URL server wants to consolidate a history of websites visited by a user 'U'. Every visited website's information is stored in a 2-tuple format, viz. (website_id, Duration_of_visit), in the URL cache. Using split, apply and combine operations, devise a code snippet that consolidates the website history and finds the website whose duration of visit is maximum. | Data Indexing and Selection | K3 | 15 Marks
   Example:
   Input: [(4,2),(5,1),(4,3),(1,4),(7,3),(5,2),(1,1),(7,1)]
   Output: [(4,5),(5,3),(1,5),(7,4)]
   The website with key_id '1' has the max. duration of visit = 5.
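
The split, apply and combine steps asked for in the URL-history question can be sketched with pandas `groupby`. This is a minimal illustration, not a model answer: the column names `website_id` and `duration` are assumptions introduced here, and `sort=False` is used only so that the consolidated list keeps the first-seen order shown in the question's sample output. In the sample data, websites 4 and 1 both total 5, so the sketch reports every tied website rather than a single one.

```python
import pandas as pd

# (website_id, duration_of_visit) pairs copied from the question's example.
visits = [(4, 2), (5, 1), (4, 3), (1, 4), (7, 3), (5, 2), (1, 1), (7, 1)]
df = pd.DataFrame(visits, columns=["website_id", "duration"])

# Split by website_id, apply sum to each group, combine into one Series.
# sort=False keeps groups in order of first appearance.
totals = df.groupby("website_id", sort=False)["duration"].sum()

# Consolidated history as a list of (website_id, total_duration) tuples.
consolidated = list(zip(totals.index.tolist(), totals.tolist()))

# Maximum total duration and every website that reaches it (ties included).
max_duration = int(totals.max())
top_sites = totals[totals == max_duration].index.tolist()

print(consolidated)
print(top_sites, max_duration)
```

Using `totals.idxmax()` instead would return only the first website that reaches the maximum, which hides the tie.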

UNIT 3- MACHINE LEARNING


The modeling process - Types of machine learning - Supervised learning - Unsupervised learning -
Semi-supervised learning- Classification, regression - Clustering – Outliers and Outlier Analysis
PART - A
Q.No | Questions | Topic | BT Level | Mark
1. What is the niche of Machine Learning? (Nov/Dec’23) | Machine Learning | K1 | 2 Marks
2. Mention the difference between Data Mining and Machine Learning. | Machine Learning | K1 | 2 Marks
3. What is ‘Overfitting’ in Machine Learning? | Machine Learning | K2 | 2 Marks
4. What is the key difference between supervised and unsupervised machine learning? (April/May 2023) | Machine Learning | K2 | 2 Marks
5. Define multiple regression. (NOV/DEC 2023) | Linear Regression | K1 | 2 Marks
6. What are the five popular algorithms of Machine Learning? | Machine Learning | K2 | 2 Marks
7. Define regression towards the mean. (NOV/DEC 2023) | Least squares | K2 | 2 Marks
8. What is Random forest? (April/May 2023) | Random forest | K2 | 2 Marks
9. What is the standard approach to supervised learning? | Machine Learning | K2 | 2 Marks
10. What is meant by k-means algorithm? | Machine Learning | K1 | 2 Marks
11. What is the difference between artificial intelligence and machine learning? | Machine Learning | K2 | 2 Marks
12. State the DBSCAN Algorithm. | Machine Learning | K2 | 2 Marks
13. What is Linear Regression? | Linear Regression | K2 | 2 Marks
14. What is the difference between classification and regression? | Regression | K2 | 2 Marks
15. What is the difference between K-means and KNN? | Clustering | K2 | 2 Marks
16. How to train a model in machine learning? | Machine Learning | K2 | 2 Marks
17. What is the classification algorithm? | Classification Algorithm | K2 | 2 Marks
18. Define regression line. | Regression | K2 | 2 Marks
19. State the learners in a classification problem. Mention about lazy learners. | Classification Algorithm | K2 | 2 Marks
20. What are the steps involved in the data science process? | Data science process | K2 | 2 Marks
PART - B
1. Explain various learning techniques involved in Unsupervised Learning. (AU APR/MAY 2024) | Unsupervised Learning | K3 | 13 Marks
2. Explain the types of Machine learning. | Machine Learning | K3 | 13 Marks
3. Assume an image has pixel size 240x180. Elaborate how K-means clustering can be used to achieve lossy data compression of that image. (Nov/Dec’23) | Clustering | K2 | 13 Marks
4. Explain Semi Supervised Learning in detail. | Machine Learning | K2 | 13 Marks
5. List the applications of clustering and identify the advantages and disadvantages of clustering algorithms. (AU APR/MAY 2024) | Clustering | K2 | 13 Marks
6. What is a Classification Algorithm? Explain the steps to construct a Classification Algorithm. List and explain the different procedures used. (April/May 2023) | Classification Algorithm | K2 | 13 Marks
7. Explain in detail about outlier analysis. (Nov/Dec’23) | Outlier | K2 | 13 Marks
PART – C (If applicable)
1. Consider five points {x1, x2, x3, x4, x5} with the following coordinates as two-dimensional samples for clustering: x1=(0.5, 1.75), x2=(1, 2), x3=(1.75, 0.25), x4=(4, 1), x5=(6, 3). Illustrate the k-means algorithm on the above data set. The required number of clusters is two and initially, clusters are formed from a random distribution of samples: C1={x1, x2, x4} and C2={x3, x5}. (April/May 2023) | Clustering | K3 | 15 Marks
2. List non-parametric techniques and explain K-nearest neighbor estimation. | Clustering | K3 | 15 Marks
3. The values of x and their corresponding values of y are shown in the table below. (April/May 2023) | Linear Regression Models: Least squares | K3 | 15 Marks
   x:  1  2  3  4  5  6   7
   y:  3  4  5  5  6  8  10
   (i) Find the least squares regression line y=ax+b.
   (ii) Estimate the value of y when x=10.
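
The least squares question above (Part C, Q3) follows directly from the normal equations for a line y = ax + b, where a = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and b = (Σy - aΣx) / n. The sketch below applies those formulas to the table in the question using plain Python; the variable names are illustrative only.

```python
# Data copied from the question.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [3, 4, 5, 5, 6, 8, 10]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_xx = sum(x * x for x in xs)

# Slope and intercept from the normal equations of simple linear regression.
a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
b = (sum_y - a * sum_x) / n

# Estimate of y at x = 10 using the fitted line.
y_at_10 = a * 10 + b

print(f"y = {a:.4f}x + {b:.4f}; y(10) = {y_at_10:.4f}")
```

With these sums (Σx = 28, Σy = 41, Σxy = 194, Σx² = 140) the slope works out to 15/14 and the intercept to 11/7, which gives 86/7 at x = 10.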
UNIT 4- DATA VISUALIZATION
Importing Matplotlib – Line plots – Scatter plots – visualizing errors – density and contour plots –
Histograms – legends – colors – subplots – text and annotation – customization – three dimensional
plotting – Geographic Data with Basemap – Visualization with Seaborn.
PART – A
Q.No | Questions | Topic | BT Level | Mark
1. What is the purpose of the errorbar function in Matplotlib? Give an example. (NOV/DEC 2022) | Visualizing errors | K2 | 2 Marks
2. Showcase 3-dimensional drawing in Matplotlib with corresponding Python code. (NOV/DEC 2022) | Three dimensional plotting | K2 | 2 Marks
3. State the two possible options in IPython notebook used to embed graphics directly in the notebook. (APRIL/MAY 2023) | Three dimensional plotting | K2 | 2 Marks
4. How does the plt.scatter function differ from the plt.plot function? (APRIL/MAY 2023) | Scatter plots | K2 | 2 Marks
5. What is the purpose of Matplotlib? | Importing Matplotlib | K2 | 2 Marks
6. Write the dual interface of Matplotlib. | Importing Matplotlib | K2 | 2 Marks
7. How to draw a simple line plot using Matplotlib? | Line plots | K2 | 2 Marks
8. What functions can be used to draw scatter plots? | Scatter plots | K2 | 2 Marks
9. Write the difference between plot and scatter functions. | Scatter plots | K2 | 2 Marks
10. Define contour plots. | Density and contour plots | K2 | 2 Marks
11. What functions can be used to draw contour plots? | Density and contour plots | K2 | 2 Marks
12. Write a Python code snippet that generates a time-series graph representing COVID-19 incidence cases for a particular week. (AU APR/MAY 2024) | Histograms | K2 | 2 Marks
    Day:    1   2   3   4   5   6   7
    Cases:  7  18   9  44   2   5  89
13. Write a Python code snippet that draws a histogram for the following list of positive numbers: 7 18 9 44 2 5 89 91 11 6 77 85 91 (AU APR/MAY 2024) | Histograms | K2 | 2 Marks
14. How to create a 3-D wireframe plot? | Three dimensional plotting | K2 | 2 Marks
15. Define surface plot. | Three dimensional plotting | K2 | 2 Marks
16. Define plot legends. | Density | K2 | 2 Marks
17. What are subplots? | Subplots | K2 | 2 Marks
18. Define Kernel Density estimation. | Density | K2 | 2 Marks
19. Define scatter plots. | Scatter plots | K2 | 2 Marks
20. Define cylindrical projections. | Scatter plots | K2 | 2 Marks

PART – B
1. Explain about Matplotlib with its import, setting styles and displaying the plots. (NOV/DEC 2022) | Text and annotation | K3 | 13 Marks
2. Appraise the following with appropriate Python code: (i) Histograms (ii) Binnings (iii) Density. (NOV/DEC 2022) | Histogram | K3 | 13 Marks
3. Explain in detail about three dimensional plotting in Matplotlib. (AU NOV/DEC 2023) | Customization | K1 | 13 Marks
4. Explain various features of the Matplotlib platform used for data visualization and illustrate its challenges. (AU NOV/DEC 2023) | Importing Matplotlib | K1 | 13 Marks
5. Explain contour plot and density. | Density and contour plots | K2 | 13 Marks
6. Write a code snippet that projects our globe as a 2-D flat surface (using cylindrical projection) and conveys information about the location of any three major Indian cities in the map (using scatter plot). (AU APR/MAY 2024) | Three dimensional plotting | K3 | 13 Marks
7. (i) Write a working code that performs a simple Gaussian process regression (GPR), using the Scikit-Learn API.
   (ii) Briefly explain about visualization with Seaborn. Give an example working code segment that represents a 2-D kernel density plot for any data. (AU APR/MAY 2024) | Visualization with Seaborn | K3 | 13 Marks
PART – C (If applicable)
1. Perform an exploratory data analysis for the following data with different types of plots. (NOV/DEC 2022) | Visualization with Seaborn | K3 | 15 Marks
   The dataset contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago’s Billings Hospital on the survival of patients who had undergone surgery for breast cancer.
   Data attributes:
   Age of patient at the time of operation (numerical)
   Patient’s year of operation (year - 1900, numerical)
   Number of positive axillary nodes detected (numerical)
   Survival status (class attribute): 1 = the patient survived 5 years or longer, 2 = the patient died within 5 years.
2. Explain in detail about Visualizing a Mobius Strip. | Three dimensional plotting | K3 | 15 Marks
3. Explain about Geographic data with Basemap with different Map Projections, Map background and Plotting data in Maps. | Geographic data with Basemap | K3 | 15 Marks
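
As an illustration of the kind of answer the Matplotlib questions in this unit call for, the weekly COVID-19 incidence question (Part A, Q12) can be sketched as a simple line plot. This is a minimal sketch, assuming the third row of the question's table holds the daily case counts; the axis labels and title are choices made here, and the non-interactive `Agg` backend is selected only so the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt

# Day numbers and case counts copied from the question's table.
days = [1, 2, 3, 4, 5, 6, 7]
cases = [7, 18, 9, 44, 2, 5, 89]

# Object-oriented interface: one figure with a single gridded axes.
fig, ax = plt.subplots()
ax.plot(days, cases, marker="o")
ax.set_xlabel("Day")
ax.set_ylabel("Incidence cases")
ax.set_title("COVID-19 incidence cases for one week")
ax.grid(True)
```

In a notebook or interactive session, dropping the `matplotlib.use("Agg")` line and calling `plt.show()` displays the figure; replacing `ax.plot(days, cases, ...)` with `ax.hist(values)` gives the histogram asked for in Q13.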
UNIT 5- HANDLING LARGE DATA
Problems - techniques for handling large volumes of data - programming tips for dealing with large
data sets- Case studies: Predicting malicious URLs, Building a recommender system - Tools and
techniques needed - Research question - Data preparation - Model building – Presentation and
automation.
PART – A
Q.No | Questions | Topic | BT Level | Mark
1. What is meant by large data? | Techniques for handling large volumes of data | K2 | 2 Marks
2. What is the problem caused by memory speed? | Problems | K2 | 2 Marks
3. Mention a scenario in which problems arise when working with higher memory/speed requirements. | Problems | K2 | 2 Marks
4. Which library is most efficient for protecting from malicious websites? | Predicting malicious URLs | K2 | 2 Marks
5. Give a dataset example for malicious URLs. | Predicting malicious URLs | K2 | 2 Marks
6. Define a research goal for finding out whether a URL is a malicious one or not. | Predicting malicious URLs | K2 | 2 Marks
7. What are the types of algorithms to look for while choosing the right algorithm? | Programming tips for dealing with large data sets | K2 | 2 Marks
8. How does a MySQL database connect to a Python library? | Python library | K2 | 2 Marks
9. Give a logic on how the perceptron classifies (0 or 1) from the given data. | Building a recommender system | K2 | 2 Marks
10. What are train functions and type? | Building a recommender system | K2 | 2 Marks
11. What is a tree structure? | Techniques for handling large volumes of data | K2 | 2 Marks
12. What is sparse data? | Techniques for handling large volumes of data | K2 | 2 Marks
13. If we have a data set with 10,000 rows but we feed only 100 rows of data, then the technique is known as? | Programming tips for dealing with large data sets | K2 | 2 Marks
14. How do streaming algorithms accept the data? | Programming tips for dealing with large data sets | K2 | 2 Marks
15. Which are the Python tools available for linear regression coefficient estimation? | Tools and techniques needed | K2 | 2 Marks
16. Define perceptron. | Techniques for handling large volumes of data | K2 | 2 Marks
17. State the different Python tools. | Python tools | K2 | 2 Marks
18. What is meant by Hamming distance? | Programming tips for dealing with large data sets | K2 | 2 Marks
19. What are the general problems you face while handling large data? | Handling large volumes of data | K2 | 2 Marks
20. What is meant by predicting malicious URLs? | Predicting malicious URLs | K2 | 2 Marks
PART – B
1. Explain in detail the problems faced when handling large amounts of data. | Techniques for handling large volumes of data | K3 | 13 Marks
2. Explain the case study for Predicting Malicious URLs. | Predicting malicious URLs | K3 | 13 Marks
3. Explain the case study for Building a Recommender system inside a DB. | Building a recommender system | K3 | 13 Marks
4. What are Online Learning algorithms? Explain with appropriate real time examples. | Programming tips for dealing with large data sets | K1 | 13 Marks
5. How to train the perceptron with observations? Explain about the function used. | Model building | K2 | 13 Marks
6. Elaborate on the train functions with an appropriate coding example. | Building a recommender system | K3 | 13 Marks
7. Explain in detail the technique used to divide a large matrix into small ones. | Data Preparation | K3 | 13 Marks
PART – C (If applicable)
1. Elaborate on the data science process for the following case study: (i) Reddit classification case study, which involves creating a classification model capable of distinguishing posts about data science. | Data Preparation | K3 | 15 Marks
2. Why does data structure selection play a crucial role in designing a proper algorithm? Bring out the usage of sparse data in this process with relevant examples. | Programming tips for dealing with large data sets | K3 | 15 Marks
3. A website involves a social network analysis to analyze the connectivity of people and what they communicate on. Build a tree data structure for the following scenario. | Programming tips for dealing with large data sets | K3 | 15 Marks
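
Several questions in this unit concern how a perceptron classifies observations as 0 or 1 and how it is trained one observation at a time, which is also the basic idea behind online learning. The following is a minimal pure-Python sketch, not the course's reference implementation: the function names `predict` and `train`, the learning rate, the number of epochs and the logical-AND toy data are all assumptions chosen for illustration.

```python
def predict(weights, bias, x):
    """Return 1 if the weighted sum plus bias is positive, otherwise 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


def train(observations, labels, epochs=20, learning_rate=1.0):
    """Perceptron rule: adjust weights only when an observation is misclassified."""
    weights = [0.0] * len(observations[0])
    bias = 0.0
    for _ in range(epochs):
        # Observations are processed one at a time, as in online learning.
        for x, y in zip(observations, labels):
            error = y - predict(weights, bias, x)  # -1, 0 or +1
            if error != 0:
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * error
    return weights, bias


# Toy data (assumption): the logical AND of two binary inputs.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]

weights, bias = train(X, y)
predictions = [predict(weights, bias, x) for x in X]
print(weights, bias, predictions)
```

Because each update uses a single observation, the same loop could consume rows streamed from disk or a database, so the full data set never has to fit in memory.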
