DS QB

The document is a question bank for a T.Y.B.Sc. (CS) Data Science course, covering various topics such as data science fundamentals, data preprocessing, exploratory data analysis, machine learning algorithms, model evaluation, and data management. It includes detailed questions on concepts, techniques, and tools relevant to data science, emphasizing the importance of data quality, visualization, and ethical considerations in data handling. The content is structured into units that facilitate a comprehensive understanding of data science principles and practices.

Uploaded by

F-076Vivek Das

T.Y.B.Sc. (CS) Sem VI Data Science Question Bank

Unit I
Introduction to Data Science and Data Preprocessing:
1. Explain the concept of Data Science and its significance in modern-day industries.
2. Explain the term Data Science and its role in extracting knowledge from data.
3. Discuss three key applications of Data Science in different domains.
4. Compare and contrast Data Science with Business Intelligence (BI) in terms of
goals/objectives, methodologies, and outcomes.
5. Differentiate between Artificial Intelligence (AI) and Machine Learning (ML) with respect to
their scope and applications.
6. Analyze the relationship between Data Warehousing/Data Mining (DW-DM) and Data
Science, highlighting their similarities and differences.
7. Discuss the importance of Data Preprocessing in the Data Science pipeline and its impact on
the quality of analysis and modeling outcomes.
Data Types and Sources:
1. Define structured data and provide examples of structured datasets. Describe the
characteristics of structured data.
2. Define structured, unstructured, and semi-structured data, providing examples for each type.
3. Discuss the challenges associated with handling unstructured data and propose solutions.
4. Explain how semi-structured data differs from structured and unstructured data, citing
examples.
5. Evaluate the advantages and disadvantages of different data sources such as databases, files,
and APIs in the context of Data Science.
6. Describe the process of data collection through web scraping and its importance in data
acquisition.
7. Illustrate how data from social media platforms can be leveraged for sentiment analysis and
market research purposes.
8. Discuss the challenges associated with sensor data and social media data, and propose
strategies for handling and analyzing such data effectively.
Data Preprocessing:
1. Demonstrate the importance of data cleaning in the context of Data Science projects.
2. Describe the steps involved in data cleaning and the techniques used to handle missing values,
outliers, and duplicates.
3. Explain the rationale behind data transformation techniques such as scaling, normalization,
and encoding categorical variables.
4. Discuss the importance of feature selection in machine learning models and the criteria used
for selecting relevant features.
5. Outline the process of data merging and the challenges associated with combining multiple
datasets for analysis.
6. Discuss the challenges and strategies involved in data merging when combining multiple
datasets for analysis.
7. Analyze the impact of data preprocessing on the quality and effectiveness of machine learning
algorithms.
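Illustration for question 2 (data cleaning): a minimal sketch, using only the Python standard library, of three common cleaning steps: removing exact duplicates, imputing missing values with the median, and dropping outliers with the 1.5 x IQR rule. The records, the function names, and the choice of median imputation and IQR fences are assumptions made for this example, not techniques prescribed by the question bank.

```python
from statistics import median, quantiles

# Hypothetical raw records: (id, age); None marks a missing value.
raw = [(1, 25), (2, None), (3, 31), (3, 31), (4, 29), (5, 240), (6, 27)]

def drop_duplicates(rows):
    """Keep the first occurrence of each exact row, preserving order."""
    seen, out = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

def impute_median(rows):
    """Replace missing ages with the median of the observed ages."""
    observed = [age for _, age in rows if age is not None]
    fill = median(observed)
    return [(rid, fill if age is None else age) for rid, age in rows]

def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences of the k * IQR outlier rule."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

deduped = drop_duplicates(raw)
imputed = impute_median(deduped)
low, high = iqr_bounds([age for _, age in imputed])
cleaned = [(rid, age) for rid, age in imputed if low <= age <= high]
print(cleaned)
```

The order of the steps matters here: duplicates are removed before imputation so that a repeated row does not pull the median toward its own value.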
Data Wrangling and Feature Engineering:
1. Define data wrangling and explain its role in preparing raw data for analysis.
2. Describe common data wrangling techniques such as reshaping, pivoting, and aggregating.
3. Illustrate the concept of feature engineering and its impact on model performance, with a
focus on creating new features and handling time-series data.
4. Explain the process of dummification and feature scaling, including techniques such as
converting categorical variables into binary indicators and standardization/normalization of
numerical features. Discuss the implications of dummification on machine learning
algorithms.
5. Compare and contrast feature scaling techniques such as standardization and normalization,
discussing their effects on model training and performance.
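Illustration for questions 4 and 5 (dummification and feature scaling): a small standard-library sketch of standardization (z-scores), min-max normalization, and conversion of a categorical variable into binary indicator columns. The feature values and function names are hypothetical, and the sketch assumes each numeric feature is not constant (otherwise the divisions below would fail).

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std dev."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max_normalize(values):
    """Min-max scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def dummify(categories):
    """Build one binary indicator column per distinct category."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows

incomes = [20.0, 30.0, 40.0, 50.0, 60.0]      # hypothetical numeric feature
cities = ["Pune", "Mumbai", "Pune", "Delhi"]  # hypothetical categorical feature

z_scores = standardize(incomes)
scaled = min_max_normalize(incomes)
levels, indicators = dummify(cities)
print(levels, indicators)
```

Standardization yields values with mean 0 and unit variance but no fixed range, while min-max normalization fixes the range and is therefore more sensitive to extreme values. The indicator encoding above keeps every level; dropping one column is a common variant when a model is affected by perfectly collinear inputs.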
Tools and Libraries:
1. Explain the functionalities of popular libraries and technologies used in Data Science,
including Pandas, NumPy, and scikit-learn.
2. Describe how Pandas facilitates data manipulation tasks such as reading, cleaning, and
transforming datasets.
3. Discuss the advantages of using NumPy for numerical computing and its role in scientific
computing applications. OR Discuss the role of NumPy in numerical computing and its
advantages over traditional Python lists.
4. Explain how scikit-learn facilitates machine learning tasks such as model training,
evaluation, and deployment.
5. Discuss the importance of using libraries and technologies in Data Science projects for
efficient and scalable data analysis.
Unit II
Exploratory Data Analysis (EDA):
1. Explain the importance of exploratory data analysis (EDA) in the data science process.
2. Describe three data visualization techniques commonly used in EDA and their applications.
3. Discuss the role of histograms, scatter plots, and box plots in understanding the distribution
and relationships within a dataset.
4. Define descriptive statistics and provide examples of commonly used measures such as mean,
median, and standard deviation. OR Define descriptive statistics and discuss their role in
summarizing and understanding datasets. Compare and contrast measures such as mean,
median, mode, and standard deviation.
5. Discuss the significance of histograms, scatter plots, and box plots in visualizing different
types of data distributions.
6. Explain the concept of hypothesis testing and provide examples of situations where t-tests,
chi-square tests, and ANOVA are applicable.
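Illustration for question 4 (descriptive statistics): a short sketch using Python's built-in statistics module to compare mean, median, mode, and standard deviation on a small sample containing one extreme value. The sample is invented for illustration.

```python
from statistics import mean, median, mode, stdev

# Hypothetical sample with one extreme value (40).
sample = [2, 4, 4, 5, 7, 9, 40]

summary = {
    "mean": mean(sample),      # pulled upward by the extreme value
    "median": median(sample),  # middle value, robust to the extreme value
    "mode": mode(sample),      # most frequent value
    "stdev": stdev(sample),    # sample standard deviation (n - 1 denominator)
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Comparing the mean with the median in this way is a quick check for skew or outliers before plotting a histogram or box plot.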
Introduction to Machine Learning:
1. Differentiate between supervised and unsupervised learning algorithms, providing examples
of each.
2. Explain the concept of the bias-variance tradeoff and its implications for model performance.
3. Define underfitting and overfitting in the context of machine learning models and suggest
strategies to address each issue.
4. Explain the process of model training, validation, and testing in the context of supervised
learning algorithms.
5. Describe how clustering and dimensionality reduction are used in unsupervised learning
tasks.
6. Discuss the impact of data preprocessing techniques on model performance in supervised and
unsupervised learning tasks.
7. Provide examples of real-world applications for classification and regression tasks in
supervised learning.
Regression Analysis:
1. Explain the principles of simple linear regression and its applications in predictive modeling.
2. Discuss the assumptions underlying multiple linear regression and how they can be validated.
3. Outline the steps involved in conducting stepwise regression and its advantages in model
selection.
4. Describe logistic regression and its use in binary classification problems. OR Discuss the
application of logistic regression in classification tasks and its advantages over linear
regression.
5. Compare and contrast the assumptions underlying linear regression and logistic regression
models.
Model Evaluation and Selection:
1. Define accuracy, precision, recall, and F1-score as metrics for evaluating classification
models and explain their significance. Discuss the strengths and limitations of each metric.
2. Describe how a confusion matrix is constructed and how it can be used to evaluate model
performance.
3. Explain the concept of a ROC curve and discuss how it can be used to evaluate the
performance of binary classification models.
4. Explain the concept of cross-validation and compare k-fold cross-validation with stratified
cross-validation.
5. Describe the process of hyperparameter tuning and model selection and discuss its importance
in improving model performance.
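Illustration for questions 1 and 2 (classification metrics and the confusion matrix): a minimal sketch that counts the four confusion-matrix cells for a binary problem and derives accuracy, precision, recall, and F1-score from them. The labels and predictions are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical ground truth and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(confusion_counts(y_true, y_pred), metrics)
```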
Machine Learning Algorithms:
1. Describe the decision tree algorithm and its advantages and limitations in classification and
regression tasks.
2. Explain the principles of decision trees and random forests and their advantages in handling
nonlinear relationships and feature interactions.
3. Discuss the mathematical intuition behind support vector machines (SVM) and their
applications in both classification and regression tasks.
4. Describe artificial neural networks (ANN) and their architecture, including input, hidden, and
output layers.
5. Compare and contrast ensemble learning techniques like boosting and bagging, highlighting
their strengths and weaknesses.
6. Discuss the working principle of the K-nearest neighbors (K-NN) algorithm and its use in
classification and regression tasks.
7. Explain the concept of gradient descent and its role in optimizing the parameters of machine
learning models.
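Illustration for question 7 (gradient descent): a sketch of batch gradient descent fitting a simple linear regression y = w*x + b by minimizing mean squared error. The data, learning rate, and step count are assumptions picked so that the example converges; they are not recommended defaults.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every training point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

A learning rate that is too large makes the updates diverge, while one that is too small needs many more steps; scaling the input feature first usually makes the choice less delicate.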
Unit III
Model Evaluation Metrics:
1. Define accuracy, precision, recall, and F1-score as metrics for evaluating classification models.
Discuss their limitations, especially in the presence of imbalanced datasets. Also discuss scenarios
where each metric might be more appropriate.
2. Explain the concept of the Area Under the Curve (AUC) in ROC curve analysis. How does
AUC help in evaluating the performance of a binary classification model?
3. Discuss the challenges of evaluating models for imbalanced datasets. How do imbalanced
classes affect traditional evaluation metrics?
4. Describe techniques that can be used to address the challenges of imbalanced datasets and
ensure reliable model evaluation.
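Illustration for questions 2 and 3 (AUC and imbalanced datasets): a sketch that computes AUC from its pairwise interpretation, namely the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counted as one half), and then shows how accuracy can look strong on an imbalanced dataset while recall is zero. All labels and scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted scores.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Imbalanced case: 95 negatives, 5 positives, model always predicts negative.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)

print(f"AUC = {auc:.2f}, accuracy = {accuracy:.2f}, recall = {recall:.2f}")
```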
Data Visualization and Communication:
1. Outline the principles of effective data visualization. How do these principles contribute to
better communication of insights? OR Outline the principles of effective data visualization.
2. Outline the principles of effective data visualization, including clarity, simplicity, and
relevance.
3. What factors should be considered when creating visualizations to communicate insights?
4. Compare and contrast different types of visualizations such as bar charts, line charts, and scatter
plots. Provide examples of when each type of visualization would be appropriate.
5. Discuss the role of visualization tools such as matplotlib, seaborn, and Tableau in creating
compelling visualizations. What are the advantages and limitations of each tool?
6. Explain the concept of data storytelling. How can data storytelling enhance the impact of data
visualizations in conveying insights to stakeholders?

Data Management:
1. Define data management activities and their role in ensuring data quality and usability. OR
Provide an overview of data management activities and their importance in ensuring data
quality and usability.
2. Explain the concept of data pipelines and the stages involved in the extract, transform, and
load (ETL) process.
3. Discuss the importance of data governance and data quality assurance in maintaining data
integrity and reliability.
4. Discuss the importance of data governance and data quality assurance in maintaining data
integrity and compliance with regulatory standards.
5. Describe the considerations for data privacy and security in data management practices. Discuss
strategies for protecting sensitive data and complying with regulations such as GDPR and
HIPAA.
6. Explain the considerations and best practices for ensuring data privacy and security throughout
the data management process. What measures can organizations implement to protect sensitive
information?
7. Discuss the ethical considerations surrounding data privacy and security, including regulatory
compliance and measures to protect sensitive information.
8. Analyze the considerations for data privacy and security in data management practices. How
can organizations protect sensitive data while still enabling data-driven insights? OR Explain
the considerations for data privacy and security in data management practices. What measures
should organizations take to protect sensitive data?
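Illustration for question 2 (data pipelines and ETL): a toy extract-transform-load pipeline using only the Python standard library. The CSV text, the sales table, and the rule of dropping rows with a missing amount are all assumptions made for this sketch; a real pipeline would read from actual sources and apply agreed data-quality rules.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with stray whitespace and one missing amount.
RAW_CSV = """id,name,amount
1, Asha ,100.5
2,Ravi,
3,Meera,42
"""

def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim names, cast types, drop rows with no amount."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # assumed rule: skip records with a missing amount
        clean.append((int(row["id"]), row["name"].strip(),
                      float(row["amount"])))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
row_count, total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
print(row_count, total)
```

Keeping the three stages as separate functions makes each one testable on its own, which is also where data quality checks and governance rules are usually attached.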
