Unit I & II FDS - II AI & DS - Question Bank

Uploaded by

sridharan

Fundamentals of Data Science

Unit – I (MCQ)
1. What is the primary goal of data science?
a) To develop algorithms b) To extract insights and knowledge from data
c) To program machines d) To create databases
Answer: b) To extract insights and knowledge from data
2. Which of the following is one of the key data science skills?
a) Data Visualization b) Machine Learning c) Statistics d) All of the
mentioned
Answer: d) All of the mentioned
3. Which of the following is a common method for data preprocessing?
a) Data normalization b) Data storage c) Data aggregation d) Data
visualization
Answer: a) Data normalization
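Note on question 3: a minimal Python sketch of min-max normalization, one common form of data normalization. The function name `min_max_normalize` and the sample values are illustrative assumptions, not part of the question bank.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_normalize([10, 20, 30, 40, 50])
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

After rescaling, features measured in different units share the same 0 to 1 range, which is why normalization is usually treated as a preprocessing step.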
4. Which of the following is the most important thing in data science?
a) data b) question c) answer d) none of the mentioned
Answer: b) question
5. What technique is commonly used in data science for making predictions?
a) Data cleaning b) Data Storage c) Machine Learning d) Data Encoding
Answer: c) Machine Learning
6. In which of the following fields can data science be applied?
a) Healthcare b) Finance c) Retail d) All of the above
Answer: d) All of the above
7. Which of the following is a common use case of data science in healthcare?
a) Predicting customer preferences b) Detecting fraudulent transactions
c) Diagnosing diseases and predicting patient outcomes d) Managing inventory
levels
Answer: c) Diagnosing diseases and predicting patient outcomes
8. What is the role of data science in marketing?
a) To increase product prices b) To predict trends and customer behavior
c) To reduce the size of the marketing team d) To avoid customer feedback
Answer: b) To predict trends and customer behavior
9. Which type of analysis in data science is commonly used in business decision-
making?
a) Descriptive analysis b) Predictive analysis c) Prescriptive analysis d) All of
the above
Answer: d) All of the above
10. What are the two main types of data?
a) Structured and Unstructured b) Quantitative and Qualitative
c) Predictive and Descriptive d) Historical and Real-time
Answer: a) Structured and Unstructured
11. Which of the following is an example of unstructured data?
a) A database with customer information b) A text document or social media
posts
c) A table in Excel d) A CSV file
Answer: b) A text document or social media posts
12. Which of the following is an example of structured data?
a) Emails b) A spreadsheet containing sales data c) Audio files d) Images
Answer: b) A spreadsheet containing sales data
13. Which of the following is an example of a source of structured data?
a) Email content b) Customer database in a spreadsheet
c) Social media posts d) Audio recordings
Answer: b) Customer database in a spreadsheet
14. What is the first step in the data science process?
a) Model Evaluation b) Data Collection c) Data Cleaning d) Data
Visualization
Answer: b) Data Collection
15. Which of the following is a key task in the data preparation phase of the data
science process?
a) Running machine learning algorithms
b) Cleaning and transforming raw data into a usable format
c) Creating data models d) Making predictions based on the data
Answer: b) Cleaning and transforming raw data into a usable format
16. Which step in the data science process involves selecting the appropriate
model for the problem?
a) Data Collection b) Data Cleaning c) Model Building d) Model Evaluation
Answer: c) Model Building
17. Which of the following is typically used during the "data cleaning" phase?
a) Identifying outliers b) Selecting the model c) Making predictions d)
Visualizing the data
Answer: a) Identifying outliers
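Note on question 17: one way to identify outliers during data cleaning is the 1.5 x IQR rule. The question bank does not name a specific method, so this rule, the function name `iqr_outliers`, and the sample readings are assumptions chosen for illustration.

```python
from statistics import quantiles

def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # first, second, and third quartile
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

readings = [12, 14, 15, 15, 16, 17, 18, 95]
outliers = iqr_outliers(readings)
print(outliers)  # [95]
```

Flagging a value is only the first step; whether to drop, cap, or keep it depends on whether it is an error or a genuine extreme observation.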
18. In which phase of the data science process are algorithms such as linear
regression or decision trees applied?
a) Data Collection b) Data Cleaning c) Model Building d) Model Evaluation
Answer: c) Model Building
19. What is the primary goal of data cleansing?
a) To remove irrelevant data b) To increase the size of the dataset
c) To convert raw data into a usable format d) To transform data into features
for analysis
Answer: c) To convert raw data into a usable format
20. Which of the following is a common data cleansing task?
a) Removing duplicates b) Normalizing the data
c) Combining data from different sources d) Building predictive models
Answer: a) Removing duplicates
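Note on question 20: a small sketch of removing exact duplicate records while keeping the original order. The customer records are made-up sample data, and counting the removed rows is one simple way to review the impact of deduplication before accepting it.

```python
records = [
    ("C001", "Asha"),
    ("C002", "Ravi"),
    ("C001", "Asha"),   # exact duplicate of the first record
    ("C003", "Meena"),
]

# dict.fromkeys keeps the first occurrence of each record and preserves order.
deduped = list(dict.fromkeys(records))
removed = len(records) - len(deduped)

print(deduped)
print(f"removed {removed} duplicate record(s)")
```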
21. Which of the following is a common issue encountered during data
integration?
a) Missing values b) Inconsistent data formats c) Duplicate records d) All of
the above
Answer: d) All of the above
22. Which of the following is NOT a typical technique used in EDA?
a) Histogram b) Box plot c) K-means clustering d) Scatter plot
Answer: c) K-means clustering
23. What does a histogram show in EDA?
a) The distribution of a categorical variable
b) The relationship between two continuous variables
c) The frequency distribution of a numerical variable
d) The correlation between features
Answer: c) The frequency distribution of a numerical variable
24. Which of the following methods is used to handle missing values during EDA?
a) Dropping rows with missing values
b) Imputing missing values using the mean, median, or mode
c) Both A and B
d) Creating a new column for missing values
Answer: c) Both A and B
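Note on question 24: both options can be sketched in a few lines of Python. The `ages` list is made-up sample data, with `None` standing in for a missing value; the mean is used for imputation here, and the median or mode could be substituted in the same way.

```python
from statistics import mean

ages = [25, None, 31, 40, None, 28]

# Option (a): drop the entries that are missing.
dropped = [a for a in ages if a is not None]

# Option (b): impute each missing entry with the mean of the observed values.
fill_value = mean(dropped)
imputed = [fill_value if a is None else a for a in ages]

print(dropped)   # [25, 31, 40, 28]
print(imputed)   # [25, 31, 31, 40, 31, 28]
```

Dropping shrinks the dataset, while imputing keeps every row at the cost of introducing estimated values.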
25. Which of the following is a common method to visualize the distribution of a
numerical variable in EDA?
a) Bar chart b) Histogram c) Pie chart d) Line plot
Answer: b) Histogram
26. What is the first step in the data retrieval process?
a) Analyzing data b) Defining the research question or goal
c) Building a predictive model d) Cleaning the dataset
Answer: b) Defining the research question or goal
27. What should be considered when setting a research goal?
a) The data's format and quality b) The data’s source and accessibility
c) The specific objectives and outcomes desired from the research
d) All of the above
Answer: d) All of the above
28. Which method is most commonly used for retrieving data from online
sources?
a) Web scraping b) Data entry c) Manual surveys d) Text mining
Answer: a) Web scraping
29. What should be considered when selecting data for a research project?
a) The size of the dataset b) The quality and relevance of the data
c) The data’s format and structure d) All of the above
Answer: d) All of the above
30. What is the primary goal of data transformation?
a) To clean the data b) To convert the data into a more suitable format for
analysis
c) To visualize the data d) To increase the volume of data
Answer: b) To convert the data into a more suitable format for analysis
31. Which of the following is a common data transformation technique?
a) Data normalization b) Data validation c) Data splitting d) Data encryption
Answer: a) Data normalization
32. Which of the following transformations is used to handle skewed data?
a) Log transformation b) Normalization c) One-hot encoding d) Z-score
transformation
Answer: a) Log transformation
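Note on question 32: a hedged illustration of why a log transformation helps with right-skewed data. The values are made-up, and base 10 is an arbitrary choice for readability. In the raw data the mean sits far above the median; after the transformation the two nearly coincide.

```python
import math
from statistics import mean, median

values = [1, 10, 100, 1000, 10000]        # strongly right-skewed
logged = [math.log10(v) for v in values]  # compresses the long right tail

print("raw:   ", mean(values), median(values))
print("logged:", mean(logged), median(logged))
```

Note that the logarithm is defined only for positive values, so data containing zeros or negatives needs a shift or a different transformation.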
33. Which type of visualization is commonly used to summarize and
communicate data distribution?
a) Line plot b) Box plot c) Scatter plot d) Heat map
Answer: b) Box plot
34. Which of the following is NOT a common method for presenting findings in
data science?
a) Interactive visualizations b) Reports with complex equations
c) Executive summaries with high-level insights
d) Data-driven decision-making recommendations
Answer: b) Reports with complex equations
35. Which tool is often used for creating interactive dashboards to present data
science findings?
a) Google Docs b) Tableau c) Notepad d) Excel formulas
Answer: b) Tableau
36. What is the first step in the data science process related to retrieving data?
a) Cleaning the data b) Defining the problem
c) Collecting and gathering data d) Visualizing the data
Answer: c) Collecting and gathering data
37. Which of the following is NOT a common method for retrieving data for a
data science project?
a) Web scraping b) Data APIs c) Data replication d) Data extraction from
databases
Answer: c) Data replication
38. In a typical data science workflow, where does the data retrieval process
occur?
a) After data visualization b) Before data cleaning
c) After model training d) After feature selection
Answer: b) Before data cleaning
39. What type of data retrieval process involves extracting data from different
tables in a relational database?
a) Data integration b) Data cleaning c) Data mining d) Data replication
Answer: a) Data integration
40. Which of the following is an example of an unstructured data source that
may require specific techniques to retrieve?
a) A structured SQL database b) A CSV file
c) A collection of customer reviews on a website d) An Excel spreadsheet
Answer: c) A collection of customer reviews on a website
41. What is a common error when cleaning data in data science?
a) Properly handling missing values b) Removing duplicates without analyzing
their impact
c) Identifying outliers correctly d) Ensuring data consistency
Answer: b) Removing duplicates without analyzing their impact
42. What is a common data entry error when collecting data?
a) Data duplication b) Correct format usage
c) Proper handling of missing values d) Accurate data normalization
Answer: a) Data duplication
43. Which of the following is an example of how data science is used in finance?
a) Fraud detection b) Manual account management
c) Decreasing algorithm efficiency d) Ignoring market trends
Answer: a) Fraud detection
44. How does data science contribute to the transportation industry?
a) By increasing traffic congestion b) By reducing operational efficiency
c) By optimizing routes and reducing fuel consumption d) By increasing vehicle
accidents
Answer: c) By optimizing routes and reducing fuel consumption
45. Which of the following is NOT considered a facet of data science?
a) Data collection b) Data visualization c) Data cleaning d) Data
compression
Answer: d) Data compression
46. Which of these is a primary step in the data science process?
a) Collecting raw data b) Writing code for data storage
c) Data exploration and analysis d) Deleting irrelevant data
Answer: c) Data exploration and analysis
47. Which of the following involves transforming data to make it suitable for
analysis?
a) Data cleaning b) Data collection c) Data visualization d) Data mining
Answer: a) Data cleaning
48. Which of the following is a common tool used for data analysis in data
science?
a) Google Chrome b) Microsoft Word c) Python d) Adobe Photoshop
Answer: c) Python
49. Which technique is used in data science to identify patterns in large
datasets?
a) Data compression b) Data mining c) Data encoding d) Data encryption
Answer: b) Data mining
50. Which type of data science model is used to make predictions based on input
data?
a) Descriptive model b) Predictive model c) Prescriptive model d) Diagnostic
model
Answer: b) Predictive model

5 Marks
1. Explain the benefits of data science
2. Explain the uses of data science
3. Describe data transformation
4. Explain the need for data science
5. Discuss the process of building a model in data science
9 Marks
1. Briefly explain the various facets of data
2. Briefly explain the data science process
3. Explain the role of data cleansing in data science
4. Discuss exploratory data analysis
5. Explain the different methods used to retrieve data in data science
Unit – II (MCQ)
1. What does a frequency distribution show?
a) Average value b) Frequency of data points c) Range of data d) Standard
deviation
Answer: b) Frequency of data points
2. Which term refers to the number of times a value appears in a dataset?
a) Cumulative frequency b) Mode c) Frequency d) Mean
Answer: c) Frequency
3. What is a class interval in a frequency distribution?
a) A range of values grouped together b) A single value in the dataset
c) The highest value in the dataset d) The total frequency
Answer: a) A range of values grouped together
4. What does the "mode" represent in a frequency distribution?
a) The most frequent value b) The middle value
c) The average value d) The sum of all values
Answer: a) The most frequent value
5. Which of the following is NOT a type of frequency distribution?
a) Grouped frequency distribution b) Cumulative frequency distribution
c) Univariate frequency distribution d) Bivariate frequency distribution
Answer: d) Bivariate frequency distribution
6. What is the purpose of using class intervals in a frequency distribution?
a) To make data analysis easier b) To calculate the median
c) To compute the range d) To find outliers
Answer: a) To make data analysis easier
7. What is an ungrouped frequency distribution?
a) A table that groups data into intervals
b) A distribution where individual data points are listed with their frequencies
c) A chart showing cumulative frequencies d) A graphical representation of
data
Answer: b) A distribution where individual data points are listed with their
frequencies
8. Which of the following is true about a grouped frequency distribution?
a) It lists every single data point and its frequency
b) It uses intervals to group the data points
c) It is only used for nominal data
d) It does not represent frequencies
Answer: b) It uses intervals to group the data points
9. Which type of data is typically best represented by a grouped frequency
distribution?
a) Nominal data b) Ordinal data c) Large quantitative data sets
d) Small qualitative data sets
Answer: c) Large quantitative data sets
10. Which of the following is the first step in creating a grouped frequency
distribution?
a) Calculate the range of the data b) Sort the data in descending order
c) Identify the class intervals d) Calculate the cumulative frequency
Answer: a) Calculate the range of the data
11. What does the frequency column in a grouped frequency distribution
represent?
a) The total number of intervals b) The number of data points within each class
interval
c) The cumulative sum of all values d) The total number of data points in the
dataset
Answer: b) The number of data points within each class interval
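Note on questions 10 and 11: a minimal sketch of building a grouped frequency distribution. The test scores and the class width of 10 are assumptions chosen for illustration; the steps are to find the range, fix the class intervals, and count the data points falling in each interval.

```python
from collections import Counter

scores = [42, 47, 51, 55, 58, 63, 64, 66, 69, 71, 75, 78, 82, 88, 93]
width = 10

data_range = max(scores) - min(scores)   # step 1: range of the data
start = (min(scores) // width) * width   # lower limit of the first class

# Count how many scores fall into each class, indexed from 0.
counts = Counter((s - start) // width for s in scores)
n_classes = (max(scores) - start) // width + 1

table = [
    (start + k * width, start + k * width + width - 1, counts[k])
    for k in range(n_classes)
]

print("range:", data_range)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

The frequency column sums to the number of observations, 15 in this sample.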
12. What type of data is typically represented using a relative frequency
distribution?
a) Qualitative data b) Continuous data c) Nominal data d) Ordinal data
Answer: b) Continuous data
13. What does a cumulative frequency distribution represent?
a) The total number of observations in the dataset b) The frequency of each
individual value
c) The running total of frequencies up to a given class interval
d) The total percentage of observations
Answer: c) The running total of frequencies up to a given class interval
14. In a cumulative frequency distribution, the cumulative frequency for the last
class interval should be equal to:
a) The total number of class intervals b) The total number of observations in the
dataset
c) The highest frequency in the dataset d) The median of the dataset
Answer: b) The total number of observations in the dataset
15. What is the cumulative frequency for the first class interval in a distribution?
a) The total frequency of all class intervals b) The frequency of the first class
interval
c) Zero d) The sum of all frequencies
Answer: b) The frequency of the first class interval
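Note on questions 13 to 15: a short sketch of a cumulative frequency distribution as a running total. The class labels and frequencies are made-up sample values. The checks at the end restate the two facts from the answers: the first cumulative frequency equals the first class frequency, and the last equals the total number of observations.

```python
from itertools import accumulate

classes = ["40-49", "50-59", "60-69", "70-79", "80-89", "90-99"]
frequencies = [2, 3, 4, 3, 2, 1]

# Running total of the frequencies up to and including each class.
cumulative = list(accumulate(frequencies))
total = sum(frequencies)

for label, freq, cum in zip(classes, frequencies, cumulative):
    print(f"{label}: f={freq:2d}  cf={cum:2d}  ({100 * cum / total:.1f}% at or below)")

print(cumulative[0] == frequencies[0])  # True
print(cumulative[-1] == total)          # True
```

The percentage column is what makes a cumulative distribution useful for reading off the share of data at or below a given class.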
16. What is often plotted from a cumulative frequency distribution?
a) A histogram b) A cumulative frequency polygon c) A box plot d) A scatter
plot
Answer: b) A cumulative frequency polygon
17. What type of data is best suited for a frequency distribution of nominal data?
a) Quantitative data b) Ordinal data c) Categorical data d) Continuous data
Answer: c) Categorical data
18. Which of the following is a characteristic of nominal data?
a) It has a natural order or ranking b) It consists of discrete categories with no
inherent order
c) It represents continuous values d) It can be represented by numerical values
Answer: b) It consists of discrete categories with no inherent order
19. Which of the following can be represented in a frequency distribution for
nominal data?
a) Colors of cars in a parking lot b) Heights of individuals in a population
c) Temperature readings in Celsius d) Test scores of students
Answer: a) Colors of cars in a parking lot
20. Which type of visualization is most commonly used to display the frequency
distribution of nominal data?
a) Histogram b) Pie chart c) Box plot d) Scatter plot
Answer: b) Pie chart
21. In a frequency distribution for nominal data, what does the frequency
represent?
a) The total number of categories in the dataset
b) The number of occurrences of each distinct category
c) The range of values in the dataset d) The mean of the data
Answer: b) The number of occurrences of each distinct category
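Note on questions 17 to 21: a minimal sketch of a frequency distribution for nominal data, using car colors as in question 19. The color list is made-up sample data. Each category is simply counted; the relative frequency and the mode follow directly from the counts.

```python
from collections import Counter

colors = ["red", "blue", "red", "white", "blue", "red", "black", "white"]

frequency = Counter(colors)  # occurrences of each distinct category
total = sum(frequency.values())
relative = {color: count / total for color, count in frequency.items()}
modal_color, modal_count = frequency.most_common(1)[0]

for color, count in frequency.most_common():
    print(f"{color:6s} {count}  {relative[color]:.3f}")
print("mode:", modal_color)  # mode: red
```

Because the categories have no order, the mode is the only average that applies here; a mean or median of colors has no meaning.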
22. What does the mode represent in a dataset?
a) The value that occurs most frequently b) The middle value when data is
ordered
c) The sum of all data points divided by the number of data points
d) The difference between the largest and smallest values
Answer: a) The value that occurs most frequently
23. What is the median of a dataset?
a) The average of all data points b) The middle value when data is ordered
c) The value that occurs most frequently
d) The difference between the maximum and minimum values
Answer: b) The middle value when data is ordered
24. What is the formula for calculating the mean of a dataset?
a) Add all values and divide by the number of values b) Add all values and
divide by 2
c) Find the middle value in an ordered list d) Count the most frequent value
Answer: a) Add all values and divide by the number of values
25. Which measure of central tendency is most affected by outliers?
a) Mode b) Median c) Mean d) Range
Answer: c) Mean
26. If a dataset has an odd number of elements, the median is:
a) The first element in the ordered list b) The average of the two middle values
c) The middle value in the ordered list
d) The sum of all values divided by the number of values
Answer: c) The middle value in the ordered list
27. Which of the following is most suitable for finding the average income in a
population?
a) Mode b) Median c) Mean d) Range
Answer: c) Mean
28. What is the median of the following dataset: 1, 3, 5, 7, 9?
a) 5 b) 4 c) 6 d) 7
Answer: a) 5
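Note on questions 25 to 30: the answer to question 28 can be checked with Python's `statistics` module. The second list, with 9 replaced by 900, is an assumption added to illustrate why the mean is the measure most affected by outliers while the median stays put.

```python
from statistics import mean, median

data = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 900]  # same data with one extreme value

print(median(data), mean(data))                  # 5 5
print(median(with_outlier), mean(with_outlier))  # 5 183.2
```

This is also the reasoning behind preferring the median for skewed datasets.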
29. Which of the following is the best measure of central tendency for a skewed
dataset?
a) Mode b) Mean c) Median d) Range
Answer: c) Median
30. Which of the following is most affected by extreme values in a dataset?
a) Mode b) Mean c) Median d) Range
Answer: b) Mean
31. Which of the following does not require the data to be numerical?
a) Mode b) Mean c) Median d) Standard Deviation
Answer: a) Mode
32. Which of the following is an example of nominal data?
a) Age of participants b) Favorite color c) Temperature readings d) Height of
participants
Answer: b) Favorite color.
33. A frequency distribution for nominal data shows:
a) The number of occurrences of each category. b) The range of values within
each category.
c) The average value for each category. d) The standard deviation within each
category.
Answer: a) The number of occurrences of each category.
34. Which of the following is not a characteristic of nominal data?
a) Categories represent different groups. b) There is a meaningful order to the
categories.
c) Categories are mutually exclusive. d) Categories cannot be ranked.
Answer: b) There is a meaningful order to the categories.
35. What type of data does a frequency distribution for nominal data represent?
a) Continuous data b) Categorical data c) Ordinal data d) Interval data
Answer: b) Categorical data.
36. In nominal data, which operation can you perform?
a) Mean b) Median c) Mode d) Standard deviation
Answer: c) Mode.
37. In nominal data, the categories should be:
a) Ordered in a specific sequence. b) Mutually exclusive.
c) Continuous. d) Measured on an interval scale.
Answer: b) Mutually exclusive.
38. A nominal variable can be used to:
a) Identify categories without any inherent order. b) Measure differences in
categories.
c) Calculate the mean and median. d) Perform numerical operations.
Answer: a) Identify categories without any inherent order.
39. Which axis of a frequency polygon represents the frequency of the data?
a) Y-axis b) X-axis c) Both X and Y axes d) Neither X nor Y axis
Answer: a) Y-axis
40. Which of the following is a major advantage of a frequency polygon over a
histogram?
a) It can easily compare multiple data sets b) It is faster to construct
c) It displays the cumulative frequency d) It uses bars to show data
Answer: a) It can easily compare multiple data sets
41. What is typically plotted on the X-axis of a frequency polygon?
a) Cumulative frequency b) Frequency of the data
c) Class intervals or categories d) Percentage of data
Answer: c) Class intervals or categories
42. What type of graph is used to display the relationship between two
quantitative variables?
a) Box plot b) Scatter plot c) Bar graph d) Line graph
Answer: b) Scatter plot
43. What is the primary purpose of constructing a cumulative frequency
distribution?
a) To display the data in a pie chart
b) To show the total number of observations for each class
c) To identify the percentage of data below a specific value
d) To calculate the variance of the data
Answer: c) To identify the percentage of data below a specific value
44. Which of the following is NOT a part of a grouped frequency distribution?
a) Class intervals b) Frequency c) Cumulative frequency d) Mean deviation
Answer: d) Mean deviation
45. In a grouped frequency distribution, the frequency for each class interval
represents:
a) The number of observations in that interval b) The sum of all values in that
interval
c) The cumulative frequency of all prior intervals
d) The percentage of total data points in that interval
Answer: a) The number of observations in that interval
46. What does a uniform distribution indicate about the data?
a) All data points occur with equal frequency b) The data is clustered around a
central value
c) The data has one peak d) The data is heavily skewed to the right
Answer: a) All data points occur with equal frequency
47. In a normal distribution, which of the following is true?
a) The mean, median, and mode are all equal b) The mean is less than the
median
c) The distribution is skewed to the right d) The mode is greater than the mean
Answer: a) The mean, median, and mode are all equal
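Note on question 47: the equality of mean, median, and mode in a normal distribution can be confirmed with `statistics.NormalDist` from the Python standard library. The parameters (mean 50, standard deviation 10) are arbitrary values chosen for illustration.

```python
from statistics import NormalDist

dist = NormalDist(mu=50, sigma=10)

# For a normal distribution all three measures of central tendency coincide.
print(dist.mean, dist.median, dist.mode)  # 50.0 50.0 50.0

# Symmetry: exactly half of the probability lies below the mean.
half = dist.cdf(50)
print(half)  # 0.5
```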

5 Marks
1. Explain the grouped frequency distribution
2. Describe the cumulative frequency distribution
3. Write a note on the median with a suitable example
4. Discuss the mode with an example
5. Explain the mean with an example
9 Marks
1. Explain the frequency distribution
2. Explain the relative frequency distribution
3. Describe the frequency distribution for nominal data
4. Explain graphs for quantitative data
5. Explain the averages for qualitative and ranked data
