20 Short Questions
UNIT-1

SHORT ANSWER TYPE QUESTIONS WITH BLOOM'S AND CO LEVELS

QUESTIONS    BLOOM'S LEVEL    CO-LEVEL

1. Define Data Science in your own words. BL1 CO1

2. What is the difference between Big Data and data science? BL2 CO1

3. How can we differentiate between legitimate and exaggerated claims in the field of data science? BL3 CO1

4. Explain the concept of "datafication" with an example. BL2 CO1

5. Briefly describe three different perspectives on data science. BL1 CO1

6. What is the goal of statistical inference? BL1 CO1

7. Distinguish between a population and a sample in the context of data analysis. BL2 CO1

8. What is a statistical model, and why is it used? BL2 CO1

9. Name three common probability distributions and briefly describe their characteristics. BL1 CO1

10. Explain the process of fitting a statistical model to data. BL2 CO1

11. What is overfitting in the context of model building? BL2 CO1

12. What is R, and why is it a popular tool for data science? BL1 CO1

13. Describe the basic steps involved in setting up an R environment. BL2 CO1

14. How do you assign a value to a variable in R? BL1 CO1

15. What are the basic data types in R? BL2 CO1

16. How would you create a vector of numbers in R? BL3 CO1

17. Explain the difference between a vector and a list in R. BL2 CO1

18. How can you check the data type of a variable in R? BL3 CO1

19. How would you read a CSV file into an R data frame? BL3 CO1

20. What are some common packages used for data visualization in R? BL1 CO1
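Questions 14 to 18 (assignment, data types, vectors, lists and type checks) can be illustrated with a short base-R sketch. All variable names are illustrative only. The atomic types shown are logical, integer, double (numeric), character and complex. R also has a raw type, which is omitted here. The CSV file name in the last comment is hypothetical.

```r
# Assignment: `<-` is the conventional operator; `=` also works at top level
x <- 42L          # integer (the L suffix stores it as an integer)
y <- 3.14         # double; class() reports it as "numeric"
name <- "Ada"     # character
flag <- TRUE      # logical
z <- 2 + 3i       # complex

# Checking types: class() gives the object's class, typeof() its storage type
class(x)            # "integer"
class(y)            # "numeric"
typeof(y)           # "double"
is.character(name)  # TRUE

# Vectors: every element has the same type
nums  <- c(10, 20, 30)        # c() combines values into a vector
odds  <- seq(1, 9, by = 2)    # 1 3 5 7 9
mixed <- c(1, "a", TRUE)      # coerced to character: "1" "a" "TRUE"

# Lists: elements may have different types and lengths
rec <- list(id = 1L, name = "Ada", scores = c(90, 85))
str(rec)
rec$scores[2]       # 85

# Reading a CSV file into a data frame (hypothetical file name):
# df <- read.csv("data.csv")
```

The `mixed` line is the key contrast for question 17. A vector silently coerces its elements to one common type, while a list keeps each element as it is.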
UNIT-1
LONG ANSWER TYPE QUESTIONS WITH BLOOM'S AND CO LEVELS

QUESTIONS    BLOOM'S LEVEL    CO-LEVEL

1. Define Data Science. Discuss the evolution of its definition and the challenges in
providing a universally accepted one. BL1 CO1

2. Differentiate between Big Data and data science. Explain how they are related and
how they differ in their focus and methodologies. BL2 CO1

3. Critically analyze the current definition of Data Science. Discuss the evolving
nature of this field and the challenges in providing a universally accepted definition. BL4 CO1

4. Define Big Data and its key characteristics (Volume, Velocity, Variety, Veracity).
Discuss the challenges and opportunities presented by Big Data for businesses and
society. BL1 CO1

5. Explain the concept of "datafication" with an example. Discuss the ethical
implications of this trend, considering privacy concerns and potential biases. BL2 CO1

6. Discuss the ethical implications of the "datafication" trend. How does the increasing
digitization of our lives impact individual privacy and societal values? What are the
potential risks and benefits of a data-driven society? BL5 CO1

7. Describe the three major perspectives on data science: science, technology, and
craft. Discuss the strengths and weaknesses of each perspective. BL1 CO1

8. Compare and contrast the three major perspectives on data science: science,
technology, and craft. Discuss the strengths and weaknesses of each perspective and
their implications for the practice of data science. BL4 CO1

9. Define statistical inference and its key goals. Explain the difference between
descriptive and inferential statistics. BL1 CO1

10. Explain the concept of statistical inference in detail. Describe the key steps
involved in the inferential process, including data collection, exploratory data analysis,
model building, and hypothesis testing. Discuss the importance of assumptions in
statistical inference and the consequences of violating these assumptions. BL4 CO1

11. Define population and sample. Explain the concept of sampling bias and its
potential impact on research findings. BL1 CO1

12. Discuss sampling bias and its potential impact on the validity of research
findings. How can researchers minimize sampling bias in their studies? Provide
specific examples of different sampling techniques and their potential biases. BL4 CO1

13. Define a statistical model. Explain the purpose of statistical modeling in data
science. BL1 CO1

14. Explain the concept of model selection in the context of statistical modeling.
Discuss the criteria used to select the best model for a given dataset, such as AIC, BIC,
and cross-validation. How do these criteria balance model complexity and predictive
accuracy? BL4 CO1

15. Define probability distribution. Name and briefly describe three common
probability distributions (e.g., normal, binomial, Poisson). BL1 CO1

16. Discuss the importance of probability distributions in data science. How are
probability distributions used in modeling, inference, and decision-making? Provide
examples of different probability distributions (e.g., normal, binomial, Poisson) and
their applications in data science. BL4 CO1

17. Explain the concept of overfitting in the context of model building. What are the
potential consequences of overfitting? BL3 CO1

18. Discuss the challenges of overfitting in machine learning. Explain how techniques
such as regularization, cross-validation, and early stopping can help to prevent
overfitting and improve model generalization. BL4 CO1

19. What is R? Briefly discuss its advantages and disadvantages for data science. BL1 CO1

20. Discuss the advantages and disadvantages of using R for data science. Compare R
with other popular programming languages used in data science, such as Python,
considering factors such as community support, available libraries, ease of use, and
performance. BL5 CO1

21. Describe the basic steps involved in setting up an R environment. BL1 CO1

22. Discuss the importance of proper R environment setup and package management.
Explain how to install and load R packages, and how to manage dependencies between
packages. How can you ensure the reproducibility of your R code and analyses? BL3 CO1

23. What are the basic data types in R? Provide examples of each. BL1 CO1

24. Explain the concept of data structures in R. Discuss the key differences between
vectors, matrices, data frames, and lists, and provide examples of when each data
structure would be most appropriate. How can you efficiently manipulate and subset
data within these different data structures? BL4 CO1

25. How do you assign a value to a variable in R? Provide an example. BL1 CO1

26. Discuss the concept of functions in R. Explain how to define and call functions,
and how to pass arguments to functions. Provide examples of how functions can be
used to improve code readability, reusability, and modularity in R. BL4 CO1

27. Explain the concept of control flow in R. Discuss the use of conditional statements
(if-else) and loops (for, while) in R programming. Provide examples of how control
flow can be used to automate data processing tasks, implement algorithms, and make
decisions within your R code. BL3 CO1

28. Describe the importance of data cleaning and preprocessing in R. List some
common data cleaning tasks. BL1 CO1

29. Discuss the importance of data cleaning and preprocessing in R. Explain how to
handle missing values, outliers, and inconsistencies in data using R functions. BL4 CO1

30. How can you ensure the quality and accuracy of your data before proceeding with
analysis? BL5 CO1
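Questions 28 to 30 on data cleaning can be illustrated with a minimal base-R sketch on a small hypothetical data frame. Several details are assumptions chosen for illustration: the column names, the values, median imputation, and the 1.5 * IQR outlier rule. Real data may call for other strategies, such as dropping incomplete rows, model-based imputation or domain-specific limits.

```r
# Hypothetical toy data with inconsistent labels, a duplicate row,
# a missing value and an extreme value
df <- data.frame(
  city   = c("Delhi", "delhi ", "Mumbai", "Pune", "Pune", "MUMBAI"),
  income = c(56000, NA, 61000, 58000, 990000, 61000)
)

# 1. Fix inconsistent text: trim stray spaces and normalise case
df$city <- tools::toTitleCase(tolower(trimws(df$city)))

# 2. Drop exact duplicate rows (the two "Mumbai" rows now match)
df <- unique(df)

# 3. Count missing values, then impute with the median of the observed values
sum(is.na(df$income))
df$income[is.na(df$income)] <- median(df$income, na.rm = TRUE)

# 4. Flag (rather than silently delete) outliers with the 1.5 * IQR rule
q   <- unname(quantile(df$income, c(0.25, 0.75)))
iqr <- q[2] - q[1]
df$outlier <- df$income < q[1] - 1.5 * iqr | df$income > q[2] + 1.5 * iqr

# 5. Quick quality checks before analysis
summary(df)
stopifnot(!anyNA(df$income))
```

Flagging outliers in a separate column keeps the decision visible. You can then check whether 990000 is a data-entry error or a genuine value before removing anything.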
UNIT-1
SEMINAR TOPICS
1. BIG DATA AND DATA SCIENCE HYPE
2. CURRENT LANDSCAPE OF PERSPECTIVES
3. STATISTICAL INFERENCE
4. STATISTICAL MODELING
5. PROBABILITY DISTRIBUTIONS
6. FITTING A MODEL AND OVERFITTING
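Seminar topic 5 can be made concrete with base R's distribution functions. The d* functions give densities or point probabilities, the p* functions give cumulative probabilities, and the r* functions generate random draws. The parameter values below are arbitrary choices for illustration.

```r
# Normal: continuous and symmetric, described by its mean and standard deviation
dnorm(0, mean = 0, sd = 1)          # density at the mean: 1/sqrt(2*pi), about 0.399
pnorm(1.96)                         # P(Z <= 1.96), about 0.975

# Binomial: successes in `size` independent trials with success probability `prob`
dbinom(3, size = 10, prob = 0.5)    # P(X = 3) = choose(10, 3) / 2^10 = 0.1171875

# Poisson: count of events in a fixed interval with average rate `lambda`
dpois(2, lambda = 4)                # P(X = 2) = exp(-4) * 4^2 / 2!, about 0.147

# Random draws for simulation
set.seed(1)
x <- rnorm(1000, mean = 50, sd = 10)
c(mean = mean(x), sd = sd(x))       # close to 50 and 10
```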
UNIT-1
OBJECTIVE QUESTIONS WITH ANSWERS

1. Which of the following best defines Data Science?
a) The science of collecting and organizing data.
b) An interdisciplinary field that uses scientific methods, processes, algorithms, and
systems to extract knowledge and insights from structured and unstructured data.
c) The study of computer science and its applications.
d) The art of making decisions based on gut feelings.
2. What is a key characteristic of Big Data?
a) Small volume
b) High velocity
c) Low variety
d) Simple structure
3. What is the main goal of datafication?
a) To reduce the amount of data collected.
b) To convert aspects of everyday life into data.
c) To eliminate the use of technology.
d) To stop the collection of personal information.
4. Which perspective on data science emphasizes the use of tools and technologies?
a) Science perspective
b) Technology perspective
c) Craft perspective
d) Business perspective
5. What is the primary goal of statistical inference?
a) To draw conclusions about a population based on a sample.
b) To describe the characteristics of a single data point.
c) To predict the future with absolute certainty.
d) To collect as much data as possible.
6. Which of the following is NOT a key step in the statistical inference process?
a) Data collection
b) Data analysis
c) Model building
d) Meditation
7. What is a sample in statistics?
a) A subset of a population.
b) The entire group of interest.
c) A single data point.
d) A type of probability distribution.
8. What is a statistical model?
a) A mathematical representation of a real-world phenomenon.
b) A physical model of a system.
c) A collection of data points.
d) A software program used for data analysis.
9. Which of the following is NOT a common probability distribution?
a) Normal distribution
b) Linear distribution
c) Binomial distribution
d) Poisson distribution
10. What is overfitting in machine learning?
a) When a model performs well on training data but poorly on new data.
b) When a model performs poorly on both training and new data.
c) When a model performs well on both training and new data.
d) When a model cannot be trained on the available data.
11. Which programming language do you write code in when working in the R environment?
a) R
b) Python
c) Java
d) C++
12. What is the purpose of RStudio?
a) To provide an integrated development environment for R.
b) To write and execute Python code.
c) To create databases.
d) To manage operating systems.
13. What is a vector in R?
a) A sequence of elements of the same data type.
b) A two-dimensional table of data.
c) A single value.
d) A function in R.
14. What is a data frame in R?
a) A two-dimensional table of data with rows and columns.
b) A single row of data.
c) A collection of functions.
d) A type of plot.
15. What is the conventional (idiomatic) way to assign the value 10 to a variable named "my_variable" in R?
a) my_variable = 10
b) my_variable <- 10
c) 10 = my_variable
d) assign(10, my_variable)
16. What function is used to create a sequence of numbers in R?
a) seq()
b) sum()
c) mean()
d) print()
17. What is the purpose of the c() function in R?
a) To combine elements into a vector.
b) To calculate the mean of a vector.
c) To create a data frame.
d) To plot a graph.
18. What function is used to read a CSV file into R?
a) read.csv()
b) write.csv()
c) load()
d) save()
19. What is the purpose of the ggplot2 package in R?
a) To create elegant and informative data visualizations.
b) To perform statistical tests.
c) To manipulate data frames.
d) To write R code more efficiently.
20. What is the purpose of the dplyr package in R?
a) To efficiently manipulate and summarize data.
b) To create interactive web applications.
c) To write complex algorithms.
d) To manage R packages.
21. What is the purpose of the tidyr package in R?
a) To tidy and reshape data for easier analysis.
b) To clean and preprocess text data.
c) To build machine learning models.
d) To create interactive dashboards.
22. True or False: Big Data always requires specialized hardware and software.
a) True
b) False
23. Which of the following is an example of supervised learning?
a) Training a model to predict house prices based on historical data.
b) Grouping customers into segments based on their purchasing behavior.
c) Identifying anomalies in network traffic.
d) Discovering hidden patterns in a dataset.
24. Which statistical concept is used to measure the central tendency of a dataset?
a) Mean
b) Standard deviation
c) Correlation
d) P-value
25. What is the purpose of a hypothesis test?
a) To determine if there is enough evidence to reject a null hypothesis.
b) To describe the characteristics of a dataset.
c) To build a predictive model.
d) To collect data.
26. What is the purpose of cross-validation?
a) To evaluate the performance of a model on unseen data.
b) To select the best features for a model.
c) To train a model on a single dataset.
d) To visualize the results of a model.
27. What is the purpose of the install.packages() function in R?
a) To install new R packages.
b) To load a previously installed R package.
c) To create a new R script.
d) To run an R script.
28. What is the purpose of the library() function in R?
a) To load a previously installed R package.
b) To install new R packages.
c) To create a new R script.
d) To run an R script.
29. What is the data type for a character string in R?
a) Character
b) Numeric
c) Integer
d) Logical
30. What is the purpose of the rm() function in R?
a) To remove objects (variables) from the current R environment.
b) To rename objects in the current R environment.
c) To create new objects in the current R environment.
d) To list all objects in the current R environment.
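Objective questions 10 (overfitting) and 26 (cross-validation) can be tied together in a small base-R sketch. The sketch makes several choices of its own, none of which come from the questions. It uses the built-in mtcars data with hp as the predictor and polynomial degree as the complexity knob. It also fixes 5 folds and seed 42. It compares RMSE on the training data with the average RMSE on held-out folds from k-fold cross-validation.

```r
data(mtcars)
set.seed(42)                                  # assumed seed, for reproducible folds
k <- 5                                        # assumed number of folds
folds <- sample(rep(1:k, length.out = nrow(mtcars)))   # random fold label per row

# Average RMSE on the held-out folds for a polynomial in hp of the given degree
cv_rmse <- function(degree) {
  errs <- numeric(k)
  for (i in seq_len(k)) {
    train <- mtcars[folds != i, ]
    test  <- mtcars[folds == i, ]
    fit   <- lm(mpg ~ poly(hp, degree), data = train)
    pred  <- predict(fit, newdata = test)
    errs[i] <- sqrt(mean((test$mpg - pred)^2))
  }
  mean(errs)
}

# RMSE on the same data used for fitting (optimistic by construction)
train_rmse <- function(degree) {
  fit <- lm(mpg ~ poly(hp, degree), data = mtcars)
  sqrt(mean(residuals(fit)^2))
}

results <- data.frame(degree = 1:6)
results$train_rmse <- sapply(results$degree, train_rmse)
results$cv_rmse    <- sapply(results$degree, cv_rmse)
print(results)
```

Training RMSE can only fall (or stay level) as the degree rises, because each polynomial model contains the previous one. Cross-validated RMSE, by contrast, typically stops improving and may worsen once the extra flexibility starts fitting noise. That gap is the practical signature of overfitting.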
UNIT-1
MICRO PROJECT
Sample Example: Predicting House Prices
Let's assume we have a dataset containing the size of houses (in square feet) and their
corresponding prices. We can use simple linear regression to predict the price of a house based on
its size.
1. Load the data (read_csv() comes from the readr package; base R's read.csv() also works):
    library(readr)
    house_data <- read_csv("house_prices.csv")
2. Build the model (simple linear regression of Price on Size):
    model <- lm(Price ~ Size, data = house_data)
3. Evaluate the model:
o Calculate RMSE and R-squared.

o Visualize the results with a scatter plot.

By analyzing the RMSE and R-squared values, as well as the visual representation, we
can assess how well the model predicts house prices based on their size.
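Step 3 of the micro project can be sketched with three small helpers. The names rmse(), r_squared() and plot_fit() are hypothetical; they are not part of base R or readr. The Size and Price column names are taken from step 2. Both metrics are computed on the training data, so they measure in-sample fit only.

```r
# Root mean squared error of a fitted lm, in the units of the response (e.g. price)
rmse <- function(model) {
  sqrt(mean(residuals(model)^2))
}

# R-squared: share of the response's variance explained by the model
r_squared <- function(model) {
  y <- model.response(model.frame(model))
  1 - sum(residuals(model)^2) / sum((y - mean(y))^2)
}

# Scatter plot of the data with the fitted regression line overlaid
plot_fit <- function(model, data, x = "Size", y = "Price") {
  plot(data[[x]], data[[y]], xlab = x, ylab = y, main = paste(y, "vs.", x))
  abline(model, col = "red", lwd = 2)
}

# Usage with the model from step 2 (assumes house_data has Size and Price columns):
# rmse(model); r_squared(model); plot_fit(model, house_data)
```

For a linear model with an intercept, r_squared() returns the same value as summary(model)$r.squared. It is written out here so the formula is visible.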
A second worked example uses the mtcars dataset that ships with base R. The dataset contains fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). Here qsec (quarter-mile time, in seconds) is the independent variable and mpg (miles per gallon) is the dependent variable. We build a linear model and then calculate its RMSE and R-squared values.
Note: the run this example was taken from was attempted in Python rather than R. mtcars could not be loaded there, so a dataset bundled with scikit-learn was substituted. That substitute model reported an RMSE of 53.12 and an R-squared of 0.48; these figures do not describe the mtcars model.
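The mtcars example described above can be sketched directly in R, where the dataset is built in and needs no download. The RMSE here is in-sample, and the plot styling is an arbitrary choice.

```r
data(mtcars)                             # built-in: 32 cars, 11 variables
model <- lm(mpg ~ qsec, data = mtcars)   # mpg as response, qsec as predictor

rmse_val <- sqrt(mean(residuals(model)^2))  # in-sample RMSE, in miles per gallon
r2       <- summary(model)$r.squared        # share of mpg variance explained by qsec

cat("RMSE:", round(rmse_val, 2), " R-squared:", round(r2, 3), "\n")

# Scatter plot with the fitted regression line
plot(mtcars$qsec, mtcars$mpg, xlab = "qsec (1/4 mile time, s)", ylab = "mpg")
abline(model, col = "red")
```

With a single predictor and an intercept, R-squared equals the squared correlation between mpg and qsec. This gives a quick way to cross-check the result.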