20 Short Questions
QUESTIONS BLOOM'S LEVEL CO LEVEL
2. What is the difference between Big Data and data science? BL2 CO1
10. Explain the process of fitting a statistical model to data. BL2 CO1
12. What is R, and why is it a popular tool for data science? BL1 CO1
13. Describe the basic steps involved in setting up an R environment. BL2 CO1
17. Explain the difference between a vector and a list in R. BL2 CO1
18. How can you check the data type of a variable in R? BL3 CO1
19. How would you read a CSV file into an R data frame? BL3 CO1
20. What are some common packages used for data visualization in R? BL1 CO1
UNIT-1
LONG ANSWER TYPE QUESTIONS WITH BLOOM'S AND CO LEVELS
QUESTIONS BLOOM'S LEVEL CO LEVEL
1. Define Data Science. Discuss the evolution of its definition and the challenges in
providing a universally accepted one. BL1 CO1
2. Differentiate between Big Data and data science. Explain how they are related and
how they differ in their focus and methodologies. BL2 CO1
3. Critically analyze the current definition of Data Science. Discuss the evolving
nature of this field and the challenges in providing a universally accepted definition. BL4 CO1
4. Define Big Data and its key characteristics (Volume, Velocity, Variety, Veracity).
Discuss the challenges and opportunities presented by Big Data for businesses and
society. BL1 CO1
6. Discuss the ethical implications of the "datafication" trend. How does the increasing
digitization of our lives impact individual privacy and societal values? What are the
potential risks and benefits of a data-driven society? BL5 CO1
7. Describe the three major perspectives on data science: science, technology, and
craft. Discuss the strengths and weaknesses of each perspective. BL1 CO1
8. Compare and contrast the three major perspectives on data science: science,
technology, and craft. Discuss the strengths and weaknesses of each perspective and
their implications for the practice of data science. BL4 CO1
9. Define statistical inference and its key goals. Explain the difference between
descriptive and inferential statistics. BL1 CO1
10. Explain the concept of statistical inference in detail. Describe the key steps
involved in the inferential process, including data collection, exploratory data analysis,
model building, and hypothesis testing. Discuss the importance of assumptions in
statistical inference and the consequences of violating these assumptions. BL4 CO1
11. Define population and sample. Explain the concept of sampling bias and its
potential impact on research findings. BL1 CO1
12. Discuss sampling bias and its potential impact on the validity of research
findings. How can researchers minimize sampling bias in their studies? Provide
specific examples of different sampling techniques and their potential biases. BL4 CO1
13. Define a statistical model. Explain the purpose of statistical modeling in data
science. BL1 CO1
14. Explain the concept of model selection in the context of statistical modeling.
Discuss the criteria used to select the best model for a given dataset, such as AIC, BIC,
and cross-validation. How do these criteria balance model complexity and predictive
accuracy? BL4 CO1
15. Define probability distribution. Name and briefly describe three common
probability distributions (e.g., normal, binomial, Poisson). BL1 CO1
16. Discuss the importance of probability distributions in data science. How are
probability distributions used in modeling, inference, and decision-making? Provide
examples of different probability distributions (e.g., normal, binomial, Poisson) and
their applications in data science. BL4 CO1
17. Explain the concept of overfitting in the context of model building. What are the
potential consequences of overfitting? BL3 CO1
18. Discuss the challenges of overfitting in machine learning. Explain how techniques
such as regularization, cross-validation, and early stopping can help to prevent
overfitting and improve model generalization. BL4 CO1
19. What is R? Briefly discuss its advantages and disadvantages for data science. BL1 CO1
20. Discuss the advantages and disadvantages of using R for data science. Compare R
with other popular programming languages used in data science, such as Python,
considering factors such as community support, available libraries, ease of use, and
performance. BL5 CO1
21. Describe the basic steps involved in setting up an R environment. BL1 CO1
22. Discuss the importance of proper R environment setup and package management.
Explain how to install and load R packages, and how to manage dependencies between
packages. How can you ensure the reproducibility of your R code and analyses? BL3 CO1
23. What are the basic data types in R? Provide examples of each. BL1 CO1
24. Explain the concept of data structures in R. Discuss the key differences between
vectors, matrices, data frames, and lists, and provide examples of when each data
structure would be most appropriate. How can you efficiently manipulate and subset
data within these different data structures? BL4 CO1
25. How do you assign a value to a variable in R? Provide an example. BL1 CO1
26. Discuss the concept of functions in R. Explain how to define and call functions,
and how to pass arguments to functions. Provide examples of how functions can be
used to improve code readability, reusability, and modularity in R. BL4 CO1
27. Explain the concept of control flow in R. Discuss the use of conditional statements
(if-else) and loops (for, while) in R programming. Provide examples of how control
flow can be used to automate data processing tasks, implement algorithms, and make
decisions within your R code. BL3 CO1
28. Describe the importance of data cleaning and preprocessing in R. List some
common data cleaning tasks. BL1 CO1
29. Discuss the importance of data cleaning and preprocessing in R. Explain how to
handle missing values, outliers, and inconsistencies in data using R functions. BL4 CO1
30. How can you ensure the quality and accuracy of your data before proceeding with
analysis? BL5 CO1
UNIT-1
SEMINAR TOPICS
1. BIG DATA AND DATA SCIENCE HYPE
2. CURRENT LANDSCAPE OF PERSPECTIVES
3. STATISTICAL INFERENCE
4. STATISTICAL MODELING
5. PROBABILITY DISTRIBUTIONS
6. FITTING A MODEL AND OVERFITTING
UNIT-1
OBJECTIVE QUESTIONS WITH ANSWERS
By analyzing the RMSE and R-squared values, as well as the visual representation, we
can assess how well the model predicts house prices based on their size.
I'll use the mtcars dataset available in R for this example. The dataset contains fuel
consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74
models). I'll use qsec as the independent variable and mpg as the dependent variable. I'll build a
linear model and then calculate the RMSE and R-squared values.
It seems that the mtcars dataset is not directly available in Python. I'll load the dataset
from a URL and then proceed with the analysis.
There seems to be a temporary issue with the URL. I'll use a different URL to load the
mtcars dataset and then proceed with the analysis.
It seems like there's a connection issue preventing the code from accessing the URLs. I'll
try loading a dataset that comes preloaded with the sklearn library and proceed with building the
model.
The RMSE is 53.12 and the R-squared is 0.48.
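
The steps described above (fit a linear model, then compute RMSE and R-squared) can be sketched in plain Python without any external library. This is a minimal illustration only: the data points are made up, the function names `fit_line` and `rmse_and_r2` are hypothetical, and the metrics are computed on the same data used for fitting. It does not use the mtcars or sklearn data and is not expected to reproduce the values reported above.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse_and_r2(x, y, intercept, slope):
    """Return (RMSE, R-squared) of the fitted line on the given data."""
    predictions = [intercept + slope * xi for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    rmse = math.sqrt(ss_res / len(y))
    r2 = 1 - ss_res / ss_tot
    return rmse, r2


# Made-up illustrative data (an assumption, not taken from any dataset)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
rmse, r2 = rmse_and_r2(x, y, intercept, slope)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"RMSE = {rmse:.3f}, R-squared = {r2:.3f}")
```

A lower RMSE means the predictions sit closer to the observed values (in the units of the dependent variable), while R-squared is the share of variance in the dependent variable that the model explains. In practice both would usually be computed on held-out data rather than on the training data.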