Question Bank For All 5 Units: Department of Computer Science and Engineering & Department of Information Technology


Department of Computer Science and Engineering

&
Department of Information Technology

QUESTION BANK

For ALL 5 UNITS

DESCRIPTIVE & OBJECTIVE QUESTIONS

COURSE: B. Tech (CSE-A,B,C) III Year – I Semester (R18)

For Both CSE(A,B,C) & IT

SUBJECT: DATA ANALYTICS

UNIT-1
S.No | QUESTION | MARKS | COURSE OUTCOME (CO) | PROGRAM OUTCOME (PO) | BLOOM'S TAXONOMY LEVEL (BTL)
1. a). | Why is Data Analytics so important? | 2M | 1 | 1 | 1
1. b). | What do you mean by Enterprise requirements in Data architecture? | 3M | 1 | 1 | 1
1. c). | Describe the factors that influence the Data Architecture. | 10M | 1 | 2 | 2

2. a). | What are the tools used in Data Analytics? | 2M | 1 | 1 | 1
2. b). | What are the three essential models in data architecture? | 3M | 1 | 1 | 2
2. c). | What are the primary sources of data? Explain in detail. | 10M | 1 | 2 | 4

3. a). | What are CRD & RBD? | 2M | 1 | 1 | 1
3. b). | Differentiate Qualitative Data & Quantitative Data. | 3M | 1 | 2 | 3
3. c). | List and explain the secondary sources of data. | 10M | 1 | 4 | 2

4. a). | Write about sensor data & Web traffic data. | 2M | 1 | 2 | 4
4. b). | Brief about the features of Amazon Web Services - S3. | 3M | 1 | 1 | 4
4. c). | Explain the detection and treatment of Outliers. | 10M | 1 | 2 | 3

5. a). | What do you mean by Data Quality? | 2M | 1 | 1 | 1
5. b). | What is the difference between error data and noisy data? | 3M | 1 | 3 | 1
5. c). | Explain Data Pre-processing in detail. | 5M | 1 | 2 | 2

6. a). | What is the objective of Data Processing? | 2M | 1 | 1 | 1
6. b). | How do you handle missing values? | 3M | 1 | 2 | 2
6. c). | Explain Data Processing in detail. | 10M | 1 | 4 | 3

UNIT 2
S.No | QUESTION | MARKS | COURSE OUTCOME (CO) | PROGRAM OUTCOME (PO) | BLOOM'S TAXONOMY LEVEL (BTL)
1. a). | What is the importance of Analytics? | 2M | 2 | 1 | 1
1. b). | What is the role of Data Analytics? | 3M | 2 | 1 | 2
1. c). | What are the ways to use Data Analytics? | 10M | 2 | 2 | 4

2. a). | List three DA tools that work on the Hadoop stack. | 2M | 2 | 1 | 1
2. b). | Brief any four tools used in Data Analytics. | 3M | 2 | 2 | 2
2. c). | What are the steps involved in Data Analytics? Explain in detail. | 10M | 2 | 1 | 4

3. a). | Write short notes on Data Modelling. | 2M | 2 | 1 | 1
3. b). | Brief some applications of DA. | 3M | 2 | 1 | 2
3. c). | What are the different primary Analytics Tools for DA? Explain in detail. | 10M | 2 | 2 | 2

4. a). | List the features of Apache Spark. | 2M | 2 | 1 | 1
4. b). | Why and where are Scala & Impala useful? | 3M | 2 | 1 | 2
4. c). | Illustrate how Apache Spark is built on Hadoop, and its components. | 10M | 2 | 2 | 2

5. a). | What are the features of Impala? | 2M | 2 | 1 | 1
5. b). | What is Cluster Computing? Brief about it. | 3M | 2 | 1 | 2
5. c). | Contrast SQL & NoSQL databases. | 10M | 2 | 2 | 2

6. a). | What are the benefits of NoSQL? | 2M | 2 | 1 | 1
6. b). | Write short notes on different data types and variables. | 3M | 2 | 2 | 2
6. c). | Explain Missing Imputations in detail. | 10M | 2 | 3 | 4

UNIT 3
S.No | QUESTION | MARKS | COURSE OUTCOME (CO) | PROGRAM OUTCOME (PO) | BLOOM'S TAXONOMY LEVEL (BTL)
1. a). | When is Regression chosen? | 2M | 3 | 1 | 1
1. b). | List the Regression Analysis Techniques. | 3M | 3 | 1 | 2
1. c). | Explain in detail OLS Regression with the inclusion of the error term. | 10M | 3 | 3 | 4

2. a). | What are the advantages and limitations of Linear Regression? | 2M | 3 | 1 | 1
2. b). | How do you calculate B1 & B0 using Correlation and Standard Deviation? | 3M | 3 | 1 | 2
2. c). | Write and explain the properties and assumptions of OLS Regression along with Root Mean Squared Error. | 10M | 3 | 1 | 4

3. a). | What do you mean by Unbiasedness and Least Variance? | 2M | 3 | 1 | 2
3. b). | Elaborate on Variable Rationalization. | 3M | 3 | 2 | 1
3. c). | Illustrate the Model Building Life Cycle in Data Analytics. | 10M | 3 | 4 | 4

4. a). | What is a sufficient Estimator? | 2M | 3 | 1 | 1
4. b). | Compare Homoscedasticity & Heteroscedasticity. | 3M | 3 | 1 | 2
4. c). | Compare and contrast SQL Databases with NoSQL Databases. | 10M | 3 | 2 | 3

5. a). | What is Univariate Analysis? | 2M | 3 | 1 | 1
5. b). | Explain Discrete and Continuous Variables with examples. | 3M | 3 | 1 | 2
5. c). | Sketch various analytics applications to various Business Domains. | 10M | 3 | 1 | 4

6. a). | What is the need for Business Modelling and Model Theory? | 2M | 3 | 1 | 2
6. b). | What are the different types of NoSQL Databases? | 3M | 3 | 2 | 1
6. c). | What are Missing Imputations? Explain in detail. | 10M | 3 | 3 | 4

UNIT 4
S.No | QUESTION | MARKS | COURSE OUTCOME (CO) | PROGRAM OUTCOME (PO) | BLOOM'S TAXONOMY LEVEL (BTL)
1. a). | What is Supervised Learning? | 2M | 4 | 2 | 1
1. b). | What is the major difference between Supervised & Unsupervised Learning? | 3M | 4 | 1 | 2
1. c). | Compare and contrast Supervised & Unsupervised Learning. | 10M | 4 | 3 | 4

2. a). | What is Unsupervised Learning? | 2M | 4 | 1 | 1
2. b). | List some Supervised and Unsupervised Learning Techniques. | 3M | 4 | 1 | 2
2. c). | Explain in detail the Segmentation approach in Data Analytics. | 10M | 4 | 1 | 4

3. a). | Explain in brief the types of Decision Tree Algorithms. | 2M | 4 | 1 | 2
3. b). | Write the terminology and representation of a Decision Tree. | 3M | 4 | 2 | 1
3. c). | Explain in detail the Decision Tree along with its algorithm and a simple example. | 10M | 4 | 4 | 4

4. a). | Compare and contrast Entropy & Information Gain. | 2M | 4 | 1 | 1
4. b). | What are the appropriate problems for Decision Tree Learning? Explain. | 3M | 4 | 1 | 2
4. c). | Write the advantages and limitations of DTL. | 10M | 4 | 2 | 3

5. a). | What is Pruning in DTL? | 2M | 4 | 2 | 2
5. b). | Compare Overfitting and Underfitting. | 3M | 4 | 1 | 4
5. c). | Discuss Time Series Methods in Data Analytics. | 10M | 4 | 3 | 1

6. a). | Brief about ARMA and ARIMA. | 2M | 4 | 2 | 3
6. b). | Compare Classification vs Regression and their methods and CART. | 3M | 4 | 3 | 2
6. c). | Interpret the ETL approach in Data Analytics. | 10M | 4 | 1 | 4

UNIT 5
S.No | QUESTION | MARKS | COURSE OUTCOME (CO) | PROGRAM OUTCOME (PO) | BLOOM'S TAXONOMY LEVEL (BTL)
1. a). | What do you mean by Data Visualization? Brief. | 2M | 5 | 2 | 1
1. b). | Why is Data Visualization required? Elaborate. | 3M | 5 | 1 | 2
1. c). | Explore in detail the different Geometric Projection Visualization Techniques. | 10M | 5 | 3 | 5

2. a). | What is a Tree Map? | 2M | 5 | 1 | 1
2. b). | What are the different categories of Data Visualizations? | 3M | 5 | 1 | 2
2. c). | Explain the steps involved in Data Visualization in Tableau. | 10M | 5 | 1 | 5

3. a). | What is a Line plot? | 2M | 5 | 1 | 2
3. b). | How is a Pie Chart represented? | 3M | 5 | 2 | 1
3. c). | Explain in detail the Icon-Based Visualization Techniques. | 10M | 5 | 5 | 5

4. a). | What is a scatter plot? | 2M | 5 | 1 | 1
4. b). | What is a Box Plot? | 3M | 5 | 1 | 2
4. c). | Explain the Hierarchical Visualization Techniques. | 10M | 5 | 2 | 3

5. a). | What is Circle packing? | 2M | 5 | 2 | 2
5. b). | Write and brief the different terminologies used in Box Plot. | 3M | 5 | 1 | 5
5. c). | What is a Word Cloud? Explore the visualization of complex data and relations. | 10M | 5 | 3 | 1

6. a). | What are a Chernoff Face and a Stick Figure? | 2M | 5 | 2 | 3
6. b). | List a few popular tools for the ETL approach. | 3M | 4 | 3 | 2
6. c). | What are Time Series Methods? Explore ARIMA & ARMA. | 10M | 4 | 1 | 5

OBJECTIVE QUESTIONS
UNIT – 1
Choose the Correct Answer
1. Most of the data is generated from _________ [ ]
A. Print media B. Organizations
C. Social media D. e-commerce

2. Data Analytics is used to gather ___________in Data. [ ]


A. hidden insights B. perform market analysis
C. Interesting Patterns D. All the above

3. Market Analysis can be performed to understand the _________of competitors. [ ]


A. strengths B. weaknesses
C. Both Strengths and weaknesses D. Profits and loss

4. __________ policies and rules help describe the manner in which an enterprise wishes to process its data. [ ]
A. Working Policies B. Labor Policies
C. Business policies D. Administration

5. The General Approach is based on designing the Architecture at __________ Levels of Specification. [ ]
A. Logical Level B. Physical Level
C. Implementation Level D. All

6. _______________ is a group of non-numerical data such as words, sentences. [ ]


A. quantitative data B. Big Data
C. qualitative data D. Analytics

7. The data which is raw, original, and extracted directly from the official sources is known as _____________. [ ]
A. Secondary Data B. primary data
C. Input Data D. Processed Data

8. CRD stands for _______________ [ ]
A. Completely Randomized Design B. Complete Rough Data
C. Complete Raw Data D. Complete Raw Design

9. LSD (Latin Square Design) uses _________ squares with an equal number of rows and columns. [ ]
A. N x N B. N x M
C. N x 1 D. 1 x N

10. __________ is the assessment of how much the data is usable and fits its serving context. [ ]
A. Data Quality B. Data Integrity
C. Data Quantity D. Data Interpretability

FILL IN THE BLANKS:

11. ANOVA stands for __________________.
12. __________ is data that has already been collected and is reused for some valid purpose.
13. ___________ is about handling of missing data, noisy data, etc.
14. ____________ is a term referring to storing and accessing data over the internet.
15. Amazon S3 ___________________.
16. __________ is a point or an observation that deviates significantly from the other observations.
17. Reasons for outliers: ____________________.
18. An increase in error variance and a reduction in the power of statistical tests are caused by ____________.
19. PMM: _____________.
20. ________________ approach groups the similar data
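
Items 16 to 18 above concern outliers. As a minimal sketch, assuming the common 1.5 × IQR rule (an assumption; this question bank does not name a specific detection method), outliers could be flagged as shown below. The function name `iqr_outliers` and the sample data are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the 1.5 * IQR rule is used as the outlier criterion.
    """
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample: one reading deviates strongly from the rest.
data = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(data))  # → [95]
```

Once flagged, such points are typically removed, capped, or imputed, depending on the reason they occurred.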

UNIT-2
Choose the Correct Answer:
1. ______ is the leading analytics tool used for statistics and data modeling. [ ]
A. Java Programming B. C Programming
C. R Programming D. C++ Programming

2. ___________ is software that connects to any data source such as Excel, corporate Data Warehouse, etc. [ ]
A. Tableau B. R
C. Java D. Python

3. _________ can be assembled on any platform like SQL Server, a MongoDB database or JSON. [ ]
A. Java Programming B. Python Programming
C. R Programming D. C++ Programming

4. _______________ provides various machine learning and visualization libraries such as Scikit-learn, TensorFlow, Matplotlib, Pandas, Keras, etc. [ ]
A. Java Programming B. Python Programming
C. R Programming D. C++ Programming

5. _______ is one of the largest large-scale data processing engines that executes applications in Hadoop clusters. [ ]
A. Python B. R
C. Ruby D. Apache Spark

6. ___________ is also known as Google Refine. [ ]
A. Open_Refine B. Closed Refine
C. Wide_Refine D. Big Refine

7. ___________ is a collection of tightly or loosely connected computers that work together so that they act as a single entity. [ ]
A. Cluster computing B. Wide Computing
C. Big Computing D. Close Computing

8. _____________ is a lightning-fast cluster computing technology, designed for fast computation. [ ]
A. Big Computing B. Distributed computing
C. Apache Spark D. MySQL

9. Spark helps to run an application in __________ [ ]


A. Hadoop cluster B. Oracle
C. Big Cluster D. NoSQL

10. ______ is a component on top of Spark Core that introduces a new data abstraction. [ ]
A. SQL B. Spark SQL
C. Spark D. NoSQL

FILL IN THE BLANKS:
21. __________ is a distributed machine learning framework above Spark
22. ___________ is a distributed graph-processing framework on top of Spark
23. Scala is a statically typed programming language that incorporates both ____________
24. Scala primarily runs on ___________.
25. The name Scala is a portmanteau of ______________.
26. Cloudera Impala is Cloudera's open source massively parallel processing ___________.
27. Cloudera Impala is a query engine that runs on _______________ .
28. ___________ Database is a non-relational Data Management System.
29. MongoDB is an example of a _________ Database for document-oriented data.
30. An example of a Graph Database is __________.

UNIT-3
MULTIPLE CHOICE QUESTIONS:
1. The term __________ is used to indicate the estimation or prediction of the average value of one variable for a specified value of another variable. [ ]

A. Segregation B. Progression
C. Regression D. Aggregation

2. In simple linear regression, we want to model our data as ________________ [ ]

A. y = B0*x * B1 B. y = B0 + B1 * x
C. y = B1 * x D. y = B0 + x

3. RMSE can be computed as: ________ [ ]

A. $Err = \sqrt{\frac{\sum_{i=1}^{n} (p_i - x_i)^2}{n}}$
B. $Err = \sqrt{\frac{\sum_{i=1}^{n} (y_i - x_i)^2}{n}}$
C. $Err = \sqrt{\frac{\sum_{i=1}^{n} (p_i - y_i)^2}{n}}$
D. $Err = \sqrt{\frac{\sum_{i=1}^{n} (y_i - x_i)^2}{n - 1}}$

4. When we have a single input attribute (x) and we want to use linear regression, this is called ________________ [ ]
A. Multiple Linear Regression B. Continuous Linear Regression
C. simple linear regression D. Auto Linear Regression

5. In R, the function used to find a linear relation between x & y is ________ [ ]
A. lm(y,x) B. lm(y~x)
C. Linear(y,x) D. predict(x,y)

6. The goal of ________ is to improve data processing in an optimal way through attribute subset selection. [ ]
A. Rationalization B. Variable correlation
C. Variable Rationalization D. Various Rationalization

7. _____________ is a mathematical approach to create a statistical model to forecast future behavior based on input test data. [ ]
A. Progressive modeling B. Professional modeling
C. Predictive modeling D. Pro-active modeling

8. Logistic Regression is ______________ modeling [ ]


A. Supervised B. Semi Supervised
C. Unsupervised D. Reinforcement learning

9. In multinomial Logistic regression, there can be 3 or more possible _________ types of the dependent variable. [ ]
A. ordered B. Semi-ordered
C. unordered D. Under ordered

10. In Logistic Regression, the dependent variable must be ___________ in nature [ ]


A. Categorical B. Continuous
C. Correlated D. Classical
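
Several questions above refer to the simple linear model y = B0 + B1 * x and to RMSE. As a minimal revision sketch, assuming the textbook relations B1 = r * (s_y / s_x) and B0 = mean(y) - B1 * mean(x), where r is Pearson's correlation and s_x, s_y are sample standard deviations, the coefficients and the error can be computed as below. The helper names (`pearson_r`, `fit_simple_linear`, `rmse`) and the data are hypothetical; here p denotes predicted values and y actual values.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def fit_simple_linear(x, y):
    """Return (B0, B1) using B1 = r * s_y / s_x and B0 = mean(y) - B1 * mean(x)."""
    b1 = pearson_r(x, y) * stdev(y) / stdev(x)
    b0 = mean(y) - b1 * mean(x)
    return b0, b1


def rmse(predicted, actual):
    """Root mean squared error: sqrt(sum((p_i - y_i)^2) / n)."""
    n = len(actual)
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


# Hypothetical data lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1 = fit_simple_linear(x, y)
predictions = [b0 + b1 * xi for xi in x]
print(b0, b1, rmse(predictions, y))
```

On this data the fitted intercept and slope come out close to 1 and 2, and the RMSE is close to zero, up to floating-point rounding.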

FILL IN THE BLANKS:

31. In ordinal Logistic regression, there can be 3 or more possible ______________ types of dependent variables.
32. In _______________, there can be only two possible types of the dependent variable: Class-0 & Class-1.
33. ____________ is a statistical phenomenon in which multiple independent variables are highly correlated with each other.
34. Logistic Regression uses a complex function, known as the _______________.
35. Sigmoid function is _________________.
36. False positive is Type-2 Error (True/False) _______________.
37. False Negative is a Type ________ (I or II) Error.
38. In _________ error, the actual value was negative but the model predicted a positive value
39. Formula for Precision: ______________
40. Formula for F1-Score: ______________
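
The items above on the sigmoid function, precision and F1-score can be checked numerically. The following is a minimal sketch using the standard definitions (sigmoid(z) = 1 / (1 + e^(-z)), precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of precision and recall). The function names and the confusion-matrix counts are hypothetical.

```python
from math import exp


def sigmoid(z):
    """Logistic (sigmoid) function, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
tp, fp, fn = 8, 2, 4
print(sigmoid(0.0), precision(tp, fp), recall(tp, fn), f1_score(tp, fp, fn))
```

A false positive (actual negative, predicted positive) lowers precision, while a false negative lowers recall; F1 penalises both.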

UNIT-4
MULTIPLE CHOICE QUESTIONS:
1. Supervised learning is a learning method in which models are trained using _______ [ ]
A. Unlabeled data B. Raw Data
C. Labeled data D. Complete Data

2. Unsupervised learning is a method in which _________ are inferred from the unlabeled input data. [ ]
A. Classes B. Patterns
C. Errors D. Outputs

3. Supervised learning needs supervision to _________ the model [ ]


A. update B. Test
C. train D. Both Train & Test

4. The purpose of ___________ is to better understand your customers rather than data [ ]
A. segmentation B. Regression
C. Segregation D. Correlation

5. Demographic Segmentation is a __________ segmentation [ ]


A. Can be Non-Objective or objective B. Non-Objective
C. Objective D. Semi-Objective

6. Decision Tree is a _________________technique [ ]
A. supervised learning B. unsupervised learning
C. Semi-supervised learning D. Non-supervised learning

7. A Decision Tree is a tree-structured classifier, where ___________ represent the features of a dataset. [ ]
A. internal nodes & leaf nodes B. Root & Internal nodes
C. Root nodes & leaf nodes D. Leaf nodes

8. In a regression decision tree, the decision or the outcome variable is ______________ [ ]
A. Continuous B. Categorical
C. Discrete D. distant

9. _________________is the process of removing the unwanted branches from the tree [ ]
A. Edging B. Pruning
C. Regression D. predicting

10. The decision Tree is _______________to errors [ ]


A. Tolerable B. Non-tolerable
C. Sensitive D. Doesn’t allow

FILL IN THE BLANKS:

11. Supervised learning can be categorized into __________________ problems.
12. _________________ can be classified into Clustering and Association problems.
13. Linear Regression and Logistic Regression are unsupervised learning models (True/False) ______.
14. Decision Tree is the successor of _____________.
15. CART is ___________________.
16. The best attribute in the dataset is selected using _________________.
17. _________________ is a measure of the randomness in the information being processed.
18. _____________ is a statistical property that measures how well a given attribute separates the training examples.
19. ______________ is a modeling error in statistics that occurs when a function is too closely aligned to a limited set of data points and, in turn, fails in testing.
20. ARIMA is an acronym that stands for ____________________________
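
Items 16 to 18 above relate to choosing the splitting attribute of a decision tree. As a minimal sketch, assuming the usual ID3-style definitions (Shannon entropy in bits, and gain as parent entropy minus the weighted entropy of the child partitions), the two quantities can be computed as below. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, attribute_values):
    """Entropy of labels minus the weighted entropy after splitting on an attribute."""
    n = len(labels)
    groups = {}
    for label, value in zip(labels, attribute_values):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: one attribute separates the classes, the other does not.
labels = ["yes", "yes", "no", "no"]
separating = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]
print(information_gain(labels, separating), information_gain(labels, uninformative))
```

The attribute with the larger gain is the better candidate for the split; in this toy case the first attribute removes all uncertainty and the second removes none.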

UNIT- 5
MULTIPLE CHOICE QUESTIONS:
1. _____________ is the art and practice of gathering, analyzing, and graphically representing empirical information. [ ]
A. Data Modification B. Data visualization
C. Data Validation D. Data Updating

2. __________ is used to get graphical output from the results of predictive data analytics. [ ]
A. Tabular Data B. Total Blue
C. Tableau D. Tally

3. Data Visualization induces the viewer to think about the substance rather than about ________ through graphic design. [ ]
A. output B. outcome
C. methodology D. error

4. We need to choose the dimensions and measures in the process of __________ Data. [ ]
A. Extracting B. Estimating
C. Expressing D. Exploring

5. _____________ are the category-type data points such as landing page, source medium, etc. [ ]
A. Directions B. Dimensions
C. Detections D. Du-points

6. One of the great __________ qualities Tableau has is its ability to filter data in real time. [ ]
A. show room B. Show space
C. Show case D. Work space

7. DPA: ____________________ [ ]
A. Data presentation architecture B. Dual presentation architecture
C. Data preparation architecture D. Directive presentation architecture

8. Data visualization is viewed by many disciplines as a modern equivalent of ___________ [ ]
A. Virtual Communication B. Viral Communication
C. Visual Communication D. Vivid Communication

9. ____________ is both an art and a science. [ ]


A. Data Virtualization B. Data Visualization
C. Data Variation D. Data Variance

10. __________ can be more precise and revealing than conventional statistical computations. [ ]
A. Data Representation B. Data Virtualization
C. Graphical representation D. Graphical Computation

FILL IN THE BLANKS:

11. Data visualization helps gain insight into an information space by mapping data onto ______________ and provides a qualitative overview of large data sets.
12. ________________ is a Forecast Accuracy can be defined as the deviation of Forecast or
Prediction from the actual results.
13. In MFA, Error = _______________.
14. CHAID stands for ___________________.
15. Regression tree analysis is when the predicted outcome can be considered a
________________.
16. A ___________ tree is a binary decision tree that is constructed by splitting a node into two
child nodes repeatedly
17. Decision Tree Learning can handle both numerical and categorical data (True/False) ______________.
18. Decision Tree uses a White box Model (True/False) _______________.
19. Regression trees / parallel regression modeling, in which the dependent variable
is______________ .
20. The CART growing method attempts to _____________ within-node homogeneity.
