This sheet is for 1 Mark questions
S.r No Question
e.g 1 Write down question
In reinforcement learning, if the feedback is negative, it is defined as ____.
According to____ , it’s a key success factor for the survival and evolution of all species.
During the last few years, many ______ algorithms have been applied to deep
neural networks to learn the best policy for playing Atari video games and to teach an agent how to
associate the right action with an input representing the state.
A feature F1 can take certain values: A, B, C, D, E, & F and represents the grade of students from a college.
Which of the following statement is true in following case?
Can a model trained for item based similarity also choose from a given set of items?
What are common feature selection methods in regression task?
The parameter______ allows specifying the percentage of elements to put into the test/training set
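For concreteness, a minimal sketch of this kind of split in scikit-learn; the X and y below are illustrative placeholders, not data from this sheet:

# Minimal sketch: splitting a dataset with scikit-learn's train_test_split.
from sklearn.model_selection import train_test_split

X = [[0], [1], [2], [3]]   # four illustrative samples, one feature each
y = [0, 0, 1, 1]           # matching labels

# test_size (or train_size) fixes the percentage of elements in each split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)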
In many classification problems, the target ______ is made up of categorical labels which cannot
immediately be processed by any algorithm.
_______adopts a dictionary-oriented approach, associating to each category label a progressive integer
number.
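A minimal sketch of that dictionary-style label-to-integer mapping, assuming scikit-learn's LabelEncoder as the example:

# Minimal sketch: each distinct category label is mapped to a progressive integer.
from sklearn.preprocessing import LabelEncoder

labels = ['B', 'A', 'C', 'A']          # illustrative category labels
le = LabelEncoder()
codes = le.fit_transform(labels)        # labels become integer codes
print(le.classes_, codes)               # ['A' 'B' 'C'] [1 0 2 0]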
If a linear regression model fits the training data perfectly (i.e., train error is zero), then _____________________
Which of the following metrics can be used for evaluating regression models? i) R Squared ii) Adjusted R Squared iii) F Statistics iv) RMSE / MSE / MAE
How many coefficients do you need to estimate in a simple linear regression model (One independent variable)?
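As a worked reminder behind this question and the next one, a simple linear regression with one independent variable is
\hat{y} = \beta_0 + \beta_1 x
so two coefficients are estimated (intercept \beta_0 and slope \beta_1), and a one-unit change in x changes the prediction by the slope \beta_1.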
In a simple linear regression model (one independent variable), if we change the input variable by 1 unit, how much will the output variable change?
Which of the following methods do we use to find the best fit line for data in Linear Regression?
Which of the following evaluation metrics can be used to evaluate a model while modeling a
continuous output variable?
Suppose you plotted a scatter plot between the residuals and predicted values in linear regression and
you found that there is a relationship between them. Which of the following conclusion do you make
about this situation?
Naive Bayes classifiers are a collection of ------------------ algorithms
Naive Bayes classifier is a _______________ learning algorithm
Features being classified are independent of each other in a Naïve Bayes Classifier
Features being classified are __________ of each other in a Naïve Bayes Classifier
Bayes' Theorem is given as follows, where: 1. P(H) is the probability of hypothesis H being true.
2. P(E) is the probability of the evidence(regardless of the hypothesis).
3. P(E|H) is the probability of the evidence given that hypothesis is true.
4. P(H|E) is the probability of the hypothesis given that the evidence is there.
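Putting those four quantities together, the theorem reads
P(H \mid E) = \dfrac{P(E \mid H)\, P(H)}{P(E)}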
Even if there are no actual supervisors ________ learning is also based on feedback provided
by the environment
When it is necessary to allow the model to develop a generalization ability and avoid a common
problem called______.
Techniques involving the usage of both labeled and unlabeled data are called ___.
According to____ , it’s a key success factor for the survival and evolution of all species.
A supervised scenario is characterized by the concept of a _____.
Overlearning is caused by an excessive ______.
_____ provides some built-in datasets that can be used for testing purposes.
While using _____ all labels are
turned into sequential numbers.
_______produce sparse matrices of real numbers that can be fed into any machine learning
model.
scikit-learn offers the class______, which is responsible for filling the holes using a strategy
based on the mean, median, or frequency
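A minimal sketch of this kind of imputation; the sheet refers to the older scikit-learn Imputer class, which current releases expose as SimpleImputer (assumed below):

# Minimal sketch: filling missing values ("holes") with the column mean.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])            # illustrative data with missing entries

imp = SimpleImputer(strategy='mean')      # 'median' or 'most_frequent' are also supported
print(imp.fit_transform(X))               # holes filled with the column means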
Which of the following scales data by removing elements that don't belong to a given range or by considering a maximum absolute value?
scikit-learn also provides a class for per-sample normalization,_____
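A minimal sketch of per-sample (row-wise) normalization, assuming scikit-learn's Normalizer:

# Minimal sketch: each sample (row) is rescaled independently.
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0],
              [1.0, 1.0]])                # illustrative samples

# Supported norms include 'l1', 'l2' and 'max'.
print(Normalizer(norm='l2').fit_transform(X))   # first row becomes [0.6, 0.8]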
A ______ dataset with many features contains information proportional to the independence of all features and their variance.
In order to assess how much information is brought by each component, and the correlation
among them, a useful tool is the_____.
The_____ parameter can assume different values which determine how the data matrix is
initially processed.
______allows exploiting the natural sparsity of data while extracting principal components.
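A minimal sketch on illustrative random data tying the last few questions together: the covariance matrix of the features, plain PCA, and SparsePCA for sparse component extraction:

# Minimal sketch: covariance matrix and principal components on toy data.
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.RandomState(0)
X = rng.randn(100, 5)                        # illustrative dataset: 100 samples, 5 features

print(np.cov(X, rowvar=False).shape)         # (5, 5) covariance matrix of the features

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print(pca.explained_variance_ratio_)         # information carried by each component

spca = SparsePCA(n_components=2, random_state=0)   # sparse principal components
X_spca = spca.fit_transform(X)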
Which of the following evaluation metrics can be used to evaluate a model while modeling a
continuous output variable?
Overfitting is more likely when you have a huge amount of data to train on?
Suppose you plotted a scatter plot between the residuals and predicted values in linear regression and
you found that there is a relationship between them. Which of the following conclusion do you make
about this situation?
Let’s say, a “Linear regression” model perfectly fits the training data (train error is zero). Now, Which
of the following statement is true?
In a linear regression problem, we are using “R-squared” to measure goodness-of-fit. We add a feature
in linear regression model and retrain the same model.Which of the following option is true?
Which of the following assumptions do we make while deriving linear regression parameters?1. The
true relationship between dependent y and predictor x is linear2. The model errors are statistically
independent3. The errors are normally distributed with a 0 mean and constant standard deviation4. The
predictor x is non-stochastic and is measured error-free
To test linear relationship of y(dependent) and x(independent) continuous variables, which of the
following plot best suited?
which of the following step / assumption in regression modeling impacts the trade-off between
under-fitting and over-fitting the most.
Which of the following statement(s) can be true post adding a variable in a linear regression model?1.
R-Squared and Adjusted R-squared both increase2. R-Squared increases and Adjusted R-squared
decreases3. R-Squared decreases and Adjusted R-squared decreases4. R-Squared decreases and
Adjusted R-squared increases
How many coefficients do you need to estimate in a simple linear regression model (One independent
variable)?
In the given image, P(H) is the __________ probability.
Conditional probability is a measure of the probability of an event given that another event has
already occurred.
Gaussian distribution when plotted, gives a bell shaped curve which is symmetric about the _______ of the feature
SVMs directly give us the posterior probabilities P(y = 1|x) and P(y = -1|x)
SVM is a ------------------ algorithm
What is/are true about kernel in SVM?1. Kernel function map low dimensional data to high
dimensional space2. It’s a similarity function
Suppose you are building an SVM model on data X. The data X can be error prone, which means
that you should not trust any specific data point too much. Now suppose you want to build an
SVM model which has a quadratic kernel function of polynomial degree 2 and uses the slack
variable C as one of its hyperparameters. What would happen when you use a very small C
(C~0)?
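A minimal sketch (toy data, scikit-learn assumed) of the model this question describes; with a tiny C the slack penalty is weak, so the margin is wide and more training points may be misclassified:

# Minimal sketch: SVC with a degree-2 polynomial kernel and a very small vs. large C.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)   # illustrative data

svm_small_c = SVC(kernel='poly', degree=2, C=1e-3).fit(X, y)   # C ~ 0: very tolerant margin
svm_large_c = SVC(kernel='poly', degree=2, C=1e3).fit(X, y)    # large C: slack penalized heavily
print(svm_small_c.score(X, y), svm_large_c.score(X, y))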
Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions
that might be related to the event.
The Bernoulli Naïve Bayes Classifier assumes a ___________ distribution
If you remove the non-red circled points from the data, the decision boundary will change?
The binarize parameter in scikit-learn's BernoulliNB sets the threshold for binarizing sample features.
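A minimal sketch of that threshold in use, on illustrative data:

# Minimal sketch: features are binarized at the given threshold before Bernoulli NB fitting.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([[0.2, 0.9],
              [0.8, 0.1],
              [0.7, 0.6]])                 # illustrative continuous features
y = np.array([0, 1, 1])

clf = BernoulliNB(binarize=0.5)            # values above 0.5 become 1, the rest 0
clf.fit(X, y)
print(clf.predict([[0.9, 0.3]]))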
Image | Options a-d | Correct Answer
…Adjusted R Squared iii) F Statist… | a) ii and iv b) i and ii c) ii, iii and iv d) i, ii, iii and iv | Answer: d
…dependent variable)? | a) 1 b) 2 c) 3 d) 4 | Answer: b
…by 1 unit. How much output… | a) by 1 b) no change c) by intercept d) by its slope | Answer: d
bayes.jpg | a) True b) 0 | Answer: a
bayes.jpg | a) Posterior b) Prior | Answer: a
bayes.jpg | a) Posterior b) Prior | Answer: b
…has already occurred. | a) True b) 0 | Answer: a
…that might be related to the e… | a) True b) 0 | Answer: a
a) Continuous b) Discrete c) Binary | Answer: c
a) Continuous b) Discrete c) Binary | Answer: b
a) Continuous b) Discrete c) Binary | Answer: a
a) True b) 0 | Answer: a
…of the feature values. | a) Mean b) Variance c) Discrete d) Random | Answer: a
a) True b) 0 | Answer: b
a) True b) 0 | Answer: a
…kernel) might lead to overfitting | a) True b) 0 | Answer: a
a) Classification b) Clustering c) Regression d) All | Answer: a
a) Supervised b) Unsupervised c) Both d) None | Answer: a
a) True b) 0 | Answer: a
-- | a) cl_forecast b) cl_nowcast c) cl_precast d) None of the Mentioned | Answer: D
-- | a) fast b) accuracy c) scalable d) All above | Answer: D
-- | a) Supervised Learning and Semi-supervised Learning b) Unsupervised Learning and Transduction c) Both A & B d) None of the Mentioned | Answer: C
-- | a) split the set of examples into the training set and the test b) group the set of examples into the training set and the test c) a set of observed instances tries to induce a general rule d) learns programs from data | Answer: A
-- | a) Artificial Intelligence b) Rule based inference c) Both A & B d) None of the Mentioned | Answer: B
-- | a) The process of selecting models among different mathematical models, which are used to describe the same data set b) when a statistical model describes random error or noise instead of underlying relationship c) Find interesting directions in data and find novel observations / database cleaning d) All above | Answer: A
-- | a) Genetic Programming and Inductive Learning b) Speech recognition and Regression c) Both A & B d) None of the Mentioned | Answer: A
-- | a) Supervised b) Unsupervised c) Reinforcement d) None of the above | Answer: B
-- | a) Robots are programed so that they can perform the task based on data they gather from sensors. b) A set of data is used to discover the potentially predictive relationship. c) the ability to change according to the external stimuli and remembering most of all previous experiences d) It is a set of data used to discover the potentially predictive relationship. | Answer: C
-- | a) Overfitting b) Overlearning c) Classification d) Regression | Answer: A
-- | a) Supervised b) Semi-supervised c) Unsupervised d) None of the above | Answer: B
-- | a) Penalty b) Overlearning c) Reward d) None of above | Answer: A
-- | a) Gini Index b) Claude Shannon's theory c) Darwin's theory d) None of above | Answer: C
-- | a) Programmer b) Teacher c) Author d) Farmer | Answer: B
-- | a) Capacity b) Regression c) Reinforcement d) Accuracy | Answer: A
-- | a) PCA b) K-Means c) None of the above | Answer: A
-- | a) MCV b) MARS c) MCRS d) All above | Answer: B
-- | a) YES b) NO | Answer: A
-- | a) NO b) YES | Answer: B
-- | a) regression b) classification c) None of the above | Answer: C
-- | a) scikit-learn b) classification c) regression d) None of the above | Answer: A
-- | a) LabelEncoder class b) LabelBinarizer class c) DictVectorizer d) FeatureHasher | Answer: A
-- | a) DictVectorizer b) FeatureHasher c) Both A & B d) None of the Mentioned | Answer: C
-- | a) LabelEncoder b) LabelBinarizer c) DictVectorizer d) Imputer | Answer: D
-- | a) MinMaxScaler b) MaxAbsScaler c) Both A & B d) None of the Mentioned | Answer: C
-- | a) Normalizer b) Imputer c) Classifier d) All above | Answer: A
-- | a) normalized b) unnormalized c) Both A & B d) None of the Mentioned | Answer: B
-- | a) Concurrent matrix b) Convergence matrix c) Supportive matrix d) Covariance matrix | Answer: D
-- | a) run b) start c) stop d) init | Answer: C
-- | a) SparsePCA b) KernelPCA c) SVD d) init parameter | Answer: A
-- | a) AUC-ROC b) Accuracy c) Logloss d) Mean-Squared-Error | Answer: D
-- | a) Lower is better b) Higher is better c) A or B depend on the situation d) None of these | Answer: A
-- | a) 1 b) 0 | Answer: B
-- | a) Linear regression is sensitive to outliers b) Linear regression is not sensitive to outliers c) Can't say d) None of these | Answer: A
-- | a) Since there is a relationship, it means our model is not good b) Since there is a relationship, it means our model is good c) Can't say d) None of these | Answer: A
-- | a) You will always have test error zero b) You can not have test error zero c) None of the above | Answer: C
-- | a) If R Squared increases, this variable is significant. b) If R Squared decreases, this variable is not significant. c) Individually R squared cannot tell about variable importance. We can't say anything about it right now. d) None of these. | Answer: C
-- | a) Linear Regression with varying error terms b) Linear Regression with constant error terms c) Linear Regression with zero error terms d) None of these | Answer: A
-- | a) 1 and 2 b) 1 and 3 c) 2 and 4 d) None of the above | Answer: A
-- | a) 1 b) 2 c) Can't Say | Answer: B
bayes.jpg | a) Posterior b) Prior | Answer: B
-- | a) True b) 0 | Answer: A
-- | a) Misclassification would happen b) Data will be correctly classified c) Can't say d) None of these | Answer: A
-- | a) The number of cross-validations to be made b) The kernel to be used c) The tradeoff between misclassification and simplicity of the model d) None of the above | Answer: C
-- | a) True b) 0 | Answer: A
-- | a) Continuous b) Discrete c) Binary | Answer: C
svm.jpg | a) 1 b) 0 | Answer: B
-- | a) Drop missing rows or columns b) Replace missing values with mean/median/mode c) Assign a unique category to missing values d) All of the above | Answer: D
-- | a) True b) 0 | Answer: A
-- | a) Attributes are equally important. b) Attributes are statistically dependent of one another given the class value. c) Attributes are statistically independent of one another given the class value. d) Attributes can be nominal or numeric | Answer: B
-- | a) The data is linearly separable b) The data is clean and ready to use c) The data is noisy and contains overlapping points | Answer: C
-- | a) Supervised b) Unsupervised c) Both d) None | Answer: A
-- | a) False b) 1 | Answer: B
-- | a) Independent b) Dependent c) Partial Dependent d) None | Answer: A
bayes.jpg | a) True b) 0 | Answer: A
-- | a) True b) 0 | Answer: A
This sheet is for 2 Mark questions
S.r No Question
e.g 1 Write down question
1 A supervised scenario is characterized by the concept of a _____.
2 Overlearning is caused by an excessive ______.
5 Some people are using the term ___ instead of prediction only to avoid the weird idea that machine learning is a sort of modern magic.
6 The term _____ can be freely used, but with the same meaning adopted in physics or system theory.
8 Even if there are no actual supervisors, ________ learning is also based on feedback provided by the environment.
12 What are the two methods used for the calibration in Supervised Learning?
16 _____ provides some built-in datasets that can be used for testing purposes.
17 While using _____ all labels are turned into sequential numbers.
18 _______ produce sparse matrices of real numbers that can be fed into any machine learning model.
19 scikit-learn offers the class ______, which is responsible for filling the holes using a strategy based on the mean, median, or frequency.
20 Which of the following scales data by removing elements that don't belong to a given range or by considering a maximum absolute value?
21 Which of the following models include a backwards elimination feature selection routine?
22 Can we extract knowledge without applying feature selection?
23 While using feature selection on the data, does the number of features decrease?
24 Which of the following are several models for feature extraction
25 scikit-learn also provides a class for per-sample normalization,_____
26 A ______ dataset with many features contains information proportional to the independence of all features and their variance.
27 In order to assess how much information is brought by each component, and the correlation among them, a useful tool is the _____.
28 The _____ parameter can assume different values which determine how the data matrix is initially processed.
29 ______allows exploiting the natural sparsity of data while extracting principal components.
30 Which of the following is an example of a deterministic algorithm?
31 Let's say, a "Linear regression" model perfectly fits the training data (train error is zero). Now, which of the following statement is true?
32 In a linear regression problem, we are using "R-squared" to measure goodness-of-fit. We add a feature in the linear regression model and retrain the same model. Which of the following option is true?
33 Which one of the following is true about Heteroskedasticity?
Which of the following assumptions do we make while deriving linear regression parameters?1.
34 The true relationship between dependent y and predictor x is linear2. The model errors are
statistically independent3. The errors are normally distributed with a 0 mean and constant
standard deviation4. The predictor x is non-stochastic and is measured error-free
35 To test the linear relationship of y (dependent) and x (independent) continuous variables, which of the following plots is best suited?
36 Generally, which of the following method(s) is used for predicting continuous dependent
variable?1. Linear Regression2. Logistic Regression
Suppose you are training a linear regression model. Now consider these points.1. Overfitting is
37 more likely if we have less data2. Overfitting is more likely when the hypothesis space is
small.Which of the above statement(s) are correct?
Suppose we fit “Lasso Regression” to a data set, which has 100 features (X1,X2…X100). Now, we
38 rescale one of these feature by multiplying with 10 (say that feature is X1), and then refit Lasso
regression with the same regularization parameter.Now, which of the following option will be
correct?
39 Which of the following is true about “Ridge” or “Lasso” regression methods in case of feature
selection?
Which of the following statement(s) can be true post adding a variable in a linear regression
40 model?1. R-Squared and Adjusted R-squared both increase2. R-Squared increases and Adjusted
R-squared decreases3. R-Squared decreases and Adjusted R-squared decreases4. R-Squared
decreases and Adjusted R-squared increases
We can also compute the coefficient of linear regression with the help of an analytical method
41 called “Normal Equation”. Which of the following is/are true about “Normal Equation”?1. We
don’t have to choose the learning rate2. It becomes slow when number of features is very large3.
No need to iterate
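For reference, the closed-form solution this question refers to is usually written, with X the design matrix (including a bias column) and y the target vector, as
\theta = (X^{\top} X)^{-1} X^{\top} y
which needs no learning rate and no iteration, but the matrix inversion becomes slow when the number of features is very large.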
42 How many coefficients do you need to estimate in a simple linear regression model (One
independent variable)?
43 If two variables are correlated, is it necessary that they have a linear relationship?
44 Correlated variables can have zero correlation coefficient. True or False?
45 Which of the following option is true regarding "Regression" and "Correlation"? Note: y is the dependent variable and x is the independent variable.
46 What is/are true about kernel in SVM? 1. Kernel function maps low dimensional data to high dimensional space 2. It's a similarity function
47 Suppose you are building an SVM model on data X. The data X can be error prone, which means that you should not trust any specific data point too much. […]
48 Suppose you are using a Linear SVM classifier with a 2 class classification problem. Now you have been given […]
49 If you remove the non-red circled points from the data, the decision boundary will change?
50 When the C parameter is set to infinite, which of the following holds true?
51 Suppose you are building an SVM model on data X. The data X can be error prone, which means that you should not trust any specific data point too much. […]
52 SVM can solve linear and non-linear problems.
53 The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N — the number of features) that distinctly classifies the data points.
54 Hyperplanes are _____________ boundaries that help classify the data points.
55 The _____ of the hyperplane depends upon the number of features.
56 Hyperplanes are decision boundaries that help classify the data points.
57 SVM algorithms use a set of mathematical functions that are defined as the kernel.
58 In SVM, the kernel function is used to map lower dimensional data into higher dimensional data.
59 In SVR we try to fit the error within a certain threshold.
60 When the C parameter is set to infinite, which of the following holds true?
61 How do you handle missing or corrupted data in a dataset?
62 What is the purpose of performing cross-validation?
63 Which of the following is true about Naive Bayes?
64
71 What are the two methods used for the calibration in Supervised Learning?
72 ______ can be adopted when it's necessary to categorize a large amount of data with a few complete examples or when there's the need to impose some constraints to a clustering algorithm.
73 In reinforcement learning, this feedback is usually called as ___.
74 In the last decade, many researchers started training bigger and bigger models, built with several different layers; that's why this approach is called _____.
75 There's a growing interest in pattern recognition and associative memories whose structure and functioning are similar to what happens in the neocortex. Such an approach also allows simpler algorithms called _____.
76 ______ showed better performance than other approaches, even without a context-based model
77 Common deep learning applications / problems can also be solved using____
78 Some people are using the term ___ instead of prediction only to avoid the weird idea that machine learning is a sort of modern magic.
79 The term _____ can be freely used, but with the same meaning adopted in physics or system theory.
80 If there is only a discrete number of possible outcomes, it is called _____.
81 A feature F1 can take certain values: A, B, C, D, E, & F and represents the grade of students from a college. Which of the following statement is true in the following case?
82 What would you do in PCA to get the same projection as SVD?
84 Can a model trained for item based similarity also choose from a given set of items?
85 What are common feature selection methods in regression task?
86 The parameter______ allows specifying the percentage of elements to put into the test/training set
87 In many classification problems, the target ______ is made up of categorical labels which cannot immediately be processed by any algorithm.
88 _______ adopts a dictionary-oriented approach, associating to each category label a progressive integer number.
89 ________ is much more difficult because it's necessary to determine a supervised strategy to train a model for each feature and, finally, to predict their value.
99 If a linear regression model fits perfectly, i.e., train error is zero, then _____________________
100 Which of the following metrics can be used for evaluating regression models? i) R Squared ii) Adjusted R Squared iii) F Statistics iv) RMSE / MSE / MAE
101 In syntax of linear model lm(formula,data,..), data refers to ______
102 Linear Regression is a supervised machine learning algorithm.
103 It is possible to design a Linear regression algorithm using a neural network?
104 Which of the following methods do we use to find the best fit line for data in Linear Regression?
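The usual answer to the best-fit-line question is ordinary least squares; a minimal sketch with scikit-learn on illustrative data:

# Minimal sketch: least-squares fit of a line y ≈ b0 + b1*x.
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1.0], [2.0], [3.0], [4.0]])   # illustrative inputs
y = np.array([2.1, 3.9, 6.2, 8.1])           # illustrative targets

model = LinearRegression().fit(x, y)
print(model.intercept_, model.coef_)          # estimated intercept b0 and slope b1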
105 Suppose you are training a linear regression model. Now consider these points. 1. Overfitting is more likely if we have less data 2. Overfitting is more likely when the hypothesis space is small. Which of the above statement(s) are correct?
106 We can also compute the coefficient of linear regression with the help of an analytical method called "Normal Equation". Which of the following is/are true about "Normal Equation"? 1. We don't have to choose the learning rate 2. It becomes slow when the number of features is very large 3. No need to iterate
107 Which of the following option is true regarding "Regression" and "Correlation"? Note: y is the dependent variable and x is the independent variable.
108 In a simple linear regression model (one independent variable), if we change the input variable by 1 unit, how much will the output variable change?
109 Generally, which of the following method(s) is used for predicting a continuous dependent variable? 1. Linear Regression 2. Logistic Regression
110 How many coefficients do you need to estimate in a simple linear regression model (one independent variable)?
111 Suppose you are building an SVM model on data X. The data X can be error prone, which means that you should not trust any specific data point too much. Now suppose you want to build an SVM model which has a quadratic kernel function of polynomial degree 2 and uses the slack variable C as one of its hyperparameters. What would happen when you use a very large value of C (C->infinity)?
112 SVM can solve linear and non-linear problems.
113 The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N — the number of features) that distinctly classifies the data points.
114 Hyperplanes are _____________boundaries that help classify the data points.
115 When the C parameter is set to infinite, which of the following holds true?
130 When the C parameter is set to infinite, which of the following holds true?
Image | a | b | c
img.jpg | Option a | Option b | Option c
Programmer | Teacher | Author
Capacity | Regression | Reinforcement
Genetic Programming and Inductive Learning | Speech recognition and Regression | Both A & B
run | start | init
SparsePCA | KernelPCA | SVD
PCA | K-Means | None of the above
A. You will always have test error zero | B. You can not have test error zero | C. None of the above
A. If R Squared increases, this variable is significant. | B. If R Squared decreases, this variable is not significant. | C. Individually R squared cannot tell about variable importance. We can't say anything about it right now.
A. Linear Regression with varying error terms | B. Linear Regression with constant error terms | C. Linear Regression with zero error terms
A. Both are False | B. 1 is False and 2 is True | C. 1 is True and 2 is False
A. 1 | B. 2 | C. Can't Say
A. Yes | B. No
A. True | B. False
A. The relationship is symmetric between x and y in both. | B. The relationship is not symmetric between x and y in both. | C. The relationship is not symmetric between x and y in case of correlation but in case of regression it is symmetric.
1 | 2 | 1 and 2
…means that you should not trust an… | Misclassification would… | Data will be correctly classified | Can't say
svm.jpg | yes | no
svm.jpg | 1 | 0
The optimal hyperplane if exists, will be the one that completely separates the data | The soft-margin classifier will separate the data | None of the above
…means that you should not trust an… | We can still classify dat… | We can not classify data correctl… | Can't Say
1 | 0
1 | 0
1 | 0
1 | 0
1 | 0
The optimal hyperplane if exists, will be the one that completely separates the data | The soft-margin classifier will separate the data | None of the above
a. Drop missing rows or columns | b. Replace missing values with mean/median/mode | c. Assign a unique category to missing values
a. To assess the predictive performance of the models | b. To judge how the trained model performs outside the sample on test data | c. Both A and B
a. Assumes that all the features in a dataset are equally important | b. Assumes that all the features in a dataset are independent | c. Both A and B
A. Attributes are equally important. | B. Attributes are statistically dependent of one another given the class value. | C. Attributes are statistically independent of one another given the class value.
PCA | Decision Tree | Naive Bayesian
-- | By using a lot of data | By using inductive machine learning | By using validation only
-- | Decision Trees and Neural Networks (back propagation) | Probabilistic networks and Nearest Neighbor | Support vector machines
-- | A set of data is used to discover the potentially predictive relationship. | Training set is used to test the accuracy of the hypotheses generated by the learner. | Both A & B
-- | Concept Vs Classification Learning | Symbolic Vs Statistical Learning | Inductive Vs Analytical Learning
-- | Find clusters of the data and find low-dimensional representations of the data | Find interesting directions in data and find novel observations/database cleaning | Interesting coordinates and correlations
-- | Platt Calibration and Isotonic Regression | Statistics and Informal Retrieval
-- | Supervised | Semi-supervised | Reinforcement
-- | Overfitting | Overlearning | Reward
-- | Deep learning | Machine learning | Reinforcement learning
-- | Machine learning | Deep learning | Reinforcement learning
-- | Real-time visual object identification | Classic approaches | Automatic labeling
-- | The relationship is symmetric between x and y in both. | The relationship is not symmetric between x and y in both. | The relationship is not symmetric between x and y in case of correlation but in case of regression it is symmetric.
-- | by 1 | no change | by intercept
-- | 1 | 2 | 3
man.jpg | 1 | 0
weather data.jpg | 0.4 | 0.64 | 0.29
-- | 1 | 0
-- | 1 | 0
-- | The optimal hyperplane if exists, will be the one that completely separates the data | The soft-margin classifier will separate the data | None of the above
d Correct Answer
Option d a/b/c/d
Farmer B
Accuracy A
None of above B
learns programs from data | A
None of above A
Prediction D
None of the Mentioned | A
None of the above | B
Bio-inspired adaptive systems | B
All above D
All D
learns programs from data | A
None of the Mentioned | B
All above A
None of the above | A
FeatureHasher A
None of the Mentioned | C
Imputer D
None of the Mentioned | C
All above B
A
B
C
All above A
None of the Mentioned | B
Covariance matrix | D
stop C
init parameter A
A
D. None of these. | c
D. None of these | a
D. All of above. d
D. None of these | a
D. None of these. | b
D. Both are True | c
D. None of these | b
D. None of above | b
D. None of the above | a
D. 1,2 and 3. d
b
b
a
D. The relationship is symmetric between x and y in case of correlation but in case of regression it is not symmetric. | d
None of these c
None of these a
a
b
None of these a
a
a
d. All of the above | d
d. None of the above option | c
D. Attributes can be nominal or numeric | b
Linear regression | a
None of above A
All D
None of above B
All above D
All D
Clusters B
None of above C
Unsupervised learning | A
Scalable C
Supervised learning | B
Bio-inspired adaptive systems | B
None of above A
Prediction D
None of above B
Both of these B
None of these A
All above D
A
None of these C
None of these C
All above B
FeatureHasher A
All above B
missing_values D
FeatureHasher A
max, l3 and l4 norms | B
All above A
All above A
None of the Mentioned | B
B
A
None of these B
Test error is equal to Train error | C
i, ii, iii and iv D
List B
A
A
Both A and B A
1,2 and 3. D
The relationship is symmetric between x and y in case of correlation but in case of regression it is not symmetric. | D
by its slope D
None of these. B
4 B
None of these A
A
A
B
None A
A
B
All of the above | D
A
A
A
All A
A
A
A
0.75 B
A
A
A
This sheet is for 3 Mark questions
S.r No Question Image a
e.g 1 Write down question img.jpg Option a
1 Which of the following is characteristic of best machine learning method? | fast
56 Which of the following is true about Naive Bayes? | -- | a. Assumes that all the features in a dataset are equally important
57 Suppose you are using a Linear SVM classifier with 2 class classification problem. Now you have been given the following data in which some points are circled red that are representing support vectors. If you remove the following any one red points from the data, does the decision boundary will change? | svm.jpg | yes
58 Linear SVMs have no hyperparameters that need to be set by cross-validation | -- | 1
59 For the given weather data, what is the probability that players will play if weather is sunny | weather data.jpg | 0.5
60 100 people are at party. Given data gives information about how many wear pink or not, and if a man or not. Imagine a pink wearing guest leaves, what is the probability of being a man | man.jpg | 0.4
61 Problem: Players will play if weather is sunny. Is this statement correct? | weather data.jpg | 1
62 For the given weather data, Calculate probability oweather data.jpg 0.4
63 For the given weather data, Calculate probability oweather data.jpg 0.4
64 For the given weather data, what is the probabilityweather data.jpg 0.5
65 100 people are at party. Given data gives informatman.jpg 0.4
66 100 people are at party. Given data gives informa man.jpg 1
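The counts behind questions 59-66 live in the referenced weather data.jpg and man.jpg tables, which are not reproduced in this sheet; the calculation pattern they all use is the Bayes posterior
P(\text{Yes} \mid \text{Sunny}) = \dfrac{P(\text{Sunny} \mid \text{Yes})\, P(\text{Yes})}{P(\text{Sunny})}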
67 What do you mean by generalization error in terms of the SVM? How far the hyp
68 What do you mean by a hard margin? The SVM allows
69 The minimum time complexity for training an SVM is O(n2). Accordin Large datasets
70 The effectiveness of an SVM depends upon: Selection of Ke
71 Support vectors are the data points that lie closest to the decision 1
72 The SVM’s are less effective when: The data is line
73 Suppose you are using RBF kernel in SVM with high Gamma value. The model would
74 The cost parameter in the SVM means: | The number of cross-validations to be made
75 If I am using all features of my dataset and I achieve 100% accurac Underfitting
76 Which of the following are real world applications of the SVM? Text and Hypert
77 Suppose you have trained an SVM with linear decision boundary. After training the SVM, you correctly infer that your SVM model is under fitting. Which of the following option would you more likely to consider iterating SVM next time? | You want to inc…
78 We usually use feature normalization before using the Gaussian kernel in SVM. 1What is true about feature
79 Linear SVMs have no hyperparameters that need to be set by cross-vali 1
80 In a real problem, you should check to see if the SVM is separable and the 1
b | c | d | Correct Answer
Option b | Option c | Option d | a/b/c/d
accuracy | scalable | All above | D
Similarity detection | Automatic labeling | All above | D
it's often very dynamic | it's impossible to have a precise error measure | All above | D
Autonomous car driving, Logistic optimization | Bioinformatics, Speech recognition | All above | D
Image classification, Real-time visual tracking | Autonomous car driving, Logistic optimization | Bioinformatics, Speech recognition | A
Train and Test always have same distribution. | Frequency distribution of categories is different in train as compared to the test dataset. | Both A and B | D
It may be used for causal interpretation. | It discovers relationships. | It is used for prediction. | D
Density-Based Spatial Clustering | Spectral Clustering Find clusters | All above | D
make_regression() | make_blobs() | All above | D
random_state | test_size | training_size | B
2 | 3 | 4 | B
LabelBinarizer class | DictVectorizer | FeatureHasher | C
Creating sub-model to predict those features | Using an automatic strategy to input them according to the other known values | All above | A
In case of very large lambda; bias is high, variance is low | In case of very large lambda; bias is high, variance is high | In case of very large lambda; bias is low, variance is high | C
Lasso | Both Ridge and Lasso | None of both | B
lr(formula, data) | lrm(formula, data) | regression.linear(formula, data) | A
(Slope, X-Intercept) | (Y-Intercept, Slope) | (slope, Y-Intercept) | C
Relation between the X1 and Y is strong | Relation between the X1 and Y is neutral | Correlation can't judge the relationship | B
Decrease | Remain constant | Can't Say | D
Bias decreases and Variance decreases | Bias increases and Variance decreases | Bias decreases and Variance increases | D
0 | A
Discrete | Binary | B
0 | A
Decision Tree | Naive Bayesian | Linear regression | A
The model would consider only the points close to the hyperplane for modeling | The model would not be affected by distance of points from hyperplane for modeling | None of the above | B
Discrete | Binary | A
Nothing, the model is perfect | Overfitting | C
b. To judge how the trained model performs outside the sample on test data | c. Both A and B | C
b. Assumes that all the features in a dataset are independent | c. Both A and B | d. None of the above option | C
no | A
0 | B
0 | a
0.64 | 0.29 | 0.75 | b
0.64 | 0.36 | 0.5 | c
0.26 | 0.73 | 0.6 | d
0.2 | 0.6 | 0.45 | b
0 | a
How accurately… | The threshold amount of error | b
The SVM allows… | None of the above | a
Small dataset | Medium sized d… | Size does not ma…
Kernel Parame… | Soft Margin Pa… | All of the above | d
0 | a
The data is cl… | The data is noisy and contains… | c
The model would… | The model would… | None of the ab… | b
The kernel to be used | The tradeoff between misclassification and simplicity of the model | None of the above | c
Nothing, the mo… | Overfitting | c
Image Classific… | Clustering of N… | All of the above | d