Unit 3
Which algorithm is commonly used for generating association rules from large transaction datasets?
1.) K-means
2.) Apriori
3.) Decision Trees
4.) Linear Regression
Solution :
Apriori is a popular algorithm for association rule mining.
Which metric is commonly used to evaluate the strength of association rules in market basket
analysis?
1.) Support
2.) F1-score
3.) Precision
4.) Accuracy
Solution :
Support measures the frequency of items occurring together.
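As a rough illustration of support, the sketch below counts how often itemsets appear in a small basket dataset and keeps the pairs that meet a minimum support, which is the kind of filtering Apriori performs level by level. The transactions, the `min_support` value, and the function names are made up for this example.

```python
from itertools import combinations

# Hypothetical market-basket data, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_pairs(transactions, min_support):
    """All item pairs whose support is at least `min_support` (assumed threshold)."""
    items = sorted(set().union(*transactions))
    return [pair for pair in combinations(items, 2)
            if support(pair, transactions) >= min_support]

print(support({"bread", "milk"}, transactions))
print(frequent_pairs(transactions, min_support=0.5))
```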
_____ methods like Random Forests and Gradient Boosting improve the predictive accuracy of
regression trees.
1.) Ensemble
2.) Singular
3.) Linear
4.) Parametric
Solution :
Ensemble methods like Random Forests and Gradient Boosting enhance predictive accuracy by combining the outputs of
multiple trees, reducing individual error rates and improving stability and generalization of the model.
1.) Increases it
2.) Decreases it
4.) Randomizes it
Solution :
Pruning reduces a regression tree’s complexity by removing unnecessary branches. By cutting off these sections, the model
becomes less overfit to the training data and can generalize better to new data.
Which ensemble method improves the stability of regression trees?
Solution :
Random Forests are an ensemble method that combines multiple regression trees to enhance stability and predictive
performance. By aggregating the predictions from numerous trees, the impact of overfitting and instability is reduced.
A regression tree is known to split data points into two _____ divisions.
1.) binary
2.) single
3.) multiple
4.) linear
What technique is used to break down the dataset into subsets in a regression tree?
What would be the ideal complexity of the curve that can be used for separating the two classes
shown in the image below?
1.) Linear
2.) Quadratic
3.) Cubic
Solution :
The blue point in the red region is an outlier. The rest of the data is linearly separable.
Solution :
KNN is non-parametric because it makes no assumption about the underlying data distribution. It is a lazy
learning technique because at training time it only stores the data; distances are computed at prediction time.
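The lazy-learning behaviour can be sketched in a few lines: there is no fitting step, and all distance computation happens when a prediction is requested. The training points and the `knn_predict` helper below are hypothetical and are not taken from any question in this unit.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of ((x1, x2), label) pairs. Nothing is learned ahead
    of time: the stored data is scanned at prediction time.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set, invented for illustration only.
train = [((0, 0), "-"), ((0, 1), "-"), ((1, 0), "-"),
         ((3, 3), "+"), ((3, 4), "+"), ((4, 3), "+")]

print(knn_predict(train, (0.5, 0.5), k=3))
print(knn_predict(train, (3.5, 3.5), k=3))
```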
Suppose you have been given the following data, where x1 and x2 are the 2 input variables
and Class is the dependent variable.
What will be the class of a new data point x1=1 and x2=1 in 5-NN (k nearest neighbour with
k=5) using euclidean distance measure?
1.) + Class
2.) − Class
Solution :
5 nearest points to the new point (1,1) are: (0,1), (0,2), (1,0), (1,2), (2,2). The majority class among these 5 nearest neighbours
is + Class.
Solution :
The CART algorithm is a greedy algorithm that looks for the best split at the current level without guaranteeing the optimal
solution.
How is Gini impurity different from entropy in decision tree impurity measures?
1.) Gini impurity is based on message content while entropy is related to molecular order
2.) Entropy is used by default in decision trees while Gini impurity requires setting a
parameter
3.) Gini impurity measures molecular disorder while entropy measures information content
Solution :
Gini impurity measures disorder in a set, while entropy measures the average information content of a message.
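Both measures can be computed from the class proportions in a node. The sketch below uses the standard definitions, Gini = 1 − Σ p² and entropy = Σ p·log2(1/p); the label lists are made up for illustration.

```python
import math
from collections import Counter

def class_probabilities(labels):
    """Proportion of each class present in the node."""
    counts = Counter(labels)
    return [c / len(labels) for c in counts.values()]

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))

def entropy(labels):
    """Shannon entropy in bits: sum of p * log2(1 / p) over the classes."""
    return sum(p * math.log2(1 / p) for p in class_probabilities(labels))

pure = ["a", "a", "a", "a"]    # hypothetical node with a single class
mixed = ["a", "a", "b", "b"]   # hypothetical node split 50/50

print(gini(pure), entropy(pure))
print(gini(mixed), entropy(mixed))
```

Both measures are zero for a pure node and reach their maximum when the classes are evenly mixed.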
Solution :
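Finding the optimal decision tree is known to be an NP-Complete problem: known exact methods need time exponential in the
size of the training set, which makes the problem intractable even for small training sets.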
What is the time complexity for traversing a balanced Decision Tree in terms of the number of
nodes?
2.) O(m)
3.) O(log2(m))
4.) O(m^2)
Solution :
Traversing a balanced Decision Tree typically requires going through O(log2(m)) nodes.
What does the CART cost function aim to minimize in the training process?
Solution :
The CART cost function aims to minimize the impurity of subsets by finding the purest splits.
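One common form of this cost for classification weights the impurity of each side of a split by its share of the samples: J = (m_left/m)·G_left + (m_right/m)·G_right. The sketch below searches the thresholds of a single feature for the lowest such cost; the data, function names, and the choice of Gini as the impurity are assumptions made for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_cost(xs, ys, threshold):
    """Weighted impurity of the two subsets produced by `x <= threshold`."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    m = len(ys)
    return sum(len(side) / m * gini(side) for side in (left, right) if side)

def best_threshold(xs, ys):
    """Greedy search over midpoints between consecutive distinct feature values."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda t: split_cost(xs, ys, t))

# Hypothetical one-feature dataset, invented for illustration only.
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = ["a", "a", "a", "b", "b", "a"]

t = best_threshold(xs, ys)
print(t, split_cost(xs, ys, t))
```

The search only looks at the current node, which is why the procedure is greedy: the best split here is not guaranteed to lead to the best tree overall.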
y
?
3.) Residual
4.) Slope
Solution :
What does a scatter plot with a regression line visualize in a linear regression model?
Solution :
A scatter plot with a regression line visualizes the fit between the independent and dependent variables.
Solution :
The goal of linear regression is to establish a linear equation that predicts y based on x.
In the linear regression equation y = θ₀ + θ₁x, the constant term θ₀ is known as _____.
1.) Gradient
2.) Slope
3.) Intercept
4.) Residual
Solution :
The constant term in the linear regression equation is known as the intercept.
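To make the roles of the two parameters concrete, the sketch below fits a line by ordinary least squares using the closed-form formulas for one input variable. The names `theta0` (intercept) and `theta1` (slope) and the sample data are chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = theta0 + theta1 * x with one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx                  # slope
    theta0 = mean_y - theta1 * mean_x   # intercept: predicted y when x = 0
    return theta0, theta1

# Hypothetical data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

theta0, theta1 = fit_line(xs, ys)
print(theta0, theta1)
```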
Random Forest is an ensemble model that combines multiple decision trees.
1.) True
2.) False
Which of the following statements is true with respect to the K-NN classifier?
1. In case of a very large value of k, we may include points from other classes in the
neighborhood.
2. In case of too small a value of k, the algorithm is very sensitive to noise.
3. KNN classifier classifies unknown samples by assigning the label that is most frequent
among the k nearest training samples.
1.) Statement 1 only
Solution :
Statements 1, 2, and 3 are all true.
Which of the following distance measures do we use in the case of categorical variables in
k-NN?
2.) only 2
3.) only 3
4.) 1 and 2
5.) 1, 2, and 3
Solution :
Euclidean and Manhattan distances are both used for continuous variables, whereas Hamming distance is used for
categorical variables.
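The three measures can be compared side by side. In this minimal sketch the Hamming distance is taken as the number of positions at which two records differ; the example points and records are made up.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences between two numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    """Number of positions where two equal-length records differ."""
    return sum(x != y for x, y in zip(a, b))

print(euclidean((0, 0), (3, 4)))
print(manhattan((0, 0), (3, 4)))
# Hypothetical categorical records: (colour, size, shape).
print(hamming(("red", "small", "round"), ("red", "large", "round")))
```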
Logistic Regression maps its output to a probability in the range [0, 1]. Which function is
employed by logistic regression to achieve this transformation?
1.) Sigmoid
2.) Mode
3.) Square
4.) Median
Solution :
The sigmoid function is used in logistic regression to map the linear output to a probability between 0 and 1.
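A minimal sketch of the sigmoid, 1 / (1 + e^(−z)), shows how any real-valued score is squeezed into that range. The two-branch form below is one way to avoid overflow for large negative inputs and is an implementation choice, not part of the definition.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe: z is negative, so exp(z) cannot overflow
    return e / (1.0 + e)

for z in (-5.0, 0.0, 5.0):
    print(z, sigmoid(z))
```

A score of 0 maps to a probability of 0.5; large positive scores approach 1 and large negative scores approach 0.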
In a decision tree, what criterion is commonly used to measure the impurity of a node?
4.) Entropy
1.) Prun
2.) Splitting
What is the primary purpose of the Classification and Regression Tree (CART) algorithm?
2.) Clustering
Solution :
3.) R-squared
Solution :
2.) outlier
Solution :
The main factor in regression analysis that we want to predict or understand is called the dependent variable. It is also
called the target variable.
If there is only one input variable (x), then such linear regression is called _____.
Solution :
If there is only one input variable (x), the model is called simple linear regression. If there is more than
one input variable, it is called multiple linear regression.
Logistic regression uses the _____ function, also called the logistic function, to model
the data.
1.) quadratic
2.) sigmoid
3.) lasso
4.) linear
Solution :
Logistic regression uses the sigmoid function, also called the logistic function, to model the data. Strictly speaking, the
sigmoid is the function that maps the linear output to a probability; the cost function usually paired with it is log loss.
Solution :
Linear time series models emphasize linear relationships between variables in the time series data.
Which statistical technique is commonly employed in estimating the parameters of linear time
series models?
Solution :
Maximum likelihood estimation (MLE) is commonly employed in estimating the parameters of linear time series models.
Consider a 2-class [y = {-1, 1}] classification problem of 2-dimensional feature vectors. The
support vectors and the corresponding class labels and Lagrange multipliers are provided. Find
the value of the SVM weight vector W.
1.) (-1,3)
2.) (2, 0)
3.) (-2, 4)
4.) (-2, 2)
The values of Lagrange multipliers corresponding to the support vectors can be:
1.) Perceptrons can implement Logic Gates like AND, OR, or XOR.
2.) Single layer Perceptrons can learn only linearly separable patterns.
3.) Perceptron Learning Rule states that the algorithm would automatically learn the optimal
weight coefficients
1.) Single-layer
2.) Multi-layer
3.) Both "The SVM allows very low error in classification" and "The SVM allows high amount of
error in classification"
1.) Single-layer
2.) Multi-layer
A 4-input neuron has weights 2, 3, 4 and 5, and the transfer function is linear with the constant of
proportionality equal to 2. The inputs are 5, 12, 6 and 10 respectively. What is the output?
1.) 234
2.) 240
3.) 245
4.) 230
Solution :
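240. The weighted sum is 2·5 + 3·12 + 4·6 + 5·10 = 120, and the linear transfer function multiplies it by 2.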
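The arithmetic can be checked with a short sketch: form the weighted sum of the inputs, then apply the linear transfer function by multiplying by the constant of proportionality. The function name `neuron_output` and the parameter name `k` are chosen for illustration.

```python
def neuron_output(weights, inputs, k):
    """Output of a neuron with a linear transfer function f(s) = k * s."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return k * weighted_sum

# Values from the question: weights 2, 3, 4, 5; inputs 5, 12, 6, 10; constant 2.
print(neuron_output([2, 3, 4, 5], [5, 12, 6, 10], k=2))  # 240
```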
A perceptron works by taking in some numerical inputs along with what is known as _____
and ___.
1.) weights, bias
2.) threshold, bias
We usually use feature normalization before using the Gaussian kernel in SVM. What is true
about feature normalization?
1.) 1
2.) 1 and 2
3.) 1 and 3
4.) 2 and 3
Solution :
1.) Modular
2.) FeedForward
4.) FeedBack
Solution :
Suppose you are using RBF kernel in SVM with high Gamma value. What does this signify?
1.) The model would consider only the points close to the hyperplane for modeling
2.) The model would consider even far away points from hyperplane for modeling
3.) The model would not be affected by distance of points from hyperplane for modeling
Solution :
The gamma parameter in SVM tuning controls how far the influence of a single training point reaches. With a low
gamma the influence reaches far, so the model is too constrained and takes in all points of the training dataset without
really capturing its shape. With a high gamma only nearby points count, so the model follows the shape of the dataset
closely, at the risk of overfitting when gamma is very large.
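The effect of gamma can be seen directly in the RBF kernel, K(a, b) = exp(−gamma · ‖a − b‖²). The sketch below evaluates it for a near pair and a far pair of points at two gamma values; the points and gamma values are arbitrary choices for illustration.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

origin = (0.0, 0.0)
near = (0.5, 0.0)   # hypothetical nearby point
far = (3.0, 0.0)    # hypothetical distant point

for gamma in (0.1, 10.0):
    print(gamma, rbf_kernel(origin, near, gamma), rbf_kernel(origin, far, gamma))
```

With the larger gamma the kernel value for the distant point is practically zero, so that point has almost no influence; with the smaller gamma it still contributes noticeably.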
The effectiveness of an SVM depends upon:
Solution :
An SVM's effectiveness depends on choosing the 3 basic requirements mentioned above in such a way that they
maximise efficiency and reduce error and overfitting.