Unit 3

The document covers various algorithms and concepts in data mining, particularly focusing on association rule learning, regression trees, and classification techniques. Key algorithms discussed include Apriori for association rules, CART for decision trees, and K-Nearest Neighbors for classification. Additionally, it addresses metrics like support and confidence, the importance of pruning in regression trees, and the significance of various statistical methods in linear regression and SVM.

Uploaded by

Lieo

UNIT III

Which algorithm is commonly used for generating association rules from large transaction datasets?
1.) K-means
2.) Apriori
3.) Decision Trees
4.) Linear Regression

Solution :
Apriori is a popular algorithm for association rule mining.

What is the primary goal of Association Rule Learning in data mining?


1.) Classifying data into predefined categories
2.) Discovering interesting relationships in data
3.) Predicting numeric values
4.) Clustering similar data points

Solution :
Association Rule Learning aims to find relationships or patterns in data.

Which metric is commonly used to evaluate the strength of association rules in market basket
analysis?
1.) Support
2.) F1-score
3.) Precision
4.) Accuracy

Solution :
Support measures the frequency of items occurring together.

In the context of Association Rule Learning, what is the confidence of a rule?


1.) The number of transactions in the dataset
2.) The strength of association between items in the rule
3.) The percentage of transactions containing the antecedent that also contain the consequent
4.) The probability of the rule being incorrect

Solution :
Confidence is the percentage of transactions containing the antecedent that also contain the consequent, so it reflects how reliable the rule is.

If a rule has a support of 0.1 and there are 1,000 transactions in the dataset, how many transactions
contain the items in the rule?
1.) 1000
2.) 100
3.) 50
4.) 10

Solution :
Support is the percentage of transactions that contain the items in the rule. In this case, the support is 0.1, which means 10% of the
transactions contain the items. To find the absolute number of transactions, we calculate 0.1 * 1,000 = 100. Therefore, 100 transactions
contain the items in the rule.
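
The support and confidence definitions above can be checked on a small example. The sketch below is a minimal illustration in plain Python; the transactions and item names are hypothetical and are not part of the quiz.

```python
# Hypothetical toy transactions; the item names are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain the consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

# Rule {bread} -> {milk}
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)

# Absolute count from the question above: support 0.1 over 1,000 transactions.
count = round(0.1 * 1000)
```

Here three of the five transactions contain both bread and milk, and four contain bread, so the rule's confidence is three out of four.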

_____ methods like Random Forests and Gradient Boosting improve the predictability of
regression trees.

1.) Ensemble

2.) Singular

3.) Linear

4.) Parametric

Solution :

Ensemble methods like Random Forests and Gradient Boosting enhance predictive accuracy by combining the outputs of
multiple trees, reducing individual error rates and improving stability and generalization of the model.

How does pruning affect a regression tree's complexity?

1.) Increases it

2.) Decreases it

3.) Keeps it unchanged

4.) Randomizes it

Solution :

Pruning reduces a regression tree’s complexity by removing unnecessary branches. With these sections cut off, the model
overfits the training data less and generalizes better to new data.

Which ensemble method improves the stability of regression trees?

1.) Simple averaging

2.) Random Forests

3.) Data normalization

4.) Gradient Descent

Solution :

Random Forests are an ensemble method that combines multiple regression trees to enhance stability and predictive
performance. By aggregating the predictions from numerous trees, the impact of overfitting and instability is reduced.

A regression tree is known to split data points into two _____ divisions.

1.) binary

2.) single

3.) multiple

4.) linear

What technique is used to break down the dataset into subsets in a regression tree?

1.) Recursive partitioning

2.) Data swapping

3.) Feature engineering

4.) Linear regression

What would be the ideal complexity of the curve that can be used for separating the two classes
shown in the image below?
1.) Linear

2.) Quadratic

3.) Cubic

4.) insufficient data to draw conclusion

Solution :

The blue point in the red region is an outlier. The rest of the data is linearly separable.

K-Nearest Neighbor is a _____ , _____ algorithm.

1.) non-parametric, eager

2.) parametric, eager

3.) non-parametric, lazy

4.) parametric, lazy

Solution :

KNN is non-parametric because it does not make any assumption regarding the underlying data distribution. It is a lazy
learning technique because during training time it just memorizes the data and finally computes the distance during testing.
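
A minimal sketch of why KNN is called lazy: `fit` only stores the data, and all distance work happens in `predict`. The class name `KNNClassifier` and the toy points are assumptions made for illustration, not taken from the quiz.

```python
import math
from collections import Counter

class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        # "Training" only memorizes the data: no parameters are estimated.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # All distance computation is deferred to prediction time.
        by_distance = sorted(
            zip(self.points, self.labels),
            key=lambda pair: math.dist(pair[0], query),
        )
        nearest_labels = [label for _, label in by_distance[: self.k]]
        # Majority vote among the k nearest training samples.
        return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data, not the table from the quiz.
model = KNNClassifier(k=3).fit(
    [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)],
    ["-", "-", "-", "+", "+", "+"],
)
```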

Suppose you have been given the following data, where x1 and x2 are the two input variables
and Class is the dependent variable.

What will be the class of a new data point x1=1 and x2=1 in 5-NN (k-nearest neighbours with
k=5) using the Euclidean distance measure?

1.) + Class

2.) - Class

3.) Cannot be determined

4.) Insufficient Data

Solution :

The 5 nearest points to the new point (1,1) are (0,1), (0,2), (1,0), (1,2), and (2,2). The majority class among these 5 nearest
neighbours is the + Class.

What is the significance of the CART algorithm in decision tree generation?

1.) It finds the optimal tree in polynomial time

2.) It uses entropy exclusively as an impurity measure

3.) It always guarantees the optimal solution

4.) It greedily seeks the best split at each level

Solution :

The CART algorithm is a greedy algorithm that looks for the best split at the current level without guaranteeing the optimal
solution.

How is Gini impurity different from entropy in decision tree impurity measures?
1.) Gini impurity is based on message content while entropy is related to molecular order

2.) Entropy is used by default in decision trees while Gini impurity requires setting a
parameter

3.) Gini impurity measures molecular disorder while entropy measures information content

4.) Entropy always leads to faster predictions compared to Gini impurity

Solution :

Gini impurity measures disorder in a set, while entropy measures the average information content of a message.
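
For comparison, both impurity measures can be computed from the class proportions of a node. This is a small sketch using the standard formulas (Gini as one minus the sum of squared proportions, entropy as the negative sum of p log2 p); the helper names are assumptions.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Proportion of each class among the labels of a node."""
    counts = Counter(labels)
    total = len(labels)
    return [c / total for c in counts.values()]

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2)."""
    return 1.0 - sum(p * p for p in class_proportions(labels))

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))

mixed = ["a", "a", "b", "b"]   # evenly mixed node
pure = ["a", "a", "a", "a"]    # pure node
```

Both measures are zero for a pure node and reach their maximum when the classes are evenly mixed.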

Why is finding the optimal decision tree known to be an NP-Complete problem?

1.) It requires exponential time for even small training sets

2.) It is easy to verify the solutions in polynomial time

3.) The algorithm compares all features without any constraints

4.) Tree traversal is independent of the number of features

Solution :

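Finding the optimal decision tree is NP-Complete: no polynomial-time algorithm is known, and an exhaustive search takes time exponential in the size of the training set, which makes the problem intractable even for small training sets.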

What is the time complexity for traversing a balanced Decision Tree in terms of the number of
nodes?

1.) O(m log2(m))

2.) O(m)

3.) O(log2(m))

4.) O(m^2)

Solution :
Traversing a balanced Decision Tree typically requires going through O(log2(m)) nodes.

What does the CART cost function aim to minimize in the training process?

1.) Number of instances

2.) Number of features

3.) Impurity of subsets

4.) Depth of the tree

Solution :

The CART cost function aims to minimize the impurity of subsets by finding the purest splits.
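
One way to picture this cost is as the size-weighted impurity of the two subsets a split produces, minimized greedily over candidate thresholds. The sketch below assumes a single numeric feature and Gini impurity; the function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def split_cost(values, labels, threshold):
    """Weighted impurity of the two subsets produced by `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    m = len(labels)
    return len(left) / m * gini(left) + len(right) / m * gini(right)

def best_threshold(values, labels):
    """Greedy search: try midpoints between sorted distinct values, keep the cheapest."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return min(candidates, key=lambda t: split_cost(values, labels, t))

# Hypothetical single-feature data.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
threshold = best_threshold(values, labels)
```

The search only looks at the current node, which is why the result is a good split rather than a guaranteed optimal tree.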

Which component in linear regression is denoted by y?

1.) Dependent variable

2.) Independent variable

3.) Residual

4.) Slope

Solution :

In linear regression, y denotes the dependent variable.

What does a scatter plot with a regression line visualize in a linear regression model?

1.) Prediction errors

2.) The distribution of x values


3.) The fit between the independent and dependent variables

4.) The variance of residuals

Solution :

A scatter plot with a regression line visualizes the fit between the independent and dependent variables.

What is the goal of linear regression?

1.) To find the maximum likelihood estimation

2.) To establish a nonlinear equation that predicts y based on x

3.) To establish a linear equation that predicts y based on x

4.) To cluster similar data points together

Solution :

The goal of linear regression is to establish a linear equation that predicts y based on x.
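
As an illustration, the linear equation can be fitted with ordinary least squares. The sketch below assumes one input variable and uses the closed-form slope and intercept; the name `fit_line` and the data points are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = theta0 + theta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx                 # slope
    theta0 = mean_y - theta1 * mean_x  # intercept
    return theta0, theta1

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = fit_line(xs, ys)
```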

The constant term in the linear regression equation $y = \theta_0 + \theta_1 x$ is known as _____.

1.) Gradient

2.) Slope

3.) Intercept

4.) Residual

Solution :

The constant term in the linear regression equation is known as the intercept.

Random Forest is an ensemble model that combines multiple decision trees.

1.) True

2.) False

3.) Not applicable

4.) None of the mentioned

Which of the following statements is true with respect to the K-NN classifier?

1. In case of a very large value of k, we may include points from other classes in the
neighborhood.
2. In case of too small a value of k, the algorithm is very sensitive to noise.
3. The KNN classifier classifies unknown samples by assigning the label that is most frequent
among the k nearest training samples.

1.) Statement 1 only

2.) Statement 1 and 2 only

3.) Statement 1, 2, and 3

4.) Statement 1 and 3 only

Solution :

Statement 1, 2, and 3

Which of the following distance measures do we use in the case of categorical variables in
k-NN?

1. Hamming Distance
2. Euclidean Distance
3. Manhattan Distance

1.) only 1

2.) only 2

3.) only 3

4.) 1 and 2

5.) 1, 2, and 3

Solution :

Euclidean and Manhattan distances are both used for continuous variables, whereas Hamming distance is used for
categorical variables.
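
A short sketch of the three distance measures, to show why Hamming distance suits categorical values: it only counts mismatching positions and never subtracts values. The function names and sample records are assumptions.

```python
import math

def hamming(a, b):
    """Number of positions where the two records differ (categorical data)."""
    return sum(1 for x, y in zip(a, b) if x != y)

def euclidean(a, b):
    """Straight-line distance between two numeric points."""
    return math.dist(a, b)

def manhattan(a, b):
    """Sum of absolute coordinate differences between two numeric points."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical records.
categorical_gap = hamming(("red", "small", "round"), ("red", "large", "round"))
euclidean_gap = euclidean((0, 0), (3, 4))
manhattan_gap = manhattan((0, 0), (3, 4))
```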

Logistic regression maps its output to a probability in the range [0, 1]. Which function does
logistic regression use to achieve this transformation?

1.) Sigmoid

2.) Mode

3.) Square

4.) Median

5.) None of the mentioned

Solution :

The sigmoid function is used in logistic regression to map the output to a probability between 0 and 1.
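
A minimal sketch of the sigmoid, written in a numerically stable form so that large negative inputs do not overflow. The branch on the sign of `z` is an implementation choice for this example, not something the quiz specifies.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), mapping any real z to [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) to avoid overflow.
    e = math.exp(z)
    return e / (1.0 + e)

midpoint = sigmoid(0)
```

An input of zero maps to a probability of 0.5, which is the usual decision threshold between the two classes.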

A multiple regression model has _____.

1.) only one independent variable

2.) more than one dependent variable

3.) more than one independent variable

4.) exactly one dependent variable

What does the term "pruning" refer to in decision trees?

1.) Feature selection

2.) Removing low-impact features


3.) Reducing the size of the tree

4.) Splitting nodes

In a decision tree, what criterion is commonly used to measure the impurity of a node?

1.) Mean Squared Error

2.) Gini Index

3.) Information Gain

4.) Entropy

What is the initial step in a decision tree algorithm?

1.) Pruning

2.) Splitting

3.) Leaf node assignment

4.) Feature selection

What is the primary purpose of the Classification and Regression Tree (CART) algorithm?

1.) Image recognition

2.) Clustering

3.) Classification and Regression

4.) Natural language processing

Solution :

Classification and Regression

Naïve Bayes models are often used for:


1.) Regression problems

2.) Classification problems

3.) Clustering problems

4.) Reinforcement learning problems

_____ is a statistical method that determines the goodness of fit.

1.) Gradient descent

2.) Cost function

3.) R-squared

4.) Mapping function

Solution :

R-squared is a statistical method that determines the goodness of fit.
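
R-squared can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses that standard definition; the function name and sample values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0]
perfect_fit = r_squared(y_true, [1.0, 2.0, 3.0])   # predictions match exactly
mean_only = r_squared(y_true, [2.0, 2.0, 2.0])     # always predicts the mean
```

A model that always predicts the mean scores 0, and a model whose predictions match the data exactly scores 1.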

In regression analysis, the key element we aim to predict or comprehend is referred to
as the dependent variable, also known as _____.

1.) target variable

2.) outlier

3.) independent variable

4.) custom variable

Solution :

In regression analysis, the main factor that we want to predict or understand is called the dependent variable. It is also
called the target variable.

If there is only one input variable (x), then such linear regression is called _____.

1.) exponential regression


2.) multiple linear regression

3.) simple linear regression

4.) polynomial regression

Solution :

If there is only one input variable (x), then such linear regression is called simple linear regression. And if there is more than
one input variable, then such linear regression is called multiple linear regression.

Logistic regression uses the _____ function, also called the logistic function, to map its output
to a probability.

1.) quadratic

2.) sigmoid

3.) lasso

4.) linear

Solution :

Logistic regression uses the sigmoid function, also called the logistic function, to map the model's linear output to a
probability. The cost function (log loss) is then computed from that probability.

Linear regression is a linear approach to form a relationship between _____.

1.) dependent data

2.) independent data

3.) None of the mentioned

4.) both dependent and independent data


Which of the following best describes the primary characteristic of linear time series models
in the context of time series analysis?

1.) Nonlinear relationships between variables

2.) Linear relationships between variables

3.) Lack of temporal patterns in the data

4.) Emphasis on non-parametric modeling

Solution :

Linear time series models emphasize linear relationships between variables in the time series data.

Which statistical technique is commonly employed in estimating the parameters of linear time
series models?

1.) Maximum likelihood estimation (MLE)

2.) Kernel density estimation (KDE)

3.) Monte Carlo simulation

4.) Principal component analysis (PCA)

Solution :

Maximum likelihood estimation (MLE) is commonly employed in estimating the parameters of linear time series models.

Consider a 2-class [y = {-1, 1}] classification problem with 2-dimensional feature vectors. The
support vectors and the corresponding class labels and Lagrange multipliers are provided. Find
the value of the SVM weight vector W.

X1=(-1,1), y1=-1, α1=2

X2=(0,3), y2=1, α2=1


X3=(0,-1), y3=1, α3=1

1.) (-1,3)

2.) (2, 0)

3.) (-2, 4)

4.) (-2, 2)
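
The weight vector of a linear SVM follows from the dual solution as the sum over support vectors of α_i · y_i · X_i. A small sketch that applies this to the values given in the question (the function name is an assumption):

```python
# Values taken from the question above.
support_vectors = [(-1, 1), (0, 3), (0, -1)]
labels = [-1, 1, 1]
alphas = [2, 1, 1]

def svm_weights(vectors, labels, alphas):
    """w = sum_i alpha_i * y_i * x_i, computed one coordinate at a time."""
    dim = len(vectors[0])
    return tuple(
        sum(a * y * x[d] for x, y, a in zip(vectors, labels, alphas))
        for d in range(dim)
    )

w = svm_weights(support_vectors, labels, alphas)

# The dual constraint sum_i alpha_i * y_i = 0 also holds for these values.
balance = sum(a * y for a, y in zip(alphas, labels))
```

Term by term this is 2·(-1)·(-1, 1) + 1·1·(0, 3) + 1·1·(0, -1) = (2, -2) + (0, 3) + (0, -1) = (2, 0), which matches option 2.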

The values of Lagrange multipliers corresponding to the support vectors can be:

1.) Less than zero

2.) Greater than zero

3.) Any real number

4.) Any nonzero number

Which one is true about Perceptron?

1.) Perceptrons can implement Logic Gates like AND, OR, or XOR.

2.) Single layer Perceptrons can learn only linearly separable patterns.

3.) Perceptron Learning Rule states that the algorithm would automatically learn the optimal
weight coefficients

4.) All of the mentioned
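
The difference between linearly separable and non-separable patterns can be seen by training a single-layer perceptron on AND and on XOR. This is a minimal sketch with a step activation, zero initial weights, and a learning rate of 1, all of which are assumptions chosen for the example.

```python
def step(z):
    """Step activation: fire (1) only when the weighted sum is positive."""
    return 1 if z > 0 else 0

def train_perceptron(samples, epochs=20, learning_rate=1):
    """Perceptron learning rule for two inputs and a bias."""
    w1, w2, bias = 0, 0, 0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - step(w1 * x1 + w2 * x2 + bias)
            w1 += learning_rate * error * x1
            w2 += learning_rate * error * x2
            bias += learning_rate * error
    return w1, w2, bias

def accuracy(params, samples):
    """Fraction of samples the trained perceptron classifies correctly."""
    w1, w2, bias = params
    hits = sum(step(w1 * x1 + w2 * x2 + bias) == t for (x1, x2), t in samples)
    return hits / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_accuracy = accuracy(train_perceptron(AND), AND)
xor_accuracy = accuracy(train_perceptron(XOR), XOR)
```

AND is linearly separable, so the rule converges and classifies all four cases. No single line separates the XOR cases, so a single-layer perceptron cannot get all four right; XOR needs a multi-layer perceptron.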

Which one is a type of Perceptron model?

1.) Single-layer

2.) Multi-layer

3.) Both Single-layer and Multi-layer

4.) None of the mentioned

What do you mean by a hard margin?

1.) The SVM allows very low error in classification


2.) The SVM allows high amount of error in classification

3.) Both The SVM allows very low error in classification and The SVM allows high amount of
error in classification

4.) none of the mentioned

A 4-input neuron has weights 2, 3, 4 and 5, and the transfer function is linear with the constant of
proportionality equal to 2. The inputs are 5, 12, 6 and 10 respectively.

What will be the output?

1.) 234

2.) 240

3.) 245

4.) 230

Solution :

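The weighted sum is 2×5 + 3×12 + 4×6 + 5×10 = 120. Multiplying by the constant of proportionality, 2, gives an output of 240.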
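
The same calculation as a short sketch, assuming the linear transfer function simply multiplies the weighted sum by the constant of proportionality (the function name is hypothetical):

```python
def linear_neuron(weights, inputs, k):
    """Output of a neuron with a linear transfer function f(s) = k * s."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return k * weighted_sum

# Values taken from the question above.
output = linear_neuron([2, 3, 4, 5], [5, 12, 6, 10], k=2)
```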

A perceptron works by taking in some numerical inputs along with what is known as _____
and ___.

1.) weights, bias

2.) threshold, bias

3.) weighted sum, sigmoid

4.) sigmoid, bias

Which one belongs to application areas of Artificial Neural Networks ?

1.) Speech Recognition

2.) Character Recognition

3.) Human Face Recognition

4.) All of the mentioned

We usually use feature normalization before using the Gaussian kernel in SVM. What is true
about feature normalization?

1. We do feature normalization so that the new feature will dominate the others

2. Sometimes, feature normalization is not feasible in the case of categorical variables

3. Feature normalization always helps when we use the Gaussian kernel in SVM

1.) 1

2.) 1 and 2

3.) 1 and 3

4.) 2 and 3

5.) None of the mentioned

Solution :

Statements one and two are correct.

Loops are allowed in a _____ Artificial Neural Network.

1.) Modular

2.) FeedForward

3.) Kohonen Self Organizing

4.) FeedBack

Which of the following is a linear classification model?

1.) Naïve Bayes

2.) Decision Tree Classification

3.) K-Nearest Neighbors

4.) Support Vector Machines

5.) Kernel SVM

Solution :

All others are non-linear classification models

Suppose you are using RBF kernel in SVM with high Gamma value. What does this signify?

1.) The model would consider only the points close to the hyperplane for modeling

2.) The model would consider even far away points from hyperplane for modeling

3.) The model would not be affected by distance of points from hyperplane for modeling

4.) All of the Mentioned

Solution :

The gamma parameter sets how far the influence of a single training point reaches: low values mean "far" and high values
mean "close". With a low gamma, the model is too constrained and cannot capture the shape of the data. With a high gamma,
only points close to the decision boundary affect it, so the model follows the shape of the training data closely and can overfit.
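
The effect of gamma can be seen directly in the RBF kernel, K(a, b) = exp(-gamma · ||a - b||²). The sketch below evaluates it for one nearby and one distant pair of points; the points and gamma values are hypothetical.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)

origin = (0.0, 0.0)
near = (0.1, 0.0)
far = (2.0, 0.0)

# Low gamma: even the distant point still has noticeable similarity.
low_gamma_far = rbf_kernel(origin, far, gamma=0.1)
# High gamma: similarity to the distant point is effectively zero.
high_gamma_far = rbf_kernel(origin, far, gamma=10.0)
high_gamma_near = rbf_kernel(origin, near, gamma=10.0)
```

With a high gamma the kernel value falls off so quickly that only nearby points contribute, which is the sense in which a high gamma makes the model consider only close points.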

The effectiveness of an SVM depends upon:

1.) Selection of Kernel

2.) Kernel Parameters

3.) Soft Margin Parameter C

4.) All of the mentioned

Solution :

The effectiveness of an SVM depends on choosing all three of the above (the kernel, the kernel parameters, and the soft
margin parameter C) so as to maximise efficiency and reduce error and overfitting.
