
Unit 3

The document covers various algorithms and concepts in data mining, particularly focusing on association rule learning, regression trees, and classification techniques. Key algorithms discussed include Apriori for association rules, CART for decision trees, and K-Nearest Neighbors for classification. Additionally, it addresses metrics like support and confidence, the importance of pruning in regression trees, and the significance of various statistical methods in linear regression and SVM.

Uploaded by

Lieo


UNIT III

Which algorithm is commonly used for generating association rules from large transaction datasets?
1.) K-means
2.) Apriori
3.) Decision Trees
4.) Linear Regression

Solution :
Apriori is a popular algorithm for association rule mining.

What is the primary goal of Association Rule Learning in data mining?

1.) Classifying data into predefined categories
2.) Discovering interesting relationships in data
3.) Predicting numeric values
4.) Clustering similar data points

Solution :
Association Rule Learning aims to find relationships or patterns in data.

Which metric is commonly used to evaluate the strength of association rules in market basket
analysis?
1.) Support
2.) F1-score
3.) Precision
4.) Accuracy

Solution :
Support measures how frequently the items in a rule occur together, as a fraction of all transactions.

In the context of Association Rule Learning, what is the confidence of a rule?

1.) The number of transactions in the dataset
2.) The strength of association between items in the rule
3.) The percentage of transactions containing the antecedent that also contain the consequent
4.) The probability of the rule being incorrect

Solution :
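Confidence reflects the reliability of a rule: it is the percentage of transactions containing the antecedent that also contain the consequent (option 3).

If a rule has a support of 0.1 and there are 1,000 transactions in the dataset, how many transactions
contain the items in the rule?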
1.) 1000
2.) 100
3.) 50
4.) 10

Solution :
Support is the percentage of transactions that contain the items in the rule. In this case, the support is 0.1, which means 10% of the
transactions contain the items. To find the absolute number of transactions, we calculate 0.1 * 1,000 = 100. Therefore, 100 transactions
contain the items in the rule.
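
The support and confidence calculations above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original question set: the toy transactions and the helper names `support`, `confidence` and `support_count` are assumptions made for the example.

```python
# Hypothetical toy transactions; the item names are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


def support_count(support_value, n_transactions):
    """Convert a relative support into an absolute transaction count."""
    return round(support_value * n_transactions)


rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
count_from_question = support_count(0.1, 1000)  # the case asked about above
```

Here `count_from_question` reproduces the 0.1 * 1,000 = 100 calculation from the solution, while `rule_support` and `rule_confidence` show the same two metrics on the toy data.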

_____ methods like Random Forests and Gradient Boosting improve the predictability of
regression trees.

1.) Ensemble

2.) Singular

3.) Linear

4.) Parametric

Solution :

Ensemble methods like Random Forests and Gradient Boosting enhance predictive accuracy by combining the outputs of
multiple trees, reducing individual error rates and improving stability and generalization of the model.

How does pruning affect a regression tree's complexity?

1.) Increases it

2.) Decreases it

3.) Keeps it unchanged

4.) Randomizes it

Solution :

Pruning reduces a regression tree’s complexity by removing unnecessary branches. By cutting off these sections, the model
becomes less overfit to the training data and can generalize better to new data.

Which ensemble method improves the stability of regression trees?

1.) Simple averaging

2.) Random Forests

3.) Data normalization

4.) Gradient Descent

Solution :

Random Forests are an ensemble method that combines multiple regression trees to enhance stability and predictive
performance. By aggregating the predictions from numerous trees, the impact of overfitting and instability is reduced.

A regression tree is known to split data points into two _____ divisions.

1.) binary

2.) single

3.) multiple

4.) linear

What technique is used to break down the dataset into subsets in a regression tree?

1.) Recursive partitioning

2.) Data swapping

3.) Feature engineering

4.) Linear regression

What would be the ideal complexity of the curve that can be used for separating the two classes
shown in the image below?
1.) Linear

2.) Quadratic

3.) Cubic

4.) insufficient data to draw conclusion

Solution :

The blue point in the red region is an outlier. The rest of the data is linearly separable.

K-Nearest Neighbor is a _____ , _____ algorithm.

1.) non-parametric, eager

2.) parametric, eager

3.) non-parametric, lazy

4.) parametric, lazy

Solution :

KNN is non-parametric because it does not make any assumption regarding the underlying data distribution. It is a lazy
learning technique because during training time it just memorizes the data and finally computes the distance during testing.
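
The "lazy" behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the training points are hypothetical (they are not the table from any question in this unit), and `knn_predict` is a name chosen for the example.

```python
import math
from collections import Counter

# Hypothetical training points (x1, x2, label); not data from the quiz.
training = [
    (0.0, 0.0, "-"), (0.0, 1.0, "+"), (1.0, 0.0, "+"),
    (1.0, 2.0, "+"), (2.0, 2.0, "-"), (3.0, 3.0, "-"), (4.0, 4.0, "-"),
]


def knn_predict(query, training, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Training" was just storing the data; all distance work happens here,
    # at prediction time, which is what makes the method lazy.
    by_distance = sorted(training, key=lambda p: math.dist(query, p[:2]))
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


prediction = knn_predict((1.0, 1.0), training, k=3)
```

No distribution is fitted and no parameters are learned, which is the non-parametric part of the answer.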

Suppose you have been given the following data, where x1 and x2 are the 2 input variables
and Class is the dependent variable.

What will be the class of a new data point x1=1 and x2=1 in 5-NN (k nearest neighbour with
k=5) using the Euclidean distance measure?

1.) + Class

2.) − Class

3.) Cannot be determined

4.) Insufficient Data

Solution :

The 5 nearest points to the new point (1,1) are: (0,1), (0,2), (1,0), (1,2), (2,2). The majority class among these 5 nearest neighbours
is + Class.

What is the significance of the CART algorithm in decision tree generation?

1.) It finds the optimal tree in polynomial time

2.) It uses entropy exclusively as an impurity measure

3.) It always guarantees the optimal solution

4.) It greedily seeks the best split at each level

Solution :

The CART algorithm is a greedy algorithm that looks for the best split at the current level without guaranteeing the optimal
solution.

How is Gini impurity different from entropy in decision tree impurity measures?
1.) Gini impurity is based on message content while entropy is related to molecular order

2.) Entropy is used by default in decision trees while Gini impurity requires setting a
parameter

3.) Gini impurity measures molecular disorder while entropy measures information content

4.) Entropy always leads to faster predictions compared to Gini impurity

Solution :

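Gini impurity measures how mixed (disordered) the classes in a set are, while entropy measures the average information content of a message. Both serve as impurity measures for decision tree nodes; option 3 is the closest match.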
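
For comparison, the two impurity measures can be computed side by side. The sketch below uses the standard textbook definitions (Gini = 1 − Σ p², entropy = −Σ p·log2 p), which the source does not spell out; the function names are chosen for illustration.

```python
import math


def gini(proportions):
    """Gini impurity of a node, given its class proportions."""
    return 1.0 - sum(p * p for p in proportions)


def entropy(proportions):
    """Entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)


mixed_gini = gini([0.5, 0.5])        # maximally mixed two-class node
mixed_entropy = entropy([0.5, 0.5])
pure_gini = gini([1.0])              # pure node
pure_entropy = entropy([1.0])
```

Both measures are zero for a pure node and largest for an evenly mixed one, which is why either can drive split selection.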

Why is finding the optimal decision tree known to be an NP-Complete problem?

1.) It requires exponential time for even small training sets

2.) It is easy to verify the solutions in polynomial time

3.) The algorithm compares all features without any constraints

4.) Tree traversal is independent of the number of features

Solution :

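Finding the optimal decision tree is NP-Complete: no polynomial-time algorithm is known, and exhaustive search takes time that grows exponentially with the size of the training set, so the problem is intractable even for small training sets.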

What is the time complexity for traversing a balanced Decision Tree in terms of the number of
nodes?

1.) O(m log2(m))

2.) O(m)

3.) O(log2(m))

4.) O(m^2)

Solution :
Traversing a balanced Decision Tree typically requires going through O(log2(m)) nodes.

What does the CART cost function aim to minimize in the training process?

1.) Number of instances

2.) Number of features

3.) Impurity of subsets

4.) Depth of the tree

Solution :

The CART cost function aims to minimize the impurity of subsets by finding the purest splits.
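
A minimal sketch of that idea for a single numeric feature is shown below. It assumes the usual weighted-Gini form of the classification cost (left and right impurities weighted by subset size); the data and the names `gini_of`, `split_cost` and `best_threshold` are hypothetical.

```python
from collections import Counter


def gini_of(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_cost(values, labels, threshold):
    """Size-weighted impurity of the two subsets produced by `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    m = len(labels)
    return len(left) / m * gini_of(left) + len(right) / m * gini_of(right)


def best_threshold(values, labels):
    """Greedy search over candidate thresholds (needs 2+ distinct values)."""
    candidates = sorted(set(values))[:-1]
    return min(candidates, key=lambda t: split_cost(values, labels, t))


# Hypothetical one-feature dataset with two well-separated classes.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
chosen = best_threshold(values, labels)
```

The search is greedy: it picks the purest split for the current node only, which matches the earlier point that CART does not guarantee a globally optimal tree.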

Which component in linear regression is denoted by y?

1.) Dependent variable

2.) Independent variable

3.) Residual

4.) Slope

Solution :

In linear regression, y denotes the dependent variable.

What does a scatter plot with a regression line visualize in a linear regression model?

1.) Prediction errors

2.) The distribution of x values

3.) The fit between the independent and dependent variables

4.) The variance of residuals

Solution :

A scatter plot with a regression line visualizes the fit between the independent and dependent variables.

What is the goal of linear regression?

1.) To find the maximum likelihood estimation

2.) To establish a nonlinear equation that predicts y based on x

3.) To establish a linear equation that predicts y based on x

4.) To cluster similar data points together

Solution :

The goal of linear regression is to establish a linear equation that predicts y based on x.
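
As an illustration, a linear equation of this kind can be fitted with ordinary least squares. The sketch below assumes a single input variable and uses the closed-form slope and intercept; the data points and the function name are made up for the example.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_linear_regression(xs, ys)
predicted = intercept + slope * 4.0  # prediction for a new x
```

The fitted `intercept` is the constant term of the equation and `slope` is the coefficient of x.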

The constant term in the linear regression equation $y = \theta_0 + \theta_1 x$ is known as _____.

1.) Gradient

2.) Slope

3.) Intercept

4.) Residual

Solution :

The constant term in the linear regression equation is known as the intercept.

Random Forest is an ensemble model that combines multiple decision trees.

1.) True

2.) False

3.) Not applicable

4.) None of the mentioned

Which of the following statements is true with respect to the K-NN classifier?

1. In case of a very large value of k, we may include points from other classes in the
neighborhood.
2. In case of too small a value of k, the algorithm is very sensitive to noise.
3. The KNN classifier classifies unknown samples by assigning the label that is most frequent
among the k nearest training samples.

1.) Statement 1 only

2.) Statement 1 and 2 only

3.) Statement 1, 2, and 3

4.) Statement 1 and 3 only

Solution :

Statement 1, 2, and 3

Which of the following distance measures do we use in the case of categorical variables in
k-NN?

1. Hamming Distance
2. Euclidean Distance
3. Manhattan Distance

1.) only 1

2.) only 2

3.) only 3

4.) 1 and 2

5.) 1, 2, and 3

Solution :

Both Euclidean and Manhattan distances are used for continuous variables, whereas Hamming distance is used for
categorical variables.
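
The three distance measures can be sketched as below. The function names and sample records are illustrative assumptions; the Hamming version simply counts the positions at which two equal-length records differ.

```python
import math


def hamming_distance(a, b):
    """Number of positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same length")
    return sum(1 for u, v in zip(a, b) if u != v)


def euclidean_distance(a, b):
    """Straight-line distance between two numeric points."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(u - v) for u, v in zip(a, b))


# Hypothetical categorical records: colour, size, shape.
categorical_gap = hamming_distance(("red", "small", "round"),
                                   ("red", "large", "round"))
euclid_gap = euclidean_distance((0.0, 0.0), (3.0, 4.0))
manhattan_gap = manhattan_distance((0.0, 0.0), (3.0, 4.0))
```

Subtracting category labels has no meaning, which is why a mismatch count is used for categorical features.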

Logistic Regression maps its output to a probability in the range [0, 1]. Which function is
employed by logistic regression to achieve this transformation?

1.) Sigmoid

2.) Mode

3.) Square

4.) Median

5.) None of the mentioned

Solution :

The sigmoid function is used in logistic regression to map the model's output to a probability between 0 and 1.
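
A small sketch of the sigmoid, 1 / (1 + e^(−z)), is given below. The two-branch form is an implementation choice made here to avoid overflow for large negative inputs; it is not something the source prescribes.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) to stay stable.
    e = math.exp(z)
    return e / (1.0 + e)


midpoint = sigmoid(0.0)   # a score of 0 maps to probability 0.5
low = sigmoid(-5.0)       # strongly negative score, probability near 0
high = sigmoid(5.0)       # strongly positive score, probability near 1
```

A common convention is to predict the positive class when the resulting probability is at least 0.5.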

A multiple regression model has _____.

1.) only one independent variable

2.) more than one dependent variable

3.) more than one independent variable

4.) exactly one dependent variable

What does the term "pruning" refer to in decision trees?

1.) Feature selection

2.) Removing low-impact features

3.) Reducing the size of the tree

4.) Splitting nodes

In a decision tree, what criterion is commonly used to measure the impurity of a node?

1.) Mean Squared Error

2.) Gini Index

3.) Information Gain

4.) Entropy

What is the initial step in a decision tree algorithm?

1.) Pruning

2.) Splitting

3.) Leaf node assignment

4.) Feature selection

What is the primary purpose of the Classification and Regression Tree (CART) algorithm?

1.) Image recognition

2.) Clustering

3.) Classification and Regression

4.) Natural language processing

Solution :

Classification and Regression

Naïve Bayes models are often used for:

1.) Regression problems

2.) Classification problems

3.) Clustering problems

4.) Reinforcement learning problems

_____ is a statistical method that determines the goodness of fit.

1.) Gradient descent

2.) Cost function

3.) R-squared

4.) Mapping function

Solution :

R-squared is a statistical method that determines the goodness of fit.
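
As a worked illustration, R-squared can be computed as 1 − SS_res / SS_tot, the standard definition. The sample values and the function name below are assumptions for the example.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observations and two sets of predictions.
observed = [1.0, 2.0, 3.0]
perfect_fit = r_squared(observed, [1.0, 2.0, 3.0])
mean_only_fit = r_squared(observed, [2.0, 2.0, 2.0])
```

A value of 1 means the predictions match the observations exactly, while a model that always predicts the mean scores 0.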

In regression analysis, the key element we aim to predict or comprehend is referred to
as the dependent variable, also known as _____.

1.) target variable

2.) outlier

3.) independent variable

4.) custom variable

Solution :

The main factor in Regression analysis which we want to predict or understand is called the dependent variable. It is also
called target variable.

If there is only one input variable (x), then such linear regression is called _____.

1.) exponential regression

2.) multiple linear regression

3.) simple linear regression

4.) polynomial regression

Solution :

If there is only one input variable (x), then such linear regression is called simple linear regression. And if there is more than
one input variable, then such linear regression is called multiple linear regression.

Logistic regression uses the _____ function, also called the logistic function, to map its
output to a probability.

1.) quadratic

2.) sigmoid

3.) lasso

4.) linear

Solution :

Logistic regression uses the sigmoid function, also called the logistic function, to model the data. The sigmoid is the
function that maps the model's output to a probability; it is not itself a cost function.

Linear regression is a linear approach to form a relationship between _____.

1.) dependent data

2.) independent data

3.) None of the mentioned

4.) both dependent and independent data

Which of the following best describes the primary characteristic of linear time series models
in the context of time series analysis?

1.) Nonlinear relationships between variables

2.) Linear relationships between variables

3.) Lack of temporal patterns in the data

4.) Emphasis on non-parametric modeling

Solution :

Linear time series models emphasize linear relationships between variables in the time series data.

Which statistical technique is commonly employed in estimating the parameters of linear time
series models?

1.) Maximum likelihood estimation (MLE)

2.) Kernel density estimation (KDE)

3.) Monte Carlo simulation

4.) Principal component analysis (PCA)

Solution :

Maximum likelihood estimation (MLE) is commonly employed in estimating the parameters of linear time series models.

Consider a 2-class [y= {-1, 1}] classification problem of 2-dimensional feature vectors. The
support vectors and the corresponding class labels and Lagrange multipliers are provided. Find
the value of the SVM weight vector W.

X1=(-1,1), y1=-1, α1=2

X2=(0,3), y2=1, α2=1

X3=(0,-1), y3=1, α3=1

1.) (-1,3)

2.) (2, 0)

3.) (-2, 4)

4.) (-2, 2)
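
One way to work this out, assuming the standard dual-form relation W = Σ αᵢ yᵢ Xᵢ (which the source does not state), is sketched below. The function name is chosen for illustration; the numbers are the ones given in the question.

```python
# Values taken from the question above.
support_vectors = [(-1.0, 1.0), (0.0, 3.0), (0.0, -1.0)]
labels = [-1, 1, 1]
alphas = [2.0, 1.0, 1.0]


def svm_weight_vector(support_vectors, labels, alphas):
    """Sum alpha_i * y_i * x_i over all support vectors."""
    dim = len(support_vectors[0])
    w = [0.0] * dim
    for x, y, a in zip(support_vectors, labels, alphas):
        for j in range(dim):
            w[j] += a * y * x[j]
    return tuple(w)


w = svm_weight_vector(support_vectors, labels, alphas)

# Sanity check from the dual problem: sum of alpha_i * y_i should be zero.
balance = sum(a * y for a, y in zip(alphas, labels))
```

Term by term this is 2·(−1)·(−1, 1) + 1·1·(0, 3) + 1·1·(0, −1) = (2, −2) + (0, 3) + (0, −1) = (2, 0), which corresponds to option 2.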

The values of Lagrange multipliers corresponding to the support vectors can be:

1.) Less than zero

2.) Greater than zero

3.) Any real number

4.) Any non zero number.

Which one is true about Perceptron?

1.) Perceptrons can implement Logic Gates like AND, OR, or XOR.

2.) Single layer Perceptrons can learn only linearly separable patterns.

3.) Perceptron Learning Rule states that the algorithm would automatically learn the optimal
weight coefficients

4.) All of the mentioned

Which one is a type of Perceptron model?

1.) Single-layer

2.) Multi-layer

3.) Both Single-layer and Multi-layer

4.) None of the mentioned

What do you mean by a hard margin?

1.) The SVM allows very low error in classification

2.) The SVM allows high amount of error in classification

3.) Both The SVM allows very low error in classification and The SVM allows high amount of
error in classification

4.) none of the mentioned


A 4-input neuron has weights 2, 3, 4 and 5, and the transfer function is linear with the constant of
proportionality equal to 2. The inputs are 5, 12, 6 and 10 respectively.

What will be the output?

1.) 234

2.) 240

3.) 245

4.) 230

Solution :

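The weighted sum of the inputs is 2·5 + 3·12 + 4·6 + 5·10 = 120. Multiplying by the constant of proportionality, 2, gives an output of 240.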
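
The same arithmetic can be checked in a couple of lines. This is a sketch that assumes "linear transfer function" means output = k × (weighted sum of inputs) with no bias term, as the question implies; the function name is illustrative.

```python
def linear_neuron(weights, inputs, k):
    """Output of a neuron with a linear transfer function of slope k."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return k * weighted_sum


# Values from the question: weights, inputs, constant of proportionality.
output = linear_neuron([2, 3, 4, 5], [5, 12, 6, 10], k=2)
```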

A perceptron works by taking in some numerical inputs along with what is known as _____
and _____.

1.) weights,bias

2.) threshold,bias

3.) weighted sum, sigmoid

4.) sigmoid ,bias

Which one belongs to application areas of Artificial Neural Networks ?

1.) Speech Recognition

2.) Character Recognition

3.) Human Face Recognition

4.) All of the mentioned

We usually use feature normalization before using the Gaussian kernel in SVM. What is true
about feature normalization?

1. We do feature normalization so that no single feature dominates the others

2. Sometimes, feature normalization is not feasible in case of categorical variables

3. Feature normalization always helps when we use Gaussian kernel in SVM

1.) 1

2.) 1 and 2

3.) 1 and 3

4.) 2 and 3

5.) None of the mentioned

Solution :

Statements one and two are correct.

Loops are allowed in _____ Artificial Neural Networks.

1.) Modular

2.) FeedForward

3.) Kohonen Self Organizing

4.) FeedBack

Which of the following is a linear classification model?

1.) Naïve Bayes

2.) Decision Tree Classification

3.) K-Nearest Neighbors

4.) Support Vector Machines

5.) Kernel SVM

Solution :

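Support Vector Machines (with a linear kernel) form a linear classification model. The other options are treated here as non-linear classification models.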

Suppose you are using RBF kernel in SVM with high Gamma value. What does this signify?

1.) The model would consider only the points close to the hyperplane for modeling

2.) The model would consider even far away points from hyperplane for modeling

3.) The model would not be affected by distance of points from hyperplane for modeling

4.) All of the Mentioned

Solution :

The gamma parameter controls how far the influence of a single training point reaches. With a low gamma, far-away points
also influence the decision boundary, so the model is too constrained and cannot really capture the shape of the data.
With a high gamma, only points close to the hyperplane are considered, so the model follows the shape of the dataset closely.
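
The effect of gamma can be seen directly in the RBF kernel value, assuming the usual form K(x, z) = exp(−gamma · ‖x − z‖²). The points and gamma values below are hypothetical and chosen only to make the contrast visible.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


origin = (0.0, 0.0)
far_point = (3.0, 0.0)  # hypothetical point 3 units away

far_high_gamma = rbf_kernel(origin, far_point, gamma=10.0)
far_low_gamma = rbf_kernel(origin, far_point, gamma=0.01)
```

With the high gamma the far point contributes essentially nothing, whereas with the low gamma it still has a kernel value close to 1, so distant points keep influencing the model.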

The effectiveness of an SVM depends upon:

1.) Selection of Kernel

2.) Kernel Parameters

3.) Soft Margin Parameter C

4.) All of the mentioned

Solution :

An SVM's effectiveness depends on choosing all three of the above (the kernel, the kernel parameters, and the soft margin
parameter C) so as to maximise efficiency while reducing error and overfitting.

