Data Mining Exam
The “Apriori Principle” states that if an itemset is frequent, then all of its subsets must also be
frequent.
Question 1Answer
a.FALSE
b.TRUE
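The subset property described in the statement above can be checked on a small example. The following is a minimal sketch; the transactions are hypothetical and invented purely for illustration, and `support` is an illustrative helper, not part of any library.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
transactions = [
    {"tea", "sugar", "biscuit"},
    {"tea", "sugar"},
    {"tea", "biscuit"},
    {"sugar", "biscuit"},
    {"tea", "sugar", "biscuit"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


itemset = {"tea", "sugar", "biscuit"}
itemset_support = support(itemset, transactions)

# Support of every non-empty proper subset of the itemset.
subset_supports = {
    subset: support(subset, transactions)
    for size in range(1, len(itemset))
    for subset in combinations(sorted(itemset), size)
}

# No subset can be less frequent than the itemset that contains it.
subsets_at_least_as_frequent = all(
    s >= itemset_support for s in subset_supports.values()
)
print(itemset_support, subsets_at_least_as_frequent)
```

Any transaction containing the full itemset also contains each of its subsets, so a subset's support can never fall below the itemset's support.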
a.ratio data
b.ordinal data
c.interval data
d.nominal data
The probability of an individual owning a horse is 25%, given that they subscribe to at least one royal
equestrian club. We also know that 8% of the adult population subscribes to at least one royal equestrian
club. Finally, the probability of an individual owning a horse given that they don’t subscribe to at least
one royal equestrian club is 15%. Use Bayes' theorem to compute the probability that an individual
subscribes to at least one royal equestrian club given that they own a horse.
Question 3Answer
a.None of these
b.≈ 0.13
c.≈ 0.28
d.≈ 0.34
e.≈ 0.09
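The arithmetic for this question can be sketched as follows. The variable names are illustrative; the three input probabilities are taken from the question, and the total probability of owning a horse is obtained by summing over subscribers and non-subscribers.

```python
# Probabilities given in the question.
p_horse_given_club = 0.25      # P(horse | club)
p_club = 0.08                  # P(club)
p_horse_given_no_club = 0.15   # P(horse | no club)

# Law of total probability: P(horse).
p_horse = (p_horse_given_club * p_club
           + p_horse_given_no_club * (1 - p_club))

# Bayes' theorem: P(club | horse) = P(horse | club) * P(club) / P(horse).
p_club_given_horse = p_horse_given_club * p_club / p_horse

print(round(p_club_given_horse, 2))  # 0.13
```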
If the training data classes are unknown, which of the following algorithms could be used to find useful classes?
Question 4Answer
a.Clustering
b.Pruning Analysis
c.Bayesian Analysis
d.Binary Sort
Which of these types of variables is the set of odd integers from n = 5 to n = 41?
Question 5Answer
a.Categorical
b.Independent
c.Interval
d.Dependent
e.Ordinal
Discretization is the process of converting a continuous attribute into a nominal attribute.
Question 7Answer
a.FALSE
b.TRUE
In data mining classification, a false positive is an instance that the algorithm classifies as true
even though it is in fact false.
Question 8Answer
a.TRUE
Transactions from market baskets show:
Question 9Answer
a.data relationships
b.monthly customer purchases
c.daily customer purchase data
d.tea, sugar, and biscuit
a.Dissimilarity between any given attribute of data items/objects in terms of the Supremum distance
b.Dissimilarity between two data items/objects in terms of the Hamming distance of the bits
c.Dissimilarity between points in terms of the Euclidean definition of distance
d.All of these
For the rule set extracted from a decision tree, which statement is most true?
Question 11Answer
a.Such rules are mutually exclusive, exhaustive, and unordered
b.Such rules are non-exclusive, exhaustive, and ordered
c.Logical OR exists between such rules, they are unordered
d.None of these
In prediction methods, which statement is true?
Question 12Answer
a.The designed model is used to classify current behaviors
b.A numeric output/class attribute must be
c.A categorical output/class attribute must be
d.The designed model is used to determine future outcomes
Association analysis is a way of finding and grouping together sets of closely related observations.
Question 13Answer
a.FALSE
b.TRUE
a.FALSE
b.TRUE
a.5
b.4
c.2
d.3
A methodology useful for discovering interesting relationships within large data sets is _________.
Question 16Answer
a.Algorithm
b.Data Mining
c.Association analysis
d.Big Data
K-means clustering requires prior knowledge of the number of clusters, which is supplied as an input.
Question 17Answer
a.FALSE
b.TRUE
In a partition with 10 instances and a binary class (4 instances of class A and 6 instances of class B),
the entropy, assuming log base 2, is:
Question 18Answer
a.≈ 0.88
b.≈ 0.72
c.≈ 0.47
d.≈ 0.97
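The entropy in this question can be checked with a short calculation. This is a minimal sketch using the standard Shannon entropy formula with log base 2; the helper name `entropy` is illustrative.

```python
import math


def entropy(counts):
    """Shannon entropy (base 2) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(
        (c / total) * math.log2(c / total)
        for c in counts
        if c > 0  # a class with zero instances contributes nothing
    )


# 4 instances of class A and 6 instances of class B, as in the question.
h = entropy([4, 6])
print(round(h, 2))  # 0.97
```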
When evaluating accuracy from a confusion matrix, add the true positives and true negatives, then divide
by the sum of the false negatives and false positives.
Question 19
Answer
a.FALSE
b.TRUE
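For comparison with the statement above, the standard definition of accuracy divides the correct predictions by all predictions. The sketch below uses hypothetical confusion-matrix counts, invented for illustration only.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = correct predictions / all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts, not taken from the exam.
acc = accuracy(tp=40, tn=45, fp=5, fn=10)
print(acc)  # 0.85
```

The denominator includes all four cells of the confusion matrix, not only the misclassified ones.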
Training examples whose attributes are relatively close to those of the test example are considered its nearest neighbors.
Question 20Answer
a.FALSE
b.TRUE
Which field of data mining applications analyzes information and establishes rules to differentiate between
specified classes?
Question 21Answer
a.Visualization
b.Classification
c.Clustering
d.Associations
Cluster analysis is a way of finding patterns based on closely correlated characteristics in the data.
Question 22Answer
a.FALSE
b.TRUE
Ratio data is a categorical data type.
Question 23Answer
a.TRUE
b.FALSE
_________ is the basis for the existing decision tree algorithms ID3, C4.5, and CART.
Question 24Answer
a.Gini index
b.Hunt’s Algorithm
c.ID4
d.Information gain
A graphical evaluation approach for binary classification models in which the true positive rate is plotted
on the y-axis and the false positive rate is plotted on the x-axis.
Question 25Answer
a.Ratio data
b.Decision tree
c.Distance measure
d.Area under the ROC curve
TRUE
__________ are quantitative attributes.
Question 27Answer
a.Random
b.Alphabetical
c.Numeric
d.Nominal
Suppose we have a dataset containing 200 people's details. One hundred of these people have paid insurance
for their cars. The following rule was discovered by a supervised data mining session:
IF age ≥ 18 & driving license = yes
THEN vehicles insurance = yes
Rule Precision: 80%
Rule Coverage: 40%
How many people with a driving license and age ≥ 18 are in the class vehicles insurance = no?
Question 28Answer
a.120
b.80
c.16
d.64
e.8
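The counts behind this question can be worked through as below. This is a sketch under one stated assumption: rule coverage is read as the fraction of all 200 records that satisfy the rule's IF part, and rule precision as the fraction of those covered records for which the THEN part is correct. Other textbooks define coverage differently, so the definition used in the course should be checked.

```python
total_records = 200
rule_coverage = 0.40    # assumed: share of all records matching the IF part
rule_precision = 0.80   # share of matching records with insurance = yes

# Records with age >= 18 and driving license = yes.
covered = round(total_records * rule_coverage)

# Covered records that really have vehicles insurance = yes.
correctly_covered = round(covered * rule_precision)

# Covered records in the class vehicles insurance = no.
covered_but_uninsured = covered - correctly_covered

print(covered, correctly_covered, covered_but_uninsured)  # 80 64 16
```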
A _____________ shows correctly and incorrectly predicted counts of test records by a classification model.
Question 29Answer
a.decision tree
b.learning model
c.attribute class
d.confusion matrix
Using the confusion matrix below, what are the accuracy and the precision of the classifier respectively?
a.Infrequent/Rare
b.Frequent/Regular
c.Small/Minimal
A random error or variation in calculated variables is noise.
Question 32Answer
a.TRUE
b.FALSE
In Bayes' theorem, the probability of hypothesis H, denoted P(H), is referred to as ___________.
Question 33Answer
a.a conditional probability
b.an a priori probability
c.a posterior probability
d.a bidirectional probability
a.three
b.two
c.four
d.one
The _____________ strategy aims to find all the items that satisfy the minimum support (minsup) threshold.
Question 35Answer
a.Frequent Itemset Generation
b.Association Rule Discovery
c.Rule Generation
d.Rule-pruning
To deal with missing data items during the learning process, some data mining techniques __________.
Question 36Answer
a.replace missing items of real-value data with class means
b.remove records with missing data
c.replace missing values of attributes with values found in related instances
When calculating the accuracy of data mining classification models, the true positive rate is __________.
Question 37Answer
a.the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly
classified positives
b.the ratio of correctly classified positives divided by the total positive count
c.the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly
classified negatives
d.the ratio of correctly classified negatives divided by the total negative count
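The true positive rate is commonly defined as TP / (TP + FN), that is, the correctly classified positives divided by all actual positives. The sketch below contrasts it with precision, TP / (TP + FP); the counts are hypothetical and invented for illustration.

```python
def true_positive_rate(tp, fn):
    """TPR (recall): correctly classified positives / all actual positives."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Precision: correctly classified positives / all predicted positives."""
    return tp / (tp + fp)


# Hypothetical counts, not taken from the exam.
tp, fn, fp = 40, 10, 5

tpr = true_positive_rate(tp, fn)   # 40 / 50
prec = precision(tp, fp)           # 40 / 45

print(round(tpr, 2), round(prec, 2))  # 0.8 0.89
```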
To apply Bayes' theorem, which of the following relationships must hold between hypothesis H and
evidence E?
Question 38Answer
a.P(H|E) + P(H|~E) = 1
b.P(H|E) + P(~H|E) = 1
c.P(H|E) + P(H|~E) = 0
d.P(H|E) + P(~H|E) = 0
“If an itemset is frequent, then all of its subsets must also be frequent” is referred to as:
Question 39Answer
a.Apriori Principle
b.A theorem which can never be proven
c.The main key to understanding market basket analysis
d.All of these
In the data classification process, which of these terms describes the first major task?
Question 40Answer
a.Choose training data
b.Classify
c.Learning
d.Data preprocessing
The ______________ strategy aims to extract all the high-confidence rules from the frequent itemsets found
in the Frequent Itemset Generation step.
Question 41Answer
a.Association Rule
b.Rule Generation
c.Association Analysis
d.Association Generation
Assume that a group of 900 individuals has been surveyed. Evaluate the following survey observations:
participants who read history books only = 400, participants who read non-history books only = 150, and
participants who read both = 100. What is the confidence of the rule reads (X, "history books") →
reads (X, "non-history books")?
Question 42Answer
a.30%
c.20%
d.25%
e.15%
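The confidence of a rule A → B is the number of records containing both A and B divided by the number of records containing A. The sketch below applies this to the survey figures, assuming that "history books only" excludes the participants who read both kinds, so that all history readers = history only + both.

```python
# Survey figures from the question.
history_only = 400
non_history_only = 150
both = 100

# Assumption: "history only" and "both" are disjoint groups,
# so every history reader falls in exactly one of them.
history_readers = history_only + both

# confidence(history -> non-history) = count(both) / count(history)
confidence = both / history_readers

print(f"{confidence:.0%}")  # 20%
```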