PCCCS504 Module 4
1. Another possibility using Gaussian densities is to have them all diagonal but
allow them to be different. Derive the discriminant for this case.
2. Let us say in two dimensions, we have two classes with exactly the same mean.
What type of boundaries can be defined?
3. Let us say we have two variables x1 and x2 and we want to make a quadratic
fit using them, namely, f(x1, x2) = w0 + w1 x1 + w2 x2 + w3 x1 x2 + w4 x1² + w5 x2².
How can we find wi, i = 0, ..., 5, given a sample X = {x1^t, x2^t, r^t}?
(A code sketch is given after this list of exercises.)
4. In regression we saw that fitting a quadratic is equivalent to fitting a linear
model with an extra input corresponding to the square of the input. Can we
also do this in classification?
5. In document clustering, ambiguity of words can be decreased by taking the
context into account, for example, by considering pairs of words, as in “cocktail
party” vs. “party elections.” Discuss how this can be implemented.
6. Parametric regression assumes Gaussian noise and hence is not robust to
outliers; how can we make it more robust?
7. How can we detect outliers after hierarchical clustering?
8. How does condensed nearest neighbor behave if k > 1?
9. In condensed nearest neighbor, an instance previously added to Z may no
longer be necessary after a later addition. How can we find such instances that
are no longer necessary?
10. In a regressogram, instead of averaging in a bin and doing a constant fit, we
can use the instances falling in a bin and do a linear fit (see figure 8.14). Write
the code and compare this with the regressogram proper. (A code sketch is given
after this list of exercises.)
11. Propose an incremental version of the running mean estimator, which, like the
condensed nearest neighbor, stores instances only when necessary.
12. In the running smoother, we can fit a constant, a line, or a higher-degree
polynomial at a test point. How can we choose among them?
13. For a numeric input, instead of a binary split, one can use a ternary split
with two thresholds and three branches: xj < wma, wma ≤ xj < wmb, xj ≥ wmb.
Propose a modification of the tree induction method to learn the two thresholds,
wma and wmb. What are the advantages and the disadvantages of such a node over
a binary node?
14. Propose a tree induction algorithm with backtracking.
15. In generating a univariate tree, a discrete attribute with n possible values can
be represented by n 0/1 dummy variables and then treated as n separate
numeric attributes. What are the advantages and disadvantages of this
approach?
16. In a regression tree, we discussed that in a leaf node, instead of calculating the
mean, we can do a linear regression fit and make the response at the leaf
dependent on the input. Propose a similar method for classification trees.
17. Propose a rule induction algorithm for regression.
18. In regression trees, how can we get rid of discontinuities at the leaf boundaries?
19. Let us say that for a classification problem, we already have a trained decision
tree. How can we use it in addition to the training set in constructing a k-
nearest neighbor classifier?
20. In a multivariate tree, very probably, at each internal node, we will not be
needing all the input variables. How can we decrease dimensionality at a node?
21. Propose a filtering algorithm to find training instances that are very unlikely to
be support vectors.
22. In the empirical kernel map, how can we choose the templates?
23. In the localized multiple kernel of equation
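
For exercise 3, a minimal sketch assuming the weights are found by ordinary least squares on an expanded design matrix; the model is linear in the parameters, so no special quadratic solver is needed. The sample values below are synthetic placeholders, not data from the text.

```python
import numpy as np

# Synthetic sample {x1^t, x2^t, r^t}; replace with your own data.
rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 50)
x2 = rng.uniform(-1, 1, 50)
r = 1 + 2 * x1 - x2 + 0.5 * x1 * x2 + rng.normal(0, 0.1, 50)

# Columns correspond to the basis functions [1, x1, x2, x1*x2, x1^2, x2^2].
D = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

# Ordinary least squares: w[i] estimates wi, i = 0..5.
w, *_ = np.linalg.lstsq(D, r, rcond=None)
print(w)
```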
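For exercise 10, a minimal sketch comparing the regressogram proper (a constant, per-bin mean) with a per-bin linear fit; the data, bin count, and noise level are arbitrary choices for illustration only.

```python
import numpy as np

# Synthetic 1-D regression data, for illustration only.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 100))
r = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 100)

bins = np.linspace(0, 1, 9)          # 8 equal-width bins over [0, 1]
idx = np.digitize(x, bins[1:-1])     # bin index of each training instance

def predict(x_test, linear=False):
    """Regressogram prediction: per-bin mean, or per-bin line if linear=True."""
    j = np.digitize(x_test, bins[1:-1])
    xb, rb = x[idx == j], r[idx == j]
    if len(xb) == 0:
        return 0.0                    # empty bin: no information
    if linear and len(xb) >= 2:
        w1, w0 = np.polyfit(xb, rb, 1)   # slope, intercept of the per-bin line
        return w0 + w1 * x_test
    return rb.mean()                     # regressogram proper

print(predict(0.3), predict(0.3, linear=True))
```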
Questions
Following is the training data for a group of athletes. Based on this data, use the
k-NN algorithm to classify Sayan (Weight = 56 kg, Speed = 10 kmph) as a Good, Average,
or Poor sprinter.
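A minimal sketch of the k-NN classification asked for above, assuming Euclidean distance on the two features (weight, speed) and a majority vote among the k nearest neighbors. The training rows below are hypothetical placeholders; replace them with the rows of the training table given in the assignment.

```python
import numpy as np
from collections import Counter

# Placeholder training data: (weight_kg, speed_kmph, class_label).
# Replace these rows with the athletes' table from the assignment.
train = [
    (55.0, 9.0, "Average"),
    (60.0, 12.0, "Good"),
    (70.0, 6.0, "Poor"),
]

def knn_classify(weight, speed, k=3):
    """Majority vote among the k training athletes nearest in Euclidean distance."""
    dists = sorted(
        (np.hypot(weight - w, speed - s), label) for w, s, label in train
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

print(knn_classify(56.0, 10.0, k=3))   # Sayan: Weight = 56 kg, Speed = 10 kmph
```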
4. Let there be 12 messages, of which 8 are Normal and 4 are Spam. In the messages,
some words occur as follows:

Words     Occurring in Normal Messages     Occurring in Spam
Dear      8                                2
Friend    5                                1
Lunch     3                                0
Money     0                                5

Use Naive Bayes to find out whether the two messages below belong to Normal or Spam:
i. a message with the words "Dear friend" present
ii. a message with the words "friend money" present
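A minimal sketch of the Naive Bayes comparison, assuming each class score is the class prior times the product of per-word likelihoods, with likelihoods taken as word count divided by the total word count of that class; no smoothing is applied, so zero counts (e.g. "money" in Normal) drive a score to zero, and your course notes may define the likelihoods slightly differently.

```python
# Class priors from 8 Normal and 4 Spam messages out of 12.
priors = {"Normal": 8 / 12, "Spam": 4 / 12}

# word -> (count in Normal, count in Spam), from the table above.
counts = {"dear": (8, 2), "friend": (5, 1), "lunch": (3, 0), "money": (0, 5)}
totals = {"Normal": sum(n for n, _ in counts.values()),
          "Spam": sum(s for _, s in counts.values())}

def score(words, cls):
    """Prior times the product of per-word likelihoods for class `cls`."""
    p = priors[cls]
    for w in words:
        n, s = counts[w]
        p *= (n if cls == "Normal" else s) / totals[cls]
    return p

for msg in (["dear", "friend"], ["friend", "money"]):
    print(msg, {c: score(msg, c) for c in priors})
```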
5. A set of customer purchase data collected from a grocery store is given in the table below:

#   Milk   Bread   Butter   Diaper   Baby Food   Eggs   Fruits
1   1      1       0        0        0           1      1
2   0      0       1        0        0           0      0
3   0      0       0        1        1           0      0
4   1      1       1        0        0           1      1
5   0      1       0        0        0           0      0

Evaluate the Support, Confidence, Lift, Leverage, and Conviction for the rule {butter, bread} ⇒ …
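A minimal sketch of the rule metrics, assuming the usual definitions: support = P(X ∪ Y), confidence = P(X ∪ Y)/P(X), lift = confidence/P(Y), leverage = P(X ∪ Y) − P(X)P(Y), and conviction = (1 − P(Y))/(1 − confidence). The transactions are encoded from the table as reconstructed above; since the consequent of the rule is cut off in the question, the `{"milk"}` in the last line is only a hypothetical placeholder.

```python
# Transactions from the table above, encoded as item sets.
transactions = [
    {"milk", "bread", "eggs", "fruits"},
    {"butter"},
    {"diaper", "baby food"},
    {"milk", "bread", "butter", "eggs", "fruits"},
    {"bread"},
]
N = len(transactions)

def supp(items):
    """Fraction of transactions containing every item in `items`."""
    return sum(items <= t for t in transactions) / N

def metrics(X, Y):
    s_xy, s_x, s_y = supp(X | Y), supp(X), supp(Y)
    conf = s_xy / s_x
    return {
        "support": s_xy,
        "confidence": conf,
        "lift": conf / s_y,
        "leverage": s_xy - s_x * s_y,
        "conviction": float("inf") if conf == 1 else (1 - s_y) / (1 - conf),
    }

# Placeholder consequent; replace with the consequent given in the question.
print(metrics({"butter", "bread"}, {"milk"}))
```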
6. Given a dataset with two features (x1, x2) and two classes y = {−1, +1}, suppose
the optimal separating hyperplane found by the SVM is defined by the equation
0.5 x1 + 0.5 x2 − 1 = 0
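Assuming the hyperplane is written in the canonical SVM form w·x + b = 0 with w = (0.5, 0.5) and b = −1 (so that the support vectors satisfy |w·x + b| = 1), a short sketch of how a point is classified by the sign of the decision function and how the margin width follows from ||w||:

```python
import numpy as np

# Weight vector and bias read off the hyperplane 0.5*x1 + 0.5*x2 - 1 = 0.
w = np.array([0.5, 0.5])
b = -1.0

def classify(x):
    """Predict +1 or -1 from the sign of the decision function w.x + b."""
    return 1 if w @ x + b >= 0 else -1

# Distance between the two canonical margin hyperplanes, 2 / ||w||.
margin_width = 2 / np.linalg.norm(w)
print(classify(np.array([3.0, 1.0])), margin_width)   # example point, margin ~ 2.83
```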
8. What is the entropy of this collection of training examples with respect to the
target function classification?

Instance   Classification   a1   a2
1          +                T    T
2          +                T    T
3          -                T    F
4          +                F    F
5          -                F    T
6          -                F    T
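A worked version of the entropy computation, assuming the standard two-class entropy with base-2 logarithms and the 3 positive / 3 negative split shown in the table:

```latex
\[
H(S) = -\frac{3}{6}\log_2\frac{3}{6} - \frac{3}{6}\log_2\frac{3}{6}
     = -\tfrac{1}{2}(-1) - \tfrac{1}{2}(-1) = 1 \text{ bit}
\]
```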
9. Consider a binary classification problem where we have 200 instances in total,
evenly distributed between two classes (100 instances per class). We build a
decision tree that perfectly classifies the training data without any errors. What
is the Gini impurity of the final leaf nodes of this decision tree?
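As a hint toward the computation, the usual definition of Gini impurity for a two-class node with class proportions p1 and p2 is shown below; a leaf that perfectly classifies its instances has one proportion equal to 1:

```latex
\[
\mathrm{Gini} = 1 - p_1^2 - p_2^2, \qquad
\text{pure leaf: } 1 - 1^2 - 0^2 = 0
\]
```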
10. Suppose we have a dataset with 100 instances and 5 features. We decide to
build a decision tree classifier. During training, the algorithm splits the data
based on the feature that provides the best information gain at each node. If the
tree has a depth of 4, how many nodes will the decision tree have in total?
11. A decision tree classifier learned from a fixed training set achieves 100%
accuracy on the test set. Which algorithms, trained using the same training set,
are guaranteed to give a model with 100% accuracy?
13. Consider the table below, where the (i, j)th element of the table is the distance
between points xi and xj. Single-linkage clustering is performed on the data
points (x1, x2, ..., x5).