PCCCS504 Module 4
1. Another possibility using Gaussian densities is to have them all diagonal but
allow them to be different. Derive the discriminant for this case.
2. Let us say in two dimensions, we have two classes with exactly the same mean.
What type of boundaries can be defined?
3. Let us say we have two variables x1 and x2 and we want to make a quadratic
fit using them, namely, f(x1, x2) = w0 + w1 x1 + w2 x2 + w3 x1 x2 + w4 x1² + w5 x2².
How can we find wi, i = 0, ..., 5, given a sample X = {x1^t, x2^t, r^t}?
(A code sketch is given after this list of exercises.)
4. In regression we saw that fitting a quadratic is equivalent to fitting a linear
model with an extra input corresponding to the square of the input. Can we
also do this in classification?
5. In document clustering, ambiguity of words can be decreased by taking the
context into account, for example, by considering pairs of words, as in “cocktail
party” vs. “party elections.” Discuss how this can be implemented.
6. Parametric regression assumes Gaussian noise and hence is not robust to
outliers; how can we make it more robust?
7. How can we detect outliers after hierarchical clustering?
8. How does condensed nearest neighbor behave if k > 1?
9. In condensed nearest neighbor, an instance previously added to Z may no
longer be necessary after a later addition. How can we find such instances that
are no longer necessary?
10. In a regressogram, instead of averaging in a bin and doing a constant fit, we
can use the instances falling in a bin and do a linear fit (see figure 8.14). Write
the code and compare this with the regressogram proper. (A code sketch is given
after this list of exercises.)
11. Propose an incremental version of the running mean estimator, which, like the
condensed nearest neighbor, stores instances only when necessary.
12. In the running smoother, we can fit a constant, a line, or a higher-degree
polynomial at a test point. How can we choose among them?
13. For a numeric input, instead of a binary split, one can use a ternary split
with two thresholds and three branches: xj < wma, wma ≤ xj < wmb, xj ≥ wmb.
Propose a modification of the tree induction method to learn the two thresholds,
wma and wmb. What are the advantages and the disadvantages of such a node over
a binary node?
14. Propose a tree induction algorithm with backtracking.
15. In generating a univariate tree, a discrete attribute with n possible values can
be represented by n 0/1 dummy variables and then treated as n separate
numeric attributes. What are the advantages and disadvantages of this
approach?
16. In a regression tree, we discussed that in a leaf node, instead of calculating the
mean, we can do a linear regression fit and make the response at the leaf
dependent on the input. Propose a similar method for classification trees.
17. Propose a rule induction algorithm for regression.
18. In regression trees, how can we get rid of discontinuities at the leaf boundaries?
19. Let us say that for a classification problem, we already have a trained decision
tree. How can we use it in addition to the training set in constructing a k-
nearest neighbor classifier?
20. In a multivariate tree, very probably, at each internal node, we will not be
needing all the input variables. How can we decrease dimensionality at a node?
21. Propose a filtering algorithm to find training instances that are very unlikely to
be support vectors.
22. In the empirical kernel map, how can we choose the templates?
23. In the localized multiple kernel of equation
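
For exercise 3, a minimal sketch assuming the weights are found by ordinary least squares on an expanded design matrix; the model is linear in the parameters, so no special quadratic solver is needed. The sample values below are synthetic placeholders, not data from the text.

```python
import numpy as np

# Synthetic sample {x1^t, x2^t, r^t}; replace with your own data.
rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 50)
x2 = rng.uniform(-1, 1, 50)
r = 1 + 2 * x1 - x2 + 0.5 * x1 * x2 + rng.normal(0, 0.1, 50)

# Columns correspond to the basis functions [1, x1, x2, x1*x2, x1^2, x2^2].
D = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

# Ordinary least squares: w[i] estimates wi, i = 0..5.
w, *_ = np.linalg.lstsq(D, r, rcond=None)
print(w)
```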
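For exercise 10, a minimal sketch comparing the regressogram proper (a constant, per-bin mean) with a per-bin linear fit; the data, bin count, and noise level are arbitrary choices for illustration only.

```python
import numpy as np

# Synthetic 1-D regression data, for illustration only.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 100))
r = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 100)

bins = np.linspace(0, 1, 9)          # 8 equal-width bins over [0, 1]
idx = np.digitize(x, bins[1:-1])     # bin index of each training instance

def predict(x_test, linear=False):
    """Regressogram prediction: per-bin mean, or per-bin line if linear=True."""
    j = np.digitize(x_test, bins[1:-1])
    xb, rb = x[idx == j], r[idx == j]
    if len(xb) == 0:
        return 0.0                    # empty bin: no information
    if linear and len(xb) >= 2:
        w1, w0 = np.polyfit(xb, rb, 1)   # slope, intercept of the per-bin line
        return w0 + w1 * x_test
    return rb.mean()                     # regressogram proper

print(predict(0.3), predict(0.3, linear=True))
```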
Questions
Following is the training data for a group of athletes. Based on this data, use the
k-NN algorithm to classify Sayan (Weight = 56 kg, Speed = 10 kmph) as a Good, Average,
or Poor sprinter.
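A minimal sketch of the k-NN classification asked for above, assuming Euclidean distance on the two features (weight, speed) and a majority vote among the k nearest neighbors. The training rows below are hypothetical placeholders; replace them with the rows of the training table given in the assignment.

```python
import numpy as np
from collections import Counter

# Placeholder training data: (weight_kg, speed_kmph, class_label).
# Replace these rows with the athletes' table from the assignment.
train = [
    (55.0, 9.0, "Average"),
    (60.0, 12.0, "Good"),
    (70.0, 6.0, "Poor"),
]

def knn_classify(weight, speed, k=3):
    """Majority vote among the k training athletes nearest in Euclidean distance."""
    dists = sorted(
        (np.hypot(weight - w, speed - s), label) for w, s, label in train
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

print(knn_classify(56.0, 10.0, k=3))   # Sayan: Weight = 56 kg, Speed = 10 kmph
```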
4. Let there be 12 messages, of which 8 are Normal and 4 are Spam. In the messages,
some words occur as follows:

Words     Occurring in Normal Messages     Occurring in Spam
Dear      8                                2
Friend    5                                1
Lunch     3                                0
Money     0                                5

Use Naive Bayes to find out whether the two messages below belong to Normal or Spam:
i. a message with the words "Dear friend" present
ii. a message with the words "friend money" present
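A minimal sketch of the Naive Bayes comparison, assuming each class score is the class prior times the product of per-word likelihoods, with likelihoods taken as word count divided by the total word count of that class; no smoothing is applied, so zero counts (e.g. "money" in Normal) drive a score to zero, and your course notes may define the likelihoods slightly differently.

```python
# Class priors from 8 Normal and 4 Spam messages out of 12.
priors = {"Normal": 8 / 12, "Spam": 4 / 12}

# word -> (count in Normal, count in Spam), from the table above.
counts = {"dear": (8, 2), "friend": (5, 1), "lunch": (3, 0), "money": (0, 5)}
totals = {"Normal": sum(n for n, _ in counts.values()),
          "Spam": sum(s for _, s in counts.values())}

def score(words, cls):
    """Prior times the product of per-word likelihoods for class `cls`."""
    p = priors[cls]
    for w in words:
        n, s = counts[w]
        p *= (n if cls == "Normal" else s) / totals[cls]
    return p

for msg in (["dear", "friend"], ["friend", "money"]):
    print(msg, {c: score(msg, c) for c in priors})
```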
5. A set of customer purchase data collected from a grocery store is given in the table below:

#   Milk   Bread   Butter   Diaper   Baby Food   Eggs   Fruits
1   1      1       0        0        0           1      1
2   0      0       1        0        0           0      0
3   0      0       0        1        1           0      0
4   1      1       1        0        0           1      1
5   0      1       0        0        0           0      0

Evaluate the Support, Confidence, Lift, Leverage, and Conviction for the rule {butter, bread} ⇒ …
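A minimal sketch of the rule metrics, assuming the usual definitions: support = P(X ∪ Y), confidence = P(X ∪ Y)/P(X), lift = confidence/P(Y), leverage = P(X ∪ Y) − P(X)P(Y), and conviction = (1 − P(Y))/(1 − confidence). The transactions are encoded from the table as reconstructed above; since the consequent of the rule is cut off in the question, the `{"milk"}` in the last line is only a hypothetical placeholder.

```python
# Transactions from the table above, encoded as item sets.
transactions = [
    {"milk", "bread", "eggs", "fruits"},
    {"butter"},
    {"diaper", "baby food"},
    {"milk", "bread", "butter", "eggs", "fruits"},
    {"bread"},
]
N = len(transactions)

def supp(items):
    """Fraction of transactions containing every item in `items`."""
    return sum(items <= t for t in transactions) / N

def metrics(X, Y):
    s_xy, s_x, s_y = supp(X | Y), supp(X), supp(Y)
    conf = s_xy / s_x
    return {
        "support": s_xy,
        "confidence": conf,
        "lift": conf / s_y,
        "leverage": s_xy - s_x * s_y,
        "conviction": float("inf") if conf == 1 else (1 - s_y) / (1 - conf),
    }

# Placeholder consequent; replace with the consequent given in the question.
print(metrics({"butter", "bread"}, {"milk"}))
```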
6. Given a dataset with two features (x1, x2) and two classes y = {−1, +1}, suppose
the optimal separating hyperplane found by the SVM is defined by the equation
0.5 x1 + 0.5 x2 − 1 = 0
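Assuming the hyperplane is written in the canonical SVM form w·x + b = 0 with w = (0.5, 0.5) and b = −1 (so that the support vectors satisfy |w·x + b| = 1), a short sketch of how a point is classified by the sign of the decision function and how the margin width follows from ||w||:

```python
import numpy as np

# Weight vector and bias read off the hyperplane 0.5*x1 + 0.5*x2 - 1 = 0.
w = np.array([0.5, 0.5])
b = -1.0

def classify(x):
    """Predict +1 or -1 from the sign of the decision function w.x + b."""
    return 1 if w @ x + b >= 0 else -1

# Distance between the two canonical margin hyperplanes, 2 / ||w||.
margin_width = 2 / np.linalg.norm(w)
print(classify(np.array([3.0, 1.0])), margin_width)   # example point, margin ~ 2.83
```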
8. What is the entropy of this collection of training examples with respect to the
target function classification?

Instance   Classification   a1   a2
1          +                T    T
2          +                T    T
3          -                T    F
4          +                F    F
5          -                F    T
6          -                F    T
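A worked version of the entropy computation, assuming the standard two-class entropy with base-2 logarithms and the 3 positive / 3 negative split shown in the table:

```latex
\[
H(S) = -\frac{3}{6}\log_2\frac{3}{6} - \frac{3}{6}\log_2\frac{3}{6}
     = -\tfrac{1}{2}(-1) - \tfrac{1}{2}(-1) = 1 \text{ bit}
\]
```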
9. Consider a binary classification problem where we have 200 instances in total,
evenly distributed between two classes (100 instances per class). We build a
decision tree that perfectly classifies the training data without any errors. What
is the Gini impurity of the final leaf nodes of this decision tree?
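As a hint toward the computation, the usual definition of Gini impurity for a two-class node with class proportions p1 and p2 is shown below; a leaf that perfectly classifies its instances has one proportion equal to 1:

```latex
\[
\mathrm{Gini} = 1 - p_1^2 - p_2^2, \qquad
\text{pure leaf: } 1 - 1^2 - 0^2 = 0
\]
```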
10. Suppose we have a dataset with 100 instances and 5 features. We decide to
build a decision tree classifier. During training, the algorithm splits the data
based on the feature that provides the best information gain at each node. If the
tree has a depth of 4, how many nodes will the decision tree have in total?
11. A decision tree classifier learned from a fixed training set achieves 100%
accuracy on the test set. Which algorithms, trained using the same training set,
are guaranteed to give a model with 100% accuracy?
13. Consider the table below, where the (i, j)th element of the table is the distance
between points xi and xj. Single-linkage clustering is performed on the data
points (x1, x2, ..., x5).