
MLT Cat Ii

The document is a question bank for a Machine Learning Techniques subject. It contains 29 multiple choice questions covering topics like artificial intelligence, machine learning types (supervised, unsupervised, reinforcement), applications, algorithms, handling missing data, overfitting, candidate elimination algorithm and outlier detection.

VEL TECH MULTI TECH Dr. RANGARAJAN Dr. SAKUNTHALA ENGINEERING COLLEGE
(An Autonomous Institution)
Degree/Branch: B.Tech/AIDS/CSBS Subject Code: 191CB621
Date: Year/Sem: III/VI
MACHINE LEARNING TECHNIQUES
QUESTION BANK

UNIT- 1 INTRODUCTION
S.NO QUESTIONS CO LEVEL K LEVEL
UNIT 1
PART A

1. What is artificial intelligence? CO1.1 K1
a) Artificial Intelligence is a field that aims to make humans more intelligent
b) Artificial Intelligence is a field that aims to improve the security
c) Artificial Intelligence is a field that aims to develop intelligent machines
d) Artificial Intelligence is a field that aims to mine the data
2. Which of the following is a branch of artificial intelligence? CO1.1 K1
a) Machine Learning
b) Cyber forensics
c) Full-Stack Developer
d) Network Design
3. Choose the correct option regarding Machine Learning (ML) and Artificial
Intelligence (AI). CO1.1 K1
a) ML is a set of techniques that turns a dataset into software
b) AI is software that can emulate the human mind
c) ML is an alternate way of programming intelligent machines
d) All of the above
4. Which of the following is not a numerical function among the various function
representations of Machine Learning? CO1.2 K1
a) Neural Network
b) Support Vector Machines
c) Case-based
d) Linear Regression
5. What is machine learning? CO1.3 K1
a) The autonomous acquisition of knowledge through the use of computer programs
b) The autonomous acquisition of knowledge through the use of manual programs
c) The selective acquisition of knowledge through the use of computer programs
d) The selective acquisition of knowledge through the use of manual programs
6. Machine learning is an application of _________ CO1.3 K1
a) Blockchain
b) Artificial intelligence
c) Both A and B
d) None of the above
7. Application of machine learning is _______ CO1.3 K1
a) Email filtering
b) Sentiment analysis
c) Face recognition
d) All of the above
8. Which of the following factors does not affect the performance of the learner
system? CO1.4 K2
a) Good data structures
b) Representation scheme used
c) Training scenario
d) Type of feedback
9. Machine learning approaches are traditionally categorized into ________ CO1.5 K1
a) 3
b) 4
c) 7
d) 9

10. Which of the following algorithms are used in machine learning? CO1.5 K1
a) Naive Bayes
b) Support vector
c) K-nearest neighbor
d) All the above
12. What are the types of machine learning? CO1.5 K2
a) Supervised learning
b) Unsupervised learning
c) Reinforcement learning
d) All the above
13. Supervised learning problems can be grouped as _________________ CO1.5 K2
a) regression
b) classification problems
c) both A and B
d) none of the above
14. Unsupervised learning problems can be grouped under _________ CO1.5 K1
a) clustering
b) association
c) both A and B
d) none of the above
15. Fraud Detection, Image Classification, Diagnostics, and Customer Retention
are applications of which of the following? CO1.5 K1
a) Unsupervised Learning: Regression
b) Supervised Learning: Classification
c) Unsupervised Learning: Clustering
d) Reinforcement Learning
16. Which of the following is not a supervised learning algorithm? CO1.5 K1
a) Naive Bayesian
b) PCA (Principal component analysis)
c) Linear Regression
d) Decision Tree
17. Different learning methods do not include ____________ CO1.5 K2
a) Memorization
b) Analogy
c) Deduction
d) Introduction
18. Targeted marketing, Recommender Systems, and Customer Segmentation are
applications of ________ CO1.5 K1
a) supervised learning: Classification
b) unsupervised learning: Clustering
c) unsupervised learning: Regression
d) Reinforcement Learning
19. Machine learning algorithms that use labeled data are known as ____ CO1.6 K1
a) regression algorithms
b) clustering algorithms
c) association algorithms
d) all of the above

20. Inductive learning takes examples and generalizes rather than starting with
__________ knowledge. CO1.6 K1
a) Inductive
b) Existing
c) Deductive
d) None of the above
21. A model of language consists of categories, which do not include ________ CO1.7 K2
a) system units
b) structural units
c) data units
d) empirical units
22. How do you handle missing or corrupted data in a dataset? CO1.7 K2
a) Drop missing rows or columns
b) Replace missing values with mean/median/mode
c) Assign a unique category to missing values
d) All of the above
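The three strategies listed in options (a) to (c) can be sketched in a few lines of plain Python. The records below are hypothetical toy data invented for illustration, with `None` standing for a missing value; a real project would more likely use a data-frame library.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 25, "city": "Chennai"},
    {"age": None, "city": "Delhi"},
    {"age": 35, "city": None},
]

# (a) Drop every row that has at least one missing value.
dropped = [r for r in rows if None not in r.values()]

# (b) Replace a missing numeric value with the mean of the observed values.
age_mean = mean(r["age"] for r in rows if r["age"] is not None)
imputed = [{**r, "age": age_mean if r["age"] is None else r["age"]} for r in rows]

# (c) Give missing categorical values their own category.
flagged = [{**r, "city": r["city"] or "Unknown"} for r in rows]
```

Which strategy is appropriate depends on how much data is missing and why; dropping rows is simplest but discards information.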

23. Concept learning infers a ______-valued function from training examples of its
input and output. CO1.8 K1
a) Decimal
b) Hexadecimal
c) Boolean
d) All of the above
24. ___________ is the scenario in which a model fails to decipher the underlying
trend in the input data. CO1.8 K2
a) Overfitting
b) Underfitting
c) both A and B
d) none of the above
25. In language understanding, which of the following is not included in the levels
of knowledge? CO1.7 K2
a) Empirical
b) Logical
c) Phonological
d) Syntactic
26. Identify the model which is trained with data in a single batch. CO1.7 K2
a) Offline learning
b) Batch learning
c) Both A and B
d) None
27. The Candidate-Elimination algorithm represents the _____. CO1.7 K1
a) Solution Space
b) Version Space
c) Elimination Space
d) All of the above
28. Which of the following machine learning techniques helps in detecting the
outliers in data? CO1.8 K1
a) Classification
b) Clustering
c) Anomaly detection
d) All of the above
29. Identify the false option regarding regression. CO1.8 K1
a) It relates inputs to outputs.
b) It is used for prediction.
c) It may be used for interpretation.
d) It discovers causal relationships.
30. Which of the following machine learning algorithms is based upon the idea of
bagging? CO1.10 K1
a) Decision tree
b) Random forest
c) Classification
d) Regression
31. The FIND-S algorithm starts from the most specific hypothesis and generalizes
it by considering only ________ examples. CO1.10 K2
a) Negative
b) Positive
c) Negative or Positive
d) None of the above

PART B

1. Imagine we have two possibilities: we can scan and email the image, or
we can use an optical character reader (OCR) and send the text file.
Discuss the advantages and disadvantages of the two approaches in a
comparative manner. When would one be preferable over the other? CO1.1 K1

2. Give three computer applications for which machine learning approaches
seem appropriate and three for which they seem inappropriate. Include a
one-sentence justification for each. CO1.1 K2

3. Differentiate between training data and testing data. CO1.1 K2

4. Pick some learning task and state, as precisely as possible, the task,
performance measure, and training experience. CO1.3 K2

5. What algorithms exist for learning general target functions from training
examples? In what settings will particular algorithms converge to the
desired function, given sufficient training data? Which algorithms perform
best for which types of problems and representations? CO1.5 K1

6. Assume we are given the task of building a system to distinguish junk
email. What is in a junk email that lets us know that it is junk? How can
the computer detect junk through a syntactic analysis? What would we like
the computer to do if it detects a junk email: delete it automatically, move
it to a different file, or just highlight it on the screen? CO1.8 K1

7. Let us say we are given the task of building an automated taxi. Define the
constraints. What are the inputs? What is the output? How can we
communicate with the passenger? Do we need to communicate with the
other automated taxis, that is, do we need a “language”? CO1.9 K2
8. Explain why the size of the hypothesis space in the EnjoySport learning
task is 973. How would the number of possible instances and possible
hypotheses increase with the addition of the attribute WaterCurrent, which
can take on the values Light, Moderate or Strong? More generally, how does
the number of possible instances and hypotheses grow with the addition of a
new attribute A that takes on k possible values? CO1.9 K3
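The counting argument behind this question can be checked with a short sketch. It assumes the standard EnjoySport setup (Sky has three values and each of the other five attributes has two) and counts semantically distinct conjunctive hypotheses: each attribute is either a specific value or "?", plus one hypothesis that classifies everything as negative. The attribute names in the dictionary are labels chosen here for illustration.

```python
from math import prod

# Number of values per attribute in EnjoySport (Sky has 3, the rest have 2).
values = {"Sky": 3, "AirTemp": 2, "Humidity": 2, "Wind": 2, "Water": 2, "Forecast": 2}

def count_instances(values):
    """Every combination of attribute values is one instance."""
    return prod(values.values())

def count_hypotheses(values):
    """Each attribute is a specific value or '?', plus one all-negative hypothesis."""
    return 1 + prod(k + 1 for k in values.values())

base_instances = count_instances(values)
base_hypotheses = count_hypotheses(values)

# Adding WaterCurrent with 3 values (Light, Moderate, Strong).
extended = {**values, "WaterCurrent": 3}
ext_instances = count_instances(extended)
ext_hypotheses = count_hypotheses(extended)
```

Under these assumptions, adding an attribute with k values multiplies the number of instances by k and the number of non-empty hypotheses by k + 1.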

9. Take, for example, the word “machine.” Write it ten times. Also ask a
friend to write it ten times. Analyzing these twenty images, try to find
features, types of strokes, curvatures, loops, how you make the dots, and
so on, that discriminate your handwriting from that of your friend. CO1.10 K1

10. Give decision trees to represent the following Boolean functions: CO1.11 K4
(a) A ∧ ¬B (b) A ∨ [B ∧ C] (c) A XOR B (d) [A ∧ B] ∨ [C ∧ D]

11. True or false: If decision tree D2 is an elaboration of tree D1, then D1 is
more-general-than D2. Assume D1 and D2 are decision trees representing
arbitrary Boolean functions, and that D2 is an elaboration of D1 if ID3
could extend D1 into D2. If true, give a proof; if false, a counterexample. CO1.11 K2
12. Find the entropy, given the four probabilities p1 = 0.1, p2 = 0.2, p3 = 0.3
and p4 = 0.4 in the decision tree. CO1.12 K3
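A minimal sketch of the entropy calculation for these four probabilities, assuming the usual Shannon entropy in bits, H = -Σ p_i log2(p_i):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

h = entropy([0.1, 0.2, 0.3, 0.4])
print(round(h, 4))  # about 1.8464 bits
```

For comparison, four equally likely outcomes would give the maximum of 2 bits.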

PART C

1. How much training data is sufficient? What general bounds can be found
to relate the confidence in learned hypotheses to the amount of training
experience and the character of the learner's hypothesis space? CO1.1 K2
2. Taking a very simple example, one possible target concept may be to find
the day when my friend Ramesh enjoys his favorite sport. We have some
attributes/features of the day, such as Sky, Air Temperature, Humidity,
Wind, Water and Forecast, and based on these we have a target concept
named Enjoy Sport. CO1.1 K4

Example Sky AirTemp Humidity Wind Water Forecast EnjoySport

1 Sunny Warm Normal Strong Warm Same Yes

2 Sunny Warm High Strong Warm Same Yes

3 Rainy Cold High Strong Warm Change No

4 Sunny Warm High Strong Cool Change Yes

Design the problem formally in terms of task, performance measure, and
experience (TPE).

3. Implement an algorithm similar to the checkers problem. Use the simpler
game of tic-tac-toe. Represent the learned function as a linear combination
of board features of your choice. CO1.7 K4
4. Consider the Enjoy Sport learning task and the hypothesis space H. Define a
new hypothesis space H' that consists of all pairwise disjunctions. For
example, a typical hypothesis in H' is (?, Cold, High, ?, ?, ?) v (Sunny, ?,
High, ?, ?, Same). Trace the CANDIDATE-ELIMINATION algorithm for
the hypothesis space H' given the sequence of training examples. CO1.9 K4
5. The FIND-S algorithm is used to find the maximally specific hypothesis.
Use the FIND-S algorithm to find a single maximally specific hypothesis
for the given set of training examples. CO1.9 K3

Example Eyes Nose Head colour Hair Smile

1 Round Triangle Round Purple Yes Yes

2 Square Square Square Green Yes No

3 Square Triangle Round Yellow Yes Yes

4 Round Triangle Round Green No No

5 Square Triangle Round Yellow Yes Yes
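A minimal sketch of FIND-S applied to the table above, assuming the five attribute columns are Eyes, Nose, Head, colour and Hair, with Smile as the target. The function name `find_s` and the use of `"?"` for "any value" are choices made here for illustration.

```python
def find_s(examples):
    """Return the maximally specific conjunctive hypothesis covering all positives."""
    hypothesis = None  # None stands for the most specific (empty) hypothesis
    for attributes, label in examples:
        if label != "Yes":
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive example is copied as-is
        else:
            # Keep a constraint only where it agrees with the new positive example.
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis

# (Eyes, Nose, Head, colour, Hair) -> Smile, copied from the table above.
training = [
    (("Round", "Triangle", "Round", "Purple", "Yes"), "Yes"),
    (("Square", "Square", "Square", "Green", "Yes"), "No"),
    (("Square", "Triangle", "Round", "Yellow", "Yes"), "Yes"),
    (("Round", "Triangle", "Round", "Green", "No"), "No"),
    (("Square", "Triangle", "Round", "Yellow", "Yes"), "Yes"),
]

result = find_s(training)
print(result)  # ['?', 'Triangle', 'Round', '?', 'Yes']
```

Only examples 1, 3 and 5 are positive, so Eyes and colour generalize to "?" while Nose, Head and Hair keep their shared values.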

6. Consider the following sequence of positive and negative training
examples describing the concept "pairs of people who live in the same
house." Each training example describes an ordered pair of people, with
each person described by their sex, hair color (black, brown, or blonde),
height (tall, medium, or short), and nationality (US, French, German, Irish,
Indian, Japanese, or Portuguese). CO1.9 K3
+ ((male brown tall US) (female black short US))
+ ((male brown short French) (female black short US))
- ((female brown tall German) (female black short Indian))
+ ((male brown tall Irish) (female brown short Irish))
Consider a hypothesis space defined over these instances, in which each
hypothesis is represented by a pair of tuples, and where each attribute
constraint may be a specific value, "?," or "∅," just as in the Enjoy Sport
hypothesis representation. For example, the hypothesis
((male ? tall ?) (female ? ? Japanese)) represents the set of all pairs of
people where the first is a tall male (of any nationality and hair color),
and the second is a Japanese female (of any hair color and height).
Provide a hand trace of the CANDIDATE-ELIMINATION algorithm
learning from the above training examples and hypothesis language. In
particular, show the specific and general boundaries of the version space
after it has processed the first training example, then the second training
example, etc.
7. Using the Candidate Elimination algorithm, find the possible maximal
hypothesis for the given set of data. CO1.9 K3
Example Citations Size InLibrary Price Editions Buy
1 Some Small No Affordable One No
2 Many Big No Expensive Many Yes
3 Many Medium No Expensive Few Yes
4 Many Small No Affordable Many Yes
8. Using the Candidate Elimination algorithm, find the possible maximal
hypothesis for the given set of data. CO1.9 K3
Example Size Color Shape Class
1 Big Red Circle No
2 Small Red Triangle No
3 Small Red Circle Yes
4 Big Blue Circle No
5 Small Blue Circle Yes
9. Design a method for converting a conjunctive data set into a disjunctive
data set in the candidate elimination algorithm. CO1.11 K3
10. Consider the following set of training examples. CO1.11 K3
• What is the entropy of this collection of training examples with
respect to the target function classification?
• What is the information gain of a2 relative to these training
examples?

Instance Classification a1 a2

1 + T T

2 + T T

3 - T F

4 + F F

5 - F T

6 - F T
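The two quantities asked for can be computed directly from the six instances above. This is a sketch assuming the standard definitions: entropy in bits over the class labels, and information gain as the entropy of the collection minus the weighted entropy of the subsets produced by the attribute.

```python
from collections import Counter
from math import log2

# (classification, a1, a2) for instances 1-6 of the table above.
data = [("+", "T", "T"), ("+", "T", "T"), ("-", "T", "F"),
        ("+", "F", "F"), ("-", "F", "T"), ("-", "F", "T")]

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(data, column):
    """Entropy of the whole set minus the weighted entropy after splitting on column."""
    labels = [row[0] for row in data]
    remainder = 0.0
    for value in {row[column] for row in data}:
        subset = [row[0] for row in data if row[column] == value]
        remainder += len(subset) / len(data) * entropy(subset)
    return entropy(labels) - remainder

collection_entropy = entropy([row[0] for row in data])  # 3 positive, 3 negative
gain_a1 = information_gain(data, 1)
gain_a2 = information_gain(data, 2)
```

With three positive and three negative examples the collection entropy is 1 bit. Both values of a2 leave an even class split, so its gain comes out as zero, while a1 gives a small positive gain.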
11. Consider the following set of training examples. CO1.11 K2
Age Competition Type Profit
Old Yes s/w Down
Old No s/w Down
Old No h/w Down
Mid Yes s/w Down
Mid Yes h/w Down
Mid No h/w Up
Mid No s/w Up
New Yes s/w Up
New No h/w Up
New No s/w Up

• What is the entropy of this collection of training examples with
respect to the target function classification?
• Find the information gain of the non-target attributes from the given
set.

12. Draw a decision tree using the CART algorithm from the given dataset. CO1.11 K3

Outlook Temperature Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cold Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
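As a starting point for the CART question, the sketch below scores candidate binary splits on a single attribute using Gini impurity, which is the criterion CART conventionally uses for classification. It covers only the first column of the table (the outlook values) and only "one value versus the rest" splits; a full CART tree would evaluate every attribute and recurse on each branch. The helper names are illustrative.

```python
from collections import Counter

# (Outlook, Play) pairs copied from the 14-row table above.
rows = [
    ("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
    ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Rainy", "No"),
]

def gini(labels):
    """Gini impurity 1 - sum(p_c^2) of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def binary_split_gini(rows, value):
    """Weighted Gini of the binary split 'Outlook == value' versus the rest."""
    left = [y for x, y in rows if x == value]
    right = [y for x, y in rows if x != value]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

scores = {v: binary_split_gini(rows, v) for v in ("Sunny", "Overcast", "Rainy")}
best = min(scores, key=scores.get)  # lowest weighted impurity wins
```

Among these three candidate splits, separating Overcast from the rest gives the lowest weighted impurity, because every Overcast row has Play = Yes.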

UNIT–II-NEURAL NETWORK AND GENETIC ALGORITHM

PART A
S.NO QUESTIONS CO LEVEL K LEVEL
1 Artificial neural network is used for --------------- CO2.1 K1
(A) Classification
(B) Clustering
(C) Pattern recognition
(D) All of the above
2 Artificial Neural Network is based on -------------- approach. CO2.1 K1
(A) Weak Artificial Intelligence approach
(B) Cognitive Artificial Intelligence approach
(C) Strong Artificial Intelligence approach
(D) Applied Artificial Intelligence approach
3 ________ computes the output volume by computing the dot product between
all filters and the image patch CO2.1 K1
(A) Input Layer
(B) Convolution Layer
(C) Pool Layer
(D) Activation Function Layer
4 _____ are the ways to represent uncertainty CO2.2 K1
(A) Fuzzy logic
(B) Entropy
(C) Probability
(D) All of the above
5 Who was the inventor of the first neurocomputer? CO2.2 K1
(A) Dr. Robert Hecht-Nielsen
(B) Dr. John Hecht-Nielsen
(C) Dr. Alex Hecht-Nielsen
(D) Dr. Steve Hecht-Nielsen
6 Which of the following is true for neural networks? CO2.2 K1
(A) It has a set of nodes and connections
(B) A node could be in an excited state or non-excited state
(C) Each node computes its weighted input
(D) All of the above
7 Back propagation is a learning technique that adjusts weights in the neural
network by propagating weight changes ----------- CO2.2 K1
(A) Backward from sink to source
(B) Forward from source to sink
(C) Backward from sink to hidden nodes
(D) Forward from source to hidden nodes
8 The fundamental unit of the neural network is ------------ CO2.3 K1
(A) Neuron
(B) Brain
(C) Nucleus
(D) Dendrites
9 Which of the following neural networks uses supervised learning? CO2.3 K1
(A) Multilayer perceptron
(B) Self-organizing feature map
(C) Hopfield network
Choices:
(a) (A) only
(b) (B) only
(c) (A) and (C) only
(d) (A) and (B) only
10 Which of the following techniques perform similar operations as dropout
in a neural network? CO2.3 K1
(A) Bagging
(B) Boosting
(C) Stacking
(D) None of the above
11 A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear CO2.5 K2
with the constant of proportionality being equal to 2. The inputs are 4, 10,
5 and 20 respectively. The output will be:
(A)76
(B)128
(C)238
(D)228
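The arithmetic behind question 11 can be sketched as follows, assuming "linear transfer function with constant of proportionality 2" means the output is 2 times the weighted sum of the inputs, with no bias term:

```python
weights = [1, 2, 3, 4]
inputs = [4, 10, 5, 20]
k = 2  # constant of proportionality of the linear transfer function

# Weighted sum: 1*4 + 2*10 + 3*5 + 4*20
weighted_sum = sum(w * x for w, x in zip(weights, inputs))
output = k * weighted_sum
print(weighted_sum, output)  # 119 238
```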
12 ______________ Algorithm propagates errors from nodes of output to CO2.9 K1
input
(A) Back propagation
(B) Front Propagation
(C )Signal Propagation
(D)Channel Propagation
13 Which rule is followed by the Back propagation algorithm? CO2.9 K1
(A)Static Rule
(B)Dynamic Rule
(C)Chain Rule
(D)None
14 Error rates are reduced in back propagation due to _____________ CO2.9 K2
(A) Proper Tuning
(B) Iteration
(C) Improper Tuning
(D)Generalization
15 Which parameter should be set while using Back propagation? CO2.9 K1
(A)Number of inputs
(B)Number of outputs
(C )Number of Gradients
(D)Number of intermediate Stages
16 Back propagation algorithm consists of --------- layers CO2.9 K2
(A)Zero
(B)Three
(C)Two
(D)One
17 Back propagation algorithms are practically applied in _____________ CO2.9 K2
(A)Artificial Intelligence
(B)Natural Language Processing
(C )Image Processing
(D)All the above mentioned
18 What is determined by the adjustment level of the Cost function? CO2.10 K2
(A)Number of inputs
(B)Number of outputs
(C )Number of Gradients
(D)Number of intermediate Stages
19 Which approach is most suited to structured problems with little CO2.10 K1
uncertainty?
(A) Simulation
(B) Human intuition
(C) Optimization
(D) Genetic algorithms
20 Identify the kind of learning algorithm for “facial identities for facial CO2.10 K1
expressions”.
(A) Prediction
(B) Recognition Patterns
(C)Recognizing anomalies
(D) Generating Patterns
21 Choose a disadvantage of decision trees among the following. CO2.11 K1
(A)Decision trees are robust to outliers
(B)Factor Analysis
(C)Decision trees are prone to overfitting
(D)All the above
22 The most significant phase in genetic algorithm is _________ CO2.13 K2
(A)Mutation
(B)Selection
(C )Fitness Function
(D) Cross over
23 Which of the following are common classes of problems in machine CO2.13 K1
learning?
(A)Regression
(B)Classification
(C)Clustering
( D)All of the above
24 Identify the successful applications of ML. CO2.12 K1
(A)Learning to classify new astronomical structures
(B)Learning to recognize spoken words
(C)Learning to drive an autonomous vehicle
(D)All of the above
25 Select the correct definition of neuro software. CO2.12 K1
(A)It is software used by neurosurgeons
(B)It is software used to analyse neurons
(C)It is a powerful and easy neural network
(D)None of the above
26 Identify among the following which is not evolutionary computation. CO2.14 K1
(A)Genetic algorithm
(B)Genetic programming
(C)Neuro evolution
(D)Perceptron
27 Identify the clustering method which takes care of variance in data CO2.14 K1
(A)Decision tree
(B)Gaussian mixture model
(C)K means
(D)All of the above
28 Genetic algorithms belong to the family of methods in the ------------------- CO2.13 K2
(A)Artificial Intelligence
(B)Optimization
(C)Non computer based system
(D)Complete enumeration family of methods
29 Genetic programs are represented by ------------ CO2.13 K1
(A)Lines of code
(B)Syntax tree
(C)Bit values
(D)Strings
30 What are general limitations of back propagation rule? CO2.13 K1
(A)Local minima problem
(B)Slow convergence
(C)Scaling
(D)All of the above

PART-B

1. Compare the concepts of convolutional and feed-forward neural networks. CO2.1 K2
2. Given CO2.2 K2
$$\sum_{i=1}^{M} W_i X_i + \text{bias} = W_1 X_1 + W_2 X_2 + W_3 X_3 + \text{bias}$$
$$\text{OUTPUT} = F(X) = \begin{cases} 1 & \text{if } \sum_i W_i X_i + b \ge 0 \\ 0 & \text{if } \sum_i W_i X_i + b < 0 \end{cases}$$
describe the process by which the output is obtained from the given weights
of the inputs.
3. Identify the function used as a measure of accuracy in a neural network
and the function used to learn the patterns in the training data accurately,
and state the uses of each function. CO2.3 K2

4. Design some of the popular activation functions used in neural networks. CO2.3 K2
5. Suppose there is a perceptron whose weights corresponding to the three inputs
have the following values:
w1 = 2; w2 = −4; and w3 = 1, and the activation of the unit is given by the
step function: φ(v) = 1 if v ≥ 0, otherwise 0.
Calculate the output value y of the given perceptron for each of the following
input patterns: CO2.3 K3
Pattern P1 P2 P3 P4

X1 1 0 1 1
X2 0 1 0 1
X3 0 1 1 1
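The calculation for the four patterns can be sketched as below. It assumes the perceptron has no bias term, since none is given, and reads each pattern as a column of the table (for example P2 = (X1, X2, X3) = (0, 1, 1)).

```python
weights = (2, -4, 1)  # w1, w2, w3 from the question

def step(v):
    """Step activation: 1 if v >= 0, otherwise 0."""
    return 1 if v >= 0 else 0

# Each pattern is (X1, X2, X3), read column-wise from the table above.
patterns = {
    "P1": (1, 0, 0),
    "P2": (0, 1, 1),
    "P3": (1, 0, 1),
    "P4": (1, 1, 1),
}

outputs = {}
for name, x in patterns.items():
    v = sum(w * xi for w, xi in zip(weights, x))  # weighted sum, no bias assumed
    outputs[name] = step(v)

print(outputs)  # {'P1': 1, 'P2': 0, 'P3': 1, 'P4': 0}
```

The weighted sums are 2, -3, 3 and -1 respectively, so only P1 and P3 reach the threshold.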
6. List the characteristics used in the back propagation algorithm. CO2.9 K1
7. Compare and contrast gradient descent and the Delta rule. CO2.10 K2
8. Derive the Gradient Descent algorithm for training a linear unit. CO2.11 K3
9. Draft the systematic diagram for genetic operators. CO2.12 K2
10. Illustrate Lamarckian evolution with an example. CO2.12 K3
11. Create the program for tree representation in genetic programming. CO2.13 K3
12. Evaluate the parallelization of genetic algorithms. CO2.14 K4
13. Describe the models of evolution and learning. CO2.15 K2

PART C

1 Identify the problems for which ANN learning is well suited and write down
the characteristics. CO2.1 K3
2 Elaborate in detail on the neural network representation with a suitable
example. CO2.2 K2
3 Analyze the architecture and concept behind the mechanisms of ANN and BNN. CO2.2 K3
4 Describe the perceptron with an example and draw the decision surface
represented by a two-input perceptron. CO2.3 K3
5 Derive the expression of the back propagation algorithm. CO2.9 K3
6 Apply BACKPROPAGATION to the task of face recognition. CO2.9 K3
8 Demonstrate the hypothesis space search of genetic algorithms with the neural
network back propagation algorithm. CO2.12 K3
9 Analyze the bit string representation of genetic algorithm hypotheses. CO2.12 K4
10 (i) Illustrate the diagram for visualizing the hypothesis space. (6M)
(ii) Analyze the derivation of the Gradient Descent Rule. (6M) CO2.13 K3
11 Elaborate in detail on the following: CO2.14 K4
(i) Alternative Error Functions
(ii) Alternative Error Minimization Procedures
12 Analyze the models of evolution and learning in genetic algorithms. CO2.15 K4

UNIT- III BAYESIAN AND COMPUTATIONAL LEARNING


S.NO QUESTIONS CO LEVEL K LEVEL

UNIT III
PART A

1. The method in which previously calculated probabilities are revised
with new probability values is called _________ CO3.1 K1
(A) Revision theorem
(B) Bayes theorem
(C) Dependent theorem
(D) Updation theorem
2. Formula for Bayes theorem is ________ CO3.1 K1
(A) P(A|B) = P(B|A) P(A) / P(B)
(B) P(A|B) = P(A) / P(B)
(C) P(A|B) = P(B|A) / P(B)
(D) P(A|B) = 1 / P(B)
3. ____ terms are required for building a bayes model. CO3.1 K1
(A) 1
(B) 2
(C) 3
(D) 4
4. What is needed to make probabilistic systems feasible in the
world? CO3.2 K1
(A) Feasibility
(B) Reliability
(C) Crucial robustness
(D) None of the above
5. Bayes rule can be used for:- CO3.2 K1
(A) Solving queries
(B) Increasing complexity
(C) Answering probabilistic query
(D) Decreasing complexity
6. _____ provides way and means of weighing up the desirability of CO3.3 K1
goals and the likelihood of achieving them.
(A) Utility theory
(B) Decision theory
(C) Bayesian networks
(D) Probability theory
7. Which of the following is provided by a Bayesian network? CO3.3 K1
(A) Complete description of the problem
(B) Partial description of the domain
(C) Complete description of the domain
(D) All of the above
8. Probability provides a way of summarizing the ______ that comes
from our laziness and ignorance. CO3.4 K2
(A) Belief
(B) Uncertainty
(C) Joint probability distributions
(D) Randomness
9. The entries in the full joint probability distribution can be calculated CO3.4 K1
as
(A) Using variables
(B) Both Using variables & information
(C) Using information
(D) All of the above
10. Bayesian networks allow compact specification of:- CO3.5 K1
(A) Joint probability distributions
(B) Belief
(C) Propositional logic statements
(D) All of the above
12. The compactness of the bayesian network can be described by CO3.5 K2
(A) Fully structured
(B) Locally structured
(C) Partially structured
(D) All of the above
13. Which of the following is correct about the Naive Bayes? CO3.6 K2
(A) Assumes that all the features in a dataset are independent
(B) Assumes that all the features in a dataset are equally important
(C) Both
(D) All of the above
14. Which of the following is false regarding EM Algorithm? CO3.8 K1
(A) The alignment provides an estimate of the base or amino acid
composition of each column in the site
(B) The column-by-column composition of the site already
available is used to estimate the probability of finding the site at
any position in each of the sequences
(C) The row-by-column composition of the site already available is
used to estimate the probability
(D) None of the above
15. Naïve Bayes Algorithm is a ________ learning algorithm. CO3.5 K1
(A) Supervised
(B) Reinforcement
(C) Unsupervised
(D) None of these
16. EM algorithm includes two repeated steps, here the step 2 is CO3.8 K1
______.
(A) The normalization
(B) The maximization step
(C) The minimization step
(D) None of the above
17. Examples of Naïve Bayes Algorithm is/are CO3.6 K2
(A) Spam filtration
(B) Sentimental analysis
(C) Classifying articles
(D) All of the above
18. Naïve Bayes algorithm is based on _______ and used for solving CO3.6 K1
classification problems.
(A) Bayes Theorem
(B) Candidate elimination algorithm
(C) EM algorithm
(D) None of the above
19. Types of Naïve Bayes Model: CO3.6 K1
(A) Gaussian
(B) Multinomial
(C) Bernoulli
(D) All of the above

20. Disadvantages of Naïve Bayes Classifier: CO3.6 K1
(A) Naive Bayes assumes that all features are independent or
unrelated, so it cannot learn the relationship between features.
(B) It performs well in Multi-class predictions as compared to the
other Algorithms.
(C) Naïve Bayes is one of the fast and easy ML algorithms to
predict a class of datasets.
(D) It is the most popular choice for text classification problems.
21. The benefit of Naïve Bayes: CO3.6 K2
(A) Naïve Bayes is one of the fast and easy ML algorithms to
predict a class of datasets.
(B) It is the most popular choice for text classification problems.
(C) It can be used for Binary as well as Multi-class Classifications.
(D) All of the above
22. In which of the following types of sampling is the sampling
carried out according to the opinion of an expert? CO3.10 K2
(A) Convenience sampling
(B) Judgement sampling
(C) Quota sampling
(D) Purposive sampling
23. Full form of MDL: CO3.4 K1
(A) Minimum Description Length
(B) Maximum Description Length
(C) Minimum Domain Length
(D) None of these
24. What areas does CLT comprise? CO3.10 K2
(A) Sample Complexity
(B) Computational Complexity
(C) Mistake Bound
(D) All of these
25. ___________ of hypothesis h with respect to target concept c and
distribution D is the probability that h will misclassify an instance
drawn at random according to D. CO3.11 K2
(A) True Error
(B) Type 1 Error
(C) Type 2 Error
(D) None of these
26. Which area of CLT answers “How many examples do we need to find a
good hypothesis?” CO3.11 K2
(A) Sample Complexity
(B) Computational Complexity
(C) Mistake Bound
(D) None of these
27. Which area of CLT answers “How many mistakes will we make before
finding a good hypothesis?” CO3.10 K1
(A) Sample Complexity
(B) Computational Complexity
(C) Mistake Bound
(D) None of these

28. For the analysis of ML algorithms, we need CO3.10 K1
(A) Computational learning theory
(B) Statistical learning theory
(C) Both A & B
(D) None of these
29. PAC stands for CO3.11 K1
(A) Probably Approximate Correct
(B) Probably Approximately Correct
(C) Probably Approximate Computation
(D) Probably Approximate Computation
30. For a particular learning task, if the required error parameter
changes from 0.1 to 0.01, how many times more samples will be required
for PAC learning? CO3.1 K1
(A) Same
(B) 2 times
(C) 1000 times
(D) 10 times
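One way to reason about question 30 is through the standard sample-complexity bound for a consistent learner over a finite hypothesis space H. This bound is quoted here as background and is an assumption about which result the question has in mind:

$$m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$$

With $|H|$ and $\delta$ held fixed, the bound scales as $1/\epsilon$, so the ratio of required samples is

$$\frac{m(\epsilon = 0.01)}{m(\epsilon = 0.1)} = \frac{1/0.01}{1/0.1} = 10.$$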
31. Computational complexity of classes of learning problems depends
on which of the following? CO3.11 K2
(A) The size or complexity of the hypothesis space considered by
learner
(B) The accuracy to which the target concept must be approximated
(C) The probability that the learner will output a successful
hypothesis
(D) All of these

PART-B

1. Explain the intuition behind variational inference in Bayesian
neural networks. CO3.1 K2

2. A doctor is trying to diagnose a patient's illness. The patient has a
certain set of symptoms, but the doctor is not sure which disease
they have. The doctor has a list of possible diseases and the
probabilities of each disease causing those symptoms. Using Bayes'
theorem, find the probability that the patient has each of the
different diseases. CO3.1 K3
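Since the question gives no numbers, the sketch below uses hypothetical priors and likelihoods, invented purely for illustration, to show how Bayes' theorem turns P(symptoms | disease) and P(disease) into P(disease | symptoms). It also assumes the listed diseases are mutually exclusive and exhaustive, so the evidence term is the sum over all of them.

```python
# Hypothetical numbers for illustration only; they are not from the question.
prior = {"flu": 0.6, "cold": 0.3, "measles": 0.1}        # P(disease)
likelihood = {"flu": 0.5, "cold": 0.2, "measles": 0.9}   # P(symptoms | disease)

# Evidence P(symptoms), by the law of total probability.
evidence = sum(prior[d] * likelihood[d] for d in prior)

# Bayes' theorem: P(disease | symptoms) = P(symptoms | disease) P(disease) / P(symptoms)
posterior = {d: prior[d] * likelihood[d] / evidence for d in prior}

for disease, p in posterior.items():
    print(f"{disease}: {p:.3f}")
```

With these made-up inputs the posteriors sum to 1, and a rare disease can still gain probability when the symptoms are much more likely under it than under the alternatives.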
3. Suppose a teacher is trying to teach a student how to identify
different types of birds based on their physical characteristics, such
as their size, color, beak shape, and wing length. The teacher has a
set of training examples of different types of birds, along with their
physical characteristics, and wants to use Bayes' theorem to help the
student learn to classify new birds. CO3.2 K3
1. How can Bayes' theorem be used to help the student learn to
classify new birds based on their physical characteristics?
2. List some of the challenges or limitations of using Bayes' theorem
for concept learning.

4. Can feature engineering change the selection of the model according
to the minimum description length? CO3.5 K3
5. Consider a data scientist who is working for an e-commerce company
and wants to build a classification model to predict whether a
customer will purchase a particular product based on their browsing
history and demographic information. The data scientist is
considering using the Bayes Optimal Classifier to build the model.
How can the data scientist use the Bayes Optimal Classifier to build
a model for predicting customer purchases in this case? CO3.3 K3
5. (a) If we train a Naive Bayes classifier using infinite training data
that satisfies all of its modeling assumptions, then it will achieve
zero training error over these training examples. Please justify your
answer in one sentence. CO3.4 K2

(b) Consider the plot below showing training and test set accuracy
for decision trees of different sizes, using the same set of training
data to train each tree. Describe in one sentence how the training
data curve (solid line) will change if the number of training
examples approaches infinity. In a second sentence, analyze
the test data curve under the same condition.

6. How do you train a Naive Bayes Classifier on a dataset, and what


evaluation metrics can you use to assess its performance? CO3.4 K2

How would you use Naive Bayes classifier for categorical features?
7.
State, if some features are numerical. CO3.4 K3

List the issues that can arise when using Bayes belief networks in
8. real-world applications and how can these challenges is addressed
CO3.6 K2
to improve the accuracy of the predictions?
Can you state how the EM algorithm is used in comparison to other
9. parameter estimation methods such as maximum likelihood K2
CO3.7
estimation and Bayesian inference?
Is it better to split sequences into overlapping or non-overlapping
10. training samples? Briefly justify your answer.
CO3.7 K4
Is it possible that, probability theory can be used to model the
11. uncertainty in machine learning predictions? What is a probability
calibration curve, and how can it be used to evaluate the calibration CO3.8 K3
of a machine learning model?
Consider you are provided with given set of labeled examples, what
12. is the minimum number of examples needed to learn a binary
classifier that can accurately classify new, unseen examples? The CO3.9 K3
sample complexity of binary classification depends on the
complexity of the decision boundary between the two classes, which
is determined by the size and complexity of the hypothesis space. In
general, the more complex the decision boundary, the more training
examples will be needed to learn an accurate model.
PART C

A detective is investigating a crime scene. They have found some


1. evidence that implicates a suspect, but they are not sure if the suspect
is guilty. Using Bayes' theorem, find the probability that the suspect CO3.1 K3
is guilty based on the evidence that has been found
(a) A pharmaceutical company is developing a new drug to treat a
2. rare disease. They have conducted a clinical trial with 100 patients,
and want to use maximum likelihood estimation to estimate the CO3.3 K3
parameters of a logistic regression model that predicts the
probability of the drug being effective based on various patient
characteristics, such as age, gender, and disease severity.
1. List some of the assumptions and limitations of using
maximum likelihood estimation.
2. Are there any ways to improve the accuracy of using
maximum likelihood estimation in this case?
3. Derive the maximum likelihood estimation for the linear
regression model in this case, assuming normally distributed
errors.
Suppose a data analyst is working for a healthcare company and
3. wants to use the Minimum Description Length (MDL) Principle to
select the best model for predicting patient outcomes. The analyst CO3.4 K3
has a dataset that includes information on patient characteristics,
medical history, and treatments received, as well as their outcomes.
1. Provide the Minimum Description Length Principle, and
how can it be used to select the best model for predicting
patient outcomes?
2. How can the analyst use the MDL Principle to select the best
model for predicting patient outcomes in this case?
3. Enumerate the advantages and limitations of using the MDL
Principle for model selection in healthcare.
A scientist is building a classification model to predict whether a
4. person is pregnant or not based on their age, weight, and height. The
scientist is using the Bayes Optimal Classifier to build the model. CO3.5 K4
1. Find the prior probabilities and conditional probabilities that
the data scientist needs to estimate for this classification
problem, Can the Bayes Optimal Classifier handle missing
data? If so, how?
2. Describe the assumptions of the Bayes Optimal Classifier,
and how might these assumptions impact the accuracy of the
model in this case.
Assuming there is a training dataset of weather conditions and
5. corresponding labels of whether to play tennis or not, Naive Bayes
Classifier can be used to classify a novel instance with the following CO3.5 K4
features: Outlook = sunny, Temperature = cool, Humidity = high,
Wind = strong.
Day Outlook Temperature Humidity Wind PlayTennis

D1 SUNNY HOT HIGH WEAK NO

D2 SUNNY HOT HIGH STRONG NO

D3 OVERCAST HOT HIGH WEAK YES


D4 RAIN MILD HIGH WEAK YES

D5 RAIN COOL NORMAL WEAK YES

D6 RAIN COOL NORMAL STRONG NO

D7 OVERCAST COOL NORMAL STRONG YES

D8 SUNNY MILD HIGH WEAK NO

D9 SUNNY COOL NORMAL WEAK YES

D10 RAIN MILD NORMAL WEAK YES

D11 SUNNY MILD NORMAL STRONG YES

D12 OVERCAST MILD HIGH STRONG YES

D13 OVERCAST HOT NORMAL WEAK YES

D14 RAIN MILD HIGH STRONG YES

Predict the target value (yes or no) of the target concept Play Tennis
for this new instance.
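
One way to check a hand calculation for this question is the minimal sketch below. It assumes plain frequency estimates with no smoothing, and it uses the table rows exactly as printed above; the function name `naive_bayes_scores` is illustrative, not part of any library.

```python
from collections import Counter, defaultdict

# Rows copied from the table above:
# (Outlook, Temperature, Humidity, Wind, PlayTennis)
DATA = [
    ("sunny", "hot", "high", "weak", "no"),           # D1
    ("sunny", "hot", "high", "strong", "no"),         # D2
    ("overcast", "hot", "high", "weak", "yes"),       # D3
    ("rain", "mild", "high", "weak", "yes"),          # D4
    ("rain", "cool", "normal", "weak", "yes"),        # D5
    ("rain", "cool", "normal", "strong", "no"),       # D6
    ("overcast", "cool", "normal", "strong", "yes"),  # D7
    ("sunny", "mild", "high", "weak", "no"),          # D8
    ("sunny", "cool", "normal", "weak", "yes"),       # D9
    ("rain", "mild", "normal", "weak", "yes"),        # D10
    ("sunny", "mild", "normal", "strong", "yes"),     # D11
    ("overcast", "mild", "high", "strong", "yes"),    # D12
    ("overcast", "hot", "normal", "weak", "yes"),     # D13
    ("rain", "mild", "high", "strong", "yes"),        # D14
]


def naive_bayes_scores(data, query):
    """Return the unnormalised score P(label) * prod_i P(x_i | label) per label."""
    class_counts = Counter(row[-1] for row in data)
    feature_counts = defaultdict(int)
    for row in data:
        for i, value in enumerate(row[:-1]):
            feature_counts[(i, value, row[-1])] += 1

    scores = {}
    for label, n in class_counts.items():
        score = n / len(data)  # prior P(label)
        for i, value in enumerate(query):
            # Relative-frequency estimate of P(x_i = value | label), no smoothing
            score *= feature_counts[(i, value, label)] / n
        scores[label] = score
    return scores


scores = naive_bayes_scores(DATA, ("sunny", "cool", "high", "strong"))
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

With the table as printed, the score for "no" is larger than the score for "yes", so the sketch predicts "no". Dividing each score by their sum gives the normalised posterior.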
Can you specify the steps to calculate the joint probability
6. distribution for a Bayesian belief network? CO3.6 K2
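
As an illustration of the steps, the sketch below multiplies each node's conditional probability given its parents, which is the standard factorisation of the joint distribution in a Bayesian belief network. The three-node network and every probability value in it are hypothetical and chosen only for the example.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler
parents = {
    "Rain": (),
    "Sprinkler": (),
    "WetGrass": ("Rain", "Sprinkler"),
}

# cpt[node][parent values] = P(node = True | parents); all numbers are made up.
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "WetGrass": {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.0,
    },
}


def joint_probability(assignment):
    """P(assignment) = product over nodes of P(node value | parent values)."""
    p = 1.0
    for node, node_parents in parents.items():
        key = tuple(assignment[q] for q in node_parents)
        p_true = cpt[node][key]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


example = joint_probability({"Rain": True, "Sprinkler": False, "WetGrass": True})

# Sanity check: the joint distribution sums to 1 over all assignments.
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product([True, False], repeat=len(parents))
)
print(example, total)
```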
Suppose you are working on a project that involves clustering a large
7. dataset of customer transaction data using k-means. However, you
notice that some of the data points have missing features, which can CO3.11 K3
lead to biased or inaccurate cluster assignments. How might you
apply the EM algorithm to improve the clustering performance, and
what evaluation metrics could you use to assess its effectiveness?

Consider you are working on a project to develop a clustering


8. algorithm for grouping similar data points together. You have a
dataset of 1000 data points in two-dimensional space, and you want CO3.11 K3
to group them into k clusters, where k is a user-specified parameter.
Derive the k-means algorithm for clustering the data points into k
clusters. What are the key steps involved in the algorithm, and how
does it work? What are the assumptions and limitations of the
algorithm, and how might they affect its performance on the given
dataset? How would you evaluate the quality of the clustering
results, and what techniques could you use to improve them?
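
The key steps can be sketched as follows: a minimal pure-Python version of the usual assignment and update loop, assuming squared Euclidean distance and caller-supplied initial centroids. The eight sample points are hypothetical; the question's 1000-point dataset is not given.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iterations=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = list(initial_centroids)
    k = len(centroids)
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids = kmeans(points, initial_centroids=[(0, 0), (1, 1)])
print(centroids)
```

Even though both initial centroids start inside the same group, the loop separates the two groups here. On less clearly separated data the result depends on initialisation, which is one of the limitations the question asks about.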

Suppose you are working on a project to develop a machine learning


9. algorithm for predicting whether a customer is likely to churn from CO3.12 K3
a subscription-based service. You have a dataset of 10,000 labeled
examples, where each example consists of various features such as
age, gender, usage patterns, and payment history, along with a
binary label indicating whether the customer churned or not. You
want to train a binary classifier using this data that can accurately
predict whether new, unseen customers are likely to churn.
What is the minimum number of examples needed to train a
binary classifier that can achieve good generalization performance
on new, unseen examples?
How might the sample complexity be affected by factors such as
the complexity of the decision boundary, the balance of the two
classes, and the level of noise in the data?
Which technique is used to reduce the sample complexity and
improve generalization performance?
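
For a rough feel of the numbers, the sketch below evaluates the standard PAC bound for a consistent learner over a finite hypothesis space, m >= (1/epsilon) * (ln|H| + ln(1/delta)). The hypothesis-space size, epsilon and delta used here are assumptions for illustration; the churn problem does not specify them, and continuous features would call for a different bound (for example one based on VC dimension).

```python
import math


def pac_sample_bound(hypothesis_space_size, epsilon, delta):
    """Examples sufficient so that, with probability at least 1 - delta,
    every hypothesis consistent with the training data has true error
    at most epsilon (finite hypothesis space only)."""
    return math.ceil(
        (math.log(hypothesis_space_size) + math.log(1.0 / delta)) / epsilon
    )


# Hypothetical setting: |H| = 2**20, 95% confidence.
m_loose = pac_sample_bound(2 ** 20, epsilon=0.1, delta=0.05)
m_tight = pac_sample_bound(2 ** 20, epsilon=0.01, delta=0.05)
print(m_loose, m_tight)
```

The bound grows linearly in 1/epsilon but only logarithmically in |H| and 1/delta, so tightening the error requirement is what drives the sample size up fastest.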
To develop an accurate model, you decide to use an infinite
10. hypothesis space based on a non-parametric model, such as a CO3.11 K2
Gaussian process or a kernel method. What is the sample
complexity of this approach, and how does it compare to the
sample complexity of a finite hypothesis space? What factors can
affect the sample complexity of the non-parametric model, and
how can you mitigate them? List the advantages and limitations of
using a non-parametric model for this type of problem, and how
would you evaluate its performance on the given dataset?

(1) How can the mistake-bound model be extended to handle more


11. complex problems, such as multi-class classification or online CO3.11 K3
learning?
(2) How would you evaluate the performance of a mistake-bound
model on a given dataset? What metrics would you use to measure
its accuracy and generalization performance?

12. You have a dataset of 10,000 labelled examples, where each example is a
28x28 grayscale image of a digit (0-9), along with a corresponding label CO3.12 K3
indicating the true digit. You want to train a binary classifier using this
data that can accurately predict whether a new, unseen image is a "7" or
not. Derive the mistake-bound model algorithm for training the binary
classifier using this data. What are the key steps involved in the algorithm,
and how does it work? What are the assumptions and limitations of the
algorithm, and how might they affect its performance on the given dataset?
How would you evaluate the quality of the classifier's predictions, and
what techniques could you use to improve its performance?


UNIT – IV – INSTANCE BASED LEARNING


PART A
K
S.No Questions CO
level
The Euclidean distance between two sets of numerical attributes is
called as-----------
a. Closeness
1 b. Validation data CO4.1 K1
c. Error Rate
d. None of these

Which is the number of nearby neighbors to be used to classify the


new record?
a. KNN
2 b. Validation data CO4.1 K1
c. Euclidean Distance
d. All the above

Classification done in Euclidean distance is comparing feature


vectors of ---------
a. Same Point
3 b. Within Point CO4.1 K1
c. Different Point
d. None of these

Target function value is represented as --------------


a. Continuous Value
b. Discrete Value
4 CO4.1 K1
c. Real Value
d. Both b and c

Optimal value of ‘k’ training part is considered as-----------


a. P
b. P-1
5 CO4.1 K1
c. 1-p
d. 1

K-NN algorithm does more computation on test time rather than


train time.
6 a. TRUE CO4.1 K1
b. FALSE

Which of the following option is true about k-NN algorithm?

a. It can be used for classification


7 b. It can be used for regression CO4.1 K1
c. It can be used in both classification and regression
d. None of the above
Which of the following machine learning algorithm can be used for

imputing missing values of both categorical and continuous

variables?

8 a. K-NN CO4.1 K1

b. Linear Regression

c. Logistic Regression

d. none of the above

Which of the following distance measure do we use in case of

categorical variables in k-NN?

a. Hamming Distance
9 CO4.1 K1
b.Euclidean Distance

c.Manhattan Distance

d. radial distance

Identify the difficulties with the k-nearest neighbor algorithm.

a. Curse of dimensionality
10 CO4.1
b.Calculate the distance of the test case from all training cases
c.Both A and B
d. None of the above
What is/are advantage(s) of Distance-weighted k-NN over k-NN?
a. Robust to noisy training data
b.Quite effective when a sufficient large set of training data is
11 provided CO4.2
C. Both A & B
d.None of these
A company has built a KNN classifier that gets 100% accuracy on
training data. When they deployed this model on client side it has
been found that the model is not at all accurate. Which of the
following thing might go wrong?
12 a. It is probably an overfitted model CO4.2 K1
b. It is probably an underfitted model
c. Can’t say
d. None of these
What are the difficulties faced with k-nearest neighbour algorithm?
a. Calculate the distance of the test case from all training cases
13 b. Curse of dimensionality CO4.2
C. Both A & B
d. None of these

What does K stand for in the K-means algorithm?


a. Number of clusters
14 b. Number of data CO4.2
c. Number of attributes
d. Number of iterations

The instance-based learner is a ____________


a. Lazy-learner
15 b. Eager learner CO4.2 K2
c. easy learner
d.None of the above
What is/are advantage(s) of Locally Weighted Regression?
a. Point wise approximation of complex target function
16 b. Earlier data has no influence on the new ones CO4.2 K1
C. Both A & B
d. None of these
Among the following options identify the one which is false
regarding regression.

17 a.It is used for the prediction CO4.3


b. It is used for interpretation
c. It relates inputs to outputs
d. It discovers causal relationships
How many types of layers are available in radial basis function
neural networks?
a. 3
18 b. 2 CO4.3 K1
c. 1
d. 4
The neurons in the hidden layer contain Gaussian transfer function
whose outputs are _____________ to the distance from the centre of
the neuron.
19 a. Directly CO4.3 K1
b. Inversely
c. equal
d. None of these
PNN networks have one neuron for each point in the training file,
while RBF networks have a variable number of neurons that is
usually--------
20 a. less than the number of training points. CO4.3 K1
b. greater than the number of training points
c. equal to the number of training points
d. None of these
What is/are true about RBF network?
a. A kind of supervised learning
21 b. Design of NN as curve fitting problem CO4.3 K1
c. Use of multidimensional surface to interpolate the test data
d. All of these
What are the advantages of CBR?
a. A local approx. is found for each test case
b. Knowledge is in a form understandable to human
22 CO4.4 K1
c. Fast to train
d. All of these

Machine Learning has various function representations, which of the


following is not a numerical function?
a. Case-based
23 b. Neural Network CO4.4 K1
c.Linear Regression
d.Support Vector Machines

Which of the following are common classes of problems in


machine learning?
a. Regression
24 CO4.4 K1
b.classification
c. Clustering
d. All of the above

Full form of PAC is _____________


a. Probably Approximate Cost
25 CO4.4 K1
b. Probably Approximately Correct
c. Probably Approximate Communication
d. Probably Approximate Computation
True error is defined over the entire instance space, and not just
over training data-----------
26 a. True CO4.4 K1
b.False

Computational learning theory (CLT) comprises ----------


a. Mistake bound
27 CO4.4 K1
b.Sample complexity
c.Computational complexity
d. all the above

Which of the following statements is true about PCA?


(i) We must standardize the data before applying PCA.
(ii) We should select the principal components which explain the
highest variance
(iii) We should select the principal components which explain the
28 lowest variance CO4.4 K1
(iv) We can use PCA for visualizing the data in lower dimensions
a. (i), (ii) and (iv).
b. (ii) and (iv)
c. (iii) and (iv)
d. (i) and (iii)
For a particular learning task, if the required error parameter
changes from 0.1 to 0.01, how many more samples will be required
for PAC learning?
29 a. Same CO4.4 K1
b.2 times
c.1000 times
d.10 times
-----------------is a supervised learning algorithm used for
computing linear relationships between input (X) and output (Y).
a)KNN
30 b)linear regression algorithm CO4.4 K1

c) elimination algorithm
d)none of the above

PART B
Illustrate how the Instance-based learning methods differ from
1 CO4.1 K2
function approximation.
2 Analyze the inductive bias of k-nearest neighbor. CO4.2 K2
3 Show the Voronoi diagram for k nearest neighbor. CO4.3 K3
Find the nature of the hypothesis space H implicitly considered by
4 CO4.4 K2
the k-nearest neighbor algorithm?
5 Compose the formula for locally weighted linear regression.
6 Discuss the pros and cons of locally weighted regression. CO4.5 K2
7 Differentiate regression, residual, kernel function.
Suggest a lazy version of the eager decision tree learning algorithm
8 ID3. What are the advantages and disadvantages of your lazy CO4.6 K3
algorithm compared to the original eager algorithm?
9 Point out how the eager learning differs from lazy learning. CO4.6 K1
10 Compare lazy and eager learning algorithms. CO4.9 K2
Compose three properties that are shared by the Instance based
11 CO4.10 K2
methods.
12 Summarize the three lazy learning methods. CO4.13 K4

PART C
1. Describe in detail about an algorithm which is used for regression as
CO4.1 K2
well as classification.
2. We have data from questionnaires and objective testing on two
attributes (acid durability and strength) to classify whether a special
paper tissue is good or not. CO4.1 K3
X1 = acid durability (sec)   X2 = strength (kg/square meter)   Y = classification
7                            7                                 Bad
7                            4                                 Bad
3                            4                                 Good
1                            4                                 Good
Now the factory produces a new paper tissue that passes the laboratory test
with X1 = 3 and X2 = 7. Without another expensive survey, can we guess
what the classification of this new tissue is?
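
A hand calculation for this question can be checked with the minimal k-nearest-neighbour sketch below. The question does not fix k; k = 3 and squared Euclidean distance are assumptions made here, and `knn_classify` is an illustrative name.

```python
from collections import Counter

# Training examples from the table: ((X1, X2), classification)
training = [
    ((7, 7), "Bad"),
    ((7, 4), "Bad"),
    ((3, 4), "Good"),
    ((1, 4), "Good"),
]


def knn_classify(query, examples, k):
    """Majority vote among the k training examples closest to the query."""
    by_distance = sorted(
        examples,
        key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


label = knn_classify((3, 7), training, k=3)
print(label)
```

The squared distances from (3, 7) are 16, 25, 9 and 13 in table order, so the three nearest neighbours are two "Good" examples and one "Bad" example, and the vote is "Good".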

3. “Restaurant A” sells burgers with optional flavours: Pepper,


Ginger, and Chilly. Every day this week you have tried a burger
(A to E) and kept a record of which you liked.

S.NO PEPPER GINGER CHILLY LIKED

A TRUE TRUE TRUE FALSE


CO4.1 K3
B TRUE FALSE FALSE TRUE

C FALSE TRUE TRUE FALSE

D FALSE TRUE FALSE TRUE

E TRUE FALSE FALSE TRUE


Using Hamming distance, show how the 3NN classifier with
majority voting would classify { pepper: false, ginger: true,
chilly: true}
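
The vote can be verified with a short sketch like the one below, which assumes Hamming distance counts the number of differing flavour flags and that ties in distance are broken by table order (no tie affects the top three here).

```python
from collections import Counter

# (pepper, ginger, chilly) -> liked, copied from the table above
burgers = {
    "A": ((True, True, True), False),
    "B": ((True, False, False), True),
    "C": ((False, True, True), False),
    "D": ((False, True, False), True),
    "E": ((True, False, False), True),
}


def hamming(a, b):
    """Number of positions at which the two flag tuples differ."""
    return sum(x != y for x, y in zip(a, b))


query = (False, True, True)  # pepper: false, ginger: true, chilly: true

# Sort burgers by distance to the query and keep the three nearest.
nearest = sorted(burgers, key=lambda name: hamming(burgers[name][0], query))[:3]
votes = Counter(burgers[name][1] for name in nearest)
liked = votes.most_common(1)[0][0]
print(nearest, liked)
```

The distances are A = 1, B = 3, C = 0, D = 1, E = 3, so the three nearest neighbours are C, A and D, and the majority vote is "not liked".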

4. Derive the gradient descent rule for a distance-weighted local linear
approximation to the target function. CO4.2 K3

5. Consider the following alternative method for accounting for distance CO 4.3 K3
in weighted local regression. Create a virtual set of training examples
D' as follows: For each training example (x, f (x)) in the original data
set D, create some (possibly fractional) number of copies of (x, f (x))
in D', where the number of copies is K(d(x_q, x)) and x_q is the query instance. Now train a linear
approximation to minimize the error criterion.
The idea here is to make more copies of training examples that are
near the query instance, and fewer of those that are distant. Derive the
gradient descent rule for this criterion. Express the rule in the form of
a sum over members of D rather than D'.
6. Describe the two stage process of the RBF networks in detail. CO 4.3 K2
7. Compare the disadvantages and advantages of Lazy and Eager
CO 4.4 K2
learning.
8. Illustrate several generic properties of case - based reasoning systems. CO 4.5 K3

S.NO QUESTIONS CO K
LEVEL LEV
UNIT V EL
PART A

1. Genetic algorithm is a _________ CO5.1 K1


(A) Search technique used in computing to find true or
approximate solution to optimization and search problem
(B) Sorting technique used in computing to find true or approximate
solution to optimization and sort problem
(C) Both A & B
(D) None of these
2. GA techniques are inspired by _________ biology. CO5.1 K1
(A) Evolutionary
(B) Cytology
(C) Anatomy
(D) Ecology
3. The algorithm operates by iteratively updating a pool of hypotheses, CO5.2 K1
called the
(A) Population
(B) Fitness
(C) None of these
(D) both A& B
4. What is/are the requirement for the Learn-One-Rule method? CO5.2 K1
(A) Input, accepts a set of +ve and -ve training examples.
(B) Output, delivers a single rule that covers many +ve examples and
few -ve.
(C) Output rule has a high accuracy but not necessarily a high
coverage.
(D) A & B&C
5. ________ is any predicate (or its negation) applied to any set of terms. CO5.2 K2
(A) Literal
(B) Null
(C) Clause
(D) None of these
6. Each schema the set of bit strings containing the indicated as CO5.3 K1
(A) 0s, 1s
(B) only 0s
(C) only 1s
(D) 0s, 1s, *s
7. Correct(h) is the percent of all training examples correctly CO5.3 K1
classified by hypothesis h. Then the fitness function is equal to
(A) Fitness(h) = (correct(h))^2
(B) Fitness(h) = (correct(h))^3
(C) Fitness(h) = correct(h)
(D) Fitness(h) = (correct(h))^4
8. ILP stands for _____ CO5.5 K1
(A) Inductive Logical programming
(B) Inductive Logic Programming
(C) Inductive Logical Program
(D) Inductive Logic Program
9. Which combines inductive methods with the power of first-order CO5.5 K1
representations?
(A) Inductive programming
(B) Logic programming
(C) Inductive logic programming
(D) Lisp programming
10. A barrel contains 100 apples. Three apples selected at random CO5.6 K1
were found to be ripe; therefore, probably all 100 apples are ripe.
Which type of argument is this? _____
(A) Inductive
(B) Deductive
(C) Hypothetical syllogism
(D) Modus ponens
11. ______ emphasizes learning feedback that evaluates the learner's CO5.14 K1
performance without providing standards of correctness in the form of
behavioral targets.
(A) Reinforcement learning
(B) Supervised Learning
(C) A & B
(D) None of these
12. Features of Reinforcement learning CO5.14 K1
(A) A set of problems rather than a set of techniques
(B) RL is training by reward and punishments.
(C) RL is learning from trial and error with the world.
(D) All of these
13. Which type of feedback used by RL? CO5.14 K2
(A) Purely Instructive feedback
(B) Purely Evaluative feedback
(C) Both A & B
(D) None of these
14. What is/are the problem solving methods for RL? CO5.14 K2
(A) Dynamic programming
(B) Monte Carlo Methods
(C) Temporal-difference learning
(D) All of these
15. Reinforcement learning is defined by the ____ CO5.14 K1
(A)Policy
(B)Reward Signal
(C)Value Function
(D)Model of the environment

16. Which of the following elements of reinforcement learning imitates CO5.14 K2


the behaviour of the environment?
(A) Policy
(B) Reward Signal
(C) Value Function
(D) Model of the environment

17. How many types of reinforcement learning? CO5.14 K1


(A)3
(B)4
(C)2
(D)5

18. In which of the following approaches of reinforcement learning, a CO5.14 K1


virtual model is created for the environment_______
(A)Value-based
(B)Policy-based
(C)Model-based
(D)None of these
19. Which of the following is true about generative models? CO5.15 K1
(A) They capture the joint probability
(B) The perceptron is a generative model
(C) Generative models can be used for classification
(D) A & C

20. What can help to reduce overfitting in an SVM classifier? CO5.15 K2


(A) High-degree polynomial features
(B) Setting a very low learning rate
(C) Use of slack variables
(D) Normalizing the data

21. The Q-learning algorithm is a__________ CO5.16 K1

(A) Supervised learning algorithm


(B) Unsupervised learning algorithm
(C) Semi-supervised learning algorithm
(D) Reinforcement learning algorithm

22. In statistical terms, this represents the weighted average score. CO5.16 K1
(A) Variance
(B) Mean
(C) Median
(D) Mode

23. What is the meaning of hard margin in SVM? CO5.16 K1

(A) SVM allows very low error in classification


(B) SVM allows high amount of error in classification
(C) Under fitting
(D) SVM is highly flexible

What would be the relationship between the training time taken by CO5.16 K2
24. 1-NN, 2-NN, and 3-NN?

(A) 1-NN > 2-NN > 3-NN


(B) 1-NN < 2-NN < 3-NN
(C) 1-NN ~ 2-NN ~ 3-NN
(D) None of these

Which of the following measure is not used for a classification CO5.16 K2


25. model?

(A) Accuracy
(B) Recall
(C) Error rate
(D) Purity

26. Which of the following is a subset of machine learning? CO5.17 K2


(A) Numpy
(B) Scipy
(C) Deep learning
(D) All of the above

With how many layers are deep learning algorithms constructed? CO5.17 K2


27. (A) 2
(B) 3
(C) 4
(D) 5

The first layer is called the? CO5.17 K1


28. (A)inner layer
(B) outer layer
(C) hidden layer
(D) none of these

CO5.17 K1
Which of the following is/are Limitations of deep learning?
29.
(A)Data labeling
(B)Obtain huge training data sets
(C)Both A & B
(D)None of these
Which of the following is well suited for perceptual tasks? CO5.17 K1
30. (A) Feed forward neural network
(B) Recurrent neural network
(C) Convolutional network
(D) Reinforcement learning

PART B

1. Consider a sequential covering algorithm such as CN2 and a simultaneous


covering algorithm such as ID3. Both algorithms are to be used to learn a CO5.2 K3
target concept defined over instances represented by conjunctions of n
boolean attributes. If ID3 learns a balanced decision tree of depth d, it will
contain 2^d - 1 distinct decision nodes, and therefore will have made 2^d - 1
distinct choices while constructing its output hypothesis. How many rules
will be formed if this tree is re-expressed as a disjunctive set of rules? How
many preconditions will each rule possess? How many distinct choices
would a sequential covering algorithm have to make to learn this same set of
rules?

Consider the options for implementing LEARN-ONE-RULE in terms CO5.3 K3


2. of the possible strategies for searching the hypothesis space. In
particular, consider the following attributes of the search (a) generate-
and-test versus data-driven (b) general-to-specific versus specific-to-
general
Compare FOIL with other machine learning algorithms, CO5.4 K2
3. such as decision trees or artificial neural networks.

Briefly explain how FOIL handles noisy or incomplete data. How CO5.4 K2
4. does FOIL handle missing or unknown values in the data?
Apply inverse resolution in propositional form to the clauses C = A ∨ CO5.6 K3
5. B, C1 = A ∨ B ∨ G. Give at least two possible results for C2.

6. A marketing department for a large retailer wants to increase the CO5.7


effectiveness of their email campaigns by targeting specific customer K2
segments with personalized content. They have a large database of
customer information, including demographic data, purchase history,
and website behaviour. The marketing team wants to use analytical
learning to identify which customer attributes and behaviours are most
predictive of response to email campaigns. What is the problem that
the marketing department is trying to solve, and why is analytical
learning an appropriate approach?

List some of the open research questions or challenges in the field CO5.7 K2
7. of explanation-based generalization, and how might Prolog-EBG
contribute to addressing them?

Consider learning the target concept Good Credit Risk defined over CO5.9 K4
8. instances described by the four attributes Has Student Loan, Has
Savings Account, Is student, Owns Car. Give the initial network
created by KBANN for the following domain theory, including all
network connections and weights:
GoodCreditRisk ← Employed, LowDebt
Employed ← ¬IsStudent
LowDebt ← ¬HasStudentLoan, HasSavingsAccount
A company wants to optimize their online ad campaigns in order to CO5.14 K2
9. maximize conversions (e.g. clicks, sign-ups, purchases) while
minimizing the cost per conversion. They have access to historical data
on ad impressions, clicks, and conversions, as well as data on the cost
of each ad. The company has decided to use reinforcement learning to
improve their ad campaign performance.
(a)How might the company set up the reinforcement learning
problem? What would be the state, action, and reward spaces?

Explain the concept of Q-learning and how it can be used to solve CO5.15 K2
10. a reinforcement learning problem.
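
The core of tabular Q-learning is a single update rule, sketched below as Q(s, a) <- Q(s, a) + alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)]. The state names, action names, learning rate and discount factor are all hypothetical values chosen for the example.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, next_actions,
                      alpha=0.5, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(state, action)."""
    # Value of the best action available in the next state (0 if terminal).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-state example: Q-values start at 0 unless set otherwise.
q = defaultdict(float)
q[("s1", "right")] = 1.0
new_value = q_learning_update(
    q, state="s0", action="right", reward=0.0,
    next_state="s1", next_actions=["left", "right"],
)
print(new_value)
```

In a full agent this update runs after every observed transition, with actions chosen by an exploration policy such as epsilon-greedy.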

State the TD error and how it is used to update the value function in CO5.16 K2
11. TD learning?
1. How does TD learning differ from Monte Carlo methods and
dynamic programming methods?

How would the company evaluate the performance of their neural CO5.17 K2
12. network on the image classification task? What metrics might they
use to measure accuracy and generalization?

PART C
Elaborate in detail about the learning sets of rules and state how it
1. differs from other algorithms.
CO5.1 K2
(i)Illustrate the diagram for the search for rule preconditions as learn-
2. one-rule proceeds from general to specific.
(ii)Discuss the implementation algorithm for Learn one-rule CO5.1 K3
Refine the LEARN-ONE-RULE algorithm so that it can learn rules CO5.2 K3
3 whose preconditions include constraints such as nationality ∈
{Canadian, Brazilian}, where a discrete-valued attribute is allowed to
take on any value in some specified set. Your modified program
should explore the hypothesis space containing all such subsets.
Specify your new algorithm as a set of editing changes to the
algorithm.
Consider a sequential covering algorithm such as CN2 and a
4 simultaneous covering algorithm such as ID3. Both algorithms are to
be used to learn a target concept defined over instances represented CO5.2
by conjunctions of n boolean attributes. If ID3 K3
learns a balanced decision tree of depth d, it will contain 2^d - 1
distinct decision nodes, and therefore will have made 2^d - 1 distinct
choices while constructing its output hypothesis. How many rules
will be formed if this tree is re-expressed as a disjunctive set of
rules? How many preconditions will each rule possess? How many
distinct choices would a sequential covering algorithm have to make
to learn this same set of rules? Which system do you suspect would
be more prone to overfitting if both were given the same training
data?
Apply inverse resolution to the clauses C = R(B, x) ∨ P(x, A) and C1
5. = S(B, y) ∨ R(z, x). Give at least four possible results for C2. Here A
and B are constants, x and y are variables. CO5.5 K3
Consider the bottom-most inverse resolution, derive at least two
6. different outcomes that could result given different choices for the
substitutions θ1 and θ2 .Derive a result for the inverse resolution step CO5.5 K3
if the clause Father(Tom, Bob) is used in place of Father(Shannon,
Tom).

Consider the problem of learning the target concept "pairs of people


7. who live in the same house," denoted by the predicate Housemates(x,
y). Below is a positive example of the concept. CO5.6 K3
Housemates(Joe, Sue)
Person(Joe) Person(Sue)
Sex(Joe, Male) Sex(Sue, Female)
HairColor(Joe, Black) HairColor(Sue, Brown)
Height(Joe, Short) Height(Sue, Short)
Nationality(Joe, US) Nationality(Sue, US)
Mother(Joe, Mary) Mother(Sue, Mary)
Age(Joe, 8) Age(Sue, 6)

The following domain theory is helpful for acquiring the Housemates


concept:
Housemates(x, y) ← InSameFamily(x, y)
Housemates(x, y) ← FraternityBrothers(x, y)
InSameFamily(x, y) ← Married(x, y)
InSameFamily(x, y) ← Youngster(x) ∧ Youngster(y) ∧ SameMother(x, y)
SameMother(x, y) ← Mother(x, z) ∧ Mother(y, z)
Youngster(x) ← Age(x, a) ∧ LessThan(a, 10)

Apply the PROLOG-EBG algorithm to the task of generalizing from
the above instance, using the above domain theory. In particular,
(a) Show a hand-trace of the PROLOG-EBG algorithm applied to this
problem; that is, show the explanation generated for the training
instance, show the result of regressing the target concept through this
explanation, and show the resulting Horn clause rule.
(b) Suppose that the target concept is "people who live with Joe"
instead of "pairs of people who live together." Write down this target
concept in terms of the above formalism. Assuming the same training
instance and domain theory as before, what Horn clause rule will
PROLOG-EBG produce for this new target concept?
Compose the following Horn clauses
8. (i) First-Order Horn Clauses (6M)
(ii) Basic terminology in Horn clauses. (6M) CO5.7 K2

Consider again the search trace of FOCL. Suppose that the
10. hypothesis selected at the first level in the search is changed to CO5.8 K3
Cup ← HasHandle
Describe the second-level candidate hypotheses that will be
generated by FOCL as successors to this hypothesis. You need only
include those hypotheses generated by FOCL's second search
operator, which uses its domain theory. Don't forget to
post-prune the sufficient conditions.
Consider playing Tic-Tac-Toe against an opponent who plays
11. randomly. In particular, assume the opponent chooses with uniform CO5.9 K3
probability any open space, unless there is a forced move (in which
case it makes the obvious correct move).
(a) Formulate the problem of learning an optimal Tic-Tac-Toe strategy
in this case as a Q-learning task. What are the states, transitions, and
rewards in this non-deterministic Markov decision process?

(b) Will your program succeed if the opponent plays optimally rather
than randomly?
