MLT Cat Ii
SAKUNTHALA
ENGINEERING COLLEGE
(An Autonomous Institution)
Degree/Branch/B.Tech/AIDS/CSBS Subject Code:191CB621
Date: Year/Sem: III/VI
MACHINE LEARNING TECHNIQUES
QUESTION BANK
UNIT- 1 INTRODUCTION
S.NO QUESTIONS CO LEVEL K LEVEL
PART A
20. Inductive learning takes examples and generalizes rather than starting with CO1.6 K1
__________ knowledge.
a) Inductive
b) Existing
c) Deductive
d) None of the above
21. A model of language consists of categories that do not include ________ CO1.7 K2
a) system units
b) structural units
c) data units
d) empirical units
23. Concept learning infers a ______-valued function from training examples of its CO1.8 K1
input and output.
a) Decimal
b) Hexadecimal
c) Boolean
d) All of the above
24. ___________ is the scenario in which a model fails to decipher the underlying CO1.8 K2
trend in the input data.
a) Overfitting
b) Underfitting
c) Both A and B
d) None of the above
25. In language understanding, which is not included in the levels of knowledge? CO1.7 K2
a) Empirical
b) Logical
c) Phonological
d) Syntactic
26. Identify the model which is trained with data in a single batch. CO1.7 K2
a) Offline learning
b) Batch learning
c) Both A and B
d) None
30. Which of the following machine learning algorithms is based upon the idea of CO1.10 K1
bagging?
a) Decision tree
b) Random forest
c) Classification
d) Regression
31. The FIND-S algorithm starts from the most specific hypothesis and generalizes it by CO1.10 K2
considering only ________ examples.
a) Negative
b) Positive
c) Negative or Positive
d) None of the above
PART B
1. Imagine we have two possibilities: we can scan and email the image, or CO1.1 K1
we can use an optical character reader (OCR) and send the text file.
Discuss the advantages and disadvantages of the two approaches in a
comparative manner. When would one be preferable over the other?
4. Pick some learning task and state, as precisely as possible, the task, CO1.3 K2
performance measure, and training experience.
5. What algorithms exist for learning general target functions from training CO1.5 K1
examples? In what settings will particular algorithms converge to the
desired function, given sufficient training data? Which algorithms perform
best for which types of problems and representations?
9. Take, for example, the word “machine.” Write it ten times. Also ask a CO1.10 K1
friend to write it ten times. Analyzing these twenty images, try to find
features (types of strokes, curvatures, loops, how you make the dots, and
so on) that discriminate your handwriting from that of your friend.
10. Give decision trees to represent the following Boolean functions: CO1.11 K4
(a) A ∧ ¬B (b) A ∨ [B ∧ C] (c) A XOR B (d) [A ∧ B] ∨ [C ∧ D]
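One way to check a hand-drawn tree for question 10 is to write it as nested attribute tests and compare it with the formula on every input row. The sketch below is only an illustration: the function names and the attribute orderings are assumptions, and other equally valid trees exist.

```python
from itertools import product

def tree_a(A, B):
    # A AND NOT B: the root tests A; only the A=True branch tests B.
    if A:
        return not B
    return False

def tree_b(A, B, C):
    # A OR (B AND C)
    if A:
        return True
    if B:
        return C
    return False

def tree_c(A, B):
    # A XOR B: B must be tested under both branches of A.
    if A:
        return not B
    return B

def _c_and_d(C, D):
    # Shared subtree for C AND D, reached whenever A AND B fails.
    if C:
        return D
    return False

def tree_d(A, B, C, D):
    # (A AND B) OR (C AND D)
    if A:
        if B:
            return True
        return _c_and_d(C, D)
    return _c_and_d(C, D)

def matches(tree, formula, n_inputs):
    """True if the tree agrees with the formula on all 2**n_inputs rows."""
    return all(tree(*row) == formula(*row)
               for row in product([False, True], repeat=n_inputs))

all_match = (
    matches(tree_a, lambda A, B: A and not B, 2)
    and matches(tree_b, lambda A, B, C: A or (B and C), 3)
    and matches(tree_c, lambda A, B: A != B, 2)
    and matches(tree_d, lambda A, B, C, D: (A and B) or (C and D), 4)
)
print(all_match)
```

Note that the tree for (d) repeats the C/D subtree, which is typical when a disjunction of conjunctions is expressed as a single tree.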
PART C
1. How much training data is sufficient? What general bounds can be found CO1.1 K2
to relate the confidence in learned hypotheses to the amount of training
experience and the character of the learner's hypothesis space?
2. Taking a very simple example, one possible target concept may be to find CO1.1 K4
the day when my friend Ramesh enjoys his favorite sport. We have some
attributes/features of the day, such as Sky, Air Temperature, Humidity,
Wind, Water, and Forecast, and based on these we have a target concept
named Enjoy Sport.
Example  Sky    Air Temp  Humidity  Wind    Water  Forecast  Enjoy Sport
4        Sunny  Warm      High      Strong  Cool   Change    Yes
3. Implement an algorithm similar to the one for the checkers problem, using the CO1.7 K4
simpler game of tic-tac-toe. Represent the learned function as a linear combination
of board features of your choice.
4. Consider the Enjoy Sport learning task and its hypothesis space H. Define a CO1.9 K4
new hypothesis space H' that consists of all pairwise disjunctions of the
hypotheses in H. For example, a typical hypothesis in H' is
(?, Cold, High, ?, ?, ?) ∨ (Sunny, ?, High, ?, ?, Same).
Trace the CANDIDATE-ELIMINATION algorithm for the hypothesis space H'
given the sequence of training examples.
5. The FIND-S algorithm is used to find the maximally specific hypothesis. CO1.9 K3
Using the FIND-S algorithm, give a single maximally specific hypothesis
for the given set of training examples.
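The FIND-S procedure can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `find_s` is hypothetical, and because only row 4 of the Enjoy Sport table survives in this question bank, rows 1 to 3 below are assumed from the widely used textbook version of the data set.

```python
def find_s(examples):
    """Return the maximally specific conjunctive hypothesis consistent
    with the positive examples; negative examples are ignored."""
    hypothesis = None  # stands for the most specific hypothesis
    for attributes, is_positive in examples:
        if not is_positive:
            continue
        if hypothesis is None:
            # The first positive example is adopted as-is.
            hypothesis = list(attributes)
        else:
            # Generalize each mismatching constraint to "?".
            hypothesis = [h if h == a else "?"
                          for h, a in zip(hypothesis, attributes)]
    return tuple(hypothesis) if hypothesis is not None else None

# Attribute order: Sky, Air Temp, Humidity, Wind, Water, Forecast.
# Rows 1-3 are ASSUMED (textbook data); row 4 matches the table above.
enjoy_sport = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

result = find_s(enjoy_sport)
print(result)  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

Under these assumed rows, the negative example never affects the result, which is the property question 31 in Part A asks about.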
6. Consider the following sequence of positive and negative training CO1.9 K3
examples describing the concept "pairs of people who live in the same
house." Each training example describes an ordered pair of people, with
each person described by their sex, hair color (black, brown, or blonde),
height (tall, medium, or short), and nationality (US, French, German,
Irish, Indian, Japanese, or Portuguese).
+ ((male brown tall US) (female black short US))
+ ((male brown short French) (female black short US))
- ((female brown tall German) (female black short Indian))
+ ((male brown tall Irish) (female brown short Irish))
Consider a hypothesis space defined over these instances, in which each
hypothesis is represented by a pair of tuples, and where each attribute
constraint may be a specific value, "?," or "∅," just as in the Enjoy Sport
hypothesis representation. For example, the hypothesis
((male ? tall ?) (female ? ? Japanese)) represents the set of all pairs of
people where the first is a tall male (of any nationality and hair color),
and the second is a Japanese female (of any hair color and height).
Provide a hand trace of the CANDIDATE-ELIMINATION algorithm
learning from the above training examples and hypothesis language. In
particular, show the specific and general boundaries of the version space
after it has processed the first training example, then the second training
example, etc.
7. Example  Citations  Size    In library  Price       Editions  Buy   CO1.9 K3
   1        Some       Small   No          Affordable  One       No
   2        Many       Big     No          Expensive   Many      Yes
   3        Many       Medium  No          Expensive   Few       Yes
   4        Many       Small   No          Affordable  Many      Yes
Using the Candidate Elimination algorithm, find the possible maximal
hypothesis for the given set of data.
8. Example  Size   Color  Shape     Class   CO1.9 K3
   1        Big    Red    Circle    No
   2        Small  Red    Triangle  No
   3        Small  Red    Circle    Yes
   4        Big    Blue   Circle    No
   5        Small  Blue   Circle    Yes
Using the Candidate Elimination algorithm, find the possible maximal
hypothesis for the given set of data.
9. Design a method for converting a conjunctive data set into a disjunctive CO1.11 K3
data set in the candidate elimination algorithm.
10. Consider the following set of training examples:
Instance  Classification  a1  a2
1         +               T   T
2         +               T   T
3         –               T   F
4         +               F   F
5         –               F   T
6         –               F   T
• What is the entropy of this collection of training examples with
respect to the target function classification?
• What is the information gain of a2 relative to these training
examples?
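The two quantities asked for in question 10 can be computed directly from the six rows. The sketch below assumes the unnamed columns are the instance number, the classification, and the attributes a1 and a2 (the question itself only names a2); the helper names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, attr_index):
    """Entropy of the whole set minus the weighted entropy after
    splitting on the attribute stored at attr_index."""
    labels = [row[0] for row in rows]
    gain = entropy(labels)
    for value in set(row[attr_index] for row in rows):
        subset = [row[0] for row in rows if row[attr_index] == value]
        gain -= (len(subset) / len(rows)) * entropy(subset)
    return gain

# (classification, a1, a2) for instances 1..6; column names are assumed.
rows = [
    ("+", "T", "T"),
    ("+", "T", "T"),
    ("-", "T", "F"),
    ("+", "F", "F"),
    ("-", "F", "T"),
    ("-", "F", "T"),
]

collection_entropy = entropy([row[0] for row in rows])
gain_a2 = information_gain(rows, 2)
print(collection_entropy, round(gain_a2, 6))
```

With three positive and three negative examples the entropy is 1 bit, and because both values of a2 leave an even split of classes, the gain of a2 comes out as zero (up to floating-point rounding).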
11. Age  Competition  Type  Profit   CO1.11 K2
    Old  Yes          s/w   Down
    Old  No           s/w   Down
    Old  No           h/w   Down
    Mid  Yes          s/w   Down
    Mid  Yes          h/w   Down
    Mid  No           h/w   Up
    Mid  No           s/w   Up
    New  Yes          s/w   Up
    New  No           h/w   Up
    New  No           s/w   Up
UNIT II
PART A
S.NO QUESTIONS CO K LEVEL
1 Artificial neural network is used for --------------- CO2.1 K1
(A) Classification
(B) Clustering
(C) Pattern recognition
(D) All of the above
2 Artificial Neural Network is based on the -------------- approach. CO2.1 K1
(A) Weak Artificial Intelligence approach
(B) Cognitive Artificial Intelligence approach
(C) Strong Artificial Intelligence approach
(D) Applied Artificial Intelligence approach
3 ________ computes the output volume by computing the dot product between CO2.1 K1
all filters and the image patch.
(A) Input Layer
(B) Convolution Layer
(C) Pool Layer
(D) Activation Function Layer
4 _____ are the ways to represent uncertainty. CO2.2 K1
(A) Fuzzy logic
(B) Entropy
(C) Probability
(D) All of the above
5 Who was the inventor of the first neurocomputer? CO2.2 K1
(A) Dr. Robert Hecht-Nielsen
(B) Dr. John Hecht-Nielsen
(C) Dr. Alex Hecht-Nielsen
(D) Dr. Steve Hecht-Nielsen
6 Which of the following is true for neural networks? CO2.2 K1
(A) It has a set of nodes and connections
(B) A node could be in an excited state or non-excited state
(C) Each node computes its weighted input
(D) All of the above
7 Back propagation is a learning technique that adjusts weights in the neural CO2.2 K1
network by propagating weight changes -----------
(A) Backward from sink to source
(B) Forward from source to sink
(C) Backward from sink to hidden nodes
(D) Forward from source to hidden nodes
8 The fundamental unit of the neural network is ------------ CO2.3 K1
(A) Neuron
(B) Brain
(C) Nucleus
(D) Dendrites
9 Which of the following neural networks uses supervised learning? CO2.3 K1
(A) Multilayer perceptron
(B) Self-organizing feature map
(C) Hopfield network
(A) only
(B) only
(A) and (C) only
(A) and (B) only
10 Which of the following techniques perform similar operations as dropout CO2.3 K1
in a neural network?
(A) Bagging
(B) Boosting
(C) Stacking
(D) None of the above
11 A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear CO2.5 K2
with the constant of proportionality being equal to 2. The inputs are 4, 10,
5 and 20 respectively. The output will be:
(A)76
(B)128
(C)238
(D)228
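The arithmetic behind question 11 can be checked with a short sketch. It assumes, as the question states, a linear transfer function with constant of proportionality 2 and no bias term; the variable names are illustrative.

```python
# Values taken from question 11.
weights = [1, 2, 3, 4]
inputs = [4, 10, 5, 20]
proportionality_constant = 2

# Net input is the weighted sum: 1*4 + 2*10 + 3*5 + 4*20 = 119.
net_input = sum(w * x for w, x in zip(weights, inputs))

# A linear transfer function scales the net input by the constant.
output = proportionality_constant * net_input
print(net_input, output)  # 119 238
```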
12 ______________ Algorithm propagates errors from nodes of output to CO2.9 K1
input
(A) Back propagation
(B) Front Propagation
(C )Signal Propagation
(D)Channel Propagation
13 Which rule is followed by the Back propagation algorithm? CO2.9 K1
(A)Static Rule
(B)Dynamic Rule
(C)Chain Rule
(D)None
14 Error rates are reduced in back propagation due to _____________ CO2.9 K2
(A) Proper Tuning
(B) Iteration
(C) Improper Tuning
(D)Generalization
15 Which parameter should be set while using Back propagation? CO2.9 K1
(A)Number of inputs
(B)Number of outputs
(C )Number of Gradients
(D)Number of intermediate Stages
16 The back propagation algorithm consists of --------- layers CO2.9 K2
(A)Zero
(B)Three
(C)Two
(D)One
17 Back propagation algorithms are practically applied in _____________ CO2.9 K2
(A)Artificial Intelligence
(B)Natural Language Processing
(C )Image Processing
(D)All the above mentioned
18 What is determined by the adjustment level of the Cost function? CO2.10 K2
(A)Number of inputs
(B)Number of outputs
(C )Number of Gradients
(D)Number of intermediate Stages
19 Which approach is most suited to structured problems with little CO2.10 K1
uncertainty?
(A) Simulation
(B) Human intuition
(C) Optimization
(D) Genetic algorithms
20 Identify the kind of learning algorithm for “facial identities for facial CO2.10 K1
expressions”.
(A) Prediction
(B) Recognition Patterns
(C)Recognizing anomalies
(D) Generating Patterns
21 Choose a disadvantage of decision trees among the following. CO2.11 K1
(A)Decision trees are robust to outliers
(B)Factor Analysis
(C)Decision trees are prone to over fit
(D)All the above
22 The most significant phase in genetic algorithm is _________ CO2.13 K2
(A)Mutation
(B)Selection
(C )Fitness Function
(D) Cross over
23 Which of the following are common classes of problems in machine CO2.13 K1
learning?
(A)Regression
(B)Classification
(C)Clustering
(D)All of the above
24 Identify the successful applications of ML. CO2.12 K1
(A)Learning to classify new astronomical structures
(B)Learning to recognize spoken words
(C)Learning to drive an autonomous vehicle
(D)All of the above
25 Select the correct definition of neuro software. CO2.12 K1
(A)It is software used by neurosurgeons
(B)It is software used to analyse neurons
(C)It is a powerful and easy neural network
(D)None of the above
26 Identify among the following which is not evolutionary computation. CO2.14 K1
(A)Genetic algorithm
(B)Genetic programming
(C)Neuro evolution
(D)Perceptron
27 Identify the clustering method which takes care of variance in data CO2.14 K1
(A)Decision tree
(B)Gaussian mixture model
(C)K means
(D)All of the above
28 Genetic algorithms belong to the family of methods in the ------------------- CO2.13 K2
(A) Artificial Intelligence
(B)Optimization
(C)Non computer based system
(D) Complete enumeration family of methods
29 Genetic programs are represented by ------------ CO2.13 K1
(A)Lines of code
(B)Syntax tree
(C)Bit values
(D)Strings
30 What are the general limitations of the back propagation rule? CO2.13 K1
a)Local minima problem
b)Slow convergence
c)Scaling
d)all of above
PART-B
1. Compare the concept of convolution and feed forward neural networks. CO2.1 K2
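2. The weighted sum of the inputs (here M = 3) is CO2.2 K2
$$\sum_{i=1}^{M} w_i x_i + \text{bias} = w_1 x_1 + w_2 x_2 + w_3 x_3 + \text{bias}$$
and the output is
$$\text{OUTPUT} = F(x) = \begin{cases} 1 & \text{if } \sum_{i} w_i x_i + b \ge 0 \\ 0 & \text{if } \sum_{i} w_i x_i + b < 0 \end{cases}$$
Describe the process by which the output is obtained from the given weights of the
inputs.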
3. Identify the function which is used as a measure of accuracy in a neural network CO2.3 K2
and the function which is used to learn the patterns accurately in the training data,
and state the uses of that function.
4. Design some of the popular activation functions used in neural networks. CO2.3 K2
5. Suppose a perceptron has weights for its three inputs with the following values: CO2.3 K3
w1 = 2; w2 = −4; w3 = 1, and the activation of the unit is given by the step
function: φ(v) = 1 if v ≥ 0, otherwise 0.
Calculate the output value y of the given perceptron for each of the following
input patterns:
Pattern  P1  P2  P3  P4
X1       1   0   1   1
X2       0   1   0   1
X3       0   1   1   1
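The perceptron outputs for the four patterns can be worked through as follows. This is a sketch under the stated weights and step function; since the question gives no bias term, none is used, and the names `step` and `perceptron_output` are illustrative.

```python
def step(v):
    """Step activation from the question: 1 if v >= 0, otherwise 0."""
    return 1 if v >= 0 else 0

def perceptron_output(weights, pattern):
    # Weighted sum of the inputs (no bias is given in the question).
    v = sum(w * x for w, x in zip(weights, pattern))
    return step(v)

weights = [2, -4, 1]  # w1, w2, w3

# Each pattern is (X1, X2, X3), read column-wise from the table.
patterns = {
    "P1": (1, 0, 0),
    "P2": (0, 1, 1),
    "P3": (1, 0, 1),
    "P4": (1, 1, 1),
}

outputs = {name: perceptron_output(weights, p) for name, p in patterns.items()}
print(outputs)  # {'P1': 1, 'P2': 0, 'P3': 1, 'P4': 0}
```

The weighted sums are 2, −3, 3, and −1 respectively, so only P1 and P3 reach the threshold.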
6. List the characteristics that are used in the back propagation algorithm. CO2.9 K1
7. Compare and contrast gradient descent and the Delta rule. CO2.10 K2
8. Derive the Gradient Descent algorithm for training a linear unit. CO2.11 K3
9. Draft the systematic diagram for genetic operators. CO2.12 K2
10. Illustrate the Lamarckian Evolution with an example. CO2.12 K3
11. Create a program for tree representation in genetic programming. CO2.13 K3
12. Evaluate the parallelizing Genetic Algorithms. CO2.14 K4
13. Describe the models of evolution and learning. CO2.15 K2
PART C
1 Identify the problems for which ANN learning is well suited and write down CO2.1 K3
the characteristics.
2 Elaborate in detail on neural network representation with a suitable CO2.2 K2
example.
3 Analyze architecture and concept behind the mechanism of ANN and BNN. CO2.2 K3
4 Describe the perceptron with an example and draw the decision surface CO2.3 K3
represented by a two-input perceptron.
5 Derive the expression of the back propagation algorithm. CO2.9 K3
6 Apply BACKPROPAGATION to the task of face recognition application. CO2.9 K3
8 Demonstrate hypothesis space search of Genetic algorithm with neural CO2.12 K3
network back propagation algorithm.
9 Analyze the bit string representation of Genetic algorithms hypothesis. CO2.12 K4
UNIT III
PART A
30. For a particular learning task, if the required error parameter CO3.1 K1
changes from 0.1 to 0.01, how many more samples will be required
for PAC learning?
(A) Same
(B) 2 times
(C) 1000 times
(D) 10 times
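The dependence on the error parameter in question 30 can be illustrated with the standard sample-complexity bound for a consistent learner over a finite hypothesis space, m ≥ (1/ε)(ln|H| + ln(1/δ)). The sketch below assumes that bound applies; the values of δ and |H| are hypothetical placeholders and cancel out of the ratio.

```python
from math import log

def pac_sample_bound(epsilon, delta, hypothesis_space_size):
    """Number of examples sufficient for a consistent learner over a
    finite hypothesis space: (1/epsilon) * (ln|H| + ln(1/delta))."""
    return (log(hypothesis_space_size) + log(1.0 / delta)) / epsilon

# delta and |H| are HYPOTHETICAL; only epsilon changes between the calls.
delta = 0.05
h_size = 1000

m_loose = pac_sample_bound(0.1, delta, h_size)
m_tight = pac_sample_bound(0.01, delta, h_size)
ratio = m_tight / m_loose
print(round(ratio, 6))  # 10.0
```

Because the bound is proportional to 1/ε, tightening ε by a factor of ten multiplies the required sample size by ten regardless of the other two parameters.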
31. Computational complexity of classes of learning problems depends CO3.11 K2
on which of the following?
(A) The size or complexity of the hypothesis space considered by
learner
(B) The accuracy to which the target concept must be approximated
(C) The probability that the learner will output a successful
hypothesis
(D) All of these
PART-B
4. Can feature engineering change the selection of the model according CO3.5 K3
to the minimum description length principle?
5. Consider a data scientist who is working for an e-commerce company CO3.3 K3
and wants to build a classification model to predict whether a
customer will purchase a particular product based on their browsing
history and demographic information. The data scientist is
considering using the Bayes Optimal Classifier to build the model.
How can the data scientist use the Bayes Optimal Classifier to build
a model for predicting customer purchases in this case?
5. (a) If we train a Naive Bayes classifier using infinite training data
that satisfies all of its modeling assumptions, then it will achieve CO3.4 K2
zero training error over these training examples. Please justify your
answer in one sentence.
(b) Consider the plot below showing training and test set accuracy
for decision trees of different sizes, using the same set of training
data to train each tree. Describe in one sentence how the training
data curve (solid line) will change if the number of training
examples approaches infinity. In a second sentence, analyze
the test data curve under the same condition.
7. How would you use a Naive Bayes classifier for categorical features? CO3.4 K3
What if some features are numerical?
8. List the issues that can arise when using Bayesian belief networks in CO3.6 K2
real-world applications. How can these challenges be addressed
to improve the accuracy of the predictions?
9. How is the EM algorithm used in comparison to other CO3.7 K2
parameter estimation methods such as maximum likelihood
estimation and Bayesian inference?
10. Is it better to split sequences into overlapping or non-overlapping CO3.7 K4
training samples? Briefly justify your answer.
11. Can probability theory be used to model the CO3.8 K3
uncertainty in machine learning predictions? What is a probability
calibration curve, and how can it be used to evaluate the calibration
of a machine learning model?
12. Given a set of labeled examples, what CO3.9 K3
is the minimum number of examples needed to learn a binary
classifier that can accurately classify new, unseen examples? The
sample complexity of binary classification depends on the
complexity of the decision boundary between the two classes, which
is determined by the size and complexity of the hypothesis space. In
general, the more complex the decision boundary, the more training
examples will be needed to learn an accurate model.
PART C
Predict the target value (yes or no) of the target concept Play Tennis
for this new instance.
6. Can you specify the steps to calculate the joint probability CO3.6 K2
distribution for a Bayesian belief network?
7. Suppose you are working on a project that involves clustering a large CO3.11 K3
dataset of customer transaction data using k-means. However, you
notice that some of the data points have missing features, which can
lead to biased or inaccurate cluster assignments. How might you
apply the EM algorithm to improve the clustering performance, and
what evaluation metrics could you use to assess its effectiveness?
12. You have a dataset of 10,000 labelled examples, where each example is a
28x28 grayscale image of a digit (0-9), along with a corresponding label CO3.12 K3
indicating the true digit. You want to train a binary classifier using this
data that can accurately predict whether a new, unseen image is a "7" or
not. Derive the mistake-bound model algorithm for training the binary
classifier using this data. What are the key steps involved in the algorithm,
and how does it work? What are the assumptions and limitations of the
algorithm, and how might they affect its performance on the given dataset?
How would you evaluate the quality of the classifier's predictions, and
what techniques could you use to improve its performance?
1.
variables?
8 a. K-NN CO4.1 K1
b. Linear Regression
c. Logistic Regression
a. Hamming Distance
9 CO4.1 K1
b.Euclidean Distance
c.Manhattan Distance
d. radial distance
a. Curse of dimensionality
10 CO4.1
b.Calculate the distance of the test case from all training cases
c.Both A and B
d. None of the above
11 What is/are the advantage(s) of distance-weighted k-NN over k-NN? CO4.2
a. Robust to noisy training data
b. Quite effective when a sufficiently large set of training data is
provided
c. Both A & B
d. None of these
12 A company has built a KNN classifier that gets 100% accuracy on CO4.2 K1
training data. When they deployed this model on the client side, it was
found that the model is not at all accurate. Which of the
following might have gone wrong?
a. It is probably an overfitted model
b. It is probably an underfitted model
c. Can’t say
d. None of these
13 What are the difficulties faced with the k-nearest neighbour algorithm? CO4.2
a. Calculate the distance of the test case from all training cases
b. Curse of dimensionality
c. Both A & B
d. None of these
c) elimination algorithm
d)none of the above
PART B
1 Illustrate how instance-based learning methods differ from CO4.1 K2
function approximation.
2 Analyze the inductive bias of k-nearest neighbor. CO4.2 K2
3 Show the Voronoi diagram for k-nearest neighbor. CO4.3 K3
4 Find the nature of the hypothesis space H implicitly considered by CO4.4 K2
the k-nearest neighbor algorithm.
5 Compose the formula for locally weighted linear regression.
6 Discuss the pros and cons of locally weighted regression. CO4.5 K2
7 Differentiate regression, residual, kernel function.
8 Suggest a lazy version of the eager decision tree learning algorithm CO4.6 K3
ID3. What are the advantages and disadvantages of your lazy
algorithm compared to the original eager algorithm?
9 Point out how eager learning differs from lazy learning. CO4.6 K1
10 Compare lazy and eager learning algorithms. CO4.9 K2
11 Compose three properties that are shared by the instance-based CO4.10 K2
methods.
12 Summarize the three lazy learning methods. CO4.13 K4
PART C
1. Describe in detail an algorithm which is used for regression as CO4.1 K2
well as classification.
2. We have data from the questionnaires and objective testing from two CO4.1 K3
attributes (acid durability and strength) to classify whether a special
paper tissue is good or not.
X1 = acid durability (sec)  X2 = strength (kg/square meter)  Y = classification
7                           7                                Bad
7                           4                                Bad
3                           4                                Good
1                           4                                Good
Now the factory produces a new paper tissue that passes the laboratory test
with X1 = 3 and X2 = 7. Without another expensive survey, can we guess
what the classification of this new tissue is?
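A k-nearest-neighbour vote over the four rows gives one way to answer question 2. The sketch below is an illustration only: the question does not fix k or the distance measure, so k = 3 and Euclidean distance are assumptions, and `knn_classify` is a hypothetical helper name.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

# (X1 = acid durability, X2 = strength) -> classification, from the table.
training_data = [
    ((7, 7), "Bad"),
    ((7, 4), "Bad"),
    ((3, 4), "Good"),
    ((1, 4), "Good"),
]

def knn_classify(query, data, k=3):
    """Majority vote among the k training points closest to the query.
    k=3 is an ASSUMED setting; the question does not specify it."""
    nearest = sorted(data, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

prediction = knn_classify((3, 7), training_data, k=3)
print(prediction)  # Good
```

The squared distances from (3, 7) are 16, 25, 9, and 13, so the three nearest neighbours are two Good samples and one Bad sample.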
4. Derive the gradient descent rule for a distance-weighted local linear CO4.2 K3
approximation to the target function.
5. Consider the following alternative method for accounting for distance CO4.3 K3
in weighted local regression. Create a virtual set of training examples
D' as follows: for each training example (x, f(x)) in the original data
set D, create some (possibly fractional) number of copies of (x, f(x))
in D', where the number of copies is K(d(xq, x)) and xq is the query
instance. Now train a linear approximation to minimize the error criterion.
The idea here is to make more copies of training examples that are
near the query instance, and fewer of those that are distant. Derive the
gradient descent rule for this criterion. Express the rule in the form of
a sum over members of D rather than D'.
6. Describe the two stage process of the RBF networks in detail. CO 4.3 K2
7. Compare the disadvantages and advantages of lazy and eager CO4.4 K2
learning.
8. Illustrate several generic properties of case - based reasoning systems. CO 4.5 K3
UNIT V
S.NO QUESTIONS CO LEVEL K LEVEL
PART A
22. In statistical terms, this represents the weighted average score. CO5.16 K1
(A) Variance
(B) Mean
(C) Median
(D) Mode
24. What would be the relationship between the training time taken by CO5.16 K2
1-NN, 2-NN, and 3-NN?
(A) Accuracy
(B) Recall
(C) Error rate
(D) Purity
29. Which of the following is/are limitations of deep learning? CO5.17 K1
(A) Data labeling
(B) Obtaining huge training data sets
(C) Both A and B
(D) None of these
30. Which of the following is well suited for perceptual tasks? CO5.17 K1
(A) Feed forward neural network
(B) Recurrent neural network
(C) Convolutional network
(D) Reinforcement learning
PART B
4. Briefly explain how FOIL handles noisy or incomplete data. How does FOIL CO5.4 K2
handle missing or unknown values in the data?
5. Apply inverse resolution in propositional form to the clauses C = A ∨ B, CO5.6 K3
C1 = A ∨ B ∨ G. Give at least two possible results for C2.
7. List some of the open research questions or challenges in the field CO5.7 K2
of explanation-based generalization. How might Prolog-EBG
contribute to addressing them?
8. Consider learning the target concept Good Credit Risk defined over CO5.9 K4
instances described by the four attributes Has Student Loan, Has
Savings Account, Is Student, Owns Car. Give the initial network
created by KBANN for the following domain theory, including all
network connections and weights.
Good Credit Risk ← Employed, Low Debt
Employed ← ¬Is Student
Low Debt ← ¬Has Student Loan, Has Savings Account
9. A company wants to optimize their online ad campaigns in order to CO5.14 K2
maximize conversions (e.g. clicks, sign-ups, purchases) while
minimizing the cost per conversion. They have access to historical data
on ad impressions, clicks, and conversions, as well as data on the cost
of each ad. The company has decided to use reinforcement learning to
improve their ad campaign performance.
(a) How might the company set up the reinforcement learning
problem? What would be the state, action, and reward spaces?
10. Explain the concept of Q-learning and how it can be used to solve CO5.15 K2
a reinforcement learning problem.
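As a point of reference for the Q-learning question, the core tabular update can be sketched as below. This is a minimal illustration rather than a full agent: the state and action names, the learning rate `alpha`, and the discount `gamma` are all hypothetical values chosen for the example.

```python
def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """One tabular Q-learning step. Returns the TD error
    r + gamma * max_a' Q(s', a') - Q(s, a); unseen entries count as 0."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    current = q_table.get((state, action), 0.0)
    td_error = reward + gamma * best_next - current
    q_table[(state, action)] = current + alpha * td_error
    return td_error

# Hypothetical two-action problem; the same transition is observed twice.
actions = ["left", "right"]
q_table = {}
first_error = q_update(q_table, "s0", "right", 1.0, "s1", actions)
second_error = q_update(q_table, "s0", "right", 1.0, "s1", actions)
print(first_error, second_error, q_table[("s0", "right")])  # 1.0 0.5 0.75
```

The shrinking TD error on the repeated transition shows the estimate moving toward the observed return; the same error term is what a TD learning update uses for the value function.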
11. State the TD error and how it is used to update the value function in CO5.16 K2
TD learning.
How does TD learning differ from Monte Carlo methods and
dynamic programming methods?
12. How would the company evaluate the performance of their neural CO5.17 K2
network on the image classification task? What metrics might they
use to measure accuracy and generalization?
PART C
1. Elaborate in detail on learning sets of rules and state how it CO5.1 K2
differs from other algorithms.
2. (i) Illustrate the diagram for the search for rule preconditions as CO5.1 K3
LEARN-ONE-RULE proceeds from general to specific.
(ii) Discuss the implementation algorithm for LEARN-ONE-RULE.
3 Refine the LEARN-ONE-RULE algorithm so that it can learn rules CO5.2 K3
whose preconditions include constraints such as nationality ∈
{Canadian, Brazilian}, where a discrete-valued attribute is allowed to
take on any value in some specified set. Your modified program
should explore the hypothesis space containing all such subsets.
Specify your new algorithm as a set of editing changes to the
algorithm.
4 Consider a sequential covering algorithm such as CN2 and a CO5.2 K3
simultaneous covering algorithm such as ID3. Both algorithms are to
be used to learn a target concept defined over instances represented
by conjunctions of n boolean attributes. If ID3
learns a balanced decision tree of depth d, it will contain 2^d - 1
distinct decision nodes, and therefore will have made 2^d - 1 distinct
choices while constructing its output hypothesis. How many rules
will be formed if this tree is re-expressed as a disjunctive set of
rules? How many preconditions will each rule possess? How many
distinct choices would a sequential covering algorithm have to make
to learn this same set of rules? Which system do you suspect would
be more prone to overfitting if both were given the same training
data?
5. Apply inverse resolution to the clauses C = R(B, x) ∨ P(x, A) and CO5.5 K3
C1 = S(B, y) ∨ R(z, x). Give at least four possible results for C2. Here A
and B are constants, x and y are variables.
6. Consider the bottom-most inverse resolution step, and derive at least two CO5.5 K3
different outcomes that could result given different choices for the
substitutions θ1 and θ2. Derive a result for the inverse resolution step
if the clause Father(Tom, Bob) is used in place of Father(Shannon,
Tom).
(b) Will your program succeed if the opponent plays optimally rather
than randomly?