Resolution
Resolution is a theorem-proving technique that proceeds by building refutation proofs, i.e.,
proofs by contradiction. It was invented by the mathematician John Alan Robinson in 1965.
Resolution is used when several statements are given and we need to prove a conclusion
from those statements. Unification is a key concept in resolution proofs.
Resolution is a single inference rule which can efficiently operate on the conjunctive normal
form or clausal form.
Clause: A disjunction of literals (atomic sentences) is called a clause. A clause consisting of a
single literal is also known as a unit clause.
The resolution rule for first-order logic is simply a lifted version of the
propositional rule. Resolution can resolve two clauses if they contain
complementary literals, which are assumed to be standardized apart so that
they share no variables.
Step-1: Conversion of Facts into FOL
In the first step we convert all the given statements into first-order logic.
Step-2: Conversion of FOL into CNF
In first-order logic resolution, it is required to convert the FOL statements into conjunctive normal
form (CNF), because the CNF form makes resolution proofs easier.
Eliminate all implications (→) and rewrite
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ [eats(x, y) Λ ¬ killed(x)] V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x¬ [¬ killed(x) ] V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
Move negation (¬) inwards and rewrite
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ eats(x, y) V killed(x) V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x killed(x) V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
Rename variables or standardize variables
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀y ∀z ¬ eats(y, z) V killed(y) V food(z)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀w ¬ eats(Anil, w) V eats(Harry, w)
f. ∀g killed(g) V alive(g)
g. ∀k ¬ alive(k) V ¬ killed(k)
h. likes(John, Peanuts).
Eliminate existential quantifiers (Skolemization)
In this step, we eliminate the existential quantifier ∃; this process is known as
Skolemization. In this example problem there is no existential quantifier,
so all the statements remain the same in this step.
Drop universal quantifiers
In this step we drop all universal quantifiers, since all the remaining variables are
implicitly universally quantified, so we no longer need them. The conjunctions
"food(Apple) Λ food(vegetables)" and "eats(Anil, Peanuts) Λ alive(Anil)" are also
written as separate clauses:
a. ¬ food(x) V likes(John, x)
b. food(Apple)
c. food(vegetables)
d. ¬ eats(y, z) V killed(y) V food(z)
e. eats (Anil, Peanuts)
f. alive(Anil)
g. ¬ eats(Anil, w) V eats(Harry, w)
h. killed(g) V alive(g)
i. ¬ alive(k) V ¬ killed(k)
j. likes(John, Peanuts).
Distribute conjunction ∧ over disjunction ∨.
This step will not make any change in this problem.
Step-3: Negate the statement to be proved
In this step, we apply negation to the conclusion statement, which is written as
¬ likes(John, Peanuts).
Step-4: Draw the resolution graph
Now in this step, we solve the problem using a resolution tree and substitution.
For the above problem, the resolution graph is built as follows.
Hence the negated conclusion leads to a contradiction with the given set of
statements, which proves the original conclusion likes(John, Peanuts).
Explanation of Resolution graph:
● In the first step of the resolution graph, ¬ likes(John, Peanuts) and likes(John, x) get
resolved (canceled) by the substitution {Peanuts/x}, and we are left with ¬ food(Peanuts).
● In the second step of the resolution graph, ¬ food(Peanuts) and food(z) get resolved
(canceled) by the substitution {Peanuts/z}, and we are left with ¬ eats(y, Peanuts) V
killed(y).
● In the third step of the resolution graph, ¬ eats(y, Peanuts) and eats(Anil, Peanuts) get
resolved by the substitution {Anil/y}, and we are left with killed(Anil).
● In the fourth step of the resolution graph, killed(Anil) and ¬ killed(k) get resolved by the
substitution {Anil/k}, and we are left with ¬ alive(Anil).
● In the last step of the resolution graph, ¬ alive(Anil) and alive(Anil) get resolved,
producing the empty clause and completing the refutation.
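The chain of resolutions above can be replayed in a few lines of Python. The sketch below is illustrative only (it is not code from the referenced sources); the substitutions {Peanuts/x}, {Peanuts/z}, {Anil/y} and {Anil/k} are applied by hand, so every clause is ground and each resolution step just cancels one complementary pair of literals.

```python
# Replaying the resolution graph with ground clauses (substitutions pre-applied).

def resolve(c1, c2):
    """Return the resolvent of two clauses (sets of literal strings),
    or None if they contain no complementary pair."""
    for lit in c1:
        comp = lit[1:] if lit.startswith("~") else "~" + lit
        if comp in c2:
            return (c1 - {lit}) | (c2 - {comp})
    return None

negated_goal = {"~likes(John,Peanuts)"}
steps = [
    {"~food(Peanuts)", "likes(John,Peanuts)"},                  # clause a with {Peanuts/x}
    {"~eats(Anil,Peanuts)", "killed(Anil)", "food(Peanuts)"},   # clause d with {Anil/y, Peanuts/z}
    {"eats(Anil,Peanuts)"},                                     # clause e
    {"~alive(Anil)", "~killed(Anil)"},                          # clause i with {Anil/k}
    {"alive(Anil)"},                                            # clause f
]

clause = negated_goal
for other in steps:
    clause = resolve(clause, other)
    print(clause)
```

The final print shows an empty set, i.e. the empty clause: the contradiction that completes the proof.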
Reference
https://www.javatpoint.com/ai-resolution-in-first-order-logic
https://www.youtube.com/watch?v=C_iqWGOhvak
https://athena.ecs.csus.edu/~mei/logicp/unification-resolution/resolution-refutation.html
Classification vs Regression / Logistic Regression
Regression: predicting a continuous value by learning the relationship between dependent and
independent variables. The output is a numeric value, e.g. linear regression (possibly multivariate).
Classification: identifying which category a new observation belongs to.
Logistic Regression estimates the probability that an instance belongs to a particular class.
Just like linear regression, logistic regression computes a weighted sum of the inputs, but
instead of outputting that continuous value directly, it outputs its sigmoid. There is no
analytical solution for the weights, but the cost function to optimize is convex, so gradient
descent can be used to solve the problem.
From the confusion matrix (TP, FP, TN, FN):
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-score = 2 x (Precision x Recall) / (Precision + Recall)
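As a concrete illustration of the gradient-descent training described above, here is a minimal NumPy sketch (assumed code, not from the notes); the toy data, learning rate and epoch count are made up for demonstration, and the confusion-matrix metrics defined above are computed at the end.

```python
# Minimal logistic regression trained by gradient descent on the convex log-loss.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Gradient descent on the log-loss; returns weights and bias."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)          # P(class = 1 | x)
        w -= lr * (X.T @ (p - y)) / n   # gradient of the log-loss w.r.t. w
        b -= lr * np.mean(p - y)
    return w, b

# Toy 1-D data: class 1 tends to appear for larger x (illustrative only).
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = fit_logistic(X, y)
pred = (sigmoid(X @ w + b) > 0.5).astype(int)

tp = np.sum((pred == 1) & (y == 1))
fp = np.sum((pred == 1) & (y == 0))
fn = np.sum((pred == 0) & (y == 1))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(pred, precision, recall, f1)
```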
More Answers for Practice in Logic and HW 1.doc (Ling 310)

15. Redo the translations of sentences 1, 4, 6, and 7, making use of the predicate
person, as we would have to do if the domain D contains not only humans but cats,
robots, and other entities.

1'. Everyone loves Mary.
∀x (person(x) → love(x, Mary))

4'. Everyone loves someone. (Ambiguous)
(i) ∀x (person(x) → ∃y (person(y) & love(x, y)))   (For every person x, there is some person y whom x loves.)
(ii) ∃y (person(y) & ∀x (person(x) → love(x, y)))   (There is some person y whom every person x loves.)

An equivalent correct answer for (i): ∀x ∃y (person(x) → (person(y) & love(x, y)))
But I don't recommend moving the second quantifier, because then it's too easy to
come up with the following wrong answer for (i): ∀x ∃y ((person(x) & person(y)) →
love(x, y)). It's always safer to keep a quantifier and its "restrictor" (in this case
person) as close together as possible, and both of them as close to their surface
position as possible.
Resolution in First-Order Logic

The resolution rule for first-order logic is simply a lifted version of the propositional rule.
Resolution can resolve two clauses if they contain complementary literals, which are assumed to be
standardized apart so that they share no variables. This rule is also called the binary resolution
rule because it resolves exactly two literals.

Example:
We can resolve the two clauses given below:
[Animal(g(x)) V Loves(f(x), x)] and [¬ Loves(a, b) V ¬ Kills(a, b)]
where the two complementary literals are Loves(f(x), x) and ¬ Loves(a, b).
These literals can be unified with the unifier θ = {a/f(x), b/x}, and resolving on them generates the
resolvent clause:
[Animal(g(x)) V ¬ Kills(f(x), x)].

Steps for Resolution:
1. Conversion of facts into first-order logic.
2. Convert FOL statements into CNF.
3. Negate the statement to be proved.
4. Draw the resolution graph (proof by refutation).
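To make the unification step in the example above concrete, here is a small Python sketch of syntactic unification on terms encoded as nested tuples. It is an assumed helper written for these notes (not code from the source), and the occurs-check is omitted for brevity.

```python
# Syntactic unification for terms written as nested tuples,
# e.g. Loves(f(x), x) -> ('Loves', ('f', 'x'), 'x').
# A lowercase string standing alone is a variable; a tuple's first element is its functor.

def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def substitute(t, theta):
    """Apply the substitution theta to a term."""
    if is_var(t):
        return substitute(theta[t], theta) if t in theta else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(arg, theta) for arg in t[1:])
    return t

def unify(t1, t2, theta=None):
    """Return a most general unifier (a dict) of t1 and t2, or None on failure."""
    theta = {} if theta is None else theta
    t1, t2 = substitute(t1, theta), substitute(t2, theta)
    if t1 == t2:
        return theta
    if is_var(t2):
        return {**theta, t2: t1}
    if is_var(t1):
        return {**theta, t1: t2}
    if isinstance(t1, tuple) and isinstance(t2, tuple) \
            and t1[0] == t2[0] and len(t1) == len(t2):
        for a, b in zip(t1[1:], t2[1:]):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None

# Unifying Loves(f(x), x) with Loves(a, b) yields {a: f(x), b: x},
# matching the unifier θ = {a/f(x), b/x} in the example.
print(unify(('Loves', ('f', 'x'), 'x'), ('Loves', 'a', 'b')))
```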
Inference in First-Order Logic

In First-Order Logic, inference is used to derive new facts or sentences from existing ones. Before we
get into the FOL inference rules, it's important to understand some basic FOL terminology.

Substitution:

Substitution is a basic operation that is applied to terms and formulas, and it can be found in all
first-order logic inference systems. When there are quantifiers in FOL, substitution becomes
more complicated. When we write F[a/x], we are referring to the substitution of a constant "a" for
the variable "x".

[ Note: first-order logic can convey facts about some or all of the universe's objects. ]

Equality:

In First-Order Logic, atomic sentences are formed not only via the use of predicates and terms, but
also through the application of equality. We can do this by using equality symbols, which indicate
that two terms refer to the same object.

First-order logic has inference rules similar to propositional logic, so here are some basic
inference rules in FOL:
• Universal Generalization
• Universal Instantiation
• Existential Instantiation
• Existential Introduction

1. Universal Generalization:

• Universal generalization is a valid inference rule which states that if premise P(c) is true for any
arbitrary element c in the universe of discourse, then we can arrive at the conclusion ∀x P(x).
• It can be represented as: from P(c), infer ∀x P(x).
• If we want to prove that every element has a similar property, we can apply this rule.
• x must not be used as a free variable in this rule.

Example: Let's represent P(c) as "A byte contains 8 bits"; then "All bytes contain 8 bits", i.e.
∀x P(x), will also be true.

2. Universal Instantiation:

• Universal instantiation, also known as universal elimination or UI, is a valid inference rule. It
can be used to add additional sentences many times.
• The new knowledge base is logically equivalent to the previous knowledge base.
• According to UI, we can infer any sentence obtained by substituting a ground term for the variable.
• The UI rule says that we can infer any sentence P(c) by substituting a ground term c (a constant
within the domain of x) for the variable x in ∀x P(x), for any object in the universe of discourse.
• It can be represented as: from ∀x P(x), infer P(c).

Example: IF "Every person likes ice-cream" => ∀x P(x), then we can infer that
"John likes ice-cream" => P(c).

3. Existential Instantiation:

• Existential instantiation is also known as Existential Elimination, and it is a valid first-order
logic inference rule.
• It can only be used to replace the existential sentence once.
• Although the new KB is not logically equivalent to the old KB, it will be satisfiable if the old
KB was satisfiable.
• This rule states that for a new constant symbol c, one can deduce P(c) from a formula of the
form ∃x P(x).
• The only constraint with this rule is that c must be a new term for which P(c) is true.
• It can be represented as: from ∃x P(x), infer P(c).

4. Existential Introduction:

• Existential introduction, also known as existential generalization, is a valid inference rule in
first-order logic.
• This rule states that if some element c in the universe of discourse has the property P, we
can infer that something in the universe has the property P.
• It can be represented as: from P(c), infer ∃x P(x).
• Example: "Priyanka got good marks in English."
"Therefore, someone got good marks in English."

Generalized Modus Ponens Rule:

In FOL, we use a single inference rule called Generalized Modus Ponens for the inference process.
It is a modified form of Modus Ponens: "P implies Q, and P is declared to be true, therefore Q must
be true" summarizes Generalized Modus Ponens.

Generalized Modus Ponens states that for atomic sentences pᵢ, pᵢ′ and q, where there is a substitution
θ such that SUBST(θ, pᵢ′) = SUBST(θ, pᵢ) for all i, it can be represented as:

From p1′, p2′, …, pn′ and (p1 ∧ p2 ∧ … ∧ pn ⇒ q), infer SUBST(θ, q).

Example: We will use this rule for "kings are evil": we will find some x such that x is a king and
x is greedy, so we can infer that x is evil.
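To make Generalized Modus Ponens concrete, the sketch below applies the rule King(x) ∧ Greedy(x) ⇒ Evil(x) to a small set of facts. It is an assumed Python illustration, not code from the source; the facts King(John), Greedy(John) and King(Richard) are the standard textbook instantiation of the "kings are evil" example.

```python
# Generalized Modus Ponens for ground facts and a single rule
# King(x) & Greedy(x) => Evil(x). Lowercase argument strings are variables.

facts = {("King", "John"), ("Greedy", "John"), ("King", "Richard")}
rule_premises = [("King", "x"), ("Greedy", "x")]
rule_conclusion = ("Evil", "x")

def match(premise, fact, theta):
    """Extend substitution theta so that premise matches fact, or return None."""
    if premise[0] != fact[0]:
        return None
    theta = dict(theta)
    for p_arg, f_arg in zip(premise[1:], fact[1:]):
        if p_arg.islower():                      # variable: bind or check binding
            if theta.get(p_arg, f_arg) != f_arg:
                return None
            theta[p_arg] = f_arg
        elif p_arg != f_arg:                     # constant mismatch
            return None
    return theta

def gmp(premises, conclusion, theta=None):
    """Yield every instantiation of the conclusion supported by the facts."""
    theta = {} if theta is None else theta
    if not premises:
        yield tuple(theta.get(a, a) for a in conclusion)
        return
    first, rest = premises[0], premises[1:]
    for fact in facts:
        t = match(first, fact, theta)
        if t is not None:
            yield from gmp(rest, conclusion, t)

print(set(gmp(rule_premises, rule_conclusion)))   # {('Evil', 'John')}
```

Only John satisfies both premises, so θ = {x/John} and the instantiated conclusion Evil(John) is inferred; Richard is a king but not known to be greedy, so no conclusion is drawn about him.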
Naive Bayes

Naive Bayes is a machine learning model that is used for classification problems. The core of the
classifier depends on the Bayes theorem with an assumption of independence among predictors. That
means changing the value of one feature doesn't change the value of another feature.

Why is it called Naive?

It is called Naive because of the assumption that two variables are independent when they may not be.
In a real-world scenario, there is hardly any situation where the features are truly independent.

Naive Bayes does seem to be a simple yet powerful algorithm. But why is it so popular? Since it is a
probabilistic approach, the predictions can be made very quickly, and it can be used for both binary
and multi-class classification problems.

Before we dive deeper into this topic, we need to understand what "conditional probability" is, what
"Bayes' theorem" is, and how conditional probability helps us in Bayes' theorem.

Conditional Probability for Naive Bayes
Conditional probability is defined as the likelihood of an event or outcome occurring, based on the
occurrence of a previous event or outcome. Conditional probability is calculated by multiplying the
probability of the preceding event by the updated probability of the succeeding, or conditional, event.

Suppose I ask you to pick a card from the deck and find the probability of getting a king, given that
the card is clubs. Observe carefully that here I have mentioned a condition: that the card is clubs.
Now, while calculating the probability, my denominator will not be 52; instead, it will be 13, because
the total number of cards in clubs is 13. Since we have only one king in clubs, the probability of
getting a king given that the card is clubs will be 1/13 = 0.077.

Let's take one more example. Consider a random experiment of tossing 2 coins. The sample space here
will be S = {HH, HT, TH, TT}. If a person is asked to find the probability of getting a tail, his
answer would be 3/4 = 0.75. Now suppose this same experiment is performed by another person, but now
we give him the condition that both the coins should have heads. This means that if event A, 'Both the
coins should have heads', has happened, then the elementary outcomes {HT, TH, TT} could not have
happened. Hence in this situation the probability of getting heads on both the coins changes: without
the condition it is 1/4 = 0.25, but given event A it is certain.

From the above examples, we observe that the probability may change if some additional information is
given to us. This is exactly the case while building any machine learning model: we need to find the
output given some features.

Mathematically, the conditional probability of event A, given that event B has already happened, is

P(A | B) = P(A ∩ B) / P(B)

Bayes' Rule

Now we are prepared to state one of the most useful results in conditional probability: Bayes' Rule.
Bayes' theorem, which was given by Thomas Bayes, a British mathematician, in 1763, provides a means
for calculating the probability of an event given some information:

P(A | B) = P(B | A) · P(A) / P(B)

Basically, we are trying to find the probability of event A, given that event B is true. Here P(A) is
called the prior probability, which means it is the probability of the event before the evidence is
seen, and P(A | B) is called the posterior probability, i.e., the probability of the event after the
evidence is seen.
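The card example and Bayes' Rule can be checked by direct enumeration. The short sketch below is illustrative only (not part of the original article); it builds the 52-card deck and computes the conditional probability both by definition and via Bayes' Rule.

```python
# Verifying the card example and Bayes' Rule by enumeration.
from fractions import Fraction

suits = ["Clubs", "Diamonds", "Hearts", "Spades"]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [(rank, suit) for suit in suits for rank in ranks]

def prob(event):
    """Probability of an event (a predicate over cards) under a uniform deck."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

is_king = lambda card: card[0] == "K"
is_club = lambda card: card[1] == "Clubs"

# Conditional probability by definition: P(A|B) = P(A and B) / P(B)
p_king_given_club = prob(lambda c: is_king(c) and is_club(c)) / prob(is_club)
print(p_king_given_club)                                    # 1/13, i.e. about 0.077

# Bayes' Rule: P(A|B) = P(B|A) * P(A) / P(B) gives the same number
p_club_given_king = prob(lambda c: is_king(c) and is_club(c)) / prob(is_king)
print(p_club_given_king * prob(is_king) / prob(is_club))    # 1/13 again
```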
What is Naive Bayes?

Bayes' rule provides us with the formula for the probability of Y given some feature X. In real-world
problems we hardly find any case where there is only one feature. When there are multiple X variables,
we simplify the formula by assuming that the X's are independent. This assumption, that the features
are independent, means that changing the value of one feature doesn't influence the values of the
other variables, and this is why we call the algorithm "NAIVE". Since the denominator is constant
here, we can remove it. It's purely your choice if you want to remove it or not; removing the
denominator will help you save time and calculations.

Naive Bayes can be used for various things like face recognition, weather prediction, medical
diagnosis, and so on. There are a whole lot of formulas mentioned here, but worry not, we will try to
understand all of this with the help of an example.

Naive Bayes Example

The example uses a small dataset of animals (Dog, Cow, etc.) with features such as Size and Color,
and the target is whether we can pet the animal. Two assumptions are made:

· All the variables are independent. That is, if the animal is Dog, that doesn't mean that Size
will be Medium.
· All the predictors have an equal effect on the outcome. That is, the animal being Dog does not
have more importance in deciding if we can pet it or not. All the features have equal importance.

We should try to apply the Naive Bayes formula on the above dataset; however, before that, we need
to do some pre-computations on the dataset: the likelihood P(xᵢ | y) of each feature value given each
class, and the prior probability P(y) of each class of the target column, which are demonstrated below.
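Because the example's animal table is not reproduced here, the sketch below uses a small made-up stand-in table of the same shape (Animal, Size, Color and a Yes/No pet label) purely to show how the priors P(y) and the likelihood tables P(xᵢ | y) would be pre-computed; the rows and the resulting numbers are hypothetical, not the article's.

```python
# Hypothetical stand-in rows (NOT the article's table) with columns
# Animal, Size, Color and target Pet ("Yes"/"No"), used only to show the
# pre-computation of priors P(y) and likelihoods P(x_i | y).
# (Zero-frequency smoothing is ignored for brevity.)
from collections import Counter, defaultdict

rows = [
    ("Dog", "Medium", "Black", "Yes"),
    ("Dog", "Big",    "White", "Yes"),
    ("Cow", "Big",    "White", "No"),
    ("Cow", "Medium", "Black", "No"),
    ("Cat", "Small",  "Black", "Yes"),
    ("Dog", "Small",  "White", "Yes"),
]
features = ["Animal", "Size", "Color"]

# Prior P(y): relative frequency of each class in the target column.
class_counts = Counter(r[-1] for r in rows)
priors = {y: c / len(rows) for y, c in class_counts.items()}

# Likelihood P(x_i = v | y): frequency of value v among rows of class y.
likelihoods = defaultdict(dict)      # likelihoods[(feature, value)][class] = probability
for i, feat in enumerate(features):
    for y in class_counts:
        rows_y = [r for r in rows if r[-1] == y]
        counts = Counter(r[i] for r in rows_y)
        for value, c in counts.items():
            likelihoods[(feat, value)][y] = c / len(rows_y)

print(priors)
print(likelihoods[("Animal", "Dog")])   # e.g. P(Animal = Dog | class)
```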
Now if we send our test data, suppose test = (Cow, Medium, Black). The conditional probabilities
change here since we have different values now.

Probability of petting the animal: P(Yes | test) ∝ P(Cow | Yes) · P(Medium | Yes) · P(Black | Yes) · P(Yes)
And the probability of not petting the animal: P(No | test) ∝ P(Cow | No) · P(Medium | No) · P(Black | No) · P(No)
We know P(Yes | test) + P(No | test) = 1, so the two scores can be normalized to obtain the final
probabilities.

So far, we have discussed how to predict probabilities if the predictors take discrete values. But
what if they are continuous? For this, we need to make some more assumptions regarding the
distribution of each feature. The different Naive Bayes classifiers differ mainly by the assumptions
they make regarding the distribution of P(xᵢ | y). Here we'll discuss Gaussian Naive Bayes.

Gaussian Naive Bayes is used when we assume that all the continuous variables associated with each
feature are distributed according to a Gaussian (normal) distribution:

P(xᵢ | y) = (1 / √(2π σ²ᵧ)) · exp( −(xᵢ − μᵧ)² / (2σ²ᵧ) )

We can use this formula to compute the probability of the likelihoods if our data is continuous.
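A minimal from-scratch sketch of Gaussian Naive Bayes is shown below. It is an assumed implementation (not the article's code): per-class priors, means and variances are estimated, and the Gaussian likelihood formula above is applied in log space; the one-feature dataset is made up just to make the example runnable.

```python
# Gaussian Naive Bayes: per-class priors, means and variances, then the
# Gaussian likelihood P(x_i | y) from the formula above.
import numpy as np

def fit_gaussian_nb(X, y):
    """Return per-class prior, mean and variance for continuous features."""
    params = {}
    for cls in np.unique(y):
        Xc = X[y == cls]
        params[cls] = (len(Xc) / len(X),          # prior P(y)
                       Xc.mean(axis=0),           # mu_y per feature
                       Xc.var(axis=0) + 1e-9)     # sigma^2_y (epsilon for stability)
    return params

def predict(x, params):
    """Pick the class maximizing log P(y) + sum_i log P(x_i | y)."""
    scores = {}
    for cls, (prior, mu, var) in params.items():
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        scores[cls] = np.log(prior) + log_lik
    return max(scores, key=scores.get)

# Made-up 1-feature data: class 0 clusters near 1.0, class 1 near 5.0.
X = np.array([[0.8], [1.1], [1.3], [4.9], [5.2], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
params = fit_gaussian_nb(X, y)
print(predict(np.array([1.0]), params), predict(np.array([5.1]), params))  # 0 1
```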
Naive Bayes algorithms are mostly used in face recognition, weather prediction, medical diagnosis,
news classification, sentiment analysis, etc. In this article, we learned the mathematical intuition
behind this algorithm, and you have already taken your first step towards mastering it.
In simple words, a decision tree is a structure that contains nodes (rectangular boxes) and edges
(arrows) and is built from a dataset (a table whose columns represent features/attributes and whose
rows correspond to records). Each node is either used to make a decision (known as a decision node)
or to represent an outcome (known as a leaf node).

Decision tree Example

The example decision tree here is used to classify whether a person is Fit or Unfit. The decision
nodes are questions like 'Is the person less than 30 years of age?', 'Does the person eat junk food?',
etc., and the leaves are one of the two possible outcomes, viz. Fit and Unfit.

Looking at the decision tree we can make the following decisions: if a person is less than 30 years
of age and doesn't eat junk food then he is Fit; if a person is less than 30 years of age and eats
junk food then he is Unfit; and so on.

The initial node is called the root node (colored in blue), the final nodes are called the leaf nodes
(colored in green) and the rest of the nodes are called intermediate or internal nodes. The root and
intermediate nodes represent the decisions while the leaf nodes represent the outcomes.

ID3 stands for Iterative Dichotomiser 3 and is named such because the algorithm iteratively
(repeatedly) dichotomizes (divides) features into two or more groups at each step.

Invented by Ross Quinlan, ID3 uses a top-down greedy approach to build a decision tree. In simple
words, the top-down approach means that we start building the tree from the top, and the greedy
approach means that at each iteration we select the best feature at the present moment to create
a node.

Dataset description

In this article, we'll be using a sample dataset of COVID-19 infection. A preview of the entire
dataset is shown below.

+----+-------+-------+------------------+----------+
| ID | Fever | Cough | Breathing issues | Infected |
+----+-------+-------+------------------+----------+
| 1  | NO    | NO    | NO               | NO       |
| 2  | YES   | YES   | YES              | YES      |
| 3  | YES   | YES   | NO               | NO       |
| 4  | YES   | NO    | YES              | YES      |
| 5  | YES   | YES   | YES              | YES      |
| 6  | NO    | YES   | NO               | NO       |
| 7  | YES   | NO    | YES              | YES      |
| 8  | YES   | NO    | YES              | YES      |
| 9  | NO    | YES   | YES              | YES      |
| 10 | YES   | YES   | NO               | YES      |
| 11 | NO    | YES   | NO               | NO       |
| 12 | NO    | YES   | YES              | YES      |
| 13 | NO    | YES   | YES              | NO       |
| 14 | YES   | YES   | NO               | NO       |
+----+-------+-------+------------------+----------+
The columns are self-explanatory. Y and N stand for Yes and No respectively. The values or classes in
the Infected column, Y and N, represent Infected and Not Infected respectively.

The columns used to make decision nodes, viz. 'Breathing Issues', 'Cough' and 'Fever', are called
feature columns or just features, and the column used for leaf nodes, i.e. 'Infected', is called the
target column.

Metrics in ID3

As mentioned previously, the ID3 algorithm selects the best feature at each step while building a
decision tree. Before you ask, the answer to the question 'How does ID3 select the best feature?' is
that ID3 uses Information Gain, or just Gain, to find the best feature.

Information Gain calculates the reduction in the entropy and measures how well a given feature
separates or classifies the target classes. The feature with the highest Information Gain is selected
as the best one.

In simple words, Entropy is the measure of disorder, and the Entropy of a dataset is the measure of
disorder in the target feature of the dataset. In the case of binary classification (where the target
column has only two types of classes) entropy is 0 if all values in the target column are homogeneous
(similar) and will be 1 if the target column has an equal number of values for both classes.

Entropy(S) = - ∑ pᵢ * log₂(pᵢ) ; i = 1 to n

where,
n is the total number of classes in the target column (in our case n = 2, i.e. YES and NO)
pᵢ is the probability of class 'i', i.e. the ratio of the "number of rows with class i in the target
column" to the "total number of rows" in the dataset.

Information Gain for a feature column A is calculated as:

IG(S, A) = Entropy(S) - ∑((|Sᵥ| / |S|) * Entropy(Sᵥ))

where Sᵥ is the set of rows in S for which the feature column A has value v, |Sᵥ| is the number of
rows in Sᵥ and likewise |S| is the number of rows in S.

ID3 Steps

1. Calculate the Information Gain of each feature.
2. Considering that all rows don't belong to the same class, split the dataset S into subsets using
the feature for which the Information Gain is maximum.
3. Make a decision tree node using the feature with the maximum Information Gain.
4. If all rows belong to the same class, make the current node a leaf node with the class as its label.
5. Repeat for the remaining features until we run out of features, or the decision tree has all leaf
nodes.
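The metrics and the five steps above can be sketched in plain Python using the 14-row table from the dataset section. The code below is an illustrative implementation assumed for these notes (not the article's): it computes Entropy(S) and IG(S, A) exactly as defined and then builds the tree recursively.

```python
# ID3 sketch on the COVID-19 table above: entropy, information gain, and a
# recursive tree build following the five steps.
import math
from collections import Counter

FEATURES = ["Fever", "Cough", "Breathing issues"]
# Rows: (Fever, Cough, Breathing issues, Infected) copied from the table.
DATA = [
    ("NO","NO","NO","NO"), ("YES","YES","YES","YES"), ("YES","YES","NO","NO"),
    ("YES","NO","YES","YES"), ("YES","YES","YES","YES"), ("NO","YES","NO","NO"),
    ("YES","NO","YES","YES"), ("YES","NO","YES","YES"), ("NO","YES","YES","YES"),
    ("YES","YES","NO","YES"), ("NO","YES","NO","NO"), ("NO","YES","YES","YES"),
    ("NO","YES","YES","NO"), ("YES","YES","NO","NO"),
]

def entropy(rows):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the target column."""
    counts = Counter(r[-1] for r in rows)
    return -sum((c / len(rows)) * math.log2(c / len(rows)) for c in counts.values())

def info_gain(rows, f):
    """IG(S, A) = Entropy(S) - sum(|S_v|/|S| * Entropy(S_v))."""
    i = FEATURES.index(f)
    remainder = 0.0
    for v in set(r[i] for r in rows):
        subset = [r for r in rows if r[i] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

def id3(rows, features):
    """Recursive tree build: leaf if pure or out of features, else split on best IG."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: info_gain(rows, f))
    i = FEATURES.index(best)
    children = {v: id3([r for r in rows if r[i] == v],
                       [f for f in features if f != best])
                for v in set(r[i] for r in rows)}
    return (best, children)

print({f: round(info_gain(DATA, f), 3) for f in FEATURES})
print(id3(DATA, FEATURES))
```

With this table, the largest gain at the root is for 'Breathing issues', which is why the walkthrough below continues from that node.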
IG of Fever is greater than that of Cough, so we select Fever as the left branch of Breathing Issues.
Our tree now looks like this:

Since all the values in the target column of this subset are YES, we label the left leaf node as YES,
but to make it more logical we label it Infected.

Similarly, for the right node of Fever we see the subset of rows from the original dataset that have
Breathing Issues value as YES and Fever as NO:

+-------+-------+------------------+----------+
| Fever | Cough | Breathing issues | Infected |
+-------+-------+------------------+----------+
| NO    | YES   | YES              | YES      |
| NO    | YES   | YES              | NO       |
| NO    | YES   | YES              | NO       |
+-------+-------+------------------+----------+

Here not all but most of the values are NO, hence NO or Not Infected becomes our right leaf node.
Our tree now looks like this:

We repeat the same process for the node Cough; however, here both the left and right leaves turn out
to be the same, i.e. NO or Not Infected, as shown below:
Pruning is a mechanism that reduces the size and complexity of a decision tree by removing
unnecessary nodes.

Another drawback of ID3 is overfitting or high variance, i.e. it learns the training dataset so well
that it fails to generalize to new data.