
Resolution in FOL

Resolution
Resolution is a theorem-proving technique that proceeds by building refutation proofs, i.e., proofs by contradiction. It was invented by the mathematician John Alan Robinson in 1965.

Resolution is used when several statements are given and we need to prove a conclusion from those statements. Unification is a key concept in proofs by resolution. Resolution is a single inference rule which can efficiently operate on sentences in conjunctive normal form or clausal form.

Clause: A disjunction of literals (atomic sentences or their negations) is called a clause. A clause containing a single literal is known as a unit clause.

Conjunctive Normal Form: A sentence represented as a conjunction of clauses is said to be in conjunctive normal form or CNF.
The resolution inference rule:

The resolution rule for first-order logic is simply a lifted version of the propositional rule. Resolution can resolve two clauses if they contain complementary literals, which are assumed to be standardized apart so that they share no variables.

    l1 V ··· V lk,          m1 V ··· V mn
    ------------------------------------------------------------------------------
    SUBST(θ, l1 V ··· V li-1 V li+1 V ··· V lk V m1 V ··· V mj-1 V mj+1 V ··· V mn)

where UNIFY(li, ¬mj) = θ, i.e., li and mj are complementary literals.


This rule is also called the binary resolution rule because it resolves exactly two literals.
Example:

We can resolve the two clauses given below:

[Animal(g(x)) V Loves(f(x), x)] and [¬Loves(a, b) V ¬Kills(a, b)]

where the two complementary literals are Loves(f(x), x) and ¬Loves(a, b).
These literals can be unified with the unifier θ = {a/f(x), b/x}, and resolution will generate the resolvent clause:

[Animal(g(x)) V ¬Kills(f(x), x)].
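To make the unifier in this example concrete, here is a minimal Python sketch of syntactic unification (the occurs-check is omitted, and the tuple-based term representation is an assumption made only for illustration). Applied to Loves(a, b) and Loves(f(x), x) it returns the substitution θ = {a/f(x), b/x} used above.

# A minimal sketch of syntactic unification (occurs-check omitted for brevity).
# Variables are lowercase strings; compound terms are tuples such as ('f', 'x') for f(x).

def is_var(term):
    return isinstance(term, str) and term[0].islower()

def walk(term, subst):
    # Follow variable bindings already recorded in the substitution.
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def unify(t1, t2, subst=None):
    subst = {} if subst is None else subst
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        return {**subst, t1: t2}
    if is_var(t2):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2) and t1[0] == t2[0]:
        for a, b in zip(t1[1:], t2[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # clash: the terms cannot be unified

# Loves(a, b) vs Loves(f(x), x)  ->  {'a': ('f', 'x'), 'b': 'x'}, i.e. θ = {a/f(x), b/x}
print(unify(('Loves', 'a', 'b'), ('Loves', ('f', 'x'), 'x')))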
Steps for Resolution:

1. Convert the facts into first-order logic.
2. Convert the FOL statements into CNF.
3. Negate the statement to be proved (proof by contradiction).
4. Draw the resolution graph (using unification).
Example:

a. John likes all kinds of food.
b. Apples and vegetables are food.
c. Anything anyone eats without being killed is food.
d. Anil eats peanuts and is still alive.
e. Harry eats everything that Anil eats.
Prove by resolution that:
f. John likes peanuts.
Step-1: Conversion of Facts into FOL

In the first step we convert all the given statements into first-order logic:

a. ∀x food(x) → likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y [eats(x, y) Λ ¬ killed(x)] → food(y)
d. eats(Anil, Peanuts) Λ alive(Anil)
e. ∀x eats(Anil, x) → eats(Harry, x)
f. ∀x ¬ killed(x) → alive(x)    (added background fact: anyone not killed is alive)
g. ∀x alive(x) → ¬ killed(x)    (added background fact: anyone alive has not been killed)
h. likes(John, Peanuts)    (the conclusion to be proved)
Step-2: Conversion of FOL into CNF

In first-order logic resolution, it is required to convert the FOL statements into CNF, because CNF makes resolution proofs easier.

Eliminate all implications (→) and rewrite

a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ [eats(x, y) Λ ¬ killed(x)] V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x ¬[¬ killed(x)] V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
Move negation (¬) inwards and rewrite

a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ eats(x, y) V killed(x) V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x killed(x) V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
Rename variables or standardize variables

a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀y ∀z ¬ eats(y, z) V killed(y) V food(z)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀w¬ eats(Anil, w) V eats(Harry, w)
f. ∀g killed(g) V alive(g)
g. ∀k ¬ alive(k) V ¬ killed(k)
h. likes(John, Peanuts).
Eliminate existential quantifiers (Skolemization).

In this step, we eliminate the existential quantifier ∃; this process is known as
Skolemization. In this example problem there is no existential quantifier,
so all the statements remain the same in this step.

Drop universal quantifiers.

In this step we drop all universal quantifiers, since all the statements are
implicitly universally quantified and so we don't need to write them.
Note: the statements "food(Apple) Λ food(vegetables)" and "eats (Anil, Peanuts) Λ alive(Anil)" are each split into two separate clauses below.
a. ¬ food(x) V likes(John, x)
b. food(Apple)
c. food(vegetables)
d. ¬ eats(y, z) V killed(y) V food(z)
e. eats (Anil, Peanuts)
f. alive(Anil)
g. ¬ eats(Anil, w) V eats(Harry, w)
h. killed(g) V alive(g)
i. ¬ alive(k) V ¬ killed(k)
j. likes(John, Peanuts).
Distribute conjunction ∧ over disjunction ∨.
This step makes no change in this problem.

Step-3: Negate the statement to be proved

In this step, we apply negation to the conclusion statement, which is
written as ¬likes(John, Peanuts)
Step-4: Draw Resolution graph:

Now in this step, we solve the problem by building the resolution tree, using substitution.
For the above problem, the tree proceeds as described below.
Hence the negation of the conclusion leads to a contradiction with the given set of
statements, so the original conclusion, likes(John, Peanuts), is proved.
Explanation of Resolution graph:
● In the first step of the resolution graph, ¬likes(John, Peanuts) and likes(John, x) get
resolved (canceled) by the substitution {Peanuts/x}, and we are left with ¬food(Peanuts).
● In the second step of the resolution graph, ¬food(Peanuts) and food(z) get resolved
(canceled) by the substitution {Peanuts/z}, and we are left with ¬eats(y, Peanuts) V killed(y).
● In the third step of the resolution graph, ¬eats(y, Peanuts) and eats(Anil, Peanuts) get
resolved by the substitution {Anil/y}, and we are left with killed(Anil).
● In the fourth step of the resolution graph, killed(Anil) and ¬killed(k) get resolved by the
substitution {Anil/k}, and we are left with ¬alive(Anil).
● In the last step of the resolution graph, ¬alive(Anil) and alive(Anil) get resolved, producing the empty clause.
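The refutation above can be checked mechanically. Below is a minimal Python sketch (not a full first-order prover): after applying the substitutions listed in the explanation, every clause is ground, so each step reduces to propositional resolution on sets of literal strings, and the chain ends in the empty clause.

# A minimal check of the refutation: "~" marks negation; each clause is a set of literals.

def resolve(c1, c2):
    """Return the resolvent of two ground clauses, or None if they do not clash."""
    for lit in c1:
        complement = lit[1:] if lit.startswith("~") else "~" + lit
        if complement in c2:
            return (c1 - {lit}) | (c2 - {complement})
    return None

steps = [
    {"~likes(John,Peanuts)"},                                  # negated conclusion
    {"~food(Peanuts)", "likes(John,Peanuts)"},                 # clause (a) with {Peanuts/x}
    {"~eats(Anil,Peanuts)", "killed(Anil)", "food(Peanuts)"},  # clause (d) with {Anil/y, Peanuts/z}
    {"eats(Anil,Peanuts)"},                                    # clause (e)
    {"~alive(Anil)", "~killed(Anil)"},                         # clause (i) with {Anil/k}
    {"alive(Anil)"},                                           # clause (f)
]

clause = steps[0]
for nxt in steps[1:]:
    clause = resolve(clause, nxt)
    print(clause)
# The final line printed is set(), the empty clause: the refutation succeeds.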
Reference
https://www.javatpoint.com/ai-resolution-in-first-order-logic

https://www.youtube.com/watch?v=C_iqWGOhvak

https://athena.ecs.csus.edu/~mei/logicp/unification-resolution/resolution-refutation.html
Classification vs Regression

Regression: predicting a continuous value by learning the relationship between dependent and
independent variables. The output is a numeric value (e.g., linear regression, possibly multivariate).

Classification: identifying which category a new observation belongs to. The output is a class label.

Logistic Regression

Logistic Regression estimates the probability that an instance belongs to a particular class.
Just like linear regression, logistic regression also computes a weighted sum of the inputs, but
instead of outputting that continuous value directly, it outputs its sigmoid.

For the logistic regression model prediction (binary classifier) there is no analytical solution
for the weights, but the cost function to optimize is convex, so gradient descent can be used to
solve the problem.

From the confusion matrix (TN, FP, FN, TP), the evaluation metrics are:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-score = 2 x (Precision x Recall) / (Precision + Recall)
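A minimal NumPy sketch of these ideas (the feature values and weights below are made up for illustration, as if they came from gradient descent): the model computes the weighted sum of the inputs, passes it through the sigmoid, thresholds at 0.5, and the metrics are then computed from the confusion-matrix counts.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    # weighted sum of the inputs, passed through the sigmoid
    return sigmoid(X @ w + b)

# toy data: 4 examples, 2 features (illustrative values only)
X = np.array([[0.5, 1.2], [1.5, 0.3], [3.0, 2.2], [2.2, 2.9]])
y_true = np.array([0, 0, 1, 1])
w, b = np.array([0.8, 0.6]), -2.0   # assume these came from gradient descent

y_pred = (predict_proba(X, w, b) >= 0.5).astype(int)

tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)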
More Answers for Practice in Logic and HW 1 (Ling 310)

This is an expanded version showing additional right and wrong answers.

I. Practice in 1st-order predicate logic – with answers.

1. Mary loves everyone. [assuming D contains only humans]
∀x love (Mary, x)
Note: No further parentheses are needed here, and according to the syntax on the handout, no further parentheses are possible. But "extra parentheses" are in general considered acceptable, and if you find them helpful, I have no objection. So I would also count as correct any of the following:
∀x (love (Mary, x)), (∀x love (Mary, x)), (∀x (love (Mary, x)))

2. Mary loves everyone. [assuming D contains both humans and non-humans, so we need to be explicit about 'everyone' as 'every person']
∀x (person(x) → love (Mary, x))
A wrong answer: ∀x (person(x) & love (Mary, x)). This says that everything in the universe is a person and loves Mary.

3. No one talks. [assume D contains only humans unless specified otherwise.]
¬∃x talk(x) or equivalently, ∀x ¬talk(x)

4. Everyone loves himself.
∀x love (x, x)

5. Everyone loves everyone.
∀x∀y love (x, y)

6. Everyone loves everyone except himself. (= Everyone loves everyone else.)
∀x∀y(¬ x = y → love (x, y)) or ∀x∀y(x ≠ y → love (x, y))
Or maybe it should be this, which is not equivalent to the pair above:
∀x∀y(¬ x = y ↔ love (x, y)) or ∀x∀y(x ≠ y ↔ love (x, y))
The first pair allows an individual to also love himself; the second pair doesn't.

7. Every student smiles.
∀x (student(x) → smile(x))

8. Every student except George smiles.
∀x ((student(x) & x ≠ George) → smile(x))
That formula allows the possibility that George smiles too; if we want to exclude it (this depends on what you believe about except; there are subtle differences and perhaps some indeterminacy among except, besides, other than and their nearest equivalents in other languages), then it should be the following, or something equivalent to it:
∀x (student(x) → (x ≠ George ↔ smile(x)))

9. Everyone walks or talks.
∀x (walk (x) ∨ talk (x))

10. Every student walks or talks.
∀x (student(x) → (walk (x) ∨ talk (x)))

11. Every student who walks talks.
∀x ((student(x) & walk (x)) → talk (x)) or ∀x (student(x) → (walk (x) → talk (x)))

12. Every student who loves Mary is happy.
∀x ((student(x) & love (x, Mary)) → happy (x))

13. Every boy who loves Mary hates every boy who Mary loves.
∀x((boy(x) & love (x, Mary)) → ∀y((boy(y) & love(Mary, y)) → hate (x, y)))

14. Every boy who loves Mary hates every other boy who Mary loves.
(So if John loves Mary and Mary loves John, sentence 13 requires that John hates himself, but sentence 14 doesn't require that.)
∀x((boy(x) & love (x, Mary)) → ∀y((boy(y) & love(Mary, y) & y ≠ x) → hate (x, y)))

II. Homework #1, with answers.

1. Everyone loves Mary.
∀x love (x, Mary)

2. John does not love anyone. (Not ambiguous, but there are two equivalent and equally good formulas for it, one involving negation and the existential quantifier, the other involving negation and the universal quantifier. Give both.)
¬∃x love(John, x) or equivalently, ∀x ¬love(John, x)
Wrong: ∃x ¬love(John, x): That says there is someone John doesn't love.
Wrong: ¬∀x love(John, x): That says John doesn't love everyone; it's equivalent to the preceding formula.

3. Everyone who sees Mary loves Mary.
∀x (see (x, Mary) → love (x, Mary))

4. Everyone loves someone. (Ambiguous)
(i) ∀x∃y love (x, y) (For every person x, there is someone whom x loves.)
(ii) ∃y∀x love (x, y) (There is some person y whom everyone loves, i.e. everyone loves some one specific person.)

5. Someone loves everyone. (Ambiguous)
(i) ∃x∀y love (x, y) (There is some person x who loves everyone.)
(ii) ∀y∃x love (x, y) (For every person y, there is someone who loves them – i.e., no one is totally unloved. This second reading is probably dispreferred for the active sentence. It's the preferred reading for the passive sentence "Everyone is loved by someone" and it's the only reading for the agentless passive "Everyone is loved.")

6. Someone walks and talks.
∃x(walk (x) & talk (x))


7. Someone walks and someone talks.
(∃x walk (x) & ∃x talk (x)) or (∃x walk (x) & ∃y talk (y))
Because neither quantifier is inside the scope of the other – i.e. their scopes are independent – it doesn't matter whether we use different variables here or use the same variable twice. But if one quantifier is inside the scope of the other, then it matters a great deal. When one quantifier is inside the scope of another, as in questions 4 and 5 above, always give them different variables!
Also equivalent: ∃x∃y(walk (x) & talk (y))

8. Everyone who walks is calm.
∀x (walk(x) → calm(x))

9. No one who runs walks. (Not ambiguous, but same note as for number 2.)
(i) ¬∃x (run (x) & walk (x)) or equivalently,
(ii) ∀x (run(x) → ¬walk(x)) or equivalently,
(iii) ∀x ¬(run (x) & walk (x))
A wrong answer: ∀x (¬run(x) → walk(x)) What does this one say?
Another wrong answer: ¬∃x (run (x) → walk (x)) This one doesn't correspond to any English sentence; see notes to questions 11 and 6' below.

10. Everyone who Mary loves loves someone who is happy.
∀x(love (Mary, x) → ∃y(love(x, y) & happy(y)))
Also correct: ∀x∃y (love (Mary, x) → (love(x, y) & happy(y)))
But I recommend keeping each quantifier as close as possible to the noun it quantifies, or to its surface position. The more you move quantifiers around, the easier it is to make mistakes.

11. If anyone cheats, he suffers. (English paraphrases: Anyone who cheats suffers. Everyone who cheats suffers. On the subtle difference between these two, see (Kadmon and Landman 1993).)
∀x (cheat(x) → suffer(x))
A wrong answer: ∃x(cheat(x) → suffer(x)) A wide scope ∃x like this creates too weak a statement. If ∃x were given scope only over the antecedent, as in: ∃x cheat(x) → suffer(x), then that error would be corrected but there would be a new problem because the second x would not be bound.
Note on any: Sometimes anyone corresponds to ∃ and sometimes to ∀; you have to think about the meaning of the whole sentence. Many papers have been written exploring the issue of how best to account for the distribution of meanings of any, and whether it does or doesn't require lexical ambiguity as part of the account. A few classics include (Carlson 1980, Carlson 1981, Haspelmath 1997, Hintikka 1980, Kadmon and Landman 1993, Kratzer and Shimoyama 2002, Ladusaw 1980, Linebarger 1987, Vendler 1962). See also the note about any in the next item.

12. If anyone cheats, everyone suffers.
∀x (cheat(x) → ∀y suffer(y))
Equivalent: ∀x∀y (cheat(x) → suffer(y))
Also equivalent: ∀y∀x (cheat(x) → suffer(y))
Also equivalent: ∃x cheat(x) → ∀y suffer(y) (Each quantifier has narrow scope here.)
Also equivalent: ∃x cheat(x) → ∀x suffer(x) (If each quantifier has narrow scope, then they don't need to involve different variables. If one is inside the scope of the other, then they do.)
Also equivalent: ∀y(∃x cheat(x) → suffer(y))
A wrong answer: ∀y∃x (cheat(x) → suffer(y)) This has no natural English paraphrase.
A different wrong answer: ∀y(∀x cheat(x) → suffer(y)) This is one way of saying "If everyone cheats, then everyone suffers."
Another note about any: As the equivalent answers above illustrate, any in this case can be viewed either as a wide-scope universal (with scope over the if-clause) or as a narrow-scope existential (with scope inside the if-clause). The fact that these are equivalent, at least in this case, is part of the source of debates about any. In example 11, we didn't have that choice, because if any were treated as a narrow-scope existential in that case, it couldn't bind the second occurrence of the variable x corresponding to the pronoun he. The same is true for anyone in the next example, which has to be treated as a wide-scope universal in order to bind himself.

13. Anyone who loves everyone loves himself.
∀x(∀y love (x, y) → love(x, x))
Note: Not this: ∀x∀y (love (x, y) → love(x, x)) What this one says is "Anyone who loves anyone loves himself." What the correct one says is IF you love everyone, THEN you love yourself. So the ∀y quantifier has to be inside the scope of the →.
Another wrong answer: ∃x∀y(love (x, y) → love(x, x)) This has no natural English paraphrase. Any may sometimes be a wide-scope universal, and sometimes a narrow-scope existential, but it is never a "wide-scope existential."

14. Mary loves everyone except John. (For this one, you need to add the two-place predicate of identity, "=". Think of "everyone except John" as "everyone who is not identical to John".)
∀x (¬ x = John → love (Mary, x)) or equivalently
∀x (x ≠ John → love (Mary, x))
As in the case of some earlier examples, this is a 'weak' reading of except, allowing the possibility of Mary loving John. To get a 'strong' reading of except, ruling out that possibility, replace → above by ↔, or add a conjunct "& ¬ love (Mary, John)" at the end.
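The recurring warning that ∃ combined with → is "too weak" can be checked mechanically. The short Python sketch below (a made-up two-person domain, not part of the handout) enumerates all cheat/suffer assignments for sentence 11 and prints the models where the ∃-version is true even though the ∀-version, and the English sentence, are false.

from itertools import product

domain = ["john", "mary"]   # illustrative two-element domain

def forall_version(cheat, suffer):   # ∀x (cheat(x) → suffer(x))
    return all((not cheat[d]) or suffer[d] for d in domain)

def exists_version(cheat, suffer):   # ∃x (cheat(x) → suffer(x))  -- the "wrong answer"
    return any((not cheat[d]) or suffer[d] for d in domain)

for c_vals, s_vals in product(product([False, True], repeat=2), repeat=2):
    cheat = dict(zip(domain, c_vals))
    suffer = dict(zip(domain, s_vals))
    if exists_version(cheat, suffer) and not forall_version(cheat, suffer):
        print("∃-version true but ∀-version false in:", cheat, suffer)
# e.g. cheat = {john: True, mary: False}, suffer = {john: False, mary: False}:
# John cheats and does not suffer, yet ∃x (cheat(x) → suffer(x)) holds vacuously via Mary.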


15. Redo the translations of sentences 1, 4, 6, and 7, making use of the predicate person, as we would have to do if the domain D contains not only humans but cats, robots, and other entities.

1'. Everyone loves Mary.
∀x (person(x) → love (x, Mary))

4'. Everyone loves someone. (Ambiguous)
(i) ∀x(person(x) → ∃y(person(y) & love (x, y))) (For every person x, there is some person y whom x loves.)
(ii) ∃y(person(y) & ∀x(person(x) → love (x, y))) (There is some person y whom every person x loves.)
An equivalent correct answer for (i): ∀x∃y (person(x) → (person(y) & love (x, y)))
But I don't recommend moving the second quantifier, because then it's too easy to come up with the following wrong answer for (i): ∀x∃y ((person(x) & person(y)) → love (x, y)). It's always safer to keep a quantifier and its "restrictor" (in this case person) as close together as possible, and both of them as close to their surface position as possible.

6'. Someone walks and talks.
∃x(person(x) & walk (x) & talk (x))
Note: technically, we need more parentheses – either
∃x(person(x) & (walk (x) & talk (x))) or
∃x((person(x) & walk (x)) & talk (x))
But since it's provable that & is associative, i.e. the grouping of a sequence of &'s doesn't make any difference, it is customary to allow expressions like (p & q & r). And similarly for big disjunctions, (p ∨ q ∨ r). But not with →!
Wrong: ∃x(person(x) → (walk (x) & talk (x))) This has weird truth-conditions, which you can see if you remember that p → q is equivalent to ¬p ∨ q. You will never really want to combine ∃ with → – it always makes a statement that is too weak.

7'. Someone walks and someone talks.
(∃x (person(x) & walk (x)) & ∃x(person(x) & talk (x))) or equivalently
(∃x (person(x) & walk (x)) & ∃y (person(y) & talk (y)))
Note: both in the original 7 and in this 7', it would be OK and customary to drop outermost parentheses, i.e. the very first left parenthesis and the very last right parenthesis may be dropped. (But no parentheses can be dropped in 6'; they are not really "outermost". Only when a pair of parentheses contains the entire formula can it be dropped under the "drop outermost parentheses" convention.)
Also correct: ∃x∃y (person(x) & walk (x) & person(y) & talk (y))

References
Carlson, Greg. 1980. Polarity Any is Existential. Linguistic Inquiry 11:799-804.
Carlson, Greg. 1981. Distribution of free-choice 'any'. In Chicago Linguistic Society 17, 8-23. Chicago.
Haspelmath, Martin. 1997. Indefinite Pronouns. Oxford: Oxford University Press.
Hintikka, Jaakko. 1980. On the "Any"-Thesis and the Methodology of Linguistics. Linguistics and Philosophy 4:101-122.
Kadmon, Nirit, and Landman, Fred. 1993. Any. Linguistics & Philosophy 16:353-422.
Kratzer, Angelika, and Shimoyama, Junko. 2002. Indeterminate pronouns: the view from Japanese. In The Proceedings of the Third Tokyo Conference on Psycholinguistics, ed. Yukio Otsu, 1-25. Tokyo: Hituzi Syobo.
Ladusaw, William. 1980. On the notion "affective" in the analysis of negative polarity items. Journal of Linguistic Research 1:1-16. Reprinted in Portner and Partee (2002), pp. 457-470.
Linebarger, Marcia. 1987. Negative polarity and grammatical representation. Linguistics and Philosophy 10:325-387.
Vendler, Zeno. 1962. Each and Every, Any and All. Mind 71:145-160.


Inference in First-Order Logic

In First-Order Logic, inference is used to derive new facts or sentences from existing ones. Before we get into the FOL inference rules, it is important to understand some basic FOL terminology.

Substitution:

Substitution is a basic operation that is applied to terms and formulas, and it appears in all first-order logic inference systems. When there are quantifiers in FOL, substitution becomes more complicated. When we write F[a/x], we are referring to the substitution of a constant "a" for the variable "x".

[Note: first-order logic can convey facts about some or all of the universe's objects.]

Equality:

In First-Order Logic, atomic sentences are formed not only through the use of predicates and terms, but also through the application of equality. We use the equality symbol to indicate that two terms refer to the same object.

Example: Brother(John) = Smith.

In the above example, the object referred to by Brother(John) is the same as the object referred to by Smith. The equality symbol can also be used with negation to state that two terms are not the same object.

Example: ¬(x = y), which is equivalent to x ≠ y.

FOL inference rules for quantifiers:

First-order logic has inference rules similar to propositional logic, so here are some basic inference rules in FOL:

• Universal Generalization
• Universal Instantiation
• Existential Instantiation
• Existential Introduction

1. Universal Generalization:

• Universal generalization is a valid inference rule which states that if premise P(c) is true for any arbitrary element c in the universe of discourse, then we can arrive at the conclusion ∀x P(x).
• It can be represented as: from P(c), for an arbitrary c, infer ∀x P(x).
• If we want to prove that every element has a similar property, we can apply this rule.
• x must not occur as a free variable in this rule.

Example: Let P(c) represent "A byte contains 8 bits"; then "All bytes contain 8 bits", i.e. ∀x P(x), will also be true.

2. Universal Instantiation:

• Universal instantiation, often known as universal elimination or UI, is a valid inference rule. It can be used to add new sentences many times.
• The new knowledge base is logically equivalent to the existing knowledge base.
• According to UI, we can infer any sentence obtained by substituting a ground term for the variable.
• The UI rule says that we can infer any sentence P(c) by substituting a ground term c (a constant within the domain of x) from ∀x P(x), for any object in the universe of discourse.
• It can be represented as: from ∀x P(x), infer P(c).

Example 1: If "Every person likes ice-cream", i.e. ∀x P(x), then we can infer that "John likes ice-cream", i.e. P(John).

Example 2: Let's take a famous example: "All kings who are greedy are evil." So let our knowledge base contain this detail in the form of FOL: ∀x King(x) ∧ Greedy(x) → Evil(x).

We can infer any of the following statements using Universal Instantiation from this information:

• King(John) ∧ Greedy(John) → Evil(John)
• King(Richard) ∧ Greedy(Richard) → Evil(Richard)
• King(Father(John)) ∧ Greedy(Father(John)) → Evil(Father(John))

3. Existential Instantiation:

• Existential instantiation is also known as Existential Elimination, and it is a valid first-order logic inference rule.
• It can only be applied once to replace an existential sentence.
• Although the new KB is not logically equivalent to the old KB, it will be satisfiable if the old KB was satisfiable.
• This rule states that for a new constant symbol c, one can deduce P(c) from a formula of the form ∃x P(x).
• The only constraint on this rule is that c must be a new constant symbol for which P(c) is asserted.
• It can be represented as: from ∃x P(x), infer P(c) for a new constant symbol c.

Example: From the given sentence ∃x Crown(x) ∧ OnHead(x, John), we can infer Crown(K) ∧ OnHead(K, John), as long as K does not appear elsewhere in the knowledge base.

• The constant K used above is known as a Skolem constant.
• Existential instantiation is a special case of the Skolemization process.

4. Existential Introduction:

• Existential introduction, also known as existential generalization, is a valid inference rule in first-order logic.
• This rule states that if some element c in the universe of discourse has the property P, we can infer that something in the universe has the property P.
• It can be represented as: from P(c), infer ∃x P(x).

Example: "Priyanka got good marks in English." Therefore, "Someone got good marks in English."

Generalized Modus Ponens Rule:

In FOL, we use a single inference rule called Generalized Modus Ponens for the inference process. It is a modified form of Modus Ponens. "P implies Q, and P is declared to be true, hence Q must be true" summarizes plain Modus Ponens.

Generalized Modus Ponens states that for atomic sentences pi, pi', q, where there is a substitution θ such that SUBST(θ, pi') = SUBST(θ, pi), the rule can be represented as:

    p1', p2', ..., pn',   (p1 ∧ p2 ∧ ... ∧ pn ⇒ q)
    -----------------------------------------------
                     SUBST(θ, q)

Example: We can use this rule for "Greedy kings are evil": if we find some x such that x is a king and x is greedy, we can infer that x is evil.
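A minimal Python sketch of the substitution step behind Universal Instantiation and Generalized Modus Ponens (the tuple representation of sentences is an assumption made only for illustration): SUBST replaces variables with ground terms, and the kings example then yields Evil(John) once King(John) and Greedy(John) are known.

# SUBST(theta, sentence): apply a substitution to a sentence represented as nested
# tuples, e.g. ('King', 'x') with theta = {'x': 'John'} -> ('King', 'John').
def subst(theta, sentence):
    if isinstance(sentence, tuple):
        return tuple(subst(theta, part) for part in sentence)
    return theta.get(sentence, sentence)

# ∀x King(x) ∧ Greedy(x) → Evil(x), split into premises and conclusion
rule_premises = [('King', 'x'), ('Greedy', 'x')]
rule_conclusion = ('Evil', 'x')

known_facts = {('King', 'John'), ('Greedy', 'John'), ('King', 'Richard')}

# Generalized Modus Ponens: a theta that maps every premise onto a known fact
theta = {'x': 'John'}
if all(subst(theta, p) in known_facts for p in rule_premises):
    print("Inferred:", subst(theta, rule_conclusion))   # Inferred: ('Evil', 'John')

theta = {'x': 'Richard'}
if all(subst(theta, p) in known_facts for p in rule_premises):
    print("Inferred:", subst(theta, rule_conclusion))   # not printed: Greedy(Richard) is unknown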
Naive Bayes

Introduction

In this article, we will discuss the mathematical intuition behind Naive Bayes classifiers, and we'll also see how to implement this in Python.

This model is easy to build and is mostly used for large datasets. It is a probabilistic machine learning model that is used for classification problems. The core of the classifier depends on Bayes' theorem with an assumption of independence among predictors. That means changing the value of one feature doesn't change the value of another feature.

Why is it called Naive?

It is called Naive because of the assumption that two variables are independent when they may not be. In a real-world scenario, there is hardly any situation where the features are truly independent.

Naive Bayes does seem to be a simple yet powerful algorithm. But why is it so popular? Since it is a probabilistic approach, predictions can be made very quickly, and it can be used for both binary and multi-class classification problems.

Before we dive deeper into this topic, we need to understand what conditional probability is, what Bayes' theorem is, and how conditional probability helps us in Bayes' theorem.

Conditional Probability for Naive Bayes

Conditional probability is defined as the likelihood of an event or outcome occurring, based on the occurrence of a previous event or outcome. Conditional probability is calculated by multiplying the probability of the preceding event by the updated probability of the succeeding, or conditional, event.

Let's start understanding this definition with examples.

Suppose I ask you to pick a card from the deck and find the probability of getting a king given the card is clubs. Observe carefully that here I have mentioned a condition that the card is clubs. Now while calculating the probability my denominator will not be 52; instead, it will be 13, because the total number of cards in clubs is 13. Since we have only one king in clubs, the probability of getting a king given the card is clubs will be 1/13 = 0.077.

Let's take one more example. Consider a random experiment of tossing 2 coins. The sample space here will be:

S = {HH, HT, TH, TT}

If a person is asked to find the probability of getting a tail (on at least one coin), his answer would be 3/4 = 0.75. Now suppose this same experiment is performed by another person, but now we require that both the coins show heads. If event A: 'both the coins show heads' must happen, then the elementary outcomes {HT, TH, TT} could not have happened. Hence, in this situation, the probability of getting heads on both the coins is 1/4 = 0.25.

From the above examples, we observe that the probability may change if some additional information is given to us. This is exactly the case while building any machine learning model: we need to find the output given some features.

Mathematically, the conditional probability of event A given that event B has already happened is given by:

P(A|B) = P(A ∩ B) / P(B)

Bayes' Rule

Now we are prepared to state one of the most useful results in conditional probability: Bayes' Rule.

Bayes' theorem, which was given by Thomas Bayes, a British mathematician, in 1763, provides a means for calculating the probability of an event given some information. Mathematically, Bayes' theorem can be stated as:

P(A|B) = P(B|A) * P(A) / P(B)

Basically, we are trying to find the probability of event A, given that event B is true. Here P(A) is called the prior probability, which means it is the probability of the event before the evidence is seen. P(A|B) is called the posterior probability, i.e., the probability of the event after the evidence is seen.

With regards to our dataset, this formula can be re-written with

Y: class of the variable
X: dependent feature vector (of size n)

as:

P(Y|X) = P(X|Y) * P(Y) / P(X)
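A quick numeric check of Bayes' rule in Python, using the king-of-clubs example from above (event A: the card is a king; event B: the card is a club):

# P(A|B) = P(B|A) * P(A) / P(B), checked on the deck-of-cards example above.
p_king = 4 / 52               # P(A): probability the card is a king
p_clubs = 13 / 52             # P(B): probability the card is a club
p_clubs_given_king = 1 / 4    # P(B|A): of the 4 kings, exactly 1 is a club

p_king_given_clubs = p_clubs_given_king * p_king / p_clubs
print(round(p_king_given_clubs, 3))   # 0.077, matching the 1/13 computed directly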
What is Naive Bayes?

Bayes' rule provides us with the formula for the probability of Y given some feature X. In real-world problems, however, we hardly find any case where there is only one feature. When the features are independent, we can extend Bayes' rule to what is called Naive Bayes, which assumes that the features are independent, i.e., changing the value of one feature doesn't influence the values of the other features, and this is why we call this algorithm "naive". Naive Bayes can be used for various things like face recognition, weather prediction, medical diagnosis, news classification, sentiment analysis, and a lot more.

When there are multiple X variables, we simplify the formula by assuming that the X's are independent, so for n features it becomes:

P(y | x1, ..., xn) = P(x1 | y) * P(x2 | y) * ... * P(xn | y) * P(y) / (P(x1) * P(x2) * ... * P(xn))

Since the denominator is constant for a given input, we can remove it. It's purely your choice if you want to remove it or not; removing the denominator will help you save time and calculations. This formula can also be understood as: compute P(y) * P(x1 | y) * ... * P(xn | y) for each class y and predict the class with the largest value.

There are a whole lot of formulas mentioned here, but worry not, we will try to understand all this with the help of an example.

Assumptions of Naive Bayes

• All the variables are independent. That is, if the animal is a Dog, that doesn't mean that Size will be Medium.
• All the predictors have an equal effect on the outcome. That is, the animal being a dog does not have more importance in deciding if we can pet it or not; all the features have equal importance.

Naive Bayes Example

Let's take a dataset to predict whether we can pet an animal or not. We should try to apply the Naive Bayes formula on this dataset; however, before that, we need to do some precomputations. We need to find P(xi | yj) for each xi in X and each yj in Y, and we also need the class probabilities P(y); for example, P(Pet Animal = NO) = 6/14.

Now if we send our test data, suppose test = (Cow, Medium, Black), we compute the (unnormalized) probability of petting the animal and the probability of not petting the animal. Since we know P(Yes|Test) + P(No|Test) = 1, we normalize the two results. We see here that P(Yes|Test) > P(No|Test), so the prediction that we can pet this animal is "Yes".

Gaussian Naive Bayes

So far, we have discussed how to predict probabilities if the predictors take discrete values. But what if they are continuous? For this, we need to make some more assumptions regarding the distribution of each feature. The different naive Bayes classifiers differ mainly by the assumptions they make regarding the distribution of P(xi | y); here we'll discuss Gaussian Naive Bayes.

Gaussian Naive Bayes is used when we assume all the continuous variables associated with each feature to be distributed according to a Gaussian distribution (also called the Normal distribution). The conditional probability changes here since we now have continuous values; the probability density function (PDF) of a normal distribution is given by:

f(x) = (1 / (σ * sqrt(2π))) * exp(-(x - μ)² / (2σ²))

We can use this formula to compute the likelihoods P(xi | y) if our data is continuous.
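Since the article mentions implementing this in Python, here is a short sketch using scikit-learn's GaussianNB on made-up continuous data (the feature values are illustrative only):

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Illustrative continuous features (e.g. height in cm, weight in kg) and binary labels.
X = np.array([[170.0, 65.0], [160.0, 52.0], [182.0, 86.0],
              [158.0, 49.0], [175.0, 78.0], [165.0, 55.0]])
y = np.array([1, 0, 1, 0, 1, 0])

model = GaussianNB()   # assumes each feature is Gaussian within each class
model.fit(X, y)

print(model.predict([[172.0, 70.0]]))         # predicted class
print(model.predict_proba([[172.0, 70.0]]))   # normalized class probabilities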

Endnotes

Naive Bayes algorithms are mostly used in face recognition, weather prediction, medical diagnosis, news classification, sentiment analysis, etc. In this article, we learned the mathematical intuition behind this algorithm. You have already taken your first step to master this algorithm, and from here all you need is practice.


What are Decision Trees?

In simple words, a decision tree is a structure that contains nodes (rectangular boxes) and edges (arrows) and is built from a dataset (a table whose columns represent features/attributes and whose rows correspond to records). Each node is either used to make a decision (known as a decision node) or to represent an outcome (known as a leaf node).

Decision tree Example

The example decision tree is used to classify whether a person is Fit or Unfit. The decision nodes are questions like 'Is the person less than 30 years of age?', 'Does the person eat junk?', etc., and the leaves are one of the two possible outcomes, viz. Fit and Unfit. Looking at the decision tree we can make the following decisions: if a person is less than 30 years of age and doesn't eat junk food then he is Fit; if a person is less than 30 years of age and eats junk food then he is Unfit; and so on.

The initial node is called the root node, the final nodes are called the leaf nodes, and the rest of the nodes are called intermediate or internal nodes. The root and intermediate nodes represent the decisions while the leaf nodes represent the outcomes.

ID3 in brief

ID3 stands for Iterative Dichotomiser 3 and is named such because the algorithm iteratively (repeatedly) dichotomizes (divides) features into two or more groups at each step.

Invented by Ross Quinlan, ID3 uses a top-down greedy approach to build a decision tree. In simple words, the top-down approach means that we start building the tree from the top, and the greedy approach means that at each iteration we select the best feature at the present moment to create a node.

Most generally, ID3 is only used for classification problems with nominal features.

Dataset description

In this article, we'll be using a sample dataset of COVID-19 infection. A preview of the entire dataset is shown below.

+----+-------+-------+------------------+----------+
| ID | Fever | Cough | Breathing issues | Infected |
+----+-------+-------+------------------+----------+
| 1  | NO    | NO    | NO               | NO       |
| 2  | YES   | YES   | YES              | YES      |
| 3  | YES   | YES   | NO               | NO       |
| 4  | YES   | NO    | YES              | YES      |
| 5  | YES   | YES   | YES              | YES      |
| 6  | NO    | YES   | NO               | NO       |
| 7  | YES   | NO    | YES              | YES      |
| 8  | YES   | NO    | YES              | YES      |
| 9  | NO    | YES   | YES              | YES      |
| 10 | YES   | YES   | NO               | YES      |
| 11 | NO    | YES   | NO               | NO       |
| 12 | NO    | YES   | YES              | YES      |
| 13 | NO    | YES   | YES              | NO       |
| 14 | YES   | YES   | NO               | NO       |
+----+-------+-------+------------------+----------+

The columns are self-explanatory. YES and NO stand for Yes and No respectively. In the Infected column, YES and NO represent Infected and Not Infected respectively.

The columns used to make decision nodes, viz. 'Breathing Issues', 'Cough' and 'Fever', are called feature columns or just features, and the column used for leaf nodes, i.e. 'Infected', is called the target column.
Metrics in ID3

As mentioned previously, the ID3 algorithm selects the best feature at each step while building a decision tree. Before you ask, the answer to the question 'How does ID3 select the best feature?' is that ID3 uses Information Gain, or just Gain, to find the best feature.

Information Gain calculates the reduction in entropy and measures how well a given feature separates or classifies the target classes. The feature with the highest Information Gain is selected as the best one.

In simple words, Entropy is the measure of disorder, and the Entropy of a dataset is the measure of disorder in the target feature of the dataset. In the case of binary classification (where the target column has only two types of classes), entropy is 0 if all values in the target column are homogenous (similar) and is 1 if the target column has an equal number of values for both classes.

We denote our dataset as S; its entropy is calculated as:

Entropy(S) = - ∑ pᵢ * log₂(pᵢ) ;  i = 1 to n

where
n is the total number of classes in the target column (in our case n = 2, i.e. YES and NO), and
pᵢ is the probability of class 'i', i.e. the ratio of the "number of rows with class i in the target column" to the "total number of rows" in the dataset.

Information Gain for a feature column A is calculated as:

IG(S, A) = Entropy(S) - ∑ ((|Sᵥ| / |S|) * Entropy(Sᵥ))

where Sᵥ is the set of rows in S for which the feature column A has value v, |Sᵥ| is the number of rows in Sᵥ, and likewise |S| is the number of rows in S.

ID3 Steps

1. Calculate the Information Gain of each feature.
2. Considering that all rows don't belong to the same class, split the dataset S into subsets using the feature for which the Information Gain is maximum.
3. Make a decision tree node using the feature with the maximum Information Gain.
4. If all rows belong to the same class, make the current node a leaf node with the class as its label.
5. Repeat for the remaining features until we run out of features, or the decision tree has all leaf nodes.
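A short Python sketch of the two formulas above, applied to the COVID-19 dataset (rows hard-coded from the table); the printed values match the entropy and information-gain numbers worked out in the next section.

from math import log2
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the classes in the target column."""
    counts, total = Counter(labels), len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, feature_index, labels):
    """IG(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)."""
    total = len(rows)
    gain = entropy(labels)
    for value in set(r[feature_index] for r in rows):
        subset = [labels[i] for i, r in enumerate(rows) if r[feature_index] == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

# Rows as (Fever, Cough, Breathing issues); Infected is the target column.
rows = [("NO","NO","NO"), ("YES","YES","YES"), ("YES","YES","NO"), ("YES","NO","YES"),
        ("YES","YES","YES"), ("NO","YES","NO"), ("YES","NO","YES"), ("YES","NO","YES"),
        ("NO","YES","YES"), ("YES","YES","NO"), ("NO","YES","NO"), ("NO","YES","YES"),
        ("NO","YES","YES"), ("YES","YES","NO")]
infected = ["NO","YES","NO","YES","YES","NO","YES","YES","YES","YES","NO","YES","NO","NO"]

print(round(entropy(infected), 2))   # 0.99
for i, name in enumerate(["Fever", "Cough", "Breathing issues"]):
    print(name, round(information_gain(rows, i, infected), 2))   # 0.13, 0.04, 0.40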


Implementation on our Dataset

As stated in the previous section, the first step is to find the best feature, i.e. the one that has the maximum Information Gain (IG). We'll calculate the IG for each of the features now, but for that we first need to calculate the entropy of S.

From the total of 14 rows in our dataset S, there are 8 rows with the target value YES and 6 rows with the target value NO. The entropy of S is calculated as:

Entropy(S) = - (8/14) * log₂(8/14) - (6/14) * log₂(6/14) = 0.99

Note: If all the values in our target column are the same, the entropy will be zero (meaning that it has no or zero randomness).

We now calculate the Information Gain for each feature.

IG calculation for Fever:

In this (Fever) feature there are 8 rows having value YES and 6 rows having value NO. As shown below, in the 8 rows with YES for Fever, there are 6 rows having target value YES and 2 rows having target value NO.

+-------+-------+------------------+----------+
| Fever | Cough | Breathing issues | Infected |
+-------+-------+------------------+----------+
| YES   | YES   | YES              | YES      |
| YES   | YES   | NO               | NO       |
| YES   | NO    | YES              | YES      |
| YES   | YES   | YES              | YES      |
| YES   | NO    | YES              | YES      |
| YES   | NO    | YES              | YES      |
| YES   | YES   | NO               | YES      |
| YES   | YES   | NO               | NO       |
+-------+-------+------------------+----------+

As shown below, in the 6 rows with NO for Fever, there are 2 rows having target value YES and 4 rows having target value NO.

+-------+-------+------------------+----------+
| Fever | Cough | Breathing issues | Infected |
+-------+-------+------------------+----------+
| NO    | NO    | NO               | NO       |
| NO    | YES   | NO               | NO       |
| NO    | YES   | YES              | YES      |
| NO    | YES   | NO               | NO       |
| NO    | YES   | YES              | YES      |
| NO    | YES   | YES              | NO       |
+-------+-------+------------------+----------+

The block below demonstrates the calculation of Information Gain for Fever.

# total rows
|S| = 14

For v = YES, |Sᵥ| = 8
Entropy(Sᵥ) = - (6/8) * log₂(6/8) - (2/8) * log₂(2/8) = 0.81

For v = NO, |Sᵥ| = 6
Entropy(Sᵥ) = - (2/6) * log₂(2/6) - (4/6) * log₂(4/6) = 0.91

# Expanding the summation in the IG formula:
IG(S, Fever) = Entropy(S) - (|Sʏᴇꜱ| / |S|) * Entropy(Sʏᴇꜱ) - (|Sɴᴏ| / |S|) * Entropy(Sɴᴏ)
∴ IG(S, Fever) = 0.99 - (8/14) * 0.81 - (6/14) * 0.91 = 0.13

Next, we calculate the IG for the features "Cough" and "Breathing issues" in the same way:

IG(S, Cough) = 0.04
IG(S, BreathingIssues) = 0.40

Since the feature Breathing issues has the highest Information Gain, it is used to create the root node.

Next, from the remaining two unused features, namely Fever and Cough, we decide which one is the best for the left branch of Breathing Issues. Since the left branch of Breathing Issues denotes YES, we will work with the subset of the original data, i.e. the set of rows having YES as the value in the Breathing Issues column. These 8 rows are shown below:

+-------+-------+------------------+----------+
| Fever | Cough | Breathing issues | Infected |
+-------+-------+------------------+----------+
| YES   | YES   | YES              | YES      |
| YES   | NO    | YES              | YES      |
| YES   | YES   | YES              | YES      |
| YES   | NO    | YES              | YES      |
| YES   | NO    | YES              | YES      |
| NO    | YES   | YES              | YES      |
| NO    | YES   | YES              | YES      |
| NO    | YES   | YES              | NO       |
+-------+-------+------------------+----------+

Next, we calculate the IG for the features Fever and Cough using the subset Sʙʏ (Set Breathing Issues Yes) shown above.

Note: For this IG calculation the entropy is calculated from the subset Sʙʏ and not from the original dataset S.

IG(Sʙʏ, Fever) = 0.20
IG(Sʙʏ, Cough) = 0.09

The IG of Fever is greater than that of Cough, so we select Fever as the left branch of Breathing Issues.

Next, we find the feature with the maximum IG for the right branch of Breathing Issues. But since there is only one unused feature left, we have no other choice but to make it (Cough) the right branch of the root node.

There are no more unused features, so we stop here and jump to the final step of creating the leaf nodes.

For the left leaf node of Fever, we see the subset of rows from the original dataset that have both Breathing Issues and Fever values as YES:

+-------+-------+------------------+----------+
| Fever | Cough | Breathing issues | Infected |
+-------+-------+------------------+----------+
| YES   | YES   | YES              | YES      |
| YES   | NO    | YES              | YES      |
| YES   | YES   | YES              | YES      |
| YES   | NO    | YES              | YES      |
| YES   | NO    | YES              | YES      |
+-------+-------+------------------+----------+

Since all the values in the target column are YES, we label the left leaf node as YES, but to make it more logical we label it Infected.

Similarly, for the right node of Fever we see the subset of rows from the original dataset that have Breathing Issues value YES and Fever value NO:

+-------+-------+------------------+----------+
| Fever | Cough | Breathing issues | Infected |
+-------+-------+------------------+----------+
| NO    | YES   | YES              | YES      |
| NO    | YES   | YES              | NO       |
| NO    | YES   | YES              | NO       |
+-------+-------+------------------+----------+

Here not all but most of the values are NO, hence NO or Not Infected becomes our right leaf node.

We repeat the same process for the node Cough; however, here both the left and right leaves turn out to be the same, i.e. NO or Not Infected.

Looks strange, doesn't it? I know! The right node of Breathing issues is as good as just a leaf node with class 'Not Infected'. This is one of the drawbacks of ID3: it doesn't do pruning. Pruning is a mechanism that reduces the size and complexity of a decision tree by removing unnecessary nodes.

Another drawback of ID3 is overfitting or high variance, i.e. it learns the training dataset so well that it fails to generalize on new data.
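Putting the whole procedure together, here is a compact recursive ID3 sketch in Python (self-contained; the majority-vote labeling of impure leaves is an implementation choice for this sketch, not something the article prescribes):

from math import log2
from collections import Counter

def entropy(labels):
    counts, total = Counter(labels), len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def best_feature(rows, labels, features):
    """Return the unused feature index with maximum information gain."""
    def gain(f):
        g = entropy(labels)
        for v in set(r[f] for r in rows):
            sub = [labels[i] for i, r in enumerate(rows) if r[f] == v]
            g -= (len(sub) / len(rows)) * entropy(sub)
        return g
    return max(features, key=gain)

def id3(rows, labels, features):
    # Leaf: all rows share a class, or no features remain (majority vote in that case).
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    f = best_feature(rows, labels, features)
    node = {}
    for v in set(r[f] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[f] == v]
        node[(f, v)] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                           [g for g in features if g != f])
    return node

# Same COVID-19 dataset as above: columns are Fever, Cough, Breathing issues.
rows = [("NO","NO","NO"), ("YES","YES","YES"), ("YES","YES","NO"), ("YES","NO","YES"),
        ("YES","YES","YES"), ("NO","YES","NO"), ("YES","NO","YES"), ("YES","NO","YES"),
        ("NO","YES","YES"), ("YES","YES","NO"), ("NO","YES","NO"), ("NO","YES","YES"),
        ("NO","YES","YES"), ("YES","YES","NO")]
infected = ["NO","YES","NO","YES","YES","NO","YES","YES","YES","YES","NO","YES","NO","NO"]

tree = id3(rows, infected, [0, 1, 2])   # feature indices: 0=Fever, 1=Cough, 2=Breathing issues
print(tree)   # the root splits on feature 2 (Breathing issues), as derived in the article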
