
Resolution in FOL

Resolution
Resolution is a theorem-proving technique that proceeds by building refutation proofs, i.e., proofs by contradiction. It was invented by the mathematician John Alan Robinson in 1965.

Resolution is used when several statements are given and we need to prove a conclusion from those statements. Unification is a key concept in proofs by resolution. Resolution is a single inference rule which can efficiently operate on sentences in conjunctive normal form or clausal form.

Clause: A disjunction of literals (atomic sentences or their negations) is called a clause. A clause containing a single literal is known as a unit clause.

Conjunctive Normal Form: A sentence represented as a conjunction of clauses is said to be in conjunctive normal form or CNF.
The resolution inference rule:

The resolution rule for first-order logic is simply a lifted version of the propositional rule. Resolution can resolve two clauses if they contain complementary literals, which are assumed to be standardized apart so that they share no variables.

    l1 V ··· V lk,          m1 V ··· V mn
    ------------------------------------------------------------------------------
    SUBST(θ, l1 V ··· V li-1 V li+1 V ··· V lk V m1 V ··· V mj-1 V mj+1 V ··· V mn)

where UNIFY(li, ¬mj) = θ, i.e., li and mj are complementary literals.


This rule is also called the binary resolution rule because it resolves exactly two literals.
Example:

We can resolve the two clauses given below:

[Animal(g(x)) V Loves(f(x), x)] and [¬Loves(a, b) V ¬Kills(a, b)]

where the two complementary literals are Loves(f(x), x) and ¬Loves(a, b).
These literals can be unified with the unifier θ = {a/f(x), b/x}, and resolution will generate the resolvent clause:

[Animal(g(x)) V ¬Kills(f(x), x)].
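To make the unifier in this example concrete, here is a minimal Python sketch of syntactic unification (the occurs-check is omitted, and the tuple-based term representation is an assumption made only for illustration). Applied to Loves(a, b) and Loves(f(x), x) it returns the substitution θ = {a/f(x), b/x} used above.

# A minimal sketch of syntactic unification (occurs-check omitted for brevity).
# Variables are lowercase strings; compound terms are tuples such as ('f', 'x') for f(x).

def is_var(term):
    return isinstance(term, str) and term[0].islower()

def walk(term, subst):
    # Follow variable bindings already recorded in the substitution.
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def unify(t1, t2, subst=None):
    subst = {} if subst is None else subst
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        return {**subst, t1: t2}
    if is_var(t2):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2) and t1[0] == t2[0]:
        for a, b in zip(t1[1:], t2[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # clash: the terms cannot be unified

# Loves(a, b) vs Loves(f(x), x)  ->  {'a': ('f', 'x'), 'b': 'x'}, i.e. θ = {a/f(x), b/x}
print(unify(('Loves', 'a', 'b'), ('Loves', ('f', 'x'), 'x')))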
Steps for Resolution:

1. Convert the facts into first-order logic.
2. Convert the FOL statements into CNF.
3. Negate the statement to be proved (proof by contradiction).
4. Draw the resolution graph (using unification).
Example:

a. John likes all kinds of food.
b. Apples and vegetables are food.
c. Anything anyone eats without being killed is food.
d. Anil eats peanuts and is still alive.
e. Harry eats everything that Anil eats.
Prove by resolution that:
f. John likes peanuts.
Step-1: Conversion of Facts into FOL

In the first step we convert all the given statements into first-order logic:

a. ∀x food(x) → likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y [eats(x, y) Λ ¬ killed(x)] → food(y)
d. eats(Anil, Peanuts) Λ alive(Anil)
e. ∀x eats(Anil, x) → eats(Harry, x)
f. ∀x ¬ killed(x) → alive(x)    (added background fact: anyone not killed is alive)
g. ∀x alive(x) → ¬ killed(x)    (added background fact: anyone alive has not been killed)
h. likes(John, Peanuts)    (the conclusion to be proved)
Step-2: Conversion of FOL into CNF

In first-order logic resolution, it is required to convert the FOL statements into CNF, because CNF makes resolution proofs easier.

Eliminate all implications (→) and rewrite

a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ [eats(x, y) Λ ¬ killed(x)] V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x ¬[¬ killed(x)] V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
Move negation (¬) inwards and rewrite

a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ eats(x, y) V killed(x) V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x killed(x) V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
Rename variables or standardize variables

a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀y ∀z ¬ eats(y, z) V killed(y) V food(z)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀w¬ eats(Anil, w) V eats(Harry, w)
f. ∀g killed(g) V alive(g)
g. ∀k ¬ alive(k) V ¬ killed(k)
h. likes(John, Peanuts).
Eliminate existential quantifiers (Skolemization).

In this step, we eliminate the existential quantifier ∃; this process is known as
Skolemization. In this example problem there is no existential quantifier,
so all the statements remain the same in this step.

Drop universal quantifiers.

In this step we drop all universal quantifiers, since all the statements are
implicitly universally quantified and so we don't need to write them.
Note: the statements "food(Apple) Λ food(vegetables)" and "eats (Anil, Peanuts) Λ alive(Anil)" are each split into two separate clauses below.
a. ¬ food(x) V likes(John, x)
b. food(Apple)
c. food(vegetables)
d. ¬ eats(y, z) V killed(y) V food(z)
e. eats (Anil, Peanuts)
f. alive(Anil)
g. ¬ eats(Anil, w) V eats(Harry, w)
h. killed(g) V alive(g)
i. ¬ alive(k) V ¬ killed(k)
j. likes(John, Peanuts).
Distribute conjunction ∧ over disjunction ∨.
This step makes no change in this problem.

Step-3: Negate the statement to be proved

In this step, we apply negation to the conclusion statement, which is
written as ¬likes(John, Peanuts)
Step-4: Draw Resolution graph:

Now in this step, we solve the problem by building the resolution tree, using substitution.
For the above problem, the tree proceeds as described below.
Hence the negation of the conclusion leads to a contradiction with the given set of
statements, so the original conclusion, likes(John, Peanuts), is proved.
Explanation of Resolution graph:
● In the first step of the resolution graph, ¬likes(John, Peanuts) and likes(John, x) get
resolved (canceled) by the substitution {Peanuts/x}, and we are left with ¬food(Peanuts).
● In the second step of the resolution graph, ¬food(Peanuts) and food(z) get resolved
(canceled) by the substitution {Peanuts/z}, and we are left with ¬eats(y, Peanuts) V killed(y).
● In the third step of the resolution graph, ¬eats(y, Peanuts) and eats(Anil, Peanuts) get
resolved by the substitution {Anil/y}, and we are left with killed(Anil).
● In the fourth step of the resolution graph, killed(Anil) and ¬killed(k) get resolved by the
substitution {Anil/k}, and we are left with ¬alive(Anil).
● In the last step of the resolution graph, ¬alive(Anil) and alive(Anil) get resolved, producing the empty clause.
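The refutation above can be checked mechanically. Below is a minimal Python sketch (not a full first-order prover): after applying the substitutions listed in the explanation, every clause is ground, so each step reduces to propositional resolution on sets of literal strings, and the chain ends in the empty clause.

# A minimal check of the refutation: "~" marks negation; each clause is a set of literals.

def resolve(c1, c2):
    """Return the resolvent of two ground clauses, or None if they do not clash."""
    for lit in c1:
        complement = lit[1:] if lit.startswith("~") else "~" + lit
        if complement in c2:
            return (c1 - {lit}) | (c2 - {complement})
    return None

steps = [
    {"~likes(John,Peanuts)"},                                  # negated conclusion
    {"~food(Peanuts)", "likes(John,Peanuts)"},                 # clause (a) with {Peanuts/x}
    {"~eats(Anil,Peanuts)", "killed(Anil)", "food(Peanuts)"},  # clause (d) with {Anil/y, Peanuts/z}
    {"eats(Anil,Peanuts)"},                                    # clause (e)
    {"~alive(Anil)", "~killed(Anil)"},                         # clause (i) with {Anil/k}
    {"alive(Anil)"},                                           # clause (f)
]

clause = steps[0]
for nxt in steps[1:]:
    clause = resolve(clause, nxt)
    print(clause)
# The final line printed is set(), the empty clause: the refutation succeeds.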
Reference
https://www.javatpoint.com/ai-resolution-in-first-order-logic

https://www.youtube.com/watch?v=C_iqWGOhvak

https://athena.ecs.csus.edu/~mei/logicp/unification-resolution/resolution-refutation.html
Classification vs Regression

Regression: predicting a continuous value by learning the relationship between dependent and
independent variables. The output is a numeric value (e.g., linear regression, possibly multivariate).

Classification: identifying which category a new observation belongs to. The output is a class label.

Logistic Regression

Logistic Regression estimates the probability that an instance belongs to a particular class.
Just like linear regression, logistic regression also computes a weighted sum of the inputs, but
instead of outputting that continuous value directly, it outputs its sigmoid.

For the logistic regression model prediction (binary classifier) there is no analytical solution
for the weights, but the cost function to optimize is convex, so gradient descent can be used to
solve the problem.

From the confusion matrix (TN, FP, FN, TP), the evaluation metrics are:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-score = 2 x (Precision x Recall) / (Precision + Recall)
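A minimal NumPy sketch of these ideas (the feature values and weights below are made up for illustration, as if they came from gradient descent): the model computes the weighted sum of the inputs, passes it through the sigmoid, thresholds at 0.5, and the metrics are then computed from the confusion-matrix counts.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    # weighted sum of the inputs, passed through the sigmoid
    return sigmoid(X @ w + b)

# toy data: 4 examples, 2 features (illustrative values only)
X = np.array([[0.5, 1.2], [1.5, 0.3], [3.0, 2.2], [2.2, 2.9]])
y_true = np.array([0, 0, 1, 1])
w, b = np.array([0.8, 0.6]), -2.0   # assume these came from gradient descent

y_pred = (predict_proba(X, w, b) >= 0.5).astype(int)

tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)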
More Answers for Practice in Logic and HW 1 (Ling 310)

This is an expanded version showing additional right and wrong answers.

I. Practice in 1st-order predicate logic – with answers.

1. Mary loves everyone. [assuming D contains only humans]
∀x love (Mary, x)
Note: No further parentheses are needed here, and according to the syntax on the handout, no further parentheses are possible. But "extra parentheses" are in general considered acceptable, and if you find them helpful, I have no objection. So I would also count as correct any of the following:
∀x (love (Mary, x)), (∀x love (Mary, x)), (∀x (love (Mary, x)))

2. Mary loves everyone. [assuming D contains both humans and non-humans, so we need to be explicit about 'everyone' as 'every person']
∀x (person(x) → love (Mary, x))
A wrong answer: ∀x (person(x) & love (Mary, x)). This says that everything in the universe is a person and loves Mary.

3. No one talks. [assume D contains only humans unless specified otherwise.]
¬∃x talk(x) or equivalently, ∀x ¬talk(x)

4. Everyone loves himself.
∀x love (x, x)

5. Everyone loves everyone.
∀x∀y love (x, y)

6. Everyone loves everyone except himself. (= Everyone loves everyone else.)
∀x∀y(¬ x = y → love (x, y)) or ∀x∀y(x ≠ y → love (x, y))
Or maybe it should be this, which is not equivalent to the pair above:
∀x∀y(¬ x = y ↔ love (x, y)) or ∀x∀y(x ≠ y ↔ love (x, y))
The first pair allows an individual to also love himself; the second pair doesn't.

7. Every student smiles.
∀x (student(x) → smile(x))

8. Every student except George smiles.
∀x ((student(x) & x ≠ George) → smile(x))
That formula allows the possibility that George smiles too; if we want to exclude it (this depends on what you believe about except; there are subtle differences and perhaps some indeterminacy among except, besides, other than and their nearest equivalents in other languages), then it should be the following, or something equivalent to it:
∀x (student(x) → (x ≠ George ↔ smile(x)))

9. Everyone walks or talks.
∀x (walk (x) ∨ talk (x))

10. Every student walks or talks.
∀x (student(x) → (walk (x) ∨ talk (x)))

11. Every student who walks talks.
∀x ((student(x) & walk (x)) → talk (x)) or ∀x (student(x) → (walk (x) → talk (x)))

12. Every student who loves Mary is happy.
∀x ((student(x) & love (x, Mary)) → happy (x))

13. Every boy who loves Mary hates every boy who Mary loves.
∀x((boy(x) & love (x, Mary)) → ∀y((boy(y) & love(Mary, y)) → hate (x, y)))

14. Every boy who loves Mary hates every other boy who Mary loves.
(So if John loves Mary and Mary loves John, sentence 13 requires that John hates himself, but sentence 14 doesn't require that.)
∀x((boy(x) & love (x, Mary)) → ∀y((boy(y) & love(Mary, y) & y ≠ x) → hate (x, y)))

II. Homework #1, with answers.

1. Everyone loves Mary.
∀x love (x, Mary)

2. John does not love anyone. (Not ambiguous, but there are two equivalent and equally good formulas for it, one involving negation and the existential quantifier, the other involving negation and the universal quantifier. Give both.)
¬∃x love(John, x) or equivalently, ∀x ¬love(John, x)
Wrong: ∃x ¬love(John, x): That says there is someone John doesn't love.
Wrong: ¬∀x love(John, x): That says John doesn't love everyone; it's equivalent to the preceding formula.

3. Everyone who sees Mary loves Mary.
∀x (see (x, Mary) → love (x, Mary))

4. Everyone loves someone. (Ambiguous)
(i) ∀x∃y love (x, y) (For every person x, there is someone whom x loves.)
(ii) ∃y∀x love (x, y) (There is some person y whom everyone loves, i.e. everyone loves some one specific person.)

5. Someone loves everyone. (Ambiguous)
(i) ∃x∀y love (x, y) (There is some person x who loves everyone.)
(ii) ∀y∃x love (x, y) (For every person y, there is someone who loves them – i.e., no one is totally unloved. This second reading is probably dispreferred for the active sentence. It's the preferred reading for the passive sentence "Everyone is loved by someone" and it's the only reading for the agentless passive "Everyone is loved.")

6. Someone walks and talks.
∃x(walk (x) & talk (x))


7. Someone walks and someone talks.
(∃x walk (x) & ∃x talk (x)) or (∃x walk (x) & ∃y talk (y))
Because neither quantifier is inside the scope of the other – i.e. their scopes are independent – it doesn't matter whether we use different variables here or use the same variable twice. But if one quantifier is inside the scope of the other, then it matters a great deal. When one quantifier is inside the scope of another, as in questions 4 and 5 above, always give them different variables!
Also equivalent: ∃x∃y(walk (x) & talk (y))

8. Everyone who walks is calm.
∀x (walk(x) → calm(x))

9. No one who runs walks. (Not ambiguous, but same note as for number 2.)
(i) ¬∃x (run (x) & walk (x)) or equivalently,
(ii) ∀x (run(x) → ¬walk(x)) or equivalently,
(iii) ∀x ¬(run (x) & walk (x))
A wrong answer: ∀x (¬run(x) → walk(x)) What does this one say?
Another wrong answer: ¬∃x (run (x) → walk (x)) This one doesn't correspond to any English sentence; see notes to questions 11 and 6' below.

10. Everyone who Mary loves loves someone who is happy.
∀x(love (Mary, x) → ∃y(love(x, y) & happy(y)))
Also correct: ∀x∃y (love (Mary, x) → (love(x, y) & happy(y)))
But I recommend keeping each quantifier as close as possible to the noun it quantifies, or to its surface position. The more you move quantifiers around, the easier it is to make mistakes.

11. If anyone cheats, he suffers. (English paraphrases: Anyone who cheats suffers. Everyone who cheats suffers. On the subtle difference between these two, see (Kadmon and Landman 1993).)
∀x (cheat(x) → suffer(x))
A wrong answer: ∃x(cheat(x) → suffer(x)) A wide scope ∃x like this creates too weak a statement. If ∃x were given scope only over the antecedent, as in: ∃x cheat(x) → suffer(x), then that error would be corrected but there would be a new problem because the second x would not be bound.
Note on any: Sometimes anyone corresponds to ∃ and sometimes to ∀; you have to think about the meaning of the whole sentence. Many papers have been written exploring the issue of how best to account for the distribution of meanings of any, and whether it does or doesn't require lexical ambiguity as part of the account. A few classics include (Carlson 1980, Carlson 1981, Haspelmath 1997, Hintikka 1980, Kadmon and Landman 1993, Kratzer and Shimoyama 2002, Ladusaw 1980, Linebarger 1987, Vendler 1962). See also the note about any in the next item.

12. If anyone cheats, everyone suffers.
∀x (cheat(x) → ∀y suffer(y))
Equivalent: ∀x∀y (cheat(x) → suffer(y))
Also equivalent: ∀y∀x (cheat(x) → suffer(y))
Also equivalent: ∃x cheat(x) → ∀y suffer(y) (Each quantifier has narrow scope here.)
Also equivalent: ∃x cheat(x) → ∀x suffer(x) (If each quantifier has narrow scope, then they don't need to involve different variables. If one is inside the scope of the other, then they do.)
Also equivalent: ∀y(∃x cheat(x) → suffer(y))
A wrong answer: ∀y∃x (cheat(x) → suffer(y)) This has no natural English paraphrase.
A different wrong answer: ∀y(∀x cheat(x) → suffer(y)) This is one way of saying "If everyone cheats, then everyone suffers."
Another note about any: As the equivalent answers above illustrate, any in this case can be viewed either as a wide-scope universal (with scope over the if-clause) or as a narrow-scope existential (with scope inside the if-clause). The fact that these are equivalent, at least in this case, is part of the source of debates about any. In example 11, we didn't have that choice, because if any were treated as a narrow-scope existential in that case, it couldn't bind the second occurrence of the variable x corresponding to the pronoun he. The same is true for anyone in the next example, which has to be treated as a wide-scope universal in order to bind himself.

13. Anyone who loves everyone loves himself.
∀x(∀y love (x, y) → love(x, x))
Note: Not this: ∀x∀y (love (x, y) → love(x, x)) What this one says is "Anyone who loves anyone loves himself." What the correct one says is IF you love everyone, THEN you love yourself. So the ∀y quantifier has to be inside the scope of the →.
Another wrong answer: ∃x∀y(love (x, y) → love(x, x)) This has no natural English paraphrase. Any may sometimes be a wide-scope universal, and sometimes a narrow-scope existential, but it is never a "wide-scope existential."

14. Mary loves everyone except John. (For this one, you need to add the two-place predicate of identity, "=". Think of "everyone except John" as "everyone who is not identical to John".)
∀x (¬ x = John → love (Mary, x)) or equivalently
∀x (x ≠ John → love (Mary, x))
As in the case of some earlier examples, this is a 'weak' reading of except, allowing the possibility of Mary loving John. To get a 'strong' reading of except, ruling out that possibility, replace → above by ↔, or add a conjunct "& ¬ love (Mary, John)" at the end.
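The recurring warning that ∃ combined with → is "too weak" can be checked mechanically. The short Python sketch below (a made-up two-person domain, not part of the handout) enumerates all cheat/suffer assignments for sentence 11 and prints the models where the ∃-version is true even though the ∀-version, and the English sentence, are false.

from itertools import product

domain = ["john", "mary"]   # illustrative two-element domain

def forall_version(cheat, suffer):   # ∀x (cheat(x) → suffer(x))
    return all((not cheat[d]) or suffer[d] for d in domain)

def exists_version(cheat, suffer):   # ∃x (cheat(x) → suffer(x))  -- the "wrong answer"
    return any((not cheat[d]) or suffer[d] for d in domain)

for c_vals, s_vals in product(product([False, True], repeat=2), repeat=2):
    cheat = dict(zip(domain, c_vals))
    suffer = dict(zip(domain, s_vals))
    if exists_version(cheat, suffer) and not forall_version(cheat, suffer):
        print("∃-version true but ∀-version false in:", cheat, suffer)
# e.g. cheat = {john: True, mary: False}, suffer = {john: False, mary: False}:
# John cheats and does not suffer, yet ∃x (cheat(x) → suffer(x)) holds vacuously via Mary.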


15. Redo the translations of sentences 1, 4, 6, and 7, making use of the predicate person, as we would have to do if the domain D contains not only humans but cats, robots, and other entities.

1'. Everyone loves Mary.
∀x (person(x) → love (x, Mary))

4'. Everyone loves someone. (Ambiguous)
(i) ∀x(person(x) → ∃y(person(y) & love (x, y))) (For every person x, there is some person y whom x loves.)
(ii) ∃y(person(y) & ∀x(person(x) → love (x, y))) (There is some person y whom every person x loves.)
An equivalent correct answer for (i): ∀x∃y (person(x) → (person(y) & love (x, y)))
But I don't recommend moving the second quantifier, because then it's too easy to come up with the following wrong answer for (i): ∀x∃y ((person(x) & person(y)) → love (x, y)). It's always safer to keep a quantifier and its "restrictor" (in this case person) as close together as possible, and both of them as close to their surface position as possible.

6'. Someone walks and talks.
∃x(person(x) & walk (x) & talk (x))
Note: technically, we need more parentheses – either
∃x(person(x) & (walk (x) & talk (x))) or
∃x((person(x) & walk (x)) & talk (x))
But since it's provable that & is associative, i.e. the grouping of a sequence of &'s doesn't make any difference, it is customary to allow expressions like (p & q & r). And similarly for big disjunctions, (p ∨ q ∨ r). But not with →!
Wrong: ∃x(person(x) → (walk (x) & talk (x))) This has weird truth-conditions, which you can see if you remember that p → q is equivalent to ¬p ∨ q. You will never really want to combine ∃ with → – it always makes a statement that is too weak.

7'. Someone walks and someone talks.
(∃x (person(x) & walk (x)) & ∃x(person(x) & talk (x))) or equivalently
(∃x (person(x) & walk (x)) & ∃y (person(y) & talk (y)))
Note: both in the original 7 and in this 7', it would be OK and customary to drop outermost parentheses, i.e. the very first left parenthesis and the very last right parenthesis may be dropped. (But no parentheses can be dropped in 6'; they are not really "outermost". Only when a pair of parentheses contains the entire formula can it be dropped under the "drop outermost parentheses" convention.)
Also correct: ∃x∃y (person(x) & walk (x) & person(y) & talk (y))

References
Carlson, Greg. 1980. Polarity Any is Existential. Linguistic Inquiry 11:799-804.
Carlson, Greg. 1981. Distribution of free-choice 'any'. In Chicago Linguistic Society 17, 8-23. Chicago.
Haspelmath, Martin. 1997. Indefinite Pronouns. Oxford: Oxford University Press.
Hintikka, Jaakko. 1980. On the "Any"-Thesis and the Methodology of Linguistics. Linguistics and Philosophy 4:101-122.
Kadmon, Nirit, and Landman, Fred. 1993. Any. Linguistics & Philosophy 16:353-422.
Kratzer, Angelika, and Shimoyama, Junko. 2002. Indeterminate pronouns: the view from Japanese. In The Proceedings of the Third Tokyo Conference on Psycholinguistics, ed. Yukio Otsu, 1-25. Tokyo: Hituzi Syobo.
Ladusaw, William. 1980. On the notion "affective" in the analysis of negative polarity items. Journal of Linguistic Research 1:1-16. Reprinted in Portner and Partee (2002), pp. 457-470.
Linebarger, Marcia. 1987. Negative polarity and grammatical representation. Linguistics and Philosophy 10:325-387.
Vendler, Zeno. 1962. Each and Every, Any and All. Mind 71:145-160.


Inference in First-Order Logic

In First-Order Logic, inference is used to derive new facts or sentences from existing ones. Before we get into the FOL inference rules, it is important to understand some basic FOL terminology.

Substitution:

Substitution is a basic operation that is applied to terms and formulas, and it appears in all first-order logic inference systems. When there are quantifiers in FOL, substitution becomes more complicated. When we write F[a/x], we are referring to the substitution of a constant "a" for the variable "x".

[Note: first-order logic can convey facts about some or all of the universe's objects.]

Equality:

In First-Order Logic, atomic sentences are formed not only through the use of predicates and terms, but also through the application of equality. We use the equality symbol to indicate that two terms refer to the same object.

Example: Brother(John) = Smith.

In the above example, the object referred to by Brother(John) is the same as the object referred to by Smith. The equality symbol can also be used with negation to state that two terms are not the same object.

Example: ¬(x = y), which is equivalent to x ≠ y.

FOL inference rules for quantifiers:

First-order logic has inference rules similar to propositional logic, so here are some basic inference rules in FOL:

• Universal Generalization
• Universal Instantiation
• Existential Instantiation
• Existential Introduction

1. Universal Generalization:

• Universal generalization is a valid inference rule which states that if premise P(c) is true for any arbitrary element c in the universe of discourse, then we can arrive at the conclusion ∀x P(x).
• It can be represented as: from P(c), for an arbitrary c, infer ∀x P(x).
• If we want to prove that every element has a similar property, we can apply this rule.
• x must not occur as a free variable in this rule.

Example: Let P(c) represent "A byte contains 8 bits"; then "All bytes contain 8 bits", i.e. ∀x P(x), will also be true.

2. Universal Instantiation:

• Universal instantiation, often known as universal elimination or UI, is a valid inference rule. It can be used to add new sentences many times.
• The new knowledge base is logically equivalent to the existing knowledge base.
• According to UI, we can infer any sentence obtained by substituting a ground term for the variable.
• The UI rule says that we can infer any sentence P(c) by substituting a ground term c (a constant within the domain of x) from ∀x P(x), for any object in the universe of discourse.
• It can be represented as: from ∀x P(x), infer P(c).

Example 1: If "Every person likes ice-cream", i.e. ∀x P(x), then we can infer that "John likes ice-cream", i.e. P(John).

Example 2: Let's take a famous example: "All kings who are greedy are evil." So let our knowledge base contain this detail in the form of FOL: ∀x King(x) ∧ Greedy(x) → Evil(x).

We can infer any of the following statements using Universal Instantiation from this information:

• King(John) ∧ Greedy(John) → Evil(John)
• King(Richard) ∧ Greedy(Richard) → Evil(Richard)
• King(Father(John)) ∧ Greedy(Father(John)) → Evil(Father(John))

3. Existential Instantiation:

• Existential instantiation is also known as Existential Elimination, and it is a valid first-order logic inference rule.
• It can only be applied once to replace an existential sentence.
• Although the new KB is not logically equivalent to the old KB, it will be satisfiable if the old KB was satisfiable.
• This rule states that for a new constant symbol c, one can deduce P(c) from a formula of the form ∃x P(x).
• The only constraint on this rule is that c must be a new constant symbol for which P(c) is asserted.
• It can be represented as: from ∃x P(x), infer P(c) for a new constant symbol c.

Example: From the given sentence ∃x Crown(x) ∧ OnHead(x, John), we can infer Crown(K) ∧ OnHead(K, John), as long as K does not appear elsewhere in the knowledge base.

• The constant K used above is known as a Skolem constant.
• Existential instantiation is a special case of the Skolemization process.

4. Existential Introduction:

• Existential introduction, also known as existential generalization, is a valid inference rule in first-order logic.
• This rule states that if some element c in the universe of discourse has the property P, we can infer that something in the universe has the property P.
• It can be represented as: from P(c), infer ∃x P(x).

Example: "Priyanka got good marks in English." Therefore, "Someone got good marks in English."

Generalized Modus Ponens Rule:

In FOL, we use a single inference rule called Generalized Modus Ponens for the inference process. It is a modified form of Modus Ponens. "P implies Q, and P is declared to be true, hence Q must be true" summarizes plain Modus Ponens.

Generalized Modus Ponens states that for atomic sentences pi, pi', q, where there is a substitution θ such that SUBST(θ, pi') = SUBST(θ, pi), the rule can be represented as:

    p1', p2', ..., pn',   (p1 ∧ p2 ∧ ... ∧ pn ⇒ q)
    -----------------------------------------------
                     SUBST(θ, q)

Example: We can use this rule for "Greedy kings are evil": if we find some x such that x is a king and x is greedy, we can infer that x is evil.
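A minimal Python sketch of the substitution step behind Universal Instantiation and Generalized Modus Ponens (the tuple representation of sentences is an assumption made only for illustration): SUBST replaces variables with ground terms, and the kings example then yields Evil(John) once King(John) and Greedy(John) are known.

# SUBST(theta, sentence): apply a substitution to a sentence represented as nested
# tuples, e.g. ('King', 'x') with theta = {'x': 'John'} -> ('King', 'John').
def subst(theta, sentence):
    if isinstance(sentence, tuple):
        return tuple(subst(theta, part) for part in sentence)
    return theta.get(sentence, sentence)

# ∀x King(x) ∧ Greedy(x) → Evil(x), split into premises and conclusion
rule_premises = [('King', 'x'), ('Greedy', 'x')]
rule_conclusion = ('Evil', 'x')

known_facts = {('King', 'John'), ('Greedy', 'John'), ('King', 'Richard')}

# Generalized Modus Ponens: a theta that maps every premise onto a known fact
theta = {'x': 'John'}
if all(subst(theta, p) in known_facts for p in rule_premises):
    print("Inferred:", subst(theta, rule_conclusion))   # Inferred: ('Evil', 'John')

theta = {'x': 'Richard'}
if all(subst(theta, p) in known_facts for p in rule_premises):
    print("Inferred:", subst(theta, rule_conclusion))   # not printed: Greedy(Richard) is unknown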
Naive Bayes

Introduction

In this article, we will discuss the mathematical intuition behind Naive Bayes classifiers, and we'll also see how to implement this in Python.

This model is easy to build and is mostly used for large datasets. It is a probabilistic machine learning model that is used for classification problems. The core of the classifier depends on Bayes' theorem with an assumption of independence among predictors. That means changing the value of one feature doesn't change the value of another feature.

Why is it called Naive?

It is called Naive because of the assumption that two variables are independent when they may not be. In a real-world scenario, there is hardly any situation where the features are truly independent.

Naive Bayes does seem to be a simple yet powerful algorithm. But why is it so popular? Since it is a probabilistic approach, predictions can be made very quickly, and it can be used for both binary and multi-class classification problems.

Before we dive deeper into this topic, we need to understand what conditional probability is, what Bayes' theorem is, and how conditional probability helps us in Bayes' theorem.

Conditional Probability for Naive Bayes

Conditional probability is defined as the likelihood of an event or outcome occurring, based on the occurrence of a previous event or outcome. Conditional probability is calculated by multiplying the probability of the preceding event by the updated probability of the succeeding, or conditional, event.

Let's start understanding this definition with examples.

Suppose I ask you to pick a card from the deck and find the probability of getting a king given the card is clubs. Observe carefully that here I have mentioned a condition that the card is clubs. Now while calculating the probability my denominator will not be 52; instead, it will be 13, because the total number of cards in clubs is 13. Since we have only one king in clubs, the probability of getting a king given the card is clubs will be 1/13 = 0.077.

Let's take one more example. Consider a random experiment of tossing 2 coins. The sample space here will be:

S = {HH, HT, TH, TT}

If a person is asked to find the probability of getting a tail (on at least one coin), his answer would be 3/4 = 0.75. Now suppose this same experiment is performed by another person, but now we require that both the coins show heads. If event A: 'both the coins show heads' must happen, then the elementary outcomes {HT, TH, TT} could not have happened. Hence, in this situation, the probability of getting heads on both the coins is 1/4 = 0.25.

From the above examples, we observe that the probability may change if some additional information is given to us. This is exactly the case while building any machine learning model: we need to find the output given some features.

Mathematically, the conditional probability of event A given that event B has already happened is given by:

P(A|B) = P(A ∩ B) / P(B)

Bayes' Rule

Now we are prepared to state one of the most useful results in conditional probability: Bayes' Rule.

Bayes' theorem, which was given by Thomas Bayes, a British mathematician, in 1763, provides a means for calculating the probability of an event given some information. Mathematically, Bayes' theorem can be stated as:

P(A|B) = P(B|A) * P(A) / P(B)

Basically, we are trying to find the probability of event A, given that event B is true. Here P(A) is called the prior probability, which means it is the probability of the event before the evidence is seen. P(A|B) is called the posterior probability, i.e., the probability of the event after the evidence is seen.

With regards to our dataset, this formula can be re-written with

Y: class of the variable
X: dependent feature vector (of size n)

as:

P(Y|X) = P(X|Y) * P(Y) / P(X)
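A quick numeric check of Bayes' rule in Python, using the king-of-clubs example from above (event A: the card is a king; event B: the card is a club):

# P(A|B) = P(B|A) * P(A) / P(B), checked on the deck-of-cards example above.
p_king = 4 / 52               # P(A): probability the card is a king
p_clubs = 13 / 52             # P(B): probability the card is a club
p_clubs_given_king = 1 / 4    # P(B|A): of the 4 kings, exactly 1 is a club

p_king_given_clubs = p_clubs_given_king * p_king / p_clubs
print(round(p_king_given_clubs, 3))   # 0.077, matching the 1/13 computed directly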
What is Naive Bayes?

Bayes' rule provides us with the formula for the probability of Y given some feature X. In real-world problems, however, we hardly find any case where there is only one feature. When the features are independent, we can extend Bayes' rule to what is called Naive Bayes, which assumes that the features are independent, i.e., changing the value of one feature doesn't influence the values of the other features, and this is why we call this algorithm "naive". Naive Bayes can be used for various things like face recognition, weather prediction, medical diagnosis, news classification, sentiment analysis, and a lot more.

When there are multiple X variables, we simplify the formula by assuming that the X's are independent, so for n features it becomes:

P(y | x1, ..., xn) = P(x1 | y) * P(x2 | y) * ... * P(xn | y) * P(y) / (P(x1) * P(x2) * ... * P(xn))

Since the denominator is constant for a given input, we can remove it. It's purely your choice if you want to remove it or not; removing the denominator will help you save time and calculations. This formula can also be understood as: compute P(y) * P(x1 | y) * ... * P(xn | y) for each class y and predict the class with the largest value.

There are a whole lot of formulas mentioned here, but worry not, we will try to understand all this with the help of an example.

Assumptions of Naive Bayes

• All the variables are independent. That is, if the animal is a Dog, that doesn't mean that Size will be Medium.
• All the predictors have an equal effect on the outcome. That is, the animal being a dog does not have more importance in deciding if we can pet it or not; all the features have equal importance.

Naive Bayes Example

Let's take a dataset to predict whether we can pet an animal or not. We should try to apply the Naive Bayes formula on this dataset; however, before that, we need to do some precomputations. We need to find P(xi | yj) for each xi in X and each yj in Y, and we also need the class probabilities P(y); for example, P(Pet Animal = NO) = 6/14.

Now if we send our test data, suppose test = (Cow, Medium, Black), we compute the (unnormalized) probability of petting the animal and the probability of not petting the animal. Since we know P(Yes|Test) + P(No|Test) = 1, we normalize the two results. We see here that P(Yes|Test) > P(No|Test), so the prediction that we can pet this animal is "Yes".

Gaussian Naive Bayes

So far, we have discussed how to predict probabilities if the predictors take discrete values. But what if they are continuous? For this, we need to make some more assumptions regarding the distribution of each feature. The different naive Bayes classifiers differ mainly by the assumptions they make regarding the distribution of P(xi | y); here we'll discuss Gaussian Naive Bayes.

Gaussian Naive Bayes is used when we assume all the continuous variables associated with each feature to be distributed according to a Gaussian distribution (also called the Normal distribution). The conditional probability changes here since we now have continuous values; the probability density function (PDF) of a normal distribution is given by:

f(x) = (1 / (σ * sqrt(2π))) * exp(-(x - μ)² / (2σ²))

We can use this formula to compute the likelihoods P(xi | y) if our data is continuous.
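Since the article mentions implementing this in Python, here is a short sketch using scikit-learn's GaussianNB on made-up continuous data (the feature values are illustrative only):

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Illustrative continuous features (e.g. height in cm, weight in kg) and binary labels.
X = np.array([[170.0, 65.0], [160.0, 52.0], [182.0, 86.0],
              [158.0, 49.0], [175.0, 78.0], [165.0, 55.0]])
y = np.array([1, 0, 1, 0, 1, 0])

model = GaussianNB()   # assumes each feature is Gaussian within each class
model.fit(X, y)

print(model.predict([[172.0, 70.0]]))         # predicted class
print(model.predict_proba([[172.0, 70.0]]))   # normalized class probabilities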

Endnotes

Naive Bayes algorithms are mostly used in face recognition, weather prediction, medical diagnosis, news classification, sentiment analysis, etc. In this article, we learned the mathematical intuition behind this algorithm. You have already taken your first step to master this algorithm, and from here all you need is practice.


What are Decision Trees?

In simple words, a decision tree is a structure that contains nodes (rectangular boxes) and edges (arrows) and is built from a dataset (a table whose columns represent features/attributes and whose rows correspond to records). Each node is either used to make a decision (known as a decision node) or to represent an outcome (known as a leaf node).

Decision tree Example

The example decision tree is used to classify whether a person is Fit or Unfit. The decision nodes are questions like 'Is the person less than 30 years of age?', 'Does the person eat junk?', etc., and the leaves are one of the two possible outcomes, viz. Fit and Unfit. Looking at the decision tree we can make the following decisions: if a person is less than 30 years of age and doesn't eat junk food then he is Fit; if a person is less than 30 years of age and eats junk food then he is Unfit; and so on.

The initial node is called the root node, the final nodes are called the leaf nodes, and the rest of the nodes are called intermediate or internal nodes. The root and intermediate nodes represent the decisions while the leaf nodes represent the outcomes.

ID3 in brief

ID3 stands for Iterative Dichotomiser 3 and is named such because the algorithm iteratively (repeatedly) dichotomizes (divides) features into two or more groups at each step.

Invented by Ross Quinlan, ID3 uses a top-down greedy approach to build a decision tree. In simple words, the top-down approach means that we start building the tree from the top, and the greedy approach means that at each iteration we select the best feature at the present moment to create a node.

Most generally, ID3 is only used for classification problems with nominal features.

Dataset description

In this article, we'll be using a sample dataset of COVID-19 infection. A preview of the entire dataset is shown below.

+----+-------+-------+------------------+----------+
| ID | Fever | Cough | Breathing issues | Infected |
+----+-------+-------+------------------+----------+
| 1  | NO    | NO    | NO               | NO       |
| 2  | YES   | YES   | YES              | YES      |
| 3  | YES   | YES   | NO               | NO       |
| 4  | YES   | NO    | YES              | YES      |
| 5  | YES   | YES   | YES              | YES      |
| 6  | NO    | YES   | NO               | NO       |
| 7  | YES   | NO    | YES              | YES      |
| 8  | YES   | NO    | YES              | YES      |
| 9  | NO    | YES   | YES              | YES      |
| 10 | YES   | YES   | NO               | YES      |
| 11 | NO    | YES   | NO               | NO       |
| 12 | NO    | YES   | YES              | YES      |
| 13 | NO    | YES   | YES              | NO       |
| 14 | YES   | YES   | NO               | NO       |
+----+-------+-------+------------------+----------+

The columns are self-explanatory. YES and NO stand for Yes and No respectively. In the Infected column, YES and NO represent Infected and Not Infected respectively.

The columns used to make decision nodes, viz. 'Breathing Issues', 'Cough' and 'Fever', are called feature columns or just features, and the column used for leaf nodes, i.e. 'Infected', is called the target column.
Metrics in ID3

As mentioned previously, the ID3 algorithm selects the best feature at each step while building a decision tree. Before you ask, the answer to the question 'How does ID3 select the best feature?' is that ID3 uses Information Gain, or just Gain, to find the best feature.

Information Gain calculates the reduction in entropy and measures how well a given feature separates or classifies the target classes. The feature with the highest Information Gain is selected as the best one.

In simple words, Entropy is the measure of disorder, and the Entropy of a dataset is the measure of disorder in the target feature of the dataset. In the case of binary classification (where the target column has only two types of classes), entropy is 0 if all values in the target column are homogenous (similar) and is 1 if the target column has an equal number of values for both classes.

We denote our dataset as S; its entropy is calculated as:

Entropy(S) = - ∑ pᵢ * log₂(pᵢ) ;  i = 1 to n

where
n is the total number of classes in the target column (in our case n = 2, i.e. YES and NO), and
pᵢ is the probability of class 'i', i.e. the ratio of the "number of rows with class i in the target column" to the "total number of rows" in the dataset.

Information Gain for a feature column A is calculated as:

IG(S, A) = Entropy(S) - ∑ ((|Sᵥ| / |S|) * Entropy(Sᵥ))

where Sᵥ is the set of rows in S for which the feature column A has value v, |Sᵥ| is the number of rows in Sᵥ, and likewise |S| is the number of rows in S.

ID3 Steps

1. Calculate the Information Gain of each feature.
2. Considering that all rows don't belong to the same class, split the dataset S into subsets using the feature for which the Information Gain is maximum.
3. Make a decision tree node using the feature with the maximum Information Gain.
4. If all rows belong to the same class, make the current node a leaf node with the class as its label.
5. Repeat for the remaining features until we run out of features, or the decision tree has all leaf nodes.
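A short Python sketch of the two formulas above, applied to the COVID-19 dataset (rows hard-coded from the table); the printed values match the entropy and information-gain numbers worked out in the next section.

from math import log2
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the classes in the target column."""
    counts, total = Counter(labels), len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, feature_index, labels):
    """IG(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)."""
    total = len(rows)
    gain = entropy(labels)
    for value in set(r[feature_index] for r in rows):
        subset = [labels[i] for i, r in enumerate(rows) if r[feature_index] == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

# Rows as (Fever, Cough, Breathing issues); Infected is the target column.
rows = [("NO","NO","NO"), ("YES","YES","YES"), ("YES","YES","NO"), ("YES","NO","YES"),
        ("YES","YES","YES"), ("NO","YES","NO"), ("YES","NO","YES"), ("YES","NO","YES"),
        ("NO","YES","YES"), ("YES","YES","NO"), ("NO","YES","NO"), ("NO","YES","YES"),
        ("NO","YES","YES"), ("YES","YES","NO")]
infected = ["NO","YES","NO","YES","YES","NO","YES","YES","YES","YES","NO","YES","NO","NO"]

print(round(entropy(infected), 2))   # 0.99
for i, name in enumerate(["Fever", "Cough", "Breathing issues"]):
    print(name, round(information_gain(rows, i, infected), 2))   # 0.13, 0.04, 0.40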


Implementation on our Dataset

As stated in the previous section, the first step is to find the best feature, i.e. the one that has the maximum Information Gain (IG). We'll calculate the IG for each of the features now, but for that we first need to calculate the entropy of S.

From the total of 14 rows in our dataset S, there are 8 rows with the target value YES and 6 rows with the target value NO. The entropy of S is calculated as:

Entropy(S) = - (8/14) * log₂(8/14) - (6/14) * log₂(6/14) = 0.99

Note: If all the values in our target column are the same, the entropy will be zero (meaning that it has no or zero randomness).

We now calculate the Information Gain for each feature.

IG calculation for Fever:

In this (Fever) feature there are 8 rows having value YES and 6 rows having value NO. As shown below, in the 8 rows with YES for Fever, there are 6 rows having target value YES and 2 rows having target value NO.

+-------+-------+------------------+----------+
| Fever | Cough | Breathing issues | Infected |
+-------+-------+------------------+----------+
| YES   | YES   | YES              | YES      |
| YES   | YES   | NO               | NO       |
| YES   | NO    | YES              | YES      |
| YES   | YES   | YES              | YES      |
| YES   | NO    | YES              | YES      |
| YES   | NO    | YES              | YES      |
| YES   | YES   | NO               | YES      |
| YES   | YES   | NO               | NO       |
+-------+-------+------------------+----------+

As shown below, in the 6 rows with NO for Fever, there are 2 rows having target value YES and 4 rows having target value NO.

+-------+-------+------------------+----------+
| Fever | Cough | Breathing issues | Infected |
+-------+-------+------------------+----------+
| NO    | NO    | NO               | NO       |
| NO    | YES   | NO               | NO       |
| NO    | YES   | YES              | YES      |
| NO    | YES   | NO               | NO       |
| NO    | YES   | YES              | YES      |
| NO    | YES   | YES              | NO       |
+-------+-------+------------------+----------+

The block below demonstrates the calculation of Information Gain for Fever.

# total rows
|S| = 14

For v = YES, |Sᵥ| = 8
Entropy(Sᵥ) = - (6/8) * log₂(6/8) - (2/8) * log₂(2/8) = 0.81

For v = NO, |Sᵥ| = 6
Entropy(Sᵥ) = - (2/6) * log₂(2/6) - (4/6) * log₂(4/6) = 0.91

# Expanding the summation in the IG formula:
IG(S, Fever) = Entropy(S) - (|Sʏᴇꜱ| / |S|) * Entropy(Sʏᴇꜱ) - (|Sɴᴏ| / |S|) * Entropy(Sɴᴏ)
∴ IG(S, Fever) = 0.99 - (8/14) * 0.81 - (6/14) * 0.91 = 0.13

Next, we calculate the IG for the features "Cough" and "Breathing issues" in the same way:

IG(S, Cough) = 0.04
IG(S, BreathingIssues) = 0.40

Since the feature Breathing issues has the highest Information Gain, it is used to create the root node.

Next, from the remaining two unused features, namely Fever and Cough, we decide which one is the best for the left branch of Breathing Issues. Since the left branch of Breathing Issues denotes YES, we will work with the subset of the original data, i.e. the set of rows having YES as the value in the Breathing Issues column. These 8 rows are shown below:

+-------+-------+------------------+----------+
| Fever | Cough | Breathing issues | Infected |
+-------+-------+------------------+----------+
| YES   | YES   | YES              | YES      |
| YES   | NO    | YES              | YES      |
| YES   | YES   | YES              | YES      |
| YES   | NO    | YES              | YES      |
| YES   | NO    | YES              | YES      |
| NO    | YES   | YES              | YES      |
| NO    | YES   | YES              | YES      |
| NO    | YES   | YES              | NO       |
+-------+-------+------------------+----------+

Next, we calculate the IG for the features Fever and Cough using the subset Sʙʏ (Set Breathing Issues Yes) shown above.

Note: For this IG calculation the entropy is calculated from the subset Sʙʏ and not from the original dataset S.

IG(Sʙʏ, Fever) = 0.20
IG(Sʙʏ, Cough) = 0.09

The IG of Fever is greater than that of Cough, so we select Fever as the left branch of Breathing Issues.

Next, we find the feature with the maximum IG for the right branch of Breathing Issues. But since there is only one unused feature left, we have no other choice but to make it (Cough) the right branch of the root node.

There are no more unused features, so we stop here and jump to the final step of creating the leaf nodes.

For the left leaf node of Fever, we see the subset of rows from the original dataset that have both Breathing Issues and Fever values as YES:

+-------+-------+------------------+----------+
| Fever | Cough | Breathing issues | Infected |
+-------+-------+------------------+----------+
| YES   | YES   | YES              | YES      |
| YES   | NO    | YES              | YES      |
| YES   | YES   | YES              | YES      |
| YES   | NO    | YES              | YES      |
| YES   | NO    | YES              | YES      |
+-------+-------+------------------+----------+

Since all the values in the target column are YES, we label the left leaf node as YES, but to make it more logical we label it Infected.

Similarly, for the right node of Fever we see the subset of rows from the original dataset that have Breathing Issues value YES and Fever value NO:

+-------+-------+------------------+----------+
| Fever | Cough | Breathing issues | Infected |
+-------+-------+------------------+----------+
| NO    | YES   | YES              | YES      |
| NO    | YES   | YES              | NO       |
| NO    | YES   | YES              | NO       |
+-------+-------+------------------+----------+

Here not all but most of the values are NO, hence NO or Not Infected becomes our right leaf node.

We repeat the same process for the node Cough; however, here both the left and right leaves turn out to be the same, i.e. NO or Not Infected.

Looks strange, doesn't it? I know! The right node of Breathing issues is as good as just a leaf node with class 'Not Infected'. This is one of the drawbacks of ID3: it doesn't do pruning. Pruning is a mechanism that reduces the size and complexity of a decision tree by removing unnecessary nodes.

Another drawback of ID3 is overfitting or high variance, i.e. it learns the training dataset so well that it fails to generalize on new data.
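Putting the whole procedure together, here is a compact recursive ID3 sketch in Python (self-contained; the majority-vote labeling of impure leaves is an implementation choice for this sketch, not something the article prescribes):

from math import log2
from collections import Counter

def entropy(labels):
    counts, total = Counter(labels), len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def best_feature(rows, labels, features):
    """Return the unused feature index with maximum information gain."""
    def gain(f):
        g = entropy(labels)
        for v in set(r[f] for r in rows):
            sub = [labels[i] for i, r in enumerate(rows) if r[f] == v]
            g -= (len(sub) / len(rows)) * entropy(sub)
        return g
    return max(features, key=gain)

def id3(rows, labels, features):
    # Leaf: all rows share a class, or no features remain (majority vote in that case).
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    f = best_feature(rows, labels, features)
    node = {}
    for v in set(r[f] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[f] == v]
        node[(f, v)] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                           [g for g in features if g != f])
    return node

# Same COVID-19 dataset as above: columns are Fever, Cough, Breathing issues.
rows = [("NO","NO","NO"), ("YES","YES","YES"), ("YES","YES","NO"), ("YES","NO","YES"),
        ("YES","YES","YES"), ("NO","YES","NO"), ("YES","NO","YES"), ("YES","NO","YES"),
        ("NO","YES","YES"), ("YES","YES","NO"), ("NO","YES","NO"), ("NO","YES","YES"),
        ("NO","YES","YES"), ("YES","YES","NO")]
infected = ["NO","YES","NO","YES","YES","NO","YES","YES","YES","YES","NO","YES","NO","NO"]

tree = id3(rows, infected, [0, 1, 2])   # feature indices: 0=Fever, 1=Cough, 2=Breathing issues
print(tree)   # the root splits on feature 2 (Breathing issues), as derived in the article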
