Unit-3 (4)
A formula is in conjunctive normal form (CNF) or clausal normal form if it is a conjunction of one
or more clauses, where a clause is a disjunction of literals; otherwise put, it is an AND of ORs.
Every propositional formula can be converted into an equivalent formula that is in CNF. This
transformation is based on rules about logical equivalences: the double negative law, De
Morgan's laws, and the distributive law.
As a normal form, it is useful in automated theorem proving
Conjunctive Normal Form (CNF): A WFF is in CNF format when it is a conjunction of disjunctions
of literals.
Example: (P ∨ Q ∨ R) ∧ (S ∨ P ∨ T ∨ R) ∧ (Q ∨ S)
A clause is a disjunction of units; the units that make up a clause are called literals.
EXAMPLE:
P ↔ (Q ∨ R)
1. Eliminate ↔, replacing α ↔ β with (α → β) ∧ (β → α):
(P → (Q ∨ R)) ∧ ((Q ∨ R) → P)
2. Eliminate →, replacing α → β with ¬α ∨ β:
(¬P ∨ Q ∨ R) ∧ (¬(Q ∨ R) ∨ P)
3. Move ¬ inwards using De Morgan's rules and double-negation:
(¬P ∨ Q ∨ R) ∧ ((¬Q ∧ ¬R) ∨ P)
4. Apply the distributive law (∨ over ∧) and flatten:
(¬P ∨ Q ∨ R) ∧ (¬Q ∨ P) ∧ (¬R ∨ P)
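The same conversion can be checked mechanically with sympy's to_cnf (a library sketch; sympy is not part of the original notes):

    from sympy import symbols
    from sympy.logic.boolalg import Equivalent, to_cnf

    P, Q, R = symbols('P Q R')

    # P <-> (Q v R), the formula converted by hand above
    formula = Equivalent(P, Q | R)

    # to_cnf performs the same steps: eliminate <-> and ->, push ¬ inwards, distribute
    print(to_cnf(formula))   # (P | ~Q) & (P | ~R) & (Q | R | ~P), up to clause order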
RESOLUTION
In propositional logic, the procedure for producing a proof of a proposition P with respect to
a set of axioms F is called RESOLUTION. It works by refutation: ¬P is added to F, and clauses are
resolved until the empty clause is derived.
An example:
Tom is a hardworking student.
Hardworking(Tom)
Tom is an intelligent student.
Intelligent(Tom)
If Tom is hardworking and Tom is intelligent, then Tom scores high marks.
In PL,
Hardworking(Tom) ∧ Intelligent(Tom) → score_high_marks(Tom).
But what about John and Jill? We would rather write "All students who are hardworking and
intelligent score high marks."
Unfortunately, this statement cannot be written in propositional logic, because PL has no
structure for quantifying over individuals.
FOL is a generalization of PL that allows us to express and infer arguments over infinite models, such as:
o All men are mortal
o Some birds cannot fly
o At least one planet has life on it
Each sentence, or statement, is broken down into a subject and a predicate.
In each sentence, we talk about something. In the sentence "Pinky is a cat", "Pinky" is the
subject, and the property we give to the subject is called the predicate; here "is a cat" is the
predicate. The predicate modifies or defines the properties of the subject. In its simplest form
a predicate refers to a single subject, though a predicate may take several arguments, as in
drinks(x, coffee) below.
A sentence in first-order logic is written in the form P(x), where P is the predicate and x is the
subject, represented as a variable. The sentence "Pinky is a cat" is written in FOL as
cat(Pinky), an atomic sentence. In the more general form, cat(x) means "x is a cat".
Sometimes the subject is not a single element but a group of objects. Example: "Every Lion
drinks coffee". Here the Universe of Discourse (UoD) is the set of lions, so whenever we refer
to the variable "x", "x" stands for one of the lions; this is the connection between "x" and the UoD.
[Diagram: elements x1, x2, x3 as members of the set LION]
Where each x1, x2, x3 is a particular element of the LION set, "every Lion drinks coffee" is the same
as:
drinks(x1, coffee) ∧ drinks(x2, coffee) ∧ drinks(x3, coffee)
Now if there are infinitely many lions, we cannot mention each element separately. How many
subjects would the UoD then contain?
Again, consider another example: "Some cats are intelligent".
[Diagram: elements c1, c2, c3 as members of the set CAT]
If there is no cat that is intelligent, then the whole statement is FALSE. And if the UoD of
cats is infinite, it is likewise impossible to OR the statements together.
In first-order logic, a sentence can be structured using the universal quantifier (symbolized ∀) or
the existential quantifier (symbolized ∃).
So for the above examples:
∀x {drinks(x, coffee)}
∃x {intelligent(x)}
Example: in "the sky is blue", the predicate is "is blue" and the subject is "sky"; it can be represented as Blue(sky).
Syntax of FOL:
1. Connectives
2. Quantifiers
3. Constants
4. Variables
5. Functions
6. Predicates
The user defines these primitives:
Constant symbols: represent individuals in the world, e.g., Mary, 3.
Function symbols: map individuals to individuals, e.g., father-of(Mary) = John, color-of(Sky) = Blue.
Predicate symbols: map individuals to truth values, e.g., greater(5, 3), green(Grass), color(Grass,
Green).
Universal Elimination
If (∀x)P(x) is true, then P(c) is true, where c is a constant in the domain of x. For example, from
(∀x)eats(Ziggy, x) we can infer eats(Ziggy, IceCream). The variable symbol can be replaced by any ground
term, i.e., any constant symbol or function symbol applied to ground terms only.
Existential Introduction
If P(c) is true, then (∃x)P(x) is inferred. For example, from eats(Ziggy, IceCream) we can infer
(∃x)eats(Ziggy, x). All instances of the given constant symbol are replaced by the new variable symbol.
Note that the variable symbol cannot already exist anywhere in the expression.
Existential Elimination
From (∃x)P(x) infer P(c). For example, from (∃x)eats(Ziggy, x) infer eats(Ziggy, Cheese). Note that the
variable is replaced by a brand new constant that does not occur in this or any other sentence in the
Knowledge Base. In other words, we don't want to accidentally draw other inferences about it by
introducing the constant. All we know is there must be some constant that makes this true, so we can
introduce a brand new one to stand in for that (unknown) constant.
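These rules are, mechanically, just substitution of terms for variables. A minimal sketch (the tuple encoding of sentences is an assumption made for illustration, not part of the notes):

    # A sentence is a tuple: ('eats', 'Ziggy', 'x'); lowercase strings are variables.
    def substitute(sentence, binding):
        """Replace every bound variable in `sentence` by its value in `binding`."""
        if isinstance(sentence, tuple):
            return tuple(substitute(t, binding) for t in sentence)
        return binding.get(sentence, sentence)

    # Universal Elimination: from (∀x) eats(Ziggy, x), infer eats(Ziggy, IceCream)
    print(substitute(('eats', 'Ziggy', 'x'), {'x': 'IceCream'}))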
Entailment
Proof System
A proof is a sequence of sentences, where each sentence is either a premise or a sentence derived from
earlier sentences in the proof by one of the rules of inference.
The last sentence is the theorem (also called goal or query) that we want to prove.
Unification
Unification is the process of finding a substitution θ that makes two sentences look identical; e.g., we can find a θ such that King(x) and Greedy(x) match King(John) and Greedy(y).
p                 q                     θ
Knows(John, x)    Knows(John, Jane)     {x/Jane}
Knows(John, x)    Knows(y, OJ)          {x/OJ, y/John}
Knows(John, x)    Knows(y, Mother(y))   {y/John, x/Mother(John)}
Knows(John, x)    Knows(x, OJ)          {fail}
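A minimal unification sketch that reproduces this table (the tuple encoding, with lowercase strings as variables, is an assumption for illustration; the occur-check is omitted for brevity):

    def is_var(t):
        # variables are lowercase strings (x, y); constants are capitalized (John, OJ)
        return isinstance(t, str) and t[0].islower()

    def unify(x, y, theta):
        """Extend substitution theta so that x and y become equal, or return None."""
        if theta is None:
            return None
        if x == y:
            return theta
        if is_var(x):
            return unify_var(x, y, theta)
        if is_var(y):
            return unify_var(y, x, theta)
        if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
            for xi, yi in zip(x, y):
                theta = unify(xi, yi, theta)
            return theta
        return None

    def unify_var(var, t, theta):
        if var in theta:
            return unify(theta[var], t, theta)
        return {**theta, var: t}   # occur-check omitted for brevity

    # Knows(John, x) vs Knows(y, Mother(y)):
    print(unify(('Knows', 'John', 'x'), ('Knows', 'y', ('Mother', 'y')), {}))
    # -> {'y': 'John', 'x': ('Mother', 'y')}, i.e. x/Mother(John) once θ is applied to itself
    # Knows(John, x) vs Knows(x, OJ):
    print(unify(('Knows', 'John', 'x'), ('Knows', 'x', 'OJ'), {}))
    # -> None (fail), since x would have to be both John and OJ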
Two literals are contradictory if one can be unified with the negation of the other. For example, man(x)
and ¬man(Himalayas) are contradictory, since man(x) and man(Himalayas) can be unified. In predicate
logic, the unification algorithm is used to locate pairs of literals that cancel out. It is important that if two
instances of the same variable occur, they are given identical substitutions. Resolution for predicate
logic then proceeds as in the examples below.
Resolution Example: Anyone passing his history exams and winning the lottery is happy. But anyone who
studies or is lucky can pass all his exams. John did not study but John is lucky. Anyone who is lucky wins
the lottery. Is John happy?
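A sketch of one possible formalization and refutation (the notes give no solution here; the predicate names are illustrative):
FOL:
∀x {pass(x, History) ∧ win(x, Lottery) → happy(x)}
∀x ∀y {study(x) ∨ lucky(x) → pass(x, y)}
¬study(John) ∧ lucky(John)
∀x {lucky(x) → win(x, Lottery)}
Clauses: ¬pass(x, History) ∨ ¬win(x, Lottery) ∨ happy(x); ¬study(y) ∨ pass(y, z); ¬lucky(u) ∨ pass(u, v); ¬study(John); lucky(John); ¬lucky(w) ∨ win(w, Lottery); negated goal ¬happy(John).
Refutation: ¬happy(John) with the first clause {x/John} gives ¬pass(John, History) ∨ ¬win(John, Lottery); with ¬lucky(w) ∨ win(w, Lottery) {w/John} gives ¬pass(John, History) ∨ ¬lucky(John); with lucky(John) gives ¬pass(John, History); with ¬lucky(u) ∨ pass(u, v) {u/John, v/History} gives ¬lucky(John); with lucky(John) gives NULL. So John is happy.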
Example 1:
Convert the following sentence into predicate logic and then prove "Is someone smiling?” using
resolution:
1. All people who are graduating are happy
2. All happy people smile
3. Someone is graduating
FOL:
∀x {graduating(x) → happy(x)}
∀x {happy(x) → smile(x)}
∃x {graduating(x)}
Prove: ∃x {smile(x)}
FOL to CNF:
Step 1: eliminate implications and negate the goal
∀x {¬graduating(x) ∨ happy(x)}
∀x {¬happy(x) ∨ smile(x)}
∃x {graduating(x)}
¬∃w {smile(w)}
Step 2: standardize variables
∀x {¬graduating(x) ∨ happy(x)}
∀y {¬happy(y) ∨ smile(y)}
∃z {graduating(z)}
∀w {¬smile(w)}
Step 3: skolemization
∀x {¬graduating(x) ∨ happy(x)}
∀y {¬happy(y) ∨ smile(y)}
graduating(A)
∀w {¬smile(w)}
Step 4: drop universal quantifiers
¬graduating(x) ∨ happy(x)
¬happy(y) ∨ smile(y)
graduating(A)
¬smile(w)
Resolution tree
If a fact F is to be proved, the refutation starts with ¬F.
¬F is resolved against the other clauses in the KB.
The process stops when it derives the NULL (empty) clause.
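Since the tree itself is not reproduced here, a linear derivation from the four clauses above:
1. ¬smile(w) (the negated goal) and ¬happy(y) ∨ smile(y) resolve with {y/w} to give ¬happy(w).
2. ¬happy(w) and ¬graduating(x) ∨ happy(x) resolve with {x/w} to give ¬graduating(w).
3. ¬graduating(w) and graduating(A) resolve with {w/A} to give the NULL clause.
Hence ∃x smile(x): someone is smiling.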
Example 2:
Convert the following sentence into predicate logic and then prove "Was Marcus loyal to Caesar?” using
resolution:
1. Marcus was a man.
2. Marcus was a Pompeian.
3. All Pompeians were Romans.
4. Caesar was a ruler.
5. All Romans were either loyal to Caesar or hated him.
6. Everyone is loyal to someone.
7. People only try to assassinate rulers they are not loyal to.
8. Marcus tried to assassinate Caesar.
Solution:
The facts described by these sentences can be represented as a set of wff's in predicate logic as follows:
1. Marcus was a man.
man(Marcus)
2. Marcus was a Pompeian.
Pompeian(Marcus)
3. All Pompeians were Romans.
∀x: Pompeian(x) → Roman(x)
4. Caesar was a ruler.
ruler(Caesar)
5. All Romans were either loyal to Caesar or hated him.
∀x: Roman(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)
6. Everyone is loyal to someone.
∀x: ∃y: loyalto(x, y)
7. People only try to assassinate rulers they are not loyal to.
∀x: ∀y: person(x) ∧ ruler(y) ∧ tryassassinate(x, y) → ¬loyalto(x, y)
8. Marcus tried to assassinate Caesar.
tryassassinate (Marcus, Caesar)
CNF:
o man(Marcus)
o Pompeian(Marcus)
o ¬Pompeian(x1) ∨ Roman(x1)
o ruler(Caesar)
o ¬Roman(x2) ∨ loyalto(x2, Caesar) ∨ hate(x2, Caesar)
o loyalto(x3, S1(x3))
o ¬person(x4) ∨ ¬ruler(y1) ∨ ¬tryassassinate(x4, y1) ∨ ¬loyalto(x4, y1)
o tryassassinate(Marcus, Caesar)
Resolution tree:
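The tree itself is not reproduced here; a sketch of the refutation. To answer the question, try to prove ¬loyalto(Marcus, Caesar), so its negation loyalto(Marcus, Caesar) is added to the clause set:
1. loyalto(Marcus, Caesar) with clause 7 {x4/Marcus, y1/Caesar} gives ¬person(Marcus) ∨ ¬ruler(Caesar) ∨ ¬tryassassinate(Marcus, Caesar).
2. With ruler(Caesar): ¬person(Marcus) ∨ ¬tryassassinate(Marcus, Caesar).
3. With tryassassinate(Marcus, Caesar): ¬person(Marcus).
To derive NULL, person(Marcus) is needed, but the KB only contains man(Marcus); as in the textbook version of this example, an extra axiom ∀x: man(x) → person(x) (clause ¬man(x5) ∨ person(x5)) has to be assumed. With it, man(Marcus) yields person(Marcus), the NULL clause follows, and the answer is that Marcus was not loyal to Caesar.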
1. Convert the following sentences to FOPL
Jack owns a dog.
Every dog owner is an animal lover.
No animal lover kills an animal.
Either Jack or Curiosity killed the cat, who is named Tuna.
Also prove by resolution: Did Curiosity kill the cat?
a. (∃x) Dog(x) ∧ Owns(Jack, x)
b. (∀x) [(∃y) Dog(y) ∧ Owns(x, y)] → AnimalLover(x)
c. (∀x) AnimalLover(x) → [(∀y) Animal(y) → ¬Kills(x, y)]
d. Kills(Jack, Tuna) ∨ Kills(Curiosity, Tuna)
e. Cat(Tuna)
f. (∀x) Cat(x) → Animal(x)
g. Kills(Curiosity, Tuna) GOAL
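A sketch of the refutation (D is a Skolem constant standing for Jack's dog). Clause form: Dog(D); Owns(Jack, D); ¬Dog(y) ∨ ¬Owns(x, y) ∨ AnimalLover(x); ¬AnimalLover(x) ∨ ¬Animal(y) ∨ ¬Kills(x, y); Kills(Jack, Tuna) ∨ Kills(Curiosity, Tuna); Cat(Tuna); ¬Cat(x) ∨ Animal(x); negated goal ¬Kills(Curiosity, Tuna).
1. ¬Kills(Curiosity, Tuna) with (d) gives Kills(Jack, Tuna).
2. Dog(D) and Owns(Jack, D) with (b) give AnimalLover(Jack).
3. Cat(Tuna) with (f) gives Animal(Tuna).
4. AnimalLover(Jack) and Animal(Tuna) with (c) give ¬Kills(Jack, Tuna), which resolves with Kills(Jack, Tuna) to NULL.
Hence Curiosity did kill Tuna.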
3. Using propositional linear resolution, show the following propositional sentence is unsatisfiable.
Convert this sentence to clause form and derive the empty clause using resolution.
4. Represent the following sentence in predicate form: "All the children like sweets".
5. Represent the following sentences in first-order logic, using a consistent vocabulary (which you
must define):
a. Not all students take both History and Biology.
b. Only one student failed History.
c. Only one student failed both History and Biology.
d. The best score in History was better than the best score in Biology.
e. Every person who dislikes all vegetarians is smart.
Practice Exercise -------
(Problem statement reconstructed from the solution below: John likes all kinds of food. Apples are food. Chicken is food. Anything anyone eats and is not killed by is food. Bill eats peanuts and is still alive. Sue eats everything Bill eats.)
Solution:
FOL:
a) ∀x food(x) → likes(John, x)
b) food(Apple)
c) food(Chicken)
d) ∀x ∀y [eats(y, x) ∧ ¬killed(y, x) → food(x)]
e) eats(Bill, peanuts) ∧ ¬killed(Bill, peanuts)
f) ∀x eats(Bill, x) → eats(Sue, x)
Clausal Form:
a. ¬food(x) ∨ likes(John, x)
b. food(Apple)
c. food(Chicken)
d. ¬eats(y, x) ∨ killed(y, x) ∨ food(x)
e. eats(Bill, peanuts)
f. ¬killed(Bill, peanuts)
g. ¬eats(Bill, x) ∨ eats(Sue, x)
INFERENCE RULES
Forward Chaining: Conclude from "A" and "A implies B" to "B".
A
A → B
-----------
B
Example:
It is raining.
If it is raining, the street is wet.
-----------
The street is wet.
Forward chaining starts with the available data and uses inference rules to extract more data until the
goal is reached. Also called data-driven knowledge extraction.
Backward Chaining: Conclude from "B" and "A implies B" to "A".
B
A → B
-----------
A
Example:
The street is wet.
If it is raining, the street is wet.
-----------
It is raining.
Backward chaining works in the backward direction. The system selects a goal state and the rules whose
THEN portion has the goal state as its conclusion, and establishes sub-goals that must be satisfied for the
goal state to be true. Also called goal-driven knowledge extraction.
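A minimal forward-chaining sketch over propositional rules (the fact/rule encoding is an assumption for illustration):

    # Forward chaining: repeatedly fire rules whose premises are all known,
    # adding their conclusions, until nothing new can be derived (data-driven).
    def forward_chain(facts, rules):
        facts = set(facts)
        changed = True
        while changed:
            changed = False
            for premises, conclusion in rules:
                if conclusion not in facts and all(p in facts for p in premises):
                    facts.add(conclusion)
                    changed = True
        return facts

    rules = [(["raining"], "street_wet")]      # "if it is raining, the street is wet"
    print(forward_chain(["raining"], rules))   # {'raining', 'street_wet'}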
PROBABILITY THEORY
Conditional probability, P(A|B), is the probability of event A given that event B has occurred:
P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0.
Independent events
Events A and B are independent if the occurrence of one does not change the probability of the other;
formally, A and B are independent if
P(A ∩ B) = P(A) · P(B).
Bayes' theorem
In probability theory, Bayes' theorem (alternatively Bayes' law or Bayes' rule) describes the
probability of an event, based on prior knowledge of conditions that might be related to the
event.
Bayes' theorem is named after Reverend Thomas Bayes.
Bayes' theorem is stated mathematically as the following equation:
P(A|B) = P(B|A) · P(A) / P(B)
EXAMPLE:
1. At a certain university, 4% of men are over 6 feet tall and 1% of women are over 6 feet tall. The
total student population is divided in the ratio 3:2 in favour of women. If a student is selected
at random from among all those over six feet tall, what is the probability that the student is a
woman?
2. A factory production line is manufacturing bolts using three machines, A, B and C. Of the total
output, machine A is responsible for 25%, machine B for 35% and machine C for the rest. It is
known from previous experience with the machines that 5% of the output from machine A is
defective, 4% from machine B and 2% from machine C. A bolt is chosen at random from the
production line and found to be defective. What is the probability that it came from (a) machine
A (b) machine B (c) machine C?
3. Machines A and B produce 10% and 90% respectively of the production of a component
intended for the motor industry. From experience, it is known that the probability that machine
A produces a defective component is 0.01 while the probability that machine B produces a
defective component is 0.05. If a component is selected at random from a day’s production and
is found to be defective, find the probability that it was made by (a) machine A; (b) machine B.
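These three problems are all direct applications of Bayes' theorem; a quick numerical check (the helper function is a sketch, not from the notes):

    # P(H_i | E) = P(E | H_i) P(H_i) / sum_j P(E | H_j) P(H_j)
    def posterior(priors, likelihoods):
        joint = [p * l for p, l in zip(priors, likelihoods)]
        total = sum(joint)                       # P(E), by total probability
        return [j / total for j in joint]

    # Example 1: P(woman | over 6 ft); ratio 3:2 in favour of women -> priors 0.6, 0.4
    print(posterior([0.6, 0.4], [0.01, 0.04]))   # ≈ [0.273, 0.727], so P(woman) = 3/11

    # Example 2: P(machine | defective) for machines A, B, C
    print(posterior([0.25, 0.35, 0.40], [0.05, 0.04, 0.02]))   # ≈ [0.362, 0.406, 0.232]

    # Example 3: P(machine | defective) for machines A, B
    print(posterior([0.1, 0.9], [0.01, 0.05]))   # ≈ [0.022, 0.978]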
UTILITY THEORY
The main idea of Utility Theory is: an agent's preferences over possible outcomes can be
captured by a function that maps these outcomes to a real number; the higher the number the
more that agent likes that outcome. The function is called a utility function.
Utility Theory uses the notion of Expected Utility (EU) as a value that represents the average
utility of all possible outcomes of a state, weighted by the probability that the outcome occurs.
The agent can use probability theory to reason about uncertainty. The agent can use utility
theory for rational selection of actions based on preferences. Decision theory is a general theory
for combining probability with rational decisions:
Decision theory = Probability theory + Utility theory
The other key concept of Utility Theory is the principle of Maximum Expected Utility (MEU),
which states that a rational agent should choose an action that maximizes the agent's expected
utility.
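A minimal MEU sketch (the actions, probabilities, and utilities below are made-up numbers for illustration):

    # Expected utility of an action: sum over outcomes of P(outcome) * U(outcome).
    def expected_utility(outcomes):
        return sum(p * u for p, u in outcomes)

    # Each action maps to (probability, utility) pairs over its possible outcomes.
    actions = {
        "take_umbrella":  [(0.3, 60), (0.7, 80)],    # rain / no rain
        "leave_umbrella": [(0.3, 0),  (0.7, 100)],
    }

    # MEU: the rational choice is the action with the highest expected utility.
    best = max(actions, key=lambda a: expected_utility(actions[a]))
    print(best)   # take_umbrella (EU 74 vs 70)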
In probability theory, a Markov model is a stochastic model used to model randomly changing
systems. It is assumed that future states depend only on the current state, not on the events
that occurred before it.
A Markov chain is a sequence of states such that the (n+1)th state is independent of all earlier
states once the nth state is known: the distribution of the next state can be predicted from the
current state alone, without knowing how the chain arrived there.
A hidden Markov model is a Markov chain for which the state is only partially observable. In
other words, observations are related to the state of the system, but they are typically
insufficient to precisely determine the state.
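A minimal Markov chain sketch (the two weather states and transition probabilities are illustrative assumptions):

    import random

    # Transition table: P(next state | current state); each row sums to 1.
    transitions = {
        "sunny": {"sunny": 0.8, "rainy": 0.2},
        "rainy": {"sunny": 0.4, "rainy": 0.6},
    }

    def next_state(current):
        # Sample the next state from the row for `current` alone: the Markov
        # property -- history before `current` is irrelevant.
        states = list(transitions[current])
        weights = [transitions[current][s] for s in states]
        return random.choices(states, weights=weights)[0]

    state = "sunny"
    chain = [state]
    for _ in range(10):
        state = next_state(state)
        chain.append(state)
    print(chain)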
Note: The main weakness of Markov networks is their inability to represent induced and non-transitive
dependencies; two independent variables will be directly connected by an edge, merely because some
other variable depends on both. As a result, many useful independencies go unrepresented in the
network. To overcome this deficiency, Bayesian networks use the richer language of directed graphs,
where the directions of the arrows permit us to distinguish genuine dependencies from spurious
dependencies induced by hypothetical observations.
BAYESIAN NETWORK
A probabilistic graphical model is called a Bayesian network when the underlying graph is directed, and
a Markov network/Markov random field when the underlying graph is undirected.
You have a new burglar alarm installed. It is reliable at detecting burglary, but it also responds to minor
earthquakes. Two neighbors (John, Mary) promise to call you at work when they hear the alarm. John
always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm (and
calls then also). Mary likes loud music and sometimes misses the alarm. Given evidence about who has
and hasn't called, estimate the probability of a burglary.
Bayesian networks (or belief networks) are a graphical and compact way to represent uncertain
knowledge, based on this idea.
1. A set of nodes, one for each random variable of the “world” represented
2. A set of directed arcs connecting nodes
a. If there is an arc from X to Y, we say that X is a parent of Y (parents(X) denotes
the set of parent variables of X)
b. From the concept of parent variable, we define also the concepts of ancestors and
descendants of a random variable.
3. Each node Xi has an associated conditional probability distribution P(Xi | parents(Xi)), given as a conditional probability table (CPT)
a. If Xi is Boolean, we usually omit the probability of the false value.
An example: the probability that the alarm has gone off, both John and Mary call the police, but nothing
has happened is:
P(J, M, A, ¬B, ¬E) = P(J|A)P(M|A)P(A|¬B, ¬E)P(¬B)P(¬E) = 0.9 × 0.7 × 0.001 × 0.999 × 0.998 = 0.00062
For example, the probability of a burglary given that both John and Mary called the police, i.e.,
compute P(Burglary | john, mary):
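Both computations can be reproduced by enumeration. The CPT entries not quoted above (P(A|B,E) for the other parent combinations, P(J|¬A), P(M|¬A)) are assumed from the standard alarm-network example these numbers come from:

    # CPTs for the alarm network. P(J|A)=0.9, P(M|A)=0.7, P(A|¬B,¬E)=0.001,
    # P(B)=0.001, P(E)=0.002 match the figures used above; the rest are assumed.
    P_B, P_E = 0.001, 0.002
    P_A = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}
    P_J = {True: 0.90, False: 0.05}    # P(John calls | Alarm)
    P_M = {True: 0.70, False: 0.01}    # P(Mary calls | Alarm)

    def joint(b, e, a, j, m):
        """Full joint: P(B) P(E) P(A|B,E) P(J|A) P(M|A)."""
        pb = P_B if b else 1 - P_B
        pe = P_E if e else 1 - P_E
        pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
        pj = P_J[a] if j else 1 - P_J[a]
        pm = P_M[a] if m else 1 - P_M[a]
        return pb * pe * pa * pj * pm

    print(joint(False, False, True, True, True))   # ≈ 0.00062, as computed above

    # P(Burglary | john, mary): enumerate the hidden variables E and A, normalize.
    tf = (True, False)
    num = sum(joint(True, e, a, True, True) for e in tf for a in tf)
    den = num + sum(joint(False, e, a, True, True) for e in tf for a in tf)
    print(num / den)                               # ≈ 0.284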