Naive Bayes Classifier
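$$P(B_i \mid A) = \frac{P(B_i)\,P(A \mid B_i)}{\sum_{j=1}^{k} P(B_j)\,P(A \mid B_j)}$$
Proof:
$$P(B_i \mid A) = \frac{P(B_i \cap A)}{P(A)} = \frac{P(B_i)\,P(A \mid B_i)}{\sum_{j=1}^{k} P(B_j)\,P(A \mid B_j)}$$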
Problem 3
M1 produces 20% of the items
M2 produces 30% of the items
M3 produces 50% of the items
1% of the items produced by machine M1 are defective
2% of the items produced by machine M2 are defective
3% of the items produced by machine M3 are defective.
One item is selected at random from the entire batch and is found to be defective.
Determine the probability that this item was produced by machine M2.
Solution:
Let Bi = the selected item was produced by machine Mi
P(B1) = .2, P(B2) = .3, and P(B3) = .5
Let A = selected item is defective
P(A|B1) = .01, P(A|B2) = .02, and P(A|B3) = .03
$$P(B_2 \mid A) = \frac{P(B_2)\,P(A \mid B_2)}{\sum_{j=1}^{3} P(B_j)\,P(A \mid B_j)} = \frac{(.3)(.02)}{(.2)(.01) + (.3)(.02) + (.5)(.03)} \approx .26$$
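The Problem 3 arithmetic can be checked with a few lines of Python. This is a minimal sketch; the function name `posterior` is an illustrative choice, not something from the slides.

```python
def posterior(priors, likelihoods):
    """Return P(B_i | A) for every i, given P(B_i) and P(A | B_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B_i) P(A | B_i)
    evidence = sum(joint)  # P(A), by the law of total probability
    return [j / evidence for j in joint]


priors = [0.2, 0.3, 0.5]          # P(B1), P(B2), P(B3)
likelihoods = [0.01, 0.02, 0.03]  # P(A|B1), P(A|B2), P(A|B3)
post = posterior(priors, likelihoods)
print(round(post[1], 2))  # P(B2 | A), prints 0.26
```

The same call also returns the posteriors for M1 and M3, and the three values sum to 1.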
Another Look into Bayes’ Theorem
Suppose we have a hypothesis H
Assume H is either true or false
e.g., H = The defendant is guilty of some crime
We also have a prior belief about H
This is P(H)
e.g., we are 60% certain that the defendant is guilty
Now we get some evidence E
e.g., the blood type of the actual criminal matches the blood type of the defendant
There is still some uncertainty: (say) 20% of the population has the same blood
type
What is our revised (posterior) belief about H?
i.e., in light of the new evidence how certain are we that the defendant is guilty
P(H|E)
Q: In 60% of homicide cases where the wife is killed, the husband is found
to be the murderer. Now you get some new evidence E: the husband's blood
type matches the blood type found at the crime scene. There is still some
uncertainty, because 20% of the population has the same blood type. What is
our revised (posterior) belief in the hypothesis that the husband is the
murderer, in light of the new evidence?
Bayes’ Theorem
$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$
P(H|E) = Posterior Probability, P(E|H) = Likelihood, P(H) = Prior Probability
What is P(E)?
P(E) = P(E|H)P(H) + P(E|Hc)P(Hc)
Why?
We need P(E|H), P(E|Hc)
In this problem, P(E|H) = 1
P(E|Hc) = .2
So, our posterior belief that the defendant is guilty is
$$P(H \mid E) = \frac{1(.6)}{1(.6) + (.2)(1 - .6)} = 0.882$$
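The update for a true/false hypothesis can be sketched as a small function. The name `update` and its parameter names are illustrative assumptions; the numbers are the ones used in the example above.

```python
def update(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(H | E) for a binary hypothesis H."""
    # P(E) = P(E|H) P(H) + P(E|Hc) P(Hc)
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e


posterior_guilt = update(prior=0.6, p_e_given_h=1.0, p_e_given_not_h=0.2)
print(round(posterior_guilt, 3))  # prints 0.882
```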
Problem 4
A test is 98% effective at detecting a disease
Test has a “false positive” rate of 1%
0.5% of the population has this disease
Let E = you test positive for the disease with this test
Let H = you actually have the disease
P(H| E)?
Solution:
P(E|H) = .98
P(E|Hc) = .01
P(H) = .005
$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid H^c)\,P(H^c)} = \frac{(0.98)(0.005)}{(0.98)(0.005) + (0.01)(1 - 0.005)} \approx .33$$
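One way to see why the posterior in Problem 4 is so low is to count people instead of multiplying probabilities. The sketch below assumes a hypothetical cohort of 100,000 people (the cohort size is not from the slides) and uses exact fractions to avoid rounding.

```python
from fractions import Fraction

population = 100_000                      # hypothetical cohort size
sick = population * Fraction(5, 1000)     # 0.5% have the disease
healthy = population - sick
true_pos = sick * Fraction(98, 100)       # sick people who test positive
false_pos = healthy * Fraction(1, 100)    # healthy people who test positive

# P(H | E): fraction of all positive tests that come from sick people
ppv = true_pos / (true_pos + false_pos)
print(true_pos, false_pos, round(float(ppv), 2))  # prints 490 995 0.33
```

False positives outnumber true positives roughly two to one because the disease is rare, even though the test itself is fairly accurate.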
Simple Bayesian Inference
$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$
P(H|E) = Posterior Probability, P(E|H) = Likelihood, P(H) = Prior Probability
$$\arg\max_{c_j \in C} \; P(x_1 = v_1, x_2 = v_2, \ldots, x_n = v_n \mid c_j)\,P(c_j)$$
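Under the naive assumption that the attributes are conditionally independent given the class, the joint likelihood in this decision rule factorizes into per-attribute terms, so each factor can be read from a small table of conditional probabilities. The symbol c_NB for the predicted class is shorthand introduced here, not notation from the slides.

$$c_{NB} = \arg\max_{c_j \in C} \; P(c_j) \prod_{i=1}^{n} P(x_i = v_i \mid c_j)$$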
Outlook       Y     N
sunny         2/9   3/5
overcast      4/9   0
rain          3/9   2/5

Temperature   Y     N
hot           2/9   2/5
mild          4/9   2/5
cool          3/9   1/5

Humidity      Y     N
high          3/9   4/5
normal        6/9   1/5

Windy         Y     N
strong        3/9   3/5
weak          6/9   2/5
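A classification with these conditional probabilities can be sketched as follows. Two things here are assumptions and not part of the slides: the class priors 9/14 and 5/14 are inferred from the denominators of the table entries (9 examples of class Y, 5 of class N), and the query instance (sunny, cool, high, strong) is a hypothetical test case.

```python
from fractions import Fraction as F

# P(value | class), copied from the conditional probability tables
cond = {
    "Y": {"sunny": F(2, 9), "overcast": F(4, 9), "rain": F(3, 9),
          "hot": F(2, 9), "mild": F(4, 9), "cool": F(3, 9),
          "high": F(3, 9), "normal": F(6, 9),
          "strong": F(3, 9), "weak": F(6, 9)},
    "N": {"sunny": F(3, 5), "overcast": F(0), "rain": F(2, 5),
          "hot": F(2, 5), "mild": F(2, 5), "cool": F(1, 5),
          "high": F(4, 5), "normal": F(1, 5),
          "strong": F(3, 5), "weak": F(2, 5)},
}
# ASSUMPTION: priors inferred from the table denominators (9 Y, 5 N)
prior = {"Y": F(9, 14), "N": F(5, 14)}
# Hypothetical query: Outlook, Temperature, Humidity, Windy
query = ["sunny", "cool", "high", "strong"]

scores = {}
for c in prior:
    s = prior[c]
    for value in query:
        s *= cond[c][value]  # naive assumption: multiply per-attribute terms
    scores[c] = s

best = max(scores, key=scores.get)
print({c: round(float(s), 4) for c, s in scores.items()}, best)
# prints {'Y': 0.0053, 'N': 0.0206} N
```

Note the zero entry for overcast under class N: any query with Outlook = overcast gets a score of exactly 0 for N, whatever the other attributes say.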
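Estimation for Continuous Attributes
Q: For the following table, classify the test case X: <No, Single, 120K> using the Naïve Bayes classifier.

Tid   Refund          Marital Status   Taxable Income   Evade
      (categorical)   (categorical)    (continuous)     (class)
1     Yes             Single           125K             No
2     No              Married          100K             No
3     No              Single           70K              No
4     Yes             Married          120K             No
5     No              Divorced         95K              Yes
6     No              Married          60K              No
7     Yes             Divorced         220K             No
8     No              Single           85K              Yes
9     No              Married          75K              No
10    No              Single           90K              Yes

Test case X: <Refund = No, Marital Status = Single, Taxable Income = 120K>

For the continuous attribute (Taxable Income), fit a normal density to each class.
Evade = Yes: sample mean = 90, sample variance = 25
$$P(\text{Income} = 120 \mid \text{Yes}) = \frac{1}{\sqrt{2\pi}\,(5)}\, e^{-\frac{(120 - 90)^2}{2(25)}} \approx 1.2 \times 10^{-9}$$
Evade = No: sample mean = 110, sample variance = 2975
$$P(\text{Income} = 120 \mid \text{No}) = \frac{1}{\sqrt{2\pi}\,(54.54)}\, e^{-\frac{(120 - 110)^2}{2(2975)}} \approx 0.0072$$

P(CY) = 3/10, P(CN) = 7/10
P(CY) P(X|CY) = 3/10 * 3/3 * 2/3 * 1.2 × 10^-9 ≈ 2.4 × 10^-10
P(CN) P(X|CN) = 7/10 * 4/7 * 2/7 * 0.0072 ≈ 0.000823
Since P(CN) P(X|CN) > P(CY) P(X|CY), X is in class CN (Evade = 'No' class)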
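The whole classification can be reproduced from the training table with a short script. This is a sketch under stated assumptions: Taxable Income is modelled with a normal density per class, the variance is the sample variance (n - 1 denominator, which is what `statistics.variance` computes), and the helper names `gaussian_pdf` and `score` are illustrative.

```python
import math
from statistics import mean, variance

rows = [  # (Refund, Marital Status, Taxable Income in K, Evade)
    ("Yes", "Single", 125, "No"),
    ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),
    ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),
    ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]


def gaussian_pdf(x, mu, var):
    """Normal density with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def score(refund, marital, income, label):
    """P(class) * P(refund|class) * P(marital|class) * p(income|class)."""
    sub = [r for r in rows if r[3] == label]
    prior = len(sub) / len(rows)
    p_refund = sum(r[0] == refund for r in sub) / len(sub)
    p_marital = sum(r[1] == marital for r in sub) / len(sub)
    incomes = [r[2] for r in sub]
    p_income = gaussian_pdf(income, mean(incomes), variance(incomes))
    return prior * p_refund * p_marital * p_income


scores = {label: score("No", "Single", 120, label) for label in ("Yes", "No")}
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

For the test case X the script selects Evade = No. Because the income term is a density and not a probability, the scores are only meaningful relative to each other.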
Question: Naïve Bayes Classifier