
Conditional Probability &

Naive Bayes’ Classifier


*** If you already have a full understanding of Bayes' Theorem and posterior probability, you may start from Slide 22 ***

*** Slides that are important for Exam/Questions: Slides 22 – 45 ***
Credits
 Slides have been adapted from slides by
Dr. Longin Jan Latecki, Dept. of CIS, Temple
University, USA

 *** If you already have a full understanding of Bayes'
Theorem and posterior probability, you may start
from Slide 22 ***

 Slides that are relevant/important for Exam
Questions: only Slides 22 – 45
Bayesian Methods
 Our focus this lecture
 Learning and classification methods based on
probability theory
 Bayes' theorem plays a critical role in probabilistic
learning and classification
 Uses prior probability of each category given no
information about an item
 Categorization produces a posterior probability
distribution over the possible categories given a
description of an item
A dice-y(!) problem
 Two dice are rolled, yielding values D1 and D2.
 A is the event: D1 + D2 = 4
 P(A) = ?
 A = {(1,3), (3,1), (2,2)}
 |S| = 36
 P(A) = 3/36
 Let B be the event: D2 = 2
 P(A, given B already observed) = ?
 Sample space reduced by B: {(1,2), (2,2), (3,2), (4,2), (5,2), (6,2)}
 Outcomes of A consistent with B: {(2,2)}
 P(A, given B already observed) = 1/6
Conditional Probability
 Conditional probability is probability that A occurs given that
B has already occurred
 “Conditioning on B”
 Written as P(A | B)
 Means “P(A, given B already observed)”
 Sample space, S, reduced to those elements consistent with B (i.e. S ∩
B)
 Event space, A, reduced to those elements consistent with B (i.e. A ∩
B)
 With equally likely outcomes
P(A|B) = (# of outcomes in A consistent with B) / (# of outcomes in S consistent with B)
       = |A ∩ B| / |S ∩ B| = |A ∩ B| / |B|
Conditional Probability
 General Definition
P(A|B) = P(AB) / P(B), if P(B) > 0

 If P(B) = 0, P(A|B) is undefined


 Holds even when outcomes are not equally
likely
 P(AB) = P(A|B)P(B) [chain or multiplication
rule]
 Similarly, P(AB) = P(B|A)P(A)
The dice-y problem-revisited
 Two dice are rolled yielding values D1 and D2
 Event A : D1+D2=4
 |S|= 36, |A| = 3 as, A = {(1,3), (2,2), (3,1)}
 P(A) = 3/36
 Event B: D2 = 2
 |B| = 6 as, B = {(1,2), (2,2), (3,2), (4,2), (5,2), (6,2)}
 P(B) = 6/36
 AB = {(2,2)}
 P(AB) = 1/36
 P(Sum is 4, given 2 is observed on the second die)
 P(A|B) = P(AB)/P(B) = (1/36) / (6/36) = 1/6
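
The result above can be checked by brute-force enumeration of the 36 outcomes; the following is a minimal Python sketch (not part of the original slides).

```python
# Minimal sketch: verify the dice example by enumerating the sample space.
from itertools import product
from fractions import Fraction

sample_space = list(product(range(1, 7), repeat=2))          # all 36 (D1, D2) outcomes
A = [(d1, d2) for d1, d2 in sample_space if d1 + d2 == 4]    # sum is 4
B = [(d1, d2) for d1, d2 in sample_space if d2 == 2]         # second die shows 2
AB = [o for o in A if o in B]                                # A intersect B

p_A = Fraction(len(A), len(sample_space))                    # 3/36
p_B = Fraction(len(B), len(sample_space))                    # 6/36
p_A_given_B = Fraction(len(AB), len(sample_space)) / p_B     # (1/36)/(6/36) = 1/6
print(p_A, p_B, p_A_given_B)                                 # 1/12 1/6 1/6
```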
Problem 2 (excluded!)
 A student is taking a one-hour-time-limit makeup examination.
 P(student will finish the exam in less than x hours) = x/2,
for all 0≤ x ≤ 1
 Then, given that the student is still working after .75 hour
 what is the conditional probability that the full hour is used?
 Solution
 Let F = student uses the full hour
 Let W = student still working after .75 hour
 P(F|W) = ?
 P(F) = 1 - P(finishes exam in less than 1 hour) = 1- ½ = .5
 P(W) = 1 - .75/2 = .625
 P(F|W) = P(FW)/P(W) = P(F)/P(W) = .5/.625 = .8
Problem 3
 Suppose that two balls are to be selected at
random, without replacement from a box containing
 r red balls and b blue balls
 Find probability that the first ball will be red and
the second ball will be blue
 Solution
 A = event that the first ball is red
 P(A) = r/(r+b)
 B = event that the 2nd ball is blue
 P(B|A) = b/(r+b-1)
 P(AB) = P(B|A)P(A) = rb/((r+b)(r+b-1))
Multiplication rule
 General form: P(A1 A2 ⋯ An) = P(A1) P(A2|A1) P(A3|A1A2) ⋯ P(An|A1A2⋯An−1)
Partitions and Total Probability
 Let S = sample space of some experiment
 consider k events B1, . . . , Bk in S such that
 B1, . . . , Bk are disjoint and
 B1 ∪ B2 ∪ … ∪ Bk = S
 Then these events form a partition of S

 Here, B1, B2, B3, B4, and B5 form a partition of S


Total Probability
 Suppose that the events B1, . . . ,Bk form a partition of
the space S and
 Pr(Bj) > 0, for j = 1, . . . , k.
 for every event A in S: P(A) = Σ_{j=1..k} P(Bj) P(A|Bj)
 Proof:
 A ∩ B1, A ∩ B2, …, A ∩ Bk form a partition of A
 A = (A ∩ B1) ∪ … ∪ (A ∩ Bk)
 As the (A ∩ Bj) are mutually disjoint,
P(A) = Σ_{j=1..k} P(A ∩ Bj) = Σ_{j=1..k} P(Bj) P(A|Bj), since P(Bj) > 0
Problem 1
 Box 1 contains 60 red balls and 40 white balls
 Box 2 contains 10 red balls and 20 white balls
 One box is selected at random and a ball is
selected at random from that box
 Let event A = red ball selected, P(A) = ?
 Solution:
 B1 = box 1 is selected
 B2 = box 2 is selected
 P(A) = P(B1)P(A|B1) + P(B2)P(A|B2)
= ½ * 60/100 + ½ * 10/30 = 7/15
Problem 2
 Same experiment as problem 1
 Now after selection we observe that the ball selected is
Red
 What is the probability that it came from box 1?
 P(B1|A)
 What is the probability that it came from box 2?
 P(B2|A)
 Solution:
 P(B1|A) = P(B1A)/P(A) = P(B1)P(A|B1)/P(A)
= (½ * 60/100)/(7/15)
= 9/14
 Similarly, P(B2|A) = P(B2)P(A|B2)/P(A)
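
Both box problems can be checked with a few lines of Python; the following is a minimal sketch (not part of the original slides).

```python
# Minimal sketch: total probability (Problem 1) and the reversed conditional
# probabilities (Problem 2) for the two-box experiment above.
from fractions import Fraction

p_B1, p_B2 = Fraction(1, 2), Fraction(1, 2)          # each box chosen at random
p_A_given_B1 = Fraction(60, 100)                     # red ball from box 1
p_A_given_B2 = Fraction(10, 30)                      # red ball from box 2

p_A = p_B1 * p_A_given_B1 + p_B2 * p_A_given_B2      # total probability = 7/15
p_B1_given_A = p_B1 * p_A_given_B1 / p_A             # Bayes: 9/14
p_B2_given_A = p_B2 * p_A_given_B2 / p_A             # 5/14
print(p_A, p_B1_given_A, p_B2_given_A)               # 7/15 9/14 5/14
```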
Bayes’ Theorem
 Suppose that the events B1, . . . ,Bk form a partition of
the space S and Pr(Bj) > 0, for j = 1, . . . , k.
 Let A be an event in S such that P(A) > 0
 then for i= 1, … ,k
P(Bi|A) = P(Bi) P(A|Bi) / Σ_{j=1..k} P(Bj) P(A|Bj)

 Proof:
P(Bi|A) = P(Bi ∩ A) / P(A) = P(Bi) P(A|Bi) / Σ_{j=1..k} P(Bj) P(A|Bj)
Problem 3
 M1 produces 20% of the items
 M2 produces 30% of the items
 M3 produces 50% of the items
 1% of the items produced by machine M1 are defective
 2% of the items produced by machine M2 are defective
 3% of the items produced by machine M3 are defective.
 one item is selected at random from the entire batch and it is found to
be defective
 determine the probability that this item was produced by machine M2
 Solution:
 Let Bi= item selected is produced by machine Mi
 P(B1) = .2, P(B2) = .3, and P(B3) = .5
 Let A = selected item is defective
 P(A|B1) = .01, P(A|B2) = .02, and P(A|B3) = .03
P(B2|A) = P(B2) P(A|B2) / Σ_{j=1..3} P(Bj) P(A|Bj)
        = (.3)(.02) / [(.2)(.01) + (.3)(.02) + (.5)(.03)] ≈ .26
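
A minimal Python sketch (not part of the original slides) that reproduces this calculation:

```python
# Minimal sketch: posterior probability that a defective item came from
# machine M2, using the partition {B1, B2, B3} described above.
priors = {"M1": 0.2, "M2": 0.3, "M3": 0.5}             # P(Bj): share of items produced
defect_rates = {"M1": 0.01, "M2": 0.02, "M3": 0.03}    # P(A | Bj): defect rate per machine

p_A = sum(priors[m] * defect_rates[m] for m in priors)      # total probability of a defect
posterior_M2 = priors["M2"] * defect_rates["M2"] / p_A
print(round(p_A, 4), round(posterior_M2, 4))                # 0.023 0.2609
```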
Another Look into Bayes’ Theorem
 Suppose we have a hypothesis H
 Assume H is either true or false
 e.g., H = The defendant is guilty of some crime
 We also have a prior belief about H
 This is P(H)
 e.g., we are 60% certain that the defendant is guilty
 Now you get some evidence E
 e.g., the blood type of the actual criminal matches the blood type of the defendant
 There is still some uncertainty: (say) 20% of the population has the same blood
type
 What is our revised (posterior) belief about H?
 i.e., in light of the new evidence how certain are we that the defendant is guilty
 P(H|E)

Q: In 60% of homicide cases where the wife is killed, the husband is found
to be the murderer. Now you get some new evidence E: the blood type found at
the crime scene matches the blood type of the husband. But there is still some
uncertainty, because 20% of the population has the same blood type. What is our
revised (posterior) belief about the hypothesis that the husband is the murderer,
in the light of the new evidence?
Bayes’ Theorem
P(H|E) = P(E|H) P(H) / P(E)
[posterior probability = likelihood × prior probability / P(E)]

 What is P(E)?
 P(E) = P(E|H)P(H) + P(E|Hc)P(Hc)
 Why?
 We need P(E|H), P(E|Hc)
 In this problem, P(E|H) = 1
 P(E|Hc) = .2
 So, our posterior belief that the defendant is guilty is
P(H|E) = (1)(.6) / [(1)(.6) + (.2)(1 − .6)] ≈ 0.882
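
A minimal Python sketch (not part of the original slides) reproducing the 0.882 posterior:

```python
# Minimal sketch: blood-type example with prior 0.6, P(E|H) = 1, and
# P(E|not H) = 0.2 (the match rate in the general population).
p_H = 0.6
p_E_given_H = 1.0
p_E_given_not_H = 0.2

p_E = p_E_given_H * p_H + p_E_given_not_H * (1 - p_H)   # total probability of E = 0.68
p_H_given_E = p_E_given_H * p_H / p_E
print(round(p_H_given_E, 3))                            # 0.882
```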
Problem 4
 A test is 98% effective at detecting a disease
 Test has a “false positive” rate of 1%
 0.5% of the population has this disease
 Let E = you test positive for the disease with this test
 Let H = you actually have the disease
 P(H| E)?
 Solution: P( H | E ) 
P( E | H ) P( H )
P( E | H ) P( H )  P( E | H c ) P( H c )
 P(E|H) = .98
(0.98)(0.005)
 c
P(E|H )= .001   .33
(0.98)(0.005)  (0.01)(1  0.005)
 P(H)=.005
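
A minimal Python sketch (not part of the original slides) confirming that the low prevalence keeps the posterior near 0.33 despite the accurate test:

```python
# Minimal sketch: disease-test example with sensitivity 0.98, false positive
# rate 0.01, and prevalence 0.005.
p_H = 0.005               # prevalence, P(H)
p_E_given_H = 0.98        # sensitivity (true positive rate)
p_E_given_not_H = 0.01    # false positive rate

p_E = p_E_given_H * p_H + p_E_given_not_H * (1 - p_H)
p_H_given_E = p_E_given_H * p_H / p_E
print(round(p_H_given_E, 2))     # 0.33
```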
Simple Bayesian Inference
P(H|E) = P(E|H) P(H) / P(E)
[posterior probability = likelihood × prior probability / P(E)]

 Odds of H = P(H)/P(Hc) (excluded!)


For example, if P(H)/P(Hc) = 2

 odds of 2 to 1 in favour of the hypothesis
 Meaning: it is two times more likely that the hypothesis is true
than it is false
 Odds of H|E: (excluded!)
 P(H|E)/P(Hc|E)
 = [P(H)/P(Hc)] [P(E|H)/P(E|Hc)]
Problem 5 (excluded!)
 An urn contains 2 coins: A and B
 A comes up heads with probability ¼
 B comes up heads with probability ¾
 Pick coin (equally likely), flip it, and It comes up HEAD
 What are odds that A was picked? note: Ac = B?
 Solution:
P(A|heads) / P(Ac|heads) = [P(A) P(heads|A)] / [P(Ac) P(heads|Ac)]
                         = (1/2 × 1/4) / (1/2 × 3/4) = 1/3

 the odds are 1/3 : 1


 Note: before the evidence that head was observed
the odds were 1:1
Bayes’ Theorem
 Given a hypothesis h and data D which bears on the
hypothesis:
P(h|D) = P(D|h) P(h) / P(D)
 P(h): independent probability of hypothesis h: prior
probability of hypothesis
 P(D): independent probability of data D
 P(D|h): conditional probability of data D, given
hypothesis h is true (called the likelihood of data D, given
hypothesis h)
 P(h|D): conditional probability of hypothesis h being true,
given data D: posterior probability of hypothesis
 Question: Write the Bayes’ theorem on Posterior Probability and identify the
following terms — prior and posterior probability, likelihood, independent
probability.
Maximum A Posteriori
Question: Explain how to compute the Maximum A Posteriori (MAP)
hypothesis hMAP for some given data D.
 Based on Bayes' Theorem, we can compute the
Maximum A Posteriori (MAP) hypothesis hMAP for the
given data D
 We are interested in the best hypothesis for some space
H given observed training data D.
hMAP  argmax P(h | D)
hH
P ( D | h) P ( h)
 argmax
hH P( D)
 argmax P( D | h) P(h)
H: set of all hypothesis. hH

Note that we can drop P(D) as the probability of the data is


constant (and independent of the hypothesis).
Maximum Likelihood
Question: Explain how to compute the Maximum Likelihood
hypothesis hML for some given data D. What is the assumption for
Maximum Likelihood hypothesis?

 Now assume that all hypotheses are equally
probable a priori,
 i.e., P(hi) = P(hj) for all hi, hj ∈ H.
 This is called assuming a uniform prior. It
simplifies computing the posterior:
hML = argmax_{h∈H} P(D | h)

 This hypothesis is called the maximum


likelihood hypothesis. (because, P(D|h) is called
the likelihood of Data D, given the hypothesis h is
true)
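
To make the distinction concrete, here is a minimal Python sketch (not from the original slides) over a tiny made-up hypothesis space; the priors and likelihoods below are purely illustrative numbers.

```python
# Minimal sketch: MAP vs. ML hypothesis selection over an illustrative,
# made-up hypothesis space (the numbers are assumptions, not from the slides).
priors = {"h1": 0.7, "h2": 0.2, "h3": 0.1}          # P(h)
likelihoods = {"h1": 0.1, "h2": 0.5, "h3": 0.8}     # P(D | h)

h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])  # argmax P(D|h) P(h)
h_ml = max(likelihoods, key=lambda h: likelihoods[h])          # argmax P(D|h), i.e. uniform prior assumed
print(h_map, h_ml)   # h2 h3 (they can differ when the prior is not uniform)
```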
Desirable Properties of Bayes Classifier
Question: What are the desirable properties of the Bayes’ Classifier? Or,
Write three desirable properties of the Bayes’ classifier.

 Incrementality: with each training example, the


prior probability and the data likelihood can be
updated dynamically. This is flexible and robust
to errors.
 Combines prior knowledge and observed data:
Conceptually Sound and beautiful! The prior
probability of a hypothesis multiplied with
probability of the data, given the hypothesis
 Probabilistic hypothesis: Outputs not only a
classification, but a probability distribution over
all classes
Bayes Classifier: Summary
 P(class | data); here, D = Data = <sunny, hot, normal, strong>, i.e., D = x = <x1, x2, x3, x4>
P(C|D) = P(CD) / P(D) [conditional probability formula P(A|B) = P(AB)/P(B)]
       = P(C) * P(D|C) / P(D) [P(D|C), the likelihood, is easy to estimate; P(C|D), the posterior, is what we want]

Suppose two classes c1 and c2, and test data x is observed.


To which class should you classify (assign) the data x ?

Find posterior probabilities: P(c1|x) and P(c2|x) in above way


P(c1|x) = P (c1) * P(x|c1) / P(x) ; P(c2|x) = P (c2) * P(x|c2) / P(x)
Assign x to the class with Higher posterior probability value

* Same Denominator P(x), so you can compute Numerator only


Example Data: ‘Play Cricket’ data
Day    Outlook   Temperature  Humidity  Wind    Play Cricket
Day1   Sunny     Hot          High      Weak    No
Day2   Sunny     Hot          High      Strong  No
Day3   Overcast  Hot          High      Weak    Yes
Day4   Rain      Mild         High      Weak    Yes
Day5   Rain      Cool         Normal    Weak    Yes
Day6   Rain      Cool         Normal    Strong  No
Day7   Overcast  Cool         Normal    Strong  Yes
Day8   Sunny     Mild         High      Weak    No
Day9   Sunny     Cool         Normal    Weak    Yes
Day10  Rain      Mild         Normal    Weak    Yes
Day11  Sunny     Mild         Normal    Strong  Yes
Day12  Overcast  Mild         High      Strong  Yes
Day13  Overcast  Hot          Normal    Weak    Yes
Day14  Rain      Mild         High      Strong  No

(Outlook, Temperature, Humidity, and Wind are the attributes; Play Cricket is the class.)

*Question: For the day <sunny, hot, normal, strong>,


what’s the play prediction?
Notations
 D is the data table given
 Each row is an example
 We have 4 attributes
 x1 = outlook, possible values = {sunny, overcast, rain}
 x2 = temperature, possible values= {hot, mild, cool}
 x3 = humidity, possible values = {high, normal}
 x4 = wind, possible values = {weak, strong}
 Classes C = {yes, no} = {Y,N}
 In general,
 We are given training data D
 D contains instances of different classes cj ∈ C
 Each instance x contains the values for the attributes x = < x1, x2,…,xn>
Bayesian Classification: Motivation
 Classification can be written as:
 Given an object x
 with attribute x1, … xn having values v1, … , vn
 what is the probability that object x belong to class cj?
 Given day is <sunny, hot, high, strong>, what’s the
probability of playing cricket?
 P(x  Y | outlook = sunny, temperature = hot, humidity = high,
wind = strong)
 In probability notation: P(x  cj | x1=v1, …, xn=vn)?
 Use simplified notation: P(cj | X)
 Informally, Find P( Y| <sunny, hot, normal, strong> ) and
P( N | <sunny, hot, normal, strong> )
Bayes Classifiers: Key idea:
Assign the most probable class c MAP using Bayes’ theorem

cMAP  argmax P(c j | X )


c j C
P ( X | c j ) P (c j )
 argmax
c j C P( X )
 argmax P( X | c j ) P(c j )
c j C

 argmax P( x1  v1 , x2  v1 , , xn  vn | c j ) P (c j )
c j C

means : P (x1=sunny, x2=hot, x3=Normal, x4=Strong | Y) * P (Y)


and : P (x1=sunny, x2=hot, x3=Normal, x4=Strong | N) * P (N)
called naive Bayes, because estimates these complex probabilitues as
P (x1=sunny|Y) * P(x2=hot|Y) * P(x3=Normal|Y)*P(x4=Strong | Y) * P (Y)
Bayesian Classification: Motivation
argmax_{cj∈C} P(X | cj) P(cj)

 What does each term mean?


 P(X | cj) = P(x1=v1, …, xn=vn | x ∈ cj)
 Probability that the attributes x1, …, xn will have values v1, …,
vn, given that an object x is in class cj
 P(cj) : Prior probability of class cj
 Probability that a randomly selected object is of class cj
 Note: P(X) = P(x1=v1, …, xn=vn)
 Probability that a random object selected will have values v1,
… , vn for attributes x1, … xn
Estimation
 But how to estimate the probabilities?
 Draw a sample!
 Which, in classification term, means look at the
training set
Parameter Estimation
Question: What is assumption of Naïve Bayes classifier (NBC)? Write and explain
the Independence assumption of NBC. Why is it called the Naïve Bayes classifier?

 P(cj) can be estimated from the frequency of classes in the
training data D.
Ans (last part): Because of the Independence Assumption, which is
usually not true. For example, the probability of humidity=high and
temp=cool is not independent of (it is actually highly correlated with)
Outlook = "rainy".
 P(x1,x2,…,xn|cj)
 O(|X|^n · |C|) possible combinations of the parameters
 Could only be estimated if a very, very large number of
training examples was available.
 Independence Assumption: attribute values are
conditionally independent of each other given the target
value: this is why it is called naïve Bayes’ Classifier.
P( x1 , x2 ,, xn | c j )   P( xi | c j )
i
c NB  arg max P(c j ) P( xi | c j )
c j C i
Properties

 Estimating P(xi | cj) instead of P(x1, x2, …, xn | cj) greatly
reduces the number of parameters (and the data
sparseness).
 The learning step in Naïve Bayes’ consists of
estimating P( xi | c j ) and P(c j ) based on the
frequencies in the training data
 An unseen instance is classified by computing the
class that maximizes the posterior
 When conditional independence is satisfied,
Naïve Bayes corresponds to MAP classification.
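
As a concrete illustration of this learning step, here is a minimal Python sketch (not part of the original slides); representing D as a list of (attribute dictionary, class label) pairs is an assumption made for the illustration.

```python
# Minimal sketch of the Naive Bayes learning step: estimate P(c_j) and
# P(x_i | c_j) from frequencies in the training data D, where D is assumed
# to be a list of (attributes_dict, class_label) pairs.
from collections import Counter, defaultdict

def estimate_parameters(D):
    n = len(D)
    class_counts = Counter(c for _, c in D)
    priors = {c: class_counts[c] / n for c in class_counts}      # P(c_j)
    value_counts = defaultdict(Counter)                          # (attr, class) -> value counts
    for attributes, c in D:
        for attr, value in attributes.items():
            value_counts[(attr, c)][value] += 1
    def likelihood(attr, value, c):                              # P(x_i = value | c_j)
        return value_counts[(attr, c)][value] / class_counts[c]
    return priors, likelihood

# Hypothetical tiny usage example:
# D = [({"outlook": "sunny", "wind": "weak"}, "No"),
#      ({"outlook": "rain",  "wind": "weak"}, "Yes")]
# priors, likelihood = estimate_parameters(D)
# priors["Yes"], likelihood("outlook", "sunny", "No")
```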
Example. ‘Play Cricket’ data
Day    Outlook   Temperature  Humidity  Wind    Play Cricket
Day1   Sunny     Hot          High      Weak    No
Day2   Sunny     Hot          High      Strong  No
Day3   Overcast  Hot          High      Weak    Yes
Day4   Rain      Mild         High      Weak    Yes
Day5   Rain      Cool         Normal    Weak    Yes
Day6   Rain      Cool         Normal    Strong  No
Day7   Overcast  Cool         Normal    Strong  Yes
Day8   Sunny     Mild         High      Weak    No
Day9   Sunny     Cool         Normal    Weak    Yes
Day10  Rain      Mild         Normal    Weak    Yes
Day11  Sunny     Mild         Normal    Strong  Yes
Day12  Overcast  Mild         High      Strong  Yes
Day13  Overcast  Hot          Normal    Weak    Yes
Day14  Rain      Mild         High      Strong  No

*Question: For the day <sunny, hot, normal, strong>,


what’s the Play prediction?
Naive Bayes’ Classifier – Computations
*Question: For the day <sunny, hot, normal, strong>,
what’s the Play prediction?

 Given a D, we can compute the probabilities P(xi | cj)

Outlook      Y    N        Humidity  Y    N
sunny        2/9  3/5      high      3/9  4/5
overcast     4/9  0        normal    6/9  1/5
rain         3/9  2/5

Temperature  Y    N        Wind      Y    N
hot          2/9  2/5      strong    3/9  3/5
mild         4/9  2/5      weak      6/9  2/5
cool         3/9  1/5

 P (sunny | CY) = 2/9 P(normal | CY) = 6/9


 P (sunny | CN) = 3/5 P(strong | CN) = 3/5
Naive Bayes’ Classifier – Computations
*Question: For the day <sunny, hot, normal, strong>,
what’s the Play prediction?

 Now with object x = (sunny, hot, normal, strong), using the
probability tables above:

P(CY) = 9/14
P(CN) = 5/14

P(CY) P(X|CY) = 9/14 * 2/9 * 2/9 * 6/9 * 3/9 ≈ 0.007
P(CN) P(X|CN) = 5/14 * 3/5 * 2/5 * 1/5 * 3/5 ≈ 0.01

x is in class CN (‘No’ Play Cricket)
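
The same computation, written as a minimal Python sketch (not part of the slides), with the probabilities read off the tables above:

```python
# Minimal sketch: classify x = (sunny, hot, normal, strong) using the
# class priors and per-attribute conditional probabilities from the tables.
from fractions import Fraction as F

p_yes = F(9, 14) * F(2, 9) * F(2, 9) * F(6, 9) * F(3, 9)   # P(CY) P(X|CY) ≈ 0.007
p_no  = F(5, 14) * F(3, 5) * F(2, 5) * F(1, 5) * F(3, 5)   # P(CN) P(X|CN) ≈ 0.010
print(float(p_yes), float(p_no))
print("Play Cricket =", "Yes" if p_yes > p_no else "No")   # Play Cricket = No
```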
Underflow Prevention
Question: Describe the source of floating point underflow problem in the
Naïve Bayes Classifier. Explain how this problem can be solved.

 Multiplying lots of probabilities, which are


between 0 and 1 by definition, can result in
floating-point underflow.
 Since log(xy) = log(x) + log(y),
 it is better to perform all computations by summing
logs of probabilities rather than multiplying
probabilities.
 Class with highest final log sum of probabilities is
still the most probable. (underflow prevention by Log)
cUPL = argmax_{cj∈C} [ log P(cj) + Σ_{i ∈ positions} log P(xi | cj) ]
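
As a minimal sketch (not from the slides), here is the cricket example recomputed with log probabilities:

```python
# Minimal sketch: underflow prevention by summing log probabilities instead
# of multiplying raw probabilities (cricket example from the earlier slides).
import math

probs_yes = [9/14, 2/9, 2/9, 6/9, 3/9]   # P(CY) and the P(xi | CY) factors
probs_no  = [5/14, 3/5, 2/5, 1/5, 3/5]   # P(CN) and the P(xi | CN) factors

log_yes = sum(math.log(p) for p in probs_yes)
log_no  = sum(math.log(p) for p in probs_no)
# The class with the largest log sum is also the class with the largest product.
print("Play Cricket =", "Yes" if log_yes > log_no else "No")   # No
```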
Naïve Bayes’ Classifier – Parameter Estimation
Question: What is the problem associated with Continuous attribute values in a
Naïve Bayes classifier? How this problem is solved and what is the underlying
assumption?

 More detail in estimating probabilities:


 If i-th attribute is categorical:
P(xi|cj) is estimated as the relative freq of samples
having value xi as i-th attribute in class cj
 If i-th attribute is continuous:
P(xi|cj) can be estimated through a Gaussian density
function on the attribute's values. This is called the 'Gaussian
Distribution Assumption' of the Naïve Bayes Classifier, which
assumes that each continuous attribute follows a
Gaussian(μ, σ²) distribution. The mean μ and variance σ²
can be estimated from the training data table D
 Computationally easy in both cases
Estimation for Continuous attributes –
Q: For the following Table, classify the Test case X: <No, Single, 120K>
using the Naïve Bayes Classifier
(Refund and Marital Status are categorical attributes, Taxable Income is continuous, and Evade is the class)

Tid  Refund  Marital Status  Taxable Income  Evade
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

P(xi | cj) = (1 / sqrt(2π σj²)) · exp( −(xi − μj)² / (2 σj²) )

P(Income = 120 | No) = (1 / (sqrt(2π) · 54.54)) · exp( −(120 − 110)² / (2 · 2975) ) ≈ 0.0072
Estimation for Continuous attributes –
Q: For the Table above, classify the Test case X: <No, Single, 120K>
using the Naïve Bayes Classifier

 Normal distribution:
P(xi | cj) = (1 / sqrt(2π σj²)) · exp( −(xi − μj)² / (2 σj²) )
 One Gaussian for each continuous attribute and each class
 For (Income, Class=Yes):
 sample mean μ = 90
 sample variance σ² = 25
P(Income = 120 | Yes) = (1 / (sqrt(2π) · 5)) · exp( −(120 − 90)² / (2 · 25) ) ≈ 1.2 × 10⁻⁹
Estimation for Continuous attributes –
Q: For the Table above, classify the Test case X: <No, Single, 120K>
using the Naïve Bayes Classifier

 Test case X: <No, Single, 120K>

P(CY) = 3/10      P(CN) = 7/10

P(CY) P(X|CY) = 3/10 * 3/3 * 2/3 * (1.2 × 10⁻⁹) ≈ 2.4 × 10⁻¹⁰
P(CN) P(X|CN) = 7/10 * 4/7 * 2/7 * 0.0072 ≈ 0.000823

x is in class CN (Evade = ‘No’ class)
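
A minimal Python sketch (not part of the slides) that combines these factors and confirms the decision:

```python
# Minimal sketch: combine the categorical frequency estimates with the
# Gaussian income densities for test case X = <Refund=No, Single, Income=120K>.
p_yes = 3/10 * 3/3 * 2/3 * 1.2e-9   # P(Yes) * P(Refund=No|Yes) * P(Single|Yes) * P(Income=120|Yes)
p_no  = 7/10 * 4/7 * 2/7 * 0.0072   # P(No)  * P(Refund=No|No)  * P(Single|No)  * P(Income=120|No)
print(p_yes, p_no)                  # ≈ 2.4e-10 vs ≈ 8.2e-04
print("Evade =", "Yes" if p_yes > p_no else "No")   # Evade = No
```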
Question: Naïve Bayes Classifier

Posterior Probability ( Sunny | <humid, cold, fast> )


= P(Sunny) x P(Humid | Sunny)
x P(Cold | Sunny) x P(Fast | Sunny)
= calculate yourself

Posterior Probability ( Rainy | <humid, cold, fast> )


= P(Rainy) x P(Humid | Rainy)
x P(Cold | Rainy) x P(Fast | Rainy)
= calculate yourself
Question: What are the Applications of
Naïve Bayes Classifier?
Two More Questions

 Question: What are the limitations of MLE


classifier?
 Ans: The MLE classifier does NOT consider the prior
probabilities, so classification results may be
inaccurate unless the priors are (approximately) equal.
 Question: What are the Strengths and weaknesses/
Limitations of Naïve Bayes Classifier?
 Strengths: see the Desirable Properties of Bayes
Classifier (slide 25)
Weakness: Makes the 'independence' assumption about the feature
values, which is not true in most cases. For example, the
probability of humidity=high and temp=cool is not independent
of (it is actually highly correlated with) Outlook = "rainy".
