
Conditional Probability &

Naive Bayes’ Classifier


*** If you already have a full understanding of Bayes' Theorem and posterior probability, you may start from Slide 22 ***

*** Slides that are important for Exam/Questions: Slides 22 – 45 ***
Credits
 Slides have been adapted from slides by
Dr. Longin Jan Latecki, Dept. of CIS, Temple
University, USA

 *** If you already have a full understanding of Bayes'
Theorem and posterior probability, you may start
from Slide 22 ***

 Slides that are relevant/important for Exam
Questions: only Slides 22 – 45
Bayesian Methods
 Our focus this lecture
 Learning and classification methods based on
probability theory
 Bayes' theorem plays a critical role in probabilistic
learning and classification
 Uses prior probability of each category given no
information about an item
 Categorization produces a posterior probability
distribution over the possible categories given a
description of an item
A dice-y(!) problem
 Two dice are rolled, yielding values D1 and D2.
 A is the event: D1 + D2 = 4
 P(A) = ?
 A = {(1,3), (3,1), (2,2)}
 |S| = 36
 P(A) = 3/36
 Let B be the event: D2 = 2
 P(A, given B already observed) = ?
 Sample space reduced by B: {(1,2), (2,2), (3,2), (4,2), (5,2), (6,2)}
 Outcomes of A consistent with B: {(2,2)}
 P(A, given B already observed) = 1/6
Conditional Probability
 Conditional probability is probability that A occurs given that
B has already occurred
 “Conditioning on B”
 Written as P(A | B)
 Means “P(A, given B already observed)”
 Sample space, S, reduced to those elements consistent with B (i.e. S ∩
B)
 Event space, A, reduced to those elements consistent with B (i.e. A ∩
B)
 With equally likely outcomes
P(A|B) = (# of outcomes in A consistent with B) / (# of outcomes in S consistent with B)
       = |A ∩ B| / |S ∩ B| = |A ∩ B| / |B|
Conditional Probability
 General Definition
P(A|B) = P(AB) / P(B), if P(B) > 0

 If P(B) = 0, P(A|B) is undefined


 Holds even when outcomes are not equally
likely
 P(AB) = P(A|B)P(B) [chain or multiplication
rule]
 Similarly, P(AB) = P(B|A)P(A)
The dice-y problem-revisited
 Two dice are rolled yielding values D1 and D2
 Event A : D1+D2=4
 |S|= 36, |A| = 3 as, A = {(1,3), (2,2), (3,1)}
 P(A) = 3/36
 Event B: D2 = 2
 |B| = 6 as, B = {(1,2), (2,2), (3,2), (4,2), (5,2), (6,2)}
 P(B) = 6/36
 AB = {(2,2)}
 P(AB) = 1/36
 P(Sum is 4, given 2 is observed on the second die)
 P(A|B) = P(AB)/P(B) = (1/36) / (6/36) = 1/6
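
The result above can be checked by brute-force enumeration of the 36 outcomes; the following is a minimal Python sketch (not part of the original slides).

```python
# Minimal sketch: verify the dice example by enumerating the sample space.
from itertools import product
from fractions import Fraction

sample_space = list(product(range(1, 7), repeat=2))          # all 36 (D1, D2) outcomes
A = [(d1, d2) for d1, d2 in sample_space if d1 + d2 == 4]    # sum is 4
B = [(d1, d2) for d1, d2 in sample_space if d2 == 2]         # second die shows 2
AB = [o for o in A if o in B]                                # A intersect B

p_A = Fraction(len(A), len(sample_space))                    # 3/36
p_B = Fraction(len(B), len(sample_space))                    # 6/36
p_A_given_B = Fraction(len(AB), len(sample_space)) / p_B     # (1/36)/(6/36) = 1/6
print(p_A, p_B, p_A_given_B)                                 # 1/12 1/6 1/6
```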
Problem 2 (excluded!)
 A student is taking a one-hour-time-limit makeup examination.
 P(student will finish the exam in less than x hours) = x/2,
for all 0≤ x ≤ 1
 Then, given that the student is still working after .75 hour
 what is the conditional probability that the full hour is used?
 Solution
 Let F = student uses the full hour
 Let W = student still working after .75 hour
 P(F|W) = ?
 P(F) = 1 - P(finishes exam in less than 1 hour) = 1- ½ = .5
 P(W) = 1 - .75/2 = .625
 P(F|W) = P(FW)/P(W) = P(F)/P(W) = .5/.625 = .8
Problem 3
 Suppose that two balls are to be selected at
random, without replacement from a box containing
 r red balls and b blue balls
 Find probability that the first ball will be red and
the second ball will be blue
 Solution
 A = event that the first ball is red
 P(A) = r/(r+b)
 B = event that the 2nd ball is blue
 P(B|A) = b/(r+b-1)
 P(AB) = P(B|A)P(A) = rb/((r+b)(r+b-1))
Multiplication rule
 General form: P(A1 A2 ⋯ An) = P(A1) P(A2|A1) P(A3|A1A2) ⋯ P(An|A1A2⋯An−1)
Partitions and Total Probability
 Let S = sample space of some experiment
 consider k events B1, . . . , Bk in S such that
 B1, . . . , Bk are disjoint and
 B1 ∪ B2 ∪ … ∪ Bk = S
 Then these events form a partition of S

 Here, B1, B2, B3, B4, and B5 form a partition of S


Total Probability
 Suppose that the events B1, . . . ,Bk form a partition of
the space S and
 Pr(Bj) > 0, for j = 1, . . . , k.
 for every event A in S: P(A) = Σ_{j=1..k} P(Bj) P(A|Bj)
 Proof:
 A ∩ B1, A ∩ B2, …, A ∩ Bk form a partition of A
 A = (A ∩ B1) ∪ … ∪ (A ∩ Bk)
 As the (A ∩ Bj) are mutually disjoint,
P(A) = Σ_{j=1..k} P(A ∩ Bj) = Σ_{j=1..k} P(Bj) P(A|Bj), since P(Bj) > 0
Problem 1
 Box 1 contains 60 red balls and 40 white balls
 Box 2 contains 10 red balls and 20 white balls
 One box is selected at random and a ball is
selected at random from that box
 Let event A = red ball selected, P(A) = ?
 Solution:
 B1 = box 1 is selected
 B2 = box 2 is selected
 P(A) = P(B1)P(A|B1) + P(B2)P(A|B2)
= ½ * 60/100 + ½ * 10/30 = 7/15
Problem 2
 Same experiment as problem 1
 Now after selection we observe that the ball selected is
Red
 What is the probability that it came from box 1?
 P(B1|A)
 What is the probability that it came from box 2?
 P(B2|A)
 Solution:
 P(B1|A) = P(B1A)/P(A) = P(B1)P(A|B1)/P(A)
= (½ * 60/100)/(7/15)
= 9/14
 Similarly, P(B2|A) = P(B2)P(A|B2)/P(A)
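
Both box problems can be checked with a few lines of Python; the following is a minimal sketch (not part of the original slides).

```python
# Minimal sketch: total probability (Problem 1) and the reversed conditional
# probabilities (Problem 2) for the two-box experiment above.
from fractions import Fraction

p_B1, p_B2 = Fraction(1, 2), Fraction(1, 2)          # each box chosen at random
p_A_given_B1 = Fraction(60, 100)                     # red ball from box 1
p_A_given_B2 = Fraction(10, 30)                      # red ball from box 2

p_A = p_B1 * p_A_given_B1 + p_B2 * p_A_given_B2      # total probability = 7/15
p_B1_given_A = p_B1 * p_A_given_B1 / p_A             # Bayes: 9/14
p_B2_given_A = p_B2 * p_A_given_B2 / p_A             # 5/14
print(p_A, p_B1_given_A, p_B2_given_A)               # 7/15 9/14 5/14
```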
Bayes’ Theorem
 Suppose that the events B1, . . . ,Bk form a partition of
the space S and Pr(Bj) > 0, for j = 1, . . . , k.
 Let A be an event in S such that P(A) > 0
 then for i= 1, … ,k
P(Bi|A) = P(Bi) P(A|Bi) / Σ_{j=1..k} P(Bj) P(A|Bj)

 Proof:
P(Bi|A) = P(Bi ∩ A) / P(A) = P(Bi) P(A|Bi) / Σ_{j=1..k} P(Bj) P(A|Bj)
Problem 3
 M1 produces 20% of the items
 M2 produces 30% of the items
 M3 produces 50% of the items
 1% of the items produced by machine M1 are defective
 2% of the items produced by machine M2 are defective
 3% of the items produced by machine M3 are defective.
 one item is selected at random from the entire batch and it is found to
be defective
 determine the probability that this item was produced by machine M2
 Solution:
 Let Bi= item selected is produced by machine Mi
 P(B1) = .2, P(B2) = .3, and P(B3) = .5
 Let A = selected item is defective
 P(A|B1) = .01, P(A|B2) = .02, and P(A|B3) = .03
P(B2|A) = P(B2) P(A|B2) / Σ_{j=1..3} P(Bj) P(A|Bj)
        = (.3)(.02) / [(.2)(.01) + (.3)(.02) + (.5)(.03)] ≈ .26
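
A minimal Python sketch (not part of the original slides) that reproduces this calculation:

```python
# Minimal sketch: posterior probability that a defective item came from
# machine M2, using the partition {B1, B2, B3} described above.
priors = {"M1": 0.2, "M2": 0.3, "M3": 0.5}             # P(Bj): share of items produced
defect_rates = {"M1": 0.01, "M2": 0.02, "M3": 0.03}    # P(A | Bj): defect rate per machine

p_A = sum(priors[m] * defect_rates[m] for m in priors)      # total probability of a defect
posterior_M2 = priors["M2"] * defect_rates["M2"] / p_A
print(round(p_A, 4), round(posterior_M2, 4))                # 0.023 0.2609
```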
Another Look into Bayes’ Theorem
 Suppose we have a hypothesis H
 Assume H is either true or false
 e.g., H = The defendant is guilty of some crime
 We also have a prior belief about H
 This is P(H)
 e.g., we are 60% certain that the defendant is guilty
 Now you get some evidence E
 e.g., the blood type of the actual criminal matches the blood type of the defendant
 There is still some uncertainty: (say) 20% of the population has the same blood
type
 What is our revised (posterior) belief about H?
 i.e., in light of the new evidence how certain are we that the defendant is guilty
 P(H|E)

Q: In 60% of homicide cases where the wife is killed, the husband is found
to be the murderer. Now you get some new evidence E: the blood type found at
the crime scene matches the blood type of the husband. But there is still some
uncertainty, because 20% of the population has the same blood type. What is our
revised (posterior) belief about the hypothesis that the husband is the murderer,
in the light of the new evidence?
Bayes’ Theorem
P(H|E) = P(E|H) P(H) / P(E)
[posterior probability = likelihood × prior probability / P(E)]

 What is P(E)?
 P(E) = P(E|H)P(H) + P(E|Hc)P(Hc)
 Why?
 We need P(E|H), P(E|Hc)
 In this problem, P(E|H) = 1
 P(E|Hc) = .2
 So, our posterior belief that the defendant is guilty is
P(H|E) = (1)(.6) / [(1)(.6) + (.2)(1 − .6)] ≈ 0.882
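
A minimal Python sketch (not part of the original slides) reproducing the 0.882 posterior:

```python
# Minimal sketch: blood-type example with prior 0.6, P(E|H) = 1, and
# P(E|not H) = 0.2 (the match rate in the general population).
p_H = 0.6
p_E_given_H = 1.0
p_E_given_not_H = 0.2

p_E = p_E_given_H * p_H + p_E_given_not_H * (1 - p_H)   # total probability of E = 0.68
p_H_given_E = p_E_given_H * p_H / p_E
print(round(p_H_given_E, 3))                            # 0.882
```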
Problem 4
 A test is 98% effective at detecting a disease
 Test has a “false positive” rate of 1%
 0.5% of the population has this disease
 Let E = you test positive for the disease with this test
 Let H = you actually have the disease
 P(H| E)?
 Solution: P( H | E ) 
P( E | H ) P( H )
P( E | H ) P( H )  P( E | H c ) P( H c )
 P(E|H) = .98
(0.98)(0.005)
 c
P(E|H )= .001   .33
(0.98)(0.005)  (0.01)(1  0.005)
 P(H)=.005
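
A minimal Python sketch (not part of the original slides) confirming that the low prevalence keeps the posterior near 0.33 despite the accurate test:

```python
# Minimal sketch: disease-test example with sensitivity 0.98, false positive
# rate 0.01, and prevalence 0.005.
p_H = 0.005               # prevalence, P(H)
p_E_given_H = 0.98        # sensitivity (true positive rate)
p_E_given_not_H = 0.01    # false positive rate

p_E = p_E_given_H * p_H + p_E_given_not_H * (1 - p_H)
p_H_given_E = p_E_given_H * p_H / p_E
print(round(p_H_given_E, 2))     # 0.33
```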
Simple Bayesian Inference
P(H|E) = P(E|H) P(H) / P(E)
[posterior probability = likelihood × prior probability / P(E)]

 Odds of H = P(H)/P(Hc) (excluded!)


For example, if P(H)/P(Hc) = 2

 odds of 2 to 1 in favour of the hypothesis
 Meaning: it is two times more likely that the hypothesis is true
than it is false
 Odds of H|E: (excluded!)
 P(H|E)/P(Hc|E)
 = [P(H)/P(Hc)] [P(E|H)/P(E|Hc)]
Problem 5 (excluded!)
 An urn contains 2 coins: A and B
 A comes up heads with probability ¼
 B comes up heads with probability ¾
 Pick coin (equally likely), flip it, and It comes up HEAD
 What are odds that A was picked? note: Ac = B?
 Solution:
P(A|heads) / P(Ac|heads) = [P(A) P(heads|A)] / [P(Ac) P(heads|Ac)]
                         = (1/2 × 1/4) / (1/2 × 3/4) = 1/3

 the odds are 1/3 : 1


 Note: before the evidence that head was observed
the odds were 1:1
Bayes’ Theorem
 Given a hypothesis h and data D which bears on the
hypothesis:
P(h|D) = P(D|h) P(h) / P(D)
 P(h): independent probability of hypothesis h: prior
probability of hypothesis
 P(D): independent probability of data D
 P(D|h): conditional probability of data D, given
hypothesis h is true (called the likelihood of data D, given
hypothesis h)
 P(h|D): conditional probability of hypothesis h being true,
given data D: posterior probability of hypothesis
 Question: Write the Bayes’ theorem on Posterior Probability and identify the
following terms — prior and posterior probability, likelihood, independent
probability.
Maximum A Posteriori
Question: Explain how to compute the Maximum A Posteriori (MAP)
hypothesis hMAP for some given data D.
 Based on Bayes' Theorem, we can compute the
Maximum A Posteriori (MAP) hypothesis hMAP for the
given data D
 We are interested in the best hypothesis for some space
H given observed training data D.
hMAP  argmax P(h | D)
hH
P ( D | h) P ( h)
 argmax
hH P( D)
 argmax P( D | h) P(h)
H: set of all hypothesis. hH

Note that we can drop P(D) as the probability of the data is


constant (and independent of the hypothesis).
Maximum Likelihood
Question: Explain how to compute the Maximum Likelihood
hypothesis hML for some given data D. What is the assumption for
Maximum Likelihood hypothesis?

 Now assume that all hypotheses are equally
probable a priori,
 i.e., P(hi) = P(hj) for all hi, hj ∈ H.
 This is called assuming a uniform prior. It
simplifies computing the posterior:
hML = argmax_{h∈H} P(D | h)

 This hypothesis is called the maximum


likelihood hypothesis. (because, P(D|h) is called
the likelihood of Data D, given the hypothesis h is
true)
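
To make the distinction concrete, here is a minimal Python sketch (not from the original slides) over a tiny made-up hypothesis space; the priors and likelihoods below are purely illustrative numbers.

```python
# Minimal sketch: MAP vs. ML hypothesis selection over an illustrative,
# made-up hypothesis space (the numbers are assumptions, not from the slides).
priors = {"h1": 0.7, "h2": 0.2, "h3": 0.1}          # P(h)
likelihoods = {"h1": 0.1, "h2": 0.5, "h3": 0.8}     # P(D | h)

h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])  # argmax P(D|h) P(h)
h_ml = max(likelihoods, key=lambda h: likelihoods[h])          # argmax P(D|h), i.e. uniform prior assumed
print(h_map, h_ml)   # h2 h3 (they can differ when the prior is not uniform)
```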
Desirable Properties of Bayes Classifier
Question: What are the desirable properties of the Bayes’ Classifier? Or,
Write three desirable properties of the Bayes’ classifier.

 Incrementality: with each training example, the


prior probability and the data likelihood can be
updated dynamically. This is flexible and robust
to errors.
 Combines prior knowledge and observed data:
Conceptually Sound and beautiful! The prior
probability of a hypothesis multiplied with
probability of the data, given the hypothesis
 Probabilistic hypothesis: Outputs not only a
classification, but a probability distribution over
all classes
Bayes Classifier: Summary
 P(class | data); here, D = Data = <sunny, hot, normal, strong>, i.e., D = x = <x1, x2, x3, x4>
P(C|D) = P(CD) / P(D) [conditional probability formula P(A|B) = P(AB)/P(B)]
       = P(C) * P(D|C) / P(D) [P(D|C), the likelihood, is easy to estimate; P(C|D), the posterior, is what we want]

Suppose two classes c1 and c2, and test data x is observed.


To which class should you classify (assign) the data x ?

Find posterior probabilities: P(c1|x) and P(c2|x) in above way


P(c1|x) = P (c1) * P(x|c1) / P(x) ; P(c2|x) = P (c2) * P(x|c2) / P(x)
Assign x to the class with Higher posterior probability value

* Same Denominator P(x), so you can compute Numerator only


Example Data: ‘Play Cricket’ data
Day    Outlook   Temperature  Humidity  Wind    Play Cricket
Day1   Sunny     Hot          High      Weak    No
Day2   Sunny     Hot          High      Strong  No
Day3   Overcast  Hot          High      Weak    Yes
Day4   Rain      Mild         High      Weak    Yes
Day5   Rain      Cool         Normal    Weak    Yes
Day6   Rain      Cool         Normal    Strong  No
Day7   Overcast  Cool         Normal    Strong  Yes
Day8   Sunny     Mild         High      Weak    No
Day9   Sunny     Cool         Normal    Weak    Yes
Day10  Rain      Mild         Normal    Weak    Yes
Day11  Sunny     Mild         Normal    Strong  Yes
Day12  Overcast  Mild         High      Strong  Yes
Day13  Overcast  Hot          Normal    Weak    Yes
Day14  Rain      Mild         High      Strong  No

(Outlook, Temperature, Humidity, and Wind are the attributes; Play Cricket is the class.)

*Question: For the day <sunny, hot, normal, strong>,


what’s the play prediction?
Notations
 D is the data table given
 Each row is an example
 We have 4 attributes
 x1 = outlook, possible values = {sunny, overcast, rain}
 x2 = temperature, possible values= {hot, mild, cool}
 x3 = humidity, possible values = {high, normal}
 x4 = wind, possible values = {weak, strong}
 Classes C = {yes, no} = {Y,N}
 In general,
 We are given training data D
 D contains instances of different classes cj ∈ C
 Each instance x contains the values for the attributes x = < x1, x2,…,xn>
Bayesian Classification: Motivation
 Classification can be written as:
 Given an object x
 with attribute x1, … xn having values v1, … , vn
 what is the probability that object x belong to class cj?
 Given day is <sunny, hot, high, strong>, what’s the
probability of playing cricket?
 P(x  Y | outlook = sunny, temperature = hot, humidity = high,
wind = strong)
 In probability notation: P(x  cj | x1=v1, …, xn=vn)?
 Use simplified notation: P(cj | X)
 Informally, Find P( Y| <sunny, hot, normal, strong> ) and
P( N | <sunny, hot, normal, strong> )
Bayes Classifiers: Key idea:
Assign the most probable class c MAP using Bayes’ theorem

cMAP  argmax P(c j | X )


c j C
P ( X | c j ) P (c j )
 argmax
c j C P( X )
 argmax P( X | c j ) P(c j )
c j C

 argmax P( x1  v1 , x2  v1 , , xn  vn | c j ) P (c j )
c j C

means : P (x1=sunny, x2=hot, x3=Normal, x4=Strong | Y) * P (Y)


and : P (x1=sunny, x2=hot, x3=Normal, x4=Strong | N) * P (N)
called naive Bayes, because estimates these complex probabilitues as
P (x1=sunny|Y) * P(x2=hot|Y) * P(x3=Normal|Y)*P(x4=Strong | Y) * P (Y)
Bayesian Classification: Motivation
argmax_{cj∈C} P(X | cj) P(cj)

 What does each term mean?


 P(X | cj) = P(x1=v1, …, xn=vn | x ∈ cj)
 Probability that the attributes x1, …, xn will have values v1, …,
vn, given that an object x is in class cj
 P(cj) : Prior probability of class cj
 Probability that a randomly selected object is of class cj
 Note: P(X) = P(x1=v1, …, xn=vn)
 Probability that a random object selected will have values v1,
… , vn for attributes x1, … xn
Estimation
 But how to estimate the probabilities?
 Draw a sample!
 Which, in classification term, means look at the
training set
Parameter Estimation
Question: What is assumption of Naïve Bayes classifier (NBC)? Write and explain
the Independence assumption of NBC. Why is it called the Naïve Bayes classifier?

 P(cj) can be estimated from the frequency of classes in the
training data D.
Ans (last part): Because of the Independence Assumption, which is
usually not true. For example, the probability of humidity=high and
temp=cool is not independent of (it is actually highly correlated with)
Outlook = "rainy".
 P(x1,x2,…,xn|cj)
 O(|X|^n · |C|) possible combinations of the parameters
 Could only be estimated if a very, very large number of
training examples was available.
 Independence Assumption: attribute values are
conditionally independent of each other given the target
value: this is why it is called naïve Bayes’ Classifier.
P( x1 , x2 ,, xn | c j )   P( xi | c j )
i
c NB  arg max P(c j ) P( xi | c j )
c j C i
Properties

 Estimating P(xi | cj) instead of P(x1, x2, …, xn | cj) greatly
reduces the number of parameters (and the data
sparseness).
 The learning step in Naïve Bayes’ consists of
estimating P( xi | c j ) and P(c j ) based on the
frequencies in the training data
 An unseen instance is classified by computing the
class that maximizes the posterior
 When conditional independence is satisfied,
Naïve Bayes corresponds to MAP classification.
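
As a concrete illustration of this learning step, here is a minimal Python sketch (not part of the original slides); representing D as a list of (attribute dictionary, class label) pairs is an assumption made for the illustration.

```python
# Minimal sketch of the Naive Bayes learning step: estimate P(c_j) and
# P(x_i | c_j) from frequencies in the training data D, where D is assumed
# to be a list of (attributes_dict, class_label) pairs.
from collections import Counter, defaultdict

def estimate_parameters(D):
    n = len(D)
    class_counts = Counter(c for _, c in D)
    priors = {c: class_counts[c] / n for c in class_counts}      # P(c_j)
    value_counts = defaultdict(Counter)                          # (attr, class) -> value counts
    for attributes, c in D:
        for attr, value in attributes.items():
            value_counts[(attr, c)][value] += 1
    def likelihood(attr, value, c):                              # P(x_i = value | c_j)
        return value_counts[(attr, c)][value] / class_counts[c]
    return priors, likelihood

# Hypothetical tiny usage example:
# D = [({"outlook": "sunny", "wind": "weak"}, "No"),
#      ({"outlook": "rain",  "wind": "weak"}, "Yes")]
# priors, likelihood = estimate_parameters(D)
# priors["Yes"], likelihood("outlook", "sunny", "No")
```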
Example. ‘Play Cricket’ data
Day    Outlook   Temperature  Humidity  Wind    Play Cricket
Day1   Sunny     Hot          High      Weak    No
Day2   Sunny     Hot          High      Strong  No
Day3   Overcast  Hot          High      Weak    Yes
Day4   Rain      Mild         High      Weak    Yes
Day5   Rain      Cool         Normal    Weak    Yes
Day6   Rain      Cool         Normal    Strong  No
Day7   Overcast  Cool         Normal    Strong  Yes
Day8   Sunny     Mild         High      Weak    No
Day9   Sunny     Cool         Normal    Weak    Yes
Day10  Rain      Mild         Normal    Weak    Yes
Day11  Sunny     Mild         Normal    Strong  Yes
Day12  Overcast  Mild         High      Strong  Yes
Day13  Overcast  Hot          Normal    Weak    Yes
Day14  Rain      Mild         High      Strong  No

*Question: For the day <sunny, hot, normal, strong>,


what’s the Play prediction?
Naive Bayes’ Classifier – Computations
*Question: For the day <sunny, hot, normal, strong>,
what’s the Play prediction?

 Given a D, we can compute the probabilities P(xi | cj)

Outlook      Y    N        Humidity  Y    N
sunny        2/9  3/5      high      3/9  4/5
overcast     4/9  0        normal    6/9  1/5
rain         3/9  2/5

Temperature  Y    N        Wind      Y    N
hot          2/9  2/5      strong    3/9  3/5
mild         4/9  2/5      weak      6/9  2/5
cool         3/9  1/5

 P (sunny | CY) = 2/9 P(normal | CY) = 6/9


 P (sunny | CN) = 3/5 P(strong | CN) = 3/5
Naive Bayes’ Classifier – Computations
*Question: For the day <sunny, hot, normal, strong>,
what’s the Play prediction?

 Now with object x = (sunny, hot, normal, strong), using the
probability tables above:

P(CY) = 9/14
P(CN) = 5/14

P(CY) P(X|CY) = 9/14 * 2/9 * 2/9 * 6/9 * 3/9 ≈ 0.007
P(CN) P(X|CN) = 5/14 * 3/5 * 2/5 * 1/5 * 3/5 ≈ 0.01

x is in class CN (‘No’ Play Cricket)
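
The same computation, written as a minimal Python sketch (not part of the slides), with the probabilities read off the tables above:

```python
# Minimal sketch: classify x = (sunny, hot, normal, strong) using the
# class priors and per-attribute conditional probabilities from the tables.
from fractions import Fraction as F

p_yes = F(9, 14) * F(2, 9) * F(2, 9) * F(6, 9) * F(3, 9)   # P(CY) P(X|CY) ≈ 0.007
p_no  = F(5, 14) * F(3, 5) * F(2, 5) * F(1, 5) * F(3, 5)   # P(CN) P(X|CN) ≈ 0.010
print(float(p_yes), float(p_no))
print("Play Cricket =", "Yes" if p_yes > p_no else "No")   # Play Cricket = No
```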
Underflow Prevention
Question: Describe the source of floating point underflow problem in the
Naïve Bayes Classifier. Explain how this problem can be solved.

 Multiplying lots of probabilities, which are


between 0 and 1 by definition, can result in
floating-point underflow.
 Since log(xy) = log(x) + log(y),
 it is better to perform all computations by summing
logs of probabilities rather than multiplying
probabilities.
 Class with highest final log sum of probabilities is
still the most probable. (underflow prevention by Log)
cUPL = argmax_{cj∈C} [ log P(cj) + Σ_{i ∈ positions} log P(xi | cj) ]
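
As a minimal sketch (not from the slides), here is the cricket example recomputed with log probabilities:

```python
# Minimal sketch: underflow prevention by summing log probabilities instead
# of multiplying raw probabilities (cricket example from the earlier slides).
import math

probs_yes = [9/14, 2/9, 2/9, 6/9, 3/9]   # P(CY) and the P(xi | CY) factors
probs_no  = [5/14, 3/5, 2/5, 1/5, 3/5]   # P(CN) and the P(xi | CN) factors

log_yes = sum(math.log(p) for p in probs_yes)
log_no  = sum(math.log(p) for p in probs_no)
# The class with the largest log sum is also the class with the largest product.
print("Play Cricket =", "Yes" if log_yes > log_no else "No")   # No
```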
Naïve Bayes’ Classifier – Parameter Estimation
Question: What is the problem associated with Continuous attribute values in a
Naïve Bayes classifier? How this problem is solved and what is the underlying
assumption?

 More detail in estimating probabilities:


 If i-th attribute is categorical:
P(xi|cj) is estimated as the relative freq of samples
having value xi as i-th attribute in class cj
 If i-th attribute is continuous:
P(xi|cj) can be estimated through a Gaussian density
function on the attribute's values. This is called the 'Gaussian
Distribution Assumption' of the Naïve Bayes Classifier, which
assumes that each continuous attribute follows a
Gaussian(μ, σ²) distribution. The mean μ and variance σ²
can be estimated from the training data table D
 Computationally easy in both cases
Estimation for Continuous attributes –
Q: For the following Table, classify the Test case X: <No, Single, 120K>
using the Naïve Bayes Classifier
(Refund and Marital Status are categorical attributes, Taxable Income is continuous, and Evade is the class)

Tid  Refund  Marital Status  Taxable Income  Evade
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

P(xi | cj) = (1 / sqrt(2π σj²)) · exp( −(xi − μj)² / (2 σj²) )

P(Income = 120 | No) = (1 / (sqrt(2π) · 54.54)) · exp( −(120 − 110)² / (2 · 2975) ) ≈ 0.0072
Estimation for Continuous attributes –
Q: For the Table above, classify the Test case X: <No, Single, 120K>
using the Naïve Bayes Classifier

 Normal distribution:
P(xi | cj) = (1 / sqrt(2π σj²)) · exp( −(xi − μj)² / (2 σj²) )
 One Gaussian for each continuous attribute and each class
 For (Income, Class=Yes):
 sample mean μ = 90
 sample variance σ² = 25
P(Income = 120 | Yes) = (1 / (sqrt(2π) · 5)) · exp( −(120 − 90)² / (2 · 25) ) ≈ 1.2 × 10⁻⁹
Estimation for Continuous attributes –
Q: For the Table above, classify the Test case X: <No, Single, 120K>
using the Naïve Bayes Classifier

 Test case X: <No, Single, 120K>

P(CY) = 3/10      P(CN) = 7/10

P(CY) P(X|CY) = 3/10 * 3/3 * 2/3 * (1.2 × 10⁻⁹) ≈ 2.4 × 10⁻¹⁰
P(CN) P(X|CN) = 7/10 * 4/7 * 2/7 * 0.0072 ≈ 0.000823

x is in class CN (Evade = ‘No’ class)
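
A minimal Python sketch (not part of the slides) that combines these factors and confirms the decision:

```python
# Minimal sketch: combine the categorical frequency estimates with the
# Gaussian income densities for test case X = <Refund=No, Single, Income=120K>.
p_yes = 3/10 * 3/3 * 2/3 * 1.2e-9   # P(Yes) * P(Refund=No|Yes) * P(Single|Yes) * P(Income=120|Yes)
p_no  = 7/10 * 4/7 * 2/7 * 0.0072   # P(No)  * P(Refund=No|No)  * P(Single|No)  * P(Income=120|No)
print(p_yes, p_no)                  # ≈ 2.4e-10 vs ≈ 8.2e-04
print("Evade =", "Yes" if p_yes > p_no else "No")   # Evade = No
```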
Question: Naïve Bayes Classifier

Posterior Probability ( Sunny | <humid, cold, fast> )


= P(Sunny) x P(Humid | Sunny)
x P(Cold | Sunny) x P(Fast | Sunny)
= calculate yourself

Posterior Probability ( Rainy | <humid, cold, fast> )


= P(Rainy) x P(Humid | Rainy)
x P(Cold | Rainy) x P(Fast | Rainy)
= calculate yourself
Question: What are the Applications of
Naïve Bayes Classifier?
Two More Questions

 Question: What are the limitations of MLE


classifier?
 Ans: The MLE classifier does NOT consider the prior
probabilities, so classification results may be
inaccurate unless the priors are (approximately) equal.
 Question: What are the Strengths and weaknesses/
Limitations of Naïve Bayes Classifier?
 Strengths: see the Desirable Properties of Bayes
Classifier (slide 25)
Weakness: Makes the 'independence' assumption about the feature
values, which is not true in most cases. For example, the
probability of humidity=high and temp=cool is not independent
of (it is actually highly correlated with) Outlook = "rainy".
