
EECS 658

Introduction to Machine Learning


David O. Johnson
Fall 2024

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 1


Reminders
• Assignment 1 due (today): 11:59 PM, Thursday, September 5
• Assignment 2 due: 11:59 PM, Thursday, September 19

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 2


Any Questions?

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 3


In-Class Problem Solution
• 2-(9-3) In-Class Problem Solution.pptx

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 4


Any Questions?

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 5


Sources
• https://en.wikipedia.org/wiki/Naive_Bayes_classifier

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 6


Naïve Bayesian Classifier
• The Naïve Bayesian Classifier is the simplest ML
classifier.
• It is used as the “gold standard” for comparing
ML models.
• If you develop a new ML model that cannot
perform as well (or poorly) as the Naïve Bayesian
Classifier, then you better go back to the drawing
board.
• The Naïve Bayesian Classifier is based on some
fundamental concepts of discrete probability.
Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 7
Discrete Probability
• An experiment is a procedure that yields one of a given set of possible
outcomes.
• The sample space of the experiment is the set of possible outcomes.
• An event is a subset of the sample space.
• Laplace’s definition of the probability of an event with finitely many
possible outcomes:
– S is a finite nonempty sample space of equally likely outcomes
– E is an event, where E ⊆ S
– Probability of E is p(E) = |E|/|S|
– 0 ≤ p(E) ≤ 1
Example:
• An urn contains 4 green balls and 5 red balls.
• What is the probability that a ball chosen at random from the urn is green?
Solution:
• |S| = 9 possible outcomes of choosing a ball from the urn at random
• |E| = 4 of the possible outcomes are green balls
• p(E) = |E|/|S| = 4/9
Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 8
Discrete Probability
Example:
• There are many lotteries now that award enormous prizes to
people who correctly choose a set of six numbers out of the first n
positive integers, where n is usually between 30 and 60.
• What is the probability that a person picks the correct six numbers
out of 40?
Solution:
• There is only one winning combination.
• The total number of ways to choose six numbers out of 40 is:
• C(40, 6) = 40!/(34!·6!) = 3,838,380
• Consequently, the probability of picking a winning combination is
1/3,838,380 ≈ 0.00000026.
• ≈ means approximately equal to

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 9
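A quick way to check the counting results on this slide and the previous one is a few lines of Python; this is just an illustrative sketch (math.comb is the standard-library binomial coefficient, everything else is this example):

from math import comb

# Number of ways to choose 6 numbers out of 40 (order does not matter)
total = comb(40, 6)      # 3,838,380
p_win = 1 / total        # probability of the single winning combination
print(total, p_win)      # 3838380  ~2.6e-07

# Urn example from the previous slide: 4 green and 5 red balls
p_green = 4 / 9
print(p_green)           # ~0.444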


Assigning Probabilities
• Often we want to calculate the probability of each event of an experiment.
Example (for all events):
• An urn contains 4 green balls and 5 red balls.
• What is the probability that a ball chosen at random from the urn is
green? p(green)
• What is the probability that a ball chosen at random from the urn is red?
p(red)
Solution:
• |S| = 9 possible outcomes of choosing a ball from the urn at random
• |green| = 4 of the possible outcomes are green balls
• p(green) = |green|/|S| = 4/9
• |red| = 5 of the possible outcomes are red balls
• p(red) = |red|/|S| = 5/9

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 10


Any Questions?

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 11


Conditional Probabilities
• Suppose that we flip a coin three times, and all eight possibilities are equally likely:
S = {TTT, TTH, THT, THH, HTT, HTH, HHT, HHH}
• Let F be the event where the first flip comes up tails: F = {TTT, TTH, THT, THH}
• If F has occurred, i.e., the first flip came up tails, what is the probability of the
event E, that an odd number of tails appears? The outcomes of F with an odd number of tails are E ∩ F = {TTT, THH}

• This probability is called the conditional probability of E given F


• denoted by: p(E ∣ F).

• Because the first flip comes up tails, there are only four possible outcomes: TTT,
TTH, THT, and THH, i.e., |F| = 4
• This suggests that we should assign the probability |E ∩ F|/|F| = 2/4 = 1/2 to E,
given that F occurs.

Definition 3:
• Let E and F be events with p(F) > 0.
• The conditional probability of E given F, denoted by p(E ∣ F) is:
• p(E ∣ F) = p(E ∩ F)/p(F)

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 12


Conditional Probabilities
Example 3:
• A bit string of length four is generated at random so that each of the 16 bit strings
of length four is equally likely.
• What is the probability that it contains at least two consecutive 0s, given that its
first bit is a 0?
• We assume that 0 bits and 1 bits are equally likely.
Solution:
• Let E = {0000, 0001, 0010, 0011, 0100, 1000, 1001, 1100}
• Let F = {0xxx}
• The probability that a bit string of length four has at least two consecutive 0s,
given that its first bit is a 0 is:
• p(E ∣ F) = p(E ∩ F)/p(F)
• Because E ∩ F = {0000, 0001, 0010, 0011, 0100}, we see that p(E ∩ F) = 5/16.
• Because there are eight bit strings of length four that start with a 0, we have p(F) =
8/16 = 1/2.
• Consequently:
• p(E ∣ F) = (5/16)/(1/2) = 5/8

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 13
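Example 3 can be verified by brute-force enumeration; a minimal sketch (the set names E and F match the slide, the rest is this example):

from itertools import product

# All 16 equally likely bit strings of length four
strings = [''.join(bits) for bits in product('01', repeat=4)]

E = {s for s in strings if '00' in s}      # at least two consecutive 0s
F = {s for s in strings if s[0] == '0'}    # first bit is 0

p_E_and_F = len(E & F) / len(strings)      # 5/16
p_F = len(F) / len(strings)                # 8/16
print(p_E_and_F / p_F)                     # 0.625 = 5/8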


Bayes’ Theorem
• There are many times when we want to assess the probability that a
particular event occurs on the basis of partial evidence:
– The probability that a person has a disease given that this person tests positive
for a diagnostic test for the disease.
– The probability that an incoming e-mail message is spam using the occurrence
of words in the message.
• The result that we can use to answer questions such as these is called
Bayes’ theorem which dates back to the eighteenth century.
Bayes’ Theorem:
• If:
– E and F are events from a sample space S
– p(E) ≠ 0 and p(F) ≠ 0.
• Then:
p(F ∣ E) = p(E ∣ F)p(F) / [p(E ∣ F)p(F) + p(E ∣ F′)p(F′)]

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 14


Bayes’ Theorem
Example 1:
• We have two boxes.
• The first contains two green balls and seven red balls.
• The second contains four green balls and three red balls.
• Bob selects a ball by first choosing one of the two boxes at random.
• He then selects one of the balls in this box at random.
• If Bob has selected a red ball, what is the probability that he selected a ball from the first box?
Solution:
• Let E be the event that Bob has chosen a red ball.
• Then E' is the event that Bob has chosen a green ball.
• Let F be the event that Bob has chosen a ball from the first box.
• Then F' is the event that Bob has chosen a ball from the second box.
• We want to find p(F ∣ E), the probability that the ball Bob selected came from the first box, given that it is red.
• p(E ∣ F) = 7/9 because 7 of the 9 balls in the first box are red
• p(F) = p(F') = 1/2
• p(E ∣ F') = 3/7 because 3 of the 7 balls in the second box are red
• By Bayes' Theorem:
• p(F ∣ E) = p(E ∣ F)p(F)/[p(E ∣ F)p(F) + p(E ∣ F')p(F')]
• p(F ∣ E) = (7/9)·(1/2)/[(7/9)·(1/2) + (3/7)·(1/2)]
• p(F ∣ E) ≈ 0.645

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 15
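The two-box arithmetic is easy to replicate; a short sketch (variable names are mine, the numbers come from the slide):

# Two-box example: p(F|E) via Bayes' theorem
p_F = p_Fp = 0.5          # box chosen uniformly at random (F = box 1, F' = box 2)
p_E_given_F = 7 / 9       # fraction of red balls in box 1
p_E_given_Fp = 3 / 7      # fraction of red balls in box 2

p_F_given_E = (p_E_given_F * p_F) / (p_E_given_F * p_F + p_E_given_Fp * p_Fp)
print(round(p_F_given_E, 3))   # 0.645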


Generalized Bayes’ Theorem
• For Example 1, suppose we had 3 or more boxes instead of just 2?
• We can generalize Bayes’ Theorem as follows:
• Let:
– E be an event from the sample space S
– F1, F2, …, Fn are mutually exclusive events such that F1 ∪ F2 ∪ … ∪ Fn = S
– p(E) ≠ 0
– p(Fi) ≠ 0
• Then:
p(Fj ∣ E) = p(E ∣ Fj)p(Fj) / [p(E ∣ F1)p(F1) + p(E ∣ F2)p(F2) + … + p(E ∣ Fn)p(Fn)]
• If n = 2, then this reverts to:
p(F ∣ E) = p(E ∣ F)p(F) / [p(E ∣ F)p(F) + p(E ∣ F′)p(F′)]

• Because with n = 2 the events F1 and F2 partition S, so by the complement rule p(F2) = 1 − p(F1), i.e., F2 = F1′ (writing F = F1).

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 16
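The generalized form fits in one small function; a sketch (the function name bayes and its argument layout are mine):

def bayes(prior, likelihood, j):
    """p(Fj | E) for mutually exclusive events F1..Fn that cover S.

    prior[i]      = p(Fi)
    likelihood[i] = p(E | Fi)
    """
    evidence = sum(p * l for p, l in zip(prior, likelihood))
    return likelihood[j] * prior[j] / evidence

# With n = 2 this reproduces the two-box result (j = 0 is the first box):
print(round(bayes([0.5, 0.5], [7/9, 3/7], 0), 3))   # 0.645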


Mean & Variance
• Two other statistics we will need to know to
understand the Naïve Bayesian Classifier:
– Mean (μ) (a.k.a, average or expected value)
– Variance (σ2)

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 17


Mean (μ)
• Mean (a.k.a, expected value) is the average of a set of
values.
• Mean = sum of values/# of values (n)
Sample weight (lbs)
1 180
2 190
3 170
4 165
5 100
6 150
7 130
8 150
Sum 1235
# of values (n) 8
Mean 154.375
Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 18
Variance (σ2)
• Variance is a measure of how much the values vary from
the mean.
• Variance = sum of (value − mean)² / (n − 1)
Sample   weight (lbs)   (mean − weight)²
1 180 656.64
2 190 1269.14
3 170 244.14
4 165 112.89
5 100 2956.64
6 150 19.14
7 130 594.14
8 150 19.14
Sum 5871.88
n-1 7
Variance 838.84
Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 19
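Both statistics can be checked with the standard library's statistics module, which uses the same n − 1 (sample variance) convention as the slide; a minimal sketch with the weights above:

import statistics

weights = [180, 190, 170, 165, 100, 150, 130, 150]

mu = statistics.mean(weights)        # 154.375
var = statistics.variance(weights)   # sample variance, divides by n - 1

print(mu)               # 154.375
print(round(var, 2))    # 838.84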
Any Questions?

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 20


Naïve Bayesian Classifier
• The Naïve Bayesian classifier is a conditional
probability model.
• A sample to be classified is represented by a
vector x = (x1, x2, … xn) of n feature values.
• It calculates conditional probabilities for this
sample:
p(Ci|x1, x2, … xn)
• for each possible class (Ci)
• The sample is then classified as the Ci with the greatest
p(Ci|x1, x2, … xn).

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 21


Naïve Bayesian Classifier
• The problem with this model is that if the number of features n is
large or if a feature can take on a large number of values, then
basing such a model on probability tables is infeasible.
• We therefore reformulate the model to make it more tractable.
• Using Bayes' theorem, the conditional probability can be
decomposed as:

P(Ci ∣ x) = p(x ∣ Ci)·P(Ci) / p(x)
          = p(x ∣ Ci)·P(Ci) / [p(x ∣ C1)·P(C1) + p(x ∣ C2)·P(C2) + … + p(x ∣ CK)·P(CK)]
Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 22
Naïve Bayesian Classifier
• In plain English, using Bayesian probability terminology, this
equation can be written as:
posterior = prior · likelihood / evidence
• In practice, there is interest only in the numerator of that
fraction, because the denominator does not depend on any
Ci and the values of the features x are given, so that the
denominator is effectively constant.
• This reduces the problem to:
– Calculating the posterior numerator of the sample for each
class:
posterior numerator = prior · likelihood
– Classifying the sample based on the largest posterior numerator

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 23


Naïve Bayesian Classifier
• The posterior numerator is calculated:
posterior numerator (Ci) = P(Ci)·p(x1|Ci)·p(x2|Ci)· … ·p(xn|Ci)
• P(Ci) = # of Ci in training sample / # of training samples
• We can estimate p(xk|Ci) for any feature k with the following Gaussian
formula (a worked sketch in code follows below):

p(xk|Ci) = (1/√(2πσk²))·exp(−(x − μk)²/(2σk²))

• where:
x = the value of feature k for the sample we are classifying
μk = mean value of feature k over the training samples of class Ci
σk² = variance of feature k over the training samples of class Ci
e = 2.71828… (exp(n) = eⁿ)

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 24
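As an illustration of this formula (the helper name gaussian_likelihood is mine, not part of the slides), a few lines of Python reproduce the class-conditional density; the mean and variance used below are the male height statistics given later in the example:

import math

def gaussian_likelihood(x, mean, variance):
    """p(xk | Ci) under the Gaussian assumption, using the class's
    per-feature mean and variance from the training set."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return coeff * math.exp(-(x - mean) ** 2 / (2.0 * variance))

# Height of the test gorilla under the 'male' statistics from the later slides:
print(round(gaussian_likelihood(6, 5.855, 3.503e-2), 4))   # ~1.5789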


Naïve Bayesian Classifier
• In summary, to “train” a Naïve Bayesian
Classifier all you need to do is:
– Calculate one probability (P) for each class
– Calculate n*m conditional probabilities (p), where:
• n = number of classes
• m = number of features

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 25
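For reference (not part of the slides), scikit-learn's GaussianNB carries out exactly this training step: it stores one prior per class plus a per-class mean and variance for every feature. A minimal sketch using four rows of the gorilla training set from the next slide:

from sklearn.naive_bayes import GaussianNB

X = [[6.0, 180, 12], [5.92, 190, 11], [5.0, 100, 6], [5.5, 150, 8]]  # height, weight, footsize
y = ['male', 'male', 'female', 'female']

model = GaussianNB()
model.fit(X, y)                      # learns class priors and per-class mean/variance
print(model.predict([[6.0, 130, 8]]))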


Any Questions?

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 26


Naïve Bayesian Classifier Example
Problem:
• Classify whether a given gorilla is a male or a female based on the
measured features.
• The features include height, weight, and footsize.

Training Set
Gorilla   height (feet)   weight (lbs)   footsize (inches)
male      6               180            12
male      5.92 (5'11")    190            11
male      5.58 (5'7")     170            12
male      5.92 (5'11")    165            10
female    5               100            6
female    5.5 (5'6")      150            8
female    5.42 (5'5")     130            7
female    5.75 (5'9")     150            9

Test Set (1 gorilla):
• height = 6 feet
• weight = 130 lbs
• footsize = 8 inches
• The test set could be larger.
• Each sample would be classified in the same manner.

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 27


Naïve Bayesian Classifier Example
Solution:
•For the test sample calculate the numerator of the posterior probabilities:
posterior numerator (Ci) = P(Ci)·p(x1|Ci)·p(x2|Ci)· … ·p(xn|Ci)
•which is:
posterior numerator (male) = P(male)·p(height|male)·p(weight|male)·p(footsize|male)
posterior numerator (female) = P(female)·p(height|female)·p(weight|female)·p(footsize|female)
•Classify the test sample based on the largest posterior numerator.

•Let’s start with: posterior numerator (male)


• We can calculate P(male) from the training data:
• P(male) = # of males / # of gorillas
• P(male) = 4/8 = 0.5

Gorilla   height (feet)   weight (lbs)   footsize (inches)
male      6               180            12
male      5.92 (5'11")    190            11
male      5.58 (5'7")     170            12
male      5.92 (5'11")    165            10
female    5               100            6
female    5.5 (5'6")      150            8
female    5.42 (5'5")     130            7
female    5.75 (5'9")     150            9
Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 28
Naïve Bayesian Classifier Example
Solution (continued):
• We can estimate the conditional probabilities using the formula:
• p(x|C) = (1/√(2πσ²))·exp(−(x − μ)²/(2σ²))   Note: exp(n) = eⁿ
• To do that, we need the mean (μ) and variance (σ²) of each feature for each class in the
training set, and the value of x from the test sample (height = 6 feet)
• Thus:
• p(height|male) = (1/√(2πσ²))·exp(−(x − μ)²/(2σ²))
• p(height|male) = (1/√(2π·3.503·10⁻²))·exp(−(6 − 5.855)²/(2·3.503·10⁻²))
• p(height|male) = 1.5789
Note that a value greater than 1 is OK here – it is a probability density rather than a
probability, because height is a continuous variable
Gorilla   mean (height)   variance (height)   mean (weight)   variance (weight)   mean (footsize)   variance (footsize)
male      5.855           3.503·10⁻²          176.25          1.2292·10²          11.25             9.1667·10⁻¹
female    5.4175          9.723·10⁻²          132.5           5.5833·10²          7.5               1.6667
Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 29
Naïve Bayesian Classifier Example
Solution (continued):
•Likewise, we can estimate the other conditional probabilities:
•p(weight|male) = 5.9881·10⁻⁶
•p(footsize|male) = 1.3112·10⁻³

•Now we can calculate the numerator of the posterior probability that the gorilla is a
male.
•posterior numerator (male) = P(male)·p(height|male)·p(weight|male)·p(footsize|male)
•posterior numerator (male) = 0.5·1.5789·(5.9881·10⁻⁶)·(1.3112·10⁻³)
•posterior numerator (male) = 6.1984·10⁻⁹

Gorilla   mean (height)   variance (height)   mean (weight)   variance (weight)   mean (footsize)   variance (footsize)
male      5.855           3.503·10⁻²          176.25          1.2292·10²          11.25             9.1667·10⁻¹
female    5.4175          9.723·10⁻²          132.5           5.5833·10²          7.5               1.6667

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 30


Naïve Bayesian Classifier Example
Solution (continued):
•Likewise, we can calculate the numerator of the posterior probability that the gorilla is
a female.
•P(female) = # females/# gorillas = 4/8 = 0.5
•p(height|female) = 2.2346·10⁻¹
•p(weight|female) = 1.6789·10⁻²
•p(footsize|female) = 2.8669·10⁻¹
•posterior numerator (female) = 5.3778·10⁻⁴

Gorilla   mean (height)   variance (height)   mean (weight)   variance (weight)   mean (footsize)   variance (footsize)
male      5.855           3.503·10⁻²          176.25          1.2292·10²          11.25             9.1667·10⁻¹
female    5.4175          9.723·10⁻²          132.5           5.5833·10²          7.5               1.6667

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 31


Naïve Bayesian Classifier Example
Solution (continued):
• Comparing:
• posterior numerator (male) = 6.1984·10⁻⁹
• posterior numerator (female) = 5.3778·10⁻⁴
• We see:
– posterior numerator (female) > posterior numerator (male)
• Thus, we conclude the test sample gorilla is a female.

Gorilla   mean (height)   variance (height)   mean (weight)   variance (weight)   mean (footsize)   variance (footsize)
male      5.855           3.503·10⁻²          176.25          1.2292·10²          11.25             9.1667·10⁻¹
female    5.4175          9.723·10⁻²          132.5           5.5833·10²          7.5               1.6667

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 32
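The arithmetic on the last few slides can be checked end to end with a short script; this is a sketch assuming only the per-class statistics and priors given above (helper and variable names are mine):

import math

def likelihood(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Per-class (mean, variance) for height, weight, footsize, from the training set
stats = {
    'male':   [(5.855, 3.503e-2), (176.25, 1.2292e2), (11.25, 9.1667e-1)],
    'female': [(5.4175, 9.723e-2), (132.5, 5.5833e2), (7.5, 1.6667)],
}
prior = {'male': 0.5, 'female': 0.5}

test = [6.0, 130.0, 8.0]   # height, weight, footsize of the test gorilla

for c in stats:
    numerator = prior[c]
    for x, (mean, var) in zip(test, stats[c]):
        numerator *= likelihood(x, mean, var)
    print(c, numerator)    # male ~6.1984e-09, female ~5.3778e-04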


Any Questions?

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 33


ML Model Comparison
Naïve Bayes
• Strengths:
– Simple to train
• Weaknesses:
– Assumes each feature follows a Gaussian distribution,
which is not always true in the real world.
– Called Naïve because it assumes features are independent,
which is rarely true in the real world.
– Based on probability theory, which is okay, except real-
world probability distributions are much more complex
than typical probability distributions (e.g., Gaussian).

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 34


Any Questions?

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 35


In-Class Problem Rubric
Question 1 (10 points per variety):
• Exceeds Expectations (90-100%): Formula is correct.
• Meets Expectations (80-89%): Formula contains a minor error.
• Unsatisfactory (0-79%): Formula contains a major error.

Question 2 (10 points per P):
• Exceeds Expectations (90-100%): P is correct and calculations are shown.
• Meets Expectations (80-89%): P is incorrect, but calculations contain a minor math error.
• Unsatisfactory (0-79%): P is correct, but calculations are incomplete, OR P is incorrect, but calculations are shown and contain a major math error.

Question 3 (20 points):
• Exceeds Expectations (90-100%): Formula & substitutions are correct.
• Meets Expectations (80-89%): Formula & substitutions contain a minor error.
• Unsatisfactory (0-79%): Formula & substitutions contain a major error.

Question 4 (10 points):
• Exceeds Expectations (90-100%): Number of conditional probabilities is correct and calculations are shown.
• Meets Expectations (80-89%): Number is incorrect, but calculations contain a minor math error.
• Unsatisfactory (0-79%): Number is correct, but calculations are incomplete, OR number is incorrect, but calculations are shown and contain a major math error.

Question 5 (10 points):
• Exceeds Expectations (90-100%): Variety is correct and reasoning is shown.
• Meets Expectations (80-89%): Variety is correct, but no reasoning is shown.
• Unsatisfactory (0-79%): 0 points if variety is incorrect.
Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 36
In-Class Problem
Problem:
• Create a Naïve Bayesian Classifier for the iris dataset.
• Given:
– The iris data set contains 150 samples of data, 50 for each variety of iris: Iris-setosa, Iris-
versicolor, & Iris-virginica
– We will use 149 samples of the data to train the classifier, and test it with one sample of Iris-
virginica which has the following features:
• sepal-length = 5.9
• sepal-width = 3
• petal-length = 5.1
• petal-width = 1.8
1. Give the formula for the posterior numerator for each variety, e.g., posterior numerator(Iris-
setosa).
2. Calculate P for each variety, e.g., P(Iris-setosa)
3. Give the formula for p(sepal-length|Iris-setosa), if the mean value and variance of sepal-length for
Iris-setosa is 5.0 and 0.12, respectively. Substitute the values for x, μ, and σ2 into the formula.
4. How many conditional probabilities will the Naïve Bayesian Classifier need to calculate to classify a
test sample?
5. If posterior numerator(Iris-setosa) = 0.005, posterior numerator(Iris-versicolor) = 0.002, and
posterior numerator(Iris-virginica) = 0.003, which variety did the Naïve Bayesian Classifier predict
the test sample to be?

Naïve Bayesian Classifiers David O. Johnson EECS 658 (Fall 2024) 37
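For checking your by-hand answers after class, the same train-on-149 / classify-one setup can be sketched with scikit-learn (an illustrative sketch, not part of the assignment; load_iris and GaussianNB are real scikit-learn APIs, and the last row of the bundled iris data is the 5.9 / 3.0 / 5.1 / 1.8 Iris-virginica sample given above):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

test_idx = 149                            # the 5.9, 3.0, 5.1, 1.8 Iris-virginica sample
train = np.arange(len(y)) != test_idx     # keep the other 149 samples for training

model = GaussianNB().fit(X[train], y[train])
print(model.predict(X[test_idx:test_idx + 1]))   # expected: [2] -> Iris-virginica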
