
Classification in Machine Learning
Lecture 07
Discussion

Establishing the prayer means fixing it where it needs repair and maintaining it once you do. The prayer is our reminder that we'll all stand before Allah on Judgment Day.
Agenda:
• A Quick Recap (Important Concepts)
• Naïve Bayes Classifier
• Principle of Naïve Bayes
• Bayes' Theorem
• Why Bayes Classification
• Example
• Advantages and Disadvantages
• Conclusion

What is a Classifier?

A classifier is a machine learning model that is used to discriminate between different objects based on certain features.

What is a Naïve Bayes Classifier?

Naive Bayes is a supervised learning algorithm used for classification tasks. Hence, it is also called the Naïve Bayes Classifier.

Principle of the Naive Bayes Classifier:
A Naive Bayes classifier is a probabilistic machine learning model that is used for classification tasks. The crux of the classifier is based on Bayes' theorem.

Why Naïve?

• Using Bayes' theorem, we can find the probability of A happening, given that B has occurred.
• Here, B is the evidence and A is the hypothesis.
• The assumption made here is that the predictors/features are independent.
• That is, the presence of one particular feature does not affect the others. Hence it is called naïve.
A Quick Recap:
• Probability is the likelihood of an event occurring and always takes a value between 0 and 1 (0 and 1 inclusive).
• Conditional probability is the likelihood of an event A occurring given that another event B, related to event A, has already occurred.
• The probability of event A given that event B has occurred is denoted as p(A|B).
• Joint probability is the probability of two events occurring together and is denoted as p(A and B). It can be written as:
• p(A and B) = p(A).p(B) ……… (1) Independent events
• p(A and B) = p(A).p(B|A) ……… (2) Dependent events
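A tiny Python illustration of equations (1) and (2) with made-up probabilities (not part of the original slides):

```python
# Hypothetical probabilities for two events A and B
p_a = 0.5          # p(A)
p_b = 0.4          # p(B)
p_b_given_a = 0.6  # p(B|A)

p_joint_independent = p_a * p_b          # equation (1): 0.20
p_joint_dependent   = p_a * p_b_given_a  # equation (2): 0.30

print(p_joint_independent, p_joint_dependent)
```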

Bayes’ Theorem
We will start with the fact that joint probability is commutative for
any two events. That is:
p(A and B) = p(B and A) ……… (3)
From equation 2, we know that:
p(A and B) = p(A).p(B|A)
p(B and A) = p(B).p(A|B)
We can rewrite equation 3 as:
p(A).p(B|A) = p(B).p(A|B)
Dividing both sides by p(B) gives us Bayes' Theorem:
p(A|B) = p(A).p(B|A) / p(B)
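As a minimal illustration (not from the slides), the theorem can be evaluated directly from the three quantities on its right-hand side; the numbers used here are made up:

```python
def bayes_posterior(p_a, p_b_given_a, p_b):
    """Bayes' theorem: p(A|B) = p(A) * p(B|A) / p(B)."""
    return p_a * p_b_given_a / p_b

# Hypothetical values: p(A) = 0.3, p(B|A) = 0.8, p(B) = 0.5
print(bayes_posterior(0.3, 0.8, 0.5))  # 0.48
```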
Example:
• Let us take an example to build some intuition.
• Consider the problem of playing golf.
Example:
• We classify whether the day is suitable for playing golf, given the features of the day.
• If we take the first row of the dataset, we can observe that the day is not suitable for playing golf if outlook: rainy, temperature: hot, humidity: high and windy: false.
• Assumption I: we consider that these predictors are independent.
  - If the temperature is hot, it does not necessarily mean that the humidity is high.
• Assumption II: all the predictors have an equal effect on the outcome.
  - The day being windy does not carry more importance than the other features in deciding whether to play golf.
Example:
• According to this example, Bayes' theorem can be rewritten as:

  P(y|X) = P(X|y).P(y) / P(X)

• The variable y is the class variable (play golf), which represents whether it is suitable to play golf or not given the conditions. The variable X represents the parameters/features.
• X is given as:

  X = (x_1, x_2, …, x_n)

• Here x_1, x_2, …, x_n represent the features, i.e. they can be mapped to outlook, temperature, humidity and windy. By substituting for X and expanding using the chain rule we get:

  P(y | x_1, …, x_n) = P(x_1|y).P(x_2|y)…P(x_n|y).P(y) / (P(x_1).P(x_2)…P(x_n))
Example:
• Now, you can obtain the values for each term by looking at the dataset and substituting them into the equation.
• For all entries in the dataset, the denominator does not change; it remains constant. Therefore, the denominator can be removed and a proportionality introduced:

  P(y | x_1, …, x_n) ∝ P(y).P(x_1|y).P(x_2|y)…P(x_n|y)

• In our case, the class variable (y) has only two outcomes, yes or no. There could be cases where the classification is multiclass. Therefore, we need to find the class y with the maximum probability:

  y = argmax_y P(y).∏ P(x_i|y)

• Using the above function, we can obtain the class, given the predictors.
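To make the proportional form above concrete, here is a minimal Python sketch (not part of the original slides) of a categorical Naive Bayes classifier that estimates P(y) and P(x_i|y) from counts and returns the class with the highest score. The row data and the helper names (train_naive_bayes, predict) are hypothetical placeholders, and zero counts are not smoothed here (see the zero-probability slide later).

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows):
    """rows: list of (feature_dict, label) pairs.
    Returns class counts and per-feature value counts per class."""
    class_counts = Counter(label for _, label in rows)
    # feature_counts[feature_name][label][value] = number of matching rows
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in rows:
        for name, value in features.items():
            feature_counts[name][label][value] += 1
    return class_counts, feature_counts

def predict(features, class_counts, feature_counts):
    """Return the class y maximising P(y) * prod_i P(x_i | y)."""
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total                                        # prior P(y)
        for name, value in features.items():
            score *= feature_counts[name][label][value] / count      # likelihood P(x_i | y)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical rows in the spirit of the golf example (placeholder data, not the full dataset)
rows = [
    ({"outlook": "rainy", "temperature": "hot", "humidity": "high", "windy": False}, "no"),
    ({"outlook": "sunny", "temperature": "mild", "humidity": "normal", "windy": False}, "yes"),
    ({"outlook": "overcast", "temperature": "hot", "humidity": "high", "windy": True}, "yes"),
]
class_counts, feature_counts = train_naive_bayes(rows)
query = {"outlook": "sunny", "temperature": "mild", "humidity": "normal", "windy": False}
print(predict(query, class_counts, feature_counts))  # "yes" for this toy data
```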
Bayesian Classification: Why?
• A statistical classifier: performs probabilistic prediction, i.e.,
predicts class membership probabilities
• Foundation: Based on Bayes’ Theorem.
• Performance: A simple Bayesian classifier, naïve Bayesian classifier,
has comparable performance with decision tree and selected neural
network classifiers
• Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is correct — prior
knowledge can be combined with observed data
• Standard: Even when Bayesian methods are computationally
intractable, they can provide a standard of optimal decision making
against which other methods can be measured

Naïve Bayes Classifier: Training Dataset

age          income   student   credit_rating   buys_computer
youth        high     no        fair            no
youth        high     no        excellent       no
middle-aged  high     no        fair            yes
senior       medium   no        fair            yes
senior       low      yes       fair            yes
senior       low      yes       excellent       no
middle-aged  low      yes       excellent       yes
youth        medium   no        fair            no
youth        low      yes       fair            yes
senior       medium   yes       fair            yes
youth        medium   yes       excellent       yes
middle-aged  medium   no        excellent       yes
middle-aged  high     yes       fair            yes
senior       medium   no        excellent       no

Naïve Bayes Classifier: An Example
• Compute the class priors P(Ci):
  P(buys_computer = "yes") = 9/14 = 0.643
  P(buys_computer = "no")  = 5/14 = 0.357
• Compute P(X|Ci) for each class:
  P(age = "youth" | buys_computer = "yes") = 2/9 = 0.222
  P(age = "youth" | buys_computer = "no")  = 3/5 = 0.600
  P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
  P(income = "medium" | buys_computer = "no")  = 2/5 = 0.400
  P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
  P(student = "yes" | buys_computer = "no")  = 1/5 = 0.200
  P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
  P(credit_rating = "fair" | buys_computer = "no")  = 2/5 = 0.400
• Classify X = (age = youth, income = medium, student = yes, credit_rating = fair):
  P(X | buys_computer = "yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
  P(X | buys_computer = "no")  = 0.600 x 0.400 x 0.200 x 0.400 = 0.019
  P(X | buys_computer = "yes") x P(buys_computer = "yes") = 0.044 x 0.643 = 0.028
  P(X | buys_computer = "no")  x P(buys_computer = "no")  = 0.019 x 0.357 = 0.007
Therefore, X belongs to the class buys_computer = "yes".
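As a quick arithmetic check (a sketch, not part of the slides), the same figures can be recomputed directly in Python:

```python
# Conditional probabilities read off the worked example above
p_x_given_yes = 0.222 * 0.444 * 0.667 * 0.667   # ≈ 0.044
p_x_given_no  = 0.600 * 0.400 * 0.200 * 0.400   # ≈ 0.019

# Multiply by the class priors P(Ci)
score_yes = p_x_given_yes * 0.643                # ≈ 0.028
score_no  = p_x_given_no  * 0.357                # ≈ 0.007

print("buys_computer = yes" if score_yes > score_no else "buys_computer = no")
```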

Avoiding the Zero-Probability Problem
• If an attribute value never occurs with a given class in the training data, its estimated conditional probability is zero, and the whole product P(X|Ci) collapses to zero.
• The usual remedy is the Laplacian correction (add-one smoothing): add 1 to each count so that no estimated probability is exactly zero.
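A minimal sketch of the Laplacian correction (the function name and counts below are hypothetical, not from the slides):

```python
def smoothed_likelihood(value_count, class_count, n_values):
    """Laplacian (add-one) correction: add 1 to the value count and
    the number of possible attribute values to the class count."""
    return (value_count + 1) / (class_count + n_values)

# Hypothetical case: a value that never co-occurs with the class
# (0 out of 1000 class examples, attribute with 3 possible values)
print(smoothed_likelihood(0, 1000, 3))   # ≈ 0.000997 instead of 0.0
```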
Advantages and Disadvantages:
Conclusion:
• Naive Bayes algorithms are mostly used in sentiment analysis, spam filtering, recommendation systems, etc.
• They are fast and easy to implement, but their biggest disadvantage is the requirement that the predictors be independent.
• In most real-life cases the predictors are dependent, which hinders the performance of the classifier.

Thank you
Any Questions?

