
SRI KRISHNA COLLEGE OF ENGINEERING AND TECHNOLOGY

DEPARTMENT OF M.Tech. CSE

21CSI501 DATA WAREHOUSING AND MINING

MODULE 2

2.3 BAYES CLASSIFICATION


Topics covered

• Bayes' Theorem
• Naive Bayesian Classification
• Predicting a class label using naive Bayesian classification


Bayesian Classification - introduction
• A statistical classifier: performs probabilistic prediction, i.e.,
predicts class membership probabilities
• Foundation: Based on Bayes’ Theorem.
• Performance: A simple Bayesian classifier, the naive Bayesian
classifier, has performance comparable to decision tree and
selected neural network classifiers
• Assumption: Effect of an attribute value on a given class
is independent of the values of the other attributes. This
assumption is called class conditional independence.
Bayesian Theorem: Basics
• Let X be a data sample (“evidence”)
• Let H be a hypothesis that X belongs to class C
• P(H|X) - posterior probability
• Classification is to determine P(H|X), the probability that the
hypothesis holds given the observed data sample X
• E.g., X is a 35-year-old customer with an income of $40,000.
Suppose that H is the hypothesis that our customer will buy
a computer. Then P(H|X) reflects the probability that
customer X will buy a computer given that we know the
customer’s age and income.
• P(H) (prior probability), the initial probability of H
– E.g. probability that any given customer will buy a
computer, regardless of age, income, …
Bayesian Theorem: Basics
• P(X|H) - the posterior probability of X conditioned on H (the
likelihood): the probability that a customer X is 35 years old and
earns $40,000, given that we know the customer will buy a
computer.
• P(X) is the prior probability of X. E.g., it is the probability
that a person from our set of customers is 35 years old and
earns $40,000.
Bayesian Theorem
• Given training data X, the posterior probability of a hypothesis
H, P(H|X), follows Bayes' theorem:

P(H|X) = P(X|H) P(H) / P(X)

• Informally, this can be written as

posterior = likelihood × prior / evidence

• The classifier predicts that X belongs to Ci iff P(Ci|X) is the
highest among all the P(Ck|X) for the k classes
• Practical difficulty: requires initial knowledge of many
probabilities and has significant computational cost
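A minimal Python sketch of Bayes' theorem as stated above; the numeric values are hypothetical, not from the slides:

# Bayes' theorem: posterior = likelihood * prior / evidence
def posterior(likelihood, prior, evidence):
    return likelihood * prior / evidence

# hypothetical values: P(X|H) = 0.4, P(H) = 0.5, P(X) = 0.25
print(posterior(0.4, 0.5, 0.25))   # P(H|X) = 0.8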
Towards Naive Bayesian Classifier
• Let D be a training set of tuples and their associated class
labels, and each tuple is represented by an n-D attribute
vector X = (x1, x2, …, xn)
• Suppose there are m classes C1, C2, …, Cm.
• Classification is to derive the maximum posterior, i.e., the
maximal P(Ci|X)
• This can be derived from Bayes' theorem:

P(Ci|X) = P(X|Ci) P(Ci) / P(X)

• Since P(X) is constant for all classes, only

P(X|Ci) P(Ci)

needs to be maximized
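A compact Python sketch of this decision rule; the names classify, prior, and likelihood are illustrative, not from the slides:

# pick the class Ci that maximizes P(X|Ci) * P(Ci); P(X) can be ignored
def classify(x, classes, prior, likelihood):
    return max(classes, key=lambda c: likelihood(x, c) * prior[c])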
Derivation of Naive Bayes Classifier
• Assumption - attributes are conditionally independent (i.e.,
no dependence relation between attributes):

P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)

• This reduces the computation cost: only the class distribution
has to be counted
• If Ak is categorical, P(xk|Ci) is the no. of tuples in Ci having
value xk for Ak divided by |Ci,D| (the no. of tuples of Ci in D)
• If Ak is continuous-valued, P(xk|Ci) is usually computed from a
Gaussian distribution with mean μ and standard deviation σ,

g(x, μ, σ) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²))

and P(xk|Ci) = g(xk, μCi, σCi)
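A short Python sketch of both estimates, assuming the attribute values of class Ci are collected in a list; the helper names categorical_prob and gaussian_prob are illustrative:

import math

def categorical_prob(xk, class_values):
    # P(xk|Ci): fraction of tuples in Ci that take value xk for attribute Ak
    return class_values.count(xk) / len(class_values)

def gaussian_prob(xk, class_values):
    # P(xk|Ci): Gaussian density fitted to the attribute values of class Ci
    mu = sum(class_values) / len(class_values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in class_values) / len(class_values))
    return math.exp(-((xk - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)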
Naïve Bayesian Classifier: Training Dataset

Class:
C1:buys_computer = ‘yes’
C2:buys_computer = ‘no’

Data sample:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
Naïve Bayesian Classifier: Training Dataset
The 14 training tuples used in the example that follows:

age     income   student  credit_rating  buys_computer
<=30    high     no       fair           no
<=30    high     no       excellent      no
31…40   high     no       fair           yes
>40     medium   no       fair           yes
>40     low      yes      fair           yes
>40     low      yes      excellent      no
31…40   low      yes      excellent      yes
<=30    medium   no       fair           no
<=30    low      yes      fair           yes
>40     medium   yes      fair           yes
<=30    medium   yes      excellent      yes
31…40   medium   no       excellent      yes
31…40   high     yes      fair           yes
>40     medium   no       excellent      no
Naïve Bayesian Classifier: An Example
• P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14= 0.357
• Compute P(X|Ci) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
X = (age <= 30 , income = medium, student = yes, credit_rating = fair)
P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”)
= 0.044 * 0.643 = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”)
= 0.019 * 0.357 = 0.007
Therefore, X belongs to class (“buys_computer = yes”)
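The same calculation can be checked in a few lines of Python, using the probabilities listed above:

# P(Ci) * P(X|Ci) for each class, with the conditional probabilities from the example
p_yes = 0.643 * 0.222 * 0.444 * 0.667 * 0.667   # buys_computer = "yes"
p_no  = 0.357 * 0.600 * 0.400 * 0.200 * 0.400   # buys_computer = "no"
print(round(p_yes, 3), round(p_no, 3))           # 0.028 0.007 -> predict "yes"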
Avoiding the 0-Probability Problem
• Naïve Bayesian prediction requires each conditional probability to
be non-zero; otherwise the predicted probability becomes zero, since
P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
• Ex. Suppose a dataset with 1000 tuples of a class has income = low
(0 tuples), income = medium (990 tuples), and income = high (10 tuples)
• Use the Laplacian correction (or Laplacian estimator)
– Add 1 to each case:
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
– The “corrected” probability estimates are close to their
“uncorrected” counterparts
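A small Python sketch of the correction, reproducing the counts from this example (the variable names are illustrative):

counts = {"low": 0, "medium": 990, "high": 10}          # income counts in a class of 1000 tuples
corrected_total = sum(counts.values()) + len(counts)    # 1000 + 3 = 1003
for value, count in counts.items():
    print(value, (count + 1) / corrected_total)         # 1/1003, 991/1003, 11/1003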
Naïve Bayesian Classifier: Comments
• Advantages
– Easy to implement
– Good results obtained in most cases
• Disadvantages
– Assumption: class conditional independence, therefore
loss of accuracy
– Practically, dependencies exist among variables
• E.g., hospital patients: profile (age, family history, etc.),
symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
• Dependencies among these cannot be modeled by the Naïve
Bayesian Classifier
• How to deal with these dependencies?
– Bayesian Belief Networks
