Naive Bayes Classifiers - Part A
• Assume that there is some underlying random process that generates the values of the variables according to a well-defined but unknown probability distribution.
• Since X is known for a particular instance but Y may not be, we are particularly interested in the conditional probabilities P(Y|X).
Bayes’ Rule
For instance, Y could indicate whether the e-mail is spam, and X could indicate
whether the e-mail contains the words ‘Money’ and ‘lottery’.
'Money'   'lottery'   P(Y = spam | X)   P(Y = not spam | X)
   0          0             0.31               0.69
   0          1             0.65               0.35
   1          0             0.80               0.20
   1          1             0.40               0.60
A Loan Application Dataset
Bayes Theorem
Bayes' theorem provides a way to calculate the probability of a hypothesis given our prior knowledge.
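Written in the spam/"money" notation used in the definitions below, the theorem reads:

P(spam | money) = P(money | spam) · P(spam) / P(money)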
Prior Probability is the probability of an event before new data is collected, e.g. P(spam) is the probability that a mail is spam before any new mail is seen.
Marginal Likelihood, also called the evidence, is the probability of the evidence event occurring, e.g. P(money) is the probability that a mail includes "money" in its text.
Likelihood is the probability of the evidence given that the event is true, e.g. P(money|spam) is the probability that a mail includes "money" given that the mail is spam.
Posterior Probability is the probability of an outcome after the evidence has been incorporated, e.g. P(spam|money) is the probability that a mail is spam given that it includes "money" in its text.
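As a small worked sketch of how these four quantities combine (the numbers below are made up for illustration and are not taken from the slides):

# hypothetical probabilities, chosen only to illustrate Bayes' theorem
p_spam = 0.3                  # prior: P(spam)
p_money_given_spam = 0.6      # likelihood: P(money | spam)
p_money_given_not_spam = 0.1  # P(money | not spam)

# marginal likelihood (evidence) via the law of total probability
p_money = p_money_given_spam * p_spam + p_money_given_not_spam * (1 - p_spam)

# posterior via Bayes' theorem
p_spam_given_money = p_money_given_spam * p_spam / p_money
print(p_money, p_spam_given_money)  # ≈ 0.25 and ≈ 0.72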
Probabilistic Models
Bayes’ Rule
• P(Y|X) is the posterior probability because it is used after the features X are
observed.
• P(Y) is the prior probability, which in the case of classification tells how likely
each of the classes is a priori, i.e., before we have observed the data X.
• P(X) is the probability of the data, which is independent of Y and in most cases
can be ignored.
• P(X|Y) is the likelihood function.
• Posterior probabilities and likelihoods can easily be transformed into one another using Bayes' rule.
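Written out for three evidence variables X1, X2 and X3, in the same notation, Bayes' rule becomes:

P(Y | X1, X2, X3) = P(X1, X2, X3 | Y) · P(Y) / P(X1, X2, X3)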
The above equation shows only the case where we have 3 evidence variables, and even with only 3 of them it is not easy to find training instances that exactly match every combination of their values, so estimating the joint likelihood P(X1, X2, X3 | Y) directly quickly becomes impractical.
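This is where the "naive" assumption comes in: the evidence variables are treated as conditionally independent given the class, so the joint likelihood factorizes (shown here for the three-variable case; this step is standard Naive Bayes reasoning rather than something stated explicitly on this slide):

P(X1, X2, X3 | Y) ≈ P(X1 | Y) · P(X2 | Y) · P(X3 | Y)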
Probabilistic Models
• maximum a posteriori (MAP) decision rule
The maximum a posteriori (MAP) hypothesis is the one with the highest posterior probability: after calculating the posterior probability for several hypotheses, we select the hypothesis with the highest probability.
Example: If P(spam|money) > P(not spam|money), then the mail can be classified as spam. This is the most probable hypothesis.
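In symbols, and dropping P(X) because it does not depend on the class (as noted above), the MAP rule is:

y_MAP = argmax_y P(y | X) = argmax_y P(X | y) · P(y)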
• Naive Bayes models are great baseline models and are often used on very large datasets, where training even a linear model might take too long.
Machine Learning using Naïve Bayes (GaussianNB)
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn import metrics
from sklearn.metrics import classification_report, confusion_matrix

# create X (features) and y (response)
iris = load_iris()
X = iris.data
y = iris.target
print(X.shape, y.shape)  # (150, 4) (150,)

# split into train & test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=15)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)

# define and fit the model
model = GaussianNB()
model.fit(X_train, y_train)

# make a probabilistic prediction and a classification prediction
yhat_prob = model.predict_proba(X_test)
y_pred = model.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))  # 1.0

# plot the confusion matrix
cm = confusion_matrix(y_test, y_pred)
fig, ax = plt.subplots(figsize=(8, 8))
ax.imshow(cm)
ax.grid(False)
ax.xaxis.set(ticks=(0, 1, 2), ticklabels=('Predicted 0s', 'Predicted 1s', 'Predicted 2s'))
ax.yaxis.set(ticks=(0, 1, 2), ticklabels=('Actual 0s', 'Actual 1s', 'Actual 2s'))
ax.set_ylim(2.5, -0.5)
for i in range(3):
    for j in range(3):
        ax.text(j, i, cm[i, j], ha='center', va='center', color='red')
plt.show()

print(classification_report(y_test, y_pred))
Machine Learning using Naïve Bayes (GaussianNB)
precision recall f1-score support
0 1.00 1.00 1.00 8
1 1.00 1.00 1.00 13
2 1.00 1.00 1.00 9
accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30
Naive Bayes classifier for multinomial models
The multinomial Naive Bayes classifier is suitable for classification with discrete features, such as word counts for text classification. It normally requires integer feature counts, such as those produced by a bag-of-words representation of text, although in practice fractional counts such as tf-idf features also work.
For this example, we use the "Twenty Newsgroups" dataset, a collection of approximately 20,000 newsgroup documents partitioned evenly across 20 different newsgroups.
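The slides stop before the code for this example; below is a minimal sketch with scikit-learn, in the same style as the GaussianNB example above. The choice of categories and of a plain bag-of-words representation are assumptions made for the sketch, not taken from the slides.

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# restrict to a few categories to keep the example small (assumed choice)
categories = ['comp.graphics', 'rec.autos', 'sci.space', 'talk.politics.misc']
train = fetch_20newsgroups(subset='train', categories=categories)
test = fetch_20newsgroups(subset='test', categories=categories)

# bag-of-words counts, the kind of integer features the multinomial model expects
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

# fit the multinomial Naive Bayes model and evaluate on the test split
model = MultinomialNB()
model.fit(X_train, train.target)
y_pred = model.predict(X_test)
print(accuracy_score(test.target, y_pred))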