Naive Bayes Classifiers - Part A

Naïve Bayes classifiers are a type of probabilistic model that use Bayes' theorem with strong independence assumptions. They are useful for text classification problems where the features are the counts of words. An example is classifying news articles using a multinomial Naïve Bayes classifier with a pipeline of TF-IDF feature extraction and the model. The model is trained on document counts, predicts labels for test data, and a confusion matrix is created to evaluate performance.


Naïve Bayes Classifiers

Machine Learning Models - Types


Probabilistic Models
• Use probability theory, random variables, probability distributions

• Assume that there is some underlying random process that generates the
values for variables, according to a well-defined but unknown probability
distribution

• Use data to find out more about the probability distribution

• Example: Naïve Bayes Classifiers, Gaussian Mixture Model (GMM)


Probabilistic Models
Probabilistic Models – Basic Terminology

Let X denote the variables we know about, e.g., the instance's feature values.

Let Y denote the target variables we are interested in, e.g., the instance's class.

The key question is how to model the relationship between X and Y.

Since X is known for a particular instance but Y may not be, we are particularly
interested in the conditional probabilities P(Y|X).

Bayes’ Rule
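In the notation used on these slides, Bayes' rule relates the two conditional probabilities:

P(Y|X) = P(X|Y) P(Y) / P(X)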

For instance, Y could indicate whether the e-mail is spam, and X could indicate
whether the e-mail contains the words ‘Money’ and ‘lottery’.
A Loan Application Dataset

(The slide shows a small loan-application table; the task is to predict the target class
for a new applicant whose attribute values are: old, true, true, fair, ?.)

Probabilistic Models
Understanding Probabilistic Model

Money  Lottery  P(Y=spam | Money, Lottery)  P(Y=ham | Money, Lottery)
  0       0               0.31                        0.69
  0       1               0.65                        0.35
  1       0               0.80                        0.20
  1       1               0.40                        0.60
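As an illustration (not on the original slide), the table above can be used directly as a
MAP classifier in a few lines of Python: look up the row matching the e-mail's
(Money, Lottery) indicators and predict the class with the larger posterior probability.

# Minimal sketch: MAP classification from the posterior table above.
posterior = {
    # (money, lottery): (P(spam | evidence), P(ham | evidence))
    (0, 0): (0.31, 0.69),
    (0, 1): (0.65, 0.35),
    (1, 0): (0.80, 0.20),
    (1, 1): (0.40, 0.60),
}

def classify(money, lottery):
    p_spam, p_ham = posterior[(money, lottery)]
    # MAP decision rule: pick the class with the higher posterior probability
    return "spam" if p_spam > p_ham else "ham"

print(classify(1, 0))  # 'spam' (0.80 > 0.20)
print(classify(1, 1))  # 'ham'  (0.40 < 0.60)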
Bayes Theorem
Bayes' theorem provides a way to calculate the probability of a hypothesis given our
prior knowledge.

Prior Probability is the probability of an event before new data is collected, i.e.,
P(spam) is the probability that a mail is spam before any new mail is seen.
Marginal Likelihood, also called the evidence, is the probability of the evidence event
occurring, i.e., P(money) is the probability that a mail includes "money" in its text.
Likelihood is the probability of the evidence given that the event is true, i.e.,
P(money|spam) is the probability that a mail includes "money" given that the mail is spam.
Posterior Probability is the probability of an outcome after the evidence has been
incorporated, i.e., P(spam|money) is the probability that a mail is spam given that the
mail includes "money" in its text.
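To make the terminology concrete, here is a worked example with illustrative numbers
(these values are not from the slides): suppose P(spam) = 0.4, P(money|spam) = 0.3 and
P(money|ham) = 0.05. Then

P(money) = P(money|spam) P(spam) + P(money|ham) P(ham) = 0.3 × 0.4 + 0.05 × 0.6 = 0.15
P(spam|money) = P(money|spam) P(spam) / P(money) = 0.12 / 0.15 = 0.8

Seeing the word "money" raises the probability that the mail is spam from the prior 0.4
to a posterior of 0.8.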
Probabilistic Models
Bayes’ Rule

• P(Y|X) is the posterior probability because it is used after the features X are
observed.
• P(Y) is the prior probability, which in the case of classification tells how likely
each of the classes is a priori, i.e., before we have observed the data X.
• P(X) is the probability of the data, which is independent of Y and in most cases
can be ignored.
• P(X|Y) is the likelihood function.
• Posterior probabilities and likelihoods can be easily transformed one into the
other using Bayes’ rule.
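Applying Bayes' rule to three evidence variables X1, X2 and X3 (a reconstruction of the
equation shown on the slide, in the same notation):

P(Y | X1, X2, X3) = P(X1, X2, X3 | Y) P(Y) / P(X1, X2, X3)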
The equation above shows only the case where we have three evidence variables, and even
with only three of them it is not easy to find training examples that exactly match the
observed combination of values.
Probabilistic Models
• Maximum a posteriori (MAP) decision rule
MAP selects the hypothesis with the highest posterior probability: after calculating the
posterior probability for several hypotheses, we select the hypothesis with the highest
probability.
Example: If P(spam|money) > P(not spam|money), then the mail can be classified as spam.
This is the most probable hypothesis.

• Maximum likelihood (ML) decision rule
ML selects the hypothesis under which the observed evidence is most likely, i.e., the one
with the highest likelihood P(X|Y). It coincides with MAP when the prior P(Y) is uniform
over the hypotheses.
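In the notation above, the two decision rules can be written as:

MAP: predict the y that maximizes P(X|y) P(y)
ML:  predict the y that maximizes P(X|y)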


Probabilistic Models - Example
Suppose that we have the training data set shown in the figure, which has two attributes
A and B, and the class C. We can compute all the probability values required to learn a
naïve Bayesian classifier.
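The figure itself is not reproduced in this text, but the computation it describes can be
sketched with a small, made-up dataset: learning a naïve Bayesian classifier amounts to
estimating the class priors P(C=c) and the per-attribute conditionals P(A=a|C=c) and
P(B=b|C=c) by counting.

from collections import Counter, defaultdict

# Hypothetical training data (invented for illustration): (A, B, C) triples
data = [
    ("t", "t", "yes"), ("t", "f", "yes"), ("f", "t", "yes"),
    ("t", "t", "no"),  ("f", "f", "no"),  ("f", "t", "no"), ("f", "f", "no"),
]

# Class priors P(C=c)
class_counts = Counter(c for _, _, c in data)
priors = {c: n / len(data) for c, n in class_counts.items()}

# Per-attribute conditionals P(A=a | C=c) and P(B=b | C=c)
cond_a, cond_b = defaultdict(Counter), defaultdict(Counter)
for a, b, c in data:
    cond_a[c][a] += 1
    cond_b[c][b] += 1

def p_a(a, c): return cond_a[c][a] / class_counts[c]
def p_b(b, c): return cond_b[c][b] / class_counts[c]

# Naive Bayes score for a new instance (A="t", B="f"), compared across classes (MAP rule)
for c in priors:
    print(c, round(priors[c] * p_a("t", c) * p_b("f", c), 3))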
Naïve Bayes Classifier
Key points

• Tend to be faster in training than linear classifiers

• Efficient, as they learn parameters by looking at each feature individually and
collect simple per-class statistics from each feature

• Generalization performance is slightly worse than that of linear classifiers such as
LogisticRegression and LinearSVC

• scikit-learn implements three kinds of naïve Bayes classifiers (a small sketch of how
each maps to its data type follows below):

• GaussianNB (for continuous data)
• BernoulliNB (for binary data)
• MultinomialNB (for count data – e.g. how often a word appears in a document)
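A minimal, illustrative sketch (not part of the slides) of how each variant matches its
data type; the toy arrays below are made up:

import numpy as np
from sklearn.naive_bayes import GaussianNB, BernoulliNB, MultinomialNB

y = np.array([0, 0, 1, 1])  # toy labels for two classes

# Continuous features -> GaussianNB
X_cont = np.array([[1.2, 0.7], [0.9, 1.1], [3.5, 2.8], [3.1, 3.3]])
print(GaussianNB().fit(X_cont, y).predict([[3.0, 3.0]]))  # expected: class 1

# Binary (presence/absence) features -> BernoulliNB
X_bin = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
print(BernoulliNB().fit(X_bin, y).predict([[0, 1]]))      # expected: class 1

# Non-negative count features (e.g. word counts) -> MultinomialNB
X_cnt = np.array([[3, 0], [4, 1], [0, 5], [1, 6]])
print(MultinomialNB().fit(X_cnt, y).predict([[0, 4]]))    # expected: class 1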
Naïve Bayes Classifier
Key points

• GaussianNB is mostly used on very high-dimensional data

• The other two variants are used for sparse count data

• MultinomialNB usually performs better than BernoulliNB

• They share many of the strengths and weaknesses of the linear models

• Very fast to train and to predict

• Naive Bayes models are great baseline models and are often used on very large
datasets, where training even a linear model might take too long
Machine Learning using Naïve Bayes (GaussianNB)

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics
from sklearn.metrics import classification_report, confusion_matrix

iris = load_iris()
# create X (features) and y (response)
X = iris.data
y = iris.target
print(X.shape, y.shape)  # (150, 4) (150,)

# split into train & test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=15)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)

# define and fit the model
model = GaussianNB()
model.fit(X_train, y_train)

# make a probabilistic prediction
yhat_prob = model.predict_proba(X_test)
# make a classification prediction
y_pred = model.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))  # 1.0

# plot the confusion matrix
cm = confusion_matrix(y_test, y_pred)
fig, ax = plt.subplots(figsize=(8, 8))
ax.imshow(cm)
ax.grid(False)
ax.xaxis.set(ticks=(0, 1, 2), ticklabels=('Predicted 0s', 'Predicted 1s', 'Predicted 2s'))
ax.yaxis.set(ticks=(0, 1, 2), ticklabels=('Actual 0s', 'Actual 1s', 'Actual 2s'))
ax.set_ylim(2.5, -0.5)
for i in range(3):
    for j in range(3):
        ax.text(j, i, cm[i, j], ha='center', va='center', color='red')
plt.show()

print(classification_report(y_test, y_pred))
Machine Learning using Naïve Bayes (GaussianNB)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         8
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00         9

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
Naive Bayes classifier for multinomial models
The multinomial Naive Bayes classifier is suitable for classification with discrete
features, such as word counts for text classification. It nominally requires integer
feature counts (e.g. a bag-of-words representation), although in practice fractional
counts such as tf-idf values also work.
For this example we use the "Twenty Newsgroups" dataset, a collection of approximately
20,000 newsgroup documents partitioned evenly across 20 different newsgroups.

# import the dataset
from sklearn.datasets import fetch_20newsgroups

data = fetch_20newsgroups()
print(data.target_names)

# selected categories
categories = ['talk.politics.misc', 'talk.religion.misc', 'sci.med', 'sci.space', 'rec.autos']

# create train and test datasets
train = fetch_20newsgroups(subset='train', categories=categories)
test = fetch_20newsgroups(subset='test', categories=categories)
Naive Bayes classifier for multinomial models
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# create a pipeline: TF-IDF feature extraction followed by multinomial naive Bayes
model = make_pipeline(TfidfVectorizer(), MultinomialNB(alpha=1))

# fit the model with the training set
model.fit(train.data, train.target)

# predict labels for the test set
labels = model.predict(test.data)

from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# create and plot the confusion matrix (normalized over the true labels)
conf_mat = confusion_matrix(test.target, labels, normalize="true")
sns.heatmap(conf_mat.T, annot=True, fmt=".0%", cmap="cividis",
            xticklabels=train.target_names, yticklabels=train.target_names)
plt.xlabel("True label")
plt.ylabel("Predicted label")
plt.show()
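As a quick follow-up (not on the original slides), the fitted pipeline can classify raw
text directly; the example sentence below is made up:

# Predict the category of a new piece of text with the fitted pipeline
pred = model.predict(["The rocket launch was delayed because of an engine problem"])
print(train.target_names[pred[0]])  # expected to be 'sci.space' or similar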
Naive Bayes classifier for multinomial models

(The slide shows the resulting confusion-matrix heatmap for the five selected categories.)