
Email Spam Classification

The document proposes an automated email classifier that uses the Gaussian Naive Bayes algorithm to classify emails as spam or not spam. It covers preprocessing the email dataset, training the Gaussian NB model, evaluating the model (accuracy was over 90%), and demonstrating how the trained model classifies new emails.

Uploaded by

Hamas Tech

Spam Email Classifier

By Hamas ur Rehman
Problem Statement
Spamming is one of the most common occurrences on the Internet and accounts for a large number of attacks.
• Unsolicited commercial / bulk e-mail, also known as spam, has become a major problem on the Internet.
• According to recent figures, 40% of all mail is spam, and it costs Internet users about $355 million per year.
Proposed Solution
An Automated Email Classifier
Introduction
This is a classification problem, and we will be using the Gaussian NB algorithm to train our model.
Execution Plan
We have proposed the following technique in order to classify emails:
Email collections → Pre-Process Data → GaussianNB training → Test the metrics of our model → Test email → Classify email → Spam / Not Spam
DATASET
• The dataset was taken from Kaggle: https://www.kaggle.com/datasets/nitishabharathi/email-spam-dataset
• Two primary columns: body and label. 0 means ham and 1 means spam.
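As a rough sketch of how the two columns could be read, assuming the dataset is a CSV file whose header row contains `body` and `label` (the helper name `load_emails` and the two sample rows are hypothetical, not taken from the Kaggle file):

```python
import csv
import io

def load_emails(handle):
    """Read the 'body' and 'label' columns from an open CSV file object."""
    bodies, labels = [], []
    for row in csv.DictReader(handle):
        bodies.append(row["body"])
        labels.append(int(row["label"]))  # 0 = ham, 1 = spam
    return bodies, labels

# Hypothetical in-memory sample standing in for the real dataset file.
sample = io.StringIO('body,label\n"Win a free prize now",1\n"Lunch at noon?",0\n')
bodies, labels = load_emails(sample)
```

With the real dataset, the same function could be given a file opened with `open(path, newline="", encoding="utf-8")`.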
How was the data cleaned?
We cleaned the data using the NLTK library for Python and vanilla Python functions.
• Performed word tokenization
• Used lemmatization
• Vectorized our data using the bag-of-words method
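The tokenization and bag-of-words steps can be sketched in plain Python. This is a dependency-free illustration, not the project's code: the regex tokenizer stands in for NLTK's tokenizer, lemmatization is left out, and the function names and sample emails are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (regex stand-in for NLTK)."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}

def bag_of_words(text, vocabulary):
    """Return a word-count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector

# Hypothetical sample emails.
emails = ["Win a free prize now", "Meeting notes for the free lunch"]
vocab = build_vocabulary(emails)
vectors = [bag_of_words(e, vocab) for e in emails]
```

Each email becomes a fixed-length vector of word counts, which is the numeric form a classifier such as Gaussian NB expects.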
Algorithm
We tested different classification algorithms, and GaussianNB gave the best results on the test data.
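To show what Gaussian Naive Bayes does internally, here is a minimal from-scratch sketch. It is not the implementation used in the project: the class name `TinyGaussianNB`, the `var_smoothing` default, and the toy count vectors are assumptions made for illustration.

```python
import math

class TinyGaussianNB:
    """Minimal Gaussian Naive Bayes: one normal distribution per (class, feature)."""

    def __init__(self, var_smoothing=1e-9):
        # Small constant added to every variance to avoid division by zero.
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.stats_ = {}
        n_features = len(X[0])
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
            variances = [
                sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + self.var_smoothing
                for j in range(n_features)
            ]
            log_prior = math.log(len(rows) / len(X))
            self.stats_[c] = (log_prior, means, variances)
        return self

    def _log_posterior(self, x, c):
        # Log prior plus the sum of per-feature Gaussian log-likelihoods.
        log_prior, means, variances = self.stats_[c]
        total = log_prior
        for value, mu, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [max(self.classes_, key=lambda c: self._log_posterior(x, c)) for x in X]

# Toy count vectors (hypothetical): label 1 = spam, label 0 = ham.
X_train = [[2, 0, 1], [3, 0, 0], [0, 2, 1], [0, 3, 0]]
y_train = [1, 1, 0, 0]
model = TinyGaussianNB().fit(X_train, y_train)
predictions = model.predict([[2, 0, 0], [0, 2, 0]])
```

The "naive" part is the assumption that features are independent given the class, which is why the per-feature log-likelihoods are simply summed.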
Model Evaluation
After training and finding the best parameters, we achieved 90.07% accuracy on our test data.
Classification Report and Confusion matrix
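The numbers in a classification report follow directly from the confusion matrix. The sketch below uses made-up labels purely to show the arithmetic; they are not the project's results, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp), treating 1 (spam) as the positive class."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def report(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for the spam class."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
metrics = report(y_true, y_pred)
```

For a spam filter, false positives (ham flagged as spam) are usually the costlier error, so precision on the spam class is worth reading alongside accuracy.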
Target Audience
Internet Service Providers (ISPs) use spam filters to ensure they do not deliver corrupt incoming emails or links to the receiver. Spam amounts to about 45% of emails.
Demonstration
On the left you can see how
this model works. You can also
try it out by scanning the QR
code down below
Advantages
• It is very effective and also adaptive, so it is hard to fool
• Phenomenally accurate
• Adapts to changing spam
• It protects you
GitHub Repo
On the left you can scan the
QR code to go to the GitHub
repository of this project
Q&A
