
46 - Ijme... Mech Engg..Research Paper-1

The document discusses the issue of email spam and the use of machine learning techniques for spam detection. It highlights various methodologies, including Naive Bayes and Neural Networks, achieving high accuracy rates in classifying emails as spam or ham. The research emphasizes the importance of developing effective spam detection systems due to the increasing volume of spam emails and the need for efficient filtering methods.


DOI : https://doi.org/10.56452/7-12-46

ISSN: 0974-5823 Vol. 7 No. 12 December, 2022


International Journal of Mechanical Engineering

Email Spam Detection using Machine Learning


Dr. Nilesh Jain
Associate Professor, Mandsaur University, e-mail: nileshjainmca@gmail.com

Dr. B. K. Sharma
Professor, Mandsaur University, Mandsaur, e-mail: bksharma7426@gmail.com

ABSTRACT- Spam emails are unsolicited commercial or deceptive emails sent to a specific person or company [5]. Spam can be detected through natural language processing and machine learning methodologies. Machine learning methods are commonly used in spam filtering: classifiers label each email as either ham (valid messages) or spam (unwanted messages). The proposed work showcases differentiating features of the content of documents [4]. A lot of work has been performed in the area of spam filtering, but it is limited to some domains. Research on spam email detection focuses either on natural language processing methodologies [25] applied to single machine learning algorithms, or on one natural language processing technique [22] applied to multiple machine learning algorithms [2]. In this project, a modeling pipeline is developed to review the machine learning methodologies.

Keywords: Email Spam Detection, Spam Detection, Machine Learning, Neural Networks, Naive Bayes, Support Vector Classifier, Logistic Regression, Spam, Social Media, Email.

1. INTRODUCTION

Technology has become a vital part of life today. With each passing day, the use of the internet increases exponentially, and with it the use of email for exchanging information and communicating has also increased; it has become second nature to most people. While emails are necessary for everyone, they also come with unnecessary, undesirable bulk mails, which are called spam mails [29]. Anyone with access to the internet can receive spam on their devices. Most spam emails divert people's attention away from genuine and important emails and direct them towards detrimental situations. Spam emails can fill up inboxes or storage capacity and degrade the speed of the internet to a great extent. They can also corrupt a system by smuggling viruses into it, steal useful information, and scam gullible people. Identifying spam emails is a tedious task and can get frustrating.

While spam detection can be done manually, filtering out a large number of spam emails takes a long time. Hence, spam detection software has become the need of the hour. To solve this problem, various spam detection techniques are used now. The most common technique is the Naive Bayesian method [5] combined with feature sets that assess the presence of spam keywords. One objective of this research is to demonstrate an alternative scheme: a Neural Network (NN) [4] classification system that utilises a collection of emails sent by several users. Another objective is the development of spam detection with the help of Artificial Neural Networks, resulting in almost 98.8% accuracy.

2. LITERATURE SURVEY

Email :
Electronic mail (email) is a messaging system that electronically transmits messages across computer networks. Anyone can use email services such as Gmail or Yahoo, or register with an Internet Service Provider (ISP) and be provided with an email account. Only an internet connection is required; the service is otherwise free.

Copyrights @Kalahari Journals Vol.7 No.12 (December, 2022)
International Journal of Mechanical Engineering
490
DOI : https://doi.org/10.56452/7-12-46

Spam :
Bulk mails that are unnecessary and undesirable can be classified as spam mails. These spam emails can harm a user's system by filling up inboxes and degrading the speed of the internet connection.

Spam Detection :

Many spam detection techniques are in use nowadays. The methods use filters which can prevent emails from causing any harm to the user. The contributions and their weaknesses have been identified.

Several signals are available for identifying spam, for example the location of the sender, the contents of the message, and checks on IP addresses or domain names [26]. Spammers use refined variations to avoid spam identification. A few measures connected with spam identification are: blacklists and whitelists, machine learning approaches, Naïve Bayes, Support Vector Machine, and Neural Network classification [27].

A mobile system was proposed by Mahmoud et al. [29] with the aim of identifying and blocking spam SMS. In their work, they attempted to protect smartphones by filtering SMS spam that contains abbreviations and idioms. The system was based on the Artificial Immune System (AIS) and the Naïve Bayesian (NB) algorithm. With the Naive Bayes algorithm, the messages are classified based on their features. It used an SMS dataset with 1324 messages. The system gave a detection rate of 82%, a 6% positive rate and 91% accuracy.

Table 1 : Spam Categories
Categories              Descriptions
Health                  The spam of fake medications
Promotional products    The spam of fake fashion items like clothes, bags and watches
Adult content           The spam of adult content of pornography and prostitution
Finance & marketing     The spam of stock kiting, tax solutions, and loan packages
Phishing                The spam of phishing or fraud

An approach using the random forest algorithm is proposed by Akinyelu and Adewumi [1] in order to identify phishing or spam emails. It used 200 emails. The main aim of the research was to reduce features and increase efficiency/accuracy. The proposed algorithm achieves an accuracy of up to 99.7% with only 0.06% false positives. The research only covered the classification aspect without considering vital information which can affect the results, especially in the case of limited text in the email.

Yüksel et al. [3] aimed to resolve the problem of spam by inhibiting spam emails from being spread within email systems. To achieve this, they propose a cloud-based system, which identifies spam emails using analytics and machine learning algorithms like support vector machines and decision trees. The results of the tests show that the SVM leads to a higher accuracy of up to 97.6% and a false-positive rate of 2.33%. The decision tree attains a lower accuracy of 82.6% and a false-positive rate of 17.3%. Results reveal that the increase in spam emails is affected by the number of received emails. Lee et al. [28] proposed an optimal technique for spam detection.

2.1. EXISTING SYSTEMS

Due to the increase in the number of email users, the number of spam emails has also risen in the past years. It has now become even more challenging to handle a wide range of emails for data mining and machine learning. Therefore, many researchers have carried out comparative studies to see how various classification algorithms perform in classifying emails accurately, using a number of performance metrics. Hence, it is important to find an algorithm that gives the best possible outcome on a particular metric for correct classification of emails as spam or ham.
The present systems of spam detection rely on three major methods:

A. Linguistic Based Methods
Unlike humans, who can grasp linguistic constructs along with their exposition, machines cannot, and hence it is necessary to teach machines some languages to help them understand these constructs. This is the technique that is used in

places like search engines in order to ascertain the next terms to suggest to users while they are typing their search. Sentences are divided into unigrams (words taken one at a time) and bigrams (words taken two at a time). Since this technique requires that every expression be remembered, this method is not feasible and is also time-intensive. [29]

B. Behavior-Based Methods

This technique is metadata-based. This approach requires that users generate a set of rules, and the users must have a thorough understanding of these rules. Since the attributes of spam change over time, the rules also need to be revised from time to time. As a result, it still requires a human to scrutinise the details and is largely user-dependent. [29]

C. Graph-Based Methods

This technique uses a single graphical representation by incorporating numerous, heterogeneous particulars. Graph-based anomaly recognition algorithms are executed which detect abnormal forms in the data showing behaviours of spammers. This method is not dependable, so it is taxing to recognise faulty opinions. [29]

Feature Engineering mostly depends on the commercial appeal of terms and is absolutely content-oriented, and does not depend on statistics. All these attributes lead to a noteworthy decline of this structure.

3. PROPOSED METHOD

Many techniques are available to detect spam e-mails. Broadly classified, there are 5 different techniques based on which algorithms decide whether any mail is spam or not.

1. Content-Based Filtering Technique

Algorithms analyze words, the occurrence of words, and the distribution of words and phrases inside the content of e-mails and segregate them into spam and non-spam categories.

2. Case Base Spam Filtering Method

Algorithms trained on well-annotated spam/non-spam marked emails try to classify the incoming mails into two categories.

3. Heuristic or Rule-Based Spam Filtering Technique

Algorithms use pre-defined rules in the form of regular expressions to give a score to the messages present in the e-mails. Based on the scores generated, they segregate emails into spam and non-spam categories.

4. The Previous Likeness Based Spam Filtering Technique

Algorithms extract the incoming mails' features, create a multi-dimensional vector space and plot a point for every new instance. Based on the KNN algorithm, these new points get assigned to the closest class, spam or non-spam.

5. Adaptive Spam Filtering Technique

Algorithms classify the incoming mails into various groups and, based on the comparison scores of every group with the defined set of groups, spam and non-spam emails are segregated.

This article will give an idea for implementing content-based filtering using one of the most famous algorithms for spam detection, which is K-Nearest Neighbour (KNN).

k-NN based algorithms are widely used for classification tasks. Let's quickly look at the entire architecture of this implementation first and then explore every step. Executing these 5 steps, one after the other, will help us implement our spam classifier smoothly.

[Figure: Training phase, Testing phase, New email classification]

Step 1: E-mail Data Collection


The dataset contained in a corpus plays a crucial role in assessing the performance of any spam filter. Many open-source datasets are freely available in the public domain.

Train/Test Split: Split the dataset into train and test datasets, but make sure that both sets contain a similar balance of ham and spam emails (ham is a common name for non-spam emails).
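
The balanced split described above can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the paper: the helper name `stratified_split`, the 25% test fraction and the toy labels are assumptions made for the example. The same effect is available in scikit-learn through the `stratify` argument of `train_test_split`.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so each class keeps a similar share in both sets."""
    # Group the sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within each class, then cut off the same fraction for testing.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical toy labels: 80 ham (0) and 20 spam (1), i.e. 20% spam.
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels)

train_spam_share = sum(labels[i] for i in train_idx) / len(train_idx)
test_spam_share = sum(labels[i] for i in test_idx) / len(test_idx)
print(f"train spam share: {train_spam_share:.2f}, test spam share: {test_spam_share:.2f}")
```

Splitting each class separately is what keeps the spam percentage close in both sets; a plain random split of a small corpus could leave one set with noticeably more spam than the other.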
Below are a few well-known repositories where you can get thousands of data sets for free:
UC Irvine Machine Learning Repository
Kaggle datasets
AWS datasets

The email spam data set used here is distributed by SpamAssassin [8]. The data comes in a few categories; the readme.html [10] gives more background information on the data.

In short, there are two types of data present in this repository: ham (non-spam) and spam. Furthermore, the ham data is divided into easy and hard, which means that some non-spam data has a very high similarity with spam data. This might make it difficult for our system to make a decision.

Exploratory Data Analysis (EDA)

Exploratory Data Analysis is a very important process in data science. It helps the data scientist understand the data at hand and relate it to the business context.

The open source tool that I will be using to visualize and analyze my data is Word Cloud.

Word Cloud is a data visualization tool used for representing text data. The size of each word in the image represents the frequency or importance of the word in the training data.

Visualization

Wordcloud

Wordcloud is a useful visualization tool for getting a rough estimate of the words that have the highest frequency in your data.

[Figure: Visualization for spam email]

[Figure: Visualization for non spam email]

From this visualization, you can notice something interesting about the spam email. A lot of them contain a high number of "spammy" words such as free, money, product, etc. This awareness might help us make better decisions when designing the spam detection system.

One important thing to note is that a word cloud only displays the frequency of the words, not necessarily their importance. Hence it is necessary to do some data cleaning, such as removing stopwords, punctuation and so on, before visualizing the data.

N-grams model visualization

Another visualization technique is to use a bar chart to display the frequency of the words that appear most often. The n in n-gram is the number of words treated as a single unit when calculating word frequencies.

The following are examples of 1-gram and 2-gram.
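
As a rough illustration of what the 1-gram and 2-gram counts behind such bar charts look like, here is a small sketch using only the Python standard library. The helper `ngram_counts` and the sample sentence are hypothetical and are not taken from the paper or the dataset.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count n-grams (tuples of n consecutive words) in a lowercased text."""
    words = text.lower().split()
    # zip over n shifted copies of the word list yields every run of n words.
    ngrams = zip(*(words[i:] for i in range(n)))
    return Counter(ngrams)

# Hypothetical sample text, not taken from the dataset.
sample = "free money now claim your free money today"

unigrams = ngram_counts(sample, 1)
bigrams = ngram_counts(sample, 2)

# The most frequent units are what a frequency bar chart would display.
print(unigrams.most_common(2))
print(bigrams.most_common(1))
```

In this sample the words "free" and "money" each occur twice, and the pair "free money" is the only 2-gram that repeats, which is the kind of pattern the bar charts make visible.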


[Figure: Bar chart visualization of 1-gram model]

[Figure: Bar chart visualization of 2-gram model]

Train Test Split

It is important to split your data set into a training set and a test set, so that you can evaluate the performance of your model on the test set before deploying it in a production environment.

One important thing to note when doing the train test split is to make sure the distribution of the data in the training set and the test set is similar. In this context, it means that the percentage of spam email in the training set and the test set should be similar.

[Figure: Target Count For Train Data]

[Figure: Train Data Distribution]

[Figure: Count For Test Data]

[Figure: Test Data Distribution]

The distributions of the train data and the test data are quite similar, with spam at around 20–21% in each, so we are good to go and can start processing our data!

Data Preprocessing

Text Cleaning

Text cleaning is a very important step in machine learning because your data may contain a lot of noise and unwanted characters such as punctuation, white space, numbers, hyperlinks, etc.

Some standard procedures generally used are:

1. converting all letters to lower/upper case
2. removing numbers
3. removing punctuation
4. removing white spaces
5. removing hyperlinks
6. removing stop words such as a, about, above, down,

doing, and the list goes on…

Word stemming and word lemmatization are two techniques that both try to reduce words to their most basic form, but with different approaches.

Word stemming: Stemming algorithms work by removing the end or the beginning of the words, using a list of common prefixes and suffixes that can be found in that language. Examples of word stemming for English words are as below:

[Figure: examples of word stemming]

Word Lemmatization: Lemmatization uses the dictionary of a particular language and tries to convert words back to their base form. It tries to take the meaning of the verbs into account and convert them back to the most suitable base form.

[Figure: examples of word lemmatization]

Implementing these two algorithms shows how they deal with different edge cases.

Import the library and start designing some functions to help us understand the basic working of these two algorithms.

# Just import them and use them
# (WordNetLemmatizer needs the WordNet data, e.g. via nltk.download('wordnet'))
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
dirty_text = "He studies in the house yesterday, unluckily, the fans breaks down"

def word_stemmer(words):
    # stem every token and join the results back into one string
    stem_words = [stemmer.stem(o) for o in words]
    return " ".join(stem_words)

def word_lemmatizer(words):
    # lemmatize every token and join the results back into one string
    lemma_words = [lemmatizer.lemmatize(o) for o in words]
    return " ".join(lemma_words)

The output of the word stemmer is very obvious: some of the endings of the words have been chopped off.

clean_text = word_stemmer(dirty_text.split(" "))
clean_text
#Output
'He studi in the hous yesterday, unluckily, the fan break down'

The lemmatization has converted studies -> study, breaks -> break.

clean_text = word_lemmatizer(dirty_text.split(" "))
clean_text
#Output
'He study in the house yesterday, unluckily, the fan break down'

Feature Extraction

Our algorithm always expects the input to be integers/floats, so we need a feature extraction layer in the middle to convert the words to integers/floats.

There are a couple of ways of doing this:

1. CountVectorizer

2. TfidfVectorizer

3. Word Embedding

CountVectorizer

First we need to input all the training data into CountVectorizer. The CountVectorizer keeps a dictionary of every word and its respective id, and this id relates to the word count of this word inside the whole training set.

For example, take a sentence like 'I like to eat apple and drink apple juice':

from sklearn.feature_extraction.text import CountVectorizer
# list of text documents
text = ["I like to eat apple and drink apple juice"]
# create the transform
vectorizer = CountVectorizer()
# tokenize and build vocab
vectorizer.fit(text)
# summarize
print(vectorizer.vocabulary_)
# encode document
vector = vectorizer.transform(text)
# summarize encoded vector

print(vector.shape)
print(type(vector))
print(vector.toarray())
# Output
# The number following each word is the index of the word
{'like': 5, 'to': 6, 'eat': 3, 'apple': 1, 'and': 0, 'drink': 2, 'juice': 4}
# The index gives the position of the word's count in the array below
# "I like to eat apple and drink apple juice" -> [1 2 1 1 1 1 1]
# apple, which has index 1, corresponds to the word count of 2 in the array

TfidfVectorizer

Word counts are good, but can we do better? One issue with simple word counts is that some words like 'the' and 'and' appear many times and don't really add much meaningful information.

Another popular alternative is TfidfVectorizer. Besides taking the word count of every word, the vectorizer downscales words that often appear across multiple documents or sentences.

For more info about CountVectorizer and TfidfVectorizer, see the article in [15], which is also where I gained most of my understanding.

Word Embedding

Word embedding converts a word to a vector, and this vector represents the position of the word in a higher dimensional space.

For words that have similar meanings, the cosine distance between the two word vectors will be shorter and they will be closer to each other.

And in fact, these words are vectors, so you can even perform math operations on them! The end result of these operations is another vector that maps to a word. Unexpectedly, those operations produce some amazing results!

Example 1: King - Man + Woman = Queen

Example 2: Madrid - Spain + France = Paris

Example 3: Walking - Swimming + Swam = Walked

Simply put, word embedding is a very powerful representation of words, and one of the well-known techniques for generating this embedding is Word2Vec.

Algorithm Implementation

TfidfVectorizer + Naive Bayes Algorithm

The first approach uses TfidfVectorizer as the feature extraction tool and the Naive Bayes algorithm to do the prediction. Naive Bayes is a simple, probabilistic, traditional machine learning algorithm.

It has long been popular for solving problems like spam detection. Using the Naive Bayes implementation provided by the sklearn library saves us a lot of hassle in implementing this algorithm. It can be done in a few lines of code:

from sklearn.naive_bayes import GaussianNB
# x_train_features / x_test_features: assumed to be the sparse TfidfVectorizer
# outputs for the train and test emails; y_train / y_test are the spam labels
clf = GaussianNB()
clf.fit(x_train_features.toarray(), y_train)
# Output of the score is the accuracy of the prediction
# Accuracy: 0.995
clf.score(x_train_features.toarray(), y_train)
# Accuracy: 0.932
clf.score(x_test_features.toarray(), y_test)

We achieve an accuracy of 93.2% on the test set. But accuracy is not the only metric for evaluating the performance of an algorithm. Other scoring metrics may help us understand more thoroughly how well this model is doing.

Scoring & Metrics

When it comes to evaluating a data science model's performance, sometimes accuracy may not be the best indicator.

Some problems that we solve in real life have very imbalanced classes, and accuracy alone might not give us enough confidence in the algorithm's performance.

In the email spam problem, spam is approximately 20% of our data. If our algorithm predicts every email as non-spam, it will achieve an accuracy of 80%.

And for a problem that has only 1% positive data, predicting every sample as negative gives an accuracy of 99%, but we all know this kind of model is useless in a real-life scenario.
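
The 80% figure can be checked with a short calculation. The sketch below is only an illustration under assumed numbers: the labels are synthetic (20 spam out of 100 emails, matching the approximate 20% share mentioned above) and the "classifier" simply labels everything as ham.

```python
# Hypothetical labels: 1 = spam, 0 = ham; 20 spam out of 100 emails.
y_true = [1] * 20 + [0] * 80
# A trivial "classifier" that labels every email as ham.
y_pred = [0] * len(y_true)

# Accuracy: share of emails whose predicted label matches the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Share of the actual spam emails that the classifier flagged as spam.
spam_caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
spam_caught_share = spam_caught / sum(y_true)

print(f"accuracy: {accuracy:.2f}")
print(f"share of spam caught: {spam_caught_share:.2f}")
```

The accuracy comes out at 0.80 even though not a single spam email is caught, which is why metrics that look at the positive class separately are needed.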

Precision & Recall

Precision and recall are the common evaluation metrics that people use when evaluating a class-imbalanced classification model.

Precision evaluates how accurate the model is when it predicts something as positive. Recall, on the other hand, evaluates how well the model finds all the positive samples.

The mathematical equations for precision and recall are, respectively:

\[ \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN} \]

TP: True Positive

FP: False Positive

TN: True Negative

FN: False Negative

Confusion Matrix
A confusion matrix is a very good way to understand results like true positives, false positives, true negatives and so on.

The sklearn documentation provides sample code for plotting a nice-looking confusion matrix to visualize your result.

[Figure: Confusion Matrix of the result]

Precision: 87.82%

Recall: 81.01%

The recall of this model is rather low; it might not be doing a good enough job in discovering the spam email.

Summary

I have shown you all the necessary steps in designing a spam detection algorithm. Just a brief recap:

Explore and understand your data

Visualize the data at hand to gain a better intuition: Wordcloud, N-gram Bar Chart

Text Cleaning: Word Stemmer and Word Lemmatization

Feature Extraction: Count Vectorizer, Tfidf Vectorizer, Word Embedding

Algorithm: Naive Bayes

Scoring & Metrics: Accuracy, Precision, Recall

Here concludes the first part of the demonstration of designing a spam detection algorithm.

4. CONCLUSION

As shown in Figure 4, all the models based on feature set 2 (most-frequent-word-count) have higher accuracy and F1 score than those based on feature set 1 (stop words + n-gram + tf-IDF).

If the use case is to introduce a beta version of an email spam detector that allows no spam in the inbox, the Neural Network model with the tanh activation function and feature set 1 (stop words + n-gram + tf-IDF) serves this purpose.

According to the graphs in Figure 4, if the use case is to introduce an email spam detector that reduces the bad user experience of searching for important emails in junk mailboxes and of filtering spam from the inbox, the Neural Network with feature set 2 ('most frequent word count') gives a better user experience in general.

Future work includes testing the model with various standard datasets. This research proposes that the outcome obtained should be compared with additional spam datasets from various sources. Also, more classification and

feature algorithms should be analyzed with email spam datasets.

5. REFERENCES

[1] AKINYELU, A. A., & ADEWUMI, A. O. (2014). “Classification of phishing email using random forest machine learning technique”. Journal of Applied Mathematics.

[2] Vinodhini. M, Prithvi. D, Balaji. S “Spam Detection Framework using ML Algorithm” in IJRTE ISSN: 2277-3878, Vol.8 Issue.6, March 2020.

[3] YÜKSEL, A. S., CANKAYA, S. F., & ÜNCÜ, İ. S. (2017). “Design of a Machine Learning Based Predictive Analytics System for Spam Problem.” Acta Physica Polonica A, 132(3).

[4] GOODMAN, J. (2004, July). “IP Addresses in Email Clients.” In CEAS.

[5] Deepika Mallampati, Nagaratna P. Hegde “A Machine Learning Based Email Spam Classification Framework Model” in IJITEE, ISSN: 2278-3075, Vol.9 Issue.4, February 2020.

[6] Javatpoint, “Machine Learning Tutorial” 2017 https://www.javatpoint.com/machine-learning

[7] SpamAssassin, “Spam and Ham Dataset”, Kaggle, 2018. https://www.kaggle.com/veleon/ham-and-spam-dataset

[8] Apache, “open-source Apache SpamAssassin Dataset”, 2019 https://spamassassin.apache.org/old/publiccorpus/

[9] SpamAssassin, “Spam Classification Kernel”, 2018 https://www.kaggle.com/veleon/spam-classification

[10] SpamAssassin, “REVISION HISTORY OF THIS CORPUS”, 2016 https://spamassassin.apache.org/old/publiccorpus/readme.html

[11] Jason Brownlee, “Naive Bayes for Machine Learning” The Machine Learning Mastery, April 11, 2015. https://machinelearningmastery.com/naive-bayes-for-machine-learning/

[12] Wikipedia, “History of email spam,” Internet Free Encyclopedia, 2001. https://en.wikipedia.org/wiki/History_of_email_spam

[13] Rohith Gandhi, “Support Vector Machine” The Machine Learning Mastery, June 7, 2018. https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47

[14] Jason Brownlee, “Logistic Regression for Machine Learning” The Machine Learning Mastery, April 1, 2016. https://machinelearningmastery.com/logistic-regression-for-machine-learning/

[15] Jason Brownlee, “How to Encode Text Data for Machine Learning with scikit-learn” The Machine Learning Mastery, September 29, 2017. https://machinelearningmastery.com/prepare-text-data-machine-learning-scikit-learn/

[16] I. Androutsopoulos, J. Koutsias, K. Chandrinos and C. D. Spyropoulos, "An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal email messages," Computation and Language, pp. 160-167, 2000.

[17] G. V. Cormack, "Email Spam Filtering: A Systematic Review," Foundations and Trends® in Information Retrieval, vol. 1, no. 4, pp. 335-455, 2006.

[18] M. Siponen and C. Stucke, "Effective Anti-Spam Strategies in Companies: An International Study," Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS'06), 2006.

[19] Guzella, T. S. and Caminhas, W. M. “A review of machine learning approaches to Spam filtering.” Expert Syst. Appl., 2009.

[20] Jianying Zhou, Wee-Yung Chin, Rodrigo Roman, and Javier Lopez, (2007) "An Effective MultiLayered Defense Framework against Spam", Information Security Technical Report 01/2007.

[21] Xiao Mang Li, Ung Mo Kim, (2012) "A hierarchical framework for content-based image spam filtering", 8th International Conference on Information Science and Digital Content Technology (ICIDT), Jeju, June, pp. 149-155.

[22] Linda Huang, Julia Jia, Emma Ingram, Wuxu Peng, “Enhancing the Naive Bayes Spam Filter through Intelligent Text Modification Detection”, 2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications.

[23] W.A. Awad, S.M. Elseuofi, Machine learning methods for spam E-mail classification, Int. J. Comput. Sci. Inf. Technol. 3 (1) (2011) 173–184.

[24] K.R. Dhanaraj, V. Palaniswami, Firefly and Bayes classifier for email spam classification in a distributed environment, Aust. J. Basic Appl. Sci. 8 (17) (2014) 118–130.

[25] M. Zavvar, M. Rezaei, S. Garavand, Email spam detection using combination of particle swarm optimization and artificial neural network and support vector machine, Int. J Mod Educ. Comput. Sci. (2016) 68-74.

[26] Deepika Mallampati, “An Efficient Spam Filtering using Supervised Machine Learning Techniques” in IJSRCSE, Vol.6, Issue.2, pp.33-37, April (2018).

[27] Deepika Mallampati, K.Chandra Shekar and K.Ravikanth “Supervised Machine Learning Classifier for Email Spam Filtering”, © Springer Nature Singapore Pte Ltd. 2019 and Engineering, https://doi.org/10.1007/978-981-13-7082-341.

[28] GUPTA, H., JAMAL, M. S., MADISETTY, S., & DESARKAR, M. S. (2018, January). “A framework for real-time spam detection in Twitter.” In Communication Systems & Networks (COMSNETS), 2018 10th International Conference on (pp. 380-383).

[29] MAHMOUD, T. M., & MAHFOUZ, A. M. (2012). “SMS spam filtering technique based on artificial immune system.” International Journal of Computer Science Issues (IJCSI), 9(2), 589.

[30] AN ANTI-SPAM DETECTION MODEL FOR EMAILS OF MULTI-NATURAL LANGUAGE Mazin Abed Mohammed a,*, Salama A. Mostafa b,*, Omar Ibrahim Obaid