Article in Design Engineering · December 2021


3 authors, including:

Pravin Kshirsagar
Raisoni Group of Institutions

ISSN: 0011-9342 | Year 2021
Design Engineering Issue: 9 | Pages: 8327 - 8338

Email Spam Filtering Techniques: A Review
Meelony Marwaha, Nikita Singla
School of Engg. & Technology, CT University, Ludhiana
Abstract
Email users commonly find large numbers of spam emails from unfamiliar senders in their mailboxes every day. Spamming has also enabled online fraud through social engineering, most of which begins with an email from an untrusted source containing a URL that, when opened, can compromise the recipient's personal data. Email spam detection involves several phases, including data pre-processing, feature extraction and classification. Researchers have designed various machine learning algorithms for email spam detection, and this paper reviews these email spam detection techniques.
Keywords: Email Spam, Machine Learning, Supervised Learning

1. Introduction
Spam, also known as unwanted bulk email, has permeated daily life for years. The growth of spam has seriously reduced the efficiency of email, which today serves not only for discussion but also as a task manager, a document-sharing system and an archive. Some case studies have reported that spam can account for as much as 88% to 92% of all emails sent every day [1]. Spam emails may advertise illegal products and services or contain intimidation and fraud, and they often carry threats such as information theft through fast-spreading malware. Several solutions have therefore been proposed to keep the situation from worsening, and many spam detection techniques have emerged and been commercialized over the years. However, every email user still receives plenty of spam every year, which shows the need and urgency of improving spam detection. Because spam can be intercepted at each step of the email delivery process, spam filtering often combines multiple methods that work cooperatively, for example whitelist/blacklist filtering, challenge-response, rule-based filtering, keyword-based filtering and content-based filtering. Spam filtering can be considered a specialized binary text classification task that labels each email as spam or ham (legitimate email) [2].
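The cooperative use of several filters described above can be sketched as a small layered classifier that applies cheap checks first and falls back to a content-based score. This is a minimal illustration under stated assumptions, not a method from the cited works: the whitelist, blacklist and keyword set are hypothetical, and `content_score` stands in for the output of any content-based classifier (assumed to be a spam probability in [0, 1], with an assumed threshold of 0.5).

```python
# Hypothetical example data; a real deployment would load these from configuration.
WHITELIST = {"boss@example.com"}
BLACKLIST = {"promo@spam.example"}
SPAM_KEYWORDS = {"lottery", "viagra", "winner"}


def classify(sender: str, body: str, content_score: float) -> str:
    """Return "spam" or "ham" by applying cheap checks before the content score.

    content_score is assumed to be a spam probability in [0, 1] produced by a
    separate content-based classifier; 0.5 is an assumed decision threshold.
    """
    sender = sender.lower()
    if sender in WHITELIST:  # trusted senders bypass all other filters
        return "ham"
    if sender in BLACKLIST:  # known spammers are rejected outright
        return "spam"
    words = set(body.lower().split())
    if len(words & SPAM_KEYWORDS) >= 2:  # keyword rule: two or more trigger words
        return "spam"
    # Fall back to the binary content-based classifier.
    return "spam" if content_score >= 0.5 else "ham"


print(classify("friend@example.org", "You are a lottery winner", 0.1))  # spam
```

Ordering the layers from cheapest to most expensive means the content-based classifier only runs for mail that the list and keyword checks could not decide.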

1.1 Email Spam Detection

Email users regularly receive several hundred spam messages with fresh content, sent from fresh addresses that are generated automatically by bots. Filtering such spam with conventional techniques such as black and white lists (of domains, IP addresses and mailing addresses) is practically impossible. Applying text mining strategies to email can improve the effectiveness of spam filtering [3]. In addition, to help prevent spam, it may be possible to establish dependencies between topics and geographical features (e.g., which subjects are most prominent in spam messages sent from specific countries). In the past decade, many text clustering and classification techniques have been applied effectively to the spam problem. Figure 1 represents the generalized email spam prediction model [4].

[Figure 1 diagram. Blocks: Email Spam Dataset; Original Data; Data Pre-processing; Tokenization; Semantic Feature Selection; Feature Reduction; Data Classification; Classification model; Training Data; Testing Data; Spam Detection]

Figure 1: Generalized Email Spam Detection Model

a. Pre-Processing: Most real-world data are incomplete and noisy and contain missing values. Pre-processing is an important data mining step that prepares the dataset before the mining process [5]. Its main objective here is to remove words such as conjunctions and articles from the email, because these words carry no significance for classification. The main pre-processing operations include:
• Stop Words or Punctuation: Text generally contains superfluous words such as “is”, “the”, “and” and “a”, which contribute little to identifying spam. These words should therefore be removed before tokenization to avoid noise and redundant tokens. For example, once stop words and punctuation are removed from the sentence “This is a very good pen”, only “good pen” remains [6].
• PoS (Part-of-Speech) Tagging: Each word is tagged with its part of speech according to the context identified in the text, which also takes into account its relationship with nearby and connected words. A basic form of PoS tagging labels words as nouns, verbs, adjectives, adverbs and so forth.
• Stemming: A stemming algorithm converts different forms of a word into a single canonical form. For example, “works”, “working” and “worked” are all reduced to the stem “work”. Stemming should be applied to the text before tokenization [7].
• Tokenization: Tokenization extracts the words from the structure of an email and converts the message into a representative form. It divides the incoming mail into a series of representative symbols called tokens, which are taken from the email body, the header and the subject. Because it replaces the information with standard identifying symbols, tokenization extracts the overall features and words of the email without considering their meaning.
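The stop-word removal, stemming and tokenization steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the stop-word list is a small hypothetical sample, `crude_stem` is a naive suffix stripper rather than a real stemming algorithm such as Porter's, and PoS tagging is omitted because it needs a trained tagger. The sketch tokenizes first, since stop words and stems are easiest to identify one token at a time.

```python
import re

# Small illustrative stop-word list (an assumption; real filters use much longer lists).
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "this", "to", "very"}


def tokenize(text):
    """Lower-case the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Naive suffix stripping (NOT the Porter algorithm): works/working/worked -> work."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words such as "is" are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("This is a very good pen."))  # ['good', 'pen']
print([crude_stem(w) for w in ["works", "working", "worked"]])  # ['work', 'work', 'work']
```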
b. Semantic Feature Selection: Feature selection chooses a subset of the original features to reduce the size of the data, typically by minimizing a specific cost function. Unlike feature extraction, feature selection does not transform the data; it is applied at the pre-processing stage, before a classifier model is trained [8]. The procedure is also referred to as variable selection, feature reduction or variable subset selection. Semantic Based Feature Selection (SBFS) is often used to reduce the number of features passed to the classifier. The subject and content of each email pass through different operational blocks and are converted into numerical values, which are stored with their weights in a matrix and supplied as input to supervised learning algorithms. The email dataset is then divided into training and test sets, with 80% of the data used to build the model and 20% used for testing. Finally, classification algorithms are applied to classify the emails as spam or ham, and their results are compared.
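As an illustration of the selection and splitting steps, the sketch below scores each word with the chi-square statistic and keeps the top k, then performs a shuffled 80/20 train/test split. Chi-square is a common filter-style selection criterion swapped in here for illustration; it is not the SBFS method itself. The function names and toy data are hypothetical.

```python
import random


def chi_square(docs, labels, term, positive="spam"):
    """Chi-square statistic of the 2x2 table: term present/absent vs. spam/ham."""
    n = len(docs)
    spam_with = sum(1 for d, y in zip(docs, labels) if term in d and y == positive)
    ham_with = sum(1 for d, y in zip(docs, labels) if term in d and y != positive)
    spam_without = sum(1 for d, y in zip(docs, labels) if term not in d and y == positive)
    ham_without = n - spam_with - ham_with - spam_without
    denom = ((spam_with + spam_without) * (ham_with + ham_without)
             * (spam_with + ham_with) * (spam_without + ham_without))
    if denom == 0:  # term appears in every document or in none
        return 0.0
    return n * (spam_with * ham_without - ham_with * spam_without) ** 2 / denom


def select_features(docs, labels, k):
    """Keep the k terms whose presence depends most strongly on the class label."""
    vocabulary = sorted({t for d in docs for t in d})  # sorted: ties break alphabetically
    scores = {t: chi_square(docs, labels, t) for t in vocabulary}
    return sorted(vocabulary, key=lambda t: -scores[t])[:k]


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle and split items, e.g. 80% for training and 20% for testing."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


docs = [{"free", "win"}, {"free", "offer"}, {"meeting", "report"}, {"meeting", "lunch"}]
labels = ["spam", "spam", "ham", "ham"]
print(select_features(docs, labels, 2))  # ['free', 'meeting']
train, test = train_test_split(range(10))
print(len(train), len(test))  # 8 2
```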
c. Feature Extraction: Feature extraction techniques transform the available features into a new feature space of reduced dimension. This process generates new features from linear or nonlinear combinations of the features in the original dataset [9]. Popular feature extraction algorithms include PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis) and autoencoders.
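PCA, the first technique named above, can be sketched without external libraries: centre the data, form the sample covariance matrix and find its leading eigenvector by power iteration; projecting onto that direction yields one new, combined feature. This is a minimal sketch under stated assumptions (first component only, dense Python lists, at least two rows, a seeded random start vector); production code would normally use a linear algebra library.

```python
import math
import random


def first_principal_component(rows, iters=200, seed=0):
    """Return (mean, unit direction) of the first principal component.

    The direction is the top eigenvector of the sample covariance matrix,
    found by power iteration from a seeded random start vector.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance along v: keep the current unit vector
            break
        v = [x / norm for x in w]
    return mean, v


def project(row, mean, direction):
    """Coordinate of a row along the principal direction: the new, reduced feature."""
    return sum((x - m) * c for x, m, c in zip(row, mean, direction))


rows = [[1, 1], [2, 2], [3, 3], [4, 4]]
mean, v = first_principal_component(rows)
print([round(c, 4) for c in v])  # [0.7071, 0.7071]
```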
d. Email Classification: Classifying email messages is a supervised learning task that seeks to build a probabilistic model of a function mapping emails to classes. In supervised learning of email text, each email message is a single example to be classified, and the learning algorithm is presented with a set of pre-classified, or labelled, examples. This set is known as the training set. Some of the labelled messages are withheld from the training set before the model is built and are then used to test its efficiency; this set is known as the testing set. To measure classification accuracy, several models are built from different partitions of the instances into training and testing sets, and the classification error is averaged over all the models [10]. This procedure is known as n-fold cross-validation, where n is the number of parts into which the instance set is divided. Once the model has been built, it can be used to predict the class of future emails. The machine learning based email spam classifiers in common use are discussed below:
i. Logistic Regression: LR is a statistical machine learning algorithm used for predictive analysis; it describes data and explains the relationship between a binary dependent variable and one or more nominal, ordinal, interval or ratio-level independent variables. To classify data, it treats the two output values as the extremes of a scale and fits a logistic (sigmoid) curve that separates them. The model can be used for both regression and classification. It has low variance and is efficient, and it is easy to update a logistic model with new data using stochastic gradient descent.
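A minimal logistic regression trained with stochastic gradient descent shows both the sigmoid decision function and the easy online update mentioned above. This is an illustrative sketch, not code from the cited works: the two toy features (counts of the words "free" and "meeting"), the learning rate and the epoch count are assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function mapping a score to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Estimated probability that feature vector x is spam."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sgd_update(weights, bias, x, y, lr=0.1):
    """One stochastic gradient descent step on the log-loss for a single example.

    y is 1 for spam and 0 for ham. Because the update needs only one example,
    the same function can fold newly labelled emails into an existing model.
    """
    error = predict_proba(weights, bias, x) - y  # derivative of log-loss w.r.t. the score
    new_weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return new_weights, bias - lr * error


def train_logistic(X, y, epochs=200, lr=0.1, seed=0):
    """Fit weights and bias by repeated SGD passes over shuffled examples."""
    rng = random.Random(seed)
    weights, bias = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            weights, bias = sgd_update(weights, bias, X[i], y[i], lr)
    return weights, bias


# Toy features (assumed): [count of "free", count of "meeting"]; 1 = spam, 0 = ham.
X = [[3, 0], [2, 0], [4, 1], [0, 2], [0, 3], [1, 3]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
print(predict_proba(w, b, [3, 0]) > 0.5)  # True
w, b = sgd_update(w, b, [5, 0], 1)  # online update with one newly labelled spam email
```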
ii. Naïve Bayes: The family of naive Bayes (NB) classifiers is based on Bayes' theorem, which relates unconditional and conditional probabilities. In spam detection [11], these probabilities can be estimated from the relative frequencies with which words occur in spam and ham messages. The second key idea is the so-called naive assumption that all features are conditionally independent given the output (i.e., the class). Although this independence assumption is rarely true, naive Bayes classifiers can classify very effectively even when the training data contains few examples. In addition, classifiers of the NB family are considered fast and simple.
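The sketch below shows a multinomial naive Bayes classifier with add-one (Laplace) smoothing, evaluated with the n-fold cross-validation described under Email Classification. The toy messages, the function names and the fold assignment (every n-th example, without shuffling) are assumptions for illustration; practical setups usually shuffle the data before partitioning it.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Train multinomial naive Bayes on token lists with class labels."""
    classes = sorted(set(labels))  # sorted for deterministic tie-breaking
    log_priors = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    word_counts = {c: Counter() for c in classes}
    for tokens, c in zip(docs, labels):
        word_counts[c].update(tokens)
    total_words = {c: sum(word_counts[c].values()) for c in classes}
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return log_priors, word_counts, total_words, vocabulary


def predict_nb(model, tokens):
    """Return the class with the highest log posterior (up to a constant)."""
    log_priors, word_counts, total_words, vocabulary = model
    scores = {}
    for c, prior in log_priors.items():
        denom = total_words[c] + len(vocabulary)  # add-one (Laplace) smoothing
        score = prior
        for w in tokens:
            if w in vocabulary:  # words never seen in training are ignored
                # Naive independence assumption: word log-probabilities simply add up.
                score += math.log((word_counts[c][w] + 1) / denom)
        scores[c] = score
    return max(scores, key=scores.get)


def cross_val_error(docs, labels, n_folds):
    """Average test error over n_folds train/test partitions (needs n_folds <= len(docs))."""
    errors = []
    for k in range(n_folds):
        test_idx = set(range(k, len(docs), n_folds))  # every n-th example, offset k
        train_idx = [i for i in range(len(docs)) if i not in test_idx]
        model = train_nb([docs[i] for i in train_idx], [labels[i] for i in train_idx])
        wrong = sum(predict_nb(model, docs[i]) != labels[i] for i in test_idx)
        errors.append(wrong / len(test_idx))
    return sum(errors) / n_folds


docs = [t.split() for t in [
    "win free money now", "free prize claim now",
    "cheap money offer free", "claim your free prize",
    "meeting agenda for monday", "project report attached",
    "lunch meeting on monday", "please review the project report",
]]
labels = ["spam"] * 4 + ["ham"] * 4
model = train_nb(docs, labels)
print(predict_nb(model, "free money prize".split()))  # spam
print(cross_val_error(docs, labels, 4))  # 0.0
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.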
iii. Support Vector Machines (SVM): SVMs are among the most widely used classification algorithms, although their utility extends to other tasks (for example, outlier detection). Given a labelled dataset, an SVM finds a classification (separating) hyperplane that maximizes the distance to the nearest data points (vectors representing samples) of the different classes. There are two types of SVM models: hard-margin, which must classify every point correctly, and soft-margin, which tolerates some misclassification [12]. Unlike k-NN classifiers, SVMs work well in high-dimensional spaces, where adding features often makes the data points easier to separate. The points nearest to the separating hyperplane are called support vectors. The hyperplane, also known as the decision boundary, divides elements that belong to different classes. The gap between the two parallel hyperplanes passing through the support vectors is called the margin; the larger the margin, the better the classifier tends to generalize.
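A soft-margin linear SVM can be sketched with a Pegasos-style stochastic subgradient method, which minimizes the regularized hinge loss. This is an illustrative sketch under stated assumptions, not the implementation used in the cited work: the regularization strength `lam`, the epoch count and the toy features (counts of "free" and "meeting") are hypothetical, and the bias is handled by appending a constant feature.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=1000, seed=0):
    """Soft-margin linear SVM via Pegasos-style stochastic subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)).
    X: feature lists; y: labels +1 (spam) or -1 (ham). A constant 1.0 feature is
    appended so the last weight acts as the bias (which is then also regularized,
    a common simplification).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the regularizer
            if margin < 1.0:  # hinge loss active: point inside the margin or misclassified
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision(w, x):
    """Signed score: positive means the spam side of the hyperplane, negative means ham."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


X = [[3, 0], [2, 0], [4, 1], [0, 2], [0, 3], [1, 3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(decision(w, [3, 0]) > 0, decision(w, [0, 3]) > 0)  # True False
```

Points whose score lies at or inside the ±1 band are the ones that keep triggering the hinge update; they play the role of the support vectors, and the margin width is 2 divided by the norm of the feature weights.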

1.1.1 Existing Semantic-Based Classification Approaches for Spam Detection

Semantic features represent the basic meanings or concepts of words and are used to construct semantic language models, for example to detect fake reviews [13]. The argument is that changing a word such as "love" to "like" in a review should not affect the similarity of reviews, because the meaning is the same. Semantics-based classification approaches improve the accuracy of spam detection. Some popular semantics-based classification approaches for spam detection are explained below:
a. Latent Semantic Analysis (LSA): LSA was first developed by Dumais and colleagues in 1988 and builds on the TF-IDF method. It was initially designed to handle synonyms in information retrieval and was later applied to semantic identification and text classification. LSA can evaluate the similarity between concepts, texts and sentences. LSA represents text as a term-text matrix and, by reducing its dimensionality, can often produce a much smaller vector space than the TF-IDF method alone. Another characteristic of LSA is its dependence on the text set: it selects certain terms from the entire set via the TF-IDF method as the attributes of the classes, so the features vary between sets. Distributing samples evenly across the set is therefore crucial for LSA and strongly affects classification. Because LSA relies entirely on statistical methods, each class can have its own separate topic, but the specific semantic information of that topic is unclear and hard to interpret [14].
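The LSA idea can be made concrete with a small sketch: build a TF-IDF weighted term-document matrix A, compute its leading singular directions (here via power iteration with deflation on A^T A, which for a toy example is equivalent to a truncated SVD) and compare documents in the reduced k-dimensional space. The TF-IDF variant (raw count times log(N/df)), k = 2, the all-ones start vector for power iteration and the toy documents are all assumptions for illustration.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Term-document matrix A (rows = terms, columns = docs) with tf * log(N / df) weights."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # number of docs containing each term
    tf = [Counter(d) for d in docs]
    return vocab, [[tf[j][t] * math.log(n / df[t]) for j in range(n)] for t in vocab]


def top_eigenpair(m, iters=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    v = [1.0] * len(m)
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in m]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # nothing left in this direction: report a zero eigenvalue
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(vi * sum(a * b for a, b in zip(row, v)) for vi, row in zip(v, m))
    return lam, v


def lsa_doc_vectors(docs, k=2):
    """Rank-k LSA coordinates of each document (the rows of V_k * Sigma_k)."""
    _, a = tfidf_matrix(docs)
    n = len(docs)
    # A^T A: its eigenvectors are the right singular vectors of A and its
    # eigenvalues are the squared singular values.
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        lam, v = top_eigenpair(gram)
        sigma = math.sqrt(max(lam, 0.0))
        for j in range(n):
            coords[j][c] = sigma * v[j]
        # Deflate so the next power iteration finds the next singular direction.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


docs = [d.split() for d in [
    "cheap pills buy now",
    "buy cheap pills online",
    "project meeting agenda",
    "meeting agenda for project review",
]]
z = lsa_doc_vectors(docs, k=2)
print(round(cosine(z[0], z[1]), 3))  # 1.0
```

In the reduced space the two spam-like documents point in the same direction, while a spam-like and a meeting-related document are nearly orthogonal, even though each document contains a term the others lack.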
b. Ontological Semantic Technology (OST): OST is an adapted and upgraded form of Ontological Semantics, comprising a theory, a method and an application system for interpreting natural language text. At the heart of OST are repositories of world and linguistic knowledge acquired by experts, primarily an ontology and a lexicon, which are also regarded as static knowledge resources. The ontology contains language-independent concepts and the relations between them; together with a Proper Name Dictionary (PND), these resources are used to represent the various meanings of words and sentences. OST is now widely used in human-robot interaction.
c. Baseline semantic spam filtering: Hempelman and Mehra constructed a baseline semantic spam filter with OST, presenting OST as a way to perform spam filtering at the semantic level. This approach adds information assurance and security applications to OST, specifically taking the steganographic contest with spammers to a new level by moving from statistical pattern matching to text comprehension [15]. To deal with spam content, most existing spam filters use statistics, usually Bayesian, to detect rare words ("Viagra", "re-entrant", "replication"), combined with some hard-coded heuristics. OST, in contrast, can raise the adversarial threshold, in effect side-stepping the noise-based, hash-busting steganography currently in use, although someone with more resources and more time could still develop an improved noise model that again separates the signal from the noise. Baseline semantic spam filtering does this at an initial, low level: it still relies on statistical features, but now of the meaning of potentially disturbing texts rather than of the surface text, which is only an epiphenomenon of language. That is, rather than fully comprehending the text, it considers meaning only at a basic level. Other OST applications for information assurance and security work with the content of the meaning itself.

2. Literature Review
Nadjate Saidani et al. (2020) proposed a technique based on a two-level semantic analysis [16]. First, emails were categorized into specific domains so that spam in each domain had its own conceptual view. Next, spam was detected by combining a set of manually specified attributes with automatically extracted semantic features. These attributes helped summarize the email content into compact topics, allowing spam emails to be distinguished effectively from non-spam emails. The technique detected spam better than traditional techniques based on BoW (bag-of-words) and generated optimal outcomes. Wuxu Peng et al. (2018) developed a new algorithm to improve the accuracy of the NB (Naive Bayes) spam filter by detecting text modifications and classifying each email into one of two classes, spam or ham [17]. The results showed that the approach consistently reduced the number of spam emails misclassified as ham. E. Elakkiya et al. (2019) introduced an FFNN (feed-forward neural network) trained with BP (back-propagation) for spam detection [18]. The initial weights of the FFNN were tuned to improve the quality of the learning process, and an FA (firefly algorithm) was used to reduce the time needed to find the optimal weights during learning. A Twitter dataset was used for the experiments, whose results showed that the approach was effective in terms of accuracy and detection rate and gave the lowest FPR (false positive rate). Thayakorn Dangkesee et al. (2017) presented an adaptive data classification scheme for spam detection that used spam word lists and a commercial URL-based security tool [19].
The NB (Naïve Bayes) algorithm was used to analyze the data, and the scheme proved more effective than traditional techniques; the experimental results showed that it was applicable to spam detection. Xiaoxu Liu et al. (2021) explored the potential of the Transformer model for detecting spam SMS (Short Message Service) messages and proposed an enhanced Transformer model for this task [20][35][36][37]. The model was evaluated on two datasets, and the experimental results were promising: it achieved an accuracy of up to 98.92%, a recall of 94.51% and an F1-score of about 96.13%. Maria Habib et al. (2018) proposed an effective email spam detection framework based on a hybrid of GP (Genetic Programming) and SMOTE (Synthetic Minority Over-sampling Technique) [21]. The framework was tested on two email corpora in terms of accuracy, recall, precision and G-mean, and the experimental results confirmed that it classified spam emails more effectively than traditional techniques. Wuxain Zhang et al. (2017) developed an approach that used a feature-based technique and a supervised learning method to detect spam posts on Instagram [22][32][33][34]. User profiles and media posts were collected from Instagram, and the media posts were labelled with the help of MinHash and K-medoids clustering, which grouped near-duplicate posts into the same clusters. The approach classified these posts as spam or ham with an accuracy of about 96.27%. Zhijie Zhang et al. (2020) proposed a novel spam detection algorithm based on a regularized ELM (extreme learning machine), called I2FELM (Improved Incremental Fuzzy-kernel-regularized Extreme Learning Machine), to detect spam on Twitter accurately [23]. The experimental results showed that the algorithm was applicable to spam detection. Aso Khaleel Ameen et al. (2018) designed a DL (deep learning) technique to detect spam on Twitter [24]. The technique first trained a Word2Vec-based representation; the tweets were then classified as spam or normal using binary classifiers, and finally an MLP (Multilayer Perceptron) was adopted to classify the spam tweets. The results showed that the technique outperformed existing ones and performed well in terms of precision, recall and F-measure. Girija Chetty et al. (2019) presented a DL (deep learning) based spam detection mechanism [25] that combined a word embedding method with an NN (Neural Network) algorithm. The word embedding captured the meanings of words and the analogies between them, a DNN learned the attributes of text documents in the embedding space, and these attributes were then used to classify the documents. The mechanism showed potential for detecting spam in different text documents. Nattanan Watcharenwong et al. (2017) suggested an approach that combined text attributes with social attributes to detect spam in closed groups [26][27][28]. The RF (Random Forest) algorithm was used to classify spam in 1,200 labelled posts, and the approach achieved an efficacy of up to 98% in detecting spam [29][30][31].
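Several of the studies above report accuracy, precision, recall, F1-score, G-mean and false positive rate. For reference, the sketch below computes these metrics from the four counts of a confusion matrix, treating spam as the positive class. The function name is hypothetical, and the counts in the usage line are made-up illustrative numbers, not results from any cited study.

```python
import math


def spam_metrics(tp, fp, tn, fn):
    """Metrics from a confusion matrix, treating spam as the positive class.

    tp: spam caught, fp: ham wrongly flagged, tn: ham passed, fn: spam missed.
    """
    def ratio(a, b):
        return a / b if b else 0.0  # avoid division by zero on empty classes

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called detection rate or sensitivity
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),  # false positive rate = 1 - specificity
        "g_mean": math.sqrt(recall * specificity),
    }


m = spam_metrics(tp=90, fp=10, tn=880, fn=20)
print(round(m["accuracy"], 3), round(m["f1"], 3))  # 0.97 0.857
```

The false positive rate deserves particular attention in spam filtering, because a legitimate email sent to the spam folder is usually more costly than a missed spam message.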

2.1 Comparison Table

Author | Year | Technique Used | Dataset | Advantages | Disadvantages
Nadjate Saidani et al. | 2020 | Two-level semantic analysis | CSDMC2010 SPAM dataset | Detected spam better than traditional bag-of-words (BoW) techniques and generated optimal outcomes. | Did not provide good efficacy in the long run.
Wuxu Peng et al. | 2018 | Naive Bayes spam filter | Ling-Spam dataset | Consistently reduced the number of spam emails misclassified as ham. | Did not achieve higher speed or further efficiency optimization.
E. Elakkiya et al. | 2019 | FFNN-BP (feed-forward neural network with back-propagation) | Matlab 2018a | Effective in terms of accuracy and detection rate, with the lowest FPR (false positive rate). | Did not perform well on some platforms.
Thayakorn Dangkesee et al. | 2017 | Adaptive data classification | Twitter API | Applicable to spam detection. | Not stable, and inefficient with Safe Browsing when detecting potentially dangerous links.
Xiaoxu Liu et al. | 2021 | Enhanced Transformer model | v.1 dataset and UtkMl's Twitter dataset | Promising results: accuracy up to 98.92%, recall of 94.51% and F1-score of about 96.13%. | Lower accuracy on very large datasets containing many messages or other types of content.
Maria Habib et al. | 2018 | Genetic Programming (GP) combined with Synthetic Minority Over-sampling Technique (SMOTE) | CSDMC2010 dataset | Classified spam emails more effectively than traditional techniques. | Did not analyze the relative significance of the attributes.
Wuxain Zhang et al. | 2017 | Feature-based method and supervised learning technique | Instagram dataset | Classified posts as spam or ham with an accuracy of about 96.27%. | Did not consider user preferences when customizing the spam classification algorithms.
Zhijie Zhang et al. | 2020 | Improved Incremental Fuzzy-kernel-regularized Extreme Learning Machine (I2FELM) | Matlab 2012b | Applicable to spam detection. | Unable to detect Twitter spam effectively because of inadequate labelled data in the social network.
Aso Khaleel Ameen et al. | 2018 | Deep learning method | Twitter's Streaming API | Outperformed existing techniques for spam detection, with good precision, recall and F-measure. | Ineffective for very large amounts of data.
Girija Chetty et al. | 2019 | Word embedding technique with a neural network algorithm | UCI machine learning repository | Showed potential for detecting spam in different text documents. | Not robust in exploiting the modelling power of DL for spam detection.
Nattanan Watcharenwong et al. | 2017 | Random Forest | Facebook Graph APIs | Achieved an efficacy of up to 98% in spam detection. | Did not perform well on spam posts containing only images and no text.

3. Conclusion
A surge in the number of spammers and spam emails has been observed in recent years, because the investment required for the spamming business is minimal. This has led to systems that treat every email as potentially suspicious, which requires substantial investment in defence mechanisms. The most commonly used mail filtering schemes are based on Knowledge Engineering (KE) and Machine Learning (ML). KE-based approaches use a set of rules to classify messages as spam or genuine mail, whereas ML-based approaches learn the classification from labelled example messages. Email spam detection involves several phases, such as feature extraction and classification. This paper has analyzed various schemes for email spam detection, and the analysis indicates that machine learning algorithms perform better than content filtering techniques.
References
[1] S. Venkatraman, B. Surendiran, P. Arun Raj Kumar, “Spam e-mail classification for the Internet of Things environment using semantic similarity approach”, 2020, The Journal of Supercomputing, vol. 76, pp. 756–776
[2] M. Qi, R. Mousoli, “Semantic analysis for spam filtering”, 2010, Seventh International Conference on Fuzzy Systems and Knowledge Discovery, pp. 2914-2917
[3] Q. Zhang, H. Yang, Z. Yuan, J. Sun, “Studies on the Semantic Body-Based Spam Filtering”, 2010, International Conference of Information Science and Management Engineering, pp. 233-236
[4] A. Han, H. Kim, I. Ha, G. Jo, “Semantic Analysis of User Behaviors for Detecting Spam Mail”, 2008, IEEE International Workshop on Semantic Computing and Applications, pp. 91-95
[5] G. Vijayasekaran, S. Ros, “Spam and Email Detection in Big data Platform using Naives Bayesian classifier”, 2018, International Journal of Computer Science and Mobile Computing, Vol. 7, Issue 4, pp. 53-58
[6] Priti Sharma, Uma Bhardwaj, “Machine Learning based Spam E-Mail Detection”, 2018, International Journal of Intelligent Engineering and Systems, Vol. 11, No. 3
[7] M. Deepika, Shilpa Rani, “Performance of Machine Learning Techniques for Email Spam Filtering”, 2017, IJRTER
[8] Esha Bansal, Pradeep Kumar Bhatia, “A Survey of Various Machine Learning Algorithms on Email Spamming”, 2017, International Journal of Advances in Electronics and Computer Science
[9] Swapna Borde, Utkarsh M. Agrawal, Viraj S. Bilay, Nilesh M. Dogra, “Supervised Machine Learning techniques for Spam Email Detection”, 2017, IJSART, Volume 3, Issue 3
[10] Deepika Mallampati, Nagaratna P. Hegde, “A Machine Learning Based Email Spam Classification Framework Model: Related Challenges and Issues”, 2020, International Journal of Innovative Technology and Exploring Engineering (IJITEE), Volume 9, Issue 4
[11] A. Lakshmanarao, K. Chandra Sekhar, Y. Swath, “An Efficient Spam Classification System Using Ensemble Machine Learning Algorithm”, 2018, Journal of Applied Science and Computations, Volume 5, Issue 9
[12] Apurva Taunk, Srishty Bharti, Sipra Sahoo, “An Ensemble Method for Spam Classification”, 2020, International Journal of Scientific & Technology Research, Volume 9, Issue 02
[13] Megha Rathi, Vikas Pareek, “Spam Mail Detection through Data Mining – A Comparative Performance Analysis”, 2013, International Journal of Modern Education and Computer Science, Volume 12, pp. 31-39
[14] Hanif Bhuiyan, Akm Ashiquzzaman, Tamanna Islam Juthi, Suzit Biswas, Jinat Ara, “A Survey of Existing E-Mail Spam Filtering Methods Considering Machine Learning Techniques”, 2018, Global Journal of Computer Science and Technology, Volume 18, Issue 2
[15] Harjot Kaur, Prince Verma, “Survey on E-mail Spam Detection using Supervised approach with Feature selection”, 2017, International Journal of Engineering Sciences & Research Technology
[16] NadjateSaidani, KamelAdi, MohandSaïdAllili, “A semantic-based classification approach for

an enhanced spam detection”, 2020, Computers & Security
[17] Wuxu Peng, Linda Huang, Julia Jia, Emma Ingram, “Enhancing the Naive Bayes Spam Filter
Through Intelligent Text Modification Detection”, 2018, 17th IEEE International Conference
On Trust, Security And Privacy In Computing And Communications/ 12th IEEE
International Conference On Big Data Science And Engineering (TrustCom/BigDataSE)
[18] E Elakkiya, S Selvakumar, “Initial Weights Optimization using Enhanced Step Size Firefly
Algorithm for Feed Forward Neural Network applied to Spam Detection”, 2019, TENCON
2019-IEEE Region 10 Conference (TENCON)
[19] ThayakornDangkesee, SutheeraPuntheeranurak, “Adaptive Classification for Spam Detection
on Twitter with Specific Data”, 2017, 21st International Computer Science and Engineering
Conference (ICSEC)
[20] Xiaoxu Liu, Haoye Lu, Amiya Nayak, “A Spam Transformer Model for SMS Spam
Detection”, 2021, IEEE Access
[21] Maria Habib, Hossam Faris, Mohammad A. Hassonah, Ja'farAlqatawna, Alaa F. Sheta, Ala'
M. Al-Zoubi, “Automatic Email Spam Detection using Genetic Programming with SMOTE”,
2018, Fifth HCT Information Technology Trends (ITT)
[22] Wuxain Zhang, Hung-Min Sun, “Instagram Spam Detection”, 2017, IEEE 22nd Pacific Rim
International Symposium on Dependable Computing (PRDC)
[23] Zhijie Zhang, RuiHou, Jin Yang, “Detection of Social Network Spam Based on Improved
Extreme Learning Machine”, 2020, IEEE Access
[24] AsoKhaleel Ameen, Buket Kaya, “Spam detection in online social networks by deep
learning”, 2018, International Conference on Artificial Intelligence and Data Processing
(IDAP)
[25] GirijaChetty, Hieu Bui, Matthew White, “Deep Learning Based Spam Detection System”,
2019, International Conference on Machine Learning and Data Engineering (iCMLDE)
[26] NattananWatcharenwong, Kanda Saikaew, “Spam detection for closed Facebook groups”,
2017, 14th International Joint Conference on Computer Science and Software Engineering
(JCSSE)
[27] Sudhir Akojwar, Pravin Kshirsagar, “Performance Evolution of Optimization Techniques Mathematical Benchmark Functions”, 2016, WSEAS International Conference on Neural Network, Rome, Italy
[28] Velvizhi, Satish R. Billewar, Gaurav Londhe, Pravin Kshirsagar, Neeraj Kumar, “Big Data for Time Series and Trend Analysis of Poly Waste Management in India”, 2020, Materials Today: Proceedings, Elsevier
[29] Pravin R. Kshirsagar, Anil N. Rakhonde, Pranav Chippalkatti, “MRI Image Based Brain Tumor Detection Using Machine Learning”, 2020, Test Engineering and Management, vol. 81 (January-February 2020), pp. 3672-3680, ISSN: 0193-4120
[30] Pravin Kshirsagar et al., “Brain Tumor Classification and Detection using Neural Network”, 2016, DOI: 10.13140/RG.2.2.26169.72805
[31] Pravin Kshirsagar et al., “Operational Collection Strategy for Monitoring Smart Waste Management System Using Shortest Path Algorithm”, 2021, Journal of Environmental Protection and Ecology, vol. 22, no. 2, pp. 566-577
[32] A. B. Jude, D. Singh, S. Islam et al., “An Artificial Intelligence Based Predictive Approach for Smart Waste Management”, 2021, Wireless Personal Communications, https://doi.org/10.1007/s11277-021-08803-7
[33] M. Padmaja, S. Shitharth, K. Prasuna et al., “Grow of Artificial Intelligence to Challenge Security in IoT Application”, 2021, Wireless Personal Communications, https://doi.org/10.1007/s11277-021-08725-4
[34] S. Shitharth, Pratiksha Meshram, Pravin R. Kshirsagar, Hariprasath Manoharan, Vineet Tirth, Venkatesa Prabhu Sundramurthy, “Impact of Big Data Analysis on Nanosensors for Applied Sciences Using Neural Networks”, 2021, Journal of Nanomaterials, vol. 2021, Article ID 4927607, 9 pages, https://doi.org/10.1155/2021/4927607
[35] P. Kshirsgar, V. More, V. Hendre, P. Chippalkatti, K. Paliwal, “IOT Based Baby Incubator for Clinic”, 2020, in A. Kumar, S. Mozar (eds.), ICCCE 2019, Lecture Notes in Electrical Engineering, vol. 570, Springer, Singapore
[36] S. Oza et al., “IoT: The Future for Quality of Services”, 2020, in A. Kumar, S. Mozar (eds.), ICCCE 2019, Lecture Notes in Electrical Engineering, vol. 570, Springer, Singapore
[37] P. Kshirsgar, A. Pote, K. K. Paliwal, V. Hendre, P. Chippalkatti, N. Dhabekar, “A Review on IOT Based Health Care Monitoring System”, 2020, in A. Kumar, S. Mozar (eds.), ICCCE 2019, Lecture Notes in Electrical Engineering, vol. 570, Springer, Singapore