Deep Learning for Image Spam Detection
SJSU ScholarWorks
Spring 5-20-2019
Part of the Artificial Intelligence and Robotics Commons, and the Information Security Commons
Recommended Citation
Sharmin, Tazmina, "Deep Learning for Image Spam Detection" (2019). Master's Projects. 702.
DOI: https://doi.org/10.31979/etd.b8me-rqsv
https://scholarworks.sjsu.edu/etd_projects/702
This Master's Project is brought to you for free and open access by the Master's Theses and Graduate Research at
SJSU ScholarWorks. It has been accepted for inclusion in Master's Projects by an authorized administrator of SJSU
ScholarWorks. For more information, please contact scholarworks@sjsu.edu.
Deep Learning for Image Spam Detection
A Project
Presented to
In Partial Fulfillment
Master of Science
by
Tazmina Sharmin
May 2019
© 2019
Tazmina Sharmin
spam filters, spammers can embed their spam text in an image, which is referred to as
image spam. In this research, we consider the problem of image spam detection, based
on image analysis. We apply various machine learning and deep learning techniques
obtain results comparable to previous work for the real-world datasets, while our deep
learning approach yields the best results to date for the challenge dataset.
ACKNOWLEDGMENTS
I would like to express my sincere gratitude to my advisor, Dr. Mark Stamp, for
I would like to thank my committee members, Dr. Katerina Potika and Fabio Di
My parents, Md. Gofranul Hoque and Ferdousi Rezwan, are my constant source
of inspiration. I am extremely grateful for their endless support and love throughout
all these years. I would like to thank my husband, Jane Alam Jan, for his gracious
support and constant encouragement, which made this possible. Last but not least,
I am thankful to my daughter, Ahona, for her understanding and caring in her little
TABLE OF CONTENTS
CHAPTER
1 Introduction
2 Background
  3.1 SVM
    3.1.1 Overview
4 Experiments
  4.3 Dataset
    4.3.1 Dataset 1
    4.3.2 Dataset 2
    4.3.3 Dataset 3
  4.5 Results
5 Conclusion
LIST OF REFERENCES
LIST OF TABLES
3 SVM Dataset 2
4 SVM Dataset 3
6 MLP Accuracy
7 CNN Results
LIST OF FIGURES
7 MLP Results
8 CNN Architecture
9 CNN Results
CHAPTER 1
Introduction
people across the world [1]. As of 2015, the number of email users was 2.6 billion, while
in 2019 this number will rise to approximately 2.9 billion, with more than one-third
has increased, the number of spam messages has also increased. Text-based filters
have been developed to deal with the spam problem. In an effort to evade such filters,
spammers sometimes use image spam, that is, spammers can encode their messages as
images [3].
Previous research into image spam detection has shown that some types of image
spam can be detected with high accuracy. For example, in [4, 5] a wide variety
of image properties are extracted and images are classified as spam or ham (i.e.,
These experiments serve two purposes. First, we can determine an effective strategy in
the "cold start" case, that is, in the case where the training data is severely limited.
And second, we compare the effectiveness of deep learning to other machine learning
relevant background topics on image spam and related work in spam detection.
Chapter 3 provides an overview of the machine learning and deep learning algorithms
experimental results. Finally, Chapter 5 concludes the paper, and we discuss possible
CHAPTER 2
Background
Spamming means sending unsolicited messages to a large number of users in an
arbitrary manner. Initially, the idea of spam originated with the purpose of advertising
products. Later, spammers used spam for online deception and fraudulent activities.
Since sending spam messages via email adds no operational cost, email spam has been
email.
Various types of spam include email spam, mobile phone messaging spam, web search
Email spam consists of unsolicited electronic messages sent in bulk, and it is the most
widely used form of spam. In email spam, the same message is sent to numerous email addresses.
Those messages may include product advertisements, links to phishing websites which
might ask the recipient to provide confidential information, or malware installers that
look innocent. At the beginning, most email spam contained only text messages.
Later, image-based spam email emerged to evade text-based spam filters. Image
spam includes the spammer's message in the form of an image. There is another form of
email spam called blank spam [5] that has no message inside the email. Blank spam
Mobile phone message spam, or SMS spam, refers to junk messages sent to mobile
phones through the short message service, similar to ordinary text messaging. It
inconveniences mobile phone users and may also incur a cost for the incoming message [5]. As
there are costs associated with SMS spam, it is less common than email spam.
Search engine spam refers to measures that try to affect the position of a website
after a query. When a website is detected as using search engine spam, that site is
marked and penalized. One survey shows that 51.3% of website hacks were related to
Social spam targets social networking websites like Facebook and Twitter. In
social spamming, one key factor is creating fake accounts in a social application
to hack into valid user accounts. These fake accounts are used to send bulk messages
with similar content or to send malicious links with the intent to harm. As social
networking sites became popular over time, social spamming activities like clickbaiting
Gaming spam means sending messages in bulk to players using common chat
rooms or public discussion areas. Spammers target users who like gaming to sell
method to avoid text-based spam filters. It is assumed that most image spam
is used to advertise products, deceive users to gain personal data, or deliver malicious
software [6]. It is more challenging to detect image spam as it involves various image
are used to create image spam, which include, but are not limited to, making the text
outlines blurry, using multiple image layers to construct an image, adding noise to
Image spam has evolved over time and takes several forms to bypass conventional
anti-spam techniques: text-only image, gray image, sliced image and randomized
image.
A text-only image is the first generation of image spam. It contains pure text
embedded into an image. Such an image looks like a regular text email but is actually
an image. A technique using optical character recognition (OCR) has been employed
to extract the text from images and pass it to the spam filters.
A gray image is difficult to detect as it is often mistaken for a ham image. Gray
images often look nearly identical to natural grayscale images. So, it is crucial to
A sliced image consists of multiple images merged together in a jigsaw-puzzle manner.
This type of spam image is challenging to detect because the combined images often
randomized image, spammers make changes to individual pixels in the image. As a
result, it becomes hard to distinguish the randomized image from the original image.
The changes made usually do not affect the appearance of the images to the users but
There are three categories of techniques used to detect spam images: header
• Header based techniques: An email header consists of data about the sender
and the receiver, i.e., sender IP, sender email address, date, from, to, etc.
can be used to train and test machine learning models to provide prediction
results [9].
• Content based techniques: Content based filters check the image portion of an
email for particular keywords which are usually found in the body section of
spam email. Typically, the body of an email carries the actual information to
determine the specific pattern. One early stage content based filter used OCR
to extract words from the text part of the image and pass them to text-based
filters [10].
image features like color properties and metadata features. It is based on the
idea that an image which has been simulated must have some distinguishing
Since the evolution of image spam, there has been ongoing research on its
detection. Machine learning algorithms play a useful role in this research area.
Moreover, several deep learning techniques are being deployed to provide robust
detection results.
Gao et al. [3] propose an image spam detection scheme that uses a probabilistic
boosting tree algorithm to predict whether a given image is spam. They execute
feature vectors for learning. In order to generate the training set, they use a k-means
clustering approach rather than randomly selecting spam images. Once the training
spam and ham images. For testing, 5-fold cross validation is performed. Their
Kumaresan et al. [11] propose detecting image spam based
on color features using the k-Nearest Neighbor (𝑘-NN) algorithm. They consider the
histogram properties of an image including RGB color histogram, HSV histogram and
this feature set. Their proposed method using the 𝑘-NN algorithm gives 94.5% accuracy
In research by Annadatha et al. [5], a support vector machine (SVM) is
applied to 21 image features. Each feature has an associated weight based on
how much it contributes to the classification. They classify images by performing
feature extraction over a selected subset of features. This feature selection method
reduces the computational effort. The experiment achieves a very high accuracy of
97% with an area under the curve (AUC) value of 1. From their research, it is observed
that SVM is a reasonable approach for image spam detection which actively learns
various image properties and achieves high accuracy with a low false positive rate.
Chavda et al. [4] conduct two sets of experiments with SVM and image processing.
In the first part, they extract 41 image features and achieve 97% and 98% accuracy
with two publicly available datasets, respectively. Moreover, they construct two new
challenge datasets based on those public datasets using image processing techniques
on spam images. In the second part of experiments, they evaluate two feature
Aiwan et al. [12] propose an image spam filtering method based on a convolutional
neural network (CNN). To detect image spam in real time, they train a convolutional
neural network using enlarged data samples. The proposed system using data aug-
spam filtering model to obtain 7% to 11% higher accuracy than that of the traditional
method.
CHAPTER 3
Deep Learning techniques for image spam classification. In the machine learning part, we
present an overview of SVM. In the deep learning section, we discuss feed forward
3.1 SVM
has been extensively used in detecting email spam [13] and image spam [14]. In
3.1.1 Overview
There are four key ideas of SVM algorithm [15], which is a useful technique for
hyperplane that classifies the data into their respective groups. In an ideal case,
all data belonging to one class remains in one side of the hyperplane and the
• Maximize the margin: For binary classification, we try to find an optimal
hyperplane, that is, one that divides the two sets of data with the maximum
margin of separation between each class and the hyperplane. The margin is
defined as the minimum distance between the hyperplane and the closest data
point in the training set. The optimal hyperplane is the particular one for which
assumed to be a linear function. But there are cases where data points are not
linearly separable. Hence, SVM transforms the input space data to a higher
• Kernel trick: The kernel trick uses a function that corresponds to transforming data into another
space. The kernel trick does not perform the actual transformations, yet it
Like other machine learning techniques, SVM works in two phases: a training phase
and a testing phase. In the training phase, we build a model which learns from a labeled
dataset. In the testing phase, we analyze the predictions of the generated model
in response to new data. These two phases can be summarized as follows:
are the corresponding set of classifications, where 𝑧𝑖 ∈ {−1, 1}, training an SVM model
1. Transforming data points (input space) to a high dimensional feature space which
This is called solving Lagrangian Duality [15]. The purpose is to find out an
optimal separating hyperplane which will clearly divide the feature space into
two sets.
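As a hedged illustration of how a trained kernel SVM separates the feature space, the sketch below evaluates a decision function of the form f(x) = Σ α_i z_i K(x_i, x) + b with an RBF kernel, where z_i ∈ {−1, 1} are the class labels. The support vectors, the multipliers in `ALPHAS`, the bias, and `gamma` are made-up values chosen for illustration only; in practice they result from solving the dual problem on the training data.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, labels, alphas, bias, gamma=0.5):
    """Kernelized decision value: sum_i alpha_i * z_i * K(x_i, x) + bias."""
    total = sum(
        alpha * z * rbf_kernel(sv, x, gamma)
        for sv, z, alpha in zip(support_vectors, labels, alphas)
    )
    return total + bias


def svm_predict(x, support_vectors, labels, alphas, bias, gamma=0.5):
    """Predicted class in {-1, 1}: the sign of the decision value."""
    value = svm_decision(x, support_vectors, labels, alphas, bias, gamma)
    return 1 if value >= 0 else -1


# Hypothetical toy model (not learned from data): one support vector per class.
SUPPORT_VECTORS = [(0.0, 0.0), (2.0, 2.0)]
LABELS = [-1, 1]
ALPHAS = [1.0, 1.0]
BIAS = 0.0
```

Note that only kernel evaluations between pairs of points appear in the code; the high dimensional feature space is never constructed explicitly.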
3.1.2.2 Testing Phase
In this phase, the model is evaluated based on its accuracy for the test dataset.
process composite data inputs through supervised learning. A neural network consists
of a collection of nodes, and these nodes model neurons that perform activation to
transform input data into output. Neural networks are used in image classification,
and there is much ongoing research on image classification using deep
learning [16].
A feed forward neural network using back propagation is an artificial neural network
used in classification and regression. It was the first and simplest artificial neural
network designed. A feed forward neural network with back propagation is used for
classification of image spam using the optimal feature vectors extracted from an
Multi layer perceptron (MLP) is a type of neural network consisting of multiple
Each neuron at every layer has a directed interconnection with the nodes in the
following layer. Multilayer neural networks use various learning techniques, and back
propagation is the most popular one. An MLP has three layers at minimum: an input
layer, a hidden layer and an output layer. Other than the input nodes, each node is a
neuron that uses a non-linear activation function. Sigmoid function is the commonly
used activation function which can be formulated as
$$S(x) = \frac{1}{1 + e^{-x}} = \frac{e^{x}}{e^{x} + 1}.$$
Figure 1: Topology of Multilayer Perceptron
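The two forms of the sigmoid function can be checked numerically with a minimal sketch such as the following. The function names are illustrative, and the naive formulas are only suitable for moderate values of x, since `math.exp` overflows for very large arguments.

```python
import math


def sigmoid(x):
    """Sigmoid in the form S(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_alt(x):
    """Equivalent form S(x) = e^x / (e^x + 1)."""
    ex = math.exp(x)
    return ex / (ex + 1.0)


# Compare both forms on a few moderate inputs.
SAMPLE_INPUTS = [-3.0, -1.0, 0.0, 2.0, 5.0]
MAX_DIFFERENCE = max(abs(sigmoid(x) - sigmoid_alt(x)) for x in SAMPLE_INPUTS)
```

The output always lies strictly between 0 and 1, with S(0) = 0.5, which is why the function is convenient as a score for binary classification.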
In back propagation, the output scores are compared with the correct answer to
calculate the value of an error function, and this error value is then fed backwards
through the layers of the network. With the help of this information, the algorithm
adjusts the weight at each connection to reduce the error value. After this process
is repeated for a number of iterations, the model converges to a state where the
value of the error function is very small or the change in error value is insignificant. Figure 1
Generally, neural networks use fully connected layers, that is, all neurons at one
layer are connected to all neurons in the next layer. A fully connected layer can deal
effectively with correlations between any points within the training vectors, regardless
of whether those points are close together, far apart, or somewhere in between.
In contrast, CNNs are designed to deal with local structure; a convolutional layer
cannot be expected to perform well when crucial information is not local. The benefit
of a CNN is that convolutional layers can be trained much more efficiently than fully
connected layers.
Figure 2: Schematic Representation of CNN Architecture
For images, most of the important structure (edges and gradients, for example)
is local. Hence, CNNs would seem to be an ideal tool for image analysis and, in fact,
CNNs were developed for precisely this problem. However, CNNs have performed
well in a variety of other problem domains. In general, any problem for which there
CNN. In addition to images, local structure is key in the fields of text analysis and
A CNN consists of an input layer, multiple hidden layers and an output layer.
Each layer has the common property that it transforms an input to an output with
the help of some function that may or may not have parameters. The hidden layers
consist of a convolution layer followed by a pooling layer and then again convolution and
layer is a primary building block of a CNN which does most of the heavy computational
tasks. It implements a convolution operation on the input and the result is passed
to the next layer. This layer essentially computes the output values of the neurons
which are connected to the local regions of inputs. The computation involves a dot
product between the weights of the neurons and a small region in the input volume
they are connected to. Figure 2 shows a symbolic representation of CNN architecture.
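The dot product between a filter's weights and each local region of the input can be sketched as follows. This is a minimal single-channel illustration assuming stride 1 and no padding; the toy image and the vertical-edge kernel are hypothetical values, not filters learned in this project. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and take the
    dot product of the kernel weights with each local region."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


# Toy 4x4 input with a vertical edge between the second and third columns.
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 3x3 filter that responds to vertical edges.
KERNEL = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
FEATURE_MAP = conv2d_valid(IMAGE, KERNEL)
```

A 3 × 3 kernel applied to a 4 × 4 input in this way yields a 2 × 2 feature map, which shows how each output value depends only on a small local region of the input.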
The pooling layer tries to reduce the number of parameters and amount of computation
spatial size of the network. There are two types of pooling operations, average pooling
and max pooling. Max pooling is commonly used in CNN applications. It performs
rest. The last layer is a fully connected layer where the neurons are fully connected
CHAPTER 4
Experiments
This chapter presents the empirical analysis and results of our experiments. We
discuss the datasets used and the evaluation criteria, followed by the experiments
and results.
accurately the model has classified a spam as spam and a ham as ham. True positives
(TP) are the number of positive samples correctly identified as positive. False positives
(FP) are the number of negative samples incorrectly identified as positive. True negatives
(TN) are the number of negative examples labeled as negative, and false negatives (FN)
are the number of positive
and FN as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
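The accuracy formula translates directly into code. The following is a minimal sketch; the confusion-matrix counts in the example are invented for illustration and are not results from this project.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    return (tp + tn) / total


# Hypothetical counts: 90 TP, 80 TN, 10 FP, 20 FN out of 200 samples.
EXAMPLE_ACCURACY = accuracy(tp=90, tn=80, fp=10, fn=20)
```

With these made-up counts, 170 of the 200 samples are classified correctly, giving an accuracy of 0.85.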
In machine learning, performance evaluation is a crucial task. When a classifica-
tion problem is considered, we can count on the AUC of an ROC curve to quantify
(ROC) curve is a probability curve which is used to compute the area under the curve
(AUC) value. The higher the AUC value, the better the model is at distinguishing
between spam and ham images. The ROC curve is plotted as the true positive
rate (TPR) against the false positive rate (FPR) at various threshold settings. Figure 3
shows an ROC curve with a shaded area. The area of the shaded section is computed to
determine the AUC value. This value generally lies between 0.5 and 1. Having an
Figure 3: ROC Curve with Shaded Area
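To make the relationship between the ROC curve and the AUC concrete, the sketch below sweeps a threshold over classifier scores to obtain (FPR, TPR) points and then integrates them with the trapezoidal rule. The scores and labels used in the example are invented, and the code assumes that both classes are present and that label 1 denotes the positive class.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering a threshold over the scores.
    Assumes labels are 0 or 1 and that both classes occur at least once."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores: a perfect ranking and an imperfect one.
PERFECT = auc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0]))
IMPERFECT = auc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

A ranking that places every positive sample above every negative one gives an AUC of 1, while mixing the two classes lowers the area.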
RAM. We use Python for generating the learning models, OpenCV for image processing
for mathematical functions and TensorFlow libraries for deep learning training and
testing.
4.3 Dataset
Not many image datasets are available to the public due to privacy issues. We
use one public dataset which contains actual spam and ham emails exchanged in
real time. Moreover, we conduct our experiments on two other datasets generated to
4.3.1 Dataset 1
The dataset was developed by the authors of Image Spam Hunter [3] from Northwestern
University. The dataset contains 920 spam images and 810 ham images. All
4.3.2 Dataset 2
This dataset was created by Chavda et al. [4] using image processing techniques
on spam images to make them appear more like ham images. A public corpus named
Spam Archive [18] consists of only spam images. They use this corpus and a
weighted overlay technique to blend those spam images onto the ham images from
dataset 1.
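A weighted overlay of this kind can be sketched as a per-pixel weighted sum of the two images. The code below is only an illustration of the general idea: the weight `alpha` and the pixel values are hypothetical, and the exact weights and procedure used in [4] are not given here.

```python
def weighted_overlay(spam_pixels, ham_pixels, alpha=0.5):
    """Blend two equally sized images given as flat lists of 0-255 values:
    result = alpha * spam + (1 - alpha) * ham."""
    if len(spam_pixels) != len(ham_pixels):
        raise ValueError("images must have the same number of pixel values")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        int(round(alpha * s + (1.0 - alpha) * h))
        for s, h in zip(spam_pixels, ham_pixels)
    ]


# Hypothetical pixel values; alpha = 0.25 keeps the ham image dominant.
BLENDED = weighted_overlay([0, 200, 100], [200, 200, 60], alpha=0.25)
```

A smaller `alpha` lets the ham image dominate the result, which is the sense in which the blended spam image is made to look more like ham.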
4.3.3 Dataset 3
This dataset was also developed by Chavda et al. [4] by using a different overlay
technique. For this dataset, the background of spam images was deleted and the
resulting image was then overlaid onto a ham image. This makes the spam text easier
to read, as compared to dataset 2, and according to the results in [4], also makes for a
We use byte data to construct our feature vectors. In the datasets, we observe
that the images are of different sizes. Hence, to maintain consistency, we resize all of
the images to 32 × 32 pixels. To build the feature matrix, we generate byte
data for each pixel in an image. Each pixel is stored in three bytes, one each for the
red, green and blue (RGB) color channels, with each byte in the range from 0 to 255.
For computational convenience, each number is mapped into the range of 0 to 1 for
the resized matrix. Thus, the feature matrix consists of byte information features for
each raw and Canny image, where each feature vector has 3072 components.
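The feature construction described above amounts to flattening the RGB bytes of a resized image and scaling them to the range 0 to 1. The sketch below assumes the image has already been resized to 32 × 32 and is available as a list of (R, G, B) tuples; the uniform gray image is a hypothetical stand-in for real data, and dividing by 255 is one straightforward way to perform the mapping.

```python
WIDTH = 32
HEIGHT = 32


def pixel_features(pixels):
    """Flatten (R, G, B) byte triples into one vector scaled to [0, 1]."""
    return [channel / 255.0 for pixel in pixels for channel in pixel]


# Hypothetical stand-in for a resized image: every pixel is mid-gray.
EXAMPLE_IMAGE = [(128, 128, 128)] * (WIDTH * HEIGHT)
EXAMPLE_FEATURES = pixel_features(EXAMPLE_IMAGE)
```

With 32 × 32 pixels and three bytes per pixel, the resulting vector has 32 × 32 × 3 = 3072 components.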
In the next phase, we transform each raw image into a Canny image by applying the
Canny edge detection technique. We then merge each raw image and the corresponding
Canny image to form a new image. This new image has a dimension of 64 × 32, and each
feature vector contains 6144 features from the combination of raw and Canny image
features. We use raw, Canny and the combination of these two feature vectors to train
our models. Figure 4 shows a visual representation of the feature generation process
we propose and use in our project. On the left side of the diagram, we transform a
raw ham image into a Canny ham image, followed by resizing these two images to
the same dimension and lastly making the combined feature vector. The right side of
the diagram depicts the same procedure that we follow for a spam image.
Figure 4: Feature Generation (Raw, Canny and Combination of Raw and Canny)
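Combining the two feature vectors can be sketched as a simple concatenation. This assumes both vectors are flattened in row-major order, in which case placing the raw vector before the Canny vector corresponds to stacking the raw image on top of the Canny image; the constant vectors below are hypothetical placeholders for real features.

```python
def combine_features(raw_vector, canny_vector):
    """Concatenate the raw and Canny feature vectors of one image."""
    if len(raw_vector) != len(canny_vector):
        raise ValueError("raw and Canny vectors must have the same length")
    return list(raw_vector) + list(canny_vector)


# Hypothetical placeholder vectors with 3072 components each.
RAW_VECTOR = [0.5] * 3072
CANNY_VECTOR = [0.0] * 3072
COMBINED = combine_features(RAW_VECTOR, CANNY_VECTOR)
```

Two vectors of 3072 components each give the 6144 features of the combined representation.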
4.5 Results
We conduct our experiments with SVM, feed forward neural network and convo-
lutional neural network. This section contains experimental details and results.
For our experiments, we generate separate SVM models for each of the three
datasets. In each dataset, we perform a random shuffle and use 70% of the image
samples for training and the remaining 30% for testing. In all of these SVM experiments
Table 1 shows the accuracy of the SVM when trained and tested on Dataset 1,
using the raw images resized to 32 × 32. When using the RBF kernel, we achieve
an accuracy of 0.9748, which is better than the 0.9156 accuracy with the linear kernel.
For comparison, we also build another SVM based on the Canny images. In this case,
the accuracy drops to 0.901 and 0.8492 for the RBF and linear kernels, respectively.
We observe that for the SVM, the results for the raw images exceed those for the
Canny images.
4.5.1.1 Dataset 1
32 × 32    Raw       Canny
RBF        0.9748    0.901
Linear     0.9156    0.8492
Next, we give results for an analogous set of experiments, but with the images
resized to 16 × 16, giving us feature vectors of length 768. Here, we do slightly better
than the 32 × 32 case when using the RBF kernel, but worse for the linear kernel.
16 × 16    Raw       Canny
RBF        0.9752    0.9048
Linear     0.8838    0.7861
We conduct our experiments on datasets 2 and 3 using both raw and Canny
images. Table 3 shows the results for dataset 2. From the results we observe that for
both raw and Canny image features, we achieve higher accuracies with the RBF kernel, while
with the linear kernel the accuracies are below 0.50. As these challenge datasets were
these results are not unexpected. Table 4 shows the accuracies obtained for dataset 3.
Since we have tested each dataset on raw and canny image features individually,
in the next phase, we build another SVM model on combined raw and canny image
byte features. Table 5 presents the results from our experiments on the three datasets.
We tune the model with RBF and linear kernels. Our experiment achieves slightly better
accuracy for dataset 1 with the RBF kernel. For dataset 2, our model yields an accuracy of
0.7265 with the RBF kernel, which is better than the 0.6939 obtained using the linear kernel.
In contrast, for dataset 3, our SVM model performs better when we tune it with the
linear kernel, yielding an accuracy of 0.7183 against 0.6896 with the RBF kernel. From the
results, it is observed that for dataset 1, the SVM performs better on the combined
features than on raw or Canny features individually. For dataset 2, this is not the
case: our proposed SVM approach provides better results
with raw image features alone. For dataset 3, the combined features yield higher accuracy
than training with the individual features.
Figure 5 panels: (a) Dataset 1 (RBF) (b) Dataset 2 (RBF) (c) Dataset 3 (Linear)
Figure 5 gives the ROC curves for the best SVM result for each dataset, based
on the combined (raw and Canny) features. As given in Table 5, the corresponding
AUC values are 0.9872 for Dataset 1, 0.7265 for Dataset 2, and 0.7183 for Dataset 3.
To experiment with a multilayer perceptron (MLP) for classifying spam and ham
images, we explore several architectures. The results reported here are for an MLP
with one input layer, two hidden layers and one output layer. For each 64 × 32
image, the input layer consists of 6144 nodes. Moreover, each hidden layer has 300
nodes and uses the rectified linear unit (ReLU) as its activation function. To measure the
loss, we use the binary cross entropy loss function. A sigmoid score function is used
at the output stage. Our MLPs are trained on 70% of the image samples, and the
models are trained for 100 epochs. At each epoch, a batch size of 64 is used, and the
validation split is taken as 15% of the image data samples. We use early
stopping to avoid overtraining the MLP. In addition, a dropout value of 0.5 is used
Figure 6: Proposed MLP Architecture
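The forward pass of such an MLP (ReLU hidden layers, a sigmoid output, and binary cross entropy as the loss) can be sketched in a few lines. The example below uses a tiny network with two inputs and two hidden layers of two nodes each, with hand-picked weights, purely for illustration; the network described above has 6144 inputs and 300 nodes per hidden layer, and its weights are learned by back propagation. Dropout and early stopping are omitted from this sketch.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output node."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(inputs, layers):
    """ReLU on every hidden layer, sigmoid on the single output node."""
    hidden = inputs
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    out_weights, out_biases = layers[-1]
    return sigmoid(dense(hidden, out_weights, out_biases)[0])


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Loss for one sample; probabilities are clipped to avoid log(0)."""
    p = min(max(y_prob, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


# Hypothetical hand-picked weights for a 2-2-2-1 network.
TOY_LAYERS = [
    ([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0]),
    ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
TOY_OUTPUT = mlp_forward([1.0, 2.0], TOY_LAYERS)
TOY_LOSS = binary_cross_entropy(1, TOY_OUTPUT)
```

For this toy input the two hidden activations cancel at the output node, so the sigmoid score is 0.5 and the loss for a positive label is ln 2.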
We achieve an accuracy of 0.96 after testing the MLP on the remaining 30% of the
images. Figure 7a shows the MLP model accuracy over 100 epochs. The loss graph in
Next, we conduct similar experiments using the MLP over datasets 2 and 3. The
analogous MLP accuracy and loss curves are given in Figures 7c and 7d. Once the
iteration stops, we observe that there is a big difference between the training and
graph because the difference between training and test loss becomes very small after
Figure 7e presents training and validation accuracy over dataset 3. The model
iterates through 21 epochs. It is visible that when the model stops training, the test
accuracy is less than the training accuracy. Figure 7f shows model loss where the
Figure 7: MLP Results, panels (a) Accuracy Dataset 1 and (b) Loss Dataset 1
Table 6 presents the optimal testing accuracies for the MLP experiments summarized
above. In comparison to the SVM results, we see that the MLP fails to
outperform the SVM on any of the three datasets. Also, on Dataset 2, the MLP is
Table 6: MLP Accuracy
Dataset      Accuracy
Dataset 1    0.9557
Dataset 2    0.5885
Dataset 3    0.6605
ciency and accuracy—for image analysis. As with the SVM and MLP experiments
discussed above, we apply CNNs to each of the three datasets under consideration.
We experimented with various CNN hyperparameters, but for all of the experiments
reported here, we use the following configuration. We use three convolution
layers following the input layer. The first two convolution layers have 32 filters each. The
third layer has 64 filters. Each layer has a kernel size of 3 × 3. We downsample the data
via a max pooling layer, using a 2 × 2 pool size. From the last pooling layer, 768
input features are derived and flattened, which are fed to a hidden layer containing 64
nodes. We use the ReLU activation function to activate a subset of inputs from the previous
layer. Finally, the hidden layer is fully connected to the output layer consisting of one
node. At this layer, we use the sigmoid activation function and the cross entropy loss function.
To avoid overfitting, we use a dropout rate of 0.5. The batch size is set to 64 for
each epoch, and we have a total of 100 epochs. As with our MLP experiments, we
use 70% of the data for training and 30% for testing. Figure 8 represents the CNN
Figure 8: CNN Architecture
The accuracy and loss graphs for Dataset 1 are given in Figures 9a and 9b, and
clearly show that overfitting does not occur. The analogous graphs for Dataset 2
appear in Figures 9c and 9d, while the results for Dataset 3 can be found in Figures 9e
and 9f. From the dataset 2 accuracy graph, we observe that the model iterates through 21
epochs, and when the model converges we obtain an accuracy of 0.8313. In addition,
the loss graph exhibits no overfitting or underfitting in the model. The accuracy graph for
dataset 3 shows that once the model converges there is a significant difference between
training and test accuracies. The corresponding loss graph suggests that the model is
overfitting the data in this dataset, as the difference between validation and training
loss is notable.
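The figure of 768 flattened features in the CNN configuration described earlier in this section can be checked with a short shape calculation. The sketch below rests on several assumptions that are not stated explicitly in the text: the input is the 64 × 32 combined image, each 3 × 3 convolution uses stride 1 with no padding, a 2 × 2 max pooling layer follows every convolution layer, and the last convolution layer has 64 filters.

```python
def conv_out(size, kernel=3):
    """Output size of a stride-1 convolution without padding."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output size of non-overlapping pooling (any remainder is dropped)."""
    return size // pool


def flattened_features(height, width, filters_per_layer, kernel=3, pool=2):
    """Number of values after the last pooling layer, assuming each
    convolution layer is followed by one pooling layer."""
    for _ in filters_per_layer:
        height = pool_out(conv_out(height, kernel), pool)
        width = pool_out(conv_out(width, kernel), pool)
    return height * width * filters_per_layer[-1]


# Assumed configuration: 64 x 32 input, filter counts 32, 32 and 64.
FLATTENED = flattened_features(64, 32, [32, 32, 64])
```

Under these assumptions the spatial size shrinks from 64 × 32 to 6 × 2, and 6 × 2 × 64 = 768, which agrees with the number of features fed to the hidden layer.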
The optimal CNN testing accuracies for the three datasets under consideration
are given in Table 7. From these results, we see that our CNN outperforms both
the SVM and MLP on Datasets 1 and 2, and does nearly as well as the SVM on
Dataset 3.
Figure 9: CNN Results, panels (a) Accuracy Dataset 1 and (b) Loss Dataset 1
Table 7: CNN Results
Dataset      Accuracy
Dataset 1    0.9902
Dataset 2    0.8313
Dataset 3    0.6769
Figure 10: Cold Start Results - SVM
Next, we evaluate the three models, namely, SVM, MLP and CNN, in the
"cold start" case, that is, the case where the training data is limited. We
start our experiment with just 10 samples for training, and we gradually increase the
number of samples used to train the models. Every result reported in this section is
based on 10 separate experiments, with the training data randomly selected for each
experiment. For a specific number of samples, we plot the maximum accuracy from
the 10 iterations.
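The cold start protocol described above can be sketched as a loop over training-set sizes, with several random draws per size and the maximum accuracy kept. In the sketch below, the one-dimensional toy data and the threshold classifier are hypothetical stand-ins for the image features and the SVM, MLP and CNN models, and testing on the samples left over after each draw is an assumption, since the text does not specify the test set used in these experiments.

```python
import random


def cold_start_curve(data, sizes, train_and_score, repeats=10, seed=0):
    """For each training-set size, run `repeats` random draws and keep
    the maximum test accuracy."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        best = 0.0
        for _ in range(repeats):
            shuffled = data[:]
            rng.shuffle(shuffled)
            train, test = shuffled[:n], shuffled[n:]
            best = max(best, train_and_score(train, test))
        curve[n] = best
    return curve


def threshold_classifier(train, test):
    """Hypothetical stand-in model: classify a 1-D value as positive when it
    exceeds the midpoint between the two class means of the training set."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    if not pos or not neg:
        return 0.0  # cannot fit a threshold from a single class
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    correct = sum(1 for x, y in test if (x > cut) == (y == 1))
    return correct / len(test)


# Toy data: class 0 values lie in [0.00, 0.39], class 1 values in [0.60, 0.99].
TOY_DATA = [(i / 100, 0) for i in range(40)] + [(i / 100, 1) for i in range(60, 100)]
CURVE = cold_start_curve(TOY_DATA, [10, 20, 40], threshold_classifier)
```

Plotting the keys of `CURVE` on the x-axis against its values on the y-axis gives a chart of the same kind as the cold start figures.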
We generate each chart by plotting the number of samples on the x-axis and accuracy
on the y-axis. The chart in Figure 10 shows the accuracies from experiments with
SVM. Figures 11 and 12 give the accuracies of our cold start experiments for MLP
From the experiments, it can be observed that SVM performs well in the cold start
case, as the accuracies are very high for the first few sets of samples. Once we
train the SVM model with 200 samples or more, accuracy drops a little and
the contrary, MLP and CNN models seemingly do not learn adequately when the
number of training samples is limited to 100 or fewer. Once the models
are trained with 200 samples or more, accuracy rises above 0.90 and the graph stays
Figure 12: Cold Start Results - CNN
of 0.9872 while for challenge datasets (dataset 2 and dataset 3) it yields 0.7885
and 0.7183, respectively. As these two datasets were generated to challenge existing
detection methods, it is intuitive that the proposed technique would not yield as high
accuracy as the results obtained for dataset 1. In the next set of experiments, we
explore MLP for the three datasets considered in our project. Our proposed MLP
approach provides 0.9557 accuracy for dataset 1 while for the two challenge datasets,
the learning rate does not improve and we achieve 0.5885 and 0.6605 accuracies,
respectively.
Figure 13: Comparison of Learning Techniques
images. For dataset 1, CNN gives the best accuracy score of 0.9902 among the three
proposed techniques. When we train the CNN model with images from challenge dataset 1,
it is evident from the results that the model learns competently and gives 0.8313
accuracy, which is better than the SVM and MLP results. On the other hand, the CNN
experiment for challenge dataset 2 does not yield an accuracy as good as the SVM model,
and we obtain an accuracy of 0.6769. Figure 13 shows the comparative analysis of the
We also compare the results of this research with those of
previous research in image spam detection. For dataset 1, we consider the work of
Chavda et al. [4] and Annadatha et al. [5]. We refer to the work in this paper as
Research 1, and the work in [4] and [5] as Research 2 and Research 3, respectively.
From Figure 14, we see that for dataset 1, the highest accuracy previously achieved was 0.97, while
detection schemes. For challenge dataset 1, their proposed technique achieves a best
accuracy of 0.79, while our CNN approach yields a higher accuracy of 0.8313. Figure 15
presents this analysis. For challenge dataset 2, the SVM technique with combined
features proposed in this research performs slightly better than the approach in
Research 2, with an accuracy of 0.7183 where the highest accuracy they achieved was
Figure 14: Comparison to Previous Work (Dataset 1)
Figure 16: Comparison to Previous Work (Challenge Dataset 2)
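The accuracies reported in this chapter can be collected into a small lookup to make the per-dataset comparison explicit. The following is a minimal Python sketch, not code from the project. The dictionary layout and the function name `best_technique` are illustrative, and the sketch assumes that "dataset 2" and "dataset 3" in the SVM and MLP results correspond to challenge dataset 1 and challenge dataset 2, respectively.

```python
# Accuracies as reported in the text of this chapter.
# Assumption: "dataset 2" / "dataset 3" in the SVM and MLP discussion are
# challenge dataset 1 / challenge dataset 2, respectively.
ACCURACY = {
    "dataset 1": {"SVM": 0.9872, "MLP": 0.9557, "CNN": 0.9902},
    "challenge dataset 1": {"SVM": 0.7885, "MLP": 0.5885, "CNN": 0.8313},
    "challenge dataset 2": {"SVM": 0.7183, "MLP": 0.6605, "CNN": 0.6769},
}


def best_technique(scores):
    """Return (name, accuracy) of the highest-scoring technique."""
    name = max(scores, key=scores.get)
    return name, scores[name]


for dataset, scores in ACCURACY.items():
    name, acc = best_technique(scores)
    print(f"{dataset}: best = {name} ({acc:.4f})")
```

Under the stated assumption, the sketch selects CNN for dataset 1 and challenge dataset 1, and SVM for challenge dataset 2, which matches the discussion in this chapter.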
CHAPTER 5
Conclusion
Throughout the evolution of electronic communication, spam has been a challenging
problem for the cyber world. Hence, it requires substantial attention and
different techniques to detect image spam. Several machine learning techniques have
In this research, we have analyzed three novel approaches for image spam filtering.
One approach experiments with Support Vector Machines, and the other
two methodologies employ deep learning techniques, namely feed forward neural
normalized byte blocks of images. The neural network models do not require manual
feature extraction, so we build the deep learning models by splitting the image data
into 70% for training and the remaining 30% for testing. We evaluate model accuracy
over multiple iterations while tuning several parameters. We also plot model
loss on the training and validation data to check for overfitting. We
successfully deploy our models for binary classification of image spam with supervised
learning.
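The 70%/30% split and the loss-based overfitting check described above can be sketched as follows. This is a minimal, library-free Python illustration rather than the code used in the project. The function names `split_70_30` and `first_overfitting_epoch`, the fixed random seed, the `patience` parameter, and the sample file names and loss values are all hypothetical.

```python
import random


def split_70_30(samples, seed=0):
    """Shuffle a copy of samples and split it 70% train / 30% test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def first_overfitting_epoch(train_loss, val_loss, patience=2):
    """Return the first epoch (0-based) of a run of `patience` consecutive
    epochs in which training loss falls while validation loss rises,
    or None if no such run exists."""
    run = 0
    for epoch in range(1, len(train_loss)):
        diverging = (train_loss[epoch] < train_loss[epoch - 1]
                     and val_loss[epoch] > val_loss[epoch - 1])
        run = run + 1 if diverging else 0
        if run == patience:
            return epoch - patience + 1
    return None


# Hypothetical labeled samples: (file name, label), 1 = spam, 0 = ham.
data = [(f"img_{i}.jpg", i % 2) for i in range(10)]
train, test = split_70_30(data)
print(len(train), len(test))

# Made-up loss curves: validation loss turns upward from epoch 3.
train_curve = [0.70, 0.50, 0.40, 0.30, 0.25, 0.20]
val_curve = [0.72, 0.55, 0.48, 0.50, 0.53, 0.57]
print(first_overfitting_epoch(train_curve, val_curve))
```

The check flags a sustained divergence between the two curves, which is the pattern one would look for by eye in a plot of training and validation loss. The choice of two consecutive epochs is arbitrary and would need tuning for real training runs.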
demonstrates the effectiveness of the proposed approaches. From the results, we observe
that the CNN-based model achieves higher accuracy on the public dataset and one of
the challenge datasets used in the project. As CNN employs convolutions to extract
has been proved to be the most efficacious method in image spam classification.
Future work may include, but is not limited to, exploring more features related
to edges, which may lead to new and improved directions for the SVM approach. In
addition, further research can explore other deep learning techniques
such as RNN and LSTM. Moreover, additional tuning of the hyperparameters and the
architecture may yield more insight into the deep learning models. Besides,
our proposed system can be extended to other image classification problems such as
stop signs, etc.) in images. Deep neural networks are capable of uncovering
the latent structure in unlabelled data. Hence, training neural networks on image
LIST OF REFERENCES
[5] A. Annadatha and M. Stamp, "Image spam analysis and detection," J. Computer
Virology and Hacking Techniques, vol. 14, no. 1, pp. 39–52, 2018. [Online].
Available: https://doi.org/10.1007/s11416-016-0287-x
[6] S. Dhanaraj and V. Karthikeyani, "A study on e-mail image spam filtering
techniques," in 2013 International Conference on Pattern Recognition, Informatics
and Mobile Engineering, Feb 2013, pp. 49–55.
[7] "Report: 51% of web site hacks related to seo spam,"
https://searchengineland.com/report-51-of-web-site-hacks-related-to-seo-spam-313468,
accessed on March 7, 2019.
[9] M. Hassan, W. Mirza, and M. Hussain, "Header based spam filtering using
machine learning approach," October 2017.
Engineering, vol. 8, no. 10, pp. 1904–1907, 2014. [Online]. Available:
http://waset.org/publications/10000193
[12] F. Aiwan and Y. Zhaofeng, "Image spam filtering using convolutional neural
networks," Personal Ubiquitous Comput., vol. 22, no. 5-6, pp. 1029–1037, Oct.
2018. [Online]. Available: https://doi.org/10.1007/s00779-018-1168-8
[13] T. Yu and W. Hsu, "E-mail spam filtering using support vector machines with
selection of kernel function parameters," in 2009 Fourth International Conference
on Innovative Computing, Information and Control (ICICIC), Dec 2009, pp.
764–767.