
WSEAS TRANSACTIONS on MATHEMATICS, DOI: 10.37394/23206.2022.21.19

Comparison Analysis of Traditional Machine Learning and Deep Learning Techniques for Data and Image Classification

EFSTATHIOS KARYPIDIS 1, STYLIANOS G. MOUSLECH 1, KASSIANI SKOULARIKI 2, ALEXANDROS GAZIS 1,*

1 Democritus University of Thrace, Department of Electrical and Computer Engineering, Xanthi, 67100, GREECE
2 Democritus University of Thrace, Department of Production Engineering and Management, Xanthi, 67100, GREECE
Abstract — The purpose of this study is to analyse and compare the most common machine learning and deep learning techniques used for computer vision 2D object classification tasks. Firstly, we present the theoretical background of the Bag of Visual Words model and Deep Convolutional Neural Networks (DCNN). Secondly, we implement a Bag of Visual Words model and the VGG16 CNN architecture. Thirdly, we present our custom, novel DCNN and test the aforementioned implementations on a modified version of the Belgium Traffic Sign dataset. Our results showcase the effects of hyperparameters on traditional machine learning and the advantage, in terms of accuracy, of DCNNs compared to classical machine learning methods. As our tests indicate, our proposed solution can achieve similar - and in some cases better - results than existing DCNN architectures. Finally, the technical merit of this article lies in the presented computationally simpler DCNN architecture, which we believe can pave the way towards using more efficient architectures for basic tasks.

Keywords - Computer Vision, Multi-Class Classification, K-Means Clustering, Feature Vectors, Linear/Non-Linear Classifiers,
Deep Learning, Convolutional Neural Networks, Data Augmentation, Feature Maps (Feature Learning), Transfer Learning

Received: May 28, 2021. Revised: January 30, 2022. Accepted: February 23, 2022. Published: March 23, 2022.

1. Introduction
In our modern day and age, the amount of data provided daily by individuals and businesses is rapidly increasing due to new technological advancements. The use of the Internet of Things, and especially Machine-to-Machine communication channels, has helped create a large interconnected network of computing devices. Specifically, the increasing use of mobile devices equipped with a large variety of sensors, various everyday embedded devices, and everyday tasks such as web browsing provide abundant information that is stored in the "cloud," i.e. in remote repositories. These data are then later accessed via Big Data infrastructures that propose methods to optimally extract, transform, and load this information into advanced systems capable of mining these data points. The outcome of these processes is to fuse the data streams in real time and implement techniques harnessing the power of Artificial Intelligence (AI) to provide valuable data insights categorised into descriptive, diagnostic, predictive, and/or prescriptive results [1].

Additionally, AI is a field that shows great potential: due to Moore's law, which states that computing processing power increases yearly while its cost decreases, we are now capable of handling data points generated on the spot rapidly and, most importantly, reliably. Analytically, AI is not a new term, as one of the first mathematical models implementing a biological neural network (BNN) was presented in 1943 [2]. This publication showcased how an Artificial Neural Network (ANN) can emulate a BNN using mathematical formulas consisting of parameters such as weights, bias, and activation functions [3], [4]. Furthermore, besides data classification tasks, this concept was later used as a stepping stone to enhance data insights in the field of computer vision, as it introduced detailed object classification and recognition in image/video datasets. Lastly, recent advances in the field of deep learning ANNs [5], [6], [7] focus on data classification and image recognition tasks via deep learning techniques [8], [9] in several fields. Most notably this includes e-commerce [10], finance [11], humanitarian aid [12], education [13], healthcare [14], and ecological informatics [15].


In this article we will focus on the last part of the above-mentioned process, presenting a detailed comparison of recent ML and deep learning techniques. Specifically, the outline of our paper is as follows: firstly, we will present the aims and objectives of our study. Secondly, we will briefly discuss some of the most widely used ML data and image classification techniques in the industry. Thirdly, we will present and focus on ML and Deep Learning solutions specialising in classification problems. Fourthly, we will present a detailed comparison analysis between these traditional ML and Deep Learning techniques. Fifthly, we will discuss our comparative results and suggest a novel ANN that achieves high accuracy of over 90% on a benchmark dataset. Finally, we will draw conclusions and discuss how this publication can be used in future works.

2. Aims and Objectives
This article aims to address the evaluation of some of the most widely used techniques regarding data and image classification. Specifically, our tests are twofold:
1. ML techniques: using the bag of visual words model (BOVW), K-nearest neighbours and support vector machine algorithms
2. Deep Learning techniques: using deep convolutional neural networks, i.e., a pre-trained model and the proposed ANN
Additionally, using the above-mentioned methods, we aim to address the following:
1. Provide information regarding some of the most widely used methods for data/image classification
2. Present a comparative study of ML and Deep Learning based solutions
3. Explain the architecture and the mathematical parameters of these methods
4. Suggest a novel CNN architecture for image classification
Lastly, the technical novelty of this article lies not only in presenting a comparative study between traditional ML and Deep Learning techniques, but in suggesting a new CNN that achieves accuracy levels of slightly over 90% - and in some cases higher - similarly to the most recent scientific advances in the field.
3. Background and Methods
3.1 Defining the Problem
In machine vision, some of the most common problems mainly occur due to the following reasons: difficulties in recognising objects from different angles, differences in lighting, volatile rotation speeds of objects, rapid changes in scaling, and generic intraclass variations.
In the last decade, due to increasing computational power, the rise of cloud computing infrastructures, and advances in hardware acceleration techniques (either using GPUs or remote data centres), 2D object recognition research has rapidly increased. Specifically, recent research suggests that these challenges do not pose a computational problem [16], [17], [18], [19], [20]. As a result, recent research has shifted its efforts to provide innovative solutions that take into consideration a trade-off between optimal accuracy and low-power, low-cost methods. This means that emphasis must be given to studying, understanding, and ameliorating existing computational mathematics approaches and methods.
In the following sections, we will present the 2 algorithms that we have studied using both ML and Deep Learning solutions. Analytically, the first focuses on ML, extracting key features using the SIFT algorithm and creating a global representation as described in the bag of visual words (BOVW) model [21], [22], [23]. Furthermore, we will present classification techniques focusing on the K-nearest neighbours algorithm (KNN) and the support vector machine (SVM) classifier. Moreover, regarding Deep Learning techniques, we will showcase the implementation of two custom convolutional neural networks (CNN), i.e., one with and one without the use of a pretrained model. Lastly, for each of the methods used we highlight the most important mathematical components and address common neural network issues such as overfitting.

3.2 Dataset
In order to conduct this comparative analysis, we have extended the BelgiumTS - Belgium Traffic Sign Dataset [24] by creating our custom version [25]. During our dataset design, we noticed the existence of class imbalance, with some classes containing more images than others, but this does not affect the quality of the data. Specifically, our version consists of several images of various traffic signs split into 34 classes, with each class containing photos of different signs. More specifically, the training dataset contains 3056 images which are split into an 80/20 ratio regarding training and validation (i.e. 2457/599 images). Lastly, the testing dataset used contains 2149 images.
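As a minimal sketch of how such an 80/20 split can be produced (not the authors' code), the snippet below assumes a recent TensorFlow/Keras installation and a hypothetical directory layout with one sub-folder per class; the 128x128 image size anticipates the input dimensions of the proposed CNN described later.

```python
# Minimal sketch: load the custom traffic-sign images and reproduce the 80/20
# training/validation split described above. Paths and layout are assumptions.
import tensorflow as tf

IMG_SIZE = (128, 128)                      # input size used by the proposed CNN
DATA_DIR = "belgiumts_custom/train"        # hypothetical path: one folder per class

train_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR, validation_split=0.2, subset="training",
    seed=42, image_size=IMG_SIZE, batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR, validation_split=0.2, subset="validation",
    seed=42, image_size=IMG_SIZE, batch_size=32)

print(train_ds.class_names)                # expected: the 34 traffic-sign classes
```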
3.3 Traditional ML methods
The traditional ML methods that are most commonly used in academia and industry alike consist of the following steps:
1. Detection of points of interest and feature extraction (descriptions for each image of a training data set).
2. Production of a visual vocabulary based on the BOVW model and implementation of the K-means algorithm.
3. Encoding of the training images using the generated dictionary and histogram extraction.
4. Classification using KNN and/or SVM.

1) Points of interest and feature extraction
In these methods, a system identifies the points of interest for each of the given images. Analytically, this is achieved via the use of detectors that enable feature extraction (i.e., descriptions forming a vector of features) for each examined point of interest. Then, these vectors are examined to determine the attribute identity, thus enabling us to decide if two characteristics are similar. In this article, we use the SIFT descriptor [26], as sketched below.
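A minimal sketch of this first step, assuming OpenCV 4.4 or newer (where SIFT is included in the main package), is shown below; the file name is hypothetical.

```python
# Minimal sketch: detect keypoints and compute SIFT descriptors for one image,
# i.e. step 1 of the traditional ML pipeline described above.
import cv2

img = cv2.imread("sign.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file name
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# 'descriptors' is an (N, 128) array: one 128-dimensional SIFT vector per keypoint.
print(len(keypoints), descriptors.shape)
```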


2) Production of visual vocabulary - BOVW model
The BOVW model consists of a dictionary, constructed by a clustering algorithm, which aims to locate differences between an image and a general representation of a dataset. Specifically, the operating principle behind the BOVW model is that, to encode all the local features of an image, a universal representation of an image must be created. This model compares the examined image with the generated representation of each class and generates an output based on the differences of their content.
Similarly, in our article, our objective is to use an unsupervised learning technique that groups the output of all descriptors generated from an examined dataset of images into distinct groups of unique characteristics.
Furthermore, several algorithms can implement this model, the most common one being K-means, a clustering algorithm which assigns the data points provided to the nearest centroid, for a fixed number K of clusters (i.e. words), until the system converges for a given number of iterations [27]. The steps of this algorithm are the following [28]:
1. Initialise cluster centroids $\mu_1, \mu_2, \ldots, \mu_k \in \mathbb{R}^n$ randomly.
2. Repeat until convergence:
   For every $i$ (the index of each point), set
   $c^{(i)} := \arg\min_j \lVert x^{(i)} - \mu_j \rVert^2$
   For every $j$ (the index of each cluster), set
   $\mu_j := \dfrac{\sum_{i=1}^{m} 1\{c^{(i)}=j\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{c^{(i)}=j\}}$
where $x^{(i)}$ is the unique feature vector (descriptor) $i$ and $c^{(i)}$ is the assigned cluster of $x^{(i)}$. A minimal vocabulary-building sketch is given below.
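The following sketch, assuming scikit-learn, illustrates how a visual vocabulary can be built by clustering SIFT descriptors; the random arrays merely stand in for the per-image descriptor matrices produced by the previous step.

```python
# Minimal sketch: build the visual vocabulary by clustering all training
# descriptors into K visual "words" with (mini-batch) K-means.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
# stand-in for the per-image SIFT descriptor arrays of the training set
all_descriptors = [rng.random((200, 128)) for _ in range(10)]

stacked = np.vstack(all_descriptors)      # one row per descriptor across all images
K = 100                                   # vocabulary size (50-100 in our experiments)
kmeans = MiniBatchKMeans(n_clusters=K, random_state=0, n_init=3).fit(stacked)
vocabulary = kmeans.cluster_centers_      # (K, 128): one centroid per visual word
```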
3) Encoding
In addition, another step of great importance is to determine the properties of the classifier. Specifically, this is achieved via encoding the content of the images based on a dictionary of universal characteristics. In order to perform this, a histogram is produced that provides information regarding the frequency of the visual words of the dictionary in an image.
Moreover, upon producing a histogram for each image - using its vector of features - the image's descriptors are compared with the dictionary, and each is assigned to the word at the shortest distance. This results in finding the greatest similarity within the dataset.
Finally, we note that normalisation is applied to the calculation of the occurrence frequency, as we wished to ensure that the generated histograms do not depend on the number of visual words.
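A minimal sketch of this encoding step (not the authors' code) is given below: each descriptor is assigned to its nearest visual word and the resulting counts are normalised; the random arrays are placeholders for real SIFT descriptors and the vocabulary from the previous step.

```python
# Minimal sketch: encode one image as a normalised histogram over the K visual words.
import numpy as np

def bovw_histogram(descriptors, vocabulary):
    """descriptors: (N, 128) SIFT array; vocabulary: (K, 128) cluster centres."""
    # distance of every descriptor to every word, then nearest-word assignment
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = np.argmin(dists, axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()              # normalised occurrence frequencies

vocab = np.random.rand(100, 128)          # placeholder vocabulary
desc = np.random.rand(250, 128)           # placeholder descriptors of one image
print(bovw_histogram(desc, vocab).shape)  # (100,)
```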
4) KNN classifier
The KNN algorithm is a non-parametric classifier [29], [30], [31] which accepts the histograms of the previous stage and compares them with those of the image dataset, focusing on calculating and monitoring differences in the measured distances. Then, each image is classified to the cluster which shows the greatest degree of similarity with its k nearest neighbours.
As evident from the above, the classifier depends greatly on the distance metric used to predict and categorise each set of results into k-groups. Moreover, we should consider that a "one size fits all" solution does not exist and special attention should be given to each problem [32]. The distance measure selected highly depends on the dataset examined and should be chosen after a trial-and-error approach [33]. Specifically, many distance measures exist, of which the most commonly used are the following:
- Manhattan Distance (L1), also known as Taxicab distance, defines the distance $d_1$ between 2 vectors $p, q$ of an n-dimensional vector space as follows:
  $d_1(p, q) = \sum_{i=1}^{n} |p_i - q_i|$
- Euclidean Distance (L2) defines the distance $d_2$ between 2 vectors $p, q$ of an n-dimensional vector space as follows:
  $d_2(p, q) = \sqrt{\sum_{i=1}^{n} |p_i - q_i|^2}$
Moreover, in machine learning, many distance metrics exist for n-dimensional vector spaces to calculate the notion of distance or similarity [34]. In our study, we have selected to use a generalisation of the L1 and L2 distances. Specifically, in our model we have defined the Minkowski distance which, depending on the chosen order p, reduces to the required distance (p = 1 for Manhattan, p = 2 for Euclidean). The Minkowski distance is defined as follows:
  $d_p(p, q) = \left(\sum_{i=1}^{n} |p_i - q_i|^p\right)^{1/p}$
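A minimal classification sketch, assuming scikit-learn, is shown below; KNeighborsClassifier uses the Minkowski distance by default, with p = 1 giving the Manhattan distance and p = 2 the Euclidean distance. The random arrays stand in for real BOVW histograms and labels.

```python
# Minimal sketch: classify BOVW histograms with KNN under a Minkowski metric.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.random((300, 100))    # stand-in for 300 training histograms (100 words)
y_train = rng.integers(0, 34, 300)  # stand-in for the 34 traffic-sign classes
X_test = rng.random((50, 100))

knn = KNeighborsClassifier(n_neighbors=10, metric="minkowski", p=1)  # L1, k = 10
knn.fit(X_train, y_train)
pred = knn.predict(X_test)
```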
5) SVM classifier
The SVM classifier uses an algorithm to select the surface that lies equidistant from the 2 nearest vector spaces [35]. This is achieved by classifying the dataset into different classes and calculating the margin between each class, thus creating vectors that support this margin region (support vectors). Additionally, in cases where several classes occur, such as in our image recognition dataset, research also uses the "one versus all" approach, where a classifier is trained for each class [36]. Analytically, the support vectors, i.e. the data points that are closest to the hyperplane, influence its position and orientation.
Furthermore, the mathematical equation for a linear SVM defined on a space $x$ of examined data points is the following:
  $f(x) = w \cdot x + b$
Moreover, the mathematical equation for a generic SVM definition is the following:
  $\{f : \lVert f \rVert_K^2 < \infty\}$
where, instead of the space $x$, the space of a kernel $K$ is used.
Lastly, for the linear kernel, the equations derived are the following:
  $K(x_1, x_2) = x_1 \cdot x_2, \quad f(x) = w \cdot x + b$
where the norm of the equation is $\lVert f \rVert_K^2 = \lVert w \rVert^2$.
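A minimal sketch, assuming scikit-learn, of a one-versus-rest SVM on BOVW histograms follows. One way to use a chi-squared kernel (the kernel that gives the best results in Table 3) in scikit-learn is via a precomputed Gram matrix; swapping in a linear or RBF kernel reproduces the other kernel variants. The data here is random placeholder input.

```python
# Minimal sketch: one-versus-rest SVM with a precomputed chi-squared kernel.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

rng = np.random.default_rng(0)
X_train = rng.random((300, 100))      # stand-in for training histograms (non-negative)
y_train = rng.integers(0, 34, 300)
X_test = rng.random((50, 100))

gram_train = chi2_kernel(X_train, X_train)             # chi-squared similarity matrix
clf = OneVsRestClassifier(SVC(kernel="precomputed"))   # one binary classifier per class
clf.fit(gram_train, y_train)
pred = clf.predict(chi2_kernel(X_test, X_train))
```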


3.4 Deep Learning Algorithms


Apart from the ML methods discussed, Neural Networks (NN) are also used extensively, combined with supervised learning techniques, to identify and classify objects between classes [37]. Specifically, the most common use of these NNs in the computer vision domain are CNNs implementing hierarchical feature learning [38]. These networks take advantage of the spatial information available on the ever-increasing areas of images during the system's processing. More specifically, due to their structure and their properties (e.g., local receptive fields, shared weights, pooling) they significantly reduce the parameters and therefore the computational power required compared to traditional feed-forward fully connected networks.
In our article, we have studied 2 models:
1. The VGG16 architecture [39] followed by a custom fully connected network as a classifier. We load the weights pretrained on ImageNet [40].
2. A custom CNN without transfer learning.
Lastly, we note that during our tests we have experimented with different activation functions, various hyperparameters, and data augmentation and normalisation methods to avoid overfitting. In the following sections we will explain in detail each NN architecture as well as the activation functions and design principles.
6) VGG16 architecture
Firstly, we use the VGG16 architecture followed by our custom classifier. Specifically, we use fully connected layers with batch normalisation and dropout for regularisation, and the Mish activation function [41]. Lastly, the final layer of the classifier consists of 34 neurons using the SoftMax activation function, one for each class of the dataset.
Moreover, during each training phase, we keep all the weights of each layer of the VGG16 architecture frozen except for the last 4 layers. Specifically, we made this choice because the first convolutional layers "learn" low-level features (such as straight lines, angles, edges, circles, etc.) which are similar in most images, whereas the deeper convolutional layers "learn" high-level and more abstract features which are a combination of low-level features and are problem-specific to each dataset.
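A minimal transfer-learning sketch, assuming Keras/TensorFlow, is shown below: VGG16 pretrained on ImageNet with all but the last 4 layers frozen, followed by a fully connected head with batch normalisation, dropout, Mish, and a 34-way softmax. The head width and dropout rate are illustrative assumptions, not the authors' exact settings.

```python
# Minimal sketch: VGG16 backbone (frozen except the last 4 layers) + custom classifier.
import tensorflow as tf
from tensorflow.keras import layers, models

def mish(x):                                  # Mish: x * tanh(softplus(x))
    return x * tf.math.tanh(tf.math.softplus(x))

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(128, 128, 3))
for layer in base.layers[:-4]:
    layer.trainable = False                   # freeze all but the last 4 VGG16 layers

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation=mish),       # head width: illustrative assumption
    layers.BatchNormalization(),
    layers.Dropout(0.5),                      # dropout rate: illustrative assumption
    layers.Dense(34, activation="softmax"),   # one output neuron per class
])
```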
7) Proposed CNN architecture
The CNN architecture of the proposed Deep Learning solution is presented in Figure 1, where the input dataset dimensions are (128, 128, 3).
As evident from this image, the first stage of our CNN consists of a convolutional layer, where we perform BN and max pooling. The second stage uses the same pattern twice, i.e., a convolutional layer followed by batch normalisation, another convolutional layer and BN, and finally max pooling. Lastly, the classifier consists of 4 fully connected layers. The aim of the second stage is to achieve optimal performance, as it is proposed for larger and deeper networks: due to the multiple stacked convolutional layers, more complex features of the input volume can be extracted. This stage thus reduces cases of destructive pooling operations.

Fig.1. Architecture of the proposed NN, where convolution, max pooling, and fully connected layers are highlighted in yellow, orange, and purple respectively.

Furthermore, as the network progresses deeper, the number of applied filters is augmented. Specifically, we initially start with 16 filters and gradually increase them to 32 and 64. This increase assists in producing high-quality features as it combines low-level features that occur while training the network. It is noted that in deep learning it is common practice to double the number of channels (or, in our case, filters) after each pooling layer as the knowledge of each layer becomes more spatial. As we get deeper into the network, each pooling layer divides the spatial dimensions by 2, thus allowing the number of filters and available kernels to double without the risk of rapid increases in parameters, memory usage, and computing load [42], [43], [44].
In addition, similarly to the above-mentioned architecture, this network also consists of 4 stages of fully connected layers, where 3 of them use BN and dropout. Analytically, for this NN we have used several activation functions such as ReLU [45], [46], [47] and Swish [48], and after our tests we conclude that Mish [49] is the optimal one. This function performed better than its rivals in terms of accuracy, as it generated the smallest loss rates and a smoother loss landscape, and its behaviour suggests that it avoids saturation due to capping. The Mish function is defined as follows:
  $f(x) = x \cdot \tanh(\mathrm{softplus}(x)) = x \cdot \tanh(\ln(1 + e^x))$
Moreover, a fact of great importance is that smooth activation functions allow optimal information propagation deeper into the neural network, thus achieving higher accuracy and generalisation results. We present our findings regarding the activation functions studied during our experiments in Figure 2. A sketch of the proposed architecture follows.
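The sketch below, assuming Keras/TensorFlow, is one reasonable reading of the description above: a Conv(16)+BN+pool stage, followed by two double-convolution stages with 32 and 64 filters, and a classifier of 4 fully connected layers (3 of them with BN and dropout) using the Mish activation. The dense-layer widths and dropout rate are illustrative assumptions.

```python
# Minimal sketch of the proposed CNN under the stated assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def mish(x):                                   # f(x) = x * tanh(softplus(x))
    return x * tf.math.tanh(tf.math.softplus(x))

def conv_block(x, filters, convs):
    for _ in range(convs):                     # stacked Conv+BN before each pooling
        x = layers.Conv2D(filters, 3, padding="same", activation=mish)(x)
        x = layers.BatchNormalization()(x)
    return layers.MaxPooling2D()(x)            # halves the spatial dimensions

inputs = layers.Input(shape=(128, 128, 3))
x = conv_block(inputs, 16, convs=1)            # stage 1
x = conv_block(x, 32, convs=2)                 # stage 2 (double convolution)
x = conv_block(x, 64, convs=2)                 # stage 3 (double convolution)
x = layers.Flatten()(x)
for units in (256, 128, 64):                   # 3 dense layers with BN and dropout
    x = layers.Dense(units, activation=mish)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.3)(x)
outputs = layers.Dense(34, activation="softmax")(x)   # 4th dense layer: classifier

model = models.Model(inputs, outputs)
```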


Fig.2. Comparison of the activation functions, some of which were used in our experiments.

Additionally, we have used the Adam optimiser [50] with a steady learning rate of 0.0002. Analytically, 80% of our examined dataset was used for training and 20% for validation. Moreover, to eliminate overfitting, we have used a BN technique similar to [51] and the dropout technique.
The first was implemented by continuously taking periodic measurements of the distributions of all the examined activations of each batch of our training examples at all levels. During the feed-forward propagation phase, a normalisation of the output is applied for each batch, so that each batch average approximately reaches zero and the standard deviation reaches 1. Specifically, the mathematical equations used to compute these parameters are the following:
- Batch average:
  $\mu_\mathcal{B} \leftarrow \frac{1}{m} \sum_{i=1}^{m} x_i$
- Batch variance:
  $\sigma_\mathcal{B}^2 \leftarrow \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_\mathcal{B})^2$
- Normalised activation:
  $\hat{x}_i \leftarrow \frac{x_i - \mu_\mathcal{B}}{\sqrt{\sigma_\mathcal{B}^2 + \epsilon}}$
- Shift and scaling:
  $y_i \leftarrow \gamma \hat{x}_i + \beta \equiv \mathrm{BN}_{\gamma,\beta}(x_i)$
where $\gamma, \beta$ are the parameters to be learned by the trained NN and $\epsilon$ is a constant added for numerical stability, acting as a safeguard against a zero denominator.
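A minimal NumPy sketch of the batch-normalisation forward pass, following the four equations above, is shown below (not the authors' code); gamma and beta would normally be trainable parameters but are fixed here to illustrate the computation.

```python
# Minimal sketch: batch-normalisation forward pass for one mini-batch.
import numpy as np

def batch_norm_forward(x, gamma=1.0, beta=0.0, eps=1e-5):
    mu = x.mean(axis=0)                    # batch average
    var = x.var(axis=0)                    # batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalised activation
    return gamma * x_hat + beta            # shift and scale: BN_{gamma,beta}(x)

batch = np.random.randn(32, 64)            # 32 examples, 64 activations
out = batch_norm_forward(batch)
print(out.mean(axis=0).round(3)[:5], out.std(axis=0).round(3)[:5])  # ~0 and ~1
```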
The second technique (i.e. dropout) used was similar to [52]: during the training of the NN, a random set of neurons is selected to be deactivated (i.e. to output 0). This acts as a means to force the NN to spread out its usage and not rely deeply on specific neurons, thus generalising our solution and reducing the possibility of developing "single point of error" systems. This random selection is based on a Bernoulli distribution where, if p is the probability of retaining an activation and k the number of deactivated connections, the NN behaviour is determined by the factor:
  $\left(\frac{1}{p}\right)^{k}$
Lastly, we used data augmentation methods, where training data are provided based on a probabilistic approach, i.e., calculating a probability variable in real time during the training phase and transforming images on the fly. This technique was used during the training phase and not the testing or validation phase. This is because we need the NN to optimise its behaviour during learning and not during execution.
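A minimal sketch of on-the-fly, training-only augmentation, assuming a recent TensorFlow where the preprocessing layers below are built in, is given here. These layers are active only when called with training=True, so validation and test images pass through unchanged; the specific transforms and their ranges are illustrative assumptions.

```python
# Minimal sketch: probabilistic image augmentation applied only during training.
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomRotation(0.05),          # small random rotations
    layers.RandomZoom(0.1),               # random scaling changes
    layers.RandomTranslation(0.1, 0.1),   # random shifts
])

images = tf.random.uniform((8, 128, 128, 3))
train_view = augment(images, training=True)    # transformed on the fly
eval_view = augment(images, training=False)    # identity at validation/test time
```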
4. Results
In this section we will firstly present the results of our network architecture using ML methods. Later, we will provide information regarding our experiments using Deep Learning methods and illustrate the advantages of our NN approach. Lastly, we will conclude this study with a comparative analysis consisting of confusion matrices and comparison diagrams for each architecture.

4.1 Traditional ML methods
The results of our executions are presented in Tables 1 and 2, where Manhattan outperforms Euclidean on average by 10-15%. Specifically, we notice that while increasing the vocabulary size, and thus the size and dimensions of the vector space, this rate increases rapidly, similarly to recent studies in the bibliography [53]. Moreover, the KNN classifier's optimal behaviour occurs for dictionaries of 75 words with 50 nearest neighbours, and of 100 words with 20 nearest neighbours, for the Euclidean and Manhattan distances respectively.
Lastly, as presented in Tables 1 and 2, we notice that even though initially an increase in neighbours ameliorated the overall accuracy results, after a certain point a threshold was reached, as accuracy values stagnated or decreased. Moreover, as evident from Figures 3 and 4, the results suggest that for a constant number of neighbours there is a significant increase in performance if we increase the quantity of (visual) words (i.e., input). Unfortunately, this observation, although interesting, must be treated carefully, as the initial rapid increase in accuracy applies only to small to medium-scale dictionaries. Specifically, for large dataset inputs, the trade-off between accuracy, computation cost, and execution time is not advisable, similarly to the study of [54].

Table 1. Comparison of KNN accuracy for different values of K (number of neighbours) and different small vocabulary sizes. Euclidean (L2) distance is used as the distance metric.

  K      Vocab. 50     Vocab. 75     Vocab. 100
  1      34.29502%     37.22661%     37.92462%
  3      33.73662%     37.97115%     38.20382%
  5      37.27315%     38.34342%     38.90181%
  10     38.34342%     40.01861%     40.29781%


  K      Vocab. 50     Vocab. 75     Vocab. 100
  15     40.01861%     41.13541%     41.18195%
  20     39.13448%     40.67008%     41.55421%
  50     39.69288%     42.57794%     41.27501%
  75     38.94835%     42.34528%     41.55421%
  100    39.46021%     41.74034%     41.78688%

Table 2. Comparison of KNN accuracy for different values of K (number of neighbours) and different small vocabulary sizes. Manhattan (L1) distance is used as the distance metric.

  K      Vocab. 50     Vocab. 75     Vocab. 100
  1      41.32154%     49.55793%     52.44300%
  3      42.67101%     50.58167%     53.93206%
  5      46.25407%     52.62913%     54.11819%
  10     48.39460%     55.00233%     58.53886%
  15     47.41740%     54.76966%     58.49232%
  20     48.44114%     55.56073%     58.49232%
  50     48.58074%     53.88553%     57.09632%
  75     47.27780%     52.81526%     55.93299%
  100    46.39367%     51.60540%     54.49046%
Table 3. Comparison of SVM accuracy for different types of kernels and different small vocabulary sizes.

  Kernel     Vocab. 50     Vocab. 75     Vocab. 100
  RBF        30.71196%     36.57515%     43.78781%
  LINEAR     28.05956%     42.29874%     44.95114%
  SIGMOID    4.18799%      4.18799%      4.18799%
  CHI2       48.90647%     57.18939%     54.25779%
  INTER      47.83620%     52.81526%     56.11913%

Fig.3. Accuracy comparison of the SVM kernels for an increasing number of vocabulary sizes.

Fig.4. Accuracy comparison of KNN for various k clusters and an increasing number of vocabulary sizes.

4.2 AI methods
As for the examined AI methods, we have used callbacks and early stopping, aiming to halt the training phase if the metric defined by the validation loss parameter stops improving (i.e., decreasing in value) for a preset number of epochs. Analytically, the training phase of our proposed NN model peaked its accuracy at epoch 35 (optimal validation loss value); after 12 further epochs its training stopped and the weights of epoch 35 were restored. We will present our findings regarding the accuracy and the loss curves for our proposed solution - in comparison to the pretrained model used as a means of reference - in Figures 5 and 6, and a sketch of the training setup below.
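The following sketch, assuming Keras/TensorFlow, illustrates this training setup: Adam with a fixed learning rate of 0.0002 and early stopping on the validation loss with a patience of 12 epochs that restores the best weights. The names 'model', 'train_ds' and 'val_ds' are assumed to come from the earlier sketches; the epoch budget is an illustrative assumption.

```python
# Minimal sketch: optimiser, early stopping, and best-weight restoration.
import tensorflow as tf

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=12, restore_best_weights=True)

history = model.fit(train_ds, validation_data=val_ds,
                    epochs=100, callbacks=[early_stop])
```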
Fig.5. Accuracy curves regarding the proposed model and the pretrained model during training and validation.

Fig.6. Loss curves regarding the proposed model and the pretrained model during training and validation.

4.3 Comparative Analysis
Besides comparing the dictionary size and focusing on either the BOVW or the KNN classifier, we have also studied the behaviour of our suggested system with traditional metrics used to assess NN architectures. Firstly, we will present in Table 4 the accuracy of our tests. Secondly, we will introduce the confusion matrices regarding the optimal results for each case. Specifically, we will provide a detailed visualisation of our comparative study in Figures 7, 8, and 9 for all of the above-mentioned methods.
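A minimal sketch, assuming scikit-learn, of how such a confusion matrix can be computed from test-set predictions is given below; 'y_true' and 'y_pred' are random placeholders standing in for the ground-truth labels and the predictions of any of the classifiers above.

```python
# Minimal sketch: confusion matrix over the 34 classes, as visualised in Figures 7-9.
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 34, 2149)              # stand-in for the 2149 test labels
y_pred = np.where(rng.random(2149) < 0.97, y_true, rng.integers(0, 34, 2149))

cm = confusion_matrix(y_true, y_pred, labels=np.arange(34))   # 34 x 34 matrix
print(cm.shape, accuracy_score(y_true, y_pred))
```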


Table 4. Training accuracy and loss between the proposed and the pretrained model (VGG16).

  Benchmark     Metric       Proposed    Pretrained (VGG16)
  Training      Accuracy     0.9833      0.9897
                Loss         0.0616      0.0597
  Validation    Accuracy     0.9716      0.9750
                Loss         0.1075      0.1011
  Testing       Accuracy     0.9744      0.9716
                Loss         0.1321      0.1191

Fig.7. Confusion matrix of the optimal SVM (Kernel = inter, Vocab_size = 3000).

Fig.8. Confusion matrix of the optimal KNN (K = 10, Vocab_size = 5000).

Fig.9. Confusion matrix of the proposed DCNN.

5. Conclusion and Future Works
In this article, we presented in detail a novel architecture of a CNN model with validation and testing loss of 0.1075 and 0.1321 respectively. Moreover, the system presented had a high validation and testing accuracy of 97.16% and 97.44% respectively, achieving optimal results in various test cases. In addition, the pretrained model used as a point of reference and comparison with our suggested CNN initially scored lower loss and higher validation accuracy (57% after the 1st epoch), and finally reached a validation and testing loss of 0.1011 and 0.1191 and a validation and testing accuracy of 97.5% and 97.16% respectively.
The comparison between the proposed and the pretrained model shows that our system is capable of achieving similar results with existing CNNs, but the authors suggest that for medium to small size datasets it slightly outperforms existing CNN architectures. Analytically, the authors suggest that this study can be used in self-driving vehicle navigation or routing systems [55], [56], [57] for object detection and avoidance. Also, since our system was designed to use low-power and low-computational-cost hardware, we suggest this CNN be used either as a standalone solution or as part of existing vehicles' automatic control systems, used to validate the predictions (i.e. against ground truth) of other sensory data points (e.g. from camera or lidar).
Finally, future system expansions should shift their attention to developing a dynamic model that automatically searches for and combines existing activation functions to achieve better results. Moreover, we need to expand our architecture similarly to the studies of vision transformer designs, studying related datasets such as [58], [59], [60] as well as self-supervised pretraining techniques [61].

References
[1] Wang, J. (2022). Influence of Big Data on Manufacturing Accounting Informatization. Cyber Security Intelligence and Analytics, CSIA 2022. Lecture Notes on Data Engineering and Communications Technologies, 125, 695-700. https://doi.org/10.1007/978-3-030-97874-7_92
[2] McCulloch, W.S., Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133. https://doi.org/10.1007/BF02478259


[3] Lin, J.W. (2017). Artificial neural network related to biological neuron network: a review. Advanced Studies in Medical Sciences, 5(1), 55-62. https://doi.org/10.12988/asms.2017.753
[4] Didaskalou, E., Manesiotis, P., Georgakellos, D. (2021). Smart Manufacturing and Industry 4.0: A preliminary approach in structuring a conceptual framework. WSEAS Transactions on Advances in Engineering Education, 18, 27-36. https://doi.org/10.37394/232010.2021.18.3
[5] Walczak, S. (2021). Artificial Neural Networks in Medicine: Recent Advances. Encyclopedia of Information Science and Technology, Fifth Edition, 1901-1918. https://doi.org/10.4018/978-1-7998-3479-3.ch132
[6] Dixit, E., Jindal, V. (2022). IEESEP: an intelligent energy efficient stable election routing protocol in air pollution monitoring WSNs. Neural Computing and Applications. https://doi.org/10.1007/s00521-022-07027-5
[7] Ayobami, A.O., Bunu, R., Jamal, U., Adedayo, O.O. (2020). Artificial Intelligence and its Applications: Current Trends and Challenges. International Journal of Innovative Science and Research Technology, 5(2), 144-147. https://ijisrt.com/artificial-intelligence-and-its-applications-current-trends-and-challenges
[8] Wang, X., Zhao, Y., Pourpanah, F. (2020). Recent advances in deep learning. International Journal of Machine Learning and Cybernetics, 11(4), 747-750. https://doi.org/10.1007/s13042-020-01096-5
[9] Paulo de Oliveira, A., Braga, H.F.T. (2020). Artificial Intelligence: Learning and Limitations. WSEAS Transactions on Advances in Engineering Education, 17, 80-86. https://doi.org/10.37394/232010.2020.17.10
[10] Abd Halim, K.N., Jaya, A.S.M., Fadzil, A.F.A. (2020). Data pre-processing algorithm for neural network binary classification model in bank tele-marketing. International Journal of Innovative Technology and Exploring Engineering (IJITEE), 9, 272-277. https://doi.org/10.35940/ijitee.C8472.019320
[11] Tiwari, R., Srivastava, S., Gera, R. (2020). Investigation of artificial intelligence techniques in finance and marketing. Procedia Computer Science, 173, 149-157. https://doi.org/10.1016/j.procs.2020.06.019
[12] Praneetpholkrang, P., Kanjanawattana, S. (2021). A Novel Approach for Determining Shelter Location-Allocation in Humanitarian Relief Logistics. International Journal of Knowledge and Systems Science (IJKSS), 12(2), 52-68. https://doi.org/10.4018/IJKSS.2021040104
[13] Almutairi, A.F., Gegov, A., Adda, M., Arabikhan, F. (2020). Conceptual artificial intelligence framework to improving English as second language. WSEAS Transactions on Advances in Engineering Education, 17, 87-91. https://doi.org/10.37394/232010.2020.17.11
[14] Yfantis, V., Ntalianis, K., Ntalianis, F. (2020). Exploring the Implementation of Artificial Intelligence in the Public Sector: Welcome to the Clerkless Public Offices. WSEAS Transactions on Advances in Engineering Education, 17. https://doi.org/10.37394/232010.2020.17.9
[15] Zualkernan, I., Judas, J., Mahbub, T., Bhagwagar, A., Chand, P. (2021). An AIoT system for bat species classification. IEEE International Conference on Internet of Things and Intelligence System, 155-160. https://doi.org/10.1109/IoTaIS50849.2021.9359704
[16] Li, J., Lin, D., Wang, Y., Xu, G., Zhang, Y., Ding, C., Zhou, Y. (2020). Deep discriminative representation learning with attention map for scene classification. Remote Sensing, 12(9), 1366. https://doi.org/10.3390/rs12091366
[17] Noble, F.K. (2016). Comparison of OpenCV's feature detectors and feature matchers. IEEE International Conference on Mechatronics and Machine Vision in Practice (M2VIP), 1-6. https://doi.org/10.1109/M2VIP.2016.7827292
[18] Song, L., Minku, L.L., Yao, X. (2018). A novel automated approach for software effort estimation based on data augmentation. In Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 468-479. https://doi.org/10.1145/3236024.3236052
[19] Jakubović, A., Velagić, J. (2018). Image feature matching and object detection using brute-force matchers. IEEE International Symposium ELMAR, 83-86. https://doi.org/10.23919/ELMAR.2018.8534641
[20] Che, E., Jung, J., Olsen, M.J. (2019). Object recognition, segmentation, and classification of mobile laser scanning point clouds: A state of the art review. Sensors, 19(4), 810. https://doi.org/10.3390/s19040810
[21] Sivic, J., Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. IEEE International Conference on Computer Vision, 3, 1470-1477. https://doi.org/10.1109/ICCV.2003.1238663
[22] Csurka, G., Dance, C., Fan, L., Willamowski, J., Bray, C. (2004). Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, 1(1-22), 1-2. https://www.researchgate.net/publication/228602850_Visual_categorization_with_bags_of_keypoints
[23] Shukla, J.S., Rastogi, K., Patel, H., Jain, G., Sharma, S. (2022). Bag of Visual Words Methodology in Remote Sensing—A Review. Proceedings of the International e-Conference on Intelligent Systems and Signal Processing, 475-486. https://doi.org/10.1007/978-981-16-2123-9_36
[24] BelgiumTS - Belgian Traffic Sign Dataset, ETH shared data. (Accessed on 31/03/2022). https://btsd.ethz.ch/shareddata/
[25] Karypidis, E., Mouslech, S.G., Skoulariki, K., Gazis, A. BelgiumTS - Belgian Traffic Sign Dataset custom version, Mendeley Data. (Accessed on 31/03/2022). https://doi.org/10.17632/kfngc5jxds.3
[26] Lowe, D.G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91-110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
[27] Gazis, A., Katsiri, E. (2020). A wireless sensor network for underground passages: Remote sensing and wildlife monitoring. Engineering Reports, 2(6), e12170. https://doi.org/10.1002/eng2.12170
[28] MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1(14), 281-297. https://www.semanticscholar.org/paper/Some-methods-for-classification-and-analysis-of-MacQueen/ac8ab51a86f1a9ae74dd0e4576d1a019f5e654ed
[29] Cover, T., Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21-27. https://doi.org/10.1109/TIT.1967.1053964
[30] Fix, E., Hodges, J.L. (1989). Discriminatory analysis. Nonparametric discrimination: Consistency properties. International Statistical Review/Revue Internationale de Statistique, 57(3), 238-247. https://apps.dtic.mil/sti/citations/ADA800276


[31] Gallego, A.J., Calvo-Zaragoza, J., Valero-Mas, J.J., Rico-Juan, J.R. (2018). Clustering-based k-nearest neighbor classification for large-scale data with neural codes representation. Pattern Recognition, 74, 531-543. https://doi.org/10.1016/j.patcog.2017.09.038
[32] Nerurkar, P., Shirke, A., Chandane, M., Bhirud, S. (2018). Empirical analysis of data clustering algorithms. Procedia Computer Science, 125, 770-779. https://doi.org/10.1016/j.procs.2017.12.099
[33] Singh, A., Yadav, A., Rana, A. (2013). K-means with Three different Distance Metrics. International Journal of Computer Applications, 67(10). https://doi.org/10.5120/11430-6785
[34] Mughnyanti, M., Efendi, S., Zarlis, M. (2020). Analysis of determining centroid clustering x-means algorithm with Davies-Bouldin index evaluation. IOP Conference Series: Materials Science and Engineering, 725(1), 012128. https://doi.org/10.1088/1757-899X/725/1/012128
[35] Cortes, C., Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297. https://doi.org/10.1007/BF00994018
[36] Evgeniou, T., Pontil, M. (2001). Support Vector Machines: Theory and Applications. In: Paliouras, G., Karkaletsis, V., Spyropoulos, C.D. (eds) Machine Learning and Its Applications. ACAI 1999. Lecture Notes in Computer Science, 2049. https://doi.org/10.1007/3-540-44673-7_12
[37] Tsekouras, G.E., Trygonis, V., Maniatopoulos, A., Rigos, A., Chatzipavlis, A., Tsimikas, J., Velegrakis, A.F. (2018). A Hermite neural network incorporating artificial bee colony optimization to model shoreline realignment at a reef-fronted beach. Neurocomputing, 280, 32-45. https://doi.org/10.1016/j.neucom.2017.07.070
[38] Zhang, X., Wang, J., Wang, T., Jiang, R., Xu, J., Zhao, L. (2021). Robust feature learning for adversarial defense via hierarchical feature alignment. Information Sciences, 560, 256-270. https://doi.org/10.1016/j.ins.2020.12.042
[39] Liu, S., Deng, W. (2015). Very deep convolutional neural network based image classification using small training sample size. IAPR Asian Conference on Pattern Recognition (ACPR), 730-734. https://arxiv.org/abs/1409.1556
[40] Timofte, R., Zimmermann, K., Van Gool, L. (2014). Multi-view traffic sign detection, recognition, and 3D localisation. Machine Vision and Applications, 25(3), 633-647. https://doi.org/10.1007/s00138-011-0391-3
[41] Misra, D. (2019). Mish: A self regularized non-monotonic neural activation function. arXiv preprint arXiv:1908.08681. https://arxiv.org/abs/1908.08681
[42] Goodfellow, I., Bengio, Y., Courville, A. (2017). Deep Learning (Adaptive Computation and Machine Learning series). Cambridge, Massachusetts, 321-359.
[43] Zhai, S., Cheng, Y., Lu, W., Zhang, Z. (2016). Doubly convolutional neural networks. arXiv preprint arXiv:1610.09716. https://arxiv.org/abs/1610.09716
[44] Graham, B. (2014). Fractional max-pooling. arXiv preprint arXiv:1412.6071. https://arxiv.org/abs/1412.6071
[45] Nair, V., Hinton, G.E. (2010). Rectified linear units improve restricted Boltzmann machines. ICML. https://openreview.net/forum
[46] Glorot, X., Bordes, A., Bengio, Y. (2011). Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, 315-323. https://proceedings.mlr.press/v15/glorot11a.html
[47] Maniatopoulos, A., Mitianoudis, N. (2021). Learnable Leaky ReLU (LeLeLU): An Alternative Accuracy-Optimized Activation Function. Information, 12(12), 513. https://doi.org/10.3390/info12120513
[48] Ramachandran, P., Zoph, B., Le, Q.V. (2017). Swish: a self-gated activation function. arXiv preprint arXiv:1710.05941. https://arxiv.org/abs/1710.05941v1
[49] Misra, D. (2020). Mish: A Self Regularized Non-Monotonic Activation Function. arXiv preprint arXiv:1908.08681. https://arxiv.org/abs/1908.08681
[50] Kingma, D.P., Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. https://arxiv.org/abs/1412.6980
[51] Ioffe, S., Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, 448-456. http://proceedings.mlr.press/v37/ioffe15.html
[52] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958. https://jmlr.org/papers/v15/srivastava14a.html
[53] Aggarwal, C.C., Hinneburg, A., Keim, D.A. (2001). On the surprising behavior of distance metrics in high dimensional space. In International Conference on Database Theory, 420-434. https://doi.org/10.1007/3-540-44503-X_27
[54] Hou, J., Kang, J., Qi, N. (2010). On Vocabulary Size in Bag-of-Visual-Words Representation. In: Qiu, G., Lam, K.M., Kiya, H., Xue, X.Y., Kuo, C.C.J., Lew, M.S. (eds) Advances in Multimedia Information Processing - PCM 2010. Lecture Notes in Computer Science, 6297. https://doi.org/10.1007/978-3-642-15702-8_38
[55] Kuru, K., Khan, W. (2020). A framework for the synergistic integration of fully autonomous ground vehicles with smart city. IEEE Access, 9, 923-948. https://doi.org/10.1109/ACCESS.2020.3046999
[56] Butt, F.A., Chattha, J.N., Ahmad, J., Zia, M.U., Rizwan, M., Naqvi, I.H. (2022). On the Integration of Enabling Wireless Technologies and Sensor Fusion for Next-Generation Connected and Autonomous Vehicles. IEEE Access, 10, 14643-14668. https://doi.org/10.1109/ACCESS.2022.3145972
[57] Yeomans, J.S. (2021). A Multicriteria, Bat Algorithm Approach for Computing the Range Limited Routing Problem for Electric Trucks. WSEAS Transactions on Circuits and Systems, 20, 96-106. https://doi.org/10.37394/23201.2021.20.13
[58] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, 5998-6008. https://papers.nips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
[59] Raghu, M., Unterthiner, T., Kornblith, S., Zhang, C., Dosovitskiy, A. (2021). Do vision transformers see like convolutional neural networks? Advances in Neural Information Processing Systems, 34. https://proceedings.neurips.cc/paper/2021/hash/652cf38361a209088302ba2b8b7f51e0-Abstract.html
[60] Li, S., Chen, X., He, D., Hsieh, C.J. (2021). Can Vision Transformers Perform Convolution? arXiv preprint arXiv:2111.01353. https://arxiv.org/abs/2111.01353
[61] Newell, A., Deng, J. (2020). How useful is self-supervised pretraining for visual tasks? Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7345-7354. https://arxiv.org/abs/2003.14323

Creative Commons Attribution License 4.0 (Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the Creative Commons Attribution License 4.0.
https://creativecommons.org/licenses/by/4.0/deed.en_US
