Comparison Analysis of Traditional Machine Learning and Deep Learning Techniques
Mouslech, Kassiani Skoulariki, Alexandros Gazis
DOI: 10.37394/23206.2022.21.19
Keywords - Computer Vision, Multi-Class Classification, K-Means Clustering, Feature Vectors, Linear/Non-Linear Classifiers,
Deep Learning, Convolutional Neural Networks, Data Augmentation, Feature Maps (Feature Learning), Transfer Learning
Received: May 28, 2021. Revised: January 30, 2022. Accepted: February 23, 2022. Published: March 23, 2022.
1. Introduction
In our modern day and age, the amount of data provided daily by individuals and businesses is rapidly increasing due to new technological advancements. The use of the Internet of Things, and especially Machine-to-Machine communication channels, has helped create a large interconnected network of computing devices. Specifically, the increasing use of mobile devices equipped with a large variety of sensors, various everyday embedded devices, and everyday tasks such as web browsing provide abundant information that is stored in the "cloud," i.e. in remote repositories. These data are later accessed via Big Data infrastructures that propose methods to optimally extract, transform, and load this information into advanced systems capable of mining these data points. The outcome of these processes is to fuse the data streams in real time and implement techniques harnessing the power of Artificial Intelligence (AI) to provide valuable data insights categorised into descriptive, diagnostic, predictive, and/or prescriptive results [1].

Additionally, AI is a field that shows great potential as, due to Moore's law, which states that computing processing power increases yearly while its cost decreases, we are now capable of handling data points generated on the spot rapidly and, most importantly, reliably. Notably, AI is not a new term, as one of the first mathematical models implementing a biological neural network (BNN) was presented in 1943 [2]. This publication showcased how a BNN can be emulated via an Artificial Neural Network (ANN) using mathematical formulas consisting of parameters such as weights, biases, and activation functions [3], [4]. Furthermore, besides data classification tasks, this concept was later used as a stepping stone to enhance data insights in the field of computer vision, as it introduced detailed object classification and recognition in image/video datasets. Lastly, recent advances in the field of deep learning ANNs [5], [6], [7] focus on data classification and image recognition tasks via deep learning techniques [8], [9] in several fields, most notably e-commerce [10], finance [11], humanitarian aid [12], education [13], healthcare [14], and ecological informatics [15].
In this article we will focus on the last part of the above-mentioned process, presenting a detailed comparison of recent ML and deep learning techniques. Specifically, the outline of our paper is as follows: firstly, we will present the aims and objectives of our study. Secondly, we will briefly discuss some of the most widely used ML data and image classification techniques in the industry. Thirdly, we will present and focus on ML and Deep Learning solutions specialising in classification problems. Fourthly, we will present a detailed comparison analysis between these traditional ML and Deep Learning techniques. Fifthly, we will discuss our comparative results and suggest a novel ANN that achieves high accuracy of over 90% on a benchmark dataset. Finally, we will draw conclusions and discuss how this publication can be used in future works.

2. Aims and Objectives
This article aims to address the evaluation of some of the most widely used techniques for data and image classification. Specifically, our tests are twofold:
1. ML techniques: using the bag of visual words model (BOVW), K-nearest neighbours and support vector machine algorithms
2. Deep Learning techniques: using deep convolutional neural networks, i.e., a pre-trained model and the proposed ANN
Additionally, using the above-mentioned methods, we aim to address the following:
1. Provide information regarding some of the most widely used methods for data/image classification
2. Present a comparative study of ML and Deep Learning based solutions
3. Explain the architecture and the mathematical parameters of these methods
4. Suggest a novel CNN architecture for image classification
Lastly, the technical novelty of this article lies not only in presenting a comparative study between traditional ML and Deep Learning techniques, but also in suggesting a new CNN that achieves accuracy levels of slightly over 90% - and in some cases higher - in line with the most recent scientific advances in the field.
3. Background and Methods
3.1 Defining the Problem
In machine vision, the most common problems mainly occur due to the following reasons: difficulties in recognising objects from different angles, differences in lighting, volatile rotation speeds of objects, rapid changes in scaling, and generic intraclass variations.

In the last decade, due to increasing computational power, the rise of cloud computing infrastructures, and advances in hardware acceleration techniques (either using GPUs or remote data centres), 2D object recognition research has rapidly increased. Specifically, recent research suggests that these challenges do not pose a computational problem [16], [17], [18], [19], [20]. As a result, recent research has shifted its efforts to providing innovative solutions that take into consideration a trade-off between optimal accuracy and low-power, low-cost methods. This means that emphasis must be given to studying, understanding, and ameliorating existing computational mathematics approaches and methods.

In the following sections, we will present the two algorithms that we have studied using both ML and Deep Learning solutions. Analytically, the first focuses on ML, extracting key features using the SIFT algorithm and creating a global representation as described in the bag of visual words (BOVW) model [21], [22], [23]. Furthermore, we will present classification techniques focusing on the K-nearest neighbours (KNN) algorithm and the support vector machine (SVM) classifier. Moreover, regarding Deep Learning techniques, we will showcase the implementation of two custom convolutional neural networks (CNN), i.e., one with and one without the use of a pretrained model. Lastly, for each of the methods used, we highlight the most important mathematical components and address common neural network issues such as overfitting.

3.2 Dataset
In order to conduct this comparative analysis, we have extended the BelgiumTS - Belgium Traffic Sign Dataset [24] by creating our custom version [25]. During our dataset design, we noticed the existence of class imbalance, with some classes containing more images than others, but this does not affect the quality of the data. Specifically, our version consists of several images of various traffic signs split into 34 classes, with each class containing photos of different signs. More specifically, the training dataset contains 3056 images, which are split into an 80/20 ratio for training and validation (i.e. 2457/599 images). Lastly, the testing dataset contains 2149 images.
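For illustration, a minimal sketch of this 80/20 split is given below, assuming the custom dataset [25] is organised as one folder of images per class; the paths, file extension, and variable names are hypothetical rather than the authors' exact tooling:

```python
# Hypothetical sketch of the 80/20 training/validation split described above.
# Assumes one sub-folder per traffic-sign class (BelgiumTS-style layout).
import os
from glob import glob
from sklearn.model_selection import train_test_split

image_paths, labels = [], []
for class_dir in sorted(glob("dataset/training/*")):
    for path in glob(os.path.join(class_dir, "*.ppm")):  # extension assumed
        image_paths.append(path)
        labels.append(os.path.basename(class_dir))

# Stratified 80/20 split, preserving the class imbalance noted above
train_paths, val_paths, y_train, y_val = train_test_split(
    image_paths, labels, test_size=0.2, stratify=labels, random_state=42)

print(len(train_paths), len(val_paths))  # roughly 2457/599 for 3056 images
```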
3.3 Traditional ML methods
The traditional ML methods that are most commonly used in academia and industry alike consist of the following steps:
1. Detection of points of interest and feature extraction (descriptions for each image of a training data set).
2. Production of a visual vocabulary based on the BOVW model and implementation of the K-means algorithm.
3. Encoding of training images using the generated dictionary and histogram extraction.
4. Classification using KNN and/or SVM.

1) Points of interest detection and feature extraction
In these methods, a system identifies the points of interest for each of the given images. Analytically, this is achieved via the use of detectors that enable feature extraction (i.e., descriptions for a vector of features) for each examined point of interest. Then, these vectors are examined to determine the attribute identity, thus enabling us to decide whether two characteristics are similar. In this article, we use the SIFT descriptor [26].
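As a brief sketch of this step, using the OpenCV implementation of SIFT (the file name is a placeholder, and this is not necessarily the authors' exact pipeline):

```python
# Minimal SIFT point-of-interest detection and description with OpenCV.
import cv2

img = cv2.imread("sign.ppm", cv2.IMREAD_GRAYSCALE)  # placeholder file name
sift = cv2.SIFT_create()

# keypoints: the detected points of interest;
# descriptors: one 128-dimensional feature vector per keypoint
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)  # N keypoints -> (N, 128)
```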
2) Production of visual vocabulary - BOVW model
The BOVW model consists of a dictionary, constructed by a clustering algorithm, which aims to locate differences between an image and a general representation of a dataset. Specifically, the operating principle behind the BOVW model holds that, to encode all the local features of an image, a universal representation of the image must be created. This model compares the examined image with the generated representation of each class and generates an output based on the differences in their content.

Similarly, in our article, our objective is to use an unsupervised learning technique that groups the output of all descriptors generated from an examined dataset of images into distinct groups of unique characteristics. Furthermore, several algorithmic approaches exist to implement this model, the most common one being K-means, a clustering algorithm which organises the provided data points to the nearest centroid, for a fixed number K of clusters (i.e. words), until the system converges after a given number of iterations [27]. The steps of this algorithm are the following [28]:

1. Initialise cluster centroids $\mu_1, \mu_2, \ldots, \mu_k \in \mathbb{R}^n$ randomly.
2. Repeat until convergence:
For every $i$ (the index of each point), set
$$c^{(i)} := \arg\min_j \left\| x^{(i)} - \mu_j \right\|^2$$
For every $j$ (the index of each cluster), set
$$\mu_j := \frac{\sum_{i=1}^{m} 1\{c^{(i)} = j\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{c^{(i)} = j\}}$$

where $x^{(i)}$ is the unique feature vector (descriptor) $i$ and $c^{(i)}$ is the index of the cluster centroid currently closest to it.
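A compact sketch of this clustering step, using the scikit-learn K-means implementation rather than the authors' own code, could look as follows (`per_image_descriptors`, a list of SIFT descriptor arrays from the previous step, is an assumed input):

```python
# Building the visual vocabulary: stack all SIFT descriptors from the
# training set and cluster them into K centroids (one per visual word).
import numpy as np
from sklearn.cluster import KMeans

all_descriptors = np.vstack(per_image_descriptors)  # shape: (total_points, 128)

K = 100  # vocabulary size, cf. the 50/75/100 values tested in Section 4
kmeans = KMeans(n_clusters=K, n_init=10, random_state=42).fit(all_descriptors)

def encode(descriptors: np.ndarray, kmeans: KMeans, K: int) -> np.ndarray:
    """Encode one image as a normalised histogram over the K visual words."""
    words = kmeans.predict(descriptors)           # nearest centroid per point
    hist = np.bincount(words, minlength=K).astype(float)
    return hist / hist.sum()                      # histogram extraction (step 3)
```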
For the clustering and the subsequent classification steps, several distance measures exist, the most commonly used being the following:

Manhattan Distance (L1), also known as Taxicab distance, defines the distance $d_1$ between two vectors $p, q$ of an $n$-dimensional vector space as follows:
$$d_1(p, q) = \sum_{i=1}^{n} |p_i - q_i|$$

Euclidean Distance (L2) defines the distance $d_2$ between two vectors $p, q$ of an $n$-dimensional vector space as follows:
$$d_2(p, q) = \sqrt{\sum_{i=1}^{n} |p_i - q_i|^2}$$

Moreover, in machine learning, many distance metrics exist for multiple n-dimensional vector spaces to calculate the notion of distance or similarity [34]. In our study, we have selected to use a generalisation of the L1 and L2 distances. Specifically, in our model we have defined the Minkowski distance, which, based on the provided value of its order $p$, shapes its equation to the necessary distance ($p = 1$ yields the Manhattan distance and $p = 2$ the Euclidean distance). The Minkowski distance is defined as follows:
$$d(p, q) = \left( \sum_{i=1}^{n} |p_i - q_i|^p \right)^{1/p}$$
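As a small numeric illustration (not from the original paper), the following sketch verifies that the Minkowski distance reduces to L1 for p = 1 and to L2 for p = 2:

```python
# Minkowski distance between two n-dimensional vectors; for p=1 this is
# the Manhattan (L1) distance and for p=2 the Euclidean (L2) distance.
import numpy as np

def minkowski(u: np.ndarray, v: np.ndarray, p: float) -> float:
    return float(np.sum(np.abs(u - v) ** p) ** (1.0 / p))

a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 0.0, 3.0])
print(minkowski(a, b, 1))  # 5.0     == Manhattan distance
print(minkowski(a, b, 2))  # ~3.6056 == Euclidean distance
```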
4. Results
In this section we will firstly present the results of our architecture using ML methods. Later, we will provide information regarding our experiments using Deep Learning methods and illustrate the advantages of our NN approach. Lastly, we will conclude this study with a comparative analysis consisting of confusion matrices and comparison diagrams for each architecture.

Fig.2. Comparison of the activation functions, some of them used in our experiments.

KNN accuracy per vocabulary size:
             Vocabulary Size
K        50            75            100
15       40.01861%     41.13541%     41.18195%

SVM accuracy per kernel and vocabulary size:
             Vocabulary Size
Kernel   50            75            100
RBF      30.71196%     36.57515%     43.78781%
LINEAR   28.05956%     42.29874%     44.95114%

Fig.5. Accuracy curves regarding the proposed model and the pretrained model during training and validation.
Fig.6. Loss curves regarding the proposed model and the pretrained model during training and validation.
Fig.8. Confusion matrix of the optimal KNN (K = 10, Vocab_size = 5000).
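For reference, a hedged sketch of how such KNN/SVM accuracies and the confusion matrix of Fig.8 could be reproduced with scikit-learn is given below; `X_train`/`X_test` stand for the BOVW histograms produced by the encoding step and `y_train`/`y_test` for the class labels, both assumed inputs rather than the authors' exact code:

```python
# Evaluating KNN and SVM classifiers on the BOVW histogram features.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

# KNN with the Minkowski metric (p=2, i.e. Euclidean), as in Section 3
knn = KNeighborsClassifier(n_neighbors=10, metric="minkowski", p=2)
knn.fit(X_train, y_train)
print("KNN accuracy:", accuracy_score(y_test, knn.predict(X_test)))

# SVM with an RBF kernel; kernel="linear" gives the second table row
svm = SVC(kernel="rbf")
svm.fit(X_train, y_train)
print("SVM accuracy:", accuracy_score(y_test, svm.predict(X_test)))

cm = confusion_matrix(y_test, knn.predict(X_test))  # cf. Fig.8
```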
References
[1] Wang, J. (2022). Influence of Big Data on Manufacturing Accounting Informatization. Cyber Security Intelligence and Analytics. CSIA 2022. Lecture Notes on Data Engineering and Communications Technologies, 125, 695-700. https://doi.org/10.1007/978-3-030-97874-7_92
[2] McCulloch, W.S., Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.
[31] Gallego, A.J., Calvo-Zaragoza, J., Valero-Mas, J.J., Rico-Juan, J.R. (2018). Clustering-based k-nearest neighbor classification for large-scale data with neural codes representation. Pattern Recognition, 74, 531-543. https://doi.org/10.1016/j.patcog.2017.09.038
[32] Nerurkar, P., Shirke, A., Chandane, M., Bhirud, S. (2018). Empirical analysis of data clustering algorithms. Procedia Computer Science, 125, 770-779. https://doi.org/10.1016/j.procs.2017.12.099
[33] Singh, A., Yadav, A., Rana, A. (2013). K-means with Three different Distance Metrics. International Journal of Computer Applications, 67(10). http://dx.doi.org/10.5120/11430-6785
[34] Mughnyanti, M., Efendi, S., Zarlis, M. (2020). Analysis of determining centroid clustering x-means algorithm with davies-bouldin index evaluation. IOP Conference Series: Materials Science and Engineering, 725(1), 012128. https://doi.org/10.1088/1757-899X/725/1/012128
[35] Cortes, C., Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297. https://doi.org/10.1007/BF00994018
[36] Evgeniou, T., Pontil, M. (2001). Support Vector Machines: Theory and Applications. In: Paliouras, G., Karkaletsis, V., Spyropoulos, C.D. (eds) Machine Learning and Its Applications. ACAI 1999. Lecture Notes in Computer Science, 2049. https://doi.org/10.1007/3-540-44673-7_12
[37] Tsekouras, G.E., Trygonis, V., Maniatopoulos, A., Rigos, A., Chatzipavlis, A., Tsimikas, J., Velegrakis, A.F. (2018). A Hermite neural network incorporating artificial bee colony optimization to model shoreline realignment at a reef-fronted beach. Neurocomputing, 280, 32-45. https://doi.org/10.1016/j.neucom.2017.07.070
[38] Zhang, X., Wang, J., Wang, T., Jiang, R., Xu, J., Zhao, L. (2021). Robust feature learning for adversarial defense via hierarchical feature alignment. Information Sciences, 560, 256-270. https://doi.org/10.1016/j.ins.2020.12.042
[39] Liu, S., Deng, W. (2015). Very deep convolutional neural network based image classification using small training sample size. In IAPR Asian Conference on Pattern Recognition (ACPR), 730-734. https://arxiv.org/abs/1409.1556
[40] Timofte, R., Zimmermann, K., Van Gool, L. (2014). Multi-view traffic sign detection, recognition, and 3D localisation. Machine Vision and Applications, 25(3), 633-647. https://doi.org/10.1007/s00138-011-0391-3
[41] Misra, D. (2019). Mish: A self regularized non-monotonic neural activation function. arXiv preprint arXiv:1908.08681. https://arxiv.org/abs/1908.08681
[42] Goodfellow, I., Bengio, Y., Courville, A. (2017). Deep Learning (Adaptive Computation and Machine Learning series). Cambridge, Massachusetts, 321-359.
[43] Zhai, S., Cheng, Y., Lu, W., Zhang, Z. (2016). Doubly convolutional neural networks. arXiv preprint arXiv:1610.09716. https://arxiv.org/abs/1610.09716
[44] Graham, B. (2014). Fractional max-pooling. arXiv preprint arXiv:1412.6071. https://arxiv.org/abs/1412.6071
[45] Nair, V., Hinton, G.E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the International Conference on Machine Learning (ICML). https://openreview.net/forum
[46] Glorot, X., Bordes, A., Bengio, Y. (2011). Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, 315-323. https://proceedings.mlr.press/v15/glorot11a.html
[47] Maniatopoulos, A., Mitianoudis, N. (2021). Learnable Leaky ReLU (LeLeLU): An Alternative Accuracy-Optimized Activation Function. Information, 12(12), 513. http://dx.doi.org/10.3390/info12120513
[48] Ramachandran, P., Zoph, B., Le, Q.V. (2017). Swish: a self-gated activation function. arXiv preprint arXiv:1710.05941. https://arxiv.org/abs/1710.05941v1
[49] Misra, D. (2020). Mish: A Self Regularized Non-Monotonic Activation Function. arXiv preprint arXiv:1908.08681. https://arxiv.org/abs/1908.08681
[50] Kingma, D.P., Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. https://arxiv.org/abs/1412.6980
[51] Ioffe, S., Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, 448-456. http://proceedings.mlr.press/v37/ioffe15.html
[52] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958. https://jmlr.org/papers/v15/srivastava14a.html
[53] Aggarwal, C.C., Hinneburg, A., Keim, D.A. (2001). On the surprising behavior of distance metrics in high dimensional space. In International Conference on Database Theory, 420-434. https://doi.org/10.1007/3-540-44503-X_27
[54] Hou, J., Kang, J., Qi, N. (2010). On Vocabulary Size in Bag-of-Visual-Words Representation. In: Qiu, G., Lam, K.M., Kiya, H., Xue, X.Y., Kuo, C.C.J., Lew, M.S. (eds) Advances in Multimedia Information Processing - PCM 2010. Lecture Notes in Computer Science, 6297. https://doi.org/10.1007/978-3-642-15702-8_38
[55] Kuru, K., Khan, W. (2020). A framework for the synergistic integration of fully autonomous ground vehicles with smart city. IEEE Access, 9, 923-948. https://doi.org/10.1109/ACCESS.2020.3046999
[56] Butt, F.A., Chattha, J.N., Ahmad, J., Zia, M.U., Rizwan, M., Naqvi, I.H. (2022). On the Integration of Enabling Wireless Technologies and Sensor Fusion for Next-Generation Connected and Autonomous Vehicles. IEEE Access, 10, 14643-14668. https://doi.org/10.1109/ACCESS.2022.3145972
[57] Yeomans, J.S. (2021). A Multicriteria, Bat Algorithm Approach for Computing the Range Limited Routing Problem for Electric Trucks. WSEAS Transactions on Circuits and Systems, 20, 96-106. www.doi.org/10.37394/23201.2021.20.13
[58] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, 5998-6008. https://papers.nips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
[59] Raghu, M., Unterthiner, T., Kornblith, S., Zhang, C., Dosovitskiy, A. (2021). Do vision transformers see like convolutional neural networks?. Advances in Neural Information Processing Systems, 34. https://proceedings.neurips.cc/paper/2021/hash/652cf38361a209088302ba2b8b7f51e0-Abstract.html
[60] Li, S., Chen, X., He, D., Hsieh, C.J. (2021). Can Vision Transformers Perform Convolution?. arXiv preprint arXiv:2111.01353. https://arxiv.org/abs/2111.01353
[61] Newell, A., Deng, J. (2020). How useful is self-supervised pretraining for visual tasks? Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7345-7354. https://arxiv.org/abs/2003.14323

Creative Commons Attribution License 4.0 (Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US