Deep Learning Applications in Image Analysis (2023)
Series Editor
Janusz Kacprzyk
Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Big Data” (SBD) publishes new developments and
advances in the various areas of Big Data, quickly and with high quality.
The intent is to cover the theory, research, development, and applications of
Big Data as embedded in the fields of engineering, computer science,
physics, economics, and the life sciences. The books of the series address the
analysis and understanding of large, complex, and/or distributed data sets
generated from recent digital sources such as sensors and other physical
instruments, as well as simulations, crowdsourcing, social networks, and other
internet transactions such as emails and video click streams. The
series contains monographs, lecture notes, and edited volumes in Big Data
spanning the areas of computational intelligence, including neural networks,
evolutionary computation, soft computing, and fuzzy systems, as well as
artificial intelligence, data mining, modern statistics, operations
research, and self-organizing systems. Of particular value to both the
contributors and the readership are the short publication timeframe and the
worldwide distribution, which enable both wide and rapid dissemination of
research output.
The books of this series are reviewed in a single blind peer review
process.
Indexed by SCOPUS, EI Compendex, SCIMAGO and zbMATH.
All books published in the series are submitted for consideration in Web
of Science.
Editors
Sanjiban Sekhar Roy, Ching-Hsien Hsu and Venkateshwara Kagita
Ching-Hsien Hsu
College of Information and Electrical Engineering, Asia University,
Taichung, Taiwan
Venkateshwara Kagita
Department of Computer Science and Engineering, National Institute of
Technology Warangal, Warangal, India
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Singapore Pte Ltd. 2023
This work is subject to copyright. All rights are solely and exclusively
licensed by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any
other physical way, and transmission or information storage and retrieval,
electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The publisher, the authors, and the editors are safe to assume that the advice
and information in this book are believed to be true and accurate at the date
of publication. Neither the publisher nor the authors or the editors give a
warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The
publisher remains neutral with regard to jurisdictional claims in published
maps and institutional affiliations.
1 Introduction
Handwritten character recognition has been an area of interest among deep
learning researchers and practitioners in recent years. Owing to its wide
range of possible applications, a significant number of studies have been
carried out on handwritten text and character recognition in different
languages, such as English [1], Japanese [2], and Latin [3]. Bangla is the
first and official language of Bangladesh, and it is the fourth most popular
language in the world, spoken by almost 300 million people [4].
Considering this large number of native users, handwritten character
recognition of the Bangla language plays a very important role in a wide
range of applications, including bank cheque processing, reading postal and
zip codes, interpreting national ID numbers, Bangla optical character
recognition (OCR), and many more [5, 6].
The Bangla language has 11 vowels, 39 consonants, and a considerable
number of vowel diacritics, consonant conjuncts and diacritics, as well as
digits, symbols, and punctuation marks. Recognizing handwritten Bangla
characters is difficult and complicated for several reasons: (a) the Bangla
alphabet contains many compound characters, (b) the forms of certain
characters are nearly identical, and (c) because different people write
differently, the same character written by different people will have
different forms, sizes, and curvatures.
To overcome these problems, several efforts have been made to improve
recognition accuracy. Convolutional neural networks (CNNs) [4, 7–9], deep
CNNs [10], and ensemble learning methods [11, 12] have been applied in
recent years. However, the scarcity of Bangla datasets and the class
imbalance within those datasets remain barriers to the recognition problem.
Ensemble methods and image augmentation are among the many ways to
address this issue. The Generative Adversarial Network (GAN), introduced
in [13], is another way to produce new instances of data. The presence of
outliers in a dataset can also make recognition difficult, as they mislead
the training of the models; by eliminating outliers, statistically more
meaningful results can be obtained.
In Bangla handwriting-related studies, researchers have used different
classification approaches. The authors in [14] suggested a hierarchical
method for segmenting characters from sentences, with a multilayer
perceptron (MLP) as the classification algorithm, whereas a fusion classifier
of an MLP, an RBF network, and an SVM is suggested in [15]. In [16],
Bangla handwriting images are classified into 50 groups using a multilayer
perceptron neural network.
Deep learning methods such as convolutional neural network (CNN)-based
architectures have been used in the majority of recent works. Some of these
works are limited to simple characters [17], while others concentrate on
handwritten digits [18, 19]. Additionally, work has been done on a subset of
the compound characters of the Bangla language [20]. One of the major
issues in Bangla handwritten character recognition is the limited availability
of a complete handwritten character dataset. Generating Bangla handwritten
characters is one way to address this problem. The deep convolutional
generative adversarial network (DCGAN) [21] has been used by some
researchers to generate Bangla handwritten digits [22, 23]. However, little
work has focused on generating the more complicated cursive characters and
classifying Bangla handwritten characters using them.
Deep neural networks are widely used for analyzing and classifying
different types of images [24–27]. The Residual Network (ResNet) is one of
the prominent neural network architectures that has long been used for
image classification and identification with excellent results. For example,
researchers have used transfer learning with ResNet-50 for malaria
cell-image classification [28] and for malicious software classification [29].
Residual networks have also been applied in several Bangla handwritten
character recognition studies [30–32].
This chapter proposes a two-fold approach built around a residual network
classifier for Bangla handwritten characters. First, a model based on the
ResNet variant ResNet-50 is created to classify the target dataset, which in
this case is the Ekush [33] dataset. The dataset is then stabilized by
removing outliers with an autoencoder, and the classification is repeated
with the same ResNet-50 model. Finally, the classes with fewer images are
augmented with additional images generated by DCGAN, so that the number
of images per class becomes balanced; this balanced dataset is then
classified with the ResNet-50 model. In the end, a detailed comparative
analysis of the results obtained from the above experiments is conducted to
measure the strengths of the adopted methods.
The rest of the chapter is structured as follows. Section 2 covers a
detailed review of the state of the art in Bangla handwritten character
recognition. The methods and materials of this study and an elaborate result
analysis with discussion are presented in Sects. 3, 4, and 5. The chapter
ends with an appropriate conclusion section.
2 Related Work
The researchers in [4] introduced a CNN model named EkushNet, which
generated satisfactory results on the Ekush [33] and CMATERdb [34]
datasets. The authors mention that their EkushNet model performed
extremely well and produced the best results on Bangla character
recognition relative to prior work. Their proposed model achieved 96.90%
accuracy on the training set and 97.73% on the validation set of the Ekush
dataset after 50 iterations. The authors also applied cross-validation on the
CMATERdb dataset and found that their EkushNet model is 95.01%
accurate. Another research work [7] applied only a CNN model to Bangla
handwritten character identification, and their proposed model obtained
85.96% accuracy on the test dataset, whereas the authors in [10] achieved
95% accuracy using a deep CNN model; both works used the 50 alphabet
classes of the Ekush dataset. Another study [20] achieved 95.05% accuracy
on the 122 classes of the Ekush dataset with their DCNN model, and the
authors also experimented on two other databases, CMATERdb and the
BanglaLekha-Isolated dataset [35]. The authors in [36] reported an excellent
accuracy of 98.78% on the CMATERdb dataset, using five different
approaches for classification.
The authors of [11] found that an ensembled convolutional neural network
system outperforms a single CNN model when it comes to recognizing
Bangla handwriting. They proposed a stacked generalization ensemble
framework consisting of six CNN models, which reached 96.72% accuracy
on the test set after only 40 epochs. Another study [37] applied three
approaches: first, seven CNN models were applied to recognize Bangla
handwritten characters; then the best-performing model, ResNet-50, which
gave 97.81% accuracy, was used for feature extraction, with classification
done by traditional classification algorithms; in the last step, the authors
employed different ensemble techniques for the classification task. The
stacked generalization ensemble method achieved 98.68% test accuracy, the
best result among all the adopted methods. All the experiments of that study
were done on the Ekush and BanglaLekha-Isolated datasets.
The authors of another study [38] experimented on six CNN models and
evaluated which DCNN model produces the best performance on the
CMATERdb [34] dataset. The results showed that all the DCNN models
performed very well, but the DenseNet model outperformed the others.
They also pointed out that the DCNN framework works better than other
object recognition methods.
Another work [17] showed that data augmentation can improve
handwritten character identification accuracy. The authors tested their
algorithms on the alphabets of the BanglaLekha-Isolated dataset and found
them to be 91.81% accurate without data-augmented images and 95.25%
accurate with data-augmented images. They also compared other machine
learning approaches to determine the efficiency of these methods; the
comparative analysis revealed that CNN outperforms SVM and LSTM with
or without data augmentation. They also put their proposed approaches to
the test on other datasets with similar characteristics, and the experiment
demonstrated 95.07% test accuracy on the 59 classes of the Ekush dataset.
The performance of a classifier can be enhanced by enlarging the dataset,
and GAN as a data augmentation technique can help to expand a dataset
[23]. In [22], the authors proposed a DCGAN architecture that successfully
enlarged four Bangla handwritten datasets. In that work the authors focused
only on digit datasets, and they did not attempt to evaluate CNN model
performance on the generated data. Another study [23] showed that adding
GAN-generated images to handwritten datasets can improve classifier
performance; the proposed method succeeded in increasing the accuracy on
the MNIST dataset by using the GAN approach. They also used GAN on
three Indian numerical handwritten datasets, Bangla, Devanagari, and Oriya,
and the accuracy on all the datasets improved. However, the results of that
work also showed that combining too many GAN-generated images with the
real dataset might degrade performance. Another digit recognition and
generation work [39] proposed a network architecture that achieved 99.44%
accuracy on the BHAND [40] dataset; the study then applied a
Semi-Supervised GAN (SGAN) to generate Bangla digits. One more
GAN-related work [41] proposed a conditional GAN-based method for
generating character images conditioned on class. That study used three
separate Bangla handwritten character datasets and produced very realistic
images by 1500 epochs, but did not apply any classification with the
generated images.
The literature review reveals that most of the research has been done
with either CNN or deep CNN models. Apart from classification itself, there
has been little variation in the approaches taken in Bangla handwritten
character recognition works. Only a few studies have combined the GAN
method with classifiers, and none of the studies has used outlier detection.
This literature review therefore identifies a knowledge gap regarding outlier
identification and elimination. In this work, both approaches are explored to
enhance the recognition performance of Bangla handwritten characters.
The autoencoder network has been trained using only inlier images. The
intuition behind using only inlier images is to make the model familiar with
what is normal, so that at test time it reconstructs outlier images poorly and
their reconstruction error becomes high. Images with a reconstruction error
above a specific threshold are then labeled as outliers and discarded from
the dataset. The reconstruction error is calculated as the mean squared error
between the input image and its reconstruction.
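As a rough illustration of this filtering step, the following sketch assumes a Keras convolutional autoencoder that has already been trained on inlier images; the threshold value and variable names are illustrative and are not taken from the study.

    import numpy as np
    from tensorflow import keras

    def detect_outliers(autoencoder: keras.Model, images: np.ndarray, threshold: float):
        """Flag images whose mean squared reconstruction error exceeds a threshold.

        `images` is assumed to be scaled to [0, 1] with shape (N, 28, 28, 1),
        matching the inlier data the autoencoder was trained on.
        """
        reconstructions = autoencoder.predict(images, verbose=0)
        # Per-image mean squared error between the input and its reconstruction.
        errors = np.mean(np.square(images - reconstructions), axis=(1, 2, 3))
        return errors, errors > threshold

    # Example usage: keep only the inliers of one class before classification.
    # errors, outlier_mask = detect_outliers(autoencoder, class_images, threshold=0.02)
    # clean_images = class_images[~outlier_mask]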
3.3 Generative Adversarial Network
The Ekush Bangla handwritten dataset contains several imbalanced classes.
Data augmentation is one way of generating additional images to balance a
dataset; however, approaches such as rotation and scaling can expand a
dataset without always adding information. A Generative Adversarial
Network (GAN), on the other hand, can generate synthetic images that bring
additional information to the dataset. We have chosen a deep convolutional
generative adversarial network (DCGAN), as it is the most effective
architecture for improving classification and identification [46]. We have
taken only five classes from the Ekush dataset, as these classes have far
fewer images than the others. For the classes that also underwent outlier
removal, the outlier-removed images are used as the input data of the
proposed GAN model. Table 1 shows the classes that have been used in
DCGAN.
Class    Number of images
76       4261
97       4100
110      2012
111      986
Following the research in [21], we have used the Adam optimizer in our
DCGAN model. Although another study [47] used an Adam optimizer with
a learning rate of … and a momentum of 0.1, we changed the learning rate
to … and the momentum term to … in both the discriminator and the
generator model, as we found that these parameter values helped to stabilize
the training. The β1 momentum is used to control the decay of the running
average of the gradient, which is multiplied exponentially by itself at the end
of each batch step [48]. Binary cross-entropy has been employed to measure
the loss of the discriminator and the generator. We have used two separate
batch sizes: a batch size of 64 for classes 110 and 111, because the number
of real images there is limited, and a batch size of 128 for the other three
classes. For these groups, the model has been trained for 2000 epochs,
except for class 111, which has been trained for 4000 epochs. The reason for
the higher number of epochs for class 111 is that its training data are very
scarce, which prevents the generator from producing quality synthetic
images in the early epochs. Every 50 epochs, we saved and inspected the
generated images. We have taken images for these five classes at various
epochs and identified the epoch at which the quality of the synthetic images
is good enough compared with the actual images. We have taken a fixed
number of images for each of these classes so that the classification model is
trained with at least 4000 images per class. Table 2 shows the total number
of images that are added to the actual training dataset.
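A minimal sketch of such a DCGAN training step is given below, using TensorFlow/Keras as an assumed framework. Since the exact learning rate and momentum values are not reproduced above, the numbers used here (learning rate 0.0002 and beta_1 = 0.5, as suggested in the DCGAN paper [21]) are placeholders rather than the settings of this study.

    import tensorflow as tf
    from tensorflow import keras

    # Placeholder hyperparameters, not the values used in this study.
    gen_optimizer = keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
    disc_optimizer = keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
    bce = keras.losses.BinaryCrossentropy()

    @tf.function
    def train_step(generator, discriminator, real_images, latent_dim=100):
        noise = tf.random.normal([tf.shape(real_images)[0], latent_dim])
        with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
            fake_images = generator(noise, training=True)
            real_pred = discriminator(real_images, training=True)
            fake_pred = discriminator(fake_images, training=True)
            # Discriminator: real -> 1, fake -> 0; the generator tries to push fake -> 1.
            d_loss = bce(tf.ones_like(real_pred), real_pred) + \
                     bce(tf.zeros_like(fake_pred), fake_pred)
            g_loss = bce(tf.ones_like(fake_pred), fake_pred)
        d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
        g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
        disc_optimizer.apply_gradients(zip(d_grads, discriminator.trainable_variables))
        gen_optimizer.apply_gradients(zip(g_grads, generator.trainable_variables))
        return d_loss, g_loss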
3.4 Classification
Before applying the classification model, all images have been resized to
28 × 28 in grayscale mode. We have used ResNet-50 to classify the 122
classes of the Ekush dataset; as the name implies, the model consists of 50
layers. A brief description of ResNet-50 is given in the following section.
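As a minimal example of the resizing step mentioned above (using Pillow and NumPy as an assumed toolchain, not necessarily the one used in the study), a single image could be prepared as follows.

    import numpy as np
    from PIL import Image

    def load_as_28x28_gray(path: str) -> np.ndarray:
        """Load an image, convert it to grayscale, resize to 28 x 28, and scale to [0, 1]."""
        img = Image.open(path).convert("L").resize((28, 28))
        return np.asarray(img, dtype=np.float32)[..., np.newaxis] / 255.0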
3.4.1 ResNet-50
Identity and convolutional blocks are the two different blocks used in the
ResNet-50 architecture, chosen according to the dimensions of the input and
output. Both blocks have a skip connection over the main path, which helps
the model learn an identity function. The identity function allows layers that
do not add value to the accuracy to be skipped during training [49]. In the
identity block, there are three Conv2D layers with stride (1, 1) and random
initialization seeded with zero; only the second Conv2D layer uses padding.
Batch normalization and ReLU activation follow each Conv2D layer, except
that the shortcut is added before the final ReLU activation. In the
convolutional block, the skip connection has a Conv2D layer and batch
normalization that the identity block does not have; apart from this, the
structure is almost the same as that of the identity block. The first
convolutional layer and the convolutional layer on the shortcut path have a
stride of (s, s), and the rest have (1, 1).
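The identity block described above can be sketched in Keras roughly as follows; the filter counts are illustrative assumptions, while the (1, 1) strides, zero-seeded initialization, padding on the middle convolution only, and shortcut addition before the final ReLU follow the description in the text.

    from tensorflow.keras import layers
    from tensorflow.keras.initializers import glorot_uniform

    def identity_block(x, filters, kernel_size=3):
        """Identity block: three Conv2D layers plus a skip connection."""
        f1, f2, f3 = filters
        shortcut = x
        x = layers.Conv2D(f1, (1, 1), strides=(1, 1),
                          kernel_initializer=glorot_uniform(seed=0))(x)
        x = layers.BatchNormalization(axis=3)(x)
        x = layers.Activation("relu")(x)
        # Only the middle convolution uses padding.
        x = layers.Conv2D(f2, (kernel_size, kernel_size), strides=(1, 1),
                          padding="same",
                          kernel_initializer=glorot_uniform(seed=0))(x)
        x = layers.BatchNormalization(axis=3)(x)
        x = layers.Activation("relu")(x)
        x = layers.Conv2D(f3, (1, 1), strides=(1, 1),
                          kernel_initializer=glorot_uniform(seed=0))(x)
        x = layers.BatchNormalization(axis=3)(x)
        # The shortcut is added before the final ReLU activation.
        x = layers.Add()([x, shortcut])
        x = layers.Activation("relu")(x)
        return x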
The ResNet-50 architecture has five stages. Before entering these
stages, the dataset images of dimension 28 × 28 × 1 are given as the input
shape to the ResNet-50 architecture. The first stage of ResNet-50 has a 7 × 7
convolutional layer with 32 filters and (1, 1) strides, followed immediately
by batch normalization and a 3 × 3 MaxPooling layer. Each of the remaining
stages of ResNet-50 consists of a convolutional block followed by two,
three, five, and two identity blocks, respectively. After the five stages,
average pooling with (2, 2) strides is used to reduce the output. Finally, a
softmax activation is used with a fully connected dense layer to map the
output to the 122 classes. The diagram of the ResNet-50 architecture is given
in Fig. 8.
4 Results
To improve the performance of Bangla handwritten character recognition, a
semi-supervised image outlier detection model has first been proposed, and
secondly, a generative adversarial network model has been used to balance
the dataset. For both strategies, subsets of the 122 classes have been chosen
based on the recommendations made in other works [37] and on domain
knowledge of Bangla handwritten characters. Outliers have been removed
from 7 classes using an autoencoder-based model, and 5 classes have been
balanced using the DCGAN model. In this section, the outcomes of the
experiments are explained in detail.
Figure 10 also justifies the efficiency of the outlier detection model. The
inlier images of class 19 have been divided into four batches, and all the
images of each batch have been superimposed into a single image; each
batch consists of approximately 1500 images. In contrast, the 272 outlier
images detected by the model have also been superimposed into a single
image. It is apparent from the figure that the superimposed inlier images
tend to retain the inherent shape of the character even with 1500 images,
whereas only 272 outliers make the corresponding superimposed image
completely jumbled, which further validates the efficiency of the outlier
detector.
Fig. 10 Superimposed inliers versus superimposed outliers
The changes in the training sample sizes are illustrated in Fig. 12. In
classes 110 and 111, more than 3000 synthesized images have been added,
and for the other three classes this number has been around 1000. For four
of these five classes, the classifier performance in terms of the F1-score has
improved. Moreover, the overall performance of the ResNet-50 classifier
trained on the balanced dataset has been better than that of the model trained
on the original dataset. This validates the applicability of DCGAN for
generating synthesized Bangla handwritten character images.
Fig. 12 Training size before versus training size after applying DCGAN
Fig. 15 Accuracy and loss of ResNet-50 on the outlier-removed and DCGAN-applied dataset
4.3 Comparison with State-of-the-Art
Outlier elimination on the Ekush dataset is a novel contribution; to the best
of our knowledge, we are the first to experiment on an outlier-removed
Ekush dataset. The authors in [22] only applied DCGAN to enlarge the
Ekush dataset, and no classification was performed on the generated images.
A comparative analysis of the current work with others that used only the
Ekush dataset is given in Table 8. Our proposed ResNet-50 model on the
original dataset achieved 97.63% accuracy on the test dataset (Table 4),
which outperforms all prior work except EkushNet. Shibly et al. [37]
achieved the best test accuracy of 98.68% on the Ekush dataset, but that
result was obtained with an ensemble of ten CNN models; their highest
performance with a single CNN model was 97.81% using ResNet-50, which
is outperformed by both of our proposed methods. Our work has also
achieved better performance than an ensemble method [11] and deep CNN
techniques [10, 20] applied to the same dataset. Moreover, although we
applied outlier removal to only seven classes and DCGAN to only five
classes, our two approaches outperformed the other related works. The
improvement is minor, as only some of the 122 classes of the Ekush dataset
have been treated in our study, but the results show that the proposed outlier
removal and DCGAN approaches are capable of improving classification
performance.
References
1. Yuan, A., Bai, G., Jiao, L., & Liu, Y. (2012). Offline handwritten English
character recognition based on convolutional neural network. In Proceedings 10th
IAPR International Workshop on Document Analysis Systems, DAS 2012 (pp.
125–129). https://doi.org/10.1109/DAS.2012.61
2.
Kimura, F., Wakabayashi, T., Tsuruoka, S., & Miyake, Y. (1997). Improvement of
handwritten Japanese character recognition using weighted direction code
histogram. Pattern Recognition, 30(8), 1329–1337. https://doi.org/10.1016/S0031-
3203(96)00153-7
[Crossref]
3. Ciresan, D. C., Meier, U., & Schmidhuber, J. (2012). Transfer learning for Latin
and Chinese characters with deep neural networks. In Proceedings of the
international joint conference on neural networks (pp. 1–6). https://doi.org/10.
1109/IJCNN.2012.6252544
4. Azad Rabby, A. K. M. S., Haque, S., Abujar, S., & Hossain, S. A. (2018).
Ekushnet: Using convolutional neural network for Bangla handwritten
recognition. Procedia Computer Science, 143, 603–610. https://doi.org/10.1016/j.
procs.2018.10.437
5. Ahmed, S., et al. (2019). Hand sign to bangla speech: A deep learning in vision
based system for recognizing hand sign digits and generating bangla speech.
https://doi.org/10.2139/ssrn.3358187
6. Manisha, N., Sreenivasa, E., & Krishna, Y. (2016). Role of offline handwritten
character recognition system in various applications. International Journal of
Computer Applications. https://doi.org/10.5120/ijca2016908349
7. Rahman, Md. M., Akhand, M. A. H., Islam, S., Chandra Shill, P., & Hafizur
Rahman, M. M. (2015). Bangla handwritten character recognition using
convolutional neural network. International Journal of Image, Graphics and
Signal Processing, 7(8), 42–49. https://doi.org/10.5815/ijigsp.2015.08.05
8. Ghosh, T., Abedin, M. H. Z., Al Banna, H., Mumenin, N., & Abu Yousuf, M.
(2021). Performance analysis of state of the art convolutional neural network
architectures in Bangla handwritten character recognition. Pattern Recognition
and Image Analysis, 31(1), 60–71. https://doi.org/10.1134/S1054661821010089
9. Chowdhury, R. R., Hossain, M. S., ul Islam, R., Andersson, K., & Hossain, S.
(2019). Bangla handwritten character recognition using convolutional neural
network with data augmentation. In 2019 Joint 8th international conference on
informatics, electronics & vision (ICIEV) and 2019 3rd international conference
on imaging, vision & pattern recognition (icIVPR) (pp. 318–323). https://doi.org/
10.1109/ICIEV.2019.8858545
10.
Ahmed, S., Tabsun, F., Reyadh, A. S., Shaafi, A. I., & Shah, F. M. (2019). Bengali
handwritten alphabet recognition using deep convolutional neural network. In 5th
International conference on computer, communication, chemical, materials and
electronic engineering, IC4ME2 2019. https://doi.org/10.1109/IC4ME247184.
2019.9036572
11. Shibly, M. M. A., Tisha, T. A., & Ripon, S. H. (2021). Stacked generalization
ensemble method to classify Bangla handwritten character. In Proceedings of
international conference on sustainable expert systems. Lecture Notes in
Networks and Systems 176. https://doi.org/10.1007/978-981-33-4355-9_46
12. Mamun, M. R., Al Nazi, Z., & Yusuf, M. S. (2018). Bangla handwritten digit
recognition approach with an ensemble of deep residual networks. In
International conference on bangla speech and language processing, ICBSLP
2018 (pp. 21–22). https://doi.org/10.1109/ICBSLP.2018.8554674
13. Goodfellow, I., et al. (2014). Generative adversarial nets. Advance in Neural
Information Process Systems, 27.
14. Basu, S., Das, N., Sarkar, R., Kundu, M., Nasipuri, M., & Basu, D. K. (2009). A
hierarchical approach to recognition of handwritten Bangla characters. Pattern
Recognition, 42(7), 1467–1484. https://doi.org/10.1016/j.patcog.2009.01.008
[Crossref][zbMATH]
15. Bhowmik, T. K., Ghanty, P., Roy, A., & Parui, S. K. (2009). SVM-based
hierarchical architectures for handwritten Bangla character recognition.
International Journal on Document Analysis and Recognition, 12(2), 97–108.
https://doi.org/10.1007/s10032-009-0084-x
[Crossref]
16. Bhattacharya, U., Gupta, B. K., & Parui, S. K. (2007). Direction code based
features for recognition of online handwritten characters of Bangla. In
Proceedings of the international conference on document analysis and
recognition, ICDAR, 2007. https://doi.org/10.1109/ICDAR.2007.4378675
17. Chowdhury, R. R., Hossain, M. S., Ul Islam, R., Andersson, K., & Hossain, S.
(2019). Bangla handwritten character recognition using convolutional neural
network with data augmentation. In 2019 Joint 8th international conference on
informatics, electronics and vision, ICIEV 2019 and 3rd international conference
on imaging, vision and pattern recognition, icIVPR 2019 with international
conference on activity and behavior computing, ABC 2019 (pp. 318–323). https://
doi.org/10.1109/ICIEV.2019.8858545
18.
Shopon, M., Mohammed, N., & Abedin, M. A. (2017). Bangla handwritten digit
recognition using autoencoder and deep convolutional neural network. In IWCI
2016-2016 International Workshop on Computational Intelligence. https://doi.org/
10.1109/IWCI.2016.7860340
19. Shopon, M., Mohammed, N., & Abedin, M. A. (2017). Image augmentation by
blocky artifact in deep convolutional neural network for handwritten digit
recognition. In IEEE international conference on imaging, vision and pattern
recognition, icIVPR 2017 (pp. 1–6). https://doi.org/10.1109/ICIVPR.2017.
7890867
20. Mashrukh Zayed, M., Neyamul Kabir Utsha, S. M., & Waheed, S. (2021).
Handwritten bangla character recognition using deep convolutional neural
network: Comprehensive analysis on three complete datasets. Advances in
Intelligent Systems and Computing. https://doi.org/10.1007/978-981-33-4673-4_7
21. Radford, A., Metz, L., & Chintala, S. (2016). Unsupervised representation
learning with deep convolutional generative adversarial networks. In 4th
International conference on learning representations, ICLR 2016-conference track
proceedings.
22. Haque, S., Shahinoor, S. A., Rabby, A. K. M. S. A., Abujar, S., & Hossain, S. A.
(2018). OnkoGan: Bangla handwritten digit generation with deep convolutional
generative adversarial networks. In Recent Trends in image processing and
pattern recognition, second international conference, {RTIP2R} 2018, Solapur,
India, 21–22 Dec 2018, Revised Selected Papers, Part {III}, 2018, vol. 1037 (pp.
108–117). https://doi.org/10.1007/978-981-13-9187-3_10
23. Jha, G., & Cecotti, H. (2020). Data augmentation for handwritten digit recognition
using generative adversarial networks. Multimed Tools and Applications. https://
doi.org/10.1007/s11042-020-08883-w
[Crossref]
24. Biswas, R., Vasan, A., & Roy, S. S. (2020). Dilated deep neural network for
segmentation of retinal blood vessels in fundus images. Iranian Journal of
Science and Technology, Transactions of Electrical Engineering, 44(1), 505–518.
https://doi.org/10.1007/s40998-019-00213-7
[Crossref]
25. Roy, S. S., Rodrigues, N., & Taguchi, Y. (2020). Incremental dilations using CNN
for brain tumor classification. Applied Sciences, 10(14), 4915. https://doi.org/10.
3390/app10144915
[Crossref]
26. Roy, S. S., Mihalache, S. F., Pricop, E., & Rodrigues, N. (2022). Deep
convolutional neural network for environmental sound classification via dilation.
Journal of Intelligent & Fuzzy Systems, 43(2), 1827–1833. https://doi.org/10.
3233/JIFS-219283
[Crossref]
27. Roy, S. S., et al. (2022). L2 regularized deep convolutional neural networks for
fire detection. Journal of Intelligent & Fuzzy Systems, 43(2), 1799–1810. https://
doi.org/10.3233/JIFS-219281
[Crossref]
28. Reddy, A. S. B., & Juliet, D. S. (2019). Transfer learning with ResNet-50 for
malaria cell-image classification. In International Conference on Communication
and Signal Processing (ICCSP) (pp. 945–949). https://doi.org/10.1109/ICCSP.
2019.8697909
29. Rezende, E., Ruppert, G., Carvalho, T., Ramos, F., & de Geus, P. (2017).
Malicious software classification using transfer learning of ResNet-50 deep neural
network. In Proceedings of the 16th IEEE international conference on machine
learning and applications, ICMLA 2017 (pp. 1011–1014). https://doi.org/10.1109/
ICMLA.2017.00-19
30. Alif, M. A. R., Ahmed, S., & Hasan, M. A. (2017). Isolated Bangla handwritten
character recognition with convolutional neural network. In 2017 20th
International conference of computer and information technology (ICCIT) (pp. 1–
6).
31. Alom, M. Z., Sidike, P., Hasan, M., Taha, T. M., & Asari, V. K. (2018).
Handwritten Bangla character recognition using the state-of-the-art deep
convolutional neural networks. Computational Intelligence and Neuroscience.
https://doi.org/10.1155/2018/6747098
[Crossref]
32. Khan, M. M., Uddin, M. S., Parvez, M. Z., & Nahar, L. (2022). A squeeze and
excitation ResNeXt-based deep learning model for Bangla handwritten compound
character recognition. Journal of King Saud University Computer and Information
Sciences, 34(6), 3356–3364. https://doi.org/10.1016/j.jksuci.2021.01.021
[Crossref]
33.
Rabby, A. K. M. S. A., Haque, S., Islam, M. S., Abujar, S., & Hossain, S. A.
(2019). Ekush: A multipurpose and multitype comprehensive database for online
off-line Bangla handwritten characters. Communications in Computer and
Information Science. https://doi.org/10.1007/978-981-13-9187-3_14
[Crossref]
34. Sarkar, R., Das, N., Basu, S., Kundu, M., Nasipuri, M., & Basu, D. K. (2012).
CMATERdb1: A database of unconstrained handwritten Bangla and Bangla-
English mixed script document image. International Journal on Document
Analysis and Recognition. https://doi.org/10.1007/s10032-011-0148-6
[Crossref]
36. Alom, Z., Sidike, P., Taha, T. M., & Asari, V. K. (2017). Handwritten bangla digit
recognition using deep learning, p. 1712.
37. Shibly, M. M. A., Tisha, T. A., Tani, T. A., & Ripon, S. (2021). Convolutional
neural network-based ensemble methods to recognize Bangla handwritten
character. PeerJ Computer Science, 7, 1–30. https://doi.org/10.7717/peerj-cs.565
[Crossref]
38. Alom, M. Z., Sidike, P., Hasan, M., Taha, T. M., & Asari, V. K. (2017).
Handwritten bangla character recognition using the state-of-art deep convolutional
neural networks, p.1712.
39. Sikder, M. F. (2020). Bangla handwritten digit recognition and generation. In:
Proceedings of international joint conference on computational intelligence (pp.
547–556).
41. Nishat, Z. K., & Shopon, M. (2019). Synthetic class specific Bangla handwritten
character generation using conditional generative adversarial networks. In 2019
International conference on bangla speech and language processing (ICBSLP
2019). https://doi.org/10.1109/ICBSLP47725.2019.201475
42.
Chaudhuri, B. B. (2006). A complete handwritten numeral database of Bangla-A
major Indic script. In 10th international workshop on frontiers of handwriting
recognition (IWFHR), La Baule, France.
43. Alam, S., Reasat, T., Doha, R. M., & Humayun, A. I. (2018). NumtaDB-
assembled Bengali handwritten digits, pp 1–4.
45. Bank, D., Koenigstein, N., & Giryes, R. (2020). Autoencoders. In Machine
learning: Methods and applications to brain disorders (pp. 193–208). https://doi.
org/10.1016/B978-0-12-815739-8.00011-0
47. Haque, S., Shahinoor, S. A., Rabby, A. K. M. S. A., Abujar, S., & Hossain, S. A.
(2019). OnkoGan: Bangla handwritten digit generation with deep convolutional
generative adversarial networks. Communications in Computer and Information
Science. https://doi.org/10.1007/978-981-13-9187-3_10
[Crossref]
48. Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization.
Preprint at arXiv arXiv:1412.6980.
49. Theckedath, D., & Sedamkar, R. R. (2020). Detecting affect states using VGG16,
ResNet50 and SE-ResNet50 networks. SN Computer Science. https://doi.org/10.
1007/s42979-020-0114-9
[Crossref]
50. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image
recognition. In 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) (pp. 770–778). https://doi.org/10.1109/CVPR.2016.90
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. S. Roy et al. (eds.), Deep Learning Applications in Image Analysis, Studies in Big Data 129
https://doi.org/10.1007/978-981-99-3784-4_2
Burak Taşci
Email: btasci@firat.edu.tr
1 Introduction
The novel coronavirus pandemic (COVID-19) created a worldwide environment of chaos
in a very short time. As of July 2021, over 206 million official cases had been reported
worldwide, and the number of deaths due to COVID-19 had exceeded 4 million [1]. Many
countries have developed various policies to cope with this pandemic and minimize its
effects. In particular, Turkey is among the few countries that set an example to the world as
a result of its early measures and social isolation rules. It is of vital importance to take early
action against COVID-19 and similar pandemics. If COVID-19 cases can be detected
early, these patients can be isolated so that healthy, uninfected individuals remain safe.
Science and technology make great contributions to the precautionary policies
implemented in this sense. One of the most important of these contributions is predicting
how the pandemic will evolve over time. In this context, two main approaches appear: the
first consists of statistical approaches and mathematical models, and the second consists of
artificial intelligence-based approaches, which have received more attention in recent years.
In the literature, there are various approaches for disease detection using biomedical
images based on machine learning and deep learning methods [2–8].
Javaheri et al. [9] tried to detect COVID-19 positive, CAP, and other diseases from
89,145 images obtained from the data of 5 different hospitals using BCDU-Net (U-Net).
The achieved results were 91.66%, 87.5%, 95%, and 94% for accuracy, sensitivity, AUC,
and specificity, respectively. Rehmen et al. [10] used CT and X-ray images of 200
COVID-19(+), 200 healthy, 200 bacterial pneumonia, and 200 viral pneumonia cases in their
study. Using the ResNet101 transfer learning method, the reported results were 98.75%,
97.5%, 96.43%, and 100% for accuracy, sensitivity, precision, and specificity, respectively.
JavadiMoghaddam et al. [11] proposed a deep learning model called Wavelet CNN-4,
which consists of a wavelet layer, four convolution layers, and a Squeeze-and-Excitation
block in the coupling layer. They compared the proposed model with pre-trained models
such as VGG11, ResNet18, ResNet50, and Inception-v3; the proposed model achieved
99.03% accuracy. Chen et al. [12] tried to detect COVID-19 positive and other diseases
from 35,355 images using U-Net++. With the applied method, the obtained results were
98.85%, 94.34%, 99.16%, 88.37%, and 99.4% for accuracy, sensitivity, specificity, precision,
and AUC, respectively. Wu et al. [13] used CT images consisting of 368
COVID-19(+) and 127 other-disease cases in their study. Using the ResNet50 transfer learning
method, the reported results were 76%, 81.1%, 61.5%, and 81.9% for accuracy, sensitivity,
AUC, and specificity, respectively. Mobiny et al. [14] used CT images consisting of 349
COVID-19(+) and 397 COVID-19(-) images in their study. GAN, rescaling, and cropping
were applied as preprocessing, and the DECAPS and DECAPS + Peekaboo architectures
were used. The applied DECAPS + Peekaboo method reached 87.6%,
84.3%, 87.1%, and 85.2% for accuracy, sensitivity, F1-score, and specificity, respectively.
Balaha et al. [15] proposed a hybrid learning and optimization approach based on pre-
trained models to detect COVID-19; the Harris Hawks Optimization (HHO) algorithm was
used to optimize the hyperparameters. They performed data augmentation by combining
three publicly available datasets. The Weighted Summation Method (WSM) was used as an
evaluation metric to compare combinations of models, with the best accuracy being 99.33%
with VGG19. Li et al. [16] proposed an automated deep learning framework, COVNet, to
accurately identify COVID-19 from chest CTs; a dataset consisting of 4356 chest CT
scans was reportedly used while creating the models. With this model, in distinguishing
COVID-19 patients from other pneumonia patients, a sensitivity of 87% and an area under
the curve (AUC) of 0.95 were obtained. He et al. [17] used CT images consisting of 349
COVID-19(+) and 397 COVID-19(-) images in their study; the Self-Trans method was used
as preprocessing. Using the DenseNet-169 transfer learning method, the reached results were
85%, 94%, and 86% for F1-score, AUC, and accuracy, respectively. Ahamed et al. [18] used
datasets consisting of chest X-ray and CT images in their study to train their proposed
model. Images were preprocessed and enlarged before entering the proposed ResNet50V2
model, and extra layers were added to the base model with regularization and fine-tuning.
They classified the images into two-class, three-class, and four-class categories, both
preprocessed and non-preprocessed. The model achieved 99.01% and 83.6%
accuracy for the three-class categories with and without preprocessing, respectively. Pathak
et al. [19] used 413 COVID-19(+) and 439 normal or pneumonia CT images in their study.
ResNet50 feature extraction was used as preprocessing, and a CNN was used for
classification. With the CNN network, the reached results were 93.01%, 91.45%, 94.77%,
and 95.18% for accuracy, sensitivity, specificity, and precision, respectively. Shi et al. [20]
applied a machine learning algorithm, Random Forest (RF), to screen for COVID-19. CT
images of 2685 patients were used to evaluate the models in the presented study. After
evaluation with the fivefold cross-validation technique, the model achieved accuracy,
sensitivity, and specificity of 87.9%, 90.7%, and 83.3%, respectively.
The following are the primary contributions of this study:
The proposed model utilizes the classification capabilities of features derived from the
pre-trained deep architectures AlexNet and ResNet101.
The study examines the Chi-square, NCA, mRMR, and ReliefF feature selection
algorithms in order to reduce the number of features obtained from the pre-trained deep
neural networks and to identify the most effective deep features. The AlexNet and
ResNet101 features that give the highest results are combined, and the mRMR feature
selection algorithm is applied to the combined features. In the experimental studies, a
highly successful diagnostic model was obtained by using these selected and effective
features for chest X-ray image classification.
Deep features were obtained using pre-trained CNN networks, and those features were
used to optimize the parameters of the best SVM classifier. This method achieved the
maximum performance, with a score of 98.21%, in the classification of chest X-ray
images.
In the remainder of the chapter, the materials and methods are described in the second
section, the experimental studies and results in the third section, and the discussion in the
fourth section.
2.1.1 Preprocessing
The gradient method is applied to the input images, and the gradient magnitudes and
directions are calculated with the help of directional gradients.
The watershed method is usually applied to the gradient of the image. Using the 8
neighboring points around each point in the image, the steepest and roughest directions in
the image are detected [21]. Points with a minimum height in the image are marked with
individual identifiers. Using the gradient information in the image, the descending regions
are followed at certain rates, and the watershed method associates every pixel with its
respective minimum point [22].
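As an informal illustration only (the study does not specify its implementation), the gradient-plus-watershed preprocessing could be expressed with scikit-image roughly as follows.

    import numpy as np
    from scipy import ndimage as ndi
    from skimage import filters, morphology, segmentation

    def watershed_on_gradient(gray_image: np.ndarray) -> np.ndarray:
        """Apply the watershed transform to the gradient magnitude of an image."""
        gradient = filters.sobel(gray_image)        # gradient magnitude
        minima = morphology.local_minima(gradient)  # points with minimum height
        markers, _ = ndi.label(minima)              # give each minimum its own identifier
        # Every pixel is assigned to the catchment basin of its minimum point.
        return segmentation.watershed(gradient, markers)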
(1)
    Dw(xi, xj) = Σm wm² |xim − xjm|
Here wm is the weight that has been allotted to the mth feature. A kernel function that
returns large values for small Dw can be used to determine the relationship between the
probability Pij and the weighted distance Dw. The kernel function is defined as
k(z) = exp(−z/σ), where the parameter σ is the kernel width; it affects the probability that
sample xj will be selected as the reference point. Pij, the probability that xj is picked as the
reference point of xi, is defined by the following equation:
(2)
    Pij = k(Dw(xi, xj)) / Σ(l ≠ i) k(Dw(xi, xl)),   with Pii = 0
Also, yij takes the value 1 if yi = yj and 0 otherwise. The probability of xi being classified
correctly is written as in Eq. 3:
(3)
    pi = Σj yij Pij
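The following NumPy sketch illustrates how the quantities in Eqs. 1-3 can be computed for a given weight vector; it is purely illustrative and is not the feature selection implementation used in the study.

    import numpy as np

    def nca_probabilities(X, y, w, sigma=1.0):
        """Compute Pij (Eq. 2) and the per-sample probability of correct
        classification pi (Eq. 3) for feature weights w; sigma is the kernel width."""
        # Weighted distance Dw(xi, xj) = sum_m wm^2 |xim - xjm|  (Eq. 1)
        D = np.abs(X[:, None, :] - X[None, :, :]) @ (w ** 2)
        K = np.exp(-D / sigma)          # kernel k(z) = exp(-z / sigma)
        np.fill_diagonal(K, 0.0)        # Pii = 0
        P = K / K.sum(axis=1, keepdims=True)
        same_class = (y[:, None] == y[None, :]).astype(float)   # yij
        return P, (P * same_class).sum(axis=1)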
ReliefF
One of the most well-known feature selection approaches is the Relief algorithm. It is a
type of algorithm that can produce quite accurate and useful feature evaluations, which it
accomplishes by assigning weights to the features. If a feature is useful, one can expect that,
along that feature, the nearest samples of the same class will be closer to one another than
the nearest samples of all other classes [24]. A convex optimization problem is solved, and
the result is used to determine the feature weights. However, the Relief algorithm has the
limitation of only being able to handle two-class situations and of not being able to process
incomplete data. The ReliefF method, an enhanced version of the Relief algorithm, was
offered as a solution to these and other difficulties; this enhanced approach can cope with
multi-class, noisy, and incomplete data. In the working logic of the ReliefF algorithm, first a
sample Ri is randomly selected; then the k nearest neighbors from the same class, called Hj,
and the k nearest neighbors from each of the different classes, called Mj(C), are selected.
Depending on the values of Ri, Hj, and Mj(C), the weight W[A] is updated for all features
A. Feature weights range from −1 to +1, and the largest positive values mean that the
feature is important. This process is repeated a number of times determined by the user. The
diff function calculates the differences, that is, the distances, between samples with respect
to a feature; the calculation of this function depends on whether the feature is nominal or
numeric. Let I1 and I2 be samples and A a feature. If the feature is nominal, the calculation
is as in Eqs. 4 and 5. Choosing k appropriately increases the robustness of the algorithm
against noisy data. This value can be set by the user, but if k is chosen as 1, the algorithm
will be sensitive to noisy data. In many studies the k value has been chosen as 10, but
choosing different k values is useful for examining the importance levels of the features;
finally, choosing the k value too small will also cause poor results.
(4)
(5)
(6)
(7)
(8)
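Since Eqs. 4-8 are not reproduced here, the following simplified sketch only approximates the ReliefF update described in the text, assuming numeric features and a range-normalized diff function; it is an illustration, not the exact formulation.

    import numpy as np

    def relieff_weights(X, y, k=10, n_iter=100, seed=0):
        """Simplified ReliefF sketch for numeric features: weights are decreased by
        distances to the nearest hits and increased by prior-weighted distances to
        the nearest misses of each other class."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        span = X.max(axis=0) - X.min(axis=0) + 1e-12
        priors = {c: np.mean(y == c) for c in np.unique(y)}
        w = np.zeros(d)
        for _ in range(n_iter):
            i = rng.integers(n)
            diff = np.abs(X - X[i]) / span          # per-feature normalized distance
            dist = diff.sum(axis=1)
            dist[i] = np.inf
            hits = np.argsort(np.where(y == y[i], dist, np.inf))[:k]
            w -= diff[hits].mean(axis=0) / n_iter
            for c, p in priors.items():
                if c == y[i]:
                    continue
                misses = np.argsort(np.where(y == c, dist, np.inf))[:k]
                w += (p / (1 - priors[y[i]])) * diff[misses].mean(axis=0) / n_iter
        return w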
AlexNet
Deep learning pioneers Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton came up with
the method that became known as AlexNet [29]. This deep convolutional neural
network has a total of 25 layers: 5 convolution layers, 3 max-pooling layers, 2 dropout
layers, 3 fully connected layers, 7 ReLU layers, 2 normalization layers, a softmax layer, and
the input and classification (output) layers. The dimensions of the image that goes into the
input layer of AlexNet are 227 × 227 × 3. The final layer is where classification takes place,
and it is also where the class of the input image is produced.
DenseNet201
Forward connections are made between each layer of DenseNet-121 (Densely
Connected Convolutional Network) and the other layers. Each layer of the DenseNet design
takes as input the feature maps of all of the layers that came before it, and its own feature
maps are passed on to the layers that come after it [30]. DenseNet topologies have the
advantage of strengthening feature propagation and reducing the number of parameters by
permitting feature reuse [31]. The DenseNet-121 design is composed of four dense blocks,
three transition layers, and 121 layers in total (117 convolutional, 3 transition, and 1
classification).
MobileNetV2
MobileNet designs are built on a streamlined, modular architecture that allows for the
development of both shallow and deep neural networks. The architecture's two basic global
hyperparameters provide an optimal balance of latency and accuracy. Based on the
constraints of the problem, these hyperparameters allow the model builder to select an
appropriately sized model for the application.
NASNet-Large
NASNet-Large is a 1243-layer convolutional neural network trained on more than one
million images from the ImageNet collection. The network can classify images into one
thousand object categories, such as animals, balloons, and flowers. As a result, the network
has acquired rich feature representations for a vast array of image types. The image to be
fed to the network should be 331 × 331 pixels.
EfficientNet B0
EfficientNet, a new CNN family developed by Google in 2019, provides significant
improvements in accuracy and efficiency (performance). The scaling approach presented in
the study is novel because it is also applicable to other CNN models. EfficientNet-B0 is the
baseline network developed using AutoML MNAS [32]; it consists of 290 layers. The image
to be placed in the input layer of EfficientNet-B0 is 227 × 227 × 3 in size.
GoogleNet
GoogLeNet came first in the ImageNet 2014 image classification competition with a success
rate of 93.33%. The GoogLeNet architecture consists of 144 layers, and this architecture has
shown that, with very large datasets, classification performance can be increased by
increasing the number of layers. The image to be placed in the input layer of GoogLeNet is
224 × 224 × 3. In order to prevent overloading with large images, it applies filters of various
sizes, such as 1 × 1, 3 × 3, and 5 × 5, in the same stage. Unlike other architectures, this
architecture processes images in parallel rather than simply stacking the layers it creates,
because negative factors such as increased memory use and wasted time were considered for
stacked processing [25].
Inception ResNet-V2
The Inception-ResNet-v2 architecture combines residual connections with a newer version
of the Inception architecture, and the network makes efficient use of these residual
connections [33]. The feature extraction performance of the Inception-ResNet-v2
architecture is quite good. In this architecture, residual units are added to each Inception
module to prevent the degradation of the network gradient usually associated with an
increase in the number of layers. The Inception-ResNet-v2 architecture consists of 825
layers, and the image to be placed in its input layer measures 299 × 299 × 3.
Inception V3
The Inception architecture emerged with the GoogLeNet model. The GoogLeNet model,
proposed by Szegedy et al. (2015), tries to keep the computational cost constant while
increasing the depth and width. Therefore, in this model using the Inception concept, the
outputs obtained from different convolution filters are combined [34]. The Inception-v3
architecture consists of 316 layers, and the image to be placed in its input layer measures
299 × 299 × 3.
ResNet-18
The pre-trained ResNet-18 model, which provides rich features, was trained on more than
one million images from the ImageNet dataset with an input size of 224 × 224. Although it
has 71 layers and a depth of 18, it has been observed to give successful and faster results
compared to some models with deeper layers [35].
ResNet-50
The ResNet micro-architecture module differs from other architectures in its structure: it
may be preferable to pass to a lower layer while ignoring the change between some layers,
and by allowing this, the ResNet architecture raises the performance rate to higher levels.
The ResNet-50 architecture consists of a network of 177 layers with a depth of 50. In
addition to this layered structure, the architecture contains information about how the
inter-layer connections are made [36].
ResNet-101
The ResNet-101 structure has 347 layers and a depth of 101. ResNet's bypass (skip)
between layers is referred to as a ResBlock. Even if nothing is learned in the previous layer,
the ResBlock makes the model more robust by passing the information from the previous
layer on to the new layer; the ResBlock thereby addresses the vanishing-gradient issue.
Gradient descent is used as the optimization algorithm. The ResNet-101 input layer
dimensions are 224 × 224 × 3 [36].
VGG16
The VGG16 model consists of a total of 41 layers, 16 of which have learnable weights,
followed by ReLU and pooling layers. The learnable layers comprise thirteen convolutional
and three fully connected layers. Similar to AlexNet, the VGG16 model employs a 1-pixel
stride and 3 × 3 filters in all convolutional layers, and maximum pooling layers follow the
convolutional layers; maximum pooling is performed with a 2 × 2 filter and a stride of two.
To extract feature vectors, the activations in the first and second fully connected layers (fc6,
fc7) were utilized; the fc6 and fc7 output vectors each contain 4096 features. Training uses
224 × 224 RGB images [37].
VGG19
The Visual Geometry Group (VGG) at the University of Oxford is responsible for the
development of VGG19. It consists of 19 weight layers, 16 of which are convolutional and
3 of which are fully connected, together with 5 maximum pooling layers and a softmax
layer. The input for this network is images with dimensions of (224, 224, 3). Approximately
144 million trainable parameters are available. Filters of size 3 × 3 with a stride of one pixel
were employed so that the overall notion of the image could be captured [37].
(9)
(10)
(11)
2.2 Dataset
The dataset consists of 1061 X-ray images labeled by radiologists. The dataset was
edited after being downloaded from the Kaggle website [42, 43]. The X-ray images
comprise three classes: COVID-19, pneumonia, and normal. There are 361 COVID-19, 500
pneumonia, and 200 normal chest X-ray images in the dataset. The COVID-19 cases in the
dataset consist of chest X-ray images of 200 male and 161 female patients, and the mean
age of the patients is over 45. The images range in height from 143 to 1637 pixels (average
491 pixels) and in width from 76 to 1225 pixels (average 383 pixels). Figure 2 shows
example X-ray scans of COVID-19, normal, and pneumonia patients in the dataset.
Fig. 2 COVID-X-Ray scan dataset sample images
(12)
(13)
(14)
(15)
(16)
4 Experimental Studies
The experimental results in this study were obtained in the MATLAB environment, using an
all-in-one computer with an i7 processor, 16 GB of RAM, and a 4 GB graphics card. The
images in the dataset were resized to 224 × 224, 227 × 227, 299 × 299, and 331 × 331, and
classification was performed. In the study, the convolutional neural networks AlexNet,
EfficientNet-B0, GoogLeNet, Inception-ResNet-v2, Inception-v3, DenseNet201,
MobileNetV2, NASNet-Large, ResNet18, ResNet50, ResNet101, VGG16, and VGG19 were
used, together with the Chi-square, NCA, mRMR, and ReliefF feature selection methods. A
total of 2000 features were selected, 1000 from the FC8 layer of AlexNet and 1000 from the
FC1000 layer of ResNet101. The selected features were reduced to 200 features with the
mRMR feature selection method, and these 200 features were given to 13 different
classifiers. In this study, the highest performance was observed with SVM. Figure 3 shows
the confusion matrices of the classification configurations in which the 13 different
pre-trained networks and the combined network reach their highest accuracy. The
ResNet50 + AlexNet network with the Cubic SVM classifier and mRMR feature selection
had the best accuracy result, 98.21%, and the Inception-ResNet-v2 network with the Cubic
SVM classifier and NCA feature selection had the worst accuracy result, 95.00%.
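The pipeline above was implemented in MATLAB; as an illustrative Python analogue only, the general idea (deep features from two pre-trained networks concatenated, reduced to 200 features, and classified with a cubic-kernel SVM) could be sketched as follows. Scikit-learn has no built-in mRMR, so a generic mutual-information selector stands in for it here, and the variable names are assumptions.

    import numpy as np
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def classify_fused_features(alexnet_feats, resnet_feats, labels):
        """alexnet_feats: (N, 1000) activations from AlexNet's FC8 layer.
        resnet_feats:  (N, 1000) activations from ResNet101's FC1000 layer.
        labels:        (N,) class labels (COVID-19 / pneumonia / normal)."""
        fused = np.hstack([alexnet_feats, resnet_feats])      # 2000 combined features
        selector = SelectKBest(mutual_info_classif, k=200)    # stand-in for mRMR
        clf = make_pipeline(selector, StandardScaler(),
                            SVC(kernel="poly", degree=3, C=1.0))   # "cubic" SVM
        return cross_val_score(clf, fused, labels, cv=5).mean()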
Fig. 3 Confusion matrices with the highest accuracy
In Fig. 4, the accuracy values of the pre-trained networks are plotted according to the
classifiers and feature selection methods.
Fig. 4 Accuracy values of the pre-trained networks according to classifiers and feature selections
The Cubic SVM classifier had the highest accuracy, 96.42%, for the AlexNet network, and
the Medium Gaussian SVM classifier with mRMR feature selection had the worst, 89.2%.
The Cubic SVM classifier with NCA feature selection had the highest accuracy, 96.61%, for
the DenseNet-201 network, and the Quadratic Discriminant classifier with Chi2 feature
selection had the worst, 89.6%. The Cubic SVM classifier with NCA feature selection had
the highest accuracy, 96.51%, for the EfficientNet-B0 network, and the Fine Tree classifier
with Chi2 feature selection had the worst, 89.3%. The Cubic SVM classifier with NCA
feature selection had the highest accuracy, 96.06%, for the GoogLeNet network, and the
Quadratic SVM classifier with mRMR feature selection had the worst, 89.7%. The Cubic
SVM classifier had the highest accuracy, 95.0%, for the Inception-ResNet-v2 network, and
the Medium Gaussian SVM classifier with NCA feature selection had the worst, 89.7%.
The Cubic SVM classifier with Chi2 feature selection had the highest accuracy, 96.14%,
for the Inception-v3 network, and the Bilayered Neural Network had the worst, 89.2%. The
Cubic SVM classifier with Chi2 feature selection had the highest accuracy, 96.14%, for the
MobileNetV2 network, and the Quadratic Discriminant classifier with ReliefF feature
selection had the worst, 90.0%. The Cubic SVM classifier with ReliefF feature selection had
the highest accuracy, 96.32%, for the NASNet-Large network, and the Medium Gaussian
SVM with ReliefF feature selection had the worst, 89.7%. The Cubic SVM classifier had the
highest accuracy, 96.04%, for the ResNet18 network, and the Quadratic Discriminant
classifier with ReliefF feature selection had the worst, 90.0%. The Cubic SVM classifier
with NCA feature selection had the highest accuracy, 97.08%, for the ResNet50 network,
and the Quadratic Discriminant classifier with NCA feature selection had the worst, 90.1%.
The Quadratic Discriminant classifier with NCA feature selection had the highest accuracy,
96.04%, for the ResNet101 network, and the Fine Tree classifier with ReliefF feature
selection had the worst, 90.4%. The Cubic SVM classifier with mRMR feature selection had
the highest accuracy, 96.42%, for the VGG16 network, and the Medium Gaussian SVM
with NCA feature selection had the worst, 90.0%. The Quadratic Discriminant classifier
with NCA feature selection had the highest accuracy, 95.66%, for the VGG19 network, and
the Medium Gaussian SVM with NCA feature selection had the worst, 89.3%.
Table 2 presents the sensitivity, specificity, precision and F-score results of the
classifiers used in the proposed method. For the pneumonia class, the accuracy,
sensitivity, specificity, precision and F-score were all 100%. For the COVID-19 class, the
GoogleNet network with a Cubic SVM classifier and NCA feature selection had the best
sensitivity with 100%, and the Inception ResNet-v2 network with a Cubic SVM classifier and
NCA feature selection had the worst with 94.18%. For specificity, the ResNet50 + AlexNet
combination with a Cubic SVM classifier and mRMR feature selection had the best result
with 98.14%, and the VGG19 network with a Cubic SVM classifier and mRMR feature selection
had the worst with 93.71%. For precision, ResNet50 + AlexNet with a Cubic SVM classifier
and mRMR feature selection had the best result with 96.47%, and VGG19 with a Cubic SVM
classifier and mRMR feature selection had the worst with 89.08%. For the F-score, ResNet50
+ AlexNet with a Cubic SVM classifier and mRMR feature selection had the best result with
97.39%, and Inception ResNet-v2 with a Cubic SVM classifier and NCA feature selection had
the worst with 92.77%.
For the Normal class, the VGG19 network with a Cubic SVM classifier and mRMR feature
selection had the best sensitivity with 99.77%, and the GoogleNet network with a Cubic SVM
classifier and NCA feature selection had the worst with 96.04%. For specificity, the
Inception ResNet-v2 network with a Cubic SVM classifier and NCA feature selection had the
best result with 98.95%, and the VGG19 network with a Cubic SVM classifier and mRMR
feature selection had the worst with 78.00%. For precision, ResNet50 + AlexNet with a
Cubic SVM classifier and mRMR feature selection had the best result with 98.50%, and
Inception ResNet-v2 with a Cubic SVM classifier and NCA feature selection had the worst
with 84.00%. For the F-score, ResNet50 + AlexNet with a Cubic SVM classifier and mRMR
feature selection had the best result with 98.90%, and the Inception-v3 network with a
Cubic SVM classifier and Chi2 feature selection had the worst with 96.38%.
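For reference, the per-class metrics quoted above (sensitivity, specificity, precision, F-score) can be computed from a multi-class confusion matrix as in the following sketch; the example matrix is hypothetical and not the chapter's actual results.

```python
# Sketch of the per-class metrics reported above, computed from a confusion matrix.
import numpy as np

def per_class_metrics(cm):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    metrics, total = {}, cm.sum()
    for c in range(cm.shape[0]):
        tp = cm[c, c]
        fn = cm[c, :].sum() - tp
        fp = cm[:, c].sum() - tp
        tn = total - tp - fn - fp
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        prec = tp / (tp + fp)
        f1 = 2 * prec * sens / (prec + sens)
        metrics[c] = dict(sensitivity=sens, specificity=spec, precision=prec, f_score=f1)
    return metrics

# Hypothetical 3-class confusion matrix (rows: COVID-19, pneumonia, normal)
cm = np.array([[95, 2, 3],
               [1, 98, 1],
               [4, 0, 96]])
for cls, m in per_class_metrics(cm).items():
    print(cls, {k: round(v, 4) for k, v in m.items()})
```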
5 Discussion
In this section, the performance of studies based on pre-trained models and of the proposed
method is discussed in terms of accuracy, sensitivity and specificity. Evaluations in the
literature are usually made on combined data sets. Since the data sets and the evaluation
criteria used in these studies differ, none of the methods can be said to be strictly
superior to the others. The performance scores of these methods are given in Table 3.
Abbas et al. [44] established a modified deep neural network, called DeTraC, to distinguish
COVID-19 cases on X-ray images more effectively. The model includes three inner layers, was
built on a ResNet18 backbone, and achieved 95.12% accuracy on the X-ray dataset. Wang et
al. [45] used 44 COVID-19(+) and 55 typical viral pneumonia CT images in their study. As
preprocessing, ROI extraction based on visual inspection was performed. The applied
M-Inception algorithm achieved 82.9% accuracy, 81% sensitivity, 84% F1-score, 77% AUC and
90% specificity. Alqudah et al. [46] used SVM, Random Forest and CNN classifiers and
achieved 95.2% accuracy, 93.3% sensitivity, 100% specificity and 100% precision. Hemdan et
al. [47] suggested the COVIDX-Net deep learning architecture for COVID-19 diagnosis from
X-ray images. In addition, they validated seven distinct DCNN models, such as VGG19 and
DenseNet201, and demonstrated that the VGG19 and DenseNet classifiers are superior. Narin
et al. [48] used deep CNN-based models (InceptionResNetV2, ResNet50 and InceptionV3) to
detect people infected with coronavirus pneumonia from chest X-ray radiographs; based on
their experiments, 98.00% accuracy was reached with the ResNet50 model.
The proposed approach reached an accuracy of 98.21%, and 100% sensitivity and specificity
for the pneumonia class. For the COVID-19 class, sensitivity, specificity, precision and
F-score values of 98.34%, 98.14%, 96.47% and 97.39%, respectively, were obtained.
6 Conclusion
The rapid spread of the COVID-19 pandemic all over the world and its negative effects on
people clearly demonstrate the importance of detecting positive cases at an early stage and
intervening quickly and correctly. In this study, a three-class data set consisting of
X-ray images obtained during the COVID-19 epidemic was classified using transfer learning.
Preprocessing techniques were applied to the X-ray images to improve classification
performance: a Sobel gradient operator was used to highlight salient regions in the X-ray
images and to reduce the number of gray tones. Chi-square, NCA, mRMR and ReliefF feature
selection methods were used. First, the results of 13 pre-trained models were compared.
Then, a total of 2000 features were selected from AlexNet and ResNet101, reduced to 200
features with the mRMR feature selection method, and passed to 13 different classifiers.
The highest performance, 98.21%, was obtained with the SVM classifier after applying mRMR
feature selection to the combined ResNet50 + AlexNet model. For the COVID-19 class, the
highest accuracy, sensitivity, specificity, precision and F-score were obtained with
ResNet50 + AlexNet Cubic SVM (98.21%), GoogleNet Cubic SVM (100%), ResNet50 + AlexNet
Cubic SVM (98.14%), ResNet50 + AlexNet Cubic SVM (96.47%) and ResNet50 + AlexNet Cubic SVM
(97.39%), respectively. The proposed approach shows that pre-trained CNN architectures and
feature selection methods can be used together, and that combining features from different
networks can be more effective than considering the performance of each feature selection
method separately. The major limitation of this study is that the method requires more
powerful hardware if applied to larger datasets.
References
1. CoronaVirus Updates. (2022). https://www.worldometers.info/coronavirus/
2. Jalali, S. M. J., Ahmadian, M., Ahmadian, S., Hedjam, R., Khosravi, A., & Nahavandi, S. (2022). X-
ray image based COVID-19 detection using evolutionary deep learning approach. Expert Systems
with Applications, 201, 116942.
[Crossref]
3. Dhiman, G., Chang, V., Kant Singh, K., & Shankar, A. (2022). Adopt: Automatic deep learning and
optimization-based approach for detection of novel coronavirus covid-19 disease using x-ray images.
Journal of Biomolecular Structure and Dynamics, 40(13), 5836–5847.
[Crossref]
4. Roy, S. S., Goti, V., Sood, A., Roy, H., Gavrila, T., Floroian, D., Paraschiv, N., & Mohammadi-
Ivatloo, B. (2014). L2 regularized deep convolutional neural networks for fire detection. Journal of
Intelligent & Fuzzy Systems, 1–12.
5. Ravi, V., Narasimhan, H., Chakraborty, C., & Pham, T. D. (2022). Deep learning-based meta-
classifier approach for COVID-19 classification using CT scan and chest X-ray images. Multimedia
Systems, 28(4), 1401–1415.
[Crossref]
6. Roy, S. S., Rodrigues, N., & Taguchi, Y. (2020). Incremental dilations using CNN for brain tumor
classification. Applied Sciences, 10(14), 4915.
[Crossref]
7. Biswas, R., Vasan, A., & Roy, S. S. (2020). Dilated deep neural network for segmentation of retinal
blood vessels in fundus images. Iranian Journal of Science and Technology, Transactions of
Electrical Engineering, 44(1), 505–518.
[Crossref]
8. Samui, P., Roy, S. S., & Balas, V. E. (2017). Handbook of neural computation. Academic Press.
9.
Javaheri, T., Homayounfar, M., Amoozgar, Z., Reiazi, R., Homayounieh, F., Abbas, E., Laali, A.,
Radmard, A. R., Gharib, M. H., & Mousavi, S. A. J. (2021). CovidCTNet: An open-source deep
learning approach to diagnose covid-19 using small cohort of CT images. NPJ Digital Medicine,
4(1), 1–10.
[Crossref]
10. Rehman, A., Naz, S., Khan, A., Zaib, A., & Razzak, I. (2022) Improving coronavirus (COVID-19)
diagnosis using deep transfer learning. In Proceedings of international conference on information
technology and applications (pp. 23–37). Springer.
11. JavadiMoghaddam, S., & Gholamalinejad, H. (2021). A novel deep learning based method for
COVID-19 detection from CT image. Biomedical Signal Processing and Control, 70, 102987.
[Crossref]
12. Chen, J., Wu, L., Zhang, J., Zhang, L., Gong, D., Zhao, Y., Chen, Q., Huang, S., Yang, M., & Yang,
X. (2020). Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-
resolution computed tomography. Scientific Reports, 10(1), 1–11.
13. Wu, X., Hui, H., Niu, M., Li, L., Wang, L., He, B., Yang, X., Li, L., Li, H., & Tian, J. (2020). Deep
learning-based multi-view fusion model for screening 2019 novel coronavirus pneumonia: A
multicentre study. European Journal of Radiology, 128, 109041.
[Crossref]
14. Mobiny, A., Cicalese, P., Zare, S., Yuan, P., Abavisani, M., Wu, C., Ahuja, J., de Groot, P., & Van
Nguyen, H. (2020). Covid R-l detection using CT scans with detail-oriented capsule networks.
15. Balaha, H. M., El-Gendy, E. M., & Saafan, M. M. (2021). CovH2SD: A COVID-19 detection
approach based on Harris Hawks Optimization and stacked deep learning. Expert Systems with
Applications, 186, 115805.
16. Li, L., Qin, L., Xu, Z., Yin, Y., Wang, X., Kong, B., Bai, J., Lu, Y., Fang, Z., & Song, Q. (2020)
Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT.
Radiology.
17. He, X., Yang, X., Zhang, S., Zhao, J., Zhang, Y., Xing, E., & Xie, P. (2020) Sample-efficient deep
learning for COVID-19 diagnosis based on CT scans. Medrxiv.
18. Ahamed, K. U., Islam, M., Uddin, A., Akhter, A., Paul, B. K., Yousuf, M. A., Uddin, S., Quinn, J.
M., & Moni, M. A. (2021). A deep learning approach using effective preprocessing techniques to
detect COVID-19 from chest CT-scan and X-ray images. Computers in Biology and Medicine, 139,
105014.
[Crossref]
19. Pathak, Y., Shukla, P. K., Tiwari, A., Stalin, S., & Singh, S. (2020). Deep transfer learning based
classification model for COVID-19 disease. Irbm.
20. Shi, F., Xia, L., Shan, F., Song, B., Wu, D., Wei, Y., Yuan, H., Jiang, H., He, Y., & Gao, Y. (2021).
Large-scale screening to distinguish between COVID-19 and community-acquired pneumonia using
infection size-aware classification. Physics in Medicine & Biology, 66(6), 065031.
[Crossref]
21. Tarabalka, Y., Chanussot, J., & Benediktsson, J. A. (2010). Segmentation and classification of
hyperspectral images using watershed transformation. Pattern Recognition, 43(7), 2367–2379.
[Crossref][zbMATH]
22.
Gauch, J. M. (1999). Image segmentation and analysis via multiscale gradient watershed hierarchies.
IEEE Transactions on Image Processing, 8(1), 69–79.
[Crossref]
23. Yang, W., Wang, K., & Zuo, W. (2012). Neighborhood component feature selection for high-
dimensional data. Journal of Computers, 7(1), 161–168.
[Crossref]
24. Robnik-Šikonja, M., & Kononenko, I. (2003). Theoretical and empirical analysis of ReliefF and
RReliefF. Machine Learning, 53(1), 23–69.
[Crossref][zbMATH]
25. Liu, H., Li, J., & Wong, L. (2002). A comparative study on feature selection and classification
methods using gene expression profiles and proteomic patterns. Genome Informatics, 13, 51–60.
26. McHugh, M. L. (2013). The chi-square test of independence. Biochemia Medica, 23(2), 143–149.
[Crossref]
27. Peng, H., Long, F., & Ding, C. (2005). Feature selection based on mutual information criteria of
max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 27(8), 1226–1238.
[Crossref]
28. Ding, C., & Peng, H. (2005). Minimum redundancy feature selection from microarray gene
expression data. Journal of Bioinformatics and Computational Biology, 3(02), 185–205.
[Crossref]
29. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). Imagenet classification with deep
convolutional neural networks. Communications of the ACM, 60(6), 84–90.
[Crossref]
30. Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected
convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern
recognition (pp. 4700–4708).
31. Iandola, F., Moskewicz, M., Karayev, S., Girshick, R., Darrell, T., & Keutzer, K. (2014) Densenet:
Implementing efficient convnet descriptor pyramids. Preprint at arXiv:14041869
32. Tan, M., & Le, Q. (2019). Efficientnet: Rethinking model scaling for convolutional neural networks.
In International conference on machine learning, PMLR (pp. 6105–6114).
33. Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). Inception-v4, inception-resnet and the
impact of residual connections on learning. In Thirty-first AAAI conference on artificial intelligence.
34. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., &
Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on
computer vision and pattern recognition (pp. 1–9).
35. Ou, X., Yan, P., Zhang, Y., Tu, B., Zhang, G., Wu, J., & Li, W. (2019). Moving object detection
method via ResNet-18 with encoder–decoder structure in complex scenes. IEEE Access, 7, 108152–
108160.
[Crossref]
36.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In
Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778)
37. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image
recognition. Preprint at arXiv:14091556.
38. Vapnik, V. (1999). The nature of statistical learning theory. Springer science & business media.
39. McRoberts, R. E., Tomppo, E. O., Finley, A. O., & Heikkinen, J. (2007). Estimating areal means and
variances of forest attributes using the k-Nearest Neighbors technique and satellite imagery. Remote
Sensing of Environment, 111(4), 466–480.
[Crossref]
40. Bühlmann, P. (2012). Bagging, boosting and ensemble methods. In Handbook of computational
statistics. Springer, pp 985–1022.
44. Abbas, A., Abdelsamea, M. M., & Gaber, M. M. (2021). Classification of COVID-19 in chest X-ray
images using DeTraC deep convolutional neural network. Applied Intelligence, 51(2), 854–864.
[Crossref]
45. Wang, S., Kang, B., Ma, J., Zeng, X., Xiao, M., Guo, J., Cai, M., Yang, J., Li, Y., & Meng, X.
(2021). A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-
19). European Radiology, 31(8), 6096–6104.
[Crossref]
46. Alqudah, A. M., Qazan, S., Alquran, H., Qasmieh, I. A., & Alqudah, A. (2020). COVID-2019
detection using X-ray images and artificial intelligence hybrid systems. Biomedical Signal and
Image Analysis and Project.
47. Hemdan, E. E.-D., Shouman, M. A., & Karar, M. E. (2020). Covidx-net: A framework of deep
learning classifiers to diagnose covid-19 in x-ray images. Preprint at arXiv:200311055.
48. Narin, A., Kaya, C., & Pamuk, Z. (2021). Automatic detection of coronavirus disease (covid-19)
using x-ray images and deep convolutional neural networks. Pattern Analysis and Applications,
24(3), 1207–1220.
[Crossref]
49. Cohen, J. P., Morrison, P., Dao, L., Roth, K., Duong, T. Q., & Ghassemi, M. (2020). Covid-19 image
data collection: Prospective predictions are the future. Preprint at arXiv:200611988.
1 Introduction
Generating a textual description of an image is an easy task for a human
being; for a machine, however, explaining an image requires computer vision
to visualise the image and NLP to describe it [1]. Hence, in order to
generate a caption automatically for a particular photograph, the system
must be trained to understand the content of the image and then to express
that content in natural language [2]. With the advent of deep learning
methods, especially for image feature extraction and processing [3], this
problem has been addressed rapidly.
Deep learning techniques such as the convolutional neural network (CNN)
are widely used for image processing tasks because of their ability to deal
with millions of underlying features [4]. CNN techniques have proven quite
efficient for a variety of medical image processing tasks, e.g. COVID-19
lung CT scans [5], MRI images for brain tumor diagnosis [6, 7], retinal
blood vessels [8], angiograms [9], chest X-rays [10] and many more.
Just by seeing the picture depicted in Fig. 1, some of us might describe
the scene in one way, some may say "A little boy is playing with toys", and
yet others might say "A little boy is designing the house". All of these
observations are valid, and even a few additional captions are possible.
None of these descriptions require any special training or effort from a
human being; this is not the case for a machine, which cannot produce an
appropriate description just by glancing at the image.
This study of generating captions for images has the following
significance.
The experiments are based on transfer learning coupled with Convolutional
Neural Networks (CNN).
We aim to boost the model performance by making subtle changes to the
block diagram.
The objective is to produce semantically and syntactically correct captions
for the input images by using phrases as elementary units instead of words.
Motivation
This problem is immensely useful in real-world applications. A few
applications where this study can be applied are listed below:
Self-driving cars: By automatically generating a caption of the scene
around the car, the self-driving system would become truly autonomous.
Aid to the blind: A product that converts the scene around a blind person
into text, followed by text-to-speech, could guide them when walking on
the roads and fulfil a lot of aspirations.
Google image search: Like Google text search, image search could become
more powerful if an image were first transformed into a caption and the
underlying text then searched.
2 Related Studies
Different techniques for image captioning exist; they are retrieval based or
template based. Recently, deep learning based captioning has become very
popular due to the quality and appropriateness of the textual descriptions
of images. Deep learning based attention mechanisms also deliver promising
results in captioning [11]. Most of the models are encoder-decoder based,
and LSTM and bidirectional LSTM networks are used as decoders in most
systems [12]. Similarly, for encoding, VGG16 and ResNet50 are employed for
their effectiveness in vectorising images [13].
A few studies on image captioning that have used deep learning for
image processing and text description are presented in Table 1.
3 Methodology
We used a combined CNN-RNN model to extract the features from the images
and text; further, we used an evaluation model to check the accuracy of the
proposed model, and the performance of the model at each epoch was tracked
with the help of the error rate. Here we use a top-down approach and
transfer learning to extract the features, train the model, and obtain
accurate captions for the images. In fact, the concept of transfer learning
is applied twice in our model: InceptionV3 for extracting features from the
images, and GloVe for extracting features from the text/captions for better
accuracy. Finally, we test the model with some test images to determine its
accuracy. The detailed methodology consists of the following steps (a sketch
of the two transfer-learning steps follows this list):
Data collection.
Data cleaning and pre-processing.
The result of pre-processing is a vocabulary of 1652 unique words from the
training dataset. We employed the InceptionV3 transfer learning model.
We encoded all the training and testing images that are input to our
model.
After removing the stop-words in the process of data cleaning, we have
7578 words in our vocabulary.
We also used a transfer learning model (GloVe) to extract the features
from our pre-processed text data.
Then we built and trained our network/model. Finally, we evaluated the
performance on the test data.
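A minimal sketch of the two transfer-learning steps is given below, assuming TensorFlow/Keras and a locally available GloVe file (glove.6B.200d.txt and the 299 × 299 input size are illustrative assumptions, not necessarily the exact configuration used in this work).

```python
# Minimal sketch of the two transfer-learning steps: InceptionV3 for image
# features and pre-trained GloVe vectors for word features.
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Model

# 1) Image encoder: drop the softmax layer, keep the 2048-d bottleneck vector
base = InceptionV3(weights="imagenet")
encoder = Model(base.input, base.layers[-2].output)

def encode_image(path):
    img = image.load_img(path, target_size=(299, 299))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return encoder.predict(x, verbose=0)[0]          # shape (2048,)

# 2) Text features: GloVe embeddings for the caption vocabulary
def load_glove(path="glove.6B.200d.txt"):
    emb = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            emb[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return emb

def embedding_matrix(word_index, emb, dim=200):
    mat = np.zeros((len(word_index) + 1, dim))
    for word, idx in word_index.items():
        if word in emb:
            mat[idx] = emb[word]
    return mat
```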
3.1 Dataset
We utilised the Flickr8k dataset, which contains around 8000 images, out of
which 6000 images are used for training the model, 1000 images for
validating the model and the remaining 1000 images for testing the model in
order to determine its efficiency. Each image has five captions (Fig. 2).
As Fig. 4 shows, each individual image has five different captions.
Fig. 4 Captions for the images
The Flickr dataset is loaded into the repository, and the data is then pre-
processed by removing extra whitespace, punctuation, and other
distractions. For encoding, a CNN is used: the input image is fed to the CNN
to extract the features, and after the features are processed by a series of
layers, the last hidden state of the CNN is connected to the decoder. In
this framework, an RNN serves as the decoder and performs language
modelling at the word level. A schematic diagram of the encoder-decoder
based image captioning process is shown in Fig. 2.
The process of encoding and decoding, the detailed layers of these models
and the parameters involved are presented in Figs. 6 and 7, respectively.
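The encoder-decoder wiring can be illustrated with the following compact Keras sketch; the layer sizes (a 256-unit LSTM, 200-dimensional embeddings, a 2048-dimensional image vector, a maximum caption length of 34) are illustrative assumptions and not necessarily the exact configuration of Figs. 6 and 7.

```python
# Compact sketch of the encoder-decoder wiring: the CNN bottleneck vector and
# the partial caption are merged, and an LSTM-based decoder predicts the next word.
from tensorflow.keras.layers import Input, Dense, LSTM, Embedding, Dropout, add
from tensorflow.keras.models import Model

vocab_size, max_len = 7578, 34

img_in = Input(shape=(2048,))                 # InceptionV3 bottleneck features
img_feat = Dense(256, activation="relu")(Dropout(0.5)(img_in))

txt_in = Input(shape=(max_len,))              # partial caption as word indices
txt_emb = Embedding(vocab_size, 200, mask_zero=True)(txt_in)
txt_feat = LSTM(256)(Dropout(0.5)(txt_emb))

decoder = Dense(256, activation="relu")(add([img_feat, txt_feat]))
out = Dense(vocab_size, activation="softmax")(decoder)

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```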
4 Results
The main objective is to predict a caption for a given image. For
prediction, we applied an efficient predictive model using deep learning
techniques, focusing on the ability of the model to find a suitable caption
for a given image in the dataset.
For evaluating the calibre of the generated text, we used BLEU
(Bilingual Evaluation Understudy), since it is based on matching each
generated text against a set of reference texts composed by humans. The
result is a score that reflects the overall quality of the generated text.
We achieved a BLEU score of 0.645 on the considered dataset.
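BLEU can be computed, for example, with NLTK; the captions below are made-up tokens used only to show the call.

```python
# Illustrative BLEU evaluation with NLTK: each generated caption is compared
# against the image's human reference captions.
from nltk.translate.bleu_score import corpus_bleu

references = [
    [["a", "little", "boy", "is", "playing", "with", "toys"],
     ["a", "child", "plays", "on", "the", "floor"]],
]
candidates = [["a", "boy", "is", "playing", "with", "toys"]]

print("BLEU-1: %.3f" % corpus_bleu(references, candidates, weights=(1.0, 0, 0, 0)))
print("BLEU-4: %.3f" % corpus_bleu(references, candidates))
```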
References
1. Sharma, H., Agrahari, M., Singh, S. K., Firoj, M., & Mishra, R. K. (2020). Image
captioning: A comprehensive survey. In 2020 International Conference on Power
Electronics & IoT Applications in Renewable Energy and its Control (PARC) (pp.
325–328). IEEE.
2. Stefanini, M., Cornia, M., Baraldi, L., Cascianelli, S., Fiameni, G., & Cucchiara,
R. (2022). From show to tell: a survey on deep learning-based image captioning.
IEEE Transactions on Pattern Analysis and Machine Intelligence.
4. Chohan, M., Khan, A., Mahar, M. S., Hassan, S., Ghafoor, A., & Khan, M.
(2020). Image captioning using deep learning: A systematic. Image, 11(5).
5. Tiwari, R. S., Das, T. K., Srinivasan, K., & Chang, C. Y. (2022). Conceptualising
a channel-based overlapping CNN tower architecture for COVID-19 identification
from CT-scan images. Scientific Reports, 12(1), 1–15.
[Crossref]
6.
Roy, S. S., Rodrigues, N., & Taguchi, Y. (2020). Incremental dilations using CNN
for brain tumor classification. Applied Sciences, 10(14), 4915.
[Crossref]
7. Das, T. K., Roy, P. K., Uddin, M., Srinivasan, K., Chang, C. Y., & Syed-Abdul, S.
(2021). Early tumor diagnosis in brain MR images via deep convolutional neural
network model. Computers, Materials and Continua, 68(2), 2413–2429.
[Crossref]
8. Biswas, R., Vasan, A., & Roy, S. S. (2020). Dilated deep neural network for
segmentation of retinal blood vessels in fundus images. Iranian Journal of
Science and Technology, Transactions of Electrical Engineering, 44(1), 505–518.
[Crossref]
9. Roy, S. S., Hsu, C., Samaran, A., Goyal, R., Pande, A., et al. (2023). Vessels
segmentation in angiograms using convolutional neural network: A deep learning
based approach. CMES-Computer Modeling in Engineering & Sciences, 136(1),
241–255.
[Crossref]
10. Das, T. K., Chowdhary, C. L., & Gao, X. Z. (2020). Chest X-ray investigation: a
convolutional neural network approach. Journal of Biomimetics, Biomaterials and
Biomedical Engineering, 45, 57–70. Trans Tech Publications Ltd.
11. Zohourianshahzadi, Z., & Kalita, J. K. (2022). Neural attention for image
captioning: Review of outstanding methods. Artificial Intelligence Review, 55(5),
3833–3862.
[Crossref]
12. Wang, C., Yang, H., Bartz, C., & Meinel, C. (2016). Image captioning with deep
bidirectional LSTMs. In Proceedings of the 24th ACM International Conference
on Multimedia (pp. 988–997).
13. Rampal, H., & Mohanty, A. (2020). Efficient CNN-LSTM based image captioning
using neural network compression. Preprint retrieved from arXiv:2012.09708.
14. Chen, X., & Zitnick, C. L. (2014). Learning a recurrent visual representation for
image caption generation. Preprint retrieved from arXiv:1411.5654.
15. Sharma, H., & Jalal, A. S. (2020). Incorporating external knowledge for image
captioning using CNN and LSTM. Modern Physics Letters B, 34(28), 2050315.
[MathSciNet][Crossref]
16.
You, Q., Jin, H., Wang, Z., Fang, C., & Luo, J. (2016). Image captioning with
semantic attention. In Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (pp. 4651–4659).
17. Rampal, H., & Mohanty, A. (2020). Efficient CNN-LSTM based image captioning
using neural network compression. Preprint retrieved from arXiv:2012.09708.
18. Arnav, J. H., & Pulkit, M. (2018). Image captioning using deep learning.
19. Yao, T., Pan, Y., Li, Y., Qiu, Z., & Mei, T., (2017). Boosting image captioning
with attributes. In Proceedings of the IEEE International Conference on Computer
Vision (pp. 4894–4902).
20. Singh, Y. P., Ahmed, S. A. L. E., Singh, P., Kumar, N., & Diwakar, M. (2021).
Image captioning using artificial intelligence. In Journal of Physics: Conference
Series (Vol. 1854, No. 1, p. 012048). IOP Publishing.
21. Wang, C., Yang, H., Bartz, C., & Meinel, C. (2016). Image captioning with deep
bidirectional LSTMs. In Proceedings of the 24th ACM International Conference
on Multimedia (pp. 988–997).
K. Ganesan
Email: kganesan@vit.ac.in
1 Introduction
Every year, many individuals die in road accidents all across the world. Vehicle
accidents are one of the most common causes of death, and they not only kill people but
also injure a large number of them. Among the several causes of accidents, high-speed
driving is the most important, so high-speed vehicles must be managed. Accordingly,
different government organisations, academic institutions, and automobile manufacturers
have begun various studies and projects to lower the likelihood of accidents and provide
safety to passengers and drivers. Several researchers have used different kinds of
mechanisms to detect vehicle over-speeding on highways, such as VANET technology connected
to a cloud server [1], video-based region-of-interest (ROI) analysis [2], and speed
prediction based on electronic toll collection data [3]. To manage high-speed vehicles on
the highway, the Tamil Nadu government planned to install an over-speed detection device
at the toll plaza. Figure 1 depicts a block diagram of over-speed detection in a toll
plaza. This architecture is made up of a vehicle detection system, a common cloud server
that is linked to an RTO server, and an over-speed detection system.
Fig. 1 Block diagram of the proposed system for high speed detection
The CRNN decoder is composed of a bi-directional LSTM layer and a CTC layer. The
bi-directional LSTM receives its input from the column vectors of the feature map. Its
output is a probability matrix of size (W×8)/H × C, where C is the number of character
labels (26 English uppercase letters, 26 lowercase letters, and a blank), and it represents
the probabilities of the characters in each column vector. The extracted feature map has a
width of (W×8)/H. The likelihood of the label sequence is determined by applying the CTC
layer to the bi-directional LSTM's output. During training, the likelihood of the label
sequence is given by the conditional probability defined in the CTC layer, and the negative
log-likelihood of this conditional probability serves as the loss function for training the
network. The CTC layer computes the probability sum over all paths that correspond to the
true label sequence. For example, the paths 'hee-ll-o' and 'hh-ee-ll-oo' (where '-' denotes
the blank) both collapse, by removing duplicates and blanks, to the label sequence 'helo'.
At test time, the recognition result is the character sequence with the highest
probability.
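A greedy CTC decoding step, which reproduces the 'hee-ll-o' → 'helo' collapse described above, can be sketched as follows (a lowercase-only charset is used here for brevity; the chapter's recogniser also includes uppercase letters).

```python
# Minimal greedy CTC decoding: take the most likely label per time step,
# collapse repeated labels, then drop the blank symbol ('-' here).
import numpy as np

def ctc_greedy_decode(prob_matrix, charset, blank_index):
    """prob_matrix: (T, C) per-column character probabilities from the BiLSTM."""
    best = np.argmax(prob_matrix, axis=1)          # best label at each time step
    decoded, prev = [], None
    for idx in best:
        if idx != prev and idx != blank_index:     # collapse repeats, skip blanks
            decoded.append(charset[idx])
        prev = idx
    return "".join(decoded)

charset = list("abcdefghijklmnopqrstuvwxyz") + ["-"]   # '-' used as the blank here
T = 12
probs = np.full((T, len(charset)), 1e-3)
for t, ch in enumerate("hee-ll-o    "[:T]):            # trailing frames stay blank
    probs[t, charset.index(ch if ch != " " else "-")] = 1.0
print(ctc_greedy_decode(probs, charset, blank_index=charset.index("-")))  # -> "helo"
```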
The equations for detecting a curve from the sequence of segment points are shown below.
Before calculating the radius of curvature, we must first calculate the great-circle
distance between two points using the haversine formula:

$a = \sin^2\!\left(\tfrac{\Delta\varphi}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\tfrac{\Delta\lambda}{2}\right)$   (3)

$c = 2\,\mathrm{atan2}\!\left(\sqrt{a}, \sqrt{1-a}\right)$   (4)

$d = ER \cdot c$   (5)

where $\varphi$ is the latitude, $\lambda$ is the longitude, and ER is the radius of the
Earth (ER = 6,371 km). Let the distance between segment points S1 and S2 be $a$, between S2
and S3 be $b$, and between S1 and S3 be $c$; then the radius of the circle through the
three points is

$R = \dfrac{abc}{\sqrt{(a+b+c)(-a+b+c)(a-b+c)(a+b-c)}}$   (6)
According to the Indian Roads Congress [27, 28], a vehicle can travel at a speed of
70 to 80 km/h on a curve of 1000 m radius on Indian highways [26]. We therefore assume
that the maximum radius of an Indian road curve is 1000 m. Using Eq. (6) at segment
points S1, S2, and S3 from Fig. 3a, we find that the radius for these three segment points
is more than 1000 m because they are interconnected almost in a straight line. So, we
check the next three adjacent segment points S2, S3, and S4. The radius of these three
segment points (S2, S3, and S4) is less than 1000 m because they form a curve. The radius
values of these three segment points are recorded in the radius list, and this procedure
is repeated for subsequent segment points until we reach the final set of segment points
along the path.
The segment points S3–S9 in Fig. 3a yield six curve radii R1, R2, R3, R4, R5, and
R6, which are saved in the radius list. After that, the average curve radius
(R1 to R6) of the segment points S2 to S9 is calculated. The detected curve of the
path in Fig. 3a is shown in Fig. 3b (explained in Algorithm 1).
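The haversine distance, the three-point radius of Eq. (6) and the 1000 m threshold test can be combined into a small curve-detection routine; the coordinates below are hypothetical segment points, not data from the study.

```python
# Sketch of the curve-detection step: great-circle distances between consecutive
# segment points (haversine), the radius of the circle through each triple of
# points, and the 1000 m threshold test.
import math

EARTH_RADIUS_M = 6_371_000

def haversine(p1, p2):
    """Great-circle distance in metres between (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def circumradius(a, b, c):
    """Radius of the circle through three points with pairwise distances a, b, c."""
    s = (a + b + c) / 2                      # Heron's formula for the triangle area
    area_sq = s * (s - a) * (s - b) * (s - c)
    if area_sq <= 0:
        return float("inf")                  # collinear points -> straight road
    return a * b * c / (4 * math.sqrt(area_sq))

def detect_curves(points, max_radius=1000.0):
    """Return the radii of every consecutive point triple flagged as a curve."""
    radii = []
    for s1, s2, s3 in zip(points, points[1:], points[2:]):
        r = circumradius(haversine(s1, s2), haversine(s2, s3), haversine(s1, s3))
        if r < max_radius:
            radii.append(r)
    return radii

# Hypothetical segment points (lat, lon) along a road
pts = [(12.9720, 79.1580), (12.9722, 79.1595), (12.9727, 79.1608), (12.9735, 79.1615)]
print(detect_curves(pts))
```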
The curve list keeps track of the curve's starting segment point S2, ending
segment point S8, mid-segment point S6, and computed average curve radius. The
method is applied to the route between the origin and destination, and the detected
curves and their attributes (curve starting point, curve ending point, curve mid-
point, and average curve radius) are saved in the curve list. Figure 4 shows a case where
the source location is on a highway but the destination location is in mountainous
(hilly) terrain. The curves on the highway are always large; a single curve that is
1000 m long, as seen in Fig. 4 (top red solid circle), is a good example. The
mountainous terrain, in contrast, features several hairpin bends. These curves have a
radius of 50 to 150 m, and some curves are 500 m long. As the assumed maximum
curve radius of roads in India is 1000 m, multiple hairpin curves may be grouped into a
single curve, as seen in Fig. 4 (bottom two red solid circles).
Fig. 4 Different types of curves in different terrains
$D_{straight} = D_{toll} - (D_{c1} + D_{c2} + D_{c3})$   (7)

where $D_{straight}$ is the straight-road distance, obtained by subtracting the total
distance of the curved sections from the toll-road distance $D_{toll}$. The time taken to
travel only on the straight road is the straight-road distance divided by the declared
speed, $T_{straight} = D_{straight}/V_{declared}$. Here, $T_{c1}$, $T_{c2}$, and $T_{c3}$
are the times to travel over the curves, computed as the curvature distances $D_{c1}$,
$D_{c2}$, $D_{c3}$ divided by the corresponding curvature speed restrictions $V_{c1}$,
$V_{c2}$, $V_{c3}$. Finally, the curvature-aware travel time is obtained by adding the
travel time on the straight road and the travel time over every curve,
$T_{ca} = T_{straight} + T_{c1} + T_{c2} + T_{c3}$.
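A small helper illustrating the curvature-aware travel-time computation described above; the distances and speed limits below are made-up values.

```python
# Sketch of the curvature-aware travel-time estimate: the straight-road portion
# is covered at the declared speed, each detected curve at its curvature speed limit.
def curve_aware_travel_time(total_distance_m, declared_speed_kmh, curves):
    """curves: list of (curve_distance_m, curve_speed_limit_kmh) pairs."""
    curve_distance = sum(d for d, _ in curves)
    straight_distance = total_distance_m - curve_distance
    t = straight_distance / (declared_speed_kmh / 3.6)   # seconds on the straight road
    t += sum(d / (v / 3.6) for d, v in curves)           # seconds over each curve
    return t

# Hypothetical 28 km toll-to-toll road with three curves
curves = [(800, 60), (500, 50), (1200, 70)]
print(round(curve_aware_travel_time(28_000, 100, curves)), "seconds")
```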
The system connects to the camera to detect the vehicle type, extract the license
plate, and add the current timestamp. Before that, the system downloads from the cloud
server the vehicle information and the timestamp of the vehicle's entry at the previous
toll gate, along with the RTO information of the vehicle. When a vehicle enters the toll
booth, the over-speed detection system in each booth checks the vehicle information
against the information downloaded from the cloud server. If there is no match, the
vehicle entering the toll booth is considered new, and the vehicle information with the
current timestamp is added to the cloud server. If the entered vehicle information matches
the downloaded information, the system checks for over-speeding, which is determined using
the following formula.
$T_{travel} = t_{current} - t_{previous}$   (11)

$T_{travel} < T_{ca}$   (12)

where $T_{travel}$ is the vehicle's travel time, calculated by subtracting the previous
toll-booth timestamp $t_{previous}$ from the current toll-booth timestamp $t_{current}$.
The vehicle is over-speeding when its travel time is less than the curvature-aware travel
time declared for the toll gate (Eq. 10). If over-speeding is detected, the vehicle is
entered into the violator database and fined by the field inspector.
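The over-speed decision of Eqs. (11)-(12) then reduces to a timestamp comparison, as in this sketch with hypothetical timestamps.

```python
# Sketch of the over-speed test: a vehicle is flagged when the time between the two
# toll-booth timestamps is shorter than the declared curve-aware travel time.
from datetime import datetime

def is_over_speed(t_previous_booth, t_current_booth, curve_aware_time_s):
    travelled = (t_current_booth - t_previous_booth).total_seconds()   # Eq. (11)
    return travelled < curve_aware_time_s                              # Eq. (12)

t_prev = datetime(2022, 8, 10, 10, 0, 0)      # entry at the previous toll booth
t_curr = datetime(2022, 8, 10, 10, 14, 30)    # entry at the current toll booth
print(is_over_speed(t_prev, t_curr, curve_aware_time_s=1020))          # True -> violator
```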
3 Results
The vehicle over-speed detection system's test-bed is set up at two toll plazas in
Tamil Nadu, India: Pallikonda and Ranipet. The results of this test-bed are described
below.
The cropped license plate image is fed into the CRNN text recognition algorithm, which
produces the extracted text of the license plate, as shown in the bottom right
("Predicted num") of Figs. 8 and 9. The output of the vehicle detection and license plate
extraction system, i.e. the vehicle type and the recognised license plate text, is sent to
the vehicle over-speed detection module.
The type 2 error ratio is defined as TIIR = m/n, where m is the number of type 2
errors and n is the number of ground-truth curves.
Table 3 displays the Type 1 and Type 2 errors, the TIIR, the actual and predicted curve
numbers, the overall distance between source and destination, the performance delay, the
types of curves predicted, and the noise-corrected curve numbers for locations in India
(rows 1 to 5), France (row 6), and the United States (row 7). Google Maps road-segment
data has the same format all over the world; as a result, the proposed method can extract
curves from road segments anywhere in the world. One minor distinction is that vehicles in
India travel on the left side of the road, whereas vehicles in France and the United
States drive on the right side, so the starting and ending points of a curve vary from
country to country depending on the direction of travel. For each of the seven rows of
Table 3, the Google Maps road-segment data from source to destination was collected and
the starting and ending point of each curve was manually identified; this ground-truth
data was compared with the curve data recognised by the proposed model. In India, one
highway road (row 1), a hilly terrain road (row 2), a university road (row 3), and a
tunnel road (row 5) have all been tested. The curves observed between a highway starting
location and a hilly terrain destination are shown in Table 3, row 4. The proposed method
can also extract curves from tunnel roads; Fig. 5b shows the row-5 route through the twin
tunnel in Mumbai.
Noise is the result of a GPS segment being drawn incorrectly over an existing segment.
This form of noise may be corrected using the proposed approach, as shown in Fig. 6b.
However, this noise frequently misrepresents a straight road as a curving road; because a
hilly terrain road includes more noise segments than a road segment in the plains, this
kind of error is classified as a Type 2 error. It is not dangerous, however, because the
driver is still alerted to an existing curve. If the proposed curve detection algorithm
fails to detect a curve with a radius of fewer than 60 m, it is dangerous, because this
type of road includes sharp or blind bends and is prone to accidents. This type of curve
was successfully identified using the proposed method. The final column (column 11) of
Table 3 displays the number of curves predicted at each location by the existing method
[22]. The existing method lacks the capability of removing curve noise and therefore
declares the noise as curves.
Figure 13a demonstrates how to search the Log cloud database: the log details can be
searched by date, time, vehicle number, or toll gate ID. Figure 13b shows how to search
the Violator cloud database; the details of a violator can be searched by vehicle
identification, date, time, or toll gate ID.
3.5 Discussion
In this test-bed, a total of 3552 vehicles passed through all the booths of both toll
plazas during the two-hour test period. Figure 14 displays a bar graph of vehicle passes
broken down by booth. During the two hours of testing, two vehicles received fines for
exceeding the government-mandated speed limits of 100 km/h for cars and 80 km/h for
trucks. A larger number of vehicles would receive fines if the speed limit were enforced
using the curve-aware travel time estimation: according to Fig. 15, 13 to 14 vehicles
would be fined if the speed limit for cars were 90 km/h.
The Pallikonda and Ranipet toll plazas were used for two hours of testing during the
test-bed. This test site was overseen by the Vellore branch of the RTO, Government of
Tamil Nadu. Figure 16a depicts the RTO officials and inspector present at the Pallikonda
toll plaza during the test-bed period; Fig. 16b shows the experts testing the vehicle
over-speed detection system at the Ranipet toll plaza.
4 Conclusion
Highway traffic moving at excessive speed needs to be controlled. The proposed vehicle
over-speed detection system can be used to determine whether or not a vehicle travelling
between two toll plazas was moving at excessive speed. In this regard, a new curve-finding
algorithm is proposed to precisely determine the travel time of the vehicle, and this
curve-aware travel time is used in the proposed vehicle over-speed detection system. The
Pallikonda and Ranipet toll plazas participated in the real-time test-bed for a two-hour
testing period under the direction of the RTO, Government of Tamil Nadu. Two vehicles were
found speeding and were fined. The system is currently being tested at two plazas; in the
future, it could be expanded to all toll plazas. In the future, the camera-based license
plate extraction module will also be replaced by an RFID tag-based vehicle information
extraction module, which is already used in every vehicle in India under the brand name
FASTag.
References
1. Nayak, R. P., Sethi, S., & Bhoi, S. K. (2018). PHVA: A position based high speed vehicle
detection algorithm for detecting high speed vehicles using vehicular cloud. In 2018
International Conference on Information Technology (ICIT). https://doi.org/10.1109/icit.
2018.00054
2.
Krishnakumar, B., Kousalya, K., Mohana, R., Vellingiriraj, E., Maniprasanth, K., &
Krishnakumar, E. (2022). Detection of vehicle speeding violation using video processing
techniques. In 2022 International Conference on Computer Communication and
Informatics (ICCCI). https://doi.org/10.1109/iccci54379.2022.9740909
3. Zou, F., Ren, Q., Tian, J., Guo, F., Huang, S., Liao, L., & Wu, J. (2022). Expressway speed
prediction based on electronic toll collection data. Electronics, 11(10), 1613. https://doi.
org/10.3390/electronics11101613
[Crossref]
4. Shen, J., Zhou, W., Liu, N., Sun, H., Li, D., & Zhang, Y. (2022). An anchor-free
lightweight deep convolutional network for vehicle detection in aerial images. IEEE
Transactions on Intelligent Transportation Systems.
5. Roy, S. S., Rodrigues, N., & Taguchi, Y. (2020). Incremental dilations using CNN for brain
tumor classification. Applied Sciences, 10(14), 4915.
[Crossref]
6. Samui, P., Roy, S. S., & Balas, V. E. (Eds.). (2017). Handbook of neural computation.
Academic Press.
7. Biswas, R., Vasan, A., & Roy, S. S. (2019). Dilated deep neural network for segmentation
of retinal blood vessels in fundus images. Iranian Journal of Science and Technology,
Transactions of Electrical Engineering, 1–14.
8. Rajput, S. K., Patni, J. C., Alshamrani, S. S., Chaudhari, V., Dumka, A., Singh, R., Rashid,
M., Gehlot, A., & AlGhamdi, A. S. (2022). Automatic vehicle identification and
classification model using the YOLOv3 algorithm for a toll management system.
Sustainability, 14(15), 9163. https://doi.org/10.3390/su14159163
[Crossref]
9. Wang, W., Yang, J., Chen, M., & Wang, P. (2019). A light CNN for end-to-end car license
plates detection and recognition. IEEE Access, 7, 173875–173883. https://doi.org/10.1109/
access.2019.2956357
[Crossref]
10. Huang, Q., Cai, Z., & Lan, T. (2021). A new approach for character recognition of multi-
style vehicle license plates. IEEE Transactions on Multimedia, 23, 3768–3777. https://doi.
org/10.1109/tmm.2020.3031074
[Crossref]
11. Seo, T., & Kang, D. (2022). A robust layout-independent license plate detection and
recognition model based on attention method. IEEE Access, 10, 57427–57436. https://doi.
org/10.1109/access.2022.3178192
[Crossref]
12.
Henry, C., Ahn, S. Y., & Lee, S. (2020). Multinational license plate recognition using
generalized character sequence detection. IEEE Access, 8, 35185–35199. https://doi.org/10.
1109/access.2020.2974973
[Crossref]
13. Park, S., Yu, S., Kim, J., & Yoon, H. (2022). An all-in-one vehicle type and license plate
recognition system using YOLOv4. Sensors, 22(3), 921. https://doi.org/10.3390/s22030921
[Crossref]
14. Alam, N., Ahsan, M., Based, M. A., & Haider, J. (2021). Intelligent system for vehicles
number plate detection and recognition using convolutional neural networks. Technologies,
9(1), 9. https://doi.org/10.3390/technologies9010009
[Crossref]
15. Alghyaline, S. (2022). Real-time Jordanian license plate recognition using deep learning.
Journal of King Saud University-Computer and Information Sciences, 34(6), 2601–2609.
https://doi.org/10.1016/j.jksuci.2020.09.018
[Crossref]
16. Raghunandan, K. S., Shivakumara, P., Jalab, H. A., Ibrahim, R. W., Kumar, G. H., Pal, U.,
& Lu, T. (2018). Riesz fractional based model for enhancing license plate detection and
recognition. IEEE Transactions on Circuits and Systems for Video Technology, 28(9).
17. Dalarmelina, N. D., Teixeira, M. A., & Meneguette, R. I. (2019). A real-time automatic
plate recognition system based on optical character recognition and wireless sensor
networks for ITS. Sensors, 20(1), 55. https://doi.org/10.3390/s20010055
[Crossref]
18. Singh, P., Patwa, B., Saluja, R., Ramakrishnan, G., & Chaudhuri, P. (2019).
StreetOCRCorrect: An interactive framework for OCR corrections in chaotic Indian street
videos. In 2019 International Conference on Document Analysis and Recognition
Workshops (ICDARW). https://doi.org/10.1109/icdarw.2019.10036
19. Jagtap, J., & Holambe, S. (2018). Multi-style license plate recognition using artificial
neural network for Indian vehicles. In 2018 International Conference on Information,
Communication, Engineering and Technology (ICICET). https://doi.org/10.1109/icicet.
2018.8533707
20. Ravirathinam, P., & Patawari, A. (2019). Automatic license plate recognition for Indian
roads using Faster-RCNN. In 2019 11th International Conference on Advanced Computing
(ICoAC). https://doi.org/10.1109/icoac48765.2019.246853
21. Khan, S. U., Alam, N., Jan, S. U., & Koo, I. S. (2022). IoT-enabled vehicle speed
monitoring system. Electronics, 11(4), 614. https://doi.org/10.3390/electronics11040614
[Crossref]
22.
Li, Z., Chitturi, M., Bill, A., & Noyce, D. (2012). Automated identification and extraction
of horizontal curve information from geographic information system roadway maps.
Transportation Research Record: Journal of the Transportation Research Board, 2291, 80–
92.
23. Horzyk, A., & Ergun, E. (2020). YOLOv3 precision improvement by the weighted centers
of confidence selection. In 2020 International Joint Conference on Neural Networks
(IJCNN). https://doi.org/10.1109/ijcnn48605.2020.9206848
24. Jayaraman, S., Esakkirajan, S., Veerakumar, T. (2015). Digital image processing. Tata
McGraw Hill publication, Indian Edition.
25. Shi, B., Bai, X., & Yao, C. (2017). An end-to-end trainable neural network for image-based
sequence recognition and its application to scene text recognition. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 39(11), 2298–2304. https://doi.org/10.1109/
tpami.2016.2646371
[Crossref]
26. Bains, M. S., Bhardwaj, A., Arkatkar, S., Velmurugan, S. (2013). Effect of speed limit
compliance on roadway capacity of Indian expressways. Procedia-Social and Behavioral
Sciences, 104, 458−467
27. IRC: 73. (1980). Geometric design standards for rural (Non-urban) highways. Indian
Roads Congress.
28. IRC: 38. (1988). Guidelines for design of horizontal curves for highways and design tables.
Indian Roads Congress.
Sergey Antonov
Email: serg2209157.antono@yandex.ru
Mikhail Bogachev
Email: mibogachev@etu.ru
Aleksandr Sinitca
Email: amsinitca@etu.ru
1 Introduction
Boosted recently by the COVID-19 pandemic, digital technologies have played
an increasingly significant role in the public-health response, including
contact tracing, worldwide. Budd et al. [12] provide a comprehensive review
of digital innovations developed in response to COVID-19 worldwide,
including legal, ethical and privacy barriers to their implementation, as
well as organizational and workforce restrictions. The review covers
technologies developed in response to five public-health needs, including
epidemiological surveillance, rapid case identification, control of
community transmission, communication of essential medical information
and clinical support [5].
Interrupting community transmission requires rapid tracing and
quarantining of contacts in order to prevent further transmission.
Technologies supporting such activities are largely based on proximity
tracing [17], which is usually implemented using smartphone apps [57, 59]
and low-power Bluetooth technologies. Hossain et al. [18] recently
proposed a B5G framework that employs the high throughput and low latency
of the modern 5G network standard to exchange chest X-ray [20] or CT scan
images [41] for early instrumental detection of COVID-19, as well as to
develop a mass surveillance system to control and manage social
distancing, mask wearing, and body temperature monitoring. This approach
lies in the context of various AI-based integrated emergency response
solutions that have attracted increasing interest in recent years [40, 42, 44].
Privacy is one of the major concerns in this context, strongly limiting
the applicability of various solutions. As a prominent example, Norway
stopped using the Smittestopp app and switched to the Bluetooth approach
[60]. Several international frameworks with systematic approaches to
privacy preservation are emerging, including Decentralized Privacy-
Preserving Proximity Tracing [58], the Pan-European Privacy-Preserving
Proximity Tracing initiative [61] and the joint Google-Apple framework
[56].
A key limitation of contact-tracing apps such as those mentioned above
is that they require a large proportion of the population to use the app;
in practice, their effectiveness is strongly limited by smartphone
ownership, user compliance, and technical compatibility [12]. An
alternative approach, which can be more effective in a variety of
scenarios, is proximity tracing based on video surveillance.
There are only a few works addressing video surveillance in the context
of the COVID-19 pandemic. Punn et al. [38] propose a framework that utilizes
the YOLO v3 object detection model not only to detect but also to
distinguish between humans, using the Deepsort approach to track the
identified persons according to their assigned IDs. The results of the
YOLO v3 model are further compared with other popular convolutional neural
network architectures, such as SSD (Single Shot Detector), R-CNN
(Region-Based CNN) and their modifications. Rezaei et al. [39] use a
YOLOv4-based framework and inverse perspective mapping to improve the
accuracy of personal identification for social distance tracking in the
presence of disturbance factors such as crowd occlusion, partial
visibility, and lighting variations, and also provide a risk assessment
scheme based on the statistical analysis of personalized movement
trajectories and the rate of social distancing violations.
As with mobile apps for proximity tracing, any solution based on video
surveillance needs to address privacy concerns. In this paper we propose a
framework that builds on the ideas of object detection and trajectory
analysis incorporated from the literature on pedestrian tracking, but also
integrates elements that allow privacy issues to be addressed: a facial
recognition system that maps faces to anonymized IDs, and the construction
of an anonymized potential-spread graph, which can be used in scenarios
such as contact tracking and epidemiological surveillance.
Now, more than two years since the onset of the pandemic, public
attention is increasingly shifting towards finding optimal exit strategies,
including adapting the technologies that were rapidly deployed earlier in
the course of the pandemic and finding their place in the post-pandemic
society. Here we show explicitly how the proposed AI-based framework for
proximity tracing based on video surveillance in public places can be used
in different scenarios, ranging from individual contact tracing and
epidemiological surveillance of crowds to improved public-space planning.
The existing body of work on, e.g., automatic pedestrian behavior analysis
can be adapted to this context [52]. These approaches usually employ
various models for object detection. However, the pandemic largely
changed our vision of the goals that have to be achieved in public-space
planning. There is compelling evidence that various social distancing
measures also reduced the spread of other infectious diseases such as the
common cold or flu, which account for around 166 million lost working days
in the U.S. alone, a figure that nearly doubles when taking into account
parents who skip work because of colds caught by their children, even
outside of the pandemic context. Therefore, adaptation of the technologies
widely used during the current COVID-19 pandemic to reduce community
transmission of other respiratory diseases such as the common cold and flu
could help to at least partially reduce these losses.
The rest of the paper is organized as follows. Section 2 presents an
overview of the proposed framework and the corresponding video data
processing pipeline. Section 3 focuses on the proximity networks, which
can be used in a variety of scenarios to address public-health needs.
Section 4 describes the evaluation of our approach on a series of videos
captured by street surveillance cameras. Section 5 introduces statistical
quantities that are associated with the risks of community transmission
and discusses how they could be used for future improvements in
public-space planning aimed at reducing community transmission risks in
the post-pandemic society.
To obtain the accuracy metrics, the IoU is next compared against a fixed
threshold $\theta$, which equals 0.5 in our example. When
$\mathrm{IoU} \ge \theta$, the decision is made in favor of hypothesis $H_1$
(the detection matches the ground truth); otherwise the decision is made in
favor of hypothesis $H_0$. The accuracy of the decision-making procedure is
quantified by the true positive (TP) rate, indicating the rate of decisions
in favor of $H_1$ under the validity of $H_1$, and by the false positive
(FP) rate, indicating the rate of decisions in favor of $H_1$ under the
validity of $H_0$ (see, e.g., [48] and references therein). In a numerical
treatment, based on the above rates, one can estimate the precision

$\mathrm{Precision} = \dfrac{TP}{TP + FP}$   (1)

and the recall

$\mathrm{Recall} = \dfrac{TP}{TP + FN}$   (2)

where FN denotes the false negatives.
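The IoU test and the precision/recall estimates can be written compactly as follows; the boxes and counts are illustrative.

```python
# Sketch of the detection scoring: IoU of a detection against the ground-truth
# box, thresholded at 0.5, and the resulting precision/recall estimates.
def iou(box_a, box_b):
    """Boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(tp, fp, fn):
    return tp / (tp + fp), tp / (tp + fn)

det, gt = (50, 50, 150, 200), (60, 55, 160, 210)
print("IoU: %.2f, match: %s" % (iou(det, gt), iou(det, gt) >= 0.5))
print("precision=%.2f recall=%.2f" % precision_recall(tp=42, fp=5, fn=8))
```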
Equations (4)-(7) express the position of each detection in terms of its bounding box
coordinates $(x_1, y_1, x_2, y_2)$. Thus, for a given homography matrix $H$, the
transformation of an image point $(u, v)$ to the world coordinates $(X, Y)$ can be
expressed as

$\begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = H \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}$   (8)

$X = \dfrac{x'}{w'}, \qquad Y = \dfrac{y'}{w'}$   (9)
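A minimal sketch of the homography mapping of Eqs. (8)-(9) follows; the calibration matrix is made up, and using the bottom-centre of the bounding box as the reference point is an illustrative assumption.

```python
# Sketch of the homography step: a pixel point (u, v), e.g. the bottom-centre of a
# person's bounding box, is mapped to world (bird's-eye) coordinates with a
# pre-calibrated 3x3 homography matrix H.
import numpy as np

def to_world(H, u, v):
    p = H @ np.array([u, v, 1.0])       # homogeneous transform, Eq. (8)
    return p[0] / p[2], p[1] / p[2]     # perspective division, Eq. (9)

# Hypothetical calibration matrix and bounding box (x1, y1, x2, y2)
H = np.array([[0.02, 0.001, -5.0],
              [0.0005, 0.03, -8.0],
              [1e-5, 2e-4, 1.0]])
x1, y1, x2, y2 = 420, 180, 480, 360
u, v = (x1 + x2) / 2, y2                # foot point of the detected person
print(to_world(H, u, v))
```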
In the nearest-neighbour search over the tree of detections, $d$ denotes the distance
from the center of the node to the search point, $r$ the search radius, and $R$ the node
radius determining the border of the inner subtree. The world scale of the distances
between two bird's-eye viewpoints is determined using the size of the camera pixel
obtained from the calibration procedure, and the distance between two points is calculated
as the Euclidean distance between their world coordinates. Three cases are distinguished
when descending the tree:
1. The entire search area is included in the internal subtree,
$d + r \le R$   (11)
If this condition is met, the search can continue in the internal subtree only.
2. The entire search area is included in the external subtree,
$d - r \ge R$   (12)
If this condition is met, the search can only continue in the external subtree.
3. The entire search area is distributed over both subtrees; in this case both subtrees
are searched.
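In practice, such a radius query can be served by a standard k-d tree; the following sketch uses SciPy's cKDTree as a stand-in for the subtree-pruning search described above, with random bird's-eye positions.

```python
# Illustrative proximity query in world coordinates: find all detected persons
# within the search radius r of a given point using a k-d tree.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
positions = rng.uniform(0, 50, size=(200, 2))   # bird's-eye (X, Y) positions in metres

tree = cKDTree(positions)
query_point = np.array([25.0, 25.0])
neighbours = tree.query_ball_point(query_point, r=2.0)   # indices within 2 m
print(len(neighbours), "persons within the 2 m search radius")
```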
4 Experiments
4.1 Combined Dataset for Neural Network Training
We evaluated the approach using several sample videos recorded by
surveillance cameras in busy outdoor public places. For neural network
training, we combined two different datasets that are among the most
popular for training object detection algorithms, PASCAL VOC [16] and
COCO [25]. Although they differ in the amount of annotation, both contain
sufficient information to extract bounding boxes around detected people.
Figure 3 shows the histogram of the number of people per image in the
resulting dataset, indicating that the majority of images contained a single
person, while a significant number of images contained up to twenty
different people.
Fig. 3 Distribution of the number of people per image
Fig. 11 Examples of pairwise contact duration matrices for six representative short
scenes captured from a street video surveillance camera for a fixed proximity threshold.
Matrix sizes are determined by the total number of individuals captured in each scene,
with their total pairwise duration of proximity (in seconds) indicated by color
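A pairwise contact-duration matrix of the kind shown in Fig. 11 can be accumulated from the per-frame world coordinates, as in this sketch (the threshold, frame rate and trajectories are illustrative, not the values used in the study).

```python
# Sketch of accumulating a pairwise contact-duration matrix: for every frame,
# each pair of persons closer than the proximity threshold adds one frame
# interval to its matrix entry.
import numpy as np

def contact_duration_matrix(trajectories, threshold_m=2.0, fps=25):
    """trajectories: array of shape (n_frames, n_persons, 2) with (X, Y) in metres;
    NaN marks frames where a person is not visible."""
    n_persons = trajectories.shape[1]
    duration = np.zeros((n_persons, n_persons))
    for frame in trajectories:
        valid = ~np.isnan(frame).any(axis=1)
        for i in range(n_persons):
            for j in range(i + 1, n_persons):
                if valid[i] and valid[j] and np.linalg.norm(frame[i] - frame[j]) < threshold_m:
                    duration[i, j] += 1.0 / fps
                    duration[j, i] += 1.0 / fps
    return duration

traj = np.random.default_rng(2).uniform(0, 10, size=(250, 5, 2))   # 10 s, 5 persons
print(np.round(contact_duration_matrix(traj), 1))
```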
Acknowledgment
The work of Sergey Antonov was supported by the Ministry of Science and
Higher Education of the Russian Federation “Goszadanie” No 075-01024-
21-02 from 29.09.2021 (Project No. FSEE-2021-0014).
References
1. Altmann, E., & Kantz, H. (2005). Recurrence time analysis, long-term
correlations, and extreme events. Physical Review E, 71(5), 056106.
[MathSciNet][Crossref]
2.
Amos, B., Ludwiczuk, B., & Satyanarayanan, M. (2016). Openface: A general-
purpose face recognition library with mobile applications. Technical report, CMU-
CS-16–118, CMU School of Computer Science.
3. Anggo, M., & Arapu, L. (2018). Face recognition using fisherface method.
Journal of Physics: Conference Series, 1028, 012119. https://doi.org/10.1088/
1742-6596/1028/1/012119
[Crossref]
4. Balaban, S. (2015). Deep learning and face recognition: the state of the art. In
Biometric and Surveillance Technology for Human and Activity Identification XII
(vol. 9457, p. 94570B). International Society for Optics and Photonics.
5. Biswas, R., Vasan, A., & Roy, S. S. (2020). Dilated deep neural network for
segmentation of retinal blood vessels in fundus images. Iranian Journal of
Science and Technology, Transactions of Electrical Engineering, 44(1), 505–518.
[Crossref]
7. Bogachev, M., Eichner, J., & Bunde, A. (2007). Effect of nonlinear correlations on
the statistics of return intervals in multifractal data sets. Physical Review Letters,
99(24), 240601.
[Crossref]
8. Bogachev, M., Eichner, J., & Bunde, A. (2008). On the occurence of extreme
events in long-term correlated and multifractal data sets. Pure and Applied
Geophysics, 165, 1195–1207.
[Crossref][zbMATH]
11. Bogachev, M., Markelov, O., Kayumov, A., & Bunde, A. (2017). Superstatistical
model of bacterial DNA architecture. Scientific Reports, 7, 43034.
[Crossref]
12. Budd, J., Miller, B. S., Manning, E. M., Lampos, V., Zhuang, M., Edelstein, M.,
Rees, G., Emery, V. C., Stevens, M. M., Keegan, N., et al. (2020). Digital
technologies in the public-health response to covid-19. Nature Medicine, 1–10.
13. Bunde, A., Bogachev, M., & Lennartz, S.: Precipitation and river flow: Long-term
memory and predictability of extreme events. Extreme Events and Natural
Hazards: The Complexity Perspective, 139–152.
14. Bunde, A., Eichner, J., Havlin, S., & Kantelhardt, J. (2004). Return intervals of
rare events in records with long-term persistence. Physica A: Statistical
Mechanics and its Applications, 342(1), 308–314.
[Crossref]
15. Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human
detection. In 2005 IEEE Computer Society Conference on Computer Vision and
Pattern Recognition (CVPR 2005) (vol. 1, pp. 886–893). IEEE (2005). https://doi.
org/10.1109/cvpr.2005.177
16. Everingham, M., Eslami, S. A., Van Gool, L., Williams, C. K., Winn, J., &
Zisserman, A. (2015). The pascal visual object classes challenge: A retrospective.
International Journal of Computer Vision, 111(1), 98–136.
[Crossref]
17. Ferretti, L., Wymant, C., Kendall, M., Zhao, L., Nurtay, A., Abeler-Dörner, L.,
Parker, M., Bonsall, D., & Fraser, C. (2020). Quantifying sars-cov-2 transmission
suggests epidemic control with digital contact tracing. Science, 368(6491).
18. Hossain, M. S., Muhammad, G., & Guizani, N. (2020). Explainable ai and mass
surveillance system-based healthcare framework to combat covid-19 like
pandemics. IEEE Network, 34(4), 126–132.
[Crossref]
19. Huang, G. B., Ramesh, M., Berg, T., & Learned-Miller, E. (2007). Labeled faces
in the wild: A database for studying face recognition in unconstrained
environments. Technical Report 07-49, University of Massachusetts, Amherst.
20.
Jalali, S. M. J., Ahmadian, M., Ahmadian, S., Hedjam, R., Khosravi, A., &
Nahavandi, S. (2022). X-ray image based COVID-19 detection using evolutionary
deep learning approach. Expert Systems with Applications, 201, 116942.
[Crossref]
21. Jalled, F. (2017). Face recognition machine vision system using eigenfaces.
22. Karsai, M., Jo, H. H., Kaski, K., et al. (2018). Bursty human dynamics. Springer
24. Lellouche, S., & Souris, M. (2020). Distribution of distances between elements in
a compact set. Stats, 3(1), 1–15.
[Crossref]
25. Lin, T., Maire, M., Belongie, S. J., Bourdev, L. D., Girshick, R. B., Hays, J.,
Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO:
Common objects in context. CoRR abs/1405.0312
26. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., Berg, A. C.
(2016). Ssd: Single shot multibox detector (pp. 21–37). Lecture Notes in
Computer Science. https://doi.org/10.1007/978-3-319-46448-0_2
27. Li, Z., Yang, W., Peng, S., & Liu, F. (2020). A survey of convolutional neural
networks: Analysis, applications, and prospects
29. Markelov, O., Nguyen, V., & Bogachev, M. (2017). Statistical modeling of the
internet traffic dynamics: To which extent do we need long-term correlations?
Physica A: Statistical Mechanics and its Applications, 485, 48–60.
[Crossref]
31. Mundy, J. L., Zisserman, A., et al. (1992). Geometric invariance in computer
vision (Vol. 92). MIT press Cambridge.
32.
Newell, G., & Rosenblatt, M. (1962). Zero crossing probabilities for gaussian
stationary processes. The Annals of Mathematical Statistics, 33(4), 1306–1313.
[MathSciNet][Crossref][zbMATH]
33. Nguyen, T., Chen, S.W., Shivakumar, S. S., Taylor, C. J., & Kumar, V. (2017).
Unsupervised deep homography: A fast and robust homography estimation model.
34. Nguyen, V., Markelov, O., Serdyuk, A., Vasenev, A., & Bogachev, M. (2018).
Universal rank-size statistics in network traffic: Modeling collective access
patterns by zipf’s law with long-term correlations. EPL (Europhysics Letters),
123(5), 50001.
[Crossref]
35. Panigrahy, R. (2008). An improved algorithm finding nearest neighbor using kd-
trees. Lecture Notes in Computer Science, pp. 387–398. Springer Berlin
Heidelberg. https://doi.org/10.1007/978-3-540-78773-0_34
36. Pan, J., & Manocha, D. (2011). Fast gpu-based locality sensitive hashing for k-
nearest neighbor computation. In Proceedings of the 19th ACM SIGSPATIAL
international conference on advances in geographic information systems, GIS, pp.
211–220. Association for Computing Machinery, New York, NY, USA. https://
doi.org/10.1145/2093973.2094002
37. Pönisch, W., & Zaburdaev, V. (2018). Relative distance between tracers as a
measure of diffusivity within moving aggregates. The European Physical Journal
B, 91(2), 1–7.
[Crossref]
38. Punn, N. S., Sonbhadra, S. K., & Agarwal, S. (2020). Monitoring covid-19 social
distancing with person detection and tracking via fine-tuned yolo v3 and deepsort
techniques.
39. Rezaei, M., & Azarmi, M. (2020). Deepsocial: Social distancing monitoring and
infection risk assessment in covid-19 pandemic. arXiv preprint arXiv:2008.11672
40. Roy, S. S., Goti, V., Sood, A., Roy, H., Gavrila, T., Floroian, D., Paraschiv, N. &
Mohammadi-Ivatloo, B. (2014). L2 regularized deep convolutional neural
networks for fire detection. Journal of Intelligent & Fuzzy Systems, 1–12.
41. Roy, S. S., Rodrigues, N., & Taguchi, Y. (2020). Incremental dilations using CNN
for brain tumor classification. Applied Sciences, 10(14), 4915.
[Crossref]
42.
Roy, S. S., Mihalache, S. F., Pricop, E., & Rodrigues, N. (2022). Deep
convolutional neural network for environmental sound classification via dilation.
Journal of Intelligent & Fuzzy Systems, 1–7.
43. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018).
Mobilenetv2: Inverted residuals and linear bottlenecks.
44. Samui, P., Roy, S. S., & Balas, V. E. (Eds.). (2017). Handbook of neural
computation. Academic Press.
45. Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding
for face recognition and clustering. In 2015 IEEE conference on computer vision
and pattern recognition (CVPR), pp. 815–823. https://doi.org/10.1109/CVPR.
2015.7298682
46. Singh, S., Kaur, A., & Taqdir, A. (2015). A face recognition technique using local
binary pattern method. IJARCCE, 165–168. https://doi.org/10.17148/IJARCCE.
2015.4340
47. Skliros, A., & Chirikjian, G. S. (2008). Position and orientation distributions for
locally self-avoiding walks in the presence of obstacles. Polymer, 49(6), 1701–
1715.
[Crossref]
48. Sokolova, A., Uljanitski, Y., Kayumov, A. R., & Bogachev, M. I. (2021).
Improved online event detection and differentiation by a simple gradient-based
nonlinear transformation: Implications for the biomedical signal and image
analysis. Biomedical Signal Processing and Control, 66, 102470.
[Crossref]
49. Tamazian, A., Nguyen, V., Markelov, O., & Bogachev, M. (2016). Universal
model for collective access patterns in the internet traffic dynamics: A
superstatistical approach. EPL (Europhysics Letters), 115(1), 10008.
[Crossref]
50. Tao, Y., & Sheng, C. (2014). Fast nearest neighbor search with keywords. IEEE
Transactions on Knowledge and Data Engineering, 26, 878–888. https://doi.org/
10.1109/TKDE.2013.66
[Crossref]
51.
Tejedor, V., Schad, M., Bénichou, O., Voituriez, R., & Metzler, R. (2011).
Encounter distribution of two random walkers on a finite one-dimensional
interval. Journal of Physics A: Mathematical and Theoretical, 44(39), 395005.
[MathSciNet][Crossref][zbMATH]
52. Vannoorenberghe, P., Motamed, C., Blosseville, J. M., & Postaire, J. G. (1997).
Automatic pedestrian recognition using real-time motion analysis. In International
conference on image analysis and processing (pp. 493–500). Springer.
53. Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of
simple features. In Proceedings of the 2001 IEEE computer society conference on
computer vision and pattern recognition (CVPR 2001, vol. 1, pp. I–I). IEEE
54. Yianilos, P. N. (1993). Data structures and algorithms for nearest neighbor search
in general metric spaces. In Proceedings of the fourth annual ACM-SIAM
symposium on discrete algorithms, SODA, pp. 311–321. Society for Industrial and
Applied Mathematics, USA.
55. Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint face detection and
alignment using multitask cascaded convolutional networks. IEEE Signal
Processing Letters, 23(10), 1499–1503. https://doi.org/10.1109/lsp.2016.2603342
Muhammad E. H. Chowdhury
Email: mchowdhury@qu.edu.qa
Supplementary Information
The online version contains supplementary material available at https://doi.
org/10.1007/978-981-99-3784-4_6.
Keywords Conjunctival melanoma – Computer-aided diagnosis – Deep
learning – Ocular surface images – Pretrained models
1 Introduction
The eye is a crucial and among the most intricate of the human sensory
organs. It enables us to visualize objects and to perceive light, depth, and
colour. Conjunctival nevus [1] is a relatively common disorder with several
distinct clinical presentations [2]. Patients inquiring about conjunctival
lesions are frequently encountered during routine clinical care [3], and
conjunctival nevi can exhibit a range of benign or malignant characteristics
[4]. Conjunctival melanoma [1] is an uncommon but potentially fatal
malignant growth of the eye that develops from melanocytes found within
the basal cells of the conjunctival epithelium [5]. This uncommon tumour
accounts for around 2% of all eye tumours, 5% of ocular melanomas [6],
and 0.25% of all melanomas [7]. Mortality rates of at least 30% are
associated with conjunctival melanoma [8], which demands costly
treatment, and a poor prognosis is linked to delayed diagnosis [3, 5].
Conjunctival melanoma often manifests as a pigmented or coloured,
sharply demarcated conjunctival lesion; however, atypical cases with a
variety of morphologies can delay the diagnosis [9]. The condition may
arise either from a nevus or from acquired melanosis [10]. To reduce the
mortality caused by this condition, prompt diagnosis and practical means of
detection are necessary, especially given that several countries face an
ageing population and insufficient healthcare resources. To determine
whether a patient has conjunctival melanoma, an ophthalmologist performs
a conventional clinical examination by viewing the ocular surface under a
slit lamp, and a biopsy is necessary to verify the diagnosis [3]. The
implementation of these in-clinic investigations has, however, been
considerably impacted by the COVID-19 outbreak [11]. Ophthalmologists
therefore face significant difficulties in the prompt identification of
conjunctival melanoma [3].
Medical imaging has already been greatly impacted by deep learning,
and this influence is only anticipated to increase in the future [12, 13].
According to several experts, deep learning is going to be a key factor in
the future of medicine and a key instrument for medical practice and
research [14–18]. In the analysis of medical images, deep learning methods
have already demonstrated impressive, and frequently unprecedented,
performance in a wide range of tasks spanning both low- and high-level
image processing functions, including image classification, detection,
segmentation, enhancement, denoising, reconstruction, registration, etc.
[19–26]. Deep learning techniques that make use of digital images of
pathological lesions are thought to be useful for enhancing the detection of
skin malignancies [27, 28]. Even though many studies utilizing deep
learning models have concentrated on skin melanoma [29–32], the use of
modern deep learning technology to identify conjunctival melanoma has
been underexplored. Because of the lack of substantial data, including
ground-truth data, for conjunctival diseases, training traditional deep neural
networks to identify conjunctival melanoma is very difficult. Very recently,
deep learning techniques for identifying conjunctival melanoma from
ocular surface images were explored [3]; however, their dataset was not
well curated, and further research is required to improve the classification.
The goal of the current study is to examine contemporary deep learning
techniques for detecting conjunctival melanoma using a sizable, augmented
ocular surface image dataset. Four classes of image data, namely
conjunctival melanoma, melanosis or nevus, normal conjunctiva, and
pterygium [33], have been used in the present study. Considering the
research gap in classifying conjunctival melanomas, the following
contributions are proposed in this study:
A well-curated dataset for conjunctival melanoma is proposed, validated
by medical experts.
An effective and faster augmentation technique is proposed, as an
alternative to CycleGAN-based augmentation [3], for expanding a small
conjunctival melanoma dataset.
A high-performing deep learning model is proposed which can classify
the different eye conditions with high accuracy.
Additionally, we incorporated interpretability into our findings. This
study intends to verify the hypothesis that conjunctival lesions can be
classified, and conjunctival melanoma detected, from ocular surface images
with the help of deep learning. The prompt identification of conjunctival
lesions may be made easier by this investigation.
The outline of this study is as follows. The following sections give
further information about the materials and methods that were utilized.
Afterwards, the findings are presented and discussed. Finally, we draw
conclusions and outline potential future research.
2 Methodology
This study proposes a system in which an image of the eye taken with a
smartphone can be classified as normal or as one of several other
eye-related medical conditions. The workflow starts with data collection,
followed by data cleaning and validation, CNN training and evaluation, and
visual interpretation. Figure 1 illustrates the step-by-step workflow of the
methodology proposed in this study.
Table 1 Augmentation techniques and ranges used in the training set of the proposed
datasets
Augmentation technique | Range
Random rotation | +20 to −20 degrees
Random affine | Degree = 0; Translate range = (0.05, 0.15); Scaling range = (0.9, 0.95)
Padding | Range = (0, 10); Fill = (black, white); Mode = ('Constant', 'Edge')
Colour correction | Brightness = (0, 0.2); Contrast = (0, 0.2)
In each of the two datasets, the size of the training set for each class was
expanded to three thousand samples by applying these four augmentation
techniques. As the validation and test sets were used for evaluating deep
learning models in a real-world setting, these two sets were left unchanged
throughout the process. Table 2 contains a description of the sizes of the
datasets along with the augmentation [37] factors.
Table 2 Detailed description of the proposed datasets. The curated dataset is validated
by expert doctors and the training samples are increased by an augmentation factor
using different augmentation techniques
Dataset | Class | Original samples | Validation set | Test set | Training set | Augmentation factor | Training set after augmentation
Binary class | Normal conjunctiva | 125 | 13 | 25 | 87 | 34.48 | 3000
Binary class | Abnormal conjunctiva | 285 | 28 | 57 | 200 | 15 | 3000
Four class | Normal conjunctiva | 125 | 13 | 25 | 87 | 34.48 | 3000
Four class | Nevus | 85 | 8 | 17 | 60 | 50 | 3000
Four class | Pterygium | 70 | 7 | 14 | 49 | 61.44 | 3000
Four class | Conjunctival melanoma | 130 | 13 | 26 | 91 | 32.97 | 3000
2.3.1 GoogLeNet
GoogLeNet was proposed in the literature [40] and is built on the
Inception module. Its authors proposed a wider and deeper Inception
design, which performed slightly better in the ImageNet Large Scale Visual
Recognition Challenge (ILSVRC) 2014 competition. Inside the Inception
module with dimensionality reduction used in GoogLeNet, a 1 × 1
convolution is added before every 3 × 3 and 5 × 5 convolution. The model
is 22 layers deep (27 layers when pooling layers are counted), with 9
Inception modules stacked linearly. The last Inception module is connected
to a global average pooling layer. The detailed model architecture, with its
convolutional layers, pooling, and activations, is available in the literature
[40].
2.3.2 ResNet
The ResNet architecture proposed in the literature [41] was designed to
counter the vanishing-gradient problem in deeper CNN architectures. In a
deep CNN, the features of the earlier layers fade as the network grows
deeper and passes through increasingly complex feature extractors; as a
result, gradients vanish. The residual connection in the ResNet architecture
solves this problem with a skip connection that carries features from earlier
layers to deeper layers. In this study, ResNet18, ResNet50 and ResNet152
were used. The designation ResNet followed by a number simply indicates
the ResNet architecture with that number of neural network layers. So, in
this ocular surface image classification research, the 18-, 50-, and 152-layer
ResNet architectures were utilized for evaluation and comparison with the
other counterpart CNN architectures.
2.3.3 DenseNet
The authors in [42] observed that deeper CNN models are more accurate
and efficient when short connections are built between layers close to the
input and layers close to the output. Applying this observation, the authors
in [42] proposed DenseNet, which connects each layer to every other layer
in a feed-forward fashion. They found that DenseNet has several benefits,
including alleviation of the vanishing-gradient problem, which results in
better feature propagation and reuse. This type of connectivity achieved
benchmark results on the ImageNet dataset while also significantly
reducing the number of parameters. Both the DenseNet-161 and
DenseNet-201 architectures were utilized in this study, with depths of 161
and 201 layers, respectively.
2.3.4 EfficientNet
CNNs such as VGGNet, ResNet, MobileNet, and SENet employ a
variety of methods to improve network accuracy, typically by scaling up at
least one of three dimensions: width, depth, or resolution. The authors in
[43] analysed these scaling methods and integrated them into EfficientNet
by proposing a compound scaling mechanism that scales all of these
dimensions consistently. EfficientNet_B7, a member of the EfficientNet
family, achieved 84.3% top-1 accuracy on ImageNet, and the pre-trained
weights of this model were used in our ocular surface image classification.
Table 3 Details of hyper-parameters used for all CNN models trained on the "Binary
Class" and "Four Class" datasets
Hyper-parameter | Detail
Batch size | 4
Optimizer | Adam
Loss function | NLL loss
Learning rate | 0.0001
Total epochs | 20
Epoch patience | 6
Drop factor of learning rate | 0.1
Maximum epoch stop | 10
Stop criterion | Loss
3 Results
3.1 Binary Classification
"Normal Conjunctiva" versus "Abnormal Conjunctiva" are the two classes
considered for binary classification using seven CNN models. The
learning curves of these seven CNNs are available in Supplementary
Tables 1 to 7. All the learning curves suggest the models are well trained
and show no signs of overfitting or underfitting. Figure 5 displays the mean
and standard deviation of accuracies across five-fold validation for these
seven pre-trained CNN models. EfficientNet_B7 achieved the highest
mean accuracy and the lowest standard deviation in fold-wise accuracy.
The results showed that GoogLeNet's performance varied more across the
five folds than EfficientNet_B7's, indicating a comparatively less
consistent fold-wise performance.
Fig. 5 Representation of mean and standard deviation in the five-fold accuracy of all
models for binary classification
4 Conclusion
In conclusion, the proposed study used state-of-the-art CNN models with
data curation, validation, and single and multiple augmentation techniques
to classify ocular surface images for two investigations ("Binary Class"
and "Four Class"). EfficientNet_B7 was the best-performing model, with
99.73% and 94.42% accuracy for the "Binary Class" and "Four Class"
investigations, respectively, using the methodology proposed in this study.
The results for both types of investigation outperformed previously
published literature [3]. Moreover, this model showed a high degree of
sensitivity of 99.51% and 99.42% for the "Binary Class" and "Four Class"
investigations, respectively. The performance of the best model,
EfficientNet_B7, was also evaluated through Grad-CAM-based visual
interpretation, as this study involves the diagnosis of sensitive medical
conditions from ocular surface images. In the future, the proposed model
could be deployed on a server so that it can produce predictions with visual
interpretation for clinicians and patients. Such a server-based
implementation of the proposed model could support telemedicine in
remote areas and help people in rural areas to easily obtain an
eye-condition diagnosis with visual interpretation.
Funding
This work was made possible by Qatar National Research Fund (QNRF)
NPRP12S-0227–190164 and International Research Collaboration Co-Fund
(IRCC) grant: IRCC-2021–001. The statements made herein are solely the
responsibility of the authors.
References
1. Damato, B., & Coupland, S. E. (2008). Conjunctival melanoma and melanosis: a
reappraisal of terminology, classification and staging. Clinical & Experimental
Ophthalmology, 36(8), 786–795.
3. Yoo, T. K., Choi, J. Y., Kim, H. K., Ryu, I. H., & Kim, J. K. (2021). Adopting
low-shot deep learning for the detection of conjunctival melanoma using ocular
surface images. Computer Methods and Programs in Biomedicine, 205, 106086.
5. Wong, J. R., Nanji, A. A., Galor, A., & Karp, C. L. (2014). Management of
conjunctival malignant melanoma: a review and update. Expert Review of
Ophthalmology, 9(3), 185–204.
6. Isager, P., Engholm, G., Overgaard, J., & Storm, H. (2002). Uveal and
conjunctival malignant melanoma in Denmark 1943–97: observed and relative
survival of patients followed through 2002. Ophthalmic Epidemiology, 13(2), 85–
96.
7. Chang, A. E., Karnell, L. H., & Menck, H. R. (1998). The National Cancer Data
Base report on cutaneous and noncutaneous melanoma: A summary of 84,836
cases from the past decade. Cancer: Interdisciplinary International Journal of the
American Cancer Society, 83(8), 1664–1678.
8. Larsen, A. C., Dahmcke, C. M., Dahl, C., Siersma, V. D., Toft, P. B., Coupland, S.
E., et al. (2015). A retrospective review of conjunctival melanoma presentation,
treatment, and outcome and an investigation of features associated with BRAF
mutations. JAMA Ophthalmology, 133 (11), 1295–1303.
9. Kao, A., Afshar, A., Bloomer, M., & Damato, B. (2016). Management of primary
acquired melanosis, nevus, and conjunctival melanoma. Cancer Control, 23(2),
117–125.
10. Damato, B., & Coupland, S. E. (2008). Conjunctival melanoma and melanosis: a
reappraisal of terminology, classification and staging. Clinical & Experimental
Ophthalmology, 36 (8), 786–795.
11. Hallak, J. A., Scanzera, A., Azar, D. T., & Chan, R. P. (2020). Artificial
intelligence in ophthalmology during COVID-19 and in the post COVID-19 era.
Current Opinion in Ophthalmology, 31(5), 447.
12.
Ching, T., Himmelstein, D. S., Beaulieu-Jones, B. K., Kalinin, A. A., Do, B. T.,
Way, G. P., et al. (2018). Opportunities and obstacles for deep learning in biology
and medicine. Journal of The Royal Society Interface, 15(141), 20170387
14. DuBois, K. N. (2019). Deep medicine: How artificial intelligence can make
healthcare human again. Perspectives on Science and Christian Faith, 71(3), 199–
201.
15. Rahman, T., Akinbi, A., Chowdhury, M. E., Rashid, T. A., Şengür, A., Khandakar,
A., et al. (2022). COV-ECGNET: COVID-19 detection using ECG trace images
with deep convolutional neural network. Health Information Science and Systems,
10(1), 1–16.
16. Rahman, T., Khandakar, A., Islam, K. R., Soliman, M. M., Islam, M. T., Elsayed,
A., et al. (2022). HipXNet: Deep learning approaches to detect aseptic loosening
of hip implants using X-ray images. IEEE Access, 10, 53359–53373.
17. Abir, F. F., Alyafei, K., Chowdhury, M. E., Khandakar, A., Ahmed, R., Hossain,
M. M., et al. (2022). PCovNet: A presymptomatic COVID-19 detection
framework using deep learning model using wearables data. Computers in Biology
and Medicine, 147, 105682.
18. Chowdhury, M. H., Shuzan, M. N. I., Chowdhury, M. E., Reaz, M. B. I., Mahmud,
S., Al Emadi, N., et al. (2022). Lightweight end-to-end deep learning solution for
estimating the respiration rate from photoplethysmogram signal. Bioengineering,
9(10), 558.
19. Wang, G., Ye, J. C., Mueller, K., & Fessler, J. A. (2018). Image reconstruction is a
new frontier of machine learning. IEEE Transactions On Medical Imaging, 37(6),
1289–1296.
20. Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks
for biomedical image segmentation. In International Conference on Medical
image computing and computer-assisted intervention (pp. 234–241).
21. Haskins, G., Kruger, U., & Yan, P. (2020). Deep learning in medical image
registration: A survey. Machine Vision and Applications, 31(1), 1–18.
22.
Karimi, D., Dou, H., Warfield, S. K., & Gholipour, A. (2020). Deep learning with
noisy labels: Exploring techniques and remedies in medical image analysis.
Medical Image Analysis, 65, 101759.
23. Rahman, T., Chowdhury, M. E., Khandakar, A., Mahbub, Z. B., Hossain, M. S. A.,
Alhatou, A., et al. (2022). BIO-CXRNET: A robust multimodal stacking machine
learning technique for mortality risk prediction of COVID-19 patients using chest
x-ray Images and clinical data. Neural Computing and Applications.
24. Tahir, A. M., Qiblawey, Y., Khandakar, A., Rahman, T., Khurshid, U.,
Musharavati, F., et al. (2022). Deep learning for reliable classification of COVID-
19, MERS, and SARS from chest X-ray images. Cognitive Computation, 1–21.
25. Tahir, A. M., Chowdhury, M. E., Khandakar, A., Rahman, T., Qiblawey, Y.,
Khurshid, U., et al. (2021). COVID-19 infection localization and severity grading
from chest X-ray images. Computers in Biology and Medicine, 139, 105002.
26. Qiblawey, Y., Tahir, A., Chowdhury, M. E., Khandakar, A., Kiranyaz, S., Rahman,
T., et al. (2021). Detection and severity classification of COVID-19 in CT images
using deep learning. Diagnostics, 11(5), 893.
27. Pacheco, A. G. C., & Krohling, R. A. (2020). The impact of patient clinical
information on automated skin cancer detection. Computers in Biology and
Medicine, 116, 103545.
28. Han, S. S., Park, G. H., Lim, W., Kim, M. S., Na, J. I., Park, I., et al. (2018). Deep
neural networks show an equivalent and often superior performance to
dermatologists in onychomycosis diagnosis: Automatic construction of
onychomycosis datasets by region-based convolutional deep neural network. PloS
one, 13(1), e0191493.
29. Bhimavarapu, U., & Battineni, G. (2022). Skin lesion analysis for melanoma
detection using the novel deep learning model fuzzy GC-SCNN. In Healthcare, p.
962.
30. Martin-Gonzalez, M., Azcarraga, C., Martin-Gil, A., Carpena-Torres, C., Jaen, P.,
& Health, P. (2022). Efficacy of a deep learning convolutional neural network
system for melanoma diagnosis in a hospital population. International Journal of
Environmental Research and Public Health, 19(7), 3892.
31.
Haenssle, H. A., Fink, C., Schneiderbauer, R., Toberer, F., Buhl, T., Blum, A., et
al. (2018). Man against machine: diagnostic performance of a deep learning
convolutional neural network for dermoscopic melanoma recognition in
comparison to 58 dermatologists. Annals of Oncology, 29(8), 1836–1842.
32. Brinker, T. J., Hekler, A., Enk, A. H., Klode, J., Hauschild, A., Berking, C., et al.
(2019). A convolutional neural network trained with dermoscopic images
performed on par with 145 dermatologists in a clinical melanoma image
classification task. European Journal of Cancer, 111, 148–154.
33. Yin, G., Gendler, S., & Teichman, J. (2022). Ocular surface squamous neoplasia in
a patient following oral steroids for contralateral necrotising scleritis. BMJ Case
Reports CP, 15(12), e253300.
34. Rahman, T., Chowdhury, M. E., Khandakar, A., Mahbub, Z. B., Hossain, M. S. A.,
Alhatou, A., et al. (2022). BIO-CXRNET: A robust multimodal stacking machine
learning technique for mortality risk prediction of COVID-19 patients using chest
x-ray images and clinical data. arXiv preprint arXiv:2206.07595
35. Khandakar, A., Chowdhury, M. E., Reaz, M. B. I., Ali, S. H. M., Kiranyaz, S.,
Rahman, T., et al. (2022). A novel machine learning approach for severity
classification of diabetic foot complications using thermogram images. Sensors,
22(11), 4249.
36. Rahman, T., Khandakar, A., Islam, K. R., Soliman, M. M., Islam, M. T., Elsayed,
A. et al. (2022). HipXNet: Deep learning approaches to detect aseptic loosening
of hip implants using x-ray images. IEEE Access, 10, 53359–53373.
37. Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S., et al. (2020). Score-
CAM: Score-weighted visual explanations for convolutional neural networks. In
Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition workshops (pp. 24–25).
38. Schlemper, J., Oktay, O., Schaap, M., Heinrich, M., Kainz, B., Glocker, B., et al.
(2019). Attention gated networks: Learning to leverage salient regions in medical
images. Medical Image Analysis, 53, 197–207.
39. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015).
ImageNet large scale visual recognition challenge. International Journal of
Computer Vision, 115(3), 211–252.
40. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015).
Going deeper with convolutions. In Proceedings of the IEEE conference on
computer vision and pattern recognition (pp. 1–9).
41. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image
recognition. In Proceedings of the IEEE conference on computer vision and
pattern recognition (pp. 770–778).
42. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K. Q. (2017). Densely
connected convolutional networks. In Proceedings of the IEEE conference on
computer vision and pattern recognition (pp. 4700–4708).
43. Tan, M., & Le, Q. (2019). Efficientnet: Rethinking model scaling for
convolutional neural networks. In International conference on machine learning
(pp. 6105–6114).
44. Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D.
(2017). Grad-cam: Visual explanations from deep networks via gradient-based
localization. In Proceedings of the IEEE international conference on computer
vision (pp. 618–626).
45. Podder, K. K., Chowdhury, M. E., Tahir, A. M., Mahbub, Z. B., Khandakar, A.,
Hossain, M. S., et al. (2022). Bangla sign language (bdsl) alphabets and numerals
classification using a deep learning model. Sensors, 22(2), 574.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. S. Roy et al. (eds.), Deep Learning Applications in Image Analysis, Studies in Big Data 129
https://doi.org/10.1007/978-981-99-3784-4_7
1 Introduction
Not so long ago, India was a predominantly agricultural country, and even
today it has roughly 118 million farmers [1]. One of the major issues these
farmers and cultivators face is the range of diseases that affect their plants.
This exacerbates not only their economic problems but also their social
lives, costing several hours, and sometimes years, of hard work. Several
chemicals can be employed to alleviate this problem, but the major issue is
diagnosis: unless farmers have a laboratory in their vicinity, diseases are
likely to be misidentified. Furthermore, the situation often worsens as
diseases spread to other farms. India has seen a large increase in
smartphone sales, coupled with the rise of the middle class. Various
telecommunication companies want to capture this growing market, which
has driven the cost of internet usage to nearly zero. There are nearly 833
million internet users [2], equal to 59.28% of the population of India. In
this chapter, we argue that if all farmers and cultivators were provided with
smartphones with internet access, food loss in the country could be
reduced.
To help these farmers, David P. Hughes and Marcel Salathé created
PlantVillage, an open-access database of 50,000+ images of healthy and
diseased crops covering more than 150 crops and 1,800 diseases.
PlantVillage is also a community of people helping each other by
answering questions and identifying diseases from the pictures attached to
them. It is helpful, but it has the drawbacks stated above [3]. In their paper,
Hughes and Salathé describe the advantages of computer diagnostic tools
over human diagnosis. Not all images in their dataset can be downloaded,
but in April 2016 PlantVillage released a subset of its dataset for an image
classification challenge on CrowdAI [3].
2.1 History
Deep learning was long an underappreciated field for several reasons, such
as the absence of powerful GPUs, the absence of the required data, and
limited scientific work. In fact, "deep learning" is a term coined to attract
interest in neural networks again. There have been three phases of
development in the field: it was known as cybernetics in the 1940s−1960s,
connectionism in the 1980s−1990s, and deep learning from the late 2000s.
It is also known as the artificial neural network (ANN) approach because
its design is inspired by biological neural networks [4].
So, the earliest neural network models were simple linear models. They
were designed to take inputs {x1, x2, ..., xN} at the input layer,
corresponding to an output y. The network would learn the weights
{w1, w2, ..., wN} such that
y = w1 x1 + w2 x2 + ... + wN xN (1)
The McCulloch-Pitts neuron, the Perceptron, and ADALINE (adaptive
linear element) were some of these linear models. Although these models
were very useful, they had limitations; most importantly, they could not
represent the XOR function. Neural networks fell out of favour after this
discovery. Extensive research took place during the second phase,
popularly known as connectionism. The most important development in
this phase was the successful implementation of the backpropagation
algorithm for training purposes.
Algorithms such as backpropagation and LSTM are still popular. The
reason the popularity of neural networks declined again was the unrealistic
claims made by companies, followed by under-delivery. Meanwhile,
various other machine learning models were performing far better than
neural networks, further reducing their popularity. In 2006, Geoffrey
Hinton trained a neural network called a deep belief network, which
sparked interest in neural networks again. The world now had more
computational power and more data, and by 2012 deep learning had
proved to be a useful state-of-the-art technology in object detection, image
classification and computer vision.
If we use a differentiable nonlinearity as the activation function (Fig. 2),
the gradient of the error with respect to every weight can be obtained layer
by layer. All we have to do is compute this gradient for every i, j, l and
then move each weight along the negative gradient (Fig. 3).
Now let us find the error signal for the final layer. In the forward pass
we computed the x's for the first layer and then propagated them forward
until we reached the output. If we know the error signal for the final layer,
we will be able to use it to find the error signals for the previous layers by
propagating backwards, and hence the name, backpropagation. Suppose we
are using the mean square error (Fig. 4).
Fig. 4 Backpropagation: phase II
2.2.6 Backpropagation Algorithm
1. Initialize all weights at random.
2. For t = 0, 1, ... do
3. Pick n from {1, 2, ..., N}
4. Forward: compute all layer activations
5. Backward: compute the error signals
6. Update the weights along the negative gradient
2.3.1 Convolution
In mathematics and engineering, convolution is a mathematical operation
between two functions, defined as the integral of the product of the two
functions after one is reversed and shifted:
s(t) = (x * w)(t) = ∫ x(a) w(t − a) da (19)
Convolution is denoted by an asterisk (*). In deep learning, the function
x(a) is known as the input and the function w(t − a) is known as the kernel.
Convolution leverages three important ideas that help a machine
learning system: sparse interactions, parameter sharing and equivariant
representations. Additionally, convolution provides a means of working
with inputs of variable size.
2.3.2 Pooling
A layer of a convolutional network has three stages: a convolution stage,
an activation function such as ReLU, and a pooling stage. A pooling layer
changes the output of the net by replacing regions of its input with a
statistical summary, performing downsampling along the height and width
dimensions. The most commonly used pooling operation is max pooling.
2.3.3 ReLU
The rectified linear unit is an activation function defined as
f(x) = max(0, x) (20)
Convolutional nets were some of the first working deep networks
trained with backpropagation. It is not fully clear why convolutional
networks succeeded when general backpropagation networks were
considered to have failed.
2.4.1 Theano
Theano is a Python-based framework developed by the LISA group, led by
Yoshua Bengio, at the University of Montreal [11].
2.4.2 Torch
Torch is a deep learning framework developed by Ronan Collobert,
Clement Farabet and Koray Kavukcuoglu [12].
2.4.3 Caffe
Caffe is a deep learning framework with a Python interface, developed by
Yangqing Jia at the Berkeley Vision and Learning Center. The biggest
advantage of Caffe is the number of pre-trained networks that can be
downloaded from its Model Zoo [13].
2.4.4 Tensorflow
TensorFlow is an open-source machine learning library for a range of
tasks, created by Google to meet its need for systems capable of building
and training neural networks that detect and decipher patterns and
correlations.
3.1 Dataset
The dataset on CrowdAI consists of 54,309 images for training the neural
network. It covers 14 different crop species, 17 fungal diseases, 4 bacterial
diseases, 2 mould diseases, 2 viral diseases, 1 disease caused by a mite,
and 12 crop species that are visibly healthy. This means that there are 38
classes of images.
These 14 crop species are: Apple, Blueberry, Cherry, Corn, Grape,
Orange, Peach, Bell Pepper, Potato, Raspberry, Soybean, Squash,
Strawberry, and Tomato (Fig. 5).
3.3 Architecture
In 2012, Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton submitted a
convolutional network called AlexNet to the ImageNet ILSVRC challenge.
The ILSVRC challenge, also known as the ImageNet challenge, is
conducted every year; participants have to build a model that can classify
millions of images into 1,000 object classes. They won the challenge that
year, and since then it has always been a variation of a CNN that won the
challenge (Fig. 6).
The input layers in AlexNet are formed by the raw pixel values obtained
from the image, and the final layer gives a probability distribution across
all the classes. The intermediate layers use a "processed version" of the
previous layer's output as their input, and over the training period they
learn to activate for progressively more complex features depending on
how deep they sit in the overall architecture. Networks such as AlexNet are
computationally very expensive and intensive; training on the ImageNet
dataset usually takes several weeks. Fortunately, the features learnt by the
earlier layers are very generic in nature and can therefore be reused on a
new dataset with totally different classes. This approach is known as
transfer learning or fine-tuning. In transfer learning, we take a pre-trained
model, keep the learnt weights, modify the final fully connected layers, and
then train on the new dataset; this gives better results. In our PlantVillage
dataset, we have 38 classes instead of the 1,000 classes of ImageNet, so we
have to change the num_output parameter of the fully connected layer in
the Caffe training configuration file [3, 8, 15, 16].
3.4 Results
If the data are pre-processed and the files are correctly configured, there is
no problem in training the model. While training, we have to make sure
that we maintain the log file in order to understand the training process;
this log file can also be used to generate graphs. Training the model for
2,000 iterations took roughly 2 h (Fig. 7).
Fig. 7 Training curve for accuracy and loss with 2000 iterations
4 Conclusion
In conclusion, the use of deep learning in the form of image classification
can provide a budget-friendly and efficient solution to the problem of plant
diseases affecting farmers and cultivators; otherwise, farmers would need
well-equipped labs to identify the disease. AlexNet is able to obtain 98 to
99% accuracy on the training set and 91.3% accuracy on the test set. In the
future, we would like to employ different deep learning models and
perform different types of augmentation.
References
1. Agarwal, K. (2021). Indian agriculture’s enduring question: Just how many
farmers does the country have?. The Wire. Retrieved, 22.
2. BBC. (2023, January 23). India media guide. BBC News. https://www.bbc.com/
news/world-south-asia-12557390
3. Hughes, D., & Salathé, M. (2015). An open access repository of images on plant
health to enable the development of mobile disease diagnostics. arXiv preprint
arXiv:1511.08060.
4. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Book in
preparation for MIT Press. http://www.deeplearningbook.org
5. Jabbar, H., & Khan, R. Z. (2015). Methods to avoid over-fitting and under-fitting
in supervised machine learning (comparative study). Computer Science,
Communication and Instrumentation Devices, 70, 163–172.
6. Robbins, H., & Monro, S. (1951). A stochastic approximation method. The annals
of mathematical statistics, 400–407.
7. Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in
Machine Learning, 2, 1–127. Also published as a book. Now Publishers.
9. Roy, S. S., Awad, A. I., Amare, L. A., Erkihun, M. T., & Anas, M. (2022).
Multimodel phishing URL detection using LSTM, bidirectional LSTM, and GRU
models. Future Internet, 14(11), 340.
[Crossref]
12. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T.,
Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Köpf, A., Yang, E., DeVito,
Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., ... Chintala, S.
(2019). Pytorch: An imperative style, high-performance deep learning
library. Advances in neural information processing systems, 32.
13. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., …
Guadarrama, S. & Darrell, T. (2014). Caffe: Convolutional architecture for fast
feature embedding. In Proceedings of the 22nd ACM international conference on
Multimedia (pp. 675–678).
14. Gibson, A., Nicholson, C., Patterson, J., Warrick, M., Black, A. D., Kokorin, V., ...
& Eraly, S. (2016). Deeplearning4j: Distributed, opensource deep learning for
Java and Scala on hadoop and spark. Towards Data Science.
16. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with
deep convolutional neural networks. University of Toronto.
17. Roy, S. S., Goti, V., Sood, A., Roy, H., Gavrila, T., Floroian, D., Paraschiv, N.,
Mohammadi-Ivatloo, B. (2014). L2 regularized deep convolutional neural
networks for fire detection. Journal of Intelligent & Fuzzy Systems, (Preprint), 1–
12.
18. Roy, S. S., Mihalache, S. F., Pricop, E., & Rodrigues, N. (2022) Deep
convolutional neural network for environmental sound classification via
dilation. Journal of Intelligent & Fuzzy Systems Preprint, 1–7.
19. Roy, S. S., Rodrigues, N., & Taguchi, Y. (2020). Incremental dilations using CNN
for brain tumor classification. Applied Sciences, 10(14), 4915.
[Crossref]
20.
Biswas, R., Vasan, A., & Roy, S. S. (2020). Dilated deep neural network for
segmentation of retinal blood vessels in fundus images. Iranian Journal of
Science and Technology, Transactions of Electrical Engineering, 44(1), 505–518.
[Crossref]
21. Roy, S. S., Mihalache, S. F., Pricop, E., & Rodrigues, N. (2022). Deep
convolutional neural network for environmental sound classification via dilation.
Journal of Intelligent & Fuzzy Systems, (Preprint), 1–7.
22. Deep learning research should be encouraged for diagnosis and treatment of
antibiotic resistance of microbial infections in treatment associated emergencies in
hospitals.
23. Lee, K. C., Roy, S. S., Samui, P., & Kumar, V. (Eds.). (2020). Data analytics in
biomedical engineering and healthcare. Academic Press.
24. Samui, P., Roy, S. S., & Balas, V. E. (Eds.). (2017). Handbook of neural
computation. Academic Press.
26. Roy, S. S., & Taguchi, Y. H. (2021). Identification of genes associated with altered
gene expression and m6A profiles during hypoxia using tensor decomposition
based unsupervised feature extraction. Scientific reports, 11(1), 1–18.
[Crossref]
27. Abu-Mostafa, Y. S., Magdon-Ismail, M., & Lin, H.-T. Learning from data. https://
amlbook.com/
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. S. Roy et al. (eds.), Deep Learning Applications in Image Analysis, Studies in Big Data 129
https://doi.org/10.1007/978-981-99-3784-4_8
B. K. Tripathy
Email: tripathybk@vit.ac.in
1 Introduction
Since the advent of imaging spectrometry in the 1980s, hyperspectral
images (HIs) have been acquired because their fine spectral resolution
provides the discriminating power required by a diverse range of
applications. These include remote-sensing-based environmental,
atmospheric and ocean observation [66], meteorological applications,
military applications [37], geological exploration and mining [53], crop,
vegetation and food analysis, and standalone biomedical fields [56]. In
addition to having high spectral and spatial resolution, HIs have many
bands and abundant information because they cover the ultraviolet, visible,
near-infrared, and mid-infrared wavelengths. This opens avenues of
research in HI-based image correction [77], noise reduction [40],
transformation [48], dimensionality reduction, and classification [8].
For machine learning (ML) based methods to process HIs, a large
number of legitimate samples must be labelled for training. Early research
in this regard focused on spectral-information-based HI classification
methods such as support vector machines [72], random forests, neural
networks [20, 67], and multinomial logistic regression [45]. An HI
represents the image as a "hypercube" (x, y, λ), in which the first two
dimensions indicate the spatial coordinates and the third indicates the
number of bands. As a result, each pixel represents a pattern with as many
attributes as there are bands. Because of the large number of bands in HIs,
the data volume to be processed grows rapidly, which motivates reducing
the dimensionality to minimize the computational complexity in many
real-life HI applications. To this end, numerous approaches based on
feature extraction and feature selection have been proposed. Prominent
methods include principal component analysis (PCA) [32], independent
component analysis (ICA) [33, 71], and linear discriminant analysis (LDA)
[17]. Deep learning methods have excellent capabilities in image
processing, and in recent years image classification, target detection, and
other fields have sparked their use. A number of deep learning network
models are available to improve the performance of HI processing, such as
the convolutional neural network (CNN), the deep belief network (DBN)
[24], and the recurrent neural network (RNN). In addition, to resolve the
problem of poor classification results due to a lack of training samples, a
tensor-based classification model [51, 52] was proposed, and experiments
revealed that when the number of training samples is small this method
outperforms support vector machines and deep learning. In this first part of
our discussion, one of our primary goals is to enhance classification
accuracy. We use a hyperspectral image of the Sundarban mangrove area
acquired by the Sentinel-2 satellite (Fig. 1). The input images, composed of
12 bands, are processed with spatial analysis using a DL-based 3D CNN.
Principal component analysis (PCA) is applied to derive 3D patches of the
Sundarban satellite image. The process achieves 96% classification
accuracy.
Fig. 1 Bhitarkanika mangrove
(Source Google Maps): a Binary image b Grayscale image c RGB image
2 Related Researches
In the process of classifying HIs, the spectral dimension (Fig. 2) helps to
identify the significant variations of reflectance between image pixels,
which change with wavelength [38]. In one study [31], it was observed that
classification accuracy drops dramatically once the number of spectral
bands exceeds a certain value. Since a majority of spectral bands are
redundant in nature, taking all bands into consideration hurts the model's
performance. Dimension reduction techniques [28, 57] are therefore used
to identify such unnecessary bands without compromising the image's
information content; the modified broken-stick rule for HIs [3] is a notable
contribution to dimension reduction. In the majority of cases, however, the
reduced band features suffer from anomalies in object identification and
call for discriminative spatial features. According to one study [19],
neighbouring pixels tend to belong to the same class in HIs; hence, using
the HI's spatial features along with its spectral features is an intuitive and
well-motivated route to effective classification. Several feature extraction
methodologies, such as the grey-level co-occurrence matrix [44, 54], the
stationary wavelet transform (SWT) [43, 73], discrete wavelet transforms
[10, 22], and morphological profiles [4, 55], have been used in many
real-world applications.
3 Hyperspectral Images
To start with the fundamental concepts of a digital image, we can
distinguish binary, grayscale, colour, and hyperspectral images. Binary
images consist of 0s and 1s, representing black and white respectively,
arranged in a 2-D matrix (r rows × c columns). Grayscale digital images
range from 0 to 255, representing black to white with intermediate levels
of grey. Mimicking the way the cone cells of the human eye render
environmental colours, combinations of the RGB scales (red, green, blue)
are digitized into (r rows × c columns) × 3 channels. This RGB colouration
is based on the light reflected from objects at separate wavelengths (long,
medium and short for red, green and blue respectively) within the visible
portion of the electromagnetic spectrum perceived by human eyes.
At the same time, many wavelengths beyond the visible spectrum carry
valuable information that the human eye cannot perceive. Formally, a
spectral image is similar to an RGB colour image but with many channels
describing the spatial and spectral information. A multi-spectral image
consists of n band images, where each band records the light intensity at a
corresponding wavelength (not necessarily spread over a contiguous
wavelength range). A λ-band hyperspectral image consists of λ grey-scale
images, one per band, recording the light intensity at the corresponding
wavelengths and stacked on top of each other over a contiguous
wavelength range (r rows × c columns × λ bands).
4 Deep Learning and CNN
The idea behind deep learning (DL) is to train computers/machines to
model complex algorithms that learn from experience and classify and
recognize data or images much as a human brain does. As a type of ANN,
the CNN is widely used for image and object recognition (processing
images, analyzing videos, and detecting obstacles for autonomous
vehicles). There have been phenomenal developments in ANN-based
methods for DL-based classification and object/image recognition. Three
core layer types (dense, convolutional and output) offer learning-based HI
solutions in supervised, semi-supervised or unsupervised models.
HI-based DL models are being developed for many classification and
object identification purposes using these three designs. The choice of
design for an application model depends on the availability of labelled HI
data. Specifically, if the HI model maps labelled datasets to the ground
truth, the supervised model is used. To extract properties of HI data from
unlabelled datasets, the unsupervised design is adopted, and when only a
small portion of labelled HI data is available, the semi-supervised design is
preferred. Further, convolutional neural networks (CNNs), in contrast to
deep feed-forward neural networks (DNNs) and autoencoders (AEs), play a
vital role in many HI-intensive applications. In a high-dimensional
recognition or prediction system, the role of the convolution layers in a
CNN is specifically to identify or learn local patterns from images or
sequences of images. Three simple operational steps are generally
observed in CNN models (feed-forward, one direction) for HI
classification. First, the input layer identifies the input image and converts
it into an array of pixels. The data then pass through multiple hidden
layers: feature extraction is handled by convolution, followed by pooling
and rectified linear units as needed. Classification is handled by the fully
connected layer, and the label is assigned at the output layer. The most
general form of a CNN groups convolutional and pooling layers into
modules; however, many variants of this grouping are possible.
From the HI research point of view, the ten most popular deep learning
networks can be listed as convolutional neural networks (CNNs), long
short-term memory networks (LSTMs), recurrent neural networks (RNNs),
generative adversarial networks (GANs), radial basis function networks
(RBFNs), multilayer perceptrons (MLPs), self-organizing maps (SOMs),
deep belief networks (DBNs), restricted Boltzmann machines (RBMs) and
autoencoders.
Fig. 8 Testing: IPD, SD, PUD a Overall accuracy b Average accuracy c Kappa score
Fig. 9 Epochs and training/validation loss a–c 10% training IPD, SD, PUD; d–f 30%
training IPD, SD, PUD
References
1. Adate, A., Arya, D., Shaha, A., & Tripathy, B. K. (2020). Impact of deep neural
learning on artificial intelligence research. In S. Bhattacharyya, A. E. Hassanian,
S. Saha, & B. K. Tripathy (Ed.), Deep Learning Research and Applications
(pp.69–84). De Gruyter Publications. https://doi.org/10.1515/9783110670905-004
2. Adate, A., & Tripathy, B. K. (2018). Deep learning techniques for image
processing. In S. Bhattacharyya, H. Bhaumik, A. Mukherjee & S. De (Eds.),
Machine Learning for Big Data Analysis (pp. 69–90). De Gruyter. https://doi.org/
10.1515/9783110551433-00357
6. Bhattacharyya, S., Snasel, V., Hassanian, A. E., Saha, S., & Tripathy, B. K.
(2020). Deep learning research with engineering applications. De Gruyter
Publications. ISBN: 3110670909, 9783110670905. https://doi.org/10.1515/
9783110670905
7. Bhardwaj, P., Guhan, T., & Tripathy, B. K. (2021). Computational biology in the
lens of CNN, Studies in Big Data. In S.S. Roy, & Y.-H. Taguchi (Eds.), Handbook
of Machine Learning Applications for Genomics, (Chapter 5) (vol. 103). ISBN:
978–981–16–9157–7 496166_1_En
9. Bose, A., & Tripathy, B. K. (2020). Deep learning for audio signal classification.
In S. Bhattacharyya, A. E. Hassanian, S. Saha, & B. K. Tripathy (Ed.), Deep
Learning Research and Applications (pp. 105–136). De Gruyter Publications.
https://doi.org/10.1515/9783110670905-00660
10. Bruce, L. M., Li, J., & Huang, Y. (2002). Automated detection of subpixel
hyperspectral targets with adaptive multichannel discrete wavelet transform.
IEEE Transactions on Geoscience and Remote Sensing, 40(4), 977–980.
11. Chen, Y., Lin, Z., Zhao, X., Wang, G., & Gu, Y. (2014). Deep learning-based
classification of hyperspectral data. IEEE Journal of Selected Topics in Applied
Earth Observations and Remote Sensing, 7(6), 2094–2107.
[Crossref]
16. Dharmasastha, K. N. S., Banu, K. S., Kalaichevlan, G., Lincy, B., & Tripathy,
B.K. (2022). Classification of pest in tomato plants using CNN. In M. N.
Mohanty, S. Das, M. Ray, B. Patra (Eds.), Meta Heuristic Techniques in Software
Engineering and Its Applications. METASOFT 2022. Artificial Intelligence-
Enhanced Software and Systems Engineering (vol. 1). Springer. https://doi.org/10.
1007/978-3-031-11713-8_6
17. Du, Q. (2007). Modified fisher’s linear discriminant analysis for hyperspectral
imagery. IEEE Geoscience and Remote Sensing Letters, 4(4), 503–507.
[Crossref]
18. Fauvel, M., Benediktsson, J. A., Chanussot, J., & Sveinsson, J. R. (2008). Spectral
and spatial classification of hyperspectral data using svms and morphological
profiles. IEEE Transactions on Geoscience and Remote Sensing, 46(11), 3804–
3814.
[Crossref]
19. Fauvel, M., Tarabalka, Y., Benediktsson, J. A., Chanussot, J., & Tilton, J. C.
(2012). Advances in spectral-spatial classification of hyperspectral images.
Proceedings of the IEEE, 101(3), 652–675.
[Crossref]
20. Fu, A., Ma, X., & Wang, H. (2018). Classification of hyperspectral image based
on hybrid neural networks. In: IGARSS 2018 2018 IEEE International Geoscience
and Remote Sensing Symposium (pp. 2643–2646).
21. Fukushima, K., & Miyake, S. (1982). Neocognitron: A self-organizing neural net-
work model for a mechanism of visual pattern recognition. In Competition and
Cooperation in Neural Nets (pp. 267–285). Springer.
22. Ghasemzadeh, A., & Demirel, H. (2016) Hyperspectral face recognition using 3d
discrete wavelet transform. In 2016 Sixth International Conference on Image
Processing Theory, Tools and Applications (IPTA) (pp. 1–4).
23. Ghiya, A.S., Vijay, V., Ranganath, A., Chaturvedi, P., Tripathy, B.K. & Banu, K.
S. (2021). Weather classification: Image embedding using xonvolutional
autoencoder and predictive analysis using stacked generalization. In ANTIC
conference. BHU.
24. Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S., & Lew, M. S. (2016). Deep
learning for visual understanding: A review. Neurocomputing, 187, 27–48.
[Crossref]
25. Gupta, P., Bhachawat, S., Dhyani, K., & Tripathy, B. K. (2021). A study of gene
characteristics and their applications using deep learning, (Chapter 4), Studies in
Big Data. In S. S. Roy, & Y.-H. Taguchi (Eds.), Handbook of Machine Learning
Applications for Genomics (vol. 103). ISBN: 978–981–16–9157–7, 496166_1_En
26. Hamida, A. B., Benoit, A., Lambert, P., & Amar, C. B. (2018). 3-d deep learning
approach for remote sensing image classification. IEEE Transactions on
geoscience and remote sensing, 56(8), 4420–4434.
[Crossref]
27. Harikiran, J., Ladi, S. K., Panda, G. K., Dash, R., Ladi, P. K. (2020).
Hyperspectral image classification bi-dimensional empirical mode decomposition
and deep residual networks. In 2020 International Conference on Artificial
Intelligence and Signal Processing (AISP) (pp.1–6).
28. Harsanyi, J. C., & Chang, C.-I. (1994). Hyperspectral image classification and
dimensionality reduction: An orthogonal subspace projection approach. IEEE
Transactions on geoscience and remote sensing, 32(4), 779–785.
[Crossref]
29. Haut, J. M., Paoletti, M. E., Plaza, J., Plaza, A., & Li, J. (2019). Hyperspectral
image classification using random occlusion data augmentation. IEEE Geoscience
and Remote Sensing Letters, 16(11), 1751–1755.
[Crossref]
30. Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for
deep belief nets. Neural computation, 18(7), 1527–1554.
[MathSciNet][Crossref][zbMATH]
31. Hughes, G. (1968). On the mean accuracy of statistical pattern recognizers. IEEE
transactions on information theory, 14(1), 55–63.
[Crossref]
32. Imani, M., & Ghassemian, H. (2014). Principal component discriminant analysis
for feature extraction and classification of hyperspectral images. In 2014 Iranian
Conference on Intelligent Systems (ICIS) (pp. 1–5).
33.
Jayaprakash, C., Damodaran, B. B., Sowmya, V., & Soman, K. P. (2018).
Dimensionality reduction of hyperspectral images for classification using
randomized independent component analysis. In 2018 5th International
Conference on Signal Processing and Integrated Networks (SPIN) (pp. 492–496)
34. Kabir, H. M. D., Abdar, M., Jalali, S. M. J., Khosravi, A., Atiya, A.F., Nahavandi,
S., & Srinivasan, D. (2020). SpinalNet: Deep neural network with gradual input
36. Kaul, D., Raju, H., & Tripathy, B. K. (2022). Deep learning in healthcare, in:
Deep Learning in Data Analytics. In: D.P. Acharjya, A. Mitra, N. Zaman (Eds,),
Deep Learning in Data Analytics-Recent Techniques, Practices and Applications,
Studies in Big Data (vol. 91, pp. 97–115). Springer. https://doi.org/10.1007/978-3-
030-75855-4_6
37. Ke, C. (2017). Military object detection using multiple information extracted from
hyperspectral imagery. In 2017 International Conference on Progress in
Informatics and Computing (PIC) (pp. 124–128).
38. Khan, M.J., Khan, H.S., Yousaf, A., Khurshid, K., & Abbas, A. (2018). Modern
trends in hyperspectral image analysis: A review. IEEE Access. 6, 14118−14129
39. Kumar, V., & Tripathy, B. K. (2020). Detecting toxicity with bidirectional gated
recurrent unit networks. In V. Bhateja, S. Satapathy, Y.D. Zhang, V. Aradhya
(Eds.), Intelligent Computing and Communication. ICICC 2019. Advances in
Intelligent Systems and Computing (vol. 1034). Springer. https://doi.org/10.1007/
978-981-15-1084-7_57
40. Kwon, H., Hu, X., Theiler, J., Zare, A, & Gurram, P. (2013). Algorithms for
multispectral and hyperspectral image analysis. Journal of Electrical and
Computer Engineering, 2013, 2. Article ID 908906
41. Ladi, S. K., Panda, G. K., Dash, R., et al. (2022). A novel grey wolf optimisation
based CNN classifier for hyperspectral image classification. Multimed Tools Appl,
81, 28207–28230.
[Crossref]
42. Ladi, S. K., Panda, G. K., Dash, R. et al. (2022). A novel strategy for classifying
spectral-spatial shallow and deep hyperspectral image features using 1D-EWT and
3D-CNN. Earth science informatics
43. Ladi, S. K., Dash, R., Panda, G. K., Ladi, P. K., & Dhupar, R. (2019).
Hyperspectral image classification using swt and cnn. In 2019 International
Conference on Information Technology (ICIT) (pp. 172–177).
44. Li, C., Zuo, H., Fan, T. (2017). Hyperspectral image classification based on gray
level co-occurrence matrix and local mean decomposition. In 2017 4th
International Conference on Systems and Informatics (ICSAI) (pp. 1219–1223).
45. Li, J., Bioucas-Dias, J. M., & Plaza, A. (2010). Semisupervised hyperspectral
image segmentation using multinomial logistic regression with active learning.
IEEE Transactions on Geoscience and Remote Sensing, 48(11), 4085–4098.
46. Li, Y., Zhang, H., & Shen, Q. (2017). Spectral–spatial classification of
hyperspectral imagery with 3d convolutional neural network. Remote Sensing,
9(1), 67.
[Crossref]
47. Li, W., Wu, G., Zhang, F., & Du, Q. (2017). Hyperspectral image classification
using deep pixel-pair features. IEEE Transactions on Geoscience and Remote
Sensing, 55(2), 844–853.
[Crossref]
48. Ma, Y., Li, R., Yang, G., Sun, L., & Wang, J. (2018). A research on the
combination strategies of multiple features for hyperspectral remote sensing
image classification. Journal of Sensors, 2018, 14. Article ID 7341973.
49. Maheswari, K., Shaha, A., Arya, D., Tripathy, B. K., & Rajkumar, R. (2020).
Convolutional neural networks: A bottom-ip approach. In S. Bhattacharyya, A. E.
Hassanian, S. Saha, & B.K. Tripathy (Ed.), Deep Learning Research with
Engineering Applications (pp.21–50). De Gruyter Publications. https://doi.org/10.
1515/9783110670905-002
50. Makantasis, K., Karantzalos, K., Doulamis, A., & Doulamis, N. (2015). Deep
super-vised learning for hyperspectral data classification through convolutional
neural networks. In 2015 IEEE International Geoscience and Remote Sensing
Symposium (IGARSS) (pp. 4959–4962).
51. Makantasis, K., Doulamis, A. D., Doulamis, N. D., & Nikitakis, A. (2018).
Tensor-based classification models for hyperspectral data analysis. IEEE
Transactions on Geoscience and Remote Sensing, 56(12), 6884–6898.
[Crossref]
52.
Makantasis, K., Doulamis, A., Doulamis, N., Nikitakis, A., & Voulodimos, A.
(2018). Tensor-based nonlinear classifier for highorder data analysis. In 2018
IEEE International Conference
53. Notesco, G., Dor, E. B., & Brook, A. (2014). Mineral mapping of makhtesh ramon
in israel using hyperspectral remote sensing day and night LWIR images. In 2014
6th Workshop on Hyperspectral Image and Signal Processing: Evolution in
Remote Sensing (WHISPERS) (pp. 1–4).
54. Pesaresi, M., Gerhardinger, A., & Kayitakire, F. (2008). A robust built-up area
presence index by anisotropic rotation-invariant textural measure. IEEE Journal of
selected topics in applied earth observations and remote sensing, 1(3), 180–192.
[Crossref]
55. Pesaresi, M., & Benediktsson, J. A. (2001). A new approach for the
morphological segmentation of high-resolution satellite imagery. IEEE
transactions on Geoscience and Remote Sensing, 39(2), 309–320.
[Crossref]
56. Pike, R., Lu, G., Wang, D., Chen, Z. G., & Fei, B. (2016). A minimum spanning
forest-based method for noninvasive cancer detection with hyperspectral imaging.
IEEE Transactions on Biomedical Engineering, 63(3), 653–663.
[Crossref]
57. Plaza, A., Mart´ınez, P., Plaza, J., P´erez, R. (2005). Dimensionality reduction and
classification of hyperspectral image data using sequences of extended
morphological transformations. IEEE Transactions on Geoscience and remote
sensing, 43(3), 466–479.
59. Roy, S. K., Krishna, G., Dubey, S. R., & Chaudhuri, B. B. (2020). Hybridsn:
Exploring 3-d-2-d cnn feature hierarchy for hyperspectral image classification.
IEEE Geoscience and Remote Sensing Letters, 17(2), 277–281.
60. Singhania, U., & Tripathy, B. K. (2021). Text-based image retrieval using deep
learning. In Encyclopedia of Information Science and Technology (5th ed., p. 11).
https://doi.org/10.4018/978-1-7998-3479-3.ch007
61. Rungta, R. K., Jaiswal, P., Tripathy, B. K. (2022). A deep learning based approach
to measure confidence for virtual interviews. In A. K. Das et al. (Eds.),
Proceedings of the 4th International Conference on Computational Intelligence in
Pattern Recognition (CIPR) (pp. 278–291). CIPR 2022, LNNS 480.
62. Sihare, P., Khan, A. U., Bardhan, P., & Tripathy, B. K. (2022). COVID-19
detection using deep learning: A comparative study of segmentation algorithms.
In A. K. Das et al. (Eds.), Proceedings of the 4th International Conference on
Computational Intelligence in Pattern Recognition (CIPR) (pp. 1–10). CIPR
2022, LNNS 480.
63. Jain, S., Singhania, U., Tripathy, B.K., Abouel, E. N., Aboudaif, M. K., & Ali, K.
K. (2021). Deep learning based transfer learning for classification of skin cancer.
Sensors (Basel), 21(23), 8142 https://doi.org/10.3390/s21238142. (IF:4.35)
64. Surya, Y. S., Geetha Rani, K. T., & Tripathy, B. K. (2022). Social distance
monitoring and face mask detection using deep learning. In: J. Nayak, H. Behera,
B. Naik, S. Vimal, D. Pelusi (Eds.), Computational Intelligence in Data Mining.
Smart Innovation, Systems and Technologies (vol. 281). Springer. https://doi.org/
10.1007/978-981-16-9447-9_36
65. Sun, T., Jiao, L., Feng, J., Liu, F., & Zhang, X. (2015). Imbalanced hyperspectral
image classification based on maximum margin. IEEE Geoscience and Remote
Sensing Letters, 12(3), 522–526.
[Crossref]
66. Teng, M. Y., Mehrubeoglu, R., King, S. A., Cammarata, K., & Simons, J. (2013).
Investig tion of epifauna coverage on seagrass blades using spatial and spectral
analysis of hyperspectral images. In 2013 5th Workshop on Hyperspectral Image
and Signal Processing: Evolution in Remote Sensing (WHISPERS) (pp. 1–4).
68. Tripathy, B. K., Parikh, S., Ajay, P., & Magapu, C. (2022). Brain MRI
segmentation techniques based on CNN and its variants, (Chapter-10). In J. Chaki
(Ed.), Brain Tumor MRI Image Segmentation Using Deep Learning Techniques
(pp. 161−182). Elsevier publications. https://doi.org/10.1016/B978-0-323-91171-
9.00001-6
69.
Tripathy, B. K., & Adate, A. (2021). Impact of deep neural learning on artificial
intelligence research, Chapter-8. In D. P. Acharjya et al (Ed.), Springer
publications.
70. Voulodimos, A. (2018). Deep learning for computer vision: a brief review.
Computational Intelligence and Neuroscience, 2018, 13. Article ID 7068349.
72. Wang, X., & Feng, Y. (2008). New method based on support vector machine in
classification for hyperspectral data. In 2008 International Symposium on
Computational Intelligence and Design (pp. 76–80)
73. Wang, Y., & Cui, S. (2014). Hyperspectral image feature classification using
stationary wavelet transform. In 2014 International Conference on Wavelet
Analysis and Pattern Recognition (pp. 104–108)
74. Wu, Y., Mu, G., Qin, C., Miao, Q., Ma, W., & Zhang, X. (2020). Semi-supervised
hyperspectral image classification via spatial-regulated self-training. Remote
Sensing, 12(1)
75. Xingjian, S., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., & Woo, W.C. (2015).
Convolutional LSTM network: A machine learning approach for precipitation
nowcasting. In Proceedings of the 28th International Conference on Neural
Information Processing Systems (Vol. 1, pp. 802–810).
76. Xu, Y., Zhang, L., Du, B., & Zhang, F. (2018). Spectral–spatial unified networks
for hyperspectral image classification. IEEE Transactions on Geoscience and
Remote Sensing, 56(10), 5893–5909.
77. Zhang, X., Zhang, A., & Meng, X. (2015). Automatic fusion of hyperspectral
images and laser scans using feature points. Journal of Sensors, 2015, 9. Article
ID 415361
78. Zheng, J., Feng, Y., Bai, C., & Zhang, J. (2021). Hyperspectral image
classification using mixed convolutions and covariance pooling. IEEE
Transactions on Geoscience and Remote Sensing, 59(1), 522–534.
[Crossref]
79.
Zhong, Z., Li, J., Luo, Z., & Chapman, M. (2018). Spectral–spatial residual
network for hyperspectral image classification: A 3-d deep learning framework.
IEEE Transactions on Geoscience and Remote Sensing, 56(2), 847–858
80. Zhou, F., Hang, R., Liu, Q., & Yuan, X. (2019). Hyperspectral image classification
using spectral-spatial lstms. Neurocomputing, 328, 39–47.
[Crossref]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. S. Roy et al. (eds.), Deep Learning Applications in Image Analysis, Studies in Big Data 129
https://doi.org/10.1007/978-981-99-3784-4_9
1 Introduction
Pneumonia is a respiratory infection that affects the lungs. It causes
inflammation and fluid buildup in the air sacs, leading to difficulty in breathing
and associated cardiovascular effects. Pneumonia is considered the single largest
cause of death in children worldwide, contributing to an estimated 5.9 million
deaths of children under five years of age annually [1]. Chest X-rays and other
radiography methods have long been established in medicine and are routinely used
to diagnose and manage conditions such as cancer, infections, emphysema and
pneumonia.
The specialized analysis and diagnosis of an illness from X-ray images is
generally carried out in person by expert radiologists. In recent times, the
number of cases requiring chest X-rays has increased substantially [2], so
radiologists now have to devote far more time to this task. The expertise it
demands stems from the highly detailed, subtle characteristics of the structures
in the lung, which must be analyzed and interpreted through intricate traits that
together point towards a general illness category. Because of this growing volume
of chest X-rays, manual processing can lead to time delays, higher costs and
errors, all of which any medical institution needs to avoid. Through the work
described in this chapter, we propose an automated medical image diagnosis system
that gives radiologists and clinical staff an alternative, convenient way to
process and analyze this data efficiently, with little manual work. For our
problem statement, we have used two Convolutional Neural Network (CNN) based
algorithms to classify chest X-ray scans for pneumonia.
These CNN-based algorithms work well on this image classification problem because
of their inherent ability to reduce the dimensionality of the data while
processing it efficiently and accurately [3]. These advantages come from the
network's constituent layers and their roles: the convolution layer decomposes
the image into smaller sub-parts, producing an efficient, lower-dimensional
representation; the pooling layer takes the convolution output as input and
reduces its dimensionality further; and the fully connected layer is the final
layer, where the network learns which parts of the representation matter for the
classification problem at hand.
2 Literature Survey
To date, there have been a number of proposals and advances on similar medical
diagnosis problems. CNNs and deep neural networks have allowed researchers to
build sophisticated models for medical conditions including pneumonia,
tuberculosis, Covid-19, lung cancer and many more [4].
Researchers in these fields have employed many different techniques to advance
medical diagnosis tasks, including Convolutional Neural Networks, transfer
learning, image-level prediction, segmentation networks, localization networks,
image generation networks and domain adaptation networks [4].
For example, Crosby et al. employed deep CNNs for distinguishing between
binary-labelled chest radiograph data [5]. Deep learning has also been employed
to detect foreign objects in chest radiographs using similar data [6]. Generative
Adversarial Networks have been deployed for organ segmentation and bone
suppression tasks in chest X-rays [7]. Transfer learning based image classifier
models have been researched by Showkat et al. for the detection of Covid-19
pneumonia [8]. Deep learning techniques are used by Hirata et al. to estimate
pulmonary artery wedge pressure from standard chest X-ray data. The research
community working on these tasks has established a foothold for CNNs in computer
vision problems of this kind; in 2015 and 2016 alone, more than 300 papers were
published on applications of deep learning in workshops, conferences, journals,
and special issues in this domain [9, 10].
3 Dataset
The dataset used to train our proposed models was obtained from the
internet website named Kaggle, and is named “Chest X-Ray Images
(Pneumonia)”. It consists of 5863 images as training samples each of which
has a binary feature associated with it depicting the individual datapoints as
either ‘normal’ or ‘pneumonia’. A point to note here is that, the feature
category for this specific dataset is binary in nature, hence the proposed
models will be tasked with the duty of analysing the image for the presence
of the disease of pneumonia in contrast to the task of finding specific types
of pneumonia ranging from bacterial to viral. The images present in the
dataset are formatted X-Ray images of the lungs (Fig. 1).
Fig.1 Three samples from normal and pneumonia classes
The dataset consists of 27% images of normal lung x-rays and the
remaining pertaining to those corresponding to pneumonia (Fig. 2).
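As a hedged illustration of how such a binary-labelled image collection might be read, the sketch below assumes a directory laid out as data/train/NORMAL and data/train/PNEUMONIA (this layout is an assumption, not stated in the chapter) and derives class weights from the roughly 27%/73% split mentioned above.

```python
# Sketch: reading the "Chest X-Ray Images (Pneumonia)" dataset from a directory
# laid out as data/train/NORMAL and data/train/PNEUMONIA (layout assumed, not
# stated in the chapter). Labels are binary: normal vs. pneumonia.
import tensorflow as tf

datagen = tf.keras.preprocessing.image.ImageDataGenerator(validation_split=0.2)

train_gen = datagen.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=32,
    class_mode="binary", subset="training")
val_gen = datagen.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=32,
    class_mode="binary", subset="validation")

# Roughly 27% of the images are normal and 73% pneumonia, so inverse-frequency
# class weights can be passed to model.fit to counteract the imbalance
# (indices follow train_gen.class_indices, typically NORMAL=0, PNEUMONIA=1).
class_weight = {0: 1.0 / 0.27, 1: 1.0 / 0.73}
```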
5 Proposed Model
We have used two distinct models for this classification problem: the
EfficientNet model and the Inception model. Both of these models are based on
Convolutional Neural Networks (CNNs).
5.1 EfficientNet
EfficientNet is an architecture framework based on principled model scaling for
Convolutional Neural Networks. It scales all dimensions of depth, width and
resolution uniformly using a compound coefficient. Its distinguishing factor is
that it does not scale these factors arbitrarily; instead it uses a fixed set of
scaling coefficients to scale network width, depth and resolution together. Using
this technique, its creators surpassed the accuracy of almost all
high-performing convolutional models while simultaneously achieving better
efficiency.
Model scaling is usually approached through (a) a baseline model, (b) width
scaling, (c) depth scaling and (d) resolution scaling, whereas the EfficientNet
model uses a method known as compound scaling, which combines all of the previous
techniques into one hybrid, coordinated scheme (Figs. 6 and 7).
Fig. 6 Baseline network with connecting layers
Fig. 7 Compound scaled network with connecting layers
In deriving the compound scaling factor, it was observed that network depth
should be increased for higher-resolution images, since a larger receptive field
helps capture features spanning more pixels in bigger images, and correspondingly
that network width should also be increased when the resolution is higher, in
order to capture the fine-grained patterns present in such images [11]. The
compound scaling method employed by the EfficientNet model uses a coefficient φ
to uniformly scale the width, depth and resolution of the neural network.
The equations for the same (following [11], with the chapter's notation) are:
depth: d = a^φ, width: w = b^φ, resolution: r = c^φ,
subject to a · b² · c² ≈ 2, with a ≥ 1, b ≥ 1, c ≥ 1,
where a, b, c are constants determined by a small grid search. Here φ is a
user-specified coefficient that controls how many additional resources are
available for model scaling, while a, b, c specify how to assign those extra
resources to network depth, width and resolution respectively [11].
The EfficientNet architecture provides the baseline network to which this
compound scaling criterion is applied.
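To make the compound scaling rule concrete, the following sketch computes the scaled depth, width and resolution multipliers for a few values of φ, using the constants a = 1.2, b = 1.1, c = 1.15 reported in [11]; the baseline resolution of 224 pixels is an illustrative assumption.

```python
# Sketch of EfficientNet-style compound scaling (after Tan & Le [11]).
# The constants a=1.2, b=1.1, c=1.15 are the grid-search values reported for
# the EfficientNet baseline; phi is the user-chosen compound coefficient.
def compound_scale(phi, base_depth=1.0, base_width=1.0, base_resolution=224,
                   a=1.2, b=1.1, c=1.15):
    depth = base_depth * (a ** phi)            # scale number of layers
    width = base_width * (b ** phi)            # scale number of channels
    resolution = base_resolution * (c ** phi)  # scale input image size
    return depth, width, round(resolution)

for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution {r}px")
```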
5.2 InceptionV3
InceptionV3 is an image recognition model that has achieved state-of-the-art
accuracy on image-related tasks. It builds upon the base architecture of the
InceptionV1 model, which uses multiple filters in parallel branches instead of
the single deep stack of layers found in a classical CNN. Each block of a basic
Inception model is made up of four parallel branches: 1×1, 3×3 and 5×5
convolutions and a 3×3 max pooling layer.
The implementable InceptionV3 model consists of building blocks including (a)
convolutions, (b) average pooling, (c) max pooling, (d) concatenations, (e)
dropout and (f) softmax (Fig. 8).
Fig. 8 Input layer and output layer dimensions for InceptionV3 model
Building on InceptionV1, the model factorizes large convolutions into smaller
ones, i.e. it breaks high-dimensional operations into smaller fragments for more
effective processing. It also uses spatial factorization into asymmetric
convolutions, subdividing convolutions into factors of the form n×1, which
improves the efficiency of both processing and results [12]. The model employs
auxiliary classifiers, which in essence act as regularizers, and parallel stride
blocks that implement an efficient grid-size reduction scheme in order to avoid a
representational bottleneck.
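The parallel-branch idea described above can be sketched as a basic (naive) Inception block; the filter counts below are illustrative and do not correspond to the actual InceptionV3 configuration.

```python
# Sketch of a basic Inception block: four parallel branches (1x1, 3x3, 5x5
# convolutions and 3x3 max pooling) whose outputs are concatenated along the
# channel axis. Filter counts are illustrative, not those of InceptionV3.
import tensorflow as tf
from tensorflow.keras import layers

def inception_block(x, f1=64, f3=128, f5=32):
    b1 = layers.Conv2D(f1, (1, 1), padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, (3, 3), padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, (5, 5), padding="same", activation="relu")(x)
    bp = layers.MaxPooling2D((3, 3), strides=(1, 1), padding="same")(x)
    return layers.Concatenate(axis=-1)([b1, b3, b5, bp])

inputs = tf.keras.Input(shape=(224, 224, 3))
outputs = inception_block(inputs)
tf.keras.Model(inputs, outputs).summary()
```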
The loss curve of the model, shown graphically in Fig. 10, takes a large initial
decline and then moves towards its lowest value in a stable, consistent manner.
The validation loss curve does not take as steep a dive and passes through a
sudden high peak partway along the graph, after which it stabilizes and settles
close to the final values of the training loss curve.
Fig. 10 Loss value curve of Inception model
6.2 EfficientNet
Figure 11 shows the accuracy and validation accuracy graphs for the EfficientNet
model, which was trained for 10 epochs. The peak accuracy achieved by the model
is 95.39%; the training accuracy increases steeply after the first epoch and
rises gradually and stably to its peak value by the last epoch. The validation
accuracy curve follows a similar shape until it drops to an extremely low value,
rises steeply again in the subsequent epoch, and drops sharply once more two
epochs later.
The loss curve of the model, depicted in Fig. 12, takes an initial decline and
reaches its lowest value while showing only negligible fluctuations along the
way. The validation loss curve shows a steep initial decline similar to the
training loss and reaches its lowest value in the following steps, but it then
rises sharply, falls again in the next epoch, and rises substantially once more
two epochs later.
Fig. 12 Loss value curve of EfficientNet model
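The following sketch shows how such curves can be produced: a binary classifier is built on the Keras EfficientNetB0 application (whether the chapter used ImageNet weights is not stated, so their use here is an assumption), trained for 10 epochs on the generators sketched in the dataset section, and its training and validation losses plotted in the manner of Figs. 10 and 12.

```python
# Sketch: binary pneumonia classifier on the Keras EfficientNetB0 application
# (ImageNet weights assumed; the chapter does not state this), trained for 10
# epochs. train_gen and val_gen are the generators sketched earlier; the Keras
# EfficientNet applications rescale their inputs internally.
import tensorflow as tf
import matplotlib.pyplot as plt

base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet",
    input_shape=(224, 224, 3), pooling="avg")
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),  # pneumonia probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(train_gen, validation_data=val_gen, epochs=10)

plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```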
7 Discussion
Radiologists, clinicians and staff working on the detection and treatment of
pneumonia and related conditions face pressing constraints of time, the frequency
and volume of data to be processed, and the expertise required. Classifiers
already exist for other medical diagnosis tasks, including breast cancer
detection [13], and CNNs have recently been used for brain tumour classification
[14]. Many of these constraints can be eased to a significant extent through
machine learning and neural network based models. At the same time, it must be
noted that the final diagnosis and the inferences drawn from it should ultimately
be made by a trained professional; for now, these classification models exist
only to help clinicians and trained experts streamline their tasks. Limitations
of a model like this include the difficulty of explaining the metrics it achieves
and the reasoning behind them, and its inability to characterize key indicators
that would reveal a subtype of the general illness, which might call for
alternative treatments covering multiple disorders either causing or caused by
the pneumonia. The accuracies achieved in this chapter can be improved further by
incorporating a larger dataset, or by developing more specific, custom models
built exclusively for X-ray diagnostics. Another avenue for improvement is to
incorporate the patient's medical history, in some suitable form, as a feature
variable in the dataset. Furthermore, data augmentation techniques can be
identified and incorporated into future models to achieve better results [15–30].
8 Conclusion
In this chapter, we have discussed the experimental use of the EfficientNet and
InceptionV3 models for the medical diagnosis of pneumonia from chest X-rays. We
achieved high accuracies of 95.39% and 92.93% respectively, at a significantly
low computational cost. The discussed frameworks can therefore be highly
beneficial in the medical diagnosis of the disease and useful to the medical
practitioners and radiologists working on this problem. Further refinement of
these approaches and methodologies should have a strongly positive impact on this
cause and pave the way for further improvements.
References
1. Yadav, K. K., & Awasthi, S. (2016). The current status of community-acquired
pneumonia management and prevention in children under 5 years of age in India:
A review. Therapeutic Advances in Infectious Disease, 3(3–4), 83–97.
[Crossref]
2. Çallı, E., Sogancioglu, E., van Ginneken, B., van Leeuwen, K. G., & Murphy, K.
(2021). Deep learning for chest X-ray analysis: A survey. Medical Image Analysis,
72, 102125.
[Crossref]
3. Li, Q., Cai, W., Wang, X., Zhou, Y., Feng, D. D., & Chen, M. (2014). Medical
image classification with convolutional neural network. In 13th International
Conference on Control Automation Robotics & Vision (ICARCV), Singapore, pp.
844–848. https://doi.org/10.1109/ICARCV.2014.7064414
4. Çallı, E., Sogancioglu, E., van Ginneken, B., van Leeuwen, K. G., & Murphy, K.
(2021). Deep learning for chest X-ray analysis: A survey. Medical Image Analysis,
72, 102125. ISSN 1361-8415 https://doi.org/10.1016/j.media.2021.102125
5. Crosby et al. Deep convolutional neural networks in the classification of
dual-energy … Journal of Medical Imaging, 7(1), 016501. https://doi.org/10.1117/
1.JMI.7.1.016501
6. Deshpande, H., Harder, T., Saalbach, A., Sawarkar, A., Buelow, T. (2020).
Detection of foreign objects in chest radiographs using deep learning. In IEEE
17th International Symposium on Biomedical Imaging Workshops (ISBI
Workshops). Iowa City, IA, USA, pp. 1–4. https://doi.org/10.1109/
ISBIWorkshops50223.2020.9153350
7. Eslami, M., Tabarestani, S., Albarqouni, S., Adeli, E., Navab, N., & Adjouadi, M.
(2020). Image-to-images translation for multi-task organ segmentation and bone
suppression in chest X-ray radiography. IEEE Transactions on Medical Imaging,
39(7), 2553–2565. https://doi.org/10.1109/TMI.2020.2974159
[Crossref]
10. Greenspan, H., Summers, R. M., & van Ginneken, B. (2016). Deep learning in
medical imaging: Overview and future promise of an exciting new technique.
IEEE Transactions on Medical Imaging, 35(5), 1153–1159.
[Crossref]
11. Tan, M., & Le, Q. (2019). Efficientnet: Rethinking model scaling for
convolutional neural networks. In International Conference on Machine
Learning (pp. 6105–6114). PMLR.
12. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking
the inception architecture for computer vision. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (pp. 2818–2826).
13. Mittal, D., Gaurav, D., & Sekhar Roy, S. (2015). An effective hybridized classifier
for breast cancer diagnosis. In 2015 IEEE International Conference on Advanced
Intelligent Mechatronics (AIM), Busan, Korea (South), pp. 1026–1031. https://doi.
org/10.1109/AIM.2015.7222674
14. Roy, S. S., Rodrigues, N., & Taguchi, Y. (2020). Incremental dilations using CNN
for brain tumor classification. Applied Sciences 10(14):4915. https://doi.org/10.
3390/app10144915
15. Shorten, C., & Khoshgoftaar, T. M. (2019). A survey on image data augmentation
for deep learning. Journal of Big Data, 6, 60.
[Crossref]
16. Roy, S. S., Hsu, C., Samaran, A., Goyal, R., Pande, A., et al. (2023). Vessels
segmentation in angiograms using convolutional neural network: A deep learning
based approach. CMES-Computer Modeling in Engineering & Sciences, 136(1),
241–255.
[Crossref]
17. Turki, T., & Roy, S. S. (2022). Novel hate speech detection using word cloud
visualization and ensemble learning coupled with count vectorizer. Applied
Sciences, 12(13), 6611.
[Crossref]
18.
Roy, S. S., Goti, V., Sood, A., Roy, H., Gavrila, T., Floroian, D., Mohammadi-
Ivatloo, B., et al. (2014). L2 regularized deep convolutional neural networks for
fire detection. Journal of Intelligent & Fuzzy Systems, 1–12.
19. Roy, S. S., Mihalache, S. F., Pricop, E., & Rodrigues, N. (2022). Deep
convolutional neural network for environmental sound classification via dilation.
Journal of Intelligent & Fuzzy Systems, 1–7.
21. Bose, A., Hsu, C. H., Roy, S. S., Lee, K. C., Mohammadi-Ivatloo, B., &
Abimannan, S. (2021). Forecasting stock price by hybrid model of cascading
multivariate adaptive regression splines and deep neural network. Computers and
Electrical Engineering, 95, 107405.
22. Roy, S. S., & Taguchi, Y. H. (2021). Identification of genes associated with altered
gene expression and m6A profiles during hypoxia using tensor decomposition
based unsupervised feature extraction. Scientific Reports, 11(1), 1–18.
23. Roy, S. S., & Samui, P. (2021). Predicting longitudinal dispersion coefficient in
natural streams using minimax probability machine regression and multivariate
adaptive regression spline. International Journal of Advanced Intelligence
Paradigms, 19(2), 119–127.
24. Marques, G., Agarwal, D., & de la Torre, I. (2020). Automated medical diagnosis
of COVID-19 through EfficientNet convolutional neural network. Applied Soft
Computing, 96, 106691.
25. Biswas, R., Vasan, A., & Roy, S. S. (2020). Dilated deep neural network for
segmentation of retinal blood vessels in fundus images. Iranian Journal of
Science and Technology, Transactions of Electrical Engineering, 44(1), 505–518.
[Crossref]
26. Roy, S. S., Samui, P., Nagtode, I., Jain, H., Shivaramakrishnan, V., &
Mohammadi-Ivatloo, B. (2020). Forecasting heating and cooling loads of
buildings: A comparative performance analysis. Journal of Ambient Intelligence
and Humanized Computing, 11(3), 1253–1264.
27.
Roy, S. S., Chopra, R., Lee, K. C., Spampinato, C., & Mohammadi-Ivatlood, B.
(2020). Random forest, gradient boosted machines and deep neural network for
stock price forecasting: A comparative analysis on South Korean companies.
International Journal of Ad Hoc and Ubiquitous Computing, 33(1), 62–71.
28. Roy, S. S., Mihalache, S. F., Pricop, E., & Rodrigues, N. (2022). Deep
convolutional neural network for environmental sound classification via
dilation. Journal of Intelligent & Fuzzy Systems, 1–7.
29. Chakraborty, C., Bhattacharya, M., Sharma, A. R., Roy, S. S., Islam, M. A.,
Chakraborty, S., Dhama, K., et al. (2022). Deep learning research should be
encouraged for diagnosis and treatment of antibiotic resistance of microbial
infections in treatment associated emergencies in hospitals. International Journal
of Surgery (London, England), 105, 106857.
30. Lee, K. C., Roy, S. S., Samui, P., & Kumar, V. (Eds.). (2020). Data analytics in
biomedical engineering and healthcare. Academic Press.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. S. Roy et al. (eds.), Deep Learning Applications in Image Analysis, Studies in Big Data 129
https://doi.org/10.1007/978-981-99-3784-4_10
B. K. Tripathy
Email: tripathybk@vit.ac.in
1 Introduction
Cancer is a dreaded disease that poses a serious threat to human society;
according to data from the World Health Organization, cancer accounted for 13% of
all fatalities in 2018 [1]. In the coming years it is predicted to rank among the
deadliest diseases in the world, with a projected 12 million individuals affected
by cancer in 2030, and the number of cases is expected to rise dramatically.
Experts, specialists and medical professionals are developing new methods to
combat cancer, but it is well recognized that this battle is quite challenging
[2–4].
Computer-supported evaluation of medical images by technicians is referred to as
interpretation. Diagnostic ultrasound images, by contrast, confront the physician
with a large volume of data and require thorough analysis in a short amount of
time; some of these imaging processes involve high-energy electromagnetic
radiation. Digital images are analyzed by computer-assisted methods to detect the
presence or absence of cancer at an early stage [5].
Analysis of medical images with computer tools supports medical professionals in
interpreting the information inherent in the images. On the other hand,
diagnosing ultrasound images acquired through specific imaging processes, such as
high-intensity electromagnetic radiation, requires a significant quantity of data
to be handled by the doctor and thorough analysis in a short amount of time.
Digital images analyzed by computer-assisted methods can potentially detect the
presence or absence of the disease at an early stage. Early cancer detection is
therefore the top goal for saving lives. Many visual examinations and manual
methods are used to find and diagnose cancer in its early stages, but human
analysis of medical images demands considerable time and expertise; to improve
the efficiency of medical image interpretation, computerized systems for disease
diagnosis have been proposed [5].
Developments in AI and machine learning (ML) have progressed rapidly in recent
years, and their rise in computer vision, image processing and computer-assisted
diagnosis is striking [6]. Some of these applications use traditional machine
learning techniques such as Support Vector Machines (SVM), decision trees,
K-Nearest Neighbours (KNN) and back-propagation [7]. Figure 1 illustrates the
overall relationship among AI, ML and their components. An Artificial Neural
Network (ANN) has an input layer, an output layer and as many hidden layers of
neurons as the application requires. The input layer accepts attributes in the
form of input data; the associated connection weights are used to compute the
total input at each hidden node before an activation function produces that
node's output. This process is repeated layer after layer until it reaches the
output layer, which generates the final outputs [8]. The resulting increase in
prediction accuracy aids clinicians in planning a subject's treatment and easing
the emotional and physical burden caused by sickness. An important development
supporting clinical researchers is the growing number of diagnoses made with the
latest AI technology. Computer engineers and health scientists can now diagnose
patients successfully by combining multi-factor analysis, classical logistic
regression and AI-assisted analysis, made possible by theoretical and technical
advances in software and statistics. These estimates are much more accurate than
purely experimental ones. Recently, researchers have begun to develop new AI
models to predict and detect cancer; such models are crucial for improving the
precision and sensitivity of cancer survival estimates [3].
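As a minimal sketch of the layer-by-layer forward pass just described (purely illustrative, with random weights and sizes), each layer computes a weighted sum of its inputs, adds a bias and applies an activation function before passing the result on to the next layer.

```python
# Minimal sketch of an ANN forward pass: each layer multiplies its inputs by
# the connection weights, adds a bias, and applies an activation function; the
# output of one layer becomes the input of the next. Weights are random here;
# in practice the output layer would use a task-appropriate activation.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, layer_params):
    """layer_params is a list of (weights, bias) pairs, one per layer."""
    a = x
    for w, b in layer_params:
        a = relu(a @ w + b)   # weighted sum of inputs, then activation
    return a

rng = np.random.default_rng(0)
layer_params = [
    (rng.normal(size=(4, 8)), np.zeros(8)),   # input layer -> hidden layer
    (rng.normal(size=(8, 3)), np.zeros(3)),   # hidden layer -> output layer
]
print(forward(rng.normal(size=(1, 4)), layer_params))
```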
For detection and management of cancer to be effective, however, the diagnosis
must be made at the earliest stages of the illness. Diagnosing cancer early is
the most important factor in preserving the lives of many individuals [9]. Visual
examination and manual techniques are typically used for this form of cancer
diagnosis, yet interpreting medical imagery takes a lot of effort and is quite
error-prone [10]. Because of the ambiguous nature of the symptoms, the
limitations of mammography and other screening methods, and the potential for
recurrence after care, detecting a cancer in its initial phases is extremely
challenging [11]. High-resolution medical diagnostics in cancer investigations
will therefore lead to the development of better predictive models [12]. An
analysis of the literature on the identification and management of cancer shows
that the application of AI approaches is expanding [13]. It has also come to
light that AI techniques are more effective than conventional analysis methods
such as statistical and multivariate analysis, and that among AI techniques the
DL approach in particular produces excellent outcomes [14].
DL refers to neural networks with numerous hidden layers, and it has recently
been adopted across many different industries [15]. It has shown particularly
strong results in use cases such as voice recognition and image detection in
advanced devices such as driverless cars and drones [14, 16, 17]. Fundamental
classifications, including distinguishing cancerous from healthy tissue, have
traditionally been carried out with models built on conventional ML techniques.
Deep neural networks, on the other hand, offer a better way to build
classification models from data matrices. With such models, cancer can be
identified, its progression can be observed and predicted, and timely and
effective cancer therapy can then be administered [18].
DL approaches operate by using the backpropagation algorithm to uncover fine
structure in huge and frequently complex datasets. Existing techniques, such as
classical machine learning methods, have limits when it comes to handling raw
data in its native format without preprocessing [19]. The ability to learn
invariant features is a property of convolutional neural networks (CNNs), a type
of DL system. To build representations for object identification tasks such as
detection, segmentation and classification, CNNs use feature pooling layers,
filter banks, dropout layers, batch normalization layers and dense layers. CNNs
form a multilevel hierarchy in which the distribution of inputs to each layer
shifts throughout training, so preprocessed data is extremely desirable for
achieving good performance across tasks [20]. There are many CNN variations,
including architectures with shorter connections such as DenseNet, which
significantly reduces the number of hyperparameters needed to develop effective
designs and has benefits for feature propagation [21].
ResNet, Xception and GoogLeNet designs are other varieties of CNN architectures
that have proven effective more recently. Such networks are needed because
multiscale processing is required, performance across tasks degrades as networks
simply get deeper, and better topologies with fewer parameters are sought
[22–25].
Another critical challenge in DL is the capability of an architecture to store
information over long time periods. Long Short-Term Memory (LSTM) has been
suggested as a potential remedy for this issue. Through the states of specialized
units, the LSTM design enforces constant error flow that is local in time and
space [26].
The concept of transfer learning is another DL idea worth mentioning. Transfer
learning involves applying features learned by deep convolutional neural networks
to new and different tasks. The need for it arises because a new task may differ
significantly from the original task, and there may not be enough labels or
inputs to train a DL architecture for the new task from scratch. Transfer
learning also allows the learned features to be adapted easily so that they
generalize reliably [27–29].
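A typical transfer-learning recipe of the kind described above can be sketched as follows; the choice of ResNet50 with ImageNet weights and a two-class head (e.g. cancerous vs. healthy tissue) is illustrative and not taken from any specific study cited here.

```python
# Sketch of transfer learning: features from a CNN pretrained on a generic task
# (ImageNet assumed) are reused by freezing the convolutional base and training
# only a new classifier head for the new task.
import tensorflow as tf

base = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet",
    input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                      # keep the pretrained features fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g. cancerous vs. healthy
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Inputs should be prepared with tf.keras.applications.resnet50.preprocess_input.
# After the head converges, some top layers of `base` can be unfrozen and
# fine-tuned with a small learning rate.
```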
DL techniques utilized in cancer detection and treatment are investigated in this
chapter. The purpose of the study is to demonstrate, with the help of the
literature, the effectiveness of deep learning, one of the machine learning
approaches, in dealing with a condition like cancer, as well as the methodologies
and techniques that are employed and how they are applied [30].
2 Deep Learning
2.1 Basics of Deep Learning
DL has gained a great deal of popularity and success in nearly every industry and
has emerged as a useful tool for understanding how machines perceive the world.
DL techniques are applied in fields including speech recognition, image
classification, video analysis and natural language learning [31]. Analysis is
performed with a mathematical model learned by DL, without any hand-crafted
feature extractor. The scope for generalization of DL techniques is one of their
key benefits: a learned neural network can be transferred to additional
applications and data types. When the dataset is inadequate, however, DL performs
poorly [32].
DL is a kind of machine learning approach that capitalizes on layers of nonlinear
processing units [15]. The result of each layer is fed into the subsequent layer
as input. In the DL approach, a representation of the data is built by learning
multiple levels of features [33]. A hierarchy is created in the representation,
with higher-level features derived from lower-level ones. While generally based
on ANNs, DL techniques have more hidden layers and neurons [34]. DL techniques
show excellent outcomes when processing a variety of data types, including text,
audio and video [35, 36]. Applications of DL include information retrieval, audio
and speech processing [14], multi-modal and multi-task learning, Natural Language
Processing (NLP), image segmentation and image recognition [16].
4.3 LeNet-5
This is a seven-layer convolutional neural network used to classify handwritten
digits. Several convolution layers are employed to handle the complexity of the
task, with input images of size 32 × 32. Figure 3 shows the LeNet design, which
consists of two convolutional layers, subsampling layers and fully connected
layers. Gaussian connections are used in the single output layer [47].
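A sketch of this LeNet-5 layout in Keras is shown below; average pooling and tanh stand in for the original subsampling and squashing operations, and a softmax output replaces the Gaussian-connection layer, so it is an approximation rather than a faithful reproduction of [47].

```python
# Sketch of the LeNet-5 layout: two convolution + subsampling stages followed
# by fully connected layers, for 32x32 single-channel digit images.
import tensorflow as tf
from tensorflow.keras import layers, models

lenet5 = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, (5, 5), activation="tanh"),   # C1: 6 feature maps
    layers.AveragePooling2D((2, 2)),               # S2: subsampling
    layers.Conv2D(16, (5, 5), activation="tanh"),  # C3: 16 feature maps
    layers.AveragePooling2D((2, 2)),               # S4: subsampling
    layers.Flatten(),
    layers.Dense(120, activation="tanh"),          # C5
    layers.Dense(84, activation="tanh"),           # F6
    layers.Dense(10, activation="softmax"),        # 10 digit classes
])
lenet5.summary()
```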
4.4 AlexNet
While AlexNet's design is similar to that of LeNet, it has deeper layers, more
filters per layer, and stacked convolutional layers. A ReLU activation function
follows every convolutional and fully connected layer. By reducing the error rate
from 26 to 15.3%, it was the winning architecture in 2012. It includes data
augmentation, dropout, max pooling, and ReLU activations in addition to 11 × 11,
5 × 5, and 3 × 3 convolutional kernels [48]. In Fig. 4, the AlexNet architecture
is shown.
Fig. 4 Architecture of AlexNet
4.5 ZFNet
Although ZFNet's architecture is similar to AlexNet's, its settings were
fine-tuned, making it the winner of the 2013 challenge. The error rate dropped to
14.8%. The number of weights is lowered by using 7 × 7 kernels rather than
11 × 11 kernels, and reducing the number of tuning parameters in this way also
improves precision [49].
4.6 GoogleNet
The GoogLeNet design draws on LeNet and adds an inception structure. It has 22
layers, and over successive evaluations the error rate decreased gradually from
6.66 to 3.66%. The architecture was the winner of ILSVRC 2014 [46]. When compared
to the conventional CNN architecture, it has reduced computational complexity,
although it has been used less frequently than architectures such as AlexNet and
VGG [50]. In Fig. 5, the GoogLeNet architecture is shown.
Fig. 5 Architecture of GoogleNet
4.7 VGGNet
The VGGNet, which consists of sixteen convolutional layers with numerous filters,
was the runner-up of ILSVRC 2014 [39]. Feature extraction with this architecture
has been found to be effective, although parameter tuning is quite important.
Three VGG models were proposed, with 11, 16 and 19 layers: VGG-11, VGG-16 and
VGG-19. All VGG models have three fully connected layers at the very end.
Figure 6 shows the architecture of the VGGNet.
4.8 ResNet
ResNet, the winner of ILSVRC 2015, employs skip connections and batch
normalization [51]. Its computational complexity is lower than that of VGGNet.
Skip connections allow the signal to bypass one or more layers. With 152 layers
in total, the error rate is kept as low as 3.57%. It provides a solution to the
vanishing gradient problem. ResNet is essentially a feed-forward network
augmented with residual connections [52]. It consists of a number of residual
blocks and, depending on the architecture, these blocks can be configured
differently. In Fig. 7, the residual network is shown.
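The residual idea can be sketched as a basic block in which the input is added back to the output of two convolutions through an identity skip connection; this simplified block is illustrative and omits the projection shortcuts and bottleneck variants used in the full ResNet.

```python
# Sketch of a basic residual block: two convolutions whose output is added back
# to the block's input via an identity skip connection, so gradients can flow
# directly through the addition.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x                                           # identity skip connection
    y = layers.Conv2D(filters, (3, 3), padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, (3, 3), padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                        # add the input back in
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = residual_block(inputs)
tf.keras.Model(inputs, outputs).summary()
```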
4.10 U-Net
U-Net, which has two paths, was created for the segmentation of medical images.
The first path is an encoder that captures the context of the image, while the
second path is a decoder built from transposed convolutions [53, 54]. Figure 9
shows the U-Net.
Fig. 9 Architecture of U-Net
4.12 Autoencoders
The autoencoder is a powerful unsupervised learning architecture with three
parts: encoder, code and decoder. The encoder compresses the input data into a
more compact representation, so the compressed code is a lossy rendering of the
input; the code itself represents the compressed input. The layer between the
encoder and the decoder is referred to as the bottleneck. Figure 11 shows the
construction of the autoencoder. The decoder converts the code back into a
reconstruction of the original input. Autoencoders are characteristically lossy
and data-specific. Four hyperparameters need to be chosen before training the
architecture: the code size, the number of layers, the nodes per layer and the
loss function. Application areas of the autoencoder include dimensionality
reduction, image compression, image denoising and feature extraction [57, 58].
Fig. 11 Architecture of autoencoders
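A minimal fully connected autoencoder, sketched below for flattened 28 × 28 images, makes the encoder–code–decoder structure and the hyperparameters mentioned above (code size, layer count, nodes per layer, loss function) concrete; all sizes are illustrative.

```python
# Sketch of the encoder -> code -> decoder structure as a small fully connected
# autoencoder for flattened 28x28 images.
import tensorflow as tf
from tensorflow.keras import layers, models

code_size = 32                                   # bottleneck (code) dimension

encoder = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(code_size, activation="relu"),  # the "code" layer
], name="encoder")

decoder = models.Sequential([
    layers.Input(shape=(code_size,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(784, activation="sigmoid"),     # reconstruction of the input
], name="decoder")

autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")  # reconstruction loss
autoencoder.summary()
```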
7 Conclusions
DL has demonstrated its effectiveness in feature extraction, and these properties
have improved cancer prognosis and prediction. DL models have revolutionized
cancer diagnosis and prediction because of their superior features, and DL
architectures are now used extensively for cancer cell segmentation and
classification. Data augmentation has been critical in cancer diagnosis and
prediction tasks for enhancing system performance. DL solutions are being
evaluated and verified for properties such as replicability and universal
applicability in the treatment of cancer. These techniques have helped in the
early detection of cancer and have contributed to patient recovery or extended
survival.
DL-based technological innovation has started to benefit the local and national
medical sectors. It is therefore advantageous to apply DL technology in cancer
diagnostics and general medicine in order to gain further theoretical
understanding. Researchers studying ML algorithms for diagnosing diseases, as
well as experts in treatment planning, stand to gain from the conclusions of this
work.
References
1. Grisold, W., Soffietti, R., Oberndorfer, S., & Cavaletti, G. (Eds.). (2021).
Effects of cancer treatment on the nervous system.
2. Tang, J., Rangayyan, R. M., Xu, J., El Naqa, I., & Yang, Y. (2009). Computer-
aided detection and diagnosis of breast cancer with mammography: Recent
advances. IEEE Transactions on Information Technology in Biomedicine, 13(2),
236–251.
[Crossref]
3. Munir, K., Elahi, H., Ayub, A., Frezza, F., & Rizzi, A. (2019). Cancer diagnosis
using deep learning: A bibliographic review. Cancers, 11(9), 1235.
[Crossref]
4. Huang, S., Yang, J., Fong, S., & Zhao, Q. (2020). Artificial intelligence in cancer
diagnosis and prognosis: Opportunities and challenges. Cancer letters, 471, 61–
71.
[Crossref]
6. Bhardwaj, P., Guhan, T., & Tripathy, B. K. (2021). Computational biology in the
lens of CNN. In S. S. Roy, Y. H. Taguchi (eds.), Handbook of machine learning
applications for genomics (Chapter 5). Studies in Big Data. ISBN: 978-981-16-
9157-7 496166_1_En
9. Bhandari, A., Tripathy, B. K., Jawad, K., Bhatia, S., Rahmani, M. K. I., & Mash,
A. (2022). Cancer detection and prediction using genetic algorithms. Comput
Intell Neurosci 2022, 18. https://doi.org/10.1155/2022/1871841
10. Allahyar, A., Ubels, J., & de Ridder, J. (2019). A data-driven interactome of
synergistic genes improves network-based cancer outcome prediction. PLoS
Computational Biology, 15(2), e1006657.
[Crossref]
11.
Adate, A., Tripathy, B. K., Arya, D., & Shaha, A. (2020) Impact of deep neural
learning on artificial intelligence research. In S. Bhattacharyya, A. E. Hassanian,
S. Saha, & B. K. Tripathy (Eds.), Deep learning research and applications (pp.69–
84). De Gruyter Publications. https://doi.org/10.1515/9783110670905-004
12. Mitchell, M. J., Jain, R. K., & Langer, R. (2017). Engineering and physical
sciences in oncology: Challenges and opportunities. Nature Reviews Cancer,
17(11), 659–675.
[Crossref]
13. Obermeyer, Z., & Emanuel, E. J. (2016). Predicting the future—big data, machine
learning, and clinical medicine. The New England Journal of Medicine, 375(13),
1216.
[Crossref]
14. Graves, A., Mohamed, A. R., & Hinton, G. (2013). Speech recognition with deep
recurrent neural networks. In 2013 IEEE International Conference on Acoustics,
Speech and Signal Processing (pp. 6645–6649). IEEE.
15. Bhattacharyya, D. S., Snasel, V., Hassanian, A. E., Saha, S., & Tripathy, B. K.
(2020). Deep learning research with engineering applications. De Gruyter
Publications. ISBN: 3110670909, 9783110670905. https://doi.org/10.1515/
9783110670905
16. Bose, A., & Tripathy, B. K. (2020) Deep learning for audio signal classification.
In S. Bhattacharyya, A. E. Hassanian, S. Saha, & B. K. Tripathy (Eds.), Deep
learning research and applications (pp. 105–136). De Gruyter Publications. https://
doi.org/10.1515/9783110670905-00660
17. Singhania, U., & Tripathy, B. K. (2021). Text-based image retrieval using deep
learning. In Encyclopedia of information science and technology (5th edn, p. 11).
https://doi.org/10.4018/978-1-7998-3479-3.ch007
18. Yagna Sai Surya, K., Geetha Rani, T., & Tripathy, B. K. (2022). Social distance
monitoring and face mask detection using deep learning. In J. Nayak, H. Behera,
B. Naik, S. Vimal, & D. Pelusi (Eds.), Computational intelligence in data mining
(Vol. 281). Smart Innovation, Systems and Technologies. Springer, Singapore.
https://doi.org/10.1007/978-981-16-9447-9_36
19. Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network
training by reducing internal covariate shift. In International Conference on
Machine Learning (pp. 448–456). PMLR.
20. Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely
connected convolutional networks. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (pp. 4700–4708).
21. Kyi, C. W., Birriel, P. C., Davidsen, T. M., Ferguson, M. L., Gesuwan, P., Griner,
N. B., Gerhard, D. S., et al. (2020). NCI office of cancer genomics supports
multidisciplinary genomics research initiatives to advance precision oncology.
Cancer Research, 80(16_Supplement), 5862–5862.
22. Pogorelov, K., Randel, K. R., Griwodz, C., Eskeland, S. L., de Lange, T.,
Johansen, D., Halvorsen, P., et al. (2017). Kvasir: A multi-class image dataset for
computer aided gastrointestinal disease detection. In Proceedings of the 8th ACM
on Multimedia Systems Conference (pp. 164–169).
23. Mesri, M., An, E., Hiltke, T., Robles, A. I., Rodriguez, H., & CPTAC
Investigators. (2022). NCI’s clinical proteomic tumor analysis consortium: A
proteogenomic cancer analysis program. Cancer Research, 82(12_Supplement),
6331–6331.
24. Gupta, P., Bhachawat, S., Dhyani, K., & Tripathy, B. K. (2021). A study of gene
characteristics and their applications using deep learning, (Chapter 4). In S. S.
Roy, & Y. H. Taguchi (Eds.), Handbook of Machine Learning Applications for
Genomics (Vol. 103). Studies in Big Data. ISBN: 978-981-16-9157-7,
496166_1_En.
25. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural
Computation, 9(8), 1735–1780.
[Crossref]
26. Maheswari, K., Shaha, A., Arya, D., Tripathy, B. K., & Rajkumar, R. (2020).
Convolutional neural networks: A bottom-up approach. In S. Bhattacharyya, A. E.
Hassanian, S. Saha, & B. K. Tripathy (Eds.), Deep Learning Research with
Engineering Applications (pp. 21–50). De Gruyter Publications. https://doi.org/10.
1515/9783110670905-002
27. Tripathy, B. K., & Deepthi, P. H. (2015). Application of spatial FCM in detecting
cancer cells. IIMT Research Network (pp. 1–6, 96–100). ISBN 878-93-82208-77-
8.
28.
Zhong, Z., Sun, L., & Huo, Q. (2019). An anchor-free region proposal network for
Faster R-CNN-based text detection approaches. International Journal on
Document Analysis and Recognition (IJDAR), 22(3), 315–327.
[Crossref]
29. Hanefi Calp, M. (2021). Use of deep learning approaches in cancer diagnosis. In
Deep Learning for Cancer Diagnosis (pp. 249–267). Springer, Singapore.
30. Karahan, Ş., & Akgül, Y. S. (2016). Eye detection by using deep learning. In 2016
24th Signal Processing and Communication Application Conference (SIU) (pp.
2145–2148). IEEE.
31. İnik, Ö., & Ülker, E. (2017). Derin öğrenme ve görüntü analizinde kullanılan
derin öğrenme modelleri [Deep learning and deep learning models used in image
analysis]. Gaziosmanpaşa Bilimsel Araştırma Dergisi, 6(3), 85–104.
32. Şeker, A., Diri, B., & Balık, H. H. (2017). Derin öğrenme yöntemleri ve
uygulamaları hakkında bir inceleme [A review of deep learning methods and
applications]. Gazi Mühendislik Bilimleri Dergisi, 3(3), 47–64.
33. Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends®
in Machine Learning, 2(1), 1–127.
34. Tripathy, B. K., Raju, H., & Kaul, D. (2018). Deep learning in health care.
Accepted in V. Santhi (Ed.), Deep learning for remote sensing and GIS: Frontier
advancements and applications. CRC Publications.
35. Ravì, D., Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J., Lo, B., &
Yang, G. Z. (2016). Deep learning for health informatics. IEEE Journal of
Biomedical and Health Informatics, 21(1), 4–21.
[Crossref]
36. Küçük, D., & Arici, N. (2018). Doğal Dil İşlemede Derin Öğrenme Uygulamalari
Üzerine Bir Literatür Çalişmasi [A literature study on deep learning applications
in natural language processing]. Uluslararası Yönetim Bilişim Sistemleri ve
Bilgisayar Bilimleri Dergisi, 2(2), 76–86.
37. Ohmori, M., Ishihara, R., Aoyama, K., Nakagawa, K., Iwagami, H., Matsuura, N.,
& Tada, T., et al. (2020). Endoscopic detection and differentiation of esophageal
lesions using a deep neural network. Gastrointestinal Endoscopy, 91(2), 301–309.
40. Sihare, P., Ullah Khan, A., Bardhan, P., & Tripathy, B. K. (2022). COVID-19
detection using deep learning: A comparative study of segmentation algorithms.
In A. K. Das et al. (Eds.), Proceedings of the 4th International Conference on
Computational Intelligence in Pattern Recognition (CIPR) (pp. 1–10), CIPR
2022, LNNS 480.
41. Dai, J., Li, Y., He, K., & Sun, J. (2016). R-FCN: Object detection via region-based fully convolutional networks. Advances in Neural Information Processing Systems, 29.
42. Raina, R., Madhavan, A., & Ng, A. Y. (2009). Large-scale deep unsupervised
learning using graphics processors. In Proceedings of the 26th Annual
International Conference on Machine Learning (pp. 873–880).
43. Tripathy, B. K., Dash, S., & Patro, B. N. (2012). Study of classification accuracy of microarray data for cancer classification using multivariate and hybrid feature selection method. IOSR Journal of Engineering (IOSRJEN), 2(8), 112–119. ISSN: 2250-302.
44. Adate, A., & Tripathy, B. K. (2017). Understanding single image super-resolution techniques with generative adversarial networks. In J. Bansal, K. Das, A. Nagar, K. Deep, & A. Ojha (Eds.), Soft computing for problem solving (Vol. 816, pp. 833–840). Advances in Intelligent Systems and Computing. Springer.
45. Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., & Alsaadi, F. E. (2017). A survey of
deep neural network architectures and their applications. Neurocomputing, 234,
11–26.
46. Mustafa, H. T., Yang, J., & Zareapoor, M. (2019). Multi-scale convolutional
neural network for multi-focus image fusion. Image and Vision Computing, 85,
26–35.
47. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning
applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
48. Kaul, D., Raju, H., & Tripathy, B. K. (2022). Deep learning in healthcare. In D. P. Acharjya, A. Mitra, & N. Zaman (Eds.), Deep learning in data analytics: Recent techniques, practices and applications (Vol. 91, pp. 97–115). Studies in Big Data. Springer, Cham. https://doi.org/10.1007/978-3-030-75855-4_6
49. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for
large-scale image recognition. Preprint retrieved from arXiv:1409.1556.
50. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Fei-Fei, L., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
51. Tripathy, B. K., Garg, N., & Nikhitha, P. (2014). Image retrieval using latent feature learning by deep architecture. In Proceedings of the IEEE ICCIC 2014 (pp. 663–666).
52. Targ, S., Almeida, D., & Lyman, K. (2016). Resnet in resnet: Generalizing
residual architectures. Preprint retrieved from arXiv:1603.08029.
53. Tripathy, B. K., Parikh, S., Ajay, P., & Magapu, C. Brain MRI segmentation techniques based on CNN and its variants (Chapter 10). In J. Chaki (Ed.), Brain tumor MRI image segmentation using deep learning techniques (pp. 161–182). Elsevier Publications. https://doi.org/10.1016/B978-0-323-91171-9.00001-6
54. Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T., & Ronneberger, O. (2016).
3D U-Net: learning dense volumetric segmentation from sparse annotation. In
International Conference on Medical Image Computing and Computer-Assisted
Intervention (pp. 424–432). Springer, Cham.
55. Baktha, K., & Tripathy, B. K. (2017). Investigation of recurrent neural networks in
the field of sentiment analysis. In International Conference on Communication
and Signal Processing (ICCSP), (pp. 2047–2050). https://doi.org/10.1109/ICCSP.
2017.8286763
56. Adate, A., & Tripathy, B. K. (2019). S-LSTM-GAN: Shared recurrent neural
networks with adversarial training. In A. Kulkarni, S. Satapathy, T. Kang, A.
Kashan (Eds.), Proceedings of the 2nd International Conference on Data
Engineering and Communication Technology (Vol. 828, pp. 107–115). Advances
in Intelligent Systems and Computing. Springer, Singapore.
57. Loey, M., El-Sawy, A., & El-Bakry, H. (2017). Deep learning autoencoder approach for handwritten Arabic digits recognition. Preprint retrieved from arXiv:1706.06720.
58. Thomas, S. A., Race, A. M., Steven, R. T., Gilmore, I. S., & Bunch, J. (2016).
Dimensionality reduction of mass spectrometry imaging data using autoencoders.
In 2016 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1–7).
IEEE.
61. Jeong, J. (2017). Deep learning for cancer screening in medical imaging. Hanyang
Medical Reviews, 37(2), 71–76.
62. Pereira, G. C., Traughber, M., & Muzic, R. F. (2014). The role of imaging in
radiation therapy planning: past, present, and future. BioMed Research
International.
63. Adate, A., & Tripathy, B. K. (2018) Deep learning techniques for image
processing. In S. Bhattacharyya, H. Bhaumik, A. Mukherjee, & S. De (Eds.),
Machine learning for big data analysis (pp. 69–90). De Gruyter, Berlin, Boston.
https://doi.org/10.1515/9783110551433-00357
64. Jain, S., Singhania, U., Tripathy, B., Nasr, E. A., Aboudaif, M. K., & Kamrani, A.
K. (2021). Deep learning-based transfer learning for classification of skin
cancer. Sensors (Basel), 21(23), 8142. https://doi.org/10.3390/s21238142
65. Tong, N., Lu, H., Ruan, X., & Yang, M. H. (2015). Salient object detection via
bootstrap learning. In Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (pp. 1884–1892).
66. Kallenberg, M., Petersen, K., Nielsen, M., Ng, A. Y., Diao, P., Igel, C., Lillholm,
M., et al. (2016). Unsupervised deep learning applied to breast density
segmentation and mammographic risk scoring. IEEE Transactions on Medical
Imaging, 35(5), 1322–1331.
67. Wang, H., Roa, A. C., Basavanhally, A. N., Gilmore, H. L., Shih, N., Feldman,
M., Madabhushi, A., et al. (2014). Mitosis detection in breast cancer pathology
images by combining handcrafted and convolutional neural network features.
Journal of Medical Imaging, 1(3), 034003.
68. Ertosun, M. G., & Rubin, D. L. (2015). Probabilistic visual search for masses
within mammography images using deep learning. In 2015 IEEE International
Conference on Bioinformatics and Biomedicine (BIBM) (pp. 1310–1315). IEEE.
69. Turkki, R., Linder, N., Kovanen, P. E., Pellinen, T., & Lundin, J. (2016).
Antibody-supervised deep learning for quantification of tumor-infiltrating
immune cells in hematoxylin and eosin stained breast cancer samples. Journal of
Pathology Informatics, 7(1), 38.
70. Huang, Z., Zhan, X., Xiang, S., Johnson, T. S., Helm, B., Yu, C. Y., Huang, K., et
al. (2019). SALMON: Survival analysis learning with multi-omics neural
networks on breast cancer. Frontiers in Genetics, 10, 166.
72. Liu, Y., Gadepalli, K., Norouzi, M., Dahl, G. E., Kohlberger, T., Boyko, A., Stumpe, M. C., et al. (2017). Detecting cancer metastases on gigapixel pathology images. Preprint retrieved from arXiv:1703.02442.
73. Cruz-Roa, A., Gilmore, H., Basavanhally, A., Feldman, M., Ganesan, S., Shih, N.
N., Tomaszewski, J., González, F. A., & Madabhushi, A. (2017). Accurate and
reproducible invasive breast cancer detection in whole-slide images: A deep
learning approach for quantifying tumor extent. Scientific Reports, 7(1), 1–14.
74. Yap, M. H., Pons, G., Marti, J., Ganau, S., Sentis, M., Zwiggelaar, R., Davison, A.
K., & Marti, R. (2017). Automated breast ultrasound lesions detection using
convolutional neural networks. IEEE Journal of Biomedical and Health
Informatics, 22(4), 1218–1226.
75. Das, A., Acharya, U. R., Panda, S. S., & Sabut, S. (2019). Deep learning based liver cancer detection using watershed transform and Gaussian mixture model techniques. Cognitive Systems Research, 54, 165–175.
76. Devi, P., & Dabas, P. (2015). Liver tumor detection using artificial neural networks for medical images. International Journal of Innovative Research Science Technology, 2(3), 34–38.
77. Li, W. (2015). Automatic segmentation of liver tumor in CT images with deep
convolutional neural networks. Journal of Computer and Communications, 3(11),
146.
78. Gruetzemacher, R., & Gupta, A. (2016). Using deep learning for pulmonary
nodule detection & diagnosis.
79. Golan, R., Jacob, C., & Denzinger, J. (2016). Lung nodule detection in CT images
using deep convolutional neural networks. In 2016 International Joint Conference
on Neural Networks (IJCNN) (pp. 243–250). IEEE.
80. Kuan, K., Ravaut, M., Manek, G., Chen, H., Lin, J., Nazir, B., Chen, C., Howe, T. C., Zeng, Z., & Chandrasekhar, V. (2017). Deep learning for lung cancer detection: Tackling the Kaggle Data Science Bowl 2017 challenge. Preprint retrieved from arXiv:1705.09435.
81. Jafari, M. H., Karimi, N., Nasr-Esfahani, E., Samavi, S., Soroushmehr, S. M. R.,
Ward, K., & Najarian, K. (2016). Skin lesion segmentation in clinical images
using deep learning. In 2016 23rd International Conference on Pattern
Recognition (ICPR) (pp. 337–342). IEEE.
82. Sabouri, P., & GholamHosseini, H. (2016). Lesion border detection using deep
learning. In 2016 IEEE Congress on Evolutionary Computation (CEC) (pp. 1416–
1421). IEEE.
83. Chen, H., Zhao, H., Shen, J., Zhou, R., & Zhou, Q. (2015). Supervised machine
learning model for high dimensional gene data in colon cancer detection. In 2015
IEEE International Congress on Big Data (pp. 134–141). IEEE.
84. Petalidis, L. P., Oulas, A., Backlund, M., Wayland, M. T., Liu, L., Plant, K.,
Happerfield, L., Freeman, T.C., Poirazi, P., & Collins, V. P. (2008). Improved
grading and survival prediction of human astrocytic brain tumors by artificial
neural network analysis of gene expression microarray data. Molecular Cancer
Therapeutics, 7(5), 1013–1024.
85. Liu, S., Zheng, H., Feng, Y., & Li, W. (2017). Prostate cancer diagnosis using
deep learning with 3D multiparametric MRI. In Medical Imaging 2017:
Computer-Aided Diagnosis (Vol. 10134, pp. 581–584). SPIE.
86. Tsehay, Y. K., Lay, N. S., Roth, H. R., Wang, X., Kwak, J. T., Turkbey, B. I.,
Pinto, P. A., Wood, B. J., & Summers, R. M. (2017). Convolutional neural
network based deep-learning architecture for prostate cancer detection on
multiparametric magnetic resonance images. In Medical Imaging 2017:
Computer-Aided Diagnosis (Vol. 10134, pp. 20–30). SPIE.
87. Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Pal, C., Jodoin, P. M., & Larochelle, H. (2017). Brain tumor segmentation with deep neural networks. Medical Image Analysis, 35, 18–31.